[Babase] The REPSTATS/CYCSTATS design
Karl O. Pinc
babase@www.eco.princeton.edu
Wed, 09 Feb 2005 05:27:08 +0000
Hi,
Here it is again. I've made adjustments where requested,
although I did not rework text that wasn't clear and also
didn't bear on calculating REPSTATS/CYCSTATS. So,
there's still some 'old' stuff to be thought-through
or re-written. And there's some 'new' stuff that
might need improved wording. If you've got better
wording send it back and we'll stick it in if everybody
likes it.
Here are some more comments I made on the changes while
making them:
Notes on the changes in the documentation.
The SEEDS table is now called GAPS.
CYCSTATS statuses are now M, S, O, D
(instead of M, F, O, L)
Pregnant state in REPSTATS now
includes the D date but excludes the
birthdate, lactating state now
includes the birthdate. Previously
the pregnant state included the birthdate
and lactating state did not start until
the day after the birthdate.
I don't know the biology, but this gives
us an overall consistency where each
state always includes the 'transition'
that marks the beginning of the
state.
I've removed the Cids and Cide columns
from the REPSTATS and CYCSTATS tables.
These can be computed from Dins (days into state)
and Dr (days remaining) and the Date.
I want to keep it simple. I can make
views or functions or whatever to support
any lost functionality if necessary.
I have not removed REPSTATS.Pid. It's
computable also, but I imagine it might
actually be used all the time and would
rather not have the computational load.
Note the last comment in the description
of the CYCSTATS table which remarks on the
interactions between missing end transitions
and days in calculations. This is unique
to the swelling (follicular phase 2) state. Does
something need to be done?
Columns cannot contain NULL values unless
otherwise noted.
I'd still like to go to something like
CYCPOINTS, where each transition date
has its own row. At the same time I'd
probably want to move the early and late
dates into a separate table, just to get
them out of the way because they're not
really used, or at least mostly not
there at all. But not until after we
get something working and there's time.
Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein
--------------<snip>-----------------------
PREGS (Pregnancies)
This table records pregnancies. It contains one row for each recorded
pregnancy. A pregnancy is defined to be an event occurring to some
mother; a single pregnancy could result in more than one fetus. The
only time there will not be an associated BIOGRAPH row is when the
pregnancy is still in progress, otherwise there will always be a
BIOGRAPH row which records the progeny of the pregnancy.
The conception sexual cycle dates (Conceive) of the pregnancy should
not be later than the birth date value (Birth) of the associated
BIOGRAPH row. The birth date value (Birth) of the associated BIOGRAPH
row should not be later than the resumption of cycling date values
(Resume.)
The sequence number (Seq on CYCLES) of the sexual cycle immediately
following pregnancy (Resume) should always be exactly one more than
the sequence number of the sexual cycle associated with conception
(Conceive). The female associated with the conception sexual cycle
(Conceive) should be the same as the female associated with the sexual
cycle immediately following pregnancy (Resume). There should be no
overlap of pregnancy time periods, from conception date to birth date
or, if known, resumption of sexual cycling date, among the pregnancies
associated with a particular female.
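The Resume and overlap rules above can be checked mechanically. Below is a minimal Python sketch, not the Babase implementation. `CycleRef`, `resume_problems` and `pregnancies_overlap` are hypothetical names. Treating pregnancy periods as inclusive, so that a shared day counts as an overlap, is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CycleRef:
    """Hypothetical minimal view of a CYCLES row: the female and her Seq."""
    sname: str
    seq: int


def resume_problems(conceive: CycleRef, resume: Optional[CycleRef]) -> List[str]:
    """Check the Conceive/Resume rules for one PREGS row."""
    if resume is None:  # Resume may be NULL when resumption is unknown
        return []
    problems = []
    if resume.sname != conceive.sname:
        problems.append("Resume cycle belongs to a different female")
    if resume.seq != conceive.seq + 1:
        problems.append("Resume Seq is not exactly one more than Conceive Seq")
    return problems


def pregnancies_overlap(periods: List[Tuple[date, date]]) -> bool:
    """periods: (conception date, end date) for each pregnancy of ONE female,
    where the end date is the resumption date if known, else the birth date.
    Periods are treated as inclusive (an assumption)."""
    ordered = sorted(periods)
    # After sorting by start, any overlap shows up between neighbours.
    return any(nxt[0] <= prev[1] for prev, nxt in zip(ordered, ordered[1:]))
```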
Data Entry Rules
No special program supports the maintenance of this table.
Data Element Descriptions
Pid
The contents of this column uniquely identify the pregnancy record.
The Pid is the mother's Sname followed by the parity. Because the Pid
is only used to identify the record, it is not necessary to change the
Pid just because the parity of the pregnancy is found to have changed.
In general, once a unique Pid is established, it should not be
changed. When retrieving data from this table the safe approach is to
assume nothing about the contents of this column except that it will
uniquely identify a pregnancy. Although it is true that the
non-numeric portion of the Pid is the mother's Sname, care must be
taken because, for instance, an Sname may (and at least one does) end
with a space. The safe way to obtain the bearer of the pregnancy is
to find the female associated with the conception by joining
PREGS.Conceive with CYCLES.Cid to find CYCLES.Sname. Likewise, the Parity
column should always be used to obtain a meaningful parity value.
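The following self-contained sketch illustrates the safe lookup using Python's standard `sqlite3` module. The lowercase table layouts are hypothetical and cut down to the columns the lookup needs. The Sname `'AB '` and Pid `'AB 3'` are made-up data. Only the join of PREGS.Conceive to CYCLES.Cid comes from the text above.

```python
import sqlite3

# Hypothetical, cut-down schemas: only the columns needed for the lookup.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cycles (cid INTEGER PRIMARY KEY, sname TEXT NOT NULL);
    CREATE TABLE pregs (
        pid      TEXT PRIMARY KEY,
        parity   INTEGER NOT NULL,
        conceive INTEGER NOT NULL UNIQUE REFERENCES cycles (cid)
    );
    """
)
# Made-up data: an Sname that ends with a space.
conn.execute("INSERT INTO cycles (cid, sname) VALUES (1, 'AB ')")
conn.execute("INSERT INTO pregs (pid, parity, conceive) VALUES ('AB 3', 3, 1)")

# Safe approach: the mother comes from the conception cycle and the
# parity from the Parity column; nothing is parsed out of the Pid.
mother, parity = conn.execute(
    """
    SELECT c.sname, p.parity
      FROM pregs AS p
      JOIN cycles AS c ON c.cid = p.conceive
     WHERE p.pid = ?
    """,
    ("AB 3",),
).fetchone()
print(repr(mother), parity)
conn.close()
```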
Parity
(This use of 100, etc. in this column is under review.)
The ordinal number of the pregnancy: 1 for a female's first pregnancy, 2
for a female's second pregnancy, and so forth. There should be no
'gaps' in the pregnancies, sequenced by Parity, of any female. When
the first pregnancy is known, the Parity sequence begins with 1. When
the first pregnancy is not known, the Parity sequence begins with 101.
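The Parity rule can be written as a small check. The sketch below is in Python and assumes a hypothetical helper, `parity_sequence_ok`, that receives all of one female's Parity values. Because the use of 100 and above is under review, the 101 starting point reflects the current text only.

```python
from typing import List


def parity_sequence_ok(parities: List[int]) -> bool:
    """True when one female's Parity values form a gap-free run that starts
    at 1 (first pregnancy known) or 101 (first pregnancy not known)."""
    if not parities:
        return True
    ordered = sorted(parities)
    start = ordered[0]
    # No gaps and no duplicates: the sorted values must be start, start+1, ...
    return start in (1, 101) and ordered == list(range(start, start + len(ordered)))
```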
Conceive
The information recorded on the sexual cycle of the conception which
initiated the pregnancy. This is the Cid of a CYCLES row of the
mother. The associated CYCLES row should contain a Ddate value to
record the date of conception. The dates of the associated CYCLES
record, when dates are present, should be between the sexual
maturity date and the death date of the mother. This column should
contain a unique datum.
When the date of conception is estimated because there is no sexual
cycle data, the conception date recorded should be 178 days before the
recorded birthday.
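The estimate reduces to a one-line Python sketch. `estimated_conception_date` is a hypothetical helper name. The 178-day offset is the one given above.

```python
from datetime import date, timedelta

CONCEPTION_OFFSET_DAYS = 178  # from the PREGS.Conceive description


def estimated_conception_date(birth: date) -> date:
    """Conception date to record when there is no sexual cycle data."""
    return birth - timedelta(days=CONCEPTION_OFFSET_DAYS)
```

Subtracting a `timedelta` handles month lengths and leap years automatically. For example, a birth on 2004-07-01 gives an estimated conception date of 2004-01-05.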
Resume (NULL allowed)
The resumption of cycle information of the first cycle following the
pregnancy. This is the Cid of a row in CYCLES. The associated CYCLES
record will usually not have an onset of menses date (Mdate). This column
may be NULL for those cases when resumption of cycle information is
not known. When this column is not NULL, it should contain a unique
datum.
CYCLES (Female sexual cycles)
This table records information on the sexual cycle of the females. It
contains one row for every record of a female's sexual cycle. This
includes one row for those cases in which we know that the female had
a cycle, but we don't know anything else about the cycle. This case
occurs when we know that a female bore a specific child, but little
else about when this occurred. These children will have a pregnancy
record, which will be associated with the sexual cycle of conception,
and the CYCLES row will contain an estimated conception date. (See
the PREGS.Conceive documentation above.) The table also includes rows
that record estimated conception dates. Each of these rows is
associated with a pregnancy record and can be classified as estimated
by examining the earliest possible deturgesence date (Eddate) and
latest possible deturgesence date (Lddate) values.
Because every female that has reached sexual maturity should have a
maturation date (Matured) in MATUREDATES, with a corresponding row for
the first sexual cycle, the sexual cycle with a sequence (Seq) of 1
should be the female's first sexual cycle and should not, in general,
have an onset of menses date (Mdate).
The combination of Sname and each of the three types of dates should
be unique. The combination of Sname and Seq should be unique. Each
row should contain data on either an onset of menses, onset of
turgesence, or onset of deturgesence.
Estimated onset of menses dates (Mdates) are calculated when there is
not an observed onset of menses date, and the estimated onset of
menses date falls within a period of continuous observation, and there
is a prior deturgesence date (Ddate) in the immediately preceding
sexual cycle. (See GAPS.) The calculated Mdate falls on the 13th
day after the preceding deturgesence date, unless the turgesence date
(Tdate) falls within this time period in which case the calculated
Mdate is set to the day of the turgesence date. Estimated onset of
menses dates never have associated early and late dates (Emdate and
Lmdate.)
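The estimation rule can be sketched in Python as follows. This is a minimal sketch, not the system's code. The function name and its parameters are hypothetical, and the observation period is passed in directly rather than looked up in GAPS. Applying the observation-period check to the final (possibly Tdate-adjusted) estimate is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

DAYS_AFTER_DDATE = 13  # "the 13th day after the preceding deturgesence date"


def estimate_mdate(
    observed_mdate: Optional[date],
    prev_ddate: Optional[date],
    tdate: Optional[date],
    obs_start: date,
    obs_end: date,
) -> Optional[date]:
    """Return an estimated Mdate, or None when no estimate should be made.

    prev_ddate is the Ddate of the immediately preceding cycle, tdate is this
    cycle's Tdate, and obs_start..obs_end is the continuous observation
    period containing the estimate (hypothetical inputs)."""
    if observed_mdate is not None or prev_ddate is None:
        return None
    estimate = prev_ddate + timedelta(days=DAYS_AFTER_DDATE)
    # A Tdate falling within the 13-day window pulls the Mdate back to it.
    if tdate is not None and prev_ddate < tdate <= estimate:
        estimate = tdate
    # Only estimate inside a period of continuous observation.
    if not obs_start <= estimate <= obs_end:
        return None
    return estimate
```

An estimate produced this way would be stored with EstM set to TRUE and with Emdate and Lmdate left NULL.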
A cycle's onset of menses date (Mdate) should not be after the onset
of turgesence date (Tdate). A cycle's onset of turgesence date
(Tdate) should be before the onset of deturgesence date (Ddate).
The onset of menses date (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) of any record should not be equal to or
between any pair of the onset of menses dates (Mdate), onset of
turgesence dates (Tdate), or onset of deturgesence dates (Ddate) of
any other recorded cycle for an individual.
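Here is a brute-force Python sketch of the non-overlap rule. It assumes each cycle arrives as the list of its non-NULL Mdate, Tdate and Ddate values, and `overlapping_cycles` is a hypothetical name. It compares every pair of cycles, which is quadratic but adequate for the number of cycles a single female has.

```python
from datetime import date
from typing import List, Tuple


def overlapping_cycles(cycles: List[List[date]]) -> List[Tuple[int, int]]:
    """cycles[i] lists the non-NULL Mdate, Tdate and Ddate of one female's
    cycle i (every row has at least one). Returns (i, j) pairs where some
    date of cycle i is equal to or between the dates of cycle j."""
    spans = [(min(ds), max(ds)) for ds in cycles]
    bad = []
    for i, ds in enumerate(cycles):
        for j, (lo, hi) in enumerate(spans):
            if i != j and any(lo <= d <= hi for d in ds):
                bad.append((i, j))
    return bad
```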
The earliest possible onset of menses (Emdate), onset of turgesence
(Etdate), and onset of deturgesence (Eddate) columns should not be
after the onset of menses (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) columns, respectively. The latest
possible onset of menses (Lmdate), onset of turgesence (Ltdate), and
onset of deturgesence (Lddate) columns should not be before the onset
of menses (Mdate), onset of turgesence (Tdate), and onset of
deturgesence (Ddate) columns, respectively.
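The within-row ordering rules and the early/late bounds can be expressed in the following Python sketch. The `Cycle` dataclass is a hypothetical stand-in for a CYCLES row. NULL columns are represented by `None` and skipped, because the rules only constrain dates that are present.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Cycle:
    """Hypothetical in-memory copy of the CYCLES date columns."""
    mdate: Optional[date] = None
    emdate: Optional[date] = None
    lmdate: Optional[date] = None
    tdate: Optional[date] = None
    etdate: Optional[date] = None
    ltdate: Optional[date] = None
    ddate: Optional[date] = None
    eddate: Optional[date] = None
    lddate: Optional[date] = None


# (earliest, actual, latest) column names for each transition.
BOUNDS = (
    ("Emdate", "Mdate", "Lmdate"),
    ("Etdate", "Tdate", "Ltdate"),
    ("Eddate", "Ddate", "Lddate"),
)


def cycle_violations(c: Cycle) -> List[str]:
    """List the ordering and early/late rules this row breaks."""
    problems = []
    if c.mdate is not None and c.tdate is not None and c.mdate > c.tdate:
        problems.append("Mdate is after Tdate")
    if c.tdate is not None and c.ddate is not None and not c.tdate < c.ddate:
        problems.append("Tdate is not before Ddate")
    for early_col, col, late_col in BOUNDS:
        early = getattr(c, early_col.lower())
        actual = getattr(c, col.lower())
        late = getattr(c, late_col.lower())
        if actual is None:
            continue
        if early is not None and early > actual:
            problems.append(f"{early_col} is after {col}")
        if late is not None and late < actual:
            problems.append(f"{late_col} is before {col}")
    return problems
```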
Data Entry Rules
This table is updated by supplying text files to the Cycle Update
Program, in the usual fashion.
Data Element Descriptions
Cid
A numeric identifier unique to each menses/turgesence/deturgesence
cycle. This is used to reference the cycle elsewhere in the database.
Sname
The short name of the female. This column should contain the Sname of
a female in BIOGRAPH.
Series
A female's rows that have the same series number are part of a
continuous series of observations that, presumably, have produced a
complete record of all the female's sexual transition events which
occurred throughout the observation period. When a female first comes
under observation her CYCLES rows have a Series value of 1. If the
female can no longer be observed the series ends. When observation of
the female resumes the Series number is incremented and the female's
subsequent CYCLES rows are given a Series value of 2. Additional
breaks in observation result in further increments. The initiation
and cessation of observation should be recorded in the GAPS table
and there should be a row there to record every Series endpoint,
menarche and disappearance or death excepted.
The value in this column is not user maintainable; the system
automatically generates the value.
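The Python sketch below shows one way Series numbers could follow from observation periods. It is an assumption, not the system's algorithm. `series_for` and `period_starts` are hypothetical. The start dates are assumed to include menarche, which has no GAPS row, as well as every later start of observation. Each cycle is represented by a single date that falls inside an observation period.

```python
from bisect import bisect_right
from datetime import date
from typing import List


def series_for(cycle_date: date, period_starts: List[date]) -> int:
    """Series number (1, 2, ...) for a cycle dated cycle_date.

    period_starts holds the sorted start dates of the female's continuous
    observation periods (hypothetical input)."""
    # Number of observation periods that began on or before the cycle.
    series = bisect_right(period_starts, cycle_date)
    if series == 0:
        raise ValueError("cycle precedes the first observation period")
    return series
```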
Seq
Sequence. The first recorded sexual cycle of a female has a Seq value
of 1, the second a value of 2, etc. There are no gaps in the sequence
numbers assigned to a female. Even when records of cycles are
missing, the first recorded cycle after the missing period has a
sequence one greater than the last recorded cycle before the missing
period. The recorded sequence number corresponds to the female's Nth
cycle only when the Series number is 1 and the turgesence date
(Tdate) of the cycle with a Seq value of 1 is the same as the female's
maturity date (MATUREDATES.Matured.)
The value in this column is not user maintainable; the system
automatically generates the value.
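Here is a Python sketch of the Seq numbering, together with the condition under which Seq equals the female's true cycle count. Both helper names are hypothetical. For ordering purposes, each cycle is again represented by a single date.

```python
from datetime import date
from typing import List, Optional


def assign_seq(cycle_dates: List[date]) -> List[int]:
    """Seq for each of one female's cycles, in input order: 1 for the
    earliest cycle, then 2, 3, ... with no gaps, even across Series breaks."""
    order = sorted(range(len(cycle_dates)), key=lambda i: cycle_dates[i])
    seq = [0] * len(cycle_dates)
    for n, i in enumerate(order, start=1):
        seq[i] = n
    return seq


def seq_is_true_ordinal(series: int, first_tdate: Optional[date],
                        matured: Optional[date]) -> bool:
    """True when a cycle's Seq is the female's true Nth cycle: the cycle is in
    Series 1 and the Tdate of her Seq 1 cycle equals MATUREDATES.Matured."""
    return series == 1 and first_tdate is not None and first_tdate == matured
```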
Mdate (NULL allowed)
Onset of menses date. Only one of these date values may be before the
individual's onset of first menses date (Matured), and all should be
on or before the individual's Statdate. This column may be NULL when
this date is not known or cannot be estimated.
Emdate (NULL allowed)
Earliest possible onset of menses date. Only one of these date values
may be before the individual's puberty date (Matured), and all should
be on or before the individual's Statdate. When the menses is
observed, this value should be the same as the Mdate value. This
column may be NULL when this date is not known, either because the
Mdate is not known or because the accuracy of the Mdate is not known.
Lmdate (NULL allowed)
Latest possible onset of menses date. Only one of these date values
may be before the individual's puberty date (Matured), and all should
be on or before the individual's Statdate. When the menses is
observed, this value should be the same as the Mdate value. This
column may be NULL when this date is not known, either because the
Mdate is not known or because the accuracy of the Mdate is not known.
EstM
A boolean value indicating whether the Mdate was the result of
observation or was calculated from the Ddate of the preceding cycle.
TRUE means the value was calculated; FALSE means it was not.
Tdate (NULL allowed)
Onset of turgesence date. This date should be between the
individual's puberty date (Matured) and Statdate, inclusive. This
column may be NULL when this date is not known.
Etdate (NULL allowed)
Earliest possible onset of turgesence date. Only one of these date
values may be before the individual's puberty date (Matured), and all
should be on or before the individual's Statdate. When the onset of
turgesence is observed, this value should be the same as the Tdate
value. This column may be NULL when this date is not known, either
because the Tdate is not known or because the accuracy of the Tdate is
not known.
Ltdate (NULL allowed)
Latest possible onset of turgesence date. This date should be between
the individual's puberty date (Matured) and Statdate, inclusive. When
the onset of turgesence is observed, this value should be the same as
the Tdate value. This column may be NULL when this date is not known,
either because the Tdate is not known or because the accuracy of the
Tdate is not known.
Ddate (NULL allowed)
Date of onset of deturgesence. This date should be between the
individual's puberty date (Matured) and Statdate, inclusive. This
column may be NULL when this date is not known.
Eddate (NULL allowed)
Earliest possible onset of deturgesence date. Only one of these date
values may be before the individual's puberty date (Matured), and
all should be on or before the individual's Statdate. When the onset of
deturgesence is observed, this value should be the same as the Ddate
value. This column may be NULL when this date is not known, either
because the Ddate is not known or because the accuracy of the Ddate is
not known.
Lddate (NULL allowed)
Latest possible onset of deturgesence date. This date should be
between the individual's puberty date (Matured) and Statdate,
inclusive. When the onset of deturgesence is observed, this value
should be the same as the Ddate value. This column may be NULL when
this date is not known, either because the Ddate is not known or
because the accuracy of the Ddate is not known.
GAPS
Records of the initiation and cessation of continuous periods of
observation during which a female's cycles are presumed to have all
been recorded. This table contains one row for each female for each
initiation or cessation of a continuous period of observation.
Rows with a Code value of 'S' or 'P', which mark the beginning of
observational periods or represent isolated single days of
observation, must have a value in the State column. All other rows,
those with a code of 'E' that represent the end of an observational
period, must not have a value in the State column.
This table is used to construct the reproductive state tables,
REPSTATS and CYCSTATS.
The combination of Sname and Date is unique.
Data Entry Rules
We'll figure something out.
Data Element Descriptions
Gapid
A number which uniquely identifies each row.
Sname
The short name of the female. This column should contain the Sname of
a female in BIOGRAPH. This column should not be blank.
Code
What kind of endpoint the date records. Legal values are 'S' (Start),
the date is the start of a period of observation; 'E' (End), the date
is the end of a period of observation; 'P' (point), the date is an
isolated observation that belongs with no other observations, it is
both a start and an end of an observational period.
Date
The date upon which observations began or ended. Observations were
made on the given date.
State (NULL allowed)
The state of the female's sexual cycle on the given date. Valid
values are: 'M', menses, follicular phase 1 -- Mdate (inclusive) to
Tdate (exclusive); 'S', swelling, follicular phase 2 -- Tdate
(inclusive) to 6 days prior to Ddate (inclusive); 'O', ovulating -- 5
days prior to Ddate (inclusive) to Ddate (exclusive); 'D',
deturgesence, luteal -- Ddate (inclusive) to Mdate (exclusive); 'P',
pregnant -- Ddate (inclusive) to birth (exclusive); 'L', lactating --
birth (inclusive) to Mdate (exclusive).
Must not be NULL when Code is 'S' or 'P'; must be NULL when Code is
'E'. See discussion in the table description above.
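The following Python sketch checks the Code/State consistency rule for a single GAPS row. `gap_row_problem` is a hypothetical name. It returns a description of the first problem it finds, or `None`.

```python
from typing import Optional

CODES = {"S", "E", "P"}
STATES = {"M", "S", "O", "D", "P", "L"}


def gap_row_problem(code: str, state: Optional[str]) -> Optional[str]:
    """Return a description of what is wrong with a GAPS row, or None."""
    if code not in CODES:
        return f"invalid Code {code!r}"
    if code in ("S", "P"):
        # Starts and isolated points must say what state the female was in.
        if state is None:
            return "State is required when Code is 'S' or 'P'"
        if state not in STATES:
            return f"invalid State {state!r}"
    elif state is not None:
        # Ends of observation never carry a State.
        return "State must be NULL when Code is 'E'"
    return None
```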
REPSTATS
(REProductive STATus) Contains one row per female per day for every
day during continuous observation periods from date of menarche
through date of death (inclusive). When menarche is unobserved then
REPSTATS rows begin on a beginning of observation date. Likewise, the
cessation or resumption of observation interrupts or resumes the
contiguous series of the female's REPSTATS dates. (See GAPS.) While
the individual is alive the last date is either the BIOGRAPH.Statdate
or the last recorded sexual cycle endpoint, whichever is later. Cycle
endpoints are the M (menses onset) date or the end-of-pregnancy date,
in both cases exclusive. The day-by-day nature of this table makes it
easy to correlate reproductive cycle information with other events.
Note that because of gaps in the observational record some sexual
cycles may not be recorded, or may be partially recorded. In these
cases the Dins and Dr columns are NULL. (See below.)
See CYCSTATS for more fertility detail.
Data Entry Rules
This table is not maintainable by the user. The system constructs
this table automatically from the data values recorded in the CYCLES
table, the BIOGRAPH.Status and BIOGRAPH.Statdate columns, and the
GAPS table.
Data Element Descriptions
Rid
A unique number which serves to identify the row.
Date
The row records a female's reproductive state on this day.
Sname
The Sname identifying the female whose reproductive state is recorded.
(See BIOGRAPH.)
State
General reproductive state of the female on the given Date. The legal
values are: C (cycling), from (including) the T (turgesence onset)
date up to (but not including) the T date of the following cycle, or
in the case of pregnancy, up to (but not including) the D date. P
(Pregnant), from (including) the D (deturgesence onset) date up to
(but not including) the end-of-pregnancy date, date of birth,
abortion, or death. L (lactating), from (including) the
end-of-pregnancy date to (but not including) the next T date. Note
that post-menopausal individuals have a state of C, or possibly L if
the last cycle resulted in a pregnancy.
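The state boundaries above can be sketched as a Python function that classifies one day relative to one cycle. This is a simplification, and `repstats_state` and its parameters are hypothetical. The caller supplies the cycle's T date, the next cycle's T date, and, for a conception cycle, the D date and the end-of-pregnancy date. An unknown later boundary (`None`) is treated as open-ended.

```python
from datetime import date
from typing import Optional


def repstats_state(
    day: date,
    tdate: date,
    next_tdate: Optional[date] = None,
    conception_ddate: Optional[date] = None,
    preg_end: Optional[date] = None,
) -> Optional[str]:
    """Return 'C', 'P', 'L', or None if day lies outside this cycle."""

    def before(limit: Optional[date]) -> bool:
        # An unknown later boundary is treated as open-ended.
        return limit is None or day < limit

    if day < tdate:
        return None
    if conception_ddate is None:
        # Non-conception cycle: C from the T date up to the next T date.
        return "C" if before(next_tdate) else None
    if day < conception_ddate:
        return "C"  # cycling up to, but not including, the D date
    if before(preg_end):
        return "P"  # D date up to the end of pregnancy
    return "L" if before(next_tdate) else None  # end of pregnancy up to next T
```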
Dins (NULL allowed)
(Days INto State) The number of days since the state started. The
first day of the state has a value of 1, the next a value of 2, etc.
This column is NULL when the system cannot determine when the state
began. This occurs when the beginning of the reproductive state
occurs during a period when the individual is not under regular
observation. (See GAPS.)
Dr (NULL allowed)
(Days Remaining) The number of days remaining in the state. The last
day of the state has a value of 0, the next to last day a value of 1,
etc. Note that the sum of Dins and Dr is always the total number of
days the cycle spent in the state.
This column is NULL when the system cannot determine when the state
ends. This occurs when the end of the reproductive state was not
observed due to cessation of regular observation. (See GAPS.) It
also occurs while the individual is alive and the state has not ended,
or, more precisely, when the observations ending the state have not yet
been entered into the system. Finally, it occurs when the individual
dies, as it is
not known when the state would have ended.
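Because Dins counts up from 1 and Dr counts down to 0, a state run of n rows gets Dins values 1 through n and Dr values n-1 through 0, so Dins + Dr = n on every row. The Python sketch below uses `days_in_and_remaining`, a hypothetical helper that takes the number of rows in the run and whether the run's start and end were observed.

```python
from typing import List, Optional, Tuple


def days_in_and_remaining(
    n_days: int, start_known: bool, end_known: bool
) -> List[Tuple[Optional[int], Optional[int]]]:
    """(Dins, Dr) for each row of one state run, first row first.
    Dins is None when the start is unknown; Dr is None when the end is."""
    return [
        (i if start_known else None, n_days - i if end_known else None)
        for i in range(1, n_days + 1)
    ]
```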
Pid (NULL allowed)
(Pregnancy IDentifier) The Pid of the pregnancy associated with the
state. This value must be present when the state is P or L. There is
also a Pid value for those C cycles that result in pregnancy. (See
PREGS table.)
CYCSTATS
(fertility CYCle STATus) Contains one row per female per day, for
those days in REPSTATS where the REPSTATS State is C (cycling.) This
is a day-by-day record of the details of the females' fertile cycles.
The day-by-day nature of this table makes it easy to correlate sexual
cycle information with other events. Where data on a portion of the
female's cycles is missing there will be "gaps" where there are no
rows for a female for a sequence of dates. Missing Mdate, Tdate, or
Ddate values in CYCLES cause gaps. When there is no data on entire
cycles, i.e., some cycles are not recorded in CYCLES rows, there will
not be a gap, but there will be long cycles that span the entire
interval.
Note that post-menopausal individuals' final cycles will have a State
of D and a long duration, with the individual's date of death being
the last day of the cycle.
When the end of the S (swelling, follicular phase 2) state is not
known, that is, Dr (days remaining in state) is NULL, some of the
computed Dins (days into state) values may be skewed, as the end of the
state is counted backward from the beginning of the D date, the next
observed transition marker. See the information on the calculation of
the O (ovulatory) state below.
Data Entry Rules
This table is not maintainable by the user. The system constructs
this table automatically from the data values recorded in the CYCLES
table, the BIOGRAPH.Status and BIOGRAPH.Statdate columns, and the
GAPS table.
Data Element Descriptions
Csid
A unique number which serves to identify the row.
Date
The row records a female's reproductive state on this day.
Sname
The Sname identifying the female whose reproductive state is recorded.
(See BIOGRAPH.)
State
Categorizes the period within the reproductive cycle. Legal values
are: M (menses, follicular phase 1), the M (onset of menses) date to
the day before the T (turgesence onset) date (inclusive of endpoints);
S (swelling, follicular phase 2), the T date through 6 days before the
D (deturgesence onset) date (inclusive of endpoints); O (ovulating),
from 5 days before the D date through the day before the D date
(inclusive of endpoints); D (deturgesing, luteal), from the D date
through the day before the M date (inclusive of endpoints).
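The CYCSTATS boundaries can be sketched as a Python function that classifies one day within one cycle. `cycstats_state` is a hypothetical name, and `next_mdate` is the M date of the following cycle. A missing date yields `None`, matching the gaps described above. The guard that keeps O days from preceding the T date is an assumption, intended for cycles whose T date falls within five days of the D date.

```python
from datetime import date, timedelta
from typing import Optional


def cycstats_state(
    day: date,
    mdate: Optional[date],
    tdate: Optional[date],
    ddate: Optional[date],
    next_mdate: Optional[date],
) -> Optional[str]:
    """Return 'M', 'S', 'O', 'D', or None when the state cannot be known."""
    if mdate is not None and tdate is not None and mdate <= day < tdate:
        return "M"  # M date through the day before the T date
    if ddate is not None:
        if tdate is not None and tdate <= day <= ddate - timedelta(days=6):
            return "S"  # T date through 6 days before the D date
        if ddate - timedelta(days=5) <= day < ddate and (tdate is None or tdate <= day):
            return "O"  # 5 days before the D date through the day before it
        if next_mdate is not None and ddate <= day < next_mdate:
            return "D"  # D date through the day before the next M date
    return None
```

For a cycle with M on 2000-01-01, T on 2000-01-10, D on 2000-01-25 and the next M on 2000-02-08, this gives M for 1-9 January, S for 10-19 January, O for 20-24 January and D from 25 January through 7 February.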
Dins (NULL allowed)
(Days INto State) The number of days since the state started. The
first day of the state has a value of 1, the next a value of 2, etc.
This column is NULL when the system cannot determine when the state
began. This occurs when the cycle is the female's first cycle, as
there is no menses to begin the cycle, and likewise for the first
cycle after pregnancy. The cycle's starting date is also unknown when
it occurs during a period when the individual is not under regular
observation. (See GAPS.)
Dr (NULL allowed)
(Days Remaining) The number of days remaining in the state. The last
day of the state has a value of 0, the next to last day a value of 1,
etc. Note that the sum of Dins and Dr is always the total number of
days the cycle spent in the state.
This column is NULL when the system cannot determine when the state
ends. This occurs when the end of the reproductive state was not
observed due to cessation of regular observation. (See GAPS.) It
also occurs while the individual is alive and the state has not ended,
or, more precisely, when the observations ending the state have not yet
been entered into the system. Finally, it occurs when the individual
dies, as it is
not known when the state would have ended.