[Babase] Cycles et-al documentation for review
Karl O. Pinc
babase@www.eco.princeton.edu
Sat, 05 Feb 2005 01:36:08 +0000
Hi Y'all,
Here's the design we've come up with for recording female
sexual cycle information, or those parts of it that people
(mostly) have to enter. This is what we'll be using to create
the REPSTATS and CYCSTATS tables.
This contains many parts of the original documentation that
are wrong, or at least you're not using the system this way.
Stuff like:
---<snip>---
PREGS.Conceive
...
When the date of conception is estimated because there is no sexual
cycle data, the conception date recorded should be 178 days before the
recorded birthday.
---<snip>---
When in fact there's no recorded conception date
or
---<snip>---
Lmdate
... When the menses is
observed, this value should be the same as the Mdate value.
---<snip>---
When in fact again it's empty.
So, when reviewing this I don't want to worry about that stuff
but get the new stuff correct. I've marked
the new paragraphs added to old documentation with a *
so you can find them. Otherwise, please read all of the
docs on the SEEDS table as it's all new.
Please pay particular attention
to the codes I've selected and things like which day at the
endpoint is included and which is excluded.
You'll also want to pay attention to the names of new
columns/tables to make sure they're understandable
and sane as you'll be using them all the time.
I've also included documentation on the CYCPOINTS table.
We won't be going to this right now so we can get something
working quick. But I'm convinced this is the way to go
for the future and that it will significantly reduce the
amount of code I have to include in the system to validate
data, compute REPSTATS and CYCSTATS, and so forth.
Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein
PREGS (Pregnancies)
This table records pregnancies. It contains one row for each recorded
pregnancy. A pregnancy is defined to be an event occurring to some
mother, even though we may not know who the mother is. A single
pregnancy could result in more than one fetus. The only time there
will not be an associated BIOGRAPH row is when the pregnancy is still
in progress; otherwise there will always be a BIOGRAPH row which
records the progeny of the pregnancy.
The conception sexual cycle dates (Conceive) of the pregnancy should
not be later than the birth date value (Birth) of the associated
BIOGRAPH row. The birth date value (Birth) of the associated BIOGRAPH
row should not be later than the resumption of cycling date values
(Resume.)
The sequence number (Seq on CYCLES) of the sexual cycle immediately
following pregnancy (Resume) should always be exactly one more than
the sequence number of the sexual cycle associated with conception
(Conceive). The female associated with the conception sexual cycle
(Conceive) should be the same as the female associated with the sexual
cycle immediately following pregnancy (Resume). There should be no
overlap of pregnancy time periods, from conception date to birth date
or, if known, resumption of sexual cycling date, among the pregnancies
associated with a particular female.
Data Entry Rules
This table is updated with the regular FoxPro data maintenance tools.
See section 2.0 in the Protocol for Data Management: Amboseli Baboon
Project document.
Data Element Descriptions
Pid
The contents of this column uniquely identify the pregnancy record.
When the mother is known, the Pid is the mother's Sname followed by the
parity. When the mother is not known, the Pid is the Sname of the
individual that the pregnancy produced. Because the Pid is only used
to identify the record, it is not necessary to change the Pid just
because the parity of the pregnancy is found to have changed. In
general, once a unique Pid is established, it should not be changed.
When retrieving data from this table, do not assume anything about the
contents of this column except that it will uniquely identify a
pregnancy. Use the female associated with the ovulation (through
Conceive) to obtain the mother, and the Parity column to obtain
meaningful information on mother and parity.
This column should not be blank.
Parity
The ordinal number of the pregnancy: 1 for a female's first pregnancy, 2
for a female's second pregnancy, and so forth. There should be no
'gaps' in the pregnancies, sequenced by Parity, of any female. When
the first pregnancy is known, the Parity sequence begins with 1. When
the first pregnancy is not known, the Parity sequence begins with 101.
This column should not be blank.
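The Parity rules above can be checked mechanically. What follows is
only a sketch, not part of the system. The function name
parity_sequence_ok is made up for illustration, and it assumes the
Parity values of a single female have already been collected into a
list.

```python
from typing import List


def parity_sequence_ok(parities: List[int]) -> bool:
    """Check one female's Parity values: start at 1 or 101, no gaps."""
    ordered = sorted(parities)
    if not ordered:
        return True  # no pregnancies recorded, nothing to check
    # The sequence begins with 1 (first pregnancy known) or 101 (not known).
    if ordered[0] not in (1, 101):
        return False
    # Consecutive values must differ by exactly 1 (no gaps, no duplicates).
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))
```

For example, [1, 2, 3] and [101, 102] pass, while [1, 3] (a gap) and
[2, 3] (wrong starting value) do not.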
Conceive
The information recorded on the sexual cycle of the conception which
initiated the pregnancy. This is the Cid of a CYCLES row of the
mother. The associated CYCLES row should contain a Ddate value to
record the date of conception. The dates of the associated CYCLES
record, when dates are present, should be between the sexual
maturity date and the death date of the mother. When this column is
not blank, it should contain a unique datum. This column should not be
blank.
When the date of conception is estimated because there is no sexual
cycle data, the conception date recorded should be 178 days before the
recorded birthday.
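As a worked example of the 178 day rule, here is a small sketch in
Python. The constant and function names are assumptions made for
illustration; nothing by these names exists in the system.

```python
from datetime import date, timedelta

# Offset stated in the Conceive description above.
GESTATION_DAYS = 178


def estimated_conception(birth: date) -> date:
    """Estimate the conception date as 178 days before the birth date."""
    return birth - timedelta(days=GESTATION_DAYS)
```

With a recorded birthday of 2001-07-01 this gives an estimated
conception date of 2001-01-04.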
Resume
The resumption of cycle information of the first cycle following the
pregnancy. This is the Cid of a row in CYCLES. The associated CYCLES
record will usually not have a resumption of menses date. This column
may be blank for those cases when resumption of cycle information is
not known. When this column is not blank, it should contain a unique
datum.
CYCLES (Female sexual cycles)
This table records information on the sexual cycle of the females. It
contains one row for every record of a female's sexual cycle. This
includes one row for those cases in which we know that the female had
a cycle, but we don't know anything else about the cycle. This case
occurs when we know that a female bore a specific child, but little
else about when this occurred. These children will have a pregnancy
record, which will be associated with the sexual cycle of conception,
and the CYCLES row will contain an estimated conception date. (See
the PREGS Conceive documentation above.) The table also includes rows
that record estimated conception dates. Each of these rows is
associated with a pregnancy record and can be classified as estimated
by examining the earliest possible deturgesence date (Eddate) and
latest possible deturgesence date (Ddate) values.
Because every female that has reached sexual maturity should have a
maturation date (Matured) in BIOGRAPH, with a corresponding row for
the first sexual cycle, the sexual cycle with a sequence (Seq) of 1
should be the female's first sexual cycle and should not, in general,
have an onset of menses date (Mdate).
The combination of Sname and each of the three types of dates should
be unique. The combination of Sname and Seq should be unique. Each
row should contain data on either an onset of menses, onset of
turgesence, or onset of deturgesence.
* Estimated onset of menses dates (Mdates) are calculated when there is
not an observed onset of menses date, and the estimated onset of
menses date falls within a period of continuous observation, and there
is a prior deturgesence date (Ddate) in the immediately preceding
sexual cycle. (See SEEDS.) The calculated Mdate falls on the 13th
day after the preceding deturgesence date, unless the turgesence date
(Tdate) falls within this time period in which case the calculated
Mdate is set to the day of the turgesence date. Estimated onset of
menses dates never have associated early and late dates (Emdate and
Lmdate.)
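The date arithmetic in the paragraph above can be sketched as follows.
This is an illustration only: the function name estimate_mdate is made
up, and the sketch assumes the caller has already established that
there is no observed Mdate, that the period is one of continuous
observation, and that any Tdate passed in is later than the preceding
Ddate.

```python
from datetime import date, timedelta
from typing import Optional

# "The 13th day after the preceding deturgesence date."
MENSES_OFFSET_DAYS = 13


def estimate_mdate(prev_ddate: date, tdate: Optional[date] = None) -> date:
    """Estimate an onset of menses date from the preceding cycle's Ddate."""
    candidate = prev_ddate + timedelta(days=MENSES_OFFSET_DAYS)
    # If turgesence began before the 13th day, use the Tdate instead.
    if tdate is not None and tdate < candidate:
        return tdate
    return candidate
```

So a preceding Ddate of 2005-01-01 yields an estimated Mdate of
2005-01-14, unless the Tdate is earlier, say 2005-01-10, in which case
the estimated Mdate is 2005-01-10.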
A cycle's onset of menses date (Mdate) should not be after the onset
of turgesence date (Tdate). A cycle's onset of turgesence date
(Tdate) should be before the onset of deturgesence date (Ddate).
The onset of menses date (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) of any record should not be equal to or
between any pair of the onset of menses dates (Mdate), onset of
turgesence dates (Tdate), or onset of deturgesence dates (Ddate) of
any other recorded cycle for an individual.
The earliest possible onset of menses (Emdate), onset of turgesence
(Etdate), and onset of deturgesence (Eddate) columns should not be
after the onset of menses (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) columns, respectively. The latest
possible onset of menses (Lmdate), onset of turgesence (Ltdate), and
onset of deturgesence (Lddate) columns should not be before the onset
of menses (Mdate), onset of turgesence (Tdate), and onset of
deturgesence (Ddate) columns, respectively.
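The within-row date rules above (the ordering of Mdate, Tdate, and
Ddate, and the early/late bounds) might be checked along these lines.
This is a sketch under assumptions: a CYCLES row is represented as a
dictionary keyed by column name, with None for a blank column, and the
function name check_cycle_dates is invented for the example. The
rules that compare one cycle against another are not covered.

```python
from datetime import date
from typing import Dict, List, Optional


def check_cycle_dates(row: Dict[str, Optional[date]]) -> List[str]:
    """Return a list of rule violations for one CYCLES row (empty if none)."""
    problems = []
    mdate, tdate, ddate = row.get("Mdate"), row.get("Tdate"), row.get("Ddate")

    # Mdate should not be after Tdate; Tdate should be before Ddate.
    if mdate is not None and tdate is not None and mdate > tdate:
        problems.append("Mdate is after Tdate")
    if tdate is not None and ddate is not None and tdate >= ddate:
        problems.append("Tdate is not before Ddate")

    # Early dates should not be after, and late dates not before, the
    # corresponding main date.
    for letter in "mtd":
        main_name = letter.upper() + "date"
        main = row.get(main_name)
        early = row.get("E" + letter + "date")
        late = row.get("L" + letter + "date")
        if main is None:
            continue
        if early is not None and early > main:
            problems.append(f"E{letter}date is after {main_name}")
        if late is not None and late < main:
            problems.append(f"L{letter}date is before {main_name}")
    return problems
```

Blank columns are simply skipped, since most of these columns may be
blank when the date is not known.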
Data Entry Rules
This table is updated by supplying text files to the Cycle Update
Program, in the usual fashion.
Data Element Descriptions
Cid
A numeric identifier unique to each menses/turgesence/deturgesence
cycle. This is used to reference the cycle elsewhere in the database.
This column should not be blank.
Sname
The short name of the female. This column should contain the Sname of
a female in BIOGRAPH. This column should not be blank.
* Series
* A female's rows that have the same series number are part of a
continuous series of observations that, presumably, have produced a
complete record of all the female's sexual transition events which
occurred throughout the observation period. When a female first comes
under observation her CYCLES rows have a Series value of 1. If the
female can no longer be observed the series ends. When observation of
the female resumes the Series number is incremented and the female's
subsequent CYCLES rows are given a Series value of 2. Additional
breaks in observation result in further increments. The initiation
and cessation of observation should be recorded in the SEEDS table
and there should be a row there to record every Series endpoint,
menarche and disappearance or death excepted.
* The value in this column is not user maintainable; the system
automatically generates the value.
Seq
* Sequence. The first sexual cycle of a female has a Seq value of 1,
the second a value of 2, etc. There are no gaps in the sequence
numbers assigned to a female. Even when records of cycles are
missing, the first recorded cycle after the missing period has a
sequence one greater than the last recorded cycle before the missing
period. The sequence number recorded corresponds to the Nth sequence
of the female only when the Series number is 1 and the turgesence date
(Tdate) of the cycle with a Seq value of 1 is the same as the female's
maturity date (MATUREDATES.Matured.)
* The value in this column is not user maintainable; the system
automatically generates the value.
This column should not be blank.
Mdate
Onset of menses date. Only one of these date values may be before the
individual's onset of first menses date (Matured), and all should be
on or before the individual's Statdate. This column may be blank when
this date is not known.
Emdate
Earliest possible onset of menses date. Only one of these date values
may be before the individual's puberty date (Matured), and all should
be on or before the individual's Statdate. When the menses is
observed, this value should be the same as the Mdate value. This
column may be blank when this date is not known, either because the
Mdate is not known or because the accuracy of the Mdate is not known.
Lmdate
Latest possible onset of menses date. Only one of these date values
may be before the individual's puberty date (Matured), and all should
be on or before the individual's Statdate. When the menses is
observed, this value should be the same as the Mdate value. This
column may be blank when this date is not known, either because the
Mdate is not known or because the accuracy of the Mdate is not known.
* EstM
* A boolean value indicating whether the Mdate was the result of
observation or was calculated from the Ddate value. 'Y' means the
value was calculated, 'N' means it was not.
Tdate
Onset of turgesence date. This date should be between the
individual's puberty date (Matured) and Statdate, inclusive. This
column may be blank when this date is not known.
Etdate
Earliest possible onset of turgesence date. Only one of these date
values may be before the individual's puberty date (Matured), and all
should be on or before the individual's Statdate. When the onset of
turgesence is observed, this value should be the same as the Tdate
value. This column may be blank when this date is not known, either
because the Tdate is not known or because the accuracy of the Tdate is
not known.
Ltdate
Latest possible onset of turgesence date. This date should be between
the individual's puberty date (Matured) and Statdate, inclusive. When
the onset of turgesence is observed, this value should be the same as
the Tdate value. This column may be blank when this date is not known,
either because the Tdate is not known or because the accuracy of the
Tdate is not known.
Ddate
Date of onset of deturgesence. This date should be between the
individual's puberty date (Matured) and Statdate, inclusive. This
column may be blank when this date is not known.
Eddate
Earliest possible onset of deturgesence date. Only one of these date
values may be before the individual's puberty date (Matured), and
all should be on or before the individual's Statdate. When the onset of
deturgesence is observed, this value should be the same as the Ddate
value. This column may be blank when this date is not known, either
because the Ddate is not known or because the accuracy of the Ddate is
not known.
Lddate
Latest possible onset of deturgesence date. This date should be
between the individual's puberty date (Matured) and Statdate,
inclusive. When the onset of deturgesence is observed, this value
should be the same as the Ddate value. This column may be blank when
this date is not known, either because the Ddate is not known or
because the accuracy of the Ddate is not known.
SEEDS
Records of the initiation and cessation of continuous periods of
observation during which a female's cycles are presumed to have all
been recorded. This table contains one row for each female for each
initiation or cessation of a continuous period of observation.
Rows with a Code value of 'S' or 'P', which mark the beginning of
observational periods or represent isolated single days of
observation, must have a value in the State column. All other rows,
those with a Code of 'E' that represent the end of an observational
period, may not have a value in the State column.
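The Code/State rule above is simple enough to state as a few lines of
Python. This is a sketch only; the function name seeds_row_ok is made
up, and a blank State is assumed to arrive as either None or an empty
string.

```python
from typing import Optional


def seeds_row_ok(code: str, state: Optional[str]) -> bool:
    """Check the rule relating Code and State on one SEEDS row."""
    has_state = state is not None and state != ""
    if code in ("S", "P"):
        return has_state      # starts and isolated points need a State
    if code == "E":
        return not has_state  # ends may not have a State
    return False              # not a legal Code value
```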
This table is used to construct the reproductive state tables,
REPSTATS and CYCSTATS.
Data Entry Rules
We'll figure something out.
Data Element Descriptions
Sid
A number which uniquely identifies each row.
Sname
The short name of the female. This column should contain the Sname of
a female in BIOGRAPH. This column should not be blank.
Code
What kind of endpoint the date records. Legal values are 'S' (Start),
the date is the start of a period of observation; 'E' (End), the date
is the end of a period of observation; 'P' (Point), the date is an
isolated observation that belongs with no other observations and so
is both a start and an end of an observational period.
Date
The date upon which observations began or ended. Observations were
made on the given date.
State
The state of the female's sexual cycle on the given date. Valid
values are: 'M', follicular phase 1 -- mdate (inclusive) to tdate
(exclusive); 'F', follicular phase 2 -- tdate (inclusive) to 6 days
prior to ddate (inclusive); 'O', ovulating -- 5 days prior to ddate
(inclusive) to ddate (exclusive); 'L', luteal -- ddate (inclusive) to
mdate (exclusive); 'P', pregnant -- ddate (inclusive) to birth
(exclusive); 'A', lactating -- birth (inclusive) to mdate (exclusive).
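To make the inclusive and exclusive endpoints easier to review, here
is a sketch that assigns one of the four cycling states to a day. It
is an illustration under assumptions, not the system's code: the
function name cycling_state and the parameter next_mdate (the Mdate
that ends the luteal phase) are invented, all four dates are assumed
known, and the 'P' and 'A' states are left out because they also
depend on the birth date.

```python
from datetime import date, timedelta
from typing import Optional


def cycling_state(day: date, mdate: date, tdate: date, ddate: date,
                  next_mdate: date) -> Optional[str]:
    """Return 'M', 'F', 'O' or 'L' for day, or None if outside the cycle."""
    if mdate <= day < tdate:
        return "M"  # follicular phase 1: mdate up to, not including, tdate
    if tdate <= day < ddate:
        # Up to and including 6 days before ddate is 'F'; the 5 days
        # immediately before ddate are 'O'.
        if day <= ddate - timedelta(days=6):
            return "F"
        return "O"
    if ddate <= day < next_mdate:
        return "L"  # luteal: ddate up to, not including, the next mdate
    return None
```

For a cycle with mdate 2005-01-01, tdate 2005-01-10, ddate 2005-01-25,
and a following mdate of 2005-02-07, the 19th is the last 'F' day, the
20th through the 24th are 'O', and the 25th is the first 'L' day.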
CYCPOINTS
This table records information on the sexual cycle of the females.
The usual events that mark the transitions of a female baboon's sexual
cycles are onset of menses, onset of turgesence, and beginning of
deturgesence (turgesence peak.) CYCPOINTS contains one row for every
recorded transition of a female's sexual cycle. In addition to the
usual recorded observations of transition states there are additional
rows that record estimations of when unobserved transitions occurred,
notably onset of menses dates (Mdates) but also unobserved onset of
deturgesence dates for pregnancies.
The transition events recorded in CYCPOINTS are collected into sexual
cycles, each cycle having (at most) an onset of menses date (Mdate), an
onset of turgesence date (Tdate), and an onset of deturgesence date
(Ddate). Each cycle is assigned a sequence (Seq) beginning with 1 and
the different transition event dates are distinguished by Code values
of M, T, and D respectively. The combination of Sname, Code, and Seq
must be unique. Some sexual cycles may be missing one or more of the
transition codes, should there be no record of an observation. In
this case the respective row is omitted from the table.
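The uniqueness rule on Sname, Code, and Seq could be checked as
sketched below. The representation of rows as dictionaries keyed by
column name, the function name duplicate_keys, and the Sname used in
the example are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def duplicate_keys(rows: List[Dict]) -> List[Tuple[str, str, int]]:
    """Return the (Sname, Code, Seq) combinations that occur more than once."""
    counts = Counter((r["Sname"], r["Code"], r["Seq"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

Given two rows for a hypothetical female 'ABC' that both have a Code
of 'T' and a Seq of 1, the function reports ('ABC', 'T', 1).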
The sexual cycles themselves are aggregated into periods of continuous
observation, termed series, indicated by the assignment of a Series
number to each row. The first period of continuous observation for an
individual has a Series of 1, the second a series of 2, etc.
Aggregating a female's CYCPOINTS rows into a series indicates that the
collection of data points is believed to be complete, no unobserved or
unrecorded sexual cycle transitions occurring during the time spanned
by the series. This allows the series to be used as the basis of an
analysis of sexual cycle transition intervals. All of a female's
CYCPOINTS rows belonging to the same sexual cycle, i.e. having the
same Seq value, must also belong to the same series, i.e. have the
same Series value.
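The rule that a sexual cycle may not span series can be sketched in
the same way. Again this is only an illustration: rows are assumed to
be dictionaries keyed by column name, and the function name
cycles_spanning_series is made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cycles_spanning_series(rows: List[Dict]) -> List[Tuple[str, int]]:
    """Return the (Sname, Seq) cycles whose rows have differing Series."""
    series_seen = defaultdict(set)
    for r in rows:
        series_seen[(r["Sname"], r["Seq"])].add(r["Series"])
    # A cycle is in error when its rows carry more than one Series value.
    return sorted(key for key, series in series_seen.items() if len(series) > 1)
```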
CYCPOINTS includes rows for those cases in which we know that a female
had a cycle, because a pregnancy/birth resulted, but we don't know
anything else about the cycle. The child will have a pregnancy
record, and CYCPOINTS will contain an associated row to record the
estimated D date of the pregnancy. (See the PREGS Conceive
documentation above.) CYCPOINTS also includes rows that record other
estimated conception dates.
Because every female that has reached sexual maturity should have a
maturation date (Matured) in BIOGRAPH, with a corresponding row for
the first sexual cycle, the sexual cycle with a sequence (Seq) of 1
should be the female's first sexual cycle and should not, in general,
have an onset of menses date (Mdate).
A cycle's onset of menses date (Mdate) should not be after the onset
of turgesence date (Tdate). A cycle's onset of turgesence date
(Tdate) should be before the onset of deturgesence date (Ddate).
The onset of menses date (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) of any cycle should not be equal to or
between any pair of the onset of menses dates (Mdate), onset of
turgesence dates (Tdate), or onset of deturgesence dates (Ddate) of
any other recorded cycle for an individual.
The earliest possible onset of menses (Emdate), onset of turgesence
(Etdate), and onset of deturgesence (Eddate) columns should not be
after the onset of menses (Mdate), onset of turgesence (Tdate), and
onset of deturgesence (Ddate) columns, respectively. The latest
possible onset of menses (Lmdate), onset of turgesence (Ltdate), and
onset of deturgesence (Lddate) columns should not be before the onset
of menses (Mdate), onset of turgesence (Tdate), and onset of
deturgesence (Ddate) columns, respectively.
For each kind of date (early, regular, and late), only one of an
individual's M date values may be before the individual's onset of
first menses date (MATUREDATES.Matured), and all should be on or
before the individual's Statdate. For each kind of date (early,
regular, and late), all of an individual's T and D date values
must be after the individual's onset of first menses date
(MATUREDATES.Matured), and all should be on or before the individual's
Statdate.
Each series must either consist of a single observation (Period = P)
or have starting and ending dates that are marked with a Period of S
and E respectively.
Data Entry Rules
We'll figure something out.
Data Element Descriptions
Cpid
A numeric identifier unique to each row. This is used to reference
the sexual cycle transition elsewhere in the database. This column
should not be blank.
Sname
The short name of the female. This column should contain the Sname of
a female in BIOGRAPH. This column should not be blank.
Cid
A numeric identifier identifying each sexual cycle. It is unique
across all cycles of all females. A cycle is defined to begin with
onset of menses, encompass the turgesence and deturgesence
transitions, and end the day before the next onset of menses. Cycles
that 'start' late or 'stop' early, that contain faux transitions,
include the faux transitions in the cycle. Note that some cycles may
only contain a single CYCPOINTS row, that is, the Cid value may be
unique to a single CYCPOINTS row.
Seq
Sequence. The first sexual cycle of a female has a Seq value of 1,
the second a value of 2, etc. This column does not need to be manually
maintained. There are no gaps in the sequence numbers assigned to a
female. Even when records of cycles are missing, the first recorded
cycle after the missing period has a sequence one greater than the
last recorded cycle before the missing period. This column should not
be blank.
Date
The date of the transition event. This column may not be blank.
Edate
Earliest possible date of the transition event. This column may
be blank when there is no need to record a range of date values.
Ldate
Latest possible date of the transition event. This column may
be blank when there is no need to record a range of date values.
Source
Code indicating from whence the data was derived. D (Data) for
observed data. A (Auto) for automatically calculated dates, such as M
dates computed by adding 13 to the previous D date. E (Estimated) for
estimated values not to be used in other computations, such as
estimated D dates entered to relate mothers and pregnancies.
Code
The type of sexual cycle transition. M (onset of Menses), T (onset of
Turgesence), or D (onset of Deturgesence).
Series
Series. Number indicating with which series of continuous observation
the transition event belongs. Events that are isolated observations
have a series of their own. As with Seq, the series are per-female.
Each female begins with a series of 1 and the series is incremented
with each interruption in regular observation.
EstM
A boolean value indicating whether the M date was the result of
observation or was calculated from the Ddate value. 'Y' means the
value was calculated, 'N' means it was not.