[Babase] Interpolation documentation for review

Karl O. Pinc babase@www.eco.princeton.edu
Fri, 29 Jul 2005 09:03:34 +0000


At long last there's documentation on interpolation,
as well as the related tables.  I only wish I had this
while writing the code.

Please review for accuracy, comprehensibility, style,
and anything else.  I'm sure there are tortured sentences
in there screaming to be helped.  I particularly want
feedback on interpolation, before I leave it behind.
The documentation has got to be comprehensible to
people or else they won't know how to use the system
so please comment if there are problems.

So that people have something to notate, I've appended
the "thin" format of the text version.  This should
allow people to comment directly in an email reply.
Or maybe we want a conference call after everybody's
digested it.  (The adventurous can notate
the xml source directly I suppose.)  The xml (and everything
else) is at http://papio.biology.duke.edu.  It's probably
best to go to the
web version and do the reading there as it'll be
both pretty and hyperlinked.  You could also glance
at the PDF version on the web site.  The PDF has a
strange formatting issue with the interpolation diagrams
and with some of the tables, so you don't need to tell
me about that.  The problems with the diagrams in the
PDF mean that it's probably better to read something
else.

The sections I want review of are, in order:

BIOGRAPH (Baboon Biographical Data)
      Column Descriptions
https://papio.biology.duke.edu/babase_system_html/ar01s05.html

MEMBERS (Group Membership)
      Column Descriptions
https://papio.biology.duke.edu/babase_system_html/ar01s14.html

CENSUS
      Column Descriptions
https://papio.biology.duke.edu/babase_system_html/ar01s15.html

DEMOG (Demography Notes)
      Column Descriptions
https://papio.biology.duke.edu/babase_system_html/ar01s16.html

Interpolation
     Interpolation's 3 Fundamentals
     Interpolation Visualized
     The Interpolation Rules
     Expectations and Implications
https://papio.biology.duke.edu/babase_system_html/ar01s24.html

A. Changes to Babase between 1.0 and 2.0
     Changes to .Statdate
     Changes To Interpolation and MEMBERS
     Changes To The Sexual Cycle Information
https://papio.biology.duke.edu/babase_system_html/apa.html

Note that in the appended text all the footnotes are at the
bottom.

Thanks.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein

----------------------------------------------------------

                              Babase:

   Technical Specifications for the Amboseli Baboon Project Data
   Management System

   Karl O. Pinc

    The Meme Factory, Inc.

   Jeanne Altmann, PhD.

    Princeton University

   Susan C. Alberts, PhD.

    Duke University

    ER Diagram layout and conversion to Dia: Leah Gerber

    Docbook formatting: Anne Ndeti Hubbard, Karl O. Pinc

    Copyright (c) 2005 Karl O. Pinc, Jeanne Altmann, Susan
    Alberts, Leah Gerber, The Meme Factory, Inc.

      Permission is granted to copy, distribute and/or modify
      this document under the terms of the GNU Free
      Documentation License, Version 1.2 or any later version
      published by the Free Software Foundation; with no
      Invariant Sections, no Front-Cover Texts, and no
      Back-Cover Texts. A copy of the license is included in
      the section entitled "GNU Free Documentation License."

    March 2, 2005

    +---------------------------------------------------------+
    | Revision History                                        |
    |---------------------------------------------------------|
    | Revision 0.0               | March 2, 2004              |
    |---------------------------------------------------------|
    | Initial document                                        |
    +---------------------------------------------------------+

      -------------------------------------------------------

    Table of Contents

    Introduction

                 This Document

                 System Designs

                 To Start BABASE

    Data organization

                 Databases

                 Users, Groups and Database Permissions

                 Schemas

                 Organization of the Babase Program Code

                 Data Relationships

    The Master Tables

    GROUPS (Groups)

                 Data Entry Rules

                 Data Element Descriptions

    BIOGRAPH (Baboon Biographical Data)

                 Column Descriptions

    MATUREDATES (Sexual Maturity Dates)

                 Matured

                 Mstatus (Sexual Maturity Status)

    RANKDATES (Adult Rank Attainment Dates)

                 Ranked

    CONSORTDATES (First Consortship Dates)

                 Consorted

    DISPERSEDATES (Dispersal Dates)

                 Dispersed

    PREGS (Pregnancies)

                 Data Entry Rules

                 Data Element Descriptions

    CYCGAPS

                 Data Entry Rules

                 Data Element Descriptions

    CYCPOINTS

                 Data Entry Rules

                 Data Element Descriptions

    SEXSKINS (Sexskin Turgescence Measurements)

                 Data Entry Rules

                 Data Element Descriptions

    MEMBERS (Group Membership)

                 Column Descriptions

    CENSUS

                 Column Descriptions

    DEMOG (Demography Notes)

                 Column Descriptions

    RANKS (Rankings Within Groups)

                 Data Entry Rules

                 Data Element Descriptions

    INTERACT (Interactions)

                 Data Entry Rules

                 Data Element Descriptions

    PARTS (Participants in interactions)

                 Data Entry Rules

    Data Element Descriptions

                 Sname

                 Role

                 Iid

    REPSTATS

                 Data Entry Rules

                 Data Element Descriptions

    CYCSTATS

                 Data Entry Rules

                 Data Element Descriptions

    THE SUPPORT TABLES

                 BSTATUSES

                 STATUSES

                 DCAUSES

                 MSTATUSES

                 WSTATIONS

                 ACTS

                 RNKTYPES

    Interpolation

                 Interpolation's 3 Fundamentals

                 Interpolation Visualized

                 The Interpolation Rules

                 Expectations and Implications

    Data Entry

                 Automatically Generated IDs

    The Dataset Tables

                 Datasets Containing INTERACT and PARTS Data

                 Datasets Containing CENSUS Data

                 Datasets Containing DEMOG Data

                 Datasets Containing CYCLES Data

    BABASE PROGRAMS

                 Data Maintenance Programs

                 Useful Programs and Functions

    A. Changes to Babase between 1.0 and 2.0

                 Changes to .Statdate

                 Changes To Interpolation and MEMBERS

                 Changes To The Sexual Cycle Information

    B. Docbook, Styling and other issues

---------------------------<snip>------------------------

BIOGRAPH (Baboon Biographical Data)

    This table records the basic biographical data on the
    baboons. It contains one row for each baboon, including
    aborted fetuses and fetal deaths (collectively, fetal
    losses), on which any data have been collected. All
    individuals with an Sname, i.e., those which aren't fetal
    losses, should have a Name and should have rows on MEMBERS.
    Those rows that record data on fetal losses should maintain
    the following relations between their data values: the
    Sname and Name values should be NULL; the Statdate should
    be the same as the birth date (Birth); the Status should be
    1 (definitely dead); and the Dcause should be 7 (unknown)
    or 5 (loss of mother). Jeanne needs to confirm that this is
    still the case since her changes to DCAUSES. Because the
    fetal losses have no Sname, there will not be any record of
    their group membership in MEMBERS. The Statdate value
    should not be less than the Birth value. Live animals
    should not have a recorded cause of death. Live animals
    that have no associated CENSUS rows (absences excepted)
    must have a Statdate equal to their Birth date.
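
    The consistency rules in the paragraph above lend
    themselves to a mechanical check. What follows is a minimal
    sketch and not Babase's own code: the BiographRow class and
    the biograph_problems function are hypothetical names, NULL
    is modeled as Python's None, and only the rules that can be
    tested from a single BIOGRAPH row are covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

DEFINITELY_DEAD = 1            # Status code named in the text
FETAL_LOSS_DCAUSES = (5, 7)    # loss of mother, unknown


@dataclass
class BiographRow:
    """Hypothetical in-memory stand-in for one BIOGRAPH row."""
    sname: Optional[str]   # None models NULL
    name: Optional[str]
    birth: date
    statdate: date
    status: int
    dcause: int


def biograph_problems(row: BiographRow) -> List[str]:
    """Return a description of each rule the row breaks."""
    problems = []
    if row.statdate < row.birth:
        problems.append("Statdate is before Birth")
    if row.sname is None:
        # No Sname: the row records a fetal loss.
        if row.name is not None:
            problems.append("fetal loss has a Name")
        if row.statdate != row.birth:
            problems.append("fetal loss Statdate differs from Birth")
        if row.status != DEFINITELY_DEAD:
            problems.append("fetal loss Status is not 1")
        if row.dcause not in FETAL_LOSS_DCAUSES:
            problems.append("fetal loss Dcause is not 5 or 7")
    elif row.name is None:
        problems.append("individual with an Sname has no Name")
    return problems
```

    An empty list means the row passed every check made here.
    The rules that need other tables (rows on MEMBERS, the
    presence of CENSUS rows) are left out of this sketch.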

   Column Descriptions

     Sname

    The short name of the individual. This is a name
    abbreviation, exactly three characters long, which is used
    to identify the individual and so should be a unique data
    value. This value appears in many other places in the
    system and so should not be changed without changing all
    the other places in the database where the abbreviation
    appears; really, once established, the only reason to
    change this column is because the short name had already
    been used.^[5] The Sname is always composed of capital
    letters (and may not contain a space). This column should
    only be NULL if the row represents a fetal loss.

     Name

    The name of the individual. This is a textual column used
    for descriptive purposes. This value should be unique when
    compared in a case-insensitive fashion. This column should
    only be NULL if the row records a fetal loss.

     Pid

    The Pid value, from the PREGS (Pregnancies) table, of the
    individual's mother's pregnancy that ended in the
    birth^[6] of the individual. This column may be NULL when
    there is no record of the individual's mother.

     Birth

    The date the pregnancy ends. If the pregnancy results in a
    birth, this date is the birth date of the offspring,
    otherwise, this is the date of the fetal loss. (A pregnancy
    that ends with the mother's death is considered as a
    spontaneous abortion for this purpose.)

    This column may not be NULL.

     Bstatus

    Birthday status. This column records the quality of the
    birth date estimate. The legal values for this column are
    defined by the BSTATUSES support table.

   Tip

    At the time of this writing the legal values are:

    The BSTATUSES Table

    Code                      Description
    0    Known exactly (to within several weeks, usually to
         within a few days)
    1    Estimate good to within 1 year
    2    Estimate good to within 2 years
    3    Estimate good to within 3 years
    4    Estimate good to within 4 years
    9    Unknown, i.e. these dates are guesses and should not
         be used

    I don't think it's a particularly good idea to show support
    table values in this document. The procedure manual is the
    place for that. The whole point of support tables is that
    you can put anything you like in them.

    This column may not be NULL.

     Sex

    The sex of the individual. The legal values are:

    Valid Sex Values

    Code           Description
    M    the individual is male
    F    the individual is female
    U    the individual is of unknown sex

    This column may not be NULL.

     Matgrp

    The maternal group of the individual, the Gid of the
    sub-group into which the individual was born.

    This column must contain a Gid value of a row on the GROUPS
    table. This column may not be NULL.

   Tip

    If the maternal group is not known, the maternal group
    should be recorded as the unknown group.

     Statdate

    The status date of the individual. When the individual is
    alive, this is the latest date on which the animal was
    censused and found in a group^[7]; absences don't count.
    When there are no such censuses, and the individual is
    alive, then the Statdate is the birth date. This column is
    automatically updated when CENSUS is updated to ensure
    that these relationships remain true. When the individual
    is not alive the Statdate is the date of death,
    disappearance, etc.

   Caution

    Living individuals, unlike dead ones, can have MEMBERS rows
    created by the interpolation procedure that locate the
    individual in a group on a date later than the individual's
    Statdate. For further information see: Interpolation At The
    Statdate.

    Statdate (almost, given the preceding caveat) provides a
    convenient way of determining the end of the time interval
    during which there is data on an individual, a way that is
    independent of whether the individual is alive or dead.

    This column may not be NULL.

     Status

    The state of the individual's life at the Statdate. The
    legal values for this column are defined by the STATUSES
    support table.

   Tip

    At the time of this writing the legal values are:

    The STATUSES Table

    Code   Description
    0    alive
    1    known death
    2    suspected death

    This column may not be NULL.

     Dcause

    The cause of death or circumstances associated with death.
    The legal values for this column are defined by the DCAUSES
    support table.

   Tip

    At the time of this writing the legal values are:

    The DCAUSES Table

    Code           Description
    1    predation
    2    conspecific
    3    other wounds or injuries
    4    Pathology or congenital problem
    5    loss of mother
    6    human action
    7    unknown
    8    under review

   Tip

    A value of 5 should only be present for individuals whose
    mother has died or disappeared at the same time as, or
    shortly before, said individual.

    This column may not be NULL.

---------------------------<snip>------------------------

MEMBERS (Group Membership)

    The group membership table. This table records which group
    each animal is in on which date, excepting fetal losses
    (individuals with no Sname). There is a row in MEMBERS for
    every individual for every day between Birth and Statdate,
    inclusive, including periods during which the whereabouts
    of an individual are unknown or assumed unknown. (See: the
    unknown group.) Some living individuals have MEMBERS rows
    after their Statdate; for more information see the section
    Interpolation At The Statdate. MEMBERS is most useful when
    one is interested in an individual's location on a
    particular date. Simply check MEMBERS for the individual on
    that date. To find all the individuals in a group on a
    date, look at all the rows in the table on that date for
    the group.
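
    The two lookups just described can be sketched as queries.
    This is an illustration under stated assumptions, not the
    Babase schema: Python's built-in sqlite3 module stands in
    for the real database, the table is cut down to four of the
    MEMBERS columns, the helper names group_on and members_on
    are hypothetical, and the sample Snames and groups are
    invented.

```python
import sqlite3

# In-memory stand-in for MEMBERS; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE members (
           sname  TEXT NOT NULL,
           date   TEXT NOT NULL,   -- ISO dates, e.g. 2005-07-29
           grp    INTEGER NOT NULL,
           origin TEXT NOT NULL,
           UNIQUE (sname, date)    -- one group per animal per day
       )"""
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        ("AAA", "2005-07-28", 1, "C"),
        ("AAA", "2005-07-29", 1, "I"),
        ("BBB", "2005-07-29", 1, "C"),
        ("CCC", "2005-07-29", 2, "C"),
    ],
)


def group_on(sname, day):
    """Group an individual is in on a date, or None if no row exists."""
    row = conn.execute(
        "SELECT grp FROM members WHERE sname = ? AND date = ?",
        (sname, day),
    ).fetchone()
    return row[0] if row else None


def members_on(grp, day):
    """Snames of every individual placed in a group on a date."""
    rows = conn.execute(
        "SELECT sname FROM members WHERE grp = ? AND date = ? ORDER BY sname",
        (grp, day),
    )
    return [r[0] for r in rows]
```

    With the invented rows above, group_on("AAA", "2005-07-29")
    gives 1 and members_on(1, "2005-07-29") gives the Snames
    AAA and BBB.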

    MEMBERS is a single population-wide table created and
    updated automatically using information from CENSUS,
    BIOGRAPH, and DEMOG. The method used to do this is called
    interpolation and is described fully in a section below.
    Briefly, interpolation guesses which group an individual is
    likely to be in when there is no observational data. The
    MEMBERS rows which are the result of guessing have an I as
    their Origin value.

   Note

    Babase requires that an animal be located in exactly one
    group on any particular day; the combination of Sname and
    Date should be unique. The intent of this table is to
    record the location of each animal at the start of each
    day. See other documents for further information on how the
    actual practice of data acquisition and entry impacts this
    goal.

   Column Descriptions

     Sname

    The individual whose location is being recorded. The three
    letter code that identifies the individual's row in the
    BIOGRAPH table. There will always be a row in BIOGRAPH for
    the individual identified here.

    This column may not be NULL.

     Date

    The date.

    This column may not be NULL.

     Grp

    The group where the individual is located. This is a Gid
    value from GROUPS. This field should contain the most
    specific sub-grouping available -- subject to the
    constraints of the data entry protocol, of course.
    Aggregation into larger groupings is accomplished by
    retrieving the associated Supergroup from GROUPS.

    This column may not be NULL.

   Note

    Usage exception: For the years 1989-1991, inclusive, the
    groups recorded for the sub-groups of Alto's group do not
    necessarily reflect the actual groupings of the animals on
    a particular day, but are instead indications of the
    group-splitting process. See Jeanne Altmann and the Data
    Management Manual for a further explanation.

     Origin

    A one letter code indicating the source of the location
    information. This information is derived from, and has the
    same values as, the Status column of CENSUS, although
    MEMBERS.Origin contains the I (interpolated) value not
    found in CENSUS. The codes are as follows: C (CENSUS)
    values represent census data points; I (interpolated)
    values are derived from the census data points; D
    (demography) values represent demography notes not present
    in the census sheets; M and N (manual) values represent
    census data points due to operator intervention in CENSUS.
    The S, E, F, B, G, T, L, and R codes are derived from
    analysis of historical data. See the CENSUS section for
    further information.

    This column may not be NULL.

     Interp

    The distance, in days, from the date on which an individual
    was previously observed to be in a group (censused --
    automatic placement in the unknown group does not count) to
    the date of the MEMBERS row. So the value is 0 on those
    days on which the individuals are censused, 1 on those
    (non-census) days immediately before or after the census
    days, etc. For those MEMBERS rows that the interpolation
    procedure has placed in the unknown group for lack of a
    better place to put them, the Interp column is the number
    of days "distant" from the interpolating CENSUS row, or the
    birth date, that determined the group membership. Note that
    the CENSUS row that determined that the MEMBERS.Grp should
    be unknown may record an absence.

   Important

    The Interp value is not meaningful over intervals that
    contain census rows which are themselves the result of an
    analysis. Over these intervals Interp is NULL. For more
    information see Interpolation, Data is not Re-Analyzed.

    This column may be NULL.
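
    For the simple case of a MEMBERS row that places the
    individual in a group, the Interp value described above is
    the number of days to the nearest locating census. A
    minimal sketch of that arithmetic follows; the function
    name interp_days is hypothetical, and the sketch does not
    cover rows placed in the unknown group or intervals where
    Interp is NULL.

```python
from datetime import date


def interp_days(row_date, locating_census_dates):
    """Days between a MEMBERS date and the nearest locating census."""
    return min(abs((row_date - d).days) for d in locating_census_dates)


# Invented census dates: the individual was found on the 1st, 3rd and 11th.
censuses = [date(2005, 7, 1), date(2005, 7, 3), date(2005, 7, 11)]
on_census_day = interp_days(date(2005, 7, 3), censuses)    # 0
two_days_after = interp_days(date(2005, 7, 5), censuses)   # 2
day_before = interp_days(date(2005, 7, 10), censuses)      # 1
```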

CENSUS

    The population census table. Aside from the BIOGRAPH Matgrp
    column, this table is the origin of all information
    regarding group membership. This table holds all the field
    census data and any information regarding group membership
    that is recorded in the field demography notes. It contains
    one row per animal per group per day censused. There is an
    additional row per individual per demography note for those
    days when there is a demography note regarding the
    individual and group but no census of the group. (See
    DEMOG.)

   Tip

    The way to record that an individual is alone is to create
    a row in GROUPS (Groups) meaning alone, and then to assign
    individuals who are alone to this group. The "alone-ness"
    of an individual can then be tracked in the same fashion as
    group membership, although the Babase user does then need
    to be aware that the members of the "alone" group are not
    actually proximate to one another.

    As noted in the MEMBERS documentation, Babase does not
    allow an individual to be in more than one group on a given
    day.

    The original field census data sheets can be recovered from
    CENSUS, with one exception. Data is lost when an individual
    is actually censused in two groups on the same day because
    of movement between groups and the timing of the censuses.
    In this situation a decision should be made as to which
    group CENSUS should record as the individual's location on
    that day. A demography note should then be added to DEMOG,
    with text that notes the individual's presence in the
    second group. Technically this does put all of the
    information from the field censuses into the database, but
    because the information regarding the second census is
    stored as text it is not readily available to automated
    tools.

   Caution

    Be careful when changing this data; remember that rank will
    almost certainly change should group membership change.

   Column Descriptions

     Cenid

    A unique identifier. This is an automatically generated
    sequential number. Cenid links CENSUS to DEMOG.

    This column may not be NULL.

     Date

    The date of the census, or the date of the demography note
    (when Status is D).

    This column may not be NULL.

     Sname

    The individual whose location is being recorded. The three
    letter code that identifies an individual in BIOGRAPH.
    There will always be a row in BIOGRAPH for the individual
    identified here.

    This column may not be NULL.

     Grp

    The group where the individual is located. This is a Gid
    value from GROUPS. This column should contain the most
    specific sub-grouping available -- subject to the
    constraints of the data entry protocol, of course.
    Aggregation into larger groupings is accomplished by
    retrieving the associated Supergroup from GROUPS.

    This column may not be NULL.

   Note

    Usage exception: For the years 1989-1991, inclusive, the
    groups recorded for the sub-groups of Alto's group do not
    necessarily reflect the actual groupings of the animals on
    a particular day, but are instead indications of the
    group-splitting process. See the Protocol for Data
    Management: Amboseli Baboon Project document for a further
    explanation.

     Status

    A one letter code indicating the source of the location
    information. Status is the source of MEMBERS.Origin data.
    The current codes are as follows: C (census), A (absent), D
    (demography), and M or N (manual). Other values derived
    from analysis of historical data include: S, E, F, B, G, T,
    L, and R.

    The CENSUS.Status Codes

    C

            (census) The animal was found in the group on a
            field census sheet: from the census datasheets.
            (There may or may not be a corresponding demography
            note on DEMOG as well.)

    A

            (absent) The animal was not found in the group on a
            field census sheet. Note that while an individual
            should not be recorded "present" in more than one
            group on the same day, s/he may be absent from
            several groups on any given day.

    D

            (demography) The animal was noted in the field
            notebooks or elsewhere to be in a group but was not
            marked present in a field census on that day. There
            is a DEMOG row associated with the CENSUS row. The
            individual may or may not have been marked "absent"
            on the same group's field census for the day.^[9]

    M

            (manual, interpolated) This code provides a way to
            manually supplement what is in the CENSUS table
            when there is no other way to get the data in.
            Babase considers this code to be the same as the C
            code.

    N

            (manual, not interpolated) This code provides an
            alternate way to manually supplement what is in the
            CENSUS table when there is no other way to get the
            data in. This code does not interpolate; it is
            presumed to be the result of some analysis.

    S

            (Susan's data) The data comes from the old DISPERSE
            database where the record had both a Datein and a
            Dateout.

    E

            (ending date) The data comes from the old DISPERSE
            database where the record had a Datein but not a
            Dateout.

    F

            (final date) The data comes from the old DISPERSE
            database where there is a Dateout and the last
            recorded location is before the Statdate.

    B

            (birth date) The data comes from the old DISPERSE
            database where the record had a Dateout but not a
            Datein.

    T

            (total) The data comes from the old DISPERSE
            database where the record had neither a Datein nor
            a Dateout.

    G

            (gap) The data is a record of the animal in the
            unknown group when the animal appeared in the old
            DISPERSE database but where there was a gap between
            times of recorded location.

    L

            (lineage) The group is from the Matgrp on the old
            CYCTOT database, either because the animal did not
            appear in the DISPERSE database, or because the
            first location for the animal in the old DISPERSE
            database had a Datein and this Datein was after the
            birth date of the animal.

    R

            (result of Alto's breakup) The data is S, E, F, B,
            G, T, or L data whose locations were changed from
            1.0 to the group in which the animal was censused
            on 15/4/92. This change left all R rows as part of
            a contiguous series of days during which the
            animals are located in the Alto's sub-group as
            censused on 15/4/92, and the time-adjacent
            locations were not 1.0.

    A C Status is marked on the census data sheet as an "X".
    An A or D Status is marked on the census data sheet as a
    "0".

    This column may not be NULL.

     Cen

    Whether or not the CENSUS row represents an entry on a
    census data sheet. TRUE means the CENSUS row exists because
    of an entry on a census data sheet, FALSE means there was
    no census done and the CENSUS row exists to support a
    demography note, manual notation of absence, etc. Cen
    should only be TRUE when Status is C, A, or D.

    This column may not be NULL.
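
    Two of the CENSUS rules stated in this section can be
    checked mechanically: Cen should only be TRUE when Status
    is C, A, or D, and an individual should not be located in
    more than one group on a day, although absences from
    several groups are fine. The sketch below is illustrative
    only. CensusRow and census_problems are hypothetical names,
    and treating every Status other than A as a locating row is
    an assumption of this sketch.

```python
from collections import Counter
from typing import List, NamedTuple


class CensusRow(NamedTuple):
    """Hypothetical stand-in for one CENSUS row (Cenid omitted)."""
    sname: str
    date: str
    grp: int
    status: str
    cen: bool


def census_problems(rows: List[CensusRow]) -> List[str]:
    """Describe each violation of the two rules checked here."""
    problems = []
    for r in rows:
        if r.cen and r.status not in ("C", "A", "D"):
            problems.append(
                f"{r.sname} {r.date}: Cen is TRUE but Status is {r.status}"
            )
    # Count the rows that place an individual in a group on each day.
    located = Counter((r.sname, r.date) for r in rows if r.status != "A")
    for (sname, day), n in sorted(located.items()):
        if n > 1:
            problems.append(f"{sname} {day}: located in {n} groups")
    return problems
```

    A present row plus any number of absences for the same
    individual and day produces no complaint; two locating rows
    on the same day do.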

DEMOG (Demography Notes)

    This table holds group membership related text from the
    field demography notes, especially that which records group
    membership information not otherwise written on the regular
    field census sheets. DEMOG provides a means of notating
    CENSUS rows, and thus facilitates management of additional
    "free form" CENSUS rows, rows that do not directly
    correspond with the field census sheets.^[10] It contains
    one row for every individual for every date for every group
    where the individual was noted present in the field
    demography notes. The DEMOG row holds the textual
    information. There is always exactly one corresponding
    CENSUS row which holds the corresponding group membership
    information in the usual coded and structured form. (Note
    that only some CENSUS rows will have DEMOG rows; CENSUS
    rows that originate entirely in the regular censuses of
    groups will not, in general, have an associated DEMOG row).
    A single field note referring to more than one individual
    must appear in DEMOG as two (or more) separate rows, one
    row per individual. Multiple field notes pertaining to a
    single individual on a single date must be combined into
    one piece of text and entered in a single DEMOG row. (See
    Protocol notes for structure of the demography data as
    entered by the operator.)

   Column Descriptions

     Cenid

    A unique identifier. This is an automatically generated
    sequential number. Cenid links CENSUS to DEMOG.

    This column may not be NULL.

     Reference

    A GROUPS Gid value that links the DEMOG row with the
    written field notebook where the note can be found.

    This column may not be NULL.

     Comment

    The demography note text pertaining to the CENSUS row with
    the given Cenid.

    This column may not be NULL.

---------------------------<snip>------------------------

Interpolation

    The Babase database uses a procedure called interpolation
    to update MEMBERS whenever the CENSUS table, or the
    BIOGRAPH.Birth, or BIOGRAPH.Statdate columns are updated.
    Interpolation extrapolates the group membership of
    individuals into days for which there is no actual
    observation of the individuals' whereabouts. It "guesses"
    with which group an individual is associated, given
    knowledge of the individual's group membership (or lack
    thereof) at given points in time, and records the result in
    MEMBERS. Thus, MEMBERS always has a row recording group
    membership for every day of every individual's life.

   Interpolation's 3 Fundamentals

    It is primarily by census records that Babase tracks group
    membership. The CENSUS table is the source of all group
    membership information. Babase places rows in the CENSUS
    table to indicate presence in a group whenever demography
    information is stored in other tables.^[12][13] Throughout
    this section it is to be understood that any sort of
    demographic information which results in CENSUS data is
    implied when the term census, or its plural, is used.
    Unfortunately, the term census is further overloaded. It is
    occasionally used in the colloquial sense, meaning present
    -- found when a group census was taken, the alternative
    being absent. It is hoped the meaning will be clear from
    context.

    It is important to remember that censuses record absence
    from a group as well as presence in a group, and that
    there are two mutually exclusive classes of CENSUS rows:
    absences, records of absence from specific groups on
    specific days; and "locating censuses", records that place
    the individual in specific groups on specific days.

    The premise of interpolation is that an individual is
    assumed to be in the group where observed for a period of
    14 days to either side of the observation unless there's
    indication otherwise. To this end, interpolation keeps an
    individual in the group where a census locates him for a
    time period that is the shorter of:

     1. Half of the time interval to the individual's next (or
        from the individual's prior) census which finds the
        individual in any group.

     2. Half of the time interval to the next (or from the
        prior) recorded absence from the group in which the
        individual was censused. Absences from other groups
        are ignored.

     3. The 14 day Interpolation Limit. Given no other
        information, an individual is considered to remain (or
        have been) in the group where observed for 14 days
        following (or preceding) the date of observation.

    Should the above process not place an individual in a
    group, the individual is placed in the unknown group, so
    long as the individual is alive on the day in question.

    There are some subtleties to these rules, and there is
    further elaboration necessary to allow for "old style"
    CENSUS rows, which do not directly correspond with actual
    census taking, and other factors. But these rules are the
    foundation and we begin with them.
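
    Setting the subtleties aside, the three foundational rules
    can be sketched in a few lines. This is a reading of the
    rules as stated above, not the Babase implementation. The
    names Census and interpolate are hypothetical, days are
    plain integers, group 9 stands for the unknown group as in
    the figures of this section, and first_day and last_day
    stand for the span over which the individual is alive. How
    a day that falls exactly on a midpoint is treated is an
    assumption: here it stays with the census, and the earlier
    census wins a tie between two censuses.

```python
from typing import Dict, Iterable, NamedTuple, Tuple

INTERPOLATION_LIMIT = 14   # days, from rule 3
UNKNOWN_GROUP = 9          # the unknown group, as numbered in the figures


class Census(NamedTuple):
    day: int
    grp: int
    status: str   # "A" is an absence; anything else locates the individual


def _cut_short(c, day, others):
    """True if a row in `others`, on the same side of census `c` as `day`,
    is close enough that `day` lies past the midpoint between it and `c`."""
    dist = abs(day - c.day)
    side = 1 if day > c.day else -1
    return any(
        (o.day - c.day) * side > 0 and 2 * dist > abs(o.day - c.day)
        for o in others
    )


def interpolate(rows: Iterable[Census], first_day: int,
                last_day: int) -> Dict[int, Tuple[int, str]]:
    """Map each day to (group, origin) using the three rules."""
    rows = list(rows)
    locating = sorted((r for r in rows if r.status != "A"),
                      key=lambda r: r.day)
    members = {}
    for day in range(first_day, last_day + 1):
        members[day] = (UNKNOWN_GROUP, "I")      # default placement
        for c in locating:
            if abs(day - c.day) > INTERPOLATION_LIMIT:    # rule 3
                continue
            if _cut_short(c, day, locating):              # rule 1
                continue
            absences = [r for r in rows
                        if r.status == "A" and r.grp == c.grp]
            if _cut_short(c, day, absences):              # rule 2
                continue
            members[day] = (c.grp, c.status if day == c.day else "I")
            break
    return members


# The census pattern used in the figures of this section:
# present on days 1, 3 and 11, absent on day 8, all in group 1.
figure_rows = [Census(1, 1, "C"), Census(3, 1, "C"),
               Census(8, 1, "A"), Census(11, 1, "C")]
result = interpolate(figure_rows, 1, 11)
print([result[d][0] for d in range(1, 12)])
# prints [1, 1, 1, 1, 1, 9, 9, 9, 9, 1, 1]
```

    The printed groups match the MEMBERS row of Figure 7. The
    "old style" CENSUS rows and the other elaborations
    mentioned above are deliberately left out.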

   Interpolation Visualized

    Interpolation is best described with the help of diagrams
    as it is all about computing and comparing time intervals
    of various lengths, which are easily represented in a
    diagram by lines of various lengths. We begin with the
    simplest case, an individual censused present and absent in
    a single group.

   Tip

    As the examples throughout this section are developed, be
    sure to pay close attention to the diagrams' keys. At times
    the meaning of a symbol changes from diagram to diagram to
    reflect a subtlety.

     Interpolating presences and absences

    Figure 5 shows a record of one individual's censuses. The
    group, for the moment we'll assume group 1, is censused
    several times over a period of days. One day the individual
    is absent.

    Figure 5. An Individual is Censused Present and Absent

                    One individual's census records
     CENSUS:        C       C                   A           C
       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)


    The first step in interpolation is to construct the various
    intervals from the given CENSUS rows. Figure 6 shows how
    interpolation "splits the difference" between presences and
    absences to construct two intervals for each locating
    census, one preceding the census and one following it. As
    the diagrams given here can only show a window in time and
    omit what falls outside that window, only one interval each
    is shown for the censuses taken on day 1 and day 11.

    Figure 6. Interpolating From Presences and Absences

                    Interpolation intervals within a group
     CENSUS:        C       C                   A           C
  Intervals:        X---|---X---------|         O     |-----X
       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    Interpolation creates MEMBERS rows that place the
    individual in a group each day. Figure 7 shows how group
    membership assignment is based upon the computed intervals.
    Because of the absence, there are days when the individual
    is placed in group 9, the unknown group.

    Figure 7. Interpolating Group Membership

                    Intervals determine group membership
     CENSUS:        C       C                   A           C
  Intervals:        X---|---X---------|         O     |-----X
    MEMBERS.
      Group:        1   1   1   1   1   9   9   9   9   1   1
     Origin:        C   I   C   I   I   I   I   I   I   I   C

       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    Figure 7 also introduces the MEMBERS' Origin column. As can
    be seen, the Origin column mimics the corresponding CENSUS
    Status column on those days when interpolation is not
    guessing group membership. Origin is I on those days when
    interpolation is guessing.

    The MEMBERS' Interp column represents the distance, in
    days, from the nearest locating census. Interp is zero on
    those days when a census has located the individual. The
    recorded absence is reflected in the group, but is
    immaterial to Interp. Even though there's an absence, the
    Interp count is over the interval between the two locating
    censuses. Interp gets its value from a "split the
    difference" between censuses which record presence in the
    group, a different sort of "split the difference" than is
    used to determine into which group an individual should be
    placed. Figure 8 extends Figure 7, showing the computation
    of Interp. With this addition the interpolation has
    finished; the MEMBERS table can be constructed from the
    given CENSUS rows.

    Figure 8.  Computing Interp Values

                    The resulting MEMBERS rows
      CENSUS:        C       C                   A           C
   Intervals
   For Group:        X---|---X---------|         O     |-----X
  For Interp:        X~~~|~~~X~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~X
     MEMBERS.
       Group:        1   1   1   1   1   9   9   9   9   1   1
      Interp:        0   1   0   1   2   3   4   3   2   1   0
      Origin:        C   I   C   I   I   I   I   I   I   I   C

        Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  ~ Inside of interval
  | Midpoint of interval
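
    The Interp line of Figure 8 can be reproduced with a few
    lines of Python. This is only an illustrative sketch; the
    function name interp_values is hypothetical and days are
    plain integers.

```python
def interp_values(locating_days, first_day, last_day):
    """Interp for each day: distance to the nearest locating census."""
    return [min(abs(day - c) for c in locating_days)
            for day in range(first_day, last_day + 1)]


# Figure 8: present on days 1, 3 and 11; the absence on day 8 is ignored.
print(interp_values([1, 3, 11], 1, 11))
# [0, 1, 0, 1, 2, 3, 4, 3, 2, 1, 0]
```

    Note that the absence on day 8 is never passed in. Only
    locating censuses contribute to Interp.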


     Applying the 14 day interpolation limit

    So far we have only explored the first 2 of the 3
    fundamental interpolation intervals, those dealing with
    being censused present and absent. Before we elaborate
    further and examine the more complicated interactions
    between presences and absences let us dispense with the 14
    day interpolation limit.

    Figure 9 shows the effect of the 14 day interpolation
    limit. For reasons of space some days are removed from the
    interval. There are no censuses, present or absent, on the
    days omitted. As the "Date:" line shows, a total of 32 days
    are examined, an entire month 30 days in length and the
    first two days of the following month. Again, we assume the
    censuses are taken in group 1.

    Figure 9.  The 14 Day Interpolation Limit

                   The shorter intervals are chosen
        CENSUS:    C                                           C
  C C Interval:    X----- ... -----------|------- ... ---------X
  14 Day Limit:    X----- ... -------|       |--- ... ---------X
       MEMBERS.
         Group:    1   1  ...  1   1   9   9   1  ...  1   1   1
        Interp:    0   1  ... 13  14  15  15  14  ...  2   1   0
        Origin:    C   I  ...  I   I   I   I   I  ...  I   I   C

          Date:    1   2  ... 14  15  16  17  18  ... 30   1   2

  Key:
  C Censused present in group (group 1)
  X Known present in group (group 1)
  - Inside of interval
  | Interval endpoint


    As the 16th and 17th are more than 14 days away from either
    census the individual is placed in the unknown group on
    those days. Days that are closer to the actual censuses are
    interpolated into group 1. So, as the rules require, the
    individual is interpolated into the censused group for the
    shorter of the two time periods. As before, all the
    interpolated MEMBERS rows, those which do not correspond to
    an actual census, have an Origin of I. And as before the
    Interp column counts up from and down to the actual
    censuses.
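
    The effect of the limit is easy to check by hand or with a
    short sketch. The Python below is illustrative only; the
    name within_limit is hypothetical, and the two censuses
    are assumed to fall 31 days apart, numbered here as days 1
    and 32 of a running count.

```python
LIMIT = 14  # the 14 day Interpolation Limit


def within_limit(day, census_days, limit=LIMIT):
    """True when some locating census lies within `limit` days of `day`."""
    return any(abs(day - c) <= limit for c in census_days)


censuses = [1, 32]  # two locating censuses, 31 days apart
unknown = [d for d in range(1, 33) if not within_limit(d, censuses)]
print(unknown)  # [16, 17]
```

    Only the two middle days are more than 14 days from both
    censuses, so only those two days fall to the unknown
    group.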

     Interpolation and Birth Dates

    There are some exceptions to the rules as stated so far.
    Not surprisingly, interpolation will not presume to put an
    individual in a group, that is, create a MEMBERS row,
    before the individual's Birth date.

    The birth date is an exception in another fashion: it
    locates the individual in his Matgrp like a special sort
    of census. The rationale for this is that although the
    birth may not be observed, the individual most certainly
    enters the group when born. Further, this rule ensures
    that we have a row in MEMBERS for every day the individual
    is alive. When there is a regular census on the birth
    date, finding the individual in his Matgrp (or so one
    would hope), the interpolated MEMBERS row is like that for
    any other census. But when there is no locating census on
    the birth date the resulting MEMBERS row has an Origin of
    I and an Interp of 0. This is shown in Figure 10.

    Figure 10. Interpolation at Birth

                    Individual born into group 1
     CENSUS:                B           C   C           C
  Intervals:                X-----|-----X-|-X-----|-----X
    MEMBERS.
      Group:                1   1   1   1   1   1   1   1
     Interp:                0   1   1   0   0   1   1   0
     Origin:                I   I   I   C   C   I   I   C

       Date:        1   2   3   4   5   6   7   8   9  10

  Key:
  B Born (into group 1)
  C Censused present in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    Clearly, there are no MEMBERS rows before the birth date,
    the individual is in his Matgrp on the day of his birth,
    and the Interp value counts up from the birth date and then
    down to the next census as though there were a census on
    the birth date.

    An individual is placed in his Matgrp on his birth date
    even when a regular census has an absence recorded for the
    individual on the date of birth.^[14]
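
    The treatment of the birth date can be sketched in Python
    as follows. The sketch is illustrative, not Babase's code:
    rows_from_birth is a hypothetical name, every census is
    assumed to have a Status of C, and the group (always the
    Matgrp in Figure 10) is left out.

```python
def rows_from_birth(birth, census_days, last_day):
    """(day, interp, origin) tuples for one individual, starting at birth."""
    located = set(census_days) | {birth}     # birth acts like a census
    rows = []
    for day in range(birth, last_day + 1):   # no rows before the birth date
        interp = min(abs(day - c) for c in located)
        origin = "C" if day in census_days else "I"
        rows.append((day, interp, origin))
    return rows


# Figure 10: born on day 3, censused on days 6, 7 and 10.
rows = rows_from_birth(birth=3, census_days={6, 7, 10}, last_day=10)
print(rows[0])  # (3, 0, 'I')
```

    The first row carries an Interp of 0 together with an
    Origin of I, the combination that marks an uncensused
    birth date.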

     Interpolation At The Statdate

    Another exception to the rules, or rather two exceptions,
    occur at the Statdate. You might expect that interpolation
    would not place a row after the individual's Statdate, and
    this is indeed true, but true only when the individual is
    dead. When an individual is alive, interpolation will place
    a row after the individual's Statdate, but only when there
    is a subsequent absence from the same group as the group in
    which the individual was censused.^[15][16] While at first
    this may seem odd, the reasoning behind this behavior is
    clear -- the Statdate is not the last date on which there
    is data for the individual. This is elaborated below.

    All the same, at times there is a reason to have
    interpolation halt at the Statdate. When individuals are
    alive the system should not try to interpolate into time
    periods for which data has yet to be entered, elsewise
    there would always be spurious interpolated MEMBERS rows
    which vanish as soon as additional data is entered. The
    trouble with creating such rows is that, although the
    interpolation is corrected and the rows disappear once data
    entry resumes, the use of these rows in analysis is always
    inappropriate. Such rows will exist at the end of every
    period of data entry, as there will always be a large
    number of living individuals found in their groups on the
    last census entered. The solution is to not create the
    rows.^[17] When a living individual has no later absences
    from the group where last located (that is, no absences
    from the group of his last locating census that post-date
    that census), this is taken to mean that there is
    additional, as yet unentered, data on the individual. In
    this case interpolation stops on the day the individual was
    last found in a group. This situation is shown in Figure 11,
    where the last census taken found the individual in group 1
    on day 5, and so this day is the individual's Statdate as
    well. There is no interpolation past the last census.

    Figure 11.  Alive and Present When Last Censused

                    Living individual with Statdate of 5
     CENSUS:        C           A   C
  Intervals:        X-----|     O |-X
    MEMBERS.
      Group:        1   1   9   9   1
     Interp:        0   1   2   1   0
     Origin:        C   I   I   I   C

       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    In Figure 12 more data has been entered; the individual has
    been missing since the last census shown in Figure 11
    above. As there have been no further censuses during which
    the individual was found, the individual's Statdate is
    still day 5, although there is now subsequent
    interpolation. Notice that there are no MEMBERS rows
    created after day 7. When interpolating a living
    individual, after the Statdate there is no default
    placement of the individual into the unknown group.^[18]

    Figure 12.  Alive and Absent in Last Census^[19]

                    Living individual with Statdate of 5
     CENSUS:        C           A   C                   A   A
  Intervals:        X-----|     O |-X---------|         O
    MEMBERS.
      Group:        1   1   9   9   1   1   1
     Interp:        0   1   2   1   0   1   2
     Origin:        C   I   I   I   C   I   I

       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    Although the only change between Figure 11 and Figure 12 is
    the entry into CENSUS of rows recording absence, that is
    enough to signal that interpolation can go forward without
    creating spurious MEMBERS rows -- rows likely erased upon
    the entry of more data. It is important that interpolation
    does go forward in this case, past the Statdate, as
    otherwise bias would be introduced. The last C CENSUS would
    be interpolated differently from all the other censuses. To
    be sure, there is bias introduced in Figure 11 when
    interpolation is cut short. But censoring bias at the end
    of data collection is unavoidable, whereas we can avoid
    introducing bias here.

   Warning

    So long as an individual is alive the last CENSUS to locate
    the individual ought to be followed by a record of absence,
    an absence from the group where the individual was last
    found. To do otherwise, as must occur when there is simply
    no further data to be entered, is to introduce a bias into
    MEMBERS.

    In Figure 13 there is no additional census information, but
    the individual's Status has been adjusted to mark the
    individual dead. A new Statdate value indicates the
    individual died on day 9 and interpolation now runs up to
    and including the day of death. As is usual, when an
    individual's group membership cannot be determined he is
    placed in the unknown group.

    Figure 13.  Interpolation to Statdate When Dead

                    Dead individual with Statdate of 9
     CENSUS:        C           A   C                   A   A
  Intervals:        X-----|     O |-X---------|         O
    MEMBERS.
      Group:        1   1   9   9   1   1   1   9   9
     Interp:        0   1   2   1   0   1   2   3   4
     Origin:        C   I   I   I   C   I   I   I   I

       Date:        1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    Although Figure 13 does not show this, the 14 day
    interpolation limit applies when the individual is dead.
    When there are no absences after the last census and there
    are more than 14 days between the last census and the
    Statdate the individual is placed in the unknown group from
    the 15th day through the day of death.
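
    The three Statdate cases of Figures 11 through 13 can be
    summarized in a short Python sketch. It is a
    simplification under stated assumptions, not Babase's
    code: tail_rows and its parameters are hypothetical names,
    only the rows from the last locating census onward are
    produced, and a midpoint day (see The Midpoint Rule below)
    is simply left out of the interval.

```python
UNKNOWN = 9  # the unknown group
LIMIT = 14   # the 14 day Interpolation Limit


def tail_rows(alive, statdate, last_census, group, next_absence=None):
    """Groups assigned from the last locating census onward, as {day: group}."""
    if next_absence is None:
        reach = LIMIT
    else:
        # days strictly before the halfway point to the absence
        reach = min((next_absence - last_census - 1) // 2, LIMIT)
    if alive:
        # No later absence: more data is presumed to be coming, so stop at
        # the last census.  Otherwise stop where the interval ends; there is
        # no default placement in the unknown group.
        end = last_census if next_absence is None else last_census + reach
    else:
        end = statdate  # dead: rows run through the day of death
    return {day: (group if day - last_census <= reach else UNKNOWN)
            for day in range(last_census, end + 1)}


print(tail_rows(True, 5, 5, 1))                     # Figure 11
print(tail_rows(True, 5, 5, 1, next_absence=10))    # Figure 12
print(tail_rows(False, 9, 5, 1, next_absence=10))   # Figure 13
```

    The first call yields a row for day 5 only, the second
    rows for days 5 through 7, and the third rows for days 5
    through 9 with days 8 and 9 in the unknown group.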

     The Midpoint Rule

    The alert reader may have noticed that the above examples
    are carefully crafted so that the midpoint between
    presences and absences always falls between two days. What
    happens when there is an odd number of days in the interval
    so that the midpoint is a day exactly in between the
    endpoints, as occurs 3 times in Figure 14?

    Figure 14.  Midpoint Days

                    Intervals with an odd number of days
       CENSUS:      C       A               C   C       A   C
    Intervals:      X---|   O       |-------X-|-X---|   O |-X
         Date:      1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Midpoint between census takings


    The MEMBERS table has a 1 day precision; there is no way to
    be in a group in the morning and out of it in the
    afternoon, so on any one midpoint day the individual must
    either be in the group or out of it. Should the individual
    be in the group on midpoint day or out of it? The question
    is resolved using a property of the date itself. Briefly,
    the Julian dating system is a method of assigning every day
    a unique number. As a midpoint day is no more likely to be
    on one day than another, we can avoid bias by using whether
    the midpoint day falls on an even or an odd Julian date to
    resolve the problem.

    Whenever interpolation is called upon to halve an interval
    between two CENSUS rows that contains an odd number of days,
    the "midpoint day" is assigned to the left, earlier, half
    of the interval when the Julian date of the midpoint day is
    even. A midpoint day is assigned to the right, later, half
    of the interval when the Julian date of the midpoint day is
    odd.

    So, The Midpoint Rule resolves the issue by adjusting the
    intervals as shown in Figure 15. The intervals are no
    longer perfectly halved. On the midpoint day there is no
    preference either for or against interpolating the
    individual into the group censused.

    Figure 15.  The Midpoint Rule Adjusts Intervals

                    Intervals with an odd number of days
       CENSUS:      C       A               C   C       A   C
    Intervals:      X-----| O     |---------X-|-X-|     O |-X
      MEMBERS.
        Group:      1   1   9   9   1   1   1   1   9   9   1
       Interp:      0   1   2   3   2   1   0   0   1   1   0
       Origin:      C   I   I   I   I   I   C   C   I   I   C

  Julian Date:      1   2   3   4   5   6   7   8   9  10  11

  Key:
  C Censused present in group (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  | Interval endpoint


     Interpolating When The Group Changes

    Having dispensed with the various elaborations and
    exceptions that occur in unusual cases it is time to return
    to the fundamentals of interpolation and examine what
    happens when an individual moves between groups. What comes
    into play are the first 2 of the 3 interpolation intervals.
    Recall:

      Interpolation keeps an individual in the group where a
      census locates him for a time period that is the shorter
      of:

       1. Half of the time interval between the individual's
          next (or prior) census which finds the individual in
          any group.

       2. Half of the time interval between the next (or prior)
          recorded absence from the group in which the
          individual was censused. Absences from other groups
          are ignored.

    Figure 16 shows a record of one individual's censuses. He,
    a male, is censused in 2 groups, group 1 and group 2. The
    census records for each group reflect both presence in the
    group and absence from the group.

    Figure 16. An Individual is Censused in 2 Groups

                    One individual's census records
     Group 1:       C       C                   A   C   A
     Group 2:       A                   C               C

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent


    Figure 17 shows what would happen if interpolation worked
    with each group separately. There are conflicts, days when
    the individual is in both groups. Something else must be
    done.

   Caution

    Figure 17 is an example of an interpolation method that
    does not work. The method shown in the figure is not one
    Babase uses when interpolating.

    Figure 17. Interpolating Each Group Separately

                    One individual's census records
     Group 1:       C       C                   A   C   A
     Group 2:       A                   C               C

     Group 1        Interpolating just group 1
      CENSUS:       C       C                   A   C   A
   Intervals:       X---|---X---------|         O |-X-| O

     Group 2        Interpolating just group 2
      CENSUS:       A                   C               C
   Intervals:       O         |---------X-------|-------X

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent
  X Known present
  O Known absent
  - Presumed present
  | Interval endpoint


    The solution is to return to the interpolation
    fundamentals. We begin by taking a closer look at the way
    we have been diagramming intervals. In Figure 17 the first
    group has 3 locating censuses and 2 absences, and yet
    we've diagrammed the resultant intervals on a single line.
    The interpolation fundamentals tell us to obtain 2 pairs
    of intervals for each locating census: a "halfway to
    census" pair of intervals and a "halfway to absence" pair
    of intervals. Figure 18 takes the CENSUS rows of the first
    group shown in Figures 16 and 17 and does this for each
    locating census. In Figure 18 the CENSUS rows of days 1, 3
    and 9 each have their own sections detailing the intervals
    to the nearest censuses and intervals to the nearest
    absences. The lines labeled Presence show the intervals
    that are halfway from each locating census to the next.
    The lines labeled Absence show the intervals that are
    halfway from each census to the nearest absence. This
    detailed breakdown is followed by a composite interval
    diagram of the familiar type encountered in Figures 6
    through 17 above. It should be clear that we have arrived
    at the "composite" form of the interval diagram by
    following the fundamentals: the composite is made up of
    the shorter of each census's intervals. The result is
    correct; the composite constructed in Figure 18 is
    identical to the one shown previously in Figure 17. It had
    better be, or else the interpolations of Figure 17 would
    be in conflict with the fundamental interpolation rules.

    Figure 18.  A Closer Look at Intervals

                    CENSUS rows from group 1
      CENSUS:       C       C                   A   C   A

       Day 1        Intervals by presence and absence
    Presence:       X---|   X
     Absence:       X-------------|             O

       Day 3        Intervals by presence and absence
    Presence:       X   |---X-----------|           X
     Absence:               X---------|         O

       Day 9        Intervals by presence and absence
    Presence:               X           |-----------X
     Absence:                                   O |-X-| O

                    Combining the shorter intervals
    Interval:       X---|---X---------|         O |-X-|

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent
  X Known present in same group
  x Known present in different group
  O Known absent in same group
  - Inside of interval
  | Interval endpoint


    The intervals in Figure 18 did not have to be grouped by
    censused day; they could have been grouped by Presence and
    Absence or any other way. For each set of locating censuses
    we can always split out the "halfway to census" intervals
    from the "halfway to absence" intervals, group them any way
    we like, and later use the interpolation fundamentals to
    recombine them, without affecting the result. This has not
    been necessary so far, but it is essential if we are to
    correctly interpolate when an individual moves between
    groups, as above in Figure 16: "An Individual is Censused
    in 2 Groups".

    We must return to the fundamentals to make sense of
    interpolation. Rather than trying to combine the results
    of interpolating the groups separately, as was done in
    Figure 17: "Interpolating Each Group Separately", we
    instead combine the results of interpolating the presences
    in all the groups with separate interpolations of the
    absences in each group. Each time a census finds an
    individual in a group, separately compute both the interval
    halfway to the nearest census that finds the individual in
    any group and the interval halfway to the nearest absence
    from the particular group being censused.

    In Figure 19, this method is applied to the data first
    seen in Figure 16. For clarity the intervals surrounding
    the censuses that belong to one group are shown separately
    from those belonging to the other group.^[20] The lines
    labeled Presence show the intervals that are halfway from
    each census to the nearest census that finds the
    individual in any group. The lines labeled Absence show
    the intervals that are halfway from each census to the
    nearest absence in the same group. Censuses with no
    neighboring absence do not have this latter sort of
    interval shown.^[21]

    Figure 19.  Presence and Absence Interpolated Separately

                    One individual's census records
     Group 1:       C       C                   A   C   A
     Group 2:       A                   C               C

     Group 1        The intervals of group 1's censuses
    Presence:       X---|---X-----|     x     |-----X-| x
     Absence:               X---------|         O |-X-| O

     Group 2        The intervals of group 2's censuses
    Presence:       x       x     |-----X-----|     x |-X
     Absence:       O         |---------X

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent
  X Known present in same group
  x Known present in different group
  O Known absent in same group
  - Inside of interval
  | Interval endpoint


    Figure 20 shows how interpolation combines the "presence"
    and "absence" intervals by choosing the shorter of the two
    as the period during which the individual is assumed to
    be in the group where censused. The line labeled Used
    contains the shorter of each census's two intervals.^[22]

    Figure 20.  Combining Presence and Absence Intervals

                    One individual's census records
     Group 1:       C       C                   A   C   A
     Group 2:       A                   C               C

     Group 1        The intervals of group 1's censuses
    Presence:       X---|---X-----|     x     |-----X-| x
     Absence:               X---------|         O |-X-| O
        Used:       X---|---X-----|               |-X-|
    In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

     Group 2        The intervals of group 2's censuses
    Presence:       x       x     |-----X-----|     x |-X
     Absence:       O         |---------X
        Used:                     |-----X-----|       |-X
    In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent
  X Known present in same group
  x Known present in different group
  O Known absent in same group
  - Inside of interval
  | Interval endpoint


    Having interpolated the intervals surrounding each census,
    determining the final group membership is a straightforward
    matter of placing the individual in the unknown group when
    there's nowhere else to put him. Figure 21 shows this
    process. All that remains is to compute the Interp values
    in the usual fashion, by ignoring absences and counting
    distance from the nearest census. In Figure 21 the
    intervals between locating censuses are shown, labeled For
    Interp, to support the Interp values given.

    Figure 21.  Group Membership Given Multiple Groups

                    One individual's census records
     Group 1:       C       C                   A   C   A
     Group 2:       A                   C               C

     Group 1        The intervals of group 1's censuses
        Used:       X---|---X-----|               |-X-|
    In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

     Group 2        The intervals of group 2's censuses
        Used:                     |-----X-----|       |-X
    In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

                    Intervals between locating censuses
  For Interp:       X~~~|~~~X~~~~~|~~~~~X~~~~~|~~~~~X~|~X

     MEMBERS.
       Group:       1   1   1   1   2   2   2   9   1   2
      Interp:       0   1   0   1   1   0   1   1   0   0
      Origin:       C   I   C   I   I   C   I   I   C   C

        Date:       1   2   3   4   5   6   7   8   9  10

  Key:
  C Censused present
  A Censused absent
  X Known present in same group
  - Presumed present
  ~ Inside of interval
  | Interval endpoint


    By now it should be clear that interpolation^[23] is a
    function over CENSUS row sets. It is a function: for every
    input you get exactly one output. It takes sets of CENSUS
    rows as input. Because sets are unordered you can put
    CENSUS rows into the database in any order and the result
    will be the same. And, because it is a function, you can
    re-interpolate the same CENSUS rows as many times as
    desired without altering the final result.

    It should also be clear why interpolation always chooses to
    use "the shorter interval", and why this always produces
    the "correct" result. The shorter interval is short for a
    reason: there is some reason to believe the individual is
    not in the group, elsewise the interval would be longer.
    Further, every time the shorter interval is chosen a
    possible overlap with another interval from a different
    locating census is eliminated. By always choosing the
    shorter interval interpolation ensures that the
    interpolation of any two locating censuses will not
    conflict.
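
    The whole procedure for multiple groups fits in a short
    Python sketch. It is an illustration under stated
    assumptions, not Babase's implementation: the names
    last_left, covers and interpolate are hypothetical, day
    numbers stand in for Julian dates, Status is limited to C
    and A, at most one locating census per day is assumed, and
    the birth date and Statdate exceptions are left out.
    Taking the nearest of the neighboring census and the
    neighboring absence is the same as taking the shorter of
    the two half intervals.

```python
UNKNOWN = 9  # the unknown group
LIMIT = 14   # the 14 day Interpolation Limit


def last_left(a, b):
    """Last day in the earlier half of a..b, per The Midpoint Rule."""
    mid, between = divmod(a + b, 2)
    if between:
        return mid
    return mid if mid % 2 == 0 else mid - 1


def covers(c, group, day, located, absent):
    """True when the locating census on day c keeps the individual
    in `group` on `day`."""
    if abs(day - c) > LIMIT:
        return False
    if day == c:
        return True
    # Boundaries: locating censuses in any group, absences from `group` only.
    events = list(located) + absent.get(group, [])
    if day > c:
        later = [e for e in events if e > c]
        return not later or day <= last_left(c, min(later))
    earlier = [e for e in events if e < c]
    return not earlier or day > last_left(max(earlier), c)


def interpolate(censuses, first_day, last_day):
    """censuses: iterable of (day, group, status); status is 'C' or 'A'."""
    censuses = list(censuses)
    located = {day: grp for day, grp, status in censuses if status == "C"}
    absent = {}
    for day, grp, status in censuses:
        if status == "A":
            absent.setdefault(grp, []).append(day)
    groups = []
    for day in range(first_day, last_day + 1):
        found = {g for c, g in located.items()
                 if covers(c, g, day, located, absent)}
        groups.append(found.pop() if found else UNKNOWN)
    return groups


figure_16 = [(1, 1, "C"), (3, 1, "C"), (8, 1, "A"), (9, 1, "C"), (10, 1, "A"),
             (1, 2, "A"), (6, 2, "C"), (10, 2, "C")]
print(interpolate(figure_16, 1, 10))
# [1, 1, 1, 1, 2, 2, 2, 9, 1, 2]
print(interpolate(reversed(figure_16), 1, 10) == interpolate(figure_16, 1, 10))
# True
```

    The first line printed matches the MEMBERS Group line of
    Figure 21. The second shows the point made above: the
    order in which the CENSUS rows arrive does not change the
    result.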

     Pre-Analyzed Data Disturbs Interpolation

    In addition to that most important distinction which
    classifies CENSUS rows into absent and locating censuses
    there is a second distinction which further divides
    locating censuses into those which interpolate and those
    which do not. Those CENSUS rows that record observational
    data, those with Status values of C, D, and M,^[24] are the
    interpolating censuses. (All of the previous examples have
    concerned CENSUS rows of this type.) The remaining
    CENSUS.Status values indicate that the CENSUS row is the
    result of analysis: all of the "old style", that is
    "historical", CENSUS.Status values and the N manual Status
    value. These are the non-interpolating censuses.

    This further division of locating censuses into
    interpolating and non-interpolating, the division between
    raw and already analyzed data, leads to the final
    refinement to the interpolation procedure. We do not want
    interpolation to produce re-analyzed results from already
    analyzed data. Interpolation occurs only between "regular",
    that is to say interpolating, censuses (and to the birth
    date as a special case). "Non-interpolating" census rows
    are copied directly from CENSUS to MEMBERS, CENSUS.Status
    becomes MEMBERS.Origin, and Interp is set to 0. When a
    non-interpolating census is found on the birth date, the
    birth date will not interpolate.
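
    The two distinctions, and the direct copy of a
    non-interpolating row, might be sketched in Python as
    below. This is an illustration only. The dictionary keys
    (date, grp, status, origin, interp) are hypothetical
    stand-ins for the real column names, and A is assumed to
    be the Status value that records an absence, as the keys
    of the figures suggest.

```python
INTERPOLATING = {"C", "D", "M"}  # Status values that record observations
ABSENT = "A"                     # assumed Status value for an absence


def classify(status):
    """Sort a CENSUS.Status value into one of the three kinds of census."""
    if status == ABSENT:
        return "absent"
    if status in INTERPOLATING:
        return "interpolating"
    return "non-interpolating"   # historical values and the manual N


def copy_to_members(census_row):
    """MEMBERS row for a non-interpolating census: a direct copy."""
    return {
        "date": census_row["date"],
        "grp": census_row["grp"],
        "origin": census_row["status"],  # CENSUS.Status becomes MEMBERS.Origin
        "interp": 0,                     # Interp is set to 0
    }


row = {"date": 3, "grp": 1, "status": "N"}
print(classify(row["status"]))  # non-interpolating
print(copy_to_members(row))
```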

    Interpolation looks at "regular" census rows and attempts
    to guess the individual's location on those days when there
    are no observations. It does so by looking at the intervals
    between the "regular" censuses. Finding non-interpolating
    CENSUS rows, that is to say already analyzed data, on one
    of these intervals breaks the assumptions interpolation
    uses in its "guessing". The previously analyzed data point
    could be there for any reason at all, and there's no point
    in pretending it's not there either. What interpolation
    does is give up. It interpolates up to the offending data
    point and then stops.^[25] After that it still creates rows
    in MEMBERS, but it does not attempt to make guesses about
    where to place an individual or what the interpolated row
    means.

   Note

    This situation is not expected to occur, or, rather,
    whenever there are non-interpolating CENSUS rows between
    interpolating censuses, the non-interpolating CENSUS rows
    are expected to be contiguous over the entire interval
    between the interpolating censuses. So, the expected cases
    are the trivial degenerate ones. Nonetheless, such
    situations probably do occur in the existent data. It would
    probably be best to either require the expected behavior,
    or to get rid of all the pre-analyzed CENSUS rows and
    replace them with raw data, especially given the design
    problems pointed out below.

    Regardless, non-trivial examples are presented here so that
    a complete understanding of interpolation can be developed.

    Figure 22 shows that the 3 fundamental interpolation
    intervals are shortened when a non-interpolating census is
    found between interpolating censuses. The intervals for
    each locating census are examined separately. The
    non-interpolating census has no interpolation intervals.
    The intervals of the interpolating censuses are truncated,
    reduced to the interval between the interpolating and
    non-interpolating censuses. By this means a portion of the
    diagram, days 4 and 5, is blocked from interpolating into
    the group. If there were no N census, the Absence interval
    would be day 1's shortest interval, and days 4 and 5 (as
    well as day 3) would interpolate into the group. (Notice
    that day 1's Absence interval has a midpoint day, day 5,
    and that it would have been included in the interval.)
    Interpolation is prevented from placing individuals in the
    group of their interpolating census on the "far side" of
    non-interpolating censuses.

    Figure 22.  Pre-Analyzed Data Truncates Interpolation
    Intervals

                 CENSUS rows from group 1
      CENSUS:    C       N                       A           C

       Day 1     Intervals per fundamental type
    Presence:    X-----| N                                   X
     Absence:    X-----| N                       O
  14 Day Lim:    X-----| N

       Day 3     Intervals per fundamental type
    Presence:            N
     Absence:            N
  14 Day Lim:            N

      Day 12     Intervals per fundamental type
    Presence:    X       N             |---------------------X
     Absence:            N                       O     |-----X
  14 Day Lim:            N |---------------------------------X

  Julian Day:    1   2   3   4   5   6   7   8   9  10  11  12

  Key:
  C Censused present in group (group 1)
  N Manual entry,
      present in group but non-interpolating (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Inside of interval
  | Interval endpoint
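
    The truncation shown in Figure 22 can be sketched in a few
    lines of Python. This is an illustration only, not the Babase
    implementation; the function name and the use of plain integer
    julian days are assumptions made for the example.

```python
def truncate_forward(census_day, interval_end, non_interpolating_days):
    """Shorten a forward-looking interval so that no non-interpolating
    census remains on it.

    census_day: julian day of the interpolating census.
    interval_end: last day (inclusive) of the untruncated interval.
    non_interpolating_days: julian days holding non-interpolating censuses.
    Returns the last day of the truncated interval.
    """
    blockers = [d for d in non_interpolating_days
                if census_day < d <= interval_end]
    if not blockers:
        return interval_end
    # Stop on the day before the nearest non-interpolating census.
    return min(blockers) - 1


# Figure 22, day 1 census: the Absence interval would reach day 5,
# but the N census on day 3 cuts it back to day 2.
print(truncate_forward(1, 5, [3]))  # 2
```

    Intervals that look backward from a census would be handled
    the same way, mirrored in time.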


    In Figure 23 the shortest intervals of each locating census
    have been chosen and combined; the result is the line
    labeled For Group. This is then used to determine group
    membership.

    The interesting part of Figure 23 is the computation of the
    Interp values. The "halfway to census" intervals of
    Figure 22 have been combined and labeled For Interp. Recall
    that it is these intervals that are used to compute the
    Interp values. The N census has created a "gap" in
    interpolation, clearly shown on the For Interp line as
    running from day 3 through day 6. Over this interval
    interpolation's assumptions have been violated and it does
    not know what to do. The group membership is easy. On day
    3, the day of the N census, it can simply copy the CENSUS
    row's Grp and Status into the appropriate MEMBERS columns
    in the same fashion it would for any other locating census.
    On days 4 through 6 it can do what it usually does with
    group membership when it does not know where to locate an
    individual: it places the individual in the unknown group
    with an Origin of I. On days 3 through 6 interpolation has
    no way of knowing how far away the day is from the nearest
    locating census, which is what is supposed to go in the
    Interp column. Due to this lack of information it assigns
    the Interp column a value of NULL, no data, on this
    interval.

    Figure 23.  Pre-Analyzed Data Interrupts Interpolation

                 An individual is censused
      CENSUS:    C       N                       A           C
   Intervals
   For Group:    X-----| N                       O     |-----X
  For Interp:    X~~~~~|               |~~~~~~~~~~~~~~~~~~~~~X
     MEMBERS.
       Group:    1   1   1   9   9   9   9   9   9   9   1   1
      Interp:    0   1                   5   4   3   2   1   0
      Origin:    C   I   N   I   I   I   I   I   I   I   I   C

        Date:    1   2   3   4   5   6   7   8   9  10  11  12

  Key:
  C Censused present in group (group 1)
  N Manual entry,
      present in group but non-interpolating (group 1)
  A Censused absent in group (group 1)
  X Known present in group (group 1)
  O Known absent in group (group 1)
  - Presumed in group (group 1)
  ~ Inside of interval
  | Interval endpoint


    When looking at Figure 23, one way to explain what happens
    to Interp is to say that it is fixed at NULL over that
    portion of the day 1 census's "halfway to census" interval
    that was truncated because the N row showed up. (See
    Figure 22.) Effectively, as MEMBERS Interp counts up with
    increasing distance from the interpolating census, the
    count is fixed at NULL upon encountering a
    non-interpolating census. It stays NULL until the point is
    reached at which counting back down to the next
    interpolating census begins, where the downward count
    resumes as though never interrupted.^[27]

    The approach interpolation takes, in some sense, attempts
    to minimize the disturbance created when already analyzed
    census data is mixed in with raw census information.
    However, as can be seen in Figure 23, it is not entirely
    successful. Although day 7, for example, has an Interp
    value indicating it is 5 days away from a census, it is
    really 4 days away from the N census. If the N CENSUS does
    really represent a census, then day 7's Interp value is
    wrong. And the problems are not restricted to Interp
    values. Is it really true that days 4 and 5 should be
    assigned to the unknown group? If so then why aren't there
    N rows that say so? Day 2 is even more disturbing. There is
    no diagram for this, but suppose the N census found the
    individual in a different group. Figure 22 would be
    unchanged; all of day 1's intervals would be truncated at
    the N census. The effect would be clearer if the
    interval between the preceding C census and the following N
    census were larger, but consider that day 2, by the
    midpoint rule, would be "assigned" to the N census. That
    means that if the N census really does represent a census
    in a different group, day 2 should be assigned to that
    group, not to group 1.

    Note that, in the general case, even though the "halfway to
    census" interval does not determine group membership (all
    the intervals are truncated, leaving a "gap" in which
    interpolation defaults to the unknown group), whether this
    interval has a midpoint day, and if so where it falls, does
    matter to the computation of Interp. If the midpoint day
    happens to fall into the side of the interval containing
    the non-interpolating census then the Interp value will be
    NULL. Otherwise, it will have a value representing the
    number of days to the nearest locating, and interpolating,
    census.

    Incorporating the above safety checks into the rules we
    already have, ensuring that data is not re-analyzed,
    produces the actual interpolation rules.

   The Interpolation Rules

    Using these rules interpolation creates rows in MEMBERS
    based on the information it finds in CENSUS, and the
    BIOGRAPH columns Birth, Matgrp, Statdate and Status.

    I. CENSUS Rows Are Either Absences, Interpolating, or
       Non-Interpolating

       Interpolation partitions all CENSUS rows into one of 3
       categories:

         1. Absences

            CENSUS rows which indicate absence from a group.

         2. Interpolating censuses

            Those CENSUS rows that record observational data
            are interpolating censuses; those with Status
            values of C, D, and M.

         3. Non-interpolating censuses

            The remaining CENSUS.Status values indicate the
            CENSUS row is the result of analysis. These rows,
            all of the "old style", that is "historical",
            CENSUS.Status values and the N manual Status value,
            are not re-analyzed and so do not interpolate.

       For convenience, the CENSUS rows that are not absences,
       the interpolating and the non-interpolating censuses,
       are termed "locating censuses".
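
       As a rough sketch, the partition can be written as a small
       Python function. The Status codes C, D, and M come from the
       rule above. The use of A as the Status of an absence is an
       assumption taken from the figure keys, and the names are
       invented for the example.

```python
from enum import Enum


class CensusKind(Enum):
    ABSENCE = "absence"
    INTERPOLATING = "interpolating"
    NON_INTERPOLATING = "non-interpolating"


# Status values the rule names as observational data.
INTERPOLATING_STATUSES = {"C", "D", "M"}
# Assumption: absences carry a Status of "A", as in the figure keys.
ABSENCE_STATUSES = {"A"}


def classify_census(status):
    """Place a CENSUS.Status value into one of the 3 categories."""
    if status in ABSENCE_STATUSES:
        return CensusKind.ABSENCE
    if status in INTERPOLATING_STATUSES:
        return CensusKind.INTERPOLATING
    # Everything else, including N, is treated as the result of analysis.
    return CensusKind.NON_INTERPOLATING


def is_locating(status):
    """Locating censuses are all CENSUS rows that are not absences."""
    return classify_census(status) is not CensusKind.ABSENCE


print(classify_census("N").value)  # non-interpolating
```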

    II. Censusing Assigns Group Membership

        On those days when an individual is censused in a
        group, when there is a locating CENSUS row, a row is
        created in MEMBERS to place that individual in the
        group on the given day. The Origin value is the CENSUS
        row's Status value. When the CENSUS row is
        interpolating the Interp value is 0. When the CENSUS
        row is non-interpolating the Interp value is NULL.

    III. The 3 Interpolation Intervals

         Interpolation places an individual in the group into
         which he is censused, the Grp of an interpolating
         CENSUS row (Status values C, D, and M), on the days to
         either side of the census being interpolated for a
         time period that is the shorter of:

           1. The Halfway to Census Interval

              Half of the time interval between the
              individual's next (or prior) locating and
              interpolating census, which may locate the
              individual in any group.

           2. The Halfway to Absence Interval

              Half of the time interval between the next (or
              prior) recorded absence, considering only
              absences from the same group in which the
              individual was censused. Absences from other
              groups are ignored.

           3. The 14 day Interpolation Limit

              Given no other information, an individual is
              considered to remain (or have been) in the group
              where observed for 14 days following (or
              preceding) the date of observation.

         The resulting MEMBERS rows have an Origin of I and an
         Interp value of the number of days difference between
         the MEMBERS row's Date and the date of the nearest
         locating census; Interp values count up over The
         Halfway to Census Interval as the distance from the
         interpolated census increases. An interpolated MEMBERS
         row falling on the day after a census has an Interp of
         1, the day after that the Interp is 2, and so forth,
         assuming, of course, the individual has no other
         nearby CENSUS rows.

    IV. The Midpoint Rule

        This rule qualifies how interpolation assigns the
        halfway point between two CENSUS rows in The Halfway to
        Census Interval and The Halfway to Absence Interval,
        above, when the number of days in the interval cannot
        be divided into equal halves. Whenever interpolation is
        called upon to halve an interval between two CENSUS
        rows that contains an odd number of days then the
        "midpoint day" is assigned to the left, earlier, half
        of the interval when the julian date of the midpoint
        day is odd. A midpoint day is assigned to the right,
        later, half of the interval when the julian date of the
        midpoint day is even.
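
        The following Python sketch combines The 3 Interpolation
        Intervals with The Midpoint Rule for the forward-looking
        case. It is an illustration under stated assumptions, not
        the Babase code: days are plain integers, the function
        names are invented, the 14 day limit is taken to include
        the 14th day after the census, and the parity test follows
        the worked examples in the discussion of Figure 22, where
        day 5, an odd midpoint day, joins the earlier half.

```python
INTERPOLATION_LIMIT = 14  # days, per The 14 day Interpolation Limit


def earlier_half_end(first_day, second_day):
    """Last day in the earlier half of the days strictly between two
    CENSUS rows on first_day < second_day."""
    between = second_day - first_day - 1
    end = first_day + between // 2
    if between % 2 == 1:
        midpoint = end + 1
        # Assumption: an odd midpoint day joins the earlier half, as
        # day 5 does in the discussion of Figure 22.
        if midpoint % 2 == 1:
            end = midpoint
    return end


def forward_reach(census_day, next_census_day=None, next_absence_day=None):
    """Last day on which the individual is interpolated into the
    censused group, looking forward from an interpolating census."""
    candidates = [census_day + INTERPOLATION_LIMIT]
    if next_census_day is not None:
        candidates.append(earlier_half_end(census_day, next_census_day))
    if next_absence_day is not None:
        candidates.append(earlier_half_end(census_day, next_absence_day))
    return min(candidates)


# Figure 22 without the N census: census on day 1, absence on day 9,
# next census on day 12. The Absence interval is the shortest.
print(forward_reach(1, next_census_day=12, next_absence_day=9))  # 5
```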

    V. Births Locate Individuals

       This rule declares a live birth to be the equivalent of
       an interpolating census, one that indicates presence in
       the individual's Matgrp. Fetal losses, individuals with
       NULL Snames, are not considered births and are never
       interpolated. An individual is placed in his Matgrp on
       his birth date even when a regular census has an absence
       recorded for the individual on the date of birth. In
       this case interpolation always entirely ignores the
       absence and will not use such an absence to compute a
       Halfway To Absence Interval.

       When there is a locating census on the birth date, the
       MEMBERS row interpolation creates is like that made for
       any other locating census with the given Status. But,
       when there is no locating census on the birth date the
       resulting MEMBERS row has an Origin of I (and an Interp
       of 0, as any census with a Status of C would have). Aside
       from their I Origin value, births interpolate as would
       any CENSUS with a C Status.

    VI. No Data Implies Unknown Group Membership

        On days when none of the above rules serve to place an
        individual in a group, the individual is placed in the
        unknown group. The resulting MEMBERS rows have an
        Origin of I and an Interp value of the number of days
        difference between the MEMBERS row's Date and the date
        of the individual's nearest interpolating census.^[28]
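
        A minimal Python sketch of the fallback follows. The
        function name is invented for the example, and the unknown
        group's id of 9 is an assumption taken from the Group row
        of Figure 23.

```python
UNKNOWN_GROUP = 9  # assumption: the unknown group's id, as in Figure 23


def fill_unknown(first_day, last_day, placed):
    """Give every day from first_day through last_day a group.

    placed maps julian day -> group for the days on which the other
    rules locate the individual; all other days fall to the unknown
    group.
    """
    return [placed.get(day, UNKNOWN_GROUP)
            for day in range(first_day, last_day + 1)]


# Figure 23: the other rules place the individual in group 1 on
# days 1-3 and 11-12.
placed = {1: 1, 2: 1, 3: 1, 11: 1, 12: 1}
print(fill_unknown(1, 12, placed))
# [1, 1, 1, 9, 9, 9, 9, 9, 9, 9, 1, 1]
```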

    VII. Birth Stops Interpolation

         Interpolation will not place a row in MEMBERS before
         an individual's Birth date.

    VIII. Death Stops Interpolation

          When an individual is dead, interpolation will not
          place a row in MEMBERS after the individual's
          Statdate.

    IX. Data Entry Cessation Stops Interpolation of Living
        Individuals

        When an individual is alive, interpolation will create
        rows after the individual's last locating census only
        when there are subsequent absences; absences, that is,
        from the group in which the individual was
        censused.^[29] In this case, unlike above, no data does
        not imply unknown group membership; such rows are
        created only so long as the individual is interpolated
        into the group of his last locating census. When a
        living individual has no absences after his last
        locating census (absences, that is, from the group of
        that census), interpolation assumes that there is
        further data available which has yet to be entered and
        interpolation stops at the last locating census.
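
        The stopping rules for the end of an individual's MEMBERS
        rows (rules VIII and IX) can be summarized in a short
        Python sketch. The function and parameter names are
        hypothetical, days are plain integers, and the caller is
        assumed to have already worked out, from The 3
        Interpolation Intervals, the last day on which the
        individual is interpolated into the group of the last
        locating census.

```python
def last_members_day(alive, statdate, last_locating_day,
                     in_group_through, has_later_same_group_absence):
    """Latest julian day on which interpolation may create a MEMBERS row.

    in_group_through: last day on which the individual is interpolated
    into the group of the last locating census.
    has_later_same_group_absence: True when an absence from that group
    is recorded after the last locating census.
    """
    if not alive:
        # Rule VIII: no rows after a dead individual's Statdate.
        return statdate
    if has_later_same_group_absence:
        # Rule IX: rows continue only while the individual is still
        # interpolated into the group of the last locating census.
        return in_group_through
    # Rule IX: no later absence, so more data is presumed to be coming.
    return last_locating_day


print(last_members_day(True, 90, 90, 95, False))  # 90
```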

    X. Data is not Re-Analyzed

       Interpolation is only done to regular, that is
       interpolating, CENSUS rows; data that was collected in
       the field. Other data, the "non-interpolating" census
       rows that represent the result of prior analysis, do not
       interpolate; they are copied directly from CENSUS to
       MEMBERS, CENSUS.Status becomes MEMBERS.Origin, and
       Interp is set to NULL. Further, when a non-interpolating
       census is found on one of The 3 Interpolation Intervals
       the interval is shortened enough that the
       non-interpolating census is no longer on the interval.
       When a non-interpolating census is found on a birth
       date, the birth date does not interpolate.

       The MEMBERS Interp column is fixed at NULL on the
       interval from the non-interpolating census row through
       the "midpoint" end of The Halfway to Census Interval,
       endpoints included.^[30] Here we are speaking of The
       Halfway to Census Interval as computed, not a Halfway to
       Census Interval shortened in the preceding paragraph.
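
       The Interp computation described above can be sketched in
       Python as follows. This is an illustration, not the Babase
       implementation: the function name is invented, None stands
       in for NULL, days are plain integers, and the caller is
       assumed to supply the last day of the earlier half of The
       Halfway to Census Interval as already decided by The
       Midpoint Rule.

```python
def interp_values(left, right, split, non_interpolating_days):
    """Interp values for each day from left through right, two
    neighboring interpolating censuses.

    split: last day belonging to the earlier half of The Halfway to
    Census Interval (as computed, before any truncation).
    Returns a list in day order, with None standing for NULL.
    """
    blockers = [d for d in non_interpolating_days if left < d < right]
    # The non-interpolating census nearest each interpolating census,
    # on that census's side of the split.
    left_block = min((d for d in blockers if d <= split), default=None)
    right_block = max((d for d in blockers if d > split), default=None)
    values = []
    for day in range(left, right + 1):
        if day <= split:
            blocked = left_block is not None and day >= left_block
            values.append(None if blocked else day - left)
        else:
            blocked = right_block is not None and day <= right_block
            values.append(None if blocked else right - day)
    return values


# Figure 23: censuses on days 1 and 12, split after day 6, N on day 3.
print(interp_values(1, 12, 6, [3]))
# [0, 1, None, None, None, None, 5, 4, 3, 2, 1, 0]
```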

   Expectations and Implications

    It is expected that all non-interpolating CENSUS rows, that
    is to say CENSUS rows produced by prior analysis, will be
    clustered in contiguous intervals with "regular" census
    rows at the endpoints. This is particularly expected of
    "old style" census rows from before Babase, as they precede
    all "regular" census data, but is also expected of the N
    non-interpolating, manual, Status code, should it ever be
    used. If these expectations are borne out, the Data is not
    Re-Analyzed rule will never be invoked.

    There are some not-quite-obvious implications given these
    interpolation rules:

      o The only rows in MEMBERS that have an Origin of I, and
        an Interp of 0, and are not placed in the unknown group
        are birth dates. Not every birth date will have an
        associated MEMBERS row with these values, as some birth
        dates have locating censuses, but MEMBERS rows with
        these values will be birth dates.

      o Living individuals, but not dead ones, can have MEMBERS
        rows created by the interpolation procedure that locate
        the individual in a group on a date later than the
        individual's Statdate.^[31]

      o So long as an individual is alive the last CENSUS to
        locate the individual ought to be followed by a record of
        absence, an absence from the group where the individual
        was last found. To do otherwise, as must occur when
        there is simply no further data to be entered, is to
        introduce a bias into MEMBERS.

      o Aside from births, the only other rows in MEMBERS with
        an Origin of I and an Interp of 0 are those in the
        unknown group which were created by Data is not
        Re-Analyzed.

      o As fetal losses, individuals with NULL Snames, cannot
        appear in CENSUS, are not considered a live birth, and
        always have their birth date equal to their Statdate,
        they never have MEMBERS rows associated with them.

      o When computing Interp values from The Halfway to Census
        Interval The Midpoint Rule is usually immaterial.
        However, when non-interpolating censuses affect the
        interpolation The Midpoint Rule can be the factor that
        determines whether or not a MEMBERS row has a NULL
        Interp value.

---------------------------<snip>------------------------

A. Changes to Babase between 1.0 and 2.0

    A number of changes were made to Babase in the transition
    from FoxPro (Babase 1.0) to PostgreSQL (Babase 2.0). This
    appendix attempts to document changes made to data
    semantics.

   Changes to BIOGRAPH.Statdate

    The Statdate is now constrained, when the individual is
    alive, to be the most recent date on which a census located
    an individual in a group. Although this was true in
    practice, the 1.0 system did not require it.

    This constraint leads directly to another: when the
    individual is alive and there are no (non-absent) censuses
    then the individual's Statdate must be the individual's
    birth date. Because arbitrary Statdates are not allowed, we
    prevent automatic changes from erasing manually set
    Statdates.
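
    The Statdate constraint for living individuals amounts to a
    one-line computation. The sketch below is illustrative only;
    the function name is invented, and the dates passed in are
    assumed to be those of the individual's locating (non-absent)
    CENSUS rows.

```python
from datetime import date


def living_statdate(birth, locating_census_dates):
    """Statdate required of a living individual: the most recent date
    on which a census located the individual in a group, or the birth
    date when there is no such census."""
    return max(locating_census_dates, default=birth)


birth = date(2004, 3, 1)
print(living_statdate(birth, [date(2004, 5, 2), date(2004, 4, 7)]))
# 2004-05-02
print(living_statdate(birth, []))
# 2004-03-01
```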

   Changes To Interpolation and MEMBERS

    The interpolation procedure changed somewhat. As the
    interpolation is what creates the MEMBERS table this
    appendix also describes the changes made to MEMBERS between
    1.0 and 2.0.

      o Individuals have a row in MEMBERS for every day of
        their lives.

        Interpolation now places individuals in the unknown
        group when individuals' locations cannot be otherwise
        assigned, for example outside of the 14 day
        interpolation limit. Formerly, when the individual
        could not be placed in a group on a particular day the
        individual had no row in MEMBERS on that day.

      o Individuals are no longer placed in a group, the group
        in which they were last censused, on their Statdate and
        this "location" no longer interpolates.

        When first written, the interpolation procedure was
        designed to work with females, who are unlikely to be
        absent from their group for more than 28 days. (Twice
        the 14 day interpolation limit.) By placing an
        individual in a group on their Statdate, the group in
        which they were last censused, the females were assured
        a row in MEMBERS for every day of their lives. Further,
        analysis was simplified as each of these rows
        associated the females with their group (even though at
        the end of their lives they may not have been present
        in the group).

        The new interpolation procedure does not consider the
        Statdate in its determination of the individual's
        group membership on that day, although, as always, when
        the Statdate is a death date it does stop
        interpolation.

      o There is a change in what happens when an individual is
        censused absent on his birth date. In the new system, if
        the individual is censused "absent" on his birth date,
        interpolation will "override" the absence and place the
        individual in his Matgrp in MEMBERS.

        In the old system, if the individual was censused
        "absent" on his birth date, interpolation would not
        "override" the absence and place the individual in a
        group in MEMBERS. As the individual was expected to be
        somewhere on his birth date, it was expected that a
        demography note be made for the individual on that date
        to give the individual a location, that is, a row in
        MEMBERS.

      o MEMBERS.Interp may now be NULL. The FoxPro system did
        not have NULL values. In the new system Interp is NULL
        when interpolation does not know where the nearest
        locating census is. See Pre-Analyzed Data Disturbs
        Interpolation.

      o The behavior of interpolation on the last census is now
        documented.

        The interpolation procedure changed during the period
        of use of Babase 1.0, but the changes were not
        documented. The primary change was that interpolation
        was altered so that it did not interpolate if there were
        no subsequent censuses, absent or not. This prevented
        (almost) every living individual currently monitored
        from having a 14 day "tail" of interpolated values
        following the last entered census -- a "tail" that
        would disappear the next time the census information
        was updated.

   Changes To The Sexual Cycle Information

    The structure of the sexual cycle portion of the database
    was changed. The CYCLES table became CYCPOINTS, the CYCGAPS
    table was added, and CYCSTATS and REPSTATS were
    modified and made useful. For further information please
    compare the old and new documentation.

---------------------------<snip>------------------------


    --------------

    ^[1] We do this rather than paying one of the regular
    certification authorities to validate our identity. These
    certification authorities appear to validate the identity
    of their customers by virtue of having successfully been
    paid.

    ^[2] As security restrictions permit, of course.

    ^[3] That way if you unknowingly revealed your password to
    the terrorists last weekend when you were drunk, by the
    time everybody sobers up the password will have been
    changed and the amount of damage done will be limited.

    ^[4] Presently group 9.0. This is hardcoded at present.

    ^[5] This is unlikely as the database will not allow entry
    of a duplicate Sname.

    ^[6] Or whatever you want to call it in the case of a fetal
    loss.

    ^[7] An actual census does not have to be taken. As the
    Statdate of live individuals is derived from the CENSUS
    table, any observation of an individual in a group which
    results in a row being added to CENSUS is sufficient.

    ^[8] This criterion is specifically phrased to account for
    gaps in the recorded data during the time period in which
    the peak turgescence probably occurred.

    ^[9] D usually occurs when a male is seen alone or in a
    non-census group.

    ^[10] DEMOG nearly makes the M CENSUS Status code obsolete,
    were it not so hard to search on textual data. Indeed, it
    was created in response to difficulties with the M code.

    ^[11] One would think that, in order to maintain perfect
    database consistency, the actor and actee participants in
    an interaction should be in the same Supergroup, according
    to the MEMBERS table. The database consistency checker
    (integrit.prg) does report when the actor and actee are not
    members of the same Supergroup. However, there is currently
    no check for actor/actee location correspondence in the
    update programs. This is for three reasons. First,
    movements between groups and the timing of censuses and
    interaction data collection may result in valid records of
    interactions between individuals that are recorded as being
    in different supergroups. The effects of this on the manual
    data correction process could be reduced by having the
    interaction master table update process add additional
    location data into the MEMBERS table, but not totally
    eliminated because the resolution of the MEMBERS table is
    one day and individuals can move between groups during a
    one day interval. Second, some of the interaction data are
    entered with a date of the first of the month, not the
    actual date of the interaction. Thus, the animals could be
    in different groups on the first of the month and still
    interact during the month. When this situation is
    discovered, the date of the interaction for these
    interactions should be manually changed to the first day of
    the month on which the two animals were in the same group.
    Third, the lack of a check allows the interaction data to
    be entered before the census data for the month. Also, from
    1989 through 1991, inclusive, the recorded group for the
    sub-groups of Alto's group does not always represent the
    actual location of the individual. (See the MEMBERS
    documentation.)

    ^[12] At this time only DEMOG, the demography notes table,
    contributes to CENSUS any information regarding group
    membership.

    ^[13] Sometimes, when demography information is added into
    other tables, CENSUS rows are altered rather than removed.
    Likewise, CENSUS rows are removed (or altered as necessary)
    when demography information is removed from other tables.

    ^[14] This is the one exception, if you wish to consider it
    so, to the rule that an individual cannot be censused both
    present and absent in the same group on the same day.

    ^[15] The "same group" condition is one that must be met
    whenever interpolation examines intervals between presence
    and absence.

    ^[16] As the individual is alive, every census that
    post-dates the individual's Statdate must record an
    absence, else the Statdate would be adjusted to reflect the
    date of last census.

    ^[17] This is a heuristic. While it should work well enough
    most of the time the Babase user must be aware of the
    pitfalls in this approach. These are explained below.

    ^[18] Without this restriction interpolation would have to
    insert rows forever, placing the individual in the unknown
    group off into the indefinite future.

    ^[19] Notice that interpolation does not bother analyzing
    absences, such as the last-most, that are not neighbor to
    censuses.

    ^[20] As locating censuses are interpolated individually
    the figure could diagram the intervals associated with each
    census separately, as in Figure 18, work out group
    membership from that, and then combine the results; the
    outcome would be unaffected. The chosen presentation form
    allows the interval endpoints to "match up" in a revealing
    fashion. As an exercise the reader should prove to himself
    that the intervals associated with each locating census are
    accurately depicted, and that the order in which locating
    censuses are interpolated does indeed make no difference.

    ^[21] Figure 18: "A Closer Look at Intervals" makes clear
    that it is not necessary to show these intervals. By
    definition, the omitted intervals will always be longer
    than the "halfway to census" interval of the census being
    interpolated. As the shorter interval is the one used the
    longer may be ignored.

    ^[22] When there are two intervals. When there's no
    "absence" interval the "Used:" line shows the "presence"
    interval.

    ^[23] The proper term is "The Glorious Interpolation
    Procedure", but we don't tell this to just anybody.

    ^[24] See MEMBERS.Origin.

    ^[25] It might be better if interpolation did not
    interpolate at all on those intervals between interpolating
    censuses that contain a non-interpolating census^[26] -- if
    it put the individual in the unknown group, with an Interp
    of 0 and an Origin of NULL whenever there was no locating
    census. However, this could easily cause problems because
    interpolation has always worked as the body of this
    document describes. Although these situations are not
    supposed to occur, it is likely the data contains such
    situations and changes should not be made to interpolation
    which break the database.

    ^[26] I have not thought this through. At first glance it
    seems the code would be simpler, but perhaps not. And the
    effect on data analysis is unclear. It is probably best to
    adopt one of the solutions presented in the note below.

    ^[27] Although in this example we "count up" traversing the
    timeline from left to right, had the N census been
    closer to the right side of the diagram than the left we
    would be "counting up" the interval by traversing the
    timeline in the opposite direction, from right to left.

    ^[28] The same method is used to compute Interp values when
    interpolation uses The 3 Interpolation Intervals, above.

    ^[29] This "same group" criterion corresponds with the
    criterion found in The Halfway to Absence Interval.

    ^[30] Interp is fixed at NULL over the portion of The
    Halfway to Census Interval that was truncated in the
    preceding paragraph. Effectively, as MEMBERS Interp counts
    up with increasing distance from the interpolating census,
    the count is fixed at NULL upon encountering a
    non-interpolating census. It stays NULL until the point is
    reached at which counting back down to the next
    interpolating census begins, where the downward count
    resumes as though never interrupted.

    ^[31] This is examined in detail in Interpolation At The
    Statdate.

    ^[32] Be sure to read the edition that describes the
    version of Docbook you're using. This text was written for
    Docbook 4.3.