Interpolation

The Babase database uses a procedure called interpolation to update MEMBERS whenever the CENSUS table, the BIOGRAPH.Birth column, or the BIOGRAPH.Statdate column is updated. Interpolation extrapolates the group membership of individuals into days for which there is no actual observation of the individuals' whereabouts. It guesses with which group an individual is associated, given knowledge of the individual's group membership (or lack thereof) at given points in time, and records the result in MEMBERS. Thus, MEMBERS always has a row recording group membership for every day of every individual's life.

This section comprises 3 sub-sections. The first sub-section introduces interpolation incrementally. Rules are presented in an informal fashion, and examples and exceptions are progressively developed. The second sub-section is a formal specification of interpolation. The third sub-section supplements the formal specification with expectations regarding the use of interpolation and brief descriptions of interpolation's implications. Most of the third sub-section is a restatement of material already presented in the first.

Interpolation's 3 Fundamentals

It is primarily by the field census records that Babase tracks group membership. However, despite its name, within the Babase database the CENSUS table is the source of all group membership information and so contains data from sources other than just the field census records. Babase places rows in the CENSUS table to indicate presence in a group whenever any demography information is stored in other tables.[170][171] Throughout this section it is to be understood that any sort of demographic information that results in CENSUS data is implied when the term census, or its plural, is used. Unfortunately, the term census is further overloaded. It is occasionally used in the colloquial sense, meaning present -- found when a group census was taken, the alternative being absent. It is hoped the meaning will be clear from context.

It is important to remember that censuses record absence from a group as well as presence in a group. There are two mutually exclusive classes of CENSUS rows: absences, records of absence from specific groups on specific days; and locating censuses, records that place the individual in specific groups on specific days.

The premise of interpolation is that an individual is assumed to be in the group where observed for a period of 14 days to either side of the observation unless there's indication otherwise. To this end, interpolation keeps an individual in the group where a census locates him for a time period that is the shorter of:

  1. Half of the time interval between the individual's next (or prior) census that finds the individual in any group.

  2. Half of the time interval between the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.

  3. The 14 day Interpolation Limit. Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.

Should the above process not place an individual in a group, the individual is placed in the unknown group, so long as the individual is alive on the day in question.
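The three rules above amount to taking a minimum. The following is a minimal sketch, not Babase's implementation; the function name and its parameters are hypothetical. It computes how far forward a single locating census reaches, before any adjustment for a midpoint that falls exactly on a day. Looking backward is symmetric.

```python
from fractions import Fraction

INTERPOLATION_LIMIT = 14  # days, rule 3


def forward_reach(census_day, next_locating_day=None, next_absence_day=None):
    """Days a locating census extends forward (hypothetical helper).

    census_day        -- day number of a census that found the individual in a group
    next_locating_day -- day of the next census finding the individual in any group
    next_absence_day  -- day of the next recorded absence from the same group
    Either of the last two may be None when no such CENSUS row exists.
    """
    candidates = [Fraction(INTERPOLATION_LIMIT)]
    if next_locating_day is not None:
        # Rule 1: half the interval to the next locating census.
        candidates.append(Fraction(next_locating_day - census_day, 2))
    if next_absence_day is not None:
        # Rule 2: half the interval to the next absence from the same group.
        candidates.append(Fraction(next_absence_day - census_day, 2))
    return min(candidates)


# Censused on day 3, next located on day 11, absent from the same group on day 8.
print(forward_reach(3, next_locating_day=11, next_absence_day=8))  # 5/2
```

A reach of 5/2 days means the individual stays in the censused group on days 4 and 5 and is no longer presumed there on day 6.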

There are some subtleties to these rules, and there is further elaboration necessary to allow for old style CENSUS rows, which do not directly correspond with actual census taking, and other factors. But these rules are the foundation and we begin with them.

Interpolation Visualized

Interpolation is best described with the help of diagrams as it is all about computing and comparing time intervals of various lengths, which are easily represented in a diagram by lines of various lengths. We begin with the simplest case, censusing a single individual either present or absent in a single group. This simple case is elaborated on extensively to illustrate a variety of special cases such as birth, death, prolonged periods without observation, and so forth, before introducing the complexities of multiple groups into the example.

Tip

As the examples throughout this section are developed be sure to pay close attention to the diagrams' keys. At times the meaning of a symbol changes from diagram to diagram to reflect a subtlety.

Interpolating presences and absences

Figure 4.1 shows a record of one individual's censuses. The group, for the moment we'll assume group 1, is censused 4 times over a period of 11 days. One day the individual is absent.

Figure 4.1. An Individual is Censused Present and Absent

                  One individual's census records
   CENSUS:        C       C                   A           C
     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
              

The first step in interpolation is to construct the various intervals from the given CENSUS rows. Figure 4.2 shows how interpolation splits the difference between presences and absences to construct two intervals for each locating census, one preceding the census and one following it. As the diagrams given here can only show a window in time and omit what falls outside that window, only one interval each is shown for the censuses taken on day 1 and day 11.

Figure 4.2. Interpolating From Presences and Absences

                  Interpolation intervals within a group
   CENSUS:        C       C                   A           C
Intervals:        X---|---X---------|         O     |-----X
     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Interpolation creates MEMBERS rows that place the individual in a group each day. Figure 4.3 shows how group membership assignment is based upon the computed intervals. Because of the absence, the individual is placed in group 9, the unknown group, on some days.

Figure 4.3. Interpolating Group Membership

                  Intervals determine group membership
   CENSUS:        C       C                   A           C
Intervals:        X---|---X---------|         O     |-----X
  MEMBERS.
    Group:        1   1   1   1   1   9   9   9   9   1   1
   Origin:        C   I   C   I   I   I   I   I   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Figure 4.3 also introduces the MEMBERS' Origin column. As can be seen, the Origin column mimics the corresponding CENSUS Status column on those days when interpolation is not guessing group membership. Origin is I on those days when interpolation is guessing.

The MEMBERS' Interp column represents the number of days from a census in which an individual was recorded as present in some known group. Interp is zero on those days when a census has located the individual. The recorded absence is reflected in the group, but is immaterial to Interp. Even though there's an absence, the Interp count is over the interval between the two locating censuses. Interp gets its value by splitting the difference between censuses that record presence in the group, a different sort of split the difference than is used to determine into which group an individual should be placed. Figure 4.4 extends Figure 4.3, showing the computation of Interp. With this addition the interpolation has finished; the MEMBERS table can be constructed from the given CENSUS rows.

Figure 4.4. Computing Interp Values

                  The resulting MEMBERS rows
    CENSUS:        C       C                   A           C
 Intervals
 For Group:        X---|---X---------|         O     |-----X
For Interp:        X~~~|~~~X~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~X
   MEMBERS.
     Group:        1   1   1   1   1   9   9   9   9   1   1
    Interp:        0   1   0   1   2   3   4   3   2   1   0      
    Origin:        C   I   C   I   I   I   I   I   I   I   C

      Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
~ Inside of interval
| Midpoint of interval
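
The Interp row of Figure 4.4 can be reproduced by measuring, for each day, the distance to the nearest locating census. This is a sketch only; the function name is hypothetical and days are plain integers as in the figure.

```python
def interp_values(locating_days, first_day, last_day):
    """Distance in days from each day to the nearest locating census.

    Absences are deliberately not passed in: they affect the group,
    not Interp.
    """
    return [
        min(abs(day - census_day) for census_day in locating_days)
        for day in range(first_day, last_day + 1)
    ]


# Figure 4.4: located on days 1, 3 and 11; the absence on day 8 is ignored.
print(interp_values([1, 3, 11], 1, 11))
# [0, 1, 0, 1, 2, 3, 4, 3, 2, 1, 0]
```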
              

Applying the 14 day interpolation limit

So far we have only explored the first 2 of the 3 fundamental interpolation intervals, those dealing with being censused present and absent. Before we elaborate further and examine the more complicated interactions between presences and absences let us dispense with the 14 day interpolation limit.

Figure 4.5 shows the effect of the 14 day interpolation limit. To save space in this document, some days are removed from the interval. There are no censuses, present or absent, on the days omitted. As the Date: line shows, a total of 33 days are examined, an entire month 31 days in length and the first two days of the following month. Again, we assume the censuses are taken in group 1.

Figure 4.5. The 14 Day Interpolation Limit

                 The shorter intervals are chosen
      CENSUS:    C                                           C
C C Interval:    X----- ... -----------|------- ... ---------X
14 Day Limit:    X----- ... -------|       |--- ... ---------X
     MEMBERS.
       Group:    1   1  ...  1   1   9   9   1  ...  1   1   1
      Interp:    0   1  ... 13  14  15  15  14  ...  2   1   0  
      Origin:    C   I  ...  I   I   I   I   I  ...  I   I   C

        Date:    1   2  ... 14  15  16  17  18  ... 31   1   2

Key:
C Censused present in group (group 1)
X Known present in group (group 1)
- Inside of interval
| Interval endpoint
              


Because the 16th and 17th are more than 14 days away from either census the individual is placed in the unknown group on those days. Days that are closer to the actual censuses are interpolated into group 1. So, as the rules require, the individual is interpolated into the censused group for the shorter of the two time periods. As before, all the interpolated MEMBERS rows, those which do not correspond to an actual census, have an Origin of I. And as before, the Interp column counts up from and down to the actual censuses.
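
The effect of the limit can be sketched for the simplest case, two locating censuses in the same group with no absences between them. The function name and the 40 day span used below are illustrative assumptions, not taken from the figure. Group 9 is the unknown group, as above.

```python
UNKNOWN_GROUP = 9
INTERPOLATION_LIMIT = 14  # days


def membership_between(first_census, second_census, group):
    """(day, group, interp) for each day from one locating census to the next.

    Assumes both censuses place the individual in the same group and that
    no absences are recorded in between.
    """
    rows = []
    for day in range(first_census, second_census + 1):
        # Interp keeps counting past the limit; only the group changes.
        interp = min(day - first_census, second_census - day)
        placed = group if interp <= INTERPOLATION_LIMIT else UNKNOWN_GROUP
        rows.append((day, placed, interp))
    return rows


rows = membership_between(1, 40, group=1)
unknown_days = [day for day, placed, _ in rows if placed == UNKNOWN_GROUP]
print(unknown_days[0], unknown_days[-1])  # 16 25
```

With censuses on days 1 and 40, days 2 through 15 and 26 through 39 are within 14 days of a census and are interpolated into group 1; days 16 through 25 fall to the unknown group.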

Interpolation and Birth Dates

There are some exceptions to the rules as stated so far. Not surprisingly, interpolation will not presume to put an individual in a group, create a MEMBERS row, before the individual's Birth date.

The birth date is an exception in another fashion: it locates the individual in his Matgrp like a special sort of census. The rationale for this is that although the birth may not be observed, the individual most certainly enters the group when born. Further, this rule ensures that we have a row in MEMBERS for every day the individual is alive. When there is a regular census on the birth date[172] the resultant MEMBERS row, having a date matching the individual's birth date, is no different from the individual's other MEMBERS rows that have dates which match the individual's other census dates; they all have an Origin of C and an Interp of 0. When there is no locating census on the birth date the resulting MEMBERS row still has a 0 Interp value, but has an Origin of I, not C. The Origin reflects the fact that there was no actual census, while the Interp shows that the individual was located that day. Figure 4.6 shows an individual that was not censused on his birth date.

Figure 4.6. Interpolation at Birth

                  Individual born into group 1
   CENSUS:                B           C   C           C
Intervals:                X-----|-----X-|-X-----|-----X
  MEMBERS.
    Group:                1   1   1   1   1   1   1   1
   Interp:                0   1   1   0   0   1   1   0
   Origin:                I   I   I   C   C   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10

Key:
B Born (into group 1)
C Censused present in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
                          


Clearly, there are no MEMBERS rows before the birth date, the individual is in his Matgrp on the day of his birth, and the Interp value counts up from the birth date and then down to the next census as though there were a census on the birth date.

An individual is placed in his Matgrp on his birth date even when a regular census has an absence recorded for the individual on the date of birth.[173]
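
The birth date rules can be sketched by treating the birth date as one more locating day that contributes to Interp but not to Origin. This is an illustrative sketch with hypothetical names; it assumes every census is in the Matgrp and close enough together that the group never changes, so only Interp and Origin are computed.

```python
def rows_from_birth(birth_day, census_days, last_day):
    """(day, interp, origin) from the birth date through last_day.

    census_days -- set of days with a locating census (all in the Matgrp here)
    """
    # The birth date locates the individual like a census does.
    locating_days = set(census_days) | {birth_day}
    rows = []
    # Starting the range at birth_day means no rows precede the birth.
    for day in range(birth_day, last_day + 1):
        interp = min(abs(day - located) for located in locating_days)
        # Origin is C only when an actual census was taken that day.
        origin = "C" if day in census_days else "I"
        rows.append((day, interp, origin))
    return rows


# Figure 4.6: born on day 3, censused on days 6, 7 and 10.
rows = rows_from_birth(3, {6, 7, 10}, 10)
print([interp for _, interp, _ in rows])         # [0, 1, 1, 0, 0, 1, 1, 0]
print("".join(origin for _, _, origin in rows))  # IIICCIIC
```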

Interpolation at the Statdate

Another exception to the rules, or rather a pair of exceptions, occurs at the Statdate. You might expect that interpolation would not place a row after the individual's Statdate, and this is indeed true, but true only when the individual is dead. When an individual is alive, interpolation will place a row after the individual's Statdate, but only when there is a subsequent absence from the same group as the group in which the individual was censused.[174][175] While at first this may seem odd, the reasoning behind this behavior is clear -- the Statdate is not the last date on which there are data for the individual. This is elaborated below.

All the same, at times there is a reason to have interpolation halt at the Statdate. When individuals are alive the system should not try to interpolate into time periods for which data have yet to be entered, otherwise there would always be spurious interpolated MEMBERS rows which vanish as soon as additional data are entered. The trouble with creating such rows is that, although the interpolation is corrected and the rows disappear once data entry resumes, the use of these rows in analysis is always inappropriate. Such rows would exist at the end of every period of data entry, as there will always be a large number of living individuals found in their groups on the last census entered. The solution is to not create the rows.[176] When a living individual has no absences from the group of his last locating census that post-date that census, this is taken to mean that there are additional, as yet unentered, data on the individual. In this case interpolation stops on the day the individual was last found in a group. This situation is shown in Figure 4.7, where the last census taken found the individual in group 1 on day 5, and so this day is the individual's Statdate as well. There is no interpolation past the last census.

Figure 4.7. Alive and Present When Last Censused

                  Living individual with Statdate of 5
   CENSUS:        C           A   C
Intervals:        X-----|     O |-X
  MEMBERS.
    Group:        1   1   9   9   1
   Interp:        0   1   2   1   0
   Origin:        C   I   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

In Figure 4.8 more data have been entered: the individual has been missing since the last census shown in Figure 4.7 above. As there have been no further censuses during which the individual was found, the individual's Statdate is still day 5, although there is now subsequent interpolation. Notice that there are no MEMBERS rows created after day 7. When interpolating a living individual, after the Statdate there is no default placement of the individual into the unknown group.[177]

Figure 4.8. Alive and Absent in Last Census[178]

                  Living individual with Statdate of 5
   CENSUS:        C           A   C                   A   A
Intervals:        X-----|     O |-X---------|         O  
  MEMBERS.
    Group:        1   1   9   9   1   1   1   
   Interp:        0   1   2   1   0   1   2   
   Origin:        C   I   I   I   C   I   I   

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Although the only change between Figure 4.7 and Figure 4.8 is the entry into CENSUS of rows recording absence, that is enough to signal that interpolation can go forward without creating spurious MEMBERS rows -- rows likely erased upon the entry of more data. It is important that interpolation does go forward in this case, past the Statdate, as otherwise bias would be introduced. The last C CENSUS would be interpolated differently from all the other censuses. To be sure, there is bias introduced in Figure 4.7 when interpolation is cut short. But censoring bias at the end of data collection is unavoidable, whereas we can avoid introducing bias here.

Warning

So long as an individual is alive the last CENSUS to locate the individual ought to be followed by a record of absence, an absence from the group where the individual was last found. To do otherwise, as must occur when there are simply no further data to be entered, is to introduce a bias into MEMBERS.

In Figure 4.9 there is no additional census information, but the individual's Status has been adjusted to mark the individual dead. A new Statdate value indicates the individual died on day 9 and interpolation is now up to and including the day of death. As is usual, when an individual's group membership cannot be determined he is placed in the unknown group.

Figure 4.9. Interpolation to Statdate When Dead

                  Dead individual with Statdate of 9
   CENSUS:        C           A   C                   A   A
Intervals:        X-----|     O |-X---------|         O  
  MEMBERS.
    Group:        1   1   9   9   1   1   1   9   9
   Interp:        0   1   2   1   0   1   2   3   4
   Origin:        C   I   I   I   C   I   I   I   I

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Although Figure 4.9 does not show this, the 14 day interpolation limit applies when the individual is dead. When there are no absences after the last census and there are more than 14 days between the last census and the Statdate the individual is placed in the unknown group from the 15th day through the day of death.
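
The Statdate behavior of Figures 4.7 through 4.9 can be summarized as a rule for the last day that receives a MEMBERS row. The sketch below is an assumption-laden illustration, not the Babase code: the function and parameter names are hypothetical, days are integers, and a midpoint that lands exactly on a day is resolved by the parity rule given under The Midpoint Rule in this section.

```python
INTERPOLATION_LIMIT = 14  # days


def last_members_day(last_locating_day, later_absence_days, statdate, alive):
    """Last day for which interpolation creates a MEMBERS row (sketch).

    later_absence_days -- days, after the last locating census, on which the
                          individual was recorded absent from that same group
    """
    if not alive:
        # Dead: rows run through the day of death, using the unknown
        # group wherever no interval reaches.
        return statdate
    if not later_absence_days:
        # Alive with no later absence: more data are presumed pending.
        return last_locating_day
    gap = min(later_absence_days) - last_locating_day
    half = gap // 2
    midpoint_day = last_locating_day + half
    if gap % 2 == 0 and midpoint_day % 2 == 1:
        half -= 1  # an odd-numbered midpoint day belongs to the later half
    return last_locating_day + min(half, INTERPOLATION_LIMIT)


print(last_members_day(5, [], statdate=5, alive=True))         # 5, Figure 4.7
print(last_members_day(5, [10, 11], statdate=5, alive=True))   # 7, Figure 4.8
print(last_members_day(5, [10, 11], statdate=9, alive=False))  # 9, Figure 4.9
```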

The Midpoint Rule

The alert reader may have noticed that the above examples are carefully crafted so that the midpoint between presences and absences always falls between two days. What happens when there is an odd number of days in the interval so that the midpoint is a day exactly in between the endpoints, as occurs 3 times in Figure 4.10?

Figure 4.10. Midpoint Days

                  Intervals with an odd number of days
     CENSUS:      C       A               C   C       A   C
  Intervals:      X---|   O       |-------X-|-X---|   O |-X
       Date:      1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

The MEMBERS table has a 1 day precision. There is no way to be in a group in the morning and out of it in the afternoon, so on any one midpoint day the individual must either be in the group or out of it. Should the individual be in the group on midpoint day or out of it? The question is resolved using a property of the date itself. Briefly, the Julian dating system is a method of assigning every day a unique number. As a midpoint day is no more likely to be on one day than another, we can avoid bias by using whether the midpoint day falls on an even or an odd Julian date to resolve the problem.

Whenever interpolation is called upon to halve an interval between two CENSUS rows that contains an odd number of days then the midpoint day is assigned to the left, earlier, half of the interval when the Julian date of the midpoint day is even. A midpoint day is assigned to the right, later, half of the interval when the Julian date of the midpoint day is odd.

So the Midpoint Rule resolves the issue by adjusting the intervals as shown in Figure 4.11. The intervals are no longer perfectly halved. On the midpoint day there is no preference either for or against interpolating the individual into the group censused.

Figure 4.11. The Midpoint Rule Adjusts Intervals

                  Intervals with an odd number of days
     CENSUS:      C       A               C   C       A   C
  Intervals:      X-----| O     |---------X-|-X-|     O |-X
    MEMBERS.
      Group:      1   1   9   9   1   1   1   1   9   9   1
     Interp:      0   1   2   3   2   1   0   0   1   1   0
     Origin:      C   I   I   I   I   I   C   C   I   I   C

Julian Date:      1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Interval endpoint
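
The Midpoint Rule is small enough to state as a function. This sketch assumes its arguments are Julian day numbers held as integers; the function name is hypothetical. It returns the last day that belongs to the earlier half of the interval between two CENSUS rows.

```python
def left_half_end(left_day, right_day):
    """Last day of the earlier half of the interval between two CENSUS rows."""
    span = right_day - left_day
    middle = left_day + span // 2
    if span % 2 == 1:
        # Even number of days: the midpoint falls between two days.
        return middle
    # Odd number of days: 'middle' is the midpoint day itself.
    # Even Julian date goes to the earlier half, odd to the later half.
    return middle if middle % 2 == 0 else middle - 1


# The adjacent pairs of CENSUS rows in Figure 4.11.
pairs = [(1, 3), (3, 7), (7, 8), (8, 10), (10, 11)]
print([left_half_end(left, right) for left, right in pairs])
# [2, 4, 7, 8, 10]
```

Day 2 is an even midpoint day and stays with the census on day 1. Days 5 and 9 are odd midpoint days and go to the later halves, which is why the individual is in group 1 on day 5 and in the unknown group on day 9.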
              

Interpolating When The Group Changes

Having dispensed with the various elaborations and exceptions that occur in unusual cases it is time to return to the fundamentals of interpolation and examine what happens when an individual moves between groups. What comes into play are the first 2 of the 3 interpolation intervals. Recall:

Interpolation keeps an individual in the group where a census locates him for a time period that is the shorter of:

  1. Half of the time interval between the individual's next (or prior) census which finds the individual in any group.

  2. Half of the time interval between the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.

Figure 4.12 shows a record of one individual's censuses. He, a male, is censused in 2 groups, group 1 and group 2. The census records for each group reflect both presence in the group and absence from the group.

Figure 4.12. An Individual is Censused in 2 Groups

                  One individual's census records
   Group 1:       C       C                   A   C   A                  
   Group 2:       A                   C               C                  

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
              

Figure 4.13 shows what would happen if interpolation worked with each group separately. There are conflicts, days when the individual is in both groups. Something else must be done.

Caution

Figure 4.13 is an example of an interpolation method that does not work. The method shown in the figure is not one Babase uses when interpolating.

Figure 4.13. Interpolating Each Group Separately

                  One individual's census records
   Group 1:       C       C                   A   C   A                  
   Group 2:       A                   C               C                  

   Group 1        Interpolating just group 1
    CENSUS:       C       C                   A   C   A
 Intervals:       X---|---X---------|         O |-X-| O

   Group 2        Interpolating just group 2
    CENSUS:       A                   C               C                  
 Intervals:       O         |---------X-------|-------X

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present
O Known absent
- Presumed present
| Interval endpoint
              


The solution is to return to the interpolation fundamentals. We begin by taking a closer look at the way we have been diagramming intervals. In Figure 4.13 the first group has 3 locating censuses and 2 absences, and yet we've diagrammed the resultant intervals on a single line. The interpolation fundamentals tell us to obtain 2 pairs of intervals for each locating census: a halfway to census pair of intervals and a halfway to absence pair of intervals. Figure 4.14 takes the CENSUS rows of the first group shown in Figures 4.12 and 4.13 and does this for each locating census. In Figure 4.14 the CENSUS rows of days 1, 3 and 9 each have their own sections detailing the intervals to the nearest censuses and intervals to the nearest absences. The lines labeled Presence show the intervals that are halfway from each locating census to the next. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence. This detailed breakdown is followed by a composite interval diagram of the familiar type encountered in figures 4.2 through 4.13 above. It should be clear that we have arrived at the composite form of the interval diagram by following the fundamentals: the composite is made up of the shorter of each census's intervals. The result is correct: the composite constructed in Figure 4.14 is identical to the one shown previously in Figure 4.13. It had better be, or else the interpolations of Figure 4.13 would be in conflict with the fundamental interpolation rules.

Figure 4.14. A Closer Look at Intervals

                  CENSUS rows from group 1
    CENSUS:       C       C                   A   C   A

     Day 1        Intervals by presence and absence
  Presence:       X---|   X
   Absence:       X-------------|             O

     Day 3        Intervals by presence and absence
  Presence:       X   |---X-----------|           X
   Absence:               X---------|         O

     Day 9        Intervals by presence and absence
  Presence:               X           |-----------X
   Absence:                                   O |-X-| O

                  Combining the shorter intervals
  Interval:       X---|---X---------|         O |-X-|

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

The intervals in Figure 4.14 did not have to be grouped by censused day, they could have been grouped by Presence and Absence or any other way. For each set of locating censuses we can always split out the halfway to census intervals from the halfway to absence intervals, group them any way we like, and later use the interpolation fundamentals to recombine them, without affecting the result. This has not been necessary so far, but it is essential if we are to correctly interpolate when an individual moves between groups, as above in Figure 4.12: “An Individual is Censused in 2 Groups”. We must return to the fundamentals to make sense of interpolation. Rather than trying to combine the results of interpolating the groups separately, as was done in Figure 4.13: “Interpolating Each Group Separately”, instead combine the results of interpolating the presences in all the groups with separate interpolations of the absences in each group. Each time a census finds an individual in a group, separately compute both the interval halfway to the nearest census that finds the individual in any group and the interval halfway to the nearest absence from the particular group being censused.[179]

In Figure 4.15, this method is applied to the data first seen in Figure 4.12. For clarity the intervals surrounding the censuses that belong to one group are shown separately from those belonging to the other group.[180] The lines labeled Presence show the intervals that are halfway from each census to the nearest census that finds the individual in any group. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence in the same group. Censuses with no neighboring absence do not have this latter sort of interval shown.[181]

Figure 4.15. Presence and Absence Interpolated Separately

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C                  

   Group 1        The intervals of group 1's censuses
  Presence:       X---|---X-----|     x     |-----X-| x
   Absence:               X---------|         O |-X-| O

   Group 2        The intervals of group 2's censuses
  Presence:       x       x     |-----X-----|     x |-X
   Absence:       O         |---------X               

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

Figure 4.16 shows how interpolation combines the presence and absence intervals by choosing the shorter of the two as the period during which the individual is assumed to be in the group where censused. The line labeled Used contains the shorter of each census's two intervals.[182]

Figure 4.16. Combining Presence and Absence Intervals

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C                  

   Group 1        The intervals of group 1's censuses
  Presence:       X---|---X-----|     x     |-----X-| x
   Absence:               X---------|         O |-X-| O
      Used:       X---|---X-----|               |-X-|
  In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

   Group 2        The intervals of group 2's censuses
  Presence:       x       x     |-----X-----|     x |-X
   Absence:       O         |---------X            
      Used:                     |-----X-----|       |-X
  In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

Having interpolated the intervals surrounding each census, determining the final group membership is a straightforward matter of placing the individual in the unknown group when there's nowhere else to put him. Figure 4.17 shows this process. All that remains is to compute the Interp values in the usual fashion, by ignoring absences and counting distance from the nearest census. In Figure 4.17 the intervals between locating censuses are shown, labeled For Interp, to support the Interp values given.

Figure 4.17. Group Membership Given Multiple Groups

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C                  

   Group 1        The intervals of group 1's censuses
      Used:       X---|---X-----|               |-X-|
  In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

   Group 2        The intervals of group 2's censuses
      Used:                     |-----X-----|       |-X
  In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

                  Intervals between locating censuses
For Interp:       X~~~|~~~X~~~~~|~~~~~X~~~~~|~~~~~X~|~X

   MEMBERS.
     Group:       1   1   1   1   2   2   2   9   1   2
    Interp:       0   1   0   1   1   0   1   1   0   0
    Origin:       C   I   C   I   I   C   I   I   C   C

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
- Presumed present
~ Inside of interval
| Interval endpoint
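
The combination step of Figure 4.17 can be sketched in Python. This is only an illustration, not Babase's implementation: the function name combine_groups and its arguments are hypothetical, the per-group day sets are assumed to have been worked out already (they are the In Group lines of the figure), and 9 stands for the unknown group as it does in the figure. The sketch covers only the case where every locating census interpolates.

```python
UNKNOWN_GROUP = 9  # the unknown group, numbered as in Figure 4.17


def combine_groups(days_in_group, locating_censuses, first_day, last_day):
    """Merge per-group interpolation results into MEMBERS-like rows.

    days_in_group:     {group: set of days interpolated into that group}
    locating_censuses: {day: CENSUS.Status of the locating census that day}
    Returns a list of (day, group, interp, origin) tuples.
    """
    rows = []
    for day in range(first_day, last_day + 1):
        groups = [g for g, days in days_in_group.items() if day in days]
        if len(groups) > 1:
            raise ValueError(f"day {day} claimed by groups {groups}")
        # Nowhere else to put the individual: use the unknown group.
        group = groups[0] if groups else UNKNOWN_GROUP
        # Interp ignores absences: distance to the nearest locating census.
        interp = min(abs(day - census_day) for census_day in locating_censuses)
        origin = locating_censuses.get(day, "I")
        rows.append((day, group, interp, origin))
    return rows


# The data of Figure 4.17.
figure_4_17 = combine_groups(
    days_in_group={1: {1, 2, 3, 4, 9}, 2: {5, 6, 7, 10}},
    locating_censuses={1: "C", 3: "C", 6: "C", 9: "C", 10: "C"},
    first_day=1,
    last_day=10,
)
```

Reading the group, Interp, and Origin values out of figure_4_17 in day order gives the three MEMBERS lines of the figure.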
              

By now it should be clear that interpolation[183] is a function over CENSUS row sets. It is a function: for every input you get exactly one output. It takes sets of CENSUS rows as input. Because sets are unordered, you can put CENSUS rows into the database in any order and the result will be the same. And, because it is a function, you can re-interpolate the same CENSUS rows as many times as desired without altering the final result.

It should also be clear why interpolation always chooses to use the shorter interval, and why this always produces the correct result. The shorter interval is short for a reason: there is some reason to believe the individual is not in the group, otherwise the interval would be longer. Further, every time the shorter interval is chosen a possible overlap with another interval from a different locating census is eliminated. By always choosing the shorter interval, interpolation ensures that the interpolation of any two locating censuses will not conflict.

Pre-Analyzed Data Disturbs Interpolation

In addition to that most important distinction, which classifies CENSUS rows into absent and locating censuses, there is a second distinction which further divides locating censuses into those which interpolate and those which do not. Those CENSUS rows that record observational data are interpolating censuses; these are the rows with Status values of C, D, and M.[184] (All of the previous examples have concerned CENSUS rows of this type.) The remaining CENSUS.Status values indicate that the CENSUS row is the result of analysis; these are all of the old style, that is historical, CENSUS.Status values and the N manual Status value. These are the non-interpolating censuses.

This further division of locating censuses into interpolating and non-interpolating, the division between raw and already analyzed data, leads to the final refinement to the interpolation procedure. We do not want interpolation to produce re-analyzed results from already analyzed data. Interpolation occurs only between regular, that is to say interpolating, censuses (and to the birth date as a special case). Non-interpolating census rows are copied directly from CENSUS to MEMBERS, CENSUS.Status becomes MEMBERS.Origin, and Interp is set to NULL. When a non-interpolating census is found on the birth date, the birth date will not interpolate.
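
The classification, and the direct copy from CENSUS to MEMBERS on the day of a locating census, can be sketched in Python. This is an illustration under stated assumptions rather than Babase's code: the function names are hypothetical, the result is a plain dictionary keyed by the column names used in this section, and the use of an A Status to mark an absence is assumed from the figure keys.

```python
INTERPOLATING_STATUSES = {"C", "D", "M"}  # observational data
ABSENT_STATUSES = {"A"}  # assumption: A marks a censused absence


def classify_census(status):
    """Return 'absence', 'interpolating', or 'non-interpolating'."""
    if status in ABSENT_STATUSES:
        return "absence"
    if status in INTERPOLATING_STATUSES:
        return "interpolating"
    # Everything else (old style statuses and N) is already analyzed data.
    return "non-interpolating"


def members_row_for_census(date, grp, status):
    """MEMBERS-like row for the day of a locating census.

    Returns None for an absence, which locates the individual nowhere.
    """
    kind = classify_census(status)
    if kind == "absence":
        return None
    return {
        "Date": date,
        "Grp": grp,
        "Origin": status,  # CENSUS.Status becomes MEMBERS.Origin
        "Interp": 0 if kind == "interpolating" else None,  # None is NULL
    }
```

For example, members_row_for_census(3, 1, "N") yields a row whose Origin is N and whose Interp is None, the Python stand-in for NULL.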

Interpolation looks at regular census rows and attempts to guess the individual's location on those days when there are no observations. It does so by looking at the intervals between the regular censuses. Finding non-interpolating CENSUS rows, that is to say already analyzed data, on one of these intervals breaks the assumptions interpolation uses in its guessing. The previously analyzed data point could be there for any reason at all, and there's no point in pretending it's not there either. What interpolation does is give up. It interpolates up to the offending data point and then stops.[185] After that it still creates rows in MEMBERS, but it does not attempt to make guesses about where to place an individual or what the interpolated row means.

Note

This situation is not expected to occur, or, rather, whenever there are non-interpolating CENSUS rows between interpolating censuses, the non-interpolating CENSUS rows are expected to be contiguous over the entire interval between the interpolating censuses. So, the expected cases are the trivial degenerate ones. Nonetheless, such situations probably do occur in the existing data. It would probably be best either to require the expected behavior or to get rid of all the pre-analyzed CENSUS rows and replace them with raw data, especially given the design problems pointed out below.

Regardless, non-trivial examples are presented here so that a complete understanding of interpolation can be developed.

Figure 4.18 shows that the 3 fundamental interpolation intervals are shortened when a non-interpolating census is found between interpolating censuses. The intervals for each locating census are examined separately. The non-interpolating census has no interpolation intervals. The intervals of the interpolating censuses are truncated, reduced to the interval between the interpolating and non-interpolating censuses. By this means a portion of the diagram, days 4 and 5, is blocked from interpolating into the group. If there were no N census, the Absence interval would be day 1's shortest interval, and days 4 and 5 (as well as day 3) would interpolate into the group. (Notice that day 1's Absence interval has a midpoint day, day 5, and that it would have been included in the interval.) Interpolation is prevented from placing individuals in the group of their interpolating census on the far side of non-interpolating censuses.

Figure 4.18.  Pre-Analyzed Data Truncates Interpolation Intervals

               CENSUS rows from group 1
    CENSUS:    C       N                       A           C

     Day 1     Intervals per fundamental type
  Presence:    X-----| N                                   X
   Absence:    X-----| N                       O
14 Day Lim:    X-----| N

     Day 3     Intervals per fundamental type
  Presence:            N
   Absence:            N
14 Day Lim:            N

    Day 12     Intervals per fundamental type
  Presence:    X       N             |---------------------X
   Absence:            N                       O     |-----X
14 Day Lim:            N |---------------------------------X

Julian Day:    1   2   3   4   5   6   7   8   9  10  11  12

Key:
C Censused present in group (group 1)
N Manual entry,
    present in group but non-interpolating (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Inside of interval
| Interval endpoint
              

In Figure 4.19 the shortest intervals of each locating census have been chosen and combined; the result is the line labeled For Group. This is then used to determine group membership.

The interesting part of Figure 4.19 is the computation of the Interp values. The halfway to census intervals of Figure 4.18 have been combined and labeled For Interp. Recall that it is these intervals that are used to compute the Interp values. The N census has created a gap in interpolation, clearly shown on the For Interp line as running from day 3 through day 6. Over this interval interpolation's assumptions have been violated and it does not know what to do. The group membership is easy. On day 3, the day of the N census, it can simply copy the CENSUS row's Grp and Status into the appropriate MEMBERS columns in the same fashion it would for any other locating census. On days 4 through 6 it can do what it usually does with group membership when it does not know where to locate an individual: it places the individual in the unknown group with an Origin of I. On days 3 through 6 interpolation has no way of knowing how far away the day is from the nearest locating census, which is what is supposed to go in the Interp column. Due to this lack of information it assigns the Interp column a value of NULL, no data, on this interval.

Figure 4.19.  Pre-Analyzed Data Interrupts Interpolation

               An individual is censused
    CENSUS:    C       N                       A           C
 Intervals
 For Group:    X-----| N                       O     |-----X
For Interp:    X~~~~~|               |~~~~~~~~~~~~~~~~~~~~~X
   MEMBERS.
     Group:    1   1   1   9   9   9   9   9   9   9   1   1
    Interp:    0   1                   5   4   3   2   1   0
    Origin:    C   I   N   I   I   I   I   I   I   I   I   C

      Date:    1   2   3   4   5   6   7   8   9  10  11  12

Key:
C Censused present in group (group 1)
N Manual entry,
    present in group but non-interpolating (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
~ Inside of interval
| Interval endpoint
              

When looking at Figure 4.19, one way to explain what happens to Interp is to say that it is fixed at NULL over that portion of the day 1 census's halfway to census interval that was truncated because the N row showed up. (See Figure 4.18.) Effectively, as MEMBERS Interp counts up with increasing distance from the interpolating census, the count is fixed at NULL upon encountering a non-interpolating census until the point is reached at which counting back down to the next interpolating census begins, at which point the count downward resumes as though never interrupted.[187]
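
The way Interp is fixed at NULL can be sketched in Python for the days between two neighboring interpolating censuses. This is an illustrative sketch, not Babase's implementation. The function names and the midpoint_goes_left predicate are hypothetical; the predicate stands in for the midpoint rule and is needed only when the interval holds an odd number of days. The handling of several non-interpolating censuses in one half is an assumption made for the sketch.

```python
def left_half_end(a, b, midpoint_goes_left=None):
    """Last day of the earlier half of the days strictly between a and b."""
    between = b - a - 1
    if between % 2 == 0:
        return a + between // 2
    mid = (a + b) // 2
    if midpoint_goes_left is None:
        raise ValueError("odd-length interval: supply a midpoint predicate")
    return mid if midpoint_goes_left(mid) else mid - 1


def interp_values(a, b, non_interpolating, midpoint_goes_left=None):
    """Interp for each day from a through b; None stands for NULL.

    a and b are the days of neighboring interpolating censuses.
    """
    split = left_half_end(a, b, midpoint_goes_left)
    in_left = [n for n in non_interpolating if a < n <= split]
    in_right = [n for n in non_interpolating if split < n < b]
    # Assumption: the non-interpolating census nearest each interpolating
    # census is the one that starts the NULL stretch on that side.
    null_from = min(in_left) if in_left else None
    null_to = max(in_right) if in_right else None
    values = {}
    for day in range(a, b + 1):
        if null_from is not None and null_from <= day <= split:
            values[day] = None
        elif null_to is not None and split < day <= null_to:
            values[day] = None
        else:
            values[day] = min(day - a, b - day)
    return values


# Figure 4.19: C censuses on days 1 and 12, an N census on day 3.
figure_4_19 = interp_values(1, 12, non_interpolating=[3])
```

With the data of Figure 4.19 the sketch leaves days 3 through 6 at None and resumes the count downward at day 7, as on the figure's Interp line.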

The approach interpolation takes, in some sense, attempts to minimize the disturbance created when already analyzed census data are mixed in with raw census information. However, as can be seen in Figure 4.19, it is not entirely successful. Although day 7, for example, has an Interp value indicating it is 5 days away from a census, it is really 4 days away from the N census. If the N CENSUS row really does represent a census, then day 7's Interp value is wrong. And the problems are not restricted to Interp values. Is it really true that days 4 and 5 should be assigned to the unknown group? If so, then why aren't there N rows that say so? Day 2 is even more disturbing. There is no diagram for this, but suppose the N census found the individual in a different group. Figure 4.18 would be unchanged; all of day 1's intervals would be truncated at the N census. The effect would be clearer if the interval between the preceding C census and the following N census were larger, but consider that day 2, by the midpoint rule, would be assigned to the N census. That means that if the N census really does represent a census in a different group, day 2 should be assigned to that group, not to group 1.

Note that, in the general case, even though the halfway to census interval does not determine group membership (all the intervals are truncated, leaving a gap in which interpolation defaults to the unknown group), whether this interval has a midpoint day, and if so where it falls, does matter to the computation of Interp. If the midpoint day happens to fall into the side of the interval containing the non-interpolating census then the Interp value will be NULL. Otherwise, it will have a value representing the number of days to the nearest locating, and interpolating, census.

Incorporating the above safety checks into the rules we already have, ensuring that data are not re-analyzed, produces the actual interpolation rules.

The Interpolation Rules

Using these rules interpolation creates rows in MEMBERS based on the information it finds in CENSUS, and the BIOGRAPH columns Birth, Matgrp, Statdate and Status.

  1. CENSUS Rows Are Either Absences, Interpolating, or Non-Interpolating

    Interpolation partitions all CENSUS rows into one of 3 categories:

    1. Absences

      CENSUS rows which indicate absence from a group.

    2. Interpolating censuses

      Those CENSUS rows that record observational data are interpolating censuses; these are the rows with Status values of C, D, and M.

    3. Non-interpolating censuses

      The remaining CENSUS.Status values indicate the CENSUS row is the result of analysis. These rows, all of the old style, that is historical, CENSUS.Status values and the N manual Status value, are not re-analyzed and so do not interpolate.

    For convenience, the CENSUS rows that are not absences, the interpolating and the non-interpolating censuses, are termed locating censuses.

  2. Censusing Assigns Group Membership

    On those days when an individual is censused in a group, when there is a locating CENSUS row, a row is created in MEMBERS to place that individual in the group on the given day. The Origin value is the CENSUS row's Status value. When the CENSUS row is interpolating the Interp value is 0. When the CENSUS row is non-interpolating the Interp value is NULL.

  3. The 3 Interpolation Intervals

    Interpolation places an individual in the group into which he is censused, the Grp of an interpolating CENSUS row (Status values C, D, and M), on the days to either side of the census being interpolated for a time period that is the shorter of:

    1. The Halfway to Census Interval

      Half of the time interval between the individual's next (or prior) locating and interpolating census, which may locate the individual in any group.

    2. The Halfway to Absence Interval

      Half of the time interval between the next (or prior) recorded absence, considering only absences from the same group in which the individual was censused. Absences from other groups are ignored.

    3. The 14 day Interpolation Limit

      Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.

    The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the nearest locating census; Interp values count up over The Halfway to Census Interval as the distance from the interpolated census increases. An interpolated MEMBERS row falling on the day after a census has an Interp of 1, the day after that the Interp is 2, and so forth, assuming, of course, the individual has no other nearby CENSUS rows.

  4. The Midpoint Rule

    This rule qualifies how interpolation assigns the halfway point between two CENSUS rows in The Halfway to Census Interval and The Halfway to Absence Intervals, above, when the number of days in the interval cannot be divided into equal halves. Whenever interpolation is called upon to halve an interval between two CENSUS rows that contains an odd number of days then the midpoint day is assigned to the left, earlier, half of the interval when the Julian date of the midpoint day is even. A midpoint day is assigned to the right, later, half of the interval when the Julian date of the midpoint day is odd.

  5. Births Locate Individuals

    This rule declares a live birth to be the equivalent of an interpolating census, one that indicates presence in the individual's Matgrp. Fetal losses, individuals with NULL Snames, are not considered births and are never interpolated. An individual is placed in his Matgrp on his birth date even when a regular census has an absence recorded for the individual on the date of birth. In this case interpolation always entirely ignores the absence and will not use such an absence to compute a Halfway To Absence Interval.

    When there is a locating census on the birth date, the MEMBERS row interpolation creates is like that made for any other locating census with the given Status. But when there is no locating census on the birth date, the resulting MEMBERS row has an Origin of I (and an Interp of 0, as any census with a Status of C would have). Aside from their I Origin value, births interpolate as would any CENSUS with a C Status.

  6. No Data Implies Unknown Group Membership

    On days when none of the above rules serve to place an individual in a group, the individual is placed in the unknown group. The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the individual's nearest interpolating census.[188]

  7. Birth stops interpolation

    Interpolation will not place a row in MEMBERS before an individual's Birth date.

  8. Death stops interpolation

    When an individual is dead, interpolation will not place a row after the individual's Statdate.

  9. Data Entry Cessation Stops Interpolation of Living Individuals.

    When an individual is alive, interpolation will create rows after the individual's last locating census only when there are subsequent absences; absences, that is, from the group in which the individual was censused.[189] In this case, unlike above, no data does not imply unknown group membership; such rows are created only so long as the individual is interpolated into the group of his last locating census. When a living individual has no absences after their last locating census, absences from the group of their last locating census, interpolation assumes that there is further data available which has yet to be entered and interpolation stops at the last locating census.

  10. Data are not Re-Analyzed

    Interpolation is only done to regular, that is interpolating, CENSUS rows; data that were collected in the field. Other data, the non-interpolating census rows that represent the result of prior analysis, do not interpolate; they are copied directly from CENSUS to MEMBERS, CENSUS.Status becomes MEMBERS.Origin and Interp is set to NULL. Further, when a non-interpolating census is found on one of The 3 Interpolation Intervals the interval is shortened enough that the non-interpolating census is no longer on the interval. When a non-interpolating census is found on a birth date, the birth date does not interpolate.

    The MEMBERS Interp column is fixed at NULL on the interval from the non-interpolating census row through the midpoint end of The Halfway to Census Interval, endpoints included.[190] Here we are speaking of The Halfway to Census Interval as computed, not a Halfway to Census Interval shortened in the preceding paragraph.
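
The choice among The 3 Interpolation Intervals, together with the truncation required by the Data are not Re-Analyzed rule, can be sketched in Python. This is a rough illustration, not Babase's implementation: the function name and its arguments are hypothetical, the caller is assumed to pass only absences from the group of the census being interpolated, and the midpoint day of an odd-length interval is deliberately left uncounted because The Midpoint Rule decides which half receives it.

```python
INTERPOLATION_LIMIT = 14  # days, The 14 day Interpolation Limit


def days_interpolated(census_day, other_census_day=None, absence_day=None,
                      non_interpolating_day=None):
    """Whole days placed in the census's group on one side of the census.

    Each optional argument is the day of the nearest such CENSUS row on
    the side being examined, or None when there is none.
    """
    candidates = [INTERPOLATION_LIMIT]
    if other_census_day is not None:
        # The Halfway to Census Interval (midpoint day not counted).
        candidates.append((abs(other_census_day - census_day) - 1) // 2)
    if absence_day is not None:
        # The Halfway to Absence Interval (midpoint day not counted).
        candidates.append((abs(absence_day - census_day) - 1) // 2)
    if non_interpolating_day is not None:
        # Truncation: stop short of the non-interpolating census.
        candidates.append(abs(non_interpolating_day - census_day) - 1)
    return min(candidates)
```

Applied to Figure 4.18, days_interpolated(1, 12, 9, 3) looks forward from the day 1 census and days_interpolated(12, 1, 9, 3) looks backward from the day 12 census; each call yields a single day, matching days 2 and 11 on the For Group line of Figure 4.19.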

Expectations and Implications

It is expected that all non-interpolating CENSUS rows, that is to say CENSUS rows produced by prior analysis, will be clustered in contiguous intervals with regular census rows at the endpoints. This is particularly expected of old style census rows from before Babase, as they precede all regular census data, but is also expected of the N non-interpolating, manual, Status code, should it ever be used. If these expectations are borne out, the Data are not Re-Analyzed rule will never be invoked.
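
Whether one individual's data meets this expectation could be checked along the following lines. This Python sketch is an illustration only: the function name and its inputs, plain collections of census days, are hypothetical and are not part of Babase.

```python
def unexpected_gaps(interpolating_days, non_interpolating_days):
    """Find gaps only partly filled by non-interpolating censuses.

    Returns the (a, b) pairs of neighboring interpolating census days
    whose gap contains some, but not all, days with a non-interpolating
    census. An empty list means the expectation holds.
    """
    offenders = []
    days = sorted(interpolating_days)
    non_interp = set(non_interpolating_days)
    for a, b in zip(days, days[1:]):
        gap = set(range(a + 1, b))
        inside = gap & non_interp
        if inside and inside != gap:
            offenders.append((a, b))
    return offenders
```

The data of Figure 4.19, unexpected_gaps([1, 12], [3]), is reported as an offender, while a gap completely filled with non-interpolating censuses, as in unexpected_gaps([1, 4], [2, 3]), is not.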

There are some not-quite-obvious implications given these interpolation rules:

  • The only rows in MEMBERS that have an Origin of I, and an Interp of 0, and are not placed in the unknown group are birth dates. Not every birth date will have an associated MEMBERS row with these values, as some birth dates have locating censuses, but MEMBERS rows with these values will be birth dates.

  • Living individuals, but not dead ones, can have MEMBERS rows created by the interpolation procedure that locate the individual in a group on a date later than the individual's Statdate.[191]

  • So long as an individual is alive the last CENSUS to locate the individual ought to be followed by a record of absence, an absence from the group where the individual was last found. To do otherwise, as must occur when there is simply no further data to be entered, is to introduce a bias into MEMBERS.

  • The only rows in MEMBERS with an Origin of I and an Interp of NULL are those in the unknown group which were created by the Data are not Re-Analyzed rule above.

  • As fetal losses, individuals with NULL Snames, cannot appear in CENSUS, are not considered live births, and always have their birth date equal to their Statdate, they never have MEMBERS rows associated with them.

  • When computing Interp values from The Halfway to Census Interval The Midpoint Rule is usually immaterial. However, when non-interpolating censuses affect the interpolation The Midpoint Rule can be the factor that determines whether a MEMBERS row has a NULL Interp value.



[170] At this time only DEMOG, the demography notes table, contributes to CENSUS any information regarding group membership.

[171] Sometimes, when demography information is added into other tables, CENSUS rows are altered rather than added. Likewise, CENSUS rows are removed (or altered as necessary) when demography information is removed from other tables.

[172] A census finding the individual in his Matgrp -- or so one would hope.

[173] This is the one exception, if you wish to consider it so, to the rule that an individual cannot be censused both present and absent in the same group on the same day.

[174] The same group condition is one that must be met whenever interpolation examines intervals between presence and absence.

[175] As the individual is alive, every census that post-dates the individual's Statdate must record an absence, else the Statdate would be adjusted to reflect the date of last census.

[176] This is a heuristic. While it should work well enough most of the time the Babase user must be aware of the pitfalls in this approach. These are explained below.

[177] Without this restriction interpolation would have to insert rows forever, placing the individual in the unknown group off into the indefinite future.

[178] Notice that interpolation does not bother analyzing absences, such as the last-most, that are not neighbor to censuses.

[179] Note that the intervals spoken of here are always anchored at one end by a census that finds an individual in a group. Each such census can therefore have 2 intervals associated with it, one of the days preceding the census date and one containing the following days. These intervals can then appear in the diagrams as single lines that contain a census date. It is important to remember that there are really 2 intervals depicted; one line that ends on the date of the census and another that begins at that point.

[180] As locating censuses are interpolated individually the figure could diagram the intervals associated with each census separately, as in Figure 4.14, work out group membership from that, and then combine the results; the outcome would be unaffected. The chosen presentation form allows the interval endpoints to match up in a revealing fashion. As an exercise the reader should prove to himself that the intervals associated with each locating census are accurately depicted, and that the order in which locating censuses are interpolated does indeed make no difference.

[181] Figure 4.14: “A Closer Look at Intervals” makes clear that it is not necessary to show these intervals. By definition, the omitted intervals will always be longer than the halfway to census interval of the census being interpolated. As the shorter interval is the one used the longer may be ignored.

[182] When there are two intervals. When there's no absence interval the Used: line shows the presence interval.

[183] The proper term is The Glorious Interpolation Procedure, but we don't tell this to just anybody.

[184] See MEMBERS.Origin.

[185] It might be better if interpolation did not interpolate at all on those intervals between interpolating censuses that contain a non-interpolating census[186] -- if it put the individual in the unknown group, with an Interp of 0 and an Origin of NULL whenever there was no locating census. However, this could easily cause problems because interpolation has always worked as the body of this document describes. Although these situations are not supposed to occur, it is likely the data contains such situations and changes should not be made to interpolation which break the database.

[186] I have not thought this through. At first glance it seems the code would be simpler, but perhaps not. And the effect on data analysis is unclear. It is probably best to adopt one of the solutions presented in the note below.

[187] Although in this example we count up traversing the timeline from left to right, had the N census been closer to the right side of the diagram than the left we would be counting up the interval by traversing the timeline in the opposite direction, from right to left.

[188] The same method is used to compute Interp values when interpolation uses The 3 Interpolation Intervals, above.

[189] This same group criterion corresponds with the criterion found in The Halfway to Absence Interval.

[190] Interp is fixed at NULL over the portion of The Halfway to Census Interval that was truncated in the preceding paragraph. Effectively, as MEMBERS Interp counts up with increasing distance from the interpolating census, the count is fixed at NULL upon encountering a non-interpolating census until the point is reached at which counting back down to the next interpolating census begins, at which point the count downward resumes as though never interrupted.

[191] This is examined in detail in Interpolation at the Statdate.


Page generated: 2016-07-22T23:08:19-04:00.