The Babase database uses a procedure called interpolation to update MEMBERS whenever the CENSUS table, the BIOGRAPH.Birth column, or the BIOGRAPH.Statdate column is updated. Interpolation extrapolates the group membership of individuals into days for which there is no actual observation of the individuals' whereabouts. It “guesses” in which group an individual is primarily, physically, located, given knowledge of the individual's group membership (or lack thereof) at given points in time, and records the result in MEMBERS. Thus, MEMBERS always has a row recording group membership for every day of every individual's life.
This section is comprised of 3 sub-sections. The first section introduces interpolation incrementally. Rules are presented in an informal fashion and examples and exceptions progressively developed. The second section is a formal specification of interpolation. The third section supplements the formal specification with expectations regarding the use of interpolation and brief descriptions of interpolation's implications. Most of the third section is a restatement of material already presented in the first section.
It is primarily by the field census records that Babase tracks group membership. However, despite its name, within the Babase database the CENSUS table is the source of all group membership information and so contains data from sources other than just the field census records. Babase places rows in the CENSUS table to indicate presence in a group whenever any demography information is stored in other tables.[185][186] Throughout this section it is to be understood that any sort of demographic information that results in CENSUS data is implied when the term census, or its plural, is used. Unfortunately, the term census is further overloaded. It is occasionally used in the colloquial sense, meaning present -- found when a group census was taken, the alternative being absent. It is hoped the meaning will be clear from context.
It is important to remember that censuses record absence from a group as well as presence in a group, that there are two mutually exclusive classes of CENSUS rows: absences, records of absence from specific groups on specific days; and “locating censuses”, records that place the individual in specific groups on specific days.
The premise of interpolation is that an individual is assumed to be in the group where observed for a period of 14 days to either side of the observation unless there's indication otherwise. To this end, interpolation keeps an individual in the group where a census locates him for a time period that is the shorter of:
Half of the time interval between the individual's next (or prior) census that finds the individual in any group.
Half of the time interval between the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.
The 14 day Interpolation Limit. Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.
Should the above process not place an individual in a group, the individual is placed in the unknown group; so long as the individual is alive on the day in question.
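The comparison of the three limits can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not Babase's implementation; the function name days_held_forward and its parameters are hypothetical. It looks forward from one locating census (the backward direction is symmetric) and leaves out any day that falls exactly on a midpoint, a case the text takes up separately.

```python
INTERPOLATION_LIMIT = 14  # days


def days_held_forward(census_day, next_locating=None, next_absence=None):
    """List the days after a locating census that stay in the censused group.

    next_locating: day of the next census finding the individual in any group.
    next_absence:  day of the next absence from the censused group.
    Either may be None when no such CENSUS row exists.
    """
    held = []
    for offset in range(1, INTERPOLATION_LIMIT + 1):
        for event_day in (next_locating, next_absence):
            # Stay strictly inside the nearer half of the interval.
            # A day exactly on the midpoint is left out of this sketch.
            if event_day is not None and 2 * offset >= event_day - census_day:
                return held
        held.append(census_day + offset)
    return held
```

For example, a census on day 3 followed by an absence from the same group on day 8 and a locating census on day 11 holds days 4 and 5; with no later CENSUS rows at all, a census on day 1 holds days 2 through 15.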
There are some subtleties to these rules, and there is further elaboration necessary to allow for “old style” CENSUS rows, which do not directly correspond with actual census taking, and other factors. But these rules are the foundation and we begin with them.
Interpolation is best described with the help of diagrams as it is all about computing and comparing time intervals of various lengths, which are easily represented in a diagram by lines of various lengths. We begin with the simplest case, censusing a single individual either present or absent in a single group. This simple case is elaborated on extensively to illustrate a variety of special cases such as birth, death, prolonged periods without observation, and so forth, before introducing the complexities of multiple groups into the example.
As the examples throughout this section are developed be sure to pay close attention to the diagrams' keys. At times the meaning of a symbol changes from diagram to diagram to reflect a subtlety.
Figure 4.1 shows a record of one individual's censuses. The group, for the moment we'll assume group 1, is censused 4 times over a period of 11 days. One day the individual is absent.
Figure 4.1. An Individual is Censused Present and Absent
One individual's census records
CENSUS:       C       C                   A           C
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
The first step in interpolation is to construct the various intervals from the given CENSUS rows. Figure 4.2 shows how interpolation “splits the difference” between presences and absences to construct two intervals for each locating census, one preceding the census and one following it. As the diagrams given here can only show a window in time and omit what falls outside that window, only one interval each is shown for the censuses taken on day 1 and day 11.
Figure 4.2. Interpolating From Presences and Absences
Interpolation intervals within a group
CENSUS:       C       C                   A           C
Intervals:    X---|---X---------|         O     |-----X
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
Interpolation creates MEMBERS rows that place the individual in a group each day. Figure 4.3 shows how group membership assignment is based upon the computed intervals. Because of the absence, the individual is placed in group 9, the unknown group, on some days.
Figure 4.3. Interpolating Group Membership
Intervals determine group membership
CENSUS:       C       C                   A           C
Intervals:    X---|---X---------|         O     |-----X
MEMBERS.
Group:        1   1   1   1   1   9   9   9   9   1   1
Origin:       C   I   C   I   I   I   I   I   I   I   C
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
Figure 4.3 also introduces the MEMBERS' Origin column. As can be seen, the Origin column mimics the corresponding CENSUS Status column on those days when interpolation is not guessing group membership. Origin is I on those days when interpolation is guessing.
The MEMBERS' Interp column represents the number of days from a census in which an individual was recorded as present in some known group. Interp is zero on those days when a census has located the individual. The recorded absence is reflected in the group, but is immaterial to Interp. Even though there's an absence, the Interp count is over the interval between the two locating censuses. Interp gets its value from a “split the difference” between censuses that record presence in the group, a different sort of “split the difference” than is used to determine into which group an individual should be placed. Figure 4.4 extends Figure 4.3, showing the computation of Interp. With this addition the interpolation has finished; the MEMBERS table can be constructed from the given CENSUS rows.
Figure 4.4. Computing Interp Values
The resulting MEMBERS rows
CENSUS:       C       C                   A           C
Intervals
For Group:    X---|---X---------|         O     |-----X
For Interp:   X~~~|~~~X~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~X
MEMBERS.
Group:        1   1   1   1   1   9   9   9   9   1   1
Interp:       0   1   0   1   2   3   4   3   2   1   0
Origin:       C   I   C   I   I   I   I   I   I   I   C
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
~ Inside of interval
| Midpoint of interval
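Read this way, the Interp row of Figure 4.4 is simply each day's distance to the nearest locating census, with absences ignored. The short Python sketch below illustrates that reading; the function name interp_values is hypothetical, and the sketch covers only the simple case shown so far (no birth, Statdate, or non-interpolating rows).

```python
def interp_values(locating_days, first_day, last_day):
    """Distance in days from each day to the nearest locating census.

    Absences are deliberately not consulted.
    """
    return [
        min(abs(day - census_day) for census_day in locating_days)
        for day in range(first_day, last_day + 1)
    ]


# Locating censuses on days 1, 3 and 11, as in Figure 4.4.
print(interp_values([1, 3, 11], 1, 11))
```

The printed list, [0, 1, 0, 1, 2, 3, 4, 3, 2, 1, 0], matches the Interp row of the figure.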
So far we have only explored the first 2 of the 3 fundamental interpolation intervals, those dealing with being censused present and absent. Before we elaborate further and examine the more complicated interactions between presences and absences let us dispense with the 14 day interpolation limit.
Figure 4.5 shows the effect of the 14 day interpolation limit. To save space in this document, some days are removed from the interval. There are no censuses, present or absent, on the days omitted. As the “Date:” line shows, a total of 33 days are examined, an entire month 31 days in length and the first two days of the following month. Again, we assume the censuses are taken in group 1.
Figure 4.5. The 14 Day Interpolation Limit
The shorter intervals are chosen CENSUS: C C C C Interval: X----- ... -----------|------- ... ---------X 14 Day Limit: X----- ... -------| |--- ... ---------X MEMBERS. Group: 1 1 ... 1 1 9 9 1 ... 1 1 1 Interp: 0 1 ... 13 14 15 15 14 ... 2 1 0 Origin: C I ... I I I I I ... I I C Date: 1 2 ... 14 15 16 17 18 ... 31 1 2 Key: C Censused present in group (group 1) X Known present in group (group 1) - Inside of interval | Interval endpoint
Because the 16th and 17th are more than 14 days away from either census the individual is placed in the unknown group on those days. Days that are closer to the actual censuses are interpolated into group 1. So, as the rules require, the individual is interpolated into the censused group for the shorter of the two time periods. As before, all the interpolated MEMBERS rows, those which do not correspond to an actual census, have an Origin of I. And as before, the Interp column counts up from and down to the actual censuses.
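The effect of the limit on a long gap can be checked with a small Python sketch. It assumes two locating censuses in the same group with no other CENSUS rows between them; the function name days_beyond_limit and the day numbers used in the example are hypothetical, chosen only for illustration.

```python
def days_beyond_limit(first_census, second_census, limit=14):
    """Days between two locating censuses that lie more than `limit`
    days from both of them. Assumes no other CENSUS rows in between."""
    return [
        day
        for day in range(first_census + 1, second_census)
        if day - first_census > limit and second_census - day > limit
    ]
```

With censuses on days 1 and 32 of a running day count, days_beyond_limit(1, 32) returns [16, 17]: only those two days are more than 14 days from both censuses, so only they would fall into the unknown group. When the censuses are 28 days apart or less the list is empty and the limit never comes into play.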
There are some exceptions to the rules as stated so far. Not surprisingly, interpolation will not presume to put an individual in a group, create a MEMBERS row, before the individual's Birth date.
The birth date is an exception in another fashion: it locates the individual in his Matgrp like a special sort of census. The rationale for this is that although the birth may not be observed, the individual most certainly enters the group when born. Further, this rule ensures that we have a row in MEMBERS for every day the individual is alive. When there is a regular census on the birth date[187] the resultant MEMBERS row, having a date matching the individual's birth date, is no different from the individual's other MEMBERS rows that have dates which match the individual's other census dates; they all have an Origin of C and an Interp of 0. When there is no locating census on the birth date the resulting MEMBERS row still has a 0 Interp value, but has an Origin of I, not C. The Origin reflects the fact that there was no actual census, while the Interp shows that the individual was located that day. Figure 4.6 shows an individual that was not censused on his birth date.
Figure 4.6. Interpolation at Birth
Individual born into group 1 CENSUS: B C C C Intervals: X-----|-----X-|-X-----|-----X MEMBERS. Group: 1 1 1 1 1 1 1 1 Interp: 0 1 1 0 0 1 1 0 Origin: I I I C C I I C Date: 1 2 3 4 5 6 7 8 9 10 Key: B Born (into group 1) C Censused present in group (group 1) X Known present in group (group 1) O Known absent in group (group 1) - Presumed in group (group 1) | Midpoint between census takings
Clearly, there are no MEMBERS rows before the birth date, the individual is in his Matgrp on the day of his birth, and the Interp value counts up from the birth date and then down to the next census as though there were a census on the birth date.
An individual is placed in his Matgrp on his birth date even when a regular census has an absence recorded for the individual on the date of birth.[188]
Another exception to the rules, or rather two exceptions, occur at the Statdate. You might expect that interpolation would not place a row after the individual's Statdate, and this is indeed true, but true only when the individual is dead. When an individual is alive, interpolation will place a row after the individual's Statdate, but only when there is a subsequent absence from the same group as the group in which the individual was censused.[189][190] While at first this may seem odd, the reasoning behind this behavior is clear -- the Statdate is not the last date on which there are data for the individual. This is elaborated below.
All the same, at times there is a reason to have interpolation halt at the Statdate. When individuals are alive the system should not try to interpolate into time periods for which data have yet to be entered, otherwise there would always be spurious interpolated MEMBERS rows which vanish as soon as additional data are entered. The trouble with creating such rows is that, although the interpolation is corrected and the rows disappear once data entry resumes, the use of these rows in analysis is always inappropriate. Such rows will exist at the end of every period of data entry, as there will always be a large number of living individuals found in their groups on the last census entered. The solution is to not create the rows.[191] When a living individual has no absences from the group of his last locating census that post-date that census, this is taken to mean that there are additional as yet unentered data on the individual. In this case interpolation stops on the day the individual was last found in a group. This situation is shown in Figure 4.7, where the last census taken found the individual in group 1 on day 5, and so this day is the individual's Statdate as well. There is no interpolation past the last census.
Figure 4.7. Alive and Present When Last Censused
Living individual with Statdate of 5
CENSUS:       C           A   C
Intervals:    X-----|     O |-X
MEMBERS.
Group:        1   1   9   9   1
Interp:       0   1   2   1   0
Origin:       C   I   I   I   C
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
In Figure 4.8 more data have been entered, the individual has been missing since the last census shown in Figure 4.7 above. As there have been no further censuses during which the individual was found the individual's Statdate is still day 5, although there is now subsequent interpolation. Notice that there are no MEMBERS rows created after day 7. When interpolating a living individual, after the Statdate there is no default placement of the individual into the unknown group.[192]
Figure 4.8. Alive and Absent in Last Census[193]
Living individual with Statdate of 5
CENSUS:       C           A   C                   A   A
Intervals:    X-----|     O |-X---------|         O
MEMBERS.
Group:        1   1   9   9   1   1   1
Interp:       0   1   2   1   0   1   2
Origin:       C   I   I   I   C   I   I
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
Although the only change between Figure 4.7 and Figure 4.8 is the entry into CENSUS of rows recording absence, that is enough to signal that interpolation can go forward without creating spurious MEMBERS rows -- rows likely erased upon the entry of more data. It is important that interpolation does go forward in this case, past the Statdate, as otherwise bias would be introduced. The last C CENSUS row would be interpolated differently from all the other censuses. To be sure, there is bias introduced in Figure 4.7 when interpolation is cut short. But censoring bias at the end of data collection is unavoidable, whereas we can avoid introducing bias here.
So long as an individual is alive the last CENSUS to locate the individual ought be followed by a record of absence, an absence from the group where the individual was last found. To do otherwise, as must occur when there is simply no further data to be entered, is to introduce a bias into MEMBERS.
In Figure 4.9 there is no additional census information, but the individual's Status has been adjusted to mark the individual dead. A new Statdate value indicates the individual died on day 9 and interpolation is now up to and including the day of death. As is usual, when an individual's group membership cannot be determined he is placed in the unknown group.
Figure 4.9. Interpolation to Statdate When Dead
Dead individual with Statdate of 9
CENSUS:       C           A   C                   A   A
Intervals:    X-----|     O |-X---------|         O
MEMBERS.
Group:        1   1   9   9   1   1   1   9   9
Interp:       0   1   2   1   0   1   2   3   4
Origin:       C   I   I   I   C   I   I   I   I
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
Although Figure 4.9 does not show this, the 14 day interpolation limit applies when the individual is dead. When there are no absences after the last census and there are more than 14 days between the last census and the Statdate the individual is placed in the unknown group from the 15th day through the day of death.
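The three situations of Figures 4.7 through 4.9 differ only in where the MEMBERS rows end. The Python sketch below is one way to summarize that, under the rules as described in this section; the function name last_members_day and its parameters are hypothetical, later_absences is assumed to hold only absences from the group of the last locating census that are dated after it, and a day falling exactly on a midpoint is excluded rather than resolved.

```python
def last_members_day(last_locating, statdate, alive, later_absences=(), limit=14):
    """Last day that receives a MEMBERS row in this simplified reading."""
    if not alive:
        # Dead: rows run through the Statdate, the day of death.
        return statdate
    if not later_absences:
        # Alive with no later absence from the last group: more data are
        # presumed to be coming, so stop at the last locating census.
        return last_locating
    gap = min(later_absences) - last_locating
    # Keep the days strictly inside the nearer half of the gap,
    # and never more than `limit` days past the census.
    return last_locating + min(limit, (gap - 1) // 2)
```

Called with the values of the figures, last_members_day(5, 5, True) gives 5 (Figure 4.7), last_members_day(5, 5, True, [10, 11]) gives 7 (Figure 4.8), and last_members_day(5, 9, False, [10, 11]) gives 9 (Figure 4.9).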
The alert reader may have noticed that the above examples are carefully crafted so that the midpoint between presences and absences always falls between two days. What happens when there is an odd number of days in the interval so that the midpoint is a day exactly in between the endpoints, as occurs 3 times in Figure 4.10?
Figure 4.10. Midpoint Days
Intervals with an odd number of days
CENSUS:       C       A               C   C       A   C
Intervals:    X---|   O       |-------X-|-X---|   O |-X
Date:         1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
The MEMBERS table has a 1 day precision, there is no way to be in a group in the morning and out of it in the afternoon, so on any one midpoint day the individual must either be in the group or out of it. Should the individual be in the group on midpoint day or out of it? The question is resolved using a property of the date itself. Briefly, the Julian dating system is a method of assigning every day a unique number. As a midpoint day is no more likely to be on one day than another, we can avoid bias by using whether or not the midpoint day falls on an even or an odd Julian date to resolve the problem.
Whenever interpolation is called upon to halve an interval between two CENSUS rows that contains an odd number of days then the “midpoint day” is assigned to the left, earlier, half of the interval when the Julian date of the midpoint day is even. A midpoint day is assigned to the right, later, half of the interval when the Julian date of the midpoint day is odd.
So, the Midpoint Rule resolves the issue by adjusting the intervals as shown in Figure 4.11. The intervals are no longer perfectly halved. On the midpoint day there is no preference either for or against interpolating the individual into the group censused.
Figure 4.11. The Midpoint Rule Adjusts Intervals
Intervals with an odd number of days
CENSUS:       C       A               C   C       A   C
Intervals:    X-----| O     |---------X-|-X-|     O |-X
MEMBERS.
Group:        1   1   9   9   1   1   1   1   9   9   1
Interp:       0   1   2   3   2   1   0   0   1   1   0
Origin:       C   I   I   I   I   I   C   C   I   I   C
Julian Date:  1   2   3   4   5   6   7   8   9   10  11
Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Interval endpoint
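The Midpoint Rule itself is a parity test. The following Python sketch restates it for a pair of CENSUS rows; the function name midpoint_owner and its return values are hypothetical, and, as in Figure 4.11, the day numbers are treated as if they were Julian dates.

```python
def midpoint_owner(earlier_day, later_day):
    """Say which half of the interval receives the midpoint day.

    Returns "earlier", "later", or None when the midpoint falls
    between two days and there is nothing to assign.
    """
    span = later_day - earlier_day
    if span % 2 != 0:
        return None  # even number of days in the interval, no midpoint day
    midpoint = earlier_day + span // 2
    # Even Julian date: earlier half. Odd Julian date: later half.
    return "earlier" if midpoint % 2 == 0 else "later"
```

Applied to Figure 4.11: between days 1 and 3 the midpoint (day 2) goes to the earlier half, so the individual stays in group 1; between days 3 and 7 the midpoint (day 5) goes to the later half, the census on day 7; between days 8 and 10 the midpoint (day 9) goes to the later half, the absence on day 10.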
Having dispensed with the various elaborations and exceptions that occur in unusual cases it is time to return to the fundamentals of interpolation and examine what happens when an individual moves between groups. What comes into play are the first 2 of the 3 interpolation intervals. Recall:
Interpolation keeps an individual in the group where a census locates him for a time period that is the shorter of:
Half of the time interval between the individual's next (or prior) census which finds the individual in any group.
Half of the time interval between the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.
Figure 4.12 shows a record of one individual's censuses. He, a male, is censused in 2 groups, group 1 and group 2. The census records for each group reflect both presence in the group and absence from the group.
Figure 4.12. An Individual is Censused in 2 Groups
One individual's census records
Group 1:      C       C                   A   C   A
Group 2:      A                   C               C
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
Figure 4.13 shows what would happen if interpolation worked with each group separately. There are conflicts, days when the individual is in both groups. Something else must be done.
Figure 4.13 is an example of an interpolation method that does not work. The method shown in the figure is not one Babase uses when interpolating.
Figure 4.13. Interpolating Each Group Separately
One individual's census records
Group 1:      C       C                   A   C   A
Group 2:      A                   C               C
Group 1 (interpolating just group 1)
CENSUS:       C       C                   A   C   A
Intervals:    X---|---X---------|         O |-X-| O
Group 2 (interpolating just group 2)
CENSUS:       A                   C               C
Intervals:    O         |---------X-------|-------X
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
X Known present
O Known absent
- Presumed present
| Interval endpoint
The solution is to return to the interpolation fundamentals. We begin by taking a closer look at the way we have been diagramming intervals. In Figure 4.13 the first group has 3 locating censuses and 2 absences, and yet we've diagrammed the resultant intervals on a single line. The interpolation fundamentals tell us to obtain 2 pairs of intervals for each locating census: a “halfway to census” pair of intervals and a “halfway to absence” pair of intervals. Figure 4.14 takes the CENSUS rows of the first group shown in Figures 4.12 and 4.13 and does this for each locating census. In Figure 4.14 the CENSUS rows of days 1, 3 and 9 each have their own sections detailing the intervals to the nearest censuses and intervals to the nearest absences. The lines labeled Presence show the intervals that are halfway from each locating census to the next. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence. This detailed breakdown is followed by a composite interval diagram of the familiar type encountered in figures 4.2 through 4.13 above. It should be clear that we have arrived at the “composite” form of the interval diagram by following the fundamentals; the composite is made up of the shorter of each census's intervals. The result is correct; the composite constructed in Figure 4.14 is identical to the one shown previously in Figure 4.13. It had better be, or else the interpolations of Figure 4.13 would be in conflict with the fundamental interpolation rules.
Figure 4.14. A Closer Look at Intervals
CENSUS rows from group 1
CENSUS:       C       C                   A   C   A
Day 1 (intervals by presence and absence)
Presence:     X---|   X
Absence:      X-------------|             O
Day 3 (intervals by presence and absence)
Presence:     X   |---X-----------|           X
Absence:              X---------|         O
Day 9 (intervals by presence and absence)
Presence:             X           |-----------X
Absence:                                  O |-X-| O
Combining the shorter intervals
Interval:     X---|---X---------|         O |-X-|
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
The intervals in Figure 4.14 did not have to be grouped by censused day; they could have been grouped by Presence and Absence or any other way. For each set of locating censuses we can always split out the “halfway to census” intervals from the “halfway to absence” intervals, group them any way we like, and later use the interpolation fundamentals to recombine them, without affecting the result. This has not been necessary so far, but it is essential if we are to correctly interpolate when an individual moves between groups, as above in Figure 4.12: “An Individual is Censused in 2 Groups”. We must return to the fundamentals to make sense of interpolation. Rather than trying to combine the results of interpolating the groups separately, as was done in Figure 4.13: “Interpolating Each Group Separately”, instead combine the results of interpolating the presences in all the groups with separate interpolations of the absences in each group. Each time a census finds an individual in a group, separately compute both the interval halfway to the nearest census that finds the individual in any group and the interval halfway to the nearest absence from the particular group being censused.[194] In Figure 4.15, this method is applied to the data first seen in Figure 4.12. For clarity the intervals surrounding the censuses that belong to one group are shown separately from those belonging to the other group.[195] The lines labeled Presence show the intervals that are halfway from each census to the nearest census that finds the individual in any group. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence in the same group. Censuses with no neighboring absence do not have this latter sort of interval shown.[196]
Figure 4.15. Presence and Absence Interpolated Separately
One individual's census records
Group 1:      C       C                   A   C   A
Group 2:      A                   C               C
Group 1 (the intervals of group 1's censuses)
Presence:     X---|---X-----|     x     |-----X-| x
Absence:              X---------|         O |-X-| O
Group 2 (the intervals of group 2's censuses)
Presence:     x       x     |-----X-----|     x |-X
Absence:      O         |---------X
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
Figure 4.16 shows how interpolation combines the “presence” and “absence” intervals by choosing the shorter of the two as the period during which the individual is assumed to be in the group where censused. The line labeled Used contains the shorter of each census's two intervals.[197]
Figure 4.16. Combining Presence and Absence Intervals
One individual's census records
Group 1:      C       C                   A   C   A
Group 2:      A                   C               C
Group 1 (the intervals of group 1's censuses)
Presence:     X---|---X-----|     x     |-----X-| x
Absence:              X---------|         O |-X-| O
Used:         X---|---X-----|               |-X-|
In Group:     1   1   1   1   ?   ?   ?   ?   1   ?
Group 2 (the intervals of group 2's censuses)
Presence:     x       x     |-----X-----|     x |-X
Absence:      O         |---------X
Used:                       |-----X-----|       |-X
In Group:     ?   ?   ?   ?   2   2   2   ?   ?   2
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
Having interpolated the intervals surrounding each census, determining the final group membership is a straightforward matter of placing the individual in the unknown group when there's nowhere else to put him. Figure 4.17 shows this process. All that remains is to compute the Interp values in the usual fashion, by ignoring absences and counting distance from the nearest census. In Figure 4.17 the intervals between locating censuses are shown, labeled For Interp, to support the Interp values given.
Figure 4.17. Group Membership Given Multiple Groups
One individual's census records
Group 1:      C       C                   A   C   A
Group 2:      A                   C               C
Group 1 (the intervals of group 1's censuses)
Used:         X---|---X-----|               |-X-|
In Group:     1   1   1   1   ?   ?   ?   ?   1   ?
Group 2 (the intervals of group 2's censuses)
Used:                       |-----X-----|       |-X
In Group:     ?   ?   ?   ?   2   2   2   ?   ?   2
Intervals between locating censuses
For Interp:   X~~~|~~~X~~~~~|~~~~~X~~~~~|~~~~~X~|~X
MEMBERS.
Group:        1   1   1   1   2   2   2   9   1   2
Interp:       0   1   0   1   1   0   1   1   0   0
Origin:       C   I   C   I   I   C   I   I   C   C
Date:         1   2   3   4   5   6   7   8   9   10
Key:
C Censused present
A Censused absent
X Known present in same group
- Presumed present
~ Inside of interval
| Interval endpoint
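The whole procedure for interpolating censuses can be condensed into a short Python sketch. It is an illustration of the informal rules given so far, not the Babase code: the names group_on, _within_half and _nearest are hypothetical, locating censuses are passed as a mapping from day to group, absences as a mapping from group to days, and day numbers stand in for Julian dates. Birth, Statdate, and non-interpolating censuses are not handled.

```python
UNKNOWN_GROUP = 9   # the unknown group used in the figures
LIMIT = 14          # the interpolation limit, in days


def _within_half(census_day, event_day, day):
    """True if `day` lies in the census's half of the gap to `event_day`."""
    if event_day is None:
        return True
    gap = abs(event_day - census_day)
    twice = 2 * abs(day - census_day)
    if twice != gap:
        return twice < gap
    # `day` is a midpoint day: even dates go to the earlier half,
    # odd dates to the later half (the Midpoint Rule).
    return (census_day < day) == (day % 2 == 0)


def _nearest(days, census_day, forward):
    """Nearest day in `days` strictly after (or before) `census_day`."""
    if forward:
        later = [d for d in days if d > census_day]
        return min(later) if later else None
    earlier = [d for d in days if d < census_day]
    return max(earlier) if earlier else None


def group_on(day, locating, absences):
    """Group for `day`; `locating` maps day -> group, `absences` group -> days."""
    if day in locating:
        return locating[day]
    for census_day, group in sorted(locating.items()):
        if abs(day - census_day) > LIMIT:
            continue
        forward = day > census_day
        next_census = _nearest(locating, census_day, forward)
        next_absence = _nearest(absences.get(group, ()), census_day, forward)
        # The census holds the day only if the day is inside the shorter
        # of its "halfway to census" and "halfway to absence" intervals.
        if (_within_half(census_day, next_census, day)
                and _within_half(census_day, next_absence, day)):
            return group
    return UNKNOWN_GROUP


# The CENSUS rows of Figure 4.12.
locating = {1: 1, 3: 1, 6: 2, 9: 1, 10: 2}
absences = {1: [8, 10], 2: [1]}
print([group_on(day, locating, absences) for day in range(1, 11)])
```

The printed list, [1, 1, 1, 1, 2, 2, 2, 9, 1, 2], agrees with the MEMBERS Group row of Figure 4.17. Because the inputs are plain mappings, the order in which the CENSUS rows are supplied has no effect on the result.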
By now it should be clear that interpolation[198] is a function over CENSUS row sets. It is a function, for every input you get exactly one output. It takes sets of CENSUS rows as input. Because sets are unordered you can put CENSUS rows into the database in any order and the result will be the same. And, because it is a function, you can re-interpolate the same CENSUS rows as many times as desired without altering the final result.
It should also be clear why interpolation always chooses to use “the shorter interval”, and why this always produces the “correct” result. The shorter interval is short for a reason: there is some reason to believe the individual is not in the group, otherwise the interval would be longer. Further, every time the shorter interval is chosen a possible overlap with another interval from a different locating census is eliminated. By always choosing the shorter interval interpolation ensures that the interpolation of any two locating censuses will not conflict.
In addition to that most important distinction which classifies CENSUS rows into absent and locating censuses there is a second distinction which further divides locating censuses into those which interpolate and those which do not. Those CENSUS rows that record observational data are interpolating censuses: those with Status values of C, D, and M.[199] (All of the previous examples have concerned CENSUS rows of this type.) The remaining CENSUS.Status values indicate that the CENSUS row is the result of analysis: all of the “old style”, that is “historical”, CENSUS.Status values and the N manual Status value. These are the non-interpolating censuses.
This further division of locating censuses into interpolating and non-interpolating, the division between raw and already analyzed data, leads to the final refinement to the interpolation procedure. We do not want interpolation to produce re-analyzed results from already analyzed data. Interpolation occurs only between “regular”, that is to say interpolating, censuses (and to the birth date as a special case). “Non-interpolating” census rows are copied directly from CENSUS to MEMBERS, CENSUS.Status becomes MEMBERS.Origin, and Interp is set to NULL. When a non-interpolating census is found on the birth date, the birth date will not interpolate.
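The treatment of a single locating CENSUS row can be sketched as follows. This is a hedged Python illustration only: the function name members_row_for_locating_census and the dictionary keys Date and Grp are assumptions made for the example, and Python's None stands in for NULL. Absence rows are outside its scope.

```python
INTERPOLATING_STATUSES = {"C", "D", "M"}  # observational Status values


def members_row_for_locating_census(date, grp, status):
    """Build the MEMBERS row for the day of one locating CENSUS row."""
    if status in INTERPOLATING_STATUSES:
        # A regular census: the individual was located that day.
        interp = 0
    else:
        # Already analyzed data: copied as is, never interpolated.
        interp = None
    # In both cases CENSUS.Status becomes MEMBERS.Origin.
    return {"Date": date, "Grp": grp, "Origin": status, "Interp": interp}
```

For instance, a row with Status N yields Origin N and an Interp of None, while a row with Status C yields Origin C and an Interp of 0.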
Interpolation looks at “regular” census rows and attempts to guess the individual's location on those days when there are no observations. It does so by looking at the intervals between the “regular” censuses. Finding non-interpolating CENSUS rows, that is to say already analyzed data, on one of these intervals breaks the assumptions interpolation uses in its “guessing”. The previously analyzed data point could be there for any reason at all, and there's no point in pretending it's not there either. What interpolation does is give up. It interpolates up to the offending data point and then stops.[200] After that it still creates rows in MEMBERS, but it does not attempt to make guesses about where to place an individual or what the interpolated row means.
This situation is not expected to occur, or, rather, whenever there are non-interpolating CENSUS rows between interpolating censuses, the non-interpolating CENSUS rows are expected to be contiguous over the entire interval between the interpolating censuses. So, the expected cases are the trivial degenerate ones. Nonetheless, such situations probably do occur in the existent data. It would probably be best to either require the expected behavior, or to get rid of all the pre-analyzed CENSUS rows and replace them with raw data, especially given the design problems pointed out below.
Regardless, non-trivial examples are presented here so that a complete understanding of interpolation can be developed.
Figure 4.18 shows that the 3 fundamental interpolation intervals are shortened when a non-interpolating census is found between interpolating censuses. The intervals for each locating census are examined separately. The non-interpolating census has no interpolation intervals. The intervals of the interpolating censuses are truncated, reduced to the interval between the interpolating and non-interpolating censuses. By this means a portion of the diagram, days 4 and 5, is blocked from interpolating into the group. If there were no N census, the Absence interval would be day 1's shortest interval, and days 4 and 5 (as well as day 3) would interpolate into the group. (Notice that day 1's Absence interval has a midpoint day, day 5, and that it would have been included in the interval.) Interpolation is prevented from placing individuals in the group of their interpolating census on the “far side” of non-interpolating censuses.
Figure 4.18. Pre-Analyzed Data Truncates Interpolation Intervals
CENSUS rows from group 1

       CENSUS:  C       N                       A           C

Day 1 Intervals per fundamental type
     Presence:  X-----| N                                   X
      Absence:  X-----| N                       O
   14 Day Lim:  X-----| N

Day 3 Intervals per fundamental type
     Presence:          N
      Absence:          N
   14 Day Lim:          N

Day 12 Intervals per fundamental type
     Presence:  X       N             |---------------------X
      Absence:          N                       O     |-----X
   14 Day Lim:          N |---------------------------------X

   Julian Day:  1   2   3   4   5   6   7   8   9   10  11  12

Key:
  C  Censused present in group (group 1)
  N  Manual entry, present in group but non-interpolating (group 1)
  A  Censused absent in group (group 1)
  X  Known present in group (group 1)
  O  Known absent in group (group 1)
  -  Inside of interval
  |  Interval endpoint
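The truncation shown in Figure 4.18 can be sketched in a few lines of Python. This is an illustration only, not Babase's implementation: the function names, and the choice to describe an interval by its first or last day, are assumptions made for the example.

```python
def truncate_forward(census_day, last_day, non_interp_days):
    """Last day of the interval following census_day, after truncation.

    The untruncated interval runs from census_day + 1 through last_day.
    A result equal to census_day means the interval is empty.
    """
    blockers = [d for d in non_interp_days if census_day < d <= last_day]
    return min(blockers) - 1 if blockers else last_day


def truncate_backward(census_day, first_day, non_interp_days):
    """First day of the interval preceding census_day, after truncation.

    The untruncated interval runs from first_day through census_day - 1.
    """
    blockers = [d for d in non_interp_days if first_day <= d < census_day]
    return max(blockers) + 1 if blockers else first_day


# Figure 4.18: the N census on day 3 cuts day 1's Absence interval
# (days 2 through 5, per the text) back to day 2 alone.
print(truncate_forward(1, 5, [3]))     # 2
# Day 12's 14 day limit (days -2 through 11) now begins on day 4.
print(truncate_backward(12, -2, [3]))  # 4
```

Whatever lies between the truncated interval and the non-interpolating census is left for the other rules to fill, which in Figure 4.18 means the unknown group.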
In Figure 4.19 the shortest intervals of each locating census have been chosen and combined; the result is the line labeled For Group. This is then used to determine group membership.
The interesting part of Figure 4.19 is the computation of the Interp values. The “halfway to census” intervals of Figure 4.18 have been combined and labeled For Interp. Recall that it is these intervals that are used to compute the Interp values. The N census has created a “gap” in interpolation, clearly shown on the For Interp line as running from day 3 through day 6. Over this interval interpolation's assumptions have been violated and it does not know what to do. The group membership is easy. On day 3, the day of the N census, it can simply copy the CENSUS row's Grp and Status into the appropriate MEMBERS columns in the same fashion it would for any other locating census. On days 4 through 6 it can do what it usually does with group membership when it does not know where to locate an individual: it places the individual in the unknown group with an Origin of I. On days 3 through 6 interpolation has no way of knowing how far away the day is from the nearest locating census, which is what is supposed to go in the Interp column. Due to this lack of information it assigns the Interp column a value of NULL, no data, on this interval.
Figure 4.19. Pre-Analyzed Data Interrupts Interpolation
An individual is censused

       CENSUS:  C       N                       A           C

Intervals
    For Group:  X-----| N                       O     |-----X
   For Interp:  X~~~~~|               |~~~~~~~~~~~~~~~~~~~~~X

MEMBERS
        Group:  1   1   1   9   9   9   9   9   9   9   1   1
       Interp:  0   1                   5   4   3   2   1   0
       Origin:  C   I   N   I   I   I   I   I   I   I   I   C
         Date:  1   2   3   4   5   6   7   8   9   10  11  12

Key:
  C  Censused present in group (group 1)
  N  Manual entry, present in group but non-interpolating (group 1)
  A  Censused absent in group (group 1)
  X  Known present in group (group 1)
  O  Known absent in group (group 1)
  -  Presumed in group (group 1)
  ~  Inside of interval
  |  Interval endpoint
When looking at Figure 4.19, one way to explain what happens to Interp is to say that it is fixed at NULL over that portion of the day 1 census's “halfway to census” interval that was truncated because the N row showed up. (See Figure 4.18.) Effectively, as MEMBERS Interp counts up with increasing distance from the interpolating census, the count is fixed at NULL upon encountering a non-interpolating census until the point is reached at which counting back down to the next interpolating census begins, at which point the count downward resumes as though never interrupted.[202]
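The counting behavior just described can be reproduced with a short Python sketch. It is a simplified model, not the Babase code: it considers only the span between two interpolating censuses, treats days as integer Julian dates, uses None to stand for NULL, and applies the midpoint rule as worded in the formal specification later in this section. The function name and its arguments are invented for the example.

```python
def interp_values(left, right, non_interp_days):
    """Interp for every day from left through right, inclusive.

    left and right are the dates of two successive interpolating
    censuses; non_interp_days holds the dates of any non-interpolating
    censuses between them.
    """
    inner = list(range(left + 1, right))
    half = len(inner) // 2
    left_half = inner[:half]
    right_half = inner[len(inner) - half:]
    if len(inner) % 2 == 1:          # odd number of days: a midpoint day exists
        midpoint = inner[half]
        if midpoint % 2 == 0:        # even Julian date: earlier half
            left_half.append(midpoint)
        else:                        # odd Julian date: later half
            right_half.insert(0, midpoint)

    result = {left: 0, right: 0}
    blocked = [d for d in left_half if d in non_interp_days]
    first_blocked = min(blocked) if blocked else None
    for d in left_half:              # count up, or NULL from the blocker onward
        hit = first_blocked is not None and d >= first_blocked
        result[d] = None if hit else d - left
    blocked = [d for d in right_half if d in non_interp_days]
    last_blocked = max(blocked) if blocked else None
    for d in right_half:             # NULL up to the blocker, then count down
        hit = last_blocked is not None and d <= last_blocked
        result[d] = None if hit else right - d
    return result


# Figure 4.19: censuses on days 1 and 12, an N census on day 3.
values = interp_values(1, 12, {3})
print([values[d] for d in range(1, 13)])
# [0, 1, None, None, None, None, 5, 4, 3, 2, 1, 0]
```

The printed list matches the Interp line of Figure 4.19, with the blank cells on days 3 through 6 shown as None.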
The approach interpolation takes, in some sense, attempts to minimize the disturbance created when already analyzed census data are mixed in with raw census information. However, as can be seen in Figure 4.19, it is not entirely successful. Although day 7, for example, has an Interp value indicating it is 5 days away from a census, it is really 4 days away from the N census. If the N CENSUS row really does represent a census, then day 7's Interp value is wrong. And the problems are not restricted to Interp values. Is it really true that days 4 and 5 should be assigned to the unknown group? If so, then why aren't there N rows that say so?
Day 2 is even more disturbing. There is no diagram for this, but suppose the N census found the individual in a different group. Figure 4.18 would be unchanged; all of day 1's intervals would be truncated at the N census. The effect would be clearer if the interval between the preceding C census and the following N census were larger, but consider that day 2, by the midpoint rule, would be “assigned” to the N census. That means that if the N census really does represent a census in a different group, day 2 should be assigned to that group, not to group 1.
Note that, in the general case, even though the “halfway to census” interval does not determine group membership (all the intervals are truncated, leaving a “gap” in which interpolation defaults to the unknown group), whether this interval has a midpoint day, and if so where it falls, does matter to the computation of Interp. If the midpoint day happens to fall into the side of the interval containing the non-interpolating census then the Interp value will be NULL. Otherwise, it will have a value representing the number of days to the nearest locating, and interpolating, census.
Incorporating the above safety checks into the rules we already have, ensuring that data are not re-analyzed, produces the actual interpolation rules.
Using these rules interpolation creates rows in MEMBERS based on the information it finds in CENSUS, and the BIOGRAPH columns Birth, Matgrp, Statdate and Status.
CENSUS Rows Are Either Absences, Interpolating, or Non-Interpolating
Interpolation partitions all CENSUS rows into one of 3 categories:
CENSUS rows which indicate absence from a group.
Those CENSUS rows that record observational data are interpolating censuses; those with Status values of C, D, and M.
The remaining CENSUS.Status values indicate the CENSUS row is the result of analysis. These rows, all of the “old style”, that is “historical”, CENSUS.Status values and the N manual Status value, are not re-analyzed and so do not interpolate.
For convenience, the CENSUS rows that are not absences, the interpolating and the non-interpolating censuses, are termed “locating censuses”.
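As a minimal sketch, the three-way partition can be written as a small Python function. The Status code "A" for absences is an assumption borrowed from the key of the figures above, and the function names are invented for the example; any Status that is neither an absence nor one of C, D, or M is treated as non-interpolating, as the text describes.

```python
INTERPOLATING_STATUSES = {"C", "D", "M"}
ABSENT_STATUS = "A"  # assumption: taken from the figure keys, not from the rule text


def classify(status):
    """Return the category of a CENSUS row given its Status value."""
    if status == ABSENT_STATUS:
        return "absence"
    if status in INTERPOLATING_STATUSES:
        return "interpolating"
    return "non-interpolating"   # N and all "historical" Status values


def is_locating(status):
    """Locating censuses are all CENSUS rows that are not absences."""
    return classify(status) != "absence"


print(classify("C"), classify("N"), classify("A"))
# interpolating non-interpolating absence
```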
Censusing Assigns Group Membership
On those days when an individual is censused in a group, when there is a locating CENSUS row, a row is created in MEMBERS to place that individual in the group on the given day. The Origin value is the CENSUS row's Status value. When the CENSUS row is interpolating the Interp value is 0. When the CENSUS row is non-interpolating the Interp value is NULL.
The 3 Interpolation Intervals
Interpolation places an individual in the group into which he is censused, the Grp of an interpolating CENSUS row (Status values C, D, and M), on the days to either side of the census being interpolated for a time period that is the shortest of:
The Halfway to Census Interval
Half of the time interval between the individual's next (or prior) locating and interpolating census, which may locate the individual in any group.
The Halfway to Absence Interval
Half of the time interval between the next (or prior) recorded absence, considering only absences from the same group in which the individual was censused. Absences from other groups are ignored.
The 14 day Interpolation Limit
Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.
The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the nearest locating census; Interp values count up over The Halfway to Census Interval as the distance from the interpolated census increases. An interpolated MEMBERS row falling on the day after a census has an Interp of 1, the day after that the Interp is 2, and so forth, assuming, of course, the individual has no other nearby CENSUS rows.
The Midpoint Rule
This rule qualifies how interpolation assigns the halfway point between two CENSUS rows in The Halfway to Census Interval and The Halfway to Absence Interval, above, when the number of days in the interval cannot be divided into equal halves. Whenever interpolation is called upon to halve an interval between two CENSUS rows that contains an odd number of days, the “midpoint day” is assigned to the left, earlier, half of the interval when the Julian date of the midpoint day is even. A midpoint day is assigned to the right, later, half of the interval when the Julian date of the midpoint day is odd.
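The halving and the choice of the shortest interval can be sketched together in Python. This is a hedged illustration rather than the Babase implementation: dates are integer Julian dates, only the days following a census are handled (the preceding days are symmetric), the function names are invented, and the 14 day limit is taken to mean the 14 days after the census date. The caller is assumed to pass only an absence from the same group as the census.

```python
def earlier_half_end(start, end):
    """Last day of the earlier half of the days strictly between two CENSUS rows.

    Returns start itself when no day falls into the earlier half.
    """
    days_between = end - start - 1
    half_end = start + days_between // 2
    if days_between % 2 == 1:        # odd count: there is a midpoint day
        midpoint = half_end + 1
        if midpoint % 2 == 0:        # even Julian date goes to the earlier half
            half_end = midpoint
    return half_end


def forward_interval_end(census_day, next_census=None, next_absence=None):
    """Last day interpolated into the census's group after census_day."""
    candidates = [census_day + 14]                       # the 14 day limit
    if next_census is not None:                          # halfway to census
        candidates.append(earlier_half_end(census_day, next_census))
    if next_absence is not None:                         # halfway to absence
        candidates.append(earlier_half_end(census_day, next_absence))
    return min(candidates)


print(earlier_half_end(10, 20))             # 14: midpoint day 15 is odd
print(earlier_half_end(10, 22))             # 16: midpoint day 16 is even
print(forward_interval_end(100, 140, 110))  # 104: the absence interval is shortest
print(forward_interval_end(100, 140))       # 114: the 14 day limit is shortest
```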
This rule declares a live birth to be the equivalent of an interpolating census, one that indicates presence in the individual's Matgrp. Fetal losses, individuals with NULL Snames, are not considered births and are never interpolated. An individual is placed in his Matgrp on his birth date even when a regular census has an absence recorded for the individual on the date of birth. In this case interpolation always entirely ignores the absence and will not use such an absence to compute a Halfway to Absence Interval.
When there is a locating census on the birth date, the MEMBERS row interpolation creates is like that made for any other locating census with the given Status. But when there is no locating census on the birth date, the resulting MEMBERS row has an Origin of I (and an Interp of 0, as any census with a Status of C would have). Aside from their I Origin value, births interpolate as would any CENSUS with a C Status.
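The birth-date row can be sketched as follows. The dictionary keys mirror the column names used in this section; the Sname and Grp keys on MEMBERS, the function name, and the example values are assumptions made for the illustration. None stands for NULL, and any absence recorded on the birth date is simply never passed in.

```python
INTERPOLATING_STATUSES = {"C", "D", "M"}


def birth_row(sname, birth, matgrp, locating_census=None):
    """MEMBERS row for the birth date, or None for a fetal loss.

    locating_census is an optional (grp, status) pair describing a
    locating CENSUS row that falls on the birth date.
    """
    if sname is None:                 # fetal loss: never interpolated
        return None
    if locating_census is not None:   # treated like any other locating census
        grp, status = locating_census
        interp = 0 if status in INTERPOLATING_STATUSES else None
        return {"Sname": sname, "Date": birth, "Grp": grp,
                "Origin": status, "Interp": interp}
    # No locating census: the birth itself places the individual in Matgrp.
    return {"Sname": sname, "Date": birth, "Grp": matgrp,
            "Origin": "I", "Interp": 0}


print(birth_row("ABC", 2450000, 1))
# {'Sname': 'ABC', 'Date': 2450000, 'Grp': 1, 'Origin': 'I', 'Interp': 0}
```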
No Data Implies Unknown Group Membership
On days when none of the above rules serve to place an individual in a group, the individual is placed in the unknown group. The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the individual's nearest interpolating census.[203]
Interpolation will not place a row in MEMBERS before an individual's Birth date.
When an individual is dead, interpolation will not place a row after the individual's Statdate.
Data Entry Cessation Stops Interpolation of Living Individuals.
When an individual is alive, interpolation will create rows after the individual's last locating census only when there are subsequent absences; absences, that is, from the group in which the individual was censused.[204] In this case, unlike above, no data does not imply unknown group membership; such rows are created only so long as the individual is interpolated into the group of his last locating census. When a living individual has no subsequent absences from the group of his last locating census, interpolation assumes that further data are available but have yet to be entered, and interpolation stops at the last locating census.
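The “same group” test in this rule is small enough to sketch directly. The function below is a hypothetical helper, not part of Babase; it answers only whether any rows are created after a living individual's last locating census. How far those rows extend is still governed by the interpolation intervals.

```python
def continues_past_last_census(last_census_date, last_census_grp, absences):
    """True when a living individual has a later absence from the same group.

    absences is an iterable of (date, grp) pairs, one per CENSUS absence row.
    """
    return any(
        grp == last_census_grp and date > last_census_date
        for date, grp in absences
    )


# Last located in group 1 on day 100.
print(continues_past_last_census(100, 1, [(110, 1)]))  # True
print(continues_past_last_census(100, 1, [(110, 2)]))  # False: a different group
print(continues_past_last_census(100, 1, [(90, 1)]))   # False: an earlier absence
```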
Data are not Re-Analyzed
Interpolation is only done to regular, that is interpolating, CENSUS rows; data that were collected in the field. Other data, the “non-interpolating” census rows that represent the result of prior analysis, do not interpolate; they are copied directly from CENSUS to MEMBERS, CENSUS.Status becomes MEMBERS.Origin and Interp is set to NULL. Further, when a non-interpolating census is found on one of The 3 Interpolation Intervals the interval is shortened enough that the non-interpolating census is no longer on the interval. When a non-interpolating census is found on a birth date, the birth date does not interpolate.
The MEMBERS Interp column is fixed at NULL on the interval from the non-interpolating census row through the “midpoint” end of The Halfway to Census Interval, endpoints included.[205] Here we are speaking of The Halfway to Census Interval as computed, not the Halfway to Census Interval as shortened in the preceding paragraph.
It is expected that all non-interpolating CENSUS rows, that is to say CENSUS rows produced by prior analysis, will be clustered in contiguous intervals with “regular” census rows at the endpoints. This is particularly expected of “old style” census rows from before Babase, as they precede all “regular” census data, but is also expected of the N non-interpolating, manual, Status code, should it ever be used. If these expectations are borne out, the Data are not Re-Analyzed rule will never be invoked.
There are some not-quite-obvious implications given these interpolation rules:
The only rows in MEMBERS that have an Origin of I, and an Interp of 0, and are not placed in the unknown group are birth dates. Not every birth date will have an associated MEMBERS row with these values, as some birth dates have locating censuses, but MEMBERS rows with these values will be birth dates.
Living individuals, but not dead ones, can have MEMBERS rows created by the interpolation procedure that locate the individual in a group on a date later than the individual's Statdate.[206]
So long as an individual is alive, the last CENSUS to locate the individual ought to be followed by a record of absence, an absence from the group where the individual was last found. To do otherwise, as must occur when there is simply no further data to be entered, is to introduce a bias into MEMBERS.
The only rows in MEMBERS with an Origin of I and an Interp of NULL are those in the unknown group which were created by the “Data are not Re-Analyzed” rule above.
As fetal losses, individuals with NULL Snames, cannot appear in CENSUS, are not considered live births, and always have their birth date equal to their Statdate, they never have MEMBERS rows associated with them.
When computing Interp values from The Halfway to Census Interval, The Midpoint Rule is usually immaterial. However, when non-interpolating censuses affect the interpolation, The Midpoint Rule can be the factor that determines whether or not a MEMBERS row has a NULL Interp value.
[185] At this time only DEMOG, the demography notes table, contributes to CENSUS any information regarding group membership.
[186] Sometimes, when demography information is added into other tables, CENSUS rows are altered rather than added. Likewise, CENSUS rows are removed (or altered as necessary) when demography information is removed from other tables.
[188] This is the one exception, if you wish to consider it so, to the rule that an individual cannot be censused both present and absent in the same group on the same day.
[189] The “same group” condition is one that must be met whenever interpolation examines intervals between presence and absence.
[190] As the individual is alive, every census that post-dates the individual's Statdate must record an absence, else the Statdate would be adjusted to reflect the date of last census.
[191] This is a heuristic. While it should work well enough most of the time the Babase user must be aware of the pitfalls in this approach. These are explained below.
[192] Without this restriction interpolation would have to insert rows forever, placing the individual in the unknown group off into the indefinite future.
[193] Notice that interpolation does not bother analyzing absences, such as the last-most, that are not neighbor to censuses.
[194] Note that the intervals spoken of here are always anchored at one end by a census that finds an individual in a group. Each such census can therefore have 2 intervals associated with it, one of the days preceding the census date and one containing the following days. These intervals can then appear in the diagrams as single lines that contain a census date. It is important to remember that there are really 2 intervals depicted; one line that ends on the date of the census and another that begins at that point.
[195] As locating censuses are interpolated individually the figure could diagram the intervals associated with each census separately, as in Figure 4.14, work out group membership from that, and then combine the results; the outcome would be unaffected. The chosen presentation form allows the interval endpoints to “match up” in a revealing fashion. As an exercise the reader should prove to himself that the intervals associated with each locating census are accurately depicted, and that the order in which locating censuses are interpolated does indeed make no difference.
[196] Figure 4.14: “A Closer Look at Intervals” makes clear that it is not necessary to show these intervals. By definition, the omitted intervals will always be longer than the “halfway to census” interval of the census being interpolated. As the shorter interval is the one used the longer may be ignored.
[197] When there are two intervals. When there's no “absence” interval the “Used:” line shows the “presence” interval.
[198] The proper term is “The Glorious Interpolation Procedure”, but we don't tell this to just anybody.
[200] It might be better if interpolation did not interpolate at all on those intervals between interpolating censuses that contain a non-interpolating census[201] -- if it put the individual in the unknown group, with an Interp of NULL and an Origin of I whenever there was no locating census. However, this could easily cause problems because interpolation has always worked as the body of this document describes. Although these situations are not supposed to occur, it is likely the data contain such situations, and changes should not be made to interpolation which break the database.
[201] I have not thought this through. At first glance it seems the code would be simpler, but perhaps not. And the effect on data analysis is unclear. It is probably best to adopt one of the solutions presented in the note below.
[202] Although in this example we “count up” traversing the timeline from left to right, had the N census been closer to the right side of the diagram than the left we would be “counting up” the interval by traversing the timeline in the opposite direction, from right to left.
[203] The same method is used to compute Interp values when interpolation uses The 3 Interpolation Intervals, above.
[204] This “same group” criterion corresponds with the criterion found in The Halfway to Absence Interval.
[205] Interp is fixed at NULL over the portion of The Halfway to Census Interval that was truncated in the preceding paragraph. Effectively, as MEMBERS Interp counts up with increasing distance from the interpolating census, the count is fixed at NULL upon encountering a non-interpolating census until the point is reached at which counting back down to the next interpolating census begins, at which point the count downward resumes as though never interrupted.
[206] This is examined in detail in Interpolation at the Statdate.