Babase:

Technical Specifications for the Amboseli Baboon Project Data Management System

Jeanne Altmann, PhD.

Susan C. Alberts, PhD.

Jacob B. Gordon

Leah Gerber

ER Diagram conversion to Dia 

Leah Gerber

ER Diagram layout 

Karl O. Pinc

ER Diagram layout 

Anne Ndeti Hubbard

ER Diagram layout 

Anne Ndeti Hubbard

DocBook formatting 

Karl O. Pinc

DocBook formatting 

Document generated: 2017-08-15 11:01:37.

Copyright Notices

Copyright (C) 2005-2014 Karl O. Pinc, Jeanne Altmann, Susan Alberts, Leah Gerber, Jake Gordon, The Meme Factory, Inc.

Except as otherwise noted permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.

Copyright (C) 1996-2011 The PostgreSQL Global Development Group

The appendix titled Database Transactions Explained is Copyright (C) 1996-2011 by the PostgreSQL Global Development Group, distributed under the terms of the license of the University of California below.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS-IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

March 2, 2005

Revision History
Revision 0.0 March 2, 2004
Initial document
Revision 1.0 November 23, 2007
Many revisions towards a final Babase 2.0 documentation. Addition of multiparty interaction tables.

Acknowledgments

We gratefully acknowledge the support of the National Science Foundation for supporting the collection of the majority of the data stored in the database; in the past decade in particular, we acknowledge support from IBN 9985910, IBN 0322613, IBN 0322781, BCS 0323553, BCS 0323596, DEB 0846286, DEB 0846532 and DEB 0919200. We are also very grateful for support from the National Institute on Aging (R01AG034513-01 and P01AG031719) and the Princeton Center for the Demography of Aging (P30AG024361). We also thank the Chicago Zoological Society, the Max Planck Institute for Demographic Research, the L.S.B. Leakey Foundation and the National Geographic Society for support at various times over the years. In addition, we thank the National Institute on Aging (R03-AG045459-01) for supporting recent work extending the database to incorporate genetic and genomic data.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the National Institute on Aging, the Princeton Center for the Demography of Aging, the Chicago Zoological Society, the Max Planck Institute for Demographic Research, the L.S.B. Leakey Foundation, the National Geographic Society, or any other organization which has supplied support for this work.

Table of Contents

1. Introduction
This Document
Conventions Used In This Document
A Guide for the Reader
System Design
To Start Babase
Other Resources
2. Babase System Architecture
Databases
The babase Database
The babase_copy Database
The babase_test Database
Users, Groups and Database Permissions
The babase_readers group
The babase_editors group
Schemas
The babase schema
The babase_something_views schemas
The babase_pending schema
The sandbox schema
The devel schema
The per-user schemas
Table Overview
Entity-Relationship Diagrams
Views
Special Values
Indexes
The Babase Program Code
3. Baboon Data: Primary Source Material
Group Membership and Life Events
ALTERNATE_SNAMES (Alternate Short Names)
BEHAVE_GAPS (Gaps in Behavior Observations)
BIOGRAPH (Baboon Biographical Data)
CENSUS (Group Membership)
CONSORTDATES (First Consortship Dates)
DEMOG (Demography Notes)
DISPERSEDATES (Dispersal Dates)
GROUPS (Groups)
MATUREDATES (Sexual Maturity Dates)
RANKDATES (Adult Rank Attainment Dates)
Sexual Cycles
CYCGAPS (Gaps in Female Cycle Observations)
CYCLES (Female Sexual Cycles)
CYCPOINTS (Female Sexual Cycle Events)
PCSKINS (ParaCallosal Skin observations)
PREGS (Pregnancies)
SEXSKINS (Sexskin Turgesence Measurements)
Social and Multiparty Interactions
ALLMISCS (Ad-libitum sample data)
CONSORTS (multiparty disputes over CONSORTshipS)
FPOINTS (Point data on Females)
INTERACT_DATA (Interactions)
MPIS (Multiparty InteractionS)
MPI_DATA (Multiparty dyadic Interactions)
MPI_PARTS (Multiparty Interaction PARTicipantS)
PARTS (Participants in interactions)
POINT_DATA (Point observation data)
NEIGHBORS (point observation data on Neighbors)
SAMPLES (all-occurrences Samples)
Darting
ANESTHS (Extra Sedation Administered During Darting)
BODYTEMPS (Darting Body Temperature Measurements)
CHESTS (Darting Chest Circumference Measurements)
CROWNRUMPS (Darting Crown-to-Rump Measurements)
DART_SAMPLES (Darting Tissue Sample Records)
DARTINGS (Baboon Darting Events)
DPHYS (Darting Physiological Measurements)
HUMERUSES (Darting Humerus Length Measurements)
PCVS (Darting Blood Measurements)
TEETH (Darting Tooth Data)
TESTES_ARC (Darting Testes circumference Data)
TESTES_DIAM (Darting Testes Diameter Data)
TICKS (Darting Tick and Parasite Data)
ULNAS (Darting Ulna Length Measurements)
SWERB Data (Group-level Geolocation Data)
AERIALS (Aerial photos)
GPS_UNITS (Individual GPS Devices)
QUAD_DATA (map Quadrants)
SWERB_BES (Begin/Ends: Uninterrupted bouts of group-level observation)
SWERB_DATA (Group Level GPS Point Samples)
SWERB_DEPARTS_DATA (Observation team departures from camp)
SWERB_DEPARTS_GPS (SWERB GPS Departure data)
SWERB_GWS (SWERB Grove and Waterholes)
SWERB_GW_LOC_DATA (SWERB Grove/Waterhole Location Data)
SWERB_LOC_DATA
SWERB_LOC_GPS
SWERB_OBSERVERS
TREES
Weather Data
RAINGAUGES (Rain Measurements)
RGSETUPS (Rain Gauge Setups)
TEMPMINS (Minimum Temperature Measurements)
TEMPMAXS (Maximum Temperature Measurements)
WEATHERHAWK (WeatherHawk Data)
WREADINGS (Weather Readings)
4. Baboon Data: Analyzed
Darting
WBC_COUNTS (White Blood Cell Counts)
Group Membership and Life Events
DAD_DATA (Paternity analysis results)
MEMBERS (Day-by-day Group Membership)
RANKS (Rankings Within Groups)
Hybrid Scores
HYBRIDGENE_ANALYSES
HYBRIDGENE_SCORES
Interpolation
Interpolation's 3 Fundamentals
Interpolation Visualized
The Interpolation Rules
Expectations and Implications
The Sexual Cycle Day-By-Day Tables
CYCGAPDAYS (Day-by-day Periods of No Observation)
CYCSTATS (Female Fertility Cycle States)
MDINTERVALS (Mdate to Ddate Intervals)
MMINTERVALS (Mdate to Mdate Intervals)
REPSTATS (Female Reproductive States)
Sexual Cycle Determination
Automatic Sequencing
Automatic Mdate Generation
5. Support Tables
General Support Tables
OBSERVERS (Data Collection Staff)
OBSERVER_ROLES
UNKSNAMES (problem in identifying focal's neighbor or a lone male)
Group Membership and Life Events
BSTATUSES (Birth Accuracy Indicators)
CONFIDENCES (death cause (nature and agent), dispersal, and matgrp Confidence levels)
DAD_SOFTWARE
DCAUSES (Causes of Death)
DEMOG_REFERENCES (Demography Note References)
DEATHNATURES (Natures of Death Causes)
ENTRYTYPES (Categories of Entry to Study Population)
GAP_END_STATUSES (Explanations for Behavior Gap Ends)
MSTATUSES (Maturity Marker Statuses)
PATERNITY_COMPLETENESS (Completeness Scores in Paternity Analyses)
PATERNITY_MISMATCHES (Types of Genetic Mismatches)
RNKTYPES (Ranking Categories)
STATUSES (Indicators of Record and Baboon Vividity)
Hybrid Scores
HYBRIDGENE_SOFTWARE
MARKERS
Social And Multiparty Interactions
ACTIVITIES
ACTS (Interaction Types)
DATA_STRUCTURES (Data structures produced by the palmtop)
CONTEXT_TYPES (multiparty Interaction Context Categories)
FOODCODES (Food item Codes)
FOODTYPES (Food Types)
KIDCONTACTS (spatial relationship between mother and infant)
MPIACTS (Multiparty Interaction Types)
NCODES (Neighbor classifications)
PALMTOPS (the Palmtop hand-held data collection units)
PARTUNKS (problem identifying a multiparty interaction participant)
POSTURES
PROGRAMIDS (Program versions used on the palmtop)
SETUPIDS (Setup files used on the palmtop)
SUCKLES (infant suckling activity)
Sexual Cycles and The Sexual Cycle Day-By-Day Tables
PCSCOLORS (ParaCallosal Skin Colors)
Darting
BODYPARTS
DART_SAMPLE_CATS (Darting Sample Categories)
DART_SAMPLE_TYPES (Sample Types)
DRUGS (darting anesthetics)
LYMPHSTATES (Lymph node conditions)
PARASITES (Parasites and their indicators)
TCONDITIONS (Tooth Conditions)
TICKSTATUSES (parasite count classifications)
TOOTHCODES (kinds of teeth)
TOOTHSITES (Locations of deciduous or adult teeth)
TSTATES (State of Tooth existence)
SWERB Data
ADCODES (SWERB Ascent and Descent relationships)
PLACE_TYPES (codes for various landscape features)
SWERB_LOC_CONFS (SWERB sleeping grove Confidences)
SWERB_TIME_SOURCES (SWERB Time Sources)
SWERB_XYSOURCES (SWERB Time Sources)
Weather Data
WEATHERHAWK_SOFTWARES (Programs used with WeatherHawk)
WSTATIONS (Weather Stations)
6. The Babase Views
Group Membership and Life Events
CENSUS_DEMOG (CENSUS extended with DEMOG information)
CENSUS_DEMOG_SORTED (CENSUS_DEMOG, Sorted)
CYCPOINTS_CYCLES (CYCPOINTS extended with CYCLES information)
CYCPOINTS_CYCLES_SORTED (CYCPOINTS_CYCLES, Sorted)
DEMOG_CENSUS (DEMOG, showing CENSUS information)
DEMOG_CENSUS_SORTED (DEMOG_CENSUS, Sorted)
GROUPS_HISTORY
PARENTS
POTENTIAL_DADS (Potential Dads)
Sexual Cycles
CYCLES_SEXSKINS (CYCLES extended with SEXSKINS information)
CYCLES_SEXSKINS_SORTED (CYCLES_SEXSKINS, Sorted)
MATERNITIES (completed reproductive events)
MTD_CYCLES (CYCLES and Mdate, Tdate, and Ddate CYCPOINTS data)
PCSKINS_SORTED (PCSKINS, sorted for maintenance purposes)
SEXSKINS_CYCLES (CYCLES extended with SEXSKINS information)
SEXSKINS_CYCLES_SORTED (SEXSKINS_CYCLES, Sorted)
Social and Multiparty Interactions
ACTOR_ACTEES (Complete social interactions, INTERACT_DATA extended twice with PARTS)
INTERACT (INTERACT_DATA, with enhanced dates and times)
INTERACT_SORTED
MPI_EVENTS (Dyadic social interactions that comprise multiparty interaction collections, MPIS joined with MPI_DATA extended twice with MPI_PARTS)
POINTS (POINT_DATA, with enhanced times)
POINTS_SORTED (POINTS, Sorted)
SAMPLES_GOFF (SAMPLES, with the Group OF the Focal)
Darting
ANESTH_STATS (darting additional Anesthetic Statistics)
BODYTEMP_STATS (darting Body Temperature Statistics)
CHEST_STATS (darting Chest circumference Statistics)
CROWNRUMP_STATS (darting Crown-to-Rump Statistics)
DSAMPLES (darting sample records with columns for each sample type)
DENT_CODES (darting Dentition records with columns for each Toothcode)
DENT_SITES (darting Dentition records with columns for each Toothsite)
HUMERUS_STATS (darting Humerus length Statistics)
PCV_STATS (darting PCV Statistics)
TESTES_ARC_STATS (darting Testes circumference Statistics)
TESTES_DIAM_STATS (darting Testes Diameter Statistics)
ULNA_STATS (darting Ulna length Statistics)
SWERB Data (Group-level Geolocation Data)
QUADS (map Quadrants)
SWERB (Group level gps point samples)
SWERB_DATA_XY (The SWERB_DATA table with separate X and Y coordinates)
SWERB_DEPARTS (SWERB observation team Departures from camp)
SWERB_GW_LOCS (SWERB Grove and Waterhole Locations)
SWERB_GW_LOC_DATA_XY (The SWERB_GW_LOC_DATA table with separate X and Y coordinates)
SWERB_LOC_GPS_XY (The SWERB_LOC_GPS table with separate X and Y coordinates)
SWERB_LOCS (placement of a group at a landscape feature)
SWERB_UPLOAD (facility for uploading data into SWERB)
Weather Data
MIN_MAXS (Manually collected minimum and maximum temperature and rain data)
MIN_MAXS_SORTED (MIN_MAXS, Sorted)
Views Which Add Gid To Tables
The BIRTH_GRP View
The ENTRYDATE_GRP View
The STATDATE_GRP View
The CONSORTDATES_GRP View
The CYCGAPDAYS_GRP View
The CYCGAPS_GRP View
The CYCSTATS_GRP View
The DARTINGS_GRP View
The DISPERSEDATES_GRP View
The MATUREDATES_GRP View
The MDINTERVALS_GRP View
The MMINTERVALS_GRP View
The PCSKINS_GRP View
The RANKDATES_GRP View
The REPSTATS_GRP View
7. Data Entry
Data Entry Overview
Automatically Generated IDs
8. Babase Programs
Data Maintenance Programs
SWERB_UPLOAD: View to upload into SWERB
Upcen: Update CENSUS table
Upmpi: Upload Multiparty Interactions
Updart: Upload Darting Data
Uptick: Load darting parasite data
Psionload: Load Psion point/sample data
Upload: Upload Into Any Table or View
Useful Programs and Functions
Functions
Logout: Logout From Babase Custom Programs
Wwwdiff: World Wide Web based Difference program
Overview of Data Analysis Procedures
A. Manipulating Date and Time Values
B. Querying-All-Occurrences-Interactions
C. Alteration Of Sexual Cycle Ids (Cid)
D. Babase Revision History
Changes to Babase since 3.0
Backward in-compatible changes
Changes to Babase between 2.0 and 3.0
Backward in-compatible changes
Changes to Babase between 1.0 and 2.0
GROUPS
BIOGRAPH
Interpolation and MEMBERS
The Sexual Cycle Information
JPSAMPS and FPSAMPS (and POINT_DATA and FPOINTS)
Time Representation
The All-Occurrences Focal Point Data
Support Tables
The Addition of Views
E. DocBook, Styling and other issues
F. Restrictions: Things Not To Do
G. Database Transactions Explained
H. The Warning Sub-System
Introduction to the Warning Sub-System
An Overview of the Warning Sub-System Data Structures
The Warning Sub-System Main Tables
INTEGRITY_QUERIES
INTEGRITY_WARNINGS (Warning Sub-System Results)
Warning Sub-System Support Tables
IQTYPES (Integrity Query Types)
WARNING_REMARKS (Remarks Regards Warning Results)
The Warning Sub-System Functions (Activating The Warning Sub-System)
run_integrity_queries — execute one or more of the queries stored in the INTEGRITY_QUERIES table

List of Figures

2.1. Key to the Babase Entity Relationship Diagrams
2.2. Babase Group Membership Entity Relationship Diagram
2.3. Babase Life Events Entity Relationship Diagram
2.4. Babase Sexual Cycle Entity Relationship Diagram
2.5. Babase Sexual Cycle Day-To-Day Tables Entity Relationship Diagram
2.6. Babase Social Interactions Entity Relationship Diagram
2.7. Babase Multiparty Interactions Entity Relationship Diagram
2.8. Babase Darting Logistics and Morphology Entity and Relationship Diagram
2.9. Babase Darting Physiology Entity and Relationship Diagram
2.10. Babase Darting Samples Entity and Relationship Diagram
2.11. Babase Darting Teeth and Ticks Entity and Relationship Diagram
2.12. Babase Hybrid Scores Data Entity Relationship Diagram
2.13. Babase SWERB Core Tables Entity Relationship Diagram
2.14. Babase SWERB Grove/Waterhole Location Tables Entity Relationship Diagram
2.15. Babase Manual Weather Data Entity Relationship Diagram
2.16. Babase WeatherHawk Data Entity Relationship Diagram
4.1. An Individual is Censused Present and Absent
4.2. Interpolating From Presences and Absences
4.3. Interpolating Group Membership
4.4. Computing Interp Values
4.5. The 14 Day Interpolation Limit
4.6. Interpolation at Birth
4.7. Alive and Present When Last Censused
4.8. Alive and Absent in Last Census
4.9. Interpolation to Statdate When Dead
4.10. Midpoint Days
4.11. The Midpoint Rule Adjusts Intervals
4.12. An Individual is Censused in 2 Groups
4.13. Interpolating Each Group Separately
4.14. A Closer Look at Intervals
4.15. Presence and Absence Interpolated Separately
4.16. Combining Presence and Absence Intervals
4.17. Group Membership Given Multiple Groups
4.18. Pre-Analyzed Data Truncates Interpolation Intervals
4.19. Pre-Analyzed Data Interrupts Interpolation
6.1. Query Defining the CENSUS_DEMOG View
6.2. Entity Relationship Diagram of the CENSUS_DEMOG View
6.3. Query Defining the CENSUS_DEMOG_SORTED View
6.4. Entity Relationship Diagram of the CENSUS_DEMOG_SORTED View
6.5. Query Defining the CYCPOINTS_CYCLES View
6.6. Entity Relationship Diagram of the CYCPOINTS_CYCLES View
6.7. Query Defining the CYCPOINTS_CYCLES_SORTED View
6.8. Entity Relationship Diagram of the CYCPOINTS_CYCLES_SORTED View
6.9. Query Defining the DEMOG_CENSUS View
6.10. Entity Relationship Diagram of the DEMOG_CENSUS View
6.11. Query Defining the DEMOG_CENSUS_SORTED View
6.12. Entity Relationship Diagram of the DEMOG_CENSUS_SORTED View
6.13. Query Defining the GROUPS_HISTORY View
6.14. Entity Relationship Diagram of the GROUPS_HISTORY View
6.15. Query Defining the PARENTS View
6.16. Entity Relationship Diagram of the PARENTS View
6.17. Query Defining the POTENTIAL_DADS View
6.18. Entity Relationship Diagram of the foundation of the POTENTIAL_DADS View
6.19. Entity Relationship Diagram of that portion of the POTENTIAL_DADS View which places the mother and potential father in the same group during the fertile period
6.20. Entity Relationship Diagram of that portion of the POTENTIAL_DADS View having easily computed columns
6.21. Entity Relationship Diagram of that portion of the POTENTIAL_DADS View involving social interactions
6.22. Query Defining the CYCLES_SEXSKINS View
6.23. Entity Relationship Diagram of the CYCLES_SEXSKINS View
6.24. Query Defining the CYCLES_SEXSKINS_SORTED View
6.25. Entity Relationship Diagram of the CYCLES_SEXSKINS_SORTED View
6.26. Query Defining the MATERNITIES View
6.27. Entity Relationship Diagram of the MATERNITIES View
6.28. Query Defining the MTD_CYCLES View
6.29. Entity Relationship Diagram of the MTD_CYCLES View
6.30. Query Defining the PCSKINS_SORTED View
6.31. Entity Relationship Diagram of the PCSKINS_SORTED View
6.32. Query Defining the SEXSKINS_CYCLES View
6.33. Entity Relationship Diagram of the SEXSKINS_CYCLES View
6.34. Query Defining the SEXSKINS_CYCLES_SORTED View
6.35. Entity Relationship Diagram of the SEXSKINS_CYCLES_SORTED View
6.36. Query Defining the ACTOR_ACTEES View
6.37. Entity Relationship Diagram of the ACTOR_ACTEES View
6.38. Query Defining the INTERACT View
6.39. Entity Relationship Diagram of the INTERACT View
6.40. Query Defining the INTERACT_SORTED View
6.41. Entity Relationship Diagram of the INTERACT_SORTED View
6.42. Query Defining the MPI_EVENTS View
6.43. Entity Relationship Diagram of the MPI_EVENTS View
6.44. Query Defining the POINTS View
6.45. Entity Relationship Diagram of the POINTS View
6.46. Query Defining the POINTS_SORTED View
6.47. Entity Relationship Diagram of the POINTS_SORTED View
6.48. Query Defining the SAMPLES_GOFF View
6.49. Entity Relationship Diagram of the SAMPLES_GOFF View
6.50. Query Defining the ANESTH_STATS View
6.51. Entity Relationship Diagram of the ANESTH_STATS View
6.52. Query Defining the BODYTEMP_STATS View
6.53. Entity Relationship Diagram of the BODYTEMP_STATS View
6.54. Query Defining the CHEST_STATS View
6.55. Entity Relationship Diagram of the CHEST_STATS View
6.56. Query Defining the CROWNRUMP_STATS View
6.57. Entity Relationship Diagram of the CROWNRUMP_STATS View
6.58. Query Defining the DSAMPLES View
6.59. Query Defining the DENT_CODES View
6.60. Entity Relationship Diagram of the DENT_CODES View
6.61. Query Defining the DENT_SITES View
6.62. Entity Relationship Diagram of the DENT_SITES View
6.63. Query Defining the HUMERUS_STATS View
6.64. Entity Relationship Diagram of the HUMERUS_STATS View
6.65. Query Defining the PCV_STATS View
6.66. Entity Relationship Diagram of the PCV_STATS View
6.67. Query Defining the TESTES_ARC_STATS View
6.68. Entity Relationship Diagram of the TESTES_ARC_STATS View
6.69. Query Defining the TESTES_DIAM_STATS View
6.70. Entity Relationship Diagram of the TESTES_DIAM_STATS View
6.71. Query Defining the ULNA_STATS View
6.72. Entity Relationship Diagram of the ULNA_STATS View
6.73. Query Defining the QUADS View
6.74. Entity Relationship Diagram of the QUADS View
6.75. Query Defining the SWERB View
6.76. Entity Relationship Diagram of the SWERB View
6.77. Query Defining the SWERB_DATA_XY View
6.78. Entity Relationship Diagram of the SWERB_DATA_XY View
6.79. Query Defining the SWERB_DEPARTS View
6.80. Entity Relationship Diagram of the SWERB_DEPARTS View
6.81. Query Defining the SWERB_GW_LOCS View
6.82. Entity Relationship Diagram of the SWERB_GW_LOCS View
6.83. Query Defining the SWERB_GW_LOC_DATA_XY View
6.84. Entity Relationship Diagram of the SWERB_GW_LOC_DATA_XY View
6.85. Query Defining the SWERB_LOC_GPS_XY View
6.86. Entity Relationship Diagram of the SWERB_LOC_GPS_XY View
6.87. Query Defining the SWERB_LOCS View
6.88. Entity Relationship Diagram of the SWERB_LOCS View
6.89. Query Defining the SWERB_UPLOAD View
6.90. Entity Relationship Diagram of the SWERB_UPLOAD View
6.91. Query Defining the MIN_MAXS View
6.92. Entity Relationship Diagram of the MIN_MAXS View
6.93. Query Defining the MIN_MAXS_SORTED View
6.94. Entity Relationship Diagram of the MIN_MAXS_SORTED View
6.95. Query Defining the BIRTH_GRP View
6.96. Entity Relationship Diagram of the BIRTH_GRP View
6.97. Query Defining the ENTRYDATE_GRP View
6.98. Entity Relationship Diagram of the ENTRYDATE_GRP View
6.99. Query Defining the STATDATE_GRP View
6.100. Entity Relationship Diagram of the STATDATE_GRP View
6.101. Query Defining the CONSORTDATES_GRP View
6.102. Entity Relationship Diagram of the CONSORTDATES_GRP View
6.103. Query Defining the CYCGAPDAYS_GRP View
6.104. Entity Relationship Diagram of the CYCGAPDAYS_GRP View
6.105. Query Defining the CYCGAPS_GRP View
6.106. Entity Relationship Diagram of the CYCGAPS_GRP View
6.107. Query Defining the CYCSTATS_GRP View
6.108. Entity Relationship Diagram of the CYCSTATS_GRP View
6.109. Query Defining the DARTINGS_GRP View
6.110. Entity Relationship Diagram of the DARTINGS_GRP View
6.111. Query Defining the DISPERSEDATES_GRP View
6.112. Entity Relationship Diagram of the DISPERSEDATES_GRP View
6.113. Query Defining the MATUREDATES_GRP View
6.114. Entity Relationship Diagram of the MATUREDATES_GRP View
6.115. Query Defining the MDINTERVALS_GRP View
6.116. Entity Relationship Diagram of the MDINTERVALS_GRP View
6.117. Query Defining the MMINTERVALS_GRP View
6.118. Entity Relationship Diagram of the MMINTERVALS_GRP View
6.119. Query Defining the PCSKINS_GRP View
6.120. Entity Relationship Diagram of the PCSKINS_GRP View
6.121. Query Defining the RANKDATES_GRP View
6.122. Entity Relationship Diagram of the RANKDATES_GRP View
6.123. Query Defining the REPSTATS_GRP View
6.124. Entity Relationship Diagram of the REPSTATS_GRP View
H.1. Warning Sub-System Entity Relationship Diagram

List of Tables

2.1. A Simple Database Table
2.2. The Main Babase Tables
2.3. The Babase Support Tables
2.4. The Babase Views
2.5. The table_GRP Views
6.1. Columns in the CENSUS_DEMOG View
6.2. Columns in the CENSUS_DEMOG_SORTED View
6.3. Columns in the CYCPOINTS_CYCLES View
6.4. Columns in the CYCPOINTS_CYCLES_SORTED View
6.5. Columns in the DEMOG_CENSUS View
6.6. Columns in the DEMOG_CENSUS_SORTED View
6.7. Columns in the GROUPS_HISTORY View
6.8. Columns in the PARENTS View
6.9. Columns in the POTENTIAL_DADS View
6.10. Columns in the CYCLES_SEXSKINS View
6.11. Columns in the CYCLES_SEXSKINS_SORTED View
6.12. Columns in the MATERNITIES View
6.13. Columns in the MTD_CYCLES View
6.14. Columns in the SEXSKINS_CYCLES View
6.15. Columns in the SEXSKINS_CYCLES_SORTED View
6.16. Columns in the ACTOR_ACTEES View
6.17. Columns in the INTERACT View
6.18. Columns in the INTERACT_SORTED View
6.19. Columns in the MPI_EVENTS View
6.20. Columns in the POINTS View
6.21. Columns in the POINTS_SORTED View
6.22. Columns in the SAMPLES_GOFF View
6.23. Columns in the ANESTH_STATS View
6.24. Columns in the BODYTEMP_STATS View
6.25. Columns in the CHEST_STATS View
6.26. Columns in the CROWNRUMP_STATS View
6.27. Columns in the DSAMPLES View
6.28. Columns in the DENT_CODES View
6.29. Columns in the DENT_SITES View
6.30. Columns in the HUMERUS_STATS View
6.31. Columns in the PCV_STATS View
6.32. Columns in the TESTES_ARC_STATS View
6.33. Columns in the TESTES_DIAM_STATS View
6.34. Columns in the ULNA_STATS View
6.35. Columns in the QUADS View
6.36. Columns in the SWERB View
6.37. Columns in the SWERB_DATA_XY View
6.38. Columns in the SWERB_DEPARTS View
6.39. Columns in the SWERB_GW_LOCS View
6.40. Columns in the SWERB_GW_LOC_DATA_XY View
6.41. Columns in the SWERB_LOC_GPS_XY View
6.42. Columns in the SWERB_LOCS View
6.43. Columns in the SWERB_UPLOAD View
6.44. Columns in the MIN_MAXS View
6.45. Columns in the MIN_MAXS_SORTED View
C.1. Sexual cycle events before insertion
C.2. Sexual cycle events after insertion
H.1. The Warning Sub-System Tables
H.2. The Warning Sub-System Support Tables

List of Examples

1.1. A note
1.2. A caution
1.3. A warning
1.4. Text denoted important
1.5. A tip
2.1. Creating table foo in the sandbox schema
2.2. Granting permission to table foo in the sandbox schema
2.3. Creating table foo in user mylogin's schema
A.1. Using the Postgresql date_trunc() function to set seconds to zero
A.2. Using the Babase date_mod() function to return the minutes and seconds.
A.3. Using the Postgresql to_char() function to convert times to HH:MM text
B.1. Finding all the all-occurrences interactions
C.1. Splitting a sexual cycle in two
H.1. Inserting a query into INTEGRITY_QUERIES using dollar quoting
H.2. Executing all INTEGRITY_QUERIES
H.3. Executing a single INTEGRITY_QUERIES.Query
H.4. Executing INTEGRITY_QUERIES of the bdate type

Chapter 1. Introduction

This Document

This document describes the Babase baboon data management system. This includes a description of the tables, the intended use of all related programs and directories, the design of the system, and procedures for maintaining the data management system itself. This document does not include the procedures actually used to enter data into the system, or the details of how to operate the system's programs. Nor does it include any instructions on the operation or administration of the computer itself. Further information on the topics not covered in this document can be found in the Protocol for Data Management: Amboseli Baboon Project document.

The Protocol for Data Management: Amboseli Baboon Project document is an important adjunct to the Babase system. It is not considered part of the system itself, because it describes how the system is used rather than what the system can do. It is important to maintain the distinction between use and capabilities. Then, when an enhancement is needed, it is clear whether the desired result can be obtained by altering the way the system is used or whether the system itself needs to be modified. It is also important to give those who operate the system different documentation from those who manage and maintain it, because neither group needs to know all the details of the other's work.

Any deviation from the standards described in this document should be discussed with the project directors and may God have mercy on your souls.

Conventions Used In This Document

This document follows a number of conventions, most of them typographic but some of them stylistic. Some output formats, particularly plain text, have limited typographic capabilities so the various forms of typographic markup are not always distinguishable, either from each other or from the surrounding text.

Each table in Babase is documented in a section of its own, beginning with a description of the table as a whole and continuing with sub-sections for each column in the table. Of particular importance is the sentence that describes what a row in the given table represents. These are summarized in the textual tables given in the Table Overview section.

Interrelationships between the columns of a table, or between tables, are documented at the beginning of the table's section, not in the sub-sections documenting the columns themselves. A relationship between two tables concerns both tables. Even so, the description of each such relationship appears only once in this document, in the overall description of one of the two tables concerned. On occasion there may be a brief mention elsewhere.

All TABLE NAMES are written in UPPER CASE. Column Names are in lower case with Initial Capitals. SOMETABLE.Somecolumn is shorthand for the Somecolumn column of the SOMETABLE table. The use of a period to separate the table from the column name is the convention used by SQL to eliminate ambiguity regarding which table a column belongs to. When a column name includes an acronym the acronym is capitalized, as is the first letter of the next word when the acronym begins the column name. For example, PCSColor.

Actual database values are typographically distinguished from the surrounding text, as in the following sentence: The Sname (short name) of the baboon Pebbles is PEB.

When this document defines a word, uses it for the first time, or otherwise wishes to refer to a word or phrase as a thing in and of itself, the word or phrase is typographically distinguished as follows: The word census has several meanings within this document.

Text that has special meaning to computer systems is typographically distinguished as follows: The SQL SELECT statement is the standard method for retrieving data from relational databases.

Emphasized text is typographically distinguished as follows: Always back up your data.

When the words must or cannot or the phrases must not or may not are used, the system will not allow a contrary condition. For example: "Sname must be a unique data value" or "A user with read-only permissions may not change data values." Babase will immediately raise an error when a disallowed change is attempted, and the change will not take effect.
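The following minimal sketch illustrates a must condition. It uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database underlying Babase. The single-column biograph table and the make_db() and try_insert() helpers are hypothetical simplifications, not the actual Babase schema. The sketch shows only one thing: a uniqueness rule declared in the database rejects the offending change at the moment it is attempted, so the duplicate Sname never takes effect.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified BIOGRAPH table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: only the Sname column, declared UNIQUE so that the
    # "Sname must be a unique data value" rule is enforced by the database itself.
    conn.execute("CREATE TABLE biograph (sname TEXT NOT NULL UNIQUE)")
    return conn


def try_insert(conn: sqlite3.Connection, sname: str) -> bool:
    """Try to add one Sname; return True if accepted, False if rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO biograph (sname) VALUES (?)", (sname,))
        return True
    except sqlite3.IntegrityError:
        # The database refused the change at the moment it was attempted.
        return False


conn = make_db()
print(try_insert(conn, "PEB"))  # True
print(try_insert(conn, "PEB"))  # False: duplicate Sname rejected
print(conn.execute("SELECT count(*) FROM biograph").fetchone()[0])  # 1
```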

When the words should or ought are used, the system does not enforce the condition. It may or may not report a violation of the condition. An example: "The sexual cycle event referred to in the pregnancy table's Conceive column should date the conception that began the pregnancy." In this case the system has no way of knowing when the pregnancy began and so no way of validating the date.

When the phrase the system will report is used, there is some mechanism for reporting an unusual but not disallowed condition. Unlike prohibited conditions, unusual conditions are not generally reported at the time the condition is created.[1]
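The difference between a prohibited condition and one that the system will report can be sketched the same way, again with Python's sqlite3 module standing in for PostgreSQL. The readings table, its value column, the threshold of 100, and the setup() and report_unusual() helpers are all invented for illustration. This code is only an analogy and is not Babase's warning sub-system (Appendix H). In the sketch, the unusual row is accepted when it is inserted and surfaces only when a separate check query is run later.

```python
import sqlite3

# Hypothetical check: values above 100 are unusual but still permitted.
UNUSUAL_QUERY = "SELECT id, value FROM readings WHERE value > 100 ORDER BY id"


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # No CHECK constraint on value, so unusual rows are not rejected on insert.
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL NOT NULL)")
    return conn


def report_unusual(conn: sqlite3.Connection) -> list:
    """Run the deferred check and return the rows it flags."""
    return conn.execute(UNUSUAL_QUERY).fetchall()


conn = setup()
with conn:
    conn.executemany(
        "INSERT INTO readings (id, value) VALUES (?, ?)",
        [(1, 42.0), (2, 250.0), (3, 7.5)],
    )

# Every row was accepted when it was inserted ...
print(conn.execute("SELECT count(*) FROM readings").fetchone()[0])  # 3
# ... and the unusual one is reported only when the check is run afterwards.
print(report_unusual(conn))  # [(2, 250.0)]
```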

The documentation is written with a tendency to emphasize Special Values. So, for example, not alive is often written instead of dead because Babase has a special value that means alive but the system is not aware of a particular code that means dead. The result is an occasional double negative.

Significant but often slightly off-topic paragraphs are set off from the surrounding material as a note, shown in Example 1.1.

Example 1.1. A note

Note

Written material has no voice that can be raised, but attention can be drawn with typographical conventions.


When the reader should take care, particularly when the system might do something unexpected in a given circumstance, this is noted in a caution. Example 1.2 shows how a caution is set off from the surrounding text.

Example 1.2. A caution

Babase will reject your change if you try to do something that is not allowed, like giving a male an onset of turgescence date.

Caution

When the rejected change is one of a number of changes bundled into a transaction, none of the changes will make it into the database.

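The all-or-nothing behavior described in the caution can be demonstrated with a minimal sketch. It again uses Python's sqlite3 module in place of PostgreSQL, along with a hypothetical single-column biograph table. The apply_batch() helper is invented for illustration. One difference between the two systems is worth stating. PostgreSQL aborts the entire transaction as soon as any statement in it fails. SQLite rejects only the failing statement and leaves it to the program to roll back, which the with block below does when the error propagates.

```python
import sqlite3


def apply_batch(conn: sqlite3.Connection, snames: list) -> bool:
    """Insert every Sname in one transaction; return True only if all were committed."""
    try:
        with conn:  # one transaction: commit if the block succeeds, roll back otherwise
            for s in snames:
                conn.execute("INSERT INTO biograph (sname) VALUES (?)", (s,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: Sname must be unique.
conn.execute("CREATE TABLE biograph (sname TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO biograph (sname) VALUES ('PEB')")
conn.commit()

# The third insert duplicates PEB, so the whole batch is rejected ...
print(apply_batch(conn, ["AAA", "BBB", "PEB"]))  # False
# ... and AAA and BBB did not make it into the database either.
print(conn.execute("SELECT sname FROM biograph ORDER BY sname").fetchall())  # [('PEB',)]
```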

When a misuse of the system will lead to incorrect results, particularly when such results are not obvious, this document contains a warning. Example 1.3 shows how warnings are set off from the surrounding text.

Example 1.3. A warning

Warning

Babase cannot detect when an Sname is mis-typed, so it is possible to inadvertently assign a female's sexual cycle to the wrong female.


To otherwise draw the reader's attention to material, some text is marked important. Example 1.4 shows how important text is set off from the surrounding material.

Example 1.4. Text denoted important

Babase has a number of components. Many of them, like the SQL web interface, are third-party tools not written by the Babase developers.

Important

When the third-party tools are upgraded their look may change, but the features they provide should remain. As Babase is composed of Free Software, the Babase project always has the option of customizing any of its third-party tools and can contribute its improvements back to the program's developers for inclusion in future releases.


Suggestions as to how to use Babase are noted in tips, as are remarks on how data are presently entered in Babase or recorded in the field. Example 1.5 shows how a tip is set off from the regular document text.

Example 1.5. A tip

Tip

Lick all the chocolate off your fingers before beginning data entry.


Often, the tips are the result of best practice developed from considered experience and so document how Babase is used at the time of this writing. However, as best practice continues to develop and field protocols change, the Protocol for Data Management: Amboseli Baboon Project and the Amboseli Baboon Research Project Monitoring Guide should always be consulted. Those documents have precedence over the tips presented herein should there be conflicting advice.

Supplemental and cross-referential material is presented in footnotes.

A Guide for the Reader

Anyone who is changing or adding programs to the system should read this entire document. Chapter 3: “Baboon Data: Primary Source Material” is particularly important for all those using the system. Chapter 2: “Babase System Architecture” provides the introduction to Babase. It explains fundamental concepts without which Babase cannot be understood, although some portions can be skipped: the sections “The Babase Program Code” and “Indexes” are primarily of interest to programmers, and the section “Special Values” is for the data maintainers. Everyone will want to pay special attention to the “Entity-Relationship Diagrams” section. These diagrams can also be found in PDF form in The Babase Pocket Reference, where they may be easier on the eye. The section “Data Maintenance Programs” of chapter 8: “Babase Programs” is of little interest to those who only want to retrieve information from the system. Portions of the “Useful Programs and Functions” section of the same chapter are of interest to the more sophisticated user. Note that some functions may be hidden behind Next links, depending on the format chosen when reading this document. Data maintainers should be sure to understand chapter 5: “Support Tables”. Those who are only retrieving data from Babase need not read chapter 7: “Data Entry”.

System Design

The Babase system is designed to facilitate the retrieval, storage, and maintenance of the Amboseli Baboon Project data. Data integrity is foremost. Analytical power, ease of use, and low cost are secondary goals. The system consists of tables to store and organize the data, software supporting data validation and derivative data generation, stand-alone programs used to facilitate the entry and maintenance of the data, a minimal tool set supporting the maintenance of the Babase system software itself, and documentation. Data are retrieved from Babase using SQL, the standard[2] language used to query relational databases. SQL is declarative as opposed to procedural: from a single SQL query (a single statement) the database determines how best to retrieve the requested data, no matter how many tables or criteria are involved. SQL provides a single, powerful interface for ad-hoc data retrieval and manipulation. Generic software provides the bulk of the user interface[3], traditionally the most complex and costly software component.[4] Consequently there are few stand-alone programs written specifically for Babase. The overall philosophy of the system's implementation is to keep the software as easy to maintain as possible while assuring data integrity. To this end, the system is composed of as many generic components as possible, and the design requires custom programming for only the most crucial features.
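The declarative style can be seen in a small sketch. The query below spans three tables in a single statement. It says what is wanted and leaves it to the database to work out how to find the rows. SQLite, via Python's sqlite3 module, stands in for PostgreSQL, and the tables (study_groups, biograph, census) and their contents are made up and far simpler than Babase's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE study_groups (gid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE biograph     (sname TEXT PRIMARY KEY, sex TEXT NOT NULL);
    CREATE TABLE census       (sname TEXT REFERENCES biograph,
                               gid   INTEGER REFERENCES study_groups,
                               seen  TEXT NOT NULL);  -- ISO 8601 date
    INSERT INTO study_groups VALUES (1, 'Group A'), (2, 'Group B');
    INSERT INTO biograph     VALUES ('ALI', 'F'), ('BOB', 'M'), ('CAR', 'F');
    INSERT INTO census       VALUES ('ALI', 1, '2016-03-01'),
                                    ('BOB', 1, '2016-03-01'),
                                    ('CAR', 2, '2016-03-02');
""")

# One declarative statement: state *what* is wanted (females seen in
# Group A), not *how* to walk the three tables to find them.
query = """
    SELECT DISTINCT b.sname
      FROM biograph     AS b
      JOIN census       AS c ON c.sname = b.sname
      JOIN study_groups AS g ON g.gid   = c.gid
     WHERE b.sex = 'F' AND g.name = 'Group A'
     ORDER BY b.sname
"""
females = [row[0] for row in conn.execute(query)]
print(females)  # ['ALI']
```

The same query text works unchanged if indexes are added or the data grow. Choosing an efficient execution plan is the database's job, not the user's.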

Babase puts as much intelligence as possible into the database itself, including automatic data validation and complex automatic analysis and storage of the derived data.[5] Babase extends its sometimes complex and rather abstract database structures with alternative, more familiar and user-friendly means of accessing the underlying data[6]. These constructs are, insofar as possible, made indistinguishable from the underlying data when querying and updating the database. Babase often generates derivative data for more ready analysis; for the most part this is transparent to the user. The end-user is insulated from implementation details, the number of interfaces (primarily SQL) the user must learn is minimized, and the user is free to work with the data structures that embody the conceptual model best suited to the task at hand.[7]
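The following is a minimal sketch of derived data being generated inside the database itself, so that it stays in step with the raw data without any action by the user. SQLite stands in for PostgreSQL. The tables rain_readings and rain_by_month and the trigger are hypothetical; Babase's own derived tables, and the code that maintains them, are considerably more involved. Only inserts are handled here. A complete implementation would also have to handle updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw data, entered by users (hypothetical table).
    CREATE TABLE rain_readings (day TEXT NOT NULL, mm REAL NOT NULL);
    -- Derived data, maintained only by the trigger below.
    CREATE TABLE rain_by_month (month TEXT PRIMARY KEY, total_mm REAL NOT NULL);

    CREATE TRIGGER rain_readings_derive
    AFTER INSERT ON rain_readings
    BEGIN
        -- Create the month's row if needed, then add the new reading to it.
        INSERT OR IGNORE INTO rain_by_month VALUES (substr(NEW.day, 1, 7), 0);
        UPDATE rain_by_month
           SET total_mm = total_mm + NEW.mm
         WHERE month = substr(NEW.day, 1, 7);
    END;
""")

conn.executemany("INSERT INTO rain_readings VALUES (?, ?)",
                 [("2016-11-02", 4.5), ("2016-11-20", 10.0), ("2016-12-01", 2.0)])
conn.commit()

# The user inserted only raw readings; the monthly totals appeared on their own.
totals = conn.execute(
    "SELECT month, total_mm FROM rain_by_month ORDER BY month").fetchall()
print(totals)  # [('2016-11', 14.5), ('2016-12', 2.0)]
```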

Data input is an example of how Babase incorporates generic programs. The prototypical way to import data into Babase is in bulk, via a plain text file having columns delimited by the tab character. These are easily produced by almost any spreadsheet program; it is expected that most data imported into Babase will be typed into a spreadsheet and then exported to tab-delimited text for upload.[8] The use of generic interfaces reduces cost, and minimizing the number of novel interfaces frees the end-user to concentrate on the task at hand.
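The import path just described can be sketched as follows. Tab-delimited text, as a spreadsheet would export it, is parsed and loaded in a single transaction. SQLite stands in for PostgreSQL, and both the column layout and the load_tsv helper are hypothetical. Against PostgreSQL itself, the COPY command (or psql's \copy), whose default text format is tab-delimited, can load such a file directly.

```python
import csv
import io
import sqlite3

# What a spreadsheet might export as tab-delimited text: a header row,
# then one row per record.  (Hypothetical columns.)
exported = "sname\tsex\tbirth\nALI\tF\t2001-05-23\nBOB\tM\t2003-02-12\n"

def load_tsv(conn, table, text):
    """Load tab-delimited text with a header row into `table` in one transaction.

    Table and column names come from the caller and the header row,
    so this sketch assumes trusted input.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    sql = f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})"
    with conn:  # all rows are stored, or none are
        conn.executemany(sql, reader)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biograph (sname TEXT PRIMARY KEY, sex TEXT, birth TEXT)")
load_tsv(conn, "biograph", exported)
print(conn.execute("SELECT * FROM biograph ORDER BY sname").fetchall())
# [('ALI', 'F', '2001-05-23'), ('BOB', 'M', '2003-02-12')]
```

Because the validation lives in the database, a file containing even one bad row (here, any duplicate sname) is rejected as a whole rather than partially loaded.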

Babase is designed to be accessed over the Internet, primarily via the web. Although there are exceptions[9], the majority of Babase is accessed via a W3C-compliant web browser. Individually assigned usernames and passwords are used, along with encryption, to secure the database content. The Babase Wiki provides both the content and the structure of the project's web site. Another example of Babase leveraging a generic program, the wiki allows project members to collaborate, share information, and build the project's web site without programmer intervention.

Babase is built upon standards[10] and popular, widely deployed, Open Source and Free Software. This means, among other things, that the tools used to build and run Babase are very likely available to anyone free of cost, and that the skill sets required to maintain and, to some extent, to use Babase are readily learned[11] and unlikely to become obsolete[12].[13] The Babase source code itself is Free Software[14] and may be downloaded by the public.[15]

Note

The database design aims for 5th normal form: no redundant data, no empty data elements allowed, etc. What we've actually wound up with is about 3rd normal form.

To Start Babase

The Babase system is accessed over the web. Any web browser may be used to view the data using the phpPgAdmin generic database interface. More advanced usage of the website will likely require a web browser that conforms to the international standards for the web defined by the World Wide Web Consortium, otherwise known as the W3C, as we have put forth no particular effort to accommodate non-standards-conforming browsers. The browser must support CSS2 style sheets and XHTML 1.0. Note that at the time of this writing Microsoft Internet Explorer does not provide adequate style sheet support. Other browsers that do have such support include Mozilla, Mozilla Firefox, Apple's Safari, and Opera. The W3.org site maintains a list of browsers supporting style sheets.

Babase's URL (web address) is https://papio.biology.duke.edu/. Be sure to type the s in https. This secures your web connection.

You must access most of the Babase web site using a secure communications protocol (HTTPS) that encrypts all communication to foil eavesdroppers and checks the identity of the web site itself. The Babase project has signed its own security certificate, the certificate that ensures you are talking with the website you think you are.[16] Our certificate expires annually and is re-generated.

Your browser probably will not trust that our website is who it says it is and so will very likely object when you first access the Babase web site, and annually thereafter. You may tell your browser to accept our certificate permanently.

Other Resources

Resources related to Babase include:

Babase users are encouraged to ask questions, both on the Babase mailing list and on the mailing lists set up for questions on the software that Babase is made of.



[1] Immediate reporting of some unusual conditions could be added to Babase at a later date.

[2] More or less. The last actual SQL standard was issued a very long time ago. Nonetheless SQL is pervasive and, although specific SQL statements may not always be portable, the skill set involved in SQL use is quite portable.

[3] There are many PostgreSQL user interfaces available, although at the time of this writing only two, phpPgAdmin and psql, are installed on the Babase database server. Many of these front-ends must be installed on the local workstation. These may require that the Babase VPN be running before initiating a connection to the database. Some of the available front-ends may be found via the PostgreSQL FAQ question regarding graphical user interfaces for PostgreSQL.

[4] It's those pesky unpredictable users. Computer software would be a lot easier to write if it weren't for users always messing things up and then insisting on knowing what happened.

[5] A process which, admittedly, sometimes conflicts with the notion of easily maintaining the software. On the other hand when done right this approach does wonders for data integrity.

[7] These features also free the user from software interface lock-in. The database may be accessed and maintained with the software of choice. Data integrity, in both raw and derived data, is assured. Significantly, these features are what allow Babase to leverage generic programs, using them for the bulk of its user interface as opposed to building a custom, Babase-specific interface.

[8] Of course, because Babase has no designated front-end and so much data validation takes place inside the database itself, any program able to talk with PostgreSQL, the database engine Babase uses, can be used to import data into the database. So there are no real limits on how data must be structured for import into Babase.

[9] There are two Unix shell programs that provide peripheral utility; both do tasks that can be done with other tools but are handy to have automated. The use of these programs is documented on the Babase Wiki. Comprehensive documentation of these programs should probably be added to this document.

The Unix Shell Programs

babase-copy-babase-schema

Copies the entire content of the babase schema from one database to another.

babase-user-add

Adds a PostgreSQL user, granting permission to use Babase.

There is also the ranker program, which runs on the local workstation and uses the Internet to communicate with the database. Developed separately from the rest of Babase, neither the source code management of nor the documentation for the ranker program is particularly well integrated into Babase.

[10] Actual standards, not de facto ones.

[11] Because open standards and the documentation for Open Source and Free Software programs are available without cost, and because the inherently transparent and public nature of open standards, Open Source, and Free Software not only leads to a wealth of good instructional material freely available on the Internet but also rounds out the basic requirements of a complete learning environment by ensuring that the software itself is available to everyone.

[12] Because once software is released and distributed under a Free or Open Source license it cannot be locked away and made unavailable, and because open standards are rarely changed in a backwards-incompatible way.

[13] Consequently the skills are rather widely available. The difficult part, as always, is finding all of the relevant skills at once. For more on this see The Babase Program Code section.

[14] Presently licensed under the GPL Version 3 or later.

[15] Babase database content is not available to the public.

[16] We do this rather than paying one of the regular certification authorities to validate our identity. These certification authorities appear to validate the identity of their customers by virtue of little more than having successfully been paid.

Chapter 2. Babase System Architecture

Databases

Databases are collections of information, all of which can be queried and otherwise manipulated alone or in aggregation with all other database content.[17] Babase contains three databases.

The babase Database

The babase database contains the real information. All research takes place in this database.

The babase_copy Database

The babase_copy database contains a copy of the babase database. It is a place to try out dangerous things that might break the babase database.

The babase_test Database

The babase_test database contains a few bits of made-up information. It is a place to try out random things and a place where the Babase developers can work on alterations and enhancements.

Users, Groups and Database Permissions

Each user is given a login and a password they must use to gain access to the database. It is good form to change your password occasionally.[18]

The database can grant specific users various levels of access to specific tables, although such access is not common as it is difficult to administer and maintain such a fine grained degree of control. For further information see the PostgreSQL documentation on Database Users and Privileges.

Rather than maintain database access privileges on a per-user basis it is more convenient to place users in groups and then grant these groups different levels of database access.

Babase contains the following groups:

The babase_readers group

The members of this group have read access to Babase data and cannot add, delete, or otherwise alter any of the data.

The babase_editors group

The members of this group have unlimited rights to the Babase data. They may add data, delete data, or alter existing data. They may not, however, alter the structure of the babase database or change the rules to which the data are required to conform. Thus, they may not add or delete tables, alter triggers, or write or replace stored procedures.

Schemas

Schemas partition databases. Tables, procedures, triggers, and so forth are all kept in schemas. Schemas are like sub-databases within a database. The salient difference between schemas and databases is that a single SQL statement can refer to objects in the different schemas of the parent database, but cannot refer to objects in other databases -- tables within a database can be related, but tables in different databases cannot. Babase uses schemas to partition each database into areas where users have a greater or lesser degree of freedom to make changes. For further information on schemas see the schema documentation for PostgreSQL.

Each database is divided into the same schemas. That is, each schema described below exists within each of the databases described here.

The system looks at the different schemas for objects, for example table names appearing in SQL queries, in the order in which the schemas are listed below. If the table does not appear in the first schema it looks in the second, and so forth. As soon as a table is found with the name given, that table is used and the search stops.

To explicitly reference an object in a specific schema, place the name of the schema in front of the object, separating the two with a period (e.g. schemaname.tablename).
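The search order and the schemaname.tablename qualification can be tried out with a small sketch. SQLite has no schemas in PostgreSQL's sense, but its attached databases behave similarly for this purpose. They can be referenced together in a single statement, an unqualified name is looked up in a fixed order (the main database before any attached one), and a name can be qualified as name.table. Here main plays the role of the babase schema and an attached database named sandbox plays the sandbox schema. The table names are hypothetical. In PostgreSQL itself the order is governed by the search_path setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached in-memory database stands in for a second schema.
conn.execute("ATTACH DATABASE ':memory:' AS sandbox")

conn.executescript("""
    CREATE TABLE main.foo    (src TEXT);   -- same name in both "schemas"
    CREATE TABLE sandbox.foo (src TEXT);
    CREATE TABLE sandbox.bar (src TEXT);   -- exists only in sandbox
    INSERT INTO main.foo    VALUES ('main');
    INSERT INTO sandbox.foo VALUES ('sandbox');
    INSERT INTO sandbox.bar VALUES ('sandbox');
""")

def one(sql):
    """Return the first column of the first row of a query."""
    return conn.execute(sql).fetchone()[0]

print(one("SELECT src FROM foo"))          # main    (found first; search stops)
print(one("SELECT src FROM sandbox.foo"))  # sandbox (explicitly qualified)
print(one("SELECT src FROM bar"))          # sandbox (the only place it exists)
```

The first query shows why qualification matters. An unqualified name silently resolves to the first match in the search order, which may not be the table the user intended.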

The babase schema

The babase schema holds the official Babase tables. Everything in the babase schema is documented and supported.

In this schema the babase_readers and babase_editors have the access described above.

The babase_something_views schemas

Babase contains a number of schemas that exist to simplify things for those interested only in particular portions of Babase. These schemas contain nothing but views that reference other parts of Babase, the parts that are especially relevant and useful to those interested only in one of the broad categories of Babase data. These schemas and their corresponding categories are:

The categories of Babase data and their schemas
Schema | Category
babase_cycles_views | Sexual Cycles
babase_darting_views | Darting
babase_demog_views | Group Membership and Life Events
babase_social_views | Social and Multiparty Interactions
babase_support_views | Support Tables
babase_swerb_views | SWERB Data (Group-level Geolocation Data)
babase_weather_views | Weather Data
babase_group_views | Views Which Add Gid To Tables

These schemas provide an overview of the major areas of Babase. They should be especially useful to those starting out with Babase or those interested only in particular portions of Babase data.

The views in these schemas may only be queried. Any updating of Babase data must be done in the babase schema.

Note

Some of Babase's tables and views appear in more than one of these schemas, some in none.

Warning

Do not create any views that reference the views in these schemas. Reference the babase schema instead. Any views created that reference anything in these category schemas will be destroyed on occasion as Babase is modified.

The babase_pending schema

The babase_pending schema holds tables pending planned integration into Babase. The tables in this schema are intended to be used with the official Babase tables but, unlike the official Babase tables, there is no automated validation process and the table structure has not been thoroughly reviewed. The tables in babase_pending are to be used but their content and structure may change when officially incorporated into Babase.

Documentation on the content of the babase_pending schema may be found on the babase_pending page of the Babase Wiki.

The difference between this schema and the sandbox schema is in the permissions granted.

babase_readers permissions in the babase_pending schema

Members of the babase_readers group have the same permissions they do in the babase schema: they have read access to the data but cannot add, delete, or modify it. However, unlike in the babase schema, individual users may be granted the right to add, delete, or change data on a table-by-table basis.

babase_editors permissions in the babase_pending schema

Members of the babase_editors group have the permissions they normally have in the babase schema: they may add, delete, or modify all data in the schema's tables.

The sandbox schema

The sandbox schema holds tables that are used together with the official Babase tables but have not yet made it into the Babase project. They will not be documented in the Babase documentation.

The groups have the following permissions:

babase_readers permissions in the sandbox schema

The babase_readers have all the permissions in the sandbox schema that the babase_editors have in the babase schema. They may add, delete, or modify any information in the schema but may not alter the structure of the schema by adding or removing tables, procedures, triggers, or anything else.

babase_editors permissions in the sandbox schema

The babase_editors have all the permissions of the babase_readers, plus they may add or delete tables, stored procedures, or any other sort of object necessary to control the structure of the data.

Because of the schema search order the schema name must be used to qualify anything created in the sandbox schema. E.g.

Example 2.1. Creating table foo in the sandbox schema


CREATE TABLE sandbox.foo (somecolumn INTEGER);
              


PostgreSQL, the database underlying Babase, is secure by default. This means that tables and other database objects cannot be accessed by anyone but their creator unless the creator grants permission. Members of babase_editors who create tables in the sandbox schema should use the GRANT statement to grant access to Babase's other users.

This is done as follows:

Example 2.2. Granting permission to table foo in the sandbox schema


GRANT ALL ON sandbox.foo TO GROUP babase_editors;
GRANT SELECT ON sandbox.foo TO GROUP babase_readers;

              


There is one other issue. Only the creator of a table can change its structure -- to add another column, change the table name, etc. And only the creator can destroy (DROP) the table.

The devel schema

The devel schema holds tables undergoing integration into Babase. Normally it is empty, but during the design and development of new tables it may contain the tables being developed.

The tables in this schema do not necessarily contain valid or finalized data and so are not expected to be used for other than developmental purposes.

Permissions are granted in the devel schema on the same basis as the granting of permissions in the babase schema.

The difference between this schema and the sandbox schema is that the development tools support the creation and modification of the tables in the devel schema, which facilitates the movement of tables from the devel schema into the babase schema.

The per-user schemas

Each user has her own schema, a schema named with the user's login. Users have permissions to do anything they want in their own schemas, and no permissions whatsoever to anybody else's schema. A user's schema is private.

Caution

Users are not encouraged to grant others permissions to the tables in their schema, as shown in the Section: “The sandbox schema” above. A user's schema is deleted when she leaves Babase. All shared tables belong in the sandbox schema where they can be maintained without regard to personnel changes.

Because of the schema search order the schema name must be used to qualify anything created in the user's schema. E.g.

Example 2.3. Creating table foo in user mylogin's schema


CREATE TABLE mylogin.foo (somecolumn INTEGER);
            


Table Overview

The data in Babase are stored in tables. Tables can be visualized as grids, with rows and columns. Each row represents a single real-world thing or event, an entity, e.g. a baboon. Each cell in the row contains a single unit of information, e.g. a birth date, a name, and a sex. The row holds the entirety of the information belonging to the entity as an isolated thing, e.g. baboon database entities consist of a birth date, a name, and a sex. Each column contains one and only one kind of information, e.g. birth date.

Table 2.1 is an example of a database table that might be used to represent baboons, one baboon per row. Notice that each cell contains one and exactly one unit of information.

Table 2.1. A Simple Database Table

Birth | Name | Sex
May 23, 1707 | Alice | Female
February 12, 1809 | Bob | Male
July 22, 1822 | Carol | Female
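Table 2.1 could be created and queried as shown below. This sketch uses SQLite, via Python's sqlite3 module, rather than PostgreSQL, and the table and column names are illustrative, not Babase's. Dates are stored as ISO 8601 text so that they sort and compare correctly; PostgreSQL would use its DATE type instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per kind of information; one row per baboon.
conn.execute("""
    CREATE TABLE baboons (
        birth TEXT NOT NULL,   -- ISO 8601 date, e.g. '1707-05-23'
        name  TEXT NOT NULL,
        sex   TEXT NOT NULL CHECK (sex IN ('Female', 'Male'))
    )""")
conn.executemany("INSERT INTO baboons VALUES (?, ?, ?)", [
    ("1707-05-23", "Alice", "Female"),
    ("1809-02-12", "Bob",   "Male"),
    ("1822-07-22", "Carol", "Female"),
])

# Each cell holds exactly one value, so questions become simple filters.
females = conn.execute(
    "SELECT name FROM baboons WHERE sex = 'Female' ORDER BY birth").fetchall()
print([name for (name,) in females])  # ['Alice', 'Carol']
```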

Anyone working with Babase will require a familiarity with the database's tables. An understanding of the entity each row represents is critical when working with a table. The remainder of this section provides short definitions of the entities each table holds in its rows.

Some of the tables in Babase exist to define a vocabulary. These are the support tables. For lack of a better term, the remainder of the tables are labeled main tables in Table 2.2.

Warning

Tables which have names ending in _DATA should not be used; there is always a view of the data in these tables that may be used in their place. Tables ending in _DATA may change in future Babase minor releases, breaking queries and programs which use the table. Use of the corresponding views will ensure compatibility with future Babase releases.
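The reason for preferring the views can be sketched as follows. The table sample_data and its view samples are hypothetical, and SQLite stands in for PostgreSQL. When a later release restructures the underlying table, the maintainers redefine the view so that it presents the same columns. A query written against the view keeps working, while the same query written against the table breaks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def release_1():
    conn.executescript("""
        CREATE TABLE sample_data (sid INTEGER PRIMARY KEY, obs TEXT);
        CREATE VIEW samples AS SELECT sid, obs AS observer FROM sample_data;
        INSERT INTO sample_data VALUES (1, 'JA');
    """)

def release_2():
    # A later release renames a column of the _DATA table and
    # redefines the view so that its own columns stay the same.
    conn.executescript("""
        DROP VIEW samples;
        CREATE TABLE new_data (sid INTEGER PRIMARY KEY, observer_code TEXT);
        INSERT INTO new_data SELECT sid, obs FROM sample_data;
        DROP TABLE sample_data;
        ALTER TABLE new_data RENAME TO sample_data;
        CREATE VIEW samples AS
            SELECT sid, observer_code AS observer FROM sample_data;
    """)

VIEW_QUERY = "SELECT observer FROM samples WHERE sid = 1"
TABLE_QUERY = "SELECT obs FROM sample_data WHERE sid = 1"

release_1()
print(conn.execute(VIEW_QUERY).fetchone())   # ('JA',)
release_2()
print(conn.execute(VIEW_QUERY).fetchone())   # ('JA',), the view query still works
try:
    conn.execute(TABLE_QUERY)
except sqlite3.OperationalError as err:
    print("table query broke:", err)         # the column obs no longer exists
```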

Table 2.2. The Main Babase Tables

Group Membership and Life Events
Table | One row for each
ALTERNATE_SNAMES | rescinded sname
BIOGRAPH | animal, including fetuses
CENSUS | day each individual is (or is not) observed in a group
CONSORTDATES | male who has a known first consortship
DEMOG | mention of an individual's presence in a group within a field textual note
DISPERSEDATES | male who has left his maternal study group
GROUPS | group (including solitary males)
MATUREDATES | individual who is sexually mature
RANKDATES | individual[a] who has attained adult rank
 
Analyzed: Group Membership and Life Events
Table | One row for each
DAD_DATA | offspring having a paternity analysis
MEMBERS | day each individual is alive
RANKS | month each individual is ranked in each group
 
Sexual Cycles
Table | One row for each
CYCGAPS | female for each initiation or cessation of a continuous period of observation
CYCLES | female's cycle (complete or not)
CYCPOINTS | Mdate (menses), Tdate (turgesence onset), or Ddate (deturgesence onset) date of each female
PCSKINS | PCS color of each female
PREGS | time a female becomes pregnant
SEXSKINS | sexskin measurement of each female
 
The Sexual Cycle Day-By-Day Tables
Table | One row for each
CYCGAPDAYS | female for each day within a period during which there is not continuous observation
CYCSTATS | day each female is cycling -- by M, T and Ddates
MDINTERVALS | day each female is cycling and is between M and Ddates
MMINTERVALS | day each female is cycling -- by Mdates
REPSTATS | day each female has a known reproductive state
 
Social and Multiparty Interactions
Table | One row for each
ALLMISCS | free form all-occurrences datum
CONSORTS | multiparty dispute over a consortship
FPOINTS | point observation of a mature female
INTERACT_DATA | interaction between individuals
MPIS | collection of multiparty interactions
MPI_DATA | single dyadic interaction of a multiparty interaction collection
MPI_PARTS | participant in a dyadic interaction of a multiparty interaction collection
PARTS | participant in each interaction
POINT_DATA | individual point observation
NEIGHBORS | neighbor recorded in each point sample
SAMPLES | all-occurrences sample
 
Darting
Table | One row for each
ANESTHS | time additional sedation is administered to a darted individual
BODYTEMPS | body temperature measurement taken of a darted individual
CHESTS | chest circumference measurement made of a darted individual
CROWNRUMPS | crown to rump measurement made of a darted individual
DART_SAMPLES | sample type collected at each darting
DARTINGS | darting of an animal when data was collected
DPHYS | darting event during which physiological measurements were taken
HUMERUSES | humerus length measurement made of a darted individual
PCVS | packed cell volume measurement taken from a darted individual
TEETH | possible tooth site within the mouth on which data was collected for every darting event during which dentition data was collected
TESTES_ARC | every testicle width/length measurement recorded, as measured along a portion of the circumference
TESTES_DIAM | every testicle width/length measurement recorded, as measured along the diameter
TICKS | darting event during which data on ticks and other parasites were recorded
ULNAS | ulna length measurement made of a darted individual
 
Analyzed: Darting
Table | One row for each
WBC_COUNTS | count from a blood smear collected during a darting
 
SWERB Data (Group-level Geolocation Data)
Table | One row for each
AERIALS | aerial photo used for map quadrant specification
GPS_UNITS | GPS device
QUAD_DATA | SWERB map quadrant
SWERB_BES | uninterrupted bout of group-level observation
SWERB_DATA | event related to group-level geolocation
SWERB_DEPARTS_DATA | departure from camp of an observation team which collected SWERB data
SWERB_GWS | geolocated physical object (grove or waterhole)
SWERB_GW_LOC_DATA | recorded location of a geolocated physical object (grove or waterhole)
SWERB_LOC_DATA | observation of a group at a time at a geolocated physical object
SWERB_LOC_GPS | observation of a group at a time at a geolocated physical object made using GPS units and a protocol that requires 2 waypoint readings
SWERB_OBSERVERS | departure from camp of an observer who drove or collected SWERB data
 
Weather Data
Table | One row for each
RAINGAUGES | rain gauge reading
RGSETUPS | rain gauge installation
TEMPMAXS | maximum temperature reading
TEMPMINS | minimum temperature reading
WEATHERHAWK | weather reading reported by the WeatherHawk instruments
WREADINGS | manually collected meteorological data collection event
 

[a] At the time of this writing only males have data entered into RANKDATES.


The significant aspects of the support tables are: the Id column (the column holding the vocabulary term), which columns of which tables use the vocabulary, and what sort of vocabulary the table defines. Table 2.3 summarizes this information.

Note

The Id columns throughout Babase do not allow values that are NULL, or which are textual but contain no characters, or which consist solely of spaces.
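The following sketch shows a support table and the rule in the note above. It assumes the rule is expressed as NOT NULL plus a CHECK constraint; Babase's actual mechanism may differ. SQLite stands in for PostgreSQL, and the dcauses and biograph tables shown are simplified, hypothetical versions. The foreign key makes the support table the vocabulary for the main table's column, so any value not listed there is rejected. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; PostgreSQL always enforces them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
conn.executescript("""
    -- Support table: one row per permitted vocabulary term.
    CREATE TABLE dcauses (
        dcause TEXT PRIMARY KEY NOT NULL
               CHECK (trim(dcause) <> ''),  -- no empty or all-space Ids
        descr  TEXT NOT NULL
    );
    -- Main table whose column draws on that vocabulary.
    CREATE TABLE biograph (
        sname  TEXT PRIMARY KEY,
        dcause TEXT REFERENCES dcauses (dcause)
    );
    INSERT INTO dcauses VALUES ('1', 'illustrative cause');
""")

def accepted(sql, params):
    """Return True if the database accepts the statement, False if it refuses."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

print(accepted("INSERT INTO dcauses VALUES (?, 'x')", ("   ",)))   # False: all spaces
print(accepted("INSERT INTO dcauses VALUES (?, 'x')", (None,)))    # False: NULL
print(accepted("INSERT INTO biograph VALUES ('ALI', ?)", ("1",)))  # True
print(accepted("INSERT INTO biograph VALUES ('BOB', ?)", ("9",)))  # False: not in vocabulary
```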

Table 2.3. The Babase Support Tables

General Support Tables
Table | Id Column | Related Column(s) | One entry for every possible choice of...
OBSERVERS | Initials | SAMPLES.Observer, WREADINGS.WRperson, RGSETUPS.RGSPerson, CROWNRUMPS.CRobserver, CHESTS.Chobserver, ULNAS.Ulobserver, HUMERUSES.Huobserver, SWERB_OBSERVERS.Observer | person who records information
OBSERVER_ROLES | Initials | OBSERVERS.Role, OBSERVERS.SWERB_Observer_Role, OBSERVERS.SWERB_Driver_Role, SWERB_OBSERVERS.Role | way in which a person can be involved in the data collection process
UNKSNAMES | Unksname | NEIGHBORS.Unksname and the SWERB_UPLOAD view | problem in identifying neighbor of focal during point sampling or in identifying a lone male in a SWERB other group observation
 
Group Membership and Life Events
Table | Id Column | Related Column(s) | One entry for every possible choice of...
BSTATUSES | Bstatus | BIOGRAPH.Bstatus | birthday estimation accuracy
CONFIDENCES | Confidence | BIOGRAPH.DcauseNatureConfidence, BIOGRAPH.DcauseAgentConfidence, DISPERSEDATES.Dispconfidence, BIOGRAPH.Matgrpconfidence | degree of certitude in nature of death, agent of death, disperse date assignment, or maternal group assignment
DAD_SOFTWARE | Software | DAD_DATA.Software | software package used to perform genetic paternity analysis
DCAUSES | Dcause | BIOGRAPH.Dcause | cause of death
DEATHNATURES | Nature | DCAUSES.Nature | reason for death
DEMOG_REFERENCES | Reference | DEMOG.Reference | data source for demography notes
MSTATUSES | Mstatus | MATUREDATES.Matured, RANKDATES.Ranked | maturity marker date estimation process
PATERNITY_COMPLETENESS | Completeness | DAD_DATA.Completeness | category of analysis completeness
PATERNITY_MISMATCHES | Mismatch | DAD_DATA.Consensus_Mismatch | category of genetic mismatch
RNKTYPES | Rnktype | RANKS.Rnktype | rank ordering assigned to subject and month
STATUSES | Status | BIOGRAPH.Status | baboon alive at last observation
 
Social and Multiparty Interactions
Table | Id Column | Related Column(s) | One entry for every possible choice of...
ACTIVITIES | Activity | POINT_DATA.Activity | activity classification
ACTS | Act | INTERACT_DATA.Act | interaction classification
DATA_STRUCTURES | Data_Structure | SETUPIDS.Data_Structure | version of data structure produced by the palmtops which collect data
CONTEXT_TYPES | Context_type | MPIS.Context_type | context in which a multiparty interaction occurs
FOODCODES | Foodcode | POINT_DATA.Foodcode | name of a food item
FOODTYPES | Ftype | FOODCODES.Ftype | food category
KIDCONTACTS | Kidcontact | FPOINTS.Kidcontact | spatial relationship between mother and infant
MPIACTS | Mpiact | MPI_DATA.MPIAct | multiparty interaction classification
NCODES | Ncode | NEIGHBORS.Ncode | neighbor classification
PALMTOPS | Palmtop | SAMPLES.Palmtop | hand-held computer used in the field
PARTUNKS | Unksname | MPI_PARTS.Unksname | problem in identifying participant in a multiparty interaction
POSTURES | Posture | POINT_DATA.Posture | designated posture
PROGRAMIDS | Programid | SAMPLES.Programid | version of program used on the palmtops to collect data
SETUPIDS | Setupid | SAMPLES.Setupid | setupfile used on the palmtops to collect data
SUCKLES | Suckle | FPOINTS.Kidsuckle | infant suckling activity
 
Sexual Cycles and The Sexual Cycle Day-By-Day Tables
Table | Id Column | Related Column(s) | One entry for every possible choice of...
PCSCOLORS | Color | PCSKINS.Color | paracallosal skin coloration
 
Darting
Table | Id Column | Related Column(s) | One entry for every possible choice of...
BODYPARTS | Bodypart | TICKS.Bodypart, BODYPARTS.Bodyregion | part of the body examined for parasites when darting
DART_SAMPLE_CATS | Ds_cat | DART_SAMPLE_CATS.DS_Cat | category of darting sample type
DART_SAMPLE_TYPES | DS_Type | DART_SAMPLE_TYPES.DS_Type | type of sample collected during dartings
DRUGS | Drug | DRUGS.Drug | anesthetic drug
LYMPHSTATES | Lymphstate | DPHYS.Ringnode, DPHYS.Lingnode, DPHYS.Raxnode, DPHYS.Laxnode, DPHYS.Lsubmandnode, DPHYS.Rsubmandnode | lymph node condition
PARASITES | PARASITE | TICKS.Tickkind | parasite species, species developmental stage, or kind of parasite sign counted
TCONDITIONS | Tcondition | TEETH.Tcondition | physical condition of a tooth
TICKSTATUSES | Tickstatus | TICKS.Tickstatus | parasite count outcome category
TOOTHCODES | Tooth | TEETH.Tooth | adult or deciduous tooth
TOOTHSITES | Toothsite | TOOTHCODES.Toothsite | dental site within the mouth
TSTATES | Tstate | TEETH.Tstate | tooth presence
 
SWERB Data (Group-level Geolocation Data)
Table | Id Column | Related Column(s) | One entry for every possible choice of...
ADCODES | ADCode | SWERB_LOC_DATA.ADcode | relationship between baboon groups and sleeping groves
SWERB_LOC_CONFS (SWERB sleeping grove Confidences) | Conf | SWERB_LOC_DATA.Conf | level of confidence in sleeping grove on record
SWERB_TIME_SOURCES | Source | SWERB_BES.Bsource, SWERB_BES.Esource | data source used to estimate beginning and ending of observation bouts
SWERB_XYSOURCES (SWERB XY Sources) | Source | SWERB_GW_LOC_DATA.XYSource | data source used to obtain XY coordinates
 
Weather Data
Table | Id Column | Related Column(s) | One entry for every possible choice of...
WEATHERHAWK_SOFTWARES | WSoftware | WEATHERHAWK.WSoftware | software used to retrieve data from a WeatherHawk instrument
WSTATIONS | Wstation | WREADINGS.Wstation | meteorological data collection location or device

Entity-Relationship Diagrams

Most tables have an id, or key, column containing a number that is unique to that row within its table. The id can be used, in perpetuity, to refer to its row and distinguish it from all the other rows of the table. Ids are arbitrary, although for convenience they are often sequentially generated integers. The column is not always named Id, although it sometimes is.

A relationship is established between the rows of two tables when an id value from one table appears as data in the other. The notion of a relationship is made clearest by way of diagrams and examples. If the next paragraph is unclear, don't worry; look at the Babase diagrams below and see whether they clear things up. The relationship concept is at the heart of relational databases and, while the underlying idea is rather simple, it took many years to develop relational database concepts[19], so don't expect a full understanding immediately.

When an id value of a row in one table appears as data in a second table, the data in the second table can be used to retrieve the identified row from the first table.[20] When an id value of a row in the first table can appear at most once in the second table, the two tables are said to have a one-to-one relationship: one row in the first table relates to one (or possibly zero) rows in the second table. When a row's id value can appear in more than one row of the second table, the two tables are said to have a one-to-many relationship: one row of the first table can be related to many rows in the second table. One-to-many relationships are more common than one-to-one relationships. The relationships between the Babase tables can be visualized in entity relationship diagrams, shown below. In these diagrams each table (entity) is a box containing a list of the table's columns, and the lines between the boxes represent the relationships between the tables.
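The one-to-many idea can be sketched with Python's built-in sqlite3 module. The two tables below, grps and animals, are hypothetical and heavily simplified stand-ins (not the real Babase definitions, and SQLite rather than PostgreSQL). Each animal row stores the id of its group, so one group row relates to many animal rows, and the stored id can be followed in either direction.

```python
import sqlite3

# Hypothetical, simplified tables; the real Babase tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grps    (gid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE animals (sname TEXT PRIMARY KEY,
                          gid   INTEGER NOT NULL REFERENCES grps (gid));
    INSERT INTO grps    VALUES (1, 'Group A'), (2, 'Group B');
    INSERT INTO animals VALUES ('AAA', 1), ('BBB', 1), ('CCC', 2);
""")

def group_of(sname):
    """From the "many" side: the stored gid retrieves exactly one group row."""
    row = conn.execute(
        "SELECT g.name FROM animals AS a JOIN grps AS g ON g.gid = a.gid"
        " WHERE a.sname = ?", (sname,)).fetchone()
    return row[0]

def members_of(gid):
    """From the "one" side: a single gid may appear in many animal rows."""
    rows = conn.execute(
        "SELECT sname FROM animals WHERE gid = ? ORDER BY sname", (gid,))
    return [r[0] for r in rows]

print(group_of("BBB"))  # Group A
print(members_of(1))    # ['AAA', 'BBB']
```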

Note

If you have trouble viewing the diagrams in your browser, you may wish to view them in PDF format. The diagrams are available in The Babase Pocket Reference (approx. 4.8MB) in PDF form.

Figure 2.1. Key to the Babase Entity Relationship Diagrams

If we could we would display the diagram key here.


Figure 2.2. Babase Group Membership Entity Relationship Diagram

If we could we would display a diagram here depicting censusing and group membership.


Figure 2.3. Babase Life Events Entity Relationship Diagram

If we could we would display here a diagram depicting maturity markers and ranking.


Figure 2.4. Babase Sexual Cycle Entity Relationship Diagram

If we could we would display a diagram here depicting female sexual cycle information.


Figure 2.5. Babase Sexual Cycle Day-To-Day Tables Entity Relationship Diagram

If we could we would display a diagram here depicting female sexual cycle day-to-day tables.


Figure 2.6. Babase Social Interactions Entity Relationship Diagram

If we could we would display a diagram here depicting social interactions and focal point samples.


Figure 2.7. Babase Multiparty Interactions Entity Relationship Diagram

If we could we would display a diagram here depicting multiparty interactions.


Figure 2.8. Babase Darting Logistics and Morphology Entity and Relationship Diagram

If we could we would display a diagram here depicting darting logistics and morphology.


Figure 2.9. Babase Darting Physiology Entity and Relationship Diagram

If we could we would display a diagram here depicting darting physiology.


Figure 2.10. Babase Darting Samples Entity and Relationship Diagram

If we could we would display a diagram here depicting darting samples.


Figure 2.11. Babase Darting Teeth and Ticks Entity and Relationship Diagram

If we could we would display a diagram here depicting darting teeth and ticks.


Figure 2.12. Babase Hybrid Scores Data Entity Relationship Diagram

If we could we would display here a diagram depicting Babase Hybrid Score Data tables.


Figure 2.13. Babase SWERB Core Tables Entity Relationship Diagram

If we could we would display a diagram here depicting the SWERB core tables.


Figure 2.14. Babase SWERB Grove/Waterhole Location Tables Entity Relationship Diagram

If we could we would display a diagram here depicting the SWERB Grove/Waterhole Location tables.


Figure 2.15. Babase Manual Weather Data Entity Relationship Diagram

If we could we would display here a diagram depicting Babase manual weather Data Samples.


Figure 2.16. Babase WeatherHawk Data Entity Relationship Diagram

If we could we would display here a diagram depicting Babase WeatherHawk Data Samples.


Views

Views provide an alternative to referencing Babase tables directly. Views appear to be tables, but are really pre-composed queries into the underlying Babase tables. Views can be used almost anywhere in Babase in place of a table; in particular, they can be queried just like tables. An SQL query can freely intermix tables and views.

Important

Babase uses views to hide implementation details, details that may change as Babase develops. Tables with names ending in _DATA should not be used directly; there is always a view of the data in these tables that may be used in their place. Tables ending in _DATA may change in future Babase minor releases, breaking queries and programs that use them. Using the corresponding views ensures compatibility with future Babase releases.

Views make it easy to reuse complex or commonly used queries, or portions of queries. They allow a database designed around the capabilities of the computer to be interacted with in a fashion that makes sense to people. Views do not appear in the entity relationship diagrams that document the underlying database, so they are omitted from the high-level overview those diagrams provide. Even so, most Babase users will benefit greatly from taking the time to understand how the views fit into the overall database, and will usually find it easier to work with the views than with the underlying tables.
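As a minimal sketch of why queries should name the view rather than the _DATA table, the SQLite example below defines a hypothetical interact_data table and an interact view over it. The view adds one derived column (a year, chosen purely for illustration; the real INTERACT view's extra columns differ, and Babase runs on PostgreSQL). Queries written against the view are unaffected if the table beneath it is later reorganized and the view redefined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for an underlying *_DATA table.
    CREATE TABLE interact_data (iid  INTEGER PRIMARY KEY,
                                act  TEXT NOT NULL,   -- hypothetical act codes
                                date TEXT NOT NULL);  -- ISO 'YYYY-MM-DD'
    -- The view exposes the same rows plus a derived column. User queries
    -- name the view, so the table beneath it can change without breaking them.
    CREATE VIEW interact AS
        SELECT iid, act, date,
               CAST(strftime('%Y', date) AS INTEGER) AS year
          FROM interact_data;
    INSERT INTO interact_data (act, date) VALUES
        ('G', '2004-03-01'), ('A', '2004-07-15'), ('G', '2005-01-09');
""")

# The view is queried exactly as a table would be.
per_year = conn.execute(
    "SELECT year, COUNT(*) FROM interact GROUP BY year ORDER BY year"
).fetchall()
print(per_year)  # [(2004, 2), (2005, 1)]
```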

Table 2.4. The Babase Views

Group Membership and Life Events
View | One row for each | Purpose | Tables/Views used
CENSUS_DEMOG | CENSUS row | Maintenance of CENSUS rows that are extended with DEMOG information. | CENSUS, DEMOG
CENSUS_DEMOG_SORTED | CENSUS row | Maintenance of CENSUS_DEMOG rows in a pre-sorted fashion. | CENSUS, DEMOG
CYCPOINTS_CYCLES | CYCPOINTS row | Maintenance of CYCPOINTS rows that are extended with CYCLES information. | CYCLES, CYCPOINTS
CYCPOINTS_CYCLES_SORTED | CYCPOINTS row | The CYCPOINTS_CYCLES view sorted by CYCLES.Sname, then by CYCPOINTS.Date. | CYCLES, CYCPOINTS
DEMOG_CENSUS | DEMOG row | Maintenance of DEMOG rows. | CENSUS, DEMOG
DEMOG_CENSUS_SORTED | CENSUS row | Maintenance of DEMOG_CENSUS rows in a pre-sorted fashion. | CENSUS, DEMOG
GROUPS_HISTORY | GROUPS row | Depiction of GROUPS rows in a more human-readable format. | GROUPS
PARENTS | BIOGRAPH row for which there is either a row in MATERNITIES with a record of the individual's mother or there is a row in DAD_DATA with a record of the individual's father -- with a non-NULL Dad_consensus. | Easy access to parental information. | BIOGRAPH, MATERNITIES, DAD_DATA, MEMBERS
POTENTIAL_DADS | (completed) female reproductive event for every male more than 2192 days old (approximately 6 years) present in the mother's group during her fertile period | Research into paternity, especially the selection of potential fathers for further genetic testing. | MATERNITIES, MEMBERS (multiple times), ACTOR_ACTEES (multiple times), BIOGRAPH, RANKDATES, MATUREDATES
 
Sexual Cycles
View | One row for each | Purpose | Tables/Views used
CYCLES_SEXSKINS | CYCLES row | Maintenance of SEXSKINS rows. | CYCLES, SEXSKINS
CYCLES_SEXSKINS_SORTED | CYCLES row | The CYCLES_SEXSKINS view sorted by CYCLES.Sname, then by SEXSKINS.Date. | CYCLES, SEXSKINS
MATERNITIES | birth or fetal loss | Summarizes (completed) reproductive events. | BIOGRAPH, PREGS, CYCPOINTS, CYCLES
MTD_CYCLES | CYCLES row | Presents CYCLES together with Mdate, Tdate, and Ddate CYCPOINTS information for a view of an "entire" sexual cycle as a single row. | CYCLES, CYCPOINTS
PCSKINS_SORTED | PCSKINS row | Sorts PCSKINS by Sname for ease of maintenance. | PCSKINS
SEXSKINS_CYCLES | SEXSKINS row | Maintenance of SEXSKINS rows. | CYCLES, SEXSKINS
SEXSKINS_CYCLES_SORTED | SEXSKINS row | The SEXSKINS_CYCLES view sorted by CYCLES.Sname, then by SEXSKINS.Date. | CYCLES, SEXSKINS
 
Social and Multiparty Interactions
View | One row for each | Purpose | Tables/Views used
ACTOR_ACTEES | INTERACT row | Maintenance of social interaction data, INTERACT rows and POINTS. A view optimized for highest performance when working with these tables. Analysis of social interaction data. | INTERACT, PARTS
INTERACT | INTERACT_DATA row | Presents INTERACT_DATA with additional date and time columns that transform the underlying date and time columns in useful and interesting ways. | INTERACT_DATA
INTERACT_SORTED | INTERACT_DATA row | Presents the INTERACT view sorted in a fashion expected to ease maintenance. | INTERACT_DATA
MPI_EVENTS | MPI_DATA row | Analysis and correction of multiparty interaction data. | MPI_DATA, MPI_PARTS, MPIACTS
POINTS | POINT_DATA row | Presents POINT_DATA with the Ptime column transformed into a column that may be useful and interesting. | POINT_DATA
POINTS_SORTED | POINTS row | Presents POINTS sorted by Sid, and within that by Ptime. | POINTS
SAMPLES_GOFF | SAMPLES row | Presents SAMPLES with an additional column Grp_of_focal, which has the group of the focal at the time of sampling. | SAMPLES
 
Darting
View | One row for each | Purpose | Tables/Views used
ANESTH_STATS | unique ANESTHS.Dartid value -- for each darting during which additional anesthetic was administered | Analysis and eyeballing of data involving additional administration of anesthetic when darting. | ANESTHS
BODYTEMP_STATS | unique BODYTEMPS.Dartid value -- for each darting having body temperature measurements | Analysis and eyeballing of darting body temperature measurements. | BODYTEMPS
CHEST_STATS | unique CHESTS.Dartid value -- for each darting having chest circumference measurements | Analysis and eyeballing of darting chest circumference measurements. | CHESTS
CROWNRUMP_STATS | unique CROWNRUMPS.Dartid value -- for each darting having crown-to-rump measurements | Analysis and eyeballing of darting crown-to-rump measurements. | CROWNRUMPS
DSAMPLES | unique DARTINGS.Dartid value -- for each darting | Visualization of all samples collected per darting. | DARTINGS, MEMBERS, DART_SAMPLES
DENT_CODES | unique TEETH.Dartid value -- for each darting with recorded tooth information | Perusal and maintenance of TEETH rows by kind of tooth. | TEETH
DENT_SITES | unique TEETH.Dartid value -- for each darting with recorded tooth information | Perusal of TEETH rows by position in the mouth. | TEETH, TOOTHCODES
HUMERUS_STATS | unique HUMERUSES.Dartid value -- for each darting having humerus length measurements | Analysis and eyeballing of darting humerus length measurements. | HUMERUSES
PCV_STATS | unique PCVS.Dartid value -- for each darting having PCV measurements | Analysis and eyeballing of darting PCV measurements. | PCVS
TESTES_ARC_STATS | unique TESTES_ARC.Dartid value -- for each darting having at least one measurement of testes length or width circumference | Analysis of testes length and width measurements taken during darting. | TESTES_ARC
TESTES_DIAM_STATS | unique TESTES_DIAM.Dartid value -- for each darting having at least one measurement of testes length or width diameter | Analysis of testes length and width measurements taken during darting. | TESTES_DIAM
ULNA_STATS | unique ULNAS.Dartid value -- for each darting having ulna length measurements | Analysis and eyeballing of darting ulna length measurements. | ULNAS
 
SWERB Data (Group-level Geolocation Data)
View | One row for each | Purpose | Tables/Views used
QUADS | QUAD_DATA row | Querying of X, Y coordinates from, and maintenance of, QUAD_DATA rows. | QUAD_DATA
SWERB | SWERB_DATA row -- for every SWERB event, departure from camp excluded | Collects SWERB related information spread among several tables and separates geolocation points into X and Y coordinates. | SWERB_DATA, QUADS, SWERB_BES, SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS
SWERB_DATA_XY | SWERB_DATA row -- for every SWERB event, departure from camp excluded | Separates SWERB_DATA geolocation points into X and Y coordinates for ease of maintenance. | SWERB_DATA
SWERB_DEPARTS | SWERB_DEPARTS_DATA row -- for every departure from camp of every observation team, for those observation teams which have collected SWERB data | Collects departure related information spread among several tables and separates geolocation points into X and Y coordinates. | SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS
SWERB_GW_LOCS | SWERB_GW_LOC_DATA row -- for every geolocation of an object, of a grove or waterhole | Collects SWERB grove and waterhole location information spread between tables and separates geolocation points into X and Y coordinates. | SWERB_GW_LOC_DATA, QUADS
SWERB_GW_LOC_DATA_XY | SWERB_GW_LOC_DATA row -- for every geolocation of an object, of a grove or waterhole | Separates SWERB_GW_LOC_DATA geolocation points into X and Y coordinates for ease of maintenance. | SWERB_GW_LOC_DATA
SWERB_LOC_GPS_XY | SWERB_LOC_GPS row -- for every time a group is observed at a geolocated physical object, usually a grove or waterhole, and 2 GPS waypoints are required by the protocol to collect the data | Separates SWERB_LOC_GPS geolocation points into X and Y coordinates for ease of maintenance. | SWERB_LOC_DATA, ADCODES
SWERB_LOCS | SWERB_LOC_DATA row -- for every time a group is observed at a geolocated physical object, usually a grove or waterhole | Presents the relationship between the groups and physical features of the landscape in a more comprehensive manner for simpler querying. | SWERB_LOC_DATA, ADCODES
SWERB_UPLOAD | row uploaded into SWERB | This view returns no rows; it is used only to upload data into the SWERB portion of Babase. | SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS, SWERB_BES, SWERB_DATA, SWERB_LOC_DATA
 
Weather Data
View | One row for each | Purpose | Tables/Views used
MIN_MAXS | WREADINGS row | Analysis and correlation of manually collected weather data. | WREADINGS, TEMPMINS, TEMPMAXS, RAINGAUGES
MIN_MAXS_SORTED | WREADINGS row | The MIN_MAXS view sorted for convenience. | WREADINGS, TEMPMINS, TEMPMAXS, RAINGAUGES

In addition to the above views, there are a number of views that produce the group of a referenced individual as of a pertinent date. Each of these views is named after the table from which it is derived, with the suffix _GRP appended. They are nearly identical to the tables from which they derive, differing only by the addition of a column named Grp. The views which produce an individual's group are listed in the following table.
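The _GRP pattern can be sketched in SQLite as a view that returns every column of its base table plus a Grp column looked up as of the row's date. The tables members and dartings and the view dartings_grp below are hypothetical simplifications. The sketch assumes a MEMBERS-like table holding one row per individual per day; how the real Babase _GRP views treat dates with no membership record may differ from the NULL produced by the LEFT JOIN here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified tables: members has one row per individual
    -- per day, recording the group the individual was in on that day.
    CREATE TABLE members  (sname TEXT, date TEXT, grp INTEGER,
                           PRIMARY KEY (sname, date));
    CREATE TABLE dartings (dartid INTEGER PRIMARY KEY,
                           sname  TEXT NOT NULL, date TEXT NOT NULL);
    -- A *_GRP-style view: every dartings column, plus grp as of the date.
    CREATE VIEW dartings_grp AS
        SELECT d.*, m.grp
          FROM dartings AS d
          LEFT JOIN members AS m
            ON m.sname = d.sname AND m.date = d.date;
    INSERT INTO members  VALUES ('AAA', '2006-05-01', 1),
                                ('AAA', '2006-05-02', 2);
    INSERT INTO dartings VALUES (10, 'AAA', '2006-05-02'),
                                (11, 'AAA', '2006-06-30');
""")

rows = conn.execute(
    "SELECT dartid, grp FROM dartings_grp ORDER BY dartid").fetchall()
print(rows)  # [(10, 2), (11, None)]  (no membership row for 2006-06-30)
```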


Special Values

To as great an extent as possible, Babase uses a controlled vocabulary within the system's data store. Again, as far as possible, this vocabulary may be tailored by adding codes to, or deleting codes from, the tables that define the vocabulary used elsewhere.[21]
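A controlled vocabulary of this kind is typically enforced with a foreign key from the data column to a support table. The SQLite sketch below uses hypothetical postures and point_data tables and made-up posture codes. It shows that an unknown code is rejected, and that extending the vocabulary is a data change rather than a programming change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- Hypothetical support table defining the legal codes ...
    CREATE TABLE postures (posture TEXT PRIMARY KEY, descr TEXT NOT NULL);
    -- ... and a data table whose column may hold only those codes.
    CREATE TABLE point_data (pntid   INTEGER PRIMARY KEY,
                             posture TEXT NOT NULL
                                     REFERENCES postures (posture));
    INSERT INTO postures VALUES ('S', 'sitting'), ('W', 'walking');
""")

def try_insert(code):
    """Insert a point with the given posture code; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO point_data (posture) VALUES (?)", (code,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("S"))  # True
print(try_insert("Q"))  # False: 'Q' is not (yet) part of the vocabulary

# Tailoring the vocabulary: add a row to the support table, no code changes.
conn.execute("INSERT INTO postures VALUES ('Q', 'hypothetical new posture')")
print(try_insert("Q"))  # True
```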

At times the Babase system recognizes that particular codes have special meanings, for example the BIOGRAPH table's F (female) Sex code or the 0 (alive) Status code. The meaning of these codes is fixed into the logic of the system. For example, an individual must be female to be allowed to have a menstruation, and an individual must be alive if a sexual cycle event is to post-date the individual's Statdate. Some of these codes, like sex, are not defined in tables; they are hardcoded into the system. Others are defined in support or other tables. Because these codes have intrinsic meaning, they cannot be removed from the Babase system, nor should they be used in the data to code a meaning different from the one the code presently has. For example, the meaning of STATUSES code value 0 should not be changed to mean death due to meteorite impact, because the system's programs would then allow dead individuals to have sexual cycles. Every value whose particular meaning the system requires to be retained is listed in the Special Values section of its table's documentation. For further information on the meaning of the special values, see the description of the data table(s) that contain the code values. Should the meaning of one of these special values need to be changed, the logic in the Babase programs should be adjusted to reflect the change.

Babase prevents ordinary users from altering rows that contain special values, in an attempt to prevent misconfiguration of the system. Only users with permission to modify a table's triggers may alter the table's special values. This is not a panacea. To return to the example above, the system expects not only that a STATUSES code of 0 means alive, but also that 0 is the only code on STATUSES that means alive. If another STATUSES code is created to indicate a more specific sort of alive-ness then, unless the system is re-programmed, it will consider all individuals given that code to be dead, not alive. The documentation should be reviewed carefully before modifying the content of tables that instantiate special values.
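The protection described above can be sketched with triggers that refuse to update or delete a special row. Babase implements its protection with PostgreSQL triggers and permissions; the SQLite triggers and the is_alive helper below are hypothetical illustrations only. The last lines show why protection alone is not a panacea: a new "alive" code is accepted by the table, but logic that hardwires 0 as the sole live status still treats it as dead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuses (status INTEGER PRIMARY KEY, descr TEXT NOT NULL);
    INSERT INTO statuses VALUES (0, 'alive'), (1, 'dead');
    -- Refuse to change or remove the special value 0.
    CREATE TRIGGER statuses_protect_upd BEFORE UPDATE ON statuses
        WHEN OLD.status = 0
        BEGIN SELECT RAISE(ABORT, 'status 0 is a special value'); END;
    CREATE TRIGGER statuses_protect_del BEFORE DELETE ON statuses
        WHEN OLD.status = 0
        BEGIN SELECT RAISE(ABORT, 'status 0 is a special value'); END;
""")

def attempt(sql):
    """Run a statement; return 'ok' or the database's error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)

print(attempt("UPDATE statuses SET descr = 'meteorite' WHERE status = 0"))
print(attempt("DELETE FROM statuses WHERE status = 0"))
print(attempt("UPDATE statuses SET descr = 'deceased' WHERE status = 1"))  # ok

# A second "alive" code is allowed into the table ...
conn.execute("INSERT INTO statuses VALUES (5, 'alive, injured')")

def is_alive(status):
    # ... but logic that treats 0, and only 0, as alive misclassifies it.
    return status == 0

print(is_alive(0), is_alive(5))  # True False
```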

Indexes

Indexes are a database feature that greatly speeds data retrieval. In return there is a small cost in the time it takes to change table content, and a cost in disk space. Databases generally require indexes to perform efficiently. It is a good idea for users to index the tables in their personal schemas.

There is no documentation on the indexes used in Babase. In general, there is an index for each way the tables are commonly referenced. For example, if records are often looked up on the basis of date, there will be an index on the date. As a practical guide, there is an index on each of the columns at the endpoint of a relational line in the above entity-relationship diagrams, as well as an index on every date column with the exception of the CYCPOINTS table's Edate and Ldate columns. Almost all indexes are b-tree indexes.
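The effect of an index on a date column can be observed by asking the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN on a hypothetical, simplified census table (PostgreSQL's EXPLAIN serves the same purpose against Babase itself). SQLite indexes, like almost all Babase indexes, are b-trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; not the real Babase CENSUS definition.
conn.execute("CREATE TABLE census (sname TEXT, date TEXT, grp INTEGER)")

def plan(sql):
    """Return SQLite's query-plan detail strings for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM census WHERE date = '2010-01-01'"
print(plan(query))  # without an index: a full-table scan

conn.execute("CREATE INDEX census_date_idx ON census (date)")
print(plan(query))  # with the index: a search using census_date_idx
```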

The Babase Program Code

Babase uses common and widespread Unix development tools and techniques[22] to minimize a new developer's learning curve. This is a vain hope. Babase is complex and contains a lot of moving parts.

The remainder of this section describes conventions and procedures that those working with the Babase source code are expected to follow. It is of interest primarily to those who work with, or are considering working with, the code. It is not a comprehensive list; further guidance should be taken from the existing code.

Anything and everything that is part of Babase should be checked into the project's revision control system.

All data values used in the code should be abstracted, either via m4 or PHP defines, using names that begin with bb_.

Minimize hardcoding. The use of data values in the code should be minimized. By keeping the number of hardcoded values to a minimum, the values used within the system can be altered through procedural changes alone, expensive programming can be avoided, and the flexibility of the system is increased.[23]

All database extensions (triggers, functions, etc.) should be written in PL/pgSQL, supplemented by m4.

All stand-alone programs should be accessible via the web. They should be written in PHP and styled with CSS2. The web pages they produce should be XHTML 1.0 compliant and should pass W3C validation at http://validator.w3.org/. Style sheets should pass the CSS validator at http://jigsaw.w3.org/css-validator/. Programs that access the database should obtain their PostgreSQL login credentials from the user, preferably using the existing PHP library code.

Each database user must be assigned unique login credentials to the PostgreSQL database. Each user is responsible for the security of their own login credentials and should never use login credentials that are not their own. All code should support this paradigm.

Every file should begin with a statement of copyright.

Each program, function, or procedure should have documented: its input arguments; its return value; any side effects including changes to pass-by-reference arguments, changes to the screen, changes to the database cursors, etc.

Clarity in your code is more important than efficiency. If the code is not clear, it is less likely to work and more likely to have bugs introduced upon maintenance. There is no point in getting a wrong answer quickly.

See the README files in the source tree's directories for information on how the source code is organized.



[17] As security restrictions permit, of course.

[18] That way if you unknowingly revealed your password to the terrorists last weekend when you were drunk, by the time everybody sobers up the password will have been changed and the amount of damage done is limited.

[19] Don't try this at home! Trained Professionals Only! Etc. ;-)

[20] And the reverse is true. The id of a row in the first table can be used to find the row in the second table that holds it.

[21] Examples may be readily found in the Chapter: “Support Tables”.

[22] Usually.

[23] This is very important, but the reasons are not obvious: coding values into the programs means creating office procedures that cannot be altered without a programmer. For example, encoding the value of the unknown group into the system would make it impossible to create different unknown groups for animals disappearing from different groups, or for animals disappearing in varying states of health, or whatever.

Chapter 3. Baboon Data: Primary Source Material

Table of Contents

Group Membership and Life Events
ALTERNATE_SNAMES (Alternate Short Names)
BEHAVE_GAPS (Gaps in Behavior Observations)
BIOGRAPH (Baboon Biographical Data)
CENSUS (Group Membership)
CONSORTDATES (First Consortship Dates)
DEMOG (Demography Notes)
DISPERSEDATES (Dispersal Dates)
GROUPS (Groups)
MATUREDATES (Sexual Maturity Dates)
RANKDATES (Adult Rank Attainment Dates)
Sexual Cycles
CYCGAPS (Gaps in Female Cycle Observations)
CYCLES (Female Sexual Cycles)
CYCPOINTS (Female Sexual Cycle Events)
PCSKINS (ParaCallosal Skin observations)
PREGS (Pregnancies)
SEXSKINS (Sexskin Turgesence Measurements)
Social and Multiparty Interactions
ALLMISCS (Ad-libitum sample data)
CONSORTS (multiparty disputes over CONSORTshipS)
FPOINTS (Point data on Females)
INTERACT_DATA (Interactions)
MPIS (Multiparty InteractionS)
MPI_DATA (Multiparty dyadic Interactions)
MPI_PARTS (Multiparty Interaction PARTicipantS)
PARTS (Participants in interactions)
POINT_DATA (Point observation data)
NEIGHBORS (point observation data on Neighbors)
SAMPLES (all-occurrences Samples)
Darting
ANESTHS (Extra Sedation Administered During Darting)
BODYTEMPS (Darting Body Temperature Measurements)
CHESTS (Darting Chest Circumference Measurements)
CROWNRUMPS (Darting Crown-to-Rump Measurements)
DART_SAMPLES (Darting Tissue Sample Records)
DARTINGS (Baboon Darting Events)
DPHYS (Darting Physiological Measurements)
HUMERUSES (Darting Humerus Length Measurements)
PCVS (Darting Blood Measurements)
TEETH (Darting Tooth Data)
TESTES_ARC (Darting Testes circumference Data)
TESTES_DIAM (Darting Testes Diameter Data)
TICKS (Darting Tick and Parasite Data)
ULNAS (Darting Ulna Length Measurements)
SWERB Data (Group-level Geolocation Data)
AERIALS (Aerial photos)
GPS_UNITS (Individual GPS Devices)
QUAD_DATA (map Quadrants)
SWERB_BES (Begin/Ends: Uninterrupted bouts of group-level observation)
SWERB_DATA (Group Level GPS Point Samples)
SWERB_DEPARTS_DATA (Observation team departures from camp)
SWERB_DEPARTS_GPS (SWERB GPS Departure data)
SWERB_GWS (SWERB Grove and Waterholes)
SWERB_GW_LOC_DATA (SWERB Grove/Waterhole Location Data)
SWERB_LOC_DATA
SWERB_LOC_GPS
SWERB_OBSERVERS
TREES
Weather Data
RAINGAUGES (Rain Measurements)
RGSETUPS (Rain Gauge Setups)
TEMPMINS (Minimum Temperature Measurements)
TEMPMAXS (Maximum Temperature Measurements)
WEATHERHAWK (WeatherHawk Data)
WREADINGS (Weather Readings)

These tables contain the permanent records of baboon-related data. For the most part these data are as collected in the field, although the field staff are presumably not perfect and some errors are corrected before the data are entered into Babase. Some columns, and more rarely entire rows, do contain derived data. Some of the derived data, such as pregnancy parity, are maintained manually; other derived data, such as sexual cycle sequence numbers or menses dates computed from onset of turgescence, are maintained by the system. The documentation clearly indicates which data are collected in the field, which data are derived, and how derived data values are constructed.[24]

Group Membership and Life Events

ALTERNATE_SNAMES (Alternate Short Names)

This table records cases where short names (Snames) were assigned to individuals and then the choice of name was rescinded. It contains one row for every rescinded Sname, linking the rescinded value to the Sname presently assigned to the individual.

A new row may not be inserted into BIOGRAPH with an Sname value that is an Alternate_Sname value. However, to accommodate cases of switched identities, ALTERNATE_SNAMES rows may have Alternate_Sname values that appear in the BIOGRAPH.Sname column.

The Sname value must differ from the Alternate_Sname value.

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears.

This column may not be NULL and may not be 998.

Alternate_Sname (Alternate Short Name)

An Sname once associated with the individual identified in the Sname column. This column may not be empty, it must contain exactly 3 characters, it may not contain lower case letters, and it may not contain the space character. This column may not be NULL.

Name_Alternate (Alternate Name)

The name associated with the alternate sname. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

This column may be NULL.

Notes

Notes regarding the existence of the alternate Sname. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

This column may not be NULL.
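The ALTERNATE_SNAMES rules above can be summarized as a small validation function. This is a hypothetical client-side sketch, not how Babase enforces the rules (the database does that itself); None stands for NULL, and the problem messages are illustrative.

```python
def alternate_sname_problems(sname, alternate_sname):
    """Return the ALTERNATE_SNAMES rules violated by one (Sname, Alternate_Sname) pair."""
    problems = []
    if sname is None or sname == "998":
        problems.append("Sname must be non-NULL and not 998")
    if alternate_sname is None or len(alternate_sname) != 3:
        problems.append("Alternate_Sname must be exactly 3 characters")
    elif any(c.islower() or c == " " for c in alternate_sname):
        problems.append("Alternate_Sname may not contain lower case letters or spaces")
    if sname is not None and sname == alternate_sname:
        problems.append("Sname and Alternate_Sname must differ")
    return problems

def may_insert_biograph_sname(new_sname, alternate_snames):
    """A brand-new BIOGRAPH row may not reuse a rescinded (alternate) Sname."""
    return new_sname not in alternate_snames

print(alternate_sname_problems("ABC", "XYZ"))  # []
print(may_insert_biograph_sname("XYZ", {"XYZ"}))  # False
```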

BEHAVE_GAPS (Gaps in Behavior Observations)

This table indicates periods of time during which behavioral data (e.g. interactions, focal sampling) may be sparse or lacking for an indicated group. Data from indicated periods are not any less "valid" than data from any other times. However, when aggregating and analyzing data, the sparseness of data in a given period may affect the final results. This table points out such periods and allows users to decide for themselves how to deal with them.

The reason for each gap is also noted. Reasons for gaps vary widely, so these reasons are noted in a text column, rather than with a support table of possible "gap reasons". This makes querying for reasons unwieldy, but this is by design; the table is intended to be used as a guide for thoughtful consideration[25] of time periods where gaps in observation may be affecting analyses.

When discussed in this table, a "gap" does not necessarily mean a complete absence of data for the indicated period. It may merely refer to periods where collected data is sparser than usual. Also, a gap does not necessarily indicate that all data types are uniformly sparse. It may be that the gap only applies to a single type of data. Users should pay attention to the Gap_End_Status and Notes columns for details about which data types are affected.

Identification of a gap is done by a data manager. The system is not involved with this process, and does not handle data from gap periods differently than data from any other time periods. Those kinds of judgments are left for the user to make.

A group may have overlapping behavior gaps; it's possible for more than one factor to affect observation of a group at the same time.
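Because gaps may overlap and an ongoing gap has no end date, a user deciding how to treat data from a given day needs every gap in effect on that day. The function below is a hypothetical sketch over BEHAVE_GAPS rows represented as dictionaries keyed by lower-cased column names. It assumes that a gap covers both its Gap_Start and its Gap_End dates, and it treats a NULL (None) Gap_End as still ongoing.

```python
from datetime import date

def gaps_in_effect(gaps, grp, day):
    """Return the gaps for group grp that cover day (gap_end None = ongoing)."""
    return [g for g in gaps
            if g["grp"] == grp
            and g["gap_start"] <= day
            and (g["gap_end"] is None or day <= g["gap_end"])]

# Hypothetical rows: group 1 has two overlapping gaps, one still ongoing.
gaps = [
    {"bgid": 1, "grp": 1, "gap_start": date(2009, 1, 1), "gap_end": date(2009, 3, 31)},
    {"bgid": 2, "grp": 1, "gap_start": date(2009, 3, 1), "gap_end": None},
    {"bgid": 3, "grp": 2, "gap_start": date(2009, 1, 1), "gap_end": date(2009, 1, 31)},
]
print([g["bgid"] for g in gaps_in_effect(gaps, 1, date(2009, 3, 15))])  # [1, 2]
```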

A gap's Gap_End must be after its Gap_Start, or NULL. The Gap_End can only be NULL if the group's GROUPS.Cease_To_Exist is NULL. This allows for recording of ongoing, not-yet-completed gaps.

A gap's Gap_End and Gap_End_Status must both be NULL or both be non-NULL.
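The BEHAVE_GAPS consistency rules can be expressed as a single check, sketched below as a hypothetical helper (Babase enforces these rules in the database). None stands for NULL. The sketch assumes that a group whose Cease_To_Exist is NULL has no upper bound on its dates, and the Gap_End_Status value "X" used in the example is a made-up code.

```python
from datetime import date

def behave_gap_problems(gap_start, gap_end, gap_end_status,
                        grp_start, grp_cease_to_exist):
    """List the BEHAVE_GAPS rules violated by one gap (None stands for NULL)."""
    def in_group_life(d):
        # Inclusive on both ends; a NULL Cease_To_Exist is assumed unbounded.
        return grp_start <= d and (grp_cease_to_exist is None
                                   or d <= grp_cease_to_exist)

    problems = []
    if not in_group_life(gap_start):
        problems.append("Gap_Start outside the group's existence")
    if gap_end is None:
        if grp_cease_to_exist is not None:
            problems.append("Gap_End may be NULL only for groups that still exist")
    else:
        if gap_end <= gap_start:
            problems.append("Gap_End must be after Gap_Start")
        if not in_group_life(gap_end):
            problems.append("Gap_End outside the group's existence")
    if (gap_end is None) != (gap_end_status is None):
        problems.append("Gap_End and Gap_End_Status must both be NULL or both non-NULL")
    return problems

print(behave_gap_problems(date(2009, 1, 1), date(2009, 2, 1), "X",
                          date(2000, 1, 1), None))  # []
```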

Column Descriptions

BGId (Behave_Gaps Identifier)

A unique integer identifying the BEHAVE_GAPS row.

This column is automatically maintained by the database and must not be NULL.

Grp (Group)

The Gid of the group affected by this gap.

This column must contain a Gid value of a row on the GROUPS table. This column may not be NULL.

Gap_Start (Start Date of the Gap)

The date on which the gap began. This date must be between the group's GROUPS.Start and GROUPS.Cease_To_Exist, inclusive.

This column may not be NULL.

Gap_End (End Date of the Gap)

The date on which the gap ended. This date must be between the group's GROUPS.Start and GROUPS.Cease_To_Exist, inclusive.

This column may be NULL, see above.

Gap_End_Status

The reason for, or status of, the gap's end. The legal values for this column are defined by the GAP_END_STATUSES support table.

This column may be NULL, see above.

Notes (Explanatory Notes)

Text notes about the gap, especially information about the gap's cause.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

BIOGRAPH (Baboon Biographical Data)

This table records the basic biographical data on baboons. It contains one row for each baboon, including still births and fetal deaths (collectively, fetal losses), on which data have been collected. In all cases the Statdate value must not be less than the Birth value. Live animals, those with a Status of 0, must have a recorded cause of death of not applicable, a Dcause of 0. Live animals that have no associated CENSUS rows (absences excepted) must have a Statdate equal to their Birth date. Animals with no recorded cause of death, a Dcause of 0, must have not applicable as the degree of confidence in both the nature and agent of death; their DcauseNatureConfidence and DcauseAgentConfidence must both be 0.

The system will generate an error when it finds a birth date that is later than the team's last contact with the mother, that is, when the Birth date is later than the mother's Statdate.[26]
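The row-level BIOGRAPH rules in the two preceding paragraphs can be collected into one check. The sketch below is a hypothetical helper, not Babase's own enforcement mechanism. A row is a dictionary keyed by lower-cased column names, the mother's Statdate is passed separately, and the rule about live animals without CENSUS rows is omitted because it needs census data.

```python
from datetime import date

def biograph_problems(row, mother_statdate=None):
    """Return the BIOGRAPH rules (described above) that row violates.

    row: dict keyed by lower-cased column name.
    mother_statdate: the mother's Statdate, or None when unknown.
    """
    problems = []
    if row["statdate"] < row["birth"]:
        problems.append("Statdate before Birth")
    if row["status"] == 0 and row["dcause"] != 0:
        problems.append("live animal must have Dcause 0")
    if row["dcause"] == 0 and (row["dcausenatureconfidence"] != 0
                               or row["dcauseagentconfidence"] != 0):
        problems.append("Dcause 0 requires both death confidences to be 0")
    if mother_statdate is not None and row["birth"] > mother_statdate:
        problems.append("Birth after mother's Statdate")
    return problems

live = {"birth": date(2001, 5, 1), "statdate": date(2010, 1, 1),
        "status": 0, "dcause": 0,
        "dcausenatureconfidence": 0, "dcauseagentconfidence": 0}
print(biograph_problems(live))                                    # []
print(biograph_problems(live, mother_statdate=date(2000, 1, 1)))  # ["Birth after mother's Statdate"]
```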

All individuals with an Sname, i.e. those that aren't fetal losses, must have a Name and will have rows in MEMBERS. Individuals with an Sname may not have their Sname removed (set to NULL).

Caution

The Psionload program treats an Sname value of 998 in a special fashion. 998 may not be used as an Sname value. See the Psionload documentation below for details.

Those rows that record data on fetal losses must maintain the following relations between their data values: the Sname, Name, Entrydate, and Entrytype values must be NULL; the Statdate must be the same as the birth date (Birth); and the Status must not be 0 (alive). Because fetal losses have no Sname they cannot have corresponding CENSUS rows and there will not be any record of their group membership in MEMBERS.

Entrydate and Entrytype can only be NULL for fetal losses--when their Sname is also NULL. Otherwise, they cannot be NULL and Entrydate must be between the individual's Birth and Statdate values, inclusive. When Entrytype is B (Birth), the Entrydate must be the individual's Birth. When Entrytype is any other value, Entrydate cannot equal Birth.
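The fetal-loss and Entrydate/Entrytype rules can be sketched as a single check, assuming (as stated above) that a NULL Sname is what identifies a fetal loss. The `Individual` class and `entry_errors` function are hypothetical names for illustration; B is the Birth entry type, and no other ENTRYTYPES codes are assumed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Individual:
    """Hypothetical stand-in for the BIOGRAPH columns involved."""
    sname: Optional[str]
    name: Optional[str]
    birth: date
    statdate: date
    status: int  # 0 = alive
    entrydate: Optional[date]
    entrytype: Optional[str]


def entry_errors(ind: Individual) -> List[str]:
    """Return the fetal-loss and entry rule violations for one row."""
    errors = []
    if ind.sname is None:
        # No Sname: the row records a fetal loss.
        if any(v is not None for v in (ind.name, ind.entrydate, ind.entrytype)):
            errors.append("fetal loss must have NULL Name, Entrydate and Entrytype")
        if ind.statdate != ind.birth:
            errors.append("fetal loss Statdate must equal Birth")
        if ind.status == 0:
            errors.append("fetal loss cannot have Status 0 (alive)")
        return errors
    if ind.entrydate is None or ind.entrytype is None:
        return ["Entrydate and Entrytype are required when Sname is set"]
    if not ind.birth <= ind.entrydate <= ind.statdate:
        errors.append("Entrydate must be between Birth and Statdate")
    if ind.entrytype == "B" and ind.entrydate != ind.birth:
        errors.append("Entrytype B requires Entrydate to equal Birth")
    if ind.entrytype != "B" and ind.entrydate == ind.birth:
        errors.append("only Entrytype B may have Entrydate equal to Birth")
    return errors
```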

The Statdate of live individuals is derived from the CENSUS table. An actual census does not have to be taken. Any observation of an individual in a group that results in a row being added to CENSUS is sufficient, except that Absences don't count. When there are no non-absent censuses and the individual is alive, then the Statdate is the Entrydate. This column is automatically updated when CENSUS is updated to ensure that these conditions remain true. When the individual is not alive the Statdate is the date of death.
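For a living individual, the derivation in the preceding paragraph amounts to taking the latest non-absent census date and falling back to the Entrydate. A minimal Python sketch, assuming every CENSUS row whose Status is not A counts as an observation; `live_statdate` is a hypothetical helper, and Babase performs this update itself whenever CENSUS changes.

```python
from datetime import date
from typing import Iterable, Tuple


def live_statdate(entrydate: date,
                  censuses: Iterable[Tuple[date, str]]) -> date:
    """Statdate of a living individual from (Date, Status) CENSUS pairs."""
    # Absences (Status A) do not count as observations.
    present = [when for when, status in censuses if status != "A"]
    # With no non-absent censuses, the Statdate is the Entrydate.
    return max(present, default=entrydate)
```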

Caution

Living individuals, unlike dead ones, can have MEMBERS rows created by the interpolation procedure that locate the individual in a group on a date later than the individual's Statdate. For further information see: Interpolation at the Statdate.

In a like fashion, living individuals, unlike dead ones, can have CYCPOINTS rows created by automatic Mdate generation on a date later than the individual's Statdate. For further information see: Automatic Mdate Generation.

Caution

Male Dispersed dates may be after the Statdate when the individual is alive and there are subsequent censuses of the group from which the individual dispersed.

Caution

When dates are encoded as intervals to account for uncertainty in the data, as with the CYCPOINTS Edate and Ldate columns, the latter end of the interval may post-date the Statdate.

Aside from the preceding caveats, Babase does not allow data to be related with an individual when the date of the data postdates the individual's Statdate. Therefore Statdate provides a convenient way of determining the end of the time interval during which there are data on an individual, a way that is independent of whether the individual is alive or dead.

Bioid (Biograph IDentifier)

A unique integer identifying the BIOGRAPH row.

Babase does not use this identifier; it exists for the convenience of application programs. This column provides a convenient way to distinguish individuals without Snames, fetal losses, from each other and from other individuals.[27]

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname (Short Name)

The short name of the individual. This is an abbreviation of the name, exactly three characters long, which is used to identify the individual and so must be unique. It may not contain lower case letters or spaces.

Tip

The Sname is usually, but not always, the first 3 characters of the Name.

Because this value appears in many other places in the system, it should not be changed without also changing every other place in the database where the abbreviation appears. In practice, once an Sname is established, the only reason to change it would be that the short name had already been used.[28] Because this is unlikely, Babase does not allow the Sname to be changed. The Sname is always composed of capital letters and may not contain a space.[29] This column should only be NULL if the row represents a fetal loss.
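The Sname format rules can be checked with a regular expression. A minimal Python sketch, assuming "capital letters" means the ASCII letters A to Z; under that reading the value 998, which Psionload reserves, is rejected automatically. `valid_sname` is a hypothetical helper.

```python
import re

# Assumption: "capital letters" means the ASCII letters A-Z.
SNAME_PATTERN = re.compile(r"[A-Z]{3}")


def valid_sname(sname: str) -> bool:
    """True when sname is exactly three capital letters.

    This also rules out spaces, lower case letters, and the value 998.
    """
    return SNAME_PATTERN.fullmatch(sname) is not None
```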

Name

The name of the individual. This is a textual column used for descriptive purposes. Its value must be unique when compared case-insensitively. This column should only be NULL if the row records a fetal loss. This column may not be empty; it must contain at least one non-whitespace character.
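The two Name rules, case-insensitive uniqueness and at least one non-whitespace character, can be sketched as follows. This is an illustration only: `name_problems` is a hypothetical helper, and Python's `str.casefold` and `str.strip` may not treat every character exactly as the database's own case-folding and whitespace rules do.

```python
from typing import Dict, Iterable, List


def name_problems(names: Iterable[str]) -> List[str]:
    """Report blank names and names that collide when case is ignored."""
    problems: List[str] = []
    seen: Dict[str, str] = {}  # case-folded name -> first spelling seen
    for name in names:
        if not name.strip():
            problems.append(f"blank name: {name!r}")
            continue
        key = name.casefold()
        if key in seen:
            problems.append(f"{name!r} duplicates {seen[key]!r} ignoring case")
        else:
            seen[key] = name
    return problems
```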

Pid

The Pid value, from the PREGS table, of the individual's mother's pregnancy that ended in the birth[30] of the individual. This column may be NULL. A NULL value indicates there is no record of the individual's mother.

Caution

More than one individual may have the same Pid, as long as they were products of the same pregnancy. This occurs when twins are born into the study population.

This column may not be empty; it must contain at least one non-whitespace character.

Birth

The date the pregnancy ends. If the pregnancy results in a live birth, this date is the birth date of the offspring; otherwise, this is the date of the fetal loss. (A pregnancy that ends with the mother's death is considered a spontaneous abortion (fetal loss) for this purpose.)

This column may not be NULL.

Bstatus

Birthday status. This column records the quality of the birth date estimate. The legal values for this column are defined by the BSTATUSES support table.

This column may not be NULL.

Sex

The sex of the individual. The legal values are:

Valid Sex Values

Code    Description
M       the individual is male
F       the individual is female
U       the individual is of unknown sex

This column may not be NULL.

Matgrp

The maternal group of the individual: the Gid of the group into which the individual was born.

This column must contain a Gid value of a row on the GROUPS table. This column may not be NULL.

Tip

If the maternal group is not known, the maternal group should be recorded as the unknown group.

Matgrpconfidence (confidence in maternal group assignment)

The degree of confidence in the assignment of the Matgrp value. The legal values for this column are defined by the CONFIDENCES support table.

This column may not be NULL.

Statdate

The status date of the individual. When the individual is alive, this is the latest date on which the animal was censused and found in a group. When the individual is not alive, it is the date of death.

This column may not be NULL.

Status

The state of the individual's life at the Statdate. The legal values for this column are defined by the STATUSES support table.

This column may not be NULL.

Dcause

The cause of death or circumstances associated with death. The legal values for this column are defined by the DCAUSES support table.

This column may not be NULL.

Alt_Snames (Alternate Short Names Exists)

A boolean value indicating whether or not there exist rows on the ALTERNATE_SNAMES table related to the individual's Sname. This value is true if and only if there exists a row on ALTERNATE_SNAMES with an Sname value which is the individual's Sname, or there exists an ALTERNATE_SNAMES row with an Alternate_Sname value which is the individual's Sname.

The value in this column is automatically maintained and will never be NULL.
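The rule that determines Alt_Snames reduces to a single membership test. A minimal Python sketch, assuming the ALTERNATE_SNAMES rows are available as (Sname, Alternate_Sname) pairs; `has_alt_snames` is a hypothetical helper, and in Babase the column is maintained automatically.

```python
from typing import Iterable, Tuple


def has_alt_snames(sname: str,
                   alternate_rows: Iterable[Tuple[str, str]]) -> bool:
    """True if sname appears in either column of any ALTERNATE_SNAMES row."""
    return any(sname in (row_sname, alternate_sname)
               for row_sname, alternate_sname in alternate_rows)
```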

Entrydate

The date the individual entered the study population.

Note

Because of Interpolation, it may seem like this column could be maintained automatically. However, the opacity of "non-interpolating" rows in CENSUS and the related historical analyses prevent accurate automatic determination of the entry date for many individuals. For more information, see CENSUS.Status and Interpolation, Data are not Re-Analyzed.

This column can be NULL only if the row represents a fetal loss.

Entrytype

The way the individual entered the study population. The legal values for this column are defined by the ENTRYTYPES support table.

This column can be NULL only if the row represents a fetal loss.

DcauseNatureConfidence (Confidence in Nature of Death)

The degree of confidence in the nature of the individual's death or circumstances associated with the individual's death (their DCAUSES.Nature). The legal values for this column are defined by the CONFIDENCES support table.

This column may not be NULL.

DcauseAgentConfidence (Confidence in Agent of Death)

The degree of confidence in the agent of the individual's death or circumstances associated with the individual's death (their DCAUSES.Agent). The legal values for this column are defined by the CONFIDENCES support table.

This column may not be NULL.

CENSUS (Group Membership)

The population census table. Aside from the BIOGRAPH.Matgrp column, this table is the origin of all information regarding group membership. This table holds all the field census data and any information regarding group membership that is recorded in the field demography notes. It contains one row per animal per group per day censused. There is an additional row per individual per demography note for those days when there is a demography note regarding the individual and group but no census of the group. (See DEMOG.)

Tip

One way to have Babase record that an individual is alone is to first create a row in GROUPS meaning alone, and then to assign individuals who are alone to this group. The alone-ness of an individual can then be tracked in the same fashion as group membership, although the Babase user does then need to be aware that the members of the alone group are not actually proximate to one another.

The system will report individuals who are first censused in a group other than their maternal group (BIOGRAPH.Matgrp). Individuals whose maternal group is the unknown group are not reported, and censuses that record absence are ignored.

The system will report individuals with an Sname that do not have any related (non-absent) CENSUS rows.

The system will report a warning when CENSUS rows have a Status of C and a Date before the individual's Entrydate.

As noted in the MEMBERS documentation, Babase does not allow an individual to be in more than one group on a given day.

The original field census data sheets can be recovered from CENSUS, with one exception. A datum is lost when an individual is actually censused in two groups on the same day because of movement between groups and the timing of the censuses.[31] In this situation a decision should be made as to the group in which CENSUS records the individual's presence on that day. A demography note should then be added to DEMOG, with text that notes the individual's presence in the second group. Technically, this results in all of the information from both censuses, or other location information, being entered into the database. However, because the information regarding the second census is in textual form, it is not readily available to automated tools.

Caution

Be careful when changing these data. When CENSUS data are inserted, deleted, or updated, the MEMBERS table and BIOGRAPH.Statdate column are automatically updated via Interpolation. Also, remember that rank will almost certainly change should group membership change.

Cenid

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Cenid links CENSUS to DEMOG.

This column may not be NULL.

Date

The date of the census, or the date of the demography note (when Status is D).

Note

The date value must not be more than a year later than the present moment. This rule prevents accidental data entry errors from creating so many rows in MEMBERS that all available disk space is used.

This column may not be NULL.

Sname

The individual whose location is being recorded: the three-letter code that uniquely identifies an individual in BIOGRAPH. There will always be a row in BIOGRAPH for the individual identified here.

This column may not be NULL.

The group where the individual is located. This is a Gid value from GROUPS. This column should contain the most specific sub-grouping available -- subject to the constraints of the data entry protocol, of course. Aggregation into larger groupings is accomplished by retrieving the associated Supergroup from GROUPS and/or use of the supergroup() function.

This column may not be NULL.

Note

Usage exception: For the years 1989-1991, inclusive, the groups recorded for the sub-groups of Alto's group do not necessarily reflect the actual groupings of the animals on a particular day, but are instead indications of the group-splitting process. See the Protocol for Data Management: Amboseli Baboon Project document for further explanation.

Status

A one letter code indicating the source of the location information. Status is the source of MEMBERS.Origin data. The current codes are as follows: C (census), A (absent), D (demography), and M or N (manual). Other values derived from analysis of historical data include: S, E, F, B, G, T, L, and R.

The CENSUS.Status Codes

C

(census) The animal was found in the group on a field census sheet: from the census datasheets. (There may or may not be a corresponding demography note on DEMOG as well.)

Tip

A C Status is marked on the field census data sheet as an X.

A

(absent) The animal was not found in the group on a field census sheet. Note that while an individual should not be recorded present in more than one group on the same day, s/he may be absent from several groups on any given day.

Tip

An A Status is marked on the field census data sheet as a 0.

D

(demography) The animal was noted, in the field notebooks or elsewhere, to be in a group but was not marked present in a field census of a study group on that day.[32] There should be a DEMOG row associated with the CENSUS row. The individual may or may not have been marked absent on the same group's field census for the day.[33]

Tip

A D Status is marked on the field census data sheet as a 0, when there exists a corresponding place on the census data sheet.

Warning

The system will allow CENSUS rows with a Status of D to be entered without there being a corresponding DEMOG row in existence.[34] However it is expected that these rows exist only long enough to allow entry of a related DEMOG row. The system will report CENSUS rows with a Status of D that have no related DEMOG row.

M

(manual, interpolated) This code provides a way to manually supplement what is in the CENSUS table when there is no other way to get the data in. Babase considers this code to be the same as the C code.

N

(manual, not interpolated) This code provides an alternative way to manually supplement what is in the CENSUS table when there is no other way to get the data in. This code does not interpolate; it is presumed to be the result of some analysis.

S

(Susan's data) The data comes from the old DISPERSE database where the record had both a Datein and a Dateout.

E

(ending date) The data comes from the old DISPERSE database where the record had a Datein but not a Dateout.

F

(final date) The data comes from the old DISPERSE database where there is a Dateout and the last recorded location is before the Statdate.

B

(birth date) The data comes from the old DISPERSE database where the record had a Dateout but not a Datein.

T

(total) The data comes from the old DISPERSE database where the record had neither a Datein nor a Dateout.

G

(gap) The data are a record of the animal in the unknown group when the animal appeared in the old DISPERSE database but where there was a gap between times of recorded location.

L

(lineage) The group is from the Matgrp on the old CYCTOT database, either because the animal did not appear in the DISPERSE database, or because the first location for the animal in the old DISPERSE database had a Datein and this Datein was after the birth date of the animal.

R

(result of Alto's breakup) The datum is an S, E, F, B, G, T, or L datum whose location was changed from 1.0 to the group in which the animal was censused on 15/4/92. This change left every R row as part of a contiguous series of days during which the animal is located in the Alto's sub-group in which it was censused on 15/4/92, and the time-adjacent locations were not 1.0.

This column may not be NULL.

Cen

Cen is whether or not the CENSUS row represents an entry on a field census data sheet. TRUE means the CENSUS row exists because of an entry on a census data sheet, FALSE means there was no census done and the CENSUS row exists to support a demography note, manual notation of absence, etc. Cen should only be TRUE when Status is C, A, or D.

This column may not be NULL.
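Two of the CENSUS consistency rules lend themselves to a per-row check: Cen may be TRUE only for Status C, A, or D, and a Status D row should have a related DEMOG row. A minimal Python sketch; `census_row_issues` is a hypothetical helper, and whether a DEMOG row exists is passed in by the caller rather than looked up.

```python
from typing import List

# Status codes for which Cen may be TRUE.
CEN_STATUSES = frozenset("CAD")


def census_row_issues(status: str, cen: bool, has_demog: bool) -> List[str]:
    """Report problems with one CENSUS row (empty list if none)."""
    issues = []
    if cen and status not in CEN_STATUSES:
        issues.append("Cen is TRUE but Status is not C, A, or D")
    if status == "D" and not has_demog:
        issues.append("Status D row has no related DEMOG row")
    return issues
```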

CONSORTDATES (First Consortship Dates)

This table records the dates of first consortship for males; this is a maturational milestone in males that we have analyzed in several contexts. It contains one and only one row for every individual for which there is a recorded first consortship. Individuals who have not yet consorted, or individuals that have consorted but whose first consortship date is not known, do not appear in the table.

Tip

Currently it only contains values for males; females may be added if desired.

Tip

All dates are exact: unlike MATUREDATES and RANKDATES, no BY dates are entered, so there is no Status column.

When there is a row in this table there must be a sexual maturity date in MATUREDATES, and the consortship date must be later than the sexual maturity date. The Consorted date cannot be before the individual's Entrydate nor after the individual's Statdate. The individual must be at least 5 years of age on his Consorted date. The system will report a warning if the individual is 12 or more years of age on his Consorted date.
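The CONSORTDATES rules combine date ordering with age thresholds. A minimal Python sketch, assuming age is counted in completed years (a 29 February birthday is treated as not yet reached on 28 February); `age_in_years` and `consort_issues` are hypothetical helpers, and Babase may compute ages differently.

```python
from datetime import date
from typing import List, Optional, Tuple


def age_in_years(birth: date, on: date) -> int:
    """Completed years of age on the given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached this year
    return years


def consort_issues(consorted: date, matured: Optional[date], birth: date,
                   entrydate: date, statdate: date
                   ) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one CONSORTDATES row."""
    errors, warnings = [], []
    if matured is None:
        errors.append("no MATUREDATES row")
    elif consorted <= matured:
        errors.append("Consorted must be later than Matured")
    if not entrydate <= consorted <= statdate:
        errors.append("Consorted must fall between Entrydate and Statdate")
    age = age_in_years(birth, consorted)
    if age < 5:
        errors.append("younger than 5 years on the Consorted date")
    elif age >= 12:
        warnings.append("12 or more years old on the Consorted date")
    return errors, warnings
```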

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Consorted

The date the individual had its first consortship. This column may not be NULL.

DEMOG (Demography Notes)

This table holds the text that records group membership information not written on the regular field census sheets, especially that from the field demography notes. DEMOG provides a means of notating CENSUS rows, and thus facilitates management of additional free form CENSUS rows, rows that do not directly correspond with the field census sheets.[35] Thus, in conjunction with these corresponding CENSUS rows, the DEMOG rows capture group membership information that otherwise would not appear in the CENSUS table.

DEMOG contains one and only one row for every individual for every date for every group where the individual was noted present in free form textual field notes or other miscellaneous sources. The DEMOG row holds textual information. There is always exactly one corresponding CENSUS row, which holds the corresponding group membership information in the usual coded and structured form. (Note that only some CENSUS rows will have DEMOG rows; CENSUS rows that originate entirely in the regular censuses of groups will not, in general, have an associated DEMOG row). A single field note referring to more than one individual must appear in DEMOG as two (or more) separate rows, one row per individual. Multiple field notes pertaining to a single individual on a single date must be combined into one piece of text and entered in a single DEMOG row. (See the Protocol for Data Management: Amboseli Baboon Project for structure of the demography data as entered by the operator.)
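The one-row-per-individual-per-date-per-group rule can be illustrated by a routine that turns raw field notes into DEMOG texts. A minimal Python sketch, assuming each note arrives as (date, Gid, Snames mentioned, text) and that combined notes are joined with "; ". The input shape, the separator, and `demog_texts` are all hypothetical; the Protocol for Data Management defines the actual data-entry structure.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Note = Tuple[date, float, List[str], str]  # (date, Gid, Snames, text)


def demog_texts(notes: Iterable[Note]) -> Dict[Tuple[date, float, str], str]:
    """Map each (date, Gid, Sname) to the single text its DEMOG row would hold."""
    combined = defaultdict(list)
    for when, gid, snames, text in notes:
        # A note naming several individuals contributes to each one's row.
        for sname in snames:
            combined[(when, gid, sname)].append(text)
    # Several notes about one individual, group and date become one text.
    return {key: "; ".join(texts) for key, texts in combined.items()}
```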

Adding or removing DEMOG rows automatically updates the CENSUS.Status column of the corresponding CENSUS row.

Tip

Use the DEMOG_CENSUS view to upload datasets into this table. Use the CENSUS_DEMOG view to maintain this table by hand.

Caution

The data integrity rules require that when a demography note is entered the CENSUS row be created before the related DEMOG row.

Cenid

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Cenid links CENSUS to DEMOG.

This column may not be NULL.

Reference

A code that identifies the written field notebook or other source where the demography note can be found.

The legal values for this column are defined by the DEMOG_REFERENCES support table, see below. This column may not be NULL.

Comment

The demography note text pertaining to the CENSUS row with the given Cenid.

This column may be NULL.[36] This column may not be empty; it must contain at least one non-whitespace character.

DISPERSEDATES (Dispersal Dates)

This table records dates of dispersal for males (females do not disperse and do not appear in this table). It contains one and only one row for every male who has a known date of dispersal from the study groups. Males who have not yet dispersed do not have a row in this table. Only males can have rows on this table.

Tip

All dates are exact: unlike MATUREDATES and RANKDATES, no BY dates are entered, so there is no Status column.

The system will report a warning when there is a row in this table and there is no sexual maturity date in MATUREDATES. The Dispersed date must be on or after the individual's Entrydate. The Dispersed date cannot be after the individual's Statdate when the individual is not alive (when BIOGRAPH.Status is not 0). When the individual is alive the Dispersed date may only be after the Statdate when the individual has been censused absent (CENSUS.Status is A) in the group[37] and the Dispersed date is not after the earliest such post-Statdate census date.
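The DISPERSEDATES date rules can be sketched as one function returning errors and warnings. This is a minimal Python sketch: `disperse_issues` is hypothetical, and the caller is assumed to supply the dates of the male's absent (Status A) censuses in the relevant group, since which group that is depends on the footnoted definition.

```python
from datetime import date
from typing import Iterable, List, Tuple


def disperse_issues(dispersed: date, entrydate: date, statdate: date,
                    alive: bool, absent_dates: Iterable[date],
                    has_maturity: bool) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one DISPERSEDATES row."""
    errors, warnings = [], []
    if not has_maturity:
        warnings.append("no MATUREDATES row")
    if dispersed < entrydate:
        errors.append("Dispersed is before Entrydate")
    if dispersed > statdate:
        if not alive:
            errors.append("Dispersed is after the Statdate of a dead individual")
        else:
            # Only absent censuses after the Statdate can justify a later date.
            later = [d for d in absent_dates if d > statdate]
            if not later:
                errors.append("Dispersed is after Statdate with no later absence")
            elif dispersed > min(later):
                errors.append("Dispersed is after the earliest post-Statdate absence")
    return errors, warnings
```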

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Dispersed

The date the individual (male) left its maternal group. This column may not be NULL.

Dispconfidence (Dispersal date confidence)

The degree of confidence in the assignment of dispersal date or rationale behind the assignment of the dispersal date. The legal values for this column are defined by the CONFIDENCES support table.

This column may not be NULL.

GROUPS (Groups)

This table contains one row for every group on which there is some recorded information. This includes not only the study groups and non-study groups, but also temporary sub-groups and the special group Unknown.[38] (See the Protocol for Data Management: Amboseli Baboon Project for when to use this special group.) When a sub-group becomes a regular group (after a fission is complete), the new group should be given a Permanent date to indicate that it is now a permanent group (Permanent is not NULL). Any old sub-groups that did not become permanent should be left in GROUPS to support the sub-grouping membership history.

Tip

This table serves primarily as a tool for the system for data validation. To see its contents in a more human-readable format, use the GROUPS_HISTORY view.

Every reference to a group elsewhere in the Babase system corresponds to a Gid of one of the records in this table. Temporary groups (those with Permanent of NULL) must have a non-NULL From_group value and must not have their own Gid value as their Supergroup. Permanent groups must not have a Permanent value that is earlier than their Start value, and must have their own Gid as their Supergroup. Permanent groups may or may not have a NULL From_group value. During data entry for groups that are fission products of other groups, the fission products have the parent group as their Supergroup. This is a temporary condition for those fission groups that go on to become groups of their own.

Note that there is no particular reason to remove from GROUPS those sub-groups that exist for only a short time during group fission. Those sorts of groups can remain temporary forever.

A GROUPS row's From_group value may not be the same as its Gid value.

Tip

The supergroup() function can be used to determine the supergroup of a group on any given date.

A group's Permanent and From_group cannot both be NULL. But both can be non-NULL.

The Last_Reg_Census value must be NULL or greater than the Start value. It also must be less than or equal to the group's Cease_To_Exist date, unless the Cease_To_Exist is also NULL.

The Cease_To_Exist value must be NULL or greater than the Start value. The Cease_To_Exist value must also be greater than or equal to all subgroups' Start values.
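The single-row GROUPS rules above can be gathered into one check. A minimal Python sketch: `Group` and `group_errors` are hypothetical, Gids are shown as floats for brevity, and, as with SQL CHECK constraints, a comparison against a NULL Start is assumed not to count as a violation. The cross-row rules relating a group to its subgroups are not covered.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for one GROUPS row."""
    gid: float
    from_group: Optional[float]
    permanent: Optional[date]
    supergroup: float
    start: Optional[date]
    cease_to_exist: Optional[date]
    last_reg_census: Optional[date]


def group_errors(g: Group) -> List[str]:
    """Check the rules that involve a single GROUPS row only."""
    errors = []
    if g.from_group == g.gid:
        errors.append("From_group equals Gid")
    if g.permanent is None:
        # Temporary group: needs a From_group and a different Supergroup.
        if g.from_group is None:
            errors.append("temporary group needs a From_group")
        if g.supergroup == g.gid:
            errors.append("temporary group cannot be its own Supergroup")
    else:
        if g.start is not None and g.permanent < g.start:
            errors.append("Permanent is earlier than Start")
        if g.supergroup != g.gid:
            errors.append("permanent group must be its own Supergroup")
    if g.last_reg_census is not None:
        if g.start is not None and g.last_reg_census <= g.start:
            errors.append("Last_Reg_Census must be after Start")
        if g.cease_to_exist is not None and g.last_reg_census > g.cease_to_exist:
            errors.append("Last_Reg_Census is after Cease_To_Exist")
    if (g.cease_to_exist is not None and g.start is not None
            and g.cease_to_exist <= g.start):
        errors.append("Cease_To_Exist must be after Start")
    return errors
```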

The Cease_To_Exist must equal the Permanent date of any subgroups, unless the subgroup's Permanent is NULL.

Caution

The system enforces this rule "on-commit". In a transaction ending with a ROLLBACK, any changes to this table will not be validated against this rule. This means it is possible for an invalid change to appear error-free if executed in a rolled-back transaction. Committed transactions (and commands executed outside of transactions) perform this check as expected.

The One_letter_code value must be unique within the time period from the group's Start date through the group's Cease_To_Exist date, inclusive of endpoints.
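Uniqueness within a time period means two groups may share a One_letter_code only if their lifetimes, endpoints included, do not overlap. A minimal Python sketch, assuming a NULL Cease_To_Exist means the group still exists and a NULL Start means an open-ended past; `one_letter_conflicts` is a hypothetical helper.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple

# (Gid, One_letter_code, Start, Cease_To_Exist)
GroupSpan = Tuple[float, Optional[str], Optional[date], Optional[date]]


def one_letter_conflicts(groups: Sequence[GroupSpan]) -> List[Tuple[float, float]]:
    """Pairs of Gids that use the same code during overlapping periods."""
    def span(start: Optional[date], cease: Optional[date]) -> Tuple[date, date]:
        return (start if start is not None else date.min,
                cease if cease is not None else date.max)

    conflicts = []
    for i, (gid_a, code_a, start_a, cease_a) in enumerate(groups):
        for gid_b, code_b, start_b, cease_b in groups[i + 1:]:
            if code_a is None or code_a != code_b:
                continue  # NULL codes never conflict
            lo_a, hi_a = span(start_a, cease_a)
            lo_b, hi_b = span(start_b, cease_b)
            # Closed intervals overlap when each starts on or before the other ends.
            if lo_a <= hi_b and lo_b <= hi_a:
                conflicts.append((gid_a, gid_b))
    return conflicts
```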

Individuals cannot be placed into rows in the CENSUS table before the Start date of the group, and cannot be censused in the group at all if the value of the Start column is NULL. Individuals cannot be placed into rows of the CENSUS table after the Cease_To_Exist value of the group. Note that both these restrictions apply to all CENSUS rows, even those that indicate the individual is absent from the group.

Gaps in observation of a group cannot be added to the BEHAVE_GAPS table if the Gap_Start or Gap_End are before the Start date of the group. Similarly, gaps cannot be added to BEHAVE_GAPS if the Gap_Start or Gap_End are after the Cease_To_Exist date.
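The lifetime restrictions on CENSUS and BEHAVE_GAPS dates can be sketched as follows. A minimal Python sketch: `census_date_allowed` and `gap_allowed` are hypothetical helpers, a NULL Cease_To_Exist is treated as no upper bound, and applying the NULL-Start prohibition to gaps as well as censuses is an assumption.

```python
from datetime import date
from typing import Optional


def census_date_allowed(when: date, start: Optional[date],
                        cease: Optional[date]) -> bool:
    """True if any CENSUS row, present or absent, may use this group on this date."""
    if start is None:
        return False  # a group with a NULL Start cannot be censused at all
    return start <= when and (cease is None or when <= cease)


def gap_allowed(gap_start: date, gap_end: Optional[date],
                start: Optional[date], cease: Optional[date]) -> bool:
    """True if neither end of a BEHAVE_GAPS row falls outside the group's lifetime."""
    # Assumption: gaps obey the same bounds as censuses, including the NULL Start rule.
    # Gap_End may be NULL; only the dates actually present are checked.
    dates = [gap_start] if gap_end is None else [gap_start, gap_end]
    return all(census_date_allowed(d, start, cease) for d in dates)
```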

Warning

Some gaps in BEHAVE_GAPS may have a Gap_Start date that is equal to the group's Start or Permanent date, implying that the gap started because of the opening of observation of the group.[39] Gaps may also have a BEHAVE_GAPS.Gap_End date equal to the group's Last_Reg_Census or Cease_To_Exist date, implying that the gap ended because of the group's end.[40] If the Start, Permanent, Last_Reg_Census, or Cease_To_Exist column is updated, then these implications will no longer be true. The system makes no attempt to judge whether these implications really are true or just coincidence, so data managers must exercise this judgment. When changing any of these dates in GROUPS, be sure to check for rows in BEHAVE_GAPS with Gap_Start or Gap_End dates that also should be updated, and correct them as needed.

Special Values

Group 9.0, Unknown, has a special meaning. Individuals are placed in this group by Interpolation when their whereabouts are unknown. Also, a SWERB_DATA.Seen_grp value of 9.0 in rows with an Event value of O indicates an exceptional circumstance where Seen_grp is allowed to equal the related SWERB_BES.Focal_grp value. Another group code for unknown whereabouts should not be created.

Group 10.0 has the special meaning of lone animal. The SWERB_UPLOAD view uses this value as the SWERB_DATA.Seen_grp when a lone animal is sighted. Another group code for lone animals should not be created.

Column Descriptions

Gid

A positive numeric value with five digits (3 decimal places) that identifies the group. Each Gid must be unique. This column may not be NULL.

Name

The spelled out name of the group. This column must be unique, even when compared case-insensitively. This column may not be NULL. This column may not be empty; it must contain at least one non-whitespace character.

From_group

The Gid of the group from which this group split off, if the group is a fission product (or the larger of the two groups that fused to form it, if the group is a fusion product).[41] This column may be NULL.

Permanent

This column contains the date the group became a permanent, regular group, or contains NULL if it has not and is a temporary sub-group. For groups that were created as a result of fissions or fusions (and therefore have a non-NULL From_group), this column represents the end date of the fission/fusion period. For groups that were already intact when observation began (and therefore have a NULL From_group), this column represents the first day of observation on that group.

Note

Permanent affects whether or not an individual can be censused only in a sub-group and still be ranked in the parent supergroup. See RANKS and supergroup() for further information.

Supergroup

The Gid of the permanent group to which this group belongs. Most of the time this will be the same as the Gid column, but if the group is a temporary subgroup (Permanent is NULL) then this will be the first permanent group from which this group has descended. This column is maintained automatically by the system and should not be entered manually.[42] This column may not be NULL.
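The Supergroup of a temporary subgroup can be found by walking From_group links until a permanent group is reached. A minimal Python sketch, assuming "descended" means following From_group; `derive_supergroup` is hypothetical and is neither the system's own maintenance code nor the date-dependent supergroup() function.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Gid -> (From_group, Permanent)
GroupLinks = Dict[float, Tuple[Optional[float], Optional[date]]]


def derive_supergroup(gid: float, groups: GroupLinks) -> float:
    """Return the first permanent group reached from gid via From_group."""
    seen = set()
    current = gid
    while groups[current][1] is None:  # Permanent is NULL: temporary group
        if current in seen:
            raise ValueError(f"From_group cycle at {current}")
        seen.add(current)
        parent = groups[current][0]
        if parent is None:
            raise ValueError(f"temporary group {current} has no From_group")
        current = parent
    return current
```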

Start

The date the group came into existence (or the earliest date it must have existed in the case of those groups existent before they were monitored.) The value of this column may be NULL to indicate the group exists but is not monitored.

Cease_To_Exist

The date on which the group is deemed to have permanently dissolved. This column may be NULL for groups still under observation, groups that have not yet dissolved, and groups whose dissolution occurred while not under regular observation.

Last_Reg_Census (Last Regular Census)

The date of the last regular census done on the group for study groups that were dropped or ceased to exist because of fission/fusion. This column may be NULL if the group hasn't been dropped or was never a study group.

Three_letter_code

A 3 character, and exactly 3 character, code that uniquely identifies the group. The characters must all be upper case. This code is used by the Psion data collection devices and in SWERB observations taken using handheld GPS units and exists solely as a cross reference from those devices to the regular Babase group Gids. This column may be NULL if the group is never monitored using the Psion devices or SWERB GPS devices.

One_letter_code

A 1 character, and exactly 1 character, code that uniquely identifies the group within the time period of the group's existence. The character must be upper case. This code is used to cross reference SWERB waypoint data to the regular Babase group Gids. This column may be NULL.

Study_Grp

A boolean that indicates whether or not this group has ever been an "official" study group.[43] This column may not be NULL.

MATUREDATES (Sexual Maturity Dates)

This table records sexual maturity dates, the dates of menarche or testicular enlargement. It contains one and only one row for every animal who matured in a study group or who lived in a study group as a sexually mature individual, and it may occasionally contain a row for a male who was known to mature but who did not live in a study group. Individuals who have not yet matured do not have a row in this table. All sexually mature individuals should have a row in this table. Entry into sexual maturity is not always an obvious or definite event[44], especially for males, so the Matured date may be recorded as the first of the month in which the individual entered maturity.

There are restrictions on when an individual may become mature. The age of an individual at sexual maturity (Matured) must be at least 1016 days. This is about 2.8 years of age. The system will issue a warning when the sexual maturity occurs on or before the 3rd birthday. Individuals with an Mstatus of O (On) must be mature before 2798 days of age. This is about 7.7 years of age. The system will issue a warning when the sexual maturity occurs on or after the 7th birthday. An individual's sexual maturity date must be on or before their Statdate.

Some maturity dates are based on irregular observations of individuals before the long-term study began, or before the individuals entered an "official" study group. Either way, these individuals' Matured dates may be long before their Entrydate. Because of this, the system will allow but issue a warning when the month of the maturity date is earlier than the month of the individual's entry into the study population (their Entrydate).

For females, when Mstatus is O (On), Matured must be the first Tdate recorded in the female's sexual cycling data in the CYCPOINTS table. When Mstatus is not O, Matured may not be after the first Tdate.
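The correspondence between a female's Matured date and her first Tdate can be sketched as follows. This is an illustrative Python function with a hypothetical name, taking the female's Tdates as a plain list rather than reading CYCPOINTS; it is not how Babase performs the check.

```python
from datetime import date
from typing import List, Optional


def check_female_matured(matured: date, mstatus: str,
                         tdates: List[date]) -> Optional[str]:
    """Return an error message, or None if Matured agrees with the Tdates."""
    if not tdates:
        return None  # no Tdates recorded, nothing to compare against
    first_tdate = min(tdates)
    if mstatus == "O" and matured != first_tdate:
        return "Mstatus O: Matured must equal the first Tdate"
    if mstatus != "O" and matured > first_tdate:
        return "Matured may not be after the first Tdate"
    return None
```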

Caution

Changing a female's first Tdate can automatically change the female's Matured date. See CYCPOINTS.

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Matured

This is the date of menarche for females and the date of testicular enlargement for males, when either of these dates is known. Otherwise, this is the date by which the individual is considered to be sexually mature. See the Protocol for Data Management: Amboseli Baboon Project for more information regarding the dates used when the transition to maturity was not observed.[45] This column may not be NULL.

Mstatus (Sexual Maturity Status)

The status of the maturity date, that is, its precision, accuracy, quality, or other pertinent characteristics when it comes to the use of the value. The legal values for this column are defined by the MSTATUSES support table, see below. This column may not be NULL.

Tip

This column records whether the animal became mature ON a given (known) date, or BY a given (known) date. If a date is designated as an ON date[46] then we are saying that we know the animal attained that marker ON that date.[47] If a date is designated as a "BY" date the animal was adult or subadult BY that date but we do not know when the individual attained it. This scheme allows easy identification of which animals are infants or juveniles on any given day and which are not.
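The ON/BY distinction determines what can be said about an animal's maturity on an arbitrary day. A minimal Python sketch, using a hypothetical function name and returning None where the status cannot be known:

```python
from datetime import date
from typing import Optional


def is_mature_on(day: date, matured: date, mstatus: str) -> Optional[bool]:
    """True if known mature on `day`, False if known immature, None if unknown."""
    if day >= matured:
        return True   # both ON and BY dates imply maturity from Matured onward
    if mstatus == "O":
        return False  # ON date: the animal had not yet matured before it
    return None       # BY date: maturity may have come earlier, date unknown
```

A BY date only settles the question from Matured onward; earlier days remain undetermined.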

RANKDATES (Adult Rank Attainment Dates)

This table records dates individuals first attained adult rank. It allows one and only one row for every individual who has attained adult rank. Individuals who have not yet obtained adult rank do not have a row in this table.

The system will report a warning when, before their Ranked date, an individual has a rank (in RANKS) that is higher (where 1 is highest) than that of another individual who has already attained adult rank.

Tip

RANKDATES currently contains only data for males but data for females may be added.

When there is a row in this table there must be a sexual maturity date in MATUREDATES. When MATUREDATES.Mstatus is O (On), the rank attainment date must be later than the sexual maturity date; otherwise, the rank attainment date must not be before the sexual maturity date. The Ranked date cannot be after the individual's Statdate. All individuals must be 5 or more years of age on their rank attainment date. Individuals with a Rstatus of O (On) must be less than 12 years of age on their rank attainment date. The system will report a warning for any males older than 8.5 years (exclusive) who have not yet attained adult rank.

It is possible that an individual will be known to have attained rank in a non-study group before they entered the study population (their Entrydate). Because of this, the system will allow but issue a warning if an individual's Ranked is before the first of the month of his Entrydate.
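The RANKDATES rules can be collected into one check per row. This Python sketch uses hypothetical names, passes the related MATUREDATES and BIOGRAPH values in as arguments, and assumes February 29 birthdays fall on February 28 in non-leap years. It omits the 8.5-year warning, which concerns males that have no RANKDATES row at all.

```python
from datetime import date
from typing import List, Optional, Tuple


def add_years(d: date, years: int) -> date:
    """Return the date `years` after d; Feb 29 maps to Feb 28 (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def check_rank(birth: date, ranked: date, rstatus: str,
               matured: Optional[date], mstatus: Optional[str],
               statdate: date, entrydate: date) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one hypothetical RANKDATES row."""
    errors: List[str] = []
    warnings: List[str] = []
    if ranked.day != 1:
        errors.append("Ranked must be the first of a month")
    if matured is None:
        errors.append("no MATUREDATES row")
    elif mstatus == "O" and ranked <= matured:
        errors.append("Ranked must be later than an ON maturity date")
    elif mstatus != "O" and ranked < matured:
        errors.append("Ranked may not be before the maturity date")
    if ranked > statdate:
        errors.append("Ranked is after Statdate")
    if ranked < add_years(birth, 5):
        errors.append("younger than 5 years at rank attainment")
    if rstatus == "O" and ranked >= add_years(birth, 12):
        errors.append("Rstatus O but 12 years or older at rank attainment")
    # Allowed, but warned about: rank attained before the Entrydate's month.
    if ranked < entrydate.replace(day=1):
        warnings.append("Ranked is before the month of Entrydate")
    return errors, warnings
```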

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Ranked

The date the individual first attained a rank among adults. The date must fall on the first of the month. This column may not be NULL.

Rstatus

The status of the rank date, that is, its precision, accuracy, quality, or other pertinent characteristics when it comes to the use of the value. The legal values for this column are defined by the MSTATUSES support table. This column may not be NULL.

Tip

The legal values for this column are O (for ON) and B (for BY), as with Mstatus in the MATUREDATES table above.

Sexual Cycles

CYCGAPS (Gaps in Female Cycle Observations)

Records of the initiation and cessation of continuous periods of observation during which all of a female's cycling events are presumed, for the purpose of analysis, to have been observed. This table contains one row for each female for each initiation or cessation of a continuous period of observation.

A female is considered to be under continuous observation when all of her sexual cycle transition events -- Mdates, Tdates, and Ddates -- are observed or clearly implied by observational data.[48] When CYCGAPS contains a record of observation cessation, this indicates that some of the female's sexual cycle events have gone unrecorded. For this reason, when the interval enclosed by a Mdate, Tdate, Ddate sequence contains CYCGAPS rows indicating interruption of observation, the sexual cycle transition dates on either side of the interruption must be in different sexual cycles. For further information on this and other ways CYCGAPS interacts with the rest of Babase, see the documentation on the CYCLES, CYCPOINTS, and PREGS tables.

The presumption is that females are under continuous observation -- females with no CYCGAPS are presumed to be under continuous observation. Consequently a female's earliest CYCGAPS Code must be E (End), denoting the end of a period of observation.
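Because a female is presumed to be under observation until her earliest CYCGAPS row, her rows can be read in date order as switches between observed and unobserved periods. The Python sketch below, with a hypothetical function name, derives the inclusive ranges of days of no observation. It assumes the rows already satisfy the CYCGAPS rules in this section and that each row's own Date is a day of observation, as the Date column documentation states.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[date, Optional[date]]  # inclusive; None means "still unobserved"


def unobserved_intervals(rows: List[Tuple[date, str]]) -> List[Interval]:
    """Turn one female's (Date, Code) CYCGAPS rows into unobserved day ranges."""
    intervals: List[Interval] = []
    gap_start: Optional[date] = None  # None: currently under observation
    for day, code in sorted(rows):
        # S and P mark an observed day, closing any open period of no observation.
        if code in ("S", "P") and gap_start is not None:
            intervals.append((gap_start, day - timedelta(days=1)))
            gap_start = None
        # E and P are the last observed day; no observation starts the next day.
        if code in ("E", "P"):
            gap_start = day + timedelta(days=1)
    if gap_start is not None:
        intervals.append((gap_start, None))
    return intervals
```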

A female may not have two start-of-observation rows (Code S) without an intervening end-of-observation row (Code E), or vice versa; otherwise there would be starts without ends or ends without starts. A female may not have a single-day observation (point, Code P) between a start-of-observation/end-of-observation pair of rows, because the single-day observation would be redundant. An end of observation may be dated the day after a start of observation, but otherwise there must be at least a 1 day interval between a female's CYCGAPS rows; otherwise the same pattern of observation could be recorded using fewer rows.
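These ordering and spacing rules amount to a small state machine over a female's rows in date order. The Python sketch below uses a hypothetical function name and reads "at least a 1 day interval" as requiring at least one day between two rows (dates 2 or more days apart), except for a start followed by an end, which may be on consecutive days; that reading is an assumption.

```python
from datetime import date
from typing import List, Tuple


def check_cycgaps(rows: List[Tuple[date, str]]) -> List[str]:
    """Return rule violations for one female's (Date, Code) CYCGAPS rows."""
    errors: List[str] = []
    observing = True  # females are presumed observed before their first row
    prev = None
    for day, code in sorted(rows):
        # While observed only an E may follow; this also makes the first row E.
        if observing and code != "E":
            errors.append(f"{day}: Code {code} while already under observation")
        if not observing and code == "E":
            errors.append(f"{day}: Code E while not under observation")
        if prev is not None:
            prev_day, prev_code = prev
            # Only an S followed by an E may be on consecutive days.
            min_gap = 1 if (prev_code, code) == ("S", "E") else 2
            if (day - prev_day).days < min_gap:
                errors.append(f"{day}: too close to the row on {prev_day}")
        observing = code == "S"  # after E or P the female is unobserved
        prev = (day, code)
    return errors
```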

Caution

The aforementioned rules do not allow the introduction of a new period of observation, or the removal of an existing one (a start-of-observation/end-of-observation pair of CYCGAPS rows), anywhere other than before a female's earliest CYCGAPS row or after her latest CYCGAPS row.[49] To get around this restriction, the user requests specific violations of the rules and the system responds by introducing or removing start-of-observation/end-of-observation pairs.

To introduce a period of observation into the period of no observation that occurs between an end/start sequence, insert a point (Code P) observation and then an end (Code E) observation on successive days. The system will transform the point observation to a start observation, resulting in a start/end pair delineating a 2 day observational period. The dates of the new start/end may then be updated as needed to increase the number of days in the interval.

To introduce a period of no observation into the period of observation that occurs between a start/end sequence, insert a start of observation row (Code S). The system will insert an additional end of observation row dated two days before the inserted row, resulting in a 1 day period of no observation on the date preceding the row the user inserted. The dates of the new end/start may then be adjusted to increase the interval of no observation.

To remove a period of no observation, update a start of observation row (Code S), assigning it the same date as the end of observation row that marks the beginning of the interval to be removed. The system will remove both the start and the end of observation rows, eliminating the interval of no observation.

To reduce a period of observation to a single day, update a start of observation row (Code S), assigning it the same date as the end of observation row that marks the other end of the interval to be reduced. The system will remove both the start and the end of observation rows and replace them with a point (Code P) observation. Delete the new point observation to remove the period of observation entirely.

Rows with a Code value of S (Start) or P (single Point), that mark the beginning of observational periods or that represent isolated single days of observation, must have a value in the State column. All other rows, those with a code of E (End) that represent the end of an observational period, must have no value (NULL) in the State column. When a State value is present, it must correspond to the sexual cycle transition information on CYCPOINTS. For further information regarding required correspondences between CYCGAPS and CYCPOINTS, and how changes in CYCPOINTS can automatically change CYCGAPS with a Code of S, see the CYCPOINTS documentation below.

Only females may have related CYCGAPS rows.

This table is used in the construction of the sexual cycle day-by-day tables. It also affects the determination of which sexual cycle events (CYCPOINTS) are part of a single sexual cycle (CYCLES), the construction of automatic Mdates, and the validation of sexual cycles with respect to pregnancies.

Caution

The State value is ignored in all of a female's CYCGAPS rows with Dates on or before the female's Matured date, except the row with the latest such date, because the sexual cycle day-by-day tables contain no rows before the date of sexual maturity.

The combination of Sname and Date is unique.

Gapid

A number that uniquely identifies each row.

Sname

The short name of the female. This column should contain the Sname of a female in BIOGRAPH. This column may not be NULL.

To simplify the database code, this value may not be changed.

Code

What kind of endpoint the date records. Legal values are:

The CYCGAPS.Code Values
Code | Mnemonic | Definition
S | Start | the date is the start of a period of observation
E | End | the date is the end of a period of observation
P | Point | the date is an isolated observation that belongs with no other observations; it is both a start and an end of an observational period

Date

The date upon which observations began or ended. Observations were made on the given date.

Note

The date is not validated against any of the individual's life dates[50] to facilitate establishment of common dates for all females in a group. It does no harm, for instance, to stop observing a dead individual.

State (NULL allowed)

The state of the female's sexual cycle on the given date. Valid values are:

The CYCGAPS.State Values
Code | Mnemonic | Definition
M | menses | follicular -- Mdate (inclusive) to Tdate (exclusive)
S | swelling | follicular -- Tdate (inclusive) to 5 days prior to Ddate (exclusive)
O | ovulating | 5 days prior to Ddate (inclusive) to Ddate (exclusive)
D | deturgesence | luteal -- Ddate (inclusive) to Mdate (exclusive)
P | pregnant | Ddate (exclusive) to birth (exclusive)
L | lactating | birth (inclusive) to Tdate (exclusive)

Must not be NULL when Code is S or P, and must be NULL when Code is E. See the discussion in the table description above.
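The date ranges in the State table above can be turned into a lookup for a single cycle. This Python sketch, with a hypothetical function name, covers only the M, S, O, and D states; P and L depend on pregnancy and birth data that it does not model.

```python
from datetime import date, timedelta
from typing import Optional


def cycle_state(day: date, mdate: Optional[date], tdate: Optional[date],
                ddate: Optional[date], next_mdate: Optional[date]) -> Optional[str]:
    """Return M, S, O, or D for `day` within one cycle, else None."""
    if mdate is not None and tdate is not None and mdate <= day < tdate:
        return "M"  # Mdate (inclusive) to Tdate (exclusive)
    if tdate is not None and ddate is not None and tdate <= day < ddate:
        # Ovulating covers the 5 days before Ddate; swelling precedes it.
        return "O" if day >= ddate - timedelta(days=5) else "S"
    if ddate is not None and next_mdate is not None and ddate <= day < next_mdate:
        return "D"  # Ddate (inclusive) to the next Mdate (exclusive)
    return None
```

When the Ddate is 5 or fewer days after the Tdate the swelling range is empty and the female is ovulating from the Tdate onward, which matches the S/O distinction the CYCPOINTS documentation makes for CYCGAPS.State.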

CYCLES (Female Sexual Cycles)

This table records information on the sexual cycles of the females, one row per female per cycle. Babase automatically manages the creation and destruction of CYCLES on the basis of the sexual cycle transition events recorded in CYCPOINTS. The fundamental sexual cycle is a Mdate, Tdate, Ddate sequence. The Babase system automatically creates one row in CYCLES for every Mdate, Tdate, Ddate series in CYCPOINTS, and automatically destroys the CYCLES row when the Mdate/Tdate/Ddate aggregation is removed from CYCPOINTS. However, the rules Babase uses when automatically creating, destroying, or updating CYCLES are complicated by menarche, death, and gaps in observation.

Caution

CYCLES is special in that some of its data are automatically maintained by the system. The columns Seq and Series are updated automatically. For further information see the documentation that follows, and each column's documentation.

Tip

CYCLES rows should always have related CYCPOINTS rows[51], but as a practical matter it is necessary to create the CYCLES row before creating the related CYCPOINTS rows. This requires noting the Cid of the new CYCLES row so that it can be referenced in the new CYCPOINTS rows. Rather than doing this by hand, use the CYCPOINTS_CYCLES view. It allows a Sname to be specified with each new CYCPOINTS row and leaves it up to the system to either find or create an appropriate CYCLES row.

The system will report as an error those rows on CYCLES with no related CYCPOINTS rows.[52] CYCLES with no related CYCPOINTS must have a NULL Seq.

The aggregation of CYCPOINTS rows into cycles is automatically managed by Babase. The determination is based on the order in time of a female's CYCPOINTS rows and the information on gaps in observation present in CYCGAPS. The transition events recorded in CYCPOINTS are collected into sexual cycles, each cycle having (at most) an onset of menses date (Mdate), an onset of turgesence date (Tdate), and an onset of deturgesence date (Ddate), appearing in the order given here when ordered by date, and with none of the female's other Mdate, Tdate, or Ddate CYCPOINTS rows on the interval. Some sexual cycles may be missing one or more of the transition events, should there be no record of an observation. In this case CYCGAPS should be updated with a record of the gap in observation and the respective row is omitted from CYCPOINTS.

Part of Babase's automatic management of cycles is the management of cycle sequence numbers. Babase assigns a sequence number (Seq) to each of a female's cycles, beginning with 1 at menarche and counting up. As a consequence of the numbering scheme, the sexual cycle with a sequence (Seq) of 1 must not have an onset of menses date (Mdate).

Gaps in periods of continuous observation (CYCGAPS) impact Babase's determination of what constitutes a cycle. The presence of a gap in observation forces a change in cycle. (However, gaps in observation, i.e. missing cycles, do not cause gaps in the sequence numbering.) The introduction or removal of a gap, or for that matter the addition or removal of CYCPOINTS rows, can split an existing cycle into two (creating a new CYCLES row) or merge two previously distinct cycles into one (destroying an existing CYCLES row). When this occurs, the later CYCPOINTS rows retain their Cid; it is the earlier CYCPOINTS rows that change their Cid and move between cycles.[53][54][55]

The sexual cycles themselves are aggregated into periods of continuous observation, termed series, indicated by the assignment of a Series number to each CYCLES row. The aggregation of a female's sexual cycles into series is also automatically managed by Babase, based on the information in CYCGAPS. Although series are computed from CYCGAPS, the Series value aggregates and numbers sexual cycle transition events (Mdates, Tdates, and Ddates), not periods of observation. A consequence is that some periods of observation may not have an associated Series number; some observational periods may occur before the female's sexual maturity date or before any recorded sexual cycle transition events (CYCPOINTS). An individual's first period of continuous observation containing Mdates, Tdates, or Ddates has a Series of 1, the second a Series of 2, etc.
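The grouping of transition events into cycles and series can be approximated by a single pass over a female's events in date order. The Python sketch below is a simplified illustration with hypothetical names, not Babase's algorithm: it assumes the female's first event is at menarche, treats a gap in observation or a code that does not advance in M, T, D order as the start of a new cycle, and ignores Cid preservation, estimated Ddates, and the validation rules. Seq keeps counting across gaps, while Series increments at each gap.

```python
from datetime import date
from typing import List, Optional, Tuple

Event = Tuple[date, str]                # (Date, Code), Code in M/T/D
Interval = Tuple[date, Optional[date]]  # unobserved days, inclusive; None = open
ORDER = {"M": 0, "T": 1, "D": 2}


def gap_between(a: date, b: date, unobserved: List[Interval]) -> bool:
    """True if any unobserved day lies strictly between dates a and b."""
    for first, last in unobserved:
        end = last if last is not None else date.max
        if first < b and end > a:
            return True
    return False


def number_cycles(events: List[Event],
                  unobserved: List[Interval]) -> List[Tuple[date, str, int, int]]:
    """Return (Date, Code, Seq, Series) for one female's M/T/D events."""
    result: List[Tuple[date, str, int, int]] = []
    seq = series = 0
    prev: Optional[Event] = None
    # Order by date; an Mdate sharing its date with a Tdate sorts first.
    for day, code in sorted(events, key=lambda e: (e[0], ORDER[e[1]])):
        gap = prev is not None and gap_between(prev[0], day, unobserved)
        # A gap, or a code not later in M, T, D order, starts a new cycle.
        if prev is None or gap or ORDER[code] <= ORDER[prev[1]]:
            seq += 1
        # A gap in observation also starts a new series.
        if prev is None or gap:
            series += 1
        result.append((day, code, seq, series))
        prev = (day, code)
    return result
```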

Aggregating a female's CYCLES rows into a series indicates that the collection of data points is believed to be complete: no unobserved or unrecorded sexual cycle transitions (CYCPOINTS rows) occurred during the time spanned by the series. This allows the Series to be used as the basis for an analysis of sexual cycle transition intervals.

Tip

Those CYCLES with a Series of 1 for those females that have an O (On) Mstatus have Seq values that equal the ordinal numbering of the female's actual cycles, her first ever cycle having a Seq of 1, her second a Seq of 2, etc. All other CYCLES rows have Seq values that are useful for ordering each female's cycles but not for comparison between females.

Caution

Because a gap in observation always triggers a change in cycle, and because cycles must be complete, i.e. must contain a Mdate, a Tdate, and a Ddate, if there is no gap in observation it is impossible to have a single cycle missing nothing but a Tdate, i.e. it is impossible to have a cycle with a Mdate and a Ddate but no Tdate. If necessary, an estimated Tdate may be entered to work around this limitation.[56]

The system reports an error when the combination of Sname and Seq is not unique.[57]

Cid (sexual Cycle IDentifier)

A numeric identifier identifying each sexual cycle. It is unique across all cycles of all females.

This column need not be manually specified when the row is created.

The value of this column may not be altered after a row is created.

This column must not be NULL.

Sname

The short name of the female. This column must contain the Sname of a female in BIOGRAPH.

The value of this column may not be altered after a row is created.

This column must not be NULL.

Seq (Sequence)

The first sexual cycle of a female has a Seq value of 1, the second a value of 2, etc. The system will report an error if the Seq does not begin with 1 or is not contiguous. This column does not need to be manually maintained.

Caution

There are no gaps in the sequence numbers assigned to a female. Even when records of cycles are missing, the first recorded cycle after the missing period has a sequence one greater than the last recorded cycle before the missing period.

If the user does specify a value for this column the system may recompute and replace the supplied value at any time.

This column may be NULL when the row is first inserted, so that the system can set the value correctly when CYCPOINTS are subsequently inserted, but it may not be changed from a non-NULL value to NULL.

Series

Number indicating the series of continuous observation to which the transition event belongs. Events that are isolated observations have a series of their own. As with Seq, the Series are per-female. Each female begins with a Series of 1, and the Series is incremented with each interruption in regular observation. For further information see the description of the CYCLES table above.

The system will report an error if the Series does not begin with one or if the Series does not progress in a contiguous fashion. This column does not need to be manually maintained.

If the user does specify a value for this column the system may recompute and replace the supplied value at any time.

This column may be NULL when the row is first inserted, so that the system can set the value correctly when CYCPOINTS are subsequently inserted, but it may not be changed from a non-NULL value to NULL.

CYCPOINTS (Female Sexual Cycle Events)

This table records information on the sexual cycles of the females, one row per female per event. The usual events that mark the transitions of a female baboon's sexual cycles are onset of menses (Mdate), onset of turgesence (Tdate), and onset of deturgesence (Ddate). These different transition event dates are distinguished by Code values of M, T, and D respectively. In addition to these usual observations of transition states, CYCPOINTS contains one other kind of row: estimations of when unobserved sexual cycle transitions occurred, notably the automatically calculated onset of menses dates, but also unobserved onsets of deturgesence (Ddates) related to pregnancy conception events[58].

The unusual events that impact female cycling records, notably death and the cessation or initiation of long term observation, are recorded in other tables.

Note

The interval between conception and birth (or fetal death) is, by definition, the length of the pregnancy, and CYCPOINTS is the only place in Babase where conceptions are recorded. For this reason CYCPOINTS includes rows for the Ddate events that begin every pregnancy, including those that record estimated, unobserved Ddates. It may be that all that is known about a cycle is that a Ddate must have occurred because a pregnancy resulted.

Although Babase requires pregnancies to have a conception Ddate, and consequently there may be pregnancies for which an estimated (Source of E) Ddate must be entered, there is nothing preventing the user from creating estimated CYCPOINTS rows for the other Codes: T, D,

Caution

CYCPOINTS is special in that some of its data are automatically maintained by the system. The Cid and Source columns can be updated by automatic processes. For further information see the documentation of the CYCLES table and each column's documentation.

The presence of a Ddate row can trigger the automatic generation of a Mdate 13 days later. For further information see the section on Automatic Mdate Generation.

Only Mdates are automatically assigned, and only Mdates may have a Source of A (Automatic). Mdates may be manually given a Source of A, but this is unwise because the Automatic Mdate Generation process may remove the A row at any point. It is even less advisable because automatic Mdates are not validated, which makes it easy to enter an invalid automatic Mdate.

During a period of continuous observation (a series), no sexual cycle transition events (CYCPOINTS) should be missing. An individual's Mdates, Tdates and Ddates should all appear, in Mdate, Tdate, Ddate order. The system will report an error if this is not the case.[59] Consequently, the combination of Cid and Code must be unique.[60]

Usually a female does not have multiple CYCPOINTS rows for a given date, although there is an exception. A female's onset of menses date (Mdate) may be the same as her onset of turgesence (Tdate) date. Otherwise, none of a female's CYCPOINTS rows may share a date.

Babase allows each sexual cycle transition event to be associated with 3 dates: the date of record (Date), the earliest possible date (Edate), and the latest possible date (Ldate). The earliest (Edate) and latest (Ldate) possible dates may be NULL. The earliest possible date (Edate) may not be later than the date of record (Date), and the latest possible date (Ldate) may not be earlier than the date of record (Date). A female's earliest Tdate may, and likely will, have an earliest possible date (Edate) assigned that is before menarche.

A number of constraints on CYCPOINTS involve the females' sexual maturity dates (MATUREDATES.Matured). When an individual's sexual maturity date is determined by observation (MATUREDATES.Mstatus is O (On)), her earliest Tdate must be equal to her sexual maturity date.

Warning

When a female's MATUREDATES.Mstatus is O (On) her MATUREDATES.Matured is automatically set to her earliest Tdate. Any error in the Tdate value will be reflected in the maturity date. This is not true of females with MATUREDATES.Mstatuses that are not O. These maturity dates must be manually maintained.

No date-of-record values may occur before a female's maturation date. All of an individual's date-of-record (Date) and late (Ldate) sexual cycle transition date values must be on or after the individual's onset of menarche date (MATUREDATES.Matured). All of an individual's early dates (Edate), Bdates of record (Date), and the first Tdate date-of-record (Date), sexual cycle transition dates must be after the individual's birth date.

Females with CYCPOINTS rows must have a sexual maturity date. The system will report mature females with no CYCPOINTS rows on or after their maturity date (MATUREDATES.Matured).

All early date (Edate) and date-of-record (Date) values must be on or before the individual's Statdate.
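The per-row date constraints on CYCPOINTS can be sketched as one function. This Python illustration uses a hypothetical name and passes the female's birth, maturity, and Statdate values directly. Because Matured is always well after birth, the requirement that Date values follow birth is already implied by Date being on or after Matured, so only Edate is compared with birth here.

```python
from datetime import date
from typing import List, Optional


def check_point_dates(day: date, edate: Optional[date], ldate: Optional[date],
                      birth: date, matured: date, statdate: date) -> List[str]:
    """Return violations of the CYCPOINTS date rules for one row."""
    errors: List[str] = []
    if edate is not None and edate > day:
        errors.append("Edate is later than Date")
    if ldate is not None and ldate < day:
        errors.append("Ldate is earlier than Date")
    if day < matured or (ldate is not None and ldate < matured):
        errors.append("Date or Ldate is before Matured")
    if edate is not None and edate <= birth:
        errors.append("Edate is not after birth")
    if day > statdate or (edate is not None and edate > statdate):
        errors.append("Date or Edate is after Statdate")
    # Ldate is deliberately not compared with Statdate.
    return errors
```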

Caution

Even when an individual is dead, late (Ldate) dates may be after the Statdate. This is because death is rarely observed; although the Statdate contains a single date, the uncertainty surrounding the date of death is reflected in the sexual cycle event Ldate.

There are gaps in observation. If the first cycling event in a series -- the first Mdate, Tdate, or Ddate -- falls on the day observation resumes, then things are pretty simple. The state of sexual cycling at the time observation resumes, CYCGAPS.State, must correspond with the event. For a menses, CYCGAPS.State is M, and so forth. The situation is slightly complicated by the swelling-follicular and ovulating states. The details are as follows: If the first CYCPOINTS row in the series falls on the first day of the series, the CYCGAPS.State must be M (Menses, follicular) when the CYCPOINTS.Code is M (onset of Menses); CYCGAPS.State must be D (Deturgesence) when the CYCPOINTS.Code is D (onset of Deturgesence); CYCGAPS.State must be S (Swelling, follicular) when the CYCPOINTS.Code is T (onset of Turgesence) and the subsequent Ddate in the series is more than 5 days after the Tdate or there is no subsequent Ddate; and CYCGAPS.State must be O (Ovulating) when the CYCPOINTS.Code is T (onset of Turgesence) and the subsequent Ddate in the series is not more than 5 days after the Tdate.

If the above is not the case, i.e. the first cycling event in the series falls on the day observation resumes and CYCPOINTS.Code is M but the CYCGAPS.State is not, then the State of the CYCGAPS row is automatically changed to enforce correspondence between CYCGAPS and CYCPOINTS.

But what if observation starts and then later the first Mdate, Tdate, or Ddate is observed? What happens (to CYCSTATS) between the start of observation and the first event? That's what CYCGAPS.State is supposed to address and it needs to be set appropriately. This cannot always be done automatically either, although usually it can.

If the first CYCPOINTS row in the series does not fall on the first day of the series, the CYCGAPS.State must be D (Deturgesence) when the first CYCPOINTS.Code is M (onset of Menses); the CYCGAPS.State must be S (Swelling, follicular) when the CYCPOINTS.Code is D (onset of Deturgesence) and the CYCPOINTS.Date is more than 5 days after the CYCGAPS.Date; and the CYCGAPS.State must be O (Ovulating) when the CYCPOINTS.Code is D (onset of Deturgesence) and the CYCPOINTS.Date is not more than 5 days after the CYCGAPS.Date.

In these cases, as before, the State of the CYCGAPS row is automatically changed to enforce correspondence between CYCGAPS and CYCPOINTS.

The final set of possibilities have to do with Tdates, which are complicated because they occur at menarche and after pregnancies, as well as after menses. The system will report an error if the first CYCPOINTS row in a series does not fall on the first day of the series and the first CYCPOINTS row is a Tdate and the CYCGAPS.State is something other than M (Menses), P (Pregnant), or L (Lactating). Because there are 3 possibilities in this case, the CYCGAPS.State value is not automatically assigned.
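The correspondence rules above can be summarized as a function returning the set of CYCGAPS.State values permitted for the row that starts a series. This Python sketch uses a hypothetical name and takes the relevant dates as arguments. When the result has a single member, the documentation says the system sets the State automatically; when it has three, the user must choose.

```python
from datetime import date
from typing import Optional, Set


def allowed_start_states(start: date, first_day: date, first_code: str,
                         next_ddate: Optional[date] = None) -> Set[str]:
    """CYCGAPS.State values permitted for the S/P row that starts a series.

    start: the CYCGAPS.Date of the row starting observation.
    first_day, first_code: the series' first CYCPOINTS Date and Code.
    next_ddate: the series' next Ddate after a first Tdate, if any.
    """
    if first_day == start:
        if first_code == "M":
            return {"M"}
        if first_code == "D":
            return {"D"}
        # A Tdate on the first day: swelling unless the Ddate is within 5 days.
        if next_ddate is None or (next_ddate - first_day).days > 5:
            return {"S"}
        return {"O"}
    if first_code == "M":
        return {"D"}
    if first_code == "D":
        return {"S"} if (first_day - start).days > 5 else {"O"}
    return {"M", "P", "L"}  # a later first Tdate: the user must choose
```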

Warning

Because deleting CYCPOINTS changes a female's cycling state -- a representation of which Babase keeps in the sexual cycle day-by-day tables -- but not the interval of time during which she was under observation (CYCGAPS), removing Mdates, Tdates, or Ddates from CYCPOINTS at the beginning of a series can leave the beginning of the series either in an incorrect state or in the correct state for an overly long period of time. The same can be true when the dates of the first CYCPOINTS in a series are changed. Removing all the CYCPOINTS Mdate, Tdate, and Ddate rows from a series will leave the entire observational period in the State specified by the CYCGAPS row that denotes the start of the observational period. This may or may not be correct, especially when the CYCGAPS.State was automatically changed due to the insertion or deletion of CYCPOINTS rows.

When deleting all sexual cycle transition CYCPOINTS rows from an observational period it is best to delete later rows before earlier rows, as deleting CYCPOINTS rows from the beginning of the observational period changes the CYCGAPS.State value marking the start of the observational period.

CYCPOINTS rows must not fall in an interval of no observation, excepting estimated (Source is E) Ddates (Code D) that are also conception events. (See PREGS.Conceive.) None of the different kinds of date values -- early (Edate), date-of-record (Date), or late (Ldate) -- of the individual's CYCPOINTS rows may be in an interval during which the individual is not under observation -- may fall on a date on which the individual has a row in CYCGAPDAYS. The system will allow but report as an error CYCPOINTS rows with a Source of E and a Code of D that are not referenced in PREGS.Conceive.[61]
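The restriction on CYCPOINTS dates falling in intervals of no observation can be sketched as follows. This Python illustration is hypothetical: it models one female's CYCGAPDAYS rows as a set of dates, and the `is_conception` flag stands in for the row being referenced by PREGS.Conceive.

```python
from datetime import date
from typing import Optional, Set


def falls_in_gap(edate: Optional[date], day: date, ldate: Optional[date],
                 code: str, source: str, is_conception: bool,
                 gap_days: Set[date]) -> bool:
    """True if a CYCPOINTS row illegally touches a day of no observation."""
    if source == "E" and code == "D" and is_conception:
        return False  # estimated conception Ddates are exempt
    # Any of Edate, Date, or Ldate on an unobserved day is a violation.
    return any(d is not None and d in gap_days for d in (edate, day, ldate))
```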

Caution

CYCPOINTS and CYCLES are intimately related. Be sure to read and understand the CYCLES documentation.

Once a row is created it must remain associated with the same female -- any re-assignment of Cid must retain the association between the CYCPOINTS row and the old Cid's female.

Note

There are plans afoot to automatically fill in the early and late dates. The early dates would include the day after the immediately prior census date, the late date would be the day of the immediately following census date. There must also be a mechanism for manually overriding the automatic dates.

Cpid (sexual Cycle data Points IDentifier)

A numeric identifier unique to each row. This is used to reference the sexual cycle transition elsewhere in the database. This column may not be NULL.

This column need not be manually assigned when the row is created. It may not be changed.

Cid (sexual Cycle IDentifier)

A numeric identifier identifying each sexual cycle. It is unique across all cycles of all females, but shared by all CYCPOINTS rows comprising a cycle -- a Mdate, Tdate, Ddate sequence -- of a female.[62]

This column need not be manually specified when the row is created using the CYCPOINTS_CYCLES view. If it is not specified, the system will determine with which cycle the row should be associated and assign the correct Cid. Should the system find that the sexual cycle transition date belongs in a new cycle, it will make and assign a new Cid.[63] If the column is specified the system does the same work, but when it is appropriate to create a new cycle the supplied value is used.

As the system does the same amount of work whether or not the user specifies a value, the only utility in specifying a value is to manually assign a specific Cid to a new sexual cycle which Babase would otherwise automatically create.

Tip

When sexual cycle transition dates are incorrectly aggregated into sexual cycles, i.e. when the Cid is wrong, it is because the record of when the female was under observation -- the data on the CYCGAPS table -- is incorrect. Correcting CYCGAPS will correct the problem.

Caution

The system automatically assigns, or re-assigns, Cid values as CYCPOINTS and, especially, CYCGAPS rows are inserted, deleted, and altered to keep the database in a state consistent with the definition of a sexual cycle. For this reason any particular Cid is not guaranteed to forever identify a particular Sname/Date/Code. Cpids may be used for this purpose, or the data itself. For further information see the CYCLES documentation.

Supplying a NULL value causes the system to recompute the correct value, and use it in place of the NULL.

Date

The date-of-record of the transition event. See the Protocol for Data Management: Amboseli Baboon Project for information regarding the determination of this date from the field data. This column may not be NULL.

Edate (Early Date)

Earliest possible date of the transition event. This column may be NULL when there is no need to record a range of date values.

Ldate (Late Date)

Latest possible date of the transition event. This column may be NULL when there is no need to record a range of date values.

Source

Code indicating where the data came from: D (Data -- the default) for observed data, A (Auto) for automatically inserted rows (see Automatic Mdate Generation), and E (Estimated) for estimated values not to be used in other computations, such as estimated Ddates entered to relate mothers and pregnancies.

This column may not be changed after the row is created.

Code

The type of sexual cycle transition:

The CYCPOINTS.Code Values

Code    Description
M       onset of Menses, a sexual cycle transition event
T       onset of Turgesence, a sexual cycle transition event
D       onset of Deturgesence, a sexual cycle transition event

This column may not be changed.

PCSKINS (ParaCallosal Skin observations)

This table records information on the females' paracallosal skins. It contains one row for every recorded observation of each female's paracallosal skin.

Tip

The PCSKINS_SORTED view may be useful when maintaining the table, depending upon the front-end tool used for maintenance.

Tip

Use the sexual cycle day-by-day tables to determine the female's sexual state on the days of the paracallosal observations.

The combination of Sname and Date must be unique.

Pasty (Pcskins IDentifier)

A unique integer which identifies the PCSKINS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname

The Sname of the observed female. This column may not be NULL.

Date

The date of the observation. This date must be after the individual's Birth date. When the individual is dead, the date must not be after the individual's Statdate. This column may not be NULL.
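As an illustration of the date rule above, here is a minimal sketch. It assumes the individual's BIOGRAPH Birth and Statdate are already available as Python `date` values, and that a hypothetical `is_dead` flag says whether the Statdate records a death. Babase itself enforces the rule in the database. The function name is illustrative only.

```python
from datetime import date


def pcskin_date_ok(obs: date, birth: date, statdate: date, is_dead: bool) -> bool:
    """Hypothetical check mirroring the PCSKINS.Date rule.

    The observation must fall after Birth; when the individual is dead
    (is_dead is an assumed flag) it must also not fall after Statdate.
    """
    if obs <= birth:
        return False
    if is_dead and obs > statdate:
        return False
    return True
```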

Color (ParaCallosal Skin color)

A Color code from the PCSCOLORS table -- the observed paracallosal skin color. This column must not be NULL.

PREGS (Pregnancies)

This table contains one row for each recorded pregnancy. A pregnancy is defined to be an event occurring to a mother; a single pregnancy could result in more than one fetus. The only time there will not be a related BIOGRAPH row for the zygote(s) is when the pregnancy is still in progress[64]; otherwise there will always be a BIOGRAPH row that records the progeny of the pregnancy.

The progeny may not be born before being conceived -- the conception date (Ddate via Conceive) of the pregnancy must not be later than the birth date value (Birth) of the associated BIOGRAPH row, the child. The mother may not resume cycling until after birth -- the birth date value of the associated BIOGRAPH row must not be later than the resumption of cycling date values (Resume).

The sequence of a female's pregnancies when ordered by parity must correspond with the sequence when ordered by conception date.

The sequence number (CYCLES.Seq obtained via CYCPOINTS.Cid) of the sexual cycle event immediately following pregnancy (Resume) must always be exactly one more than the sequence number of the sexual cycle event associated with conception (Conceive). Only one pregnancy is allowed per conception event -- each Conceive value differs from all the others. These rules ensure that the resumption date follows the conception date and that there is no overlap of pregnancy time periods, from conception date to birth date or, if known, resumption of sexual cycling date, among the pregnancies associated with a particular female.[65] The female associated with the conception sexual cycle event (Conceive) must be the same as the female associated with the sexual cycle event immediately following pregnancy (Resume).
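The Conceive/Resume rules above can be sketched as follows. This is a hypothetical client-side check, not Babase's trigger code. It assumes the CYCPOINTS Cpid-to-Cid mapping and each CYCLES row's (Sname, Seq) have already been loaded into dictionaries. The `Preg` class and `preg_errors` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Preg:
    pid: str
    conceive: int           # CYCPOINTS.Cpid of the conception Ddate
    resume: Optional[int]   # CYCPOINTS.Cpid of the resumption Tdate, or None


def preg_errors(
    pregs: List[Preg],
    cpid_to_cid: Dict[int, int],           # CYCPOINTS.Cpid -> CYCPOINTS.Cid
    cycles: Dict[int, Tuple[str, int]],    # CYCLES.Cid -> (Sname, Seq)
) -> List[str]:
    """Report violations of the one-conception-per-pregnancy and Seq + 1 rules."""
    errors = []
    seen_conceive = set()
    for p in pregs:
        if p.conceive in seen_conceive:
            errors.append(f"{p.pid}: Conceive is not unique")
        seen_conceive.add(p.conceive)
        if p.resume is None:
            continue  # nothing to compare until cycling resumes
        c_sname, c_seq = cycles[cpid_to_cid[p.conceive]]
        r_sname, r_seq = cycles[cpid_to_cid[p.resume]]
        if c_sname != r_sname:
            errors.append(f"{p.pid}: Conceive and Resume belong to different females")
        if r_seq != c_seq + 1:
            errors.append(f"{p.pid}: Resume Seq {r_seq} is not Conceive Seq {c_seq} + 1")
    return errors
```

Requiring the Resume cycle's Seq to be exactly one more than the conception cycle's Seq is what prevents two pregnancies of the same female from overlapping.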

There must not be a resumption of menses date (Mdate) in the sexual cycle (CYCPOINTS.Cid) of the Resume cycle.

The pregnancy must terminate in a birth or fetal loss before the female resumes cycling; the only exception is cessation of observation as described below. The Resume column must be NULL until there is a row in BIOGRAPH with a Pid referring to the pregnancy.

Note

The pregnancy termination check and the parity sequence checks are not performed until the database transaction is committed. This allows a pregnancy that is discovered after subsequent pregnancies are already on record to be added to the database by making multiple changes within a single database transaction. Inserting the new PREGS row, inserting a BIOGRAPH row for the progeny, and then updating the PREGS.Resume of the new pregnancy, all within one transaction, allows the referential integrity rules to be satisfied when the transaction commits.

Caution

Babase keeps a record of the reproductive state of mature females in the sexual cycle day-by-day tables. If these tables are to be correct, Babase must know when each pregnancy ends (see BIOGRAPH.Birth) and when cycling resumes. When there is no record of the end of a pregnancy or of the resumption of cycling, Babase must know whether this is due to cessation of observation or just cessation of data entry.

Babase cannot detect when the user has failed to enter rows in CYCGAPS when observation of a pregnant female has ceased. However, it will report errors and unusual conditions it can detect.

The system will report a warning in two cases. First, when an ongoing pregnancy exceeds 191 days: there are more than 191 days between the conception date (PREGS.Conceive) and the Statdate, no progeny are recorded for the pregnancy (in BIOGRAPH.Birth), and there are no gaps in observation (see CYCGAPS) during the 191 day interval. Second, when it appears that a conception date should be estimated but is not: there is no Tdate in the conception cycle, the conception Ddate[66] is not estimated, and there is no gap in observation between the conception date and all of the female's prior CYCPOINTS rows.
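A minimal sketch of the first warning condition follows. It makes two assumptions that the text leaves open. First, CYCGAPS has already been paired into hypothetical (start, end) observation-gap intervals. Second, "the 191 day interval" is taken to be the 191 days following conception. This is an illustration, not Babase's implementation.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

LONG_PREGNANCY_DAYS = 191  # threshold stated in the text


def long_pregnancy_warning(
    conceive: date,
    statdate: date,
    progeny_birth: Optional[date],
    gaps: List[Tuple[date, date]],  # assumed (start, end) gaps derived from CYCGAPS
) -> bool:
    """Return True when the 'ongoing pregnancy exceeds 191 days' warning applies."""
    if progeny_birth is not None:
        return False  # progeny recorded, so the pregnancy is not ongoing
    if (statdate - conceive).days <= LONG_PREGNANCY_DAYS:
        return False  # not yet longer than 191 days
    window_end = conceive + timedelta(days=LONG_PREGNANCY_DAYS)
    for start, end in gaps:
        if start <= window_end and end >= conceive:
            return False  # a gap in observation overlaps the interval
    return True
```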

The system will report an error: when a female has sexual cycles while a pregnancy is ongoing[67] -- when the female has Tdate CYCPOINTS rows that post-date her pregnancy's Conceive date but pre-date gaps in observation, and the pregnancy has no (NULL) Resume.[68] A female must not have any CYCPOINTS rows that postdate a pregnancy with a NULL Resume, unless the first CYCPOINTS row is a Tdate or unless they postdate a gap in observation following the pregnancy.

Warning

The Resume column is automatically updated by Babase so long as there is no gap in observation (see CYCGAPS) between the conception date and the Tdate that resumes cycling; it is set to the Tdate immediately following the conception date. The system will report an error if there is a gap in the observation of sexual cycle events (CYCPOINTS) and the Resume column is not NULL.[69]

Tip

The temporary creation of a gap in observation (CYCGAPS) allows a conception-birth-resumption sequence to be inserted into a pre-existing series of sexual cycle events (CYCPOINTS).

Pid

The contents of this column uniquely identify the pregnancy record. The Pid must be the mother's Sname followed by the probable parity. Because the Pid is only used to identify the record, it is not necessary to change the Pid just because the parity of the pregnancy is found to have changed. Once a unique Pid is established, it may not be changed. When retrieving data from this table, the safe approach is to assume nothing about the contents of this column except that it uniquely identifies a pregnancy.

Note

The preferred way to obtain the bearer of the pregnancy is to find the female associated with the ovulation by joining PREGS.Conceive with CYCPOINTS.Cpid to find CYCPOINTS.Cid, join that with CYCLES.Cid to find CYCLES.Sname, and then use that value to find the mother's BIOGRAPH row.[70][71]
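The join path described in the note can be sketched with dictionaries standing in for the tables. In SQL this is a pair of joins: PREGS to CYCPOINTS on Conceive = Cpid, then CYCPOINTS to CYCLES on Cid. The dictionaries and the `mother_of` name are hypothetical.

```python
from typing import Dict


def mother_of(
    pid: str,
    pregs_conceive: Dict[str, int],  # PREGS.Pid -> PREGS.Conceive (a CYCPOINTS.Cpid)
    cycpoints_cid: Dict[int, int],   # CYCPOINTS.Cpid -> CYCPOINTS.Cid
    cycles_sname: Dict[int, str],    # CYCLES.Cid -> CYCLES.Sname
) -> str:
    """Follow PREGS.Conceive -> CYCPOINTS.Cid -> CYCLES.Sname."""
    cpid = pregs_conceive[pid]
    cid = cycpoints_cid[cpid]
    return cycles_sname[cid]
```

The mother is found by following the data, never by stripping the parity off the Pid.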

Warning

The Parity column must always be used to obtain a meaningful parity value. Because Pids cannot change, if a pregnancy is missed and the correction is entered into Babase only after a subsequent pregnancy has been entered, the Pid of the female's subsequent pregnancy will forever contain an incorrect parity.[72]

Parity

The cardinality of the pregnancy. 1 for a female's first pregnancy, 2 for a female's second pregnancy, and so forth. There must not be gaps in the pregnancies, sequenced by Parity, of any female. When the first pregnancy is known, the Parity sequence begins with 1. When the first pregnancy is not known, the Parity sequence begins with 101.

The parity of a female's first pregnancy must be specified; this tells the system whether the parity sequence begins with 1 or 101. When the user does not supply a parity for a subsequent pregnancy, the system generates it automatically. When the user does specify a parity, the system compares the supplied value with the value it computes for the column and raises an error if the two do not match. As a special exception the parity is allowed to be in the 100s rather than the 1s, although the parity must remain sequential and without gaps when only the 10s and 1s places of the female's pregnancy parities are considered. E.g. the parity sequence may be either 1, 2, 3 or 1, 2, 103 but not 1, 2, 104. A 1 in the hundreds place signals that there has been a period of no observation[73] and that a pregnancy may have been missed. When a pregnancy's parity is changed from the 1s (or 10s) to the 100s, Babase will update the parity of subsequent pregnancies so that they are also in the 100s. Babase will only allow a change from the 100s to the 1s (or 10s) for the smallest of a female's pregnancy parities that are larger than 100 -- the first pregnancy after a period of no observation. In this case Babase will not change the parity of subsequent pregnancies; this must be done manually, from smallest to largest. Babase will not allow a change from the 100s to the 1s (or 10s) of any of a female's pregnancy parities that are larger than the smallest parity larger than 100.

Supplying a NULL value for the Parity causes the system to recompute the correct value, a value one larger than the parity of the previous pregnancy, and use it in place of the NULL.
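The parity rules above can be summarized as a hypothetical validator. It assumes the female's parities are supplied in conception-date order. The nth pregnancy must have parity n or 100 + n. The list must also be strictly increasing, because ordering by parity must match ordering by conception date. `next_parity` mirrors the NULL behavior just described. Both names are illustrative.

```python
from typing import List


def parity_sequence_ok(parities: List[int]) -> bool:
    """Check one female's parities, listed in conception-date order.

    Only the tens and ones places must run 1, 2, 3, ... without gaps; each
    parity may additionally carry a 1 in the hundreds place. The list must
    be strictly increasing so that parity order matches date order.
    """
    previous = 0
    for expected, parity in enumerate(parities, start=1):
        if parity not in (expected, 100 + expected):
            return False  # gap in the tens/ones sequence, or bad hundreds digit
        if parity <= previous:
            return False  # parity order disagrees with conception-date order
        previous = parity
    return True


def next_parity(previous_parity: int) -> int:
    """Value used when Parity is supplied as NULL: one more than the previous."""
    return previous_parity + 1
```

With this check, 1, 2, 3, 104 is accepted. That is the state left after the first 100s parity has been changed back and before the later ones are changed by hand. The sequence 1, 2, 103, 4 is rejected.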

Conceive

The information related to the Ddate event that initiated the pregnancy. This is the Cpid of a CYCPOINTS row of the mother. The related CYCPOINTS row should record the date of conception and must record a Ddate.

This column must contain a unique datum.

Tip

When the date of conception is estimated because there is no sexual cycle data, the conception date recorded should be 178 days before the recorded birthday.

This column must not be NULL.
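The 178-day estimate in the tip above is simple date arithmetic. A minimal sketch follows; the function name is hypothetical. Per the Source column of CYCPOINTS, such an estimated D date would be the kind of row marked E (Estimated).

```python
from datetime import date, timedelta

ESTIMATED_GESTATION_DAYS = 178  # offset given in the tip


def estimated_conception(birth: date) -> date:
    """Conception date to record when there is no sexual cycle data."""
    return birth - timedelta(days=ESTIMATED_GESTATION_DAYS)
```

For example, a birth on 2010-06-28 gives an estimated conception date of 2010-01-01.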

Resume (NULL allowed)

The resumption of cycling event (Tdate) of the first cycle following the pregnancy. This is the Cpid of a row in CYCPOINTS, which must record a Tdate. This column may be NULL in those cases when the resumption of cycling is not known. When this column is not NULL, it should contain a unique datum.

This column may be automatically updated. (See the description of the PREGS table above.)

SEXSKINS (Sexskin Turgesence Measurements)

This table records information on the females' sexskins. It contains one row for every recorded observation of each female's sexskin.

Babase requires sexskin measurements be associated with sexual cycles (CYCLES) in accordance with the rules described in the Sexual Cycle Determination section.

Caution

Sexskin measurements are related to a specific female only through one of her sexual cycles -- a CYCLES row. Consequently, a female's Mdate, Tdate, and Ddate sexual cycle events -- her CYCPOINTS rows -- must be updated before her sexskin information may be entered.

Tip

Use the CYCLES_SEXSKINS view to maintain this table.

Note

The checks that compare all the sexskins of a particular cycle raise their errors immediately when the error results from changes made directly to the SEXSKINS table. However, when an error condition is created by the automatic shifting of sexskins between cycles due to changes to the sexual cycle dates (see CYCPOINTS), the errors are not immediately reported.

Tdates normally occur at some point during the transition from sexskin Size 0 to Size 1, but can occur during the transition from sexskin Size 0 to Size 5. Measurements larger than 5 cannot come on or before the Tdate of the cycle. The system will generate a warning when there is a sexskin measurement larger than 1 before the Tdate. The Tdate of a cycle must be after the dates of all the cycle's sexskin measurements of zero that precede the earliest measurement of 1 or greater in the cycle.
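The three Tdate rules above can be sketched as a hypothetical check over one cycle's (Date, Size) measurements. A measurement larger than 5 before the Tdate is reported only as an error here, not also as a warning. That is a presentation choice of this sketch, not something the text specifies.

```python
from datetime import date
from typing import List, Tuple

Measurement = Tuple[date, float]  # (SEXSKINS.Date, SEXSKINS.Size) for one cycle


def check_tdate(tdate: date, sizes: List[Measurement]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a cycle's Tdate against its sexskin sizes."""
    errors, warnings = [], []
    ordered = sorted(sizes)
    for d, size in ordered:
        if size > 5 and d <= tdate:
            errors.append(f"size {size} on {d} is on or before the Tdate")
        elif size > 1 and d < tdate:
            warnings.append(f"size {size} on {d} precedes the Tdate")
    # Zeros that come before the first measurement of 1 or more must precede the Tdate.
    for d, size in ordered:
        if size >= 1:
            break
        if size == 0 and d >= tdate:
            errors.append(f"zero measurement on {d} is not before the Tdate")
    return errors, warnings
```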

A Ddate occurs when the sexskin begins deturgesence. The Ddate of a cycle must be after the last measurement before the largest measurement of the cycle.[74] The system will report a warning when Ddates occur after sexskin turgesence has begun to subside -- Ddates after the first measurement following the largest sexskin measurement(s) of the cycle.

Sexskin turgesence normally begins after menses. The system will generate an error if there is no Mdate in the sexual cycle to which the SEXSKINS row is assigned, unless one of the following holds: the sexual cycle's Tdate falls on the individual's MATUREDATES.Matured date and the maturity date is an ON date[75]; the cycle is the first after a pregnancy (the Cid is a PREGS.Resume value); or the cycle's Tdate is the first CYCPOINTS row after a gap in observation (CYCGAPS). In the latter case the system will generate a warning. The sexskin measurement before the Mdate cannot be larger than 0. The sexskin measurement on the Mdate cannot be larger than 1, unless the Mdate is also a Tdate, in which case the measurement cannot be larger than 5. The system will generate a warning when the sexskin measurement on the Mdate is larger than 0.

Sexskin turgesence associated with one cycle must not be contemporaneous with Mdates, Tdates, Ddates, or sexskin turgesence observations related to a different cycle. All of the SEXSKINS Date values associated with a particular cycle must be later than the Mdate, Tdate, and Ddate of the previous cycle and earlier than the Mdate, Tdate, and Ddate of the succeeding cycle. There must not be any overlap of the cycles' sexskin measurement dates, over the time period from a cycle's earliest sexskin measurement date to its latest, between the sexskin measurement dates of a female's different cycles.
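The non-overlap rule can be sketched as a hypothetical check on one female's sexskin dates grouped by Cid. Each cycle's span runs from its earliest to its latest SEXSKINS.Date. An all-pairs comparison is used for clarity. A female has at most a few hundred cycles, so quadratic cost is not a concern in this sketch.

```python
from datetime import date
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_cycles(dates_by_cid: Dict[int, List[date]]) -> List[Tuple[int, int]]:
    """Return every pair of one female's cycles whose sexskin date spans overlap."""
    spans = {cid: (min(ds), max(ds)) for cid, ds in dates_by_cid.items() if ds}
    clashes = []
    for a, b in combinations(sorted(spans), 2):
        (a_start, a_end), (b_start, b_end) = spans[a], spans[b]
        if a_start <= b_end and b_start <= a_end:
            clashes.append((a, b))
    return clashes
```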

The combination of Sname, from the associated CYCLES row, and Date must be unique.

The combination of Date and Cid must be unique.

Sxid (Sexskins IDentifier)

A unique integer which identifies the SEXSKINS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Cid (Cycle IDentifier)

The CYCLES identifier associated with the sexskin measurement. This is a Cid from the CYCPOINTS table. This column can be used to retrieve the Sname of the female that was measured as well as all other data collected on the cycle.

This column is automatically assigned by the system. Although some (arbitrary) cycle must be associated with the SEXSKINS row upon insert in order to relate the row to a female, the system always uses the Sexual Cycle Determination rules to re-assign the row to the appropriate cycle.

This column may not be NULL.

Date

The date of the observation. This date must be after the individual's Birth date. The date must not be after the individual's Statdate. This column may not be NULL.

Size

This column contains a number indicating the size of the sexskin, measured in integer units ranging from 0 through 20, inclusive; the only non-integer value allowed is 0.5. This column must not be NULL.
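The domain of Size can be expressed as a small hypothetical validator, for example for checking data before it is loaded. Babase enforces the constraint itself.

```python
def valid_size(size: float) -> bool:
    """True for an integer from 0 through 20 inclusive, or the single value 0.5."""
    return size == 0.5 or (float(size).is_integer() and 0 <= size <= 20)
```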

Social and Multiparty Interactions

ALLMISCS (Ad-libitum sample data)

One row for every unstructured data collection event recorded during all-occurrences protocols. The ALLMISCS row containing data collected during a particular sample is related to the SAMPLES row representing the sample. Samples do not have a fixed number of related rows on ALLMISCS; any particular sample may have one, none, or many. Further information may be found on SAMPLES.

A variety of ad-libitum data may be collected during sample data collection. Some of these ad-libitum data can be placed in the INTERACT_DATA and POINT_DATA tables, in which case ALLMISCS is not involved. Data that do not conform to the design of INTERACT_DATA and POINT_DATA are kept in the ALLMISCS table.

Note

Consortships recorded as ad-libitum data during focal point sampling are not stored in INTERACT_DATA because INTERACT_DATA requires that consortships have a starting and an ending time, and data collected during focal point sampling have no duration. Such consortship data are stored as ALLMISCS rows. Babase presumes that all consortships are recorded systematically on paper during the day and entered into Babase, so it is not necessary to place ad-libitum consortship data recorded during focal sampling into INTERACT_DATA. Consortship data are collected during focal samples to note whether focal animals are engaged in consortships during a particular sample, not to record the consortship per se.

Note

Mounts involving the focal individual during all-occurrences sampling are recorded both in the palmtop and on the paper field ad-libitum records. Consequently, to avoid duplicates in INTERACT_DATA, Babase stores the mounts recorded in the palmtop in the ALLMISCS table, but not the INTERACT_DATA table. Mounts in the ALLMISCS table are therefore redundant and may be ignored.

Warning

Babase does the same thing with ejaculations recorded on the palmtops as it does with mounts: it records them in ALLMISCS rather than INTERACT_DATA. However, the protocol says nothing about ejaculations occurring during all-occurrences sampling. Anyone researching ejaculations will need to investigate this further.

For further information regarding the information collected, see the Amboseli Baboon Research Project Monitoring Guide. For further information regarding which ad-libitum data wind up in ALLMISCS, see the Protocol for Data Management: Amboseli Baboon Project. For further information on the structure of the ad-libitum text stored in ALLMISCS, see the documentation on the Psion palmtop data collection program(s).

The combination of Sid and Atime must be unique.

Almid (Allmiscs IDentifier)

A unique integer which identifies the ALLMISCS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sid (Sample Identifier)

The focal point data set in which the data were collected. (See SAMPLES.Sid.)

Atime (time)

The time the ad-libitum data were taken. This column stores the time using a data type having a precision of one second but the precision and accuracy of the data values are dependent upon the palmtop's timekeeping, the operator, and the protocol and is surely not one second. Consult the Amboseli Baboon Research Project Monitoring Guide.

The time may not be before 05:00 and may not be after 19:00.

Txt (unstructured Text)

The unstructured ad-libitum information collected.

At present the text in this column actually does have some structure[76] but appears in ALLMISCS because Babase contains no other place suitable for the storage of the data. The text begins with a one letter code followed by a comma. The allowed one letter codes and their meaning are:

C

Consortship. This is redundant information. Because consortships happen over time these consortships should always also be independently recorded and therefore independently entered into INTERACT_DATA and PARTS.

U

Unknown. This was once reserved for meta-information -- the field data collection team's comments on the process of data collection -- but its meaning has since become confused with the O code.

O

Other. Other information about the baboons or their environment. Its meaning has become confused with the U code.

For further information see the Amboseli Baboon Research Project Monitoring Guide.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
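A reader of ALLMISCS.Txt might split off the leading code as sketched below. The sketch assumes the code set is exactly C, U, and O, as listed above "at present". It is a hypothetical helper, not part of Babase.

```python
from typing import Tuple

TXT_CODES = {"C": "Consortship", "U": "Unknown", "O": "Other"}


def parse_txt(txt: str) -> Tuple[str, str]:
    """Split an ALLMISCS.Txt value into its one-letter code and the remaining text."""
    if not txt or not txt.strip():
        raise ValueError("Txt must contain at least one non-whitespace character")
    code, sep, rest = txt.partition(",")
    if sep != "," or code not in TXT_CODES:
        raise ValueError(f"Txt does not begin with a known code and a comma: {txt!r}")
    return code, rest
```

Because the U and O codes have become confused, any analysis should probably treat them together.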

CONSORTS (multiparty disputes over CONSORTshipS)

One row for every MPIS row (multiparty interaction) involving a consortship. This table extends the MPIS table to include information about consortships.[77]

Mpiid (Multiparty Interaction IDentifier)

A unique integer which identifies the MPIS row -- the multiparty interaction.

Because the CONSORTS table extends the MPIS table, the two tables have a one-to-one relationship and this value also uniquely identifies the CONSORTS row.

The value of this column may not be changed.

Female

The disputed female. A BIOGRAPH.Sname of a female.

This column may be NULL when the consorted female is unrecorded.

Had

The male who consorted with the female prior to the multiparty interaction. A BIOGRAPH.Sname of a male.

This column may not be NULL.

Got

The male who consorted with the female after the multiparty interaction. A BIOGRAPH.Sname of a male.

This column may not be NULL.

FPOINTS (Point data on Females)

Exactly one row for every point collected using the adult female sampling protocol. SAMPLES rows with a Stype of F have related FPOINTS rows. The FPOINTS rows have the same id, a Pntid, as POINT_DATA. The FPOINTS rows hold the data unique to the adult female protocol sampling method; there is exactly one FPOINTS row for every POINT_DATA row in the sample, that is, exactly one row for every point taken under the protocol. The system reports an error for those POINT_DATA rows collected using the adult female sampling protocol (having a SAMPLES.Stype of F) that do not have a related FPOINTS row.

Although every FPOINTS row should have a related row on POINT_DATA, not every POINT_DATA row has a related FPOINTS row. If a point is not an adult female point then the point simply has no row on FPOINTS.
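The missing-row check can be sketched as a set computation. This assumes each POINT_DATA Pntid has been mapped to the Sid of its sample, and that sample Stypes are known. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, Set


def points_missing_fpoints(
    point_sid: Dict[int, int],     # POINT_DATA.Pntid -> Sid of the point's sample (assumed)
    sample_stype: Dict[int, str],  # SAMPLES.Sid -> SAMPLES.Stype
    fpoints: Set[int],             # Pntids present in FPOINTS
) -> Set[int]:
    """Pntids of adult female (Stype F) points that lack an FPOINTS row."""
    return {
        pntid for pntid, sid in point_sid.items()
        if sample_stype[sid] == "F" and pntid not in fpoints
    }
```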

Tip

Because every FPOINTS row must have a related POINT_DATA row, when entering a point the POINT_DATA row must be entered before the FPOINTS row.

Pntid (Point Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Pntid links FPOINTS with POINT_DATA in a one-to-one manner.

This column may not be NULL.

Kidcontact (female/infant position)

The position of the infant with respect to the focal female. The legal values for this column are defined by the KIDCONTACTS support table.

This column may not be NULL.

Kidsuckle (Suckling activity)

The suckling activity of the infant. The legal values for this column are defined by the SUCKLES support table.

This column may not be NULL.

INTERACT_DATA (Interactions)

This table contains a row for every recorded interaction between animals, including all-occurrence data taken during focal point samples but excluding multiparty interactions (MPI_DATA). Each row records when the interaction occurred. Further information on the interaction is stored elsewhere, notably PARTS. Each interaction in INTERACT_DATA is represented as though it occurs between two ordered individuals designated actor and actee -- thus resulting in two rows in the PARTS table.

Warning

The INTERACT view should always be used in place of this table. (See Views for the rationale.) INTERACT is an extension of this table which may be useful. It is identical to INTERACT_DATA but is extended with alternate representations of dates and times.

Tip

The ACTOR_ACTEES view provides a way to view interactions as single rows.[78]

Caution

The date of the interaction must not be before the participants' Entrydate or after their Statdate (with some exceptions; see PARTS for details). Therefore the demographic data for any particular time interval must be entered into Babase before that time period's social interaction data.

A female may not participate in a mount, consortship, or ejaculation interaction before menarche (MATUREDATES.Matured). A male may not participate in a mount, consortship, or ejaculation interaction before 4 years of age.[79]

Many rules surrounding INTERACT_DATA's values are closely tied to the project's data collection protocols. There are two sorts of data collected on behavioral interactions: all-occurrences data and ad-libitum data. All-occurrences data are collected only during focal animal samples. They are data on all the occurrences of a particular behavior or interaction during a given time interval and/or involving a participating focal individual.[80] All-occurrences data will always have an INTERACT_DATA.Sid that is not NULL. Ad-libitum data are collected opportunistically at the will of the observer; we do not assume that ad-libitum data capture all the occurrences of a given behavior. Ad-libitum data, which generally are not collected as part of focal animal samples, usually have a NULL Sid value (only those collected during a focal animal sample have a non-NULL Sid). Some sorts of interactions are collected only during focal sampling and never as ad-libitum data outside of focal samples. These are approaches (ACTS.Class = P) and requests to groom (ACTS.Class = R); they are collected only during all-occurrences sampling and must have a non-NULL Sid. Although consortship and mount[81] data are collected as all-occurrences data during focal point samples, these data are also collected, simultaneously and in more detail, in ad-libitum notes. Consequently, they appear in Babase as ad-libitum data in INTERACT_DATA, not as all-occurrences data, and consortship (ACTS.Class = C), mount (ACTS.Class = M), and ejaculation (ACTS.Class = E) rows always have a NULL Sid.
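The Sid rules above depend only on the ACTS.Class of the interaction's Act. Here they are as a hypothetical check. Agonism (A) and grooming (G) may legitimately have either a NULL or a non-NULL Sid.

```python
from typing import Optional

SID_REQUIRED = {"P", "R"}        # approach, request to groom: all-occurrences only
SID_FORBIDDEN = {"C", "M", "E"}  # consortship, mount, ejaculation: always ad libitum


def sid_ok(act_class: str, sid: Optional[int]) -> bool:
    """Check INTERACT_DATA.Sid against the ACTS.Class of the interaction's Act."""
    if act_class in SID_REQUIRED:
        return sid is not None
    if act_class in SID_FORBIDDEN:
        return sid is None
    return True  # e.g. agonism (A) and grooming (G) may be either
```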

Tip

An individual's all-occurrences interactions can be distinguished from ad-libitum data by using the Sid column to reference SAMPLES to see if the individual is the focal of an all-occurrences sample. An example is presented in Appendix B.

Note

INTERACT_DATA rows that have a related SAMPLES row, that is, a non-NULL Sid[82], will automatically have an Observer value equal to the value in the related SAMPLES.Observer column -- the system automatically synchronizes observer values between related INTERACT_DATA and SAMPLES rows. Such automatically assigned values cannot be changed. To change the observer, the SAMPLES.Observer column must be changed.

Caution

Care must be taken when breaking a relationship between INTERACT_DATA and SAMPLES, that is, when setting INTERACT_DATA.Sid to NULL. The automatically assigned INTERACT_DATA.Observer value may no longer be correct and so may require manual adjustment.

An INTERACT_DATA row with a NULL Sid and a non-NULL Observer cannot be updated with a non-NULL Sid unless the Observer value is also set to NULL -- manually assigning an observer to an ad-lib interaction precludes relating the interaction to a focal point sampling period. Setting Observer to NULL when changing Sid to a non-NULL value causes the system to automatically assign the correct value to Observer -- causes the system to automatically synchronize observers.[83] Likewise, an INTERACT_DATA row with a non-NULL Sid cannot be inserted unless the Observer value is either NULL or matches that of the related SAMPLES.Observer value -- new focal sample interactions must be consistent with respect to the observers recorded in the INTERACT_DATA and SAMPLES tables. When an INTERACT_DATA row with a non-NULL Sid and a NULL Observer value is inserted then the Observer value is automatically updated with the related SAMPLES.Observer value -- again, the observer associated with the interaction is automatically brought into sync with the focal sample.
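The observer synchronization rules can be modeled as two hypothetical functions. They return the Observer value that would be stored, or raise an error where Babase would reject the change. The update case makes one simplifying assumption: the Observer value after the update must be NULL whenever a Sid is being assigned.

```python
from typing import Dict, Optional


def observer_on_insert(
    sid: Optional[int],
    observer: Optional[str],
    sample_observers: Dict[int, str],  # SAMPLES.Sid -> SAMPLES.Observer
) -> Optional[str]:
    """Observer value stored for a newly inserted INTERACT_DATA row."""
    if sid is None:
        return observer  # ad-libitum: the supplied value is kept
    sample_observer = sample_observers[sid]
    if observer is None:
        return sample_observer  # synchronized automatically
    if observer != sample_observer:
        raise ValueError("Observer conflicts with SAMPLES.Observer")
    return observer


def observer_on_new_sid(
    new_sid: int,
    new_observer: Optional[str],
    sample_observers: Dict[int, str],
) -> str:
    """Observer value stored when an ad-libitum row is given a non-NULL Sid."""
    if new_observer is not None:
        raise ValueError("Observer must be set to NULL when assigning a Sid")
    return sample_observers[new_sid]
```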

INTERACT_DATA encodes interaction time and duration by storing the start and stop times of the interaction. The columns Start and Stop are used for this purpose. Consortships may have a NULL in either the Start or the Stop time when the respective value is unknown, otherwise the Start time must precede the Stop time. Ad-libitum sample agonism and grooming interactions (ACTS.Class values of A and G respectively) must have a NULL in both the Start and Stop columns. All-occurrences agonism, grooming, approach (ACTS.Class = P), and request to groom (ACTS.Class = R) interactions must have non-NULL Start times that equal Stop times. Start always equals Stop for mounts (ACTS.Class = M) and ejaculations (ACTS.Class = E).

The columns of this table that contain times, Start and Stop, are stored using a data type that has a precision of 1 second. The Amboseli Baboon Research Project Monitoring Guide must be consulted regarding the precision and accuracy of these data. Ad-libitum data are expected to be entered with 1 minute precision.[84] Consequently the seconds portion of the time values must always be 0 when Sid is NULL. All-occurrences interaction data (Sid is not NULL) do contain seconds.[85]
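The Start/Stop rules from the preceding two paragraphs can be collected into one hypothetical check. The 05:00 lower and upper time bounds are omitted. For mounts and ejaculations, two NULLs are treated as "equal"; the text does not say how that case is handled, so this is an assumption. Agonism or grooming with a non-NULL Sid is assumed to be all-occurrences data.

```python
from datetime import time
from typing import List, Optional


def time_errors(act_class: str, sid: Optional[int],
                start: Optional[time], stop: Optional[time]) -> List[str]:
    """Check INTERACT_DATA Start/Stop against the interaction's ACTS.Class and Sid."""
    errors = []
    all_occurrences = sid is not None
    if act_class == "C":
        if start is not None and stop is not None and not start < stop:
            errors.append("consortship Start must precede Stop")
    elif act_class in ("M", "E"):
        if start != stop:
            errors.append("mount/ejaculation Start must equal Stop")
    elif act_class in ("A", "G") and not all_occurrences:
        if start is not None or stop is not None:
            errors.append("ad-libitum agonism/grooming must have NULL Start and Stop")
    elif act_class in ("A", "G", "P", "R") and all_occurrences:
        if start is None or start != stop:
            errors.append("all-occurrences Start must be non-NULL and equal Stop")
    # Ad-libitum times are entered to the minute, so seconds must be zero.
    if not all_occurrences and any(t is not None and t.second != 0 for t in (start, stop)):
        errors.append("ad-libitum times must have 0 seconds")
    return errors
```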

An interaction's Handwritten must be FALSE when Sid is not NULL.

Note

It is physically possible for interactions associated with a focal sample to be handwritten, but in practice this never occurs. Interactions associated with focal samples are always recorded electronically.

The system will report a warning for interactions which occur between individuals who are not in the same group on the date of the interaction.

Iid

A positive integer that uniquely identifies the interaction. This number is assigned by the system. This column must not be NULL.

Sid (all-occurrences Sample IDentifier)

The origin of the data. When the interaction data were collected during all-occurrences sampling this column holds a SAMPLES.Sid identifying the all-occurrences sample during which the data were collected, otherwise this column is NULL.

Act (kind of interaction)

A code indicating the kind of interaction. The ACTS support table defines the legal values for this column.

Note

Although Act contains ACTS.Act values, it is often the broader ACTS.Class classification that is of interest.

This column may not be NULL.

Date

The date on which the interaction took place. This column may not be NULL.

Caution

For grooming data prior to 2006-07-01 only the month and year of the interaction are valid.[86] For these data, all of the dates must fall on the first day of the month, unless the data come from point sampling (the Sid column is not NULL). The data collected during point sampling have accurate dates.
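The caution above amounts to a first-of-the-month constraint on early grooming data. Below is a hypothetical check. It assumes grooming means ACTS.Class G, and that rows collected during point sampling are exactly those with a non-NULL Sid.

```python
from datetime import date
from typing import Optional

GROOMING_CUTOFF = date(2006, 7, 1)


def grooming_date_ok(act_class: str, sid: Optional[int], when: date) -> bool:
    """Pre-2006-07-01 grooming dates without a Sid must be the first of a month."""
    if act_class != "G" or sid is not None or when >= GROOMING_CUTOFF:
        return True
    return when.day == 1
```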

Start (interaction Starting time)

The time the interaction began or, in the case of all-occurrences data, the time the interaction was recorded in the field.

The data type of this column has a 1 second precision. The precision and accuracy of the data itself is dependent upon the protocol and the operator and is almost surely not 1 second. Consult the Amboseli Baboon Research Project Monitoring Guide.

The time may not be before 05:00 and may not be after bb_day_end.

This column may be NULL.

Stop (interaction ending time)

The time the interaction stopped or, in the case of all-occurrences data, the time the interaction was recorded in the field.

The data type of this column has a 1 second precision. The precision and accuracy of the data itself is dependent upon the protocol and the operator and is almost surely not 1 second. Consult the Amboseli Baboon Research Project Monitoring Guide.

The time may not be before 05:00 and may not be after 20:00.

This column may be NULL.

Observer

Initials of the person who collected the sample. The legal values of this column are defined by the OBSERVERS support table.

This column may be NULL.

Handwritten

A boolean indicating whether or not the observer recorded the interaction by hand[87]. This value is TRUE if yes, FALSE if no.

This column may not be NULL.

MPIS (Multiparty InteractionS)

One row for each collection of multiparty interactions.

Multiparty interactions are recorded as an ordered series of dyadic interactions. Each complete series has a single MPIS row in the database.

Note

This is a separate data set from the dyadic interactions recorded in INTERACT_DATA and related tables. Interactions appearing there do not appear in the multiparty interaction data, and vice versa.

The date of the multiparty interaction must be between the Entrydate and Statdate, inclusive, of all the participants.

The two participants in each dyadic interaction must be different individuals -- the two MPI_PARTS.Snames must be different.

The Context column must be NULL when the Context_type value is N, no context.

The Context_type column must be C (Consortship) and the Context column must be NULL when a related CONSORTS row exists. The system will generate a warning when the Context_type column is C and there is no related CONSORTS row.
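The Context and Context_type rules for MPIS can be sketched as a hypothetical check that separates errors from the single warning case. `has_consort` is an assumed flag saying whether a related CONSORTS row exists.

```python
from typing import List, Optional


def mpis_context_issues(context_type: str, context: Optional[str],
                        has_consort: bool) -> List[str]:
    """List the errors and warnings an MPIS row's context columns would raise."""
    issues = []
    if context_type == "N" and context is not None:
        issues.append("error: Context must be NULL when Context_type is N")
    if has_consort and (context_type != "C" or context is not None):
        issues.append("error: a CONSORTS row requires Context_type C and a NULL Context")
    if context_type == "C" and not has_consort:
        issues.append("warning: Context_type C without a CONSORTS row")
    return issues
```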

Mpiid (Multiparty Interaction IDentifier)

A unique integer which identifies the MPIS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Date

The date the interactions occurred.

Context_type

Multiparty interactions may be categorized by the context in which they occur. This column identifies the context of the multiparty interaction.

The legal values of this column are defined by the CONTEXT_TYPES support table. This column may not be NULL.

Context (Unstructured text)

Unstructured text describing the context in which the multiparty interaction occurred.

This column may be NULL. When it is not NULL it may not be empty; it must contain at least one non-whitespace character.

MPI_DATA (Multiparty dyadic Interactions)

Multiparty interactions are recorded as collections of individual dyadic interactions. This table contains one row for every dyadic interaction of a multiparty interaction collection. Each interaction is represented as though it occurs between two ordered individuals designated actor and actee -- these individuals are recorded in the MPI_PARTS table. The dyadic interactions within the collection are time-wise sequenced. Two rows may have the same sequence number (Seq), indicating that the two interactions occurred simultaneously.

Tip

The MPI_EVENTS view provides a convenient way to view multiparty interactions as single rows.

Caution

Babase records little about causality among the various interactions collected together under the multiparty interaction collection umbrella. At the time of this writing the data protocols require that the initial interaction be a kind of agonism or a kind of help request, so the initial interaction can be considered the cause of the remaining interactions. However, nothing other than time-wise sequencing links particular requests for help with the aid supplied. As a result it is impossible, in the general case, to associate help supplied with help requested. For example, an individual may request help twice, from two different individuals, and then receive help from a third individual. The columns recording the results of help requests (Helped and Active) must therefore be used with caution, as must any attempt to correlate the specifics of help given with help requested.

Dyadic interactions which occur simultaneously (MPI_DATA rows sharing both Mpiid and Seq) must have the same MPIAct values.

The system will generate a warning when more than two MPI_DATA rows, sharing a Mpiid, have the same Seq value -- when there are more than two dyadic interactions occurring simultaneously.

The first interaction of a multiparty interaction collection (the rows with a Seq of 1) must be an agonism or a request for help; the MPIAct value must be that of an MPIACTS row having a Kind value of A or R.

The first interaction of a multiparty interaction collection is expected to be a single dyadic interaction unless the MPIACTS table allows otherwise. More than one dyadic interaction may have a Seq of 1, that is, the initial interaction may occur simultaneously with another interaction, only when every one of these initial interactions has an MPIAct value whose MPIACTS row has a TRUE Multi_first value.
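The sequencing rules for one multiparty interaction collection (simultaneous rows share an MPIAct, more than two simultaneous rows draw a warning, and the Seq 1 rows must be of Kind A or R with TRUE Multi_first when there is more than one) can be sketched as follows. This is a minimal Python illustration under stated assumptions: a dict stands in for the MPIACTS table, and the MPIAct codes used in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_mpi_sequence(rows: List[Tuple[int, str]],
                       mpiacts: Dict[str, Tuple[str, bool]]) -> List[str]:
    """rows: (Seq, MPIAct) for the MPI_DATA rows sharing one Mpiid.

    mpiacts: MPIAct code -> (Kind, Multi_first), standing in for MPIACTS.
    """
    problems = []
    by_seq = defaultdict(list)
    for seq, act in rows:
        by_seq[seq].append(act)
    for seq in sorted(by_seq):
        acts = by_seq[seq]
        # Simultaneous interactions must share the same MPIAct.
        if len(set(acts)) > 1:
            problems.append(f"error: simultaneous rows at Seq {seq} differ in MPIAct")
        # More than two simultaneous dyadic interactions is only a warning.
        if len(acts) > 2:
            problems.append(f"warning: more than two rows share Seq {seq}")
    first = by_seq.get(1, [])
    for act in first:
        if mpiacts[act][0] not in ("A", "R"):
            problems.append(f"error: first interaction {act} is not of Kind A or R")
    # Several simultaneous first interactions need TRUE Multi_first on all of them.
    if len(first) > 1 and not all(mpiacts[act][1] for act in first):
        problems.append("error: simultaneous first interactions need TRUE Multi_first")
    return problems
```

For example, with hypothetical codes {"AGG": ("A", False), "REQ": ("R", True), "AH": ("H", False)}, the rows [(1, "AGG"), (2, "REQ"), (3, "AH")] produce no problems.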

The Helped and Active columns are meaningful only when the MPI_DATA row records a request for help.[88] These columns must be NULL when the MPI_DATA row does not record a request for help; otherwise they must not be NULL. The system will generate a warning when the Helped column indicates that no help was given but there are subsequent interactions recording help given to the individual who requested help (interactions whose MPIAct values have an MPIACTS.Kind value of H). The system will generate a warning when Active is TRUE and there are no subsequent AH interactions, in the same multiparty interaction collection, in which the individual who requested help is the recipient of help. The system will generate a warning when Helped is TRUE, Active is FALSE, and there are no subsequent PH interactions, in the same multiparty interaction collection, in which the individual who requested help is the recipient of help.
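The NULL rules and the three warnings for Helped and Active can be sketched as a single check on one MPI_DATA row. This is a minimal Python illustration, not Babase's implementation. It assumes the caller supplies the later interactions of the same collection as (MPIAct, MPIACTS.Kind, recipient Sname) tuples, with the recipient of help taken to be the actee; the function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def check_help_columns(is_request: bool, helped: Optional[bool],
                       active: Optional[bool], requester: Optional[str] = None,
                       later: Optional[List[Tuple[str, str, str]]] = None) -> List[str]:
    """Check Helped/Active on one MPI_DATA row.

    later: (MPIAct, MPIACTS.Kind, recipient Sname) for subsequent interactions
    in the same collection (hypothetical input shape).
    """
    if not is_request:
        # Rows that are not help requests must leave both columns NULL.
        if helped is None and active is None:
            return []
        return ["error: Helped and Active must be NULL unless help was requested"]
    if helped is None or active is None:
        return ["error: Helped and Active must not be NULL on a help request"]
    # MPIAct codes of later Kind H interactions whose recipient is the requester.
    helps = [act for act, kind, who in (later or []) if kind == "H" and who == requester]
    problems = []
    if not helped and helps:
        problems.append("warning: Helped is FALSE but later help was given")
    if active and "AH" not in helps:
        problems.append("warning: Active is TRUE but no later AH interaction")
    if helped and not active and "PH" not in helps:
        problems.append("warning: passive help recorded but no later PH interaction")
    return problems
```

As the caution above explains, these warnings cannot tell which request a given act of help answered; they only check that some matching help follows.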

Mpidid (Multiparty Interaction Data IDentifier)

A unique integer which identifies the MPI_DATA row, and thereby the interaction the row records.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Mpiid (Multiparty Interaction IDentifier)

A number identifying the multiparty interaction collection (MPIS) of which the MPI_DATA interaction is a member.

This column cannot be changed and must not be NULL.

MPIAct (Multiparty Interaction Act code)

This column records the kind of interaction which took place. The legal values for this column are defined by the MPIACTS support table.

This column may not be NULL.

Seq (Sequence)

The first interaction of each multiparty interaction collection has a Seq value of 1, the second a value of 2, etc. The system will report an error if the Seq values do not begin with 1 or are not contiguous.

Note

The Seq values need not be unique within an Mpiid. Duplicate sequence numbers are used to indicate simultaneous interactions, as would happen if, e.g., 2 individuals aggressed against 1.

This column may not be NULL.
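The Seq rule (values begin with 1 and are contiguous, while duplicates are allowed for simultaneous interactions) reduces to comparing the distinct values against 1..n. A minimal Python sketch; the function name is hypothetical.

```python
from typing import Iterable


def seq_is_valid(seqs: Iterable[int]) -> bool:
    """True when the distinct Seq values of one Mpiid are exactly 1..n.

    Duplicates (simultaneous interactions) are allowed.
    """
    distinct = sorted(set(seqs))
    return distinct == list(range(1, len(distinct) + 1))
```

For example, [1, 2, 2, 3] is valid, while [1, 3] (a gap) and [2, 3] (does not begin with 1) are not.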

Helped

This column indicates whether help was given, by the individual from whom help was requested, in response to a request for help. Helped must be FALSE when help was requested from an unknown individual.[89] This column contains meaningful information only for those MPI_DATA rows which record requests for help. (See above.)

This column is TRUE when help was given and FALSE when no help was forthcoming.

This column may be NULL.

Active

This column indicates whether help given was active or passive. It contains meaningful information only for those MPI_DATA rows which record requests for help. (See above.)

This column is TRUE when the help supplied was active and FALSE when either the help supplied was passive or when no help was supplied. This column is NULL when the MPIAct value represents an action other than a request for help.

Caution

When looking for help requests that received passive help always check the Helped value to be sure that help was actually received.

This column may be NULL.

MPI_PARTS (Multiparty Interaction PARTicipantS)

This table contains records of participants in the interactions which make up a multiparty interaction collection (MPIS). Each interaction is represented as though it occurs between two individuals designated actor and actee. Interactions between multiple individuals are broken down into interactions between pairs according to rules described in the protocols. Therefore, this table should contain two rows for every record of an interaction (for every row in MPI_DATA), one row to record the actor, and one to record the actee. Rules for classifying individuals as actor or actee are documented below in the description of the Role column.

Tip

The MPI_EVENTS view provides a convenient way to view multiparty interactions as single rows.

Warning

Every MPI_DATA row should be related to exactly two MPI_PARTS rows, otherwise it is an error. However, the system allows this condition to exist. It is presumed that such an error condition will exist for only as long as it takes to enter a complete set of data. The system will report those cases where there are not exactly two MPI_PARTS rows for every MPI_DATA row.
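The report of MPI_DATA rows lacking exactly two MPI_PARTS rows amounts to counting participants per Mpidid. A minimal Python sketch, assuming the two tables are supplied as lists of Mpidid values; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def incomplete_mpi_data(mpidids: Iterable[int], parts_mpidids: Iterable[int]) -> List[int]:
    """Mpidid values of MPI_DATA rows not related to exactly two MPI_PARTS rows.

    parts_mpidids holds the Mpidid column of MPI_PARTS, one entry per row.
    """
    counts = Counter(parts_mpidids)
    # Counter returns 0 for Mpidids with no MPI_PARTS rows at all.
    return sorted(d for d in mpidids if counts[d] != 2)
```

With MPI_DATA rows 1, 2, 3 and MPI_PARTS Mpidids [1, 1, 2, 3, 3, 3], rows 2 and 3 are reported.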

Caution

The data integrity rules require that the MPI_DATA row be entered before the 2 MPI_PARTS rows.

Either the Sname or the Unksname column must be NULL, but not both.

The actor and the actee of an interaction, when specified as Snames, must not be the same individual.
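The two participant rules above (exactly one of Sname and Unksname, and distinct actor and actee when both are Snames) can be sketched as one check. This is a minimal Python illustration; each participant is represented as a hypothetical (Sname, Unksname) tuple, and the Unksname code "U" in the usage example is made up.

```python
from typing import List, Optional, Tuple

Participant = Tuple[Optional[str], Optional[str]]  # (Sname, Unksname)


def check_mpi_parts(actor: Participant, actee: Participant) -> List[str]:
    problems = []
    for role, (sname, unk) in (("actor", actor), ("actee", actee)):
        # Exactly one of the two columns must be NULL.
        if (sname is None) == (unk is None):
            problems.append(f"error: {role} needs exactly one of Sname and Unksname")
    # Only compare identities when the actor is a known individual.
    if actor[0] is not None and actor[0] == actee[0]:
        problems.append("error: actor and actee are the same individual")
    return problems
```

For instance, check_mpi_parts(("ABC", None), (None, "U")) returns no problems, while check_mpi_parts(("ABC", None), ("ABC", None)) reports a self-interaction.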

Mpipid (Multiparty Interaction Participant IDentifier)

A unique integer which identifies the MPI_PARTS row, and thereby the participant in the interaction the row records.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Mpidid (Multiparty Interaction Data IDentifier)

Multiparty interaction data identifier. This column holds the Mpidid value of the MPI_DATA row containing further information on the dyadic interaction in which the animal is a participant. It can be used to retrieve the other information recorded on the interaction. There must be a row in MPI_DATA with an Mpidid of this value. This column cannot be changed and may not be NULL.

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information, such as the maternal group of the animal, from BIOGRAPH or other places where the animal's three-letter code appears.

This column must not be NULL when the participating individual is precisely identified and NULL otherwise.

Unksname (Unknown participant code)

The nature of the problem when one of the participants in the interaction cannot be precisely identified. The legal values of this column are defined by the PARTUNKS support table.

This column must be NULL when the participating individual is precisely identified and not NULL otherwise.

Role

This column designates whether the row records the actor or the actee of the interaction. The two possible values are:

The MPI_PARTS.Role Values

Code | Mnemonic | Definition
R | Actor | The actor is usually the one performing the act. For the agonism data, the individual that is the winner (does not perform a submissive behavior) is the actor. For help requests, the individual that is requesting the help is the actor. For help supplied, the individual supplying the help is the actor. For grooming data, the individual that is grooming is the actor. And so forth.
E | Actee | The actee is usually the one that is the recipient of another animal's attentions. For the agonism data, the individual that is the loser (performing a submissive behavior) is the actee. For help requests, the individual of whom help is requested is the actee. For help supplied, the individual to whom the help is supplied is the actee. For grooming data, the individual that is groomed is the actee. And so forth.

This column may not be NULL.

PARTS (Participants in interactions)

This table contains records of the participants in observed interactions between animals. Each row in the table records a participant. Each interaction is represented as though it occurs between two individuals designated actor and actee. Interactions between multiple individuals are broken down into interactions between pairs according to rules described in the protocols. Therefore, this table should contain two rows for every record of an interaction (for every row in INTERACT_DATA), one row to record the actor, and one to record the actee. Rules for classifying individuals as actor or actee are documented below in the description of the Role column.

Caution

Every INTERACT_DATA row must be related to exactly 2 PARTS rows, excepting those INTERACT_DATA rows that are associated with ad-lib focal point sampling -- those that have non-NULL Sid values. Ad-lib interactions collected during focal point sampling are allowed to have only one participant, but only when that participant is the focal individual. So that data can be entered the system allows these error conditions to exist while a transaction is in progress. These conditions are validated on transaction commit.
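The commit-time participant count rule for INTERACT_DATA, including the ad-lib focal sampling exception, can be sketched as a predicate over one interaction's PARTS rows. This is a minimal Python illustration, not Babase's commit-time validation; the parameter names are hypothetical.

```python
from typing import List, Optional


def parts_count_ok(part_snames: List[str], sid: Optional[int],
                   focal: Optional[str] = None) -> bool:
    """part_snames: Snames on the PARTS rows for one INTERACT_DATA row.

    sid: the INTERACT_DATA.Sid value (None for non-sample data);
    focal: the sample's focal individual (SAMPLES.Sname).
    """
    if len(part_snames) == 2:
        return True
    # Ad-lib interactions recorded during focal sampling may have a single
    # participant, but only when that participant is the focal individual.
    return sid is not None and len(part_snames) == 1 and part_snames[0] == focal
```

Because the rule is validated on transaction commit, a check like this would conceptually run only once all the PARTS rows of a transaction have been entered.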

Caution

The data integrity rules require that the INTERACT_DATA row be entered before the 2 PARTS rows.

Tip

The utility of the PARTS table, as opposed to having a single row per interaction as the ACTOR_ACTEES view does, lies in writing database queries that search for interaction participants. It is easy to use PARTS to search for a participant without knowing whether the participant is the actor or the actee. The same is not true of the ACTOR_ACTEES view.

Note

It is easy to produce the ACTOR_ACTEES view from INTERACT_DATA and PARTS, but the reverse would not be true. This is why the underlying database representation is as it is and not the reverse.
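The asymmetry described above can be illustrated with an in-memory analogue: pivoting PARTS rows into one actor/actee pair per interaction is a simple grouping, and searching PARTS for a participant needs no knowledge of the role. This is a minimal Python sketch, not the definition of the ACTOR_ACTEES view; the tuple layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# PARTS rows as (Partid, Iid, Sname, Role); Role "R" = actor, "E" = actee.
PartsRow = Tuple[int, int, str, str]


def actor_actee_pairs(parts: List[PartsRow]) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """One (actor, actee) pair per Iid, roughly what ACTOR_ACTEES presents."""
    pairs: Dict[int, Dict[str, str]] = defaultdict(dict)
    for _partid, iid, sname, role in parts:
        pairs[iid]["actor" if role == "R" else "actee"] = sname
    return {iid: (p.get("actor"), p.get("actee")) for iid, p in pairs.items()}


def interactions_involving(parts: List[PartsRow], sname: str) -> List[int]:
    """Iids in which sname participates, whatever its role."""
    return sorted({iid for _partid, iid, s, _role in parts if s == sname})
```

With PARTS rows for interaction 10 (AAA actor, BBB actee) and interaction 11 (BBB actor, CCC actee), searching for BBB finds both interactions in a single pass, without testing the actor and actee columns separately.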

The date of the interaction must be between the Entrydate and Statdate, inclusive, of the participants. The exceptions are grooming interactions dated before July 1, 2006, which were always given a first-of-the-month date. Consequently, the date of these interactions may be before the Entrydate, but the month and year of the interaction date may not be before the month and year of the participant's birth date or Entrydate.
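The date rule and its grooming exception can be sketched as a per-participant predicate. This is a minimal Python illustration under stated assumptions: is_grooming is a hypothetical flag (the real classification comes from the interaction's act), and the Statdate bound is assumed to apply to grooming interactions as well.

```python
from datetime import date

GROOMING_CUTOFF = date(2006, 7, 1)


def interaction_date_ok(idate: date, is_grooming: bool, entrydate: date,
                        statdate: date, birth: date) -> bool:
    """Check one participant's Entrydate/Statdate against the interaction date."""
    if idate > statdate:
        return False
    if is_grooming and idate < GROOMING_CUTOFF:
        # Early grooming data carry first-of-the-month dates, so compare only
        # (year, month) against the birth date and the Entrydate.
        ym = (idate.year, idate.month)
        return ym >= (birth.year, birth.month) and ym >= (entrydate.year, entrydate.month)
    return idate >= entrydate
```

For example, a grooming interaction dated 2005-03-01 is accepted for an individual whose Entrydate is 2005-03-15, while a non-grooming interaction on the same date is not.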

The actor and the actee of an interaction must not be the same individual.

Partid (Parts IDentifier)

A unique integer which identifies the PARTS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname

A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information, such as the maternal group of the animal, from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Role

This column designates whether the row records the actor or the actee of the interaction. The two possible values are:

The PARTS.Role Values

Code | Mnemonic | Definition
R | Actor | The actor is usually the one performing the act. For grooming data, the individual that is grooming is the actor. For the agonism data, the individual that is the winner (does not perform a submissive behavior) is the actor. For mounts, consortships, and ejaculations, the male is the actor.
E | Actee | The actee is usually the one that is the recipient of another animal's attentions. For grooming data, the individual that is groomed is the actee. For the agonism data, the individual that is the loser (performing a submissive behavior) is the actee. For mounts, consortships, and ejaculations, the female is recorded as actee.

This column may not be NULL.

Iid (Interaction identifier)

Interaction identifier. This column holds the Iid value of the row on the INTERACT_DATA table containing further information on the interaction in which the animal is a participant. It can be used to retrieve the other information recorded on the interaction. There must be a row in INTERACT_DATA with an Iid of this value. This column may not be NULL.

POINT_DATA (Point observation data)

One row for every point observation collected on a focal individual during a sampling interval. When, for whatever reason, no point data are collected on the focal individual at the turn of the minute, there is no row on POINT_DATA. The positions of the points within the sample (the Min values) may therefore contain gaps -- missing numbers. The missing numbers are points taken while the focal animal was out of sight or points that were missed for some other reason. Babase represents the observational period during which a sample is collected as a SAMPLES row.

Warning

Always use the POINTS view in place of this table (see Views for the rationale.) It contains additional computed columns which may be of interest and is guaranteed to remain consistent in future Babase releases.

A POINT_DATA row must contain a Foodcode when the Activity column indicates the focal is feeding, otherwise Foodcode must be NULL.

Consistency is enforced with respect to time taken to collect the sample and the number of point observations. The Min value must not be larger than the Mins of the corresponding sample.
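The POINT_DATA rules above (Foodcode present exactly when the focal is feeding, Min no larger than the sample's Mins) and the resulting gaps in Min can be sketched in Python. This is a minimal illustration under stated assumptions: the set of feeding Activity codes is hypothetical (the real codes are defined in ACTIVITIES), and points are assumed to be numbered 1 through Mins.

```python
from typing import Iterable, List, Optional

FEEDING_ACTIVITIES = {"F"}  # hypothetical: the real feeding codes live in ACTIVITIES


def check_point(min_: int, activity: str, foodcode: Optional[str],
                sample_mins: int) -> List[str]:
    problems = []
    feeding = activity in FEEDING_ACTIVITIES
    if feeding and foodcode is None:
        problems.append("error: feeding point needs a Foodcode")
    if not feeding and foodcode is not None:
        problems.append("error: Foodcode must be NULL when not feeding")
    if min_ > sample_mins:
        problems.append("error: Min exceeds the sample's Mins")
    return problems


def missing_points(mins: Iterable[int], sample_mins: int) -> List[int]:
    """Point numbers with no POINT_DATA row (out of sight or missed)."""
    return sorted(set(range(1, sample_mins + 1)) - set(mins))
```

For a 6 minute sample with points at minutes 1, 2, 4, and 6, missing_points reports minutes 3 and 5.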

Pntid (Point observation Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to particular points.

Sid (Sample Identifier)

The sample (SAMPLES row) of which the point is a member.

Min (Point number within the sample)

The ordinal number of the point within the sample. The first point in the sample has a Min value of 1, the second a Min value of 2, etc. Note that these numbers need not be contiguous, since some points are lost during data collection. (See above.) This column cannot be NULL.

Ptime (Point observation Time)

The time the point was recorded. This column stores the time using a data type having a precision of one second. The precision and accuracy of the data values depend upon the palmtop's timekeeping, the operator, and the protocol, and are surely not one second. Consult the Amboseli Baboon Research Project Monitoring Guide.[90]

Warning

It is unlikely that the researcher is interested in this data because, as of January 2006, the field protocols require no particular relationship between the time of the point and the time the observer records the data.

The time may not be before 05:00 and may not be after 19:00.

This column may not be NULL.

Activity

An Activity code from the ACTIVITIES table describing the activity of the individual when the point was taken. Note that some activities are restricted based on the sampling protocol, that is, on whether the Juvenile/Infant or the Adult Female protocol was used to collect the data. (See ACTIVITIES and SAMPLES.Stype.)

This column may not be NULL.

Posture

A Posture code from the POSTURES table describing the posture of the individual when the point was taken. Note that some postures are restricted based on the sampling protocol, that is, on whether the Juvenile/Infant or the Adult Female protocol was used to collect the data. (See POSTURES and SAMPLES.Stype.)

This column may not be NULL.

Foodcode (May be NULL)

Food item eaten when the point was taken, if any. NULL when no food items are eaten. The legal values for this column are determined by the FOODCODES support table.

NEIGHBORS (point observation data on Neighbors)

The neighbors of the focal individual are recorded during point sampling. NEIGHBORS contains one row for every neighbor recorded during a point data collection event (minute). The protocol used to record neighbors of the focal individual has changed over time.[91] See the Amboseli Baboon Research Project Monitoring Guide for further information.

A focal individual's neighbors are not always recognizable, or for some other reason do not always have a row in BIOGRAPH. For this reason NEIGHBORS contains two different columns used to identify the neighbor, Sname and Unksname: the first records known neighboring individuals and the second records unknown neighboring individuals. Exactly one of these columns must contain a value; the other column must then contain NULL.[92]

The system will report a warning when the neighbor is not in the same group as the focal individual.

Caution

The neighbor must be alive and in the study population on the day of the sample (SAMPLES.Date, as discovered via POINT_DATA.Sid) -- the day of the sample may not be before the neighbor's Entrydate and may not be after the neighbor's Statdate.[93] This means that the demographic information for a particular time interval must be entered into Babase before the sample data for that interval.

Each point observation (Pntid value) may have at most one NEIGHBORS row of a given neighbor classification (Ncode value.) The combination of Pntid and Ncode must be unique.

The NCODES table places restrictions on which individuals can be neighbors. One effect of this is to limit the order in which NEIGHBORS may be added to and deleted from Babase.

The sample's focal individual (SAMPLES.Sname, as discovered via POINT_DATA.Sid) may not be her own neighbor.

The combination of Pntid and Sname must be unique.
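The NEIGHBORS constraints above (unique Pntid and Ncode, unique Pntid and Sname, the focal never her own neighbor, and the neighbor in the study population on the sample date) can be sketched over the rows of one sample. This is a minimal Python illustration, not Babase's implementation; the input shapes, the dict standing in for BIOGRAPH, and the Ncode values in the usage example are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def check_neighbors(rows: List[Tuple[int, str, Optional[str]]], focal: str,
                    sample_date: date,
                    biograph: Dict[str, Tuple[date, date]]) -> List[str]:
    """rows: (Pntid, Ncode, Sname) for one sample; Sname is None for unknowns.

    biograph: Sname -> (Entrydate, Statdate).
    """
    problems = []
    seen_ncode, seen_sname = set(), set()
    for pntid, ncode, sname in rows:
        if (pntid, ncode) in seen_ncode:
            problems.append(f"error: duplicate Ncode {ncode} at point {pntid}")
        seen_ncode.add((pntid, ncode))
        if sname is None:
            continue  # unknown neighbor, identified via Unksname instead
        if sname == focal:
            problems.append(f"error: focal {sname} listed as own neighbor")
        if (pntid, sname) in seen_sname:
            problems.append(f"error: {sname} listed twice at point {pntid}")
        seen_sname.add((pntid, sname))
        entry, stat = biograph[sname]
        if not entry <= sample_date <= stat:
            problems.append(f"error: {sname} not in study population on {sample_date}")
    return problems
```

The last check is why demographic data must be loaded first: without the neighbor's BIOGRAPH dates there is nothing to compare the sample date against.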

Nghid (Neighbors IDentifier)

A unique integer which identifies the NEIGHBORS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Pntid (Point Identifier)

A number that uniquely identifies the point for which the neighbor was recorded. Pntid links NEIGHBORS with POINT_DATA. Further information related to the entire sample can be found by using POINT_DATA.Sid, the sample identifier.

This column may not be NULL.

Sname

The neighbor of the focal individual. A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears.

This column must be NULL when the neighbor is an unknown individual or otherwise not in BIOGRAPH.

Ncode (code classifying the kind of neighbor)

Different protocols classify neighbors differently. This column describes the kind of neighbor the row represents.

The legal values of this column are defined by the NCODES support table, which also determines which Ncodes may be used with which point sampling protocols (with which SAMPLES.Stype).

This column may not be NULL.

Unksname (Unknown neighbor code)

The nature of the problem when the neighbor cannot be precisely identified. The legal values of this column are defined by the UNKSNAMES support table.[94]

This column must be NULL when the neighboring individual is precisely identified.

SAMPLES (all-occurrences Samples)

One row for every continuous period of time during which data are collected at regular intervals on a specific focal individual. Although the field protocols center on collecting data that are stored primarily in the POINT_DATA table, other information, normally collected during ad-libitum data collection, may be collected as well and is also associated with the sample set. Further, a sample is allowed to contain no (animal) information.[95] Each SAMPLES row contains the information pertaining to all the data collected during the sample.

Caution

The date of the sample must not be before the focal individual's Entrydate, nor after the focal individual's Statdate. Therefore the demographic data pertaining to any particular time period must be entered into Babase before the sample data collected during that time period.

The number of point observations occurring during the sampling interval (Minsis) must be less than or equal to the total number of minutes elapsed (Mins) during the sampling interval.[96]

Although the goal is to sample females as juveniles only prior to menarche and exclusively as adults following menarche, there is a margin for error. For example, a female may have a false start toward menarche where she begins to swell and is sampled as an adult, but the swelling then turns out to be extremely small and, in hindsight, is not considered to be the onset of menarche. On the other hand, females may continue to be sampled as juveniles for a short period of time after the onset of menarche. Since the two sample types contain different data, the data collected cannot easily be changed from juvenile to adult or vice versa. Therefore, to prevent loss of data for females near menarche, juvenile data are allowed for a short time following a female's maturedate or a male's rankdate, and adult data are allowed for a short time prior to a female's maturedate.

Females may have juvenile (Stype=J) SAMPLES rows until their first conception. Males (or those of unknown sex) may have juvenile (Stype=J) SAMPLES rows for up to, but not including, 1 year after ranking (RANKDATES.Ranked).

Only females may have female samples (Stype=F). Females may not have female samples more than 1 year before maturity (MATUREDATES.Matured).
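The Stype eligibility rules can be sketched as a predicate on a focal individual's sex and key dates. This is a minimal Python illustration under several labeled assumptions: "until their first conception" is read as strictly before the conception date, a missing conception or rank date imposes no juvenile limit, a female sample requires a recorded Matured date, and a one-year shift from February 29 lands on February 28. The function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional


def shift_years(d: date, years: int) -> date:
    """Move a date by whole years; Feb 29 falls back to Feb 28 (an assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def stype_allowed(stype: str, sex: str, sample_date: date,
                  first_conception: Optional[date] = None,
                  ranked: Optional[date] = None,
                  matured: Optional[date] = None) -> bool:
    if stype == "J":
        if sex == "F":
            # Assumed: juvenile samples are allowed strictly before first conception.
            return first_conception is None or sample_date < first_conception
        # Males and unknown sex: up to, but not including, 1 year after Ranked.
        return ranked is None or sample_date < shift_years(ranked, 1)
    if stype == "F":
        # Only females, and not more than 1 year before Matured.
        return sex == "F" and matured is not None and sample_date >= shift_years(matured, -1)
    return False
```

For a male ranked on 2009-06-01, a juvenile sample on 2010-05-31 is allowed but one on 2010-06-01 is not.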

The system will report a warning when the group (Grp) of the focal individual, as recorded on SAMPLES, is not the same as the group MEMBERS records for the focal individual on the date of data collection.

One of the participants in all interactions collected during the sample (see INTERACT_DATA.Sid and PARTS) must be the focal individual.

The Psion palmtops collect data using a custom program. That program in turn is driven by what is called a setup file, which controls the data collection procedure. The data collection procedures can thereby be changed without having to enlist the services of a programmer. However, changes in the kind of data collected are likely to require enhancements to Babase, because the database structure is closely tied to the nature of the data. The sandbox schema may be used to extend Babase on an informal or temporary basis to work around this problem.

Note

Some of the information in this table is related to the mechanics of data collection and the Psion palmtops and not particularly relevant to research, or perhaps much of anything, but is recorded here anyway should the information ever be found useful.

Sid (Sample Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular sample.

Date

The date of the sample set.[97] This column may not be NULL.

Stime (time sampling began)

The time the sampling began. The data type of this column supports a 1 second precision; however, as of Jan 2007 the Psion palmtop data has a 1 minute precision. This column might be used to compute the actual times of the related point observations. This column cannot be NULL.[98]

This column may be NULL for pre-2007 data because the program that loads the Psion palmtop data into Babase did not retain this information.[99]

Grp (Group observed)

The group of the focal individual. The legal values for this column are from the Gid column of the GROUPS table.

This column may not be NULL.

Sname

A three-letter code (an id) that uniquely identifies the focal animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Stype (Sample Type)

A code indicating the nature of the focal individual and the data collection procedure used. The legal values are:

Valid SAMPLES.Stype Values

Code | Mnemonic | Description
J | Juvenile | Juvenile/infant focal sampling procedure
F | Female | Adult female focal sampling procedure

This column may not be NULL.

Mins (Minutes elapsed during sample set collection)

The total number of minutes which actually elapsed while the sample was collected, as determined by the number of point observations reported by the palmtop.[100] Although the protocols designate how many minutes should elapse, samples collected in the field do not always conform to the protocol for various reasons. For further information see the Amboseli Baboon Research Project Monitoring Guide.

This value must be zero or larger and less than or equal to 10.

This column may not be NULL.

Minsis (Minutes In Sight)

The actual number of point observations taken during the sample. Although the protocols (currently) specify that observation is to occur on 1 minute intervals during sampling, for various reasons sometimes observations are missed.

Babase maintains this value automatically by counting the number of POINT_DATA rows associated with the sample. If this value is set manually, Babase compares the supplied value with the value it computes and issues an error if the two do not match.
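The Minsis maintenance described above, together with the requirement that Minsis not exceed Mins, can be sketched as a function that derives the value and rejects a conflicting manual one. This is a minimal Python illustration, not Babase's implementation; the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def resolve_minsis(point_mins: Sequence[int], sample_mins: int,
                   supplied: Optional[int] = None) -> int:
    """Derive Minsis from the sample's POINT_DATA rows (one Min per row)."""
    computed = len(point_mins)
    # A manually supplied value must agree with the count.
    if supplied is not None and supplied != computed:
        raise ValueError(f"Minsis {supplied} does not match {computed} POINT_DATA rows")
    # Points in sight cannot outnumber the minutes elapsed.
    if computed > sample_mins:
        raise ValueError(f"Minsis {computed} exceeds Mins {sample_mins}")
    return computed
```

A sample with points at minutes 1, 2, and 4 of a 5 minute sample gets a Minsis of 3; supplying 4 instead raises an error.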

This value must be zero or larger.

This column may not be NULL.

Observer

Initials of the person who collected the sample. The legal values of this column are defined by the OBSERVERS support table.

This column may not be NULL.

Palmtop (Palmtop identifier)

The palmtop computer used to collect the sample. The legal values for this column are defined by the PALMTOPS support table.

This column may not be NULL.

Programid (palmtop Program IDentifier)

The unique identifier of the program, including program version, used on the palmtop to collect the data in the sample. The legal values for this column are defined by the PROGRAMIDS table.

This column may be NULL as older versions of the software that runs on the Psion palmtop did not record this information.

Setupid (palmtop Setup file version IDentifier)

The version of the Psion palmtop setup file or corresponding version identifier of whatever is used on the palmtops to collect the data in the sample. The legal values for this column are defined by the SETUPIDS table.

This column may be NULL as older versions of the software that runs on the Psion palmtop did not record this information.

Darting

ANESTHS (Extra Sedation Administered During Darting)

ANESTHS contains one row for each time additional sedation is administered to a darted individual. If no additional sedation was administered then this table should not contain rows related to the darting.

Anesthetic cannot be administered to the same individual more than once at any given time -- the combination of Dartid and Antime must be unique.

Anesthetic cannot be administered before the individual is darted or before the individual is picked up -- the Antime value cannot be before either the related DARTINGS.Pickuptime time or the DARTINGS.Darttime time. Anesthetic cannot be administered after the individual recovers from the previous dose -- the Antime value cannot be later than 2 hours after the later of the DARTINGS.Darttime time or the time of the previous administration of additional sedation.
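The time window for an additional dose can be sketched as a predicate: not before the darting or the pickup, and no more than 2 hours after the later of the darting or the most recent earlier dose. This is a minimal Python illustration under stated assumptions: times are represented as datetime values on the darting date (Nairobi local time), the caller passes only the doses given before the one being checked, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

RECOVERY = timedelta(hours=2)


def antime_ok(antime: datetime, darttime: datetime, pickuptime: datetime,
              earlier_doses: List[datetime]) -> bool:
    """earlier_doses: Antime values of previous ANESTHS rows for this darting."""
    if antime < darttime or antime < pickuptime:
        return False
    # The later of the darting and the most recent previous dose.
    latest = max([darttime, *earlier_doses])
    return antime <= latest + RECOVERY
```

With a darting at 10:00 and pickup at 10:10, a dose at 12:30 is rejected, but it becomes acceptable if another dose was recorded at 11:30.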

Tip

The ANESTH_STATS view aggregates the multiple administrations of anesthetic given during a darting and so provides a convenient way to analyze ANESTHS rows.[101]

Anesthid (Extra Sedation Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular administration of extra sedation.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which extra sedation was administered -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Drug (Anesthetic Administered)

Anesthetic administered to extend sedation. The legal values for this column are defined by the DRUGS support table.

This column may not be NULL.

Antime (Time when additional anesthetic was administered)

The time additional sedation was administered to the darted individual.

The time zone is Nairobi local time.

The precision of this column is 1 minute -- seconds and fractions thereof must be 0.

This column may be NULL when there is no record of what time additional sedation was administered.

Anamount (Anesthetic Amount)

The amount of anesthetic administered, in CCs.

The maximum allowed value is 1.0 CC and the minimum is 0. Both the allowed precision and the accuracy are 0.01 CC.

This column may not be NULL.

BODYTEMPS (Darting Body Temperature Measurements)

BODYTEMPS contains one row for each body temperature measurement taken of a darted individual.

The temperature cannot be measured before the individual is darted or before the individual is picked up -- the Bttime value cannot be before either the related DARTINGS.Pickuptime time or[102] the Darttime time. The temperature cannot be taken after the individual has recovered from sedation -- the Bttime value, when non-NULL, cannot be later than 2 hours after the later of the DARTINGS.Darttime time or the last administration of additional sedation, if any, as recorded in the ANESTHS table. A non-NULL Bttime value implies that there must be a known time of anesthetic administration -- either DARTINGS.Darttime or ANESTHS.Antime must be non-NULL.

Btid (Body Temperature Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular body temperature measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the body temperature measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Btemp (Body Temperature)

The measured temperature in degrees Celsius to a precision of 1/10th of a degree. The minimum allowed value is 25 degrees and the maximum 45 degrees.

This column may not be NULL.

Bttime (Time of Body Temperature measurement)

The time the body temperature of the darted individual was taken.

The time zone is Nairobi local time.

The precision of this column is 1 minute -- seconds and fractions thereof must be 0.

This column may be NULL when there is no record of when the body temperature measurement was taken.

CHESTS (Darting Chest Circumference Measurements)

CHESTS contains a row for each chest circumference measurement made of a darted individual.

Chid (Chest circumference measurement Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular chest circumference measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the chest circumference measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Chcircum (Chest circumference measurement)

The chest circumference measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 25 centimeters. The maximum value allowed is 99.9 centimeters.

Caution

The value contained in this column may have been adjusted for systematic observational bias. See the Chunadjusted column for more information.

This column may not be NULL.

Chunadjusted (Unadjusted Chest circumference measurement)

Some measurements were subject to systematic bias when taken. When this is known to have occurred, the original, biased measurements are recorded in this column. When there is no known bias, this column is NULL.

When non-NULL this column contains the original chest circumference measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 25 centimeters. The maximum value allowed is 99.9 centimeters.

Chseq (Chest circumference measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first chest circumference measurement taken during a darting has a Chseq value of 1, the second a value of 2, etc.

The system automatically re-computes Chseq values to ensure that they are contiguous and begin with 1. See the Automatic Sequencing section for further information.
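The effect of automatic sequencing can be pictured as renumbering the surviving values 1..n while keeping their relative order, so that deleting the row with Chseq 2 from 1, 2, 3 leaves 1, 2. This minimal Python sketch assumes that reading. The function `resequence` is hypothetical, and the Automatic Sequencing section remains the authority on what the system actually does.

```python
from typing import List


def resequence(seqs: List[int]) -> List[int]:
    """Renumber sequence values 1..n, preserving their relative order.

    Returns the new value for each input position. Ties keep their input
    order because Python's sort is stable.
    """
    order = sorted(range(len(seqs)), key=lambda i: seqs[i])
    new = [0] * len(seqs)
    for rank, i in enumerate(order, start=1):
        new[i] = rank
    return new
```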

Chobserver (Chest circumference measurer)

Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.

This column may be NULL.

CROWNRUMPS (Darting Crown-to-Rump Measurements)

CROWNRUMPS contains a row for each crown-to-rump measurement made of a darted individual.

Tip

The CROWNRUMP_STATS view aggregates the multiple crown-to-rump measurements taken during a darting and so provides a convenient way to analyze CROWNRUMPS rows.

CRid (Crown-to-Rump measurement Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular crown-to-rump measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the crown-to-rump measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

CRlength (Crown-to-Rump measurement)

The crown-to-rump measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 10 centimeters. The maximum value allowed is 99.9 centimeters.

This column may not be NULL.

CRseq (Crown-to-Rump measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first crown-to-rump measurement taken during a darting has a CRseq value of 1, the second a value of 2, etc.

The system automatically re-computes CRseq values to ensure that they are contiguous and begin with 1. See the Automatic Sequencing section for further information.

CRobserver (Crown-to-rump measurer)

Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.

This column may be NULL.

DART_SAMPLES (Darting Tissue Sample Records)

DART_SAMPLES contains one row for every sample type collected in each darting.

The combination of Dartid and DS_Type must be unique.

Tip

The DSAMPLES view also shows these data, one line per Dartid. For some users, this may be a more desirable way to look at these data.

Column Descriptions

DS_Id (Darting Sample collection Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to all the samples of a particular DS_Type collected during a single darting.

This column cannot be changed. This column may not be NULL.

Dartid (Darting Identifier)

The darting event during which the indicated samples were collected -- a DARTINGS.Dartid value.

This column cannot be changed. This column may not be NULL.

DS_Type (Darting Sample Type Identifier)

The DART_SAMPLE_TYPES.DS_Type of this sample.

This column cannot be changed. This column may not be NULL.

Num

The number of samples collected of the type given in the DS_Type column.

This column may not be NULL, must be greater than zero, and must be between the DART_SAMPLES.DS_Type's corresponding DART_SAMPLE_TYPES.Minimum and DART_SAMPLE_TYPES.Maximum values, inclusive.
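The DART_SAMPLES rules (Dartid and DS_Type unique together, and Num positive and within the sample type's limits) can be sketched as follows. This is an illustrative Python restatement under stated assumptions, not Babase code. `dart_sample_errors` is a hypothetical name, and the `limits` mapping is a hypothetical stand-in for the DART_SAMPLE_TYPES Minimum and Maximum columns.

```python
from typing import Iterable, List, Mapping, Optional, Set, Tuple


def dart_sample_errors(rows: Iterable[Tuple[int, str, Optional[int]]],
                       limits: Mapping[str, Tuple[int, int]]) -> List[str]:
    """Check DART_SAMPLES rows given as (Dartid, DS_Type, Num) tuples.

    `limits` maps each DS_Type to its (Minimum, Maximum) pair, inclusive.
    """
    seen: Set[Tuple[int, str]] = set()
    errors: List[str] = []
    for dartid, ds_type, num in rows:
        key = (dartid, ds_type)
        if key in seen:
            errors.append(f"duplicate {key}")  # Dartid + DS_Type must be unique
        seen.add(key)
        low, high = limits[ds_type]
        # Num must be non-NULL, greater than zero, and within the type's limits.
        if num is None or num <= 0 or not low <= num <= high:
            errors.append(f"bad Num {num} for {key}")
    return errors
```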

DARTINGS (Baboon Darting Events)

DARTINGS contains one row for every darting of an animal for which data were collected.

The combination of Sname and Date must be unique.

The individual must be alive and in the study population when darted -- the Date must be between BIOGRAPH.Entrydate and BIOGRAPH.Statdate, inclusive.

The system will report a warning for females darted on or after 2006-01-01 for which there is no related DART_SAMPLES row that indicates a vaginal swab collection.

The Downtime value cannot be before the Darttime value and cannot be more than 1 hour after the Darttime value.

The Pickuptime value cannot be before the Downtime value and cannot be more than 90 minutes after the Downtime value. It also[103] cannot be before Darttime and cannot be more than 90 minutes after Darttime. The system will report a warning if the Pickuptime is more than 30 minutes after the Downtime.
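The time-window rules for Darttime, Downtime, and Pickuptime, including the allowed 05:00 to 20:00 range for Darttime, can be sketched in Python as below. This is a hypothetical illustration, not Babase's implementation. The name `check_darting_times` is invented, times are assumed to be combined with the darting date into `datetime` values, and each rule is assumed to be checked only when both times involved are known.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple


def check_darting_times(dart: Optional[datetime],
                        down: Optional[datetime],
                        pickup: Optional[datetime]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a DARTINGS row's Darttime/Downtime/Pickuptime."""
    errors: List[str] = []
    warnings: List[str] = []

    def within(later, earlier, limit, label):
        # Assumption: a rule is only checked when both times are known.
        if later is not None and earlier is not None:
            if later < earlier or later - earlier > limit:
                errors.append(label)

    within(down, dart, timedelta(hours=1), "Downtime vs Darttime")
    within(pickup, down, timedelta(minutes=90), "Pickuptime vs Downtime")
    within(pickup, dart, timedelta(minutes=90), "Pickuptime vs Darttime")
    if pickup is not None and down is not None and pickup - down > timedelta(minutes=30):
        warnings.append("Pickuptime more than 30 minutes after Downtime")
    if dart is not None and not time(5, 0) <= dart.time() <= time(20, 0):
        errors.append("Darttime outside 05:00-20:00")
    return errors, warnings
```

Errors and warnings are returned separately to reflect the document's distinction between rules that reject data and conditions that only produce a warning.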

Dartid (Darting Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular darting event.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname

A three-letter code (an id) that uniquely identifies the darted animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears. This column may not be NULL.

Date (Darting Date)

The date the individual was darted.

This column may not be NULL.

Darttime (Darting Time)

The time the individual was darted -- when the dart was fired. The time zone is Nairobi local time.

The time may not be before 05:00 and may not be after 20:00.

The precision of this column is 1 minute -- seconds and fractions thereof must be 0.

This column may be NULL when the time of darting is unknown.

Downtime (Time the darted individual went down)

The time the darted individual succumbed to the anesthetic. The time zone is Nairobi local time.

The precision of this column is 1 minute -- seconds and fractions thereof must be 0.

This column may be NULL when the downtime is not known.

Pickuptime (Time the darted individual was picked up by the team)

The time that the darting team picked up the anesthetized individual.

The precision of this column is 1 minute -- seconds and fractions thereof must be 0.

This column may be NULL when the pickup time is not known.

Drug (Dart Anesthetic)

Anesthetic administered by the dart. The legal values for this column are defined by the DRUGS support table.

This column may not be NULL.

Mass (Mass of the darted individual)

Mass of the darted individual, in kilograms. The precision of this column is 1/10th of a kilogram. The minimum value allowed is 1 kg. The maximum value allowed is 40 kg.

The system will report a warning when this column is NULL.[104]

Logisticnotes (Notes on Logistics)

Notes regarding the logistics of the darting. Comments about collars, anesthetic, etc. Consult the Amboseli Baboon Research Project Monitoring Guide for further guidance as to usage.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
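The rule shared by the DARTINGS notes columns (NULL is allowed, but a non-NULL value must contain at least one non-whitespace character) amounts to the following check. This is a hypothetical Python sketch. Python's `str.strip()` treats Unicode whitespace as whitespace, which may not match exactly what the database counts as whitespace.

```python
from typing import Optional


def notes_ok(value: Optional[str]) -> bool:
    """True when a notes value is NULL (None) or has a non-whitespace character."""
    return value is None or value.strip() != ""
```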

Dartcomments (General comments about the darting)

Comments about the animal’s condition, darting circumstances, etc. during darting. Consult the Amboseli Baboon Research Project Monitoring Guide for further guidance as to usage.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

CRnotes (Crown-to-Rump measurement notes)

Notes on the crown-to-rump measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Chnotes (Chest circumference measurement Notes)

Notes on the chest circumference measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Ulnotes (Ulna length measurement Notes)

Notes on the ulna length measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Hunotes (Humerus length measurement Notes)

Notes on the humerus length measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Dphysnotes (Darting Physiological measurement Notes)

Ad libitum notes taken on the physiological features of the darted individual, if any.

This column may be NULL.[105] This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

PCVnotes (PCV measurement Notes)

Notes on the PCV measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Bodytempnotes (Body Temperature Notes)

Notes on the body temperature readings taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Dsamplenotes (Darting Sample Notes)

Notes that accompany any of the different samples recorded in the DART_SAMPLES table, if any.

This column may be NULL.[106] This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Teethnotes (Notes on the Teeth)

Notes on the teeth, if any observations on the teeth were made.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Caninenotes (Notes on the Canines)

Notes on the canines, if any observations on the canines were made.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Testesnotes (Testes measurement Notes)

Notes on the testes measurements taken, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Ticknotes (Notes on the Parasite counts)

Notes on the parasite counts done, if any.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

DPHYS (Darting Physiological Measurements)

DPHYS contains one row for each darting event during which physiological measurements were taken.

Additional physiological measurements are recorded in the PCVS and BODYTEMPS tables.

Dphysid (Darting Physiological measurements Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular set of physiological measurements taken during a darting.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the set of physiological measurements were taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Pulse

The pulse of the individual in beats per minute. The pulse must be greater than 0.

This column may be NULL.

Respiration

The respiration rate of the individual measured in counts per minute. The respiration rate must be greater than 0.

This column may be NULL.

Ringnode (state of Right Inguinal lymph Node)

The state of the right inguinal lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

Lingnode (state of Left Inguinal lymph Node)

The state of the left inguinal lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

Ringnode (state of Right Axillary lymph Node)

The state of the right axillary lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

Ringnode (state of Left Axillary lymph Node)

The state of the left axillary lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

Ringnode (state of Right Submandibular lymph Node)

The state of the right submandibular lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

Ringnode (state of Left Submandibular lymph Node)

The state of the left submandibular lymph node. The legal values of this column are defined by the LYMPHSTATES support table.

This column may be NULL.

HUMERUSES (Darting Humerus Length Measurements)

HUMERUSES contains a row for each humerus length measurement made of a darted individual.

Huid (humerus length measurement Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular humerus length measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the humerus length measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Hulength (Humerus Length measurement)

The humerus length measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 10 centimeters. The maximum value allowed is 35 centimeters.

Caution

The value contained in this column may have been adjusted for systematic observational bias. See the Huunadjusted column for more information.

This column may not be NULL.

Huunadjusted (Unadjusted Humerus length measurement)

Some measurements were subject to systematic bias when taken. When this is known to have occurred, the original, biased measurements are recorded in this column. When there is no known bias, this column is NULL.

When non-NULL this column contains the original humerus length measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 10 centimeters. The maximum value allowed is 35 centimeters.

Huseq (Humerus measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first humerus length measurement taken during a darting has a Huseq value of 1, the second a value of 2, etc.

The system automatically re-computes Huseq values to ensure that they are contiguous and begin with 1. See the Automatic Sequencing section for further information.

Huobserver (Humerus length measurer)

Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.

This column may be NULL.

PCVS (Darting Blood Measurements)

PCVS contains one row for each PCV (packed cell volume) measurement taken from a darted individual.

PCVid (Packed Cell Volume Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular PCV measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the PCV measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

PCV (Packed Cell Volume)

The packed cell volume measurement. This is a percentage and must be between 1 and 99, inclusive.

This column may not be NULL.

PCVseq (PCV measurement Sequence)

A sequence number indicating the order in which the PCV measurements were taken. The first PCV measurement taken during a darting has a PCVseq value of 1, the second a value of 2, etc.

The system automatically re-computes PCVseq values to ensure that they are contiguous and begin with 1. See the Automatic Sequencing section for further information.

TEETH (Darting Tooth Data)

TEETH contains one row for each tooth site within the mouth on which data were collected, for every darting event during which dentition data were collected. There may not be data on every tooth or tooth site. The absence of a row in this table says nothing about the presence or absence of a particular tooth at the time of darting.

When the tooth is missing (the Tstate is M), the Tcondition value must be NULL. When the tooth is not missing, Tcondition must be non-NULL.

There may be only one tooth in any given tooth site within the mouth, at any one time -- for any given darting there may be at most one row in TEETH for each tooth site (TOOTHSITES).

Warning

When inserting a row into TEETH a NULL Tstate value has special meaning. Inserted rows with a NULL Tstate value are silently ignored; no such rows are ever inserted.[107]

The Tstate column cannot be changed to a NULL.
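The TEETH rules above (a missing tooth has no Tcondition, a present tooth must have one, at most one row per tooth site per darting, and inserted rows with a NULL Tstate are ignored) can be sketched in Python as follows. This is a hypothetical illustration: `teeth_errors` is an invented name, the `site_of` mapping stands in for the tooth-to-site relationship that the TOOTHSITES table provides, and every Tstate code other than M used in the tests is a made-up placeholder.

```python
from typing import Iterable, List, Mapping, Optional, Set, Tuple

# (Dartid, Tooth, Tstate, Tcondition); NULL is modeled as None.
Row = Tuple[int, str, Optional[str], Optional[str]]


def teeth_errors(rows: Iterable[Row], site_of: Mapping[str, str]) -> List[str]:
    errors: List[str] = []
    occupied: Set[Tuple[int, str]] = set()
    for dartid, tooth, tstate, tcondition in rows:
        if tstate is None:
            continue  # inserted rows with a NULL Tstate are silently ignored
        if tstate == "M" and tcondition is not None:
            errors.append(f"{tooth}: missing tooth has a Tcondition")
        if tstate != "M" and tcondition is None:
            errors.append(f"{tooth}: Tcondition required")
        key = (dartid, site_of[tooth])  # one row per tooth site per darting
        if key in occupied:
            errors.append(f"{tooth}: site already occupied")
        occupied.add(key)
    return errors
```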

Note

The DENT_CODES view may be used to maintain the TEETH table. This view may also be useful when querying. It returns a single row with individual columns for every kind of tooth.

The DENT_SITES view provides a way to query TEETH, returning a single row with individual columns for each position in the mouth.

Teethid (Teeth row Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tooth (or tooth site when a tooth is missing).

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the tooth examinations were made -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Tooth (Tooth examined)

The tooth, or tooth site if the tooth is missing. The legal values of this column are defined by the TOOTHCODES support table.

This column may not be NULL.

Tstate (Tooth existential State)

The degree to which the tooth exists. The legal values of this column are defined by the TSTATES support table.

This column will never contain a NULL. See the warning above for more information.

Tcondition (Tooth Condition)

A code rating the physical condition of the tooth. The legal values of this column are defined by the TCONDITIONS support table.

This column may be NULL. See TEETH above.

TESTES_ARC (Darting Testes circumference Data)

TESTES_ARC contains one row for every darting event for every recorded measurement of testicle width and length circumference.

Caution

The TESTES_ARC table contains testes measurements of a portion of the testicle circumference. The TESTES_DIAM table contains testes measurements of the diameter. The two tables are otherwise identical in that they have the same structure and have corresponding validation rules.

Note

The pairing of the width and length measurements within this table exists to make data storage convenient; no special relationship is implied regarding the order in which the measurements were taken. For example, if 3 length measurements and 2 width measurements are taken during a darting, they may have been taken in the order length1, length2, length3, width1, width2, or in the order length1, width1, length2, width2, length3, or in some other order. In other words, the value of the Testseq column describes the order in which the length measurements were taken and the order in which the width measurements were taken, but says nothing about the interspersing of length and width measurements.[108]

Either the width or the length must be specified -- Testwidth and Testlength cannot both be NULL in the same row.

There can only be one measurement taken per darting per testicle per measurement sequence number -- Testseq must be unique per Dartid per Testside.

Once a Testwidth value is NULL all the rows (for the same darting) with higher Testseq values must also have a NULL Testwidth value. The same is true of the Testlength column.[109]

An individual must be male to have a row in this table.

Individuals who are at least 7 years old as of the DARTINGS.Date must have a testes length (Testlength) measurement of at least 15mm and a testes width (Testwidth) measurement of at least 10mm. The system will report a warning when individuals darted at less than 7 years of age have testes length measurements less than 15mm or testes width measurements less than 10mm.

Testesid (Testes measurements Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular testes measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the testes measurements were taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Testside (Testicle measured)

The testicle measured. The legal values are:

Valid Testside Values

Code  Description
L     the left testicle
R     the right testicle

This column may not be NULL.

Testlength (Testes Length measurement)

The testes length measurement, in millimeters, with a precision of 1/10th of a millimeter. The minimum value allowed is 15 millimeters. The maximum value allowed is 140 millimeters.

This column may be NULL, but Testlength and Testwidth cannot both be NULL in the same row.

Testwidth (Testes Width measurement)

The testes width measurement, in millimeters, with a precision of 1/10th of a millimeter. The minimum value allowed is 10 millimeters. The maximum value allowed is 95 millimeters.

This column may be NULL, but Testlength and Testwidth cannot both be NULL in the same row.

Testseq (Testes measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first measurement, of each testicle, taken during a darting has a Testseq value of 1, the second a value of 2, etc.

The system automatically re-computes Testseq values to ensure that they are contiguous and begin with 1. Note that the TESTES_ARC rows are sequenced within Dartid within Testside whereas the other darting tables are sequenced only within Dartid. See the Automatic Sequencing section for further information.
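Unlike the other darting tables, testes rows are numbered separately for each testicle within a darting. This hypothetical Python sketch renumbers Testseq 1..n within each (Dartid, Testside) group and preserves the existing order. It illustrates the grouping described above and is not the system's actual resequencing code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def resequence_testes(rows: List[Tuple[int, str, int]]) -> List[int]:
    """rows: (Dartid, Testside, Testseq) tuples; returns the new Testseq values."""
    groups: Dict[Tuple[int, str], List[Tuple[int, int]]] = defaultdict(list)
    for i, (dartid, side, seq) in enumerate(rows):
        groups[(dartid, side)].append((seq, i))
    new = [0] * len(rows)
    for members in groups.values():
        # Sort by old Testseq (ties broken by input position), then number from 1.
        for rank, (_, i) in enumerate(sorted(members), start=1):
            new[i] = rank
    return new
```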

TESTES_DIAM (Darting Testes Diameter Data)

TESTES_DIAM contains one row for every darting event for every recorded measurement of testicle width and length diameter.

Caution

The TESTES_ARC table contains testes measurements of a portion of the testicle circumference. The TESTES_DIAM table contains testes measurements of the diameter. The two tables are otherwise identical in that they have the same structure and have corresponding validation rules.

Note

The pairing of the width and length measurements within this table exists to make data storage convenient; no special relationship is implied regarding the order in which the measurements were taken. For example, if 3 length measurements and 2 width measurements are taken during a darting, they may have been taken in the order length1, length2, length3, width1, width2, or in the order length1, width1, length2, width2, length3, or in some other order. In other words, the value of the Testseq column describes the order in which the length measurements were taken and the order in which the width measurements were taken, but says nothing about the interspersing of length and width measurements.[110]

Either the width or the length must be specified -- Testwidth and Testlength cannot both be NULL in the same row.

There can only be one measurement taken per darting per testicle per measurement sequence number -- Testseq must be unique per Dartid per Testside.

Once a Testwidth value is NULL all the rows (for the same darting) with higher Testseq values must also have a NULL Testwidth value. The same is true of the Testlength column.[111]

An individual must be male to have a row in this table.

Individuals who are at least 7 years old as of the DARTINGS.Date must have a testes length (Testlength) measurement of at least 40mm and a testes width (Testwidth) measurement of at least 25mm. The system will report a warning when individuals darted at less than 7 years of age have testes length measurements less than 40mm or testes width measurements less than 25mm.

Testesid (Testes measurements Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular testes measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the testes measurements were taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Testside (Testicle measured)

The testicle measured. The legal values are:

Valid Testside Values

Code  Description
L     the left testicle
R     the right testicle

This column may not be NULL.

Testlength (Testes Length measurement)

The testes length measurement, in millimeters, with a precision of 1/10th of a millimeter. The minimum value allowed is 15 millimeters. The maximum value allowed is 75 millimeters.

This column may be NULL, but Testlength and Testwidth cannot both be NULL in the same row.

Testwidth (Testes Width measurement)

The testes width measurement, in millimeters, with a precision of 1/10th of a millimeter. The minimum value allowed is 10 millimeters. The maximum value allowed is 51 millimeters.

This column may be NULL, but Testlength and Testwidth cannot both be NULL in the same row.

Testseq (Testes measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first measurement, of each testicle, taken during a darting has a Testseq value of 1, the second a value of 2, etc.

The system automatically re-computes Testseq values to ensure that they are contiguous and begin with 1. Note that the TESTES_DIAM rows are sequenced within Dartid within Testside whereas the other darting tables are sequenced only within Dartid. See the Automatic Sequencing section for further information.

TICKS (Darting Tick and Parasite Data)

TICKS contains one row for every darting event during which data on ticks and other parasites were recorded.

When no specific count could be obtained, because there were too many parasites to count or for some other reason, Tickcount should be left NULL.

The value of the Tickstatus column is constrained based on the Tickcount value. For further information see the documentation of the TICKSTATUSES support table and the meaning of the table's Special Values.

The combination of Dartid, Bodypart, and Tickkind must be unique.

Tickid (Tick and other parasite count Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tick count.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the tick count was made -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Bodypart

The part of the body examined for ticks or other parasites. The legal values of this column are defined by the BODYPARTS support table.

This column may not be NULL.

Tickkind (Kind of Tick or other parasite)

The kind of tick or other parasite, the kind of parasite and its developmental stage, or the kind of parasite indicator counted. The legal values of this column are defined by the PARASITES support table.

This column may not be NULL.

Tickcount (Count of ticks or other parasites and their signs)

The recorded count of ticks, ticks in the indicated developmental stage, other parasites, or parasite signs. The minimum value allowed is 0, the maximum is 250.

This column may be NULL when there were too many parasites to count or the count was not taken for some other reason.

Tickstatus

A status value indicating whether and what sort of tick count was taken. The legal values of this column are from the Tickstatus column of the TICKSTATUSES table. See the documentation of the TICKSTATUSES support table for more information regarding what values may be used under which conditions.

This column may not be NULL.

Tickbpnotes (Body Part Notes)

Notes on the parasite infestation of the indicated body part.

Caution

Notes pertaining to parasites but not specific to the particular body part examined belong in DARTINGS.Ticknotes.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

ULNAS (Darting Ulna Length Measurements)

ULNAS contains a row for each ulna length measurement made of a darted individual.

Ulid (Ulna length measurement Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular ulna length measurement.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The darting event during which the ulna length measurement was taken -- a DARTINGS.Dartid value. This column cannot be changed and may not be NULL.

Ullength (Ulna Length measurement)

The ulna length measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 10 centimeters. The maximum value allowed is 35 centimeters.

Caution

The value contained in this column may have been adjusted for systematic observational bias. See the Ulunadjusted column for more information.

This column may not be NULL.

Ulunadjusted (Unadjusted Ulna length measurement)

Some measurements were subject to systematic bias when taken. When this is known to have occurred, the original, biased measurements are recorded in this column. When there is no known bias, this column is NULL.

When non-NULL, this column contains the original ulna length measurement, in centimeters, with a precision of 1/10th of a centimeter. The minimum value allowed is 10 centimeters. The maximum value allowed is 35 centimeters.

Ulseq (Ulna length measurement Sequence)

A sequence number indicating the order in which the measurements were taken. The first ulna length measurement taken during a darting has a Ulseq value of 1, the second a value of 2, etc.

The system automatically re-computes Ulseq values to ensure that they are contiguous and begin with 1. See the Automatic Sequencing section for further information.

Ulobserver (ulna length measurer)

Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.

This column may be NULL.

SWERB Data (Group-level Geolocation Data)

This section contains timestamped geolocation data on groups, observers, and significant landscape features (groves, waterholes[112], and possibly other temporary or permanent landmarks), either recorded in a quad coordinate system or collected from GPS units. SWERB stands for Sleeping grove, Waterhole, End time, Ranging, and Begin time. Typically SWERB data are collected at hourly or half-hourly intervals. Supporting information includes the locations of tree groves and waterholes. For more information see the Protocol for Data Management: Amboseli Baboon Project.

The quad coordinate system was devised prior to the incorporation of GPS technology into the data collection protocols. It is based on regular sub-divisions of a landscape. There is no altitude information associated with quad coordinate points.

The GPS X and Y coordinates are in the WGS 1984 UTM Zone 37South coordinate system. The units of these coordinates are meters, as are those of the recorded altitude. The recorded X and Y values include at most 1 non-zero digit to the right of the decimal place. X and Y coordinates must be on or within the bounding rectangle having X coordinates between 42300.0 and 651000.0, inclusive, and Y coordinates between 9497000.0 and 9894500.0, inclusive. The system will generate a warning when the location falls outside the bounding rectangle having X coordinates between 277000.0 and 311100.0, inclusive, and Y coordinates between 9689200.0 and 9709500.0, inclusive. The accuracy may vary; see the Protocol for Data Management: Amboseli Baboon Project for further information on accuracy at various times.

Altitude values must be between 0 and 10000, inclusive. There must be no (non-zero) digits to the right of the decimal place for altitude measurements taken before 2004-01-01. On or after 2004-01-01 one digit may appear to the right of the decimal place. The system will generate a warning when altitude values are NULL but X and Y coordinates are non-NULL.
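The coordinate and altitude rules above can be restated as a validation routine. The following is a minimal Python sketch, not Babase's own implementation (which enforces these rules inside the database); the function name `check_gps_point` and its convention of returning separate lists of errors and warnings are illustrative assumptions. Values are passed as `Decimal` so that digits to the right of the decimal place can be checked exactly.

```python
from datetime import date
from decimal import Decimal
from typing import Optional

def check_gps_point(x: Optional[Decimal], y: Optional[Decimal],
                    altitude: Optional[Decimal], when: date) -> tuple[list[str], list[str]]:
    """Hypothetical checker: returns (errors, warnings) for one GPS point."""
    errors: list[str] = []
    warnings: list[str] = []
    if x is not None and y is not None:
        # Outer bounding rectangle: points outside it are rejected.
        if not (Decimal("42300.0") <= x <= Decimal("651000.0")
                and Decimal("9497000.0") <= y <= Decimal("9894500.0")):
            errors.append("X/Y outside the allowed bounding rectangle")
        # Inner rectangle: points outside it only produce a warning.
        elif not (Decimal("277000.0") <= x <= Decimal("311100.0")
                  and Decimal("9689200.0") <= y <= Decimal("9709500.0")):
            warnings.append("X/Y outside the expected study area")
        # At most one non-zero digit after the decimal point.
        if any(v != v.quantize(Decimal("0.1")) for v in (x, y)):
            errors.append("X/Y has more than one digit after the decimal point")
        if altitude is None:
            warnings.append("altitude is NULL but X and Y are not")
    if altitude is not None:
        if not (0 <= altitude <= 10000):
            errors.append("altitude outside 0..10000")
        # Whole meters before 2004-01-01; one decimal digit allowed afterwards.
        step = Decimal("1") if when < date(2004, 1, 1) else Decimal("0.1")
        if altitude != altitude.quantize(step):
            errors.append("altitude has too many digits after the decimal point")
    return errors, warnings
```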

All PDOP columns must have values between 0 and 50, inclusive, and have one digit of precision to the right of the decimal. PDOP values are unit-less and should be multiplied by the specified accuracy, in meters, of the GPS unit to produce a 3-dimensional vector, in meters, representing the possible distance from the true location.[113]

All accuracy columns are in meters[114] with one digit of precision to the right of the decimal and must have values between 0 and 15, inclusive.
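As a worked example of the PDOP description: a PDOP of 2.5 recorded by a unit whose specified accuracy is 15 meters corresponds to a possible distance of up to 2.5 × 15 = 37.5 meters from the true location. The sketch below encodes that arithmetic together with the range rules for the PDOP and accuracy columns. The function names are hypothetical, and the 15-meter unit accuracy is an arbitrary illustrative figure, not a property of any GPS unit recorded in Babase.

```python
from decimal import Decimal

def valid_pdop(pdop: Decimal) -> bool:
    # PDOP: 0..50 inclusive, at most one digit after the decimal point.
    return Decimal(0) <= pdop <= Decimal(50) and pdop == pdop.quantize(Decimal("0.1"))

def valid_accuracy(accuracy: Decimal) -> bool:
    # Accuracy: meters, 0..15 inclusive, at most one digit after the decimal point.
    return (Decimal(0) <= accuracy <= Decimal(15)
            and accuracy == accuracy.quantize(Decimal("0.1")))

def pdop_error_meters(pdop: Decimal, unit_accuracy_m: Decimal) -> Decimal:
    # PDOP is unit-less; scaling by the unit's specified accuracy yields meters.
    return pdop * unit_accuracy_m
```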

The kind of reported error is partially determined by characteristics of the GPS unit used for data collection. GPS units which report error as a PDOP reading, those with GPS_UNITS.Errortype values of PDOP, cannot be related to rows with non-NULL Accuracy values. GPS units which report error as an accuracy reading, those with GPS_UNITS.Errortype values of accuracy, cannot be related to rows with non-NULL PDOP values. PDOP values must be NULL for data collected before 1993-09-01 or after 2001-01-31. Accuracy values must be NULL for data collected before 2001-02-01.[115] The system will report a warning when data collected with a GPS unit supporting PDOP or accuracy does not include, respectively, PDOP or accuracy values.
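The interplay between a unit's GPS_UNITS.Errortype and the dates on which PDOP and accuracy values may appear is sketched below. This is a hypothetical helper for illustration only: it takes the unit's Errortype, the two error values (None standing in for NULL), and the date of the data, and returns errors and warnings separately.

```python
from datetime import date
from typing import Optional

def check_error_columns(errortype: str, pdop: Optional[float],
                        accuracy: Optional[float], when: date) -> tuple[list[str], list[str]]:
    errors: list[str] = []
    warnings: list[str] = []
    # The unit's Errortype decides which error column may be filled in.
    if errortype == "PDOP" and accuracy is not None:
        errors.append("PDOP unit cannot have an Accuracy value")
    if errortype == "accuracy" and pdop is not None:
        errors.append("accuracy unit cannot have a PDOP value")
    # Date windows in which each kind of error value may appear.
    if pdop is not None and not (date(1993, 9, 1) <= when <= date(2001, 1, 31)):
        errors.append("PDOP only allowed from 1993-09-01 through 2001-01-31")
    if accuracy is not None and when < date(2001, 2, 1):
        errors.append("Accuracy only allowed on or after 2001-02-01")
    # A missing value of the kind the unit supports is only a warning.
    if errortype == "PDOP" and pdop is None:
        warnings.append("PDOP unit but no PDOP value")
    if errortype == "accuracy" and accuracy is None:
        warnings.append("accuracy unit but no Accuracy value")
    return errors, warnings
```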

Warning

In May 2000, the United States government ended its use of Selective Availability, a national security measure which intentionally lowered the accuracy of GPS signals. For more information about this, see Selective Availability on GPS.gov. The GPS accuracy indices in Babase (Accuracy and PDOP) do not and cannot account for this inaccuracy, so users should be aware that any GPS data collected before May 2000 are likely less accurate than indicated.

Starting 2004-01-01, GPS data began to be downloaded directly from the GPS units instead of being transcribed by hand. One consequence is that, starting 2004-01-01, operators entered up to 10 characters of descriptive codes with each GPS waypoint taken. This information is processed and distributed throughout the SWERB data, but the various Garmincode columns retain the raw data as entered by the operator.[116] Before 2004-01-01 the Garmincode columns must contain a NULL. On or after this date the Garmincode columns must not be NULL, but may be a string 0 characters long.[117] The SWERB_DATA.Garmincode column is the exception to this rule and may always be NULL. Begin and end rows, rows with SWERB_DATA.Event values of B or E, may have NULL Garmincode values regardless of date so that the data entry staff may supply begin and end rows without X and Y coordinates should the field team forget to record a begin or end row. Other SWERB_DATA rows are exempt from the Garmincode requirement to handle situations, notably those which involve lone animals, where data was written manually for some reason.
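A partial sketch of the Garmincode checks that hold in every table: values are NULL before 2004-01-01 and, when present, may be empty but may not consist only of whitespace (the whitespace rule comes from the Garmincode column descriptions). The on-or-after-2004 NOT NULL requirement is deliberately left out, because the caution below notes that NULLs are tolerated at any date. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

GARMIN_START = date(2004, 1, 1)  # GPS data first downloaded directly from the units

def garmincode_errors(code: Optional[str], when: date) -> list[str]:
    errors: list[str] = []
    if code is not None:
        if when < GARMIN_START:
            errors.append("Garmincode must be NULL before 2004-01-01")
        # The empty string is allowed; a non-empty, whitespace-only string is not.
        if code != "" and code.strip() == "":
            errors.append("Garmincode may not be only whitespace")
    return errors
```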

Before 2004-01-01 the GPS_Datetime columns must be NULL. The date portion of the GPS_Datetime columns must correspond to the date related to the containing row. The time portion of the GPS_Datetime column is not validated, although the time portion of the GPS_Datetime value occasionally serves as data against which other columns are validated.

Caution

The Garmincode and GPS_Datetime columns may be NULL, without warning, no matter the date. This is to accommodate the manual recording of data taken using GPS units.[118]

Note

Data is validated per-observation team, per-group, per-day. Data upload and maintenance must be done within transactions that produce valid per-observation team, per-group, per-day data sets.

Note that it may be more convenient to use the views that support the SWERB data than to access the raw data.

AERIALS (Aerial photos)

This table contains one row for every aerial photo used in the specification of the map quadrant system used in the early SWERB data.

Aerial (Aerial Identifier)

A unique identifier of the aerial photo. This is an integer greater than or equal to 1. It is used to refer to a particular aerial photo.

This column may not be NULL.

Date

The date the aerial photo was taken. This column may not be NULL.

GPS_UNITS (Individual GPS Devices)

This table contains one row for each GPS unit which has been used in the field.

Note

In fact, early records of unit identification may have been lost. In such cases a row in GPS_UNITS represents a number of units having the same capabilities (i.e. of the same make and model). For further information see the SWERB Notebook.

The date the unit was first used (Start) must be on or before the date the unit was last used (Finish).

The label on the GPS unit, the Label value, must be unique within the time period during which the GPS unit was in use, between the Start and Finish dates, inclusive.
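The Label rule amounts to an interval-overlap test: two units with the same label conflict when their inclusive Start to Finish ranges share at least one day. A minimal sketch, assuming a NULL Finish means the unit is still in service so its range is open-ended; the `GpsUnit` class and `label_conflicts` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Optional

@dataclass
class GpsUnit:
    gps: int                # GPS_UNITS.GPS
    label: str              # GPS_UNITS.Label
    start: date             # GPS_UNITS.Start
    finish: Optional[date]  # GPS_UNITS.Finish; None while still in service

def label_conflicts(units: list[GpsUnit]) -> list[tuple[int, int]]:
    """Return pairs of GPS ids sharing a Label during overlapping periods of use."""
    conflicts: list[tuple[int, int]] = []
    for a, b in combinations(units, 2):
        if a.label != b.label:
            continue
        # An open-ended (NULL Finish) period extends indefinitely.
        a_end = a.finish if a.finish is not None else date.max
        b_end = b.finish if b.finish is not None else date.max
        # Inclusive ranges overlap when each starts on or before the other ends.
        if a.start <= b_end and b.start <= a_end:
            conflicts.append((a.gps, b.gps))
    return conflicts
```

Because the ranges are inclusive, a unit retired on a given day and another unit with the same label entering service on that same day are reported as a conflict.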

GPS (GPS unit identifier)

A 2 digit non-negative numeric value that identifies the GPS unit as a distinct object throughout all time.

This column may not be NULL.

Descr (Description)

A short textual description of the GPS unit. If necessary this may include additional notes on such details as when the unit was used, its purpose, and so forth.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may not be NULL.

Make (Manufacturer)

The manufacturer of the GPS unit.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may not be NULL.

Model

The model of the GPS unit. This should be sufficiently detailed that the technical specifications of the unit can be found given this information.[119]

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may not be NULL.

Errortype (Type of Error reporting)

The type of error the unit reports. This must be one of:

PDOP

The error is supplied as positional dilution of precision.

accuracy

The error is in meters.

See the SWERB Data overview for more information.

This column may not be NULL.

Label (Identifying letter marked on the unit)

The letter code marked on the unit. Note that this information is not enough to uniquely identify the unit because the same letter codes have been used on different units at different times.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may not be NULL.

Start (Date of first use)

The date the GPS device entered service. This date cannot be before 1993-09-01, the date GPS units were first used. This column may not be NULL.

Finish (Date of last use)

The date the GPS unit was taken out of service. This column may be NULL when the unit is still in service.

QUAD_DATA (map Quadrants)

The QUAD_DATA table contains one row for every map quadrant. For more information on the format of the quadrant identifiers and their history see the SWERB Notebook.

Note

Map quadrants were used to signify location in SWERB data collected before 1981-11-01.

Note

The QUADS view can be used to maintain the QUAD_DATA table. This view may also be more useful than the table when querying.

Quad (map Quadrant identifier)

The unique identifier code used to refer to a particular map quadrant.[120] This column may not be NULL.

XYLoc (X and Y WGS 1984 UTM Zone 37South coordinates)

The X and Y WGS 1984 UTM Zone 37South coordinates of the centroid of the map quadrant. This column may be NULL.

See the SWERB Data overview for more information.

Aerial (Aerial photo Identifier)

Code indicating the aerial photo in which the map quadrant is located, if any. Must be a value on the AERIALS table.

This column may be NULL when there is no aerial photo for the map quadrant.

SWERB_BES (Begin/Ends: Uninterrupted bouts of group-level observation)

This table contains one row for every uninterrupted bout of group-level observation for which there is SWERB data.

Start and Stop values are automatically assigned the SWERB_DATA.Time values of the related SWERB_DATA rows having Event values of B and E, respectively. The beginning and end of the bout of observation are determined by the begin and end rows entered in the field (or determined by the data manager).

Start must be NULL or be after the related SWERB_DEPARTS_DATA.Time, if any.

The Start value records the start of the day's observation of the group when there exists a related SWERB_DATA.Event value of B and that value is the first for that group/day and there is no earlier SWERB_DATA.Event E value. Likewise the Stop value records the end of the day's observation of the group when there exists a related SWERB_DATA.Event value of E and that value is the last for that group/day and there is no later SWERB_DATA.Event B value. The Start time cannot be after the Stop time.

The Btimeest value is only meaningful when either there is a begin time value or investigation of existing records indicates that there is no record of a begin time on file -- when either the Start value is non-NULL or the Bsource value is NR. The Etimeest value is only meaningful when either there is an end time value or investigation of existing records indicates that there is no record of an end time on file -- when either the Stop value is non-NULL or the Esource value is NR. When the values in these columns are meaningful they must contain a non-NULL value; otherwise they must contain a NULL value.[121] When the source of the start or stop time is NR, the corresponding estimated time flag must be FALSE and the time must be NULL.[122][123] It is required that there be a record of whether the start and stop times are estimated when there are start and stop times -- the Start and Stop columns cannot be non-NULL when the Btimeest and Etimeest columns, respectively, are NULL.[124] It is required that there be a record of the source of the start and stop times when there are start and stop times -- the Bsource and Esource values must be NULL unless, respectively, the Btimeest and Etimeest values are non-NULL.
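The begin-time rules in the paragraph above can be expressed as a small check, shown here for Start, Btimeest, and Bsource; the Stop, Etimeest, and Esource checks would be symmetric. This is a hypothetical sketch that follows only the rules stated in that paragraph, with None standing in for NULL; it is not the actual Babase trigger code.

```python
from datetime import time
from typing import Optional

def begin_estimate_errors(start: Optional[time], btimeest: Optional[bool],
                          bsource: Optional[str]) -> list[str]:
    """Checks Start/Btimeest/Bsource; the Stop/Etimeest/Esource checks mirror these."""
    errors: list[str] = []
    # Btimeest is meaningful when there is a Start or the source is NR (no record).
    meaningful = start is not None or bsource == "NR"
    if meaningful and btimeest is None:
        errors.append("Btimeest must be non-NULL")
    if not meaningful and btimeest is not None:
        errors.append("Btimeest must be NULL")
    # With no record on file, the time cannot be estimated and must be absent.
    if bsource == "NR" and (btimeest is not False or start is not None):
        errors.append("NR source requires Btimeest FALSE and a NULL Start")
    if bsource is not None and btimeest is None:
        errors.append("Bsource requires a non-NULL Btimeest")
    return errors
```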

SWERB_BES rows are automatically sequenced by Start value when no Seq is specified.[125] If the Start value is NULL, the row is sequenced last among all existing SWERB_BES rows for the group/day when initially inserted and is otherwise not automatically sequenced.[126] In the case of a tie, automatic sequencing places the newly inserted row[127] last among the tied rows. Seq values may be manually assigned so long as the manual sequencing does not result in out-of-order Start values or, in those cases where Start is NULL, so long as the manually assigned sequence number is less than or equal to that which would be automatically assigned.[128]
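One way to picture automatic sequencing is as choosing an insertion position in the group/day's existing list of bouts. The sketch below assumes, since the text does not say, that existing rows with a NULL Start are left where they are and that a new row with a non-NULL Start is placed immediately after the last existing row whose Start is on or before its own, so ties place the new row last. The function names are hypothetical; Seq values are simply the 1-based positions in the returned list, which keeps them contiguous.

```python
from datetime import time
from typing import Optional

def insert_position(existing: list[Optional[time]], new_start: Optional[time]) -> int:
    """Return the 1-based Seq a new bout would receive.

    existing holds the Start values of the group/day's bouts, in Seq order.
    """
    if new_start is None:
        return len(existing) + 1  # NULL Start: sequenced last when inserted
    position = 0
    for seq, start in enumerate(existing, start=1):
        # Assumption: NULL-Start rows never determine the insert point.
        if start is not None and start <= new_start:
            position = seq  # "<=" puts the new row after any tied rows
    return position + 1

def insert_bout(existing: list[Optional[time]], new_start: Optional[time]) -> list[Optional[time]]:
    """Return the Start values in their new Seq order (Seq = index + 1)."""
    seq = insert_position(existing, new_start)
    return existing[:seq - 1] + [new_start] + existing[seq - 1:]
```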

As expected, changing the Start value (via a SWERB_DATA row with an Event value which indicates the start of observation) will automatically change the Seq value. Should there be other SWERB_BES rows for that group/day with the same SWERB_BES.Start value, the newly changed row will be sequenced after the existing rows.[129]

Every bout of observation must have exactly one beginning -- there must be exactly one related row on SWERB_DATA with an Event of B. Every bout of observation must have exactly one end -- there must be exactly one related row on SWERB_DATA with an Event of E. These requirements are enforced on transaction commit, so the SWERB_BES row and the begin and end SWERB_DATA rows must all be created within a single transaction. The system will generate a warning when there are no observations in a bout of observation -- when there are no related SWERB_DATA rows with Event values other than B and E.
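Because the one-B, one-E requirement is checked at commit, it can be thought of as a check over all of a bout's SWERB_DATA rows at once. A hypothetical sketch, taking the Event codes of the rows related to a single SWERB_BES row:

```python
from collections import Counter

def bout_problems(events: list[str]) -> tuple[list[str], list[str]]:
    """events: the Event codes of every SWERB_DATA row related to one SWERB_BES row."""
    counts = Counter(events)  # missing codes count as 0
    errors: list[str] = []
    warnings: list[str] = []
    if counts["B"] != 1:
        errors.append(f"expected exactly one B row, found {counts['B']}")
    if counts["E"] != 1:
        errors.append(f"expected exactly one E row, found {counts['E']}")
    # A bout with nothing but begin and end rows is legal but suspicious.
    if sum(n for code, n in counts.items() if code not in ("B", "E")) == 0:
        warnings.append("bout has no observations other than B and E")
    return errors, warnings
```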

The focal group, Focal_grp, must be in existence, based on GROUPS.Start and GROUPS.Cease_To_Exist, on the date of the observation.

BEId (Begin/End Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular bout of uninterrupted observation.

This column is automatically maintained by the database[130], cannot be changed, and must not be NULL.

DId (Departure Identifier)

The id of the SWERB_DEPARTS_DATA row representing the departure from camp of the observation team. This column cannot be changed.[131] This column must not be NULL.

Focal_grp

The group under observation. The legal values for this column are from the Gid column of the GROUPS table. This column cannot be changed.[132] This column may not be NULL.

Start (observation Starting time)

The time the bout of observation started. The time may not be before 05:00 and may not be after 20:00. The time must be on the minute mark; the seconds must be zero. This column may be NULL when the start of observation is unknown.

Btimeest (Begin Time is Estimated)

TRUE when the Start value is an estimation of the time the daily observation of the group began. FALSE otherwise. This column should be NULL when the Start time is the start of an uninterrupted bout of observation but is not the start of the day's observation of a group.

Bsource (Begin time estimation Source)

The source of the data used to estimate the Start value when that value is estimated and represents the start of the day's observation of the group -- how the start of the daily observation of the group was estimated. The legal values of this column are defined by the SWERB_TIME_SOURCES table. This column must be NULL when the Start time is the start of an uninterrupted bout of observation but is not the start of the day's observation of a group.

Stop (observation ending time)

The time the bout of observation ended. The time may not be before 05:00 and may not be after 20:00. The time must be on the minute mark; the seconds must be zero. This column may be NULL when the end of observation is unknown.

Etimeest (End Time is Estimated)

TRUE when the Stop value is an estimation of the time the daily observation of the group ended. FALSE otherwise. This column should be NULL when the Stop time is the end of an uninterrupted bout of observation but is not the end of the day's observation of a group.

Esource (End time estimation Source)

The source of the data used to estimate the Stop value when that value is estimated and represents the end of the day's observation of the group -- how the end of the daily observation of the group was estimated. The legal values of this column are defined by the SWERB_TIME_SOURCES table. This column must be NULL when the Stop time is the end of an uninterrupted bout of observation but is not the end of the day's observation of a group.

Seq (daily per-group Sequence number)

A sequence number indicating the ordering of the bouts of uninterrupted observation of each group each day. The first bout of observation for the group for the day has a Seq value of 1, the second a value of 2, etc.

The system automatically re-computes Seq values to ensure that they are contiguous and begin with 1. See the overview of the SWERB_BES table and the Automatic Sequencing section for further information.

Is_Effort (does the bout count toward observer Effort)

A boolean value. TRUE means that the bout of observation counts toward total observer effort. FALSE means that the bout is concurrent with another bout of observation by the same team and should not count toward observer effort.

This column cannot be NULL.

Notes (Notes on the bout of observation)

Notes, if any, on the bout of observation. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

This column may be NULL.

SWERB_DATA (Group Level GPS Point Samples)

This table contains one row for every event related to group-level geolocation.[133] Such events geolocate a group upon the occurrence of a significant activity, including but not limited to ascent, descent, and drinking. Other events include geolocation at regular intervals and the begin and end of each bout of uninterrupted observation.

Note

The typical Babase user may find the SWERB view to be easier to query than SWERB_DATA and its related tables. It may be easier to use the SWERB_DATA_XY view to maintain SWERB_DATA than it is to modify the table content directly.

Rows with an Event value of O are not part of an observation bout of the focal group and so, unless the observed group is a Subgroup[134] or is the unknown group[135], must have a Seen_grp value which differs from that of the group under observation -- the SWERB_BES.Focal_grp value of the related SWERB_BES row. Likewise, rows which do not have an Event value of O must have the Seen_grp value of the group under observation -- a value which equals the SWERB_BES.Focal_grp value in the related SWERB_BES row.[136] The system will generate a warning when a SWERB_DATA row for a non-focal group records a subgroup of the focal group itself -- when Event is O, Subgroup is TRUE, and SWERB_DATA.Seen_grp is the same as the related SWERB_BES.Focal_grp.
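The Seen_grp rules can be sketched as follows. The identifier of the unknown group is passed in as a parameter because it is not given in this section, and group identifiers are handled as strings (for example "10.0") to avoid floating-point comparisons. The function name is hypothetical.

```python
def seen_grp_check(event: str, seen_grp: str, focal_grp: str, subgroup: bool,
                   unknown_grp: str) -> tuple[list[str], list[str]]:
    errors: list[str] = []
    warnings: list[str] = []
    if event == "O":
        # Subgroups and the unknown group are exempt from the "must differ" rule.
        exempt = subgroup or seen_grp == unknown_grp
        if not exempt and seen_grp == focal_grp:
            errors.append("other-group row must name a non-focal group")
        if subgroup and seen_grp == focal_grp:
            warnings.append("other-group row is a subgroup of the focal group")
    elif seen_grp != focal_grp:
        errors.append("non-O row must name the focal group")
    return errors, warnings
```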

Per bout of observation, per BEId, there must be exactly one SWERB_DATA row recording the start and exactly one recording the finish of the bout -- exactly one SWERB_DATA row having an Event value of B and exactly one having an E value, respectively.

The time of the observation must be between the start and stop times of the bout of observation -- the Time value must be between (inclusive) the related SWERB_BES.Start and SWERB_BES.Stop values. Because SWERB_BES.Start may be NULL, the Time value is also checked to be sure that it is not before the time the observation team departed from camp, SWERB_DEPARTS_DATA.Time. Because SWERB_DEPARTS_DATA.Time may also be NULL, the Time value is checked to be sure that it is not before 05:00. Because SWERB_BES.Stop may be NULL, the Time value is checked to be sure that it is not after 20:00.
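A sketch of the Time bounds check. It applies every available bound cumulatively, as the wording "also checked" suggests, skipping bounds that are NULL (None); whether Babase applies the 05:00 floor even when an earlier departure time is known is an assumption here. The function name is hypothetical, and the check only applies when Time itself is non-NULL.

```python
from datetime import time
from typing import Optional

def time_in_bout(t: time, bout_start: Optional[time], bout_stop: Optional[time],
                 departed: Optional[time]) -> bool:
    """Apply every available bound; NULL (None) bounds are skipped."""
    lower_bounds = [b for b in (bout_start, departed, time(5, 0)) if b is not None]
    upper_bounds = [b for b in (bout_stop, time(20, 0)) if b is not None]
    return all(t >= b for b in lower_bounds) and all(t <= b for b in upper_bounds)
```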

The date portion of the GPS_Datetime value must be the date of the observation team's departure from camp -- it must equal the related SWERB_DEPARTS_DATA.Date value. The waypoint time recorded by the operator cannot be more than 15 minutes before the actual time the observation was taken -- the Time value cannot be more than 15 minutes before the time portion of the GPS_Datetime value. The exception to this rule is when a group drinks from a waterhole; for these waterhole events the waypoint time cannot be more than 30 minutes before the actual time the observation was taken. The waypoint time recorded by the operator cannot be more than 5 minutes after the actual time the observation was taken -- the Time value cannot be more than 5 minutes after the time portion of the GPS_Datetime value.
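The tolerance window between the operator-recorded Time and the GPS-supplied timestamp can be sketched as below. This hypothetical helper takes the row's Event code, its Time, its GPS_Datetime, and the related SWERB_DEPARTS_DATA.Date; it treats Event W as the waterhole case that allows 30 minutes rather than 15.

```python
from datetime import date, datetime, time, timedelta

def gps_time_errors(event: str, recorded: time, gps_datetime: datetime,
                    departure_date: date) -> list[str]:
    errors: list[str] = []
    if gps_datetime.date() != departure_date:
        errors.append("GPS_Datetime date differs from the departure date")
    # Put both times on the same date so they can be subtracted.
    recorded_dt = datetime.combine(gps_datetime.date(), recorded)
    early_limit = timedelta(minutes=30 if event == "W" else 15)
    if gps_datetime - recorded_dt > early_limit:
        errors.append("Time is too far before the GPS time")
    if recorded_dt - gps_datetime > timedelta(minutes=5):
        errors.append("Time is too far after the GPS time")
    return errors
```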

The Quad column records group location based on map quadrants and is used only in older data. Data recorded after 1994-09-30, rows associated with SWERB_DEPARTS_DATA rows with Date values after 1994-09-30, must have NULL Quad values. GPS units were used in later SWERB data collection so data recorded before 1993-09-01, rows associated with SWERB_DEPARTS_DATA rows having Date values before 1993-09-01, must have NULL XYLoc values.

Only data collected using GPS units have altitude, PDOP, accuracy, a GPS timestamp, or Garmincode values -- when the XYLoc column is NULL then the Altitude, PDOP, Accuracy, GPS_Datetime, and Garmincode values must also be NULL.
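The date cutoffs for Quad and XYLoc, and the dependence of the GPS-only columns on XYLoc, can be sketched as a single check. The function name and the dictionary used to pass the GPS-only column values are hypothetical, and None stands in for NULL. Note that data from 1993-09-01 through 1994-09-30 may carry both a Quad and an XYLoc.

```python
from datetime import date
from typing import Optional

QUAD_LAST_DATE = date(1994, 9, 30)  # Quad must be NULL after this date
GPS_FIRST_DATE = date(1993, 9, 1)   # XYLoc must be NULL before this date

def location_errors(departure_date: date, quad: Optional[str],
                    xyloc: Optional[tuple[float, float]],
                    gps_only: dict[str, object]) -> list[str]:
    """gps_only maps Altitude, PDOP, Accuracy, GPS_Datetime, Garmincode to their values."""
    errors: list[str] = []
    if quad is not None and departure_date > QUAD_LAST_DATE:
        errors.append("Quad must be NULL after 1994-09-30")
    if xyloc is not None and departure_date < GPS_FIRST_DATE:
        errors.append("XYLoc must be NULL before 1993-09-01")
    if xyloc is None:
        # Without a GPS location none of the GPS-derived columns may be filled in.
        for column, value in gps_only.items():
            if value is not None:
                errors.append(f"{column} must be NULL when XYLoc is NULL")
    return errors
```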

The observed lone animal must be NULL unless the waypoint is an observation of a lone animal/non-focal group -- Lone_Animal must be NULL unless Event is O.

Note

An other-group observation of an unknown lone animal is recorded in a SWERB_DATA row having a NULL Lone_Animal value and a Seen_grp value of 10.0, the group denoting a lone animal.

The observer's distance from the observed lone animal or non-focal group must be NULL unless the waypoint is an observation of a lone animal or non-focal group -- Ogdistance must be NULL unless Event is O.

The observed group, Seen_grp, must be in existence, based on GROUPS.Start and GROUPS.Cease_To_Exist, on the date of the observation.

An observed lone animal, Lone_Animal, must have already entered the study population and must be alive on the date of observation -- the SWERB_DEPARTS_DATA.Date related to the SWERB_DATA row must be on or after the individual's BIOGRAPH.Entrydate and cannot be later than the individual's BIOGRAPH.Statdate. The system will generate a warning if a lone animal is a male and is observed more than 60 days before his assigned dispersal date -- before DISPERSEDATES.Dispersed.

When a lone individual is observed the observed group must be the group reserved for lone animals -- when SWERB_DATA.Lone_Animal is non-NULL then SWERB_DATA.Seen_grp must be 10.0.
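The lone-animal and Ogdistance rules above are collected in the sketch below. The `Animal` class is a hypothetical stand-in for the relevant BIOGRAPH and DISPERSEDATES values; in particular, representing sex as `"M"`/`"F"` is an assumption, and `when` is the related SWERB_DEPARTS_DATA.Date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

LONE_ANIMAL_GRP = "10.0"  # the group reserved for lone animals

@dataclass
class Animal:
    sname: str
    sex: str                   # assumption: "M" or "F"
    entrydate: date            # BIOGRAPH.Entrydate
    statdate: date             # BIOGRAPH.Statdate
    dispersed: Optional[date]  # DISPERSEDATES.Dispersed, if any

def lone_animal_check(event: str, seen_grp: str, lone: Optional[Animal],
                      ogdistance: Optional[int], when: date) -> tuple[list[str], list[str]]:
    errors: list[str] = []
    warnings: list[str] = []
    if event != "O" and ogdistance is not None:
        errors.append("Ogdistance must be NULL unless Event is O")
    if lone is None:
        return errors, warnings
    if event != "O":
        errors.append("Lone_Animal must be NULL unless Event is O")
    if seen_grp != LONE_ANIMAL_GRP:
        errors.append("a lone animal must be recorded in group 10.0")
    if not (lone.entrydate <= when <= lone.statdate):
        errors.append("lone animal not in the study population on this date")
    # Males seen alone well before their dispersal date are suspicious.
    if lone.sex == "M" and lone.dispersed is not None \
            and when < lone.dispersed - timedelta(days=60):
        warnings.append("male seen alone more than 60 days before his dispersal date")
    return errors, warnings
```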

Caution

Interpolation does not reference SWERB data when making its computations. Consequently the MEMBERS table does not reflect SWERB sightings of lone individuals -- unless those sightings are otherwise recorded in the DEMOG table.

SWId (SWerb event Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular GPS event.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

BEId (Begin/End Identifier)

The id of the SWERB_BES row representing the bout of uninterrupted observation of which the SWERB_DATA row is a part. This column cannot be changed and must not be NULL.

Seen_grp

The group under observation. Note that this is not always the focal group which the observation team set out to observe. For further details see the Protocol for Data Management: Amboseli Baboon Project. The legal values for this column are from the Gid column of the GROUPS table.

This column may not be NULL.

Lone_Animal

The BIOGRAPH.Sname of the observed lone animal.

This column may be NULL to indicate either that a lone animal was not observed or that an unknown lone male was observed.

Event (half hourly/hourly, watering, begin, end, other group)

A code indicating what sort of event the row represents. The following codes are defined:

B

The row represents the beginning of a bout of uninterrupted observation of the focal group.

E

The row represents the end of a bout of uninterrupted observation of the focal group.

H

The row represents an observation of the focal group. These occur at half-hourly or hourly intervals, depending on the protocol used to record the data. For further information see the SWERB Notebook.

W

The row records the focal group's drinking.

O

The row represents the observation of a non-focal group or a lone animal. For further information see the SWERB Notebook.

This column may not be NULL.

Time

The time of the observation. This is usually entered manually by the observer, but in those cases where the observer does not enter a time (such as begin and end rows) the SWERB_UPLOAD view may use GPS-supplied information to calculate a time. See the section on the SWERB_UPLOAD.Description column. The time must be on the minute mark; the seconds must be zero. This column may be NULL when the time is not known.

Quad (map Quadrant)

The map quadrant of the seen group's location, when recorded in the field. The legal values for this column are from the Quad column of the QUAD_DATA table.

This column may be NULL.

XYLoc (X and Y WGS 1984 UTM Zone 37South coordinates)

The X and Y WGS 1984 UTM Zone 37South coordinates of the seen group. This column may be NULL.

See the SWERB Data overview for more information.

Altitude

The altitude, in meters, of the landscape on which the seen group is located. This column may be NULL.

See the SWERB Data overview for more information.

PDOP (error in Positional Dilution Of Precision)

The amount of error reported as positional dilution of precision. This column may be NULL when there is no PDOP information.

See the SWERB Data overview for more information.

Accuracy (in meters)

The accuracy of the GPS reading, in meters. This column may be NULL when there is no accuracy information in meters.

See the SWERB Data overview for more information.

Subgroup

TRUE when the observation is of a subgroup, FALSE when not. See the SWERB Notebook for further information.

Note that the field team cannot always record subgroup information and the value in this column is therefore sometimes determined heuristically[137] when the data is uploaded by the SWERB_UPLOAD view.

This column must not be NULL.

Ogdistance (Distance to Other Group)

The distance, in meters, between the observer and the observed non-focal group or the observer and the observed lone animal. This value must be a 3 digit non-negative integer having a last digit of 0.

This column may be NULL when the observers did not record an Ogdistance (i.e. NULL values are not to be confused with zero distance).

GPS_Datetime (GPS supplied Date and Time)

The date and time automatically supplied by the GPS unit at the time the waypoint was recorded. For further information on when this column is NULL and when non-NULL see the SWERB Data overview.

This column may be NULL.

Garmincode (operator supplied waypoint value)

The information manually entered by the observer into the GPS unit as a coded waypoint that describes the SWERB data being recorded. This column may be empty, it need not contain characters, but it may not contain only whitespace characters. For further information on the content of this column see the SWERB Notebook. For further information on when this column is NULL and when non-NULL see the SWERB Data overview.

This column may be NULL. See the SWERB Data overview for more information.

SWERB_DEPARTS_DATA (Observation team departures from camp)

This table contains one row for every departure from camp of every observation team, for those observation teams which have collected SWERB data.

The Time value may not be NULL when there is a related SWERB_DEPARTS_GPS row -- data collected using the GPS units must have a non-NULL time.

One observer may not depart camp on the same day at the same time with two different observation teams -- the combination of SWERB_DEPARTS_DATA.Date, SWERB_DEPARTS_DATA.Time, and SWERB_OBSERVERS.Observer, when all are non-NULL, must be unique.

The system will generate a warning for SWERB_DEPARTS_DATA rows having a Date after 1994-09-30 that do not also have a related SWERB_DEPARTS_GPS row.

The system will generate a warning for SWERB_DEPARTS_DATA rows for which no SWERB data was collected -- rows that do not have a related SWERB_BES row.

Note

The SWERB_DEPARTS view can be used to maintain the SWERB_DEPARTS_DATA table. This view may also be more useful than the table when querying.

Warning

At the time of this writing, departure data prior to about March of 2011 is not in the database. The process used to load historical data fabricates the minimal required departure information, with the exception of the departure date, for which the actual departure date is used. The early process the Data Manager used to load data from the GPS units sometimes removed departure information. For further information and exact dates see the Data Manager's [Process for Uploading SWERB] document.

DId (Departure Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular departure from camp of a particular observation team.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Date

The date of departure. This date must be on or after 1981-11-01. This column must not be NULL.

Time

The time of departure. The time may not be before 04:00 and may not be after 20:00. The system will generate a warning if the time is before 05:00 or after 14:30. The time must be on the minute mark; the seconds must be zero. This column may be NULL.
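The departure-time rules distinguish hard limits (04:00 to 20:00, on the minute) from the narrower 05:00 to 14:30 range that only triggers a warning. A hypothetical sketch, with None standing in for a NULL time:

```python
from datetime import time
from typing import Optional

def departure_time_check(t: Optional[time]) -> tuple[list[str], list[str]]:
    errors: list[str] = []
    warnings: list[str] = []
    if t is None:
        return errors, warnings  # a NULL departure time is allowed
    if not (time(4, 0) <= t <= time(20, 0)):
        errors.append("departure time must be between 04:00 and 20:00")
    elif t < time(5, 0) or t > time(14, 30):
        warnings.append("unusual departure time")
    if t.second != 0 or t.microsecond != 0:
        errors.append("departure time must be on the minute mark")
    return errors, warnings
```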

SWERB_DEPARTS_GPS (SWERB GPS Departure data)

This table contains one row for every departure from camp of every observation team, for those observation teams which have collected SWERB data using GPS units. This table is an extension of the SWERB_DEPARTS_DATA table that contains the additional information collected when a GPS unit is used to record the departure. There is at most one row in this table for every row in SWERB_DEPARTS_DATA. When a row exists it contains the information involving the GPS unit used by the observation team on that day. All SWERB_DEPARTS_DATA rows having associated SWERB_DEPARTS_GPS rows must have SWERB_DEPARTS_DATA.Date values on or after 1993-09-01.

The date of departure (SWERB_DEPARTS_DATA.Date) must be between the Start and Finish dates of the related GPS unit (GPS_UNITS.Start and GPS_UNITS.Finish), inclusive.

Note

The SWERB_DEPARTS view can be used to maintain the SWERB_DEPARTS_GPS table. This view may also be more useful than the table when querying.

The system will generate a warning when there is more than one departure per GPS unit per day.
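The per-unit, per-day warning can be sketched as a simple count over (GPS, departure date) pairs. The function name and the pair representation are hypothetical.

```python
from collections import Counter
from datetime import date

def duplicate_gps_days(departures: list[tuple[int, date]]) -> list[tuple[int, date]]:
    """departures: one (GPS, SWERB_DEPARTS_DATA.Date) pair per SWERB_DEPARTS_GPS row.

    Returns the (GPS, date) pairs that would produce a warning.
    """
    counts = Counter(departures)
    return sorted(pair for pair, n in counts.items() if n > 1)
```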

DId (Departure Identifier)

The id of the SWERB_DEPARTS_DATA row representing the departure from camp of the observation team. This column cannot be changed and must not be NULL.

XYLoc (X and Y WGS 1984 UTM Zone 37South coordinates)

The X and Y WGS 1984 UTM Zone 37South coordinates at departure. This column must not be NULL.

See the SWERB Data overview for more information.

Altitude

The altitude in meters of the GPS unit. This column may be NULL.

See the SWERB Data overview for more information.

PDOP (error in Positional Dilution Of Precision)

The error reported as positional dilution of precision. This column may be NULL.

See the SWERB Data overview for more information.

Accuracy (in meters)

The error reported in meters. This column may be NULL.

See the SWERB Data overview for more information.

GPS (GPS used by the team)

The identifier of the GPS device (the GPS_UNITS.GPS) used by the observation team. The legal values of this column are defined by the GPS_UNITS support table.

This column must not be NULL.

Garmincode (operator supplied waypoint value)

The information manually entered into the waypoint by the observer. This is a set of mostly single-character codes that describe the SWERB data being recorded. This column may be empty, it need not contain characters, but it may not contain only whitespace characters. For further information on the content of this column see the SWERB Notebook. For further information on when this column is NULL and when non-NULL see the SWERB Data overview.

This column may be NULL. See the SWERB Data overview for more information.

SWERB_GWS (SWERB Grove and Waterholes)

This table contains one row for every geolocated physical object, that is, for every grove and waterhole.[138]

Caution

This table may contain one row with special meaning. The SWERB_GWS row with a Loc value of UNK represents the unknown grove -- a grove with special properties. When a SWERB_GWS row exists with a SWERB_GWS.Loc value of UNK then the Type value must be G (grove). No trees may be located in the unknown grove -- TREES.Loc may not be UNK. The unknown grove may not be located anywhere -- SWERB_GW_LOC_DATA.Loc may not be UNK. And when it is not known where a group slept there can be no uncertainty regarding the sleeping grove -- when SWERB_LOC_DATA.Loc is UNK then SWERB_LOC_DATA.Conf must be C (certain).

SWERB_GWS rows that represent groves, those with a SWERB_GWS.Type of G, have restrictions on the allowed Loc values because of the data structure supplied to the SWERB_UPLOAD view (the Name column sometimes contains a grove code prefixed with the letter P). There cannot be two grove codes where one begins with the letter P and the other consists of exactly the same characters with the initial P omitted.[139] Because of this restriction the Babase administrator is the only user allowed to create Loc values which begin with the letter P.
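
The P-prefix restriction on grove codes can be checked mechanically. The following Python sketch is not part of Babase; it assumes the grove codes are available as a list of SWERB_GWS.Loc strings for rows whose Type is G, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def conflicting_grove_codes(codes: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (code, 'P' + code) pairs where both codes are present.

    `codes` is assumed to hold SWERB_GWS.Loc values of grove (Type G) rows.
    """
    code_set = set(codes)
    conflicts = []
    for code in sorted(code_set):
        prefixed = "P" + code
        # A grove code and the same code with a leading P may not coexist.
        if prefixed in code_set:
            conflicts.append((code, prefixed))
    return conflicts
```

For example, the codes AB, PAB, and CD conflict because PAB is AB with a leading P, while PX and Y do not.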

With the exception of the unknown grove, the system will report a warning when the grove or waterhole has not been geolocated -- when there is no related SWERB_GW_LOC_DATA row.

Loc (Location)

A unique identifier. Up to 4 alphanumeric non-lowercase characters that uniquely identify the row and may be used to refer to the grove or waterhole.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column cannot be changed and must not be NULL.

Type (Type of place)

The type of place: grove, waterhole, or some other landmark. The legal values for this column are from the Place column of the PLACE_TYPES (codes for various landscape features) table.

This column must not be NULL.

Altname (Alternative Name)

Up to 20 characters of alternative name for the grove or waterhole.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may be NULL.

Start (Starting date)

The date when the grove or waterhole was named. This date cannot be before 1981-11-01.

This column must not be NULL.

Finish (Finish date)

The date of last known use after which the resource became permanently unavailable.

This column may be NULL when observations are ongoing or the row represents an object that cannot become unavailable.

Notes

Textual notes on the grove or waterhole, if any.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may be NULL.

SWERB_GW_LOC_DATA (SWERB Grove/Waterhole Location Data)

This table contains one row for each time the location of a place (a grove or waterhole) is recorded. Any given grove or waterhole may have its location recorded more than once.

Note

The typical Babase user may find the SWERB_GW_LOCS view to be easier to query than SWERB_GW_LOC_DATA and its related tables. It may be easier to use the SWERB_GW_LOC_DATA_XY view to maintain SWERB_GW_LOC_DATA than it is to modify the table content directly.

The date related to the location (SWERB_GW_LOC_DATA.Date) may not be before the grove or waterhole was first observed; it may not be before the related SWERB_GWS.Start value. Nor may it be after the grove or waterhole ceases to exist; it may not be after the related SWERB_GWS.Finish value.

The Quad column records location based on map quadrants and is used only in older data. Data recorded after 1994-09-30 (rows with Date values after 1994-09-30) must have NULL Quad values. GPS units were used in later SWERB data collection, so data recorded before 1993-09-01 (rows having Date values before 1993-09-01) must have NULL XYLoc values, unless the UTM XY coordinates were obtained through other means (XYSource is non-NULL).

There can be a source for the recorded X and Y coordinates only when there are recorded UTM coordinates -- the XYSource value may be non-NULL only when XYLoc is non-NULL.

Only data collected using GPS units have altitude, PDOP, accuracy, and GPS values -- when the XYLoc column is NULL then the Altitude, PDOP, Accuracy, GPS values must also be NULL.

The GPS unit used to make the observation must be in service on the date of the observation -- the date of the observation (Date) must be between the Start and Finish dates, inclusive, of the related GPS_UNITS row.
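
The date and NULL rules above can be collected into a single row check. This Python sketch is illustrative only: the function name, parameters, and message strings are hypothetical, None stands for NULL, and the GPS in-service rule is omitted because it needs the related GPS_UNITS row.

```python
from datetime import date
from typing import List, Optional

QUAD_LAST_DATE = date(1994, 9, 30)   # Quad must be NULL after this date
XYLOC_FIRST_DATE = date(1993, 9, 1)  # XYLoc needs an XYSource before this date


def gw_loc_problems(obs_date: date,
                    gw_start: date,
                    gw_finish: Optional[date],
                    quad: Optional[str] = None,
                    xyloc: Optional[tuple] = None,
                    xysource: Optional[str] = None,
                    gps_values: tuple = (None, None, None, None)) -> List[str]:
    """List rule violations for one hypothetical SWERB_GW_LOC_DATA row.

    gps_values holds (Altitude, PDOP, Accuracy, GPS); None stands for NULL.
    """
    problems = []
    # Date must fall within the related SWERB_GWS Start/Finish interval.
    if obs_date < gw_start or (gw_finish is not None and obs_date > gw_finish):
        problems.append("Date outside SWERB_GWS Start/Finish")
    if obs_date > QUAD_LAST_DATE and quad is not None:
        problems.append("Quad must be NULL after 1994-09-30")
    if obs_date < XYLOC_FIRST_DATE and xyloc is not None and xysource is None:
        problems.append("XYLoc before 1993-09-01 needs an XYSource")
    if xysource is not None and xyloc is None:
        problems.append("XYSource requires XYLoc")
    # Only GPS-derived rows (those with XYLoc) may carry GPS-related values.
    if xyloc is None and any(v is not None for v in gps_values):
        problems.append("Altitude/PDOP/Accuracy/GPS require XYLoc")
    return problems
```

An empty list means the row passes every rule the sketch covers.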

SGWLId (SWERB Grove/Waterhole Location Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to an observation which recorded the location of a particular grove or waterhole.

This column is automatically maintained by the database and must not be NULL.

Loc (Location)

The SWERB_GWS.Loc of the grove or waterhole associated with the recorded location.

This column must not be NULL.

Date

The date related to the location. This is either the date the location was calculated or an observation date. See the Protocol for Data Management: Amboseli Baboon Project for further information. This column must not be NULL.

Time

The time of the observation. When the data are taken with a GPS unit this is the time recorded by the GPS unit. The time cannot be before 05:00 and cannot be after 20:00. The time must be on the minute mark; the seconds must be zero. This column may be NULL when the time is not known.
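
The time rule (between 05:00 and 20:00, on the minute mark) is the same one that applies to SWERB_LOC_DATA.ADtime. A minimal Python sketch of the check, with a hypothetical function name and None standing for NULL:

```python
from datetime import time
from typing import Optional

EARLIEST = time(5, 0)
LATEST = time(20, 0)


def valid_observation_time(t: Optional[time]) -> bool:
    """Check a Time (or ADtime) value; None (NULL) is allowed."""
    if t is None:
        return True
    # The value must fall exactly on a minute mark.
    on_minute = t.second == 0 and t.microsecond == 0
    return on_minute and EARLIEST <= t <= LATEST
```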

Quad (map Quadrant)

The map quadrant of the grove or waterhole's location, when recorded. The legal values for this column are from the Quad column of the QUAD_DATA table.

This column may be NULL.

XYSource (Source of X/Y coordinates data)

The source of the UTM coordinate data. The legal values for this column are from the XYSource column of the SWERB_XYSOURCES (SWERB Time Sources) table.

This column may be NULL.

XYLoc (X and Y WGS 1984 UTM Zone 37South coordinates)

The X and Y WGS 1984 UTM Zone 37South coordinates of the grove or waterhole. This column may be NULL.

See the SWERB Data overview for more information.

Altitude

The altitude, in meters, of the grove or waterhole. This column may be NULL.

See the SWERB Data overview for more information.

PDOP (error in Positional Dilution Of Precision)

The error reported as positional dilution of precision. This column may be NULL when there is no PDOP information.

See the SWERB Data overview for more information.

Accuracy (in meters)

The error reported in meters. This column may be NULL when there is no accuracy information in meters.

See the SWERB Data overview for more information.

GPS (GPS used by the team)

The identifier of the GPS device (the GPS_UNITS.GPS) used in the observation. The legal values of this column are defined by the GPS_UNITS support table.

This column may be NULL.

See the SWERB Data overview for more information.

Notes

Textual notes regarding the record of the grove or waterhole's location, if any.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may be NULL.

SWERB_LOC_DATA

This table contains one row every time a group is observed at a geolocated physical object, at a grove or a waterhole or, possibly, some other physical landmark.[140]

SWERB_LOC_DATA rows must place a group at a single location -- each SWERB_DATA row has at most one related SWERB_LOC_DATA row. In effect, SWERB_LOC_DATA extends SWERB_DATA with additional columns.

Tip

When a group splits into subgroups and descends from or ascends into multiple groves there must be a separate bout of observation, another SWERB_BES row, to record the location of each subgroup.

On any given day, the start of each observation team's observation of every group except the unknown group (9.0)[141] must locate the focal group at exactly one grove (possibly the unknown grove) to record descent from the sleeping grove(s), unless all of the descents from sleeping groves are of subgroups, and excepting subgroups which descend from the unknown grove.[142] That is, every SWERB_DATA row with an Event value of B, which represents the start of an observation team's observation of a group on a day, must have exactly one related SWERB_LOC_DATA row with an ADcode value which relates to an ADCODES row having a D ADN value. SWERB_DATA rows with TRUE Subgroup values that are also related to SWERB_LOC_DATA rows with UNK Loc values are ignored, and the requirement does not apply when all of the SWERB_DATA rows with an Event value of B, possibly excepting the first of the day, also have a TRUE Subgroup value. This is how the system knows the SWERB_LOC_DATA row represents descent from a grove. Further, the SWERB_DATA row representing the team's first observation of each group for each day must be related to a SWERB_LOC_DATA row recording descent from a grove. The first bout of a team's observation of a group for a day is the one with the smallest SWERB_BES.Seq value.

On any given day, the end of each observation team's observation of every group except the unknown group (9.0)[143] must locate the focal group at exactly one grove (possibly the unknown grove) to record ascent into the sleeping grove(s), unless all of the ascents into sleeping groves are of subgroups, and excepting subgroups which ascend into the unknown grove.[144] That is, every SWERB_DATA row with an Event value of E, which represents the end of an observation team's observation of a group on a day, must have exactly one related SWERB_LOC_DATA row with an ADcode value which relates to an ADCODES row having an A ADN value. SWERB_DATA rows with TRUE Subgroup values that are also related to SWERB_LOC_DATA rows with UNK Loc values are ignored, and the requirement does not apply when all of the SWERB_DATA rows with an Event value of E, possibly excepting the last of the day, also have a TRUE Subgroup value. This is how the system knows the SWERB_LOC_DATA row represents ascent into a grove. Further, the SWERB_DATA row representing the team's last observation of each group for each day must be related to a SWERB_LOC_DATA row recording ascent into a grove. The last bout of a team's observation of a group for a day is the one with the latest SWERB_BES.Stop time; should any bout of observation have a NULL SWERB_BES.Stop value, the last bout may be either the one with the latest SWERB_BES.Stop time or any one of the bouts with a NULL Stop value. The database rules that enforce these ascent-into-sleeping-grove rules are checked at transaction commit (see Database Transactions Explained).[145]
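
The rules for identifying a team's first and last bout of the day can be sketched in Python. The Bout class and function names are hypothetical stand-ins for SWERB_BES rows. When several bouts tie for the latest Stop time, this sketch treats all of them as candidates, an assumption the rules above do not settle.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Bout:
    """Hypothetical stand-in for one SWERB_BES row."""
    seq: int               # SWERB_BES.Seq
    stop: Optional[time]   # SWERB_BES.Stop; None stands for NULL


def first_bout(bouts: List[Bout]) -> Bout:
    # The first bout of the day is the one with the smallest Seq.
    return min(bouts, key=lambda b: b.seq)


def last_bout_candidates(bouts: List[Bout]) -> List[Bout]:
    # Any bout with a NULL Stop may be the last bout...
    candidates = [b for b in bouts if b.stop is None]
    # ...as may the bout(s) with the latest non-NULL Stop.
    stops = [b.stop for b in bouts if b.stop is not None]
    if stops:
        latest = max(stops)
        candidates += [b for b in bouts if b.stop == latest]
    return sorted(candidates, key=lambda b: b.seq)
```

With bouts stopping at 09:00, at 12:00, and with no recorded stop, both the 12:00 bout and the open bout qualify as the last bout.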

The observations recording descent from or ascent into sleeping groves must be related to groves -- the related SWERB_GWS rows must have a Type of G.

Whether a SWERB_LOC_DATA row must have a NULL ADtime value or must have a non-NULL ADtime value is determined by the related ADCODES.Time flag.[146] Ascent and descent times related to a bout of observation cannot be before the beginning of the bout of observation -- SWERB_LOC_DATA.ADtime cannot be before the related SWERB_BES.Start time.[147] The database rules that enforce ADtime values are checked at transaction commit (see Database Transactions Explained).[148]

Note

Descent and ascent times are recorded manually; they are not taken from the timestamps supplied by the GPS units. This necessitates additional columns for descent and ascent information. For further information see the Amboseli Baboon Research Project Monitoring Guide.

When the location is the unknown grove, confidence in that location must be certitude -- when the Loc value is UNK then the Conf value must be C.

Babase allows SWERB data to record group presence at arbitrary landmarks, but some possibilities are rare and result in a warning. The system will issue a warning when a group is located at a waterhole but the recorded event is not drinking -- when the SWERB_GWS row's Type is W but the related SWERB_DATA row's Event value is not W.

SWERB_DATA rows representing observation of a group drinking at a waterhole must be related to waterholes -- when SWERB_DATA.Event is W there must be a related SWERB_GWS row, even if it is the generic and non-specific row which represents all rainpools, and the related SWERB_GWS row must have a Type value of W. In some cases this check is performed at transaction commit (see Database Transactions Explained) and in other cases it is not.

Rows that record a drinking event, those related to SWERB_DATA rows which have W Event values, must have SWERB_LOC_DATA.ADcode values that indicate no involvement with a sleeping grove; the related ADCODES row must have an ADN value of N.
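
The waterhole rules combine one warning with two errors. A minimal Python sketch, assuming the relevant values have already been looked up: the function name and message strings are hypothetical, and None means the SWERB_DATA row has no related SWERB_LOC_DATA row. The commit-time behavior mentioned above is not modeled.

```python
from typing import List, Optional


def water_location_messages(event: str,
                            place_type: Optional[str],
                            adn: Optional[str]) -> List[str]:
    """Check one SWERB_DATA row against its (optional) placement.

    event      -- SWERB_DATA.Event
    place_type -- SWERB_GWS.Type of the related location, or None
    adn        -- ADCODES.ADN of the related SWERB_LOC_DATA.ADcode, or None
    """
    messages = []
    if event == "W":
        # Drinking must be placed at a waterhole...
        if place_type != "W":
            messages.append("error: drinking must be located at a waterhole")
        # ...and must not involve a sleeping grove.
        if adn is not None and adn != "N":
            messages.append("error: drinking may not involve a sleeping grove")
    elif place_type == "W":
        messages.append("warning: group at a waterhole but Event is not W")
    return messages
```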

Groups may not be located at a place before observations began at the place or after observations ended at the place; the SWERB_DEPARTS_DATA.Date related to the SWERB_DATA row referenced by the SWERB_LOC_DATA.SWId value must not be before the related SWERB_GWS.Start value and must not be after the related SWERB_GWS.Finish value.

SWId (SWERB Identifier)

The number that uniquely identifies the row and may be used to refer to an observation of a group at a particular time at a particular grove or waterhole. This is also the SWERB_DATA.SWId identifying the group, place, and time of the observation.

This column must not be NULL and cannot be changed.

Loc (Location)

The SWERB_GWS.Loc of the object (grove, waterhole, or landmark) where the group was observed.

This column must not be NULL.

ADcode (Ascent/Descent Code)

A code representing the nature of the relationship between the baboon group and the landscape feature at which the SWERB_LOC_DATA row places the group, with particular attention to whether the group slept at the location. The legal values of this column are defined by the ADCODES support table.[149]

This column must not be NULL.

Conf (level of Confidence)

Code indicating the level of confidence in the location on record. Must be a value on the SWERB_LOC_CONFS (SWERB sleeping grove Confidences) table.

Note

Although the database supports degrees of certitude with respect to any group location, in practical terms the only time there will be any degree of uncertainty is with respect to sleeping groves. There are two reasons for this: at present the only provision in the Amboseli Baboon Research Project Monitoring Guide involving uncertainty concerns sleeping groves, and the SWERB_UPLOAD view will only ever enter an indication of uncertainty into the database when the location is a sleeping grove.[150]

This column may not be NULL.

ADtime (Ascent/Descent Time)

The median time of group descent from or ascent into a sleeping grove. See the Amboseli Baboon Research Project Monitoring Guide for information regarding how median descent and ascent times are determined. The time may not be before 05:00 and may not be after 20:00. The time must be on the minute mark; the seconds must be zero. This column may be NULL.

SWERB_LOC_GPS

The SWERB data collection protocol performed using GPS units sometimes requires 2 GPS waypoint entries to record a group's presence at a physical landscape feature. (At the time of this writing, descent from and ascent into sleeping groves require 2 GPS waypoint entries.) This table contains one row every time a group is observed at a geolocated landscape feature and 2 GPS waypoints are required to record the data. Its rows contain the information stored in the second GPS waypoint, information automatically generated by the GPS unit or manually entered into it, that otherwise has no place in the database.

Note

It may be easier to use the SWERB_LOC_GPS_XY view to maintain the SWERB_LOC_GPS table than it is to modify the table content directly.

The SWERB_LOC_GPS table extends the SWERB_LOC_DATA table[151] with additional columns; SWERB_LOC_GPS contains at most one row for every row in SWERB_LOC_DATA.

As described in the SWERB Data overview above, data was first obtained directly from the GPS units on 2004-01-01. Consequently, this table cannot contain rows dated earlier than 2004-01-01.

SWId (SWERB event Identifier)

The number that uniquely identifies the row and may be used to refer to the GPS information involving an observation of a group at a particular time at a particular grove or waterhole. This is also the SWERB_DATA.SWId value, identifying the group, place, and time of the observation, and the SWERB_LOC_DATA.SWId value, identifying the placement of the group at a landscape feature.

This column must not be NULL and cannot be changed.

XYLoc (X and Y WGS 1984 UTM Zone 37South coordinates)

The X and Y WGS 1984 UTM Zone 37South coordinates of the seen group (the group recorded in the related SWERB_DATA row). This column may not be NULL.

See the SWERB Data overview for more information.

Altitude

The altitude, in meters, of the landscape on which the seen group is located. This column may be NULL.

See the SWERB Data overview for more information.

PDOP (error in Positional Dilution Of Precision)

The amount of error reported as positional dilution of precision. This column may be NULL when there is no PDOP information.

See the SWERB Data overview for more information.

Accuracy (in meters)

The accuracy of the GPS reading, in meters. This column may be NULL when there is no accuracy information in meters.

See the SWERB Data overview for more information.

GPS_Datetime (GPS supplied Date and Time)

The date and time automatically supplied by the GPS unit at the time the waypoint was recorded. This column may not be NULL.

Garmincode (operator supplied waypoint value)

The information manually entered by the observer into the GPS unit as a coded waypoint that describes the SWERB data being recorded. This column may be empty (it need not contain characters), but it may not contain only whitespace characters. For further information on the content of this column see the SWERB Notebook. This column may not be NULL, although it may be a string 0 characters long. See the SWERB Data overview for more information.

SWERB_OBSERVERS

For teams collecting SWERB data, this table contains one row for each departure from camp of each member of the departing observation team who drives or records data.

The system will generate a warning for those SWERB_DEPARTS_DATA rows without at least one related row in SWERB_OBSERVERS.

SWERBOId (SWERB Observers Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular observer's departure from camp as part of a particular observation team.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

DId (Departure Identifier)

The id of the SWERB_DEPARTS_DATA row representing the departure from camp of the observer's observation team. This column must not be NULL.

Observer (Observer code)

Initials of the observer. The legal values of this column are defined by the OBSERVERS support table.

This column must not be NULL.

Role

The role assumed by the member of the SWERB observation team. The legal values of this column are defined by the OBSERVER_ROLES support table.

This column must not be NULL.

TREES

This table contains one row for every tree in the tree monitoring project.

Trees can only be located in groves -- the value of the TREES.Loc column must reference a SWERB_GWS row which has a SWERB_GWS.Type of G (Grove).

Tree numbers are unique within each grove. The combination of Loc and Tree must be unique.

TId (Tree Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tree.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Loc (Tree Location)

The identifier of the grove, a SWERB_GWS.Loc value, in which the tree is located.

This column must not be NULL.

Tree (Tree number)

The integer used to uniquely identify a tree within a particular grove.

This column must not be NULL.

Weather Data

The data in this section are collected from manually read instruments, with the exception of the data in the WEATHERHAWK table, which are automatically collected by the WeatherHawk instruments.

Tip

The MIN_MAXS view provides a way to view all the tables containing manually collected weather data at once, with each weather data collection event appearing as a single row.

Note

The weather tables do not relate directly to any of the baboon information contained in Babase.

RAINGAUGES (Rain Measurements)

This table contains one row for every time a rain gauge reading is recorded. There can be at most one RAINGAUGES row per WREADINGS row.

WRid

The identifier of the meteorological collection event during which the rain gauge was read. Must be a value contained in the WRid column of a row on the WREADINGS table, and the associated row may not be associated with any other row in RAINGAUGES.

This column cannot be changed and must not be NULL.

RGspan

The interval, in an integral number of seconds, since the previous rain gauge collection event.

This column is automatically maintained by the database and cannot be changed. This column must not be NULL.

Caution

When the WREADINGS.WRdaytime values used to compute RGspan are not integral, the resulting RGspan value is rounded to the nearest second. Values ending in exactly .5 seconds are rounded to the nearest even number of seconds.

Warning

When a new row is inserted the value of this column is silently ignored and an automatically computed value is used in its place. It is best to omit this column from the inserted data (or specify the NULL value).
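
The RGspan rounding rule (nearest second, exact halves to the nearest even second) is the same round-half-to-even behavior as Python's built-in round() on floats. A minimal sketch with a hypothetical function name, assuming the caller has already chosen the starting point of the interval (the previous reading, or a later rain gauge installation recorded in RGSETUPS):

```python
from datetime import datetime


def rgspan_seconds(previous: datetime, current: datetime) -> int:
    """Whole seconds between two rain gauge collection events.

    round() rounds exact halves to the nearest even integer, matching the
    rounding rule described for RGspan.
    """
    return round((current - previous).total_seconds())
```

For example, an interval of 2.5 seconds becomes 2 and one of 3.5 seconds becomes 4.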

EstRGspan

Whether or not any estimated WREADINGS.WRdaytime values were used in the computation of the RGspan column. TRUE if any of the relevant WREADINGS.Estdaytime values are true, FALSE otherwise.

This column is automatically maintained by the database and cannot be changed. This column must not be NULL.

Warning

When a new row is inserted the value of this column is silently ignored and an automatically computed value is used in its place. It is best to omit this column from the inserted data (or specify the NULL value).

Rain

The amount of rain accumulated since the rain gauge was last read, in millimeters. The column's data type has a precision of 0.1 millimeter. For the precision and accuracy of the data itself see the Amboseli Baboon Research Project Monitoring Guide.

This column must be non-negative and may not be more than 200.0. This column may not be NULL.

RGSETUPS (Rain Gauge Setups)

This table contains one row for every time a rain gauge is installed. There can be no RAINGAUGES rows recording rain gauge measurements at any given weather station (WSTATIONS) unless there is a prior record of a rain gauge installation in RGSETUPS.

Rain gauge measurements are only meaningful when it is known how long the rain has been collecting. In the event that, e.g., an elephant steps on the rain gauge, there will be a period of time until the rain gauge is replaced. The first reading of the replacement rain gauge is not a measurement of rain since the last rain gauge reading but is instead a measurement of the rain collected since the replacement rain gauge was installed. The RGSETUPS table allows the system to compute RAINGAUGES.RGspan intervals when rain gauges are replaced, when they are first installed, or after an interval of corrupted measurements.[152]

There cannot be a RGSETUPS row and a RAINGAUGES row for the same location at the same time.

The combination of RGSdaytime and Wstation must be unique.

RGSid

A unique positive integer representing the rain gauge setup event.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Wstation

Code indicating the station at which the rain gauge was installed. Must be a value on the WSTATIONS table.

This column cannot be changed and must not be NULL.

RGSdaytime

The day and time the rain gauge was installed. The time zone is Nairobi local time.

RGSestdaytime

TRUE when the RGSdaytime column contains an estimated time. FALSE when the RGSdaytime column is an accurate record of the time the rain gauge was installed.

RGSPerson

Initials of the person who collected the data. Must be a value contained in the Initials column of a row on the OBSERVERS table.

TEMPMINS (Minimum Temperature Measurements)

This table contains one row for every time a minimum temperature reading was recorded. There can be at most one TEMPMINS row for every WREADINGS row.

WRid

The identifier of the meteorological collection event during which the minimum temperature was read. Must be a value contained in the WRid column of a row on the WREADINGS table, and the associated row may not be associated with any other row in TEMPMINS.

This column cannot be changed and must not be NULL.

Tempmin

The minimum temperature recorded since the last minimum temperature reading. The data type of this column has a precision of one half degree: the digit to the right of the decimal point must be either 0 or 5. The actual precision of the reading may be different depending upon the units in which the temperature reading was recorded. Consult the Amboseli Baboon Research Project Monitoring Guide for further information regarding the precision and accuracy of the data.

This column must contain a value between -5 and 35, inclusive of endpoints, and must not be NULL.
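
The half-degree precision and range rules for Tempmin (and, with bounds of 10 and 50, for TEMPMAXS.Tempmax) can be sketched in Python. Decimal is used to avoid binary floating-point rounding; the function name and string input are assumptions for illustration.

```python
from decimal import Decimal


def valid_temperature(value: str, low: int, high: int) -> bool:
    """Check a temperature given as a decimal string.

    The value must be a whole or half degree (digit after the decimal point
    is 0 or 5) and lie within [low, high], inclusive.
    """
    d = Decimal(value)
    doubled = d * 2
    # A whole or half degree doubles to an integer.
    on_half_degree = doubled == doubled.to_integral_value()
    return on_half_degree and Decimal(low) <= d <= Decimal(high)
```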

TEMPMAXS (Maximum Temperature Measurements)

This table contains one row for every time a maximum temperature reading was recorded. There can be at most one TEMPMAXS row for every WREADINGS row.

WRid

The identifier of the meteorological collection event during which the maximum temperature was read. Must be a value contained in the WRid column of a row on the WREADINGS table, and the associated row may not be associated with any other row in TEMPMAXS.

This column cannot be changed and must not be NULL.

Tempmax

The maximum temperature recorded since the last maximum temperature reading. The data type of this column has a precision of one half degree: the digit to the right of the decimal point must be either 0 or 5. The actual precision of the reading may be different depending upon the units in which the temperature reading was recorded. Consult the Amboseli Baboon Research Project Monitoring Guide for further information regarding the precision and accuracy of the data.

This column must contain a value between 10 and 50, inclusive of endpoints, and must not be NULL.

WEATHERHAWK (WeatherHawk Data)

This table records the weather data automatically collected each hour by the WeatherHawk instrument.

The combination of TimeStamp and WStation must be unique.

Instrument accuracy may not, and probably does not, correspond with the recorded degree of precision. The instrument collects its data in engineering units, which are interpreted and converted to standardized units (degrees, kPa, etc.) by PC software when the data are retrieved from the instrument. Different PC software programs may vary in terms of units used, the number of significant figures employed, or other ways that are not yet apparent. There are even some values that are simply not recorded by some programs.

Despite differences in software, most measurements saved in this table use a single column and a specified unit. Data management should ensure that data are converted to the appropriate units, if needed. The allowed precision in these columns—usually a single digit to the right of the decimal—is based on a private message from WeatherHawk’s technical support[153], who asserted that this is the maximum plausible precision that the WeatherHawk is capable of measuring. This may be more precise than the value originally reported by the software.

Note

The units and decimal precision used in this table's columns are not necessarily the same as what was exported from the WeatherHawk instrument.

Tip

Use the WEATHERHAWK_SOFTWARES table to see what is known about differences in these programs, including precision of measurements, units used, etc.

The WSoftware column is used to indicate which software was used to generate the data in each row, but the system does not treat data any differently based on this value. Users should be aware of the possibility of differences between programs, and decide for themselves how to handle any possible discrepancies.

Information about the voltage of the WeatherHawk’s battery is provided in the BatVolt and BatVolt_Min columns. These values are not directly relevant to weather but can be useful if technical support is needed.

Average wind speed may be recorded in km/hr as an integer or m/s with 1 decimal point of precision, depending on the software used. The precision difference between these two measures is large enough that they are divided into separate columns, not unified into a single “wind speed” column. Each row must have a non-NULL WindSpeed_Avg_Km_Hr or WindSpeed_Avg_M_S, but cannot have both. Similarly, if maximum wind speed is recorded, it can be in the WindSpeed_Max_Km_Hr or WindSpeed_Max_M_S column, but not both.

Each row must only use a single unit for wind speed; when WindSpeed_Avg_Km_Hr is NULL, WindSpeed_Max_Km_Hr must also be NULL, and when WindSpeed_Avg_M_S is NULL, WindSpeed_Max_M_S must also be NULL.
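
The wind speed unit rules reduce to a few NULL checks. A minimal Python sketch with a hypothetical function name, None standing for NULL; the per-column range limits are not checked here.

```python
from typing import Optional


def wind_speed_units_ok(avg_km_hr: Optional[int],
                        max_km_hr: Optional[int],
                        avg_m_s: Optional[float],
                        max_m_s: Optional[float]) -> bool:
    # Exactly one of the two average wind speed columns must be non-NULL.
    if (avg_km_hr is None) == (avg_m_s is None):
        return False
    # A maximum may only be given in the same unit as the average.
    if avg_km_hr is None and max_km_hr is not None:
        return False
    if avg_m_s is None and max_m_s is not None:
        return False
    return True
```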

The WeatherHawk measures rain using a 1-millimeter tip bucket and only records rain when the bucket fills (see the WeatherHawk Signature Series User's Manual for more information). If there is less than 1 mm of rainfall over the course of a given hour, the bucket may not fill up at that time and the rain will not be measured until later, or may even evaporate before the bucket fills. When there is a gap in the hourly measurements (due to changing out sensors, battery malfunctions, etc.) rainfall data during the down period might not be recorded.

Although it measures every hour, the WeatherHawk does not report rainfall on an hourly basis. Instead, it reports the cumulative rainfall since the beginning of the year (the YearlyRain column). Because rainfall for each hour will usually be more useful, the calculated column TimeStampRain is included in this table. For each row, the YearlyRain of the most recent row from the same calendar year and the same WStation is subtracted from this row’s YearlyRain, resulting in the amount of rainfall that was measured since the previous TimeStamp.
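
The TimeStampRain calculation can be sketched in Python. The tuple layout and function name are assumptions for illustration. The first row of a station's calendar year has no earlier row to subtract from, so this sketch returns None for it; how Babase fills that case is not described here.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, datetime, float]  # (WStation, TimeStamp, YearlyRain)


def timestamp_rain(rows: List[Row]) -> Dict[Tuple[str, datetime], Optional[float]]:
    """Compute TimeStampRain for each (WStation, TimeStamp).

    Each value is this row's YearlyRain minus the YearlyRain of the most
    recent earlier row at the same station in the same calendar year.
    """
    result: Dict[Tuple[str, datetime], Optional[float]] = {}
    previous: Dict[Tuple[str, int], float] = {}  # (WStation, year) -> last YearlyRain
    for station, stamp, yearly in sorted(rows, key=lambda r: (r[0], r[1])):
        key = (station, stamp.year)
        prior = previous.get(key)
        result[(station, stamp)] = None if prior is None else yearly - prior
        previous[key] = yearly
    return result
```

Because the subtraction uses the most recent earlier row rather than the row exactly one hour earlier, a gap in the data yields the rain measured over the whole gap, as the warning below describes.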

Warning

Do not assume that TimeStampRain values always describe a single hour’s worth of rain. When one or more hours is absent from the data, the TimeStampRain value is the amount of rainfall measured since the previous row in the same year. Also do not assume that these values describe all of the rain that occurred in the intervening hours. If the WeatherHawk was off or malfunctioning at the time[154], then actual rainfall may have occurred and/or evaporated without being measured.

Column Descriptions

Wid (WeatherHawk Identifier)

A unique positive integer representing the WeatherHawk's meteorological data collection event.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

WStation (Weather Station Identifier)

The WSTATIONS.Wstation of the WeatherHawk instrument used to collect the data.

This column may not be NULL.

TimeStamp

Date and time of the measurement. Measurements must be taken on the hour. Minutes, seconds, microseconds, etc., must be 0.

Warning

As indicated by the name, this value is a time stamp. It indicates the end of the period described in each row, not the beginning. This means that the last hour of a day will have a TimeStamp from the next day, e.g. the data from 23:00-23:59 on 31 Dec 1999 will have a TimeStamp of 2000-01-01 00:00.

This column may not be NULL.
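
Because TimeStamp marks the end of the hour a row describes, recovering the hour covered means stepping back one hour. A minimal Python sketch with a hypothetical function name that also enforces the on-the-hour rule:

```python
from datetime import datetime, timedelta
from typing import Tuple


def hour_covered(stamp: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the hour described by a TimeStamp."""
    if (stamp.minute, stamp.second, stamp.microsecond) != (0, 0, 0):
        raise ValueError("TimeStamp must fall exactly on the hour")
    # TimeStamp is the end of the period, so the period began an hour earlier.
    return stamp - timedelta(hours=1), stamp
```

For example, a TimeStamp of 2000-01-01 00:00 describes 23:00 to 24:00 on 31 Dec 1999.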

WSoftware (WeatherHawk Software Identifier)

The WEATHERHAWK_SOFTWARES.WSoftware value indicating which software was used to generate the data.

This column may not be NULL.

RecordNum (Record Number)

The record number for this line, as exported by the software. This appears to be a unique ID number used either by the WeatherHawk or the software, or both.

This column may be NULL, when the software did not report this value.

BatVolt (Battery Voltage)

The voltage of the battery at the TimeStamp. Values must be between 12.00 and 14.00, inclusive.

This column may not be NULL.

BatVolt_Min (Minimum Battery Voltage)

The minimum voltage of the battery in this hour. Values must be between 12.00 and 14.00, inclusive.

This column may be NULL, when the software did not report this value.

AirTemp_Avg

Average air temperature for this hour, in degrees Celsius. Values must be between -10.0 and 50.0, inclusive.

This column may not be NULL.

RelativeHumidity_Avg

Average relative humidity for this hour in percent humidity. Values must be between 0.0 and 100.0, inclusive.

This column may not be NULL.

WindSpeed_Avg_Km_Hr (Average Wind Speed, in km/hr)

Average wind speed for this hour, in km/hr. Values must be between 0 and 30, inclusive.

This column may be NULL.

WindSpeed_Avg_M_S (Average Wind Speed, in m/s)

Average wind speed for this hour, in m/s. Values must be between 0.0 and 15.0, inclusive.

This column may be NULL.

Solar (Solar radiation)

Solar radiation in Watts per square meter. Values must be between 0.0 and 2000.0, inclusive.

This column may be NULL[155].

AirTemp_Min (Minimum Air Temperature)

Minimum air temperature for this hour, in degrees Celsius. Values must be between -10.0 and 50.0, inclusive.

This column may be NULL, when the software did not report this value.

AirTemp_Min_Time (Time of Minimum Air Temperature)

A time stamp indicating the minute in which the AirTemp_Min occurred.

This column may be NULL, when the software did not report this value.

AirTemp_Max (Maximum Air Temperature)

Maximum air temperature for this hour, in degrees Celsius. Values must be between -10.0 and 50.0, inclusive.

This column may be NULL, when the software did not report this value.

AirTemp_Max_Time (Time of Maximum Air Temperature)

A time stamp indicating the minute in which the AirTemp_Max occurred.

This column may be NULL, when the software did not report this value.

Wind_Dir (Wind Direction)

Wind direction in degrees from North. Values must be between 0.0 and 360.0, inclusive.

Caution

The values 0.0 and 360.0 represent the same direction. It is not known whether either of them has a special meaning, such as no measurement. If they really do represent the same direction then the rules should probably be changed, and the data values adjusted, so that legal values run from 0.0 up to, but not including, 360.0.

This column may not be NULL.
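Until the rules are changed, analyses that compare wind directions may want to treat 360.0 as 0.0. The sketch below does so under the unconfirmed assumption, noted in the caution above, that both values mean due North; the function names are hypothetical.

```python
def normalize_wind_dir(degrees: float) -> float:
    """Map a Wind_Dir value from [0.0, 360.0] onto [0.0, 360.0).

    Assumes 0.0 and 360.0 both mean due North, which is not confirmed.
    """
    if not 0.0 <= degrees <= 360.0:
        raise ValueError(f"Wind_Dir {degrees} is outside 0.0-360.0")
    return degrees % 360.0


def angular_difference(a: float, b: float) -> float:
    """Smallest angle, in degrees, between two compass directions."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)
```

angular_difference(350.0, 10.0) is 20.0 and angular_difference(0.0, 360.0) is 0.0, whereas naive subtraction would report 340 and 360 degrees respectively.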

WindSpeed_Max_Km_Hr (Maximum Wind Speed, in km/hr)

Maximum wind speed for this hour, in km/hr. Values must be between 0 and 30, inclusive.

This column may be NULL, when the software did not report the value.

WindSpeed_Max_M_S (Maximum Wind Speed, in m/s)

Maximum wind speed for this hour, in m/s. Values must be between 0.0 and 15.0, inclusive.

This column may be NULL, when the software did not report the value.

WindSpeed_Max_Time (Time of Maximum Wind Speed)

A time stamp indicating the minute in which the maximum wind speed[156] was recorded.

This column may be NULL, when the software did not report the value.

Barometer (Barometric pressure)

Atmospheric pressure at the TimeStamp, in kPa. Values must be between 85.0 and 95.0, inclusive.

This column may not be NULL.

YearlyRain (Yearly Rainfall)

The amount of rain measured since the beginning of the year, in millimeters. Values must be integers greater than or equal to 0.

This column may not be NULL.

TimeStampRain (Rainfall for this TimeStamp)

The amount of rain that was measured at this WStation since the previous TimeStamp in the same calendar year.

This column is calculated by the system automatically. Attempts to insert, update, or delete data in this column will be silently ignored.

This column may not be NULL.
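The range and NULL rules in the column descriptions above can be transcribed into a table-driven pre-upload check, sketched below in Python. Babase itself enforces these rules in the database; the RULES mapping and check_row function are hypothetical helpers for spotting problems before data are loaded, and they cover only the columns with numeric ranges plus YearlyRain.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum, nullable) for each range-checked WEATHERHAWK column,
# transcribed from the column descriptions above.
RULES: Dict[str, Tuple[float, float, bool]] = {
    "BatVolt": (12.00, 14.00, False),
    "BatVolt_Min": (12.00, 14.00, True),
    "AirTemp_Avg": (-10.0, 50.0, False),
    "RelativeHumidity_Avg": (0.0, 100.0, False),
    "WindSpeed_Avg_Km_Hr": (0, 30, True),
    "WindSpeed_Avg_M_S": (0.0, 15.0, True),
    "Solar": (0.0, 2000.0, True),
    "AirTemp_Min": (-10.0, 50.0, True),
    "AirTemp_Max": (-10.0, 50.0, True),
    "Wind_Dir": (0.0, 360.0, False),
    "WindSpeed_Max_Km_Hr": (0, 30, True),
    "WindSpeed_Max_M_S": (0.0, 15.0, True),
    "Barometer": (85.0, 95.0, False),
}


def check_row(row: Dict[str, Optional[float]]) -> List[str]:
    """Return a list of rule violations; an empty list means the row passes."""
    problems = []
    for column, (low, high, nullable) in RULES.items():
        value = row.get(column)  # a missing key is treated as NULL
        if value is None:
            if not nullable:
                problems.append(f"{column} may not be NULL")
        elif not low <= value <= high:  # bounds are inclusive
            problems.append(f"{column}={value} outside {low}-{high}")
    # YearlyRain: required, and an integer greater than or equal to 0.
    rain = row.get("YearlyRain")
    if rain is None:
        problems.append("YearlyRain may not be NULL")
    elif not isinstance(rain, int) or rain < 0:
        problems.append(f"YearlyRain={rain} must be an integer >= 0")
    return problems
```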

WREADINGS (Weather Readings)

The WREADINGS table contains one row for each time a person has collected data from the meteorological instruments. Consequently, each WREADINGS row should have at least one associated RAINGAUGES, TEMPMINS, or TEMPMAXS row, but no more than one associated row from any one of these tables.

Note

Automated weather readings are not recorded in WREADINGS.

For any one weather reading the minimum recorded temperature cannot exceed the maximum recorded temperature -- the TEMPMINS.Tempmin value related to the WREADINGS row cannot exceed the related TEMPMAXS.Tempmax value.

The combination of WRdaytime and Wstation must be unique.

The Wstation column cannot be changed when there is a related RAINGAUGES row.
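The WREADINGS consistency rules above can be roughly sketched as checks to run on data before it is entered. The function names, and the use of related-row counts as inputs, are assumptions made for illustration. Note that the text says each WREADINGS row should have at least one related row; this sketch reports its absence as a problem, but whether Babase itself rejects such a row is not stated here.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def duplicate_keys(keys: Iterable[Tuple[datetime, str]]) -> List[Tuple[datetime, str]]:
    """(WRdaytime, Wstation) pairs that occur more than once."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


def check_reading(rain_rows: int, tempmin_rows: int, tempmax_rows: int,
                  tempmin: Optional[float] = None,
                  tempmax: Optional[float] = None) -> List[str]:
    """Check one weather reading, given how many related rows each table has."""
    problems = []
    counts = {"RAINGAUGES": rain_rows, "TEMPMINS": tempmin_rows,
              "TEMPMAXS": tempmax_rows}
    if sum(counts.values()) == 0:
        problems.append("no related RAINGAUGES, TEMPMINS, or TEMPMAXS row")
    for table, n in counts.items():
        if n > 1:
            problems.append(f"more than one related {table} row")
    # TEMPMINS.Tempmin may not exceed TEMPMAXS.Tempmax for the same reading.
    if tempmin is not None and tempmax is not None and tempmin > tempmax:
        problems.append(f"Tempmin {tempmin} exceeds Tempmax {tempmax}")
    return problems
```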

WRid

A unique positive integer representing the meteorological data collection event.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Wstation

Code indicating the station from which the data were collected. Must be a value on the WSTATIONS table.

WRdaytime

The day and time the meteorological data were collected. The time zone is Nairobi local time.

Estdaytime

TRUE when the WRdaytime column contains an estimated time. FALSE when the WRdaytime column is an accurate record of the time the measurement was taken.

WRperson

Initials of the person who collected the data. Must be a value contained in the Initials column of a row on the OBSERVERS table.

WRnotes

Textual notes on the weather reading.

This column may be NULL when there are no notes.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
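When preparing notes for upload, it may help to turn blank or whitespace-only text into NULL (Python None) rather than have the row rejected. The sketch below assumes that Python's notion of whitespace (str.strip) is close enough to the database's; the function names are hypothetical.

```python
from typing import Optional


def is_valid_wrnotes(notes: Optional[str]) -> bool:
    """NULL is allowed; otherwise at least one non-whitespace character is required."""
    return notes is None or notes.strip() != ""


def clean_wrnotes(notes: Optional[str]) -> Optional[str]:
    """Convert empty or whitespace-only notes to None (NULL); keep others unchanged."""
    if notes is None or notes.strip() == "":
        return None
    return notes
```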



[24] There are, of course, also system generated row identifiers, which are arbitrary and not derived from any field collected data.

[25] As opposed to using a query to let the database do all the considering for you.

[26] This is a generated error, rather than one raised immediately, in order to ease the data entry process. Births are recorded before the corresponding CENSUS rows are entered, so that new births do not raise errors when census data are uploaded; consequently, new births regularly have dates that follow the mother's Statdate. This could be avoided by entering births without a Pid and then updating the Pid once the CENSUS table has been updated, but this was deemed overly burdensome.

[27] This column was added when PostgreSQL deprecated its hidden identifier column, Oid.

[28] This is unlikely as the database will not allow entry of a duplicate Sname.

[29] At the time of this writing the Psion palmtop data collection devices use the Sname XXX for their own special purposes. There may be other such reserved Sname values unknown to Babase.

[30] Or whatever you want to call it in the case of a fetal loss.

[31] This is termed a visit in the Protocol for Data Management: Amboseli Baboon Project, which should be consulted for further details.

[32] D usually occurs when a male is seen alone or in a non-census group.

[33] When the Status column is D, the value of the Cen column indicates whether or not the individual was marked absent on the field census for the day.

[34] Facilities exist to require such CENSUS rows and their associated DEMOG rows to be entered in a single transaction, and the rule requiring CENSUS rows with a Status of D to have a related DEMOG row could then be enforced.

[35] DEMOG nearly makes the M CENSUS Status code obsolete, were it not so hard to search on textual data. Indeed, it was created in response to difficulties with the M code.

[36] It may seem odd that the Comment column may be NULL, given that this is the only column in the table containing baboon-related data. However, the data entered into the database can be an abbreviated version of the actual demography note, abbreviated even into non-existence.

[37] The system checks the group in which the individual was last censused present rather than the individual's Matgrp in order to accommodate group splitting.

[38] Presently group 9.0; this value is hardcoded.

Individuals are generally put in the unknown group when interpolation does not know their group membership, but it is also possible for an individual to be explicitly placed in the unknown group.

[39] As opposed to it being merely a coincidence that the gap began the same date that the group did.

[40] Again, as opposed to it being coincidence that gap and group ended at the same time.

[41] This selection of the larger From_group after a fusion should be made by the user. The system itself does not care about the size of the parent group(s), or whether a group started because of a fission or fusion.

[42] Technically, because this column is computed it need not exist. Still, it is convenient to have this column be pre-computed by the system at the time data are entered, rather than have this column be part of a view or computed dynamically when other data are validated.

[43] The precise definition of an "official" study group is left for data management to determine.

[44] In contrast to birth and death, which mercifully tend to be pretty definite.

[45] At the time of this writing, the date used in the case where the transition to sexual maturity was not observed is the date when the individual first came under observation and was already mature.

[46] The ON date MSTATUSES code is a special value. See MSTATUSES: Special Values.

[47] Note that this is not literally true, because testicular changes in males are not tracked on a daily basis; males are assigned a matured date on the first day of the month in which they are seen with fully round testes. Likewise, a female's first Tdate will sometimes have a few days of error around it, as might other transitions.

[48] Therefore during periods of continuous observation no sexual cycle transition events can go unrecorded. See the CYCPOINTS documentation below for the constraints this places on CYCPOINTS within a series.

[49] Or the reverse: as written, they also do not allow the introduction or removal of a period of no observation (an end of observation/start of observation pair of CYCGAPS rows) anywhere other than before a female's first or after her last CYCGAPS row.

[50] Birth or Statdate or any other date

[51] The CYCPOINTS.Cid column relates the CYCPOINTS row to the cycle of which it is a part.

[52] It is expected that these rows will exist only until related CYCPOINTS rows are entered.

[53] See Appendix C for an example.

[54] This rule minimizes the degree to which CYCPOINTS move between cycles, that is, the degree to which their Cids change.

[55] It may not be worth documenting this, as there are certainly cases where it is not clear which rows are earlier. One such case is changing a Ddate to a later date that falls after subsequent cycles. If there is concern about the permanence of Cids then it may be best to simply delete CYCPOINTS rows and re-insert them rather than modify existing rows. This at least gives the greatest degree of control over the Cid values.

[56] Quite a bit of Babase's logic relies on there being a continuous series of Mdate, Tdate, Ddate sequences unless there are gaps in observation. It is for this reason that cycles must be complete.

[57] This is checked rather than enforced by index or trigger because the condition must exist temporarily as the triggers update the Seq.

[58] See the PREGS.Conceive documentation above.

[59] The system allows the condition to occur to provide an opportunity to insert a new Mdate, Ddate, Tdate aggregate -- a new cycle -- into the middle of a period of observation. One of these dates must be inserted first, breaking, for the moment, the pattern of cycling -- the repetition of the Mdate, Ddate, Tdate sequence.

[60] This is enforced in triggers rather than by index as the triggers use this condition as a test for whether a new CYCLES row must be created.

[61] It is expected that such rows will exist only until PREGS.Conceive is updated with a reference to them.

[62] Note that cycles may be cut off, for a variety of reasons; some cycles may only contain a single CYCPOINTS row, that is, the Cid value may be unique to a single CYCPOINTS row.

[63] See CYCLES.

[64] Or was in progress when observation ceased, which Babase treats the same as pregnancies in progress at the time data entry ceased. What counts as "now" is an important consideration in determining what "in progress" means. The cessation of data entry (e.g. BIOGRAPH.Statdate), for whatever reason, is the closest Babase comes to the concept of now.

[65] This implies that each Resume value differs from all the others.

[66] Zdate really.

[67] This condition also ensures that a female will not have more than one ongoing pregnancy, as pregnancies require a conception cycle.

[68] It is expected that such Tdates will exist only long enough to update a pregnancy's Resume value.

[69] There should only be CYCGAPS rows when a sexual cycle event may have been missed, but clearly when there is a CYCPOINTS.Resume value then no sexual cycle was missed.

[70] The MATERNITIES view does exactly this. It can be used whenever there is a need for these tables to be joined in this way.

[71] Why is this round-about-the-barn way preferred? Because curmudgeonly old database designers like to insist that keys contain no meaningful information, that's why.

[72] See? We told you that keys should not contain meaningful information.

[73] This indication of a period of no observation is not validated against the CYCGAPS table, which serves as a record of periods of no observation that are long enough that a sexual cycle transition event (Mdate, Tdate, or Ddate) may be missed. Babase does not have records of periods of no observation that are long enough to miss pregnancies. Although it would seem that CYCGAPS could be used for this purpose, and indeed CYCGAPS does black out REPSTATS, validating parity against CYCGAPS has not been thought through and awaits a future Babase enhancement.

Regardless, Babase does not presently automatically place a parity in the 100s -- the decision to switch between the 100s and the 1s (or 10s) must be made manually.

[74] This criterion is carefully phrased to account for gaps in the recorded data during the time period in which deturgescence probably began.

[75] When an individual matures, at menarche, there is no Mdate in the first sexual cycle.

[76] notably consortships

[77] There is no restriction on the age or maturity status of the female.

[78] This is not always as useful as it seems. See the rationale for the PARTS table.

[79] It is not that these interactions never occur among young individuals; rather, the researchers' interest is in paternity and maternity, and they find it distracting to have to filter out sexual interactions between juvenile individuals.

[80] See: SAMPLES

[81] and perhaps ejaculation

[82] Presumably data that is collected on a Psion or other electronic device.

[83] Requiring INTERACT_DATA.Observer be NULL, even when the existing value is correct and synchronized with SAMPLES.Observer, ensures that the value of the observer column has been taken into consideration by the person modifying the database.

[84] Consult the Amboseli Baboon Research Project Monitoring Guide to be sure, but this is because the data are never accurate to better than one minute, if that.

[86] At the time of this writing the first half of the 2006 grooming data were given first-of-the-month dates, although the database does not require this. This may be fixed by the time you read this.

[87] As opposed to recording the interaction with an electronic device.

[88] Whether or not a MPI_DATA row records a request for help is determined by whether or not the value of the related MPIACTS.Kind column is R.

[89] Because the individual from whom help was requested is unknown, there is no way to tell if help was given in response to the request.

[90] Note that if we had the time the sample started, to the second, and we knew that the operator never took more than 59 seconds to enter the point data, and we assume that the operator makes the observation when the timer chimes, then we could calculate the actual time the point was observed. Absent these conditions it appears difficult or impossible to tell which of the 1 minute observation intervals were missed when there is not an exact match between the number of points taken and the total number of minutes in the sample.

[91] However, at the time of this writing, Babase did not contain any data collected before the protocol changed.

[92] It is possible to create a view that extends the NEIGHBORS table by adding another column, call it Neighbor, that contains either the Sname or the Unksname, whichever is not NULL. However, the utility of such a column is not obvious, because it seems that any analysis done using such a column would have to consistently use outer joins and then constantly test for NULL results, lest the Unksname data disappear from the analysis. At first glance this seems similar to the testing which must be done when using two separate columns, the existing design, so it is not clear whether there's anything to be gained.

Such a view can always be added in the future without breaking backward compatibility.

[93] Assuming that the neighbor is a known individual, that the NEIGHBORS.Sname column is not NULL.

[94] The information on the actual unknown neighbor codes used in the field does not appear to be in the Amboseli Baboon Research Project Monitoring Guide.

[95] The name of the focal individual is always recorded, as there is always the intention to observe the focal individual even though this does not always happen.

[96] As the values in the POINT_DATA.Ptime column have little to do with the actual time of observation, it is impossible for Babase to perform additional consistency checks between the points and the corresponding summary information in SAMPLES. Fortunately, as the data loading process is automated, there is little opportunity for data corruption.

[97] As all observation occurs during the day there are no issues surrounding samples taken just before midnight that start on one day and end on the next. Should there ever be such, this should be the date the sample started.

[98] The POINTS table contains only the time the data were entered into the palmtop. This could be a considerable time after the observation. Assuming the observations are taken as prescribed at regular intervals after the start of sampling the actual observation times might be derived by adding 1 minute increments to the time sampling started, and then adjusting for missed intervals using the time of data entry as a guide.

Unfortunately the low precision of the Psion palmtop data makes this an unrewarding exercise. Further, the sample starting time supplied by the palmtop may not be the actual time the sample was started. There may be some delay before the sample begins. See the Amboseli Baboon Research Project Monitoring Guide.

[99] Should Babase ever be updated with the old data, this column should be merged with the Date column to facilitate comparisons that involve both date and time.

[100] The Psion palmtop computers deliver data for every minute of the sampling set interval. Those minutes during which, for whatever reason, no data were collected have no corresponding row in POINT_DATA. But the program which loads the Psion palmtop's data into Babase (see Psionload) notes the existence of these minutes in the Mins value it constructs.

[101] The anesthetic administration times are not aggregated in this view although it could be useful to aggregate the difference between the time of darting and the time additional anesthetic was administered.

[102] To cover the case where DARTINGS.Pickuptime is NULL.

[103] To catch the case where Downtime is NULL.

[104] The column is allowed to be NULL due to data entry procedural constraints. The first data uploaded creates rows in DARTINGS but the data set containing mass is not uploaded until later.

[105] In a canonical database design this column would be on the DPHYS table. The column is part of the DARTINGS table due to concerns that the column might be overlooked by a user because so many other note columns are on the DARTINGS table.

[106] In a canonical database design this column would be on the DART_SAMPLES table. The column is part of the DARTINGS table due to concerns that the column might be overlooked by a user because so many other columns are on the DART_SAMPLES table and DSAMPLES view.

[107] This behavior exists so that rows can be inserted into TEETH via the DENT_CODES view.

[108] The alternative to this, an approach closer to the ideal database design, is to have separate tables for width and length measurements. This seems excessive.

[109] This rule is a result of the aforementioned design choice that places Testwidth and Testlength in the same table. A consequence of this choice is that this rule must exist to ensure that Testseq values are, effectively, contiguous.

Note that this condition must remain true even while the rows are in the process of automatic re-sequencing. It may be that some combinations of data values will simply not work with all possible UPDATE statements that change the row sequencing. Those experiencing problems should delete the rows in question and re-insert them with the correct sequence numbers.

[110] The alternative to this, an approach closer to the ideal database design, is to have separate tables for width and length measurements. This seems excessive.

[111] This rule is a result of the aforementioned design choice that places Testwidth and Testlength in the same table. A consequence of this choice is that this rule must exist to ensure that Testseq values are, effectively, contiguous.

Note that this condition must remain true even while the rows are in the process of automatic re-sequencing. It may be that some combinations of data values will simply not work with all possible UPDATE statements that change the row sequencing. Those experiencing problems should delete the rows in question and re-insert them with the correct sequence numbers.

[112] Waterholes may be more or less permanent features of the landscape, or only temporary rain pools. This is no surprise to those familiar with the SWERB dataset, but whenever waterholes are mentioned in relation to SWERB data the waterhole may be either a waterhole or a rainpool.

[113] It is believed but not certain that this is the way PDOP is used.

[114] It is not clear whether the accuracy is a 2- or 3-dimensional vector, that is, whether the reported distance includes error in altitude.

[115] Because database rules which enforce when PDOP and Accuracy values must be NULL are hardcoded into the database, it will take programmatic changes to change these limits. Normally this would be avoided by adding a column to the GPS_UNITS table to indicate whether or not the particular GPS unit records a PDOP or accuracy reading, thus allowing new units to be introduced which record such data. However, records have been lost as to which specific GPS units were used when, and, as of the time of this writing, no one wishes to reconstruct the categories of GPS units in use based on a PDOP/Accuracy capability criterion, so the system design uses hardcoded dates to validate. Note further that, given the existing set of validation criteria for PDOP and Accuracy, there is never a circumstance which requires a PDOP or accuracy to be present. Normally the values of GPS_UNITS.Errortype would force the presence of PDOP or Accuracy values. Instead they merely enforce their absence. This is partly for reasons similar to the preceding and partly because, particularly during periods when GPS data were hand-transcribed, data are sometimes missing.

[116] And, possibly, subsequently corrected by the data specialists after consultation with the field teams.

Because the data manager expands the observer codes in the departure rows from 1 to 3 characters the SWERB_DEPARTS_GPS.Garmincode column can hold more than 10 characters.

[117] From a database design perspective it would make sense to control whether or not a Garmincode must be present based on a column in the GPS_UNITS table. In practice because all future GPS units will very likely allow the entry of data when waypoints are taken the matter is moot.

[118] While it may be desirable to have a cutoff date after which all data obtained using GPS units must come from the GPS units themselves, no such cutoff date has been established.

[119] Electronic manufacturers have taken to silently changing the specifications of a device without changing the model, a situation which is quite annoying when the specifications matter. When no other sort of identifying information is available sometimes the serial number can be used to determine device capabilities.

[120] The Amboseli Baboon project data protocols require these codes have a particular structure. Babase does not enforce these requirements, primarily because the QUAD_DATA table is essentially a support table and, once created, is static so enforcing specific rules in the database is not worth the time.

[121] Note that rows that violate this rule are not instantly rejected; the error is caught at the time of transaction commit. This is so that during data entry Btimeest and Etimeest values may be entered without Start and Stop values in the expectation that by the time the transaction is committed the insertion of SWERB_DATA rows will have automatically filled in the missing Start and Stop values.

[122] This last check is also performed at transaction commit time, for the same reason.

[123] Ideally, a begin or end time should not be NULL unless the records have been perused and no time found, in which case the time source would always be bb_norecord when there was no time. In practice this has not been done.

[124] Note that this rule is tested for immediately, not at the time of transaction commit. This means that the Btimeest and Etimeest columns must be non-NULL before inserting SWERB begin and end rows that have non-NULL times.

[125] More precisely, when the SWERB_BES.Seq is NULL. This typically amounts to the automatic sequencing of newly inserted rows because those are the rows which typically have no Seq value.

[126] At first glance it would seem appropriate to sequence those SWERB_BES rows with NULL Start times based on the first related SWERB_DATA.Time value but this presents a number of problems. Such a design would not allow for any flexibility in manually re-sequencing such rows unless automatic sequencing took place only upon insert of SWERB_DATA rows, in which case inserting and then deleting the inserted row could change the sequencing of the SWERB_BES rows. Such un-reversible changes can be confusing.

[127] Or whatever row has a NULL SWERB_BES.Seq value.

[128] Manual sequencing is therefore only useful when the SWERB_BES.Start is NULL or when there are ties. Sequencing is normally manipulated by changing SWERB_BES.Start values, which are themselves automatically picked up from SWERB_DATA rows with B Event values.

When testing for correct sequencing of a SWERB_BES row other bouts of observation (other SWERB_BES rows) related to the same group on the same day cannot have a smaller Seq and also have a Start value greater than the smallest related SWERB_DATA.Time related to the given row. In those cases where other bouts of observations related to the same group on the same day have a NULL Start value the comparison is instead against the other bout's earliest related SWERB_DATA.Time value. SWERB_DATA rows with NULL Time values are ignored by the automatic sequencing process.

[129] This can cause indeterminate results when more than one row is changed in a single update statement.

[130] It generally makes sense to use the last created SWERB_BES.BEId. If a BEId has been created during the current PostgreSQL session this can be referenced using the PostgreSQL expression currval('swerb_bes_beid_seq').

[131] Allowing changes to the SWERB_BES.DId column would make it difficult to maintain the automatic sequencing of the Seq values.

[132] Allowing changes to the SWERB_BES.Focal_grp column would make it difficult to maintain the automatic sequencing of the Seq values.

[133] All the lines of data dumped from the GPS units are represented as rows in the SWERB_DATA table with the exception of the departure records.

[134] When a group has fragmented a fragment of the group other than the focal fragment may be observed at some distance.

[135] For the occasional unknown other group sighting.

[136] These rules imply that when a group is in the process of undergoing fission, the data collection team taking SWERB observations will not flag as a subgroup one of the semi-permanent fission groups that has its own code in the GROUPS table -- unless that semi-permanent group has itself temporarily split.

[137] I.e. guessed.

[138] Although the system design allows SWERB_GWS rows to represent places other than groves and waterholes, at the time of this writing these are the only places recorded -- with the possible exception of rain pools, which count as waterholes.

[139] Otherwise the SWERB_UPLOAD view would not be able to distinguish between the two grove codes, one of them certain, the other a probable sleeping location.

[140] At the time of this writing the only physical landmarks recorded are groves and waterholes/rainpools.

[141] The exception of the unknown group allows for easy creation of bouts of observation of the unknown group. This is useful because all observations, including those of a non-focal group made on an ad-hoc basis, must be made as part of a bout of observation. But such ad-hoc observations of non-focal groups are made, wait for it, on an ad-hoc basis. A bout of observation may not be in progress. The creation of bouts of observation of the unknown group provide a convenient way to ensure such non-focal group observations are part of an observational bout, and hence are related to an observation team's daily effort -- to a SWERB_DEPARTS_DATA row.

[142] The exception of subgroups that descend from the unknown grove allows the creation of bouts of observation that record subgroup ascent.

[143] See the preceding footnote for further detail.

[144] The exception of subgroups that ascend into the unknown grove allows the creation of bouts of observation that record subgroup descent.

[145] Checking the rules for ascent into a sleeping grove at transaction commit time allows end-of-observation rows that record ascent into a sleeping grove to be inserted into the database after all other SWERB rows for that bout of observation. Because sequence numbering is not related to end of observation, because of subgroups, and because of the possibility of missing end of observation times (SWERB_BES.Stop may be NULL), it is not always possible to distinguish the bout of observation which represents the last observation of the group by the team for the day without having a bout that is related to ascent into a sleeping grove. This means that tests related to end-of-observation cannot be done as rows are inserted.

[146] At the time of this writing the ADCODES values are structured such that SWERB_LOC_DATA rows that represent the first or last observation of each group by each observation team on each day, the rows that record the group's descent from or ascent into a grove, must have non-NULL ADtime values. The converse is also true; SWERB_LOC_DATA rows that are not the first or last for the team for the group for the day, that are not associated with the group's descent from or ascent into a sleeping grove, must have NULL ADtime values.

[147] A similar rule for the end of observation is not feasible. There are times when, after the last bout of observation of the day has ended, the observation team remains in the field and happens to notice and record ascent into a sleeping grove.

[148] This allows the data that is entered in the field as two separate GPS waypoints but which comprises a single SWERB_LOC_DATA row to be inserted into the database piecemeal.

[149] The decision to create the ADCODES table instead of hardcoding values in the SWERB_LOC_DATA.ADcode column is somewhat arbitrary. At the time of this writing the SWERB_LOC_DATA table is only used to relate baboon groups with sleeping groves at the time of ascent or descent, or to relate the groups with waterholes when drinking. Baboons are never related to groves or waterholes for any other reason, nor are baboons ever related to any other landscape feature. Consequently the expectation is that there will be 3 rows created in the ADCODES table, one for ascent, one for descent, and one for neither that is used when groups drink at waterholes -- and that the ADCODES table will subsequently be forgotten.

Nevertheless, there is little if any extra technical work involved in having an ADCODES table, and its presence opens up future opportunities for recording additional relationships between baboon groups and landscape features, opportunities that do not require any additional programming or other technical involvement. It is for these reasons that the choice was made to have an ADCODES table.

[150] Although the Amboseli Baboon Research Project Monitoring Guide has no provision for uncertainty with respect to any location other than sleeping groves, the database contains no rules prohibiting such use. Because the SWERB_UPLOAD will not indicate uncertainty unless a sleeping grove is involved, having such a rule seems unnecessary.

[151] SWERB_LOC_DATA is itself an extension of SWERB_DATA.

[152] One would think that the TEMPMINS and TEMPMAXS tables would need a "span" column similar to RAINGAUGES.RGspan, and a table to correspond to RGSETUPS. As it happens the extraordinary diligence of the field staff in taking regular temperature measurements, in conjunction with the keen analytical skills of the Babase user population, makes such an enhancement a flagrant extravagance. Or, to put it another way, it mostly works the way it is so we're leaving well enough alone.

[153] 9 Sep 2010 14:08 EDT, from Dion Almond, Yes all sensors should be good to 1 decimal place.

[154] Usually these are the only reasons these gaps occur.

[155] The WeatherHawk always reports a value for this column. However, the instrument was reporting faulty values for a period of time. An improved validation rule would force the values to NULL for the time interval(s) during which the instrument was broken, and require non-NULL values otherwise.

Chapter 4. Baboon Data: Analyzed

These tables contain baboon-related data that are the result of analysis. The RANKS table holds the result of a manual analysis; its data may be updated by Babase users. The remaining tables are automatically updated in accordance with changes made to the primary baboon source data. There is no provision for manual modification of the automatically generated tables.

These tables exist because a relatively large amount of effort, either human or machine, is required to populate them. The tables store the results of that effort and make the results readily available to further analysis.

This section first presents the tables themselves. In the case of the automatically populated tables, or whatever portions of the primary source tables are automatically generated, subsequent sub-sections explain exactly how the tables are populated and so provide further insight into their content.

Darting

WBC_COUNTS (White Blood Cell Counts)

Results from white blood cell counting performed on blood smears collected during dartings. Contains one row for each count of a blood smear. Blood smears from a Dartid must first be recorded in the DART_SAMPLES table.

After darting, blood smears are stained using a Giemsa (or similar) stain. This allows for easy identification of different types of white blood cells when viewed under a microscope. The technician systematically scans the slide and counts the number of each cell type present until reaching a high number, usually 100 or 200. The counts are then used to estimate the proportion of each cell type present in the blood.

Occasionally, blood doesn't smear well, and the technician is unable to count even 100 cells before the smear becomes too dense to read. For these cases with lower total counts, users should consider for themselves whether or not enough cells were counted to accurately estimate cell type proportions.
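The proportion estimate described above amounts to dividing each cell-type count by the total number of cells counted. The sketch below assumes a WBC_COUNTS row is available as a Python dict keyed by the column names documented below; the helper name `wbc_proportions` and the counts in the usage line are illustrative, not part of Babase. It returns the total alongside the proportions so the user can judge whether enough cells were counted.

```python
# Hypothetical helper, not part of Babase: estimate cell-type proportions
# from one WBC_COUNTS row given as a dict keyed by column name.
CELL_TYPES = ("Basophils", "Eosinophils", "Monocytes", "Lymphocytes", "Neutrophils")


def wbc_proportions(row):
    """Return (total cells counted, {cell type: proportion of total})."""
    total = sum(row[cell] for cell in CELL_TYPES)
    if total == 0:
        raise ValueError("no cells were counted; proportions are undefined")
    return total, {cell: row[cell] / total for cell in CELL_TYPES}


# Illustrative, made-up counts for a single smear.
total, props = wbc_proportions(
    {"Basophils": 1, "Eosinophils": 4, "Monocytes": 5,
     "Lymphocytes": 40, "Neutrophils": 50}
)
print(total, props["Neutrophils"])  # 100 0.5
```

Returning the total rather than applying a fixed cutoff leaves the "enough cells" judgment with the user, as the text recommends.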

Each row's Count_Date must be on or after the row's related DARTINGS.Date.

The combination of Dartid and Slide_number must be unique.

The Slide_number cannot exceed the number of blood smears recorded in the related DART_SAMPLES.Num.
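The three rules above can be expressed as a small client-side check. This is a sketch only: Babase enforces its rules inside the database. Here rows are assumed to be dicts, and the lookup dicts standing in for DARTINGS.Date and the DART_SAMPLES.Num blood-smear count are hypothetical inputs the caller must supply.

```python
from datetime import date


def wbc_rule_problems(rows, darting_dates, smear_counts):
    """Check WBC_COUNTS-like rows against the rules stated above.

    rows          -- dicts with Dartid, Count_Date, Slide_number
    darting_dates -- {Dartid: DARTINGS.Date} (assumed input)
    smear_counts  -- {Dartid: DART_SAMPLES.Num of blood smears} (assumed input)
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    seen = set()
    for row in rows:
        dartid, slide = row["Dartid"], row["Slide_number"]
        # Dartid and Slide_number together must be unique.
        if (dartid, slide) in seen:
            problems.append(f"duplicate Dartid/Slide_number {dartid}/{slide}")
        seen.add((dartid, slide))
        # Count_Date must be on or after the darting date.
        if row["Count_Date"] < darting_dates[dartid]:
            problems.append(f"Dartid {dartid}: counted before the darting")
        # Smears must be recorded in DART_SAMPLES and the slide number in range.
        num = smear_counts.get(dartid)
        if num is None:
            problems.append(f"Dartid {dartid}: no blood smears in DART_SAMPLES")
        elif slide > num:
            problems.append(f"Dartid {dartid}: slide {slide} exceeds {num} smears")
    return problems


rows = [
    {"Dartid": 7, "Count_Date": date(2015, 6, 2), "Slide_number": 1},
    {"Dartid": 7, "Count_Date": date(2015, 5, 30), "Slide_number": 3},
]
print(wbc_rule_problems(rows, {7: date(2015, 6, 1)}, {7: 2}))
```

The second sample row breaks two rules at once (counted before the darting, and slide 3 of only 2 smears), so two problems are reported.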

Column Descriptions

WCId (WBC_Counts Identifier)

A unique integer that identifies the cell count.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Dartid (Darting Identifier)

The DARTINGS.Dartid of the darting from which the counted blood smear was collected.

This column may not be NULL.

Count_Date

The date that the blood smear was counted.

This column may not be NULL.

Basophils

The number of basophils counted.

This column may not be NULL.

Eosinophils

The number of eosinophils counted.

This column may not be NULL.

Monocytes

The number of monocytes counted.

This column may not be NULL.

Lymphocytes

The number of lymphocytes counted.

This column may not be NULL.

Neutrophils

The number of neutrophils counted.

This column may not be NULL.

Counted_by

The OBSERVERS.Initials of the person who performed this count.

This column may not be NULL.

Slide_number

An integer indicating which of this Dartid's blood smear slides was counted for this row.

This column may not be NULL.

Group Membership and Life Events

DAD_DATA (Paternity analysis results)

A summary of paternity analyses. Contains one row for each offspring having a paternity analysis.

The Kid value must be unique -- there can be at most 1 row on DAD_DATA per offspring. The BIOGRAPH row related to the Kid must have a non-NULL Birth -- the offspring must be born.

There can be information as to whether the mother has been genetically sampled (there can be a non-NULL Mom_sampled) if and only if the mother is known (BIOGRAPH.Pid of the Kid is non-NULL). The system will report an error when this is not the case. The system will not allow changes to Mom_sampled that violate this rule but does allow changes to BIOGRAPH.Pid that violate this rule. It is assumed that any inconsistencies introduced in this fashion are only temporary and will be fixed soon when the related Mom_sampled value is updated.

There can be information as to whether the father has been genetically sampled (there can be a non-NULL Dad_sampled) if and only if the father is known (Dad_consensus is not NULL).

The number of potential dads genotyped (Pdads_typed) must not be larger than the number of potential dads considered (Pdads_considered). This number must be 0 or larger.

The columns identifying potential dads, Dad_excl, Dad_1perr, Dad_5perr, Dad_allmales, and Dad_consensus are subject to a number of data integrity checks, as follows: The individual must be male. If the mother is known he must be alive during the mother's fertile period -- the male's BIOGRAPH.Statdate must be on or after the mother's Zdate minus the 5 day fertile period, minus an additional 14 days to allow for interpolation if the male is alive. If the mother is known the male must be mature before the conception date -- the male must have a row in MATUREDATES and MATUREDATES.Matured must be before the Zdate. The system will report a warning if the male is not in the mother's supergroup at any time during the fertile period.
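The error checks on a named potential dad can be sketched as follows. Several details are assumptions, not taken from Babase: the sex code "M", a boolean `alive` flag, and the reading that the extra 14 day interpolation allowance applies only when the male is alive. The supergroup warning is not modeled.

```python
from datetime import date, timedelta

FERTILE_PERIOD = timedelta(days=5)     # the 5 day fertile period
INTERP_ALLOWANCE = timedelta(days=14)  # extra slack for interpolation


def potential_dad_problems(sex, statdate, alive, matured, zdate):
    """Errors for one male named in a Dad_* column (sketch only).

    sex      -- sex code of the male ("M" assumed to mean male)
    statdate -- the male's BIOGRAPH.Statdate
    alive    -- True if the male is alive (assumed flag)
    matured  -- MATUREDATES.Matured, or None if he has no MATUREDATES row
    zdate    -- the mother's Zdate, or None when the mother is unknown
    """
    problems = []
    if sex != "M":
        problems.append("not male")
    if zdate is None:  # mother unknown: the remaining checks do not apply
        return problems
    earliest = zdate - FERTILE_PERIOD
    if alive:  # assumed reading of the interpolation allowance
        earliest -= INTERP_ALLOWANCE
    if statdate < earliest:
        problems.append("not alive during the fertile period")
    if matured is None or matured >= zdate:
        problems.append("not mature before the Zdate")
    return problems


z = date(2000, 1, 20)
print(potential_dad_problems("M", date(2000, 1, 1), True, date(1995, 1, 1), z))
```

With a Zdate of 20 January the living male's cutoff is 1 January, so the usage line reports no problems; the same male marked as not alive would fail the fertile-period check.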

The Loci_excl column must be NULL if the Dad_excl column is NULL. Otherwise Loci_excl must be non-NULL.

The Conf_1perr column must be NULL if the Dad_1perr column is NULL. Otherwise Conf_1perr must be non-NULL.

The Conf_5perr column must be NULL if the Dad_5perr column is NULL. Otherwise Conf_5perr must be non-NULL.

The Conf_allmales column must be NULL if the Dad_allmales column is NULL. Otherwise Conf_allmales must be non-NULL.
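Each of the four rules above says that a Dad_* column and its companion column must be NULL together or non-NULL together. A compact way to express this, together with the Pdads_typed bound stated earlier, is sketched below. It assumes a DAD_DATA row is given as a dict with None standing for NULL; the function name is hypothetical.

```python
# Each Dad_* column paired with the column that must be NULL exactly when it is.
PAIRED_COLUMNS = {
    "Dad_excl": "Loci_excl",
    "Dad_1perr": "Conf_1perr",
    "Dad_5perr": "Conf_5perr",
    "Dad_allmales": "Conf_allmales",
}


def dad_data_problems(row):
    """Return rule violations for one DAD_DATA row given as a dict."""
    problems = []
    for dad_col, other_col in PAIRED_COLUMNS.items():
        # A violation occurs when exactly one column of the pair is NULL.
        if (row.get(dad_col) is None) != (row.get(other_col) is None):
            problems.append(f"{dad_col} and {other_col} must both be NULL or both non-NULL")
    considered, typed = row["Pdads_considered"], row["Pdads_typed"]
    if typed < 0 or typed > considered:
        problems.append("Pdads_typed must be between 0 and Pdads_considered")
    return problems


print(dad_data_problems({"Dad_excl": "ABC", "Loci_excl": None,
                         "Pdads_considered": 3, "Pdads_typed": 2}))
```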

The Date must be on or after the offspring's BIOGRAPH.Birth date.

The Dad_consensus may not have been a perfect choice, but merely the best option; for many reasons, the genotypes of the offspring, mom, and consensus dad may conflict, or mismatch. These mismatches do not mean that the Dad_consensus is invalid. The reasons for these mismatches are known (e.g. quality of tissue samples, technological limitations) and are considered when doing the paternity analyses. A Dad_consensus is provided only when the user is reasonably confident of its accuracy, regardless of any mismatches recorded in Consensus_Mismatch.

The offspring's Consensus_Mismatch can be NULL only when the Dad_consensus is also NULL.

A Completeness score for an offspring's paternity assignment is also given. This score is a categorical expression of how much is known about the genotypes of the offspring, mother, and potential dads, as well as how much more information is expected to be gained in the future. The Completeness for an offspring with few Pdads_typed, for example, depends on whether the untyped potential dads are still alive and available for further sampling. If all potential dads are dead, then no further information is likely to arise to inform this assignment and it is probably as complete as it will ever be. If several untyped potential dads are still alive, then the assignment has the potential to change in the future and should have a different Completeness score.

Tip

Use the Completeness column when planning a new paternity analysis to help determine which paternities should be re-analyzed and which can be omitted from any further analyses.

Dadid (Dad_Data Identifier)

A unique integer which identifies the DAD_DATA row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Kid

The individual on which the paternity analysis was done. A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here. This column must not be NULL.

Mom_sampled (Mother's genetic sample taken)

TRUE when there is a genetic sample of the mother on file, FALSE when there is not. This column must not be NULL.

Dad_sampled (Father's genetic sample taken)

TRUE when there is a genetic sample of the father on file (the Dad_consensus), FALSE when there is not. This column must not be NULL.

Dad_excl (Dad manually chosen based on an Exclusion analysis of genetic loci)

The father chosen based on an exclusion analysis of locus matches between the offspring and all potential fathers for which genetic data were available (note that potential fathers are, by definition, males that were in the group in which the infant was conceived during the 5 days prior to the Zdate). A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here. Field observations of physical proximity, social interaction, etc., are not part of this analysis.

This column may be NULL when the exclusion analysis yields no father.

Loci_excl (Number of Loci that do not match the Dad_excl)

The number of loci at which the offspring and father, the Dad_excl, do not match.

The value of this column, when non-NULL, must be between 0 and 40, inclusive.

Pdads_considered (Number of potential dads considered)

Total number of potential dads considered. The primary factors leading to inclusion in the pool of potential fathers are maturity as of the Zdate and membership in the mother's social group during the 5 days prior to the Zdate.[157]

Tip

The POTENTIAL_DADS view may be used to produce a list of potential fathers that are currently considered to be members of the mother's group at the time of conception.

This column must not be NULL and must be between 0 and 50, inclusive.

Pdads_typed (Potential Dads Typed)

The number of potential dads, those which Pdads_considered counts, for which there are genetic data.

This column must not be NULL.

Dad_1perr (Dad chosen by software assuming a 1% error)

The father chosen by the analysis software from among potential fathers (those present in the mother's social group during the 5 days prior to the Zdate) under the assumption of a 1% error rate in the determination of the genotype at the loci. A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here.

This column is NULL when the automated analysis yields no father given an 80% confidence level.

Conf_1perr (Confidence level given a 1% error assumption)

The percent confidence in the Dad_1perr result. Values must be NULL or integers between 0 and 1, inclusive.

Dad_5perr (Dad chosen by software assuming a 5% error)

The father chosen by the analysis software from among potential fathers (those present in the mother's social group during the 5 days prior to the Zdate) under the assumption of a 5% error rate in determining the genotype at the loci. A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here.

This column is NULL when the automated analysis yields no father given an 80% confidence level.

Conf_5perr (Confidence level given a 5% error assumption)

The percent confidence in the Dad_5perr result. Values must be NULL or integers between 0 and 1, inclusive.

Dad_allmales (Dad chosen by software from All Males in the population)

The father chosen by the analysis software considering all males in the population under the assumption of a 1% error rate in determining the genotype at the loci. A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here.

This column is NULL when the automated analysis yields no father given an 80% confidence level.

Conf_allmales (Confidence level for Dad_allmales)

The percent confidence in the Dad_allmales result. Values must be NULL or integers between 0 and 1, inclusive.

Dad_consensus (The manually chosen father-of-choice)

The father chosen taking all factors into account. A three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here.

This column may be NULL if there is no consensus dad.

Software (Software used in paternity analysis)

Code for the software used[158] in the genetic paternity analysis. The legal values of this column are defined by the DAD_SOFTWARE support table.

Date (Date analysis performed)

The date of paternity assignment. This column may not be NULL.

Comments

Comments on or notes regarding the analysis. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character. This column may be NULL.

Consensus_Mismatch (Mismatch Types Observed With Consensus Dad)

The PATERNITY_MISMATCHES.Mismatch category for the trio of Kid, mom, and Dad_consensus.

Completeness (Completeness of Genotypes Used for this Assignment)

The PATERNITY_COMPLETENESS.Completeness of the paternity assignment (or lack thereof) for the offspring.

This column may not be NULL.

MEMBERS (Day-by-day Group Membership)

The group membership table. This table records which group each animal is in on which date, excepting fetal losses (individuals with no Sname). There is a row in MEMBERS for every individual for every day between Birth and Statdate, inclusive, including periods during which the whereabouts of an individual are either recorded as being unknown or assumed unknown by the interpolation procedure. (See: the unknown group.) Some living individuals have MEMBERS rows after their Statdate; for more information see the section: Interpolation at the Statdate. MEMBERS is most useful when one is interested in an individual's location on a particular date. Simply check MEMBERS for the individual on that date. To find all the individuals in a group on a date, look at all the rows in the table on that date for the group.
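The two lookups just described are single-table queries. The sketch below uses Python's built-in sqlite3 module with an in-memory, much-simplified stand-in for MEMBERS; Babase itself runs on PostgreSQL, and the table layout, Snames, and group numbers here are made up for illustration. It does not aggregate sub-groups into supergroups (the supergroup() function mentioned below).

```python
import sqlite3

# In-memory stand-in for a much-simplified MEMBERS table; rows are made up.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE members ("
    " sname TEXT NOT NULL, date TEXT NOT NULL, grp REAL NOT NULL,"
    " origin TEXT NOT NULL, UNIQUE (sname, date))"
)
con.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        ("AAA", "2000-01-01", 1.0, "C"),
        ("BBB", "2000-01-01", 1.0, "I"),
        ("CCC", "2000-01-01", 2.0, "C"),
    ],
)


def group_on(con, sname, day):
    """Group an individual was in on a day, or None if there is no row."""
    row = con.execute(
        "SELECT grp FROM members WHERE sname = ? AND date = ?", (sname, day)
    ).fetchone()
    return row[0] if row else None


def members_of(con, grp, day):
    """Snames of all individuals recorded in a group on a day."""
    cur = con.execute(
        "SELECT sname FROM members WHERE grp = ? AND date = ? ORDER BY sname",
        (grp, day),
    )
    return [sname for (sname,) in cur]


print(group_on(con, "AAA", "2000-01-01"))  # 1.0
print(members_of(con, 1.0, "2000-01-01"))  # ['AAA', 'BBB']
```

Because Sname and Date are unique together, the first query can return at most one row.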

MEMBERS is a single population-wide table created and updated automatically using information from CENSUS, BIOGRAPH, and DEMOG. The method used to do this is called interpolation and is described fully in a section below. Briefly, interpolation guesses which group an individual is likely to be in when there is no observational data. The MEMBERS rows which are the result of guessing have an I as their Origin value.

Note

Babase requires that an animal be located in exactly one group on any particular day; the combination of Sname and Date must be unique. The intent of this table is to record the location of each animal at the start of each day. See other documents for further information on how the actual practice of data acquisition and entry impacts this goal.

Babase populates this table automatically, users cannot directly manipulate the table's data.

Membid (Members IDentifier)

A unique integer which identifies the MEMBERS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname

The individual whose location is being recorded. The three-letter code that identifies the individual's row in the BIOGRAPH table. There will always be a row in BIOGRAPH for the individual identified here.

This column may not be NULL.

Date

The date.

This column may not be NULL.

Grp

The group where the individual is located. This is a Gid value from GROUPS. This field should contain the most specific sub-grouping available -- subject to the constraints of the data entry protocol, of course. Aggregation into larger groupings is accomplished by retrieving the associated Supergroup from GROUPS and/or use of the supergroup() function.

This column may not be NULL.

Note

Usage exception: For the years 1989-1991, inclusive, the groups recorded for the sub-groups of Alto's group do not necessarily reflect the actual groupings of the animals on a particular day, but are instead indications of the group-splitting process. See Jeanne Altmann and the Data Management Manual for a further explanation.

Origin

A one-letter code indicating the source of the location information. This information is derived from, and has the same values as, the Status column of CENSUS, with the exceptions that MEMBERS.Origin contains the I (interpolated) value not found in CENSUS, and does not contain the A (absent) value. The codes are as follows: C (CENSUS) values represent census data points, I (interpolated) values are derived from the census data points, D (demography) values represent demography notes not present in the census sheets, M and N (manual) values represent census data points due to operator intervention in CENSUS. The S, E, F, B, G, T, L, and R codes are derived from analysis of historical data. See the CENSUS section for further information.

This column may not be NULL.

Interp

The time interval, in days, from the date on which an individual was previously observed to be in a group (censused or born into the group -- automatic placement in the unknown group does not count) to the date of the MEMBERS row. So the value is 0 on those days on which the individual is censused (and on the individual's birth date), 1 on those (non-census) days immediately before or after the census days, etc. For those MEMBERS rows in which the interpolation procedure has associated an individual with the unknown group, for lack of a better place to put it, the Interp column is the number of days distant from the interpolating CENSUS row, or the birth date, that determined the group membership. Note that the CENSUS row that determined that the MEMBERS.Grp should be unknown may record an absence.

Important

The Interp value is not meaningful over intervals that contain census rows that are themselves the result of an analysis. Over these intervals Interp is NULL. For more information see Interpolation, Data are not Re-Analyzed.

This column may be NULL.

RANKS (Rankings Within Groups)

The ranking of individuals within groups. This table contains a row for every month for every ranked individual for every type of rank assigned to the individual. When the ranking has not been done for a type of rank in a month, there are no rows for members of that group for that month with that type of rank.

Rankings are determined via a manual process that considers both quantitative information, such as the outcome of agonistic interactions within a particular month, and some qualitative judgments, such as other behavior observed during and surrounding the month in question. As such the rankings are somewhat smoothed and are not strictly dependent upon observations made within a single one-month interval. For further information please consult your local Babase scientist.

The system will report a warning when a ranking of some Rnktype has been done on a group and there are individuals (returned by the RNKTYPES.Query) who have not been ranked.

Caution

The above warning has not yet been implemented.

Rankings may be based on irregular observations of a group before the long-term study began, or before it became an "official" study group. Either way, the ranks for such a group will likely predate the individuals' Entrydates. Because of this, the system allows, but issues a warning about, rankings in which an individual's Rnkdate is before the first of the month of the individual's Entrydate.

The combination of Sname, Rnkdate, Grp, and Rnktype must be unique.

Ranks are assigned within groups, so all individuals must be in the group ranked at some point during the month. Specifically, MEMBERS must record that the ranked individual is a member of some sub-group[159] of the ranked group's Supergroup, as determined by the supergroup() function given MEMBERS.Grp and MEMBERS.Date, during the ranked month.

Caution

Be careful when changing group membership or group rankings; the rank will almost certainly change if an individual's group is changed.

A number of possibilities exist for ranking, and coding the groups, during a period of group fission. Individuals may be simultaneously ranked within all temporary sub-groups and in the original group, the supergroup, regardless of which individuals are censused in which sub-groups. The censuses of the sub-groups and the records of the temporary sub-groups may be retained forever, even if the sub-groups never become groups in their own right. The only caveat is that once a group becomes a permanent group (GROUPS.Permanent) it is no longer a sub-group of the group from which it fissioned, at which point an individual must be censused in both the original group and the sub-group in order to be ranked in both.

Rnkid (Ranks IDentifier)

A unique integer which identifies the RANKS row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Sname

The individual whose rank is being recorded. The three-letter code which uniquely identifies an individual (an Sname) in BIOGRAPH. There must always be a row in BIOGRAPH for the individual identified here. This column must not be NULL.

Rnkdate

A date that falls on the first day of a month, representing the year and month of the ranking. The year must be between 1940 and 2040, inclusive. This column must not be NULL.

Tip

Use the rnkdate() function to obtain the first day of the month when writing queries.
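For client-side work outside the database, the Rnkdate rules above can be mirrored in a few lines of Python. This is a hypothetical analogue, not a call into Babase's rnkdate() function, whose exact signature is not described here. It simply truncates a date to the first of its month and checks the 1940-2040 year range.

```python
from datetime import date


def first_of_month(day):
    """Python analogue of truncating a date to its ranking month."""
    return day.replace(day=1)


def valid_rnkdate(value):
    """A Rnkdate must be the first of a month with a year in 1940..2040."""
    return value.day == 1 and 1940 <= value.year <= 2040


print(first_of_month(date(2005, 3, 17)))  # 2005-03-01
```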

Grp

The group (GROUPS.Gid) in which the individual is ranked.

Rnktype

The kind of rank assigned to the individual, a Rnktype value from the RNKTYPES table. This column may not be NULL. Examples of various rankings are: Adult Females, All Females, etc., as defined in the RNKTYPES table.

Rank

This is the ranking among all the animals of the Rnktype in the group over the Rnkdate period. The most dominant individual is given a rank of 1, the next most dominant a rank of 2, etc. This information is updated through the ranking program and as a rule need not be manually updated. This column must not be NULL. The rank values must be contiguous and start with 1.[160]
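The requirement that rank values be contiguous and start with 1 can be checked per group, month, and Rnktype. The sketch below checks only that the distinct values run 1, 2, ..., n without gaps; whether tied ranks are permitted is not addressed in this section, so the sketch neither forbids nor requires them. The function name is hypothetical.

```python
def ranks_contiguous(ranks):
    """True when the distinct rank values are exactly 1, 2, ..., n.

    `ranks` holds the Rank values for one Grp, Rnkdate, and Rnktype.
    """
    distinct = sorted(set(ranks))
    return distinct == list(range(1, len(distinct) + 1))


print(ranks_contiguous([3, 1, 2]), ranks_contiguous([1, 3]))  # True False
```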

Hybrid Scores

HYBRIDGENE_ANALYSES

A table listing each analysis that has been performed to generate genetic hybrid scores for individual baboons, with basic information about each analysis.

Each analysis combines statistical techniques with the genetic data available at the time to estimate what proportion of each individual's genome came from ancestry of a specified other species[161]. These estimates are the so-called hybrid scores. After several years have elapsed, more individuals are available for scoring, which prompts a new analysis. For many reasons, each analysis may yield somewhat different scores for the same individual. A more-recent analysis does not necessarily negate or supersede an older one, so all analyses are stored here.

The HYBRIDGENE_ANALYSES.Date must be after the BIOGRAPH.Entrydate of all individuals scored in that analysis in HYBRIDGENE_SCORES.

Column Descriptions

HGAId (HybridGene_Analyses Identifier)

A unique integer that identifies the analysis.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Date

The date the hybrid scores were generated by this analysis.

This column may not be NULL.

Analyzed_By

The OBSERVERS.Initials of the person who performed the analysis.

Tip

It's technically possible to have more than one person involved with an analysis, but even in such cases there will certainly be a lead whose initials should fill this column.

This column may not be NULL.

Software

The HYBRIDGENE_SOFTWARE.Software used to perform the analysis.

This column may not be NULL.

Marker

The MARKERS.Marker type used in the analysis.

This column may not be NULL.

Comments

Notes or comments about the analysis.

This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

This column may be NULL.

HYBRIDGENE_SCORES

A table listing all the hybrid scores determined by genetic hybridity analyses. Hybridity analyses use statistical tools that also determine upper and lower confidence limits. This table also stores those values.

The combination of Sname and HGAId must be unique.

An individual's Score must be greater than or equal to its Lower_Conf, and less than or equal to its Upper_Conf.

Column Descriptions

HGSId (HybridGene_Scores Identifier)

A unique integer that identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

HGAId (HybridGene_Analyses Identifier)

The HYBRIDGENE_ANALYSES.HGAId of the analysis in which this score was determined.

This column may not be NULL.

Sname

The BIOGRAPH.Sname of the scored individual.

This column may not be NULL.

Score

The individual's hybrid score for this analysis. This value must be between 0 and 1 (inclusive).

This column may not be NULL.

Lower_Conf

The lower confidence limit for the hybrid score. This value must be between 0 and 1 (inclusive).

This column may not be NULL.

Upper_Conf

The upper confidence limit for the hybrid score. This value must be between 0 and 1 (inclusive).

This column may not be NULL.
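Taken together, the HYBRIDGENE_SCORES rules require 0 ≤ Lower_Conf ≤ Score ≤ Upper_Conf ≤ 1. A minimal sketch of that check for a single row follows. The function name and the plain-float arguments are assumptions made for illustration.

```python
def score_problems(score, lower, upper):
    """Rule violations for one HYBRIDGENE_SCORES row (sketch only)."""
    problems = []
    for name, value in (("Score", score), ("Lower_Conf", lower), ("Upper_Conf", upper)):
        if not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")
    if not lower <= score <= upper:
        problems.append("Score must lie between Lower_Conf and Upper_Conf")
    return problems


print(score_problems(0.3, 0.2, 0.4))  # []
```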

Interpolation

The Babase database uses a procedure called interpolation to update MEMBERS whenever the CENSUS table, or the BIOGRAPH.Birth, or BIOGRAPH.Statdate columns are updated. Interpolation extrapolates the group membership of individuals into days for which there is no actual observation of the individuals' whereabouts. It guesses with which group an individual is associated, given knowledge of the individual's group membership (or lack thereof) at given points in time, and records the result in MEMBERS. Thus, MEMBERS always has a row recording group membership for every day of every individual's life.

This section comprises 3 sub-sections. The first section introduces interpolation incrementally. Rules are presented in an informal fashion and examples and exceptions progressively developed. The second section is a formal specification of interpolation. The third section supplements the formal specification with expectations regarding the use of interpolation and brief descriptions of interpolation's implications. Most of the third section is a restatement of material already presented in the first section.

Interpolation's 3 Fundamentals

It is primarily by the field census records that Babase tracks group membership. However, despite its name, within the Babase database the CENSUS table is the source of all group membership information and so contains data from sources other than just the field census records. Babase places rows in the CENSUS table to indicate presence in a group whenever any demography information is stored in other tables.[162][163] Throughout this section it is to be understood that any sort of demographic information that results in CENSUS data is implied when the term census, or its plural, is used. Unfortunately, the term census is further overloaded. It is occasionally used in the colloquial sense, meaning present -- found when a group census was taken, the alternative being absent. It is hoped the meaning will be clear from context.

It is important to remember that censuses record absence from a group as well as presence in a group. There are two mutually exclusive classes of CENSUS rows: absences, which record absence from specific groups on specific days; and locating censuses, which place the individual in specific groups on specific days.

The premise of interpolation is that an individual is assumed to be in the group where observed for a period of 14 days to either side of the observation unless there is an indication otherwise. To this end, interpolation keeps an individual in the group where a census locates him for a time period that is the shortest of:

  1. Half of the time interval between the census and the individual's next (or prior) census that finds the individual in any group.

  2. Half of the time interval between the census and the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.

  3. The 14 day Interpolation Limit. Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.

Should the above process not place an individual in a group, the individual is placed in the unknown group, so long as the individual is alive on the day in question.

There are some subtleties to these rules, and there is further elaboration necessary to allow for old style CENSUS rows, which do not directly correspond with actual census taking, and other factors. But these rules are the foundation and we begin with them.
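The three rules and the unknown-group fallback can be sketched for one individual using whole-day dates. This is a simplified illustration, not Babase's implementation. Births, deaths, the alive check, and old-style CENSUS rows are all left out. Two behaviours are assumptions made here that the formal specification may settle differently: a day falling exactly on a midpoint is counted as covered, and if two censuses cover the same day, the nearer one wins, with ties going to the earlier census. Group 9 stands in for the unknown group, as in the figures that follow.

```python
import math

INTERP_LIMIT = 14  # days: the Interpolation Limit
UNKNOWN_GRP = 9    # the unknown group, as numbered in the figures


def _half_gap(day, others, forward):
    """Half the distance from `day` to the nearest day in `others` strictly
    after it (forward=True) or before it; infinity if there is none."""
    if forward:
        gaps = [other - day for other in others if other > day]
    else:
        gaps = [day - other for other in others if other < day]
    return min(gaps) / 2 if gaps else math.inf


def interpolate(presences, absences, first, last):
    """Assign a group to each day from `first` to `last`, inclusive.

    presences -- {day: grp} locating censuses
    absences  -- {grp: set of days} recorded absences from each group
    """
    located = sorted(presences)
    members = {}
    for day in range(first, last + 1):
        best = None  # (distance, grp) of the nearest census covering the day
        for census_day in located:
            grp = presences[census_day]
            forward = day > census_day
            # Reach is the shortest of the three limits in this direction;
            # only absences from the census's own group are considered.
            reach = min(
                INTERP_LIMIT,
                _half_gap(census_day, located, forward),
                _half_gap(census_day, absences.get(grp, set()), forward),
            )
            distance = abs(day - census_day)
            # Ties keep the earlier census (an assumption, see above).
            if distance <= reach and (best is None or distance < best[0]):
                best = (distance, grp)
        members[day] = best[1] if best else UNKNOWN_GRP
    return members


# Present in group 1 on days 1, 3 and 11; absent from group 1 on day 8.
m = interpolate({1: 1, 3: 1, 11: 1}, {1: {8}}, 1, 11)
print([m[d] for d in range(1, 12)])  # [1, 1, 1, 1, 1, 9, 9, 9, 9, 1, 1]
```

With the census pattern used in the figures below, the day-3 census reaches only to day 5.5 because of the day-8 absence, and the day-11 census reaches back only to day 9.5. Days 6 through 9 therefore fall into the unknown group.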

Interpolation Visualized

Interpolation is best described with the help of diagrams as it is all about computing and comparing time intervals of various lengths, which are easily represented in a diagram by lines of various lengths. We begin with the simplest case, censusing a single individual either present or absent in a single group. This simple case is elaborated on extensively to illustrate a variety of special cases such as birth, death, prolonged periods without observation, and so forth, before introducing the complexities of multiple groups into the example.

Tip

As the examples throughout this section are developed be sure to pay close attention to the diagrams' keys. At times the meaning of a symbol changes from diagram to diagram to reflect a subtlety.

Interpolating presences and absences

Figure 4.1 shows a record of one individual's censuses. The group, for the moment we'll assume group 1, is censused 4 times over a period of 11 days. One day the individual is absent.

Figure 4.1. An Individual is Censused Present and Absent

                  One individual's census records
   CENSUS:        C       C                   A           C
     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
              

The first step in interpolation is to construct the various intervals from the given CENSUS rows. Figure 4.2 shows how interpolation splits the difference between presences and absences to construct two intervals for each locating census, one preceding the census and one following it. As the diagrams given here can only show a window in time and omit what falls outside that window, only one interval each is shown for the censuses taken on day 1 and day 11.

Figure 4.2. Interpolating From Presences and Absences

                  Interpolation intervals within a group
   CENSUS:        C       C                   A           C
Intervals:        X---|---X---------|         O     |-----X
     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Interpolation creates MEMBERS rows that place the individual in a group each day. Figure 4.3 shows how group membership assignment is based upon the computed intervals. Because of the absence, the individual is placed in group 9, the unknown group, on some days.

Figure 4.3. Interpolating Group Membership

                  Intervals determine group membership
   CENSUS:        C       C                   A           C
Intervals:        X---|---X---------|         O     |-----X
  MEMBERS.
    Group:        1   1   1   1   1   9   9   9   9   1   1
   Origin:        C   I   C   I   I   I   I   I   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Figure 4.3 also introduces the MEMBERS' Origin column. As can be seen, the Origin column mimics the corresponding CENSUS Status column on those days when interpolation is not guessing group membership. Origin is I on those days when interpolation is guessing.
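For a single group, splitting the difference between presences and absences amounts to placing each day with its nearest recorded census event. The sketch below is a hypothetical illustration of that idea, not Babase's implementation: the function name `members_rows`, the integer day numbering, and the constant `UNKNOWN_GROUP` are assumptions for the example. It reproduces the Group and Origin lines of Figure 4.3.

```python
# Hypothetical sketch (not Babase code): single-group membership by
# "splitting the difference" between censuses. Days are plain integers.
UNKNOWN_GROUP = 9  # the unknown group used in the figures


def members_rows(present, absent, first_day, last_day, group=1):
    """Return {day: (group, origin)} for every day in [first_day, last_day].

    present: days on which a census found the individual in `group`.
    absent:  days on which a census of `group` recorded him absent.
    A day goes to `group` when the nearest recorded event is a presence,
    otherwise to the unknown group. Exact presence/absence ties are resolved
    arbitrarily here (the Midpoint Rule handles them), and the 14-day limit
    is not applied.
    """
    events = [(d, True) for d in present] + [(d, False) for d in absent]
    result = {}
    for day in range(first_day, last_day + 1):
        _, is_presence = min(events, key=lambda e: abs(e[0] - day))
        grp = group if is_presence else UNKNOWN_GROUP
        # Origin is C only where an actual census located the individual.
        origin = "C" if day in present else "I"
        result[day] = (grp, origin)
    return result


# Figure 4.3: present on days 1, 3 and 11; absent on day 8.
rows = members_rows(present={1, 3, 11}, absent={8}, first_day=1, last_day=11)
print([rows[d][0] for d in range(1, 12)])
# [1, 1, 1, 1, 1, 9, 9, 9, 9, 1, 1]
print("".join(rows[d][1] for d in range(1, 12)))
# CICIIIIIIIC
```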

The MEMBERS' Interp column records the number of days from the nearest census in which the individual was recorded as present in some known group. Interp is zero on those days when a census has located the individual. The recorded absence is reflected in the group but is immaterial to Interp: even though there is an absence, the Interp count runs over the whole interval between the two locating censuses. Interp thus gets its value from splitting the difference between censuses that record presence in a group, a different sort of split than the one used to decide into which group an individual is placed. Figure 4.4 extends Figure 4.3, showing the computation of Interp. With this addition interpolation is finished; the MEMBERS table can be constructed from the given CENSUS rows.
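Because absences are ignored, Interp reduces to the distance in days to the nearest locating census. A minimal sketch, assuming integer day numbers and a hypothetical function name `interp_values`, reproduces the Interp line of Figure 4.4:

```python
def interp_values(locating_days, first_day, last_day):
    """Hypothetical sketch: Interp = days to the nearest locating census.

    Absences are deliberately not passed in; only censuses that located
    the individual in some known group count.
    """
    return {
        day: min(abs(day - c) for c in locating_days)
        for day in range(first_day, last_day + 1)
    }


# Figure 4.4: locating censuses on days 1, 3 and 11 (the day 8 absence is ignored).
interp = interp_values({1, 3, 11}, 1, 11)
print([interp[d] for d in range(1, 12)])
# [0, 1, 0, 1, 2, 3, 4, 3, 2, 1, 0]
```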

Figure 4.4. Computing Interp Values

                  The resulting MEMBERS rows
    CENSUS:        C       C                   A           C
 Intervals
 For Group:        X---|---X---------|         O     |-----X
For Interp:        X~~~|~~~X~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~X
   MEMBERS.
     Group:        1   1   1   1   1   9   9   9   9   1   1
    Interp:        0   1   0   1   2   3   4   3   2   1   0
    Origin:        C   I   C   I   I   I   I   I   I   I   C

      Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
~ Inside of interval
| Midpoint of interval
              

Applying the 14 day interpolation limit

So far we have explored only the first 2 of the 3 fundamental interpolation intervals, those dealing with being censused present and absent. Before we elaborate further and examine the more complicated interactions between presences and absences, let us dispense with the 14 day interpolation limit.

Figure 4.5 shows the effect of the 14 day interpolation limit. To save space in this document, some days are omitted from the diagram; there are no censuses, present or absent, on the omitted days. As the Date: line shows, a total of 33 days is examined: an entire month 31 days in length and the first two days of the following month. Again, we assume the censuses are taken in group 1.

Figure 4.5. The 14 Day Interpolation Limit

                 The shorter intervals are chosen
      CENSUS:    C                                           C
C C Interval:    X----- ... -----------|------- ... ---------X
14 Day Limit:    X----- ... -------|       |--- ... ---------X
     MEMBERS.
       Group:    1   1  ...  1   1   9   9   1  ...  1   1   1
      Interp:    0   1  ... 13  14  15  15  14  ...  2   1   0
      Origin:    C   I  ...  I   I   I   I   I  ...  I   I   C

        Date:    1   2  ... 14  15  16  17  18  ... 31   1   2

Key:
C Censused present in group (group 1)
X Known present in group (group 1)
- Inside of interval
| Interval endpoint
              


Because the 16th and 17th are more than 14 days away from either census, the individual is placed in the unknown group on those days. Days that are closer to the actual censuses are interpolated into group 1. So, as the rules require, the individual is interpolated into the censused group for the shorter of the two time periods. As before, all the interpolated MEMBERS rows, those which do not correspond to an actual census, have an Origin of I. And as before, the Interp column counts up from and down to the actual censuses.
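With no recorded absences, the halfway-to-census intervals cover every day, so only the 14 day limit decides membership. The following hypothetical sketch (function and constant names are assumptions) uses two censuses 31 days apart, on days 1 and 32, and confirms that only the two days farther than 14 days from both censuses fall into the unknown group:

```python
UNKNOWN_GROUP = 9
INTERP_LIMIT = 14  # days


def group_with_limit(present, first_day, last_day, group=1, limit=INTERP_LIMIT):
    """Hypothetical single-group sketch with no recorded absences: a day is
    kept in `group` only within `limit` days of the nearest census that
    found the individual there; any other day goes to the unknown group."""
    result = {}
    for day in range(first_day, last_day + 1):
        distance = min(abs(day - c) for c in present)
        result[day] = group if distance <= limit else UNKNOWN_GROUP
    return result


# Two censuses 31 days apart, on days 1 and 32.
groups = group_with_limit({1, 32}, 1, 32)
print([d for d, g in groups.items() if g == UNKNOWN_GROUP])  # [16, 17]
```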

Interpolation and Birth Dates

There are some exceptions to the rules as stated so far. Not surprisingly, interpolation will not presume to put an individual in a group (that is, create a MEMBERS row) before the individual's Birth date.

The birth date is an exception in another fashion: it locates the individual in his Matgrp like a special sort of census. The rationale is that although the birth may not be observed, the individual most certainly enters the group when born. Further, this rule ensures that there is a row in MEMBERS for every day the individual is alive. When there is a regular census on the birth date[164], the resulting MEMBERS row, dated on the individual's birth date, is no different from the individual's other MEMBERS rows whose dates match census dates; they all have an Origin of C and an Interp of 0. When there is no locating census on the birth date, the resulting MEMBERS row still has an Interp value of 0, but has an Origin of I, not C. The Origin reflects the fact that there was no actual census, while the Interp shows that the individual was located that day. Figure 4.6 shows an individual who was not censused on his birth date.

Figure 4.6. Interpolation at Birth

                  Individual born into group 1
   CENSUS:                B           C   C           C
Intervals:                X-----|-----X-|-X-----|-----X
  MEMBERS.
    Group:                1   1   1   1   1   1   1   1
   Interp:                0   1   1   0   0   1   1   0
   Origin:                I   I   I   C   C   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10

Key:
B Born (into group 1)
C Censused present in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
                          


Clearly, there are no MEMBERS rows before the birth date; the individual is in his Matgrp on the day of his birth; and the Interp value counts up from the birth date and then down to the next census, as though there were a census on the birth date.

An individual is placed in his Matgrp on his birth date even when a regular census has an absence recorded for the individual on the date of birth.[165]
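The birth date can be sketched as an extra locating census that never produces an Origin of C by itself. This is a minimal, hypothetical illustration (the name `birth_rows` and the tuple layout are assumptions), restricted to a single group with absences and the 14 day limit omitted; it reproduces the rows of Figure 4.6:

```python
def birth_rows(birth_day, matgrp, census_days, last_day):
    """Hypothetical sketch: MEMBERS-like rows when the birth date acts as a
    locating census in the mother's group (Matgrp).

    census_days: days on which a census found the individual in matgrp.
    Returns (day, group, interp, origin) tuples. No row precedes birth_day.
    Absences and the 14-day limit are ignored to keep the example short.
    """
    locating = set(census_days) | {birth_day}  # the birth date locates him
    rows = []
    for day in range(birth_day, last_day + 1):
        interp = min(abs(day - c) for c in locating)
        # Origin is C only for a real census; the birth date alone gives I.
        origin = "C" if day in census_days else "I"
        rows.append((day, matgrp, interp, origin))
    return rows


# Figure 4.6: born into group 1 on day 3, censused on days 6, 7 and 10.
rows = birth_rows(birth_day=3, matgrp=1, census_days={6, 7, 10}, last_day=10)
print([r[2] for r in rows])             # [0, 1, 1, 0, 0, 1, 1, 0]
print("".join(r[3] for r in rows))      # IIICCIIC
```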

Interpolation at the Statdate

Another exception to the rules, or rather two exceptions, occurs at the Statdate. You might expect that interpolation would not place a row after the individual's Statdate, and this is indeed true, but only when the individual is dead. When an individual is alive, interpolation will place a row after the individual's Statdate, but only when there is a subsequent absence from the same group as the group in which the individual was censused.[166][167] While this may at first seem odd, the reasoning behind it is clear: the Statdate is not the last date on which there are data for the individual. This is elaborated below.

All the same, at times there is reason to have interpolation halt at the Statdate. When individuals are alive the system should not try to interpolate into time periods for which data have yet to be entered; otherwise there would always be spurious interpolated MEMBERS rows that vanish as soon as additional data are entered. Although the interpolation is corrected and such rows disappear once data entry resumes, using them in analysis is always inappropriate, and they would exist at the end of every period of data entry, since there will always be a large number of living individuals found in their groups on the last census entered. The solution is not to create the rows.[168] When a living individual has no absences from the group of his last locating census that post-date that census, this is taken to mean that there are additional, as yet unentered, data on the individual. In this case interpolation stops on the day the individual was last found in a group. This situation is shown in Figure 4.7, where the last census taken found the individual in group 1 on day 5, so day 5 is the individual's Statdate as well. There is no interpolation past the last census.

Figure 4.7. Alive and Present When Last Censused

                  Living individual with Statdate of 5
   CENSUS:        C           A   C
Intervals:        X-----|     O |-X
  MEMBERS.
    Group:        1   1   9   9   1
   Interp:        0   1   2   1   0
   Origin:        C   I   I   I   C

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

In Figure 4.8 more data have been entered: the individual has been missing since the last census shown in Figure 4.7 above. As there have been no further censuses in which the individual was found, the individual's Statdate is still day 5, although there is now subsequent interpolation. Notice that no MEMBERS rows are created after day 7. When interpolating a living individual, there is no default placement of the individual into the unknown group after the Statdate.[169]

Figure 4.8. Alive and Absent in Last Census[170]

                  Living individual with Statdate of 5
   CENSUS:        C           A   C                   A   A
Intervals:        X-----|     O |-X---------|         O
  MEMBERS.
    Group:        1   1   9   9   1   1   1
   Interp:        0   1   2   1   0   1   2
   Origin:        C   I   I   I   C   I   I

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Although the only change between Figure 4.7 and Figure 4.8 is the entry into CENSUS of rows recording absences, that is enough to signal that interpolation can go forward without creating spurious MEMBERS rows, rows likely to be erased upon the entry of more data. It is important that interpolation does go forward in this case, past the Statdate, as otherwise bias would be introduced: the last C CENSUS would be interpolated differently from all the other censuses. To be sure, bias is introduced in Figure 4.7 when interpolation is cut short. But censoring bias at the end of data collection is unavoidable, whereas here the bias can be avoided.

Warning

So long as an individual is alive, the last CENSUS row to locate the individual ought to be followed by a record of absence from the group where the individual was last found. Otherwise, as must occur when there is simply no further data to be entered, a bias is introduced into MEMBERS.

In Figure 4.9 there is no additional census information, but the individual's Status has been adjusted to mark the individual dead. A new Statdate value indicates the individual died on day 9 and interpolation is now up to and including the day of death. As is usual, when an individual's group membership cannot be determined he is placed in the unknown group.

Figure 4.9. Interpolation to Statdate When Dead

                  Dead individual with Statdate of 9
   CENSUS:        C           A   C                   A   A
Intervals:        X-----|     O |-X---------|         O
  MEMBERS.
    Group:        1   1   9   9   1   1   1   9   9
   Interp:        0   1   2   1   0   1   2   3   4
   Origin:        C   I   I   I   C   I   I   I   I

     Date:        1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

Although Figure 4.9 does not show this, the 14 day interpolation limit applies when the individual is dead: when there are no absences after the last census and there are more than 14 days between the last census and the Statdate, the individual is placed in the unknown group from the 15th day after the last census through the day of death.
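The Statdate rules can be summarized as a choice of where interpolation ends and whether the unknown group is used as a default past the Statdate. The sketch below is hypothetical: its name, arguments and single-group restriction are assumptions, and for brevity a tie between a presence and an absence is counted as not-in-group rather than resolved by the Midpoint Rule. It reproduces the Group lines of Figures 4.7, 4.8 and 4.9, and applies the 14 day limit.

```python
UNKNOWN_GROUP = 9
INTERP_LIMIT = 14  # days


def single_group_members(present, absent, alive, statdate, group=1):
    """Hypothetical single-group sketch of where interpolation stops.

    present/absent: census days on which the individual was found in, or
    recorded absent from, `group`. Returns {day: group}.
    - Dead: rows run through statdate; unplaceable days go to group 9.
    - Alive, no absence after the last locating census: stop at that census.
    - Alive, with later absences: after statdate keep only days that still
      interpolate into `group`; never default to group 9 there.
    A presence/absence tie counts as not-in-group here (simplification).
    """
    last_census = max(present)
    later_absences = [a for a in absent if a > last_census]
    if not alive:
        end = statdate
    elif later_absences:
        end = max(later_absences)
    else:
        end = last_census
    rows = {}
    for day in range(min(present), end + 1):
        to_presence = min(abs(day - p) for p in present)
        to_absence = min((abs(day - a) for a in absent), default=None)
        in_group = to_presence <= INTERP_LIMIT and (
            to_absence is None or to_presence < to_absence)
        if in_group:
            rows[day] = group
        elif day <= statdate:
            rows[day] = UNKNOWN_GROUP
        # else: alive and past statdate, so no row is created
    return rows


print(single_group_members({1, 5}, {4}, alive=True, statdate=5))
# {1: 1, 2: 1, 3: 9, 4: 9, 5: 1}
print(single_group_members({1, 5}, {4, 10, 11}, alive=True, statdate=5))
# {1: 1, 2: 1, 3: 9, 4: 9, 5: 1, 6: 1, 7: 1}
print(single_group_members({1, 5}, {4, 10, 11}, alive=False, statdate=9))
# {1: 1, 2: 1, 3: 9, 4: 9, 5: 1, 6: 1, 7: 1, 8: 9, 9: 9}
```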

The Midpoint Rule

The alert reader may have noticed that the above examples are carefully crafted so that the midpoint between presences and absences always falls between two days. What happens when there is an odd number of days in the interval so that the midpoint is a day exactly in between the endpoints, as occurs 3 times in Figure 4.10?

Figure 4.10. Midpoint Days

                  Intervals with an odd number of days
     CENSUS:      C       A               C   C       A   C
  Intervals:      X---|   O       |-------X-|-X---|   O |-X
       Date:      1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Midpoint between census takings
              

The MEMBERS table has 1 day precision; there is no way to be in a group in the morning and out of it in the afternoon, so on any one midpoint day the individual must be either in the group or out of it. Which should it be? The question is resolved using a property of the date itself. Briefly, the Julian dating system assigns every day a unique number. Because a midpoint day is as likely to fall on an even Julian date as on an odd one, deciding by the parity of the midpoint day's Julian date avoids bias.

Whenever interpolation halves an interval between two CENSUS rows that contains an odd number of days, the midpoint day is assigned to the left, earlier, half of the interval when the Julian date of the midpoint day is even. A midpoint day is assigned to the right, later, half of the interval when the Julian date of the midpoint day is odd.

So the Midpoint Rule resolves the issue by adjusting the intervals as shown in Figure 4.11. The intervals are no longer perfectly halved, yet on midpoint days there is no systematic preference either for or against interpolating the individual into the censused group.
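The Midpoint Rule can be sketched by giving each day to the nearer of the recorded events on either side and, on an exact tie, consulting the parity of the day's Julian date. The code below is a hypothetical single-group illustration: the function names are assumptions, days are taken to be integer Julian day numbers (as on the figure's Julian Date line), and the 14 day limit is omitted. It reproduces the Group line of Figure 4.11.

```python
UNKNOWN_GROUP = 9


def midpoint_goes_left(julian_day):
    """Midpoint Rule: an even Julian date joins the earlier half."""
    return julian_day % 2 == 0


def members_with_midpoint_rule(present, absent, first_day, last_day, group=1):
    """Hypothetical single-group sketch. Each day joins the nearer recorded
    event (presence or absence); a day exactly midway between an earlier and
    a later event is resolved with the Midpoint Rule. Days are Julian day
    numbers; at least one event is assumed to exist.
    """
    events = {d: True for d in present}
    events.update({d: False for d in absent})
    result = {}
    for day in range(first_day, last_day + 1):
        if day in events:
            owner = day
        else:
            earlier = max((d for d in events if d < day), default=None)
            later = min((d for d in events if d > day), default=None)
            if earlier is None:
                owner = later
            elif later is None:
                owner = earlier
            elif day - earlier < later - day:
                owner = earlier
            elif later - day < day - earlier:
                owner = later
            else:  # exact midpoint day
                owner = earlier if midpoint_goes_left(day) else later
        result[day] = group if events[owner] else UNKNOWN_GROUP
    return result


# Figure 4.11: present on 1, 7, 8, 11; absent on 3 and 10.
groups = members_with_midpoint_rule({1, 7, 8, 11}, {3, 10}, 1, 11)
print([groups[d] for d in range(1, 12)])
# [1, 1, 9, 9, 1, 1, 1, 1, 9, 9, 1]
```

Days 2, 5 and 9 are the three midpoint days: day 2 is even and joins the earlier census, while days 5 and 9 are odd and join the later event.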

Figure 4.11. The Midpoint Rule Adjusts Intervals

                  Intervals with an odd number of days
     CENSUS:      C       A               C   C       A   C
  Intervals:      X-----| O     |---------X-|-X-|     O |-X
    MEMBERS.
      Group:      1   1   9   9   1   1   1   1   9   9   1
     Interp:      0   1   2   3   2   1   0   0   1   1   0
     Origin:      C   I   I   I   I   I   C   C   I   I   C

Julian Date:      1   2   3   4   5   6   7   8   9  10  11

Key:
C Censused present in group (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
| Interval endpoint
              

Interpolating When The Group Changes

Having dispensed with the various elaborations and exceptions that occur in unusual cases, it is time to return to the fundamentals of interpolation and examine what happens when an individual moves between groups. What come into play are the first 2 of the 3 interpolation intervals. Recall:

Interpolation keeps an individual in the group where a census locates him for a time period that is the shorter of:

  1. Half of the time interval between the census and the individual's next (or prior) census which finds the individual in any group.

  2. Half of the time interval between the census and the next (or prior) recorded absence from the group in which the individual was censused. Absences from other groups are ignored.

Figure 4.12 shows a record of one individual's censuses. The individual, a male, is censused in 2 groups, group 1 and group 2. The census records for each group reflect both presence in the group and absence from the group.

Figure 4.12. An Individual is Censused in 2 Groups

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
              

Figure 4.13 shows what would happen if interpolation worked with each group separately. There are conflicts, days when the individual is in both groups. Something else must be done.

Caution

Figure 4.13 is an example of an interpolation method that does not work. The method shown in the figure is not one Babase uses when interpolating.

Figure 4.13. Interpolating Each Group Separately

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C

   Group 1        Interpolating just group 1
    CENSUS:       C       C                   A   C   A
 Intervals:       X---|---X---------|         O |-X-| O

   Group 2        Interpolating just group 2
    CENSUS:       A                   C               C
 Intervals:       O         |---------X-------|-------X

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present
O Known absent
- Presumed present
| Interval endpoint
              


The solution is to return to the interpolation fundamentals. We begin by taking a closer look at the way we have been diagramming intervals. In Figure 4.13 the first group has 3 locating censuses and 2 absences, and yet we have diagrammed the resulting intervals on a single line. The interpolation fundamentals tell us to obtain 2 pairs of intervals for each locating census: a halfway-to-census pair and a halfway-to-absence pair. Figure 4.14 takes the CENSUS rows of the first group shown in Figures 4.12 and 4.13 and does this for each locating census. In Figure 4.14 the CENSUS rows of days 1, 3 and 9 each have their own section detailing the intervals to the nearest censuses and the intervals to the nearest absences. The lines labeled Presence show the intervals that are halfway from each locating census to its neighboring locating censuses. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence. This detailed breakdown is followed by a composite interval diagram of the familiar type encountered in Figures 4.2 through 4.13 above. It should be clear that the composite is arrived at by following the fundamentals: it is made up of the shorter of each census's intervals. The result is correct; the composite constructed in Figure 4.14 is identical to the one shown previously in Figure 4.13. It had better be, or else the interpolations of Figure 4.13 would conflict with the fundamental interpolation rules.

Figure 4.14. A Closer Look at Intervals

                  CENSUS rows from group 1
    CENSUS:       C       C                   A   C   A

     Day 1        Intervals by presence and absence
  Presence:       X---|   X
   Absence:       X-------------|             O

     Day 3        Intervals by presence and absence
  Presence:       X   |---X-----------|           X
   Absence:               X---------|         O

     Day 9        Intervals by presence and absence
  Presence:               X           |-----------X
   Absence:                                   O |-X-| O

                  Combining the shorter intervals
  Interval:       X---|---X---------|         O |-X-|

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

The intervals in Figure 4.14 did not have to be grouped by censused day; they could have been grouped by Presence and Absence or in any other way. For each set of locating censuses we can always split out the halfway-to-census intervals from the halfway-to-absence intervals, group them any way we like, and later use the interpolation fundamentals to recombine them without affecting the result. This has not been necessary so far, but it is essential if we are to interpolate correctly when an individual moves between groups, as in Figure 4.12: “An Individual is Censused in 2 Groups”. We must return to the fundamentals to make sense of interpolation. Rather than trying to combine the results of interpolating the groups separately, as was done in Figure 4.13: “Interpolating Each Group Separately”, interpolation combines the results of interpolating the presences in all the groups with separate interpolations of the absences in each group. Each time a census finds an individual in a group, interpolation separately computes both the interval halfway to the nearest census that finds the individual in any group and the interval halfway to the nearest absence from the particular group being censused.[171]

In Figure 4.15 this method is applied to the data first seen in Figure 4.12. For clarity, the intervals surrounding the censuses that belong to one group are shown separately from those belonging to the other group.[172] The lines labeled Presence show the intervals that are halfway from each census to the nearest census that finds the individual in any group. The lines labeled Absence show the intervals that are halfway from each census to the nearest absence from the same group. Censuses with no neighboring absence do not have this latter sort of interval shown.[173]

Figure 4.15. Presence and Absence Interpolated Separately

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C

   Group 1        The intervals of group 1's censuses
  Presence:       X---|---X-----|     x     |-----X-| x
   Absence:               X---------|         O |-X-| O

   Group 2        The intervals of group 2's censuses
  Presence:       x       x     |-----X-----|     x |-X
   Absence:       O         |---------X

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

Figure 4.16 shows how interpolation combines the presence and absence intervals by choosing the shorter of the two, on each side of the census, as the period during which the individual is assumed to be in the group where censused. The line labeled Used contains the shorter of each census's two intervals.[174]
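Choosing the shorter interval on each side is the same as picking, on each side, the nearer of two bounding events: the nearest census in any group and the nearest absence from the census's own group. The hypothetical sketch below (the name `used_boundaries` and the `(day, group)` tuple representation are assumptions) computes those bounding events for the data of Figure 4.16. The Used interval then runs halfway toward each bound; for example, the group 2 census on day 6 is bounded by days 3 and 9, giving the endpoints 4.5 and 7.5 marked in the figure.

```python
def used_boundaries(censuses, absences):
    """Hypothetical sketch: for each locating census, the events that bound
    its "Used" interval on each side.

    censuses: set of (day, group) where the individual was found present.
    absences: set of (day, group) where the individual was recorded absent.
    Returns {(day, group): (earlier_bound, later_bound)}; None means no
    bounding event on that side (only the 14-day limit would then apply).
    """
    result = {}
    for day, grp in censuses:
        # Presence bounds: censuses in ANY group.
        # Absence bounds: absences from THIS census's group only.
        before = [d for d, _ in censuses if d < day]
        before += [d for d, g in absences if g == grp and d < day]
        after = [d for d, _ in censuses if d > day]
        after += [d for d, g in absences if g == grp and d > day]
        # The nearer event yields the shorter half-interval on that side.
        result[(day, grp)] = (max(before, default=None), min(after, default=None))
    return result


# Figure 4.16 data: (day, group) pairs.
censuses = {(1, 1), (3, 1), (9, 1), (6, 2), (10, 2)}
absences = {(8, 1), (10, 1), (1, 2)}
bounds = used_boundaries(censuses, absences)
print(bounds[(6, 2)])  # (3, 9)
print(bounds[(9, 1)])  # (8, 10)
```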

Figure 4.16. Combining Presence and Absence Intervals

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C

   Group 1        The intervals of group 1's censuses
  Presence:       X---|---X-----|     x     |-----X-| x
   Absence:               X---------|         O |-X-| O
      Used:       X---|---X-----|               |-X-|
  In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

   Group 2        The intervals of group 2's censuses
  Presence:       x       x     |-----X-----|     x |-X
   Absence:       O         |---------X
      Used:                     |-----X-----|       |-X
  In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
x Known present in different group
O Known absent in same group
- Inside of interval
| Interval endpoint
              

Once the intervals surrounding each census have been interpolated, determining the final group membership is a straightforward matter of placing the individual in the unknown group when there is nowhere else to put him. Figure 4.17 shows this process. All that remains is to compute the Interp values in the usual fashion, by ignoring absences and counting the distance from the nearest locating census. In Figure 4.17 the intervals between locating censuses are shown, labeled For Interp, to support the Interp values given.
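Putting the pieces together, a hypothetical end-to-end sketch follows. Each locating census claims the days in its half of the gap toward its nearer bounding event on each side, subject to the Midpoint Rule and the 14 day limit; unclaimed days go to group 9; Interp is the distance to the nearest locating census. The names `nearest_bounds`, `covers` and `interpolate` are assumptions, days are integer Julian day numbers, and birth dates, Statdates and non-interpolating censuses are left out. Run on the data of Figure 4.17 it yields that figure's MEMBERS lines.

```python
UNKNOWN_GROUP = 9
INTERP_LIMIT = 14  # days


def nearest_bounds(c_day, grp, censuses, absences):
    """Nearest bounding event before and after a locating census: a census
    in ANY group or an absence from the census's own group (None if none)."""
    before = [d for d, _ in censuses if d < c_day]
    before += [d for d, g in absences if g == grp and d < c_day]
    after = [d for d, _ in censuses if d > c_day]
    after += [d for d, g in absences if g == grp and d > c_day]
    return max(before, default=None), min(after, default=None)


def covers(c_day, day, bound):
    """True if `day` lies in c_day's half of the gap toward `bound`."""
    if abs(day - c_day) > INTERP_LIMIT:  # the 14-day limit
        return False
    if bound is None:  # no bounding event on this side
        return True
    if not min(c_day, bound) < day < max(c_day, bound):
        return False
    near, far = abs(day - c_day), abs(bound - day)
    if near != far:
        return near < far
    # Midpoint Rule: an even Julian day joins the earlier half.
    return (c_day < day) == (day % 2 == 0)


def interpolate(censuses, absences, first_day, last_day):
    """Return {day: (group, interp, origin)} for each day in the window."""
    located = dict(censuses)  # assumes at most one locating census per day
    members = {}
    for day in range(first_day, last_day + 1):
        # Interp ignores absences: distance to the nearest locating census.
        interp = min(abs(day - d) for d in located)
        if day in located:
            members[day] = (located[day], interp, "C")
            continue
        group = UNKNOWN_GROUP  # used when no census interval covers the day
        for c_day, grp in censuses:
            earlier, later = nearest_bounds(c_day, grp, censuses, absences)
            if covers(c_day, day, later if c_day < day else earlier):
                group = grp
        members[day] = (group, interp, "I")
    return members


censuses = {(1, 1), (3, 1), (9, 1), (6, 2), (10, 2)}
absences = {(8, 1), (10, 1), (1, 2)}
members = interpolate(censuses, absences, 1, 10)
print([members[d][0] for d in range(1, 11)])  # [1, 1, 1, 1, 2, 2, 2, 9, 1, 2]
print([members[d][1] for d in range(1, 11)])  # [0, 1, 0, 1, 1, 0, 1, 1, 0, 0]
```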

Figure 4.17. Group Membership Given Multiple Groups

                  One individual's census records
   Group 1:       C       C                   A   C   A
   Group 2:       A                   C               C

   Group 1        The intervals of group 1's censuses
      Used:       X---|---X-----|               |-X-|
  In Group:       1   1   1   1   ?   ?   ?   ?   1   ?

   Group 2        The intervals of group 2's censuses
      Used:                     |-----X-----|       |-X
  In Group:       ?   ?   ?   ?   2   2   2   ?   ?   2

                  Intervals between locating censuses
For Interp:       X~~~|~~~X~~~~~|~~~~~X~~~~~|~~~~~X~|~X

   MEMBERS.
     Group:       1   1   1   1   2   2   2   9   1   2
    Interp:       0   1   0   1   1   0   1   1   0   0
    Origin:       C   I   C   I   I   C   I   I   C   C

      Date:       1   2   3   4   5   6   7   8   9  10

Key:
C Censused present
A Censused absent
X Known present in same group
- Presumed present
~ Inside of interval
| Interval endpoint
              

By now it should be clear that interpolation[175] is a function over sets of CENSUS rows. It is a function: for every input there is exactly one output. It takes sets of CENSUS rows as input, and because sets are unordered, CENSUS rows can be put into the database in any order and the result will be the same. And, because it is a function, the same CENSUS rows can be re-interpolated as many times as desired without altering the final result.
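The order-independence claim can be illustrated with a toy computation. The hypothetical sketch below (the function name, the `(day, grp, status)` tuple layout, and the use of "A" to mark an absence are assumptions) computes Interp values from a set built out of the input rows, then checks that every ordering of the Figure 4.4 rows gives the same answer:

```python
from itertools import permutations


def interp_from_rows(rows, first_day, last_day):
    """Hypothetical sketch: Interp values computed from CENSUS-like rows.

    rows: iterable of (day, grp, status) tuples; "A" is assumed to mark an
    absence. The rows are turned into a set first, so neither their order
    nor duplicates can affect the answer.
    """
    locating = {day for day, _, status in set(rows) if status != "A"}
    return tuple(
        min(abs(d - c) for c in locating) for d in range(first_day, last_day + 1)
    )


rows = [(1, 1, "C"), (3, 1, "C"), (8, 1, "A"), (11, 1, "C")]
# Every one of the 24 orderings of the rows yields the same result.
results = {interp_from_rows(p, 1, 11) for p in permutations(rows)}
print(len(results))  # 1
```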

It should also be clear why interpolation always chooses the shorter interval, and why this always produces the correct result. The shorter interval is short for a reason: there is some reason to believe the individual is not in the group, otherwise the interval would be longer. Further, every time the shorter interval is chosen, a possible overlap with an interval from a different locating census is eliminated. By always choosing the shorter interval, interpolation ensures that the interpolations of any two locating censuses will not conflict.

Pre-Analyzed Data Disturbs Interpolation

In addition to the most important distinction, which classifies CENSUS rows into absent and locating censuses, there is a second distinction, which further divides locating censuses into those which interpolate and those which do not. CENSUS rows that record observational data, those with Status values of C, D, and M,[176] are interpolating censuses. (All of the previous examples have concerned CENSUS rows of this type.) The remaining CENSUS.Status values, all of the old style (that is, historical) CENSUS.Status values and the N manual Status value, indicate that the CENSUS row is the result of analysis. These are the non-interpolating censuses.

This further division of locating censuses into interpolating and non-interpolating, the division between raw and already analyzed data, leads to the final refinement of the interpolation procedure. We do not want interpolation to produce re-analyzed results from already analyzed data. Interpolation occurs only between regular, that is to say interpolating, censuses (and to the birth date as a special case). Non-interpolating census rows are copied directly from CENSUS to MEMBERS: CENSUS.Status becomes MEMBERS.Origin, and Interp is set to NULL. When a non-interpolating census is found on the birth date, the birth date will not interpolate.
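The three-way classification and the direct copy of non-interpolating rows can be sketched as follows. This is a hypothetical illustration: the function names and the dictionary standing in for a MEMBERS row are assumptions, and so is the use of "A" as the Status marking a recorded absence (it matches the figures' key, not a checked list of Babase codes). Only C, D and M are taken from the text as interpolating; every other locating Status, including N, is treated as non-interpolating.

```python
# Statuses named in the text as interpolating (observational) censuses.
INTERPOLATING_STATUSES = {"C", "D", "M"}
ABSENT_STATUS = "A"  # assumption: the code used here for a recorded absence


def classify(status):
    """Hypothetical sketch: sort a CENSUS.Status value into a census kind."""
    if status == ABSENT_STATUS:
        return "absent"
    if status in INTERPOLATING_STATUSES:
        return "interpolating"
    # N and the old-style (historical) statuses are pre-analyzed data.
    return "non-interpolating"


def copy_to_members(day, grp, status):
    """Copy a non-interpolating CENSUS row straight into a MEMBERS-like dict:
    CENSUS.Status becomes Origin and Interp is NULL (None here)."""
    if classify(status) != "non-interpolating":
        raise ValueError(f"status {status!r} is interpolated, not copied")
    return {"date": day, "grp": grp, "origin": status, "interp": None}


print(classify("D"), classify("N"))  # interpolating non-interpolating
print(copy_to_members(3, 1, "N"))
# {'date': 3, 'grp': 1, 'origin': 'N', 'interp': None}
```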

Interpolation looks at regular census rows and attempts to guess the individual's location on those days when there are no observations. It does so by looking at the intervals between the regular censuses. Finding non-interpolating CENSUS rows, that is to say already analyzed data, on one of these intervals breaks the assumptions interpolation uses in its guessing. The previously analyzed data point could be there for any reason at all, and there's no point in pretending it's not there either. What interpolation does is give up. It interpolates up to the offending data point and then stops.[177] After that it still creates rows in MEMBERS, but it does not attempt to make guesses about where to place an individual or what the interpolated row means.

Note

This situation is not expected to occur; or rather, whenever there are non-interpolating CENSUS rows between interpolating censuses, the non-interpolating CENSUS rows are expected to be contiguous over the entire interval between the interpolating censuses. So the expected cases are the trivial, degenerate ones. Nonetheless, such situations probably do occur in the existing data. It would probably be best either to require the expected behavior or to get rid of all the pre-analyzed CENSUS rows and replace them with raw data, especially given the design problems pointed out below.

Regardless, non-trivial examples are presented here so that a complete understanding of interpolation can be developed.

Figure 4.18 shows that the 3 fundamental interpolation intervals are shortened when a non-interpolating census is found between interpolating censuses. The intervals for each locating census are examined separately. The non-interpolating census has no interpolation intervals. The intervals of the interpolating censuses are truncated, reduced to the interval between the interpolating and non-interpolating censuses. By this means a portion of the diagram, days 4 and 5, is blocked from interpolating into the group. If there were no N census, the Absence interval would be day 1's shortest interval, and days 4 and 5 (as well as day 3) would interpolate into the group. (Notice that day 1's Absence interval has a midpoint day, day 5, and that it would have been included in the interval.) Interpolation is prevented from placing individuals in the group of their interpolating census on the far side of non-interpolating censuses.

Figure 4.18.  Pre-Analyzed Data Truncates Interpolation Intervals

               CENSUS rows from group 1
    CENSUS:    C       N                       A           C

     Day 1     Intervals per fundamental type
  Presence:    X-----| N                                   X
   Absence:    X-----| N                       O
14 Day Lim:    X-----| N

     Day 3     Intervals per fundamental type
  Presence:            N
   Absence:            N
14 Day Lim:            N

    Day 12     Intervals per fundamental type
  Presence:    X       N             |---------------------X
   Absence:            N                       O     |-----X
14 Day Lim:            N |---------------------------------X

Julian Day:    1   2   3   4   5   6   7   8   9  10  11  12

Key:
C Censused present in group (group 1)
N Manual entry,
    present in group but non-interpolating (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Inside of interval
| Interval endpoint
              

In Figure 4.19 the shortest intervals of each locating census have been chosen and combined; the result is the line labeled For Group. This is then used to determine group membership.

The interesting part of Figure 4.19 is the computation of the Interp values. The halfway to census intervals of Figure 4.18 have been combined and labeled For Interp. Recall that it is these intervals that are used to compute the Interp values. The N census has created a gap in interpolation, clearly shown on the For Interp line as running from day 3 through day 6. Over this gap the assumptions interpolation relies upon no longer hold. Group membership is easy to determine. On day 3, the day of the N census, interpolation can simply copy the CENSUS row's Grp and Status into the appropriate MEMBERS columns, in the same fashion it would for any other locating census. On days 4 through 6 it does what it usually does when it does not know where to locate an individual: it places the individual in the unknown group with an Origin of I. On days 3 through 6, however, interpolation has no way of knowing how far the day is from the nearest locating census, which is what is supposed to go in the Interp column. Due to this lack of information it assigns the Interp column a value of NULL, no data, on this interval.

Figure 4.19.  Pre-Analyzed Data Interrupts Interpolation

               An individual is censused
    CENSUS:    C       N                       A           C
 Intervals
 For Group:    X-----| N                       O     |-----X
For Interp:    X~~~~~|               |~~~~~~~~~~~~~~~~~~~~~X
   MEMBERS.
     Group:    1   1   1   9   9   9   9   9   9   9   1   1
    Interp:    0   1                   5   4   3   2   1   0
    Origin:    C   I   N   I   I   I   I   I   I   I   I   C

      Date:    1   2   3   4   5   6   7   8   9  10  11  12

Key:
C Censused present in group (group 1)
N Manual entry,
    present in group but non-interpolating (group 1)
A Censused absent in group (group 1)
X Known present in group (group 1)
O Known absent in group (group 1)
- Presumed in group (group 1)
~ Inside of interval
| Interval endpoint
              

When looking at Figure 4.19, one way to explain what happens to Interp is to say that it is fixed at NULL over that portion of the day 1 census's halfway to census interval that was truncated because of the N row. (See Figure 4.18.) Effectively, as MEMBERS Interp counts up with increasing distance from the interpolating census, the count becomes NULL upon encountering a non-interpolating census and stays NULL until the point at which counting back down to the next interpolating census begins; there the downward count resumes as though it had never been interrupted.[179]
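A minimal Python sketch of the Interp behavior just described, reproducing the Interp line of Figure 4.19. It assumes days are plain integers, that exactly one non-interpolating census lies between two interpolating censuses, and that the split point of the halfway to census interval is already known (in the figure, days 2 through 6 belong to day 1's census). The function interp_values and its parameters are hypothetical illustrations, not Babase code; None stands in for NULL.

```python
from typing import Dict, Optional


def interp_values(left: int, right: int, split: int, n_day: int) -> Dict[int, Optional[int]]:
    """Sketch of MEMBERS.Interp between two interpolating censuses.

    left, right: days of the interpolating censuses (Interp 0 on each)
    split: last day the halfway to census rule gives to `left`
    n_day: day of a non-interpolating census, left < n_day < right
    Returns a mapping day -> Interp, with None standing for NULL.
    """
    values: Dict[int, Optional[int]] = {}
    for day in range(left, right + 1):
        if day <= split:
            # Counting up from the left census; NULL from the N row onward.
            frozen = left < n_day <= day
            values[day] = None if frozen else day - left
        else:
            # Counting down toward the right census; NULL from the midpoint
            # end of the interval through the N row.
            frozen = day <= n_day < right
            values[day] = None if frozen else right - day
    return values


fig_4_19 = interp_values(left=1, right=12, split=6, n_day=3)
print([fig_4_19[d] for d in range(1, 13)])
# prints [0, 1, None, None, None, None, 5, 4, 3, 2, 1, 0]
```

If the N row instead fell in the later half, say on day 9, the same function gives NULL on days 7 through 9 and the downward count 2, 1, 0 on days 10 through 12.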

The approach interpolation takes attempts, in some sense, to minimize the disturbance created when already analyzed census data are mixed in with raw census information. However, as can be seen in Figure 4.19, it is not entirely successful. Although day 7, for example, has an Interp value indicating it is 5 days away from a census, it is really 4 days away from the N census. If the N CENSUS row really does represent a census, then day 7's Interp value is wrong. And the problems are not restricted to Interp values. Is it really true that days 4 and 5 should be assigned to the unknown group? If so, why aren't there N rows that say so? Day 2 is even more disturbing. There is no diagram for this, but suppose the N census found the individual in a different group. Figure 4.18 would be unchanged; all of day 1's intervals would be truncated at the N census. The effect would be clearer if the interval between the preceding C census and the following N census were larger, but consider that day 2, by the midpoint rule, would be assigned to the N census. That means that if the N census really does represent a census in a different group, day 2 should be assigned to that group, not to group 1.

Note that, in the general case, even though the halfway to census interval does not determine group membership (all the intervals are truncated, leaving a gap in which interpolation defaults to the unknown group), whether this interval has a midpoint day, and if so where it falls, does matter to the computation of Interp. If the midpoint day falls on the side of the interval containing the non-interpolating census, the Interp value will be NULL. Otherwise, it will have a value representing the number of days to the nearest locating, and interpolating, census.

Incorporating the above safety checks, which ensure that data are not re-analyzed, into the rules we already have produces the actual interpolation rules.

The Interpolation Rules

Using these rules, interpolation creates rows in MEMBERS based on the information it finds in CENSUS and in the BIOGRAPH columns Birth, Matgrp, Statdate, and Status.

  1. CENSUS Rows Are Either Absences, Interpolating, or Non-Interpolating

    Interpolation partitions all CENSUS rows into one of 3 categories:

    1. Absences

      CENSUS rows which indicate absence from a group.

    2. Interpolating censuses

      Those CENSUS rows that record observational data, those with Status values of C, D, and M, are interpolating censuses.

    3. Non-interpolating censuses

      The remaining CENSUS.Status values indicate that the CENSUS row is the result of analysis. These rows, which include all the old-style (historical) CENSUS.Status values and the N (manual) Status value, are not re-analyzed and so do not interpolate.

    For convenience, the CENSUS rows that are not absences (the interpolating and the non-interpolating censuses) are termed locating censuses.

  2. Censusing Assigns Group Membership

    On those days when an individual is censused in a group, that is, when there is a locating CENSUS row, a row is created in MEMBERS placing the individual in that group on the given day. The Origin value is the CENSUS row's Status value. When the CENSUS row is interpolating, the Interp value is 0; when it is non-interpolating, the Interp value is NULL.

  3. The 3 Interpolation Intervals

    Interpolation places an individual in the group into which he is censused (the Grp of an interpolating CENSUS row, one with a Status of C, D, or M) on the days to either side of the census being interpolated, for a time period that is the shortest of:

    1. The Halfway to Census Interval

      Half of the time interval between the census and the individual's next (or prior) locating, interpolating census, which may locate the individual in any group.

    2. The Halfway to Absence Interval

      Half of the time interval between the census and the next (or prior) recorded absence. Only absences from the same group in which the individual was censused are considered; absences from other groups are ignored.

    3. The 14 day Interpolation Limit

      Given no other information, an individual is considered to remain (or have been) in the group where observed for 14 days following (or preceding) the date of observation.

    The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the nearest locating census; Interp values count up over The Halfway to Census Interval as the distance from the interpolated census increases. An interpolated MEMBERS row falling on the day after a census has an Interp of 1, the day after that an Interp of 2, and so forth, assuming, of course, that the individual has no other nearby CENSUS rows.

  4. The Midpoint Rule

    This rule qualifies how interpolation assigns the halfway point between two CENSUS rows in The Halfway to Census Interval and The Halfway to Absence Interval, above, when the number of days in the interval cannot be divided into equal halves. Whenever interpolation must halve an interval between two CENSUS rows that contains an odd number of days, the midpoint day is assigned to the left (earlier) half of the interval when the Julian date of the midpoint day is even, and to the right (later) half of the interval when the Julian date of the midpoint day is odd.

  5. Births Locate Individuals

    This rule declares a live birth to be the equivalent of an interpolating census, one that indicates presence in the individual's Matgrp. Fetal losses, individuals with NULL Snames, are not considered births and are never interpolated. An individual is placed in his Matgrp on his birth date even when a regular census records an absence for the individual on the date of birth. In this case interpolation ignores the absence entirely and will not use it to compute a Halfway to Absence Interval.

    When there is a locating census on the birth date, the MEMBERS row interpolation creates is like that made for any other locating census with the given Status. But when there is no locating census on the birth date, the resulting MEMBERS row has an Origin of I (and an Interp of 0, as any census with a Status of C would have). Aside from their I Origin value, births interpolate as would any CENSUS row with a C Status.

  6. No Data Implies Unknown Group Membership

    On days when none of the above rules serve to place an individual in a group, the individual is placed in the unknown group. The resulting MEMBERS rows have an Origin of I and an Interp value of the number of days difference between the MEMBERS row's Date and the date of the individual's nearest interpolating census.[180]

  7. Birth stops interpolation

    Interpolation will not place a row in MEMBERS before an individual's Birth date.

  8. Death stops interpolation

    When an individual is dead, interpolation will not place a row in MEMBERS after the individual's Statdate.

  9. Data Entry Cessation Stops Interpolation of Living Individuals.

    When an individual is alive, interpolation will create rows after the individual's last locating census only when there are subsequent absences, that is, absences from the group in which the individual was censused.[181] In this case, unlike above, no data does not imply unknown group membership; such rows are created only as long as the individual is interpolated into the group of his last locating census. When a living individual has no absences from the group of his last locating census after that census, interpolation assumes that further data exist which have yet to be entered, and interpolation stops at the last locating census.

  10. Data are not Re-Analyzed

    Interpolation is done only for regular, that is interpolating, CENSUS rows: data that were collected in the field. Other data, the non-interpolating CENSUS rows that represent the result of prior analysis, do not interpolate; they are copied directly from CENSUS to MEMBERS, CENSUS.Status becoming MEMBERS.Origin, and Interp is set to NULL. Further, when a non-interpolating census is found on one of The 3 Interpolation Intervals, the interval is shortened just enough that the non-interpolating census is no longer on the interval. When a non-interpolating census is found on a birth date, the birth date does not interpolate.

    The MEMBERS Interp column is fixed at NULL on the interval from the non-interpolating census row through the midpoint end of The Halfway to Census Interval, endpoints included.[182] Here we mean The Halfway to Census Interval as originally computed, not as shortened by the preceding paragraph.
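Rules 1, 3, 4, and 10 can be sketched in Python for the forward direction only (days after a census). This is a simplification under stated assumptions: days are integers standing in for Julian dates, absences are recognized by a hypothetical Status code A, and births, deaths, and data entry cessation (rules 5 through 9) are not modeled. None of these names is Babase's own.

```python
from typing import Iterable, Tuple

INTERPOLATING = {"C", "D", "M"}  # Status values of field censuses (rule 1)
ABSENCE = "A"  # hypothetical: Status code assumed here to mark an absence row


def category(status: str) -> str:
    """Rule 1: partition CENSUS rows by Status."""
    if status == ABSENCE:
        return "absence"
    if status in INTERPOLATING:
        return "interpolating"
    return "non-interpolating"


def last_day_for_earlier(c: int, o: int) -> int:
    """Rule 4 (midpoint rule): the last day between rows on days c < o
    that belongs to the earlier row."""
    gap = o - c - 1  # days strictly between the two rows
    if gap % 2 == 0:
        return c + gap // 2  # even split, no midpoint day
    mid = (c + o) // 2  # the single midpoint day
    return mid if mid % 2 == 0 else mid - 1  # even: earlier half; odd: later half


def forward_extent(census_day: int, grp: int,
                   rows: Iterable[Tuple[int, int, str]], limit: int = 14) -> int:
    """Rules 3 and 10, forward only: the last day after an interpolating
    census on which the individual is interpolated into grp.

    rows: the individual's CENSUS rows as (day, grp, status) tuples.
    """
    later = sorted(r for r in rows if r[0] > census_day)
    ends = [census_day + limit]  # the 14 day interpolation limit
    for day, _g, status in later:  # halfway to census, any group
        if category(status) == "interpolating":
            ends.append(last_day_for_earlier(census_day, day))
            break
    for day, g, status in later:  # halfway to absence, same group only
        if category(status) == "absence" and g == grp:
            ends.append(last_day_for_earlier(census_day, day))
            break
    end = min(ends)  # the shortest of the three intervals
    # Rule 10: stop short of any non-interpolating census on the interval.
    for day, _g, status in later:
        if category(status) == "non-interpolating" and day <= end:
            end = day - 1
            break
    return end
```

For the CENSUS rows of Figure 4.18, forward_extent(1, 1, [(1, 1, "C"), (3, 1, "N"), (9, 1, "A"), (12, 1, "C")]) returns 2, because the N row on day 3 truncates every interval of the day 1 census. Truncating the minimum is equivalent to truncating each interval separately and then taking the shortest.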

Expectations and Implications

It is expected that all non-interpolating CENSUS rows, that is, CENSUS rows produced by prior analysis, will be clustered in contiguous intervals with regular census rows at the endpoints. This is particularly expected of old-style census rows from before Babase, as they precede all regular census data, but it is also expected of the N (non-interpolating, manual) Status code, should it ever be used. If these expectations are borne out, the Data are not Re-Analyzed rule will never be invoked.

There are some not-quite-obvious implications given these interpolation rules:

The Sexual Cycle Day-By-Day Tables

These tables all record females' sexual cycle states on a day-by-day basis and provide daily measures of the number of days each female has been in, and will remain in, the given state. REPSTATS provides the broad overview; the remaining tables supply detail on the days on which REPSTATS indicates that a female is cycling. The day-by-day nature of these tables makes it easy to correlate reproductive cycle information with other events.

CYCGAPDAYS is something of an exception, in that it records days during which females are not under observation (according to a very specific definition.) It is included in this section because it exists to aid reproductive state tracking.

CYCGAPDAYS (Day-by-day Periods of No Observation)

A day-by-day record indicating which days a female is not under observation. "Not under observation" is defined as in CYCGAPS; see that table for more information. Contains one row per female per day during which the female is not under regular, continuous observation.

Caution

Because the primary purpose of the CYCGAPDAYS table is to support Babase in its validation and automatic analysis of sexual cycle data, an individual's last CYCGAPDAYS date falls after the BIOGRAPH.Statdate whenever observation of the individual ceases and does not resume. This allows for easy determination of where there are gaps in observation and where automatic Mdates, which may occur after the individual's Statdate, must be generated.

This table is automatically constructed from the CYCGAPS table. It may not be manually maintained.
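Because CYCGAPDAYS is derived from CYCGAPS, its construction amounts to expanding each gap into its individual days. The Python sketch below assumes, hypothetically, that each gap is available as an (Sname, first day, last day) triple with inclusive endpoints; the actual CYCGAPS columns and the assignment of Cgdid values are not modeled, and gap_days is an illustrative name only.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def gap_days(gaps: Iterable[Tuple[str, date, date]]) -> List[Tuple[str, date]]:
    """Expand (sname, first, last) gaps into one (sname, date) row per day.

    Assumes each gap is an inclusive date range; a day covered by two
    overlapping gaps is emitted once.
    """
    days = set()
    for sname, first, last in gaps:
        day = first
        while day <= last:
            days.add((sname, day))
            day += timedelta(days=1)
    return sorted(days)
```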

Cgdid (CycGapDays ID)

An integer uniquely identifying the row. This column must not be NULL.

Sname

The female that is not observed. The three-letter code that identifies the individual's row in the BIOGRAPH table. There will always be a row in BIOGRAPH for the individual identified here.

This column may not be NULL.

Date

The date the female was not observed. This column must not be NULL.

CYCSTATS (Female Fertility Cycle States)

A day-by-day record of the details of the females' cycles -- whether menses-follicular, swelling-follicular, ovulating, or luteal. Contains one row per female per day, for those days in REPSTATS for which the REPSTATS State is C (cycling.)

A female has rows in CYCSTATS whenever cycling; there are no CYCSTATS rows when a female is pregnant or lactating. Likewise there are no CYCSTATS rows when there are gaps in the observational record. (See CYCGAPS.) See the REPSTATS table for further detail as to exactly when a female is considered to be cycling, and for important cautions. See the description of the Dins and Dr columns below for further information on how sexual cycles are recorded when sexual cycle transition markers are missing due to cessation of observation.

Caution

REPSTATS may show a female to be cycling even when there are no rows in CYCSTATS for the dates in question. This occurs when there are no CYCPOINTS during a period of observation. This can only occur for females without a MATUREDATES.Mstatus of O when observation ceases before the first observed sexual cycle transition event.

The system will issue a warning when REPSTATS indicates a female is cycling but there is no row in CYCSTATS for the day in question.

Caution

Females may become turgesent (have a Tdate) on the same day as their Mdate. Because CYCSTATS has a 1-day resolution and these females are, in effect, in menses for less than a day, CYCSTATS will not show any days in menses (State is M) for such cycles, even though the cycle has an Mdate row in CYCPOINTS.

Similarly, when there are fewer than 6 days between an Mdate and the following Ddate, a cycle will have no days in the swelling-follicular state (State is S).

Caution

When the last date of an S (Swelling-follicular) cycle state is not known,[184] that is, when a cycle has no Ddate due to cessation of observation, death, delay in data entry, or any other reason, two problems arise that will, unless accounted for, adversely affect sexual cycling analysis. First, the O (Ovulating) state will not occur, because the transition from the S state to the O state is determined by the following Ddate,[185] which does not exist. Second, because the O state cannot be calculated, the S state may be erroneously long: days when the female is actually ovulating may be marked S rather than O, and these rows will have incorrect Dins (days into state) values.

Rather than omit the accurate S rows along with the inaccurate ones, the Babase designers chose to include all available data to accommodate those analyses that do not distinguish between the S (Swelling-follicular) state and the O (Ovulating) state. The Babase user is expected to know the conditions under which various data may be used.

Note

In the case of an individual that has ceased cycling due to pathology or old age, and whose last cycle did not end in pregnancy, the final CYCSTATS rows will have a State of D and an unusually long duration, with the individual's date of death being the last day of the cycle.

The sum of Dins and Dr is always the total number of days the cycle spent in the state.

Warning

Babase does not populate this table automatically, although we would like it to do so. The rebuild_all_cycstats() or rebuild_cycstats() programs must be manually executed to ensure the content of this table corresponds with that of the rest of the database.

Users cannot directly manipulate the table's data.

Csid (CYCSTATS Id)

A unique number that serves to identify the row.

Date

The row records a female's reproductive cycle state on this day.

Sname

The Sname uniquely identifying the female whose reproductive state is recorded. (See BIOGRAPH.)

State

Categorizes the period within the reproductive cycle. Legal values are:

CYCSTATS.State Values

Code  Mnemonic             Description
M     Menses-follicular    The Mdate (onset of menses) to the day before the Tdate (turgesence onset), inclusive of endpoints
S     Swelling-follicular  The Tdate through 6 days before the Ddate (deturgesence onset), inclusive of endpoints
O     Ovulating            From 5 days before the Ddate through one day before the Ddate, inclusive of endpoints
D     Deturgesence         Luteal; from the Ddate through the day before the next Mdate, inclusive of endpoints
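The table's definitions can be expressed as a Python sketch that lays out one fully observed cycle day by day, together with the Dins and Dr counts described below. It assumes a complete cycle with a known Mdate, Tdate, Ddate, and following Mdate, with the Tdate at least 5 days before the Ddate; gaps in observation, missing Ddates, and the resulting NULLs are not handled. The function cycle_states is a hypothetical illustration, not the code behind rebuild_cycstats().

```python
from datetime import date, timedelta
from typing import List, Tuple

Row = Tuple[date, str, int, int]  # (Date, State, Dins, Dr)


def cycle_states(mdate: date, tdate: date, ddate: date, next_mdate: date) -> List[Row]:
    """Sketch: CYCSTATS-like rows for one fully observed cycle."""
    if not (mdate <= tdate <= ddate - timedelta(days=5) and ddate < next_mdate):
        raise ValueError("sketch assumes Mdate <= Tdate <= Ddate - 5 days and Ddate < next Mdate")
    one = timedelta(days=1)
    spans = [
        ("M", mdate, tdate - one),                      # Mdate .. day before Tdate
        ("S", tdate, ddate - timedelta(days=6)),        # Tdate .. 6 days before Ddate
        ("O", ddate - timedelta(days=5), ddate - one),  # 5 .. 1 days before Ddate
        ("D", ddate, next_mdate - one),                 # Ddate .. day before next Mdate
    ]
    rows: List[Row] = []
    for state, first, last in spans:
        n = (last - first).days + 1  # days in the state; 0 when the state is skipped
        for i in range(max(n, 0)):
            # Dins is 1 on the first day; Dr is 0 on the last day.
            rows.append((first + timedelta(days=i), state, i + 1, n - 1 - i))
    return rows
```

When the Tdate falls on the Mdate, the M span is empty, which is the situation described in the first Caution above; on every row, Dins + Dr equals the number of days in the state.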

Dins (Days INto State, NULL allowed)

The number of days since the state started. The first day of the state has a value of 1, the next a value of 2, etc.

This column is NULL when the system cannot determine when the state began. This happens when the cycle's starting date occurs during a period when the individual is not under regular observation. (See CYCGAPS.)

Dr (Days Remaining in state, NULL allowed)

The number of days remaining in the state. The last day of the state has a value of 0, the next to last day a value of 1, etc.

This column is NULL when the system cannot determine when the state ends. This occurs when the end of the cycle was not observed, either because the individual is alive and additional observations have not yet been entered into Babase or due to cessation of regular observation. (See CYCGAPS.) It also occurs when the individual dies while cycling as it is not known when the state would have ended.

Cpids (sexual Cycle data Points IDentifier, Starting) (May be NULL)

The Cpid of the CYCPOINTS row recording the sexual cycle transition event that started the state. NULL when there is no such row. See REPSTATS.Dins for further detail.

The Cpids value of CYCSTATS rows with a State of O (Ovulating) references a Tdate (Code of T) CYCPOINTS row, even though the Tdate is not (usually) the first day of ovulation. This is because the Tdate, when it exists (when Cpids is not NULL), is the sexual cycle transition event that precedes ovulation. To find the first day of ovulation, subtract one less than the Dins value from the Date column (Date - (Dins - 1)), since Dins is 1 on the first day of the state.

Cpide (sexual Cycle data Points IDentifier, Ending) (May be NULL)

The Cpid of the CYCPOINTS row recording the sexual cycle transition event that ended the state. NULL when there is no such row. See REPSTATS.Dr for further detail.

The Cpide value of CYCSTATS rows with a State of S (Swelling-follicular) references a Ddate (Code of D) CYCPOINTS row, even though the Ddate is not the day after the last day of the swelling-follicular state. This is because the Ddate, when it exists (when Cpide is not NULL), is the sexual cycle transition event that follows the swelling-follicular state. Add the Dr column to the Date column to find the last day of the swelling-follicular state.
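Recovering the first and last day of a state from a single row is simple date arithmetic on Dins and Dr. A Python sketch, where state_bounds is a hypothetical helper and None stands in for NULL:

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def state_bounds(day: date, dins: Optional[int],
                 dr: Optional[int]) -> Tuple[Optional[date], Optional[date]]:
    """First and last day of the state that contains `day`.

    Dins is 1 on the first day of a state and Dr is 0 on its last day, so
    first = Date - (Dins - 1) and last = Date + Dr. None stands for NULL.
    """
    first = day - timedelta(days=dins - 1) if dins is not None else None
    last = day + timedelta(days=dr) if dr is not None else None
    return first, last
```

On a row dated 2010-01-10 with Dins 3 and Dr 2, the state runs from 2010-01-08 through 2010-01-12, five days, which is Dins + Dr.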

MDINTERVALS (Mdate to Ddate Intervals)

A day-by-day record of the number of days since the previous Mdate/until the next Ddate. Contains one row per female per day, for those days in REPSTATS for which the REPSTATS State is C (cycling), for those days between the cycle's Mdate and Ddate, inclusive of the Mdate but exclusive of the Ddate. This table contains rows whenever there are rows in CYCSTATS. See the CYCSTATS documentation for further details and the REPSTATS documentation for details and cautions.

When there is no prior Mdate, due to pregnancy, menarche, or resumption of observation, the Dini column is NULL. However, the corresponding row in the REPSTATS table contains what may be a relevant Dins value.

Note

In the case of an individual that has ceased cycling due to pathology or old age, that individual's final Mdate to Ddate interval will have a long duration, with the individual's date of death being the last day of the interval.

The sum of Dini and Dr is always the total number of days counting[186] from the cycle's Mdate up to[187] its Ddate.
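The Dini and Dr arithmetic, including the NULL cases, can be sketched in Python. The function interval_rows and its parameters are hypothetical, None stands in for NULL, and the caller is assumed to supply the observed span of the interval.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def interval_rows(first_day: date, last_day: date,
                  mdate: Optional[date] = None,
                  ddate: Optional[date] = None) -> List[Tuple[date, Optional[int], Optional[int]]]:
    """(Date, Dini, Dr) for each day from first_day through last_day.

    mdate: the Mdate opening the interval, or None when it is unknown
    ddate: the Ddate closing the interval (not itself in the interval),
           or None when it is unknown
    """
    rows = []
    day = first_day
    while day <= last_day:
        dini = (day - mdate).days + 1 if mdate is not None else None  # 1 on the Mdate
        dr = (ddate - day).days - 1 if ddate is not None else None  # 0 on the day before the Ddate
        rows.append((day, dini, dr))
        day += timedelta(days=1)
    return rows
```

MMINTERVALS counts the same way with the following Mdate in place of the Ddate, so the same arithmetic applies there; the sum Dini + Dr is then the number of days between Mdates.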

Warning

Babase does not populate this table automatically, although we would like it to do so. The rebuild_all_mdintervals() or rebuild_mdintervals() programs must be manually executed to ensure the content of this table corresponds with that of the rest of the database.

Users cannot directly manipulate the table's data.

Mdiid (Mdate to Ddate Interval IDentifier)

A unique number which serves to identify the row.

Date

The row records the number of days until the cycle's Ddate/from the cycle's Mdate relative to this day.

Sname

The Sname uniquely identifying the female. (See BIOGRAPH.)

Dini (Days INto Interval since last Mdate, NULL allowed)

The number of days into the interval. The first day of the interval, the Mdate at the beginning of the interval, has a value of 1, the next day a value of 2, etc.

This column is NULL when there is no Mdate at the beginning of the interval. This occurs when the cycle is the female's first cycle, as there is no menses to begin the cycle, and likewise for the first cycle after pregnancy. The cycle's Mdate is also unknown when it occurs during a period when the individual is not under regular observation. (See CYCGAPS.)

Dr (Days Remaining to next Ddate, NULL allowed)

The number of days remaining in the interval -- days to, but not including, the Ddate that ends the interval. The last day of the interval, the day before the Ddate that ends the interval, has a value of 0, the day before that a value of 1, etc.

This column is NULL when there is no next Ddate, either because the individual is alive and additional observations have not yet been entered into Babase or due to cessation of regular observation. (See CYCGAPS.) It can also occur when an individual dies.

Cpids (sexual Cycle data Points IDentifier, Starting) (May be NULL)

The Cpid of the CYCPOINTS row recording the starting Mdate. NULL when there is no such row, when the interval occurs at the beginning of a period of continuous observation (see CYCGAPS), after a pregnancy, or at menarche.

Cpide (sexual Cycle data Points IDentifier, Ending) (May be NULL)

The Cpid of the CYCPOINTS row recording the ending Ddate. NULL when there is no such row, when the interval occurs at the end of a period of continuous observation (see CYCGAPS) or the point of cessation of data entry.

MMINTERVALS (Mdate to Mdate Intervals)

A day-by-day record of the number of days since the previous/until the next Mdate. Contains one row per female per day, for those days in REPSTATS for which the REPSTATS State is C (cycling). The Mdate-to-Mdate interval includes the Mdate at the beginning of the interval but does not include the Mdate at the end of the interval[188]. This table contains rows whenever there are rows in CYCSTATS. See the CYCSTATS documentation for further details and the REPSTATS documentation for details and cautions.

When there is no previous Mdate, due to pregnancy, menarche, or resumption of observation, the Dini column is NULL. However, the corresponding row in the REPSTATS table contains what may be a relevant Dins value.

When there is no subsequent Mdate due to pregnancy, death, interruption of observation, or cessation of data entry, the Dr value is NULL. When there is no subsequent Mdate due to pregnancy what may be a relevant Dr value can be found in the REPSTATS table.

Note

In the case of an individual that has ceased cycling due to pathology or old age, that individual's final Mdate to Mdate interval will have a long duration, with the individual's date of death being the last day of the interval.

The sum of Dini and Dr is always the total number of days between Mdates.

Warning

Babase does not populate this table automatically, although we would like it to do so. The rebuild_all_mmintervals() or rebuild_mmintervals() programs must be manually executed to ensure the content of this table corresponds with that of the rest of the database.

Users cannot directly manipulate the table's data.

Mmiid (Mdate to Mdate Interval IDentifier)

A unique number that serves to identify the row.

Date

The row records the number of days until/from the nearest Mdates relative to this day.

Sname

The Sname uniquely identifying the female. (See BIOGRAPH.)

Dini (Days INto Interval since last Mdate, NULL allowed)

The number of days into the interval. The first day of the interval, the Mdate at the beginning of the interval, has a value of 1, the next day a value of 2, etc.

This column is NULL when there is no Mdate at the beginning of the interval. This occurs when the cycle is the female's first cycle, as there is no menses to begin the cycle, and likewise for the first cycle after pregnancy. The cycle's Mdate is also unknown when it occurs during a period when the individual is not under regular observation. (See CYCGAPS.)

Dr (Days Remaining to next Mdate, NULL allowed)

The number of days remaining in the interval -- days until the Mdate which follows the interval[189]. The last day of the interval, the day before an Mdate that comprises the end of the interval, has a value of 0, the day before that a value of 1, etc.

This column is NULL when there is no next Mdate, either because the individual is alive and additional observations have not yet been entered into Babase or due to cessation of regular observation. (See CYCGAPS.) It can also occur when an individual dies while cycling, as it is not known when the interval would have ended.

Cpids (sexual Cycle data Points IDentifier, Starting) (May be NULL)

The Cpid of the CYCPOINTS row recording the earlier Mdate. NULL when there is no such row, when the interval occurs at the beginning of a period of continuous observation (see CYCGAPS), after a pregnancy, or at menarche.

Cpide (sexual Cycle data Points IDentifier, Ending) (May be NULL)

The Cpid of the CYCPOINTS row recording the later Mdate. NULL when there is no such row, when the interval occurs at the end of a period of continuous observation (see CYCGAPS) or ends in pregnancy.

REPSTATS (Female Reproductive States)

A day-by-day record indicating whether a female is pregnant, lactating, or cycling. Contains one row per female per day for every day during intervals of continuous observation from the date of menarche through the date of death (inclusive). When menarche is unobserved, REPSTATS rows begin on a beginning-of-observation date.[190] Likewise, the cessation or resumption of observation interrupts or resumes the contiguous series of the female's REPSTATS dates. (See CYCGAPS.) While the individual is alive[191] and under observation, the last date is either the BIOGRAPH.Statdate or the last recorded sexual cycle endpoint, whichever is later. When the individual is not alive, but was under observation until death, the last date is the female's Statdate.
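The rule for where a female's REPSTATS rows start and stop can be sketched in Python. This is an illustration under assumptions: the start date (menarche or beginning of observation) and the set of unobserved days are supplied by the caller, and repstats_dates is a hypothetical name rather than part of rebuild_repstats().

```python
from datetime import date, timedelta
from typing import List, Optional, Set


def repstats_dates(start: date, statdate: date, alive: bool,
                   unobserved: Set[date],
                   last_cycle_event: Optional[date] = None) -> List[date]:
    """Dates on which a female would get a REPSTATS row (sketch).

    start: menarche, or a beginning-of-observation date
    unobserved: days not under observation (as in CYCGAPDAYS)
    last_cycle_event: latest recorded sexual cycle endpoint, if any
    """
    end = statdate
    if alive and last_cycle_event is not None and last_cycle_event > statdate:
        end = last_cycle_event  # living: Statdate or last cycle endpoint, whichever is later
    dates = []
    day = start
    while day <= end:
        if day not in unobserved:  # gaps interrupt the series
            dates.append(day)
        day += timedelta(days=1)
    return dates
```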

Warning

Because Babase generates REPSTATS rows ending, at minimum, with females' Statdates, the data entry staff should enter sexual cycle information (CYCPOINTS and CYCGAPS) for a time interval before entering demographic information (CENSUS, BIOGRAPH Statdate and Status) for that interval. Otherwise Babase may continue a particular reproductive state to the Statdate when there are reproductive data to the contrary yet to be entered.

Caution

Babase assumes individuals are under continuous observation. If there is no record of a gap in observation (see CYCGAPS), the entire interval between the onset of menarche (Matured) and the first recorded sexual cycling event (CYCPOINTS) is included in the individual's first reproductive state interval in REPSTATS and possibly in CYCSTATS, MMINTERVALS, and MDINTERVALS as well.

Note

Because of gaps in the observational record, some sexual cycles may not be recorded, or may be partially recorded. In these cases the Dins and Dr columns are NULL. (See below.)

The sum of Dins and Dr is always the total number of days spent in the state.[192]

Warning

Babase does not populate this table automatically, although we would like it to do so. The rebuild_all_repstats() or rebuild_repstats() programs must be manually executed to ensure the content of this table corresponds with that of the rest of the database.

Users cannot directly manipulate the table's data.

See CYCSTATS, MMINTERVALS, and MDINTERVALS for more fertility detail.

Rid

A unique number that serves to identify the row.

Date

The row records a female's reproductive state on this day.

Sname

The Sname uniquely identifying the female whose reproductive state is recorded. (See BIOGRAPH.)

State (reproductive State)

General reproductive state of the female on the given Date. The legal values are:

REPSTATS.State values

Code  Mnemonic   Description
C     Cycling    From (including) the Tdate (turgesence onset) up to (but not including) the Ddate of the onset of pregnancy.
P     Pregnant   From (including) the Ddate (deturgesence onset) up to (but not including) the end-of-pregnancy date, i.e., the date that the female experiences an infant birth, experiences a spontaneous abortion, or dies.
L     Lactating  Postpartum amenorrhea. From (including) the end-of-pregnancy date to (but not including) the next Tdate.
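Because each state includes its start date and excludes its end date, the table translates directly into half-open date comparisons. In this hypothetical Python sketch the four boundary dates are assumed to be already known; the conception Ddate falling in P reproduces the Caution below.

```python
from datetime import date
from typing import Optional


def rep_state(day: date, cycling_start: date, conception: date,
              pregnancy_end: date, next_tdate: date) -> Optional[str]:
    """REPSTATS State for `day` within one cycling/pregnancy/lactation sequence.

    cycling_start: the Tdate at which cycling begins or resumes
    conception: the Ddate of the onset of pregnancy
    pregnancy_end: birth, spontaneous abortion, or death
    next_tdate: the Tdate ending lactation
    Every interval includes its start and excludes its end.
    """
    if cycling_start <= day < conception:
        return "C"
    if conception <= day < pregnancy_end:
        return "P"
    if pregnancy_end <= day < next_tdate:
        return "L"
    return None  # outside the sequence this sketch knows about
```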

Caution

The above definition of pregnant means that on the conception date the mother is in a pregnant state, even though the conception date is a Ddate and the Ddate has a cycle (a Cid on CYCPOINTS).

Note

REPSTATS does not keep track of whether a female's cycles are normal; it simply forces females into one of these three states. Individuals who have ceased cycling or have irregular cycles due to pathology or old age have a state of C, or possibly L if the last cycle resulted in a pregnancy.

Any of the above states may start late or end early in the event of gaps in observation. (See CYCGAPS.)

Dins (Days INto State, NULL allowed)

The number of days since the state started. The first day of the state has a value of 1, the next a value of 2, etc.

This column is NULL when the system cannot determine when the state began. This occurs when the beginning of the reproductive state occurs during a period when the individual is not under regular observation (see CYCGAPS) or when an individual's sexual maturity date is not also a Tdate (see MATUREDATES).

Dr (Days Remaining, NULL allowed)

The number of days remaining in the state. The last day of the state has a value of 0, the next to last day a value of 1, etc.

This column is NULL when the system cannot determine when the state ends. This occurs when the end of the reproductive state was not observed, either because the individual is alive and additional observations have not yet been entered into Babase, or due to cessation of regular observation. (See CYCGAPS.) It also occurs when the individual dies, as it is not known when the state would have ended.
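The State, Dins, and Dr definitions can be illustrated with a small model. This is not Babase code (Babase maintains REPSTATS itself); it is a minimal Python sketch assuming a female's history is supplied as a list of hypothetical Transition records, each giving the first day of a new state, all within one uninterrupted stretch of observation. A state whose start was not observed carries the hypothetical flag start_observed=False, and a state with no later transition is treated as having an unknown end.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transition:
    day: date                    # first day of the new state (inclusive)
    state: str                   # "C", "P" or "L"
    start_observed: bool = True  # hypothetical flag: False if the state began unobserved


def repstat(transitions: List[Transition],
            when: date) -> Optional[Tuple[str, Optional[int], Optional[int]]]:
    """Return (State, Dins, Dr) for `when`, or None before the first transition."""
    ordered = sorted(transitions, key=lambda t: t.day)
    started = [t for t in ordered if t.day <= when]
    if not started:
        return None
    current = started[-1]
    # A state runs from its own first day up to, but not including, the next transition.
    later = [t for t in ordered if t.day > when]
    # Dins: the first day of the state is 1; unknown start gives None.
    dins = (when - current.day).days + 1 if current.start_observed else None
    # Dr: the last day of the state is 0; no known end gives None.
    dr = (later[0].day - when).days - 1 if later else None
    return current.state, dins, dr


history = [Transition(date(2001, 1, 1), "C", start_observed=False),
           Transition(date(2001, 3, 1), "P"),
           Transition(date(2001, 9, 1), "L")]
print(repstat(history, date(2001, 3, 1)))  # ('P', 1, 183)
```

The last day of cycling, 2001-02-28, yields ('C', None, 0): Dr is known because the following Ddate was observed, but Dins is NULL because cycling began unobserved.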

Pid (Pregnancy Identifier, NULL allowed)

The Pid of the pregnancy associated with the state. This value must be present when the State is P (Pregnant) or L (Lactating). There is also a Pid value for those C (Cycling) states that end in pregnancy; this will apply to the majority of the C states, as the only other way to exit the C state is death or cessation of observation.

Sexual Cycle Determination

Sexual cycles (CYCLES) are defined by Mdate, Ddate, and Tdate sexual cycle transition events. CYCLES should be created and destroyed in correspondence with Mdates, Tdates, and Ddates. But Babase contains other information related to sexual cycles, most obviously sexskin swelling. This section describes how this information is related to specific sexual cycles.[193]

Note

Because a cycle is, by definition, a periodicity with no start and no end, the determination of when a new sexual cycle starts is arbitrary[194], and so, then, is the determination of which cycle to associate various data with. The method used by Babase was chosen for its simplicity and its ability to be consistently applied to all sorts of cycle-related data. It may lead to non-intuitive results. As with all things Babase, users must take care to familiarize themselves with the intricacies[195] of the system, and the data.

Babase uses the date of a measurement, whatever the kind of data (sexskin swelling, PCS color, etc.), to determine which sexual cycle the measurement should be associated with. Dates are assigned to cycles by virtue of falling in the interval each cycle spans: each cycle starts with an Mdate and continues through the day before the next Mdate, although cycles can be cut off by the cessation, or initiation, of observation. The following method implements these policies and can be used as a guide when there are questions as to the specifics:[196]

Relate the measurement to the cycle of the Mdate, Tdate, or Ddate that falls on the date of the measurement or, failing that, of the latest Mdate, Tdate, or Ddate preceding the measurement, so long as there is no gap in observation between the measurement date and that Mdate, Tdate, or Ddate. If there is no such Mdate, Tdate, or Ddate, due to gaps in observation or simple lack of data, then relate the measurement to the cycle of the earliest Tdate or Ddate that follows the measurement but is not separated from it by a gap in observation or an intervening Mdate. If there is no such Tdate or Ddate then the measurement may not be recorded in Babase.
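The assignment rule above can be expressed as a short function. This is a minimal Python sketch, not Babase's implementation: the CyclePoint record is hypothetical, gaps in observation (which Babase records in CYCGAPS) are assumed here to be inclusive (start, end) date ranges, and a measurement counts as separated from a cycle point whenever any non-observation day falls between the two dates, inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

Gap = Tuple[date, date]  # hypothetical: inclusive range of days without observation


@dataclass(frozen=True)
class CyclePoint:
    day: date
    code: str  # "M", "T" or "D"
    cid: int   # the cycle this Mdate, Tdate, or Ddate belongs to


def gap_between(gaps: List[Gap], a: date, b: date) -> bool:
    """True if any non-observation day falls within [min(a, b), max(a, b)]."""
    lo, hi = min(a, b), max(a, b)
    return any(start <= hi and end >= lo for start, end in gaps)


def assign_cycle(points: List[CyclePoint], gaps: List[Gap], when: date) -> Optional[int]:
    """Return the Cid a measurement taken on `when` relates to, or None."""
    ordered = sorted(points, key=lambda p: p.day)
    # Step 1: the point on, or the latest point before, the measurement date,
    # provided observation is unbroken in between.
    before = [p for p in ordered if p.day <= when]
    if before and not gap_between(gaps, before[-1].day, when):
        return before[-1].cid
    # Step 2: the earliest following Tdate or Ddate with no gap and no Mdate
    # in between. Only the very next point can qualify: if it is an Mdate,
    # every later point lies beyond that Mdate.
    after = [p for p in ordered if p.day > when]
    if after and after[0].code in ("T", "D") and not gap_between(gaps, when, after[0].day):
        return after[0].cid
    # Step 3: no qualifying cycle; Babase would not accept the measurement.
    return None
```

With an Mdate on 2000-01-01, a Tdate on 2000-01-10, and a Ddate on 2000-01-20, a swelling recorded on 2000-01-15 relates to that cycle by step 1; if observation lapsed from 2000-01-12 through 2000-01-14 it still relates to the same cycle, but by step 2 via the Ddate.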

Warning

Because there are conditions under which sexual cycle related data may not be recorded in Babase, and, as a rule, Babase does not automatically delete data, Babase will not permit some orderings of data maintenance operations. For example, Babase will not allow a gap in observation to be inserted after a female's last Ddate but before her last sexual swelling date because this would require removal of the sexual swelling information. An alternate ordering of the operations resulting in identical database content is required. In the above example either the sexual swelling data must be deleted or subsequent Mdates, Tdates, or Ddates must be entered before the gap in observation may be entered.

Automatic Sequencing

Note

This section describes how Babase automatically re-computes the sequence numbers used within various tables to give a timewise ordering to rows that would not otherwise have an ordering. The columns that hold the sequence values have names that vary by table. The following description uses the generic column name of Seq when referring to the name of the column that holds the sequential numbering.

The system automatically re-computes Seq values to ensure that they are contiguous and begin with 1. Seq may be NULL when the row is first inserted, in which case the system automatically assigns the next available sequence number. Changing a row's sequence number to match one that already exists (within, e.g., a given darting), or inserting a new row having a sequence number equal to that of an existing row (within, e.g., a given darting), causes the sequence number of the row already holding that number to be incremented and subsequent sequence values to be recomputed. E.g., starting with rows A, B, C, and D having Seq values of 1, 2, 3, and 4 respectively, changing the Seq value of row D to 2 automatically increases the Seq values of rows B and C by one. The new ordering of the rows by sequence number is then A, D, B, C. Deleting a row recomputes the sequence numbers of the remaining rows in a corresponding fashion.
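The resequencing rule can be modeled in a few lines. This is a minimal Python sketch of the rule as described, not Babase's implementation (Babase does this inside the database); the function names are hypothetical, each group of rows (e.g., one darting) is a dict mapping a row label to its Seq, and the starting Seq values are assumed to be contiguous.

```python
from typing import Dict, List, Optional


def renumber(seqs: Dict[str, int]) -> List[str]:
    """Close any gaps: reassign Seq values 1..n, keeping the current order."""
    order = sorted(seqs, key=seqs.get)
    for position, row in enumerate(order, start=1):
        seqs[row] = position
    return order


def set_seq(seqs: Dict[str, int], row: str, new_seq: Optional[int]) -> List[str]:
    """Insert or update `row` with Seq `new_seq` (None means "next available")."""
    if new_seq is None:
        new_seq = max((s for r, s in seqs.items() if r != row), default=0) + 1
    seqs[row] = new_seq
    # Bump the row that already held this value, then any row it now collides with.
    moved, value = row, new_seq
    while True:
        clash = next((r for r, s in seqs.items() if s == value and r != moved), None)
        if clash is None:
            break
        value += 1
        seqs[clash] = value
        moved = clash
    return renumber(seqs)


def delete_row(seqs: Dict[str, int], row: str) -> List[str]:
    """Remove `row` and renumber the rest."""
    del seqs[row]
    return renumber(seqs)


rows = {"A": 1, "B": 2, "C": 3, "D": 4}
print(set_seq(rows, "D", 2))  # ['A', 'D', 'B', 'C']
```

Under this model, setting row B to 3 leaves the order unchanged, and setting row B to 10 moves B to the end with a Seq of 4, in line with the cautions below.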

Caution

Updating a row to increment the sequence value by 1 will do nothing[197]. Such an update pushes the row that held the target value, and any rows after it, up by one, leaving a gap at the updated row's original position. The gap is then filled by decrementing the sequence numbers of all the rows above it, including the row that the original update incremented.

Likewise, updating the Seq column in a way that assigns Seq numbers past the end of the sequence results not in the user-specified Seq values but rather in Seq values that are re-computed so as to maintain contiguity.

Warning

A single UPDATE statement that relies on automatic resequencing to eliminate more than one duplicate Seq (within, e.g., a given darting) produces indeterminate results.[198] For example, given rows A, B, C, and D with Seq values of 1, 2, 3, and 4 respectively, a single UPDATE statement that changes the Seq of A to 3 and the Seq of B to 4 results in an indeterminate ordering.[199]
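One way to see why the outcome is indeterminate: if the rows touched by a single UPDATE are resequenced one at a time, in an order that SQL does not guarantee, different processing orders give different results. The Python sketch below is a hypothetical model of the per-row rule applied to the example above in both orders; it is an illustration only and makes no claim about how Babase or PostgreSQL actually process the rows.

```python
from typing import Dict, List, Tuple


def set_seq(seqs: Dict[str, int], row: str, new_seq: int) -> List[str]:
    """Hypothetical model of single-row resequencing: bump, then close gaps."""
    seqs[row] = new_seq
    moved, value = row, new_seq
    while True:
        clash = next((r for r, s in seqs.items() if s == value and r != moved), None)
        if clash is None:
            break
        value += 1
        seqs[clash] = value
        moved = clash
    order = sorted(seqs, key=seqs.get)
    for position, name in enumerate(order, start=1):
        seqs[name] = position
    return order


def apply_in_order(changes: List[Tuple[str, int]]) -> List[str]:
    """Apply the requested Seq changes one row at a time, in the given order."""
    seqs = {"A": 1, "B": 2, "C": 3, "D": 4}
    order: List[str] = []
    for row, new_seq in changes:
        order = set_seq(seqs, row, new_seq)
    return order


print(apply_in_order([("A", 3), ("B", 4)]))  # ['A', 'C', 'B', 'D']
print(apply_in_order([("B", 4), ("A", 3)]))  # ['C', 'A', 'B', 'D']
```

The two orders disagree, and neither leaves A at 3 and B at 4.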

The system will report an error when the Seq values of inserted rows would create non-contiguous Seq values or a sequence that does not begin with 1.[200]

Automatic Mdate Generation

CYCPOINTS is special in that the presence of a Ddate row can trigger the automatic generation of an Mdate row 13 days later. Automatically generated Mdates are distinguished by having a CYCPOINTS.Source of A. As Ddate rows are inserted, updated, or deleted, Babase makes appropriate changes to ensure that automatically generated Mdate rows exist on the 13th day following a qualified Ddate. The exception is when a Tdate follows a Ddate by less than 13 days (and there are no intervening gaps in observation). In this case the automatically generated Mdate has the Tdate's date and is less than 13 days after the previous Ddate.

An Mdate will be generated from a Ddate when all of the following conditions are met:

  • Either there is no Mdate in the cycle following the Ddate's cycle or there is a gap in observation between the Ddate's cycle and the following cycle.

  • The Ddate is not the start of a pregnancy; that is, its Cpid does not appear as a Conceive value on the PREGS table.

  • Observation proceeds without a gap for at least 13 days following the Ddate, or up to the Tdate immediately following the Ddate, whichever comes first.

  • The Ddate is not estimated. (Source is not E)

  • The individual is alive (BIOGRAPH.Status is 0) on the automatic Mdate.[201]
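The conditions listed above can be sketched as a single check that returns the date an automatic Mdate would receive, or None when no Mdate would be generated. This is a hypothetical Python illustration, not Babase's code: facts that Babase derives from CYCPOINTS, PREGS, CYCGAPS, and BIOGRAPH are passed in as plain arguments, gaps in observation are assumed to be inclusive date ranges, and the Source code "X" in the example is a placeholder for any source other than E.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional, Tuple

Gap = Tuple[date, date]  # hypothetical: inclusive range of days without observation
AUTO_MDATE_OFFSET = timedelta(days=13)


@dataclass(frozen=True)
class DdateInfo:
    day: date
    source: str           # CYCPOINTS.Source; "E" means estimated
    is_conception: bool   # True if its Cpid appears as a PREGS.Conceive value


def auto_mdate(ddate: DdateInfo,
               next_tdate: Optional[date],
               gaps: List[Gap],
               next_cycle_has_mdate: bool,
               gap_before_next_cycle: bool,
               alive_on: Callable[[date], bool]) -> Optional[date]:
    """Date of the Mdate that would be generated from `ddate`, or None."""
    # 13 days after the Ddate, or the date of a Tdate that comes sooner.
    target = ddate.day + AUTO_MDATE_OFFSET
    if next_tdate is not None and next_tdate < target:
        target = next_tdate
    # The following cycle already has an Mdate and is not separated by a gap.
    if next_cycle_has_mdate and not gap_before_next_cycle:
        return None
    # The Ddate starts a pregnancy.
    if ddate.is_conception:
        return None
    # Observation must be unbroken from the day after the Ddate through the target.
    first = ddate.day + timedelta(days=1)
    if any(start <= target and end >= first for start, end in gaps):
        return None
    # Estimated Ddates never generate Mdates.
    if ddate.source == "E":
        return None
    # The individual must be alive on the automatic Mdate.
    if not alive_on(target):
        return None
    return target


d = DdateInfo(date(2000, 1, 1), "X", False)  # "X": placeholder non-E source
print(auto_mdate(d, None, [], False, False, lambda day: True))  # 2000-01-14
```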

An Mdate automatically generated from a Ddate will be removed when any of the above conditions are no longer met, or when another Mdate is automatically generated for the Ddate.[202] More precisely, what is removed is not specifically an Mdate that was automatically generated from the Ddate, but any Mdate that has a Source of A, post-dates the Ddate, and has no Mdates, Tdates, Ddates, or periods of no observation (see CYCGAPS) on the interval between the Ddate and itself. Babase cannot distinguish manually entered Mdates with a Source of A from automatically generated Mdates. Therefore it is not just automatically generated Mdates that will be removed.
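The removal rule can likewise be sketched. Given a Ddate, the hypothetical function below lists the Mdates the rule would remove if the generation conditions stopped holding. Because it looks only at the Source, it cannot tell a manually entered Mdate with a Source of A from an automatic one. As assumptions, gaps are inclusive date ranges, and "between" excludes the Ddate and the Mdate themselves, so a Tdate sharing the Mdate's date does not count as intervening.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

Gap = Tuple[date, date]  # hypothetical: inclusive range of days without observation


@dataclass(frozen=True)
class Point:
    day: date
    code: str    # "M", "T" or "D"
    source: str  # CYCPOINTS.Source


def mdates_removed_for(ddate: date, points: List[Point], gaps: List[Gap]) -> List[Point]:
    """Mdates treated as generated from the Ddate on `ddate`."""
    removed = []
    for m in points:
        if m.code != "M" or m.source != "A" or m.day <= ddate:
            continue
        # Days strictly between the Ddate and this Mdate (possibly none).
        lo, hi = ddate + timedelta(days=1), m.day - timedelta(days=1)
        if lo <= hi:
            # Any Mdate, Tdate, or Ddate in between disqualifies this Mdate.
            if any(lo <= p.day <= hi for p in points):
                continue
            # So does any period of no observation in between.
            if any(start <= hi and end >= lo for start, end in gaps):
                continue
        removed.append(m)
    return removed
```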

Automatically assigned Mdates, those with a CYCPOINTS.Source of A, have NULL Edates and Ldates.



[157] Group membership on the Zdate does not include a male in the set of potential fathers.

[158] Or other basis of analysis.

[159] Including the supergroup itself, as a supergroup is a sub-group of itself.

[160] Note that the requirement that ranks be contiguous means that in order to change an existing ranking the ranks must first be deleted, from highest numbered rank to lowest, and then the new ranking re-created, from lowest numbered rank to highest.

[161] Usually the olive baboon, Papio anubis.

[162] At this time only DEMOG, the demography notes table, contributes to CENSUS any information regarding group membership.

[163] Sometimes, when demography information is added into other tables, CENSUS rows are altered rather than removed. Likewise, CENSUS rows are removed (or altered as necessary) when demography infor