Document generated: 2024-08-22 14:16:10.
Copyright Notices
Copyright (C) 2005-2023 Karl O. Pinc, Jeanne Altmann, Susan Alberts, Leah Gerber, Jake Gordon, The Meme Factory, Inc.
Except as otherwise noted permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License.”
Copyright (C) 1996-2011 The PostgreSQL Global Development Group
The appendix titled Database Transactions Explained is Copyright (C) 1996-2011 by the PostgreSQL Global Development Group, distributed under the terms of the license of the University of California below.
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS-IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
March 2, 2005
We gratefully acknowledge the support of the National Science Foundation for the supporting the collection of the majority of the data stored in the database; in the past decade in particular we acknowledge support from IBN 9985910, IBN 0322613, IBN 0322781, BCS 0323553, BCS 0323596, DEB 0846286, DEB 0846532 and DEB 0919200. We are also very grateful for support from the National Institute of Aging (R01AG034513-01 and P01AG031719) and the Princeton Center for the Demography of Aging (P30AG024361). We also thank the Chicago Zoological Society, the Max Planck Institute for Demographic Research, the L.S.B. Leakey Foundation and the National Geographic Society for support at various times over the years. In addition, we thank the National Institute of Aging (R03-AG045459-01) for supporting recent work extending the database to incorporate genetic and genomic data.
Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the National Institute of Aging, the Princeton Center for the Demography of Aging, the Chicago Zoological Society, the Max Planck Institute for Demographic Research, the L.S.B. Leakey Foundation, the National Geographic Society, or any other organization which has supplied support for this work.
Table of Contents
List of Figures
List of Tables
List of Examples
foo
in the sandbox
schemafoo
in the sandbox schemafoo
in user
mylogin
's schemadate_trunc()
function to set
seconds to zerodate_mod()
function to
return the minutes and seconds.to_char()
function to
convert times to HH:MM textTable of Contents
This document describes the Babase baboon data management system. This includes a description of the tables, the intended use of all related programs and directories, the design of the system, and procedures for maintaining the data management system itself. This document does not include the procedures actually used to enter data into the system, or the details of how to operate the systems programs. Nor does it include any instructions on the operation or administration of the computer itself. Further information on the topics not covered in this document can be found in the Protocol for Data Management: Amboseli Baboon Project document.
The Protocol for Data Management: Amboseli Baboon Project document is an important adjunct to the Babase system, but it is not considered part of the system itself because it describes the use of the system but not the capabilities of the system. It is important to maintain the distinction between use and capabilities so that when an enhancement is needed, it is clear whether the desired result can be obtained by altering the way the system is used, or whether the system itself needs to be modified. It is also important to provide different types of documentation to those who operate the system from those who manage and maintain the system because each of these two groups do not need to know all the details of the others' work.
Any deviation from the standards described in this document should be discussed with the project directors and may God have mercy on your souls.
This document follows a number of conventions, most of them typographic but some of them stylistic. Some output formats, particularly plain text, have limited typographic capabilities so the various forms of typographic markup are not always distinguishable, either from each other or from the surrounding text.
Each table in Babase is documented in a section of its own, beginning with a description of the table as a whole and continuing with sub-sections for each column in the table. Of particular importance is the sentence that describes what a row in the given table represents. These are summarized in the textual tables given in the Table Overview section.
Interrelationships between the columns of a table, or between tables, is documented at the beginning of the table's section, not in the sub-sections documenting the columns themselves. Although relationships between 2 tables concern both of the tables the description of each such relationship appears only once in this document, in the overall description of one of the of the two tables concerned. On occasion there may be be brief mention elsewhere.
All TABLE NAMES are written in UPPER CASE. Column Names are in lower case with Initial Capitals. SOMETABLE.Somecolumn is shorthand for “the Somecolumn column of the SOMETABLE table”. The use of a period to separate the table from the column name is the convention used by SQL to eliminate ambiguity regarding which table a column belongs to. When a column name includes an acronym the acronym is capitalized, as is the first letter of the next word when the acronym begins the column name. For example, PCSColor.
Actual database values are typographically distinguished
from the surrounding text, as in the following sentence:
“The Sname (short name) of the baboon Pebbles is
PEB
.”
When this document defines a word, uses it for the first time, or otherwise wishes to refer to a word or phrase as a thing in and of itself, the word or phrase is typographically distinguished as follows: “The word census has several meanings within this document.”
Text that has special meaning to computer systems is
typographically distinguished as follows: “The SQL
SELECT
statement is the standard method for
retrieving data from relational databases.”
Emphasized text is typographically distinguished as follows: “Always backup your data.”
When the words must or cannot or the phrases must not or may not are used, the system will not allow a contrary condition. For example: "Sname must be a unique data value" or "A user with read-only permissions may not change data values." Babase will immediately raise an error when a dis-allowed change is attempted and the change will not take effect.
When the words should or ought are used the system does not enforce the condition. It may or may not report a violation of the condition. An example: "The sexual cycle event referred to in the pregnancy table's Conceive column should date the conception that began the pregnancy." In this case the system has no way of knowing when the pregnancy began and so no way of validating the date.
When the phrase the system will report is used there is some mechanism for reporting a an unusual but not dis-allowed condition. Unlike prohibited conditions, unusual conditions are not generally reported at the time the condition is created.[1]
The documentation is written with a tendency to emphasize Special Values. So, for example, “not alive” is often written instead of “dead” because Babase has a special value that means alive but the system is not aware of a particular code that means dead. The result is an occasional double negative.
Significant but often slightly off-topic paragraphs are set off from the surrounding material as a note, shown in Example 1.1.
Example 1.1. A note
Note
Written material has no voice that can be raised, but attention can be drawn with typographical conventions.
When the reader should take care, particularly when the system might do something unexpected in a given circumstance, this is noted in a caution. Example 1.2 shows how a caution is set off from the surrounding text.
Example 1.2. A caution
Babase will reject your change if you try to do something that is not allowed, like giving a male an onset of turgesence date.
Caution
When the rejected change is one of a number of changes bundled into a transaction none of the changes will make it into the database.
When a mis-use of the system will lead to incorrect results, particularly when such results are not obvious, this document contains a warning. Example 1.3 show how warnings are set off from the surrounding text.
Example 1.3. A warning
Warning
Babase cannot detect when an Sname is mis-typed, so it is possible to inadvertently assign a female's sexual cycle to the wrong female.
To otherwise draw the readers attention to material some text is marked important. Example 1.4 show how important text is set off from the surrounding material.
Example 1.4. Text denoted important
Babase has a number of components, many of them, like the SQL web interface, are third party tools, not written by the Babase developers.
Important
When the third party tools are upgraded their “look” may change but the features they provide should remain. As Babase is composed of Free Software the Babase project always has the option of customizing any of its third party tools and can contribute its improvements back to the program's developers for inclusion into future releases.
Suggestions as to how to use Babase are noted in tips, as are remarks on how data are presently entered in Babase or recorded in the field. Example 1.5 show how a tip is set off from the regular document text.
Often, the tips are the result of best practice
developed from considered experience and so document how
Babase is used at the time of this writing. However, as best
practice continues to develop and field protocols change, the
Protocol for Data
Management: Amboseli Baboon Project and the Amboseli
Baboon Research Project Monitoring Guide should always be
consulted. Those documents have precedence over the tips
presented herein should there be conflicting advice.
Supplemental and cross-referential material is presented in footnotes.
Anyone who is changing or adding programs to the system should read this entire document. Chapter 3: “Baboon Data: Primary Source Material” is particularly important for all those using the system. Chapter 2: “Babase System Architecture” provides the introduction to Babase. It explains fundamental concepts without which Babase cannot be understood, although some portions can be skipped; the sections “The Babase Program Code” and “Indexes” are primarily of interest to programmers and the section “Special Values” is for the data maintainers. Everyone will want to pay special attention to the “Entity-Relationship Diagrams” section. These diagrams can also be found in PDF form in The Babase Pocket Reference, where they may be easier on the eye. The section “Data Maintenance Programs” of chapter 8: “Babase Programs” is of little interest to those who only want to retrieve information from the system. Portions of the “Useful Programs and Functions” section of the same chapter is of interest to the more sophisticated user. Note that some functions may be hidden in “Next” links, depending on the format chosen when reading this document. Data maintainers should be sure to understand chapter 5: “Support Tables”. Those who are only retrieving data from Babase need not read chapter 7: “Data Entry”.
The Babase system is designed to facilitate the retrieval, storage, and maintenance of the Amboseli Baboon Project data. Data integrity is foremost. Analytical power, ease of use, and low cost are secondary goals. The system consists of tables to store and organize the data, software supporting data validation and derivative data generation, stand-alone programs used to facilitate the entry and maintenance of the data, a minimal tool set supporting the maintenance of the Babase system software itself, and documentation. data are retrieved from Babase using the SQL language, the standard[2] language used to query relational databases. SQL is declarational as opposed to procedural; from a single SQL query (a single statement) the database determines how to best retrieve the data requested, no matter the number of tables or criteria required. SQL provides a single, powerful, interface for ad-hoc data retrieval and manipulation. Generic software provides the bulk of the user interface[3], traditionally the most complex and costly software component.[4] Consequently there are few stand-alone programs written specifically for Babase. The overall philosophy of the systems implementation is to keep the software as easy to maintain as possible while assuring data integrity. To this end, the system is comprised of as many generic components as possible and the design requires custom programming for only the most crucial features.
Babase puts as much intelligence as possible into the database itself, including automatic data validation and complex automatic analysis and storage of the derived data.[5] Babase extends its sometimes complex and rather abstract database structures with alternative, more familiar and user-friendly, means of accessing the underlying data[6]. These constructs are, in so far as is possible, made indistinguishable from the underlying data when querying and updating the database. Babase often generates derivative data for more ready analysis. This is, for the most part, transparent to the user. The end-user is insulated from implementation details, the number of interfaces (primarily SQL) the user must learn is minimized, and the user is free to work with the data structures that embody the conceptual model best suited to the task at hand.[7]
Data input is an example of how Babase incorporates generic programs. The prototypical way to import data into Babase is in bulk, via a plain text file having columns delimited by the tab character. These are easily produced by almost any spreadsheet program; it is expected that most data imported into Babase will be typed into a spreadsheet and then exported to tab-delimited text for upload.[8] The use of generic interfaces reduces cost, and minimizing the number of novel interfaces frees the end-user to concentrate on the task at hand.
Babase is designed to be accessed over the Internet, primarily via the web. Although there are exceptions[9] the majority of Babase is accessed via a W3C compliant web browser. Individually assigned usernames and passwords are used, along with encryption, to secure the database content. The Babase Wiki provides content for an the structure of the project's web site. Another example of Babase leveraging a generic program, the wiki allows project members collaborate, share information, and build the project's web site without programmer intervention.
Babase is built upon standards[10] and popular, widely deployed, Open Source and Free Software. This means, among other things, that the tools used to build and run Babase are very likely available to anyone free of cost, and that the skill-sets required for the system maintenance of and, to some extent, use of Babase are readily learned[11] and unlikely to become obsolete[12].[13] The Babase source code itself is Free Software[14] and may be downloaded by the public.[15]
The database design attempts 5th normal form, no redundant data, no empty data elements allowed, etc. What we've actually wound up with is about 3rd normal form.
The Babase system is accessed over the web. Any web browser may be used to view the data using the phpPgAdmin generic database interface. More advanced usage of the website will likely require a web browser that conforms to the international standards for the web defined by the World Wide Web Consortium , otherwise known as the W3C ,as we have put forth no particular effort to accommodate non-standards conforming browsers. The browser must support CSS2 style sheets and XHTML 1.0. Note that at the time of this writing Microsoft Internet Explorer does not provide adequate style sheet support. Other browsers that do have such support include Mozilla , Mozilla Firefox ,Apple's Safari ,and Opera. The W3.org site maintains a list of browsers supporting style sheets.
Babase's URL (web address)
is https://papio.biology.duke.edu/ . Be sure to
type the s
in https
. This secures
your web connection.
You must access most of the Babase web site using a secure communications protocol ( HTTPS ) that encrypts all communication to foil eavesdroppers and checks the identity of the web site itself. The Babase project has signed its own security certificate, the certificate that ensures you are talking with the website you think you are.[16] Our certificate expires annually and is re-generated.
Your browser probably will not trust that our website is who it says it is and so will very likely object when you first access the Babase web site, and annually thereafter. You may tell your browser to accept our certificate permanently.
Resources related to Babase include:
The Babase Pocket Reference is available as a PDF (4.8MB) and as a web page.
Babase Wiki, wherein lives much goodness including tutorial's, guides, and documentation on various data and material related to but not yet part of Babase (like the tables in the babase_pending schema)
The Babase Web site
The Amboseli Baboon Research Project's web site
The Protocol for Data Management: Amboseli Baboon Project
The Babase mailing list and its associated archive
The PostgreSQL web site, documentation, PostgreSQL's mailing list for beginner's questions, PostgreSQL's mailing list for SQL questions, and the general mailing list
The phpPgAdmin web site, documentation, and help forum
Some vagaries of spreadsheets, Excel in particular, that can affect their use as an source of Babase data are found in the page Spreadsheet Addiction, although the main focus of the page is the appropriate and inappropriate uses of spreadsheets in comparison with alternatives.
Babase users are encouraged to ask questions, both on the Babase mailing list and on the mailing lists setup for questions on the software that Babase is made of.
[1] Immediate reporting of some unusual conditions could be added to Babase at a later date.
[2] More or less. The last actual SQL standard was issued a very long time ago. None the less SQL is pervasive and, although specific SQL statements may not always be, the skill set involved in SQL use is quite portable.
[3] There are many PostgreSQL user interfaces available, although at the time of this writing only 2, phpPgAdmin and psql, are installed on the Babase database server. Many of these front-ends must be installed on the local workstation. These may require that the Babase VPN be running before initiating a connection to the database. Some of available front-ends may be found via the PostgreSQL FAQ question regarding graphical user interfaces for PostgreSQL.
[4] It's those pesky unpredictable users. Computer software would be a lot easier to write if it weren't for users always messing things up and then insisting on knowing what happened.
[5] A process which, admittedly, sometimes conflicts with the notion of easily maintaining the software. On the other hand when done right this approach does wonders for data integrity.
[7] These features also free the user from “software interface lock-in”. The database may be accessed and maintained with the software of choice. Data integrity, in both raw and derived data, is assured. Significantly, these features are those that allow Babase to leverage generic programs, using them for the bulk of its user interface as opposed to building a custom, Babase specific, interface.
[8] Of course, because Babase has no designated front-end and so much data validation takes place inside the database itself, any program able to talk with PostgreSQL, the database engine Babase uses, can be used to import data into the database. So there are no real limits on how data must be structured for import into Babase.
[9] There are 2 Unix shell programs that provide peripheral utility; both do tasks that can be done with other tools but are handy to have automated. The use of these programs are documented on the Babase Wiki. Comprehensive documentation of these programs should probably be added to this document.
The Unix Shell Programs
babase-copy-babase-schema
Copies the entire content of the babase schema from one database to another.
babase-user-add
Adds a postgresql user, granting the permission to use Babase
There is also the ranker program, which runs on the local workstation and uses the Internet to communicate with the database. Developed separately from the rest of Babase, neither the source code management of nor the documentation for the ranker program is particularly well integrated into Babase.
[10] Actual standards, not de facto ones.
[11] Because open standards and the documentation for Open Source and Free Software programs are available, without cost; and because the inherently transparent and public nature of open standards, Open Source and Free Software leads not only to a wealth of good instructional material freely available on the Internet but also rounds out the basic requirements of a complete learning environment by ensuring that the software itself is available to everyone.
[12] Because once software is released and distributed under a Free or Open Source license it cannot be locked away and made unavailable, and because open standards are rarely changed in a backwards-incompatible way.
[13] Consequently the skills are rather widely available. The difficult part, as always, is finding the all of the relevant skills at once. For more on this see The Babase Program Code section.
[14] Presently licensed under the GPL Version 3 or later.
[15] Babase database content is not available to the public.
[16] We do this rather than paying one of the regular certification authorities to validate our identity. These certification authorities appear to validate the identity of their customers by virtue of little more than having successfully been paid.
Table of Contents
Databases are collections of information, all of which can be queried and otherwise manipulated alone or in aggregation with all other database content.[17] Babase contains three databases.
The babase database contains the “real” information. All research takes place in this database.
The babase_copy database contains a copy of the babase database. It is a place to try out dangerous things that might break the babase database.
Each user is given a login and a password they must use to gain access to the database. It is good form to change your password occasionally.[18]
The database can grant specific users various levels of access to specific tables, although such access is not common as it is difficult to administer and maintain such a fine grained degree of control. For further information see the PostgreSQL documentation on Database Users and Privileges.
Rather than maintain database access privileges on a per-user basis it is more convenient to place users in groups and then grant these groups different levels of database access.
Babase contains the following groups:
The members of this group have read access to Babase data and cannot add, delete, or otherwise alter any of the data.
The members of this group have unlimited rights to the Babase data. They may add data, delete data, or alter existing data. They may not, however, alter the structure of the babase database or change the rules to which the data are required to conform. Thus, they may not add or delete tables, alter triggers, or write or replace stored procedures.
Schemas partition databases. Tables, procedures, triggers, and so forth are all kept in schemas. Schemas are like sub-databases within a database. The salient difference between schemas and databases is that a single SQL statement can refer to objects in the different schemas of the parent database, but cannot refer to objects in other databases -- tables within a database can be related, but tables in different databases cannot. Babase uses schemas to partition each database into areas where users have a greater or lesser degree of freedom to make changes. For further information on schemas see the schema documentation for PostgreSQL.
Each database is divided into the same schemas. That is, each schema described below exists within each of the databases described here.
The system looks at the different schemas for objects, for example table names appearing in SQL queries, in the order in which the schemas are listed below. If the table does not appear in the first schema it looks in the second, and so forth. As soon as a table is found with the name given, that table is used and the search stops.
To explicitly reference an object in a specific schema, place the name of the schema in front of the object, separating the two with a period (e.g. schemaname.tablename).
The babase schema holds the “official” Babase tables. Everything in the babase schema is documented and supported.
In this schema the babase_readers and babase_editors have the access described above.
Babase contains a number of schemas that exist to simplify things for those interested only in particular portions of Babase. These schemas contain nothing but views that reference other parts of Babase, the parts that are especially relevant and useful to those interested only in one of the broad categories of Babase data. These schemas and their corresponding categories are:
Schema | Category |
---|---|
babase_cycles_views | Sexual Cycles |
babase_darting_views | Darting |
babase_demog_views | Group Membership and Life Events |
babase_physical_traits_views | Physical Traits |
babase_social_views | Social and Multiparty Interactions |
babase_support_views | Support Tables |
babase_swerb_views | SWERB Data (Group-level Geolocation Data) |
babase_weather_views | Weather Data |
babase_group_views | Views Which Add Gid To Tables |
These schemas provide an overview of the major areas of Babase. They should be especially useful to those starting out with Babase or those interested only in particular portions of Babase data.
The views in these schemas may only be queried. Any
updating of Babase data must be done in the
babase
schema.
Some of Babase's tables and views appear in more than one of these schemas, some in none.
Do not create any views that reference the views in
these schemas. Reference the babase
schema
instead. Any views created that reference anything in these
“category schemas” will be destroyed on
occasion as Babase is modified.
The babase_history schema contains a table for each temporal table in the babase schema. The tables in this schema store the "old" versions of data from those temporal tables, allowing the ability to query for earlier versions of the data. See the Temporal Tables and babase_history appendix for more details.
The name of each table in this schema should be a concatenation of 1) the name of the related babase schema table, and 2) "_HISTORY". For example, a table in the babase schema called SOMETABLE would have a table in the babase_history schema called SOMETABLE_HISTORY.
Members of the babase_readers and babase_editors
groups both have the same permissions in the babase_history
schema: they have read access to the data but cannot perform
INSERT
, UPDATE
, or
DELETE
commands in any tables[19], nor can they add new tables to the schema.
Only administrators are allowed to perform these
actions.
The babase_pending schema holds tables pending planned integration into Babase. The tables in this schema are intended to be used with the “official” Babase tables but, unlike the “official” Babase tables, there is no automated validation process and the table structure has not been thoroughly reviewed. The tables in babase_pending are to be used but their content and structure may change when officially incorporated into Babase.
Documentation on the content of the babase_pending schema may be found on the babase_pending page of the Babase Wiki.
The difference between this schema and the sandbox schema is in the permissions granted.
Members of the babase_readers group have the same permissions they do in the babase schema, they have read access to the data but cannot add, delete or modify it. However, unlike in the babase schema, individual users may be granted the right to add, delete, or change data on a table-by-table basis.
The sandbox schema holds tables that are used together with the “official” Babase tables but have not yet made it into the Babase project. They will not be documented in the Babase documentation.
The groups have the following permissions:
The babase_readers have all the permissions in the sandbox schema that the babase_editors have in the babase schema. They may add, delete, or modify any information in the schema but may not alter the structure of the schema by adding or removing tables, procedures, triggers, or anything else.
The babase_editors have all the permissions of the babase_readers, plus they may add or delete tables, stored procedures, or any other sort of object necessary to control the structure of the data.
Because of the schema search order the schema name must be used to qualify anything created in the sandbox schema. E.g.
Example 2.1. Creating table foo
in the sandbox
schema
CREATE TABLE sandbox.foo (somecolumn INTEGER);
PostgreSQL, the database underlying Babase, is secure by default. This means that any tables or other database objects cannot be accessed by anyone but their creator without permission of the creator. Babase_editors who create tables in the sandbox schema should use the GRANT statement to grant access to Babase's other users.
This is done as follows:
Example 2.2. Granting permission to table
foo
in the sandbox schema
GRANT ALL ON sandbox.foo TO GROUP babase_editors;
GRANT SELECT ON sandbox.foo TO GROUP babase_readers;
There is one other issue. Only the creator of a table can change its structure -- to add another column, change the table name, etc. And only the creator can destroy (DROP) the table.
The devel schema holds tables undergoing integration into Babase. Normally it is empty, but during the design and development of new tables it may contain the tables being developed.
The tables in this schema do not necessarily contain valid or finalized data and so are not expected to be used for other than developmental purposes.
Permissions are granted in the devel schema on the same basis as the granting of permissions in the babase schema.
The difference between this schema and the sandbox schema is that the development tools support the creation and modification of the tables in the devel schema, which facilitates the movement of tables from the devel schema into the babase schema.
Each user has her own schema, a schema named with the user's login. Users have permissions to do anything they want in their own schemas, and no permissions whatsoever to anybody else's schema. A user's schema is private.
Users are not encouraged to grant others permissions to the tables in their schema, as shown in the Section: “The sandbox schema” above. A user's schema is deleted when she leaves Babase. All shared tables belong in the sandbox schema where they can be maintained without regard to personnel changes.
Because of the schema search order the schema name must be used to qualify anything created in the user's schema. E.g.
Example 2.3. Creating table foo
in user
mylogin
's schema
CREATE TABLE mylogin.foo (somecolumn INTEGER);
The data in Babase are stored in tables. Tables can be visualized as grids, with rows and columns. Each row represents a single real-world thing or event, an entity, e.g. a baboon. Each cell in the row contains a single unit of information, e.g. a birth date, a name, and a sex. The row holds the entirety of the information belonging to the entity as an isolated thing, e.g. baboon database entities consist of a birth date, a name, and a sex. Each column contains one and only one kind of information, e.g. birth date.
Table 2.1 is an example of a database table that might be used to represent baboons, one baboon per row. Notice that each cell contains one and exactly one unit of information.
Table 2.1. A Simple Database Table
Birth | Name | Sex |
---|---|---|
May 23, 1707 | Alice | Female |
February 12, 1809 | Bob | Male |
July 22, 1822 | Carol | Female |
Anyone working with Babase will require a familiarity with the database's tables. An understanding of the entity each row represents is critical when working with a table. The tables below provide short definitions of the entities each The babase schema table holds in its rows.
Some of the tables in Babase exist to define a vocabulary. These are the support tables. For lack of a better term, the remainder of the tables are labeled “main tables” in Table 2.2.
Tables which have names ending in “_DATA” should not be used, there is always a view of the data in these tables that may be used in their place. Tables ending in “_DATA” may change in future Babase minor releases, breaking queries and programs which use the table. Use of the corresponding views will ensure compatibility with future Babase releases.
Table 2.2. The Main Babase Tables
Group Membership and Life Events | |
Table | One row for each |
ALTERNATE_SNAMES | rescinded sname |
BIOGRAPH | animal, including fetuses |
CENSUS | day each individual is (or is not) observed in a group |
CONSORTDATES | male who has a known first consortship |
DEMOG | mention of an individual's presence in a group within a field textual note |
DISPERSEDATES | male who has left his maternal study group |
GROUPS | group (including solitary males) |
MATUREDATES | individual who is sexually mature |
RANKDATES | individual[a] who has attained adult rank |
Analyzed: Group Membership and Life Events | |
Table | One row for each |
DAD_DATA | offspring having a paternity analysis |
MEMBERS | day each individual is alive |
RANKS | month each individual is ranked in each group |
RESIDENCIES | bout of each individual's residency |
Physical Traits | |
Table | One row for each |
WP_AFFECTEDPARTS | body part affected by a specific wound/pathology |
WP_DETAILS | wound or pathology cluster indicated on a report |
WP_HEALUPDATES | update on progress of wound/pathology healing |
WP_REPORTS | wound/pathology report |
Analyzed: Physical Traits | |
Table | One row for each |
HORMONE_KITS | kit or protocol used to assay hormone concentration |
HORMONE_PREP_DATA | laboratory preparation performed on a sample in the specified series |
HORMONE_PREP_SERIES | series of preparations and assays performed on a sample |
HORMONE_RESULT_DATA | assay for hormone concentration in a sample |
HORMONE_SAMPLE_DATA | tissue sample used in hormone analysis |
HYBRIDGENE_ANALYSES | analysis of genetic hybrid scores |
HYBRIDGENE_SCORES | genetic hybrid score for an individual from an analysis |
Sexual Cycles | |
Table | One row for each |
CYCGAPS | female for each initiation or cessation of a continuous period of observation |
CYCLES | female's cycle (complete or not) |
CYCPOINTS | Mdate (menses), Tdate (turgesence onset), or Ddate (deturgesence onset) date of each female |
PREGS | time a female becomes pregnant |
SEXSKINS | sexskin measurement of each female |
The Sexual Cycle Day-By-Day Tables | |
Table | One row for each |
CYCGAPDAYS | female for each day within a period during which there is not continuous observation |
CYCSTATS | day each female is cycling -- by M, T and Ddates |
MDINTERVALS | day each female is cycling and is between M and Ddates |
MMINTERVALS | day each female is cycling -- by Mdates |
REPSTATS | day each female has a known reproductive state |
Social and Multiparty Interactions | |
Table | One row for each |
ALLMISCS | “free form” all-occurrences datum |
CONSORTS | multiparty dispute over a consortship |
FPOINTS | point observation of a mature female |
INTERACT_DATA | interaction between individuals |
MPIS | collection of multiparty interactions |
MPI_DATA | single dyadic interaction of a multiparty interaction collection |
MPI_PARTS | participant in a dyadic interaction of a multiparty interaction collection |
PARTS | participant in each interaction |
POINT_DATA | individual point observation |
NEIGHBORS | neighbor recorded in each point sample |
SAMPLES | focal sample |
Darting | |
Table | One row for each |
ANESTHS | time additional sedation is administered to a darted individual |
BODYTEMPS | body temperature measurement taken of a darted individual |
CHESTS | chest circumference measurement made of a darted individual |
CROWNRUMPS | crown to rump measurement made of a darted individual |
DART_SAMPLES | sample type collected at each darting |
DARTINGS | darting of an animal when data was collected |
DPHYS | darting event during which physiological measurements were taken |
HUMERUSES | humerous length measurement made of a darted individual |
PCVS | packed cell volume measurement taken from a darted individual |
TEETH | possible tooth site within the mouth on which data was collected for every darting event during which dentition data was collected |
TESTES_ARC | every testicle width/length measurement recorded, as measured along a portion of the circumference |
TESTES_DIAM | every testicle width/length measurement recorded, as measured along the diameter |
TICKS | darting event during which data on ticks and other parasites were recorded |
ULNAS | ulna length measurement made of a darted individual |
VAGINAL_PHS | vaginal pH measurement made of a darted individual |
Analyzed: Darting | |
Table | One row for each |
FLOW_CYTOMETRY | flow cytometric analysis of a blood sample collected during a darting |
WBC_COUNTS | count from a blood smear collected during a darting |
Inventory | |
Table | One row for each |
LOCATIONS | Location that can be used to store tissue and nucleic acid samples |
NUCACID_CONC_DATA | Quantification of a nucleic acid sample's concentration |
NUCACID_DATA | Nucleic acid sample that is or ever has been in the inventory |
NUCACID_LOCAL_IDS | Name/ID used to identify a nucleic acid sample at a particular institution |
NUCACID_SOURCES | Nucleic acid sample that has another nucleic acid sample as its source |
POPULATIONS | Study population under observation or from which tissue or nucleic acid samples have been collected |
TISSUE_DATA | Tissue sample that is or ever has been in the inventory |
TISSUE_LOCAL_IDS | Name/ID used to identify a tissue sample at a particular institution |
TISSUE_SOURCES | Tissue sample that has another tissue sample as its source |
UNIQUE_INDIVS | Individual under observation or from whom tissue or nucleic acid samples have been collected |
SWERB Data (Group-level Geolocation Data) | |
Table | One row for each |
AERIALS | aerial photo used for map quadrant specification |
GPS_UNITS | GPS device |
QUAD_DATA | SWERB map quadrant |
SWERB_BES | uninterrupted bout of group-level observation |
SWERB_DATA | event related to group-level geolocation |
SWERB_DEPARTS_DATA | departure from camp of a observation team which collected SWERB data |
SWERB_GWS | geolocated physical object (grove or waterhole) |
SWERB_GW_LOC_DATA | recorded location of a geolocated physical object (grove or waterhole) |
SWERB_LOC_DATA | observation of a group at a time at a geolocated physical object |
SWERB_LOC_DATA_CONFIDENCES | analyzed observation of a location |
SWERB_LOC_GPS | observation of a group at a time at a geolocated physical object made using gps units and a protocol that requires 2 waypoint readings |
SWERB_OBSERVERS | departure from camp of an observer who drove or collected SWERB data |
Weather Data | |
Table | One row for each |
RAINGAUGES | rain gauge reading |
RGSETUPS | rain gauge installation |
TEMPMAXS | maximum temperature reading |
TEMPMINS | minimum temperature reading |
DIGITAL_WEATHER | digital weather reading reported from an electronic weather collection device |
WREADINGS | manually collected meteorological data collection event |
The significant aspects of the the support tables are: the Id column -- the name of the column holding the vocabulary term, which columns of which tables use the vocabulary, and what sort of vocabulary the table defines. Table 2.3 summarizes this information.
The Id columns throughout Babase do not allow values
that are NULL
, or which are textual but contain no
characters, or which consist solely of spaces.
Table 2.3. The Babase Support Tables
General Support Tables | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
BODYPARTS | Bodypart | TICKS.Bodypart, BODYPARTS.Bodyregion, WP_AFFECTEDPARTS.Bodypart | part of the body |
LAB_PERSONNEL | Initials | HYBRIDGENE_ANALYSES.Analyzed_By, NUCACID_CREATORS.Creator, WBC_COUNTS.Counted_By | person who generates data, usually in a lab setting |
OBSERVERS | Initials | SAMPLES.Observer, WREADINGS.WRperson, RGSETUPS.RGSPerson, CROWNRUMPS.CRobserver, CHESTS.Chobserver, ULNAS.Ulobserver, HUMERUSES.Huobserver, SWERB_OBSERVERS.Observer | person who record observational data |
OBSERVER_ROLES | Initials | OBSERVERS.Role, OBSERVERS.SWERB_Observer_Role, OBSERVERS.SWERB_Driver_Role, SWERB_OBSERVERS.Role | way in which a person can be involved in the data collection process |
UNKSNAMES | Unksname | NEIGHBORS.Unksname and the SWERB_UPLOAD view | problem in identifying neighbor of focal during point sampling or in identifying a lone male in a SWERB other group observation |
Group Membership and Life Events | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
BSTATUSES | Bstatus | BIOGRAPH.Bstatus | birthday estimation accuracy |
CONFIDENCES | Confidence | BIOGRAPH.DcauseNatureConfidence, BIOGRAPH.DcauseAgentConfidence, DISPERSEDATES.Dispconfidence, BIOGRAPH.Matgrpconfidence | degree of certitude in nature of death, agent of death, disperse date assignment, or maternal group assignment |
DAD_SOFTWARE | Software | DAD_DATA.Software | software package used to perform genetic paternity analysis |
DCAUSES | Dcause | BIOGRAPH.Dcause | cause of death |
DEATHNATURES | Nature | DCAUSES.Nature | reason for death |
DEMOG_REFERENCES | Reference | DEMOG.Reference | data source for demography notes |
MSTATUSES | Mstatus | MATUREDATES.Matured, RANKDATES.Ranked | maturity marker date estimation process |
DAD_DATA_COMPLETENESS | Completeness | DAD_DATA.Completeness | category of analysis completeness |
DAD_DATA_MISMATCHES | Mismatch | DAD_DATA.Consensus_Mismatch | category of genetic mismatch |
RNKTYPES | Rnktype | RANKS.Rnktype | rank ordering assigned to subject and month |
STATUSES | Status | BIOGRAPH.Status | baboon alive at last observation |
Physical Traits | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
HORMONE_IDS | Hormone | HORMONE_KITS.Hormone | hormone that may be extracted and assayed for |
HORMONE_PREP_PROCEDURES | Id | HORMONE_PREP_DATA.Procedure | procedure that may be performed in preparation for a hormone assay |
HYBRIDGENE_SOFTWARE | Software | HYBRIDGENE_ANALYSES.Software | software used for genetic hybrid score analysis |
MARKERS | Marker | HYBRIDGENE_ANALYSES.Marker | type of genetic marker used for genetic hybrid score analysis |
WP_HEALSTATUSES | Healstatus | WP_HEALUPDATES.HealStatus | healing progress used in healing updates |
WP_REPORTSTATES | ReportState | WP_REPORTS.ReportState | status of wound/pathology report |
WP_WOUNDPATHCODES | WoundPathCode | WP_DETAILS.WoundPathCode | wound or pathology |
Social and Multiparty Interactions | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
ACTIVITIES | Activity | POINT_DATA.Activity | activity classification |
ACTS | Act | INTERACT_DATA.Act | interaction classification |
DATA_STRUCTURES | Data_Structure | SETUPIDS.Data_Structure | version of data structure produced by the data collection devices |
CONTEXT_TYPES | Context_type | MPIS.Context_type | context in which a multiparty interaction occurs |
FOODCODES | Foodcode | POINT_DATA.Foodcode | name of a food item |
FOODTYPES | Ftype | FOODCODES.Ftype | food category |
KIDCONTACTS | Kidcontact | FPOINTS.Kidcontact | spatial relationship between mother and infant |
MPIACTS | Mpiact | MPI_DATA.MPIAct | multiparty interaction classification |
NCODES | Ncode | NEIGHBORS.Ncode | neighbor classification |
PARTUNKS | Unksname | MPI_PARTS.Unksname | problem in identifying participant in a multiparty interaction |
POSTURES | Posture | POINT_DATA.Posture | designated posture |
PROGRAMIDS | Programid | SAMPLES.Programid | version of each program used on the devices to collect focal sampling data |
SAMPLES_COLLECTION_SYSTEMS | Collection_System | SAMPLES.Collection_System | device or "system" used in the field for collecting focal sampling data |
SETUPIDS | Setupid | SAMPLES.Setupid | setupfile used on the devices to collect focal sampling data |
STYPES | SType | SAMPLES.SType | protocol for focal sampling data collection |
STYPES_ACTIVITIES | SType-Activity pair | SAMPLES.SType, ACTIVITIES.Activity | activity classification allowed to be used in each focal sampling protocol |
STYPES_NCODES | SType-Ncode pair | SAMPLES.SType, NCODES.Ncode | neighbor classification allowed to be used in each focal sampling protocol |
STYPES_POSTURES | SType-Posture pair | SAMPLES.SType, POSTURES.Posture | posture classification allowed to be used in each focal sampling protocol |
SUCKLES | Suckle | FPOINTS.Kidsuckle | infant suckling activity |
Sexual Cycles and The Sexual Cycle Day-By-Day Tables | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
PCSCOLORS | Color | SEXSKINS.Color | paracallosal skin coloration |
Darting | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
DART_SAMPLE_CATS | Ds_cat | DART_SAMPLE_CATS.DS_Cat | category of darting sample type |
DART_SAMPLE_TYPES | DS_Type | DART_SAMPLE_TYPES.DS_Type | type of sample collected during dartings |
DRUGS | Drug | DRUGS.Drug | anesthetic drug |
LYMPHSTATES | Lymphstate | DPHYS.Ringnode, DPHYS.Lingnode, DPHYS.Raxnode, DPHYS.Laxnode, DPHYS.Lsubmandnode, DPHYS.Rsubmandnode | lymph node condition |
PARASITES | PARASITE | TICKS.Tickkind | parasite species, species developmental stage, or kind of parasite sign counted |
TCONDITIONS | Tcondition | TEETH.Tcondition | physical condition of a tooth |
TICKSTATUSES | Tickstatus | TICKS.Tickstatus | parasite count outcome category |
TOOTHCODES | Tooth | TEETH.Tooth | adult or deciduous tooth |
TOOTHSITES | Toothsite | TOOTHCODES.Toothsite | dental site within the mouth |
TSTATES | Tstate | TEETH.Tstate | tooth “presence” |
Inventory | |||
Table | Id Column | Related Column(s) | One entry for every possible ... |
INSTITUTIONS | Institution | LOCATIONS.Institution, NUCACID_LOCAL_IDS.Institution, TISSUE_LOCAL_IDS.Institution | possible locale where tissue and nucleic acid samples can be stored or used |
MISID_STATUSES | Misid_Status | TISSUE_DATA.Misid_Status | level of confidence in the identity of a tissue sample |
NUCACID_CONC_METHODS | Conc_Method | NUCACID_CONC_DATA.Conc_Method | method used for quantifying nucleic acid concentrations |
NUCACID_CREATION_METHODS | Creation_Method | NUCACID_DATA.Creation_Method | method used for creating nucleic acid samples |
NUCACID_TYPES | NucAcid_Type | NUCACID_DATA.NucAcid_Type | type of nucleic acid sample |
STORAGE_MEDIA | Storage_Medium | TISSUE_DATA.Storage_Medium | medium used for storage/archiving of tissue samples |
TISSUE_TYPES | Tissue_Type | TISSUE_DATA.Tissue_Type | type of tissue sample |
SWERB Data (Group-level Geolocation Data) | |||
Table | Id Column | Related Column(s) | One entry for every possible ... |
ADCODES | ADCode | SWERB_LOC_DATA.ADcode | relationship between baboon groups and sleeping groves. |
SWERB_LOC_CONFIDENCES | Conf | SWERB_LOC_DATA_CONFIDENCES.Confidence | confidence score used when analyzing the accuracy of a recorded observation of a location. |
SWERB_LOC_STATUSES | Conf | SWERB_LOC_DATA.Loc_Status | status for a recorded observation of a location. |
SWERB_TIME_SOURCES | Source | SWERB_BES.Bsource, SWERB_BES.Esource | data source used to estimate beginning and ending of observation bouts |
SWERB_XYSOURCES (SWERB Time Sources) | Source | SWERB_GW_LOC_DATA.XYSource | data source used to obtain XY coordinates |
Weather Data | |||
Table | Id Column | Related Column(s) | One entry for every possible choice of... |
WEATHER_SOFTWARES | WSoftware | DIGITAL_WEATHER.WSoftware | software used to retrieve data from an electronic weather collection instrument |
WSTATIONS | Wstation | WREADINGS.Wstation | meteorological data collection location or device |
Beginning with Babase 5.0, nearly every table in Babase has a column called "Sys_Period", which shows the range of time when the data in a row is considered "valid". When a row in a table in the babase schema is updated or deleted, the "old" version is no longer "valid" and is saved in a corresponding table in the babase_history schema.
All data in the babase schema are valid, simply by virtue of their being in that schema. Users should not let this discussion of validity mislead them into undue suspicion of the accuracy of the data.
Updates to this column should only be performed automatically by the system, when data are inserted, updated, or deleted. Manual updates to this column are only allowed when done by an admin[20].
In the babase
schema, the lower bound of the Sys_Period column indicates
when the row was last updated, when the row was inserted to
the table, or when the Sys_Period column was added to the
table, whichever is most recent. The upper bound of the
Sys_Period column for tables in that schema will always be
NULL
, meaning "no end" (yet).
In the babase_history schema,
each row represents an old "version" of the row. In these
tables, the lower bound of the Sys_Period column is the
timestamp of the INSERT
or
UPDATE
that created that version of the
row, or the date and time that the Sys_Period column was added
to the original table, whichever is most recent. The upper
bound is the timestamp of the INSERT
,
UPDATE
, or DELETE
that
rendered the row no longer "valid".
In all tables, this column is a timestamp range (with
time zone), with inclusive lower bound
and exclusive upper bound. The lower
bound cannot be NULL
, and defaults to the
current_timestamp
when the row is
inserted/updated.
Most tables have have an id, or key, column that contains
a number unique to that row within its table. The id can be
used, in perpetuity, to refer to its related row and
distinguish it from all the other rows of the table. Ids are
arbitrary, although for convenience they are often sequentially
generated integers. The name of the column is not always
Id
, although it sometimes is.
A relationship is established between the rows of two tables when an id value from one table appears as data in the other. The relationship notion is made most clear by way of diagrams and examples. If the next paragraph is unclear, don't worry. Have a look at the Babase diagrams below by way of example and see if that does not clear things up. The relationship concept is at the heart of relational databases and, while the underlying idea is rather simple, it took many years to develop relational database concepts[21] so don't expect a full understanding immediately.
When an id value of a row in one table appears as data in a second table, the data in the second table can be used to retrieve the identified row from the first table.[22] When an id value of a row in the first table appears as data only once in the second table, the two tables are said to have a one-to-one relationship. One row in the first table relates to one (or possibly zero) row(s) in the second table. When a row's id value can appear in more than one row of a second table, the two tables are said to have a one-to-many relationship. One row of the first table can be related to many rows in the second table. One-to-many relationships are more common than one-to-one relationships. The relationship between the various Babase tables can be visualized in entity relationship diagrams, as shown here. In this diagram each table (entity) is a box, and each box contains a list of the table's columns. The lines between the boxes represent the relationships between the tables.
If you have trouble viewing the diagrams in your browser, you may wish to view them in PDF format. The diagrams are available in The Babase Pocket Reference (approx. 4.8MB) in PDF form.
Views provide an alternative to direct reference of Babase tables. Views appear to be tables, but are really pre-composed queries into the underlying Babase tables. Views can be used almost anywhere in Babase in place of a table, specifically, they can be queried just like tables. An SQL query can freely intermix the use of tables and views.
Babase uses views to hide implementation details, details that may change as Babase develops. Tables that have names ending in “_DATA” should not be used, there is always a view of the data in these tables that may be used in their place. Tables ending in “_DATA” may change in future Babase minor releases, breaking queries and programs that use the table. Use of the corresponding views will ensure compatibility with future Babase releases.
Views make it easy to reuse complex or commonly used queries, or portions of queries. They allow a database designed around the capabilities of the computer to be interacted with in a fashion that makes sense to people. Although the views do not appear in the entity relationship diagrams that document the underlying database, and so are omitted from the high level overview these diagrams provide, most Babase users will greatly benefit if they take the time to understand how the views fit into the overall database and will usually find it easier to work with the views than with the underlying tables.
Table 2.4. The Babase Views
Group Membership and Life Events | |||
View | One row for each | Purpose | Tables/Views used |
CENSUS_DEMOG | CENSUS row | Maintenance of CENSUS rows that are extended with DEMOG information. | CENSUS, DEMOG |
CENSUS_DEMOG_SORTED | CENSUS row | Maintenance of CENSUS_DEMOG rows in a pre-sorted fashion. | CENSUS, DEMOG |
CYCPOINTS_CYCLES | CYCPOINTS row | Maintenance of CYCPOINTS rows that are extended with CYCLES information. | CYCLES, CYCPOINTS |
CYCPOINTS_CYCLES_SORTED | CYCPOINTS row | The CYCPOINTS_CYCLES view sorted by CYCLES.Sname, by CYCPOINTS.Date. | CYCLES, CYCPOINTS |
DEMOG_CENSUS | DEMOG row | Maintenance of DEMOG rows. | CENSUS, DEMOG |
DEMOG_CENSUS_SORTED | CENSUS row | Maintenance of DEMOG_CENSUS rows in a pre-sorted fashion. | CENSUS, DEMOG |
GROUPS_HISTORY | GROUPS row | Depiction of GROUPS rows in a more human-readable format. | GROUPS |
PARENTS | BIOGRAPH
row for which there is either a row in MATERNITIES with a
record of the individual's mother or there is a row in DAD_DATA with a record
of the individual's father -- with a non-NULL Dad_consensus. | Easy access to parental information. | BIOGRAPH, MATERNITIES, DAD_DATA, MEMBERS |
POTENTIAL_DADS | (completed) female reproductive event for every male more than 2192 days old (approximately 6 years) present in the mother's group during her fertile period | Research into paternity, especially the selection of potential fathers for further genetic testing. | MATERNITIES, MEMBERS (multiple times), ACTOR_ACTEES (multiple times), BIOGRAPH, RANKDATES, MATUREDATES |
PROPORTIONAL_RANKS | RANKS row | Automatic calculation of proportional ranks from the ordinal ranks in RANKS. | RANKS |
Physical Traits | |||
View | One row for each | Purpose | Tables/Views used |
ESTROGENS | HORMONE_RESULT_DATA row with an estrogen kit | Easy access to estrogen data. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
GLUCOCORTICOIDS | HORMONE_RESULT_DATA row with a glucocorticoid kit. | Easy access to glucocorticoid data. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
HORMONE_PREPS | HORMONE_PREP_DATA row | Presents HORMONE_PREP_DATA with identifying information from TISSUE_DATA and BIOGRAPH. Also useful for maintaining data in HORMONE_PREP_DATA. | BIOGRAPH, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
HORMONE_RESULTS | HORMONE_RESULT_DATA row | Presents HORMONE_RESULT_DATA with identifying information from TISSUE_DATA and BIOGRAPH. Also useful for maintaining data in HORMONE_RESULT_DATA. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
HORMONE_SAMPLES | HORMONE_SAMPLE_DATA row | Presents HORMONE_SAMPLE_DATA with identifying information from TISSUE_DATA and BIOGRAPH. Also useful for maintaining data in HORMONE_SAMPLE_DATA. | BIOGRAPH, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
PROGESTERONES | HORMONE_RESULT_DATA row with a progesterone kit. | Easy access to progesterone data. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
TESTOSTERONES | HORMONE_RESULT_DATA row with a testosterone kit. | Easy access to testosterone data. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
THYROID_HORMONES | HORMONE_RESULT_DATA row with a thyroid hormone kit. | Easy access to thyroid hormone data. | BIOGRAPH, HORMONE_KITS, HORMONE_PREP_DATA, HORMONE_PREP_SERIES, HORMONE_RESULT_DATA, HORMONE_SAMPLE_DATA, TISSUE_DATA, UNIQUE_INDIVS |
WOUNDSPATHOLOGIES | WP_AFFECTEDPARTS row | Querying of wounds/pathologies data (without heal updates). | WP_REPORTS, WP_OBSERVERS, WP_DETAILS, WP_AFFECTEDPARTS, BODYPARTS |
WP_DETAILS_AFFECTEDPARTS | WP_AFFECTEDPARTS row | Upload of WP_DETAILS and WP_AFFECTEDPARTS rows. | WP_DETAILS, WP_AFFECTEDPARTS, BODYPARTS |
WP_HEALS | WP_HEALUPDATES row | Upload and viewing of WP_HEALUPDATES rows. | WP_REPORTS, WP_DETAILS, WP_AFFECTEDPARTS, BODYPARTS, WP_HEALUPDATES |
WP_REPORTS_OBSERVERS | WP_REPORTS row | Upload of WP_REPORTS and WP_OBSERVERS rows. | WP_REPORTS, WP_OBSERVERS |
Sexual Cycles | |||
View | One row for each | Purpose | Tables/Views used |
CYCLES_SEXSKINS | CYCLES row | Maintenance of SEXSKINS rows. | CYCLES, SEXSKINS |
CYCLES_SEXSKINS_SORTED | CYCLES row | The CYCLES_SEXSKINS view sorted by CYCLES.Sname, by SEXSKINS.Date. | CYCLES, SEXSKINS |
MATERNITIES | birth or fetal loss | Summarizes (completed) reproductive events. | BIOGRAPH, PREGS, CYCPOINTS, CYCLES |
MTD_CYCLES | CYCLES row | Presents CYCLES together with Mdate, Tdate, and Ddate CYCPOINTS information for a view of an "entire" sexual cycle as a single row. | CYCLES, CYCPOINTS |
SEXSKINS_CYCLES | SEXSKINS row | Maintenance of SEXSKINS rows. | CYCLES, SEXSKINS |
SEXSKINS_CYCLES_SORTED | SEXSKINS row | The SEXSKINS_CYCLES view sorted by CYCLES.Sname, by SEXSKINS.Date. | CYCLES, SEXSKINS |
SEXSKINS_REPRO_NOTES | SEXSKINS row, or REPRO_NOTES row | Maintenance of SEXSKINS rows. | CYCLES, REPRO_NOTES, SEXSKINS |
Social and Multiparty Interactions | |||
View | One row for each | Purpose | Tables/Views used |
ACTOR_ACTEES | INTERACT row | Maintenance of social interaction data, INTERACT rows and POINTS. A view optimized for highest performance when working with these tables. Analysis of social interaction data. | INTERACT, PARTS |
INTERACT | INTERACT_DATA row | Presents INTERACT_DATA with additional date and time columns that transform the underlying date and time columns in useful and interesting ways. | INTERACT_DATA |
INTERACT_SORTED | INTERACT_DATA row | Presents the INTERACT view sorted in a fashion expected to ease maintenance. | INTERACT_DATA |
MPI_EVENTS | MPI_DATA row | Analysis and correction of multiparty interaction data. | MPI_DATA, MPI_PARTS, MPIACTS |
POINTS | POINT_DATA row | Presents POINT_DATA with the Ptime column transformed into a column that may be useful and interesting. | POINT_DATA |
POINTS_SORTED | POINTS row | Presents POINTS sorted by Sid, and within that by Ptime. | POINTS |
SAMPLES_GOFF | SAMPLES row | Presents SAMPLES with an additional column Grp_of_focal, which has the group of the focal at the time of sampling. | SAMPLES |
Darting | |||
View | One row for each | Purpose | Tables/Views used |
ANESTH_STATS | unique ANESTHS.Dartid value -- for each darting during which additional anesthetic was administered | Analysis and “eyeballing” of data involving additional administration of anesthetic when darting. | ANESTHS |
BODYTEMP_STATS | unique BODYTEMPS.Dartid value -- for each darting having body temperature measurements | Analysis and “eyeballing” of darting body temperature measurements. | BODYTEMPS |
CHEST_STATS | unique CHESTS.Dartid value -- for each darting having chest circumference measurements | Analysis and “eyeballing” of darting chest circumference measurements. | CHESTS |
CROWNRUMP_STATS | unique CROWNRUMPS.Dartid value -- for each darting having crown-to-rump measurements | Analysis and “eyeballing” of darting crown-to-rump measurements. | CROWNRUMPS |
DSAMPLES | unique DARTINGS.Dartid value -- for each darting | Visualization of all samples collected per darting. | DARTINGS, MEMBERS, DART_SAMPLES |
DENT_CODES | unique TEETH.Dartid value -- for each darting with recorded tooth information | Perusal and maintenance of TEETH rows by kind of tooth. | TEETH |
DENT_SITES | unique TEETH.Dartid value -- for each darting with recorded tooth information | Perusal of TEETH rows by position in the mouth. | TEETH, TOOTHCODES |
HUMERUS_STATS | unique HUMERUSES.Dartid value -- for each darting having humerus length measurements | Analysis and “eyeballing” of darting humerus length measurements. | HUMERUSES |
PCV_STATS | unique PCVS.Dartid value -- for each darting having PCV measurements | Analysis and “eyeballing” of darting PCV measurements. | PCVS |
TESTES_ARC_STATS | unique TESTES_ARC.Dartid value -- for each darting having at least one measurement of testes length or width circumference | Analysis of testes length and width measurements taken during darting. | TESTES_ARC |
TESTES_DIAM_STATS | unique TESTES_DIAM.Dartid value -- for each darting having at least one measurement of testes length or width diameter | Analysis of testes length and width measurements taken during darting. | TESTES_DIAM |
ULNA_STATS | unique ULNAS.Dartid value -- for each darting having ulna length measurements | Analysis and “eyeballing” of darting ulna length measurements. | ULNAS |
VAGINAL_PH_STATS | unique VAGINAL_PHS.Dartid value -- for each darting having vaginal pH measurements | Analysis and “eyeballing” of darting vaginal pH measurements. | VAGINAL_PHS |
Inventory | |||
View | One row for each | Purpose | Tables/Views used |
LOCATIONS_FREE | LOCATIONS row that isn't used in NUCACID_DATA or in TISSUE_DATA | Querying of available ("free") locations for storing new samples | LOCATIONS, NUCACID_DATA, TISSUE_DATA |
NUCACID_CONCS | NUCACID_CONC_DATA row | Converting and standardizing units of nucleic acid concentration | NUCACID_CONC_DATA, NUCACID_CONC_METHODS, NUCACID_LOCAL_IDS |
NUCACIDS | NUCACID_DATA row | Showing data about nucleic acids in a human-readable format | NUCACID_DATA, TISSUE_DATA, UNIQUE_INDIVS, BIOGRAPH, NUCACID_LOCAL_IDS, NUCACID_SOURCES |
NUCACIDS_W_CONC | NUCACID_DATA row | Showing data about nucleic acids in a human-readable format, including concentrations from the most-recent quantifications | NUCACID_DATA, NUCACID_CONC_DATA, TISSUE_DATA, UNIQUE_INDIVS, BIOGRAPH, NUCACID_LOCAL_IDS, NUCACID_SOURCES |
TISSUES | TISSUE_DATA row | Showing data about tissue samples in a human-readable format | TISSUE_DATA, UNIQUE_INDIVS, BIOGRAPH, TISSUE_LOCAL_IDS |
TISSUES_HORMONES | TISSUE_DATA row | Providing an expanded set of information about tissue samples used for hormone analysis. Also useful for simultaneous upload of data to TISSUE_DATA and HORMONE_SAMPLE_DATA | TISSUE_DATA, UNIQUE_INDIVS, BIOGRAPH, TISSUE_LOCAL_IDS, HORMONE_SAMPLE_DATA |
SWERB Data (Group-level Geolocation Data) | |||
View | One row for each | Purpose | Tables/Views used |
QUADS | QUAD_DATA row | Querying of X, Y coodinates from and maintenance of QUAD_DATA rows. | QUAD_DATA |
SWERB | SWERB_DATA row -- for every SWERB event, departure from camp excluded | Collects SWERB related information spread among several tables and separates geolocation points into X and Y coordinates. | SWERB_DATA, QUADS, SWERB_BES, SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS |
SWERB_DATA_XY | SWERB_DATA row -- for every SWERB event, departure from camp excluded | Separates SWERB_DATA geolocation points into X and Y coordinates for ease of maintenance. | SWERB_DATA |
SWERB_DEPARTS | SWERB_DEPARTS_DATArow -- for every departure from camp of every observation team, for those observation teams which have collected SWERB data | Collects departure related information spread among several tables and separates geolocation points into X and Y coordinates. | SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS |
SWERB_GW_LOCS | SWERB_GW_LOC_DATA row -- for every geolocation of an object, of a grove or waterhole | Collects SWERB grove and waterhole location information spread between tables and separates geolocation points into X and Y coordinates. | SWERB_GW_LOC_DATA, QUADS |
SWERB_GW_LOC_DATA_XY | SWERB_GW_LOC_DATA row -- for every geolocation of an object, of a grove or waterhole | Separates SWERB_GW_LOC_DATA geolocation points into X and Y coordinates for ease of maintenance. | SWERB_GW_LOC_DATA |
SWERB_LOC_GPS_XY | SWERB_LOC_GPS row -- for every time a group is observed at a geolocated physical object, usually a grove or waterhole, and 2 GPS waypoints are required to by the protocol to collect the data | Separates SWERB_LOC_GPS geolocation points into X and Y coordinates for ease of maintenance. | SWERB_LOC_DATA, ADCODES |
SWERB_LOCS | SWERB_LOC_DATA row -- for every time a group is observed at a geolocated physical object, usually a grove or waterhole | Presents the relationship between the groups and physical features of the landscape in a more comprehensive manner for simpler querying. | SWERB_LOC_DATA, ADCODES |
SWERB_UPLOAD | row uploaded into SWERB | This view returns no rows, it is used only to upload data into the swerb portion of Babase. | SWERB_DEPARTS_DATA, SWERB_DEPARTS_GPS, SWERB_BES, SWERB_DATA, SWERB_LOC_DATA |
Weather Data | |||
View | One row for each | Purpose | Tables/Views used |
MIN_MAXS | WREADINGS row | Analysis and correlation of manually collected weather data. | WREADINGS TEMPMINS TEMPMAXS RAINGAUGES |
MIN_MAXS_SORTED | WREADINGS row | The MIN_MAXS view sorted for convienience. | WREADINGS TEMPMINS TEMPMAXS RAINGAUGES |
In addition to the above views there are a number of views
which produce the group of a referenced individual as of a
pertinent date. These views are all named after the table from
which they are derived, with the addition of the suffixed
_GRP
. They are nearly identical to the table
from which they derive, differing only by the addition of a
column named Grp
. The views which produce an
individual's group are listed in the following table.
Table 2.5. The table_GRP Views
To as great an extent as possible Babase utilizes a controlled vocabulary within the system's data store. Again, as far as is possible, this vocabulary may be tailored by adding or deleting codes to tables that define the vocabulary used elsewhere.[23]
At times, the Babase system recognizes that particular
codes have special meanings, for example, the BIOGRAPH table's
F
(female) Sex code or the
0
(alive) Status code. The meaning of these codes is
fixed into the logic of the system. As examples, an individual
must be female to be allowed to have a menstruation, or, the
individual must be alive if a sexual cycle event is to post-date
the individual's Statdate. Some of
these codes, like sex, are not defined in tables, they are
hardcoded into the system. Others are defined in support or
other tables. Because these codes have intrinsic meaning, they
cannot be removed from the Babase system nor should their
presence in the data be used to code a different meaning from
that which the code presently has. For example, the meaning of
STATUSES code value
0
should not be changed to mean
“death due to meteorite impact” because the
system's programs would then allow dead individuals to have
sexual cycles. Each of the “special” values that
the system requires retain particular meaning is listed in the
Special Values section of the table's documentation. For further
information on the meaning of the “special” values,
see the description of the data table(s) that contain the code
values. Should the meaning of one of these
“special” values need to be changed, the logic in
the Babase programs should be adjusted to reflect the
change.
Babase prevents ordinary users from altering rows that
contain special values in an attempt to prevent
mis-configuration of the system. Only users with permissions to
modify a table's triggers may alter the table's special values.
This is not a panacea. To return to the example above, not only
does the system expect a STATUSES code of
0
to mean alive, it also expects
0
to be the
only code on STATUSES
that means alive. If another STATUSES code is
created to indicate a more specific sort of
“alive-ness”, unless re-programmed the system will
consider all individuals given that code to be dead, not alive.
A careful review of the documentation should be undertaken
before modifying the content of tables that instantiate special
values.
Indexes are a feature of databases, a feature which greatly speeds data retrieval. In return there is a small cost in the time it takes to change table content, and cost in disk space used. Databases generally require indexes to perform efficently. It is a good idea to index the tables each user has in their personal schema.
There is no documentation on the indexes used in Babase. In general, there is an index for each way the tables are commonly referenced. For example, if records are often looked up on the basis of date, there will be an index on the date. As a practical guide, there is an index on each of the columns at the endpoint of a “relational line” in the above entity-relationship diagrams, as well as an index on every date column with the exception of the CYCPOINTS table's Edate and Ldate columns. Almost all indexes are b-tree indexes.
Babase uses common and widespread Unix development tools and techniques[24] to minimize a new developer's learning curve. This is a vain hope. Babase is complex and contains a lot of moving parts.
The remainder of this section describes conventions and procedures that those working with the Babase source code are expected to follow. It is of interest primarily to those who work with, or are considering working with, the code. It is not a comprehensive list, guidance should be taken from the existing code.
Anything and everything that is part of Babase should be checked into the project's revision control system.
All data values used in the code should be abstracted,
either via m4 or PHP defines, using names that begin with
“bb_
”.
Minimize hardcoding. The use of data values in the code should be minimized. By keeping the number of hardcoded values to a minimum, the values used within the system can be altered through procedural changes alone, expensive programming can be avoided, and the flexibility of the system is increased.[25]
All database extension, triggers, functions, etc. should be written in PL/PgSQL, supplemented by m4.
All stand-alone programs should be accessible via the web. They should be written in PHP and styled with CSS2. The web pages they produce should be XHTML 1.0 compliant and should pass W3C validation at http://validator.w3.org/. Style sheets should pass the CSS validator at http://jigsaw.w3.org/css-validator/. Programs that access the database should obtain their PostgreSQL login credentials from the user, preferably using the existing PHP library code.
Each database user must be assigned unique login credentials to the PostgreSQL database. Each user is responsible for the security of his own login credentials and should never use login credentials that are not her own. All code should support this paradigm.
Every file should begin with a statement of copyright.
Each program, function, or procedure should have documented: its input arguments; its return value; any side effects including changes to pass-by-reference arguments, changes to the screen, changes to the database cursors, etc.
Clarity in your code is more important than efficiency. If the code is not clear, it is less likely to work and more likely to have bugs introduced upon maintenance. There is no point in getting a wrong answer quickly.
See the README files in the source tree's directories for information on how the source code is organized.
[17] As security restrictions permit, of course.
[18] That way if you unknowingly revealed your password to the terrorists last weekend when you were drunk, by the time everybody sobers up the password will have been changed and the amount of damage done is limited.
[19] There is one exception to this rule. Members of the
babase_editors group actually may insert data to these
tables, but only when it is done automatically as part of
an UPDATE
or DELETE
in a The babase schema table.
[20] Manual updates probably shouldn't be allowed either, but we need to allow automatic updates resulting from legitimate data changes made by babase editors. To allow this, the rule is that only admins are allowed to update this column at all, and the "versioning" function is always run as an admin.
[21] Don't try this at home! Trained Professionals Only! Etc. ;-)
[22] And the reverse is true. The id of a row in the first table can be used to find the row in the second table that holds it.
[23] Examples may be readily found in the Chapter: “Support Tables”.
[24] Usually.
[25] This is very important but the reasons behind it are not obvious, coding values into the programs means creating office procedures that cannot be altered without a programmer. For example, encoding the value of the unknown group into the system would make it impossible to create different unknown groups for animals disappearing from different groups, or different unknown groups for animals disappearing in varying states of health, or whatever.
Table of Contents
These tables contain the permanent records of baboon-related data. For the most part this data are as collected in the field, although presumably the field staff is not perfect and there will be some errors that are corrected before data entry into Babase. Some columns, and more rarely entire rows, do contain derived data. Some of the derived data, such as pregnancy parity, is manually maintained, other derived data, such as sexual cycle sequence numbers or menses dates computed from onset of turgesence, is maintained by the system. The documentation clearly indicates which data are collected in the field, which data are derivative, and how derived data values are constructed.[26]
This table records cases where short names (Snames) were assigned to individuals and then the choice of name was rescinded. It contains one row for every rescinded Sname, linking the rescinded value to the Sname presently assigned to the individual.
A new row may not be inserted into BIOGRAPH with an Sname value that is an Alternate_Sname value. However, in order to accommodate cases of switched identities, ALTERNATE_SNAME rows may have Alternate_Sname values which appear in the BIOGRAPH.Sname column.
The Sname value must differ from the Alternate_Sname value.
A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information from BIOGRAPH or other places where the animal's three-letter code appears.
This column may not be NULL
and may not be
998
.
An Sname once associated
with the individual identified in the Sname column. This
column may not be empty, it must contain exactly 3
characters, it may not contain lower case letters, and it
may not contain the space character. This column may not be
NULL
.
The name associated with the alternate sname. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
This column may be NULL
.
Notes regarding the existence of the alternate Sname. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table lists and explains "behavior gaps": periods of time during which behavioral data (e.g. interactions, focal sampling) for an indicated group are (or are suspected to be) sparse, lacking, or simply lower than normal, for a known reason. The "known reason" is an important element of these gaps; periods of time where data collection happens to dip below the norm for unknown reasons are not included in this table.
Data from gap periods are not any less "valid" than data from any other times. However, when aggregating and analyzing data, the sparseness of data in a given period may affect the final results. The purpose of this table is to point out such periods and allow users to decide for themselves how to deal with them.
Reasons for gaps vary widely, so they are noted in a text column rather than with a support table of possible "gap reasons". This makes querying for reasons unwieldy, but this is by design; the table is intended to be used as a guide for thoughtful consideration[27] of time periods where gaps in observation may be affecting analyses.
When discussed in this table, a "gap" does not necessarily mean a complete absence of data for the indicated period. It may merely refer to periods where collected data is sparser than usual. Also, a gap does not necessarily indicate that all data types are uniformly sparse. It may be that the gap only applies to a single type of data. Users should pay attention to the Gap_End_Status and Notes columns for details about which data types are affected.
Identification of a gap is done by a data manager. The system is not involved with this process, and does not handle data from gap periods differently than data from any other time periods. Those kinds of judgments are left for the user to make.
A group may have overlapping behavior gaps; it's possible for more than one factor to affect observation of a group at the same time.
A gap's Gap_End must be
after its Gap_Start, or NULL
.
The Gap_End can only be NULL
if the group's GROUPS.Cease_To_Exist is NULL
. This allows for
recording of ongoing, not-yet-completed gaps.
A gap's Gap_End and Gap_End_Status must both be NULL
or
both be non-NULL
.
A unique integer identifying the BEHAVE_GAPS row.
This column is automatically maintained by the
database and must not be NULL
.
The Gid of the group affected by this gap.
This column must contain a Gid value of a row on the GROUPS table. This column may not be NULL
.
The date on which the gap began. This date must be between the group's GROUPS.Start and GROUPS.Cease_To_Exist, inclusive.
This column may not be NULL
.
The date on which the gap ended. This date must be between the group's GROUPS. Start and GROUPS.Cease_To_Exist, inclusive.
This column may be NULL
, see above.
The reason for, or status of, the gap's end. The legal values for this column are defined by the GAP_END_STATUSES support table.
This column may be NULL
, see above.
Text notes about the gap, especially information about the gap's cause.
This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records the basic biographical data on
baboons. It contains one row for each baboon, including still
births and fetal deaths (collectively, fetal losses), on which
data have been collected. In all cases the Statdate value
must not be less than the Birth value. Live animals, those
with a Status of
0
, must have a recorded cause of
death of “not applicable”, a Dcause of
0
. Live animals that have no
associated CENSUS rows (absences excepted) must have a
Statdate equal to their Birth date. Animals with no recorded
cause of death, a Dcause of
0
, must have “not
applicable” as the degree of confidence in both the
nature and agent of death; their DcauseNatureConfidence and DcauseAgentConfidence must both be
0
.
The system will generate an error when it finds a birth date that is later than the the team's last contact with the mother -- when the Birth date is later than the mother's Statdate.[28]
All individuals with an Sname, i.e. those that aren't
fetal losses, must have a Name and will have rows in MEMBERS. Individuals with an Sname
may not have their Sname removed (set to NULL
).
The Psionload program treats an
Sname value of 998
in
a special fashion.
998
may not be used
as an Sname value. See the Psionload
documentation below for details.
Those rows that record data on fetal losses must
maintain the following relations between their data values:
the Sname, Name, Entrydate, and Entrytype values must be NULL
; the
Statdate must be the same as the birth date (Birth); and the
Status must not be 0
(alive).
Because fetal losses have no Sname they cannot have
corresponding CENSUS rows and there will not
be any record of their group membership in MEMBERS.
Entrydate and Entrytype can only be NULL
for fetal
losses--when their Sname is also
NULL
. Otherwise, they cannot be NULL
and Entrydate must be between the
individual's Birth and Statdate values, inclusive. When Entrytype is
B
(Birth), the Entrydate must be the individual's Birth. When Entrytype is any other value, Entrydate cannot equal Birth.
The Statdate of live individuals is derived from the CENSUS table. An actual census does not have to be taken. Any observation of an individual in a group that results in a row being added to CENSUS is sufficient, except that Absences don't count. When there are no non-absent censuses and the individual is alive, then the Statdate is the Entrydate. This column is automatically updated when CENSUS is updated to ensure that these conditions remain true. When the individual is not alive the Statdate is the date of death.
Living individuals, unlike dead ones, can have MEMBERS rows created by the interpolation procedure that locate the individual in a group on a date later than the individual's Statdate. For further information see: Interpolation at the Statdate.
In a like fashion, living individuals, unlike dead ones, can have CYCPOINTS rows created by automatic Mdate generation on a date later than the individual's Statdate. For further information see: Automatic Mdate Generation.
Male Dispersed dates may be after the Statdate when the individual is alive and there are subsequent censuses of the group from which the individual dispersed.
When dates are encoded as intervals to account for uncertainty in the data, as with the CYCPOINTS Edate and Ldate columns, the latter end of the interval may post-date the Statdate.
Aside from the preceding caveats, Babase does not allow data to be related with an individual when the date of the data postdates the individual's Statdate. Therefore Statdate provides a convenient way of determining the end of the time interval during which there are data on an individual, a way that is independent of whether the individual is alive or dead.
An individual's Dcause represents a specific Nature and Agent of death. When considering the associated DcauseNatureConfidence and DcauseAgentConfidence values, it is important to remember that a Dcause should be interpreted as "if Nature, then Agent". It is tempting to assume that this means that the DcauseAgentConfidence cannot be higher than the DcauseNatureConfidence, but this is not so. The DcauseAgentConfidence is assigned contingent on the associated Nature being true, so it is possible for the DcauseAgentConfidence to be higher than the DcauseNatureConfidence. For this reason, the system has no rules validating the DcauseAgentConfidence based on the DcauseNatureConfidence, nor vice versa.
Confidence in the accuracy of the estimated birth date is categorized in the Bstatus column. The estimated range of possible birth dates might not be as symmetrical around the Birth date as is implied in BSTATUSES, so the specific boundaries of this range are recorded in the EarliestBirth and LatestBirth columns.
The EarliestBirth and LatestBirth columns cannot be NULL
,
unless the Bstatus is
9.0
("unknown"), in which case
both EarliestBirth and LatestBirth must be NULL
.
The EarliestBirth must be on or before the individual's Birth, which must be on or before the individual's LatestBirth. LatestBirth must be on or before the individual's Statdate, but only for individuals with non-absent rows in CENSUS.[29]
The LatestBirth must be on or
before the Entrydate, unless the
individual's Entrytype is
B
(Birth). As mentioned
above, when Entrytype is
B
, the Entrydate must equal the Birth date. In these cases, if there is
any uncertainty about when the
individual's "true" birth date is, the LatestBirth might legitimately be after
the Birth date and therefore after
the Entrydate. The LatestBirth should never be long after
the Entrydate[30], so even in these cases there are boundaries
placed on LatestBirth. When Entrytype is
B
(Birth), the LatestBirth cannot be more than
29
days[31] after the Entrydate
unless either or both of them is NULL
.
The system will return a warning if the length of time
between EarliestBirth and LatestBirth is more than Bstatus years[32]. Similarly, the system will return a warning if
the EarliestBirth is more than
(0.5
× Bstatus) years before the Birth date, and another if the LatestBirth is more than
(0.5
× Bstatus) years after the Birth date.
It's possible for an individual's Bstatus to be "too high", based on the length of time between EarliestBirth and LatestBirth. That is, a high Bstatus could mistakenly be used for individuals whose EarliestBirth - LatestBirth range is significantly less than Bstatus years. The system will return a warning if the length of time between the EarliestBirth and LatestBirth is less than or equal to the length of time indicated by a smaller BSTATUSES.Bstatus value.
When inserting or
updating data in this table, the system can use the row's
Bstatus to automatically populate
the EarliestBirth and LatestBirth columns, if desired. When the
Bstatus is not
9.0
("unknown"):
Any provided EarliestBirth or LatestBirth values are inserted into their corresponding columns.
When no EarliestBirth is
provided, this column is calculated as the Birth date −
(0.5
× Bstatus years).
When no LatestBirth is
provided, this column is calculated as the Birth date +
(0.5
× Bstatus years).
While the UNIQUE_INDIVS table contains a larger list of
all individuals across multiple
populations, this table is the primary authority for the
"main" population. When rows are inserted or deleted in this
table, related rows are automatically inserted or deleted in
UNIQUE_INDIVS with IndivId = the Bioid, and PopId =
1
. Individuals in
the main population cannot be added to UNIQUE_INDIVS before being added to this
table.
A unique integer identifying the BIOGRAPH row.
Babase rarely uses this identifier; it exists for the convenience of application programs and for distinguishing individuals without Snames (fetal losses) from each other and from other individuals.[33]
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The short name of the individual. This is an exactly three character long name abbreviation which is used to identify the individual and so must be a unique data value. It may not contain lower case letters or spaces.
The Sname is usually, but not always, the first 3 characters of the Name.
This value appears in many other places in the system and
so should not be changed without changing all the other
places in the database where the abbreviation appears;
really, once established, the only reason to change this
column is because the short name had already been
used. [34] Because this is unlikely, Babase does not
allow the Sname to be changed. The Sname is always
composed of capital letters and may not contain a
space.[35] This column should only
be NULL
if the row represents a fetal loss.
The name of the individual. This is a textual column
used for descriptive purposes. This value must be unique
when a comparison is done in a case insensitive
fashion. This column should only be NULL
if the row
records a fetal loss. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The Pid value, from the PREGS
table, of the individual's mother's pregnancy that ended
in the birth[36]of the individual. This column may be
NULL
. A NULL
value indicates there is no record of
the individual's mother.
More than one individual may have the same Pid, as long as they were products of the same pregnancy. This occurs when twins are born into the study population.
This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
The sex of the individual. The legal values are:
Code | Description |
---|---|
M | the individual is male |
F | the individual is female |
U | the individual is of unknown sex |
This column may not be NULL
.
The date the pregnancy ends. For live births and for individuals whose maternity is unknown, this is their estimated birth date. Otherwise, this is the date of the fetal loss. (A pregnancy that ends with the mother's death is considered as a spontaneous abortion (fetal loss) for this purpose.)
This column may not be NULL
.
The BSTATUSES.Bstatus categorizing the quality of the birth date estimate.
This column may not be NULL
.
The maternal group of the individual, the Gid of the group into which the individual was born.
This column must contain a Gid value of a row on the
GROUPS table. This column
may not be NULL
.
If the maternal group is not known, the maternal group should be recorded as the unknown group.
The degree of confidence in the assignment of the Matgrp value. The legal values for this column are defined by the CONFIDENCES support table.
This column may not be NULL
.
The date the individual entered the study population.
Because of Interpolation, it may seem like this column could be maintained automatically. However, the opacity of "non-interpolating" rows in CENSUS and the related historical analyses prevent accurate automatic determination of the entry date for many individuals. For more information, see CENSUS.Status and Interpolation, Data are not Re-Analyzed.
This column can be NULL
, only if the row
represents a fetal loss.
The way the individual entered the study population. The legal values for this column are defined by the ENTRYTYPES table.
This column can be NULL
, only if the row
represents a fetal loss.
The status date of the individual. When the individual is alive, this is the latest date on which the animal was censused and found in a group.
This column may not be NULL
.
The state of the individual's life at the Statdate. The legal values for this column are defined by the STATUSES support table.
This column may not be NULL
.
The cause of death or circumstances associated with death. The legal values for this column are defined by the DCAUSES support table.
This column may not be NULL
.
The degree of confidence in the nature of the individual's death or circumstances associated with the individual's death (their DCAUSES.Nature). The legal values for this column are defined by the CONFIDENCES support table.
This column may not be NULL
.
The degree of confidence in the agent of the individual's death or circumstances associated with the individual's death (their DCAUSES.Agent). The legal values for this column are defined by the CONFIDENCES support table.
This column may not be NULL
.
A boolean value indicating whether or not there exist rows on the ALTERNATE_SNAMES table related to the individual's Sname. This value is true if and only if there exists a row on ALTERNATE_SNAMES with an Sname value which is the individual's sname or there exists an ALTERNATE_SNAMES row with a Alternate_Sname value which is the individual's sname.
The value in this column is automatically maintained
and will never be NULL
.
The earliest estimated birth date for this individual.
The values in this column may be calculated automatically, as discussed above.
This column may be NULL
, but only when the
accuracy of the birth estimate is unknown (when Bstatus is
9.0
).
The latest estimated birth date for this individual.
The values in this column may be calculated automatically, as discussed above.
This column may be NULL
, but only when the
accuracy of the birth estimate is unknown (when Bstatus is
9.0
).
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The population census table. Aside from the BIOGRAPH.Matgrp column, this table is the origin of all information regarding group membership. This table holds all the field census data and any information regarding group membership that is recorded in the field demography notes. It contains one row per animal per group per day censused. There is an additional row per individual per demography note for those days when there is a demography note regarding the individual and group but no census of the group. (See DEMOG.)
One way to have Babase record that an individual is alone is to first create a row in GROUPS meaning alone, and then to assign individuals who are alone to this group. The “alone-ness” of an individual can then be tracked in the same fashion as group membership, although the Babase user does then need to be aware that the members of the “alone” group are not actually proximate to one another.
The system will report individuals who are first censused in a group other than their maternal group (BIOGRAPH.Matgrp). The exceptions to this are when the maternal group is the unknown group or that first census row records an absence.
The system will report individuals with a BIOGRAPH.Sname that do not have any related (non-absent) CENSUS rows.
The Date must be between the
Grp's related GROUPS.Start and Cease_To_Exist, inclusive, with one
exception. Rows indicating absences — rows whose Status is A
— may occur outside of the date range for a group's
lifetime. These may sometimes be needed during fission/fusion
periods to manually prevent an individual from being
interpolated into a group that no longer exists or doesn't yet
exist. However, a need for such absences is rare, so the
system will report a warning for any "absent" censuses before
the Grp's Start or after its Cease_To_Exist, exclusive.
The system will report a warning when CENSUS rows have a Status
of C
or
D
and a Date before the individual's LatestBirth, and another warning if
before the individual's Entrydate.
As noted in the MEMBERS documentation, Babase does not allow an individual to be in more than one group on a given day.
Ideally, the original field census data sheets could be recovered from CENSUS, but there are several situations where that is not possible:
First, a datum is lost when an individual is actually censused in two groups on the same day because of movement between groups and the timing of the censuses.[37] In this situation a decision should be made as to which group CENSUS should record the individual's presence on that day. A demography note should then be added to DEMOG, with text that notes the individual's presence in the second group. This results, technically, in all of the information from both censuses, or other location information, being entered into the database. However, it should be remembered that, because the information regarding the second census is in textual form, it is not readily available to automated tools.
Second, it may be necessary during group fissions and fusions to record a different Grp than what was actually recorded because it is usally not clear in real time that a fission/fusion has begun. There is necessarily a lag between when a change can be seen retroactively and when the field notebooks are actually updated to reflect the existence of the newly-formed group(s). For fusions it is important to construct group membership in Babase carefully, for the sake of maintaining group residency. If an individual is a resident of one parent group and is censused in another, the residency algorithm recognizes the other parent group as an entirely different group. That is, it does not recognize that the groups will soon be related. To prevent a loss of residency due to an apparent group change, censuses in the other parent group(s) should be recorded with the daughter group as the Grp whenever at least some of both parent groups are together.
Example 3.1. Crossovers during a fusion
"Bruce" has been a resident of the "Gotham" group for years. "Clark", meanwhile, is a resident of the "Metropolis" group, and "Diana" is the alpha female (so, definitely resident) of the "Amazons" group. On 01 June, the three will permanently fuse together and form the "JLA" group, after first being seen together on 01 May. (JLA's Start is 01 May, Permanent is 01 June) Throughout May, census records show Bruce making short visits to Metropolis and to the Amazons. Knowing that the groups are in a fusion period, whenever Bruce is with Metropolis or the Amazons, he and all members of the group he is with should be recorded as being in the JLA. Similarly on dates later in the month when Bruce and his close associates Robin and Alfred — along with Clark and sometimes his sister Kara — were with the Amazons, all members of the Amazons and their friends from Gotham and Metropolis should be recorded in the JLA.
In January of the same year, Clark made a brief visit to Gotham. That was before the fusion began in May, so that visit's Grp need not be changed in any way[38].
Third, some CENSUS rows are derived from analyses of historical data and employ MEMBERS-style rows where group members generally have a row on every date of a given month that they were present, rather than just those dates when censuses were performed. See the Status column for details.
Be careful when changing these data. When CENSUS data are inserted, deleted, or updated, the MEMBERS table and BIOGRAPH.Statdate column are automatically updated via Interpolation. Also, remember that rank will almost certainly change should group membership change.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Cenid links CENSUS to DEMOG.
This column may not be NULL
.
The date of the census, or the date of the
demography note (when Status is
D
).
The date value must not be more than a year later than the present moment. This rule prevents accidental data entry errors from creating so many rows in MEMBERS that all available disk space is used.
This column may not be NULL
.
The individual whose location is being recorded. The three-letter code that uniquely identifies an individual in BIOGRAPH. There will always be a row in BIOGRAPH for the individual identified here.
This column may not be NULL
.
The group where the individual is located. This is a Gid value from GROUPS. This column should contain the most specific sub-grouping available -- subject to the constraints of the data entry protocol, of course. Aggregation into larger groupings is accomplished by retrieving the associated MEMBERS.Supergroup of the individual on the date of census.
This column may not be NULL
.
Usage exception: For dates between 21 Mar 1990 and 29 Feb 1992, inclusive, the group recorded for the sub-groups of Alto's group do not necessarily reflect the actual groupings of the animals on a particular day, but are instead indications of the group-splitting process. See the Protocol for Data Management: Amboseli Baboon Project document for further explanation.
A one letter code indicating the source of the
location information. Status is the source of MEMBERS.Origin data. The current codes are as
follows: C
(census),
A
(absent),
D
(demography), and
M
or
N
(manual). Other
values derived from analysis of historical data include:
S
, E
,
F
, B
,
G
, T
,
L
, and
R
.
The CENSUS.Status Codes
C
(census) The animal was found in the group on a field census sheet: from the census datasheets. (There may or may not be a corresponding demography note on DEMOG as well.)
A C
Status is
marked on the field census data sheet as an
“X”.
A
(absent) The animal was not found in the group on a field census sheet. Note that while an individual should not be recorded “present” in more than one group on the same day, s/he may be absent from several groups on any given day.
An A
Status is
marked on the field census data sheet as an
“0”.
D
(demography) The animal was noted, in the field notebooks or elsewhere, to be in a group but was not marked present in a field census of a study group on that day.[39] There should be a DEMOG row associated with the CENSUS row. The individual may or may not have been marked “absent” on the same group's field census for the day.[40]
A D
Status is
marked on the field census data sheet as an
“0”, when there exists a
corresponding place on the census data
sheet.
The system will allow CENSUS rows with a
Status of D
to be
entered without there being a corresponding DEMOG row in existence.[41]
However it is expected that these rows exist only
long enough to allow entry of a related DEMOG row. The system will report
CENSUS rows with a Status of
D
that have no related
DEMOG row.
M
(manual, interpolated) This code provides a
way to manually supplement what is in the CENSUS
table when there is no other way to get the data in.
Babase considers this code to be the same as the
C
code.
N
(manual, not interpolated) This code provides an alternative way to manually supplement what is in the CENSUS table when there is no other way to get the data in. This code does not interpolate, it is presumed to be the result of some analysis.
S
(Susan's data) The data comes from the old DISPERSE database where the record had both a Datein and a Dateout.
E
(ending date) The data comes from the old DISPERSE database where the record had a Datein but not a Dateout.
F
(final date) The data comes from the old DISPERSE database where there is a Dateout and the last recorded location is before the Statdate.
B
(birth date) The data comes from the old DISPERSE database where the record had a Dateout but not a Datein.
T
(total) The data comes from the old DISPERSE database where the record had neither a Datein nor a Dateout.
G
(gap) The data are a record of the animal in the unknown group when the animal appeared in the old DISPERSE database but where there was a gap between times of recorded location.
L
(lineage) The group is from the Matgrp on the old CYCTOT database, either because the animal did not appear in the DISPERSE database, or because the first location for the animal in the old DISPERSE database had a Datein and this Datein was after the birth date of the animal.
R
(result of Alto's breakup) The datum is
S
,
E
,
F
,
B
,
G
,
T
, or
L
datum that has had
locations which were changed from 1.0 to the group
in which the animal was censused on 15/4/92. This
change left all R
rows as
part of a contiguous series of days during which the
animals are located in the Alto's sub-group as
censused on 15/4/92, and the time-adjacent locations
were not 1.0.
This column may not be NULL
.
Cen is whether or not the CENSUS row represents an entry on
a field census data sheet. TRUE
means the CENSUS row exists because of an
entry on a census data sheet, FALSE
means there was no
census done and the CENSUS
row exists to support a demography note, manual notation
of absence, etc. Cen should only be TRUE
when Status is
C
,
A
, or
D
.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records the dates of first consortship for males; this is a maturational milestone in males that we have analyzed in several contexts. It contains one and only one row for every individual for which there is a recorded first consortship. Individuals who have not yet consorted, or individuals that have consorted but whose first consortship date is not known, do not appear in the table.
Currently it only contains values for males; females may be added if desired.
All dates are exact, no “BY” dates are entered as we do for MATUREDATES and RANKDATES, so there is no “Status” column.
When there is a row in this table there must be a sexual maturity date in MATUREDATES, and the consortship date must be later than the sexual maturity date. The Consorted date cannot be before the individual's Entrydate, nor after the individual's Statdate. The individual must be at least 5 years of age on his Consorted date. The system will report a warning if the individual is 12 or more years of age on his Consorted date.
A three-letter code (an id) that uniquely identifies
a particular animal (an Sname) in BIOGRAPH. This code can be used
to retrieve information from BIOGRAPH or other places where
the animal's three-letter code appears. This column may
not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table holds the text that records group membership information not written on the regular field census sheets, especially that from the field demography notes. DEMOG provides a means of notating CENSUS rows, and thus facilitates management of additional “free form” CENSUS rows, rows that do not directly correspond with the field census sheets.[42] Thus, in conjunction with these corresponding CENSUS rows, the DEMOG rows capture group membership information that otherwise would not appear in the CENSUS table.
DEMOG contains one and only one row for every individual for every date for every group where the individual was noted present in free form textual field notes or other miscellaneous sources. The DEMOG row holds textual information. There is always exactly one corresponding CENSUS row, which holds the corresponding group membership information in the usual coded and structured form. (Note that only some CENSUS rows will have DEMOG rows; CENSUS rows that originate entirely in the regular censuses of groups will not, in general, have an associated DEMOG row). A single field note referring to more than one individual must appear in DEMOG as two (or more) separate rows, one row per individual. Multiple field notes pertaining to a single individual on a single date must be combined into one piece of text and entered in a single DEMOG row. (See the Protocol for Data Management: Amboseli Baboon Project for structure of the demography data as entered by the operator.)
Adding or removing DEMOG rows automatically updates the CENSUS.Status column of the corresponding CENSUS row.
Use the DEMOG_CENSUS view to upload datasets into this table. Use CENSUS_DEMOG view to maintain this table by hand.
The data integrity rules require that when a demography note is entered the CENSUS row be created before the related DEMOG row.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Cenid links CENSUS to DEMOG.
This column may not be NULL
.
A code that identifies the written field notebook or other source where the demography note can be found.
The legal values for this column are defined by the
DEMOG_REFERENCES support table, see
below. This column may not be NULL
.
The demography note text pertaining to the CENSUS row with the given Cenid.
This column may be NULL
.[43]This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records dates of dispersal for males (females do not disperse and do not appear in this table). It contains one and only one row for every male who has a known date of dispersed from the study groups. Males who have not yet dispersed do not have a row in this table. Only males can have rows on this table.
All dates are exact, no “BY” dates are entered as we do for MATUREDATES and RANKDATES, so there is no “Status” column.
The system will report a warning when there is a row in
this table and there is no sexual maturity date in MATUREDATES. The Dispersed date must be on or
after the individual's Entrydate.
The Dispersed date cannot be after the individual's Statdate when the individual is not alive
(when BIOGRAPH.Status is not
0
). When the individual is alive
the Dispersed date may only be after the Statdate when the individual has been
censused absent (CENSUS.Status is A
)
in the group[44]
and the Dispersed date is not after the earliest such
post-Statdate census date.
The system will returning a warning when the Dispersed date is before his LatestBirth.
A three-letter code (an id) that uniquely identifies a
particular animal (an Sname) in BIOGRAPH. This code can be used to
retrieve information from BIOGRAPH or other places where the
animal's three-letter code appears. This column may not be
NULL
.
The degree of confidence in the assignment of dispersal date or rationale behind the assignment of the dispersal date. The legal values for this column are defined by the CONFIDENCES support table.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every group on which
there is some recorded information. This includes not only the
study groups and non-study groups, but also temporary
daughter groups and the special group “Unknown”[45](See the Protocol for Data
Management: Amboseli Baboon Project for when to use this special
group.) When a daughter group becomes a regular group (after a
fission or fusion is complete), the new group should be given
a Permanent date to indicate that it
is now a permanent group (Permanent
is not NULL
). Any “old” daughter groups that
did not become permanent should be left in GROUPS to support
the daughter grouping membership history.
This table serves primarily as a tool for the system for data validation. To see its contents in a more human-readable format, use the GROUPS_HISTORY view.
Every reference to a group elsewhere in the Babase
system corresponds to a Gid
of one of the records in this table. Temporary groups (those
with Permanent of
NULL
) must have a
non-NULL
> From_group value. Permanent
groups must not have a Permanent
value that is earlier than their Start value. Permanent
groups may or may not have a NULL
From_group value.
Note that there is no particular reason to remove from GROUPS those daughter groups that exist for only a short time during group fission. Those sorts of groups can remain temporary forever.
The MEMBERS.Supergroup column may be used to determine the supergroup of an individual on any given date.
Neither a GROUPS row's From_group value nor its To_group value may be the same as its Gid value.
A group's Permanent and From_group cannot both be NULL
. But both
can be non-NULL
.
The Cease_To_Exist value must
be NULL
or greater than the Start value. The Study_Grp value must be NULL
or must not
be less than the Start value. When the
Cease_To_Exist
and the Study_Grp value are both
non-NULL
the Study_Grp value must
not be after the Cease_To_Exist
value.
The Cease_To_Exist value must also be greater than or equal to all daughter groups' Start values.
The Last_Reg_Census value must
be NULL
or greater than the Start
value. It also must be less than or equal to the group's
Cease_To_Exist date, unless the
Cease_To_Exist is also NULL
. And Last_Reg_Census must be NULL
or Study_Grp must be NULL
or Last_Reg_Census must be on or after the
Study_Grp date. The Last_Reg_Census must be NULL
when Study_Grp is NULL
.
The Cease_To_Exist must be the
day preceeding the Permanent date of
any daughter groups, unless the daughter group's Permanent is NULL
. An important
consequence is that all of a group's permanent daughter groups
must have the same Permanent
date.
A group that is a fusion product cannot have a fission
parent -- the From_group must be
NULL
when the group is the result of group fusion, i.e.,
when the group's Gid appears in the
To_group column of another group.
[46]
The system enforces the rules of the 3 previous
paragraphs "on-commit". In a transaction ending with a
ROLLBACK
, any changes to this table will
not be validated against these rules. This means it is
possible for an invalid change to appear error-free if
executed in a rolled-back transaction. Committed
transactions (and commands executed outside of transactions)
perform this check as expected.
The One_letter_code value must be unique within the time period from the group's Start date through the group's Cease_To_Exist date, inclusive of endpoints.
Individuals cannot be placed into rows in the CENSUS table before the Start date of the group, or cannot be
censused in the group at all if the value of the Start column
is NULL
. Individuals cannot be placed into rows of the
CENSUS table after the Cease_To_Exist value of the group. Note
that both these restrictions apply to all CENSUS rows, even those that indicate the
individual is absent from the group.
Gaps in observation of a group cannot be added to the BEHAVE_GAPS table if the Gap_Start or Gap_End are before the Start date of the group. Similarly, gaps cannot be added to BEHAVE_GAPS if the Gap_Start or Gap_End are after the Cease_To_Exist date.
Some gaps in BEHAVE_GAPS may have a Gap_Start date that is equal to the group's Start or Permanent date, implying that the gap started because of the opening of observation of the group.[47] Gaps may also have a BEHAVE_GAPS.Gap_End date equal to the group's Last_Reg_Census or Cease_To_Exist date, implying that the gap ended because of the group's end.[48] If the Start, Permanent, Last_Reg_Census, or Cease_To_Exist column is updated, then these implications will no longer be true. The system makes no attempt to judge whether these implications really are true or just coincidence, so data managers must exercise this judgment. When changing any of these dates in GROUPS, be sure to check for rows in BEHAVE_GAPS with Gap_Start or Gap_End dates that also should be updated, and correct them as needed.
Group
9.0
,
Unknown
, has a special meaning.
Individuals are placed in this group by Interpolation when their
whereabouts are unknown. Also, a SWERB_DATA.Seen_grp
value of 9.0
in rows with an
Event value of
O
indicates an exceptional
circumstance where Seen_grp is
allowed to equal the related SWERB_BES.Focal_grp
value. Another group code for unknown whereabouts should
not be created.
The 10.0
group has the
special meaning of “lone animal”. The SWERB_UPLOAD view uses this value as the SWERB_DATA.Seen_grp
when a lone animal is sighted. Another group code for lone
animals should not be created.
The 99.0
group has the
special meaning of “predator sighting”. The
SWERB_UPLOAD view uses this value as the
SWERB_DATA.Seen_grp when a predator is sighted.
Another group code for predator sightings should not be
created.
A positive numeric value with six digits (4 decimal
places) that identifies the group. Each Gid must be
unique. This column may not be NULL
.
The spelled out name of the group. This column must
be unique, and unique insensitive of case. This column may
be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The Gid of the group from
which this group split off, if the group is a fission
product. This column may be NULL
.
The Gid of the group formed when the daughter of the group is the result of group fusion.
This column may be NULL
to indicate there is no
daughter group or the daughter groups are fission
products.
This column contains the date the group became a
permanent, regular group, or contains NULL
if it has not
and is a temporary daughter group. For groups that were
created as a result of fissions or fusions this
column represents the end date of the fission/fusion
period. For groups that were already intact when
observation began this column represents the
first day of observation on that group.
Permanent affects whether or not an individual can be censused only in a daughter group and still be ranked in the parent supergroup. See RANKS and MEMBERS.Supergroup for further information.
The date the group came into existence (or the
earliest date it must have existed in the case of those
groups existent before they were monitored.) The value of
this column may be NULL
to indicate the group exists but
is not monitored.
If any parent group has the daughter group as its To_group then the start date is also the date the fusion started.[49]
The date on which the group is deemed to have
permanently dissolved into fission products or merged into
a fusion product. This column may be NULL
for groups
still under observation, groups that have not yet
dissolved/merged, and groups whose dissolution/merge
occurred while not under regular observation.
The date of the last regular census done on the
group for study groups that were dropped or ceased to
exist because of fission/fusion. This column may be NULL
if the group hasn't been dropped or was never a study
group.
A 3 character, and exactly 3 character, code that
uniquely identifies the group. The characters must all be
upper case. This code is used by the Psion data
collection devices and in SWERB observations taken using
handheld GPS units and exists solely as a cross reference
from those devices to the regular Babase group Gids. This
column may be NULL
if the group is never monitored using
the Psion devices or SWERB GPS devices.
A 1 character, and exactly 1 character, code that
uniquely identifies the group within the time period of
the groups existence. The character must all be upper
case. This code is used to cross reference SWERB waypoint
data to the regular Babase group Gids. This column may be
NULL
.
The date the group first became an "official" study
group[50] or NULL
if the group was never a study
group.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records sexual maturity dates, the dates of menarche or testicular enlargement. It contains one and only one row for every animal who matured in a study group or who lived in a study group as a sexually mature individual, and it may occasionally contain a row for a male who was known to mature but who did not live in a study group. Individuals who have not yet matured do not have a row in this table. All sexually mature individuals should have a row in this table. Entry into sexual maturity is not always an obvious or definite event[51], especially for males, so the Matured may be recorded as the first of the month in which the individual entered maturity.
There are restrictions on when an individual may become
mature. The age of an individual at sexual maturity (Matured) must be at least
1016 days. This is about 2.7 years of age.
The system will issue a warning when the sexual maturity
occurs on or before the 3rd birthday. Individuals with a
Mstatus of
O
(On) must be mature before
2922 days of age (8 years). The system will
issue a warning when the sexual maturity occurs on or after
the 7th birthday. An individual's sexual maturity date must
be on or before his Statdate.
Some maturity dates are based on irregular observations of individuals before the long-term study began, or before the individuals entered an "official" study group. Either way, these individuals' Matured dates may be long before their Entrydate. Because of this, the system will allow but issue a warning when the month of the maturity date is earlier than the month of the individual's entry into the study population (their Entrydate).
For females, when Mstatus
is O
(On) Matured must be the first
T
date recorded in the
female's sexual cycling data in the CYCPOINTS table. When Mstatus is not
O
Matured may not be after the first
Tdate.
Changing a female's first Tdate can automatically change the female's Matured date. See CYCPOINTS.
A three-letter code (an id) that uniquely identifies
a particular animal (an Sname) in BIOGRAPH. This code can be used
to retrieve information from BIOGRAPH or other places where
the animal's three-letter code appears. This column may
not be NULL
.
This is the date of menarche for females and the
date of testicular enlargement for males, when either of
these dates are known. Otherwise, this is the date by
which the individual is considered to be sexually mature.
See the Protocol for Data
Management: Amboseli Baboon Project for more information regarding the
dates used when the transition to maturity was not
observed.[52] This column may not be NULL
.
The status of the maturity date, that is, its
precision, accuracy, quality, or other pertinent
characteristics when it comes to the use of the value.
The legal values for this column are defined by the MSTATUSES support table, see
below. This column may not be NULL
.
This column records whether the animal became mature ON a given (known) date, or BY a given (known) date. If a date is designated as an “ON” date[53] then we are saying that we know the animal attained that marker ON that date.[54] If a date is designated as a "BY" date the animal was adult or subadult BY that date but we do not know when the individual attained it. This scheme allows easy identification of which animals are infants or juveniles on any given day and which are not.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records dates individuals first attained adult rank. It allows one and only one row for every individual who has attained adult rank. Individuals who have not yet obtained adult rank do not have a row in this table.
The system will report a warning when an individual has a rank (in RANKS) before their Ranked date that is higher (where 1 is highest) than another individual who has already attained adult rank.
RANKDATES currently contains only data for males but data for females may be added.
When there is a row in this table there must be a sexual
maturity date in MATUREDATES. When MATUREDATES.Mstatus
is O
(On) then the rank attainment
date must be later than the sexual maturity date. Otherwise,
the rank attainment date must not be before the sexual
maturity date. The Ranked date cannot be after the
individual's Statdate. All
individuals must be 5 or more years of age
on their rank attainment date. Individuals with a Rstatus of
O
(On) must be less than
12 years of age on their rank attainment
date. The system will report a warning for any males over
8.5 (exclusive) that have not yet attained
adult rank.
It is possible that an individual will be known to have attained rank in a non-study group before they entered the study population (their Entrydate). Because of this, the system will allow but issue a warning if an individual's Ranked is before the first of the month of his Entrydate.
A three-letter code (an id) that uniquely identifies
a particular animal (an Sname) in BIOGRAPH. This code can be used
to retrieve information from BIOGRAPH or other places where
the animal's three-letter code appears. This column may
not be NULL
.
The date the individual first attained a rank among
adults. The date must fall on the first of the
month. This column may not be NULL
.
The status of the rank date, that is, its precision,
accuracy, quality, or other pertinent characteristics when
it comes to the use of the value. The legal values for
this column are defined by the MSTATUSES support table. This
column may not be NULL
.
The legal values for this column are O (for ON) and B (for BY), as with Mstatus in the MATUREDATES table above.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This section contains data about specific physical traits of the individuals.
In general, the data in this section tends to be collected "globally". That is, the data are collected for all living individuals in the study population (or as close to "all" as is possible). While many of the Darting tables could also easily be considered "physical traits", those data are only collected during dartings and therefore are not "global" but instead only available for individuals in that fraction of the population that has been darted.
Records which body parts were affected in each related wound or pathology Cluster, and the quantity of these wounds/pathologies affecting the specific part when that quantity is known. This table contains one row for each recorded body part, per associated wound/pathology (from WP_DETAILS) cluster. For example, if a report indicates two clusters affecting body part A and another cluster affecting body part B, this will be recorded in three rows in this table: two for body part A and one for body part B.
Each WPDId-Bodypart pair must be unique; a wound/pathology cluster can be associated with a particular body part only once.
The Quantity_Affecting_Part column
records the quantity of individual wounds/pathologies in the
related cluster that are affecting this row's body part. When
this quantity is unknown or unclear from the report, or when
the related wound/pathology is not obviously countable
(e.g. "fatigue"), this column should be NULL
. When a single
wound/pathology affects more than one body part, this
wound/pathology will be counted more than once: the Quantity_Affecting_Part column
should be 1
for each of the affected parts'
separate rows. For example, if there was a long
slash/laceration extending from the arm to the trunk, this
would be recorded with a Quantity_Affecting_Part of
1
in both the "arm" row and the "trunk"
row, effectively counting a single wound twice.
Remember, the Quantity_Affecting_Part column indicates the number of wounds/pathologies that were affecting the specified body part. When aggregating data across multiple rows (e.g. sum, average, etc.), remember that individual wounds/pathologies affecting multiple body parts will be counted in more than one row of this table. Using this column to count the number[55]of discrete, independent wounds/pathologies may overestimate the true number.
When adding or updating data in this table, use the WP_DETAILS_AFFECTEDPARTS view. It is includes related columns from BODYPARTS to facilitate easy entry of Bodypart values, and from WP_DETAILS to determine the appropriate WPDId.
Use the WP_HEALS or WOUNDSPATHOLOGIES views to select data from this table. These views include related data from the other wound/pathology tables, respectively with and without related healing updates.
A unique identifier for the row. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The WP_DETAILS.WPDId of the wound/pathology associated with this body part.
This column may not be NULL
.
A positive integer indicating how many wounds/pathologies of the related type are affecting this body part.
This column may be NULL
, when the quantity is
unknown, unclear, or uncountable.
When not NULL
, this column cannot exceed
9
[56].
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for each cluster of wounds or pathologies that are indicated in a report.
Similar to our use of "report" in WP_REPORTS, a "cluster" of wounds/pathologies is mostly just a data management term: useful for bookkeeping but lacking biological relevance. For our uses, a "cluster" is a group of one or more co-occuring wounds/pathologies of the same type (same WoundPathCode). They are "co-occurring" in that these wounds/pathologies were observed to appear on the same date and were likely not acquired independently. The decision to divide multiple wounds/pathologies into separate one-wound/pathology clusters or to group them into clusters of multiple wounds/pathologies is mostly made by the data manager.
In many cases, how exactly to cluster a set of wounds/pathologies is not a decision, but a necessity. When multiple wounds/pathologies of the same type are indicated on a report, there may be particular MaxDimension, ImpairsLocomotion, and/or InfectionSigns values that apply to some but not all of the wounds/pathologies (e.g. "This one slash is impairing locomotion, those other three are not"). In these cases, it is necessary to divide the multiple wounds/pathologies into separate clusters.
Clusters are numbered in the Cluster column, which must be unique per WPRId.
Some WoundPathCode values
may inherently imply that the ImpairsLocomotion or InfectionSigns column(s) be a particular
value. For example, if an individual is limping, by definition
this pathology is impairing the individual's locomotion so the
ImpairsLocomotion value should
always be Y
for that pathology. Because of this
possibility, some validation of the ImpairsLocomotion and InfectionSigns columns is controlled by
values in the WP_WOUNDPATHCODES table and
its ImpairsLocomotion and
InfectionSigns columns.
See the WP_WOUNDPATHCODES documentation for
more details.
The system will return a warning for any WP_DETAILS rows that do not have at least one related row in WP_AFFECTEDPARTS.
When adding or updating data in this table, use the WP_DETAILS_AFFECTEDPARTS view. It facilitates inserting or updating data with the related WId instead of the WPRId.
Use the WP_HEALS or WOUNDSPATHOLOGIES views to select data from this table. These views include related data from the other wound/pathology tables, respectively with and without related healing updates.
A unique identifier for this row. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The WP_REPORTS.WPRId of the report in which this wound/pathology cluster was recorded.
This column may not be NULL
.
The WP_WOUNDPATHCODES.WoundPathCode for this wound/pathology cluster.
This column may not be NULL
.
A positive integer identifying this cluster of wounds/pathologies.
This column may not be NULL
.
The estimated maximum dimension of this cluster's wound or wounds (e.g. length, depth, etc., as applicable), in centimeters.
This column may be NULL
, when a dimension is not
recorded or not applicable.
A character, indicating if this cluster's
wound/pathology impairs the individual's locomotion. Legal
values are Y
,
N
, or
U
, meaning "Yes",
"No", and "Unknown" (or Uncertain, or Unspecified),
respectively.
This column may not be NULL
.
A character, indicating if signs of infection
(e.g. oozing, stiffness, redness) were observed. Legal
values are Y
,
N
, or
U
, meaning "Yes",
"No", and "Unknown" (or Uncertain, or Unspecified),
respectively.
This column may not be NULL
.
Comments or descriptive notes about this wound/pathology.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for each instance where an observer provides an update on a report. These updates discuss how the wounds/pathologies have healed (or possibly how they haven't healed), so these updates are generally referred to as "heal updates".
Heal updates may be very specific, referring to a
particular body part (a WP_AFFECTEDPARTS
row). They may be a bit more vague and refer only to a
particular wound/pathology (a WP_DETAILS
row). They may even be so vague that the identity of the
report (a WP_REPORTS row) being updated is
the only "known" datum. To flexibly accommodate this
variation, this table includes the WPRId, WPDId, and WPAId columns. These columns allow
the recording of the report being updated, the particular
wound/pathology from that report, or the body part affected by
that wound/pathology, respectively. In each row of this table,
one of these columns (WPRId,
WPDId, and WPAId) must not be NULL
, and the
others must be NULL
.
The Date in this table must be on or after the associated report's Date.
A heal update may indicate that an individual is
missing, or presumed dead. For this reason, the Date may be after the individual's
Statdate. However, the system will
send a warning when the Date
is more than
90
days after
the individual's Statdate.
When wounds/pathologies are especially severe or
life-changing, heal updates may continue for years after the
related Date. However, these are
rare. The system will return a warning when a Date is more than
365
days after its
related Date.
Use the WP_HEALS view instead of this table. It presents the data in a format more hospitable for humans to read, and performs the somewhat-tricky task of joining the different ID columns to their respective wound/pathology tables.
A unique identifier for the row. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The WP_REPORTS.WPRId of the related wound/pathology report, if a more specific indicator is not known.
The WPDId of the related wound/pathology cluster, if known and if the related body part being updated is not known.
The WPAId of the related body part, if known.
The WP_HEALSTATUSES.HealStatus indicating how well the related wound(s)/pathology(ies) have healed.
This column may not be NULL
.
Textual notes about the healing (or lack thereof) in this update.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records the observers of the wound/pathology reports, one row for each observer. When a report has multiple observers, each of them is recorded in this table in a separate row.
Each WPRId-Observer combination must be unique; a report cannot have the same observer more than once.
Use the WP_REPORTS_OBSERVERS view to insert data into this table. It provides a simple way to determine the appropriate WPRId value to use, and for a human data enterer to provide multiple observers in a single row.
Use the WP_HEALS or WOUNDSPATHOLOGIES views to select data from this table. These views include related data from the other wound/pathology tables, respectively with and without related healing updates.
A unique identifier for the row. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The WP_REPORTS.WPRId of the related report.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
Records each distinct report of wounds/pathologies for an individual. When a wound or pathology is first seen, field observers usually report it with a specialized form that helps systematically report various pieces of pertinent data. These data may include but are not limited to which kind(s) of wound/pathology was observed, which body part(s) was affected, wound size (if applicable), and updates on later dates describing how well the individual has healed. These reports may describe something small like a single scrape or limp, or something larger like a set of bleeding wounds on several body parts. This table contains one row for each of these wound/pathology "reports".
It is difficult to provide a precise definition of a "report" in this sense. The aforementioned specialized forms are not always used, so a "report" does not always refer to these forms. Some wounds may be especially serious or life-altering, and pathologies may be chronic or reoccuring, meaning that some wounds/pathologies may recur throughout life in several reports. Because of this, each "report" is not necessarily a distinct "instance" of wound/pathology. Frankly, the distinction between "reports" made in this table is mostly artificial, useful for bookkeeping but lacking biological relevance. In general, each "report" is a discrete observation of wounds/pathologies for an individual on a specific date. A "report" may be an elaborate form, a brief note, or something in between.
Each combination of Sname,
Date, and non-NULL
Time must be unique; an individual can
have multiple reports on the same date, but not at the same
time.
The Date must be between the individual's Entrydate and Statdate, inclusive; the individual must be alive and in the study population when the report was created. The system will return a warning if the Date is before theindividual's LatestBirth.
The Grp indicates the group written on the form by the observer. For a variety of reasons (e.g. immigrations, group fissions/fusions), the Grp column may be different from the individual's Grp on this Date. Because of this, validation of the Grp column is limited: the Date must be on or after the group's Start date. The system will return a warning when a report occurs after its Grp has ceased to exist; that is, the system will return a warning when the report's Date is after the group's Cease_To_Exist.
The system will return a warning for any WP_REPORTS rows that do not have at least one related row in WP_DETAILS.
When adding new data to this table, use the WP_REPORTS_OBSERVERS view. It simplifies the process of adding multiple observers to WP_OBSERVERS.
Use the WP_HEALS or WOUNDSPATHOLOGIES views to select data from this table. These views include related data from the other wound/pathology tables, respectively with and without related healing updates.
A unique identifier for this report. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
A unique identifier for this report, generated by data management.
This column may not be NULL
.
The date that the wounds/pathologies in this report were first observed.
This column may not be NULL
.
The time that the wounds/pathologies in this report were first observed, if known.
This column may be NULL
, when the time is
unknown.
The BIOGRAPH.Sname of the individual whose wounds/pathologies are described in this report.
This column may not be NULL
.
The GROUPS.Gid of the group in which the individual was located when the wounds/pathologies were recorded, according to the observer(s).
This column may not be NULL
.
Comments or descriptive notes about the wounds/pathologies from the observers on initial observation.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
Records of the initiation and cessation of continuous periods of observation during which all of a female's cycling events are presumed, for the purpose of analysis, to have been observed. This table contains one row for each female for each initiation or cessation of a continuous period of observation.
A female is considered to be under continuous observation when all of her sexual cycle transition events -- Mdates, Tdates, and Ddates -- are observed or clearly implied by observational data.[57] When CYCGAPS contains a record of observation cessation this is an indication that some of a female's sexual cycle events have gone unrecorded. For this reason when the interval enclosed by a Mdate, Tdate, Ddate sequence contains CYCGAPS rows indicating interruption of observation, the sexual cycle transition dates to either side of the interruption must be in different sexual cycles. For further information on this and other ways CYCGAPS interacts with the rest of Babase, see the documentation on the CYCLES, CYCPOINTS, PREGS, and SEXSKINS tables.
The presumption is that females are under continuous
observation -- females with no CYCGAPS are presumed to be
under continuous observation. Consequently a female's
earliest CYCGAPS Code must be
E
(End), denoting the end of a
period of observation.
A female may not have two “start of
observation” (Code S
)
without an intervening “end of observation” (Code
E
), or vice versa. Otherwise
there would be starts without ends or ends without starts.
Single day observation rows ("points", Code
P
) may only occur between an
end of observation/start of observation pair of rows. There
must be a 1-day interval between a female's CYCGAPS rows, with
the single exception that an end of observation may be dated
the day after a start of observation. Otherwise the same
pattern of observation could be recorded using fewer
rows.
Rows with a Code value of
S
(Start) or
P
(single Point), that mark the
beginning of observational periods or that represent isolated
single days of observation, must have a value in the State
column. All other rows, those with a code of
E
(End) that represent the end of
an observational period, must have no value (NULL
) in the
State column. When a State value is present, it must
correspond to the sexual cycle transition information on CYCPOINTS. For further information regarding
required correspondences between CYCGAPS and CYCPOINTS, and how changes in CYCPOINTS can automatically change CYCGAPS with a
Code of S
, see the
CYCPOINTS documentation below.
To simplify updates to this table, all of the above conditions are validated on transaction commit.
Any changes to the Date or
Code — including
UPDATE
and all INSERT
and DELETE
commands — cause
cascading updates to the CYCGAPDAYS table
upon transaction commit. However, the
validation for several other tables — especially CYCPOINTS — depends on the accuracy of
CYCGAPDAYS. As a result, transactions
involving simultaneous updates to both CYCGAPS and CYCPOINTS may result in spurious data, because
validation on the latter may not be reliable. Therefore,
when making changes for a given individual to both CYCGAPS
and CYCPOINTS, don't do them in the same
transaction. Specifically, CYCGAPS inserts, updates, or
deletes should be performed in a transaction where no other
tables are affected[58][59].
Only females may have CYCGAPS rows.
This table is used in the construction of the sexual cycle day-by-day tables. It also affects the determination of which sexual cycle events (CYCPOINTS) are part of a single sexual cycle (CYCLES), the construction of automatic Mdates, and the validation of sexual cycles with respect to pregnancies.
The State value is ignored in all a female's CYCGAPS rows with Dates on or before the female's Matured, excepting the row with the latest date, as the sexual cycle day-by-day tables contain no rows before the date of sexual maturity.
The combination of Sname and Date is unique.
All rows must be while the individual is alive. That is, the Date must be on or after the individual's Birth and on or before her Statdate.
The short name of the female. This column should
contain the Sname of a female
in BIOGRAPH. This column may not be
NULL
.
To simplify the database code, this value may not be changed.
What kind of endpoint the date records. Legal values are:
Code | Mnemonic | Definition |
---|---|---|
S | Start | the date is the start of a period of observation |
E | End | the date is the end of a period of observation |
P | Point | the date is an isolated observation that belongs with no other observations, it is both a start and an end of an observational period |
The state of the female's sexual cycle on the given date. Valid values are:
Code | Mnemonic | Definition |
---|---|---|
M | menses | follicular -- Mdate (inclusive) to Tdate (exclusive) |
S | swelling | follicular -- Tdate (inclusive) to 5 days prior to Ddate (exclusive) |
O | ovulating | 5 days prior to Ddate (inclusive) to Ddate (exclusive) |
D | deturgesence | luteal -- Ddate (inclusive) to Mdate (exclusive) |
P | pregnant | Ddate (exclusive) to birth (exclusive) |
L | lactating | birth (inclusive) to Tdate (exclusive) |
Must not be NULL
when Code is
S
or
P
, must be NULL
when code
is E
. See discussion in the
table description above.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records information on the sexual cycle of the females, one row per female per cycle.
The user may find it easiest to understand the function of this table by considering the CYCPOINTS_CYCLES view, which joins the CYCLES and CYCPOINTS tables.
Rows in this table depend upon rows in CYCPOINTS: Babase automatically manages the creation and destruction of CYCLES on the basis of the the sexual cycle transition events recorded in CYCPOINTS. The fundamental sexual cycle is a Mdate, Tdate, Ddate sequence. The Babase system automatically creates one row in CYCLES for every Mdate, Tdate, Ddate series in CYCPOINTS. Similarly, if an Mdate, Tdate, Ddate series is removed from CYCPOINTS, the corresponding row is removed from CYCLES. However, the rules Babase uses when automatically creating, destroying, or updating CYCLES are complicated by menarche, death, and gaps in observation.
In some cases there are turgescences of small size and short duration that typically occur during a pregnancy, prior to maturity, after a span of time spent in postpartum amenorrhea, or during a time of stress. These brief turgescent periods are not recorded as cycles because they were deemed too brief or small to be biologically functional.
If it ever becomes desirable to know when these brief turgescences occurred, this information should be recoverable from the SEXSKINS table, where the actual size of a female's turgescence is recorded.
CYCLES is special in that some of its data are automatically maintained by the system. The columns Seq and Series are updated automatically. For further information see the documentation that follows, and each column's documentation.
CYCLES rows should always have related CYCPOINTS rows[60], but as a practical matter it is necessary to create the CYCLES row before creating the related CYCPOINTS rows. This requires noting the Cid of the new cycles row so that it can be referenced in the new CYCPOINTS rows. Rather than do this by hand the CYCPOINTS_CYCLES view can be used. This allows a Sname to be specified with each new CYCPOINTS row and leaves it up to the system to either find or create an appropriate CYCLES row.
The system will report as an error those rows on CYCLES
with no related CYCPOINTS rows[61]. CYCLES with no related CYCPOINTS must have a NULL
Seq.
The aggregation of CYCPOINTS rows into cycles is automatically managed by Babase. The determination is based on the order in time of a female's CYCPOINTS rows and the information on gaps in observation present in CYCGAPS. The transition events recorded in CYCPOINTS are collected into sexual cycles, each cycle having (at most) an onset of menses date (Mdate), an onset of turgesence date (Tdate), and an onset of deturgesence date (Ddate), appearing in the order given here when ordered by date, and with none of the female's other Mdate, Tdate, or Ddate CYCPOINTS rows on the interval. Some sexual cycles may lack one or more of the transition events. This may occur for biological reasons — there must not be a resumption of menses date (Mdate) in an individual's first adolescent cycle, nor in the firct cycle after a pregnancy — or simply because there are no data available to identify the date(s). In the latter case, CYCGAPS should be updated with a record of the gap in observation and the respective row is omitted from CYCPOINTS.
Part of Babase's automatic management of cycles is the
management of cycle sequence numbers. Babase assigns a
sequence number (Seq) to each of a
female's cycles, beginning with 1
at
menarche and counting up.
As a consequence of the numbering scheme, the sexual cycle
with a sequence (Seq) of
1
must not have an onset of menses date
(Mdate).
Gaps in periods of continuous observation (CYCGAPS) impact Babase's determination of what constitutes a cycle. The presence of a gap in observation forces a change in cycle. (However, gaps in observation, missing cycles, do not cause gaps in the sequence numbering.) The introduction or removal of a gap, or for that matter the addition or removal of new CYCPOINTS rows, can result in the split of an existing cycle into two -- the creation of a new CYCLES row --, or the merging of two previously distinct cycles into one -- the destruction of an existing CYCLES row. When this occurs the later CYCPOINTS rows retain their Cid, it is the earlier CYCPOINTS rows that change their Cid and “move” between cycles.[62][63][64]
The sexual cycles themselves are aggregated
into periods of continuous observation, termed series,
indicated by the assignment of a Series number to each CYCLES row. The aggregation of a female's sexual
cycles into a series is also automatically managed by Babase,
based on the information in CYCGAPS.
Although series are computed based on CYCGAPS, the series value aggregates and numbers
sexual Mdates, Tdates, and Ddates, not periods of observation.
A consequence is that some periods of observation may not have
an associated Series number. Some
observational periods may occur before the female's sexual
maturity date or before any recorded sexual cycle transition
events (CYCPOINTS). An individual's first
period of continuous observation containing Mdates,
Tdates, or Ddates has a Series of 1
, the second
a Series of 2
,
etc.
Aggregating a female's CYCLES rows into a series indicates that the collection of data points is believed to be complete, no unobserved or unrecorded sexual cycle transitions (CYCPOINTS rows) occurred during the time spanned by the series. This allows the Series to be used as the basis for an analysis of sexual cycle transition intervals.
Those CYCLES with a Series of 1
for
those females that have an O
(On)
Mstatus have Seq values that
equal the ordinal numbering of the female's actual cycles,
her first ever cycle having a Seq of 1
,
her second a Seq of 2
, etc. All other
CYCLES rows have Seq values that are useful for ordering
each female's cycles but not for comparison between
females.
Because a gap in observation always triggers a change in cycle, and because cycles must be “complete”, i.e. must contain a Mdate, a Tdate, and a Ddate, if there is no gap in observation it is impossible to have a single cycle missing nothing but a Tdate, i.e. it is impossible to have a cycle with a Mdate and a Ddate but no Tdate. If necessary, an estimated Tdate may be entered to work around this limitation.[65]
The system reports an error when the combination of Sname and Seq is not unique.[66]
A numeric identifier identifying each sexual cycle. It is unique across all cycles of all females.
This column need not be manually specified when the row is created.
The value of this column may not be altered after a row is created.
This column must not be NULL
.
The short name of the female. This column must contain the Sname of a female in BIOGRAPH.
The value of this column may not be altered after a row is created.
This column must not be NULL
.
The first sexual cycle of a female has a Seq value
of 1
, the second a value of
2
, etc.
The system will report an error if the Seq does not begin
with 1
or is not contiguous. This
column does not need to be manually maintained.
There are no gaps in the sequence numbers assigned to a female. Even when records of cycles are missing, the first recorded cycle after the missing period has a sequence one greater than the last recorded cycle before the missing period.
If the user does specify a value for this column the system may recompute and replace the supplied value at any time.
This column may be NULL
when the row is first
inserted, so that the system can set the value correctly
when CYCPOINTS are subsequently
inserted, but it may not be changed from a non-NULL
value to NULL
.
Number indicating with which series of continuous observation the
transition event belongs. Events that are isolated
observations have a series of their own. As with Seq, the
Series are per-female. Each female begins with a Series
of 1
and is incremented with
each interruption in regular observation. For further
information see the description of the CYCLES table
above.
The system will report an error if the Series does
not begin with 1
or if the Series does
not progress in a contiguous fashion. This column does
not need to be manually maintained.
If the user does specify a value for this column the system may recompute and replace the supplied value at any time.
This column may be NULL
when the row is first
inserted, so that the system can set the value correctly
when CYCPOINTS are subsequently
inserted, but it may not be changed from a non-NULL
value to NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records information on the sexual cycle of the females, one row per female per event.
The user may find it easiest to understand the function of this table by considering the CYCPOINTS_CYCLES view, which joins the CYCLES and CYCPOINTS tables.
The usual events that mark the transitions of a female
baboon's sexual cycles are onset of menses (Mdate), onset of
turgesence (Tdate), and onset of deturgesence (Ddate). These
different transition event dates are distinguished by Code values of
M
,
T
, and
D
respectively. In
addition to these usual observations of transition states,
CYCPOINTS contains
one other
kind of row,
estimations of when unobserved sexual cycle transitions
occurred; notably the automatically calculated onset of menses
dates but also unobserved onset of deturgesenceses (Ddates)
related to pregnancy conception events[67].
The unusual events that impact female cycling records, notably death and the cessation or initiation of long term observation, are recorded in other tables.
The interval between conception and birth (or fetal death) is the length the pregnancy, by definition, and CYCPOINTS is only place in Babase where conceptions are recorded. For this reason CYCPOINTS includes rows for the Ddate events that begin every pregnancy, including those that record estimated, unobserved, Ddates. It may be that all that is known about a cycle is that a Ddate must have occurred because a pregnancy resulted.
Although Babase requires pregnancies to have a
conception Ddate, and consequently there may be pregnancies
for which an estimated (Source of
E
) Ddate must be entered,
there is nothing preventing the user from creating estimated
CYCPOINTS rows for the other Codes.
CYCPOINTS is special in that some of its data are automatically maintained by the system. The Cid and Source columns can be updated by automatic processes. For further information see the documentation of the CYCLES table and each column's documentation.
The presence of a Ddate row can trigger the automatic generation of a Mdate 13 days later. For further information see the section on Automatic Mdate Generation.
Only Mdates are automatically assigned, and only Mdates
may have a Source of A
(Automatic).
Mdates may be manually given a Source of
A
, although this may well not be a
good idea as the Automatic Mdate Generation
process may remove the A
row at any
point. It is even less of a good idea because automatic
Mdates are not validated, so it is quite
simple to enter an invalid automatic Mdate.
During a period of continuous observation — a series — sexual cycle transition events (CYCPOINTS) should not be missing, except that Mdates cannot be assigned in the case of the first adolescent cycle (at maturity) or at the start of a Resume cycle. An individual's Mdates, Tdates and Ddates should all appear in Mdate-Tdate-Ddate order. The system will report an error if this is not the case.[68] In consequence the combination of Cid and Code must be unique.[69]
Usually a female does not have multiple CYCPOINTS rows for a given date, although there is an exception. A female's onset of menses date (Mdate) may be the same as her onset of turgesence (Tdate) date. Otherwise, none of a female's CYCPOINTS rows may share a date.
Babase allows each sexual cycle transition event to be
associated with 3 dates, the date of record (Date), the
earliest possible date (Edate),
and the latest possible date (Ldate). The earliest (Edate) and latest
(Ldate) possible dates may be NULL
. The earliest possible
date (Edate) may not be later than the date of record (Date),
and the latest possible date (Ldate) may not earlier than the
date of record (Date). A female's earliest Tdate may, and
likely will, have an earliest possible date (Edate) assigned
that is before onset of menarche.
A number of constraints on CYCPOINTS involve the
females' sexual maturity dates (MATUREDATES.Matured). When an individual's sexual
maturity date is determined by observation, MATUREDATES.Mstatus
is O
(On), her earliest Tdate must be
equal to her sexual maturity date.
When a female's MATUREDATES.Mstatus is
O
(On) her MATUREDATES.Matured is automatically set to her
earliest Tdate. Any error in the Tdate value will be
reflected in the maturity date. This is not true of females
with MATUREDATES.Mstatuses that are not
O
. These maturity dates must be
manually maintained.
No date-of-records may occur before a female's maturation date. All of an individual's date-of-record (Date) and late (Ldate) sexual cycle transition date values must be on or after the individual's onset of menarche date (MATUREDATES.Matured). All of an individual's early dates (Edate), Bdates of record (Date), and the first Tdate date-of-record (Date), sexual cycle transition dates must be after the individual's Birth date.
Females with CYCPOINTS rows must have a sexual maturity date. The system will report mature females with no CYCPOINTS rows on or after her maturity date (MATUREDATES.Matured).
All early date (Edate) and date-of-record (Date) values must be on or before the individual's Statdate.
Even when an individual is dead, late (Ldate) dates may be after the Statdate. This is because death is rarely observed; although the Statdate contains a single date, the uncertainty surrounding the date of death is reflected in the sexual cycle event Ldate.
There are gaps in observation. If the first cycling
event in a series -- the first Mdate, Tdate, or Ddate -- falls
on the day observation resumes then things are pretty simple.
The state of sexual cycling at the time observation resumes,
CYCGAPS.State,
must correspond with the event. For a menses CYCGAPS.State is
M
and so forth. The situation
is slightly complicated by the swelling-follicular and
ovulating states. The details are this: If the first
CYCPOINTS row in the series falls on the first day of the
series, the CYCGAPS.State must be
M
(Menses, follicular) when
the CYCPOINTS.Code is M
(onset of
Menses); CYCGAPS.State must be
D
(Deturgesence) when
the CYCPOINTS.Code is D
(onset of Deturgesence); CYCGAPS.State must be
S
(Swelling, follicular)
when the CYCPOINTS.Code is T
(onset of Turgesence) and the subsequent Ddate in the series
is more than 5 days after the Tdate or there
is no subsequent Ddate; and CYCGAPS.State must be
O
(Ovulating) when the
CYCPOINTS.Code is T
(onset of
Turgesence) and the subsequent Ddate in the series is not more
than 5 days after the Tdate.
If the above is not the case, i.e. the first cycling
event in the series falls on the day observation resumes and
CYCPOINTS.Code
is M
but the CYCGAPS.State is not,
then the State of the CYCGAPS row is automatically changed to enforce
correspondence between CYCGAPS and
CYCPOINTS.
But what if observation starts and then later the first Mdate, Tdate, or Ddate is observed? What happens (to CYCSTATS) between the start of observation and the first event? That's what CYCGAPS.State is supposed to address and it needs to be set appropriately. This cannot always be done automatically either, although usually it can.
If the first CYCPOINTS row in the series does
not fall on the first day of the series,
the CYCGAPS.State
must be D
(Deturgesence)
when the first CYCPOINTS.Code is
M
(onset of Menses); the CYCGAPS.State must be
S
(Swelling, follicular)
when the CYCPOINTS.Code is
D
(onset of Deturgesence)
and the CYCPOINTS.Date is more than 5 days
after the CYCGAPS.Date; and the CYCGAPS.State must be
O
(Ovulating) when the
CYCPOINTS.Code is D
(onset
of Deturgesence) and the CYCPOINTS.Date is not more than
5 days after the CYCGAPS.Date.
In these cases, as before, the State of the CYCGAPS row is automatically changed to enforce correspondence between CYCGAPS and CYCPOINTS.
The final set of possibilities have to do with Tdates,
which are complicated because they occur at menarche and after
pregnancies, as well as after menses. The system will report
an error if the first CYCPOINTS row in a series does not fall
on the first day of the series and the first CYCPOINTS row is
a Tdate and the CYCGAPS.State is something other than
M
(Menses),
P
(Pregnant), or
L
(Lactating). Because
there are 3 possibilities in this case, the CYCGAPS.State value is
not automatically assigned.
All of the validation and possible updating of the CYCGAPS.State is performed on transaction commit.
Because deleting CYCPOINTS changes a female's cycling state -- a representation of which Babase keeps in the sexual cycle day-by-day tables -- but not the interval of time during which she was under observation (CYCGAPS), removing Mdates, Tdates, or Ddates from CYCPOINTS at the beginning of a series can, possibly, leave the beginning of the series either in an incorrect state or the correct state for an overly long period of time. This can be equally true when the dates of the first CYCPOINTS in a series are changed. Removing all the CYCPOINTS Mdate, Tdate, and Ddate rows from a series will leave the entire observational period in the State specified by the CYCGAPS row that denotes the start of the observational period. This may or may not be correct, especially when the CYCGAPS.State was automatically changed due to the insertion or deletion of CYCPOINTS rows.
When deleting all sexual cycle transition CYCPOINTS rows from an observational period it is best to delete them all in a single transaction, or to delete later rows before earlier rows. Deleting CYCPOINTS rows from the beginning of the observational period changes the CYCGAPS.State value marking the start of the observational period.
CYCPOINTS rows must not fall in an
interval of no observation, excepting estimated (Source is
E
) Ddates (Code
D
) that are also conception
events. (See PREGS.Conceive.) None of the different kinds of
date values -- early (Edate), date-of-record (Date), or late
(Ldate) -- of the individual's CYCPOINTS rows may be in an
interval during which the individual is not under observation
--
may fall on a date on which the individual has a row in CYCGAPDAYS.
The system will allow but report as an error CYCPOINTS rows
with a Source of E
and a Code
of D
that are not
referenced in PREGS.Conceive.[70]
CYCPOINTS and CYCLES are intimately related. Be sure to read and understand the CYCLES documentation.
Once a row is created it must remain associated with the same female -- any re-assignment of Cid must retain the association between the CYCPOINTS row and the old Cid's female.
There are plans afoot to automatically fill in the early and late dates. The early dates would include the day after the immediately prior census date, the late date would be the day of the immediately following census date. There must also be a mechanism for manually overriding the automatic dates.
When making changes to data for individuals with observation gaps, avoid updating this table in a transaction that also makes changes to CYCGAPS. See above for more information.
A numeric identifier unique to each row. This is
used to reference the sexual cycle transition elsewhere in
the database. This column may not be NULL
.
This column need not be manually assigned when the row is created. It may not be changed.
A numeric identifier identifying each sexual cycle. It is unique across all cycles of all females, but shared by all CYCPOINTS rows comprising a cycle -- a Mdate, Tdate, Ddate sequence -- of a female.[71]
This column need not be manually specified when the row is created using the CYCPOINTS_CYCLES view. If it is not specified, the system will determine with which cycle the row should be associated and assign the correct Cid. Should the system find that the sexual cycle transition date belongs in a new cycle, it will make and assign a new Cid.[72] If the column is specified the system does the same work, but when it is appropriate to create a new cycle the supplied value is used.
As the system does the same amount of work whether or not the user specifies a value, the only utility in specifying a value is to manually assign a specific Cid to a new sexual cycle which Babase would otherwise automatically create.
When sexual cycle transition dates are incorrectly aggregated into sexual cycles, i.e. when the Cid is wrong, it is probably because the record of when the female was under observation — the data on the CYCGAPS table — is incorrect. Correcting CYCGAPS may correct the problem.
The system automatically assigns, or re-assigns, Cid values as CYCPOINTS and, especially, CYCGAPS rows are inserted, deleted, and altered to keep the database in a state consistent with the definition of a sexual cycle. For this reason any particular Cid is not guaranteed to forever identify a particular Sname/Date/Code. Cpids may be used for this purpose, or the data itself. For further information see the CYCLES documentation.
Supplying a NULL
value causes the system to
recompute the correct value and use it in place of the
NULL
.
The date-of-record of the transition event. See the
Protocol for Data
Management: Amboseli Baboon Project for information regarding the determination
of this date from the field data. This column may not be
NULL
.
Earliest possible date of the transition event. This
column may be NULL
when there is no need to record a
range of date values.
Latest possible date of the transition event. This
column may be NULL
when there is no need to record a
range of date values.
Code indicating from whence the data were
derived. D
(Data -- the
default) for observed data. A
(Auto) for automatically inserted rows (see Automatic Mdate Generation). E
(Estimated) for estimated values not to be used in other
computations, such as estimated
D
dates entered to
relate mothers and pregnancies.
This column may not be changed after the row is created.
The type of sexual cycle transition:
Code | Description |
---|---|
M | onset of Menses, a sexual cycle transition event |
T | onset of Turgesence, a sexual cycle transition event |
D | onset of Deturgesence, a sexual cycle transition event |
This column may not be changed.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for each recorded pregnancy. A pregnancy is defined to be an event occurring to a mother; a single pregnancy could result in more than one fetus. The only time there will not be a related BIOGRAPH row for the zygote(s) is when the pregnancy is still in progress[73], otherwise there will always be a BIOGRAPH row that records the progeny of the pregnancy.
The progeny may not be born before being conceived -- the conception date (Ddate via Conceive) of the pregnancy must not be later than the birth date value (Birth) of the associated BIOGRAPH row, the child. The mother may not resume cycling until after birth -- the birth date value of the associated BIOGRAPH row must not be later than the resumption of cycling date values (Resume).
The sequence of a female's pregnancies when ordered by parity must correspond with the sequence when ordered by conception date.
The sequence number (CYCLES.Seq obtained via CYCPOINTS.Cid) of the sexual cycle event immediately following pregnancy (Resume) must always be exactly one more than the sequence number of the sexual cycle event associated with conception (Conceive). Only one pregnancy is allowed per conception event -- each Conceive value differs from all the others. These rules ensure that the resumption date follows the conception date and that there is no overlap of pregnancy time periods, from conception date to birth date or, if known, resumption of sexual cycling date, among the pregnancies associated with a particular female.[74] The female associated with the conception sexual cycle event (Conceive) must be the same as the female associated with the sexual cycle event immediately following pregnancy (Resume).
There must not be a resumption of menses date (Mdate) in the sexual cycle (CYCPOINTS.Cid) of the Resume cycle.
The pregnancy must terminate in a birth or fetal loss
before the female resumes cycling; the only exception is
cessation of observation as described below. The Resume
column must be NULL
until there is a row in BIOGRAPH with a Pid
referring to the pregnancy.
Note that the check for pregnancy termination, as well as the parity sequence checks, are not performed until the database transaction is committed. This allows a pregnancy discovered after subsequent pregnancies are already on-record to be added to the database by making multiple changes within a single database transaction. Inserting the new PREGS row, inserting a BIOGRAPH row for the progeny, and then updating the PREGS.Resume of the new pregnancy within a single transaction allows the referential integrity rules to be satisfied when the transaction commits.
Babase keeps a record of the reproductive state of mature females in the sexual cycle day-by-day tables. If these tables are to be correct Babase must know when each pregnancy ends (see BIOGRAPH.Birth), and when cycling resumes. When there is no record of the end of a pregnancy or resumption of cycling Babase must know whether this is due to cessation of observation or just cessation of data entry.
Babase cannot detect when the user has failed to enter rows in CYCGAPS when observation of a pregnant female has ceased. However, it will report errors and unusual conditions it can detect.
The system will report a warning: when an ongoing pregnancy exceeds 191 days -- when there are more than 191 days between the conception date (PREGS.Conceive) and the Statdate, and there are no progeny recorded for the pregnancy (in BIOGRAPH.Birth), and when there are no gaps in observation (see CYCGAPS) during the 191 day interval; when it appears that a conception date should be estimated but it is not -- when there is no Tdate in the conception cycle but the conception Ddate[75] is not estimated, and there is no gap in observation between the conception date and all of the female's prior CYCPOINTS rows.
The system will report an error: when a female has
sexual cycles while a pregnancy is ongoing[76] -- when the female has Tdate CYCPOINTS rows that post-date her pregnancy's
Conceive date but pre-date gaps in observation, and the
pregnancy has no (NULL
) Resume.[77] A female must not have any CYCPOINTS rows that postdate a pregnancy with a
NULL
Resume, unless the first CYCPOINTS
row is a Tdate or unless they postdate a gap in observation
following the pregnancy.
The Resume column is automatically updated by
Babase. so long as there is no gap in observation (See CYCGAPS) between the conception date and the
Tdate that resumes cycling. It is set to the Tdate
immediately following the conception date. The system will
report an error if there is a gap in the observation of
sexual cycle events (CYCPOINTS and the
Resume column is not NULL
.[78]
The temporary creation of a gap in observation (CYCGAPS) allows a conception-birth-resumption sequence to be inserted into a pre-existing series of sexual cycle events (CYCPOINTS).
The contents of this column uniquely identifies the pregnancy record. The Pid must be the mother's Sname followed by the probable parity. Because the Pid is only used to identify the record, it is not necessary to change the Pid just because the parity of the pregnancy is found to have changed. Once a unique Pid is established, it may not be changed. When retrieving data from this table the safe approach is to assume nothing about the contents of this column except that it will uniquely identify a pregnancy.
The preferred way to obtain the bearer of the pregnancy is to find the female associated with the ovulation by joining PREGS.Conceive with CYCPOINTS.Cpid to find CYCPOINTS.Cid, join that with CYCLES.Cid to find CYCLES.Sname, and then use that value to find the mother's BIOGRAPH row.[79][80]
The Parity column must always be used to obtain a meaningful parity value. As Pids cannot change, should a pregnancy be missed and correction only entered into Babase after the entry of a subsequent pregnancy, the female's subsequent Pid will forever contain an incorrect parity.[81]
The cardinality of the
pregnancy. 1
for a female's first
pregnancy, 2
for a female's second
pregnancy, and so forth. There must not be
“gaps” in the pregnancies, sequenced by
Parity, of any female. When the first pregnancy is known,
the Parity sequence begins with 1
. When
the first pregnancy is not known, the Parity sequence
begins with
101
.
The parity of a female's first pregnancy must be specified. This tells the system whether the parity sequence begins with 1 or 101. The system will automatically generate the parity of subsequent pregnancies, when the user does not supply a parity. When the user does specify a parity the system compares the supplied value with the value it computes for the column and and raises an error if the two do not match. As a special exception the parity is allowed to be in the 100s rather than the 1s, although the parity must remain sequential and without gaps when only the 10s and 1's place of the female's pregnancy parities are considered. E.g. the parity sequence may be either 1, 2, 3 or 1, 2, 103 but not 1, 2, 104. The 1 in the 100ths place signals that there has been a period of no observation[82] and a pregnancy may have been missed. When a pregnancy's parity is changed from the 1's (or 10's) to the 100s Babase will update the parity of subsequent pregnancies so that they are also in the 100s. Babase will only allow a change from the 100s to the 1s (or 10s) of the smallest of a female's pregnancy parities that are larger than 100 -- the first pregnancy after a period of no observation. In this case Babase will not change the parity of subsequent pregnancies; this must be done manually, from smallest to largest. Babase will not allow a change from the 100s to the 1s (or 10s) of a female's pregnancy parities that are larger than the smallest parity larger than 100.
Supplying a NULL
value for the Parity causes the
system to recompute the correct value, a value one larger
than the parity of the previous pregnancy, and use it in
place of the NULL
.
The information related to the Ddate event that initiated the pregnancy. This is the Cpid of a CYCPOINTS row of the mother. The related CYCPOINTS row should record the date of conception and must record a Ddate.
This column must contain a unique datum.
When the date of conception is estimated because there is no sexual cycle data, the conception date recorded should be 178 days before the recorded birthday.
This column must not be NULL
.
The resumption of cycling event (Tdate) of the first
cycle following the pregnancy. This is the Cpid of a row in CYCPOINTS, which must record a Tdate. This
column may be NULL
in those cases when resumption of
cycle information is not known. When this column is not
NULL
, it should contain a unique datum.
This column may be automatically updated. (See the description of the PREGS table above.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records textual notes made by field observers about a female's reproductive status. It contains one row for every date on which such a note was recorded, per female.
Most of the data related to a female's reproduction is recorded systematically by observers and stored in the other tables in this section. In addition to those data, observers occasionally record miscellaneous notes or comments related to a female's reproductive state. Those notes are recorded in this table.
This table only records notes about female reproduction; the Sname must be female in BIOGRAPH.
All notes made about a female on a single day are recorded in a single row; every Sname-Date pair must be unique.
Reproductive notes can only be recorded while the female is alive and under observation; the Date must be between the female's Entrydate and Statdate, inclusive.
It is rare but possible for a note to be recorded before a female reaches sexual maturity. The system will return a warning for rows that are before a female's Matured date, or for rows with females who do not appear in MATUREDATES at all.
Usually, if an observer took the time to write a note about a female, then they also will have recorded the size and/or color of her paracollosal skin. The system will return a warning if a female does not have a row in SEXSKINS whose Date matches the note's Date.
The SEXSKINS_REPRO_NOTES view is useful for simultaneous uploading of data to this table and to SEXSKINS.
A unique identifier for the note. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The text of the note.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records information on the females' sexskins, including size and/or color. It contains one row for every recorded observation of each female's sexskin.
Babase requires sexskin measurements be associated with sexual cycles (CYCLES) in accordance with the rules described in the Sexual Cycle Determination section.
Because sexskin measurements must be related to a female's sexual cycle (a CYCLES row), her Mdate, Tdate, and Ddate sexual cycle events (her CYCPOINTS rows) must be updated before sexskin information may be entered.
Use the CYCLES_SEXSKINS, SEXSKINS_CYCLES, or SEXSKINS_REPRO_NOTES views to maintain this table.
The checks that compare all the sexskins of a particular cycle raise their errors immediately when the error is a result of changes made directly to the SEXSKINS table. But, should an error condition be created as a result of automatic shifting of sexskins between cycles due to changes to the sexual cycle dates (See CYCPOINTS) the errors are not immediately reported.
Tdates normally occur at some point during the
transition from sexskin Size 0
to Size
1
, but can
occur during the transition from sexskin Size
0
to Size
5
. Measurements
larger than 5 cannot come on or before
the Tdate of the cycle. The system will generate a warning
when there is sexskin measurement larger than
1
before the
Tdate. The Tdate of a cycle must be after the dates of all
the cycle's sexskin measurements of zero that precede the
earliest 1 or greater measurement occurring in the
cycle.
A Ddate occurs when the sexskin begins to deturgesce. The Ddate of a cycle must be after the last measurement before the largest measurement of the cycle.[83] The system will report a warning when Ddates occur after sexskin turgesence has begun to subside -- Ddates after the first measurement following the largest sexskin measurement(s) of the cycle.
Sexskin turgesence normally begins after menses, so
sexskin measurements (the Size)
before the related cycle's Mdate cannot be larger than
0
. When the Size is greater than
0
and there is no
Mdate in the sexual cycle to which the SEXSKINS row is
assigned, the system will generate an error unless the sexual
cycle's Tdate falls on the individual's MATUREDATES.Matured
date and the maturity date is an “ON”
date[84], or the cycle is the first after a pregnancy (the
Cid is a PREGS.Resume value), or the cycle's first CYCPOINTS row after a (CYCGAPS)
gap is 30 or fewer days after
that gap's end date. In the latter case the system will
generate a warning. The sexskin
measurement on the Mdate cannot be larger than
1
, unless the Mdate is
also a Tdate in which case the measurement cannot be larger
than 5
. The system will
generate a warning when the sexskin measurement on the Mdate
is larger than
0
.
In constrast with the Size column, the Color column has no rules governing which values are allowed during different stages of a cycle.
Sexskin rows associated with one cycle must not be contemporaneous with Mdates, Tdates, Ddates, or sexskin turgesence observations related to a different cycle. All of the SEXSKINS Date values associated with a particular cycle must be later than the Mdate, Tdate, and Ddate of the previous cycle and earlier than the Mdate, Tdate, and Ddate of the succeeding cycle. There must not be any overlap of the cycles' sexskin measurement dates, over the time period from a cycle's earliest sexskin measurement date to its latest, between the sexskin measurement dates of a female's different cycles.
Sexskin observations cannot occur during gaps in observation. That is, each row's Date cannot be during any of the individual's gap periods in CYCGAPS. However, there is an exception: sexskin observations are allowed on the date of "point" observations in CYCGAPS.
The combination of Sname, from the associated CYCLES row, and Date must be unique.
The combination of Date and Cid must be unique.
Usually the observer records both the size and the color
on a date, but occasionally they might only one record one and
not the other. Because of this, the system allows either of
the Size and Color columns to be NULL
, but will also
return a warning in this case. It is an error if both of
those columns are NULL
.
A unique integer which identifies the SEXSKINS row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The CYCLES identifier associated with the sexskin measurement. This is a Cid from the CYCPOINTS table. This column can be used to retrieve the Sname of the female that was measured as well as all other data collected on the cycle.
This column is automatically assigned by the system. Although some (arbitrary) cycle must be associated with the SEXSKINS row upon insert in order to relate the row to a female, the system always uses the Sexual Cycle Determination rules to re-assign the row to the appropriate cycle.
This column may not be NULL
.
The date of the observation. This date must be after
the individuals Birth date.
The date must not be after the individual's Statdate. This column may not be
NULL
.
This column contains a number indicating the size of
the sexskin in a metric with units that are integers, with
the exception that 0.5
value
is allowed, ranging from 0
through 20
, inclusive.
This column may be NULL
, but only when the Color is not NULL
.
A PCSCOLORS.Color code indicating the observed paracallosal skin color.
This column may be NULL
, but only when the Size is not NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
One row for every unstructured data collection event recorded during all-occurrences protocols. The ALLMISCS row containing data collected during a particular sample is related to the SAMPLES row representing the sample. Samples do not have a fixed number of related rows on ALLMISCS, any particular sample may have one, none, or many. Further information may be found on SAMPLES.
A variety of ad-libitum data may be collected during sample data collection. Some of these ad-libitum data can be placed in the INTERACT_DATA and POINT_DATA tables, in which case ALLMISCS is not involved. The data that does not conform to the design of INTERACT_DATA and POINT_DATA is kept in the ALLMISC table.
Consortships recorded as ad-libitum data during focal point sampling are not stored on INTERACT_DATA because INTERACT_DATA requires that consortships have a starting and an ending time and data collected during focal point sampling is without duration. Such consortship data are stored as an ALLMISCS row. Babase presumes that all consortships are recorded systematically during the day on paper and entered into Babase and so it is not necessary to attempt to place ad-libitum consortship data recorded during focal sampling into INTERACT_DATA. . Consortship data are collected during focal samples in order to note whether focal animals are engaged in consortships during a particular sample, and not to record the consortship per se.
Mounts involving the focal individual during all-occurrences sampling are recorded both in the focal sample data and on the paper field ad-libitum records. Consequently, to avoid duplicates in INTERACT_DATA, Babase stores the mounts recorded in the focal data in the ALLMISCS table, but not the INTERACT_DATA table. Mounts in the ALLMISCS table are therefore redundant and may be ignored.
Babase does the same thing with ejaculations recorded in the focal data as it does with mounts: it records them in ALLMISCS rather than INTERACT_DATA. However, the protocol says nothing about ejaculations occurring during all-occurrences sampling. Anyone researching ejaculations will need to investigate this further.
For further information regarding the information collected see the Amboseli Baboon Research Project Monitoring Guide. For further information regarding which ad-libitum data winds up in ALLMISCS see the Protocol for Data Management: Amboseli Baboon Project. For further information on the structure of the ad-libitum text that is eventually stored in ALLMISCS, see the documentation for the focal sampling data collection program, or see the Amboseli Baboon Research Project Monitoring Guide if the focal sampling data were handwritten.
The combination of Sid and Time must be unique.
A unique integer which identifies the ALLMISCS row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The time the ad-libitum data were taken. This column stores the time using a data type having a precision of one second but the precision and accuracy of the data values are dependent upon the focal data collection system's timekeeping, the operator, and the protocol and is surely not one second. Consult the Amboseli Baboon Research Project Monitoring Guide.
The time may not be before
05:00
and may not be after
19:00
.
The unstructured ad-libitum information collected.
At present the text in this column actually does have some structure[85] but appears in ALLMISCS because Babase contains no other place suitable for the storage of the data. The text begins with a one letter code followed by a comma. The allowed one letter codes and their meaning are:
C
Consortship. This is redundant information. Because consortships happen over time these consortships should always also be independently recorded and therefore independently entered into INTERACT_DATA and PARTS.
U
Unknown. This was once reserved for
meta-information -- the field data collection team's
comments on the process of data collection -- but
its meaning has since become confused with the
O
code.
O
Other. Other information about the baboons or
their environment. Its meaning has become confused
with the U
code.
For further information see the Amboseli Baboon Research Project Monitoring Guide.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
One row for every MPIS row (multiparty interaction) involving a consortship. This table extends the MPIS table to include information about consortships.[86]
A unique integer which identifies the MPIS row -- the multiparty interaction.
Because the CONSORTS table extends the MPIS table, the two tables have a one-to-one relationship, this value also uniquely identifies the CONSORTS row.
The value of this column may not be changed.
The disputed female. A BIOGRAPH.Sname of a female.
This column may be NULL
when the consorted female
is unrecorded.
The male who consorted with the female prior to the multiparty interaction. A BIOGRAPH.Sname of a male.
This column may not be NULL
.
The male who consorted with the female after the multiparty interaction. A BIOGRAPH.Sname of a male.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains data from focal sampling points during which the observer recorded information about the focal individual's infant, one row for each focal sampling point. Any focal sampling protocol that includes recording this kind of information is almost certainly going to require that the focal individual be a female, hence the "F" in this table's name.
Despite its name, this table does not require that a focal individual be of any particular Sex. Requirements like those are set and enforced by the STYPES.Sex column.
Whether or not a focal sample is allowed to have data in this table is determined by the sample's SAMPLES.SType and that SType's related STYPES.Has_FPoints value. See the STYPES table for more information.
Each FPOINTS row is connected to a POINT_DATA row via the Pntid column. That is, each row in this
table must have exactly one row in POINT_DATA with the same Pntid. The system will report a
warning for those POINT_DATA rows that
belong to a sample whose SType's related Has_FPoints is TRUE
but which do not have
a related FPOINTS row. While every FPOINTS row must have a
related row in POINT_DATA, not every POINT_DATA row has a related FPOINTS row.
Because every FPOINTS row must have a related POINT_DATA row, when entering a point the POINT_DATA row must be entered before the FPOINTS row.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row. Pntid links FPOINTS with POINT_DATA in a one-to-one manner.
This column may not be NULL
.
The position of the infant with respect to the focal female. The legal values for this column are defined by the KIDCONTACTS support table.
This column may not be NULL
.
The suckling activity of the infant. The legal values for this column are defined by the SUCKLES support table.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains a row for every recorded interaction between animals, including all-occurrence data taken during focal point samples but excluding multiparty interactions (MPI_DATA). Each row records when the interaction occurred. Further information on the interaction is stored elsewhere, notably PARTS. Each interaction in INTERACT_DATA is represented as though it occurs between two ordered individuals designated “actor” and “actee” -- thus resulting in two rows in the PARTS table.
The INTERACT view should always be used in place of this table. (See Views for the rationale.) INTERACT is an extension of this table which may be useful. It is identical to INTERACT_DATA but is extended with alternate representations of dates and times.
The ACTOR_ACTEES view provides a way to view interactions as a single rows.[87]
The actual Date of an
interaction is usually known. However, in some cases only the
year and month of an interaction were recorded without specifying
the day. The specificity (or lack thereof) of the Date is indicated by the boolean
Exact_Date column. When Exact_Date is FALSE
, this
indicates that the year and month of the Date are known, but not the day. In
these cases, the Date must be
the first day of the month.
The Date of the interaction is constrained by various related dates of its participants, as follows:
The Date cannot be
before a participant's Entrydate, with one exception. When
Exact_Date is FALSE
the
Date can be before the
participant's Entrydate, but
the month and year of the Date cannot be before those of
the participant's Entrydate.
A female may not participate in a mount,
consortship, or ejaculation interaction before menarche
(MATUREDATES.Matured). When Exact_Date is FALSE
the Date may be before her Matured date, but the month and
year of the Date cannot be
before those of her Matured
date.
A male may not participate in a mount, consortship,
or ejaculation interaction before 4 years of
age[88]. When Exact_Date is FALSE
the Date may be before he reaches
that age, but the month and year of the Date cannot be before the month
and year in which he reaches that age.
The system will return a warning when the Date is before the LatestBirth of either participant in
the interaction. When Exact_Date is FALSE
, the Date may be before the
LatestBirth but the month and year of the Date cannot be before the month
and year of the LatestBirth.
Many rules surrounding INTERACT_DATA's values are
closely tied to the project's data collection protocols.
There are two sorts of data collected on behavioral
interactions: all-occurrences data
and ad-libitum data. All occurrences
data are collected only during focal animal samples. They are
data on all the occurrences of a particular behavior or
interaction during a given time interval and/or involving a
participating focal individual.[89] All occurrences data will always have an
INTERACT_DATA.Sid that is not NULL. Ad-libitum data are
data that are collected opportunistically at the will of the
observer; we do not assume that ad lib data capture all the
occurrences of a given behavior. Ad-libitum data, which
generally are not collected as part of focal animal samples,
usually have a NULL
Sid value (only those collected during
a focal animal sample have a non-NULL
Sid). Some sorts of
interactions are only collected during focal sampling and
not as ad libitum data outside of focal samples. Approach
(ACTS.Class =
P
), and request to groom
(ACTS.Class =
R
) are these
interactions; they are only collected during all-occurrences
sampling and must have a non-NULL
Sid. Although
consortship and mount[90] data are collected as all-occurrences data
during focal point samples, these data are also collected,
simultaneously and in more detail, in ad libitum
notes. Consequently, they appear in Babase as ad libitum
data in INTERACT_DATA, not as all occurrences data, and
consortships (ACTS.Class =
C
), mounts (ACTS.Class =
M
), and ejaculation (ACTS.Class =
E
) rows always have a
NULL
Sid.
An individual's all-occurrences interactions can be distinguished from ad-libitum data by using the Sid column to reference SAMPLES to see if the individual is the focal of an all-occurrences sample. An example is presented in Appendix B.
INTERACT_DATA rows having a related SAMPLES row, having a non-NULL
Sid[91], will automatically have an Observer value equal to the value
in the related SAMPLES.Observer column -- the system
automatically synchronizes observer values between related
INTERACT_DATA and SAMPLES rows. Such
automatically assigned values cannot be changed. To change
the observer the SAMPLES.Observer column must be changed.
Care must be taken when breaking a relationship
between INTERACT_DATA and SAMPLES, when
setting INTERACT_DATA.Sid to
NULL
. The automatically assigned INTERACT_DATA.Observer value may no longer be
correct and so may require manual adjustment.
An INTERACT_DATA row with a NULL
Sid and a non-NULL
Observer cannot be updated with a
non-NULL
Sid unless the
Observer value is also set to
NULL
-- manually assigning an observer to an ad-lib
interaction precludes relating the interaction to a focal
point sampling period. Setting Observer to NULL
when changing
Sid to a non-NULL
value
causes the system to automatically assign the correct value to
Observer -- causes the system
to automatically synchronize observers.[92] Likewise, an INTERACT_DATA row with a non-NULL
Sid cannot be inserted
unless the Observer value is
either NULL
or matches that of the related SAMPLES.Observer value
-- new focal sample interactions must be consistent with
respect to the observers recorded in the INTERACT_DATA and
SAMPLES tables. When an INTERACT_DATA row
with a non-NULL
Sid and a
NULL
Observer value is
inserted then the Observer
value is automatically updated with the related SAMPLES.Observer value
-- again, the observer associated with the interaction is
automatically brought into sync with the focal
sample.
INTERACT_DATA encodes interaction time and duration by
storing the start and stop times of the interaction. The
columns Start and Stop are used for this purpose.
Consortships may have a NULL
in either the Start or the Stop
time when the respective value is unknown, otherwise the Start
time must precede the Stop time. Ad-libitum sample agonism
and grooming interactions (ACTS.Class values of
A
and
G
respectively) must have a
NULL
in both the Start and Stop columns. All-occurrences
agonism, grooming, approach (ACTS.Class = P
),
and request to groom (ACTS.Class =
R
) interactions must
have non-NULL
Start times that equal Stop times. Start
always equals Stop for mounts (ACTS.Class = M
) and
ejaculations (ACTS.Class =
E
).
The columns of this table that contain times, Start and
Stop, are stored using a data type that has a precision of 1
second. The Amboseli
Baboon Research Project Monitoring Guide must be consulted regarding
the precision and accuracy of these data. It is expected that
ad-libitum datum is entered with a 1 minute
precision.[93] Consequently the seconds portion of the time
values must always be 0 when Sid is NULL
. All-occurrences
interaction data (Sid is not NULL
) do contain
seconds.[94]
When more than one observer is with a group at the same time, they are responsible for making sure that each interaction is only recorded in only one notebook, not duplicated across multiple observers' notebooks (see the Amboseli Baboon Research Project Monitoring Guide for more details). For this reason, it should be emphasized that the Observer column only indicates who recorded this row's interaction (when known), not who actually saw it.
The system will report a warning for interactions which occur between individuals who are not in the same group on the date of the interaction.
A positive integer that uniquely identifies the
interaction. This number is assigned by the system. This
column must not be NULL
.
The origin of the data. When the interaction data
were collected during all-occurrences sampling this column
holds a SAMPLES.Sid identifying the all-occurrences
sample during which the data were collected, otherwise
this column is NULL
.
A code indicating the kind of interaction. The ACTS support table defines the legal values for this column.
Although Act contains ACTS.Act values, it is often the broader ACTS.Class classification that is of interest.
This column may not be NULL
.
The time the interaction began or, in the case of all-occurrences data, the time the interaction was recorded in the field.
The data type of this column has a 1 second precision. The precision and accuracy of the data itself is dependent upon the protocol and the operator and is almost surely not 1 second. Consult the Amboseli Baboon Research Project Monitoring Guide.
The time may not be before
05:00
and may not be after
20:00
.
This column may be NULL
.
The time the interaction stopped or, in the case of all-occurrences data, the time the interaction was recorded in the field.
The data type of this column has a 1 second precision. The precision and accuracy of the data itself is dependent upon the protocol and the operator and is almost surely not 1 second. Consult the Amboseli Baboon Research Project Monitoring Guide.
The time may not be before
05:00
and may not be after
20:00
.
This column may be NULL
.
A boolean indicating whether or not the observer
recorded the interaction by hand[95]
. This value is TRUE
if yes, FALSE
if no.
This column may not be NULL
.
A boolean indicating whether or not the Date is the specific date of the interaction.
This column defaults to TRUE
, and cannot be
NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
One row for each collection of multiparty interactions.
Multiparty interactions are recorded as an ordered series of dyadic interactions. Each complete series has a single MPIS row in the database.
This is a separate data set from the dyadic interactions recorded in INTERACT_DATA and related tables. Interactions appearing there do not appear in the multiparty interaction data, or vice versa.
The date of the multiparty interaction must be between the Entrydate and Statdate, inclusive, of all the participants. The system will return a warning for each participant whose LatestBirth is after the date of the interaction.
The two participants in the dyadic interactions must be different individuals, the two MPI_PARTS.Snames must be different.
The Context column must be NULL
when the Context_type
value is N
, no
context.
The Context_type column must be C
(Consortship) and the Context column must be NULL
when a
related CONSORTS row exists. The system
will generate a warning when the Context_type column is
C
and there is no
related CONSORTS row.
A unique integer which identifies the MPIS row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
Multiparty interactions may be categorized by the context in which they occur. This column identifies the context of the multiparty interaction.
The legal values of this column are defined by the
CONTEXT_TYPES support table. This
column may not be NULL
.
Unstructured text describing the context in which the multiparty interaction occurred.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
Multiparty interactions are recorded as collections of individual dyadic interactions. This table contains one row for every dyadic interaction of a multiparty interaction collection. Each interaction is represented as though it occurs between two ordered individuals designated actor and actee -- these individuals are recorded in the MPI_PARTS table. The dyadic interactions within the collection are time-wise sequenced. Two rows may have the same sequence number (Seq), indicating that the two interactions occurred simultaneously.
The MPI_EVENTS view provides a convenient way to view multiparty interactions as single rows.
Babase records little in the way of causality among the various interactions collected together under the multiparty interaction collection umbrella. At the time of this writing the data protocols require that the initial interaction is a kind of agonism or a kind of help request, so that can be considered causal of the remaining interactions. However there is nothing, other than time-wise sequencing, linking particular requests for help with aid supplied. As a result it is impossible, in the general case, to associate help supplied with help requested. For example, an individual may request help twice, from two different individuals, and then receive help from an third individual. The columns recording the results of help requests (Helped and Active) must therefore be used with caution, as must any attempt to correlate the specifics of help given with help requested.
Multiparty interactions which occur simultaneously must have the same MPIAct values.
The system will generate a warning when more than two MPI_DATA rows, sharing a Mpiid, have the same Seq value -- when there are more than two dyadic interactions occurring simultaneously.
The first interaction of a multiparty interaction
(those with a Seq of
1
) must be an agonism or a request for
help, the MPIAct value must be that of an MPIACTS row having a Kind value of
A
or
R
.
The first interaction of a multiparty interaction
collection is expected to be a single dyadic interaction
unless otherwise allowed by the MPIACTS
table -- the first interaction of a multiparty interaction
collection may only occur simultaneously with another
interaction, the two dyadic interactions both having a Seq of 1
, when all of
these initial interactions have MPIAct values that relate the rows to
MPIACTS rows having TRUE
Multi_first values.
The Helped and Active columns are meaningful when the
MPI_DATA row records a request for help.[96] These columns must be NULL
when the MPI_DATA
row does not record a request for help, otherwise they must
not be NULL
. The system will generate a warning when the
Helped column indicates that no
help was given but there are subsequent interactions which
record help being given (where the MPIAct values have
H
MPIACTS.Kind values)
to the individual who requested help. The system will
generate a warning when Active is
TRUE
and there are no subsequent
AH
interactions where the
help-requestee is the recipient of help in the same
multiparty interaction collection. The system will generate
a warning when Helped is true and
Active is FALSE
and there are
no subsequent PH
interactions where the help-requestee is the recipient of
help in the same multiparty interaction collection.
A unique integer which identifies the MPI_DATA row, and thereby the interaction the row records.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
A number identifying the multiparty interaction collection (MPIS) of which the MPI_DATA interaction is a member.
This column cannot be changed and must not be
NULL
.
This column records the kind of interaction which took place. The legal values for this column are defined by the MPIACTS support table.
This column may not be NULL
.
The first interaction of each multiparty
interaction collection has a Seq value of
1
, the second a value of
2
, etc. The system will report an
error if the Seq does not begin with 1 or is not
contiguous.
The Seq values need not be unique, per Mpiid. Duplicate sequence numbers are used to indicate simultaneous interactions, as would happen if, e.g., 2 individuals aggressed against 1.
This column may not be NULL
.
This column indicates whether help was given, by the
individual from whom help was requested, in response to a
request for help. Helped must be FALSE
when help was
requested from an unknown individual.[97]This column contains meaningful information
only for those MPI_DATA rows which record requests for
help. (See above.)
This column is TRUE
when help was given and
FALSE
when no help was forthcoming.
This column may be NULL
.
This column indicates whether help given was active or passive. It contains meaningful information only for those MPI_DATA rows which record requests for help. (See above.)
This column is TRUE
when the help supplied was
active and FALSE
when either the help supplied was
passive or when no help was supplied. This column is
NULL
when the MPIAct value represents an action other
than a request for help.
When looking for help requests that received passive help always check the Helped value to be sure that help was actually received.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains records of participants in the interactions which make up a multiparty interaction collection (MPIS). Each interaction is represented as though it occurs between two individuals designated actor and actee. Interactions between multiple individuals are broken down into interactions between pairs according to rules described in the protocols. Therefore, this table should contain two rows for every record of an interaction (for every row in MPI_DATA), one row to record the actor, and one to record the actee. Rules for classifying individuals as actor or actee are documented below in the description of the Role column.
The MPI_EVENTS view provides a convenient way to view multiparty interactions as single rows.
Every MPI_DATA row should be related to exactly two MPI_PARTS rows, otherwise it is an error. However, the system allows this condition to exist. It is presumed that such an error condition will exist for only as long as it takes to enter a complete set of data. The system will report those cases where there are not exactly two MPI_PARTS rows for every MPI_DATA row.
The data integrity rules require that the MPI_DATA row be entered before the 2 MPI_PARTS rows.
Either the Sname or the
Unksname column must be NULL
,
but not both.
The actor and the actee of an interaction, when specified as Snames, must not be the same individual.
A unique integer which identifies the MPI_PARTS row, and thereby the participant in the interaction the row records.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
Multiparty interaction identifier. This column holds
the Mpiid value of the row on
the MPI_DATA table containing further
information on the interaction in which the animal is a
participant. It can be used to retrieve the other
information recorded on the multiparty interaction. There
must be a row in MPI_DATA with an Mpiid of this value. This column
cannot be changed and may not be NULL
.
A three-letter code (an id) that uniquely identifies a particular animal (an Sname) in BIOGRAPH. This code can be used to retrieve information, such as the maternal group of the animal, from BIOGRAPH or other places where the animal's three-letter code appears.
This column must not be NULL
when the
participating individual is precisely identified and
NULL
otherwise.
The nature of the problem when one of the participants in the interaction cannot be precisely identified. The legal values of this column are defined by the PARTUNKS support table.
This column must be NULL
when the participating
individual is precisely identified and not NULL
otherwise.
This column designates whether the row records the actor or the actee of the interaction. The two possible values are:
Code | Mnemonic | Definition |
---|---|---|
R | Actor | The actor is usually the one performing the act. For the agonism data, the individual that is the winner (does not perform a submissive behavior) is the actor. For help requests, the individual that is requesting the help is the actor. For help supplied, the individual supplying the help is the actor. For grooming data, the individual that is grooming is the actor. And so forth. |
E | Actee | The actee is usually the one that is the recipient of another animal's attentions. For the agonism data, the individual that is the loser (performing a submissive behavior) is the actee. For help requests, the individual of whom help is requested is the actor. For help supplied, the individual to whom the help is supplied is the actor. For grooming data, the individual that is groomed is the actee. And so forth. |
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains records of the participants in observed interactions between animals. Each row in the table records a participant. Each interaction is represented as though it occurs between two individuals designated actor and actee. Interactions between multiple individuals are broken down into interactions between pairs according to rules described in the protocols. Therefore, this table should contain two rows for every record of an interaction (for every row in INTERACT_DATA), one row to record the actor, and one to record the actee. Rules for classifying individuals as actor or actee are documented below in the description of the Role column.
Every INTERACT_DATA row must be
related to exactly 2 PARTS rows, excepting
those INTERACT_DATA rows that are
associated with ad-lib focal point sampling -- those that
have non-NULL
Sid values.
Ad-lib interactions collected during focal point sampling
are allowed to have only one participant, but only when that
participant is the focal individual. So that data can be
entered the system allows these error conditions to exist
while a transaction is in progress. These
conditions are validated on transaction
commit.
The data integrity rules require that the INTERACT_DATA row be entered before the 2 PARTS rows.
The utility in the PARTS table, as opposed to having single rows for interactions as the ACTOR_ACTEES view does, is in writing database queries that search for interaction participants. It is easy to use PARTS to search for a participant without knowing whether the participant is the actor or the actee. The same is not true of the ACTOR_ACTEES view.
It is easy to produce the ACTOR_ACTEES view from INTERACT_DATA and PARTS, but the reverse would not be true. This is why the underlying database representation is as it is and not the reverse.
The actor and the actee of an interaction must not be the same individual.
A unique integer which identifies the PARTS row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
A three-letter code (an id) that uniquely identifies
a particular animal (an Sname) in BIOGRAPH. This code can be used
to retrieve information, such as the maternal group of the
animal, from BIOGRAPH or
other places where the animal's three-letter code
appears. This column may not be NULL
.
This column designates whether the row records the actor or the actee of the interaction. The two possible values are:
Code | Mnemonic | Definition |
---|---|---|
R | Actor | The actor is usually the one performing the act. For grooming data, the individual that is grooming is the actor. For the agonism data, the individual that is the winner (does not perform a submissive behavior) is the actor. For mounts, consortships, and ejaculations, the male is the actor. |
E | Actee | The actee is usually the one that is the recipient of another animal's attentions. For grooming data, the individual that is groomed is the actee. For the agonism data, the individual that is the loser (performing a submissive behavior) is the actee. For mounts, consortships, and ejaculations, the female is recorded as actee. |
This column may not be NULL
.
Interaction identifier. This column holds the Iid value of the row on the
INTERACT_DATA table containing further
information on the interaction in which the animal is a
participant. It can be used to retrieve the other
information recorded on the interaction. There must be a
row in INTERACT_DATA
with an Iid of this value. This column may not be
NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
One row for every point observation collected on a focal individual during a sampling interval. When, for whatever reason, there are no point data collected on the focal individual at the turn of the minute, there is no row on POINT_DATA. The “position” of the points within the sample, Min value, may therefore contain “gaps” -- missing numbers. The “missing numbers” are points taken when the focal animal is out of sight or the point was missed for whatever reason. Babase represents the observational period during which a sample is collected as a SAMPLES row.
Always use the POINTS view in place of this table (see Views for the rationale.) It contains additional computed columns which may be of interest and is guaranteed to remain consistent in future Babase releases.
A POINT_DATA row must contain a Foodcode when the Activity column indicates the focal is
feeding, otherwise Foodcode must be NULL
.
Consistency is enforced with respect to time taken to collect the sample and the number of point observations. The Min value must not be larger than the Mins of the corresponding sample.
Validation of the Activity and Posture columns partially depends on the row's related SAMPLES.SType. The STYPES_ACTIVITIES and STYPES_POSTURES tables define which SType values can be used with which Activity and Posture values, respectively.
Changing the Sid risks data integrity issues that are not easily prevented with simple data checks, especially with the calculating of Minsis. Because of this, the Sid can only be changed by an administrator or superuser.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to particular points.
This column may not be NULL
.
The SAMPLES.Sid of the focal sample during which this point was collected.
This column may not be NULL
.
The ordinal number of the point within the sample.
The first point in the sample has a Point value of
1
, the second a Point value of
2
, etc. Note that these numbers need
not be contiguous since some points are
“lost” during data collection. (See
above.)
This column may not be NULL
.
The time the point was recorded. This column stores the time using a data type having a precision of one second. The precision and accuracy of the data values are dependent upon the focal data collection system's timekeeping, the operator, and the protocol and is surely not one second. Consult the Amboseli Baboon Research Project Monitoring Guide.[98]
It is unlikely that the researcher is interested in this data because, as of January 2006, the field protocols require no particular relationship between the time of the point and the time the observer records the data.
The time may not be before
05:00
and may not be after
19:00
.
This column may not be NULL
.
The ACTIVITIES.Activity of the individual when the point was taken.
Some values from ACTIVITIES may be restricted, based on the sampling protocol. See STYPES_ACTIVITIES for more information.
This column may not be NULL
.
The POSTURES.Posture of the individual when the point was taken.
Some values from POSTURES may be restricted, based on the sampling protocol. See STYPES_POSTURES for more information.
This column may not be NULL
.
Food item eaten when the point was taken, if any.
NULL
when no food items are eaten. The legal values for
this column are determined by the FOODCODES support table.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The neighbors of the focal individual are recorded during point sampling. NEIGHBORS contains one row for every neighbor recorded during a point data collection event (minute).
When no neighbor is observed for a particular neighbor type (Ncode), no new rows are added to this table. This is different from how an "unknown" neighbor is recorded, as discussed below.
A focal individual's neighbors are not always
recognizable or for some other reason do not always have a row
in BIOGRAPH. For this reason NEIGHBORS
contains two different columns used to identify the neighbor,
Sname and Unksname. The first for recording known
neighboring individuals and the second for recording unknown
neighboring individuals. One and only one of these columns
must contain a value, the other column must then contain
NULL
.[99]
The system will report a warning when the neighbor is not in the same group as the focal individual.
The neighbor must be alive and in the study population on the day of the sample (SAMPLES.Date, as discovered via POINT_DATA.Sid) -- the day of the sample may not be before the neighbor's Entrydate, and may not be after the neighbor's Statdate.[100] This means that the demographic information for a particular time interval must be entered into Babase before the sample data for that interval.
The system will report a warning when the related Date is before a neighbor's LatestBirth.
Each point observation (Pntid value) may have at most one NEIGHBORS row of a given neighbor classification (Ncode value.) The combination of Pntid and Ncode must be unique.
The NCODES table places restrictions on which individuals can be neighbors. One effect of this is to limit the order in which NEIGHBORS may be added to and deleted from Babase.
The sample's focal individual (SAMPLES.Sname, as discovered via POINT_DATA.Sid) may not be her own neighbor.
The combination of Pntid and Sname must be unique.
Validation of the Ncode column partially depends on the row's related SAMPLES.SType. The STYPES_NCODES table defines which SType values can be used with which Ncode values.
A unique integer which identifies the NEIGHBORS row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The POINT_DATA.Pntid of the point in which this neighbor was recorded. Further information related to the entire sample must be found by using POINT_DATA.Sid, the sample identifier.
This column may not be NULL
.
The BIOGRAPH.Sname of the neighbor.
This column must be NULL
when the neighbor is an
unknown individual or otherwise not in BIOGRAPH, i.e. when the Unksname is not NULL
.
The NCODES.Ncode describing the kind of neighbor represented in the row.
Some values from NCODES may be restricted, based on the sampling protocol. See STYPES_NCODES for more information.
This column may not be NULL
.
The UNKSNAMES.Unksnamenature code recorded when the neighbor cannot be precisely identified[101].
This column must be NULL
when the neighboring
individual is precisely identified, i.e. when the Sname is not NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
One row for every continuous period of time during which data are collected at regular intervals on a specific focal individual. Although the field protocols center around collecting data primarily stored in the POINT_DATA, FPOINTS, and NEIGHBORS tables, other information — normally collected ad-libitum during data collection — may be collected as well and are also associated with the specific sample. Further, a sample is allowed to contain no (animal) information.[102] Each SAMPLES row contains the information pertaining to all the data collected during the sample.
The date of the sample must not be before the focal individual's Entrydate, nor after the focal individual's Statdate. Therefore the demographic data pertaining to any particular time period must be entered into Babase before the sample data collected during that time period.
The system will return a warning when the Date is before the focal individual's LatestBirth.
The number of point observations occurring during the sampling interval (Minsis) must be less than or equal to the total number of minutes elapsed (Mins) during the sampling interval.[103]
Other data integrity checks may be performed on a SAMPLES row — and on related rows in POINT_DATA, FPOINTS, and NEIGHBORS — depending on the data collection protocol used in the focal sample. Each sample's protocol is indicated by its SType, and the details of these other data integrity checks are defined in the STYPES table.
The system will report a warning when the group (Grp) of the focal individual, as recorded on SAMPLES, is not the same as the group MEMBERS records for the focal individual on the date of data collection.
One of the participants in all interactions collected during the sample (see INTERACT_DATA.Sid and PARTS) must be the focal individual.
Focal sampling protocols usually designate how many minutes should elapse in each sample, but for various reasons samples collected in the field may last for fewer than the expected number of minutes. Regardless of the expected number of elapsed minutes in a sample, the actual number and the number of those minutes in which a focal "point" was collected are recorded in the Mins and Minsis columns, respectively. Both of these integer columns cannot be less than zero. Their maximum allowed value depends on the row's SType and related STYPES.Max_Points.
The data collected during a focal sample are complex. To assist the observer with recording it all, these data are often though not always collected with an electronic device — e.g. a handheld phone/tablet — and specialized data collection software. This table uses three columns to record details about the hardware and software — or lack thereof — used for data collection: Collection_System, Programid, and Setupid. The Collection_System indicates the hardware used (e.g. "Samsung Tablet B", "Psion unit 6", "Pen and paper"). The Programid indicates the software that the hardware used, and the Setupid indicates any special configuration file(s) that the software used.
If a focal sampling arrangement has no particular need
for one of these columns — e.g. samples recorded with
a pen and paper likely won't need Programid nor Setupid
— do not set that column to NULL
. Collection_System isn't allowed to be
NULL
, and both Programid and
Setupid should only be NULL
when
their true values are unknown. Instead, add a row to the
column's respective support table that essentially means
"N/A" and use that value in this table.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular sample.
This column cannot be changed and must not be
NULL
.
The GROUPS.Gid of the focal individual's group, recorded at the time of data collection by the observer.
This column may not be NULL
.
The STYPES.SType of the data collection protocol used in this focal sample.
This column may not be NULL
.
The total number of minutes which actually elapsed while the sample was collected.
This column may not be NULL
.
The actual number of point observations (once per minute) recorded during the sample.
Babase maintains this value automatically by counting the number of POINT_DATA rows associated with the sample. If this value is manually set, Babase compares the supplied value with the value it computes and issues an error if the two do not match.
This column may not be NULL
and must be less than
or equal to this row's Mins.
The SAMPLES_COLLECTION_SYSTEMS.Collection_System indicating how the sample's data were collected.
This column may not be NULL
.
The PROGRAMIDS.Programid of the software ("program") used on this row's device to collect this sample's data.
This column may be NULL
, indicating that this
information is unknown.
The SETUPIDS.Setupid representing the configuration ("setup") file(s) used by this row's software to collect the data in this sample.
This column may be NULL
, indicating that this
information is unknown.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
ANESTHS contains one row for each time additional sedation is administered to a darted individual. If no additional sedation was administered then this table should not contain rows related to the darting.
Anesthetic cannot be administered to the same individual more than once at any given time -- the combination of Dartid and Antime must be unique.
Anesthetic cannot be administered before the individual is darted -- the Antime value cannot be before the related DARTINGS.Darttime time.
Anesthetic cannot be administered after the individual recovers from the previous dose -- the Antime value cannot be later than 2 hours after the later of the DARTINGS.Darttime time or the previous administration of additional sedation.
The ANESTH_STATS view aggregates the multiple administrations of anesthetic given during a darting and so provides a convenient way to analyze ANESTHS rows.[105]
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular administration of extra sedation.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which extra sedation was
administered -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
Anesthetic administered to extend sedation. The legal values for this column are defined by the DRUGS support table.
This column may not be NULL
.
The time additional sedation was administered to the darted individual.
The time zone is Nairobi local time.
The precision of this column is 1 minute -- seconds
and fractions thereof must be 0
.
This column may be NULL
when there is no record of
what time additional sedation was administered.
The amount of anesthetic administered, in CCs.
The maximum allowed is 1.0CC. The minimum is 0. The precision allowed and accuracy are .01CC.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
BODYTEMPS contains one row for each body temperature measurement taken of a darted individual.
The temperature cannot be measured before the individual
is darted or before the individual is picked up -- the Bttime
value cannot be before either the related DARTINGS.Pickuptime
time or[106] the Darttime time. The
temperature cannot be taken after the individual has recovered
from sedation - the Bttime value, when non-NULL
, cannot be
later than 2 hours after the later of the
DARTINGS.Darttime
time or the last administration of additional sedation, if
any, as recorded in the ANESTHS table. A
non-NULL
Bttime value implies that there must be a known
time of anesthetic administration -- either DARTINGS.Darttime or
ANESTHS.Antime
must be non-NULL
.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular body temperature measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the body temperature
measurement was taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The measured temperature in degrees Celsius to a
precision of 1/10th of a degree. The minimum allowed value
is 25
degrees and the maximum
45
degrees.
This column may not be NULL
.
The time the body temperature of the darted individual was taken.
The time zone is Nairobi local time.
The precision of this column is 1 minute -- seconds
and fractions thereof must be 0
.
This column may be NULL
when there is no record of
when the body temperature measurement was taken.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
CHESTS contains a row for each chest circumference measurement made of a darted individual.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular chest circumference measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the chest circumference
measurement was taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The chest circumference measurement, in centimeters,
with a precision of 1/10th of a centimeter. The minimum
value allowed is 25
centimeters. The maximum value allowed is
99.9
centimeters.
The value contained in this column may have been adjusted for systematic observational bias. See the Chunadjusted column for more information.
This column may not be NULL
.
Some measurements were subject to systemic bias when
taken. When this is known to have occurred the original,
biased measurements are recorded in this column. When there
is no known bias this column is NULL
.
When non-NULL
this column contains the original
chest circumference measurement, in centimeters, with a
precision of 1/10th of a centimeter. The minimum value
allowed is 25
centimeters.
The maximum value allowed is
99.9
centimeters.
A sequence number indicating the order in which the
measurements were taken. The first chest circumference
measurement taken during a darting has a Chseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes Chseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
CROWNRUMPS contains a row for each crown-to-rump measurement made of a darted individual.
The CROWNRUMP_STATS view aggregates the multiple crown-to-rump measurements taken during a darting and so provides a convenient way to analyze CROWNRUMPS rows.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular crown-to-rump measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the crown-to-rump
measurement was taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The crown-to-rump measurement, in centimeters, with a
precision of 1/10th of a centimeter. The minimum value
allowed is 10
centimeters. The maximum value allowed is
99.9
centimeters.
This column may not be NULL
.
A sequence number indicating the order in which the
measurements were taken.The first crown-to-rump measurement
taken during a darting has a CRseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes CRseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
DART_SAMPLES contains one row for every sample type collected in each darting.
The combination of Dartid and DS_Type must be unique.
The DSAMPLES view also shows these data, one line per Dartid. For some users, this may be a more desirable way to look at these data.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to all the samples of a particular DS_Type collected during a single darting.
This column cannot be changed. This column may not be NULL
.
The darting event during which the indicated samples were collected -- a DARTINGS.Dartid value.
This column cannot be changed. This column may not be NULL
.
The DART_SAMPLE_TYPES.DS_Type of this sample.
This column cannot be changed. This column may not be NULL
.
The number of samples collected of the type given in the DS_Type column.
This column may not be NULL
, must be greater than
zero, and must be between the DART_SAMPLES.DS_Type's corresponding DART_SAMPLE_TYPES.Minimum and DART_SAMPLE_TYPES.Maximum values,
inclusive.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
DARTINGS contains one row for every darting of an animal when data was collected.
The combination of Sname and Date must be unique.
The individual must be alive and in the study population when darted -- the Date must be between the individual's Entrydate and Statdate, inclusive. The system will return a warning when the Date is before the individual's LatestBirth.
The system will report a warning for females darted on or after 2006-01-01 for which there is no related DART_SAMPLES row that indicates a vaginal swab collection.
The Downtime value cannot be before the Darttime value and cannot be more than 1 hour after the Darttime value.
The Pickuptime value cannot be before the Downtime value and cannot be more than 90 minutes after the Downtime value. It also[107] cannot be before Darttime and cannot be more than 90 minutes after Darttime. The system will report a warning if the Pickuptime is more than 30 minutes after the Downtime.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and is used in other tables to refer to a particular darting event.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
A three-letter code (an id) that uniquely identifies
the darted animal (an Sname) in BIOGRAPH. This code can be used to
retrieve information from BIOGRAPH or other places where the
animal's three-letter code appears. This column may not be
NULL
.
The time the individual was darted -- when the dart was fired. The time zone is Nairobi local time.
The time may not be before
05:00
and may not be after
20:00
.
The precision of this column is 1 minute -- seconds
and fractions thereof must be 0
.
This column may be NULL
when the time of darting is
unknown.
The time the darted individual succumbed to the anesthetic. The time zone is Nairobi local time.
The precision of this column is 1 minute -- seconds
and fractions thereof must be 0
.
This column may be NULL
when the downtime is not
known.
The time that the darting team picked up the anesthetized individual.
The precision of this column is 1 minute -- seconds
and fractions thereof must be 0
.
This column may be NULL
when the pickup time is not
known.
Anesthetic administered by the dart. The legal values for this column are defined by the DRUGS support table.
This column may not be NULL
.
Mass of the darted individual, in kilograms. The precision of this column is 1/10th of a kilogram. The minimum value allowed is 1Kg. The maximum value allowed is 40Kg.
The system will report a warning when this column is
NULL
.[108]
Notes regarding the logistics of the darting. Comments about collars, anesthetic, etc. Consult the Amboseli Baboon Research Project Monitoring Guide for further guidance as to usage.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Comments about the animal's condition, darting circumstances, etc. during darting. Consult the Amboseli Baboon Research Project Monitoring Guide for further guidance as to usage.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the crown-to-rump measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the chest circumference measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the ulna length measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the humerus length measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Ad libitum notes taken on the physiological features of the darted individual, if any.
This column may be NULL
.[109]. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the PCV measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the body temperature readings taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes that accompany any of the different samples recorded in the DART_SAMPLES table, if any.
This column may be NULL
.[110]. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the teeth, if any observations on the teeth were made.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the canines, if any observations on the teeth were made.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the testes measurements taken, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Notes on the parasite counts done, if any.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
DPHYS contains one row for each darting event during which physiological measurements were taken.
Additional physiological measurements are recorded in the PCVS and BODYTEMPS tables.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular set of physiological measurements taken during a darting.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the set of
physiological measurements were taken -- a DARTINGS.Dartid
value. This column cannot be changed and may not be
NULL
.
The pulse of the individual in beats per minute. The
pulse must be greater than
0
.
This column may be NULL
.
The respiration rate of the individual measured in
counts per minute. The respiration rate must be greater
than 0
.
This column may be NULL
.
The state of the right inguinal lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The state of the left inguinal lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The state of the right axillary lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The state of the left axillary lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The state of the right submandibular lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The state of the left submandibular lymph node. The legal values of this column are defined by the LYMPHSTATES support table.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
HUMERUSES contains a row for each humerus length measurement made of a darted individual.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular humerus length measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the humerus length
measurement was taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The humerus length measurement, in centimeters, with a
precision of 1/10th of a centimeter. The minimum value
allowed is 10
centimeters.
The maximum value allowed is
35
centimeters.
The value contained in this column may have been adjusted for systematic observational bias. See the Huunadjusted column for more information.
This column may not be NULL
.
Some measurements were subject to systemic bias when
taken. When this is known to have occurred the original,
biased measurements are recorded in this column. When there
is no known bias this column is NULL
.
When non-NULL
this column contains the original
humerus length measurement, in centimeters, with a precision
of 1/10th of a centimeter. The minimum value allowed is
10
centimeters. The
maximum value allowed is
35
centimeters.
A sequence number indicating the order in which the
measurements were taken. The first humerus length
measurement taken during a darting has a Huseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes Huseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
PCVS contains one row for each PCV (packed cell volume) measurement taken from a darted individual.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular PCV measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the PCV measurement was
taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The packed cell volume measurement. This is a
percentage and must be between
1
and
99
, inclusive.
This column may not be NULL
.
A sequence number indicating the order in which the
PCV measurements were taken. The first PCV measurement
taken during a darting has a PCVseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes PCVseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
TEETH contains one row for every possible tooth site within the mouth on which data was collected for every darting event during which dentition data was collected. There may not be data on each tooth or tooth site. The absence of a row in this table says nothing about the presence or absence of a particular tooth at the time of darting.
When the tooth is missing, the Tstate is
M
, the Tcondition value must be NULL
. When the
tooth is not missing Teeth-Tcondition must be
non-NULL
.
There may be only one tooth in any given tooth site within the mouth, at any one time -- for any given darting there may be at most one row in TEETH for each tooth site (TOOTHSITES).
While rows in this table record tooth presence/absence and condition in separate columns, these data might not be recorded that way in the field. In dartings from 2006-onward, the tooth's presence/absence is recorded in the same place that indicates a tooth has the "erupting" condition. Between this and the fact that it can be difficult for observers to discriminate between partially- and fully-erupted teeth, a tooth that in fact was still erupting might only be recorded as "present". Thus, erupting teeth might appear in this table without a Tcondition indicating it. Teeth that were recorded as "erupting" can safely be assumed to truly be erupting, however.
In other words: in dartings since 2006 (inclusive), there are likely some cases where an erupting tooth was mistakenly recorded only as 'present', and there is no way to identify when this has occurred.
When inserting a row into TEETH a NULL
Tstate value has special meaning.
Inserted rows with a NULL
Tstate
value are silently ignored; no such rows are ever
inserted.[111]
The Tstate column cannot be
changed to a NULL
.
The DENT_CODES view may be used to maintain the TEETH table. This view may also be useful when querying. It returns a single row with individual columns for every kind of tooth.
The DENT_SITES view provides a way to query TEETH, returning a single row with individual columns for each position in the mouth.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tooth (or tooth site when a tooth is missing).
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the tooth examinations
were made -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The tooth, or tooth site if the tooth is missing. The legal values of this column are defined by the TOOTHCODES support table.
This column may not be NULL
.
The degree to which the tooth exists. The legal values of this column are defined by the TSTATES support table.
This column will never contain a NULL
. See the warning above for
more information.
A code rating the physical condition of the tooth. The legal values of this column are defined by the TCONDITIONS support table.
This column may be NULL
. See TEETH above.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
TESTES_ARC contains one row for every darting event for every recorded measurement of testicle width and length circumference.
The TESTES_ARC table contains testes measurements of a portion of the testicle circumference. The TESTES_DIAM table contains testes measurements of the diameter. The two tables are otherwise identical in that they have the same structure and have corresponding validation rules.
The “pairing” of the width and length measurements within this table exists to make data storage convenient; no special relationship is implied regarding the order in which the measurements were taken. For example, if there are 3 length measurements taken during a darting and 2 width measurements the width and length measurements may have been taken in either of the following orders, as well as other possible orders not listed here: length1, length2, length3, width1, width2 or length1, width1, length2, width2, length3. In other words the value of the Seq column describes the order in which the length measurements were taken and the order in which width measurements were taken but says nothing about the interspersing of length and width measurements.[112]
Either the width or the length must be specified -- both
Testwidth and Testlength cannot be
NULL
in the same row.
There can only be one measurement taken per darting per testicle per measurement sequence number -- Testseq must be unique per Dartid per Testside.
Once a Testwidth value is NULL
all the rows (for the same darting) with higher Testseq values must also
have a NULL
Testwidth value. The
same is true of the Testlength
column.[113]
An individual must be male to have a row in this table.
The system will report a warning when individuals have testes length measurements less than 15mm or have testes width measurements less than 10mm.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular testes measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the testes measurements
were taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The testicle measured. The legal values are:
Code | Description |
---|---|
L | the left testicle |
R | the right testicle |
This column may not be NULL
.
The testes length measurement, in millimeters, with a
precision of 1/10th of a millimeter. The minimum value
allowed is 15
millimeters. The maximum value allowed is
140
millimeters.
This column may not be NULL
.
The testes width measurement, in millimeters, with a
precision of 1/10th of a millimeter. The minimum value
allowed is 10
millimeters. The maximum value allowed is
95
millimeters.
This column may not be NULL
.
A sequence number indicating the order in which the
measurements were taken. The first measurement, of each
testicle, taken during a darting has a Testseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes Testseq values to
ensure that they are contiguous and begin with
1
. Note that the TESTES_ARC rows are
sequenced within Dartid within
Testside whereas the other darting
tables are sequenced only within Dartid. See the Automatic Sequencing section for further
information.
TESTES_DIAM contains one row for every darting event for every recorded measurement of testicle width and length diameter.
The TESTES_ARC table contains testes measurements of a portion of the testicle circumference. The TESTES_DIAM table contains testes measurements of the diameter. The two tables are otherwise identical in that they have the same structure and have corresponding validation rules.
The “pairing” of the width and length measurements within this table exists to make data storage convenient; no special relationship is implied regarding the order in which the measurements were taken. For example, if there are 3 length measurements taken during a darting and 2 width measurements the width and length measurements may have been taken in either of the following orders, as well as other possible orders not listed here: length1, length2, length3, width1, width2 or length1, width1, length2, width2, length3. In other words the value of the Seq column describes the order in which the length measurements were taken and the order in which width measurements were taken but says nothing about the interspersing of length and width measurements.[114]
Either the width or the length must be specified -- both
Testwidth and Testlength cannot be
NULL
in the same row.
There can only be one measurement taken per darting per testicle per measurement sequence number -- Testseq must be unique per Dartid per Testside.
Once a Testwidth value is NULL
all the rows (for the same darting) with higher Testseq values must also
have a NULL
Testwidth value. The
same is true of the Testlength
column.[115]
An individual must be male to have a row in this table.
The system will report a warning when individuals have testes length measurements less than 40mm or have testes width measurements less than 25mm.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular testes measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the testes measurements
were taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The testicle measured. The legal values are:
Code | Description |
---|---|
L | the left testicle |
R | the right testicle |
This column may not be NULL
.
The testes length measurement, in millimeters, with a
precision of 1/10th of a millimeter. The minimum value
allowed is 15
millimeters. The maximum value allowed is
75
millimeters.
This column may not be NULL
.
The testes width measurement, in millimeters, with a
precision of 1/10th of a millimeter. The minimum value
allowed is 10
millimeters. The maximum value allowed is
51
millimeters.
This column may not be NULL
.
A sequence number indicating the order in which the
measurements were taken. The first measurement, of each
testicle, taken during a darting has a Testseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes Testseq values to
ensure that they are contiguous and begin with
1
. Note that the TESTES_DIAM rows are
sequenced within Dartid within
Testside whereas the other darting
tables are sequenced only within Dartid. See the Automatic Sequencing section for further
information.
TICKS contains one row for every darting event during which data on ticks and other parasites were recorded.
When a specific number could not be arrived at because
there was a large number of parasites or there was some other
reason why the count could not be taken, Tickcount should be left NULL
.
The value of the Tickstatus column is constrained based on the Tickcount value. For further information see the documentation of the TICKSTATUSES support table and the meaning of the table's Special Values.
The combination of Dartid, Bodypart, and Tickkind must be unique.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tick count.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the tick count was made
-- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The part of the body examined for ticks or other parasites. The legal values of this column are defined by the BODYPARTS support table.
This column may not be NULL
.
The kind of tick or other parasite, or kind of parasite and it's developmental stage, or kind of parasite indicator counted. The legal values of this column are defined by the PARASITES support table.
This column may not be NULL
.
The recorded count of ticks, ticks in the indicated
developmental stage, other parasites, or parasite signs.
The minimum value allowed is
0
, the maximum is
250
.
This column may be NULL
when there were too many
parasites to count or the count was not taken for some other
reason.
A status value indicating whether and what sort of tick count was taken. The legal values of this column are from the Tickstatus column of the TICKSTATUSES table. See the documentation of the TICKSTATUSES support table for more information regarding what values may be used under which conditions.
This column may not be NULL
.
Notes on the parasite infestation of the indicated body part.
Notes pertaining to parasites but not specific to the particular body part examined belong in DARTINGS.Ticknotes.
This column may contain NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
ULNAS contains a row for each ulna length measurement made of a darted individual.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular ulna length measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The darting event during which the ulna length
measurement was taken -- a DARTINGS.Dartid value. This column cannot be
changed and may not be NULL
.
The ulna length measurement, in centimeters, with a
precision of 1/10th of a centimeter. The minimum value
allowed is 10
centimeters.
The maximum value allowed is
35
centimeters.
The value contained in this column may have been adjusted for systematic observational bias. See the Ulunadjusted column for more information.
This column may not be NULL
.
Some measurements were subject to systemic bias when
taken. When this is known to have occurred the original,
biased measurements are recorded in this column. When there
is no known bias this column is NULL
.
When non-NULL
this column contains the original ulna
length measurement, in centimeters, with a precision of
1/10th of a centimeter. The minimum value allowed is
10
centimeters. The
maximum value allowed is
10
centimeters.
A sequence number indicating the order in which the
measurements were taken.The first ulna length measurement
taken during a darting has a Ulseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes Ulseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
Initials of the person who performed the measurement. The legal values of this column are defined by the OBSERVERS support table.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
VAGINAL_PHS contains a row for each vaginal pH measurement taken on a darted female.
The VAGINAL_PH_STATS view aggregates the multiple vaginal pH measurements taken during a darting and so provides a convenient way to analyze VAGINAL_PHS rows.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular vaginal pH measurement.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The DARTINGS.Dartid of the darting during which this vaginal pH measurement was taken.
This column cannot be changed and must not be
NULL
.
The vaginal pH measurement, precise to the nearest
0.5
. This must be a number between
4.0
and
10.0
.
This column may not be NULL
.
A sequence number indicating the order in which the
measurements were taken. The first vaginal pH measurement
taken during a darting has a VPseq value of
1
, the second a value of
2
, etc.
The system automatically re-computes VPseq values to
ensure that they are contiguous and begin with
1
. See the Automatic Sequencing section for further
information.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This section contains data about the origin, identity, location, and various other traits about the tissue and nucleic acid samples in the users' inventory. This includes samples currently residing in the users' inventory, as well as older samples that may have previously been in use but have since been sent to others, consumed, discarded, or lost. Because of this, the data in this section serve as both a historical record of all samples that have ever been in the users' possession and an active record of the samples that are currently in the users' possession.
The text in this section uses the terms "nucleic acid" and "nucleic acid sample" interchangeably[116]. At the time of this writing, the system does not attempt to record details at the molecular level, so the reader can be assured that comments about the location, source, etc. of a specific "nucleic acid" should be interpreted as referring to a sample and not a specific molecule.
This table contains one row for every location that may be used to store tissue or nucleic acid samples.
Samples may be stored in varied locations with different organizations/research groups ("institutions"). The Institution column is included to allow easy segregation of locations across these varying locales.
The name of each distinct location is recorded in the Location column. Different organizations have their own conventions about how to organize and name storage locations, so this code may be a very descriptive and specific space ("Shelf 1, Rack 2, Box 3, Position D") or something more general ("PINK BOX").
Each Institution-Location pair must be unique.
To allow the use of nondescriptive
general Location values but retain
the ability to enforce uniqueness of specific ones, the
boolean column Is_Unique is
included. When Is_Unique is
TRUE
, the row's LocId may occur
at most once across both the NUCACID_DATA.LocId
and TISSUE_DATA.LocId columns (once total, not once
per table). When FALSE
, the LocId may be used any number of times in
either table.
A unique identifier for the location. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The INSTITUTIONS.Institution indicating the organization or research group at which this row's Location exists.
This column may not be NULL
.
A boolean indicating whether or not this location at this institution is unique.
This column defaults to TRUE
.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every quantification of a nucleic acid sample's concentration. All concentrations are recorded in picograms per microliter (pg/μL).
A nucleic acid sample
cannot be quantified before it was created, before the source
tissue sample was collected, nor before the tissue sample's
donor entered the study population (if applicable); the Conc_Date cannot be before the
related NUCACID_DATA.Creation_Date, TISSUE_DATA.Collection_Date, nor the related BIOGRAPH.Entrydate. These dates already have a
required sequence to them — Entrydate <= Collection_Date <= Creation_Date <= Conc_Date — so in many
cases it may be sufficient for the system to only require that
Conc_Date is after the
Creation_Date. However, any of
these date columns can be NULL
, so for the sake of
completeness the system separately checks that Conc_Date is greater than each
of them.
Some quantification methods may use a different unit of concentration than that used in this table. Nanograms per microliter (ng/μL) is especially common. Such concentrations must be converted to pg/μL before they are added to this table.
Use the NUCACID_CONCS view instead of this table. It includes an additional column that indicates concentration in ng/μL, and also allows the insertion of quantifications in ng/μL. The conversion to ng/μL is thus performed by the system and not the user.
Do not assume that the number of significant figures employed in the Pg_ul column is the "true" number of significant figures for this quantification. This table records concentrations from a variety of quantification methods with varying levels of accuracy and stores them all in a single column that records all data to the nearest 0.1 pg/μL[117]. When new data are added, this column pays no attention to the number of provided significant figures and may indicate more than were actually used at the time of quantification. See the example below.
Example 3.2. (Mis)Use of Significant Figures in NUCACID_CONC_DATA
The concentration of a new DNA sample is determined to
be 10.0
ng/μL, which has 3 significant
figures. When recorded in NUCACID_CONC_DATA, this concentration will be
recorded in Pg_ul as
10000.0
pg/μL, with 6 significant
figures. A user should not assume that this quantification
was originally performed with 6 significant figures'
accuracy.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The NUCACID_DATA.NAId of the quantified sample.
This column may not be NULL
.
The NUCACID_CONC_METHODS.Conc_Method used to quantify this concentration.
This column may not be NULL
.
The date that this concentration was quantified.
This column may be NULL
, when the date is
unknown.
The concentration of the sample according to this quantification, in picograms per microliter (pg/μL).
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every person involved with the creation of a specific nucleic acid sample. When a nucleic acid sample has multiple creators, each of them is recorded here in a separate row.
Most nucleic acid samples are created via "extraction". This table favors using "creation" rather than "extraction", for reasons explained in the discussion of the NUCACID_DATA table.
Each NAId-Creator combination must be unique; a sample cannot have the same creator more than once.
Use the NUCACIDS view to insert data into this table. It provides a simple way to determine the appropriate NAId value to use, and for a human data enterer to provide multiple creators in a single row.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The NUCACID_DATA.NAId of the related nucleic acid sample.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every nucleic acid sample that is or ever has been in the inventory. Each nucleic acid sample is associated with a "source" tissue sample, which is indicated in the TId column.
Always use the NUCACIDS view in place of this table. It contains additional related columns which may be of interest.
This table records a nucleid acid sample's current location using the LocId column. Values in this column constrain and are constrained by values in the TISSUE_DATA.LocId column, and may or may not be unique, as discussed in the LOCATIONS table.
The Name_on_Tube column indicates whatever "name" or other identifying information is recorded on the tube. Because of labeling errors or misidentification in the field, this value may not indicate the true identity of the individual from whom this sample came.
To see the "true" identity of this individual, see the related line in the TISSUE_DATA table. This information is also provided in the NUCACIDS view.
Two columns in this table record information related to the sample's creation: Creation_Date and Creation_Method. Also the related table, NUCACID_CREATORS. In laboratory vernacular, the term "extraction" is usually favored over "creation" for most nucleic acid sample types. However, some samples are not "extracted" and are instead generated via a laboratory procedure (e.g. reverse transcription, dilution, PCR amplification, etc.). Because of this, the generic term "creation" is used here.
A sample's Creation_Date cannot be before the source tissue's Collection_Date, nor before the source individual's Entrydate, if any. It may often be redundant to verify that Creation_Date is on or after both dates, but this redundancy is intended, as discussed above.
This table attempts to keep an ongoing record of a
sample's current volume in the Actual_Vol_ul column. It is left to
the user to judge this column's accuracy, which depends
greatly on 1) how diligently the lab personnel keep the data
manager(s) informed of changes, and 2) the amount of time that
has passed since this volume was determined[118]. To assist users in making these judgments, the
date that the Actual_Vol_ul was
last updated is recorded in the Actual_Vol_Date column. A sample's
current volume cannot be recorded without also recording this
date; both of the Actual_Vol_ul
and Actual_Vol_Date columns
must be NULL
or both non-NULL
.
A sample cannot have its current volume determined before the sample was created; the Actual_Vol_Date must be on or after the sample's Creation_Date.
It is unlikely, though not impossible, that a sample's volume might increase after its creation. The system will report a warning when a sample's Actual_Vol_ul is greater than its Initial_Vol_ul.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The TISSUE_DATA.TId of the tissue sample from which this nucleic acid sample originated.
This column may not be NULL
.
The LOCATIONS.LocId indicating the current locale and location of the nucleic acid sample.
This column may not be NULL
.
The name of the source individual, according to the label on the tube.
This column may be NULL
, when there is no
identifying information on the tube. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The NUCACID_TYPES.NucAcid_Type of this nucleic acid sample.
This column may not be NULL
.
The date that this nucleic acid sample was created. When the process to generate a sample lasts more than one day, this is the date that the procedure was completed.
This column may be NULL
, when the creation date is
unknown.
The NUCACID_CREATION_METHODS.Creation_Method describing how this nucleic acid sample was created.
This column may not be NULL
.
The sample's volume, in microliters, when it was first created.
This column may be NULL
, when the initial volume
is unknown.
The sample's volume, in microliters, as of the Actual_Vol_Date.
This column may be NULL
, when users have not
updated the sample's "current" volume or when the sample
has not yet been used.
The date that the Actual_Vol_ul was determined.
This column may be NULL
, when users have not
updated the sample's "current" volume or when the sample
has not yet been used.
Comments or miscellaneous information about this nucleic acid sample.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every name or ID used only at a specific institution (an ID that is "local" to that institution) to describe a particular nucleic acid.
Identity of samples is maintained by the system as much as possible, but when working with samples in the laboratory this is often inconvenient or impractical. Different groups and institutions often have their own systems for giving unique names to their samples, and while these names may be useful and meaningful for humans, they are mostly unhelpful from the database's perspective. They're vulnerable to typos, and can be very confusing when a sample is shared between institutions. However, these "local names" remain important for the people who are actually using these samples, so these identifiers are recorded in this table, one per nucleic acid sample, per institution.
Every combination of NAId and Institution must be unique; an NAId cannot go by more than one local name at the same Institution.
Every combination of Institution and LocalId must be unique; the same local name cannot be used at a single Institution more than once.
The NUCACID_DATA.NAId of the nucleic acid sample.
This column may not be NULL
.
The INSTITUTIONS.Institution indicating the organization or research group at which this NaId's name is used.
This column may not be NULL
.
The local name used for this NAId at this Institution.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every nucleic acid sample having another nucleic acid as its source.
Often, nucleic acid samples are created through some "extraction" process in which the nucleic acids are purified from a tissue sample (e.g. a blood draw, a buccal swab, etc.) However, there are also numerous different methods by which nucleic acid samples may instead be created from another nucleic acid sample (e.g PCR[119], reverse transcription, dilution, etc.). In addition to recording the identity of the source nucleic acid, this table includes the Relationship column, which indicates the nature of the connection between the row's nucleic acid and its source nucleic acid. This relationship may be simple enough to explain in a single word (e.g. "DILUTION"), or complex enough to require a lengthy explanation. To allow this flexibility, Relationship is not constrained to a set of legal values in a support table.
A nucleic acid sample cannot indicate itself as its source; the NAId and Source_NAId cannot be equal.
A nucleic acid sample cannot have more than one other sample as its source; this table's NAId column is unique.
A nucleic acid cannot have been created before its source; the related Creation_Date of this NAId must be on or after the Source_NAId's related Creation_Date.
Although a nucleic acid sample may have been generated from another nucleic acid sample, there will always be a single tissue sample from which both the nucleic acid samples originated; both samples' related NUCACID_DATA.TId's must be equal.
The NUCACID_DATA.NAId of the nucleic acid that has another nucleic acid as its source.
This column may not be NULL
.
The NUCACID_DATA.NAId of the source nucleic acid.
This column may not be NULL
.
A textual description of how this nucleic acid and its source are connected.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every population under observation, and/or from which tissue or nucleic acid samples have been collected.
In this context, the term "population" refers to a particular species at a specific location. "The baboons in the Amboseli basin in Kenya", for example, are a population. "All baboons", or "all wildlife in the Amboseli basin", are not.
In the common vernacular, a population is often referred to only by the name of its site, e.g. "Gombe" when referring to the Gombe chimpanzees. Because of this, the Pop_Name and Site columns may seem redundant, but when setting vernacular aside it should be obvious that these two columns contain objectively different information. In practice, users may elect to enter the same value in both of these columns, but the two columns remain independent of each other.
PopId 1
has special meaning to the system. Data integrity rules for
the UNIQUE_INDIVS table presume that the
population with this PopId is the population whose
individuals are recorded in BIOGRAPH. No
other code should be created to refer to that
population.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The name of the population.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The scientific name of this population's species.
This column may be NULL
, when unknown or not
applicable. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The common name of this population's species.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
A code indicating whether or not the population is wild or captive. The legal values are shown below.
POPULATIONS.Wild_Captive Values
W
Wild.
C
Captive.
U
Unknown.
NA
Not applicable.
This column may not be NULL
.
The location of the population.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
Comments or miscellaneous information about this population.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every tissue sample that is or ever has been in the inventory.
Always use the TISSUES view in place of this table. It contains additional related columns which may be of interest.
This table records a tissue sample's current location using the LocId column. Values in this column constrain and are constrained by values in the NUCACID_DATA.LocId column, and may or may not be unique, as discussed in the LOCATIONS table.
If a sample was collected from an individual in BIOGRAPH — if the related UNIQUE_INDIVS.UIId
has a PopId of
1
— the
sample's Collection_Date must be
on or after that individual's Entrydate. Depending on the sample's
Tissue_Type, the Collection_Date may also be
constrained by the individual's Statdate. See TISSUE_TYPES for more information.
The system will return a warning if a sample's Collection_Date is after the
individual's Statdate, but only
when the sample's Tissue_Type
indicates that the Collection_Date is not constrained by
the individual's Statdate. That
is, when the related TISSUE_TYPES.Max_After_Statdate is NULL
.
From time to time, field observers may mistakenly record the wrong collection date on a tube. To help identify when this has occurred, the system uses the CENSUS table to confirm whether the Collection_Date is a date that the individual was actually observed[120]. The result of that confirmation is indicated in the Collection_Date_Status column.
When a sample's Collection_Date is not a Date on which the individual was recorded
present in CENSUS, the Collection_Date is
not necessarily "wrong". There are numerous circumstances in
which a sample may have been collected without a census being
performed. Still, the absence of a related row in CENSUS is suspicious, so it elicits a warning.
That is, the system will return a warning a tissue sample's
Collection_Date_Status is
1
.
Do not assume that the date written on a sample's label will always match the Collection_Date. When data managers determine that the date written on a label is erroneous, they may be able to determine the true date and update the Collection_Date as needed.
A unique identifier for the tissue sample. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The UNIQUE_INDIVS.UIId of the individual from whom this tissue sample was collected.
This column may not be NULL
.
The LOCATIONS.LocId indicating the current locale and location of the sample.
This column may not be NULL
.
The name of the individual from whom this tissue sample was collected, according to the label on the tube.
This column may be NULL
, when there is no
identifying information on the tube. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The date the sample was collected or originated.
This column may be NULL
, when the date is
unknown.
The time the sample was collected or originated.
This column may be NULL
, when the time is
unknown.
The STORAGE_MEDIA.Storage_Medium in which the sample is stored.
This column may not be NULL
.
The MISID_STATUSES.Misid_Status of this tissue sample.
This column may not be NULL
.
A code indicating whether this row's Collection_Date is or isn't plausible according to available CENSUS data. The legal values are:
Code | Description |
---|---|
0 | This individual is part of the main population and has a non-"absent" CENSUS row on this Collection_Date, OR this individual is not part of the main population and we have no basis to question the accuracy of this Collection_Date |
1 | This Collection_Date is NULL , OR this
individual is part of the main population and either i)
has no CENSUS rows on this
Collection_Date or ii) has only "absent" censuses on
this Collection_Date |
This column is automatically maintained by the
database and may not be NULL
. Attempts to manually
populate or update this column are silently
ignored.
Comments or miscellaneous information about this tissue sample.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every name or ID used only at a specific institution (an ID that is "local" to that institution) to describe a particular tissue sample.
For more details about the reason for this table and the difference between a "local" name/identifier and an ID generated by the database, see the discussion for the NUCACID_LOCAL_IDS table.
Every combination of TId and Institution must be unique; a TId cannot go by more than one name at the same Institution.
Every combination of Institution and LocalId must be unique; the same local name cannot be used at a single Institution to describe more than one sample.
The INSTITUTIONS.Institution indicating the locale in which this TId's name is used.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every tissue sample having another tissue as its source.
In addition to recording the identity of the source tissue, this table includes the Relationship column, which indicates the nature of the connection between the row's tissue and its source tissue. This relationship may be simple enough to explain in a single word (e.g. "ALIQUOT"), or complex enough to require a lengthy explanation. To allow this flexibility, Relationship is not constrained to a set of legal values in a support table.
A tissue sample cannot indicate itself as its source; the TId and Source_TId cannot be equal.
A tissue sample cannot have more than one other sample as its source; this table's TId column is unique[121].
A tissue sample cannot have been collected before its source; the related Collection_Date of this TId must be on or after the Source_TId's related Collection_Date.
Depending on the details of the Relationship, a tissue sample and its source may or may not be from a different individual. The system does not require that the related UIId's of the TId and Source_TId be equal. However, the system will return a warning when they are not equal.
The TISSUE_DATA.TId of the tissue sample that has another tissue sample as its source.
This column may not be NULL
.
The TISSUE_DATA.TId of the source tissue sample.
This column may not be NULL
.
A textual description of how this tissue sample and its source are connected.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every individual under observation, and every individual from whom tissue or nucleic acid samples have been collected.
In contrast to BIOGRAPH, which records the identities of every individual in the main study population[122], this table also records the identities of all the individuals in other populations from whom there are tissue or nucleic acid samples recorded in the inventory. All individuals in BIOGRAPH are also included in this table, whether or not tissue or nucleic acid samples exist in the inventory. This presents a problem: there are two tables that separately track the identities of all individuals in the main population. To address this, the triggers have been written to ensure that BIOGRAPH retains primary authority over all individuals in the main population.
Management of individuals in the main population is done by BIOGRAPH (see its discussion for more information), so the ability to perform inserts/updates/deletes in this table for those individuals is heavily constrained, as follows:
Inserting rows for individuals in the main population is only allowed for the unknown individual or for individuals in BIOGRAPH who have not yet been added to this table[123].
The unknown individual's row can only be updated or deleted by an administrator.
Deleting rows for individuals in the main population is only allowed for individuals who are no longer in BIOGRAPH[124].
Updating rows for individuals in the main population is only allowed when changing only the Notes column.
Any individual's PopId cannot be updated to add or remove the individual from the main population.
Do not manually insert or delete rows in this table for individuals in BIOGRAPH. Perform those actions in BIOGRAPH, and the action will automatically be performed in this table, as well. Manual inserts and deletes in this table should only be done for individuals who are not in BIOGRAPH.
The IndivId column is used to record the individual's name or similar ID. Study projects and research institutions each have their own rules of nomenclature for their individuals, so this might be a lengthy name, an abbreviation, a series of numbers, or some mix of these. This value is not unique; the same identifier may be used more than once across different populations. However, per PopId, each IndivId must be unique; a population cannot use the same identifier more than once.
PopId
1
is the
population recorded in BIOGRAPH, so any
row with this PopId (with a
few exceptions, discussed below) must use the individual's
Bioid as its IndivId.
IndivId
UNKNOWN
indicates
the unknown individual, and is allowed to have PopId
1
and not be a
Bioid.
IndivId
MULTIPLE
is used
to indicate when TISSUE_DATA row includes
samples from multiple individuals. It is allowed to have
PopId
1
and not be a
Bioid.
A unique identifier for the individual. This is an automatically generated sequential number that uniquely identifies the row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
The name/identifier for this individual.
This column may not be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The POPULATIONS.PopId of the individual's population.
This column may not be NULL
.
Comments or miscellaneous information about this individual.
This column may be NULL
. This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This section contains timestamped geolocation data on groups, observers, and significant landscape features (groves, waterholes[125], and possibly other temporary or permanent landmarks), either recorded in a “quad” coordinate system or collected from GPS units. SWERB stands for Sleeping grove, Waterhole, End time, Ranging, and Begin time. Typically SWERB data are collected at hourly or half hourly intervals. Supporting information includes the locations of tree groves and waterholes. For more information see the Protocol for Data Management: Amboseli Baboon Project.
The quad coordinate system was devised prior to the incorporation of GPS technology into the data collection protocols. It is based on regular sub-divisions of the landscape into a grid of squares, 170 m per side. There is no altitude information associated with quad coordinate points. The IDs and coordinates of these quads are recorded in QUAD_DATA.
The GPS X and Y coordinates are in the WGS
1984 UTM Zone 37South coordinate
system. The units of these coordinates are meters, as is the
recorded altitude. The recorded precision of the X and Y values
include at most 1 non-zero digit to the right of the decimal
place, but when the coordinates were recorded using another system the
transformation to UTM may yield values with more digits to the
right of the decimal. X and Y coordinates must be on or within
the bounding rectangle having X coordinates between
42300.0 and 651000.0, inclusive, and Y
coordinates between 9497000.0 and 9894500.0,
inclusive. The system will generate a warning when the location
falls outside the bounding rectangle having X coordinates
between 277000.0 and 311100.0, inclusive,
and Y coordinates between 9689200.0 and
9709500.0, inclusive. The accuracy may vary; see the
Protocol for Data
Management: Amboseli Baboon Project for further information on accuracy at various
times. Altitude is in meters. Altitude values must be between
0 and 10000, inclusive. There must
be no (non-zero) digits to the right of the decimal place for
altitude measurements taken before 2004-01-01. After
2004-01-01 one digit may appear to the right of the
decimal place. The system will generate a warning when altitude
values are NULL
but X and Y coordinates are non-NULL
.
Some devices and data-exporting applications favor longitude and latitude coordinates via the WGS 1984 2D CRS. Because of this, Babase can read coordinates in that system and transform them to WGS 1984 UTM Zone 37South. Regardless of the coordinate system used when the data are inserted, the coordinates are recorded using UTM. That is, "XYLoc" columns in all Babase tables have the PostGIS "geometry" datatype with SRID 32737, that of WGS 1984 UTM Zone 37South.
All PDOP columns must have values between
0
and
50
, inclusive, and have one digit
of precision to the right of the decimal. PDOP values are
unit-less and should be multiplied by the specified accuracy in
meters of the GPS unit to produce a 3 dimensional vector, in
meters, representing the possible distance from the true
location.[126]
All accuracy columns are in meters[127] with one digit of precision to the right of the
decimal and must have values between
0
and
15
, inclusive.
The kind of reported error is partially determined by
characteristics of the the GPS unit used for data collection.
GPS units which report error as a PDOP reading, those with GPS_UNITS.Errortype
values of PDOP
, cannot be related
to rows with non-NULL
Accuracy values. GPS units which report
error as an accuracy reading, those with GPS_UNITS.Errortype
values of accuracy
, cannot be
related to rows with non-NULL
PDOP values. PDOP values must
be NULL
for data collected before 1993-09-01 or after
2001-01-31. Accuracy values must be NULL
for data
collected before 2001-02-01.[128] The system will report a warning when data
collected with a GPS unit supporting PDOP or accuracy does not
include, respectively, PDOP or accuracy values.
On 2000-05-02, the United States government ended its use of Selective Availability, a national security measure which intentionally lowered the accuracy of GPS signals. For more information about this, see Selective Availability on GPS.gov. The GPS accuracy indices in Babase (Accuracy and PDOP) do not and cannot account for this inaccuracy, so users should be aware that any GPS data collected through 2000-05-02 are likely less accurate than indicated.
GPS data between May and August 2019 are unreliable, apparently thanks to some issues with the European Union's Galileo satellites. See the SWERB Notebook for more information and documentation of this issue.
Starting 2004-01-01, GPS data began to be
downloaded directly from the GPS units instead of being
transcribed by hand. One consequence is that starting
2004-01-01 operators entered up to 10
characters of descriptive codes with each GPS waypoint taken.
This information is processed and distributed throughout the
SWERB data but the various Garmincode columns retain the raw
data as entered by the operator.[129] Before 2004-01-01 the Garmincode columns
must contain a NULL
. On or after this date the Garmincode
columns must not be NULL
, but may be a string 0 characters
long.[130] SWERB_DATA are the exception to
this rule and may always be NULL
. Begin and end rows, rows
with a SWERB_DATA.Event values of
B
or
E
, may have NULL
Garmincode columns regardless of date so
that the data entry staff may supply begin and end rows without
X and Y coordinates should the field team forget to record a
begin or end row. Other SWERB_DATA rows are
except from the Garmincode requirement to handle situations,
notably those which involve lone animals, where data was written
manually for some reason.
Before 2004-01-01 the GPS_Datetime columns must be
NULL
. The date portion of the GPS_Datetime columns must
correspond to the date related to containing row. The time
portion of the GPS_Datetime column is not validated, although
the time portion of the GPS_Datetime value occasionally serves
as data against which other columns are validated.
The Garmincode and GPS_Datetime columns may be NULL, without warning, no matter the date. This is to accommodate the manual recording of data taken using GPS units.[131]
Data is validated per-observation team, per-group, per-day. Data upload and maintenance must be done within transactions that produce valid per-observation team, per-group, per-day data sets.
Note that it may be more convenient to use the views that support the SWERB data than to access the raw data.
This table contains one row for every aerial photo used in the specification of map quadrant system used in the early SWERB data.
A unique identifier of the aerial photo. This is an integer greater than or equal to 1. It is used to refer to a particular aerial photo.
This column may not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for each GPS unit which has been used in the field.
In actual fact early records of unit identification may have been lost. In such cases a row in GPS_UNITS represents a number of units having the same capabilities (i.e. of the same make and model).
The date the unit was first used (Start) must be on or before the date the unit was last used (Finish).
The label on the GPS unit, the Label value must be unique within the time period in during which the GPS unit was in use, between the Start and Finish dates, inclusive.
A 2 digit non-negative numeric value that identifies the GPS unit as a distinct object throughout all time.
This column may not be NULL
.
A short textual description of the GPS unit. If necessary this may include additional notes on such details as when the unit was used, its purpose, and so forth.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may not be NULL
.
The manufacturer of the GPS unit.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may not be NULL
.
The model of the GPS unit. This should be sufficiently detailed that the technical specifications of the unit can be found given this information.[132]
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may not be NULL
.
The type of error the unit reports. This must be one of:
PDOP
The error is supplied as positional dilution of position.
accuracy
The error is in meters.
See the SWERB Data overview for more information.
This column may not be NULL
.
The letter code marked on the unit. Note that this information is not enough to uniquely identify the unit because the same letter codes have been used on different units at different times.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may not be NULL
.
The date the GPS device entered service. This date
cannot be before 1993-09-01, the date GPS units were
first used. This column may not be NULL
.
The date the GPS unit was taken out of service. This
column may be NULL
when the unit is still in
service.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The QUAD_DATA table contains one row for every map quadrant. For more information, see above in the introduction to the SWERB data.
Before these quads were delineated on 1981-11-01, large scale aerial photographs were used to signify location in SWERB data.
The QUADS view can be used to maintain the QUAD_DATA table. This view may also be more useful than the table when querying.
The unique identifier code used to refer to a
particular map quadrant.[133] This column may not be NULL
.
The X and Y WGS
1984 UTM Zone 37South coordinates of the centroid of
the map quadrant. This column may be NULL
.
See the SWERB Data overview for more information.
Code indicating the aerial photo in which the map quadrant is located, if any. Must be a value on the AERIALS table.
This column may be NULL
when there is no aerial
photo for the map quadrant.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every uninterrupted bout of group-level observation for which there is SWERB data.
Start and Stop values are automatically assigned
the SWERB_DATA.Time value from the related SWERB_DATA row with an Event value of
B
and
E
rows, respectively. The
begin and end of the bout of observation is determined by
the begin and end rows entered in the field (or determined by
the data manager).
Start must be NULL
or be
after the related SWERB_DEPARTS_DATA.Time, if any.
The Start value records the
start of the day's observation of the group when there exists
a related SWERB_DATA.Event value of
B
and that value is the
first for that group/day and there is no earlier SWERB_DATA.Event
E
value. Likewise the Stop value records the end of the day's
observation of the group when there exists a related SWERB_DATA.Event
value of E
and that value is
the last for that group/day and there is no later SWERB_DATA.Event
B
value. The Start time cannot be after the Stop time.
The Btimeest value is only meaningful when
either there is a begin time value or when investigation of
existing records indicates that no record of a begin time on
file -- when either the Start time
value is non-NULL
or the Bsource
value is NR
. The Etimeest value is only meaningful when
either there is an end time value or when investigation of
existing records indicates that no record of an end time on
file -- when the Stop time value
is not NULL
or the Esource value
is NR
. When the values in these
columns are meaningful they must contain a non-NULL
value,
otherwise they must contain a NULL
value.[134]
When the source of the start or stop time is
NR
then the estimated time flag
must be FALSE
and the time must be NULL
.[135][136] It is required that there be a record of whether
the start and stop times are estimated when there are start
and stop times -- the Start and
Stop columns cannot be non-NULL
when the Btimeest and Etimeest columns, respectively, are
NULL
.[137] It is required that there be a record of the
source of the start and stop times when there are start and
stop times -- the Bsource and
Esource values must be NULL
unless, respectively, the Btimeest
and Etimeest values are
non-NULL
.
SWERB_BES rows are automatically sequenced when no Seq is specified[138]by Start value,
unless the Start value is NULL
in which case they are sequenced last of all existing
SWERB_BES rows for the group/day when initially inserted and
otherwise not automatically sequenced.[139] In the case of a tie the automatic sequencing
places the newly inserted row[140] last among the rows that are tied. Seq values may be manually assigned so
long as the manual sequencing does not result in
out-of-order Start values, or in
those cases where Start is
NULL
, so long as the manually assigned sequence number is
less than or equal to that which would be automatically
assigned.[141]
As expected, changing the Start value (via a SWERB_DATA row with an Event value which indicates the start of observation) will automatically change the Seq value. Should there be other SWERB_BES rows for that group/day with the same SWERB_BEs-Start value the newly changed row will be be sequenced after the existing rows.[142]
Every bout of observation must have exactly one
beginning -- there must be exactly one related row on
SWERB_DATA with an Event of
B
. Every bout of
observation must have exactly one end -- there must be exactly
one related row on SWERB_DATA with an Event of
E
. These requirements are
enforced on transaction commit, so the
SWERB_BE row and the begin and end SWERB_DATA rows must all be created within a
single transaction. The system will
generate a warning when there are no observations in a bout of
observation -- when there are no related SWERB_DATA rows with Event values other than
B
and
E
.
The focal group, Focal_grp, must be in existence, based on GROUPS.Start and GROUPS.Cease_To_Exist, on the date of the observation.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular bout of uninterrupted observation.
This column is automatically maintained by the
database[143], cannot be changed, and must not be
NULL
.
The id of the SWERB_DEPARTS_DATA row
representing the departure from camp of the observation
team. This column cannot be changed.[144] This column must not be NULL
.
The group under observation. The legal values for
this column are from the Gid column
of the GROUPS table. This column cannot
be changed.[145]This column may not be NULL
.
The time the bout of observation started. The time
may not be before 05:00
and may
not be after 20:00
. The time must
be on the minute mark; the seconds must be zero. This
column may be NULL
when the start of observation is
unknown.
TRUE
when the Start
value is an estimation of the time the daily observation of
the group began. FALSE
otherwise. This column should be
NULL
when the Start time is
the start of a uninterrupted bout of observation but is not
the start of the day's observation of a group.
The source of the data used to estimate the Start value when that value is
estimated and represents the start of the day's observation
of the group -- how the start of the daily observation of
the group was estimated. The legal values of this column
are defined by the SWERB_TIME_SOURCES
table. This column must be NULL
when the Start time is the start of a
uninterrupted bout of observation but is not the start of
the day's observation of a group.
The time the bout of observation ended. The time may
not be before 05:00
and may not
be after 20:00
. The time must be
on the minute mark; the seconds must be zero. This column
may be NULL
when the end of observation is unknown.
TRUE
when the Stop value
is an estimation of the time the daily observation of the
group began. FALSE
otherwise. This column should be
NULL
when the Stop time is the
end of a uninterrupted bout of observation but is not the
end of the day's observation of a group.
The source of the data used to estimate the Stop value when that value is
estimated and represents the end of the day's observation of
the group -- how the end of the daily observation of the
group was estimated. The legal values of this column are
defined by the SWERB_TIME_SOURCES table.
This column must be NULL
when the Stop time is the end of a
uninterrupted bout of observation but is not the end of the
day's observation of a group.
A sequence number indicating the ordering of the bouts
of uninterrupted observation of each group each day. The first
bout of observation for the group for the day has a Seq
value of 1
, the second a value of
2
, etc.
The system automatically re-computes Seq values to
ensure that they are contiguous and begin with
1
. See the overview of the SWERB_BES table and the Automatic Sequencing section for further
information.
A boolean value. TRUE
means that the bout of
observation counts toward total observer effort. FALSE
means that the bout is concurrent with another bout of
observation by the same team and should not count toward
observer effort.
This column cannot be NULL
.
Notes, if any, on the bout of observation. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every event related to group-level geolocation.[146]Such events geolocate a group upon the occurrence of a significant activity, including but not limited to ascent, descent, and drinking. Other events include geolocation at regular intervals and the begin and end of each bout of uninterrupted observation.
The typical Babase user may find the SWERB view to be easier to query than SWERB_DATA and its related tables. It may be easier to use the SWERB_DATA_XY view to maintain SWERB_DATA than it is to modify the table content directly.
Rows with an Event value of
O
or
P
are not part of an
observation bout of the focal group and so, unless the
observed group is a Subgroup[147] or is the unknown group[148], must have a Seen_grp
value which differs from that of the group under observation
-- the SWERB_BES.Focal_grp value of the related SWERB_BES row. Likewise, rows which do not have
an Event value of
O
or
P
must have a Seen_grp value of the group under
observation -- a value which equals the SWERB_BES.Focal_grp
value in the related SWERB_BES
row.[149]The system will generate a warning when the SWERB_DATA row is for a non-focal group and the
observed group is a subgroup and the observed group is the
same as the focal group -- when Event is
O
and Subgroup is TRUE
and SWERB_DATA.Seen_grp
is the same as the related SWERB_BES.Focal_grp.
Per bout of observation, per BEId, there must be exactly one
SWERB_DATA row recording the start and exactly one recording
the finish of the bout -- exactly one SWERB_DATA row having an
Event value of
B
and exactly one having a
E
value, respectively.
The time of the observation must be between the start
and stop times of the bout of observation -- the Time value must be between (inclusive)
the related SWERB_BES.Start and SWERB_BES.Stop values.
Because SWERB_BES.Start may be NULL
the Time value is also checked to be sure
that it's not before the time the observation team departed
from camp, before SWERB_DEPARTS_DATA.Time. Because SWERB_DEPARTS_DATA.Time may also be NULL
the
Time value is checked to be sure
that it is not before 05:00. Because SWERB_BES.Stop may be
NULL
the Time value is checked
to be sure that it is not after 20:00.
The date portion of the GPS_Datetime value must be the date of the observation team's departure from camp -- must equal the related SWERB_DEPARTS_DATA.Date value. The waypoint time recorded by the operator cannot be more than 15 minutes before the actual time the observation was taken -- the Time value cannot be more than 15 minutes before the time portion of the GPS_Datetime value. The exception to this rule is when a group drinks from a water hole; for these water hole events, the waypoint time cannot be more than 30 minutes minutes before the actual time the observation was taken. The waypoint time recorded by the operator cannot be more than 5 minutes after the actual time the observation was taken -- the Time value cannot be more than 5 minutes after the time portion of the GPS_Datetime value.
The Quad column records
group location based on map quadrants and is used only in
older data. Data recorded after 1994-09-30, rows
associated with SWERB_DEPARTS_DATA rows with
Date values after
1994-09-30
, must have NULL
Quad values. GPS units were used in
later SWERB data collection so data recorded before
1993-09-01, rows associated with SWERB_DEPARTS_DATA rows having Date values before
1993-09-01
, must have NULL
XYLoc values.
Only data collected using GPS units have altitude, PDOP,
accuracy, a GPS timestamp, or Garmincode values -- when the
XYLoc column is NULL
then the
Altitude, PDOP, Accuracy GPS_Datetime, and Garmincode values must also be
NULL
.
The observed lone animal must be NULL
unless the
waypoint is an observation of a lone animal/non-focal group
— Lone_Animal must be
NULL
unless Event is
O
.
An other group observation of an unknown lone animal
is recorded in a SWERB_DATA row having a NULL
Lone_Animal value and a Seen_grp value of
10.0
(the group denoting a lone
animal).
The observed predator must be NULL
unless the waypoint
is an observation of a predator — the Predator must be NULL
unless the
Event is
P
, in which case the Predator must not be NULL
.
The observer's distance from the observed lone animal,
predator, or non-focal group must be NULL
unless the
waypoint is an observation of a lone animal, predator, or
non-focal group -- Ogdistance
must be NULL
unless Event is
O
or
P
.
Through the end of 2022, the observers' protocol for
recording this distance was either been poorly defined or
poorly adhered-to. (It is unclear which.) Distances were
occasionally recorded but usually not. It is unclear what
decisions were made at the time that might decide whether or
not to record this distance. To avoid fallacious assumptions
about the nature of the data, all distances recorded before
01 Jan 2023 have been manually set to NULL
.
In case someone wants to use the SWERB_DATA_HISTORY
table to retrieve the once-present distances, they were set
to NULL
at 2023-03-22
00:20:44.206126+03
(Nairobi time).
The observed group, Seen_grp, must be in existence, based on GROUPS.Start and GROUPS.Cease_To_Exist, on the date of the observation.
An observed lone animal, Lone_Animal, must have already entered the study population and must be alive on the date of observation -- the SWERB_DEPARTS_DATA.Date related to the SWERB_DATA row must be between individual's related Entrydate and Statdate, inclusive. The system will return a warning if the related Date is before the individual's LatestBirth.
The system will generate a warning if a lone animal is a male and is observed more than 60 days before his assigned dispersal date -- before DISPERSEDATES.Dispersed.
When a lone individual is observed, the observed group
must be the group reserved for lone animals -- when
SWERB_DATA.Lone_Animal is
non-NULL
then SWERB_DATA.Seen_grp must be
10.0
.
Interpolation does not reference SWERB data when making its computations. Consequently the MEMBERS table does not reflect SWERB sightings of lone individuals -- unless those sightings are otherwise recorded in the DEMOG table.
When a predator is observed, the observed group must be
the group reserved for predator sightings -- when
SWERB_DATA.Predator is non-NULL
then SWERB_DATA.Seen_grp must be
99.0
.
It is not possible from these data to determine the number (quantity) of predators observed. Information like this is recorded, but not in the GPS units[150]. See the Amboseli Baboon Research Project Monitoring Guide for more information.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular GPS event.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The id of the SWERB_BES row
representing the bout of uninterrupted observation of which
the SWERB_DATA row is a part. This column cannot be changed
and must not be NULL
.
The group under observation. Note that this is not always the focal group which the observation team set out to observe. For further details see the Protocol for Data Management: Amboseli Baboon Project. The legal values for this column are from the Gid column of the GROUPS table.
This column may not be NULL
.
The BIOGRAPH.Sname of the observed lone animal.
This column may be NULL
to indicate either that a
lone animal was not observed or that an unknown lone animal
was observed.
A code indicating what sort of event the row represents. The following codes are defined:
B
The row represents the beginning of a bout of uninterrupted observation of the focal group.
E
The row represents the end of a bout of uninterrupted observation of the focal group.
H
The row represents an observation of the focal group. These occur on half hourly or hourly intervals, depending on the protocol used to record the data.
W
The row records the focal group's drinking.
O
The row represents the observation of a non-focal group or lone animal.
P
The row represents a sighting of a predator.
This column may not be NULL
.
The time of the observation. This is usually the time
manually entered by the observer but in those cases where
the observer does not enter a time (such as begin and end
rows) the SWERB_UPLOAD view may use GPS
supplied information to calculate a time. See the section
on the SWERB_UPLOAD.Description column. The time must
be on the minute mark; the seconds must be zero. This
column may be NULL
when the time is not known.
The map quadrant of the seen group's location, when recorded in the field. The legal values for this column are from the Quad column of the QUAD_DATA table.
This column may be NULL
.
The X and Y WGS
1984 UTM Zone 37South coordinates of the seen group. This
column may be NULL
.
See the SWERB Data overview for more information.
The altitude, in meters, of the landscape on which the
seen group is
located. This column may be NULL
.
See the SWERB Data overview for more information.
The amount of error reported as positional dilution of
precision. This column may be NULL
when there is no PDOP
information.
See the SWERB Data overview for more information.
The accuracy of the GPS reading, in meters. This
column may be NULL
when there is no accuracy information
in meters.
See the SWERB Data overview for more information.
TRUE
when the observation is of a subgroup, FALSE
when not.
Note that the field team cannot always record subgroup information and the value in this column is therefore sometimes determined heuristically[151] when the data is uploaded by the SWERB_UPLOAD view.
This column must not be NULL
.
The distance, in meters, between the observer and the
observed non-focal group or the observer and the observed
lone animal. This value must be a 3
digit non-negative integer that is also a mulitple of
0
.
This column may be NULL
when the observers did not
record an Ogdistance (i.e. NULL
values are not to be
confused with zero distance).
The date and time automatically supplied by the GPS
unit at the time the waypoint was recorded. For further
information on when this column is NULL
and when
non-NULL
see the SWERB Data
overview.
This column may be NULL
.
The information manually entered by the observer into
the GPS unit as a coded waypoint that describe the SWERB
data being recorded. This column may be empty, it need not contain characters,
but it may not contain only whitespace characters. For further
information on when this column is NULL
and when
non-NULL
see the SWERB Data
overview.
This column may be NULL
. See the SWERB Data
overview for more information.
The PREDATORS.Predator code of the observed predator.
This column may be NULL
, when this row is not for a
predator sighting.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every departure from camp of every observation team, for those observation teams which have collected SWERB data.
The Time value may
not be NULL
when there is a related SWERB_DEPARTS_GPS row -- data collected using the
GPS units must have a non-NULL
time.
One observer may not depart camp on the same day at the
same time with two different observation teams -- the
combination of SWERB_DEPARTS_DATA.Date, SWERB_DEPARTS_DATA.Time, and SWERB_OBSERVERS.Observer, when all are non-NULL
,
must be unique.
The system will generate a warning for SWERB_DEPARTS_DATA rows having a Date after 1994-09-30 that do not also have a related SWERB_DEPARTS_GPS row.
The system will generate a warning for SWERB_DEPARTS_DATA rows for which no SWERB data was collected; that do not have a related SWERB_BES row.
The SWERB_DEPARTS view can be used to maintain the SWERB_DEPARTS_DATA table. This view may also be more useful than the table when querying.
At the time of this writing departure data prior to about March of 2011 is not in the database. The process involved in loading historical data fabricates (departure date excepted, the actual departure date is used) the minimal required departure information. The early process used by the Data Manager involving loading data from the GPS units sometimes involved removing departure information. For further information and exact dates see the Data Manager's [Process for Uploading SWERB] document.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular departure from camp of a particular observation team.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The time of departure. The time may not be before
04:00
and may not be after
20:00
. The system will generate a
warning if the time is before
05:00
or after
14:30
. The time must
be on the minute mark; the seconds must be zero. This
column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every departure from
camp of every observation team, for those observation teams
which have collected SWERB data using GPS units. This table
is an extension of the SWERB_DEPARTS_DATA
that contains the additional information collected when a GPS
unit is used to record the departure. There is at most one
row in this table for every row in SWERB_DEPARTS_DATA. When a row exists it contains
the information involving the GPS unit used by the observation
team on that day. All SWERB_DEPARTS_DATA
rows having associated SWERB_DEPARTS_GPS rows must have SWERB_DEPARTS_DATA.Date values on or after
1993-09-01
.
The date of departure (SWERB_DEPARTS_DATA.Date) must be between the SWERB_DEPARTS_GPS' Start and Finish dates, inclusive.
The SWERB_DEPARTS view can be used to maintain the SWERB_DEPARTS_GPS table. This view may also be more useful than the table when querying.
The system will generate a warning when there is more than one departure per GPS unit per day.
The id of the SWERB_DEPARTS_DATA row
representing the departure from camp of the observation
team. This column cannot be changed and must not be
NULL
.
The X and Y WGS
1984 UTM Zone 37South coordinates at departure. This
column must not be NULL
.
See the SWERB Data overview for more information.
The altitude in meters of the GPS unit. This column
may be NULL
.
See the SWERB Data overview for more information.
The error reported as positional dilution of
precision. This column may be NULL
.
See the SWERB Data overview for more information.
The error reported in meters. This column may be
NULL
.
See the SWERB Data overview for more information.
The identifier of the GPS device (the GPS_UNITS.GPS) used by the observation team. The legal values of this column are defined by the GPS_UNITS support table.
This column must not be NULL
.
The information manually entered into the waypoint by
the observer. This is a set of, mostly, single character
codes that describe the SWERB data being recorded.
This column may be empty, it need not contain characters,
but it may not contain only whitespace characters. For further information on when this column
is NULL
and when non-NULL
see the SWERB Data overview.
This column may be NULL
. See the SWERB Data
overview for more information.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every geolocated physical object, that is, for every grove and waterhole.[152]
This table may contain one row with special meaning.
The SWERB_GWS row with a Loc
value of UNK
represents
the unknown grove -- a grove with special properties. When a
SWERB_GWS row exists with a SWERB_GWs-Loc value of
UNK
then the Type value must be
G
(grove). No trees
may be located in the unknown grove -- TREES.Loc may not be
UNK
. The unknown grove
may not be located anywhere -- SWERB_GW_LOC_DATA.Loc may not be
UNK
. And when it is not
known where a group slept there can be no uncertainty
regarding the sleeping grove -- when SWERB_LOC_DATA.Loc is
UNK
then SWERB_LOC_DATA.Loc_Status
must be C
(certain).
SWERB_GWS rows that represent groves, those with a
SWERB_GWs-Type of G
,
have restrictions on the allowed Loc values due to the data structure
supplied the SWERB_UPLOAD view (the Name column sometimes contains a
grove code prefaced with the letter
“P”). There cannot be two
codes for groves, one which begins with the letter
“P” and another which
consists entirely of the same characters as the first but with
the initial “P”
omitted.[153] Because of this restriction the Babase
administrator is the only user allowed to create Loc values which begin with the letter
“P”.
With the exception of the unknown grove, the system will report a warning when the grove or waterhole has not been geolocated -- when there is no related SWERB_GW_LOC_DATA row.
A unique identifier. Up to 4 alphanumeric non-lowercase characters that uniquely identifies the row and may be used to refer to the grove or waterhole.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column cannot be changed and must not
be NULL
.
The type of place; whether grove, waterhole, or some other landmark. The legal values for this column are from the Place column of the PLACE_TYPES (codes for various landscape features) table.
This column must not be NULL
.
Up to 20 characters of alternative name for the grove or waterhole.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may be NULL
.
The date when the grove or waterhole was named. This
date cannot be before
1981-11-01
.
This column must not be NULL
.
The date of last known use after which the resource became permanently unavailable.
This column may be NULL
when observations are
ongoing or the row represents an object that cannot become
unavailable.
Textual notes on the grove or waterhole, if any.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for each time a location of a place, a grove or waterhole is recorded. Any given grove or waterhole may have its location recorded more than once.
The typical Babase user may find the SWERB_GW_LOCS view to be easier to query than SWERB_GW_LOC_DATA and its related tables. It may be easier to use the SWERB_GW_LOC_DATA_XY view to maintain SWERB_GW_LOC_DATA than it is to modify the table content directly.
The date related to the location (SWERB_GW_LOC_DATA.Date) may not be before the grove or waterhole was first observed, may not be before the related SWERB_GWS.Start value. The date related to the location (SWERB_GW_LOC_DATA.Date) may not be after the grove or waterhole ceases existance, may not be after the related SWERB_GWS.Finish value.
The Quad column
records group location based on map quadrants and is used only
in older data. Data recorded after 1994-09-30, rows with
Date values after
1994-09-30
, must have NULL
Quad values. GPS units were used in
later SWERB data collection so data recorded before
1993-09-01, rows having Date values before
1993-09-01
, must have NULL
XYLoc values, unless the UTM XY
coordinates were obtained through other means (XYSource is non-NULL
).
There can only be a source for the recorded X and Y
coordinates when there are recorded UTM coordinates -- the
XYSource value may be
non-NULL
only when XYLoc
is non-NULL
. There must be X and Y UTM coordinates when
there is a recorded source for the X and Y coodinates -- XYLoc must be non-NULL
when
XYSource is
non-NULL
.
Only data collected using GPS units have altitude, PDOP,
accuracy, and GPS values -- when the XYLoc column is NULL
then the
Altitude, PDOP, Accuracy, GPS values must also be
NULL
.
The GPS unit used to make the observation must be in service on the date of the observation -- the date of the observation (Date) must be between the SWERB_DEPARTS_GPS' Start and Finish dates, inclusive, of the related GPS_UNITS row.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to an observation which recorded the location of a particular grove or waterhole.
This column is automatically maintained by the
database and must not be NULL
.
The SWERB_GWS.Loc of the grove or waterhole associated with the recorded location.
This column must not be NULL
.
The date related to the location. This is either the
date the location was caculated or an observation date. See
the Protocol for Data
Management: Amboseli Baboon Project for further information. This column must
not be NULL
.
The time of the observation. When the data are taken
with a GPS unit this is the time recorded by the GPS unit.
The time cannot be before 05:00
and cannot be after 20:00
. The
time must be on the minute mark; the seconds must be zero.
This column may be NULL
when the time is not known.
The map quadrant of the grove or waterhole's location, when recorded. The legal values for this column are from the Quad column of the QUAD_DATA table.
This column may be NULL
.
The source of the UTM coodinate data. The legal values for this column are from the XYSource column of the SWERB_XYSOURCES (SWERB Time Sources) table.
This column may be NULL
.
The X and Y WGS
1984 UTM Zone 37South coordinates of the grove or
waterhole. This column may be NULL
.
See the SWERB Data overview for more information.
The altitude, in meters, of the grove or waterhole.
This column may be NULL
.
See the SWERB Data overview for more information.
The error reported as positional dilution of
precision. This column may be NULL
when there is no PDOP
information.
See the SWERB Data overview for more information.
The error reported in meters. This column may be
NULL
when there is no accuracy information in
meters.
See the SWERB Data overview for more information.
The identifier of the GPS device (the GPS_UNITS.GPS) used in the observation. The legal values of this column are defined by the GPS_UNITS support table.
This column may be NULL
.
See the SWERB Data overview for more information.
Textual notes regarding the record of the grove or waterhole's location, if any.
This column may not be empty, it must contain characters,
and it must contain at least one non-whitespace character. This column may be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row every time a group is observed at a geolocated physical object, i.e. at a grove or a waterhole or, possibly, some other physical landmark.[154]
SWERB_LOC_DATA rows must place a group at a single location -- each SWERB_DATA row has at most one related SWERB_LOC_DATA row. In effect, SWERB_LOC_DATA extends SWERB_DATA with additional columns.
Descent from, or ascent into, groves is indicated in the
ADcode column. To indicate a
descent, the ADCode value must relate to an ADCODES row a ADN value
of D
. To indicate an ascent, the ADCode
value must relate to an ADCODES row a ADN value of A
.
The observations recording descent from or ascent into
sleeping groves must be related to groves (the related SWERB_GWS rows must have a Type of
G
).
The SWERB_DATA row representing the
"begin" of the team's first bout of observation (the bout with
the smallest SWERB_BES.Seq value) of any group (except the
unknown group, group
9.0
)[155] in a day must be related to a SWERB_LOC_DATA row
recording descent from a sleeping grove. This enforces the
requirement that a day's observations of a group must include
the group's descent from exactly one
grove (possibly the unknown grove). A group can be recorded
as descending from more than one grove, but only when all of
the descents are by subgroups (the related SWERB_DATA.Subgroup
is TRUE
), or all but one of the descents are by subgroups
and those subgroup descents are from the
unknown grove.
Similarly, the SWERB_DATA row
representing the "end" of the team's final bout of observation
(the bout with the greatest SWERB_BES.Seq value) of any group (except the
unknown group, group
9.0
)[156] in a day must be related to a SWERB_LOC_DATA row
recording ascent into a sleeping grove. This enforces the
requirement that a day's observations of a group must include
the group's ascent into exactly one grove
(possibly the unknown grove). A group can be recorded as
ascending into more than one grove, but only when all of the
ascents are by subgroups (the related SWERB_DATA.Subgroup
is TRUE
), or all but one of the ascents are by subgroups and
those subgroup ascents are into the unknown grove. The
database rules that enforce these "ascent into sleeping grove"
rules are checked at transaction
commit.[157]
When a group splits into subgroups and descends from or ascends into multiple groves there must be a separate bout of observation, another SWERB_BES row, to record the location of each subgroup.
Whether a SWERB_LOC_DATA row must have a NULL
ADtime value or must have a
non-NULL
ADtime value is
determined by the related ADCODES.Time flag.[158] Ascent and descent times related to a bout of
observation cannot be before the beginning of the bout of
observation -- SWERB_LOC_DATA.ADtime cannot be before the related
SWERB_BES.Start
time.[159] The database rules that enforce ADtime values are checked at transaction commit.[160]
Descent and ascent times are recorded manually; they are not taken from the timestamps supplied by the GPS units. This necessitates additional columns for descent and ascent information. For further information see the Amboseli Baboon Research Project Monitoring Guide.
When the location is the unknown grove, status of that
location must be 'certain'. That is, when the Loc value is
UNK
then the Loc_Status value must be
C
.
Babase allows SWERB data to record group presence at
arbitrary landmarks, but some possibilities are rare and
result in a warning. The system will issue a warning when a
group is located at a waterhole but the recorded activity is
not “water” (when the SWERB_GWS
row's Type is
W
but the related SWERB_DATA row's Event value is not
W
).
SWERB_DATA rows representing
observation of a group drinking at a waterhole must be related
to waterholes. That is, when SWERB_DATA.Event is
W
there must be a related
SWERB_GWS row, even if it is the generic and
non-specific row which represents all rainpools, and the
related SWERB_GWS row must have a Type value of
W
. In some cases this
check is at transaction commit time and in
other cases not.
Rows that record a drinking event -- those related to
SWERB_DATA rows which have
W
Event values -- must have
SWERB_LOC_DATA.ADcode values
that indicate no involvement with a sleeping grove; the
related ADCODES row must have a ADN value of
N
.
Groups may not be located at a place before observations began at the place or after observations ended at the place. That is, the SWERB_DEPARTS_DATA.Date related to the SWERB_DATA row referenced by the SWERB_LOC_DATA.SWId value must not be before the related SWERB_GWS.Start value and must not be after the related SWERB_GWS.Finish value.
The number that uniquely identifies the row and may be used to refer to an observation of a group at a particular time at a particular grove or waterhole. This is also the SWERB_DATA.SWId identifying the group, place, and time of the observation.
This column must not be NULL
and cannot be
changed.
The SWERB_GWS.Loc of the object (grove, waterhole, or landmark) where the group was observed.
This column must not be NULL
.
A code representing the nature of the relationship between the baboon group and the landscape feature at which the SWERB_LOC_DATA row places the group. The legal values of this column are defined by the ADCODES support table.[161]
This column must not be NULL
.
The SWERB_LOC_STATUSES.Loc_Status value indicating
the status of this observation of the location on record
(this row's Loc). Usually, this will indicate whether the
observers actually saw the group at the location or inferred
that the group was there. For instance, if the group is
still in a sleeping grove when the observers arrive then
they will be "certain" about that grove (Loc_Status =
C
), but if the group is
walking away from the grove when the observers arrive then
they may indicate the grove as 'probable'(Loc_Status =
P
).
Although the database supports degrees of certainty with respect to any group location, in practical terms the only time that there will be any degree of uncertainty will involve sleeping groves. This is for two reasons. First, at present the only provision in the Amboseli Baboon Research Project Monitoring Guide involving uncertainty is with respect to sleeping groves. Second, the SWERB_UPLOAD will only ever enter an indication of uncertainty into the database when the location is a sleeping grove.[162]
This column may not be NULL
.
The median time of group decent from or ascent into a
sleeping grove. See the Amboseli
Baboon Research Project Monitoring Guide for information
regarding how median descent and ascent times are
determined. The time may not be before
05:00
and may not be after
20:00
. The time must be on the
minute mark; the seconds must be zero. This column may be
NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The SWERB data collection protocol sometimes requires 2 GPS waypoint entries to record a group's presence at a physical landscape feature. (At the time of this writing descent from and ascent into sleeping groves requires 2 GPS waypoint entries.) This table contains one row every time a group is observed at a geolocated landscape feature and 2 GPS waypoints are required to record the data. The rows of this table contain the information stored in the second GPS waypoint, information automatically generated by the GPS unit or manually entered into the GPS unit, that otherwise have no place in the database.
It may be easier to use the SWERB_LOC_GPS_XY view to maintain SWERB_LOC_GPS table than it is to modify the table content directly.
The SWERB_LOC_GPS table extends the SWERB_LOC_DATA table[163] with additional columns; SWERB_LOC_GPS contains at most one row for every row in SWERB_LOC_DATA.
As described in the SWERB Data overview above, data was first obtained directly from the GPS units on 2004-01-01. Consequently, this table cannot contain rows dated earlier than 2004-01-01.
The number that uniquely identifies the row and may be used to refer to the GPS information involving an observation of a group at a particular time at a particular grove or waterhole. This is also the SWERB_DATA.SWId value, identifying the group, place, and time of the observation, and the SWERB_LOC_DATA.SWId value, identifying the placement of the group at a landscape feature.
This column must not be NULL
and cannot be
changed.
The X and Y WGS
1984 UTM Zone 37South coordinates of the SWERB_DATA.seen group. This
column may not be NULL
.
See the SWERB Data overview for more information.
The altitude, in meters, of the landscape on which the
seen group is
located. This column may be NULL
.
See the SWERB Data overview for more information.
The amount of error reported as positional dilution of
precision. This column may be NULL
when there is no PDOP
information.
See the SWERB Data overview for more information.
The accuracy of the GPS reading, in meters. This
column may be NULL
when there is no accuracy information
in meters.
See the SWERB Data overview for more information.
The date and time automatically supplied by the GPS
unit at the time the waypoint was recorded. This column may
not be NULL
.
This column may be NULL
.
The information manually entered by the observer into
the GPS unit as a coded waypoint that describe the SWERB
data being recorded. This column may be empty, it need not contain characters,
but it may not contain only whitespace characters. This column may not be
NULL
, although it may be a string 0 characters long.
See the SWERB Data
overview for more information.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
For teams collecting SWERB data this table contains one row for every departure from camp of every member of the departing observation team for those team members who drive or record data.
The system will generate a warning for those SWERB_DEPARTS_DATA rows without at least one related row in SWERB_OBSERVERS.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular observer's departure from camp as part of a particular observation team.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The id of the SWERB_DEPARTS_DATA row
representing the departure from camp of the observer's
observation team. This column must not be NULL
.
Initials of the observer. The legal values of this column are defined by the OBSERVERS support table.
This column must not be NULL
.
The role assumed by the member of the SWERB observation team. The legal values of this column are defined by the OBSERVER_ROLES support table.
This column must not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every tree in the tree monitoring project.
Trees can only be located in groves -- the value of the
TREES.Loc column must reference a
SWERB_GWS row which has a SWERB_GWS.Type of
G
(Grove).
Tree numbers are unique within each grove. The combination of Loc and Tree must be unique.
A unique identifier. This is an automatically generated sequential number that uniquely identifies the row and may be used to refer to a particular tree.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
The identifier of the grove, a SWERB_GWS.Loc value, in which the tree is located.
This column must not be NULL
.
The integer used to uniquely identify a tree within a particular grove.
This column must not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The data in this section are collected from manually read instruments, with one notable exception: the DIGITAL_WEATHER table contains data from electronic instruments that record weather data automatically.
The MIN_MAXS view provides a way to view all the tables containing manually collected weather data at once, with each weather data collection event appearing as a single row.
The weather-related tables contain weather-related information and so do not directly relate to any of the baboon information contained in Babase.
This table contains one row for every time a rain gauge reading is recorded. There can be at most one RAINGAUGES row per WREADINGS row.
The identifier of the meteorological collection event during which the rain gauge was read. Must be a value contained in the WRid column of a row on the WREADINGS table, and the associated row may not be associated with any other row in RAINGAUGES.
This column cannot be changed; and must not be
NULL
.
The interval, in an integral number of seconds, since the previous rain gauge collection event.
This column is automatically maintained by the
database and cannot be changed. This column must not be
NULL
.
When the WREADINGS.WRdaytime values used to compute RGspan are not integral, the resulting RGspan value is rounded to the nearest second. Values of .5 seconds are rounded to the nearest even number of seconds.
When a new row is inserted the value of this column
is silently ignored and an automatically computed value is
used in its place. It is best to omit this column from
the inserted data (or specify the NULL
value).
Whether or not any estimated WREADINGS.WRdaytime
values were used in the computation of the RGspan column. TRUE
if any of the
relevant WREADINGS.Estdaytime values are true, FALSE
otherwise.
This column is automatically maintained by the
database and cannot be changed. This column must not be
NULL
.
When a new row is inserted the value of this column
is silently ignored and an automatically computed value is
used in its place. It is best to omit this column from
the inserted data (or specify the NULL
value).
The measurement of rain accumulated since the last time the rain gauge was read. In millimeters stored using a data type having a precision of 0.1 millimeter. For the precision and accuracy of the data itself see the Amboseli Baboon Research Project Monitoring Guide.
This column must be non-negative and may not be more
than 200.0
. This column may not
be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every time a rain gauge is installed. There can be no RAINGAUGES rows recording rain gauge measurements at any given weather station (WSTATIONS) unless there is a prior record of a rain gauge installation in RGSETUPS.
Rain gauge measurements are only meaningful when it is known how long the rain has been collected. In the event that, e.g., an elephant steps on the rainguage, there will be a period of time until the rain gauge is replaced. The first reading of the replacement rain gauge is not a measurement of rain since the last rainguage reading, but is instead a measurement of the rain collected since the replacement rain gauge was installed. The RGSETUPS table allows the system to compute RAINGAUGES.RGspan intervals when rain gauges are replaced, first installed, or after an interval of corrupted measurements.[164]
There cannot be a RGSETUPS row and a RAINGAUGES row for the same location at the same time.
The combination of RGSdaytime and Wstation must be unique.
A unique positive integer representing the rain gauge setup event.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
Code indicating the station at which the rain gauge was installed. Must be a value on the WSTATIONS table.
This column cannot be changed and must not be
NULL
.
TRUE
when the RGSdaytime
column contains an estimated time. FALSE
when the RGSdaytime column is an accurate record
of the time the rain gauge was installed.
Initials of the person who collected the data. Must be a value contained in the Initials column of a row on the OBSERVERS table.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every time a minimum temperature reading was recorded. There can be at most one TEMPMINS row for every WREADINGS row.
The Tempmin column has one
decimal point of precision, but thanks to limitations of the
thermometers the temperature is normally collected with a half
decimal point of precision; the digit to the right of the
decimal point should be either a 0
or a
5
. This may not always be so, however.
The system will return a warning when the Tempmin is not a multiple of
0.5
.
Beginning 01 July 2022, a new thermometer with higher accuracy and precision was deployed, allowing for reliable recording of temperature to the nearest tenth of a degree. For this reason, the above warning only applies to data collected before that date[165].
The identifier of the meteorological collection event during which the minimum temperature was read. Must be a value contained in the WRid column of a row on the WREADINGS table, and the associated row may not be associated with any other row in TEMPMINS.
This column cannot be changed; and must not be
NULL
.
The minimum temperature recorded since the last minimum temperature reading.
This table must contain a value between
-5
and
35
, inclusive of endpoints,
and must not be NULL
.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table contains one row for every time a maximum temperature reading was recorded. There can be at most one TEMPMAXS row for every WREADINGS row.
In extreme circumstances where a temperature reading is
known to be spurious in some way, it may be desirable to
record a correction or adjustment from the original
temperature. When this is done, the adjusted temperature
should be recorded in the Tempmax
column, and the unadjusted temperature in the Unadjusted_Tempmax column. If no
adjustment has been made, the Unadjusted_Tempmax should be
NULL
.
Because a non-NULL
Unadjusted_Tempmax indicates that an
adjustment has occurred, the Unadjusted_Tempmax cannot be equal to the
Tempmax.
Both temperature columns have one decimal point of
precision, but thanks to limitations of the thermometers the
temperatures are normally collected with a half decimal point
of precision; the digit to the right of the decimal point
should be either a 0
or a
5
. This may not always be so, however.
Newer thermometers may be more precise, and temperature
adjustments may not conveniently be to the nearest 0.5°.
The system will return a warning when either Tempmax or Unadjusted_Tempmax is not a multiple of
0.5
.
Beginning 01 July 2022, a new thermometer with higher accuracy and precision was deployed, allowing for reliable recording of temperature to the nearest tenth of a degree. For this reason, the above warning only applies to data collected before that date[166].
Values in both of the temperature columns in this table
must be between 10
and
50
, inclusive.
Weather station BC1 was positioned too close to the kitchen, resulting in spuriously high Tempmax readings. To correct for this, all Tempmax readings from that weather station have been adjusted by -4.2°C (rounded from -4.245). This adjustment was calculated as the residual + fixed effect from a model of Tempmax as a function of day of the year + random intercept of weather station with only BC1 and BC2, BC3, BC4 combined in the dataset (i.e., Tempmax ∼ day of the year + (1 | Wstation)). Day of the year was included in the model to correct for the fact that BC1 had an overrepresentation of January to June dates compared to the other three BC weather stations. BC5 was not used in the calculation because at the time of calculation there was less than one year of weather data from this station. We also calculated adjustment factors in two alternative ways which yielded extremely similar values: (1) taking the difference between the mean Tempmax of BC1 and mean Tempmax of BC2, BC3, BC4 combined (adjustment factor = -4.29°C) and (2) taking a residual + fixed effect from a model of Tempmax as a function of a fixed intercept + random intercept of weather station with only BC1 and BC2, BC3, BC4 combined in the dataset (i.e., Tempmax ∼ 1 + (1 | Wstation); adjustment factor = -4.28°C).
The WREADINGS.WRid of the meteorological collection event during which this maximum temperature was read.
This column is unique, cannot be changed, and must
not be NULL
.
The maximum temperature recorded since the last maximum temperature reading.
This column may not be NULL
.
The original, unadjusted maximum temperature, when the value in the Tempmax column has been adjusted in some way.
This column may be NULL
, when the Tempmax has not been adjusted.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
This table records the weather data that are automatically collected each hour by an electronic weather collection instrument.
Originally, this table only contained data from WeatherHawk devices and was therefore named WEATHERHAWK. Likewise, the WEATHER_SOFTWARES table was originally named WEATHERHAWK_SOFTWARES. On 28 Nov 2023, these tables were renamed to reflect that they may also contain data related to other devices. Ideally, the WEATHERHAWK_HISTORY and WEATHERHAWK_SOFTWARES_HISTORY tables should remain in the babase_history schema so that changes to those tables will remain accessible. However, when these tables were renamed their history tables were both empty. There were no archived changes in either table that needed to be preserved, so the old history tables were not retained.
A weather station cannot have more than one reading at the same time. That is, the combination of TimeStamp and WStation must be unique.
Instrument accuracy may not, and probably does not, correspond with the recorded degree of precision. These instruments collect their data in engineering units, which are interpreted and converted to standardized units (degrees, kPa, etc.) by PC software when the data are retrieved from the instrument. Different PC software programs may vary in terms of units used, the number of significant figures employed, or other ways that are not immediately apparent. There are even some values that are simply not recorded by some programs or devices.
Despite hardware and software differences, most measurements saved in this table use a single column and a specified unit. Data managers should ensure that data are converted to the appropriate units, if needed. The allowed precision in these columns — usually a single digit to the right of the decimal — is based on a private message from WeatherHawk's technical support[167], who asserted that this is the maximum plausible precision that WeatherHawk devices are capable of measuring. It is presumed that this is also the maximum plausible precision for other (non-WeatherHawk) devices. This might be more or less precise than the value originally reported by the software.
Use the WEATHER_SOFTWARES table to see what is known about differences in these programs, including precision of measurements, units used, etc.
The WSoftware column is used to indicate which software was used to generate the data in each row, but the system does not treat data any differently based on this value. Users should be aware of the possibility of differences between programs, and decide for themselves how to handle any possible discrepancies.
Information about the voltage of the device's battery is provided in the BatVolt and BatVolt_Min columns. These values are not directly relevant to weather but can be useful if technical support is needed.
Wind speed may be recorded in km/hr as an integer or m/s
with 1 decimal point of precision, depending on the software
used. The precision difference between these two measures is
large enough that they are divided into separate columns.
Each row must indicate the average wind
speed; exactly one (not both) of the WindSpeed_Avg_Km_Hr and WindSpeed_Avg_M_S columns must not
be NULL
. Maximum wind speed is not
required, but when recorded it must be in either the WindSpeed_Max_Km_Hr or WindSpeed_Max_M_S column, but not
both.
Each row must only use a single unit for all of its wind
speed values; when WindSpeed_Avg_Km_Hr is NULL
,
WindSpeed_Max_Km_Hr must
also be NULL
, and when WindSpeed_Avg_M_S is NULL
, WindSpeed_Max_M_S must also be
NULL
.
The barometric pressure value provided in this table (Barometer) is corrected, accounting for Amboseli's elevation: ~1130 m. To calculate the uncorrected values, ask a meteorologist.
Prior to Babase 5.5.3, this column contained only UNcorrected values. Those values were corrected simply by adding 12.94503[168]to the uncorrected value.
When devices like these record rainfall, they often use a small "tip bucket" that only records rain when the bucket fills (see the device's user's manual for more information) and which theoretically may contribute to small errors in the accuracy of the measurement. For example, the WeatherHawk used a 1-mm tip bucket. If there is less than 1 mm of rainfall over the course of a given hour, the bucket may not fill up at that time and the rain will not be measured until later or may evaporate before the bucket fills. When there is a gap in the hourly measurements (due to changing out sensors, battery malfunctions, etc.), rainfall data during the down period might not be recorded.
Despite the fact that data are recorded every hour, some devices (e.g. WeatherHawk) do not simply report the amount of rainfall measured in that hour. Instead, these devices report the cumulative amount of rainfall measured since the beginning of the year[169]. That value is recorded in the YearlyRain column, for those devices that report it.
The rainfall for each hour is recorded in the TimeStampRain column. For rows
whose YearlyRain column is
not NULL
, this value is the result of a simple calculation:
this row's YearlyRain minus
that of the chronologically previous row.
When this table was first created and only contained data from WeatherHawk devices, the value of the TimeStampRain column was automatically calculated when new rows were added. That is, for a given row, the YearlyRain of the most recent row from the same calendar year and the same WStation was subtracted from the given row's YearlyRain, resulting in the amount of rainfall that was measured since the previous TimeStamp.
After the last WeatherHawk device was retired and data
from other devices began to be added, this automatic
calculation stopped being useful. In Babase 5.5.1, this table's
ability to calculate TimeStampRain from YearlyRain was removed, largely
based on the assumption that future devices are unlikely to
use the dubious YearlyRain
measurement. All previously calculated TimeStampRain values were
not removed, so the TimeStampRain in a row with a
non-NULL
YearlyRain can
safely be assumed to be a result of that
functionality.
The amount of rain measured in the year cannot be less
than the amount measured in a single timestamp. That is, when
the YearlyRain is not NULL
it cannot be greater than the TimeStampRain.
Do not assume that TimeStampRain values always describe a single hour's worth of rain. When one or more hours is absent from the data, the TimeStampRain value is the amount of rainfall measured since the previous row in the same year. Also do not assume that these values describe all of the rain that occurred in the intervening hours. If the device was off or malfunctioning at the time, then actual rainfall may have occurred and/or evaporated without being measured.
A unique positive integer identifying the device's meteorological data collection that is recorded in this row.
This column is automatically maintained by the
database, cannot be changed, and must not be
NULL
.
Date and time of the measurement. Measurements must
be taken on the hour. Minutes,
seconds, microseconds etc must be
0
.
As indicated by the name, this value is a time
stamp. It indicates the end of the period described in
each row, not the beginning. This means that the last
hour of a day will have a TimeStamp from the next day,
e.g. the data from 23:00-23:59 on 31 Dec 1999 will have
a TimeStamp of 2000-01-01
00:00
.
This column may not be NULL
.
The WEATHER_SOFTWARES.WSoftware value indicating which software was used to generate the data.
This column may not be NULL
.
The record number for this line, exported in the software. This appears to be a unique ID number used by the device or the software, or both.
This column may be NULL
if the software did not
report this value.
The voltage of the battery at the TimeStamp. Values must be
between 10.00
and
14.00
, inclusive.
This column may not be NULL
.
The minimum voltage of the battery in this hour.
Values must be between 10.00
and 14.00
, inclusive.
This column may be NULL
if the software did not
report this value.
Average air temperature for this hour, in degrees
Celsius. Values must be between
-10.0
and
50.0
, inclusive.
This column may not be NULL
.
Average relative humidity for this hour in percent
humidity. Values must be between
0.0
and
100.0
, inclusive.
This column may not be NULL
.
Average wind speed for this hour, in km/hr. Values
must be between
0
and
30
,
inclusive.
This column may be NULL
.
Average wind speed for this hour, in m/s. Values
must be between 0.0
and 15.0
,
inclusive.
This column may be NULL
.
Solar radiation in Watts per square meter. Values
must be between 0.0
and
2000.0
, inclusive.
This column may be NULL
if the device did not
report this value or if a reported value was subsequently
recognized as erroneous.
Minimum air temperature for this hour, in degrees
Celsius. Values must be between
-10.0
and
50.0
, inclusive.
This column may be NULL
if the software did not
report this value.
A time stamp indicating the minute in which the AirTemp_Min occurred.
This column may be NULL
if the software did not
report this value.
Maximum air temperature for this hour, in degrees
Celsius. Values must be between
-10.0
and
50.0
, inclusive.
This column may be NULL
if the software did not
report this value.
A time stamp indicating the minute in which the AirTemp_Max occurred.
This column may be NULL
if the software did not
report this value.
Wind direction in degrees from North. Values must
be between 0.0
and
360.0
, inclusive.
The values of 0.0
and 360.0
represent the
same direction. There's no telling if one or the other
of them means something special, like “no
measurement”. If they really do represent the
same direction then we should probably change the rules
and adjust the data values so that legal values are
between 0
and
359
.
This column may not be NULL
.
Maximum wind speed for this hour, in km/hr. Values
must be between
0
and
30
,
inclusive.
This column may be NULL
if the software did not
report this value.
Maximum wind speed for this hour, in m/s. Values
must be between 0.0
and 15.0
,
inclusive.
This column may be NULL
if the software did not
report this value.
A time stamp indicating the minute in which the maximum wind speed[170] was recorded.
This column may be NULL
if the software did not
report this value.
Atmospheric pressure at the TimeStamp, expressed in kPa
and corrected for elevation. Standard atmospheric pressure
at sea level is 101.325 kPa, so this column's values must
be between 96.3
and
106.3
, inclusive.
This column may be NULL
if the software did not
report this value or the reported value was subsequently
recognized as erroneous.
The amount of rain measured since the beginning of
the year, in millimeters. Values must be integers greater
than or equal to
0
.
This column may be NULL
if the device did not
report this value.
The amount of rain that was measured at this WStation since the previous TimeStamp.
This column may not be NULL
.
An integer, indicating the number of lightning strikes recorded throughout the hour represented by this row.
This column may be NULL
if the software or device
did not report this value.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
The WREADINGS table contains one row for each time a person has collected data from the meteorological instruments. So, each WREADINGS row should have at least one associated RAINGAUGES, TEMPMINS, or TEMPMAXS row, but no more than one associated row from any one of these tables.
Automated weather readings are not recorded in WREADINGS .
For any one weather reading the minimum recorded temperature cannot exceed the maximum recorded temperature -- the TEMPMINS.Tempmin value related to the WREADINGS row cannot exceed the related TEMPMAXS.Tempmax value.
The combination of WRdaytime and Wstation must be unique.
The Wstation column cannot be changed when there is a related RAINGAUGES row.
A unique positive integer representing the meteorological data collection event.
This column is automatically maintained by the
database, cannot be changed, and must not be NULL
.
Code indicating the station from which the data were collected. Must be a value on the WSTATIONS table.
The day and time the meteorological data were collected. The time zone is Nairobi local time.
TRUE
when the WRdaytime
column contains an estimated time. FALSE
when the WRdaytime column is an accurate record
of the time the measurement was taken.
Initials of the person who collected the data. Must be a value contained in the Initials column of a row on the OBSERVERS table.
Textual notes on the weather reading.
This column may be NULL
when there are no
notes.
This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.
[26] There are, of course, also system generated row identifiers, which are arbitrary and not derived from any field collected data.
[27] As opposed to using a query to let the database do all the considering for you.
[28] This is a generated error instead of one that is immediately raised in order to ease the data entry process. Because births are recorded before CENSUS rows are entered so that new births do not raise errors when uploading census data, new births regularly have dates that follow the mother's Statdate. This could be avoided by entering births without a Pid and then updating the Pid once the CENSUS table has been updated but this was deemed overly burdensome.
[29] Recall that when an individual has no non-absent CENSUS rows, their Statdate is set to their Entrydate, which might be before the LatestBirth. It is therefore presumed that a Statdate being before a LatestBirth will only ever be a temporary occurrence that will go away after the individual's CENSUS data have been added.
[31] This number was chosen based on data management
minutiae related to the fact that a single census in a group
can interpolate an individual present in the group for up to
14
days. If this
value in the interpolation code ever changes, then the
number of days that LatestBirth
is allowed to be after Entrydate
should be re-evaluated.
[32] Thanks to the annoying habit of certain months to not be exactly thirty days — not to mention "leap days" — it's possible that different users may have slightly different interpretations of how many days are contained in "X" years. To allow some flexibility when making these estimates, this rule is implemented as a warning and not an error.
[33] This column was added when PostgreSQL depreciated its “hidden” identifier column, Oid.
[34] This is unlikely as the database will not allow entry of a duplicate Sname.
[35] At the time of this writing, the focal sample
data collection devices use the Sname
XXX
for their own special purposes.
There may be other such reserved Sname values unknown
to Babase.
[36] Or whatever you want to call it in the case of a fetal loss.
[37] This is termed a visit in the Protocol for Data Management: Amboseli Baboon Project, which should be consulted for further details.
[39] D
usually
occurs when a male is seen alone or in a
non-census group.
[40] When the Status
column is D
, the
value of the Cen column
indicates whether or not the individual was
marked “absent” on the field census
for the day.
[41] Facilities exist to require such CENSUS
rows and their associated DEMOG rows be entered in a single
transaction, and the rule requiring CENSUS
rows with a Status of
D
to have a
related DEMOG rows could
then be enforced.
[42] DEMOG nearly makes the
M
CENSUS
Status code obsolete, were it not
so hard to search on textual data. Indeed, it was created
in response to difficulties with the
M
code.
[43] It may seem odd that the Comment column may be
NULL
given that this is the only column in the table
containing baboon-related data. However the data
entered into the database can be an abbreviated version
of the actual demography note, abbreviated even into
non-existence.
[44] The system checks the group in which the individual was last censused “present” rather than the individual's Matgrp in order to accommodate group splitting.
[45] Presently group 9.0
.
This value is hardcoded
at present.
Individuals are generally put in the unknown group when interpolation does not know their group membership, but it is also possible for an individual to be explicitly placed in the unknown group.
[46] This implies that a GROUPS row's From_group and it's To_group cannot be equal.
[47] As opposed to it being merely a coincidence that the gap began the same date that the group did.
[48] Again, as opposed to it being coincidence that gap and group ended at the same time.
[49] Because there is not a separate column for fusion start date Babase can only track fusions when all groups involved start fusing on the same date. Babase cannot track fusions involving more than one group when 2 groups begin to fuse and others fuse later before the first 2 groups complete fusing.
[50] The precise definition of an "official" study group is left for data management to determine.
[51] In constrast to birth and death, which mercifully tend to be pretty definite.
[52] At the time of this writing, the date used in the case where the transition to sexual maturity was not observed is the date when the individual first came under observation and was already mature.
[53] The “ON” date MSTATUSES code is a special value. See MSTATUSES: Special Values.
[54] Note that this is not literally true, because testicular changes in males are not tracked on a daily basis - males are assigned a matured date on the first day of the month in which seen with fully round testes. Likewise, a female's first Tdate will sometimes have a few days of error around it, as might other transitions.
[55] ...or average, or standard deviation, etc....
[56] This value was chosen somewhat arbitrarily. It's certainly possible to have more than 9 of a particular wound or pathology affecting a body part, but for our uses such a high number is implausible. This value may need to be adjusted in the future.
[57] Therefore during periods of continuous observation no sexual cycle transition events can go unrecorded. See the CYCPOINTS documentation below for the constraints this places on CYCPOINTS within a series.
[58] Yes, updates to CYCPOINTS can result in automatic changes to the CYCGAPS.State, meaning that updates to both tables are occurring in a single transaction. This is okay, because updates to State do not result in changes to CYCGAPDAYS.
[59] Admittedly, validation on CYCPOINTS and other tables could be rewritten to eschew CYCGAPDAYS and use CYCGAPS instead. However, that would result in a major performance dip, so let's not do it unless we have to.
[62] See Appendix C for an example.
[63] This rule minimizes the degree to which CYCPOINTS move between cycles, minimizes the degree to which their Cids change.
[64] It may not be worth documenting this, as there are certainly cases where it is not clear which rows are “earlier”. One such case is changing the date of a Ddate to a later date, that fall after subsequent cycles. If there is concern about the permanence of Cids then it may be best to simply delete CYCPOINTS rows and re-insert them rather than modify existing rows. This at least gives the greatest degree of control over the Cid values.
[65] Quite a bit of Babase's logic relies on there being a continuous series of Mdate, Tdate, Ddate sequences unless there are gaps in observation. It is for this reason that cycles must be “complete”.
[66] This is checked rather than enforced by index or trigger because the condition must exist temporarily as the triggers update the Seq.
[68] The system allows the condition to occur to provide an opportunity to insert a new Mdate, Ddate, Tdate aggregate -- a new cycle -- into the middle of a period of observation. One of these dates must be inserted first, breaking, for the moment, the pattern of cycling -- the repetition of the Mdate, Ddate, Tdate sequence.
[69] This is enforced in triggers rather than by index as the triggers use this condition as a test for whether a new CYCLES row must be created.
[70] It is expected that such rows will exist only until PREGS.Conceive is updated with a reference to them.
[71] Note that cycles may be “cut off”, for a variety of reasons; some cycles may only contain a single CYCPOINTS row, that is, the Cid value may be unique to a single CYCPOINTS row.
[73] Or was in progress when observation ceased, which Babase treats the same as pregnancies in progress at the time data entry ceased. When “now is” is an important consideration in the determination of what in progress means. The cessation of data entry (e.g. BIOGRAPH.Statdate), for whatever reason, is the closest Babase comes to the concept of “now”.
[74] This implies that each Resume value differs from all the others.
[75] Zdate really.
[76] This condition also ensures that a female will not have more than one ongoing pregnancy, as pregnancies require a conception cycle.
[77] It is expected that such Tdates will exist only long enough to update a pregnancy's Resume value.
[78] There should only be CYCGAPS rows when a sexual cycle event may have been missed, but clearly when there is a CYCPOINTS.Resume value then no sexual cycle was missed.
[79] The MATERNITIES view does exactly this. It can be used whenever there is a need for these tables to be joined in this way.
[80] Why is this round-about-the-barn way preferred? Because curmudgeonly old database designers like to insist that keys contain no meaningful information, that's why.
[81] See? We told you that keys should not contain meaningful information.
[82] This indication of a period of no observation is not validated against the CYCGAPS table, that serves as a record of periods of no observation which are long enough that a sexual cycle transition event (Mdate, Tdate, or Ddate) may be missed. Babase does not have records of periods of no observation that are long enough to miss pregnancies. Although it would seem that CYCGAPS could be used for this purpose, and indeed CYCGAPS does “black out” REPSTATS, validating parity against CYCGAPS has not been thought through and awaits a future Babase enhancement.
Regardless, Babase does not presently automatically place a parity in the 100's -- the decision to switch between the 100s and the 1s (or 10s) must be made manually.
[83] This criteria is carefully phrased to account for gaps in the recorded data during the time period in which deturgesence probably began.
[84] When an individual matures, at menarche, there is no Mdate in the first sexual cycle.
[85] notably consortships
[86] There is no restriction on the age or maturity status of the female.
[87] This is not always as useful as it seems. See the rationale for the PARTS table.
[88] It is not that these interactions never occur among young individuals, it is that the researchers' interest is in paternity and maternity and so find that having to concern themselves with filtering out sexual interactions between juvenile individuals is distracting.
[90] and perhaps ejaculation
[91] Presumably data that is collected on a Psion or other electronic device.
[92] Requiring INTERACT_DATA.Observer be NULL
, even when
the existing value is “correct” and
synchronized with SAMPLES.Observer, ensures that the value of
the observer column has been taken into consideration by
the person modifying the database.
[93] Consult the Amboseli Baboon Research Project Monitoring Guide to be sure, but this is because the accuracy of the data are never more than one minute, if that.
[94] See the appendix: The All-Occurrences Focal Point Data.
[95] As opposed to recording the interaction with an electronic device.
[96] Whether or not a MPI_DATA row records a request for
help is determined by whether or not the value of the
related MPIACTS.Kind column is
R
.
[97] Because the individual from whom help was requested is unknown, there is no way to tell if help was given in response to the request.
[98] Note that if we had the time the sample started, to the second, and we knew that the operator never took more than 59 seconds to enter the point data, and we assume that the operator makes the observation when the timer chimes, then we could calculate the actual time the point was observed. Absent these conditions it appears difficult or impossible to tell which of the 1 minute observation intervals were missed when there is not an exact match between the number of points taken and the total number of minutes in the sample.
[99] It is possible to create a view that extends the
NEIGHBORS table by adding another column, call it
Neighbor, that contains either the Sname or the Unksname,
which ever is not NULL
. However, the utility of such a
column is not obvious because it seems that any analysis
done using such a column would have to consistently use
outer joins and then constantly test for NULL
results,
lest the Unksname data disappear from the analysis. At
first glance this seems similar to the testing which must
be done to when using two separate columns, the existing
design, so it is not clear whether there's anything to be
gained.
Such a view can always be added in the future without breaking backward compatibility.
[100] Assuming that the neighbor is a known individual, that
the NEIGHBORS.Sname column is not NULL
.
[101] The information on the actual unknown neighbor codes used in the field does not appear to be in the Amboseli Baboon Research Project Monitoring Guide.
[102] The name of the focal individual is always recorded, as there is always the intention to observe the focal individual even though this does not always happen.
[103] As the values in the POINT_DATA.Ptime column has little to do with the actual time of observation, it is impossible for Babase to perform additional consistency checks to between the points and the corresponding summary information in SAMPLES. Fortunately, as the data loading process is automated, there is little opportunity for data corruption.
[104] As all observation occurs during the day there are no issues surrounding samples taken just before midnight that start on one day and end on the next. Should there ever be such, this should be the date the sample started.
[105] The anesthetic administration times are not aggregated in this view although it could be useful to aggregate the difference between the time of darting and the time additional anesthetic was administered.
[106] To cover the case where Dartings-Pickuptime is
NULL
.
[108] The column is allowed to be NULL
due to data
entry procedural constraints. The first data uploaded
creates rows in DARTINGS but the data
set containing mass is not uploaded until later.
[109] In a canonical database design this column would be on the DPHYS table. The column is part of the DARTINGS table due to concerns that the column might be overlooked by a user because so many other note columns are on the DARTINGS table.
[110] In a canonical database design this column would be on the DART_SAMPLES table. The column is part of the DARTINGS table due to concerns that the column might be overlooked by a user because so many other columns are on the DART_SAMPLES table and DSAMPLES view.
[111] This behavior exists so that rows can be inserted into TEETH via the DENT_CODES view.
[112] The alternative to this, an approach closer to the “ideal” database design, is to have separate tables for width and length measurements. This seems excessive.
[113] This rule is a result of the aforementioned design choice that places Testwidth and Testlength in the same table. A consequence of this choice is that this rule must exist to ensure that Testseq values are, effectively, contiguous.
Note that this condition must remain true even while
the rows are in the process of automatic
re-sequencing. It may be that some combinations of
data values will simply not work with all possible
UPDATE
statements that change the row
sequencing. Those experiencing problems should delete the
rows in question and re-insert them with the correct
sequence numbers.
[114] The alternative to this, an approach closer to the “ideal” database design, is to have separate tables for width and length measurements. This seems excessive.
[115] This rule is a result of the aforementioned design choice that places Testwidth and Testlength in the same table. A consequence of this choice is that this rule must exist to ensure that Testseq values are, effectively, contiguous.
Note that this condition must remain true even while
the rows are in the process of automatic
re-sequencing. It may be that some combinations of
data values will simply not work with all possible
UPDATE
statements that change the row
sequencing. Those experiencing problems should delete the
rows in question and re-insert them with the correct
sequence numbers.
[116] Also "tissue" and "tissue sample", but those two terms aren't terribly different anyway.
[117] This is expected to be the highest plausible accuracy to ever be used for the concentrations stored in this table. This can easily be expanded if needed.
[118] Even in the coldest of cold storage, frozen samples will slowly evaporate over time. A 100-μL sample that is frozen and stored for 5 years is unlikely to still be the full 100 μL at the end of that time.
[119] It is presumed that any reader who cares enough about nucleic acid samples to read this documentation is already familiar with the polymerase chain reaction. We will not attempt to explain it here.
[120] Admittedly, this approach is imperfect and is likely underestimating the true prevalance of the problem. The date written on a sample may not be the true date it was collected but may still be a date that the individual was censused. Unfortunately, there is little else that the system can do to recognize when this occurs.
[121] In real life, this rule could easily be violated. Something is going to need to change, before long.
[122] That is, the population whose data are recorded throughout the many tables in Babase.
[123] Related rows in this table are automatically inserted when rows are inserted into BIOGRAPH, so manual insertion of these rows is effectively not allowed.
[124] Similar to inserts, related rows in this table are automatically deleted when rows are deleted from BIOGRAPH, so manual deletion of these rows is effectively not allowed.
[125] Waterholes may be more or less permanent features of the landscape, or only temporary rain pools. This is no surprise to those familiar to the SWERB dataset, but whenever waterholes are mentioned in relation to SWERB data the “waterhole” may be either a waterhole or a rainpool.
[126] It is believed but not certain that this is the way PDOP is used.
[127] It is not clear whether the accuracy is 2 or 3 dimensional vector; whether the reported distance includes error in altitude.
[128] Because database rules which enforce when PDOP and
Accuracy values must be NULL
are hardcoded into the
database it will take programmatic changes to change these
limits. Normally this would be avoided by adding a column
to the GPS_UNITS table to indicate whether
or not the particular GPS unit records a PDOP or accuracy
reading, thus allowing new units to be introduced which
record such data. However because records have been lost as
to which specific GPS units were used when and, as of the
time of this writing, no one wishes to reconstruct the
categories of GPS units in use based on a PDOP/Accuracy
capability criteria the system design uses hardcoded dates
to validate. Note further that given the existing set of
validation criteria for PDOP and Accuracy there is never a
circumstance which requires a PDOP or accuracy to be
present. Normally the values of GPS_UNITS.Errortype
would force the presence of PDOP or Accuracy values.
Instead they merely enforce their absence. This is partly
for reasons similar to the preceding and partly because,
particularly during periods when GPS data was
hand-transcribed, sometimes data is missing.
[129] And, possibly, subsequently corrected by the data specialists after consultation with the field teams.
Because the data manager expands the observer codes in the departure rows from 1 to 3 characters the SWERB_DEPARTS_GPS.Garmincode column can hold more than 10 characters.
[130] From a database design perspective it would make sense to control whether or not a Garmincode must be present based on a column in the GPS_UNITS table. In practice because all future GPS units will very likely allow the entry of data when waypoints are taken the matter is moot.
[131] While it may be desirable to have a cutoff date after which all data obtained using GPS units must come from the GPS units themselves, no such cutoff date has been established.
[132] Electronic manufacturers have taken to silently changing the specifications of a device without changing the model, a situation which is quite annoying when the specifications matter. When no other sort of identifying information is available sometimes the serial number can be used to determine device capabilities.
[133] The Amboseli Baboon project data protocols require these codes have a particular structure. Babase does not enforce these requirements, primarily because the QUAD_DATA table is essentially a support table and, once created, is static so enforcing specific rules in the database is not worth the time.
[134] Note that rows that violate this rule are not instantly rejected; the error is caught at the time of transaction commit. This is so that during data entry Btimeest and Etimeest values may be entered without Start and Stop values in the expectation that by the time the transaction is committed the insertion of SWERB_DATA rows will have automatically filled in the missing Start and Stop values.
[135] This last check is also performed at transaction commit time, for the same reason.
[136] Ideally, a begin or end time should not be NULL
unless the records have been perused and no time found, in
which case the time source would always be
bb_norecord
when there was no time. In
practice this has not been done.
[137] Note that this rule is tested for immediately, not at
the time of transaction commit. This
means that the Btimeest and
Etimeest columns must be
non-NULL
before inserting SWERB begin and end rows that
have non-NULL times.
[138] More precisely, when the SWERB_BES.Seq is
NULL
. This typically amounts to the automatic
sequencing of newly inserted rows because those are the
rows which typically have no Seq value.
[139] At first glance it would seem appropriate to
sequence those SWERB_BES rows with
NULL
Start times based on
the first related SWERB_DATA.Time value but this presents a
number of problems. Such a design would not allow for any
flexibility in manually re-sequencing such rows unless
automatic sequencing took place only upon insert of SWERB_DATA rows, in which case inserting and
then deleting the inserted row could change the sequencing
of the SWERB_BES rows. Such
un-reversible changes can be confusing.
[141] Manual sequencing is therefore only useful when the
SWERB_BES.Start is NULL
or when there are
“ties”. Sequencing is normally manipulated
by changing SWERB_BES.Start values, which are themselves
automatically picked up from SWERB_DATA
rows with B
Event values.
When testing for correct sequencing of a SWERB_BES row other bouts of observation
(other SWERB_BES rows) related to the
same group on the same day cannot have a smaller Seq and also have a Start value greater than the
smallest related SWERB_DATA.Time related to the given row. In
those cases where other bouts of observations related to
the same group on the same day have a NULL
Start value the comparison is
instead against the other bout's earliest related SWERB_DATA.