Data and Database FAQs
On this page, we attempt to compile a list of answers to frequently asked questions from Babase users. This page is a living document, and will be updated regularly. If you feel like an important FAQ has been omitted, please make a suggestion to a data manager or project leader!
To be clear: this document is not a replacement for the database tech specs. If you have a question and don't see it listed here, you should try to find the answer in the tech specs before asking someone for help.
Data and Database FAQs
- How do I know when a group fission begins and ends?
- How do I determine proportional ranks?
- Which groups are "wild-feeding"?
- How do I determine if a group is a study group?
- How do I know when an individual is an "adult"?
- How do I calculate group size (per day/month/year)?
- Where do I find data about weather in Amboseli?
- How can I make a Babase-like database of my own?
How do I know when a group fission begins and ends?
During fission periods, daily group membership is in flux. An individual may be censused in a daughter group one day, the parent group the next, and another daughter group the following day. During these times, many of our common calculations and measurements (e.g. group size, social networking calculations, maybe even ranks) can be performed but they may not be biologically meaningful. Because of this, when gathering data for an analysis we often exclude data during fission periods.
To exclude fission periods, use the GROUPS_HISTORY view. Remember, nearly every group's lifetime is flanked by fission periods: the fission that created the group and the fission that dissolved it. The time between these periods, when a group is 'permanent' and not fissioning, is likely the time period that you want to retain for your analyses. This time begins on the group's "Permanent" date (when the 'creating' fission ends), and ends on the "Impermanent" date (when the 'dissolving' fission begins).
So for example, if you have a table (or subquery) called "mydata" that includes a "date" and "grp" column, you might exclude fission periods with a query like this:
SELECT mydata.* FROM mydata JOIN groups_history ON groups_history.gid = mydata.grp AND groups_history.permanent <= mydata.date AND (groups_history.impermanent >= mydata.date OR groups_history.impermanent IS NULL) -- For groups that haven't fissioned yet
How do I determine proportional ranks?
Use the PROPORTIONAL_RANKS view!
Which groups are "wild-feeding"?
Any group that is numbered "3.n" was the "Lodge" group or a fission product from it. These groups are NOT wild-feeding. As far as we know, all others are.
How do I determine if a group is a study group?
Many groups were study groups for their entire duration, but many other groups are only study groups during a period of time that is specified in the GROUPS table and GROUPS_HISTORY view.
A group is first considered a "study group" on its GROUPS.Study_Grp date. The end of its tenure as a study group is determined by one of the following:
- We stopped regular observation/study of the group (i.e. we "dropped" them), indicated by the GROUPS.Last_Reg_Census date
- The group ceased to exist, indicated by the GROUPS.Cease_to_Exist date
- The group has not stopped being a study group (i.e. it is a "current" study group), so both the Last_Reg_Census and Cease_to_Exist are NULL
Note that it is possible for a group to cease to exist after having been dropped. Queries that check each of these dates should not consider the Cease_to_Exist date if Last_Reg_Census is not NULL.
An example query, returning each row in MEMBERS that is from a date in a study group:
SELECT members.* FROM members JOIN groups ON groups.gid = members.grp WHERE members.date >= groups.study_grp AND (members.date <= COALESCE(groups.last_reg_census, groups.cease_to_exist) OR (COALESCE(groups.last_reg_census, groups.cease_to_exist) IS NULL) )
One more thing.
The "AND" clause at the end is written that way to demonstrate the different possibilities bulleted above. That clause could be written more elegantly so that it only uses COALESCE() once. When those two columns from GROUPS are NULL, then we can satisfy the condition simply by providing a third option in the COALESCE() that will certainly always be after any date in MEMBERS:
AND members.date <= COALESCE(groups.last_reg_census , groups.cease_to_exist , NOW())
If a day ever comes where MEMBERS somehow has dates that are after NOW(), then this version will no longer work. It's probably safe to assume that won't happen.
How do I know when an individual is an "adult"?
For females, this is indicated in MATUREDATES. Males have mature dates as well--these indicate testicular enlargement--but usually this is not truly when a male becomes an "adult". In most of our analyses, the date the male first attained rank over adult males (in RANKDATES) is a better indicator of when a male is truly "adult".
If you're unsure which date to use for males, talk with your PI about which measurement would be more appropriate for your project. The answer is usually "rank date", so be prepared to discuss why testicular enlargement is especially important or relevant to whatever question(s) your project is investigating.
How do I calculate group size (per day/month/year)?
Calculating this daily is a simple query, but more often, users want to calculate group size per month. There are a few ways to go about doing this, each with its own inherent strengths and weaknesses. See two examples here, from the baboon project's shared GitHub repository.
Where do I find data about weather in Amboseli?
MIN_MAXS shows daily temperature lows and highs and rainfall, collected manually by members of the field team nearly every day since 15 Aug 1976. WEATHERHAWK has temperature lows and highs and rainfall, but it also has various other weather-related measurements (wind speed, barometric pressure, etc.). However, WEATHERHAWK has several important differences:
- The earliest data point is from 09 Nov 2003.
All of the data are collected by a WeatherHawk weather station.
- Occasionally, parts of the weather station wear out (e.g. sensors for solar radiation or humidity), and gaps form in related data sets until the part is repaired/replaced. A list of known gaps like these is maintained in the WEATHERHAWK_SENSOR_CHANGES_AND_FAILURES table in babase_pending.
The software used to collect data from the instrument has changed over the years, resulting in changes in the units used for wind speed measurements, and a change from integers to floating point numbers. See the WEATHERHAWK_SOFTWARES table for more information.
Each data point is for a single hour. Pretty much every other dataset in our database corresponds to a particular day or month, so you'll need to aggregate the WeatherHawk data into days/months before you can join it to other datasets.
How can I make a Babase-like database of my own?
It's complicated. See this page for our advice.