Protocols for Data Management
Alberts Lab - Duke
September 2007
Last updated 07 September 2007 by L.Maryott
General Introduction
This document should contain all of the necessary procedures for data management related to the Amboseli Baboon Project at Duke University. Included is a description of procedures for handling incoming data from the field (electronic and paper) as well as detailed guidelines on how to enter and update information into the master database. A similar document exists at Princeton outlining the steps for managing the data held there.
A few basic rules:
- Always back up your work
- Follow the backup schedule
- Keep up with the data entry and proofing
- Keep track of all data entry, proofing, and updating in the appropriate log books
- Enter Amboseli data in the appropriate subdirectories of the “ABRP_DATA_MANAGEMENT” folder.
- Document any new protocols or protocol changes
Most of the data management should follow these steps:
- Sorting/filing of incoming original data from Amboseli
- Data entry
- Data proofing and correction
- Data validation and updating
- Data archiving
The following sections of this document explain in detail how these steps are followed for the different types of data. This manual covers only data entry and updating. In-depth details about the BaBase data management system can be found in the Amboseli Baboon Project: Data Management System and the Pocket Reference; detailed descriptions of the data and data collection can be found in the Guidebook for the Long-term Monitoring of Amboseli Baboons and their Habitat. These guides can be found online at the following URLs:
https://papio.biology.duke.edu/babase_system.html
https://papio.biology.duke.edu/pocket_reference.pdf
http://www.princeton.edu/~baboon/monitoring_guide.htm
Filing and Management of Amboseli Data
Original data is kept in the Alberts lab library at Duke. Original data remains in Amboseli until someone traveling from the field brings it to the US. Photocopies of the originals arrive every month at Princeton. The photocopies are filed and stored in the Altmann lab. Some photocopies are sent to Duke for data entry purposes.
Filing Original Data
Monitoring Data - There is a yearly monitoring notebook (binder) for each study group. The following types of data are filed in the monitoring notebooks: census and sexskins, demography notes, reproductive notes, subgroups, mounts/consorts, other groups, predation/human disturbance, grooming, decided agonisms, undecided agonisms, multiparty interactions, wounds and pathologies, and other notes.
- Use the existing notebooks as a model
- Make sections exactly as in previous binders
- Census, demography, Sex Skin and reproductive notes go together
- Check to make sure all the pages are there if they are listed as 1 of 5, 2 of 5 etc.
- Note any big gaps in the data that you notice (e.g. 6 months missing or a group missing)
- Try to fit a whole year in a notebook if possible (one group per binder)
- Tape the bottom page so the data stays in place but is still accessible from the top
Meteorological Data – Data on ambient temperature and rain gauge readings at the camp. Filed in chronological order in the “Meteorological Data” notebook.
Charcoal Fridge Temperature – Max-min data for the temperature of the charcoal fridge. Filed in the “Meteorological Data” notebook
Half-hourly Temperatures – Half-hourly weather recordings are now sent as electronic files (to Princeton), so do not expect to add many more to this section of the “Meteorological Data” notebook.
Neonatal Assessments – There is a notebook called “Neonatal assessments”. Both neonatal assessments for each new individual get filed in here. Neonatals are filed by group, alphabetically by the mother’s name, and then infants in birth order for each mother.
Canine Condition and Scrotal Development – These two types of data are filed in the same notebook “Canine Condition and Male Age Estimates” by data type, group, and year.
Hybridity Scoring – There is a notebook called “Hybrid Scoring”. The original hybrid score sheets get photocopied when they arrive. The originals go in the hybrid score binder in the library and the photocopies go in the “working hybrid scores” binder. The data gets entered from the working photocopies.
Male Age Estimates – These get filed in the “Canine Condition and Male Age Estimates” notebook.
Other Group Censuses – When possible, census data are collected for several other baboon groups in the Amboseli region including Joy’s, Nzige’s, Proton’s and Stud’s. These data are filed in the “Other Group Censuses” notebook chronologically by group.
Monthly Summaries and Satellite Phone Logs - File in “Monthly Summaries” box
Daily Activity Calendars - File chronologically in the “Activity Calendar” notebook by Person.
Fecal lists – Lists of fecal samples are put in the “Fecal Lists” box
Tree Grove Monitoring – These data are filed in chronological order in the “Tree Grove Monitoring” notebook.
Reviewing Original Data
When filing original data, try to get a feel for what is going on and make a note of any missing periods of data that we will need to track down. Princeton receives data monthly, so we can check whether they have photocopies for the period in question, which tells us whether data was collected during that time.
Filing Photocopied Data
Photocopied data gets sent to Duke from Princeton when they receive it every month. This is to ensure that Duke can enter and process data in a timely fashion, as the originals only arrive sporadically. The ad lib data (mounts and consorts, grooming, and agonisms) as well as census and demography notes data arrives and gets filed in the “Working Photocopies” binders organized by group. Data sent logs will also be sent and should be filed in “Working Photocopies of Original Electronic Data Sending Logs”. When the data come in, follow these steps:
- Check to make sure all the data is there (no missing groups or pages).
- Check to see if the photocopies are clear and no edges have been cut off
- Cross-check the log sheets for GPS, Psion, and Data Sent against the actual data received throughout the respective month to ensure that what the team says they are sending matches what is actually received.
- Read the demography notes and check for any new grove and waterhole information (see GPS data section)
- Mark any data changes on the photocopies. These changes will come by email from Princeton. Princeton checks the data thoroughly and clarifies any questions with the field team. The questions and responses get cc’ed to Duke. Make any relevant changes. Look out for this email (sometimes they forget to cc us). If there has not been one then email Princeton.
Electronic Data
All electronic data is saved on the Duke server in subdirectories of the directory bio-beagle\home\a\alberts.lab. This directory will hereafter be referred to as Y:\. The Amboseli field team emails electronic data files, generally on a weekly basis. These weekly emails contain three main types of data:
- CSV weather files (.csv)
- Point sample data (PSION) (.pts)
- GPS readings (.mps)
The email is sent to both Princeton and Duke. Princeton works with the csv weather data and Duke works with the GPS and Psion data. When a new email comes in, unzip the attached files and save only the GPS and Psion data to the appropriate month/year folder in Y:\ABRP_Data_Management\Data from Amboseli on the server. These two types of files should also be copied into the respective “In Progress” folders, Y:\ABRP_Data_Management\DATA\GPS\In Progress or Y:\ABRP_Data_Management\DATA\PSION\In Progress. Princeton will be responsible for the weather data. Emails are then filed as backup. There are logbooks to keep track of data coming in so we can keep track of missing days and easily follow up. Further details are found in specific sections below.
As well as the weekly data, end-of-month data arrives early in the following month and is filed in the appropriate year-month folder. The files are Matrices (.xls), Cash Account (.xls), Monthly Reports (.rtf), Salary Calc (.xls), and Agonisms (.xls).
A copy of each agonism file needs to be renamed according to the naming conventions for adlib data (see section below), noted in the agonism logbook, and put in Y:\ABRP_Data_Management\DATA\ADLIBS\In Progress\AGONISMS for reformatting and proofing (see section on Agonism entry).
BaBase Database
The BaBase database has now been moved onto the World Wide Web and can be accessed at https://papio.biology.duke.edu/phpPgAdmin/. A login can be requested through the database managers or Karl Pinc. Information about how up to date the database is can be found on the BaBase wiki, https://papio.biology.duke.edu/babasewiki/. It is important to use this site to establish that no one is in the process of updating or otherwise altering the data before you begin to make any changes to the database.
Basics for Data Entry and Management
Many different types of data are being entered and utilized for the Amboseli Baboon Research project. Each of these types of data has a unique format and several programs are used in the process of entry, proofing, validating, and updating. Nearly all data types used in the project adhere to the following system for management.
Log Books
Each data set has its own log book containing a record of the current status of the data and all procedures as they are completed. These procedures include: arrival (electronic data), entry, comparison, validation, updating, and archiving of the data. Once each of these procedures is finished, it is checked off in the log book. Thus, the last check made reveals the current status of the data set.
Rules for Data Entry
For many types of data, two datasets are entered independently by two different people. This allows for proofing by comparing the two datasets against each other. The data sheets must be entered in exactly the same order for this proofing process to work. In general, start data entry at the top left of the page and proceed to the bottom right, entering data in each column in the order it was hand-written. For extra insurance, the person entering the “a” dataset should note the order of entry on the photocopied data pages so that the “b” dataset will match. Numbering and good notes during the data entry phase can save a lot of time and confusion later on.
On occasion, letters or numbers are omitted or unclear on the photocopies. It is possible that the originals may be discernable so omit the line for now and check the originals when they arrive. Clearly note the problems in pencil, and temporarily mark the page with a post-it note. If the problem is not resolved once the originals have been checked do not enter the record and make a note on the photocopy. Remember not to proof or update the file until the issue has been resolved.
Naming Conventions for Adlib Data
All Adlib data follow the same general naming convention. Beginning in early 2005, all datasets used to update the BaBase database will be entered into Excel and saved in tab-delimited text format (.txt extensions). The format is shown below:
G T M M Y Y a/b .txt
G = Group and refers to a one letter abbreviation of the population of study animals who were observed to produce the data in the file. The groups most often needed in file naming are:
- Linda’s, Nyayo’s, Omo’s, Viola’s, and Weaver’s
- Other study groups include Alto’s, Dotty’s, Joy’s, Lodge, and Nzige’s.
T or TT = Type of data. This field may consist of one or two characters. The types of data that are entered regularly are Agonism, Grooming, and Mounts/Consorts/Ejaculations. Other types to know are Census and Matrices.
MM = Two digits denoting the month. Always use a leading zero for months with MM less than 10. Use 01 for January, 02 for February etc.
YY = Two digits denoting the year. Always use leading zeros for years with YY less than 10. Use 99 for 1999, 00 for 2000, 01 for 2001 etc.
a/b (a or b) = The proofing system requires two sets of data to be entered. Use a for the first set entered and b for the second set entered.
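The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the lab's tooling: the function names are hypothetical, and because this manual does not list the exact type codes, any one- or two-letter code is accepted. The code "G" in the usage note is only a placeholder.

```python
import re

# Pattern for the convention G + T/TT + MM + YY + a/b + ".txt".
# Assumption: group and type codes are letters. The exact type codes are not
# listed in this manual, so any one- or two-letter code is accepted here.
NAME_PATTERN = re.compile(
    r"^(?P<group>[A-Za-z])(?P<dtype>[A-Za-z]{1,2})"
    r"(?P<month>0[1-9]|1[0-2])(?P<year>\d{2})(?P<copy>[ab])\.txt$"
)


def build_name(group, dtype, month, year, copy):
    """Build a file name from its parts, zero-padding month and year."""
    name = f"{group}{dtype}{month:02d}{year % 100:02d}{copy}.txt"
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"not a valid adlib file name: {name}")
    return name


def parse_name(name):
    """Split a file name into its parts; raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid adlib file name: {name}")
    parts = match.groupdict()
    parts["month"] = int(parts["month"])
    parts["year"] = int(parts["year"])  # two-digit year, e.g. 5 for 2005
    return parts
```

For example, build_name("L", "G", 1, 2005, "a") gives "LG0105a.txt", and parse_name on that name returns the same parts with the month and two-digit year as integers.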
Data Management
There are two main concepts to keep in mind when working with the data. First, because several computers and individuals will likely be involved in the data entry process, we have to be very careful to keep track of the most recent versions of files and their state of completion. Second, following proofing, validation, and update of the database (or completion of data that is not uploaded directly into BaBase), immediately save data files and associated text files (when applicable) to the appropriate archive disk. To keep track of data we use the following four-stage system:
Working Data – The working stage begins when you first start to enter data and continues until the data has been fully entered and is ready for proofing. The working data should contain the most updated version of all files being entered. Once a file has been completed it will be transferred to “In Progress” for proofing. The Working Data is kept in the “Working Data” folder on the server in the bio-beagle\home\a\alberts.lab directory.
In Progress – Data that has been entered but is not ready for updating into BaBase is considered “In Progress”. Delete data from the “Working Data” folder (for undergrads and other data helpers) once it is transferred to the “In Progress” folder within the folder for the appropriate data type in the Y:\ABRP_Data_Management\DATA folder. Data “In Progress” includes the proofing, manipulating, and validation stages.
Final Data – Final copies of all data are stored in the folder for that particular data type (usually by year/month) in Y:\ABRP_Data_Management\DATA
Archive – Data archiving is extremely important: it protects the original files from being accidentally opened and modified, and ensures that we have routine backups of the individual data files. All archived data are saved to CDs unique to a specific data type. The CDs are stored in the black box on the bookshelf.
INTERACTION DATA
The interaction data are entered as monthly files. Each group has a separate file for each of the three classes of interaction:
- (1) grooming
- (2) agonism
- (3) mounts/consorts/ejaculations
Changes to Photocopied Data
When there are errors or omissions in the data, corrections or clarifications need to be noted. Princeton goes through the monthly monitoring data and gets clarification from the field team if anything is missing or out of the ordinary. The email with the questions and replies should be copied to Duke. You have to look out for these. Any changes to the data should be made in pencil (initialed and dated) in the working photocopies binders. It is important to do this before the data gets entered.
Data Entry
Rules for data entry vary slightly for the different types of interaction data. In general, two copies of each dataset are entered (to allow for proofing later). Because the proofing program works by comparing the datasets line by line, entering both versions in the same order is critical. All datasheets should be numbered and the same order followed by both people entering. In addition, data enterers should clearly note (in pencil) any observed errors directly on the photocopies (be sure to initial and date the note).
When entering data remember:
- (1) Check for data corrections from Princeton before entering.
- (2) Enter all data in the appropriate folders in the “Working Data” or “In Progress” directories.
- (3) Copy completed files to the “In Progress” directory for comparing and uploading. Remove them from the “Working Data” folder.
- (4) Once data is updated in BaBase it is put in the “Final Data” folder in the appropriate ADLIB directory and also burned on an archive disk. The data can then be deleted from the “In Progress” folder.
Remember that there is a log book associated with each type of data and keep it updated as you go. Within the “In Progress” folder there will be a .txt file for each month that has been entered. If there is a B copy then the data has not yet been compared/corrected. Once the files are compared and corrected (the A file is exactly the same as the B file) the B file gets deleted.
Grooming
There is a file for each group and each month. Four columns get entered: Date, Actor, Actee (corresponding to X g Y), and Act (G). There are a few things to look for when entering this type of data.
- Self grooming (OFR g OFR)
- “White out” with nothing entered
Circle these entries on the photocopies and write “not entered” including the date and your initials.
Names that are unclear (e.g. DOU versus DOV). If unclear, go back to the census etc. to see which individuals were actually present in the group that day. Also check Biograph in BaBase; sometimes the named individual is actually dead and the animal was misidentified.
Don’t include the row if it is really unclear. Make a note if similar problems keep arising and contact the team.
Agonisms
Four columns are entered: date, actor, actee, and act. Entry is similar to grooming entry, but sometimes it is hard to distinguish “dS” and “OS”. The field team should be using a small “d” in the data recording. Data entry for one copy is usually done in Kenya by SNS and sent with the emailed data. When emailed agonisms are saved in the “In Progress” folder and renamed, remember to check off the file in the logbook with “SNS” as the initial. The agonism files from SNS have to be manipulated so that there is only one sheet per file and the columns are in the correct order. The following steps should be followed to complete this reformatting:
- Each file must have every worksheet deleted except that for the given month
- The data sheet for the given month then needs to be reordered. SNS enters these data in the following order: actor, act, actee, date.
- The first step is to cut the actor column and paste it to the right of the date column.
- Next, cut the actee column and paste it to the right of the actor column.
- Lastly, cut the act column and place it at the right end of the other three columns.
- Delete the remaining empty columns to the left of the date column.
- You will then need to add in the appropriate headers for each column
- The file should now resemble the files which are entered at Duke.
- Save the file as a text file using the naming conventions above, into the ‘In Progress’ Folder. Then make sure to delete the initial file from SNS from the ‘In Progress’ folder.
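The column reordering described in the steps above can also be sketched in code. This is only an illustration under stated assumptions: the function names and the header labels are hypothetical (the manual does not give the exact header text), and the rows are assumed to have already been read from the single remaining worksheet in the order actor, act, actee, date.

```python
import csv
import io


def reorder_agonism_rows(rows):
    """Reorder rows from (actor, act, actee, date) to (date, actor, actee, act)."""
    # Hypothetical header labels; use whatever the Duke-entered files use.
    reordered = [["date", "actor", "actee", "act"]]
    for actor, act, actee, date in rows:
        reordered.append([date, actor, actee, act])
    return reordered


def to_tab_delimited(rows):
    """Render rows as tab-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

The text returned by to_tab_delimited could then be saved under the naming convention described earlier. The manual Excel procedure remains the documented method; this sketch only shows the intended end result.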
Mount and Consorts
Similar to the two previous types of data but there is a time for mounts (entered as a start time) and a start and end time for consorts (“E” is an artificial end time). For the act, “E” corresponds to ejaculate seen, “M” is just a mount. All the consorts will have a “C”. If there is no start or end time the row can still be entered leaving that field blank.
Proofing
The current proofing system ensures data integrity by requiring that each dataset is entered by two different people. As mentioned previously, this allows for comparison between the two versions and greatly increases the chances of accurate data entry once all discrepancies between the copies have been resolved.
Unlike the entry phase, proofing follows the same procedure for all interaction data types.
- Copy files from the “Working Data” folder to the appropriate “In Progress” folder in Y:\ABRP_Data_Management\DATA\ADLIBS
- Open the papio directory by going to https://papio.biology.duke.edu/ and select the program named “wwwdiff”.
- Run the proofing program by selecting the A and B copies respectively, selecting the “Tabular by Word” option and clicking the “Diff” button. This will tell you where the discrepancies are on a line by line basis.
- Correct any mistakes by referring to the data binders and carefully researching any observed discrepancies. Be sure to note any judgment decisions, including omissions, in the data binder.
- When the two files are identical, delete the B copy and retain the A copy. Update the logbook.
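To illustrate why both copies must be entered in the same order, here is a minimal sketch of a line-by-line, word-by-word comparison. It is not the wwwdiff program and makes no claim about how wwwdiff works internally; the function name is hypothetical.

```python
from itertools import zip_longest


def compare_copies(a_lines, b_lines):
    """Return (line_number, a_line, b_line) for every line where A and B differ.

    Lines are compared word by word after splitting on whitespace, so
    differences in spacing alone are ignored. A line missing from one copy
    is reported with None in its place.
    """
    differences = []
    pairs = zip_longest(a_lines, b_lines, fillvalue=None)
    for number, (a_line, b_line) in enumerate(pairs, start=1):
        a_words = a_line.split() if a_line is not None else None
        b_words = b_line.split() if b_line is not None else None
        if a_words != b_words:
            differences.append((number, a_line, b_line))
    return differences
```

Because the comparison is positional, one record entered out of order in either copy makes every later line appear as a discrepancy, which is why numbering the datasheets before entry saves time.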
Validation and Update
The remaining compared and corrected data file is the “A” set. The validation program searches the data for any records that are problematic. It checks the “A” set for problems such as individuals that are not in the demography files or that are recorded as deceased.
- Open the papio directory by going to https://papio.biology.duke.edu/ and select the program named “Upload”.
- This program completes both the validation and update steps by catching errors one by one, producing specific error messages, and allowing the user to subsequently fix the file before the upload process takes place. The file will not be uploaded into Babase 2.0 until all demographic and other errors have been corrected.
- In most cases, you will set the database = babase . There may be times when it is desirable to try out the upload program using babase_test or babase_copy, in which cases, the database field should be set = babase_copy or babase_test.
- Login using your PPA username and password.
- Select the table or view which you wish to upload into. Views are usually the easiest to upload into and for adlib interactions, the view of choice will be actor_actees.
- Check the box that allows the upload of null values and leave the null representation blank.
- Select the file which you wish to validate using the browse function and run the program by clicking “Upload”. Again, this will not actually upload the file until all errors have been corrected.
- If the update was successful “Upload” will tell you that the program completed successfully and how many records it added to actor_actees.
Be aware that some validation errors may require a bit of research before they can be resolved. For example, an individual who is classified as deceased in the demography files may turn up grooming. This may be because the individual never died, the three-letter name code may have been recorded incorrectly on the original, or the animal may have been misidentified in the field. In any case, the discrepancy must be resolved. Ask Susan, Jeanne, or write to Kenya.
Keep running the program until no errors are left. After all the issues have been resolved, the dataset is ready to be added to the database. Remember that you CANNOT update the interaction data until the demography update has been completed at Princeton for the particular 6 months in question. Interaction data gets uploaded every 6 months, after the demography update has been done.
- Copy the “A” file to the appropriate “Final Data” folder and also to the archive disk
- Make a note in the logbook as you go
- When an entire year is done and archived in the “Final Data” folder and on an archive disk delete the archived files from the “In Progress” folder.
PSION DATA
The Psion data is also referred to as the point samples and the focal animal samples data. This dataset has been collected since the project began, but was handcoded until August 1999. From this date on, it was entered directly into the handheld Psion units. The first two months are riddled with errors, so that data has been discarded. Therefore, Psion data effectively began in October 1999.
Psion data is emailed from the field every week (see the section on filing Amboseli data for more information). The number of files per month varies somewhat; usually you can expect between 20 and 40 files. The file names are a combination of year, month, day, and Psion unit number. For example, 040103P1.pts contains data from Psion unit 1 for 3 January 2004.
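When checking for missing days, it can help to decode the file names automatically. The following is a small sketch that assumes the names always follow the YYMMDD + "P" + unit number + ".pts" layout of the example above; the function name is hypothetical.

```python
import re
from datetime import datetime

# Layout taken from the example 040103P1.pts: YYMMDD, "P", unit number, ".pts".
PSION_NAME = re.compile(r"^(?P<date>\d{6})P(?P<unit>\d+)\.pts$", re.IGNORECASE)


def parse_psion_name(name):
    """Return (date, unit_number) for a Psion file name."""
    match = PSION_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected Psion file name: {name}")
    # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068, which covers
    # Psion data from 1999 onward.
    day = datetime.strptime(match.group("date"), "%y%m%d").date()
    return day, int(match.group("unit"))
```

Sorting the parsed dates for a month makes gaps in the received files easy to spot before following up with the field team.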
Proofing
Before updating the Psion files, check each one and make sure the data looks okay. Common problems include double header rows, lack of food codes, and aborted rows. The data should have the same group and same initials (observer) all the way through but different snames. There are exceptions where the groups did switch but not very often. The header will indicate the observer, female or juvenile, and sname. The Psion data is only for females and juveniles. Remember to record the checking process in the appropriate log book. These .pts files can be viewed and edited with Notepad. See Monitoring Guide for more information about the PSION data collection.
- Common Errors:
- Two header rows or header row at the end with no data following – Delete the first one.
- Group name changes within a file – Check Biograph or binders to find out where the individual was on the particular day. Sometimes there is actually a true change of group.
- Blank food codes – On a line with <pnt> F there should be an F for feeding, the neighbors, and a food code. This food code should never be blank. Add UNK if there is no foodcode.
- Extra food codes – Any line without <PNT> F should NOT have a foodcode. If a foodcode exists, delete the foodcode but leave the rest of the line as is.
- Both a JUV and FEM header with same sname – If there are two headers with the same sname and one is listed as a FEM and one is listed as JUV figure out the age of the individual on that date and delete the offending header and any associated rows. Rules are found in the babase documentation.
The data needs to be checked for glaring errors. It is not checked as thoroughly as other datasets. A good procedure to follow is to go through each file 3 times.
- Check for double headers and group name
- Compare the name in the header to the name in the points area
- Check for other errors (missing food codes etc.)
Validation and Update
Psion data needs to be updated in coordination with the demography update. Once all files for a given month have been proofed (and assuming the demography update has been completed at Princeton), you can then update the information into BaBase.
Back up the BaBase database to the papio server, and subsequently to a CD, using the Unix secure shell.
- If you are going to want to be able to restore tables within the database:
- [YOU@papio ~]$ pg_dump --file=YOURBACKUP.sql --host=localhost --format=c --username=YOU babase
- Open the papio directory by going to https://papio.biology.duke.edu/ and select the program named “PsionLoad”.
- Set the database = babase, enter your login information, and select the psion file you wish to upload. The file format for upload should be .PTS.
- It is very important to note that currently, Psionload is only able to accommodate the latest version of Psion software as of 7 Aug 2007. The Psion handheld units should all have the most current software, but if that changes, the data will not be compatible with the current Psionload program.
- If there are errors in the file, individual error messages will be produced and displayed and none of the file will be uploaded until all errors are corrected. Error messages will supply a line number. This line number can be found by opening the file in textpad. The line numbers will match up between textpad and the error message.
- When a month is uploaded, the folder should be moved to the ‘Final Data’ folder.
- Archive files by copying the entire folder for that particular month to cd.
- Mark progress in the log book.
- Delete the folder from “In Progress”.
NOTE
When PSION data is uploaded into BaBase 2.0 with Psionload, the following errors may come up. The rules below for handling such errors were decided by SC and JA in October 2006.
- When the error message states that an individual is its own neighbor, the neighbor should be changed to 998.
- When the error message states that there is a ‘non-unique’ neighbor, the second occurrence of the neighbor on that line should be changed to 998.
- When the error message states that there is a dead neighbor, biograph should be checked to ensure that the neighbor is dead. It is also helpful to make checks to ensure another animal wasn’t the intended neighbor, although this judgement cannot usually be made. In most cases, the neighbor should be changed to 998.
- When the error message states ‘neighbor not found in biograph’, it can be useful to check the file, to ensure that a typo wasn’t made. There are cases in which the intended neighbor is very clear, and this can be fixed. In other cases, the neighbor should be changed to 998.
- When the error message is invalid sname in samples (in other words, the focal animal doesn’t exist in biograph), the sample should be deleted unless the intended animal can be discerned. This can be sorted out by checking the team’s handwritten Psion logs, which list each animal sampled on a given day.
- When the error message is jpsamps after first conception the sample should be deleted. This usually means that she was sampled as a juvenile and not an adult, and different types of information are collected for the two.
- When the error is Invalid time, the offending line(s) should be deleted as the times are limited to after 07:00 and before 19:00.
- When the error is missing foodcode the foodcode should be entered as UNK.
- When the error is invalid foodcode the foodcode should be entered as UNK.
- When the error is male in fpoints the points should be deleted.
- When the error is Invalid Posture, the line with the invalid posture should be deleted.
- When the error is Foodcode Present but no ‘F’ in activity, the foodcode should be removed.
- When the error is invalid neighbor code (for example 988 or 999) the neighbor code should be changed to 998.
- When the error is Juv sample after matured the sample should be deleted, unless it can be discerned that the sample should have been for a different individual.
- When the error is Juv sampled as Female the sample should be deleted, unless it can be discerned that the sample should have been for a different individual.
This list is probably not exhaustive, and other rules will be created and noted as the errors present themselves.
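For quick reference while working through upload errors, the rules above can be kept as a lookup table. The sketch below is only a convenience aid: the keys are abbreviated labels for the errors described in this section, not the exact text produced by Psionload (the exact wording is an assumption), and the function name is hypothetical.

```python
# Keys are abbreviated labels, not the exact Psionload messages (assumption).
PSION_ERROR_RULES = {
    "own neighbor": "change the neighbor to 998",
    "non-unique": "change the second occurrence of the neighbor to 998",
    "dead neighbor": "check biograph; in most cases change the neighbor to 998",
    "neighbor not found in biograph": "fix a clear typo, otherwise change the neighbor to 998",
    "invalid sname in samples": "delete the sample unless the intended animal can be discerned",
    "jpsamps after first conception": "delete the sample",
    "invalid time": "delete the offending line(s)",
    "missing foodcode": "enter the foodcode as UNK",
    "invalid foodcode": "enter the foodcode as UNK",
    "male in fpoints": "delete the points",
    "invalid posture": "delete the line",
    "foodcode present but no": "remove the foodcode",
    "invalid neighbor code": "change the neighbor code to 998",
    "juv sample after matured": "delete the sample unless it was meant for another individual",
    "juv sampled as female": "delete the sample unless it was meant for another individual",
}


def suggest_action(error_message):
    """Return the agreed handling rule for an error message, or None if unknown."""
    text = error_message.lower()
    for label, action in PSION_ERROR_RULES.items():
        if label in text:
            return action
    return None
```

A return value of None signals an error that is not yet covered, which should be raised with SC and JA and then added to the list.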
GPS/SWERB DATA
Data on home range and group movement are collected through SWERB records and GPS readings. SWERB files date back to the early 1980s and collection continues today. During this time period, however, the specific protocols for data collection and entry have changed somewhat. This is important to consider when using older data files. The written data ends in January 2004, which corresponds with the start of the handheld GPS units. The handwritten data have been entered in FoxPro (DBF files); the GPS files are received as text files (.txt) and are converted to Excel at Duke. Older files, prior to July 2006, were collected as MapSource files (.mps). All of the handwritten SWERB data has been entered and GPS files are received weekly. There is no entry to be done for these datasets.
SWERB
Catherine Markham at Princeton currently holds this dataset. Contact her for information or for the most recent version of this dataset.
GPS
Step 1: Save files from Email
A new email should come in every week (usually on Sunday). Emails are then filed as backup. There is a logbook to keep track of electronic data coming in so we can keep track of missing days and easily follow up. The original GPS files get copied from the appropriate year/month folder in Y:\ABRP_Data_Management\Data from Amboseli into the appropriate year/month folder in Y:\ABRP_Data_Management\DATA\GPS\In Progress. The files in the “In Progress” folder are the ones that get edited. Keep track of how many files we receive in the Received Files section of the GPS logbook. In theory we should receive two files per day (except Sundays and the last few days of the month when the team are doing other things). This is just a spot check to make sure we don’t miss chunks of data. If the data looks sparse or there are multiple days missing it is worth sending the team an email to inquire.
Step 2: Convert files to CSV
- If you “right click” each text file you would like to convert to Excel you can choose “Open With” and then select “Excel”. Save the Excel file as a comma delimited file (.csv). Save this file in the same folder as the monthly text file.
- Delete the top four rows, which just contain headers and other information that isn’t needed.
- Delete the columns A (Waypoint), D (User Waypoint) and G-K (everything after the altitude column). These columns contain no useful information.
- Sort the data by Column B (Datetime).
- Remove the “m” from the Altitude column by highlighting the altitude column and going to the “Edit” menu on the top toolbar and then selecting “Find” and then “Replace”. Under “Find what:” type ‘ m’ . Note that there is a space before the lowercase “m”. Leave the “Replace with: “selection blank. Select “Replace All”. The “m”’s should now be gone. Make sure to save file when done.
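The same clean-up can be sketched in a script for anyone who prefers not to do it by hand in Excel. Everything about the file layout here is an assumption, not a description of the actual files: the text file is taken to be tab-delimited, the four leading lines are taken to be header rows, "Column B (Datetime)" is taken to mean the second column after the deletions, and the datetime format string is a placeholder that would need to match the real files.

```python
import csv
import io
from datetime import datetime

HEADER_ROWS = 4              # leading header lines to drop
KEEP = [1, 2, 4, 5]          # original columns B, C, E, F (A, D, G-K dropped)
DT_COL = 1                   # Datetime position among the kept columns (assumed)
ALT_COL = 3                  # Altitude position among the kept columns
DT_FORMAT = "%d/%m/%Y %H:%M:%S"  # hypothetical; adjust to the real files


def clean_gps_rows(rows):
    """Drop headers and unused columns, strip ' m', sort by datetime."""
    body = [r for r in rows[HEADER_ROWS:] if any(cell.strip() for cell in r)]
    kept = [[r[i] if i < len(r) else "" for i in KEEP] for r in body]
    for r in kept:
        r[ALT_COL] = r[ALT_COL].replace(" m", "")  # same as Find ' m' / Replace ''
    kept.sort(key=lambda r: datetime.strptime(r[DT_COL], DT_FORMAT))
    return kept


def convert_gps_text(text, delimiter="\t"):
    """Return the cleaned file contents as CSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(clean_gps_rows(rows))
    return out.getvalue()
```

The function works on text rather than file paths so that it can be tried on a copy of one day's data before touching anything in the “In Progress” folder.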
A final GPS file should look like this:
Step 3: Proofing and Correcting
GPS data should be proofed/corrected regularly so we can get back to the field team with errors. It is best if this data is proofed right away on Monday mornings. The team is much more likely to be able to solve inconsistencies if we get back to them right away. Consistently there have been many errors. Record proofing, errors, and corrections in the GPS logbook. Details about this data are found in the Monitoring Guide. GPS files are organized as follows:
| RECORD | ENTRY | COLUMNS | VALUE | DETAILS |
| --- | --- | --- | --- | --- |
| Departure (D) | 1 | 1 | D | |
| | 2 | 1 | GPS letter code | D, E, F, or G |
| | 3 | 1+ | Initial of observer | R, S, or K (sometimes others) |
| | 4 | 1 | Initial of driver | G or C |
| Begin (B) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 1 | B | |
| | 3 | 2+ | Grove, pGrove, UNK, NG | See grove list |
| Descent (MDT) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 3 | MDT | |
| | 3 | 2-4 | Time or BA | |
| ½ hourlies | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 4 | Time (nearest half hour) | |
| Ascent (MAT) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 3 | MAT | |
| | 3 | 2-4 | Time or AD | |
| End (E) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 1 | E | |
| | 3 | 2+ | Grove, pGrove, UNK, NG | See grove list |
| Water (W) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 1 | W | |
| | 3 | 4 | Time | |
| | 4 | 2-4 | RAIN, Waterhole, NW | See waterhole list |
| Extra Begin (B1) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 1 | B | |
| | 3 | 1 | Number (1, 2, 3, etc.) | |
| Extra End (E1) | 1 | 1 | Group (one letter) | L, N, O, V, or W |
| | 2 | 1 | E | |
| | 3 | 1 | Number (1, 2, 3, etc.) | |
All entries should exist in each file except W and B1 and E1 entries. They are entered as needed when those events occur. Note that the GPS data uses “L” for Linda’s group.
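The record layout in the table can be expressed as a set of patterns, which is one way to spot names that fit none of the record types. This is a sketch only. It assumes the entries are concatenated into a single name with no separators (for example a group letter, then B, then a grove code), that times are written as 3 or 4 digits, and that grove and waterhole codes contain only letters and digits. The labels are hypothetical, and correction placeholders such as BMISS are not treated specially.

```python
import re

G = "[LNOVW]"  # one-letter group codes from the table

# Order matters: the one-digit "extra" records are tried before the
# ordinary begin/end records.
RECORD_PATTERNS = [
    ("departure", re.compile(r"D[DEFG][A-Z]+[GC]")),
    ("extra_begin", re.compile(G + r"B\d")),
    ("extra_end", re.compile(G + r"E\d")),
    ("begin", re.compile(G + r"B\w{2,}")),
    ("end", re.compile(G + r"E\w{2,}")),
    ("descent", re.compile(G + r"MDT(\d{3,4}|BA)")),
    ("ascent", re.compile(G + r"MAT(\d{3,4}|AD)")),
    ("water", re.compile(G + r"W\d{4}\w{2,4}")),
    ("half_hourly", re.compile(G + r"\d{2}(00|30)")),
]


def classify(name):
    """Return the record type a name matches, or None if it fits no pattern."""
    for label, pattern in RECORD_PATTERNS:
        if pattern.fullmatch(name):
            return label
    return None
```

A name that returns None is not necessarily wrong, but it is worth a second look during proofing. Whether a grove or waterhole code is valid still has to be checked against the master lists.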
To proof the GPS files, follow these guidelines:
- Scan the date to make sure it is correct (same as the filename) and that only one date exists in the file.
- Check the Group.
Sometimes more than one group is monitored in one file but you must be careful because mistakes are made as well. If two groups are monitored they should both have B and E records and MDT and MAT records. Clarify with the team if there is confusion.
- Check the departure record (D) to make sure it is there and an observer and driver are listed.
- Check B and E records (Make sure groves are valid).
Grove ids are listed when the team knows or is pretty sure where the group slept. If they are pretty sure they will enter a probable grove. If the grove is 98N and they are pretty sure they slept there it will be entered as P98N. If they do not know or it is a new grove the entries will be UNK or NG respectively. We leave the NG entries as is. The team will start to use the new name once it has been decided upon.
- Check MDT and MAT records.
- Check ½ hourly records (Make sure they end in an even half hour and not the exact minute). Make sure there are no gaps. Missing half hourlies cannot usually be recreated but are noted as errors and mentioned to the team.
- Check W records (make sure waterholes are valid)
Similar to grove id’s. Make sure there is a RAIN or waterhole entry. Make sure that the waterhole is valid. If it is a new waterhole the id will be NW. We leave these as NW and assume the team will start to use the new name once it has been decided upon.
Common errors to watch for:
- Grove and waterhole id’s are not on our list
- Grove and waterhole names are misspelled or wrong
- Descent and ascent are mixed up or missing
- BA and AD are mixed up
- No departure record
- Group changes (wrong group typed in for some of the records)
- Missing ½ hourly records
- No begin or end records
- Forgotten W in a water record
- O’s instead of 0’s
- Errors in B1 and E1 entries
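Several of the structural checks above (departure record present, B/E/MDT/MAT present for each group, no gaps in the half hourlies) lend themselves to a quick automated pass before the manual read-through. The sketch below assumes the record names for one daily file are available as a list of strings with no separators inside each name; the function names and message wording are hypothetical. It does not check dates, grove or waterhole validity, or group mix-ups.

```python
import re

HALF_HOURLY = re.compile(r"([LNOVW])(\d{2})(\d{2})")
REQUIRED = ["B", "E", "MDT", "MAT"]


def has_record(names, group, code):
    """True if the group has this record (ignoring B1/E1-style extras)."""
    prefix = group + code
    return any(
        n.startswith(prefix) and not re.fullmatch(prefix + r"\d", n)
        for n in names
    )


def fmt(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def proof_day(names):
    """Return a list of structural problems found in one day's record names."""
    problems = []
    if not any(n.startswith("D") for n in names):
        problems.append("no departure record")
    groups = sorted({n[0] for n in names if n and n[0] in "LNOVW"})
    for g in groups:
        for code in REQUIRED:
            if not has_record(names, g, code):
                problems.append(f"group {g}: no {code} record")
        times = sorted(
            int(m.group(2)) * 60 + int(m.group(3))
            for m in (HALF_HOURLY.fullmatch(n) for n in names)
            if m and m.group(1) == g
        )
        for t in times:
            if t % 30 != 0:
                problems.append(f"group {g}: {fmt(t)} is not on the half hour")
        for earlier, later in zip(times, times[1:]):
            if later - earlier > 30:
                problems.append(
                    f"group {g}: half-hourly gap between {fmt(earlier)} and {fmt(later)}"
                )
    return problems
```

Anything the function reports still goes through the usual route: note it in the GPS logbook and ask the team when the fix is not obvious.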
Record all errors and corrections in the logbook. Send an email to the field team with all the errors that need their assistance. Some will be very obvious and can be corrected without assistance. They will respond with corrections or say that they do not know. Keep a record of correspondence in the GPS logbook.
Corrections
- Missing B or E record
If a B or E record is missing and the team cannot give any details then add a row with BMISS or EMISS after the group letter code. Use the coordinates and time from the MDT record if the B record is missing or the MAT record if the E record is missing. If both MDT and B (or MAT and E) records are missing then leave the coordinates blank and just enter the date. Put 00:00 for the time. This will indicate that the time is false as no GPS records are recorded at this time of day.
- Missing MDT or MAT record
If an MDT or MAT record is missing and the team cannot give any details then add a row with MDTMISS or MATMISS after the group letter code. Use the coordinates and time from the B record if the MDT record is missing or the E record if the MAT record is missing. If both MDT and B (or MAT and E) records are missing then leave the coordinates blank and just enter the date. Put 00:00 for the time. This will indicate that the time is false as no GPS records are recorded at this time of day.
- Incorrect grove or waterhole
When a grove or waterhole is left blank or is an incorrect code and the team cannot give details then we add MISSNAME as the grove or waterhole ID. The rest of the entry is assumed correct. For example, if the B entry was listed as VB98 and the team does not know if this is grove 98N or 98S then we would change the entry to VBMISSNAME.
- Missing departure record
Enter the missing Departure records as DMISS with no coordinates and a time of 00:00. Up until July 2006 we could not add DMISS records in the MPS files. We just kept track of the missing departure records in the logbook without adding them in. We can go back and do this if necessary.
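The placeholder rules for missing B, E, MDT and MAT records can be summarised in a small helper. This is an illustrative sketch: the dictionary keys (`name`, `date`, `time`, `position`) are hypothetical and do not correspond to the actual column headings in the GPS files.

```python
# Which record supplies the time and coordinates when another is missing.
FALLBACK = {"B": "MDT", "MDT": "B", "E": "MAT", "MAT": "E"}


def missing_record_row(group, kind, date, fallback=None):
    """Build a placeholder row such as VBMISS or VMDTMISS.

    `fallback` is the paired record (see FALLBACK) if it exists; when it is
    also missing, coordinates stay blank and the time is set to 00:00.
    """
    if kind not in FALLBACK:
        raise ValueError(f"no placeholder rule for record type {kind!r}")
    name = f"{group}{kind}MISS"
    if fallback is None:
        return {"name": name, "date": date, "time": "00:00", "position": ""}
    return {
        "name": name,
        "date": date,
        "time": fallback["time"],
        "position": fallback["position"],
    }
```

For example, a missing B record for group V would borrow the time and coordinates of that day's MDT record, and fall back to 00:00 with blank coordinates only if the MDT record is missing too.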
Step 4: Consolidate CSV files by month
- Once all of the data for one month has arrived and is completely proofed, open the first CSV file for that month in Textpad.
- Do a “Save as” naming the file with the following convention:
GPSYYMM.csv
Where YY is the year and MM is the month. Save this file in the folder with the appropriate year in Y:\ABRP_Data_Management\DATA\GPS\Merged Data
Make sure to use “Save as” and not just “Save”, because “Save” will overwrite your original file.
- Place the cursor at the end of the rows (on the first blank row) and go to “Edit” menu and then “Insert Files”. Highlight the rest of the CSV files for the month and insert them in date order.
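The naming convention and the date-ordered merge can also be sketched in code. This assumes each day's cleaned CSV has already been read into a string and keyed by its date; both function names are hypothetical, and the sketch does not write anything to the Merged Data folder.

```python
from datetime import date


def monthly_filename(year, month):
    """GPSYYMM.csv, e.g. GPS0708.csv for August 2007."""
    return f"GPS{year % 100:02d}{month:02d}.csv"


def merge_daily_csv(daily):
    """Concatenate daily CSV texts in date order.

    `daily` maps a datetime.date to the text of that day's CSV file.
    A trailing newline is added where missing so rows never run together.
    """
    parts = []
    for day in sorted(daily):
        text = daily[day]
        if not text:
            continue
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)
```

Because the merged text is built from copies, the original daily files are never overwritten, which is the same protection “Save as” gives in Textpad.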
Step 5: Quick proof
Sometimes errors are missed in the initial proofing phase, so it is useful to proof the monthly Excel file one last time. To do this, it is best to sort the data by the “Name” column. Scroll through the data and pick out any errors that jump out. This is an easy way to catch “0”s where there should be “O”s and vice versa. Missed errors such as MDTAD and MATBA will also jump out when the data is sorted in this way. If an error is caught, remember to go back and correct the data not only in the monthly Excel file but also in that particular day’s CSV file. Also note the error and correction in the logbook.
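A sorted scan for the two error types named above might look like the following sketch. The heuristics are assumptions: a letter O next to a digit is only flagged as something to check, since a legitimate grove or waterhole code could contain that combination, and the leading character is skipped because O is also a valid group letter.

```python
import re


def suspicious_names(names):
    """Return (name, reason) pairs worth a second look, sorted by name."""
    flagged = []
    for name in sorted(names):
        rest = name[1:]  # skip the group letter, which may legitimately be "O"
        if "MDTAD" in name or "MATBA" in name:
            flagged.append((name, "BA/AD does not match MDT/MAT"))
        elif re.search(r"\dO|O\d", rest):
            flagged.append((name, "letter O next to a digit"))
    return flagged
```

Any correction prompted by this list still has to be made in both the monthly file and the daily CSV file, and recorded in the logbook.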
Step 6: Save the Monthly File and Email
Once proofed, save the monthly GPS file in the appropriate year folder within Y:\ABRP_Data_Management\DATA\GPS\Merged Data
The GPS data is not yet incorporated into BaBase so there are no validation and updating procedures yet. Catherine Markham (at Princeton) uses this data. The monthly Excel files are emailed to her when complete. Her e-mail address is amarkham@princeton.edu
Groves and Waterholes
The master copies of the grove and waterhole ID files are kept at Duke. These files are found in Y:\ABRP_Data_Management\DATA\GPS\Grove and Waterhole IDs. The files are called Groves_master.xls and Waterholes_master.xls. When NG’s or NW’s show up in the data, the team is supposed to write down the coordinates and name in the demography notes. When new monthly photocopies come from Princeton, scan the demography notes for new groves and waterholes and add them to the list. When a new grove or waterhole name starts showing up and you do not have the demography notes yet, you can email the team and ask if it is valid. If it is, add it to the master files and try to get a GPS reading from the team. Every 6 months or so, ask the team for their copy of the groves and waterholes in use and add any extras they may have. They are supposed to keep these files up to date.
RANKS
Decision Rules for Assigning Male Ranks
S. Alberts 11/94 Revised: 6/95 Last revised: 03/06 by LG
- Start with the matrix for the last month for which ranks were assigned.
- Look at the matrix for the first month for which ranks have not yet been assigned, ordered as for the previous month. Put new immigrant males below all adult males and above all subadult males. If a male leaves in one month and then returns several months later, treat him as though he had never left, i.e., put him where he was the month he left.
- Now, check all entries below the diagonal and rearrange the order so as to minimize these entries, with the following caveats.
- For every “win” below the diagonal, check the corresponding “loss” box above the diagonal. If the values in each box are the same, do not record a change in rank. Otherwise, whichever box has the higher value determines the overall winner in the dyad, i.e., determines who is higher ranking.
One exception to this is when a male wins many times over a male above him, like 6 or more wins – in that case give him credit for the rise but watch both him and the male he won over carefully for the next few months to see if the loser is dropping precipitously or the winner is rising.
Another exception is when a maturing male wins over a male several ranks above him, especially if he then disperses before he interacts with anyone else. In this situation, give the male credit for the rank rise. In general, these young “rising” males get more credit for multiple “jumps” up the hierarchy than do older established males, especially if they are obviously on an upward trend anyway.
- f. Sometimes a male wins over a male more than one rank above him several times in a month, but over no males in between. In order for a lower ranking male to get credit for rising in this situation, either (1) he must win over a male in between in the next few months, (2) the higher ranking male must keep winning over the males in between during the same period that the lower ranking male is winning over him, or (3) the lower ranking male must be a maturing male as described above. Alternatively, for situations in which none of these things occur, and in which the higher ranking male loses to males in between in the following month, it may be that the lower ranking male is dropping in rank (see 4b below).
- g. It is always helpful to look at the 6 months or so before and after a rank change to get an idea of where the male is going and whether he confirms the wins/losses he had in a particular month. However, confirmations are not always necessary to record a change, as indicated above.
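The mechanical part of these rules, comparing each below-diagonal "win" with the corresponding above-diagonal "loss" box, can be sketched as a first screen. This is only a sketch under stated assumptions: the matrix is taken to be ordered by the previous month's ranks with winners in rows and losers in columns, and the male names in the usage example are hypothetical. It does not apply any of the caveats or exceptions above, which still require judgment.

```python
def dyads_to_review(order, wins):
    """List dyads where the lower-ranked male out-won the higher-ranked male.

    order: male names, highest rank first (previous month's order).
    wins:  square matrix, wins[i][j] = times order[i] won over order[j].
    Returns (lower_male, higher_male, wins_by_lower, wins_by_higher) tuples.
    Equal counts are skipped, since they do not indicate a rank change.
    """
    flagged = []
    for low in range(len(order)):
        for high in range(low):
            below = wins[low][high]   # entry below the diagonal
            above = wins[high][low]   # corresponding box above the diagonal
            if below > above:
                flagged.append((order[low], order[high], below, above))
    return flagged
```

For instance, with `order = ["AAA", "BBB", "CCC"]` and `wins = [[0, 2, 1], [2, 0, 0], [0, 3, 0]]`, the tied AAA/BBB dyad is skipped and only CCC over BBB (3 wins to 0) is returned for review.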
- There are some important differences between male rankings and female rankings
- a. Don’t assume stability as strongly for males as for females – male ranks change more often. Yes, scanty data sometimes makes it frustrating, and it is sometimes unnerving to watch a male rise over a couple of months on the basis of single wins over a series of upper neighbors, but I think it reflects reality and so I do it.
- b. Always think first in terms of males rising in rank, rather than falling. I.e., for every rise there is a concomitant fall and vice versa – so focus on the rises. If a male is really going to fall in rank, as a result of an injury or something, you will usually see a whole slew of losses on the part of that male over a month or two. If he has a couple of losses in a single month, don’t just score him as dropping in rank; instead watch the winners and see what they are up to.
- c. Because the time course of wins and losses is important for males, 6- and 12-month summaries are not as useful for males as they are for females (although they can show general patterns). For instance, for females, if DUD wins 5 times over NYA and loses 4 times to NYA in the course of a year, DUD is considered to rank above NYA by the end of the year, and generally for the whole period. For males, in contrast, it depends totally on when the wins and losses occurred. This is because male ranks are so age-dependent – i.e. a male rises and then falls in his life, and we want to see that trajectory. Maturing males, for instance, get credit for attaining adulthood when they first win over adults, and the time course of their rank rise is of interest.
- Throughout the ranking process I look at what particular dyads are doing over the several months before and after the months I am ranking. This dyad-oriented checking is a good thing to do in addition to overall checking, because it makes you pay attention to how often particular dyads are being reinforced or challenged in their ordering.
- I find it easiest to sit with a set of printed matrices, printed before you’ve done all of the ranking properly, and a sheet of columned paper. I write a new ranking for every month as well as the relevant agonisms and comings and goings for the month. Make final rankings in this way, enter them into the computer, and make final clean copies of everything.
Added by LG Oct 2006
- When males come into a group and also leave in the same month and have no interaction the procedure will be as follows:
- a. Ranked according to age if NOT adult
- b. Placed at the bottom of the list of adults IF adult
- If a male immigrates and is very shy so that we don’t see him interact for several months we rank him in the first month we can and back date that rank to his immigration date.
If a male is present for only a few days in a month (< 1 week), generally, place him at the bottom of the hierarchy unless he is interacting a lot and wins over many animals.
Updating Ranks
Male ranks get updated periodically with Susan. At this point the “ranker” program is not working at Duke. We have been asking Princeton to create the matrices for us. We create the “All Male” ranks and they are handwritten in the Male Agonism Binders found in the library. These ranks get hand entered into Excel to look exactly like the ranks table. They get proofed and uploaded into babase using the Appender program (see Appendix 2). Ranks “In Progress” have not yet been uploaded into Babase but are proofed. Ranks are found in C:\ALBERTS\DATA\RANKS
Ranked and Matured “On” and “By” Dates
Decision Rules
S. Alberts 4 May 2005 Last revised: 14 Sept 2005 by LG
Decision rules for assigning “ON” dates and “BY” dates for MATURED (testicular enlargement, onset of subadulthood) and RANKED (attainment of rank among adult males, onset of adulthood) for male baboons in babase.
- Every male who has been in a study group as an adult will have a ranked date in babase. Every male who has been in a study group as a subadult will have a matured date. Ranked and matured dates will be of two types, “ON” dates and “BY” dates.
- If a date is designated as an “ON” date then we are saying that we know the male attained that marker ON that date (although note that this is not literally true, because we don’t track rank changes or testicular changes on a daily basis – males are assigned a ranked date or a matured date on the first day of the month in which we saw them attain rank or testicular enlargement). “ON” dates can be used to estimate the age at which these maturational markers of subadulthood and adulthood are attained. Note that some of the dates from the 1980s and 1990s are not on the first of the month. This will not be changed at this time.
- If a date is designated as a “BY” date then we are saying that we know the male was adult or subadult BY that date but we don’t know when he attained it. The point of assigning “by” dates for ranked and matured is so that we can easily identify which males in any group on any day are juvenile, subadult and adult. The point is NOT to estimate the actual time at which these events occurred, but instead to ensure that we have used all available information to know whether a male had reached a given marker by a given time period.
- Note that “by” dates will NEVER be used to estimate the age at which markers are attained.
- Note also that we will not assign “BY” dates for “consorted” or “dispersed”. These markers are not used to differentiate age classes, and so we only want “ON” dates for these (because the only thing we think we will want to do with these markers is estimate dates at which they are attained).
Rules for assigning on and by dates in various cases are as follows.
- The male is an immigrant male not natal to our study groups. We follow these rules:
- a. If the male enters as a juvenile (field notes indicate testes not enlarged, or notes otherwise indicate that he is juvenile) he gets no “by” dates. He gets added to the scrotal development sheet automatically (this happens in the field and has been in place for many years) and when his testes enlarge he gets an “ON” date for MATURED. Similarly, if he goes on to attain rank in the group, he gets an “ON” date for RANKED.
- b. If the male enters as a subadult (testes enlarged but field notes say he is subadult and he is losing to all adult males in agonistic encounters) he gets a “BY” date for MATURED that is equal to his immigration date. If he goes on to attain rank in the group, he gets an “ON” date for RANKED.
- c. If the male enters as an adult (field notes indicate adult) OR he immediately starts winning fights with other adult males, he gets a “BY” date for RANKED but not for MATURED – he gets no entry at all in the MATUREDATES table.
- The male is a natal male from one of our study groups and he disperses before rank attainment, directly into another study group (or after some time alone, but without being in a non-study group for more than a few days). Upon immigration in a study group, he starts winning fights with adult males. This is a common occurrence. We follow this rule:
- Assign the male a ranked “ON” date that equals the date he immigrated into the non-natal study group.
- The male is a natal male from one of our study groups and he disperses before rank attainment, but is away from our observations and in an unknown location for a long time (more than a few weeks). We follow this rule:
- Assign the male a ranked “BY” date that equals the date he immigrated into the non-natal study group.
- The male is a natal male from one of our study groups and attained one or both markers in his natal group, but we did not observe him attaining one or both markers (he was already subadult or adult when we started collecting data on him, or we do not have enough data to estimate his dates accurately because we were unable to observe the group frequently enough during that time -- this happened occasionally, primarily associated with fission of Alto’s). We follow these rules:
- a. We assign a matured “BY” date that is 6 years 8 months after his birth (this is median age for testicular enlargement according to Alberts & Altmann 1995). The point of this is that it will allow us to get a reasonable count of subadult males on any given day in the group, even if the male is not strictly subadult by our definition. There were not so many of these cases and I don’t think there is a better solution.
- b. We assign a ranked “BY” date ….. [have we come across any of these yet?]
- The male is a natal male from one of our study groups but he disperses before attaining one or both markers, and attains one or both markers in a non-study group. We follow these rules:
- a. If he emigrates before testicular enlargement and is known to be living in a non-study group and we have no other information about him, we assign him a matured “BY” date that is 6 years 8 months after his birth. Again, the point here is to allow us to (somewhat coarsely) designate the subadults in the population at any given time. This also provides a “BY” date if he comes back into a study group after he has attained the marker.
- b. If he emigrates after testicular enlargement but before rank attainment, we assign a ranked “BY” date in one of two ways.
- i. If he emigrates before 7 years of age, we assign his ranked “BY” date as 7 years 5 months of age (the median age for rank attainment according to Alberts & Altmann 1995), if he stays in a non-study group that long and we don’t have any other information about him. The point here is to be able to designate adults versus other age classes in social groups at any given time. We might also use information in other groups notes, concerning agonistic interactions seen in the non-study group, to give us clues about assigning the “BY” date.
- ii. If he emigrates after 7 years of age, we assign his ranked “BY” date as the date he enters the non-study group IF he remains in that group for at least several months. If a male enters a non-study group after the age of 7 years but leaves within a few days or weeks, we do not assume a ranked “BY” date until he enters a group and stays there for some time. This is based on my observation that subadult males on the verge of adulthood tend to stay in a group only when they are successful at attaining rank in that group. Subadults on the verge of adulthood may sometimes “shop around” but they usually leave groups quickly if they are not successful at getting adult rank.
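Where the rules above call for a “BY” date at a fixed median age (6 years 8 months for MATURED, 7 years 5 months for RANKED), the date arithmetic can be sketched as follows. The function and constant names are hypothetical, and the handling of month-end birthdays (clamping to the last day of the target month) is an assumption rather than an established lab convention.

```python
import calendar
from datetime import date

MATURED_MEDIAN = (6, 8)  # years, months: median age at testicular enlargement
RANKED_MEDIAN = (7, 5)   # years, months: median age at rank attainment


def add_years_months(start, years, months):
    """Return the date `years` and `months` after `start`.

    If the target month is shorter than the start day (e.g. the 31st),
    the day is clamped to the last day of that month.
    """
    total = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def matured_by_date(birth):
    return add_years_months(birth, *MATURED_MEDIAN)


def ranked_by_date(birth):
    return add_years_months(birth, *RANKED_MEDIAN)
```

A male born on 1 January 2000 would get a matured “BY” date of 1 September 2006 and, under rule 5(b)(i), a ranked “BY” date of 1 June 2007. Whether a given male qualifies for either rule is still decided case by case from the field notes.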
Updating Matured “By” and Ranked “By” Dates
As ranks are created, matured “by” and ranked “by” dates are written down. Matured “by” dates get entered directly into Babase. No procedure for archiving these matured “by” data has been established; we suggest keeping a yearly file with the dates each time ranks are done. Ranked “by” dates have been entered into the rankedby.txt file in Y:\ABRP_Data_Management\DATA\MALE MATURITY\Ranked dates. However, this file is no longer up to date, as errors have been fixed in babase but not in this file. Any new dates need to go in a new file to add to the database. Currently the only method for getting these data into the database (PPA) is by sending the file to Karl Pinc. These ranked “by” dates are not in Foxpro Babase. A protocol for these dates needs to be developed.
WOUNDS/PATHOLOGIES
Entry
The Wounds and Pathologies dataset records data documenting field observations of wounds and pathologies as well as notes on the animal’s subsequent condition or recovery related to the wound/pathology. For entry, data from wounds/pathology field notes are separated into 4 related tables:
- INDEX table – Provides basic summary information about the wound/pathology. One row corresponds to every wound/pathology datasheet.
- WOUND/PATHOLOGY table – Lists specific wound and pathologies associated with a row in the index table; provides additional specifics. One row corresponds to every wound and/or pathology code associated with a row in the index table.
- BODY PARTS table – Lists the specific body part(s) affected by the wound/pathology. One row corresponds to every body part associated with a row in the index table.
- HEALING RATE table – Documents the healing status and dates for each wound/pathology listed in the index table. One row corresponds to every date of a follow-up comment.
Entry and maintenance of this dataset can be challenging, particularly when getting started. Refer to Appendix 4 for details on data entry rules and codes. Be certain to pay careful attention to these details when beginning entry or using the data. There is no Wounds/Pathologies logbook housed at Duke, however we do have a “Working Data” Binder from which entry has been done.
README From Princeton
C. Markham Last revised: 25 August 2006
INDEX Table – Columns:
WID Every wound/pathology is given a code specific to a particular wound/pathology. The original observation and subsequent follow-up entries should all reference the same code. WIDs are automatically generated.
Date The date of the initial observation. Dates are in British format (day/month/year).
Time Time refers to the time that the initial observation was made. Time is recorded in military format.
Observer The observer column lists the initials of the individual(s) responsible for the observation. Separate multiple observers by commas.
Sname Sname is the sname of the baboon with the wound/pathology.
Group Group is the GID of the group that the individual was in at the time of the wound/pathology.
Wound/Pathology The wound/pathology column refers to whether the wound/pathology row refers to a wound, a pathology, or both. Possible code values are:
Wound/Pathology Index
- A: Wound
- B: Pathology
- C: Wound and Pathology
Comments The comments field is for any comments describing the initial observation of a wound/pathology.
HEALING RATE Table – Columns:
WID This column links the healing rate information back to the INDEX Table.
Date The date of the follow-up observation(s). Dates are in British format (day/month/year).
Healing Status The codes for healing status reflect how the wound/pathology has healed on each follow-up observation date. Possible code values are:
Healing Status Codes
- 1: Not healed
- 2: Partially healed
- 3: Healed
- 4: Terminal
- 5: Animal missing
NOTE: If a WID refers to more than one wound that heal at different rates, the healing status codes should reflect the slowest wound(s) to heal.
BODY PARTS Table – Columns:
WID This column links the body part information back to the INDEX Table.
Body Part Body part locates where the wound or pathology is on the baboon. Not all wounds/pathologies are associated with a body part. Possible code values (in reference to baboon drawing) are:
Body Part Codes
- 0: Head (unspecified)
- 1: Top of head
- 2: Eye region
- 3: Muzzle
- 4: Lower jaw, mouth, throat
- 5: Cheek
- 6: Ear
- 7: Back of head, bald and sides of neck
- 10: Arm (unspecified)
- 11: Shoulder, armpit
- 12: Upper arm, elbow
- 13: Forearm
- 14: Hand, wrist
- 20: Trunk (unspecified)
- 21: Ventrum, chest, between leg
- 22: Flank
- 23: Upper, mid back
- 24: Lower back
- 25: Sacral region
- 26: Hindquarters, PCS
- 27: Sex skins
- 28: Genitals
- 30: Leg (unspecified)
- 31: Thigh, knee
- 32: Lower leg
- 33: Foot, ankle
- 40: Tail (unspecified)
- 41: Proximal Tail
- 42: Tail hook
- 43: Distal Tail
- 44: Over Entire Body
Body Part Side Body part side indicates whether the body part affected was on the left (L), right (R), or center (C) of the animal.
WOUND/PATHOLOGY Table – Columns:
WID This column links the wound/pathology information back to the INDEX Table.
Wounds/Pathology Code The wounds/pathology code refers to the specific code(s) listed on the datasheet. Possible code values are:
Wounds/Pathologies Codes
Wound
- 1: Linear cut or slash
- 2: Puncture
- 3: Scrape, amorphous wound
- 4: Bruise, swelling
- 5: Multiple small cuts (all < 1 cm)
- 6: Other wound
- 16: Unknown/Indiscernible Wound Type
- 17: Scalping/Large, open wound on head exposing the skull
- 18: Broken bone
Pathology
- 7: Limp, no wound visible
- 8: Respiration problems, coughing, sneezing
- 9: Digestive problems, vomiting, diarrhea
- 10: Malaise, weakness, stiffness in absence of wound
- 11: Thinning fur
- 12: Nosebleed
- 13: Discharge, sores
- 14: Other pathology
- 15: White fur
Maximum Dimension (cm) The maximum dimension of the wound (in centimeters).
Impairs Locomotion A yes or no field for whether or not the wound/pathology impairs the animal’s locomotion. Note: Entry of a “yes” for impairs locomotion is inferred if notes field on datasheets explicitly describes a locomotion problem and/or if “limp, no wound visible” is checked on the wounds/pathologies codes.
Sign of Infection A yes or no field for whether or not signs of infection (oozing, redness, or stiffness) were observed. Note: Entry of a “yes” for sign of infection is inferred if notes field on datasheets explicitly describes an infection associated with a wound. Note also that this field is for signs of infection at any point during the monitoring of a wound – it is not limited to wound status on first observation.
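To make the relationship between the four tables concrete, the columns described above can be sketched as simple record types linked by WID. This is an illustration only, not the structure of the actual spreadsheet or database: the class and field names are hypothetical, and the column types (for example, whether the group GID is stored as text) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IndexRow:
    wid: int
    date: date            # date of the initial observation
    time: str             # military time, e.g. "0915"
    observer: str         # initials, comma-separated if several
    sname: str
    group: str            # GID of the group (type assumed)
    kind: str             # "A" wound, "B" pathology, "C" both
    comments: str = ""


@dataclass
class WoundPathologyRow:
    wid: int
    code: int             # wounds/pathologies code, e.g. 3 or 7
    max_dimension_cm: Optional[float] = None
    impairs_locomotion: bool = False
    sign_of_infection: bool = False


@dataclass
class BodyPartRow:
    wid: int
    body_part: int        # body part code
    side: str             # "L", "R", or "C"


@dataclass
class HealingRow:
    wid: int
    date: date            # date of the follow-up observation
    status: int           # healing status code 1-5


def orphan_wids(index_rows, *child_tables):
    """Return WIDs used in other tables that have no row in the INDEX table."""
    known = {row.wid for row in index_rows}
    return sorted(
        {row.wid for table in child_tables for row in table if row.wid not in known}
    )
```

A check like `orphan_wids` is one way to catch half-finished entries, where rows were added to one table but the matching INDEX row was never created.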
Data Entry Tips and Tricks
Overview. Entering the Wounds and Pathology data may sound complicated, but it really isn’t bad once you get the hang of it. The most important thing to remember is keeping track of adding all the information to all the possible tables – if you stop for a break, try to avoid leaving entries half-finished. And save the spreadsheet often – nothing is more frustrating than having to enter the same information twice!
Before you even open the spreadsheet and consider starting data entry, take a look at one of the Wounds and Pathology data sheets. A sample one is scanned in below. Familiarize yourself with the type of information collected and what constitutes a new form. You should notice that each datasheet is specific to one animal, but not necessarily a single wound or pathology. For example, in the sample below, on 19 August 2004, Orion was observed with both a wound (scrape, amorphous wound) and a pathology (limp, no wound visible). All follow-ups after the initial description of a wound/pathology are written towards the bottom of the data sheet. Keep in mind that the number of follow-ups and descriptions associated with each follow-up varies somewhat. In the example below, there were two follow-ups to the wounds/pathologies originally recorded for Orion on 19 August 2004 – one follow-up on 28 August 2004 and another on 28 September 2004. In theory, follow-ups should continue until the wound/pathology is healed or the animal dies.
Proofing
Proofing of this data has historically been done at Princeton. Consult with Princeton if any needs to be done at Duke.
Historical Census Entry
Old Census Records are in the process of being updated at Duke. Alto’s Sept 1980 through Dec 1988 have been uploaded into Babase. Entry of Alto’s Jan-Aug 1980 as well as Hook’s census for this time period are pending.
By: C. Markham Date: 2 June 2006 Updated by L. Gerber October 23, 2006
Summary: The following are basic instructions for entering monthly group census records. These notes are modified from the census entry portion of the Princeton Protocols for Database Management Guide.
Basic Instructions
Monthly census files are created and updated for each study group. These files are entered in \\bio-beagle\home\a\alberts.lab\WORKING DATA\CENSUS. Follow the steps below to review and enter census data for each group-month.
PART 1. Review the hand-written field census sheets Goal: Scan the data and check for consistency (1) between census datasheets with regard to date and also (2) between census and demography notes with regard to presences/absences for particular baboons.
- Check that the census dates on each separate census sheet are consistent for a particular group in a given month (i.e., the dates on sheet 1 match the dates on sheet 2 and so on). For example, if the first census sheet indicates that the group was censused on June 1, June 15, June 18, and June 29, be certain that another census sheet for that group does not indicate that the group was censused on June 1, June 15, June 18, and June 28. If you notice a discrepancy, check dates in other data sources (i.e. female cycling data, demography notes, ad lib data) to determine the correct date(s). If you change a date on the census datasheet photocopies, be certain to write your initials and the date next to the change.
- Proof entries on the census sheet against data in the demography notes. The field protocol for data recording is to confirm absences, immigrations, births, etc. that appear as simple presence/absence on the census sheet with a demography note. Make sure the dates and individual baboons are consistent in both datasets.
- For example, on the census datasheet in Fig. 1 and the corresponding demography note in Fig. 2, there is confirmation that NAH was first observed on 5 November and that ZIN was not with the group on 5 November. Both of these descriptive notes support what was recorded on the census sheet for those individuals that day (NAH was marked present and ZIN was marked absent on 5 November).
- If you notice any discrepancies, use other data sources to try to confirm the correct information. Checking the current CENSUS and MEMBERS tables might also provide some insight. Check with Leah before making any judgment calls or changes, and always date and initial the changes that you both decide upon.
- Check for incomplete census days. On some observation days, the field team is only able to finish a partial, or incomplete, census of the individuals in a study group. This is indicated on the census datasheet next to census time and census recorded (see below). Incomplete censuses usually just mark animals as present; they typically do not mark confirmed absences. Since we follow slightly different protocols for the entry of incomplete census days, for now simply start a list of incomplete census days for each group. Do not enter any data for these days yourself. Your list will be used by either Lacey or Tabby, who will later handle the entry of census data for these days.
- Put green arrows next to the name of each individual that has at least one absence in the group during the month. This will serve to flag you for the absences you’ll later be entering.
- For example, on the census datasheet in Fig. 1, ZIN, NIA, and NAH all have some absences in Linda’s group in November 2004. I put a green arrow to the left of each of their names.
- Put green “0’s” (0) where needed for absences that were not entered by the field team. When a male arrives in a group mid-month, or when there has been a new birth, the team typically doesn’t mark absences for the day(s) before the individual’s arrival in the group. You need to do this yourself by putting a “0” for every complete census day before the individual’s arrival (be careful not to assume absences on days when the census was not complete!).
- For example, on the census datasheet in Fig. 1, NAH was a new infant first observed on 5 November 2004. He had neither a presence nor an absence marked on the group census that was done for Linda’s group before his birth that month (on 3 November 2004). Prior to data entry, I put a green “0” (0) for him on 3 November.
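The date check and the implied-absence rule in Part 1 can be sketched in Python. This is only an illustration under stated assumptions: the function names and the data layout (a set of dates per sheet, a list of complete census days) are hypothetical and are not part of Babase, and the year used for the June example is made up because the text does not give one.

```python
from datetime import date


def mismatched_census_dates(sheets):
    """Return dates that appear on some, but not all, census sheets.

    `sheets` maps a sheet label to the set of census dates written on it
    (a hypothetical layout for one group-month).
    """
    if not sheets:
        return []
    date_sets = list(sheets.values())
    on_any_sheet = set().union(*date_sets)
    on_every_sheet = set.intersection(*date_sets)
    return sorted(on_any_sheet - on_every_sheet)


def implied_absences(complete_census_days, first_seen):
    """Return the complete census days before an arrival or birth.

    Incomplete census days must be left out of `complete_census_days`,
    because an absence cannot be assumed on those days.
    """
    return [d for d in sorted(complete_census_days) if d < first_seen]


# June example from the text (the year 2004 is an assumption).
sheets = {
    "sheet 1": {date(2004, 6, 1), date(2004, 6, 15), date(2004, 6, 18), date(2004, 6, 29)},
    "sheet 2": {date(2004, 6, 1), date(2004, 6, 15), date(2004, 6, 18), date(2004, 6, 28)},
}
to_check = mismatched_census_dates(sheets)

# NAH example: censuses on 3 and 5 November 2004, first observed on 5 November.
nah_zeros = implied_absences([date(2004, 11, 3), date(2004, 11, 5)], date(2004, 11, 5))
```

Here `to_check` holds the two conflicting dates (June 28 and June 29) to look up in other data sources, and `nah_zeros` holds the single day (3 November) that needs a green “0”.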
PART 2. Data Entry
Goal: Enter the census data in electronic format. There should be a separate Excel (.xls) file for each group for each month. Only enter absences (indicated by a zero in entry).
1. Start by opening a new Excel spreadsheet. Save the new file following the naming convention below:
G N M M Y Y . dbf
G = Group and refers to a one letter abbreviation of the population of study animals who were observed to produce the data in the dataset. The groups most often referenced in file naming are Linda’s, Nyayo, Omo’s, Viola’s, and Weaver’s. Other study groups include Alto’s, Dotty’s, Joy’s, Lodge, and Nzige’s.
N = Type of data – always an N for Census here.
MM = Two digits denoting the month. Always use a leading zero for months with MM less than 10. Use 01 for January, 02 for February, 03 for March, etc.
YY = Two digits denoting the year. Always use leading zeros for years with YY less than 10. Use 99 for 1999, 00 for 2000, 01 for 2001, etc.
2. The topmost, leftmost cell should contain the group number, and this cell should be formatted as “Number” with 2 decimal places using the “Format Cells” option.
3. Snames should be entered down the left side of the worksheet. You can copy the previous month’s snames using the previous month’s census file (if available) of this group as a template (because censused individuals will most closely match if only one month’s difference). If there is no previous month available, you’ll just have to enter the entire list of baboon names yourself (sometimes a tedious little step, but not that big a deal).
4. Format all of the snames as text. All of the snames need to be in caps.
5. Census dates should be entered across the top line of cells in the worksheet.
6. Format the dates of the month as text and write them in the form 1989-1-3.
7. Enter any absences with a zero (0); be certain not to use a capital letter “O”. Remember that the green arrows should flag where the absences occur. Format “0”s for absences as numbers with 0 decimal places.
8. Lastly, empty cells should be formatted as a number with 0 decimal places as well.
9. Below is what a completed file should look like, although the date format shown is not correct because Excel insists on reformatting it. Rest assured that the correct formatting will be retained in the txt file.
10. Close file.
11. Check off progress in Census log book. File should now be ready for proofing.
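The naming and formatting rules in the steps above can be sketched in Python. Everything here is a hypothetical illustration, not part of Babase: the function names are made up, “X” is a placeholder group letter, the .dbf extension is simply taken from the naming convention line, and reading 1989-1-3 as year-month-day without leading zeros is an assumption.

```python
from datetime import date


def census_filename(group_letter, year, month, extension="dbf"):
    """Build a GNMMYY file name for one group-month."""
    if len(group_letter) != 1 or not group_letter.isalpha():
        raise ValueError("group must be a one-letter abbreviation")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # N marks census data; month and year each take two digits.
    return f"{group_letter}N{month:02d}{year % 100:02d}.{extension}"


def census_date_text(d):
    """Write a date as text in the form 1989-1-3 (no leading zeros)."""
    return f"{d.year}-{d.month}-{d.day}"


def is_valid_sname(sname):
    """Snames are entered as text, all in capital letters."""
    return sname.isalpha() and sname.isupper()


example_name = census_filename("X", 2004, 11)     # "XN1104.dbf"
example_date = census_date_text(date(1989, 1, 3))  # "1989-1-3"
```

A check like `is_valid_sname` could be run over the left-hand column before proofing to catch lower-case entries early.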
Figure 1. Sample census sheet. Note that notes shown here for death dates are NOT a part of census entry (just ignore them).
Figure 2. Sample demography note.
Editing Babase
Great care must be taken when editing BaBase files.
1. Always make a backup before you begin editing or uploading new data.
2. Record all changes in the Babase_changelog.doc file. Princeton and Duke both record all changes here. This file gets transferred back and forth with the data. This file is a good resource when looking for details on past changes.
Papio Notes
FTP data to Papio
Every time you need to transfer a new copy of Babase to the papio server, follow the steps below. Use the SSH file transfer window to copy the BaBase data to Papio.
1. Log in to papio
2. Go to biology/groups/babase/database
3. Copy the DATA folder to the database folder. This will overwrite the data that is already there, so make certain you want to do this
4. Copy the ‘dump’ folder files to biology/groups/babase/dump
5. Copy Babase_changelog.doc to biology/groups/babase/database
6. Copy the PSION folder to biology/groups/babase/PSION if any changes have been made to Psion data or if new data has been added
7. Change permissions for all files
Change Permissions
In order for other people to access your files on papio you need to change permissions. Using the SSH window:
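1. Log in to papio
2. Change directories to the one you are interested in
3. At the prompt, type: chmod -R +rwx *.*
- This will change the permissions for the files in that folder. Note that the pattern *.* only matches names that contain a dot; use * instead to match every file and subfolder.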
Example – To change permission for the files in the “dump” folder
- $ cd /biology/groups/babase/dump (changes directories to the dump folder)
- $ chmod -R +rwx *.* (changes permissions for every matching file in the folder)
Example – To change permission for one file
- $ cd /biology/groups/babase/database (changes directories to the database folder)
- $ chmod -R +rwx Babase_changelog.doc (changes permissions for one file)
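For readers who want to see what the chmod commands above do, here is a rough Python equivalent. It is a sketch under assumptions, not a replacement for the documented procedure: the function name is hypothetical, and unlike the shell command it ignores the umask and always adds read, write and execute for user, group and other. The last lines show why the pattern *.* misses names without a dot.

```python
import fnmatch
import os
import stat

# Read, write and execute bits for user, group and other.
RWX_ALL = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def add_rwx_recursive(top):
    """Add rwx permissions to `top` and to everything below it."""
    os.chmod(top, stat.S_IMODE(os.stat(top).st_mode) | RWX_ALL)
    for folder, subfolders, files in os.walk(top):
        for name in subfolders + files:
            path = os.path.join(folder, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | RWX_ALL)


# Names taken from the transfer steps; only the one with a dot matches "*.*".
names = ["Babase_changelog.doc", "DATA", "PSION"]
matched_by_star_dot_star = fnmatch.filter(names, "*.*")
matched_by_star = fnmatch.filter(names, "*")
```

In this example `matched_by_star_dot_star` contains only Babase_changelog.doc, while `matched_by_star` contains all three names.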
Hornbill Backup
Using the SSH file transfer window, make a backup of Hornbill’s ALBERTS directory:
1. Log in to papio
2. Go to biology/groups/babase/Hornbill_Backup
3. Create a folder ALBERTS_YYMMDD [where YYMMDD is the date]
4. Transfer the ALBERTS directory into this folder
Note: Only the user who created this directory will have permission to access these files unless the permissions are changed.
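The backup folder name can be generated rather than typed by hand. A minimal Python sketch, assuming YYMMDD means two-digit year, month and day; the function name is hypothetical.

```python
from datetime import date


def hornbill_backup_folder(day):
    """Return the folder name ALBERTS_YYMMDD for the given date."""
    return "ALBERTS_" + day.strftime("%y%m%d")


example_folder = hornbill_backup_folder(date(2007, 9, 7))  # "ALBERTS_070907"
```

Because the year comes first, folders named this way sort in date order within a given century.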
Backup Schedule
It is imperative that all data be backed up regularly. The z:\ drive and Papio are backed up regularly elsewhere but Hornbill (c:\) is not. This is roughly the backup schedule I have followed.
Monthly
1. Make a CD backup of the “Working Data” folder
2. Make a CD backup of the BaBase folder
Weekly
1. Make a backup of the ALBERTS directory on Papio (see Hornbill backup section above)
Other
1. Make a CD backup of the BaBase folder before you make any changes to the data
2. Make archive CDs after you upload new data into BaBase or add significant amounts of new data (major entry project)
Appendices
Appendix 1: Baboon Friendly Environment (Using Babase with Foxpro)
To properly use the Babase database locally with Foxpro you must set up Foxpro in the following way. This needs to be done in order for the custom programs to work.
1. In Foxpro, open the Tools menu and select Options. Select the File Locations tab. Click on Search Path, then click Modify.
Choose:
z:\babase\data;
z:\babase\programs\needed;
z:\babase\programs\hacks;
z:\babase\programs\reports;
2. Right click on the FoxPro icon on the desktop. Select Properties, then select the Shortcut tab. In the Target box, type:
"C:\Program Files\Microsoft Visual FoxPro 7\vfp7.exe" -A -Cz:\babase\programs\needed\config.fpw
If the file pathway changes, or if you are installing on another computer, you need to:
1. Install the Babase folders which are found in:
z:\babase\DATA\PROGRAMS
This file pathway is needed for the rest of the setup. If it changes here then all the rest of the instructions will change accordingly.
2. Config.fpw
Modify path:
Command = DO Z:\BABASE\PROGRAMS\NEEDED\INITIAL
3. Initial.prg
Modify path:
SET PATH TO "z:\babase\data;z:\babase\programs\needed;z:\babase\programs\hacks;z:\babase\programs\reports"
Compile:
Under Program menu in Foxpro click on compile
4. setup.prg
Modify paths:
SET PATH TO "z:\babase\data;z:\babase\programs\needed;z:\babase\programs\hacks;z:\babase\programs\reports"
SET PROCEDURE TO Z:\babase\programs\needed\library
Compile:
Under Program menu in Foxpro click on compile
5. Close Foxpro. Open it again and it should be a “Baboon friendly environment”
Appendix 2: Append Foxpro files
When appending files you need to be very careful that all of the columns are exactly the same. To run the program, set the default folder to a temp folder. The folder containing all of the files to append must be inside that temp folder. Append everything in that folder together into a new file with the command: do appender with "foldername", "new name". The file “new name” will be placed in the folder where the originals are located. The originals remain. If there are more files than fit on a page, you must hit return for the program to finish. Continue until “created new name” appears.
Appendix 3: Excel Dates (Mac versus PC)
“1904” Date button - This option seems to cause many problems when transferring Excel files back and forth between Macs and PCs. Dates are sometimes off by 4 years and 1 day (1462 days). The date system needs to be the same on both computers for this not to happen.
Macs - Preferences - Calculations - Make sure the “1904” date button IS NOT checked
PCs - Tools - Options - Calculations - Make sure the “1904” date button IS NOT checked
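The 1462-day offset can be reproduced with a short Python sketch. It assumes the usual conventions for Excel day serial numbers (day zero of 1899-12-30 for the 1900 system, valid for dates after February 1900, and 1904-01-01 for the 1904 system). The function name and the sample serial number are illustrative only.

```python
from datetime import date, timedelta

# Day zero for each Excel date system (conventional values, see lead-in).
EPOCH_1900 = date(1899, 12, 30)
EPOCH_1904 = date(1904, 1, 1)


def serial_to_date(serial, system_1904=False):
    """Convert an Excel day serial number to a calendar date."""
    epoch = EPOCH_1904 if system_1904 else EPOCH_1900
    return epoch + timedelta(days=serial)


# Difference between the two systems, in days.
offset_days = (EPOCH_1904 - EPOCH_1900).days

# The same stored number read under each system.
stored = 36526
as_1900 = serial_to_date(stored)                    # 2000-01-01
as_1904 = serial_to_date(stored, system_1904=True)  # 2004-01-02
```

A file saved under one system and opened under the other keeps the stored number but changes the interpretation, which is why every date shifts by the same `offset_days`.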