Inventory

This section contains data about the origin, identity, location, and various other traits about the tissue and nucleic acid samples in the users' inventory. This includes samples currently residing in the users' inventory, as well as older samples that may have previously been in use but have since been sent to others, consumed, discarded, or lost. Because of this, the data in this section serve as both a historical record of all samples that have ever been in the users' possession and an active record of the samples that are currently in the users' possession.

Note

The text in this section uses the terms "nucleic acid" and "nucleic acid sample" interchangeably[116]. At the time of this writing, the system does not attempt to record details at the molecular level, so the reader can be assured that comments about the location, source, etc. of a specific "nucleic acid" should be interpreted as referring to a sample and not a specific molecule.

LOCATIONS

This table contains one row for every location that may be used to store tissue or nucleic acid samples.

Samples may be stored in varied locations with different organizations/research groups ("institutions"). The Institution column is included to allow easy segregation of locations across these varying locales.

The name of each distinct location is recorded in the Location column. Different organizations have their own conventions about how to organize and name storage locations, so this value may describe a very specific space ("Shelf 1, Rack 2, Box 3, Position D") or something more general ("PINK BOX").

Each Institution-Location pair must be unique.

To allow the use of nondescriptive general Location values but retain the ability to enforce uniqueness of specific ones, the boolean column Is_Unique is included. When Is_Unique is TRUE, the row's LocId may occur at most once across both the NUCACID_DATA.LocId and TISSUE_DATA.LocId columns (once total, not once per table). When FALSE, the LocId may be used any number of times in either table.
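The Is_Unique rule can be illustrated with a minimal Python sketch. This is not the system's own implementation; the function name and the in-memory data structures are hypothetical stand-ins for the LOCATIONS, NUCACID_DATA, and TISSUE_DATA tables.

```python
from collections import Counter

def find_is_unique_violations(locations, nucacid_locids, tissue_locids):
    """Return the LocIds flagged Is_Unique that are used more than once in total.

    locations: dict mapping LocId -> Is_Unique (bool)        [hypothetical structure]
    nucacid_locids, tissue_locids: LocId values, one per sample row
    """
    # Count usage across BOTH tables together: once total, not once per table.
    usage = Counter(nucacid_locids)
    usage.update(tissue_locids)
    return sorted(
        loc_id for loc_id, count in usage.items()
        if locations.get(loc_id, False) and count > 1
    )

# Hypothetical data: LocId 1 is a specific box position, LocId 2 is "PINK BOX".
locations = {1: True, 2: False}
violations = find_is_unique_violations(locations, [1, 2, 2], [1, 2])
print(violations)  # [1]
```

LocId 1 is reported because it holds one nucleic acid sample and one tissue sample, while LocId 2 may be reused freely because its Is_Unique is FALSE.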

Column Descriptions

LocId (Location Identifier)

A unique identifier for the location. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Institution

The INSTITUTIONS.Institution indicating the organization or research group at which this row's Location exists.

This column may not be NULL.

Location

A textual column naming this location.

This column may not be NULL.

Is_Unique

A boolean indicating whether or not this location at this institution is unique.

This column defaults to TRUE.

This column may not be NULL.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

NUCACID_CONC_DATA (NUCleic ACID CONCentration DATA)

This table contains one row for every quantification of a nucleic acid sample's concentration. All concentrations are recorded in picograms per microliter (pg/μL).

A nucleic acid sample cannot be quantified before it was created, before the source tissue sample was collected, or before the tissue sample's donor entered the study population (if applicable); the Conc_Date cannot be before the related NUCACID_DATA.Creation_Date, TISSUE_DATA.Collection_Date, or BIOGRAPH.Entrydate. These dates already have a required sequence (Entrydate <= Collection_Date <= Creation_Date <= Conc_Date), so in many cases it may be sufficient for the system to require only that Conc_Date is on or after the Creation_Date. However, any of these date columns can be NULL, so for the sake of completeness the system separately checks that Conc_Date is on or after each of them.
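The value of checking each date separately can be seen in a short Python sketch. The function name is hypothetical and None stands in for NULL; the sketch only illustrates the rule described above and is not the system's actual check.

```python
from datetime import date

def conc_date_problems(conc_date, creation_date=None,
                       collection_date=None, entrydate=None):
    """List the earlier dates that conc_date wrongly precedes."""
    if conc_date is None:
        return []  # unknown quantification date: nothing to compare
    earlier = {
        "Creation_Date": creation_date,
        "Collection_Date": collection_date,
        "Entrydate": entrydate,
    }
    # Each non-NULL date is compared on its own, so a NULL Creation_Date
    # does not hide a conflict with Collection_Date or Entrydate.
    return [name for name, value in earlier.items()
            if value is not None and conc_date < value]

# Creation_Date is unknown, but the Collection_Date still exposes the error.
problems = conc_date_problems(date(2020, 1, 10), None,
                              date(2020, 2, 1), date(2015, 6, 1))
print(problems)  # ['Collection_Date']
```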

Some quantification methods may use a different unit of concentration than that used in this table. Nanograms per microliter (ng/μL) is especially common. Such concentrations must be converted to pg/μL before they are added to this table.

Tip

Use the NUCACID_CONCS view instead of this table. It includes an additional column that indicates concentration in ng/μL, and also allows the insertion of quantifications in ng/μL. The conversion between ng/μL and pg/μL is thus performed by the system and not the user.

Warning

Do not assume that the number of significant figures employed in the Pg_ul column is the "true" number of significant figures for this quantification. This table records concentrations from a variety of quantification methods with varying levels of accuracy and stores them all in a single column that records all data to the nearest 0.1 pg/μL[117]. When new data are added, this column pays no attention to the number of provided significant figures and may indicate more than were actually used at the time of quantification. See the example below.

Example 3.2. (Mis)Use of Significant Figures in NUCACID_CONC_DATA

The concentration of a new DNA sample is determined to be 10.0 ng/μL, which has 3 significant figures. When recorded in NUCACID_CONC_DATA, this concentration will be recorded in Pg_ul as 10000.0 pg/μL, with 6 significant figures. A user should not assume that this quantification was originally performed with 6 significant figures' accuracy.
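The same effect can be reproduced in a small Python sketch using the standard decimal module. The function name is hypothetical, and the rounding to 0.1 pg/μL simply mirrors the column precision described above; it is an illustration, not the system's conversion code.

```python
from decimal import Decimal

PG_PER_NG = Decimal(1000)  # 1 ng = 1000 pg

def ng_to_pg(ng_per_ul: str) -> Decimal:
    """Convert a concentration in ng/uL (given as text) to pg/uL, to the nearest 0.1."""
    return (Decimal(ng_per_ul) * PG_PER_NG).quantize(Decimal("0.1"))

stored = ng_to_pg("10.0")   # measured with 3 significant figures
print(stored)               # 10000.0

# A measurement reported with only 2 significant figures is stored identically,
# so the stored value says nothing about the original precision.
print(ng_to_pg("10") == stored)  # True
```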


Column Descriptions

NACId (Nucleic Acid Concentration Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

NAId (Nucleic Acid Identifier)

The NUCACID_DATA.NAId of the quantified sample.

This column may not be NULL.

Conc_Method

The NUCACID_CONC_METHODS.Conc_Method used to quantify this concentration.

This column may not be NULL.

Conc_Date (Concentration Date)

The date that this concentration was quantified.

This column may be NULL, when the date is unknown.

Pg_ul (Picograms per microliter)

The concentration of the sample according to this quantification, in picograms per microliter (pg/μL).

This column may not be NULL.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

NUCACID_CREATORS (NUCleic ACID CREATORS)

This table contains one row for every person involved with the creation of a specific nucleic acid sample. When a nucleic acid sample has multiple creators, each of them is recorded here in a separate row.

Most nucleic acid samples are created via "extraction". This table favors using "creation" rather than "extraction", for reasons explained in the discussion of the NUCACID_DATA table.

Each NAId-Creator combination must be unique; a sample cannot have the same creator more than once.

Tip

Use the NUCACIDS view to insert data into this table. It provides a simple way to determine the appropriate NAId value to use, and for a human data enterer to provide multiple creators in a single row.

Column Descriptions

NACrId (NUCACID_CREATORS Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

NAId (Nucleic Acid Identifier)

The NUCACID_DATA.NAId of the related nucleic acid sample.

This column may not be NULL.

Creator

The LAB_PERSONNEL.Initials of this creator.

This column may not be NULL.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

NUCACID_DATA (General information about NUCleic ACID samples)

This table contains one row for every nucleic acid sample that is or ever has been in the inventory. Each nucleic acid sample is associated with a "source" tissue sample, which is indicated in the TId column.

Tip

Always use the NUCACIDS view in place of this table. It contains additional related columns which may be of interest.

This table records a nucleic acid sample's current location using the LocId column. Values in this column constrain and are constrained by values in the TISSUE_DATA.LocId column, and may or may not be unique, as discussed in the LOCATIONS table.

The Name_on_Tube column indicates whatever "name" or other identifying information is recorded on the tube. Because of labeling errors or misidentification in the field, this value may not indicate the true identity of the individual from whom this sample came.

Tip

To see the "true" identity of this individual, see the related line in the TISSUE_DATA table. This information is also provided in the NUCACIDS view.

Two columns in this table record information related to the sample's creation: Creation_Date and Creation_Method. The related table NUCACID_CREATORS records who created the sample. In laboratory vernacular, the term "extraction" is usually favored over "creation" for most nucleic acid sample types. However, some samples are not "extracted" and are instead generated via a laboratory procedure (e.g. reverse transcription, dilution, PCR amplification, etc.). Because of this, the generic term "creation" is used here.

A sample's Creation_Date cannot be before the source tissue's Collection_Date, nor before the source individual's Entrydate, if any. It may often be redundant to verify that Creation_Date is on or after both dates, but this redundancy is intended, as discussed above.

This table attempts to keep an ongoing record of a sample's current volume in the Actual_Vol_ul column. It is left to the user to judge this column's accuracy, which depends greatly on 1) how diligently the lab personnel keep the data manager(s) informed of changes, and 2) the amount of time that has passed since this volume was determined[118]. To assist users in making these judgments, the date that the Actual_Vol_ul was last updated is recorded in the Actual_Vol_Date column. A sample's current volume cannot be recorded without also recording this date; both of the Actual_Vol_ul and Actual_Vol_Date columns must be NULL or both non-NULL.

A sample cannot have its current volume determined before the sample was created; the Actual_Vol_Date must be on or after the sample's Creation_Date.

It is unlikely, though not impossible, that a sample's volume might increase after its creation. The system will report a warning when a sample's Actual_Vol_ul is greater than its Initial_Vol_ul.
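The volume rules above can be summarized in a brief Python sketch that separates errors from warnings. The function name and message wording are hypothetical, None stands in for NULL, and skipping date comparisons when a date is NULL is an assumption of this sketch.

```python
from datetime import date

def check_volume(initial_vol, actual_vol, actual_vol_date, creation_date):
    """Return (errors, warnings) for one sample's volume columns."""
    errors, warnings = [], []
    # Actual_Vol_ul and Actual_Vol_Date must be both NULL or both non-NULL.
    if (actual_vol is None) != (actual_vol_date is None):
        errors.append("Actual_Vol_ul and Actual_Vol_Date must be NULL together")
    # The volume cannot have been determined before the sample existed.
    if (actual_vol_date is not None and creation_date is not None
            and actual_vol_date < creation_date):
        errors.append("Actual_Vol_Date is before Creation_Date")
    # A volume increase is unlikely but possible, so it is only a warning.
    if (initial_vol is not None and actual_vol is not None
            and actual_vol > initial_vol):
        warnings.append("Actual_Vol_ul is greater than Initial_Vol_ul")
    return errors, warnings

result = check_volume(50, 60, date(2021, 1, 1), date(2020, 1, 1))
print(result)  # ([], ['Actual_Vol_ul is greater than Initial_Vol_ul'])
```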

Column Descriptions

NAId (Nucleic Acid Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

TId (Tissue Identifier of Source)

The TISSUE_DATA.TId of the tissue sample from which this nucleic acid sample originated.

This column may not be NULL.

LocId (Identifier for the sample's current location)

The LOCATIONS.LocId indicating the current locale and location of the nucleic acid sample.

This column may not be NULL.

Name_on_Tube

The name of the source individual, according to the label on the tube.

This column may be NULL, when there is no identifying information on the tube. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.
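The text rule stated here recurs for several textual columns in this section. As a minimal sketch (the function name is hypothetical, and None stands in for NULL), it amounts to the following Python check. Columns that may not be NULL would need a separate test.

```python
def is_valid_text(value):
    """True when value is NULL (None) or has at least one non-whitespace character."""
    return value is None or value.strip() != ""

print(is_valid_text(None))     # True: NULL is allowed for this column
print(is_valid_text("  \t "))  # False: whitespace only
```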

NucAcid_Type (Nucleic Acid sample Type)

The NUCACID_TYPES.NucAcid_Type of this nucleic acid sample.

This column may not be NULL.

Creation_Date

The date that this nucleic acid sample was created. When the process to generate a sample lasts more than one day, this is the date that the procedure was completed.

This column may be NULL, when the creation date is unknown.

Creation_Method

The NUCACID_CREATION_METHODS.Creation_Method describing how this nucleic acid sample was created.

This column may not be NULL.

Initial_Vol_ul (Initial Volume in μL)

The sample's volume, in microliters, when it was first created.

This column may be NULL, when the initial volume is unknown.

Actual_Vol_ul (Actual Volume, in μL)

The sample's volume, in microliters, as of the Actual_Vol_Date.

This column may be NULL, when users have not updated the sample's "current" volume or when the sample has not yet been used.

Actual_Vol_Date (Date of the recorded Actual Volume)

The date that the Actual_Vol_ul was determined.

This column may be NULL, when users have not updated the sample's "current" volume or when the sample has not yet been used.

Notes

Comments or miscellaneous information about this nucleic acid sample.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

NUCACID_LOCAL_IDS (LOCAL IDentifierS for NUCleic ACID samples)

This table contains one row for every name or ID used only at a specific institution (an ID that is "local" to that institution) to describe a particular nucleic acid.

Identity of samples is maintained by the system as much as possible, but when working with samples in the laboratory this is often inconvenient or impractical. Different groups and institutions often have their own systems for giving unique names to their samples, and while these names may be useful and meaningful for humans, they are mostly unhelpful from the database's perspective. They're vulnerable to typos, and can be very confusing when a sample is shared between institutions. However, these "local names" remain important for the people who are actually using these samples, so these identifiers are recorded in this table, one per nucleic acid sample, per institution.

Every combination of NAId and Institution must be unique; an NAId cannot go by more than one local name at the same Institution.

Every combination of Institution and LocalId must be unique; the same local name cannot be used at a single Institution more than once.
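The two uniqueness rules are independent of each other, which a short Python sketch may make clearer. The function name, the tuple layout, and the institution and local names are all hypothetical.

```python
def local_id_conflicts(rows):
    """rows: iterable of (NAId, Institution, LocalId) tuples.

    Returns one entry for each row that breaks either uniqueness rule.
    """
    seen_sample_inst = set()   # (NAId, Institution) pairs already used
    seen_inst_local = set()    # (Institution, LocalId) pairs already used
    conflicts = []
    for na_id, institution, local_id in rows:
        if (na_id, institution) in seen_sample_inst:
            conflicts.append(("NAId-Institution", na_id, institution))
        if (institution, local_id) in seen_inst_local:
            conflicts.append(("Institution-LocalId", institution, local_id))
        seen_sample_inst.add((na_id, institution))
        seen_inst_local.add((institution, local_id))
    return conflicts

rows = [
    (1, "INST_A", "A-001"),
    (1, "INST_B", "B-17"),    # fine: same sample, different institution
    (2, "INST_A", "A-001"),   # local name already taken at INST_A
    (1, "INST_A", "A-002"),   # sample 1 already has a name at INST_A
]
conflicts = local_id_conflicts(rows)
print(conflicts)
```

In this example the third row breaks the Institution-LocalId rule and the fourth row breaks the NAId-Institution rule.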

Column Descriptions

NAId (Nucleic Acid Identifier)

The NUCACID_DATA.NAId of the nucleic acid sample.

This column may not be NULL.

Institution

The INSTITUTIONS.Institution indicating the organization or research group at which this NAId's name is used.

This column may not be NULL.

LocalId (Local Identifier)

The local name used for this NAId at this Institution.

This column may not be NULL.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

NUCACID_SOURCES

This table contains one row for every nucleic acid sample having another nucleic acid as its source.

Often, nucleic acid samples are created through some "extraction" process in which the nucleic acids are purified from a tissue sample (e.g. a blood draw, a buccal swab, etc.). However, there are also numerous different methods by which nucleic acid samples may instead be created from another nucleic acid sample (e.g. PCR[119], reverse transcription, dilution, etc.). In addition to recording the identity of the source nucleic acid, this table includes the Relationship column, which indicates the nature of the connection between the row's nucleic acid and its source nucleic acid. This relationship may be simple enough to explain in a single word (e.g. "DILUTION"), or complex enough to require a lengthy explanation. To allow this flexibility, Relationship is not constrained to a set of legal values in a support table.

A nucleic acid sample cannot indicate itself as its source; the NAId and Source_NAId cannot be equal.

A nucleic acid sample cannot have more than one other sample as its source; this table's NAId column is unique.

A nucleic acid cannot have been created before its source; the related Creation_Date of this NAId must be on or after the Source_NAId's related Creation_Date.

Although a nucleic acid sample may have been generated from another nucleic acid sample, there will always be a single tissue sample from which both the nucleic acid samples originated; both samples' related NUCACID_DATA.TId's must be equal.
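The four rules above can be gathered into one illustrative Python check. The function name, the dictionary layout, and the message wording are hypothetical. Treating a NULL Creation_Date (None) as "cannot be compared" is an assumption of this sketch, not a documented behavior.

```python
from datetime import date

def source_problems(na_id, source_na_id, samples, existing_na_ids=()):
    """samples: dict NAId -> {"TId": int, "Creation_Date": date or None}.

    existing_na_ids: NAIds that already have a row in NUCACID_SOURCES.
    """
    problems = []
    if na_id == source_na_id:
        problems.append("a sample cannot be its own source")
    if na_id in existing_na_ids:
        problems.append("this NAId already has a source")
    child, source = samples[na_id], samples[source_na_id]
    # Assumption: the date rule is skipped when either date is unknown.
    if (child["Creation_Date"] is not None
            and source["Creation_Date"] is not None
            and child["Creation_Date"] < source["Creation_Date"]):
        problems.append("sample was created before its source")
    if child["TId"] != source["TId"]:
        problems.append("sample and source come from different tissue samples")
    return problems

samples = {
    10: {"TId": 5, "Creation_Date": date(2019, 3, 1)},
    11: {"TId": 5, "Creation_Date": date(2019, 2, 1)},
    12: {"TId": 6, "Creation_Date": None},
}
print(source_problems(11, 10, samples))  # ['sample was created before its source']
```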

Column Descriptions

NAId (Nucleic Acid Identifier)

The NUCACID_DATA.NAId of the nucleic acid that has another nucleic acid as its source.

This column may not be NULL.

Source_NAId (Nucleic Acid Identifier of Source)

The NUCACID_DATA.NAId of the source nucleic acid.

This column may not be NULL.

Relationship

A textual description of how this nucleic acid and its source are connected.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

POPULATIONS

This table contains one row for every population under observation, and/or from which tissue or nucleic acid samples have been collected.

In this context, the term "population" refers to a particular species at a specific location. "The baboons in the Amboseli basin in Kenya", for example, are a population. "All baboons", or "all wildlife in the Amboseli basin", are not.

In the common vernacular, a population is often referred to only by the name of its site, e.g. "Gombe" when referring to the Gombe chimpanzees. Because of this, the Pop_Name and Site columns may seem redundant, but when setting vernacular aside it should be obvious that these two columns contain objectively different information. In practice, users may elect to enter the same value in both of these columns, but the two columns remain independent of each other.

Special Values

PopId 1 has special meaning to the system. Data integrity rules for the UNIQUE_INDIVS table presume that the population with this PopId is the population whose individuals are recorded in BIOGRAPH. No other code should be created to refer to that population.

Column Descriptions

PopId (Population Identifier)

A unique identifier. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

Pop_Name (Population Name)

The name of the population.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Species_Sci_Name (Scientific Name of the Species)

The scientific name of this population's species.

This column may be NULL, when unknown or not applicable. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Species_Common_Name (Common Name of the Species)

The common name of this population's species.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Wild_Captive

A code indicating whether the population is wild or captive. The legal values are shown below.

POPULATIONS.Wild_Captive Values

W

Wild.

C

Captive.

U

Unknown.

NA

Not applicable.

This column may not be NULL.

Site

The location of the population.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Notes

Comments or miscellaneous information about this population.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

TISSUE_DATA (General information about TISSUE samples)

This table contains one row for every tissue sample that is or ever has been in the inventory.

Tip

Always use the TISSUES view in place of this table. It contains additional related columns which may be of interest.

This table records a tissue sample's current location using the LocId column. Values in this column constrain and are constrained by values in the NUCACID_DATA.LocId column, and may or may not be unique, as discussed in the LOCATIONS table.

If a sample was collected from an individual in BIOGRAPH — if the related UNIQUE_INDIVS.UIId has a PopId of 1 — the sample's Collection_Date must be on or after that individual's Entrydate. Depending on the sample's Tissue_Type, the Collection_Date may also be constrained by the individual's Statdate. See TISSUE_TYPES for more information.

The system will return a warning if a sample's Collection_Date is after the individual's Statdate, but only when the sample's Tissue_Type indicates that the Collection_Date is not constrained by the individual's Statdate; that is, when the related TISSUE_TYPES.Max_After_Statdate is NULL.

From time to time, field observers may mistakenly record the wrong collection date on a tube. To help identify when this has occurred, the system uses the CENSUS table to confirm whether the Collection_Date is a date that the individual was actually observed[120]. The result of that confirmation is indicated in the Collection_Date_Status column.

When a sample's Collection_Date is not a Date on which the individual was recorded present in CENSUS, the Collection_Date is not necessarily "wrong". There are numerous circumstances in which a sample may have been collected without a census being performed. Still, the absence of a related row in CENSUS is suspicious, so it elicits a warning. That is, the system will return a warning when a tissue sample's Collection_Date_Status is 1.

Tip

Do not assume that the date written on a sample's label will always match the Collection_Date. When data managers determine that the date written on a label is erroneous, they may be able to determine the true date and update the Collection_Date as needed.

Column Descriptions

TId (Tissue Identifier)

A unique identifier for the tissue sample. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

UIId (Unique Individual Identifier)

The UNIQUE_INDIVS.UIId of the individual from whom this tissue sample was collected.

This column may not be NULL.

LocId (Identifier for the sample's current location)

The LOCATIONS.LocId indicating the current locale and location of the sample.

This column may not be NULL.

Name_on_Tube

The name of the individual from whom this tissue sample was collected, according to the label on the tube.

This column may be NULL, when there is no identifying information on the tube. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Collection_Date

The date the sample was collected.

This column may be NULL, when the date is unknown.

Collection_Time

The time the sample was collected.

This column may be NULL, when the time is unknown.

Tissue_Type

The TISSUE_TYPES.Tissue_Type of this tissue sample.

This column may not be NULL.

Storage_Medium

The STORAGE_MEDIA.Storage_Medium in which the sample is stored.

This column may not be NULL.

Misid_Status (Misidentification Status)

The MISID_STATUSES.Misid_Status of this tissue sample.

This column may not be NULL.

Collection_Date_Status

A code indicating whether this row's Collection_Date is or isn't plausible according to available CENSUS data. The legal values are:

Valid TISSUE_DATA.Collection_Date_Status Values

0

This individual is part of the main population and has a non-"absent" CENSUS row on this Collection_Date, OR this individual is not part of the main population and we have no basis to question the accuracy of this Collection_Date.

1

This Collection_Date is NULL, OR this individual is part of the main population and either i) has no CENSUS rows on this Collection_Date or ii) has only "absent" censuses on this Collection_Date.

This column is automatically maintained by the database and may not be NULL. Attempts to manually populate or update this column are silently ignored.
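The logic behind the two status codes can be sketched in Python as follows. This is only an illustration of the definitions above: the function name is hypothetical, None stands in for NULL, and representing the individual's CENSUS rows for that date as a list of booleans (True meaning an "absent" census) is an assumption made for brevity.

```python
from datetime import date

def collection_date_status(collection_date, in_main_population, census_absent_flags):
    """Return 0 (plausible) or 1 (questionable) for a tissue sample.

    census_absent_flags: one boolean per CENSUS row for this individual on
    collection_date; True means the row is an "absent" census.
    """
    if collection_date is None:
        return 1
    if not in_main_population:
        return 0  # no CENSUS data to contradict the date
    # At least one non-"absent" census on that date makes the date plausible.
    has_presence = any(not absent for absent in census_absent_flags)
    return 0 if has_presence else 1

day = date(2018, 7, 4)
print(collection_date_status(day, True, [True, False]))  # 0
print(collection_date_status(day, True, []))             # 1
```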

Notes

Comments or miscellaneous information about this tissue sample.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

TISSUE_LOCAL_IDS (LOCAL IDentifierS for TISSUE samples)

This table contains one row for every name or ID used only at a specific institution (an ID that is "local" to that institution) to describe a particular tissue sample.

For more details about the reason for this table and the difference between a "local" name/identifier and an ID generated by the database, see the discussion for the NUCACID_LOCAL_IDS table.

Every combination of TId and Institution must be unique; a TId cannot go by more than one name at the same Institution.

Every combination of Institution and LocalId must be unique; the same local name cannot be used at a single Institution to describe more than one sample.

Column Descriptions

TId (Tissue Identifier)

The TISSUE_DATA.TId of the tissue sample.

This column may not be NULL.

Institution

The INSTITUTIONS.Institution indicating the locale in which this TId's name is used.

This column may not be NULL.

LocalId

The local name used for this TId at this Institution.

This column may not be NULL.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.

UNIQUE_INDIVS (All UNIQUE INDIVidualS)

This table contains one row for every individual under observation, and every individual from whom tissue or nucleic acid samples have been collected.

In contrast to BIOGRAPH, which records the identities of every individual in the main study population[121], this table also records the identities of all the individuals in other populations from whom there are tissue or nucleic acid samples recorded in the inventory. All individuals in BIOGRAPH are also included in this table, whether or not tissue or nucleic acid samples exist in the inventory. This presents a problem: there are two tables that separately track the identities of all individuals in the main population. To address this, the triggers have been written to ensure that BIOGRAPH retains primary authority over all individuals in the main population.

Management of individuals in the main population is done by BIOGRAPH (see its discussion for more information), so the ability to perform inserts/updates/deletes in this table for those individuals is heavily constrained, as follows:

  • Inserting rows for individuals in the main population is only allowed for the unknown individual or for individuals in BIOGRAPH who have not yet been added to this table[122].

  • The unknown individual's row can only be updated or deleted by an administrator.

  • Deleting rows for individuals in the main population is only allowed for individuals who are no longer in BIOGRAPH[123].

  • Updating rows for individuals in the main population is only allowed when changing only the Notes column.

  • No individual's PopId can be updated to add the individual to, or remove the individual from, the main population.
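The update-related rules in the list above might be sketched in Python as shown here. The function name and the dictionary representation of a row are hypothetical, and the administrator-only rule for the unknown individual is deliberately not modeled.

```python
MAIN_POPID = 1  # the population recorded in BIOGRAPH

def update_allowed(old_row, new_row):
    """Rows are dicts with the keys IndivId, PopId, and Notes."""
    # No row may be moved into or out of the main population.
    if (old_row["PopId"] == MAIN_POPID) != (new_row["PopId"] == MAIN_POPID):
        return False
    # For main-population rows, only the Notes column may change.
    if old_row["PopId"] == MAIN_POPID:
        return all(old_row[col] == new_row[col]
                   for col in old_row if col != "Notes")
    return True

main = {"IndivId": "101", "PopId": 1, "Notes": None}
print(update_allowed(main, {**main, "Notes": "resampled"}))  # True
print(update_allowed(main, {**main, "IndivId": "102"}))      # False
```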

Tip

Do not manually insert or delete rows in this table for individuals in BIOGRAPH. Perform those actions in BIOGRAPH, and the action will automatically be performed in this table, as well. Manual inserts and deletes in this table should only be done for individuals who are not in BIOGRAPH.

The IndivId column is used to record the individual's name or similar ID. Study projects and research institutions each have their own rules of nomenclature for their individuals, so this might be a lengthy name, an abbreviation, a series of numbers, or some mix of these. This value is not unique; the same identifier may be used more than once across different populations. However, per PopId, each IndivId must be unique; a population cannot use the same identifier more than once.

Special Values

PopId 1 is the population recorded in BIOGRAPH, so any row with this PopId (with a few exceptions, discussed below) must use the individual's Bioid as its IndivId.

IndivId UNKNOWN indicates the unknown individual, and is allowed to have PopId 1 and not be a Bioid.

IndivId MULTIPLE is used to indicate when a TISSUE_DATA row includes samples from multiple individuals. It is allowed to have PopId 1 and not be a Bioid.
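The special values above can be expressed as a minimal Python sketch. The function name is hypothetical, and representing Bioid values as text so that they can be compared with IndivId is an assumption of this example.

```python
MAIN_POPID = 1
SPECIAL_INDIVIDS = {"UNKNOWN", "MULTIPLE"}

def individ_allowed(indiv_id, pop_id, bioids):
    """bioids: set of Bioid values from BIOGRAPH, represented here as text."""
    if pop_id != MAIN_POPID:
        return True  # other populations follow their own naming rules
    # In the main population the IndivId must be a Bioid, with two exceptions.
    return indiv_id in SPECIAL_INDIVIDS or indiv_id in bioids

bioids = {"101", "102"}
print(individ_allowed("UNKNOWN", 1, bioids))  # True
print(individ_allowed("Fido", 1, bioids))     # False
```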

Column Descriptions

UIId (Unique Individual Identifier)

A unique identifier for the individual. This is an automatically generated sequential number that uniquely identifies the row.

This column is automatically maintained by the database, cannot be changed, and must not be NULL.

IndivId (Individual Identifier)

The name/identifier for this individual.

This column may not be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

PopId (Population Identifier)

The POPULATIONS.PopId of the individual's population.

This column may not be NULL.

Notes

Comments or miscellaneous information about this individual.

This column may be NULL. This column may not be empty, it must contain characters, and it must contain at least one non-whitespace character.

Sys_Period

The timestamp range during which this row's data are considered valid. See The Sys_Period Column for more information.



[116] Also "tissue" and "tissue sample", but those two terms aren't terribly different anyway.

[117] This is expected to be the highest plausible accuracy to ever be used for the concentrations stored in this table. This can easily be expanded if needed.

[118] Even in the coldest of cold storage, frozen samples will slowly evaporate over time. A 100-μL sample that is frozen and stored for 5 years is unlikely to still be the full 100 μL at the end of that time.

[119] It is presumed that any reader who cares enough about nucleic acid samples to read this documentation is already familiar with the polymerase chain reaction. We will not attempt to explain it here.

[120] Admittedly, this approach is imperfect and is likely underestimating the true prevalence of the problem. The date written on a sample may not be the true date it was collected but may still be a date that the individual was censused. Unfortunately, there is little else that the system can do to recognize when this occurs.

[121] That is, the population whose data are recorded throughout the many tables in Babase.

[122] Related rows in this table are automatically inserted when rows are inserted into BIOGRAPH, so manual insertion of these rows is effectively not allowed.

[123] Similar to inserts, related rows in this table are automatically deleted when rows are deleted from BIOGRAPH, so manual deletion of these rows is effectively not allowed.


Page generated: 2024-04-22T16:19:10-04:00.