Chado Tables

Selected Chado tables that are used by Babase are documented here.

ANALYSISFEATURE (Analysis Feature Relationships)

The ANALYSISFEATURE table is intended to capture relationships between analyses and genomic features in those cases where the analyses discover, predict, or otherwise generate the features. (E.g. Genscan generates transcripts and exons; sim4 alignments generate similarity/match features.) ANALYSISFEATURE contains one row per every relationship between a genomic feature and a analysis -- that is, one row per pairing of FEATURE row with a ANALYSIS row. ANALYSISFEATURE is used to decorate these features with analysis specific attributes. A feature is an analysisfeature if and only if there is a corresponding entry in the ANALYSISFEATURE table.

Note that the 4 analysis metrics provided by the columns of ANALYSISFEATURE do not cover the full range of scores an analysis might possibly generate; it would be undesirable to have columns for every score possible, as this should be kept extensible. Instead, for non-standard scores, use the ANALYSISFEATUREPROP table for record keeping.

ANALYSISFEATURE relates analysis to genomic features in, potentially, a many-to-many fashion. The table contains one row for every pairing of analysis to feature.

Tip

Using ANALYSISFEATURE it is possible to construct many-to-many relationships between ANALYSIS rows and FEATURE rows, but care should be taken since it is unlikely that such many-to-many relationships are desirable, at least not per analysis. It is probably more appropriate to, per analysis, construct one-to-many relationships between ANALYSIS and FEATURE rows, or the reverse, to construct many-to-one relationships between ANALYSIS and FEATURE rows. In some cases it may be appropriate to construct one-to-one relationships between ANALYSIS and FEATURE rows.

The use of ANALYSISFEATURE in Babase

When doing analyses in Babase results are typically produced per-individual, so results are typically stored in the ND_EXPERIMENT_STOCKPROP table. These per-individual results are then linked to features by way of the GENOTYPE table.

In some cases within Babase it may desirable to link analysis results to features directly. One example is SNV analysis. In this case the probability of variation at the SNV site is determined per analysis and it is appropriate to relate the analysis directly to the SNV feature.

Caution

When multiple join paths relate the ANALYSIS table to the FEATURE table care must be taken to ensure that the result makes sense. At the present time there are no data validation rules in place to ensure database integrity.

The combination of Feature_Id and Analysis_Id must be unique.

ANALYSISFEATURE rows are automatically deleted whenever any related FEATURE row is deleted. ANALYSISFEATURE rows are automatically deleted whenever any related ANALYSIS row is deleted.

Analysisfeature_Id

A unique number identifying the pairing of ANALYSIS row to FEATURE row. This column may not be NULL.

Feature_Id

The genomic feature related to the analysis. A FEATURE.Feature_Id value. This column may not be NULL.

Analysis_Id

The analysis related to the genomic feature. An ANALYSIS.Analysis_Id value. This column may not be NULL.

Rawscore (Raw Score)

This is the native score generated by the program; for example, the bitscore generated by blast, sim4 or genscan scores. One should not assume that high is necessarily better than low.

Normscore (Normalized Score)

This is the rawscore but semi-normalized. Complete normalization to allow comparison of features generated by different programs would be nice but too difficult. Instead the normalization should strive to enforce the following semantics: normscores are floating point numbers >= 0, high normscores are better than low one. For most programs, it would be sufficient to make the normscore the same as this rawscore, providing these semantics are satisfied.

Significance

This is some kind of expectation or probability metric, representing the probability that the analysis would appear randomly given the model. As such, any program or person querying this table can assume the following semantics: 0 <= significance <= n, where n is a positive number, theoretically unbounded but unlikely to be more than 10; low numbers are better than high numbers.

Identity

Percent identity between the locations compared.

The use of ANALYSISFEATURE.Identity within Babase

This column is uninteresting within Babase.

ANALYSISFEATUREPROP (Analysis to Genomic Feature Relationship Proprieties)

The ANALYSISFEATUREPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the ANALYSISFEATURE table. ANALYSISFEATUREPROP contains one row for every (non-NULL) additional column value, per ANALYSISFEATURE row, per Rank value.

The combination of Analysisfeature_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per analysis feature pairing per value type.

ANALYSISFEATUREPROP rows are automatically deleted whenever any related ANALYSISFEATURE row is deleted. ANALYSISFEATUREPROP rows are automatically deleted whenever any related CVTERM row is deleted.

Analysisfeatureprop_Id

A unique number identifying the ANALYSISFEATUREPROP row. This column may not be NULL.

Analysisfeature_Id

The analysis feature pairing having the property -- the Analysisfeature_Id value of the related ANALYSISFEATURE row. This column may not be NULL.

Type_Id (Type)

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The analysis feature pairing property value -- as text.

Rank

A number which gives the value an ordinal position among the other ANALYSISFEATUREPROP values, per the row's Type_Id value, related to the ANALYSISFEATURE row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

ANALYSISPROP (Analysis Properties)

The ANALYSISPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the ANALYSIS table. ANALYSISPROP contains one row for every (non-NULL) additional column value, per ANALYSIS row, per Rank value.

The combination of Analysis_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per analysis per value type.

ANALYSISPROP rows are automatically deleted when the related ANALYSIS row is deleted. ANALYSISPROP rows are automatically deleted when the related CVTERM row is deleted.

Analysisprop_Id

A unique number identifying the ANALYSISPROP row. This column may not be NULL.

Analysis_Id

The analysis having the property -- the Analysis_Id value of the related ANALYSIS row. This column may not be NULL.

Type_Id (Type)

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The analysis property value -- as text.

Rank

A number which gives the value an ordinal position among the other ANALYSISPROP values, per the row's Type_Id value, related to the ANALYSIS row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

CV (Controlled Vocabularies)

The CV table contains one row per controlled vocabulary set (i.e. per ontology). Ontologies may be locally defined or developed by some standard setting body. There are various names for sets of controlled vocabularies:

Names for Sets of Controlled Vocabularies

  • Ontologies

  • Terms

  • Classes

  • Types

  • Universals

  • Properties

  • Controlled Vocabularies

  • Support Tables

  • Namespaces

The CV table encompasses all of these, having one row per each set of terms.

One or more of the rows in the CV table represents a set of terms some of which define relationships. These relationships are used by Chado to construct acyclic directed graphs describing relationships between Chado database objects. For instance, genomic features may be related to one another in various ways, RNA may be transcripts of DNA, genes may be composed of exons and introns, and so forth. Stocks may be derived from other stocks, or split into other stocks.

CV_Id (Controlled Vocabulary Identifier)

A unique number identifying the row. This column may not be NULL.

Name

The name of the ontology. The Name value must be unique. This column may not be NULL.

Definition

A textual description of the criteria for membership in the ontology.

CVTERM (Controlled Vocabulary Terms)

The CVTERM table contains one row per controlled vocabulary term. Each term is assigned membership in a specific ontology by assignment of a CV_Id value.

The combination of Name, CV_Id, and Is_Obsolete must be unique.

A CVTERM row is automatically deleted when it's related CV row is deleted. A CVTERM row is automatically deleted when it's related DBXREF row is deleted.

CVTerm_Id (Controlled Vocabulary Term Identifier)

A unique number identifying the row. This column may not be NULL.

CV_Id (Controlled Vocabulary Identifier)

The controlled vocabulary to which the term belongs -- the CV_Id value of the related CV row. This column may not be NULL.

Name

A concise, human readable, label for the term suitable for identifying the term within the ontology. This column may not be NULL.

Definition

A human readable definition of the term.

DBXref_Id (External Database Cross-Reference Identifier)

The primary external database object represented by the row -- the related DBXREF.DBXref_Id value. This column may not be NULL. (See also the CVTERM_DBXREF table.)

Is_Obsolete

A number representing whether or not the term is obsolete. The number 0, the default, means the term is not obsolete. Any other value means the term is obsolete. This column may not be NULL.

Tip

Use the CVTERMPROP table to record when the term became obsolete.

Is_Relationshiptype

A number representing whether or not the term can be used to declare relationships between Chado database objects. The number 0, the default, means the term may not be used to declare relationships. Any other value means the term may be used to declare relationships. This column may not be NULL.

Terms that may be used to define relationships are called Typedefs in the OBO format. They are also known as properties or slots.

CVTERM_DBXREF (Controlled Vocabulary Term Ancillary External Database References)

The CVTERM_DBXREF table contains one row per ancillary external database object related to the controlled vocabulary term. Multiple external database references can be related to a controlled vocabulary term, and vice-versa.

The combination of CVTerm_Id and DBXref_Id must be unique.

CVTERM_DBXREF rows are automatically deleted when the related CVTERM row is deleted. CVTERM_DBXREF rows are automatically deleted when the related DBXREF row is deleted.

CVTerm_DBXref_Id (Controlled Vocabulary Term External Database Cross-Reference Identifier)

A unique number identifying the row. This column may not be NULL.

CVTerm_Id (Controlled Vocabulary Term Identifier)

The controlled vocabulary term with which the ancillary external database object is related -- the CVTerm_Id value of the related CVTERM row. This column may not be NULL.

DBXref_Id (External Database Cross-Reference Identifier)

The ancillary external database reference with which the controlled vocabulary term is related -- the DBXref_Id value of the related DBXREF row. This column may not be NULL.

CVTERM_RELATIONSHIP (Controlled Vocabulary Term Relationships)

CVTERM_RELATIONSHIP contains one row for every relationship between ontology terms in the CVTERM file.

Each CVTERM_RELATIONSHIP row represents a relationship linking two CVTERM rows. Each CVTERM_RELATIONSHIP row constitutes an edge in the graph defined by the collection of CVTERM rows and CVTERM_RELATIONSHIP rows. The meaning of the CVTERM_RELATIONSHIP row depends on the definition of the CVTERM row referred to by CVTERM_RELATIONSHIP.Type_Id. However, in general the definitions are such that the statement all Subject_Id predicate[2] some Object_Id is true. The CVTERM_RELATIONSHIP statement is about the subject, not the object. For example "insect wing part_of thorax".

For more on relationships between CVTERM rows see CVTERMPATH.

CVTerm_Relationship_Id (Controlled Vocabulary Term Relationship Identifier)

A unique number identifying the row. This column may not be NULL.

Type_Id (Type)

The nature of the relationship between subject and object. A CVTERM.CVTerm_Id value. Relations are typically from the OBO relationship ontology although other relationship types are allowed.

Subject_Id

The subject of the subj-predicate-obj sentence. A CVTERM.CVTerm_Id value. The CVTERM_RELATIONSHIP row is about the subject. In a graph, this typically corresponds to the child node. This column may not be NULL.

Object_Id

The object of the subject-relationship-object sentence. A CVTERM.CVTerm_Id value. In a graph, this typically corresponds to the parent node. This column may not be NULL.

CVTERMPATH (Controlled Vocabulary Term relationship Paths)

CVTERMPATH contains one row for every relationship chain between ontology terms in the CVTERM file.

The CVTERM file contains various ontologies. The vocabulary terms within these relationships may be related to one another in various ways. E.g. one of the possible relationships is the is a relationship; a chromosome is_a sequence. Further, these relationships can be chained and this chain used to reason about the data. E.g. a mitochondrial_chromosome is_a chromosome. So therefore a mitochondrial chromosome is a sequence.

CVTERMPATH contains the reflexive transitive closure of the CVTERM_RELATIONSHIP table -- a row for every link in the chain between related CVTERM rows. Regards the above example, there is a row in CVTERMPATH which says that a mitochondrial chromosome is a sequence.

CVTermpath_Id (Controlled Vocabulary Term relationship Path Identifier)

A unique number identifying the row. This column may not be NULL.

Type_Id (Type)

The relationship type that the row is a closure over, a CVTERM.CVTerm_Id value. If NULL the row is a closure over all relationship types. If non-NULL the related CVTERM row should be a relationship term -- should have a non-0 Is_Relationshiptype value.

When this value is non-NULL the closure applies to both the given relationship and the OBO:is_a (subclass) relationship.

Subject_Id

The subject of the subject-relationship-object sentence. A CVTERM.CVTerm_Id value. This column may not be NULL.

Object_Id

The object of the subject-relationship-object sentence. A CVTERM.CVTerm_Id value. This column may not be NULL.

CV_Id (Controlled Vocabulary Identifier)

Closures will mostly be within one CV. If the closure of a relationship traverses a CV, then this refers to the CV.CV_Id of the Object_Id. This column may not be NULL.

Pathdistance

The number of steps required to get from the subject CVTERM to the object CVTERM, counting from zero (reflexive relationship). This column may not be NULL.

Note

This number may be less than zero when counting backwards from the object to the subject.

CVTERMPROP (Controlled Vocabulary Term Properties)

The CVTERMPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the CVTERM table. CVTERMPROP contains one row for every (non-NULL) additional column value, per CVTERM row, per Rank value. A row corresponds to an AnnotationProperty in W3C OWL format.

The combination of CVTerm_Id, Type_Id, Rank, and Value must be unique. Note that this arrangement does not allow for duplicate values per controlled vocabulary term per type.

CVTERMPROP rows are automatically deleted when any related CVTERM row is deleted, whether the row is related by CVTerm_Id or Type_Id.

CVTermprop_Id (Controlled Vocabulary Term Property Identifier)

A unique number identifying the row. This column may not be NULL.

CVTerm_Id (Controlled Vocabulary Term Identifier)

The controlled vocabulary term to which the property value belongs -- the CVTerm_Id value of the related CVTERM row. This column may not be NULL.

Type_Id (Type)

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The controlled vocabulary property value -- as text. This column may not be NULL. The default value is the empty string.

Rank

A number which gives the value an ordinal position among the other CVTERMPROP values, per the row's Type_Id value, related to the CVTERM row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

DB (External Databases)

The DB table contains one row per database, often an external database that's curated by an outside entity. Databases, rows in the DB table, are generally named after a database authority; the authority that issues or curates the database. Examples of external databases are ontologies, reference genomes, collections of files in a filesystem, and Babase itself. Pretty much any collection of data can be a database in this sense.

DB_Id (Database Identifier)

A unique number used to identify the external database within Chado. This column may not be NULL.

Name

The name of the external database. This is generally the name of the database authority that produces and curates the database. The Name value must be unique. This column may not be NULL.

Description

A textual description of the external database.

URLPrefix (Prefix Used for Web Queries of the External Database)

A URL to which a term in the designated database may be appended to produce a complete URL which queries the external database for the given term.

URL (Web Address)

The URL used to retrieve from the web information on the given database.

DBXREF (External Database Object Cross-References)

The DBXREF table contains one row per external database per database object; i.e., per row or term in the external database. The purpose is to be able to link data in Chado back to specific rows in the external database. Hence, while there need not be a row in DBXREF for every row in the external database there must at least be a row for every row Chado references in the external database.

Since the DB table can refer to locally curated databases, including the local Chado database, DBXREF rows may also refer to rows within Chado.

Rows within Chado are often representations of rows within external database. In this case the DBXref_Id linking the two is stored in a column of the Chado table and the DBXref_Id is said to be the primary identifier. Chado rows may also be related to multiple other, so called ancillary or secondary DBXref_Id identifiers, to support one-to-many and many-to-many relationships between rows within Chado and external database objects. In this case there is a table within Chado ending in _DBXREF to record these sorts of relationships between the Chado rows and the secondary external database objects.

Although the DBXref_Id is a number, when referring to DBXREF rows the usual written notation uses the DB.Name, DBXREF.Accession, and DBXREF.Version values written in the form Name:Accession or Name:Accession:Version.

The combination of DB_Id, Accession, and Version must be unique.

DBXref_Id (External Database Object Cross-Reference Identifier)

A unique number used internally within Chado to represent the external database object. This column may not be NULL.

DB_Id (External Database Identifier)

The id of the external database in which the external database object resides. A DB.DB_Id value. This column may not be NULL.

Accession

An identifier which the curator of the external database guarantees is unique among the objects in the database. (One of) the external database's key(s) to the external database object. This column may not be NULL.

Version

The version number of the external database object.[3] This column may not be NULL. The version defaults to the empty string.

Description

A textual description of the external database object.

FEATURE (Genomic Feature)

The FEATURE table contains one row per biological sequence or a section of a biological sequence or collection of such sections. Anything that can be sequenced is a feature.

A feature is a biological sequence or a section of a biological sequence, or a collection of such sections. Examples include genes, exons, transcripts, regulatory regions, polypeptides, protein domains, chromosome sequences, sequence variations, cross-genome match regions such as hits and HSPs and so on; see the Sequence Ontology for more.

The use of FEATURE in Babase

Babase creates rows in FEATURE to represent each SNV site discovered. Note however that this is per SNV site, regardless of how many analyses were done of the site.

The combination of Organism_Id, Uniquename, and Type_Id is unique.

FEATURE rows are automatically deleted when the related DBXREF row is deleted. FEATURE rows are automatically deleted when the related ORGANISM row is deleted. FEATURE rows are automatically deleted when the related CVTERM row is deleted.

Feature_Id (Feature Identifier)

A unique number identifying the feature. This column may not be NULL.

DBXref_Id (External Database Object Identifier)

Identifier of the external database object the FEATURE row represents; the primary external database identifier. A DBXREF.DBXref_Id value.

Organism_Id (Organism Identifier)

Identifier of the ORGANISM row with which the feature is related. An ORGANISM.Organism_Id value. This column may not be NULL.

Name

A human readable name for the Feature.

Uniquename

The unique name for a feature; may not be necessarily be particularly human-readable, although this is preferred.

Note

The Uniquename value is unique only per organism per type. See above.

Residues

A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids). This column does not need to be manifested for all features; it is optional for features such as exons where the residues can be derived from the featureloc. It is recommended that the value for this column be manifested for features which may have non-contiguous sublocations (e.g. transcripts), since derivation at query time is non-trivial. For expressed sequence, the DNA sequence should be used rather than the RNA sequence. The default storage method for the residues column is EXTERNAL, which will store it uncompressed to make substring operations faster.

Seqlen (Sequence Length)

The length of the residue feature. See column: Residues. This column is partially redundant with the residues column, and also with featureloc. This column is required because the location may be unknown and the residue sequence may not be manifested, yet it may be desirable to store and query the length of the feature. The seqlen should always be manifested where the length of the sequence is known.

MD5Checksum (Residue MD5 Hash)

The 32-character checksum of the sequence, calculated using the MD5 algorithm. This is practically guaranteed to be unique for any feature. This column thus acts as a unique identifier on the mathematical sequence.

Type_Id

The kind of feature that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

This will typically be a Sequence Ontology identifier.

Is_Analysis (Feature Is Determined By Analysis)

Boolean indicating whether this feature is annotated or the result of an automated analysis. Analysis results also use the companalysis module. Note that the dividing line between analysis and annotation may be fuzzy, this should be determined on a per-project basis in a consistent manner. One requirement is that there should only be one non-analysis version of each wild-type gene feature in a genome, whereas the same gene feature can be predicted multiple times in different analyses.

This column may not be NULL. The default value for this column is FALSE.

Is_Obsolete

Boolean indicating whether this feature has been obsoleted. Some chado instances may choose to simply remove the feature altogether, others may choose to keep an obsolete row in the table.

This column may not be NULL. The default value for this column is FALSE.

Consider recording the date the feature became obsolete in the FEATUREPROP table.

Timeaccessioned (Date and Time Feature Was Recorded in the Database)

The date and time the feature was first accessioned/determined/discovered. This is independent of entry into the database.

This column may not be NULL. The default value for this column is the current date and time. (The date and time of insert into the database.)

Timelastmodified (Date and Time Database Row Was Last Modified)

The date and time the feature was last accessioned/determined/discovered. This is independent of entry into the database.

This column may not be NULL. The default value for this column is the current date and time. (The date and time of insert into the database.)

FEATURE_CVTERM (Feature Typing/Tagging)

FEATURE_CVTERM contains one row per feature, per publication, per each ancillary type or tag used to classify the feature. As usual in Chado, the CVTERM table is used to supply the vocabulary (or vocabularies) used in classification.

The use of FEATURE_CVTERM within Babase

Babase is uninterested in publication. Every row has an identical Pub_Id value which may be safely ignored.

The combination of Feature_id, CVTerm_Id, Pub_Id, and Rank must be unique.

FEATURE_CVTERM rows are automatically deleted when the related FEATURE row is deleted. FEATURE_CVTERM rows are automatically deleted when the related PUB row is deleted. FEATURE_CVTERM rows are automatically deleted when the related CVTERM row is deleted.

Feature_CVTerm_Id

A unique number identifying the row. This column may not be NULL.

Feature_Id

The feature assigned an ancillary type or tag. An FEATURE.Feature_Id value. This column may not be NULL.

CVTerm_Id (Controlled Vocabulary Term Identifier)

The ancillary type or tag applied to the feature. A CVTERM.CVTerm_Id value. This column may not be NULL.

Pub_Id (Publication Identifier)

The publication associated with the typing or tagging of the feature. A PUB.Pub_Id value. This column may not be NULL.

Is_Not

A Boolean value which serves to negate the type or tag (to negate the controlled vocabulary term). This column may not be NULL.

Rank

A number which gives the value an ordinal position among the other FEATURE_CVTERM values, per the row's Pub_Id value, related to the FEATURE row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

FEATURE_DBXREF (Feature to External Database Object Cross-References)

FEATURE_DBXREF contains one row per external database object related to an feature.

The use of FEATURE_DBXREF within Babase

FEATURE_DBXREF is used within Babase to supply chromosomes (and possibly other features) alternative identifiers. These alternative identifiers appear within VCF files and are used by chado-vcf-load.

The combination of Feature_Id and DBXref_Id must be unique.

FEATURE_DBXREF rows are automatically deleted when the related FEATURE row is deleted. FEATURE_DBXREF rows are automatically deleted when the related DBXREF row is deleted.

Feature_DBXref_Id

A unique number identifying the row. This column may not be NULL.

Feature_Id (Feature Identifier)

Identifier of the FEATURE row with which the external database object is related. An FEATURE.Feature_Id value. This column may not be NULL.

DBXref_Id (External Database Object Identifier)

Identifier of the external database object related to the FEATURE row. A DBXREF.DBXref_Id value. This column may not be NULL.

Is_Current

A Boolean. When TRUE the relationship between the feature and the external database object is in current use. When FALSE the relationship is is outdated. This column may not be NULL. The default value for this column is TRUE.

FEATURE_GENOTYPE (Feature to Genotype Relationships)

The FEATURE_GENOTYPE table contains one row per every relationship between a genomic feature and a genotype -- that is, one row per pairing of FEATURE row with a GENOTYPE row.

FEATURE_GENOTYPE relates genotype to genomic features in, potentially, a many-to-many fashion. The table contains one row for every pairing of genotype to feature.

Tip

Using FEATURE_GENOTYPE it is possible to construct many-to-many relationships between GENOTYPE rows and FEATURE rows, but care should be taken since it is unlikely that such many-to-many relationships are desirable, at least not per analysis[4]. It is probably more appropriate to, per analysis, construct one-to-many relationships between GENOTYPE and FEATURE rows, or the reverse, to construct many-to-one relationships between GENOTYPE and FEATURE rows. In some cases it is appropriate to construct one-to-one relationships between GENOTYPE and FEATURE rows.

The use of FEATURE_GENOTYPE within Babase

FEATURE_GENOTYPE is not used in Babase. It is replace with the Babase Chado extension table ND_EXPERIMENT_FEATURE.

The combination of Feature_Id, Genotype_Id, CVTerm_Id, Chromosome_Id, Rank, and Cgroup must be unique.

FEATURE_GENOTYPE rows are automatically deleted whenever any related FEATURE row is deleted. FEATURE_GENOTYPE rows are automatically deleted whenever any related GENOTYPE row is deleted. FEATURE_GENOTYPE rows are automatically deleted whenever any related CVTERM row is deleted.

The FEATURE_GENOTYPE.Chromosome_Id column is automatically set to NULL when the related FEATURE row is deleted.

Feature_Genotype_Id

A unique number identifying the pairing of GENOTYPE row to FEATURE row. This column may not be NULL.

Feature_Id

The genomic feature related to the genotype. A FEATURE.Feature_Id value. This column may not be NULL.

Genotype_Id

The genotype, typically the input material, related to the genomic feature. An GENOTYPE.Genotype_Id value. This column may not be NULL.

Chromosome_Id (Chromosome Identifier)

The chromosome on which the feature resides. A FEATURE.Feature_Id value.

Caution

This should be a feature with an SO type of chromosome but this is not enforced. There is also no validation to ensure that the features resides on the referenced chromosome.

Rank

The Chado documentation states: rank can be used for n-ploid organisms or to preserve order.

The use of FEATURE_GENOTYPE.Rank within Babase

This column is not of interest in Babase.

This column may not be NULL. Although not allowed to be NULL this column has no default value.

For more on Rank values see The Chado Rank Column.

CGroup (Chromosomal Group)

The Chado documentation states: Spatially distinguishable group. group can be used for distinguishing the chromosomal groups, for example (RNAi products and so on can be treated as different groups, as they do not fall on a particular chromosome).

The use of FEATURE_GENOTYPE.CGroup within Babase

This column is not of interest in Babase.

This column may not be NULL. Although not allowed to be NULL this column has no default value.

CVTerm_Id (Controlled Vocabulary Term Identifier)

The type of relationship between the genetic feature and the genotype -- a CVTERM.CVTerm_Id value. This column may not be NULL.

FEATURE_RELATIONSHIP (Feature Inter-Relationships)

FEATURE_RELATIONSHIP contains one row per subject/verb/object or parent/child relationship between 2 features. This table allows organization of features into acyclic directed graphs to record things like which features are composed of which other features, which features are transcriptions of which other features, and so forth. It is one of the many relationship tables in Chado.

An example graph is exon part_of transcript part_of gene. If type is thought of as a verb, the each arc or edge makes a statement [Subject Verb Object]. The object can also be thought of as parent (containing feature), and subject as child (contained feature or subfeature). We include the relationship rank/order, because even though most of the time we can order things implicitly by sequence coordinates, we can not always do this - e.g. transpliced genes. It is also useful for quickly getting implicit introns.

The combination of Subject_Id, Object_Id, Type_Id, and Rank must be unique.

FEATURE_RELATIONSHIP rows are automatically deleted when either the related Subject_Id FEATURE row is deleted or when the related Object_Id FEATURE row is deleted. FEATURE_RELATIONSHIP rows are automatically deleted when the related CVTERM row is deleted.

Feature_Relationship_Id

A unique number identifying the row. This column may not be NULL.

Subject_Id

The subject of the subject-predicate-object sentence. A FEATURE.Feature_Id value. The subject is typically a sub-feature. This column may not be NULL.

Object_Id

The object of the subject-predicate-object sentence. A FEATURE.Feature_Id value. The object is typically the containing feature. This column may not be NULL.

Type_Id

The relationship between the subject feature and the object feature. A CVTERM.CVTerm_Id value that has a non-0 Is_Relationshiptype value. This column may not be NULL.

The Type_id is typically from the OBO relationship ontology, although other relationship types are allowed. The most common relationship type is OBO_REL:part_of. Valid relationship types are constrained by the Sequence Ontology.

Value (Additional Notes)

Additional textual notes on the relationship between the analyses.

Rank

A number which gives the value an ordinal position among the other FEATURE_RELATIONSHIP values, per Subject_Id/Object_Id pairing, related to the CVTERM row. This column may not be NULL.

The ordering of subject features with respect to the object feature may be important (for example, exon ordering on a transcript - not always derivable if you take trans spliced genes into consideration). Rank is used to order these.

Rank is zero-based, counting starts from zero. The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

FEATURE_SYNONYM (Genomic Feature to Synonym Relationships)

The FEATURE_SYNONYM table contains one row per feature having a synonym per synonym had by said feature. Typically a synonym exists so that somebody querying the db with an obsolete name can find the object they're looking for (under its current name). FEATURE_SYNONYM allows for a many-to-many relationship between features and synonyms.

The combination of Synonym_Id, Feature_Id, and Pub_Id must be unique.

FEATURE_SYNONYM rows are automatically deleted when the related SYNONYM row is deleted. FEATURE_SYNONYM rows are automatically deleted when the related FEATURE row is deleted. FEATUREPROP rows are automatically deleted when the related PUB row is deleted.

Feature_Synonym_Id (Feature to Synonym Relationship Identifier)

A unique number identifying the row. This column may not be NULL.

Synonym_Id (Synonym Identifier)

The synonym; an alternate name for the feature. A SYNONYM.Synonym_Id value. This column may not be NULL.

Feature_Id (Feature Identifier)

The feature having the synonym. A FEATURE.Feature_Id value. This column may not be NULL.

Pub_Id (Publication Identifier)

The publication in which the synonym was used to refer to the feature. A PUB.Pub_Id value. This column may not be NULL.

Is_Current

A Boolean. When TRUE the synonym is a relationship between the feature and the database object is a current, official, synonym for the feature. When FALSE the synonym is outdated. This column may not be NULL. The default value for this column is FALSE.

Is_Internal

A Boolean. When TRUE the synonym was never used publicly and intentionally (e.g. in a paper). This column may not be NULL. The default value for this column is FALSE.

FEATURELOC (Feature Locations)

FEATURELOC contains one row per feature or portion thereof which maps to a reference feature. The mapping coordinate system origin is zero-based and relative to the start of the reference feature.

Features typically have exactly ONE location but this need not be the case. Some features may not be localized (e.g. a gene that has been characterized genetically but no sequence or molecular information is available). Therefore each feature can have 0 or more locations. (However, multiple locations are not used to designate non-contiguous locations. If a feature such as a transcript has a non-contiguous location then the subfeatures such as exons should always be manifested). Multiple FEATURELOC rows for a feature designate alternate locations or grouped locations. For instance, a feature designating a blast hit or HSP will have two locations, one on the query feature and one on the subject feature. Features representing sequence variation could have alternate locations instantiated on a feature on the mutant strain. The column Rank is used to differentiate these different locations.

Reflexive locations should never be stored; FEATURELOC is for proper (i.e. non-self) locations only. Nothing should be located relative to itself.

Important

No cycles are allowed in the feature location graph.[5]

The use of FEATURELOC in Babase

Babase creates FEATURELOC rows to locate the SNV FEATURE rows is creates. See above.

The FMin value may not be greater than the FMax value.

Caution

There are no data validation rules in place to ensure correspondence between FMin and Is_FMin_Partial.

Caution

There are no data validation rules in place to ensure correspondence between FMax and Is_FMax_Partial.

The combination of Feature_Id, Locgroup, and Rank must be unique.

FEATURELOC rows are automatically deleted when any related FEATURE row is deleted, whether the row is related by Feature_Id or Srcfeature_Id.

Example 3.1. Using both FEATURELOC.Locgroup and FEATURELOC.Rank

Imagine a feature indicating a conserved region between the chromosomes of two different species. We may want to keep redundant locations on both contigs and chromosomes. We would thus have 4 locations for the single conserved region feature - two distinct Locgroups (contig level and chromosome level) and two distinct Ranks (for the two species).


Featureloc_Id (Feature Location Identifier)

A unique number identifying the mapping between the target feature (Feature_Id) and the reference feature (Srcfeature_Id). This column may not be NULL.

Feature_Id (Feature Identifier)

The feature being located. A FEATURE.Feature_Id value. This column may not be NULL.

Srcfeature_Id (Reference Feature Identifier)

The reference (or source or anchor) feature. The feature that the Feature_Id is located relative to. A FEATURE.Feature_Id value.

This column may be NULL when the source feature is not known.

FMin (Feature Lower Boundary Coordinate)

The leftmost/minimal boundary in the linear range represented by the FEATURELOC row. Sometimes (e.g. in Bioperl) this is called start although this is confusing because it does not necessarily represent the 5-prime coordinate.

Important

Chado uses space-based (interbase) coordinates, counting from zero. To convert this to the leftmost position in a base-oriented system (e.g. GFF, Bioperl), add 1 to FMin.

Note that this value may be NULL when the lower boundary is not known.

Is_FMin_Partial (Is the Feature Lower Boundary Coordinate Only Partially Known?)

A Boolean. Whether or not the lower boundary coordinate of the feature (the FMin value) is partially known. This is typically FALSE, but may be TRUE if the value for FMin is inaccurate or the leftmost part of the range is unknown or unbounded.

This column may not be NULL. The default value for this column is FALSE.

FMax (Feature Upper Boundary Coordinate)

The rightmost/maximal boundary in the linear range represented by the FEATURELOC row. Sometimes (e.g. in Bioperl) this is called end although this is confusing because it does not necessarily represent the 3-prime coordinate.

Important

Chado uses space-based (interbase) coordinates, counting from zero. No conversion is required to go from FMax to the rightmost coordinate in a base-oriented system that counts from 1 (e.g. GFF, Bioperl).

Note that this value may be NULL when the lower boundary is not known.

Is_FMax_Partial (Is the Feature Upper Boundary Coordinate Only Partially Known?)

A Boolean. Whether or not the upper boundary coordinate of the feature (the FMax value) is partially known. This is typically FALSE, but may be TRUE if the value for FMax is inaccurate or the rightmost part of the range is unknown or unbounded.

This column may not be NULL. The default value for this column is FALSE.

Strand (Strand Directionality)

The orientation/directionality of the location. Should be 0, -1 or +1. However, NULL is also allowed. The difference between NULL and 0 is undocumented.

The use of FEATURELOC.Strand within Babase

Since Babase works relative to the reference genome, and is not presently interested in cleavage sites etc, the value of Strand will always be +1.

Phase (Translation Phase)

Phase of translation with respect to Srcfeature_Id. Acceptable values are 0, 1, 2 (although this is not enforced). It may not be possible to manifest this column for some features such as exons, because the phase is dependent on the spliceform (the same exon can appear in multiple spliceforms). This column is mostly useful for predicted exons and CDSs.

Residue_Info (Alternative Residue Information)

Alternative residues, when these differ from FEATURE.Residues. For instance, a SNP feature located on a wild and mutant protein would have different alternative residues. For alignment/similarity features this column is used to represent the alignment string (CIGAR format).

The use of FEATURELOC.Residue_Info and variation features

Even if we do not want to instantiate a mutant chromosome/contig feature we can still represent a SNP etc with 2 locations: one location (Rank 0) on the genome, the other (Rank 1) would have most fields NULL -- except for alternative residues.

The use of FEATURELOC.Residue_Info in Babase

Babase chooses to store alternative SNV residues in GENOTYPE.Description since these can be better related to multiple versions of the reference genome.

Babase may someday choose to store other alternative residues in ND_EXPERIMENTPROP.Value or ND_EXPERIMENT_STOCKPROP.Value since these also can be made to refer to more than one version of the reference genome.

Locgroup (Location Group)

The Locgroup column is used to manifest redundant, derivable extra locations for a feature. The default Locgroup of 0 is used for the direct location of a feature.

This column may not be NULL. The default value of this column is 0.

Important

Most Chado users may never use FEATURELOC rows with a Locgroup value greater than 0.

Transitively derived locations are indicated with Locgroup values greater than 0. For example the position of an exon on a BAC when represented in global chromosome coordinates would have a Locgroup value > 0.

The Locgroup column is used to differentiate the transitively derived groupings of locations. The default Locgroup 0 is used for the main or primary location, from which the others are derived via coordinate transformations.

Another example of redundant locations is storage of ORF coordinates relative to both transcript and genome.

Redundant locations open the possibility of the database getting into inconsistent states; this schema gives us the flexibility of both warehouse instantiations with redundant locations (easier for querying) and management instantiations with no redundant locations.

Rank

The Rank column differentiates between multiple locations when features have multiple locations and the semantics of the locations are different. This can best be explained by way of example: Some features (e.g. blast hits and HSPs) have two locations - one on the query and one on the subject. Rank is used to differentiate these. A Rank value of 0 is always used for the query, a Rank value of 1 for the subject. For multiple alignments, assignment of rank is arbitrary. Rank is also used for sequence_variant features, such as SNPs. In this usage a Rank value of 0 indicates the wildtype (or baseline) feature, a Rank value of 1 indicates the mutant (or compared) feature.

The use of FEATURELOC.Rank in Babase

Features on the reference genome located on the reference genome have a Rank value of 0. Features of individual baboons which are located on the reference genome have a Rank value of 1.

This column may not be NULL. The default value of this column is 0. In the typical case where a feature has a single location the Rank value is always 0, the default.

FEATUREPROP (Feature Properties)

The FEATUREPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the FEATURE table. FEATUREPROP contains one row for every (non-NULL) additional column value, per FEATUREPROP row, per Rank value.

The combination of Feature_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per feature per value type.

FEATUREPROP rows are automatically deleted when the related FEATURE row is deleted. FEATUREPROP rows are automatically deleted when the related CVTERM row is deleted.

Featureprop_Id

A unique number identifying the FEATUREPROP row. This column may not be NULL.

Feature_Id

The feature having the property -- the Feature_Id value of the related FEATURE row. This column may not be NULL.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Types are expected to come from the Sequence Ontology's feature property ontology but this is not enforced.

Value

The feature property value -- as text.

Rank

A number which gives the value an ordinal position among the other FEATUREPROP values, per the row's Type_Id value, related to the FEATURE row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

GENOTYPE (Genotypes)

The GENOTYPE table contains one row per genotype of interest.

Genetic context. A genotype is defined by a collection of features, mutations, balancers, deficiencies, haplotype blocks, or engineered constructs.

Note

Because Babase's focus is on individual baboons genotypes are not particularly of interest. They are, however, the way Chado links natural diversity experimental result sets (ND_EXPERIMENT rows) to features and so are typically part of the join path from experiment to feature.

In Babase the GENOTYPE table is not used. Instead the ND_EXPERIMENT_FEATURE table, a Babase extension, is used to relate ND_EXPERIMENT to FEATURE.

GENOTYPE rows are automatically deleted when the related CVTERM row is deleted.

Genotype_Id (Genotype Identifier)

A unique number identifying the genotype. This column may not be NULL.

Name

A human readable name for the genotype.

Uniquename

A unique name for the genotype. In Chado this is typically derived from the features making up the genotype. This column may not be NULL.

Description

A textual description of the genotype.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

ND_EXPERIMENT (Natural Diversity Experiment Result Set)

ND_EXPERIMENT contains one row for every set of experimental results. A set of experimental results is typically not all results of what would normally be considered an experiment. Rather the ND_EXPERIMENT row represents disjoint portions of the entire collection of results partitioned according to one of the following schemes:

Partitioning of Experimental Results into ND_EXPERIMENT result sets

  • Per genotype analyzed. Depending on how (and whether) the genotype is related to genomic features there can be a single ND_EXPERIMENT row per genomic feature or multiple ND_EXPERIMENT rows per genomic feature. Regardless, in this partitioning of experimental results there is always a 1-to-1 relationship (per any given experiment) between the ND_EXPERIMENT and the GENOTYPE tables.

    Using this partitioning scheme per genotype results are stored in ND_EXPERIMENTPROP rows. (Should the genotype be uninteresting and there therefore is a 1-to-1 relationship between genotype and genomic features this system amounts to storing per genomic feature results in ND_EXPERIMENTPROP rows.) Per stock results are stored in ND_EXPERIMENT_STOCKPROP rows.

  • Per stock analyzed. In this partitioning of experimental results there is always a 1-to-1 relationship (per any given experiment) between the STOCK and the ND_EXPERIMENT tables.

    Using this partitioning scheme per stock results could be stored in ND_EXPERIMENTPROP rather than ND_EXPERIMENT_STOCKPROP.

    Given the current Chado schema[6] the per stock analyzed partitioning scheme is a degenerate case of the per genotype analyzed partitioning scheme because when partitioning experimental results in this fashion the schema design presently has no place[7] to store per genotype (per any given experiment) results.

The use of ND_EXPERIMENT within Babase

Babase uses the per genotype analyzed experimental result partitioning scheme.

ND_EXPERIMENT rows are automatically deleted whenever any related ND_GEOLOCATION row is deleted.

ND_Experiment_Id (Natural Diversity Experiment Result Set Identifier)

A unique number identifying the natural diversity experiment result set. This column may not be NULL.

ND_Geolocation_Id (Natural Diversity Geolocation Identifier)

Identifier of the ND_GEOLOCATION row which geolocates the experimental result set. This column may not be NULL.

The use of ND_EXPERIMENT.ND_Geolocation_Id within Babase

This value is hugely uninteresting to Babase.

Type_Id

The type of experimental results the row holds. This column may not be NULL.

The use of ND_EXPERIMENT.Type_Id within Babase

In Babase it is usually more useful to use ANALYSIS_ND_EXPERIMENT.Type_Id to classify the experimental results. The value of this column tends to be uninteresting.

ND_EXPERIMENT_GENOTYPE (Natural Diversity Experiment Result Set to Genotype Relationships)

The ND_EXPERIMENT_GENOTYPE table contains one row per every relationship between a natural diversity result set and a genotype -- that is, one row per pairing of ND_EXPERIMENT row with a GENOTYPE row.

ND_EXPERIMENT_GENOTYPE relates genotype to natural diversity experiment result sets in, potentially, a many-to-many fashion. The table contains one row for every pairing of genotype to natural diversity experiment result set.

Tip

Using ND_EXPERIMENT_GENOTYPE it is possible to construct many-to-many relationships between GENOTYPE rows and ND_EXPERIMENT rows, but care should be taken since it is unlikely that such many-to-many relationships are desirable, at least not per analysis[8]. It is probably more appropriate to, per analysis, construct one-to-many relationships between GENOTYPE and ND_EXPERIMENT rows, or the reverse, to construct many-to-one relationships between GENOTYPE and ND_EXPERIMENT rows. In some cases it is appropriate to construct one-to-one relationships between GENOTYPE and ND_EXPERIMENT rows.

The use of ND_EXPERIMENT_GENOTYPE within Babase

Babase uses it's Chado extension, ND_EXPERIMENT_FEATURE, instead of ND_EXPERIMENT_GENOTYPE. ND_EXPERIMENT_GENOTYPE is not used.

The combination of ND_Experiment_Id and Genotype_Id must be unique.

ND_EXPERIMENT_GENOTYPE rows are automatically deleted whenever any related ND_EXPERIMENT row is deleted. ND_EXPERIMENT_GENOTYPE rows are automatically deleted whenever any related GENOTYPE row is deleted. ND_EXPERIMENT_GENOTYPE rows are automatically deleted whenever any related CVTERM row is deleted.

ND_Experiment_Genotype_Id

A unique number identifying the pairing of GENOTYPE row to ND_EXPERIMENT row. This column may not be NULL.

Genotype_Id

The genotype, typically the input material, related to the natural diversity experiment result set. An GENOTYPE.Genotype_Id value. This column may not be NULL.

ND_Experiment_Id

The natural diversity experiment result set related to the genotype. A ND_EXPERIMENT.ND_Experiment_Id value. This column may not be NULL.

ND_EXPERIMENT_STOCK (Natural Diversity Experiment Result Set to Stock Relationships)

The ND_EXPERIMENT_STOCK table contains one row per every relationship between a natural diversity result set and a stock -- that is, one row per pairing of ND_EXPERIMENT row with a STOCK row.

ND_EXPERIMENT_STOCK relates stock to natural diversity experiment result sets in, potentially, a many-to-many fashion. The table contains one row for every pairing of stock to natural diversity experiment result set.

Tip

Using ND_EXPERIMENT_STOCK it is possible to construct many-to-many relationships between STOCK rows and ND_EXPERIMENT rows, but care should be taken since it is unlikely that such many-to-many relationships are desirable, at least not per analysis[9]. It is probably more appropriate to, per analysis, construct one-to-many relationships between STOCK and ND_EXPERIMENT rows, or the reverse, to construct many-to-one relationships between STOCK and ND_EXPERIMENT rows.

The use of ND_EXPERIMENT_STOCK within Babase

When doing analyses in Babase, whether of SNV sites or functional genomic analysis, an ND_EXPERIMENT row represents some set of results of an analysis regarding a genomic feature. Therefore the typical Babase use-case has multiple ND_EXPERIMENT rows per STOCK row per analysis.

ND_EXPERIMENT_STOCK rows are automatically deleted whenever any related ND_EXPERIMENT row is deleted. ND_EXPERIMENT_STOCK rows are automatically deleted whenever any related STOCK row is deleted. ND_EXPERIMENT_STOCK rows are automatically deleted whenever any related CVTERM row is deleted.

ND_Experiment_Stock_Id

A unique number identifying the pairing of STOCK row to ND_EXPERIMENT row. This column may not be NULL.

Stock_Id

The stock, typically the input material, related to the natural diversity experiment result set. An STOCK.Stock_Id value. This column may not be NULL.

ND_Experiment_Id

The natural diversity experiment result set related to the stock. A ND_EXPERIMENT.ND_Experiment_Id value. This column may not be NULL.

Type_Id

The type of relationship between the stock and the natural diversity experiment result set -- a CVTERM.CVTerm_Id value. This column may not be NULL.

ND_EXPERIMENT_STOCKPROP (Natural Diversity Experiment Result Set to Stock Relationship Properties)

The ND_EXPERIMENT_STOCKPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the ND_EXPERIMENT_STOCK table. ND_EXPERIMENT_STOCKPROP contains one row for every (non-NULL) additional column value, per ND_EXPERIMENT_STOCK row, per Rank value.

The use of ND_EXPERIMENT_STOCKPROP within Babase

In some cases this table may contain information on treatment of the stock. In Babase at least some of the rows hold values that represent experimental results.

The combination of ND_Experiment_Stock_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per natural diversity experiment result set per value type.

ND_EXPERIMENT_STOCKPROP rows are automatically deleted when the related ND_EXPERIMENT_STOCK row is deleted. ND_EXPERIMENT_STOCKPROP rows are automatically deleted when the related CVTERM row is deleted.

ND_Experiment_Stockprop_Id

A unique number identifying the ND_EXPERIMENT_STOCKPROP row. This column may not be NULL.

ND_Experiment_Stock_Id

The natural diversity result set to stock relationship having the property -- the ND_Experiment_Stock_Id value of the related ND_EXPERIMENT_STOCK row. This column may not be NULL.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The natural diversity experimental result set to stock relationship property value -- as text.

Rank

A number which gives the value an ordinal position among the other ND_EXPERIMENT_STOCKPROP values, per the row's Type_Id value, related to the ND_EXPERIMENT_STOCK row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

ND_EXPERIMENTPROP (Natural Diversity Experiment Result Set Properties)

The ND_EXPERIMENTPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the ND_EXPERIMENT table. ND_EXPERIMENTPROP contains one row for every (non-NULL) additional column value, per ND_EXPERIMENTPROP row, per Rank value.

The combination of ND_Experiment_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per natural diversity experiment result set per value type.

ND_EXPERIMENTPROP rows are automatically deleted when the related ND_EXPERIMENT row is deleted. ND_EXPERIMENTPROP rows are automatically deleted when the related CVTERM row is deleted.

ND_Experimentprop_Id

A unique number identifying the ND_EXPERIMENTPROP row. This column may not be NULL.

ND_Experiment_Id

The natural diversity result set having the property -- the ND_Experiment_Id value of the related ND_EXPERIMENT row. This column may not be NULL.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The natural diversity result set property value -- as text.

Rank

A number which gives the value an ordinal position among the other ND_EXPERIMENTPROP values, per the row's Type_Id value, related to the ND_EXPERIMENT row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

ND_GEOLOCATION (Natural Diversity Geolocation)

The ND_GEOLOCATION table contains one row for every location of a natural diversity experiment.

The use of ND_GEOLOCATION within Babase

This table is not of interest to Babase.

ND_Geolocation_Id (Natural Diversity Geolocation Identifier)

A unique number identifying the row. This column may not be NULL.

Description

A textual representation of the location, if this is the original georeference.

Latitude

The decimal latitude coordinate of the georeference, using positive and negative sign to indicate N and S, respectively.'

Longitude

The decimal longitude coordinate of the georeference, using positive and negative sign to indicate E and W, respectively.

Geodetic

The geodetic system on which the geo-reference coordinates are based.

Altitude

The altitude (elevation) of the location in meters.

ORGANISM

The ORGANISM table contains one row per organismal taxonomic classification.

Note that phylogenies are represented using the phylogeny module, and taxonomies can be represented using the cvterm module or the phylogeny module.

The combination of Genus and Species must be unique. When mapping from the NCBI taxonomy names.dmp file, this column must be used where it is present, as the common_name column is not always unique (e.g. environmental samples). If a particular strain or subspecies is to be represented, this is appended onto the species name. Follows standard NCBI taxonomy pattern.

Organism_Id (Organism Identifier)

A unique number representing the organism. This column may not be NULL.

Abbreviation

The abbreviated name.

Genus

The genus portion of the organism's taxonomic classification. This column may not be NULL.

Species

The species portion of the organism's taxonomic classification with additional information appended to provide further specification of sub-species, strain, environment, individual and so forth. This column may not be NULL.

Common_Name

The common name of the organism.

Should Babase decide to represent individuals as organisms then this column will contain the individual's Sname.

Comment

Additional remarks regards the organism.

ORGANISM_DBXREF (Organism to External Database Object Cross-References)

ORGANISM_DBXREF contains one row per external database object related to an organism.

The use of ORGANISM_DBXREF within Babase

In the case of Babase, ORGANISM_DBXREF is used to equate individual baboons (rows in the BIOGRAPH table) to organisms. An as yet undetermined mechanism is required to ensure data integrity since this involves storing the same information, i.e. Sname, in more than one place within the database.

The combination of Organism_Id and DBXref_Id must be unique.

ORGANISM_DBXREF rows are automatically deleted when the related ORGANISM row is deleted. ORGANISM_DBXREF rows are automatically deleted when the related DBXREF row is deleted.

Organism_DBXref_Id

A unique number identifying the row. This column may not be NULL.

Organism_Id

Identifier of the ORGANISM row with which the external database object is related. An ORGANISM.Organism_Id value. This column may not be NULL.

DBXref_Id (External Database Object Identifier)

Identifier of the external database object related to the ORGANISM row. A DBXREF.DBXref_Id value. This column may not be NULL.

PUB (Publication or Other Documentation of Provenance)

The PUB table contains one row for each document which somehow records the provenance of something represented in the Chado database. The documents can be publications, personal communications, or anything else of import.

The use of PUB within Babase

Babase is not interested in the rows of this table.

PUB rows are automatically deleted when the related CVTERM row is deleted.

Pub_Id (Publication Identifier)

A unique number representing the publication. This column may not be NULL.

Title

The title of the publication. A descriptive general heading.

Volumetitle (Title of Volume)

The title of the volume, if the publication is part of a series.

Volume

Volume number or other identifier of the volume.

Series_Name

Full name of (journal) series.

Issue

Issue number or other identifier of the issue.

PYear (Publication Year)

Year of the publication.

Pages

Page number range[s], e.g. 457--459, viii + 664pp, lv--lvii.

Miniref

The intended content of this column is not clear.

Uniquename

A unique name for the publication. This column may not be NULL.

Type_Id

The type of the publication -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Is_Obsolete

A Boolean value indicating whether or not the publication is appropriate to reference in future work. The default is FALSE, future reference to the row is appropriate. This column may not be NULL.

Publisher

The publisher of the publication. Note that this column is not validated. Storing this data in another publication related Chado table may result in better data integrity.

Pubplace (Publication Place)

The place of publication. Note that this column is not validated. Storing this data in another publication related Chado table may result in better data integrity.

STOCK (Stocks)

The STOCK table contains one row per physical item, whether stored or living. Generally stocks are physical things preserved over time. Typically, stocks are maintained in collections and used as input to genetic experiments.

STOCK rows are assigned primary and secondary types in the usual Chado fashion. There is a STOCK.Type_Id column to denote the stock's primary type, and a STOCK_CVTERM table to record other types/tags associated with the stock.

STOCK rows can be related to objects in external databases by way of the DBXREF table's external database object representation. STOCK rows may be representations of external database objects, i.e. be related to a primary external database object by way of STOCK.DBXref_Id, and may also be related to multiple secondary external database objects via the STOCK_DBXREF table.

The use of STOCK within Babase

Babase uses STOCK rows to represent tissue samples, DNA extracts, and derivatives and sub-divisions of these; including the library components that are mixed to produce the libraries which are inputs to the high-throughput DNA sequencing process. When a STOCK row represents a tissue sample or a DNA extract it's STOCK.DBXref_Id contains the appropriate value which denotes this.

The STOCK.Uniquename value is unique per STOCK.Organism_Id per STOCK.Type_Id -- to uniquely identify a stock all three values must be specified.

STOCK rows are automatically deleted when the related DBXREF row is deleted. STOCK rows are automatically deleted when the related ORGANISM row is deleted. STOCK rows are automatically deleted when the related CVTERM row is deleted.

Stock_Id (Stock Identifier)

A unique number identifying the stock. This column may not be NULL.

DBXref_Id (External Database Object Cross-Reference Identifier)

The primary external database object represented by the row -- the related DBXREF.DBXref_Id value.

Organism_Id (Organism Identifier)

The organism which is the source of the stock -- the Organism_Id value of the related ORGANISM row. This value should be NULL only when the organism cannot be determined.

Name

The human-readable name of the stock.

Uniquename

A name that is unique per organism per type. See above. This column may not be NULL.

Description

A textual description of the stock.

Type_Id (Type)

The kind of stock that the row holds -- a CVTERM.CVTerm_Id value. Typical types would be living stock, genomic DNA, or preserved specimen. This column may not be NULL.

Is_Obsolete

A Boolean value indicating whether or not the stock is no longer in use. This column may not be NULL.

Consider recording the date the stock became obsolete in the STOCKPROP table.

STOCK_DBXREF (Stock to External Database Object Cross-References)

STOCK_DBXREF contains one row per external database object related to an stock.

The use of STOCK_DBXREF within Babase

In the case of Babase, STOCK_DBXREF could used to relate stock to other parts of Babase in undetermined ways. A possible use is to relate stock to documents which describe their preparation etc.

The combination of Stock_Id and DBXref_Id must be unique.

STOCK_DBXREF rows are automatically deleted when the related STOCK row is deleted. STOCK_DBXREF rows are automatically deleted when the related DBXREF row is deleted.

Stock_DBXref_Id

A unique number identifying the row. This column may not be NULL.

Stock_Id (Stock Identifier)

Identifier of the STOCK row with which the external database object is related. An STOCK.Stock_Id value. This column may not be NULL.

DBXref_Id (External Database Object Identifier)

Identifier of the external database object related to the STOCK row. A DBXREF.DBXref_Id value. This column may not be NULL.

Is_Current

A Boolean. When TRUE the relationship between the stock and the external database object is in current use. When FALSE the relationship is is outdated. This column may not be NULL. The default value for this column is TRUE.

STOCK_DBXREFPROP (Stock External Database Entity Properties)

The STOCK_DBXREFPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the STOCK_DBXREF table. STOCK_DBXREFPROP contains one row for every (non-NULL) additional column value, per STOCK_DBXREF row, per Rank value.

The combination of Stock_DBXref_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per stock_dbxref per value type.

STOCK_DBXREFPROP rows are automatically deleted when the related STOCK_DBXREF row is deleted. STOCK_DBXREFPROP rows are automatically deleted when the related CVTERM row is deleted.

Stock_DBXrefprop_Id

A unique number identifying the STOCK_DBXREFPROP row. This column may not be NULL.

Stock_DBXref_Id

The external database entry/stock combination having the property -- the Stock_DBXref_Id value of the related STOCK_DBXREF row. This column may not be NULL.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The external database entry/stock combination's property value -- as text.

Rank

A number which gives the value an ordinal position among the other STOCK_DBXREFPROP values, per the row's Type_Id value, related to the STOCK_DBXREF row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

STOCK_CVTERM (Stock Typing/Tagging)

STOCK_CVTERM contains one row per stock, per publication, per each ancillary type or tag used to classify the stock. As usual in Chado, the CVTERM table is used to supply the vocabulary (or vocabularies) used in classification.

The use of STOCK_CVTERM within Babase

Babase is uninterested in publication. Every row has an identical Pub_Id value which may be safely ignored.

The combination of Stock_id, CVTerm_Id, Pub_Id, and Rank must be unique.

STOCK_CVTERM rows are automatically deleted when the related STOCK row is deleted. STOCK_CVTERM rows are automatically deleted when the related PUB row is deleted. STOCK_CVTERM rows are automatically deleted when the related CVTERM row is deleted.

Stock_CVTerm_Id

A unique number identifying the row. This column may not be NULL.

Stock_Id

The stock assigned an ancillary type or tag. An STOCK.Stock_Id value. This column may not be NULL.

CVTerm_Id (Controlled Vocabulary Term Identifier)

The ancillary type or tag applied to the stock. A CVTERM.CVTerm_Id value. This column may not be NULL.

Pub_Id (Publication Identifier)

The publication associated with the typing or tagging of the stock. A PUB.Pub_Id value. This column may not be NULL.

Is_Not

A Boolean value which serves to negate the type or tag (to negate the controlled vocabulary term). This column may not be NULL.

Rank

A number which gives the value an ordinal position among the other STOCK_CVTERM values, per the row's Pub_Id value, related to the STOCK row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

STOCK_RELATIONSHIP (Stock Inter-Relationships)

STOCK_RELATIONSHIP contains one row per subject/verb/object or parent/child relationship between 2 stocks. This table allows organization of stock into acyclic directed graphs to record things like which stock was divided into which sub-stocks, which stock derived from which other stocks, and so forth. It is one of the many relationship tables in Chado.

The combination of Subject_Id, Object_Id, Type_Id, and Rank must be unique.

STOCK_RELATIONSHIP rows are automatically deleted when either the related Subject_Id STOCK row is deleted or when the related Object_Id STOCK row is deleted. STOCK_RELATIONSHIP rows are automatically deleted when the related CVTERM row is deleted.

Stock_Relationship_Id

A unique number identifying the row. This column may not be NULL.

Subject_Id

The subject of the subject-predicate-object sentence. A STOCK.Stock_Id value. This column may not be NULL.

Object_Id

The object of the subject-predicate-object sentence. A STOCK.Stock_Id value. This column may not be NULL.

Type_Id

The relationship between the subject stock and the object stock. A CVTERM.CVTerm_Id value that has a non-0 Is_Relationshiptype value. This column may not be NULL.

Value (Additional Notes)

Additional textual notes on the relationship between the analyses.

Rank

A number which gives the value an ordinal position among the other STOCK_RELATIONSHIP values, per Subject_Id/Object_Id pairing, related to the CVTERM row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

STOCKPROP (Stock Properties)

The STOCKPROP table is a Chado property table, it allows for the (effective) addition of arbitrary columns to the STOCK table. STOCKPROP contains one row for every (non-NULL) additional column value, per STOCK row, per Rank value.

The combination of Stock_Id, Type_Id, and Rank must be unique. Note that this arrangement allows for duplicate values per stock per value type.

STOCKPROP rows are automatically deleted when the related STOCK row is deleted. STOCKPROP rows are automatically deleted when the related CVTERM row is deleted.

Stockprop_Id

A unique number identifying the STOCKPROP row. This column may not be NULL.

Stock_Id

The stock having the property -- the Stock_Id value of the related STOCK row. This column may not be NULL.

Type_Id

The kind of value that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Value

The stock's property value -- as text.

Rank

A number which gives the value an ordinal position among the other STOCKPROP values, per the row's Type_Id value, related to the STOCK row. This column may not be NULL.

The default Rank value is 0.

For more on Rank values see The Chado Rank Column.

SYNONYM (Alternate Names)

The SYNONYM table contains one row for every alternate name of a feature. One feature can have multiple synonyms, and the same synonym can apply to multiple features.

The combination of Name and Type_Id must be unique.

SYNONYM rows are automatically deleted when related CVTERM row is deleted.

Caution

There is no validation ensuring correspondence between Name and Synonym-SGML.

Synonym_Id (Synonym Identifier)

A unique number which identifies the synonym. This column may not be NULL.

Name (Synonym)

The synonym. Human readable ASCII text. This column may not be NULL.

Type_Id (Type)

The kind of synonym that the row holds -- a CVTERM.CVTerm_Id value. This column may not be NULL.

Current types equate to symbol and fullname.

Synonym_SGML

The synonym with non-ASCII characters encoded as SGML entities.



[3] The version should not really go on this table but should be in a separate DB_VERSION table that is linked to instead of linking to the DB table (and which, in turn, links to the DB table).

[4] Where an analysis is an analysis of an experiment; in effect, where an analysis is an actual experiment.

[5] There are no validation rules in place to ensure this.

[6] Chado version 1.21.

[7] A ND_EXPERIMENT_GENOTYPEPROP table perhaps?

[8] Where an analysis is an analysis of an experiment; in effect, where an analysis is an actual experiment.

[9] Where an analysis is an analysis of an experiment; in effect, where an analysis is an actual experiment.


Page generated: 2021-09-17T11:17:05-04:00.