The Chado Approach

Chado is not your grandmother's database. It is designed to be highly flexible and is very ontology-focused. This means that where, in other databases, you might expect to see a table with a specific column, e.g., a hair color column in a phenotype table, you instead see a hair color row in a phenotype property table. The row substitutes for the hardcoded column. The row is identified as one that describes hair color by the presence of a hair color ontology id in a type column in the phenotype property table. (In Chado parlance the column name is often some variant of cvterm rather than being named type.)

A different phenotype property, say, eye color, would be identified as such by an eye color ontology id in the same type column of the phenotype property table. The eye color itself would be stored in the value column of that row. Looking at this from another angle, the hair color, eye color, and other phenotype properties all appear in the value column of the phenotype property table. Exactly what the value in any given row represents depends on the ontology term used in that row.

This approach to database design allows some phenotypes to have information on hair color and others not, but more importantly it allows each phenotype to have, effectively, an arbitrary and varying number of columns. The meaning of these virtual columns is defined by the ontologies in use.
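
To make the row-instead-of-column idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The table names (ontology_term, phenotype_property), the ids, and the helper properties_of are all hypothetical and much simpler than the real Chado schema; the point is only that each phenotype ends up with its own set of virtual columns.

```python
import sqlite3

# In-memory database; nothing here touches a real Chado instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for an ontology term table.
    CREATE TABLE ontology_term (
        term_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    -- Hypothetical property table: one row per (phenotype, property).
    CREATE TABLE phenotype_property (
        phenotype_id INTEGER NOT NULL,
        type_id      INTEGER NOT NULL REFERENCES ontology_term(term_id),
        value        TEXT
    );
""")

conn.executemany(
    "INSERT INTO ontology_term (term_id, name) VALUES (?, ?)",
    [(1, "hair color"), (2, "eye color")],
)
# Phenotype 100 has both properties; phenotype 101 has only an eye color.
conn.executemany(
    "INSERT INTO phenotype_property (phenotype_id, type_id, value) VALUES (?, ?, ?)",
    [(100, 1, "brown"), (100, 2, "green"), (101, 2, "blue")],
)


def properties_of(phenotype_id):
    """Return the virtual columns of one phenotype as a name -> value dict."""
    rows = conn.execute(
        """
        SELECT t.name, p.value
        FROM phenotype_property AS p
        JOIN ontology_term AS t ON t.term_id = p.type_id
        WHERE p.phenotype_id = ?
        """,
        (phenotype_id,),
    ).fetchall()
    return dict(rows)


props_100 = properties_of(100)
props_101 = properties_of(101)
```

Adding a new kind of property in this sketch means adding a term row and a property row; no table definition changes.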

Note that Chado property tables all have Type_Id and Rank columns. The Type_Id value declares what type of value the row contains. In the phenotype example above, one Type_Id value would indicate that the value is a hair color. Another Type_Id value would indicate that the value is an eye color. The Rank value orders the values when a virtual column holds more than one value for the same Type_Id. A single-valued virtual column uses the default Rank of 0. When there are multiple values, Rank gives each value's ordinal position, counting from zero, within the designated set of values, in this case the set for one phenotype and one Type_Id.
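
The role of the two columns can be sketched as follows, again with Python's sqlite3 module and a hypothetical, simplified phenotype_property table. The type ids, the helper values_for, and the uniqueness constraint on (phenotype_id, type_id, rank) are assumptions of this sketch, as is the convention that ranks count from zero; check the property table you are actually working with.

```python
import sqlite3

HAIR_COLOR = 1  # hypothetical ontology term ids
EYE_COLOR = 2

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phenotype_property (
        phenotype_id INTEGER NOT NULL,
        type_id      INTEGER NOT NULL,
        value        TEXT,
        rank         INTEGER NOT NULL DEFAULT 0,
        -- Assumed constraint: one value per phenotype, type, and rank.
        UNIQUE (phenotype_id, type_id, rank)
    )
""")

# Rows are deliberately inserted out of order; rank, not insertion
# order, decides the position of each value.
conn.executemany(
    "INSERT INTO phenotype_property (phenotype_id, type_id, value, rank) "
    "VALUES (?, ?, ?, ?)",
    [
        (100, EYE_COLOR, "hazel", 1),   # second eye color value
        (100, EYE_COLOR, "green", 0),   # first eye color value
        (100, HAIR_COLOR, "brown", 0),  # single-valued: default rank 0
    ],
)


def values_for(phenotype_id, type_id):
    """Return all values of one virtual column, ordered by rank."""
    rows = conn.execute(
        "SELECT value FROM phenotype_property "
        "WHERE phenotype_id = ? AND type_id = ? ORDER BY rank",
        (phenotype_id, type_id),
    ).fetchall()
    return [value for (value,) in rows]


eye_colors = values_for(100, EYE_COLOR)
hair_colors = values_for(100, HAIR_COLOR)

# A second hair color at the same rank violates the assumed constraint.
try:
    conn.execute(
        "INSERT INTO phenotype_property (phenotype_id, type_id, value, rank) "
        "VALUES (?, ?, ?, ?)",
        (100, HAIR_COLOR, "black", 0),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```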

It is worth pointing out that database views[1] can reorganize the presentation of the data such that these virtual columns become, in effect, actual columns.
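
One way such a view might look is sketched below, using Python's sqlite3 module for convenience. The view name phenotype_wide, the simplified phenotype_property table, and the type ids are hypothetical. The view pivots property rows into columns with conditional aggregation and, as a simplifying assumption, keeps only the value at rank 0 for each property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phenotype_property (
        phenotype_id INTEGER NOT NULL,
        type_id      INTEGER NOT NULL,
        value        TEXT,
        rank         INTEGER NOT NULL DEFAULT 0
    );

    -- Hypothetical type ids: 1 = hair color, 2 = eye color.
    INSERT INTO phenotype_property (phenotype_id, type_id, value) VALUES
        (100, 1, 'brown'),
        (100, 2, 'green'),
        (101, 2, 'blue');

    -- Each CASE picks out the rows of one property type; MAX collapses
    -- them to a single value per phenotype (NULL when there is none).
    CREATE VIEW phenotype_wide AS
    SELECT phenotype_id,
           MAX(CASE WHEN type_id = 1 THEN value END) AS hair_color,
           MAX(CASE WHEN type_id = 2 THEN value END) AS eye_color
    FROM phenotype_property
    WHERE rank = 0
    GROUP BY phenotype_id;
""")

wide_rows = conn.execute(
    "SELECT phenotype_id, hair_color, eye_color "
    "FROM phenotype_wide ORDER BY phenotype_id"
).fetchall()
```

Queries against the view can then treat hair_color and eye_color like ordinary columns, with a NULL where a phenotype has no row for that property.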

For more information on this approach to database design see the Wikipedia entry on the Entity-Attribute-Value model of database design.

[1] A feature of PostgreSQL and other databases.

Page generated: 2021-09-17T11:17:04-04:00.