Name

chado-stock-load — load stock information into Babase/Chado

Synopsis

chado-stock-load [-h] [--help] [-v] [--version]
[--dbname=dbname] [--host=hostname] [--port=port] [--username=username]
[--dry-run]
--organism-id=oid [--pub-id=pubid] --primary-db=name
[--allow-duplicates]
[pathname]

Description

chado-stock-load takes information in a tab delimited file, pathname, and loads the information into the STOCK table and related tables if the information is not already in the database. If pathname is omitted then data is taken from standard input. chado-stock-load outputs a list of the STOCK.Stock_Id values that are in the supplied file.

The input file is tab-delimited. The first two lines contain column headings. The first line of heading describes how the data in the column should be stored in the database. The second line of heading is ignored. The 3rd and subsequent lines contain stock information, one line per stock. The first 3 columns are reserved for specific data and must be present, the remaining columns are optional and their column headings determine how they are stored in the database.

The STOCK rows loaded into the database have as their primary (STOCK.DBXref_Id) DBXREF the individual from which the stock came. They always have a secondary (a related STOCK_DBXREF row) to relate the stock back to a physical sample.

The output of chado-load-stock is suitable for use as the --stock-map value of the chado-funcgen-load command.

All database modifications take place within a single transaction. Upon error the transaction is rolled back and the database returned to it's original state.

Unless otherwise noted all matching is done in a case-sensitive fashion.

Leading and trailing spaces are silently removed from all data inserted into or compared with values in the STOCKPROP.Value column.

Arguments

Database Options

--dbname=dbname

The (optional) name of the database to which to connect or a conninfo connection string.[19] Defaults to the user's Unix username.

--host=hostname

The (optional) host name of the machine on which the server is running. If the value begins with a slash it is used as the directory for the Unix-domain socket.

--port=port

The (optional) TCP port or the local Unix-domain socket file extension on which the server is listening for connections.

--username=username

(Optional) Username to use to connect to the database. Defaults to the user's Unix username.

--dry-run

When the (optional) --dry-run option is used a trial run is performed that does not permanently insert any data into the database, the transaction is rolled back when the program completes. --dry-run allows an upload to be tested[20] while minimally affecting database content.[21]

Data Options

--organism-id=oid

The ORGANISM.Organism_Id value to place in STOCK.Organism_Id.

--pub-id=pubid

The (optional) PUB.Pub_Id value to place in the STOCK_CVTERM.CVTerm_Id column of new STOCK_CVTERM rows.

This option is only optional if the input file contains no data which will create rows in STOCK_CVTERM.

--primary-db=name

The DB.Name of the database that contains the individuals from which the stock were taken. This is the the DB related to the STOCK.DBXref_Id values.

Note

In Babase the name will almost always be unique_baboons.

Processing Options

--allow-duplicates

(Optional) When omitted and the first two columns designate a DBXREF which is already related to a STOCK row via STOCK.DBXref_Id, then all remaining columns must contain values which match with the values in the database which are already related to the existing STOCK row. If present, and a STOCK row exists which has a STOCK.DBXref_Id value which is designated by the value of the first 2 columns, then chado-stock-load will create a new row on STOCK if (and only if) the remaining columns fail to match the data related to the STOCK row which already exists.

Matching is done only on the data in the columns which exist in the supplied file. The database may contain additional information related to the stock and that will remain unexamined if there are no columns in the supplied file which contain such additional information.

Caution

It is never advisable to use this option until after running the program without the option and verifying that it is appropriate that the database contain duplicate STOCK rows.

Note

Once this option has been used and there is more than one row in STOCK with matching STOCK.Type_Id values and related matching DBXREF.Accession values chado-stock-load will always report the duplicates and --allow-duplicates will have to be used in future when working with these stocks. In the case where duplicates already exist in STOCK chado-stock-load will report when one of the duplicates is an exact match of the uploaded stock line and when the uploaded stock line matches none of the existing duplicates.

Processing The Input File

Beginning with the third row, each row of the input file represents a single stock.

The first column is required to contain DB.Name values. These values indicate the external database in which the stock is tracked. The first two rows of the first column are ignored.

Note

In Babase the first column will usually contain did to indicate that the stock is from a did.

The second column is required to contain DBXREF.Accession values. These values designate the id number of the stock in the external database. The first two rows of the first column are ignored.

Note

In Babase the second column will usually contain did values.

Note

It is the first two columns that are used to identify pre-existing STOCK rows and cause them to be considered as possible matches for the uploaded line. If the entire line matches what is in the database the existing STOCK.Stock_Id is reported and a new STOCK row is not created. If the first two columns match but the remaining columns do not then the use of the --allow-duplicates option determines what happens. See the documentation above on --allow-duplicates for more information.

The third column is required to contain DBXREF.Accession values which are in the DB designated by the --primary-db name.

Note

In Babase the third column designates the individual baboon from which the sample was taken to create the stock.

Caution

In the general case, the system does not validate the 3rd column, typically the unique id value, against the first two columns. It is possible to associate a stock with the wrong baboon.

However, in the case where the the first column is did the chado-stock-load command itself will check that the value of the 3rd column matches that of the GENETIC_INVENTORY.DNA.Sname column of the GENETIC_INVENTORY.DNA row where the GENETIC_INVENTORY.DNA.DId column's value is the value in column 2. The match is performed as follows: If the value of the 3rd column is the empty string or NULL the values match. If the value of GENETIC_INVENTORY.DNA.Sname is NULL the values match. Otherwise, the two values are compared in a case-sensitive fashion.

The headings of the fourth and subsequent columns, appearing on the firs and 2nd line, have the following structure:

The first line is either column_number, or begins with one of type:, or primarytype:, or prop:.

The file must contain exactly one column having a column heading in the the first line of column_number. The data in this column records which column in the functional genetic data is associated with the stock in the uploaded file. Since all (but the first) column in a file of functional genomic results must represent the analysis of a stock, each row in the uploaded stock data must contain a unique column number. Further, the uploaded column numbers must, when considered as a collection, have 2 as their lowest value and each successive integer must be contained in the collection, without omission, until the maximal value is reached. The stock rows in the uploaded file need not be ordered by the column number.

The file must contain exactly one column having a column heading in the first line with a prefix of primarytype:. When the prefix is primarytype: the following holds true: The column contains CVTERM.Name values and the related CVTERM.CVTerm_Id value is stored in the STOCK.Type_Id column. In the first line of the header primarytype: is followed by the CV.Name of the vocabulary used in the column.

Note

This column is required because the STOCK.Type_Id cannot be NULL. In Babase it is not likely that primarytype: column heading will often read other than primarytype:babase_chado_terms and the column's data values be other than None.

Only one column with a prefix of primarytype: may appear in the input file. If no columns have this prefix then the STOCK.Type_Id is NULL.

When the prefix is type: the following holds true: The column contains CVTERM.Name values and a row is created in STOCK_CVTERM for each data row in the input file. In the first line of the header type: is followed by the CV.Name of the vocabulary used in the column.

When the prefix is prop: the following holds true: A row is created in the STOCKPROP table for each data row in the input file. In the first line of the header the prop: is followed by the CV.Name of the vocabulary used to designate the data values, followed by a colon, followed by the CVTERM.Name used to designate the data value. As rows are inserted into STOCKPROP, the related CVTERM.CVTerm_Id is used as the STOCKPROP.Type_Id and the value appearing in the column becomes the STOCKPROP.Value.

The value NULL has special meaning to chado-stock-load when appearing in a column which contains STOCKPROP.Value values. It indicates that the PostgreSQL NULL value should be stored in the database.

Note

There is no way to prevent the creation of either a STOCK_CVTERM row or STOCKPROP row when a STOCK row is created, if there are columns in the input file beyond the first 3. In other words, there is no special value that can be used in the input file to prevent the creation of a STOCK_CVTERM or STOCKPROP row.

The details of the creation follows:

Environment Variables

CHADO_STOCK_LOAD

Options which are passed to chado-stock-load automatically. This can be used to establish per-user or sitewide defaults.

PGDATABASE, PGHOST, PGPORT, PGUSER

Default PostgreSQL connection parameters. For further information see the PostgreSQL documentation.

FILES

~/.pgpass

The PostgreSQL per-user password file is consulted to obtain passwords. For further information see the PostgreSQL documentation.



[19] A sample conninfo connection string might look like user='scott', host='localhost', dbname='babase_test'.

[20] With the notable exclusion of the testing of constraints and triggers that are deferred until transaction commit.

[21] Automatically generated ids and other values produced by PostgreSQL sequence generators are consumed when --dry-run is used.

[22] chado-stock-load could increment STOCKPROP.Rank for those cases where the input file contains more than one column which creates properties and uses the same CV and CVTERM. At the time of this writing there are no such input files.


Page generated: 2021-09-17T11:17:06-04:00.