Differences between revisions 42 and 59 (spanning 17 versions)
Revision 42 as of 2008-03-08 03:01:15
Size: 34119
Editor: KarlPinc
Comment: The simplest select statement
Revision 59 as of 2021-09-09 19:49:10
Size: 54836
Editor: JakeGordon
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
[[TableOfContents]] <<TableOfContents>>
Line 3: Line 4:
SQL (Structured Query Language) is the standard programming language for creating and retrieving information that is stored in a relational database. The user interface or front-end (what you see online) is called phpPgAdmin. The following pages are meant as a basic introduction to SQL, phpPgAdmin, and query writing. Please see BaBase documentation for specifics about the database. There are many resources online for more detailed information about SQL. There are links to SQL tutorials and documentation through the Babase Wiki  SQL (Structured Query Language) is the standard programming language for creating and retrieving information that is stored in a relational database. The user interface or front-end (what you see online) is called phpPgAdmin. The following pages are meant as a basic introduction to SQL, phpPgAdmin, and query writing. Please see BaBase documentation for specifics about the database. There are many resources online for more detailed information about SQL. There are links to SQL tutorials and documentation through the Babase Wiki
Line 6: Line 7:
Line 12: Line 12:
Line 14: Line 13:

||UPPERCASE||||Uppercase indicates SQL language or commands||
||lowercase||||Lowercase indicates user defined items such as tables and columns specific to BaBase.||
||UPPERCASE ||||<style="text-align:center">Uppercase indicates SQL language or commands ||
||lowercase ||||<style="text-align:center">Lowercase indicates user defined items such as tables and columns specific to BaBase. ||


Line 38: Line 39:
 * When you create and execute a SELECT statement you are ‘querying’ the database.  
 * A SELECT statement is composed of several distinct keywords known as clauses. You define a SELECT statement using various configurations of the clauses to retrieve the information you require. Some clauses are required while others are optional. The main clauses in a SELECT statement are INTO, FROM, WHERE, GROUP BY, and ORDER BY.   
 * You may need multiple statements or queries to retrieve the desired information. 
 * When you create and execute a SELECT statement you are ‘querying’ the database.
 * A SELECT statement is composed of several distinct keywords known as clauses. You define a SELECT statement using various configurations of the clauses to retrieve the information you require. Some clauses are required while others are optional. The main clauses in a SELECT statement are INTO, FROM, WHERE, GROUP BY, and ORDER BY.
 * You may need multiple statements or queries to retrieve the desired information.
Line 44: Line 45:
Line 48: Line 48:
SELECT 'Hello World';}}}
SELECT 'Hello World';
}}}
Line 53: Line 53:
SELECT 'The answer is:', (2 * 20) + 2;}}}
SELECT 'The answer is:', (2 * 20) + 2;
}}}
Line 61: Line 61:
SELECT name FROM biograph;}}}
SELECT name FROM biograph;
}}}
Line 68: Line 68:
SELECT name, birth FROM biograph;}}}
SELECT name, birth FROM biograph;
}}}
Line 75: Line 75:
SELECT * FROM biograph;}}}
SELECT * FROM biograph;
}}}
Line 85: Line 85:
SELECT DISTINCT(sname) FROM members; }}}
SELECT DISTINCT(sname) FROM members;
}}}
Line 94: Line 94:

||Symbol ||||Usage ||
||= ||||Equality ||
||&lt;&gt;||||Inequality (not equal to) ||
||!= ||||Inequality (not equal to) ||
||< ||||Less than ||
||> ||||Greater than ||
||<= ||||Less than or equal to ||
||>= ||||Greater than or equal to ||
||Symbol ||||<style="text-align:center">Usage ||
||= ||||<style="text-align:center">Equality ||
||&lt;&gt; ||||<style="text-align:center">Inequality (not equal to) ||
||!= ||||<style="text-align:center">Inequality (not equal to) ||
||&lt;||||<style="text-align:center">Less than ||
||> ||||<style="text-align:center">Greater than ||
||&lt;=||||<style="text-align:center">Less than or equal to ||
||>= ||||<style="text-align:center">Greater than or equal to ||

Line 108: Line 109:
SELECT * FROM biograph 
         WHERE sname=’CHA’;}}}
SELECT * FROM biograph
         WHERE sname=’CHA’;
}}}
Line 114: Line 115:
SELECT * FROM biograph 
         WHERE sex=’F’;}}}
SELECT * FROM biograph
         WHERE sex=’F’;
}}}
Line 120: Line 121:
SELECT * FROM biograph 
         WHERE matgrp <> 9;}}}
SELECT * FROM biograph
         WHERE matgrp <> 9;
}}}
Line 128: Line 129:
SELECT * FROM biograph 
         WHERE birth < ‘2005-01-01’;}}}
SELECT * FROM biograph
         WHERE birth < ‘2005-01-01’;
}}}
Line 134: Line 135:
SELECT * FROM biograph 
         WHERE birth <= ‘2005-01-01’;}}}
SELECT * FROM biograph
         WHERE birth <= ‘2005-01-01’;
}}}
Line 147: Line 148:
SELECT * FROM biograph 
         WHERE matgrp=1.1 AND sex=’M’ AND birth >= ‘2005-01-01’;}}}
SELECT * FROM biograph
         WHERE matgrp=1.1 AND sex=’M’ AND birth >= ‘2005-01-01’;
}}}
Line 155: Line 156:
SELECT * FROM biograph 
         WHERE sname=’ELV’ OR sname=’CHA’ OR sname=’POW’;}}}
SELECT * FROM biograph
         WHERE sname=’ELV’ OR sname=’CHA’ OR sname=’POW’;
}}}
Line 163: Line 164:
SELECT * FROM biograph 
         WHERE NOT matgrp <= 3;}}}
SELECT * FROM biograph
         WHERE NOT matgrp <= 3;
}}}
Line 169: Line 170:
SELECT * FROM biograph 
         WHERE matgrp > 3;}}}
SELECT * FROM biograph
         WHERE matgrp > 3;
}}}
Line 178: Line 179:
SELECT * FROM biograph 
         WHERE name LIKE 'C%';}}}
SELECT * FROM biograph
         WHERE name LIKE 'C%';
}}}
Line 184: Line 185:
SELECT * FROM biograph 
         WHERE pid LIKE 'MET_';}}}
SELECT * FROM biograph
         WHERE pid LIKE 'MET_';
}}}
Line 188: Line 189:
  The exception to this rule is situations in which the pids, or other data fields contain double digits. This query will only return pids 1-9. To include all desired results, use

{{{
SELECT * FROM biograph 
         WHERE pid LIKE 'MET%';}}}


 .
The exception to this rule is situations in which the pids, or other data fields contain double digits. This query will only return pids 1-9. To include all desired results, use

{{{
SELECT * FROM biograph
         WHERE pid LIKE 'MET%';
}}}
Line 199: Line 200:
SELECT DISTINCT(sname) FROM members 
                       WHERE grp=1.1 AND date BETWEEN '2004-01-01' 
                                               AND '2005-01-01';}}}
SELECT DISTINCT(sname) FROM members
                       WHERE grp=1.1 AND date BETWEEN '2004-01-01'
                                               AND '2005-01-01';
}}}
Line 207: Line 208:
SELECT * FROM biograph 
         WHERE sname IN ('CHA', 'ELV', 'POW');}}}
SELECT * FROM biograph
         WHERE sname IN ('CHA', 'ELV', 'POW');
}}}
Line 216: Line 217:
SELECT * FROM biograph 
         WHERE pid IS NULL;}}}
SELECT * FROM biograph
         WHERE pid IS NULL;
}}}
Line 222: Line 223:
SELECT * FROM biograph 
         WHERE sname IS NOT NULL;}}}
SELECT * FROM biograph
         WHERE sname IS NOT NULL;
}}}
Line 233: Line 234:
SELECT * FROM biograph 
         WHERE sex=’M’ ORDER BY sname;}}}
SELECT * FROM biograph
         WHERE sex=’M’ ORDER BY sname;
}}}
Line 241: Line 242:
SELECT * FROM members 
         ORDER BY grp, date;}}}
SELECT * FROM members
         ORDER BY grp, date;
}}}
Line 247: Line 248:
SELECT * FROM members 
         WHERE date between '2005-01-01' and '2005-03-01' 
         ORDER by grp, date;}}}
SELECT * FROM members
         WHERE date between '2005-01-01' and '2005-03-01'
         ORDER by grp, date;
}}}
Line 254: Line 255:
In the queries we have seen so far we have accepted the return of the default column names. The AS clause can be used to create a column alias. A column alias is an alternative name that you specify to control how the column headings are displayed. If a column name is too long or cryptic to you, you may want to change it in your output. Double quotes are required around the new name unless the alias is a single word that contains only letters, digits, or underscores.

 * Change the name of a column in your output.

{{{
SELECT sname AS shortname FROM biograph;}}}
In the queries we have seen so far we have accepted the return of the default column names. The AS clause can be used to create a column alias. A column alias is an alternative name. The new name is used in both the column headings and elsewhere in the query itself. If a column name is too long or cryptic, or if the column is the result of a computation the query may be made more clear by changing the name.

Double quotes are required around the new name unless the alias is a single word that contains only lower case letters, digits, or underscores.

 * Change the name of a column.

{{{
SELECT 'The answer is:' AS "Label", (2 * 20) + 2 AS "Answer";
}}}
{{{
SELECT sname AS "short name" FROM biograph;
}}}
Line 264: Line 270:
SELECT sname AS "short name" FROM biograph;}}}

''These two statements will both change the name of the column sname in the output.''
SELECT sname AS shortname, statdate - birth AS daysalive FROM biograph WHERE statdate - birth < 20 ORDER BY daysalive, shortname;
}}}
''The computed value is given a column name in both the output column heading and the {{{ORDER BY}}} that orders the output, but not in the {{{WHERE}}} clause because the {{{WHERE}}} clause is needed to produce the output.''

When using column aliases, it is strongly-recommended that you use descriptive names that clearly illustrate what the column holds. It's tempting to abbreviate complicated titles with a few letters, if for no other reason than because it's easier to just type a few letters than a long name. For the sake of keeping your data readable (to others and even to yourself, in case you revisit old data a few months later), avoid over-abbreviating column names. See "[[https://papio.biology.duke.edu/babasewiki/SQL_Language#Saving_your_query_result_INTO_a_new_table|Saving your query result INTO a new table]]" for an example of over-abbreviating a table name.

== Writing Human Readable Queries ==
It is important to write your queries so that they can be read by a human. It is no good having a query that the computer understands but a human does not because it is the human that must interpret the query's result. Column aliases are useful in this regard, but the most useful technique is indentation. SQL statements can be continued over multiple lines; they do not end until the final semicolon. Breaking a query into multiple lines, and then using indentation that reflects the structure of the SQL, gives the reader an overview of the query structure at a glance and allows her to focus on and understand each component of the query in the context of the whole.

{{{
SELECT sname AS shortname
            , statdate - birth AS daysalive
  FROM biograph
  WHERE statdate - birth < 20
  ORDER BY daysalive, shortname;
}}}
''Line breaks and indentation makes queries easier to understand.''

Queries may contain comments. Comments begin with {{{--}}} (2 dashes) and continue to the end of the line. Comments can go anywhere; on a line by themselves or at the end of a line containing SQL. Judicious use of comments will help the reader understand the most complex of queries.

{{{
-- This is a comment. It does nothing, has no effect, and may as well not be here.
}}}
''Comments aid understanding, but unless they reveal what's otherwise hidden they are just clutter.''

Formatting and commenting queries ''as part of the writing process'' helps clarify goals, aids present and future understanding of the query, aids debugging, and helps ensure the query really does produce the result expected.

You may have noted that the query above, which returns baboons who have lived less than 20 days, does not consider whether the individuals are alive or dead. The results would include not only those individuals which lived less than 20 days, but also all individuals born within 20 days of the last observation of the group. Queries quickly become complex and hard to read, with subtleties that only the informed can appreciate. The formatting conventions and commenting techniques of the following example may be applied to queries of arbitrary complexity to greatly aid readability.

{{{
-- Show individuals who have lived less than 20 days
SELECT sname AS shortname
            , statdate - birth AS daysalive -- Subtracting two dates gives the number of days between the two
  FROM biograph
  WHERE statdate - birth < 20
                -- Don't show living individuals; do include more than just the confirmed dead
                AND status <> 0 -- Status of 0 means alive
  ORDER BY daysalive, shortname;
}}}
''Comments can document goals, describe obscure SQL, document a query's sub-components and the non-obvious implications of the choices made, explain data values, and so forth.''
Line 271: Line 314:

||Symbol ||||<style="text-align: center;">Usage ||
||+ ||||<style="text-align: center;">Addition ||
||- ||||<style="text-align: center;">Subtraction ||
||* ||||<style="text-align: center;">Multiplication ||
||/ ||||<style="text-align: center;">Division ||
||Symbol ||||<style="text-align:center;">Usage ||
||+ ||||<style="text-align:center;">Addition ||
||- ||||<style="text-align:center;">Subtraction ||
||* ||||<style="text-align:center;">Multiplication ||
||/ ||||<style="text-align:center;">Division ||

Line 282: Line 326:
SELECT (stop - start) AS "time spent" FROM interact;}}}
SELECT (stop - start) AS "time spent" FROM interact;
}}}
Line 289: Line 333:
SELECT sname, (statdate - birth) AS days FROM biograph;}}}
SELECT sname, (statdate - birth) AS days FROM biograph;
}}}
Line 296: Line 340:
Line 300: Line 343:
SELECT COUNT(sname) FROM biograph 
                    WHERE matgrp=1.1;}}}
SELECT COUNT(sname) FROM biograph
                    WHERE matgrp=1.1;
}}}
Line 308: Line 351:
SELECT COUNT(DISTINCT(sname)) FROM members 
                              WHERE grp=1.1;}}}
SELECT COUNT(DISTINCT(sname)) FROM members
                              WHERE grp=1.1;
}}}
Line 312: Line 355:
Line 313: Line 357:
Line 317: Line 360:
SELECT MAX(rank), MIN(rank) FROM ranks 
                            WHERE sname=’ROC’;}}}
SELECT MAX(rank), MIN(rank) FROM ranks
                            WHERE sname=’ROC’;
}}}
Line 325: Line 368:
SELECT MAX(date), MIN(date) FROM members 
                            WHERE grp=1.21;}}}
SELECT MAX(date), MIN(date) FROM members
                            WHERE grp=1.21;
}}}
Line 329: Line 372:
Line 330: Line 374:
Line 334: Line 377:
SELECT AVG(stop-start) FROM interact_data 
                       WHERE act='C' ;}}}
SELECT AVG(stop-start) FROM interact_data
                       WHERE act='C' ;
}}}
Line 338: Line 381:
Line 339: Line 383:
Line 343: Line 386:
SELECT SUM(stop-start) FROM interact_data 
                       WHERE act='C' ;}}}

''This query lets us know the total time for which we have observed consortships from the interact_data table.'' 
SELECT SUM(stop-start) FROM interact_data
                       WHERE act='C' ;
}}}
''This query lets us know the total time for which we have observed consortships from the interact_data table.''
Line 348: Line 392:
Line 352: Line 395:
SELECT FLOOR(1.21);}}}
SELECT FLOOR(1.21);
}}}
Line 359: Line 402:
SELECT CEILING(6.45);}}}

''This will return the value of 7 which is the smallest integer greater than 6.45.''    
SELECT CEILING(6.45);
}}}
''This will return the value of 7 which is the smallest integer greater than 6.45.''
Line 363: Line 407:
Line 369: Line 412:
SELECT matgrp, COUNT(matgrp) FROM biograph 
                             GROUP BY matgrp 
                             ORDER BY matgrp;}}}
SELECT matgrp, COUNT(matgrp) FROM biograph
                             GROUP BY matgrp
                             ORDER BY matgrp;
}}}
Line 378: Line 421:
SELECT matgrp, sex, COUNT(matgrp) FROM biograph 
                                  GROUP BY matgrp, sex 
                                  ORDER BY matgrp;}}}
SELECT matgrp, sex, COUNT(matgrp) FROM biograph
                                  GROUP BY matgrp, sex
                                  ORDER BY matgrp;
}}}
Line 383: Line 426:
Line 384: Line 428:
Line 386: Line 429:
Line 387: Line 431:

All of the queries so far have retrieved rows from a single table. In a relational database we may need information from multiple tables at once. In order to retrieve information from two or more tables simultaneously we need to understand how the columns in the different tables are related to each other. The Babase Pocket Reference outlines how the columns of all of the BaBase tables are connected to each other (entity diagram).  
All of the queries so far have retrieved rows from a single table. In a relational database we may need information from multiple tables at once. In order to retrieve information from two or more tables simultaneously we need to understand how the columns in the different tables are related to each other. The Babase Pocket Reference outlines how the columns of all of the BaBase tables are connected to each other (entity diagram).
Line 390: Line 434:
Line 394: Line 437:

||table.column||
||table.column ||


Line 400: Line 445:

A query that extracts data from more than one table must specify how we want to join the tables. There must be a connection! The tables are always joined row by row by satisfying whatever conditions you specify in the query. We must always specify one or more join conditions to be evaluated for each pair of joined rows.
A query that extracts data from more than one table must specify how we want to join the tables. There must be a connection! The tables are always joined row by row by satisfying whatever conditions you specify in the query. We must always specify one or more join conditions to be evaluated for each pair of joined rows.
Line 403: Line 448:
Line 407: Line 451:
SELECT biograph.sname, biograph.birth, maturedates.matured FROM biograph, maturedates 
                                                           WHERE biograph.sname=maturedates.sname;}}}
SELECT biograph.sname, biograph.birth, maturedates.matured FROM biograph, maturedates
                                                           WHERE biograph.sname=maturedates.sname;
}}}
Line 415: Line 459:
SELECT biograph.sname, biograph.birth, maturedates.matured, rankdates.ranked, consortdates.consorted FROM biograph, maturedates, rankdates,consortdates WHERE biograph.sname=maturedates.sname AND biograph.sname=rankdates.sname AND biograph.sname=consortdates.sname;}}}

''This query returns the sname and birth date from biograph, the matured date from maturedates, the ranked date from rankdates, and the consorted date from consortdates. Only the rows where the individual has all of these dates are returned. The next section describes joins which will allow you to retrieve the null or blank values as well.''     
SELECT biograph.sname, biograph.birth, maturedates.matured, rankdates.ranked, consortdates.consorted FROM biograph, maturedates, rankdates,consortdates WHERE biograph.sname=maturedates.sname AND biograph.sname=rankdates.sname AND biograph.sname=consortdates.sname;
}}}
''This query returns the sname and birth date from biograph, the matured date from maturedates, the ranked date from rankdates, and the consorted date from consortdates. Only the rows where the individual has all of these dates are returned. The next section describes joins which will allow you to retrieve the null or blank values as well.''
Line 419: Line 464:
Line 421: Line 465:

||Common Joins||||Description||
||Inner Join||||
The most common join. Uses a comparison operator (=, <>,<, >, <=, >=) to match rows from two tables based on the values from common columns.||
||Left Outer Join||||Returns all rows from the left table (first table mentioned is called the left table), not just the rows in which the joined columns match. Nulls are returned for the right tables if the left table has no matching rows in the right tables.||
||Right Outer Join||||The reverse of a Left Outer Join. All rows from the right table are returned. Nulls are returned for the left table if a right table has no matching row in the left table.||
||Full Outer Join||||A combination of the left and right outer joins. Returns all rows in both the left and right tables.||
||Common Joins ||||<style="text-align:center">Description ||
||Inner Join ||||<style="text-align:center">The most common join. Uses a comparison operator (=, <>,<, >, <=, >=) to match rows from two tables based on the values from common columns. ||
||Left Outer Join ||||<style="text-align:center">Returns all rows from the left table (first table mentioned is called the left table), not just the rows in which the joined columns match. Nulls are returned for the right tables if the left table has no matching rows in the right tables. ||
||Right Outer Join ||||<style="text-align:center">The reverse of a Left Outer Join. All rows from the right table are returned. Nulls are returned for the left table if a right table has no matching row in the left table. ||
||Full Outer Join ||||<style="text-align:center">A combination of the left and right outer joins. Returns all rows in both the left and right tables. ||


Line 432: Line 478:
       FROM biograph 
            INNER JOIN maturedates 
                       ON biograph.sname=maturedates.sname;}}}
       FROM biograph
            INNER JOIN maturedates
                       ON biograph.sname=maturedates.sname;
}}}
Line 441: Line 487:
SELECT biograph.sname, biograph.sex, maturedates.matured, 
members.grp FROM biograph 
                 INNER JOIN maturedates 
                            ON biograph.sname=maturedates.sname 
                 INNER JOIN members 
                            ON biograph.sname=members.sname 
                 WHERE members.grp=1.21 AND members.date='2005-01-01' 
                 ORDER BY biograph.sname;}}}
SELECT biograph.sname, biograph.sex, maturedates.matured,
members.grp FROM biograph
                 INNER JOIN maturedates
                            ON biograph.sname=maturedates.sname
                 INNER JOIN members
                            ON biograph.sname=members.sname
                 WHERE members.grp=1.21 AND members.date='2005-01-01'
                 ORDER BY biograph.sname;
}}}
Line 455: Line 501:
SELECT biograph.sname, biograph.birth, maturedates.matured 
        FROM biograph  
             LEFT OUTER JOIN maturedates 
                        ON biograph.sname=maturedates.sname;}}}
SELECT biograph.sname, biograph.birth, maturedates.matured
        FROM biograph
             LEFT OUTER JOIN maturedates
                        ON biograph.sname=maturedates.sname;
}}}
Line 465: Line 511:
SELECT biograph.sname, biograph.sex, maturedates.matured, members.grp 
           FROM biograph 
                LEFT OUTER JOIN maturedates 
                         ON biograph.sname=maturedates.sname 
                LEFT OUTER JOIN members 
                         ON biograph.sname=members.sname 
SELECT biograph.sname, biograph.sex, maturedates.matured, members.grp
           FROM biograph
                LEFT OUTER JOIN maturedates
                         ON biograph.sname=maturedates.sname
                LEFT OUTER JOIN members
                         ON biograph.sname=members.sname
Line 472: Line 518:
                ORDER BY biograph.sname;}}}                 ORDER BY biograph.sname;
}}}
Line 474: Line 521:
     Schemas partition databases. Tables, procedures, triggers, and so forth are all kept in schemas. Schemas are like sub-databases within a database. You can query tables from any schema within a particular database but not across databases. Tables within a database can be related, but tables in different databases cannot. Babase uses schemas to partition each database into areas where users have a greater or lesser degree of freedom to make changes. For further information on schemas see the schema documentation for PosgreSQL. Within the babase database the “official” babase tables are within the babase schema The system looks at the different schemas for objects, for example table names appearing in SQL queries, . If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. To explicitly reference an object in a specific schema, place the name of the schema in front of the object, separating the two with a period (e.g. schemaname.tablename).  . Schemas partition databases. Tables, procedures, triggers, and so forth are all kept in schemas. Schemas are like sub-databases within a database. You can query tables from any schema within a particular database but not across databases. Tables within a database can be related, but tables in different databases cannot. Babase uses schemas to partition each database into areas where users have a greater or lesser degree of freedom to make changes. For further information on schemas see the schema documentation for PosgreSQL. Within the babase database the “official” babase tables are within the babase schema The system looks at the different schemas for objects, for example table names appearing in SQL queries, . If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. To explicitly reference an object in a specific schema, place the name of the schema in front of the object, separating the two with a period (e.g. schemaname.tablename).
Line 479: Line 526:

||schemaname.tablename||
||schemaname.tablename ||


Line 483: Line 532:

||babase.biograph||

You can see which database you are in by looking at the schema search box within the SQL window. If you have selected the babase database the default search path is babase, then the sandbox, then your own personal schema. This means that the system looks at the different schemas for tables or columns, for example table names appearing in SQL queries, in the order in which the schemas are listed. If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. a single SQL statement can refer to objects in the different schemas of the parent database, but cannot refer to objects in other databases -- tables within a database can be related, but tables in different databases cannot 
||babase.biograph ||




You can see which database you are in by looking at the schema search box within the SQL window. If you have selected the babase database the default search path is babase, then the sandbox, then your own personal schema. This means that the system looks at the different schemas for tables or columns, for example table names appearing in SQL queries, in the order in which the schemas are listed. If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. a single SQL statement can refer to objects in the different schemas of the parent database, but cannot refer to objects in other databases -- tables within a database can be related, but tables in different databases cannot
Line 489: Line 540:

Unfortunately, at the present time downloading the results of a query requires the results be saved in a table and this table must be in your own schema.
In the SELECT statement you will need to specify that your new table will be saved in your schema. To identify the hierarchy, the schema and table name are separated by a dot or period.
 
In the SELECT statement you can specify that the query result be saved as a new table in your schema. To identify the hierarchy, the schema and table name are separated by a dot or period.
Line 495: Line 545:
SELECT * INTO username.mybiograph FROM biograph;}}}
SELECT * INTO username.mybiograph FROM biograph;
}}}
Line 499: Line 549:
As you start to write more-complicated queries and select them "into" new tables in your schema, it will be tempting to use small, highly-abbreviated names. DO NOT DO THIS! It may be easy for you at the moment to remember what that short name stands for, but in a few days/weeks/months it will be much more difficult. For example, perhaps you have a query that returns a set of juvenile males with their moms. You probably don't want to name the table "juvenile_males_with_moms", so it's tempting to use a query like this:

{{{
SELECT [columns] into username.jmm FROM [tables];
}}}
It'll be much faster to query from table "jmm", but in a few days or weeks, that abbreviation will lose all meaning to you. Better to use a more-descriptive name, like "juv_males_w_moms", "j_males_w_moms", or even "j_males_moms" (though that last one is getting close to being too abbreviated).
Line 500: Line 557:

********************NEED TO ADD THIS***************************** 
********************NEED TO ADD THIS*****************************
Line 504: Line 560:

********************NEED TO ADD THIS***************************** 
********************NEED TO ADD THIS*****************************
Line 508: Line 563:

You are allowed to delete tables from your own schema. Only BaBase administrators have sufficient privileges to edit or delete the BaBase tables. The tables you create in your own schemas are not considered a part of BaBase, even though they may have been created from data contained in BaBase tables. To delete a table go into your schema. You will see a list of all tables you have created. Under the Actions bar you will see a button called Drop. You will be asked if you are sure you would like to delete the table. If you are sure then hit the Drop button. Do NOT check the cascade button. Data managers may have permission to add and delete tables from some sandbox schemas (hybridity etc. but not babase itself).  
You are allowed to delete tables from your own schema. Only BaBase administrators have sufficient privileges to edit or delete the BaBase tables. The tables you create in your own schemas are not considered a part of BaBase, even though they may have been created from data contained in BaBase tables. To delete a table go into your schema. You will see a list of all tables you have created. Under the Actions bar you will see a button called Drop. You will be asked if you are sure you would like to delete the table. If you are sure then hit the Drop button. Do NOT check the cascade button. Data managers may have permission to add and delete tables from some sandbox schemas (hybridity etc. but not babase itself).
Line 512: Line 566:

* Turn your excel or text file into comma delimited format (CSV) using “save as”  
 * Go to your schema (You can’t create tables anywhere else unless you are a data manager).  
 * Hit the “create table” button 
 * Name the table (lowercase with characters other than underscore is best) 
 * Choose the number of columns  
 * Name the columns (exact same names and same case as your CSV file) 
 * Choose the data type for each column and whether nulls are allowed or not 
 * You now have an empty table that can be filled with your data. Click on the table which is now in your schema and hit the “import” button 
 * Pick “CSV” as the format and choose what your nulls look like (if any) 
 * Browse for your file and hit import. All the rows should now be there.
 * Turn your excel or text file into comma delimited format (CSV) or tab-delimited text (TXT) using “save as”
  * If you're using Mac OS X, you ''must'' save it as a TXT file.

 * Go to your schema (You can’t create tables anywhere else unless you are a data manager).
 * Hit the “create table” button
 * Name the table (lowercase with characters other than underscore is best)
 * Choose the number of columns
 * Name the columns (exact same names and same case as your CSV file)
 * Choose the data type for each column and whether nulls are allowed or not
 * You now have an empty table that can be filled with your data.
 * If using Windows or Linux:
  *
Click on the table which is now in your schema and hit the “import” button
  * Pick “CSV” as the format and choose what your nulls look like (if any)
  * Browse for your file and hit import. All the rows should now be there.  You can also use the [[https://papio.biology.duke.edu/programs/upload/|Babase Upload]] program, which Mac OS X users are required to use.
 * If using Mac OS X:
  * You cannot import data using the "import" button. Instead, open a new tab/window in your browser and use the [[https://papio.biology.duke.edu/programs/upload/|Babase Upload]] program (https://papio.biology.duke.edu/programs/upload/).
  * Fill in each field as appropriate:
   * Database: babase, unless for some reason you'll want to import babase_test or babase_copy
   * User: the ID you use to log in to babase
   * Password: the password you use to log in to babase
   * Table or view: the name of the table to which you want to add data. You should include the name of your schema, then a period, then the table's name, e.g. schema.tablename
   * Upload null values: if your data includes cells with null values, then check this.
   * NULL representation: type how your data have indicated NULL values. For example, if you type "\N" or "NULL" in each cell that you want to be NULL, then type /N or NULL, respectively. If you left your NULL cells blank, then leave this section empty as well.
  * Browse for your file and hit Upload. All the rows should now be there. If there are any problems with your data, nothing will import, and you will see a list of the errors in your data.
Line 525: Line 591:

 * Click on the table you would like to export
 * Hit the “export” button

 * Select “data only” and CSV format 
 * Under Option select “download” 
 * You will be prompted to save or open the file with excel. Either option is fine. If you save it it will be called dump.csv. You can rename it later. If you choose to open with excel you can do a “save as” and save it directly as the XLS file with the name of your choice. This is the easiest option. 
 * Click on the table you would like to export
 * Hit the “export” button
 * Select “data only” and CSV format
 * Under Option select “download”
 * You will be prompted to save or open the file with excel. Either option is fine. If you save it it will be called dump.csv. You can rename it later. If you choose to open with excel you can do a “save as” and save it directly as the XLS file with the name of your choice. This is the easiest option.
Line 533: Line 598:

The sandbox schema holds tables that are used together with the “official” BaBase tables but have not yet made it into the Babase project. They will not be documented in the BaBase documentation. The tables in the Sandbox will be considered “final copies” of data. Everyone can create tables in the Sandbox schema. Only place completely proofed and useable data in the Sandbox. 
The sandbox schema holds tables that are used together with the “official” BaBase tables but have not yet made it into the Babase project. They will not be documented in the BaBase documentation. The tables in the Sandbox will be considered “final copies” of data. Everyone can create tables in the Sandbox schema. Only place completely proofed and useable data in the Sandbox.
Line 537: Line 601:

All BaBase users have all the permissions in the sandbox schema. They may add, delete, or modify any information in the schema but may not alter the structure of the schema by adding or removing tables, procedures, triggers, or anything else. PosgreSQL, the database underlying Babase, is secure by default. This means that any tables or other database objects cannot be acesssed by anyone but their creator or without permission of the creator. Babase_editors who create tables in the sandbox schema should use the GRANT statement to grant access to Babase's other users. babase_editors permissions in the sandbox schema The babase_editors have all the permissions of the babase_readers, plus they may add or delete tables, stored procedures, or any other sort of object necessary to control the structure of the data.  
All BaBase users have all the permissions in the sandbox schema. They may add, delete, or modify any information in the schema but may not alter the structure of the schema by adding or removing tables, procedures, triggers, or anything else. PosgreSQL, the database underlying Babase, is secure by default. This means that any tables or other database objects cannot be acesssed by anyone but their creator or without permission of the creator. Babase_editors who create tables in the sandbox schema should use the GRANT statement to grant access to Babase's other users. babase_editors permissions in the sandbox schema The babase_editors have all the permissions of the babase_readers, plus they may add or delete tables, stored procedures, or any other sort of object necessary to control the structure of the data.
Line 540: Line 604:

{{{
CREATE TABLE sandbox.foo (somecolumn INTEGER);}}}

This is done as follows:  
{{{
CREATE TABLE sandbox.foo (somecolumn INTEGER);
}}}
This is done as follows:
Line 549: Line 612:
GRANT ALL ON sandbox.foo TO GROUP babase_editors;}}}
{{{
GRANT SELECT ON sandbox.foo TO GROUP babase_readers;}}}

There is one other issue. Only the creator of a table can change its structure -- to add another column, change the table name, etc. And only the creator can destroy (DROP) the table.
GRANT ALL ON sandbox.foo TO GROUP babase_editors;
}}}
{{{
GRANT SELECT ON sandbox.foo TO GROUP babase_readers;
}}}
 .
There is one other issue. Only the creator of a table can change its structure -- to add another column, change the table name, etc. And only the creator can destroy (DROP) the table.
Line 557: Line 622:

A view is a virtual table composed of fields from one or more tables. Views enable us to see information from the database from many different perspectives.  
A view is a virtual table composed of fields from one or more tables. Views enable us to see information from the database from many different perspectives.
Line 561: Line 625:

Save your SELECT statements! Saving your statements eliminates the need to recreate them every time you need to ask the same question of the database. At this point it is best to save your queries in any text program you like. It is also useful to write comments in your files. A comment is optional text that you type to explain your query Then you can copy and paste specific SELECT statements into the SQL window as many times as you like. Here is an example of a saved query

NOTE: SQL has assigned two hyphens, “--“, as a comment. This means that any text following the comment characters will be ignored by SQL. A comment continues to the end of a line.

-- count -- May 16, 2006 -- Count Number of
Save your SELECT statements! Saving your statements eliminates the need to recreate them every time you need to ask the same question of the database. If you write your queries in a text editor, you can easily save them on your computer and easily recall them in the future.

== About text editors: ==
 . The text editor(s) that comes pre-installed with your computer is adequate, but there are other options available for free that have more tools and can even help you check your SQL syntax:
  * For Windows, there's [[https://notepad-plus-plus.org/|Notepad++]]
  * For Mac OS X, there's [[http://www.barebones.com/products/textwrangler/|TextWrangler]]

== Documenting your queries with comments ==
After saving your query safely away somewhere, it may be weeks, months, or even years before you or someone else returns to look at the code you used. An elaborate calculation that you did may make sense now, but it probably won't be so clear in the future. For your own sake and that of others who may look at your code later, please include "comments" in your queries.

A comment is optional text that you type to help explain what's happening in your query, why, or anything else that may be noteworthy. SQL has assigned two hyphens, “--“, as a comment. This means that any text following the comment characters will be ignored by SQL. A comment continues until the first "newline" character (i.e. when you hit the enter/return key).

=== Some examples of comment use ===
There are a few different approaches a person can take when deciding when to add comments, where to write them, and how verbose the comment needs to be. Let's look at some examples from some "celebrity" authors.

Note that in every example, the user includes their name and the date. This is always a good idea.

==== All comments laid out before stating the query ====
{{{
--Thomas Robert Malthus, Jan 2016

--This query determines how many males and females we have blood samples for, as well as social connectedness data collected in the same year (sci data is stored in the 'sci' table).
--Exclude Lodge group, only look for available samples/not exhausted samples, restrict specifically to blood samples collected during particular periods when SCI data are available

select biograph.sname, birth, biograph.sex, count(avail)
from biograph, tissue, sci
where biograph.sname=tissue.sname
and avail='Y'
and sci.sname=biograph.sname
and sci.sname=tissue.sname
and sample_type='BLOOD'
and matgrp != '3.0'
and sci.start_date < collection_date
and sci.start_date+365 > collection_date
group by biograph.sname, collection_date, biograph.birth, biograph.sex, start_date, sci_af
order by biograph.sex, biograph.sname;
}}}
One major perk of this approach is that any explanations you might seek are easy to find because they're all in one place. This is also probably the easiest approach to writing comments, because you do it all at once and then are free to forget about it.

On the other hand, with this approach it can be hard to know which part(s) of a comment refers to which part(s) of the query. This makes this approach not ideal for complex queries.

==== Brief "big picture" explanation at beginning, then brief comments within the query on their own lines ====
{{{
--Alfred Russel Wallace, Jun 2014
--Query to extract data on early life variables for females in study groups. Comments indicated with --.

--1) pull basic data on everyone we might care about into a new table, following Charles's initial query and who survived to at least 4 years old (526 individuals)

select sname, matgrp, pid, left(pid,3) as mom, sex, birth, bstatus, biograph.status, statuses.descr as status_descr, statdate, ((statdate - birth)/365.25) as deathage
into coregroup_4yr
from biograph, statuses
where biograph.status = statuses.status
--exclude lodge group
and biograph.matgrp < '3.00'
--exclude animals with bad age estimates
and biograph.bstatus < '5'
--exclude animals born before the onset of monitoring
and birth > '1971-07-01'
--exclude animals who lived for < 1 day
and statdate > birth
--exclude individuals who died before age 4
and ((statdate - birth)/365.25) >=4
order by matgrp, sname;

--2) get all names and pids for moms represented in the core group and the statdate and birth for each kid
SELECT sname, name, pid, left(pid,3) as mom,(biograph.statdate - birth) AS deathagedays,birth;
FROM biograph
WHERE left(pid,3) in (SELECT DISTINCT coregroup_4yr.mom FROM coregroup_4yr);
}}}
Following this approach, comments are easy to write; everytime you write a more-complicated piece of code, you immediately add a comment explaning it. Naturally, this also helps make the code easier for a new reader to understand when explanations are right there among the text.

However, the extra lines of text can easily turn a relatively simple query into a wall of text that can thus become generally harder for a person to read. Also, if a commenter isn't consistent with comment placement, it can be unclear if a comment refers to code above it or below it.

==== Technical details laid out briefly at beginning, then comments on the same line as the code ====
{{{
-- Charles Lyell, Feb 2016
-- Rules for querying information about natal dispersal
    --only individuals above 6.5 years
    --only individuals with a bstatus <1
    --only individuals with matgrp <3
    --individuals with disperse confidence >2

    -- after you complete the query exclude the following individuals with BY dates: KRI, SAT, NJU

-- This is a nested query: the fourth select statement is the first one executed, and so on.

-- This query was written in TextWrangler for Mac. Comments and annotations are indicated with —- and are often provided after a query line.

select hold3.*

from

(select hold2.*
, dispersedates.dispersed
, (hold2.date - dispersedates.dispersed) as drd --# of days between dispersal and collection date of this line's fecal sample
, ((dispersedates.dispersed-hold2.birth)/365.25)::numeric(4,2) as dis_age
, rankdates.ranked
, (hold2.date - rankdates.ranked) as drr --# of days between rank attainment and collection date of this line's fecal sample
, ((rankdates.ranked-hold2.birth)/365.25)::numeric(4,2) as rank_age
, rankdates.rstatus, dispersedates.dispconfidence

from

(select hold.*

from

(select biograph.sname, biograph.matgrp, biograph.birth, fecal.prep.date, biograph.sex, fecal.prep.sid, fecal.results.gc, fecal.results.t, log(10, fecal.results.gc) as loggc, log(10,fecal.results.t) as logt, biograph.bstatus, ((fecal.prep.date-biograph.birth)/365.25)::numeric(4,2) as age

from biograph, fecal.prep, fecal.results

where biograph.sname = fecal.prep.sname --This is a join condition

and fecal.results.gc>0 --excludes null values

and fecal.results.t>0 --excludes null values

and fecal.prep.date between '2000-01-01' and '2013-12-31' --range of available data that include male dominance ranks

and biograph.sex = 'M'

and biograph.matgrp <3

and fecal.prep.sid = fecal.results.sid --This is a join condition

and biograph.bstatus <1 --includes individuals with known birthdates

)as hold

where hold.age between '6' and '10' --first cut to remove immature and older males, does not define the final age to be determined.

) as hold2

left join dispersedates --left join results in the inclusion of males from one table that have null values in the other table
on dispersedates.sname=hold2.sname --join condition

left join rankdates
on rankdates.sname = hold2.sname

) as hold3

where hold3.drd between -180 and 180 --use this alternative when you want data set restricted to 6 months before and after,

--if you want -1.5 years to +1.5, change to -540 to +540

and dis_age > 6.5

and hold3.dispconfidence>2

order by hold3.sname;
}}}
Note that in this case, many comments are reminders about SQL syntax (e.g. "This is a join condition"), rather than those in the previous examples that tended mostly to refer to the data itself (e.g. "exclude animals who lived for < 1 day"). This is okay; you should explain whatever you think may be unclear.

This approach is especially helpful when there are lots of technical details to relate. It's helpful for a reader if all those details are listed in one place at the beginning. However, as with an earlier example, when details are listed at the beginning it can be hard to know where a given rule is actually enforced in the code. In this example, the writer is trying to state the rules at the beginning and emphasize them again in-line, which is great. This can be risky, though: if you later decide to change a rule, it can be easy to change it in one place but not the other.

In this example, the writer included comments showing alternative code that could be added or could replace what's already there, e.g. "if you want -1.5 years to +1.5, change to -540 to +540". These kinds of comments are usually very helpful, and highly-encouraged.

Adding comments after a line of actual code has its pro's and con's. It's helpful for especially-brief comments (e.g. "excludes null values"), but if the comment or the code it refers to is especially long, then the whole line can become so long that the comment might be difficult to see.

==== Thorough explanation and rationale, then comments spread throughout the query ====
{{{
--Charles Darwin, Mar 2015
--Get relative (aka proportional) adult male ranks

--Ranks in Babase are recorded as "ordinal" rankings, i.e. first, second, third, etc. (or
--Alpha, Beta, Gamma, etc.) For some questions, it's may be useful to instead consider the
--ranks proportionally; being #1 in a group of 10 may be quite different from being #1 in
--a group of 50, for instance. This query calculates these "relative" ranks for each line
--in the RANKS table.

--In this case, the "proportion" being calculated is:
    --the adult male's rank / number of adult males in the group
    --("adult" males are defined by whether or not they have a rank date in RANKDATES)

--This query assumes that for any given month, all the adult males are ranked above the
--subadults and juveniles. If this is ever not so (a rel_rank > 1), a mistake was made in
--assigning ranks and/or rank dates. I should bring these cases to a data manager's
--attention ASAP.

--First, make a temporary version of RANKS that excludes subadult and juvenile males.
WITH adm_ranks AS
    (
    SELECT ranks.*
        FROM ranks
        JOIN rankdates
          ON (rankdates.sname = ranks.sname
            --Only allow males that have attained adult rank on/by this date
            AND rankdates.ranked <= ranks.rnkdate)
        WHERE ranks.rnktype = 'ALM'
    )

--Using the temporary adult males-only RANKS table, calculate relative rank
SELECT ord_ranks.rnkid
    , ord_ranks.sname
    , ord_ranks.rnkdate
    , ord_ranks.grp
    , ord_ranks.rnktype
    , ord_ranks.rank AS ord_rank
    , num_males.grp_size
    , (--Change both of these from integer so we can have a non-integer quotient
        ord_ranks.rank::numeric
        / num_males.grp_size::numeric
        )::numeric(3,2) AS rel_rank --Max 3 digits, 2 of which must be after the decimal
    FROM adm_ranks AS ord_ranks
    JOIN (--Get number of adult males by counting # of lines per grp-month in adm_ranks
         SELECT grp
             , rnkdate
             , COUNT(*) AS grp_size
             FROM adm_ranks
             GROUP BY grp, rnkdate
         ) AS num_males
      ON (num_males.grp = ord_ranks.grp
        AND num_males.rnkdate = ord_ranks.rnkdate)
    --In case I want to check a particular group and/or date:
    --WHERE ord_ranks.grp=2.11 AND ord_ranks.rnkdate>='2012-01-01'
    ORDER BY rnkdate, grp, ord_rank
    ;
}}}
When looking at someone's saved queries, the overall purpose of the query is often as unclear as the rationale behind a particular line of code. More than any of the other examples, this one uses comments to explain that purpose, not just the "why" behind a particular calculation or line of code. Most saved queries probably don't need the amount of detail used here to explain their purpose, but any amount of explanation along these lines can be helpful.

Note that unlike the others, this query also uses the recommendations laid out above for using descriptive aliases for [[https://papio.biology.duke.edu/babasewiki/SQL_Language#Creating_column_aliases_with_AS|column]] and [[https://papio.biology.duke.edu/babasewiki/SQL_Language#Saving_your_query_result_INTO_a_new_table|table]] names, and for using indentation to make the query generally more [[https://papio.biology.duke.edu/babasewiki/SQL_Language#Writing_Human_Readable_Queries|human-readable]].

This last approach tends to borrow from the styles shown in the previous examples. Sometimes a brief comment is added in-line, sometimes a comment is relegated to it's own line. Near the end, a "WHERE" statement is commented-out, but available to be used if needed. It's up to you to decide how you'll work. You might pick one of these, or (more likely) some hybrid of them. Whatever you decide, be consistent, and don't let yourself get lazy! You'll regret it in the future!
Line 570: Line 851:
Line 574: Line 854:
SELECT sname, (statdate - birth)/365.25 AS age FROM biograph;}}}
SELECT sname, (statdate - birth)/365.25 AS age FROM biograph;
}}}
Line 578: Line 858:
Note: There are multiple ways to calculate integer age and everyone has there own preference. The following example illustrates one method that used quite regularly in the Alberts lab. Note: There are multiple ways to calculate integer age and everyone has there own preference. The following example illustrates one method that used quite regularly in the Alberts lab.
Line 583: Line 863:
SELECT sname, FLOOR((statdate - birth)/365.25) + 1 AS age FROM biograph;}}}
SELECT sname, FLOOR((statdate - birth)/365.25) + 1 AS age FROM biograph;
}}}
Line 586: Line 866:

== Change a table name ==
 * Changing a table name

{{{
ALTER TABLE oldtablename RENAME TO newtablename
}}}
''This query will change the name of an existing table. Simply replace "oldtablename" and "newtablename" with your specific original table name and your revised table name, respectively.''

== Concatenate table columns ==
 * Concatenating table columns

{{{
SELECT *, season || year as seasonyear FROM sampletable
}}}
''This query will concatenate the data from two columns in an existing table into a single column. In the sample query above, sampletable has two columns called "season" and "year". By running this query, a new column is created that concatenates season and year. For example, a season value of "wet" and a year value of "2003" concatenate to form a new value of "wet2003" in the output table.''

== Extract year (or day or month) from a date field ==
 * Extracting year from a date field

{{{
SELECT *, extract (year from date) as year from sampletable
}}}
''This query will extract the year from a date column and save the value in a new column called "year". In the sample query above, sampletable already has a column for the date (called "date"). By running this query, a new column is created that extracts the year portion of the date column and saves the output in a new column. For example, a date value of 1976-09-20 will result in a year value of 1976. Similar query structure can be used to extract the day and month from a date field by replacing "year" in the above query with either "day" or "month".''

Contents

  1. The Basics
    1. Case sensitivity
    2. Conventions in SQL syntax
    3. Quotation marks
    4. Data Types
      1. Character
      2. Number
      3. Date
      4. Time
  2. Retrieving Data from a Table
    1. The Simplest SELECT Statements
    2. Retrieving columns with SELECT and FROM
    3. Eliminating duplicate rows with DISTINCT
    4. Filtering rows with WHERE
      1. Comparisons
      2. Combining and negating conditions with AND, OR, and NOT
      3. Matching patterns with LIKE
      4. Range filtering with BETWEEN
      5. List filtering with IN
      6. Testing for blanks or nulls with NULL
    5. Sorting rows with ORDER BY
    6. Creating column aliases with AS
    7. Writing Human Readable Queries
  3. Operators and Functions
    1. Performing Arithmetic Operations
  4. Summarizing and Grouping Data
    1. Aggregate Functions
      1. COUNT
      2. MIN and MAX
      3. AVG
      4. SUM
      5. FLOOR and CEILING
    2. Grouping rows with GROUP BY
    3. Filtering groups with HAVING
  5. Retrieving Data from Multiple Tables
    1. Qualifying column names
    2. Creating joins with JOIN or WHERE
      1. Selecting Columns from two or more tables with WHERE
      2. Join two or more tables using JOIN
  6. Schemas
    1. Saving your query result INTO a new table
    2. Queries with multiple SELECT statements
    3. Temporary tables
    4. Deleting tables
    5. Importing Tables into your schema
    6. Exporting Tables
    7. Sandbox
      1. Permissions
    8. Granting Permission to a table
  7. Views
  8. Saving your queries
    1. About text editors:
    2. Documenting your queries with comments
      1. Some examples of comment use
        1. All comments laid out before stating the query
        2. Brief "big picture" explanation at beginning, then brief comments within the query on their own lines
        3. Technical details laid out briefly at beginning, then comments on the same line as the code
        4. Thorough explanation and rationale, then comments spread throughout the query
  9. A Couple Useful Queries
    1. Calculate Age
    2. Change a table name
    3. Concatenate table columns
    4. Extract year (or day or month) from a date field

The Basics

SQL (Structured Query Language) is the standard programming language for creating and retrieving information that is stored in a relational database. The user interface or front-end (what you see online) is called phpPgAdmin. The following pages are meant as a basic introduction to SQL, phpPgAdmin, and query writing. Please see BaBase documentation for specifics about the database. There are many resources online for more detailed information about SQL. There are links to SQL tutorials and documentation through the Babase Wiki

Case sensitivity

SQL language is case insensitive but user-defined identifiers (table names, column names, and other such names for things that are not data) are case sensitive. For example, the SQL SELECT statement could be written as SELECT, SeLect, or select but the table name biograph is case sensitive. (A convention often used is to write key words in upper case and names in lower case, e.g.: SELECT * FROM babase;) Fortunately, unless identifiers are written inside double quotes (") they are automatically converted to lower case. In practice this means that you do not need to worry about case unless someone has gone to the trouble of creating mixed-case identifiers.

The point to take away is: create table names, column names, and the like, using lower case letters only. To do otherwise is to invite trouble.

Conventions in SQL syntax

In this document I have followed some SQL syntax conventions when writing query statements.

UPPERCASE

Uppercase indicates SQL language or commands

lowercase

Lowercase indicates user defined items such as tables and columns specific to BaBase.

Quotation marks

All quotation marks in SQL are straight quotes (such as ' and "), not curly quotes (such as ‘ and “). Curly quotes will not work in the SQL window. Be careful when cutting and pasting from text editors. Sometimes the default is curly quotes and you will receive an error message when you run the query. You can delete the quotes in the SQL query window and retype them. When you type directly into the SQL query window you will have the correct quote format.

Data Types

Each column in a table has a single data type. A data type determines a column’s allowable values. In general, we can classify columns as the following types.

Character

Represent text and are case sensitive. Whenever characters are requested they are case sensitive and need to be enclosed in single quotation marks. Examples of character columns are the name and pid columns in the biograph table.

Number

Numbers can be integers or decimals. Numbers can be negative, zero, or positive. Numbers do not require quotation marks. Examples of numeric columns are the matgrp and bstatus columns in the biograph table.

Date

A date stored in a column defined as DATE has three fields - YEAR, MONTH, and DAY - and is formatted yyyy-mm-dd. Dates must be enclosed in single quotations. Examples of date columns are the birth and statdate columns in the biograph table.

Time

Represents a time of day. A time stored in a column defined as TIME has three fields – HOUR, MINUTE, and SECOND – and is formatted hh:mm:ss. Times must be enclosed in single quotations. Examples of time columns are the start and stop columns in the interact_data table.Retrieving data from a table.

Retrieving Data from a Table

  • The SELECT statement forms the basis of every question you pose to the database.
  • When you create and execute a SELECT statement you are ‘querying’ the database.
  • A SELECT statement is composed of several distinct keywords known as clauses. You define a SELECT statement using various configurations of the clauses to retrieve the information you require. Some clauses are required while others are optional. The main clauses in a SELECT statement are INTO, FROM, WHERE, GROUP BY, and ORDER BY.
  • You may need multiple statements or queries to retrieve the desired information.
  • Every SELECT statement must end in a semi colon, ;

The Simplest SELECT Statements

The simplest select statements do not even return table content.

SELECT 'Hello World';

It is not very often that you ask SQL to return a constant value, but it does happen occasionally. More often, some columns are the result of computation.

SELECT 'The answer is:', (2 * 20) + 2;

Retrieving columns with SELECT and FROM

The simplest useful SELECT statement retrieves columns FROM a table. The SELECT and FROM clauses are always required if you retrieve columns from tables. All other clauses are optional.

  • To retrieve one column from a table

SELECT name FROM biograph;

This query will return every row in the name column of the biograph table.

  • To retrieve multiple columns from a table

SELECT name, birth FROM biograph;

This statement will return every name and birthdate in the biograph table.

  • To retrieve all columns from a table without writing them all out you can use an asterisk “*”.

SELECT * FROM biograph;

This will return all of the rows and columns contained in the biograph table.

Eliminating duplicate rows with DISTINCT

Although DISTINCT is an aggregate function (discussed below) it is useful to introduce this one earlier on.

  • Columns often contain duplicate values, and it is common to want a result that lists each duplicate only once.

SELECT DISTINCT(sname) FROM members;

The members table has multiple entries for most individuals. In this case the DISTINCT command returns each sname only once.

Filtering rows with WHERE

The result of each SELECT statement so far has included every row in the table (for the specified columns). The WHERE clause filters unwanted rows from the result. In a WHERE clause you specify a search condition that contains one or more conditions that need to be satisfied by the rows of the table. Rows for which the condition or conditions are true are returned.

Comparisons

The most common types of conditions involve comparing two values to each other. There are six types of comparisons we can define. Note that inequality can be defined in two different ways.

Symbol

Usage

=

Equality

<>

Inequality (not equal to)

!=

Inequality (not equal to)

<

Less than

>

Greater than

<=

Less than or equal to

>=

Greater than or equal to

  • Equality and inequality

SELECT * FROM biograph
         WHERE sname=’CHA’;

All information contained in the biograph table where the sname is ‘CHA’.

SELECT * FROM biograph
         WHERE sex=’F’;

All rows in the biograph table that have ‘F’ listed in the sex column will be returned. This is every row pertaining to a female.

SELECT * FROM biograph
         WHERE matgrp <> 9;

This query returns all rows from the biograph table where the matgrp (maternal group) is not 9.

  • Less than and greater than

SELECT * FROM biograph
         WHERE birth < ‘2005-01-01’;

This query will return every row in the biograph table where the birth date is before (but not including) January 1st, 2005.

SELECT * FROM biograph
         WHERE birth <= ‘2005-01-01’;

This query will return every row in the biograph table where the birth date is before or equal to January 1st, 2005.

The “greater than” statements work in exactly the same way as the “less than” statements shown above.

Combining and negating conditions with AND, OR, and NOT

Often you will need to specify multiple conditions in one SELECT statement. You can combine two or more conditions using the AND, OR, and NOT operators.

  • AND connects two or more conditions and returns true only if all conditions are true

SELECT * FROM biograph
         WHERE matgrp=1.1 AND sex=’M’ AND birth >= ‘2005-01-01’;

This statement will return all rows in the biograph table where the individual is male, has a matgrp of 1.1 (born in Nyayo’s group) and was born on or after January 1st 2005.

  • OR connects two or more conditions and returns true if either or all conditions are true

SELECT * FROM biograph
         WHERE sname=’ELV’ OR sname=’CHA’ OR sname=’POW’;

This statement will return all rows in the biograph table where the sname is ‘ELV’, ‘CHA’ or ‘POW’.

  • Unlike AND and OR, NOT doesn’t connect two conditions. Instead it negates (reverses) a single condition

SELECT * FROM biograph
         WHERE NOT matgrp <= 3;

All rows in the biograph table where matgrp is not less than or equal to 3 will be returned. This could also be stated without using a NOT statement and using

SELECT * FROM biograph
         WHERE matgrp > 3;

Matching patterns with LIKE

Preceding examples retrieved rows based on the exact value of a column or columns. You can use LIKE to retrieve rows based on partial information. LIKE works only with character strings. In the LIKE statement you can use two different wildcard characters. A percent sign (%) matches any string of zero or more characters. An underscore (_) matches any one character. You can negate a LIKE statement with NOT LIKE.

  • Find similar character strings

SELECT * FROM biograph
         WHERE name LIKE 'C%';

This statement will return all rows in the biograph table where the name starts with a “C”

SELECT * FROM biograph
         WHERE pid LIKE 'MET_';

This statement will return all rows in the biograph table where the pid starts with ‘MET’. In other words, all rows corresponding to an individual whose mother’s sname is ‘MET’.

  • The exception to this rule is situations in which the pids, or other data fields contain double digits. This query will only return pids 1-9. To include all desired results, use

SELECT * FROM biograph
         WHERE pid LIKE 'MET%';

Range filtering with BETWEEN

  • Use BETWEEN to determine whether a given value falls within a range

SELECT DISTINCT(sname) FROM members
                       WHERE grp=1.1 AND date BETWEEN '2004-01-01'
                                               AND '2005-01-01';

This statement will return the distinct snames of all the individuals in the members table from group 1.1 between (this is an inclusive range) January 1st 2004 and January 1st 2005. In other words, who was in group 1.1 at some point during the given time period.

List filtering with IN

SELECT * FROM biograph
         WHERE sname IN ('CHA', 'ELV', 'POW');

Testing for blanks or nulls with NULL

Nulls represent missing or unknown values. This situation can cause problems in WHERE clauses because unknown values do not satisfy specific conditions. Nulls match no values – not even other nulls.

  • Retrieve rows with nulls or non-null values

SELECT * FROM biograph
         WHERE pid IS NULL;

This statement returns every row in the biograph table where a null exists in the pid column. This translates to every individual whose mother is unknown to us.

SELECT * FROM biograph
         WHERE sname IS NOT NULL;

This statement returns every row in the biograph table where we have an sname for the individual. Aborted fetuses usually are not given a name but may still have a row in the biograph table.

Sorting rows with ORDER BY

Rows in a query result are unordered. Often you will want the output of your data in a particular order. For example, you may want your result ordered alphabetically or by date.

  • To sort your result by one column

SELECT * FROM biograph
         WHERE sex=’M’ ORDER BY sname;

This will return all entries in biograph where the sex is male and will be returned in alphabetical order by sname.

  • To sort your result by multiple columns

SELECT * FROM members
         ORDER BY grp, date;

This will return all entries in members where the group is equal to 2. The results will be in numerical order by group and, within each group, will be ordered chronologically by date.

SELECT * FROM members
         WHERE date between '2005-01-01' and '2005-03-01'
         ORDER by grp, date;

This will return all individuals present in any group during this time period, sorted into their groups and by the daye, in that order.

Creating column aliases with AS

In the queries we have seen so far we have accepted the return of the default column names. The AS clause can be used to create a column alias. A column alias is an alternative name. The new name is used in both the column headings and elsewhere in the query itself. If a column name is too long or cryptic, or if the column is the result of a computation the query may be made more clear by changing the name.

Double quotes are required around the new name unless the alias is a single word that contains only lower case letters, digits, or underscores.

  • Change the name of a column.

SELECT 'The answer is:' AS "Label", (2 * 20) + 2 AS "Answer";

SELECT sname AS "short name" FROM biograph;

or

SELECT sname AS shortname, statdate - birth AS daysalive FROM biograph WHERE statdate - birth < 20 ORDER BY daysalive, shortname;

The computed value is given a column name in both the output column heading and the ORDER BY that orders the output, but not in the WHERE clause because the WHERE clause is needed to produce the output.

When using column aliases, it is strongly-recommended that you use descriptive names that clearly illustrate what the column holds. It's tempting to abbreviate complicated titles with a few letters, if for no other reason than because it's easier to just type a few letters than a long name. For the sake of keeping your data readable (to others and even to yourself, in case you revisit old data a few months later), avoid over-abbreviating column names. See "Saving your query result INTO a new table" for an example of over-abbreviating a table name.

Writing Human Readable Queries

It is important to write your queries so that they can be read by a human. It is no good having a query that the computer understands but a human does not because it is the human that must interpret the query's result. Column aliases are useful in this regard, but the most useful technique is indentation. SQL statements can be continued over multiple lines; they do not end until the final semicolon. Breaking a query into multiple lines, and then using indentation that reflects the structure of the SQL, gives the reader an overview of the query structure at a glance and allows her to focus on and understand each component of the query in the context of the whole.

SELECT sname AS shortname
            , statdate - birth AS daysalive
  FROM biograph
  WHERE statdate - birth < 20
  ORDER BY daysalive, shortname;

Line breaks and indentation makes queries easier to understand.

Queries may contain comments. Comments begin with -- (2 dashes) and continue to the end of the line. Comments can go anywhere; on a line by themselves or at the end of a line containing SQL. Judicious use of comments will help the reader understand the most complex of queries.

-- This is a comment.  It does nothing, has no effect, and may as well not be here.

Comments aid understanding, but unless they reveal what's otherwise hidden they are just clutter.

Formatting and commenting queries as part of the writing process helps clarify goals, aids present and future understanding of the query, aids debugging, and helps ensure the query really does produce the result expected.

You may have noted that the query above, which returns baboons who have lived less than 20 days, does not consider whether the individuals are alive or dead. The results would include not only those individuals which lived less than 20 days, but also all individuals born within 20 days of the last observation of the group. Queries quickly become complex and hard to read, with subtleties that only the informed can appreciate. The formatting conventions and commenting techniques of the following example may be applied to queries of arbitrary complexity to greatly aid readability.

-- Show individuals who have lived less than 20 days
SELECT sname AS shortname
            , statdate - birth AS daysalive         -- Subtracting two dates gives the number of days between the two
  FROM biograph
  WHERE statdate - birth < 20
                -- Don't show living individuals; do include more than just the confirmed dead
                AND status <> 0                           -- Status of 0 means alive
  ORDER BY daysalive, shortname;

Comments can document goals, describe obscure SQL, document a query's sub-components and the non-obvious implications of the choices made, explain data values, and so forth.

Operators and Functions

Performing Arithmetic Operations

To add, subtract, multiply, or divide use the operators found in the following table.

Symbol

Usage

+

Addition

-

Subtraction

*

Multiplication

/

Division

  • Using subtraction with times

SELECT (stop - start) AS "time spent" FROM interact;

This query will create a column called “time spent” which will give us the length of each interaction in the interact table based on the start and stop time.

  • Using subtraction with dates

SELECT sname, (statdate - birth) AS days FROM biograph;

This query will create a column with sname and one called “days” with the number of days from birth until the statdate for each individual.

Summarizing and Grouping Data

Aggregate Functions

COUNT

  • The count command does exactly what it says, it counts whatever you ask of it

SELECT COUNT(sname) FROM biograph
                    WHERE matgrp=1.1;

This will show how many individuals were born into Nyayo’s group (1.1)

  • Many times it is very useful to combine the COUNT and the DISTINCT commands

SELECT COUNT(DISTINCT(sname)) FROM members
                              WHERE grp=1.1;

The members table has multiple entries for most individuals. In this case the DISTINCT command lets us count each sname only once. This query tells us how many different individuals have been a member of group 1.1.

MIN and MAX

  • As it sounds, these commands are used to determine the maximum and minimum value in a query

SELECT MAX(rank), MIN(rank) FROM ranks
                            WHERE sname=’ROC’;

The result of this query is the maximum numeric rank attained by Rocky (5), and the minimum numeric rank attained by Rocky (1).

  • The maximum and minimum can also be used with dates

SELECT MAX(date), MIN(date) FROM members
                            WHERE grp=1.21;

The result of this query is the first and last dates recorded in the members table for Omo’s group (1.21).

AVG

  • This command allows you to find the average or mean of a group of numbers.

SELECT AVG(stop-start) FROM interact_data
                       WHERE act='C' ;

This query lets us know the average length of all consortships found in the interact_data table

SUM

  • This command allows you to find the sum of a group of numbers.

SELECT SUM(stop-start) FROM interact_data
                       WHERE act='C' ;

This query lets us know the total time for which we have observed consortships from the interact_data table.

FLOOR and CEILING

  • FLOOR returns the largest integer equal or less to the value provided. This is useful when rounding numbers. We tend to use this when calculating age in years. See Useful Queries section

SELECT FLOOR(1.21);

This will return the value of 1 which is the largest integer equal or less than 1.21.

  • The CEILING function returns the smallest integer value that is greater than the value provided. This is equivalent to rounding up to the next whole number.

SELECT CEILING(6.45);

This will return the value of 7 which is the smallest integer greater than 6.45.

Grouping rows with GROUP BY

All of the examples so far have applied the aggregate functions across all the rows returned by the FROM and WHERE clauses. You can use the GROUP BY clause to divide a table into groups or categories and apply aggregate functions to each subset. The GROUP BY clause comes after the WHERE clause but before the ORDER BY clause. An example will clarify the concept.

  • Use the COUNT aggregate function with GROUP BY

SELECT matgrp, COUNT(matgrp) FROM biograph
                             GROUP BY matgrp
                             ORDER BY matgrp;

This query returns the total number of individuals attributed to each matgrp in the biograph table. The result shows the count of individuals grouped by matgrp.

  • You can use GROUP BY with more than one grouping column

SELECT matgrp, sex, COUNT(matgrp) FROM biograph
                                  GROUP BY matgrp, sex
                                  ORDER BY matgrp;

This query returns the total number of individuals attributed to each matgrp in the biograph table. The results are also grouped by sex. The result shows the count of individuals grouped by matgrp and then sex.

Filtering groups with HAVING

**************this needs to be added**************************

Retrieving Data from Multiple Tables

All of the queries so far have retrieved rows from a single table. In a relational database we may need information from multiple tables at once. In order to retrieve information from two or more tables simultaneously we need to understand how the columns in the different tables are related to each other. The Babase Pocket Reference outlines how the columns of all of the BaBase tables are connected to each other (entity diagram).

Qualifying column names

Column names must be unique within a table but can be reused in another table. To identify an otherwise ambiguous column uniquely in a query that involves multiple tables, we must use a qualified name. A qualified name is a table name followed by a dot followed by the name of the column in the table. Because table names within the database must be unique, a qualified name uniquely identifies a single column within the entire database. If two tables have any columns in common you must qualify all references to these columns throughout the query.

A qualified table name is of the form:

table.column

It is considered good form by the SQL digirati to always qualify all column names.

Creating joins with JOIN or WHERE

A query that extracts data from more than one table must specify how we want to join the tables. There must be a connection! The tables are always joined row by row by satisfying whatever conditions you specify in the query. We must always specify one or more join conditions to be evaluated for each pair of joined rows.

Selecting Columns from two or more tables with WHERE

  • You may want to retrieve information from both the biograph table and the maturedates table

SELECT biograph.sname, biograph.birth, maturedates.matured FROM biograph, maturedates
                                                           WHERE biograph.sname=maturedates.sname;

This query returns the sname and birth columns from the biograph table and the matured column from the maturedates table. The two tables are connected by sname. We are returned only the rows where we have a birth date and a matured date associated with a particular sname.

  • You can retrieve information from many tables at once as long as there is common column information with which you can compare the tables.

SELECT biograph.sname, biograph.birth, maturedates.matured, rankdates.ranked, consortdates.consorted  FROM biograph, maturedates, rankdates,consortdates  WHERE biograph.sname=maturedates.sname  AND biograph.sname=rankdates.sname  AND biograph.sname=consortdates.sname;

This query returns the sname and birth date from biograph, the matured date from maturedates, the ranked date from rankdates, and the consorted date from consortdates. Only the rows where the individual has all of these dates are returned. The next section describes joins which will allow you to retrieve the null or blank values as well.

Join two or more tables using JOIN

The most commonly used JOIN statements are shown below. There are others. I have given examples of the most useful joins.

Common Joins

Description

Inner Join

The most common join. Uses a comparison operator (=, <>,<, >, <=, >=) to match rows from two tables based on the values from common columns.

Left Outer Join

Returns all rows from the left table (first table mentioned is called the left table), not just the rows in which the joined columns match. Nulls are returned for the right tables if the left table has no matching rows in the right tables.

Right Outer Join

The reverse of a Left Outer Join. All rows from the right table are returned. Nulls are returned for the left table if a right table has no matching row in the left table.

Full Outer Join

A combination of the left and right outer joins. Returns all rows in both the left and right tables.

  • We can perform the same query from above (retrieving rows from biograph and maturedates using WHERE) using the INNER JOIN command

SELECT biograph.sname, biograph.birth, maturedates.matured
       FROM biograph
            INNER JOIN maturedates
                       ON biograph.sname=maturedates.sname;

This query returns exact same information as above using INNER JOIN instead of WHERE.

  • You can select information from many tables at once using INNER JOIN

SELECT biograph.sname, biograph.sex, maturedates.matured,
members.grp FROM biograph
                 INNER JOIN maturedates
                            ON biograph.sname=maturedates.sname
                 INNER JOIN members
                            ON biograph.sname=members.sname
                 WHERE members.grp=1.21 AND members.date='2005-01-01'
                 ORDER BY biograph.sname;

This more complicated query returns the snames of all the matured individuals plus sex, matured date, and group for all who were in grp 1.21 on January 1st 2005. They are ordered alphabetically by sname.

  • Example using LEFT OUTER JOIN

SELECT biograph.sname, biograph.birth, maturedates.matured
        FROM biograph
             LEFT OUTER JOIN maturedates
                        ON biograph.sname=maturedates.sname;

If you take a look at the results of this query you can see that all rows from the biograph table (left table) are returned. We can now see which individuals do not have a matured date associated with them as well as the ones do.

  • Example using LEFT OUTER JOIN with more than two tables

SELECT biograph.sname, biograph.sex, maturedates.matured, members.grp
           FROM biograph
                LEFT OUTER JOIN maturedates
                         ON biograph.sname=maturedates.sname
                LEFT OUTER JOIN members
                         ON biograph.sname=members.sname
                WHERE members.grp=1.21 AND members.date='2005-01-01'
                ORDER BY biograph.sname;

Schemas

  • Schemas partition databases. Tables, procedures, triggers, and so forth are all kept in schemas. Schemas are like sub-databases within a database. You can query tables from any schema within a particular database but not across databases. Tables within a database can be related, but tables in different databases cannot. Babase uses schemas to partition each database into areas where users have a greater or lesser degree of freedom to make changes. For further information on schemas see the schema documentation for PosgreSQL. Within the babase database the “official” babase tables are within the babase schema The system looks at the different schemas for objects, for example table names appearing in SQL queries, . If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. To explicitly reference an object in a specific schema, place the name of the schema in front of the object, separating the two with a period (e.g. schemaname.tablename).

Each BaBase user has his or her own area of the database, called a schema, where they can create their own tables. Your schema is identified by your BaBase login name and is located within the babase database. We qualify schema names and the associated tables the same way we qualified tables and columns above.

A qualified table name within a schema is of the form:

schemaname.tablename

The biograph table could be written as:

babase.biograph

You can see which database you are in by looking at the schema search box within the SQL window. If you have selected the babase database the default search path is babase, then the sandbox, then your own personal schema. This means that the system looks at the different schemas for tables or columns, for example table names appearing in SQL queries, in the order in which the schemas are listed. If the table does not appear in the first schema it looks in the second, and so forth. As soon as table is found with the name given, that table is used and the search stops. a single SQL statement can refer to objects in the different schemas of the parent database, but cannot refer to objects in other databases -- tables within a database can be related, but tables in different databases cannot

Saving your query result INTO a new table

In the SELECT statement you can specify that the query result be saved as a new table in your schema. To identify the hierarchy, the schema and table name are separated by a dot or period.

  • Save the results of a query into a table in your schema (username)

SELECT * INTO username.mybiograph FROM biograph;

A table called mybiograph will be created in the “username” schema. This new table is a copy of the biograph table (all rows and columns selected).

As you start to write more-complicated queries and select them "into" new tables in your schema, it will be tempting to use small, highly-abbreviated names. DO NOT DO THIS! It may be easy for you at the moment to remember what that short name stands for, but in a few days/weeks/months it will be much more difficult. For example, perhaps you have a query that returns a set of juvenile males with their moms. You probably don't want to name the table "juvenile_males_with_moms", so it's tempting to use a query like this:

SELECT [columns] into username.jmm FROM [tables];

It'll be much faster to query from table "jmm", but in a few days or weeks, that abbreviation will lose all meaning to you. Better to use a more-descriptive name, like "juv_males_w_moms", "j_males_w_moms", or even "j_males_moms" (though that last one is getting close to being too abbreviated).

Queries with multiple SELECT statements

********************NEED TO ADD THIS*****************************

Temporary tables

********************NEED TO ADD THIS*****************************

Deleting tables

You are allowed to delete tables from your own schema. Only BaBase administrators have sufficient privileges to edit or delete the BaBase tables. The tables you create in your own schemas are not considered a part of BaBase, even though they may have been created from data contained in BaBase tables. To delete a table go into your schema. You will see a list of all tables you have created. Under the Actions bar you will see a button called Drop. You will be asked if you are sure you would like to delete the table. If you are sure then hit the Drop button. Do NOT check the cascade button. Data managers may have permission to add and delete tables from some sandbox schemas (hybridity etc. but not babase itself).

Importing Tables into your schema

  • Turn your excel or text file into comma delimited format (CSV) or tab-delimited text (TXT) using “save as”
    • If you're using Mac OS X, you must save it as a TXT file.

  • Go to your schema (You can’t create tables anywhere else unless you are a data manager).
  • Hit the “create table” button
  • Name the table (lowercase with characters other than underscore is best)
  • Choose the number of columns
  • Name the columns (exact same names and same case as your CSV file)
  • Choose the data type for each column and whether nulls are allowed or not
  • You now have an empty table that can be filled with your data.
  • If using Windows or Linux:
    • Click on the table which is now in your schema and hit the “import” button
    • Pick “CSV” as the format and choose what your nulls look like (if any)
    • Browse for your file and hit import. All the rows should now be there. You can also use the Babase Upload program, which Mac OS X users are required to use.

  • If using Mac OS X:
    • You cannot import data using the "import" button. Instead, open a new tab/window in your browser and use the Babase Upload program (https://papio.biology.duke.edu/programs/upload/).

    • Fill in each field as appropriate:
      • Database: babase, unless for some reason you'll want to import babase_test or babase_copy
      • User: the ID you use to log in to babase
      • Password: the password you use to log in to babase
      • Table or view: the name of the table to which you want to add data. You should include the name of your schema, then a period, then the table's name, e.g. schema.tablename
      • Upload null values: if your data includes cells with null values, then check this.
      • NULL representation: type how your data have indicated NULL values. For example, if you type "\N" or "NULL" in each cell that you want to be NULL, then type /N or NULL, respectively. If you left your NULL cells blank, then leave this section empty as well.
    • Browse for your file and hit Upload. All the rows should now be there. If there are any problems with your data, nothing will import, and you will see a list of the errors in your data.

Exporting Tables

  • Click on the table you would like to export
  • Hit the “export” button
  • Select “data only” and CSV format
  • Under Option select “download”
  • You will be prompted to save or open the file with excel. Either option is fine. If you save it it will be called dump.csv. You can rename it later. If you choose to open with excel you can do a “save as” and save it directly as the XLS file with the name of your choice. This is the easiest option.

Sandbox

The sandbox schema holds tables that are used together with the “official” BaBase tables but have not yet made it into the Babase project. They will not be documented in the BaBase documentation. The tables in the Sandbox will be considered “final copies” of data. Everyone can create tables in the Sandbox schema. Only place completely proofed and useable data in the Sandbox.

Permissions

All BaBase users have all the permissions in the sandbox schema. They may add, delete, or modify any information in the schema but may not alter the structure of the schema by adding or removing tables, procedures, triggers, or anything else. PosgreSQL, the database underlying Babase, is secure by default. This means that any tables or other database objects cannot be acesssed by anyone but their creator or without permission of the creator. Babase_editors who create tables in the sandbox schema should use the GRANT statement to grant access to Babase's other users. babase_editors permissions in the sandbox schema The babase_editors have all the permissions of the babase_readers, plus they may add or delete tables, stored procedures, or any other sort of object necessary to control the structure of the data.

Granting Permission to a table

CREATE TABLE sandbox.foo (somecolumn INTEGER);

This is done as follows:

Example 2.2. Granting permission to table foo in the sandbox schema

GRANT ALL ON sandbox.foo TO GROUP babase_editors;

GRANT SELECT ON sandbox.foo TO GROUP babase_readers;
  • There is one other issue. Only the creator of a table can change its structure -- to add another column, change the table name, etc. And only the creator can destroy (DROP) the table.

It is important to note that you have to consider who you want to be able to view these tables versus who you want to be able to edit these tables. Also, when granting permissions, granting permission to babase_readers, does not grant permission to babase_editors. These are two discrete groups. You can grant "select" permissions to babase_editors instead of "all" permissions, but just granting permission to readers does not automatically include giving permissions to the group babase_editors.

Views

A view is a virtual table composed of fields from one or more tables. Views enable us to see information from the database from many different perspectives.

Saving your queries

Save your SELECT statements! Saving your statements eliminates the need to recreate them every time you need to ask the same question of the database. If you write your queries in a text editor, you can easily save them on your computer and easily recall them in the future.

About text editors:

  • The text editor(s) that comes pre-installed with your computer is adequate, but there are other options available for free that have more tools and can even help you check your SQL syntax:

Documenting your queries with comments

After saving your query safely away somewhere, it may be weeks, months, or even years before you or someone else returns to look at the code you used. An elaborate calculation that you did may make sense now, but it probably won't be so clear in the future. For your own sake and that of others who may look at your code later, please include "comments" in your queries.

A comment is optional text that you type to help explain what's happening in your query, why, or anything else that may be noteworthy. SQL has assigned two hyphens, “--“, as a comment. This means that any text following the comment characters will be ignored by SQL. A comment continues until the first "newline" character (i.e. when you hit the enter/return key).

Some examples of comment use

There are a few different approaches a person can take when deciding when to add comments, where to write them, and how verbose the comment needs to be. Let's look at some examples from some "celebrity" authors.

Note that in every example, the user includes their name and the date. This is always a good idea.

All comments laid out before stating the query

--Thomas Robert Malthus, Jan 2016

--This query determines how many males and females we have blood samples for, as well as social connectedness data collected in the same year (sci data is stored in the 'sci' table).
--Exclude Lodge group, only look for available samples/not exhausted samples, restrict specifically to blood samples collected during particular periods when SCI data are available

select biograph.sname, birth, biograph.sex, count(avail)
from biograph, tissue, sci
where biograph.sname=tissue.sname
and avail='Y'
and sci.sname=biograph.sname
and sci.sname=tissue.sname
and sample_type='BLOOD'
and matgrp != '3.0'
and sci.start_date < collection_date
and sci.start_date+365 > collection_date
group by biograph.sname, collection_date, biograph.birth, biograph.sex, start_date, sci_af
order by biograph.sex, biograph.sname;

One major perk of this approach is that any explanations you might seek are easy to find because they're all in one place. This is also probably the easiest approach to writing comments, because you do it all at once and then are free to forget about it.

On the other hand, with this approach it can be hard to know which part(s) of a comment refers to which part(s) of the query. This makes this approach not ideal for complex queries.

Brief "big picture" explanation at beginning, then brief comments within the query on their own lines

--Alfred Russel Wallace, Jun 2014
--Query to extract data on early life variables for females in study groups.  Comments indicated with --.

--1) pull basic data on everyone we might care about into a new table, following Charles's initial query and who survived to at least 4 years old (526 individuals)

select sname, matgrp, pid, left(pid,3) as mom, sex, birth, bstatus, biograph.status, statuses.descr as status_descr, statdate, ((statdate - birth)/365.25) as deathage
into coregroup_4yr
from biograph, statuses
where biograph.status = statuses.status
--exclude lodge group
and biograph.matgrp < '3.00'
--exclude animals with bad age estimates
and biograph.bstatus < '5'
--exclude animals born before the onset of monitoring
and birth > '1971-07-01'
--exclude animals who lived for < 1 day
and statdate > birth
--exclude individuals who died before age 4
and ((statdate - birth)/365.25) >=4
order by matgrp, sname;

--2) get all names and pids for moms represented in the core group and the statdate and birth for each kid
SELECT sname, name, pid, left(pid,3) as mom,(biograph.statdate - birth) AS deathagedays,birth;
FROM biograph
WHERE left(pid,3) in (SELECT DISTINCT coregroup_4yr.mom FROM coregroup_4yr);

Following this approach, comments are easy to write; everytime you write a more-complicated piece of code, you immediately add a comment explaning it. Naturally, this also helps make the code easier for a new reader to understand when explanations are right there among the text.

However, the extra lines of text can easily turn a relatively simple query into a wall of text that can thus become generally harder for a person to read. Also, if a commenter isn't consistent with comment placement, it can be unclear if a comment refers to code above it or below it.

Technical details laid out briefly at beginning, then comments on the same line as the code

-- Charles Lyell, Feb 2016
-- Rules for querying information about natal dispersal
    --only individuals above 6.5 years
    --only individuals with a bstatus <1
    --only individuals with matgrp <3
    --individuals with disperse confidence >2

    -- after you complete the query exclude the following individuals with BY dates: KRI, SAT, NJU

-- This is a nested query: the fourth select statement is the first one executed, and so on.

-- This query was written in TextWrangler for Mac. Comments and annotations are indicated with —- and are often provided after a query line.

select hold3.*

from

(select hold2.*
, dispersedates.dispersed
, (hold2.date - dispersedates.dispersed) as drd --# of days between dispersal and collection date of this line's fecal sample
, ((dispersedates.dispersed-hold2.birth)/365.25)::numeric(4,2) as dis_age
, rankdates.ranked
, (hold2.date - rankdates.ranked) as drr --# of days between rank attainment and collection date of this line's fecal sample
, ((rankdates.ranked-hold2.birth)/365.25)::numeric(4,2) as rank_age
, rankdates.rstatus, dispersedates.dispconfidence

from

(select hold.*

from

(select biograph.sname, biograph.matgrp, biograph.birth, fecal.prep.date, biograph.sex, fecal.prep.sid, fecal.results.gc, fecal.results.t, log(10, fecal.results.gc) as loggc, log(10,fecal.results.t) as logt, biograph.bstatus, ((fecal.prep.date-biograph.birth)/365.25)::numeric(4,2) as age

from biograph, fecal.prep, fecal.results

where biograph.sname = fecal.prep.sname --This is a join condition

and fecal.results.gc>0 --excludes null values

and fecal.results.t>0 --excludes null values

and fecal.prep.date between '2000-01-01' and '2013-12-31' --range of available data that include male dominance ranks

and biograph.sex = 'M'

and biograph.matgrp <3

and fecal.prep.sid = fecal.results.sid --This is a join condition

and biograph.bstatus <1 --includes individuals with known birthdates

)as hold

where hold.age between '6' and '10' --first cut to remove immature and older males, does not define the final age to be determined.

) as hold2

left join dispersedates --left join results in the inclusion of males from one table that have null values in the other table
on dispersedates.sname=hold2.sname --join condition

left join rankdates
on rankdates.sname = hold2.sname

) as hold3

where hold3.drd between -180 and 180 --use this alternative when you want data set restricted to 6 months before and after,

--if you want -1.5 years to +1.5, change to -540 to +540

and dis_age > 6.5

and hold3.dispconfidence>2

order by hold3.sname;

Note that in this case, many comments are reminders about SQL syntax (e.g. "This is a join condition"), rather than those in the previous examples that tended mostly to refer to the data itself (e.g. "exclude animals who lived for < 1 day"). This is okay; you should explain whatever you think may be unclear.

This approach is especially helpful when there are lots of technical details to relate. It's helpful for a reader if all those details are listed in one place at the beginning. However, as with an earlier example, when details are listed at the beginning it can be hard to know where a given rule is actually enforced in the code. In this example, the writer is trying to state the rules at the beginning and emphasize them again in-line, which is great. This can be risky, though: if you later decide to change a rule, it can be easy to change it in one place but not the other.

In this example, the writer included comments showing alternative code that could be added or could replace what's already there, e.g. "if you want -1.5 years to +1.5, change to -540 to +540". These kinds of comments are usually very helpful, and highly-encouraged.

Adding comments after a line of actual code has its pro's and con's. It's helpful for especially-brief comments (e.g. "excludes null values"), but if the comment or the code it refers to is especially long, then the whole line can become so long that the comment might be difficult to see.

Thorough explanation and rationale, then comments spread throughout the query

--Charles Darwin, Mar 2015
--Get relative (aka proportional) adult male ranks

--Ranks in Babase are recorded as "ordinal" rankings, i.e. first, second, third, etc. (or
--Alpha, Beta, Gamma, etc.) For some questions, it's may be useful to instead consider the
--ranks proportionally; being #1 in a group of 10 may be quite different from being #1 in
--a group of 50, for instance.  This query calculates these "relative" ranks for each line
--in the RANKS table.

--In this case, the "proportion" being calculated is:
    --the adult male's rank / number of adult males in the group
    --("adult" males are defined by whether or not they have a rank date in RANKDATES)

--This query assumes that for any given month, all the adult males are ranked above the
--subadults and juveniles.  If this is ever not so (a rel_rank > 1), a mistake was made in
--assigning ranks and/or rank dates.  I should bring these cases to a data manager's
--attention ASAP.

--First, make a temporary version of RANKS that excludes subadult and juvenile males.
WITH adm_ranks AS
    (
    SELECT ranks.*
        FROM ranks
        JOIN rankdates
          ON (rankdates.sname = ranks.sname
            --Only allow males that have attained adult rank on/by this date
            AND rankdates.ranked <= ranks.rnkdate)
        WHERE ranks.rnktype = 'ALM'
    )

--Using the temporary adult males-only RANKS table, calculate relative rank
SELECT ord_ranks.rnkid
    ,  ord_ranks.sname
    ,  ord_ranks.rnkdate
    ,  ord_ranks.grp
    ,  ord_ranks.rnktype
    ,  ord_ranks.rank AS ord_rank
    ,  num_males.grp_size
    ,  (--Change both of these from integer so we can have a non-integer quotient
        ord_ranks.rank::numeric
        / num_males.grp_size::numeric
        )::numeric(3,2) AS rel_rank --Max 3 digits, 2 of which must be after the decimal
    FROM adm_ranks AS ord_ranks
    JOIN (--Get number of adult males by counting # of lines per grp-month in adm_ranks
         SELECT grp
             ,  rnkdate
             ,  COUNT(*) AS grp_size
             FROM adm_ranks
             GROUP BY grp, rnkdate
         ) AS num_males
      ON (num_males.grp = ord_ranks.grp
        AND num_males.rnkdate = ord_ranks.rnkdate)
    --In case I want to check a particular group and/or date:
    --WHERE ord_ranks.grp=2.11 AND ord_ranks.rnkdate>='2012-01-01'
    ORDER BY rnkdate, grp, ord_rank
    ;

When looking at someone's saved queries, the overall purpose of the query is often as unclear as the rationale behind a particular line of code. More than any of the other examples, this one uses comments to explain that purpose, not just the "why" behind a particular calculation or line of code. Most saved queries probably don't need the amount of detail used here to explain their purpose, but any amount of explanation along these lines can be helpful.

Note that unlike the others, this query also uses the recommendations laid out above for using descriptive aliases for column and table names, and for using indentation to make the query generally more human-readable.

This last approach tends to borrow from the styles shown in the previous examples. Sometimes a brief comment is added in-line, sometimes a comment is relegated to it's own line. Near the end, a "WHERE" statement is commented-out, but available to be used if needed. It's up to you to decide how you'll work. You might pick one of these, or (more likely) some hybrid of them. Whatever you decide, be consistent, and don't let yourself get lazy! You'll regret it in the future!

A Couple Useful Queries

Calculate Age

  • Calculating exact age at statdate

SELECT sname, (statdate - birth)/365.25  AS age FROM biograph;

This query will create a column with sname and one with the age of each individual on their statdate. This will give us exact age. We use 365.25 days the calculation to account for leap years.

Note: There are multiple ways to calculate integer age and everyone has there own preference. The following example illustrates one method that used quite regularly in the Alberts lab.

  • Often age is required in integer years. We use the FLOOR command and add one year.

SELECT sname, FLOOR((statdate - birth)/365.25) + 1 AS age FROM biograph;

This query returns sname and age in integer years. Note that 1 year is added to the calculation. The result is actually what “year of life” the individual is in on the statdate. Using this method an infant between birth and the first birthday will be considered of age 1, or in the first year of life. An individual who is 6.3 years old will be considered of age 7, or in the 7th year of life.

Change a table name

  • Changing a table name

ALTER TABLE oldtablename RENAME TO newtablename

This query will change the name of an existing table. Simply replace "oldtablename" and "newtablename" with your specific original table name and your revised table name, respectively.

Concatenate table columns

  • Concatenating table columns

SELECT *, season || year as seasonyear FROM sampletable

This query will concatenate the data from two columns in an existing table into a single column. In the sample query above, sampletable has two columns called "season" and "year". By running this query, a new column is created that concatenates season and year. For example, a season value of "wet" and a year value of "2003" concatenate to form a new value of "wet2003" in the output table.

Extract year (or day or month) from a date field

  • Extracting year from a date field

SELECT *, extract (year from date) as year from sampletable

This query will extract the year from a date column and save the value in a new column called "year". In the sample query above, sampletable already has a column for the date (called "date"). By running this query, a new column is created that extracts the year portion of the date column and saves the output in a new column. For example, a date value of 1976-09-20 will result in a year value of 1976. Similar query structure can be used to extract the day and month from a date field by replacing "year" in the above query with either "day" or "month".

SQL_Language (last edited 2021-09-09 19:49:10 by JakeGordon)

Wiki content based upon work supported by the National Science Foundation under Grant Nos. 0323553 and 0323596. Any opinions, findings, conclusions or recommendations expressed in this material are those of the wiki contributor(s) and do not necessarily reflect the views of the National Science Foundation.