- What does ending your query with “limit 100” do and when should you use this statement?
- What data will this query return? {{{select * from biograph, members
where matgrp = grp and biograph.sname = members.sname and sex = 'M' and status = 0 and dcause = 0 and statdate = date order by matgrp, pid; }}}
- What data will this query return? {{{select * from biograph, members
where biograph.matgrp = members.grp and biograph.sname = members.sname and biograph.sex = 'M' and biograph.status = 0 and biograph.dcause = 0 and biograph.statdate = members.date order by biograph.matgrp, biograph.pid; }}}
- What data will this query return? {{{select *
- from biograph
- join members on (biograph.matgrp = members.grp and
- biograph.sname = members.sname)
- where biograph.sex = 'M' and
- biograph.status = 0 and biograph.dcause = 0 and biograph.statdate = members.date
- join members on (biograph.matgrp = members.grp and
- from biograph
}}}
- What data will this query return? {{{-- This is an illustrative query used in the Babase Quiz
-- It is fiendishly tricky so pay close attention! SELECT *
- FROM biograph
- JOIN members ON (biograph.matgrp = members.grp AND
- biograph.sname = members.sname)
- biograph.status = 0 AND biograph.dcause = 0 AND biograph.statdate = members.date
- JOIN members ON (biograph.matgrp = members.grp AND
}}}
- What are the pros and cons of the various queries above. Which do you prefer?
- Answer:
- It's a matter of both taste and purpose. Your mileage may vary. While these methods are considered safe for most adults, always contact your physician before use as individual reactions may vary. De gustibus non est disputandum. When column names are not qualified
with the tables to which they belong (by writing tablename.columname) the query is quick to type and answers can be obtained rapidly. But fully qualifying all the column names has some advantages. It makes the query easier to understand and it makes the query easier to extend; extending a query by joining another table will break a query which uses unqualified column names if the new table shares a column name with any of the previously used tables. Queries do not usually spring into existence all at once, they are usually developed step by step with additional tables added as the query develops. Getting into the habit of fully qualifying all your column names can keep you focused as you write a more complex query because you don't have to go back and qualify columns when you want to concentrate on extending the query you are building. Likewise, breaking a query into multiple lines by "clause", FROM, WHERE, etc., and indenting the lines takes longer but makes the query easier to understand. This can help in developing a complex query and when going back to read and re-use an old query. When the query is simple and will never be re-used there's no point in expending the effort. At other times formatting can be very useful.
Using JOIN table ON (condition) makes your queries more clear, to both the reader and the computer, by explicitly specifying which columns are used to join tables. These correspond to the lines in the entity relationship diagrams that document the database structure. If the join relationships are buried in the WHERE clause along with the conditions that further restrict which rows from individual tables should be incorporated into the results it is more difficult to understand what the query is doing. Using capitalization to separate the keywords that are part of SQL, like SELECT, FROM, WHERE, AND, etc., from the parts of the query that are specific to Babase helps draw the eye to important aspects of the query and ignore unimportant aspects. At times the difference between an AND and an OR can be critical. At other times it's more important to focus on which tables are involved. Visually separating the SQL keywords from the source of the data helps to debug syntax and differentiate semantically similar queries. Remember, a query that can't be read by a human, even one that runs perfectly, is useless because nobody knows what the output means. Finally, adding comments, both specific comments describing what parts of the query do and why they are there, and general comments describing the overall purpose of the query helps immensely when going back to re-visit an old query. Often an analysis may take some time, and require corrections to both data and queries and require multiple runs of the same query. It is usually worth the up-front time to make it easy to go back and understand just what your queries do.
- Write a query that shows each date that each male offspring spent in his maternal group -- the one codified in the GROUPS table. {{{SELECT biograph.matgrp, biograph.sname, members.date
- FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
- FROM biograph
}}}
- Write a query that returns the number of days that each male offspring spends in his maternal group -- the one codified in the GROUPS table. {{{SELECT biograph.matgrp, biograph.sname, count(*) AS daysingrp
- FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
- FROM biograph
}}}
Hot tip: First get the rows you want, then analyze them.
- Write a query that returns the mean number of days, per each group defined in the GROUPS table, that a male offspring spends in his maternal group. Include a column showing how many males were in the sample. Having done that, extend the query so it shows not only the mean but the standard deviation. {{{-- Use a transaction, so even if there's an error the
-- table we create will go away automatically when the transaction -- rolls back. BEGIN;
-- Create an intermediate table in my personal schema to get the real results from. SELECT biograph.matgrp, biograph.sname, count(*) AS daysingrp
- INTO TABLE myschema.foobar FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
SELECT foobar.matgrp, count(*), avg(foobar.daysingrp)
- FROM foobar GROUP BY foobar.matgrp ORDER BY foobar.matgrp;
-- Undo all the changes we made to the database, i.e. forget -- about the intermediate table we created. ROLLBACK; }}}
- or better: {{{-- Create a temporary table that automatically goes away when this
-- sql session is done executing. SELECT biograph.matgrp, biograph.sname, count(*) AS daysingrp
- INTO TEMP TABLE foobar FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
SELECT foobar.matgrp, count(*), avg(foobar.daysingrp)
- FROM foobar GROUP BY foobar.matgrp ORDER BY foobar.matgrp;
}}}
- or even better: {{{-- Instead of a real table run a query and use those results
-- as if they were a real table. SELECT foobar.matgrp, count(*), avg(foobar.daysingrp)
- FROM (SELECT biograph.matgrp, biograph.sname, count(*) AS daysingrp
- FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
- FROM biograph
}}}
- {{{-- And, if we want standard devation:
SELECT foobar.matgrp, count(*), avg(foobar.daysingrp), stddev(foobar.daysingrp)
- FROM (SELECT biograph.matgrp, biograph.sname, count(*) AS daysingrp
- FROM biograph
- JOIN members ON (members.grp = biograph.matgrp AND
- members.sname = biograph.sname)
- JOIN members ON (members.grp = biograph.matgrp AND
- FROM biograph
}}}
- What are the four things that are wrong with using the above queries to analyze how long males stay in their maternal group before dispersing? I.e. What's wrong with the way the questions above were framed and what data peculiarities get in the way of obtaining a good answer? Answer: 1) Some of the males die before they disperse. 2) Some of the groups fission before the juvenile males disperse so the males "move" to another group as the result of fission. 3) Some groups were not under continuous observation and when there is no observation for long enough individuals are automatically placed in the unknown group. 4) As of the end of the most recent data entered some of the males are not old enough to have dispersed.
