1. What does ending your query with “limit 100” do and when should you use this statement?

  1. What data will this query return?
    select * from biograph, members 
    where matgrp = grp and 
    biograph.sname = members.sname and 
    sex = 'M' and 
    status = 0 and 
    dcause = 0 and 
    statdate = date 
    order by matgrp, pid;
  2. What data will this query return?
    select * from biograph, members 
    where biograph.matgrp = members.grp and 
    biograph.sname = members.sname and 
    biograph.sex = 'M' and 
    biograph.status = 0 and 
    biograph.dcause = 0 and 
    biograph.statdate = members.date 
    order by biograph.matgrp, biograph.pid;
  3. What data will this query return?
    select *
      from biograph
           join members on (biograph.matgrp = members.grp and 
                            biograph.sname = members.sname)
       where biograph.sex = 'M' and 
             biograph.status = 0 and 
             biograph.dcause = 0 and 
             biograph.statdate = members.date 
       order by biograph.matgrp, biograph.pid;
  4. What data will this query return?
    -- This is an illustrative query used in the Babase Quiz
    -- It is fiendishly tricky so pay close attention!
    SELECT *
      FROM biograph
           JOIN members ON (biograph.matgrp = members.grp AND
                            biograph.sname = members.sname)
      WHERE biograph.sex = 'M' AND   --  M means male (tricky 'eh?)
            biograph.status = 0 AND
            biograph.dcause = 0 AND
            biograph.statdate = members.date 
      ORDER BY biograph.matgrp, biograph.pid;
  5. What are the pros and cons of the various queries above. Which do you prefer?

  1. Write a query that shows each date that each male offspring spent in his maternal group -- the one codified in the GROUPS table.
    SELECT biograph.matgrp, biograph.sname, members.date
      FROM biograph
           JOIN members ON (members.grp = biograph.matgrp AND
                            members.sname = biograph.sname)
      WHERE biograph.sex = 'M'
      ORDER BY biograph.matgrp, biograph.sname, members.date;
  2. Write a query that returns the number of days that each male offspring spends in his maternal group -- the one codified in the GROUPS table.
    SELECT biograph.matgrp, biograph.sname, COUNT(*) AS daysingrp
      FROM biograph
           JOIN members ON (members.grp = biograph.matgrp AND
                            members.sname = biograph.sname)
      WHERE biograph.sex = 'M'
      GROUP BY biograph.matgrp, biograph.sname
      ORDER BY biograph.matgrp, biograph.sname;

    Hot tip to takeaway from the above: First get the rows you want, then analyze them.

  3. Write a query that returns the mean number of days, per each group defined in the GROUPS table, that a male offspring spends in his maternal group. Include a column showing how many males were in the sample. Having done that, extend the query so it shows not only the mean but the standard deviation.
    -- Use a transaction, so even if there's an error the
    -- table we create will go away automatically when the transaction
    -- rolls back.
    BEGIN;
    
    -- Create an intermediate table in my personal schema to query for the real results.
    SELECT biograph.matgrp, biograph.sname, COUNT(*) AS daysingrp
      INTO TABLE myschema.foobar
      FROM biograph
           JOIN members ON (members.grp = biograph.matgrp AND
                            members.sname = biograph.sname)
      WHERE biograph.sex = 'M'
      GROUP BY biograph.matgrp, biograph.sname;
    
    SELECT foobar.matgrp, COUNT(*), AVG(foobar.daysingrp)
      FROM foobar
      GROUP BY foobar.matgrp
      ORDER BY foobar.matgrp;
    
    -- Undo all the changes we made to the database, i.e. forget
    -- about the intermediate table we created.
    ROLLBACK;
    or better:
    -- Create a temporary table that automatically goes away when this
    -- sql session is done executing.
    SELECT biograph.matgrp, biograph.sname, COUNT(*) AS daysingrp
      INTO TEMP TABLE foobar
      FROM biograph
           JOIN members ON (members.grp = biograph.matgrp AND
                            members.sname = biograph.sname)
      WHERE biograph.sex = 'M'
      GROUP BY biograph.matgrp, biograph.sname;
    
    SELECT foobar.matgrp, COUNT(*), AVG(foobar.daysingrp)
      FROM foobar
      GROUP BY foobar.matgrp
      ORDER BY foobar.matgrp;
    or even better:
    -- Instead of a real table run a sub-query and use those results
    -- as if they were a real table.
    SELECT foobar.matgrp, COUNT(*), AVG(foobar.daysingrp)
      FROM (SELECT biograph.matgrp, biograph.sname, COUNT(*) AS daysingrp
              FROM biograph
                   JOIN members ON (members.grp = biograph.matgrp AND
                                    members.sname = biograph.sname)
              WHERE biograph.sex = 'M'
              GROUP BY biograph.matgrp, biograph.sname) AS foobar
      GROUP BY foobar.matgrp
      ORDER BY foobar.matgrp;
    -- And, if we want standard devation:
    SELECT foobar.matgrp, COUNT(*), AVG(foobar.daysingrp), STDDEV(foobar.daysingrp)
      FROM (SELECT biograph.matgrp, biograph.sname, COUNT(*) AS daysingrp
              FROM biograph
                   JOIN members ON (members.grp = biograph.matgrp AND
                                    members.sname = biograph.sname)
              WHERE biograph.sex = 'M'
              GROUP BY biograph.matgrp, biograph.sname) AS foobar
      GROUP BY foobar.matgrp
      ORDER BY foobar.matgrp;
  4. What are the four things that are wrong with using the above queries to analyze how long males stay in their maternal group before dispersing? I.e. Given that the purpose is to analyze male dispersal what's wrong with the way the questions above were framed and what data peculiarities get in the way of obtaining a good answer? Answer: 1) Some of the males die before they disperse. 2) Some of the groups fission before the juvenile males disperse so the males "move" to another group as the result of fission. 3) Some groups were not under continuous observation and when there is no observation for long enough individuals are automatically placed in the unknown group. 4) As of the end of the most recent data entered some of the males are not old enough to have dispersed.
  5. Write a query that shows, for each male, how many days the male stays in it's maternal group before dispersing. Hint: Use the DISPSERSEDATES table.
    -- We can use the Babase julian() function, which returns
    -- the date as a number of days from a starting point in time.
    
    SELECT biograph.sname, julian(dispersedates.dispersed) - julian(biograph.birth)
      FROM biograph JOIN dispersedates ON (dispersedates.sname = biograph.sname)
      ORDER BY biograph.sname;
    
    -- But really that is overkill because it turns out that the default
    -- result when you subtract one date from another in PostgreSQL is
    -- the number of days between the dates.
    SELECT biograph.sname, dispersedates.dispersed - biograph.birth
      FROM biograph JOIN dispersedates ON (dispersedates.sname = biograph.sname)
      ORDER BY biograph.sname;
  6. Ignoring problems due to intermittent observational frequency of the groups, what is the per-group sample size, mean, and standard deviation of the number of days a male stays in each group before dispersing?
    SELECT biograph.matgrp AS grp,
              count(*) AS samplesize,
              avg(dispersedates.dispersed - biograph.birth) AS mean,
              stddev(dispersedates.dispersed - biograph.birth) AS standarddeviation
     FROM biograph JOIN dispersedates ON (dispersedates.sname = biograph.sname)
     GROUP BY biograph.matgrp
     ORDER BY biograph.matgrp;

ExtendedQuizQuestions (last edited 2014-04-05 16:20:38 by KarlPinc)

Wiki content based upon work supported by the National Science Foundation under Grant Nos. 0323553 and 0323596. Any opinions, findings, conclusions or recommendations expressed in this material are those of the wiki contributor(s) and do not necessarily reflect the views of the National Science Foundation.