[Babase] POTENTIAL_DADS view can be tested-ish

Karl O. Pinc kop at meme.com
Sun Oct 17 22:52:23 EDT 2010



On 10/17/2010 08:31:47 PM, Susan Alberts wrote:
> thanks, i love it. probably the column age_days should be called  
> pdad_age_days and the column age_years should be pdad_age_years, just 
> 
> to avoid confusion. I know the names are long but I think clarity is  
> good here, as there are three animals in each row that age could 
> refer
>  
> to (in theory).

Ok.  


(You saw the comments/questions previously embedded in 
context below?)

FYI, here's the latest query.  I guess I'm ready
to have someone review it.

We're looking at a bit
under 5 minutes to generate the entire result set.

Note that all benchmarks are on a second run of the
query, after all possible disk I/O has been done.
If another large query has pushed, say, members
out of ram or if the machine has just booted, etc.,
then there will be extra time required to read the
disk.

I decided to use the MATERNITIES and ACTOR_ACTEES
views to keep it simpler.  As-is there seems no
performance impact.

Note that this is still not documented, and neither
is the zdate_grp column I added to the MATERNITIES
view.

SELECT maternities.child_bioid AS bioid
     , maternities.child AS kid
     , maternities.mom AS mom
     , maternities.zdate AS zdate
     , maternities.zdate_grp AS grp
     , pdads.sname AS pdad
     , CASE
         WHEN rankdates.ranked <= maternities.zdate
           THEN 'A'
         WHEN maturedates.matured <= maternities.zdate
           THEN 'S'
         ELSE 'O'
       END
       AS status
     , maternities.zdate - pdads.birth AS pdad_age_days
     , trunc((maternities.zdate - pdads.birth) / 365.25, 1) 
       AS pdad_age_years
     , (SELECT count(*)
          FROM members as dadmembers
          WHERE dadmembers.sname = pdads.sname
                AND dadmembers.grp = maternities.zdate_grp
                AND dadmembers.date < maternities.zdate
                AND dadmembers.date >= maternities.zdate - 5)
       AS estrous_presence
     , (SELECT count(*)
          FROM actor_actees
          WHERE actor_actees.date < maternities.zdate
                AND actor_actees.date >= maternities.zdate - 5
                AND (actor_actees.act = 'M'
                     OR actor_actees.act = 'E')
                AND actor_actees.actor = pdads.sname
                AND actor_actees.actee = maternities.mom)
       AS estrous_me
     , (SELECT count(*)
          FROM actor_actees
          WHERE actor_actees.date < maternities.zdate
                AND actor_actees.date >= maternities.zdate - 5
                AND actor_actees.act = 'C'
                AND actor_actees.actor = pdads.sname
                AND actor_actees.actee = maternities.mom)
       AS estrous_c

  FROM maternities
       JOIN biograph AS pdads
            ON (pdads.sname
                IN (SELECT dadmembers.sname
                      FROM members AS dadmembers
                      WHERE dadmembers.sname = pdads.sname
                            AND dadmembers.grp = maternities.zdate_grp
                            AND dadmembers.date < maternities.zdate
                            AND dadmembers.date >= maternities.zdate - 
5))

       LEFT OUTER JOIN rankdates
            ON (rankdates.sname = pdads.sname)
       LEFT OUTER JOIN maturedates
            ON (maturedates.sname = pdads.sname)
  WHERE pdads.sex = 'M'
        -- Speed things up by eliminating potential dads
        -- who could not be in interpolate into the mom's group
        -- during the fertile period.
        AND pdads.statdate >= maternities.zdate - 5 - 14
        -- Potential dad must be at least 6 years old on the zdate.
        AND trunc((maternities.zdate - pdads.birth) / 365.25, 1) >= 6;



> Susan
> 
> On Oct 17, 2010, at 5:22 PM, Karl O. Pinc wrote:
> 
> > On 10/17/2010 01:11:35 PM, Susan Alberts wrote:
> >> OK, it goes pretty fast when you query for a small subset. Very
> nice.
> >>
> >> We do want the Mom to be a column in there I realized. Just 
> calling
> >> it
> >>
> >> Mom is fine.
> >
> > Ok.  Done.
> >
> >
> >> On Oct 17, 2010, at 2:09 PM, Susan Alberts wrote:
> >>
> >>>
> >>> On Oct 17, 2010, at 2:06 PM, Karl O. Pinc wrote:
> >>>
> >>>> On 10/17/2010 12:51:32 PM, Susan Alberts wrote:
> >
> >>>>>> You can try out the POTENTIAL_DADS view.
> >>>>>
> >>>>> I have done so. very nice. Quite slow browsing when I first
> >> brought
> >>>>> it
> >>>>>
> >>>>> up, but that might have been the connectiivty problems you and
> >> Ryan
> >>>>> mentioned. On the other hand, it is a big query to build to
> view.
> >
> > Note that there is no preset order to the results so, in theory,
> > browsing is dangerous. (As with any regular unordered table or
> > other unordered views data _could_ come back in any order.)
> >
> >>>>
> >>>> It takes about a minute and a half to process.   The rest
> >>>> of the time is ppa and delivering the data to the browser.
> >
> > I completely redid some of the structure.  I will re-send the
> > query when I finish testing.  It now will likely take a lot
> > longer to look at the entire view.
> >
> > There was a serious bug -- the potential dad had to be there
> > on the zdate.
> >
> >>>>
> >>>> Well, postgres can compute age in years based on calendar date.
> >>>> I have not thought through the implications but I imagine that 
> it
> >>>> has a lot to do with whether you were born in a leap year and 
> how
> >>>> the rounding all works out.
> >>>
> >>> Ah I see. I think it is more important for us to be consistent
> >>> across different analyses, and I have always used 365.25 for
> years,
> >>
> >>> I think Jeanne uses months which can be calculated from the days
> >>> column. So my view is that we are OK on that column.
> >
> > Ok.  Good by me.  Note that, I think, some of the database checks
> > use "real" calendar age.  I.E.  Born on the 31s?  Then the 31st
> > is your birthday.  I can see possibilities for age computations
> > to vary if dividing by 365.25 at the end of longer months, 
> depending
> > on when birth happens in the leap year cycle.  Maybe.  Anyway,
> > a heads-up.
> >
> >>>>
> >>>>>
> >>>>> One thing that should be fixed is the count of days the male 
> was
> >>>>> present in the female's five-day window. Currently we have 6
> >>>>> possible
> >>>>>
> >>>>> days because I think you counted Z-day as well as the five days
> >>>>> prior.
> >>>>>
> >>>>> We only want the five days prior.
> >>>>
> >>>> Ok.  Do you want a male to be a potential dad if they were
> >>>> present on the zdate but not any of the 5 prior days?
> >>>
> >>> No
> >
> > Ok.  And we don't want MCE related counts then too.
> >
> > But all kid/mom group membership and age and status is still
> > computed based on zdate.  Right?
> >
> >>>
> >>>>
> >>>>>
> >>>>> Thanks for sending the query, I have not looked through it in
> >> detail
> >>>>> but will if you want.
> >
> > Eventually another pair of eyes would be good.
> >
> >>>>
> >>>> You could.  It's a lot simpler if you base it on the maternities
> >>>> and actor_actee views but that makes it run slower.  At least
> >> 1/3rd
> >>>> longer and probably more like twice as slow.  It may be
> >>>> a false optimization to tweak it like this, more trouble in the
> >>>> long run to maintain, but I've not figured that out yet.
> >>>> This is the sort of syntax-ie stuff I'm thinking about.
> >>>> (Note that the whole first series of joins, up until you
> >>>> get to the pdad stuff, is a re-creation of maternities.)
> >>>>
> >>>> Any thoughts on performance?  If you ask for a particular
> >>>> kid then it runs fast enough no matter how it's formulated.
> >>>
> >>> OK I will do some querying to test and see.
> >
> > Is it typical you return the entire result set or use the
> > view as a table in another query?   If not then it can
> > take twice as long and we don't really care.
> >
> >
> >
> >
> >>>
> >>> Susan
> >>>
> >>>>
> >>>>>
> >>>>> Susan
> >>>>>
> >>>>>>
> >>>>>> FYI, here's the query (more or less, I keep fussing):
> >>>>>>
> >>>>>> SELECT kids.bioid AS bioid
> >>>>>>  , kids.sname AS kid
> >>>>>>  , zdates.date AS zdate
> >>>>>>  , kidmembers.grp AS grp
> >>>>>>  , pdads.sname AS pdad
> >>>>>>  , CASE
> >>>>>>      WHEN rankdates.ranked <= pdadmembers.date
> >>>>>>        THEN 'A'
> >>>>>>      WHEN maturedates.matured <= pdadmembers.date
> >>>>>>        THEN 'S'
> >>>>>>      ELSE 'O'
> >>>>>>    END
> >>>>>>    AS status
> >>>>>>  , pdadmembers.date - pdads.birth AS age_days
> >>>>>>  , trunc((pdadmembers.date - pdads.birth) / 365.25,
> >>>>>>          1) AS age_years
> >>>>>>  , (SELECT count(*)
> >>>>>>       FROM members as dadmembers
> >>>>>>       WHERE dadmembers.sname = pdads.sname
> >>>>>>             AND dadmembers.grp = kidmembers.grp
> >>>>>>             AND dadmembers.date <= zdates.date
> >>>>>>             AND dadmembers.date >= zdates.date - 5)
> >>>>>>    AS estrous_presence
> >>>>>>  , (SELECT count(*)
> >>>>>>       FROM interact_data, parts AS actor, parts AS actee
> >>>>>>       WHERE actor.sname = pdads.sname
> >>>>>>             AND actee.sname = cycles.sname
> >>>>>>             AND interact_data.iid = actor.iid
> >>>>>>             AND interact_data.iid = actee.iid
> >>>>>>             AND interact_data.date <= zdates.date
> >>>>>>             AND interact_data.date >= zdates.date - 5
> >>>>>>             AND (interact_data.act = 'M'
> >>>>>>                  OR interact_data.act = 'E'))
> >>>>>>    AS estrous_me
> >>>>>>  , (SELECT count(*)
> >>>>>>       FROM interact_data, parts AS actor, parts AS actee
> >>>>>>       WHERE actor.sname = pdads.sname
> >>>>>>             AND actee.sname = cycles.sname
> >>>>>>             AND interact_data.iid = actor.iid
> >>>>>>             AND interact_data.iid = actee.iid
> >>>>>>             AND interact_data.date <= zdates.date
> >>>>>>             AND interact_data.date >= zdates.date - 5
> >>>>>>             AND interact_data.act = 'C')
> >>>>>>    AS estrous_c
> >>>>>>
> >>>>>> FROM biograph as kids
> >>>>>>    JOIN pregs
> >>>>>>      ON (pregs.pid = kids.pid)
> >>>>>>    JOIN cycpoints AS zdates
> >>>>>>      ON (zdates.cpid = pregs.conceive)
> >>>>>>    JOIN cycles
> >>>>>>      ON (cycles.cid = zdates.cid)
> >>>>>>    JOIN members AS kidmembers
> >>>>>>      ON (kidmembers.date = zdates.date
> >>>>>>          AND kidmembers.sname = cycles.sname)
> >>>>>>    JOIN members AS pdadmembers
> >>>>>>         ON (pdadmembers.date = zdates.date
> >>>>>>             AND pdadmembers.grp = kidmembers.grp)
> >>>>>>    JOIN biograph AS pdads
> >>>>>>         ON (pdadmembers.sname = pdads.sname)
> >>>>>>    LEFT OUTER JOIN rankdates
> >>>>>>         ON (rankdates.sname = pdads.sname)
> >>>>>>    LEFT OUTER JOIN maturedates
> >>>>>>         ON (maturedates.sname = pdads.sname)
> >>>>>>
> >>>>>> WHERE pdads.sex = 'M'
> >>>>>>     AND trunc((pdadmembers.date - pdads.birth) / 365.25, 1) >=
> >> 6
> >>>>>>     AND pdads.sname
> >>>>>>         IN (SELECT dadmembers.sname
> >>>>>>               FROM members AS dadmembers
> >>>>>>               WHERE dadmembers.sname = pdads.sname
> >>>>>>                     AND dadmembers.grp = kidmembers.grp
> >>>>>>                     AND dadmembers.date <= zdates.date
> >>>>>>                     AND dadmembers.date >= zdates.date - 5);
> >>>>>>
> >>>>>>
> >>>>>> Karl <kop at meme.com>
> >>>>>> Free Software:  "You don't pay back, you pay forward."
> >>>>>>              -- Robert A. Heinlein
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Babase mailing list
> >>>>>> Babase at www.eco.princeton.edu
> >>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>> Susan Alberts, Professor of Biology, Duke University, Box 
> 90338,
> >>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Babase mailing list
> >>>>> Babase at www.eco.princeton.edu
> >>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Karl <kop at meme.com>
> >>>> Free Software:  "You don't pay back, you pay forward."
> >>>>               -- Robert A. Heinlein
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Babase mailing list
> >>>> Babase at www.eco.princeton.edu
> >>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>
> >>> ------------------------------------------------------------
> >>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
> >>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Babase mailing list
> >>> Babase at www.eco.princeton.edu
> >>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>
> >> ------------------------------------------------------------
> >> Susan Alberts, Professor of Biology, Duke University, Box 90338,
> >> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Babase mailing list
> >> Babase at www.eco.princeton.edu
> >> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>
> >>
> >
> >
> >
> >
> > Karl <kop at meme.com>
> > Free Software:  "You don't pay back, you pay forward."
> >                 -- Robert A. Heinlein
> >
> >
> > _______________________________________________
> > Babase mailing list
> > Babase at www.eco.princeton.edu
> > http://www.eco.princeton.edu/mailman/listinfo/babase
> 
> ------------------------------------------------------------
> Susan Alberts, Professor of Biology, Duke University, Box 90338,  
> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> 
> 
> 
> 
> _______________________________________________
> Babase mailing list
> Babase at www.eco.princeton.edu
> http://www.eco.princeton.edu/mailman/listinfo/babase
> 
> 




Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein




More information about the Babase mailing list