[Babase] POTENTIAL_DADS view can be tested-ish

Karl O. Pinc kop at meme.com
Wed Oct 20 22:57:34 EDT 2010


Susan,

On 10/18/2010 11:41:55 AM, Susan Alberts wrote:
> Thanks, I've gone through the query and I think it is lovely. No  
> seriously, I learned a few things from it (that I will probably  
> immediately forget but now I know I can go look there for how to do a 
> 
> couple of things I'd like to do from time to time). I didn't see any  
> problems, it seemed crystal clear to me. But, I don't have a lot of  
> querying chops at the moment so I am not the best judge.

I think I see a problem.  The query is looking for all the
dads that are in the mom's group during the fertile period,
but is defining "the mom's group" to be the mom's group as
of the zdate.  If the group fissions during the fertile period
and the mom changes groups you won't get good answers.

What you want is to test the potential dad's group on each
day of the fertile period against the mom's group on the
same day of the fertile period.  Right?

To better manage group fissions you probably _really_
want to test the potential dad's supergroup on each
day of the fertile period against the mom's supergroup
on that same day.  That way during fission you can census
moms and dads in sub-groups that have not yet become
permanent and the test will still find the dads in
the same supergroup as the moms.

It's all do-able. and relatively straightforward,
and will probably perform fine when
querying restricted sub-sets of the data although
will probably also take even longer than it does now
if you're looking at the whole view at once.

Where do you want to go with this?

> 
> Susan
> 
> On Oct 17, 2010, at 10:52 PM, Karl O. Pinc wrote:
> 
> >
> >
> > On 10/17/2010 08:31:47 PM, Susan Alberts wrote:
> >> thanks, i love it. probably the column age_days should be called
> >> pdad_age_days and the column age_years should be pdad_age_years,
> just
> >>
> >> to avoid confusion. I know the names are long but I think clarity
> is
> >> good here, as there are three animals in each row that age could
> >> refer
> >>
> >> to (in theory).
> >
> > Ok.
> >
> >
> > (You saw the comments/questions previously embedded in
> > context below?)
> >
> > FYI, here's the latest query.  I guess I'm ready
> > to have someone review it.
> >
> > We're looking at a bit
> > under 5 minutes to generate the entire result set.
> >
> > Note that all benchmarks are on a second run of the
> > query, after all possible disk I/O has been done.
> > If another large query has pushed, say, members
> > out of ram or if the machine has just booted, etc.,
> > then there will be extra time required to read the
> > disk.
> >
> > I decided to use the MATERNITIES and ACTOR_ACTEES
> > views to keep it simpler.  As-is there seems no
> > performance impact.
> >
> > Note that this is still not documented, and neither
> > is the zdate_grp column I added to the MATERNITIES
> > view.
> >
> > SELECT maternities.child_bioid AS bioid
> >     , maternities.child AS kid
> >     , maternities.mom AS mom
> >     , maternities.zdate AS zdate
> >     , maternities.zdate_grp AS grp
> >     , pdads.sname AS pdad
> >     , CASE
> >         WHEN rankdates.ranked <= maternities.zdate
> >           THEN 'A'
> >         WHEN maturedates.matured <= maternities.zdate
> >           THEN 'S'
> >         ELSE 'O'
> >       END
> >       AS status
> >     , maternities.zdate - pdads.birth AS pdad_age_days
> >     , trunc((maternities.zdate - pdads.birth) / 365.25, 1)
> >       AS pdad_age_years
> >     , (SELECT count(*)
> >          FROM members as dadmembers
> >          WHERE dadmembers.sname = pdads.sname
> >                AND dadmembers.grp = maternities.zdate_grp
> >                AND dadmembers.date < maternities.zdate
> >                AND dadmembers.date >= maternities.zdate - 5)
> >       AS estrous_presence
> >     , (SELECT count(*)
> >          FROM actor_actees
> >          WHERE actor_actees.date < maternities.zdate
> >                AND actor_actees.date >= maternities.zdate - 5
> >                AND (actor_actees.act = 'M'
> >                     OR actor_actees.act = 'E')
> >                AND actor_actees.actor = pdads.sname
> >                AND actor_actees.actee = maternities.mom)
> >       AS estrous_me
> >     , (SELECT count(*)
> >          FROM actor_actees
> >          WHERE actor_actees.date < maternities.zdate
> >                AND actor_actees.date >= maternities.zdate - 5
> >                AND actor_actees.act = 'C'
> >                AND actor_actees.actor = pdads.sname
> >                AND actor_actees.actee = maternities.mom)
> >       AS estrous_c
> >
> >  FROM maternities
> >       JOIN biograph AS pdads
> >            ON (pdads.sname
> >                IN (SELECT dadmembers.sname
> >                      FROM members AS dadmembers
> >                      WHERE dadmembers.sname = pdads.sname
> >                            AND dadmembers.grp =
> maternities.zdate_grp
> >                            AND dadmembers.date < maternities.zdate
> >                            AND dadmembers.date >= maternities.zdate
> -
> > 5))
> >
> >       LEFT OUTER JOIN rankdates
> >            ON (rankdates.sname = pdads.sname)
> >       LEFT OUTER JOIN maturedates
> >            ON (maturedates.sname = pdads.sname)
> >  WHERE pdads.sex = 'M'
> >        -- Speed things up by eliminating potential dads
> >        -- who could not be in interpolate into the mom's group
> >        -- during the fertile period.
> >        AND pdads.statdate >= maternities.zdate - 5 - 14
> >        -- Potential dad must be at least 6 years old on the zdate.
> >        AND trunc((maternities.zdate - pdads.birth) / 365.25, 1) >=
> 6;
> >
> >
> >
> >> Susan
> >>
> >> On Oct 17, 2010, at 5:22 PM, Karl O. Pinc wrote:
> >>
> >>> On 10/17/2010 01:11:35 PM, Susan Alberts wrote:
> >>>> OK, it goes pretty fast when you query for a small subset. Very
> >> nice.
> >>>>
> >>>> We do want the Mom to be a column in there I realized. Just
> >> calling
> >>>> it
> >>>>
> >>>> Mom is fine.
> >>>
> >>> Ok.  Done.
> >>>
> >>>
> >>>> On Oct 17, 2010, at 2:09 PM, Susan Alberts wrote:
> >>>>
> >>>>>
> >>>>> On Oct 17, 2010, at 2:06 PM, Karl O. Pinc wrote:
> >>>>>
> >>>>>> On 10/17/2010 12:51:32 PM, Susan Alberts wrote:
> >>>
> >>>>>>>> You can try out the POTENTIAL_DADS view.
> >>>>>>>
> >>>>>>> I have done so. very nice. Quite slow browsing when I first
> >>>> brought
> >>>>>>> it
> >>>>>>>
> >>>>>>> up, but that might have been the connectiivty problems you 
> and
> >>>> Ryan
> >>>>>>> mentioned. On the other hand, it is a big query to build to
> >> view.
> >>>
> >>> Note that there is no preset order to the results so, in theory,
> >>> browsing is dangerous. (As with any regular unordered table or
> >>> other unordered views data _could_ come back in any order.)
> >>>
> >>>>>>
> >>>>>> It takes about a minute and a half to process.   The rest
> >>>>>> of the time is ppa and delivering the data to the browser.
> >>>
> >>> I completely redid some of the structure.  I will re-send the
> >>> query when I finish testing.  It now will likely take a lot
> >>> longer to look at the entire view.
> >>>
> >>> There was a serious bug -- the potential dad had to be there
> >>> on the zdate.
> >>>
> >>>>>>
> >>>>>> Well, postgres can compute age in years based on calendar 
> date.
> >>>>>> I have not thought through the implications but I imagine that
> >> it
> >>>>>> has a lot to do with whether you were born in a leap year and
> >> how
> >>>>>> the rounding all works out.
> >>>>>
> >>>>> Ah I see. I think it is more important for us to be consistent
> >>>>> across different analyses, and I have always used 365.25 for
> >> years,
> >>>>
> >>>>> I think Jeanne uses months which can be calculated from the 
> days
> >>>>> column. So my view is that we are OK on that column.
> >>>
> >>> Ok.  Good by me.  Note that, I think, some of the database checks
> >>> use "real" calendar age.  I.E.  Born on the 31s?  Then the 31st
> >>> is your birthday.  I can see possibilities for age computations
> >>> to vary if dividing by 365.25 at the end of longer months,
> >> depending
> >>> on when birth happens in the leap year cycle.  Maybe.  Anyway,
> >>> a heads-up.
> >>>
> >>>>>>
> >>>>>>>
> >>>>>>> One thing that should be fixed is the count of days the male
> >> was
> >>>>>>> present in the female's five-day window. Currently we have 6
> >>>>>>> possible
> >>>>>>>
> >>>>>>> days because I think you counted Z-day as well as the five
> days
> >>>>>>> prior.
> >>>>>>>
> >>>>>>> We only want the five days prior.
> >>>>>>
> >>>>>> Ok.  Do you want a male to be a potential dad if they were
> >>>>>> present on the zdate but not any of the 5 prior days?
> >>>>>
> >>>>> No
> >>>
> >>> Ok.  And we don't want MCE related counts then too.
> >>>
> >>> But all kid/mom group membership and age and status is still
> >>> computed based on zdate.  Right?
> >>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks for sending the query, I have not looked through it in
> >>>> detail
> >>>>>>> but will if you want.
> >>>
> >>> Eventually another pair of eyes would be good.
> >>>
> >>>>>>
> >>>>>> You could.  It's a lot simpler if you base it on the
> maternities
> >>>>>> and actor_actee views but that makes it run slower.  At least
> >>>> 1/3rd
> >>>>>> longer and probably more like twice as slow.  It may be
> >>>>>> a false optimization to tweak it like this, more trouble in 
> the
> >>>>>> long run to maintain, but I've not figured that out yet.
> >>>>>> This is the sort of syntax-ie stuff I'm thinking about.
> >>>>>> (Note that the whole first series of joins, up until you
> >>>>>> get to the pdad stuff, is a re-creation of maternities.)
> >>>>>>
> >>>>>> Any thoughts on performance?  If you ask for a particular
> >>>>>> kid then it runs fast enough no matter how it's formulated.
> >>>>>
> >>>>> OK I will do some querying to test and see.
> >>>
> >>> Is it typical you return the entire result set or use the
> >>> view as a table in another query?   If not then it can
> >>> take twice as long and we don't really care.
> >>>
> >>>
> >>>
> >>>
> >>>>>
> >>>>> Susan
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Susan
> >>>>>>>
> >>>>>>>>
> >>>>>>>> FYI, here's the query (more or less, I keep fussing):
> >>>>>>>>
> >>>>>>>> SELECT kids.bioid AS bioid
> >>>>>>>> , kids.sname AS kid
> >>>>>>>> , zdates.date AS zdate
> >>>>>>>> , kidmembers.grp AS grp
> >>>>>>>> , pdads.sname AS pdad
> >>>>>>>> , CASE
> >>>>>>>>     WHEN rankdates.ranked <= pdadmembers.date
> >>>>>>>>       THEN 'A'
> >>>>>>>>     WHEN maturedates.matured <= pdadmembers.date
> >>>>>>>>       THEN 'S'
> >>>>>>>>     ELSE 'O'
> >>>>>>>>   END
> >>>>>>>>   AS status
> >>>>>>>> , pdadmembers.date - pdads.birth AS age_days
> >>>>>>>> , trunc((pdadmembers.date - pdads.birth) / 365.25,
> >>>>>>>>         1) AS age_years
> >>>>>>>> , (SELECT count(*)
> >>>>>>>>      FROM members as dadmembers
> >>>>>>>>      WHERE dadmembers.sname = pdads.sname
> >>>>>>>>            AND dadmembers.grp = kidmembers.grp
> >>>>>>>>            AND dadmembers.date <= zdates.date
> >>>>>>>>            AND dadmembers.date >= zdates.date - 5)
> >>>>>>>>   AS estrous_presence
> >>>>>>>> , (SELECT count(*)
> >>>>>>>>      FROM interact_data, parts AS actor, parts AS actee
> >>>>>>>>      WHERE actor.sname = pdads.sname
> >>>>>>>>            AND actee.sname = cycles.sname
> >>>>>>>>            AND interact_data.iid = actor.iid
> >>>>>>>>            AND interact_data.iid = actee.iid
> >>>>>>>>            AND interact_data.date <= zdates.date
> >>>>>>>>            AND interact_data.date >= zdates.date - 5
> >>>>>>>>            AND (interact_data.act = 'M'
> >>>>>>>>                 OR interact_data.act = 'E'))
> >>>>>>>>   AS estrous_me
> >>>>>>>> , (SELECT count(*)
> >>>>>>>>      FROM interact_data, parts AS actor, parts AS actee
> >>>>>>>>      WHERE actor.sname = pdads.sname
> >>>>>>>>            AND actee.sname = cycles.sname
> >>>>>>>>            AND interact_data.iid = actor.iid
> >>>>>>>>            AND interact_data.iid = actee.iid
> >>>>>>>>            AND interact_data.date <= zdates.date
> >>>>>>>>            AND interact_data.date >= zdates.date - 5
> >>>>>>>>            AND interact_data.act = 'C')
> >>>>>>>>   AS estrous_c
> >>>>>>>>
> >>>>>>>> FROM biograph as kids
> >>>>>>>>   JOIN pregs
> >>>>>>>>     ON (pregs.pid = kids.pid)
> >>>>>>>>   JOIN cycpoints AS zdates
> >>>>>>>>     ON (zdates.cpid = pregs.conceive)
> >>>>>>>>   JOIN cycles
> >>>>>>>>     ON (cycles.cid = zdates.cid)
> >>>>>>>>   JOIN members AS kidmembers
> >>>>>>>>     ON (kidmembers.date = zdates.date
> >>>>>>>>         AND kidmembers.sname = cycles.sname)
> >>>>>>>>   JOIN members AS pdadmembers
> >>>>>>>>        ON (pdadmembers.date = zdates.date
> >>>>>>>>            AND pdadmembers.grp = kidmembers.grp)
> >>>>>>>>   JOIN biograph AS pdads
> >>>>>>>>        ON (pdadmembers.sname = pdads.sname)
> >>>>>>>>   LEFT OUTER JOIN rankdates
> >>>>>>>>        ON (rankdates.sname = pdads.sname)
> >>>>>>>>   LEFT OUTER JOIN maturedates
> >>>>>>>>        ON (maturedates.sname = pdads.sname)
> >>>>>>>>
> >>>>>>>> WHERE pdads.sex = 'M'
> >>>>>>>>    AND trunc((pdadmembers.date - pdads.birth) / 365.25, 1) 
> >=
> >>>> 6
> >>>>>>>>    AND pdads.sname
> >>>>>>>>        IN (SELECT dadmembers.sname
> >>>>>>>>              FROM members AS dadmembers
> >>>>>>>>              WHERE dadmembers.sname = pdads.sname
> >>>>>>>>                    AND dadmembers.grp = kidmembers.grp
> >>>>>>>>                    AND dadmembers.date <= zdates.date
> >>>>>>>>                    AND dadmembers.date >= zdates.date - 5);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Karl <kop at meme.com>
> >>>>>>>> Free Software:  "You don't pay back, you pay forward."
> >>>>>>>>             -- Robert A. Heinlein
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Babase mailing list
> >>>>>>>> Babase at www.eco.princeton.edu
> >>>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>>>>
> >>>>>>> ------------------------------------------------------------
> >>>>>>> Susan Alberts, Professor of Biology, Duke University, Box
> >> 90338,
> >>>>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Babase mailing list
> >>>>>>> Babase at www.eco.princeton.edu
> >>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Karl <kop at meme.com>
> >>>>>> Free Software:  "You don't pay back, you pay forward."
> >>>>>>              -- Robert A. Heinlein
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Babase mailing list
> >>>>>> Babase at www.eco.princeton.edu
> >>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>>
> >>>>> ------------------------------------------------------------
> >>>>> Susan Alberts, Professor of Biology, Duke University, Box 
> 90338,
> >>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Babase mailing list
> >>>>> Babase at www.eco.princeton.edu
> >>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>
> >>>> ------------------------------------------------------------
> >>>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
> >>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Babase mailing list
> >>>> Babase at www.eco.princeton.edu
> >>>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>> Karl <kop at meme.com>
> >>> Free Software:  "You don't pay back, you pay forward."
> >>>                -- Robert A. Heinlein
> >>>
> >>>
> >>> _______________________________________________
> >>> Babase mailing list
> >>> Babase at www.eco.princeton.edu
> >>> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>
> >> ------------------------------------------------------------
> >> Susan Alberts, Professor of Biology, Duke University, Box 90338,
> >> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Babase mailing list
> >> Babase at www.eco.princeton.edu
> >> http://www.eco.princeton.edu/mailman/listinfo/babase
> >>
> >>
> >
> >
> >
> >
> > Karl <kop at meme.com>
> > Free Software:  "You don't pay back, you pay forward."
> >                 -- Robert A. Heinlein
> >
> >
> > _______________________________________________
> > Babase mailing list
> > Babase at www.eco.princeton.edu
> > http://www.eco.princeton.edu/mailman/listinfo/babase
> 
> ------------------------------------------------------------
> Susan Alberts, Professor of Biology, Duke University, Box 90338,  
> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
> 
> 
> 
> 
> _______________________________________________
> Babase mailing list
> Babase at www.eco.princeton.edu
> http://www.eco.princeton.edu/mailman/listinfo/babase
> 
> 
> 




Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein




More information about the Babase mailing list