[Babase] POTENTIAL_DADS view can be tested-ish

Susan Alberts alberts at duke.edu
Mon Oct 18 12:41:55 EDT 2010


Thanks, I've gone through the query and I think it is lovely. No  
seriously, I learned a few things from it (that I will probably  
immediately forget but now I know I can go look there for how to do a  
couple of things I'd like to do from time to time). I didn't see any  
problems, it seemed crystal clear to me. But, I don't have a lot of  
querying chops at the moment so I am not the best judge.

Susan

On Oct 17, 2010, at 10:52 PM, Karl O. Pinc wrote:

>
>
> On 10/17/2010 08:31:47 PM, Susan Alberts wrote:
>> thanks, i love it. probably the column age_days should be called
>> pdad_age_days and the column age_years should be pdad_age_years, just
>>
>> to avoid confusion. I know the names are long but I think clarity is
>> good here, as there are three animals in each row that age could
>> refer
>>
>> to (in theory).
>
> Ok.
>
>
> (You saw the comments/questions previously embedded in
> context below?)
>
> FYI, here's the latest query.  I guess I'm ready
> to have someone review it.
>
> We're looking at a bit
> under 5 minutes to generate the entire result set.
>
> Note that all benchmarks are on a second run of the
> query, after all possible disk I/O has been done.
> If another large query has pushed, say, members
> out of ram or if the machine has just booted, etc.,
> then there will be extra time required to read the
> disk.
>
> I decided to use the MATERNITIES and ACTOR_ACTEES
> views to keep it simpler.  As-is there seems no
> performance impact.
>
> Note that this is still not documented, and neither
> is the zdate_grp column I added to the MATERNITIES
> view.
>
> SELECT maternities.child_bioid AS bioid
>     , maternities.child AS kid
>     , maternities.mom AS mom
>     , maternities.zdate AS zdate
>     , maternities.zdate_grp AS grp
>     , pdads.sname AS pdad
>     , CASE
>         WHEN rankdates.ranked <= maternities.zdate
>           THEN 'A'
>         WHEN maturedates.matured <= maternities.zdate
>           THEN 'S'
>         ELSE 'O'
>       END
>       AS status
>     , maternities.zdate - pdads.birth AS pdad_age_days
>     , trunc((maternities.zdate - pdads.birth) / 365.25, 1)
>       AS pdad_age_years
>     , (SELECT count(*)
>          FROM members as dadmembers
>          WHERE dadmembers.sname = pdads.sname
>                AND dadmembers.grp = maternities.zdate_grp
>                AND dadmembers.date < maternities.zdate
>                AND dadmembers.date >= maternities.zdate - 5)
>       AS estrous_presence
>     , (SELECT count(*)
>          FROM actor_actees
>          WHERE actor_actees.date < maternities.zdate
>                AND actor_actees.date >= maternities.zdate - 5
>                AND (actor_actees.act = 'M'
>                     OR actor_actees.act = 'E')
>                AND actor_actees.actor = pdads.sname
>                AND actor_actees.actee = maternities.mom)
>       AS estrous_me
>     , (SELECT count(*)
>          FROM actor_actees
>          WHERE actor_actees.date < maternities.zdate
>                AND actor_actees.date >= maternities.zdate - 5
>                AND actor_actees.act = 'C'
>                AND actor_actees.actor = pdads.sname
>                AND actor_actees.actee = maternities.mom)
>       AS estrous_c
>
>  FROM maternities
>       JOIN biograph AS pdads
>            ON (pdads.sname
>                IN (SELECT dadmembers.sname
>                      FROM members AS dadmembers
>                      WHERE dadmembers.sname = pdads.sname
>                            AND dadmembers.grp = maternities.zdate_grp
>                            AND dadmembers.date < maternities.zdate
>                            AND dadmembers.date >= maternities.zdate -
> 5))
>
>       LEFT OUTER JOIN rankdates
>            ON (rankdates.sname = pdads.sname)
>       LEFT OUTER JOIN maturedates
>            ON (maturedates.sname = pdads.sname)
>  WHERE pdads.sex = 'M'
>        -- Speed things up by eliminating potential dads
>        -- who could not be in interpolate into the mom's group
>        -- during the fertile period.
>        AND pdads.statdate >= maternities.zdate - 5 - 14
>        -- Potential dad must be at least 6 years old on the zdate.
>        AND trunc((maternities.zdate - pdads.birth) / 365.25, 1) >= 6;
>
>
>
>> Susan
>>
>> On Oct 17, 2010, at 5:22 PM, Karl O. Pinc wrote:
>>
>>> On 10/17/2010 01:11:35 PM, Susan Alberts wrote:
>>>> OK, it goes pretty fast when you query for a small subset. Very
>> nice.
>>>>
>>>> We do want the Mom to be a column in there I realized. Just
>> calling
>>>> it
>>>>
>>>> Mom is fine.
>>>
>>> Ok.  Done.
>>>
>>>
>>>> On Oct 17, 2010, at 2:09 PM, Susan Alberts wrote:
>>>>
>>>>>
>>>>> On Oct 17, 2010, at 2:06 PM, Karl O. Pinc wrote:
>>>>>
>>>>>> On 10/17/2010 12:51:32 PM, Susan Alberts wrote:
>>>
>>>>>>>> You can try out the POTENTIAL_DADS view.
>>>>>>>
>>>>>>> I have done so. very nice. Quite slow browsing when I first
>>>> brought
>>>>>>> it
>>>>>>>
>>>>>>> up, but that might have been the connectiivty problems you and
>>>> Ryan
>>>>>>> mentioned. On the other hand, it is a big query to build to
>> view.
>>>
>>> Note that there is no preset order to the results so, in theory,
>>> browsing is dangerous. (As with any regular unordered table or
>>> other unordered views data _could_ come back in any order.)
>>>
>>>>>>
>>>>>> It takes about a minute and a half to process.   The rest
>>>>>> of the time is ppa and delivering the data to the browser.
>>>
>>> I completely redid some of the structure.  I will re-send the
>>> query when I finish testing.  It now will likely take a lot
>>> longer to look at the entire view.
>>>
>>> There was a serious bug -- the potential dad had to be there
>>> on the zdate.
>>>
>>>>>>
>>>>>> Well, postgres can compute age in years based on calendar date.
>>>>>> I have not thought through the implications but I imagine that
>> it
>>>>>> has a lot to do with whether you were born in a leap year and
>> how
>>>>>> the rounding all works out.
>>>>>
>>>>> Ah I see. I think it is more important for us to be consistent
>>>>> across different analyses, and I have always used 365.25 for
>> years,
>>>>
>>>>> I think Jeanne uses months which can be calculated from the days
>>>>> column. So my view is that we are OK on that column.
>>>
>>> Ok.  Good by me.  Note that, I think, some of the database checks
>>> use "real" calendar age.  I.E.  Born on the 31s?  Then the 31st
>>> is your birthday.  I can see possibilities for age computations
>>> to vary if dividing by 365.25 at the end of longer months,
>> depending
>>> on when birth happens in the leap year cycle.  Maybe.  Anyway,
>>> a heads-up.
>>>
>>>>>>
>>>>>>>
>>>>>>> One thing that should be fixed is the count of days the male
>> was
>>>>>>> present in the female's five-day window. Currently we have 6
>>>>>>> possible
>>>>>>>
>>>>>>> days because I think you counted Z-day as well as the five days
>>>>>>> prior.
>>>>>>>
>>>>>>> We only want the five days prior.
>>>>>>
>>>>>> Ok.  Do you want a male to be a potential dad if they were
>>>>>> present on the zdate but not any of the 5 prior days?
>>>>>
>>>>> No
>>>
>>> Ok.  And we don't want MCE related counts then too.
>>>
>>> But all kid/mom group membership and age and status is still
>>> computed based on zdate.  Right?
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks for sending the query, I have not looked through it in
>>>> detail
>>>>>>> but will if you want.
>>>
>>> Eventually another pair of eyes would be good.
>>>
>>>>>>
>>>>>> You could.  It's a lot simpler if you base it on the maternities
>>>>>> and actor_actee views but that makes it run slower.  At least
>>>> 1/3rd
>>>>>> longer and probably more like twice as slow.  It may be
>>>>>> a false optimization to tweak it like this, more trouble in the
>>>>>> long run to maintain, but I've not figured that out yet.
>>>>>> This is the sort of syntax-ie stuff I'm thinking about.
>>>>>> (Note that the whole first series of joins, up until you
>>>>>> get to the pdad stuff, is a re-creation of maternities.)
>>>>>>
>>>>>> Any thoughts on performance?  If you ask for a particular
>>>>>> kid then it runs fast enough no matter how it's formulated.
>>>>>
>>>>> OK I will do some querying to test and see.
>>>
>>> Is it typical you return the entire result set or use the
>>> view as a table in another query?   If not then it can
>>> take twice as long and we don't really care.
>>>
>>>
>>>
>>>
>>>>>
>>>>> Susan
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Susan
>>>>>>>
>>>>>>>>
>>>>>>>> FYI, here's the query (more or less, I keep fussing):
>>>>>>>>
>>>>>>>> SELECT kids.bioid AS bioid
>>>>>>>> , kids.sname AS kid
>>>>>>>> , zdates.date AS zdate
>>>>>>>> , kidmembers.grp AS grp
>>>>>>>> , pdads.sname AS pdad
>>>>>>>> , CASE
>>>>>>>>     WHEN rankdates.ranked <= pdadmembers.date
>>>>>>>>       THEN 'A'
>>>>>>>>     WHEN maturedates.matured <= pdadmembers.date
>>>>>>>>       THEN 'S'
>>>>>>>>     ELSE 'O'
>>>>>>>>   END
>>>>>>>>   AS status
>>>>>>>> , pdadmembers.date - pdads.birth AS age_days
>>>>>>>> , trunc((pdadmembers.date - pdads.birth) / 365.25,
>>>>>>>>         1) AS age_years
>>>>>>>> , (SELECT count(*)
>>>>>>>>      FROM members as dadmembers
>>>>>>>>      WHERE dadmembers.sname = pdads.sname
>>>>>>>>            AND dadmembers.grp = kidmembers.grp
>>>>>>>>            AND dadmembers.date <= zdates.date
>>>>>>>>            AND dadmembers.date >= zdates.date - 5)
>>>>>>>>   AS estrous_presence
>>>>>>>> , (SELECT count(*)
>>>>>>>>      FROM interact_data, parts AS actor, parts AS actee
>>>>>>>>      WHERE actor.sname = pdads.sname
>>>>>>>>            AND actee.sname = cycles.sname
>>>>>>>>            AND interact_data.iid = actor.iid
>>>>>>>>            AND interact_data.iid = actee.iid
>>>>>>>>            AND interact_data.date <= zdates.date
>>>>>>>>            AND interact_data.date >= zdates.date - 5
>>>>>>>>            AND (interact_data.act = 'M'
>>>>>>>>                 OR interact_data.act = 'E'))
>>>>>>>>   AS estrous_me
>>>>>>>> , (SELECT count(*)
>>>>>>>>      FROM interact_data, parts AS actor, parts AS actee
>>>>>>>>      WHERE actor.sname = pdads.sname
>>>>>>>>            AND actee.sname = cycles.sname
>>>>>>>>            AND interact_data.iid = actor.iid
>>>>>>>>            AND interact_data.iid = actee.iid
>>>>>>>>            AND interact_data.date <= zdates.date
>>>>>>>>            AND interact_data.date >= zdates.date - 5
>>>>>>>>            AND interact_data.act = 'C')
>>>>>>>>   AS estrous_c
>>>>>>>>
>>>>>>>> FROM biograph as kids
>>>>>>>>   JOIN pregs
>>>>>>>>     ON (pregs.pid = kids.pid)
>>>>>>>>   JOIN cycpoints AS zdates
>>>>>>>>     ON (zdates.cpid = pregs.conceive)
>>>>>>>>   JOIN cycles
>>>>>>>>     ON (cycles.cid = zdates.cid)
>>>>>>>>   JOIN members AS kidmembers
>>>>>>>>     ON (kidmembers.date = zdates.date
>>>>>>>>         AND kidmembers.sname = cycles.sname)
>>>>>>>>   JOIN members AS pdadmembers
>>>>>>>>        ON (pdadmembers.date = zdates.date
>>>>>>>>            AND pdadmembers.grp = kidmembers.grp)
>>>>>>>>   JOIN biograph AS pdads
>>>>>>>>        ON (pdadmembers.sname = pdads.sname)
>>>>>>>>   LEFT OUTER JOIN rankdates
>>>>>>>>        ON (rankdates.sname = pdads.sname)
>>>>>>>>   LEFT OUTER JOIN maturedates
>>>>>>>>        ON (maturedates.sname = pdads.sname)
>>>>>>>>
>>>>>>>> WHERE pdads.sex = 'M'
>>>>>>>>    AND trunc((pdadmembers.date - pdads.birth) / 365.25, 1) >=
>>>> 6
>>>>>>>>    AND pdads.sname
>>>>>>>>        IN (SELECT dadmembers.sname
>>>>>>>>              FROM members AS dadmembers
>>>>>>>>              WHERE dadmembers.sname = pdads.sname
>>>>>>>>                    AND dadmembers.grp = kidmembers.grp
>>>>>>>>                    AND dadmembers.date <= zdates.date
>>>>>>>>                    AND dadmembers.date >= zdates.date - 5);
>>>>>>>>
>>>>>>>>
>>>>>>>> Karl <kop at meme.com>
>>>>>>>> Free Software:  "You don't pay back, you pay forward."
>>>>>>>>             -- Robert A. Heinlein
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Babase mailing list
>>>>>>>> Babase at www.eco.princeton.edu
>>>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> Susan Alberts, Professor of Biology, Duke University, Box
>> 90338,
>>>>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Babase mailing list
>>>>>>> Babase at www.eco.princeton.edu
>>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Karl <kop at meme.com>
>>>>>> Free Software:  "You don't pay back, you pay forward."
>>>>>>              -- Robert A. Heinlein
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Babase mailing list
>>>>>> Babase at www.eco.princeton.edu
>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>
>>>>> ------------------------------------------------------------
>>>>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Babase mailing list
>>>>> Babase at www.eco.princeton.edu
>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>
>>>> ------------------------------------------------------------
>>>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Babase mailing list
>>>> Babase at www.eco.princeton.edu
>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> Karl <kop at meme.com>
>>> Free Software:  "You don't pay back, you pay forward."
>>>                -- Robert A. Heinlein
>>>
>>>
>>> _______________________________________________
>>> Babase mailing list
>>> Babase at www.eco.princeton.edu
>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>
>> ------------------------------------------------------------
>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>
>>
>>
>>
>> _______________________________________________
>> Babase mailing list
>> Babase at www.eco.princeton.edu
>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>
>>
>
>
>
>
> Karl <kop at meme.com>
> Free Software:  "You don't pay back, you pay forward."
>                 -- Robert A. Heinlein
>
>
> _______________________________________________
> Babase mailing list
> Babase at www.eco.princeton.edu
> http://www.eco.princeton.edu/mailman/listinfo/babase

------------------------------------------------------------
Susan Alberts, Professor of Biology, Duke University, Box 90338,  
Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293






More information about the Babase mailing list