[Babase] POTENTIAL_DADS view can be tested-ish
Susan Alberts
alberts at duke.edu
Mon Oct 18 12:41:55 EDT 2010
Thanks, I've gone through the query and I think it is lovely. No
seriously, I learned a few things from it (that I will probably
immediately forget but now I know I can go look there for how to do a
couple of things I'd like to do from time to time). I didn't see any
problems, it seemed crystal clear to me. But, I don't have a lot of
querying chops at the moment so I am not the best judge.
Susan
On Oct 17, 2010, at 10:52 PM, Karl O. Pinc wrote:
>
>
> On 10/17/2010 08:31:47 PM, Susan Alberts wrote:
>> thanks, i love it. probably the column age_days should be called
>> pdad_age_days and the column age_years should be pdad_age_years, just
>>
>> to avoid confusion. I know the names are long but I think clarity is
>> good here, as there are three animals in each row that age could
>> refer
>>
>> to (in theory).
>
> Ok.
>
>
> (You saw the comments/questions previously embedded in
> context below?)
>
> FYI, here's the latest query. I guess I'm ready
> to have someone review it.
>
> We're looking at a bit
> under 5 minutes to generate the entire result set.
>
> Note that all benchmarks are on a second run of the
> query, after all possible disk I/O has been done.
> If another large query has pushed, say, members
> out of ram or if the machine has just booted, etc.,
> then there will be extra time required to read the
> disk.
>
> I decided to use the MATERNITIES and ACTOR_ACTEES
> views to keep it simpler. As-is there seems no
> performance impact.
>
> Note that this is still not documented, and neither
> is the zdate_grp column I added to the MATERNITIES
> view.
>
> SELECT maternities.child_bioid AS bioid
> , maternities.child AS kid
> , maternities.mom AS mom
> , maternities.zdate AS zdate
> , maternities.zdate_grp AS grp
> , pdads.sname AS pdad
> , CASE
> WHEN rankdates.ranked <= maternities.zdate
> THEN 'A'
> WHEN maturedates.matured <= maternities.zdate
> THEN 'S'
> ELSE 'O'
> END
> AS status
> , maternities.zdate - pdads.birth AS pdad_age_days
> , trunc((maternities.zdate - pdads.birth) / 365.25, 1)
> AS pdad_age_years
> , (SELECT count(*)
> FROM members as dadmembers
> WHERE dadmembers.sname = pdads.sname
> AND dadmembers.grp = maternities.zdate_grp
> AND dadmembers.date < maternities.zdate
> AND dadmembers.date >= maternities.zdate - 5)
> AS estrous_presence
> , (SELECT count(*)
> FROM actor_actees
> WHERE actor_actees.date < maternities.zdate
> AND actor_actees.date >= maternities.zdate - 5
> AND (actor_actees.act = 'M'
> OR actor_actees.act = 'E')
> AND actor_actees.actor = pdads.sname
> AND actor_actees.actee = maternities.mom)
> AS estrous_me
> , (SELECT count(*)
> FROM actor_actees
> WHERE actor_actees.date < maternities.zdate
> AND actor_actees.date >= maternities.zdate - 5
> AND actor_actees.act = 'C'
> AND actor_actees.actor = pdads.sname
> AND actor_actees.actee = maternities.mom)
> AS estrous_c
>
> FROM maternities
> JOIN biograph AS pdads
> ON (pdads.sname
> IN (SELECT dadmembers.sname
> FROM members AS dadmembers
> WHERE dadmembers.sname = pdads.sname
> AND dadmembers.grp = maternities.zdate_grp
> AND dadmembers.date < maternities.zdate
> AND dadmembers.date >= maternities.zdate -
> 5))
>
> LEFT OUTER JOIN rankdates
> ON (rankdates.sname = pdads.sname)
> LEFT OUTER JOIN maturedates
> ON (maturedates.sname = pdads.sname)
> WHERE pdads.sex = 'M'
> -- Speed things up by eliminating potential dads
> -- who could not be in interpolate into the mom's group
> -- during the fertile period.
> AND pdads.statdate >= maternities.zdate - 5 - 14
> -- Potential dad must be at least 6 years old on the zdate.
> AND trunc((maternities.zdate - pdads.birth) / 365.25, 1) >= 6;
>
>
>
>> Susan
>>
>> On Oct 17, 2010, at 5:22 PM, Karl O. Pinc wrote:
>>
>>> On 10/17/2010 01:11:35 PM, Susan Alberts wrote:
>>>> OK, it goes pretty fast when you query for a small subset. Very
>> nice.
>>>>
>>>> We do want the Mom to be a column in there I realized. Just
>> calling
>>>> it
>>>>
>>>> Mom is fine.
>>>
>>> Ok. Done.
>>>
>>>
>>>> On Oct 17, 2010, at 2:09 PM, Susan Alberts wrote:
>>>>
>>>>>
>>>>> On Oct 17, 2010, at 2:06 PM, Karl O. Pinc wrote:
>>>>>
>>>>>> On 10/17/2010 12:51:32 PM, Susan Alberts wrote:
>>>
>>>>>>>> You can try out the POTENTIAL_DADS view.
>>>>>>>
>>>>>>> I have done so. very nice. Quite slow browsing when I first
>>>> brought
>>>>>>> it
>>>>>>>
>>>>>>> up, but that might have been the connectiivty problems you and
>>>> Ryan
>>>>>>> mentioned. On the other hand, it is a big query to build to
>> view.
>>>
>>> Note that there is no preset order to the results so, in theory,
>>> browsing is dangerous. (As with any regular unordered table or
>>> other unordered views data _could_ come back in any order.)
>>>
>>>>>>
>>>>>> It takes about a minute and a half to process. The rest
>>>>>> of the time is ppa and delivering the data to the browser.
>>>
>>> I completely redid some of the structure. I will re-send the
>>> query when I finish testing. It now will likely take a lot
>>> longer to look at the entire view.
>>>
>>> There was a serious bug -- the potential dad had to be there
>>> on the zdate.
>>>
>>>>>>
>>>>>> Well, postgres can compute age in years based on calendar date.
>>>>>> I have not thought through the implications but I imagine that
>> it
>>>>>> has a lot to do with whether you were born in a leap year and
>> how
>>>>>> the rounding all works out.
>>>>>
>>>>> Ah I see. I think it is more important for us to be consistent
>>>>> across different analyses, and I have always used 365.25 for
>> years,
>>>>
>>>>> I think Jeanne uses months which can be calculated from the days
>>>>> column. So my view is that we are OK on that column.
>>>
>>> Ok. Good by me. Note that, I think, some of the database checks
>>> use "real" calendar age. I.E. Born on the 31s? Then the 31st
>>> is your birthday. I can see possibilities for age computations
>>> to vary if dividing by 365.25 at the end of longer months,
>> depending
>>> on when birth happens in the leap year cycle. Maybe. Anyway,
>>> a heads-up.
>>>
>>>>>>
>>>>>>>
>>>>>>> One thing that should be fixed is the count of days the male
>> was
>>>>>>> present in the female's five-day window. Currently we have 6
>>>>>>> possible
>>>>>>>
>>>>>>> days because I think you counted Z-day as well as the five days
>>>>>>> prior.
>>>>>>>
>>>>>>> We only want the five days prior.
>>>>>>
>>>>>> Ok. Do you want a male to be a potential dad if they were
>>>>>> present on the zdate but not any of the 5 prior days?
>>>>>
>>>>> No
>>>
>>> Ok. And we don't want MCE related counts then too.
>>>
>>> But all kid/mom group membership and age and status is still
>>> computed based on zdate. Right?
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks for sending the query, I have not looked through it in
>>>> detail
>>>>>>> but will if you want.
>>>
>>> Eventually another pair of eyes would be good.
>>>
>>>>>>
>>>>>> You could. It's a lot simpler if you base it on the maternities
>>>>>> and actor_actee views but that makes it run slower. At least
>>>> 1/3rd
>>>>>> longer and probably more like twice as slow. It may be
>>>>>> a false optimization to tweak it like this, more trouble in the
>>>>>> long run to maintain, but I've not figured that out yet.
>>>>>> This is the sort of syntax-ie stuff I'm thinking about.
>>>>>> (Note that the whole first series of joins, up until you
>>>>>> get to the pdad stuff, is a re-creation of maternities.)
>>>>>>
>>>>>> Any thoughts on performance? If you ask for a particular
>>>>>> kid then it runs fast enough no matter how it's formulated.
>>>>>
>>>>> OK I will do some querying to test and see.
>>>
>>> Is it typical you return the entire result set or use the
>>> view as a table in another query? If not then it can
>>> take twice as long and we don't really care.
>>>
>>>
>>>
>>>
>>>>>
>>>>> Susan
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Susan
>>>>>>>
>>>>>>>>
>>>>>>>> FYI, here's the query (more or less, I keep fussing):
>>>>>>>>
>>>>>>>> SELECT kids.bioid AS bioid
>>>>>>>> , kids.sname AS kid
>>>>>>>> , zdates.date AS zdate
>>>>>>>> , kidmembers.grp AS grp
>>>>>>>> , pdads.sname AS pdad
>>>>>>>> , CASE
>>>>>>>> WHEN rankdates.ranked <= pdadmembers.date
>>>>>>>> THEN 'A'
>>>>>>>> WHEN maturedates.matured <= pdadmembers.date
>>>>>>>> THEN 'S'
>>>>>>>> ELSE 'O'
>>>>>>>> END
>>>>>>>> AS status
>>>>>>>> , pdadmembers.date - pdads.birth AS age_days
>>>>>>>> , trunc((pdadmembers.date - pdads.birth) / 365.25,
>>>>>>>> 1) AS age_years
>>>>>>>> , (SELECT count(*)
>>>>>>>> FROM members as dadmembers
>>>>>>>> WHERE dadmembers.sname = pdads.sname
>>>>>>>> AND dadmembers.grp = kidmembers.grp
>>>>>>>> AND dadmembers.date <= zdates.date
>>>>>>>> AND dadmembers.date >= zdates.date - 5)
>>>>>>>> AS estrous_presence
>>>>>>>> , (SELECT count(*)
>>>>>>>> FROM interact_data, parts AS actor, parts AS actee
>>>>>>>> WHERE actor.sname = pdads.sname
>>>>>>>> AND actee.sname = cycles.sname
>>>>>>>> AND interact_data.iid = actor.iid
>>>>>>>> AND interact_data.iid = actee.iid
>>>>>>>> AND interact_data.date <= zdates.date
>>>>>>>> AND interact_data.date >= zdates.date - 5
>>>>>>>> AND (interact_data.act = 'M'
>>>>>>>> OR interact_data.act = 'E'))
>>>>>>>> AS estrous_me
>>>>>>>> , (SELECT count(*)
>>>>>>>> FROM interact_data, parts AS actor, parts AS actee
>>>>>>>> WHERE actor.sname = pdads.sname
>>>>>>>> AND actee.sname = cycles.sname
>>>>>>>> AND interact_data.iid = actor.iid
>>>>>>>> AND interact_data.iid = actee.iid
>>>>>>>> AND interact_data.date <= zdates.date
>>>>>>>> AND interact_data.date >= zdates.date - 5
>>>>>>>> AND interact_data.act = 'C')
>>>>>>>> AS estrous_c
>>>>>>>>
>>>>>>>> FROM biograph as kids
>>>>>>>> JOIN pregs
>>>>>>>> ON (pregs.pid = kids.pid)
>>>>>>>> JOIN cycpoints AS zdates
>>>>>>>> ON (zdates.cpid = pregs.conceive)
>>>>>>>> JOIN cycles
>>>>>>>> ON (cycles.cid = zdates.cid)
>>>>>>>> JOIN members AS kidmembers
>>>>>>>> ON (kidmembers.date = zdates.date
>>>>>>>> AND kidmembers.sname = cycles.sname)
>>>>>>>> JOIN members AS pdadmembers
>>>>>>>> ON (pdadmembers.date = zdates.date
>>>>>>>> AND pdadmembers.grp = kidmembers.grp)
>>>>>>>> JOIN biograph AS pdads
>>>>>>>> ON (pdadmembers.sname = pdads.sname)
>>>>>>>> LEFT OUTER JOIN rankdates
>>>>>>>> ON (rankdates.sname = pdads.sname)
>>>>>>>> LEFT OUTER JOIN maturedates
>>>>>>>> ON (maturedates.sname = pdads.sname)
>>>>>>>>
>>>>>>>> WHERE pdads.sex = 'M'
>>>>>>>> AND trunc((pdadmembers.date - pdads.birth) / 365.25, 1) >=
>>>> 6
>>>>>>>> AND pdads.sname
>>>>>>>> IN (SELECT dadmembers.sname
>>>>>>>> FROM members AS dadmembers
>>>>>>>> WHERE dadmembers.sname = pdads.sname
>>>>>>>> AND dadmembers.grp = kidmembers.grp
>>>>>>>> AND dadmembers.date <= zdates.date
>>>>>>>> AND dadmembers.date >= zdates.date - 5);
>>>>>>>>
>>>>>>>>
>>>>>>>> Karl <kop at meme.com>
>>>>>>>> Free Software: "You don't pay back, you pay forward."
>>>>>>>> -- Robert A. Heinlein
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Babase mailing list
>>>>>>>> Babase at www.eco.princeton.edu
>>>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> Susan Alberts, Professor of Biology, Duke University, Box
>> 90338,
>>>>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Babase mailing list
>>>>>>> Babase at www.eco.princeton.edu
>>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Karl <kop at meme.com>
>>>>>> Free Software: "You don't pay back, you pay forward."
>>>>>> -- Robert A. Heinlein
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Babase mailing list
>>>>>> Babase at www.eco.princeton.edu
>>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>>
>>>>> ------------------------------------------------------------
>>>>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Babase mailing list
>>>>> Babase at www.eco.princeton.edu
>>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>
>>>> ------------------------------------------------------------
>>>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>>>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Babase mailing list
>>>> Babase at www.eco.princeton.edu
>>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> Karl <kop at meme.com>
>>> Free Software: "You don't pay back, you pay forward."
>>> -- Robert A. Heinlein
>>>
>>>
>>> _______________________________________________
>>> Babase mailing list
>>> Babase at www.eco.princeton.edu
>>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>
>> ------------------------------------------------------------
>> Susan Alberts, Professor of Biology, Duke University, Box 90338,
>> Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
>>
>>
>>
>>
>> _______________________________________________
>> Babase mailing list
>> Babase at www.eco.princeton.edu
>> http://www.eco.princeton.edu/mailman/listinfo/babase
>>
>>
>
>
>
>
> Karl <kop at meme.com>
> Free Software: "You don't pay back, you pay forward."
> -- Robert A. Heinlein
>
>
> _______________________________________________
> Babase mailing list
> Babase at www.eco.princeton.edu
> http://www.eco.princeton.edu/mailman/listinfo/babase
------------------------------------------------------------
Susan Alberts, Professor of Biology, Duke University, Box 90338,
Durham NC 27708. Phone 919-660-7272, FAX 919-660-7293
More information about the Babase
mailing list