[Babase] Re: MCE or CID duplicate entries
Karl O. Pinc
kop at meme.com
Fri Jun 26 16:12:46 EDT 2009
Hi Lacey,
I've gone back to look at old emails and found this
outstanding issue. I hope to suck less in the future,
sorry for the delay.
On 01/13/2009 11:11:25 AM, Lacey Maryott wrote:
> Karl, is there a query to search for duplicate information, in either
> the actor_actees table, or in cycles. Is there a rule in cycles or
> in interact_data that disallows duplicate id numbers?
There are rules that dis-allow duplicate ID numbers, but not
necessarily duplicate data. Let me know if we want
rules to prevent this. (Note that a lot of interaction
data has no times, and so "duplicates" are legitimate.
The queries below don't report these sorts of "legitimate
duplicates" because NULL is never equal to NULL.)
There are 104 duplicate
interactions. The trick is to compare the table
with itself by giving the table an alias using AS
in the FROM clause. The query to count is:
select count(aa1.iid)
from actor_actees as aa1
, actor_actees as aa2
where aa1.iid < aa2.iid
and aa1.act = aa2.act
and aa1.date = aa2.date
and aa1.start = aa2.start
and aa1.stop = aa2.stop
and aa1.actor = aa2.actor
and aa1.actee = aa2.actee;
Note the < comparison of iids. If you left that
off entirely then any row could be both aa1 _and_
aa2 and every row would match itself. The < rather
than a != keeps you from getting
double results back, one row where a given interaction
is the aa1 row and then a second where the same row
is the aa2 row.
If I restrict this to M C or E interactions by writing
where (aa1.act = 'M' or aa1.act = 'C' or aa1.act = 'E') and...
into the query I only get 101 duplicates.
Here's the query to get a listing:
select aa1.iid, aa2.iid
, aa1.sid, aa2.sid
, aa1.act
, aa1.date
, aa1.start
, aa1.stop
, aa1.actorid, aa2.actorid
, aa1.actor
, aa1.acteeid, aa2.acteeid
, aa1.actee
from actor_actees as aa1
, actor_actees as aa2
where aa1.iid < aa2.iid
and aa1.act = aa2.act
and aa1.date = aa2.date
and aa1.start = aa2.start
and aa1.stop = aa2.stop
and aa1.actor = aa2.actor
and aa1.actee = aa2.actee
order by aa1.date
, aa1.start
, aa1.stop
, aa1.actor
, aa1.actee;
When it comes to sexual cycles there's a lot
more rules that control what events are allowed
to happen when, so I would not expect to see
duplicates. I get no duplicates back from:
select count(cc1.cpid)
from cycpoints_cycles as cc1
, cycpoints_cycles as cc2
where cc1.cpid < cc2.cpid
and cc1.sname = cc2.sname
and cc1.date = cc2.date
and cc1.code = cc2.code;
>
> -------- Original Message --------
> Subject: mating behaviors dataset
> Date: Tue, 13 Jan 2009 09:58:12 -0500
> From: Jenny Tung <jt5 at duke.edu>
> To: Marie Charpentier <mariecharp at yahoo.fr>, Lacey Maryott
> <lacey.maryott at duke.edu>
> CC: Susan Alberts <alberts at duke.edu>
>
>
>
> Hi Marie and Lacey,
>
> I was working with the mating behaviors dataset last night and found
> a few inconsistencies in data from Group 2, July - Sep 1993. These
> are records that are repeated--i.e., there are two duplicate lines of
> data for a consortship between Kestrel and Nelson for cid 11424 (they
> are exactly the same--same day, same start and stop times). I'm
> assuming the error comes from Babase itself, not the query to extract
> the data, because I've only found this problem during this time
> period for this group (but not for all individuals in that group in
> that time period).
>
> I suspect this list is not exhaustive, but these are the records that
> I've found that are affected:
>
> KES consortship with NEL, cid 11424
> KES consortship with NEL, cid 11427
> LIM consortship with PEP cid 13012
> LUN consortship with LEO cid 14286
> NIX consortship with KEI cid 16132
>
> I can do a more thorough search for this kind of thing if you want me
> to do so; let me know.
>
> Jenny
>
>
>--
> Lacey Maryott
> Alberts Lab
> Department of Biology
> Duke University
> ph: 919-660-7306
> fax: 919-660-7293
> Lacey.Maryott at duke.edu
>
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein
More information about the Babase
mailing list