[Babase] Validating ALLMISCS.Txt

Karl O. Pinc kop at meme.com
Fri Jul 13 14:56:10 EDT 2007


On 07/09/2007 07:39:53 AM, Lacey Maryott wrote:
> I made notes below about each sample,
> 
> Karl O. Pinc wrote:
>> Hi,
>> 
>> Looks like there are some "OS" type ALLMISCS rows.  Are these valid?
>> 
>> 
>> (Lacey, note that the following SQL is, unlike what we did yesterday,
>> SQL standards compliant.)
>> 
>> babase=> select substring(allmiscs.txt from 1 for position(',' in  
>> allmiscs.txt) - 1) as col1, count(*), max(samples.date) from  
>> allmiscs, samples where samples.sid = allmiscs.sid group by col1;
>>  col1 | count |    max
>> ------+-------+------------
>>  OS   |     3 | 2006-01-23
>>  C    |   563 | 2006-12-15
>>  U    |   990 | 2005-11-10
>>  O    |   573 | 2006-06-13
>> (4 rows)
>> 
>> babase=> select * from samples, allmiscs where samples.sid =  
>> allmiscs.sid and substring(allmiscs.txt from 1 for position(',' in  
>> allmiscs.txt) - 1) = 'OS' order by samples.date;
>>   sid  |    date    | stime | observer | stype | grp  | sname | mins  
>> | minsis | programid | setupid | palmtop | almid |  sid  |  atime    
>> |  txt
>> -------+------------+-------+----------+-------+------+-------+------+--------+-----------+---------+---------+-------+-------+----------+--------
>>  13254 | 2002-02-16 |       | JKW      | J     | 1.21 | OIL   |    5  
>> |      3 |         1 |       1 |       4 |  1084 | 13254 | 14:41:46  
>> | OS,,,,
>    OIL's sample for 02-02-16 P4 should be OS as in out of sight...  
> she had a few valid points within the sample too. it looks as though  
> the ADL was an error and should have been <PNT>
> <HDR>,0202161433OIL,PTSAMPLR_AUG01,SETUP_AUG01,4,16 Feb  
> 2002,OMO,JKW,OIL,JUV
> <PNT>,0202161433OIL,14:35:21,R2,OMO,OZO,XXX,
> <PNT>,0202161433OIL,14:36:22,R2,OZO,OMO,XXX,
> <PNT>,0202161433OIL,14:37:18,OS,,,,
> <PNT>,0202161433OIL,14:38:50,F3,OMO,DUI,XXX,UNK
> <PNT>,0202161433OIL,14:41:37,OS,,,,
> *<ADL>,0202161433OIL,14:41:46,OS,,,,*
>>  34732 | 2006-01-23 |       | RSM      | F     | 2.10 | LIW   |   10  
>> |      9 |         1 |       2 |       1 |  1981 | 34732 | 15:30:57  
>> | OS,
>> 34732 | 2006-01-23 |       | RSM      | F     | 2.10 | LIW   |   10  
>> |      9 |         1 |       2 |       1 |  1982 | 34732 | 15:31:20  
>> | OS,
>> (3 rows)
>> 
>> 
> the same seems to be the case with this LIW sample.
> 
> <HDR>,0601231530LIW,PTSAMPLR_JUL03,SETUP_JUL03,1,23 Jan  
> 2006,LIN,RSM,LIW,FEM
> <PNT>,0601231530LIW,15:30:49,OS,
> *<ADL>,0601231530LIW,15:30:57,OS,
> <ADL>,0601231530LIW,15:31:20,OS,*
> <PNT>,0601231530LIW,15:31:50,W1VS,WIP,WIP,XXX,
> <PNT>,0601231530LIW,15:32:42,W1VS,WIP,WIP,XXX,
> <PNT>,0601231530LIW,15:33:40,W1VS,WIP,WIP,XXX,
> <PNT>,0601231530LIW,15:34:41,W1VS,WIP,WIP,XXX,
> 
> what is the next step in fixing these?

First, I assume that the data has already been
loaded into the database.  Second, you may
need to go back into the raw psion data files
and change the ADL to PNT.  I think that the raw
files are being kept up to date with the changes
made in the database, but I'm not sure.

When the psionload program sees a PNT that's OS
it increments the SAMPLES.Mins column and does
nothing else.  So you'd delete the ALLMISCS
row and increment the SAMPLES.Mins column.

Note that you also may need to change the POINT_DATA.Min
columns of other points with the same Sid, in order
to say which point is which depending on where the
"out of sight" point belongs in the sample.

Note that there are _no_ restrictions on POINT_DATA.Min,
other than it must be > 0.  For instance:

babase=> select * from point_data pnta, point_data pntb where pnta.sid  
= pntb.sid and pnta.pntid != pntb.pntid and pnta.min = pntb.min;
  pntid | sid | min | activity | posture | foodcode | ptime | pntid |  
sid | min | activity | posture | foodcode | ptime
-------+-----+-----+----------+---------+----------+-------+-------+-----+-----+----------+---------+----------+-------
(0 rows)

Could have returned some rows, although I imagine this would
be an error in the data.

I can imagine a number of checks the database could do on
POINT_DATA.Min.  It could be unique within the sample,
it could be <= SAMPLES.Mins, and so forth.  None of these
are hard to incorporate into the database, but I've not
done it.  Let me know if somebody thinks it's worth doing.

The database does not check for everything that could
possibly go wrong.  There's particularly few checks
in the SAMPLES/POINT_DATA area of the database because
of the automated nature of the collection and data
loading.

Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein



More information about the Babase mailing list