[Babase] mpis-further thoughts after using

Karl O. Pinc kop at meme.com
Mon Mar 24 16:45:19 EDT 2008


On 03/23/2008 11:20:23 PM, Susan Alberts wrote:
> OK this sounds really good, I didn't realize that you had already  
> sorted this out.

You'll want to review the docs to be sure it really is sorted out.

The whole point of having a design/documentation is that we
sort out these problems early, before we have to go back
and make changes to work already done.  We didn't do that this
time.  I was confident Lacey and I had worked out all the
kinks in our phone calls, and I was in a hurry, so I short-circuited
the process: there wasn't all that much work involved once the
design sorted itself out, or at least not much that I could see
going badly.  Time will tell if this was the right decision.
Regardless, the process that minimizes wasted time (but not
necessarily elapsed time) is _first_ writing up the design
and _then_ implementing the design.  This does not always work
because there's nothing like poking a working system with
a stick to see whether it does what you want, but it's as
good a process as you'll get.  We should really try to follow
it in the future and not develop bad habits.

> One issue is whether the multiple actors are entered with enough  
> consistency that your import program could do this. Am I right that  
> the format of these would always have to be the same? How exactly?  
> (ie spaces in the same places? etc)

That's a little hard to say, given that the import program does not
exist.  It's not particularly tricky, but it does not exist.

(A bald hack would be to kluge up an awk program to mung your
tab delimited data into a form suitable for import into
the right mpi view.*  I wouldn't expect it to take more than
an hour or so.  But it'd be annoying to run because it'd work
from a unix-like command line (on a mac or papio or even on an
MS Windows box).  Thing is, it'd only take a little more time
to do the actual import program.)

I can do pretty much anything, so long as you describe it to me.
Eliminating extra spaces is easy.  I'd be leery about going
much farther than that for fear of unintended consequences,
but if you feel you know what the data looks like and what
transformations are "safe" feel free to go wild.
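
To make "eliminating extra spaces" concrete, here is a minimal
sketch in Python.  It is not the import program (which does not
exist); it assumes only that the data is tab delimited, as mentioned
above.  The function names are made up for illustration.

```python
import re


def normalize_field(field: str) -> str:
    """Trim both ends and collapse internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", field.strip())


def normalize_line(line: str) -> str:
    """Clean every tab-separated field on one line, keeping the tabs.

    Assumption: fields are separated by single tab characters, so an
    empty field (two tabs in a row) is preserved as an empty field.
    """
    fields = line.rstrip("\r\n").split("\t")
    return "\t".join(normalize_field(f) for f in fields)
```

Nothing here changes case, reorders columns, or guesses at
misspelled codes; anything beyond whitespace cleanup would need to
be spelled out as a "safe" transformation first.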

The hard part of the import program is a rigorous documentation
of the format of the data imported.  Something like the psion
documentation on the wiki.  We've documented what the data's
supposed to look like once it gets into the DB; the other half
is (exactly) what it looks like now.  Actually transforming one
into the other is not hard once those two pieces are in place
because, well, there isn't _that_ much shuffling around of
data when it comes down to it.  The trick is knowing what's
supposed to be done.

(It might be of interest to compare the psion data format
document with the textual source code of the upload program.
Seems to me the psion format is much more complicated
than the MPI format.
https://papio.biology.duke.edu/babasewiki/PsionFormat
https://papio.biology.duke.edu/src/babase/www/html/programs/psionload/index.php
For the source of functions shared with other programs see
the content of:
https://papio.biology.duke.edu/src/babase/www/include/
)

IIRC there are only 2 reasons we need an import program at all.
First, the consortship information is mixed in with the
MPI stuff.  That needs to be saved and put in (according to the
db rules) after the interactions.  (Maybe not after _everything_,
but I've a mental note that this is easiest.) Second,
there's the issue you brought up, transforming interactions
between multiple individuals that appear as one line on the
data entry sheets.
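
As a rough sketch of those two jobs, and only a sketch: the input
format is not documented yet, so everything about the layout below
is an assumption.  It pretends each row is (kind, actors, recipient),
that multiple actors are separated by commas, and that consortship
rows are marked with the kind "C".  The real rules have to come from
the data format document.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (kind, actor(s), recipient).
Row = Tuple[str, str, str]


def expand_and_order(rows: Iterable[Row],
                     consort_kind: str = "C",
                     sep: str = ",") -> List[Row]:
    """Split multi-actor rows and move consortship rows to the end.

    Assumptions (not from any real spec): consortship rows carry
    `consort_kind` in the first column, and several actors on one
    line are separated by `sep`.
    """
    interactions: List[Row] = []
    consorts: List[Row] = []
    for kind, actors, recipient in rows:
        if kind == consort_kind:
            # Save consortships; they go in after the interactions.
            consorts.append((kind, actors, recipient))
            continue
        # One output row per actor named on the line.
        for actor in actors.split(sep):
            actor = actor.strip()
            if actor:
                interactions.append((kind, actor, recipient))
    return interactions + consorts
```

Whether a multi-individual line should become one row per actor, one
per pair, or something else is exactly the sort of thing the format
document has to pin down; the one-row-per-actor choice here is a
guess.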

I tend to write the data format documents because I know
I'm anal enough to capture the necessary detail, but
there's no particular reason it has to be me doing this.
I almost surely will need to edit the result because
_something's_ bound to be left off, but that's just
a normal part of discovering that things take longer
than we think they will.

Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein

*  Sorry for the jargon.  I figured you'd get the
drift and I couldn't resist the flavor it imparts.


