[Babase] Changing Immigrant Male Census Data
Karl O. Pinc
kop at meme.com
Wed Nov 18 22:04:07 EST 2009
On 11/18/2009 03:02:38 PM, Karl O. Pinc wrote:
> On 11/18/2009 02:52:16 PM, Karl O. Pinc wrote:
> > On 11/18/2009 02:09:59 PM, Lacey Maryott Roerish wrote:
> > > Is this still running Karl?
> >
> > Yes. Interpolation is really slow when working
> > with the "old style" codes. There might be
> > a faster way to structure things than a single
> > delete statement, perhaps deleting one at a time
> > either from earlier to later or later to
I just calculated it'll take at least another 12
hours and decided to abort it.
>
> Thinking back to the conversion I bet the trick is
> to put a "regular" census row at "both ends" of the
> time period that you want to delete, then delete, then
> remove the "regular" census rows you added.
The other requirement to keep it from taking forever is
to keep the interval between the 2 "temporary regular
census rows" to a year or less. I believe that
interpolating around "old style" census codes
is an O(n^2) operation, which means that the
time taken is proportional to the square of
the number of contiguous "old style" census
rows. (This is not true when inserting an "old style"
census row as the last census of an individual,
but that's immaterial here.) As the number
of census rows goes up, the time taken gets out of
control. I was trying to delete about 3,500 rows at once and
the process of interpolation was going to create and destroy
as many MEMBERS rows as it had ever created
or destroyed in all the interpolation ever done
in Babase 2.0.
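
To make the scaling concrete, here is a rough back-of-the-envelope
sketch in Python. It assumes the cost of interpolating around a run
of n contiguous "old style" rows is simply n^2 arbitrary units. The
cost model, the function names, and the batch size of 100 are made up
for illustration; none of it is measured from Babase.

```python
def interpolation_cost(n_rows):
    """Assumed cost model: work grows with the square of the run length."""
    return n_rows ** 2


def batched_cost(total_rows, batch_size):
    """Total assumed cost when the rows are deleted in bracketed batches."""
    full_batches, remainder = divmod(total_rows, batch_size)
    return (full_batches * interpolation_cost(batch_size)
            + interpolation_cost(remainder))


one_shot = interpolation_cost(3500)   # all 3,500 rows in a single delete
in_batches = batched_cost(3500, 100)  # 35 bracketed batches of 100 rows
print(one_shot, in_batches, one_shot // in_batches)
```

Under this assumed model the batched approach does a small fraction
of the work, and the smaller the batches, the bigger the saving.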
Lacey, could you please try the above technique and see
how it works out for you? If you find the system
getting slower over time, you may need to log in as
babase_admin and run a VACUUM ANALYZE; command
to clean the cobwebs out of the whole database.
It wouldn't hurt to run one before you start if
you've just finished copying babase_test from babase.
(Or is that part of the copy procedure?)
If it makes you more comfortable, start by deleting
only a month at a time and make the intervals larger
if it does not take too long. (It is, however, important
to "bracket" the old-style census rows with regular
census rows during the deletion. You may be able
to get away with having, say, a regular
census row only at the upper end of the time interval
but it's not worth the time spent thinking about
or playing around with the actual requirements.)
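
One way to plan the intervals is sketched below in Python. It only
splits a date range into consecutive windows of at most max_days
days; it does not touch the database. The function name, the 30-day
default, and the example dates are all hypothetical.

```python
from datetime import date, timedelta


def deletion_windows(start, end, max_days=30):
    """Yield (window_start, window_end) pairs covering start..end
    inclusive, each spanning at most max_days days."""
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(days=max_days - 1), end)
        yield window_start, window_end
        window_start = window_end + timedelta(days=1)


# Hypothetical date range, chosen only for illustration.
for first, last in deletion_windows(date(1990, 1, 1), date(1990, 3, 15)):
    print(first, last)
```

For each window you would add the temporary regular census rows at
both ends, do the delete, and then remove the temporary rows before
moving on to the next window.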
Because we've always planned to get rid of the "old style"
census rows, the code was never optimized to deal with
interpolating changes made to them, and I
don't imagine it's worth spending any time on it now.
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein