[Babase] Papio Information

Karl O. Pinc kop at meme.com
Wed Jun 10 23:43:42 EDT 2009


On 06/10/2009 07:32:45 PM, Susan Alberts wrote:
> Thanks Karl and Ryan,
> 
>> 
>> The one important take-away point that differs from the previously
>> explained backup plan is that backups are put into the backup system
>> only every other day.  We do keep the prior day's backup, for 1 day
>> only, on disk on papio until it's overwritten by the next day's
>> backup.
>> So, in case of disaster we could lose 2 days of work.
> 
> For clarification, do you mean that the incrementals that Ryan talked  
> about are kept for only 24 hours?

No, the database dump to disk is kept for only 24 hours.  It is this
dump that we rely on to restore -- it is part of what's backed up every
2 days in incrementals to tape.

> I didn't understand "We do keep the
> prior day's backup, for 1 day only..." But this refers to a backup on
> papio, and I think the incrementals that Ryan is talking about are on
> a backup server. So, I think what Karl is saying is that in addition
> to Diffs, Fulls and Incrementals, we do a backup ourselves onto Papio
> but only keep it for 24 hours.

Right.

> So if an incremental is done Monday at
> 4 am (or whatever), and the server crashes catastrophically on Weds
> at 3 am, there is a chance that we've lost all changes from Monday 4 am
> onwards, because any backup we made on Papio might have been taken
> at, say, Monday 10 pm and not kept past Tuesday 10 pm. Right?

Right.

> But then, wouldn't there be a papio backup taken at 10 pm Tuesday?

If the crash was catastrophic enough (a meteor hits the primate
center, and the released animals are attracted to Babase
and wind up wreaking havoc in the server room) then
the 10pm Tuesday backup on papio would be
gone too, papio itself having been submerged
in questionable fluids.  Only what has already reached tape
survives.  The tapes, one presumes, are kept elsewhere
in a facility secured from primates.
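To make the exchange above concrete, here is a small Python sketch that works out which dump survives a crash that destroys papio.  It assumes, hypothetically, that a dump finishes on papio every night at 10 pm, that the tape run happens every other day at 4 am, and that each tape run captures the newest dump that had finished by then.  The function name and all the times are made up for illustration; they are not the actual Duke Biology schedule.

```python
from datetime import datetime, timedelta


def recoverable_dump(crash, dump_times, tape_times):
    """Return when the newest dump that survives the crash finished, or None.

    Assumes the crash destroys papio, so dumps still on papio's disk are lost;
    only a dump copied to tape before the crash can be restored. Each tape
    run is assumed to copy the newest dump that had finished by then.
    """
    best = None
    for tape in tape_times:
        if tape >= crash:
            continue  # this tape run never happened
        finished = [d for d in dump_times if d <= tape]
        if finished:
            on_tape = max(finished)
            if best is None or on_tape > best:
                best = on_tape
    return best


# Hypothetical schedule, week of June 8, 2009 (June 8 was a Monday).
dumps = [datetime(2009, 6, day, 22, 0) for day in (6, 7, 8, 9)]  # Sat-Tue, 10 pm
tapes = [datetime(2009, 6, 6, 4, 0), datetime(2009, 6, 8, 4, 0)]  # Sat & Mon, 4 am
crash = datetime(2009, 6, 10, 3, 0)  # Wednesday, 3 am

restored = recoverable_dump(crash, dumps, tapes)
print("restore point:", restored, "work lost:", crash - restored)
```

With these made-up times the tapes only hold the Sunday 10 pm dump, so a Wednesday 3 am meteor costs a little over two days of work (2 days, 5 hours); the Monday and Tuesday dumps sat only on papio and went down with it.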

> 
> Sorry if I'm not getting something obvious.

I didn't go into detail regarding how restores are done.

>> 
>> 
>> Ryan, the only thing I can think of that might go wrong is if
>> the database backup is being written at the same time that
>> the backup is done to tape.  This would leave us with an
>> incomplete backup on every tape.
> 
> I don't get this.

You cannot, in general, back up papio's disks and use that
to restore the database.  This is because the database may
be modified while the backup is running, changing parts
of the disks that, at any given moment, may or may not
have already been copied into the backup.
Thus what is backed up represents pieces of the
database from different moments in time, and the backup
as a whole does not represent the database at any one
particular moment; the backed-up image is not consistent.
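A toy Python illustration of the problem, assuming a "disk" of just two blocks holding two balances that must always sum to 100 (a made-up invariant standing in for any consistency rule that spans more than one place on disk).  A transaction moves 30 from one block to the other while the copy is halfway done.

```python
def copy_while_writing(live_disk, writes_during_copy):
    """Copy a disk block by block while other writes land on it.

    writes_during_copy maps a block index i to a list of (block, value)
    writes that happen right after block i has been copied.
    Returns (backup, live_disk_after).
    """
    live = list(live_disk)  # don't mutate the caller's list
    backup = []
    for i in range(len(live)):
        backup.append(live[i])  # copy block i as it is *now*
        for block, value in writes_during_copy.get(i, []):
            live[block] = value  # the database keeps running meanwhile
    return backup, live


# Two balances that should always sum to 100.
disk = [70, 30]
# After block 0 is copied, a transaction moves 30 from block 0 to block 1.
transfer = {0: [(0, 40), (1, 60)]}

backup, live = copy_while_writing(disk, transfer)
print("backup:", backup, "sum", sum(backup))
print("live:  ", live, "sum", sum(live))
```

The copy holds 70 and 60, a total of 130, which matches neither the state before the transaction (70, 30) nor the state after it (40, 60).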

(Further, the database is not designed to be backed
up as a raw disk image.  It is possible to back up
an instantaneous snapshot of the papio disks, but
it could take the database a long time after
startup to recover.  This is why it can take a long
time for the database to become available after, say,
a reboot due to an unexpected power failure.)

A consistent snapshot of the database is required so
that what is restored is a consistent database image.
We accomplish this by dumping the database to disk
with a program designed for the purpose (pg_dump & friends),
producing a consistent copy of the database, and then
backing up that dump onto tape or whatever else the
Duke Biology backup system uses.  However, this assumes
that what is written to tape is a complete and consistent
dump of the database.  If pg_dump is running at the same
time as the tape backup, then what goes to tape is a dump
that is still being written to disk, i.e. a truncated file,
and there is no consistent database dump on tape from
which we can restore.
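One way to notice this after the fact is to check that a copy of the dump is complete.  The Python sketch below assumes the dump is in pg_dump's plain-text (SQL script) format, which ends with a "-- PostgreSQL database dump complete" comment; a file cut off mid-write lacks it.  The custom and directory formats don't end this way, so the check doesn't apply to them, and the function names are hypothetical.

```python
import os

# Comment that pg_dump writes near the end of a plain-text (SQL script) dump.
DUMP_TRAILER = b"-- PostgreSQL database dump complete"


def tail_is_complete(tail: bytes) -> bool:
    """True if the last bytes of a plain-text dump contain the trailer."""
    return DUMP_TRAILER in tail


def dump_looks_complete(path: str, tail_bytes: int = 4096) -> bool:
    """Read only the end of a (possibly multi-GB) dump file and check it."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))  # jump to the last few KB
        return tail_is_complete(f.read())
```

This only detects truncation; it says nothing else about whether the dump will restore cleanly.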

You solve this either by coordinating the timing of the
two backup systems so that they never overlap, or by having
the database dump to disk keep the existing disk copy until
the new copy is completely written.  The latter requires more
disk space, but the database dumps are only a few GB so it's
entirely feasible.  I'd have to enhance the database dump to
disk program -- a task that may not be worth the bother, but
one that might have taken less time than I've spent explaining.
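Keeping the old dump until the new one is finished amounts to writing the new dump under a temporary name and renaming it into place only once it is complete.  A minimal Python sketch, assuming a POSIX filesystem where a rename within one directory atomically replaces the target.  `replace_atomically` and `pg_dump_to` are hypothetical names, not the existing Babase dump program; `--file` is pg_dump's real option for naming its output file.

```python
import os
import subprocess
import tempfile


def replace_atomically(dest, write_to):
    """Have write_to(tmp_path) produce a new file, then rename it over dest.

    The existing dest is left alone until the new file is completely
    written and flushed to disk, so dest is always a whole file.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    # Temp file in the same directory, so the final rename stays on one
    # filesystem (os.replace fails across filesystems).
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".partial-")
    os.close(fd)
    try:
        write_to(tmp)
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())  # make sure the new dump is really on disk
        os.replace(tmp, dest)  # the old dump disappears only now
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # don't leave a half-written dump lying around
        raise


def pg_dump_to(dbname):
    """Return a write_to callback that runs pg_dump into the given path."""
    def write_to(path):
        subprocess.run(["pg_dump", "--file", path, dbname], check=True)
    return write_to
```

Usage would be something like `replace_atomically("/path/to/babase.sql", pg_dump_to("babase"))`, with the path and database name as placeholders.  A tape run at any moment then finds either the complete old dump or the complete new one under the real name; at worst it also picks up a `.partial-` file that can be ignored.  Both copies exist briefly, which is the extra disk space mentioned above.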


Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein


