[Babase] Question about file upload

Karl O. Pinc kop at meme.com
Wed Mar 18 21:43:36 EDT 2009


On 03/18/2009 05:27:42 PM, Ryan Hardy wrote:
> As far as I am aware, PostgreSQL databases are stored in 1GB chunks,  
> so file size limitations shouldn't come into play.  I'm not aware of  
> any limitations of volume size for backups, but I'll admit we don't  
> have any volumes over 500GB or so and I've not done any testing of  
> the infrastructure beyond that point.

The problem with the backups, as done at Duke, is that they
rely on dumping the databases into the filesystem. One file
per database.  So, the total (compressed) backup of the
entire cluster, per your remark below, is limited
to 2TB.

Note that the "traditional" biology.duke.edu PostgreSQL backup
script is: papio.biology.duke.edu:/etc/cron.daily/postgres_backup.cron

There are 2 problems with this script.  The first is
that the config files and logs are not backed up.  Not really
an issue because they are presumably backed up elsewhere.
(Although I think maybe not; is it only the /disk/
partition that gets backed up regularly?
The PG configs live on /var/.
At least that's the way papio's set up.  I have
got some of the configs in /usr/local/etc/ on papio,
softlinked back, because I wanted to keep all the
stuff I fiddled with in one place.)

The second problem is that the permissions on the
databases themselves are not backed up.  That is, the permissions
that the PostgreSQL users have on the Postgres database
objects:  who owns the database, who's allowed to create
databases themselves, create tables in each database,
etc.  I forget the exact details.  IIRC the stuff inside the
databases retains its permissions with the old script.
This is a real problem if there are more than one or two
databases or if there are non-default permissions.

To solve these problems I extended Hunter's script to be:
papio.biology.duke.edu:/etc/cron.daily/babase_postgres_backup.cron
It makes exactly the same files as the original but also dumps the
cluster schema (object definitions) and saves the config files.
It also has a documented restore procedure at the top.
(I tested each individual part of the restore procedure separately, but
never restored the entire cluster from bare O/S.)
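
For anyone who has not read the script, here is a rough sketch of the
general idea (per-database dumps plus the cluster-wide objects), written
in Python.  This is not the contents of babase_postgres_backup.cron;
the function name, database name and backup directory are made up for
illustration.  It only builds the command lines and runs nothing.

```python
from pathlib import PurePosixPath


def backup_commands(databases, backup_dir):
    """Return argv lists for one backup run (nothing is executed here)."""
    backup_dir = PurePosixPath(backup_dir)
    commands = []
    # One compressed, custom-format dump file per database.
    for db in databases:
        dump_file = backup_dir / (db + ".dump")
        commands.append(["pg_dump", "--format=custom", f"--file={dump_file}", db])
    # Cluster-wide objects (roles, tablespaces) that per-database
    # dumps leave out.
    globals_file = backup_dir / "globals.sql"
    commands.append(["pg_dumpall", "--globals-only", f"--file={globals_file}"])
    return commands


if __name__ == "__main__":
    # Hypothetical database name and backup directory.
    for argv in backup_commands(["babase"], "/var/backups/postgres"):
        print(" ".join(argv))
```

The config files (postgresql.conf, pg_hba.conf and so on) would still
need to be copied separately, since neither dump command touches them.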

The only problem is that papio is running both scripts,
so it is using twice the filesystem space for backup storage.
You're the backup guy so we rely on you to think about this
"problem".  It's not an issue now, but will become so
if we start putting GBs of data into the db.  Please think
about this.  I'm happy to turn off "Hunter's script",
but also don't want to be doing things differently from
everything else at Duke because you're the one
ultimately responsible for restores.

FYI, according to Hunter many Postgres dbs at Duke
were created with the default encoding, which is UTF-8.
The problem with this is that collation is significantly
slower than if you use the C (ASCII) locale/collating
sequence.  Most applications are not going to need
UTF-8 and so are slowed for no reason.  (There
are a couple of other big tweaks too.  I forget which;
see the papio config files.  Mostly increasing shared
memory size.  The Postgres FAQ probably has something too.
It has gotten easier in newer versions.)
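
To illustrate what the C collating sequence means in practice, here is
a small Python sketch.  Under the C locale strings are compared byte by
byte, so no language-specific ordering rules have to be applied.  The
word list and function name are made up for the example; this shows
the resulting sort order only and is not a benchmark.

```python
def c_locale_sort(strings):
    # Comparing the encoded bytes mimics a plain byte-by-byte compare,
    # which is how strings order under the C locale.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    words = ["apple", "Banana", "cherry"]
    # Upper-case letters have lower byte values than lower-case ones,
    # so "Banana" sorts ahead of "apple".
    print(c_locale_sort(words))
```

A language-aware locale would typically put "apple" before "Banana";
that extra comparison work is where the slowdown comes from.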

> 
> I do believe that the maximum volume size for 'ext3' filesystems is  
> something like 2TB though (depending on the block size), so that is  
> likely the limit of the current setup.

There's a table size limit too, IIRC (32TB with the default
block size).  But tables and other
system objects can be placed into "tablespaces", which
can be on separate partitions.  This removes the total DB size
limit, and can also be used to tweak performance.
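
As a rough sketch of what that looks like, the Python below just builds
the two SQL statements involved: one to create a tablespace on another
partition and one to move a table into it.  The tablespace name,
directory and table name are hypothetical, and the helper does no
quoting or validation of its arguments.

```python
def tablespace_sql(name, location, table):
    """Build the statements as strings; nothing is sent to a server."""
    return [
        # The directory must already exist and be owned by the
        # operating system user that runs the Postgres server.
        f"CREATE TABLESPACE {name} LOCATION '{location}';",
        # Moves the table's data files into the new tablespace.
        f"ALTER TABLE {table} SET TABLESPACE {name};",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in tablespace_sql("bigspace", "/disk2/pg_tablespace", "big_table"):
        print(stmt)
```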

> 
> -Ryan
> 
> On Mar 18, 2009, at 12:18 PM, Karl O. Pinc wrote:
> 
>> We will have problems with backups exceeding the
>> maximum file size long before we reach these limits.
>> I think we run into a problem around 1TB?
> 
> 

Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein


