[Babase] Re: Log access

Karl O. Pinc babase@www.eco.princeton.edu
Thu, 14 Oct 2004 14:34:34 -0500


On 2004.10.13 13:48 Hunter Matthews wrote:
> You can't change that - those logs go to our centralized logging host,
> and ALL of the LOCAL levels are spoken for.

Maybe I haven't clearly explained what I'm asking for.

I want a copy of the logs.  And I don't want postgresql log events to
be discarded, I want a log.  (You must too.)

I want to know when our database is not working as it should.  In
theory, I could get this from the users.  In practice this does not
work.  The users may not notice error messages, and they may not
accurately record or report them.  I'm not interested in controlling
or enforcing what tools the users use to access the database, and even
if I were, the tools used (like phppgadmin) may not report problems
back to the users.

Beyond the usual (whatever those are) problems the database server
might have manipulating our data, there are a number of other messages
the database server generates to indicate whether or not it is
operating as expected.  Bear in mind that modern client-server design
means that the server is doing more than acting as a disk store; it
enforces data integrity rules and even computes and records
second-generation data.  In other words, part of the application lives
on the server, and both unexpected application errors (program
crashes) and application-detected data problems must be reported back
to the users and recorded in the project's permanent records in case
the integrity of the data analysis is affected.  Examples of these
sorts of messages, in an ordering moving more or less from the system
to the application level, are:

  - Missing required data value.
  - Duplicate data value.
  - Referenced row does not exist.
  - Data inconsistent with other existing data values.
  - Trigger crashed while checking data integrity or updating the
    database.
  - Programmatic errors in functions used to retrieve and analyze data.

If these messages are not acted upon then bad data can result.  If
they are not recorded then there is no record of a possibly flawed
analysis, which is bad science.
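To make the middle of that list concrete, here is a hypothetical
example of the "referenced row does not exist" case (the table and
column names are invented for illustration, and the exact error
wording varies with the PostgreSQL version); the same ERROR the user
sees is what should end up in the server log:

    -- Insert an observation referring to an animal that isn't in
    -- the database.  The server rejects it:
    INSERT INTO observations (animal_id, observed_on)
         VALUES (9999, '2004-10-13');
    -- ERROR:  insert or update on table "observations" violates
    --         foreign key constraint on column "animal_id"

If nobody sees and records that ERROR, the missing observation can
silently skew a later analysis.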

Further, the database server logs changes to the database structure
as they are made.  Good science, including later review of prior
analytical results, requires that records of these changes be kept.
Taking the records straight from the postgres server logs ensures
effortless and accurate record keeping.
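For what it's worth, I believe the 7.x-era postgresql.conf settings
involved look roughly like this (a sketch only; parameter names change
between PostgreSQL versions, so check against the installed release):

    # Send server messages to syslog and log statements, DDL included.
    syslog = 2                   # 0 = stderr, 1 = both, 2 = syslog only
    syslog_facility = 'LOCAL4'   # the facility currently in use
    syslog_ident = 'postgres'
    log_statement = true         # log every statement as it executes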

The science being done depends on accurate data and complete record
keeping.  This means I need access to the postgresql server logs.

Likewise, much of the remainder of the application runs on the
webserver.  The update operations performed by the application must be
recorded as part of the regular data logbook kept by the project.
Doing this by hand is time-consuming and error-prone.  Automating it
requires that information be transferred from the webserver to the
babase project's shared data area,
login.biology.duke.edu:/biology/groups/babase/.  Rather than re-invent
the wheel, it seems to make sense to use the usual way daemons deliver
status messages to deliver application-level log events as well.  If
we can get postgresql log records we can get our own log records too,
and there is no reason not to log them to the same facility so they
end up in the same place our postgresql server log messages go.
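To sketch what I mean (the facility and tag below are placeholders,
not anything already configured), the web application could emit its
events with nothing more exotic than the stock logger(1) command:

    # Hypothetical: record an application-level event via syslog.
    logger -p local5.info -t babase-web \
        "dataset upload accepted: census.csv"

The message then travels through exactly the same plumbing as the
postgresql server's own log messages.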

Logs should be kept forever and be available to our project personnel
on the project's website.

> Karl,
>    I'm trying to help - truly I want you and your project to be
> successful. But thats a SHARED resource and I will not alter its
> configuration to suit one project hosted on it.

I would expect other users would want access to the logs too, for
exactly the same reasons.  As the server is shared, we have no problem
sharing our logs.  My suggestions in previous messages regarding
logging were primarily concerned with how you might supply us with
logs _without_ giving us other people's logs.

I am not proposing you change how you keep logs or where they go.  I
propose you supply us with (at least our part of) the logs you keep.
Not altering your current setup, just writing an extra copy.

I would expect you to do this by adding a single line to
/etc/syslog.conf on dbserve.biology.duke.edu, which must be where the
logs go first before going to your central log server.  (Although you
could copy us from the central server instead; I don't care.)  If you
want to get fancy, the copy could be filtered (grepped) so that we get
only the log messages related to our project.  There are many ways to
deliver a file from the database server (and web server) to our shared
data area; I leave this to you.
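The kind of line I have in mind is a sketch like this (the facility
and file path are assumptions; plain syslogd doesn't filter by message
content, so any grepping would happen in whatever job copies the file
to us):

    # Write an extra copy of the postgres facility (LOCAL4 at present)
    # to a local file that a cron job can later ship to our data area.
    local4.*        /var/log/babase/postgresql.log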

> 
> I'm trying to find that iostat? cron job thing, but haven't had a
> chance
> yet - I'll take whatever it generates and dump it in your home
> directory.

Ok.  Thanks.

> 
> *** There will not be any network configuration changes for that
> machine
> at this time ***. No ssh, no inetd/telnet, just don't ask.
> 
> I said I will look at the memory issue, and I will. I realize (solely
> from the amount of email today if nothing else) that this is priority
> for you. But again, thats a shared resource that does more than just
> run
> postgres. And the postgres database runs more than just the babase*
> datasets.

Right.  Any memory allocated to postgres will be shared by all
postgres users.

> I won't stop trying to adjust things for you today but the easiest way
> to adjust things is for you guys to acquire your own server.

I leave that call to you.  Let us know if we should get our own
server.  I'd think that delivering logs to us would be a lot less work
than administering another server, but I don't know your setup.  For
our part, I think buying our own server and using the standard syslog
mechanism to record application-level log events (generated on the
webserver) is more cost-effective (and more maintainable) than my
developing some sort of custom solution.  I don't see any alternative
to recording the postgres server logs in our project records, so as
far as the database itself goes I see no options.  Do you have
alternate suggestions?  Feel free to call.  773 363-2105.

It'd be a shame to have to get our own server because it'd be
mostly unused.  (Not that it's that big an expense.)

A related issue is that the project's present practice is to keep the
'dataset' into which our staff enters database data 'off-line'.  The
application is later supplied with these datasets and uses their
contents to update the database.  In the system we're building at Duke
I expect the staff will be entering data into Excel and proofing it
there before giving it to the webserver (as comma-delimited text or
whatever) for entry into the database.  It'd be good if the webserver
would save the actual 'dataset' used to update the database rather
than rely on the operator to keep records.  Is there any way that the
webserver could write into our project's shared data space on
login.biology.duke.edu:/biology/groups/babase/?  All we'd need to do
is drop a file into some drop box and a cron job could take it from
there.  (Myself, I'd make a custom ssh key that can run only "cat >
dropfile.$(date +%s).$$" or the paranoid perl equivalent, YMMV.)
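For concreteness, such a key might look like this in the drop-box
account's ~/.ssh/authorized_keys (the dropbox directory and the key
material are invented for illustration):

    # The webserver's key can only write a uniquely named drop file;
    # no other command, no forwarding, no tty.
    command="cat > /biology/groups/babase/dropbox/dataset.$(date +%s).$$",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... babase-webserver-drop

A cron job on login.biology.duke.edu could then sweep the dropbox
directory into the project's permanent records.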

I'm not sure how important this is; as the current practice is to do
it by hand, it can obviously be lived with.  Perhaps the babase team
can comment on how desirable it would be to have this happen
automatically.

> On Wed, 2004-10-13 at 14:01, Karl O. Pinc wrote:
> > I've tried setting syslog_faclity in the babase_test
> > database and don't have permissions. (now is LOCAL4.)
> >
> > \u babase_test
> > select * from pg_settings where name = 'syslog_facility';
> > alter database babase_test set syslog_facility='LOCAL5';
> >
> > You might be able to do it as superuser, or maybe this
> > is something that's not per-database.
> >
> > There's also the question of logging application events,
> > and accessing these logs.
> > which is tied to system events as I want a log of

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                  -- Robert A. Heinlein