Copyright (C) 2004, 2005, 2008, 2010, 2011 Karl O. Pinc

This file is part of Babase.

Babase is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with Babase.  If not, see <http://www.gnu.org/licenses/>.

Karl O. Pinc

Design decisions regarding the database and its construction toolkit.

Tools:

PostgreSQL: database

PostgreSQL was chosen because it supports triggers and full referential
integrity.  As of this writing MySQL has only begun to have similar
support.

PL/pgSQL: database extension language

Comes with PostgreSQL.  Most of the headaches involved in data
validation have to do with the timewise sequencing of trigger
invocations, which is nothing that can be solved by the choice of
language.  May as well use PL/pgSQL; it's almost SQL.

PostGIS: spatial data support

Extends PostgreSQL with spatial data types.

M4 (and debugging with m4): macro pre-processor

Many of the programs used to construct the database are produced with
the m4 macro toolkit.  SQL databases in general, and PostgreSQL in
particular, are not designed so that the tables can be abstracted out
of the SQL and a variable used in their place.  There are some
work-arounds, generally less efficient, but why go there when plain old
textual substitution does the trick quite nicely?  Having started down
the m4 path it only makes sense to rely on m4 as the primary means of
abstraction.  So, for instance, the database's constants are defined
using m4 rather than a block of PL/pgSQL code.
The downside to m4 is that the line numbers in the m4 source have no
relationship to the line numbers reported for runtime errors.  The
obvious solution is to expand the m4 source and work with the line
numbers in the resulting output when debugging.  This does not work
very well, as there is always some text in the m4 output preceding the
actual executable code; if nothing else there will be a bunch of SQL
function declaration lines.

PostgreSQL currently reports errors on line numbers in PL/pgSQL
functions by counting the line containing the opening single quote as
line 0.  Working in Emacs I find it easiest to use command repetition
and the go-down-a-line command to find the line containing the error.
PostgreSQL also reports errors by character number.  (These seem to
come from the parser.)  Character counts start with the first character
of the statement, the C in CREATE FUNCTION, the S in SELECT, etc.
Again, using command repetition and the forward-character command works
well in locating the error.  It may be that dumping the code of the
function out of the database and working with that would yield better
results, but my limited experience has not borne this out.  In any case
this has not yet been a big problem, so I'm not worrying about it.

A few data types are declared in m4, as PostgreSQL 7.3 has some
limitations in the way data types are abstracted.  These are declared
in the include directory.

To expand an m4 file use the m4 command.  You will often need to supply
(multiple) -I arguments so that the correct include files will be
found.  Typically you will need a path to the "include" directory, and
often to the directory containing the file you are expanding as well.
The m4 documentation is in a texinfo file and must be read with the
info command; there are no man pages.

Most if not all m4 macros are surrounded with changequote() calls to
change the opening and closing quote characters to { and } and then
back.
This is so that the regular single quote character can be used inside
the macro.

Dia: diagram editor

It plays nicely with XML and so stores well as text in revision
archives.  Besides, other than Visio it's the only game in town; we
can't run Visio on Linux, and who knows whether it's scriptable enough
to work with Makefiles.

Make: determines which pieces are used to construct which pieces

Gotta have Make.  GNU make is what comes with the system, so it is what
we use.

sh, psql, and the POSIX/Unix environment: supports make

There are also a couple of utilities I quickly whacked together.

Apache: the webserver

The world's leading webserver.  Any webserver which supports PHP should
work.

PHP: the web programming language

Kinda sticky around the edges, but this popular programming language is
easy to work with.

DocBook: technical documentation support

Handy dandy.  Input files are text/XML.  Output files are anything.
Produces good output.  Standard, and an open format.

ImageMagick: technical diagram image manipulation

Dia does not have good output generation yet, and DocBook is immature
in its input diagram formats, so this bridges the gap.

OpenJade: DocBook XML validation

Seems to be the standard DocBook toolset.

xmlto: applies DocBook XML DTDs to generate readable documents

Seems to be the new tool of choice for converting from DocBook.

PassiveTeX: a DocBook to PDF tool

Lame.  A bad choice.  Chosen because it takes XML (not just SGML) as
input.

xsltproc: an XML styler and munger

Fast and popular.  Used both for styling DocBook output and for sucking
the view definitions out of the documentation and producing the code
that creates the views.

awk: a text processing language

Pervasive, on Unix at least.

Darcs: revision control and source code management

Started with CVS, which is clunky; Subversion is not much better.  Used
GNU Arch for a while but was never satisfied, and they replaced
themselves with Bazaar.  Darcs is clean and simple and works for what
we need to do.
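The m4 conventions described above, constants in an include file found
via -I, and changequote() switching the quote characters to {} and
back, can be sketched with a small self-contained example.  All file
names, macro names, and values below are made up for illustration; they
are not Babase's actual files.

```shell
# A made-up constants file.  changequote() switches the m4 quote
# characters to { and } so the ordinary single quote can appear inside
# macro bodies, then switches them back.
cat > constants.m4 <<'EOF'
changequote({,})dnl
define({MAX_GROUPS}, {30})dnl
changequote(`,')dnl
EOF

# A made-up m4 source file that uses the constant.
cat > schema.sql.m4 <<'EOF'
include(constants.m4)dnl
CREATE TABLE groups (gid INTEGER CHECK (gid <= MAX_GROUPS));
EOF

# Expand, with -I naming a directory to search for include files.
out=$(m4 -I . schema.sql.m4)
echo "$out"
# Prints: CREATE TABLE groups (gid INTEGER CHECK (gid <= 30));
```

Plain textual substitution replaces MAX_GROUPS with the constant's
value, which is exactly why no PL/pgSQL-level abstraction is needed.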
Design notes:

Triggers:

All referential integrity is done in the triggers.  This allows any
program whatsoever to be used to update the database, and users can
make whatever ad-hoc updates are desired, for the most part without
fear of database corruption.

There are some checks which cannot be done in the triggers, as well as
some unusual data allowed in the database in exceptional circumstances
which one would not normally expect to see.  These conditions are
tested for by the check_integrity script.

All updates are expected to take place inside transactions.  When the
triggers fail they roll back the current transaction.

Triggers are not presently highly optimized; for example, the triggers
delete and recreate all of an individual's REPSTATS and CYCSTATS
whenever anything changes, rather than identifying just those rows that
need adjustment.  (Except that they don't currently update either
REPSTATS or CYCSTATS, but they do do a lot of other data creation.)

Interpolation:

The interpolation routines are called from the CENSUS triggers.  They
update MEMBERS and change the Statdate on BIOGRAPH.  When they change
the Statdate they do not interpolate the 'corner case' of interpolation
to the Statdate.  When the interpolation routines do not change the
Statdate, they will interpolate over the Statdate corner case interval.
(And the interpolation routines always interpolate over the Birth
interval corner case.)  The BIOGRAPH triggers look for changes in Birth
or Statdate and _also_ interpolate the corner case intervals, creating
the appropriate rows in MEMBERS over those intervals.  (Fill in
function names when the functions are written.)

Sequence numbers:

Some tables have sequence number columns (CYCPOINTS.Seq,
CYCPOINTS.Series, PREGS.Parity).  These are automatically generated
using the following technique:

--
-- The triggers actually update cycpoints.  This is the strategy:
--
-- A BEFORE BY ROW INSERT/UPDATE trigger:
--   Computes the appropriate values for Cid, Seq, and Series.
--   If they differ from (non-NULL) supplied values, throw an
--   exception.
--
--   Constructs and inserts a Mdate row from a Tdate row, if
--   necessary.
--
-- An AFTER BY ROW INSERT/UPDATE trigger:
--   Checks Seq of the current row, detects duplicates in other rows
--   and, if found, sets Seq >= MIN(NEW.Seq, OLD.Seq) + 1 to the
--   correct value (not NULL, to avoid an infinite loop on error)
--   for processing by the BEFORE BY ROW trigger.  (This presumes
--   the correct value will never be a duplicate, so be sure there's
--   first a duplicate date check or suchlike.)
--   Note that this also sets the Seq of other CYCPOINTS in the same
--   cycle (same Cid) to the correct value.
--   Updates REPSTATS and CYCSTATS.  (When the date or Code changes.)
--
-- An AFTER BY ROW DELETE trigger:
--   Sets the Seq of all rows >= the earliest gap to NULL.
--
-- An AFTER BY STATEMENT INSERT/UPDATE/DELETE trigger:
--   Checks Seq for duplicates/gaps.
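The duplicate/gap invariant that the AFTER BY STATEMENT trigger
enforces can be illustrated outside the database.  The sketch below
runs the same check over made-up (Cid, Seq) pairs with awk, assuming
Seq numbering within a cycle is dense and starts at 1; the data and
that assumption are illustrative only, not a statement about the real
trigger code.

```shell
# Sample (Cid, Seq) pairs: cycle 1 contains a duplicate Seq,
# cycle 2 a gap.  Made-up data for illustration.
printf '%s\n' '1 1' '1 2' '1 2' '2 1' '2 3' > seqs.txt

# For each cycle, compare the row count, the maximum Seq, and
# whether any Seq repeats; dense 1..N numbering means the count
# and the maximum must agree and nothing may repeat.
check=$(awk '{
    count[$1]++
    if (++seen[$1 SUBSEP $2] > 1) dup[$1] = 1
    if ($2 > max[$1]) max[$1] = $2
} END {
    for (c in count)
        if (dup[c] || max[c] != count[c])
            print "cycle " c ": duplicate or gap in Seq"
}' seqs.txt)
echo "$check"
```

Both sample cycles are flagged; a cycle with Seq values 1, 2, 3 would
pass silently.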