Querying the history

Knowing that every change to the data is recorded with a timestamp, it is now possible to "go back in time" and query tables in the database "as of" a specific time. Unfortunately, PostgreSQL does not include the syntax AS OF. However, the range operator @> effectively means the same thing.

Example I.1. Querying "as of" a date

  FROM sometable
  WHERE sys_period @> '2022-02-22 12:34:56'::timestamptz;


Note that in this example, only the fictional table SOMETABLE is being selected-from, so the query will only return rows that 1) were in the table at 2022-02-22 12:34:56, and 2) are still in the table now. Rows that were in the table on 2022-02-22 12:34:56 but which have since been removed or updated are in SOMETABLE_HISTORY in the babase_history schema, which is not selected-from here.

When querying for a table's data "as of" a specific date in the past, the best practice is to query both the table in babase and its history table in babase_history, simultaneously. The UNION or UNION ALL operators are ideal for this.

Example I.2. Querying a table's history "as of" a date

WITH sometable_all AS (SELECT * FROM babase.sometable
                       SELECT * FROM babase_history.sometable_history)
  FROM sometable_all
  WHERE sys_period @> '2022-02-22 12:34:56'::timestamptz;


Querying data from a view is more complicated. Views usually represent data from two or more tables that have been joined together somehow, so recreating data from a view will require that the Sys_Period of each component table is accounted-for. The "best" way to do this depends somewhat on the view itself, so a generalized example about "someview" is not provided here.

See below for specific examples using real tables and a real view.

Specific Examples

Returning to the story of TIM and JIM in the previous section, suppose a researcher named Dawn published a paper in 2015 that relied on data from a query that she executed at 2015-05-05 05:05:05. In 2015, it was not yet known that TIM and JIM were two names for the same individual, so Dawn's data includes both names and presumes that they are distinct individuals.

Several years later a new researcher, Pam, wants to revisit Dawn's analysis. As a first step, she needs to recreate Dawn's dataset. Pam cannot do this with the data in the babase schema alone; the TIM/JIM misidentification has already been identified and addressed in her time, so only one of those IDs is still present in Pam's "current" data. She needs the data exactly as it was when Dawn collected it.

Dawn's analysis used data from the BIOGRAPH and MATUREDATES tables, and interactions recorded in the ACTOR_ACTEES view. When collecting data to recreate the analysis, the code Pam would use to collect data from the tables is relatively simple:

WITH biograph_all AS (SELECT * FROM babase.biograph
                      SELECT * FROM babase_history.biograph_history)
  FROM biograph_all
  WHERE sys_period @> '2015-05-05 05:05:05'::timestamptz

WITH maturedates_all AS (SELECT * FROM babase.maturedates
                         SELECT * FROM babase_history.maturedates_history)
  FROM maturedates_all
  WHERE sys_period @> '2015-05-05 05:05:05'::timestamptz


Knowing that the ACTOR_ACTEES view is a join between INTERACT_DATA, two instances of PARTS, and two subqueries of MEMBERS, Pam accounted for the Sys_Period of each of those tables and recreated Dawn's May 2015 data using:

WITH dawns_time AS -- Declare the date/time once here so it doesn't
                   -- need to be retyped for every table
                   (SELECT '2015-05-05 05:05:05'::timestamptz AS this_time)
   , interact_data_all AS (SELECT * FROM babase.interact_data
                           SELECT * FROM babase_history.interact_data_history)
   , dawns_interact_data AS (SELECT *
                               FROM interact_data_all
                               WHERE sys_period @> (SELECT this_time FROM dawns_time))
   , parts_all AS (SELECT * FROM babase.parts
                   SELECT * FROM babase_history.parts_history)
   , dawns_parts AS (SELECT *
                       FROM parts_all
                       WHERE sys_period @> (SELECT this_time FROM dawns_time))
   , members_all AS (SELECT * FROM babase.members
                     SELECT * FROM babase_history.members_history)
   , dawns_members AS (SELECT *
                         FROM members_all
                         WHERE sys_period @> (SELECT this_time FROM dawns_time))
   , dawns_actor_actees AS (SELECT dawns_interact_data.iid AS iid
                                 , dawns_interact_data.sid AS sid
                                 , dawns_interact_data.act AS act
                                 , dawns_interact_data.date AS date
                                 , dawns_interact_data.start AS start
                                 , dawns_interact_data.stop AS stop
                                 , dawns_interact_data.observer AS observer
                                 , actor.partid AS actorid
                                 , COALESCE(actor.sname, '998'::CHAR(3)) AS actor
                                 , (SELECT actorms.grp
                                      FROM dawns_members AS actorms
                                      WHERE actorms.sname = actor.sname
                                            AND actorms.date = interact_data.date) AS actor_grp
                                 , actee.partid AS acteeid
                                 , COALESCE(actee.sname, '998'::CHAR(3)) AS actee
                                 , (SELECT acteems.grp
                                      FROM dawns_members AS acteems
                                      WHERE acteems.sname = actee.sname
                                            AND acteems.date = interact_data.date) AS actee_grp
                                 , dawns_interact_data.handwritten AS handwritten
                                 , dawns_interact_data.exact_date AS exact_date
                              FROM dawns_interact_data
                                   LEFT OUTER JOIN dawns_parts AS actor
                                        ON (actor.iid = dawns_interact_data.iid AND actor.role = 'R')
                                   LEFT OUTER JOIN dawns_parts AS actee
                                        ON (actee.iid = dawns_interact_data.iid AND actee.role = 'E')
  FROM dawns_actor_actees


Page generated: 2024-05-31T10:38:22-04:00.