Differences between revisions 14 and 15
Revision 14 as of 2009-11-11 20:13:25
Size: 14174
Editor: KarlPinc
Comment: Shutting down cleanly
Revision 15 as of 2009-11-11 20:23:33
Size: 14881
Editor: KarlPinc
Comment: Disconnecting from the backup computer

Administration Guide to Backups of the Entire Computer


Design Overview

The backup computer must be physically plugged into an ethernet network that is itself connected to the Internet. It will automatically configure itself via [http://en.wikipedia.org/wiki/Dhcp DHCP].

The backup computer is a stock [http://debian.org/ Debian] Linux computer. Backups are kept on an external USB 2.0 hard drive. Backups are daily [http://rsync.samba.org/ rsync] snapshots, complete copies, of all of papio's filesystems via a script invoked from /etc/cron.d/rsync_backup. There should be at least 100 of them, probably more, in directories named by timestamp. The oldest backups are removed as space runs out or the limit of 400 backups is reached. The filesystem is ext3. On the backup computer the external hard drive is mounted on /srv/backups/papio/.
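
The retention rule above can be sketched in shell. This is a minimal illustration, not the actual backup script: the function name list_excess_backups is hypothetical, and it assumes the timestamp directory names sort chronologically (for example 2009-11-11), which is not stated here. It only lists removal candidates and deletes nothing.

```shell
# Hypothetical helper: print the oldest backup directories beyond a maximum.
# $1: directory holding one subdirectory per backup
# $2: maximum number of backups to keep
list_excess_backups() {
    dir=$1
    max=$2
    # Count the immediate subdirectories (one per backup).
    total=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    excess=$((total - max))
    if [ "$excess" -gt 0 ]; then
        # Assumes names sort oldest first; print only the surplus.
        find "$dir" -mindepth 1 -maxdepth 1 -type d | sort | head -n "$excess"
    fi
}
```

For example, list_excess_backups /srv/backups/papio 400 would print nothing until more than 400 backup directories exist.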

The expectation is that to restore we would give the external hard drive to Duke technical support staff. We could restore individual files or even database backups over the network, but for anything close to a bare-metal restore we'd expect Duke to have primary responsibility.

Papio backs up the database to disk daily (by /etc/cron.daily/[http://papio.biology.duke.edu/repos/babase/doc/pgsql_examples/babase_postgres_backup.cron babase_postgres_backup.cron]) as standard pg_dump files. The output is in /srv/babase_database/postgres/. To restore the database use the standard postgres restore tools, i.e. pg_restore. See the restore instructions in the backup script.

The VPN Tunnel

Because the backup server may not have a static IP, and to get around NAT issues on the network to which the backup server is connected, the backup server initiates a not-necessarily-encrypted (but definitely authenticated) [http://openvpn.net/index.php/open-source.html OpenVPN] VPN tunnel to papio. All communication between papio and the backup server is through this tunnel.

Papio initiates all the backups. Papio ensures that the VPN tunnel to the backup server does not forward to the rest of the network. Other than the VPN connection there are no inbound connections to papio.

The OpenVPN tunnel used for backups listens on papio on port 1195, rather than the usual 1194. This is because the VPN used for backups uses certificate authentication whereas the regular VPN does not.

The rsync command uses --hard-links so that no additional storage is allocated for those files that do not change between backups. The ext3 backup partition is created with (mke2fs -i 4096) 4096 bytes per inode to allocate the additional inodes such a scheme requires.
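
The storage saving rests on a filesystem property: a hard link is a second name for the same inode, so an unchanged file that appears in two snapshots occupies its data blocks only once. The following sketch illustrates that property with plain ln; it is not taken from the backup script, the directory and file names are made up, and stat -c assumes GNU coreutils as found on Debian.

```shell
# Create two made-up snapshot directories in a scratch area.
snapshots=$(mktemp -d)
mkdir "$snapshots/2009-11-10" "$snapshots/2009-11-11"
echo "unchanged since yesterday" > "$snapshots/2009-11-10/notes.txt"

# Hard-link the unchanged file into the newer snapshot instead of copying it.
ln "$snapshots/2009-11-10/notes.txt" "$snapshots/2009-11-11/notes.txt"

# -ef is true when both names refer to the same inode on the same device.
if [ "$snapshots/2009-11-10/notes.txt" -ef "$snapshots/2009-11-11/notes.txt" ]; then
    echo "one copy on disk, two names"
fi

# GNU stat: %h prints the number of hard links to the file.
links=$(stat -c '%h' "$snapshots/2009-11-11/notes.txt")
echo "link count: $links"

# Remove the scratch area.
rm -rf "$snapshots"
```

Directories cannot be hard-linked in this way, so every snapshot still needs fresh inodes for its directories and for any file that changed.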

File ownership

Backups are stored with the numeric uids and gids used on papio. These may or may not, likely not, correspond to the uids and gids on the backup server. Caution is required.
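
When inspecting a backed-up file it can help to look at the numeric ids rather than the names the backup server happens to map them to. A minimal sketch, assuming GNU stat; the function name numeric_owner is hypothetical:

```shell
# Hypothetical helper: print "uid:gid" of a file as numbers,
# bypassing any name lookup in the local passwd and group files.
numeric_owner() {
    stat -c '%u:%g' "$1"
}
```

The ls -ln command shows the same numeric ids in a directory listing.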

The backup script

The backup script is custom because, at the time of this writing, the rsnapshot program neither purges based on available partition space nor is otherwise oriented around partitions.

Backups of the backup server(s)

The backup server is itself backed up to papio using the same methods. Again, all connections are initiated by papio.

Administrative Tasks

There are two administrative tasks to be performed daily.

The administrator of the backup computer receives daily emails reporting the status of the daily security updates. The first task of the administrator is to monitor these emails. Should the email report errors, an extremely unlikely occurrence, a Linux administrator should be called in to examine the situation. Should the email report that manual intervention is required to install a security update, the administrator should follow the procedure below to install the security update and to reboot the backup computer.

Should a daily backup, or some other automated operation, fail, the administrator will receive an email reporting this problem. The second task of the administrator is to refer these failures to a Linux administrator for resolution.

Aside from the backup itself, the only automated process specific to the backup system is a daily check that a minimum number of backups exist. Currently this minimum is set to 30. Mail is sent to the administrator for possible action should the number of existing backups fall below the limit. This number, as well as the maximum number of backups to keep, is set in /etc/cron.d/purge_backups. (The maximum should be set low enough to keep the filesystem from running out of inodes.)
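
The idea behind the minimum check can be sketched as follows. This is an illustration only, not the contents of /etc/cron.d/purge_backups; the function name check_minimum_backups is hypothetical, and it assumes each backup is one subdirectory of the given directory. When run from cron, any output would be mailed to the cron job's owner.

```shell
# Hypothetical helper: warn when fewer than a minimum number of backups exist.
# $1: directory holding one subdirectory per backup
# $2: minimum acceptable number of backups
check_minimum_backups() {
    dir=$1
    min=$2
    count=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    if [ "$count" -lt "$min" ]; then
        echo "WARNING: only $count backups in $dir (minimum $min)"
        return 1
    fi
    return 0
}
```

For example, check_minimum_backups /srv/backups/papio 30 prints a warning and returns 1 when fewer than 30 backup directories are present.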

Anchor(connecting)

Connecting to the backup computer

Logins, usernames and passwords, are required to connect to the backup computer. They are handed out by Karl, or whoever has the [http://en.wikipedia.org/wiki/Superuser root] password.

Those with physical access can plug in a keyboard and screen, and even a mouse if a GUI is desired.

Connections may be made to the backup computer using ssh from the physical network (LAN) to which the backup computer is plugged in. The backup computer obtains its IP address via DHCP. If connections via the LAN are to be made it is up to the LAN administrator to keep track of the IP address assigned to the backup computer.

The presumption is that NATting will prevent arbitrary hosts on the Internet from connecting to the backup server. If this is not the case it is up to the network administrator of the backup server's network to firewall the backup server to prevent ssh connections from the Internet.

Connections from the Internet to the backup computer are made over ssh, via putty or some other ssh client. The user must first connect to papio and then connect over the VPN to the backup computer. This approach serves two purposes: it bypasses NATting and inbound connection firewalling; and it renders moot the occasional random changes most consumer ISPs make to assigned IP addresses which, in turn, randomly change the "location" of the backup server on the Internet.

Once logged in to papio the command to connect to the backup server (where username is your assigned login name) is:

ssh -l username backup-server1

If the username on the backup server is the same as the username used on papio the command may be shortened to:

ssh backup-server1

Disconnecting from the backup computer

Those who use ssh to connect to the backup computer can disconnect by typing:

exit

This command will have to be typed twice if the su command is in effect, once to exit from the su command and cease being root and again to exit as a normal user.

The exit command can often be typed rapidly enough after the machine has been told to halt or reboot that the session will end before the backup computer finishes its shutdown sequence. Typing exit in these circumstances has the advantage of taking effect immediately. It may otherwise take some time for the ssh session to realize that the backup computer has stopped.

Daily messages

There are 3 sorts of daily messages.

If no message at all is received this is a sign of trouble and should be investigated.

Typically if an email is not received it is because the DSL modem used to connect the backup server's network to the Internet has locked up. The typical solution is to physically unplug the power cord of the DSL modem, wait 30 seconds, plug the power back in, and wait 2 minutes before testing that the Internet is again available on the local network.

Nothing done

The typical daily message will indicate that nothing happened.

Subject:        Cron <root@foo> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
Date:   09/09/2009 06:25:33 AM
From:   Cron Daemon <root@foo.example.com>
To:     root@foo.example.com

/etc/cron.daily/security_updates:
No security updates to apply.

The administrator need do nothing upon receiving such a message.

Security updates performed

On occasion the system will automatically update itself. This example shows 2 packages being updated.

Subject:        Cron <root@foo> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
Date:   09/09/2009 06:25:33 AM
From:   Cron Daemon <root@foo.example.com>
To:     root@foo.example.com

/etc/cron.daily/security_updates:
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Reading extended state information...
Initializing package states...
Reading task descriptions...
The following packages will be upgraded:
  libmysqlclient15off mysql-common
2 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 1920kB of archives. After unpacking 0B will be used.
Writing extended state information...
(Reading database ... 40166 files and directories currently installed.)
Preparing to replace mysql-common 5.0.51a-24+lenny1 (using .../mysql-common_5.0.51a-24+lenny2_all.deb) ...
Unpacking replacement mysql-common ...
Preparing to replace libmysqlclient15off 5.0.51a-24+lenny1 (using .../libmysqlclient15off_5.0.51a-24+lenny2_i386.deb) ...
Unpacking replacement libmysqlclient15off ...
Setting up mysql-common (5.0.51a-24+lenny2) ...
Setting up libmysqlclient15off (5.0.51a-24+lenny2) ...
Reading package lists...
Building dependency tree...
Reading state information...
Reading extended state information...
Initializing package states...
Reading task descriptions...

Note that the important part is the bottom where, unlike the next example, there is no indication that the administrator need manually intervene. There are times when the system automatically installs some updates but others require manual installation.

The administrator should scan such emails for words like "error" or "fail" in the unlikely event that something failed during the automatic security update.
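
For those who save a message to a file, the scan can be done with grep. A minimal sketch; the function name find_trouble_lines is hypothetical and the search words are the two mentioned above:

```shell
# Hypothetical helper: print, with line numbers, every line of a saved
# message that contains "error" or "fail" in any mix of upper and lower case.
# Returns 1 (grep's "no match" status) when nothing suspicious is found.
find_trouble_lines() {
    grep -i -n -E 'error|fail' "$1"
}
```

A match is only a prompt to read the message more closely; package descriptions can contain these words without anything having gone wrong.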

Manual intervention required

Less frequently the administrator must manually make a security update and reboot the backup computer. Messages like the one that follows indicate this.

Subject:        Cron <root@foo> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
Date:   09/09/2009 06:25:33 AM
From:   Cron Daemon <root@foo.example.com>
To:     root@foo.example.com

/etc/cron.daily/security_updates:
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Reading extended state information...
Initializing package states...
Reading task descriptions...
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Reading package lists...
Building dependency tree...
Reading state information...
Reading extended state information...
Initializing package states...
Reading task descriptions...

Security updates that have been held back:
ihA linux-image-2.6.26-2-486        - Linux 2.6.26 image on x86

The important parts here are the last 2 lines. If the message contains the line:

Security updates that have been held back:

Then manual intervention is required.

The second line

ihA linux-image-2.6.26-2-486        - Linux 2.6.26 image on x86

reports which package must be manually upgraded. In this case the package is linux-image-2.6.26-2-486.
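
Picking the package names out of a saved message can also be sketched in shell. This assumes the layout shown in the example above, where each line after the "held back" heading carries a status flag followed by the package name; the function name held_back_packages is hypothetical.

```shell
# Hypothetical helper: print the names of held-back packages found in a
# saved copy of the daily message.
# $1: path to a file containing the message text
held_back_packages() {
    awk '
        # Start collecting after the heading line.
        /^Security updates that have been held back:/ { in_list = 1; next }
        # Each following non-blank line is: flags, package name, description.
        in_list && NF >= 2 { print $2 }
    ' "$1"
}
```

The function prints nothing when the message has no "held back" section, in which case no manual intervention is indicated.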

After identifying the package(s) which must be manually updated, ssh in to the backup server to perform the update and reboot the computer. The following sequence of commands will update, e.g., the linux-image-2.6.26-2-486 package. (Explanatory remarks beginning with ;# follow some commands.)

su                                               ;# You will be prompted for the root password.
aptitude update                                  ;# These next two commands perform 
aptitude safe-upgrade                            ;# non-security related updates.
aptitude unhold linux-image-2.6.26-2-486         ;# This begins the update of the desired package.
aptitude safe-upgrade                            ;# If prompted whether you really wish to replace
                                                 ;# the currently running kernel, answer affirmatively.
aptitude hold linux-image-2.6.26-2-486           ;# Force manual updates in the future.
aptitude search linux-image-2.6.26-2-486         ;# Check the last step worked.
                                                 ;# The first 2 letters should be 'ih'.
reboot                                           ;# Restart the computer with the new kernel.

As the backup computer reboots your ssh session will be terminated and you will return to your login (ssh) session on papio. Check to ensure that the backup computer successfully reboots with the following command:

ping backup-server1

You know the backup server is responding when replies like the following are received:

64 bytes from foo.example.com (192.168.199.2): icmp_seq=1 ttl=254 time=1.59 ms

Terminate the ping command by holding down the Control key (usually labeled "Ctrl") and, while Ctrl is held down, pressing the "c" key.

Note that on occasion, typically at least every 120 days, the backup server will perform file system checks as part of the reboot. This can take quite some time, possibly hours, and the system will not respond to pings until it has finished. If the backup system does not come back up after reboot contact a Linux administrator for assistance.
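
Rather than watching ping by hand during a long reboot, the wait can be scripted on papio. A minimal sketch; the function name wait_for_host and the retry numbers are illustrative, and it relies only on ping -c 1, which sends a single echo request:

```shell
# Hypothetical helper: poll a host with ping until it answers.
# $1: host name, $2: number of attempts, $3: seconds between attempts (default 10)
# Returns 0 as soon as one ping succeeds, 1 if every attempt fails.
wait_for_host() {
    host=$1
    tries=$2
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_host backup-server1 60 30 && echo "backup server is up" keeps trying for roughly 30 minutes. A file system check may take longer than that, so a failure here is a reason to wait and retry before calling for help.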

After verification that the backup server is operational type:

exit

to end your ssh session with papio.

Note: At this time the only packages that require manual intervention are new versions of the kernel. This list should probably be expanded to include certain critical libraries like glibc.

Shutting down the computer

There are times when the computer must be disconnected from power. If the backup computer is physically powered off without first being shut down, data loss is possible. It is unlikely that anything significant will be lost unless a backup is running at the time of power loss, and even then it is unlikely that database content will be lost.

To avoid any possibility of data loss first [#connecting connect to the backup computer] as described above. Then type:

su                                               ;# You will be prompted for the root password.
halt

After power is physically restored the power button on the front of the computer must be pressed to restart the machine.

OSBackups (last edited 2024-06-20 14:54:49 by JakeGordon)

Wiki content based upon work supported by the National Science Foundation under Grant Nos. 0323553 and 0323596. Any opinions, findings, conclusions or recommendations expressed in this material are those of the wiki contributor(s) and do not necessarily reflect the views of the National Science Foundation.