SOWN Backup Cloud
The SOWN Backup Cloud project is designed as a means of backing up important content and configuration from all SOWN servers, so that if there is a failure a server can be rebuilt without losing anything important. It is described as a cloud because backups are currently held on three different servers:
- Sown-gw - In Zepler (building 59) level 1 server room
- Backup2 - In building 16 level 2 server room
- Backup3 - In building 32 level 3 north server room
Currently, the backup service runs once a day, rsync-ing all the files to a central location (Sown-auth2) and then rsync-ing them onto the three backup servers. Once on a backup server, a tarball of that day's files is created. All tarballs from the previous 7 days are retained, along with all weekly (Sunday) tarballs from the past 30 days and all monthly (1st of the month) tarballs. Periodically, older monthly tarballs are manually deleted when the servers start to run out of space.
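The retention policy above can be sketched as a keep-or-prune decision per tarball. This is only an illustration of the rules described, not the actual SOWN script; the filename scheme and GNU date usage are assumptions.

```shell
#!/bin/sh
# Sketch of the retention rules: keep everything from the last 7 days,
# Sunday tarballs from the last 30 days, and all 1st-of-the-month
# tarballs (assumes tarballs named by date, e.g. backup-2015-06-14.tar.gz,
# and GNU date for -d parsing).
keep_tarball() {
    backup_date=$1   # date the tarball was made, e.g. 2015-06-14
    today=$2         # current date, e.g. 2015-06-20
    age=$(( ( $(date -d "$today" +%s) - $(date -d "$backup_date" +%s) ) / 86400 ))
    dow=$(date -d "$backup_date" +%u)   # day of week, 7 = Sunday
    dom=$(date -d "$backup_date" +%d)   # day of month
    if [ "$age" -le 7 ] || { [ "$age" -le 30 ] && [ "$dow" -eq 7 ]; } || [ "$dom" = "01" ]; then
        echo keep
    else
        echo prune
    fi
}
```

A pruning loop would then run this over each tarball in the backup directory and delete those marked prune.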
As rsync is incremental, files deleted from their original location will not be deleted from the backup copy. Therefore there is a tidy-backups script that runs on each server to deal with directories where this is a particular issue (e.g. the remote syslogs captured off nodes, stored on Sown-auth2 under /srv/www/sisyphus/docs/).
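One plausible shape for such a tidy-up is an age-based purge of the problem directories. The function below is a hypothetical sketch, not the real tidy-backups script; the 90-day window is an invented example value.

```shell
#!/bin/sh
# Hypothetical sketch of a tidy-backups style clean-up: delete files
# older than a retention window from a known-problematic directory.
tidy_dir() {
    dir=$1             # directory to tidy, e.g. /srv/www/sisyphus/docs
    retention_days=$2  # delete files not modified in this many days
    find "$dir" -type f -mtime +"$retention_days" -print -delete
}
```

It would be invoked per problem directory, e.g. `tidy_dir /srv/www/sisyphus/docs 90`.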
Proposed changes to backups
Backup2 and Backup3 are in need of upgrading, as their disks are really too small (< 100GB). As part of this process there has been a review of how backups should be managed. There are currently two suggested changes:
- Tarball the daily backup on the central server (Sown-auth2) and then SCP this to each backup server, rather than getting each server to do the tarball. Backup2 and Backup3 have rather slow CPUs, which dramatically increases the time it takes to do a daily backup.
- Run rsync in a mode where old files are deleted if they have been deleted from their original location, so the backups more accurately represent what is currently on the servers being backed up.
meshach, one of the 1U quarter-depth black boxes like Backup2 and Backup3 (abednego and shadrach), has been deployed as a replacement Backup3. meshach has a 1TiB SATA drive rather than the smaller IDE disks of the old Backup2 and Backup3. It is running a 32-bit non-PAE version of Ubuntu 14.04.
The old Backup3 (shadrach) has now been upgraded with a 1TB disk and has replaced the old Backup2; it retains that CNAME but has a new A record of shadrach. Some sorting out of ECS hostnames is still needed to change the A record for 188.8.131.52 to shadrach.ecs.soton.ac.uk.
The changes to the way backups are made in general are still required. However, this is less urgent: although the backup process still takes some time (around 4 hours), the immediate disk space issues have been resolved. It may be worth considering changing the backup start time to around 4am, so backups complete around 8am rather than gone 10am.
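If the start time were moved, the change would be a one-line crontab edit on the central server, along these lines (the script path is hypothetical):

```shell
# Hypothetical crontab entry: start the nightly backup at 04:00
# so a ~4 hour run completes around 08:00.
0 4 * * * /usr/local/sbin/run-backups
```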