How to Back Up Alfresco – Step by Step With Script

Alfresco is an open source data management tool that I’ve recently implemented and rely on heavily. However, finding complete instructions on how to automate Alfresco backups was a bit of a challenge. Most sources were sketchy, with good points scattered here and there. Eventually I put together a set of instructions for my own future reference, but decided to share them here so others could benefit.

This article is based on the script originally written by Francesco Corti, although I needed to make a number of modifications to get things working for me. I also added a bit of depth that I gathered from other sources as I refined my process.

In my setup, I’m using Ubuntu 12.04 LTS and a PostgreSQL database.

Backup Overview and Steps

This is a cold backup, meaning that the server needs to be stopped and access to Alfresco will be interrupted for the duration of the backup. There are ways of doing hot backups, but since most of my usage occurs during business hours I decided to go with the easier and less error-prone cold version.

Backup Steps

  1. Create a backup script that will do the following:
    1. Stop Alfresco and related services
    2. Backup the Alfresco database
    3. Backup the Alfresco data folder
    4. Merge data files
    5. Restart services
    6. Delete outdated files to conserve space
  2. Create a PostgreSQL .pgpass file that will pass credentials to your database backup
  3. Schedule your script using cron

Alfresco Backup Script

Download complete script here

Please note that I kept error checking to a minimum for simplicity’s sake. However, I did include a few checks in key areas, such as after the Alfresco stop script, where carrying on after a failure could potentially damage your data. If you’re not familiar with shell scripts, an excellent tutorial can be found on FreeOS.com. Most of what you need to know is located in chapters two and three.

Script Header

We start off with some housekeeping. Mainly we need to specify the general locations of things in the Alfresco setup. My Alfresco is installed in a /data/server/ folder so I configure my paths accordingly. I also needed to create a custom timestamp that could be used to provide backup files with unique names. The other variables will be talked about as they get used.
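A minimal sketch of such a header might look like the following. AL_FOLDER is the name used later in this article; AL_HOME, NUM_BACKUPS, the exact sub-paths and the timestamp format are assumptions made for illustration, so adjust them to your own install.

```shell
#!/bin/sh
# Assumed install root (hypothetical): the article only says Alfresco lives under /data/server/.
AL_HOME="/data/server/alfresco"
# Alfresco data directory (alf_data), referred to later as AL_FOLDER.
AL_FOLDER="$AL_HOME/alf_data"
# Timestamp that gives every backup file a unique, sortable name (YYYYMMDD_HHMMSS).
TIMESTAMP="$(date +%Y%m%d_%H%M%S)"
# Hypothetical setting: how many merged backups to keep when pruning old files.
NUM_BACKUPS=5

echo "Backup run $TIMESTAMP for $AL_FOLDER"
```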

Stopping Alfresco

Since this is a cold backup, we need to run the Alfresco stop script. The thing to note here is that an error check must be present. According to the Alfresco Wiki, attempting to copy the Alfresco data files while services are running is almost certain to lead to data corruption. Here is the warning message from the Wiki article:

IMPORTANT NOTE: Never, under any circumstances, attempt to backup the lucene-indexes subdirectory while Alfresco is running. Doing so is almost certain to cause Lucene index corruption. Use ‘backup-lucene-indexes’ instead.

To avoid this, simply test the exit status ($?) right after the stop script runs. Anything other than 0 means the stop did not succeed, and in that case it’s best to call it quits.
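As a rough sketch, the stop step with its exit status check could be wrapped in a function like this. The function name al_stop and the AL_HOME path are assumptions chosen to mirror the al_start function mentioned later; the function returns non-zero rather than exiting so that the caller decides how to quit.

```shell
# Assumed install root (hypothetical path); override AL_HOME to match your setup.
AL_HOME="${AL_HOME:-/data/server/alfresco}"

# Stop Alfresco and its related services, and report failure to the caller.
al_stop() {
    "$AL_HOME/alfresco.sh" stop
    if [ $? -ne 0 ]; then
        echo "Alfresco did not stop cleanly; aborting backup." >&2
        return 1
    fi
}
```

In the main script the call would then be something like `al_stop || exit 1`, so that nothing is copied unless the stop succeeded.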

Alfresco Start Script

Nothing special about the Alfresco start function.  You simply call the same alfresco.sh script passing the start argument.
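A minimal version of that function, assuming the same hypothetical AL_HOME install root as the rest of the script, might be:

```shell
# Assumed install root (hypothetical path); override AL_HOME to match your setup.
AL_HOME="${AL_HOME:-/data/server/alfresco}"

# Start Alfresco and all related services again.
al_start() {
    "$AL_HOME/alfresco.sh" start
}
```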

PostgreSQL Start Script

You may be wondering why you need a PostgreSQL start function. This is because the Alfresco stop script stops all Alfresco related services such as Tomcat and PostgreSQL. In order to back up the database, however, the database engine needs to be running. So what we do is stop all services, start PostgreSQL, then back up the database. Again, aside from understanding why, there really is nothing special to this. Simply call the ctl.sh script with a “start” argument. The script should be located in the scripts folder of the PostgreSQL root (which, if you’ve done the standard Alfresco install, should itself be located in your Alfresco directory). If you’re using MySQL as your database then you will need to modify your script accordingly.
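Sketched as a function, this could look like the following. The name pg_start and the postgresql sub-folder under AL_HOME are assumptions based on the layout described above; check where ctl.sh actually lives in your install.

```shell
# Assumed install root (hypothetical path); override AL_HOME to match your setup.
AL_HOME="${AL_HOME:-/data/server/alfresco}"

# Start only PostgreSQL so the database can be dumped while Alfresco stays down.
pg_start() {
    "$AL_HOME/postgresql/scripts/ctl.sh" start
}
```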

Alfresco Backup Destination

Next we need to verify that a backup destination was provided to our script. Nothing unusual here. We simply verify that an argument has been passed in and confirm that it is an existing folder using the “-d” test in the if statement. If a folder has been specified then we save it to the TARGET_FOLDER variable, which we use throughout the rest of the script. If not, we offer up the typical “usage” message and quit.
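One way to sketch this check is shown below. The helper name set_target_folder is hypothetical; it is written as a function here so the check can be tried on its own.

```shell
# Accept the backup destination only if it is an existing directory.
set_target_folder() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        TARGET_FOLDER="$1"
    else
        echo "Usage: $0 [TARGET_FOLDER]" >&2
        return 1
    fi
}
```

Near the top of the script it would be called with the first script argument, for example `set_target_folder "$1" || exit 1`.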

1 – Call Alfresco Stop Function

We now get into the actual script steps and start off by calling the stop Alfresco script.

2 – Backup Database

Next we back up the database.  As discussed above, we first need to start the PostgreSQL service since it gets stopped when we shut down Alfresco.

We create a file name variable called DB_DUMP that includes our timestamp variable. Then we call the PostgreSQL utility pg_dump to export the alfresco database to our specified target destination.

I do a simple (if not pointless) error check here just to show how an error can be caught in this type of scenario.  You can use this to generate a more elaborate event although remember to call your al_start function at some point or else Alfresco will never get fired back up.
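The database step could be sketched roughly as follows. DB_DUMP and TARGET_FOLDER come from the description above; the backup_db function name, the PG_DUMP path, the localhost host and the postgres user are assumptions that you should match to your own install. The tar output format is chosen to fit the alfresco_db_TIMESTAMP.tar file name.

```shell
# Assumed locations (hypothetical paths); override them to match your setup.
AL_HOME="${AL_HOME:-/data/server/alfresco}"
PG_DUMP="${PG_DUMP:-$AL_HOME/postgresql/bin/pg_dump}"
TIMESTAMP="${TIMESTAMP:-$(date +%Y%m%d_%H%M%S)}"

# Dump the alfresco database in tar format into TARGET_FOLDER.
backup_db() {
    DB_DUMP="alfresco_db_${TIMESTAMP}.tar"
    "$PG_DUMP" --host=localhost --username=postgres \
        --format=tar --file="$TARGET_FOLDER/$DB_DUMP" alfresco
    if [ $? -ne 0 ]; then
        echo "Database dump failed." >&2
        return 1
    fi
}
```

If backup_db reports a failure and you decide to stop there, the caller still needs to run al_start before quitting so that Alfresco comes back up.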

3 – Backup Alfresco Content Folders

Backing up the content folders is relatively simple.  All of your Alfresco data files should exist in the alf_data directory which we’ve saved as AL_FOLDER.  So a tar of the folder to our target destination is all that’s needed here.
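A small sketch of that tar step is shown below. The backup_data function name and the DATA_DUMP variable are hypothetical; AL_FOLDER and TARGET_FOLDER are the variables described earlier, and the default path is an assumption.

```shell
# Assumed data directory (hypothetical path); override AL_FOLDER to match your setup.
AL_FOLDER="${AL_FOLDER:-/data/server/alfresco/alf_data}"
TIMESTAMP="${TIMESTAMP:-$(date +%Y%m%d_%H%M%S)}"

# Archive the whole alf_data directory into TARGET_FOLDER as a gzipped tar.
backup_data() {
    DATA_DUMP="alfresco_data_${TIMESTAMP}.tgz"
    # -C switches to the parent folder so the archive holds a relative alf_data/ path.
    tar -czf "$TARGET_FOLDER/$DATA_DUMP" \
        -C "$(dirname "$AL_FOLDER")" "$(basename "$AL_FOLDER")"
}
```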

4 – Merge Database and Data Files

At this point we have two backup files: an alfresco_db_TIMESTAMP.tar and an alfresco_data_TIMESTAMP.tgz. To make life easy, it’s a good idea to combine these two files into a single backup set. The following step does just that.

We start by including both files into a single tar then if successful, we delete the individual standalone files.  You’ll note that a variable called SUCCESS is set to one here.  This will be used in step 6 below.
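The merge could be sketched like this. SUCCESS, DB_DUMP and TARGET_FOLDER are names from the article; the merge_backups function, the DATA_DUMP and BACKUP_FILE variables and the alfresco_bak_ file prefix are assumptions for illustration.

```shell
TIMESTAMP="${TIMESTAMP:-$(date +%Y%m%d_%H%M%S)}"

# Combine the database dump and the data archive into one backup set.
merge_backups() {
    BACKUP_FILE="alfresco_bak_${TIMESTAMP}.tgz"
    SUCCESS=0
    if tar -czf "$TARGET_FOLDER/$BACKUP_FILE" -C "$TARGET_FOLDER" "$DB_DUMP" "$DATA_DUMP"; then
        # Only remove the standalone files once the combined archive was written.
        rm -f "$TARGET_FOLDER/$DB_DUMP" "$TARGET_FOLDER/$DATA_DUMP"
        SUCCESS=1
    fi
}
```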

5 – Start it up again

At this point we’re all done with our backup.  We can start Alfresco so as to minimize down time.

6 – Deleting Old Files

Since Alfresco keeps version information on all files (if enabled), there is really no need to keep too many copies of the backup. In fact, one should be more than enough. However, for safety’s sake, having a few backups lying around is a good idea in case one of them gets corrupted.

We start by checking whether the SUCCESS variable was set in step 4 above. We don’t want to start deleting backup files unless we’re sure that the files were successfully combined.
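A possible sketch of the cleanup is shown below. It is not necessarily how the original script does it: it assumes the merged backups are named alfresco_bak_TIMESTAMP.tgz with a sortable timestamp, and it uses a hypothetical NUM_BACKUPS setting for the maximum number of backups to keep.

```shell
# Hypothetical setting: maximum number of merged backups to keep.
NUM_BACKUPS="${NUM_BACKUPS:-5}"

# Delete the oldest merged backups, but only after a successful merge.
prune_backups() {
    [ "$SUCCESS" = "1" ] || return 0
    # The glob expands in name order, which is oldest first for sortable timestamps.
    set -- "$TARGET_FOLDER"/alfresco_bak_*.tgz
    while [ "$#" -gt "$NUM_BACKUPS" ]; do
        rm -f -- "$1"
        shift
    done
}
```

Relying on the file names rather than modification times keeps the ordering stable even if the backup files are later copied or touched.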

At this point our script is complete. I keep mine in the root of my Alfresco install, but you’re free to put yours anywhere that makes sense for you.

Also, remember that you first need to make this script executable by typing chmod +x followed by the script name, for example chmod +x alfrescoBak.sh.

Creating a PostgreSQL Password File

As it stands, your script will currently fail to back up your database. In fact, it will hang in step 2 on the pg_dump command, as PostgreSQL will ask for a password. To avoid the password prompt you first need to create a password file for the account that will be running the script. In this case that will be whichever user schedules the script using cron.

To start, go to the home directory of the user. You can do that by typing cd ~ at the prompt.

Next create a .pgpass file (note the leading dot) using your favorite editor. I use nano, so nano .pgpass, but you can use vi or anything else that works.

Type the following line but replace [DB-PASSWORD] with the password of the postgres database that you entered when you set up Alfresco.  Save the file.
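Each line of a .pgpass file has five colon-separated fields: hostname:port:database:username:password. A line along these lines should do the job, assuming a local database on the default port 5432, a database named alfresco and the postgres account. Those four values are assumptions, so match them to your own install.

```
localhost:5432:alfresco:postgres:[DB-PASSWORD]
```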

If you don’t see the file in your folder, don’t worry. Files starting with a “.” (dot) are hidden files in Linux and are typically used for configuration. You can see them using the ls -a command.

Once saved you need to set current-user permissions on the .pgpass file.  This will ensure that only the current user can make use of the password stored within.
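As a sketch, the permissions can be set as follows, assuming the file sits in the home directory of the user who will run the backup. PostgreSQL ignores a .pgpass file whose permissions are looser than 0600, so this step is not optional.

```shell
# Assumes .pgpass lives in the current user's home directory.
PGPASS_FILE="$HOME/.pgpass"
# touch leaves the content of an existing file alone; it only creates the file if it is missing.
touch "$PGPASS_FILE"
# Owner may read and write; nobody else gets any access.
chmod 0600 "$PGPASS_FILE"
# Show the result; the mode column should read -rw-------.
ls -l "$PGPASS_FILE"
```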

You should now be able to run the script successfully without any password prompts.

Schedule Alfresco Backup Script

To schedule your Alfresco script to run repeatedly, you need to use the Linux crontab. At the command prompt, type crontab -e.

The first time you use Crontab you may get a message stating that there is “no crontab for user [you]”.  This is okay. Simply select an editor from the list available.  I suggest nano if you’re new to Ubuntu text editors.

Now add the following line at the bottom of the cron file.  It states that the alfrescoBak.sh script is to run every day at 4:00 am passing the backup folder as the script argument.  Remember to change the path of your script and the backup target folder as needed.
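A crontab entry has five time fields (minute, hour, day of month, month, day of week) followed by the command. The line below is a sketch of a daily 4:00 am run; both paths are hypothetical placeholders for your script location and your backup target folder.

```
0 4 * * * /data/server/alfresco/alfrescoBak.sh /data/backup/alfresco
```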

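The last step is to restart the cron service so that it registers your script. Cron normally picks up a crontab saved through crontab -e by itself, so treat this as a precaution. On Ubuntu you can do it by typing sudo service cron restart into your terminal.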

That’s it! You’re done. Your script should automatically get fired up according to your schedule, and your backup files should be generated up to however many you specified as the maximum. Your next steps should be to find a way to get the files to a remote location or to back them up to some alternate media.

 

If you liked this article, have questions or something to add then please let me know by leaving your comments below.
