Tuesday, October 27, 2009

Using the RMAN backup as copy as a base for Disaster Recovery

In an earlier post I elaborated on how the backup as copy feature lets you recover quickly from a damaged database.

In this article I will try to show how this advantage goes even further by adding another server to the infrastructure, introducing a basic Disaster Recovery solution. It is not fully featured like Oracle Data Guard, but it works and can be a first step towards Disaster Recovery. There is no need for a standby database, redo log shipping, and so on. It is all plain, simple and straightforward.

Shopping list:
  1. Two servers, not necessarily with equal specifications, but running identical operating systems
  2. A clustered file system (OCFS2 will suffice, even NFS will do the job)
First of all, you need to set up OCFS2. There is plenty of information on the Internet to get OCFS2 working. It is not difficult; see this PDF for information.
What I have done to make this work is set up two OCFS2 file systems, one under /oradata and one under /orafra.
Next, install the Oracle RDBMS software on both machines, using the same Oracle Home location on each so that the installations are identical.

Once you have OCFS2 set up on your two servers, create or move your database onto /oradata, and make sure the DB_RECOVERY_FILE_DEST parameter points to /orafra and the DB_CREATE_FILE_DEST parameter points to /oradata. Don't forget the DB_RECOVERY_FILE_DEST_SIZE parameter! Set it to a size equal to that of the file system where the recovery area is located, so that you will not run out of space too soon.
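As a rough sketch of these settings, the shell snippet below writes a small SQL script and runs it through SQL*Plus only when explicitly asked to. The file name, the RUN_SQLPLUS switch and the 100G size are placeholders of mine, not values from the setup described here. The parameter names are the ones Oracle uses (DB_RECOVERY_FILE_DEST, DB_RECOVERY_FILE_DEST_SIZE, DB_CREATE_FILE_DEST), and the size has to be set before the destination.

```shell
#!/bin/sh
set -eu

# Hypothetical file name and size; adjust both to your environment.
SQL_FILE="${TMPDIR:-/tmp}/set_dr_destinations.sql"
FRA_SIZE="100G"

cat > "$SQL_FILE" <<EOF
-- The size must be set before the recovery destination itself.
-- SCOPE=BOTH assumes the instance was started with an spfile.
ALTER SYSTEM SET db_recovery_file_dest_size = $FRA_SIZE SCOPE=BOTH;
ALTER SYSTEM SET db_recovery_file_dest = '/orafra' SCOPE=BOTH;
ALTER SYSTEM SET db_create_file_dest = '/oradata' SCOPE=BOTH;
EXIT
EOF

echo "Wrote $SQL_FILE"

# Only touch the database when explicitly requested.
if [ "${RUN_SQLPLUS:-0}" = "1" ]; then
  sqlplus -S "/ as sysdba" @"$SQL_FILE"
fi
```

Run it with RUN_SQLPLUS=1 on the primary machine, as the oracle user, once you have reviewed the generated file.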

Your infrastructure will look like this (more or less):

/oradata is the location for your database, /orafra is for the Flash Recovery Area.

Now we can prepare the secondary machine to serve as the failover server.

To do this, take the following steps:

1. Copy the parameter file to the secondary machine and place it under $ORACLE_HOME/dbs. A server parameter file (spfile) is binary and cannot be edited directly, so use a text parameter file (pfile) created from it. Edit this file and adjust the following parameter:
CONTROL_FILES = /orafra/SID/controlfile/cp_cntrl.ctl
Optionally, you can create an spfile from this parameter file on the secondary machine.

2. Copy the /etc/oratab file to the secondary node

3. Create the necessary dump directories ($ORACLE_HOME/admin)
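
The parameter file edit from step 1 could be scripted roughly as follows. This is only a sketch: the helper name patch_control_files, the demo directory and the SID ORCL are made up for illustration, and the sample pfile stands in for the one you copied from the primary machine.

```shell
#!/bin/sh
set -eu

# Rewrite the CONTROL_FILES entry of a text parameter file so that it
# points at the control file copy in the Flash Recovery Area.
# Usage: patch_control_files <pfile> <sid>
patch_control_files() {
  pfile="$1"
  sid="$2"
  [ -f "$pfile" ] || { echo "no such pfile: $pfile" >&2; return 1; }
  # Drop every existing control_files line, then append the new one.
  grep -iv 'control_files' "$pfile" > "$pfile.new" || true
  printf "*.control_files='/orafra/%s/controlfile/cp_cntrl.ctl'\n" "$sid" >> "$pfile.new"
  mv "$pfile.new" "$pfile"
}

# Demo on a throwaway pfile (ORCL is a hypothetical SID).
WORK="${TMPDIR:-/tmp}/dr_pfile_demo"
mkdir -p "$WORK"
cat > "$WORK/initORCL.ora" <<'EOF'
*.db_name='ORCL'
*.control_files='/oradata/ORCL/control01.ctl','/oradata/ORCL/control02.ctl'
EOF

patch_control_files "$WORK/initORCL.ora" ORCL
cat "$WORK/initORCL.ora"
```

On a real system you would call the function on the copy under $ORACLE_HOME/dbs instead of the demo file.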

Now you are ready to go! Bring up your database on the primary machine and create a full backup as copy of your database:

allocate channel for maintenance type disk;
configure controlfile autobackup on;
configure default device type to disk;
copy current controlfile to '/orafra//controlfile/cp_cntrl.ctl';

This script will do everything for you. See this post for an explanation of the script.

Now let's make some changes to the database and run the backup script again (NOTE: in order to have all changes you made to the database immediately reflected in the copy of your database, remove the UNTIL clause in the first line of the backup script).
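
The backup commands themselves are explained in the linked post and are not repeated here. For orientation, an incrementally updated image copy in RMAN usually follows the pattern sketched below, which also shows where an UNTIL clause sits. The tag dr_copy, the one-day lag, the file name and the RUN_RMAN switch are assumptions of mine, not taken from the original script.

```shell
#!/bin/sh
set -eu

# Hypothetical command file name.
CMD_FILE="${TMPDIR:-/tmp}/dr_backup_as_copy.rman"

cat > "$CMD_FILE" <<'EOF'
run {
  # Roll the existing image copy forward, lagging one day behind.
  # Remove the UNTIL clause to apply all changes immediately.
  recover copy of database with tag 'dr_copy' until time 'sysdate - 1';
  # Take the level 1 incremental; on the very first run, when no copy
  # exists yet, this creates the image copy itself.
  backup incremental level 1 for recover of copy with tag 'dr_copy' database;
}
EOF

echo "Wrote $CMD_FILE"

# Only run against the database when explicitly requested.
if [ "${RUN_RMAN:-0}" = "1" ]; then
  rman target / cmdfile="$CMD_FILE"
fi
```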

Now you can simulate a crash of the database. There are many ways to do this; for your first attempt I recommend simply issuing a shutdown abort.

Go to the second machine, log in as the oracle user, source your environment and perform the following steps to recover your database:

  1. start an rman session to the database (which is not started yet)
  2. mount the controlfile (located in /orafra/SID/controlfile, pointed to by the parameter file)
  3. issue "switch database to copy"
  4. issue "recover database"
  5. issue "alter database open resetlogs"
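
The five steps above can be sketched as a single RMAN command file. The file name and the RUN_RMAN switch are placeholders of mine; the snippet assumes ORACLE_SID and ORACLE_HOME are already set for the oracle user on the secondary machine.

```shell
#!/bin/sh
set -eu

# Hypothetical command file name.
CMD_FILE="${TMPDIR:-/tmp}/dr_switch_to_copy.rman"

cat > "$CMD_FILE" <<'EOF'
# Start the instance and mount it using the control file copy
# named in the parameter file.
startup mount;
# Point the control file at the image copies instead of the originals.
switch database to copy;
# Apply the remaining incremental backups and redo.
recover database;
alter database open resetlogs;
EOF

echo "Wrote $CMD_FILE"

# Only run against the database when explicitly requested.
if [ "${RUN_RMAN:-0}" = "1" ]; then
  rman target / cmdfile="$CMD_FILE"
fi
```

Keeping the commands in a file means the failover is a single, rehearsed invocation rather than something typed from memory during an outage.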

That is all it takes. Your database is now open and available from the secondary machine. Make sure that you reverse the file locations in the backup scripts on the secondary machine to reflect the new locations of the database files and the Flash Recovery Area: the database files are now located in /orafra, and the Flash Recovery Area should be located in /oradata. If you want to switch back, just use the backup script to create another backup as copy and switch back the way you did earlier. Make sure you have defined the correct file locations on the primary server, as on the secondary machine, so that the switch goes smoothly.


As with any regular backup, when you have made changes to the physical structure of your database (such as adding datafiles or tablespaces), you need to take an incremental backup immediately afterwards to prevent issues while recovering the database from the copy.

Another caveat is that both database servers must be listed in the TNS alias used to access the database. Set load balancing to off and failover to on, with the primary as the first address.
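
A TNS alias along these lines would do that. This is only a sketch: the alias MYDB, the host names, the port, the service name and the output file are all placeholders, so substitute your own before adding the entry to tnsnames.ora.

```shell
#!/bin/sh
set -eu

# Hypothetical output file; append its contents to your tnsnames.ora.
TNS_FILE="${TMPDIR:-/tmp}/dr_tnsnames_entry.ora"

cat > "$TNS_FILE" <<'EOF'
MYDB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = off)
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = secondary-host)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = mydb)
    )
  )
EOF

echo "Wrote $TNS_FILE"
```

With load balancing off, clients try the addresses in the listed order, so they reach the secondary only when the primary does not answer.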


If your server is still available but the database has become corrupted, you can easily switch to the copy on the same machine, without the secondary machine, using a procedure similar to the one above. This way you will always have a choice of what to do.

Happy testing!
