Recovering a Data Guard standby after downtime

Recently, we had a planned weekend of standby downtime on a production system, from Friday afternoon to Monday morning.  This was all fine, and when the database came back up I expected it to just do its FAL (Fetch Archive Log) thing and catch up automatically.

Except I had forgotten two important things:
  1. Backups on this system are taken on the primary
  2. Archive log deletion policy on the primary is set to "BACKED UP 1 TIMES TO DISK"
Thus when I came to start the redo apply on Monday morning, I found the archive logs had been deleted from the FRA on the primary and the standby couldn't recover automatically.
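
In hindsight, a quick look at the RMAN configuration before the downtime would have flagged the risk. As a minimal sketch, assuming the primary's persistent RMAN settings live in its controlfile, the policy can be read from SQL*Plus on the primary:

```sql
-- Run on the primary.
-- This view lists only settings changed with CONFIGURE, so no row
-- means the default archive log deletion policy is in effect.
SELECT name, value
  FROM v$rman_configuration
 WHERE name = 'ARCHIVELOG DELETION POLICY';
```

RMAN also supports policies that take the standby into account (for example APPLIED ON ALL STANDBY). Whether one of those suits a given system depends on how much FRA space can be spared while a standby is down.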

That was when I got that sinking feeling you get when you suddenly realise that you could have avoided all this.

I found out from the alert log which archive logs were needed.  This shows in the alert log as something like this:
FAL[client]: Failed to request gap sequence 
GAP - thread 1 sequence 27658-27658
  DBID 123456789 branch 823623081
FAL[client]: All defined FAL servers have been attempted.
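
The same gap can usually be confirmed from the standby itself rather than by reading the alert log. A minimal sketch, assuming a physical standby that is mounted or open read-only:

```sql
-- Run on the standby.
-- Reports the gap currently blocking recovery, one row per thread.
-- No rows returned means no gap has been detected.
SELECT thread#, low_sequence#, high_sequence#
  FROM v$archive_gap;
```

Like the alert log message, this only reports the gap that is blocking recovery right now, so later gaps may only show up once the first one has been resolved.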
I used RMAN (e.g. "list backup of archivelog until time '14-SEP-2015';") to find out which backup pieces held these archive logs.  The primary FRA was missing everything from Friday evening through to Saturday afternoon, all of which would be in Friday and Saturday night's backups.
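
The same lookup can be done with a query against the backup metadata. This is a sketch only, assuming the metadata is in the primary's controlfile (no recovery catalog). 27658 is the sequence from the alert log excerpt above, and last_missing_sequence is a hypothetical SQL*Plus substitution variable for the end of the missing range:

```sql
-- Run on the primary.
-- Maps each archived log sequence in the range to the backup piece(s)
-- of the backup set that contains it.
SELECT r.thread#, r.sequence#, r.first_time, p.handle
  FROM v$backup_redolog r
  JOIN v$backup_piece p
    ON p.set_stamp = r.set_stamp
   AND p.set_count = r.set_count
 WHERE r.thread# = 1
   AND r.sequence# BETWEEN 27658 AND &last_missing_sequence
   AND p.status = 'A'          -- available pieces only
 ORDER BY r.sequence#, p.piece#;
```

For a backup set with several pieces, every piece is listed, since the views do not say which piece of the set holds a particular log.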

At this point, I did two things:
  1. I raised a severity 1 SR with Oracle support.  This was not really because I wanted help with the recovery but because I felt it was important to cover my arse
  2. I copied the backup pieces containing the relevant archivelogs to the standby
I could have used incremental backups to roll the database forward to a point where it could recover itself, but I decided to stick to what I knew would work.
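
For reference, the incremental route starts from the standby's current SCN. This is not what I did here, so treat it as a rough sketch of the first step only:

```sql
-- Run on the standby.
-- The lower of these two values is the safer starting point.
SELECT current_scn FROM v$database;

SELECT MIN(checkpoint_change#) AS min_datafile_scn
  FROM v$datafile_header;
```

That SCN would then feed an RMAN "BACKUP INCREMENTAL FROM SCN ... DATABASE" on the primary, with the resulting pieces copied to the standby, catalogued and applied there.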

At the RMAN prompt:
RMAN> catalog start with '/path/to/backup/piece/with/archive/logs';
[...]
RMAN> recover database;
At this point, I got the call from Oracle support.  I explained I'd started the recovery and things seemed to be going OK, and they were happy to leave the support call open until recovery had completed.

Recovery finished with the usual error about further recovery being needed.  I opened the database read-only and started the managed recovery process ('ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;').
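
To check that redo apply really has picked up again, something like the following can be run on the standby. It is a sketch, assuming a release where v$managed_standby is still the view to use (newer releases have replacements for it):

```sql
-- Run on the standby.
-- Is the managed recovery process running, and which sequence is it on?
SELECT process, status, thread#, sequence#
  FROM v$managed_standby
 WHERE process LIKE 'MRP%';

-- How far behind the primary is the standby?
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```

An MRP0 row whose sequence number keeps advancing, together with a shrinking apply lag, is a reasonable sign that the standby is catching up.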

Things were, finally, back to normal.
