Data Guard with Transparent Application Failover (TAF)

One of the problems with Data Guard is that even after switching/failing over to the standby database, all clients need to be reconfigured to communicate with the new primary.

The simplest way to achieve this (if using a centralised tnsnames.ora), is to modify the file and add the line "(FAILOVER=YES)", and either multiple (DESCRIPTION=...) or (ADDRESS=...) lines, one for each of the standby systems, in the order that the client should attempt connections. This tells the client to try each address in turn until one succeeds.

Doing so provides basic failover functionality within Oracle Net, and will probably function correctly most of the time.  Though be aware - because both primary and standby can share the same instance name (e.g. the same SID, therefore the same connection string), there is potential for this simple scheme to go wrong.

An example in an active data guard environment (where the standby is open read only): a temporary network issue causes all connections to the primary to fail, but the service is still up.  Connection attempts will try the primary, fail, and then connect to the second address specified within tnsnames.ora.  The app gets no notification that it's connected to the standby, it just gets silently redirected.

As no failover or switchover has happened and the standby is still read-only, any attempt to write to the database fails and the app won't know why!  As far as the application is concerned, it's requested a connection to the database it always uses, connected OK and should to be able to write.  This could cause problems if the application doesn't handle this situation well.

How to avoid this particular problem?  Dynamic service registration.

Using DBMS_SERVICE, it's possible to register the database with the listener as any arbitrary service name:

BEGIN
  DBMS_SERVICE.CREATE_SERVICE(
    SERVICE_NAME => 'my_service1.localdomain',
    NETWORK_NAME => 'my_service1.localdomain',
    FAILOVER_METHOD => 'BASIC',
    FAILOVER_TYPE => 'SELECT',
    FAILOVER_DELAY => 1,
    FAILOVER_RETRIES => 10
  );
  DBMS_SERVICE.START_SERVICE(
    SERVICE_NAME => 'my_service1.localdomain'
  );
END;
/

(The failover_method, type, delay and retries settings should all be adjusted for your environment - see docs here)

Checking the output of lsnrctl status after running this, should show something like this:

Services Summary...
Service "SID" has 1 instance(s).
  Instance "SID", status READY, has 11 handler(s) for this service...
Service "my_service1.localdomain" has 1 instance(s).
  Instance "SID", status READY, has 11 handler(s) for this service...
The command completed successfully

Combining this with a trigger that fires after the database starts up, the service names that are available to connect to can be made dependent on what mode the database is in (primary or standby).

An example trigger, which I originally found here, but modified for my own purposes:

CREATE OR REPLACE TRIGGER SWITCH_SERVICE_NAME AFTER STARTUP ON DATABASE
DECLARE
  ROLE VARCHAR(30);
  OMODE VARCHAR(30);
BEGIN
  SELECT DATABASE_ROLE INTO ROLE FROM V$DATABASE;
  SELECT OPEN_MODE INTO OMODE FROM V$DATABASE;
  IF ROLE = 'PRIMARY' 
  THEN
    DBMS_SERVICE.START_SERVICE ('myservice.localdomain');
  ELSIF ROLE = 'PHYSICAL STANDBY'
  THEN
    DBMS_SERVICE.START_SERVICE ('myservice_standby.localdomain');
  END IF;
END;
/

Then in tnsnames.ora:

PRODUCTION =
   (DESCRIPTION=
      (FAILOVER=ON)
        (ADDRESS=(PROTOCOL=tcp)(HOST=hostname1.localdomain)(PORT=1521))
        (ADDRESS=(PROTOCOL=tcp)(HOST=hostname2.localdomain)(PORT=1521))
      (CONNECT_DATA =
         (SERVICE_NAME = myservice.localdomain)
         (SERVER = DEDICATED)
      )
   )

This guarantees that should connection attempts to hostname1 fail, the app won't accidentally connect to hostname2, because the database is advertising itself as "myservice_standby.localdomain", and we're requesting a connection to "myservice.localdomain".

Comments

  1. can you please post the listener contents or listener file also for both primary and standby

    Samir

    ReplyDelete
  2. This article really helped me a lot, made me look like a hero at the office after implementing this lol

    ReplyDelete

Post a Comment

Popular posts from this blog

RMAN-05531 During RMAN Duplicate from Active Data Guard Standby

Data pump - "ORA-39786: Number of columns does not match between export and import databases"