Friday, September 12, 2014

Adding Far Sync Instances to Existing Data Guard Configuration

This post list the steps of adding far sync instances to an existing data guard configuration (for both primary and standby databases, to be used when standby become primary). Far sync is an new feature introduced with 12c which allows transporting of redo data synchronously from primary DB to a "near by" far sync instance, which then transport the redo data asynchronously "over a longer distance". The idea is that there's low overhead on primary when transporting redo synchronously to a "near by" far sync instance compared to transporting "over a long distance" to achieve zero data loss as well as off-loading of the redo transport.
Oracle documentation provides a complete description of the far sync concept. Given below are few important excerpts
Many configurations have a primary database shipping redo to a standby database using asynchronous transport at the risk of some data loss at failover time. Using synchronous redo transport to achieve zero data loss may not be a viable option because of the impact on the commit response times at the primary due to network latency between the two databases. Creating a far sync instance close to the primary has the benefit of minimizing impact on commit response times to an acceptable threshold (due to the smaller network latency between primary and far sync instance) while allowing for higher data protection guarantees -- if the primary were to fail, and assuming the far sync instance was synchronized at the time of the failure, the far sync instance and the terminal standby would coordinate a final redo shipment from the far sync instance to the standby to ship any redo not yet available to the Standby and then perform a zero-data-loss failover.
A far sync instance manages a control file, receives redo into standby redo logs (SRLs), and archives those SRLs to local archived redo logs. A far sync instance does not have user data files, cannot be opened for access, cannot run redo apply, and can never function in the primary role or be converted to any type of standby database.
Far sync instances are part of the Oracle Active Data Guard Far Sync feature, which requires an Oracle Active Data Guard license.
In a configuration that contains a far sync instance, there must still be a direct network connection between the primary database and the remote standby database. The direct connection between the primary and the remote standby is used to perform health checks and switchover processing tasks. It is not used for redo transport unless the standby has been configured as an alternate destination in case the far sync instance fails and there is no alternate far sync configured to maintain the protection level.
The existing data guard configuration's primary database parameter setting and active data guard creation script of the standby database is given below. (other prerequisites for setting up data guard is omitted)
Primary database parameter changes
alter system set log_archive_config='dg_config=(ent12c1,ent12c1s)' scope=both ;
alter system set log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=ent12c1' scope=both;
alter system set log_archive_dest_2='service=ENT12C1STNS ASYNC NOAFFIRM max_failure=10 max_connections=5 reopen=180 valid_for=(online_logfiles,primary_role) db_unique_name=ent12c1s' scope=both;
alter system set log_archive_dest_state_2='defer' scope=both;
alter system set log_archive_dest_state_1='enable' scope=both;
alter system set fal_server='ENT12C1STNS' scope=both;
alter system set log_archive_max_processes=10 scope=both;
alter system set db_file_name_convert='/opt/app/oracle/oradata/ENT12C1S','/data/oradata/ENT12C1' scope=spfile;
alter system set log_file_name_convert='/opt/app/oracle/oradata/ENT12C1S','/data/oradata/ENT12C1' ,'/opt/app/oracle/fast_recovery_area/ENT12C1S','/data/flash_recovery/ENT12C1' scope=spfile;
alter system set standby_file_management='AUTO' scope=both;
Standby creation script
mkdir -p  /opt/app/oracle/oradata/ENT12C1S/controlfile
mkdir -p /opt/app/oracle/fast_recovery_area/ENT12C1S/controlfile

duplicate target database for standby from active database spfile
parameter_value_convert 'ent12c1','ent12c1s','ENT12C1','ENT12C1S','data','opt/app/oracle','flash_recovery','fast_recovery_area'
set db_unique_name='ent12c1s'
set db_create_file_dest='/opt/app/oracle/oradata'
set db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
set db_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/ENT12C1S'
set log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/ENT12C1S','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/ENT12C1S'
set log_archive_max_processes='10'
set fal_server='ENT12C1TNS'
set log_archive_dest_2='service=ENT12C1TNS ASYNC NOAFFIRM max_failure=10 max_connections=5 reopen=180 valid_for=(online_logfiles,primary_role) db_unique_name=ent12c1'
set log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=ent12c1s';
This single instance data guard configuration which transport redo asynchronously is transformed as show below.
In situation 1 ENT12C1 is the primary database while ENT12C1S is the standby and FS12C1 is the far sync instance the primary database ships redo synchronously.
After a role switch when the ENT12C1S becomes the new primary then it uses FS12C1S as the far sync instance to transport redo synchronously.
In both situation there exists a direct redo transport path between primary and standby and this will transport redo asynchronously in case of far sync instance failure. Once the far sync is backup again the data guard configuration will revert to using the far sync instance for redo transport. If the standby redo logs were created on the primary then when far sync instances are in use for redo transport standby redo logs will be created automatically for them.
1. On the servers used for creating the far sync instances, install the oracle database software and create a listener. There's no requirement for creating static listener configuration as far sync instance automatically registers with the listener.
2. Create TNS entries for far sync instances on the existing databases (primary and standby) and copy the existing TNS entries into the far sync instances tnsnames.ora file.
cat tnsnames.ora

ENT12C1TNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ent12c1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ent12c1)
    )
  )

ENT12C1STNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ent12c1s-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ent12c1)
    )
  )

FS12C1TNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fs12c1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = fs12c1)
    )
  )

FS12C1STNS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fs12c1s-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = fs12c1s)
    )
  )
3. Create a control file for the far sync instances by connecting to the primary database. Same control file is used for both far sync instances in this case (fs12c1 and fs12c1s).
SQL> ALTER DATABASE CREATE FAR SYNC INSTANCE CONTROLFILE AS '/home/oracle/controlfs.ctl';
4. Copy the control file to the far sync instances. In this case the control file is multiplexed and renamed as control01.ctl and control02.ctl on the far sync instances (shown on the pfiles in subsequent steps).
scp controlfs.ctl fs12c1-host:/opt/app/oracle/oradata/FS12C1/controlfile/controlfs01.ctl
scp controlfs.ctl fs12c1s-host:/opt/app/oracle/oradata/FS12C1S/controlfile/controlfs01.ctl
Similarly copy the password file from the primary to $ORACLE_HOME/dbs on the servers where far sync instances will be created. For far sync instances fs12c1 and fs12c1s the password file need to be renamed as orapwfs12c1 and orapwfs12c1s respectively.

5. Create pfile from the primary spfile. This will be modified to reflect the far sync instance settings.
SQL> create pfile='/home/oracle/pfilefs.ora' from spfile;
6. Copy the pfile to far sync instances ($ORACLE_HOME/dbs) and rename them to reflect the instance names (eg. initfs12c1.ora and initfs12c1s.ora). Modify the init file used for the primary far sync instance (fs12c1) as shown below. Not all the parameters are needed for far sync and those could be removed from the pfile.
cat initfs12c1.ora

*.audit_file_dest='/opt/app/oracle/admin/fs12c1/adump'
*.audit_trail='OS'
*.compatible='12.1.0.2'
*.control_files='/opt/app/oracle/oradata/FS12C1/controlfile/controlfs01.ctl','/opt/app/oracle/oradata/FS12C1/controlfile/controlfs02.ctl'
*.db_block_size=8192
*.db_create_file_dest='/opt/app/oracle/oradata'
*.db_name='ent12c1'
*.db_recovery_file_dest_size=21474836480
*.db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
*.db_unique_name='fs12c1'
*.diagnostic_dest='/opt/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=fs12c1XDB)'
*.fal_server='ENT12C1TNS'
*.log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)'
*.log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=fs12c1'
*.log_archive_dest_2='service=ENT12C1STNS ASYNC NOAFFIRM valid_for=(STANDBY_LOGFILES,standby_role) db_unique_name=ent12c1s max_failure=10 max_connections=5 reopen=180'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'
*.log_archive_format='%t_%s_%r.dbf'
*.log_archive_max_processes=10
*.log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/FS12C1','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/FS12C1'
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
7. Mount the far sync instance using the pfile and then create a spfile from the pfile. Without the spfile a warning is shown when the far sync is added to a data guad broker configuration. Beside that having a spfile also helps with any subsequent parameter changes without the need to restart the far sync instance. Restart (mount) the far sync instance using the spfile.



8. Similarly create the pfile for the far sync instance used by current standby (when it becomes the primary) FS12C1S.
cat initfs12c1s.ora

*.audit_file_dest='/opt/app/oracle/admin/fs12c1s/adump'
*.audit_trail='OS'
*.compatible='12.1.0.2'
*.control_files='/opt/app/oracle/oradata/FS12C1S/controlfile/control01.ctl','/opt/app/oracle/fast_recovery_area/FS12C1S/controlfile/control02.ctl'
*.db_create_file_dest='/opt/app/oracle/oradata'
*.db_name='ent12c1'
*.db_recovery_file_dest_size=21474836480
*.db_recovery_file_dest='/opt/app/oracle/fast_recovery_area'
*.db_unique_name='fs12c1s'
*.diagnostic_dest='/opt/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=fs12c1sXDB)'
*.fal_server='ENT12C1STNS'
*.log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)'
*.log_archive_dest_1='location=use_db_recovery_file_dest valid_for=(all_logfiles,all_roles) db_unique_name=fs12c1s'
*.log_archive_dest_2='service=ENT12C1TNS ASYNC NOAFFIRM valid_for=(standby_logfiles,standby_role) db_unique_name=ent12c1 max_failure=10 max_connections=5 reopen=180'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='enable'
*.log_archive_format='%t_%s_%r.dbf'
*.log_archive_max_processes=10
*.log_file_name_convert='/data/oradata/ENT12C1','/opt/app/oracle/oradata/FS12C1S','/data/flash_recovery/ENT12C1','/opt/app/oracle/fast_recovery_area/FS12C1S'
*.remote_login_passwordfile='EXCLUSIVE'
*.standby_file_management='AUTO'
9. Similar to earlier (step 7) create a spfile and restart (mount) the far sync instance using the spfile.

10. Update the log archive config parameter on both primary and standby to include the far sync instance information as well.
alter system set log_archive_config='dg_config=(ent12c1,ent12c1s,fs12c1,fs12c1s)' scope=both ;
11. Update the fal server parameter on the primary (ent12c1) as below which allows the ent12c1 to fetch archive logs (when it becomes a standby) either from the primary (ent12c1s) or from the far sync instance (fs12c1s).
alter system set fal_server='ENT12C1STNS','FS12C1STNS' scope=both;
12. Update the fal server parameter on the standby (ent12c1s) so that it can fetch the archive logs either from the ent12c1 (primary) or far sync instance (fs12c1).
alter system set fal_server='ENT12C1TNS','FS12C1TNS' scope=both;
13. Update the log archive destination and log archive destination state on the primary such that redo transport is synchronized between primary and far sync and asynchronous between the primary and standby (direct). Further more the asynchronous log archive destination is set with state alternate so that when the synchronous log archive destination fails the data guard configuration start shipping redo via this alternate log archive destination.
alter system set log_archive_dest_state_2='enable' scope=both;
alter system set log_archive_dest_2='service=FS12C1TNS SYNC AFFIRM  db_unique_name=fs12c1  max_failure=1 valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_3 max_connections=5' scope=both;

alter system set log_archive_dest_state_3='alternate' scope=both;
alter system set log_archive_dest_3='service=ENT12C1STNS ASYNC NOAFFIRM  db_unique_name=ent12c1s  valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_2 max_failure=10 max_connections=5 reopen=180' scope=both;
14. Change and add log archive destination settings on the standby database so that when it becomes primary (ent12c1s) it too can use the far sync instance for synchronous redo transport and failing that use asynchronous redo transport directly with the standby at the time (ent12c1)
alter system set log_archive_dest_state_2='enable' scope=both;
alter system set log_archive_dest_2='service=FS12C1STNS SYNC AFFIRM db_unique_name=fs12c1s valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_3 max_connections=5  max_failure=1' scope=both;

alter system set log_archive_dest_state_3='alternate' scope=both;
alter system set log_archive_dest_3='service=ENT12C1TNS ASYNC NOAFFIRM  db_unique_name=ent12c1 valid_for=(online_logfiles,primary_role) alternate=log_archive_dest_2 max_failure=10 max_connections=5 reopen=180' scope=both;
15. Increase the protection mode of the primary database to maximum availability.
ALTER DATABASE SET STANDBY TO MAXIMIZE AVAILABILITY;

SQL> SELECT PROTECTION_MODE FROM V$DATABASE;

PROTECTION_MODE
--------------------
MAXIMUM AVAILABILITY
16. Do few log switches and verify that redo transport is happening via the far sync instance. Easiest way to monitor is through the alert log, which log switch will be logged on both far sync instance alert log and standby instance alert log if it was transported via the far sync. If there's any issues with regard to log archive destination this could be observed on the primary
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     FS12C1TNS
LOG_ARCHIVE_DEST_3             ALTERNATE ENT12C1STNS
and on the far sync instance
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     ENT12C1STNS
STANDBY_ARCHIVE_DEST           VALID     USE_DB_RECOVERY_FILE_DEST
Above output shows that primary is transporting to log archive dest 2 and status valid and dest 3 is still in alternative state. On the far sync output it shows that far sync instance is shipping redo as per its log archive dest 2 value and current status is valid.

17. Shutdown abort the far instance and view the archive dest output. If the configuration works properly then log archive dest 3 should be valid and redo transport should be happening directory between primary and standby in asynchronous mode.
SQL> SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             ALTERNATE FS12C1TNS
LOG_ARCHIVE_DEST_3             VALID     ENT12C1STNS
From the above output it could be seen that after far sync instance is terminated the log archive dest 3 has become the valid destination and log archive dest 2 is kept as an alternate destination. On the primary instance alert log following could be observed when the far sync instance is terminated
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 1 thread 1 sequence 1503 (3113)
Destination LOG_ARCHIVE_DEST_2 no longer supports SYNCHRONIZATION
and on the standby instance alert log the following
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM PERFORMANCE mode
It must be also mentioned on few occasions where the far sync was abruptly terminated (shutdown abort) the recovery process on standby got stopped due to lost writes
MRP0: Background Media Recovery terminated with error 742  <-- far sync instance terminated
Fri Sep 12 17:32:28 2014
Errors in file /opt/app/oracle/diag/rdbms/ent12c1s/ent12c1s/trace/ent12c1s_pr00_10098.trc:
ORA-00742: Log read detects lost write in thread 1 sequence 1503 block 868
ORA-00312: online log 4 thread 1: '/opt/app/oracle/fast_recovery_area/ENT12C1S/onlinelog/o1_mf_4_b0y1dn8v_.log'
ORA-00312: online log 4 thread 1: '/opt/app/oracle/oradata/ENT12C1S/onlinelog/o1_mf_4_b0y1dml4_.log'
Managed Standby Recovery not using Real Time Apply
RFS[16]: Assigned to RFS process (PID:10165)
RFS[16]: Selected log 5 for thread 1 sequence 1504 dbid 209099011 branch 833730501
Fri Sep 12 17:32:28 2014
Recovery interrupted!
Recovered data files to a consistent state at change 19573793
Fri Sep 12 17:32:28 2014
Errors in file /opt/app/oracle/diag/rdbms/ent12c1s/ent12c1s/trace/ent12c1s_pr00_10098.trc:
ORA-00742: Log read detects lost write in thread 1 sequence 1503 block 868
ORA-00312: online log 4 thread 1: '/opt/app/oracle/fast_recovery_area/ENT12C1S/onlinelog/o1_mf_4_b0y1dn8v_.log'
ORA-00312: online log 4 thread 1: '/opt/app/oracle/oradata/ENT12C1S/onlinelog/o1_mf_4_b0y1dml4_.log'
Fri Sep 12 17:32:28 2014
MRP0: Background Media Recovery process shutdown (ent12c1s)
Fri Sep 12 17:32:28 2014
Archived Log entry 107 added for thread 1 sequence 1503 rlc 833730501 ID 0xde697d0 dest 3:
Fri Sep 12 17:38:51 2014
alter database recover managed standby database disconnect  <-- manual start at 17:38 after it was stopped 17:32
So it maybe good idea to keep an eye on the recovery when the far sync instance terminates. According to 1302539.1 when active data guard is in place the there's automatic block repair transparent to the user. Once the far sync instance is backup again the redo transport will go back to original setting
SQL>  SELECT DEST_NAME,STATUS,DESTINATION FROM V$ARCHIVE_DEST WHERE DESTINATION IS NOT NULL;

DEST_NAME                      STATUS    DESTINATION
------------------------------ --------- ------------------------------
LOG_ARCHIVE_DEST_1             VALID     USE_DB_RECOVERY_FILE_DEST
LOG_ARCHIVE_DEST_2             VALID     FS12C1TNS
LOG_ARCHIVE_DEST_3             ALTERNATE ENT12C1STNS
And following could be observed on the alert log of primary
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 1506
LGWR: Standby redo logfile selected for thread 1 sequence 1506 for destination LOG_ARCHIVE_DEST_2
Fri Sep 12 16:21:08 2014
Thread 1 advanced to log sequence 1506 (LGWR switch)
  Current log# 1 seq# 1506 mem# 0: /data/oradata/ENT12C1/onlinelog/o1_mf_1_9bcsl3ds_.log
  Current log# 1 seq# 1506 mem# 1: /data/flash_recovery/ENT12C1/onlinelog/o1_mf_1_9bcsl3hm_.log
Fri Sep 12 16:21:08 2014
Archived Log entry 1408 added for thread 1 sequence 1505 ID 0xde697d0 dest 1:
Fri Sep 12 16:21:11 2014
Thread 1 cannot allocate new log, sequence 1507
Checkpoint not complete
  Current log# 1 seq# 1506 mem# 0: /data/oradata/ENT12C1/onlinelog/o1_mf_1_9bcsl3ds_.log
  Current log# 1 seq# 1506 mem# 1: /data/flash_recovery/ENT12C1/onlinelog/o1_mf_1_9bcsl3hm_.log
Fri Sep 12 16:21:14 2014
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
On the alert log of the far sync instance
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Standby controlfile consistent with primary
RFS[1]: Assigned to RFS process (PID:3557)
RFS[1]: Selected log 5 for thread 1 sequence 1506 dbid 209099011 branch 833730501
Fri Sep 12 17:04:06 2014
******************************************************************
TT00: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
TT00: Standby redo logfile selected for thread 1 sequence 1506 for destination LOG_ARCHIVE_DEST_2
Fri Sep 12 17:04:07 2014
Changing standby controlfile to MAXIMUM PERFORMANCE mode
RFS[2]: Assigned to RFS process (PID:3561)
RFS[2]: Opened log for thread 1 sequence 1505 dbid 209099011 branch 833730501
Fri Sep 12 17:04:08 2014
Archived Log entry 203 added for thread 1 sequence 1505 rlc 833730501 ID 0xde697d0 dest 2:
Fri Sep 12 17:04:13 2014
Changing standby controlfile to MAXIMUM AVAILABILITY mode
On the standby alert log
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Similarly it is possible to shutdown the standby instance and see if primary is able to ship redo to far sync instance and if the redo is fetch by standby once it is started again. With these test the situation 1 on the figure shown at the beginning of the post is complete.

18. To test the situation 2 on the figure above do a switchover and check the redo transport via the far sync (fs12c1s) instance.

This conclude adding of far sync instances to existing data guard configuration on 12c.

Useful metalink notes
Cascaded Standby Databases in Oracle 12c [ID 1542969.1]
Data Guard 12c New Feature: Far Sync Standby [ID 1565071.1]
Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration [ID 1302539.1]