
ORACLE DATAGUARD

INDEX

1. Introduction to Dataguard

2. Scenarios on DG Related Issues

2.1 Case 1 : Log gap between Primary and Standby

2.2 Case 2 : Failure due to the Standby archive log location being 100% full

2.3 Case 3 : Remedy for network-related issues

3. FSFO

3.1 Configuring FSFO

1. Introduction:
The following document helps you understand the different issues commonly faced in an Oracle
Dataguard environment.
When you are using Dataguard, there are several scenarios in which the physical standby can go out of
sync with the primary database. Here are a few scenarios explained with their solutions.
2. Scenarios on DG Related issues:
Case 1: When the number of logs missing on the standby is huge (say more than 500 logs), you would
normally have to rebuild the standby database from scratch. However, as an enhancement from 10g, an
incremental backup created with BACKUP INCREMENTAL... FROM SCN can be used to refresh the
standby database without rebuilding it. In this case we will take an SCN-based incremental backup of
the primary using the SCN retrieved from the standby database.
Step 1: On both the PRIMARY and the STANDBY database, query the v$database view and record the
current SCN of the standby database:
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
Step 2: Stop the MRP process in standby database.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Once the command is issued, check whether the MRP process has been properly stopped using the
v$managed_standby view.
SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
Step 3: Now go to the Primary database and take an SCN-based incremental backup using the SCN value
retrieved from the Standby database in the 1st step.
RMAN> BACKUP INCREMENTAL FROM SCN <SCN#> DATABASE FORMAT
'/tmp/ForStandby_%U' TAG 'FOR STANDBY';
Step 4: Recover the Standby database using the incremental backup taken above. Copy the backup
pieces of the incremental backup to a location accessible to the standby server. Connect to the RMAN
prompt on the Standby database and catalog the incremental backup pieces along with their location.
Cataloging helps RMAN identify where the backup pieces are located for the restore.
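The copy itself might look like the following (a hedged sketch; the standby hostname is hypothetical and the target directory is taken from the sample output below):
$ scp /tmp/ForStandby_* oracle@standbyhost:/dump/proddb/inc_bkup/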
RMAN> CATALOG START WITH 'location/backup_pieces';
Step 5: Once the backup pieces are cataloged start the recovery.
RMAN> RECOVER DATABASE NOREDO;
Sample output of Recovery Log:
************************OUTPUT*****************************
Starting recover at 2015-09-17 04:59:57
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=309 devtype=DISK
channel ORA_DISK_1: starting incremental datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
....
..
..
.
channel ORA_DISK_1: reading from backup piece
/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 tag=FOR STANDBY
channel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2015-07-25 05:20:3
************************OUTPUT*****************************
Step 6: Once the recovery is completed, start the MRP process on the Standby database.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING
CURRENT LOGFILE DISCONNECT FROM SESSION;
Step 7: After this, check whether the logs are being applied on the standby or not.
SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
After doing a recovery using the incremental backup, you will no longer see the sequence#'s that were
earlier visible with APPLIED=NO, because they have been absorbed as part of the incremental backup
and applied on the standby during recovery. The APPLIED column starts showing YES for the logs that
are being transmitted now, which means logs are being applied.
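For a quicker check, the latest applied sequence per thread can be queried as follows (a hedged sketch against the same V$ARCHIVED_LOG view):
SQL> SELECT THREAD#, MAX(SEQUENCE#) "Last Applied"
     FROM V$ARCHIVED_LOG
     WHERE APPLIED = 'YES'
     GROUP BY THREAD#;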
--IN CASE OF RECOVERY FAILURE--
If the recovery fails after applying the incremental backup, we need to recreate the controlfile of the
standby. The reason for recreating the controlfile is that the database SCN recorded in the controlfile
was not updated after applying the incremental backup, while the SCNs of the datafiles were updated.
Consequently, the standby database was still looking for the old archive logs to apply.
Steps to recreate the standby controlfile and start managed recovery on the standby:
Step 1. Take a backup of the controlfile from the primary by connecting to the RMAN prompt.
RMAN TARGET /

RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY;


Step 2. Copy the controlfile backup to the standby system (or, if it is on a common NFS mount, there is
no need to transfer or copy it) and restore the controlfile onto the standby database.
-> Shut down all instances of the standby (if the standby is RAC).
SHUT IMMEDIATE;

-> Startup nomount, one instance.


STARTUP NOMOUNT;
-> Restore the standby control file.
RMAN TARGET /
RESTORE STANDBY CONTROLFILE FROM 'CTRL_FILE_LOCATION';
Step 3. Startup the standby with the new control file.
SHUT IMMEDIATE;
STARTUP MOUNT;
Step 4. Restart managed recovery in one instance (if standby is RAC) of the standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
FROM SESSION;
The above statement may succeed without errors but the MRP process will still not start. The reason is
that, since the controlfile has been restored from the primary, it is looking for datafiles at the same
location as in the primary instead of the standby. For example, if the primary datafiles are located at
'+DATA/proddb_1/DATAFILE' and the standby datafiles are at '+DATA/proddb_2/DATAFILE', the new
controlfile will show the datafiles' location as '+DATA/proddb_1/DATAFILE'. This can be verified with the
query "select name from v$datafile" on the standby instance. We need to rename all the datafiles to
reflect the correct location. This renaming can be achieved as follows.
Change the parameter standby_file_management=manual in standby's parameter file.
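For example, this could be done dynamically (a hedged sketch; with a pfile the parameter has to be edited and the instance restarted instead):
SQL> ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='MANUAL';
Then rename each datafile: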
ALTER DATABASE RENAME FILE '+DATA/proddb_1/datafile/NAME.DBF' TO
'+DATA/proddb_2/datafile/NAME.DBF';
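Once all datafiles point to the correct location, standby_file_management is typically set back to automatic so that future datafile additions on the primary are handled without manual intervention (an assumption based on common practice, not stated above):
SQL> ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='AUTO';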
Now start the managed recovery as:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
FROM SESSION;
When the number of logs missing on the standby is small (fewer than 10-15), we can simply ship the
missing logs from the primary database to the standby database using SCP/FTP and then register the
logfiles on the standby to resolve the gap. We can use the below query to find out the archive log gap.
SELECT ARCH.THREAD# "Thread",
       ARCH.SEQUENCE# "Last Sequence Received",
       APPL.SEQUENCE# "Last Sequence Applied",
       (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM   (SELECT THREAD#, SEQUENCE# FROM V$ARCHIVED_LOG
        WHERE (THREAD#, FIRST_TIME) IN
              (SELECT THREAD#, MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
       (SELECT THREAD#, SEQUENCE# FROM V$LOG_HISTORY
        WHERE (THREAD#, FIRST_TIME) IN
              (SELECT THREAD#, MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE  ARCH.THREAD# = APPL.THREAD#
ORDER  BY 1;
Step 1: Copy the missing archive logs from Primary to Standby by SCP.
$ scp LOGFILE.ARC oracle@STANDBY:/LOG_LOCATION/
Step 2: Register the copied logfile to the database.
SQL> ALTER DATABASE REGISTER LOGFILE 'LOGFILE_LOCATION/LOGFILE.ARC';
NOTE: Repeat the same process for all the log files which are missing at standby.

Case 2: In this case we look at what happens when the standby archive log location gets completely
filled, due to which log transport from the primary to the standby database stops, causing a log gap
between the primary and the standby database. Below are the steps taken to resolve the issue.
Step 1: First check the mount point space usage where the standby archive logs reside.
$ df -h
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s0 254966 204319 25151 90% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 496808 376 496432 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
/dev/dsk/c0t0d0s6 3325302 3325302 0 100% /archive
fd 0 0 0 0% /dev/fd
swap 496472 40 496432 1% /var/run
swap 496472 40 496432 1% /tmp
/dev/dsk/c0t0d0s5 13702 1745 10587 15% /opt
/dev/dsk/c0t0d0s7 9450 1045 7460 13% /export/home
As we can see, the /archive location is completely full and there is no space left for the new
archive logs coming from the primary database.
By querying the v$archived_log view we can check which files have been applied on the standby.
SQL> SELECT DEST_ID,SEQUENCE#,APPLIED
FROM V$ARCHIVED_LOG
WHERE DEST_ID=2
AND SEQUENCE# > (SELECT MAX(SEQUENCE#) - 10 FROM V$ARCHIVED_LOG)
ORDER BY SEQUENCE#
/
In the query output, the APPLIED column shows "NO" for any log sequence that has not yet been applied.

Step 2: We should not manually remove the archive logs from the archivelog destination at the OS
level; instead, we take a backup of all the archive logs that have not yet been backed up as per the
RMAN strategy.
RMAN TARGET /
RMAN> BACKUP AS COMPRESSED BACKUPSET ARCHIVELOG ALL NOT BACKED
UP;
Note that the archive log files are not removed at the moment of the backup; they remain in the
FRA (or the archive destination) until they are explicitly deleted.
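If the archive destination is inside the FRA, its space usage can also be checked from the database (a hedged sketch; the view is named V$FLASH_RECOVERY_AREA_USAGE in older releases and V$RECOVERY_AREA_USAGE from 11.2 onwards):
SQL> SELECT FILE_TYPE, PERCENT_SPACE_USED, PERCENT_SPACE_RECLAIMABLE
     FROM V$RECOVERY_AREA_USAGE;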
Step 3: Now we have to delete all the archive logs after backing them up using the RMAN strategy.

RMAN TARGET /
RMAN> DELETE NOPROMPT ARCHIVELOG ALL BACKED UP 1 TIMES TO DEVICE
TYPE DISK;
Now the archive log space will be released and log shipping will resume. The status of the logs
being transmitted can be viewed using the below query.
SQL> SELECT DEST_ID,SEQUENCE#,APPLIED
FROM V$ARCHIVED_LOG
WHERE DEST_ID=2
AND SEQUENCE# > (SELECT MAX(SEQUENCE#) - 10 FROM V$ARCHIVED_LOG)
ORDER BY SEQUENCE#
/

--IN CASE THE MRP PROCESS IS DOWN--

In some cases the MRP process may go down; in that case we have to check the status of the MRP
process in the v$managed_standby view.
SQL> SELECT
PROCESS,
SEQUENCE#,
STATUS
FROM
V$MANAGED_STANDBY;
PROCESS SEQUENCE# STATUS
------- ---------- -----------------------
ARCH 0 CONNECTED
ARCH 0 CONNECTED
MRP0 595 NO
RFS 595 ATTACHED
RFS 594 RECEIVING

If the MRP process is down we need to issue the below command to bring up the MRP process.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT
FROM SESSION;

SQL> SELECT
PROCESS,
SEQUENCE#,
STATUS
FROM
V$MANAGED_STANDBY;
PROCESS SEQUENCE# STATUS
------- ---------- -----------------------
ARCH 0 CONNECTED
ARCH 0 CONNECTED
MRP0 595 WAIT_FOR_LOG
RFS 595 ATTACHED
RFS 594 RECEIVING

Case 3: Recovering from a network failure. Whenever there is a network glitch between the
DC and the DR site, there is a possibility that log shipping will be stalled until the network is back
up. Here we discuss how to recover after a network failure.
Step 1: We need to identify the network failure. For this, the V$ARCHIVE_DEST view contains the
network error and identifies which standby database cannot be reached. The below query needs
to be run on the primary database.
SQL> SELECT DEST_ID, STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID
= 2;
If the query shows that there are errors archiving to the standby database, and the cause of the
error is TNS: no listener, then we need to check on the corresponding standby server whether the
listener is up and running.
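On the standby host this can be checked with the standard listener and TNS utilities (a hedged sketch; the default listener name is assumed, and standby1 is the TNS alias used later in this case):
$ lsnrctl status
$ tnsping standby1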
Step 2: If the network is still down, then we need to prevent the database from stalling by
applying the workaround below.
-> Defer archiving to the mandatory destination.
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = DEFER;
-> When the network problem is resolved, you can enable the archive destination again.
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = ENABLE;
-> Change the archive destination from mandatory to optional.
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 = 'SERVICE=standby1 OPTIONAL REOPEN=60';

NOTE: Once the network is back, the destination state must be re-enabled again without fail.
Step 3: On the primary database, we need to archive the current online redo log file. The below
command will switch the logs on all RAC nodes.
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
When the network is back up again, log apply services can detect and resolve the archive gaps
automatically when the physical standby database resumes Redo Apply.
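Any remaining gap can then be confirmed on the standby (a hedged sketch using the standard V$ARCHIVE_GAP view, which returns no rows when there is no gap):
SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;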

3. FSFO
Introduction:
Fast-Start Failover (FSFO) is a feature available through the Oracle Dataguard broker which
performs an automatic failover to the chosen standby database in case of a primary database
crash. FSFO involves an OBSERVER component which runs on a machine different from those of
the primary and the standby database. The observer requires only the Oracle client or database
software installed on its site and TNS connectivity established to the primary and the standby database.
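For example, the observer host's tnsnames.ora might contain entries like the following (a hedged sketch; hostnames, ports and service names are hypothetical):
ORAPRIM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = prim-host)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = oraprim)))

ORASTBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = stby-host)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = orastby)))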

Below are the steps involved in enabling FSFO.


NOTE: FSFO requires the DG Broker to be configured for the databases.
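If the broker has not yet been set up, the configuration could be created along these lines (a hedged sketch; DG_BROKER_START must be TRUE on both databases, and the connect identifiers are assumed to match the database names):
SQL> ALTER SYSTEM SET DG_BROKER_START=TRUE;
DGMGRL> CREATE CONFIGURATION dgtest AS PRIMARY DATABASE IS oraprim CONNECT IDENTIFIER IS oraprim;
DGMGRL> ADD DATABASE orastby AS CONNECT IDENTIFIER IS orastby MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;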
-> The configuration is enabled in the DG Broker and its status is as below.
DGMGRL> show configuration;
Configuration - dgtest
Protection Mode: MaxAvailability
Databases:
oraprim - Primary database
orastby - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

Make sure that both the primary and the standby databases have Flashback Database and the FRA
enabled, which is a prerequisite for FSFO.
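A hedged sketch of these prerequisites, to be run on both databases (the FRA size and location below are hypothetical, and on the standby managed recovery may need to be stopped before enabling flashback):
SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 20G;
SQL> ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+FRA';
SQL> ALTER DATABASE FLASHBACK ON;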

-> Set the FastStartFailoverTarget to the target database (orastby) for the primary database
(oraprim).
DGMGRL> edit database oraprim set property 'FastStartFailoverTarget'='orastby';
Property "FastStartFailoverTarget" updated
DGMGRL>

-> Similarly, set the FastStartFailoverTarget to the target database (oraprim) for the database
orastby. This will be used if orastby starts behaving as a primary database and oraprim as a
standby after the role transition.
DGMGRL> edit database orastby set property 'FastStartFailoverTarget'='oraprim';
Property "FastStartFailoverTarget" updated
DGMGRL>

-> The property can be verified by the below command.


DGMGRL> show database oraprim 'FastStartFailoverTarget';
FastStartFailoverTarget = 'orastby'

-> Now we need to set the property FastStartFailoverThreshold to 60 seconds, which is the time
in seconds that the observer will wait before initiating the failover.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold =
60;
Property "faststartfailoverthreshold" updated

-> The observer is a continuous process and does not return the prompt in the DGMGRL
session until you stop the observer from another DGMGRL session.
$ nohup dgmgrl sys/oracle@oraprim "start observer" &
OR
-> If you prefer to run it in the foreground, connect to the broker configuration from the observer
site and run the "start observer" command to start the observer.
$ dgmgrl sys/oracle@oraprim
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2009, 2018, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.


Connected.
DGMGRL> start observer;
Observer started

-> Once the observer is running, we need to enable FSFO. To do so, connect to the
broker configuration and execute the "enable fast_start failover" command.
DGMGRL> enable fast_start failover;
Enabled.
DGMGRL>

-> We can verify that the configuration is successful as below.
DGMGRL> show configuration;
Configuration - dgtest
Protection Mode: MaxAvailability
Databases:
oraprim - Primary database
orastby - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS

With the above steps we have completed the FSFO configuration. Now we can check the
properties of the FSFO using the below command.
DGMGRL> show fast_start failover;
Fast-Start Failover: ENABLED
Threshold: 60 seconds
Target: orastby
Observer: ora1-3.mydomain
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES
Oracle Error Conditions:
(none)
