Monday, July 15, 2019

How to perform roll forward recovery for standby oracle database?

Oracle DBA Documentation: Standby Recover Forward with RMAN


TO FIX GAPS - Do NOT run backups during operational hours, unless you get approval
from the NOC!!!


Note: I really really like to use "screen -RD" when performing this procedure, so I can come
back to my session when I get disconnected from the network.  If you don't know how to
use screen, there are many tutorials on the web.


1. Stop the redo on the standby database


Since our standby databases use Oracle DataGuard Broker,
it is easiest to simply disable the broker on the standby side.

SQL> alter system set dg_broker_start=FALSE;

This will stop and prevent broker from starting by updating memory and the spfile or
the database. You can also cancel recovery, but this should not be running after
broker was stopped.

SQL> alter database recover managed standby database cancel;


2. Determine the current SCN for the standby database. Run both of these SQL
statements in SQL Plus on the Standby.


col CURRENT_SCN for 9999999999999
select current_scn from v$database;
select min(fhscn) current_scn from x$kcvfh;

Take the lesser of the 2, minus 10000 to be sure.


3. Use the value for the current SCN to take your incremental backup on the primary


Check the /DD_RMAN/rman/[database] folder or your backup folder and make sure
there is not a "control_file.ctl" file already there.  I recommend you remove all previous
backups, just in case. Just "rm" them or copy to another place/folder. 

run {
allocate channel disk1 type disk format '/DD_RMAN/rman/%d/%d_%s_%p_%t_%U.dat';
allocate channel disk2 type disk format '/DD_RMAN/rman/%d/%d_%s_%p_%t_%U.dat';
allocate channel disk3 type disk format '/DD_RMAN/rman/%d/%d_%s_%p_%t_%U.dat';
BACKUP INCREMENTAL FROM SCN 275632204001 database tag 'FOR STANDBY';
backup current controlfile for standby format '/DD_RMAN/rman/%d/control_file.ctl';
}


4.  Login to the standby and verify all the backupsets have been replicated to the
Standby Data Domain.


On Primary Node
ls -l /DD_RMAN/rman/[database]/
On Standby Node
ls -l /DD_RMAN/rman/[database]/

They should be the same and have the same file sizes.  Once they are identical, proceed.


5.  Go ahead and catalog the backupsets on the Standby


RMAN> catalog start with '/DD_RMAN/rman/[database]
Answer "YES" to catalog the backups.

I also like to crosscheck the backupsets and remove any old ones.

RMAN> allocate channel for maintenance device type disk;

RMAN> crosscheck backup;

RMAN> delete expired backup;


5. Recover the database from the incremental backup


RMAN> recover database noredo;
If when you run this, you see something like this, your incremental backup was
NOT from an old enough SCN.  You need to re-run the backup from an OLDER SCN.
Please check your queries on standby.

RMAN> run {    
2> allocate channel d1 type disk;
3> allocate channel d2 type disk;
allocate channel d3 type disk;
4> 5> recover database noredo;
6> }
allocated channel: d1
channel d1: SID=1032 instance=test1 device type=DISK
allocated channel: d2
channel d2: SID=1290 instance=test1 device type=DISK
allocated channel: d3
channel d3: SID=1549 instance=test1 device type=DISK
Starting recover at 21-APR-15
Finished recover at 21-APR-15
released channel: d1
released channel: d2
released channel: d3
 If your output looks something like this, you are in good shape.
allocated channel: d1
channel d1: SID=1290 instance=test1 device type=DISK
allocated channel: d2
channel d2: SID=1549 instance=test1 device type=DISK
allocated channel: d3
channel d3: SID=1802 instance=test1 device type=DISK
Starting recover at 21-APR-15
channel d1: starting incremental datafile backup set restore
channel d1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00006: +DATA/test/datafile/users.259.842540043
destination for restore of datafile 00010: +DATA/test/datafile/cccmonitor.564.842715095
destination for restore of datafile 00014: +DATA/test/datafile/procspuser_data.463.855351457
destination for restore of datafile 00015: +DATA/test/datafile/vis_data.460.858644267
destination for restore of datafile 00018: +DATA/test/datafile/undotbs5.350.866517573
destination for restore of datafile 00020: +DATA/test/datafile/flexnetprod_data.649.869682677
channel d1: reading from backup piece /DD_RMAN/rman/TEST/TEST_10841_1_877608247
_ipq4uf9n_1_1.dat
channel d2: starting incremental datafile backup set restore
channel d2: specifying datafile(s) to restore from backup set
destination for restore of datafile 00002: +DATA/test/datafile/sysaux.266.842540047
destination for restore of datafile 00004: +DATA/test/datafile/undotbs2.271.842540045
destination for restore of datafile 00005: +DATA/test/datafile/undotbs3.270.842540045
destination for restore of datafile 00009: +DATA/test/datafile/sp_data.269.842540047
destination for restore of datafile 00016: +DATA/test/datafile/hps_data.462.859231897
destination for restore of datafile 00017: +DATA/test/datafile/undotbs4.353.866515623
destination for restore of datafile 00019: +DATA/test/datafile/esdprod_data.648.869682669


6. Restore the STANDBY controlfile to the standby:


RMAN> shutdown immediate;
RMAN> startup nomount
RMAN> RESTORE STANDBY CONTROLFILE FROM '/DD_RMAN/rman/[database]/control_file.ctl';
RMAN> alter database mount;


7.  Now you need to syncronize the controlfile with the datafile location of the standby.


RMAN> catalog start with '+DATA/[database]/datafile';
RMAN> switch database to copy;
If you get an error when switching the datafile to copy, it may be that the database,
during startup has switched to a "newer" incarnation, which is not applicable to the standby
recovery.
Here is an example from test
RMAN> list incarnation;
using target database control file instead of recovery catalog
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1   1   TEST 162103565    ORPHAN 1      13-MAR-13
2   2   TEST 162103565    CURRENT  265249491116 07-FEB-15
Let's look at the primary - TEST
RMAN> list incarnation;
using target database control file instead of recovery catalog
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1   1   TEST 162103565    CURRENT 1      13-MAR-13
Notice how the  primary current is not the same as the standby?
 To fix this, execute the following on the standby.
RMAN> reset database to incarnation 1;
Now re-run the command to switch the datafile copies.
 RMAN> switch database to copy;
 This should list all the datafile copies.
Through all of this, if you are watching the alert log on the standby, you WILL see a lot of errors.
  It may even look like the datafiles don't exist. This is OK!


8.  Now stop the standby database with srvctl


# srvctl stop database -d [database]
 Then start it
# srvctl start database -d [database]


9.  Watch the alert log, to ensure the database comes up and applies any archive 
logs that are available locally.  If you get an "FAL gap" error, which can not be 
resolved, you will have to check the archive retention and start over with your incremental
backup from the new SCN.  


Make sure the local "archive delete" scripts on "node 1" of each cluster are retaining the archives
long enough for you to do all this stuff. If you recover forward clean.  Messages like this:


Tue Apr 21 13:37:40 2015
Archived Log entry 1363 added for thread 2 sequence 44117 ID 0x9a9530d dest 1:
Tue Apr 21 13:37:44 2015
RFS[9]: Assigned to RFS process 10194
RFS[9]: Opened log for thread 4 sequence 6201 dbid 162103565 branch 809981197
Archived Log entry 1364 added for thread 4 sequence 6201 rlc 809981197 ID 0x9a9530d dest 2:
Tue Apr 21 13:37:46 2015
RFS[10]: Assigned to RFS process 10234
RFS[10]: Opened log for thread 5 sequence 6084 dbid 162103565 branch 809981197
Archived Log entry 1365 added for thread 5 sequence 6084 rlc 809981197 ID 0x9a9530d dest 2:
 You are in good shape.


10. Restart the MRP process and broker on the standby.


sql> alter system set dg_broker_start=TRUE;
In the alert log, you should see something like this.
Tue Apr 21 13:47:53 2015
ALTER SYSTEM SET dg_broker_start=TRUE SCOPE=BOTH;
Starting Data Guard Broker (DMON)
Tue Apr 21 13:47:55 2015
INSV started with pid=84, OS id=21325 
Tue Apr 21 13:47:59 2015
NSV0 started with pid=85, OS id=21389 
Tue Apr 21 13:48:05 2015
RSM0 started with pid=86, OS id=21439 
ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='test1';
ALTER SYSTEM SET log_archive_format='%t_%s_%r.dbf' SCOPE=SPFILE SID='test1';
Tue Apr 21 13:48:25 2015
Managed Standby Recovery starting Real Time Apply
Tue Apr 21 13:50:29 2015

Start MRP with archive log current also if needed. 


11. Now check dgmgrl and confirm the standby is in good shape.  You may get
 some timeouts, just keep trying the command.



DGMGRL>  show database '[database]';

It should come back showing no transfer gap and perhaps it is still recovering.  It will also
show you the apply instance, so you can confirm broker is getting archives and managing
things via that local alert log.

No comments:

Post a Comment