Monday, January 14, 2019

How to resolve time since chkpt and ERROR OGG-15163 there was a problem sending a message to REPLICAT (Timeout waiting for message)?


So I have been observing and have been getting alerts on oracle goldengate replicate that lag is increasing.  The process was running but the lag kept increasing. Tried to troubleshoot and saw through logdump utility rba's were moving.  I was assured at least the process is doing its job. So it meant the performance was bad for the process.

GGSCI (ggahmad1.com) 6> info all
Program Status   Group    Lag at Chkpt  Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXTRAC2 00:00:04   00:00:04
REPLICAT RUNNING RPTRAC2 00:00:00   05:40:42

There is 5 hours 40 min of lag for that one process that means data is getting replicated but slower than how  its happening. 

So I added some extra parameters as below in replicat parameter file and restarted the process. The process first timed out when I tried stopping or send extract forcestoping. Eventually, I killed the process and restarted and was able to resolve the lag after 15 min of running.


DBOPTIONS INTEGRATEDPARAMS (parallelism 24)
DBOPTIONS INTEGRATEDPARAMS (EAGER_SIZE 20001)
BATCHSQL BATCHTRANSOPS 100

DBOPTIONS:
Valid For Extract and Replicat
Description: Use the DBOPTIONS parameter to specify database options. This is a global parameter, applying to all TABLE or MAP statements in the parameter file.
https://docs.oracle.com/goldengate/1212/gg-winux/GWURF/gg_parameters034.htm#GWURF437

DBOPTIONS INTEGRATEDPARAMS (parameter[, ...])
This parameter specification applies to Replicat in integrated mode. It specifies optional parameters for the inbound server.
https://docs.oracle.com/goldengate/1212/gg-winux/GIORA/config_apply.htm#GIORA938

PARALLELISM: Controls the number of processes used by the logmining server. The default is 2.
EAGER_SIZE: Sets a threshold for the size of a transaction (in number of LCRs) after which Oracle GoldenGate starts applying data before the commit record is received. The default for Oracle GoldenGate is 9500.
https://docs.oracle.com/goldengate/1212/gg-winux/GIORA/addl_integrated_params.htm#GIORA245

BATCHTRANSOPS: n
Controls the maximum number of batch operations that can be grouped into a transaction before requiring a commit. When BATCHTRANSOPS is reached, the operations are applied to the target.
1. Minimum value is 1.
2. Maximum value is 100000.
3. Default is 1000 for nonintegrated Replicat (all database types) and 50 for an integrated Oracle Replicat.
https://docs.oracle.com/goldengate/1212/gg-winux/GWURF/gg_parameters011.htm#GWURF404

stop replicat RPTRAC2   
edit param RPTRAC2   ==>> (with below params)  
save and exit ==>>;wq!
start replicat RPTRAC2

After 15 min the lag caught up and the processes was performance tuned.
Program Status   Group    Lag at Chkpt  Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXTRAC2 00:00:04   00:00:04
REPLICAT RUNNING RPTRAC2 00:00:00   00:01:42

Hope this helps. Cheers.

No comments:

Post a Comment