Friday, April 20, 2018

How to resolve GoldenGate process lag by restarting the process

Follow this document when the processes are running but the lag keeps increasing, or when you need to restart a specific GoldenGate process.

Today we were notified by alerts that GG had a lag in the sync.
1. Connect to the server as the GG OS user.
2. Change directory to the GG home; in my case it is /u01/app/ogg/12c
3. Start the ggsci command line
./ggsci
4. Check all extracts and replicats
GGSCI (node1) 2> info all
I looked at the processes listed by info all and saw that all of them were RUNNING.
This means we did not get any abend alerts, because all processes are running, but they are not
doing any work or are hung. You may have received lag alerts if they are set up.
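A quick way to spot a process that is RUNNING but hung is to read the two time columns of info all. The sketch below is only an illustration, not part of GoldenGate: the function names, the 10-minute threshold and the lag values in the sample are assumptions made up for the example (the group names are the ones used in this post).

```python
import re

# Matches HH:MM:SS values such as 00:00:02 (hours may exceed two digits).
HMS = re.compile(r"^\d+:\d{2}:\d{2}$")

def hms_to_seconds(value):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def parse_info_all(text):
    """Return one dict per EXTRACT/REPLICAT row of `info all` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Process rows have five fields: program, status, group, lag, time since chkpt.
        if len(fields) != 5 or fields[0] not in ("EXTRACT", "REPLICAT"):
            continue
        if not (HMS.match(fields[3]) and HMS.match(fields[4])):
            continue
        rows.append({
            "program": fields[0],
            "status": fields[1],
            "group": fields[2],
            "lag_seconds": hms_to_seconds(fields[3]),
            "since_chkpt_seconds": hms_to_seconds(fields[4]),
        })
    return rows

def hung_candidates(rows, threshold_seconds=600):
    """Groups that report RUNNING but exceed the threshold in either time column."""
    return [
        row["group"] for row in rows
        if row["status"] == "RUNNING"
        and max(row["lag_seconds"], row["since_chkpt_seconds"]) > threshold_seconds
    ]

# Hypothetical snapshot: the lag values are invented for illustration.
sample = """\
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXMPAEP1 00:00:03 00:00:00
REPLICAT RUNNING RPMPAEP1 02:15:40 00:00:05
REPLICAT RUNNING RPMPAEP2 00:00:00 01:30:00
"""
print(hung_candidates(parse_info_all(sample)))  # prints ['RPMPAEP1', 'RPMPAEP2']
```

The threshold is a judgment call; pick a value that matches how much lag your team tolerates before someone has to look at it.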


We need to stop and start the processes again.
Troubleshooting begins here:
1. On the ggsci command line, run view report process_name and keep pressing Enter to page
through the report. The error should be at the very end.
GGSCI (dgea01.cccis.com) 6> view report process_name

READ THE REPORT FILE WHERE IT SAYS ABENDED, WHICH IS MOSTLY AT THE BOTTOM
OF THE FILE, AND LOOK AT ANY ERROR OR WARNING THAT IT DISPLAYS.

IF YOU FIND NOTHING, THAT IS, THERE IS NO ERROR AND THE GG PROCESS IS RUNNING
BUT THE LAG IS STILL INCREASING, CONTINUE WITH THE STEPS BELOW.
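Paging through a long report by hand is tedious, so here is a minimal sketch of the same check done on the report text. It assumes you have the report contents as a string (for example by reading the process report file yourself). The function name, the keyword list and the sample lines are hypothetical; OGG-00000 is a placeholder, not a real message code.

```python
def last_problem_lines(report_text, keywords=("ERROR", "WARNING", "ABEND"), limit=5):
    """Return up to `limit` of the most recent lines that mention a keyword."""
    matches = [
        line.rstrip()
        for line in report_text.splitlines()
        # Case-insensitive match; "ABEND" also catches "ABENDED".
        if any(keyword in line.upper() for keyword in keywords)
    ]
    # The newest entries sit at the bottom of the report, so keep the tail.
    return matches[-limit:]

# Hypothetical report excerpt, invented for illustration.
sample_report = """\
2018-04-20 09:00:00  INFO    OGG-00000  hypothetical startup message
2018-04-20 09:15:02  WARNING OGG-00000  hypothetical warning text
2018-04-20 09:20:10  INFO    OGG-00000  hypothetical progress message
2018-04-20 09:30:45  ERROR   OGG-00000  hypothetical error text
"""
for line in last_problem_lines(sample_report):
    print(line)
```

If this returns an empty list while the lag keeps growing, you are in the "no error but hung" situation that the rest of this post deals with.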
2. Stop and start the extract/replicat process. In my case my replicats were not catching up with
the lag, and I eventually had to kill the processes.

GGSCI (node1) 2> stop RPMPAEP1
GGSCI (node1) 2> stop RPMPAEP2

3. IF THIS COMMAND DOES NOT WORK, THE NEXT STEP IS TO FORCE STOP THE PROCESSES.
GGSCI (node1) 2> send replicat name, forcestop

In my case that did not work either.

4. SO I HAD TO KILL THE PROCESSES AND START THEM AGAIN.
GGSCI (node1) 2> kill RPMPAEP1
GGSCI (node1) 2> kill RPMPAEP2
Once the processes have been killed, start them again.
NOTE: killing the processes will cause a recovery when they are started again.
GGSCI (node1) 2> start RPMPAEP1
GGSCI (node1) 2> start RPMPAEP2
5. Once the processes change status from STARTING to RUNNING, give them 10-15 minutes to
connect gracefully and initialize before they begin catching up. The lag will then shift
over from Time Since Chkpt to Lag at Chkpt.


6. At this point you will see the time under Lag at Chkpt start decreasing.
7. Eventually the lag will catch up. At that point you can tell the team
that the lag has caught up and we are all set.
GGSCI (node1) 7> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXCFAEP1    00:00:02      00:00:07
EXTRACT     RUNNING     EXCFAEP2    00:00:05      00:00:00
EXTRACT     RUNNING     EXMPAEP1    00:00:03      00:00:00
REPLICAT    RUNNING     RPCFAEP1    00:00:00      00:00:06
REPLICAT    RUNNING     RPMPAEP1    00:00:02      00:00:01
REPLICAT    RUNNING     RPMPAEP2    00:00:00      00:00:00
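To confirm that the lag is really coming down after the restart, you can compare two info all snapshots taken a few minutes apart. This is a rough sketch under assumptions: the function names are made up, and the "earlier" lag values are invented for illustration (the "later" values match the output above).

```python
def lag_by_group(info_all_text):
    """Map each EXTRACT/REPLICAT group to its 'Lag at Chkpt' in seconds."""
    lags = {}
    for line in info_all_text.splitlines():
        fields = line.split()
        # Process rows have five fields; the header and MANAGER rows do not.
        if len(fields) == 5 and fields[0] in ("EXTRACT", "REPLICAT"):
            hours, minutes, seconds = (int(part) for part in fields[3].split(":"))
            lags[fields[2]] = hours * 3600 + minutes * 60 + seconds
    return lags

def lag_trend(earlier_text, later_text):
    """Classify each group present in both snapshots by how its lag moved."""
    earlier, later = lag_by_group(earlier_text), lag_by_group(later_text)
    trend = {}
    for group in sorted(earlier.keys() & later.keys()):
        if later[group] < earlier[group]:
            trend[group] = "decreasing"
        elif later[group] > earlier[group]:
            trend[group] = "increasing"
        else:
            trend[group] = "steady"
    return trend

# Hypothetical earlier snapshot (values invented) and a later one.
earlier = """\
REPLICAT RUNNING RPMPAEP1 01:10:00 00:00:04
REPLICAT RUNNING RPMPAEP2 00:45:00 00:00:02
"""
later = """\
REPLICAT RUNNING RPMPAEP1 00:00:02 00:00:01
REPLICAT RUNNING RPMPAEP2 00:00:00 00:00:00
"""
print(lag_trend(earlier, later))
# prints {'RPMPAEP1': 'decreasing', 'RPMPAEP2': 'decreasing'}
```

A group that stays "increasing" across several snapshots after the 10-15 minute warm-up is a sign that the restart did not help and the report file needs another look.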
