PROBLEM:
We hit an error when trying to start Cluster Ready Services (CRS) on the database cluster.
Application connections were failing with TNS errors because the listener could not identify the requested service:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The cluster check, run as root, showed errors on all nodes, and trying to start CRS returned an error as well.
[test0b.test.com: trace]# crsctl check cluster -all
**************************************************************
test0a:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
test0b:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
test0c:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
**************************************************************
test0d:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
test0e:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
test0f:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[test0a.test.com: bin]# ./crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'test0a'
CRS-2674: Start of 'ora.crsd' on 'test0a' failed
CRS-2679: Attempting to clean 'ora.crsd' on 'test0a'
CRS-2681: Clean of 'ora.crsd' on 'test0a' succeeded
CRS-4000: Command Start failed, or completed with errors.
crsctl status res -t -init --> was showing the Event Manager resource in an INTERMEDIATE state:
ora.evmd 1 ONLINE INTERMEDIATE test0c STABLE
grid@test0c.test.com:$ crsctl query crs activeversion;
CRS-6750: unable to get the active version
CRS-6752: Active version query failed.
grid@test0c.test.com:$ crsctl query crs activeversion; --> This command was working fine on the other nodes
grid@test0c.test.com:$ ocrcheck; --> This command was working fine on all nodes except node "C"
The issue started on node "C". When we tried to start CRS on this node, errors were written to the CRS alert log (/u01/app/grid/diag/crs/test0c/crs/trace/alert.log).
Note: In 11g, the cluster logs are in:
/u01/app/11.2.0.4/grid/log/test08/alerttest8.log
cd /u01/app/11.2.0.4/grid/log/test08/client
crsctl_root.log
crswrapexece.log
emcrsp.log
crsctl_orarom.log
olsnodes.log
In /u01/app/grid/diag/crs/test0c/crs/trace/alert.log:
2019-01-04 01:00:43.320 [ORAROOTAGENT(6205)]CRS-5019: All OCR locations are on ASM disk groups [OCR], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/test0c/crs/trace/ohasd_orarootagent_root.trc".
2019-01-04 01:03:45.723 [OCRCHECK(67613)]CRS-1013: The OCR location in an ASM disk group is inaccessible.
2019-01-04 01:03:45.723 [OCRCHECK(67613)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/test0c/crs/trace/ocrcheck_67613.trc.
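To see quickly whether a node's alert log contains the OCR-related messages above, a small filter helps. This is a sketch only: ocr_error_counts is a hypothetical helper name, and it looks for just the two message codes quoted in this post (CRS-5019 and CRS-1013).

```shell
# Hypothetical helper: count occurrences of the OCR-related message codes
# seen in this incident (CRS-5019, CRS-1013) in the alert log given as $1.
# Prints "CODE COUNT" per code found; prints nothing if neither appears.
ocr_error_counts() {
  grep -oE 'CRS-(5019|1013)' "$1" | sort | uniq -c | awk '{ print $2 " " $1 }'
}
```

For example: ocr_error_counts /u01/app/grid/diag/crs/test0c/crs/trace/alert.log. Running it against each node's alert log is one way to confirm which nodes could not reach the OCR disk group.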
SOLUTION:
There were errors on multiple nodes, but the general fix was to clear some files that are used to identify process state, and then restart.
The files cleared were in the following directory: $GRID_HOME/crs/init
grid@test0a.test.com:$ pwd
/u01/app/12.2.0.1/grid/crs/init
grid@test0a.test.com:$ ls -lt
total 72
-rw-r--r-- 1 root root 0 Jan 4 05:36 test0a
-rw-r--r-- 1 root root 6 Jan 4 05:36 test0a.pid
-rw-r--r-- 1 root root 6939 Sep 8 16:44 afd
-rw-r--r-- 1 root root 7193 Sep 8 16:44 afd.sles
-rw-r--r-- 1 root root 11878 Sep 8 16:44 init.ohasd
-rw-r--r-- 1 root root 12199 Sep 8 16:44 init.ohasd.sles
-rw-r--r-- 1 root root 7394 Sep 8 16:44 ohasd
-rw-r--r-- 1 root root 7715 Sep 8 16:44 ohasd.sles
-rw-r--r-- 1 root root 4347 Sep 8 16:44 oka
-rw-r--r-- 1 root root 564 Sep 8 16:44 olfs
After the files were removed, we were able to restart the cluster services without error. This had to be done on all nodes for a clean start.
On test0c we additionally cleared other files.
We cleared all the files under /var/tmp/.oracle/*, /tmp/.oracle/*, /usr/tmp/.oracle/*, /u01/app/12.2.0.1/grid/ctss/init, /u01/app/grid/crsdata/test0c/output
Finally, once the extra files were out of the way and the command below was run to start the resource, everything came back clean.
crsctl start res ora.crsd -init
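The clean-up of the socket directories can be sketched as a small function. This is an illustration under assumptions, not the exact commands we ran: stash_dir_contents is a hypothetical helper, it moves files aside instead of deleting them so they can be restored, and it assumes it is run as root with the clusterware stack stopped on that node.

```shell
# Hypothetical helper: move everything directly under directory $1
# (including dot-files and socket files) into backup directory $2.
# A missing source directory is treated as "nothing to do".
stash_dir_contents() {
  src=$1
  dest=$2
  [ -d "$src" ] || return 0
  mkdir -p "$dest" || return 1
  # -mindepth/-maxdepth limit the move to the top-level entries of $src
  find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dest"/ \;
}

# Example use (the /root/crs_stash location is an assumption):
# for d in /var/tmp/.oracle /tmp/.oracle /usr/tmp/.oracle; do
#   stash_dir_contents "$d" "/root/crs_stash$d"
# done
```

Moving rather than deleting is a design choice: if the restart still fails, the original files are available for comparison or for a support request.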