Tuesday, January 26, 2016

CRS-4535 Cannot communicate with Cluster Ready Services

In a cluster environment, checking the status of CRS (Cluster Ready Services) with crsctl may return the error CRS-4535: Cannot communicate with Cluster Ready Services, as shown below.
[root@rac1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
This is usually caused by one of two problems, so check the following:
1. Check whether the nodes can ping each other on each of their IPs (public, private, and virtual).
2. Check whether the Grid owner has permission on the ASM disks on the node where the error occurred.
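The first check can be scripted. The following is a minimal sketch, assuming a Linux host with the iputils `ping` (where `-W` is a timeout in seconds). The function name `check_reachable` is made up for this illustration and is not part of any Oracle tool.

```shell
# check_reachable: ping each host or IP given as an argument once and
# report OK or FAIL for it. Returns 1 if any host did not answer.
# (Hypothetical helper, not part of the Grid Infrastructure tooling.)
check_reachable() {
    failed=0
    for host in "$@"; do
        # -c 1: send a single probe; -W 2: wait at most 2 seconds (iputils)
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Call it on each node with the public, private, and virtual names or addresses of every node, for example `check_reachable rac1 rac1-priv rac1-vip`. The `-priv` and `-vip` names are placeholders; use whatever your /etc/hosts or DNS defines.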
In my case, the Grid owner was the user oracle. Connectivity between the nodes was fine: they could ping each other on their public, private, and virtual IPs.
So the issue lay with the Grid owner's (oracle) permissions on the ASM disks.
This is what I found when I checked the ASM disks. They were owned by root, and oracle had no permissions on them. Note also that ocssd.bin does not appear in the process list.
[root@rac1 bin]# cd /dev/oracleasm/disks
[root@rac1 disks]# ls -lrt
total 0
brw------- 1 root root 8, 17 May  6 10:16 DISK1
brw------- 1 root root 8, 33 May  6 10:16 DISK2
brw------- 1 root root 8, 49 May  6 10:16 DISK3
brw------- 1 root root 8, 65 May  6 10:16 DISK4
brw------- 1 root root 8, 81 May  6 10:16 DISK5
 
[root@rac1 bin]# ps -ef  | grep css
root      3784     1  0 10:17 ?        00:00:01 /u01/app/grid/11.2.0/bin/cssdmonitor
root      3801     1  0 10:17 ?        00:00:01 /u01/app/grid/11.2.0/bin/cssdagent
root      4189  4107  0 10:26 pts/1    00:00:00 grep css
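The ownership check above can also be scripted so it is easy to repeat on every node. This is a sketch only, assuming GNU coreutils `stat` (for `-c %U`). The function name `check_disk_owner` is hypothetical.

```shell
# check_disk_owner DIR OWNER: list every entry in DIR that is not owned
# by OWNER. Returns 1 if at least one entry has a different owner.
# (Hypothetical helper; relies on GNU stat.)
check_disk_owner() {
    dir=$1
    expected=$2
    bad=0
    for disk in "$dir"/*; do
        [ -e "$disk" ] || continue          # skip when the glob matched nothing
        owner=$(stat -c %U "$disk")         # %U prints the owning user name
        if [ "$owner" != "$expected" ]; then
            echo "$disk is owned by $owner, expected $expected"
            bad=1
        fi
    done
    return "$bad"
}
```

On the node from this post the call would be `check_disk_owner /dev/oracleasm/disks oracle`, which would report all five disks as owned by root.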
Now change the owner of these disks to oracle, as shown below, and give the oracle user read/write permission on them.
[root@rac1 disks]# chown -R oracle:dba /dev/oracleasm/disks
[root@rac1 disks]# chmod -R 777 /dev/oracleasm/disks
[root@rac1 disks]# ls -lrt
total 0
brwxrwxrwx 1 oracle dba 8, 17 May  6 10:16 DISK1
brwxrwxrwx 1 oracle dba 8, 33 May  6 10:16 DISK2
brwxrwxrwx 1 oracle dba 8, 49 May  6 10:16 DISK3
brwxrwxrwx 1 oracle dba 8, 65 May  6 10:16 DISK4
brwxrwxrwx 1 oracle dba 8, 81 May  6 10:16 DISK5
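Mode 777 worked here, but it lets every account on the host write to the ASM disks. A narrower alternative, sketched below on the assumption that only the Grid owner and its group need read/write access, is mode 660. The function name `restrict_disk_mode` is hypothetical, and the sketch assumes GNU coreutils `stat`.

```shell
# restrict_disk_mode DIR [MODE]: apply MODE (default 660) to every entry
# in DIR and print the resulting mode, owner and group for each one.
# (Hypothetical helper; run it as root against the ASM disk directory.)
restrict_disk_mode() {
    dir=$1
    mode=${2:-660}
    for disk in "$dir"/*; do
        [ -e "$disk" ] || continue          # skip when the glob matched nothing
        chmod "$mode" "$disk" || return 1
        # %a = octal mode, %U = owner, %G = group
        echo "$(stat -c '%a %U:%G' "$disk") $disk"
    done
}
```

For the disks in this post the call would be `restrict_disk_mode /dev/oracleasm/disks`, run after the chown.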
Once the permissions are set, start the cluster services as the root user.
Change to your $GRID_HOME/bin directory (in my case, $GRID_HOME was /u01/app/oracle/product/11.2.0/grid) and start the cluster services with the crsctl utility, as shown below.
[root@rac1 bin]# ./crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
Check the CSS processes again. ocssd.bin should now be running as oracle:
[root@rac1 bin]# ps -ef | grep css
root      3784     1  0 10:17 ?        00:00:01 /u01/app/grid/11.2.0/bin/cssdmonitor
root      4372     1  0 10:30 ?        00:00:01 /u01/app/grid/11.2.0/bin/cssdagent
oracle    4387     1  1 10:30 ?        00:00:02 /u01/app/grid/11.2.0/bin/ocssd.bin
root      5347  4107  0 10:33 pts/1    00:00:00 grep css
Now check whether CRS (Cluster Ready Services) is online:
[root@rac1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac1 bin]#
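If you want to run this final check from a script, the output can be tested instead of read by eye. The sketch below assumes the output format shown in this post, where every healthy component line ends in "is online". The function name `crs_all_online` is hypothetical.

```shell
# crs_all_online: read `crsctl check crs` output on stdin. Succeeds only
# if at least one line reports "is online" and no CRS- line reports
# anything else. Lines that are not online are printed.
# (Hypothetical helper; the line format is taken from the listings above.)
crs_all_online() {
    output=$(cat)
    # CRS- lines that do not end in "is online" are treated as problems
    problems=$(printf '%s\n' "$output" | grep '^CRS-' | grep -v 'is online$' || true)
    # count the healthy lines so that empty input is not treated as success
    online=$(printf '%s\n' "$output" | grep -c 'is online$' || true)
    if [ -n "$problems" ] || [ "$online" -eq 0 ]; then
        printf '%s\n' "$problems"
        return 1
    fi
}
```

From $GRID_HOME/bin, a call such as `./crsctl check crs | crs_all_online && echo "CRS stack is up"` would print the message only when all four components report online.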
Here we go!