Monday, January 18, 2016

Troubleshooting "One of the RAC Nodes Is Down"

This document is still under construction.

Note 1: These notes were taken while troubleshooting a node-down issue with Hadi.

Status of all cluster resources
crsctl status res -t

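Log in as root:
cd $GRID_HOME/bin
./crsctl stat res -t -init    # status of the lower-stack resources managed by OHASD
./crsctl start cluster        # try to start the Clusterware stack on this node
./crsctl stop crs -f          # if that does not help, force-stop the whole stack...
./crsctl start crs            # ...and start it again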
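The tabular output of crsctl stat res -t (with or without -init) gets long quickly. The filter below is a minimal sketch, assuming the 11gR2 layout (a resource name on its own line followed by indented instance lines "[n] TARGET STATE [SERVER]"). It prints only the resources that should be ONLINE but are not. The function name offline_resources is hypothetical.

```sh
#!/bin/sh
# Sketch: list resources whose TARGET is ONLINE but whose STATE is not.
# Assumes the 11gR2 tabular layout of "crsctl stat res -t" and "-t -init":
# a resource name on its own non-indented line, followed by indented
# instance lines "[n] TARGET STATE [SERVER] [STATE_DETAILS]".
offline_resources() {
    awk '
        /^[^ \t]/ { name = $1; next }   # headers, section titles, resource names
        NF >= 2 {
            # Cluster resources carry an instance number, local ones do not.
            if ($1 ~ /^[0-9]+$/) { target = $2; state = $3; server = $4 }
            else                 { target = $1; state = $2; server = $3 }
            if (target == "ONLINE" && state != "ONLINE") {
                line = name " " state
                if (server != "") line = line " " server
                print line
            }
        }
    '
}

# Usage (as root): ./crsctl stat res -t -init | offline_resources
```

Resources with TARGET OFFLINE (for example ora.gsd, which is normally offline) are deliberately skipped.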

CRS log files
Check the CRS-related log files for errors that point to the root cause. "nodename" is the short host name of the failing node; the paths below are the 11gR2 layout.
$GRID_HOME/log/nodename/cssd/ocssd.log
$GRID_HOME/log/nodename/diskmon/diskmon.log
$GRID_HOME/log/nodename/crsd/crsd.log
$GRID_HOME/log/nodename/alertnodename.log    (e.g. alertstgrac1.log on node stgrac1)
$GRID_HOME/log/nodename/agent/ohasd/orarootagent_root/orarootagent_root.log
$GRID_HOME/log/nodename/ohasd/ohasd.log
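Instead of opening each log by hand, a loop like the following can pull the recent error-looking lines out of all of them. This is a minimal sketch, assuming the 11gR2 paths listed above and a rough keyword filter (error, fail, ORA-nnnnn) that is my own choice and will probably need tuning. The function name scan_crs_logs is hypothetical.

```sh
#!/bin/sh
# Sketch: print recent error-looking lines from the 11gR2 CRS logs of one node.
# Assumptions: the 11gR2 layout under $GRID_HOME/log/<nodename>, and a rough
# keyword filter (error / fail / ORA-nnnnn) that you will probably want to tune.
scan_crs_logs() {
    grid_home=$1                  # e.g. "$GRID_HOME"
    node=$2                       # short host name of the node that is down
    lines=${3:-2000}              # how many trailing lines of each log to scan
    base="$grid_home/log/$node"
    for f in \
        "$base/cssd/ocssd.log" \
        "$base/diskmon/diskmon.log" \
        "$base/crsd/crsd.log" \
        "$base/alert$node.log" \
        "$base/agent/ohasd/orarootagent_root/orarootagent_root.log" \
        "$base/ohasd/ohasd.log"
    do
        if [ ! -r "$f" ]; then
            echo "== $f: not found or not readable"
            continue
        fi
        echo "== $f"
        # grep exits 1 when nothing matches; that is not an error here.
        tail -n "$lines" "$f" | grep -iE 'error|fail|ORA-[0-9]+' || true
    done
}

# Usage (as root or grid): scan_crs_logs "$GRID_HOME" "$(hostname -s)"
```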


OCR integrity (run as root so the logical corruption check is included)
ocrcheck
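When checking several nodes it helps to reduce ocrcheck to a pass/fail answer. A minimal sketch, assuming a healthy registry prints the line "Cluster registry integrity check succeeded"; the helper name ocr_ok is hypothetical.

```sh
#!/bin/sh
# Sketch: succeed only if ocrcheck output (on stdin) reports a healthy registry.
# Assumption: a healthy OCR prints "Cluster registry integrity check succeeded"
# (run ocrcheck as root so the logical corruption check is included as well).
ocr_ok() {
    grep -q 'Cluster registry integrity check succeeded'
}

# Usage (as root): ocrcheck | ocr_ok && echo "OCR OK" || echo "OCR check failed, read the full ocrcheck output"
```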

Nodes in the cluster (node number, private interconnect name, VIP)
olsnodes -n -p -i

Voting disk information
crsctl query css votedisk
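To see at a glance how many voting disks are actually ONLINE, the output can be summarised. This is a minimal sketch, assuming the 11gR2 layout of one row per disk (" 1. ONLINE <id> (<path>) [<disk group>]") and a trailer "Located N voting disk(s)."; the function name votedisk_summary is hypothetical.

```sh
#!/bin/sh
# Sketch: summarise "crsctl query css votedisk" output read from stdin.
# Assumes the 11gR2 layout: one row per voting disk such as
#   " 1. ONLINE   <file universal id> (<path>) [<disk group>]"
# followed by a trailer "Located N voting disk(s)."
votedisk_summary() {
    awk '
        /^ *[0-9]+\. / { total++; if ($2 == "ONLINE") online++ }
        /^Located /    { located = $2 }
        END {
            if (located == "") located = "no"
            printf "%d of %d voting disk(s) ONLINE, crsctl located %s\n",
                   online, total, located
        }
    '
}

# Usage: crsctl query css votedisk | votedisk_summary
```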

Cluster health (CRS, CSS and EVM on the local node)
crsctl check cluster
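crsctl check cluster also accepts -all to check every node. The filter below is a minimal sketch that prints only the daemons not reported as online, prefixed with the node name when the -all output contains "nodename:" header lines. The assumption is that healthy daemons appear as "CRS-nnnn: ... is online"; the function name cluster_problems is hypothetical.

```sh
#!/bin/sh
# Sketch: show only the CRS-nnnn lines that do not end in "is online".
# Assumptions: healthy daemons print "CRS-nnnn: <daemon> is online", and
# "crsctl check cluster -all" precedes each node's block with "<node>:".
cluster_problems() {
    awk '
        /^[A-Za-z0-9._-]+:$/ { node = substr($0, 1, length($0) - 1); next }
        /^CRS-[0-9]+:/ && $0 !~ /is online *$/ {
            prefix = (node == "" ? "" : node ": ")
            print prefix $0
        }
    '
}

# Usage: crsctl check cluster -all | cluster_problems
```

An empty result means every daemon that was reported is online.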

OS logs
/var/log/messages

Check the disk group status from the other (working) nodes.
Log in as grid:
. oraenv
Enter +ASM1, +ASM2 or +ASM3 when prompted for ORACLE_SID, depending on which working node you are on.
sqlplus / as sysasm
select NAME, STATE from V$ASM_DISKGROUP;
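The same query can be run non-interactively and reduced to the disk groups that are not mounted. A minimal sketch, assuming ". oraenv" has already set ORACLE_SID to the local ASM instance and that "/ as sysasm" is allowed for the grid user; the function names query_diskgroups and unmounted_diskgroups are hypothetical.

```sh
#!/bin/sh
# Sketch: list disk groups that are not MOUNTED on this ASM instance.
# Assumptions: run as grid after ". oraenv" (ORACLE_SID=+ASM1, +ASM2, ...),
# and OS authentication "/ as sysasm" is permitted.
query_diskgroups() {
    # Quoted here-document so the shell does not expand $asm_diskgroup.
    sqlplus -S / as sysasm <<'EOF'
set heading off feedback off pagesize 0
select name, state from v$asm_diskgroup;
exit
EOF
}

# Reads "NAME STATE" pairs on stdin and prints the names that are not MOUNTED.
unmounted_diskgroups() {
    awk 'NF >= 2 && $2 != "MOUNTED" { print $1 }'
}

# Usage: query_diskgroups | unmounted_diskgroups
```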

To mount a disk group (here OCRVOTE) from asmcmd:
asmcmd mount OCRVOTE

To check the voting disk status again after mounting:
crsctl query css votedisk


Restart ASMLib (as root)
/usr/sbin/oracleasm exit
oracleasm init
oracleasm scandisks
oracleasm listdisks
oracleasm status
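The ASMLib restart steps above can be chained so that they stop at the first failing command and finish with a yes/no answer from oracleasm status. A minimal sketch, assuming a healthy status answers "yes" to every "Checking if ..." line; the function names asmlib_ready and restart_asmlib are hypothetical.

```sh
#!/bin/sh
# Sketch: run the ASMLib restart steps in order, stopping at the first
# failure, then check "oracleasm status". Run as root.
# Assumption: a healthy status answers "yes" to every "Checking if ..." line.
asmlib_ready() {
    awk '
        /^Checking if/ { n++; if ($NF != "yes") bad++ }
        END { if (n > 0 && bad == 0) exit 0; exit 1 }
    '
}

restart_asmlib() {
    /usr/sbin/oracleasm exit &&
    /usr/sbin/oracleasm init &&
    /usr/sbin/oracleasm scandisks &&
    /usr/sbin/oracleasm listdisks &&
    /usr/sbin/oracleasm status | asmlib_ready
}

# Usage: restart_asmlib && echo "ASMLib is up" || echo "ASMLib restart failed"
```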



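Verifying that the oracleasm disks are available
Assuming the oracleasm disks are in /dev/oracleasm/disks, run the following on all nodes:
ls -ld /dev/oracleasm/disks/    # ownership and permissions of the directory itself
ls -l /dev/oracleasm/disks/     # the disks inside it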
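To compare nodes quickly, the expected disk labels (for example the oracleasm listdisks output from a healthy node) can be checked against the directory on each node. A minimal sketch, assuming the /dev/oracleasm/disks location used above; the function name missing_asm_disks and the labels OCRVOTE1 and DATA1 are hypothetical.

```sh
#!/bin/sh
# Sketch: print every expected ASMLib disk label that is missing from the
# disks directory. Assumptions: disks live in /dev/oracleasm/disks and the
# expected labels are passed as arguments.
missing_asm_disks() {
    dir=$1
    shift
    for label in "$@"; do
        [ -e "$dir/$label" ] || echo "$label"
    done
}

# Usage (hypothetical labels): missing_asm_disks /dev/oracleasm/disks OCRVOTE1 DATA1
```

An empty result means every expected label is present on this node.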

Oracle Documents that can help
Doc ID 1050164.1 → Linux: 11gR2 GI Doesn't Startup After Node Reboot Due To Incorrect ASMLIB Setting
Doc ID 1050908.1 → How to Troubleshoot Grid Infrastructure Startup Issues
Doc ID 1054902.1 → How to Validate Network and Name Resolution Setup for the Clusterware and RAC
Doc ID 1068835.1 → What to Do if 11gR2 Clusterware is Unhealthy


./cluvfy stage -post crsinst -n rac1,rac2 -verbose    → replace rac1 and rac2 with the appropriate node names when running the command.
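Rather than typing the node names, the -n list can be built from olsnodes. A minimal sketch, assuming plain olsnodes prints one configured node name per line and both tools are run from $GRID_HOME/bin; the helper name node_list is hypothetical.

```sh
#!/bin/sh
# Sketch: join node names read from stdin into the comma-separated list
# that "cluvfy ... -n" expects. Assumes one node name per input line.
node_list() {
    paste -s -d, -
}

# Usage (as grid):
#   cd $GRID_HOME/bin
#   ./cluvfy stage -post crsinst -n "$(./olsnodes | node_list)" -verbose
```

Because olsnodes lists all configured nodes, the down node is included in the check as well.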


Where is the Voting disk in ASM?

$ kfed read /dev/mapper/vg00-lvasm
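kfed prints the ASM disk header, and on 11gR2 the fields kfdhdb.vfstart and kfdhdb.vfend are non-zero only on a disk that holds a voting file. The check below is a minimal sketch built on that assumption; the function name has_voting_file is hypothetical.

```sh
#!/bin/sh
# Sketch: succeed if the kfed header dump on stdin shows a voting file,
# i.e. both kfdhdb.vfstart and kfdhdb.vfend are non-zero.
# Header lines look like "kfdhdb.vfstart:   352 ; 0x0ec: 0x00000160".
has_voting_file() {
    awk -F'[:;]' '
        /^kfdhdb\.vf(start|end):/ { if ($2 + 0 != 0) nz++ }
        END { if (nz == 2) exit 0; exit 1 }
    '
}

# Usage: kfed read /dev/mapper/vg00-lvasm | has_voting_file && echo "voting file on this disk"
```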
