You know what? - DIAGWAIT

What does diagwait mean in a RAC environment?

Oracle Clusterware evicts a node from the cluster when:

1. The node is not pinging via the network heartbeat
2. The node is not pinging the voting disk
3. The node is hung or busy and is unable to perform either of the above tasks

In most cases, when a node is evicted, information is written to the logs that can be used to analyze the cause of the eviction. In some cases, however, this information is missing. The steps documented in this note are for those cases where there is too little or no information to diagnose the cause of the eviction.

CAUSE
When a node is evicted while it is extremely busy in terms of CPU (or starved of it), the OS may not get time to flush the logs and traces to the file system. Setting the diagwait attribute delays the node reboot, giving the OS additional time to write the traces. This setting provides more time for diagnostic data to be collected safely and does NOT increase the probability of corruption.

After diagwait is set, Clusterware waits an additional 10 seconds (diagwait minus reboottime) before rebooting the node. Once the OS scheduling issues are fixed, customers can unset diagwait by following the steps documented below.
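The 10-second figure is plain subtraction. As a worked check, assuming diagwait is 13 seconds (the value used in step 3) and assuming reboottime is 3 seconds, which we take to be the Clusterware default but treat here as an assumption rather than a verified setting on your cluster:

```latex
t_{\text{extra}} = \text{diagwait} - \text{reboottime} = 13\,\text{s} - 3\,\text{s} = 10\,\text{s}
```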

It is important that the Clusterware stack be down on all nodes when changing diagwait. The following steps provide step-by-step instructions for setting diagwait.

1. Execute as root on every node:
#crsctl stop crs
#/bin/oprocd stop

2. Ensure that the Clusterware stack is down by running:
#ps -ef | egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no Clusterware processes (only the egrep command itself may show up).

3. From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the command as root:
#crsctl set css diagwait 13 -force

4. Check whether diagwait was set successfully by executing the following command. It should return 13. If diagwait is not set, the following message is returned: "Configuration parameter diagwait is not defined"
#crsctl get css diagwait

5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs

6. Validate that Clusterware is running on the node by executing:
#crsctl check crs
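The six steps above can be sketched as a small POSIX shell script. This is a hedged illustration, not an Oracle-supplied tool: the `stop`/`set`/`start` actions and the `CRSCTL` and `OPROCD` variables are hypothetical names introduced here, and only the `crsctl` commands and the `ps`/`egrep` check come from the steps themselves. `OPROCD` must be set to the full path of the oprocd binary used in step 1.

```shell
#!/bin/sh
# Hedged sketch of the diagwait procedure (steps 1-6); not an Oracle tool.
# Assumptions (hypothetical, not from the original post):
#   CRSCTL - crsctl command to run; defaults to "crsctl" found on PATH.
#   OPROCD - full path of the oprocd binary from step 1 (must be set for "stop").
# Run as root. "stop" and "start" on every node; "set" on ONE node only.
set -u
CRSCTL=${CRSCTL:-crsctl}

# Step 2 helper: read `ps -ef` output on stdin and print any Clusterware daemon
# lines, dropping the grep process that matches its own pattern.
# (grep -E is the POSIX spelling of egrep.)
crs_daemons() {
  grep -E 'crsd\.bin|ocssd\.bin|evmd\.bin|oprocd' | grep -v grep
}

# Step 4 helper: read `crsctl get css diagwait` output on stdin and succeed
# only if it equals the expected value (all whitespace ignored).
diagwait_is() {
  expected=$1
  value=$(tr -d '[:space:]')
  [ "$value" = "$expected" ]
}

main() {
  case "$1" in
    stop)   # Steps 1-2
      : "${OPROCD:?set OPROCD to the full path of the oprocd binary}"
      "$CRSCTL" stop crs
      "$OPROCD" stop
      if [ -n "$(ps -ef | crs_daemons)" ]; then
        echo "Clusterware daemons still running; do not change diagwait yet" >&2
        return 1
      fi
      ;;
    set)    # Steps 3-4
      "$CRSCTL" set css diagwait 13 -force
      if ! "$CRSCTL" get css diagwait | diagwait_is 13; then
        echo "diagwait does not report 13" >&2
        return 1
      fi
      ;;
    start)  # Steps 5-6
      "$CRSCTL" start crs
      "$CRSCTL" check crs
      ;;
    *)
      echo "usage: $0 stop|set|start" >&2
      return 2
      ;;
  esac
}

# Only act when an action is given, so the helpers can be loaded and tested alone.
if [ $# -gt 0 ]; then
  main "$1"
  exit $?
fi
```

A run would look like `sh diagwait_steps.sh stop` on every node, `sh diagwait_steps.sh set` on one node, then `sh diagwait_steps.sh start` on every node (the file name is hypothetical). The helpers read their input from stdin instead of calling `ps` or `crsctl` themselves, so the parsing logic can be checked without a cluster.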

Unsetting/Removing diagwait

Customers should not unset diagwait without first fixing the OS scheduling issues, as that can lead to node evictions via reboot. Diagwait delays node eviction (and reconfiguration) by diagwait (13) seconds, so setting it does not affect most customers. If diagwait needs to be removed, follow the steps above, replacing step 3 with the following command:

#crsctl unset css diagwait
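The replacement for step 3 can be sketched in shell together with a check that reuses the "not defined" message quoted in step 4. This is a hedged sketch: the `CRSCTL` variable and the function names are hypothetical, and it assumes `crsctl get css diagwait` prints that exact message once the parameter is removed (stderr is merged in case the message goes there). Steps 1-2 and 5-6 still apply unchanged.

```shell
#!/bin/sh
# Hedged sketch: remove diagwait (replacement for step 3) and confirm it.
# Run as root on ONE node, with Clusterware already stopped on all nodes.
# Assumption (hypothetical): CRSCTL names the crsctl command; defaults to PATH.
set -u
CRSCTL=${CRSCTL:-crsctl}

# Succeeds when `crsctl get css diagwait` output (stdin) reports the
# parameter as undefined, using the message quoted in step 4.
diagwait_unset_confirmed() {
  grep -q 'Configuration parameter diagwait is not defined'
}

# Unset diagwait, then verify that crsctl no longer reports a value.
unset_diagwait() {
  "$CRSCTL" unset css diagwait || return 1
  "$CRSCTL" get css diagwait 2>&1 | diagwait_unset_confirmed
}

# Only act when explicitly asked, so the functions can be loaded and tested alone.
if [ "${1:-}" = "unset" ]; then
  if unset_diagwait; then
    echo "diagwait removed; restart Clusterware on all nodes (steps 5-6)"
  else
    echo "diagwait still appears to be set" >&2
    exit 1
  fi
fi
```

Keeping the check separate from the `crsctl` call makes it possible to verify the message matching against captured output before touching a real cluster.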

Cheers
