Tuesday, February 4, 2014

Start crs error CRS-4640 11gR2 clusterware, crsctl start cluster

I have run into the following problem on several Exadata and RAC systems.
It shows up after a patch application or a maintenance operation, exactly when I try to start CRS again.
CRS cannot be started with the "crsctl start crs" command, as you can see below:
/u01/app/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Note: crsctl is the command-line interface for controlling Oracle Clusterware objects.

The solution is to use the crsctl start cluster command. That's all, but I am writing this post to explain the reason behind this solution.

So, let's look at the difference between start crs and start cluster:

start crs : starts the entire Oracle Clusterware stack on a node, including the OHASD process, which is responsible for starting up all other clusterware processes. This command can only be used on the local node.
start cluster : starts the Oracle Clusterware stack on the local node (or on other nodes when -n or -all is given). It does not start the OHASD process, so OHASD must already be running.

Okay, now we know the difference, but this does not explain the CRS-4640 error produced when we used the start crs command.

Additional info: If your OCR and Voting Disks are in ASM, you shouldn't shut down the ASM instance alone. You need to stop the Oracle Clusterware stack instead, using crsctl stop cluster -n node_name or crsctl stop crs (on the local node).
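The two stop commands mentioned above can be wrapped in a small helper. This is only a minimal sketch: the function name stop_clusterware_cmd is hypothetical, and it prints the command instead of running it, so you can review it before executing it as root.

```shell
#!/bin/sh
# Hypothetical helper: prints the crsctl command that stops Clusterware
# (and ASM with it) instead of shutting down the ASM instance on its own.
stop_clusterware_cmd() {
    node="${1:-}"
    if [ -n "$node" ]; then
        # Stops the stack on the named node; OHASD keeps running there.
        echo "crsctl stop cluster -n $node"
    else
        # Stops the whole stack on the local node, OHASD included.
        echo "crsctl stop crs"
    fi
}

stop_clusterware_cmd          # local node
stop_clusterware_cmd erm01    # a named node (erm01 is the node name used in this post)
```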

I cannot reproduce it right now, but this may be the reason: the error is produced because we shut down ASM directly and did not use the crsctl stop cluster or crsctl stop crs commands. The Voting Disks and OCR need to be accessible for crsd to operate, because the OCR contains the cluster node list, services, db instances and node mappings. Oracle Clusterware uses this info to verify cluster node membership and status. On the other hand, the crsctl start crs command should start ASM, too. So why are the errors produced?

Look what happens when I use the "crsctl start cluster" command:

 /u01/app/11.2.0.3/grid/bin/crsctl start cluster -all

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'erm01'
CRS-2672: Attempting to start 'ora.cssd' on 'erm01'
CRS-2672: Attempting to start 'ora.diskmon' on 'erm01'
CRS-2672: Attempting to start 'ora.ctssd' on 'erm01'
CRS-2672: Attempting to start 'ora.evmd' on 'erm01'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'erm01'
CRS-2672: Attempting to start 'ora.asm' on 'erm01'
CRS-2672: Attempting to start 'ora.crsd' on 'erm01'

As you can see, the crsctl start cluster command starts ASM and all the other necessary components before crsd.
Note that it doesn't start OHASD, and that is why the CRS-4640 error is not produced. The OHASD process is already running. (check -> ps -ef|grep ohasd.bin|grep -v grep; the init.ohasd wrapper script stays in the process list even when OHASD itself is stopped)
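To check the start order without reading the output by eye, the CRS-2672 lines can be parsed. The following is a minimal sketch, not an Oracle tool: start_order and starts_before are hypothetical helper names, and the sample input is just two of the lines from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads crsctl start output on stdin and prints the
# resource names from the CRS-2672 lines, in the order they were attempted.
start_order() {
    sed -n "s/^CRS-2672: Attempting to start '\([^']*\)' on '.*/\1/p"
}

# Succeeds only if the first resource appears before the second in the input.
starts_before() {
    first="$1"; second="$2"
    start_order | awk -v a="$first" -v b="$second" '
        $0 == a && !pa { pa = NR }
        $0 == b && !pb { pb = NR }
        END { exit !(pa && pb && pa < pb) }'
}

# Two lines taken from the output shown in this post.
sample_log="CRS-2672: Attempting to start 'ora.asm' on 'erm01'
CRS-2672: Attempting to start 'ora.crsd' on 'erm01'"

if printf '%s\n' "$sample_log" | starts_before ora.asm ora.crsd; then
    echo "ora.asm is started before ora.crsd"
fi
```

On a real node you could save the output of crsctl start cluster to a file and feed that file to starts_before in the same way.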

Thanks to Bharat Damarla (http://startupforce.wordpress.com/2013/05/17/rac-11gr2-startup-sequence).
He summarized the clusterware startup sequence in a diagram, which shows that OHASD is the process that starts crsd.
So what happens is this:
"crsctl start crs" tries to start OHASD and cannot do that, as it is already started. In my opinion, that is why it cannot continue and start crs. The root cause seems to be stopping ASM without using the crsctl stop cluster or crsctl stop crs commands, as this leaves an improper environment for the crsctl start crs command.

In conclusion, we use crsctl start cluster in this situation because OHASD is already up and running, which is a prerequisite for the "crsctl start cluster" command, and that is why crsctl start cluster becomes our solution.
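The decision described in this post can be sketched as a small shell function. It is only an illustration under stated assumptions: suggest_start_cmd is a hypothetical name, and it assumes that a running OHASD shows up as an "ohasd.bin" process in the ps output.

```shell
#!/bin/sh
# Hypothetical helper: reads a process listing (ps -ef output) on stdin and
# prints the crsctl start command that matches the reasoning in this post.
# Assumption: a running OHASD shows up as an "ohasd.bin" process.
suggest_start_cmd() {
    if grep -q 'ohasd\.bin'; then
        # OHASD is already up, so "start crs" would hit CRS-4640.
        echo "crsctl start cluster"
    else
        # OHASD is down: start the full stack, OHASD included.
        echo "crsctl start crs"
    fi
}

# Usage on a real node (as root): ps -ef | suggest_start_cmd
```

The function only prints a suggestion; running the suggested command is left to you, since the right choice also depends on what was stopped and how.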

7 comments :

  1. Thanks Man. Your post saved my time - iqbal

  2. Beautiful! Thanks! Saved my time too!

  3. [root@rac2 bin]# ./crsctl start cluster -all
    CRS-4404: The following nodes did not reply within the allotted time:
    rac2
    CRS-4690: Oracle Clusterware is already running on 'rac1'
    CRS-4705: Start of Clusterware failed on node rac2.
    CRS-4000: Command Start failed, or completed with errors.
    [root@rac2 bin]#

  4. Hi Debasis,

    This is very generic. You need to check the logs. Start with ->
    /log//alertnode1.log:
    /log//crsd/crsd.log:

  5. Dear Erman Arslan

    We have a similar issue. Node1 is working fine and Node2 throws an Oracleasm module error. Both nodes are in a cluster with ASM RAC. Our OS is Redhat 7.0.
    We are able to access the db through Node1 but not through Node2. Could you please help us?

    Thanks in advance.
    Regards
    GANESAN

  6. Hi Ganesan,

    I give support through my forum.
    Please use the link "http://erman-arslan-s-oracle-forum.2340467.n4.nabble.com/RAC-f7.html" to create an issue into the RAC category.

  7. Thanks Erman Arslan. Mr.Faraz from our team will use the link with details. Thanks a lot.
