Monday, December 12, 2016

GRID/RAC: a real-life story of a GRID upgrade. How to check the status of a Grid upgrade and how to proceed when OUI is not available?

I was upgrading a GRID home using OUI. It was a direct upgrade (installation and upgrade in a single pass, started by selecting the "Upgrade Oracle GI or Automatic Storage Management" option), and it was an out-of-place upgrade.


The OUI (Oracle Universal Installer) GUI was doing its job perfectly, and after a while it asked for rootUpgrade.sh to be executed on all the RAC nodes.

I was working on my customer's desktop, which used Xming together with PuTTY to display the X screens.
Note that Xming is client software that displays the X screens on the client machine. Unlike vncserver, it does not keep the session alive on the server side: if Xming crashes, the jobs we run through it crash as well.

Anyway, I executed rootUpgrade.sh on node 1 without any problems.
However, while I was executing it on node 2, the customer's desktop crashed: Xming crashed because of a client-side problem, and PuTTY terminated itself.

This catastrophic incident made rootUpgrade.sh stop immediately.
So I was in the middle of nowhere. The rootUpgrade script had been executed on node 2, but its state was ambiguous. Had it completed its job? Should it be executed again? Or should we cancel the upgrade at that point?

These questions could be answered by checking the binaries in use and by executing the following commands on node 2.

The outputs had to convince me that our Grid Infrastructure was upgraded successfully; that is, all of them had to point to the new (upgraded) GRID Oracle Home and to the new GRID version.
  • Check that Oracle ASM is up and running from the upgraded 11.2.0.4 home (use ps -ef).
  • Check the open files and verify that they are stored under 11.2.0.4/grid (use lsof).
  • Reboot the server and check that the cluster services start automatically from the 11.2.0.4 home without any errors (optional).
Analyze the outputs of the following commands:
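Part of the ps -ef check above can be scripted. The following is a minimal sketch, not an Oracle tool: it assumes you have captured the text of `ps -ef` and that the new home is a hypothetical path such as /u01/app/11.2.0.4/grid. It lists Clusterware daemons (processes whose executable name ends in d.bin, for example ohasd.bin or ocssd.bin) that are not running from the new home. ASM background processes such as asm_pmon_+ASM2 do not show their executable path in ps, so they are not covered by this check.

```python
import os
from typing import List


def daemons_outside_home(ps_output: str, new_home: str) -> List[str]:
    """Return executables of *d.bin processes that are NOT running from new_home.

    ps_output is the raw text of `ps -ef`; the CMD column is the 8th field.
    new_home is the (hypothetical) path of the upgraded Grid home.
    """
    prefix = new_home.rstrip("/") + "/"
    offenders = []
    for line in ps_output.splitlines()[1:]:  # skip the ps header line
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]  # first token of the command line
        if not os.path.basename(executable).endswith("d.bin"):
            continue  # only look at Clusterware daemons like ohasd.bin, crsd.bin
        if not executable.startswith(prefix):
            offenders.append(executable)  # daemon still runs from another home
    return offenders
```

An empty list means every d.bin daemon found in the captured output runs from the new home; any path returned points at a daemon still started from the old home.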
  • crsctl stat res -t -init 
  • crsctl stat res -t 
  • ps -ef |grep d.bin 
  • ps -ef |grep -i ohasd 
  • cat /etc/oracle/olr.loc 
  • crsctl query crs activeversion 
  • crsctl query crs releaseversion 
  • crsctl query crs softwareversion 
  • cat inventory.xml
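The three crsctl query commands each print a version in square brackets, and /etc/oracle/olr.loc is a small key=value file. A minimal sketch for evaluating those outputs once captured as text, assuming output lines such as "Oracle Clusterware active version on the cluster is [11.2.0.4.0]" and an olr.loc containing a crs_home= line; the exact wording may differ between releases, and the paths used in the check are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# Matches a bracketed dotted version such as "[11.2.0.4.0]" (not "[node2]").
_VERSION_RE = re.compile(r"\[(\d+(?:\.\d+)+)\]")


def bracketed_version(crsctl_output: str) -> Optional[str]:
    """Extract the last bracketed version from a crsctl query output line."""
    matches = _VERSION_RE.findall(crsctl_output)
    return matches[-1] if matches else None


def parse_olr_loc(text: str) -> Dict[str, str]:
    """Parse key=value lines such as olrconfig_loc=... and crs_home=..."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def upgrade_looks_complete(version_outputs: Iterable[str], olr_loc_text: str,
                           new_home: str, target_version: str) -> bool:
    """True if every crsctl output reports target_version and olr.loc points to new_home."""
    versions_ok = all(bracketed_version(out) == target_version for out in version_outputs)
    crs_home = parse_olr_loc(olr_loc_text).get("crs_home", "")
    return versions_ok and crs_home.rstrip("/") == new_home.rstrip("/")
```

This only interprets text you have already collected; it does not replace reading the crsctl stat res outputs yourself.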
In my case, all checks passed except the one for "inventory.xml".
This meant the rootUpgrade scripts had completed successfully, but the remaining work of OUI had not been done.

In the inventory.xml files on the nodes, the CRS=true flag was still set on the line describing the old Oracle Home.

To fix this, I followed the Oracle Support document "How to Complete Grid Infrastructure Configuration Assistant(Plug-in) if OUI is not Available (Doc ID 1360798.1)".

I then verified that the CRS=true flag had moved to the new home in all inventory.xml files, which convinced me that the GRID upgrade was successful.
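Checking the CRS flag on every node can be sketched as follows. This is a minimal sketch, assuming you have copied the contents of each node's inventory.xml (typically under ContentsXML in the central inventory, whose location is given by inventory_loc in /etc/oraInst.loc) into a dict keyed by node name, and that each Oracle Home appears as a HOME element with LOC and optional CRS attributes. The node names, home names, and paths in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def crs_home_location(inventory_xml: str) -> Optional[str]:
    """Return the LOC of the HOME element flagged CRS="true", or None."""
    root = ET.fromstring(inventory_xml)
    for home in root.iter("HOME"):
        if home.get("CRS", "").lower() == "true":
            return home.get("LOC")
    return None


def nodes_with_wrong_crs_home(inventories: Dict[str, str], new_home: str) -> List[str]:
    """Return node names whose inventory.xml does not flag new_home as the CRS home."""
    expected = new_home.rstrip("/")
    wrong = []
    for node, xml_text in sorted(inventories.items()):
        loc = crs_home_location(xml_text)
        if loc is None or loc.rstrip("/") != expected:
            wrong.append(node)  # CRS flag missing or still on another home
    return wrong
```

In the situation described above, node 2's inventory would show up in the returned list until the CRS=true flag is moved to the new home.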

At that point, I was good to proceed with the RDBMS upgrade, which I completed without any problems.

So, here are the important conclusions:
  • Always use vncserver and a VNC session for important work.
  • If your upgrade is terminated because of a server or client problem, don't panic. Check Oracle Support, run your own checks, analyze the situation, and decide accordingly. If you do your analysis carefully, and if you are lucky, you may be good to proceed.
  • Open a proactive SR (an SR opened before the upgrade) before this kind of important upgrade. A proactive SR opened as Severity 1 may give you extra comfort in case something goes wrong.
