Let's take a look at the Exadata X8M-2 installation process. Actually, it is the standard procedure that we follow for installing all Exadata versions.
This time we have RoCE and PMEM inside the machine (in addition to the standard Exadata hardware), but these components don't make any difference while imaging the machine. We just make sure that they are working properly after the imaging and before installing the GRID & RDBMS software.
In this post, I will give you an overview of the process, some clues and some real-life experiences.
Let's start with the installation method:
Well... The installation/imaging is done remotely due to the pandemic.
Basically, we need a shell that allows remote copy-paste during the installation process.
Shared Shell does that job -> https://www.oracle.com/support/shared-shell.html
Only Java must be installed on the client computer that runs the shell.
The installations are done from an NFS share. So, we put all the required installation files into the NFS share and perform the installation using them.
The installation process is similar to the method that we follow for installing the earlier versions of Exadata.
We connect to the ILOM interfaces of the nodes using SSH.
We first install the CELL nodes, then the DB nodes. Actually it doesn't matter which node we start with, but we prefer installing the CELL nodes first.
The steps for imaging/re-imaging are similar for both DB and CELL nodes. We just use different ISO images.
Before the imaging, we need the NFS shares to be accessible from the ILOMs of the Exadata nodes. This is because of the configuration we do: we basically tell the ILOMs to reach the NFS share and boot from there.
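Before touching the ILOMs, it's worth sanity-checking the share from any Linux box that can reach the NFS server (nfs_ip and /directory below are just placeholders for the real export):

    showmount -e nfs_ip                  # the export used for imaging should be listed
    mount -t nfs nfs_ip:/directory /mnt  # temporary mount just to verify access
    ls /mnt/*.iso                        # the cell and db ISO files should be visible
    umount /mnt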
So, we connect to the ILOM of the first CELL through SSH and we start the /SP/console. (start /SP/console)
We check the current image version of the CELL using imageinfo.
We set the ISO using the set server_URI command. (set server_URI=nfs://nfs_ip/directory/cellblabla.iso)
Note that we must unzip the cell image zip file before that.
Then, we use "set /SP/services/kvms/host_storage_device/ mode=remote", and tlater set the boot device to cdrom. (set boot_device=cdrom)
We reboot/hard reset the node with "reset /SYS" and then we run "start /SP/console" (to follow the boot process). The node boots itself using the new ISO placed on the NFS share, so the re-imaging starts taking place here.
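Putting the ILOM steps above together, a typical session looks roughly like this (the exact property paths are from my notes and can differ slightly between ILOM firmware versions, so treat it as a sketch rather than an exact command reference):

    ssh root@cell01-ilom        # cell01-ilom is a placeholder ILOM hostname
    -> set /SP/services/kvms/host_storage_device/remote/ server_URI=nfs://nfs_ip/directory/cell_image.iso
    -> set /SP/services/kvms/host_storage_device/ mode=remote
    -> set /HOST boot_device=cdrom   # boot from the redirected ISO on the next boot
    -> reset /SYS                    # hard reset; re-imaging starts on this boot
    -> start /SP/console             # follow the boot and imaging output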
We watch the first boot of the first cell node we are imaging and see if there are any problems. If everything works as expected, we repeat the same process for all the remaining CELL nodes in parallel.
When the boot process is completed, we check the image version with the imageinfo command and expect to see the image version we used for the installation/imaging/reimaging process.
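For reference, that check is just something like this on the freshly imaged node (we only care about the two lines below):

    imageinfo | grep -E "Active image version|Active image status"
    # expect the version we just imaged and "Active image status: success"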
Once we complete re-imaging the CELL nodes, we use the same installation method for the DB nodes. Remember, only the ISO image is different for the DB nodes (so we set the correct image using set server_URI).
Once the imaging is completed, we expect to see that all the interfaces are up & running. The interfaces with the re prefix (re*) are the RoCE interfaces. eth0, bondeth0, re1, re2... all of them must be up.
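A quick way to eyeball this from the OS (the interface names are the ones listed above, adjust them to your environment):

    for i in eth0 bondeth0 re1 re2; do
        ip -br link show "$i"    # each one should report state UP
    done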
Note that, in our case, in one of the CELL nodes the RoCE interfaces were not present. We re-imaged that CELL multiple times, but this didn't solve the issue.
We even checked the system by following the MOS note: Verify RoCE Cabling on Oracle Exadata Database Machine X8M-2 and X8M-8 Servers (Doc ID 2587717.1), but no fix.
That CELL node couldn't see the RoCE interfaces at all. So we made a dummy replacement (reseated the cards and cables), rebooted the machine and it worked! Of course we created an SR and ordered new RoCE cards, but this move saved the day and we could at least proceed with the software installation.
One important note: for any issues we identify during the installation, such as hardware issues, ILOM alerts, a blinking Service LED and so on, we create an SR with Oracle Support and solve the issue by following that SR.
Note that we had a blinking Service LED in one of the Cell nodes as well, this time for PMEM. Well... We did that dummy replacement trick and reseated the failing card. It worked for this issue too :) Again, we still ordered the new parts...
We also had an LACP issue on the DB nodes. The LACP client interfaces were up but not pingable. We investigated the issue on the Linux side, but everything seemed okay. Then the customer corrected the configuration on the switch side and the LACP interfaces started working properly. So, if you encounter problems with LACP (for the client interfaces), make the customer check the switch configuration and cabling in the first place! :)
Link Aggregation Control Protocol (LACP) on Exadata (Doc ID 2198475.1)
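For the Linux-side part of that investigation, the checks are roughly these (bondeth0 is the client bond name in our case; replace the gateway with yours):

    cat /proc/net/bonding/bondeth0   # mode should be 802.3ad (LACP) and both slave ports up
    ip addr show bondeth0            # bond up and carrying the client IP
    ping -c 3 <client_gateway_ip>    # still failing? then check the switch-side LACP config and cabling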
We continue with the software installation (GRID, Database and all that);
We put the files listed in the OEDA output into the NFS share (or into a local directory on the first database node). Note that we don't put the ISO images that we used for the imaging process into the NFS share.
Before starting the software installation, we check all the nodes and ensure that their network interfaces are up, their dates are correct and in sync, and their gateways are correct and pingable (we do all the environment checks, including nslookup, actually...). We also check the passwordless SSH connectivity between the nodes (for root).
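Here is a rough sketch of these pre-checks using dcli, so we hit all the nodes at once (all_group is a hypothetical file listing the node hostnames, and the gateway/hostname values are placeholders):

    dcli -g all_group -l root "date"                      # dates correct and in sync across nodes
    dcli -g all_group -l root "ip -br link show"          # network interfaces up
    dcli -g all_group -l root "ping -c 2 <gateway_ip>"    # gateways pingable
    dcli -g all_group -l root "nslookup <some_hostname>"  # DNS resolution working
    # if dcli itself runs without password prompts, passwordless root SSH is already proven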
Then, we run install.sh. Example: install.sh -cf OEDA_xml -s 1 ... In order to run this command, OEDA itself should be present on the node as well. Of course the OEDA XML should be there too.
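As a minimal sketch of how we drive it (the XML name and directory are placeholders; -l, -s and -r list, run a single step and run a range of steps in the install.sh versions I've used, but confirm with -h on yours):

    cd /u01/onecommand/linux-x64           # hypothetical directory where OEDA was unzipped
    ./install.sh -cf customer.xml -l       # list all deployment steps with their numbers
    ./install.sh -cf customer.xml -s 1     # run only step 1
    ./install.sh -cf customer.xml -r 2-18  # or run a range of steps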
Note that we need to use the same version of OEDA that we used to create the OEDA XML. If we use a different version of OEDA during the software installation, we may get errors while performing the install steps. So, if we use a newer version of OEDA for the installation, then we need to rebuild our OEDA XML using that newer version of OEDA. (We open the newer OEDA, import the OEDA XML and save it. This way, our OEDA XML gets regenerated with the new version of OEDA.)
Note that one of the steps of install.sh updates the environment as a 1/8 rack (if we are performing a 1/8 installation). install.sh uses the OEDA XML as input, so in this step install.sh decreases the cores and implements Capacity on Demand if we chose that in OEDA. For instance, it decreases the core count of the second DB node to 8 and reboots that node. Then, it decreases the core count of the first DB node and reboots that node. After the reboot, we continue with the next step of install.sh.
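The resulting core counts can be double-checked on each database node afterwards; a hedged sketch with DBMCLI (attribute names as I recall them, verify on your image):

    dbmcli -e list dbserver attributes name,coreCount,pendingCoreCount
    # coreCount should match the Capacity on Demand value chosen in OEDA (8 per DB node in our case)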
So we complete all the steps (that we find necessary to be implemented; for instance, we don't execute the step named Resecure Machine in some of the installations, as very tight security becomes a problem in some customer sites) and finish the software installation.
Once the installation is completed, exachk is executed and lastly we get our installation report.
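We usually kick it off from the first database node as root, roughly like this (the path is a placeholder that depends on how exachk/AHF was staged, and -a runs the full check set on the versions I've used):

    cd /opt/oracle.ahf/bin    # placeholder location; use wherever exachk is staged in your environment
    ./exachk -a               # run the full checks; the generated HTML report is our installation report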
That's it. After completing the steps in install.sh, we get ourselves an up & running, ready Exadata X8M-2.
You can check my previous Exadata installation-related blog posts about this install.sh process, but it is mostly straightforward.
The next post in this series (Part 3) will be about PCA configuration, I mean configuring the virtual environment in PCA. Stay tuned! Happy weekend.