Thursday, July 26, 2018

Exadata -- Image & GRID 12.2 upgrade

You may remember my article on upgrading Exadata software versions. ->

Exadata Patching-- Upgrading Exadata Software versions / Image upgrade

This time, I'm extending this upgrade-related topic.
So, in this post, I'm writing about an Exadata Image upgrade plus a 12.2 GRID infrastructure upgrade.

Well... Recently we needed to upgrade Exadata software and GRID infrastructure versions of an Exadata environment.

We divided this work into 2 parts: first we upgraded the Exadata images, and then we upgraded the GRID version.

Both upgrades were done in rolling mode, so the databases remained available during these upgrade activities.

Let's take a look at how we do these upgrades.

Exadata Images Upgrades:

We upgraded the image version of a production Exadata environment from 12.1.2.1.2.150617.1 to 12.2.1.1.4.171128. We did this work by executing the 3 main phases given below;
  • Analysis and gathering info about the environment.
  • Pre-check
  • Upgrading the Images, in the following order ->
    • Exadata Storage Servers(Cell nodes) 
    • Infiniband Switches
    • Compute Nodes (Database nodes)
So, we execute the 3 main phases above, and while executing these phases, we actually take the following 8 actions;

1) Gathering info and checking the current environment:

Image info, DB Home & GRID Home patch levels (opatch lsinventory outputs), SSH equivalency checks, ASM diskgroup repair times, NFS shares, crontab outputs, .bash_profile contents, spfile/pfile backups and controlfile traces.

Approx. duration : 3 hours (done before the operation day)
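
To give an idea, the checks in this step look roughly like the sketch below. This is a minimal, hedged example; the group file, paths and file names are placeholders to adapt to the environment.

# Current image versions (run as root; the cells are queried over dcli)
imageinfo
dcli -g cell_group -l root imageinfo | grep "Active image version"

# Patch inventory of the DB and GRID homes
$ORACLE_HOME/OPatch/opatch lsinventory

# ASM diskgroup repair times (should be long enough to survive a rolling cell upgrade)
sqlplus -s / as sysasm <<EOF
SELECT dg.name, a.value
FROM v\$asm_diskgroup dg, v\$asm_attribute a
WHERE dg.group_number = a.group_number AND a.name = 'disk_repair_time';
EOF

# spfile backup and controlfile trace (per database; output file names are placeholders)
sqlplus -s / as sysdba <<EOF
CREATE PFILE='/tmp/init_backup.ora' FROM SPFILE;
ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/tmp/ctl_trace.sql';
EOF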
2) Running exachk:

Downloading the up-to-date exachk and running it with the -a argument.
After running exachk -> analyzing its output and taking the necessary actions, if there are any.

Approx. duration : 2 hours (done before the operation day) 
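
For reference, running exachk in "all checks" mode is as simple as the sketch below (the staging directory is just an assumed placeholder);

# Go to the directory where the downloaded exachk zip was extracted (placeholder path)
cd /u01/stage/exachk
./exachk -a
# Then review the generated HTML report and fix the reported findings.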

3) Downloading the new Exadata images and uploading them to the nodes.

Approx. duration : 2 hours (done before the operation day)

4) Creating the necessary group files for patchmgr (cell_group, dbs_group, ibswitches.lst).

Approx. duration : 0.5 hours (done before the operation day)
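
These group files are just plain text files containing one hostname per line, roughly like the sketch below (all hostnames are placeholders);

# cell_group : storage cell hostnames
cat > cell_group <<EOF
cell01
cell02
cell03
EOF

# dbs_group : compute node hostnames
cat > dbs_group <<EOF
dbnode01
dbnode02
EOF

# ibswitches.lst : InfiniBand switch hostnames
cat > ibswitches.lst <<EOF
ibsw01
ibsw02
EOF

# Quick test of the root SSH equivalency from the driving node
dcli -g cell_group -l root hostname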

5) Running the patchmgr precheck. After analyzing its output -> taking the necessary actions (if there are any). For ex: if there are 3rd party rpms, we may decide to remove them manually before the upgrade.

Approx. duration : 0.5 hours (done before the operation day)
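
For example, the prechecks are run as root from the unzipped patch directory, roughly as sketched below. This is only an outline; the exact options always come from the README of the specific patch, and the zip name is a placeholder.

# Storage cell precheck, rolling mode
./patchmgr -cells cell_group -patch_check_prereq -rolling

# InfiniBand switch precheck
./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck

# Compute node precheck (run from a node that is not being patched)
./patchmgr -dbnodes dbs_group -precheck -iso_repo /u01/stage/dbnode_patch.zip -target_version 12.2.1.1.4.171128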

6) Running patchmgr and upgrading the images. (we do the upgrade in rolling mode)

Before running patchmgr, we kill all the ILOM sessions.. (active ILOM sessions may increase the duration of the upgrade)

Note: Upgrade is done in the following order;

Exadata Storage Servers (Cell nodes) (1 hour per node)
Infiniband Switches (1 hour per switch)
Compute Nodes (Database nodes) (1.5 hours per node)
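
Roughly, the rolling upgrade commands look like the sketch below. This is only a hedged outline; the patch zip name and staging path are placeholders, and the authoritative syntax is always the one in the patch README.

# 1) Storage cells, rolling
./patchmgr -cells cell_group -patch -rolling

# 2) InfiniBand switches
./patchmgr -ibswitches ibswitches.lst -upgrade

# 3) Compute nodes, rolling (run from a node that is not being patched; zip name is a placeholder)
./patchmgr -dbnodes dbs_group -upgrade -rolling \
  -iso_repo /u01/stage/dbnode_patch.zip -target_version 12.2.1.1.4.171128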
  
7) As post-upgrade actions: reconfiguring NFS mounts & crontabs, and reinstalling the 3rd party rpms (if they were removed before the upgrade).

Approx. duration : 0.5 hours

8) Post check: checking the databases, their connectivity and their alert log files..
Note that we also run exachk once again and analyze its output to ensure that everything is fine after the Image upgrade.

Approx. duration : 1 hour
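
A minimal sketch of these post-checks (database and log names are placeholders);

# Cluster and database resource status
crsctl stat res -t
srvctl status database -d MYDB

# New image version on all nodes
imageinfo | grep "Active image version"
dcli -g cell_group -l root imageinfo | grep "Active image version"

# Quick look at an alert log (path is a placeholder; adjust to your diag destination)
tail -100 /u01/app/oracle/diag/rdbms/mydb/MYDB1/trace/alert_MYDB1.log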

GRID 12.2 Upgrade:

As for  the GRID 12.2 upgrade, we basically follow the MOS document below;

"12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)"

First, we analyze our environment in conjunction with the document above, to determine the patches and prerequisite patches required for our environment.

Here is the list of patches that we used during our last GRID 12.2 upgrade work;

  • Patch 27100009 : GI JAN 2018 RELEASE UPDATE 12.2.0.1.180116
  • V840012-01.zip : Oracle Database 12c Release 2 Grid Infrastructure (12.2.0.1.0) for Linux x86-64
  • Patch 6880880 : OPatch 12.2.0.1.0 for Linux x86-64
  • Patch 6880880 : OPatch 11.2.0.0.0 for Linux x86-64
  • Patch 21255373 : CSSD : DUPLICATE RESPONSE IN GROUP DATA UPDATE

Once all the required files/patches are in place, we do the GRID upgrade by following the steps below;
  1. Creating the new GRID Home directories.
  2. Unzipping the new GRID software into the relevant directories.
  3. Unzipping up-to-date opatch and GRID patches.
  4. If needed, configuring the ssh equivalencies.
  5. Running runcluvfy.sh and doing the cluster verification. (In case of an error, we fix the error and rerun it)
  6. Patching our current GRID home with the prereq patches (in our last upgrade work, we needed to apply the patch 21255373)
  7. Increasing the sga_max_size and sga_target values of the ASM instances.
  8. Configuring VNC (we do the actual upgrade using VNC)
  9. Starting the GRID upgrade using the unzipped new GRID Software (on VNC)
  10. Running rootupgrade.sh on all the nodes.
  11. Controlling/Checking the cluster services.
  12. Configuring the ASM compatibility levels.
  13. Lastly, as a post-upgrade step, we add the new GRID home into the inventory.
As you may guess, the most critical steps in the list above are step 9 and step 10.. (as the actual upgrade is done while executing those steps)
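
To make those steps more concrete, here is a hedged sketch of what steps 5, 7, 9, 10, 11 and 12 look like on the command line. The home paths, diskgroup name and SGA values are placeholders; the authoritative steps are the ones in MOS note 2111010.1.

# Step 5: cluster verification, run from the new (unzipped) 12.2 GRID home
cd /u01/app/12.2.0.1/grid
./runcluvfy.sh stage -pre crsinst -upgrade -rolling \
  -src_crshome /u01/app/12.1.0.2/grid \
  -dest_crshome /u01/app/12.2.0.1/grid \
  -dest_version 12.2.0.1.0 -fixup -verbose

# Step 7: increase the ASM SGA (values are placeholders; use the ones recommended in the MOS note)
sqlplus / as sysasm <<EOF
ALTER SYSTEM SET sga_max_size = 3G SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET sga_target = 3G SCOPE = SPFILE SID = '*';
EOF

# Step 9: start the upgrade from the new home (inside the VNC session)
./gridSetup.sh   # choose "Upgrade Oracle Grid Infrastructure" in the installer

# Step 10: when prompted by the installer, run rootupgrade.sh on each node, one node at a time
/u01/app/12.2.0.1/grid/rootupgrade.sh

# Step 11: check the cluster after the upgrade
crsctl check cluster -all
crsctl query crs activeversion

# Step 12: raise the ASM compatibility of the diskgroups (diskgroup name is a placeholder)
sqlplus / as sysasm <<EOF
ALTER DISKGROUP DATAC1 SET ATTRIBUTE 'compatible.asm' = '12.2.0.1.0';
EOF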

Approx. duration : 4 hours (for a 2-node Exadata GRID upgrade)

That's it :) I hope you find this blog post useful :)

Friday, July 20, 2018

Exadata Cloud Machine -- first look, quick info and important facts

We recently started an ECM (Exadata Cloud Machine) migration project, or maybe I should say an ECC (Exadata Cloud at Customer) migration project.

This is a big migration project, including migration of the Core Banking databases.
It is a long run, but it is very enjoyable.
We have 2 ECCs to migrate to..

Finally, last week, initial deployment of the machines was completed by Oracle.
This week, we connected to the machines and started to play with them :)

I think I will write several blog posts about these new toys in the coming months, but here is some quick info and some important facts about the ECC environments.

First of all, ECC is an Exadata :) Exadata hardware + Exadata software..

Technically, it is a virtualized Exadata RAC environment in which we (consultants) and the customers cannot access the cells, ILOMs, switches or the hypervisor.

  • It is a Cloud Machine, but it is behind the firewall of the customer.
  • It has a Cloud Control Plane application, a GUI to manage the database services, and this application is hosted on an OCC (Oracle Cloud Machine), which can be thought of as the satellite of the ECC.
  • We do lots of stuff using this GUI: database service creation (11.2.0.4, 12c, 18c), patching, etc.


  • Database service creation and GRID operations are automated. Depending on the version of the database created using the GUI, the GRID infrastructure is automatically created or upgraded (cloud operations). For ex: if we create a 12.2 database and it is the first 12.2 database on the ECC, GRID 12.2 is automatically installed as well. Likewise, if GRID 12.1 and some 12.1 databases already reside on the ECC and we create our first 12.2 database, GRID is automatically upgraded to 12.2.
  • The minimum supported DB version on ECC is 11.2.0.4. So we need to have our db compatible parameter set to at least 11.2.0.4 in order to host a database on ECC -- this is relevant for the migration operations. (a quick check is sketched after this list)
  • We can install Enterprise Manager agents on ECC, so our customer can manage and monitor the ECC nodes and databases using its current Enterprise Manager Cloud or Grid Control.
  • ECCs are virtualized. Only Oracle can access the hypervisor level; we and the customer can only access the DomU. On the DomU RAC nodes, we and the customer do the OS administration: backups, patching, rpm installations and everything else. The customer is responsible for the DomU machines, where the GRID and the databases run, and has root access on those nodes. (This means DB administration + OS administration still continues :))
  • So the customer can't access the cell servers, or even the ILOM consoles..
  • Administration of, and responsibility for, everything that resides below the DomU layer is on Oracle.
  • Currently, for every physical compute node we have one VM (DomU) node -- 1 to 1. For ex: on a 1/2 ECC, we have 4 physical nodes and 4 VMs.
  • We can create multi-node RAC or single-node databases on ECC.
  • We can also create databases manually on ECC (without using the GUI).. Using scripts or runInstaller, everything can be done just like in the old days (as long as the versions are compatible with ECC).
  • If we create a 12c database using the GUI, it comes as a pluggable (CDB/PDB) database.. So if we want to have a non-CDB 12c database, we need to create it manually.
  • The customer can connect to the RAC nodes (DomU nodes) using SSH keys (without a password).. This is a must.
  • The customer can install backup agents on ECC.. So, without changing the current backup method and technology, the customer can back up the databases running on ECC.
  • There is no external InfiniBand connection to ECC.. The external connection can be 10Gbit at most.
  • Enterprise Manager Express comes with ECC. We have direct links to Enterprise Manager Express in the Control plane.
  • IORM is also available in the GUI. Using the GUI, we can do all the IORM configuration..
  • On ECC, we can use In-Memory and Active Data Guard.. Actually, we can use all the database options without paying for any extra licenses.
  • If we create 12.2 databases, they are created with TDE.. So TDE is a must for 12.2 databases on ECC.
  • However, we are not required to use TDE if we are using 11g databases on ECC.
  • The ASM diskgroups on ECC are high redundancy diskgroups. This is the default and cannot be changed!
  • Exadata Image upgrade operations on the ECC environments are done by Oracle.
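
As a quick note on the compatible requirement mentioned above, checking it on a source database before the migration is as simple as the sketch below (a minimal example);

# On the source database that will be migrated to ECC:
# the compatible parameter must be at least 11.2.0.4
sqlplus -s / as sysdba <<EOF
SHOW PARAMETER compatible
EOF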

That's all for now :) In my next blog post, I will show you how we can create database services on ECC (using the GUI).

Monday, July 16, 2018

RDBMS -- Be careful while activating a standby database (especially in cascaded configurations)

Recently, a customer reported an issue about a standby database which was out of sync with the primary. This standby database was the endpoint of a cascaded configuration.

The cascaded Data Guard configuration in that customer environment was as follows;

Primary -> Standby1 -> Standby2

So, the customer's requirement was to activate standby1 and continue applying the redo logs of the primary directly to standby2.

However, while activating, actually after activating the standby database named standby1, the customer accidentally made standby2 apply the redo logs that were generated by standby1.

When standby2 received and applied the archivelogs from standby1, it became a new standby database for standby1, and it went out of sync with the original production database.

Interesting, right?

In order to bring standby2 back in sync with its original primary database, we did the following;

We used the Flashback Database feature to flash standby2 back to the point just before it applied the archivelogs from standby1.

Then, we deleted the archivelogs received from standby1 and made sure that standby1 would not send any archivelogs to standby2 until it was converted back to a physical standby. (this way, we could ensure that standby2 was applying the redo logs only from the production database.)

Note that, if we hadn't had the possibility to use the flashback option, we would have had to recreate the standby database named standby2...
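
Here is a minimal sketch of what this looks like, assuming Flashback Database was enabled on standby2 and that the SCN just before the divergence is known (the SCN and the archive destination number below are placeholders);

# On standby1 (the activated one): defer the destination that ships redo to standby2
sqlplus / as sysdba <<EOF
ALTER SYSTEM SET log_archive_dest_state_2 = DEFER;
EOF

# On standby2: stop redo apply, flash back to just before the logs from standby1 were applied,
# then (after deleting those archivelogs) restart redo apply towards the original primary
sqlplus / as sysdba <<EOF
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO SCN 123456789;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF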

So, be careful while playing with the Data Guard configuration.. Especially in cascaded environments... First check the configuration, then take the action.. In this real-life case, the Data Guard configuration was from primary to standby1 and from standby1 to standby2.. So when standby1 was activated, the path "from standby1 to standby2" kept working, and standby1 became the new primary for standby2.. The incarnation changed and standby2 went out of sync with the original primary.
In order to prevent this from happening, the Data Guard flow (configuration) should have been changed before activating standby1.

Monday, July 9, 2018

GTECH -- Summer School -- Oracle Database & Cloud & EBS for newly graduates

Once a year, we at GTech provide training for newly graduated engineers.

In this training, we teach SQL, PL/SQL, Oracle Database & Cloud, EBS, OBIEE, Big Data, ETL and more.

This year, I was the lecturer for "Database and Cloud".

Actually, I extended the lessons a little bit by explaining the EBS System Administration Fundamentals, as well. :)

The students of the classes were so curious about databases, and actually about Oracle in general..

It was an honour for me to present "the introduction to Oracle Database", to explain the "Cloud terms" (including the Cloud-at-Customer model) and to explain the "EBS architecture".

I tried to shed light on important topics like the Oracle Database server architecture, the Oracle Database process architecture, background processes, high availability configurations and so on..

The list of topics covered in the training was as follows;
  • Introduction to RDBMS
  • Introduction to Oracle
  • Architecture (Oracle)
  • Installation (Oracle)
  • DBA role & DBA tools
  • Introduction to Cloud
  • APPS DBA role & EBS System Administration (EBS 12.2)

While explaining these topics, I tried to share real-life stories all the time.. I tried to teach them the basics of Oracle, but I also dived deep when required.

The participants asked lots of good technical questions and these made our lessons more entertaining :)

The training for Database & Cloud lasted 3 days.

While preparing the slides for the presentations that I used in the training, I also wrote an exam for the students..

At the end of the training, we also gave this written examination to the participants. (35 questions)

It was a pleasure for me to teach Oracle in GTech Academy ( GTech -- Oracle University Partner)

I hope it was useful for these guys..
I also hope I will see them (at least some of them) as successful DBAs one day :)

Following is the picture of our class..  A good memory :)