Friday, November 29, 2019

Exadata/RDBMS -- Database Release Schedule + Support Dates - 19C + Exadata Image Upgrade (to the latest version // 19.3.1)

In order to get long-term support, customers have started upgrading their Oracle databases to 19C.

"Oracle Database 19C is the most current long-term support release".


Read -> "Release Schedule of Current Database Releases (Doc ID 742060.1)" for more info about the release schedule and support dates.

Exadata customers are also planning to upgrade, but they have an extra prerequisite: their Exadata image version must support Oracle Database 19C.

A 19C database upgrade actually brings the need for a 19C Grid upgrade and an Exadata image version upgrade as well. So this year, there will be lots of upgrade operations in the red stack :)

11GR2 still works perfectly well, and 18C is an option, but customers prefer long-term support and want to stay up-to-date, so 19C is currently the best upgrade target. This is especially true for the customers who closely follow the technological improvements in the Oracle database and/or who try to catch up, as the developments in the areas outside the database (their integration targets, their application stack and so on) are moving very fast.

The premier support period for 20C and 21C seems to be short, like 18C's. However, 22C will be the next long-term support release after 19C..

Anyways, let's go back to our original topic: upgrading an Exadata X3-2's image version to 19.3.1.


Currently, we don't have an image version that supports Oracle Database 19C in Exadata Cloud at Customer (ECC), but we do have image versions that support 19C databases on on-prem Exadata.
This post is based on an image upgrade that was recently done on an Exadata X3 environment..

As you may guess, I upgraded that Exadata X3's image version to 19.3.1 (the latest release currently).

My customer needed an Exadata environment that supports 19C databases.. and of course 19C Grid as well..

So, before upgrading the Grid and Database versions, we needed to upgrade the Exadata image version. The machine was a Gen-3 (X3), and that's why I had some doubts about this upgrade.. Fortunately, it went perfectly well :)

I already documented the operations required for upgrading the image version of an Exadata in the following earlier blog posts ->

https://ermanarslan.blogspot.com/2018/03/exadata-upgrading-exadata-software.html
https://ermanarslan.blogspot.com/2018/07/exadata-image-grid-122-upgrade.html

So, in this blog post, I will try to give you some additional information about these kinds of operations.

In order to upgrade the image version of an Exadata, we must check the documentation and download the correct patches;

For 19.3.1 , the documentation is as follows:

Latest Image Support Note : Exadata 19.3.1.0.0 release and patch (30441371) (Doc ID 2592151.1)

Exadata 19.3.1 supports the following Grid and Database version:

Oracle Grid Infrastructure:
19.4.0.0.0.190716 *
18.7.0.0.0.190716 *
12.2.0.1.0.190716 *
12.1.0.2.0.190716 *
Oracle Database:
19.4.0.0.0.190716 *
18.7.0.0.0.190716 *
12.2.0.1.0.181016
12.1.0.2.0.180831
11.2.0.4.0.180717

Well.. In order to upgrade our image versions, we download Cell, Database server and Infiniband network software and image files, as shown in the picture below.
Note that this is a Bare Metal Exadata and it is an X3, so it doesn't have an RDMA (RoCE) network switch, but it has an InfiniBand network switch.


We use the patch tool (patchmgr) that comes with the Cell server patch to patch the cell nodes. However, we download and use a separate patchmgr for patching the compute/DB nodes (Patch 21634633 in this case...).


We upgrade the Exadata image version by executing the 3 main phases given below;
  1. Analysis and gathering info about the environment.
  2. Pre-check
  3. Upgrading the Images 
So, we execute the 3 main phases above, and while executing these phases, we actually take the following 7 actions;

1) Gathering info and checking the current environment:
Image info, DB Home & GRID Home patch levels (opatch lsinventory outputs), SSH equivalency checks, ASM diskgroup repair times, NFS shares, crontab outputs, .bash_profile contents, spfile/pfile backups and controlfile traces.

2) Running Exachk:
Downloading the up-to-date exachk and running it with the -a argument.
After running exachk -> analyzing its output and taking the necessary actions, if there are any.

3) Creating the necessary group files for patchmgr (cell_group, dbs_group, ibswitches.lst).

4) Running the patchmgr prechecks. After analyzing their output -> taking the necessary actions (if there are any). For example: if there are 3rd party rpms, we may decide to remove them manually before the upgrade. (A command-line sketch of steps 2-5 is given right after this list.)

"-Note that, this time I had to upgrade the infiniband switch release to an intermediate version.."
"-Also, I had to change the Exadata machine's NTP server :) -- full stack.. "

5) Running patchmgr and upgrading the images. (we do the upgrade in rolling mode)

Before running patchmgr, we kill all the ILOM sessions.. (active ILOM sessions may increase the duration of the upgrade)

6) Post-upgrade actions: reconfiguring NFS shares & crontabs, and reinstalling the 3rd party rpms (if they were removed before the upgrade).

7) Post checks: checking the databases, their connectivity and their alert log files..
Note that we also run exachk once again and analyze its output to ensure that everything is fine after the image upgrade.
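
To make the flow above a bit more concrete, here is a minimal command-line sketch of steps 2 to 5. Note that this is just a sketch based on my environment: the group file contents, the staging locations and the zip name placeholders are assumptions, and the exact patchmgr syntax should always be taken from the patch README and from the image support note mentioned above.

# step 2 -- run the latest exachk before (and after) the upgrade
./exachk -a

# step 3 -- group files used by patchmgr (hypothetical contents)
cat dbs_group        # dbadm01, dbadm02
cat cell_group       # celadm01, celadm02, celadm03
cat ibswitches.lst   # csw-iba01, csw-ibb01

# step 4 -- prechecks (run from the staged cell / db server / ibswitch patch directories)
./patchmgr -cells cell_group -patch_check_prereq -rolling
./patchmgr -dbnodes dbs_group -precheck -iso_repo <db_server_patch_zip> -target_version 19.3.1.0.0.191018
./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck

# step 5 -- the actual upgrades, in rolling mode
./patchmgr -cells cell_group -patch -rolling
./patchmgr -dbnodes dbs_group -upgrade -iso_repo <db_server_patch_zip> -target_version 19.3.1.0.0.191018 -rolling
./patchmgr -ibswitches ibswitches.lst -upgrade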

After upgrading the Image version to 19.3.1, you will see something like the following when you check the versions;

[root@dbadm01 image_patches]# dcli -g all_group -l root imageinfo| grep -E 'Image version|Active image version'
dbadm01: Image version: 19.3.1.0.0.191018
dbadm02: Image version: 19.3.1.0.0.191018
celadm01: Active image version: 19.3.1.0.0.191018
celadm02: Active image version: 19.3.1.0.0.191018
celadm03: Active image version: 19.3.1.0.0.191018
[root@dbadm01 image_patches]# dcli -g ibswitches.lst -l root version |grep version |grep "SUN DCS"
csw-ibb01: SUN DCS 36p version: 2.2.13-2
csw-iba01: SUN DCS 36p version: 2.2.13-2

Note that the versioning of InfiniBand images is different from the versioning of Exadata images.
So 2.2.13-2 is currently the latest InfiniBand image version, and it is the one compatible with the 19.3.1 Exadata image version..

Well.. That is it for now :) for Exadata :)

Before coming to the end, one more thing...
After upgrading the Exadata image version, we may want to upgrade our Grid Infrastructure as well..

In order to upgrade our Grid Infrastructure after upgrading our Exadata image version, we follow the MOS note below;


"19c Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running on Oracle Linux (Doc ID 2542082.1)"

---

That's it for today :) I hope this has been useful to you :)

See you in my next blog post (which will be about 19C Database upgrades.. -- probably :)

Monday, November 18, 2019

RAC & Exadata // Some Tips & Tricks for the migration ... for the performance

Currently I'm dealing with a RAC/Exadata migration project..
This time the source is a mission-critical single-node environment. It is fully CPU-bound, but it also has very fast All-Flash Arrays in its storage layer.
The source environment has lots of connections doing lots of things and using lots of CPU cycles.
The application is not RAC-aware either. With all the things given above, you can guess that the application cannot scale out in the database layer very easily.
In order to migrate this kind of a database to RAC / to Exadata, we must take some extra actions -- some improvements, actually. Some changes for RAC, for scaling out properly.
Yes, we must think about all 3 layers: Application, Database and Storage.
Especially if the target is an Exadata, we must concentrate on the application layer intensively.

In this blog post, I will quickly give you some tips and tricks that you may use before migrating a database environment similar to the one I just described above;

NOCACHE Sequences: Ordered sequences do not scale well.
Use non-ordered and cached sequences if sequences are used to generate the primary keys.
ALTER SEQUENCE ERMAN_SEQ1 … CACHE 10000;
If you do not cache them, you may see sequence enqueue (enq: SQ) contention..
However, know that if you use non-ordered, cached sequences, then you may have non-ordered values in the table columns which are fed by these sequences..
So, if you can't use cached and non-ordered sequences with your application, then consider an active-passive configuration: consider running that part of your application (the code that uses the sequences) on only one instance of your RAC.
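
Just as an illustration, here is a quick way to see whether you already suffer from this, and how a hot sequence can be converted (the sequence name below is made up);

sqlplus -s / as sysdba <<'EOF'
-- sessions currently waiting on the sequence enqueue, per instance
SELECT inst_id, event, COUNT(*) FROM gv$session
WHERE event = 'enq: SQ - contention'
GROUP BY inst_id, event;
-- convert a hot ordered/nocache sequence into a cached, non-ordered one
ALTER SEQUENCE ERMAN_SEQ1 CACHE 10000 NOORDER;
EOF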

Missing cluster interconnect parameter: Making an Oracle database running on Exadata use the static InfiniBand interconnect IP addresses relies on setting the cluster_interconnects parameter.
If it is not set, the Oracle database by default chooses the HAIP InfiniBand addresses for the cluster interconnect, and that is not recommended on Exadata.

This recommendation can also be seen in an Exachk report.
That is, if we don't set the cluster_interconnects parameter in the database and leave the Oracle database to use the default HAIP interconnects, then Exachk will report a failure saying "Database parameter CLUSTER_INTERCONNECTS is NOT set to the recommended value".

Database parameter CLUSTER_INTERCONNECTS should be a colon delimited string of the IP addresses returned from sbin/ifconfig for each cluster_interconnect interface returned by oifcfg. In the case of X2-2 it is expected that there would only be one interface and therefore one IP address. This is used to avoid the Clusterware HAIP address; For an X2-8 , the 4 IP addresses should be colon delimited

So, use ifconfig to determine the IP addresses assigned to the ib0 and ib1 interfaces (not the ib0:1 or ib1:1 aliases) on all the RAC nodes, set these IP addresses as a colon-delimited string for every instance, and restart the database;
Ex:

alter system set cluster_interconnects='ib_ipaddress1ofnode1:ib_ipaddress2ofnode1' scope=spfile sid='SIDinst1';
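
A quick way of collecting these IP addresses is given below (assuming $GRID_HOME points to your Grid home; the interface names ib0/ib1 are the typical ones, so verify them with oifcfg on your own nodes);

# the interfaces registered as cluster_interconnect in the clusterware
$GRID_HOME/bin/oifcfg getif
# the static IP addresses of ib0 and ib1 on each node (not the ib0:1 / ib1:1 HAIP aliases)
/sbin/ifconfig ib0 | grep -w inet
/sbin/ifconfig ib1 | grep -w inet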

Hard parses: Use soft parsing to save and reuse the parse structure and execution plan. With soft parsing, the expensive optimization work is not required.
When soft parsing (and reusing cached statements), you don't pay the full parse and describe cost.. You mostly just execute and fetch.. Yes, we are talking about eliminating cost here.

Avoid hard parsing by using prepared statements and bind variables...

Instead of;
String query = "SELECT LAST_NAME FROM ERMANS WHERE ERMAN_ID = " + generateNumber(MIN_ERMAN_ID, MAX_ERMAN_ID); // literal value -> a new hard parse for each value
prepStmt = connection.prepareStatement(query);
resultSet = prepStmt.executeQuery();
Change to:
String query = "SELECT LAST_NAME FROM ERMANS WHERE ERMAN_ID = ?"; // bind variable -> one shared cursor
prepStmt = connection.prepareStatement(query);
int n = generateNumber(MIN_ERMAN_ID, MAX_ERMAN_ID);
prepStmt.setInt(1, n);
resultSet = prepStmt.executeQuery();

Caching the soft parses: Although a soft parse is not as expensive as a hard parse, it can still take some time.. Consider using statement caching.
For ex: oracleDataSource.setImplicitCachingEnabled(true) + connection.setStatementCacheSize(10);

High number of concurrent sessions : In order to control concurrent sessions, consider using connection pooling. Consider limiting the pool and processes to avoid connection storms. Ensure that load is balanced properly over the RAC nodes.

High number of database sessions : More processes mean higher memory consumption, bigger page tables, a higher risk of paging and higher system CPU time. Consider using connection pools, and consider releasing connections when your work is finished.

Small log buffer size: Consider making the log_buffer parameter bigger if you already see 'log buffer space' waits on your current platform.
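
A quick check for this on the current platform (cumulative numbers since instance startup);

sqlplus -s / as sysdba <<'EOF'
SELECT event, total_waits, time_waited
FROM v$system_event
WHERE event = 'log buffer space';
EOF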

High interconnect usage: Using the interconnect is expensive even if you use InfiniBand for the interconnect communication.. It is expensive because it depends on lots of things.
For example, even the things that LGWR does matter while we are using the interconnect. That is, when blocks with pending changes are pinged by other instances, the related redo must be written to the log before the block can be transferred. So in such an environment, where you have chatty processes that manipulate and read the same data blocks, you may even consider having the sessions that frequently manipulate the same data connect to the same RAC node all the time.

Performance tuning: It just never ends. Like peeling an onion… there's always another layer.. Consider using partitioning, compression and indexes (you may even consider dropping some of them) and so on. Consider implementing the RAC best practices..
Global hash partitioned indexes & locally partitioned indexes -> both of these help you achieve better cache locality.

Re-considering the current underscore parameters: Are there still any reasons for them? For ex: _disk_sector_size_override=TRUE (YOU DON'T NEED TO SET IT -- see MOS note 1447931.1)

EXADATA -- Connecting to the nodes using the Serial Management Port (RJ45-DB9-RS232-USB) + Connecting to the Cisco switch via CAT-5

Under certain circumstances, we connect to Exadata machines using a serial connection or using a CAT-5 cable connected to the Cisco switch.
These kinds of connections may be needed while re-imaging or installing an Exadata.
Basically, we may need these kinds of connections when we don't have a proper network configuration on the Exadata machine that we are dealing with.

In this blog post, I'm going to explain both connecting our laptop to the Exadata nodes using a serial connection and connecting our laptop to the Exadata nodes using a TCP/IP connection directly through the Cisco switch (admin network).

 RJ45-DB9-RS232-USB cable

Let's start with Serial connection;

This kind of a serial connection is actually for connecting to the ILOMs.
We connect to the nodes using the Serial Management Port through an RJ-45 cable, but that cable is converted to USB on the client side (as seen in the picture above).

First, we download and install a USB-to-serial-port converter driver on our client (in most cases, we use Windows clients). This driver can be found on the Internet, for ex: CDM20814_Setup.exe

Then we open up Device Manager and check our COM X connection port. Yes, we need to find that X :) --> for instance: COM3, COM2


Then we connect the serial side of our cable to the serial port of the node that we want to connect to (compute node, Cisco switch, etc.).
Of course, we connect the USB side of the cable to our client machine.

Once the cabling is done, we use PuTTY to connect to the device through that COM port.
That is, in PuTTY, we choose the Serial connection type, enter the COM port and click Open.. (baud 9600, 8N1: eight data bits, no parity, one stop bit, software flow control (XON/XOFF) disabled)

After clicking Open, we press the Enter key 3-4 times and we find ourselves in the ILOM console of the machine.

Lastly, we start /SP/console to reach the host's serial console, and that's it :)

ermanarslanexadata02-ilom login: root
Password: 
Oracle(R) Integrated Lights Out Manager
Version 3.1.2.10 r74387
Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
 -> 
 -> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
 Serial console started.  To stop, type ESC (

Let's take a look at connecting our laptop to the Cisco switch using a CAT-5 cable;

Of course, this is easier than the serial connection.

We connect our client to a free port of the Cisco switch using a CAT-5 cable.
Then we configure our network adapter by specifying an IP address manually (we use an IP address assigned to one of our PDUs, and we also specify the correct subnet and gateway for our network adapter) -- note that this info can be derived from the OEDA output.
Once configured, we are done.. We may see an exclamation mark (!) on our network adapter, but this is OK.. We may now connect to our DB nodes, to our cell nodes and even to our InfiniBand switches using SSH :)

I'm done :) I hope this helps you..

Sunday, November 10, 2019

Cloud -- Creating an ADWH (Autonomous Data Warehouse) & Connecting to ADWH

In today's post, I will give you a quick introduction to Oracle Cloud..
This post will specifically be about connecting to Oracle Cloud, and we will take a look at creating an ADWH (a famous subject these days) and connecting to it by going through some screenshots.

Actually, these things are so easy now. The Cloud Console of OCI (Oracle Cloud Infrastructure) is quite good, capable and stable, I must say..

Basically, in order to create an ADWH, first we connect to Oracle Cloud with our cloud credentials, and then we click on the hamburger menu in the upper-left corner of the page and choose ADWH..

After that, we choose/check the configurable items on the ADWH creation screen and that's it.

Once we create our ADWH, we download the related wallet and use our favorite tool to connect to this ADWH of ours with that wallet.

We can't connect to an ADWH with the SYS or SYSTEM accounts, but we have a special account named ADMIN if we want to be in admin mode :)

Anyways, let's go through the screenshots and see this in action..

First we connect to our trial cloud account;


After supplying our cloud account (it is like a tenancy name), we enter our cloud credentials;


After the login, we click on the hamburger menu and choose Autonomous Data Warehouse from there.


Once we click on the Autonomous Data Warehouse item, we see our current ADWH environments, and we find ourselves on a page where we can create a new Autonomous Data Warehouse.


We click on the Create Autonomous Database button and we find ourselves in a page, where we can create our ADWH.


We enter/choose the relevant items on this screen and lastly click on the "Create Autonomous Database" button at the bottom of the page.

So, the creation of  an ADWH on Oracle Cloud is as easy as that :)

Now let's take a look at the connection.. Connecting to an ADWH..

First we click on our ADWH to see its configuration.


Next, we click on the DB Connection button and then on the Download Wallet button to download the related wallet to our client..


Note that we supply a wallet password before downloading the wallet.


Once our wallet is downloaded, we are ready to connect to our ADWH.. Note that we don't unzip our wallet for SQL Developer. We leave it zipped..


Next, we start our favorite tool for the DB connection.. (in this case, it is SQL Developer 19.2)


Then, we supply our wallet (the wallet location) and user/pass (the ADMIN user in this case -- the ADMIN user that was created while creating our ADWH earlier) in the DB connection pane.


That's it, we are connected ! :)
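
By the way, if you prefer a command-line connection instead of SQL Developer, here is a minimal sketch. Unlike SQL Developer, sqlplus wants the wallet unzipped into a directory referenced by TNS_ADMIN; the wallet file name, the directory and the service name below are just examples for an imaginary ADWH named ermanadwh.

unzip Wallet_ermanadwh.zip -d /home/erman/adwh_wallet
# edit sqlnet.ora in that directory so that WALLET_LOCATION's DIRECTORY points to /home/erman/adwh_wallet
export TNS_ADMIN=/home/erman/adwh_wallet
# the available service names (ermanadwh_high, ermanadwh_medium, ermanadwh_low) are listed in tnsnames.ora inside the wallet
sqlplus ADMIN@ermanadwh_high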


Next blog post about this Cloud topic will be about migrating data to an ADWH.. Stay tuned! :)

Friday, November 8, 2019

RDBMS/ECC -- RAT / PL/SQL based Workload / Database Replay with Synchronization=OFF, WRC: replay clock, Data Divergence and Concurrency Waits

Currently working on a very critical Exadata Cloud at Customer(ECC) migration project.

Actually, I like to lead these kinds of projects, as I find them challenging :) Anyways, there is no tolerance for failure in this project. So we need to migrate all the mission-critical Oracle databases to ECC (Exadata Cloud at Customer) without any failures.

Performance is also very important, as these databases should run faster, or at least at the same speed as they run on their source platforms.

There are critical Oracle E-Business Suite databases and other core business-related databases in scope, and their workload is mostly based on PL/SQL.

Both the source and target Oracle Home versions are 11.2.0.4 (equipped with the latest PSU + RAT Bundle 4), and we must be sure that these databases will run on ECC without any performance problems..

In order to ensure that, we need to make a load test. A real workload test...

Here is the story about it :) ;

We have the Oracle RAT / Database Replay option to use for this kind of test, and of course we give it a try.

First we capture and then we analyze.. Once our capture analysis is complete, we replay and then analyze again.

In order to analyze the captured workload, we use the Workload Capture Analyzer report..

There we see something important -> 

PL/SQL
Maximum Workload Impact: 77 % of DB Time
Rationale
If the replay is much slower than expected, try to run in unsynchronized mode.
Action
A significant part of your workload comes from PL/SQL.
If the PL/SQL blocks or functions have 'complicated' logic or multiple commits in them, they are hard to synchronize and their behavior might change during replay.
You might see a different workload profile during replay if this is the case.

Yes, we have a significant PL/SQL workload, and here comes the FACT ->

In 11.2.0.4 Oracle Database environments, a workload with this amount of PL/SQL needs to be replayed with synchronization=OFF..

Also, in 11.2.0.4, Oracle doesn't support RAT replay on EBS, because in EBS we have PL/SQL code that sends requests to the Concurrent Managers and then loops, waiting for the Concurrent Managers to process those requests. This makes the PL/SQL code wait in an infinite loop during a synchronized replay, because a Concurrent Manager can't start processing before the calling PL/SQL is completed.

Yes.. When we have a PL/SQL-based workload, we end up with a PL/SQL synchronization problem during the replay, and all the WRC processes hang, waiting on the 'WRC: replay clock' wait event.

If we need to replay this kind of a significant PL/SQL workload with synchronization, our database version should be at least 12.2. That's because we have an option named extended PL/SQL capture in Oracle Database 12.2..

With extended capture, Oracle RAT captures the SQL/DML operations executed inside the PL/SQL (not just the top-level PL/SQL call itself). If there is a loop which executes a DML 100 times with different binds, then extended capture captures those 100 calls with their bind variables.. During replay, RAT executes these 100 calls based on the timing recorded during the capture.

Well.. We are in version 11.2.0.4, so we don't have the extended capture option.
So we must replay our workload with synchronization=OFF.

What does this synchronization mean, by the way?

In sync mode, RAT replays the workload with the same concurrency and timing as the production workload. In sync mode, the transaction commit order is honored, and RAT ensures minimal data divergence.

When we use sync=off, Oracle RAT doesn't enforce any logical dependencies between the calls while replaying the workload.. It doesn't preserve the commit order; it just replays the workload based only on the captured timing.

Of course, this increases the concurrency and increases the data divergence.
However, this is our only chance for replaying this kind of a huge PL/SQL workload on an 11.2.0.4 environment. (If we use the sync mode, then all our WRC processes hang waiting on 'WRC: replay clock' waits.)

Anyways, we replay this workload with RAT using synchronization=OFF.
It really increases the load of the server, but we can at least get our comparison reports once the replay is finished.
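
Just for reference, here is a minimal sketch of the replay calls with synchronization turned off. The directory object name (RAT_REPLAY_DIR), the replay name and the OS path are made up, and the capture is assumed to be already copied to the target and preprocessed with DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE.

sqlplus -s / as sysdba <<'EOF'
EXEC DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(replay_name => 'ECC_REPLAY_1', replay_dir => 'RAT_REPLAY_DIR');
-- boolean overload; FALSE means no commit-order synchronization
EXEC DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(synchronization => FALSE);
EOF

# start one or more workload replay clients pointing at the same directory
wrc system/<password> mode=replay replaydir=/u01/rat/replay_dir &

sqlplus -s / as sysdba <<'EOF'
EXEC DBMS_WORKLOAD_REPLAY.START_REPLAY();
EOF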

What we see in these reports is mainly high concurrency waits.
These waits are mostly caused by synchronization=OFF, but they still contribute to the total DB time recorded during the replay..

Well... We identify these kinds of waits carefully, and ignore them.

This time the comparison report gives us a comparable picture, and at least it tells us whether our target server can handle this kind of load or not.. Actually, when we use sync=off, we do some kind of a stress test.. This means that even though we don't get a fully trustworthy comparison report with sync=off, we can still identify things like whether the target platform can withstand the load and which SQLs sped up or slowed down.. This is at least something :)

So if you are considering using RAT in your workload tests, keep the things I just wrote above in mind.. You may need to supplement your RAT-based workload tests with a client-side test tool -- a client-side Swingbench-style test, maybe..

Thursday, November 7, 2019

ECC -- Dynamically Increasing the CPU Cores of the Compute Nodes -- CPU Steal Time

I needed to increase the CPU core count of an ECC (Exadata Cloud at Customer) 1/4 machine in a customer environment. I did it using the Cloud Console screens as documented, and it went through without any problems. The CPUs were increased..
After increasing the CPU count, I checked /proc/cpuinfo and saw 44 cores on each node.. A core count of 44 is the maximum for an ECC (Exadata X7-based) 1/4 environment.
However, when I checked the instant CPU utilization using the top command, I saw that the newly added CPUs were not utilized at all..


The newly added CPUs were not utilized at all, but their %st values were high.
This was interesting, as %st represents the CPU steal time.
Here is IBM's definition of CPU steal time -> Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.
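
If you want to check this on your own nodes, a quick way (mpstat comes with the sysstat package) is;

# press 1 inside top to get a per-CPU breakdown and watch the "st" column
top
# or take a 5-second per-CPU sample, including the %steal column
mpstat -P ALL 5 1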

This didn't make sense, since this was an ECC, and it was dedicated to this customer environment.
So I started checking the bugs.
There were bugs recorded for this behaviour, which appears after dynamically increasing the vCPUs in ECC environments, and a reboot seemed to be the best solution..

We rebooted the nodes one by one and the issue disappeared.. The newly added CPUs started to be utilized after rebooting the nodes.

I give you the version info of this environment below;

Kernel version: 4.1.12-94.8.4.el6uek.x86_64 SMP
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.7.0.0.180821

So, if you have a similar problem, you can consider rebooting the nodes one by one.. But the real solution should be an ECC image upgrade.. So plan your upgrades on time. Oracle does these upgrades for you in the Cloud at Customer model anyway..

Wednesday, November 6, 2019

RDBMS -- Oracle Log Miner Read Rate

If you are working with a tool that uses LogMiner behind the scenes, then you may need to check your LogMiner read rate..

Yes, we use LogMiner in some applications to replicate our data logically from one database to another. Just as we DBAs check LGWR performance, which is mostly write-based, we may also check LogMiner's read rate in these kinds of configurations while dealing with degradations in replication performance.

Here is an example for doing that;

We check our archived logs and choose 5 of them to be used in our calculation..
Note that we choose 2-day-old archived logs in this demo..
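
A query like the following can be used to pick those archived logs and to note their SCN ranges and sizes (the 2-day-old window is just for this demo);

sqlplus -s / as sysdba <<'EOF'
set lines 200 pages 100
SELECT name, first_change#, next_change#,
       ROUND(blocks*block_size/1024/1024) size_mb
FROM v$archived_log
WHERE completion_time >= SYSDATE - 2
AND completion_time < SYSDATE - 1
ORDER BY first_change#;
EOF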

We take the first archived log and execute the following command:
EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
LOGFILENAME => '<first_log_file>', -
OPTIONS => DBMS_LOGMNR.NEW);

We then add the remaining 4 archive log files one by one by using the following command;
EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
LOGFILENAME => '<next_log_file>', -
OPTIONS => DBMS_LOGMNR.ADDFILE);

We get the FIRST_CHANGE# value of the first log file and the NEXT_CHANGE# value of the last log file, and use them as STARTSCN and ENDSCN in the following command;

EXECUTE DBMS_LOGMNR.START_LOGMNR( -
STARTSCN => <first_change#>, ENDSCN => <next_change#>, -
OPTIONS => DBMS_LOGMNR.SKIP_CORRUPTION + DBMS_LOGMNR.NO_SQL_DELIMITER + DBMS_LOGMNR.NO_ROWID_IN_STMT + DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

Then, in our SQL*Plus session, we set timing on..

Next, we use the query below to read all the redo from those 5 archived logs that we chose in the beginning. At the end of the query execution, we get 'no rows selected'. This is okay, as we can still measure the query execution time/duration. This query execution time is actually the total time required for reading those 5 archived logs.

SELECT * FROM V$LOGMNR_CONTENTS WHERE TABLE_NAME = 'TABLENOTEXIT';

Well, the calculation is simple: (archived_log_count x archived_log_size) / duration.

So if those 5 archived logs are each 2GB in size, and the query above took 10 minutes, then our LogMiner read rate is (5 x 2GB) x (60/10) = 60GB/hour.

Here is a quick tip for you :)