Friday, November 8, 2019

RDBMS/ECC -- RAT / PL/SQL based Workload / Database Replay with Synchrozation=OFF, WRC:replay clock, Data Divergence and Concurrency Waits

Currently working on a very critical Exadata Cloud at Customer(ECC) migration project.

Actually, I like to lead these kinds of projects, as I find them challenging :) Anyways, there is no fault tolerancy in this project. So we need to migrate all the mission critical  Oracle databases to ECC (Exadata Cloud at Customer) without any failures.

Performance is also very important, as these databases should run faster or at least they must run with the same speed that they run in their source platforms with.

There are critical Oracle E-Business Suite databases and other core business-related databases in the scope and their workload is mostly based on PL/SQL.

Both source and target Oracle Home versions are 11.2.0.4(equipped with latest PSU + RAT Bundle 4) and we must be sure that these databases will run in ECC without any performance problems..

In order to ensure that, we need to make a load test. A real workload test...

Here is the story about it :) ;

We have Oracle RAT / Database Replay opportunity to use for this kind of a test and ofcourse we give it a try.

First we capture and then we analyze.. Once our capture analysis are complete, we replay and then analyze again.

In order to analyze the captured workload, we use Workload Capture Analyzer Report..

There we see something important -> 

PL/SQL
Maximum Workload Impact: 77 % of DB Time
Rationale
If the replay is much slower than expected, try to run in unsynchronized mode.
Action
A significant part of your workload comes from PL/SQL.
If the PL/SQL blocks or functions have 'complicated' logic or multiple commits in them, they are hard to synchronize and they behavior might change during replay.
You might see a different workload profile during replay if this is the case.

Yes, we have a significant PL/SQL workload, and here comes the FACT ->

In 11.2.0.4 Oracle Database environments; a workload with this amount of PL/SQL needs to be replayed with synchronization=OFF..

Also, in 11.2.0.4 ; Oracle doesn't support RAT replay on EBS, as in EBS we have PL/SQL codes that send requests to the Concurrent managers and get into a loop waiting for Concurrent Managers to process these requests. This makes PL/SQL code to wait in an infinite loop, because a Concurrent Manager can't start processing before the calling PL/SQL is completed.

Yes.. When we have a PL/SQL based workload, we end up with PL/SQL synchronization problem during the replay and all the WRC process hang, waiting on WRC:replay clock wait.

If we need to replay this kind of a significant PL/SQL workload with synchronization; our database version should be at least12.2. It is because; we have an option named extended capture in Oracle Database 12.2..

In Extended capture, Oracle RAT captures the SQL/DML operations executed inside the PL/SQL (not the PL/SQL itself). If there is a loop, which executes a DML 100 times with different binds, then extended capture captures those 100 calls with their bind variables.. During replay, RAT executes these 100 calls based on the timing recorded during the capture.

Well.. We are in version 11.2.0.4, so we don't have the extended capture option.
So we must replay our workload with synchronization=OFF.

What does this synchronization mean bytheway?

In sync mode, RAT replays the workload with same concurrency and timing as the production workload. In sync mode, transaction commit order is honored and RAT ensures minimal data divergence.

When we use sync=off; Oracle RAT doesn't enforce any logical dependencies between the calls while replaying the workload.. It doesn't preserve the commit order, it just replays the workload based only on the captured timing.

Ofcourse, this increases the concurrency and increases the data divergence.
However, this is our only chance for replaying this kind of a huge PL/SQL workload on 11.2.0.4 enviromment. (if we use the sync mode, then all our WRC processes hang waiting for WRC:Replay clock waits)

Anyways, we replay this workload with RAT using synchronization=OFF.
It really increases the load of the server, but we can at least get our comparison reports once the replay is finished.

What we see in these reports are mainly high concurrency waits.
These waits are mostly caused by synchronization=OFF, but they contribute on the total DB time recorded during the replay..

Well... We identify these kinds of waits carefully, and ignore them.

This time the comparison report gives us a comparable picture and at least it tells us if our target server can handle this kind of a load or not.. Actually, when we use the sync=off, we do some kind of a stress test.. This means, eventhough, we don't get a good and trustable comparison report with sync=off, we can still identify things like the durability of the target platform and the sqls that speeded up or slowed down.. This is at least something :)

So if you are considering using RAT in your workload tests, keep the things I just wrote above, in your mind.. You may need to empower your RAT-based workload tests with a client-side test tool, a client-side swingbench maybe..

No comments :

Post a Comment

If you will ask a question, please don't comment here..

For your questions, please create an issue into my forum.

Forum Link: http://ermanarslan.blogspot.com.tr/p/forum.html

Register and create an issue in the related category.
I will support you from there.