Erman Arslan's Oracle Blog: RAC -- interconnect and gc waits, are select statements affected?

Friday, October 7, 2016

RAC -- interconnect and gc waits, are select statements affected?

Recently dealed with a cluster wait issue on a RAC database, which was serving as a production database of a Bank.
During the work, there was a question asked by the folks and I felt that there was a misconception about the defition called "writers do not block readers"
They thought that the readers , who are actually using only the select statements should not be affected by the cluster waits...(which is not true bytheway)
Writers do not block readers is a true concept, but it has some additional and undeniable cost when the system is an Active-Active cluster (i.e RAC)

Look at this AWR:

Top User Events

Event	Event Class	% Event	Avg Active Sessions
gc buffer busy acquire	Cluster	75.00	33.73
enq: TX - row lock contention	Application	7.55	3.40
enq: TX - index contention	Concurrency	3.59	1.61
gc cr block lost	Cluster	2.94	1.32
gc buffer busy release	Cluster	2.62	1.18

See the top waits?

Look at the query recorded in AWR ->

SQL ID	PlanHash	Sampled # of Executions	% Activity	Row Source	% RwSrc	Top Event	% Event	SQL Text
38prqpb98prvm	1219140043	143	32.26	INDEX - UNIQUE SCAN	32.22	gc buffer busy acquire	30.75	SELECT count(*) FROM BLA_BLA...

You see the wait? (note I will not go into the details of this gc buffer busy acquire wait, it is beyond of the scope of this blog post, but what I will say is, it is there because of the contention and the select recorded in the AWR is affected from it.)

It definetly means a contention and, we may be talking about a Read-Write contention here, which can happen in RAC system for sure. (note I didnt copy paste the DMLs s here, but trust me .. they are there)

While instance B is making modification on a block, if you try to read(even only read) the same block from instance A, then you may see these waits.

Here is what happens in a scenario like this;

Instance A sends a request to GCS process, which manages this locking.
GCS checks the condition of the block and sees that it is locked by the Instance B.
GCS says to Instance B: remove your lock
If the session in Instance B does not want to remove its lock (if the session is still working with the block), it creates a Consistent Image in its buffer cache and sends this consistent image to Instance A. (consistent read)
This consistent image actually reflects the state of the block, which represents how the block looked like just before the modification.
This consistent image, as you may guess is sent from the private interconnect and Instance B gives this delivery info to GCS.

Well...We can say that this situation can be called a remote cache hit, and the turn of events explained above is caused by the read-write contention.

Note1:Remote cache hit costs more that a Local cache hit.
Note2:In order to have more local cache hits, it is recommend to do a configuration that will make the session that needs the block and the session that holds the lock for that block to be in the same RAC instance.
Note3:When we talk about a write-write contention, the LGWR, redo flush and PI(Past Image) creation may also come in to play. So it costs even more.

What can be done to prevent these inter-instance contentions?

Partitioning, reverse key or hash partitioned indexing and etc..

However, the best approach would be building a RAC configuration, that will make the session, which needs the block and the session which holds the lock for that block to be in the same RAC instance. (for example, if we have 3 instance, and if we see the data blocks for the same objects are frequently transferred from instance 3 to instance 2, we need to think that -> why dont we dedicate the related service to instance 2 and prevent this to happen, right?)

In order to make this happen, the approach to be followed is to divide the applications into modules and to configure these modules to connect to the db services, which are dedicated to instances.

So, this blog post is a specific one. The main idea here is; reader must not see the changes that are done but not committed by the writer. Altough, RAC is based on a shared disk/interconnect architecture, it has a non-shared memory, which acts like a shared one --cache fusion.
Cache fusion is a very sophisticated product and there are serialization mechanisms that come in to play to provide it. In brief, the cache fusion depends on in memory block locking, I/O, interconnect and interprocess communication. So it is costly.

Note: in the next few days, I will be writing a more detailed blog post about RAC. So, a more detailed article about RAC is on its way to you.

Update:

Keep in mind that, the Oracle database versions < 11.2.0.4, there are lots of bug reported with these issues.
DOC ID 1907369.1 is one of the documents reporting them.
The recommended solution for getting rid of those bugs are upgrading the database to 11.2.0.4So if you see a sessing get stuck on gc current request, you may be hitting those bugs.
Recommend solution for those bugs is; killing the stuck session or upgrading the database to 11.2.0.4, as most of those bugs are fixed in that release.

1 comment :

UnknownFebruary 14, 2019 at 2:46 PM
Awesome posting..Please keep posting
ReplyDelete
Replies

Add comment

If you will ask a question, please don't comment here..

For your questions, please create an issue into my forum.

Forum Link: http://ermanarslan.blogspot.com.tr/p/forum.html

Register and create an issue in the related category.
I will support you from there.

Subscribe to: Post Comments ( Atom )