Sunday, March 22, 2020

RDBMS / CDC -- Log Miner Read performance -- most likely reasons for performance degradation

In one of my previous blog posts, I shared a method to check LogMiner's read rate. As you may remember, in order to get the read rate, we start our LogMiner session using sqlplus (on a local bequeath connection) and execute a query on the v$logmnr_contents view (with a dummy condition such as TABLE_NAME = 'TABLENOTEXIT').

Here is the link for that blog post ->  https://ermanarslan.blogspot.com/2019/11/rdbms-oracle-log-miner-read-rate.html
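
As a quick reminder, a minimal version of that read rate test looks like the following. (This is just a sketch; the archived log path is a placeholder, and the DICT_FROM_ONLINE_CATALOG option is an assumption -- use whatever dictionary option matches your setup.)

SQL> exec sys.dbms_logmnr.add_logfile ( logfilename => '<full-path-of-an-archived-log>', options => sys.dbms_logmnr.new);
SQL> exec sys.dbms_logmnr.start_logmnr(options => sys.dbms_logmnr.dict_from_online_catalog);
SQL> set timing on
SQL> -- the dummy condition makes LogMiner scan the whole log without returning rows;
SQL> -- the elapsed time vs the size of the log gives us the read rate
SQL> SELECT COUNT(*) FROM v$logmnr_contents WHERE table_name = 'TABLENOTEXIT';
SQL> exec sys.dbms_logmnr.end_logmnr;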

This was an easy but efficient test to see whether Log Miner is performing well enough to feed our CDC process, but what if we have bad Log Miner read performance? Of course, the word "bad" may not be the best word for describing this, so let's say: what if we have an insufficient Log Miner read rate?
For instance, is 25 GB per hour sufficient for a Log Miner read? The answer is: it depends..
Of course, it is not sufficient in a mission-critical production environment which generates high redo (for instance, 30 GB per hour)..
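
By the way, we can get a rough picture of the hourly redo generation directly from v$archived_log. A simple sketch (assuming the archived logs of the last day are still recorded in the controlfile):

SELECT trunc(completion_time, 'HH24') hour,
       round(sum(blocks * block_size) / 1024 / 1024 / 1024, 2) redo_gb
FROM v$archived_log
WHERE completion_time > sysdate - 1
GROUP BY trunc(completion_time, 'HH24')
ORDER BY 1;

If Log Miner's read rate is consistently below the figures reported here, our CDC process will fall behind.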

Anyways.. While investigating these kinds of Log Miner read performance issues, I came across some facts that I want to share with you.

First of all, we analyze the Log Miner read performance by using AWR, by doing the Log Miner read rate test that I mentioned in the first paragraph, and by getting a trace of the session that executes the LogMiner queries (on v$logmnr_contents).

This is where we start actually.
The session trace is very important because it may give us some clues, such as unexpected waits that may point us to the reasons for the low performance.

SQL> spool <full_path>/lgmnr.log
SQL> set timing on
SQL> alter session set tracefile_identifier='LOGMNR_TRACE';
SQL> alter session set MAX_DUMP_FILE_SIZE = unlimited;
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> exec sys.dbms_logmnr.add_logfile ( logfilename => '<full-path-name-of-archived-log-file-name>', options => sys.dbms_logmnr.new);
SQL> exec sys.dbms_logmnr.start_logmnr(<options>);
SQL> SELECT * FROM V$LOGMNR_CONTENTS WHERE ................
SQL> alter session set events '10046 trace name context off';
SQL> exec dbms_logmnr.end_logmnr;
SQL> spool off
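
The resulting trace file can be located from the same session by querying v$diag_info (11g and later):

SQL> SELECT value FROM v$diag_info WHERE name = 'Default Trace File';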


Alternatively;

SQL> alter session set max_dump_file_size = unlimited;
SQL> alter session set timed_statistics = true;
SQL> alter session set statistics_level = typical;
SQL> alter session set tracefile_identifier = 'SQL10046';
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> exec sys.dbms_logmnr.add_logfile ( logfilename => '<full-path-name-of-archived-log-file-name>', options => sys.dbms_logmnr.new);
SQL> exec sys.dbms_logmnr.start_logmnr(<options>);
SQL> SELECT * FROM V$LOGMNR_CONTENTS WHERE ................
SQL> alter session set events '10046 trace name context off';
SQL> exec dbms_logmnr.end_logmnr;
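
Either way, the raw 10046 trace can then be summarized with tkprof to see the wait events and the recursive SQL at a glance (the file names below are placeholders):

tkprof <trace_file>.trc <output_file>.txt sys=no waits=yes sort=exeela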


The internals of the Log Miner process are not perfectly clear on our side.. They are not published, and they are proprietary.. But with these tracing processes, we can actually get some info about those internals.

Log Miner does log file sequential reads -> for retrieving redo from the redo/archive logs.

It tracks the redo and archive info from the control file, so from time to time -> it does control file reads..

Log Miner also makes some consistent gets, and if you are working on RAC you can see -> consistent gets and gc cr requests.

In other words, while getting the redo from the redo/archive files, it also gets data from the dictionary (probably making consistent reads). In order to decide on the next redo/archive log file, it also does control file reads. Probably, it merges this redo and metadata in its memory (probably in the streams pool), and then sends the data to our client.

We also see recursive calls during a Log Miner session.. Those probably represent the actual work that is done by Log Miner.

So, every piece of this execution path should be analyzed to get an understanding of the possible reasons behind the low performance and to tune it as much as we can.
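
For instance, while the v$logmnr_contents query is running, we can check the wait profile of the mining session from another session. A simple sketch (the SID below is a placeholder):

SELECT event, total_waits, round(time_waited_micro / 1000000, 2) seconds_waited
FROM v$session_event
WHERE sid = <sid_of_the_mining_session>
ORDER BY time_waited_micro DESC;

If log file sequential read dominates, we look at the disk layer; if control file sequential read or consistent-read-related waits dominate, we look at the controlfile and the dictionary, as described below.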

In this context, suppose we have 220,000 records in the archived log section of our controlfile.. This is huge, right? In this kind of environment, we may see control file sequential read waits during our Log Miner-based CDC process.. So in this kind of situation, we need to clear those records somehow (most probably by recreating the controlfile).. Once we clear those records, we may expect at least a little increase in the Log Miner performance.. You know what I mean, right.. Anyways, this kind of check may be done by using the following query ->

select type, records_total from v$controlfile_record_section order by records_total desc;
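
For instance, the following checks focus on the archived log section and on the parameter that governs how long these reusable records are kept. (Note: control_file_record_keep_time affects the growth of these sections; the exact housekeeping behavior depends on the version.)

SELECT type, records_total, records_used
FROM v$controlfile_record_section
WHERE type = 'ARCHIVED LOG';

SQL> show parameter control_file_record_keep_time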

Of course, the biggest part of Log Miner's execution time is spent while gathering the redo information from the redo and archived logs. That's why the log file sequential read event is very important in this manner, and that's why we also do some analysis of the disk layer, the whole access path actually. At this point, analyzing the AWR reports comes into play. We need to have minimal latency for that event. This is because this wait event is incremented while the archiver is archiving, while Data Guard feeds its destinations, and while Log Miner is reading the redo content.
In a disk subsystem tuned for write-intensive work, we may see latencies in this event, which may make our Log Miner query take more time.
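
As a rough instance-wide check (outside of AWR), we can calculate the average latency of this event from v$system_event. A sketch:

SELECT event, total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) avg_wait_ms
FROM v$system_event
WHERE event = 'log file sequential read';

Single-digit milliseconds (or less) is generally what we want to see here; anything much higher deserves a closer look at the storage.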

Memory is important. Log Miner uses the streams pool, which is configured with the streams_pool_size parameter.. Having a low streams pool size in a busy environment may decrease the Log Miner performance. Log Miner has almost no tuning parameters, so it may be affected by the general instance performance as well.. With this in mind, we need to size our SGA and PGA properly.. We may see degraded Log Miner performance in instances where we have a severely undersized SGA and/or PGA.
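
A quick way to check the current streams pool configuration and its actual size (the 2G value in the last command is purely illustrative, not a recommendation, and it assumes an spfile is in use):

SQL> show parameter streams_pool_size

SELECT pool, round(sum(bytes) / 1024 / 1024) size_mb
FROM v$sgastat
WHERE pool = 'streams pool'
GROUP BY pool;

SQL> alter system set streams_pool_size = 2G scope = both;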

Log Miner also has some performance-related bugs, which are fixed with patches. In general, there is no harm in applying the latest PSU to the database, and it is always recommended. But there are also specific cases, such as;

Bug 18674047 : LOGMINER KEEPS SPINNING ON CPU AFTER UPGRADE TO 11.2.0.4
Bug 19277582 - Logminer hangs on IOT when specifying COMMITTED_DATA_ONLY

In such cases, we need to apply the necessary patches and check the Log Miner performance once again. Having a supported release and paying for the error correction of the extended support is important here, especially for the 11.2.0.4 customers.. Again, Oracle doesn't do any new development for 11.2.0.4.. So that's it: if we are on 11.2.0.4, we can't proceed with Oracle Support in a case where we have a patch conflict.. In other words, Oracle doesn't develop a new patch to resolve that conflict. In such a case, we are forced to upgrade our database to a version that has the relevant bugfix...

Anyways, there is another point that we need to consider while dealing with Log Miner performance issues: the database statistics.. I mean the dictionary and fixed object stats, especially when we see consistent read waits and gc cr requests during the v$logmnr_contents query.. Again, we come back to the fact that Log Miner has no performance parameter settings, so database performance tuning is the best option...

In this context, we should gather the stats using the following approach;

Note that we lock the stats for X$KCCLH and then gather the fixed object stats..
Also note that gathering stats for X$LOGMNR_CONTENTS requires a Log Miner session to be present.. Actually, I don't find it very effective, but Oracle Support told us to do that..
Normally, we don't expect the usual execution plan for X$LOGMNR_CONTENTS (FIXED TABLE FULL -> X$LOGMNR_CONTENTS) to change after gathering stats on it..
But gathering dictionary stats and gathering fixed object stats are effective actions, because there may be Log Miner queries in the background which may use these stats.

connect / as sysdba
SQL> exec dbms_stats.gather_dictionary_stats;
SQL> exec dbms_stats.lock_table_stats (null,'X$KCCLH');
SQL> exec dbms_stats.gather_fixed_objects_stats;
SQL> exec dbms_stats.gather_table_stats('SYS', 'X$LOGMNR_CONTENTS');

So far so good. We have seen different areas for tuning Log Miner's performance.
Well, the biggest positive impact on Log Miner's performance actually comes from the database and OS stack. The load average is directly related to the Log Miner performance. It is as simple as that.. This is of course valid for the environments which are in intensive use and which actually have some technical room for Log Miner performance improvement.. When we decrease the load average by tuning our database, application and SQL statements, we get a better Log Miner read rate..

Ultimately, Log Miner depends on computational resources, both I/O and CPU..

We have tested Log Miner performance in different time intervals and at different load averages.. (By the way, I'm talking about the load average reported by the OS..) We have seen that when the load goes down, Log Miner performance increases. In other words, Log Miner utilizes more resources, and this is actually a good point for Log Miner, as it can increase its performance almost linearly with the increase in available computational resources.
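
By the way, we don't even have to leave sqlplus to sample the load average; v$osstat exposes it, so we can snapshot it right before and after the read rate test:

SELECT stat_name, value
FROM v$osstat
WHERE stat_name IN ('LOAD', 'NUM_CPUS');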

Of course, there can be different reasons that may result in a heavily loaded system, other than poor execution plans and inefficient database tuning.. The hardware itself may be a reason for that as well. Not only the sequential read speed of the disks where the archivelogs and redologs are placed, but also the CPU itself may be the reason.

For instance, in SuperCluster (and on T7, S7, M7 and M8 servers) there is a known issue about degraded performance caused by a reduced power mode.. (cores in reduced power mode)

It is documented in Oracle Support, and we check for this by using a python script named clockrate.py.

T7, S7, M7, T8 and M8 Servers May Experience Degraded Performance with Cores Misprogrammed in a Reduced Power Mode (Doc ID 2350944.1)

Similarly, if our database is running on a consolidation platform, and if there are other databases there with too much demand from time to time, then our load may increase and we may see degraded performance in our Log Miner sessions (even though our database itself is idle).

It is needless to say that network performance is also important, especially the network between the CDC client and the Log Miner (the database).

In conclusion, what I want to mention is this: while dealing with these issues, we follow an approach named IDT. Actually, we named it so :) So, we follow the 3-stage approach -> Identification, Diagnosis and Treatment.


Most of the time, we identify the problem using monitoring techniques, the Log Miner read rate tests and a bunch of queries for getting a detailed view of redo generation. We diagnose the problem, evaluate the reasons from every angle, and then we build our action plan and execute it...

That's all for now. I hope this blog post helps you gain some insight into dealing with Log Miner performance problems..
