As you may guess, these blog posts will be the last blog posts of the year :)
This one is for Goldengate.
Recently, started working for a new company and nowadays, I deal with Exadata machines and Goldengate more often.
Yesterday, analyzed a customer environment, where Goldengate was not performing well.
That is, there are was a lag/gap reported in a target database, in which, 55-60 tables were populated by Goldengate 12.2.
When we analyzed the environment, we saw that, it was not the extract process or network which was causing the issue.
The REPLICAT process was also looking good in the first glance, as it was performing well on its trail files.
However, when we check the db side, we saw that there was a lag around 80 hours.. So target db was behind the source db with a 80 hours difference.
We analyzed the target database, because we thought that it might be the cause.. I mean, there could be some PK, or FK missing on the target environment.. (if the keys are missing, this can be a throuble in goldengate replications). However, we concluded that, no keys were missing.
In addition to that, we analyzed the AWR reports, we analyzed the db structure using various client tools (like TOAD) and we check the db parameters, but -> all were fine..
Both source and target databases were on Exadata. AWR reports were clean. Load average was so low, the machine was sleeping and there were almost no active sessions in the database (when we analyzed it real time)
Then we checked the goldengate process reports and saw that REPLICAT was performing very slow.
It was doing 80 tps , but it should be around 10000 tps in this environment..
At that moment, we followed the note, and check the replicat accordingly.
(Excessive REPLICAT LAG Times (Doc ID 962592.1))
We considered the things in the following list, as well:
- Preventing full table
scan in the absence of keys KEYCOLS
- Splitting large transactions
- MAXTRANSOPS
- MAXSQLSTATEMENTS
- Improve update speed - redefine tables - stop and start
replicat
- Ensure effective execution plans by keeping fresh
statistics
- Set Replicat transaction timeout
Unfortuneatly, no matter what we did, the lag was increasing..
Then fortuneatly :), we saw that all these 55-60 tables in the target db had columns with the type of
UROWID..
These columns were recently added to the tables by the customer.
We also discovered that, this performance issue have started, after these columns were added.
We wanted to change the column type, because these UROWID columns have recently become supported with Goldengate..
ROWID/UROWID Support for GoldenGate (Doc ID 2172173.1)
So we thought that these columns may cause the REPLICAT to perform with this low performance.
The customer was using these columns to identify the PK changes and accepted to change the type of these columns to VARCHAR2.
As for the solution, we changed the type of those columns to varchar2 by creating empty tables and transferring the data using INSERT INTO APPEND statements.
Thanks to EXADATA , it didn't take lots of our time and thanks to Oracle Database 12C, we didn't need to gather statistics of these new tables, since in 12C it is done automatic during CTAS and Insert Into Append..
After changing the column type of those tables, we restarted the REPLICAT and the lag was dissapeared in 2 hours.
So, be careful when using UROWID columns in a Goldengate environment..