Monday, October 31, 2016

RDBMS/EBS - DST patches, Turkey decided not to observe daylight saving time anymore

In the last few years, Turkey has changed its daylight saving rules more than once. In 2015, the government decided to delay the DST change by two weeks, and this year Turkey decided not to observe daylight saving time at all anymore.

Well, these kinds of changes bring us the responsibility to update our system environments; the timezone data and the DST rules, actually.

I have written articles about these DST changes and their effects over the last few years, so I will not go into the details of their effects in this article. Instead, I will give an overview, describe the things we have done this year to align our EBS systems with this DST rule change, and give you the support notes for dealing with these kinds of DST rule changes in the different layers of our system environments.

Before going forward, please read the following articles to get the necessary background...

Well, this year the change in Turkey's DST rules became more prominent, so Oracle and other vendors released lots of articles about the effects of this change and the things to be done to stay aligned with it.

The things to be done, however, differ according to our environments; more precisely, according to the datatypes and objects that our databases and applications use.

The MOS document named "Turkey Continues on Daylight Saving Time From 2016 - Impact on Oracle RDBMS (Doc ID 2185562.1)" clearly identifies the issue and gives the instructions for the affected environments.

The MOS document with ID 2185562.1 was written especially for the database tier, and it gives us a checklist to determine whether our database environments are affected by this change.

The main areas that may be affected were;

The DATE datatype and SYSDATE
Usage of TimeStamp with (Local) Time Zone datatype.
TimeStamp with Time Zone datatype
TimeStamp with Time Zone columns
Database timezone and Session timezone
Usage of JDK in Oracle Home
Usage of JVM in Oracle Database (with timezone and local time) -- EBS does not use this.
Usage of DBMS_SCHEDULER for custom and critical database jobs.
...

That is, although installing the OS tzdata rpm is mandatory, we could decide whether we needed to apply the DST patches for the database, OJVM, JDK in the Oracle Home and so on. We could decide by checking our database tier using the info and scripts given in the support note, 2185562.1.
So, at the end of the day, the decision was ours: if we were affected, we had to apply the patches; if we were not affected (not dependent on the items given in the above list), then we could decide whether to apply them or not.
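As a quick starting point for that check, queries like the following show the timezone file version the database uses and whether any application schema actually has time-zone-aware columns. This is only a sketch using the standard data dictionary views; the authoritative check scripts are the ones in note 2185562.1.

```sql
-- DST/timezone file version currently used by the database
SELECT version FROM v$timezone_file;

-- Database and session time zones
SELECT dbtimezone, sessiontimezone FROM dual;

-- Any TIMESTAMP WITH (LOCAL) TIME ZONE columns outside SYS/SYSTEM?
SELECT owner, table_name, column_name, data_type
FROM   dba_tab_columns
WHERE  data_type LIKE 'TIMESTAMP%WITH%TIME ZONE'
AND    owner NOT IN ('SYS','SYSTEM');
```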

However, this was only for the database tier. Well... think about EBS. In EBS, we have application tier components, right? We have FMW, Oracle Homes, the Concurrent Tier and so on.
So, as you may guess, we needed to follow a specific document for EBS.
There were two documents in this scope, and they differed from each other according to the EBS version they were written for.

Complying with Daylight Saving Time (DST) and Time Zone Rule Changes in E-Business Suite 12(Doc ID 563019.1)
Complying with Daylight Saving Time (DST) and Time Zone Rule Changes in E-Business Suite 11i(Doc ID 458452.1)

These documents actually contain the instructions of the DST document prepared for the RDBMS (2185562.1), but they also include instructions specific to EBS, under the section named Application Server Tier Patches.

The interesting and disappointing thing was that the patches documented in those notes were not available for the EBS application tier components. This was true even for the latest release, EBS 12.2.
The patches were not available for EBS because the application server components used in EBS are actually old versions.
The FMW home version, for example, was 11.1.0.7. The Oracle Homes were old as well.
It didn't matter whether we used EBS 12.2, 12.1 or 12.0... The patches were not there, and we needed to log Service Requests in order to obtain them.
This was quite disappointing, as it was true even for the latest EBS, which was 12.2.

The consequences of not applying these patches were also given in those documents. So again, it became a decision whether to apply these patches or not.

Here are the consequences;
  • If you enabled User Preferred Time Zone at your site, the Date with Time components displayed in interactive user interfaces might be incorrect.
  • The system generated dates and dates with time components might be incorrect. For dates without time components, an incorrect date might be generated. (this one stems from the OS level)
  • If you are using Gantt charts, time zone related functionality such as scheduling and representations in Gantt charts will be rendered incorrectly.
  • Unpatched E-Business Suite environments that have been integrated with external systems will process inbound and outbound data with time components incorrectly.
So this meant: if we were not using the things in the above list, we were not affected. That is, we could decide not to apply the patches on the EBS apps tier, and this is actually what we did in most of our customer environments.
So, at the end of the day, what we did was: check the database tier and the items in the consequences list above, see that the environments were not affected, and then apply the OS patches only.
We did this in more than 10 customer environments and didn't see an issue. (However, we did the checks first, right? So it may not be your case.)
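For the OS side, a quick sanity check on Linux looks like the following (package and zone names assumed for Oracle Linux / RHEL); after a tzdata update that includes the 2016 Turkey change, the zdump output should show Istanbul staying on +03, with no transition back to +02:

```shell
# Installed tzdata package version (must include the 2016 Turkey change)
rpm -q tzdata

# Show the last few recorded transitions for Istanbul;
# there should be no fall-back transition after 2016
zdump -v Europe/Istanbul | tail -5
```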

The instructions for dealing with the DST rule changes in Engineered Systems also differ. For example, Oracle released a separate document for Exadata: Exadata Instructions as Turkey Continues on Daylight Savings Time From 2016 (Doc ID 2195169.1).

Well, in short: in order to deal appropriately with DST rule changes, we need to discover our environments, check them component by component by following the documents written specifically for those components, and decide whether to take action or not. Of course, the safest way is to apply all the action plans for all the environments and components in our infrastructure, but if we conclude that we are not affected, the decision, and the risk, is still ours.

Thursday, October 20, 2016

EBS -- Workflow Mailer Problem parsing XML -> org.xml.sax.SAXException: Problem obtaining the RESOURCE content

You may encounter this issue on EBS 12.2 environments, after modifying the value of
WF_MAIL_WEB_AGENT profile.

This profile option specifies the host and port of the Web server that notification mailers use to generate the content for Oracle Applications Framework regions that are embedded in notifications. If this profile option is not set, notification mailers will use the same Web agent specified in the Application Framework Agent profile option.

I'm writing this because there are no recorded bugs for this issue in EBS 12.2. (Actually, there are bug records and fixes for these types of problems encountered in earlier releases.)

Look at the real life example below;

Error log entry in the Notification Mailer's log file:

oracle.apps.fnd.wf.mailer.NotificationFormatter.getFormattedMessages()]:Problem parsing XML -> org.xml.sax.SAXException: Problem obtaining the RESOURCE content -> java.net.MalformedURLException: For input string: "8000 "

Impact:

Notification emails cannot be delivered by the Workflow Mailer.

Take a closer look at the log :

java.net.MalformedURLException: For input string: "8000 "

You see the space there? I mean "8000 " -> there is a space at the end... This value comes from the WF_MAIL_WEB_AGENT profile. In this real life example, it was probably a copy&paste victim.
So, because of that space, Java throws a MalformedURLException.
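A quick way to spot such an invisible character is to select the profile value between brackets. This is only a sketch using the standard fnd_profile.value API (run as the APPS user); the example URL in the comment is hypothetical:

```sql
-- The brackets expose any leading/trailing whitespace in the value
SELECT '['||fnd_profile.value('WF_MAIL_WEB_AGENT')||']' AS web_agent
FROM   dual;
-- e.g. [http://myhost.mydomain:8000 ]  <- note the space before the ]
```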

Solution:

Modify the WF_MAIL_WEB_AGENT profile and delete the space from its value (better to delete the whole value and rewrite it), then restart the Workflow Notification Mailer.

That is it... That is the tip of the day. 

EBS/Web Discoverer 11G - Java 8 certification

Discoverer Web 11g is certified with the client side Java 8. (it is only for EBS customer connection to an apps mode EUL)

However, to be able to use Discoverer Web 11g with client side Java 8, the following should be done:

  • Discoverer should be upgraded to 11.1.1.7.0 (if it is not already at that patch level) and patch 20219003 should be applied.
  • TLS 1.1 and TLS 1.2 should be disabled in the client side Java as well (de-select TLS 1.1 and TLS 1.2 in the Java Control Panel).

Reference Note:JRE 8 will enable, by default, TLS 1.1 and 1.2 for SSL communication which may lead to an SSL error if those protocols are not configured. See: Discoverer 11g Plus Fails To Launch With Errors Using JRE 8 On The Client PC (Doc ID 1638346.1)

Tuesday, October 18, 2016

Practical Oracle E-Business Suite has been published!

Just published: my new book on EBS. http://www.apress.com/9781484214237

Some might say "the blog turned into a book", but that was not actually the case for me.
Our story began in the spring of 2015. Syeed Zaheer (my coauthor and friend) and I have written and reviewed approximately 800 pages since that time.

After all the effort, hard work, dedication and patience, I'm pleased to announce that my book, "Practical Oracle E-Business Suite", has been released and is now available on amazon.com.

Amazon Link: www.amazon.com/Practical-Oracle-Business-Suite-Implementation/dp/1484214234
Official website of Practical Oracle E-Business Suite : http://www.apress.com/9781484214237

This book is a complete Implementation and Management Guide for Oracle E-Business Suite.
Although we have focused on the latest EBS 12.2 release in this book, key areas in R12.1 are also covered wherever necessary.
At the time of writing this blog post, this book is the one and only book written for EBS 12.2.

It’s official! I’m an Apress Author.

Check out the new badge... It is such an honor for me to receive this badge...

Access Manager / EBS -- setting the default language

Another Access Manager post from me.. I have started to like these Access Manager issues; dealing with them is like solving puzzles.
Anyways, in this post I will write about changing the default language of the OAM login page and, indirectly, changing the EBS login language.

What we need to do is basically use wlst to connect to the admin server and then execute configOAMLoginPagePref, as shown in the following demo.

Suppose our EBS base language is English (en), but we need to make all the users log in to EBS in Turkish (tr), and suppose we need to display the OAM login page in Turkish as well.

[appoid@ermanoid bin]$ cd /home/app/oracle/middleware/Oracle_IDM2/common/bin
[appoid@ermanoid bin]$ sh wlst.sh
  
Initializing WebLogic Scripting Tool (WLST) ...

Welcome to WebLogic Server Administration Scripting Shell

Type help() for help on available commands

wls:/offline> connect()
Please enter your username :weblogic
Please enter your password :
Please enter your server URL [t3://localhost:7001] :t3://ermanoid.ermanexampledomain:7105
Connecting to t3://ermanoid.ermanexampledomain:7105 with userid weblogic ...
Successfully connected to Admin Server 'AdminServer' that belongs to domain 'oam_domain'.

Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.

wls:/oam_domain/serverConfig>  configOAMLoginPagePref(persistentCookie="false",persistentCookieLifetime=30,
langPrefCookieDomain="ermanexampledomain",langPrefOrder="oamPrefsCookie,browserAcceptLanguage,serverOverrideLangPref,defaultLanguage",
serverOverrideLanguage="tr",defaultLanguage="en",
applicationSupportedLocales="en,tr")

That's it. Clear your browser cache and you'll have a Turkish OAM login page.
Note that you will also have a drop down list to choose the login language.


Also, if you don't want to have a language selector on the OAM login page, execute the following.
Note that the following configuration makes the OAM page Turkish only:

wls:/oam_domain/serverConfig>  configOAMLoginPagePref(persistentCookie="false",persistentCookieLifetime=30,
langPrefCookieDomain="ermanexampledomain",langPrefOrder="oamPrefsCookie,browserAcceptLanguage,serverOverrideLangPref,defaultLanguage",
serverOverrideLanguage="tr",defaultLanguage="en",
applicationSupportedLocales="tr")

Note that there are also some EBS profiles that may affect the EBS language. For example: FND_OVERRIDE_SSO_LANG.

So if you are interested in knowing more about this topic, I suggest reading the MOS note: Integrating Oracle E-Business Suite Release 12.2 with Oracle Access Manager 11gR2 (11.1.2) using Oracle E-Business Suite AccessGate (Doc ID 1576425.1) - 5.4 Configure Languages for the Oracle Access Manager Login Page.

Also note that the following languages are not supported with OAM yet:

Hebrew - Bug 16901373 - Fixed in OAM 11gR1 Patchset 2.
Croatian and Canadian French - Bug 16920577
Albanian, Catalan, Cyrillic Serbian, Dutch, Egyptian, Icelandic, Indonesian, Latin Serbian, Lithuanian, Slovenian, Ukrainian, Vietnamese - Bug 16920613

OAM/EBS - Enable SSL in OAM 11g

Recently, I implemented SSL on OAM (Access Manager), which was integrated with an EBS 12.2 instance.
Although the documentation seems a little bit confusing, and although there are multiple ways to enable SSL in OAM 11g, I preferred to accomplish it in the following way:

  • First, I created an Oracle wallet by sourcing the OAM environment file and using owm. (I prefer creating a wallet, because we are used to it.)
  • Then, I created a certificate request using the wallet I created. (I used the details that my customer delivered to me as inputs.)
  • After creating the certificate request, I exported it and sent it to the customer to get the server, root and intermediate (if available) certificates.
  • Once I got them, I imported the customer's certificates into the wallet. (Note that the certificate format should be base 64, x.509 format, .cer files...)
  • Later on, I converted the wallet to a JKS (Java keystore), as FMW uses JKS. Example: source the OAM env and run ->
    • $MW_HOME/oracle_common/bin/orapki wallet pkcs12_to_jks -wallet ./mywallet -pwd welcome1 -jksKeyStoreLoc ./mywalletK.jks -jksKeyStorepwd  welcome1  -jksTrustStoreLoc ./mywallettrust.jks -jksTrustStorepwd welcome1
  • After then I jumped into the weblogic console and did all SSL configurations needed there;
    • Select Environment->Servers and click on the OAM managed server
    • Then Select the Keystores tab
    • Select Keystore -> Change
    • Select Custom Identity and Custom Trust from the drop down list and click Save
    • Enter the relevant information in the Keystores page:
    • Custom Identity Keystore : full path of the .jks file
    • Custom Identity Keystore Type : JKS (UPPERCASE)
    • Custom Identity Keystore Passphrase : welcome1
    • Confirm Custom Identity Keystore Passphrase : welcome1
    • Custom Trust Keystore : full path of the trust store .jks file
    • Custom Trust Keystore Type : JKS (UPPERCASE)
    • Custom Trust Keystore Passphrase : welcome1
    • Confirm Custom Trust Keystore Passphrase : welcome1
    • Click Save
    • After then select the SSL tab and enter the following
    • Private Key Alias : alias of the server cert gathered from the jks that was created earlier using keytool, it was orakey in my case
    • Private Key Password : welcome1
    • Confirm Private Key Password: welcome1
    • Click Save
    • Again, Select Environment -> Servers and click on the OAM managed server
    • Select the General;
    • Check SSL Listen Port Enabled
    • Change SSL Listen Port according to your env:  in my case; 14443 (use netstat to ensure that it is free)
    • Click 'Save'
  • Then I opened the oamconsole using adminserver:adminport/oamconsole and did the OAM Server port change (OAM port -> ssl port) and OAM Server host change (https) as documented in "https://docs.oracle.com/cd/E52734_01/oim/IDMIG/idm_ssl.htm#IDMIG32029", Section 4.2 Configuring SSL on Servers in the OAM Domain. Note that if you don't do that, the EBS login will still be redirected to the non-ssl OAM login page.
  • Lastly, restart the OAM managed server and Admin Server (just in case, remember, it is the OAM Admin server that redirects to OAM managed server during the EBS login)
  • You may need to reregister EBS using txkrun.pl, but it should not be necessary. Just in case, keep that in mind.
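Before filling in the SSL tab, it may be worth confirming the certificate alias actually present in the identity keystore. A small sketch with the standard keytool utility, using the paths and passwords from the example above:

```shell
# List the entries of the identity JKS and show their aliases
# (orakey was the alias in my case)
keytool -list -v -keystore ./mywalletK.jks -storepass welcome1 | grep "Alias name"
```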

Monday, October 17, 2016

OAM -- Change Logo in Login Page

We are about to complete an OAM 11g - EBS 12.2 Single Sign-On project, so I will write a comprehensive article about it, but here is a teaser for you.

In this post, I will give you the instructions for changing the default logo in the OAM login page.
This kind of thing, as you may guess, is the last thing to do in most projects. However, considering we are completing the project, it is actually time to change the logo :)

So, in order to change the logo, we just replace the login_logo.png, located under the following directories;

$MW_HOME/user_projects/domains/oam_domain/servers/oam_server1/tmp/_WL_user/oam_server_11.1.2.0.0/dcyk8m/war/pages/images

$MW_HOME/user_projects/domains/oam_domain/servers/AdminServer/tmp/_WL_user/oam_admin_11.1.2.0.0/7po8nl/war/images --> this one is optional, as it is for the Admin Server, which is used only for the Weblogic Console / administration console login.

--these directories may differ according to your environment, but the directory paths should be almost the same as the above.

So, we replace the login_logo.png images in the related directories and then restart the OAM managed server.. (restart the admin server if you want to change the logo of Weblogic Administration Console, as well)
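Since the _WL_user subdirectories (dcyk8m, 7po8nl and the like) are generated names that differ per environment, a find command is a safe way to locate every copy of the logo before replacing it. A sketch, assuming the domain layout above:

```shell
# Find all copies of the default login logo under the OAM domain home
find $MW_HOME/user_projects/domains/oam_domain/servers \
     -name login_logo.png 2>/dev/null
```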

Exadata -- RAC Extended (Stretch) Clusters, why is it unsupported?

Last month, I wrote an article on Active-Active datacenters (http://ermanarslan.blogspot.com.tr/2016/09/rdbms-active-data-center-from-oracle.html).

In that article, I gave the concepts for building active-active datacenters. The most exciting concept given there was RAC Extended Clusters, also known as Stretch Clusters.

RAC Extended Clusters are applicable when the distance between the sites is not more than 25 km (for non-Exadata environments); this is a RAC configuration in which the nodes can be in different sites.

So, I introduced RAC Extended Clustering very briefly, and now I want to discuss building RAC Extended Clusters on multiple Exadata machines located in different data centers. (Supposing the distance between these data centers is <= 100 meters; the Infiniband network currently has a limitation of 100 meters.)

First of all, Oracle Real Application Clusters on Extended Distance Clusters is not supported with Exadata. So, basically, it is not supported to build  RAC Extended Clusters on multiple Exadata machines.

I want to write on this topic because a realistic design like the one depicted below actually can't be built because of this support issue.

Although it seems realistic and efficient, an architecture like the one above can't be actualized, because RAC Extended Clusters are not supported with Exadata.

The discussion I am talking about is actually about the reasons behind this lack of support.

First of all, let's review the most crucial things that are required for building a stable RAC extended cluster:

1) A fast and dedicated connectivity between the nodes and the sites is required for the Oracle RAC inter-instance communication. (This is okay, supposing the distance between the data centers is below the IB network limitation.)
2) A tie-breaking voting disk needs to be placed in a third site. (This can be established as well.)
3) A host based mirroring solution should be used to host the data on site A and site B and to make sure that the data between the two sites is kept in sync. The storage at each site must be set up as separate failure groups and use ASM mirroring, to ensure at least one copy of the data at each site. However, in Exadata, the disks in each cell are in separate failgroups; it is not the case that all the disks in each storage (all cells in each Exadata) form one separate failgroup.
4) The fast interconnect is very important. So when you have an Exadata, you can't tolerate any slowness there... (At least, if I were Oracle, I wouldn't want to take that risk.) Any bottleneck in the interconnect between two different sites will go against the paradigm of Engineered Systems and will make Exadata look bad. Look at the Oracle whitepaper "http://www.oracle.com/technetwork/products/clustering/overview/extendedracversion11-435972.pdf"; it actually explains "Oracle RAC One Node", which doesn't require a very fast interconnect... You see the point.

So, you see? Actually, the 3rd requirement given above is the thing that causes Extended Clusters to be unsupported with Exadata.

That is, we have to use ASM with Exadata, and with ASM we can have only Normal (2 mirrors) or High (3 mirrors) redundancy. Also, each storage cell in an Exadata is a failure group, so ASM places mirrored data in different cells, but it is not guaranteed that ASM will place the mirrored data in cells located in different Exadata machines.

So, in other words, we don't have full control over the distribution of the mirrored data.
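This cell-per-failgroup mapping is easy to observe from an ASM instance. A diagnostic sketch using the standard v$asm views; on Exadata, each failgroup in the output corresponds to one storage cell:

```sql
-- How ASM maps disks to failure groups, per diskgroup
SELECT dg.name AS diskgroup, d.failgroup, COUNT(*) AS disk_count
FROM   v$asm_diskgroup dg, v$asm_disk d
WHERE  dg.group_number = d.group_number
GROUP  BY dg.name, d.failgroup
ORDER  BY 1, 2;
```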

The question that comes to mind here is: can't we extend the failure group definition of each Exadata machine to include all the disks in that Exadata machine? Well, this is also not supported.

The reasoning behind this lack of support is as follows:
Failure groups define ASM disks that share a common potential failure mechanism.
Do two different disks on two different cells have many potential failures in common? Actually no, compared with two different disks located in the same cell...

A cell includes its own controller, power, OS, patch level and dependencies.
So extending the ASM failure group definition to span multiple cells is just not aligned with the concept. (Remember, an Exadata is not an ASM failure group; it is a group of failure groups.)
If it were possible to define all the disks in one Exadata to be in a single failure group, then it would also break the Engineered Systems paradigm, right?
The concept to be used here should actually be groups of failure groups, but such a concept does not exist.

The good news is that RAC Extended Clusters will probably be supported with Exadata in Oracle Database 12.2 (12cR2)... It is not certain yet, but it is expected. So we will see.
By the way, I am still discussing this support question with Oracle Support. As you may guess, I will revisit this blog post if I get some additional info.

Sunday, October 16, 2016

EBS 12.2 - Turkish Reports Image problem - REP-0069, REP-62204, REP-50125, REP-002

Recently, I got a reports issue on EBS 12.2 (the exact version was 12.2.4). The issue was related to the images attached to Oracle Reports that are configured to be executed by the Concurrent Managers. The issue appeared only when the users who submit the related concurrent programs logged in in Turkish.

The reports (REP*) errors reported in the error logs were;

REP-0069: Dahili hata (Turkish for "internal error")
REP-62204: - Unable to render RenderedOp for this operation. resmi yazılırken dahili hata. (Turkish for "internal error while writing the image")
REP-0069: Dahili hata
REP-50125: İstisna saptandı: java.lang.NullPointerException (Turkish for "Exception detected")
REP-0002: Unable to retrieve a string from the Report Builder message file.


Anyways, when I did my research, I saw that the issue was already documented in the MOS note: Turkish Program Error REP-62204: - Unable To Render RenderedOp For This Operation (Doc ID 2003254.1).

It seemed that the Reports code written for image operations was not robust enough to handle Turkish. (I mean, when the NLS_LANG is internally set to Turkish, problems may arise in reports configured to include images in their outputs.)

The recommended solution was actually a workaround  -> "The workaround for this issue is to recreate the report using the bundled XML Publisher / BI Publisher functionality instead."

But who can, or wants to, convert all the Oracle Reports to XML Publisher / BI Publisher reports in the middle of a project, right?

So, this is the question that made me write this blog post.

Actually, there is a quicker and more easily implementable way to work around this issue.

That is, exporting REPORTS_OUTPUTIMAGEFORMAT as gif in reports.sh, which is located under the ORACLE_HOME/bin directory, is also a working workaround.
After setting this environment variable, there is no need to restart anything; just re-executing the report (or the concurrent program that executes the report) is enough to get rid of this problem.
 
exporting the REPORTS_OUTPUTIMAGEFORMAT to gif -> export REPORTS_OUTPUTIMAGEFORMAT=gif
This workaround is tested and verified.
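For reference, the change amounts to adding one line to reports.sh. The exact placement is up to you (near the other exports seems natural), and as always, back up the file first:

```shell
# In $ORACLE_HOME/bin/reports.sh of the EBS tools Oracle Home:
REPORTS_OUTPUTIMAGEFORMAT=gif
export REPORTS_OUTPUTIMAGEFORMAT
```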

Although I haven't tested them yet, there may be other workarounds (in reports.sh -> exporting REPORTS_DEFAULT_DISPLAY=NO, or exporting an NLS_LANG other than Turkish).

Well...That is the tip of the day:) I hope, you find it useful.

Tuesday, October 11, 2016

RAC -- using different OS versions: supported or not?

This blog post was written regarding a question that one of my customers asked a few days ago.
The question arose when the customer wanted to utilize new servers for a mission critical 2 node Oracle RAC cluster environment.
The new servers had the same CPU architecture (Intel x86) as the current ones, but their operating system was different.
So, the current machines were running Oracle Linux 5.8, but the new ones were installed with RHEL 6.

The customer wanted to expand the RAC by adding these new RHEL 6 machines using addnode.sh and then delete the old ones, in order to minimize the downtime that renewing the RAC nodes might otherwise require.

The question was: can we use different OS versions (considering Oracle Linux is equivalent to RHEL) on RAC nodes, at least during this migration?

Well, I am sharing the answer because it is interesting, and it is something I needed to research before answering.

So, with RAC we have these rules:
  • The machines in a cluster can have different speeds and sizes.
  • Oracle Clusterware and RAC do not support heterogeneous platforms (machines with different chipsets, e.g., one Intel machine and one SPARC machine). However, you can have one node using an Intel 64 processor and another node using an AMD64 processor in the same cluster, because the processors use the same x86-64 ISA. (Interesting.)
  • Oracle Clusterware and RAC support mixed servers in the same cluster, but these servers must run the same OS (same OS binary).
  • The CPU architecture should be the same across the machines. This is because running 32-bit and 64-bit software versions in the same cluster stack is not supported.
The above rules and statements are good to know, but the following were crucial for answering the customer's question:
  • Oracle supports the rolling upgrade of the OS in a cluster when the new OS is certified with the version of the Oracle Database you use.  However, there is an important note regarding this; that is; "mixed operating system versions are only supported for the duration of an upgrade (i.e., within 24 hours)"
  • Additional note: During rolling upgrades of the operating system, Oracle supports using different operating system binaries when both versions of the operating system are certified with the Oracle Database release you are using.
Well, in the statement quoted above, it is clearly said that this is possible for a rolling upgrade of the OS. But the customer wanted to know whether they could add new RAC nodes to the cluster and then delete the old ones.

Well, in a rolling upgrade, we still have the upgraded node running with the other nodes for a period of time, although those other nodes are not upgraded yet.
So it is similar to adding new nodes (with a newer version of the OS) to a cluster and then removing the old ones.

At this point, the answer seemed to be YES. It seemed supported (but only during the renewal of the nodes).
That is, the customer could have different OS versions during this work, but only during this work... (By the way, I haven't tested this yet, so I still find it risky; the cluster verification checks may even prevent the customer from adding a node with a different OS version.)

However, since it is not clearly documented, what seemed to be recommended was to follow an approach like "upgrade the OS on the current nodes, then add the new nodes to the cluster".

So, what was the question again?
The question was: can we use different OS versions (considering Oracle Linux is equivalent to RHEL) on RAC nodes, at least during this migration?

My answer: maybe you can :) but don't do it.
Your new environment is RHEL 6 and your old one is RHEL 5, so it is risky; there is a major version change there. (If the old one were RHEL 5.1 and the new one RHEL 5.2, then my answer might be YES.)

So, do it this way -> Upgrade OS on your RAC nodes  in a rolling fashion and then use addnode.sh to  add the new ones. (and then delete old ones..)

Friday, October 7, 2016

RAC -- interconnect and gc waits, are select statements affected?

Recently, I dealt with a cluster wait issue on a RAC database which was serving as the production database of a bank.
During the work, a question was asked by the folks there, and I felt that there was a misconception about the principle called "writers do not block readers".
They thought that the readers, who are actually using only select statements, should not be affected by the cluster waits... (which is not true, by the way).
"Writers do not block readers" is a true concept, but it has some additional and undeniable cost when the system is an Active-Active cluster (i.e., RAC).

Look at this AWR:
Top User Events:

Event                           Event Class    % Event   Avg Active Sessions
gc buffer busy acquire          Cluster          75.00   33.73
enq: TX - row lock contention   Application       7.55    3.40
enq: TX - index contention      Concurrency       3.59    1.61
gc cr block lost                Cluster           2.94    1.32
gc buffer busy release          Cluster           2.62    1.18

See the top waits?

Look at the query recorded in the AWR (the SQL ID is not shown here):

PlanHash     Sampled # of Executions   % Activity   Row Source            % RwSrc   Top Event                % Event   SQL Text
1219140043   143                       32.26        INDEX - UNIQUE SCAN   32.22     gc buffer busy acquire   30.75     SELECT count(*) FROM BLA_BLA...

You see the wait? (Note: I will not go into the details of this gc buffer busy acquire wait; it is beyond the scope of this blog post. What I will say is that it is there because of contention, and the select recorded in the AWR is affected by it.)

It definitely means contention, and we may be talking about a read-write contention here, which can happen in a RAC system for sure. (Note: I didn't copy and paste the DMLs here, but trust me... they are there.)

While instance B is modifying a block, if you try to read (even only read) the same block from instance A, then you may see these waits.

Here is what happens in a scenario like this;

Instance A sends a request to the GCS process, which manages this locking.
GCS checks the condition of the block and sees that it is locked by instance B.
GCS tells instance B: release your lock.
If the session in instance B does not want to release its lock (if the session is still working with the block), it creates a consistent image of the block in its buffer cache and sends this consistent image to instance A (a consistent read).
This consistent image reflects the state of the block just before the modification.
This consistent image, as you may guess, is sent over the private interconnect, and instance B reports this delivery to GCS.

Well...We can say that this situation can be called a remote cache hit, and the turn of events explained above is caused by the read-write contention.

Note 1: A remote cache hit costs more than a local cache hit.
Note 2: In order to have more local cache hits, it is recommended to use a configuration that places the session that needs a block and the session that holds the lock for that block in the same RAC instance.
Note 3: When we talk about a write-write contention, the LGWR, redo flush and PI (Past Image) creation may also come into play. So it costs even more.

What can be done to prevent these inter-instance contentions?

Partitioning, reverse key or hash partitioned indexes, and so on.
However, the best approach would be building a RAC configuration that places the session which needs a block and the session which holds the lock for that block in the same RAC instance. (For example, if we have 3 instances and we see that the data blocks of the same objects are frequently transferred from instance 3 to instance 2, we should ask: why don't we dedicate the related service to instance 2 and prevent this from happening, right?)
In order to make this happen, the approach to follow is to divide the applications into modules and to configure these modules to connect to db services which are dedicated to instances.
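With srvctl, dedicating a service to an instance is a one-liner. A sketch with placeholder names (mydb, finance_svc), where instance 2 is the preferred instance and instance 1 is only a failover target:

```shell
# -r: preferred instance(s), -a: available (failover) instance(s)
srvctl add service -d mydb -s finance_svc -r mydb2 -a mydb1
srvctl start service -d mydb -s finance_svc
```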

So, this blog post is a specific one. The main idea here is: a reader must not see the changes that are done but not yet committed by a writer. Although RAC is based on a shared disk/interconnect architecture, it has non-shared memory which acts like a shared one -- cache fusion.
Cache fusion is a very sophisticated mechanism, and there are serialization mechanisms that come into play to provide it. In brief, cache fusion depends on in-memory block locking, I/O, the interconnect and interprocess communication. So it is costly.

Note: in the next few days, I will be writing a more detailed blog post about RAC. So, a more detailed article about RAC is on its way to you.

Update:

Keep in mind that, for Oracle Database versions < 11.2.0.4, there are lots of bugs reported around these issues.
Doc ID 1907369.1 is one of the documents reporting them.
So, if you see a session getting stuck on "gc current request", you may be hitting those bugs.
The recommended solution for those bugs is killing the stuck session or, better, upgrading the database to 11.2.0.4, as most of those bugs are fixed in that release.

Exadata -- About Exadata SL (Sparc Linux)

A new Exadata named Exadata SL6 seems to have been released.
The name comes from SPARC Linux (SL).
It is different from the other Exadata X machines, as in Exadata SL, Oracle uses SPARC.
It is pleasing to know that although the CPU architecture is SPARC, the operating system is still Linux.
The machine has SPARC M7 CPUs, and these CPUs are used for compression/decompression, as well as providing security in the CPU and increasing SQL query performance. (Software in Silicon, SQL in Silicon, Security in Silicon)

I would like to do a POC (EBS or Analytics) and write a more detailed blog post about it . 


According to my current knowledge, the specs of the machine are as follows:

Ultra fast Sun Servers: SPARC T7-2
Fastest Processors : 32 Core Sparc
Fastest Analytics : SQL-in-Silicon
Fastest Storage : Exadata
Fastest Network : Infiniband
Most Secure : Security in Silicon
Same OS as Exadata X6: Linux
Same cost as Exadata X6