Thursday, April 28, 2022

EBS 12.2 -- Using "SOAP" w/ ISG (Integrated SOA Gateway) -- Requires "SOA Suite"..

In EBS 12.2, we need to have SOA Suite installed and configured to be able use SOAP web services.. So don't be surprised. For using the REST Web Services, nothing extra is required, but for SOAP, we need SOA Suite and we need to license it.. 

Actually, it is impossible to miss it, because while you are configuring the ISG using the following MOS document; you will see it in the section named "Section 3.1: Installing Oracle WebLogic Server and Oracle SOA Suite for SOAP Services" under the part named "Configuring Oracle E-Business Suite SOAP Services"

Ref: Installing Oracle E-Business Suite Integrated SOA Gateway, Release 12.2 (Doc ID 1311068.1)

EBS 12.1 customers don't have to worry about that. This requirement of having SOA in place is the result of the technological changes in EBS 12.2 (namely Weblogic.) 

This was the second tip of the day:) (or should I say the first warning of the day:)

EBS 12.2 -- BNE_CREATE_DOC_ERROR, java.lang.NoSuchMethodError for .addToolPack --> "BNE: Enable OOXML Standard"

If you are facing the following issue while using the Journal wizard, (supposing your poi jar and the codes that calls the methods in it, are aligned in terms of patch levels..), just set set the profile option -> "BNE: Enable OOXML Standard" to No, restart Apache for changes to take effect and retest..

I mean ; if you login with sysadmin, connect to General Ledger Expert user - > Journals -> Launch Journal Wizard and you got the following error;

BNE_CREATE_DOC_ERROR Causing below error class java.lang.NoSuchMethodError: org.apache.poi.xssf.usermodel.XSSFWorkbook.addToolPack(Lorg/apache/poi/ss/formula/udf/UDFFinder;) Please contact your system administrator for assistance.

Just a background info for the profile ;

Ref : Oracle Support

Beginning in Release 12.2.5, you can choose to create Oracle Web Applications Desktop Integrator spreadsheets according to the Office Open XML (OOXML) standard, an XML format developed by Microsoft to represent spreadsheets and other types of documents. When using the OOXML format, Oracle Web Applications Desktop Integrator creates the spreadsheet entirely on the Oracle E-Business Suite server, and the completed spreadsheet is then downloaded to the desktop. Upload processing is also performed on the server. With this option, no macros are used during document creation and limited macros are used during upload. Consequently, the Trust access to the VBA project object model security option is not required to work with OOXML documents. Also, because the processing is performed on the server, document creation and upload use minimal desktop resources.

Remember, you can always debug these processes using the following method;

BNE Server Log Filename : BneLogger.log (ensure that file doesn't exist and if it does, delete it)

BNE Server Log Level : TRACE

BNE Server Log Path : free to choose, eg. in the IAS_TOP/Apache/Jserv/logs (use the full path)

-- Restart the Apache Server for changes to take effect.

cd $INST_TOP/admin/scripts/..

$ ./adapcctl.sh stop

$ ./adstpall.sh start

Okay.. This was the quick tip of the day : )

Wednesday, April 20, 2022

EBS / RDBMS -- High Latch Free Waits --> "Result Cache: RC Latch"

I have dealt an interesting performance issue recently.  The environment was an EBS R12 ( 12.2.6 ) .. EBS database was an Oracle 12.1.0.2 running on Solaris SPARC.

The issue was appearing in the office hours. 

Every business day around 9:30 AM, there were Latch Free waits and this situation was making the end user face a performance degradation and complain..

We had the Enterprise Manager, AWR and ASH reports. The cause aboveground was Latch Free waits, but the reason behind was the Result Cache. (Actually the improper use of the  Result Cache...)

If you don't have the necessary background info about the Result Cache, please read the following blog post for a quick enablement-> 

https://ermanarslan.blogspot.com/2019/10/rdbms-things-to-know-about-result-cache.html

(RDBMS -- things to know about the Result Cache / Result Cache : RC latch)

In this case, the following PL/SQLs were waiting on the Result Cache: RC Latch;

*FND_AOLJ_UTIL.is_Valid_ICX(:2,NULL,'Y',TRUE,TRUE,NULL,NULL,NUL)

*BEGIN   fnd_global.bless_next_init('FND_PERMIT_0006');   

 *FND_GLOBAL.APPS_INITIALIZE(    user_id=>:1    ,resp_id=>:2    ,resp_appl_id=

  >:3    ,security_group_id=>:4    ,server_id=>:5);END;

Look like a login issue right? I mean, it looked like, there was a login storm which was expected in the morning and it was the one that triggers the issue. (and actually it was the one that is becoming the victim of it..) Why do I say that? Because I saw those PLSQLs that were used in session validation, apps initialization and so on.. And! we were seeing the issue in the office office around 9 AM..

In these types issues, customers chase the quick wins at first..

So what might be the solution?

*Increasing the Result Cache Size?

*Disabling the custom code? (there was a Custom SSO on the login) 

*Gathering fixed object stats and dictionary stats?  - ref: Some Queries Are Causing Performance Issues in ISG (Doc ID 2396223.1

*Setting some hidden parameters -> alter system set "_optimizer_dsdir_usage_control" = 0;  -- ref: High Latch Free Result Cache RC LATCH On Object Linked Views (Doc ID 2245996.1) 

*Some more hidden parameters -> *alter system set "_optimizer_ads_use_result_cache" = FALSE; High "Latch Free" Waits with Contention on 'Result Cache: RC Latch' when RESULT_CACHE_MODE = MANUAL on Oracle 12c (Doc ID 2002089.1)

*Doing some other improvements -- for decreasing the latching -- moves like changing the cursor_sharing parameter --- not related actually..

*Enabling session trace for EBS users? -- doesn't tell us much in this case.

Log onto System Administrator Responsibility

Navigate: Profile > System

Search for the profile option Initialization SQL Statement - Custom (USER level. )

Enter the following sql statement into the User Level profile option. 

--modify USERID to the actual username of the user.

BEGIN FND_CTL.FND_SESS_CTL('','', '', 'TRUE','','ALTER SESSION SET TRACEFILE_IDENTIFIER='||''''||'USERID' ||''''||' EVENTS ='||''''||' 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 '||''''); END;

*Patches? -- It was a good try..
Patch 26436717  (Database/RDBMS Home Patch) and Patch 17305006 - SEVERE PERFORMANCE SLOWDOWN WHEN RESULTS CACHE IS THRASHING is needed.
Patch 27508480: CONNECTION LEAKS FROM WSH.SERVLET.REQUESTRATES değerlendirilebilir. 
PAY122:CONNECTION LEAKS WHEN ACCESSING ONLINE W-2 IN 12.2(Patch 22684327:R12.PAY.C) 
Patches 28831943 and 26599059 for FND 
Patches 25969099 and 20141119
BNE Is Leaking Connections In12.2.4 --- AD/TXK 7, ATG 5 And HRMS PC.C.7 (Doc ID 2141923.1)
21771287: CONNECTION LEAKS FROM GWY.JSP - SIMILAR TO BUG 20446583 -- değerlendirilebilir
R12.ATG_PF.C.delta.6 Patch 21900895 or greater has been applied which includes the latest changes for BNE memory allocation.

I could keep going , but actually no need. Well... That's because, we have considered all of the things above and we actually did some of them, but we couldn't get any benefit..

-->This time, we were not looking for a needle in a haystack. (that is actually pretty easy, because a needle and a haystack are very different).. We were looking for a particular needle in a stack of needles :) -- I like this phrase by the way.

We were almost sure that, the solution should be directly related with the result cache or to those (some EBS code) using it.
Disabling the Result Cache in the database level will resolve the issue, but we were not so into that.

What we did was the following; we first made the result cache off by using the bypass mode.
Then we flushed the Result Cache and made in on again.
Once the Result Cache is flushed, we started monitoring it..

For checking the current result cache
-----------------------------------------
SELECT dbms_result_cache.status() FROM dual;

To enable the bypass mode and flush:
-----------------------------------------
BEGIN
   DBMS_RESULT_CACHE.BYPASS(TRUE);
   DBMS_RESULT_CACHE.FLUSH;
END;
/

To disable the bypass mode
-----------------------------------------
exec DBMS_RESULT_CACHE.BYPASS(FALSE);

We saw that the Result cache is occupied by the FND_PROFILE mostly..
There was no space pressure on the cache but things like FND_PROFILE.GET_PROFILE_APPL_ID and FND_PROFILE.GET_PROFILE_ID was there on the top..

Query used for Result cache monitoring:

select substr(name, 1,100) name,
cache_id,
count(*) result_count,
round(avg(scan_count)) avg_scan_count,type,status
from gv$result_cache_objects
where type = 'Result'
group by name, cache_id,type,status

A culprit from the monitoring output:

NAME                                                                            
--------------------------------------------------------------------------------
CACHE_ID                                                                        
--------------------------------------------------------------------------------
RESULT_COUNT AVG_SCAN_COUNT TYPE       STATUS                                   
------------ -------------- ---------- ---------                                
"APPS"."FND_PROFILE"::11."GET_PROFILE_APPL_ID"#32fb3b6bdac49c05 #245            
85njc349fpj4gc54hq8ca2u41q                                                      
         691         272415 Result     Published                                
                                                                                
"APPS"."FND_PROFILE"::11."GET_PROFILE_ID"#32fb3b6bdac49c05 #216                 
6m1zcmdsv8v40c201cvbaa3y8u                                                      
         721         261123 Result     Published                                
                                                                                
"APPS"."FND_PROFILE"::11."GET_LEVEL_ENABLED_FLAG"#ba365d4c80ddb2d5 #275         
4b86jg87f0gkxafawmtmr29dh4                                                      
        4746         139188 Result     Published                                
                                                                                
"APPS"."FND_PROFILE"::11."GET_HIERARCHY_TYPE"#a0e3bfe28f4de20f #326             
1q2vx7pv7jvc18uksjffyf6vb9                                                      
         678         139164 Result     Published                        

Then , we jumped into the code.. The standard code and found that FND_PROFILE was written in a way that is implicitly use the result cache.. We saw the usage of cross-session PLSQL Fuction Result Cache in multiple places of the code. (the RESULT_CACHE Clause) 
Those FND_PROFILE calls were done mostly during the login and we concluded that, when there is a login storm and when there is a DML that may affect the data/FND_PROFILE related data that is already stored in the result cache, those waits for the Result Cache Latch start appearing..
Note that, we also had some cached things that belongs to the WF_CORE, but this wasn't the issue..

Anyways, as as workaround we wrote a script (job, cron job etc.. -- you can think whatever you like) to flush the result cache every morning just before the issue happens.. (The issue was mostly seen after 9 am..)

We could also blacklist the relevant parts of the result cache, as a more elegant workaround.

Blacklisting those caches:

Using the following query, identify the Cache_IDs that has high RESULT_COUNT.

SQL> select substr(name, 1,100) name,
cache_id,
count(*) result_count,
round(avg(scan_count)) avg_scan_count,type,status
from gv$result_cache_objects
where type = 'Result'
group by name, cache_id,type,status

Use Black_list_Add to ban them..

exec dbms_result_cache.Black_List_Add(xxxxxxxxxx, TRUE);

But what about the fix? Well... Fix is in the code.. The code is written such a way to use the Result Cache, but the processes in the ERP changes the data that is cached by the code and that is too bad!

Either the code or the processes (ERP processes that is updating the data cached by FND_PROFILE) should be revisited..

Nevertheless, we have 2 workarounds! :)  Again -> Be careful when using the result cache!

Well.. I can't stop myself to give some background info here.. Here it is ;

Result cache in these kinds of scenarios brings the data faster than the buffer cache, as it has the data in the result format and it just copies it to our session/process 's buffer.

However; if we try to use the result cache in a dynamic environment, things change.
Things change, because the invalidation ( when the data and objects that builds the contents cached in the result cache change) means "Writing the Result Cache"

Writing is not a very bad thing? Yes for 1 write or 10 writes maybe it is so, but consider 1000 concurrent writes..

Why writing to result cache concurrently is a bad thing then?
It is because , the result cache is protected by a single latch.

In order to write to the cache, it is required to get an exclusive latch.
This exclusive latch is need to be get when there is a cache miss or cache invalidation..

So cache hit doesn't require the session to get that exclusive latch.
This exclusive latch, on the other hands; is a single one, and it has no child.
It is basically, when you want to write to cache, you need to get it -> concurrent write access -> Problem
However, if you read the cache (cache hit), you need to get a shared latch -> no contention -> NO Problemo :)

Here is what Oracle Support says about it ->

The result cache was designed to cache small results sets for a short period of time and once it gets full there is an overhead of handling the data in it. When no empty blocks are found in the RC we attempt to make room first by removing any invalids, then requesting a new chunk to be allocated and finally by removing using LRU. Currently, there is a cost associated with making room in a full result cache, which increases with size of the cache, especially for LRU processing and this is unlikely to change.

If you indirectly or directly put an object or let's say a result into the result cache and if the data/object that produces that result is dynamically changing; then guess what?
You will need to write to the cache..

-- By the word directly, I mean using hints -> SELECT /+result_cache/

You will have Result Cache related activity.
You will see a peak for the latch named Result Cache: RC latch.

Okay.. It is a perfect place to end.. See you in the next post :)

Sunday, April 10, 2022

EBS -- poor performance in Autoconfig -- due to Java and unnecessary waits on Syscalls -- FUTEX_WAIT & FUTEX_WAIT_PRIVATE

Recently upgraded an EBS database from 11.2.0.4 to 19C. The source apps tier was an EBS 12.1.3 and the database was an Oracle Ent. Edition 11.2.0.4. Both applications and database was running on Oracle Linux 7 Operating Systems (OEL 7.9). 

Due to the low application version, we took lots of patching actions before doing the actual database upgrade and the multi tenant conversion.

The source environment was a multi-node E-Business Suite. It was consisting of 2 SSO enabled apps nodes (Non-shared Application Filesystem), and a 2-nodes RAC database.

The production iteration was our third iteration in this project (note that we do 3 iterations at least..). 

So we already had perfected our operation. We had a detailed SOP and a Runbook. We had a detailed upgrade schedule and we estimated the duration of the upgrade tasks carefully.

But! the production 19C upgrade took longer than we thought.

The problem was in the execution of adpatch actions and autoconfig.

We had lots of patches to apply to the apps tier nodes and we had to run autoconfig several times just to make the apps tier ready for the 19C upgrade. ( you know these things changes from env to env.. In this env, there were several apps tier patches required and needed to be in place before upgrading the database to 19C and converting it to a multi-tenant  -- 1CDB-1PDB)

The real cause that made the autoconfig and adpatch run slower was an inactive wait.. We kept following the autoconfig logs during those executions for instance... And we saw that autoconfig was waiting even during the JDBC connections.. There were no issues with the database connections but autoconfig was just waiting in these kinds of basic operations and that's why the total run time of autoconfig was extending.. The total duration of an autoconfig was almost 15 mins and this was an issue that can not be ignored.. Adpatch was not doing any better than autoconfig by the way..

Anyways, we used strace to get the system calls of those running java processes.. You know... In apps, we start with sh, we continue with perl and most of the time we end up with java.. :) So we traced the java processes while they were being executed by the autoconfig (or adpatch)

We saw that the java processes were waiting on FUTEX_WAIT system calls while we were seeing those unnecessary waits.. 

Futex / Fast Mutexes are just the locking mechanisms that are used for basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables. This is why, we thought that there was probably a contention on a memory location, which was protected by one of those mutexes (operated by calling futexes :)  

Following is from the Man page:

long syscall(SYS_futex, uint32_t *uaddr, int futex_op, uint32_t val, const struct timespec *timeout, /* or: uint32_t val2 */ uint32_t *uaddr2, uint32_t val3);

The futex() system call provides a method for waiting until a certain condition becomes true. It is typically used as a blocking construct in the context of shared-memory synchronization. When using futexes, the majority of the synchronization operations are performed in user space. A user- space program employs the futex() system call only when it is likely that the program has to block for a longer time until the condition becomes true. Other futex() operations can be used to wake any processes or threads waiting for a particular condition.

The uaddr that we saw in the FUTEX_WAIT system calls was always the same, not changing.

So same processes were waiting on FUTEX_WAIT and the uaddr was always the same. .(this might be normal & expected, because of the implementation of virtual memory..)

Note that, I just mentioned FUTEX_WAIT so far but FUTEX_WAIT_PRIVATE is not so different than that. It seems, FUTEX_WAIT_PRIVATE is just the product of an optimization done by linux glibc to make futexes faster when they're not shared between processes. -- so just wanted to shed a light on this one.. Lets continue;

So at first we thought that those FUTEX_WAITs were caused by a contention. In order to see the contention and find the blocker ,we installed "stap" and used the script that was provided by Redhat.. (note that, the problem was on an Oracle Linux but they are almost the same right :)

The installation of stap was a little throublesome, but we installed it and used the procedure given in the Redhat article which is publically available in the following url > 
--IDENTIFYING CONTENDED USER-SPACE LOCKS

Suprisingly, there was no contention reported, even when the autoconfig was waiting for getting an established jdbc connection.

Then, we made a deep dive, checked some futex examples, wrote some code to implement futex waits and guess what we found; the contention was not the only possible cause of futex waits. A poor written code or a blocking task (slow I/O, CPU shortage , high load) could also indirectly cause this kind of an issue.

Okay now; just suppose I m the main thread and your are the child thread.. Now suppose, I ( the main) was just written in such a way that, I just acquire a lock (semaphore, mutex, futex you name it) and then create "you" / the threads and then do some other fast stuff (some things that require I/O) and only then I release that lock. 
Suppose;  you the threads are written in such a way, that you need to get that lock to start doing your actual work. Okay not a dead lock, but it is a lock!

Now suppose those fast stuff that I just mentioned were blocked.. So I m waiting for their return and that's why I don't release the lock.. Well, you will "wait"..

Okay, I need to stop this story telling, because it just started getting weird :) And I felt like we should speed up a bit. Therefore, I am now heading towards the conclusion.

So, we thought that OS or JVM might be the real cause that was making us wait, but we also thought that a poorly written code might be the cause.. So we might need a patch for it. (autoconfig patch, a tech patch or something like that..)

Java thread dump didn't help much. By the way, we got it by following the MOS note below;

*How to Obtain a Thread Dump (Stack Traces) from a Java Process or from a Core File of a Java Process on Linux (Doc ID 1282871.1)

Then, we tried dropping the fs caches of Linux.. Because some example codes, that were written in a way to read some files and import the contents of those files into the database; were running very fast in their very first executions, but they were start waiting on FUTEX_WAIT in their subsequent exeuctions. So, we thought that this might be caused by a misbehaviour of Linux FS caching and we just tried to drop the fs caches and retest.. 

It was a nice try but it didn't solve the issue :)

Okay, I m speeding up!

Well, finally we found the cause..

It was /dev/random & I was there , done that! :)

Read -> https://ermanarslan.blogspot.com/2020/05/entropy-linux-kernel-csprngs-devurandom.html -- this is one my favorites.. "Entropy, Linux Kernel - CSPRNGs, /dev/urandom vs /dev/random and all that"

Some background info about /dev/random and /dev/urandom:

In Linux, we have /dev/urandom and /dev/random for this. These are character devices and they look like files. We read them like we read files and when we read 100 bytes from them, they actually run CSPRNG on the entropy pool and give us the random number we need.

These tools provide us limited and uniform random bytes when we need. Moreover, the source they are fed, is populated by the unpredictable events.

But, as you may ask, we have two devices, right? /dev/random and /dev/urandom.. So which one should be used in which case? This is definitely the question that one may ask.

Well, first describe the difference between these tools, so that maybe we can make a decision depending on those differences.

The main difference between /dev/random and /dev/urandom is that, /dev/random tracks the entproy that we have in the entropy pool and it blocks when the entropy is low. (remember the entropy that I mentioned in the first part of the blog post).. It is basically implemented in a way to block itself when it thinks that the unpredictability is low.

Reference for the above: Erman Arslan's Oracle Blog :)

Entropy and Claude Shannon again! :)


We tested it by providing /dev/urandom using a command line argument to those java programs ..Djava.security.egd=file:/dev/./urandom did the job and we saw that this action cleared the waits.

However; we had to generalize it and somehow made it system-wide. 
We used the following for that ->
  • Open the java.security file of the related JDK/JRE.. (ex: JAVA_HOME/jre/lib/security/java.security)
  • Change the line: "securerandom.source=file:/dev/random" to "securerandom.source=file:/dev/./urandom"
  • Note that, we need to change the line to /dev/./urandom. Otherwise, java ignores it... For instance java ignores /dev/urandom.. (the one without /./ is ignored!) 
  • Save the changes.. 

Okay. That 's it :) This was for all my followers, and those ones who work in Oracle Application Technology & Oracle Linux Support..

I hope it will be useful.. 

Sunday, April 3, 2022

EBS 12.2 -- custom OAF Deployment doesn't realize the new code -- due to misplaced files/codes under OA_HTML

We had an issue with the custom OAF deployment in an newly upgraded EBS 12.2.10 environment.

We just couldn't make the new codes be activated and no matter what we did, the application was still using the old codes and we were not able to deploy the changed OAFs.

Actually, our deployments were fine, but the changes that were supposed to be delivered with those deployment, were not reflected to the system. 

Even when we deleted a custom code from JAVA_TOP, the page that was related with that code was still working without any problems.. So it was a clue.. That is, somehow EBS was getting the code from elsewhere.. Note that, we were facing this issue for almost all custom developments under JAVA_TOP. 

Before finding the real cause, we did the following, but didn't help;

Stopped all the application services, recompiled JSPs, started the application services

Cleared the java cache using Functional administrator

Cleared the browser cache

Cheared any networking cache ( including the Load Balancer)

Then, we saw that there were 2625 classes in the directory named $OA_HTML/WEB-INF/oracle.

The custom classes were also there..

So, we removed; any code (except for _pages) that reside in the directory named $OA_HTML/WEB-INF/oracle; and the issue disappeared.. 

As you may guess; the real cause was the misplacement of code.. 

That is, when the custom classes are deployed to OA_HTML/WEB-INF/oracle, they take precedence over customall,jar generated by adcgnjar under $JAVA_TOP. 

So no matter what we deploy to JAVA_TOP, it doesn't matter.. I mean, if the code we deploy to JAVA_TOP exists in OA_HTML/WEB-INF/oracle then, the code that resides in OA_HTML/WEB-INF/oracle is executed. So we can't see our changes, we can't activate our changes..

That was an interesting issue, and the question "who put the codes there?" -> is still a mystery :)

Erman Arslan's Oracle Forum / March 2022 - "Q & A Series"

This month, we had interesting issues. We were chatty :) and I tried hard to find time to answer all the questions despite the load of my work (leading my department, mission critical support activities & projects + intellectual gymnastics - physics and stuff like that ) ! :)

This month, we (GTech) had also a mission critical EBS 19C database upgrade task. Besides, I had 2 presentations .. One of them was in a physical Oracle event and other one was in online webinar.. So it was a busy month, but things went well, and I am here blogging again :)

Remember, you can ask questions and get remote support using my forum. Just click on the link named "Erman Arslan's Oracle Forum is available now". Click here to ask a question", which is available on the main page of Erman Arslan's Oracle Blog.

-- or just use the direct link:















Come on, let's see what we've been up to in March 2022.

#amaerman

WorkFlow Mailer PROCESS file by big

Rpm 32 bit ebs by satish

Form not open after upgrade to 19c by raiq1

Autoconfig failed, Non supported character set by raiq1

Migrate pluggable database from filesystem to ASM by raiq1

R12.2 Login page freezes by big

memory by Roshan

hcheck error 19c upgrade by raiq1

ap_holds_pkg.insert_single_hold by big

SSL termination - load balancer - R12.2 by satish

DMZ r12.2 admin server by satish

Sizing concurrent manager processes by satish

Migration to new server by satish

Rapidwiz nonshared appltop by satish

Patching actions in shared appltop by satish

Upgrade R12.2.0 to R12.2.10 fresh installation by satish

send email from oracle by Arsalan

RMAN recover error by Roshan

move redolog by big

App and db on single node - migration of ebs to new server by satish

Status diagram by big

Tools for migrations by satish

Workflow System Administrator by big

Rpm 12.1.3 EBS by satish

supplemental log group by Roshan

How to check custom forms in EBS R12.1.3 by soumya

Start Stop Apache by big

Cannot Launch Forms by big

certification OCI by Roshan

JDK for R12.2 by big

oracle cloud by Roshan

RAC node eviction by Roshan

row lock by Roshan

Migration of ebs R12.1.3 by satish

R12.1.3 migration from el5 32 bitto 64bit by satish

audit_file_dest by big