Tuesday, August 14, 2018

EBS R12 -- REQAPPRV ORA-24033 error after 12c DB upgrade / rulesets & queues

Encountered ORA-24033 in an EBS 12.1.3 environment.
Actually, this error started appearing in the workflow just after the database of this environment was upgraded from 11gR2 to 12cR1.

The database upgrade (running dbua and other related steps) was done by a different company, so we were not able to check whether it was done properly.
However, we were the ones who needed to solve this issue when it appeared :)

Anyway, the functional team encountered this error while checking the workflows in Workflow Administrator Web Applications -> Status Monitor, and reported it as follows;


ORA-24033 was basically telling us that there was a queue/subscriber problem in the environment, so we started working on the queues, subscribers and rulesets.

The analysis showed that we had 1 ruleset and 1 rule missing in this environment.

select * from dba_objects
where object_name like 'WF_DEFERRED_QUEUE%';
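A slightly more targeted version of this check can also be used (a minimal sketch; it simply narrows the query above to the rule/ruleset objects and shows their validity, which is what we compared between the environments):

select owner, object_name, object_type, status
from dba_objects
where object_name like 'WF_DEFERRED%'
and object_type in ('RULE', 'RULE SET')
order by object_type, object_name;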

The following output was produced in a reference environment, on which workflow REQAPPRV was running without any problems.


The following output, on the other hand; was produced in this problematic environment.


As seen above, we had 1 ruleset named WF_DEFERRED_QUEUE_M$1 and 1 rule named WF_DEFERRED_QUEUE_M$1 missing in this problematic environment.

In addition to that, WF_DEFERRED related rulesets were invalid in this problematic environment.

In order to create (and validate) these rulesets, we followed 2 MOS documents and executed our action plan accordingly.

Fixing Invalid Workflow Rule Sets such as WF_DEFERRED_R and Related Errors on Workflow Queues:ORA-24033 (Doc ID 337294.1)
Contracts Clause Pending Approval with Error in Workflow ORA-25455 ORA-25447 ORA-00911 invalid character (Doc ID 1538730.1)

So what we executed in this context was as follows;

declare
l_wf_schema varchar2(200);
lagent sys.aq$_agent;
l_new_queue varchar2(30);

begin
l_wf_schema := wf_core.translate('WF_SCHEMA');
l_new_queue := l_wf_schema||'.WF_DEFERRED';
lagent := sys.aq$_agent('WF_DEFERRED',null,0);
dbms_aqadm.remove_subscriber(queue_name=>l_new_queue, subscriber=>lagent);
end;
/
commit;

declare
l_wf_schema varchar2(200);
lagent sys.aq$_agent;
l_new_queue varchar2(30);

begin
l_wf_schema := wf_core.translate('WF_SCHEMA');
l_new_queue := l_wf_schema||'.WF_DEFERRED';
lagent := sys.aq$_agent('WF_DEFERRED',null,0);
dbms_aqadm.add_subscriber(queue_name=>l_new_queue, subscriber=>lagent,rule=>'1=1');
end;
/
commit;

declare

lagent sys.aq$_agent;
begin
lagent := sys.aq$_agent('APPS','',0);
dbms_aqadm.add_subscriber(queue_name=>'APPLSYS.WF_DEFERRED_QUEUE_M',
subscriber=>lagent,
rule=>'CORRID like '''||'APPS'||'%''');
end;
/

So what we did was to;

Remove and add back the subscriber/rules to the WF_DEFERRED queue 
+
Add the subscriber and rule back into the WF_DEFERRED_QUEUE_M queue. (If needed, we could remove the subscriber before adding it.)

By taking these actions, the ruleset named WF_DEFERRED_QUEUE_M$1 and the rule named WF_DEFERRED_QUEUE_M$1 were automatically created, and this actually fixed the ORA-24033 error in REQAPPRV :)

Monday, August 13, 2018

EBS -- MIGRATION // 2 interesting problems & 2 facts -- autoconfig rule (2n-1) & APPL_SERVER_ID in the plan.xml of ebsauth

Recently migrated a production EBS from one Exadata to another Exadata. That was an advanced operation, as it involved Oracle Access Manager (OAM), Oracle Internet Directory (OID) and 2 EBS disaster recovery environments.

This was a very critical operation, because it could not be rehearsed beforehand. We needed to do this work without any tests and we needed to start working immediately.

The environment was as follows;

PROD : 1 Load Balancer, 2 Apps Nodes, 1 OAM/OID node and 2 Database nodes (Exadata)
-- Parallel Concurrent Processing involved as well..
Local Standby : 1 Apps Node, 2 Database nodes (Exadata)
Remote Standby: 1 Apps Node, 2 Database nodes (Exadata)

What we needed to do was to migrate the DB nodes of PROD to the Local Standby site.
In order to do this, we followed the action plan below;

Note: actually we did much more than this, but this action plan should give you the idea :) 
  • stopped OAM+OID+EBSAccessGate + Webgate etc..
  • stopped EBS apps services which were running on both of the Prod Apps nodes.
  • Switched over the EBS Prod database to be primary in Local Standby.
  • Reconfigured local standby to be the new primary and configured it as the primary for the remote standby as well.
  • After switching the database over the standby site; we cleaned up the apps specific conf which was stored in the database (fnd_conc_clone.setup_clean)
  • We built context files (adbldxml.pl) and executed autoconfig on the new db nodes. 
  • Once the db nodes were configured properly, we manually edited the apps tier context files and executed autoconfig on each of the apps tier nodes. (Note that the apps services were not migrated to any other servers.)
  • We started the apps tier services.
  • We reconfigured the workflow mailer (its configuration was overwritten by autoconfig)
  • We logged in locally (without OAM) , checked the OAF , Forms and concurrent managers.
  • Everything was running except the concurrent managers which were configured to run on the second apps node. No matter what we did from the command line and from the concurrent manager administration screens, we couldn't fix it. There was nothing written in the internal manager log, but the concurrent managers of node 2 could not be started.
    • The first fact : If you have a multi-node EBS apps tier, AutoConfig has to be run '2n - 1' times. In other words, for an application which has 'n' application nodes, AutoConfig has to be run '2n - 1' times so that the tnsnames.ora file on each node has FNDSM entries for all the other nodes. So, as for the solution, we executed autoconfig once more on the second node, and the problem disappeared. (A quick way to verify this is shown after this list.)
Reference: AutoConfig Does Not Populate tnsnames.ora With FNDSM Entries For All The Nodes In A Multi-Node Environment (Doc ID 1358073.1)
  • After fixing the concurrent managers, we continued with OAM and OID. We changed the datasource of the SSO (in WebLogic) to point to the new DB URL and also changed the dbc file there. Then, we started Access Gate, Webgate, OAM and OID and checked the EBS login by using the SSO-enabled URL. But the login was throwing HTTP 404.
  • All the components of SSO (OAM, OID and everything else) were running, but the deployment named ebsauth_prod was stopped and it could not be started (it was getting errors).
    • The second fact : if you changed the host of the EBS database and your APPL_SERVER_ID changed, then you need to redeploy ebsauth by modifying its Plan.xml with the new APPL_SERVER_ID. Actually, you have 2 choices; 1) Set the app_APPL_SERVER_ID to a valid value in the Plan.xml file for the AccessGate deployment and then restart the EAG servers. The Plan.xml file location is specified on the Overview tab for the AccessGate deployment within the WebLogic Console where AccessGate is deployed. 2) Undeploy and redeploy AccessGate. (The check after this list also shows where to read the current APPL_SERVER_ID from.)
Reference: EBS Users Unable To Sign In using SSO After Upgrading To EBSAccessGate 1234 With OAM 11GR2PS2 (Doc ID 2013855.1)
  • Well, after this move, the SSO-enabled EBS login started to work as well. The operation was completed, and we deserved a good night's sleep :)
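As a quick sanity check related to the two facts above, the following commands can be used on the apps nodes (a minimal sketch; the EBS environment must be sourced first, and the paths come from the standard EBS environment variables):

# Check that tnsnames.ora on each apps node has FNDSM entries for all the nodes
grep -i FNDSM $TNS_ADMIN/tnsnames.ora

# Check the APPL_SERVER_ID currently published in the dbc file,
# and compare it with the app_APPL_SERVER_ID value in the AccessGate Plan.xml
grep APPL_SERVER_ID $FND_SECURE/*.dbc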

Saturday, August 4, 2018

Oracle VM Server -- Guest VM in blocked state, VM console connection(VNC), Linux boot (init=/bin/bash)

This is an interesting one.. It involves an interesting way of booting Linux and dealing with Oracle VM Server and its hypervisor.

Last week, after a power failure in a critical datacenter, one of the production EBS application nodes couldn't be started. That EBS application node was a VM running on an Oracle VM Server, and although the Oracle VM Server could be started without any problem, that application node couldn't.

As I like to administer Oracle VM Server using xm commands, I directly jumped into the Oracle VM Server by connecting to it using ssh (as root).

The repositories were there.. They were all accessible and the xm list command was showing that EBS node, but its state was "b".. (blocked)

I restarted the EBS Guest VM a couple of times, but it didn't help.. The EBS Guest VM was going into the blocked state just after starting.

The customer was worried, as the status "blocked" didn't sound good...

However, the fact is that it is normal for a Guest VM to be in blocked status if it doesn't do anything, or let's say, if it has nothing actively running on the CPU.

This fact made me think that there should be a problem during the boot process of this EBS Guest VM.

The OS installed on this VM was Oracle Linux, and I thought that, probably, Oracle Linux was stuck doing nothing during its boot process. Maybe it was asking something during boot, or maybe it was waiting for an input..

In order to understand that, we needed to have a console connection to this EBS Guest VM..

To have a console connection, I modified the vm.cfg of this EBS Guest VM -- actually added VNC specific parameters to it.

Note that, in Oracle VM Server we can use VNC to connect to the Guest machines even during their boot process.
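For reference, the VNC-related parameters added to vm.cfg look roughly like the following (a sketch; the listen address and password are placeholders, and the exact form depends on whether the guest is HVM or PVM):

# HVM guest style
vnc = 1
vnclisten = "0.0.0.0"
vncpasswd = "MyVncPass"

# PVM guest style (vfb entry)
vfb = ['type=vnc,vnclisten=0.0.0.0,vncpasswd=MyVncPass']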

After modifying the vm.cfg file of the EBS Guest VM, I restarted this guest machine using xm commands and directly connected to its console using VNC.

I started to watch the Linux boot process of this EBS Guest VM, and then I saw where it stopped..
It stopped because it was reporting a filesystem corruption and asking us to run fsck manually..

So far so good.. It was as I expected..

Oracle Linux was asking for the root password in order to give us a maintenance shell for running fsck manually. However, we just couldn't obtain the password.

So we were stuck..

We tried to ignore the fsck message of Oracle Linux, but then it couldn't boot..

We needed to find a way.

At that time, I put my Linux admin hat on, and did the following;

During the boot, I opened the GRUB(GRand Unified Bootloader) menu. (bootloader)
Selected the appropriate boot entry (uek kernel in our case) in the GRUB menu and pressed e to edit the line.
Selected the kernel line and pressed e again to edit it.
Appended init=/bin/bash at the end of the kernel line.
Booted it.

By using the init=/bin/bash, I basically told the Linux kernel to run /bin/bash as init, rather than the system init.
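The edited GRUB kernel line ends up looking roughly like this (a sketch; the kernel version, root device and other boot options are placeholders taken from a typical Oracle Linux entry):

kernel /vmlinuz-2.6.39-400.el6uek.x86_64 ro root=/dev/mapper/vg_root-lv_root rhgb quiet init=/bin/bash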

As you may guess, by using init=/bin/bash, I booted the Linux and obtained a terminal without supplying the root password.

After this point, running fsck was a piece of cake :)

So I executed fsck for the root filesystem and actually for the other ones also.. Repaired all of them and rebooted the Linux once again..
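For completeness, the repair itself was roughly the following (a sketch; the device names are placeholders for the actual filesystems reported during boot):

# the root fs is mounted read-only at this point, so it can be checked in place
fsck -y /dev/mapper/vg_root-lv_root
# filesystems that are not mounted yet can be checked directly as well
fsck -y /dev/mapper/vg_root-lv_u01
# force the reboot (bash is running as PID 1 here)
reboot -f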

This time, Linux OS of that virtualized EBS application node booted perfectly and the EBS application services on it could be started without any problems..

It was stressful work, but it gave me this interesting story :)

Thursday, July 26, 2018

Exadata -- Image & GRID 12.2 upgrade

You may remember my article on upgrading Exadata software versions. ->

Exadata Patching-- Upgrading Exadata Software versions / Image upgrade

This time, I'm extending this upgrade-related topic.
So, in this post, I'm writing about an Exadata Image upgrade + a 12.2 GRID infrastructure upgrade.

Well... Recently we needed to upgrade Exadata software and GRID infrastructure versions of an Exadata environment.

We divided this work into 2 parts. First we upgraded Exadata images and then we upgraded the GRID version.

Both of these upgrades were rolling upgrades, so the databases remained up and running during these upgrade activities.

Let's take a look at how we do these upgrades.

Exadata Images Upgrades:

We upgraded the image version of a production Exadata environment from 12.1.2.1.2.150617.1 to 12.2.1.1.4.171128. We did this work by executing the 3 main phases given below;
  • Analysis and gathering info about the environment.
  • Pre-check
  • Upgrading the Images in order of ->
    • Exadata Storage Servers(Cell nodes) 
    • Infiniband Switches
    • Compute Nodes (Database nodes)
So, we execute the 3 main phases above and while executing these phases, we actually take the following 8 actions;

1) Gathering info and checking the current environment :

Image info, DB Home & GRID Home patch levels (opatch lsinventory outputs), SSH equivalency check, ASM diskgroup repair time check, NFS shares, crontab outputs, .bash_profile contents, spfile/pfile backups, controlfile traces

Approx. duration : 3 hours (done before the operation day)
2) Running the exachk:

Downloading the up-to-date exachk and running it with the -a argument.
After running the exachk -> analyzing its output and taking the necessary actions if there are any.

Approx. duration : 2 hours (done before the operation day) 
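For reference, running exachk for this step is roughly as follows (a sketch; the staging directory is a placeholder):

cd /u01/stage/exachk
./exachk -a     # -a runs all checks (best practice and recommended patch checks)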

 3) Downloading the new Exadata images and uploading it to the nodes.

Approx. duration : 2 hours (done before the operation day)

4) Creating the necessary group files for the Patchmgr . (cell_group, dbs_group, ibswitches.lst)

Approx. duration : 0.5 hours (done before the operation day)

5) Running the patchmgr precheck. After analyzing its output, taking the necessary actions (if there are any). For example: if there are 3rd party rpms, we may decide to remove them manually before the upgrade.

Approx. duration : 0.5 hours (done before the operation day)

6) Running Patchmgr and upgrading the images. (we do the upgrade in rolling mode)

Before running patchmgr, we kill all the ILOM sessions.. (active ILOM sessions may increase the duration of the upgrade)

Note: Upgrade is done in the following order;

Exadata Storage Servers(Cell nodes)  (1 hour per node)
Infiniband Switches (1 hour per switch )
Compute Nodes (Database nodes) ( 1.5 hours per node)
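The patchmgr invocations behind steps 5 and 6 look roughly like the following (a sketch, run as root from a node that is not being patched, against the group files created in step 4; the staging paths, repository zip name and target version are placeholders, and the exact flags should be verified against the patchmgr README for the release being applied):

# Cell nodes: precheck, then rolling upgrade
./patchmgr -cells cell_group -patch_check_prereq -rolling
./patchmgr -cells cell_group -patch -rolling

# Infiniband switches
./patchmgr -ibswitches ibswitches.lst -upgrade

# Compute (database) nodes
./patchmgr -dbnodes dbs_group -precheck -iso_repo /u01/stage/dbnodeupdate_repo.zip -target_version 12.2.1.1.4.171128
./patchmgr -dbnodes dbs_group -upgrade -rolling -iso_repo /u01/stage/dbnodeupdate_repo.zip -target_version 12.2.1.1.4.171128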
  
7) As the post upgrade actions; reconfiguring NFS & crontabs. Also reinstalling the 3rd party rpms (if removed before the upgrade)

Approx. duration : 0.5 hours

8) Post check: checking the databases, their connectivity and alert log files..
Note that : we also run exachk once again and analyze its output to ensure that everything is fine after the Image upgrade.

Approx. duration : 1 hour

GRID 12.2 Upgrade:

As for  the GRID 12.2 upgrade, we basically follow the MOS document below;

"12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)"

First, we analyze our environment in conjunction with the document above, to determine the patches and prerequisite patches required for our environment.

Here is the list of patches that we used during our last GRID 12.2 upgrade work;

GI JAN 2018 RELEASE UPDATE 12.2.0.1.180116 Patch 27100009 
Oracle Database 12c Release 2 Grid Infrastructure (12.2.0.1.0) for Linux x86-64 V840012-01.zip 
OPatch 12.2.0.1.0 for Linux x86-64 Patch 6880880 
Opatch 11.2.0.0.0 for Linux x86-64 Patch 6880880 
CSSD : DUPLICATE RESPONSE IN GROUP DATA UPDATE Patch 21255373

Once all the required files/patches are in place, we do the GRID upgrade by following the steps below;
  1. Creating the new GRID Home directories.
  2. Unzipping the new GRID software into the relevant directories.
  3. Unzipping up-to-date opatch and GRID patches.
  4. If needed, configuring the ssh equivalencies.
  5. Running runcluvfy.sh and doing the cluster verification. (In case of an error, we fix the error and rerun it)
  6. Patching our current GRID home with the prereq patches (in our last upgrade work, we needed to apply the patch 21255373)
  7. Increasing the sga_max_size and sga_target values of the ASM instances.
  8. Configuring VNC (we do the actual upgrade using VNC)
  9. Starting the GRID upgrade using the unzipped new GRID Software (on VNC)
  10. Running rootupgrade.sh on all the nodes.
  11. Controlling/Checking the cluster services.
  12. Configuring the ASM compatibility levels.
  13. Lastly, as a post upgrade step, we add the new GRID home in to the inventory.
As you may guess, the most critical steps in the list above, are step 9 and step 10..  (as the actual upgrade is done while executing those steps)
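A rough outline of the key commands behind steps 5, 9 and 10 is given below (a sketch; the home directories, versions and options are placeholders, and the exact runcluvfy arguments should be taken from Doc ID 2111010.1):

# Step 5: cluster verification, run from the unzipped 12.2 GRID software location
./runcluvfy.sh stage -pre crsinst -upgrade -rolling -src_crshome /u01/app/11.2.0.4/grid -dest_crshome /u01/app/12.2.0.1/grid -dest_version 12.2.0.1.0 -fixup -verbose

# Step 9: start the upgrade from the new home (inside the VNC session)
./gridSetup.sh

# Step 10: when prompted by the installer, run on each node in turn (as root)
/u01/app/12.2.0.1/grid/rootupgrade.sh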

Approx Duration : 4 hours.. (for a 2 node Exadata GRID upgrade)

That's it :) I hope you will find this blog post useful :)

Friday, July 20, 2018

Exadata Cloud Machine -- first look, quick info and important facts

Recently started an ECM (Exadata Cloud Machine) migration project, or maybe I should say an ECC (Exadata Cloud at Customer) migration project.

This is a big migration project, including migration of the Core Banking databases.
It is a long run, but it is very enjoyable.
We have 2 ECCs to migrate to..

Finally, last week, initial deployment of the machines was completed by Oracle.
This week, we connected to the machines and started to play with them :)

I think I will write several blog posts about these new toys in the coming months, but here is some quick info and some important facts about the ECC environments.

First of all, ECC is an Exadata :) Exadata hardware + Exadata software..

Technically, it is a virtualized Exadata RAC environment in which we (consultants) and customers cannot access the cells, ILOMs, switches or the hypervisor.

  • It is a Cloud Machine, but it is behind the firewall of the customer.
  • It has a Cloud Control Plane application, a GUI to manage the database services, and this application is hosted in OCC (Oracle Cloud Machine), which can be thought of as the satellite of ECC. 
  • We do lots of stuff using this GUI: database service creation (11.2.0.4, 12c, 18c), patching, etc.


  • Database service creations and GRID operations are automated. According to the version of the database created using the GUI, the GRID is automatically created. For example: if we create a 12.2 database and it is the first 12.2 database that we create in ECC, GRID 12.2 is also automatically created (cloud operations). Likewise, if we have GRID 12.1 and some 12.1 DBs residing in ECC and we want to create our first 12.2 database, then the GRID is automatically upgraded to 12.2 as well.
  • The minimum supported DB version in ECC is 11.2.0.4. So we need to have our db compatible parameter set to 11.2.0.4 (minimum) in order to have a database on ECC -- this is related to the migration operations. (A quick check is shown after this list.)
  • We can install Enterprise Manager agents on ECC. So our customer can manage and monitor ECC nodes and databases using its current Enterprise Manager Cloud or Grid control.
  • ECCs are virtualized. Only Oracle can access the hypervisor level. We and the customer can only access the DomU. On the DomU RAC nodes, we and the customer do the OS administration: backups, patching, rpm installation and everything else. The customer is responsible for the DomU machines, where the GRID and the databases run. The customer has root access on the DomU nodes. (This means DB administration + OS administration still continues :))
  • So the customer can't access the cell servers, or even the ILOM consoles.
  • Administration of everything that resides below the DomU layer is done by Oracle.
  • Responsibility for everything that resides below the DomU layer is on Oracle.
  • Currently, for every physical node, we have a VM node. For example: if we have a 1/2 ECC, we have 4 physical nodes and 4 VMs (DomU nodes) -- 1 to 1.
  • We can create RAC multi-node or single node databases on ECC.
  • We can also create databases manually on ECC. (without using the GUI).. Using scripts or runInstaller, everything can be done just like the old days. (as long as versions are compatible with ECC)
  • If we create a 12c database using the GUI, it comes as a pluggable (CDB/PDB) database. So if we want to have a non-CDB 12c database, we need to create it manually.
  • Customer can connect to the RAC nodes (DOMu nodes) using SSH keys. (without password).. This is a must.
  • Customer can install backup agents to ECC.. So without changing the current backup method and technology, customer can backup the databases running on ECC.
  • There is no external infiniband connection to ECC.. External connection can be max 10Gbit.
  • Enterprise Manager Express comes with ECC. We have direct links to Enterprise Manager Express in the Control plane.
  • IORM is also available on GUI. Using GUI, we can do all the IORM configuration.. 
  • In ECC, we can use In-Memory and Active Data Guard. Actually, we can use all the database options without paying for extra licenses.
  • If we create 12.2 Databases, they are created with TDE.. So TDE is a must for 12.2 databases on ECC.
  • However, we are not required to use TDE, if we are using 11G databases on ECC.
  • The ASM diskgroups on ECC are high-redundancy diskgroups. This is the default and it cannot be changed!
  • Exadata Image upgrade operations on the ECC environments are done by Oracle.
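Related to the compatibility item above, a quick way to check whether a source database is ready to be migrated is simply (a minimal sketch):

SQL> show parameter compatible
-- or
SQL> select name, value from v$parameter where name = 'compatible';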

That's all for now :) In my next blog post, I will show you how we can create database services on ECC (using the GUI).

Monday, July 16, 2018

RDBMS -- Be careful while activating a standby database (especially in cascaded configurations)

Recently, a customer reported an issue about a standby database, which was out-of-sync with the primary. This standby database was the endpoint of a cascaded configuration.

The cascaded dataguard configuration in that customer environment, was as follows;

Primary -> Standby1 -> Standby2

So, the customer's requirement was to activate standby1 and to continue applying the redo logs of the primary directly to standby2.

However, while activating (actually, just after activating) the standby database named standby1, the customer accidentally made standby2 apply the redo logs that were generated by standby1.

When standby2 received and applied the archivelogs from standby1, standby2 became a new standby database for standby1, and it went out of sync with the original production database.

Interesting, right?

In order to bring the database Standby2 in sync with its original primary database, we did the following;

We used the flashback database option to flash standby2 back to the point before it applied the archivelogs from standby1.
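On standby2, this flashback looks roughly like the following (a minimal sketch, assuming flashback database was already enabled and the SCN just before the wrong archivelogs were applied is known; the SCN value is a placeholder):

SQL> alter database recover managed standby database cancel;
SQL> shutdown immediate;
SQL> startup mount;
SQL> flashback database to scn 123456789;
SQL> alter database recover managed standby database disconnect from session;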

Then, we deleted the archivelogs received from standby1 and made sure that standby1 would not send any archivelogs to standby2 until it was converted back to a physical standby. (This way, we could ensure that standby2 applied redo only from the production database.)

Note that, if we hadn't had the option of using flashback, we would have had to recreate the standby database named standby2...

So, be careful while playing with the Data Guard configuration, especially in cascaded environments... First check the configuration, then take the action. In this real-life case, the Data Guard configuration was from primary to standby1 and from standby1 to standby2. So when standby1 was activated, the path "from standby1 to standby2" kept working, and standby1 became the new primary for standby2. The incarnation changed and standby2 went out of sync with the original primary.
In order to prevent this from happening, the Data Guard flow (configuration) should have been changed before activating standby1.

Monday, July 9, 2018

GTECH -- Summer School -- Oracle Database & Cloud & EBS for newly graduates

Once a year, we at GTech provide training for newly graduated engineers.

In this training, we teach SQL, PL/SQL, Oracle Database & Cloud, EBS, OBIEE, BigData, ETL and more.

This year, I was the lecturer for "Database and Cloud".

Actually, I extended the lessons a little bit by explaining the EBS System Administration Fundamentals, as well. :)

The students in the classes were so curious about databases and Oracle in general..

It was an honour for me to present "the introduction to Oracle Database", to explain the "Cloud terms" (including the Cloud-at-Customer model) and to explain the "EBS architecture".

I tried to shed light on important topics like the Oracle Database server architecture, the Oracle Database process architecture, background processes, high availability configurations and so on..

The list of topics covered in the training was as follows;
  • Introduction to RDBMS
  • Introduction to Oracle
  • Architecture (Oracle)
  • Installation (Oracle)
  • DBA role & DBA tools
  • Introduction to Cloud
  • APPS DBA role & EBS System Administration (EBS 12.2)

While explaining these topics, I tried to share real-life stories all the time.. I tried to teach them the basics of Oracle, but I also dived deep when required.

The participants asked lots of good technical questions and these made our lessons more entertaining :)

The training for Database & Cloud lasted 3 days.

While preparing the slides for the presentations that I used in the training, I also wrote an exam for the students..

At the end of the training, we also gave this written examination to the participants. (35 questions)

It was a pleasure for me to teach Oracle in GTech Academy ( GTech -- Oracle University Partner)

I hope it was useful for these guys..
I also hope I will see them (at least some of them) as successful DBAs one day :)

Following is the picture of our class..  A good memory :)

Wednesday, June 6, 2018

RDBMS -- datapump -- 3 interesting problems and 3 things to consider

Recently dealt with some critical and interesting Data Pump problems.
These problems were basically related to corrupted dump files, incomplete dump files, missing master tables and performance problems during a partitioned table import.

Let's start with the corrupted dump files..
This is very clear.. If you have corrupt dump files in your Data Pump backup set, then you won't be able to correct them.. At least Oracle doesn't have a method or tool to correct them..
There are 1 or 2 third-party tools, but I didn't test them. Those tools don't support compressed dump files, anyway..
On the other hand, you can still import even if you have corrupted dump files in your backup set..
Of course, you will get fatal errors during your import.. That is unavoidable, but you can restart the import process and resume your import by skipping those errors.. At the end, you may have your data imported into your database, at least partially..

The other interesting problem was related with the uncompleted dump files..

In our case, someone killed the expdp process during the export.. One day after that, we tried to import the dump files, but we saw errors.. Some of the dump files were not there.. The master table was not there.. The master table was actually in the database, but it wasn't in any of the files in the export backup set. So we couldn't import the tables.. Data Pump was complaining about the master table..

Actually, all the critical tables had already been exported before someone killed the expdp.. So, what we needed to do was to find a way to import these dump files.. However, impdp was not doing anything.. It was complaining about the master table..

What we did was interesting.. We resumed the expdp job.. (it had been killed 24 hours earlier) So expdp continued, finished all the other tables in the list, exported the MASTER TABLE (expdp exports this table at the end) and finished its work..

Of course, the tables exported in this run were not from the same timestamp as the other ones, but at the end we got our MASTER TABLE included in the backup set and could import those critical tables using this backup set.
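Resuming a killed expdp job is basically a matter of attaching to it again (a minimal sketch; the credentials and job name are placeholders, and the actual job name can be found in dba_datapump_jobs):

$ expdp system/password attach=SYS_EXPORT_SCHEMA_01
Export> start_job
Export> continue_client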

The last thing that I want to share with you is related to impdp performance.. impdp can't import a partitioned table in parallel when that table is already created in the database. At least, this is the case for 11gR2.

So, if you are going to import a partitioned table using the parallel argument, I suggest you let impdp create that partitioned table.. If you create the table beforehand, you will see impdp running serially even though you pass the parallel argument to it. This is based on a real-life case.. Hope you'll find it useful.
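In other words, an import like the following tends to run in parallel only when the target partitioned table does not already exist (a sketch; the directory, dump file pattern and table name are placeholders):

impdp system/password directory=DATA_PUMP_DIR dumpfile=exp_%U.dmp tables=SCOTT.SALES_PART parallel=8 logfile=imp_sales_part.log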

Thursday, May 24, 2018

EBS 12.2 -- Things that can be done for debugging WLS Managed Server performance and stability

WebLogic (an FMW component) is an important component in EBS 12.2.

FMW plays an important role in EBS 12.2, as EBS 12.2 delivers HTTP, OAF and Forms services through FMW.

That's why, from time to time, real diagnostics are required, especially for analyzing weird performance and hang issues on EBS OAF pages.


In this post, I will go through the things that can be done for debugging the Weblogic side, especially the managed server performance and stability.

Of course, when dealing with WebLogic inside EBS, we directly check the managed server logs, admin server logs, heap size configurations, managed server counts (whether they are aligned with the concurrent user count or not), connection pool limits and so on. On the other hand, the debugging activities that I will give you in this blog post are a little more advanced. It is also needless to say that these debugging activities require advanced skills in WebLogic and EBS administration.

Note that I won't give the full instructions for these diagnostics activities. In other words, I will explain them only briefly.
Also note that these activities are not fully documented; that's why they are not fully supported -- the risk is yours.

Garbage Collector Debug: for getting more detailed GC info and checking the time spent in each GC event.
We can get this debug info using the -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps JVM arguments.
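For example, the server start arguments of a managed server could be extended roughly as follows (a sketch; the GC log path is a placeholder and the -Xloggc argument is optional):

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/oacore_server1_gc.log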

Running the technology stack inventory report: to collect the list of patches applied to all middle tier homes (besides WebLogic). The output of this script may be used to identify unapplied performance patches.

$ADPERLPRG $FND_TOP/patch/115/bin/TXKScript.pl -script=$FND_TOP/patch/115/bin/txkInventory.pl -txktop=$APPLTMP -contextfile=$CONTEXT_FILE -appspass=<appspassword> -outfile=$APPLTMP/Report_App_Inventory.html

Diagnostic Connection Leaks: for getting leak connection-related diag info, we use "How To Detect a Connection Leak Using Diagnostic JDBC Dumps (Doc ID 1502054.1)"

Create heap dumps & thread dumps: especially for getting info about OutOfMemory problems.
These diagnostics are enabled by adding the necessary command line arguments to the server start arguments section of the related managed server (using the WLS console).

The related arguments are specified using the server start arguments section ->

Connect to the WLS console
Navigate to Servers under the EBS_domain_<SID> environment
Click on the managed server (ex: oacore_server1)
Click on Lock & Edit in the Change Center
Click on Server Start
Edit the arguments (such as -XX:+HeapDumpOnCtrlBreak)

So once the necessary argument is given to a managed server, we restart the managed server and use OS kill commands to generate these dumps. (ex: kill -3 <os_pid> -- kill -3 sends SIGQUIT, which makes the JVM print a thread dump)

--review -> How to create a Java stack trace on e-Business Suite ? (Doc ID 833913.1)

Once the error is reproduced, we review the FMW logs -> "12.2 E-Business Suite - Collecting Fusion Middleware Log Files (Doc ID 1362900.1)".

Consider increasing Stuck Thread timeouts: in case we have stuck threads, we can increase the Stuck Thread Max Time using the WebLogic console.

Connection Debugging: For JDBC connection debugging, we use Oracle E-Business Suite 12.2 Data Source Connection Pool Diagnostics (Doc ID 1940996.1).

DB level trace: We enable trace at db level -> "alter system set events '10046 trace name context forever, level 12';"
We reproduce the issue and turn it off "alter system set events '10046 trace name context off';"

We check the traces (find the relevant traces using "grep MODULE *.trc" and/or "grep ACTION *.trc").

Tracing Managed Server sessions :  For diagnosing  managed server related db activity, and for diagnosing inactive (not closed) managed server sessions.

Reference: On E-Business Suite 12.2 V$SESSION.PROCESS incorrectly reports EBS Client Process ID as '1234' (Doc ID 1958352.1)

Connect to Weblogic Console and then do the following;
Services > Data Sources > EBSDataSource > Configuration > Connection Pool
Set "System Property" as below

v$session.program=weblogic.Name [Take note of the initial value one is changing as one will need to reset it once the fix is delivered and applied.]

Lastly we restart oacore managed servers and monitor the database using a query like;

SQL> select program, process, machine, sql_id, status, last_call_et from v$session where program like 'oacore_server%';

Tuesday, May 22, 2018

EXADATA -- Unique Articles Worth Reading ( imaging, upgrade, installation, configuration and so on)

Nowadays, my context has completely switched. That is, I have started to work more on Exadata and ECM/OCM migrations.. As a result of that, I produce more content in these areas.

Until last month, I was more focused on Exadata.. But nowadays, I'm focused not only on Exadata, but also on Exadata Cloud Machines and cloud migration projects.

Of course, I documented the critical things that we have done on the Exadata machines one by one, and produced the following articles to share with you.