Monday, August 24, 2015

EBS 12.2 -- AD script start/stop hang issue in older AD TXK versions

Oracle recommends upgrading AD and TXK levels time to time.

Oracle Delevopment advises it as follows : 
Note: Oracle strongly recommends that you always apply the latest version of the AD and TXK release update packs at your earliest convenience.  

In addition to the performance enhancements and bug fixes there are some enhancements in the latest AD and TXK level that eases the administration and fixes some administration problems ,as well.
For instance; the AD and TXK Delta 6 brings an opportunity to start the WLS servers even if the admin server is done. 
This enhancement is introduced as an option named -msimode which brings the ability to use local copy of the configuration and skip the checks to identify the changes since last start.
Concerning the new -msimode above; also as per our practices, we have seen the following hang issue in AD startup scripts(for ex: adstrtal.sh) can not be reproduced with the lastest TXK and AD levels. 

The issue was documented in the following blog post;
AD startup script hangs, when the weblogic domain is locked: 

It seemed the issue was due to the synchronization that was done from the generic domain config to a local copy of the configuration for the WLS server  and because the domain was locked the startup of the  WLS server was prevented.

The issue was seen in older code , probably in an environment with AD and TXK Level 3, which caused the AD startup scripts to hang where the WLS domain was LOCKED.

At the bottom line, upgrading AD and TXK levels are recommend and needed as they eases our administration works and saves us spending our for find the root causes of the problems caused by bugs in the older AD and TXK levels.

So , if you are available, upgrade to the latest AD and TXK levels.

Note that : The latest RUPs are R12.AD.C.Delta.6 (Patch 19197270) and R12.TXK.C.Delta.6 (Patch 19330775)
Reference : Applying the Latest AD and TXK Release Update Packs to Oracle E-Business Suite Release 12.2 (Doc ID 1617461.1)

Wednesday, August 19, 2015

EBS 12.2 - Adop Reports, adopreports

Recently found an utility when I was going around the EBS filesystem. The utility is called adopreports and can be executed as follows.
The utility can be used to do a list of thing without using any other utilities such as Toad and sqlplus.
You can check table details, indexes on a table and more. But, It is actually based on patching. So it is there for actually to be used during patching activities.
In short, It has several reports which can be used for diagnosing adop issues, or simply gaining knowledge about the status of your system.

Here is an example usage of it;
  [applmgr@somehost]$ adopreports

Enter the APPS username: apps
Enter the APPS Password:

    Online Patching Diagnostic Reports Main Menu
    --------------------------------------------

    1.  Run edition reports
    2.  Patch edition reports
    3.  Other generic reports
    4.  Exit

    Enter your choice [4]: 3




    Other Generic Reports Sub Menu
    ------------------------------

    1.  Editions summary
    2.  Editioned objects summary
    3.  Free space in important tablespaces
    4.  Status of critical AD_ZD objects
    5.  Actual objects in current edition
    6.  Objects dependencies
    7.  Objects dependency tree
    8.  Editioning views column mappings
    9.  Index details for a table
    10.  Inherited objects in the current edition
    11.  All log messages
    12.  Materialized view details
    13.  Database sessions by edition
    14.  Table details (Synonyms, EV, etc.)
    15.  Count and status of DDL execution by phase
    16.  Back to main menu

    Enter your choice [16]: 9
Enter the object name : FND_NODES

===============================================================================
=========================================================================
=  Indexes for FND_NODES
=========================================================================

OWNER           INDEX_NAME             COL# COLUMN_NAME            COLUMN_EXPRESSION
--------------- ------------------------------ ---- ------------------------------ ------------------------------------------------------------
APPLSYS         FND_NODES_N1                      1 NODE_ID
APPLSYS         FND_NODES_U1                      1 NODE_NAME


Above report is captured in file : /u01/install/APPSDEVEL/fs2/EBSapps/appl/admin/TEST/out/adzdshowindex.out.

Tuesday, August 18, 2015

EBS R12 , 12.2 -- MULTI_ORG, MO_GLOBAL.SET_ORG_ACCESS ORA-01031 insufficient privileges, APP-PO-14142, MO_GLOBAL, DBMS_SESSION Line 101

Recently encountered an error in EBS R12 where a Multiple Organization setup was done.
The Standard forms screens was encountering APP-PO-14142 errors.



The error message was saying that there was an ORA-01031 produced in the database level. This error is actually the insufficient privilege error and it is normally produced when there is lack of privileges detected in the database level. So, the apps session must have trying to use a database object without having the necessary grants.



After some investigation , and putting some debug logs into the standard code, we have seen the everyting was in place. The schema that was used by the session was APPS and all the grants of the APPS schema was proper  ( still, just in case, we have recreated the grants and synonyms , as well as executed adgrants.sql)
We have taken an FND debug and analyzed the trace file , using the following well known method;

select max(log_sequence) from fnd_log_messages
eg : log_sequence = 71666646
2) Go to the System Profile options and set the following parameters:
FND: Debug Log Enabled = YES
FND: Debug Log Filename = NULL
FND: Debug Log Level STATEMENT (most detailed log)
FND: Debug Log Module %
3) Re-login to the application and reproduce the bug.
4) select message_text from fnd_log_messages where log_sequence > XXX – value from step 1


During the analysis , we have also seen that the org_id , security group id, application id, responsibility id etc. were all proper.

Using oradebug and/or alter system set events ‘1031 trace name errorstack level 3; , we have also gathered the call trace  and it was like the following; didnt helped much as the names of the function were interrnal and they were not documented or known

Calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFF2A394E08 ? 000000001 ?
000000001 ? 000000002 ?
ksedst1()+98 call skdstdst() 000000000 ? 000000000 ?
7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
ksedst()+34 call ksedst1() 000000000 ? 000000001 ?
7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
dbkedDefDump()+2741 call ksedst() 000000000 ? 000000001 ?
7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
ksedmp()+36 call dbkedDefDump() 000000003 ? 000000000 ?
7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
dbkdaKsdActDriver() call ksedmp() 000000003 ? 000000000 ?
+1960 7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
dbgdaExecuteAction( call dbkdaKsdActDriver() 7F01CAED9710 ? 7FFF2A39BA90 ?
)+1065 7FFF2A394E08 ? 000000001 ?
000000000 ? 000000002 ?
dbgdaRunAction()+81 call dbgdaExecuteAction( 7F01CAED9710 ? 00A184340 ?
0 ) 0020C0003 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbgdRunActions()+59 call dbgdaRunAction() 7F01CAED9710 ? 000000005 ?
0020C0003 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbgdProcessEventAct call dbgdRunActions() 7F01CAED9710 ? 000000005 ?
ions()+651 0020C0003 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbgdChkEventKgErr() call dbgdProcessEventAct 7F01CAED9710 ? 00BBC01C0 ?
+1653 ions() 7F01CA9BF698 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbkdChkEventRdbmsEr call dbgdChkEventKgErr() 7F01CAED9710 ? 00BBC01C0 ?
r()+56 7FFF2A391640 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
ksfpec()+61 call dbkdChkEventRdbmsEr 7FFF2A391640 ? 00BBC01C0 ?
r() 7FFF2A391640 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbgePostErrorKGE()+ call ksfpec() 7FFF2A391640 ? 7FFF2A391640 ?
1129 7FFF2A391640 ? 7FFF2A39BA90 ?
000000001 ? 000000002 ?
dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 00BBC01C0 ? 7F01CA990040 ?
66 000000407 ? 000000000 ?
100000000 ? 000000002 ?
kgeade()+351 call dbkePostKGE_kgsf() 00BBC01C0 ? 7F01CA990040 ?
000000407 ? 000000000 ?
100000000 ? 000000002 ?
kgerelv()+135 call kgeade() 00BBC01C0 ? 00BBC0370 ?
7F01CA990040 ? 000000407 ?
100000000 ? 000000002 ?
kserecl0()+157 call kgerelv() 00BBC01C0 ? 7F01CA990040 ?
000000407 ? 009D95D14 ?
7FFF2A39C900 ? 000000000 ?
kzctxChkTyp()+402 call kserecl0() 00BBC01C0 ? 7F01CA990040 ?
000000407 ? 000A39118 ?
074F6DB00 ? 074F6DB38 ?
kzctxesc()+341 call kzctxChkTyp() 7F01CA9DE998 ? 000000000 ?
000000009 ? 000A39118 ?
074F6DB00 ? 074F6DB38 ?
pevm_icd_call_commo call kzctxesc() 7F01CA9DE998 ? 000000005 ?
n()+867 7F01CA9A3DE0 ? 000A39118 ?
000000000 ? 074F6DB38 ?
pfrinstr_ICAL()+163 call pevm_icd_call_commo 00A5C6D80 ? 000000000 ?
n() 000000001 ? 000000007 ?
000000005 ? 7FFF00000000 ?
pfrrun_no_tool()+63 call pfrinstr_ICAL() 7F01CA9DE998 ? 24856BD10 ?
7F01CA9DEA08 ? 000000007 ?
000000005 ? 7FFF00000000 ?
pfrrun()+622 call pfrrun_no_tool() 7F01CA9DE998 ? 24856BD10 ?
7F01CA9DEA08 ? 000000007 ?
000000005 ? 7FFF00000000 ?
plsql_run()+644 call pfrrun() 7F01CA9DE998 ? 000000000 ?
7F01CA9DEA08 ? 7FFF2A39DD80 ?
000000005 ? 069D984CC ?




We have also reverse engineered the code (MO_GLOBAL) and seen that it was failing at DBMS_SESSION Line 101.
Line 101 of DBMS_SESSION was something like the following;

KZCTXESC(attribute1,attribute2.....);

the KZCTXESC was an internal C function  and Oracle was using it through a gateway interface probably declared with the following PRAGMA definition in the DBMS_SESSION;

PROCEDURE SOMEFUCTION(attribute1,attribute2.....);
PRAGMA INTERFACE (C, SOMEFUNCTION);

So , this was an Internal mechanism and it was closed to us , thus the issue can not be investigated more after this level.

Also look : undocumented pragmas http://www.petefinnigan.com/weblog/archives/00000835.htm

Also, the problem in the DBMS_SESSION.. It just could not be tested directly and it was misleading to test the dbms_session by using a direct plsql approach, as to set the attribute of a context it is necessary to use the package/procedure declared when the context was created.

That is ;

begin
dbms_session.set_context('multi_org', 'access_mode', 'S');
end;
--> Can not be done, it will produce ORA-01031 in any cases, as MO_GLOBAL should be used to set this context..
->
select type,schema, package from dba_context where namespace = 'MULTI_ORG';
ACCESSED LOCALLY APPS MO_GLOBAL

Anyways, altough the investigations and finding the solution took 4 hours of mine ; I ll keep it short :)
Then, we have took a Forms Trace with binds and analyzed the trace file.
In that trace file, we saw a very interesting thing.
There was a very interesting line, as the standard context name was there , I mean multi_org, but it was written as MULTİ_ORG.. An uppercase "İ" was there, so the database session must have been using the word MULTİ_ORG for using the multi_org context and failing because the MULTİ_ORG with an uppercase İ was not equal with multi_org actually.

By keeping it in mind, checked the recently modified profile option values in the system and saw ICX_NLS_SORT, ICX: NLS Sort was changed to XTURKISH , which was actually the extended Turkish.

Then we connected to the database with sqlplus and set my session's nls_sort to XTURKISH, and used the following query to check the situation;

select nls_upper(multi_org) from dual;

The output of the query was MULTİ_ORG as I was expected.
So at that point we concluded that there must be some uppercase function used in the EBS code and that was changing the name of the context that was trying to be set by the database session from multi_org to MULTİ_ORG , which was actually a different word than multi_org.
We also concluded that, DBMS_SESSION Line 101 could have getting ORA-01031 insufficient privileges because it could not find the context that the session was trying set , as there was actually  no context named MULTİ_ORG in the database.

As for the solution;

We changed the the ICX NLS SORT profile from XTURKISH to BINARY and solved the problem ..

It was a very interesting story , that's why I shared this with you.
It means, chaging a EBS profile, actually changes the behave and output the code and I didnt like it at all. It seem like a bug actually, as an internal APPS context should not be dependent on this kind of profile change..  Anayways, we know the solution for that now :)
The system that we have encountered this problem was the latest Version, 12.2.4.
So, in any case, I ll report that to Oracle, as well.

Saturday, August 8, 2015

EBS R12 , 11i -- Changing output file before printing

In this post, I will mention about a little trick that can be used for customizing the printing process in EBS.

As you may already know, EBS prepares the output and send it to printers using Os tools, such as "lp" in Linux.
So, if we can manipulate the output files before sending it to the printers , we can accomplish customizing the output file before sending it to the printers.
Let's explain this process this by a little explain.
Suppose we want to replace all the 0 (zero characters) with the O ( The letter O) in our printings and suppose we want to do this in all EBS prints.
In this kind of scenario, we have to build a mechanism that is tightly integrated with EBS printing sub system, as we want EBS modify the output files generated by EBS and we want it to be transparent to EBS.

So, the things that need to be done are pretty simple actually.
First, we create a script in the application node of EBS environment, where the printing is done.
The script will include a one liner which will replace the characters in the outpout file and send it to the printer. Note that, its first argument will be the printer name and the second argument will be the output file.

/home/applmgr/erman.script
sed -i 's/0/O/g' $2
lp -d $1 $2


Then , we go to EBS screens and modify the printer definition (to the driver actually) as follows;

Change in Arguments text box: /home/applmgr/erman.script $PROFILES$.PRINTER $PROFILES$.FILENAME


I will not go in details about these arguments and script, as you have found this blog and reading this article with curiosity :) That is, I suppose you can understand the purpose of the script and the changes we have done in the arguments textbox above.

Note that: The arguments may be changed, extra arguments may be added.
So, this is the way, and I hope you like it.

ODA X4-2 -- Virtualized platform, ASMResilver running all the time, Bug 20438706, ASM , resilvering , asmResilver2

We may notice a performance problem caused by  an ASMResilver process, which runs all the time.
the issue can be encountered in ODA X4-2 virtualized platforms and it may be started after restarting an ODA node.
After the restart , we may see the asmResilver2 process starts and runs forever. It cannot complete its work and stays in a stuck situation.
As a result, the overal performance of the viruatliazed system decreases.

I have explained Asm Resilvering generally in one of my previous posts: http://ermanarslan.blogspot.com.tr/2015/05/exadata-asm-resilvering-vs-asm-rebalance.html

So, in this post, I will focus on the resilvering bug in ODA , rather than explaining what asm resilvering is.

There are two major threads take roles in ASM Resilvering process. One of them is asmResilver1. This thread starts in case of a recovery situation such as when a node was aborted and then started again. AsmResilver1 checks if a mirror recovery is required and if it finds a recovery is required, it begins the recovery.
asmResilver2 is the other thread plays roles in the ASM recover/resilvering process. It is triggered when a cluster membership is changed. This kind of changes may be the results of a node crash or an ASM instance crash.
asmResilver2 checks to find out if a volume needs to be recoverd and does the recovery if it is needed.
If there is a ongoing mirror recovery operation in the volume that needs to recovered, that recovery operations is aborted and this thread restarts the recovery process.
So that 's why, after a node failure , we can see asmResilver starts immediately during the restart of the node and then aborted and then asmResilver2 gets started, which it is actually an expected behaviour.

This type of recovery can be triggered by the following events:
1. System reboot/crash while volumes are mounted.
2. ASM shutdown with volumes mounted (shutdown abort/immediate)
3. Forced diskgroup dismount while volume is open.

Also it can be aborted by the following events:
1. Restarting ASM instance anywhere in the cluster
2. Rebooting a node anywhere in the cluster.
3. Unmounting any mirrored volume anywhere in the cluster.

Moreover, if an ongoing mirror recovery is aborted, the recovery will start from the beginning when it is restarted.

So, if we restart a node oftenly, then we may see asmResilver2 running all the time , as it may not finish its work between two restarts and because each time it needs to start from beginning, thus it may not complete the recovery very quickly.

But, the problem that makes me write this blog post is not like that .. 
The problem I m talking about was that asmResilver2 kept running with %100 CPU usage all the time, altough we did not reboot a node for weeks.

This kind of issues can be seen from the logs. The mirrored regions per sec and the resilvering resilvering window can be estimated.
So, by considering the volume size and resilvered total region size , we may decide whether it is normal or not.
For example:

If  the entire volume is 4TB and if we  see 1.07 TB of this 4 TB is resilvered in 19 hours and if the resilvering process is still running , then it is an evidence of a hung situation or an endless loop situation as it is not possible for resilvering for resilvering process to operate on this kind of big data during a normal DRL aging interval. (this was seen in our case actually)

Anyways, I ll keep it short, so this hung or endless loop situation that I have explained above was caused by a bug actually..
The bug was getting triggered for the volumes which were greater than 2TB. Because of this bug the resilvering code was trying toı recover the same volume regios repeatedly causing an endless loop.
During this recovery operation , the resilvering was done for the same regions repeadetly and this action was increasing the total resilvered regions and that s why we were seeing such an huge total resilvered region counts.

So , for this kind of problem;
The cause is the Bug with number 20438706 and the fix is to apply the patch 20438706.
The patch should be applied on ODA BASE nodes, where GI homes are located.
The patch should be applied by reading the readme file of it and before applying it all the VMs in the ODA environment should be stopped.

In brief, if we have an ODA X4-2 virtualized platform, keep in mind that, there is a bug in the resilvering code and it must be fixed before we ll have volumes greater than 2 TB in size.

Thursday, August 6, 2015

EBS R12 -- patch analysis, rollback a patch (only in theory, not recommended)

As you may know, in EBS R12 there is no way or no utility that can  rollback an Applications patch which is applied with the adpatch or adop .
So, these patches are not like the rdbms patches which can applied and rolled back using the opatch utility.
Altough, there is no utility or a support document that can be used to rollback an application patch, there is a manual method to follow for rolling back an applications patch in EBS systems.
It is probably a not supported action but it is worth mentioning..
Of course , this method must not followed for big pataches like applications upgrade patches , but it can be used for small patches which are applied for fixing a couple of bugs.
Knowing the proper method for rollbacking a patch is restoring a backup, this manual method that I will explain below can also be used on your own risk as the last resort.

To rollback a patch, we first need to identify the changes delivered with it. To identifiy the changes, we unzip the patch zipfile and open the patch driver file.. The driver file is where the patching utility gets its order from.

The actions in driver file are equipped with comments , for example the following ;

# Phase TAB - Create Tables and Indexes
# file-version-parsed XXX patch/115/xdf /XXX.xdf 120.5.12020000.2

So, comments above tell us that , the patch creates some tables and indexes and the definitions for these tables and indexes are in XXX.xdf file that is stored in the patch directory, exactly in the patch/115/xdf subdirectory.

So , when we open this XXX.xdf ile, we gather te following info , as we find it properly commented there..

Primary Object Schema Name :
XXX
Primary Object Name :
    XXX_EXT
Primary Object Type :
    TABLE
Dependent Object Information :
Indexes on XXX_EXT :
    XXX_UI
    None
Sequence(s) on XXX_EXT :
    None
Policy(ies) on XXX_EXT :
    None

So, we see that the patch creates a table named XXX_EXT and also creates an index XXX_UI on table XXX_EXT.

Similary, we identify the plsql objects and form files that are created or changed with the patch using the information written in the driver file, and take the necessary actions to take them back.

Following is a snip from a patch driver file, it basically says that , this patch is designed to execute the XX_NEW.pls to create or replace some package specification ..
So, by opening the pls with an editor, we can easily see the name of the spec and find out what is changed there. So if we want to rollback that spec, we can take the definition of it using a backup or a test system and compile it back in the system where the patch is applied.

# Phase PLS - Create Package Specifications
# file-version-parsed cle patch/115/admin/sql XX_NEW.pls 120.2.12020000.1
sql xxx patch/115/admin/sql XX_NEX.pls none none none package &phase=pls
checkfile:xxx:patch/115/admin/sql:XX_NEW.pls

So, the necessary actions that we are talking about here, may be getting the definition of the changed objects from a backup or a test system and compiling them in the system where the patch is applied ..

If the changed object is an application tier object such as a form (fmb) , we should take the old version of it from the test environment or from a backup, place it in to the correct folder and compile it.

Lastly, in the driver file, we may also see a data load for seed tables with a comment like;
#Phase DAA - AOL Seed Data.
In this case, we can analyze the lct and ldt files to rollback the changes, as these seed data operations are done using FNDLOAD utility.

Once all the work for rolling back is finished, we may compile the invalid objects inside the database using adadmin-apps compile and utlrp.sql and restart our application services.

Altough, we rolled back a localization patch in EBS 12.1 using this method recently, and altough it worked for our customer, I want to remind you that, this is not a method for rolling back a patch.
You should never use this method normally!
But, of course, you can use it on your own risk, if you have nothing else to do, when you have no backup., nothing.
Moreover , if you decide to roll back a patch manually using this patch analysis method, you must know what you are doing, as it is not documented, not supported and a completely manual operation. Also, the steps in such an operation may vary according to the patch and the environment , so you must have necessary experience to analyze the patch driver file properly and take the necessary actions in the system.