Thursday, May 24, 2018

EBS 12.2 -- Things that can be done for debugging WLS Managed Server performance and stability

WebLogic (an FMW component) is an important component of EBS 12.2.

FMW plays an important role in EBS 12.2, as EBS 12.2 delivers HTTP, OAF, and Forms services through FMW.

That's why, from time to time, real diagnostics are required, especially for analyzing odd performance and hang issues on EBS OAF pages.


In this post, I will go through the things that can be done for debugging the WebLogic side, especially managed server performance and stability.

Of course, when dealing with WebLogic inside EBS, we directly check the managed server logs, the admin server logs, heap size configuration, managed server counts (whether they are aligned with the concurrent user count or not), connection pool limits, and so on. The debugging activities that I will cover in this blog post, on the other hand, are a little more advanced. Needless to say, they require advanced skills in WebLogic and EBS administration.

Note that I won't give the full instructions for these diagnostics activities; I will explain them only briefly.
Also note that these activities are not fully documented, so they are not fully supported -- the risk is yours.

Garbage Collector debug: for getting more detailed GC information and checking the time spent in each GC event.
We can get this debug info using the -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps JVM arguments.
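For example, appended to the managed server's JVM start arguments, the extra flags may look like the following; note that the -Xloggc target is my own addition here (an assumption -- point it at any writable path on the middle tier):

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/oacore_server1_gc.log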

Running the technology stack inventory report: to collect the list of patches applied to all middle tier homes (as well as WebLogic). The output of this script may be used to identify unapplied performance patches.

$ADPERLPRG $FND_TOP/patch/115/bin/TXKScript.pl -script=$FND_TOP/patch/115/bin/txkInventory.pl -txktop=$APPLTMP -contextfile=$CONTEXT_FILE -appspass=<appspassword> -outfile=$APPLTMP/Report_App_Inventory.html

Diagnosing connection leaks: for getting connection-leak-related diagnostic info, we use "How To Detect a Connection Leak Using Diagnostic JDBC Dumps (Doc ID 1502054.1)".

Creating heap dumps & thread dumps: especially for getting info about OutOfMemory problems.

--review ->  How To Create a Java HeapDump in E-Business Suite (Doc ID 835909.1)

You add the following parameters to the JVM startup parameters -> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path>

Then you restart the relevant managed server and wait for an OutOfMemory error to occur.
Note that a heap dump is written only on the first OutOfMemoryError.
You can also generate a heap dump by using jmap or jconsole.
Examples:

jmap -dump:format=b,file=<filename> <jvm_pid>

Launch jconsole --> attach to the Java process --> invoke the dumpHeap() operation on the com.sun.management.HotSpotDiagnostic MBean
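On more recent JDKs (7 and later), jcmd can also take a heap dump; a minimal sketch, with an illustrative output path:

jcmd <jvm_pid> GC.heap_dump /tmp/oacore_heap.hprof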

Note that the generated heap dump can be analyzed using tools like jhat (bundled with older JDKs), IBM HeapAnalyzer, or Eclipse Memory Analyzer.

Creating a Java stack trace: especially for getting info about hanging, blocking, or spinning processes. These diagnostics are enabled by adding the necessary command line arguments to the server start arguments of the related managed server (using the WLS console).

The related arguments are specified using the server start arguments section ->

Connect to the WLS console
Navigate to Servers under the EBS_domain_<SID> environment
Click on the managed server (e.g. oacore_server1)
Click on Lock & Edit in the Change Center
Click on Server Start
Edit the arguments (e.g. add -XX:+HeapDumpOnCtrlBreak)

So, once the necessary argument is given to a managed server, we restart the managed server and use OS kill commands to generate these dumps (e.g. kill -3 <os_pid> -- kill -3 sends SIGQUIT, which makes the JVM print a thread dump without terminating the process; with -XX:+HeapDumpOnCtrlBreak set, it also writes a heap dump).
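A minimal sketch of this flow (the server name is just an example; the thread dump typically lands in the managed server's stdout/.out log):

ps -ef | grep "[o]acore_server1"     # note the pid of the managed server JVM
kill -3 <jvm_pid>                    # SIGQUIT: the JVM writes a thread dump and keeps running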

--review -> How to create a Java stack trace on e-Business Suite ? (Doc ID 833913.1)

Once the error is reproduced, we review the FMW logs -> 12.2 E-Business Suite - Collecting Fusion Middleware Log Files (Note 1362900.1).

Consider increasing the stuck thread timeout: in case we have stuck threads, we can increase the Stuck Thread Max Time using the WebLogic console (Environment > Servers > <managed server> > Configuration > Tuning).

Connection Debugging: For JDBC connection debugging, we use Oracle E-Business Suite 12.2 Data Source Connection Pool Diagnostics (Doc ID 1940996.1).

DB-level trace: we enable tracing at the database level -> "alter system set events '10046 trace name context forever, level 12';"
We reproduce the issue and then turn the trace off with "alter system set events '10046 trace name context off';"

We check the traces (find the relevant trace files using "grep MODULE *.trc" and/or "grep ACTION *.trc").
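A minimal sketch for locating the relevant trace files, assuming the default diagnostic destination layout (the path and the module string are illustrative):

cd /u01/app/oracle/diag/rdbms/<db_name>/<instance_name>/trace
grep -l "MODULE NAME:(JDBC Thin Client)" *.trc    # trace files written by JDBC (managed server) sessions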

Tracing managed server sessions: for diagnosing managed-server-related DB activity, and for diagnosing inactive (not closed) managed server sessions.

Reference: On E-Business Suite 12.2 V$SESSION.PROCESS incorrectly reports EBS Client Process ID as '1234' (Doc ID 1958352.1)

Connect to the WebLogic console and then do the following;
Services > Data Sources > EBSDataSource > Configuration > Connection Pool
Set "System Property" as below

v$session.program=weblogic.Name [Take note of the initial value you are changing, as you will need to reset it once the fix is delivered and applied.]

Lastly, we restart the oacore managed servers and monitor the database using a query like the following;

SQL> select program, process, machine, sql_id, status, last_call_et from v$session where program like 'oacore_server%';

Tuesday, May 22, 2018

EXADATA -- Unique Articles Worth Reading ( imaging, upgrade, installation, configuration and so on)

Nowadays, my context has completely switched. That is, I have started to work more on Exadata and ECM/OCM migrations. As a result, I produce more content in these areas.

Until last month, I was focused mostly on Exadata, but nowadays I am focused not only on Exadata, but also on Exadata Cloud Machines and cloud migration projects.

Of course, I documented the critical things that we have done on Exadata machines one by one and produced the following articles for sharing with you.

Monday, May 21, 2018

Exadata -- Cisco Switch Firmware upgrade

In this post, I will explain upgrading the firmware of the Cisco switch, which is delivered --built-in-- with Exadata machines.
For explaining the process, I will go through a real-life case, which was done in an Exadata X3-2 environment.

The Cisco switch model that I use for demonstrating this upgrade is the Catalyst 4948E, which is the Ethernet switch delivered with Exadata X3-2 machines. (In Exadata X7, we see Cisco Nexus switches.)

In Exadata environments, these Cisco switches are used only for access to the systems management network interfaces (the Ethernet-based management network: SSH connections, ILOM, and so on).

So, during such an upgrade, no production traffic is affected; only consoles and node management.


The requirement for upgrading the firmware of these switches may arise after a security scan, which is usually performed regularly by the security teams in customer environments (especially enterprise customers).

The following is a list of vulnerabilities that were discovered in a customer environment, on the Cisco switch delivered with an Exadata X3-2. (The Cisco firmware version was cat4500e-IPBASEK9-M Version 15.1(1)SG.)

• Cisco IOS Cluster Management Protocol Telnet Option Handling 
• Cisco IOS IKEv2 Fragmentation DoS 
• Cisco IOS IKEv1 Fragmentation DoS 
• Cisco IOS Software DHCP Version 6 Server Denial of Service Vulnerability 
• Cisco IOS Software DHCP Denial of Service Vulnerability 
• Cisco IOS EnergyWise DoS 
• Cisco IOS Software Internet Key Exchange Version 2 (IKEv2) Denial of Service 
• Cisco IOS Software Smart Install Denial of Service Vulnerability 
• Cisco IOS Software RSVP DoS 
• Cisco IOS Multicast Routing Multiple DoS 
• Cisco IOS Multiple OpenSSL Vulnerabilities 
• Cisco IOS Software TFTP DoS 
• Cisco IOS Software DHCP Denial of Service Vulnerability 

These vulnerabilities are fixed in Cisco firmware version "cat4500e-ipbasek9-mz.152-2.E8", and here is the list of things that we did to upgrade to this 15.2.2E8 target release;

  • First, we connect to the Cisco switch using telnet from DB node 1 and check the current firmware version;

[oracle@exanode1~]$ telnet <cisco_switch_ip_address>
exaswc0>show version

Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500e-IPBASEK9-M), Version 15.1(1)SG, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Sun 15-Apr-12 02:55 by prod_rel_team

ROM: 12.2(44r)SG11
fbadmswc0 uptime is 4 years, 37 weeks, 2 days, 23 hours, 55 minutes
System returned to ROM by power-on
System restarted at 15:15:39 GDT Tue Jul 2 2013
System image file is "bootflash:cat4500e-ipbasek9-mz.151-1.SG.bin"
Hobgoblin Revision 21, Fortooine Revision 1.40

  • Then, we download the new switch software from Cisco -

https://software.cisco.com/download/release.html?mdfid=283027810&softwareid=280805680&release=15.2.2E8&flowid=3592
(Choose "IP Base Image" line from 15.2.2E8(MD) version.
File name : cat4500e-ipbasek9-tar.152-2.E8.tar)

  • After downloading the new switch software, we create a tftp server and put the new Cisco software bin (which comes out of the tar file) into a tftp directory like /tftpboot/switch_image. (A minimal sketch of the tftp-server setup follows the listing below.)

[root@acs-vmmachine~]# mkdir /tftpboot/switch_image

[root@acs-vmmachine ~]# chmod 777 /tftpboot/switch_image/

[root@acs-vmmachine ~]# ls -l /tftpboot/switch_image/
total 0

-rwxrwxrwx 1 root root 0 Mar 19 09:16 new_image.bin
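For reference, here is a minimal sketch of how such a tftp server can be set up on an Oracle Linux / RHEL-style client machine; this assumes the xinetd-based tftp-server package and is not part of the original procedure:

[root@acs-vmmachine ~]# yum install -y tftp-server xinetd
[root@acs-vmmachine ~]# sed -i 's/disable.*= yes/disable = no/' /etc/xinetd.d/tftp    # enable tftp (the served directory is /tftpboot or /var/lib/tftpboot depending on the release)
[root@acs-vmmachine ~]# service xinetd restart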

  • Then, back on the Cisco switch, we list the files in the bootflash directory and check the free space;

exaswc0>enable
Password: 

exaswc0#dir bootflash:
Directory of bootflash:/
    6  -rw-    25213107  Mar 19 2013 14:46:08 +04:00  cat4500e-ipbase-mz.150-2.SG2.bin
    7  -rw-    32288280   Jun 5 2013 20:04:54 +04:00  cat4500e-ipbasek9-mz.151-1.SG.bin
  
exaswc0>show file systems 
File Systems: 

Size(b) Free(b) Type Flags Prefixes 
* 60817408 45204152 flash rw bootflash:   --------> There is about 45 MB of free space in bootflash. (A minimum of 20 MB is required.)
  • We configure our Cisco switch to boot from a specific firmware file.

exaswc0#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
exaswc0(config)#no boot system
exaswc0(config)#boot system bootflash:cat4500e-ipbasek9-mz.151-1.SG.bin (current)

  • Then, we save the running config and copy it to bootflash with the suffix "before-upgrade";

exaswc0#copy running-config startup-config all 
exaswc0#copy running-config bootflash:cat4500e-ipbasek9-mz.151-1.SG-before-upgrade
  • Next, we copy this backup file to our tftp server, answering the prompts for the tftp server name and the destination filename;

exaswc0#copy bootflash:cisco4948-ip-confg-before-upgrade tftp:
  • After copying our running config to the tftp server (installed earlier on our client machine), we copy the new image from the tftp server to our Cisco switch by executing the following command on the switch.

copy tftp: bootflash:
Address or name of remote host []? acs-vmmachine
Source filename []? switch_image/new_image.bin
Destination filename [new_image.bin]?
cat4500e-ipbasek9-mz.152-2.E8.bin

...
....
exaswc0# 
exaswc0# dir bootflash: 
Directory of bootflash:/
    6  -rw-    25213107  Mar 19 2013 14:46:08 +04:00  cat4500e-ipbase-mz.150-2.SG2.bin
    7  -rw-    32288280   Jun 5 2013 20:04:54 +04:00  cat4500e-ipbasek9-mz.151-1.SG.bin
25  -rw-    38791882  Mar 20 2018 15:24:24 +04:00  cat4500e-ipbasek9-mz.152-2.E8.bin -- this is the firmware that we are upgrading to.

  • We verify the new image file;

exaswc0-ip#verify bootflash:cat4500e-ipbasek9-mz.152-2.E8.bin
File system hash verification successful.

  • After the new image file is verified, we point the switch's boot system to the new image bin and save the configuration to NVRAM.

exaswc0#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
exaswc0(config)#config-register 0x2102
exaswc0(config)#no boot system
exaswc0(config)#boot system bootflash:cat4500e-ipbasek9-mz.152-2.E8.bin
exaswc0(config)#
exaswc0(config)# (type <control-z> here to end)
exaswc0#show run | include boot
boot-start-marker
boot system bootflash:cat4500e-ipbasek9-mz.152-2.E8.bin
boot-end-marker

exaswc0# copy running-config startup-config all
exaswc0#write memory 


Note that 0x2102 instructs the boot process to ignore any break signals, sets the console baud rate to 9600, and boots into ROM if the main boot process fails for some reason.
  • Lastly, we reload our Cisco switch with the new firmware and save the running config.

exaswc0# reload 
exaswc0-#copy running-config startup-config all 
exaswc0#copy running-config bootflash:cat4500e-ipbasek9-mz.152-2.E8-after-upgrade
exaswc0#write memory 

  • At this point, we can continue by enabling SSH access and disabling telnet access. (Although this action is optional, it is highly recommended. Check the references below for the instructions.)

References:

Upgrading firmware / Configuring SSH on Cisco Catalyst 4948 Ethernet Switch (Doc ID 1415044.1)
How To Update Exadata Management Network Switch Firmware (Doc ID 1593004.1)

Thursday, May 17, 2018

RDBMS -- Interesting error on Duplicate From Active Database -> ORA-19845, ORA-17628, ORA-19571, ORA-19660

Recently, we encountered an interesting problem in an RMAN duplicate session.
We were trying to duplicate a database from an active database using RMAN, and although we did everything right, we ended up with the following error stack.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/03/2018 00:15:53
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script

ORA-19845: error in backupArchivedLog while communicating with remote database server
ORA-17628: Oracle error 19571 returned by remote Oracle server
ORA-19571: RECID STAMP not found in control file
ORA-19660: some files in the backup set could not be verified
ORA-19662: archived log thread 1 sequence 7643 could not be verified
ORA-19845: error in backupArchivedLog while communicating with remote database server
ORA-17628: Oracle error 19571 returned by remote Oracle server
ORA-19571: RECID STAMP not found in control file


As the error stack suggests, RMAN hit an error while communicating with the remote database server.
The remote server mentioned here was actually the auxiliary instance, which was the new database instance that we were creating from the active database.

This problem was closely related to the service_names parameter of this auxiliary instance.

As you may already know, when we duplicate from an active database, RMAN restores the spfile from the source instance and updates it according to the parameter settings that we used in our duplicate command.

For service_names and other parameters of this type, RMAN restores the spfile and updates it according to the values that we set with the SPFILE PARAMETER_VALUE_CONVERT clause of the RMAN duplicate command.
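To make the mechanism concrete, the failing approach looked roughly like this (a sketch only; the names, the convert strings, and the SET list are illustrative rather than our exact command):

DUPLICATE TARGET DATABASE TO ERM
  FROM ACTIVE DATABASE
  SPFILE
    PARAMETER_VALUE_CONVERT 'SOURCEDB','ERM'
    SET service_names 'ERM'
    SET db_create_file_dest '+DATA'
  NOFILENAMECHECK;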

However, what we discovered in this case was that RMAN couldn't do that update properly (at least for the service_names parameter, and at least in our case).

So, although we set the correct value in the SPFILE PARAMETER_VALUE_CONVERT clause, RMAN couldn't update the service_names parameter of the auxiliary instance properly.

As a result, we encountered "ORA-17628: Oracle error 19571 returned by remote Oracle server" error during our duplicate session.

I must admit that this was weird, and it was probably a bug.
Fortunately, we found a workaround.

As for the workaround, we did the following;
  • We created an init.ora for the auxiliary and set the desired parameter values in it, for example:
db_unique_name='ERM'
db_name='ERM'
instance_name='ERM1'
instance_number=1
db_create_file_dest='+DATA'
db_recovery_file_dest_size='40G'
db_recovery_file_dest='+RECO'
control_files='+DATA','+RECO'
db_create_online_log_dest_1='+DATA'
db_create_online_log_dest_2='+RECO'
diagnostic_dest='/u01/app/oracle'
audit_file_dest='/u01/app/oracle/product/12.1.0.2/dbhome_3/rdbms/audit'
log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST'
log_archive_dest=''
local_listener=''
cluster_database=FALSE

  • Then, we connected to the auxiliary instance and created an spfile from the pfile;
SQL> CREATE SPFILE FROM PFILE='<location of the pfile>';   -- this is the pfile (init.ora) created earlier
SQL> STARTUP NOMOUNT;

  • Lastly, we ran our duplicate command without the SPFILE clause (i.e., without SPFILE PARAMETER_VALUE_CONVERT).
In brief: we set the desired parameters for the auxiliary database in a pfile (init.ora), created an spfile from that pfile, and started the auxiliary instance in nomount mode using that spfile.
After that, we ran our RMAN duplicate command without specifying the SPFILE clause, as sketched below.
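A minimal sketch of that final run (the connection strings are illustrative, RMAN prompts for the passwords, and the exact options of our real command are not reproduced here):

$ rman target sys@SOURCEDB auxiliary sys@ERM
RMAN> DUPLICATE TARGET DATABASE TO ERM FROM ACTIVE DATABASE NOFILENAMECHECK;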

By doing this, we started the auxiliary instance with the desired parameters and bypassed the automatic spfile update that RMAN performs from the source instance to the auxiliary (this automatic update is done when we use the SPFILE PARAMETER_VALUE_CONVERT clause and the auxiliary instance has been started with a pfile).

This workaround saved the day, so I wanted to share it with you.

Note that starting the auxiliary instance directly with an spfile (filled with the desired parameters) is actually a good practice, so we are considering using this approach in our next duplicate sessions as well.

Wednesday, May 16, 2018

EBS -- Upgrading EBS 12.1.3 to 12.2 -- the general steps

In this blog post, I want to give you a list of the general steps that can be included in the project plan of an EBS 12.1.3 to 12.2 upgrade project.

By taking the following phases and their related steps into account, you can estimate your effort and build your project plan accordingly.

I used EBS 12.1.3 as the source version because it is a very common version in EBS customer environments. While listing the steps, I also wanted to highlight the teams responsible for completing them (apps DBA team, functional team, core business users, etc.).

Of course, some of these steps, such as the "upgrade the database" step, are optional (if your DB release is already recent enough).

Phase 1      
  • Upgrade the database on the existing EBS 12.1: apps DBA team      
  • Execute a functional test: EBS functional team
Phase 2      
  • Install all application pre-upgrade patches: apps DBA team      
  • Verify the instance: EBS functional team
Phase 3      
  • Execute all functional pre-upgrade tasks including customizations: functional team      
  • Perform a full system backup: System and apps DBA team   
 Phase 4      
  • Apply localization and 12.2 pre-upgrade patches: apps DBA team      
  • Upgrade to 12.2.0: apps DBA team      
  • Enable online patching: apps DBA team      
  • Apply tech stack patches: apps DBA team      
  • Upgrade to 12.2.6/12.2.7: apps DBA team      
  • Perform all post-upgrade tasks: apps DBA and functional teams      
  • Application function test cases: core business users