Wednesday, October 18, 2017

EBS 12.2 -- oacore server start problem -- java.util.zip.ZipException: error reading zip file

We encountered a strange problem in an EBS 12.2.6 environment built on Solaris 11 SPARC servers.
The problem started after the DBA restarted the application services.
The problem was directly related to oacore:
oacore_server1 and oacore_server2 could not be started. (It was a multi-node apps tier environment built on a shared APPL_TOP.)

While all the other managed servers (such as forms) and the Admin Server could be started without any problems, the oacore servers could not.

The oacore managed servers could not be started because of the following error:

java.util.zip.ZipException: error reading zip file
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
at weblogic.utils.io.DataIO.readFully(DataIO.java:351)
at weblogic.utils.io.DataIO.readFully(DataIO.java:328)
at weblogic.utils.classloaders.ZipSource.getBytes(ZipSource.java:76)
at weblogic.utils.classloaders.GenericClassLoader.defineClass(GenericClassLoader.java:330)
at weblogic.utils.classloaders.GenericClassLoader.findLocalClass(GenericClassLoader.java:302)
at weblogic.utils.classloaders.GenericClassLoader.findClass(GenericClassLoader.java:270)
at weblogic.utils.classloaders.ChangeAwareClassLoader.findClass(ChangeAwareClassLoader.java:64)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at weblogic.utils.classloaders.GenericClassLoader.loadClass(GenericClassLoader.java:179)
at weblogic.utils.classloaders.ChangeAwareClassLoader.loadClass(ChangeAwareClassLoader.java:43)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

This is an undocumented and interesting real-life case.

I will give the cause and the solution shortly, but first let's look at what we did to find the underlying cause of this problem.

It was clear that the oacore application was being deployed by EBS (WebLogic) during the startup of the oacore servers.

The problem was in this deployment.

During the deployment, some zip files could not be read! (The file could have been a zip, jar, or war file.)

The error stack told us this much, but it did not give us the name of the problematic file.
  • So, we enabled debug on WLS, including debug for the Deployer.
Enabling debug:
Environment > Servers > MyServer > Debug > weblogic
Then, enable the debug scope you need, e.g. Deployment.
This change does not require a WebLogic Server restart.
Make sure the minimum severity is set to Debug in the WebLogic console:
Environment > Servers > MyServer > Logging > Advanced > Minimum severity to log: Debug

Even after enabling debug, we could not determine the name of the problematic file.
  • We executed ChkEBSDependecies to make sure there was no dependency failure.
$FND_TOP/bin/txkrun.pl -script=ChkEBSDependecies -server=ALL_SERVER

This was successful, so dependencies were not the cause.
  • We executed truss to see which jar/zip file was causing the issue.
 truss -daefo /tmp/to_erman.log admanagedsrvctl.sh start oacore_server1

However, truss didn't give us the name (or the output file was too big to analyze properly).
  • We modified the ulimits (especially the hard and soft limits for open files and processes). Again, this did not fix the problem.
  • We tried to create a new oacore server, to work around the problem in case it was related to a specific oacore_server, using "$AD_TOP/patch/115/bin/adProvisionEBS.pl ebs-create-managedserver".
However, this command failed: it tried to start the new server once it was created, and that new managed server (let's say oacore_server2) failed with the same zip error! So this was neither the solution nor a workaround.
  • We checked the AD and TXK patch levels, but they were already high:
SQL> select ABBREVIATION, NAME, codelevel FROM AD_TRACKABLE_ENTITIES where abbreviation in ('txk','ad');

ABBREVIATION NAME                                 CODELEVEL
ad           Applications DBA                     C.9
txk          Oracle Applications Technology Stack C.9
  • We did the following things as instructed by Oracle Support (although I found them unrelated to our zip issue):
1. Set SITE level profile option "FND: Disable Inline Attachments" (FND_DISABLE_INLINE_ATTACHMENTS) to a value of "TRUE"
2. Re-start EBS middle tier services to ensure the profile option change is picked up
3. Monitor for any further recurrence of the issue

1. Set the s_jdbc_connect_descriptor_generation parameter to TRUE on the target instance
2. Run AutoConfig so that the affected parameters reference the target instance
3. Re-test issue

As I expected, these moves didn't solve the issue.
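Regarding the truss attempt: the output file was too big to read by hand, but it can be filtered mechanically. The following Python 3 sketch is a hypothetical helper, not something used during this incident. It assumes the usual Solaris truss line format, where -f prefixes each line with pid/lwp and a failed call ends with "Err#5 EIO", and it assumes the failing read shows up as a read() or pread() on a descriptor whose open()/openat() is also in the log. It maps descriptors to paths per process and prints only the files whose reads failed with EIO; descriptors inherited through fork() or copied with dup() are not tracked.

```python
import re
import sys
from collections import defaultdict

# ASSUMED truss line shapes (verify against your own log):
#   1234/1:   0.5120  openat(AT_FDCWD, "/u01/x.zip", O_RDONLY)   = 7
#   1234/3:   0.5300  pread(7, 0xFFBFE000, 1024, 4096)           Err#5 EIO
# "1234/1:" is the pid/lwp prefix added by -f; the elapsed time comes from -d.
PID_RE = re.compile(r"^\s*(\d+)/\d+:")
OPEN_RE = re.compile(r'\bopen(?:at)?(?:64)?\((?:[^,"]*,\s*)?"([^"]+)"[^)]*\)\s*=\s*(\d+)\s*$')
READ_EIO_RE = re.compile(r"\bp?read(?:64)?\((\d+),.*Err#5 EIO")
CLOSE_RE = re.compile(r"\bclose\((\d+)\)")


def eio_paths(lines):
    """Return file paths whose read()/pread() failed with EIO, in first-seen order."""
    open_fds = defaultdict(dict)  # pid -> {fd: path}; fds belong to the process, not the LWP
    found = []
    for line in lines:
        pid_match = PID_RE.match(line)
        pid = pid_match.group(1) if pid_match else "-"
        m = OPEN_RE.search(line)
        if m:  # successful open: remember which path this fd refers to
            open_fds[pid][m.group(2)] = m.group(1)
            continue
        m = READ_EIO_RE.search(line)
        if m:  # failed read: report the path behind the fd (or a placeholder)
            fd = m.group(1)
            path = open_fds[pid].get(fd, "<fd %s: open() not in log>" % fd)
            if path not in found:
                found.append(path)
            continue
        m = CLOSE_RE.search(line)
        if m:  # fd numbers get reused, so forget closed ones
            open_fds[pid].pop(m.group(1), None)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 truss_eio.py /tmp/to_erman.log
    with open(sys.argv[1], errors="replace") as log:
        for path in eio_paths(log):
            print(path)
```

Run against a trace like /tmp/to_erman.log, this would print a handful of paths instead of the whole trace. If nothing is printed, the regular expressions probably do not match your truss version's output and need adjusting.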

Okay, let's see how I found the problematic file and how I fixed the issue:

After the attempts above failed, I decided to regenerate the jar files using adadmin.

I knew that those jar files were used by the oacore servers, but I wasn't expecting that zip files were used during the deployment/start of oacore_server, and I didn't expect that the same zip files were used when regenerating the jar files with adadmin.

So, I ran adadmin and tried to regenerate the jar files.
adadmin failed with an error, so I checked the adadmin.log file.

There it was: the I/O errors!

ERROR: I/O error while attempting to read /u01/app/fs1/EBSapps/comn/java/lib/DnBGlobalAccess.zip

ERROR: I/O or zip error while attempting to read entry oracle/dss/dataView/AdornmentLayout.class in zip file /u01/app/fs1/EBSapps/comn/java/lib/bipres.zip

ERROR: I/O or zip error while attempting to read entry oracle/apps/edr/security/server/EdrVpdRuleEOImpl.class in zip file /u01/app/fs1/EBSapps/comn/java/classes

So, the zip files and some class files in $JAVA_TOP could not be read due to I/O errors.

After seeing these errors, I jumped directly to the filesystem and tried to copy those problematic files using the cp command.

I/O errors again! Solaris could not copy them due to I/O errors.

So, the files were corrupted at the OS/storage layer, on the filesystem. (I sent this information to the OS team and requested a host and filesystem check.)
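The cp test is essentially a read test, and it can be applied to a whole directory tree instead of one file at a time. The sketch below (Python 3, hypothetical helper name `unreadable_files`) reads every regular file end to end and reports any file that raises an OS error, which would also catch loose class files like the one under .../java/classes. It does not check zip structure; it only shows whether the OS can deliver the bytes.

```python
import os
import sys


def unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root end to end, much like cp would.

    Returns a list of (path, error message) for anything that raised OSError.
    """
    failures = []

    def record_walk_error(exc):
        # os.walk calls this when a directory itself cannot be listed.
        failures.append((str(exc.filename), exc.strerror or str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, devices, etc.
            try:
                with open(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass  # discard the data; only read errors matter here
            except OSError as exc:
                failures.append((path, exc.strerror or str(exc)))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python3 read_check.py /u01/app/fs1/EBSapps/comn/java
    for path, err in unreadable_files(sys.argv[1]):
        print("%s: %s" % (path, err))
```

Reading a large tree takes time and generates I/O on the same storage that is already suspect, so it is better run outside peak hours.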

The fix was simple:

I renamed those files and copied them from the patch filesystem. (I checked the patch filesystem first: these files were identical to the ones in the run filesystem.)

The copy was successful, so the files in the patch filesystem were not corrupted.
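The rename-and-copy step can be sketched as follows. This is a minimal Python 3 sketch, not the exact commands used here: `replace_from_patch_fs` and the `.corrupt` suffix are hypothetical, and the caller supplies both the run-fs and patch-fs paths. Renaming first keeps the damaged file for the OS/storage team, and the final checksum comparison confirms the new copy matches the patch-fs file byte for byte.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_from_patch_fs(run_file, patch_file, suffix=".corrupt"):
    """Move the damaged run-fs file aside and copy the patch-fs version in its place.

    Returns the path the damaged file was renamed to.
    """
    aside = run_file + suffix
    os.rename(run_file, aside)  # rename does not read the file's data, so it works where cp fails
    shutil.copy2(patch_file, run_file)  # copy2 keeps mode bits and timestamps (not owner/group)
    if sha256_of(run_file) != sha256_of(patch_file):
        raise OSError("copy of %s does not match %s" % (run_file, patch_file))
    return aside
```

For example, assuming fs2 is the patch filesystem in this installation: `replace_from_patch_fs("/u01/app/fs1/EBSapps/comn/java/lib/bipres.zip", "/u01/app/fs2/EBSapps/comn/java/lib/bipres.zip")`. The checksum only proves the copy matches the patch-fs file; whether the patch-fs file is the right version for the run filesystem still has to be checked separately.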

After copying them from the patch filesystem, I ran adadmin again (generate jar files).
This time, it completed successfully.

After that, I started the services using adstrtal.sh

This time, oacore_server1 and oacore_server2 started successfully!

So, at the end of the day, I spent almost 6 hours solving this.
No sleep during the diagnostic work!

Unfortunately, the issue was undocumented, and there was no way to identify the problematic zip file other than running the adadmin "generate jar files" task.
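As an untested alternative to adadmin for locating damaged archives, a small script could open every zip/jar/war under the java directory and read all of its entries, roughly what the oacore deployment and adadmin do when they hit the error. The Python 3 sketch below uses the standard `zipfile` module: `ZipFile.testzip()` reads each member and verifies its CRC-32, so both OS-level I/O errors and zip-level corruption surface as a reported path. The helper name `bad_archives` and the suffix list are assumptions.

```python
import os
import sys
import zipfile
import zlib

ARCHIVE_SUFFIXES = (".zip", ".jar", ".war")  # assumed set of archive types to check


def bad_archives(root):
    """Return (path, reason) for every zip/jar/war under root that cannot be read fully."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(ARCHIVE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as archive:
                    # testzip() reads every member and checks its CRC-32;
                    # it returns the first bad member's name, or None.
                    first_bad = archive.testzip()
                if first_bad is not None:
                    bad.append((path, "bad entry " + first_bad))
            except (zipfile.BadZipFile, zlib.error, EOFError, OSError) as exc:
                # OSError covers read errors such as EIO from the disk;
                # BadZipFile/zlib.error/EOFError cover damaged archive contents.
                bad.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python3 zip_check.py /u01/app/fs1/EBSapps/comn/java
    for path, reason in bad_archives(sys.argv[1]):
        print("%s -> %s" % (path, reason))
```

Unlike the adadmin run, this only reads files and writes nothing, so it could be pointed at both the run and the patch filesystem before deciding which copy to trust.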

Anyway, I hope you find this post useful.
