Monday, February 10, 2014

Oracle EBS on EXADATA X3 Database Machine

In this post, I will write about a POC we did for a leading organization in Turkey.
The POC was done to prove that EBS 11i can run smoothly on Exadata X3 and to demonstrate the performance results achieved with the power of Exadata X3.


The implementation steps of this POC were: patch the source application and database tiers in place, migrate the database to Exadata, and configure the Exadata and EBS components to maximize performance while supplying high availability and load balancing.

For the source, we had an EBS database running on a powerful HP server.
The source system's computing power was provided by 2 x Intel 4-core CPUs, and the I/O subsystem was supplied by NetApp storage in the backend. The data between the storage and the database server flowed over fiber using the FCP protocol. All of this was designed to run the EBS database, which was 1.2 TB in size. There was a high number of batch jobs, transactions, ad-hoc queries and reports on the database, and the users were complaining about a significant degradation in response times during peak hours.
The system was servicing the following technologies/tools: Application Server components (Oracle Forms, Reports, Apache, OC4J instances), Discoverer Admin/Desktop and web reporting tools, in-house developed external web applications, Oracle Concurrent Managers/Programs, lots of integrations, etc.

So it was a busy system (1,200 database sessions on average), as you can imagine.

Okay, let's take a clear look at the source system specs;

-SOURCE -> HP ProLiant DL380 G7 & NetApp SAN-



Hardware:

148 GB memory
CPU count: 2
Core count per CPU: 4
CPU model: Intel(R) Xeon(R) CPU E5640 @ 2.67GHz
NetApp storage (a RAID-DP array with 15k RPM disks dedicated to the EBS LUNs)

Software:

OS: RHEL 5.4 64-bit, hugepages enabled & used
Oracle RDBMS: Oracle 10.2.0.4 64-bit Enterprise Edition
EBS: 11i (11.5.10.2) -- extended support patches applied (the application tier was not migrated to Exadata)
The EBS modules installed and used were Financials, Inventory, HR, some custom modules, etc.

The top waits on this source system were db file sequential read, read by other session, db file scattered read, log file sync and CPU time.

Top 5 wait events (gathered from the AWR snapshots taken at peak hours):

[Screenshot: AWR Top 5 Timed Events]

As you can see, the database had a lot of I/O activity. The load was high, the system was busy, and the I/O waits were significant.

Elapsed:   60.09 (mins)
DB Time:   2,427.01 (mins)

Operating System Statistics:

Statistic                    Total
BUSY_TIME                    9,744,849
IDLE_TIME                    36,270,309
IOWAIT_TIME                  11,054,996
NICE_TIME                    210
SYS_TIME                     646,362
USER_TIME                    8,875,169
LOAD                         35
RSRC_MGR_CPU_WAIT_TIME       0
PHYSICAL_MEMORY_BYTES        63,350,468,608
NUM_CPUS                     16
NUM_CPU_SOCKETS              2

Now let's look at the target system specs;

-TARGET -> EXADATA Database Machine X3-2 Quarter Rack-

Hardware


2 x Database Servers, each with:

2 x Eight-Core Intel® Xeon® E5-2690 Processors (2.9 GHz)
256GB Memory
Disk Controller HBA with 512MB Battery Backed Write Cache
4 x 300 GB 10,000 RPM Disks
2 x QDR (40Gb/s) Ports
4 x 1/10 Gb Ethernet Ports (copper)
2 x 10 Gb Ethernet Ports (optical)
1 x ILOM Ethernet Port
2 x Redundant Hot-Swappable Power Supplies
3 x Exadata Storage Servers in total;
36 CPU cores for SQL processing
12 PCI flash cards with 4.8 TB Exadata Smart Flash Cache
36 x 600 GB 15,000 RPM High Performance disks
2 x 36 port QDR (40 Gb/sec) InfiniBand Switches

Software:

Oracle Grid Infrastructure 11.2.0.3 (11.2.0.3.21)
Oracle RDBMS 11.2.0.3 Enterprise Edition
Oracle RAC
Oracle Exadata Storage Server Software components

...

The installation of the Exadata was done by Oracle, but this time I partially participated in the installation process. The Exadata X3-2 Database Machine was re-imaged and Bundle Patch 21 was installed.

We requested 22 IP addresses, as we had to configure the internal and external networks as well as the SCAN listeners.

We also requested 5 Ethernet cables: 4 of them for the 2 DB nodes and 1 for the Cisco switch.
The requested IP block template was like the following:
 
For external connections: (1st vlan)
exaclient01 192.168.19.11
exaclient01-vip 192.168.19.12
exaclient02 192.168.19.13
exaclient02-vip 192.168.19.14
exa-scan 192.168.19.15
exa-scan 192.168.19.16
exa-scan 192.168.19.17 -- note: 3 SCAN IPs, different IPs, same hostname (exa-scan)

For Administration and ILOM : (2nd vlan)
exadbadm01 192.168.95.11
exadbadm02 192.168.95.12
exaceladm01 192.168.95.13
exaceladm02 192.168.95.14
exaceladm03 192.168.95.15
exadbadm01-ilom 192.168.95.16
exadbadm02-ilom 192.168.95.17
exaceladm01-ilom 192.168.95.18
exaceladm02-ilom 192.168.95.19
exaceladm03-ilom 192.168.95.20
exasw-adm01 192.168.95.21
exasw-iba01 192.168.95.22
exasw-ibb01 192.168.95.23
exasw-pdua01 192.168.95.24
exasw-pdub01 192.168.95.25

The installation was a standard process; the servers were installed from the standard images.

Okay, after describing the source and target environments, let's go through the migration process:

The following MOS docs can be followed for the Exadata migration;
  • Export/import notes on Applications 11i Database 11g (Doc ID 557738.1)
  • Interoperability Notes Oracle EBS 11i with Oracle Database 11gR2 (11.2.0) (Doc ID 881505.1)
  • Migrating an Oracle E-Business Suite Database to Oracle Exadata Database Machine (Doc ID 1133355.1)
  • Migrating Oracle E-Business Suite to Sun Oracle Database Machine Using Oracle Data Pump paper
  • E-Business Suite on Exadata Oracle Maximum Availability Architecture White Paper October 2012 

I will not go into detail about every migration step, so I will keep it short.

First of all, the migration method was decided: we would use Data Pump for the operation. Data Pump would work without any problems in spite of the version differences in the OS and the database software. It would also handle our upgrade automatically during the migration, as we would export the data from 10g and import it into 11g. Data Pump would also clear our fragmentation and help us decrease the footprint of our EBS database. We also expected the import to be very fast, as the target system was an Exadata; that is, we had a 2-node RAC environment and several disks, so we could push the limits of parallelism in terms of both I/O and CPU.

So, after choosing the migration method, the first step of the migration was upgrading the related Oracle EBS application tier and database objects to the appropriate versions. This was needed because the target environment was an Exadata, which comes with an 11gR2 database by default. So, in order to migrate our EBS database (which was 10.2.0.4), we had to apply the necessary patches to make it interoperable with 11gR2 databases.

We also needed to apply other patches for components like AutoConfig, exp/imp and so on; they were needed to support the new technology. In addition, we applied patches to the target database (11.2.0.3) to fix the known EBS-specific problems.

During one of the patches we needed to stop ASM, too. While starting ASM again, we faced the problem (and applied the solution) described in the following post:

http://ermanarslan.blogspot.com.tr/2014/02/start-crs-error-crs-4640-11gr2.html

After applying the necessary patches, I first created our target database using the CREATE DATABASE command syntax. Actually, the database creation script was generated by an Oracle EBS script, and we executed it on the target to create our target database. After the successful completion of the database creation script, we converted the new database into a two-node RAC. Then, I set up the SYS and SYSTEM schemas in the target database. In addition, I installed the Java Virtual Machine and the required components such as ORD, Spatial, XDB, OLAP, interMedia and Context into the target database using the Oracle Applications scripts.

I followed the docs line by line. For instance, we recorded the advanced queue settings, as the docs indicate that advanced queues are not propagated to the target database instance during the export/import process.

I rebuilt the spatial indexes to remove the rebuild index parameters from them, synchronized the indexes that were not in sync before running the export, and so on. In brief, everything stated in the MOS docs was done before exporting the database.
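For illustration only, if the unsynchronized indexes are Oracle Text indexes, syncing one before the export could look roughly like the sketch below; the index name is a hypothetical example, and the authoritative steps are the ones in the MOS notes listed above.

# List Text indexes that still have pending DML, then sync them (names are illustrative)
sqlplus -s / as sysdba <<'EOF'
SELECT DISTINCT pnd_index_owner, pnd_index_name FROM ctxsys.ctx_pending;
-- Run once for each index reported above
EXEC ctx_ddl.sync_index(idx_name => 'APPLSYS.FND_LOBS_CTX');
EOF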

In the export phase, I used a parameter file as stated in the docs and exported the database with 8 parallel export processes. I kept the maximum dump file size at 20 GB in order not to hit the dump file count limit.
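The export setup was conceptually along these lines. This is a simplified sketch, not the actual parameter file from the POC; the directory object, paths and dump file names are hypothetical, while the parallelism and the 20 GB file size reflect what is described above.

# Hypothetical Data Pump directory pointing at the staging area
sqlplus -s / as sysdba <<'EOF'
CREATE OR REPLACE DIRECTORY ebs_exp_dir AS '/stage/expdp';
EOF

# Parameter file: full export, 8 parallel workers, 20 GB max per dump file
cat > /stage/expdp/expdp_ebs.par <<'EOF'
directory=ebs_exp_dir
dumpfile=ebs_full_%U.dmp
logfile=ebs_full_exp.log
full=y
parallel=8
filesize=20G
EOF

expdp system parfile=/stage/expdp/expdp_ebs.par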

During the export phase, I faced the following Data Pump problems and applied the related solutions described in the following posts of mine;

http://ermanarslan.blogspot.com.tr/2014/01/rdbms-expdp-ora-39095-dump-file-space.html

http://ermanarslan.blogspot.com.tr/2014/01/ebs-datapump-expdp-error-ora-33272-zpb.html

Note that, if you cancel your export during the export phase and want to start it again for some reason, the following note can be used to clean up the leftover jobs before issuing the expdp command again:
http://ermanarslan.blogspot.com.tr/2014/02/datapump-cleanup-dbadatapumpjobs.html

After exporting the database, the export dump files were copied to the target system. I used the Exadata compute node's local disks to store the export files, by creating an LVM volume on them.

I didn't use DBFS, because I didn't find it necessary for this case, as this was a POC.
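Carving out that local staging area was basically a standard LVM exercise; the sketch below is only an illustration, and the device name, sizes and mount point are hypothetical.

# Create a volume group and a logical volume on a free local device (hypothetical /dev/sdd)
pvcreate /dev/sdd
vgcreate vg_stage /dev/sdd
lvcreate -L 700G -n lv_dump vg_stage

# Build a filesystem and mount it as the Data Pump staging area
mkfs -t ext3 /dev/vg_stage/lv_dump
mkdir -p /stage/expdp
mount /dev/vg_stage/lv_dump /stage/expdp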

When the export was finished, I started the import job with parallel workers and disabled the cluster option (which defaults to Y) by running impdp with cluster=N. This parameter prevented the I/O errors caused by the export dump files not being available on the remote nodes (as I used the local storage of node 1).
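The import invocation was conceptually like the following sketch; the directory object, paths and dump file names are the hypothetical ones from the export sketch above, and parallel=8 is illustrative.

# Directory object on the target, pointing at the local staging area on node 1
sqlplus -s / as sysdba <<'EOF'
CREATE OR REPLACE DIRECTORY ebs_exp_dir AS '/stage/expdp';
EOF

# cluster=N keeps all Data Pump workers on node 1, where the dump files reside
cat > /stage/expdp/impdp_ebs.par <<'EOF'
directory=ebs_exp_dir
dumpfile=ebs_full_%U.dmp
logfile=ebs_full_imp.log
full=y
parallel=8
cluster=N
EOF

impdp system parfile=/stage/expdp/impdp_ebs.par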
Again, the MOS documents were followed line by line, so all the necessary little things were done before and after the import process...

When the import finished, most of our work in the database tier was also finished.
We left the performance tuning to the end, but I did a few things in the database tier at this point, such as:
Fixed object statistics were gathered (issue details: http://ermanarslan.blogspot.com.tr/2014/01/dbmsstats-error-gathering-fixed-object.html)
After the import, srvctl and local listener configurations were done, as well.
In addition, Enterprise Manager was configured and started. Also, the FND tables were cleaned and AutoConfig was run on the DB tier to populate the tables with the new database information again. (Note that AutoConfig was run on both DB node 1 and DB node 2.)
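A rough sketch of that post-import housekeeping is below; the database name, instance names and Oracle Home path are hypothetical (the host names simply reuse the example names from the IP list above).

# Gather fixed object statistics
sqlplus -s / as sysdba <<'EOF'
EXEC dbms_stats.gather_fixed_objects_stats;
EOF

# Register the RAC database and its instances with the clusterware
srvctl add database -d EBSPOC -o /u01/app/oracle/product/11.2.0.3/dbhome_1
srvctl add instance -d EBSPOC -i EBSPOC1 -n exaclient01
srvctl add instance -d EBSPOC -i EBSPOC2 -n exaclient02
srvctl start database -d EBSPOC

# Point each instance's local_listener at its own VIP
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=exaclient01-vip)(PORT=1521))' SID='EBSPOC1';
ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=exaclient02-vip)(PORT=1521))' SID='EBSPOC2';
EOF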
During the import I faced some problems; you can find the explanations and solutions for these problems in the following posts;

http://ermanarslan.blogspot.com.tr/2014/01/ebs-impdb-ora-39083-ignorable-apps.html
http://ermanarslan.blogspot.com.tr/2014/01/datapump-impdp-raises-error-ora-942.html

At this point, our database had become an Apps database, so we were ready to go ahead and connect our application tier to this 2-node RAC database. We proceeded with the application post-clone phase.

In the application post-clone phase, we ran adcfgclone.pl to update the application configuration files with the new database information. After this step the application was connected to our new database running on Exadata. When the post-clone was completed, I did some manual configuration for load balancing, such as modifying the jdbc_url in the apps context file. After these changes, I ran AutoConfig to update the necessary configuration files of the application components.
At this point, I reviewed docs 881505.1 and 1133355.1 and took any actions that had previously been missed or were still necessary. I also reviewed the EBS on Exadata MAA papers and took the relevant actions.

Lastly, I configured all the application components to connect to the database in a load-balanced way, increased the memory target parameter of the database, and started/restarted the application and database services. You can find the details about load balancing in the following post:
http://ermanarslan.blogspot.com.tr/2014/01/ebs-on-oracle-rac-exadata-load-balance.html
Note that I faced a problem with tnsping: tnsping 8 couldn't ping the database through a load-balanced TNS entry. In this case, it was normal.
For the details and explanation: http://ermanarslan.blogspot.com.tr/2014/01/tnsing-tns-03507-even-though-sqlnet.html

Initially, the architecture became like the following;

[Architecture diagram: all application components connecting to the 2-node RAC database through a load-balanced TNS entry]

So all the components were connecting to the database using the load-balanced TNS entry. (Note that EBS 11i can't use SCAN IPs, by the way.)
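The load-balanced connect descriptor used by the application tier (and mirrored in the jdbc_url) was conceptually like the sketch below; the alias and service name are hypothetical, while the VIP host names are the example ones from the IP list above.

EBSPOC_BALANCE =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = YES)
      (FAILOVER = YES)
      (ADDRESS = (PROTOCOL = TCP)(HOST = exaclient01-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = exaclient02-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = EBSPOC)
    )
  )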

With this configuration, the business users ran some tests on the screens and batches. They seemed pleased with the response times. They also ran their Discoverer reports, which had been suffering in performance, and they seemed pleased again.
At the same time, we analyzed the system performance through AWR reports and Enterprise Manager.
We saw that there weren't any significant I/O related waits. This made us happy, because all the waits were on CPU.
Top 5 waits:
DB CPU                          8,610
PX Deq: Slave Session Stats       982
library cache: mutex X            508
enq: PS - contention              470

The machine was actually fairly idle. So, in order to enhance the performance, we found the top 50 tables, pinned them into the flash cache, and also compressed them using the QUERY LOW compression method.
After enabling compression, we moved the related tables/partitions/subpartitions and rebuilt the indexes on them (in order to compress the data already stored in those tables).
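For one of those top tables, the combination of flash cache pinning, QUERY LOW compression and the subsequent move/rebuild looked roughly like the sketch below; the table and index names are hypothetical examples, not the actual top-50 list.

sqlplus -s / as sysdba <<'EOF'
-- Keep the segment in Exadata Smart Flash Cache
ALTER TABLE inv.mtl_material_transactions STORAGE (CELL_FLASH_CACHE KEEP);
-- Enable Hybrid Columnar Compression for newly loaded data
ALTER TABLE inv.mtl_material_transactions COMPRESS FOR QUERY LOW;
-- Rewrite the existing rows (direct path) so they are compressed as well
ALTER TABLE inv.mtl_material_transactions MOVE;
-- The move marks the indexes unusable, so rebuild them
ALTER INDEX inv.mtl_material_transactions_n1 REBUILD;
EOF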
We faced an issue while rebuilding the indexes; read the following for details:
http://ermanarslan.blogspot.com.tr/2014/01/is-my-indexindex-subpartition-unusable.html

At the same time, we increased the parallelism on the tables and made the necessary modifications to the database parameters to support such parallel processing (like the maximum number of parallel servers and so on). We did this because the database servers had a lot of CPU cores, but they were idle; we needed to utilize them.
The Discoverer reports seemed to accelerate with these modifications, but some issues were reported by the business users. The issues were in the Forms components; that is, some screens (like the login page) started to work slowly.

This was because of the parallelization: the Discoverer reports liked the parallelism, but the screens didn't.
On the other hand, we had to do this; we had to utilize the CPUs.
To provide the parallelism without affecting the screen performance, I designed the architecture as follows:

[Architecture diagram: node 1 serving Web/Forms/Concurrent/Reports, node 2 dedicated to Discoverer]
So, with this design, our RAC database was separated into two logical units.
One node would serve the Web, Forms, Concurrent and Reports components.
The other node, node 2, would serve the Discoverer clients only.
By separating them, we gained control of the database resources according to the application's needs.
DB node 2 was configured to have 64 parallel servers, which was optimal for the Discoverer reports.
DB node 1 was configured to have 8 parallel servers, which was okay for the standard EBS processing.
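A minimal sketch of that separation, using database services and per-instance parallel server limits, is below; the service names and instance names are hypothetical.

# Service for the Web/Forms/Concurrent/Reports workload, preferred on instance 1
srvctl add service -d EBSPOC -s ebs_oltp -r EBSPOC1 -a EBSPOC2
# Service for the Discoverer workload, preferred on instance 2
srvctl add service -d EBSPOC -s ebs_disco -r EBSPOC2 -a EBSPOC1
srvctl start service -d EBSPOC -s ebs_oltp
srvctl start service -d EBSPOC -s ebs_disco

# Cap the parallel servers differently on each instance
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET parallel_max_servers = 8  SID='EBSPOC1';
ALTER SYSTEM SET parallel_max_servers = 64 SID='EBSPOC2';
EOF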

Note that we reverted the parallel degrees we had previously set on the tables, so that the standard tables went back to the application's standard parallel degrees. This action was taken because the application developer sets those degrees explicitly. The application developer is the one who knows the code and should be the one to answer the question: is this table suitable for a high degree of parallelism? So, if the application tables were configured to be serial by the installer, then they should stay serial; but if a table was configured with, for example, a degree of 8, that means it is suitable for parallel operations and we can increase its degree of parallelism.
Anyway, we tuned the system according to the above reasoning.
That is, for the custom tables used in the Discoverer reports, we set a parallel degree of 64.
For the standard application tables, we kept them serial (if they were set that way by design). For the other application tables, which were configured to operate with a degree of 8, we increased the degree to 32.
Then we left the system to the EBS functional team and developers for testing.
We also made some indexes invisible, but not all of the related indexes; we did this only on the indexes where we found it necessary.
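The degree and visibility changes themselves were plain DDL, along the lines of the sketch below; the owners, table and index names are hypothetical examples.

sqlplus -s / as sysdba <<'EOF'
-- A custom table driving the Discoverer reports
ALTER TABLE xxcust.xx_sales_fact PARALLEL 64;
-- A standard table that was already defined with a degree of 8, raised to 32
ALTER TABLE ont.oe_order_lines_all PARALLEL 32;
-- A standard table that shipped serial stays serial
ALTER TABLE ap.ap_invoices_all NOPARALLEL;
-- Hide a selected index from the optimizer
ALTER INDEX xxcust.xx_sales_fact_n1 INVISIBLE;
EOF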
The test results were very good, actually better than we expected.
For example: there was a performance problem in a standard HR form, and an SR had been opened for this problem 3 months earlier. Development would probably have sent a patch for the problem, but the issue resolved itself after the migration to Exadata. In other words, Exadata could run the problematic query with its software and hardware power; the SQL is still problematic, but it is transparent to the application :)
The Discoverer reports were also accelerated without any modification to the SQL: a Discoverer report that had taken 4 hours to complete could now run in 10-20 minutes. In addition, batch concurrent programs were accelerated significantly. The important part of the test results follows; you can see the performance improvement by comparing the Source and Exadata columns.

[Test results table: Source vs. Exadata elapsed times]
In addition to the improvement in application performance, the database footprint was reduced to 50% of its size on the source.

Of course, backups were super fast. We could back up the entire database (nearly 600 GB) into ASM in about 6 minutes using only 8 parallel channels. We tried different backup types: we used the HIGH and BASIC compression methods to decrease the backup size. The compression ratio was about 80%, which is actually pretty good for an ERP application's database. The backup test results are below;

OBJECT_TYPE            START_TIME             END_TIME               INPUT_MB     OUTPUT_MB
DB FULL (high comp)    2/2/2014 12:33:38 PM   2/2/2014 2:55:38 PM    573607936    101987328
DB FULL (basic comp)   2/2/2014 11:47:37 AM   2/2/2014 12:31:36 PM   573576192    118644736
DB FULL (nocompress)   2/2/2014 11:29:29 AM   2/2/2014 11:36:04 AM   573486080    525056000
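The backup runs were plain RMAN disk backups into ASM using 8 channels; conceptually they were like the sketch below, where the disk group name is hypothetical and the algorithm was switched between HIGH, BASIC and no compression for the three runs.

rman target / <<'EOF'
# 8 disk channels, writing compressed backupsets
CONFIGURE DEVICE TYPE DISK PARALLELISM 8 BACKUP TYPE TO COMPRESSED BACKUPSET;
# 'HIGH' for the high-compression run, 'BASIC' for the basic run
CONFIGURE COMPRESSION ALGORITHM 'HIGH';
# Back up the whole database into an ASM disk group
BACKUP DATABASE FORMAT '+RECO';
EOF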

As a result of this I/O performance improvement, the cloning time for this EBS environment would be almost 1 hour including all the application-specific configurations. The cloning process is an important thing for Apps DBAs, as it can be troublesome when it is time consuming. For example, we normally spent 1 day on cloning activities for this system, but if we had the chance to use Exadata we could take 24 clones in one day :)

Of course, there are some things that even Exadata can't cure, like one Discoverer report we saw in this POC. That problematic report was running and running; it was running in a loop, its logic was wrong. On Exadata it couldn't be accelerated, but we could easily say that the report was a defective one, as we didn't have any waits in the database, in I/O or anywhere else. This could be clearly seen by the application owner, too.


In conclusion;

I can say that Exadata is a great cure for busy EBS 11i and R12 environments. We already support EBS production systems running on Exadata X2s and X3s, but the systems we support cannot even utilize 50% of an Exadata. That's why this POC was useful: in it we could see an EBS environment that had already reached the limits of its current server environment. We could see that this system took advantage of Exadata's power in terms of performance, and at the same time even this system could not fully utilize Exadata's I/O subsystem or CPUs. The test results were promising, even though we didn't have enough time for fine tuning. Exadata didn't let us down :) We could see and achieve in real life the things that were written in the advertisements.



If you want to ask a question, please don't comment here.

For your questions, please create an issue into my forum.

Forum Link: http://ermanarslan.blogspot.com.tr/p/forum.html

Register and create an issue in the related category.
I will support you from there.