Thursday, March 27, 2014

EBS-- 12.2 Applications Dba Online Patching (ADOP)

Online patching is our new toy in EBS 12.2.
Adop is the utility for patching EBS 12.2 . Adop replaces adpatch, altough it uses adpatch in background :)
Oracle provides this online patching system by using two Apps tier filesystems named fs1 and fs2 in Application tier , by using edition based redefinition feature of 11gR2 Rdbms in Database Tier.
Normally, we have 5 phases when applying a EBS patch. These phases are prepare,apply,finalize,cutover and cleanup. (adop phase=..)

We start with prepare for syncronizing the patch filesystem for Applications and the patch version for db tier.
Then we continue with apply phase. This phase is actual phase that the patch operations are done.. adop uses patch filesystem to upload new apps tier files and uses the patch edition of the database to take db actions.  After apply we continue with adop=finalize. In this phase, application and database specific actions such as compilation takes place.
So those 3 phases require no downtime , but the next phase which is called cutover need a little downtime. It is automatic though.. In cutover phase, adop switches the patched filesystem and db edition with the current/run filesystem and db edition... That's why Application Tier components are restarted in this phase.. Downtime is coming from restarting the components..
The last phase is cleanup phase. It is self explanatory..  In this phase ADOP starts cleaning up the old editions.

One last thing, if you are using 12.2.0 VISION, be careful if you need to restart the apps tier process after a patch application, because in 12.2 VISION we dont have an environment file for choosing and setting the correct environment.. I mean EBSapps.env..
On the other hand, you can use the following script maybe;

# Script to set the current RUN Edition APPL_TOP environment automatically.
# Available through My Oracle Support Note 1545584.1
# Requirements: Set the XMLfile variable as specified below. This needs
# to be set only once and not again after a file system cutover in adop.
# Save the script as fsauto<SID>.env in a non-EBS directory, eg.:
# /home/oracle/fsautoPROD.env
# Usage: . <directory>/fsauto<SID>.env
# Eg.: . /home/oracle/fsautoPROD.env
# note the space ' ' character between the leading period '.' and first
# forward-slash '/' characters
# Alternatively use: source <directory>/fsauto<SID>.env
# If needed make the script runnable using command: chmod u+x fsautoPROD.env
# Carlo den Otter, 02-APR-2013
# Note this script is for educational purposes only, and as such is not supported.
# It was created on Linux and not tested on other platforms.
# Use at own risk.

Lastly, I suggest you to take a look to the following note for the environment setting :
How To Automatically Set the Current Run or Patch Edition / File System for EBS 12.2 (Doc ID 1545584.1)

One last thing.. I share a drawing of mine , that explains this online patching process.. Maybe it become useful for understanding..

Exadata -- Learning Path

To start learning Exadata , you can follow one of the learning paths in Exadata- Learning Path below.
I dont want to say that you need to attend the related courses to get those skills, but  somehow you need to learn the pieces composing this Engineered System.. Otherwise, your knowledge about Exadata will be shallow.

In many cases, we use single admin for Exadata, on the other hand; following Learning Path describes different roles(which is the desired method I think) for managing Exadata.

In my opinion following skills should be enough for administrating Exadata Machine as a whole ;
Database Administration
Rac Administration
Oracle Linux Administration + Oracle Solaris Administration (if your compunte nodes are Solaris)
Exadata Database Machine Administration

Any extra skill to the above skill will be easy to learn and a bonus I think :)

Here is the Oracle 's Learning Path for Exadata;

Tuesday, March 25, 2014

Linux -- Rhel 6 panics on boot -- selinux problem

When booting the RHEL 6.5 , with kernel 2.6.32-431 or 2.6.32-279 , we have faced a panic issue
as follows;

Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init not tainted 2.6.32-279.el6.x86_64 #1
Call Trace
  [<ffffffff814fd11a>] ? panic+0xa0/0x168
  [<ffffffff81070db2>] ? do_exit+0862/0x870
  [<ffffffff8117ca75>] ? fput+0x25/0x30
  [<ffffffff81070c38>] ? do_group_exit+0x17/0x20
  [<ffffffff81070cc7>] ? sys_exit_group+0x17/0x20
  [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b

This seems to be document in Redhat Support:

For the solution, we booted Redhat with rescue cd and corrected the selinux configuration , as it was corrupted.. SELINUXTYPE was set to DISABLED..

An alternative solution could be using command line option in grub menu;

press 'e' to edit the command
press 'e' to edit kernel command
at the end of the line write
3 selinux=0
3 is runlevel 3
and selinux=0 -> disables selinux permanentaly..

EBS-- Oracle EBS 12.2 Vision Installation

Nowadays, we use EBS 12.2 in new projects.
At the begining of our implementation, we first install 12.2 Vision environment.
Because the vision is used for Demo and for some kind of initial training, that s why, customers does not allocate powerful servers for it. They want us install EBS Vision to Virtual Machines running on minimal logical hardware.. If that is the case, the 12.2 vision installation an take some reasonable time, as there several zip files to download & unzip and there are several critical places , which the installation can fail..
In this situation, it is important to do the installation in one go without sticking any errors..

Following approach desribes the installation process while bringing you some important information for the installation. If you consider this document, you will be likely able to install Vision 12.2 in one go.

Note that , the instructions in this document are for Linux 64 bit environments, but similar approaches can be implemented for different environments , as well.

First of all , you should follow the MOS document:
Oracle E-Business Suite Installation and Upgrade Notes Release 12 (12.2) for Linux x86-64 (Doc ID 1330701.1)

You should follow the above line by line to satisft all the requirements.
Keep that in mind that ,
an insufficent memory can extend your installation time for days.
an insufficient temp size, can make you encounter errors during in the middle of the installation, after waiting for 3 hours maybe.
a corrupted installation file can take you to the start ..

I have seen this, I have done this, you dont want this :)

That's why before the beginning take a look to the following notes too;12.2 Rapid-Install fails on Installing Web Tier Utilities with Error "Fatal Error: TXK Install Service" (Doc ID 1587452.1)
Oracle E-Business Suite Release Notes, Release 12.2 (Doc ID 1320300.1)
 Once you are ready, I mean after all the hardware/disks requirements covered, all the rpms installed,  kernel and limits configured,  users/groups created, firewall disabled , /etc/oraInst.loc created/pointed to relevant oraInventory, and etc, you can continue with the following;

Change you directory to where your zip files are located and unzip following 3 archives with root user;

cd /to_your_stage
unzip -q 'V35215-01*.zip' -d .
This action will create startCD diretory in your stage
Then go to your startCD directory and execute the script to build the Installation Stage.
cd startCD/Disk1/rapidwiz/bin/
When you exeute the script above, a menu will appear.. Choose Create new stage function, and create your stage by specifying the directory that your zip files reside.
Before starting the installer, you need to open a vncserver or a similar X display server, and grant necessary privileges for everyone to open display on it. (I mean vncserver , and xhost + commands)
--Dont  use manual unzip for building the stage..

So, after building the stage successfully, you may need to adjust your rman restore parallelism according to your resources;
You can add the following line into the restore script restore-single2.sql located in startCD/Disk1/rapidwiz/template/ directory
For 2 parallel processes;

select 'configure device type DISK parallelism 2;' from dual;

Add the above just before the line "select 'catalog start with ''INSTALL'' NOPROMPT;' from dual;"
and then start the rapidwiz installer as follows;

cd /startCD/Disk1/rapidwiz/

This will open the installer program.. Installer will want some input from you, as well as want you to push the Next Button :)
And example installation is as follows;
Note that , I will not write any information about the installer screens as they are self explanatory..

If you somehow encounter errors and need to reinstall this, you should to the things below before restarting the installer;
delete the contents of the directories where you tried install the EBS 12.2 in your first try.
delete all the tempfiles in your tmp directory;
clear the contens of the inventory
kill all the oracle processes.

Anyways, after finishing the installation successfully, you should continue with post installation actions documented in 1330701.1, and apply  NLS language patch if necessary.
In 12.2 we use adop (not adpatch) utility to patch our EBS environment. Adop is a tool that is a part of online patching mechanism which also introduced in 12.2
Take a look to the FAQ in MOS to have a quick understanding abot the online patching concept and adop.
Oracle E-Business Suite Release 12.2: Online Patching FAQ (Doc ID 1583902.1)

Following is a graphical description of the new architecture that supports online patching..


As you we see, we have 2 filesystems there.. One for the running env. and one for the patches to be applied.
Also, there is edition-based redefinion used in Database Layer, to supply such a seperation like run system and patch system.

We have also a new script to switch between these 2 environments.. EBSapps.env
We dont need so we dont have a manintanence mode option anymore.

For NLS patching;
1)Download your NLS zip
2)unzip it , for ex: unzip -d $APPL_TOP_NE/../patch
3)License your language using Oracle Applications Manager
4)Maintain Multi lingual tables using adadmin
5)cd to patch directory
6)adop phase=prepare
7)adop phase=apply patches=10124646_TR:u10124646.drv  (for turkish)
8)adop phase=finalize , finalize_mode=full gathers statistics to help improve performance. Finalize will take about one hour longer if this mode is specified.finalize_mode=quick does not gather statistics, and therefore completes more quickly. This is the default. So; adop phase=finalize should be run here, by default it will be quick.. (it is vision I felt no need to gather stats)
9)adop phase=cutover
10)adop phase=cleanup

The description of adop phases are below:

prepare - Prepare the instance for patch application. 
 apply - Apply patches (to the patch edition). 
 finalize - Ready the instance for cutover. 
 cutover - Make the patch edition the new run edition.
 cleanup - Drop obsolete objects and data from old editions.

Special phases: 
 abort - Abort the current patching cycle. 
actualize_all - Create new copies of all code objects in the patch edition. 
 fs_clone - Copy the run file system to the patch file system.

So, I desribed  12.2 installation for a Single x86_64 Linux server.
I will make 12.2 fresh installations on Rac environments in upcoming days.. Split configuration will be involved, as well..  I ll keep you informed..
That is it for now. 

Note that: It took 6 hours to complete 12.2 Vision installation on a Vmserver with 4 logical cores and 12GB memory. The stage was on an external drive mounted from usb.

Also, It seems it is also mandatory to upgrade to 12.2.2 or 12.2.3 RUP patch.. It is a required as documented in 1320300.1.

Sunday, March 23, 2014

EBS 11i -- ORA-06512 associative array not consistent with session parameters

In a 11i EBS environment with a or database, you can encounter "ORA-06512 associative array not consistent with session parameters" in EBS screens, especially during peak times..
Note that, you can encounter this error in different Applications as well, because the error is caused by a database bug actually...

Following EBS screenshots represent the generated error messages caused by the underlying database problem;

This ORA-06512 error should to be raised If settings for national language or globalization change during a session that uses associative arrays.
The possible workaround should be not modifying NLS parameters at the session level when using package-level associative arrays. But it is not easy to implement as this is an EBS, and it is a packaged application.. In addition this issue starts to come out suddenly -- especially under peak..

So I will keep it short the issue is caused by bug 5890966..
There is a patch(for db) available on MOS.. You can apply patch 5890966 for fixing those errors..
Patch is available for and Rdbms..
End of story :)

Linux -- extending /tmp size, an alternative approach-- not preferred

Sometimes, Linux admins use /root filesystem to hosts special directory like /tmp or /boot.. I prefer sepearing them and using dedicated mount points for hosting those directory. There are several reason for doing so. One of them is ease of administration.
Lets look at an incident that I have faced during Oracle EBS 12.2 installion on a Linux box.
The Linux had 2 gb root filesystem. The root filesystem was not seperated . I mean the special directories was not mounted as seperated devices..
The installation was reporting errors such as ;

Stdout: Extracting 0%..............................................Fatal error encountered during self-extraction.[No space left on device]

After analyzing the log files, I saw that temp filesystem was the problem.. Installation requires more temp , but the OS have only 2gb disk space for root filesystem and all of its contents.. Temp was just one of those directories in root filesystem..  So we had a problem.. Resizing root filesystem was one of our options, but it was costly, as it requires recreating partitions and resizing filesystem.. I  had another mount point with 150gb disk size available so I did a soft link operation and saved the day in 2 mins, but it is not preffed operation, it is risky and should be done in production environments... Anyway, I dont like using soft links in critic environments..

What I did was;

mv /tmp /u01/newtemp
ln -s /u01/newtemp /tmp

Then , I checked the tmp  directory size limit by creating a 2 gb file on it ..
cd /tmp
dd if=/dev/zero of=filename bs=1024 count=2M

Actions above saved the day as it was a non-production system. On the other hand; the important lesson in here was ;
"create separated filesystem layout in Linux installations" 
"By doing so you can deal with the specific directories not the whole root partition in case of a disk space problem" 

Application Deployment Workflow Diagram

Following is an Application Deployment Workflow Diagram, that summarizes the workflow implemented accross different teams those play roles in application development,support and system administration. I designed this for Oracle Systems, and this workflow can be applied in mid-sized IT departments for deploying new functionalities or bugfixes to Oracle Databases or Application Servers. This diagram was prepared in 2008.. In those days, we didnt have a dedicated Test team, or a lot of servers for seperating Dev,Test,Uat and Perf environments :) Now, we always use at least 3 environments even for minimal deployments. For example, I  had made an upgrade of an EBS environment in one of our client last year.For that system upgrade, I made the same work in 5 different EBS environments : Dev,Test,Uat,Perf and Live :) But this diagram as I said earlier can be implemented in mid-sized IT departments, even in today...

The processes are enumerated in the diagram. Their definitions are as follows:

1: The developed code or application components are transferred to the Application testing or support team.
2:Support/Test team tests the code to be deployed, in the Test environment. Note that : DBAs deploy the code or application components in to the Test environment, altough it is not seen in the diagram.
3:If Support/Test/Dba team sees any problems during tests, the code is sent back to the development team while reporting the issues.  Note that: Dbas can report issues, too..
4:If the tests are successful, then an approval request is sent to the Head of Support/Development.
5:After the Head of Support and Dev approves it, the deployment package is sent to Dba Team.
6:Dbas deploys the new functionality or bugfix in certain maintanence time ( this maintanence windows should be decided by the Division already..)
7:Dba informs the relevant teams about the new deployment, and sent a request to the Application support and Test team to make them start their  Production tests immediately.
8:Application support and Test team tests the code in the Production environment and approves/disapproves the deployment and functionality.
9:Application support and Test team sends the final decision to Dbas
10: According the information received from the App Support and Test team, Dba rollbacks the change or finishes the deployment process by making System performance and functionality tests. 

Exadata -- default passwords

Compute Nodes:


ILOM login : root/welcome1

Cell Nodes


ILOMs login: root/welcome1

InfiniBand switches


ILOMs login : ilom-admin/ilom-admin , ilom-operator/ilom-operator

Ethernet switches


Power distribution units (PDUs)



Wednesday, March 19, 2014

EBS R12 -- Notification Mailer -- Unable to relay, Stuck, Recreation Out queue

Yesterday, an issue regarding to the alert notifications was escalated to me.
The problematic alert was an periodic alert, which was designed to send emails to the relevant recipients. The alert was working properly, producing data and giving the order for an email to be sent.. The problem was as it is said.. The email was never delivered.

I analyzed the situation by querying alr and wf tables in the first place, because these alert things are a little nasty in EBS environments :).. On the other hand,  I didnt see any abnormality there..
So, I decided to take a look to the notification mailer logs.. To able to see the whole picture there, I increased the log level of the notification mailer to "statement".

While analyzing the notification mailer's log file, I saw the following ;

474:34:ERROR:[SVC-GSM-WFMLRSVC-185112-10006 :]:Problem encountered when sending to {[[], []]} -> javax.mail.SendFailedException: Invalid Addresses;
  nested exception is:
        class javax.mail.SendFailedException: 550 5.7.1 Unable to relay;

The issue was clear, an email was trying to be sent but, the SMTP server didnt have the necessary relay for the domains..(
Then I decided to find where those emails were triggered.. So I checked the custom alert definitions from the Alert Manager, and found that they were triggered from a Custom periodic alert.. Normally, I disabled this Alert, and restarted workflow notification mailer to have a clean start. Unfortuneatly, the error messages continued... It was clear that there was a stuck situation,  like a paper jam :)
That is  the notification mailer was still trying to send those emails, and it could not proceed with the other emails. In other words, Ntf mailer couldnt send any emails because of those stuck ALR notifications..

After examining carefully, I saw that those alert notifications are stuck in wf_notification_out queue. Notification mailer was trying to proceed with the email orders in wf_notification_out queue, and got stuck there.. In my opinion, there should be a bug record for this..
Anyways, when I checked the out queue, I saw that there was not any important notification orders.. That's why ; I decided to recreate the queue.. The steps I followed are as follows;

I backed up the the table , just in case;
  • Stopped the notification mailer
  • Backed up  the creation scripts of wf_notification_out table and queue.
  • Dropped the WF_NOTIFICATION_OUT queue with the API; 
        sqlplus apps/apps
         exec dbms_aqadm.stop_queue( queue_name => 'APPLSYS.WF_NOTIFICATION_OUT', wait =>          FALSE);
         exec dbms_aqadm.drop_queue_table( queue_table => 'APPLSYS.WF_NOTIFICATION_OUT',              force =>TRUE);
  • Recreated the wf_notification_out table & queue manually, with sqlplus using the scripts I backed up in the first place.;





,COMPATIBLE => '8.1'
,COMMENT => 'Workflow JMS Topic'


,RETRY_DELAY => 3600
,COMMENT => 'Workflow JMS Topics'

aSubscriber :=$_agent('WF_NOTIFICATION_OUT',
,subscriber => aSubscriber
,rule => '1=1');



  • Executed  following from the application node..
sqlplus apps @$FND_TOP/patch/115/sql/wfntfqup.sql APPS apps APPLSYS
sqlplus apps/apps @$FND_TOP/patch/115/sql/wfjmsqc2.sql applsys apps
sqlplus apps/apps @$FND_TOP/patch/115/sql/wfmqsubc2.sql APPLSYS APPS
sqlplus apps/apps @$FND_TOP/patch/115/sql/wfntfqup APPS apps APPLSYS   --> again
  • Started up the notification mailer and the issue dissapeared as expected . 

Monday, March 17, 2014

RDBMS -- Restricted Mode,Append&Undo,Nologging Indexes --snip

In this post , I will write about 3 things those I have seen in my clients environment.
The things I will mention are related and observed during database migration operations.
They are also related with each other, as commonly in logical migration, we start our database in restricted mode , load our data in append mode and create our indexes in nologging mode ..

Suppose we have downtime for the migation operation.. During downtime, we unload our data and load our data to the target environment. The first  important thing to consider here is that , we need to ensure that the data will not change in our running database during our unload operation.
To supply this , we can stop our listeners, restart our database, stop our job queue processes and etc.. to have a clean database session environment, but even with these actions, we  does not guarantee the up-to-date reliablity of our data.. Because, some applications still can use beq connections and update our database even if our listeners are not working..
That 's why, to guarantee that,  we use restricted mode for our database. That is , we stop our database and startup in restricted mode.. In restricted mode we unload our data and import it to the target..

But, in restricted mode,  there is one important thing that needs to be considered ,which is the fact that you can not connect to a instance in restricted mode remotely using a normal tns ...
If this is the case, I mean if you need to connect remotely to an instance in restricted mode, you need use (UR) statement in your tns entry..

For example


The (UR=A) is enhancement, since 10g .. Using service_name is preferred..
Without UR=A , you can end up with; TNS-12526, TNS-12527 or TNS-12528 even if you use SYS account.. Because when the instance in restricted mode, PMON updates the listeners with that information in startup and that's why, listeners block new connections from being established..
So by specifiying UR=A in your tns entry, you can use sql developer or a similar tool to connect your database remotely and you can make your data unloading/loading remotely even if your instance is in restricted mode.

Loading data in append mode is also one of the irrevocable approaches in data migrations.. As known, when inserting in append mode,the operation will be direct path load and the redo generation will be decreased.
It is decreased but not eliminated..

insert /*+append*/
Create table as select

When we use direct path load, we bypass the buffer cache, we gain some time, we load our data above the high water mark, so in such a scenario, rollback for an append operation means just deallocating the areas above the High water mark.  
So in a manner, we bypass undo generation for our table.That is, we dont need to keep the undo data, as we already have our action plan if the transaction needs recovery.(just deallocate what is allocated by our transaction) 
On the other hand; this is not acceptable for dictionary modications and the indexes on the table we are loading in to.. The indexes and dictionary modifications, caused by our appending load operations, will still need to have undo, as they can only be rolled back with the traditional way. 
So keep in mind that, undo will be generated  in append mode, but the amount of it will be lesser.

Lastly, I will mention about the nologging indexes. Creating indexes in nologging mode is another considerable approach in migration projects with limited downtime.. 
That's why we can choose to create indexes in nologging mode.. By choosing nologging index operations, we gain time as these index creations does not generate accessive redo data, because they are considered as unrecoverable operations. On the other hand; leaving those indexes in nologging mode is unnecessary. Because, when we the subject is an index, nologging attribute plays part on only for creation, altering and rebuilding operations. That is , only create, alter and rebuild operations on an index can be done in  nologging mode.. So after these operations , you better make your indexes logging again, as this is more logical and logging objects are mandatory in Production environments.. 

alter index index_name logging;

Monday, March 3, 2014

RDBMS -- Private Strand Flush not complete waits --Strands

This post is about the Private Strand Flush not complete waits which can be encountered in the Redolog write phase of a running Oracle Database.
Oracle Database keeps track and capture history of changes in the redolog files. They are like the first point of contact for database operations, which modifies, deletes or inserts data as well as metadata.   
I will not go in details about the redologs, Log writer processes and Log buffers as they are not the main subjects of this post, rahter I will be focus on the Private strand flush not complete waits, those I have seen in some customer's site nowadays.

As we now, our processes write to the log buffer .. Lgwr background process writes to the redolog files.. Redologs files are switched when they are filled or manually. Dbwr works on the background to write the dirty buffers to disks.. 
The above sequence of events need to be quick, need to be in a regular fashion and need to be done without damaging the consistency of the logical process cycles.

In the scope of redologs, we have the most important waits are caused by the Redo Copy Latches, Redo Allocation Latches and Redo writing Latch. Note that Oracle Latches are like memory locks that protect important memory structures to get logically damaged because of the concurrent accesses..  In a manner, they are like the mutexes protects the critical section of a program 's code..
So we can use them to understand the processing work flows.

For instance , when a process need to write in to the Log buffer, it needs to acquire Redo Copy latches. The latch name gives us the clue that makes us understand the event.. So this writing operations is a copy that is made from the process memory to the log buffer. 
Redo Allocation Latch is acquired for allocating space in the Log buffer. This latch and redo copy latches seem to alike, as Oracle process acquire Redo allocation latch for small redo records, and do the copy operation without a need to acquire a Redo Copy latch.
Redo Writing Latch prevents the Oracle processes to not to send the needed signal to the LGWR process for a log buffer flush or a log switch operation, concurrently.

So as you see above there are some important latches which can create waits on those events..
For instance,  
log file sync wait occurs if a process waits for LGWR to flush the data from the log buffer to redolog files. That is , the user process commits and waits on this event till LGWR will finish the data to the redolog files, till LGWR will send a signal indicating that the flush request is finished..
If LGWR is idle, I mean ony waiting, it will wait on Rdbms ipc message..
If LGWR is updating headers of the redolog files, you will see log file single write waits..
End if ; LGWR is writing the redo data from log buffer to redolog group, you will see log file parallel write waits, as this operation can be done in parallel.

So after giving the information above, lets come back to our subject.. "Private strand flush not complete"
You can see this message in alert log of a 11gR2 Oracle database, and you may worry about your database performance, but according to the Oracle, this is expected behavior..

Take a look at the following blog post of mine; I expain strands in detail;

So , like in the above figure; Writes to the redologs are made strand by strand.. The redolog is divided into parts, sized with the strand size, and each strand is written to the relevant part of the redolog. In other words; the strands of log buffers, which are in memory, are mapped to the redolog file.

According to the techinal info above,  all strands need to be flushed when a log switch is being initiated.
That s why "checkpoint not complete" and "private strand flush not complete" are similar events..
This messages mean that there are some dirty blocks, some active sessions when the log switch is being initiated. A strand sill have transactions active which need to be flushed before this redo can be overwritten.
So it seems, Oracle writes this information to the alert log and keeps going;

For example;
Sqlplus  >
SQL> update erm set X='Y';
1 row updated
"note that I do not commit and my session is active"

Another Sqlplus >
SQL>alter system switch logfile;

Alert Log >

Thread 1 cannot allocate new log, sequence 11
Private strand flush not complete
Current log# 1 seq# 222 mem# 0: /Erman/redolog01.log

So this is not a problem , but expected behavior.. 
Neverthenless,  check the time between "private strand flush not complete" and "Advanced to log sequence"/"Current log#" message..
If the time between those messages are significant, then an I/O or DBWR or LGWR tunning may be required. 

Linux-- D state processes and Load Average, Cpu utilization vs Load

In my earlier post , I mentioned about the process states and especially uninterruptable sleep state(D state) in detail..

Today , I feel that, I have to write down the impact of D state processes to the Load Average, because there is a misconception between Load average and Cpu Utilization, and there is ambiguity around me and it is about the processes included in Load average calculations ..

First of all, the processes waiting with an uninterruptable sleep state (D state) , are included in the calculation of Linux Load average. I have already seen these.. When a unrecoverable failure arires NFS disk path and if that disk is used by the process, you will see the D states, and you will see that your Load Average goes high.. So, Linux includes processes in uninterruptible sleep states to its load average..Yes it is so, even if we think that load average calculates running and ready-to-run processes only and these type of processes are usually in wait for DISK I/O..  The design is implemented in this way because, uninterruptable sleeps by concept, can lead a blocking system, as you know these processes can not be kill or stopped before they get what they are waiting for..

So, if you have D state processes, you can expect a high load average..
This is already obviously defined in man pages,

ldavg-1 System load average for the last minute. The load average is calculated as the average number of runnable or running tasks (R state), and the number of tasks in uninterruptible sleep (D state) over the specified interval.

Second, Cpu load is different than Cpu utilization.. This explains why we see Cpu as idle even if the load average is high. The difference can be explained by the following picture:

So , we can have free cpu cyles in system but we should have to wait for other things.. These situation will lead our load average to increase while our cpu utilization does not change..  

Lastly, I will mention about the optimium value for load average..
So if you have one cpu-core machine. The optimum value for maximum load average is 1.
If you have multicore or multicpu machine; our rule of thumb is -> max load average = number of cpu cores.
Of course , multicore and multicpu are different in technology, in terms of cache and etc, but this is what we accept roughly..