Friday, May 22, 2015

Linux / Oracle RDBMS -- High Load after VM migration

After migrating a 10gR2 EBS database from a 32-bit Linux machine to a new 64-bit Linux machine, performance started to degrade.
The load average was always somewhere between 20 and 30, and end users were facing slowness in the EBS OAF and Forms screens..
Although the users were facing this slowness, they were also saying that their Discoverer reports completed faster than they did on the old 32-bit Linux..
The major differences between the old and new machines are as follows;

Old Machine Linux version: RedHat 5.5
Old Machine Arch: 32 bit
Old Machine SGA: 8GB (VLM mode implemented via an in-memory filesystem (shmfs/tmpfs, ramfs))
Old Machine PGA: 2GB
Old Machine Total Memory: 31GB
Old Machine CPU core count: 16 Physical Cores (No Hyper-Threading)

New Machine Linux version: RedHat 5.9
New Machine Arch: 64 bit
New Machine SGA: 18GB
New Machine PGA: 10GB
New Machine Total Memory: 48GB (hugepages enabled for the SGA)
New Machine CPU core count: 16 Cores (16 virtual sockets, one core each.. no Hyper-Threading)

So, the new machine was 64 bit, had 48GB of memory (hugepages were also enabled) and had the same core count as the old server, but it was a VM (VMware ESX).
The important difference was that the new machine's CPU cores were virtual..

When we examined the AWR reports, we saw CPU waits all the time..
Also, in the top command, we saw 96% user-space CPU usage all the time..
The load average was high all the time..
Discoverer reports were running all the time.. They were always in the "R" state and they were always consuming 99% CPU..
So they were obviously the main cause of the slowness, but we had no chance to tune the Discoverer reports, because these reports were also used in the old system by the end users..

AWR gathered for a 30-minute interval representing the problem:

Event                        Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              19,769                 74.4
db file sequential read      209,680     850        4               3.2         User I/O
SQL*Net more data to client  644,830     546        1               2.1         Network
latch: cache buffers chains    4,576     389       85               1.5         Concurrency
TCP Socket (KGAS)              6,313     317       50               1.2         Network


Okay, we had 19,769 seconds of CPU time..
On the other hand, we had 16 VM cores in this server. So for a 30-minute interval, we had 16 * 30 * 60 seconds of CPU time available..
That made 28,800 seconds, so we still had 9,031 seconds of CPU time available..
So, we had a high CPU load but we were not in a hang situation. This high DB CPU usage meant that the database was spending a lot of time on CPU (processing) compared to I/O and other events.
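This back-of-envelope arithmetic can be written down explicitly (the core count, interval length and CPU time all come from the AWR report above):

```python
# CPU-time budget for an AWR interval: cores * interval length in seconds.
cores = 16
interval_minutes = 30
budget_s = cores * interval_minutes * 60   # 28,800 CPU seconds available
db_cpu_s = 19_769                          # "CPU time" from the Top 5 events
headroom_s = budget_s - db_cpu_s           # 9,031 CPU seconds of headroom
```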

This high CPU usage was caused by the Discoverer reports, that was for sure..

CPU Time (s)  Elapsed Time (s)  Executions  CPU per Exec (s)  % Total DB Time  SQL Id         SQL Module  SQL Text
       2,190             2,685      63,231              0.03            10.10  6j1rnhpxu1bry  Disco10,    SELECT COUNT(T.NO) FROM XXIC_T...
       1,994             2,493           2            996.84             9.38  79h2r4xdn8g44  Disco10,    SELECT o481137.CUSTOMER_NAME ...
       1,827             2,202           2            913.39             8.29  93bkbks2j66q5  Disco10,    SELECT DECODE((decode(o310037...

The design of these reports was not optimized.. For example, a subquery was running 63,231 times in 30 minutes, and it was a CPU-bound query.. On the other hand, it was not the time to tune these SQLs or modify these reports, because they were working exactly the same way as they were in the old system..
I mean, even the execution plans and the SQLs in these reports were the same; they did not change..
In addition, this new system was 64 bit and it had a bigger SGA and more physical memory than the old one.. So we were not expecting such a load on the new database server..
That is, the cause of the high CPU times had to be something else..

There was db file sequential read listed in second place in the Top 5 Wait Events.. This wait indicates single-block index reads, and its average was 4 ms, which was acceptable for this environment.. In addition, we were using the same disk devices, even the same filesystems, as the old server, so I/O was not the problem..

It was the CPU... It would be better if I could share the CPU model information here, but the idea in this post is to prove the slowness using AWR..

In the Latch Activity section, we saw cache buffers chains with the highest Get Requests..

Latch Name            Get Requests  Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains   810,825,589          0.02            0.03            389        1,140,617             0.03

The high cache buffers chains activity was directing us to find the hot buffers, find the sessions waiting on latch contention, intersect the objects related to the hot buffers with the objects touched by the queries waiting on latch contention, conclude the problematic SQLs, and do SQL tuning.
On the other hand, I was not in that mood, as this database was running exactly the same way as on the old database server..

Knowing that latches are acquired when a process needs them and released when the process finishes its work in the relevant memory structure, we decided that "the time spent between acquiring and releasing the latches" is also very important..

That is, if our processes could do their jobs quickly, then more latches would be available..
I mean, the time between acquiring and releasing the latches would be shorter, and that would decrease the CPU time waits instance-wide, as the processes would not be spinning on the CPU while waiting for the latches..
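This reasoning can be illustrated with a deliberately simplified, hypothetical model: at a fixed latch request rate, the fraction of requests that find the latch busy scales with the hold time. The request rate below is derived from the ~810 million gets in the 30-minute AWR above; the hold times are made-up values just to show the scaling:

```python
# Toy model: latch "busy fraction" is its utilization -- the request
# rate times how long each holder keeps it (capped at full contention).
def latch_busy_fraction(requests_per_sec, hold_time_sec):
    return min(1.0, requests_per_sec * hold_time_sec)

rate = 810_825_589 / (30 * 60)               # ~450k gets/sec from the AWR
slow_cpu = latch_busy_fraction(rate, 2e-6)   # slower CPU holds the latch longer
fast_cpu = latch_busy_fraction(rate, 1e-6)   # faster CPU halves the hold time
```

Halving the hold time halves the chance that a get request has to spin, which is exactly the effect we hoped faster CPUs would have.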

At this point, we changed the CPUs.. I mean, we changed the hardware of the Virtual Machine which was running the database..
Note that the disks stayed the same and the memory stayed the same..

After the CPU change, the cache buffers chains get count was again very high;

Latch Name            Get Requests   Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains  3,439,451,402          2.09            0.00              0          823,885             0.04

Using the new CPUs, there were almost no waits on latch: cache buffers chains;

Event                        Waits   %Time -outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn
latch: cache buffers chains  17,424         0.00                    0              0        0.57

These values were as follows in the OLD system..

Event                        Waits   %Time -outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn
latch: cache buffers chains   4,576         0.00                  389             85        0.34

So, even though many more latch gets were requested with the new CPUs, there was almost no wait time.
Look, we had more db file sequential reads and more cache buffers chains gets, but our CPU time was lower :)
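One way to make that contrast concrete is to normalize the latch wait time by the number of gets; a quick sketch using the two cache buffers chains rows above:

```python
# Latch wait time per million gets, old CPUs vs new CPUs
# (figures from the two cache buffers chains rows above).
def wait_per_million_gets(wait_time_s, get_requests):
    return wait_time_s / (get_requests / 1_000_000)

old_cpus = wait_per_million_gets(389, 810_825_589)    # ~0.48 s per million gets
new_cpus = wait_per_million_gets(0, 3_439_451_402)    # 0 s despite ~4x the gets
```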

Note that this report was taken for a 1-hour interval. Even though it covered a longer period, the CPU time was lower than the value with the old CPUs..

Event                        Waits      Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                 10,730                 96.0
db file sequential read        213,556      165       1               1.5         User I/O
TCP Socket (KGAS)               11,950      162      14               1.4         Network
SQL*Net more data to client  1,359,692       93       0               0.8         Network
log file parallel write         28,298       47       2               0.4         System I/O

We had almost the same load on the Oracle Database. Note that these values are per second and per transaction.

Load Profile (with new CPUs) -- 1-hour AWR

                    Per Second  Per Transaction
Redo size:           58,524.70         6,768.44
Logical reads:      509,961.79        58,977.58
Block changes:          338.90            39.19
Physical reads:         171.86            19.88
Physical writes:         24.92             2.88
User calls:           2,476.00           286.35
Parses:                 170.12            19.67
Hard parses:              4.22             0.49
Sorts:                  296.35            34.27
Logons:                   0.96             0.11
Executes:             7,877.99           911.10
Transactions:             8.65

Load Profile (with OLD CPUs) -- 30-minute AWR

                    Per Second  Per Transaction
Redo size:           38,658.42         6,072.38
Logical reads:      212,494.62        33,378.16
Block changes:          195.19            30.66
Physical reads:         294.80            46.31
Physical writes:         22.43             3.52
User calls:           1,471.26           231.10
Parses:                  94.94            14.91
Hard parses:              4.28             0.67
Sorts:                  157.24            24.70
Logons:                   0.84             0.13
Executes:             7,478.48         1,174.70
Transactions:             6.37
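As a sanity check on these profiles, the Per Transaction column is simply the Per Second column divided by the transaction rate; for example, for the new-CPU snapshot:

```python
# The Per Transaction column in a Load Profile is the Per Second
# column divided by the transaction rate (transactions per second).
def per_txn(per_sec, txn_per_sec):
    return per_sec / txn_per_sec

# New-CPU snapshot: 7,877.99 executes/sec at 8.65 transactions/sec
executes_per_txn = per_txn(7877.99, 8.65)   # ~911, matching the report
```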

We had more redo size, more logical reads, more block changes, fewer physical reads, slightly more physical writes, almost the same hard parses, more sorts, more executes and more transactions in the new-CPU system, but there were low CPU time waits :)
Almost the same SQLs were running, the same Discoverer reports were at the top, but this time the CPU time was not so high instance-wide..

So this showed that the approach was correct: the CPUs were the bottleneck and the new CPUs fixed the problem.. The new CPUs were again virtual, but the CPU family of the new hardware was different.. There were major differences in the CPU specs... like Hyper-Threading...

Okay. So far so good. But we had one more question; how was this system dealing with this load earlier? I mean, this database was running on a 32-bit server with an 8GB SGA and we were not seeing these CPU time waits in the AWRs... So the question is: why?

Let's start with CPU utilization...
High CPU utilization may not actually be a problem.. If you are not waiting for other resources, you will end up running on the CPU or waiting for it..
Especially if your queries are CPU-bound, you will see your CPU time increasing..

High CPU utilization may not necessarily indicate that there is a problem; it could just mean that the system is being well utilized.
Time shown as "DB CPU" or "CPU time" in AWR and similar reports refers to the amount of CPU time that the session(s) used, and Oracle usually gets this information from the OS itself (e.g. using getrusage() or similar OS calls on Unix). This measure typically does not include time spent waiting on CPU run queues..
If we could have ASH, it would be even better, because the time shown as "CPU + Wait for CPU" in ASH reports could give us an idea of the time that sessions were running, or wanting to run, on CPU..
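As an aside, this distinction is easy to demonstrate. The sketch below uses Python's resource.getrusage() wrapper (Unix only) rather than the C call Oracle itself makes, and the sleep and loop sizes are arbitrary illustrative values:

```python
import resource
import time

# getrusage() reports CPU actually consumed (user + system), not wall
# clock, so time spent sleeping or queued for a CPU is excluded -- the
# same distinction AWR's "CPU time" statistic makes.
def cpu_seconds():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime

before = cpu_seconds()
time.sleep(0.5)                                # waiting burns no CPU time
after_sleep = cpu_seconds() - before

before = cpu_seconds()
total = sum(i * i for i in range(1_000_000))   # a CPU-bound loop does
after_work = cpu_seconds() - before
```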

Anyway, the bottom line is that if you are not waiting on other things, then you will be on the CPU and your CPU time will increase.. Sessions will use the CPUs in parallel, the load of the server will increase, and then you will start to feel performance degradation system-wide..

Let's have a look at the AWR report gathered from the old 32-bit server, running the same database, the same queries, the same reports, and with almost the same number of sessions connected to it..

Do not bother with the LNS wait.. This system had a standby database when this AWR snapshot was taken.

Load Profile (with OLD CPUs -- 64 bit)

                    Per Second  Per Transaction
Redo size:           38,658.42         6,072.38
Logical reads:      212,494.62        33,378.16
Block changes:          195.19            30.66
Physical reads:         294.80            46.31
Physical writes:         22.43             3.52
User calls:           1,471.26           231.10
Parses:                  94.94            14.91
Hard parses:              4.28             0.67
Sorts:                  157.24            24.70
Logons:                   0.84             0.13
Executes:             7,478.48         1,174.70
Transactions:             6.37

Event                        Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              19,769                 74.4
db file sequential read      209,680     850        4               3.2         User I/O
SQL*Net more data to client  644,830     546        1               2.1         Network
latch: cache buffers chains    4,576     389       85               1.5         Concurrency
TCP Socket (KGAS)              6,313     317       50               1.2         Network

Load Profile (with OLD Physical Server -- 32 bit)

                    Per Second  Per Transaction
Redo size:           38,685.46         5,267.77
Logical reads:      348,930.51        47,513.61
Block changes:          218.78            29.79
Physical reads:       1,026.41           139.77
Physical writes:        282.69            38.49
User calls:           2,255.70           307.16
Parses:                 138.65            18.88
Hard parses:              3.75             0.51
Sorts:                  205.22            27.95
Logons:                   0.86             0.12
Executes:             6,003.52           817.50
Transactions:             7.34

Event                    Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                           15,466                 93.5
db file sequential read  755,438      389       1               2.4         User I/O
db file scattered read   302,776      290       1               1.8         User I/O
LNS wait on SENDREQ          389       98     253               0.6         Network
log file sync             17,380       93       5               0.6         Commit

So the old 32-bit system had more db file sequential reads, more db file scattered reads and a significant amount of CPU time..
Let's take a look at the Operating System Statistics;

Operating System Statistics (OLD Physical Server -- 32 bit)

Note that this AWR was for a 60-minute interval.

Statistic                 Total
BUSY_TIME                 1,868,106
IDLE_TIME                 3,894,329
IOWAIT_TIME               65,502
NICE_TIME                 48
SYS_TIME                  222,197
USER_TIME                 1,625,270
LOAD                      5
RSRC_MGR_CPU_WAIT_TIME    0
PHYSICAL_MEMORY_BYTES     34,076,270,592
NUM_CPUS                  16
NUM_CPU_SOCKETS           4

Total Buffer Gets: 1,262,389,050

Latch Name                     Get Requests   Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains           2,406,314,930          0.28            0.00              0        5,506,188             0.00
library cache                     13,180,259          0.04            0.01              0          112,143             0.09
library cache load lock               10,348          0.00                              0                0
library cache lock                 4,375,448          0.00            0.00              0                0
library cache lock allocation        181,145          0.00                              0                0
library cache pin                  5,453,561          0.01            0.00              0                0
library cache pin allocation          69,931          0.00                              0                0

Operating System Statistics (OLD CPUs -- 64 bit)

Note that this AWR was for a 30-minute interval.

Statistic                 Total
BUSY_TIME                 1,674,495
IDLE_TIME                 571,810
IOWAIT_TIME               39,970
NICE_TIME                 0
SYS_TIME                  296,542
USER_TIME                 1,247,579
LOAD                      18
RSRC_MGR_CPU_WAIT_TIME    0
PHYSICAL_MEMORY_BYTES     50,750,922,752
NUM_CPUS                  16
NUM_CPU_SOCKETS           4

Total Buffer Gets: 444,663,900

Latch Name                     Get Requests  Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains            810,825,589          0.02            0.03            389        1,140,617             0.03
library cache                     7,053,564          0.12            0.24             99              409             0.24
library cache load lock               5,011          0.00                              0                7             0.00
library cache lock                1,622,604          0.01            0.31              0                0
library cache lock allocation        62,053          0.00            1.00              0                0
library cache pin                 2,719,841          0.02            0.30              4                3             0.00
library cache pin allocation         31,816          0.00                              0                0

As we see, the new 64-bit system, which had the old CPUs, had a load of 18.. On the other hand, the old 32-bit system, which had physical CPUs, had a load of 5..
The new 64-bit system with the old CPUs had a BUSY_TIME of 1,674,495 measured over 30 minutes, but the old 32-bit system with physical CPUs had a BUSY_TIME of 1,868,106 measured over 1 hour..
Also, when we look at cache buffers chains, we see that there were almost no waits for that latch in the old 32-bit system.. Note that Total Buffer Gets was much higher (3 times more) on the old 32-bit server, and even so, it had almost no waits for cache buffers chains..
The situation was almost the same for the library cache latches..
At the bottom line, we can say that although the old 32-bit server was doing more work, its CPU busy time was lower (roughly half the rate) than that of the new 64-bit server which had the old CPUs..
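To put those BUSY_TIME figures on a comparable footing, they can be normalized within each snapshot. A quick sketch (AWR OS-stat times are in centiseconds; the inputs are the BUSY_TIME/IDLE_TIME values from the two tables above):

```python
# Busy fraction of the host over an AWR snapshot, from the OS
# statistics section: BUSY_TIME / (BUSY_TIME + IDLE_TIME).
def busy_fraction(busy_cs, idle_cs):
    return busy_cs / (busy_cs + idle_cs)

old_32bit = busy_fraction(1_868_106, 3_894_329)   # ~0.32 over the 1-hour snap
new_64bit = busy_fraction(1_674_495, 571_810)     # ~0.75 over the 30-min snap
```

So the old physical server was roughly one-third busy while doing more work, whereas the 64-bit VM with the old CPUs was about three-quarters busy.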

So this was proof of the CPU problem on the 64-bit server.. Of course, there should be a more detailed explanation for this, but as we knew the SQLs were the same, the load was the same, and the storage was the same, we requested a CPU change on the newly migrated 64-bit server, and after the CPU change, the performance problem went away..

I would appreciate your comments on this post, including acceptance or rejection of the conclusions described above.
