Friday, May 22, 2015

Linux / Oracle RDBMS -- High Load after VM migration

After migrating a 10gR2 EBS database from a 32-bit Linux machine to a new 64-bit Linux machine, performance started to degrade.
The load average was always somewhere between 20 and 30, and end users were facing slowness in the EBS OAF and Forms screens..
Although the users were facing this slowness, they were also saying that their Discoverer reports completed faster than they did on the old 32-bit Linux..
The major differences between the old and new machines are as follows;

Old Machine Linux version: RedHat 5.5
Old Machine Arch: 32 bit
Old Machine SGA: 8GB (VLM mode implemented via an in-memory filesystem (shmfs/tmpfs, ramfs))
Old Machine PGA: 2GB
Old Machine Total Memory: 31GB
Old Machine CPU core count: 16 Physical Cores (No Hyper-Threading)

New Machine Linux version: RedHat 5.9
New Machine Arch: 64 bit
New Machine SGA: 18GB
New Machine PGA: 10GB
New Machine Total Memory: 48GB (hugepages enabled for the SGA)
New Machine CPU core count: 16 Cores (16 virtual sockets, one core each.. no Hyper-Threading)

So, the new machine was 64 bit, had 48GB of memory (hugepages were also enabled) and had the same core count as the old server, but it was a VM (VMware ESX).
The important difference was that the new machine's CPU cores were virtual..

When we examined the AWR reports, we saw CPU waits all the time..
Also, in the top command, we saw 96% user-space CPU usage all the time..
The load average was high all the time..
Discoverer reports were running all the time.. They were always in the "R" state and they were always consuming 99% CPU..
So they were obviously the main cause of the slowness, but we had no chance to tune the Discoverer reports, because these reports were also used in the old system by the end users..

AWR gathered for a 30-minute interval representing the problem:

Event                        Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              19,769                 74.4
db file sequential read      209,680     850        4               3.2         User I/O
SQL*Net more data to client  644,830     546        1               2.1         Network
latch: cache buffers chains    4,576     389       85               1.5         Concurrency
TCP Socket (KGAS)              6,313     317       50               1.2         Network


Okay, we had 19,769 seconds of CPU time..
On the other hand, we had 16 VM cores in this server. So for a 30-minute interval, we had 16 * 30 * 60 seconds of CPU time available..
That made 28,800 seconds, so we still had 9,031 seconds of CPU time available..
So, we had a high CPU load but we were not in a hang situation. This high DB CPU usage meant that the database was spending a lot of time on CPU (processing) compared to I/O and other events.
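This back-of-envelope arithmetic can be written down explicitly (the core count, interval length and CPU time all come from the AWR report above):

```python
# CPU-time budget for an AWR interval: cores * interval length in seconds.
cores = 16
interval_minutes = 30
budget_s = cores * interval_minutes * 60   # 28,800 CPU seconds available
db_cpu_s = 19_769                          # "CPU time" from the Top 5 events
headroom_s = budget_s - db_cpu_s           # 9,031 CPU seconds of headroom
```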

This high CPU usage was caused by the Discoverer reports, that was for sure..

CPU Time (s)  Elapsed Time (s)  Executions  CPU per Exec (s)  % Total DB Time  SQL Id         SQL Module  SQL Text
       2,190             2,685      63,231              0.03            10.10  6j1rnhpxu1bry  Disco10,    SELECT COUNT(T.NO) FROM XXIC_T...
       1,994             2,493           2            996.84             9.38  79h2r4xdn8g44  Disco10,    SELECT o481137.CUSTOMER_NAME ...
       1,827             2,202           2            913.39             8.29  93bkbks2j66q5  Disco10,    SELECT DECODE((decode(o310037...

The design of these reports was not optimized.. For example, a subquery was running 63,231 times in 30 minutes, and it was a CPU-bound query.. On the other hand, it was not the time to tune these SQLs or modify these reports, because they were working exactly the same way as they were in the old system..
I mean, even the execution plans and the SQLs in these reports were the same; they did not change..
In addition, this new system was 64 bit and it had a bigger SGA and more physical memory than the old one.. So we were not expecting such a load on the new database server..
That is, the cause of the high CPU times had to be something else..

There was db file sequential read listed in second place in the Top 5 Wait Events.. This wait indicates single-block index reads, and its average was 4 ms, which was acceptable for this environment.. In addition, we were using the same disk devices, even the same filesystems, as the old server, so I/O was not the problem..

It was the CPU... It would be better if I could share the CPU model information here, but the idea in this post is to prove the slowness using AWR..

In the Latch Activity section, we saw cache buffers chains with the highest Get Requests..

Latch Name            Get Requests  Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains   810,825,589          0.02            0.03            389        1,140,617             0.03

The high cache buffers chains activity was directing us to find the hot buffers, find the sessions waiting on latch contention, intersect the objects related to the hot buffers with the objects touched by the queries waiting on latch contention, conclude the problematic SQLs, and do SQL tuning.
On the other hand, I was not in that mood, as this database was running exactly the same way as on the old database server..

Knowing that latches are acquired when a process needs them and released when the process finishes its work in the relevant memory structure, we decided that "the time spent between acquiring and releasing the latches" is also very important..

That is, if our processes could do their jobs quickly, then more latches would be available..
I mean, the time between acquiring and releasing the latches would be shorter, and that would decrease the CPU time waits instance-wide, as the processes would not be spinning on the CPU while waiting for the latches..
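This reasoning can be illustrated with a deliberately simplified, hypothetical model: at a fixed latch request rate, the fraction of requests that find the latch busy scales with the hold time. The request rate below is derived from the ~810 million gets in the 30-minute AWR above; the hold times are made-up values just to show the scaling:

```python
# Toy model: latch "busy fraction" is its utilization -- the request
# rate times how long each holder keeps it (capped at full contention).
def latch_busy_fraction(requests_per_sec, hold_time_sec):
    return min(1.0, requests_per_sec * hold_time_sec)

rate = 810_825_589 / (30 * 60)               # ~450k gets/sec from the AWR
slow_cpu = latch_busy_fraction(rate, 2e-6)   # slower CPU holds the latch longer
fast_cpu = latch_busy_fraction(rate, 1e-6)   # faster CPU halves the hold time
```

Halving the hold time halves the chance that a get request has to spin, which is exactly the effect we hoped faster CPUs would have.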

At this point, we changed the CPUs.. I mean, we changed the hardware of the Virtual Machine which was running the database..
Note that the disks stayed the same and the memory stayed the same..

After the CPU change, the cache buffers chains get count was again very high;

Latch Name            Get Requests   Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains  3,439,451,402          2.09            0.00              0          823,885             0.04

Using the new CPUs, there were almost no waits on latch: cache buffers chains;

Event                        Waits   %Time -outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn
latch: cache buffers chains  17,424         0.00                    0              0        0.57

These values were as follows in the OLD system..

Event                        Waits   %Time -outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn
latch: cache buffers chains   4,576         0.00                  389             85        0.34

So, even though many more latch gets were requested with the new CPUs, there was almost no wait time.
Look, we had more db file sequential reads and more cache buffers chains gets, but our CPU time was lower :)
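One way to make that contrast concrete is to normalize the latch wait time by the number of gets; a quick sketch using the two cache buffers chains rows above:

```python
# Latch wait time per million gets, old CPUs vs new CPUs
# (figures from the two cache buffers chains rows above).
def wait_per_million_gets(wait_time_s, get_requests):
    return wait_time_s / (get_requests / 1_000_000)

old_cpus = wait_per_million_gets(389, 810_825_589)    # ~0.48 s per million gets
new_cpus = wait_per_million_gets(0, 3_439_451_402)    # 0 s despite ~4x the gets
```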

Note that this report was taken for a 1-hour interval. Even though it covered a longer period, the CPU time was lower than the value with the old CPUs..

Event                        Waits      Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                 10,730                 96.0
db file sequential read        213,556      165       1               1.5         User I/O
TCP Socket (KGAS)               11,950      162      14               1.4         Network
SQL*Net more data to client  1,359,692       93       0               0.8         Network
log file parallel write         28,298       47       2               0.4         System I/O

We had almost the same load on the Oracle Database. Note that these values are per second and per transaction.

Load Profile (with new CPUs) -- 1-hour AWR

                    Per Second  Per Transaction
Redo size:           58,524.70         6,768.44
Logical reads:      509,961.79        58,977.58
Block changes:          338.90            39.19
Physical reads:         171.86            19.88
Physical writes:         24.92             2.88
User calls:           2,476.00           286.35
Parses:                 170.12            19.67
Hard parses:              4.22             0.49
Sorts:                  296.35            34.27
Logons:                   0.96             0.11
Executes:             7,877.99           911.10
Transactions:             8.65

Load Profile (with OLD CPUs) -- 30-minute AWR

                    Per Second  Per Transaction
Redo size:           38,658.42         6,072.38
Logical reads:      212,494.62        33,378.16
Block changes:          195.19            30.66
Physical reads:         294.80            46.31
Physical writes:         22.43             3.52
User calls:           1,471.26           231.10
Parses:                  94.94            14.91
Hard parses:              4.28             0.67
Sorts:                  157.24            24.70
Logons:                   0.84             0.13
Executes:             7,478.48         1,174.70
Transactions:             6.37
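As a sanity check on these profiles, the Per Transaction column is simply the Per Second column divided by the transaction rate; for example, for the new-CPU snapshot:

```python
# The Per Transaction column in a Load Profile is the Per Second
# column divided by the transaction rate (transactions per second).
def per_txn(per_sec, txn_per_sec):
    return per_sec / txn_per_sec

# New-CPU snapshot: 7,877.99 executes/sec at 8.65 transactions/sec
executes_per_txn = per_txn(7877.99, 8.65)   # ~911, matching the report
```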

We had more redo size, more logical reads, more block changes, fewer physical reads, slightly more physical writes, almost the same hard parses, more sorts, more executes and more transactions in the new-CPU system, but there were low CPU time waits :)
Almost the same SQLs were running, the same Discoverer reports were at the top, but this time the CPU time was not so high instance-wide..

So this showed that the approach was correct: the CPUs were the bottleneck and the new CPUs fixed the problem.. The new CPUs were again virtual, but the CPU family of the new hardware was different.. There were major differences in the CPU specs... like Hyper-Threading...

Okay. So far so good. But we had one more question; how was this system dealing with this load earlier? I mean, this database was running on a 32-bit server with an 8GB SGA and we were not seeing these CPU time waits in the AWRs... So the question is: why?

Let's start with CPU utilization...
High CPU utilization may not actually be a problem.. If you are not waiting for other resources, you will end up running on the CPU or waiting for it..
Especially if your queries are CPU-bound, you will see your CPU time increasing..

High CPU utilization may not necessarily indicate that there is a problem; it could just mean that the system is being well utilized.
Time shown as "DB CPU" or "CPU time" in AWR and similar reports refers to the amount of CPU time that the session(s) used, and Oracle usually gets this information from the OS itself (e.g. using getrusage() or similar OS calls on Unix). This measure typically does not include time spent waiting on CPU run queues..
If we could have ASH, it would be even better, because the time shown as "CPU + Wait for CPU" in ASH reports could give us an idea of the time that sessions were running, or wanting to run, on CPU..
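As an aside, this distinction is easy to demonstrate. The sketch below uses Python's resource.getrusage() wrapper (Unix only) rather than the C call Oracle itself makes, and the sleep and loop sizes are arbitrary illustrative values:

```python
import resource
import time

# getrusage() reports CPU actually consumed (user + system), not wall
# clock, so time spent sleeping or queued for a CPU is excluded -- the
# same distinction AWR's "CPU time" statistic makes.
def cpu_seconds():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime

before = cpu_seconds()
time.sleep(0.5)                                # waiting burns no CPU time
after_sleep = cpu_seconds() - before

before = cpu_seconds()
total = sum(i * i for i in range(1_000_000))   # a CPU-bound loop does
after_work = cpu_seconds() - before
```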

Anyway, the bottom line is that if you are not waiting on other things, then you will be on the CPU and your CPU time will increase.. Sessions will use the CPUs in parallel, the load of the server will increase, and then you will start to feel performance degradation system-wide..

Let's have a look at the AWR report gathered from the old 32-bit server, running the same database, the same queries, the same reports, and with almost the same number of sessions connected to it..

Do not bother with the LNS wait.. This system had a standby database when this AWR snapshot was taken.

Load Profile (with OLD CPUs -- 64 bit)

                    Per Second  Per Transaction
Redo size:           38,658.42         6,072.38
Logical reads:      212,494.62        33,378.16
Block changes:          195.19            30.66
Physical reads:         294.80            46.31
Physical writes:         22.43             3.52
User calls:           1,471.26           231.10
Parses:                  94.94            14.91
Hard parses:              4.28             0.67
Sorts:                  157.24            24.70
Logons:                   0.84             0.13
Executes:             7,478.48         1,174.70
Transactions:             6.37

Event                        Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              19,769                 74.4
db file sequential read      209,680     850        4               3.2         User I/O
SQL*Net more data to client  644,830     546        1               2.1         Network
latch: cache buffers chains    4,576     389       85               1.5         Concurrency
TCP Socket (KGAS)              6,313     317       50               1.2         Network

Load Profile (with OLD Physical Server -- 32 bit)

                    Per Second  Per Transaction
Redo size:           38,685.46         5,267.77
Logical reads:      348,930.51        47,513.61
Block changes:          218.78            29.79
Physical reads:       1,026.41           139.77
Physical writes:        282.69            38.49
User calls:           2,255.70           307.16
Parses:                 138.65            18.88
Hard parses:              3.75             0.51
Sorts:                  205.22            27.95
Logons:                   0.86             0.12
Executes:             6,003.52           817.50
Transactions:             7.34

Event                    Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                           15,466                 93.5
db file sequential read  755,438      389       1               2.4         User I/O
db file scattered read   302,776      290       1               1.8         User I/O
LNS wait on SENDREQ          389       98     253               0.6         Network
log file sync             17,380       93       5               0.6         Commit

So the old 32-bit system had more db file sequential reads, more db file scattered reads and a significant amount of CPU time..
Let's take a look at the Operating System Statistics;

Operating System Statistics (OLD Physical Server -- 32 bit)

Note that this AWR was for a 60-minute interval.

Statistic                 Total
BUSY_TIME                 1,868,106
IDLE_TIME                 3,894,329
IOWAIT_TIME               65,502
NICE_TIME                 48
SYS_TIME                  222,197
USER_TIME                 1,625,270
LOAD                      5
RSRC_MGR_CPU_WAIT_TIME    0
PHYSICAL_MEMORY_BYTES     34,076,270,592
NUM_CPUS                  16
NUM_CPU_SOCKETS           4

Total Buffer Gets: 1,262,389,050

Latch Name                     Get Requests   Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains           2,406,314,930          0.28            0.00              0        5,506,188             0.00
library cache                     13,180,259          0.04            0.01              0          112,143             0.09
library cache load lock               10,348          0.00                              0                0
library cache lock                 4,375,448          0.00            0.00              0                0
library cache lock allocation        181,145          0.00                              0                0
library cache pin                  5,453,561          0.01            0.00              0                0
library cache pin allocation          69,931          0.00                              0                0

Operating System Statistics (OLD CPUs -- 64 bit)

Note that this AWR was for a 30-minute interval.

Statistic                 Total
BUSY_TIME                 1,674,495
IDLE_TIME                 571,810
IOWAIT_TIME               39,970
NICE_TIME                 0
SYS_TIME                  296,542
USER_TIME                 1,247,579
LOAD                      18
RSRC_MGR_CPU_WAIT_TIME    0
PHYSICAL_MEMORY_BYTES     50,750,922,752
NUM_CPUS                  16
NUM_CPU_SOCKETS           4

Total Buffer Gets: 444,663,900

Latch Name                     Get Requests  Pct Get Miss  Avg Slps /Miss  Wait Time (s)  NoWait Requests  Pct NoWait Miss
cache buffers chains            810,825,589          0.02            0.03            389        1,140,617             0.03
library cache                     7,053,564          0.12            0.24             99              409             0.24
library cache load lock               5,011          0.00                              0                7             0.00
library cache lock                1,622,604          0.01            0.31              0                0
library cache lock allocation        62,053          0.00            1.00              0                0
library cache pin                 2,719,841          0.02            0.30              4                3             0.00
library cache pin allocation         31,816          0.00                              0                0

As we see, the new 64-bit system, which had the old CPUs, had a load of 18.. On the other hand, the old 32-bit system, which had physical CPUs, had a load of 5..
The new 64-bit system with the old CPUs had a BUSY_TIME of 1,674,495 measured over 30 minutes, but the old 32-bit system with physical CPUs had a BUSY_TIME of 1,868,106 measured over 1 hour..
Also, when we look at cache buffers chains, we see that there were almost no waits for that latch in the old 32-bit system.. Note that Total Buffer Gets was much higher (3 times more) on the old 32-bit server, and even so, it had almost no waits for cache buffers chains..
The situation was almost the same for the library cache latches..
At the bottom line, we can say that although the old 32-bit server was doing more work, its CPU busy time was lower (roughly half the rate) than that of the new 64-bit server which had the old CPUs..
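To put those BUSY_TIME figures on a comparable footing, they can be normalized within each snapshot. A quick sketch (AWR OS-stat times are in centiseconds; the inputs are the BUSY_TIME/IDLE_TIME values from the two tables above):

```python
# Busy fraction of the host over an AWR snapshot, from the OS
# statistics section: BUSY_TIME / (BUSY_TIME + IDLE_TIME).
def busy_fraction(busy_cs, idle_cs):
    return busy_cs / (busy_cs + idle_cs)

old_32bit = busy_fraction(1_868_106, 3_894_329)   # ~0.32 over the 1-hour snap
new_64bit = busy_fraction(1_674_495, 571_810)     # ~0.75 over the 30-min snap
```

So the old physical server was roughly one-third busy while doing more work, whereas the 64-bit VM with the old CPUs was about three-quarters busy.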

So this was proof of the CPU problem on the 64-bit server.. Of course, there should be a more detailed explanation for this, but as we knew the SQLs were the same, the load was the same, and the storage was the same, we requested a CPU change on the newly migrated 64-bit server, and after the CPU change, the performance problem went away..

I would appreciate your comments on this post, including acceptance or rejection of the conclusions described above.
