Thursday, November 21, 2013

Linux & EBS 11i -- UEK kernel, PAE support (32 -> 36 bits), HighMem/LowMem, addressing large memory

Oracle Enterprise Linux 5 32-bit comes with built-in PAE support.
You can verify this by looking at the kernel configuration file.
The configuration file is under the /boot directory, with a name like config-2.6.32-300.10.1.el5uek.
In this configuration file, PAE support is declared as CONFIG_X86_PAE=y

PAE stands for Physical Address Extension. It is a feature of 32-bit x86 systems that allows addressing more than 4 GB of physical memory. A 32-bit system can normally address 2^32 bytes (4 GB) of memory. PAE increases the physical address size from 32 bits to 36 bits, which raises the addressable memory from 4 GB to 2^36 bytes (64 GB).

Let's look at how an operating system like Linux locates data in memory (without PAE):

The memory manager locates the Page Directory for the current process. The memory manager is informed of the location of the Page Directory for the process by a special control register.
The 10-bit Page Directory Index in the virtual address is used to locate a Page Directory Entry (PDE) that defines the location of the Page Table needed to translate the virtual memory address.
The 10-bit Page Table Index in the virtual address is used to locate the correct Page Table Entry (PTE).
The PTE is used to locate the correct 4 KB page of memory.
After accessing this page, the 12-bit Byte Index in the virtual address is used to locate the physical address of the desired data.

With PAE, a Page Directory Pointer Table is added on top of the process above. This extra level, together with wider table entries, is what makes it possible to reach more than 4 GB of memory.

The following figure explains how a page in memory is reached under the PAE kernel memory architecture.
(Reference: Paging Extensions for the Pentium Pro Processors, by Robert R. Collins)

As you can see, there is a multi-level hierarchy. This hierarchy of page tables is used to keep the size of the page tables small.

Note that adding levels to the page table hierarchy can reduce performance.

The virtual address is still 32 bits. The first table, the Page Directory Pointer Table, uses the top 2 bits of the address to point to one of 4 page directories. The entries in the page directories and page tables are 64 bits wide, but only 36 bits are used to describe the physical location. So we have 36-bit physical addresses, and with 36 bits, 64 GB of memory becomes addressable.
In my opinion, since the virtual address is still 32 bits, a single process cannot address more than 4 GB even with PAE. Maybe by using swapping techniques a single process can reach more than 4 GB of physical memory over time, but at any single point in time it has an address space of 4 GB at most.

Okay, technically PAE supports 64 GB, but that is not the case in practice, especially for critical systems.

A reference: Mel Gorman, University of Limerick, Understanding The Linux Virtual Memory Manager

PAE allows a processor to address up to 64GiB in theory but, in practice,
processes in Linux still cannot access that much RAM as the virtual address space is
still only 4GiB. This has led to some disappointment from users who have tried to
malloc() all their RAM with one process.
Secondly, PAE does not allow the kernel itself to have this much RAM available.
The struct page used to describe each page frame still requires 44 bytes and this
uses kernel virtual address space in ZONE_NORMAL. That means that to describe 1GiB
of memory, approximately 11MiB of kernel memory is required. Thus, with 16GiB,
176MiB of memory is consumed, putting significant pressure on ZONE_NORMAL. This
does not sound too bad until other structures are taken into account which use
ZONE_NORMAL. Even very small structures such as Page Table Entries (PTEs) require
about 16MiB in the worst case. This makes 16GiB about the practical limit for
available physical memory Linux on an x86. If more memory needs to be accessed,
the advice given is simple and straightforward, buy a 64 bit machine.

To test this, we booted our 64-bit HP server (which has 64 GB of memory installed) with a PAE-enabled 32-bit Oracle Linux. It could see the 64 GB of RAM. So we used hugepages to address the large memory, and took the necessary actions to make Oracle use a big SGA. Everything seemed perfect in the beginning. We started our Oracle Database with a 4 GB SGA. The database started, so still no problem. Then we started to create some tablespaces and, finally, we ended up with the following:

Nov 12 10:25:22 productlinux kernel: lowmem_reserve[]: 0 0 0 0
Nov 12 10:25:22 productlinux kernel: DMA: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1868kB
Nov 12 10:25:22 productlinux kernel: DMA32: empty
Nov 12 10:25:22 productlinux kernel: Normal: 1*4kB 1*8kB 8*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB
Nov 12 10:25:22 productlinux kernel: HighMem: 0*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 6114*4096kB =
Nov 12 10:25:22 productlinux kernel: 5360827 pagecache pages
Nov 12 10:25:22 productlinux kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Nov 12 10:25:22 productlinux kernel: Free swap = 34799608kB
Nov 12 10:25:22 productlinux kernel: Total swap = 34799608kB
Nov 12 10:25:22 productlinux kernel: Free swap: 34799608kB
Nov 12 10:25:22 productlinux kernel: 16777216 pages of RAM
Nov 12 10:25:22 productlinux kernel: 16547840 pages of HIGHMEM
Nov 12 10:25:22 productlinux kernel: 403562 reserved pages
Nov 12 10:25:22 productlinux kernel: 7467319 pages shared
Nov 12 10:25:22 productlinux kernel: 0 pages swap cached
Nov 12 10:25:22 productlinux kernel: 158886 pages dirty
Nov 12 10:25:22 productlinux kernel: 0 pages writeback
Nov 12 10:25:22 productlinux kernel: 146965 pages mapped
Nov 12 10:25:22 productlinux kernel: 90980 pages slab
Nov 12 10:25:22 productlinux kernel: 5958 pages pagetables
Nov 12 10:25:22 productlinux kernel: Out of memory: Killed process

From the kernel messages above, we can say that the OOM killer killed one of our background processes. In this case, the process that got killed was LGWR.

The OOM killer can be disabled, but it is a built-in safety mechanism. It is actually there to help us. It killed the process because it saw that the amount of free memory was dropping to zero.

But we have 64 GB, right? There should be a lot of memory left even while the database is running.

We see HighMem there: 25044944kB, which is approx. 23 GB. It seems okay.

HighMem is the memory that an application, or let's say a user process, can use. In this case, there was plenty of room in HighMem.

Let's look at the line starting with Normal. It is the line for Low Memory. In this case, Low Memory had only 3756kB (less than 4 MB) available. This points to a problem. Low Memory is the memory reserved for the kernel, and it seems to be the source of the problem.

A 32-bit Linux system uses a 3:1 split by default: of the 4 GB that 32 bits can address, 3 GB is the user address space and the remaining 1 GB is the kernel address space. Low Memory is the physical memory that the kernel maps directly into its 1 GB, and High Memory is everything above it, which is available to user processes and the page cache. So when the kernel needs to create its structures and load data into them, it uses Low Memory.

For a 32-bit Linux system with 64 GB of memory installed, the kernel has to store a lot of information just to manage this much memory, and it stores this information in Low Memory. So even with PAE, the system is not stable when you address as much memory as 64 GB, because the PAE kernel still uses the 3:1 split. In brief, a PAE kernel can see 64 GB of memory, but when the server load increases, it can fill up Low Memory very easily and bring our processes down.

Let's talk about HugeMem support. The hugemem kernel is another option that allows a 32-bit Linux kernel to address more than 4 GB of memory. Its memory boundaries are not like the 3:1 split of the PAE kernel: a kernel that supports hugemem uses a 4:4 split. That is, the kernel and each user process get a separate 4 GB address space. This makes a Linux kernel able to address more than 16 GB of memory on a 32-bit system. As you would expect, it does not suffer from the 1 GB low memory limit like the other 32-bit kernels. Of course, if you need to address that much memory on a 32-bit system, you have to run your Oracle Database with some specific parameters (such as indirect data buffers).

The bad news is that Oracle Enterprise Linux 5 and later do not support HugeMem. It seems it is not supported because of the lack of new patches for it and the widespread use of 64-bit systems at the enterprise level.

In addition, there is no equivalent kernel for Oracle Enterprise Linux 5 and above. It seems the same applies to Red Hat, too.

In brief, if you want to use 64 GB of memory on a 32-bit Linux system, you need to use Oracle Enterprise Linux 4 with hugemem. If using 64 GB of memory is an important requirement and you want to use Oracle Enterprise Linux 5 or 6, you need to go 64-bit.

So let's suppose we have an application (Oracle EBS) running on a 32-bit Oracle Enterprise Linux 5 installed on a 32-bit server with 16 GB of memory.

As I mentioned above, we can use PAE for this configuration.

Oracle Enterprise Linux 5 supports the PAE option, and a PAE kernel can address 16 GB of memory stably.

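To test this, let's use the following C program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGE_SZ (1<<12) /* 4096 bytes */
int main() {
    int i;
    int gb = 2; /* amount to allocate, in GB */
    /* Convert GB to bytes, then allocate one page per iteration. */
    for (i = 0; i < ((unsigned long)gb<<30)/PAGE_SZ; ++i) {
        void *m = malloc(PAGE_SZ);
        if (!m)
            break; /* stop if we cannot allocate any more */
        memset(m, 0, 1); /* write to the page so it is really backed by memory */
    }
    /* Convert bytes to MB and report the total. */
    printf("allocated %lu MB\n", ((unsigned long)i*PAGE_SZ)>>20);
    getchar(); /* assumed, not in the original listing: wait for Enter so the process stays visible in top */
    return 0;
}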

We save this C code into a file and compile it with gcc on our Oracle Enterprise Linux 5 system.

The name of the produced executable is "a.out".

So let's execute a.out.

It allocates up to 2 GB of memory, one 4096-byte page at a time. It stops early if it cannot allocate any more memory, and then prints the total amount of memory allocated.

The server has 16 GB of memory, as shown below:

top - 16:48:13 up 3:45, 15 users, load average: 0.12, 0.12, 0.04
Tasks: 454 total, 1 running, 453 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16502020k total, 15973388k used, 528632k free, 8072k buffers
Swap: 33559776k total, 1195800k used, 32363976k free, 35676k cached

Let's start 8 a.out processes (as far as I know, a process can allocate about 2 GB of memory at most on 32-bit Linux):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
 2196 root      20   0 2056m 2.0g  344 S  0.0 12.7   0:01.56 a.out                                                   
 2037 root      20   0 2056m 2.0g  344 S  0.0 12.7   0:01.41 a.out                                                   
 2001 root      20   0 2055m 2.0g  344 S  0.0 12.7   0:01.39 a.out                                                   
 2101 root      20   0 2056m 2.0g  344 S  0.0 12.7   0:01.41 a.out                                                   
 2004 root      20   0 2055m 2.0g  344 S  0.0 12.7   0:01.39 a.out                                                   
 2068 root      20   0 2056m 2.0g  344 S  0.0 12.7   0:01.36 a.out                                                   
 2218 root      20   0 2055m 2.0g  344 S  0.0 12.7   0:03.10 a.out                                                   
 1967 root      20   0 2056m 975m  344 S  0.0  6.1   0:01.40 a.out    

Note that the RES column in the above top output shows the physical memory actually held by each process. As you can see, 7 a.out processes hold 2 GB each and 1 a.out process holds 975 MB, because only 975 MB of memory was left.

This shows that a PAE-enabled 32-bit Linux can address 16 GB of memory without problems.

Lastly, I will add something from the Oracle EBS / Apps DBA perspective.
Based on the information in this post, my conclusion is this: if you need to run EBS 11i in a Linux environment, I suggest you choose a split configuration.
First, install the database on a 64-bit Linux and use a big SGA, and then install the Application Tier on a 32-bit Linux with PAE support. Note that the EBS 11i application tier should be installed on a 32-bit Linux, because the EBS 11i application code is not supported on 64-bit Linux operating systems.
With this split configuration, you can use a big SGA for the database, as well as 16 GB of memory for the application services.

I don't want to say that the EBS application code cannot be compiled and run on 64-bit Linux. Maybe it can. Maybe by modifying the makefiles, environments and so on, you can compile or relink the executables. However, it is not supported by Oracle.
