Thursday, January 9, 2014

Linux & Oracle -- Using strace, pstack and /proc filesystem for analyzing processes.

This post will be about Linux process runtime analysis.
I choosed the DBWR process to analyze. I wonder what DBWR is doing in a certain time interval..
Is it working actually? Does it write or read? Which  file or files DBWR is working on?


I will try answer those question by making analysis on OS layer.
I will use top, stace, pstack and /proc filesystem to do that.

Lets start;
Here , we see DBWR is doing I/O. We use the top command  and see the DBWR process is in "D" state.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16748 dbexa     15   0 1212m 829m 787m D  0.3  4.6   0:24.39 ora_dbw0_EXADEV

We obtain the process id of dbwr from above. The process id of the dbwr process im question is 16748.

So lets attach that process using pstack and see what is in its execution stack..

we issue the command; pstack 16748

#0  0x00000034ce80bd05 in __pwrite_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000000003c147f9 in skgfqio ()
#2  0x000000000090af6e in ksfdgo ()
#3  0x0000000000907b4f in ksfdaio1 ()
#4  0x000000000090564a in ksfdaio ()
#5  0x00000000027292ae in kcfdowr ()
#6  0x00000000010e99ea in kcbbdrv ()
#7  0x00000000008270f1 in ksbabs ()
#8  0x000000000082f658 in ksbrdp ()
#9  0x0000000002f9b5e8 in opirip ()
#10 0x000000000134375e in opidrv ()
#11 0x0000000001ece70e in sou2o ()
#12 0x0000000000713b85 in opimai_real ()
#13 0x0000000000713a3c in main ()

So when we look above, we can see  __pwrite_nocancel from the libpthread.so.0 library. So its coming from the threads library..
__pwrite_nocancel is in the top of the stack, so this function is the last thing that have been executed by this process. This process seems to be doing I/O, actually write I/O's..

Lets trace the system calls of the DBWR process using strace, to get a better runtime data for better understanding the I/O that the process does.

To do this, we issue the command strace -p 16748   (16748 is the process id of dbwr)

pwrite(116, "\2\242\0\0\244O\5\0303!\"\204\211\5\33\4\220\t\0\0;\0\31"..., 8192, 2851373056) = 8192
times(NULL)                             = 456992346
times(NULL)                             = 456992346
pwrite(116, "\2\242\0\0\252O\5\0303!\"\204\211\5\33\4\25\204\0\0;\0"..., 8192, 2851422208) = 8192
times(NULL)                             = 456992346
times(NULL)                             = 456992346
pwrite(116, "\2\242\0\0\254O\5\0303!\"\204\211\5\17\4\204m\0\0;\0\31"..., 8192, 2851438592) = 8192
times(NULL)                             = 456992346
times(NULL)                             = 456992347
pwrite(116, "\2\242\0\0\256O\5\0302!\"\204\211\5\33\4:X\0\0;\0\31\0"..., 8192, 2851454976) = 8192
times(NULL)                             = 456992347
times(NULL)                             = 456992347
pwrite(116, "\2\242\0\0\260O\5\0302!\"\204\211\5\33\4/\266\0\0;\0\31"..., 8192, 2851471360) = 8192
times(NULL)                             = 456992348
times(NULL)                             = 456992348
pwrite(116, "\2\242\0\0\262O\5\0302!\"\204\211\5\33\4\332\10\0\0;\0"..., 8192, 2851487744) = 8192
times(NULL)                             = 456992348
times(NULL)                             = 456992348
pwrite(116, "\2\242\0\0\264O\5\0302!\"\204\211\5\33\4\20-\0\0;\0\31"..., 8192, 2851504128) = 8192
times(NULL)                             = 456992351
times(NULL)                             = 456992351
pwrite(116, "\2\242\0\0\270O\5\0302!\"\204\211\5\33\4\316\311\0\0;\0"..., 8192, 2851536896) = 8192
....................
....................
.....................
Okay, we see a lot of pwrites and times sys calls.. pwrite syscalls we see here, are consistent with what we saw in the pstack (__pwrite_nocancel)

Lets look what this system calls actually are;

times(): returns the number of clock ticks that have elapsed since an arbitrary point in the past.
times() syscall is not so relevant with our approach..

pwrite() : write to a file descriptor at a given offset
Lets look to this syscal closely.

pwrite's definition is as follows;

ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);

So, now we understand the pwrite. 
Next, we analyze DBWR's pwrite syscalls.

We have similar lines in our strace output.. Like the following;
pwrite(116, "\2\242\0\0\260O\5\0302!\"\204\211\5\33\4/\266\0\0;\0\31"..., 8192, 2851471360) = 8192

When we decode above line according to the syscall's definition , we see ->

DBWR issues write operations to a file, using one of its file descriptor 116 .
It requests to write 8192 bytes (3th value in the arguments = 8192) and its successfully writes all of them. (=8192) --> Another derived data from here is that , the database block size is probably 8k :) 

But what is this file? 
Lets go to /proc filesystem and find out which file it is...
procfs (or the proc filesystem) is a special filesystem in UNIX-like operating systems that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory


So we go to /proc filesystem and look to the file descriptors of the DBWR process.

cd /proc/16748/fd
 ls -lrt |grep 116
lrwx------  1 root root 64 Jan  9 15:05 116 -> /erman/erman_33.dbf

So file descriptor with number 116 is pointing to /erman/erman_33.dbf

So in conclusion;

Just looking from the OS layer, we can say that; dbwr is running and writing aggresively to the datafile named /erman/erman_33.dbf..
We did this analysis on the OS layer, as you see. We dont need know anything about the Oracle to make this kind of analysis.


No comments :

Post a Comment