Monday, May 29, 2023

EBS -- Apache exiting with 152 on Multi Apps Node / Shared FS Configuration

Recently, we dealt with an interesting issue in my forum, so I wanted to share it.

In a Multi Apps Node EBS 12.2 environment (with a shared Application File System), Apache suddenly started failing on startup.

[applprod@erman02 opmn]$ adopmnctl.sh startall
You are running adopmnctl.sh version 120.0.12020000.2
Starting Apache...
EXIT CODE is 152. Please check the log file for more details.

adopmnctl.sh: exiting with status 152

Exit code 152 didn't tell us much. As far as I could see, it was not documented anywhere.

*Ensured that the applications OS user had read/write access to the directory named "u01/ERMANAPPS/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/states" and to all the files located in it.

*Ensured that no active security mechanism (SELinux, firewall, etc.) was preventing Apache from starting, and that all the necessary file permissions were in place.

*Ensured that the OS limits (ulimit) were in place for the OS user that was starting Apache/OHS.

*Ensured that there was no space shortage in the filesystems.

*We got exit code 152 and the following MOS note is for 150, but we checked it anyway ->

adapcctl.sh: exiting with status 150 (Doc ID 1106795.1)
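
A rough sketch of the first check as a shell function, in case it is useful. The name check_rw is my own (hypothetical, not an EBS utility), and the directory you pass to it is whatever your OPMN states directory is. Run it as the applications OS user.

```shell
# check_rw DIR: print every file or directory under DIR that the current
# OS user cannot both read and write. Prints nothing if access is fine.
# (check_rw is a hypothetical helper name, not something shipped with EBS.)
check_rw() {
    find "$1" | while IFS= read -r entry; do
        if [ ! -r "$entry" ] || [ ! -w "$entry" ]; then
            echo "no read/write access: $entry"
        fi
    done
}

# Example usage:
#   check_rw /path/to/config/OPMN/opmn/states
# The other checks in the list are one-liners:
#   getenforce   # SELinux mode (if the SELinux tools are installed)
#   ulimit -a    # limits in effect for the current shell
#   df -h        # free space per filesystem
```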

Nothing helped and everything looked normal, so we decided to run strace on the process and on all the child processes and threads involved in starting Apache. We needed to see the system calls, as they might tell us something useful.

We needed to run strace with the "-ff" option, for the following reason:

With -ff, each child process is traced into its own log file, and the <PID> is appended to the file name.

-ff
--follow-forks --output-separately
Combine the effects of --follow-forks and
--output-separately options. This is incompatible with
-c, since no per-process counts are kept.

Example command: strace -o startapache.trc -ff -t $INST_TOP/ora/10.1.3/Apache/Apache/bin/apachectl startssl -f $INST_TOP/ora/10.1.3/Apache/Apache/conf/httpd.conf &
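
With -ff and -o startapache.trc, strace writes one file per process (startapache.trc.<PID>), so the first job is to find out how each process ended. A small sketch of that step; list_exits is a hypothetical helper name of my own:

```shell
# list_exits PREFIX: for each per-PID trace file PREFIX.<PID>, print the
# line that shows how the process ended ("+++ exited with N +++" or
# "+++ killed by SIGNAL +++"), prefixed with the file name.
# -F makes grep treat the patterns as fixed strings, so "+" is literal.
list_exits() {
    grep -HF -e '+++ exited with' -e '+++ killed by' "$1".*
}

# Example usage, in the directory where strace wrote its output:
#   list_exits startapache.trc
```

The files that end with a non-zero exit or a kill are the ones worth reading first.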

**The following records were taken from the strace output:

1)
connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused)
shutdown(5, SHUT_RDWR)                  = -1 ENOTCONN (Transport endpoint is not connected)
close(5)                                = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
connect(5, {sa_family=AF_INET, sin_port=htons(6110), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
shutdown(5, SHUT_RDWR)                  = -1 ENOTCONN (Transport endpoint is not connected)
...
......
exit_group(2)                           = ?
+++ exited with 2 +++

2)
futex(0x7f55b79aaea4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7f55b79aae78, 14) = 1
futex(0x7f55b79aae78, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0xba5724, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0xba56f8, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x92054c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {tv_sec=1684324311, tv_nsec=5000000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)

3)
tgkill(28998, 29132, SIGHUP)            = 0
tgkill(28998, 29132, SIG_0)             = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout)
tgkill(28998, 29132, SIGHUP)            = 0
tgkill(28998, 29132, SIG_0)             = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout)
tgkill(28998, 29132, SIGHUP)            = 0
tgkill(28998, 29132, SIG_0)             = 0
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=500000}) = 0 (Timeout)
tgkill(28998, 29132, SIGHUP)            = 0
..
....
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28976, si_uid=54321} ---
rt_sigreturn({mask=~[ILL TRAP ABRT BUS FPE KILL SEGV USR2 PIPE TERM STOP SYS RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7fce16c809d0, FUTEX_WAIT, 29011, NULL) = ?
+++ killed by SIGKILL +++

4)
getrlimit(RLIMIT_NOFILE, {rlim_cur=4*1024, rlim_max=64*1024}) = 0
close(3)                                = -1 EBADF (Bad file descriptor)
close(4)                                = -1 EBADF (Bad file descriptor)
close(5)                                = -1 EBADF (Bad file descriptor)
close(6)                                = -1 EBADF (Bad file descriptor)
close(7)                                = -1 EBADF (Bad file descriptor)
close(8)                                = -1 EBADF (Bad file descriptor)
close(9)                                = -1 EBADF (Bad file descriptor)
close(10)                               = -1 EBADF (Bad file descriptor)
close(11)                               = -1 EBADF (Bad file descriptor)
...
.............

5)
access("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/ewallet.p12", F_OK) = -1 ENOENT (No such file or directory)
access("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/cwallet.sso", F_OK) = 0
open("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OHS/EBS_web/proxy-wallet/cwallet.sso", O_RDONLY) = 22

6)
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28993
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28994
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28995
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28996
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fce20820a50) = 28998
write(12, "[2023-05-17T17:21:56.2408+05:30]"..., 338) = 338
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28981
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28982
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) = 28983
wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
write(13, "!", 1)                       = 1
wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
write(13, "!", 1)                       = 1
wait4(-1, 0x7ffdc1512d38, WNOHANG|WSTOPPED, NULL) = 0
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
....
.......
................

wait4(28993, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
wait4(28994, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
wait4(28995, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
wait4(28996, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
wait4(28998, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
....
...................
............................

kill(28993, SIGTERM)                    = 0
wait4(28994, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
write(12, "[2023-05-17T17:24:07.4981+05:30]"..., 264) = 264
kill(28994, SIGTERM)                    = 0
wait4(28995, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
write(12, "[2023-05-17T17:24:07.4984+05:30]"..., 264) = 264
kill(28995, SIGTERM)                    = 0
wait4(28996, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0
write(12, "[2023-05-17T17:24:07.4986+05:30]"..., 264) = 264
kill(28996, SIGTERM)                    = 0
wait4(28998, 0x7ffdc1512cd8, WNOHANG|WSTOPPED, NULL) = 0

7)

fstat(25, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0
read(25, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27
close(25)                               = 0
write(24, "POST /connect HTTP/1.1\r\nVersion:"..., 163) = 163
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
read(24, "POST /status HTTP/1.1\r\nVersion: "..., 2048) = 207
write(24, "POST /subscribe HTTP/1.1\r\nConten"..., 106) = 106
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
read(24, "POST /status HTTP/1.1\r\nVersion: "..., 2048) = 102
futex(0x116d9bc, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x116d990, 2) = 1
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
read(24, "POST /event HTTP/1.1\r\norigin: 00"..., 2048) = 571
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
read(24, "SubscriberID: 1\r\n\r\n", 2048) = 19
futex(0x116daf4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x116daf0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)

8)

read(5, "# Generated by NetworkManager\nse"..., 4096) = 108
read(5, "", 4096)                       = 0
close(5)                                = 0
munmap(0x7f8880605000, 4096)            = 0
open("/u01/PRODERM/fs2/FMW_Home/webtier/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/FMW_Home/webtier/opmn/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/EBSapps/10.1.2/jdk/jre/lib/i386/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/EBSapps/10.1.2/jdk/jre/lib/i386/server/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/EBSapps/appl/cz/12.0.0/bin/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/EBSapps/10.1.2/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/X11R6/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/u01/PRODERM/fs2/EBSapps/appl/sht/12.0.0/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=118394, ...}) = 0
mmap(NULL, 118394, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7f88805e9000
close(5)                                = 0
open("/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 5
read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260!\0\0\0\0\0\0"..., 832) = 832
fstat(5, {st_mode=S_IFREG|0755, st_size=61560, ...}) = 0
mmap(NULL, 2173048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7f887bdcf000
mprotect(0x7f887bddb000, 2093056, PROT_NONE) = 0
mmap(0x7f887bfda000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0xb000) = 0x7f887bfda000
mmap(0x7f887bfdc000, 22648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f887bfdc000
close(5)                                = 0
access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
mprotect(0x7f887bfda000, 4096, PROT_READ) = 0
munmap(0x7f88805e9000, 118394)          = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=603, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8880605000
read(5, "127.0.0.1   localhost localhost."..., 4096) = 603
read(5, "", 4096)                       = 0
close(5)                                = 0
munmap(0x7f8880605000, 4096)            = 0
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 5
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
open("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0
read(6, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27
close(6)                                = 0
write(5, "POST /connect HTTP/1.1\r\nContent-"..., 238) = 238
read(5, "HTTP/1.1 408 Request Time-out\r\nC"..., 8192) = 116
read(5, "<?xml version='1.0' encoding='UT"..., 8192) = 731
write(2, "================================"..., 81) = 81
write(2, "opmn id=erpprodapp02.ttd.com:621"..., 34) = 34
write(2, "Response: 0 of 1 processes start"..., 36) = 36
write(2, "\nias-instance id=EBS_web_OHS2\n", 30) = 30
write(2, "++++++++++++++++++++++++++++++++"..., 81) = 81
write(2, "--------------------------------"..., 81) = 81
write(2, "ias-component/process-type/proce"..., 66) = 66
write(2, "--> Process (index=1,uid=1070952"..., 47) = 47
write(2, "  time out while waiting for a m"..., 56) = 56
write(2, "  Log:\n  /u01/PRODERM/fs2/FMW_H"..., 115) = 115
read(5, "", 8192)                       = 0
shutdown(5, SHUT_RDWR)                  = 0
close(5)                                = 0
munmap(0x7f887c1d2000, 266240)          = 0
munmap(0x7f8880576000, 303104)          = 0
exit_group(408)                         = ?
+++ exited with 152 +++
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000) = 0 (Timeout)
read(24, 0x7fce10000b60, 2048)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=24, events=POLLIN|POLLPRI}], 1, 5000 <unfinished ...>) = ?
+++ killed by SIGKILL +++

9)

fcntl(22, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGHUP {si_signo=SIGHUP, si_code=SI_TKILL, si_pid=28993, si_uid=54321} ---

10)

access("/etc/sysconfig/strcasecmp-nonascii", F_OK) = -1 ENOENT (No such file or directory)
mprotect(0x7f887bfda000, 4096, PROT_READ) = 0
munmap(0x7f88805e9000, 118394)          = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=603, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8880605000
read(5, "127.0.0.1   localhost localhost."..., 4096) = 603
read(5, "", 4096)                       = 0
close(5)                                = 0
munmap(0x7f8880605000, 4096)            = 0
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 5
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
open("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0
read(6, "tz6Nm33MxjSStI6k6pYDxt5dXdX", 27) = 27
close(6)                                = 0
write(5, "POST /connect HTTP/1.1\r\nContent-"..., 238) = 238
read(5, "HTTP/1.1 408 Request Time-out\r\nC"..., 8192) = 116
read(5, "<?xml version='1.0' encoding='UT"..., 8192) = 731
write(2, "================================"..., 81) = 81
write(2, "opmn id=erpprodapp02.ttd.com:621"..., 34) = 34
write(2, "Response: 0 of 1 processes start"..., 36) = 36
write(2, "\nias-instance id=EBS_web_OHS2\n", 30) = 30
write(2, "++++++++++++++++++++++++++++++++"..., 81) = 81
write(2, "--------------------------------"..., 81) = 81
write(2, "ias-component/process-type/proce"..., 66) = 66
write(2, "--> Process (index=1,uid=1070952"..., 47) = 47
write(2, "  time out while waiting for a m"..., 56) = 56
write(2, "  Log:\n  /u01/PRODERM/fs2/FMW_H"..., 115) = 115
read(5, "", 8192)                       = 0
shutdown(5, SHUT_RDWR)                  = 0
close(5)                                = 0
munmap(0x7f887c1d2000, 266240)          = 0
munmap(0x7f8880576000, 303104)          = 0
exit_group(408)                         = ?
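
A side note on the 152 itself (this is my own reading of the trace above, not something I found in an Oracle note): the process calls exit_group(408) right after reading "HTTP/1.1 408 Request Time-out" from OPMN, and strace then reports it as exited with 152. Only the low 8 bits of an exit status reach the parent process, and 408 - 256 = 152, so the mysterious 152 looks like a truncated 408. The arithmetic as a tiny sketch (low_byte is a hypothetical helper name):

```shell
# low_byte N: print the low 8 bits of N, which is the part of an exit
# status that the parent process actually sees.
low_byte() {
    echo $(( $1 & 255 ))
}

low_byte 408    # prints 152
```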

THE SOLUTION:

The solution is to update httpd.conf: change the AcceptMutex lines and comment out the LockFile lines.

This is already documented for OCI, and that directive should have been there already. But I think it applies to on-premises environments as well!
So, under the hood, we made Apache use semaphores rather than lock files. Lock files are not required (according to the FMW 11.1.1.9 Admin Guide).

*Here -> Sharing the Application Tier File System in Oracle E-Business Suite Release 12.2 or 12.1.3 Using the Oracle Cloud Infrastructure File Storage Service (Doc ID 2794300.1)

And that MOS note takes its reference from the FMW documentation itself (from Oracle HTTP Server 11.1.1.9 Fusion Middleware Administrator's Guide for Oracle HTTP Server):
  1. Beginning with the primary application tier node, update the httpd.conf as follows:
  2. Launch the Fusion Middleware Control. For example, use the following URL: http://<hostname.domain:admin_port>/em
  3. Select and edit the httpd.conf file.
  4. Change AcceptMutex fcntl to AcceptMutex sysvsem (found in two places in the httpd.conf file).
  5. Comment out the LockFile directive (found in three places in the httpd.conf file).
  6. Save the file and exit the Fusion Middleware Control.
  7. Restart the HTTP server for the configuration changes to take effect.
  8. Repeat steps 1 through 6 for all secondary nodes in the environment.
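
After saving the change, it is easy to confirm from the shell that nothing was missed. A minimal sketch, assuming you pass it the httpd.conf of the OHS instance on each node; show_mutex_lines is a hypothetical name of my own:

```shell
# show_mutex_lines CONF: print every AcceptMutex and LockFile line in CONF
# with its line number, including lines that are commented out with "#".
show_mutex_lines() {
    grep -nE '^[[:space:]]*#?[[:space:]]*(AcceptMutex|LockFile)([[:space:]]|$)' "$1"
}

# Example usage:
#   show_mutex_lines /path/to/httpd.conf
```

After the edit you would expect every AcceptMutex line to read sysvsem and every LockFile line to start with "#".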

Note that we had fcntl calls in the strace output, just before the timeout was reported:

fcntl(5, F_SETFD, FD_CLOEXEC) = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(6110), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = 0
open("/u01/PRODERM/fs2/FMW_Home/webtier/instances/EBS_web_OHS2/config/OPMN/opmn/.formfactor", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0400, st_size=27, ...}) = 0

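and we also had the following. The F_SETLKW call is the interesting one: it is a blocking request for a write lock on a file, which is the kind of call AcceptMutex fcntl relies on to lock the LockFile, whereas F_SETFD with FD_CLOEXEC only sets the close-on-exec flag: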

fcntl(5, F_SETFD, FD_CLOEXEC) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(6110), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)

fcntl(22, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGHUP {si_signo=SIGHUP, si_code=SI_TKILL, si_pid=28993, si_uid=54321} ---

Some additional info:

The AcceptMutex directive sets the method that Apache uses to serialize multiple children accepting requests on network sockets.
sysvsem: Uses SysV-style semaphores to implement the mutex.
fcntl: Uses the fcntl system call to lock the file defined by the LockFile directive.

One final note on this issue:

Maybe (I didn't test it) the Linux semaphores will need to be cleaned up manually if the HTTP Server crashes abnormally. If such a crash happens, you can use ipcs -a to see them and ipcrm -s to remove them. (But again, "maybe". I didn't test it, just saying.)
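
If you ever need to do that cleanup, something like the following could help pick out the right semaphore IDs. Again, this is an untested-in-production sketch: sem_ids_for_owner is a hypothetical helper of my own, and it assumes the usual Linux "ipcs -s" column layout (key, semid, owner, perms, nsems).

```shell
# sem_ids_for_owner OWNER: read "ipcs -s" output on stdin and print the
# semid of every semaphore array owned by OWNER. Header and separator
# lines are skipped because their second column is not a number.
sem_ids_for_owner() {
    awk -v owner="$1" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example usage (applprod is the applications OS user in this post):
#   ipcs -s | sem_ids_for_owner applprod
# Then, only after confirming that OHS is really down on that node:
#   ipcrm -s <semid>
```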
