Friday, November 7, 2014

Linux -- Setting the hostname: FQDN or short? A detailed approach, plus a look from the EBS perspective

EBS Application Tier processes running on Linux may run into problems when the hostname of the operating system is set incorrectly, so the hostname we set for Linux must be appropriate.
Why appropriate? Because FNDLIBR uses the hostname it gets from the kernel.
Let's use the strace utility to see what the process does when our start script executes FNDLIBR:
strace FNDLIBR FND FNDCPBWV apps/apps SYSADMIN 'System Administrator' SYSADMIN
Okay, I will not copy and paste the entire trace here, but the obvious thing is that FNDLIBR calls uname and gets the hostname from it:
uname({sys="Linux", node="ermanhost.domain.com", ...}) = 0
Note that the uname command also gets the hostname through the uname system call. On success, the call returns zero.


The hostname comes from the structure below:
struct utsname {
    char sysname[];    /* Operating system name (e.g., "Linux") */
    char nodename[];   /* Name within "some implementation-defined
                          network" */
    char release[];    /* Operating system release (e.g., "2.6.28") */
    char version[];    /* Operating system version */
    char machine[];    /* Hardware identifier */
#ifdef _GNU_SOURCE
    char domainname[]; /* NIS or YP domain name */
#endif
};


So, using uname, FNDLIBR obtains the hostname from the kernel.
To demonstrate, I'll write a little C program and run it under strace.
Here is our program, which calls uname and displays fields of the returned struct:

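#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;

    /* uname() fills u from the kernel; it returns 0 on success */
    if (uname(&u) != 0) {
        perror("uname");
        return 1;
    }
    printf("%s release %s (version %s) on %s\n", u.sysname, u.release, u.version, u.machine);
    return 0;
}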

We compile it;
gcc /tmp/ourprogram.c

We run it, and it displays the following:
./a.out
Linux release 2.6.32-100.26.2.el5 (version #1 SMP Tue Jan 18 20:11:49 EST 2011) on x86_64

When we trace it using strace:
strace ./a.out
execve("./a.out", ["./a.out"], [/* 22 vars */]) = 0
brk(0)                                  = 0xd2d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb55dc02000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb55dc01000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=107884, ...}) = 0
mmap(NULL, 107884, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb55dbe6000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332a\0335\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722304, ...}) = 0
mmap(0x351b600000, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x351b600000
mprotect(0x351b74e000, 2097152, PROT_NONE) = 0
mmap(0x351b94e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x351b94e000
mmap(0x351b953000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x351b953000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb55dbe5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb55dbe4000
arch_prctl(ARCH_SET_FS, 0x7fb55dbe46e0) = 0
mprotect(0x351b94e000, 16384, PROT_READ) = 0
mprotect(0x351b41b000, 4096, PROT_READ) = 0
munmap(0x7fb55dbe6000, 107884)          = 0
uname({sys="Linux", node="ermanhost.domain.com", ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb55dc00000
write(1, "Linux release 2.6.32-100.26.2.el"..., 90Linux release 2.6.32-100.26.2.el5 (version #1 SMP Tue Jan 18 20:11:49 EST 2011) on x86_64
) = 90
exit_group(0)  

We see that this program calls the uname syscall, which returns the same thing as the uname call in the FNDLIBR trace:

uname({sys="Linux", node="ermanhost.domain.com", ...}) = 0

So we can say that, at the lowest level, FNDLIBR retrieves the hostname the same way our little program does.

Note that the hostname is available in the proc file system too:
cat /proc/sys/kernel/hostname
ermanhost.domain.com
The hostname command can also be used to display the hostname:
hostname -f
ermanhost.domain.com
hostname
ermanhost.domain.com

But wait... hostname -f and hostname return the same thing on this system. The -f argument is used to display the FQDN, but what about the output of hostname with no arguments?

Actually, they shouldn't return the same thing, because:
hostname prints the name of the system as returned by the gethostname function.
The FQDN is the name gethostbyname returns for the host name returned by gethostname.

So on this system gethostname and gethostbyname return the same thing: the FQDN.
In other words, on this system the hostname comes back as the FQDN everywhere.

Let's see what these gethostname and gethostbyname functions are.

int gethostname(char *name, size_t len);
gethostname() returns the null-terminated hostname in the character array name, which has a length of len bytes. If the null-terminated hostname is too large to fit, then the name is truncated, and no error is returned. POSIX.1-2001 says that if such truncation occurs, then it is unspecified whether the returned buffer includes a terminating null byte.

struct hostent *gethostbyname(const char *name);
The gethostbyname() function returns a structure of type hostent for the given host name. Here name is either a hostname, or an IPv4 address in standard dot notation. If name is an IPv4 or IPv6 address, no lookup is performed and gethostbyname() simply copies name into the h_name field and its struct in_addr equivalent into the h_addr_list[0] field of the returned hostent structure...
Basically, gethostbyname returns the FQDN (the official name in h_name) for the host name returned by gethostname.
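To make the two steps concrete, here is a small sketch of the same idea: ask the kernel for the name with gethostname, then ask the resolver for the canonical name of that name. Note that I use getaddrinfo with AI_CANONNAME here instead of gethostbyname, because gethostbyname is marked obsolete; this is a stand-in for illustration, not a claim about what the hostname binary calls internally. The helper short_name_length is a made-up name.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: number of characters before the first dot,
   i.e. the length of the short form of a host name. */
size_t short_name_length(const char *name)
{
    assert(name != NULL);
    return strcspn(name, ".");
}

int main(void)
{
    char host[256];
    struct addrinfo hints, *res = NULL;
    int rc;

    /* Step 1: the name the kernel holds (what plain "hostname" prints). */
    if (gethostname(host, sizeof host) != 0) {
        perror("gethostname");
        return 1;
    }
    host[sizeof host - 1] = '\0';   /* termination is not guaranteed on truncation */
    printf("gethostname(): %s\n", host);
    printf("short form   : %.*s\n", (int)short_name_length(host), host);

    /* Step 2: ask the resolver (nsswitch.conf, /etc/hosts, DNS) for the
       canonical name of that host name. */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        printf("canonical    : (lookup failed: %s)\n", gai_strerror(rc));
        return 0;
    }
    printf("canonical    : %s\n", res->ai_canonname ? res->ai_canonname : "(none)");
    freeaddrinfo(res);
    return 0;
}
```

On a system where the kernel name is already the FQDN, the first and last lines would be expected to match, which is exactly the situation described above.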

When we trace the commands hostname and hostname -f, we see that "hostname -f" reads the nsswitch.conf and host.conf files, so it is network aware:

read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1698
open("/etc/host.conf", O_RDONLY) ;

The Name Service Switch (NSS) configuration file, /etc/nsswitch.conf, is used by the GNU C Library to determine the sources from which to obtain name-service information in a range of categories, and in what order.

When we open /etc/nsswitch.conf, we see the following line:
hosts:      files  dns

So nsswitch.conf says that the system first tries to resolve host names and IP addresses from files (/etc/hosts for the hosts database) and, if that fails, queries a DNS server.

So gethostbyname, which is used by the hostname -f command, reads /etc/nsswitch.conf and /etc/host.conf to decide whether to look the name up in /etc/hosts or through DNS.

Note that we have the fully qualified hostname defined in /etc/sysconfig/network:
cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=ermanhost.domain.com

A lot of lines in the start scripts use this file:
cd /etc/rc.d
grep -R /etc/sysconfig/network *|wc -l
317

When we open /etc/host.conf, we see the following line:

order hosts,bind

This means: "first use /etc/hosts to resolve the hostname; if you can't find it there, then try a DNS query".

So nsswitch.conf and host.conf say pretty much the same thing here. Why do we have both of them?
It seems to be because the older Linux standard library, libc, used /etc/host.conf as its master configuration file, while the newer GNU standard library, glibc, uses /etc/nsswitch.conf.

The interesting thing is that hostname -s, which returns the short name of the server, also reads /etc/nsswitch.conf and /etc/host.conf, and it returns the short hostname as expected.

So what do we have so far?

hostname -s returns ermanhost (GOOD)
hostname -f returns ermanhost.domain.com (GOOD)
hostname returns ermanhost.domain.com (BAD)

The hostname command should not return ermanhost.domain.com (the FQDN) when it is called without any arguments.
It uses gethostname as expected, but it should not return the FQDN.

As I mentioned above, this might be related to HOSTNAME being defined as an FQDN in /etc/sysconfig/network.

Let's explore this.

When we execute the hostname command, the hostname is taken directly from uname (without going through nsswitch.conf or host.conf), and uname takes it from the kernel structure. So we need to know:

What sets this hostname to the FQDN in the structure?
When is it set?
Which configuration file is used by the code that sets the hostname to the FQDN in the kernel structure? (I suspect it is /etc/sysconfig/network, by the way.)

So we need to have a look at the boot process of Red Hat based Linux, and here it is.

I'm not going to start from the BIOS :)
Here is the info we need:
When the init command starts, it becomes the parent or grandparent of all of the processes that start up automatically on the system. First, it runs the /etc/rc.d/rc.sysinit script....

When we look at the startup scripts, we see the following lines in the /etc/rc.d/rc.sysinit file:

if [ -f /etc/sysconfig/network ]; then
    . /etc/sysconfig/network
fi

We also see the following lines;

# Set the hostname.
update_boot_stage RChostname
action $"Setting hostname ${HOSTNAME}: " hostname ${HOSTNAME}

So we have found the command that sets the hostname.
It basically gets the hostname from the /etc/sysconfig/network file and sets the hostname accordingly.

So far so good. We know what we need to know about setting and getting the hostname in Linux.

Let's summarize the gathered info, add our comments, and describe the best practice for setting hostnames in Linux:
  • HOSTNAME in /etc/sysconfig/network should be the machine name, not the FQDN. 'hostname' should ideally just return the actual hostname.
  • /etc/resolv.conf must be properly configured for searching the domain.
  • /etc/hosts must be properly configured to contain both the FQDN and the machine name.
DEMO:

Wrong setting:
[root@ermanhost ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ermanhost.ermandomain.com
[root@ermanhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain
10.34.50.104 ermanhost.ermandomain.com ermanhost
[root@ermanhost ~]# hostname -s
ermanhost
[root@ermanhost ~]# hostname -f
ermanhost.ermandomain.com
[root@ermanhost ~]# hostname
ermanhost.ermandomain.com --> this shouldn't be the FQDN

Good setting:

[root@ermanhost ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ermanhost
[root@ermanhost ~]# hostname
ermanhost                    -->>   GOOD
[root@ermanhost ~]# hostname -f
ermanhost.ermandomain.com
[root@ermanhost ~]# hostname -s
ermanhost

Note that if you change /etc/sysconfig/network without rebooting the server, the system will still use the old hostname. To change the hostname on a running Linux system, issue the hostname "newname" command, and then update /etc/sysconfig/network with the new hostname so that the change is permanent across reboots.


Now let's see the effect of a wrong hostname setting in EBS.
Note that this applies to 11i, R12 and 12.2.

It is documented in the following note:

Concurrent Managers Fail To Start After New Install of Release 12 (Doc ID 413164.1)
Basically, when we try to start the concurrent managers on a machine with a long hostname, we end up with the following error and the concurrent managers can't start.

ERROR
APP-FND-01564: ORACLE error 12899 in insert_icm_record
Cause: insert_icm_record failed due to ORA-12899: value too large for column "APPLSYS"."FND_CONCURRENT_PROCESSES"."NODE_NAME" (actual: 31, maximum: 30)

This is because the NODE_NAME column in the FND_CONCURRENT_PROCESSES table is VARCHAR2(30).
So we need to use a hostname that is at most 30 characters long.

But what if we need a long fully qualified domain name?
Then we need to apply the approach I described above (the good setting): keep the short machine name as the hostname and let the FQDN come from name resolution.

That's all I have to say about this topic.

Okay. In this article, we have learned the logic behind the hostname setting in Linux. We have worked with strace, written a C program to get the hostname from the kernel structure, reviewed the boot process of Linux, stated the proper setting for the hostname, and lastly seen this information in action on an Oracle EBS problem.

I hope you'll find it useful.
