Thursday, April 10, 2014

Linux -- Oracle/Redhat Linux 6.2 Network Manager problems, device state change: 8 -> 3

This week, we faced a strange problem on one of our Oracle Linux 6.2 servers.
There was a problem in the Linux network layer, as our ping requests were cut off from time to time.
/var/log/messages showed the following messages:

Apr  7 11:35:55 otmprodoic NetworkManager[2044]: <info> Activation (eth0) successful, device activated.
Apr  7 11:35:55 otmprodoic NetworkManager[2044]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.
Apr  7 11:35:56 otmprodoic dnsmasq[2782]: reading /etc/resolv.conf
Apr  7 11:35:56 otmprodoic dnsmasq[2782]: using nameserver 10.10.10.18#53
Apr  7 11:35:56 otmprodoic dnsmasq[2782]: using nameserver 10.10.10.7#53
Apr  7 11:35:53 ermansrv NetworkManager[2044]: <info> (eth0): device state change: 8 -> 3 (reason 39)
Apr  7 11:35:53 ermansrv NetworkManager[2044]: <info> (eth0): deactivating device (reason: 39).

We disabled IPv6 right away, as we suspected it, but that didn't help.

For the analysis, I gathered the following information and then turned to Google to look for bug records.

Device state 3 means "disconnected".
Reason 39 seems to mean that a deactivation was requested, maybe from a GUI, from the ifup/ifdown commands, or from something similar.
So why does Network Manager take such an action? I mean, why does it deactivate the device?
I looked at what was logged just before the device was deactivated and saw that it was only dnsmasq reading /etc/resolv.conf, which is the file that contains the DNS server information in Linux. So the 3 lines above the errors seemed to be unrelated to the error.
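
To make such log lines easier to read, the state and reason codes can be translated with a small script. This is only a sketch: the names state_name, reason_name and decode_line are made up for illustration, the meanings of 3 and 39 are the ones given above, and the meaning of state 8 ("activated") is my assumption based on the "device activated" message that precedes the state change in the log.

```shell
#!/bin/sh
# Decode NetworkManager "device state change" lines from a syslog file.
# Only the codes discussed in this post are mapped; anything else is
# printed as a raw number.

state_name() {
    case "$1" in
        3) echo "disconnected" ;;
        8) echo "activated" ;;    # assumption, not confirmed by the post
        *) echo "state-$1" ;;
    esac
}

reason_name() {
    case "$1" in
        39) echo "deactivation requested" ;;
        *)  echo "reason-$1" ;;
    esac
}

decode_line() {
    # Pull "iface old new reason" out of one log line; print nothing if
    # the line is not a state-change message.
    fields=$(printf '%s\n' "$1" | sed -n \
        's/.*(\([^)]*\)): device state change: \([0-9][0-9]*\) -> \([0-9][0-9]*\) (reason \([0-9][0-9]*\)).*/\1 \2 \3 \4/p')
    [ -n "$fields" ] || return 0
    set -- $fields
    printf '%s: %s -> %s (%s)\n' \
        "$1" "$(state_name "$2")" "$(state_name "$3")" "$(reason_name "$4")"
}

# If a readable log file is given as the first argument, decode every line,
# e.g.  sh nm-decode.sh /var/log/messages
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    while IFS= read -r line; do
        decode_line "$line"
    done < "$1"
fi
```

For the state-change line in the log above, decode_line prints "eth0: activated -> disconnected (deactivation requested)"; lines that are not state changes are skipped.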

Anyway, our eth0 device was becoming disconnected, and as a result, our TCP packets were being dropped from time to time.
After analyzing and searching a little bit, I found the following bug reported in the Red Hat Bugzilla:
Bug 590704 - intermittent connection caused by Networkmanager
It was actually for Fedora, but I thought it would apply in our case too, because the problematic component was Network Manager.

About Network Manager: Red Hat initiated a NetworkManager project in 2004 with the goal of enabling Linux users to deal more easily with modern networking needs, particularly wireless networking. NetworkManager takes an opportunistic approach to network selection, attempting to use the best available connection as outages occur, or as the user roams between wireless networks. It prefers Ethernet connections over “known” wireless networks, which are preferred over wireless networks with SSIDs to which the user has never connected.

So, because there is a similar bug report for Network Manager in Fedora, because the messages were coming from Network Manager, and lastly because of the meaning of reason 39 reported in the log message, I disabled Network Manager, manually configured the eth devices through their ifcfg-eth files, and now manage them using the ifup/ifdown commands.
These actions saved the day. The problem has not appeared since.
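
For reference, a manually configured interface file could look roughly like the sketch below. It assumes a static address and the usual location /etc/sysconfig/network-scripts/ifcfg-eth0. The IPADDR, NETMASK and GATEWAY values are placeholders, not the real values of our server; the DNS entries are the nameservers seen in the log above.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0  (illustrative values only)
DEVICE=eth0
ONBOOT=yes              # bring the interface up at boot
BOOTPROTO=none          # static configuration, no DHCP
NM_CONTROLLED=no        # keep Network Manager away from this device
IPADDR=192.0.2.10       # placeholder address
NETMASK=255.255.255.0   # placeholder netmask
GATEWAY=192.0.2.1       # placeholder gateway
DNS1=10.10.10.18
DNS2=10.10.10.7
```

With such a file in place, the interface is brought down and up with "ifdown eth0" and "ifup eth0".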

Disabling Network Manager:
chkconfig NetworkManager off
service NetworkManager stop
service network restart

Of course, the disadvantage of disabling Network Manager is losing the ability to use a GUI for configuring the network interfaces.
