Saturday, August 4, 2018

Oracle VM Server -- Guest VM in blocked state, VM console connection(VNC), Linux boot (init=/bin/bash)

This is an interesting one.. It involves an interesting way of booting Linux, dealing with Oracle Vm Server and its Hypervisor.

Last week, after a power failure in a critical datacenter, one of the production EBS application nodes couldn't be started.. That EBS application node was a VM running on a Oracle VM Server, and although Oracle VM Server could be started without any problem, that application node couldn't.

As, I like to administrate Oracle VM Server using xm commands, I directly jumped into the Oracle VM Server by connecting it using ssh (as root).

The repositories were there.. They were all accessable and xm list command were showing that EBS node, but its state was "b".. (blocked)

I did restart the EBS Guest VM couple of times, but it didn't help.. The EBS Guest VM was going into the blocked state just after starting.

Customer was afraid of it, as the status "blocked" didn't sound good...

However; the fact was that, it was normal for a Guest VM to be in blocked status if it doesn't do anything or let's say if it has nothing actively running on CPU.

This fact made me think that there should be problem during the boot process of this EBS Guest VM.

The OS installed on this VM was Oracle Linux, and I thought that, probably, Oracle Linux wasn't doing anything during its boot process.. Maybe it was asking something during its boot, or maybe it was waiting for an input..

In order to understand that, we needed to have a console connection to this EBS Guest VM..

To have a console connection, I modified the vm.cfg of this EBS Guest VM -- actually added VNC specific parameters to it.

Note that, in Oracle VM Server we can use VNC to connect to the Guest machines even during their boot process.

After modifying the vm.cfg file of the EBS Guest VM, I restarted this guest machine using xm commands and directly connnected to its console using VNC.

I started to watch the Linux boot process of this EBS Guest VM and then I saw it stopped..
It stopped, because it was reporting a filesystem corruption and asking us to run fsck manually..

So far so good.. It was as I expected..

The Oracle Linux normally was asking for the root password to be able to give us a terminal for running fsck manually. However; we just couldn't get the password.

So we were stuck..

We tried to ignore the fsck message of Oracle Linux, but then it couldn't boot..

We needed find a way.

At that time, I put my Linux admin hat on , and did the following;

During the boot, I opened the GRUB(GRand Unified Bootloader) menu. (bootloader)
Selected the appropriate boot entry (uek kernel in our case) in the GRUB menu and pressed e to edit the line.
Selected the kernel line and pressed e again to edit it.
Appended init=/bin/bash at the end of line.
Booted it.

By using the init=/bin/bash, I basically told the Linux kernel to run /bin/bash as init, rather than the system init.

As you may guess, by using init=/bin/bash, I booted the Linux and obtained a terminal without supplying the root password.

After this point, running fsck was a piece of cake :)

So I executed fsck for the root filesystem and actually for the other ones also.. Repaired all of them and rebooted the Linux once again..

This time, Linux OS of that virtualized EBS application node booted perfectly and the EBS application services on it could be started without any problems..

It was a stressful work but it made me have this interesting story :)

No comments :

Post a Comment

If you will ask a question, please don't comment here..

For your questions, please create an issue into my forum.

Forum Link: http://ermanarslan.blogspot.com.tr/p/forum.html

Register and create an issue in the related category.
I will support you from there.