I want to share the solution we implemented for a recent issue in an Oracle Linux KVM environment. The SSL certificates of this environment were misconfigured: a manual renewal of the certificates had been attempted, and the problem arose after that.
To quickly summarize the issue: KVM hosts were appearing in Down status within the OLVM (Oracle Linux Virtualization Manager) interface. Consequently, VM information and metadata were inaccessible. During our diagnostic work, we identified issues related to the hosts' SSL certificates. The libvirtd and vdsmd services on kvmhost2 failed to start, reporting "Authentication failed" and SASL errors. We also noted that the host kernel version was outdated and that the system had been up for 1877 days without a restart.
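When services fail with messages like these, it can help to narrow the journal down to the lines that mention SASL, authentication, or certificates. The following is only a sketch of that idea: filter_auth_errors is a hypothetical helper name, and the search pattern is an assumption based on the errors described above, not an official list.

```shell
# Keep only log lines that mention SASL, authentication failures, or certificates.
# Reads log text on stdin and writes the matching lines to stdout.
filter_auth_errors() {
    # grep exits non-zero when nothing matches; treat that as "no findings".
    grep -Ei 'sasl|authentication failed|certificate' || true
}

# Example usage on a KVM host (the time window is an arbitrary choice):
#   journalctl -u libvirtd -u vdsmd --since "1 hour ago" --no-pager | filter_auth_errors
```

An empty result does not prove the services are healthy; it only means none of these keywords showed up in the selected time window.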
We followed the MOS note OLVM: How to Renew SSL Certificates that are Expired or Nearing Expiration (Doc ID 3006292.1), but the OlvmKvmCerts.sh script was missing. So we created an SR and got the script from Oracle Support. After that, the steps to the solution were as follows:
We renewed the certificates using the OlvmKvmCerts.sh script (OlvmKvmCerts.sh renew-all), executed on the OLVM node.
kvmhost1 returned to an "Up" status in OLVM immediately following the certificate renewal.
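One way to double-check a renewal on a host is to look at the certificate's expiry date with openssl. The following is a minimal sketch and not part of the MOS note: check_cert is a hypothetical helper, and the certificate path in the usage comment is an assumption based on the usual VDSM layout, so adjust it to your environment.

```shell
# Report whether a PEM certificate is missing, close to expiry, or fine.
# Usage: check_cert <path-to-pem> [days]   (days defaults to 30)
check_cert() {
    cert=$1
    days=${2:-30}
    if [ ! -f "$cert" ]; then
        echo "MISSING $cert"
        return 2
    fi
    # -checkend takes seconds and exits non-zero if the certificate expires
    # within that window (or if the file cannot be parsed at all).
    if openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null 2>&1; then
        echo "OK $cert ($(openssl x509 -in "$cert" -noout -enddate))"
    else
        echo "EXPIRING_OR_INVALID $cert"
        return 1
    fi
}

# Example usage on a KVM host (path is an assumption, verify it first):
#   check_cert /etc/pki/vdsm/certs/vdsmcert.pem 30
```

The return code makes the helper usable in a monitoring job: 0 means the certificate is valid beyond the window, 1 means it needs attention, and 2 means the file was not found.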
To resolve the kvmhost2 issue, we executed the vdsm-tool configure --force command on kvmhost2 to fix the persistent configuration problems. However, the command failed:
Checking configuration status...
SUCCESS: ssl configured to true. No conflicts
Running configure...
Error: ServiceOperationError: _systemctlStart failed
Job for libvirtd.service failed.
At this point, we manually restarted the following services: libvirtd, mom-vdsm, vdsmd, and supervdsmd (with a command like: systemctl restart libvirtd mom-vdsm vdsmd supervdsmd). In some cases, ovirt-engine may need a restart as well (on the engine: systemctl restart ovirt-engine).
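The restart step can also be wrapped so that each service is verified afterwards. This is only a sketch under a few assumptions: restart_and_verify and the SYSTEMCTL override variable are hypothetical names introduced here, and the services are restarted one by one rather than in a single systemctl call as we did.

```shell
# Allow the systemctl binary to be overridden, e.g. for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Restart the given services in order, then report which ones are active.
# Returns 0 only if every service is active afterwards.
restart_and_verify() {
    inactive=""
    for svc in "$@"; do
        # A failed restart is caught by the is-active check below.
        "$SYSTEMCTL" restart "$svc" || true
    done
    for svc in "$@"; do
        if "$SYSTEMCTL" is-active --quiet "$svc"; then
            echo "active: $svc"
        else
            echo "NOT active: $svc"
            inactive="$inactive $svc"
        fi
    done
    [ -z "$inactive" ]
}

# Example usage on the KVM host:
#   restart_and_verify libvirtd mom-vdsm vdsmd supervdsmd
```

Checking the state only after all restarts avoids reporting a service as down while a dependent unit is still coming up.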
After this restart, the issue was resolved for kvmhost2 too. All statuses were confirmed as "Success". We also executed OlvmKvmCerts.sh again (just in case) for kvmhost2, and this time it completed successfully. This was just a check to make sure we are on the safe side for a future certificate renewal.
Note that ovirt-log-collector helped a lot in diagnosing the issue.
Some references:
OLVM: OlvmKvmCerts - Script to Check or Renew Hypervisor Certificates (Doc ID 3008653.1)
OLVM: How to Renew SSL Certificates that are Expired or Nearing Expiration (Doc ID 3006292.1)