Saturday, January 17, 2026

Exadata X11M / X11M-Z / Naming, Elasticity, RoCE and all that

I recently dived into a new project with the newest Exadata(s) inside :) Today I want to share some quick info about the newest Exadata naming, based on the elasticity we now have. Then I will remind you of the power of Exadata by revisiting things like the game changer, RDMA, and the mantra: "The fastest I/O is the I/O that never happens."

Okay. First the naming and the elasticity. These days we see Exadata X11M-Z systems in the field. For years we dealt with Eighth Rack configurations which, let's be honest, were in earlier generations often just quarter racks with some cores and disks software-locked. (Note that this is true for X4 and X5. But! From X7 onwards, Eighth Rack database servers have one of the CPUs physically removed, so for them it is not just a software lock.)

Anyway, all Exadata X11M systems are now offered as flexible configurations, starting with at least two database servers and three storage servers. So we can start with an economical and flexible configuration using Exadata X11M-Z database and storage servers, and later, if needed, expand the system by adding more X11M-Z servers or larger X11M servers. As a result, the fixed Eighth Rack configuration is not required and is no longer available. So it is more elastic now. Of course you can still build something like an eighth rack or a quarter rack. But! Know that the names quarter/half/full are now simply nicknames describing how full the cabinet is, not official product codes.

However, if you check the data sheet, you may still see a quarter rack configuration. But! When you check it carefully, you see it listed in two different ways:

Standard Quarter Rack, and Z-Series Quarter Rack.

In other words, Quarter Rack has become a term that refers to the minimum occupancy rate inside the cabinet, rather than a hardware model.

Okay, so far so good..

Alright, let's dive deep into the internals, focusing on the fast OLTP capabilities and the magic of RDMA and Persistent Memory. 

You may already know the mantra: "The fastest I/O is the I/O that never happens." But what if you have to do I/O? What if every microsecond counts? This is where Exadata isn't just fast; it’s architecturally brilliant, especially for OLTP workloads.

We often talk about Smart Scan and its analytical powers, offloading terabytes of data processing to storage. But for OLTP, where individual transactions need sub-millisecond response times, a different set of Exadata features comes into play: RDMA and the intelligent use of Persistent Memory (PMEM).

Most of the time, we see traditional storage systems fall short. Think about a typical OLTP transaction: a single row update or a quick lookup by primary key. These tasks are like searching for a needle in a haystack. Actually, sometimes that phrase is not enough to describe how hard it is. Let's say we are looking for a particular needle in a stack of needles :)

In a traditional architecture, even for a single block read, we (in the background) issue an I/O request through the operating system, and then we wait for the storage controller to process it. The data travels over the SAN. Then our database server's OS receives it, handling interrupts and context switches along the way. Finally, we get our data in our Oracle database instance.

Each of these steps adds latency. Across thousands of transactions, these delays (although they are tiny individually) create significant bottlenecks. This is where Exadata's features come into play.
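To make the "small delays add up" point concrete, here is a minimal sketch in Python. The step names follow the path described above, but the microsecond values are made-up placeholders chosen only for the arithmetic; they are not measurements of any real system, and the function names are hypothetical.

```python
from typing import Mapping

# Hypothetical per-step latencies (microseconds) for ONE single-block read
# on a conventional path. Placeholder values, NOT measurements.
TRADITIONAL_PATH_US = {
    "OS I/O request": 10,
    "storage controller processing": 100,
    "SAN transfer": 50,
    "interrupts and context switches": 15,
}


def total_latency_us(steps: Mapping[str, float]) -> float:
    """Sum the latency of every step on the path for one read."""
    return sum(steps.values())


def serial_wait_ms(steps: Mapping[str, float], reads: int) -> float:
    """Total wait in milliseconds if `reads` single-block reads run one after another."""
    return total_latency_us(steps) * reads / 1000.0
```

With these placeholder numbers one read costs 175 microseconds, and 10,000 reads issued back to back cost 1.75 seconds of pure waiting. Removing or shrinking any single step shifts the whole total, which is why the path itself matters so much for OLTP.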

With RDMA (Remote DMA / Remote Direct Memory Access), what happens is that we bypass the kernel and go directly to the memory...

So, Exadata completely re-engineers this critical path for OLTP I/O. Instead of the database server's CPU and OS being involved in every single I/O operation, Exadata leverages RDMA over the RoCE (RDMA over Converged Ethernet) interconnect.

With RDMA, the database server can directly access memory on the Exadata Storage Cells, and it gets the data it needs with only minimal involvement from the Storage Cell's CPU or OS.

So, basically it bypasses the kernel, which means less context switching and fewer interrupts.

Okay. Let's visit the subject of Persistent Memory (PMEM). Exadata Storage Servers in the X8M and X9M generations come equipped with PMEM, a technology that sits between DRAM and Flash. (From X10M onwards, so on X11M too, this tier is called Exadata RDMA Memory, XRMEM, and is backed by DRAM on the storage servers. The RDMA read path described here is the same idea.)
PMEM is fast like DRAM but persistent like Flash. Exadata intelligently caches frequently accessed OLTP data blocks in this memory tier. When the database server needs a block that's in this cache on a Storage Cell, it uses RDMA to read it directly. You see the benefit, right? The power of Exadata is not only due to having fast storage; it's intelligent, again! -> software and hardware engineered to work together!

In this blog post, I focused only on some of the key features and put them forward in the OLTP context, but I think we all already know the other well-known features like Smart Scan and Storage Index.
These are still there, making Exadata one of the world's fastest database machines for hybrid workloads, as it is capable of handling both OLTP and OLAP simultaneously without compromise.

Understanding these underlying mechanisms always excites me, and it is not just academic; it's crucial for delivering real-world results.

Until next time, keep optimizing, keep questioning, and keep digging into those internals.

Feel free to share your thoughts here and your questions on my forum: http://ermanarslan.blogspot.com.tr/p/forum.html

Thursday, January 8, 2026

OLVM connectivity issue due to hosts' SSL Certificates, KVM hosts were appearing in Down status & info about the script: OlvmKvmCerts.sh

I want to share the solution that we implemented to fix a recent issue. It was about Oracle Linux KVM, and the SSL certificates of this problematic KVM environment were misconfigured. Someone had attempted to renew the certificates manually, and the problem arose after that.

To quickly summarize the issue: KVM hosts were appearing in Down status within the OLVM (Oracle Linux Virtualization Manager) interface. Consequently, VM information and metadata were inaccessible.

During our diagnostic work, we identified issues related to the hosts' SSL Certificates. The libvirtd and vdsmd services on kvmhost2 failed to start, reporting "Authentication failed" and SASL errors. We also noted that the host kernel version was outdated and the system had an uptime of 1877 days without a restart.
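As a side note for this kind of diagnosis, here is a minimal sketch in Python for judging how close a certificate is to expiring. It is not part of the Oracle procedure or of OlvmKvmCerts.sh. It assumes you already have the notAfter timestamp as text, for example from `openssl x509 -noout -enddate -in <certificate file>` (the part after `notAfter=`); the function names and the 30-day warning threshold are hypothetical choices for illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` until the certificate's notAfter time.

    `not_after` uses the OpenSSL text form, e.g. "Jan 31 00:00:00 2026 GMT".
    A negative result means the certificate has already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400.0


def classify(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Label a certificate as 'expired', 'expiring soon' or 'ok'.

    `warn_days` is an arbitrary illustrative threshold, not an Oracle value.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring soon"
    return "ok"
```

For a live check you would pass `datetime.now(timezone.utc)` as `now`; keeping it a parameter makes the helper easy to test with fixed dates.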

We followed the MOS Note - OLVM: How to Renew SSL Certificates that are Expired or Nearing Expiration (Doc ID 3006292.1), but! the OlvmKvmCerts.sh script was missing. So we created an SR and got the script from Oracle Support. After that, the steps to the solution were as follows:

We renewed the certificates using the OlvmKvmCerts.sh script. (OlvmKvmCerts.sh renew-all) -- executed on OLVM node.

kvmhost1 returned to an "Up" status in OLVM -- immediately following the certificate renewal. 

To resolve the kvmhost2 issue, we executed the vdsm-tool configure --force command on kvmhost2 to clear the persistent configuration problems. However, the output was not that good:

Checking configuration status...
SUCCESS: ssl configured to true. No conflicts
Running configure...
Error: ServiceOperationError: _systemctlStart failed
Job for libvirtd.service failed.

At this point, we manually (re)started the following services: libvirtd, mom-vdsm, vdsmd, and supervdsmd (via a command like: systemctl restart libvirtd mom-vdsm vdsmd supervdsmd) -- in some cases the ovirt-engine may need a restart as well (on the engine: systemctl restart ovirt-engine).

After this restart, the issue was resolved for kvmhost2 too. All statuses were confirmed as "Success". We also executed OlvmKvmCerts.sh again (just in case) for kvmhost2, and this time it completed successfully. This was just a check to ensure that we will be on the safe side for a possible future certificate renewal.
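If you want to re-check the services after such a restart without eyeballing each one, a minimal sketch in Python could look like the following. It only wraps `systemctl is-active --quiet <unit>`, which exits with 0 when the unit is active. The unit list is the one from this post; the function names are hypothetical and nothing here comes from Oracle's tooling.

```python
import subprocess
from typing import Callable, Iterable, List

# Units restarted in this post; adjust the list for your own hosts.
KVM_HOST_UNITS = ["libvirtd", "mom-vdsm", "vdsmd", "supervdsmd"]


def systemctl_is_active(unit: str) -> bool:
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,  # a non-zero exit just means "not active", not an error
    )
    return result.returncode == 0


def units_not_active(
    units: Iterable[str],
    is_active: Callable[[str], bool] = systemctl_is_active,
) -> List[str]:
    """Return the units that are not active, in the order given."""
    return [unit for unit in units if not is_active(unit)]
```

Running `units_not_active(KVM_HOST_UNITS)` on the KVM host should return an empty list when all four services are up. The `is_active` parameter exists only so the logic can be exercised without a real systemd.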

Note that ovirt-log-collector helped a lot in diagnosing the issue.

Some references:

OLVM: OlvmKvmCerts - Script to Check or Renew Hypervisor Certificates (Doc ID 3008653.1)
OLVM: How to Renew SSL Certificates that are Expired or Nearing Expiration (Doc ID 3006292.1)