Monday, December 8, 2025

Jagent / Monitoring Agent 12.2.1.2 & Oracle Goldengate 21.3 / java.lang.IllegalArgumentException: source parameter must not be null

Are you using Oracle GoldenGate 21.3, the classic one (not the microservices architecture based one), and do you want to monitor GoldenGate's activities?

You installed an Oracle Enterprise Manager, deployed the Goldengate Plugin, installed Jagent / Monitoring Agent 12.2.1.2 into the targets and configured them. 

You saw PMSRVR and JAGENT processes in GGSCI outputs, you started them, and they were in RUNNING status, and you told yourself: okay, so far so good.
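
For reference, a quick way to confirm that state is GGSCI's INFO ALL command. A rough sketch of what you would expect to see is below (the group name and lag figures are illustrative only, not from a real environment):

GGSCI> INFO ALL

Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
PMSRVR      RUNNING
EXTRACT     RUNNING     EXT1        00:00:00      00:00:05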

Then, you used Oracle Enterprise Manager's auto discovery to discover the monitoring agents, using the relevant information like the port, the host, and so on.

Oracle Enterprise Manager didn't report any errors and the discovery completed successfully, but nothing changed: the GoldenGate Monitoring Agents could not be discovered.

Then you jumped onto the servers where you installed those agents, checked the listening port with netstat, and everything seemed fine.
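
A minimal version of that check, with 5559 used purely as an example port (use whatever port your Monitoring Agent is actually configured to listen on):

netstat -an | grep 5559 | grep LISTEN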

However, when you checked the logs of those agents, you saw something like the following.

Could not get WSInstance Information , [[
com.goldengate.monitor.model.AgentCommunicationException: Failed to get Registry.
at com.goldengate.monitor.jagent.comm.ws.ManagerWSApi.getInstanceInfo(ManagerWSApi.java:239)
at com.goldengate.monitor.jagent.comm.ws.ManagerWSAdapter.getInstanceInfoNative(ManagerWSAdapter.java:105)
at com.goldengate.monitor.jagent.comm.impl.ManagerFacadeImpl.fillInstanceMap(ManagerFacadeImpl.java:259)
at com.goldengate.monitor.jagent.comm.impl.ManagerFacadeImpl.getInstances(ManagerFacadeImpl.java:253)
at com.goldengate.monitor.jagent.jmx.MBeansContainerImpl.createMBeans(MBeansContainerImpl.java:344)
at com.goldengate.monitor.jagent.jmx.MBeansContainerImpl$1.run(MBeansContainerImpl.java:836)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: source parameter must not be null
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:119)
at com.sun.xml.internal.ws.message.AbstractMessageImpl.readPayloadAsJAXB(AbstractMessageImpl.java:135)
at com.sun.xml.internal.ws.api.message.MessageWrapper.readPayloadAsJAXB(MessageWrapper.java:171)
at com.sun.xml.internal.ws.client.dispatch.JAXBDispatch.toReturnValue(JAXBDispatch.java:88)
at com.sun.xml.internal.ws.client.dispatch.DispatchImpl.doInvoke(DispatchImpl.java:274)
at com.sun.xml.internal.ws.client.dispatch.DispatchImpl.invoke(DispatchImpl.java:289)
at com.goldengate.monitor.jagent.comm.ws.ManagerService.getRegistry(ManagerService.java:267)
at com.goldengate.monitor.jagent.comm.ws.ManagerWSApi.getInstanceInfo(ManagerWSApi.java:230)

Then you thought the delete/create datastore commands would help, but you got invalid-command errors in GGSCI (because those commands were already deprecated in the OGG version you were using).

Is this what happened?

Well.. Here is the cure for you ->  Patch 31997138 - Oracle GoldenGate Monitor 12.2.1.2.210930 (PS2 BP8) (Server+Agent) 

That patch will solve it.

I will give the same message again: 

Never take any initiative. Take all the steps documented in Oracle documentation.
Follow the steps in the documentation exactly 
Follow the correct documentation! For this case it is ->  "Doc ID 2314622.1 How To Enable Monitoring For GoldenGate 12.3.x Targets Using Oracle Enterprise Manager 13c R2+"
 
That's the tip of the day..

Monday, November 24, 2025

Oracle GoldenGate and ODI Working Together for Near-Real Time Data Ingestion for DWH Projects

We're all chasing the same goal: near-real-time data. You simply cannot run a modern business, especially in the Financial Services sector, with batch processes that deliver data hours late. That's where Change Data Capture (CDC) shines, and in our recent DWH project we deployed the absolute best, Oracle GoldenGate (OGG), especially for replicating data from Oracle to Oracle.

Let's be clear about what CDC is. It is a vital mechanism to capture only the data that has changed, ensuring minimal load on the source database. For a Data Warehouse, this is the gold standard for data ingestion and even lightweight transformations. Of course, we're not talking about querying the whole table every few minutes; we're talking about direct, surgical capture from the transaction logs (redo logs and archived logs in Oracle's case).

Oracle GoldenGate is not just a CDC tool; it's a real-time replication solution, and dedicated to that. For an Oracle-to-Oracle environment, it’s arguably the best in class.

Some background info about the key capabilities of Oracle GoldenGate:

Goldengate ensures continuous data synchronization and maintains high availability. If a process fails, it has the ability to resume operation after the failure.
While best with Oracle, it can handle replication across diverse environments, including PostgreSQL, MS SQL, and MySQL.
By using Goldengate, we can transform data before it hits the target, a key feature for ensuring data quality early.
The architecture supports multi-thread and/or multi-process/multi-target setups, which is critical for handling massive volumes of transactional data.

From now on, I will use the OGG acronym for GoldenGate. Just noting that here.

The beauty of OGG lies in its streamlined and robust process architecture.
Let's give some information about the processes of OGG, and how we used them in our recent project..

Extract: This is the capture agent. In our project, we chose the Integrated Extract. This process is configured on the source database and is responsible for capturing changes from the source database, writing all captured DML operations to a local trail file. We chose Integrated Extract because it uses Oracle server-side internal operations to get the changes, and it automatically scales with the CPU for better performance and high throughput.
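
For illustration, a stripped-down Integrated Extract parameter file looks roughly like this (the group, credential alias, schema and trail names are made up for the example; real files naturally carry more options):

EXTRACT ext1
USERIDALIAS ogg_src
EXTTRAIL ./dirdat/lt
TABLE SRC_SCHEMA.*;

The group is typically created and registered with GGSCI commands such as ADD EXTRACT ext1, INTEGRATED TRANLOG, BEGIN NOW and REGISTER EXTRACT ext1 DATABASE; the registration is what makes the capture "integrated", with log mining happening inside the database.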

Pump: In our recent project, a Pump process reads the local trail files and securely sends them over the network to the target OGG environment. This process acts as a high-speed data distributor. (Note that this path can be secured by enabling TLS/SSL.)
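
Again purely as an illustration, a minimal Pump parameter file along those lines (host, port and trail names are hypothetical; PASSTHRU is typical when no transformation is done in the pump, and 7809 is just the common Manager port default):

EXTRACT pmp1
RMTHOST target-host, MGRPORT 7809
RMTTRAIL ./dirdat/rt
PASSTHRU
TABLE SRC_SCHEMA.*;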

Replicat: This is the apply agent, configured on the target database. It reads the remote trail files and applies the changes to the target table(s). We use parameters like ASSUMETARGETDEFS to map columns automatically and even UPDATEDELETES to treat incoming DELETE operations as UPDATEs, which is crucial for DWH history tracking.
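
A simplified Replicat parameter file showing those two parameters (group, alias and schema names are just examples):

REPLICAT rep1
USERIDALIAS ogg_tgt
ASSUMETARGETDEFS
UPDATEDELETES
MAP SRC_SCHEMA.*, TARGET ODS_SCHEMA.*;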

Note that this structure was successfully deployed on a massive scale, including capture processes for a 1 TB table with a billion rows (X_TABLE).

OGG has light transformation capabilities, so at some point a crucial handover might take place. In our case, in a recent DWH project, that handover was from OGG to ODI for data conversion.

This is where the entire CDC flow comes together for our DWH project. OGG's job is to efficiently and reliably get the changed transactional data from the critical core PROD / source system into a stable staging layer, which we call the ODS (Operational Data Store).

The flow is explicit.

Source -> ODS

Once the data hits the ODS, OGG's primary task is done. Of course, we do some lightweight transformation here; we used SQLEXEC configurations in OGG and enriched the replicated data with the operation type and timestamp, so that the change history of the records can be maintained in the target / ODS & DWH. But the heavy transformation was in the next stage: the complex, business-rule-heavy transformations necessary to structure the data for the final DWH targets.
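
The project-specific SQLEXEC details aside, a common, generic way to stamp each replicated row with the operation type and commit timestamp is through the Replicat MAP clause with GGHEADER tokens. A minimal sketch, with hypothetical schema and column names:

MAP SRC_SCHEMA.ORDERS, TARGET ODS_SCHEMA.ORDERS,
  COLMAP (USEDEFAULTS,
          OP_TYPE = @GETENV ('GGHEADER', 'OPTYPE'),
          OP_TS   = @GETENV ('GGHEADER', 'COMMITTIMESTAMP'));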

This is the point of handover to Oracle Data Integrator (ODI).

ODS -> ODS2 / DWH

ODI then uses its trigger-based mechanism, which is a CDC-like design: it is triggered by the changed records and reads them from the ODS. ODI's job is to manage the parsing and the heavy transformation logic.

We are talking about journalizing here. Journalizing in ODI is a change data capture mechanism that tracks inserts, updates, and deletes in source tables, allowing downstream processes to load only the changed data.
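
To make the journalizing idea concrete: ODI's simple (trigger-based) JKMs essentially maintain a journal table plus a trigger per journalized source table. The sketch below is a heavily simplified illustration of that idea, not the exact DDL an ODI JKM generates (table, column and subscriber names are hypothetical):

-- Journal table holding the keys of changed rows
CREATE TABLE J$ORDERS (
  JRN_SUBSCRIBER VARCHAR2(400) NOT NULL,  -- which consumer the change is for
  JRN_FLAG       VARCHAR2(1)   NOT NULL,  -- I = insert/update, D = delete
  JRN_DATE       DATE          NOT NULL,  -- when the change was recorded
  ORDER_ID       NUMBER        NOT NULL   -- PK of the journalized table
);

-- Trigger recording every change into the journal
CREATE OR REPLACE TRIGGER T$ORDERS
AFTER INSERT OR UPDATE OR DELETE ON ORDERS
FOR EACH ROW
DECLARE
  v_flag VARCHAR2(1);
  v_id   ORDERS.ORDER_ID%TYPE;
BEGIN
  IF DELETING THEN
    v_flag := 'D';
    v_id   := :OLD.ORDER_ID;
  ELSE
    v_flag := 'I';
    v_id   := :NEW.ORDER_ID;
  END IF;
  INSERT INTO J$ORDERS (JRN_SUBSCRIBER, JRN_FLAG, JRN_DATE, ORDER_ID)
  VALUES ('DWH', v_flag, SYSDATE, v_id);
END;
/

Downstream ODI interfaces then read only the keys recorded in the journal, join back to the source table, and process just the changed rows.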

So OGG captures the changed data and writes it to the target; there, ODI picks up the written data, is triggered by it, and continues processing.

So, OGG for high-speed, real-time capture and replication into a staging area, and ODI for complex, CDC-aware transformations into the final structure. In the end, we achieve an architecture that is both highly efficient and massively scalable.

This is how you build a modern DWH fit for today's needs. Real-time DWH and Data Lakehouse(s) are one step ahead. In order to have a real-time DWH, all data processing logic should be done in a single streaming layer, and all data should be considered a stream. Next time, we will take a look at those things as well.

That's it for today. I hope you'll find this blogpost helpful.

Friday, November 7, 2025

Oracle Linux KVM vs. Red Hat KVM for Oracle Products

A recent discussion on my forum highlighted the importance of KVM selection for Oracle Customers.

A member of my forum created a thread and mentioned that they are moving their critical E-Business Suite (EBS) system from physical hardware to a virtual environment. They see KVM as the key to proper, aligned licensing (due to its hard partitioning and CPU pinning capabilities).

So far so good.. But which KVM?

They were using Red Hat Enterprise Linux (RHEL), and they know that KVM is baked into RHEL. So, they were planning to install RHEL, run KVM, and use that to host their Oracle database and EBS. They knew that Oracle Products (database and EBS in this case)  were certified with RHEL. 

However, there was an important distinction there. The operating system running the application is one thing; the operating system running the hypervisor, which defines your license boundary, is another. So we must differentiate...

Oracle's policy on virtualization is clear, technical, and brutally enforced. For you to claim Hard Partitioning (the ability to restrict your Oracle license count to just the cores assigned to the VM), you must use a technology that Oracle explicitly approves.

Oracle has been very specific. You cannot rely on a non-Oracle KVM vendor's implementation of CPU pinning for licensing purposes. 

Yes, Oracle products will run on RHEL KVM, but it is important to note the following ->

"Oracle Products are not certified to run on virtual machines/guests provided by Xen or KVM offerings by Red Hat, Novell, SUSE, Ubuntu, Citrix, Nutanix, or XenSource."

So, you may have issues if you go with Red Hat and KVM, and if any issues arise, you'll be the one trying to solve them (Red Hat may support you, but that depends on the issue type).

That means, if you run Oracle software on a VM hosted by Red Hat KVM, even if you technically pin the CPUs, there is still a support risk and a licensing risk!

Support Risk / Not Certified: If the issue is related to the virtualization layer, Oracle Support can, and likely will, ask you to reproduce the issue on a supported platform. In such a case, you may be the one trying to solve complex kernel-level issues.

Licensing Risk: The license auditor will consider this Soft Partitioning. This means you are liable for licensing the entire physical server's capacity, regardless of your CPU pinning efforts. The cost savings you planned for are gone.

Note: in my opinion, there is no difference, binary-wise, between running Oracle Linux with the Red Hat Compatible Kernel and running Red Hat Enterprise Linux. So it is better to go with Oracle Linux in that sense as well.

That means the only way to use KVM and confidently argue for core-level licensing is to use Oracle Linux KVM.  That is the platform specifically engineered and approved to meet the hard partitioning criteria for Oracle licensing.

Here is that thread about this topic: http://erman-arslan-s-oracle-forum.124.s1.nabble.com/Migration-of-physical-to-vm-licensing-td13040.html

In summary;

Reference MOS Document ID 417770.1 -> Oracle products are not certified to run on guest virtual machines provided by Red Hat's KVM offering.

You can use Red Hat KVM, but you may have issues with Oracle Support (if the need arises), plus you may have license / CPU core alignment issues...

Use Oracle Linux KVM..

Finally, here is what Oracle says: "Only Oracle Linux KVM can be hard partitioned. No other Linux KVM can be hard partitioned because the use of Oracle software is required to implement and monitor the cpu pinning. No other Linux distribution includes the required olvm_vmcontrol utility. The OLKVM hard partitioning CFD says "The olvm-vmcontrol utility is required to set and get the CPU/vCPU bindings for a virtual machine running on Oracle Linux KVM through the Oracle Linux Virtualization Manager."

That's it. Tip of the day:)

Oracle Linux Boot Failure: A GLIBC Version Mismatch / 'GLIBC_2.33' not found

In this blog post I share a real production incident and its resolution. While the issue was severe, proper troubleshooting methodology and rescue media made recovery possible without data loss.

An Oracle Linux 8.10 production server suddenly became unresponsive. The system would boot but freeze indefinitely at the graphical login screen, showing only the Oracle Linux logo with a loading spinner that never completed.

No amount of waiting helped. The system was completely inaccessible through normal means. SSH connections timed out, and the console remained locked at the authentication screen.

Our initial discovery was through emergency shell access. To diagnose the issue, we bypassed the normal boot process using GRUB emergency parameters:

# At GRUB menu, press 'e' on the kernel line

# Add to the linux/linuxefi line:

rw init=/bin/bash

 Once we gained shell access, the true nature of the problem became apparent immediately:

[root@myprodddb01 /]# rpm --version

rpm: /lib64/libc.so.6: version 'GLIBC_2.33' not found (required by /lib64/libcrypto.so.1.1)

[root@myprodddb01 /]# yum

yum: /lib64/libc.so.6: version 'GLIBC_2.33' not found (required by /lib64/libcrypto.so.1.1)

[root@myprodddb01 /]# dnf

dnf: /lib64/libc.so.6: version 'GLIBC_2.33' not found (required by /lib64/libcrypto.so.1.1)

Every foundational system command was broken. This was not a simple misconfiguration, this was a fundamental system library corruption.

'GLIBC_2.33' not found was the error. Well, let's first take a look at GLIBC.

GLIBC (GNU C Library) is the core system library that nearly every Linux program depends on. It provides essential functions for:

  • Memory management
  • System calls
  • String operations
  • File I/O
  • Network operations

Without a working GLIBC, the system cannot function.

That's enough background.

So, Oracle Linux 8.10 ships with GLIBC 2.28. However, our system's binaries were looking for GLIBC 2.33 and 2.34, which are part of Oracle Linux 9 (based on RHEL 9).

[root@myprodddb01 /]# /lib64/libc.so.6 --version

GNU C Library (GNU libc) stable release version 2.28

The library version was correct (2.28), but the programs themselves (rpm, yum, ping, dnf) were looking for libraries from Oracle Linux 9.
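
As a side note, you can see exactly which GLIBC versions a binary demands by dumping its dynamic symbols. A minimal example, to be run from a working environment (for instance, later, from the rescue shell) against the suspect binary; the path simply assumes rpm's default location:

objdump -T /usr/bin/rpm | grep GLIBC_

On a healthy OL8 system this should show nothing newer than GLIBC_2.28; GLIBC_2.33 / GLIBC_2.34 entries betray EL9-built binaries.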

How did this happen? In our case this is not certain yet, but we have some clues and here is the list of possible causes for such a catastrophic situation:

  • Mixed package repositories - OL9 repos were accidentally added to an OL8 system
  • Manual binary replacement - Someone copied binaries from an OL9 system
  • Failed upgrade attempt - An attempt to upgrade from OL8 to OL9 went wrong
  • Incorrect package installation - Installing .el9 RPMs on an .el8 system

Anyway, to fix the system, we needed rpm to reinstall packages. But rpm itself was broken because it needed GLIBC 2.33. We couldn't use yum or dnf for the same reason. Even basic networking tools like ping were non-functional. The broken system could not fix itself.

The solution was in rescue mode recovery.

We booted from the Oracle Linux 8 ISO and entered the rescue environment. The rescue environment automatically detected and mounted our system under /mnt/sysimage, and it provided working tools with the correct GLIBC 2.28. (We later also mounted the root LV under /mnt/sysroot, which is the path used in the commands below.)

sh-4.4# /lib64/libc.so.6 --version

GNU C Library (GNU libc) stable release version 2.28
Copyright (C) 2018 Free Software Foundation, Inc.

sh-4.4# lsblk

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sda 8:0 0 80G 0 disk

├─sda1 8:1 0 1G 0 part

├─sda2 8:2 0 1G 0 part

└─sda3 8:3 0 70G 0 part

└─root--vg-root--lv 253:0 0 70G 0 lvm /

sh-4.4# mount /dev/mapper/root--vg-root--lv /mnt/sysroot
 

We then identified the corrupted packages using the rescue environment's working rpm:

sh-4.4# rpm --root=/mnt/sysroot -qa | grep "\.el9"

This command listed all Oracle Linux 9 packages installed on our OL8 system.

And we copied the GLIBC 2.28 libraries (and libcrypto) from the rescue environment to our broken system.

cp -fv /lib64/libc-2.28.so /mnt/sysroot/lib64/

cp -fv /lib64/libc.so.6 /mnt/sysroot/lib64/

cp -fv /lib64/libm*.so* /mnt/sysroot/lib64/

cp -fv /lib64/libpthread*.so* /mnt/sysroot/lib64/

 cp /lib64/libcrypto.so.1.1.1k /mnt/sysroot/lib64/

After these actions, we chrooted into the system to verify, and we tested the foundational commands; they all ran successfully.

chroot /mnt/sysroot

rpm --version        

ping -c 2 8.8.8.8   

yum --version 

We verified the GLIBC version.

rpm -q glibc

-- Expected: glibc-2.28-251.0.1.el8.x86_64

We rebooted and tested.

exit

sync

reboot

This fixed the issue. During the emergency shell access, we had also reset the root password:

In emergency mode (init=/bin/bash);

mount -o remount,rw /

passwd root

# Enter new password

sync

reboot

Well, we fixed the issue and learned a few important things.
  • Never install packages from OL9 (or any EL9) on an OL8 system, even if they seem compatible. Major version boundaries exist for a reason.
  • Always verify yum repository configuration.
  • Before installing RPMs, always check the version.
  • Keep Rescue Media Ready ( to save time and be ready).
  • Take snapshots before: System upgrades, Repository changes, Mass package installations.
  • Monitor Package Origins : Set up alerts for package installations from unexpected repositories.

VIRTUALIZATION MYTHS for Oracle: Migrating Physical to VM and Why Your Licensing Math is Broken

A system (this is for Oracle Products, Oracle Database especially) running happily on a dedicated physical machine needs to be migrated to a virtual environment. Simple, right? The technical migration is the easy part. The licensing, the support, the certification? That’s where the confusion, and the cost, begins.  

The migration team's optimism is usually about keeping the same licensing cost in the target VM environment. So at first glance, they say the Oracle licensing cost should remain the same: flexibility will increase and the cost stays flat. That is also the goal, actually.

But what I say is: you are optimizing for flexibility while ignoring the fine print. Oracle's licensing is not based on what the VM is limited to, but on what the VM is capable of accessing.

If you are using VMware, Microsoft Hyper-V, or any non-Oracle virtualization platform that does not employ "hard partitioning" (and they typically do not by default, or let's say Oracle doesn't accept their partitioning capability as hard partitioning), your licensing burden is likely to increase dramatically. You are forced to license the entire underlying physical capacity of the host cluster that the VM can potentially migrate to.

Think about it: Your small, 8-vCPU VM is sitting on a cluster with four physical hosts, each having 32 cores. The total physical core count is 4 X 32 = 128 cores. Apply the standard Oracle core factor (say, 0.5 for Intel/AMD). You might suddenly be on the hook for 128 X 0.5 = 64 licenses, not the 8 you planned for.

The cost savings you anticipated from virtualization are instantly wiped out. Isolation is the key, and non-Oracle VM solutions struggle to provide the licensing-level isolation we need.

The Oracle Solution

This is where Oracle-developed solutions come into play: Oracle VM (OVM) or Oracle Linux KVM. These platforms are explicitly designed to support hard partitioning. With KVM's CPU pinning, for example, you can logically and technically restrict a VM to a subset of the physical cores, and Oracle accepts this as the boundary for licensing.

You may say: we already invested heavily in VMware infrastructure! We can't switch now.

I say: that is a classic case of operational convenience trumping financial risk.

Is the cost of maintaining the VMware environment worth the exponential risk of an Oracle audit that demands licensing for the entire cluster? This is a question you have to ask yourselves.

For large environments, the license cost penalty of non-Oracle virtualization can easily justify the expense of migrating to an Oracle-sanctioned hard-partitioned platform. 
The technical solution is easy; the financial one requires difficult choices.

Respect the licensing rulebook. It is often more complex, and more expensive, than the technical architecture.

That’s enough for today. Check your contracts before you virtualize.

EBS / CPU PATCHES STRIKE AGAIN: The Forms FRM-92101 Error and the Hidden URL Rewriting Trick - disabled URL rewriting

You care about system security. You apply the Critical Patch Updates (CPU), you run the adop cycles, and the logs all shout "Success." Then you try to open a standard Oracle E-Business Suite (EBS) form, and BAM: FRM-92101. The system is shouting at you, and your users are waiting.

A recent case on my forum perfectly illustrates this predictable chaos. The user did all the right things—applied multiple AD/TXK updates, patched the database tier, and rolled through the full stack of CPU and other CVEs. Yet, the system broke.

Forms session <12> aborted: runtime process failed during startup with errors Unable to switch to Working Directory: <path>/fs1/FMW_Home/Oracle_EBS-app1/forms

The DBA's initial reflex in these situations is: "I applied the patches. Now I get FRM-92101: Unable to switch to Working Directory." This is usually a fundamental filesystem or configuration issue: wrong path, wrong permissions, or the active FS is missing files. "I ran adadmin, I checked the basic Oracle docs."

But! We weren't here to guess. If the Forms Server fails, we need to check the foundation. I needed the necessary outputs, and I needed confirmation that the pre- and post-patching actions were actually successful (not just that the adop phase finished). In EBS 12.2, we needed to check both filesystems (FS1 and FS2).

1) So don't you have <FMW_HOME>/Oracle_EBS-app1 directory and <FMW_HOME>/Oracle_EBS-app1/forms directory on PROD?

2)What about TEST, do you have "<FMW_HOME>/Oracle_EBS-app1/forms" ?

3)both on PROD and TEST do the following and send me the following command outputs;

cd to <FMW_HOME>/Oracle_EBS-app1;
run pwd command
run ls -al command
then cd to forms
run pwd command
run ls -al command

4) Did you successfully apply the patches? Did you successfully execute the pre and post actions of the patches? Have you checked and are you sure of that? What document did you follow, and what actions did you take? Anything missing there?

5) Also check that you have that forms directory on both of the fs(s), fs1 and fs2. Maybe your active filesystem doesn't have that directory, but it is available in the patch directory. Check that as well. If so, we may take actions accordingly.
What is your run fs and what is your patch fs? State that as well and check accordingly.

While going forward with this issue, my follower performed a rollback and re-applied the patch(es), and then the error disappeared. But! This time a new error appeared: "FRM-92101". Although they followed the MOS Doc "Getting FRM-92102 Errors when Accessing a Forms Module on a Cloned Environment (Doc ID 2237225.1)", nothing changed; the issue persisted.

->
So, as far as I see, you re-applied the patch(es), got errors, fixed them, and made the patching process / scripts continue from where they left off. You say you applied the patches successfully, then.
So your final error is FRM-92101, but this time it is for a failure in the Forms server during startup.

Check ->

E-Business Suite 12.2 Forms Error 'FRM-92101' Connection from new browser window not supported if using custom jars not added in userjarfile (Doc ID 2142147.1)
R12.2 FRM-92101 Error When Opening Multiple Forms in Separate IE11 Browser (Doc ID 2109411.1)
FRM-92101 There Was A Failure In The Forms Server During Startup (Doc ID 2594880.1)
FRM-92101: There Was A Failure In The Forms Server During Startup. This Could Happen Due To Invalid Configuration When Opening Forms (Doc ID 1931394.1)

Check the following notes carefully, consider implementing the appropriate one(s) and update me with the outcome.


Anyway, the real root cause wasn't the Forms Server configuration. It wasn't the filesystem.
It was the very security patch intended to help: the CPU (and subsequent patches).

The issue was that the patch introduced code that disabled URL rewriting. This is a subtle, almost silent change, but it breaks the fundamental way the application handles Forms sessions, especially when using Remote Desktop Protocol (RDP).

The solution was a single workaround from Oracle: Re-enable URL rewriting. 

This critical piece of information was buried in an appendix of the CPU Availability Document (Doc ID 3090429.1). 

The Oracle E-Business Suite Release 12.2 April 2025 CPU introduced code that disabled URL rewriting which may result in FRM-92102 or FRM-92101 errors if you are using remote desktop protocol (RDP). Use the following steps to enable URL rewriting as a workaround until a more permanent solution is available. -> You must dig to find the actual fix.

The Lesson: When security updates are applied, the side effects are often more brutal than the initial vulnerability. You fix one hole and create two new compliance/functional nightmares. Respect the process, check the full stack, and know that sometimes the root cause is a single, seemingly minor change introduced by the very patch you thought was the solution.

That’s enough for today. The details matter.

Monday, October 27, 2025

OCI / Crushing the Memoryless Barrier with Oracle Cloud Multi-Modal AI

While pursuing topics like Markov's Ghost (Andrey Markov, the Russian mathematician) and the Non-Markovian Leap, and doing my own research into Indivisible Stochastic Quantum Mechanics, a POC that we made recently in OCI made me feel compelled to write this blog post.

Let's not waste any time. The core engine of every single piece of Artificial Intelligence—from the simplest classifier to the big Large Language Models (LLMs)—is prediction.

And where does this power stem from? Fundamentally, from the legacy of Andrey Markov.

Markov showed us, over a century ago, how to calculate the probability of the next state based on the current one. The Markov Chain is a foundational, elegant beast. But it's limited by a critical, crippling constraint: it's memoryless. The next step depends only on the immediate preceding step.

Now.. Let’s be brutally honest. Is a modern LLM (like GPT, Grok, or any Transformer-based model) a Markov Chain? Conceptually, yes, it’s a probabilistic prediction machine. But technically, no. And that distinction is everything.

The Transformer’s Attention mechanism takes the memoryless constraint away. It allows the model to absorb a vast, complex history—the schema of a database, an entire paragraph, a detailed prompt, and fuse it all into one rich, non-local, highly intelligent current state for the next token prediction. It’s a predictive model that has become Non-Markovian by learning long-range dependencies.

This leap is the technical advantage we leverage to build solutions that actually work in the Enterprise environments. And, when you pair this power with a platform built for enterprise scale like Oracle Cloud Infrastructure (OCI), you get results that make data experts happy.

Well, let's make the intellectual connection...

As an Oracle ACE Pro, I’m always focused on building solutions that leverage OCI’s native power. We set out to engineer a single, unified Flask API that applies this advanced predictive intelligence across two critical modalities: voice and text. The goal was to allow users to interact with an Oracle Autonomous Database, but with simple human language.

Here is the high-level architecture, the predictive loop we built on OCI;

The SQL Generation, the Prediction Engine is where the magic happens. Here we turn unstructured human thought into structured, executable code.

A raw LLM is useless without context, so before a question hits the Grok model, our system dynamically extracts the table names, columns, and foreign keys from the Oracle Autonomous Database (23ai). This schema metadata becomes part of the prompt; it defines the model's current, powerful predictive state.

The OCI Grok model uses this context to predict and generate perfect, Oracle-compatible SQL.

The API executes the generated SQL and returns the data, bypassing the need for a developer to write any code. You just ask, and the database answers.

We also implemented a speech-to-Text capability. To make the application truly multi-modal, we needed to handle voice commands. 

In this context, the client sends base64-encoded audio. The Flask API takes this and uploads it immediately to OCI Object Storage. There, in OCI, the core transcription job is initiated using the OCI AI Speech service.
Since transcription is asynchronous, the API polls the OCI service continuously until the job successfully completes and the predicted text is ready.

So, we constructed a complete, enterprise-grade AI solution in under 500 lines of Python, and we built it by leveraging the power of OCI's integrated services.
This project isn’t just about chaining together APIs; it’s about architecting a unified system where OCI Generative AI, AI Speech, Autonomous Database, and Object Storage work as a seamless whole.


We moved beyond the limitations of classical predictive models. We used OCI to crush the memory barrier and build a true intelligent assistant. 

Erman Arslan (Oracle ACE Pro & Data Engineer)

Saturday, May 31, 2025

Creating Database Domain on SuperCluster and Installing Cluster Database with OEDA

Today, we will take a look at the Supercluster side of things. That is, setting up a new Database Domain on Oracle Super Cluster M8 and performing a Cluster Database installation using the Oracle Exadata Deployment Assistant (OEDA) tool.  

SuperCluster is already in end-of-support status, but we still see it hosting critical environments. Of course, it won't stay like this, and SuperCluster customers will probably replace their SuperClusters with PCA(s) and Exadata(s); but for now, out of respect for what SuperCluster has contributed so far, today's blog post will be about SuperCluster.



This isn't just another generic guide; I'm going to systematically walk through the steps, highlighting critical details, especially around configuring the infrastructure. I will also share the steps you absolutely need to skip. Consider this your high level go-to reference for similar installations.




1. Creating a New Oracle Database Domain via the I/O Domains tab (we do this on both of the nodes)

First things first, let's get our new Database Domain up and running on the Super Cluster.
Open the Super Cluster Virtual Assistant screen.


Navigate to the I/O Domains tab on the navigation panel.
Click the Add button to create a new domain.
Input all the necessary parameters for each domain, including CPU, memory, and network settings.
 
 
2. Database Configuration with OEDA

Now that our domains are ready, let's get OEDA involved. We know OEDA from the Exadata environments, but we see it in Super Cluster as well. 

2.1. OEDA Preparations

OEDA helps you with the prerequisites too.
Launch the Oracle Exadata Deployment Assistant (OEDA) tool.
Select the two newly created database domains and perform the JOC File Export operation. This action will generate an XML configuration file containing all the domain-related information.
 
2.2. Obtaining DNS and Installation Files

Refer to the installation template generated by OEDA:
APPENDIX A: DNS requirements
APPENDIX B: Files to be used for installation
Prepare these files and place them in the appropriate directories.

2.3. Placing Installation Files

Keep your OEDA directory structure tidy
Copy the installation files specified in APPENDIX B into the WorkDir folder within your OEDA directory structure.
 
2.4. SSH Requirement

This is a crucial step.
Since we're installing on SuperCluster, passwordless SSH connectivity must be configured over the ZFS rpool for both database domains.
Both Grid and Database software will be installed directly on ZFS.
 
3. OEDA Installation Commands

Once everything is set up, it's time to run the OEDA commands on the respective domains:

The following command lists all the installation steps.

install.sh -cf xml_file -l (character l)

The following command validates the configuration.

install.sh -cf xml_file -s 1 (number 1)

If the validation is successful, the following steps are executed sequentially:

install.sh -cf xml_file -s 2
install.sh -cf xml_file -s 3 
install.sh -cf xml_file -s 4

4. Steps That Must NOT Be Executed

IMPORTANT: Since there are already other database domains running on the system, the following steps "MUST NOT" be executed. Failing to skip these can lead to data loss or system instability for existing domains! ->

Step 5: Calibrate Cells
Step 6: Create Cell Disks
Step 17: Resecure Machine

5. Installation Step List (Overview)

Here’s a quick overview of the OEDA installation steps:

Validate Configuration File

Setup Required Files

Create Users

Setup Cell Connectivity

Calibrate Cells (SKIP THIS!)
Create Cell Disks (SKIP THIS!)

Create Grid Disks

Install Cluster Software

Initialize Cluster Software

Install Database Software

Relink Database with RDS

Create ASM Diskgroups

Create Databases

Apply Security Fixes

Install Exachk

Create Installation Summary

Resecure Machine (SKIP THIS!)

6. Completing the Installation

Once you’ve followed all the steps above, the installation for the new database environment (GRID /RAC + RDBMS installed) in your Super Cluster environment should be successfully completed. Always remember to perform system tests and verify access to finalize the installation.

7. Known Issues
 
Before starting the OEDA installation, since the installation will be on the Super Cluster IO Database Domain Global zone, passwordless SSH settings must be configured between the ZFS storage and the IO Domains. 

The /u01 directory, where the installation will take place, resides on ZFS.

During OEDA installation, if there are other IO database domains on the Super Cluster system, it's critically important not to run the OEDA Create Cell Disk step. Otherwise, other IO domains will be affected, potentially leading to data loss.
 
Before the Grid installation, passwordless SSH access must be configured between the two nodes for the users under which the Grid and Oracle software will be installed.

That's all for today. I hope this walk through helps you navigate your Super Cluster installations with more confidence. Happy super clustering! :)

Friday, May 16, 2025

ODA -- odacli command Issue after implementing SSL: A Real SR Process in the Shadow of Missing Steps -- Lessons Learned & Takeaways

Enhancing security in Oracle Database Appliance (ODA) environments through SSL (Secure Socket Layer) configurations can ripple across various system components. Changing certificates, transforming the SSL configuration to a more secure one (with more secure and trusted certificates) can be a little tricky. However, the path to resolving issues encountered during these processes isn't always found in the documentation.

In this post, I will share a real Oracle Service Request (SR) journey around this subject. I will try to share both the technical side of things and those undocumented steps we had to follow.

The Symptom: Silence from odacli

After implementing the SSL configuration (renewing the default SSL certificates of the DCS agent and DCS controller with the customer's certificates) on ODA, we hit a wall: the odacli commands simply refused to work. For instance, when we tried to run odacli list-vms, we got the following cryptic message:

DCS-12015: Could not find the user credentials in the DCS agent wallet. Could not find credential for key:xxxx

This clearly pointed to a problem with the DCS Agent wallet lacking the necessary user credentials. Despite following the configuration guides, odacli failed, and the DCS Agent felt completely out of reach.
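
As a quick sanity check at this stage, you can list the entries that actually exist in the DCS CLI wallet. A hedged example, using the same mkstore binary and wallet path that appear later in this post (it will prompt for the wallet password):

/u01/app/19.23.0.0/grid/bin/mkstore -wrl /opt/oracle/dcs/dcscli/dcscli_wallet -list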

Initial Moves: Sticking to the Script (Official Oracle Docs)

Oracle's official documentation laid out a seemingly straightforward path:

Configure SSL settings within the dcs yml file(s).
Restart DCS.
Update CLI certificates and dcscli configuration files.

We did all this. Every step was executed properly. Yet the problem persisted; odacli continued to encounter errors.

The Real Culprit: A Missing Step, An Undocumented Must-Do

Despite the seemingly correct configurations, our back-and-forth with the Oracle support engineer through the SR revealed a critical piece of the puzzle – a step absent from any official documentation:

We get the ODACLIMTLSPASSWORD by running the following command;

/u01/app/19.23.0.0/grid/bin/mkstore \
  -wrl /opt/oracle/dcs/dcscli/dcscli_wallet \
  -viewEntry DCSCLI_CREDENTIAL_MAP@#3#@ODACLIMTLSPASSWORD

We take the password from the output of the command above and use it to change the password of /opt/oracle/dcs/dcscli/dcs-ca-certs (the custom keystore; note that we get the password related to DCSCLI_CREDENTIAL_MAP).

/opt/oracle/dcs/java/1.8.0_411/bin/keytool -storepasswd -keystore /opt/oracle/dcs/dcscli/dcs-ca-certs

We update the conf file with the ODACLIMTLSPASSWORD entries.

These two files : /opt/oracle/dcs/dcscli/dcscli.conf and /opt/oracle/dcs/dcscli/dcscli-adm.conf

The following line: 

TrustStorePasswordKey=ODACLIMTLSPASSWORD

So we do something like a mapping of the wallet and the keystore passwords using the ODACLIMTLSPASSWORD.

Skip these, and even with a perfectly configured agent, odacli commands will fail because they can't access the necessary credentials.

Live Intervention and Breakthrough

During a screen-sharing session with the Oracle engineers via Zoom, we went through the following:
Re-verified and, where needed, reconfigured the dcs yml file(s).
Ensured the wallet entry was correctly added.
Executed the crucial mkstore and dcscli commands (above) 
Restarted both the Agent and CLI services.

After these, commands like odacli list-jobs and odacli list-vms started working flawlessly. 

This SR journey left us with some significant takeaways:

"Official documentation may not be always the full story." Some critical steps, like the mkstore credential mapping, might only surface through the SR process itself.

"Configuration details demand absolute precision." File names, paths, and alias definitions in Oracle configurations must be an exact match. Even a minor deviation during the adaptation of Oracle's example configurations to your environment can bring the system down.

"Configuration Files are as Crucial as Logs in Support Requests". Attaching the actual configuration files to your SR significantly accelerates the troubleshooting process for Oracle engineers.

Lessons Learned:
  • Documentation Gaps: Document the steps learned from SRs in the internal technical notes.
  • The processes behind enhancing security in Oracle environments may extend beyond the confines of official documentation. This experience wasn't just about resolving a technical problem; it was a valuable lesson in enterprise knowledge management. If you find yourself facing a similar situation, remember to explore beyond the documented steps – and make sure those learnings from SRs find their way into your internal knowledge base.

Wednesday, May 7, 2025

RAC -- Importance of pingtarget in virtualized environments & DCS-10001:Internal error in ODA DB System Creation

Recently, we struggled with an issue in a mission-critical environment. The issue was relocating VIPs. It started all of a sudden, and diagnostics indicated some kind of network problem.

The issue was related to failed pings. Oracle's pingtarget concept was on the stage and, for justified reasons, was causing VIPs to fail over to the secondary node of the RAC.

Some background information about ping target: delivered with 12c (12.1.0.2), it is useful and relevant in virtualized environments. It is there to detect and take action in cases where network failures are not recognized in the guest VMs. It is related to the public network only, since private networks already have their own heartbeat check mechanisms designed with care. So basically, if the target IP(s) cannot be pinged from a RAC node, or if there is a significant delay in those pings, the VIPs are failed over to the secondary node(s). The parameter is set via the srvctl modify nodeapps -pingtarget command, as shown below.
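
For reference, setting it looks roughly like this. The addresses are placeholders, and the exact list syntax should be verified against the srvctl documentation for your version; srvctl config nodeapps shows the value that is currently in effect:

srvctl modify nodeapps -pingtarget "192.0.2.1,192.0.2.2"
srvctl config nodeapps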

Well.. This is a feature developed with the logic that "if the relevant node cannot reach the ping targets, then there is a network problem between this node and the public network, namely the clients, and this means the clients cannot access the DBs on this node; if so, let's fail over the VIPs and save the situation."

It seems innocent since it has nothing to do with the interconnect, but actually it is vital. VIP transfer(s) etc. are happening according to this routine.

In our case, a switch problem caused everything. The default gateway was set to the firewall's ip address and the responses of the firewall to ping(s) were sometimes mixed up. 

We were lucky that the ping target parameter could be set to more than one IP (fault tolerance), and that saved the day.

But here is an important thing to note: we should not set the ping target to IPs that go against the logic of this feature. It is necessary to set our ping target to the IP addresses of physical, stable devices that provide the connection to the outside world and that will respond to ping.

If more than one IP is to be given, those IP addresses must be the ones that belong to the devices that are directly related to the public network connections.

Also, a final note on this subject: when you set this parameter to more than one IP, there may be Oracle routines that cannot manage it. Of course, I am not talking about the DB or GI, but, for example, we faced this in an ODA DB System creation. DB System creation could not continue when the ping target was set to more than one IP address; we had to temporarily set the parameter to a single IP address, and then set it back to multiple IP addresses when the DB System creation finished.

Well, the following is the error we got;

[Grid stack creation] - DCS-10001:Internal error encountered: Failed to set ping target on public network.\\\",\\\"taskName\\\":\\\"Grid stack

This error can be encountered due to an incorrect network gateway used in DB System creation (we specify it in the DB System creation GUI(s) and can change it in the json configuration file), but it can also be encountered if you specify multiple IP addresses as the ping targets. We faced this, and temporarily set the ping target to a single (default gateway) address to fix the issue in ODA DB System creation.

I hope this blog post will be helpful in informing you on the subject and will save you time when dealing with the related ODA error.