
Saturday, May 31, 2025

Creating Database Domain on SuperCluster and Installing Cluster Database with OEDA

Today, we will take a look at the SuperCluster side of things: setting up a new Database Domain on an Oracle SuperCluster M8 and performing a Cluster Database installation using the Oracle Exadata Deployment Assistant (OEDA) tool.

SuperCluster has already reached end-of-support status, but we still see it hosting critical environments. Of course, this won't last forever, and SuperCluster customers will probably replace their machines with PCAs and Exadatas. Still, out of respect for what SuperCluster has contributed so far, today's blog post is about SuperCluster.



This isn't just another generic guide; I'm going to systematically walk through the steps, highlighting critical details, especially around configuring the infrastructure. I will also point out the steps you absolutely need to skip. Consider this your high-level go-to reference for similar installations.




1. Creating a New Oracle Database Domain via the I/O Domains Tab (we do this for both nodes)

First things first, let's get our new Database Domain up and running on the Super Cluster.
Open the Super Cluster Virtual Assistant screen.


Navigate to the I/O Domains tab on the navigation panel.
Click the Add button to create a new domain.
Input all the necessary parameters for each domain, including CPU, memory, and network settings.
 
 
2. Database Configuration with OEDA

Now that our domains are ready, let's get OEDA involved. We know OEDA from Exadata environments, but it is used on SuperCluster as well.

2.1. OEDA Preparations

OEDA helps you with the prerequisites too.
Launch the Oracle Exadata Deployment Assistant (OEDA) tool.
Select the two newly created database domains and perform the JOC File Export operation. This action will generate an XML configuration file containing all the domain-related information.
 
2.2. Obtaining DNS and Installation Files

Refer to the installation template generated by OEDA:
APPENDIX A: DNS requirements
APPENDIX B: Files to be used for installation
Prepare these files and place them in the appropriate directories.
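Before moving on, it is worth confirming that the DNS entries from APPENDIX A actually resolve. A quick check with nslookup might look like this (the hostnames below are hypothetical placeholders, not values from the template):

nslookup scm8-db01.example.com
nslookup scm8-db01-vip.example.com
nslookup scm8-scan.example.com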

2.3. Placing Installation Files

Keep your OEDA directory structure tidy.
Copy the installation files specified in APPENDIX B into the WorkDir folder within your OEDA directory structure.
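As a rough illustration (the file names and the OEDA path below are hypothetical placeholders; use the exact media names listed in APPENDIX B):

cp /stage/grid_image.zip /stage/db_image.zip /opt/oracle/oeda/WorkDir/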
 
2.4. SSH Requirement

This is a crucial step.
Since we're installing on SuperCluster, passwordless SSH connectivity must be configured between the ZFS storage and both database domains.
Both Grid and Database software will be installed directly on ZFS.
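A minimal sketch of setting up the passwordless SSH, assuming hypothetical hostnames (repeat from each database domain, and later between the two domains for the Grid and Database software owners):

ssh-keygen -t rsa
ssh-copy-id root@zfs-storage-head
ssh root@zfs-storage-head date

The last command should return the date without prompting for a password.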
 
3. OEDA Installation Commands

Once everything is set up, it's time to run the OEDA commands on the respective domains:

The following command lists all the installation steps.

install.sh -cf xml_file -l (character l, not the number 1)

The following command validates the configuration.

install.sh -cf xml_file -s 1 (number 1)

If the validation is successful, the following steps are executed sequentially:

install.sh -cf xml_file -s 2
install.sh -cf xml_file -s 3 
install.sh -cf xml_file -s 4

4. Steps That Must NOT Be Executed

IMPORTANT: Since there are already other database domains running on the system, the following steps MUST NOT be executed. Failing to skip them can lead to data loss or system instability for the existing domains:

Step 5: Calibrate Cells
Step 6: Create Cell Disks
Step 17: Resecure Machine
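Since install.sh is driven one step at a time with the -s flag, the safest approach is simply never to invoke those step numbers. In other words, after step 4 the sequence continues like this:

install.sh -cf xml_file -s 7
install.sh -cf xml_file -s 8

...and so on, one step at a time, up to step 16, never running steps 5, 6, or 17.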

5. Installation Step List (Overview)

Here’s a quick overview of the OEDA installation steps:

1. Validate Configuration File
2. Setup Required Files
3. Create Users
4. Setup Cell Connectivity
5. Calibrate Cells (SKIP THIS!)
6. Create Cell Disks (SKIP THIS!)
7. Create Grid Disks
8. Install Cluster Software
9. Initialize Cluster Software
10. Install Database Software
11. Relink Database with RDS
12. Create ASM Diskgroups
13. Create Databases
14. Apply Security Fixes
15. Install Exachk
16. Create Installation Summary
17. Resecure Machine (SKIP THIS!)

6. Completing the Installation

Once you’ve followed all the steps above, the installation of the new database environment (Grid/RAC + RDBMS) in your SuperCluster environment should be complete. Always remember to perform system tests and verify access to finalize the installation.
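As a quick sanity check, the usual cluster and database status commands can be run from one of the new domains (the database name below is a hypothetical placeholder):

crsctl stat res -t
srvctl status database -d mydb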

7. Known Issues
 
Before starting the OEDA installation, since the installation will be on the Super Cluster IO Database Domain Global zone, passwordless SSH settings must be configured between the ZFS storage and the IO Domains. 

The /u01 directory, where the installation will take place, resides on ZFS.

During OEDA installation, if there are other IO database domains on the Super Cluster system, it's critically important not to run the OEDA Create Cell Disk step. Otherwise, other IO domains will be affected, potentially leading to data loss.
 
Before the Grid installation, passwordless SSH access must be configured between the two nodes for the users under which the Grid and Oracle software will be installed.

That's all for today. I hope this walkthrough helps you navigate your SuperCluster installations with more confidence. Happy super clustering! :)

Friday, May 16, 2025

ODA -- odacli command Issue after implementing SSL: A Real SR Process in the Shadow of Missing Steps -- Lessons Learned & Takeaways

Enhancing security in Oracle Database Appliance (ODA) environments through SSL (Secure Sockets Layer) configuration changes can ripple across various system components. Replacing the default certificates with more secure, trusted ones can be a little tricky, and the path to resolving the issues encountered along the way isn't always found in the documentation.

In this post, I will share a real Oracle Service Request (SR) journey around this subject. I will try to share both the technical side of things and those undocumented steps we had to follow.

The Symptom: Silence from odacli

After implementing the SSL configuration on ODA (replacing the default SSL certificates of the DCS agent and DCS controller with the customer's certificates), we hit a wall: the odacli commands simply refused to work. For instance, when we tried to run odacli list-vms, we got the following cryptic message:

DCS-12015: Could not find the user credentials in the DCS agent wallet. Could not find credential for key:xxxx

This clearly pointed to a problem with the DCS Agent wallet lacking the necessary user credentials. Despite following the configuration guides, odacli failed, and the DCS Agent felt completely out of reach.

Initial Moves: Sticking to the Script (Official Oracle Docs)

Oracle's official documentation laid out a seemingly straightforward path:

Configure SSL settings within the dcs yml file(s).
Restart DCS (see the restart sketch below).
Update CLI certificates and dcscli configuration files.
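For reference, on ODA the DCS restart in step 2 is typically done through the init services; the exact service names can differ between ODA releases, so treat the following as a sketch:

systemctl restart initdcscontroller
systemctl restart initdcsagent
systemctl status initdcsagent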

We did all of this. Every step was executed properly. Yet the problem persisted: odacli continued to fail with the same error.

The Real Culprit: A Missing Step, An Undocumented Must-Do

Despite the seemingly correct configurations, our back-and-forth with the Oracle support engineer through the SR revealed a critical piece of the puzzle – a step absent from any official documentation:

We get the ODACLIMTLSPASSWORD value with the following command:

/u01/app/19.23.0.0/grid/bin/mkstore \
  -wrl /opt/oracle/dcs/dcscli/dcscli_wallet \
  -viewEntry DCSCLI_CREDENTIAL_MAP@#3#@ODACLIMTLSPASSWORD

We take the password from the output of the command above and use it to change the password of the /opt/oracle/dcs/dcscli/dcs-ca-certs keystore (the custom keystore; note that the password is the one related to DCSCLI_CREDENTIAL_MAP).

/opt/oracle/dcs/java/1.8.0_411/bin/keytool -storepasswd -keystore /opt/oracle/dcs/dcscli/dcs-ca-certs

We then update the CLI conf files with the ODACLIMTLSPASSWORD entries.

These two files: /opt/oracle/dcs/dcscli/dcscli.conf and /opt/oracle/dcs/dcscli/dcscli-adm.conf

The following line: 

TrustStorePasswordKey=ODACLIMTLSPASSWORD

So, in effect, we map the wallet and the keystore passwords to each other using ODACLIMTLSPASSWORD.

Skip these, and even with a perfectly configured agent, odacli commands will fail because they can't access the necessary credentials.

Live Intervention and Breakthrough

During a screen-sharing session with the Oracle engineers via Zoom, we went through the following:
Re-verified and, where needed, reconfigured the dcs yml file(s).
Ensured the wallet entry was correctly added.
Executed the crucial mkstore and dcscli commands (above) 
Restarted both the Agent and CLI services.

After these, commands like odacli list-jobs and odacli list-vms started working flawlessly. 

This SR journey left us with some significant takeaways:

"Official documentation may not always tell the full story." Some critical steps, like the mkstore credential mapping, might only surface through the SR process itself.

"Configuration details demand absolute precision." File names, paths, and alias definitions in Oracle configurations must be an exact match. Even a minor deviation during the adaptation of Oracle's example configurations to your environment can bring the system down.

"Configuration Files are as Crucial as Logs in Support Requests". Attaching the actual configuration files to your SR significantly accelerates the troubleshooting process for Oracle engineers.

Lessons Learned:
  • Documentation Gaps: Document the steps learned from SRs in the internal technical notes.
  • The processes behind enhancing security in Oracle environments may extend beyond the confines of official documentation. This experience wasn't just about resolving a technical problem; it was a valuable lesson in enterprise knowledge management. If you find yourself facing a similar situation, remember to explore beyond the documented steps – and make sure those learnings from SRs find their way into your internal knowledge base.

Wednesday, May 7, 2025

RAC -- Importance of pingtarget in virtualized environments & DCS-10001:Internal error in ODA DB System Creation

Recently, I struggled with an issue in a mission-critical environment: VIPs kept relocating. It started all of a sudden, and diagnostics indicated some kind of network problem.

The issue was related to failed pings. Oracle's pingtarget mechanism was in play and, for reasons it considered justified, was failing the VIPs over to the secondary RAC node.

Some background information about ping target: delivered with 12c (12.1.0.2), it is useful and relevant in virtualized environments. It is there to detect, and take action on, network failures that are not recognized inside the guest VMs. It concerns the public network only, since the private networks already have their own carefully designed heartbeat mechanisms. So basically, if the target IP(s) cannot be pinged from a RAC node, or if there is a significant delay in those pings, the VIPs are failed over to the secondary node(s). The parameter is set via the srvctl modify nodeapps -pingtarget command.
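A minimal example of setting and checking it (the IP addresses are hypothetical placeholders):

srvctl modify nodeapps -pingtarget "10.10.10.1,10.10.10.2"
srvctl config nodeapps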

Well.. This is a feature developed with the logic that "if the relevant node cannot reach the ping targets, then there is a network problem between this node and the public network, namely the clients, and this means, the clients cannot access the DBs on this node, and if so let's failover the VIPs and save the situation."

It seems innocent since it has nothing to do with the interconnect, but it is actually vital: VIP transfers happen according to this routine.

In our case, a switch problem caused everything. The default gateway was set to the firewall's IP address, and the firewall's responses to pings were sometimes inconsistent.

We were lucky that the ping target parameter can be set to more than one IP (fault tolerance), and that saved the day.

But here is an important thing to note: we should not set the ping target to IPs that go against the logic of the feature. The ping target should point to the IP addresses of physical, stable devices that provide the connection to the outside world and that will reliably respond to ping.

If more than one IP is given, those addresses must belong to devices directly involved in the public network connectivity.

Also, a final note on this subject: when you set this parameter to more than one IP, there may be Oracle routines that cannot handle it. I am not talking about the DB or GI themselves; for example, we faced this during an ODA DB System creation. The creation could not continue while the ping target was set to more than one IP address, so we had to temporarily set the parameter to a single IP and set it back to multiple IPs once the DB System creation finished.

Well, the following is the error we got:

[Grid stack creation] - DCS-10001:Internal error encountered: Failed to set ping target on public network.\\\",\\\"taskName\\\":\\\"Grid stack

This error can be encountered when an incorrect network gateway is used in DB System creation (we specify it in the DB System creation GUI and may change it in the json configuration file), but it can also be encountered if you specify multiple IP addresses as ping targets. We faced the latter, and temporarily setting the ping target to a single address (the default gateway) fixed the issue during the ODA DB System creation.
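A sketch of the workaround (the IP addresses are hypothetical placeholders). First, temporarily fall back to a single ping target, the default gateway:

srvctl modify nodeapps -pingtarget "10.10.10.254"

Then, once the DB System creation completes, restore the multi-IP list:

srvctl modify nodeapps -pingtarget "10.10.10.254,10.10.10.1"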

I hope this blog post will be helpful in informing you on the subject and will save you time when dealing with the related ODA error.