Sunday, November 25, 2018

Exadata -- Exadata X3 reimaging problem -- biosbootorder

This will be a quick post, because I'm currently on-site, waiting to start a migration operation :)
Still, I couldn't wait to write about this :)

Recently, we needed to reimage an Exadata X3 system with a newer image version. (X3 can be considered quite old)

We downloaded the Exadata images (18.1.7 and 12.2.1.1.8 versions) and configured our PXE server.

First we tried with 18.1.7...
We booted the compute node using PXE, but the imaging operation failed while validating the biosbootorder.

[ERROR][0-0][/opt/oracle.cellos/validations/init.d/biosbootorder- 247][main][247]  
Failed. See logs: /var/log/cellos/validations/biosbootorder

After the failed biosbootorder validation, the node rebooted and the boot problems ended up with a kernel panic ->

[ 70.944116] [<ffffffff816a9634>] dump_stack+0x63/0x81 
[ 70.949533] [<ffffffff816a757c>] panic+0xcb/0x21b 
[ 70.954618] [<ffffffff81086560>] do_exit+0xa70/0xa70 
[ 70.959946] [<ffffffff8106c53c>] ? __do_page_fault+0x1cc/0x480 
[ 70.966145] [<ffffffff816b5a6a>] ? page_fault+0xda/0x120 
[ 70.971820] [<ffffffff810865f5>] do_group_exit+0x45/0xb0 
[ 70.977494] [<ffffffff81086674>] SyS_exit_group+0x14/0x20 
[ 70.983284] [<ffffffff816b031a>] system_call_fastpath+0x18/0xd4 
[ 70.989605] Kernel Offset: disabled 
[ 70.993393] Rebooting in 60 seconds.. 

We thought that some drivers were probably missing during the boot; since they were not loaded, the node couldn't boot..

The following document was related to a virtualized environment, but it had the same error stack and kernel panic;

OCI-C - Instance Fails to Boot Post Patching For L1TF Vulnerability ( Doc ID 2448058.1 )

As I already mentioned, we suspected the drivers.. However, the real cause was the initramfs: it could not detect the boot disk using its boot label.

Anyways, we opened an SR and Oracle logged a bug for this;
Bug 28893408 - X3-2 : PXE IMAGING HANGS AND KERNEL PANIC

The solution was as follows;

  • Boot with diag.iso
  • chroot to /mnt/cell -> chroot /mnt/cell
  • Change the boot device to DbSys1 (in the file named i_am_hd_boot)
  • Install GRUB on /dev/sda -> image_functions_grub2_install /dev/sda /boot force
  • Rebuild the initramfs -> dracut --force

I am not going into the details of these actions right now. But I will :) in my next posts.

Tuesday, November 20, 2018

Exadata Cloud at Customer -- my experience & interesting stories -- traditional methods vs Oracle Database Cloud Service Console

Today, I want to share my experience on Exadata Cloud at Customer, aka ECC (or ECM :).
I want to share the things I have seen so far... ( I have 3 on-going ECC migrations at the moment)
Rather than going over the benefits of this machine (I already did that) and of this cloud model named Cloud@Customer, I will concentrate on explaining the database deployment and patching lifecycle.


First, I want you to know the following;
  • Oracle doesn't force us to use TDE for the 11.2.0.4 Databases, which are deployed into the ECC.
  • Oracle recommends us to use TDE for the databases deployed in ECC. 
  • Oracle has Cloud GUIs delivered with ECC. (both for creating instances + creating databases..)
  • Oracle Cloud GUIs in ECC can even patch the databases. (with PSUs and other stuff)
  • Of course, customers prefer to use these GUIs. However, using these ECC machines as traditional Exadata machines is also supported. That is, we can download our RDBMS software and install it on ECC machines manually as well.
  • We can create our databases using dbca (as an alternative to the tools in GUIs) -- We can deploy and patch our databases just like we do in a non-cloud Exadata machine. (at least currently..)
  • ECC software is patched by Oracle. Both ECC and its satellite(OCC) are patched by Oracle in a rolling fashion. ( Having RAC instances gives us an advantage here)
  • There seems to be a new edition of Oracle Database. It is called Extreme Edition, and I have only seen it in ECC machines.
  • We can't reinstall GRID Home in ECC.. What we can do is to patch it.. ( if a reinstall is needed, we create SRs)
  • When we have a problem, we create an SR using the ECC CSI, and it is handled by the Cloud Team of Oracle Support.
  • Oracle Homes and patches delivered by the ECC GUIs are a little different than the ones deployed with traditional methods.
  • We see banners of Extreme edition in the headers of sqlplus and similar tools.
  • Keeping the ECC software up-to-date is important, because there are little bugs in the earlier releases. (bugs like -> expdp cannot run in parallel -- it says: this is not an Enterprise Edition database -- probably because of Extreme Edition-specific info delivered in ECC Oracle Homes)
So far so good. Let's keep these things in mind and continue reading.

The approach that I follow in ECC projects is simple.

That is; if you deploy a database using Cloud GUI, then continue using Cloud GUI.

I mean, if you create a database (and an Oracle Home) using Oracle Database Cloud Service Console, then patch that database using Cloud Service Console.

But if you install a database home and create a database using the standard approach (download Enterprise Edition software, use dbca etc..), then continue patching using the standard approach.

If you mix these 2 approaches, then you need to put in a lot of effort to make things go right.

Yesterday, I was at a customer site, and the customer reported that they couldn't run dbca successfully to create their database in ECC.

They actually deployed the Oracle Home using Cloud Console, and then they tried to use dbca to create a database using that home.  ( home is from GUI, database is from dbca)

The error they were getting during the dbca run was the following;


When I checked the dbca logs, I saw that dbca was trying to create the USERS tablespace with TDE encryption. (as a result of the encrypt_new_tablespaces parameter -- dbca was setting it to CLOUD_ONLY; probably the templates were configured that way)

See.. Even the behaviour of dbca is different when it is executed from an Oracle Home deployed via the ECC GUIs.

I fixed the error by customizing the database that dbca would create. I made dbca skip creating the USERS tablespace, and the error disappeared.

After the database was created, I set the encrypt_new_tablespaces parameter to DDL (as my customer wanted), and after that they could create new tablespaces without using TDE.

-- optionally, I could have created a master key and left the parameter as is. (CLOUD_ONLY)
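For reference, this behaviour is controlled by the encrypt_new_tablespaces initialization parameter (valid values include ALWAYS, CLOUD_ONLY and DDL). A minimal sketch of checking and changing it from sqlplus; treat it as a sketch under your own security requirements, not a definitive recipe:

```sql
-- Show the current setting. (CLOUD_ONLY encrypts new tablespaces in cloud
-- deployments even when the CREATE TABLESPACE statement doesn't ask for it)
SHOW PARAMETER encrypt_new_tablespaces

-- Switch to DDL, so tablespaces are encrypted only when the
-- CREATE TABLESPACE statement explicitly requests encryption.
ALTER SYSTEM SET encrypt_new_tablespaces = DDL SCOPE=BOTH SID='*';
```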

Another interesting story was during a patch run.

Customer reported that they couldn't patch the database using GUI.

When I checked the GUI, I saw that the patch was listed there. But when I checked the logs of the ECC's patching tool, I saw the wget commands; the links those wget commands were trying to reach were broken..

The patching tools in ECC get the patches automatically using wget, and those patches do not come from Oracle Support; they come from another inventory. (a cloud inventory)
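When diagnosing a case like this, it helps to pull the URLs out of the patching tool's log and probe them one by one. A minimal sketch; the helper name is mine, and the log format (plain "wget URL" lines) is an assumption, so adapt it to what you actually see on your ECC:

```shell
# Hypothetical helper: extract the unique URLs that the patching tool
# tried to fetch with wget, as recorded in its log file.
extract_wget_urls() {
  grep -oE 'wget +[^ ]+' "$1" | awk '{print $2}' | sort -u
}

# Each URL can then be probed by hand, e.g.:
#   for u in $(extract_wget_urls /path/to/patching_tool.log); do
#     curl -sIf "$u" > /dev/null || echo "broken: $u"
#   done
```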

Anyways, as the links were broken, the customer created an SR with the cloud team, asking them to put the related patches in the place they needed to be.

The customer also wanted to apply a patch the traditional way (opatch auto) to an Oracle Home which was created using the GUI.

Actually, we patched the Oracle Home successfully using opatch auto, but then we encountered the error "ORA-00439: Feature Not Enabled: Real Application Clusters" while starting the database instances.

Note that we downloaded the patch the traditional way (from Oracle Support) as well.

"ORA-00439: Feature Not Enabled: Real Application Clusters" is normally encountered when the oracle binaries are relinked with rac_off.

On the other hand, it wasn't the cause in this case.

I relinked properly, but the issue remained.

RAC was on! The related library was linked properly, but ORA-00439 remained!

Then I started to analyze the makefile (ins_rdbms.mk) and found an interesting thing there.

In ins_rdbms.mk, there were enterprise_edition tags and extreme_edition tags. (other tags as well)

When I checked a little further, I saw that the linked libraries differ according to these tags.

Then I realized that the patch we applied was downloaded from Oracle Support.. (there is no extreme_edition there)

As for the solution, I relinked the binary using the enterprise edition target and the error disappeared ->

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk edition_enterprise rac_on ioracle

-- we fixed the error, but this home became untrusted.. tainted..
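By the way, a quick way to see how an oracle binary was linked is to look inside $ORACLE_HOME/rdbms/lib/libknlopt.a: kcsm.o in the archive means the binary was linked rac_on, while ksnkcs.o means rac_off. A minimal sketch (the helper name is mine, not an Oracle tool):

```shell
# Report the RAC link state of an Oracle Home by listing the members
# of libknlopt.a. kcsm.o => linked rac_on; ksnkcs.o => linked rac_off.
check_rac_link() {
  local oh="$1"
  if ar -t "$oh/rdbms/lib/libknlopt.a" | grep -q '^kcsm\.o$'; then
    echo "rac_on"
  elif ar -t "$oh/rdbms/lib/libknlopt.a" | grep -q '^ksnkcs\.o$'; then
    echo "rac_off"
  else
    echo "unknown"
  fi
}

# Usage: check_rac_link "$ORACLE_HOME"
```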

So, what is different there? The patch seems different, right? What about the libraries? Yes, there are small differences in the libraries too..

What about the extreme_edition thing? This seems completely new..

So, again -> "if you deploy a database using Cloud GUI, then continue using Cloud GUI. I mean, if you create a database (and an Oracle Home) using Oracle Database Cloud Service Console, then patch that database using Cloud Service Console.
But if you install a database home and create a database using the standard approach (download Enterprise Edition software, use dbca etc..), then continue patching using the standard approach."

That's it :)

Sunday, November 18, 2018

RDBMS -- EBS - Exadata Cloud at Customer (ECC) migration -- ignorable errors during 11.2.0.4 upgrade

This blog post will be in 2 parts.

In the first part, I will give some useful info about "EBS - ECC migrations", and then I will share some ignorable errors that we saw during our EBS database upgrade.

Let's start with the first part;

As a prereq for an Exadata Cloud at Customer (ECC) migration project, we were upgrading the database tier of an EBS R12 instance.

The upgrade was done to align the database of this EBS instance with the ECC's minimum database software version requirements. (currently, ECC requires 11.2.0.4 as the minimum RDBMS version)

So our plan was to upgrade this EBS's database and then migrate it to ECC using Data Guard.

With this migration, we also planned to convert this EBS's database from single instance to RAC.
Note that, the source database version was 11.2.0.3.

Anyways, although it sounds complicated, there are 3 documents to follow for this approach.

The MOS document for the database upgrade -> Interoperability Notes EBS 12.0 and 12.1 with Database 11gR2 (Doc ID 1058763.1)

The MOS document for the Data Guard switchover -- migration -> Business Continuity for Oracle E-Business Release 12.1 Using Oracle 11g Release 2 Physical Standby Database (Doc ID 1070033.1)

The MOS document for converting to RAC 11gR2 with EBS R12 -> Using Oracle Real Application Clusters 11g Release 2 with Oracle E-Business Suite Release 12 (Doc ID 823587.1)

Now, let's check what we saw during the 11.2.0.3 to 11.2.0.4 upgrade (using dbua).

Well.. Although we did everything documented in "Interoperability Notes EBS 12.0 and 12.1 with Database 11gR2 (Doc ID 1058763.1)", during the upgrade we saw unexpected errors like the following;




When we checked the upgrade log (we must check the log to see the details of the failing commands), we ended up with the following;

drop procedure sys.drop_aw_elist_all
*
ERROR at line 1:
ORA-04043: object DROP_AW_ELIST_ALL does not exist

create or replace type SYSTEM.LOGMNR$TAB_GG_REC wrapped
*
ERROR at line 1:
ORA-02303: cannot drop or replace a type with type or table dependents 
create or replace type SYSTEM.LOGMNR$COL_GG_REC wrapped
*
ERROR at line 1:
ORA-02303: cannot drop or replace a type with type or table dependents

create or replace type SYSTEM.LOGMNR$SEQ_GG_REC wrapped
*
ERROR at line 1:
ORA-02303: cannot drop or replace a type with type or table dependents
  
create or replace type SYSTEM.LOGMNR$KEY_GG_REC wrapped
*
ERROR at line 1:
ORA-02303: cannot drop or replace a type with type or table dependents

CREATE TYPE SYSTEM.LOGMNR$TAB_GG_RECS AS TABLE OF  SYSTEM.LOGMNR$TAB_GG_REC;
*
ERROR at line 1:
ORA-00955: name is already used by an existing object

Good news -> After some research, we concluded that all of these errors were ignorable.

The ORA-04043, encountered while dropping DROP_AW_ELIST_ALL, was ignorable. My comment on this -> this object seems insecure (maybe there was a SQL/DML injection bug there) and maybe that's why the upgrade was trying to drop it. (ref: http://www.davidlitchfield.com/OLAPDMLInjection.pdf)

But, as far as I can see, this procedure normally comes with patch 9968263. So if that patch was not applied, it is normal not to have this procedure inside the database, and the "object does not exist" error is normal as well. So it was just ignorable :)

The ORA-02303 and ORA-00955 errors were encountered for Logminer-specific objects. These errors are fully addressed/documented in Oracle Support, so they were directly ignorable. This was actually a bug. These objects actually affect GoldenGate; since this customer didn't have GoldenGate, we just ignored them. However, if you have GoldenGate, then check the following document:

ORA-02303 & ORA-00955 Errors on SYSTEM.LOGMNR$ Types During PSU Updates (Doc ID 2008146.1)
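A quick way to confirm that an upgrade log contains only these known-ignorable errors is to filter them out and see whether anything else remains. A minimal sketch (the helper name is mine; the ignorable list is just the three error codes discussed above, so extend it for your own case):

```shell
# Hypothetical helper: print every ORA- error code found in an upgrade
# log (with occurrence counts), excluding the ones we decided were
# ignorable: ORA-04043 on DROP_AW_ELIST_ALL, and ORA-02303/ORA-00955
# on the SYSTEM.LOGMNR$ types. Empty output means nothing else remains.
remaining_upgrade_errors() {
  grep -oE 'ORA-[0-9]{5}' "$1" | sort | uniq -c | grep -vE 'ORA-(04043|02303|00955)'
}

# Usage: remaining_upgrade_errors <upgrade_log>
```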

That's it :) Just wanted to share.