Monday, November 18, 2019

RAC & Exadata // Some Tips & Tricks for the migration ... for the performance

Currently, I'm dealing with a RAC/Exadata migration project..
This time I'm dealing with a mission-critical single-node environment. This environment is fully CPU-bound, but it also has very fast All-Flash Arrays in its storage layer.
I'm dealing with a source environment which has lots of connections doing lots of things and using lots of CPU cycles.
The application is not RAC-aware either. With all the things given above, you can guess that the application cannot scale out in the database layer very easily.
In order to migrate this kind of a database to RAC/Exadata, we must take some extra actions, some improvements actually; some changes for RAC, for scaling out properly.
Yes, we must think about all 3 layers.. Application, Database and Storage.
Especially if the target is an Exadata, we must concentrate on the Application layer intensively.

In this blog post, I will quickly give you some tips and tricks that you may use before migrating a database environment similar to the one I just described above;

NOCACHE Sequences: Ordered Sequences do not scale well.
Use non-ordered and cached sequences if sequences are used to generate the primary keys.
ALTER SEQUENCE ERMAN_SEQ1 … CACHE 10000;
If you don't cache them, you may see EQ or SQ enqueue contention..
However, know that if you use non-ordered, cached sequences, you may have non-ordered values in the table columns which are fed by these sequences..
So, if you can't use cached and non-ordered sequences with your application, then consider having an active-passive configuration. That is, consider running that part of your application (the code that uses sequences) on only one instance of your RAC.
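
For reference, a quick way to spot risky sequences before the migration; a minimal sketch, assuming the application schema is named ERMAN and a cache threshold of 1000 (both are just examples):

-- List the sequences that are still ORDERed and/or poorly cached, so they can be reviewed before going RAC;
SELECT sequence_owner, sequence_name, cache_size, order_flag
FROM dba_sequences
WHERE sequence_owner = 'ERMAN'                       -- example schema name
AND (order_flag = 'Y' OR cache_size < 1000);         -- example cache threshold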

Missing cluster interconnect parameter: Making an Oracle database running on Exadata use static InfiniBand interconnect IP addresses relies on setting the cluster_interconnects parameter.
If it is not set, the Oracle database by default chooses the HAIP InfiniBand addresses for the cluster interconnect, and that is not recommended.

This recommendation can also be viewed by analyzing an Exachk report.
That is, if we don't set the cluster_interconnects parameter in the database and leave the Oracle database to use the default HAIP interconnects, then Exachk will report a failure saying "Database parameter CLUSTER_INTERCONNECTS is NOT set to the recommended value".

Database parameter CLUSTER_INTERCONNECTS should be a colon delimited string of the IP addresses returned from /sbin/ifconfig for each cluster_interconnect interface returned by oifcfg. In the case of X2-2 it is expected that there would only be one interface and therefore one IP address. This is used to avoid the Clusterware HAIP address; For an X2-8, the 4 IP addresses should be colon delimited.

So, use ifconfig to determine the IP addresses assigned to the ib0 and ib1 interfaces (not ib0:1 or ib1:1) on all the RAC nodes, set these IP addresses as colon-delimited strings for all the instances, and restart the database;
Ex:

alter system set cluster_interconnects='ib_ipaddress1ofnode1:ib_ipaddress2ofnode1' scope=spfile sid='SIDinst1';
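
The same parameter must be set for the other instance(s) as well, and the result can be verified after the restart. A minimal sketch (the IP address placeholders and SID names are just examples):

alter system set cluster_interconnects='ib_ipaddress1ofnode2:ib_ipaddress2ofnode2' scope=spfile sid='SIDinst2';

-- After the restart, the SOURCE column should show that the value comes from the cluster_interconnects parameter;
select inst_id, name, ip_address, source from gv$cluster_interconnects order by inst_id;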

Hard parses: Aim for soft parsing to save and reuse the parse structure and the execution plan. With soft parsing, metadata processing is not required.
When soft parsing, you don't parse and describe.. You only execute and fetch.. Yes, we are talking about eliminating cost here.

Avoid hard parsing by using prepared statements and bind variables...

Instead of;
String query = "SELECT LAST_NAME FROM " + "ERMANS WHERE ERMAN_ID = " + generateNumber(MIN_ERMAN_ID, MAX_ERMAN_ID); // literal concatenated into the SQL text -> a hard parse for each distinct value
prepStmt = connection.prepareStatement(query);
resultSet = prepStmt.executeQuery();
Change to:
String query = "SELECT LAST_NAME FROM " + "ERMANS WHERE ERMAN_ID = ?"; // bind variable instead of a literal
prepStmt = connection.prepareStatement(query);
int n = generateNumber(MIN_ERMAN_ID, MAX_ERMAN_ID);
prepStmt.setInt(1, n);
resultSet = prepStmt.executeQuery();
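
To see how much literal SQL you actually have before the migration, the classic FORCE_MATCHING_SIGNATURE check may help; a minimal sketch (the threshold of 100 is just an example):

-- Statements that differ only in their literals share the same FORCE_MATCHING_SIGNATURE;
-- a high count per signature usually means missing bind variables.
SELECT force_matching_signature, COUNT(*) AS copies, MIN(sql_id) AS sample_sql_id
FROM v$sql
WHERE force_matching_signature > 0
GROUP BY force_matching_signature
HAVING COUNT(*) > 100
ORDER BY copies DESC;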

Caching the soft parses: Although soft parsing is not as expensive as hard parsing, it can still take some time.. Consider using statement caching.
For example (JDBC implicit statement caching): oracleDataSource.setImplicitCachingEnabled(true) + connection.setStatementCacheSize(10);

High number of concurrent sessions: In order to control concurrent sessions, consider using connection pooling. Consider limiting the pool size and the processes parameter to avoid connection storms. Ensure that the load is balanced properly over the RAC nodes.
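
A quick sanity check for the load balancing part; a minimal sketch:

-- Session distribution across the RAC instances (user sessions only);
SELECT inst_id, COUNT(*) AS session_count
FROM gv$session
WHERE type = 'USER'
GROUP BY inst_id
ORDER BY inst_id;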

High number of database sessions: More processes mean higher memory consumption, bigger page tables, a higher risk of paging and higher system CPU time. Consider using connection pools, and consider releasing the connections when your work is finished.

Small Log Buffer Size: Consider making the log_buffer parameter bigger if you already see 'log buffer space' waits on your current platform.
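
A minimal sketch for checking this on the current platform (instance-wide, since startup):

-- Time spent waiting on 'log buffer space' since instance startup;
SELECT event, total_waits, ROUND(time_waited_micro/1e6) AS seconds_waited
FROM v$system_event
WHERE event = 'log buffer space';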

High interconnect usage: Using the interconnect is expensive even if you use InfiniBand for the interconnect communication.. It is expensive because it depends on lots of things.
For example, even the things that LGWR does are important while we are using the interconnect. That is, when blocks with pending changes are pinged by other instances, the related redo must be written to the log before the block can be transferred. So in such an environment, where you have chatty processes that manipulate and read the same data blocks, you may even consider having the sessions that frequently manipulate the same data connect to the same RAC node all the time.
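
A rough way to see the global cache wait profile on an instance; a minimal sketch (AWR gives a better picture, but this is a quick look):

-- Global cache related waits since instance startup;
SELECT event, total_waits, ROUND(time_waited_micro/1e6) AS seconds_waited
FROM v$system_event
WHERE event LIKE 'gc %'
ORDER BY time_waited_micro DESC;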

Performance tuning: It just never ends. Like peeling an onion… There's always another layer.. Consider using partitioning, compression and indexes (you may even consider dropping some of them) and so on. Consider implementing the RAC best practices..
Global hash-partitioned indexes & locally partitioned indexes -> both of these help you achieve better cache locality.

Re-considering the current underscore parameters: Is there any reason to keep them? For example: _disk_sector_size_override=TRUE (YOU DON'T NEED TO SET IT - see MOS note 1447931.1)
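
A minimal sketch for listing the underscore parameters that are explicitly set (run as SYS; this is the well-known x$ksppi / x$ksppcv join):

-- Non-default underscore parameters;
SELECT i.ksppinm AS parameter, v.ksppstvl AS value
FROM x$ksppi i, x$ksppcv v
WHERE i.indx = v.indx
AND i.ksppinm LIKE '\_%' ESCAPE '\'
AND v.ksppstdf = 'FALSE';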

EXADATA -- Connecting to the nodes using the Serial Management Port (RJ45-DB9-RS232-USB) + Connecting to the Cisco switch via CAT-5

Under certain circumstances, we connect to Exadata machines using a serial connection or using a cat-5 cable connected to the Cisco switch.
These kinds of connections may be needed while re-imaging Exadata or while installing an Exadata.
Basically, we may need to have these kinds of connections when we don't have a proper network configuration on the Exadata machines that we are dealing with.

In this blog post, I'm going to explain both connecting our laptop to the Exadata nodes using a serial connection and connecting it to the Exadata nodes using a TCP/IP connection directly through the Cisco switch (the admin network).

 RJ45-DB9-RS232-USB cable

Let's start with Serial connection;

This kind of a serial connection is actually for connecting to the ILOMs.
We connect to the nodes using the Serial Management Port through an RJ-45 cable, but that cable is converted to USB on the client side (as seen in the picture above).

First, we download and install a USB-to-serial-port converter driver on our client (in most cases, we use Windows clients). This driver can be found on the Internet, for example: CDM20814_Setup.exe

Then we open up our Device Manager and check our COM X connection port. Yes, we need to find that X :) --> for instance: COM3, COM2

Then we connect the serial port side of our cable to the serial port of the node that we want to connect to (compute node, Cisco switch, etc.).
Of course, we connect the USB side of our cable to our client machine.

Once the cabling is done, we use Putty to connect to the device using that COM port.
That is, in Putty, we choose the Serial connection type with the COM port and click Open.. After clicking Open, we press the Enter key 3-4 times and we find ourselves in the ILOM console of the machine.

Lastly, we start the host's serial console through the ILOM (start /SP/console) and that's it :)

ermanarslanexadata02-ilom login: root
Password: 
Oracle(R) Integrated Lights Out Manager
Version 3.1.2.10 r74387
Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
 -> 
 -> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
 Serial console started.  To stop, type ESC (

Let's take a look at connecting our laptop to the Cisco switch using a cat-5 cable;

Of course, this is easier than having a serial connection.

We connect our client to a free port of the Cisco switch using a cat-5 cable.
Then we configure our network adaptor by specifying an IP address manually (we use an IP address assigned to one of our PDUs.. We also specify the correct subnet and gateway for our network adaptor..) -- note that this info can be derived from the OEDA output.
Once configured, we are done.. We may see an exclamation mark (!) on our network adaptor, but this is OK.. We may now connect to our db nodes, to our cell nodes and even to our InfiniBand switches using SSH :)

I'm done :) I hope this will help you..

Sunday, November 10, 2019

Cloud -- Creating ADWH(Autonomous Data WareHouse) & Connecting to ADWH

In today's post, I will give you a quick introduction to Oracle Cloud..
This post will specifically be about connecting to Oracle Cloud, and we will take a look at creating an ADWH (a famous subject these days) and connecting to that ADWH by going through some screenshots.

Actually, these things are so easy now. The Cloud Console of OCI (Oracle Cloud Infrastructure) is quite good, capable and stable, I must say..

Basically, in order to create an ADWH, we first connect to Oracle Cloud with our cloud credentials, and then we click on the hamburger menu in the leftmost corner of the page and choose ADWH..

After that we choose/check the configurables in the ADWH create screen and that's it.

Once we create our ADWH, we download the related wallet and use our favorite tool to connect to this ADWH of ours with that wallet.

We can't connect to an ADWH with the SYS or SYSTEM accounts, but we have a special account named ADMIN if we want to be in admin mode :)

Anyways, let's go through the screenshots and see this in action..

First we connect to our trial cloud account;


After supplying our cloud account (it is like a tenant name), we enter our cloud credentials;


After the login, we click on the hamburger menu and choose Autonomous Data Warehouse from there.


Once we click on the Autonomous Data Warehouse item, we see our current ADWH environments , and we find ourselves in a page where we can create a new Autonomous Data Warehouse.


We click on the Create Autonomous Database button and we find ourselves in a page, where we can create our ADWH.


We enter/choose the relevant items on this menu and lastly click on the "create Autonomous Database" button at the bottom of the page.

So, the creation of  an ADWH on Oracle Cloud is as easy as that :)

Now let's take a look at the connection.. Connecting to an ADWH..

First we click on our ADWH to see its configuration.


Next, we click on the DB connection button and then click on the download wallet button to download the related wallet into our client..


Note that we supply a wallet password before downloading the wallet.


Once our wallet is downloaded, we are ready to connect to our ADWH.. Note that, we don't unzip our wallet. We leave it as zipped..


Next, we start our favorite tool for the db connection.. (in this case, it is Sql Developer 19.2)


Then, we supply our wallet (wallet location) and user/pass (ADMIN user in this case.. ADMIN user that was created while creating our ADWH earlier) in the db connection pane.


That's it, we are connected ! :)


Next blog post about this Cloud topic will be about migrating data to an ADWH.. Stay tuned! :)

Friday, November 8, 2019

RDBMS/ECC -- RAT / PL/SQL based Workload / Database Replay with Synchronization=OFF, WRC: replay clock, Data Divergence and Concurrency Waits

Currently working on a very critical Exadata Cloud at Customer(ECC) migration project.

Actually, I like to lead these kinds of projects, as I find them challenging :) Anyways, there is no fault tolerance in this project. So we need to migrate all the mission-critical Oracle databases to ECC (Exadata Cloud at Customer) without any failures.

Performance is also very important, as these databases should run faster, or at least at the same speed that they run with on their source platforms.

There are critical Oracle E-Business Suite databases and other core business-related databases in the scope and their workload is mostly based on PL/SQL.

Both the source and target Oracle Home versions are 11.2.0.4 (equipped with the latest PSU + RAT Bundle 4), and we must be sure that these databases will run on ECC without any performance problems..

In order to ensure that, we need to make a load test. A real workload test...

Here is the story about it :) ;

We have the Oracle RAT / Database Replay option to use for this kind of a test, and of course we give it a try.

First we capture and then we analyze.. Once our capture analysis is complete, we replay and then analyze again.

In order to analyze the captured workload, we use Workload Capture Analyzer Report..

There we see something important -> 

PL/SQL
Maximum Workload Impact: 77 % of DB Time
Rationale
If the replay is much slower than expected, try to run in unsynchronized mode.
Action
A significant part of your workload comes from PL/SQL.
If the PL/SQL blocks or functions have 'complicated' logic or multiple commits in them, they are hard to synchronize and their behavior might change during replay.
You might see a different workload profile during replay if this is the case.

Yes, we have a significant PL/SQL workload, and here comes the FACT ->

In 11.2.0.4 Oracle Database environments; a workload with this amount of PL/SQL needs to be replayed with synchronization=OFF..

Also, in 11.2.0.4, Oracle doesn't support RAT replay for EBS, as in EBS we have PL/SQL code that sends requests to the Concurrent Managers and gets into a loop waiting for the Concurrent Managers to process those requests. During a synchronized replay, this makes the PL/SQL code wait in an infinite loop, because a Concurrent Manager can't start processing before the calling PL/SQL is completed.

Yes.. When we have a PL/SQL based workload, we end up with a PL/SQL synchronization problem during the replay, and all the WRC processes hang, waiting on the 'WRC: replay clock' wait.

If we need to replay this kind of a significant PL/SQL workload with synchronization, our database version should be at least 12.2. That is because we have an option named extended capture in Oracle Database 12.2..

In Extended capture, Oracle RAT captures the SQL/DML operations executed inside the PL/SQL (not the PL/SQL itself). If there is a loop, which executes a DML 100 times with different binds, then extended capture captures those 100 calls with their bind variables.. During replay, RAT executes these 100 calls based on the timing recorded during the capture.

Well.. We are in version 11.2.0.4, so we don't have the extended capture option.
So we must replay our workload with synchronization=OFF.

What does this synchronization mean, by the way?

In sync mode, RAT replays the workload with the same concurrency and timing as the production workload. In sync mode, the transaction commit order is honored and RAT ensures minimal data divergence.

When we use sync=off; Oracle RAT doesn't enforce any logical dependencies between the calls while replaying the workload.. It doesn't preserve the commit order, it just replays the workload based only on the captured timing.

Of course, this increases the concurrency and increases the data divergence.
However, this is our only chance for replaying this kind of a huge PL/SQL workload on an 11.2.0.4 environment. (if we use the sync mode, then all our WRC processes hang waiting on 'WRC: replay clock' waits)

Anyways, we replay this workload with RAT using synchronization=OFF.
It really increases the load of the server, but we can at least get our comparison reports once the replay is finished.
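
For reference, a minimal sketch of how the replay is prepared with synchronization turned off (the rest of the replay flow, the wrc clients and START_REPLAY, stays the same; depending on the exact patch level, the parameter accepts a boolean or the value 'OFF'):

BEGIN
  DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(
    synchronization => FALSE);  -- on some versions: synchronization => 'OFF'
END;
/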

What we see in these reports is mainly high concurrency waits.
These waits are mostly caused by synchronization=OFF, but they still contribute to the total DB time recorded during the replay..

Well... We identify these kinds of waits carefully, and ignore them.

This time the comparison report gives us a comparable picture, and at least it tells us whether our target server can handle this kind of a load or not.. Actually, when we use sync=off, we do some kind of a stress test.. This means that even though we don't get a good and trustworthy comparison report with sync=off, we can still identify things like the durability of the target platform and the SQLs that sped up or slowed down.. This is at least something :)

So if you are considering using RAT in your workload tests, keep the things I just wrote above in mind.. You may need to empower your RAT-based workload tests with a client-side test tool, a client-side Swingbench maybe..

Thursday, November 7, 2019

ECC -- Dynamically Increasing CPU Cores of Compute Nodes -- CPU Steal Time

I needed to increase the CPU core count of an ECC (Exadata Cloud at Customer) 1/4 machine in a customer environment. I did it using the Cloud Console screens as documented, and I could do it without any problems. The CPUs were increased..
After increasing the CPU count, I checked /proc/cpuinfo and saw 44 cores on each node.. A core count of 44 is the maximum for an ECC (Exadata 7-based) 1/4 environment.
However, when I checked the instantaneous CPU utilization using the top command, I saw that the newly added CPUs were not utilized at all..


The newly added CPUs were not utilized at all, but their %st values were high.
This was interesting, as %st represents the CPU steal time.
Here is IBM's definition of Cpu Steal Time -> Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.

This didn't make sense, since this was an ECC and it was dedicated to this customer environment.
So I started checking the bugs.
There were bugs recorded for this behaviour, which appears after dynamically increasing vCPUs in ECC environments, and a reboot seemed to be the best solution..

We rebooted the nodes one by one and the issue disappeared.. The newly added CPUs started to be utilized after the nodes were rebooted.

I give you the version info of this environment below;

Kernel version: 4.1.12-94.8.4.el6uek.x86_64 SMP
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.7.0.0.180821

So, if you have a similar problem, you can consider rebooting the nodes one by one.. But the real solution should be an ECC image upgrade.. So plan your upgrades on time. Oracle does these upgrades for you in the Cloud at Customer model anyways..

Wednesday, November 6, 2019

RDBMS -- Oracle Log Miner Read Rate

If you are working with a tool that uses Log Miner behind the scenes, then you may need to check your Log Miner read rate..

Yes, we use Log Miner in some applications to replicate our data logically from one database to another. Just as we DBAs check LGWR performance, which is mostly write-based, we may also check Log Miner's read rate in these kinds of configurations while dealing with degradations in replication performance.

Here is an example for doing that;

We check our archived logs and choose 5 of them to be used in our calculation..
Note that we choose 2-day-old archived logs in this demo..
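
A minimal sketch for picking those archived logs and noting their SCN ranges and sizes (the THREAD#/SEQUENCE# values below are just examples):

SELECT name, first_change#, next_change#,
       ROUND(blocks * block_size / 1024 / 1024) AS size_mb
FROM v$archived_log
WHERE thread# = 1
AND sequence# BETWEEN 1001 AND 1005
ORDER BY sequence#;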

We take the first archived log and execute the following command:
EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
LOGFILENAME => '<first_log_file>', -
OPTIONS => DBMS_LOGMNR.NEW);

We then add the remaining 4 archive log files one by one by using the following command;
EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
LOGFILENAME => '<next_log_file>', -
OPTIONS => DBMS_LOGMNR.ADDFILE);

We get the FIRST_CHANGE# value of the first log file and the NEXT_CHANGE# value of the last log file, and use them as STARTSCN and ENDSCN in the following command;

EXECUTE DBMS_LOGMNR.START_LOGMNR( -
STARTSCN => <first_change#>, ENDSCN => <next_change#>, -
OPTIONS => DBMS_LOGMNR.SKIP_CORRUPTION + DBMS_LOGMNR.NO_SQL_DELIMITER + DBMS_LOGMNR.NO_ROWID_IN_STMT + DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

We start our Sqlplus session and set timing on..

Next, we use the query below to read all the redo from those 5 archived logs that we chose in the beginning. At the end of this query execution, we get 'no rows selected'. This is okay, as we can still measure the query execution time/duration. This query execution time is actually the total time required to read those 5 archived logs.

SELECT * FROM V$LOGMNR_CONTENTS WHERE TABLE_NAME = 'TABLENOTEXIT';

Well, the calculation is simple: archived_log_count x archived_log_size / duration.

So if those 5 archived logs are each 2 GB in size, and if the query above took 10 minutes, then our Log Miner read rate is (5*2 GB) * (60/10) = 60 GB/hour.

Here is a quick tip for you :)

Thursday, October 31, 2019

RAT -- Export/Import AWR problem // DBMS_WORKLOAD_CAPTURE.EXPORT_AWR // ORA-39213: Metadata processing is not available + ORA-39001: invalid argument value


Recently implemented RAT (Capture and Replay) in an Exadata Cloud at Customer migration project.
The capture phase was completed properly, but we weren't able to export the AWR data from the source to compare it with the target (with the AWR data generated during the replay).

DBMS_WORKLOAD_CAPTURE.EXPORT_AWR was failing like below;

ORA-20115: datapump export encountered error:
ORA-39006: internal error
ORA-39213: Metadata processing is not available
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 1378
ORA-39006: internal error
ORA-06512: at "SYS.DBMS_WORKLOAD_CAPTURE", line 1911

Anyways, DBMS_WORKLOAD_CAPTURE.EXPORT_AWR actually doesn't do any magic.
awrextr.sql, which resides in the $ORACLE_HOME/rdbms/admin directory, does the same thing..
However, in this case awrextr.sql was getting the same ORA-39213 error..
The problem was probably related to the stylesheets that are used by these kinds of utilities..
However, the stylesheets were supposedly loaded already.
So I decided to check them with the following;

BEGIN
IF dbms_metadata_util.are_stylesheets_loaded THEN
dbms_output.put_line('TRUE');
ELSE
dbms_output.put_line('FALSE');
END IF;
END;
 /

The output was:

META:15:50:54.881: bad stylesheet: kuamvl
FALSE

So one stylesheet was corrupt, and I decided to load all the stylesheets once again in one go.

exec dbms_metadata_util.load_stylesheets

After this move, I got a TRUE output from the same check.

BEGIN
IF dbms_metadata_util.are_stylesheets_loaded THEN
dbms_output.put_line('TRUE');
ELSE
dbms_output.put_line('FALSE');
END IF;
END;
 /
TRUE

This issue was solved, but there was another issue waiting for me :)

This time,  DBMS_WORKLOAD_CAPTURE.EXPORT_AWR was failing with the following;

ORA-39001: invalid argument value
Exception encountered in AWR_EXTRACT
begin
*
ERROR at line 1:
ORA-20115: datapump export encountered error:
ORA-39001: invalid argument value
ORA-39041: Filter "EXCLUDE_PATH_EXPR" either identifies all object types or no object types.
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 1378
ORA-06512: at line 3

These were all related to Data Pump, as these types of commands use Data Pump in the backend to extract the data they need.

This time, I checked the metanametrans$ table.. -> select count(*) from metanametrans$;
It was empty... (normally it should have records.. the record count should be higher than 2000 or even 3000)

So I decided to run catmet2.sql

catmet2.sql

Rem DESCRIPTION
Rem Creates heterogeneous type definitions for
Rem TABLE_EXPORT
Rem SCHEMA_EXPORT
Rem DATABASE_EXPORT
Rem TRANSPORTABLE_EXPORT
Rem Also loads xsl stylesheets
Rem All this must be delayed until the packages have been built.

After that, I executed utlrp.sql.

Basically the solution was:

sqlplus / as sysdba

@$ORACLE_HOME/rdbms/admin/catmet2.sql
@$ORACLE_HOME/rdbms/admin/utlrp.sql

The issue disappeared :)

After this point, we could export the AWRs, import them into the target, and create our comparison report successfully :)
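
For reference, a minimal sketch of the export call on the source (the capture id comes from DBA_WORKLOAD_CAPTURES; the id value below is just an example, and DBMS_WORKLOAD_REPLAY has the corresponding EXPORT_AWR/IMPORT_AWR calls for the replay side):

SELECT id, name, status FROM dba_workload_captures;

BEGIN
  DBMS_WORKLOAD_CAPTURE.EXPORT_AWR(capture_id => 1);
END;
/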

Hope this helps..

Wednesday, October 30, 2019

RDBMS -- Migrating from Microsoft Sql Server to Oracle // Sql Developer 19.2 -- Migration Wizard & Copy to Oracle

Migrations are popular these days, just like they always were.
Cloud migrations, Oracle migrations, Sql Server migrations, Data Center migrations, Cloud at Customer migrations, etc.. you name it :)

Today, I want to share a migration approach that can be taken in order to migrate data from Sql Server to Oracle.
I prefer using the term data rather than the term database, because the term database is used a little differently in Sql Server than it is in the Oracle world.

Anyways;

Sql Server to Oracle migrations can be done in several ways. There are tools that can be purchased, like this good one:

striim -> www.striim.com

Migrating Sql Server to Oracle is mostly a logical migration work, right? You can even write your own tool to migrate your data from Sql Server to Oracle.

In this post, however, I will give you an approach which is based on using Oracle Sql Developer 19.2.

This tool has great features, and it is fun to play with them..

In this post, Sql Developer's Migration Wizard & Copy to Oracle features will be our focus while going through the migration process.

Let's see our source and target environments:


Quite realistic, right? :) An 11.2.0.4 customer should consider upgrading their databases to a supported version, maybe a long-term supported one like 19c.. Anyways, we usually don't migrate the database and upgrade it at the same time, so let's not lose our focus.

As I mentioned, the migration operations are performed using SQL Developer as the migration tool. We run Sql Developer on Linux, but it can be run on Windows too...

Sql Developer does this migration task in 3 steps...

Normally, all of these 3 steps can be performed ONLINE.
However, in our case, the GENERATE and DATA MOVE steps are performed OFFLINE..
This is done on purpose, by the way..
We just needed to intervene in the process, as some of the table formats, and most of the time the naming used in the source, couldn't be migrated directly to Oracle. So we needed to do some manual work by stopping the Sql Developer migration and then letting it resume once we completed our manual actions.


PREREQUISITIES
  • Install SQL Developer (19.2.0) for Linux x86-64. Download SQL Developer from the following link.
https://download.oracle.com/otn/java/sqldeveloper/sqldeveloper-19.2.1.247.2212.noarch.rpm
  • Install SQL Developer as the root user.
          rpm -ivh sqldeveloper-19.2.1.247.2212.noarch.rpm
  •  Download the jar file from the link below and follow the steps. 
https://sourceforge.net/projects/jtds/

  • Add jar to SQL Developer as follows;

SQL Developer > Tools > Preferences > Database > Third Party JDBC Drivers > Add Entry


  • Connect to SQL Server Database as a privileged user;
Add connection information as follows to “SQL Developer > File > New Connection “

Name : SQL
Username : user
Password : password
Hostname : ermanserver1
Port : 1433


  • Create Oracle Database Migration Repository

A database migration repository user/schema is created on the target Oracle Database so that we can perform all the data migration steps.

CREATE USER MIGREP IDENTIFIED BY MIGREP DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
ALTER USER MIGREP QUOTA UNLIMITED ON USERS;
GRANT ALTER ANY ROLE, ALTER ANY SEQUENCE, ALTER ANY TABLE, ALTER TABLESPACE, ALTER ANY TRIGGER, COMMENT ANY TABLE,CREATE ANY SEQUENCE,CREATE ANY TABLE,CREATE ANY TRIGGER,CREATE ROLE,CREATE TABLESPACE,CREATE USER, DROP ANY SEQUENCE, DROP ANY TABLE, DROP ANY TRIGGER, DROP TABLESPACE, DROP USER, DROP ANY ROLE, GRANT ANY ROLE, INSERT ANY TABLE, SELECT ANY TABLE, UPDATE ANY TABLE, DBA TO MIGREP;
GRANT ALL PRIVILEGES,CREATE ROLE , ALTER ANY TRIGGER,CREATE USER, CONNECT, RESOURCE, CREATE VIEW, CREATE MATERIALIZED VIEW, CREATE PUBLIC SYNONYM TO MIGREP WITH ADMIN OPTION;

  • Associate Migration Repository
After the migration repository user is created, we establish our Oracle Database connection via MIGREP schema using the SQL Developer. After that, we associate our migration repository by clicking on the Associate Migration Repository button, as shown below;


MIGRATION

Migration consists of 3 steps: Capture, Generate and Data Move.

CAPTURE

In the Capture step, we connect both to the SQL Server database and to the Oracle database via Sql Developer. After establishing our connections, we start our capture by clicking the Migrate button as shown below. -- As a result, a master SQL script gets created. This master SQL script is used to create the database objects in the target Oracle Database in the next step.

"SQL Developer Tools > Migration > Migrate"


The migration wizard requests some information for the capture phase.
As shown below, we populate the related screens and start our migration;

Specify a name and directory for the migration project... The migration files are located in this directory.


Source Database - Select Online Mode for Source Database (SQL Server)


Choose the schema to be migrated. e.g. INTERFACE schema



Conversion Matrix: in the Convert menu, we can modify the data type conversion matrix. Although the Oracle Data Type column is modifiable, it is okay to continue with the default options (at least, it was okay in our case).
However, we need to modify some settings under the Advanced Options tab. So we click the Advanced Options button..


In File Creation Options, we select the "One Single File" option.
In General Options, we uncheck "Generate Comments, Latest Privilege Schema Migration, Generate Failed Objects".
We check "Generate Stored Procedure for Migrate Blobs Offline, Create Users, Generate Separate Emulation Package".
We also uncheck  “Target Oracle Database 12c” as our Target database version is 11g.
Object Types: In our case we move only the data, the tables, so we check “Tables“ only.


Identifier Options : In Identifier Options window, we check only the checkbox named “Microsoft SQL Server: Is Quoted Identifier On“.


Translators: We uncheck all Oracle Database 12c Features.


Translate: SQL Objects : We move all the Sql Objects to the left side (In our case we migrate only the tables to the target system.)


Target database: Online generation doesn't really work for us, therefore we select the offline mode for the GENERATE step. This is mainly because we need to modify the generated script (when necessary) and run it manually.


Move Data: As we selected the offline mode for the GENERATE step, we naturally select the offline mode here as well.


Summary: We  review the migration summary and continue..


Editing and Running the "Master.sql":

After completing the Migration Wizard, a master.sql is created in the Project Directory (note that we defined the Project directory before, in the relevant migration wizard screen).

The Project directory contains folders named “generated” and “datamove”. The "master.sql" is created in the "generated” folder.


The master.sql is the one that is used for creating the users/schemas/objects in the target Oracle Database.
We execute this "master.sql" manually, after we change the passwords in it to the new passwords we prefer.
We just run the master.sql using sqlplus.. If any errors are received while running the master.sql script, they should be carefully examined.

What do we modify in master.sql? Well, it depends.. We replace the Turkish characters used in table and column names; basically this kind of stuff.. We edit the master.sql in a way that makes the DDL statements inside it compatible with Oracle.. (Note that our target system version (Oracle Database version) was 11.2.0.4)

GENERATE:

After the capture step, we continue with the Generate step.
In the Generate step, we need to install the SQL Server sqlcmd and bcp command-line tools on Oracle Linux 6 (in our case, the host that we ran Sql Developer on was an OEL 6).
After these steps are completed, we run the "MicrosoftSQLServer_data.sh" script on our host.
The script exports data from the source database to several files.
Note that, before running this script, the Linux environment variables "LANG" and "NLS_LANG" must be defined properly in our shell.

Installing SQL Server sqlcmd and bcp Utility:

sudo su
curl https://packages.microsoft.com/config/rhel/6/prod.repo > /etc/yum.repos.d/mssql-release.repo
exit
sudo yum remove unixODBC-utf16 unixODBC-utf16-devel #to avoid conflicts
sudo ACCEPT_EULA=Y yum install msodbcsql
# optional: for bcp and sqlcmd
sudo ACCEPT_EULA=Y yum install mssql-tools
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc
# optional: for unixODBC development headers
sudo yum install unixODBC-devel

Running “MicrosoftSQLServer_data.sh”

export LANG=tr_TR.ISO-8859-9   (in our case)
export NLS_LANG=Turkish_TURKEY.WEISO8859P9 (in our case)
cd /home/oracle/LOG/01102019/INT_ITE1/datamove/2019-10-01_14-28-24
./MicrosoftSQLServer_data.sh <ip_address_of_sqlserver> <user> <password>

DATA MOVE:

Once the generate step is completed, we have the data exported from Sql Server in several files located in the host that we run our Sql Developer on.
In the Data Move phase, we import this data to Oracle.
Generation uses sqlcmd and bcp utilities to export the data from Sql Server.
Data Move uses sqlldr (SQL Loader) to import this data to Oracle.

In the Data Move step, we copy all the files to the target database host, and execute the oracle_loader.sh from there.
Before running this script, Linux environment variables "LANG" and "NLS_LANG" must be defined properly in our terminal.

 Running oracle_loader.sh

export LANG=tr_TR.ISO-8859-9 (in our case)
export NLS_LANG=Turkish_TURKEY.WEISO8859P9 (in our case)
cd /home/oracle/LOG/01102019/INT_ITE1/datamove/2019-10-01_14-28-24
./oracle_loader.sh <db_name/TNS_Entry_of_oracle>  <user>  <password> 

--note that we run this oracle_loader.sh script from a host where an ORACLE_HOME is installed (recommended: the target database host). That's because when an Oracle Home is installed, we get SQL Loader installed as well. Note that we also populate the tnsnames.ora file in that Oracle Home with the TNS entry of our target Oracle database (as you may have already seen in the example command above, we use the TNS entry of the Oracle database as the first command-line argument while running oracle_loader.sh).

COMPARISON:

After the data move, we can say that our migration is completed. But we still need to compare the data. We need to ensure that the data stored in the source (Sql Server) was properly migrated to the target (Oracle)..
So, after completing the migration steps, we must compare the metadata and data between the source and target databases.

Compare Table Count

We compare the table count with the following sql queries.

SQL Server Table Count SQL

SELECT Count(*)
FROM erman_db.information_schema.TABLES
WHERE table_type = 'BASE TABLE'

Oracle Table Count SQL

SELECT Count(owner) AS "TABLE COUNT"
FROM dba_objects
WHERE owner LIKE ( '%ERMAN_DB' )
AND object_type = 'TABLE';

Compare Row Count

We compare the row count of the migrated tables with the following sql queries.

SQL Server Table Row Count SQL 

SELECT Sum([partitions].[rows]) AS [TotalRowCount],
[tables].NAME AS [TableName],
Schema_name(schema_id) AS [SchemaName]
FROM sys.tables AS [Tables]
JOIN sys.partitions AS [Partitions]
ON [tables].[object_id] = [partitions].[object_id]
AND [partitions].index_id IN ( 0, 1 )
GROUP BY Schema_name(schema_id), [tables].NAME

Oracle Table Row Count SQL

SELECT num_rows,
table_name,
owner
FROM dba_tables
WHERE owner LIKE ( '%ERMAN_DB' )
ORDER BY num_rows DESC;
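
Note that num_rows in dba_tables is populated by the optimizer statistics, so the statistics should be fresh before relying on this comparison; a minimal sketch (ERMAN_DB is just an example schema name):

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'ERMAN_DB');
END;
/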

KNOWN ISSUES:

In our tests, we saw some cases where some of the tables could not be migrated directly by using Sql Developer's Migration Wizard.
They could not be migrated due to errors that Sql Developer or Sql Loader (in the background) got during the migration process.

The errors that we recorded during our tests are listed below;

ERROR : SQL*Loader-466: Column PROMPT does not exist in table XXXX.
ERROR : 2 Rows not loaded because all fields were null. (XXX)
ERROR : 74 Rows not loaded because all fields were null.
ERROR : SQL*Loader-350: Syntax error at line 4. Illegal combination of non-alphanumeric characters into table XXXX
ERROR : Record 71: Discarded - all columns null.
ERROR : ORA-01400: cannot insert NULL into (XXX.XXX)
ERROR : Error on table XXXX, column YYYY
ERROR : SQL*Loader-500: Unable to open file (XXX ) 
ERROR : SQL*Loader-553: file not found S
ERROR : SQL*Loader-509: Systemerror:No such file or directory 
ERROR : SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
ERROR : Field in data file exceeds maximum length
ERROR : Rejected - Error on table XXX, column YYY. Field in data file exceeds maximum length

Actually, by looking at the error messages, we can directly tell the causes. These causes can sometimes be fixed by editing the SQL Loader files or by taking some actions on the source (on the Sql Server side).. However, we didn't take any manual actions to fix these errors.

What we did is, we just used the "Copy to Oracle" method for the failing tables, and it worked.
We dropped the failed tables which contained rows in the target, and used Copy to Oracle to re-migrate them.
We didn't drop the failed tables which didn't contain any rows in the target; we used Copy to Oracle to re-migrate them directly.
Somehow, Copy to Oracle, which uses sqlcl in the backend, could migrate these kinds of table data properly. This may also mean that the "Copy to Oracle" feature of Sql Developer is a little more stable than the Migration Wizard :)

Monday, October 28, 2019

ODA-- Health Check Reports // the control points & the subtitles

You probably already guessed that I'm doing ODA health checks nowadays.
Mostly, these health checks are done on virtualized platforms, but they are applicable to bare metal platforms too.


In this blog post, I will briefly introduce the things that I check in an ODA Health check.
I usually check these control points, try to detect any errors or misconfigurations, prepare my action plans and suggestions, and lastly create my health check report accordingly.
Almost forgot :) I also write a conclusion for managers... One or two paragraphs which can be understood without having deep technical knowledge about ODA and its components. A plain and simple conclusion summarizing the controls and action plans.

So here is the list of the subtitles that I use in my ODA health check reports.. (note that they are based on my control points)
  • Current, minimum Supported and Recommended versions
  • Hardware checks
    • ILOMs
    • BIOS information
    • Network Cabling
    • Power Cabling
    • LED panels (the lights on the front panel of the ODA nodes)
  • RDBMS
    • Parameters
    • Performance (mostly based on the AWR reports)
    • Backup policy
    • Errors and traces
    • Version, critical one-off patches and PSU level
  • GRID
    • ASM configuration
    • ASM errors and traces
    • Version, critical one-off patches and PSU level
    •  Cluster resources and cluster interconnect
  • Operating System (OVM)
    • Resource consumption
    • utilization
    • load
    • errors and traces
    • configuration (network configuration included)
    • OAKD & XEND analysis
    • boot logs, last command output
    •  SAR command outputs
  • SOS report outputs
  • OAK Validate outputs
  • ORACHK outputs

ODA X6-2 HA -- Virtualized // Network Architecture / Internal NFS and fault tolerant cabling

Recently dealt with a strange ODA issue.. The environment was a virtualized ODA X6-2 HA, and the customer was complaining about the strange networking behaviour of the machine..
The issue appeared during a network H/A switch test...
That is, this ODA machine was connected to 2 network switches (good ones) to provide full-path network fault tolerance.

As you may already know, this type of machine, I mean the ODA X6-2 HA, has 4-port ethernet on each of its compute nodes (2 nodes x 4 ports). This makes a total of 8 ports, and when cabled correctly to the customer switches, it must supply at least something (in terms of fault tolerance..)

Okay, what do we expect from this kind of a configuration?

Actually, we expect fault tolerance against port failures, against cable failures and even against a switch failure..


Well.. As you may guess, this environment wasn't behaving as expected, and that's why this issue was escalated to me...

Before giving you the solution (which is actually a basic one), I will try to give you the big picture...
What I mean by the big picture is the network architecture of a virtualized ODA.
I will give you that info because I see there is a misconception there.

This kind of a misconception may be expected, actually.. These machines have an architecture that contains lots of things.. Not only hardware-related things, but also software-related things (installations and configurations).

Although ODA environments can be installed and configured easily, there are InfiniBand cards, private networks, NFS shares between domains, ASM, ACFS, OVM, shared repositories, bonding, eth devices, virtual networks and virtual bridges installed and configured on them.

Let's start with the interconnect and the internal NFS, which is exported from ODA_BASE to DOM0.

The interconnect is based on InfiniBand.. (it depends on your order of course, but most of the time it is based on InfiniBand.)

So the RAC interconnect is based on InfiniBand, and the cluster_interconnects parameter should be set accordingly for the RAC databases running on the privileged ODA_BASE domain.

The NFS shares that store the shared repositories and the virtual machine files are a little interesting.
ODA_BASE reaches the shelf, actually the disks, creates ACFS/ASM filesystems on top of them, and then shares these filesystems with DOM0.

So, yes, these NFS shares are exported from a child domain (ODA_BASE) to the main domain/hypervisor level (DOM0).

The role of these shares is very important.. They basically store the virtual machine files, and this means that when they are down, the virtual machines are down.

By the way, these NFS shares are not based on the InfiniBand!
These NFS shares are not based on the ethernet either..
These NFS shares are actually based on the virtual bridges, a logical network between ODA_BASE and DOM0.. So this means the network traffic on these NFS shares is based on memory copies. No physical devices are in the path..

Note that the virtual bridges are in memory; they are code, not physical network hardware.
There is no physical medium (i.e. an Ethernet cable) when we talk about these types of bridges. So their capabilities are based on the limits of the CPUs moving the packets between the virtual NICs. This configuration should give the best performance.. However, we need to take NFS and TCP traffic-related bottlenecks into account as well..
In other words, even if this traffic is based on TCP and NFS in the upper layers, it is actually based on memory operations between the VM domains in the background.

There is another thing to be aware of..
These NFS shares run on virtual IP addresses (VIPs). These VIP addresses are dynamic cluster resources, as you know.

So, if ODA_BASE fails on node1, the VIP of ODA_BASE node1 is transferred to ODA_BASE node2. However, at this time, the routes change.. In this case, the NFS traffic between node2 and DOM0 of node1 goes through the InfiniBand. (the route outputs of the DOM0 and ODA_BASE machines support this info as well.)



Yes, by knowing this, we can say that even when there is a problem in the InfiniBand or ethernet-based network of the ODA, the NFS shares (which are very crucial) will continue to work! The VMs created on the ODA will keep running..
  
Okay, let's take a look at the ethernet side of this.

As mentioned in the beginning of this blog post, there are 4 ethernet ports in each one of the ODA nodes.

When we look from the DOM0 perspective, we see that in each ODA node we have eth0, eth1, eth2 and eth3.

These devices are paired and bonded (in active-backup mode in Dom0).

eth0 and eth1 are the slaves of bond0.
eth2 and eth3 are the slaves of bond1.
bond0 is mapped to the virtual bridge named net1, and net1 is mapped to ODA_BASE.
bond1 is mapped to the virtual bridge named net2, and net2 is mapped to ODA_BASE.

ODA_BASE machines by default use net1, so in the background they use bond0 in Dom0 (that is, eth0 and eth1 in Dom0).

net2 is not configured by default, so we must do an additional configuration to make use of it.

In order to do this configuration, we can use "oakcli configure additionalnet", or we can use a manual method (editing the ifcfg files manually).

Anyways, suppose we have all 4 ports cabled and eth1 is enabled on ODA_BASE. Suppose we have 2 switches in front of the ODA and we connect our cables to these switches randomly (2 cables to each of the switches per ODA node)... So now, are we on the safe side? Can we survive a port failure, can we survive a switch failure?

Well, that depends on the cabling.. This is for sure, as this is the cause that made me write this blog post..

That is, we need to have some sort of cross cabling...

We need to ensure that eth0 (in DOM0) of a node and eth1 (in DOM0) of that same node are connected to different switches.

If we cable them into the same switch, then we can't survive a switch failure, because our bond will be down.. And if it is bond0, we will lose all the database traffic in the ODA_BASE of the relevant node..

There is RAC, yes! RAC makes us survive even this kind of a disaster, but why create an extra job for RAC? Why lose our active resource because of a network problem :) ?
By the way, even with RAC, the application design (TAF etc..) should be done properly in order to have zero downtime in the application layer.

Anyways, the cause that made me write this blog was incorrect cabling, and here I'm giving you the correct cabling for the eth devices of the ODA nodes.