Friday, August 9, 2019

Weblogic -- Disaster Recovery implementations

No matter whether the application code is delivered by Oracle or not, we do see Weblogic in our implementation projects.

Just as FMW products such as OAM, OID and SOA use it as a built-in application server, other important Oracle applications like EBS make use of the enhanced capabilities of Weblogic Server in their application tiers.

In addition to these packaged Oracle applications, we also see custom Java code running on Weblogic, and new projects are deployed on it.

Most of the applications residing on Weblogic/the apps tier also have a database layer for storing and querying data.

When it comes to deciding on the DR implementation, we easily settle on the database disaster recovery method, don't we?

For instance, we directly decide to use Data Guard if the databases in the source and target environments are both Oracle. Only if the database is not an Enterprise Edition Oracle Database (which means Data Guard cannot be used) do we think about alternative solutions.

So far so good.

However, deciding on and building a correct disaster recovery solution for the Weblogic/apps tier is usually a little bit complex for us, the DBAs.

In this post, I will shed some light on this subject by giving you the required method and the prerequisites for it.

First of all, a Weblogic DR implementation can basically be done by replicating the Weblogic filesystem.

We can do it either with storage replication (like NetApp's SnapMirror) or by using a 3rd party tool (like rsync).

If we don't have a storage environment which is capable of replicating the Weblogic filesystem across storages, then we can still implement our DR using a tool like rsync.

The replication that feeds the DR environment should be an as-is replication.
Such a replication should be done automatically (for instance using a scheduler like crond), and it is recommended to replicate the Weblogic filesystem at least once a day.

This replication can be done while the Weblogic application server is running.
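
For illustration, a minimal sketch of such a scheduled replication is given below. The paths, the drhost name and the oracle OS user are my assumptions, so adapt them to your environment;

#!/bin/bash
# replicate_weblogic.sh -- a minimal sketch only; paths, host name and OS user are assumptions.
# Pushes the Weblogic filesystem as-is from the primary host to the DR host.

SRC="/u01/app/weblogic/"                     # Weblogic home + domain directories on the primary
DEST="oracle@drhost:/u01/app/weblogic/"      # peer directory on the DR host (ssh keys already exchanged)

# -a preserves permissions/ownership/timestamps, -z compresses, --delete keeps the DR copy identical
rsync -az --delete "$SRC" "$DEST" >> /var/log/weblogic_dr_rsync.log 2>&1

# Example crond entry to run it once a day at 02:00:
# 0 2 * * * /home/oracle/replicate_weblogic.sh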

Patching activities should be done on the primary first.

After a successful patching operation, the replication routine should be triggered manually to reflect the changes directly to the Weblogic DR filesystem. (considering the DB is already being replicated in the backend -- a manual database synchronization can also be triggered after these patching activities)

As for the switchover and failover operations:

If the hostname of the primary Weblogic Server and the hostname of the DR site Weblogic Server are the same, then we can start the services directly, without doing anything extra, in case of a failover or switchover. (of course we need to change the direction of the replication)

However, if the hostname of the primary Weblogic Server and the hostname of the DR site Weblogic Server are different, then we need to configure a virtual hostname for this Weblogic environment. We need to configure it both for the Admin Server and the Managed Servers.

That is, the listen address of the Admin and Managed Servers should be based on the virtual hostname, and that virtual hostname must be resolvable both from the Primary and the DR site. Thus, even if the physical hostnames are different, we can still do a failover by just starting the services on the DR site, without having to do anything extra.
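
For instance, such a virtual hostname can simply be defined in the /etc/hosts files of both sites, pointing to the local physical IP on each side. (the names and IPs below are just hypothetical examples)

# /etc/hosts on the Primary site host -- hypothetical name and IP:
# 192.168.1.10   wlsvhost.mydomain.com   wlsvhost
#
# /etc/hosts on the DR site host:
# 192.168.2.10   wlsvhost.mydomain.com   wlsvhost

# Quick check that the virtual hostname resolves on the site you are on:
ping -c 1 wlsvhost.mydomain.com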

If we have different hostnames for the Primary and Disaster sites and we don't have a virtual hostname configured, we need to do some config changes in case of a failover or switchover (config changes in files such as config.xml).
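
As a rough illustration of that kind of a change; the domain path and the hostnames below are just my assumptions, and a backup of config.xml should be taken first, as in the sketch;

#!/bin/bash
# A rough sketch only -- domain path and hostnames are hypothetical.
DOMAIN_HOME="/u01/app/weblogic/user_projects/domains/base_domain"
OLD_HOST="prodwls.mydomain.com"
NEW_HOST="drwls.mydomain.com"

# Back up config.xml on the DR site, then rewrite the listen addresses in it
cp "$DOMAIN_HOME/config/config.xml" "$DOMAIN_HOME/config/config.xml.$(date +%Y%m%d%H%M)"
sed -i "s/$OLD_HOST/$NEW_HOST/g" "$DOMAIN_HOME/config/config.xml"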

Of course, the same logical requirements apply for a database failover as well (if that database is used by a Weblogic Server).

In a case where we switch over or fail over the database tier of a Weblogic installation, we need to change the database configuration of our Weblogic environment. (we have Data Sources, right..)

Again, if we use a virtual hostname for the db tier, or if we use a load balancer and configure our database and Weblogic to use the hostname which is managed by the load balancer, then we can do our database layer failover by just starting/activating the DR site database, without having to do anything extra.
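
If the data sources do point at the physical database hostname, the kind of change needed after a database switchover/failover could look like the sketch below. (the domain path and the hostnames are my assumptions; the data source descriptors typically live under $DOMAIN_HOME/config/jdbc)

#!/bin/bash
# A minimal sketch only -- domain path and database hostnames are hypothetical.
DOMAIN_HOME="/u01/app/weblogic/user_projects/domains/base_domain"
OLD_DB_HOST="proddb.mydomain.com"
NEW_DB_HOST="drdb.mydomain.com"

# Point every data source URL at the DR database host (back up each descriptor first)
for ds in "$DOMAIN_HOME"/config/jdbc/*.xml; do
  cp "$ds" "$ds.bak"
  sed -i "s/$OLD_DB_HOST/$NEW_DB_HOST/g" "$ds"
done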

I hope you get the idea.

Lastly, I will give you the list of actions that can be taken to do a failover or a switchover operation on a recommended Weblogic-Database configuration (a small scripting sketch follows the list);

To perform a failover or switchover from the production site to the standby site when you use rsync:

*Shut down any processes running on the production site (if applicable).
*Stop rsync jobs between the production site hosts and standby site peer hosts.
*Use Oracle Data Guard to failover the production site databases to the standby site.
*On the standby site, manually start the processes for the Oracle Fusion Middleware Server instances.
*Route all user requests to the standby site by performing a global DNS push or something similar, such as updating the global load balancer.
*Use a browser client to perform post-failover or post-switchover testing to confirm that requests are being resolved at the standby site (current production site).
*At this point, the standby site is the new production site and the production site is the new standby site.
*Reestablish rsync between the two sites, but configure it so that replications now go in the opposite direction (from the current production site to the current standby site).
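
The sketch below shows how a few of these steps might look when scripted on the standby site. The db_unique_names, the credentials, the managed server name and the domain path are just my assumptions, and the DNS/load balancer step is intentionally left out;

#!/bin/bash
# A rough sketch only -- not a complete switchover procedure.

# 1. The scheduled rsync job on the old production site should already be disabled
#    (e.g. by commenting out the replicate_weblogic.sh entry in crontab).

# 2. Switch the database roles with the Data Guard broker (db names/credentials are hypothetical)
dgmgrl sys/welcome1@proddb "SWITCHOVER TO 'stbydb'"

# 3. Start the Weblogic processes on the standby (new production) site
DOMAIN_HOME="/u01/app/weblogic/user_projects/domains/base_domain"
nohup "$DOMAIN_HOME/bin/startWebLogic.sh" > /tmp/adminserver.out 2>&1 &
nohup "$DOMAIN_HOME/bin/startManagedWebLogic.sh" MS1 t3://wlsvhost.mydomain.com:7001 > /tmp/ms1.out 2>&1 &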

Friday, August 2, 2019

Bash -- Reading the alert log, Writing to the syslog, awk, logger, syslogd

Today, I want to share a bash script that I wrote yesterday.
This kind of a bash script was required in a project, where we needed to read the alert log line by line and, without missing anything, write all the Oracle alert log messages to the syslog.
(Linux -- /var/log/messages in our case)
After a long time, I found myself writing scripts again. This was fun! :)

I mainly used the awk and logger utilities for the read and write operations.
I used bash functions to make the code a bit more understandable.

The things that I do with this script are:

I check my saved counter, which is stored in a file, to see where I left off (the last line of the alert log that I read in my previous run).
If I find a gap, then I read the gap by using a loop, increment my line counter and save it.
If I find only a single line to be read, then I read it with a single instruction, increment my line counter and save it.
If I find my saved counter and alert log line count are in sync, then I almost don't do anything.
After I read the line or lines of the alert log, I write them to syslog (/var/log/messages, using the logger utility through syslogd -- using the local0 facility with the notice priority).

- Note the following:
I wrote this script in 30 minutes and I am sharing it with you just to give you an idea.. This kind of a script can be modified to be better.
This script can be modified to include a check pattern or to include a transformation routine before writing the alert log contents to the syslog.
The script can be modified to read the paths and everything from variables.
This kind of a script can be required when our customer wants to see Oracle's alert log messages almost realtime in /var/log/messages.
This kind of a script can be daemonized with a while loop, or it can be scheduled using crontab.
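
For example, a crond entry like the one below (the script name/path is just an assumption) would run the check every minute;

# Hypothetical crontab entry -- runs the alert log checker every minute
* * * * * /root/script_erman/alert_to_syslog.sh > /dev/null 2>&1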

I'm sharing the script below. I hope you find it useful...

ALERT LOG CHECKER - SYSLOG WRITER

### The script starts here
#!/bin/bash
### Function Definitions

# Read the current line count of the alert log and the line number saved during
# the previous run, then calculate the first line that needs to be read.
# (the savepoint file is expected to exist; it stores the last line read)
initialize_counters()
{
let last_line=`wc -l /u001/u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log | awk '{print $1}'`
let saved_line=`cat /root/script_erman/counter_savepoint`
let check_number=$saved_line+1
}
# Read all the lines written to the alert log since the last run (the gap),
# collect them in a temporary output file and send them to syslog.
read_multiple_lines()
{
> /root/script_erman/output
let tail_line=$last_line-$saved_line
for (( c=$tail_line; c>0; c-- ))
do
let awk_line=$last_line-$c+1
awk "NR==$awk_line" /u001/u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log >> /root/script_erman/output
if [ $? -ne 0 ]
then
exit
fi
done
cat /root/script_erman/output | write_to_syslog
}
# Read only the single new line of the alert log and send it to syslog.
read_single_line()
{
awk "NR==$last_line" /u001/u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log > /root/script_erman/output
if [ $? -ne 0 ]
then
exit
else
cat /root/script_erman/output | write_to_syslog
fi
}
# Nothing new was written to the alert log since the last run; just record that fact in syslog.
do_almost_nothing()
{
echo "NO ERRORS found in alert log, no log recorded into the Alert log since the last check" | write_to_syslog
exit
}
# Write whatever comes from stdin to syslog (local0.notice) through syslogd.
write_to_syslog()
{
logger -t oracle/DATABASEALERT -p local0.notice
}
# Save the number of the last line read, so the next run knows where to start.
checkpoint_to_savepoint()
{
echo $last_line > /root/script_erman/counter_savepoint
exit
}

### Script's main

initialize_counters
if [ "$last_line" -eq "$check_number" ]
then
read_single_line
checkpoint_to_savepoint
elif [ "$last_line" -lt "$check_number" ]
then
do_almost_nothing
else
read_multiple_lines
checkpoint_to_savepoint
fi

### The script ends here