Wednesday, May 17, 2017

RAC ASM -- Disaster Recovery solution (using storage techniques) snapshot/snapmirror, Netapp

This blog post is about an alternative Disaster Recovery solution that I  recently built through a POC (Proof of Concept).
The source system was a 2 Node Oracle RAC (Grid + RDBMS both 11.2.0.4) and the target system residing in DR was a 1 Node environment (GRID + RDBMS both 11.2.0.4).
Both source and target was using ASM for storing all the db files.
This made me think a little bit, becuase this alternative disaster recovery solution was relying on Netapp snapshot and snapmiror mechanisms.
We were planning to put the database to the backup mode (hot backup mode
/alter database begin backup), take the snapshot on the local storage, transfer the snapshot to the DR storage using snapmirror , map the luns created by this snapshot/snapmirror to the target server, end backup in target and open the database successfully in the target.

ASM was my concern, because  I had never done this before.
Actually I have implemented (even automated) this solution 6 years ago, but the filesystems were ext3 ( a cooked one) in that case.

So, I was afraid that ASM header info might get corrupted during the snapshot/snapmirror operations.
However, everything went well.

It is ofcourse not supported by Oracle. However,  I can say that ASM diskgroups can be replicated using the snapshot/snapmirror operations. (ofcourse if you know what you are doing..)

I did 3 tests and didn't encounter any problems.

In this blog post, I will give you the things that I did to replicate a 2 Node RAC system(Grid/ASM+RDBMS) to a single Node System(Grid/ASM+RDBMS) using Netapp's snapshot and snapmirror functionalities.

Following is a representative picture, explaining the work done.
As shown in the following picture, we take the snapshot of the ASM disks, replicate it to the DR using snapmirror and then map the luns to the target server running on the DR site, and make ASM running on the target server to see those disks...



The operation that was done for this, requires both Linux admin and storage admin skills. ( I don't see a need for mentioning the required DBA skills :) )
It requires Linux admin skills, because we do stuff like formatting the drives (we are working on new luns since it is a POC), rescanning the scsi bus and etc.
It requires the Storage admin skills, because we do stuff like taking snapshots, initiating snapmirror, mapping luns and etc.
Lastly, it requires DBA skills, because we put the database to the backup mode, change some init.ora parameters, make the replicated disks be known by ASM, open our database succesfully in the target node (recovery may ve needed), check the log files in case of an error and so on.

Let's see what I did in this POC ->
Note that, this is a technical overview. (doesn't have to include all the steps that I have done and all the commands that I have used).

RAC to SINGLE STORAGE REPLICATION (ASM diskgroup which is built on top of 3 disks)

1. Storage team creates 3 ASM disks (each sized approx. 13gb) and maps these luns to our Linux RAC nodes. (source nodes)
We do the scsi rescan to make Linux see these luns/disks, and then we create partition on them.  (oracleasm createdisk requires partition)

fdisk output just after the scsi rescan:

Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000


Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000


Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Attention, before discovering the disks we use fdisk to create partitions on them.
fdisk output just after the partition creations:

Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbk1               1       11264    11534320   83  Linux

Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbl1               1       12288    12582896   83  Linux

Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf

    Device Boot      Start         End      Blocks   Id  System
/dev/sdbm1               1       13314    13633520   83  Linux


2. We use oracleasm to create the ASM disks and again using oracleasm, we make ASM see them.

[root@source01 ~]# oracleasm createdisk DEV1 /dev/sdbk1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV2 /dev/sdbl1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV3 /dev/sdbm1
Writing disk header: done
Instantiating disk: done

[root@source01 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
[root@source01 ~]# oracleasm listdisks

DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1 ** new disk
DEV2 ** new disk
DEV3 ** new disk
FRA1
FRA2
TEST1

3. In order to make these new ASM disks available in source node 2 (remember this is RAC) , we rescan the scsi and use oracleasm scandisks in source node 2 as well..

root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host0/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host1/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host2/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host3/scan
[root@source02 ~]#  echo "- - -" > /sys/class/scsi_host/host4/scan
[root@source02 ~]# oracleasm  scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV2"
Instantiating disk "DEV1"
Instantiating disk "DEV3"
[root@source02 ~]# oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1  ** new disk
DEV2  ** new disk
DEV3  ** new disk
FRA1
FRA2
TEST1

4. After these scsi and ASM disks become available on both of the RAC nodes, we create a diskgroup  called +DEV on these new 3 disks using ASMCA . After that we create a 2 node RAC database called DEVRAC and we configure this database and make all its files (controlfiles,redologs,datafiles etc..) be put into +DEV diskgroup.

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 13:39:43 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select instance_name from gv$instance;

INSTANCE_NAME
----------------
DEVRAC1
DEVRAC2

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf

SQL> select name from v$controlfile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/control01.ctl
+DEV/devrac/control02.ctl

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+DEV/devrac/redo02.log
+DEV/devrac/redo01.log
+DEV/devrac/redo03.log
+DEV/devrac/redo04.log


5. At this point, in order to make the necessary checks in the target environment, we create a table called ERMAN in the source. This table resides on a tablespace called ERMAN. (all the datafiles of this tablespace called ERMAN reside in +DEV diskgroup)
So, we first create a tablespace called ERMAN by creating all its datafiles on diskgroup +DEV. Then, we create a table named ERMAN in the tablespace called ERMAN . (lots of ERMANs.., I know :)

SQL> CREATE TABLESPACE ERMAN DATAFILE '+DEV';

Tablespace created.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731


SQL> create table ERMAN tablespace ERMAN as select  * from dba_objectS;

Table created.

SQL> insert into ERMAN select * from ERMAN;

86293 rows created.

SQL> commit;

Commit complete.

SQL> insert into ERMAN select * from ERMAN;

172586 rows created.

6. We put our database (DEVRAC) to backup mode/hot backup mode and while it is in backup mode, we create a storage snapshot and then do a snapmirror update. After the snapmirror update, we create the luns on the target storage system (using the snapmirrored data) and map these luns to the target database node. After the lun mapping, we rescan the scsi on target node (Linux), and make ASM (running on target node) see these new luns (using oracleasm)

root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host4/scan

Disk /dev/sdn: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436

   Device Boot      Start         End      Blocks   Id  System
/dev/sdn1               1       11264    11534320   83  Linux
Disk /dev/sdo: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd

   Device Boot      Start         End      Blocks   Id  System
/dev/sdo1               1       12288    12582896   83  Linux

Disk /dev/sdp: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdp1               1       13314    13633520   83  Linux

[root@targetsinglenode ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV1"
Instantiating disk "DEV2"
Instantiating disk "DEV3" 

7. After making the snapmirrored disks available on the target host, we modify the init.ora, which we copy from the source. We modify it to exclude the RAC related parameters, because the target node is a single node.
After modifying the init.ora, we mount the diskgroup named +DEV to the target node's ASM and we startup the DEVRAC database as single node database in our target node.
Note that, after opening the database, we check the table named ERMAN in the target instance to ensure that it is replicated successfully.

[oracle@targetsinglenode ~]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:26:35 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option

SQL> alter diskgroup DEV mount;


SQL> startup mount;
ORACLE instance started.

Total System Global Area 2355015680 bytes
Fixed Size                  2255472 bytes
Variable Size             587203984 bytes
Database Buffers         1761607680 bytes
Redo Buffers                3948544 bytes
Database mounted.
SQL> alter database end backup;

Database altered.

SQL> alter database open;

Database altered.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731

6 rows selected.

SQL> select count(*) from ERMAN;

  COUNT(*)
----------
    172586

SQL> alter system checkpoint;

Note: After opening the database, we monitor the alert logs and check the related log files for both RDBMS and GRID and check whether any errors recorded in them.. In conclusion, we don't see any errors.

8. Just in case, we use the replicated database a little bit. We do some read write tests by inserting some data using a loop, switching the log files and so on.. During these tests, we do not encounter any abnormalities.

SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:57:11 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> create table ERMAN2 as select * from ERMAN;
BEGIN
    FOR x IN 1 .. 10 LOOP
         INSERT INTO ERMAN2
         SELECT * from ERMAN;
    END LOOP;
END;
Table created.

SQL>   2    3    4    5    6
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> commit;

Commit complete.

SQL> archive log all;
ORA-00271: there are no logs that need archiving
SQL> alter system checkpoint;

System altered.

SQL> alter system switch logfile;

System altered.

8. Actually, at this point the POC is successfully finished, but I want to give you a bonus info about the deletion of the databases and other things that we created during this POC.
Basically, we drop the databases using rman. We drop the diskgroups using the ASMCA and we clear these candidate disks using oracleasm(deletedisk).
You can see the "rman drop database method" in the the following example.. The method that I give in the following example is for a single node database, but it can be used in RAC environments as well.. ( note: in order to drop a RAC database using rman, the parameter named cluster_database must be set to "false" before executing the drop database command)

[oracle@targetsinglenode trace]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 15:01:06 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved. 

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> shu immediaTE;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL> startup mount exclusive;
ORACLE instance started.

Total System Global Area 2355015680 bytes
Fixed Size                  2255472 bytes
Variable Size             587203984 bytes
Database Buffers         1761607680 bytes
Redo Buffers                3948544 bytes
Database mounted.
SQL> alter system enable restricted session;

System altered.

[oracle@targetsinglenode trace]$ echo $ORACLE_SID
DEVRAC
[oracle@targetsinglenode trace]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed May 10 15:04:12 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DEVRAC (DBID=680719421, not open)

RMAN> drop database including backups;

database name is "DEVRAC" and DBID is 680719421

Do you really want to drop all backups and the database (enter YES or NO)? YES

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any backup in the repository

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any datafile copy in the repository
specification does not match any control file copy in the repository
specification does not match any control file copy in the repository
List of Archived Log Copies for database with db_unique_name DEVRAC
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
2       1    7       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819

3       1    8       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843

4       1    9       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473

7       1    10      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345

8       1    11      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349

9       1    12      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353

10      1    13      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355

11      1    14      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377

1       2    1       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773

5       2    2       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473

6       2    3       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475

deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819 RECID=2 STAMP=943623818
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843 RECID=3 STAMP=943623842
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473 RECID=4 STAMP=943626472
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345 RECID=7 STAMP=943628345
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349 RECID=8 STAMP=943628349
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353 RECID=9 STAMP=943628352
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355 RECID=10 STAMP=943628356
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377 RECID=11 STAMP=943628377
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773 RECID=1 STAMP=943619772
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473 RECID=5 STAMP=943626473
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475 RECID=6 STAMP=943626474
Deleted 11 objects

database name is "DEVRAC" and DBID is 680719421
database dropped

Well. That is it :) it is working.. Please feel free to comment.

1 comment :

  1. Great post! This is for a good information, It's very helpful for this blog. And great valu for these information.This is good work you and you are doing well.

    Oracle Training in Chennai

    ReplyDelete