This blog post is about an alternative Disaster Recovery solution that I recently built through a POC (Proof of Concept).
The source system was a 2-node Oracle RAC (Grid + RDBMS, both 11.2.0.4) and the target system, residing in the DR site, was a single-node environment (Grid + RDBMS, both 11.2.0.4).
Both source and target were using ASM for storing all the db files.
This made me think a little bit, because this alternative disaster recovery solution was relying on Netapp snapshot and snapmirror mechanisms.
We were planning to put the database into backup mode (hot backup mode / alter database begin backup), take a snapshot on the local storage, transfer the snapshot to the DR storage using snapmirror, map the luns created from this snapshot/snapmirror to the target server, end the backup on the target, and open the database successfully there.
ASM was my concern, because I had never done this before.
Actually, I had implemented (and even automated) this solution 6 years ago, but in that case the filesystems were ext3 (a cooked filesystem).
So, I was afraid that ASM header info might get corrupted during the snapshot/snapmirror operations.
However, everything went well.
It is of course not supported by Oracle. However, I can say that ASM diskgroups can be replicated using these snapshot/snapmirror operations (of course, if you know what you are doing..).
I did 3 tests and didn't encounter any problems.
In this blog post, I will walk you through the things that I did to replicate a 2-node RAC system (Grid/ASM + RDBMS) to a single-node system (Grid/ASM + RDBMS) using Netapp's snapshot and snapmirror functionalities.
Following is a representative picture explaining the work done.
As shown in the picture, we take a snapshot of the ASM disks, replicate it to DR using snapmirror, map the luns to the target server running on the DR site, and make the ASM instance running on the target server see those disks.
The operation done for this requires both Linux admin and storage admin skills. (I don't see a need for mentioning the required DBA skills :))
It requires Linux admin skills, because we do things like formatting the drives (we are working on new luns, since it is a POC), rescanning the scsi bus, and so on.
It requires storage admin skills, because we do things like taking snapshots, initiating snapmirror transfers, mapping luns, and so on.
Lastly, it requires DBA skills, because we put the database into backup mode, change some init.ora parameters, make the replicated disks known to ASM, open our database successfully on the target node (recovery may be needed), check the log files in case of an error, and so on.
Let's see what I did in this POC ->
Note that this is a technical overview. (It doesn't include every step that I did and every command that I used.)
RAC to SINGLE STORAGE REPLICATION (ASM diskgroup which is built on top of 3 disks)
1. The storage team creates 3 luns (each sized approx. 13 GB, to become our ASM disks) and maps these luns to our Linux RAC nodes (the source nodes).
We do a scsi rescan to make Linux see these luns/disks, and then we create a partition on each of them (oracleasm createdisk requires a partition). A sketch of these commands is given just below, followed by the fdisk outputs.
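Here is roughly what the rescan and partitioning look like from the shell. This is only a sketch (in the POC the partitions were created interactively with fdisk); the device names are the ones that show up in the outputs below:

# rescan the scsi bus so Linux sees the newly mapped luns
for host in /sys/class/scsi_host/host*; do echo "- - -" > $host/scan; done

# create a single primary partition spanning each new disk
# (fdisk keystrokes: n=new, p=primary, 1=partition number, two defaults for start/end, w=write)
for disk in /dev/sdbk /dev/sdbl /dev/sdbm; do
  echo -e "n\np\n1\n\n\nw" | fdisk $disk
done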
fdisk output just after the scsi rescan:
Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000

Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x00000000
Attention: before discovering the disks with oracleasm, we use fdisk to create partitions on them.
fdisk output just after the partition creations:
Disk /dev/sdbk: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436

   Device Boot      Start         End      Blocks   Id  System
/dev/sdbk1              1       11264    11534320   83  Linux

Disk /dev/sdbl: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd

   Device Boot      Start         End      Blocks   Id  System
/dev/sdbl1              1       12288    12582896   83  Linux

Disk /dev/sdbm: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdbm1              1       13314    13633520   83  Linux
2. We use oracleasm to create the ASM disks and, again using oracleasm, we make ASM see them.
[root@source01 ~]# oracleasm createdisk DEV1 /dev/sdbk1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV2 /dev/sdbl1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm createdisk DEV3 /dev/sdbm1
Writing disk header: done
Instantiating disk: done
[root@source01 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
[root@source01 ~]# oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1 ** new disk
DEV2 ** new disk
DEV3 ** new disk
FRA1
FRA2
TEST1
3. In order to make these new ASM disks available on source node 2 (remember, this is RAC), we rescan the scsi bus and use oracleasm scandisks on source node 2 as well.
[root@source02 ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@source02 ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
[root@source02 ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@source02 ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@source02 ~]# echo "- - -" > /sys/class/scsi_host/host4/scan
[root@source02 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV2"
Instantiating disk "DEV1"
Instantiating disk "DEV3"
[root@source02 ~]# oracleasm listdisks
DATA1
DATA2
DATA3
DATA4
DATA5
DATA6
DATA7
DATA8
DATA9
DEV1 ** new disk
DEV2 ** new disk
DEV3 ** new disk
FRA1
FRA2
TEST1
4. After these scsi and ASM disks become available on both of the RAC nodes, we create a diskgroup called +DEV on these 3 new disks using ASMCA. After that, we create a 2-node RAC database called DEVRAC and configure it so that all of its files (controlfiles, redologs, datafiles, etc.) are placed in the +DEV diskgroup.
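For reference, the same diskgroup could also be created from sqlplus on the ASM instance. The following is just a sketch of that equivalent (it assumes external redundancy and the default ASMLib "ORCL:" disk string, which I am not claiming is exactly what ASMCA generated in this POC):

[grid@source01 ~]$ sqlplus / as sysasm
SQL> CREATE DISKGROUP DEV EXTERNAL REDUNDANCY
  2  DISK 'ORCL:DEV1', 'ORCL:DEV2', 'ORCL:DEV3';

After that, the quick sanity check from sqlplus looks like this: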
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 13:39:43 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select instance_name from gv$instance;

INSTANCE_NAME
----------------
DEVRAC1
DEVRAC2

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf

SQL> select name from v$controlfile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/control01.ctl
+DEV/devrac/control02.ctl

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+DEV/devrac/redo02.log
+DEV/devrac/redo01.log
+DEV/devrac/redo03.log
+DEV/devrac/redo04.log
5. At this point, in order to make the necessary checks in the target environment, we create a table called ERMAN in the source. This table resides in a tablespace called ERMAN (all the datafiles of this tablespace reside in the +DEV diskgroup).
So, we first create a tablespace called ERMAN with all of its datafiles on diskgroup +DEV. Then, we create a table named ERMAN in the tablespace called ERMAN. (Lots of ERMANs.., I know :)
SQL> CREATE TABLESPACE ERMAN DATAFILE '+DEV';

Tablespace created.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731

SQL> create table ERMAN tablespace ERMAN as select * from dba_objects;

Table created.

SQL> insert into ERMAN select * from ERMAN;

86293 rows created.

SQL> commit;

Commit complete.

SQL> insert into ERMAN select * from ERMAN;

172586 rows created.
6. We put our database (DEVRAC) into backup mode (hot backup mode) and, while it is in backup mode, we create a storage snapshot and then do a snapmirror update. After the snapmirror update, we create the luns on the target storage system (using the snapmirrored data) and map these luns to the target database node. After the lun mapping, we rescan the scsi bus on the target node (Linux) and make ASM (running on the target node) see these new luns (using oracleasm). A sketch of this sequence is given below, followed by the rescans and the fdisk output on the target.
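Roughly, the source-side part of this sequence looks like the following. This is only a sketch: the storage operations are indicated as comments, because the exact Netapp commands depend on the ONTAP version and on how the volumes/luns are laid out (and they were run by the storage team anyway).

-- on the source database (any one instance):
SQL> alter database begin backup;

-- storage admin: take a snapshot of the volume(s) holding the DEV1/DEV2/DEV3 luns
-- storage admin: run a snapmirror update towards the DR filer
-- storage admin: create luns from the mirrored snapshot on the DR storage and map them to the target host

-- once the snapshot is taken, the source can leave backup mode
-- (the replicated copy on DR is still "in backup" and is ended there in step 7):
SQL> alter database end backup;

On the target node, we then rescan the scsi bus and check the new disks with fdisk and oracleasm: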
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host1/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@targetsinglenode ~]# echo "- - -" > /sys/class/scsi_host/host4/scan
Disk /dev/sdn: 11.8 GB, 11811160064 bytes
64 heads, 32 sectors/track, 11264 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x42926436
Device Boot Start End Blocks Id System
/dev/sdn1 1 11264 11534320 83 Linux
Disk /dev/sdo: 12.9 GB, 12884901888 bytes
64 heads, 32 sectors/track, 12288 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xb900c4fd
Device Boot Start End Blocks Id System
/dev/sdo1 1 12288 12582896 83 Linux
Disk /dev/sdp: 14.0 GB, 13960740864 bytes
64 heads, 32 sectors/track, 13314 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0x65f91faf
Device Boot Start End Blocks Id System
/dev/sdp1 1 13314 13633520 83 Linux
[root@targetsinglenode ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "DEV1"
Instantiating disk "DEV2"
Instantiating disk "DEV3"
7. After making the snapmirrored disks available on the target host, we modify the init.ora, which we copy from the source. We modify it to exclude the RAC-related parameters, because the target node is a single node.
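To give an idea, the init.ora changes are roughly like the following. This is a sketch; the parameter names are the usual RAC-related ones and the exact list depends on how the source init.ora was generated:

# removed / disabled RAC-specific entries (examples):
#   *.cluster_database=true
#   DEVRAC1.instance_number=1
#   DEVRAC2.instance_number=2
#   DEVRAC1.thread=1
#   DEVRAC2.thread=2
#   DEVRAC2.undo_tablespace='UNDOTBS2'
#   *.remote_listener=...
# kept / adjusted for the single node:
*.cluster_database=false
*.undo_tablespace='UNDOTBS1'
*.control_files='+DEV/devrac/control01.ctl','+DEV/devrac/control02.ctl'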
After modifying the init.ora, we mount the diskgroup named +DEV in the target node's ASM instance and start up the DEVRAC database as a single-node database on our target node.
Note that after opening the database, we check the table named ERMAN in the target instance to ensure that it was replicated successfully.
[oracle@targetsinglenode ~]$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:26:35 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option
SQL> alter diskgroup DEV mount;
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2355015680 bytes
Fixed Size 2255472 bytes
Variable Size 587203984 bytes
Database Buffers 1761607680 bytes
Redo Buffers 3948544 bytes
Database mounted.
SQL> alter database end backup;
Database altered.
SQL> alter database open;
Database altered.
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
+DEV/devrac/system01.dbf
+DEV/devrac/sysaux01.dbf
+DEV/devrac/undotbs01.dbf
+DEV/devrac/users01.dbf
+DEV/devrac/undotbs02.dbf
+DEV/devrac/datafile/erman.270.943623731
6 rows selected.
SQL> select count(*) from ERMAN;
COUNT(*)
----------
172586
SQL> alter system checkpoint;
Note: After opening the database, we monitor the alert logs and check the related log files for both RDBMS and GRID, and we check whether any errors are recorded in them. In conclusion, we don't see any errors.
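A quick way of doing that check is something like the following (the paths below follow the standard 11.2 ADR layout and are assumptions; adjust the ORACLE_BASE part to your own environment):

# RDBMS alert log on the target node
tail -200 $ORACLE_BASE/diag/rdbms/devrac/DEVRAC/trace/alert_DEVRAC.log | grep -i "ora-"

# ASM/Grid alert log on the target node (run as the grid owner, with its own ORACLE_BASE)
tail -200 $ORACLE_BASE/diag/asm/+asm/+ASM/trace/alert_+ASM.log | grep -i "ora-"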
8. Just in case, we use the replicated database a little bit. We do some read/write tests by inserting some data using a loop, switching the log files, and so on. During these tests, we do not encounter any abnormalities.
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 14:57:11 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> create table ERMAN2 as select * from ERMAN;

Table created.

SQL> BEGIN
  2  FOR x IN 1 .. 10 LOOP
  3  INSERT INTO ERMAN2
  4  SELECT * from ERMAN;
  5  END LOOP;
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> archive log all;
ORA-00271: there are no logs that need archiving

SQL> alter system checkpoint;

System altered.

SQL> alter system switch logfile;

System altered.
9. Actually, at this point the POC is successfully finished, but I want to give you some bonus info about deleting the databases and the other things that we created during this POC.
Basically, we drop the databases using rman, we drop the diskgroups using ASMCA, and we clear these candidate disks using oracleasm (deletedisk). A sketch of that diskgroup/disk cleanup is given at the end of this post, after the rman output.
You can see the "rman drop database" method in the following example. The method that I give in the following example is for a single-node database, but it can be used in RAC environments as well. (Note: in order to drop a RAC database using rman, the parameter named cluster_database must be set to "false" before executing the drop database command.)
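For the RAC case, that preparation would look roughly like this (a sketch only; it was not needed here, because the copy we drop below is already running as a single-node database):

SQL> alter system set cluster_database=false scope=spfile sid='*';
[oracle@source01 ~]$ srvctl stop database -d DEVRAC
-- then start only one instance in exclusive/restricted mount mode (as in the example
-- below) and run "drop database including backups" from rman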
[oracle@targetsinglenode trace]$ sqlplus "/as sysdba"
SQL*Plus: Release 11.2.0.4.0 Production on Wed May 10 15:01:06 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> shu immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount exclusive;
ORACLE instance started.

Total System Global Area 2355015680 bytes
Fixed Size 2255472 bytes
Variable Size 587203984 bytes
Database Buffers 1761607680 bytes
Redo Buffers 3948544 bytes
Database mounted.
SQL> alter system enable restricted session;

System altered.

[oracle@targetsinglenode trace]$ echo $ORACLE_SID
DEVRAC
[oracle@targetsinglenode trace]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed May 10 15:04:12 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: DEVRAC (DBID=680719421, not open)
RMAN> drop database including backups;

database name is "DEVRAC" and DBID is 680719421

Do you really want to drop all backups and the database (enter YES or NO)? YES

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any backup in the repository
released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=25 device type=DISK
specification does not match any datafile copy in the repository
specification does not match any control file copy in the repository
specification does not match any control file copy in the repository
List of Archived Log Copies for database with db_unique_name DEVRAC
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
2       1    7       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819

3       1    8       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843

4       1    9       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473

7       1    10      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345

8       1    11      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349

9       1    12      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353

10      1    13      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355

11      1    14      A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377

1       2    1       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773

5       2    2       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473

6       2    3       A 10-MAY-17
        Name: +DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_7.271.943623819 RECID=2 STAMP=943623818
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_8.272.943623843 RECID=3 STAMP=943623842
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_9.273.943626473 RECID=4 STAMP=943626472
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_10.276.943628345 RECID=7 STAMP=943628345
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_11.277.943628349 RECID=8 STAMP=943628349
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_12.278.943628353 RECID=9 STAMP=943628352
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_13.279.943628355 RECID=10 STAMP=943628356
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_1_seq_14.280.943628377 RECID=11 STAMP=943628377
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_1.269.943619773 RECID=1 STAMP=943619772
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_2.274.943626473 RECID=5 STAMP=943626473
deleted archived log
archived log file name=+DEV/devrac/archivelog/2017_05_10/thread_2_seq_3.275.943626475 RECID=6 STAMP=943626474

Deleted 11 objects

database name is "DEVRAC" and DBID is 680719421
database dropped
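For completeness, the remaining cleanup mentioned above (dropping the diskgroup and clearing the oracleasm disks) is roughly as follows. Again, this is a sketch: in the POC the diskgroup was dropped with ASMCA, and the lun unmapping/deletion itself is done by the storage team.

-- on the ASM instance where DEV is mounted (and only there):
SQL> drop diskgroup DEV including contents;

# clear the ASMLib labels so the disks become candidates again (run as root)
[root@targetsinglenode ~]# oracleasm deletedisk DEV1
[root@targetsinglenode ~]# oracleasm deletedisk DEV2
[root@targetsinglenode ~]# oracleasm deletedisk DEV3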
Well, that is it :) It is working.. Please feel free to comment.