Tuesday, May 5, 2015

Exadata/ACFS -- ASM Resilvering vs ASM Rebalance

The term resilvering is actually new to me. I saw resilvering processes running on ODA X4-2 systems and wanted to shed some light on them. How does resilvering differ from rebalancing?
Resilvering simply means copying data from one side of a mirror to the other. It is like a rebuild.
Rebalancing, on the other hand, means distributing data across the available disks in a disk group.
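To make the difference concrete, here is a toy Python model. It is only an illustration under my own assumptions: a disk is just a dict of extent id to data, resilver() copies whatever is missing or different from one mirror side to the other, and rebalance() spreads extents round-robin over the disks. The function names are hypothetical, and round-robin is not ASM's real placement algorithm.

```python
# Toy model only: this is NOT how ASM stores or places extents internally.


def resilver(source: dict[int, bytes], target: dict[int, bytes]) -> int:
    """Copy extents from one mirror side to the other; return how many were copied."""
    copied = 0
    for extent_id, data in source.items():
        # Only extents that are missing or stale on the target need copying.
        if target.get(extent_id) != data:
            target[extent_id] = data
            copied += 1
    return copied


def rebalance(extent_ids: list[int], disks: list[str]) -> dict[int, str]:
    """Spread extents round-robin across the available disks (hypothetical policy)."""
    return {ext: disks[i % len(disks)] for i, ext in enumerate(extent_ids)}


primary = {0: b"a", 1: b"b", 2: b"c"}
mirror = {0: b"a"}  # the mirror side lost extents 1 and 2
print(resilver(primary, mirror))
print(rebalance(list(range(6)), ["d1", "d2", "d3"]))
```

Resilvering moves data between the two sides of the same mirror, while rebalancing only decides where each extent should live.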

When we look at these terms from a failure perspective, we can say the following:

Resilvering can be seen in Exadata. For example, ASM resilvering takes place when a flash disk running in write-back mode fails. More precisely, resilvering is what happens while the mirror copy of the lost data is being written back to disk.
The following picture describes the resilvering process used in Exadata:

Picture Ref: http://www.lunar2013.com

ASM resilvering can also be seen in ACFS-based virtualized ODA X4-2 environments. The shared repository volumes may be resilvered after a failure.

Here is an excerpt from a log file taken from an ODA system:

Asm_startRecovery: recovery needed. 
Asm_relGateway: odlm_unlock returned 0. 
ADVMK-00010: Mirror recovery for volume VMREPO1 in diskgroup DATA started. 
asmResilver2[63209] Asm_resilverVolume: vol VMREPO1, start 
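If you want to spot these recovery events in a larger log, a small scanner can help. This is a hypothetical helper of my own; the regular expressions assume exactly the message layout shown in the excerpt above and may not match other versions or message variants.

```python
import re

# Patterns assume the exact message layout shown in the log excerpt.
RECOVERY_RE = re.compile(
    r"ADVMK-00010: Mirror recovery for volume (\S+) in diskgroup (\S+) started"
)
# The volume name stops before the comma in "vol VMREPO1, start".
RESILVER_RE = re.compile(r"Asm_resilverVolume: vol ([^,\s]+), start")


def find_recovery_events(lines):
    """Return (event, volume, diskgroup) tuples for mirror recovery/resilver starts."""
    events = []
    for line in lines:
        if m := RECOVERY_RE.search(line):
            events.append(("mirror_recovery_started", m.group(1), m.group(2)))
        elif m := RESILVER_RE.search(line):
            # The resilver line does not mention the diskgroup.
            events.append(("resilver_started", m.group(1), None))
    return events
```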

Here is what I found about it in the Oracle Community:

As ACFS and ADVMK use ASM in the backend, the ADVM driver will ensure ASM Dynamic volume mirror consistency and for the recovery of only the dirty regions in cases of node and ASM instance failures. This is accomplished using DRL (Dirty Region Logging). DRL is an industry-common optimization for mirror consistency and the recovery of mirrored extents. DRL requires write-ahead logging for its mirrored writes.
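To make the DRL idea more concrete, here is a minimal Python sketch of a two-way mirrored volume. It is a toy model under my own assumptions (the class name MirroredVolume, the region size and the crash flag are hypothetical, not part of ADVM): a region is marked dirty before the mirrored writes (the write-ahead part), cleaned once both sides are written, and after a simulated crash only the dirty regions are copied from one side to the other.

```python
class MirroredVolume:
    """Hypothetical two-way mirrored volume with a dirty region log (DRL)."""

    def __init__(self, num_blocks: int, region_size: int):
        self.region_size = region_size
        self.sides = [[b""] * num_blocks, [b""] * num_blocks]
        self.dirty_regions: set[int] = set()  # stands in for the on-disk log

    def _region(self, block: int) -> int:
        return block // self.region_size

    def write(self, block: int, data: bytes, crash_after_first_side: bool = False) -> None:
        # Write-ahead: log the region as dirty BEFORE touching either mirror side.
        region = self._region(block)
        self.dirty_regions.add(region)
        self.sides[0][block] = data
        if crash_after_first_side:
            return  # simulate a node failure between the two mirrored writes
        self.sides[1][block] = data
        # Both sides agree again, so the region can be marked clean.
        self.dirty_regions.discard(region)

    def recover(self) -> int:
        """Resync only the dirty regions from side 0 to side 1; return blocks copied."""
        copied = 0
        for region in sorted(self.dirty_regions):
            start = region * self.region_size
            end = min(start + self.region_size, len(self.sides[0]))
            for block in range(start, end):
                self.sides[1][block] = self.sides[0][block]
                copied += 1
        self.dirty_regions.clear()
        return copied


vol = MirroredVolume(num_blocks=16, region_size=4)
vol.write(1, b"x")
vol.write(9, b"y", crash_after_first_side=True)
print(vol.dirty_regions, vol.recover())
```

For simplicity the sketch always copies from side 0; the point is only that recovery touches one 4-block region instead of all 16 blocks.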

Resilvering also appears in the ZFS world.
It is almost the same thing, but in a different system.
Reference: Oracle
The process of replacing a device can take an extended period of time, depending on the size of the device and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering and can be monitored by using the zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:
ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the entire disk can be resilvered in a matter of minutes or seconds. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-GB disk can take seconds if a pool has only a few gigabytes of used disk space.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
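The two advantages in the quote can be illustrated with another toy sketch (hypothetical names, not ZFS code): the job walks only the list of allocated blocks, so its work is proportional to the used space rather than the disk size, and it keeps a cursor so an interrupted run resumes where it left off. In a real system that cursor would have to be persisted; here it just lives in the object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResilverJob:
    """Hypothetical resumable resilver that copies only allocated blocks."""

    allocated: list[int]  # block numbers actually in use in the pool
    cursor: int = 0  # progress marker (would be persisted in a real system)
    copied: list[int] = field(default_factory=list)

    def run(self, max_blocks: Optional[int] = None) -> bool:
        """Copy blocks from the cursor onward; return True when finished.

        max_blocks simulates an interruption (e.g. power loss) after N blocks.
        """
        done = 0
        while self.cursor < len(self.allocated):
            if max_blocks is not None and done >= max_blocks:
                return False  # interrupted; the cursor remembers the position
            self.copied.append(self.allocated[self.cursor])
            self.cursor += 1
            done += 1
        return True


job = ResilverJob(allocated=[3, 7, 42, 100, 512])
job.run(max_blocks=2)  # "power loss" after two blocks
job.run()  # resumes at the third block, no manual intervention
```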

Rebalancing is more familiar to us. It takes place when a disk fails or when we add new disks to an ASM environment.
That is, if a disk fails (as I said, we are looking from a failure perspective), ASM distributes the data stored on the failed disk to the rest of the available disks. Rebalancing can be described with something like the following picture:

Ref: manchev.org
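Here is a simplified sketch of that idea for a two-way mirrored layout. Assumptions: placement maps a hypothetical extent id to a (primary, mirror) disk pair, the surviving copy of each affected extent is kept, and a new mirror copy goes to the least-loaded surviving disk that does not already hold it. Real ASM also honors failure groups and partner disks, which this sketch ignores, and the sketch needs at least three disks.

```python
from collections import Counter


def rebalance_after_failure(
    placement: dict[int, tuple[str, str]], failed: str, disks: list[str]
) -> dict[int, tuple[str, str]]:
    """Re-mirror extents that lost a copy on `failed` onto surviving disks."""
    survivors = [d for d in disks if d != failed]
    # Current number of extent copies on each surviving disk.
    load = Counter()
    for primary, mirror in placement.values():
        for d in (primary, mirror):
            if d != failed:
                load[d] += 1
    new_placement = {}
    for ext, (primary, mirror) in sorted(placement.items()):
        if failed not in (primary, mirror):
            new_placement[ext] = (primary, mirror)  # untouched by the failure
            continue
        keep = mirror if primary == failed else primary  # the surviving copy
        candidates = [d for d in survivors if d != keep]
        # Least-loaded candidate wins; ties are broken by disk name.
        target = min(candidates, key=lambda d: (load[d], d))
        load[target] += 1
        new_placement[ext] = (keep, target)
    return new_placement


disks = ["d1", "d2", "d3", "d4"]
placement = {0: ("d1", "d2"), 1: ("d2", "d3"), 2: ("d3", "d4"),
             3: ("d4", "d1"), 4: ("d1", "d3")}
print(rebalance_after_failure(placement, "d1", disks))
```

Only the extents that had a copy on d1 move; everything else stays where it was.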
That's all about this topic for now.
Any comments will be appreciated.
