Tuesday, December 10, 2013

Exadata -- ZBR and Interleaving

Exadata offers the option of interleaved disks, or more precisely interleaved Grid Disks. Interleaving is defined at the Cell Disk layer, but it is the Grid Disks that end up interleaved.
Grid Disks are the fourth layer of the disk abstraction in Exadata. They are created on top of Cell Disks (the third layer of abstraction) and are used to build the ASM disk groups.


Cell Disks can be thought of as the physical disks or LUNs that we see in fdisk -l output on Linux/Unix systems (they are not exactly that, but let's suppose so). Grid Disks can be thought of as partitions on those physical disks or LUNs.
By default, when we create our first grid disk on a cell disk, Exadata starts the allocation from the outermost sectors, that is, the sectors farthest from the center of the platter. Thus the first grid disk created is faster than the grid disks created after it.

This is expected behaviour, as by default Exadata uses non-interleaved Grid Disks.

The interleaved Grid Disk concept is based on dividing each grid disk into multiple parts, thus balancing the performance across them.

In other words, with the interleaving option the cell disks are intelligently divided into grid disks which are equal in performance.

With interleaving, Exadata takes advantage of Zone Bit Recording (ZBR), which gives higher transfer bandwidth on the outer part of the hard disks. In addition, Exadata manages the allocation of disk regions through its interleaving option so that each grid disk gets an equal share in terms of performance.

To understand this better, let's look at how disks are structured.

The following figure describes the structure of a normal (non-ZBR) disk.
As you can see below, although the distance from the center increases, the number of sectors in a given angle does not change.

A: a track
B: a geometrical sector
C: a track sector
D: a cluster of sectors


The following figure, on the other hand, describes the structure of a disk that uses ZBR (Zone Bit Recording).

Red: closest to the center.
Green: in the middle, has more sectors than Red.
Grey: farthest from the center, has the most sectors.


As you can see above, as the distance from the center increases, the number of sectors in a given angle also increases. Standard hard disks have a constant angular velocity, which means that the media turns at the same speed regardless of where the heads are. Knowing this, we can say that the path travelled under the head in one 360 degree turn is longer on the outer part of the disk, so more data is read per revolution. In other words, the linear velocity is higher on the outer tracks, and so is the transfer rate.
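The effect of constant angular velocity on transfer rate can be put into a small calculation. This is only a sketch: the RPM, the sector size and the sectors-per-track figures are made-up values for a hypothetical drive, not numbers taken from any Exadata disk.

```python
# Hypothetical drive: RPM, sector size and sectors-per-track values are
# illustrative assumptions, not measurements of a real disk.
RPM = 7200
SECTOR_BYTES = 512

# Assumed sectors per track in each zone, from innermost to outermost.
ZONES = {"inner (red)": 500, "middle (green)": 750, "outer (grey)": 1000}


def track_throughput_mb_s(sectors_per_track, rpm=RPM, sector_bytes=SECTOR_BYTES):
    """Sequential rate of one track: bytes per revolution x revolutions per second."""
    revolutions_per_second = rpm / 60
    return sectors_per_track * sector_bytes * revolutions_per_second / 1_000_000


for zone, sectors in ZONES.items():
    print(f"{zone}: {track_throughput_mb_s(sectors):.2f} MB/s")
```

Since the revolutions per second are the same for every track, the rate depends only on how many sectors pass under the head in one turn, which is exactly what ZBR increases on the outer zones.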

Let's go back to Exadata.
When grid disks are created without the interleaving option, Exadata allocates the outer part first, as most operating systems do. Hard disks and floppy disks typically number their tracks beginning at the outer edge and continuing inward, and operating systems typically fill the lowest-numbered tracks first, so this is where an operating system typically stores its own files during its initial installation onto an empty drive. So, considering ZBR, the first grid disk created will be faster and, because of the nature of ZBR, the second grid disk will be slower.
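The non-interleaved behaviour can be sketched as a simple back-to-back allocation. This is a toy model, not Exadata code: the function name sequential_layout, the grid disk names and the sizes are all hypothetical, and offset 0 is assumed to be the outer edge of the disk.

```python
def sequential_layout(cell_disk_gb, grid_disks):
    """Place grid disks back to back, in creation order.

    Offset 0 is assumed to be the outer (fastest) edge of the cell disk.
    grid_disks is a list of (name, size_gb) tuples.
    Returns a list of (name, start_gb, end_gb) tuples.
    """
    layout = []
    offset = 0
    for name, size_gb in grid_disks:
        if offset + size_gb > cell_disk_gb:
            raise ValueError(f"not enough space on the cell disk for {name}")
        layout.append((name, offset, offset + size_gb))
        offset += size_gb
    return layout


# Hypothetical example: two grid disks on a 600 GB cell disk.
for name, start, end in sequential_layout(600, [("first", 250), ("second", 350)]):
    print(f"{name}: {start} GB .. {end} GB from the outer edge")
```

In this model the first grid disk owns the outermost tracks entirely, and the second one is pushed toward the center, which is why the creation order decides which one is faster.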

Exadata brings a solution for that: an optional feature called interleaving.
Interleaving is defined at the Cell Disk layer.
Note that whether you choose interleaving or no interleaving, Oracle allocates the first set of extents on the outer part of the physical disk.
Oracle has two options for interleaving: normal redundancy and high redundancy. These are not like the ASM redundancy levels; they have a different meaning.
With normal redundancy, Oracle divides your grid disk into two ranges of disk tracks. With high redundancy, Oracle divides your grid disk into three ranges of disk tracks.

So, to create a cell disk with interleaving, we use the following command:
create celldisk interleaving_erman lun=0_11 INTERLEAVING='normal_redundancy'

After creating the cell disk, we can place the grid disks on it with the following commands:

create griddisk DATA1 celldisk=interleaving_erman, size=.........
create griddisk RECO1 celldisk=interleaving_erman

The resulting layout is shown in the following diagram:


As represented above, the cell disk named interleaving_erman is divided into 2 equal portions. Grid disk DATA1 is placed on the ranges of disk tracks painted red, and grid disk RECO1 is placed on the ranges of disk tracks painted green. In this scenario, DATA1 and RECO1 are considered almost equal in overall performance. Note that the ranges in the figure above change according to the grid disk sizes specified.
What happens here is that the disk is divided into two parts (50% - 50%), and each part is divided again among the grid disks.
So if we have a 600 GB cell disk with normal redundancy interleaving and we create two grid disks in sequence (Grid Disk A with 250 GB, Grid Disk B with 350 GB), it happens as follows:
Oracle will divide the disk into two 300 GB pieces.
In the first part: Oracle will place 125 GB of Grid Disk A (outermost) -> RED, and 175 GB of Grid Disk B -> GREEN.
In the second part: Oracle will place 125 GB of Grid Disk A -> RED, and 175 GB of Grid Disk B -> GREEN.

So the sizes of the rings will differ. I didn't do the math, but this should be a good approach for getting similar IO rates across the interleaved grid disks.
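The 600 GB example above can be reproduced with a few lines of Python. This is a sketch of the division as described in this post, not of Oracle's actual allocator: the function name interleaved_layout and the PARTS table are hypothetical, and offset 0 is assumed to be the outer edge.

```python
# Number of track ranges per interleaving option, as described in the text.
PARTS = {"normal_redundancy": 2, "high_redundancy": 3}


def interleaved_layout(cell_disk_gb, grid_disks, interleaving="normal_redundancy"):
    """Split the cell disk into equal parts and give every grid disk an equal
    share of its size in each part, in creation order.

    grid_disks is a list of (name, size_gb) tuples.
    Returns a list of (name, start_gb, end_gb) tuples, outer edge first.
    """
    parts = PARTS[interleaving]
    part_size = cell_disk_gb / parts
    layout = []
    for part in range(parts):
        offset = part * part_size
        for name, size_gb in grid_disks:
            piece = size_gb / parts
            layout.append((name, offset, offset + piece))
            offset += piece
    return layout


for name, start, end in interleaved_layout(600, [("A", 250), ("B", 350)]):
    print(f"Grid Disk {name}: {start:.0f} GB .. {end:.0f} GB")
```

With the sizes from the example, Grid Disk A gets the ranges 0-125 GB and 300-425 GB, and Grid Disk B gets 125-300 GB and 425-600 GB.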

In brief,
Oracle emphasizes that different ASM disk groups can share the Cell Disks without a performance bias if you use interleaving. In other words, by using interleaving, Oracle can divide the Cell Disk into parts which are equal in performance (really equal? It seems not exactly, but almost equal to me :))
The idea behind that is ZBR. In ZBR, the tracks in the same zone are recorded with the same read/write rate. So Oracle's disk-dividing operation makes intelligent use of these zones.
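To get a feeling for the "really equal?" question, here is a rough comparison for the 600 GB example (Grid Disk A with 250 GB, Grid Disk B with 350 GB). It rests on a strong assumption: throughput is modelled as falling linearly with the offset from the outer edge, and the 200 MB/s and 100 MB/s figures are invented. Real disks have discrete zones, so treat the numbers as illustrative only.

```python
# Assumed model: throughput falls linearly from OUTER_MB_S at offset 0 (outer
# edge) to INNER_MB_S at the inner edge. Both figures are made up.
CELL_DISK_GB = 600
OUTER_MB_S = 200.0
INNER_MB_S = 100.0


def rate_at(offset_gb):
    """Modelled throughput at a given offset from the outer edge."""
    return OUTER_MB_S - (OUTER_MB_S - INNER_MB_S) * offset_gb / CELL_DISK_GB


def average_rate(segments):
    """Size-weighted average rate over a list of (start_gb, end_gb) segments.

    With a linear model, the mean over a segment is the rate at its midpoint.
    """
    total = sum(end - start for start, end in segments)
    weighted = sum((end - start) * rate_at((start + end) / 2) for start, end in segments)
    return weighted / total


# Segment layouts for the 600 GB example, without and with interleaving.
SEQUENTIAL = {"A": [(0, 250)], "B": [(250, 600)]}
INTERLEAVED = {"A": [(0, 125), (300, 425)], "B": [(125, 300), (425, 600)]}


def gap(layout):
    """Difference between the average rates of Grid Disk A and Grid Disk B."""
    return average_rate(layout["A"]) - average_rate(layout["B"])


for label, layout in (("sequential", SEQUENTIAL), ("interleaved", INTERLEAVED)):
    rates = {name: round(average_rate(segs), 1) for name, segs in layout.items()}
    print(label, rates, "gap:", round(gap(layout), 1))
```

Under this simplified model, interleaving narrows the difference between the two grid disks but does not remove it, which fits the "not exactly but almost equal" impression.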
