Monday, June 29, 2015

RDBMS -- Why is it called Redo Change Vector

In this blog post, I will try to find out why Oracle uses the term Redo Change Vector to describe the smallest unit of change.
If you ask Wikipedia what a change vector is, you will get the following answer:
In an Oracle database, a change vector consists of a single change to a single data block. A change vector is the smallest unit of change recorded in a redo log.

If you ask a physicist the question "what is a vector?", you will get the following answer:
Vectors are quantities that are fully described by both a magnitude and a direction. So a vector has a magnitude and a direction.
So a change vector in general must be something like:
a vector formed by the difference between two or more vectors at two points in time.

Anyway, changes in an Oracle database are made in database blocks, so change vectors are created to describe the changes made to those blocks.
They are vectors, but they are derived not from the difference between two or more vectors at two points in time, but from the difference between two states of a data block.

To understand the Oracle redo change vector, and why Oracle calls the smallest unit of change a change vector, we can think of the magnitude of the change vector as the amount of change and the direction as the type of change.

Thinking this way, we can say that the vector rules apply to Oracle change vectors too:

Suppose an insert is north and a delete is south (as insert is the opposite of delete). Inserting the value 'ERMAN' into a table ERMAN_TABLE will then create a change vector for the corresponding, initially empty block, with a magnitude (amount of change) of 5 characters and a direction (type) of insert/north.
Likewise, deleting the value 'ERMAN' will create a change vector for the same block, with a magnitude (amount of change) of 5 characters and a direction (type) of delete/south.

So if we make these changes in sequence, our block will be empty again.
In the end there is no net change: these two change vectors are antiparallel, so they cancel each other, and the initial state of the block is unchanged.
In other words, if we add the second change vector (delete ERMAN) to the first change vector (insert ERMAN), we are left with no vector, so we have no change. Thus, the database block comes back to its initial state, where it was empty.

Likewise, if we make another insert afterwards, this will create a new change vector, and when we add this third change vector to the first and second ones, we will find the contents of the data block.
That is, we will find a vector pointing to the final state of the database block (insert is the direction and 5 is the magnitude).
Insert ERMAN + Delete ERMAN + Insert ERMAN = ERMAN in the database blocks.
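
The sum above can be sketched in a few lines of Python. This is only a toy model of the analogy, not how Oracle actually encodes redo: the function name change_vector and the idea of reducing a change to one signed number (insert = plus the length, delete = minus the length) are assumptions made for illustration.

```python
# Toy model only: this is NOT Oracle's real redo record format.
# A change is reduced to a signed number:
#   insert (north) = +length of the value, delete (south) = -length of the value.

def change_vector(op, value):
    """Return the signed magnitude of a change (hypothetical helper)."""
    sign = {"insert": 1, "delete": -1}[op]
    return sign * len(value)

changes = [
    change_vector("insert", "ERMAN"),  # 5 characters north
    change_vector("delete", "ERMAN"),  # 5 characters south
    change_vector("insert", "ERMAN"),  # 5 characters north
]

net_after_two = sum(changes[:2])  # the two antiparallel vectors cancel
net_after_three = sum(changes)    # one insert of 5 characters remains

print(net_after_two, net_after_three)
```

Note that in this model the magnitude is just the length of the data, so any two 5-character values look the same.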

But what if we insert ERMAN and delete NAMRE? According to the principle above, these two vectors should cancel each other too, because we said that the amount of change is the magnitude, and ERMAN and NAMRE are equal in amount (5 characters each).
But this is wrong.
If it were true, then 12 steps south would cancel out 21 steps north.
We have to think of the amount of change as the DATA itself, not the length of the DATA.
If the amount of change (the magnitude) is the DATA, then our principle works.
Suppose that, at some point in time, the block is empty; "insert ERMAN + delete ERMAN" will leave the db block empty again.
Suppose that, at some point in time, we have only "ERMAN" in the db block; then "delete ERMAN + insert ERMAN" will leave ERMAN stored in the db block again.
However, "insert ERMAN + delete NAMRE" will be something different.
So the vector operation principles continue to apply.
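
The three cases above can be sketched as follows. Again, this is a toy model under my own assumptions, not Oracle's implementation: the block is modelled as a signed multiset of values, and apply_changes is a hypothetical helper. A negative count simply means "a delete with no matching insert", which is how the model shows that the two changes did not cancel.

```python
from collections import Counter

# Toy model only: the block is a signed multiset of values, and the
# magnitude of a change is the DATA itself, not its length.

def apply_changes(block, changes):
    """Apply (op, value) changes to a block (hypothetical helper)."""
    result = Counter(block)
    for op, value in changes:
        result[value] += 1 if op == "insert" else -1
    # Drop values whose net count is zero: those changes cancelled out.
    return {value: count for value, count in result.items() if count != 0}

# Empty block: insert ERMAN + delete ERMAN leaves the block empty again.
empty_again = apply_changes({}, [("insert", "ERMAN"), ("delete", "ERMAN")])

# Block holding ERMAN: delete ERMAN + insert ERMAN leaves ERMAN in the block.
restored = apply_changes({"ERMAN": 1}, [("delete", "ERMAN"), ("insert", "ERMAN")])

# insert ERMAN + delete NAMRE does not cancel, although the lengths are equal.
mismatch = apply_changes({}, [("insert", "ERMAN"), ("delete", "NAMRE")])

print(empty_again, restored, mismatch)
```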

When we talk about the data, we actually need to talk about rows. We can think of rows as vectors.
A row in a table of N columns may be thought of as an N-dimensional vector, and the difference between two rows/vectors (the new version and the old version of a row) also gives us a new vector, which can be called a change vector. This change vector reflects the changes between the old version of a row and the new version of it.
Note that we have only the new value of the data in a redo log change vector, so if it is a change vector, the new value must be the change itself; in other words, the difference should be the new value of the data. This makes sense to me, as the new data for a row must be thought of as a completely new thing, and if it is a completely new thing, then the difference between the new data and the old data is the new data. So the standard vector operations still apply.
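
To make the row idea concrete, here is a minimal sketch in Python, assuming a row is a plain tuple of N column values. The helpers row_change_vector and apply_row_change, and the sample row, are hypothetical and only illustrate the analogy; they do not describe Oracle's real redo layout. The "difference" between two versions of a row is stored as the new values of the columns that changed.

```python
# Toy model only: a row of N columns is an N-dimensional "vector" (a tuple).
# The change vector keeps only the NEW values of the columns that differ.

def row_change_vector(old_row, new_row):
    """Return {column_index: new_value} for each changed column (hypothetical helper)."""
    return {
        i: new
        for i, (old, new) in enumerate(zip(old_row, new_row))
        if old != new
    }

def apply_row_change(old_row, change):
    """Re-apply a change vector to the old row to get the new row (hypothetical helper)."""
    return tuple(change.get(i, value) for i, value in enumerate(old_row))

# Sample 3-column row, invented for illustration.
old_row = (1, "ERMAN", "DBA")
new_row = (1, "ERMAN", "ARCHITECT")

change = row_change_vector(old_row, new_row)
replayed = apply_row_change(old_row, change)

print(change, replayed)
```

Applying the change to the old row reproduces the new row, which is the "old state + change vector = new state" rule used throughout this post.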

It is hard to know why Oracle Corporation decided to use the name Redo Log Change Vector, but these are my thoughts on the subject. They make sense to me, but they may also be wrong.
So what are your thoughts?
