Friday, May 30, 2014

RDBMS-- an unusual ORA-01555

We have seen and learned that ORA-01555 errors mostly caused by long queries which try to get overwritten older images of changed blocks.. As these images are overwritten due to undo management mechanism, those queries have returned ORA-01555 errors.

Yesterday however, I have faced a unsual ORA-01555 error that have made me rethink the behaviour of Oracle in Undo management.
The error was produced for a very short running query ,that returns data in only 1 second.
This was not usual , as the possibility of the needed undo to be overwritten is very low.
The undo management was Auto. Undo tablespace was 7.5 gb and retentio was 900.

So, after analyzing a little further, I have seen that this query was inside a procedure, which was used to manage the partitions.. The procedure was run by a scheduler .. Using this procedure, partitions were periodically dropped and  created at night.
The query was something like;
Select partition_name, ...., .... from user_tab_partitions where table_name=:B1'

So the procedure was running the query above to see the current partitions, and then executes DDL 's to recreate some partitions..
The problem was that this procedure was run by the scheduler multiple times for different tables one after another, and as these DDL operations are autocommit, this process was doing fetch across 
commits..

Select Q1
commit
Select Q1
commit
Select Q1
commit.

Note that: When you commit, your undo will be overwritable in normal circumstances.

I have suspected from the delayed block cleanout first, but to fix the problematic situation that may arise from the delayed block cleanout; the procedure should be modified.. I mean, procedure should be modified such that, it should execute the query once and make its DDL operations using the data returned from the query, or it should execute the query for every DDL but it should not execute the same query twice.. (maybe a sort or hint can be added)

Delayed Block Cleanout represenation: (Ref : Oracle Support)

Stage 1 - No Changes Made
Description: This is the starting point. At the top of the data block we have an area used to link active transactions to a rollback segment (the 'tx' part), and the rollback segment header has a table that stores information upon all the latest transactions that have used that rollback segment.
In our example, we have two active transaction slots (01 and 02) and the next free slot is slot 03. (Since we are free to overwrite committed transactions.)
Stage1
Stage 2 - Row 2 is Updated
Description: We have now updated row 2 of block 500. Note that the data block header is updated to point to the rollback segment 5, transaction slot 3 (5.3) and that it is marked uncommitted (Active).
Stage2
Stage 3 - The User Issues a Commit
Description: Next the user hits commit. Note that all that this does is it updates the rollback segment header's corresponding transaction slot as committed. It does *nothing* to the data block.
Stage3
Stage 4 - Another User Selects Data Block 500
Description: Some time later another user (or the same user) revisits data block 500. We can see that there is an uncommitted change in the data block according to the data block's header.
Oracle then uses the data block header to look up the corresponding rollback segment transaction table slot, sees that it has been committed, and changes data block 500 to reflect the true state of the datablock. (i.e. it performs delayed cleanout).
Stage4

Anyways, as this requires modification, I have tried increasing Undo Tablespace size. I have done this to increase the count of rollback segments.. By increasing the rollback segments , the work would separate to more rollback segment and thus decrease the chance of overwriting a rollback transaction slot ..  

Actually, increasing the size of undo have help, and the error is not appeared any more..
But still, I feel the need to write the possible causes of ORA-01555 errors..

Causes, which may lead to ORA-01555; (Ref: Asktom)

1)The rollback information itself is overwritten
2)The transaction slot in the rollback segment's transaction table (stored in the rollback segment's header) is overwritten

Possible fixes;

1. Increase size of rollback segment which will reduce the likelihood of overwriting rollback information that is needed.
2. Reduce the number of commits.
3. Run the processing against a range of data rather than the whole table.
4. Add additional rollback segments. This will allow the updates etc. to be spread across more rollback segments thereby reducing the chances of overwriting required rollback information.
5. If fetching across commits, the code can be changed so that this is not done.
6. Ensure that the outer select does not revisit the same block at different times during the processing. This can be achieved by : - Using a full table scan rather than an index lookup - Introducing a dummy sort so that we retrieve all the data, sort it and then sequentially visit these data blocks. Note that: This is only applicable if the reason is the rollback information overwritten(1)

2 comments :

  1. Hi,
    I suffered from the same situation, snapshot too old for queries even with 0 seconds. I check with many blogs but Oracle support's note 1950577.1 helped me.

    ReplyDelete
  2. Hi Muhammad,

    A useful doc.
    Thanks for sharing

    ReplyDelete