Erman Arslan's Oracle Blog: RDBMS-- an unusual ORA-01555

Friday, May 30, 2014

RDBMS-- an unusual ORA-01555

We have seen and learned that ORA-01555 errors mostly caused by long queries which try to get overwritten older images of changed blocks.. As these images are overwritten due to undo management mechanism, those queries have returned ORA-01555 errors.

Yesterday however, I have faced a unsual ORA-01555 error that have made me rethink the behaviour of Oracle in Undo management.
The error was produced for a very short running query ,that returns data in only 1 second.
This was not usual , as the possibility of the needed undo to be overwritten is very low.
The undo management was Auto. Undo tablespace was 7.5 gb and retentio was 900.

So, after analyzing a little further, I have seen that this query was inside a procedure, which was used to manage the partitions.. The procedure was run by a scheduler .. Using this procedure, partitions were periodically dropped and created at night.
The query was something like;
Select partition_name, ...., .... from user_tab_partitions where table_name=:B1'

So the procedure was running the query above to see the current partitions, and then executes DDL 's to recreate some partitions..

The problem was that this procedure was run by the scheduler multiple times for different tables one after another, and as these DDL operations are autocommit, this process was doing fetch across

commits..

Select Q1

commit

Select Q1

commit

Select Q1

commit.

Note that: When you commit, your undo will be overwritable in normal circumstances.

I have suspected from the delayed block cleanout first, but to fix the problematic situation that may arise from the delayed block cleanout; the procedure should be modified.. I mean, procedure should be modified such that, it should execute the query once and make its DDL operations using the data returned from the query, or it should execute the query for every DDL but it should not execute the same query twice.. (maybe a sort or hint can be added)

Delayed Block Cleanout represenation: (Ref : Oracle Support)

Stage 1 - No Changes Made

Description: This is the starting point. At the top of the data block we have an area used to link active transactions to a rollback segment (the 'tx' part), and the rollback segment header has a table that stores information upon all the latest transactions that have used that rollback segment.
In our example, we have two active transaction slots (01 and 02) and the next free slot is slot 03. (Since we are free to overwrite committed transactions.)

Stage 2 - Row 2 is Updated

Description: We have now updated row 2 of block 500. Note that the data block header is updated to point to the rollback segment 5, transaction slot 3 (5.3) and that it is marked uncommitted (Active).

Stage 3 - The User Issues a Commit

Description: Next the user hits commit. Note that all that this does is it updates the rollback segment header's corresponding transaction slot as committed. It does *nothing* to the data block.

Stage 4 - Another User Selects Data Block 500

Description: Some time later another user (or the same user) revisits data block 500. We can see that there is an uncommitted change in the data block according to the data block's header.
Oracle then uses the data block header to look up the corresponding rollback segment transaction table slot, sees that it has been committed, and changes data block 500 to reflect the true state of the datablock. (i.e. it performs delayed cleanout).

Anyways, as this requires modification, I have tried increasing Undo Tablespace size. I have done this to increase the count of rollback segments.. By increasing the rollback segments , the work would separate to more rollback segment and thus decrease the chance of overwriting a rollback transaction slot ..

Actually, increasing the size of undo have help, and the error is not appeared any more..

But still, I feel the need to write the possible causes of ORA-01555 errors..

Causes, which may lead to ORA-01555; (Ref: Asktom)

1)The rollback information itself is overwritten

2)The transaction slot in the rollback segment's transaction table (stored in the rollback segment's header) is overwritten

Possible fixes;

1. Increase size of rollback segment which will reduce the likelihood of overwriting rollback information that is needed.
2. Reduce the number of commits.
3. Run the processing against a range of data rather than the whole table.
4. Add additional rollback segments. This will allow the updates etc. to be spread across more rollback segments thereby reducing the chances of overwriting required rollback information.
5. If fetching across commits, the code can be changed so that this is not done.
6. Ensure that the outer select does not revisit the same block at different times during the processing. This can be achieved by : - Using a full table scan rather than an index lookup - Introducing a dummy sort so that we retrieve all the data, sort it and then sequentially visit these data blocks. Note that: This is only applicable if the reason is the rollback information overwritten(1)

2 comments :

muhammad khalilJuly 23, 2015 at 11:04 AM
Hi,
I suffered from the same situation, snapshot too old for queries even with 0 seconds. I check with many blogs but Oracle support's note 1950577.1 helped me.
ReplyDelete
Replies
AnonymousJuly 23, 2015 at 3:35 PM
Hi Muhammad,

A useful doc.
Thanks for sharing
ReplyDelete
Replies

Add comment

If you will ask a question, please don't comment here..

For your questions, please create an issue into my forum.

Forum Link: http://ermanarslan.blogspot.com.tr/p/forum.html

Register and create an issue in the related category.
I will support you from there.

Subscribe to: Post Comments ( Atom )