Sie sind auf Seite 1von 12

Block corruptuion Erorr

(1) Introduction
~~~~~~~~~~~~~~~~
This short article discusses the oracle error ORA-1578 and describes the main
actions to take to deal with it. Please read the complete article before taking any action.
You are advised to contact Oracle support with as much information as possible for
advice on which option is applicable to your scenario.

Throughout the article there are 'Evidence>>' tags which show what information
you should collect. All SQL statements are for use in either SQLDBA or Server Manager
when connected 'internal'.

In some cases you may be sent this article in connection with errors other than
ORA 1578. In these cases Oracle support will inform you of the 'FILE' and 'BLOCK'
numbers to use in conjunction with the steps here.

(2) What is an ORA 1578 ?


~~~~~~~~~~~~~~~~~~~~~~~~~
An ORA-1578 error is raised when Oracle sees a data block which it thinks is
corrupt. The error should always provide you with the file number and the block number
of the block in question. For the purposes of this article we shall use:

<F> to denote the file number reported in the error


<B> to denote the block number reported in the error

The original cause of the error could be any of:


- Bad IO hardware / firmware
- Operating system IO or caching problems
- Memory or paging problems
- Oracle incorrectly attempting to access an unformatted block
- Part of the datafile overwritten
- Disk repair utilities
- any of a number of other causes.

The point in time when the error is raised may be much later than when any
corruption actually occurred.

(3) Initial Actions:


~~~~~~~~~~~~~~~~~~~~
Whenever such an error occurs note down the FULL error message/s and look in
the instance alert log and trace files for any associated errors. It is important to do this
first to assess whether this is a single block corruption or a more severe issue.
Once you have determined that it appears to be a single block problem then the
steps below can be used to help determine what action can be taken.

IMPORTANT: If there are multiple errors OR you also have operating system
level errors against the affected file there is little point proceeding until the underlying
problem has been addressed. Get your hardware vendor to check the system over and
contact Oracle Support with details of all errors.

Evidence>> - Record the original error in full, along with details of the application
which encountered the error.
- Save an extract from the alert log from a few hours before the FIRST
recorded problem up to the current point in time.
- Save any trace files mentioned in the alert log.
- Record any recent OS problems you have encountered.
- Note if you are using any special features - Eg: ASYNC IO, fast write
disk options etc..
- Record your current BACKUP position (Dates, Type etc...)
- Note if your database is in ARCHIVELOG mode or not
Eg: Issue "ARCHIVE LOG LIST" in SQLDBA.

(4) Which File and Object was the Error Reported Against ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first step to addressing a reported corruption is to find out which file the
corruption occurs on:

SELECT * FROM v$datafile WHERE file#= <F>;

The next step is to find out which object the error was reported
against. The following query will tell you the object name , owner
and type:

SELECT owner, segment_name, segment_type


FROM dba_extents
WHERE file_id = <F>
AND <B> BETWEEN block_id AND block_id + blocks - 1;

Based on the 'segment type' returned:

- If the segment type is TEMPORARY or CACHE or no rows are returned


recheck you have entered the SQL and parameters correctly.
If you get the same result contact Oracle support.

- If the segment is a ROLLBACK segment contact Oracle support.


The database is likely to require recovery.

- If the segment is an INDEX determine which TABLE it belongs to:


Eg:
SELECT table_owner, table_name
FROM dba_indexes
WHERE index_name='name-of-segment';

- If the segment is a CLUSTER determine which tables it contains.


Eg:
SELECT owner, table_name
FROM dba_tables
WHERE cluster_name='name-of-segment';

- If the segment is a TABLE just note the name and owner.

- If the segment type is none of the above contact Oracle support.

If the table involved is a data dictionary table contact Oracle


support. The database is likely to require recovery.

Evidence>> You should now have recorded:


- The filename of the bad file
- The corrupt object name, owner and type
- Any tables directly associated with this object

(5) Is there a Permanent Corruption ?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before deciding on the best action it is important to determine if there is a
permanent corruption on the object or not. The 'ANALYZE' command can be used to
help check this.

Eg: For either an INDEX or TABLE issue the following command for the
relevant 'owner.tablename':

ANALYZE TABLE owner. tablename VALIDATE STRUCTURE


CASCADE;

For a cluster issue the command:

ANALYZE CLUSTER owner. clustername VALIDATE STRUCTURE


CASCADE;

The possible outcomes are:


ORA 1578 with same arguments as the original error
ORA 1578 with different arguments to the original error
ORA 1498 error
No errors.
Some other error.
Whatever the outcome it is advisable to repeat the ANALYZE command a
second time. This helps show if there is a consistent repeatable problem (most likely
permanent corruption on disk) or if the problem is either intermittant / random (more
likely memory or intermittent hardware fault). Note that ANALYZE will stop at the first
severe corruption so it is possible that the FILE and BLOCK number reported as bad will
differ from the initial error if there is more than one corrupt block on the object. This is
why it is advisable to run ANALYZE at least twice.

The next actions depend on the outcome of the 2 (or more) analyze
commands:

ORA 1578 with repeatable arguments:


This is most likely a physically corrupt block.

ORA 1578 with different arguments each time:


Get your hardware vendor to check out the
memory, page space, and IO subsystems before
proceeding.

ORA 1498 error:


This means we have not seen the same corruption
before but the object has an internal corruption
of some form.
Your first step should be to contact the hardware
vendor to check your memory, page space and IO
subsystems for intermittant problems.
In addition you should contact Oracle Support with the
trace files produced by the ANALYZE command. The
content of the files may be completely unrelated to
the original ORA 1578 error.

No errors:
This means we have not seen the same corruption
as originally recorded.
You should still contact the hardware vendor to check
your memory, page space and IO subsystems for
intermittant problems.
It is also advisable to export the problem table as
ANALYZE will not detect all types of error.

Some other error:


Contact Oracle Support.

Evidence>> - Analyze commands entered and resulting error/s (if any).


- Check if there is a trace file in the USER_DUMP_DEST directory
and include this also.

(6) Recovering from Hardware Errors:


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have encountered potential hardware errors on a particular disk or
controller it is advisable to relocate the files. This article does NOT cover full details of
recovering from hardware errors (Eg: Disk controller returning different blocks to those
requested) but in summary if the file reporting the corruption error is affected by
hardware problems you should relocate it to a good disk and repeat the tests in section 5.

If you are in ARCHIVELOG mode, as is recommended for production systems, you


should just need to:

OFFLINE the affected data file,


Restore its last backup on a GOOD disk,
Rename the datafile to the NEW location,
Recover the datafile (using the archived logs),
Online the file.

The difficulty here is to know which backup (if any) does not contain the
corruption/s.

If you are NOT in ARCHIVELOG mode you cannot do this. You can offline
the bad file, copy it elsewhere, rename it and online it but this assumes that a copy
command will read the file without error which may not be true.

If you relocate any files you should now repeat section 5.

(7) A Physical Corruption:


~~~~~~~~~~~~~~~~~~~~~~~~~~
To get to this section you should be encountering repeatable corruption
errors on an ANALYZE ... VALIDATE ... command. You should also have
either moved the file/s off of any suspect hardware or replaced such
hardware if possible.

If the FILE and BLOCK errors from the ANALYZE command are different to
those originally reported substitute them into the SELECT statement in
section (4) above to determine the actual corrupt object.

If this object is a user INDEX you can simply drop and re-create it.

If this object is a TABLE or CLUSTER it must be understood that the


data within the corrupt block is lost. The rest of the table can
usually be salvaged as described in the following sections.
Evidence>>
- Before trying to salvage data from the corrupt table you should
collect the following physical evidence if there is a need to try
and identify the actual cause of the corruption:
i) An operating system HEX dump of the bad block
On UNIX:
dd if=FILENAME bs=DB_BLOCK_SIZE skip=<B> count=3 | od -x
^^^^^^^^ ^^^^^^^^^^^^^ ^^^
Eg:
dd if=ts11.dbf bs=4k skip=1223 count=3 | od -x

On VMS:
DUMP/BLOCKS=(start:XXXX,end:YYYY)/out=file.out dbfile.dbf

Where XXXX=Operating system block number (in 512 byte chunks)


To calculate this multiply the block number reported by
'DB_BLOCK_SIZE/512'.

ii) If you are in ARCHIVELOG mode make a safe copy of the archived
log files around the time of the error, and preferably for a few
hours before the error was reported. These may be required later.

(8) Going back in Time ?


~~~~~~~~~~~~~~~~~~~~~~~~
Whatever sort of block the problem occurred on one possible option
to be considered while looking at alternatives is to recover
the whole database to a point in time BEFORE the corruption appeared.
The difficulty with this option is that it is not always possible
to know when the problem first appeared.

Evidence>> From this section on there are no more Evidence tags as the
focus is on salvaging data. However, you are strongly advised
to keep a track of exactly what you do should things go wrong.

(9) Salvaging data from a table:


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before attempting to salvage the rest of the data check:

- Do you have a recent export from which you could rebuild


the table completely ?

- Do you have a backup copy of the database from which


you could create such an export ?

If you need to retrieve data from the table itself you effectively
need to select out the data avoiding the bad block/s.
There are several ways you can achieve this:

- In Oracle 7.1 onwards you can use a ROWID range scan.

- If there is a primary key you can select via this index.

- If there is a non-unique key you can select some of the


data via this index.

- For certain kinds of corrupt block you can set an Oracle


event to SKIP the block. This only applies to one particular
kind of corruption and is unlikely to be of use unless
you are on Oracle7 release 7.2 or higher. This option
is NOT discussed in this article.

Each of these is discussed below. Before looking at extracting the


data you need to understand what a 'ROWID' is. This is discussed
first.

(10) ROWID:
~~~~~~~~~~~
Each row in a table in the database can be uniquely referenced by
a 'ROWID' psuedo-column. A rowid is a hexadecimal representation of
of a rows location in the database thus:

BLOCK.ROW-SLOT.FILE

Eg: If we issued SELECT EMPNO, ROWID FROM EMP we would get


something
like:

EMPNO ROWID
----- -----
100 00000003.0000.000A
101 00000003.0001.000A
... etc

Hence these 2 rows are in file 0xA which is 10 decimal, block 3.

For any bad block we can make a template ROWID by converting <F> and
<B> from the error into hexadecimal and using the above ROWID format:

Eg: For file 11, block 22 we would have:

File: 11 == B hex
Block: 22 == 16 hex

Hence the rowid's for rows in this block could range from

00000016.0000.000B through to 00000016.7FFF.000B

as the maximum row-slot number is 7FFF hex.

(11-A) Oracle 7.1 Rowid Range Scans:


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are on a release prior to 7.1 see section 11-B.
In Oracle 7.1 onwards you can select data from the base table using a
ROWID range scan hint. This allows you to select AROUND the corrupt
block.

The syntax of the command is simply:

SELECT /*+ ROWID(tablename) */ column list


FROM tablename
WHERE rowid ...

The WHERE clause has to do a ROWID range comparison such as:

WHERE rowid BETWEEN 'low-rowid' and 'high-rowid';

We can use this type of statement several times to select data around
the bad blocks and insert it into a SALVAGE table. Note that when
constructing a where clause we must not reference a ROWID in the
corrupt block directly but use the ROWIDs either side of the corrupt
block.

Eg:
SQL> SELECT count(*) FROM bigemp;
ERROR:
ORA-01578: ORACLE data block corrupted (file # 8, block # 8147)
ORA-01110: data file 8: '/oracle/m4/v714/instance/dbs/usr3714.dbf'

Corrupt File ID: 8 == 8 Hex


Corrupt Block ID: 8147 == 1FD3 Hex

Hence:
First ROWID in the corrupt block: 00001FD3.0000.0008
Last possible ROWID before this block: 00001FD2.7FFF.0008
First ROWID after this block: 00001FD4.0000.0008

So we can:
SQL> CREATE TABLE salvage AS SELECT * FROM bigemp WHERE 1 = 2;

SQL> INSERT INTO salvage


SELECT /*+ ROWID(bigemp) */ * FROM bigemp
WHERE rowid <= '00001FD2.7FFF.0008';

SQL> INSERT INTO salvage


SELECT /*+ ROWID(bigemp) */ * FROM bigemp
WHERE rowid >= '00001FD4.0000.0008';

Note we are using <= and >= with rowid's either side of the corrupt
block.

(11-B) Pulling Data Out Of Tables Through Indexes:


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This option is really for use in versions before Oracle 7.1 where ROWID
range scans do not exist. Suppose you have the same corrupted table
BIGEMP with columns EMPNO, ENAME and DEPTNO and that you have a
unique index on the EMPNO column. Assuming the index is also not
corrupt and that we are looking at a not null column we can select
data from the table via the index as the index holds the ROWIDs for
each row in the table.

Eg: SELECT rowid, empno FROM bigemp WHERE empno>=0;

ROWID EMPNO
------------------ ----------
00001FC3.000C.0008 177
00001FC3.000D.0008 178
00001FC3.0010.0008 181
00001FC3.0011.0008 182
...

We use EMPNO>=0 to enable the index. This does not error as the
above query is satisfied wholly from the INDEX.

Note that if the index were on a character column we would change


the above where clause to something like WHERE EMPNO > '';

As the index holds the ROWID we can use a filter in the WHERE clause
to identify indexed column values that reside in the bad block. We use
a where clause of the form:
ROWIDTOCHAR( ROWID ) LIKE 'BLOCK.%.FILE'

Eg: Using the previous values for the corrupt block and file:
SQL> SELECT empno from BIGEMP
WHERE EMPNO >= 0
AND ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008';

EMPNO
----------
821
822
....
861

If there are only a few rows or they are contiguous values you can
select the data out using a clause like:

WHERE empno <821

Eg: Create a duplicate table:

CREATE TABLE TEMP AS SELECT * FROM BIGEMP WHERE 1 = 2;

and then insert into that table around the corruption:

INSERT INTO TEMP SELECT * FROM BIGEMP WHERE EMPNO <


821;
INSERT INTO TEMP SELECT * FROM BIGEMP WHERE EMPNO >
861;

If there are a lot of rows in the corrupt block OR the indexed values
are not contiguous you need to use a more complex method of extraction:

Eg: To get a list of key values that we want to extract:

CREATE TABLE pk_table AS


SELECT empno FROM bigemp
WHERE empno >= 0
AND ROWIDTOCHAR(ROWID) NOT LIKE '00001FD3.%.0008';

Now build the destination table:

CREATE TABLE temp AS SELECT * FROM bigemp WHERE 1 = 2;

And insert the data:

INSERT INTO temp SELECT bigemp.* FROM bigemp, pk_table


WHERE bigemp.empno >= 0
AND bigemp.empno = pk_table.empno;

(12) Possible Problems:


~~~~~~~~~~~~~~~~~~~~~~~
Chained Rows:
It is possible for a row to be either "chained" or "migrated".
Such a row spans more than one Oracle block. The 'ROWID' is the
location of the first portion (or 'head') of the row. If there is
a corruption on a block containing a piece of a chained row other
than its 'head' block then you will need to adjust the above
procedures to exclude not only the corrupt block but also any rows
which have part of their row-piece in the corrupted block.

Such rows can be identified by selecting 'ROWID , TABLE.*' from the


table using a relevant WHERE clause to avoid the known corrupt block.
If you get an error then the last row returned by this query should
show the ROWID before the row which has part of its contents in the
corrupt block. You should adjust the relevant query above to avoid
this "chained" or "migrated" row also.

It may require several attempts at building 'salvage' queries before


it is possible to extract all available data without error.

Long Columns:
LONG columns have restrictions in that only certain SQL operations
are allowed on them. You can use the same methods as described in
section 11-A or 11-B to extract the data but you may need to use the
sqlplus 'COPY' command to actually copy the data. Note that sqlplus
COPY has a port specific limit on the maximum size of LONG you can
copy. Refer to the SQLPLUS User Guide and your port specific
documentation to determine if this is feasible or not.

(13) What data have I lost ?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have an index on the corrupt table you can often determine
something about the data you have lost using the same method as
in section 11-B.
Suppose you have the same corrupted table BIGEMP with columns EMPNO,
ENAME and DEPTNO and that you have a unique index on the EMPNO column.
You can select data from the index for the corrupt block thus:

SQL> SELECT empno from BIGEMP


WHERE EMPNO >= 0
AND ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008';
Where "EMPNO >= 0" forces an index range scan
and "ROWIDTOCHAR(ROWID) LIKE '00001FD3.%.0008'" restricts the results
to those entries which would fall in the bad block.

You can use this approach on all indexes on the table to get as much
information as possible about the missing data.