Tuesday, May 24, 2016

Oracle : How to deal online redo log files Corruption ?



If you suspect that the database is having problems writing redo logfiles, You can configure Oracle to use checksums to verify blocks in the redo logfiles.
SELECT * FROM V$PARAMETER WHERE NAME='db_block_checksum';
Setting the LOG_BLOCK_CHECKSUM initialization parameter to TRUE will enable redo log block checking. The default value of LOG_BLOCK_CHECKSUM is TRUE in oracle 9i and FALSE in earlier version.
When redo log block checking is enabled, Oracle computes a checksum for each redo log block written to the current log and writes that checksum in the header of the block.
Oracle uses the checksum to detect corruption in a redo log block and will try to verify the redo log block when it writes the block to an archivelog files or when the block is read from an archived log during recovery.
select n.name, m.value from sys.v_$mystat  m, sysv_$statname    
where m.statistic# in (12, 42, 164) and n.statistic# = m.statistic#;
If a redo log block is corrupted while the archive is being written, Oracle will try to read the block from another member in the group. If all members have a corrupted block, archiving will stop.
SELECT GROUP#, SEQUENCE#, BYTES, STATUS FROM V$LOG;
SELECT GROUP#, STATUS, MEMBER FROM V$LOGFILE;
Note: Setting LOG_BLOCK_CHECKSUM to TRUE will cause more I/O and CPU usage. The system should be monitored closely while this parameter is set.
Recover when only one online redologs file corrupted.
If you lose or corrupted only one of your online redologs, then you need only is, to open the database with the RESETLOGS option. Opening with RESETLOGS will recreate your online redologs. There is no need a backup of your control file for this operation. Your current control files will work. To restore, perform the following:
1. STARTUP MOUNT
2. RECOVER DATABASE UNTIL CANCEL  ---CANCEL (cancel right away)
3. ALTER DATABASE OPEN RESETLOGS;
Note: Once the database is open with RESETLOGS, You must take fresh database backup [especially for oracle version 9i].
Recover when all the online redologs file corrupted.
If you lose all members of a redo log group then the steps for maintenance is depending on the group status and whether or not the database is in archivelog mode.  
INACTIVE:  
If the affected redo log group has a status of INACTIVE it is no longer required for crash recovery.  If you are in NOARCHIVELOG mode issue either CLEAR LOGFILE or recreate the group manually.
FOR Example:
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP <group_number>;
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3;
ALTER DATABASE CLEAR LOGFILE GROUP <group_number>;
Note: This statement overcomes two situations where dropping redo logs is not possible. If there are only two log groups The corrupt redo log file belongs to the current group.
If you are in ARCHIVELOG mode and the group has been archived, issue either of the steps above. 
If you are in ARCHIVELOG mode and the group hasn’t been archived then issue CLEAR UNARCHIVED LOGFILE.  If you don’t specify the UNARCHIVED keyword you will receive an error.
Note: Perform a complete backup (including the control file) after executing the above command there is a chance to find a gap in the archivelog.
ACTIVE:
If the redo log group has a status of ACTIVE, it is still required for crash recovery.  Issue the command ALTER SYSTEM CHECKPOINT, If successful then follow the steps above for INACTIVE. 
If the checkpoint fails, then you need to perform recovery.  If you are in NOARCHIVELOG mode then you need to perform a complete recovery of the last cold backup. 
If the checkpoint fails and you are in ARCHIVELOG mode then you can perform an INCOMPLETE recovery up to the previous log file. 
CURRENT:
The current redo log group is the one, on which Oracle is currently writing to. If you lose this group or files become corrupted, LGWR may terminate. If LGWR terminates you have to recover the database:
If you are in ARCHIVELOG mode, perform an incomplete recovery up to the previous log file.
If you are in NOARCHIVELOG, perform a complete recovery of the last cold backup. If the database is still online, you can try and clear the logfile group. If the command is not successful then you will need to recover.

No comments:

Post a Comment