• Oracel数据文件损坏如何强制完成介质恢复?
  • 发布于 2个月前
  • 386 热度
    0 评论
  • 梦清幽
  • 0 粉丝 43 篇博客
  •   
数据库版本:
Oracle 11.2.0.4.0
问题现象:
cjc01.dbf数据文件损坏,通过RMAN进行恢复,由于缺失需要的归档文件,导致介质恢复失败,数据库无法OPEN。
OPEN报错如下:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'
介质恢复报错如下,由于缺失thread 1 sequence 15 archived log,导致介质恢复失败:
RMAN> recover datafile 5;

Starting recover at 2024-10-19 11:09:42
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 16 is already on disk as file /arch/cjc_1_16_1177245832.arc
archived log for thread 1 with sequence 17 is already on disk as file /arch/cjc_1_17_1177245832.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/19/2024 11:09:42
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 15 and starting SCN of 1128934 found to restore

问题分析:

数据文件损坏,通常需要RMAN+归档进行完全恢复,如果需要的归档文件同时损坏或有缺失,介质恢复一般无法完成,数据库不能正常启动,如何在归档缺失的情况下强制完成介质恢复,启动数据库呢?


解决方案:
由于数据库启动时,会检查每个数据文件头的 checkpoint scn,通过备份restore的数据文件cjc01.dbf 的 checkpoint scn比其他数据文件小,所以需要介质恢复,那么通过bbed等工具强制修改cjc01.dbf数据文件头部的checkpoint scn,改成和其他数据文件相同,是否能欺骗数据库完成介质恢复,启动数据库呢?

实验过程如下:
1.归档模式
SQL> archive log list;
Database log mode           Archive Mode
Automatic archival           Enabled
Archive destination           /arch
Oldest online log sequence     13
Next log sequence to archive   15
Current log sequence           15
2.创建测试数据
SQL> create tablespace cjc datafile '/oracle/app/oracle/oradata/cjc/cjc01.dbf' size 10M autoextend on;
SQL> create user cjc identified by "a" default tablespace cjc;
SQL> grant dba to cjc;
SQL> conn cjc/a
SQL> create table t1(id int);
SQL> insert into t1 values(1);
SQL> insert into t1 values(2);
SQL> commit;
SQL> select * from t1;
    ID
----------
     1
     2
3.rman备份数据文件cjc01.dbf
SQL> col name for a50
SQL> select * from v$dbfile order by 1;
     FILE# NAME
---------- --------------------------------------------------
     1 /oracle/app/oracle/oradata/cjc/system01.dbf
     2 /oracle/app/oracle/oradata/cjc/sysaux01.dbf
     3 /oracle/app/oracle/oradata/cjc/undotbs01.dbf
     4 /oracle/app/oracle/oradata/cjc/users01.dbf
     5 /oracle/app/oracle/oradata/cjc/cjc01.dbf

[oracle@cjc-db-01 ~]$ mkdir /oradata/back
[oracle@cjc-db-01 ~]$ rman target /
RMAN> backup datafile 5 format '/oradata/back/rman_5_cjc01_%U';
...
input datafile file number=00005 name=/oracle/app/oracle/oradata/cjc/cjc01.dbf
...
piece handle=/oradata/back/rman_5_cjc01_0137v7oe_1_1 tag=TAG20241019T105510 comment=NONE
...
# 堆代码 duidaima.com
[oracle@cjc-db-01 ~]$ ls -lrth /oradata/back/
total 1.2M
-rw-r----- 1 oracle oinstall 1.2M Oct 19 10:55 rman_5_cjc01_0137v7oe_1_1
4.查看归档序列号
select sequence#,status from v$archived_log;
 SEQUENCE# S
---------- -
......
    13 A
    14 A
11 rows selected.
5.归档切换
alter system switch logfile;
6.新增数据和表
SQL> conn cjc/a
SQL> select * from t1;
SQL> insert into t1 values(3);
SQL> commit;
SQL> create table t2 as select * from t1;
7.归档切换
alter system switch logfile;
alter system switch logfile;
8.查看归档序列号
select sequence#,status from v$archived_log;
 SEQUENCE# S
---------- -
......
    13 A
    14 A
    15 A
    16 A
    17 A

14 rows selected.
9.模拟故障:
(1)停库
SQL> conn / as sysdba
SQL> shutdown immediate
(2)数据文件丢失;
[oracle@cjc-db-01 cjc]$ pwd
/oracle/app/oracle/oradata/cjc
[oracle@cjc-db-01 cjc]$ mv cjc01.dbf cjc01.dbf.bak
(3)归档丢失
[oracle@cjc-db-01 arch]$ pwd
/arch
[oracle@cjc-db-01 arch]$ ls -lrth
total 323M
......
-rw-r----- 1 oracle oinstall  50M Oct 13 13:00 cjc_1_13_1177245832.arc
-rw-r----- 1 oracle oinstall  13M Oct 13 15:53 cjc_1_14_1177245832.arc
-rw-r----- 1 oracle oinstall  29M Oct 19 10:59 cjc_1_15_1177245832.arc
-rw-r----- 1 oracle oinstall  21K Oct 19 11:01 cjc_1_16_1177245832.arc
-rw-r----- 1 oracle oinstall 2.5K Oct 19 11:01 cjc_1_17_1177245832.arc

[oracle@cjc-db-01 arch]$ mv cjc_1_15_1177245832.arc cjc_1_15_1177245832.arc.bak
10.启动数据库失败
SQL> startup
ORACLE instance started.

Total System Global Area  563691520 bytes
Fixed Size            2255232 bytes
Variable Size          222299776 bytes
Database Buffers      335544320 bytes
Redo Buffers            3592192 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'
11.恢复数据库失败,缺失需要的thread 1 sequence 15 archived log
[oracle@cjc-db-01 arch]$ rman target /
RMAN> restore datafile 5;
RMAN> recover datafile 5;

Starting recover at 2024-10-19 11:09:42
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 16 is already on disk as file /arch/cjc_1_16_1177245832.arc
archived log for thread 1 with sequence 17 is already on disk as file /arch/cjc_1_17_1177245832.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/19/2024 11:09:42
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 15 and starting SCN of 1128934 found to restore
无法 open 数据库
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/oracle/app/oracle/oradata/cjc/cjc01.dbf'
12.检查控制文件记录的CHECKPOINT_CHANGE#,数据文件头记录的CHECKPOINT_CHANGE#
SQL> select file#,status,CHECKPOINT_CHANGE# from v$datafile;

     FILE# STATUS  CHECKPOINT_CHANGE#
---------- ------- ------------------
     1 SYSTEM          1137224
     2 ONLINE          1137224
     3 ONLINE          1137224
     4 ONLINE          1137224
     5 ONLINE          1137224
可以看到,FILE#=5 CHECKPOINT_CHANGE# 为1136379,小于其他文件的1137224
SQL> select file#,status,recover,CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# STATUS  REC CHECKPOINT_CHANGE#
---------- ------- --- ------------------
     1 ONLINE  NO          1137224
     2 ONLINE  NO          1137224
     3 ONLINE  NO          1137224
     4 ONLINE  NO          1137224
     5 ONLINE  YES          1136379
查看日志序列号
SQL> select thread#,group#,sequence#,FIRST_CHANGE#,NEXT_CHANGE#,status from v$log;

   THREAD#     GROUP#  SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# STATUS
---------- ---------- ---------- ------------- ------------ ----------------
     1        1          16       1136991        1137048 INACTIVE
     1        3          18       1137051     2.8147E+14 CURRENT
     1        2          17       1137048        1137051 INACTIVE
将 1136379 和 1137224 转换为16进制。
select to_char(1136379,'XXXXXXXXXXXX'),to_char(1137224,'XXXXXXXXXXXX') from dual;
TO_CHAR(11363 TO_CHAR(11372
------------- -------------
       1156FB         115A48
13.BBED修改文件头检查点SCN和RBA
[oracle@cjc-db-01 ~]$ bbed
BBED: Release 2.0.0.0.0 - Limited Production on Sat Oct 19 12:27:27 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************
查看数据文件
BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /oracle/app/oracle/oradata/cjc/system01.dbf                      96000
     2  /oracle/app/oracle/oradata/cjc/sysaux01.dbf                      66560
     3  /oracle/app/oracle/oradata/cjc/undotbs01.dbf                      8960
     4  /oracle/app/oracle/oradata/cjc/users01.dbf                         640
     5  /oracle/app/oracle/oradata/cjc/cjc01.dbf                          1280
查看 file 1 文件头检查点SCN,和 v$datafile_header 查询值相同
BBED> set file 1 block 1 
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48   ---16进制转成10进制是 1137224
      ub2 kscnwrp                           @488      0x0000
查看 file 2 文件头检查点SCN,和 v$datafile_header 查询值相同
BBED> set file 2 block 1 
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48   ---16进制转成10进制是 1137224
      ub2 kscnwrp                           @488      0x0000
查看 file 5 文件头检查点SCN,和 v$datafile_header 查询值相同,和 file 1,file 2值不同
BBED> set file 5 block 1
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x001156fb   ---16进制转成10进制是 1136379
      ub2 kscnwrp                           @488      0x0000
将 file 5 block 1 的 kscnbas 修改为 0x00115a48
BBED> modify /x 485a1100 offset 484 
BBED> p kcvfhckp
BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x00115a48
      ub2 kscnwrp                           @488      0x0000
查看file 1文件头检查点RBA
BBED> set file 1 block 1 
BBED> p kcvfhckp
......   
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010
查看file 2文件头检查点RBA,和 file 1相同
BBED> set file 2 block 1 
BBED> p kcvfhckp
......
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010
查看file 5文件头检查点RBA,和file 1,file 2不同
BBED> set file 5 block 1
BBED> p kcvfhckp
......
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000000f
         ub4 kcrbabno                       @504      0x0000de69
         ub2 kcrbabof                       @508      0x0010
将 file 5文件头检查点RBA改成和file 1,file 2相同
将 file 5 block 1 的 kcrbaseq 改成 0x00000012
将 file 5 block 1 的 kcrbabno 改成 0x0000008d
BBED> modify /x 12 offset 500 
BBED> modify /x 8d00 offset 504
BBED>  p kcvfhckp
......
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000012
         ub4 kcrbabno                       @504      0x0000008d
         ub2 kcrbabof                       @508      0x0010
......
提交修改
BBED> sum apply

14.再次执行介质恢复,成功
[oracle@cjc-db-01 arch]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Sat Oct 19 12:56:01 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CJC (DBID=3819394245, not open)
RMAN> recover datafile 5;

Starting recover at 2024-10-19 12:56:09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=18 device type=DISK

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 2024-10-19 12:56:10
检查数据文件头CHECKPOINT_CHANGE#,已修改为1137224
SQL> select file#,status,recover,CHECKPOINT_CHANGE# from v$datafile_header;

     FILE# STATUS  REC CHECKPOINT_CHANGE#
---------- ------- --- ------------------
     1 ONLINE  NO          1137224
     2 ONLINE  NO          1137224
     3 ONLINE  NO          1137224
     4 ONLINE  NO          1137224
     5 ONLINE  NO          1137224
15.可以正常启动数据库
SQL> alter database open;
Database altered
16.数据验证
查询,cjc.t1表丢失1条数据,cjc.t2表全部丢失,也就是15 号归档及以后所有数据都全部丢失。
SQL> select * from cjc.t1;

    ID
----------
     1
     2

SQL> select * from cjc.t2;
select * from cjc.t2
                  *
ERROR at line 1:
ORA-08103: object no longer exists

SQL> select * from cjc.t100;
select * from cjc.t100
                  *
ERROR at line 1:
ORA-00942: table or view does not exist
查看 ORA-08103 错误说明:
[oracle@cjc-db-01 ~]$ oerr ora 08103
08103, 00000, "object no longer exists"
// *Cause:  The object has been deleted by another user since the operation
//          began, or a prior incomplete recovery restored the database to
//          a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
//          recovery.
翻译如下:
自操作开始以来,该对象已被其他用户删除,或者之前的不完全恢复将数据库还原到删除该对象期间的某个时间点。
The object has been deleted by another user since the operation began, or a prior incomplete recovery restored the database to a point in time during the deletion of the object.
如果这是不完全恢复的结果,请删除该对象。
Delete the object if this is the result of an incomplete recovery.
可以看到该表
set line 300
col owner for a10
col object_name for a10
col created for a25
col LAST_DDL_TIME for a25
select owner,object_name,created,LAST_DDL_TIME  from dba_objects where owner='CJC';
OWNER       OBJECT_NAM CREATED            LAST_DDL_TIME
---------- ---------- ------------------------- -------------------------
CJC       T1          2024-10-19 10:51:12    2024-10-19 10:51:12
CJC       T2          2024-10-19 10:59:52    2024-10-19 10:59:52

SQL> conn cjc/a
SQL> select * from tab;

TNAME                   TABTYPE    CLUSTERID
------------------------------ ------- ----------
T1                   TABLE
T2                   TABLE

SQL> select * from t1;

    ID
----------
     1
     2
无法查询数据
SQL> select * from t2;
select * from t2
              *
ERROR at line 1:
ORA-08103: object no longer exists
可以删除t2
SQL> drop table t2 purge;
Table dropped.
其他:
除了v$datafile_header 和 bbed 可以查询 数据文件头 CHECKPOINT_CHANGE# 和 RBA以外,Dump Data File Header也可以查询到相同的信息,方法如下:
SQL> select spid from v$process where addr in (select paddr from v$session where sid = (select sid from v$mystat where rownum=1)); 
SPID
------------------------
4870
SQL> alter session set events 'immediate trace name FILE_HDRS level 10';
[oracle@cjc-db-01 trace]$ ls -lrth *4870*
-rw-r----- 1 oracle oinstall 186 Oct 19 11:13 cjc_ora_4870.trm
-rw-r----- 1 oracle oinstall 39K Oct 19 11:13 cjc_ora_4870.trc
查看trace,可以看到 CHECKPOINT_CHANGE# , RBA 和上面查询的值一致。
[oracle@cjc-db-01 trace]$ vi cjc_ora_4870.trc
......
DUMP OF DATA FILES: 5 files in database

DATA FILE #1:
  name #7: /oracle/app/oracle/oradata/cjc/system01.dbf
.....
Checkpointed at scn:  0x0000.00115a48 10/19/2024 11:05:37  ---CHECKPOINT_CHANGE#
 thread:1 rba:(0x12.8d.10)                                 ---RBA
......

DATA FILE #5:
  name #9: /oracle/app/oracle/oradata/cjc/cjc01.dbf
.....
Checkpointed at scn:  0x0000.001156fb 10/19/2024 10:55:10   ---CHECKPOINT_CHANGE#
 thread:1 rba:(0xf.de69.10)                                 ---RBA
......
注意:
此方法仅供参考,有丢失数据的风险,请勿用于生产环境。
用户评论