联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
一套近200T的,6个节点的RAC,由于存储管线链路不稳定,导致服务器经常性掉盘,引起asm 磁盘组频繁dismount/mount,数据库集群节点不停的重启,修复好链路问题之后,数据库启动报ORA-01113,ORA-01110
通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测,发现有10个数据文件异常,无法正常恢复

该库比较大,有近200T,因此恢复需要各位谨慎(无法做现场备份,另外客户要求2天时间必须恢复好)

由于数据库是非归档模式,该库无法通过应用归档日志来实现对这些文件进行恢复,对于这种情况,直接使用dbms_diskgroup把数据文件头拷贝到文件系统中,类似操作
SQL> @dbms_diskgroup_get_block.sql +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.headerParameter 1:ASM_file_name (required)Parameter 2:block_to_extract (required)Parameter 3number_of_blocks_to_extract (required)Parameter 4:FileSystem_File_Name (required)old 14: v_AsmFilename := '&ASM_File_Name';new 14: v_AsmFilename := '+DATA/xifenfei.dbf';old 15: v_offstart := '&block_to_extract';new 15: v_offstart := '1';old 16: v_numblks := '&number_of_blocks_to_extract';new 16: v_numblks := '1';old 17: v_FsFilename := '&FileSystem_File_Name';new 17: v_FsFilename := '/tmp/xff/xifenfei.dbf.header';File: +DATA/xifenfei.dbfType: 2 Data FileSize (in logical blocks): 3978880Logical Block Size: 16384Physical Block Size: 512PL/SQL procedure successfully completed. |
然后通过bbed修改相关scn
BBED> set filename 'xifenfei.dbf.header' FILENAME xifenfei.dbf.headerBBED> set blocksize 16384 BLOCKSIZE 16384BBED> map File: xifenfei.dbf.header (0) Block: 1 Dba:0x00000000------------------------------------------------------------ Data File Header struct kcvfh, 860 bytes @0 ub4 tailchk @16380 BBED> p kcvfh.kcvfhckp.kcvcpscnstruct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xa8061324 ub2 kscnwrp @488 0x0081BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xa8133e2b ub2 kscnwrp @488 0x0081 |
然后把修改的数据文件头写回到asm中
SQL> @dbms_diskgroup_cp_block_to_asm.sql /tmp/xff/xifenfei.dbf.header +DATA/xifenfei.dbf 1 1 Parameter 1:v_FsFileName (required)Parameter 2:v_AsmFileName (required)Parameter 3v_offstart (required)Parameter 4v_numblks (required)old 16: v_FsFileName := '&v_FsFileName';new 16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';old 17: v_AsmFileName := '&v_AsmFileName';new 17: v_AsmFileName := '+DATA/xifenfei.dbf';old 18: v_offstart := '&v_offstart';new 18: v_offstart := '1';old 19: v_numblks := '&v_numblks';new 19: v_numblks := '1';File: +DATA/xifenfei.dbfType: 2 Data FileSize (in logical blocks): 3978880Logical Block Size: 16384PL/SQL procedure successfully completed. |
查询文件头是否修改成功
[oracle@xff1 xff]$ sqlplus / as sysdbaSQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024Copyright (c) 1982, 2011, Oracle. All rights reserved.Connected to:Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit ProductionWith the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,Data Mining and Real Application Testing optionsSQL> set numw 16SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);CHECKPOINT_CHANGE#------------------ 556870614571 556870614571SQL> recover datafile 295;Media recovery complete. |
通过上述操作,确认bbed修改文件头成功,后续类似方法对其他9个文件进行修改,并打开数据库
SQL> recover database;Media recovery complete.SQL> alter database open;Database altered. |
alert日志提示
Sat Aug 10 16:46:11 2024ALTER DATABASE RECOVER datafile 295 Media Recovery StartSerial Media Recovery startedWARNING! Recovering data file 295 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.Media Recovery Complete (xff1)Completed: ALTER DATABASE RECOVER datafile 295 Sat Aug 10 16:46:39 2024ALTER DATABASE RECOVER database Media Recovery Start started logmerger processSat Aug 10 16:46:51 2024WARNING! Recovering data file 1139 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 1140 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 1601 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 1803 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 1827 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 1931 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 2185 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 2473 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.WARNING! Recovering data file 2616 from a fuzzy backup. It might be an onlinebackup taken without entering the begin backup command.Sat Aug 10 16:46:54 2024Parallel Media Recovery started with 64 slavesMedia Recovery Complete (xff1)Completed: ALTER DATABASE RECOVER database Sat Aug 10 17:19:58 2024alter database openThis instance was first to openSat Aug 10 17:19:58 2024SUCCESS: diskgroup DATA was mountedSat Aug 10 17:19:58 2024NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is establishedSat Aug 10 17:20:10 2024Picked broadcast on commit scheme to generate SCNsSat Aug 10 17:20:10 2024SUCCESS: diskgroup REDO was mountedSat Aug 10 17:20:10 2024NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is establishedThread 1 opened at log sequence 124958 Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.oraSuccessful open of redo thread 1MTTR advisory is disabled because FAST_START_MTTR_TARGET is not setSat Aug 10 17:20:14 2024SMON: enabling cache recoveryInstance recovery: looking for dead threadsInstance recovery: lock domain invalid but no dead threads[33770] Successfully onlined Undo Tablespace 2.Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)Verifying file header compatibility for 11g tablespace encryption..Verifying 11g file header compatibility for tablespace encryption completedSMON: enabling tx recoveryDatabase Characterset is ZHS16GBKSat Aug 10 17:20:16 2024minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000Starting background process GTX0Sat Aug 10 17:20:16 2024GTX0 started with pid=45, OS id=34119 Starting background process RCBGSat Aug 10 17:20:16 2024RCBG started with pid=46, OS id=34121 replication_dependency_tracking turned off (no async multimaster replication found)Starting background process QMNCSat Aug 10 17:20:16 2024QMNC started with pid=47, OS id=34134 Starting background process SMCOCompleted: alter database open |
检查数据字典一致性
SQL> @hcheck.sqlHCheck Version 07MAY18 on 10-AUG-2024 18:24:49----------------------------------------------Catalog Version 11.2.0.3.0 (1102000300)db_name: XFF Catalog FixedProcedure Name Version Vs Release TimestampResult------------------------------ ... ---------- -- ---------- --------------------.- LobNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS.- MissingOIDOnObjCol ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS.- SourceNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS.- OversizedFiles ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- PoorDefaultStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- PoorStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- TabPartCountMismatch ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- OrphanedTabComPart ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- MissingSum$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- MissingDir$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- DuplicateDataobj ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS.- ObjSynMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS.- ObjSeqMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS.- OrphanedUndo ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS.- OrphanedIndex ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS.- OrphanedIndexPartition ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS.- OrphanedIndexSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- OrphanedTable ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- OrphanedTablePartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- OrphanedTableSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- MissingPartCol ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- OrphanedSeg$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- OrphanedIndPartObj# ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- DuplicateBlockUse ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- FetUet ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- Uet0Check ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- SeglessUET ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- BadInd$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS.- BadTab$ ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadIcolDepCnt ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- ObjIndDobj ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- TrgAfterUpgrade ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- ObjType0 ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadOwner ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- StmtAuditOnCommit ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadPublicObjects ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadSegFreelist ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadDepends ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- CheckDual ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- ObjectNames ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS.- BadCboHiLo ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- ChkIotTs ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- NoSegmentIndex ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- BadNextObject ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- DroppedROTS ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- FilBlkZero ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- DbmsSchemaCopy ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- OrphanedObjError ... 1102000300 > 1102000000 08/10 18:24:54 PASS.- ObjNotLob ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS.- MaxControlfSeq ... 1102000300 <= *All Rel* 08/10 18:24:55 PASS.- SegNotInDeferredStg ... 1102000300 > 1102000000 08/10 18:25:18 PASS.- SystemNotRfile1 ... 1102000300 > 902000000 08/10 18:25:18 PASS.- DictOwnNonDefaultSYSTEM ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS.- OrphanTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS.- ObjNotTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS---------------------------------------10-AUG-2024 18:25:18 Elapsed: 29 secs---------------------------------------Found 0 potential problem(s) and 0 warning(s)PL/SQL procedure successfully completed.Statement processed.Complete output is in trace file:/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc |
运气不错,数据字典本身没有损坏,业务直接运行,一切正常(主要原因是在光纤链路不稳定的情况下,客户已经没有往库中写入数据)

浙公网安备 33010602011771号