Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
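Author: 惜分飞 © All rights reserved. [Reproduction in any form without my consent is prohibited; otherwise I reserve the right to pursue legal action.]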
The storage suddenly went offline and the database crashed, logging a large number of errors such as ORA-00206, ORA-00202, ORA-15081, and Linux-x86_64 Error: 5: Input/output error:
Sun Jul 21 20:00:11 2024
Thread 1 advanced to log sequence 1594398 (LGWR switch)
  Current log# 5 seq# 1594398 mem# 0: +DATA/xff/onlinelog/group_5.412.906718739
Sun Jul 21 20:53:17 2024
WARNING: Write Failed. group:2 disk:0 AU:506916 offset:49152 size:16384
Sun Jul 21 20:53:17 2024
WARNING: Read Failed. group:2 disk:2 AU:506931 offset:49152 size:16384
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 415 in group [2.34109396] from disk ORACLE_DATA_0002 allocation unit 506931 reason error; if possible, will try another mirror side
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 16384
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 415 in group 2 on disk 0 allocation unit 506916
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
CKPT (ospid: 42142): terminating the instance due to error 221
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_lmon_42087.trc:
ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'
ORA-15081: failed to submit an I/O operation to a disk
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 1038194784
Additional information: -1
Sun Jul 21 20:53:19 2024
ORA-1092 : opitsk aborting process
Sun Jul 21 20:53:24 2024
ORA-1092 : opitsk aborting process
Sun Jul 21 20:53:24 2024
License high water mark = 59
Sun Jul 21 20:53:28 2024
Instance terminated by CKPT, pid = 42142
USER (ospid: 64660): terminating the instance
Instance terminated by USER, pid = 64660
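When an alert log is this noisy, a quick tally of the ORA- codes helps show which error dominates and which one actually killed the instance. The following is only a minimal sketch and not part of the original recovery: the function name `count_ora_errors` and the `sample` excerpt (a few lines taken from the log above, message text in English) are illustrative assumptions.

```python
import re
from collections import Counter

# Matches both the padded form "ORA-00206" and the short form "ORA-1092"
# that alert logs sometimes use.
ORA_CODE = re.compile(r"ORA-(\d{1,5})")


def count_ora_errors(alert_text: str) -> Counter:
    """Count ORA- codes in alert-log text, normalised to five digits."""
    return Counter(
        f"ORA-{int(code):05d}" for code in ORA_CODE.findall(alert_text)
    )


# Hypothetical excerpt for illustration; a real run would read the alert log file.
sample = """\
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
CKPT (ospid: 42142): terminating the instance due to error 221
ORA-1092 : opitsk aborting process
"""

if __name__ == "__main__":
    for code, hits in count_ora_errors(sample).most_common():
        print(code, hits)
```

Normalising to five digits means `ORA-1092` and `ORA-01092` are counted together; lines such as "due to error 221" are deliberately not matched, because they carry no `ORA-` prefix.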
After the storage was restored, starting the database failed with ORA-600 [2131]:
Mon Jul 22 09:10:04 2024
ALTER DATABASE MOUNT
This instance was first to mount
Mon Jul 22 09:10:04 2024
Sweep [inc][490008]: completed
Sweep [inc2][490008]: completed
NOTE: Loaded library: System
SUCCESS: diskgroup ORACLE_DATA was mounted
NOTE: dependency between database rac and diskgroup resource ora.ORACLE_DATA.dg is established
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_14301.trc (incident=492409):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_492409/xff1_ora_14301_i492409.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT...
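The customer tried to recover by recreating the control file (ctl). Because their analysis was incorrect, the recreated control file omitted three data files, and they then forced a resetlogs open with consistency checks disabled. The database did not open normally; it failed with ORA-600 [2662]: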
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 9965567206652
Clearing online redo logfile 1 +DATA/xff/onlinelog/group_1.414.906718739
Clearing online log 1 of thread 1 sequence number 0
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 +DATA/xff/onlinelog/group_2.413.906718739
Clearing online log 2 of thread 1 sequence number 0
Clearing online redo logfile 2 complete
Clearing online redo logfile 5 +DATA/xff/onlinelog/group_5.412.906718739
Clearing online log 5 of thread 1 sequence number 0
Clearing online redo logfile 5 complete
Expanded controlfile section 2 from 1 to 63 records
The number of logical blocks in section 2 remains the same
Expanded controlfile section 1 from 4 to 66 records
Requested to grow by 62 records; added 32 blocks of records
Expanded controlfile section 30 from 1 to 63 records
The number of logical blocks in section 30 remains the same
Expanded controlfile section 29 from 1 to 63 records
The number of logical blocks in section 29 remains the same
Control file has been expanded to support 63 threads
Mon Jul 22 23:04:07 2024
Redo thread 2 enabled by open resetlogs or standby activation
Online log +DATA/xff/onlinelog/group_1.414.906718739: Thread 1 Group 1 was previously cleared
Online log +DATA/xff/onlinelog/group_2.413.906718739: Thread 1 Group 2 was previously cleared
Online log +DATA/xff/onlinelog/group_3.501.1175036643: Thread 2 Group 3 was previously cleared
Online log +DATA/xff/onlinelog/group_4.502.1175036645: Thread 2 Group 4 was previously cleared
Online log +DATA/xff/onlinelog/group_5.412.906718739: Thread 1 Group 5 was previously cleared
Mon Jul 22 23:04:08 2024
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 9965567206655, threshold SCN value is 0
If you have not previously reported this warning on this database, please notify Oracle Support so that additional diagnosis can be performed.
Mon Jul 22 23:04:09 2024
Assigning activation ID 2763017873 (0xa4b04e91)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/xff/onlinelog/group_1.414.906718739
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Jul 22 23:04:10 2024
SMON: enabling cache recovery
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc (incident=624374):
ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_624374/xff1_ora_64210_i624374.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:
ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:
ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 64210): terminating the instance due to error 600
Instance terminated by USER, pid = 64210
ORA-1092 signalled during: alter database open resetlogs...
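The ORA-600 [2662] arguments are easier to reason about once decoded. The sketch below assumes the commonly documented layout (current SCN wrap and base, dependent SCN wrap and base, then the RDBA of the block that carried the dependent SCN) and the usual smallfile RDBA split of 10 bits of relative file number and 22 bits of block number. The helper names are hypothetical, and this is an illustration rather than the tool used in this recovery.

```python
from typing import NamedTuple, Sequence


class Ora2662(NamedTuple):
    current_scn: int    # SCN the database is currently at
    dependent_scn: int  # higher SCN found in a block
    file_no: int        # relative file number taken from the RDBA
    block_no: int       # block number taken from the RDBA


def scn_from_wrap_base(wrap: int, base: int) -> int:
    """Combine SCN wrap and base: wrap * 2**32 + base."""
    return (wrap << 32) | base


def decode_ora_2662(args: Sequence[int]) -> Ora2662:
    """Decode the five numeric arguments that follow [2662].

    Assumed order: cur_wrap, cur_base, dep_wrap, dep_base, rdba.
    """
    cur_wrap, cur_base, dep_wrap, dep_base, rdba = args
    return Ora2662(
        current_scn=scn_from_wrap_base(cur_wrap, cur_base),
        dependent_scn=scn_from_wrap_base(dep_wrap, dep_base),
        file_no=rdba >> 22,         # upper 10 bits (smallfile assumption)
        block_no=rdba & 0x3FFFFF,   # lower 22 bits
    )


if __name__ == "__main__":
    # Arguments copied from the ORA-00600 [2662] line in the alert log.
    d = decode_ora_2662([2320, 1243079939, 2320, 1243211805, 12583040])
    print(d)
    print("SCN gap:", d.dependent_scn - d.current_scn)
```

With the values from this log, the current SCN decodes to 9965567206659 (in line with the "High Database SCN" value a few lines earlier), the dependent SCN to 9965567338525, and the RDBA to file 3, block 128. Under these assumptions the database SCN is about 131866 behind the block that triggered the error, which is the gap any SCN adjustment has to cover.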
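From this point on, the remaining problems became more troublesome. Three data files in the ASM disk group had been left out when the control file was recreated, and the database had then gone through a resetlogs operation, so the resetlogs SCN of these three files no longer matched that of the other data files. The fix is to modify the relevant resetlogs SCN with the Oracle Recovery Tools utility or bbed, and then recreate the control file: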
SQL> @rectl.sql
Control file created.
SQL> RECOVER DATABASE;
Media recovery complete
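Before patching file headers, it helps to confirm exactly which files carry the odd resetlogs SCN. The sketch below assumes the (file number, resetlogs SCN) pairs have already been collected by some means, for example from the RESETLOGS_CHANGE# column of V$DATAFILE_HEADER on a mounted instance. The function name and the sample numbers are hypothetical and are not taken from this incident.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_resetlogs_outliers(
    headers: Iterable[Tuple[int, int]],
) -> Tuple[int, List[int]]:
    """Return (majority resetlogs SCN, file numbers that differ from it).

    headers: (file_no, resetlogs_scn) pairs, one per data file.
    """
    rows = list(headers)
    if not rows:
        raise ValueError("no data file headers supplied")
    # The value shared by most files is treated as the reference SCN.
    majority_scn, _ = Counter(scn for _, scn in rows).most_common(1)[0]
    outliers = [file_no for file_no, scn in rows if scn != majority_scn]
    return majority_scn, outliers


if __name__ == "__main__":
    # Made-up values: files 7, 8 and 9 still carry a different resetlogs SCN.
    sample = [(1, 1000), (2, 1000), (3, 1000), (4, 1000),
              (7, 925), (8, 925), (9, 925)]
    print(find_resetlogs_outliers(sample))
```

Taking the majority value as the reference is only a heuristic; which side is actually correct still has to be judged from the recovery history.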
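The next step was the ORA-600 [2662] error reported at startup earlier. It is resolved by modifying the database SCN; the Patch_SCN tool can do this quickly. The database then opened successfully: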
SQL> ALTER DATABASE OPEN;
Database altered.
However, the alert log showed a large number of errors such as ORA-600 [4194], ORA-01595, and Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC21D511] [PC:0x97F4EFA, kgegpa()+40]:
Wed Jul 24 15:24:21 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
…………
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc (incident=777938):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Jul 24 15:24:40 2024
QMNC started with pid=79, OS id=54632
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
LOGSTDBY: Validating controlfile with logical metadata
Wed Jul 24 15:24:40 2024
Block recovery stopped at EOT rba 2.82.16
Block recovery completed at rba 2.82.16, scn 2320.1263080114
Block recovery from logseq 2, block 74 to scn 9965587206833
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080114
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc:
ORA-01595: error freeing extent (4) of rollback segment (20)
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
LOGSTDBY: Validation complete
Wed Jul 24 15:24:41 2024
Sweep [inc][777938]: completed
Sweep [inc2][777938]: completed
Wed Jul 24 15:24:41 2024
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc (incident=778362):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Starting background process SMCO
Wed Jul 24 15:24:42 2024
SMCO started with pid=83, OS id=54691
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080118
Block recovery from logseq 2, block 74 to scn 9965587206838
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/redo02
Block recovery completed at rba 2.83.16, scn 2320.1263080119
Error 600 in kwqmnpartition(), aborting txn
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc (incident=778363):
ORA-25319: Queue table repartitioning aborted
Completed: alter database open
Block recovery from logseq 2, block 74 to scn 9965587206835
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/rac/onlinelog/redo02
Block recovery completed at rba 2.82.16, scn 2320.1263080118
Block recovery from logseq 2, block 74 to scn 9965587207538
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: +DATA/rac/onlinelog/redo02
Block recovery completed at rba 2.1097.16, scn 2320.1263080819
Errors in file /users/oracle/app/db/diag/rdbms/rac/rac1/trace/rac1_cjq0_55657.trc (incident=778427):
ORA-00600: internal error code, arguments: [600], [ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []], [], [], [], [], [], [], [], [], [], []
Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_778427/xff1_cjq0_55657_i778427.trc
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F4EFA, kgegpa()+40][flags: 0x0, count: 1]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]
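The errors point to abnormal undo. After the problematic undo rollback segments were dealt with, the database opened normally. A logical migration of the data was then arranged, which completed this recovery.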