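Contact: QQ (5163721)
Title: 11.2 RAC: resolving CRS that cannot start after the directory permissions (u01) were changed - repairing with rootcrs.pl -init
Author: Lunar © All rights reserved [Reposting is permitted, but the source address must be credited with a link; otherwise legal liability will be pursued.]
Reproducing the scenario that damaged the node: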
[root@lunardb01 grid]# chown -R oracle:oinstall /u01
[root@lunardb01 grid]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunardb01 grid]# ps -ef|grep d.bin
root     27170     1  6 19:27 ?        00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     27400     1  3 19:27 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root     27609 19818  0 19:27 pts/1    00:00:00 grep d.bin
[root@lunardb01 grid]# ps -ef|grep d.bin
root     27170     1  5 19:27 ?        00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     27400     1  2 19:27 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root     27621 19818  0 19:27 pts/1    00:00:00 grep d.bin
[root@lunardb01 grid]# ps -ef|grep d.bin
root     27170     1  1 19:27 ?        00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     27400     1  0 19:27 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root     28150 19818  0 19:28 pts/1    00:00:00 grep d.bin
[root@lunardb01 grid]#
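To see exactly what a recursive chown like the one above altered, it can help to record ownership and mode before and after the change. The following is only a minimal sketch: the function names snapshot_owners and changed_entries are hypothetical, and it assumes GNU find (for -printf) is available, as it is on typical Linux hosts.

```shell
#!/usr/bin/env bash
# Sketch only: snapshot_owners and changed_entries are hypothetical helper names.

# Print "owner:group mode path" for every entry under the given directory.
# Relies on GNU find -printf (%u owner, %g group, %m octal mode, %p path).
snapshot_owners() {
    find "$1" -printf '%u:%g %m %p\n' | sort -k3
}

# Print the lines of the second snapshot that differ from the first,
# i.e. the current owner and mode of every entry that changed.
changed_entries() {
    diff "$1" "$2" | sed -n 's/^> //p'
}
```

Running snapshot_owners on /u01 into one file before the change and into a second file afterwards, then passing both files to changed_entries, lists every entry the chown touched. The mode is recorded alongside the owner because, on Linux, changing ownership can also clear setuid/setgid bits on executables.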
As can be seen, CRS does not come up at this point. The background logs report the following errors:
----- ohasd errors:
2014-10-04 19:27:27.643: [ CRSPE][1148361024] {0:0:2} RI [ora.mdnsd 1 1] new internal state: [STARTING] old value: [STABLE]
2014-10-04 19:27:27.643: [ CRSPE][1148361024] {0:0:2} Sending message to agfw: id = 223
2014-10-04 19:27:27.644: [ CRSPE][1148361024] {0:0:2} CRS-2672: Attempting to start 'ora.mdnsd' on 'lunardb01'
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Agfw Proxy Server received the message: RESOURCE_START[ora.mdnsd 1 1] ID 4098:223
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Creating the resource: ora.mdnsd 1 1
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} Initializing the resource ora.mdnsd 1 1 for type ora.mdns.type
2014-10-04 19:27:27.644: [ AGFW][1137854784] {0:0:2} SR: acl = owner:grid:rw-,pgrp:oinstall:rw-,other::r--,user:grid:rwx
2014-10-04 19:27:27.645: [ CRSPE][1148361024] {0:0:2} ICE has queued an operation. Details: Operation [START of [ora.gpnpd 1 1] on [lunardb01] : local=0, unplanned=00x2aaab00c68f0] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.mdnsd 1 1] server [lunardb01] target states [ONLINE ], locked by op [START of [ora.mdnsd 1 1] on [lunardb01] : local=0, unplanned=00x2aaab00b72e0]. Owner: CRS-2683: It is locked by 'SYSTEM' for command 'Resource Autostart : lunardb01'

----- crsd errors:
2014-10-04 19:26:23.937: [ CRSCOMM][1158867264][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264] Ipc: Client disconnected.
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264][FFAIL] IpcL: Listener got clsc error 11 for memNum. 1
2014-10-04 19:26:23.938: [ CRSCOMM][1158867264] IpcL: connection to member 1 has been removed
2014-10-04 19:26:23.938: [CLSFRAME][1158867264] Removing IPC Member:{Relative|Node:0|Process:1|Type:3}
2014-10-04 19:26:23.938: [CLSFRAME][1158867264] Disconnected from AGENT process: {Relative|Node:0|Process:1|Type:3}
2014-10-04 19:26:23.938: [ AGFW][1165171008] {1:33686:190} Agfw Proxy Server received process disconnected notification, count=1
2014-10-04 19:26:23.939: [ AGFW][1165171008] {1:33686:190} /u01/app/11.2.0/grid/bin/oraagent_grid disconnected.
2014-10-04 19:26:23.939: [ AGFW][1165171008] {1:33686:190} Agent /u01/app/11.2.0/grid/bin/oraagent_grid[5646] stopped!
2014-10-04 19:26:23.939: [ CRSCOMM][1165171008] {1:33686:190} IpcL: removeConnection: Member 1 does not exist.

----- alert log errors:
2014-10-04 19:27:23.293 [ohasd(27170)]CRS-2112:The OLR service started on node lunardb01.
2014-10-04 19:27:23.314 [ohasd(27170)]CRS-1301:Oracle High Availability Service started on node lunardb01.
2014-10-04 19:27:23.314 [ohasd(27170)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-10-04 19:27:24.351 [/u01/app/11.2.0/grid/bin/orarootagent.bin(27307)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsload" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/orarootagent_root/orarootagent_root.log"
2014-10-04 19:27:27.171 [/u01/app/11.2.0/grid/bin/orarootagent.bin(27307)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-10-04 19:29:27.802 [/u01/app/11.2.0/grid/bin/oraagent.bin(27400)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:29:31.812 [ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
2014-10-04 19:31:34.907 [/u01/app/11.2.0/grid/bin/oraagent.bin(29240)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:31:38.918 [ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
2014-10-04 19:33:41.993 [/u01/app/11.2.0/grid/bin/oraagent.bin(30882)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-10-04 19:33:46.004 [ohasd(27170)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0/grid/log/lunardb01/ohasd/ohasd.log.
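The alert log excerpt is long, and the key signal is which resources keep timing out on start. A rough way to pull that out is sketched below. It assumes each alert entry sits on a single line in the CRS-2757 wording shown above; count_start_timeouts is a hypothetical helper name, and the log text is read from standard input rather than from any particular path.

```shell
#!/usr/bin/env bash
# Sketch only: count_start_timeouts is a hypothetical helper name.

# Read alert-log text on stdin and print "resource count" for every
# resource named in a CRS-2757 (start timed out) message.
count_start_timeouts() {
    grep 'CRS-2757' \
        | sed -n "s/.*from the resource '\([^']*\)'.*/\1/p" \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Fed the alert log excerpt above, this reports two start timeouts for ora.mdnsd and one for ora.gpnpd, which matches the reading that startup stalls at the first resources ohasd tries to bring up.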
As can be seen, startup is stuck because the ora.mdnsd service cannot start:
[root@lunardb01 grid]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  OFFLINE
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  OFFLINE
ora.gpnpd
      1        ONLINE  OFFLINE
ora.mdnsd
      1        ONLINE  OFFLINE                               STARTING
[root@lunardb01 grid]#
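When repeating this check during troubleshooting, it may be convenient to reduce the listing to just the resources whose TARGET is ONLINE but whose STATE is OFFLINE. The sketch below is one possible filter, not an Oracle-provided tool. offline_resources is a hypothetical name, and it assumes the usual layout in which each resource name starts with "ora." on its own line and is followed by an instance line carrying TARGET and STATE.

```shell
#!/usr/bin/env bash
# Sketch only: offline_resources is a hypothetical helper name.

# Read "crsctl status res -t -init" output on stdin and print each resource
# whose TARGET is ONLINE while its STATE is OFFLINE, with any trailing text
# from the instance line (for example STATE_DETAILS) in parentheses.
offline_resources() {
    awk '
        /^ora\./ { name = $1; next }          # resource name line
        $2 == "ONLINE" && $3 == "OFFLINE" {   # instance line: id, TARGET, STATE, ...
            rest = ""
            for (i = 4; i <= NF; i++) rest = (rest == "") ? $i : rest " " $i
            suffix = (rest == "") ? "" : " (" rest ")"
            print name suffix
        }
    '
}
```

A typical use would be to pipe the crsctl command shown above into offline_resources. On the output in this post, every init resource would be listed, with ora.mdnsd marked as STARTING.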
An attempt to repair this with the -init option of rootcrs.pl did not work:
[root@lunardb01 lunardb01]# $GRID_HOME/crs/install/rootcrs.pl -init
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
[root@lunardb01 lunardb01]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunardb01 lunardb01]#
[root@lunardb01 ohasd]# ps -ef|grep d.bin
root     12642     1  0 19:48 ?        00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     14804     1  0 19:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root     15481 19818  0 19:52 pts/1    00:00:00 grep d.bin
[root@lunardb01 ohasd]# ps -ef|grep d.bin
root     12642     1  0 19:48 ?        00:00:01 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid     14804     1  0 19:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root     15663 19818  0 19:52 pts/1    00:00:00 grep d.bin
[root@lunardb01 ohasd]#
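The errors in the background logs are essentially the same as those shown earlier.
Evidently, rootcrs.pl -init is of little use for repairing directory permissions after a chown -R on /u01.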