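Contact: QQ (5163721)
Title: 11.2 RAC: How to fix CRS failing to start after the permissions of a directory (/u01) were changed, part 1: manually repairing the wrong permissions
Author: Lunar © All rights reserved [Reposting is permitted, but the source address must be credited with a link; otherwise legal action will be taken.]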
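In 11.2 RAC, if the ownership of the Grid installation directory tree is changed (for example with chown -R xxx /u01; /u01 is the location we usually use), CRS will no longer start. During startup the mdnsd process is the first to hang, and the CRS alert log shows messages like the following: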
2014-09-26 20:56:25.895
[ohasd(16366)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:199} in /u01/app/11.2.0/grid/log/lunardb1/ohasd/ohasd.log.
2014-09-26 20:58:28.984
[/u01/app/11.2.0/grid/bin/oraagent.bin(15422)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:228} in /u01/app/11.2.0/grid/log/lunardb1/agent/ohasd/oraagent_grid//oraagent_grid.log.
2014-09-26 20:58:32.994
[ohasd(16366)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:228} in /u01/app/11.2.0/grid/log/lunardb1/ohasd/ohasd.log.
2014-09-26 23:36:05.848
[/u01/app/11.2.0/grid/bin/orarootagent.bin(8064)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/acfsload" spawned by agent "/u01/app/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/lunardb1/agent/ohasd/orarootagent_root/orarootagent_root.log"
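Each of these messages names a more detailed log file ("Details at ... in <file>"). As a small convenience, the sketch below pulls those paths out of the clusterware alert log, which on this node is $GRID_HOME/log/lunardb1/alertlunardb1.log. crs_detail_logs is a hypothetical helper, not an Oracle tool, and it assumes the referenced paths contain no spaces or double quotes and end in .log.

```bash
# Sketch: print the unique detail-log paths referenced by CRS-nnnn messages.
# crs_detail_logs is a hypothetical helper; it assumes paths contain no
# spaces or double quotes and end in ".log".
crs_detail_logs() {
  # 1st grep: keep only lines carrying a CRS message code
  # 2nd grep: extract every absolute path ending in .log
  grep -E 'CRS-[0-9]+' "$1" | grep -oE '/[^ "]*\.log' | sort -u
}

# Example:
#   crs_detail_logs /u01/app/11.2.0/grid/log/lunardb1/alertlunardb1.log
```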
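Below we try three methods to fix this problem.
Method 1: directly change the ownership of /u01 and the other affected files and directories
Note: this method is only a last resort for getting the database or ASM up in an emergency. In production, Oracle's recommended approach is to delete the node and add it back (described in detail in Method 3 later).
First, change the owner of /u01 to grid:oinstall, and the owner of /u01/app/oracle to oracle:oinstall: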
chown -R grid:oinstall /u01
chown -R oracle:oinstall /u01/app/oracle
After these ownership changes, CRS appears to start, but the important background agent processes are all in an error state:
2014-10-04 19:56:16.828
[/u01/app/11.2.0/grid/bin/oraagent.bin(19898)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/q9db01/agent/crsd/oraagent_grid/oraagent_grid.log"
2014-10-04 19:56:16.832
[/u01/app/11.2.0/grid/bin/oraagent.bin(19898)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/q9db01/agent/crsd/oraagent_grid/oraagent_grid.log"
2014-10-04 19:56:16.848
[/u01/app/11.2.0/grid/bin/oraagent.bin(19898)]CRS-5016:Process "/u01/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/q9db01/agent/crsd/oraagent_grid/oraagent_grid.log"
2014-10-04 19:56:23.733
[/u01/app/11.2.0/grid/bin/oraagent.bin(20116)]CRS-5010:Update of configuration file "/u01/app/oracle/product/11.2.0/db_1/srvm/admin/oratab.bak.q9db01" failed: details at "(:CLSN00013:)" in "/u01/app/11.2.0/grid/log/q9db01/agent/crsd/oraagent_oracle//oraagent_oracle.log"
The ownership of several files that are critical to ohasd and crsd was corrected at the same time:
[grid@lunardb1 ~]$ env|grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/grid
ORACLE_HOME=/u01/app/11.2.0/grid
[grid@lunardb1 ~]$ exit
logout
[root@lunardb1 app]# export GRID_HOME=/u01/app/11.2.0/grid
[root@lunardb1 app]# cd $GRID_HOME/log/`hostname`/crsd ;
-bash: cd: /u01/app/11.2.0/grid/log/lunardb1.800best.com/crsd: No such file or directory
[root@lunardb1 app]# cd /u01/app/11.2.0/grid/log/lunardb1/crsd
[root@lunardb1 crsd]# chown root:root *
[root@lunardb1 crsd]# cd ../ohasd
[root@lunardb1 ohasd]# chown root:root *
[root@lunardb1 ohasd]# cd ..
[root@lunardb1 lunardb1]# ll
total 2312
drwxr-xr-x 2 grid oinstall    4096 Jun  7  2013 acfs
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 acfslog
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 acfsrepl
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 acfsreplroot
drwxr-xr-x 2 grid oinstall    4096 Jun  7  2013 acfssec
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 admin
drwxrwxr-t 4 grid oinstall    4096 Jun  7  2013 agent
-rw-rw-r-- 1 grid oinstall 2266872 Oct  4 17:48 alertlunardb1.log
drwxr-x--x 2 grid oinstall    4096 Jun 17 14:24 client
drwxr-x--- 2 grid oinstall    4096 Aug  6 15:40 crflogd
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 crfmond
drwxr-x--- 2 grid oinstall    4096 Sep  2 01:02 crsd
drwxr-x--- 2 grid oinstall    4096 Sep 27 01:54 cssd
drwxr-x--- 2 grid oinstall    4096 Sep 26 10:02 ctssd
drwxr-x--- 4 grid oinstall    4096 Jun  7  2013 cvu
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 diskmon
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 evmd
drwxr-x--- 2 grid oinstall    4096 Oct  4 17:47 gipcd
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 gnsd
drwxr-x--- 2 grid oinstall    4096 Oct  4 17:47 gpnpd
drwxr-x--- 2 grid oinstall    4096 Sep 27 01:30 mdnsd
drwxr-x--- 2 grid oinstall    4096 Sep 27 01:49 ohasd
drwxrwxr-t 5 grid oinstall    4096 Jun  7  2013 racg
drwxr-x--- 2 grid oinstall    4096 Jun  7  2013 srvm
[root@lunardb1 lunardb1]# cd /crsd
-bash: cd: /crsd: No such file or directory
[root@lunardb1 lunardb1]# cd agent/crsd/orarootagent_root
[root@lunardb1 orarootagent_root]# chown root:root *
[root@lunardb1 orarootagent_root]#
[root@lunardb1 orarootagent_root]# cd /u01/app/11.2.0/grid/log/lunardb1/agent/ohasd/orarootagent_root
[root@lunardb1 orarootagent_root]# ll
total 104528
-rw-r--r-- 1 grid oinstall 10564752 Sep 26 14:18 orarootagent_root.l01
-rw-r--r-- 1 grid oinstall 10565738 Sep 24 04:23 orarootagent_root.l02
-rw-r--r-- 1 grid oinstall 10563920 Sep 21 18:28 orarootagent_root.l03
-rw-r--r-- 1 grid oinstall 10565310 Sep 19 08:40 orarootagent_root.l04
-rw-r--r-- 1 grid oinstall 10565749 Sep 16 22:41 orarootagent_root.l05
-rw-r--r-- 1 grid oinstall 10563754 Sep 14 12:41 orarootagent_root.l06
-rw-r--r-- 1 grid oinstall 10563226 Sep 12 02:49 orarootagent_root.l07
-rw-r--r-- 1 grid oinstall 10561202 Sep  9 17:03 orarootagent_root.l08
-rw-r--r-- 1 grid oinstall 10543893 Sep  7 07:09 orarootagent_root.l09
-rw-r--r-- 1 grid oinstall 10566373 Sep  4 21:51 orarootagent_root.l10
-rw-r--r-- 1 grid oinstall  1213705 Oct  4 17:57 orarootagent_root.log
-rw-r--r-- 1 grid oinstall        0 Jun  7  2013 orarootagent_rootOUT.log
-rw-r--r-- 1 grid oinstall        5 Oct  4 17:48 orarootagent_root.pid
[root@lunardb1 orarootagent_root]# chown root:root *
[root@lunardb1 orarootagent_root]#
[root@lunardb1 orarootagent_root]# cd $GRID_HOME
[root@lunardb1 grid]# cd bin
[root@lunardb1 bin]# ll oradism
-rwxr-x--- 1 grid oinstall 71758 Sep 17  2011 oradism
[root@lunardb1 bin]#
[root@lunardb1 bin]# chown root:oinstall oradism
[root@lunardb1 bin]# chmod 4750 oradism
[root@lunardb1 bin]# ll oradism
-rwsr-x--- 1 root oinstall 71758 Sep 17  2011 oradism
[root@lunardb1 bin]#
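Because Method 1 is only an emergency measure, it helps to make the exact set of changes reviewable before running anything as root. The following is a minimal sketch that replays only the ownership changes from the transcript above (the four log directories and oradism) and nothing else. The function name fix_method1_perms and the DRY_RUN switch are hypothetical, not part of Oracle's tooling; the sketch assumes the per-node log directory is named after the short hostname. That is why the usage passes `hostname -s`: in the transcript, plain `hostname` returned the FQDN lunardb1.800best.com and the cd failed.

```bash
# Sketch only: replays the Method 1 ownership fixes shown in the transcript.
# fix_method1_perms and DRY_RUN are hypothetical names, not Oracle tools.
# Arguments: $1 = Grid home (e.g. /u01/app/11.2.0/grid), $2 = short node name.
fix_method1_perms() {
  local grid_home=$1 node=$2 d
  local run=echo                  # default: only print the commands
  if [ "${DRY_RUN:-1}" = "0" ]; then
    run=""                        # DRY_RUN=0: really execute them (as root)
  fi
  local logdir="$grid_home/log/$node"
  # Log files written by root-owned daemons and agents go back to root:root
  for d in crsd ohasd agent/crsd/orarootagent_root agent/ohasd/orarootagent_root; do
    $run chown root:root "$logdir/$d"/*
  done
  # oradism must be owned by root with the setuid bit set (-rwsr-x---)
  $run chown root:oinstall "$grid_home/bin/oradism"
  $run chmod 4750 "$grid_home/bin/oradism"
}

# Print the commands first, then apply them as root once they look right:
#   fix_method1_perms /u01/app/11.2.0/grid "$(hostname -s)"
#   DRY_RUN=0 fix_method1_perms /u01/app/11.2.0/grid "$(hostname -s)"
```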
With these changes in place, CRS starts.
However, on the node whose directory permissions were damaged, the database did not start normally:
[root@lunardb1 app]# ps -ef|grep d.bin
root      3722     1  3 17:47 ?        00:00:02 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid      3938     1  0 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
grid      3950     1  0 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin
grid      4003     1  0 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/gpnpd.bin
grid      4024     1  1 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin
root      4071     1  0 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdmonitor
root      4086     1  0 17:47 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdagent
grid      4117     1  2 17:47 ?        00:00:01 /u01/app/11.2.0/grid/bin/ocssd.bin
root      4508     1  1 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/orarootagent.bin
root      4531     1  0 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/octssd.bin reboot
grid      4571     1  0 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmd.bin
root      5322     1  5 17:48 ?        00:00:01 /u01/app/11.2.0/grid/bin/crsd.bin reboot
grid      5600  4571  0 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid      5646     1  6 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/oraagent.bin
root      5650     1  3 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/orarootagent.bin
grid      5847     1  2 17:48 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid      5864     1  0 17:49 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_DG -inherit
root      5869   429  0 17:49 pts/1    00:00:00 grep d.bin
[root@lunardb1 app]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.DATA.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.DATA1.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.LISTENER_DG.lsnr
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.OCR_VOTE.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.REDODG.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.asm
               ONLINE  ONLINE       lunardb1                 Started
               ONLINE  ONLINE       lunardb2                 Started
ora.gsd
               OFFLINE OFFLINE      lunardb1
               OFFLINE OFFLINE      lunardb2
ora.net1.network
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.net2.network
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.ons
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.registry.acfs
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunardb2
ora.cvu
      1        ONLINE  ONLINE       lunardb2
ora.oc4j
      1        ONLINE  ONLINE       lunardb2
ora.lunardb.db
      1        ONLINE  OFFLINE
      2        ONLINE  ONLINE       lunardb2                 Open,Readonly
ora.lunardb1-dg-vip.vip
      1        ONLINE  ONLINE       lunardb1
ora.lunardb1.vip
      1        ONLINE  ONLINE       lunardb1
ora.lunardb2-dg-vip.vip
      1        ONLINE  ONLINE       lunardb2
ora.lunardb2.vip
      1        ONLINE  ONLINE       lunardb2
ora.scan1.vip
      1        ONLINE  ONLINE       lunardb2
[root@lunardb1 app]#
Starting the database manually fails with the following errors:
[oracle@lunardb1 ~]$ ss
SQL*Plus: Release 11.2.0.3.0 Production on Sat Oct 4 17:52:09 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
17:52:09 @>startup mount
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/lunardb/spfilelunardb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/lunardb/spfilelunardb.ora
ORA-12547: TNS:lost contact
17:52:15 @>exit
Disconnected
[oracle@lunardb1 ~]$
This error usually means that the oracle binary has the wrong permissions. Trying again still fails with the same error:
[oracle@lunardb1 trace]$ ss
SQL*Plus: Release 11.2.0.3.0 Production on Sat Oct 4 18:01:53 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
18:01:53 @>startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/lunardb/spfilelunardb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/lunardb/spfilelunardb.ora
ORA-12547: TNS:lost contact
18:01:58 @>
Normally, both $GRID_HOME/bin/oracle and $ORACLE_HOME/bin/oracle should have mode 6751, i.e. "-rwsr-s--x".
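To check a binary against the expected mode without reading ll output by eye, GNU stat's %a format prints the octal permission bits including setuid/setgid, so 6751 can be compared directly. A minimal sketch, with check_oracle_mode as a hypothetical helper name (GNU coreutils assumed):

```bash
# Sketch: compare a binary's octal mode with the expected value
# (default 6751, i.e. -rwsr-s--x). check_oracle_mode is a hypothetical helper.
check_oracle_mode() {
  local bin=$1 want=${2:-6751} have
  # %a = octal permission bits, including setuid/setgid (e.g. 6751, 751)
  have=$(stat -c '%a' "$bin") || return 2
  if [ "$have" = "$want" ]; then
    printf 'OK   %s %s\n' "$bin" "$have"
  else
    printf 'BAD  %s %s (expected %s)\n' "$bin" "$have" "$want"
    return 1
  fi
}

# check_oracle_mode "$GRID_HOME/bin/oracle"
# check_oracle_mode "$ORACLE_HOME/bin/oracle"
# check_oracle_mode "$GRID_HOME/bin/oradism" 4750
```

The optional second argument changes the expected mode, as in the oradism line. A binary listed as -rwxr-x--x has mode 751, so the check reports it as BAD.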
Compare with node 2 (the healthy node):
[grid@lunardb2 ~]$ cd $ORACLE_HOME
[grid@lunardb2 grid]$ cd bin
[grid@lunardb2 bin]$ ll oracle
-rwsr-s--x 1 grid oinstall 204113496 Jun  7  2013 oracle
[grid@lunardb2 bin]$
Now look at node 1 (the problem node):
[root@lunardb1 bin]# ll oracle
-rwxr-x--x 1 grid oinstall 204113496 Jun  7  2013 oracle
[root@lunardb1 bin]#
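Comparing the two nodes by eye works for a single file, but a recursive chown touches every file under /u01. One possible approach, sketched below, is to record mode, owner and group for every path under the Grid home on the healthy node and on the damaged node, then diff the two lists. It assumes both nodes use the same Grid home path and that GNU find is available; perm_snapshot and perm_diff are hypothetical helper names.

```bash
# Sketch: list "mode owner group path" for everything under a directory so two
# nodes can be compared. perm_snapshot/perm_diff are hypothetical helpers;
# GNU find is required for -printf.
perm_snapshot() {
  # %m = octal mode (incl. suid/sgid), %u/%g = owner/group,
  # %P = path relative to the starting directory $1
  find "$1" -mindepth 1 -printf '%m %u %g %P\n' | LC_ALL=C sort -k4
}

perm_diff() {
  # "<" lines come from the first snapshot, ">" lines from the second
  diff "$1" "$2" | grep '^[<>]'
}

# On lunardb2 (healthy):  perm_snapshot /u01/app/11.2.0/grid > /tmp/grid.lunardb2
# On lunardb1 (damaged):  perm_snapshot /u01/app/11.2.0/grid > /tmp/grid.lunardb1
# After copying one file to the other node:
#   perm_diff /tmp/grid.lunardb2 /tmp/grid.lunardb1
```

Node-specific paths (for example log/lunardb1 versus log/lunardb2) will always show up as differences and should be ignored. The output is only a list of candidates to inspect; it does not make manual changes any more supported.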
Fix the permissions of $GRID_HOME/bin/oracle by hand:
[root@lunardb1 bin]# chmod 6751 oracle
[root@lunardb1 bin]# ll oracle
-rwsr-s--x 1 grid oinstall 204113496 Jun  7  2013 oracle
[root@lunardb1 bin]#
While we are at it, check the permissions of $ORACLE_HOME/bin/oracle as well:
[root@lunardb1 bin]# su - oracle
[oracle@lunardb1 ~]$ cd $ORACLE_HOME
[oracle@lunardb1 db_1]$ cd bin
[oracle@lunardb1 bin]$ ll oracle
-rwxr-s--x 1 oracle asmadmin 221332085 Jun  7  2013 oracle
[oracle@lunardb1 bin]$
Now start the database again:
[oracle@lunardb1 ~]$ ss
SQL*Plus: Release 11.2.0.3.0 Production on Sat Oct 4 18:06:27 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to an idle instance.
18:06:27 @>startup
ORACLE instance started.
Total System Global Area 1.6034E+11 bytes
Fixed Size                  2236968 bytes
Variable Size            3.0602E+10 bytes
Database Buffers         1.2939E+11 bytes
Redo Buffers              352468992 bytes
Database mounted.
Database opened.
18:07:19 @>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Data Mining
and Real Application Testing options
[oracle@lunardb1 ~]$ exit
logout
You have new mail in /var/spool/mail/root
[root@lunardb1 bin]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.DATA.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.DATA1.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.LISTENER_DG.lsnr
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.OCR_VOTE.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.REDODG.dg
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.asm
               ONLINE  ONLINE       lunardb1                 Started
               ONLINE  ONLINE       lunardb2                 Started
ora.gsd
               OFFLINE OFFLINE      lunardb1
               OFFLINE OFFLINE      lunardb2
ora.net1.network
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.net2.network
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.ons
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
ora.registry.acfs
               ONLINE  ONLINE       lunardb1
               ONLINE  ONLINE       lunardb2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunardb2
ora.cvu
      1        ONLINE  ONLINE       lunardb2
ora.oc4j
      1        ONLINE  ONLINE       lunardb2
ora.lunardb.db
      1        ONLINE  ONLINE       lunardb1                 Open,Readonly
      2        ONLINE  ONLINE       lunardb2                 Open,Readonly
ora.lunardb1-dg-vip.vip
      1        ONLINE  ONLINE       lunardb1
ora.lunardb1.vip
      1        ONLINE  ONLINE       lunardb1
ora.lunardb2-dg-vip.vip
      1        ONLINE  ONLINE       lunardb2
ora.lunardb2.vip
      1        ONLINE  ONLINE       lunardb2
ora.scan1.vip
      1        ONLINE  ONLINE       lunardb2
[root@lunardb1 bin]#
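The database now appears to start. In many emergency situations, this state is already good enough to attempt an export or a backup of the database.
However, CRS and the database in this state carry serious hidden risks: for example, they may crash unexpectedly or suffer other unexplained corruption.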
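Therefore, once the permissions have been damaged, the supported repair is rootcrs.pl -init (although in this situation that repair is usually futile, as the later tests will show); beyond that, Oracle does not support fixing permissions by hand in any way. Oracle states this explicitly: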
The permissions can be reverted back to original values with rootcrs.pl or roothas.pl. There is an option -init: Reset the permissions of all files and directories under Oracle CRS/HA home.
for GRID: rootcrs.pl -init
For Standalone Grid: roothas.pl -init
If that does not work then permissions can be altered manually with information found from crsconfig_fileperms and crsconfig_dirs files. Please note that changing the permissions manually is last resort and shouldn't be done unless re