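We all know that in a RAC environment, killing the ocssd.bin process causes the host to reboot.
But sometimes the system is already in an abnormal state and CRS cannot be shut down cleanly, while the host is an old system that has not been rebooted for years and nobody dares to reboot it. What then?
All we can do is try killing the processes by hand and then repair CRS manually (note that a 10.2 RAC has only three d.bin processes).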
Test environment: the operating system is OEL 6.6.
[root@lunar1 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.6
[root@lunar1 ~]#
[root@lunar1 ~]# uname -a
Linux lunar1 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunar1 ~]#
The CRS version of this RAC is 11.2.0.4:
[root@lunar1 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.4.0]
[root@lunar1 ~]# crsctl query crs softwareversion
Oracle Clusterware version on node [lunar1] is [11.2.0.4.0]
[root@lunar1 ~]#
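Note that a 12.1 standard RAC (non-Flex Cluster) behaves the same way as described in this article, so the approach and the procedure are the same.
Check the current CRS status: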
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
ora.asm
               ONLINE  ONLINE       lunar1                   Started
               ONLINE  ONLINE       lunar2                   Started
ora.gsd
               OFFLINE OFFLINE      lunar1
               OFFLINE OFFLINE      lunar2
ora.net1.network
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
ora.ons
               ONLINE  ONLINE       lunar1
               ONLINE  ONLINE       lunar2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar2
ora.cvu
      1        ONLINE  ONLINE       lunar2
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open
      2        ONLINE  OFFLINE                               STARTING
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1
ora.lunar2.vip
      1        ONLINE  ONLINE       lunar2
ora.oc4j
      1        ONLINE  ONLINE       lunar1
ora.scan1.vip
      1        ONLINE  ONLINE       lunar2
[root@lunar1 ~]#
List all current CRS processes:
[root@lunar1 ~]# ps -ef|grep d.bin
root 3860 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 3972 1 0 19:31 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root 4004 1 0 19:31 ? 00:00:15 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 4007 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 4032 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 4051 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 4063 1 0 19:31 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:17 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7350 7273 0 20:04 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
That is a lot of processes. For how they relate to each other, see the article "The 11.2 RAC Startup Process".
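When comparing several `ps -ef` listings like the one above, it can help to turn them into structured records first. The following is only a minimal sketch: `PsEntry`, `parse_ps_ef` and `clusterware_processes` are hypothetical helper names, and the Grid home path is the one from this test environment, so adjust it for your own system.

```python
from typing import List, NamedTuple


class PsEntry(NamedTuple):
    user: str
    pid: int
    ppid: int
    stime: str
    command: str


def parse_ps_ef(text: str) -> List[PsEntry]:
    """Parse `ps -ef` output (UID PID PPID C STIME TTY TIME CMD) into records."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 7)
        # Skip the header line, shell prompts, and blank or malformed lines.
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        user, pid, ppid, _cpu, stime, _tty, _time, command = fields
        entries.append(PsEntry(user, int(pid), int(ppid), stime, command))
    return entries


def clusterware_processes(entries: List[PsEntry],
                          grid_home: str = "/u01/app/11.2.0.4/grid") -> List[PsEntry]:
    """Keep only processes whose executable lives under the Grid home (assumed path)."""
    return [e for e in entries if e.command.startswith(grid_home + "/bin/")]
```

Filtering on the Grid home also drops the `grep d.bin` line that shows up in every listing.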
All right, let's start simulating process kills. First, kill /u01/app/11.2.0.4/grid/bin/ohasd.bin (it restarts automatically; see "The 11.2 RAC Startup Process"):
[root@lunar1 ~]# kill -9 3860
[root@lunar1 ~]# ps -ef|grep d.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 4007 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 4032 1 0 19:31 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 14 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 6 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 8 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 3 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7676 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Next, we kill cssdmonitor:
[root@lunar1 ~]# kill -9 4032
-bash: kill: (4032) - No such process
[root@lunar1 ~]#
There is no such process, which means cssdmonitor has already been restarted (see "The 11.2 RAC Startup Process"):
[root@lunar1 ~]# ps -ef|grep d.bin
grid 3983 1 0 19:31 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 3994 1 0 19:31 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 4007 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 4019 1 0 19:31 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/osysmond.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 4157 1 0 19:31 ? 00:00:06 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 4180 1 0 19:31 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid 4343 4180 0 19:32 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root 5385 1 1 19:39 ? 00:00:19 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 5456 1 0 19:39 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 5473 1 0 19:39 ? 00:00:07 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 5475 1 0 19:39 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
oracle 7132 1 0 20:04 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 3 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7740 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
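The processes above with a start time between 20:04 and 20:07 were all restarted automatically in the background after /u01/app/11.2.0.4/grid/bin/ohasd.bin itself was restarted.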
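Spotting restarted processes by eye means comparing PIDs between two listings. A rough sketch of that comparison follows; `pids_by_binary` and `respawned` are hypothetical names, and `GRID_BIN` assumes the Grid home of this test environment.

```python
import os
from typing import Dict, Set, Tuple

GRID_BIN = "/u01/app/11.2.0.4/grid/bin/"  # assumption: Grid home of this test system


def pids_by_binary(ps_text: str, prefix: str = GRID_BIN) -> Dict[str, Set[int]]:
    """Map each binary under `prefix` to the set of PIDs currently running it."""
    result: Dict[str, Set[int]] = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header, prompt, or malformed line
        executable = fields[7].split()[0]
        if executable.startswith(prefix):
            name = os.path.basename(executable)
            result.setdefault(name, set()).add(int(fields[1]))
    return result


def respawned(before: str, after: str) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Return binaries that lost at least one PID and gained at least one new PID."""
    old, new = pids_by_binary(before), pids_by_binary(after)
    changes = {}
    for binary in sorted(old.keys() & new.keys()):
        gone = old[binary] - new[binary]
        fresh = new[binary] - old[binary]
        if gone and fresh:
            changes[binary] = (gone, fresh)
    return changes
```

Binaries with several instances, such as oraagent.bin, are compared as sets, so one surviving instance does not hide another that was replaced.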
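Now we kill mdnsd, gpnpd, gipcd and osysmond.
Of these four processes, the first three are the earliest ones started during CRS startup, apart from ohasd itself.
If any of them is killed, ohasd restarts it: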
[root@lunar1 ~]# kill -9 3983 3994 4007 4019
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
grid 6535 1 0 19:50 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 7490 1 0 20:06 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 7534 2487 2 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
root 7776 7273 0 20:07 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
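Here we see that the four processes we just killed are not all back: gpnpd and mdnsd have new PIDs, but gipcd and osysmond are still missing. What is going on?
Don't worry, it is just not time yet. ohasd has to run its check before it starts them. O(∩_∩)O haha~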
Next, we kill the listeners:
[root@lunar1 ~]# kill -9 6535 7490
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:13 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7534 2487 2 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
grid 7571 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 7575 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 7783 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 7785 1 2 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 7853 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid 7873 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 7874 1 14 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 7944 7873 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 7979 1 9 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7982 1 3 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle 7986 1 4 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 8001 1 3 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 8025 7979 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER
grid 8028 7979 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/lsnrctl status LISTENER_SCAN1
root 8083 7273 0 20:08 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Well, look at that: the processes we killed have all been restarted. 11.2 RAC really is tough.
Now we kill the /etc/init.d/init.ohasd process:
[root@lunar1 ~]# ps -ef|grep ohasd
root 2487 1 0 19:20 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 7534 2487 1 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin restart
root 8191 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]# kill -9 2487 7534
[root@lunar1 ~]# ps -ef|grep ohasd
root 8239 1 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8257 8239 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8258 8257 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8267 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]# ps -ef|grep ohasd
root 8239 1 0 20:08 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8299 7273 0 20:08 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]#
What we see here is /etc/init.d/init.ohasd being restarted automatically by the system. This is recorded in /var/log/messages:
[root@lunar1 ~]# tail -f /var/log/messages
Jan 24 19:45:31 lunar1 kernel: e1000 0000:00:03.0 eth0: Reset adapter
Jan 24 20:03:50 lunar1 kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:03:52 lunar1 kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jan 24 20:07:01 lunar1 clsecho: /etc/init.d/init.ohasd: ohasd is restarting 1/10.
Jan 24 20:07:01 lunar1 logger: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "restart"
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process (2487) killed by KILL signal
Jan 24 20:08:26 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process (8239) killed by KILL signal
Jan 24 20:13:58 lunar1 init: oracle-ohasd main process ended, respawning
Jan 24 20:14:12 lunar1 root: exec /u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib /u01/app/11.2.0.4/grid/bin/crswrapexece.pl /u01/app/11.2.0.4/grid/crs/install/s_crsconfig_lunar1_env.txt /u01/app/11.2.0.4/grid/bin/ohasd.bin "reboot"
^C
[root@lunar1 ~]#
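To pull just the respawn-related events out of a longer log, a small filter is enough. This sketch matches only the "killed by ... signal" wording seen in the init lines of the log above; the regular expression and the `killed_jobs` name are my own assumptions and may need adjusting for other init systems or syslog formats.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   Jan 24 20:08:26 lunar1 init: oracle-ohasd main process (2487) killed by KILL signal
KILLED = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ init: (\S+) main process \((\d+)\) "
    r"killed by (\w+) signal"
)


def killed_jobs(log_text: str) -> List[Tuple[str, str, int, str]]:
    """Return (timestamp, job name, pid, signal) for each init 'killed' message."""
    events = []
    for line in log_text.splitlines():
        match = KILLED.match(line.strip())
        if match:
            when, job, pid, signal_name = match.groups()
            events.append((when, job, int(pid), signal_name))
    return events
```

Fed the log excerpt above, this should report the two kills of the oracle-ohasd job (PIDs 2487 and 8239).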
The other processes have also been restarted automatically (note that at this point the crsd process has not been restarted yet):
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
grid 7756 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
grid 7758 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 7783 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 7785 1 1 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 7853 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/octssd.bin
grid 7873 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 7874 1 3 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 7944 7873 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 7979 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 7982 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
oracle 7986 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 8001 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 8119 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 8120 1 0 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
root 8321 8319 1 20:08 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root 8325 7273 0 20:08 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Now we kill, in order: evmlogger.bin, gpnpd.bin, mdnsd.bin, gipcd.bin, evmd.bin, oraagent.bin (grid), scriptagent.bin, oraagent.bin (oracle), orarootagent.bin and the two listeners:
[root@lunar1 ~]# kill -9 7944 7756 7758 7783 7873 7979 7982 7986 8001 8119 8120
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 7578 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
root 7588 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 7785 1 1 20:07 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 7844 1 0 20:07 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ologgerd -m lunar2 -r -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 8593 8591 0 20:09 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin check has
root 8597 7273 0 20:09 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Then kill osysmond.bin, ologgerd, cssdmonitor and cssdagent:
[root@lunar1 ~]# kill -9 7785 7844 7588 7578
[root@lunar1 ~]#
All right, now only ocssd.bin is left:
[root@lunar1 ~]# ps -ef|grep d.bin
grid 4063 1 0 19:31 ? 00:00:14 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 8629 7273 0 20:10 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
Now we kill ocssd.bin, the process that is said to reboot the host the moment it is killed:
[root@lunar1 ~]# kill -9 4063
[root@lunar1 ~]#
Done. The system is still running fine, it did not reboot, and the IPC resources have all been released cleanly:
[root@lunar1 ~]# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 65537      root       600        1

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

[root@lunar1 ~]#
[root@lunar1 ~]#
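The "released cleanly" check can also be done programmatically, for example to confirm that no grid-owned or oracle-owned segments are left before restarting CRS. Below is a rough sketch that splits `ipcs` output of the layout shown above into sections; `ipcs_sections` is a hypothetical helper, and other `ipcs` versions may print a different layout.

```python
from typing import Dict, List


def ipcs_sections(ipcs_text: str) -> Dict[str, List[List[str]]]:
    """Group `ipcs` data rows by section title, dropping each column-header row."""
    sections: Dict[str, List[List[str]]] = {}
    current = None
    expect_header = False
    for line in ipcs_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("------"):
            # "------ Shared Memory Segments --------" -> "Shared Memory Segments"
            current = stripped.strip("- ")
            sections[current] = []
            expect_header = True
        elif stripped and current is not None:
            if expect_header:
                expect_header = False  # this is the column-header row
            else:
                sections[current].append(stripped.split())
    return sections


def leftover_owners(ipcs_text: str, section: str = "Shared Memory Segments") -> List[str]:
    """Owners (third column) of any rows still present in the given section."""
    return [row[2] for row in ipcs_sections(ipcs_text).get(section, [])]
```

For the output above, the shared memory section is empty, and the two remaining semaphore arrays belong to root rather than to the grid or oracle user.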
Recovering is simple: just start CRS again.
[root@lunar1 ~]# ps -ef | grep -v grep|grep -E 'init|d.bin|ocls|evmlogger|UID'
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:20 ? 00:00:01 /sbin/init
root 2486 1 0 19:20 ? 00:00:00 /bin/sh /etc/init.d/init.tfa run
root 8924 1 0 20:13 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
[root@lunar1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@lunar1 ~]# ps -ef|grep ohasd
root 8924 1 0 20:13 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 8968 1 4 20:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root 9187 7273 0 20:14 pts/2 00:00:00 grep ohasd
[root@lunar1 ~]#
[root@lunar1 ~]# ps -ef|grep d.bin
root 8968 1 0 20:14 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid 9090 1 0 20:14 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 9101 1 0 20:14 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/mdnsd.bin
grid 9112 1 0 20:14 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/gpnpd.bin
root 9122 1 0 20:14 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 9126 1 0 20:14 ? 00:00:08 /u01/app/11.2.0.4/grid/bin/gipcd.bin
root 9139 1 0 20:14 ? 00:00:12 /u01/app/11.2.0.4/grid/bin/osysmond.bin
root 9150 1 0 20:14 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdmonitor
root 9169 1 0 20:14 ? 00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid 9180 1 0 20:14 ? 00:00:04 /u01/app/11.2.0.4/grid/bin/ocssd.bin
root 9212 1 1 20:14 ? 00:00:28 /u01/app/11.2.0.4/grid/bin/ologgerd -M -d /u01/app/11.2.0.4/grid/crf/db/lunar1
root 9340 1 0 20:18 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/octssd.bin reboot
grid 9363 1 0 20:18 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/evmd.bin
root 9455 1 0 20:18 ? 00:00:09 /u01/app/11.2.0.4/grid/bin/crsd.bin reboot
grid 9532 9363 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
grid 9569 1 0 20:18 ? 00:00:02 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid 9572 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/scriptagent.bin
root 9591 1 0 20:18 ? 00:00:05 /u01/app/11.2.0.4/grid/bin/orarootagent.bin
grid 9682 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid 9684 1 0 20:18 ? 00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle 9774 1 0 20:19 ? 00:00:03 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root 10642 7273 0 20:38 pts/2 00:00:00 grep d.bin
[root@lunar1 ~]#
[root@lunar1 ~]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunar1
ora.DATADG1.dg
               ONLINE  ONLINE       lunar1
ora.DATADG2.dg
               ONLINE  ONLINE       lunar1
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunar1
ora.asm
               ONLINE  ONLINE       lunar1                   Started
ora.gsd
               OFFLINE OFFLINE      lunar1
ora.net1.network
               ONLINE  ONLINE       lunar1
ora.ons
               ONLINE  ONLINE       lunar1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       lunar1
ora.cvu
      1        ONLINE  ONLINE       lunar1
ora.lunar.db
      1        ONLINE  ONLINE       lunar1                   Open
      2        ONLINE  OFFLINE
ora.lunar1.vip
      1        ONLINE  ONLINE       lunar1
ora.lunar2.vip
      1        ONLINE  INTERMEDIATE lunar1                   FAILED OVER
ora.oc4j
      1        ONLINE  ONLINE       lunar1
ora.scan1.vip
      1        ONLINE  ONLINE       lunar1
[root@lunar1 ~]#
Only node 1 is shown here because I had shut down node 2.
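The test shows that as long as you kill the cssdmonitor and cssdagent processes first (strictly speaking, cssdagent is the one that matters; the classic diagram of the CRS startup sequence shows this relationship too) and only then kill ocssd.bin, the host does not reboot.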
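The ordering constraint from this test can be written down as a small planning helper. This is only a sketch: `kill_order` is a hypothetical function that sorts a list of (pid, binary name) pairs and sends no signals itself. Placing all other daemons first simply mirrors the order used in this walkthrough and is an assumption, not a documented requirement.

```python
from typing import List, Tuple

CSSD_WATCHERS = {"cssdmonitor", "cssdagent"}  # must be gone before ocssd.bin is killed
CSSD = "ocssd.bin"


def kill_order(processes: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Order processes so the CSSD watchers precede ocssd.bin, which comes last."""
    def rank(item: Tuple[int, str]) -> int:
        _pid, name = item
        if name == CSSD:
            return 2
        if name in CSSD_WATCHERS:
            return 1
        return 0  # every other daemon goes first, as in the walkthrough

    # sorted() is stable, so processes with equal rank keep their input order.
    return sorted(processes, key=rank)


plan = kill_order([(4063, "ocssd.bin"), (7785, "osysmond.bin"),
                   (7588, "cssdmonitor"), (7578, "cssdagent")])
for pid, name in plan:
    print(pid, name)
```

Keeping the plan separate from the act of killing makes it easy to review the order on a production host before running anything.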
In addition, a 12.1 standard RAC (non-Flex Cluster) behaves the same way as described here, and the approach and procedure are the same.