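Master Diskmon is a new process introduced in Oracle Clusterware 11.1.0.7, designed mainly for the Exadata Storage Server software. It is part of the default installation, so it is present as soon as Oracle Clusterware is installed.
Master Diskmon is mainly responsible for monitoring the cells and for communicating with the diskmon processes on the database nodes. It also takes part in the I/O fencing mechanism and in IORM (I/O Resource Manager).
Master Diskmon is a standalone process that communicates with ocssd. It exists even in non-Exadata environments, although there it does nothing useful (this is explained later).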
In 11.1.0.7, /bin/sh /etc/init.d/init.cssd starts two diskmon-related processes:
root 1717 0.0 0.0 6716 1368 ? Ss 11:43 0:07 /bin/sh /etc/init.d/init.cssd fatal
root 2799 0.0 0.0 6720 1364 ? S 11:44 0:00 \_ /bin/sh /etc/init.d/init.cssd diskmon
oracle 3317 0.0 0.9 108864 18976 ? Sl 11:44 0:00 | \_ /u01/64bit/B107/crs/bin/diskmon.bin -d -f
Now let's look at 11.2. As we all know, the Oracle Clusterware architecture in 11.2 changed a great deal from 11.1: all processes are started by ohasd. The 11.2 process startup sequence is as follows:
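As the diagram shows, when the machine boots, the init script starts the ohasd process, and ohasd in turn starts the orarootagent daemon. orarootagent is the agent for all ohasd resources owned by root; it is responsible for starting crsd, ctssd, the ACFS drivers, and diskmon.
Despite the name "Diskmon", its main job on Exadata database nodes is to monitor all nodes and network connections in order to confirm that those nodes are alive.
On Exadata, the I/O fencing logic in ocssd handles I/O fencing by communicating with the diskmon process.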
In addition, from 11.2 onward, we can see:
[root@dm01db01 init.d]# strings /u01/app/11.2.0.3/grid/bin/orarootagent.bin|grep diskmon
ora.diskmon.type
diskmon
diskmon
/valgrind_diskmon
diskmon -d -f
diskmon
DiskmonAgent::DiskmonAgent diskmon is enabled
DiskmonAgent::DiskmonAgent diskmon is disabled
DiskmonAgent:: diskmon shutting down cleanly %d
DiskmonAgent:: Unable to reboot..clean ora.diskmon...
DiskmonAgent::check: diskmon OS pid unknown %u
DiskmonAgent::check: diskmon OS pid %u
[root@dm01db01 init.d]#
Above we can see that diskmon is started as "diskmon.bin -d -f", where:
-d enables tracing
-f removes any previously created pipes before the process is created
This is easy to understand, because on Exadata the database processes exchange their reads and writes with diskmon over pipes.
[root@dm01db01 ~]# ps -ef|grep d.bin
root 2388 1 0 19:10 ? 00:00:13 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 2610 1 0 19:10 ? 00:00:08 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 2624 1 0 19:10 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 2636 1 0 19:10 ? 00:00:07 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 2648 1 0 19:10 ? 00:01:21 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 2651 1 0 19:10 ? 00:00:12 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 2670 1 0 19:10 ? 00:00:23 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 2684 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 2706 1 0 19:10 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 2710 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 2733 1 0 19:10 ? 00:00:20 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 2851 1 0 19:11 ? 00:00:03 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid 2877 1 0 19:11 ? 00:00:03 /u01/app/11.2.0.3/grid/bin/evmd.bin
root 3216 1 2 19:12 ? 00:07:13 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 3279 1 0 19:13 ? 00:00:13 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid 3392 2877 0 19:13 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid 3445 1 0 19:13 ? 00:00:08 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 3452 1 0 19:13 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
root 3455 1 0 19:13 ? 00:00:27 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
oracle 3807 1 0 19:14 ? 00:00:09 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root 22357 22301 0 23:34 pts/4 00:00:00 grep d.bin
[root@dm01db01 ~]#
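The same check can be scripted. The following is a minimal Python sketch, not an Oracle tool: it scans `ps -ef` text for diskmon.bin and reports the owner, PID, and command line. The names find_diskmon and SAMPLE are made up for this illustration, and the sample rows are a few lines copied from the listing above.

```python
def find_diskmon(ps_output):
    """Return (user, pid, command) for each diskmon.bin process in `ps -ef` text."""
    matches = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header row, prompts, and blank lines
        user, pid, cmd = fields[0], int(fields[1]), fields[7]
        executable = cmd.split()[0]
        if executable.endswith("/diskmon.bin"):
            matches.append((user, pid, cmd))
    return matches


# Hypothetical sample: a subset of the rows shown in the article.
SAMPLE = """\
root 2648 1 0 19:10 ? 00:01:21 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 2710 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 2733 1 0 19:10 ? 00:00:20 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 22357 22301 0 23:34 pts/4 00:00:00 grep d.bin
"""

for user, pid, cmd in find_diskmon(SAMPLE):
    print(user, pid, cmd)
```

Matching on the executable path rather than on the whole line keeps the `grep d.bin` row itself out of the result.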
The call stack of the ocssd.bin process:
[root@dm01db01 ~]# pstack 2733
......
Thread 3 (Thread 0x4243c940 (LWP 2834)):
#0 0x0000003946c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b12cee6bc78 in sltspcwait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12cf04ccfb in kgzf_send_main () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- sends messages
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x425bd940 (LWP 2835)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002b12cf25f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12cf04c25e in kgzf_recv_main () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- receives messages
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b12d0b2d330 (LWP 2733)):
#0 0x0000003946c0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b12cee6bc36 in sltspctimewait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12ccc56a18 in clsucvtimewait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x0000000000464829 in clssnmNMGetStatus ()
#4 0x0000000000494a12 in clssgmStartNMMon ()
#5 0x000000000040a504 in clssscmain ()
#6 0x0000000000407d50 in main ()
[root@dm01db01 ~]#
The call stack of diskmon.bin:
[root@dm01db01 ~]# pstack 2710
Thread 11 (Thread 0x40808940 (LWP 2714)):
#0 0x0000003946c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aeb5478bc78 in sltspcwait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002aeb562bf640 in clsd_logThread () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x410b8940 (LWP 2715)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb55e64768 in sosstcppoll () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#2 0x00002aeb55e431eb in ossnet_poll_in_batches () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#3 0x00002aeb55e4389e in ossnet_async_monitor_poll () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#4 0x00002aeb55e3ca72 in oss_wait () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#5 0x00002aeb55e3d118 in oss_wait_postable () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#6 0x00000000004243a0 in dskm_tcpmon_thrd_main () -- monitors TCP ports
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x412b9940 (LWP 2716)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb56515194 in sgipcwWaitHelper () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#2 0x00002aeb56512da1 in sgipcwWait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x00002aeb5636cb47 in gipcWaitOsd () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#4 0x00002aeb56359599 in gipcInternalWait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#5 0x00002aeb56306f60 in gipcWaitF () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#6 0x00002aeb56212d4b in clsssRecvMsg () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so -- these three (#6 to #8) mainly monitor the status of other processes and receive messages
#7 0x00002aeb561eea9c in clssgsGroupGetStatus () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#8 0x00002aeb561e4ec6 in clssgsgrpstat () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#9 0x000000000042957f in dskm_rac_thrd_main ()
#10 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#11 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x414ba940 (LWP 2717)):
#0 0x0000003946c0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aeb5478bc36 in sltspctimewait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000425996 in dskm_oss_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x40be3940 (LWP 2735)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- reads received messages
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x416bb940 (LWP 2837)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x40e3a940 (LWP 2846)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb5603d6fe in ssskgxp_poll () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#2 0x00002aeb5603687f in sskgxp_select () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#3 0x00002aeb55fe9392 in skgxpiwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#4 0x00002aeb55fe7ed4 in skgxpwaiti () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#5 0x00002aeb56025c14 in skgxpwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#6 0x0000000000425e85 in dskm_rcv_thrd_main ()
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x418bc940 (LWP 3154)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x41abd940 (LWP 3238)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb5603d6fe in ssskgxp_poll () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#2 0x00002aeb5603687f in sskgxp_select () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#3 0x00002aeb55fe9392 in skgxpiwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#4 0x00002aeb55fe7ed4 in skgxpwaiti () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#5 0x00002aeb56025c14 in skgxpwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#6 0x0000000000417ea3 in dskm_hb_thrd_main () -- heartbeat check
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x41cbe940 (LWP 4004)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aeb56ee1c50 (LWP 2710)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b80441 in skgznp_accept () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x000000000042d339 in dskm_main ()
#3 0x000000000040ae3a in main ()
[root@dm01db01 ~]#
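With eleven threads in the dump, it helps to reduce each stack to the diskmon function it is running. The sketch below is one way to do that in Python, assuming the pstack text has one frame per line; the names thread_entry_points and SAMPLE are hypothetical, and the sample holds only four of the threads shown above (LWP 2717, 2735, 2837, and 2710).

```python
import re
from collections import Counter

THREAD_RE = re.compile(r"^Thread (\d+) \(.*\(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+) \(\)")


def thread_entry_points(pstack_text):
    """Map LWP id -> outermost dskm_* function found in that thread's stack."""
    entries = {}
    lwp = None
    for line in pstack_text.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            lwp = int(header.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame and lwp is not None and frame.group(1).startswith("dskm_"):
            # Frames are listed innermost first, so a later match overwrites
            # an earlier one and the outermost dskm_* frame is kept.
            entries[lwp] = frame.group(1)
    return entries


# Hypothetical sample: four threads copied from the pstack output in the article.
SAMPLE = """\
Thread 8 (Thread 0x414ba940 (LWP 2717)):
#0 0x0000003946c0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aeb5478bc36 in sltspctimewait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000425996 in dskm_oss_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x40be3940 (LWP 2735)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x416bb940 (LWP 2837)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aeb56ee1c50 (LWP 2710)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b80441 in skgznp_accept () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x000000000042d339 in dskm_main ()
#3 0x000000000040ae3a in main ()
"""

entries = thread_entry_points(SAMPLE)
for function, count in Counter(entries.values()).most_common():
    print(count, function)
```

Run over the full dump, the same function would group the four dskm_slave_thrd_main threads together and leave one entry each for the tcpmon, rac, oss, rcv, and hb threads; the clsd_logThread thread has no dskm_* frame and is simply skipped.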
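Above we see the shared library libskgxp11.so. This file (libskgxpX.so, where X is the Oracle version number: 9, 10, or 11) was first introduced with 9i RAC.
skgxp stands for System Kernel Generic Interface Inter-Process Communications. It is an application interface that Oracle has opened up, used to transport GCS and GES data.
In non-InfiniBand environments, the transport protocol defined by the libskgxp file shipped with Oracle is UDP/IP.
On InfiniBand networks, the transport protocol defined by the libskgxp file shipped with Oracle is IPoIB (IP over InfiniBand), while in Exadata environments the default transport protocol is RDS.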
The diskmon log shows the process version, the process ID, the communication with the cells, monitoring details, and so on:
I/O Fencing and SKGXP HA monitoring daemon -- Version 1.2.0.0
Process 2710 started on 2014-03-15 at 19:10:49.694
2014-03-15 19:10:49.702: [ DISKMON][2710] dskm main: starting up
2014-03-15 19:10:49.702: [ DISKMON][2710:1458445392] dskm_tcpmon_thrd_creat: thread created
2014-03-15 19:10:49.703: [ DISKMON][2710:1091275072] dskm_tcpmon_thrd_main: running
2014-03-15 19:10:49.703: [ DISKMON][2710:1091275072] dskm_tcpmon_thrd_main: initFlag = 81
2014-03-15 19:10:49.705: [ DISKMON][2710:1093376320] dskm_rac_thrd_main: running
2014-03-15 19:10:49.705: [ DISKMON][2710:1458445392] dskm_rac_thrd_creat2: got the post from the css event handling thread
2014-03-15 19:10:49.707: [ DISKMON][2710:1093376320] CELL communication is configured to use 1 interface(s):
2014-03-15 19:10:49.707: [ DISKMON][2710:1093376320] 192.168.56.31
2014-03-15 19:10:49.711: [ DISKMON][2710:1095477568] dskm_oss_thrd_main: running
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] dskm_oss_thrd_creat2: got the post from the oss check status thread
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] dskm main: startup complete
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] listening on -> /var/tmp/.oracle/master_diskmon
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] IPC version: Oracle UDP/IP (generic)
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] IPC Vendor 1 Protocol 2
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] Version 4.1
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_pre_oss_ini6: oss context reconnect initialized, parameter values:
_dskm_disable_reconnect_to_cell = 0
_dskm_reconnect_to_oss_attempts = 7
_dskm_reconnect_to_oss_freq_in_sec = 2
_dskm_reconnect_to_oss_counter_reset_freq_in_sec = 60
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_clss_ini1: calling clssscbinit
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:50.253: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:50.633: [ DISKMON][2710:1458445392] dskm_slave_thrd_creat: thread created
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_slave_thrd_main1: slave 1 running
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_IDENTIFY (0x0001)
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_proc_identify8: client orarootagent/2648, diskmon kgzm 2.1, slave 1, reid cid=DUMMY,icin=-1,nmn=-1,lnid=-1,gid=-1,gin=-1,gmn=-1,umemid=-1,opid=-1,opsn=-1,lvl=process hdr=0xfece0100
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_send_version1:
2014-03-15 19:10:50.637: [ DISKMON][2710:1086208320] dskm_send_version4: done
2014-03-15 19:10:50.637: [ DISKMON][2710:1086208320] dskm_process_msg7: processed msg 0 type KGZM_IDENTIFY (0x0001), retcode 0
2014-03-15 19:10:50.638: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:50.764: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:51.276: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:51.291: [ DISKMON][2710:1093376320] dskm_clss_ini5: successful clsssinit(), clssvers 2.1
2014-03-15 19:10:51.291: [ DISKMON][2710:1093376320] dskm_clss_ini6: calling clssnsqlnum
2014-03-15 19:10:53.643: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:56.645: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:59.650: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:02.654: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:05.657: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:08.662: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:11.669: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:14.671: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:17.667: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:20.377: [ DISKMON][2710:1093376320] dskm_clss_ini8: calling clsssattrib
2014-03-15 19:11:20.377: [ DISKMON][2710:1093376320] dskm_clss_ini11: calling clssnsqname
2014-03-15 19:11:20.384: [ DISKMON][2710:1093376320] dskm_clss_ini13: calling clsssattrib
2014-03-15 19:11:20.384: [ DISKMON][2710:1093376320] dskm_clss_ini15: calling clssgsregnodegrp
......
2014-03-16 00:48:48.442: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 342 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:48:48.711: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:51.719: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:54.713: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:57.721: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:00.735: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:03.742: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:06.815: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:09.754: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:12.760: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:15.763: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:18.765: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:21.770: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:24.784: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:27.794: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:30.800: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:33.800: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:36.803: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:39.977: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:42.815: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:46.362: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:48.458: [ DISKMON][2710:1103882560] dskm_process_msg5: received msg type KGZM_BCAST_OSS_IOCTL (0x000a)
2014-03-16 00:49:48.459: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 343 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:49:49.267: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:51.833: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:54.844: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
......
2014-03-16 01:43:42.362: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:45.367: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:48.371: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:51.377: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:51.766: [ DISKMON][2710:1103882560] dskm_process_msg5: received msg type KGZM_BCAST_OSS_IOCTL (0x000a)
2014-03-16 01:43:51.766: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 01:43:51.886: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 01:43:51.886: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 395 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 01:43:54.378: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:57.384: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:00.389: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:03.388: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:06.393: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:09.403: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:12.411: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:15.418: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
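From the diskmon log we can see that once a minute the ocssd process sends a KGZM_BCAST_OSS_IOCTL message to diskmon, and diskmon then communicates with the cell (this is a VM environment with a single cell, at IP address 192.168.56.11).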
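The one-minute interval can be read straight off the timestamps. Here is a rough Python sketch that groups diskmon log lines by message type and computes the gaps between consecutive entries. The regular expression is inferred from the log lines quoted in this post rather than from any documented format, and the names message_gaps and SAMPLE are hypothetical. The sample is an excerpt of five lines from that log, with the lines in between left out.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern inferred from the quoted log; it is an assumption, not a documented format.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*DISKMON\]\[[\d:]+\] "
    r"\w+: (received|processed) msg (?:\d+ )?type (\w+)"
)


def message_gaps(log_text):
    """Map (event, message type) -> gaps in seconds between consecutive log lines."""
    stamps = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamps[(match.group(2), match.group(3))].append(when)
    return {
        key: [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
        for key, times in stamps.items()
    }


# Hypothetical sample: an excerpt of the log quoted in the article.
SAMPLE = """\
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 342 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:48:48.711: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:51.719: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:54.713: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 343 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
"""

for key, gaps in message_gaps(SAMPLE).items():
    print(key, gaps)
```

On this excerpt the KGZM_PING messages arrive about 3 seconds apart, and the two completed KGZM_BCAST_OSS_IOCTL messages (342 and 343) are about 60 seconds apart.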
Clearly, all of this shows that the diskmon process really only serves a purpose in an Exadata environment.
In the Cluster Resources of 11.1.0.7, we can see that diskmon is started by default (as mentioned earlier, this was the first version to introduce diskmon, although the feature was designed for Exadata environments):
$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac1 Started
ora.crsd
1 ONLINE ONLINE rac1
ora.cssd
1 ONLINE ONLINE rac1
ora.cssdmonitor
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE ONLINE rac1 OBSERVER
ora.diskmon
1 ONLINE ONLINE rac1
ora.drivers.acfs
1 ONLINE ONLINE rac1
ora.evmd
1 ONLINE ONLINE rac1
ora.gipcd
1 ONLINE ONLINE rac1
ora.gpnpd
1 ONLINE ONLINE rac1
ora.mdnsd
1 ONLINE ONLINE rac1
From 11.2.0.2 onward, two more resources appear among the Cluster Resources (haip and crf are not the focus here):
ora.cluster_interconnect.haip
1 ONLINE ONLINE rac1
ora.crf
1 ONLINE ONLINE rac1
From 11.2.0.3 onward, on non-Exadata systems, the default state of ora.diskmon is offline:
ora.diskmon
1 OFFLINE OFFLINE rac1
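There are two main reasons for this:
1. In non-Exadata environments, diskmon serves no purpose.
2. In non-Exadata environments, diskmon is not only useless but also prone to triggering bugs, so it was simply made offline by default.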
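To check which state ora.diskmon is in on a given node, the `crsctl stat res -t -init` output can be parsed. The sketch below is only an illustration and assumes the two-line layout shown in this post (resource name on one line, then instance number, target, state, and server on the next). The names resource_states and SAMPLE are hypothetical, and the sample combines rows from the listings above.

```python
def resource_states(crsctl_text):
    """Map resource name -> (target, state, server) from `crsctl stat res -t -init` text."""
    states = {}
    name = None
    for line in crsctl_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            name = fields[0]
            fields = fields[1:]  # tolerate name and status flattened onto one line
        if name and len(fields) >= 3 and fields[0].isdigit():
            # The server column can be missing, so treat it as optional.
            server = fields[3] if len(fields) > 3 else ""
            states[name] = (fields[1], fields[2], server)
    return states


# Hypothetical sample assembled from the listings in the article.
SAMPLE = """\
ora.cssd
1 ONLINE ONLINE rac1
ora.diskmon
1 OFFLINE OFFLINE rac1
ora.evmd
1 ONLINE ONLINE rac1
"""

target, state, server = resource_states(SAMPLE)["ora.diskmon"]
print("ora.diskmon target=%s state=%s server=%s" % (target, state, server))
```

Header and separator lines are ignored because they neither start with "ora." nor begin with an instance number.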
From 12c onward, the Cluster Resource family gained a new member, ora.storage, a resource designed for Flex ASM:
ora.storage
1 ONLINE ONLINE racnode1 STABLE
Location of the diskmon log: GRID_HOME/log/host/diskmon (Disk Monitor Daemon)