Processes on Exadata: the Diskmon Process

Contact: QQ (5163721)

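Author: Lunar © All rights reserved [This article may be reposted, provided the source address is credited with a link; otherwise legal action will be pursued.]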

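Master Diskmon is a new process introduced in Oracle Clusterware 11.1.0.7, designed mainly for the Exadata Storage Server software. It is part of the default installation, so it is present as soon as Oracle Clusterware is installed.
Master Diskmon is mainly responsible for monitoring the cells and for communicating with the diskmon processes on the database nodes. It also takes part in the IO fencing mechanism and in IORM (IO Resource Manager).
Master Diskmon runs as a separate process and communicates with the ocssd process. It exists even in non-Exadata environments, although there it does essentially nothing (this is explained later).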

In 11.1.0.7, /bin/sh /etc/init.d/init.cssd starts two diskmon-related processes:

root 1717 0.0 0.0 6716 1368 ? Ss 11:43 0:07 /bin/sh /etc/init.d/init.cssd fatal
root 2799 0.0 0.0 6720 1364 ? S 11:44 0:00 \_ /bin/sh /etc/init.d/init.cssd diskmon
oracle 3317 0.0 0.9 108864 18976 ? Sl 11:44 0:00 | \_ /u01/64bit/B107/crs/bin/diskmon.bin -d -f

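Now let's look at 11.2. As is well known, the Oracle Clusterware architecture in 11.2 differs greatly from 11.1: all processes are started by ohasd. The 11.2 process startup flow is shown below: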



[Figure: Oracle Clusterware 11.2 process startup flow (image not available)]


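As the figure shows, when the machine boots, the init script starts the ohasd process, and ohasd starts the orarootagent daemon. orarootagent is the agent for all ohasd resources owned by root; it is responsible for starting crsd, ctssd, the ACFS drivers, and diskmon.
Despite the name "Diskmon", its main job on Exadata database nodes is to monitor all nodes and network connections in order to confirm that those nodes are alive.
On Exadata, the IO fencing logic in ocssd communicates with the diskmon process to carry out IO fencing.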
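
The startup chain described above can be written down as a small tree. The following is a minimal Python sketch: the dictionary encodes only the "who starts whom" relationships stated in the text (it is not a complete list of what ohasd or orarootagent start), and the names `STARTUP_TREE` and `startup_chain` are made up for illustration.

```python
# Parent -> children, limited to the relationships named in the text.
# This is an illustration, not a full model of the 11.2 Clusterware stack.
STARTUP_TREE = {
    "init": ["ohasd"],
    "ohasd": ["orarootagent"],
    "orarootagent": ["crsd", "ctssd", "ACFS drivers", "diskmon"],
}


def startup_chain(target, root="init", tree=STARTUP_TREE):
    """Return the list of components from root down to target, or None."""
    if root == target:
        return [root]
    for child in tree.get(root, []):
        chain = startup_chain(target, child, tree)
        if chain is not None:
            return [root] + chain
    return None


if __name__ == "__main__":
    print(" -> ".join(startup_chain("diskmon")))
```

The chain describes which component launches which; it does not describe OS parent PIDs.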

In addition, from 11.2 onward we can see the following:

[root@dm01db01 init.d]# strings /u01/app/11.2.0.3/grid/bin/orarootagent.bin|grep diskmon
ora.diskmon.type
diskmon
diskmon
/valgrind_diskmon
diskmon -d -f
diskmon
DiskmonAgent::DiskmonAgent diskmon is enabled
DiskmonAgent::DiskmonAgent diskmon is disabled
DiskmonAgent:: diskmon shutting down cleanly %d
DiskmonAgent:: Unable to reboot..clean ora.diskmon...
DiskmonAgent::check: diskmon OS pid unknown %u
DiskmonAgent::check: diskmon OS pid %u
[root@dm01db01 init.d]#

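Above we can see that diskmon is started as "diskmon.bin -d -f", where:
-d enables tracing
-f removes any previously created pipe before the process is created

This is easy to understand, since on Exadata the database processes exchange messages with diskmon by reading and writing pipes.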

[root@dm01db01 ~]# ps -ef|grep d.bin
root 2388 1 0 19:10 ? 00:00:13 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
grid 2610 1 0 19:10 ? 00:00:08 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 2624 1 0 19:10 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
grid 2636 1 0 19:10 ? 00:00:07 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 2648 1 0 19:10 ? 00:01:21 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
grid 2651 1 0 19:10 ? 00:00:12 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 2670 1 0 19:10 ? 00:00:23 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 2684 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 2706 1 0 19:10 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 2710 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 2733 1 0 19:10 ? 00:00:20 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 2851 1 0 19:11 ? 00:00:03 /u01/app/11.2.0.3/grid/bin/octssd.bin reboot
grid 2877 1 0 19:11 ? 00:00:03 /u01/app/11.2.0.3/grid/bin/evmd.bin
root 3216 1 2 19:12 ? 00:07:13 /u01/app/11.2.0.3/grid/bin/ologgerd -M -d /u01/app/11.2.0.3/grid/crf/db/dm01db01
root 3279 1 0 19:13 ? 00:00:13 /u01/app/11.2.0.3/grid/bin/crsd.bin reboot
grid 3392 2877 0 19:13 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/evmlogger.bin -o /u01/app/11.2.0.3/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.3/grid/evm/log/evmlogger.log
grid 3445 1 0 19:13 ? 00:00:08 /u01/app/11.2.0.3/grid/bin/oraagent.bin
grid 3452 1 0 19:13 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/scriptagent.bin
root 3455 1 0 19:13 ? 00:00:27 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
oracle 3807 1 0 19:14 ? 00:00:09 /u01/app/11.2.0.3/grid/bin/oraagent.bin
root 22357 22301 0 23:34 pts/4 00:00:00 grep d.bin
[root@dm01db01 ~]#
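
To pick the diskmon line out of a listing like the one above without reading it by eye, a few lines of Python are enough. This is only a sketch: it assumes the standard eight-column `ps -ef` layout, and the function name `find_diskmon` and the constant `PS_SAMPLE` are invented for this example. The sample rows are copied from the output above.

```python
def find_diskmon(ps_text):
    """Return (user, pid, args) for each diskmon.bin line in `ps -ef` output."""
    found = []
    for line in ps_text.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # prompt lines, blank lines, truncated rows
        user, pid, cmd = fields[0], fields[1], fields[7]
        argv = cmd.split()
        if argv[0].endswith("/diskmon.bin"):
            found.append((user, int(pid), argv[1:]))
    return found


# Rows taken from the listing above.
PS_SAMPLE = """\
root 2706 1 0 19:10 ? 00:00:01 /u01/app/11.2.0.3/grid/bin/cssdagent
grid 2710 1 0 19:10 ? 00:00:02 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
grid 2733 1 0 19:10 ? 00:00:20 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 22357 22301 0 23:34 pts/4 00:00:00 grep d.bin
"""

if __name__ == "__main__":
    for user, pid, args in find_diskmon(PS_SAMPLE):
        print(user, pid, " ".join(args))
```

The PID it reports (2710 here) is the one used by the pstack call and printed in the diskmon log header.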

Call stacks of the ocssd.bin process:

[root@dm01db01 ~]# pstack 2733
......

Thread 3 (Thread 0x4243c940 (LWP 2834)):
#0 0x0000003946c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b12cee6bc78 in sltspcwait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12cf04ccfb in kgzf_send_main () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- sends messages
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x425bd940 (LWP 2835)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002b12cf25f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12cf04c25e in kgzf_recv_main () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- receives messages
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b12d0b2d330 (LWP 2733)):
#0 0x0000003946c0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b12cee6bc36 in sltspctimewait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002b12ccc56a18 in clsucvtimewait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x0000000000464829 in clssnmNMGetStatus ()
#4 0x0000000000494a12 in clssgmStartNMMon ()
#5 0x000000000040a504 in clssscmain ()
#6 0x0000000000407d50 in main ()
[root@dm01db01 ~]#

Call stacks of diskmon.bin:

[root@dm01db01 ~]# pstack 2710
Thread 11 (Thread 0x40808940 (LWP 2714)):
#0 0x0000003946c0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aeb5478bc78 in sltspcwait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x00002aeb562bf640 in clsd_logThread () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x410b8940 (LWP 2715)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb55e64768 in sosstcppoll () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#2 0x00002aeb55e431eb in ossnet_poll_in_batches () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#3 0x00002aeb55e4389e in ossnet_async_monitor_poll () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#4 0x00002aeb55e3ca72 in oss_wait () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#5 0x00002aeb55e3d118 in oss_wait_postable () from /u01/app/11.2.0.3/grid/lib/libcell11.so
#6 0x00000000004243a0 in dskm_tcpmon_thrd_main () -- monitors TCP ports
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x412b9940 (LWP 2716)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb56515194 in sgipcwWaitHelper () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#2 0x00002aeb56512da1 in sgipcwWait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#3 0x00002aeb5636cb47 in gipcWaitOsd () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#4 0x00002aeb56359599 in gipcInternalWait () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#5 0x00002aeb56306f60 in gipcWaitF () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#6 0x00002aeb56212d4b in clsssRecvMsg () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so -- these three frames mainly monitor the state of other processes and receive messages
#7 0x00002aeb561eea9c in clssgsGroupGetStatus () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#8 0x00002aeb561e4ec6 in clssgsgrpstat () from /u01/app/11.2.0.3/grid/lib/libhasgen11.so
#9 0x000000000042957f in dskm_rac_thrd_main ()
#10 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#11 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x414ba940 (LWP 2717)):
#0 0x0000003946c0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aeb5478bc36 in sltspctimewait () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000425996 in dskm_oss_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x40be3940 (LWP 2735)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1 -- reads received messages
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x416bb940 (LWP 2837)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x40e3a940 (LWP 2846)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb5603d6fe in ssskgxp_poll () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#2 0x00002aeb5603687f in sskgxp_select () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#3 0x00002aeb55fe9392 in skgxpiwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#4 0x00002aeb55fe7ed4 in skgxpwaiti () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#5 0x00002aeb56025c14 in skgxpwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#6 0x0000000000425e85 in dskm_rcv_thrd_main ()
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x418bc940 (LWP 3154)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x41abd940 (LWP 3238)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb5603d6fe in ssskgxp_poll () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#2 0x00002aeb5603687f in sskgxp_select () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#3 0x00002aeb55fe9392 in skgxpiwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#4 0x00002aeb55fe7ed4 in skgxpwaiti () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#5 0x00002aeb56025c14 in skgxpwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#6 0x0000000000417ea3 in dskm_hb_thrd_main () -- heartbeat check
#7 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#8 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x41cbe940 (LWP 4004)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
#3 0x0000003946c0673d in start_thread () from /lib64/libpthread.so.0
#4 0x00000039460d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aeb56ee1c50 (LWP 2710)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b80441 in skgznp_accept () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x000000000042d339 in dskm_main ()
#3 0x000000000040ae3a in main ()
[root@dm01db01 ~]#
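
With eleven threads in the trace, it helps to reduce each one to its diskmon entry function. The sketch below does that in Python. It assumes the pstack layout shown above ("Thread N (Thread 0x... (LWP n)):" followed by "#k 0x... in func ()" frames); `thread_entry_points` and `PSTACK_SAMPLE` are names invented here, and the sample is an abridged copy of four of the threads above.

```python
import re
from collections import Counter

THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+) \(\)")


def thread_entry_points(pstack_text):
    """Map LWP id -> first dskm_* function found in that thread's stack."""
    entries = {}
    lwp = None
    for line in pstack_text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            lwp = int(m.group(2))
            continue
        m = FRAME_RE.match(line)
        if m and lwp is not None and m.group(1).startswith("dskm_"):
            # Keep only the first (innermost) dskm_* frame per thread.
            entries.setdefault(lwp, m.group(1))
    return entries


# Abridged from the diskmon.bin trace above (some frames omitted).
PSTACK_SAMPLE = """\
Thread 7 (Thread 0x40be3940 (LWP 2735)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#1 0x00002aeb54b7f686 in skgznp_read_msg () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
Thread 6 (Thread 0x416bb940 (LWP 2837)):
#0 0x00000039460cb696 in poll () from /lib64/libc.so.6
#2 0x0000000000424dfe in dskm_slave_thrd_main ()
Thread 3 (Thread 0x41abd940 (LWP 3238)):
#5 0x00002aeb56025c14 in skgxpwait () from /u01/app/11.2.0.3/grid/lib/libskgxp11.so
#6 0x0000000000417ea3 in dskm_hb_thrd_main ()
Thread 1 (Thread 0x2aeb56ee1c50 (LWP 2710)):
#1 0x00002aeb54b80441 in skgznp_accept () from /u01/app/11.2.0.3/grid/lib/libclntsh.so.11.1
#2 0x000000000042d339 in dskm_main ()
#3 0x000000000040ae3a in main ()
"""

if __name__ == "__main__":
    entries = thread_entry_points(PSTACK_SAMPLE)
    for func, count in Counter(entries.values()).most_common():
        print(count, func)
```

Fed the full trace above instead of the abridged sample, the same function should report four dskm_slave_thrd_main threads (Threads 7, 6, 4 and 2) next to the tcpmon, rac, oss, rcv and hb threads. Thread 11 has no dskm_* frame and is skipped.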

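In the stacks above we see the shared library libskgxp11.so. This file (libskgxpX.so, where X is the Oracle version number 9/10/11) was first introduced with 9i RAC.
skgxp stands for System Kernel Generic Interface Inter-Process Communications. It is an interface that Oracle exposes for transporting GCS and GES data.

In non-InfiniBand environments, the transport protocol defined by the libskgxp file shipped with Oracle is UDP/IP.
On InfiniBand networks, the transport defined by the libskgxp file shipped with Oracle is IPoIB (IP over InfiniBand), whereas in Exadata environments the default transport protocol is RDS.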

The diskmon log shows the process's version, its process ID, its communication with the cells, monitoring details, and so on:

I/O Fencing and SKGXP HA monitoring daemon -- Version 1.2.0.0
Process 2710 started on 2014-03-15 at 19:10:49.694

2014-03-15 19:10:49.702: [ DISKMON][2710] dskm main: starting up
2014-03-15 19:10:49.702: [ DISKMON][2710:1458445392] dskm_tcpmon_thrd_creat: thread created
2014-03-15 19:10:49.703: [ DISKMON][2710:1091275072] dskm_tcpmon_thrd_main: running
2014-03-15 19:10:49.703: [ DISKMON][2710:1091275072] dskm_tcpmon_thrd_main: initFlag = 81
2014-03-15 19:10:49.705: [ DISKMON][2710:1093376320] dskm_rac_thrd_main: running
2014-03-15 19:10:49.705: [ DISKMON][2710:1458445392] dskm_rac_thrd_creat2: got the post from the css event handling thread
2014-03-15 19:10:49.707: [ DISKMON][2710:1093376320] CELL communication is configured to use 1 interface(s):
2014-03-15 19:10:49.707: [ DISKMON][2710:1093376320] 192.168.56.31
2014-03-15 19:10:49.711: [ DISKMON][2710:1095477568] dskm_oss_thrd_main: running
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] dskm_oss_thrd_creat2: got the post from the oss check status thread
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] dskm main: startup complete
2014-03-15 19:10:49.711: [ DISKMON][2710:1458445392] listening on -> /var/tmp/.oracle/master_diskmon
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] IPC version: Oracle UDP/IP (generic)
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] IPC Vendor 1 Protocol 2
2014-03-15 19:10:49.736: [ DISKMON][2710:1093376320] Version 4.1
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_pre_oss_ini6: oss context reconnect initialized, parameter values:
_dskm_disable_reconnect_to_cell = 0
_dskm_reconnect_to_oss_attempts = 7
_dskm_reconnect_to_oss_freq_in_sec = 2
_dskm_reconnect_to_oss_counter_reset_freq_in_sec = 60
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_clss_ini1: calling clssscbinit
2014-03-15 19:10:49.739: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:50.253: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:50.633: [ DISKMON][2710:1458445392] dskm_slave_thrd_creat: thread created
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_slave_thrd_main1: slave 1 running
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_IDENTIFY (0x0001)
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_proc_identify8: client orarootagent/2648, diskmon kgzm 2.1, slave 1, reid cid=DUMMY,icin=-1,nmn=-1,lnid=-1,gid=-1,gin=-1,gmn=-1,umemid=-1,opid=-1,opsn=-1,lvl=process hdr=0xfece0100
2014-03-15 19:10:50.633: [ DISKMON][2710:1086208320] dskm_send_version1:
2014-03-15 19:10:50.637: [ DISKMON][2710:1086208320] dskm_send_version4: done
2014-03-15 19:10:50.637: [ DISKMON][2710:1086208320] dskm_process_msg7: processed msg 0 type KGZM_IDENTIFY (0x0001), retcode 0
2014-03-15 19:10:50.638: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:50.764: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:51.276: [ DISKMON][2710:1093376320] dskm_clss_ini2: calling clsssinit
2014-03-15 19:10:51.291: [ DISKMON][2710:1093376320] dskm_clss_ini5: successful clsssinit(), clssvers 2.1
2014-03-15 19:10:51.291: [ DISKMON][2710:1093376320] dskm_clss_ini6: calling clssnsqlnum
2014-03-15 19:10:53.643: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:56.645: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:10:59.650: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:02.654: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:05.657: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:08.662: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:11.669: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:14.671: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:17.667: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-15 19:11:20.377: [ DISKMON][2710:1093376320] dskm_clss_ini8: calling clsssattrib
2014-03-15 19:11:20.377: [ DISKMON][2710:1093376320] dskm_clss_ini11: calling clssnsqname
2014-03-15 19:11:20.384: [ DISKMON][2710:1093376320] dskm_clss_ini13: calling clsssattrib
2014-03-15 19:11:20.384: [ DISKMON][2710:1093376320] dskm_clss_ini15: calling clssgsregnodegrp

......

2014-03-16 00:48:48.442: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 342 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:48:48.711: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:51.719: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:54.713: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:48:57.721: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:00.735: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:03.742: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:06.815: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:09.754: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:12.760: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:15.763: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:18.765: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:21.770: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:24.784: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:27.794: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:30.800: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:33.800: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:36.803: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:39.977: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:42.815: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:46.362: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:48.458: [ DISKMON][2710:1103882560] dskm_process_msg5: received msg type KGZM_BCAST_OSS_IOCTL (0x000a)
2014-03-16 00:49:48.459: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 343 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:49:49.267: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:51.833: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:54.844: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)

......

2014-03-16 01:43:42.362: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:45.367: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:48.371: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:51.377: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:51.766: [ DISKMON][2710:1103882560] dskm_process_msg5: received msg type KGZM_BCAST_OSS_IOCTL (0x000a)
2014-03-16 01:43:51.766: [ DISKMON][2710:1103882560] dskm_issue_ioctl_helper4: oss_ioctl 13 to device o/192.168.56.11 (oss_fd 1, handle 0x1699eb28)
2014-03-16 01:43:51.886: [ DISKMON][2710:1103882560] dskm_bcast_oss11: oss_ioctl request 0x1699eb28 completed
2014-03-16 01:43:51.886: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 395 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 01:43:54.378: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:43:57.384: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:00.389: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:03.388: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:06.393: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:09.403: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:12.411: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 01:44:15.418: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)

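From the diskmon log we can see that roughly once a minute the ocssd process sends a KGZM_BCAST_OSS_IOCTL message to diskmon, and diskmon then talks to the cell (this is a VM environment with a single cell, whose IP address is 192.168.56.11).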
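
The "about once a minute" cadence can be checked directly from the log. The Python sketch below extracts the "processed msg N type KGZM_BCAST_OSS_IOCTL" lines and divides the elapsed time by the difference in message numbers. It rests on one assumption that the excerpt suggests but does not prove: that the number printed by dskm_process_msg7 goes up by one per broadcast on that thread (342, then 343 a minute later). The names `bcast_samples`, `mean_intervals` and `LOG_SAMPLE` are invented for this example; the log lines are copied from the excerpt above.

```python
import re
from datetime import datetime

PROCESSED_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*"
    r"processed msg (\d+) type KGZM_BCAST_OSS_IOCTL"
)


def bcast_samples(log_text):
    """Return [(timestamp, msg_number)] for completed broadcast ioctls."""
    samples = []
    for line in log_text.splitlines():
        m = PROCESSED_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            samples.append((ts, int(m.group(2))))
    return samples


def mean_intervals(samples):
    """Average seconds per message between consecutive samples.

    Assumes the message number increases by one per broadcast.
    """
    out = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if n1 > n0:
            out.append((t1 - t0).total_seconds() / (n1 - n0))
    return out


# Lines copied from the diskmon log excerpt above.
LOG_SAMPLE = """\
2014-03-16 00:48:48.446: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 342 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 00:48:48.711: [ DISKMON][2710:1086208320] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2014-03-16 00:49:48.464: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 343 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
2014-03-16 01:43:51.886: [ DISKMON][2710:1103882560] dskm_process_msg7: processed msg 395 type KGZM_BCAST_OSS_IOCTL (0x000a), retcode 0
"""

if __name__ == "__main__":
    for seconds in mean_intervals(bcast_samples(LOG_SAMPLE)):
        print(round(seconds, 1))
```

On these three samples the averages are about 60.0 and 62.4 seconds per message, which matches the one-minute reading of the log.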

Clearly, all of this shows that the diskmon process is only really useful in an Exadata environment.

In the cluster resources of an 11.2 installation, diskmon is started by default (as mentioned earlier, the process itself was first introduced in 11.1.0.7 and was designed for Exadata):

$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE rac1 Started
ora.crsd
1 ONLINE ONLINE rac1
ora.cssd
1 ONLINE ONLINE rac1
ora.cssdmonitor
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE ONLINE rac1 OBSERVER
ora.diskmon
1 ONLINE ONLINE rac1
ora.drivers.acfs
1 ONLINE ONLINE rac1
ora.evmd
1 ONLINE ONLINE rac1
ora.gipcd
1 ONLINE ONLINE rac1
ora.gpnpd
1 ONLINE ONLINE rac1
ora.mdnsd
1 ONLINE ONLINE rac1

From 11.2.0.2 onward, two more resources appear in the listing (haip and crf are not the focus here):

ora.cluster_interconnect.haip
1 ONLINE ONLINE rac1
ora.crf
1 ONLINE ONLINE rac1

From 11.2.0.3 onward, on non-Exadata systems ora.diskmon is OFFLINE by default:
ora.diskmon
1 OFFLINE OFFLINE rac1

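There are two main reasons for this:
1. In a non-Exadata environment, diskmon serves no real purpose.
2. In a non-Exadata environment, diskmon is not only useless but also prone to triggering bugs, so it is simply left offline by default.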
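
To check the ora.diskmon state on a given node without scanning the table by eye, the `crsctl stat res -t -init` output can be parsed with a short script. The Python sketch below assumes the layout shown in the listings above (a resource name line followed by an instance line with TARGET and STATE) and records only the first instance of each resource. `resource_states` and `CRSCTL_SAMPLE` are names made up for this example; the sample rows come from the listings above.

```python
def resource_states(crsctl_text):
    """Map resource name -> (target, state) from `crsctl stat res -t -init`."""
    states = {}
    name = None
    for line in crsctl_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            name = fields[0]  # resource name line, e.g. "ora.diskmon"
        elif name and fields[0].isdigit() and len(fields) >= 3:
            # Instance line: <n> TARGET STATE [SERVER] [STATE_DETAILS]
            states[name] = (fields[1], fields[2])
            name = None  # keep only the first instance per resource
    return states


# Rows taken from the listings above.
CRSCTL_SAMPLE = """\
NAME TARGET STATE SERVER STATE_DETAILS
Cluster Resources
ora.cssd
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE ONLINE rac1 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE rac1
"""

if __name__ == "__main__":
    target, state = resource_states(CRSCTL_SAMPLE)["ora.diskmon"]
    print("ora.diskmon target=%s state=%s" % (target, state))
```

On a real node the text would come from running `$GRID_HOME/bin/crsctl stat res -t -init` and capturing its output.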

From 12c onward, the cluster resource family gained a new member, ora.storage, which was designed for Flex ASM:

ora.storage
1 ONLINE ONLINE racnode1 STABLE


Location of the Diskmon process log:
$GRID_HOME/log/<hostname>/diskmon (Disk Monitor Daemon)