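As we all know, an ASM instance manages metadata, and an ordinary database instance locates its ASM files by querying that metadata.
Both the ASM instance and the database instances can access a common set of disks, called a disk group.
The database instance reads and writes the contents of ASM files directly; it talks to the ASM instance only to obtain layout information about those files.
Group Services is used to register the connection information a database instance needs to find an ASM instance.
When an ASM instance mounts a disk group, it registers the disk group and a connect string with Group Services.
A database instance that knows the disk group name can then look up which ASM instance to connect to.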
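What is distinctive about an ASM instance:
1. INSTANCE_TYPE = ASM
2. startup = startup mount (from 11.2 on you can issue a plain startup against an ASM instance, but in essence it is still startup mount). For an ASM instance, the mount option does not mount data files; it mounts the disk groups listed in the ASM_DISKGROUPS parameter of the parameter file.
3. connect / as sysdba (10g) and connect / as sysasm (11g and later)
ASM has many background processes; see the Reference manual for the full list. Here I only want to look at ASMB, the process responsible for the heartbeat between the database and ASM.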
[grid@dm01db01 oraagent_grid]$ ps -ef|grep ASM1
grid 2714 2711 0 12:21 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3467 1 0 09:24 ? 00:00:00 asm_pmon_+ASM1
grid 3471 1 0 09:24 ? 00:00:00 asm_psp0_+ASM1
grid 3475 1 0 09:24 ? 00:00:05 asm_vktm_+ASM1
grid 3481 1 0 09:24 ? 00:00:00 asm_gen0_+ASM1
grid 3485 1 0 09:24 ? 00:00:00 asm_diag_+ASM1
grid 3489 1 0 09:24 ? 00:00:00 asm_ping_+ASM1
grid 3493 1 0 09:24 ? 00:00:00 asm_dskm_+ASM1
grid 3497 1 0 09:24 ? 00:00:03 asm_dia0_+ASM1
grid 3501 1 0 09:24 ? 00:00:01 asm_lmon_+ASM1
grid 3505 1 0 09:24 ? 00:00:00 asm_lmd0_+ASM1
grid 3512 1 0 09:24 ? 00:00:01 asm_lms0_+ASM1
grid 3518 1 0 09:24 ? 00:00:00 asm_lmhb_+ASM1
grid 3522 1 0 09:24 ? 00:00:00 asm_mman_+ASM1
grid 3526 1 0 09:24 ? 00:00:00 asm_dbw0_+ASM1
grid 3530 1 0 09:24 ? 00:00:00 asm_lgwr_+ASM1
grid 3534 1 0 09:24 ? 00:00:00 asm_ckpt_+ASM1
grid 3538 1 0 09:24 ? 00:00:00 asm_smon_+ASM1
grid 3542 1 0 09:24 ? 00:00:00 asm_rbal_+ASM1
grid 3546 1 0 09:24 ? 00:00:00 asm_gmon_+ASM1
grid 3550 1 0 09:24 ? 00:00:00 asm_mmon_+ASM1
grid 3554 1 0 09:24 ? 00:00:00 asm_mmnl_+ASM1
grid 3558 1 0 09:24 ? 00:00:00 asm_xdmg_+ASM1
grid 3562 1 0 09:24 ? 00:00:00 asm_lck0_+ASM1
grid 3580 1 0 09:24 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3628 1 0 09:24 ? 00:00:00 oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3637 1 0 09:24 ? 00:00:00 asm_asmb_+ASM1 -------------- the ASM instance's ASMB process
grid 3641 1 0 09:24 ? 00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ----- ASMB connection to +ASM1; syncs storage statistics to CSS
grid 3847 1 0 09:24 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ----- oracleagent process
grid 4296 1 0 09:25 ? 00:00:00 oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ----- ASMB connection to the database instance; syncs storage-related statistics to CSS (for example, when a disk group is added)
grid 6596 30872 0 13:11 pts/4 00:00:00 grep ASM1
grid 8872 1 0 10:25 ? 00:00:00 oracle+ASM1_o000_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
[grid@dm01db01 oraagent_grid]$
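To make the naming pattern in the listing easier to see, here is a minimal Python sketch that separates ASM background processes (asm_<name>_+ASM1) from local bequeath connection processes (oracle+ASM1[_<tag>] (DESCRIPTION=...)). The regular expressions and the classify helper are my own illustration based only on the command names shown above; they are not an Oracle-defined format and may not cover other naming variants.

```python
import re

# Command names copied from the ps listing above.
SAMPLE_COMMANDS = [
    "asm_pmon_+ASM1",
    "asm_asmb_+ASM1",
    "oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))",
    "oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))",
    "oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))",
    "oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))",
]

# Assumed patterns, inferred from the listing only.
BACKGROUND = re.compile(r"^asm_(?P<name>[a-z0-9]+)_(?P<sid>\+\w+)$")
FOREGROUND = re.compile(
    r"^oracle(?P<sid>\+[A-Za-z0-9]+)(?:_(?P<tag>\S+))?\s+\(DESCRIPTION="
)


def classify(command):
    """Return (kind, label, sid) for one ps command string."""
    m = BACKGROUND.match(command)
    if m:
        return ("background", m.group("name"), m.group("sid"))
    m = FOREGROUND.match(command)
    if m:
        # tag is None for a plain "oracle+ASM1 (DESCRIPTION=..." connection
        return ("foreground", m.group("tag"), m.group("sid"))
    return ("other", None, None)


for cmd in SAMPLE_COMMANDS:
    print(classify(cmd), "<-", cmd.split(" ")[0])
```

The tag after the SID (asmb_+asm1, asmb_bbff1, ocr) is what tells the ASMB-related connections apart from ordinary local connections in this listing.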
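We know that the ASMB process is essentially a communication bridge between the database instance and the ASM instance. It is involved in operations that change the physical storage, such as creating, deleting, or resizing files from the database. First, let's look at the order in which these processes start while CRS, ASM, and the database come up: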
ASM alert log:
......
Sun Mar 09 09:24:47 2014
NOTE: [crsd.bin@dm01db01 (TNS V1-V3) 3615] opening OCR file
Starting background process ASMB
Sun Mar 09 09:24:47 2014
ASMB started with pid=27, OS id=3637
Sun Mar 09 09:24:47 2014
NOTE: client +ASM1:+ASM registered, osid 3641, mbr 0x0
Sun Mar 09 09:26:06 2014
NOTE: client bbff1:bbff registered, osid 4296, mbr 0x1
......
Database alert log:
......
Sun Mar 09 09:25:49 2014
SMON started with pid=21, OS id=4272
Sun Mar 09 09:25:50 2014
RECO started with pid=22, OS id=4276
Sun Mar 09 09:25:50 2014
RBAL started with pid=23, OS id=4280
Sun Mar 09 09:25:50 2014
ASMB started with pid=24, OS id=4284
......
The ASMB processes of both the ASM instance and the database instance register themselves with CSS; see ocssd.log:
......
2014-03-09 09:24:47.069: [ CSSD][1081276736]clssgmDestroyProc: cleaning up proc(0x1f7cba50) con(0x2518) skgpid 3628 ospid 3628 with 0 clients, refcount 0 ------- 3628 is the OCR process: oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))
2014-03-09 09:24:47.069: [ CSSD][1081276736]clssgmDiscEndpcl: gipcDestroy 0x2518
2014-03-09 09:24:47.089: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1ef1ef60
2014-03-09 09:24:47.089: [ CSSD][1081276736]clssgmAllocProc: (0x1f7cba50) allocated
2014-03-09 09:24:47.089: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f7cba50 - 1,2,3,4,5
2014-03-09 09:24:47.089: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x2579) proc(0x1f7cba50) pid(3628/3628) version 11:2:1:4, properties: 1,2,3,4,5
2014-03-09 09:24:47.089: [ CSSD][1081276736]clssgmClientConnectMsg: msg flags 0x0000
2014-03-09 09:24:47.487: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1ef1ef60
2014-03-09 09:24:47.487: [ CSSD][1081276736]clssgmAllocProc: (0x1f7ddbd0) allocated
2014-03-09 09:24:47.487: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f7ddbd0 - 1,2,3,4,5
2014-03-09 09:24:47.487: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x25f5) proc(0x1f7ddbd0) pid(3641/3641) version 11:2:1:4, properties: 1,2,3,4,5 --- 3641 is the ASMB connection process, oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
......
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmAllocProc: (0x1f8b6290) allocated
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f8b6290 - 1,2,3,4,5
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x35fc) proc(0x1f8b6290) pid(4284/4284) version 11:2:1:4, properties: 1,2,3,4,5 ---- 4284 is the database's ASMB process
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmClientConnectMsg: msg flags 0x0000
2014-03-09 09:25:50.921: [ CSSD][1081276736]clssgmDeadProc: proc 0x1f8b6290
2014-03-09 09:25:50.921: [ CSSD][1081276736]clssgmDestroyProc: cleaning up proc(0x1f8b6290) con(0x35fc) skgpid 4284 ospid 4284 with 0 clients, refcount 0
2014-03-09 09:25:50.921: [ CSSD][1081276736]clssgmDiscEndpcl: gipcDestroy 0x35fc
2014-03-09 09:25:51.195: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1ef1ef60
2014-03-09 09:25:51.195: [ CSSD][1081276736]clssgmAllocProc: (0x1f8b6290) allocated
2014-03-09 09:25:51.196: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f8b6290 - 1,2,3,4,5
2014-03-09 09:25:51.196: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x3663) proc(0x1f8b6290) pid(4284/4284) version 11:2:1:4, properties: 1,2,3,4,5
2014-03-09 09:25:51.196: [ CSSD][1081276736]clssgmClientConnectMsg: msg flags 0x0000
2014-03-09 09:25:51.216: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1ef1ef60
2014-03-09 09:25:51.216: [ CSSD][1081276736]clssgmAllocProc: (0x1f8cdb50) allocated
2014-03-09 09:25:51.218: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f8cdb50 - 1,2,3,4,5
2014-03-09 09:25:51.218: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x36dd) proc(0x1f8cdb50) pid(4231/4231) version 11:2:1:4, properties: 1,2,3,4,5 ---- 4231 is the database's LMON process
......
2014-03-09 09:26:06.534: [ CSSD][1109334336]clssnmSendingThread: sending status msg to all nodes
2014-03-09 09:26:06.534: [ CSSD][1109334336]clssnmSendingThread: sent 4 status msgs to all nodes
2014-03-09 09:26:06.885: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1ef1ef60
2014-03-09 09:26:06.885: [ CSSD][1081276736]clssgmAllocProc: (0x1f91e650) allocated
2014-03-09 09:26:06.886: [ CSSD][1081276736]clssgmClientConnectMsg: properties of cmProc 0x1f91e650 - 1,2,3,4,5
2014-03-09 09:26:06.886: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x397e) proc(0x1f91e650) pid(4296/4296) version 11:2:1:4, properties: 1,2,3,4,5 ---- oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq
2014-03-09 09:26:06.886: [ CSSD][1081276736]clssgmClientConnectMsg: msg flags 0x0000
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssscSelect: cookie accept request 0x1f91e650
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssscevtypSHRCON: getting client with cmproc 0x1f91e650
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmRegisterClient: proc(33/0x1f91e650), client(1/0x1f7baaf0)
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmJoinGrock: local grock UFG_+ASM1 new client 0x1f7baaf0 with con 0x39b6, requested num 1, flags 0x10100
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmAddGrockMember: adding member to grock UFG_+ASM1
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmAddMember: Adding fencing for member 1, group UFG_+ASM1, death 1, SAGE 0
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmAddMember: member (1/0x1f332350) added. pbsz(108) prsz(108) flags 0x0 to grock (0x1f805240/UFG_+ASM1)
2014-03-09 09:26:06.912: [ CSSD][1081276736]clssgmCommonAddMember: local group grock UFG_+ASM1 member(1/Local) node(1) flags 0x0 0x30
......
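Picking the interesting PIDs out of ocssd.log by eye is tedious, so here is a small Python sketch that extracts the "Connect from ... pid(N/N)" entries. The regular expression is an assumption derived from the log lines quoted above (11.2 format); the connect_events helper is hypothetical and other Clusterware versions may log differently.

```python
import re

# Three "Connect from" entries and one unrelated entry, copied from the excerpt above.
SAMPLE_LOG = """\
2014-03-09 09:24:47.487: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x25f5) proc(0x1f7ddbd0) pid(3641/3641) version 11:2:1:4, properties: 1,2,3,4,5
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x35fc) proc(0x1f8b6290) pid(4284/4284) version 11:2:1:4, properties: 1,2,3,4,5
2014-03-09 09:25:50.663: [ CSSD][1081276736]clssgmClientConnectMsg: msg flags 0x0000
2014-03-09 09:26:06.886: [ CSSD][1081276736]clssgmClientConnectMsg: Connect from con(0x397e) proc(0x1f91e650) pid(4296/4296) version 11:2:1:4, properties: 1,2,3,4,5
"""

# Assumed line layout: timestamp, component tags, then the connect message.
CONNECT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*"
    r"clssgmClientConnectMsg: Connect from con\((?P<con>0x[0-9a-f]+)\) "
    r"proc\((?P<proc>0x[0-9a-f]+)\) pid\((?P<pid>\d+)/\d+\)"
)


def connect_events(text):
    """Return (timestamp, pid, connection handle) for each client connect."""
    events = []
    for line in text.splitlines():
        m = CONNECT.match(line)
        if m:
            events.append((m.group("ts"), int(m.group("pid")), m.group("con")))
    return events


for ts, pid, con in connect_events(SAMPLE_LOG):
    print(ts, pid, con)
```

The PIDs it returns (3641, 4284, 4296) can then be matched against the ps listing and the two alert logs.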
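Here is what ASMB does while the database starts:
1. The ASM instance's ASMB process starts (spid: 3637, asm_asmb_+ASM1).
2. The ASM instance's ASMB process starts a process that connects to the ASM instance (spid: 3641, oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))).
3. The ASM instance's ASMB process registers that connection process (oracle+ASM1_asmb_+asm1) with CSS.
4. When the database starts, it starts its own ASMB process (spid: 4284, ora_asmb_bbff1).
5. The database's ASMB process registers itself with CSS.
6. The ASM instance's ASMB process starts a process that connects to the database instance: 20140309-09:26:05, oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
7. The ASM instance's ASMB process registers this connection process (oracle+ASM1_asmb_bbff1) with CSS.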
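The ordering above comes from reading three logs side by side. As a rough illustration, the following Python sketch merges a few of the entries quoted earlier (ASM alert log, database alert log, ocssd.log) into one timeline. The event descriptions are shortened by me, and build_timeline is a hypothetical helper, not part of any Oracle tool.

```python
from datetime import datetime

ALERT_FMT = "%a %b %d %H:%M:%S %Y"   # e.g. "Sun Mar 09 09:24:47 2014"
CSS_FMT = "%Y-%m-%d %H:%M:%S.%f"     # e.g. "2014-03-09 09:24:47.487"

# (source, timestamp format, timestamp, shortened description)
EVENTS = [
    ("ASM alert", ALERT_FMT, "Sun Mar 09 09:24:47 2014", "ASMB started, OS id=3637"),
    ("ASM alert", ALERT_FMT, "Sun Mar 09 09:24:47 2014", "client +ASM1:+ASM registered, osid 3641"),
    ("ASM alert", ALERT_FMT, "Sun Mar 09 09:26:06 2014", "client bbff1:bbff registered, osid 4296"),
    ("DB alert", ALERT_FMT, "Sun Mar 09 09:25:50 2014", "ASMB started, OS id=4284"),
    ("ocssd.log", CSS_FMT, "2014-03-09 09:24:47.487", "Connect from pid 3641"),
    ("ocssd.log", CSS_FMT, "2014-03-09 09:25:50.663", "Connect from pid 4284"),
    ("ocssd.log", CSS_FMT, "2014-03-09 09:26:06.886", "Connect from pid 4296"),
]


def build_timeline(events):
    """Parse each timestamp with its own format and sort chronologically."""
    parsed = [
        (datetime.strptime(ts, fmt), source, what)
        for source, fmt, ts, what in events
    ]
    # sorted() is stable, so entries with equal timestamps keep their input order
    return sorted(parsed, key=lambda e: e[0])


for when, source, what in build_timeline(EVENTS):
    print(when.strftime("%H:%M:%S.%f")[:-3], source, what)
```

Keep in mind that the alert logs only have one-second resolution, so the relative order of an alert-log entry and an ocssd.log entry within the same second is approximate.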
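Of course, if the ASMB connection to the database fails, a new connection is normally created very quickly and registered with CSS; this can be seen in the CSS log.
My current test environment is an Exadata 11.2.3.2.1 VM. Tracing shows that when a database process performs storage-related operations such as adding or dropping a tablespace, the work actually goes through pipes (usually two pipes per process involved, one for reading and one for writing). I don't know whether the same holds in other ASM environments; I'll test on a regular ASM setup some other time, haha.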
Now let's drop a tablespace and trace it to see what ASMB does:
[root@dm01db01 ~]# ps -ef|grep LOCAL=YES
grid 3580 1 0 09:24 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3628 1 0 09:24 ? 00:00:00 oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3641 1 0 09:24 ? 00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 3847 1 0 09:24 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 11438 1 0 14:20 ? 00:00:00 oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ---- the ASM instance's ASMB connection to the database
oracle 11465 1 0 14:20 ? 00:00:01 oraclebbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ------ oracleagent process
oracle 11650 1 0 14:22 ? 00:00:00 oraclebbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ------ oracleagent process
oracle 11666 1 0 14:22 ? 00:00:00 oraclebbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ------ oracleagent process
oracle 13959 13956 0 14:54 ? 00:00:00 oraclebbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) ------ my process
root 14019 13831 0 14:55 pts/1 00:00:00 grep LOCAL=YES
[root@dm01db01 ~]#
[root@dm01db01 ~]# ps -ef|grep ocss
grid 2881 1 0 09:22 ? 00:00:25 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 14465 13831 0 15:01 pts/1 00:00:00 grep ocss
[root@dm01db01 ~]#
[root@dm01db01 ~]# ps -ef|grep asmb
grid 3637 1 0 09:24 ? 00:00:00 asm_asmb_+ASM1
grid 3641 1 0 09:24 ? 00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11433 1 0 14:20 ? 00:00:00 ora_asmb_bbff1
grid 11438 1 0 14:20 ? 00:00:00 oracle+ASM1_asmb_bbff1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 12228 30240 0 14:29 pts/4 00:00:00 grep asmb
[root@dm01db01 ~]#
As shown above, spid 13959 is my current session's process (its parent is 13956). Before dropping the tablespace, I attach strace to the processes of interest:
strace -fr -o /tmp/11438.log -p 11438
strace -fr -o /tmp/13956.log -p 13956
strace -fr -o /tmp/2881.log -p 2881
SYS@bbff1>drop tablespace lunartest include contents and datafiles;
drop tablespace lunartest include contents and datafiles
*
ERROR at line 1:
ORA-02173: invalid option for DROP TABLESPACE
Elapsed: 00:00:00.08
SYS@bbff1>>drop tablespace lunartest including contents and datafiles;
SP2-0734: unknown command beginning ">drop tabl..." - rest of line ignored.
SYS@bbff1>
SYS@bbff1>drop tablespace lunartest including contents and datafiles;
Tablespace dropped.
Elapsed: 00:00:07.15
SYS@bbff1>
After the tablespace is dropped, I stop the traces and look at the results:
[root@dm01db01 ~]# strace -fr -o /tmp/11438.log -p 11438
Process 11438 attached - interrupt to quit
Process 11438 detached
[root@dm01db01 ~]#
[root@dm01db01 ~]# strace -fr -o /tmp/13956.log -p 13956
Process 13956 attached - interrupt to quit
Process 13956 detached
[root@dm01db01 ~]#
[root@dm01db01 ~]# strace -fr -o /tmp/2881.log -p 2881
Process 2881 attached with 20 threads - interrupt to quit
Process 2881 detached
Process 2885 detached
Process 2888 detached
Process 2889 detached
Process 2890 detached
Process 2891 detached
Process 2902 detached
Process 2903 detached
Process 2924 detached
Process 2925 detached
Process 2926 detached
Process 2927 detached
Process 2930 detached
Process 2934 detached
Process 2940 detached
Process 2941 detached
Process 2942 detached
Process 2944 detached
Process 2948 detached
Process 2949 detached
[root@dm01db01 ~]#
We can see that after the traced process (13956) received the "drop tablespace lunartest includ..." command, it wrote the request to file descriptor 10 under /proc/13956/fd and read the response from file descriptor 11:
13956 0.000203 read(0, "drop tablespace lunartest includ"..., 1024) = 60
13956 8.392710 gettimeofday({1394348248, 583001}, NULL) = 0
13956 0.000332 write(10, "\1S\0\0\6\0\0\0\0\0\21i\t\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\1\0\0"..., 339) = 339
13956 0.002337 read(11, "\0\313\0\0\6\0\0\0\0\0\10\6\0(\37\6\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0"..., 8208) = 203
13956 7.140464 write(1, "\n", 1) = 1
13956 0.000291 lseek(3, 3072, SEEK_SET) = 3072
13956 0.000089 read(3, "\22\0A\0\0\0t\0B\0\0\0\212\0C\0\0\0\240\0D\0\0\0\261\0E\0\0\0\302\0"..., 512) = 512
13956 0.000962 write(1, "Tablespace dropped.", 19) = 19
13956 0.000855 write(1, "\n", 1) = 1
13956 0.000333 write(1, "\n", 1) = 1
13956 0.005740 gettimeofday({1394348255, 734445}, NULL) = 0
13956 0.000224 write(1, "Elapsed: 00:00:07.15\n", 21) = 21
13956 0.000403 write(1, "SYS@bbff1>", 10) = 10
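With strace -r, each line shows the time elapsed since the previous system call began, so the value printed on the line after a call is roughly how long that call (plus any user-space work) took. The following Python sketch applies that reading to four consecutive lines from the trace above. The parsing regex and the approx_durations helper are my own assumptions; the sketch only handles a single-PID trace with complete lines (no "unfinished ..." entries).

```python
import re

# Four consecutive lines copied from the strace output above.
SAMPLE = r'''13956 8.392710 gettimeofday({1394348248, 583001}, NULL) = 0
13956 0.000332 write(10, "\1S\0\0\6\0\0\0\0\0\21i\t\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\1\0\0"..., 339) = 339
13956 0.002337 read(11, "\0\313\0\0\6\0\0\0\0\0\10\6\0(\37\6\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0"..., 8208) = 203
13956 7.140464 write(1, "\n", 1) = 1
'''

# pid, relative timestamp, syscall name, then the raw argument text.
LINE = re.compile(r"^(?P<pid>\d+)\s+(?P<rel>\d+\.\d+)\s+(?P<call>\w+)\((?P<rest>.*)$")
FD = re.compile(r"^(\d+),")  # first argument, when it is a plain integer fd


def approx_durations(text):
    """Pair each call with the relative timestamp of the call that follows it."""
    rows = [m for m in map(LINE.match, text.splitlines()) if m]
    result = []
    for cur, nxt in zip(rows, rows[1:]):
        fd = FD.match(cur.group("rest"))
        result.append(
            (cur.group("call"), int(fd.group(1)) if fd else None, float(nxt.group("rel")))
        )
    return result


for call, fd, seconds in approx_durations(SAMPLE):
    print(f"{call}(fd={fd}) took about {seconds:.6f}s")
```

Read this way, the write to fd 10 returns almost immediately, while the read on fd 11 blocks for about 7.14 seconds, which lines up with the "Elapsed: 00:00:07.15" reported by SQL*Plus.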
Now look at the process's fd (file descriptor) information. Descriptors 10 and 11 are two separate pipes:
[root@dm01db01 fd]# pwd
/proc/13956/fd
[root@dm01db01 fd]# ls -lrt
total 0
lrwx------ 1 oracle oinstall 64 Mar 9 15:01 2 -> /dev/pts/2
lrwx------ 1 oracle oinstall 64 Mar 9 15:02 0 -> /dev/pts/2
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 8 -> /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/mesg/ocius.msb
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 7 -> /proc/13956/fd
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 6 -> /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/mesg/diaus.msb
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 5 -> /u01/app/oracle/product/11.2.0.3/dbhome_1/sqlplus/mesg/cpyus.msb
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 4 -> /u01/app/oracle/product/11.2.0.3/dbhome_1/sqlplus/mesg/sp2us.msb
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 3 -> /u01/app/oracle/product/11.2.0.3/dbhome_1/sqlplus/mesg/sp1us.msb
lr-x------ 1 oracle oinstall 64 Mar 9 15:02 11 -> pipe:[8035210]
l-wx------ 1 oracle oinstall 64 Mar 9 15:02 10 -> pipe:[8035209]
lrwx------ 1 oracle oinstall 64 Mar 9 15:02 1 -> /dev/pts/2
[root@dm01db01 fd]#
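In other words, apart from the feedback written to the terminal, the traced process writes the drop-tablespace request to one pipe (fd 10) and reads the response from another pipe (fd 11). Note that the ps listing shows 13956 as the parent of server process 13959, and its open files include the SQL*Plus message files, so these two pipes are most likely the bequeath (beq) connection between the SQL*Plus client and its server process.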
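The one-pipe-per-direction pattern seen on fds 10 and 11 can be imitated with a toy Python sketch. This is only an illustration of request/response over two unidirectional pipes: serve and round_trip are hypothetical names, the "done: " reply is made up, and nothing here reflects Oracle's actual wire protocol.

```python
import os
import threading


def serve(request_fd, response_fd):
    """Toy peer: read one request from its pipe and send one reply back."""
    request = os.read(request_fd, 1024)
    os.write(response_fd, b"done: " + request)


def round_trip(message):
    # One pipe per direction, mirroring the write-only fd 10 and read-only fd 11.
    req_r, req_w = os.pipe()
    resp_r, resp_w = os.pipe()
    worker = threading.Thread(target=serve, args=(req_r, resp_w))
    worker.start()
    os.write(req_w, message)        # comparable to write(10, ...)
    reply = os.read(resp_r, 8208)   # comparable to read(11, ...); blocks until the reply arrives
    worker.join()
    for fd in (req_r, req_w, resp_r, resp_w):
        os.close(fd)
    return reply


print(round_trip(b"drop tablespace lunartest"))
```

Because a pipe carries data in one direction only, a two-way conversation needs two of them, which is why the fd listing shows one descriptor opened write-only and the other read-only.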