With ASM NORMAL REDUNDANCY, who performs the mirrored data I/O?

Contact: QQ (5163721)

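Author: Lunar © All rights reserved. [This article may be reposted, but the source address must be credited with a link; otherwise legal liability will be pursued.]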

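A few days ago, some friends were discussing whether, for a NORMAL redundancy disk group in ASM, the mirrored copy of the data is written by the Oracle RDBMS processes or by the ASM processes.
As we know, an ASM NORMAL REDUNDANCY disk group behaves much like RAID 10, that is, mirroring plus striping.

In a traditional architecture, Oracle writes only one copy of the data, and data protection (mirroring) is handled by the storage array or the RAID controller. So in ASM, does the DB also write once and leave ASM to synchronize the mirror?
The test below leads to this conclusion:
The DB processes perform all I/O for application data in the database, including the I/O for the mirror copies. The ASM processes are responsible only for maintaining, and doing I/O on, the metadata (metadata extents).

The test in detail:
First, we create a normal redundancy disk group to hold the database redo logs, for example +REDODG: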

SQL>select GROUP_NUMBER,NAME,SECTOR_SIZE,BLOCK_SIZE,ALLOCATION_UNIT_SIZE,TYPE FROM V$ASM_DISKGROUP where name='REDODG';

GROUP_NUMBER NAME                           SECTOR_SIZE BLOCK_SIZE ALLOCATION_UNIT_SIZE TYPE
------------ ------------------------------ ----------- ---------- -------------------- ------
           6 REDODG                                 512       4096              1048576 NORMAL

SQL>
SQL>col path for a50  
SQL>col library for a15    
SQL>select GROUP_NUMBER,DISK_NUMBER,REDUNDANCY,LIBRARY,NAME,PATH from v$asm_disk WHERE GROUP_NUMBER=6;

GROUP_NUMBER DISK_NUMBER REDUNDA LIBRARY         NAME                           PATH
------------ ----------- ------- --------------- ------------------------------ --------------------------------------------------
           6           1 UNKNOWN System          REDODG_0001                    /dev/mapper/redolun2
           6           0 UNKNOWN System          REDODG_0000                    /dev/mapper/redolun1

SQL>

The failure group information for these two disks is as follows:

SQL>SELECT GROUP_NUMBER,DISK_NUMBER,STATE,REDUNDANCY,LIBRARY,NAME,FAILGROUP,PATH,REPAIR_TIMER FROM V$ASM_DISK WHERE GROUP_NUMBER=6;

GROUP_NUMBER DISK_NUMBER STATE    REDUNDA LIBRARY         NAME                           FAILGROUP
------------ ----------- -------- ------- --------------- ------------------------------ ------------------------------
PATH                                               REPAIR_TIMER
-------------------------------------------------- ------------
           6           1 NORMAL   UNKNOWN System          V5DATA_0001                    V5DATA_0001
/dev/mapper/v5lun2                                            0

           6           0 NORMAL   UNKNOWN System          V5DATA_0000                    V5DATA_0000
/dev/mapper/v5lun1                                            0


SQL>

[oracle@lunardb1 ~]$ ll /dev/mapper/redolun*
brw-rw---- 1 oracle oinstall 253, 8 Jun 16 10:39 /dev/mapper/redolun1
brw-rw---- 1 oracle oinstall 253, 9 Jun 16 10:39 /dev/mapper/redolun2
[oracle@lunardb1 ~]$ 

Next, we created 9 redo log groups on REDODG (all the redo logs of this 10.2.0.4 RAC are placed on it):

[oracle@lunardb1 ~]$ ss

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Jun 16 10:37:49 2015

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options

sys@LUNAR>select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1       4623   52428800          1 YES INACTIVE             902491454 16-JUN-15
         2          1       4621   52428800          1 YES INACTIVE             901886291 16-JUN-15
         3          2       1621   52428800          1 YES INACTIVE             900432674 16-JUN-15
         4          2       1624   52428800          1 NO  CURRENT              902514208 16-JUN-15
         5          1       4624 1073741824          1 YES INACTIVE             902511227 16-JUN-15
         6          1       4625 1073741824          1 NO  CURRENT              903006387 16-JUN-15
         7          1       4622 1073741824          1 YES INACTIVE             901890974 16-JUN-15
         8          2       1622 1073741824          1 YES INACTIVE             901661757 16-JUN-15
         9          2       1623 1073741824          1 YES INACTIVE             901886509 16-JUN-15

9 rows selected.

sys@LUNAR>col member for a70
sys@LUNAR>select * from v$logfile;

    GROUP# STATUS  TYPE    MEMBER                                                                 IS_
---------- ------- ------- ---------------------------------------------------------------------- ---
         2         ONLINE  +REDODG/lunar/onlinelog/group_2.269.855587247                           NO
         1         ONLINE  +REDODG/lunar/onlinelog/group_1.270.855587247                           NO
         3         ONLINE  +REDODG/lunar/onlinelog/group_3.264.855587433                           NO
         4         ONLINE  +REDODG/lunar/onlinelog/group_4.263.855587433                           NO
         5         ONLINE  +REDODG/lunar/onlinelog/group_5.341.855591573                           NO
         6         ONLINE  +REDODG/lunar/onlinelog/group_6.303.855591671                           NO
         7         ONLINE  +REDODG/lunar/onlinelog/group_7.403.855591683                           NO
         8         ONLINE  +REDODG/lunar/onlinelog/redo08.log                                      NO
         9         ONLINE  +REDODG/lunar/onlinelog/redo09.log                                      NO

9 rows selected.

The LGWR process ID of this database instance is 11159:

[oracle@lunardb1 ~]$ ps -ef|grep lgwr|grep lunar
oracle   11159     1  0 Mar03 ?        08:01:25 ora_lgwr_lunar1
[oracle@lunardb1 ~]$ 


sys@lunar>select spid from v$process where PROGRAM like '%LGWR%';

SPID
------------
11159

sys@lunar>

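Now we use strace to trace what this process does during a log switch. If the LGWR process writes to only one device, say /dev/mapper/redolun1 or /dev/mapper/redolun2, then we can go on to trace the ASMB process.

If the LGWR process writes to both devices, that is, I/O is issued to both /dev/mapper/redolun2 and /dev/mapper/redolun1, then we can conclude that the database's LGWR itself performs all the I/O for both the primary extent and the mirror extent.
This is also the point the Oracle documentation has always made: "ASM is responsible for the I/O on the ASM instance's metadata, while the DB performs the I/O on the actual application data."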
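
A trace like the one in this post can be captured with something along the lines of the sketch below. The post does not show the exact strace invocation, so the flags are an assumption inferred from the log format (a PID column and relative timestamps): -f prefixes each line with the PID, -r prints relative timestamps, -o writes to a file, and -p attaches to a running process. The helper name lgwr_strace_cmd is hypothetical; it only prints the command so it can be reviewed before being run by a user allowed to attach to the process.

```shell
# Hypothetical helper: print (not run) a strace command for a given PID.
# The flags are an assumption inferred from the log format in this post.
lgwr_strace_cmd() {
    pid="$1"
    out="$2"
    printf 'strace -f -r -o %s -p %s\n' "$out" "$pid"
}

# Example: build the command for the LGWR PID found above.
lgwr_strace_cmd 11159 /tmp/lgwr_lunar1_strace-1.log
```

While strace is attached, a log switch can be forced from SQL*Plus with ALTER SYSTEM SWITCH LOGFILE, and the output file can be followed with tail -f.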
The trace file is as follows:

First, we see Oracle writing the same content to two file descriptors:
[oracle@lunardb1 ~]$ tail -f /tmp/lgwr_lunar1_strace-1.log
......
11159      0.000078 times(NULL)         = 1336555656
11159      0.000043 pread(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\245K\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159      0.007057 times(NULL)         = 1336555657
11159      0.000081 times(NULL)         = 1336555657
11159      0.000045 pread(16, "\25\302\0\0f\0\0\0\342\334\6\0\377\377\1\4\375-\0\0\3\0\2\0\0\0\0\0\0\0+"..., 16384, 586383360) = 16384
11159      0.000222 times(NULL)         = 1336555657
11159      0.000077 times(NULL)         = 1336555657
11159      0.000059 pwrite(17, "\1\"\0\0\1\0\0\0\27\22\0\0\0\200\246\335\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 1400898048) = 512
11159      0.005443 times(NULL)         = 1336555658
11159      0.000063 times(NULL)         = 1336555658
11159      0.000049 pwrite(16, "\1\"\0\0\1\0\0\0\27\22\0\0\0\200\246\335\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 1400898048) = 512
11159      0.004075 times(NULL)         = 1336555658
11159      0.000098 times(NULL)         = 1336555658
11159      0.000120 pread(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\245K\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159      0.000148 times(NULL)         = 1336555658
11159      0.000068 times(NULL)         = 1336555658
11159      0.000044 pwrite(16, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\255\364\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159      0.000472 times(NULL)         = 1336555658
11159      0.000060 times(NULL)         = 1336555658
11159      0.000052 pwrite(17, "\1\"\0\0\1\0\0\0\26\22\0\0\0\200\255\364\0\0\0\0\0\3 \n-\250\371\232lunar"..., 512, 105444803072) = 512
11159      0.000399 times(NULL)         = 1336555658
11159      0.000075 times(NULL)         = 1336555658
......

The trace above shows clearly that the LGWR process wrote two identical copies of the data in succession, to the devices behind fd 16 and fd 17.
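
Rather than reading the log by eye, the paired writes can also be picked out mechanically. The sketch below is a rough illustration, not part of the original test: the function names pwrite_summary and mirrored_offsets are hypothetical, and the parsing assumes the strace line layout shown in this post (pwrite(fd, "...", size, offset) = result). It lists every size and offset that was written on more than one file descriptor.

```shell
# Hypothetical helper: print "fd size offset" for every pwrite line in a strace log.
# Assumes the line layout shown in this post.
pwrite_summary() {
    awk '
    / pwrite\(/ {
        # fd: the digits right after "pwrite("
        s = $0
        sub(/^.* pwrite\(/, "", s)
        fd = s
        sub(/,.*$/, "", fd)
        # size and offset: the two numbers just before ") = N" at end of line
        if (match($0, /, [0-9]+, [0-9]+\) = [0-9]+$/)) {
            tail = substr($0, RSTART + 2)
            split(tail, a, /[,)]/)
            size = a[1]
            off = a[2]
            gsub(/ /, "", off)
            print fd, size, off
        }
    }' "$1"
}

# Hypothetical helper: print "size offset fd,fd" for writes that hit more than one fd.
mirrored_offsets() {
    pwrite_summary "$1" | awk '
    {
        key = $2 " " $3
        if ((key, $1) in seen) next
        seen[key, $1] = 1
        if (fds[key] == "") fds[key] = $1; else fds[key] = fds[key] "," $1
        n[key]++
    }
    END { for (k in n) if (n[k] > 1) print k, fds[k] }' | sort
}

# Example (path taken from this post):
# mirrored_offsets /tmp/lgwr_lunar1_strace-1.log
```

Matching on size and offset alone does not prove the payloads are identical; the strace excerpt itself shows the same leading bytes in each pair, which is what supports the conclusion here.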
So what are fd 16 and fd 17?

[oracle@lunardb1 fd]$ cd /proc/11159/fd
[oracle@lunardb1 fd]$ ls -lrt
total 0
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 0 -> /dev/null
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 9 -> /u01/oracle/app/product/10.2/db_1/dbs/lkinstlunar1 (deleted)
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 8 -> /u01/oracle/app/admin/lunar/bdump/alert_lunar1.log
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 7 -> /u01/oracle/app/product/10.2/db_1/dbs/hc_lunar1.dat
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 6 -> /u01/oracle/app/admin/lunar/bdump/alert_lunar1.log
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 5 -> /u01/oracle/app/admin/lunar/udump/lunar1_ora_11099.trc
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 3 -> /dev/null
l-wx------ 1 oracle oinstall 64 Jun 13 17:04 2 -> /u01/oracle/app/admin/lunar/bdump/lunar1_lgwr_11159.trc
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 18 -> /u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 17 -> /dev/mapper/redolun2
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 16 -> /dev/mapper/redolun1
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 15 -> socket:[32662]
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 14 -> /u01/oracle/app/product/10.2/db_1/dbs/hc_lunar1.dat
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 13 -> /u01/oracle/app/product/10.2/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 12 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 11 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 10 -> socket:[32659]
lr-x------ 1 oracle oinstall 64 Jun 13 17:04 1 -> /dev/null
[oracle@lunardb1 fd]$ 
[oracle@lunardb1 fd]$ ll 17
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 17 -> /dev/mapper/redolun2
[oracle@lunardb1 fd]$ ll 16
lrwx------ 1 oracle oinstall 64 Jun 13 17:04 16 -> /dev/mapper/redolun1
[oracle@lunardb1 fd]$ 

Here we can see that 16 and 17 are exactly the two disks used by REDODG. In other words, LGWR itself performs the I/O for both the primary extent and the mirror extent.
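
The same lookup can be done for a single descriptor without listing the whole directory. This is a minimal sketch assuming a Linux /proc filesystem and permission to read the target process's fd directory; the function name fd_target is hypothetical.

```shell
# Hypothetical helper: resolve one file descriptor of a running process
# to the file or device it points at (Linux /proc only).
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Example with the values from this post:
# fd_target 11159 16
# fd_target 11159 17
```

Going by the listing above, these two calls would correspond to /dev/mapper/redolun1 and /dev/mapper/redolun2.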
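At this point the picture is clear, and we can infer that the I/O of DBWR and the other database processes is likewise performed by the DB's own processes, while ASM handles only the I/O and maintenance of the metadata.
Readers who are interested can trace those processes themselves.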
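The trace also shows that Oracle periodically synchronizes the control file information using AIO (io_submit, io_getevents, and so on), again writing to both devices, 16 and 17.
It also shows LGWR notifying the ARCH process to archive and then writing to alert.log: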

......
11159      0.000050 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159      0.000127 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159      0.000545 times(NULL)         = 1336555658
11159      0.000053 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159      0.000048 times(NULL)         = 1336555658
11159      0.000075 times(NULL)         = 1336555658
11159      0.000068 pread(17, "\25\302\0\0\t\3\0\0\376\326\6\0\377\377\1\4\311N\0\0\4\2\2\0\337\7\0\0\0\0\0\0"..., 16384, 591937536) = 16384
11159      0.000230 times(NULL)         = 1336555658
11159      0.000086 times(NULL)         = 1336555658
11159      0.000045 pread(17, "\25\302\0\0\253\1\0\0\t\251\n\0\377\377\1\0041\330\0\0\270\35\2264\1\0\0\0\363\27\0\0"..., 16384, 588038144) = 16384
11159      0.005841 times(NULL)         = 1336555659
11159      0.000094 times(NULL)         = 1336555659
11159      0.000046 pread(17, "\25\302\0\0.\0\0\0)\251\n\0\377\377\1\4\356\231\0\0\0\220\1\0\24\22\0\0\2\0\0\0"..., 16384, 587300864) = 16384
11159      0.000195 times(NULL)         = 1336555659
11159      0.000076 times(NULL)         = 1336555659
11159      0.000099 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159      0.000164 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159      0.000329 times(NULL)         = 1336555659
11159      0.000050 times(NULL)         = 1336555659
11159      0.000065 times(NULL)         = 1336555659
11159      0.000045 pread(17, "\25\302\0\0,\0\0\0\27\251\n\0\377\377\1\4\310Z\0\0\17\0\0\0-\246\3465\0\0\375>"..., 16384, 587268096) = 16384
11159      0.000221 times(NULL)         = 1336555659
11159      0.000098 times(NULL)         = 1336555659
11159      0.000054 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159      0.000121 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159      0.000379 times(NULL)         = 1336555659
11159      0.000048 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159      0.000047 times(NULL)         = 1336555659
11159      0.000077 times(NULL)         = 1336555659
11159      0.000053 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159      0.000108 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159      0.000425 times(NULL)         = 1336555659
11159      0.000038 times(NULL)         = 1336555659
11159      0.000073 times(NULL)         = 1336555659
11159      0.000050 io_submit(46982646722560, 2, {{0x2abafffaa3c8, 0, 1, 0, 16}, {0x2abafffaa6d8, 0, 1, 0, 17}}) = 2
11159      0.000114 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 1
11159      0.000421 times(NULL)         = 1336555659
11159      0.000041 io_getevents(46982646722560, 1, 1023, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}}, {600, 0}) = 1
11159      0.000047 times(NULL)         = 1336555659
11159      0.000076 times(NULL)         = 1336555659
11159      0.000054 io_submit(46982646722560, 2, {{0x2abafffaa6d8, 0, 1, 0, 16}, {0x2abafffaa3c8, 0, 1, 0, 17}}) = 2
11159      0.000128 io_getevents(46982646722560, 1, 1024, {{0x2abafffaa6d8, 0x2abafffaa6d8, 16384, 0}, {0x2abafffaa3c8, 0x2abafffaa3c8, 16384, 0}}, {600, 0}) = 2
11159      0.000318 times(NULL)         = 1336555659
11159      0.000038 times(NULL)         = 1336555659
11159      0.000060 times(NULL)         = 1336555659
11159      0.000044 pread(16, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\16\243\0\0\0\0\0\0\0\3 \n-\250\371\232"..., 16384, 581976064) = 16384
11159      0.000244 times(NULL)         = 1336555660
11159      0.000067 times(NULL)         = 1336555660
11159      0.000117 times(NULL)         = 1336555660
11159      0.000044 times(NULL)         = 1336555660
11159      0.000037 times(NULL)         = 1336555660
11159      0.000343 times(NULL)         = 1336555660
11159      0.000065 semctl(720901, 51, SETVAL, 0x7fff00000001) = 0
11159      0.000081 times(NULL)         = 1336555660
11159      0.000053 pread(16, "\25\302\0\0f\0\0\0\342\334\6\0\377\377\1\4\375-\0\0\3\0\2\0\0\0\0\0\0\0+V"..., 16384, 586383360) = 16384
11159      0.000234 times(NULL)         = 1336555660
11159      0.000062 times(NULL)         = 1336555660
11159      0.000081 semctl(720901, 18, SETVAL, 0x2abb00000001) = 0
11159      0.000062 semctl(720901, 19, SETVAL, 0x2abb00000001) = 0
11159      0.000123 semctl(720901, 20, SETVAL, 0x2abb00000001) = 0
11159      0.000251 open("/proc/11356/stat", O_RDONLY) = 19
11159      0.000113 read(19, "11356 (oracle) S 1 11356 11356 0"..., 999) = 249
11159      0.000118 close(19)           = 0
11159      0.000120 semctl(720901, 36, SETVAL, 0x2abb00000001) = 0
11159      0.000239 close(8)            = 0
11159      0.000044 open("/u01/oracle/app/admin/lunar/bdump/alert_lunar1.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 8
11159      0.000069 writev(8, [{"Tue Jun 16 14:47:51 2015\n", 25}, {"Thread 1 advanced to log sequenc"..., 52}, {"\n", 1}], 3) = 78
11159      0.000075 times(NULL)         = 1336555660
11159      0.000043 times(NULL)         = 1336555660
11159      0.000053 close(8)            = 0
11159      0.000053 open("/u01/oracle/app/admin/lunar/bdump/alert_lunar1.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 8
11159      0.000057 writev(8, [{"  Current log# 2 seq# 4631 mem# "..., 79}, {"\n", 1}], 2) = 80
11159      0.000061 times(NULL)         = 1336555660
11159      0.000043 times(NULL)         = 1336555660
11159      0.000043 semtimedop(720901, 0x7fff585eeef0, 1, {1, 960000000}) = 0
11159      0.105071 times(NULL)         = 1336555670
11159      0.000058 times(NULL)         = 1336555670
11159      0.000102 times(NULL)         = 1336555670
......
[oracle@lunardb1 ~]$ 
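
To see which descriptors the asynchronous requests in a trace like this one target, the io_submit lines can be broken down per request. The sketch below is an illustration under two assumptions: that the strace version used here prints each iocb as {data, key, opcode, reqprio, fd}, and that opcode 1 is IOCB_CMD_PWRITE as defined by the Linux AIO ABI. The function name io_submit_targets is hypothetical.

```shell
# Hypothetical helper: print the opcode and fd of every iocb in io_submit lines.
# Assumes strace prints each iocb as {data, key, opcode, reqprio, fd}.
io_submit_targets() {
    awk '
    / io_submit\(/ {
        s = $0
        while (match(s, /[{]0x[0-9a-f]+, [0-9]+, [0-9]+, [0-9]+, [0-9]+[}]/)) {
            iocb = substr(s, RSTART + 1, RLENGTH - 2)   # strip the braces
            split(iocb, f, ", ")
            print "opcode=" f[3], "fd=" f[5]
            s = substr(s, RSTART + RLENGTH)
        }
    }' "$1"
}

# Example (path taken from this post), counting requests per opcode and fd:
# io_submit_targets /tmp/lgwr_lunar1_strace-1.log | sort | uniq -c
```

Under those assumptions, each io_submit call in the excerpt above submits one write to fd 16 and one to fd 17, which matches the mirrored pattern seen with pwrite.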

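At this point we can firmly conclude that redundancy in ASM is handled in two parts:
1. Redundancy of the actual application data in the database: both the primary extent and the mirror extent are written by the database itself.
2. Mirroring of the ASM metadata is done by the ASM processes themselves.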
