The User Guide lists every metric; here I only cover the ones people usually care about:
CellCLI> list metriccurrent FC_BY_USED, FC_BY_DIRTY
         FC_BY_USED      FLASHCACHE      307,012 MB
         FC_BY_DIRTY     FLASHCACHE      250,320 MB
FC_BY_USED – number of MB cached (total)
FC_BY_DIRTY – number of dirty MB cached (data written only to FlashCache but not to disks)
CellCLI> list metriccurrent GD_BY_FC_DIRTY
         GD_BY_FC_DIRTY     DATA_CD_00_cel14     7,214 MB
         GD_BY_FC_DIRTY     DATA_CD_01_cel14     6,698 MB
         . . .
CellCLI> list metriccurrent CD_BY_FC_DIRTY where metricObjectName=FD_05_cel14
         CD_BY_FC_DIRTY     FD_05_cel14          19,430 MB
GD_BY_FC_DIRTY – number of dirty MB cached for the griddisk
CD_BY_FC_DIRTY – number of dirty MB cached on the flash celldisk
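To make the two cell-wide metrics easier to compare, here is a minimal Python sketch that parses `list metriccurrent` output and computes what share of the used flash cache is dirty. The sample text is copied from the output above; the helper names (`parse_metrics`, `dirty_ratio`) are my own, and the parser assumes each metric line has the form "name, object, value MB".

```python
import re

# Sample text copied from the `list metriccurrent` output shown above.
SAMPLE = """\
FC_BY_USED      FLASHCACHE      307,012 MB
FC_BY_DIRTY     FLASHCACHE      250,320 MB
"""


def parse_metrics(text):
    """Return {(metric_name, object_name): megabytes} for lines ending in 'MB'."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+)\s+(\S+)\s+([\d,]+)\s+MB\s*$", line)
        if m:
            name, obj, value = m.groups()
            # Values are printed with thousands separators, e.g. 307,012.
            metrics[(name, obj)] = int(value.replace(",", ""))
    return metrics


def dirty_ratio(metrics, obj="FLASHCACHE"):
    """Fraction of the used flash cache that holds dirty (not yet destaged) data."""
    used = metrics[("FC_BY_USED", obj)]
    dirty = metrics[("FC_BY_DIRTY", obj)]
    return dirty / used if used else 0.0


metrics = parse_metrics(SAMPLE)
print(f"dirty share of used flash cache: {dirty_ratio(metrics):.1%}")
```

With the numbers above, roughly 81.5% of the cached data exists only in flash and has not yet been written to disk.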
To measure the flash cache workload of an application, you can generally follow this procedure:
1. Reset the flash cache statistics:
CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(resetStats,0,0,0)"
2. Run the application test to generate load.
3. Dump the statistics:
Look at the trace files whose names start with svtrc under …/trace/ in the cellsrv ADR. In my environment, for example:
export CELL_ADR=/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell
CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpStats,0,0,L)"
Here L is any non-negative number, used as the label of the result.
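The reset and dump steps differ only in the action name and the label, so the command strings can be generated rather than typed. The following is a small sketch under that assumption; `flashcache_event` and `stats_run_commands` are hypothetical helper names of mine, and the code only builds the strings shown above without executing anything on a cell.

```python
def flashcache_event(action, arg1=0, arg2=0, arg3=0):
    """Build the CellCLI 'alter cell events' string for a cellsrv_flashcache action."""
    return (
        'alter cell events="immediate '
        f'cellsrv.cellsrv_flashcache({action},{arg1},{arg2},{arg3})"'
    )


def stats_run_commands(label):
    """Return the reset and dump commands for one test run.

    `label` is the non-negative number that tags the dumped statistics.
    """
    if not isinstance(label, int) or label < 0:
        raise ValueError("label must be a non-negative integer")
    return [
        flashcache_event("resetStats"),              # step 1: before the load test
        flashcache_event("dumpStats", 0, 0, label),  # step 3: after the load test
    ]


for command in stats_run_commands(7):
    print(command)
```

Using a different label for each test run makes it easier to tell the dumps apart in the svtrc trace files afterwards.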
To dynamically adjust how often data is re-read from the grid disk:
CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam ('_cell_flashcache_diag_reads_frequency ','20')"
The following shows how to inspect the structure of the flash cache control block.
First, find messages like the following in the cellsrv alert log (the first FlashCache part, ID 504684860, is the one used below):
...
Sun Jun 30 10:24:28 2013
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=67d1e918-7455-4afa-9b6f-7ecb2da11b82 (504684860), size=816MB, cdisk=FD_02_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=d6400f21-abcc-4ea3-a09c-79425619ef36 (1922598212), size=816MB, cdisk=FD_01_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=decfc31a-e1ba-4d63-9161-50f6db206572 (1710125700), size=816MB, cdisk=FD_03_dm01cel01
Caching enabled on FlashCache Part dm01cel01_FLASHCACHE guid=4dec16e9-1559-468d-9cfe-39443ac18ddc (1241609148), size=816MB, cdisk=FD_00_dm01cel01
FlashCache: allowing client IOs (mode=Writeback)
Sun Jun 30 10:24:52 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (2124647276), size=128MB, cdisk=FD_00_dm01cel01
Sun Jun 30 10:24:52 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (885498860), size=128MB, cdisk=FD_03_dm01cel01
Sun Jun 30 10:24:53 2013
Smart Flash Logging enabled on FlashLog dm01cel01_FLASHLOG (1996959724), size=128MB, cdisk=FD_02_dm01cel01
...
Dump the control block structure of this flash cache:
CellCLI> list cell detail
         name:                   dm01cel01
         bbuTempThreshold:       60
         bbuChargeThreshold:     800
         bmcType:                absent
         cellVersion:            OSS_11.2.3.2.1_LINUX.X64_130109
         cpuCount:               2
         diagHistoryDays:        7
         fanCount:               1/1
         fanStatus:              normal
         flashCacheMode:         WriteBack
         id:                     ef56a78d-a9cc-4c04-baac-648a18370eb7
         interconnectCount:      1
         interconnect1:          eth0
         iormBoost:              0.0
         ipaddress1:             192.168.56.11/24
         kernelVersion:          2.6.18-274.el5xen
         makeModel:              Fake hardware
         metricHistoryDays:      7
         offloadEfficiency:      1,000.0
         powerCount:             1/1
         powerStatus:            normal
         releaseVersion:         11.2.3.2.1
         releaseTrackingBug:     14522699
         status:                 online
         temperatureReading:     0.0
         temperatureStatus:      normal
         upTime:                 0 days, 0:09
         cellsrvStatus:          running
         msStatus:               running
         rsStatus:               running

CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpctrlblock, 504684860,0,L)"
Dump sequence #1 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_58.trc
Cell dm01cel01 successfully altered

CellCLI>
The trace content is as follows:
Trace file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_58.trc
ORACLE_HOME = /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109
System name:    Linux
Node name:      dm01cel01
Release:        2.6.18-274.el5xen
Version:        #1 SMP Mon Jul 25 14:24:57 EDT 2011
Machine:        x86_64
CELL SW Version:        OSS_11.2.3.2.1_LINUX.X64_130109

*** 2013-06-30 10:24:25.763
UserThread: LWPID: 2648 userId: 58 kernelId: 58 pthreadID: 0x6641a940
FCC: Control block memory dump for Flash ID 504684860:
2013-06-30 10:33:35.042335*: Dump sequence #1:
2AAB6BD00000 A624DCC2 00000000 54434346 4B4C424C  [..$.....FCCTLBLK]
2AAB6BD00010 64373600 31396531 34372D38 342D3535  [.67d1e918-7455-4]
2AAB6BD00020 2D616661 66366239 6365372D 61643262  [afa-9b6f-7ecb2da]
2AAB6BD00030 38623131 03050032 00000002 00000000  [11b82...........]
2AAB6BD00040 00280000 00000000 33000000 00000000  [..(........3....]
2AAB6BD00050 5B0B8A32 0000013F 1E14E13C 00000002  [2..[?...<.......]
2AAB6BD00060 00000010 00000000 54434346 4B4C424C  [........FCCTLBLK]
2AAB6BD00070 00000000 00000000                    [........]
FCC: Control block formatted dump for Flash ID 504684860:
fid=504684860, head/tail FCCTLBLK/FCCTLBLK, mdSize=2621440, chksum=2787433666, flashSz=855638016, flags=5, cacheline size shift 16, creation_timestamp=1371622050354, version=2, guid=67d1e918-7455-4afa-9b6f-7ecb2da11b82, pers_mode=Writeback, verif_level=crc.
FCC: Control block for flashID=504684860 is located on cdisk='FD_02_dm01cel01' at offset 184549376
FCC: Primary FC metadata for flashID=504684860 is located on cdisk='FD_02_dm01cel01', offset/size: 184614912 1261568
FCC: Shadow FC metadata for flashID=504684860 is located on cdisk='FD_02_dm01cel01', offset/size: 185876480 1261568
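The raw memory dump and the formatted dump describe the same control block, and a few fields can be cross-checked by hand. The Python sketch below does that for four of them. Which hex word belongs to which field is my own inference from the matching values, not a documented layout, and reading "cacheline size shift 16" as a cache line of 1 << 16 bytes is likewise an assumption.

```python
# 32-bit words copied from the memory dump above, as printed in the trace.
CHKSUM_WORD = "A624DCC2"                     # first word at offset 0x00
MDSIZE_WORDS = ("00280000", "00000000")      # (low, high) words at offset 0x40
FLASHSZ_WORDS = ("33000000", "00000000")     # (low, high) words at offset 0x48
TIMESTAMP_WORDS = ("5B0B8A32", "0000013F")   # (low, high) words at offset 0x50

# Values copied from the "Control block formatted dump" line.
FORMATTED = {
    "chksum": 2787433666,
    "mdSize": 2621440,
    "flashSz": 855638016,
    "creation_timestamp": 1371622050354,
}


def u64(low_hex, high_hex):
    """Combine two 32-bit hex words (low word first) into one 64-bit integer."""
    return (int(high_hex, 16) << 32) | int(low_hex, 16)


decoded = {
    "chksum": int(CHKSUM_WORD, 16),
    "mdSize": u64(*MDSIZE_WORDS),
    "flashSz": u64(*FLASHSZ_WORDS),
    "creation_timestamp": u64(*TIMESTAMP_WORDS),
}

for field, expected in FORMATTED.items():
    status = "match" if decoded[field] == expected else "MISMATCH"
    print(f"{field:20s} {decoded[field]:>15d}  {status}")

flash_mb = decoded["flashSz"] // (1024 * 1024)   # size of this FlashCache part in MB
cacheline_bytes = 1 << 16                        # assumed meaning of "shift 16"
print(f"flashSz = {flash_mb} MB, cache line = {cacheline_bytes} bytes")
```

The decoded flashSz works out to 816 MB, which agrees with the size=816MB reported for this FlashCache part in the alert log.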
Check whether a given grid disk is cached, and dump its cache-line header information
First, look at the alert log:
...
Sun Jun 30 10:24:26 2013
CellDisk v0.7 name=CD_disk01_dm01cel01 status=NORMAL confine status=NONE confine reason=CD_GOOD guid=55031178-14e1-49f8-b080-ce4293b5c683 found on dev=/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01
GridDisk name=data_CD_disk01_dm01cel01 guid=6065c05e-8eae-461c-9b43-02b5c46fd6bb (542754140), cached by these FlashCache parts: 504684860
GridDisk name=reco_CD_disk01_dm01cel01 guid=6d40fc07-34e3-4e71-9fd8-a51a98e68769 (709161532), cached by these FlashCache parts: 504684860
Initialization of celldisk CD_disk01_dm01cel01 on /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/disk01 completed.
GridDisk name=data_CD_disk12_dm01cel01 guid=0fb93c36-a5b5-431a-aa7b-025b52f7cbe4 (2820417692), cached by these FlashCache parts: 1241609148
GridDisk name=reco_CD_disk12_dm01cel01 guid=bba5ae9b-dc0f-4515-94f1-2104e8d0bc44 (1611296652), cached by these FlashCache parts: 1241609148
...
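With many grid disks, picking these lines out of the alert log by eye gets tedious. Here is a minimal sketch that extracts the grid disk ID and the caching FlashCache part IDs with a regular expression. The sample lines are copied from the alert log above; the function name `griddisk_cache_map` is my own, and the pattern assumes the message format shown here, with multiple part IDs (if any) separated by spaces or commas.

```python
import re

# Lines copied from the cellsrv alert log excerpt above.
ALERT_SAMPLE = """\
GridDisk name=data_CD_disk01_dm01cel01 guid=6065c05e-8eae-461c-9b43-02b5c46fd6bb (542754140), cached by these FlashCache parts: 504684860
GridDisk name=reco_CD_disk01_dm01cel01 guid=6d40fc07-34e3-4e71-9fd8-a51a98e68769 (709161532), cached by these FlashCache parts: 504684860
GridDisk name=data_CD_disk12_dm01cel01 guid=0fb93c36-a5b5-431a-aa7b-025b52f7cbe4 (2820417692), cached by these FlashCache parts: 1241609148
"""

PATTERN = re.compile(
    r"GridDisk name=(\S+) guid=(\S+) \((\d+)\), "
    r"cached by these FlashCache parts: ([\d ,]+)"
)


def griddisk_cache_map(text):
    """Map grid disk name -> its numeric ID and the FlashCache part IDs caching it."""
    result = {}
    for m in PATTERN.finditer(text):
        name, _guid, disk_id, parts = m.groups()
        flash_ids = [int(p) for p in re.findall(r"\d+", parts)]
        result[name] = {"id": int(disk_id), "flash_ids": flash_ids}
    return result


cache_map = griddisk_cache_map(ALERT_SAMPLE)
for name, info in cache_map.items():
    print(name, info["id"], info["flash_ids"])
```

The numeric ID in parentheses (for example 542754140) is the grid disk ID that the dumpmdchunk event takes as an argument.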
Use the grid disk ID and the grid disk offset to dump the related cache metadata from the flash cache:
SQL> conn lunar/lunar
Connected.
SQL> create table lunartest as select * from dba_objects;

Table created.

SQL> alter table lunartest STORAGE (CELL_FLASH_CACHE keep);

Table altered.

SQL> select object_id from user_objects;

 OBJECT_ID
----------
     17852

SQL> select count(*) from lunartest;

  COUNT(*)
----------
     17580

SQL>

CellCLI> LIST FLASHCACHECONTENT WHERE objectNumber=17852 DETAIL
         cachedKeepSize:         0
         cachedSize:             966656
         dbID:                   3118431096
         dbUniqueName:           BBFF
         hitCount:               0
         missCount:              0
         objectNumber:           17852
         tableSpaceNumber:       4

CellCLI>

SQL> select count(*) from lunartest;

  COUNT(*)
----------
     17580

SQL>

CellCLI> LIST FLASHCACHECONTENT WHERE objectNumber=17852 DETAIL
         cachedKeepSize:         1925120
         cachedSize:             1925120
         dbID:                   3118431096
         dbUniqueName:           BBFF
         hitCount:               25
         hoursToExpiration:      24
         missCount:              2
         objectNumber:           17852
         tableSpaceNumber:       4

CellCLI>
CellCLI> alter cell events="immediate cellsrv.cellsrv_flashcache(dumpmdchunk,1,542754140,1)"
Cell dm01cel01 successfully altered

CellCLI>
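The second LIST FLASHCACHECONTENT output already shows hits for the object, and the hit ratio is easy to derive from it. The following Python sketch parses the "attribute: value" lines of a DETAIL listing and computes hits / (hits + misses). The sample is copied from the second listing above; `parse_detail` and `hit_ratio` are names I made up for illustration.

```python
# Attribute lines copied from the second LIST FLASHCACHECONTENT ... DETAIL output.
DETAIL_SAMPLE = """\
cachedKeepSize:         1925120
cachedSize:             1925120
dbID:                   3118431096
dbUniqueName:           BBFF
hitCount:               25
hoursToExpiration:      24
missCount:              2
objectNumber:           17852
tableSpaceNumber:       4
"""


def parse_detail(text):
    """Parse 'attribute: value' lines into a dict, converting digit-only values to int."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            attrs[key.strip()] = int(value) if value.isdigit() else value
    return attrs


def hit_ratio(attrs):
    """Return hits / (hits + misses), or None if the object has not been read yet."""
    hits = attrs.get("hitCount", 0)
    misses = attrs.get("missCount", 0)
    total = hits + misses
    return hits / total if total else None


attrs = parse_detail(DETAIL_SAMPLE)
print(f"object {attrs['objectNumber']}: hit ratio {hit_ratio(attrs):.1%}")
```

Returning None rather than 0 for an object with no reads keeps "never accessed" distinct from "always missed", which matters for the first listing above where both counters are still 0.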
[root@dm01cel01 trace]# cat svtrc_2312_80.trc
Trace file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/dm01cel01/trace/svtrc_2312_80.trc
ORACLE_HOME = /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109
System name: Linux
Node name: dm01cel01
Release: 2.6.18-274.el5xen
Version: #1 SMP Mon Jul 25 14:24:57 EDT 2011
Machine: x86_64
CELL SW Version: OSS_11.2.3.2.1_LINUX.X64_130109
*** 2013-06-30 10:24:25.608
UserThread: LWPID: 2670 userId: 80 kernelId: 80 pthreadID: 0x74030940
2013-06-30 10:24:26.061175*: For GridDisk data_CD_disk05_dm01cel01 set these caching FlashIDs: 1922598212
2013-06-30 10:24:26.061175*: For GridDisk reco_CD_disk05_dm01cel01 set these caching FlashIDs: 1922598212
2013-06-30 10:24:26.139905*: [CDP] initCDPers – found persdata for guid: fb9bb908-044e-44a1-afa1-2428c065b9bc
dmgType: DMG_UNKNOWN dmgSlot: 255 predFailStat: 0 ioTimeIndex: 0 lastIOCompTime: 1372559066070289 lastIOSubmitTime: 0 histIOLatIndex: 0
CellDisk UUID: fb9bb908-044e-44a1-afa1-2428c065b9bc CellDiskPersObj File offset: 7360
confTransIdx: 0 Current confine state: NONE Health incarnation number: 0
ConfineTransIndex cstate ccause activeForced activeAlertSent inactiveForced inactiveAlertSent asmRespond testsFailed testOutcomeForced noneTime activeTime inactiveTime finalTime
0 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
1 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
2 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
3 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
4 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
5 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
6 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
7 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
8 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
9 NONE CD_GOOD 0 0 0 0 0 0 0 0 0 0 0
2013-06-30 10:25:15.733434*: New info from MS for CD CD_disk09_dm01cel01: diskMediaGroup: DMG_UNKNOWN, disk slot number: 1, predictive failure on disk: 0
No Cache header ID=1, loc=542703616
[root@dm01cel01 trace]#
Perhaps because my environment is a VM, this command did not actually give me a satisfying result.