Lunar's Oracle Lab - Every encounter in this world is a reunion after a long separation

Writeback and Writethrough on Exadata, Part 2: SSDs

Posted on May 19, 2014 by Lunar

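Before looking at how Flash Cards are used on Exadata, I first gave myself a primer on SSDs. Here is what I learned:

An SSD (Solid State Disk, or Solid State Drive) is a drive built from an array of solid-state electronic storage chips. It consists of a control unit and a storage unit (FLASH chips or DRAM chips), and better models add a cache chip. Its interface specification and definition, its functions, and its usage are identical to those of an ordinary hard disk, and its form factor and dimensions match as well.

As for the timeline: in 1970 StorageTek (Sun StorageTek) developed the first solid-state drive, and in 1989 the world's first solid-state disk product appeared.

SSD chips tolerate a wide range of operating temperatures: 0 to 70℃ for commercial-grade products and -40 to 85℃ for industrial-grade products. These ranges far exceed the operating specifications of hard disks, which is why we often say that SSDs suit more environments...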


Posted in FAQ, Architecture | Tagged Exadata, exadata ssd, ssd

Writeback and Writethrough on Exadata, Part 1: Hard Disks

Posted on May 19, 2014 by Lunar

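I have previously written several posts about disk management on Exadata:

Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log
Exadata Automatic Disk Management, Part 2: Disk-Related Concepts on the Cell
Exadata Automatic Disk Management, Part 3: Operating Rules of Automatic Disk Management

How to Interpret writethrough/writeback Mode Changes or Controller Battery Charge/Discharge Messages on Exadata Cell Nodes

The Procedure for Replacing a Hard Disk on Exadata, Explained

Today I happened to see the alert messages below. They concern the charge/discharge (learn) cycle of the disk controller BBU on Exadata; for a detailed explanation see "How to Interpret writethrough/writeback Mode Changes or Controller Battery Charge/Discharge Messages on Exadata Cell Nodes":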

dm01cel01: 25_1 2014-01-17T02:00:52+08:00 info "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours. Note that many learn cycles do not require entering WriteThrough caching mode. When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent. Battery Serial Number : 13718 Battery Type : iBBU08 Battery Temperature : 42 C Full Charge Capacity : 1345 mAh Relative Charge : 100 % Ambient Temperature : 23 C"

dm01cel01: 25_2 2014-01-17T07:34:12+08:00 clear "All disk drives are in WriteBack caching mode. Battery Serial Number : 13718 Battery Type : iBBU08 Battery Temperature : 46 C Full Charge Capacity : 1341 mAh Relative Charge : 51 % Ambient Temperature : 23 C"
dm01cel01: 26 2014-01-20T10:49:03+08:00 info "This is a test trap"
dm01cel01: 27_1 2014-03-01T12:27:00+08:00 critical "Cell configuration check discovered the following problems: Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Checking DNS server on 0.48.0.10 : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."

dm01cel01: 27_2 2014-03-02T12:25:18+08:00 clear "The cell configuration check was successful."
dm01cel01: 28_1 2014-03-08T12:26:54+08:00 critical "Cell configuration check discovered the following problems: Check Exadata configuration via ipconf utility Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf Checking DNS server on 10.48.0.10 : FAILED Error. Overall status of verification of Exadata configuration file: FAILED [INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] You may ignore this alert, if the NTP or DNS servers are valid and available. [INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations."

dm01cel01: 28_2 2014-03-09T12:25:19+08:00 clear "The cell configuration check was successful."
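
Alert 25_1 says a learn cycle "runs for approximately 1 to 12 hours". As a small sanity check, the sketch below measures the time between alert 25_1 and its clearing alert 25_2, using the two timestamps copied from the log. Treating that interval as the length of the learn cycle is my own reading of the alerts, not something the log states.

```python
from datetime import datetime

# Timestamps copied from alert 25_1 (learn cycle announced)
# and alert 25_2 (all drives back in WriteBack mode).
started = datetime.fromisoformat("2014-01-17T02:00:52+08:00")
cleared = datetime.fromisoformat("2014-01-17T07:34:12+08:00")

elapsed = cleared - started
hours = elapsed.total_seconds() / 3600

# Compare with the "approximately 1 to 12 hours" window quoted in the alert.
within_window = 1 <= hours <= 12
print(elapsed, f"({hours:.2f} h)", "within window:", within_window)
```

The interval comes out at a little over five and a half hours, comfortably inside the window the alert describes.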

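This reminded me of how the "Writeback and Writethrough" feature of the Flash Cards on Exadata has evolved, and of why the various vendors' database appliances are architected the way they are today (as far as I know, most of the larger third-party Oracle service providers already have an appliance, those without one are developing one, and only a few that currently claim to do software only have not built one, O(∩_∩)O haha~). I suddenly wanted to learn more about the history of hard disks and SSDs, so I started educating myself...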


Posted in FAQ, Architecture | Tagged hdd, ssd

Oracle's Support for Virtualization Products, Part 1: VMware

Posted on May 12, 2014 by Lunar

Support Position for Oracle Products Running on VMWare Virtualized Environments (Doc ID 249212.1)

Purpose
-------
Explain to customers how Oracle supports our products when running on VMware

Scope & Application
-------------------
For Customers running Oracle products on VMware virtualized environments.
No limitation on use or distribution.

Support Status for VMware Virtualized Environments
--------------------------------------------------
Oracle has not certified any of its products on VMware virtualized
environments. Oracle Support will assist customers running Oracle products
on VMware in the following manner: Oracle will only provide
support for issues that either are known to occur on the native OS, or
can be demonstrated not to be as a result of running on VMware.

If a problem is a known Oracle issue, Oracle support will recommend the
appropriate solution on the native OS. If that solution does not work in
the VMware virtualized environment, the customer will be referred to VMware
for support. When the customer can demonstrate that the Oracle solution
does not work when running on the native OS, Oracle will resume support,
including logging a bug with Oracle Development for investigation if required.

If the problem is determined not to be a known Oracle issue, we will refer
the customer to VMware for support. When the customer can demonstrate
that the issue occurs when running on the native OS, Oracle will resume
support, including logging a bug with Oracle Development for investigation
if required.

NOTE: Oracle has not certified any of its products on VMware. For Oracle RAC, Oracle will only
accept Service Requests as described in this note on Oracle RAC 11.2.0.2 and later
releases.

Posted in Virtualization | Tagged oracle vmware, vmware

Does bbed produce CR blocks?

Posted on May 11, 2014 by Lunar

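The conclusion first: bbed does not produce CR blocks. See the test results below.

I am not sure my test is rigorous or complete, so if your tests lead to a different conclusion, I would be glad to compare notes, O(∩_∩)O haha~

Without flushing the buffer cache, here is what we see: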

Start dump data blocks tsn: 4 file#:4 minblk 131 maxblk 131
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777347
BH (0x64bfb2f8) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x64bb6000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x617e2cf8,0x83e39ac0] lru: [0x64bfc480,0x77bee8d0]
  obj-flags: object_ckpt_list
  ckptq: [0x64fd9908,0x83f48650] fileq: [0x83f486d0,0x83f486d0] objq: [0x7c05dde8,0x7c05dde8] objaq: [0x7c05ddc8,0x64bfc4b8]
  st: XCURRENT md: NULL fpin: 'kdswh01: kdstgr' tch: 2
  flags: buffer_dirty redo_since_read
  LRBA: [0x5e.7af.0] LSCN: [0x0.af3be] HSCN: [0x0.af3c4] HSUB: [1]
BH (0x617e2c48) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x61524000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x627d8588,0x64bfb3a8] lru: [0x83f44af8,0x763f2a20]
  lru-flags: moved_to_tail
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: CR md: NULL fpin: 'kdswh01: kdstgr' tch: 2
  cr: [scn: 0x0.af3c3],[xid: 0x0.0.0],[uba: 0x0.0.0],[cls: 0x0.af3c3],[sfl: 0x0],[lc: 0x0.af3be]
  flags: redo_since_read
BH (0x627d84d8) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x6240a000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x83e39ac0,0x617e2cf8] lru: [0x613fa210,0x737e16a0]
  lru-flags: moved_to_tail on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh01: kdstgr' tch: 0 lfb: 33
  flags:

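The information above corresponds closely to the contents of X$BH.

Here, block 131 of file 4 belongs to the object with object_id 18222 (st: XCURRENT, i.e. the current block). It was prefetched as a class 1 block (class 1 means a data block; class 8 would be a 1st level bmb, class 10 a 3rd level bitmap block, and so on).
The block is dirty (buffer_dirty). In addition, there is a CR copy of the block at the tail of the LRU list, and one more copy on the auxiliary list (state FREE, meaning it matches the block in the data file).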

The LRU_FLAG values are explained as follows:
KCBBHLDF 0x01 8.1 LRU Dump Flag used in debug print routine
KCBBHLMT 0x02 8.1 moved to tail of lru (for extended stats)
KCBBHLAL 0x04 8.1 on auxiliary list
KCBBHLHB 0x08 8.1 hot buffer – not in cold portion of lru

lru_flag can be queried with the following statement:
select ba,dbablk,lru_flag from x$bh where dbablk=131;
Note:
X$BH.lru_flag = 2: moved_to_tail
X$BH.lru_flag = 8: hot_buffer
X$BH.lru_flag = 4: on_auxiliary_list
X$BH.lru_flag = 6: moved_to_tail on_auxiliary_list
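
The values listed above are bit flags, which is why 6 reads as "moved_to_tail on_auxiliary_list" (2 + 4). A minimal sketch of the decoding, assuming only the four bit values from the LRU_FLAG table; the function name and the label for bit 0x01 are my own, since the dumps in this post never print that bit.

```python
# Bit values taken from the LRU_FLAG table in the post.
LRU_FLAG_BITS = {
    0x01: "lru_dump_flag",      # KCBBHLDF (label is hypothetical)
    0x02: "moved_to_tail",      # KCBBHLMT
    0x04: "on_auxiliary_list",  # KCBBHLAL
    0x08: "hot_buffer",         # KCBBHLHB
}

def decode_lru_flag(value):
    """Return the names of all bits set in an X$BH.lru_flag value."""
    return [name for bit, name in sorted(LRU_FLAG_BITS.items()) if value & bit]

for flag in (2, 4, 6, 8):
    print(flag, " ".join(decode_lru_flag(flag)))
```

Decoding by bit also covers combinations that are not listed explicitly, such as a buffer that is both hot and moved to the tail.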

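Similarly, the statement below returns the hash chain values of the block. For example, hash: [0x83e39ac0,0x617e2cf8] means that the hash value of the next BH is 0x83e39ac0 and that of the previous BH is 0x617e2cf8:
select ba,dbablk,nxt_hash,prv_hash from x$bh where dbablk=131;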

If I flush the buffer cache and then read the block with bbed, all of the buffers are on the auxiliary list and none is XCURRENT any more; they are all FREE:

  
  Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=16777347
BH (0x64bfb2f8) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x64bb6000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x617e2cf8,0x83e39ac0] lru: [0x64fd9530,0x617e2d30]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh01: kdstgr' tch: 0 lfb: 33
  flags:
BH (0x617e2c48) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x61524000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x627d8588,0x64bfb3a8] lru: [0x64bfb3e0,0x64fda700]
  lru-flags: moved_to_tail on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh01: kdstgr' tch: 0 lfb: 33
  flags:
BH (0x627d84d8) file#: 4 rdba: 0x01000083 (4/131) class: 1 ba: 0x6240a000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0
  dbwrid: 0 obj: 18222 objn: 18222 tsn: 4 afn: 4 hint: f
  hash: [0x83e39ac0,0x617e2cf8] lru: [0x613fa210,0x737e16a0]
  lru-flags: moved_to_tail on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh01: kdstgr' tch: 0 lfb: 33
  flags:

While we are here, let's look a little further:

  
Block dump from disk:
buffer tsn: 4 rdba: 0x01000083 (4/131)
scn: 0x0000.000af3c4 seq: 0x01 flg: 0x04 tail: 0xf3c40601
frmt: 0x02 chkval: 0x044f type: 0x06=trans data  
Hex dump of block: st=0, typ_found=1
.........
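
The tail value in the disk dump above can be rebuilt from three header fields: the low-order two bytes of the SCN base, the block type, and the sequence number. The sketch below shows that composition with the values from this dump; the function name is my own.

```python
def block_tail(scn_base, block_type, seq):
    """Compose the 4-byte tail check: low 2 bytes of SCN base, type, seq."""
    return ((scn_base & 0xFFFF) << 16) | (block_type << 8) | seq

# Values from the dump: scn: 0x0000.000af3c4, type: 0x06, seq: 0x01
tail = block_tail(0x000AF3C4, 0x06, 0x01)
print(f"0x{tail:08x}")
```

The result can be compared with the tail: field printed in the dump header line.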

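The type, frmt_kcbh and other fields shown here correspond to the first line of the trace dump further below (that is, the block header information):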

 
BBED> p kcbh  
struct kcbh, 20 bytes                       @0       
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x01000083
   ub4 bas_kcbh                             @8        0x000a5a05
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x04 (KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x467e
   ub2 spare3_kcbh                          @18       0x0000

BBED> dump count 20   
 File: /stage/travel/users01.dbf (4)
 Block: 131              Offsets:    0 to   19           Dba:0x01000083
------------------------------------------------------------------------
 06a20000 83000001 055a0a00 00000104 7e460000 

 <32 bytes per line>

BBED>
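
To make the mapping between "p kcbh" and "dump count 20" explicit, here is a minimal sketch that unpacks the same 20 bytes in Python. It assumes a little-endian platform, which is what the byte order in this dump shows, and the short field names are my own abbreviations of the *_kcbh names.

```python
import struct

# The 20 bytes printed by "dump count 20" in the BBED session above.
raw = bytes.fromhex("06a20000 83000001 055a0a00 00000104 7e460000")

# Field order follows "p kcbh": four ub1, two ub4, one ub2, two ub1, two ub2.
# "<" selects little-endian with no padding, so the total is exactly 20 bytes.
FIELDS = ("type", "frmt", "spare1", "spare2", "rdba",
          "bas", "wrp", "seq", "flg", "chkval", "spare3")
kcbh = dict(zip(FIELDS, struct.unpack("<BBBBIIHBBHH", raw)))

# An rdba packs a 10-bit relative file number above a 22-bit block number.
file_no = kcbh["rdba"] >> 22
block_no = kcbh["rdba"] & 0x3FFFFF

for name in FIELDS:
    print(f"{name:8s} 0x{kcbh[name]:x}")
print("file", file_no, "block", block_no)
```

Each decoded value can be checked against the corresponding line of the "p kcbh" output, and the file/block pair against the (4/131) shown in the dumps.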

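The dump below is the in-memory image of the block. This is really handy: you no longer need to run "dd|od -x" as before just to see what a block looks like at the OS level, O(∩_∩)O haha~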

 
Dump of memory from 0x00007FFB32AFEA00 to 0x00007FFB32B00A00
7FFB32AFEA00 0000A206 01000083 000AF3C4 04010000  [................]     
Here A2 is frmt_kcbh, 06 is type_kcbh, 01000083 is rdba_kcbh, 04 is flg_kcbh, 01 is seq_kcbh, and so on for the rest.
7FFB32AFEA10 0000044F 00000001 0000472E 000A59FD  [O........G...Y..]     
7FFB32AFEA20 00000000 00320003 01000080 0000FFFF  [......2.........]
7FFB32AFEA30 00000000 00000000 00000000 00008000  [................]
7FFB32AFEA40 000A59FD 000D0001 000001EC 00C0025E  [.Y..........^...]
7FFB32AFEA50 0026004D 0013000A 00000000 00000000  [M.&.............]
7FFB32AFEA60 00000000 00000000 00000000 00000000  [................]
7FFB32AFEA70 00000000 00000000 00000000 00580100  [..............X.]
7FFB32AFEA80 00C2FFFF 037001C8 00000383 1F330058  [......p.....X.3.]
7FFB32AFEA90 039803E5 03491E4B 02AF02FC 02150262  [....K.I.....b...]
7FFB32AFEAA0 1C0D01C8 1B741BC0 1ACF1B1E 1A331A81  [......t.......3.]
7FFB32AFEAB0 199719E6 18FF194B 186118B0 17C81815  [....K.....a.....]
7FFB32AFEAC0 172D1779 168B16DA 15E81636 154C159A  [y.-.....6.....L.]
7FFB32AFEAD0 14AF14FE 140C1464 136C13BB 12CE131A  [....d.....l.....]
7FFB32AFEAE0 12331280 119811E6 10FD1149 105F10AF  [..3.....I....._.]
7FFB32AFEAF0 0FC61013 0F280F77 0E8A0ED9 0DED0E3C  [....w.(.....<...]
7FFB32AFEB00 0D400D91 0C9E0CF0 0BFD0C4C 0B5B0BAC  [..@.....L.....[.]
7FFB32AFEB10 0AC30B0F 0A230A76 097F09D2 08E0092C  [....v.#.....,...]
7FFB32AFEB20 083C0891 07A007EE 07060752 066A06B8  [..<.....R.....j.]
7FFB32AFEB30 05C8061A 05230576 048104CE 00000432  [....v.#.....2...]
7FFB32AFEB40 00000000 00000000 00000000 00000000  [................]
................
7FFB32B00990 32302D33 3A38302D 303A3132 35323A30  [3-02-08:21:00:25]
7FFB32B009A0 4C415605 4E014449 4E014E01 2C05C102  [.VALID.N.N.N...,]
7FFB32B009B0 53030E02 4C055359 52414E55 15C102FF  [...SYS.LUNAR....]
7FFB32B009C0 0503C102 4C424154 71780745 01160802  [....TABLE.xq....]
7FFB32B009D0 7178071A 01160802 3032131A 302D3331  [..xq......2013-0]
7FFB32B009E0 38302D32 3A31323A 323A3030 41560535  [2-08:21:00:25.VA]
7FFB32B009F0 0144494C 014E014E 02C1024E F3C40601  [LID.N.N.N.......]

Posted in Internal

Limits on ASM Disks and Disk Groups

Posted on May 9, 2014 by Lunar

The limits on ASM disks and disk groups in 11.2 are as follows:

Oracle ASM has the following limits on the number of disk groups, disks, and files:

63 disk groups in a storage system
10,000 Oracle ASM disks in a storage system
1 million files for each disk group

Without any Oracle Exadata Storage, Oracle ASM has these storage limits:

2 terabytes (TB) maximum storage for each Oracle ASM disk
20 petabytes (PB) maximum for the storage system

With all Oracle Exadata Storage, Oracle ASM has these storage limits:

4 PB maximum storage for each Oracle ASM disk (has anyone actually tested this? No Exadata release so far has used disks larger than 2 TB... and even large disks are carved into smaller ones...)
40 exabytes (EB) maximum for the storage system

The maximum size limit of a disk group equals the maximum disk size multiplied by the maximum number of disks in a disk group (10,000).

The maximum number of disks across all disk groups is 10,000. The 10,000 disks can be in one disk group or distributed across a maximum of 63 disk groups. This is a limitation on the number of Oracle ASM disks, not necessarily the number of spindles. A storage array could group multiple spindles into a LUN that is used as a single Oracle ASM disk. However Oracle ASM is currently limited to 2 TB in a single disk unless using Oracle Exadata storage.

File size limits are dependent on the value of the disk group compatibility attributes. Oracle ASM supports file sizes greater than 128 TB in any redundancy mode when the COMPATIBLE.RDBMS disk group attribute is set greater than 10.1.

If COMPATIBLE.RDBMS is set to 10.1, the file size limits are less. For example, with COMPATIBLE.RDBMS equal to 10.1 and the AU size equal to 1 MB, Oracle ASM file size limits are:

External redundancy: 16 TB
Normal redundancy: 5.8 TB
High redundancy: 3.9 TB

Note:

Oracle Database supports data file sizes up to 128 TB depending on the file system. In addition, Oracle Database has a file size limit that is dependent on the DB_BLOCK_SIZE initialization parameter.

The limits on ASM disks and disk groups in 12.1 are as follows:

Oracle ASM has the following limits on the number of disk groups, disks, and files:

511 disk groups in a storage system
10,000 Oracle ASM disks in a storage system
1 million files for each disk group
Without any Oracle Exadata Storage, Oracle ASM has the following storage limits if the COMPATIBLE.ASM disk group attribute is set to less than 12.1:

2 terabytes (TB) maximum storage for each Oracle ASM disk
20 petabytes (PB) maximum for the storage system
Without any Oracle Exadata Storage, Oracle ASM has the following storage limits if the COMPATIBLE.ASM disk group attribute is set to 12.1 or greater:

4 PB maximum storage for each Oracle ASM disk with the allocation unit (AU) size equal to 1 MB

8 PB maximum storage for each Oracle ASM disk with the AU size equal to 2 MB

16 PB maximum storage for each Oracle ASM disk with the AU size equal to 4 MB

32 PB maximum storage for each Oracle ASM disk with the AU size equal to 8 MB

320 exabytes (EB) maximum for the storage system

With all Oracle Exadata Storage, Oracle ASM has the following storage limits:

4 PB maximum storage for each Oracle ASM disk with the AU size equal to 1 MB

8 PB maximum storage for each Oracle ASM disk with the AU size equal to 2 MB

16 PB maximum storage for each Oracle ASM disk with the AU size equal to 4 MB

32 PB maximum storage for each Oracle ASM disk with the AU size equal to 8 MB

320 EB maximum for the storage system

The maximum size limit of a disk group equals the maximum disk size multiplied by the maximum number of disks in a disk group (10,000).
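
The rule quoted above (maximum disk size multiplied by 10,000 disks) explains the storage-system totals in both lists. A small worked sketch follows; it assumes the documentation steps between TB, PB and EB in factors of 1000, which is my assumption and is what makes the round numbers line up.

```python
MAX_DISKS = 10_000              # maximum number of ASM disks, from the text
UNITS = ["TB", "PB", "EB"]

def max_storage(disk_size, unit):
    """Multiply the per-disk limit by 10,000, stepping up units by 1000."""
    size, idx = disk_size * MAX_DISKS, UNITS.index(unit)
    while size >= 1000 and idx < len(UNITS) - 1:
        size, idx = size / 1000, idx + 1
    return size, UNITS[idx]

# Per-disk limits quoted in the 11.2 and 12.1 lists.
for disk in [(2, "TB"), (4, "PB"), (32, "PB")]:
    print(disk, "->", max_storage(*disk))
```

The three results correspond to the 20 PB, 40 EB and 320 EB storage-system limits quoted in the lists.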

The maximum number of disks across all disk groups is 10,000. The 10,000 disks can be in one disk group or distributed across a maximum of 511 disk groups. This is a limitation on the number of Oracle ASM disks, not necessarily the number of spindles. A storage array could group multiple spindles into a LUN that is used as a single Oracle ASM disk.

File size limits are dependent on the value of the disk group compatibility attributes. Oracle ASM supports file sizes greater than 128 TB in any redundancy mode when the COMPATIBLE.RDBMS disk group attribute is set greater than 10.1.

If COMPATIBLE.RDBMS is set to 10.1, the file size limits are less. For example, with COMPATIBLE.RDBMS equal to 10.1 and the AU size equal to 1 MB, Oracle ASM file size limits are:

External redundancy: 16 TB
Normal redundancy: 5.8 TB
High redundancy: 3.9 TB
Note:
Oracle Database supports data file sizes up to 128 TB depending on the file system. In addition, Oracle Database has a file size limit that is dependent on the DB_BLOCK_SIZE initialization parameter.
For information abo

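According to the official 12.1 documentation, the ASM storage limits have been relaxed. However, the well-known Bug 6453944 means that in practice a single disk of 2 TB or more raises "ORA-15196 WITH ASM DISKS LARGER THAN 2T", and the bug on AIX where a single disk is seen as 4 PB has still not been fully resolved:

Points to note here:

(1) The 4 PB single-disk bug on AIX: On release 11.2.0.1: This problem is due to the Bug 9495887 ASM RECOGNIZE 1.25 TB DISKS AS 4,095.25 TB DISKS ON AIX. (There are several similar bugs; search MOS if you are interested.)

(2) On 11.1 and 11.2, the "single disk >= 2 TB" bugs are caused by hard-coded values in ASM. Examples include Bug 13852568 : ADD ASM INDIVIDUAL DISK SPACE METRIC on 11.1 and Bug 6453944 : ORA-15196 WITH ASM DISKS LARGER THAN 2TB on 11.2 (the bug descriptions imply that, once fixed, a single 2 TB disk can be used... and nothing more).

(3) In "IOE" architectures we usually recommend single disks of 500 GB to 1 TB (in some environments 1.5 TB also hits bugs). On the appliances that are popular today, however, disks of 2 TB or larger are probably unavoidable... I have no such environment; otherwise I could test whether the latest 11.2.0.4.x and 12.1.0.1.2 can break the 2 TB limit...

Anyway, bugs are an unavoidable part of the journey. Single ASM disks larger than 2 TB or 4 TB are the trend of the future (didn't the head of Seagate say recently that 10 TB mechanical disks are coming next year...), yet so far the share of ASM deployments using single disks larger than 2 TB really is tiny... O(∩_∩)O haha~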

Posted in ASM | Tagged asm, asm disk, asm disk size limit

A Brief Discussion of Unconventional Oracle Recovery

Posted on April 27, 2014 by Lunar

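For cases such as a damaged database in noarchivelog mode with no backup, an unusable backup, or a restore from backup ruled out by time or space constraints, unconventional recovery has always been the lifeline we cannot throw away. I am not an expert in this area, but I have always enjoyed it.

I remember that around 2001, when I first wanted to study Oracle seriously (I had mostly done development work before and was a novice DBA), and before I understood Oracle's backup and recovery mechanisms, I somehow came across Standby in Oracle 817 first. I read the documentation from cover to cover within a week, built two standby databases by hand, and wrote it all up on ITPUB and my own blog. It felt like a real achievement, though at the time I had not yet realized that Standby is in essence the perfect combination of database backup and recovery.

Later I had the chance to join, as DBA, an overseas project at a company (it did not feel small at the time, but I was its only novice DBA, O(∩_∩)O haha~). The job was to clone a 3-node Oracle 817 OPS database. I was not yet familiar with RMAN, so I completed the task by copying the raw devices with dd. After that project I became hooked on Oracle backup and recovery and started playing with RMAN.

To this day I believe backup and recovery is the best entry point for learning Oracle, because you can often simulate a scenario easily and study the recovery, and case by case you learn more of the internals. Of course, the official documentation always comes first; without that foundation, much of this is a castle in the air.

One more point: a good architecture design, backup and recovery strategy, and disaster recovery design are always the best choice. That is beyond doubt.

However, among the large number of customers in China running unlicensed copies, especially small ones, there is too often no professional design, so when something goes wrong, unconventional recovery is the lifeline.
The slides below come from a session we gave last year for a customer who, with no usable backup, had paid a "fortune" for someone to perform an unconventional recovery and then invited us in for an exchange. The topic the customer requested was "unconventional recovery":
A Brief Discussion of Unconventional Oracle Recovery - lunar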

Posted in backup&recovery, Internal | Tagged bbed, dul, oracle, oracle uba, oracle xid, xid uba, unconventional recovery

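Exadata Automatic Disk Management, Part 3: Operating Rules of Automatic Disk Management

Posted on April 22, 2014 by Lunar

Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log
Exadata Automatic Disk Management, Part 2: Disk-Related Concepts on the Cell
Exadata Automatic Disk Management, Part 3: Operating Rules of Automatic Disk Management

Some conditions and limitations of automatic disk management: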

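1. When a griddisk changes state (OFFLINE/ONLINE):
If a griddisk becomes temporarily unavailable, it is automatically OFFLINED in ASM.
If a griddisk becomes available, it is automatically ONLINED in ASM.

2. Griddisk DROP operations
If a physical disk (physicaldisk) fails, all griddisks on that physical disk are automatically DROPped from ASM with the FORCE option.
If the status of a physical disk changes to 'predictive failure', all griddisks on that physical disk are automatically DROPped from ASM.
For the concept of 'predictive failure', see the earlier post "Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log".
If a flashdisk suffers degraded performance, the corresponding griddisks are automatically DROPped from ASM with the FORCE option.

3. Griddisk ADD operations
If a physical disk is replaced, the celldisk and griddisks on it are automatically re-created and, once created, automatically added back to ASM. This requires that the disk was fully under automatic management before the replacement, that is, it was dropped from ASM automatically.
If a griddisk was dropped manually (without the FORCE option), it has to be added back to ASM manually.
If a griddisk in NORMAL state and ONLINE mode is dropped with the FORCE option (in this mode FORCE is usually required), it is also added back to ASM automatically.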

4. Griddisk state changes (OFFLINE/ONLINE) during a rolling upgrade of the CELL
Before the upgrade all griddisks will be inactivated on the storage cell and OFFLINED in ASM. After the upgrade all griddisks will be activated on the storage cell and ONLINED in ASM.

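5. Manually changing griddisk state: activation/inactivation
If a griddisk is inactivated on the cell, it is automatically OFFLINED in ASM.
If a griddisk is activated on the cell, it is automatically ONLINED in ASM.

Disk operations that are not performed automatically include:
MOUNT of an ASM disk group
celldisk EXPORT
celldisk IMPORT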
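
The rules above can be condensed into a lookup table. The sketch below is only a summary of this post's rules in Python, not how Exadata implements them; the event labels and the function name are my own, hypothetical names.

```python
# (event on the cell) -> (automatic ASM action, issued with FORCE?)
AUTO_RULES = {
    "griddisk temporarily unavailable": ("OFFLINE", False),
    "griddisk available":               ("ONLINE", False),
    "physicaldisk failed":              ("DROP", True),
    "physicaldisk predictive failure":  ("DROP", False),
    "flashdisk poor performance":       ("DROP", True),
    "physicaldisk replaced":            ("ADD", False),
    "griddisk inactivated":             ("OFFLINE", False),
    "griddisk activated":               ("ONLINE", False),
}

# Operations the post lists as never automated.
MANUAL_ONLY = {"mount diskgroup", "celldisk export", "celldisk import"}

def asm_action(event):
    """Return the automatic ASM response, or None if an admin must act."""
    if event in MANUAL_ONLY:
        return None
    action, force = AUTO_RULES[event]
    return action + (" FORCE" if force else "")

print(asm_action("physicaldisk failed"))
print(asm_action("celldisk export"))
```

Keeping the FORCE flag next to each action makes the difference between a failed disk and a predictive failure easy to see at a glance.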

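Automatic disk management on Exadata depends on three main processes (cellsrv, ms, rs):
CELLSRV is the main component of the storage server on Exadata. It is a multithreaded server that primarily serves simple block requests and Smart Scan requests (such as table scans with projection and filtering). CELLSRV also works with DBRM to meter the I/O bandwidth consumed by the various databases and consumer groups as they issue I/O.
CELLSRV collects a large number of statistics related to its operations. Oracle database and ASM processes communicate with CELLSRV through LIBCELL, which uses the iDB protocol to convert I/O requests into messages sent to CELLSRV.

MS (Management Server) provides the management and configuration functions of the Exadata cell. MS monitors hardware changes on the cell (such as disks being inserted or removed) and alerts (such as disk failure), and notifies CELLSRV through the ioctl (input/output control) system call.
The ASM instance process that communicates with CELLSRV on the CELL likewise uses the ioctl system call, on the ASM side, to check whether any action is required (such as online or offline).

Automatic management applies both to unplanned events (for example disk failure) and to planned activities (for example 'deactivating disks' when patching a cell).
Disk deactivation is performed on the cell, but ASM responds to it by taking the affected disks OFFLINE.

RS (Restart Server) starts and stops the CELLSRV and MS services, monitors the state of these service processes, and restarts cellsrv and ms when necessary.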

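On the ASM side, two processes carry out automatic disk management: Exadata Automation Manager (XDMG) and Exadata Automation Manager (XDWK):

Exadata Automation Manager (XDMG) monitors state changes in all configured cells, such as the replacement of a failed disk, and performs the tasks such events require.
Its main job is to watch for inaccessible disks and cells and, when they become accessible again, to initiate the operation that has ASM ONLINE the disk.

Exadata Automation Manager (XDWK) performs the automation tasks requested by XDMG.
The process is started when XDMG requests asynchronous operations such as disk ONLINE, DROP and ADD.
After being idle for 5 minutes it shuts itself down, and it is started again automatically on the next XDMG request.

Both processes can be regarded as "non-core" ASM processes. In some situations they can be killed and the system restarts them automatically, for example in some older versions where the automatic add or drop of disks in ASM runs into problems.

ASM has several parameters related to automatic disk management:
_AUTO_MANAGE_EXADATA_DISKS, _AUTO_MANAGE_NUM_TRIES, _AUTO_MANAGE_MAX_ONLINE_TRIES
ML and Yueming have both explained what they mean, but I could not find their blog posts; if you are interested, search Google or Baidu yourself. O(∩_∩)O haha~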

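Configuration files related to automatic management:
$OSSCONF/cell_disk_config.xml, which contains the IORM plans, cell information, disk information and so on
$OSSCONF/griddisk.owners.dat, which contains the following information: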
ASM disk name (normally the same as griddisk name)
ASM diskgroup name
ASM failgroup name (normally the same as cell name)
Cluster identifier (which cluster this disk belongs to)
Requires DROP/ADD (should the disk be dropped from or added to ASM)

Common ways to trace automatic disk management:

Enable tracing of automatic disk management in ASM:
SQL> alter system set event='trace[KXDAM] memory highest, disk highest' scope=spfile sid='*';

Disable tracing of automatic disk management in ASM:
SQL> alter system reset event scope=spfile sid='*';

Enable tracing of automatic disk management on the cell:
CellCLI> alter cell events='trace[cellsrv.cellsrv_events_layer] memory=highest,disk=highest'

Disable tracing of automatic disk management on the cell:
CellCLI> alter cell events='trace[cellsrv.cellsrv_events_layer] off'

Posted in Internals | Tagged cellsrv, Exadata, exadata predictive failure, predictive failure, xdkm, XDMG

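Exadata Automatic Disk Management, Part 2: Disk-Related Concepts on the Cell

Posted on April 22, 2014 by Lunar

Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log
Exadata Automatic Disk Management, Part 2: Disk-Related Concepts on the Cell
Exadata Automatic Disk Management, Part 3: Operating Rules of Automatic Disk Management

First, let's pin down the disk-related concepts on the cell: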

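Each cell has 12 physical disks (Physicaldisk) of identical capacity. Currently X2 offers 600 GB, 2 TB and 3 TB disks (the 2 TB disk has been discontinued), X3 offers only 600 GB and 3 TB, and X4 offers 1.2 TB and 4 TB.

Each physical disk corresponds to one LUN (on each of the first 2 disks, part of the space is set aside for the OS; each of the remaining 10 disks maps directly to one LUN), and each LUN corresponds to one CELLDISK.

On each cell, roughly the first 42 GB of the first two disks forms a software RAID1. The remaining 557 GB is used in the same way as the other 10 disks, as an independent celldisk.
Note that a celldisk is a logical disk.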

This is stated explicitly in the cell's configuration file:
The following command can also be used to view the capacity of a celldisk:
[root@dm01cel11 ~]# cellcli -e list griddisk where celldisk=CD_02_dm01cel11 attributes name,size,offset

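Each cell has 16 Flashdisks (4 PCI flash cards per cell, each card integrating 4 flashdisks): 24 GB each on X2, 100 GB each on X3, and 200 GB each on X4.

Each Flashdisk corresponds to one Flash LUN, and each Flash LUN corresponds to one Cell Disk.

Celldisks created on ordinary hard disks (disktype=HardDisk) are named CD_00_cellname, CD_01_cellname, … CD_11_cellname by default.
Celldisks created on flash disks (disktype=flashdisk) are named FD_00_cellname, FD_01_cellname … FD_15_cellname by default.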
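
As an illustration of the default naming scheme just described, the sketch below generates the 12 hard-disk and 16 flash celldisk names for one cell. The function name and its parameters are hypothetical; only the CD_nn_/FD_nn_ pattern and the disk counts come from the text.

```python
def celldisk_names(cell, hard_disks=12, flash_disks=16):
    """Default celldisk names for one cell: CD_nn_<cell>, then FD_nn_<cell>."""
    names = [f"CD_{i:02d}_{cell}" for i in range(hard_disks)]
    names += [f"FD_{i:02d}_{cell}" for i in range(flash_disks)]
    return names

names = celldisk_names("dm01cel11")
# First and last hard-disk celldisk, first and last flash celldisk.
print(names[0], names[11], names[12], names[-1])
```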

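GridDisks are logical disks created on one or more celldisks. In the standard Exadata installation, griddisks use only the celldisks created on hard disks, not those created on Flashdisks.
The celldisks created on flashdisks are generally used for flashcache and flashlog.

In an Exadata environment, an ASM DISK is simply a griddisk.
A standard Exadata installation usually creates 3 disk groups. DATA DG and RECO DG use all 12 disks (including the disks carved from the space left on the first two disks after the system area), while DBFS DG uses only the last 10 disks (excluding the first two system disks).

By default a GridDisk is named with the ASM disk group name as prefix, followed by the celldisk name, for example: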

[root@dm01cel11 ~]# cellcli -e list griddisk attributes name,size,offset
         DATA_DM01_CD_00_dm01cel11       423G            32M
         DATA_DM01_CD_01_dm01cel11       423G            32M
         DATA_DM01_CD_02_dm01cel11       423G            32M
         DATA_DM01_CD_03_dm01cel11       423G            32M
         DATA_DM01_CD_04_dm01cel11       423G            32M
         DATA_DM01_CD_05_dm01cel11       423G            32M
         DATA_DM01_CD_06_dm01cel11       423G            32M
         DATA_DM01_CD_07_dm01cel11       423G            32M
         DATA_DM01_CD_08_dm01cel11       423G            32M
         DATA_DM01_CD_09_dm01cel11       423G            32M
         DATA_DM01_CD_10_dm01cel11       423G            32M
         DATA_DM01_CD_11_dm01cel11       423G            32M
         DBFS_DG_CD_02_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_03_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_04_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_05_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_06_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_07_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_08_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_09_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_10_dm01cel11         29.125G         528.734375G
         DBFS_DG_CD_11_dm01cel11         29.125G         528.734375G
         RECO_DM01_CD_00_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_01_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_02_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_03_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_04_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_05_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_06_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_07_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_08_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_09_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_10_dm01cel11       105.6875G       423.046875G
         RECO_DM01_CD_11_dm01cel11       105.6875G       423.046875G
[root@dm01cel11 ~]#
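
The size and offset columns in the listing show how the three griddisks are laid out one after another on a celldisk. The sketch below checks that for CD_02_dm01cel11 with the figures copied from the listing. It assumes that G and M are binary units (1G = 1024M), which is my assumption, and the helper names are hypothetical.

```python
# (griddisk prefix, size in G, offset in G) for CD_02_dm01cel11, from the listing.
GRIDDISKS = [
    ("DATA_DM01", 423.0,    32 / 1024),    # offset 32M, assuming 1G = 1024M
    ("RECO_DM01", 105.6875, 423.046875),
    ("DBFS_DG",   29.125,   528.734375),
]

def griddisk_name(prefix, celldisk):
    """Default griddisk name: disk group prefix + '_' + celldisk name."""
    return f"{prefix}_{celldisk}"

def gaps(disks):
    """Gap in G between the end of each griddisk and the start of the next."""
    ordered = sorted(disks, key=lambda d: d[2])
    return [nxt[2] - (cur[2] + cur[1]) for cur, nxt in zip(ordered, ordered[1:])]

end_of_last = max(offset + size for _, size, offset in GRIDDISKS)

print([griddisk_name(p, "CD_02_dm01cel11") for p, _, _ in GRIDDISKS])
print("gaps:", gaps(GRIDDISKS), "end of last griddisk:", end_of_last)
```

Under that unit assumption, RECO ends exactly where DBFS_DG begins, and the last griddisk ends at about 557.86G, in line with the roughly 557G of usable space per disk mentioned earlier in this post.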

Posted in Internals | Tagged cellsrv, Exadata, predictive failure

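Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log

Posted on April 22, 2014 by Lunar

Exadata Automatic Disk Management, Part 1: How to Read Disk-Related Messages in the Cell Alert Log
Exadata Automatic Disk Management, Part 2: Disk-Related Concepts on the Cell
Exadata Automatic Disk Management, Part 3: Operating Rules of Automatic Disk Management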
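Starting with 11.2.3.2.x, the system can automatically detect disks whose performance has degraded and remove them from the active configuration.
A degraded disk drags down the performance of the whole system, because every workload is distributed evenly across all disks.
For example, if one disk performs 30% worse than the others, the overall I/O capability of the system drops by 30%.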
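
The 30% figure follows from a simple model: if work is striped evenly across all disks, a request finishes only when the slowest disk has finished its share, so the slowest disk sets the pace for all of them. The sketch below is a toy model under that assumption; the per-disk rate of 100 MB/s is made up purely for illustration.

```python
def effective_throughput(rates):
    """Evenly striped work completes only when the slowest disk completes."""
    return len(rates) * min(rates)

healthy = [100.0] * 12              # 12 disks; 100 MB/s is a made-up figure
degraded = [100.0] * 11 + [70.0]    # one disk running 30% slower

loss = 1 - effective_throughput(degraded) / effective_throughput(healthy)
print(f"overall loss: {loss:.0%}")
```

Real workloads are not perfectly synchronous, so this is an upper bound on the impact rather than a measurement.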

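When degraded performance is detected, the system automatically removes the disk from the active configuration. Exadata then runs a series of performance tests. If the problem turns out to be temporary, the disk is automatically returned to its original configuration. If the disk fails the tests, it is marked as poor performance and an ASR (Automatic Service Request) is opened automatically to request a replacement.
This feature applies to both hard disks and flash disks.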

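Any discussion of I/O capability and management on Exadata has to mention the well-known CELLSRV.
CELLSRV is the main component of the storage server on Exadata. It is a multithreaded server that primarily serves simple block requests and Smart Scan requests (such as table scans with projection and filtering). CELLSRV also works with DBRM to meter the I/O bandwidth consumed by the various databases and consumer groups as they issue I/O.
CELLSRV collects a large number of statistics related to its operations. Oracle database and ASM processes communicate with CELLSRV through LIBCELL, which uses the iDB protocol to convert I/O requests into messages sent to CELLSRV.

The other key I/O-related processes on the CELL are MS and RS.
MS (Management Server) provides the management and configuration functions of the Exadata cell. It works together with the command-line interface CELLCLI, and each CELL is managed individually through CELLCLI.
CELLCLI is a local management tool, meaning it must be run on a particular CELL to manage that CELL.
On Exadata, however, dcli can run the same CELLCLI command remotely on multiple CELLs for consolidated, uniform management. dcli is a self-contained toolkit that essentially manages hosts over SSH, and it can be configured on its own (for example, if you have many non-Exadata Linux environments, you can set up dcli for them as needed).
In addition, while CELLSRV collects the main statistics on the CELL, MS is also responsible for sending alerts and collecting other statistics.

RS (Restart Server) starts and stops the CELLSRV and MS services, monitors the state of these service processes, and restarts cellsrv and ms when necessary.

As mentioned, CELLSRV automatically monitors disk performance and changes the configuration accordingly. When CELLSRV detects degraded disk performance, the cell disk status changes to 'normal - confinedOnline' and the physical disk status changes to 'warning - confinedOnline'.
This means the disk has entered the first stage of poor performance. It is a transitional stage, and the disk does not stay in it for long.

A disk is usually in one of 4 common states: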
HEALTH_BAD_ONLINE
HEALTH_BAD_OFFLINE
HEALTH_GOOD
HEALTH_FAIL

CELLSRV records these checks and state changes in the alert log, for example:

Thu Dec 19 01:27:16 2013
CDHS: Mark cd health state change CD_08_dm01cel05  with newState HEALTH_BAD_ONLINE  pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0 inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
Thu Dec 19 01:27:16 2013
global conf related state: numHDsConf: 3 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_GOOD to newState HEALTH_BAD_ONLINE
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_GOOD to newState HEALTH_BAD_ONLINE
ABSOLUTE SERVICE TIME VIOLATION DETECTED ON DISK /dev/sdi: CD name - CD_08_dm01cel05 AVERAGE SERVICETIME: 163.000000 ms. AVERAGE WAITTIME: 1.666667 ms. AVERAGE REQUESTSIZE: 1137 sectors. NUMBER OF IOs COMPLETED IN LAST CYCLE ON DISK: 9 THRESHOLD VIOLATION COUNT: 6 NON_ZERO SERVICETIME COUNT: 6 SET CONFINE SUCCESS: 1
NOTE: Initiating ASM Instance operation: Query ASM Deactivation Outcome on 3 disks
Published 1 grid disk events Query ASM Deactivation Outcome on DG DATA_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events Query ASM Deactivation Outcome on DG DBFS_DG to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10891
Published 1 grid disk events Query ASM Deactivation Outcome on DG RECO_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523

The same information can also be seen in alerthistory:

70_1  2013-12-19T01:27:21+08:00       warning         "Hard disk entered confinement offline status. The LUN 0_8 changed status to warning - confinedOffline. CellDisk changed status to normal - confinedOffline. All subsequent I/Os on this disk are failed immediately. Confinement tests will be run on the disk to determine if the disk should be dropped. Status                      : WARNING - CONFINEDOFFLINE  Manufacturer                : HITACHI  Model Number                : HUS1560SCSUN600G  Size                        : 600G  Serial Number               : 1216KLN0HN  Firmware                    : A700  Slot Number                 : 8  Cell Disk                   : CD_08_dm01cel05  Grid Disk                   : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05  Reason for confinement      : threshold for service time exceeded"

My guess is that the information above is obtained with a command similar to the following:
list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

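That is the first step of the disk check. Next, Exadata begins the test step, "Prepare for test - confined offline".
The first thing this step does is take all griddisks on the celldisk offline and then run performance tests on it.
Here CELLSRV asks ASM to take the griddisks offline, and the alert log records the following: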

NOTE: Initiating ASM Instance operation: ASM OFFLINE disk on 3 disks
Published 1 grid disk events ASM OFFLINE disk on DG DATA_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523
Published 1 grid disk events ASM OFFLINE disk on DG DBFS_DG to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events ASM OFFLINE disk on DG RECO_DM01 to: 
ClientHostName = dm01db04.lunar,  ClientPID = 10378
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_ONLINE to newState HEALTH_BAD_OFFLINE
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_ONLINE to newState HEALTH_BAD_OFFLINE

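If possible, ASM takes all the affected GRIDDISKs OFFLINE.
Note the words "if possible": other conditions also come into play, such as whether the 'disk_repair_time' requirement is met (3.6 hours by default).

Next, the MS process stress-tests the griddisks that are now offline. If the results are good, MS tells CELLSRV that the disk is fine and safe to use, and CELLSRV, on receiving this notification, tells ASM: please bring these griddisks back online, they are all OK.
For example: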

Fri Nov 08 02:48:03 2013
CDHS: Do cd health state change  after confinement CD_08_dm01cel05 testFailed 0
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_OFFLINE to newState HEALTH_GOOD
Set CD perf state normal.
No need to clear proactive drop state for RECO_DM01_CD_08_dm01cel05 [5401ad6b-29b3-4f56-a958-29580980abf9] :
No need to clear proactive drop state for DATA_DM01_CD_08_dm01cel05 [37f5e40a-5e03-48be-b5ab-34f70e629f20] :
No need to clear proactive drop state for DBFS_DG_CD_08_dm01cel05 [45bf182b-5124-4e5c-94ef-9b6fe4fd4390] :
NOTE: Initiating ASM instance operation:
 Operation: ONLINE ASM disks for 3 Grid disks guids...
NOTE: Initiating ASM Instance operation: ASM ONLINE disk on 3 disks
Published 1 grid disk events ASM ONLINE disk on DG DATA_DM01 to: 
ClientHostName = dm01db07.lunar,  ClientPID = 10523
Published 1 grid disk events ASM ONLINE disk on DG DBFS_DG to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10675
Published 1 grid disk events ASM ONLINE disk on DG RECO_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_OFFLINE to newState HEALTH_GOOD

Similarly, the alerthistory shows:

dm01cel05: 70_2  2013-12-19T01:31:06+08:00       clear           "Hard disk status changed to normal.  Status        : NORMAL  Manufacturer  : HITACHI  Model Number  : HUS1560SCSUN600G  Size          : 600GB  Serial Number : 1216KLN0HN  Firmware      : A700  Slot Number   : 8  Cell Disk     : CD_08_dm01cel05  Grid Disk     : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05"
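The alerthistory message packs its details into padded `key : value` pairs on a single line. A sketch of one way to split them apart, assuming the set of field names seen in this post is complete (the `FIELDS` list and `parse_alert_fields` are illustrative, not an Oracle-provided format definition):

```python
import re

# Field names as they appear in the alerts quoted in this post.
FIELDS = [
    "Status", "Manufacturer", "Model Number", "Size", "Serial Number",
    "Firmware", "Slot Number", "Cell Disk", "Grid Disk",
    "Reason for confinement", "Reason for poor performance",
]
FIELD_RE = re.compile(r"(%s)\s*:\s*" % "|".join(re.escape(f) for f in FIELDS))


def parse_alert_fields(message):
    """Split an alert message into a dict keyed by field name."""
    matches = list(FIELD_RE.finditer(message))
    fields = {}
    for i, m in enumerate(matches):
        # A value runs until the next known field name, or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(message)
        fields[m.group(1)] = message[m.end():end].strip().strip('"')
    return fields


MESSAGE = (
    "Hard disk status changed to normal.  Status        : NORMAL  "
    "Manufacturer  : HITACHI  Model Number  : HUS1560SCSUN600G  "
    "Size          : 600GB  Serial Number : 1216KLN0HN  Firmware      : A700  "
    "Slot Number   : 8  Cell Disk     : CD_08_dm01cel05  "
    "Grid Disk     : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, "
    "DATA_DM01_CD_08_dm01cel05"
)

if __name__ == "__main__":
    info = parse_alert_fields(MESSAGE)
    print(info["Status"], info["Cell Disk"], info["Grid Disk"].split(", "))
```

Matching on known field names rather than on whitespace avoids splitting values that themselves contain spaces, such as "WARNING - POOR PERFORMANCE".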

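Of course, if MS tests the offline griddisks and finds that performance really no longer meets requirements, the celldisk status changes to 'proactive failure' and the physical disk status changes to 'warning – poor performance'.
In other words, these disks need to be removed from the current configuration. MS then notifies CELLSRV, and CELLSRV in turn tells ASM to drop these griddisks.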

The corresponding entry in the alerthistory is:

dm01cel05: 87_2  2014-01-31T02:06:15+08:00       critical        "Hard disk entered poor performance status.  Status                      : WARNING - POOR PERFORMANCE  Manufacturer                : HITACHI  Model Number                : HUS1560SCSUN600G  Size                        : 600G  Serial Number               : 1216KLN0HN  Firmware                    : A700  Slot Number                 : 8  Cell Disk                   : CD_08_dm01cel05  Grid Disk                   : RECO_DM01_CD_08_dm01cel05, DBFS_DG_CD_08_dm01cel05, DATA_DM01_CD_08_dm01cel05  Reason for poor performance : threshold for service time exceeded"

And the cell's alert log contains:

Fri Jan 31 02:06:12 2014
CDHS: Do cd health state change  after confinement CD_08_dm01cel05 testFailed 1
CDHS: Do cd health state change CD_08_dm01cel05 from HEALTH_BAD_OFFLINE to newState HEALTH_FAIL
NOTE: Initiating ASM Instance operation: ASM DROP dead disk on 3 disks
Published 1 grid disk events ASM DROP dead disk on DG DATA_DM01 to: 
ClientHostName = dm01db06.lunar,  ClientPID = 10523
Published 1 grid disk events ASM DROP dead disk on DG DBFS_DG to: 
ClientHostName = dm01db04.lunar,  ClientPID = 10378
Published 1 grid disk events ASM DROP dead disk on DG RECO_DM01 to: 
ClientHostName = dm01db08.lunar,  ClientPID = 10891
CDHS: Done cd CD_08_dm01cel05 health state change from HEALTH_BAD_OFFLINE to newState HEALTH_FAIL

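At the same time, the corresponding disk drop messages (drop force) appear on the ASM side.
After that, once the rebalance has completed, the disk can be replaced. For details of the ASM rebalance, see the +ASM5_rbal_xxxx and +ASM5_arb0_xxxx trace files.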
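Putting the whole flow together, the confinement handling behaves like a small state machine. The sketch below uses the health states and ASM operations that appear in the logs above; the event names (`confine`, `test_passed`, `test_failed`) are labels invented here for illustration and do not come from CELLSRV.

```python
# (current state, event) -> (new state, ASM operation requested by CELLSRV)
TRANSITIONS = {
    ("HEALTH_BAD_ONLINE", "confine"): ("HEALTH_BAD_OFFLINE", "ASM OFFLINE disk"),
    ("HEALTH_BAD_OFFLINE", "test_passed"): ("HEALTH_GOOD", "ASM ONLINE disk"),
    ("HEALTH_BAD_OFFLINE", "test_failed"): ("HEALTH_FAIL", "ASM DROP dead disk"),
}


def step(state, event):
    """Return (new_state, asm_operation) for one step of the flow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("no transition from %s on %s" % (state, event))


def run(events, state="HEALTH_BAD_ONLINE"):
    """Apply a sequence of events and collect the ASM operations issued."""
    operations = []
    for event in events:
        state, op = step(state, event)
        operations.append(op)
    return state, operations


if __name__ == "__main__":
    print(run(["confine", "test_passed"]))  # disk goes back online
    print(run(["confine", "test_failed"]))  # disk is dropped, then rebalance
```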

Posted in Internals | Tagged cellsrv, Exadata, predictive failure

A History of SAP Products

Posted on April 12, 2014 by Lunar

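Lately I have found SAP very interesting, just as I found Oracle interesting many years ago (Oracle products are still interesting today, though some things seem gone for good... OGG, for example, has grown big and fat; of course, it must be a lot stronger too, O(∩_∩)O haha~)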

This diagram shows the SAP ERP architecture and its rough history.

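In 1992 SAP released the SAP R/3 system, with Oracle 7.0 as the database. The earliest SAP (before SAP R/3 4.6 Core) had only two parts, SAP Basis and Application. At that time SAP Basis was written in ABAP.
Judging from the product structure, I guess this was an embedded architecture (I do not really understand it; this is pure guesswork).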

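Back then Oracle and SAP were good brothers making their way in the world together; it is no exaggeration to say that in some areas they leaned on each other, O(∩_∩)O haha~...
I personally feel that Oracle's first mature RDBMS was Oracle 7.3, released in 1996 (Oracle 7.1 was released in 1994 and 7.2 in 1995). This version brought Standby, which to this day I consider a very valuable and mature feature, and it is my first choice, where conditions allow, when designing and implementing disaster recovery for customers.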

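In 1998 SAP released the first SAP BW (Business Warehouse). The year before, Oracle 8 had launched in glory with OPS (Oracle Parallel Server, the predecessor of RAC), partitioned tables and partitioned indexes, RMAN point-in-time recovery, and more.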

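I am not sure in which year, but SAP released SAP R/3 Enterprise Core 4.7, developed in ABAP. By then the architecture appears to have become two-layered; the SAP Web Application Server 6.20 here is what used to be SAP Basis.
My understanding is that from this point on, the application server and the applications were separated. On this version SAP developed more industry solutions, which were deployed on top of Enterprise Extensions.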

There was no SAP NetWeaver yet at that time; the NetWeaver platform only arrived with ECC 5.0, from 2004 onward.

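In 2002 Oracle 9.2 was released with a host of more user-friendly new features (for example the improvements to cache fusion; with the 9i generation OPS left the stage and RAC made its grand entrance).
Two years later (2004), SAP released ECC 5.0. In this platform there is no longer an SAP Web Application Server (the former SAP Basis); it is replaced by one large platform covering the full set of SAP application servers and applications: SAP NetWeaver. I guess the phrase "total solution" became popular around then as well?
On top of SAP NetWeaver you can deploy SAP NetWeaver Portal (a unified UI) to manage all applications, and you can also deploy SAP BW (Business Warehouse, which is integrated with SAP processes and is part of the SAP ERP solution).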

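The latest is SAP NetWeaver Application Server ABAP 7.0. Many production environments still run SAP NetWeaver Application Server ABAP 6.0 or even earlier versions.
The latest Oracle, meanwhile, is already 12.1, released in 2013. Most production environments still run Oracle 10.2 and 11.2 (some run even older versions, Oracle 8i or 9i).
Perhaps Oracle's share among the databases SAP runs on will keep shrinking. In this dazzling IT tide, new concepts keep pouring out and new technologies spring up like bamboo shoots after rain.
My personal feeling is that the one enduring theme is still the real economy: the pavilion closest to the water gets the moonlight first. Yet only a lead in information technology lets you sharpen the axe without delaying the woodcutting. You cannot get by without some salesmanship, because even good wine fears a deep alley; but nothing but salesmanship will not do either, because after all the spin you eventually have to land.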

The SAP suite includes: SAP ERP, SAP CRM, SAP SCM, SAP SRM, SAP PLM
SAP ERP includes: ECC, XSS, XECO, BW, Portal, PI, and so on
I do not understand the items above yet; I have only installed ECC, on AIX, with an 11.2 RAC database
…………

Posted in SAP | Tagged sap products
