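Contact: QQ (5163721)
Title: ORA-00443 background process MMNL did not start
Author: Lunar © All rights reserved [This article may be reposted, but only with a link to the original source; otherwise legal action may be taken.]
Starting ASM fails with ORA-00443: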
[root@lunar ~]# srvctl start asm
PRCR-1079 : Failed to start resource ora.asm
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00443: background process "MMNL" did not start
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'lunar' failed
First, look at what ORA-00443 means:
[grid@lunar ~]$ oerr ora 00443
00443, 00000, "background process \"%s\" did not start"
// *Cause: The specified process did not start.
// *Action: Ensure that the executable image is in the correct place with
// the correct protections, and that there is enough memory.
[grid@lunar ~]$
This looks like a memory shortage.
Next, look at /u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log:
2013-08-24 15:54:54.330: [ora.asm][1092012352] {0:0:2} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".
2013-08-24 15:54:54.330: [ AGFW][1092012352] {0:0:2} sending status msg [CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/12.1/grid/log/lunar/agent/ohasd/oraagent_grid/oraagent_grid.log".
] for start for resource: ora.asm lunar 1
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} InstConnection::connectInt (2) Exception OCIException
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} InstConnection:connect:excp OCIException OCI error 1034
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} AsmCommonAgent DedicatedThread Exception OCIException
2013-08-24 16:19:43.750: [ USRTHRD][1114593600] {0:0:2} ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0
The ohasd process tried to restart ASM several times and hit the same errors each time, so something is preventing ASM from starting.
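When an agent log repeats the same failure many times, it can help to tally which ORA- codes appear before reading it line by line. The following is a minimal sketch; `count_ora_codes` is a hypothetical helper written for this article, not part of any Oracle tool, and the sample text is a shortened excerpt of the log shown above.

```python
import re
from collections import Counter

# Hypothetical helper (not an Oracle utility): tally ORA-nnnnn codes in log text.
ORA_CODE = re.compile(r"ORA-\d{5}")

def count_ora_codes(log_text: str) -> Counter:
    """Return how many times each ORA-nnnnn code occurs in log_text."""
    return Counter(ORA_CODE.findall(log_text))

# Shortened excerpt of the oraagent_grid.log entries quoted in this article.
sample = (
    'CRS-5017: The resource action "ora.asm start" encountered the following error:\n'
    "ORA-03113: end-of-file on communication channel\n"
    "ORA-01034: ORACLE not available\n"
    "ORA-27101: shared memory realm does not exist\n"
)

codes = count_ora_codes(sample)
for code, hits in sorted(codes.items()):
    print(code, hits)
```

In practice the text would come from reading the real log file; the codes only say that the instance went away, so the ASM alert log is still needed to find out why.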
Checking the ASM alert log next shows the following:
Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option.
ORACLE_HOME = /u01/app/12.1/grid
System name: Linux
Node name: lunar
Release: 2.6.32-300.10.1.el5uek
Version: #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine: x86_64
Using parameter settings in server-side spfile +DATA/ASM/ASMPARAMETERFILE/registry.253.818242245
System parameters with non-default values:
large_pool_size = 12M
remote_login_passwordfile= "EXCLUSIVE"
IMODE=BR
ILAT =0
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Starting up:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option.
ORACLE_HOME = /u01/app/12.1/grid
System name: Linux
Node name: lunar
Release: 2.6.32-300.10.1.el5uek
Version: #1 SMP Wed Feb 22 17:37:40 EST 2012
Machine: x86_64
Using parameter settings in server-side spfile +DATA/ASM/ASMPARAMETERFILE/registry.253.818242245
System parameters with non-default values:
large_pool_size = 12M
remote_login_passwordfile= "EXCLUSIVE"
asm_diskgroups = "RECO"
asm_power_limit = 1
NOTE: remote asm mode is local (mode 0x301; from cluster type)
Sat Aug 24 15:54:18 2013
NOTE: PatchLevel of this instance 0
Starting background process PMON
Sat Aug 24 15:54:19 2013
PMON started with pid=2, OS id=3521
Starting background process PSP0
Sat Aug 24 15:54:19 2013
PSP0 started with pid=3, OS id=3525
Starting background process VKTM
Sat Aug 24 15:54:20 2013
VKTM started with pid=4, OS id=3529 at elevated priority
Starting background process GEN0
Sat Aug 24 15:54:20 2013
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Sat Aug 24 15:54:20 2013
GEN0 started with pid=5, OS id=3535
Starting background process MMAN
Sat Aug 24 15:54:20 2013
MMAN started with pid=6, OS id=3539
Starting background process DIAG
Sat Aug 24 15:54:21 2013
DIAG started with pid=8, OS id=3547
Starting background process DIA0
Sat Aug 24 15:54:21 2013
DIA0 started with pid=9, OS id=3551
Starting background process DBW0
Sat Aug 24 15:54:21 2013
DBW0 started with pid=10, OS id=3555
Starting background process LGWR
Sat Aug 24 15:54:21 2013
LGWR started with pid=11, OS id=3559
Starting background process CKPT
Sat Aug 24 15:54:21 2013
CKPT started with pid=12, OS id=3563
Starting background process SMON
Sat Aug 24 15:54:21 2013
SMON started with pid=13, OS id=3567
Starting background process LREG
Sat Aug 24 15:54:21 2013
LREG started with pid=14, OS id=3571
Starting background process RBAL
Sat Aug 24 15:54:21 2013
RBAL started with pid=15, OS id=3575
Starting background process GMON
Sat Aug 24 15:54:22 2013
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F6AC008] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Sat Aug 24 15:54:22 2013
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F67A010] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_psp0_3525.trc (incident=27225):
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F67A010] [PC:0xA6B0D9C] [Non-existent physical address] []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_3579.trc (incident=28801):
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F6AC008] [PC:0xA6B0D9C] [Non-existent physical address] []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM/incident/incdir_27225/+ASM_psp0_3525_i27225.trc
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM/incident/incdir_28801/+ASM_gmon_3579_i28801.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Aug 24 15:54:24 2013
Using default pga_aggregate_limit of 2048 MB
Sat Aug 24 15:54:26 2013
Dumping diagnostic data in directory=[cdmp_20130824155426], requested by (instance=1, osid=3525 (PSP0)), summary=[incident=27225].
Process GMON died, see its trace file
Sat Aug 24 15:54:27 2013
USER (ospid: 3474): terminating the instance due to error 443
Sat Aug 24 15:54:28 2013
Instance terminated by USER, pid = 3474
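The two lines that matter most in an alert log like this are the "Process ... died" message and the "terminating the instance due to error ..." message. As a rough illustration, the sketch below pulls both out of alert log text; `summarize_alert` is a hypothetical name, and the patterns are based only on the wording of the log shown above, so other Oracle versions may phrase these messages differently.

```python
import re

# Patterns taken from the wording of the alert log quoted in this article.
DIED = re.compile(r"Process (\w+) died")
TERMINATED = re.compile(r"terminating the instance due to error (\d+)")

def summarize_alert(text: str) -> dict:
    """Return the background processes reported dead and the terminating error number."""
    match = TERMINATED.search(text)
    return {
        "died": DIED.findall(text),
        "error": int(match.group(1)) if match else None,
    }

# Tail of the alert log from this article.
sample = (
    "Process GMON died, see its trace file\n"
    "Sat Aug 24 15:54:27 2013\n"
    "USER (ospid: 3474): terminating the instance due to error 443\n"
    "Sat Aug 24 15:54:28 2013\n"
    "Instance terminated by USER, pid = 3474\n"
)

summary = summarize_alert(sample)
print(summary)
```

For this log the summary points at GMON and error 443, which matches the ORA-00443 seen from srvctl.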
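This shows that the ASM instance had already started its key background processes (PMON, SMON, CKPT, DBW0, and so on), but the instance was terminated shortly afterwards, once the GMON process died.
A brief aside: both GMON and PSP0 have existed since Oracle 10.2 ASM:
GMON (ASM Disk Group Monitor Process) was introduced with 10.2 ASM. After the ASM instance starts, it monitors disk group metadata and interacts with the ocssd process.
GMON sends the ASM instance's disk group information to ocssd. Other database instances obtain the disk group information by talking to ocssd, after which they can open the disk groups and read from and write to them.
The documentation describes it as follows: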
GMON
ASM Disk Group Monitor Process
Monitors all mounted ASM disk groups
GMON monitors all the disk groups mounted in an ASM instance and is responsible for maintaining consistent disk membership and status information. Membership changes result from adding and dropping disks, whereas disk status changes result from taking disks offline or bringing them online.
The main job of the PSP0 process is to spawn new processes. The documentation describes it as follows:
PSP0
Process Spawner Process
Spawns Oracle background processes after initial instance startup
Now let's analyze the traces of these two processes.
The GMON trace confirms that the system had very little free memory at the time:
========= Dump for incident 27225 (ORA 7445 [dbgtTrcData_int]) ========
Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_3579.trc
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F6AC008] [PC:0xA6B0D9C] [Non-existent physical address] []
========= Dump for incident 28801 (ORA 7445 [dbgtTrcData_int]) ========
----- Beginning of Customized Incident Dump(s) -----
Dumping swap information
Memory (Avail / Total) = 75.04M / 1164.46M
Swap (Avail / Total) = 3999.99M / 3999.99M
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F6AC008] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Registers:
%rax: 0x0000000000000001 %rbx: 0x000000009f6abfd0 %rcx: 0xffffffff0000ffff
%rdx: 0x0000000000000000 %rdi: 0x000000009f6aff80 %rsi: 0x0000000000000000
%rsp: 0x00007fff8433f580 %rbp: 0x00007fff8433f790 %r8: 0x000000009f6abfd0
%r9: 0x000000009f6abfd8 %r10: 0x0000000000010000 %r11: 0x000000000000000b
The PSP0 trace shows the same shortage of free memory:
Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_psp0_3525.trc
ORA-07445: exception encountered: core dump [dbgtTrcData_int()+380] [SIGBUS] [ADDR:0x9F67A010] [PC:0xA6B0D9C] [Non-existent physical address] []
========= Dump for incident 27225 (ORA 7445 [dbgtTrcData_int]) ========
----- Beginning of Customized Incident Dump(s) -----
Dumping swap information
Memory (Avail / Total) = 75.04M / 1164.46M
Swap (Avail / Total) = 3999.99M / 3999.99M
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x9F67A010] [PC:0xA6B0D9C, dbgtTrcData_int()+380] [flags: 0x0, count: 1]
Registers:
%rax: 0x0000000000000001 %rbx: 0x000000009f679fd8 %rcx: 0xffffffff0000ffff
%rdx: 0x000000000000004b %rdi: 0x000000009f67bf80 %rsi: 0x0000000000000000
%rsp: 0x00007fff83b440e0 %rbp: 0x00007fff83b442f0 %r8: 0x000000009f679fd8
%r9: 0x000000009f679fe0 %r10: 0x0000000000010000 %r11: 0x0000000000000008
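To put a number on "very little free memory", the "Memory (Avail / Total)" line from the incident dump can be parsed and turned into a ratio. This is a small sketch under the assumption that the line always has the format seen in these two traces; `parse_memory` is a hypothetical helper name.

```python
import re

# Matches lines such as "Memory (Avail / Total) = 75.04M / 1164.46M".
MEM_LINE = re.compile(r"(Memory|Swap) \(Avail / Total\) = ([\d.]+)M / ([\d.]+)M")

def parse_memory(trace_text: str) -> dict:
    """Return {'Memory': (avail_mb, total_mb), 'Swap': (avail_mb, total_mb)} when present."""
    return {
        kind: (float(avail), float(total))
        for kind, avail, total in MEM_LINE.findall(trace_text)
    }

# Values copied from the GMON and PSP0 traces in this article.
sample = (
    "Memory (Avail / Total) = 75.04M / 1164.46M\n"
    "Swap (Avail / Total) = 3999.99M / 3999.99M\n"
)

mem = parse_memory(sample)
avail_mb, total_mb = mem["Memory"]
free_ratio = avail_mb / total_mb
print(f"free physical memory: {free_ratio:.1%}")  # prints "free physical memory: 6.4%"
```

Only about 6% of physical memory was available when the two processes crashed, while swap was completely unused.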
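That essentially pins down the cause: the system did not have enough memory. I shut down the VM, gave it more memory, and restarted it, after which everything was fine.
Now look at the SGA-related parameters of the ASM instance: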
SQL> show parameter target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_max_target                    big integer 1076M    (surprisingly, a full 1 GB)
memory_target                        big integer 1076M
pga_aggregate_target                 big integer 0
sga_target                           big integer 0
SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
sga_max_size                         big integer 1088M
sga_target                           big integer 0
unified_audit_sga_queue_size         integer     1048576
SQL>
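A quick sanity check makes the problem visible: compare memory_target (1076M) with the total physical memory reported in the traces (1164.46M). The sketch below assumes that memory_target had the same value before the VM was resized, which the article does not state explicitly, and it is only a rough ratio, not a model of how Oracle actually allocates memory. `target_share_of_ram` is a hypothetical helper.

```python
def target_share_of_ram(memory_target_mb: float, physical_ram_mb: float) -> float:
    """Return memory_target as a fraction of physical RAM."""
    if physical_ram_mb <= 0:
        raise ValueError("physical_ram_mb must be positive")
    return memory_target_mb / physical_ram_mb

# 1076 MB from "show parameter target"; 1164.46 MB from the incident dumps.
# Assumption: memory_target was already 1076M when the instance failed to start.
share = target_share_of_ram(1076, 1164.46)
print(f"memory_target is {share:.0%} of physical RAM")  # prints "memory_target is 92% of physical RAM"
```

With the target at roughly 92% of RAM, there is little room left for the operating system and the rest of the Grid Infrastructure stack on a VM of that size.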
For Oracle 10.2, the chapter “Using Automatic Storage Management” of the Oracle Database Administrator’s Guide gives this brief description of ASM instance memory:
ASM Instance Memory Requirements
ASM instances are smaller than database instances. A 64 MB SGA should be sufficient for all but the largest ASM installations. Total memory footprint for a typical ASM instance is approximately 100 MB.
In other words, in a 10.2 environment about 100 MB is enough for an ASM instance.
From 11.2 onward, ASM has its own document, the Oracle Automatic Storage Management Administrator’s Guide:
Automatic memory management automatically manages the memory-related parameters for both Oracle ASM and database instances with the MEMORY_TARGET parameter.
Automatic memory management is enabled by default on an Oracle ASM instance, even when the MEMORY_TARGET parameter is not explicitly set.
The default value used for MEMORY_TARGET is acceptable for most environments.
This is the only parameter that you must set for complete Oracle ASM memory management.
Oracle strongly recommends that you use automatic memory management for Oracle ASM. (Author's note: Oracle strongly recommends managing ASM instances with AMM.)
If you do not set a value for MEMORY_TARGET, but you do set values for other memory related parameters, Oracle internally calculates the optimum value for MEMORY_TARGET based on those memory parameter values.
You can also increase MEMORY_TARGET dynamically, up to the value of the MEMORY_MAX_TARGET parameter, just as you can do for the database instance.
Although it is not recommended, you can disable automatic memory management by either setting the value for MEMORY_TARGET to 0 in the Oracle ASM parameter file or by running an ALTER SYSTEM SET MEMORY_TARGET=0 statement. When you disable automatic memory management, Oracle reverts to auto shared memory management and automatic PGA memory management. To revert to Oracle Database 10g release 2 (10.2) functionality to manually manage Oracle ASM SGA memory, also run the ALTER SYSTEM SET SGA_TARGET=0 statement. You can then manually manage Oracle ASM memory using the information in “Oracle ASM Parameter Setting Recommendations”, that discusses Oracle ASM memory-based parameter settings. Unless specified, the behaviors of the automatic memory management parameters in Oracle ASM instances behave the same as in Oracle Database instances.
Notes:
For a Linux environment, automatic memory management cannot work if /dev/shm is not available or is undersized.
For more information, see Oracle Database Administrator’s Reference for Linux and UNIX-Based Operating Systems.
For information about platforms that support automatic memory management, see Oracle Database Administrator’s Guide.
The minimum MEMORY_TARGET for Oracle ASM is 256 MB. If you set MEMORY_TARGET to 100 MB, then Oracle increases the value for MEMORY_TARGET to 256 MB automatically.
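The note about /dev/shm can be turned into a rough pre-flight check. The sketch below compares the total size of the filesystem mounted at /dev/shm with the intended MEMORY_TARGET. This is a simplification and an assumption on my part: the quoted documentation only says that /dev/shm must be available and not undersized, so treat the comparison as a first approximation. `shm_large_enough` is a hypothetical helper.

```python
import os
import shutil

MIB = 1024 * 1024

def shm_large_enough(required_bytes: int, path: str = "/dev/shm") -> bool:
    """Return True if the filesystem at path has a total size of at least required_bytes."""
    return shutil.disk_usage(path).total >= required_bytes

# Only run the real check where /dev/shm exists (Linux).
if os.path.isdir("/dev/shm"):
    # 1076 MiB is the memory_target value shown earlier in this article.
    if shm_large_enough(1076 * MIB):
        print("/dev/shm looks large enough for MEMORY_TARGET")
    else:
        print("/dev/shm looks undersized for MEMORY_TARGET")
```

The `path` parameter exists so the function can be pointed at any mount point.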
In other words, from 11.2 onward Oracle strongly recommends managing ASM instances with AMM, and the minimum MEMORY_TARGET is 256 MB.
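The minimum described in the quoted documentation behaves like a simple clamp. The sketch below only models the two behaviors the documentation states (values below 256 MB are raised to 256 MB, and 0 disables automatic memory management); it is an illustration, not Oracle's internal logic, and the function name is made up.

```python
ASM_MIN_MEMORY_TARGET_MB = 256  # minimum stated in the ASM Administrator's Guide excerpt

def effective_memory_target_mb(requested_mb: int) -> int:
    """Model the documented behavior: 0 disables AMM, small values are raised to 256 MB."""
    if requested_mb < 0:
        raise ValueError("requested_mb must not be negative")
    if requested_mb == 0:
        return 0  # MEMORY_TARGET=0 turns automatic memory management off
    return max(requested_mb, ASM_MIN_MEMORY_TARGET_MB)

print(effective_memory_target_mb(100))   # prints 256
print(effective_memory_target_mb(1076))  # prints 1076
```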
The 12c (12.1) documentation repeats the 11.2 description above and adds only one thing:
In an Oracle Exadata environment, the recommended settings for managing memory are SGA_TARGET = 1250MB, PGA_AGGREGATE_TARGET = 400MB, MEMORY_TARGET = 0, and MEMORY_MAX_TARGET = 0.
In an Exadata ASM environment, the default configuration looks like this:
Exadata version 11.2.2.2.0:
---------------------------------------
memory_max_target = 1073741824
memory_target = 1073741824
pga_aggregate_target = 104857600
sga_max_size = 943718400
sga_target = 943718400
shared_pool_size = 0
sort_area_size = 65536
large_pool_size = 12582912

Exadata versions 11.2.3.1.1 through 11.2.3.2.1:
---------------------------------------
memory_max_target = 0
memory_target = 0
pga_aggregate_target = 419430400
sga_max_size = 1325400064
sga_target = 1325400064
sort_area_size = 65536
large_pool_size = 16777216
Surprisingly, sort_area_size is set here. I don't know what the reasoning behind that is.
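The Exadata values above are given in bytes, which makes them hard to compare with the MB figures quoted from the documentation. A small conversion sketch follows; the dictionaries simply restate the numbers listed above, and `to_mib` is a hypothetical helper.

```python
MIB = 1024 * 1024

def to_mib(value_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return value_bytes / MIB

# Values copied from the two Exadata parameter listings in this article.
exadata_11_2_2_2_0 = {
    "memory_target": 1073741824,
    "pga_aggregate_target": 104857600,
    "sga_target": 943718400,
    "large_pool_size": 12582912,
}
exadata_11_2_3_x = {
    "pga_aggregate_target": 419430400,
    "sga_target": 1325400064,
    "large_pool_size": 16777216,
}

for label, params in (("11.2.2.2.0", exadata_11_2_2_2_0), ("11.2.3.x", exadata_11_2_3_x)):
    for name, value in params.items():
        print(f"{label:12} {name:22} {to_mib(value):8.0f} MB")
```

In the later versions this gives pga_aggregate_target = 400 MB, matching the 12.1 recommendation quoted earlier, and sga_target = 1264 MB, close to the recommended 1250MB.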