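Contact: QQ (5163721)
Title: Common Tools on Exadata (Troubleshooting Tools)
Author: Lunar © All rights reserved [This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.]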
Utility Path | Usage/Comments |
Infiniband | Some of these tools may be found in /opt/oracle.SupportTools/ibdiagtools on cells or database servers. Also see the Infiniband Triage wiki page. |
/opt/oracle.SupportTools/ibdiagtools/infinicheck | |
/opt/oracle.SupportTools/ibdiagtools/verify-topology | |
ibqueryerrors | |
/usr/bin/ibdiagnet | Detecting fabric issues |
/usr/sbin/ibaddr | Examining HCA state & guids |
/usr/sbin/ibcheckerrors | Detecting fabric issues |
/usr/sbin/ibcheckerrs | Detecting fabric issues |
/usr/sbin/ibcheckstate | Detecting fabric issues |
/usr/sbin/ibcheckwidth | Detecting fabric issues |
/usr/sbin/ibclearcounters | Reset counters when detecting fabric issues |
/usr/sbin/ibclearerrors | Reset counters when detecting fabric issues |
/usr/sbin/ibdatacounters | Not directly used. perfquery is used instead |
/usr/sbin/ibdatacounts | Not directly used. perfquery is used instead |
/usr/sbin/ibhosts | Listing cells/db nodes |
/usr/sbin/iblinkinfo.pl | Obtaining the fabric topology |
/usr/sbin/ibnetdiscover | Obtaining the fabric topology |
/usr/sbin/ibnodes | Listing cells/db nodes/switches |
/usr/sbin/ibping | Checking IB level connectivity |
/usr/sbin/ibportstate | Testing port failure/disabling bad links |
/usr/sbin/ibqueryerrors.pl | Detecting fabric issues |
/usr/sbin/ibstat | Examining HCA state & guids |
/usr/sbin/ibstatus | Examining HCA state & guids |
/usr/sbin/ibswitches | Listing IB switch names |
/usr/sbin/ibtracert | Examining IB routes |
/usr/sbin/perfquery | Computing throughput, detecting fabric errors |
/usr/sbin/saquery | Not directly used |
/usr/sbin/set_nodedesc.sh | Setting the HCA node description based on node type |
/usr/sbin/sminfo | Determining location of master SM |
/usr/sbin/smpdump | Not directly used |
/usr/sbin/smpquery | Not directly used |
/usr/sbin/vendstat | Not directly used |
/usr/bin/ibv_devices | Listing local HCAs |
/usr/bin/ibv_devinfo | Listing details of local HCAs |
/usr/bin/ibv_rc_pingpong | Determining working status of HCA |
/usr/bin/ibv_srq_pingpong | Determining working status of HCA |
/usr/bin/ibv_uc_pingpong | Determining working status of HCA |
/usr/bin/ibv_ud_pingpong | Determining working status of HCA |
/usr/bin/mstflint | Burning new HCA firmware/obtaining current firmware version |
/usr/bin/ib_rdma_bw | Computing IB level stats for troubleshooting |
/usr/bin/ib_rdma_lat | Computing IB level stats for troubleshooting |
/usr/bin/ib_read_bw | Computing IB level stats for troubleshooting |
/usr/bin/ib_read_lat | Computing IB level stats for troubleshooting |
/usr/bin/ib_send_bw | Computing IB level stats for troubleshooting |
/usr/bin/ib_send_lat | Computing IB level stats for troubleshooting |
/usr/bin/ib_write_bw | Computing IB level stats for troubleshooting |
/usr/bin/ib_write_lat | Computing IB level stats for troubleshooting |
/usr/bin/qperf | Computing throughput for RDS/TCP/SDP protocols |
/sbin/ifconfig | Determining configuration/status of network interfaces |
/usr/bin/ib-bond | Determining active slave interface for bond0 |
/usr/bin/rds-gen | Not directly used |
/usr/bin/rds-info | Examining RDS state |
/usr/bin/rds-ping | Determining RDS connectivity |
/usr/bin/rds-sink | Not directly used |
/usr/bin/rds-stress | Profiling RDS performance |
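As an illustration of how the ibstat output listed above might be checked, the following sketch prints every HCA port whose State is not Active. It assumes the usual ibstat layout (a `CA '<name>'` line, a `Port N:` line per port, and a `State:` line under each port). The function name `inactive_ports` is hypothetical, and the real command is only invoked if ibstat is installed.

```shell
#!/bin/sh
# Read ibstat-style text on stdin and print "<ca> port <n> <state>"
# for each port whose State is anything other than Active.
# Assumption: the "CA '<name>'", "Port N:" and "State:" line layout.
inactive_ports() {
    awk '
        /^CA /                 { ca = $2; gsub(/\047/, "", ca) }    # strip the quotes around the CA name
        /^[ \t]*Port [0-9]+:/  { port = $2; sub(/:/, "", port) }    # remember the current port number
        /^[ \t]*State:/ && $2 != "Active" { print ca, "port", port, $2 }
    '
}

# Run against the local HCAs only when the tool is present.
if command -v ibstat >/dev/null 2>&1; then
    ibstat | inactive_ports
fi
```

No output means every port reported Active. The "Physical state:" lines are deliberately ignored here, since they do not start with "State:".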
Imaging and versions | These tools are related to imaging status and info as well as versions installed |
imagehistory | |
imageinfo | Only on database servers version >= 11.2.1.3 |
/opt/oracle.cellos/CheckHWnFWProfile | Only applicable on cells. With the -d option, it will display versions found. Without options, it will report any mismatches against known correct values. |
/opt/oracle.SupportTools/CheckSWProfile.sh | Only applicable on cells. Without options, displays any mismatch against known good configurations. |
collectlogs.sh | Collects logs from onecommand deployments |
Networking | |
cat /proc/net/bonding/bond* | |
cat /sys/class/net/eth?/operstate | |
cat /sys/class/net/bond*/operstate | |
ifconfig | |
ethtool <interface_name> | Reports information about the interface, such as its link mode capabilities |
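The two operstate commands in this section can be combined into one summary. The sketch below prints each eth?/bond* interface next to its operstate. The function name `report_operstate` and its optional root-directory argument are hypothetical, added only so the function can be pointed at a test tree instead of /sys/class/net.

```shell
#!/bin/sh
# Print "<interface> <operstate>" for every eth? and bond* interface.
# $1 is an optional sysfs root (an assumption for testability);
# it defaults to /sys/class/net.
report_operstate() {
    op_root="${1:-/sys/class/net}"
    for op_file in "$op_root"/eth?/operstate "$op_root"/bond*/operstate; do
        [ -r "$op_file" ] || continue    # an unmatched glob stays literal; skip it
        printf '%s %s\n' "$(basename "$(dirname "$op_file")")" "$(cat "$op_file")"
    done
}

report_operstate
```

Interfaces are listed in glob order, eth? first and bond* second, so a bond that is down is easy to spot next to its slaves.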
Logfiles on both database server and cells | |
/var/log/messages | Older versions of this file will be automatically renamed as messages.<number> with number 1 being the most recent history. |
dmesg | A command, not a file: displays the kernel log |
/var/log/cellos/validations.log | |
/var/log/cellos/validations/*log | |
Logfiles on cells | |
$ADR_BASE/diag/asm/cell/<hostname>/trace/alert.log | Cell's alert log. The cell's trace files are in the same directory as alert.log |
Logfiles on database servers | |
$ORACLE_BASE/diag/asm/+asm/<instname>/trace/alert_<instname>.log | ASM alert logfile |
$ORACLE_BASE/diag/rdbms/<dbname>/<instname>/trace/alert_<instname>.log | DB alert log, one for each running database (there may be more than one DB) |
/u01/app/11.2.0/grid/log/<hostname>/alert<hostname>.log | Grid Infrastructure alert logfile. This log is relatively high-level and will often lead you to one of the logs mentioned in the entry just below this one. |
/u01/app/11.2.0/grid/log/<hostname>/[cssd,crsd,diskmon]/*.log | Logfiles for the CSSD, CRSD, and diskmon processes. These are the processes most likely to have problems, and their logs will expose most issues. |
Infiniband Switches | These commands may be run on IB switches |
sminfo | shows the current subnet master switch in the fabric – there should be exactly one regardless of how many switches are present in the fabric |
ibswitches | lists all IB switches in the fabric |
showunhealthy | shows any unhealthy sensors |
env_test | lists all the data from the environmental sensors in the switch |
nm2version | shows the current versions – use this to determine what version the switch is running right now |
getfanspeed | shows the speed of the internal fans in the switch – can be useful if showunhealthy indicates a problem with one of the fans |
Cell software commands (cellcli and friends) | These commands may be run from within cellcli |
list cell detail | |
list alerthistory | |
list celldisk detail | |
list griddisk detail | |
list lun detail | |
list physicaldisk detail | |
list flashcache detail | |
list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome | |
alter cell validate configuration | |
adrci | show incident |
mdadm --misc --detail /dev/md* | For an overview of the state of the RAID devices on the storage cell |
cat /proc/mdstat | for a view of the status of the devices |
/usr/local/bin/ipconf --verify | |
mdadm -Q --detail /dev/md? | State information on a particular meta device |
<GRID_HOME>/bin/kfod disks=all | lists disks available from DB node for ASM use (run on DB node) |
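The griddisk attribute listing above is easier to scan when healthy rows are filtered out. The sketch below reads that listing and prints the names of grid disks that look unhealthy. It assumes four whitespace-separated columns in the order requested (name, status, asmmodestatus, asmdeactivationoutcome) and treats "active", "ONLINE" and "Yes" as the healthy values. Both the function name `flag_griddisks` and those expected values are assumptions, to be adjusted to what your cells actually report.

```shell
#!/bin/sh
# Read "name status asmmodestatus asmdeactivationoutcome" rows on stdin
# and print the name of every grid disk that deviates from the assumed
# healthy values (active / ONLINE / Yes).
flag_griddisks() {
    awk 'NF >= 4 && ($2 != "active" || $3 != "ONLINE" || $4 != "Yes") { print $1 }'
}

# Run against the local cell only when cellcli is present.
if command -v cellcli >/dev/null 2>&1; then
    cellcli -e "list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome" | flag_griddisks
fi
```

A multi-word asmdeactivationoutcome message also fails the `$4 != "Yes"` comparison, so such disks are flagged as well.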
Hardware | These commands may be run to query hardware status. Unless otherwise noted, they apply to cells and database servers. |
ipmitool sel list | Lists the system event logs – these logs sometimes show HW events that aren’t seen elsewhere. |
ipmitool sunoem cli 'show /SYS' | Shows system serial number, fault_state (overall fault state, not necessarily a rollup – may be a fault on a component-level) |
/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 | All adapter info |
/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -dsply -a0 | Display controller's log |
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0 | Get battery status |
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -a0 | Get battery properties |
/opt/MegaRAID/MegaCli/MegaCli64 -LDinfo -Lall -aALL | Look for WriteThrough in the Current Cache Policy: it means write caching is disabled, which may affect performance. This information is easier to get from cellcli -e list lun attributes name,lunWriteCacheMode,status |
/opt/MegaRAID/MegaCli/MegaCli64 -LDPdInfo -aAll | Helpful to investigate predictive failure if necessary |
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | The Inquiry Data contains the drive firmware, but decoding the string to get the firmware version requires special instructions that are beyond the scope of this list. For the drive FW version, check list physicaldisk attributes physicalFirmware in cellcli. |
lspci [-v [ -v [ -v ]]] | Lists PCI devices. The more -v arguments you add, the more detail it provides |
lsscsi | Especially helpful on cells. Flash cards will show up as MARVELL devices. There should be 16 flash devices listed. If not, there’s a card missing or not visible to the OS. |
/opt/oracle.cellos/scripts_aura.sh | This script lists the flash disks as the cell software will see them |
/opt/oracle.SupportTools/sundiag.sh | Gathers many diagnostic command outputs and important logfiles for analysis of storage cell and disk issues |
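The lsscsi entry above says a cell should list 16 flash devices, each showing up as a MARVELL device. A minimal sketch of that check follows. The function name `check_flash_count` is hypothetical, and matching on the literal string MARVELL simply follows the description in the table.

```shell
#!/bin/sh
# Read lsscsi output on stdin, count the lines mentioning MARVELL,
# and compare the count with the 16 flash devices expected on a cell.
check_flash_count() {
    # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status
    flash_n=$(grep -c 'MARVELL' || true)
    if [ "$flash_n" -eq 16 ]; then
        echo "OK: 16 flash devices"
    else
        echo "WARNING: $flash_n flash devices, expected 16"
        return 1
    fi
}

# Run against the local host only when lsscsi is present.
if command -v lsscsi >/dev/null 2>&1; then
    lsscsi | check_flash_count || echo "A flash card may be missing or not visible to the OS" >&2
fi
```

The non-zero return status lets the function be used directly in a larger health-check script.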