Tomorrow I have to reimage some machines again. I haven't done the "installer" job in a long time and have nearly forgotten it all, so here's a quick refresher, haha~
1. Before reimaging, check and save the information about the key components of the current system, for example:
/opt/oracle.SupportTools/CheckHWnFWProfile -s
/opt/oracle.SupportTools/CheckHWnFWProfile -c loose
/opt/oracle.SupportTools/CheckSWProfile.sh -I dm01sw-ib2,dm01sw-ib3
imageinfo
imagehistory
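If you want to keep these outputs around for comparison after the reimage, something like the following works (my own sketch, not part of the official procedure; the directory name is arbitrary):

ts=$(date +%Y%m%d_%H%M%S)
mkdir -p /root/preimage_$ts && cd /root/preimage_$ts
/opt/oracle.SupportTools/CheckHWnFWProfile -s       > CheckHWnFWProfile_s.out 2>&1
/opt/oracle.SupportTools/CheckHWnFWProfile -c loose > CheckHWnFWProfile_loose.out 2>&1
imageinfo    > imageinfo.out    2>&1
imagehistory > imagehistory.out 2>&1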
2. Following MOS Note 888828.1, find the appropriate image, download it, and extract it, for example:
unzip ImageMaker.tar.zip
tar -pxvf ImageMaker.tar
After untarring the DB image you will find a dl360 directory.
After untarring the CELL image you will find a dl180 directory.
This is because the earliest Exadata, the V1 launched together with HP, was built entirely on HP servers: the compute node model was the DL360 and the storage node model was the DL180, and the directory names have simply never been changed since.
There are four ways to reimage:
1. With a USB flash thumb drive
2. By building an ISO image and attaching it through the ILOM remote media redirection (or, if you burn it to disc, by booting it as a DVD)
3. By building a small emergency-boot ISO (something like a rescue disk) and hosting the image payload on NFS
4. With PXE+NFS
For a quarter rack none of these four methods is complicated; the USB drive and the ISO image are the simplest and the least hassle.
For a full rack, or for a large batch of reimage work, the USB drive is obviously a bad idea (it would wear you out); use PXE+NFS or the ISO image instead.
Whichever method you choose, the reimage media is built with the same makeImageMedia.sh command, with the following syntax:
./makeImageMedia.sh [-preconf <prconf.csv file full pathname>] [ <dvd iso file name> | [-pxe [-pxeout <pxe output filename> ]] | [<nfs iso filename> -nfs nfs_share -dir nfs_dir [-nfs_ip <ip addr for nfs server>] [-dhcp] [-logpath <[lognfs_ip:]/full path to writeable nfs share>] ] ]
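To make that synopsis concrete, here are the typical invocations for each media type, pieced together from the examples later in this post and from the readme (file names, csv paths and NFS paths below are placeholders):

./makeImageMedia.sh -preconf /tmp/preconf.csv                        # USB: prompts for the target USB device
./makeImageMedia.sh -preconf /tmp/preconf.csv db_img.iso             # standalone ISO for DVD or ILOM virtual media
./makeImageMedia.sh boot.iso -nfs /exports/images -dir dl360/images  # small boot ISO + image payload on NFS
./makeImageMedia.sh -pxe -pxeout dl360                               # PXE kernel/initrd/nfsimg under ./PXE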
Exadata ships from the factory with two operating systems, Linux and Solaris x86. In practice, at least the vast majority of domestic customers choose Linux, so after installation we have to run a reclaim operation.
For a reimage, we can instead pass the -dualboot=no option when building the USB drive, the ISO image, or the PXE media, which saves the reclaim step after the reimage (as I recall, reclaim takes more than an hour).
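For completeness: if a node does end up dual-boot, the reclaim itself is done with reclaimdisks.sh under /opt/oracle.SupportTools. Roughly as follows (from memory, so verify the flags against your image version):

/opt/oracle.SupportTools/reclaimdisks.sh -check           # show whether the node is dual-boot
/opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim   # reclaim the Solaris space (takes an hour or more)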
Exadata ships with a set of default IP addresses; this information can be found in the documentation that comes with the machine.
Where to find the official Exadata documentation
The documentation lists all of the factory-default IPs, and you can see that the IP layout maps to the physical positions in the rack (of course, the X4 no longer has the spine switch at the bottom; if you need one for cascading racks, it can be bought separately).
For example, on a quarter rack you can work out each component's management IP, ILOM IP and so on from its position.
You can supply a new IP configuration file when you reimage, so that the machine comes straight up with the full set of new IPs you specified. If you don't, then after the reimage all the IPs on the machine are the default ones from the documentation.
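Before building any media with a preconf.csv, it is worth validating its syntax. The readme quoted at the end of this post suggests using the ipconf tool shipped inside the ImageMaker tree; run it from the dl360 (or dl180) directory, with your own csv path:

./initrd/opt/oracle.cellos/ipconf -verify -preconf /tmp/preconf.csv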
For example, the following procedure builds a USB drive; I used a prepared preconf.csv file to specify the new IPs:
1. Booting from a USB drive
[root@dm01db01 dl360]# ./makeImageMedia.sh -preconf /tmp/preconf.csv
Done. Pre config verification OK
Please wait. Calculating md5 checksums for cellbits ...
Please wait. Making initrd ...
199367 blocks
Please wait. Calculating md5 checksums for boot ...
Choose listed USB devices to set up the Oracle CELL installer
sdd  Approximate capacity 15441 MB
Enter the comma separated (no spaces) list of devices or word 'ALL' for to select all: sdd
sdd will be used as the Oracle CELL installer
All data on sdd will be erased. Proceed [y/n]? y
The number of cylinders for this disk is set to 1922.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
The number of cylinders for this disk is set to 1922.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): Command action
   e   extended
   p   primary partition (1-4)
Partition number (1-4):
First cylinder (1-1922, default 1):
Last cylinder or +size or +sizeM or +sizeK (1-1922, default 1922):
Command (m for help): The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
umount2: Invalid argument
umount: /dev/sdd1: not mounted
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1929536 inodes, 3857600 blocks
192880 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3951034368
118 block groups
32768 blocks per group, 32768 fragments per group
16352 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Copying files... will take several minutes
    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)
 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub>
Done creation of installation USB for DL360
[root@dm01db01 dl360]#
Then boot the server from this USB drive. After it starts you have to confirm twice (it prompts you, so you don't wipe a machine by accident, haha~), and after that there is nothing left for you to do; it just gets on with it...
It reboots a few times along the way; on the last one it prompts you to pull out the USB drive, then reboot once more and you're done.
2. Using an ISO image
Since the ILOM provides remote redirection of an ISO file, we can log in to the ILOM, attach the ISO as a CD-ROM image, and then reset the server from the ILOM.
On an X2 you can then go off and have some tea; on an X3 you still have to wait for the boot and confirm booting from CDROM in the BIOS, and then go have your tea...
With the ISO image method, whether or not -dualboot=no is given, the reimaged compute node no longer dual-boots; you can confirm this with reclaimdisks.sh -check. The ISO method is used as follows:
[root@lunar dl360]# ./makeImageMedia.sh -preconf ../preconf_db.csv -stit -notests diskgroup -nodisktests db_img112330.iso
Done. Pre config verification OK
Please wait. Calculating md5 checksums for cellbits ...
Calculating md5 checksum for exaos.tbz ...
Calculating md5 checksum for dbboot.tbz ...
Calculating md5 checksum for dbfw.tbz ...
Calculating md5 checksum for kernel.tbz ...
Calculating md5 checksum for ofed.tbz ...
Calculating md5 checksum for sunutils.tbz ...
Calculating md5 checksum for hputils.tbz ...
Calculating md5 checksum for c7rpms.tbz ...
Calculating md5 checksum for commonos.tbz ...
Calculating md5 checksum for debugos.tbz ...
Calculating md5 checksum for dbrpms.tbz ...
Please wait. Making initrd ...
......
With the ISO image method there is no USB drive to pull out, haha~. It reboots itself a few times (about 2 or 3, I forget), and once "Installation SUCCESSFUL" appears, you're done.
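If you would rather not click through the ILOM web GUI for the reset, a rough alternative (my own sketch; the ILOM hostname and credentials are placeholders, and on an X3 you may still have to confirm the boot device in the BIOS) is to set a one-time boot device and power cycle via ipmitool from any machine that can reach the ILOM:

ipmitool -I lanplus -H dm01db01-ilom -U root chassis bootdev cdrom
ipmitool -I lanplus -H dm01db01-ilom -U root chassis power cycle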
3. Using ISO image + NFS
I haven't tried this one, but from the readme it looks like you build an iso file and place the image bits in a subdirectory, which of course sits on an NFS share:
makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132
The readme has the detailed steps; follow it to complete this method.
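For reference, the readme also shows a variant where the iso is not built on the NFS server itself; in that case you pass the NFS server's IP explicitly and afterwards copy the image bits to the server at exactly the same path (the IP and paths are the readme's sample values; the readme spells the option both -nfsip and -nfs_ip, so check your makeImageMedia.sh):

./makeImageMedia.sh x.iso -nfsip 123.123.123.123 -nfs /exports/images -dir dl180/11132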
4. Using PXE+NFS
(1) First make sure TFTP/PXE boot support is in place; if not, install syslinux (which provides pxelinux.0):
yum install syslinux
(2) Build the image
cd dl360/
./makeImageMedia.sh -pxe
Check the image files:
cd /tftpboot/linux-install/dl360/PXE
ls -l
-rw-r--r-- 1 root root   38813575 Aug 19 10:39 initrd-11.2.3.2.1-130109-DL360.img
-rw-r--r-- 1 root root 1325076480 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar
-rw-r--r-- 1 root root         69 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar.md5
-r-xr-xr-x 1 root root    3688864 Aug 19 10:39 vmlinux-11.2.3.2.1-130109-DL360

cd /tftpboot/linux-install
ls -l
drwxrwxr-x 7 root root       4096 Aug 19 10:39 dl360
-rw-r--r-- 1 root root   38813575 Aug 19 10:39 initrd-11.2.3.2.1-130109-DL360.img
drwxr-xr-x 2 root root       4096 Aug 16  2012 msgs
-rw-r--r-- 1 root root 1325076480 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar
-rw-r--r-- 1 root root         69 Aug 19 10:39 nfsimg-11.2.3.2.1-130109-DL360.tar.md5
-rw-rw-r-- 1 root root      13100 Jul 25  2011 pxelinux.0
drwxr-xr-x 2 root root       4096 Aug 19 09:15 pxelinux.cfg
-r-xr-xr-x 1 root root    3688864 Aug 19 10:39 vmlinux-11.2.3.2.1-130109-DL360
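Since a .md5 file is generated alongside the nfsimg tar, you can verify the bits before going any further (a small extra check of my own, assuming the .md5 file is in standard md5sum format):

cd /tftpboot/linux-install/dl360/PXE
md5sum -c nfsimg-11.2.3.2.1-130109-DL360.tar.md5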
(3) Configure the NFS exports and start the NFS server
cat /etc/exports
service nfs restart
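The post does not show the exports file itself; a minimal entry matching the root-path used in the dhcpd.conf below would look roughly like this (the subnet and export options are my assumptions, adjust to your environment):

# /etc/exports
/tftpboot/linux-install  10.187.114.0/23(rw,no_root_squash,sync)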
(4) Install the TFTP server
yum install tftp-server
chkconfig --level 345 tftp on
(5) Edit the PXE boot configuration file served by TFTP:
/tftpboot/linux-install/pxelinux.cfg/default
Note that the kernel vmlinux-11.2.3.2.1-130109-DL360 referenced in it is the one we generated above.
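For reference, a pxelinux.cfg/default along the lines of Example 1 in the readme, adapted to the file names above, might look like this (the sk= path assumes the nfsimg tar was copied into /tftpboot/linux-install as shown in the listing earlier, and the preconf path is only an illustration):

default linux
prompt 1
timeout 72
label linux
kernel vmlinux-11.2.3.2.1-130109-DL360
append initrd=initrd-11.2.3.2.1-130109-DL360.img pxe stit updfrm dhcp sk=10.187.115.250:/tftpboot/linux-install preconf=10.187.115.250:/tftpboot/linux-install/preconf.csv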
(6) Configure DHCP
yum install dhcp
mv /etc/dhcpd.conf /etc/dhcpd.orig
chkconfig --level 345 dhcpd on
Check /etc/dhcpd.conf, for example:
option ip-forwarding false;     # No IP forwarding
option mask-supplier false;     # Don't respond to ICMP Mask req
subnet 10.187.114.0 netmask 255.255.254.0 {
  option routers 10.187.114.1;
}
group {
  next-server 10.187.115.250;                                 ###### this is the PXE server
  filename "linux-install/pxelinux.0";
  option root-path "10.187.115.250:/tftpboot/linux-install";
  host exadbmel02 {
    hardware ethernet 00:21:28:A3:27:68;                      ###### MAC address of eth0
    fixed-address 10.187.115.225;                             ###### eth0 of the node to be reimaged
  }
}
You can confirm the eth0 information here through the ILOM web interface: System Information -> Components -> /SYS/MB/NET0
Or ssh to the ILOM and run:
show /SYS/MB/NET0
(7) Restart the relevant network services:
service dhcpd restart
service xinetd restart
service iptables stop
With PXE set up like this, PXE+NFS is ready to use; the rest of the process is similar to the ISO image method.
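To kick off a node without touching its console, the readme also notes that you can force a one-time PXE boot from an already imaged system, which is handy for mass re-imaging:

ipmitool chassis bootdev pxe
reboot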
For the four reimage methods above and their detailed steps, see the readme:
Copyright (c) 2009, 2011, Oracle and/or its affiliates. All rights reserved.
================================================================
Imaging install media options and imaging procedures for Exadata
================================================================
DO EVERYTHING AS root USER
|-------------------------------------------------------------------------|
| NOTE: FOR SUN Factory the process involves 2 rounds of PXE+NFS imaging   |
|       Scroll all the way to the end of the document to see the overall   |
|       steps in the process and sample pxe configuration files.           |
|-------------------------------------------------------------------------|

Creation of the installation USB or ISO:

Download the production ImageMaker.tar.zip files on some machine with Oracle
Enterprise Linux 64bit or RHEL 64 bit that has grub 0.97 and has tar with
bzip2 support - grub --version will show the grub version

As root user extract the ImageMaker.tar.zip file
  unzip ImageMaker.tar.zip
  tar -pxvf ImageMaker.tar
Cell node image extracts to dl180
DB node image extracts to dl360

The makeImageMedia.sh script inside the above directories is used to create
the actual image installation media. The installation media can be
1. USB flash thumb drive
2. ISO image that may be used on DVD or as remote virtual media using
   LightsOut remote virtual media capabilities.
3. ISO+NFS - where a small iso file is used to boot the system and the
   imaging payload is hosted on a NFS server.
4. PXE+NFS

---------------------------------------------------------------------
| Run all commands AS root from inside the dl180 or dl360 directory |
---------------------------------------------------------------------

./makeImageMedia.sh [-preconf <prconf.csv file full pathname>]
  [ <dvd iso file name> |
    [-pxe [-pxeout <pxe output filename> ]] |
    [<nfs iso filename> -nfs nfs_share -dir nfs_dir [-nfs_ip <ip addr for nfs server>]
      [-dhcp] [-logpath <[lognfs_ip:]/full path to writeable nfs share>] ] ]

Install media preparation
-------------------------

USB thumb drive - also known as the CELLINSTALL USB
---------------------------------------------------
NOTE: It is best to have no other external USB storage devices connected to
the machine on which you prepare the installer USB s.
Insert empty USB thumb drives of size between 2GB and 32GB, and follow
prompts after executing:
  ./makeImageMedia.sh

ISO - That may be burnt on DVD and used for install
---------------------------------------------------
  ./makeImageMedia.sh <iso file name>
  Example: ./makeImageMedia cell.iso

NFS+ISO:
--------
nfs iso is either created on the nfs server itself where the nfs export path
is nfs_share e.g. /exports/images and nfs_dir is the subdirectory of
nfs_share where image bits are copied by the iso creation command
e.g. dl180/11132.
OR The iso may be created anywhere using the nfsip option to supply the ip address of the nfs server and then the contents of nfs_share directory must be copied to nfs server at the EXACT same path as nfs_share NOTE: nfs iso MUST either be built on the nfs server as root user OR you must supply the ip address for nfs server with -nfsip Example: nfs iso built on the nfs server: makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132 Will create x.iso and copy image bits to /exports/images/dl180/11132 You can copy the preconf.csv file then to /exports/images/dl180/11132 You can of course embed the preconf.csv file in the iso itself with makeImageMedia.sh x.iso -nfs /exports/images -dir dl180/11132 \ -preconf preconf.csv nfs iso NOT built on nfs server: makeImageMedia.sh x.iso -nfsip 123.123.123.123 -nfs /exports/images \ -dir dl180/11132 Will create x.iso and copy image bits to /exports/images/dl180/11132 You can copy the preconf.csv file then to /exports/images/dl180/11132 You must now copy the entire contents of /exports/images/dl180/11132 onto the real nfs server 123.123.123.123 exactly at path /exports/images/dl180/11132 and the nfs server must export /exports/images -logpath option now may be passed to makeImageMedia.sh to extract the logs from imaging and zero and first boot validations. See the logpath option syntax in PXE support -dhcp option now may be passed to makeImageMedia.sh to use dhcp to get initial ip address during imaging. -multiprof option will create multiprofile images Optional command line options for USB / ISO / NFS+ISO installation media -------------------------------------------------------------------------- -factory print special [FACTORY_...] messages for use by factory. -kerver <kernel_version> overwrite default kernel. By defaut installing kernel version depends on hardware type (system product name). List of kernels and hardware dependencies defined in the "kernel_ver" line at the very top of makeImageMedia.sh script. -dualboot <yes|no> overwtire default dual boot behaviour. It's only applicable for the db node. By default dual boot feature (installing Linux image and prereserving space for Solaris) depends on hardware type (system product name). By default X4170 M2 and X4800 (G5) servers have dual boot. -stit signals to force reimage even if installation exists on the machine -reboot-on-success Do not wait for operator to power off machine on successful image, just reboot. Useful for mass unattended imaging using preconf option. DO NOT use in factory process. -nodisktests do not run disktests at zero boot. They take 6-14 hours! -notests <group for validations to be skipped, e.g. diskgroup> skips all validations with given vldgroup name. For example, if diskgroup is given as the group name, then each validation with vldgroup set to diskgroup will be skipped. diskgroup today includes disktests, calibration and diskhealth. This is one more way to skip disktests in addition to the nodisktests option. PXE+NFS ------- Do everything as root user 0. tar -pxvf the ImageMaker.tar file resulting in dl180 or dl360 directory. cd to the directory dl180 or dl360 1. 
You make PXE build using ./makeImageMedia.sh -pxe [-pxeout <ImageName>] This will create 3 files in ./PXE directory kernel - vmlinux-<version>-ImageName-<DL180 or DL360 cell or db respectively> initrd - initrd-<version>-ImageName-<DL180 or DL360 cell or db respectively>.img image - nfsimg-<version>-ImageName-<DL180 or DL360 cell or db respectively>.tar The kernel and initrd files are sent to the node being imaged by the PXE server. The image tar file should be extracted in the nfs_share directory by root user as shown in next step. Example: ./makeImageMedia.sh -pxe -pxeout dl180 PXE nfs image: /dani/11132/dl180/./PXE/nfsimg-11.1.3.2.0-dl180-DL180.tar PXE initrd: /dani/11132/dl180/./PXE/initrd-11.1.3.2.0-dl180-DL180.img PXE kernel: /dani/11132/dl180/./PXE/vmlinux-11.1.3.2.0-dl180-DL180 2. Prepare nfs share Copy the nfsimg tar and md5 files to nfs_share directory. Example: Using nfs_share from the following example, where the nfs server that holds the image content is 123.45.67.189 and the nfs_share on it is /vol/exadata/dl180, and assuming the image tar file was in /root. cp /root/nfsimg-11.1.3.2.0-dl180-DL180.* /vol/exadata/dl180 Releases older than 11.2.1.3.0 should exptract the nfsimg files: Extract nfsimg-11.1.3.2.0-dl180-DL180.tar to the nfs_share "tar -x -p -v -C <nfs_share> -f nfsimg-11.1.3.2.0-dl180-DL180.tar" Example: Using nfs_share from the following example, where the nfs server that holds the image content is 123.45.67.189 and the nfs_share on it is /vol/exadata/dl180, and assuming the image tar file was in /root. "tar -x -p -v -C /vol/exadata/dl180 -f /root/nfsimg-11.1.3.2.0-dl180-DL180.tar" 3. Configure the PXE server Assume PXE server has both DHCP and TFTP daemons started on it. Copy initrd and kernel into /tftpboot on the PXE server. Create DHCP configuration in /tftpboot/pxelinux.cfg/<hexadecimal_ip_address_of_the_imaging_machine>. You can also use the MAC address for the eth0 NIC for filename instead of hexadecimal ip address. For ip address 123.123.123.123 the filename based on ip address will be 7B7B7B7B. If the MAC address is used, and it was 12:34:56:78:90:12, then the file will be 01-12-34-56-78-90-12. Please check your specific PXE server requirements, the above names are what were used in our testing and your PXE server may need slightly different names or settings. Examples of configuration file Example 1 --------- # With dhcp and disktests default linux prompt 1 timeout 72 label linux kernel vmlinux-11.1.3.2.0-dl180-DL180 append initrd=initrd-11.1.3.2.0-dl180-DL180.img pxe stit updfrm dhcp sk=123.45.67.189:/vol/exadata/dl180 preconf=123.45.67.123:/vol/configs/exadata/allexadataservers.csv Example 2 --------- # With explicit ethX address instead of dhcp and no disktests or # calibration default linux prompt 1 timeout 72 label linux kernel vmlinux-11.1.3.2.0-dl180-DL180 append initrd=initrd-11.1.3.2.0-dl180-DL180.img pxe stit updfrm reboot-on-success notests=diskgroup sk=123.45.67.189:/vol/exadata/dl180 eth0=123.123.123.123:255.255.254.0:123.123.123.1 preconf=123.45.67.123:/vol/configs/exadata/allexadataservers.csv Explanation of command line options: ----------------------------------- pxe - signals this is pxe imaging mandatory parameter factory - print special [FACTORY_...] messages for use by factory. optional parameter. stit - signals to force reimage even if installation exists on the machine optional parameter updfrm - check hardware and firmware versions. Update firmware where applicable. optional parameter. 
kerver=<kernel_version> overwrite default kernel. By defaut installing kernel version depends on hardware type (system product name). List of kernels and hardware dependencies defined in the "kernel_ver" line at the very top of makeImageMedia.sh script. dualboot=<yes|no> overwtire default dual boot behaviour. It's only applicable for the db node. By default dual boot feature (installing Linux image and prereserving space for Solaris) depends on hardware type (system product name). By default X4170 M2 and X4800 (G5) servers have dual boot. dhcp - dhcp is optional parameter and should NOT be given with eth0 option If given, dhcp is used to obtain the initial dhcp address instead of requiring to pass the eth0 information. ethX=<ip>:<netmask>:<gateway> - ethX is optional parameter. Do NOT give with dhcp above. - where X is one of 0,1,2,3 on Sun and 0 on HP machines If both dchp and ethX are absent imaging will enter interactive mode and ask for Ethernet ip, netmask and gateway information sk=<nfsip>:<nfs_share> - sk is mandatory parameter for the PXE boot. MUST use IP address not the hostname for NFS server. nfs_share is the full path to directory where the 3 files from PXE directory are available logpath=[nfsip:]<full path to writeable nfs share> - logpath is optional parameter If given it will copy the imaging, zero and first boots logs to the writable nfs share location in single tar bzip2 file <serial_num>.tbz. The serial_num is the serial number of the system obtained as dmidecode -s system-serial-number ----------------------------------------------- Development ONLY options NOT for use in Factory ----------------------------------------------- reboot-on-success - Do not wait for operator to power off machine on successful image, just reboot. Useful for mass unattended imaging using preconf option. DO NOT use in factory process. optional parameter multiprof - Image the node as multi profile enabled This parameter is optional NOTE: Do NOT build the images with -multiprof to makeImageMedia.sh If you do that image can only be used to do multi profile nodes. nodisktests - do not run disktests at zero boot. They take 6-14 hours! This parameter is optional notests=<group for validations to be skipped, e.g. diskgroup> - Skips all validations with given vldgroup name. For example, if diskgroup is given as the group name, then each validation with vldgroup set to diskgroup will be skipped. diskgroup today includes disktests, calibration and diskhealth. This is one more way to skip disktests in addition to the nodisktests option. This parameter is optional preconf=[<preconf_nfsip>:]<full path name of preconf_file on nfs server> - preconf is optional parameter The nfsip MUST be IP address of the NFS server not its hostname The preconf_nfsip can be same or different than the nfsip in sk option, allowing the preloaded configuration file to reside on different subtree or entirely different nfs server from that of the imaging bits on the nfsip nfs server. 
Installation process -------------------- Preinstall steps for HP DL180 ----------------------------- Imaging will stop and require you to confirm to continue if: a) P400 Smart Array disk controller is not in PCIe x8 slot b) There are additional USB s besides the CELLINSALL USB and the blank USB for use as CELL boot USB c) All drives are not identical model and make On the target machine set up the BIOS boot sequence such that - Hard disk drives is the first in the boot sequence - Within the hard disk drives option the USB flash disk(s) are before the P400 disk controller - Disable removable drives Use the screen shots in the doc directory as guide. Preinstall steps for HP DL360 ----------------------------- Imaging will FAIL if following are true: a) Infiniband card is not in PCIe x8 slot On the target machine set up the BIOS boot sequence such that - USB flash is first in the boot sequence Use the screen shots in the doc directory as guide. Preinstall steps for SUN X4275 ------------------------------ Imaging will stop and require you to confirm to continue if: a) LSI 9261-8i disk controller is not in PCIe x8 slot b) There are additional USB s besides the CELLINSALL USB and the blank USB for use as CELL boot USB c) All drives are not identical model and make On the target machine set up the BIOS boot sequence such that - The CELLINSTALL USB is first in boot order - The internal CELLBOOT USB (UNIGEN) is the second after the CELLINSTALL USB - The LSI disk controller is next Use the screen shots in the doc directory as guide. Preinstall steps for SUN X4170 ------------------------------ Same as SUN X4275 - Ignore any messages as stated in the Things to ignore section above - If BIOS, Disk controller or disk firmware needs update the imaging process will update the firmware and try to power cycle the machine using ipmi. ----- ALERT: It is possible that the machine may not boot back after such power ----- cycle due to issues with BIOS boot order being reset or the ipmi power cycle not properly able to complete. Please manually power cycle the machine to continue imaging. - After imaging and automatic creation of the internal CELL boot USB machine will launch several health checks and long disk tests. SAS 600GB drive disk tests will take up to 9 hours SATA 2TB drive disk tests will take up to 14 hours SAS 450GB drive disk tests will take up to 12 hours MDL SAS 1TB (SATA 7200RPM 1TB drives) drive disk tests will take up to 48 hours (A) When Success of validation tests When all tests pass the machine will indicate the success of installation on the console and wait for you to power off the machine. In case of a reimage the machine may come to "localhost login:" prompt. Login as root/welcome1 and reboot the machine. (B) When failure of validation tests When a validation fails the machine will prompt you to choose to rerun the validations on reboot. You must choose to re-run the tests. After you finish making the choice the machine will either present "localhost login:" prompt or exit to a shell. You can logon as root and password welcome1 if login prompt is presented. Please 1. Examine the log files in /var/log/cellos/validations/ to identify the cause of failure. Correct the problem and reboot the machine. The machine will rerun the tests unless you had chosen not to re-run them. 2. If you can not easily identify the cause, please reboot the machine to see if the checks pass. If you get prompted for hostname and other configuration information, you should poweroff the machine. 
Install steps ------------- Using CELLINSTALL USB: ---------------------- Insert the CELLINSTALL usb in any USB slot on target machine and boot the machine - Assuming the machine is bare metal it will automatically boot from the USB and the imaging process will start automatically Using ISO: --------- Boot the system using the ISO and follow prompts. For bare metal imaging will start automatically Using ISO+NFS: ------------- Boot the system with the iso and follow prompts. For bare metal imaging will start automatically Using PXE+NFS: ------------- Boot the system using PXE by pressing F12 after during BIOS initialization splash screens. For bare metal imaging will start automatically. You can also use "ipmitool chassis bootdev pxe" from already imaged system, to force the system to boot one time on next reboot. This is useful for mass re-imaging of systems. Unattended first boot configuration: ----------------------------------- There is now support for unattended first boot configuration as long as you build it in the image. The steps to use unattended first boot configuration need you to build the image media with new option to makeImageMedia.sh See the sample_preconf.csv file for example preconfiguration file. This file can be (A) passed to the makeImageMedia.sh, and/or (B) it can be copied to the nfs_share location when using PXE and/or (C) it can be copied to the nfs_share/nfs_dir location if using nfs iso. (D) it can be copied to the / directory of the install USB If the file is inserted in the install media using (A), and if it is also passed with methods (B) to (D), then the file from (B) to (D) takes precedence over the file passed using (A). This allows you to update the file after creating the installer media - USB or the iso+nfs or the pxe+nfs, so that you can image more machines using the same media. Preparing the preconf.csv file: ------------------------------- 1. First line with "Cell Preconfig version" is mandatory 2. Title line starting with "Hostname, Domain, ...." is mandatory and format is fixed 3. You MUST NOT change these two lines. 4. The line started with "common" keyword in Hostname column is optional, and provides a way of supplying common values. 4.1. The "common" line MUST not contain these a. "eth0 mac address" b. "eth0 ip" c. "bond0 ip" d. "hostname" 4.2. Multiple "common" lines are allowed. Each next common line overwrites all previous common settings. 5. Any individual line for the host MUST have unique hostname, eth0 mac address, eth0 ip and bond0 ip values. 5.1. If individual column is empty the value from the common column is used 5.2. All values are mandatory except nameservers and NTP servers 5.3. The full hostname is result of "$HOSTNAME.$DOMAIN" 5.4. Nameservers and NTP servers have to be separated by space. 6. Any line starting with # is treated as comment line. NOTE: It's a good practice to validate syntax of .csv file. You can do it using <dl180 or dl360>/initrd/opt/oracle.cellos/ipconf -verify -preconf <path_to_csv_file> dl180 or dl360 are top level directories when you extract the ImageMaker tar.zip files. 
-------------------------------------------------------------------------------- SAMPLE of preconf.csv file - See csv files in /opt/oracle.SupportTools/firstconf -------------------------------------------------------------------------------- Things to ignore safely during install: -------------------------------------- HP: -- 1) "cciss/cXdYpZ Invalid partition table" The above message will repeat several times with X,Y, and Z are some integers - X is the P400 smart array disk controller slot number - Y is the disk number starting with 0 for the slot - Z is the partition number on the disk Reason for this is unknown and the message is harmless SUN+HP: ------ 1) "RAID1 conf printout:..." This is the software RAID printout we are not yet able to find a way to suppress 2) tar: <file>: <date stamp> is X s in future Known Issues and work around: ----------------------------- 1) ONLY on HP DL180: With the install USB and a blank USB in the machine, you may get "Disk error. Invalid disk press any key to continue..." Please fix the BIOS boot sequence as indicated in various screen shots (.gif files in ScreenShots directory) 2) Installer USB does not work from some USB slots: Solution: Try different slots - if all else fails create fresh usb and try if that also fails get a new machine. Miscellaneous tips: ------------------- NOTE: 1) The disk controller on the Exadata cell nodes must be in PCIe x8 slot for optimal performance (performance can degrade 50% if this is not the case). 2) The Infiniband card must be in PCIe x8 slot on compute aka database nodes You can confirm the speed of the slot as follows: as root run command lspci -vvv on freshly imaged box and check for the Link speed for disk controller, the Infiniband cards. Look at line marked ===> in the sample outputs of the lspci -vvv command below. 
You should have matching lines for the disk controller and Infiniband cards on your machines - If NOT then you will need to open the machine and relocate the corresponding cards to correct slots For DL180 - the Exadata cell nodes: ---------------------------------- 03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03) Subsystem: Hewlett-Packard Company P400 SAS Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 169 Region 0: Memory at fcc00000 (64-bit, non-prefetchable) [size=1M] Region 2: I/O ports at e800 [size=256] Region 3: Memory at fcbff000 (64-bit, non-prefetchable) [size=4K] Expansion ROM at fcb80000 [disabled] [size=256K] Capabilities: [b0] Express Endpoint IRQ 0 Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag- Device: Latency L0s unlimited, L1 unlimited Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported- Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ Device: MaxPayload 128 bytes, MaxReadReq 2048 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0 Link: Latency L0s <2us, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- ====> Link: Speed 2.5Gb/s, Width x8 Capabilities: [d4] MSI-X: Enable+ Mask- TabSize=4 Vector table: BAR=0 offset=000fe000 PBA: BAR=0 offset=000ff000 Capabilities: [e0] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [ec] Vital Product Data Capabilities: [100] Power Budgeting 05:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0) Subsystem: Mellanox Technologies MT25418 [ConnectX IB DDR] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 169 Region 0: Memory at fce00000 (64-bit, non-prefetchable) [size=1M] Region 2: Memory at fa800000 (64-bit, prefetchable) [size=8M] Region 4: Memory at fcdfe000 (64-bit, non-prefetchable) [size=8K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [48] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Mask- TabSize=256 Vector table: BAR=4 offset=00000000 PBA: BAR=4 offset=00001000 Capabilities: [60] Express Endpoint IRQ 0 Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag+ Device: Latency L0s <64ns, L1 unlimited Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 128 bytes, MaxReadReq 512 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- ====> Link: Speed 2.5Gb/s, Width x4 For DL360 - the database aka compute nodes: ------------------------------------------ 13:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20) Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 177 Region 0: Memory at fdf00000 (64-bit, non-prefetchable) [size=1M] Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M] Region 4: Memory at d0000000 (64-bit, prefetchable) [size=128M] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [48] Vital Product Data Capabilities: [90] Mescell Signalled Interrupts: 64bit+ Queue=0/5 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 Vector table: BAR=0 offset=00082000 PBA: BAR=0 offset=00082200 Capabilities: [60] Express Endpoint IRQ 0 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag- Device: Latency L0s <64ns, L1 unlimited Device: AtnBtn- AtnInd- PwrInd- Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported- Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- ====> Link: Speed 2.5Gb/s, Width x8 ########################### SUN FACTORY OVERALL PROCESS ################################################# As of Sep 27, 2009: NOTE: ---------------------------- ALL steps below assume at most single DBM i.e. one full rack. If more than one DBM are to be interconnected, then the below scheme needs to be modified accordingly to avoid ip address conflicts. NOTE: ---------------------------- The factory process at SUN will make 2 imaging passes using PXE+NFS. The first successful imaging pass will leave the systems (X4170 and X4275) configured with all the basic network information such as the hostname, ip address, etc. Factory will do this using the preconf option to imaging to feed the configuration information. The exact mandatory PXE options are listed below. Once the nodes are fully up, factory can run various tests on them. The mandatory tests are listed below. Assuming all tests pass in the first round, the second round of imaging will wipe out the first image and re-image the nodes. This round will use different options to PXE. The list of options to use is listed below. This second round will leave the systems configured with fixed private hostnames, ip addresses as oulined in following sections. To do this Factory will need to ensure that the MAC addresses for eth0 interfaces used in the first round are plugged in to the template preconf file for second round in exact order as documented in the template file. Factory will need to confirm at end of round 2 that the systems can come up and carry the private ip addresses and hostnames. Note that "pxe" and "sk" options are mandatory for PXE. Out of optional parameters: Round 1 image PXE options: -------------------------- Must use: 1. stit 2. updfrm 3. factory 4. reboot-on-success 5. preconf Must NOT use: 1. notests=diskgroup 2. multiprof Round 2 image PXE options: -------------------------- Must use: 1. stit 2. updfrm 3. factory 4. notests=diskgroup 5. preconf - based on second round template 6. reboot-on-success Must NOT use: 1. multiprof Distributed/Rack-wide tests to be run after successful image at Round 1: ----------------------------------------------------------------------- Login to any one node as root/welcome1. 
Let us call this the master test node (MTN). 0. Examine the validations success a. cd /var/log/cellos b. Examine the file vldrun.xx.log for any failures. If there are failures, then examine for each failed validation the, suggested remedy file "<validation name>.SuggestedRemedy" in the validations subdirectory. If no remedy file exists look for the corresponding log files to track down the failure cause and correct it. 1. Check Infiniband Switch software and firmware versions a. cd /opt/oracle.SupportTools b. ./CheckSWProfile.sh -I <comma separated list of switch ips no spaces> Follow prompts 2. Establish root user ssh equivalence between MTN and any other node: a. ssh-keygen -t dsa b. ssh-keygen -t rsa Accept defaults so the ssh keys are created for root user c. create a file called "nodes" listing one hostname (short hostname - i.e. hostname -s output) per line for all nodes in the rack. d. cd /opt/oracle.SupportTools e. ./setup_ssh_eq.sh "full pathname to the nodes file" root welcome1 This pushes the ssh keys to all nodes establishing the ssh trust for root user to all nodes from MTN. 3. Verify the Infiniband connectivity and topology: a. cd /opt/oracle.SupportTools/ibdiagtools b. ./verify-topology -factory [-t quarterrack] Sample output of this for successful runs is in SampleOutputs.txt file in same directory for a full rack. 4. Check that the Infiniband performance is acceptable. Failures indicate problems with links, badly seated HCA s, wrong configuration on switch, etc. a. cd /opt/oracle.SupportTools/ibdiagtools b. Create a file of DB nodes (Sun X4170 or HP DL360) one Infiniband IP address per line. If there are less than 8 nodes in full DBM and less than 4 in half DBM there is some problem in IB connectivity ibhosts | awk '/S [0-9.]* / {print $8}' | tee dbips.ora c. Create a file of Cell nodes (Sun X4275 or HP DL180) one Infiniband IP address per line. If there are less than 8 nodes in full DBM and less than 4 in half DBM there is some problem in IB connectivity ibhosts | awk '/C [0-9.]* / {print $8}' | tee cellip.ora d. Setup all to all root user ssh equivalence d.1. Create a file allip.ora with one ip address per node ibhosts | awk '/[SC] [0-9.]* HCA\-1/ {print $8}' | tee allip.ora Now use the setup_ssh_eq.sh script to setup user equivalence from current node to all other nodes without a password ../setup_ssh_eq.sh allip.ora root <root_password> d.2 Now to setup user equivalence between all other nodes in the rack.. ./infinicheck -b -g dbips.ora -c cellip.ora -u root -s Respond to prompts or you can expect script this part and automate it. e. Create a smaller file dbip.ora out of dbips.ora with just one ip per compute node. So if there are 2 compute nodes with 4 HCAs, then dbip.ora should just have 2 ip address from each node. ibhosts | awk '/S [0-9.]* HCA\-1/ {print $8}' | tee dbip.ora f. Run the check. Option -b for bare metal will suppress the warnings about cellinit.ora and cellip.ora files not found. ./infinicheck -b -g dbip.ora -c cellip.ora g. To view only performance run results ./infinicheck -d -p h. To clean up after a run ./infinicheck -z Details for Round 2: ----------------------------------------------------------------------- 1. Before starting reimage for round 2, copy the file somewhere /opt/oracle.SupportTools/firstconf/factory_use_only.csv 2. Edit the copied file to add the MAC addresses for nodes. Pay close attention to the order in which addresses are entered. The nodes are organized in top to bottom ordering in the rack. 
The existing MAC addresses are sample only and should be written over with real MAC addresses. Populate these only for the type of Database Machine (DBM) in build. For example only fill up the half rack section for half rack. Leave the rest alone. Verify basic form and content of the file by /opt/oracle.cellos/ipconf -preconf <the csv file> -verify 3. The above edited file should be used as the preconf.csv file to reimage the nodes for Round 2. 4. Once the nodes are up, login to the console (root/welcome1) of the bottommost DB node (X4170) in the rack. This should have come up with hostname fdata01 or hdata01 or qdata01 or bdata01 corresponding to full, half or quarter DBM. 5. cd /opt/oracle.SupportTools 6. ./setup_ssh_eq.sh \ /opt/oracle.SupportTools/firstconf/<full | half | quarter> \ root \ welcome1 7. Check that root ssh equivalence was set up correctly in above step by simply executing some simple ssh command: /usr/local/bin/dcli \ -g /opt/oracle.SupportTools/firstconf/<full | half | quarter> \ -l root \ "hostname -i" 8. Set the ILOM ip addresses as specified in /opt/oracle.SupportTools/firstconf/factory_use_only.csv 9. Set the NM2 InfiniBand switch ip addresses as specified in /opt/oracle.SupportTools/firstconf/factory_use_only.csv 10. Set the CISCO Ethernet switch ip addresses as specified in /opt/oracle.SupportTools/firstconf/factory_use_only.csv 11. When the above step passed, power off the nodes. They are ready to ship. Round 2 variation if Round 2 above does not work and there is time pressure ---------------------------------------------------------------------------- This is the case where you ship the rack without setting up private IP addresses. Simply reimage the rack but this time use the options: Must use: 1. stit 2. updfrm 3. factory 4. notests=diskgroup Must NOT use: 1. preconf - based on second round template 2. multiprof 3. reboot-on-success What happens at customer site when machine ships with private ips ----------------------------------------------------------------- At customer site, to apply real configuration the customer prepares the preconf.csv file that has all the correct content. Then, 0. Log in to the bottommost DB node as root/welcome1 1. cd /opt/oracle.SupportTools/firstconf 2. copy the customer preconf 3. ./applyconfig.sh \ <full | half | quarter> \ <full path to preconf.csv file e.g. /root/preconf.csv> 4. This will push the configuration to all nodes and reboot them. Solaris installation -------------------------------------------------------------------------- To install Solaris you have to install or reinstall Linux with dualboot option first and make sure that the Linux installation succeed. You should use ForFactorySolaris zip file to deploy PXE server. The README file in the factory zip explains the details. You also have the option to use the iso for Solaris installtion. ISO based installation is completely unattended, thus you need only boot from the disk and wait till the machine reboots after the sucessful installation. It may take up to several hours. You can check status of the process: 0. Log in to the DB node as root/welcome 1. tail -f /tmp/install_log