Tuesday, April 19, 2011

AIX Troubleshooting

Troubleshooting AIX and HACMP

Core dump:

  • Find core dump files: /usr/samples/findcore/corepath, getvfsname
  • Collect the core file with its program and libraries for debugging and analysis: snapcore -d /tmp/coredir core.16928.24200405
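As a rough complement to the sample tools above, core files can also be located with plain find. This is only a sketch: the function name find_cores and the default search root are illustrative assumptions, not part of AIX.

```shell
#!/bin/sh
# find_cores [DIR]: list regular files named "core" or "core.*" under DIR.
# DIR defaults to / (an assumption made for illustration).
# -xdev keeps find on a single filesystem, so remote mounts are not crawled.
find_cores() {
    dir=${1:-/}
    find "$dir" -xdev -type f \( -name core -o -name 'core.*' \) 2>/dev/null
}
```

For example, find_cores /home would list a file such as core.16928.24200405 if one exists under /home.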

Boot Process:

  • To check the boot process: alog -t boot -o
  • Failure to locate a boot image: the boot image on the disk may be corrupted. Access rootvg from bootable media (select "Start Maintenance Mode for System Recovery" -> "Access a Root Volume Group" -> 0 to continue), then run the bosboot command.

Corrupted FS / corrupted JFS log device / failing fsck / bad disk:

  • Boot from CD-ROM or mksysb tape, select "Start Maintenance Mode for System Recovery", then access rootvg.
  • Format the default JFS log: /usr/sbin/logform /dev/hd8
  • Run fsck -y against /dev/hd1, /dev/hd2, /dev/hd3, /dev/hd4 and /dev/hd9var. If fsck finds errors, repair the filesystem with fsck -p /dev/hd#.
  • Find the boot disk: lslv -m hd5
  • Re-create the boot image and set the boot list: bosboot -ad /dev/hdisk#; bootlist -m normal hdisk#
  • Reboot: shutdown -Fr
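The recovery sequence above can be written down as a small helper that prints the commands in order, so they can be reviewed before being typed into the maintenance shell. This is a sketch only: the function name recovery_plan is hypothetical, and it deliberately prints the commands rather than running them.

```shell
#!/bin/sh
# recovery_plan DISK: print (not run) the maintenance-shell commands from
# the procedure above, in order. DISK is the boot disk that lslv -m hd5
# reports, for example hdisk0.
recovery_plan() {
    disk=${1:?usage: recovery_plan hdiskN}
    echo "/usr/sbin/logform /dev/hd8"
    for lv in hd1 hd2 hd3 hd4 hd9var; do
        echo "fsck -y /dev/$lv"
    done
    echo "lslv -m hd5"
    echo "bosboot -ad /dev/$disk"
    echo "bootlist -m normal $disk"
    echo "shutdown -Fr"
}
```

Calling recovery_plan hdisk0 prints ten lines, starting with the logform command and ending with shutdown -Fr.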

Back up the customized ODM files (Cu*) to a backup directory before removing system configuration: mount /dev/hd4 /mnt; mount /dev/hd2 /usr; mkdir /mnt/etc/objrepos/bak; cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak; umount all; exit

Save the clean ODM database: savebase -d /dev/hdisk#

Check file system sizes: df /dev/hd3; df /dev/hd4

Check whether the /etc/inittab file is missing

Check permissions: ls -al / .profile /etc/environment /etc/profile

Check that these exist: ls -al /bin /bin/bsh /bin/sh /lib /u /unix

Check whether /etc/fsck and /sbin/rc.boot are missing: ls -l /etc/fsck /sbin/rc.boot
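The existence checks above can be bundled into one pass. The sketch below only tests for presence, not permissions; the function name check_boot_files and the optional ROOT prefix are assumptions made for illustration, and /.profile assumes root's home directory is /.

```shell
#!/bin/sh
# check_boot_files [ROOT]: report which of the files and directories named
# above are missing. ROOT is a hypothetical prefix (for example /mnt when
# the root filesystem is mounted there); it defaults to the live root.
# Returns 0 if everything exists, 1 otherwise.
check_boot_files() {
    root=${1:-}
    missing=0
    for f in /etc/inittab /.profile /etc/environment /etc/profile \
             /bin /bin/bsh /bin/sh /lib /u /unix /etc/fsck /sbin/rc.boot; do
        if [ ! -e "$root$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}
```

A dangling symbolic link is reported as missing, and absolute links under a prefix resolve against the live root, so results under /mnt should be read with care.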

No login prompt: ps ax | grep console -> check whether the getty process is running; lscons

System dump:

  • Estimate the dump size: sysdumpdev -e
  • View the current dump device: sysdumpdev -l (/dev/hd6)
  • Specify the primary dump device: sysdumpdev -P -p /dev/hd7
  • Specify the secondary dump device: sysdumpdev -P -s /dev/hd7
  • Create a dump device: estimate the size with sysdumpdev -e, then mklv -y hd7 -t sysdump rootvg 7
  • Check the resources used by the system dump: /usr/lib/ras/dumpcheck -p
  • Increase the size of the dump device when it is the paging space hd6 (adds one logical partition): chps -s 1 hd6
  • Always allow a system dump: sysdumpdev -K
  • Get information about the last dump: sysdumpdev -L
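The last argument to mklv in the list above is a count of logical partitions, so the byte estimate from sysdumpdev -e has to be converted using the volume group's physical partition size (shown by lsvg rootvg). A minimal sketch of that arithmetic follows; the function name dump_lps is hypothetical, and it assumes a shell with 64-bit integer arithmetic for large estimates.

```shell
#!/bin/sh
# dump_lps BYTES PP_MB: number of logical partitions needed to hold BYTES,
# rounded up, when each physical partition is PP_MB megabytes.
# Assumes the shell's $(( )) arithmetic is wide enough for BYTES.
dump_lps() {
    bytes=$1
    pp_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( (bytes + pp_bytes - 1) / pp_bytes ))
}
```

With illustrative figures, dump_lps 52428800 16 prints 4, since 50 MB does not fit in three 16 MB partitions. Leaving some headroom above the estimate is a reasonable precaution.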

TCP/IP troubleshooting:

  • traceroute shows each gateway that a packet traverses on its way to the target host. traceroute sends UDP probes, whereas ping uses ICMP. If the local gateway answers, the problem is likely with the remote host; if nothing answers, the problem is on the local network.
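The gateway rule above can be sketched as a one-shot check. The function name locate_fault and the PING variable are assumptions introduced for illustration; PING exists only so that the logic can be exercised without a network.

```shell
#!/bin/sh
# locate_fault GATEWAY: ping the local gateway once and apply the rule above.
# PING may be set to another command name to try the logic without a network.
PING=${PING:-ping}
locate_fault() {
    gw=${1:?usage: locate_fault gateway}
    if "$PING" -c 1 "$gw" >/dev/null 2>&1; then
        echo "gateway answers: suspect the remote host"
    else
        echo "no answer from gateway: suspect the local network"
    fi
}
```

This is a first-pass triage only; a single lost packet can give a false "local network" result, so repeating the check is sensible.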

NFS troubleshooting:

  • Verify that the network connections are working
  • Verify that the inetd, portmap and biod daemons are running on the client
  • Verify that a valid mount point exists
  • Verify that the server is up and running: rpcinfo -p server
  • Verify that the mountd, portmap and nfsd daemons are running on the NFS server: rpcinfo -u server mount; rpcinfo -u server portmap; rpcinfo -u server nfs
  • Check what the server exports (from /etc/exports): showmount -e server
  • If NFS access times are slow, restart the biod daemons on the client: stopsrc -s biod; startsrc -s biod
  • Use nfsstat -s and nfsstat -c to determine whether the client or the server is retransmitting large blocks.
  • Common NFS error messages: "mountd will not start"; "server not responding: port mapper failure - RPC timed out"; "mount: access denied"; "mount: you are not allowed"
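The three rpcinfo -u checks from the list above can be looped over in one helper. This is a sketch under assumptions: the function name nfs_server_check and the RPCINFO override variable are hypothetical and exist only to make the logic easy to try out.

```shell
#!/bin/sh
# nfs_server_check SERVER: query each RPC service named above over UDP and
# report any that do not answer. RPCINFO may be overridden for dry runs.
# Returns 0 if all three services answered, 1 otherwise.
RPCINFO=${RPCINFO:-rpcinfo}
nfs_server_check() {
    server=${1:?usage: nfs_server_check server}
    failed=0
    for svc in portmap mount nfs; do
        if "$RPCINFO" -u "$server" "$svc" >/dev/null 2>&1; then
            echo "$svc: ok"
        else
            echo "$svc: no response"
            failed=1
        fi
    done
    return $failed
}
```

If portmap itself does not respond, the other two results are not meaningful, because the client cannot look up their ports.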

LVM Troubleshooting:

  • VG lost:
    1. Non-rootvg
      • exportvg data_vg
      • remove the bad disk from the ODM: rmdev -l hdisk# -d
      • install the replacement disks and reboot
      • if you have a savevg backup: restvg -f /dev/rmt0 hdisk#
      • if you don't have a savevg backup, re-create the VG, LVs and filesystems
      • restore the filesystem data: restore -rqvf /dev/rmt0
    2. Rootvg
      • shut down the system and replace the bad disks
      • boot in maintenance mode
      • restore from a mksysb image (power off the machine -> turn on the power -> insert the bootable media -> press 5 / F5 -> when the installation screen appears, select "Start Maintenance Mode for System Recovery" -> select "Install from a System Backup")
      • import each VG into the new ODM

Boot Problem Management:

LED code(s): user action

  • 553: Access the rootvg. Run df -k and check whether /tmp, /usr or / is full.
  • 553: Access the rootvg. Check /etc/inittab (empty, missing or corrupt?). Check /etc/environment.
  • 551, 555, 557: Access the rootvg. Re-create the BLV: bosboot -ad /dev/hdiskx
  • 551, 552, 554, 555, 556, 557: Access rootvg before mounting the rootvg filesystems. Re-create the JFS log: logform /dev/hd8. Run fsck afterwards.
  • 552, 554, 556: Run fsck against all rootvg filesystems. If fsck reports errors (not an AIXV4 filesystem), repair the superblock. Each filesystem has two superblocks, one in logical block 1 and a copy in logical block 31, so copy block 31 to block 1: dd count=1 bs=4k skip=31 seek=1 if=/dev/hd4 of=/dev/hd4
  • 551: Access rootvg and unlock it: chvg -u rootvg
  • 523-534: ODM files are missing or inaccessible. Restore the missing files from a system backup.
  • 518: Mount of /usr or /var failed? Check /etc/filesystems. Check the network (remote mounts), the filesystems (fsck) and the hardware.
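Because several LED codes appear in more than one entry above, a small lookup can gather every action listed for a given code. This is a sketch; the function name led_actions is hypothetical and the action text is abbreviated from the entries above.

```shell
#!/bin/sh
# led_actions CODE: print every action the entries above list for an LED
# code, one per line. Prints nothing for codes that are not listed.
led_actions() {
    code=$1
    case $code in 553)
        echo "check for full /tmp, /usr or / (df -k)"
        echo "check /etc/inittab and /etc/environment" ;;
    esac
    case $code in 551|555|557)
        echo "re-create the BLV (bosboot -ad /dev/hdiskx)" ;;
    esac
    case $code in 551|552|554|555|556|557)
        echo "re-create the JFS log (logform /dev/hd8), then run fsck" ;;
    esac
    case $code in 552|554|556)
        echo "run fsck on all rootvg filesystems; repair the superblock if needed" ;;
    esac
    case $code in 551)
        echo "unlock rootvg (chvg -u rootvg)" ;;
    esac
    case $code in 52[3-9]|53[0-4])
        echo "restore missing ODM files from a system backup" ;;
    esac
    case $code in 518)
        echo "check /etc/filesystems, the network, the filesystems and the hardware" ;;
    esac
}
```

For example, led_actions 551 prints three lines: the BLV, JFS log and unlock actions.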
