Prasad Banisetti: HACMP Short Notes

HACMP

HACMP : High Availability Cluster Multi-Processing

High Availability : Elimination of both planned and unplanned system and application downtime. This is achieved through elimination of H/W and S/W single points of failure.

Cluster Topology : The nodes, networks, storage, clients, and persistent node IP labels/devices.

Cluster resources: The components HACMP can move from one node to another, e.g. service IP labels, file systems, and applications.

RSCT Version: 2.4.2

SDD Version: 1.3.1.3

HA Configuration :

Define the cluster and nodes
Define the networks and disks
Define the topology
Verify and synchronize
Define the resources and resource groups
Verify and synchronize

Files changed after installation : /etc/inittab, /etc/rc.net, /etc/services, /etc/snmpd.conf, /etc/snmpd.peers, /etc/syslog.conf, /etc/trcfmt, /var/spool/cron/crontabs/root, /etc/hosts. The hacmp group is also added.

Software Components:

Application server

HACMP Layer

RSCT Layer

AIX Layer

LVM Layer

TCP/IP Layer

HACMP Services :

Cluster communication daemon(clcomdES)

Cluster Manager (clstrmgrES)

Cluster information daemon(clinfoES)

Cluster lock manager (cllockd)

Cluster SMUX peer daemon (clsmuxpd)

HACMP Daemons: clstrmgr, clinfo, clsmuxpd, cllockd.

HA supports up to 32 nodes

HA supports up to 48 networks

HA supports up to 64 resource groups per cluster

HA supports up to 128 cluster resources
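
The limits above can be turned into a quick planning check. This is only a sketch: the HACMP_LIMITS table and the check_plan function are hypothetical names made up for illustration, and the numbers are simply the ones listed in these notes, not values read from HACMP.

```python
# Limits as listed in these notes (not read from any HACMP API).
HACMP_LIMITS = {
    "nodes": 32,
    "networks": 48,
    "resource_groups": 64,
    "resources": 128,
}


def check_plan(plan):
    """Return a list of limit violations for a planned cluster.

    `plan` uses the same keys as HACMP_LIMITS and maps them to planned
    counts; missing keys are treated as zero.
    """
    problems = []
    for item, limit in HACMP_LIMITS.items():
        planned = plan.get(item, 0)
        if planned > limit:
            problems.append(f"{item}: planned {planned}, limit {limit}")
    return problems
```

An empty list means the plan fits within every limit listed above.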

IP Label : The label that is associated with a particular IP address as defined by name resolution (DNS or /etc/hosts).

Base IP label : The default IP address that is set on the interface by AIX on startup.

Service IP label: The label over which a service is provided; it may be bound to a single node or to multiple nodes. These are the addresses that HACMP keeps highly available.

IP alias: An IP alias is an IP address that is added to an interface, rather than replacing its base IP address.

RSCT Monitors the state of the network interfaces and devices.

IPAT via replacement : The service IP label will replace the boot IP address on the interface.

IPAT via aliasing: The service IP label will be added as an alias on the interface.

Persistent IP address: this can be assigned to a network for a particular node.

In HACMP the NFS exports file : /usr/es/sbin/cluster/etc/exports

Shared LVM:

Shared volume group is a volume group that resides entirely on the external disks shared by cluster nodes
Shared LVM can be made available in non-concurrent access mode, concurrent access mode, or enhanced concurrent access mode.

Non-concurrent access mode: This environment typically uses journaled file systems to manage data.

Create a non-concurrent shared volume group: smitty mkvg -> give the VG name, No for activate automatically at system restart, Yes for activate the VG after it is created, and give the VG major number.

Create a non-concurrent shared file system: smitty crjfs -> rename the FS names, No for mount automatically at system restart, then test the newly created FS by mounting and unmounting it.

Importing a volume group to a fallover node:

· Vary off the volume group

· Run the discovery process

· Import the volume group

Concurrent Access Mode: It’s not supported for file systems. Raw LVs and physical disks must be used instead.

Creating concurrent access volume group:

· Verify the disk status using lsdev -Cc disk

· smitty cl_convg -> Create a concurrent volume group -> Enter

· Import the volume group using importvg -C -y vg_name physical_volume_name

· varyonvg vgname

Create LVs on the concurrent VG: smitty cl_conlv.

Enhanced concurrent mode VGs: These can be used for both concurrent and non-concurrent access. The VG is varied on on all nodes in the cluster, but access for modifying the data is only granted to the node that has the resource group active.

Active or passive mode:

Active varyon: all high level operations permitted.

Passive varyon: Read only permissions on the VG.
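
The active/passive split can be pictured with a small model. This is a sketch only, not an LVM interface: varyon_modes and can_modify are hypothetical helper names, and the node names in any call are made up.

```python
def varyon_modes(nodes, rg_owner):
    """Map each node to its varyon mode for one enhanced concurrent VG.

    The node that currently holds the resource group gets "active";
    every other node gets "passive".
    """
    if rg_owner not in nodes:
        raise ValueError(f"{rg_owner!r} is not a cluster node")
    return {node: ("active" if node == rg_owner else "passive") for node in nodes}


def can_modify(mode):
    """Only an active varyon permits changes; passive is read only."""
    return mode == "active"
```

For example, varyon_modes(["nodeA", "nodeB"], "nodeB") marks nodeB as active and nodeA as passive, so only nodeB may modify data on the VG.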

Create an enhanced concurrent mode VG: mkvg -n -s 32 -C -y myvg hdisk11 hdisk12

Resource group behaviour:

Cascading: Fallover using dynamic node priority. Online on first available node.

Rotating : Fallover to next priority node in the list. Never fallback. Online using distribution policy.

Concurrent : Online on all available nodes. Never fallback.

RG dependencies: clrgdependency -t

/etc/hosts : Used for name resolution. All cluster node IP interfaces must be listed in this file.

/etc/inittab : hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init >/dev/console 2>&1 starts clcomdES and clstrmgrES.

/etc/rc.net is called by cfgmgr to configure and start TCP/IP during the boot process.

C-SPOC uses clcomdES to execute commands on remote nodes.

C-SPOC commands are located in /usr/es/sbin/cluster/cspoc

Do not stop cluster services with the forced option on more than one node at a time, or when a resource group is in concurrent mode.

Cluster commands are in /usr/es/sbin/cluster

User Administration : cl_usergroup

Create a concurrent VG -> smitty cl_convg

To find the resource group information: clRGinfo -p

HACMP Planning:

Maximum number of nodes in a cluster is 32

In an HACMP Cluster, the heartbeat messages are exchanged via IP networks and Point-to-Point networks

IP Label represents the name associated with a specific IP address

Service IP label/address: The service IP address is an IP address used for client access.

2 types of service IP addresses:

Shared Service IP address: It can be active only on one node at a time.

Node bound service IP address: An IP address that can be configured on only one node

Method of providing high availability service IP addresses:

IP address takeover via IP aliases

IPAT via IP replacement

An IP alias is an IP address that is configured on a communication interface in addition to the base IP address. IP aliasing is an AIX function that is supported by HACMP. AIX supports multiple IP aliases on each communication interface. Each IP alias can be on a different subnet.

Network Interface:

Service Interface: This interface is used for providing access to the application running on that node. The service IP address is monitored by HACMP via RSCT heartbeat.

Boot Interface: This is a communication interface. With IPAT via aliasing, during fallover the service IP label is aliased onto the boot interface.

Persistent node IP label: It’s useful for administrative purposes.

When an application is started or moved to another node together with its associated resource group, the service IP address can be configured in two ways.

Replacing the base IP address of a communication interface (IPAT via replacement). The service IP label and boot IP label must be on the same subnet.
Configuring one communication interface with an additional IP address on top of the existing one. This method is IP aliasing. All IP addresses/labels must be on different subnets.

Default method is IP aliasing.
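
The two subnet rules above are easy to check before configuring anything. The sketch below uses Python's standard ipaddress module; check_ipat and same_subnet are hypothetical helper names, and the addresses used in any call are made-up examples, not values from a real cluster.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """True if both addresses fall in the same subnet for the given netmask."""
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


def check_ipat(method, boot_ip, service_ip, netmask):
    """Apply the subnet rule from the notes for the chosen IPAT method.

    "replacement": service and boot addresses must share a subnet.
    "aliasing":    service and boot addresses must be on different subnets.
    """
    shared = same_subnet(boot_ip, service_ip, netmask)
    if method == "replacement":
        return shared
    if method == "aliasing":
        return not shared
    raise ValueError(f"unknown IPAT method: {method}")
```

For instance, check_ipat("aliasing", "192.168.10.1", "192.168.20.50", "255.255.255.0") returns True because the two addresses are on different subnets.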

HACMP Security: Implemented directly by clcomdES, Uses HACMP ODM classes and the /usr/es/sbin/cluster/rhosts file to determine partners.

Resource Group Takeover relationship:

Resource Group: It’s a logical entity containing the resources to be made highly available by HACMP.

Resources: Filesystems, NFS, Raw logical volumes, Raw physical disks, Service IP addresses/Labels, Application servers, startup/stop scripts.

To be made highly available by HACMP, each resource must be included in a resource group.

Resource group takeover relationship:

Cascading
Rotating
Concurrent
Custom

Cascading:

Cascading resource group is activated on its home node by default.
The resource group can be activated on a lower priority node if the highest priority node is not available at cluster startup.
On node failure, the resource group falls over to the available node with the next priority.
Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.
Attributes:

1. Inactive takeover (IT): Initial acquisition of a resource group in case the home node is not available.

2. Fallover priority can be configured in the default node priority list.

3. Cascading without fallback (CWOF) is an attribute that modifies the fallback behavior. If the CWOF flag is set to true, the resource group will not fall back to any node joining. When the flag is false, the resource group falls back to the higher priority node.
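
The cascading rules can be sketched as two small functions. This is a simplified model under stated assumptions, not how the cluster manager is implemented: fallover_target and after_node_join are hypothetical names, the priority list is ordered from highest to lowest priority, and inactive takeover is not modelled.

```python
def fallover_target(priority_list, up_nodes):
    """Pick the highest-priority node that is currently up, or None."""
    for node in priority_list:
        if node in up_nodes:
            return node
    return None


def after_node_join(priority_list, current_owner, joining_node, cwof):
    """Return the node that owns the group after `joining_node` rejoins.

    With cwof=True the group stays where it is. With cwof=False it falls
    back only if the joining node has higher priority than the owner.
    """
    if cwof:
        return current_owner
    if priority_list.index(joining_node) < priority_list.index(current_owner):
        return joining_node
    return current_owner
```

With the list ["nodeA", "nodeB", "nodeC"] and nodeA down, the group lands on nodeB; when nodeA rejoins, it moves back only if CWOF is false.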

Rotating:

At cluster startup, the first available node in the node priority list will activate the resource group.
If the resource group is on the takeover node, it will never fall back to a higher priority node if one becomes available.
Rotating resource groups require the use of IP address takeover. The nodes in the resource chain must all share the same network connection to the resource group.

Concurrent:

A concurrent RG can be active on multiple nodes at the same time.

Custom:

Users have to explicitly specify the desired startup, fallover and fallback procedures.
Custom resource groups support only IPAT via aliasing for service IP addresses.

Startup Options:

Online on home node only
Online on first available node
Online on all available nodes
Online using distribution policy -> The resource group will only be brought online if the node has no other resource group online. You can check this with lssrc -ls clstrmgrES

Fallover Options:

Fallover to next priority node in list
Fallover using dynamic node priority -> The fallover node can be selected on the basis of its available CPU, its available memory, or the lowest disk usage. HACMP uses RSCT to gather this information, then the resource group falls over to the node that best meets the chosen criterion.
Bring offline -> The resource group will be brought offline if an error occurs. This option is designed for resource groups that are online on all available nodes.

Fallback Options:

Fallback to higher priority node in the list
Never fallback
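
Two of the options above, dynamic node priority and the distribution policy, can be sketched as follows. This is an illustration only: in a real cluster RSCT supplies the measurements, while here they are plain dictionaries, and the metric keys cpu_free, mem_free and disk_busy as well as both function names are hypothetical.

```python
def pick_dynamic_node(candidates, metric):
    """Choose a fallover node by one metric.

    `candidates` maps node name -> dict with "cpu_free", "mem_free" and
    "disk_busy" values. For cpu_free and mem_free the largest value wins;
    for disk_busy the smallest wins.
    """
    if not candidates:
        return None
    if metric in ("cpu_free", "mem_free"):
        return max(candidates, key=lambda n: candidates[n][metric])
    if metric == "disk_busy":
        return min(candidates, key=lambda n: candidates[n][metric])
    raise ValueError(f"unknown metric: {metric}")


def may_start_with_distribution(node, online_groups):
    """Distribution policy: start only if the node hosts no other group."""
    return len(online_groups.get(node, [])) == 0
```

The point of the sketch is that the fallover target is not fixed by the node list alone; it depends on which single criterion was chosen and on the values seen at fallover time.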

Basic Steps to implement an HACMP cluster:

Planning
Install and connect the hardware
Configure shared storage
Installing and configuring application software
Install HACMP software and reboot each node
Define the cluster topology
Synchronize the cluster topology
Configure cluster resources
Configure cluster resource group and shared storage
Synchronize the cluster
Test the cluster

HACMP installation and configuration:

HACMP release notes : /usr/es/lpp/cluster/doc

smitty install_all -> fast path for installation

cluster.es and cluster.cspoc images must be installed on all servers

Start the cluster communication daemon -> startsrc -s clcomdES

Upgrading the cluster options: node by node migration and snapshot conversion

Steps for migration:

Stop cluster services on all nodes
Upgrade the HACMP software on each node
Start cluster services on one node at a time

Convert from a supported version of HAS to HACMP:

Current s/w should be committed
Save snapshot
Remove the old version
Install HA 5.1 and verify

Check previous version of cluster: lslpp -h "cluster"

To save your HACMP configuration, create a snapshot in HACMP

Remove old version of HACMP: smitty install_remove (select software name cluster*)

lppchk -v and lppchk -c cluster* both run clean if the installation is OK.

After you have installed HA on the cluster nodes, you need to convert and apply the snapshot. Converting the snapshot must be performed before rebooting the cluster nodes.

clconvert_snapshot -C -v version -s -> converts a snapshot from the old HA version to the new version

After installation, a restart is required to activate the new cluster manager.

Verification and synchronization : smitty hacmp -> extended configuration -> extended verification and configuration -> verify changes only

Perform Node-by-Node Migration:

Save the current configuration in snapshot.
Stop cluster services on one node using graceful with takeover.
Verify the cluster services.
Install the latest HACMP version.
Check the installed software using lppchk.
Reboot the node.
Restart the HACMP software (smitty hacmp -> System Management -> Manage cluster services -> start cluster services).
Repeat the above steps on all nodes.
Logs are written to /tmp/hacmp.out, /tmp/cm.log, and /tmp/clstrmgr.debug.
The config_too_long message appears when the cluster manager detects that an event has been processing for more than the specified time. To change the time interval: smitty hacmp -> extended configuration -> extended event configuration -> change/show time until warning

Cluster snapshots are saved in /usr/es/sbin/cluster/snapshots.

The synchronization process will fail when migration is incomplete. To back out from the change you must restore the active ODM (smitty hacmp -> Problem determination tools -> Restore HACMP configuration database from active configuration).

Upgrading to a new HACMP version involves converting the ODM from the previous release to the current release. That is done by /usr/es/sbin/cluster/conversion/cl_convert -F -v 5.1

The log file for the conversion is /tmp/clconvert.log.

Clean-up process once an installation is interrupted: smitty install -> software maintenance and installation -> clean up after an interrupted installation

Network Configuration:

Physical Networks: TCP/IP based, such as Ethernet and Token Ring; device based, such as RS232 and target mode SSA (tmssa).

Configuring cluster Topology:

Standard and Extended configuration

smitty hacmp -> Initialization and standard configuration

IP aliasing is used as the default mechanism for service IP label/address assignment to a network interface.

Configure nodes: smitty hacmp -> Initialization and standard configuration -> configure nodes to an HACMP cluster -> (give cluster name and node names)
Configure resources: Use configure resources to make highly available (configure IP address/label, application server, volume groups, logical volumes, file systems)
Configure resource groups: Use configure HACMP resource groups. You can choose cascading, rotating, custom, or concurrent.
Assign resources to each resource group: configure HACMP resource groups -> Change/show resources for a resource group.
Verify and synchronize the cluster configuration
Display the cluster configuration

Steps for cluster configuration using extended path:

Run discovery: Running discovery retrieves current AIX configuration information from all cluster nodes.
Configuring an HA cluster: smitty hacmp -> extended configuration -> extended topology configuration -> configure an HACMP cluster -> Add/change/show an HA cluster
Defining a node: smitty hacmp -> extended configuration -> extended topology configuration -> configure HACMP nodes -> Add a node to the HACMP cluster
Defining sites: This is optional.
Defining network: Run discovery before network configuration.

IP based networks: smitty hacmp -> extended configuration -> extended topology configuration -> configure HACMP networks -> Add a network to the HACMP cluster -> select the type of network -> (enter network name, type, netmask, enable IP takeover via IP aliases (default is true), IP address offset for heartbeating over IP aliases)

Defining communication interfaces: smitty hacmp -> extended configuration -> extended topology configuration -> HACMP communication interfaces/devices -> Select communication interfaces -> add node name, network name, network interface, IP label/address, network type
Defining communication devices: smitty hacmp -> extended configuration -> extended topology configuration -> configure HACMP communication interfaces/devices -> select communication devices
To see boot IP labels on a node use netstat -in
Defining persistent IP labels: It always stays on the same node, does not require installing an additional physical interface, and is not part of any resource group. smitty hacmp -> extended topology configuration -> configure persistent node IP label/addresses -> add persistent node IP label (enter node name, network name, node IP label/address)

Resource Group Configuration

smitty hacmp -> initialization and standard configuration -> Configure HACMP resource groups -> Add a standard resource group -> Select cascading/rotating/concurrent/custom (enter resource group name, participating node names)
Assigning resources to the RG: smitty hacmp -> initialization and standard configuration -> Configure HACMP resource groups -> change/show resources for a standard resource group (add service IP label/address, VG, FS, application servers)

Resource group and application management:

Bring a resource group offline: smitty cl_admin -> select HACMP resource group and application management -> Bring a resource group offline.
Bring a resource group online: smitty hacmp -> select HACMP resource group and application management -> Bring a resource group online.
Move a resource group: smitty hacmp -> select HACMP resource group and application management -> Move a resource group to another node

C-SPOC: Under smitty cl_admin

Manage HACMP services
HACMP Communication interface management
HACMP resource group and application manipulation
HACMP log viewing and management
HACMP file collection management
HACMP security and users management
HACMP LVM
HACMP concurrent LVM
HACMP physical volume management

Post Implementation and administration:

C-SPOC commands are located in the /usr/es/sbin/cluster/cspoc directory.

HACMP for AIX ODM object classes are stored in /etc/es/objrepos.

User group administration in hacmp is smitty cl_usergroup

Problem Determination:

To verify the cluster configuration use smitty clverify.dialog

Log file to store output: /var/hacmp/clverify/clverify.log

HACMP Log Files:

/usr/es/adm/cluster.log: Generated by HACMP scripts and daemons.

/tmp/hacmp.out: This log file contains a line-by-line record of every command executed by scripts.

/usr/es/sbin/cluster/history/cluster.mmddyyyy: The system creates a cluster history file every day.

/tmp/clstrmgr.debug: Messages generated by clstrmgrES activity.

/tmp/cspoc.log: Generated by HACMP C-SPOC commands.

/tmp/dms_loads.out: Stores log messages every time HACMP triggers the deadman switch.

/var/hacmp/clverify/clverify.log: cluster verification log.

/var/ha/log/grpsvcs, /var/ha/log/topsvcs, /var/ha/log/grpglsm: daemon logs.
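
Since the daily history file is named cluster.mmddyyyy, the path for a given day can be built from a date. A minimal sketch, assuming the directory given in the list above; history_file is a hypothetical helper name.

```python
from datetime import date


def history_file(day, directory="/usr/es/sbin/cluster/history"):
    """Build the cluster history file path for a date (cluster.mmddyyyy)."""
    return f"{directory}/cluster.{day.strftime('%m%d%Y')}"
```

For 19 April 2011 this gives /usr/es/sbin/cluster/history/cluster.04192011.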

Snapshots: The primary information saved in a cluster snapshot is the data stored in the HACMP ODM classes(HACMPcluster, HACMPnode, HACMPnetwork, HACMPdaemons).

The cluster snapshot utility stores the data it saves in two separate files:

ODM data file(.odm), Cluster state information file(.info)

To create a cluster snapshot: smitty hacmp -> HACMP extended configuration -> HACMP snapshot configuration -> add a cluster snapshot

Cluster Verification and testing:

High and Low water mark values are 33 and 24

The default value for syncd is 60.

Before the cluster is started, the clcomd daemon is added to /etc/inittab and started by init.

Verify the status of the cluster services: lssrc -g cluster (the cluster manager daemon (clstrmgrES), cluster SMUX peer daemon (clsmuxpd), and cluster topology services daemon (topsvcd) should be running).

Status of different cluster subsystems: lssrc -g topsvcs and lssrc -g emsvcs.

In the /tmp/hacmp.out file, look for the node_up and node_up_complete events.

To check the HACMP cluster status: /usr/es/sbin/cluster/clstat. To use this command you should have started the clinfo daemon.

To change the snmp version : /usr/sbin/snmpv3_ssw -1.

Stop the cluster services by using smitty clstop : graceful, takeover, forced. In the log file /tmp/hacmp.out search for node_down and node_down_complete.

Graceful: Resources will be released, but will not be acquired by other nodes.

Graceful with takeover: Resources will be released and acquired by other nodes.

Forced: Cluster services will be stopped but resource group will not be released.
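
The three stop modes differ only in whether resources are released and whether another node acquires them. The table below is a memory aid written as a sketch; STOP_MODES and stop_effect are hypothetical names, not part of any HACMP utility.

```python
# mode -> (resources released on the stopping node, acquired by another node)
STOP_MODES = {
    "graceful": (True, False),
    "takeover": (True, True),
    "forced": (False, False),
}


def stop_effect(mode):
    """Describe what happens to resource groups for a given stop mode."""
    released, acquired = STOP_MODES[mode]
    if not released:
        return "cluster services stop; resource groups are not released"
    if acquired:
        return "resource groups are released and acquired by another node"
    return "resource groups are released and stay offline"
```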

Resource group states: online, offline, acquiring, releasing, error, temporary error, or unknown.

Find the resource group status: /usr/es/sbin/cluster/utilities/clfindres or clRGinfo.

Options: -t displays the settling time; -p displays priority override locations.

To review cluster topology: /usr/es/sbin/cluster/utilities/cltopinfo.

Different type of NFS mounts: hard and soft

Hard mount is default choice.

NFS export file: /usr/es/sbin/cluster/etc/exports.

If the adapter is configured with a service IP address: verify in /tmp/hacmp.out that the swap_adapter event has occurred, and verify that the service IP address has moved using the command netstat -in.

You can implement an RS232 heartbeat network between any 2 nodes.

To test a serial connection: lsdev -Cc tty; the baud rate is set to 38400, parity to none, and bits per character to 8.

RSCT verification: lssrc -ls topsvcs tests whether RSCT is functioning. To check RSCT group services: lssrc -ls grpsvcs

Monitor heartbeat over all the defined networks: cllsif.log from /var/ha/run/topsvcs.clustername.

Prerequisites:

PowerHA version 5.5 -> AIX v5300-9 -> RSCT level 2.4.10

BOS components: bos.rte.*, bos.adt.*, bos.net.tcp.*, bos.clvm.enh (when using enhanced concurrent resource manager access)

The cluster.es.nfs fileset, which comes with the PowerHA installation medium, installs NFSv4. From the AIX BOS, bos.net.nfs.server 5.3.7.0 and bos.net.nfs.client 5.3.7.0 are required.

Check that all nodes have the same version of RSCT using lslpp -l rsct

Installing PowerHA: release notes: /usr/es/sbin/cluster/release_notes

Enter smitty install_all -> select input device -> press F4 for a software listing -> Enter

Steps to increase the size of a shared LUN:

Stop the cluster on all nodes
Run cfgmgr
varyonvg vgname
lsattr -El hdisk#
chvg -g vgname
lsvg vgname
varyoffvg vgname
On subsequent cluster nodes that share the VG, run cfgmgr, lsattr -El hdisk#, importvg -L vgname hdisk#
Synchronize

PowerHA creates a backup copy of the modified files during synchronization on all nodes. These backups are stored in /var/hacmp/filebackup directory.

The file collection logs are stored in /var/hacmp/log/clutils.log file.

User and group Administration:

Adding a user: smitty cl_usergroup -> select users in an HACMP cluster -> Add a user to the cluster (also: list users, change/show characteristics of a user in the cluster, remove a user from the cluster)

Adding a group: smitty cl_usergroup -> select groups in an HACMP cluster -> Add a group to the cluster (also: list groups, change/show characteristics of a group in the cluster, remove a group from the cluster)

Command to change a password on all cluster nodes: /usr/es/sbin/cluster/utilities/clpasswd

smitty cl_usergroup -> Users in an HACMP cluster

Add a user to the cluster
List users in the cluster
Change/show characteristics of a user in the cluster
Remove a user from the cluster

smitty cl_usergroup -> Groups in an HACMP cluster

Add a group to the cluster
List groups in the cluster
Change a group in the cluster
Remove a group

smitty cl_usergroup -> Passwords in an HACMP cluster

Importing VG automatically: smitty hacmp -> Extended configuration -> HACMP extended resource configuration -> Change/show resources and attributes for a resource group -> set Automatically import volume groups to true

C-SPOC LVM: smitty cl_admin -> HACMP Logical Volume Management

Shared Volume groups
Shared Logical volumes
Shared File systems
Synchronize shared LVM mirrors (Synchronize by VG/Synchronize by LV)
Synchronize a shared VG definition

C-SPOC concurrent LVM: smitty cl_admin -> HACMP concurrent LVM

Concurrent volume groups
Concurrent Logical volumes
Synchronize concurrent LVM mirrors

C-SPOC Physical volume management: smitty cl_admin -> HACMP physical volume management

Add a disk to the cluster
Remove a disk from the cluster
Cluster disk replacement
Cluster datapath device management

Cluster Verification: smitty hacmp -> Extended verification -> Extended verification and synchronization. Verification log files are stored in /var/hacmp/clverify.

/var/hacmp/clverify/clverify.log -> Verification log

/var/hacmp/clverify/pass/nodename -> If verification succeeds

/var/hacmp/clverify/fail/nodename -> If verification fails

Automatic cluster verification: Runs each time you start cluster services and every 24 hours.

Configure automatic cluster verification: smitty hacmp -> problem determination tools -> HACMP verification -> Automatic cluster configuration monitoring.

Cluster status monitoring: /usr/es/sbin/cluster/clstat -a and -o.

/usr/es/sbin/cluster/utilities/cldump -> Provides a snapshot of the key cluster status components.

clshowsrv: Displays the status of the HACMP subsystems.

Disk Heartbeat:

It’s a non-IP heartbeat
It uses a dedicated disk/LUN
It’s a point-to-point network
If more than 2 nodes exist in your cluster, you will need a minimum of n non-IP heartbeat networks, where n is the number of nodes.
Disk heartbeating typically requires 4 seeks/second. That is, each of the two nodes will write to the disk and read from the disk once per second. The filemon tool monitors the seeks.
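
The network count and the seek estimate above can be worked out with a few lines. This is a sketch under one assumption the notes do not state: the n networks are laid out as a ring, each node paired with the next. The function names diskhb_ring and seeks_per_second are hypothetical.

```python
def diskhb_ring(nodes):
    """Pair nodes into point-to-point disk heartbeat networks.

    Two nodes need a single network. With more than two nodes, each node
    is paired with the next one and the last wraps to the first, giving
    n networks for n nodes.
    """
    n = len(nodes)
    if n < 2:
        return []
    if n == 2:
        return [(nodes[0], nodes[1])]
    return [(nodes[i], nodes[(i + 1) % n]) for i in range(n)]


def seeks_per_second(networks):
    """Each network costs about 4 seeks/s: 2 nodes x (1 write + 1 read)."""
    return 4 * len(networks)
```

For three nodes a, b and c this yields the pairs (a, b), (b, c) and (c, a), and an estimated 12 seeks per second spread across the three heartbeat disks.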

Configuring disk heartbeat:

Vpaths are configured as member disks of an enhanced concurrent volume group. smitty lvm -> select volume groups -> Add a volume group -> give VG name, PV names, VG major number, and set create VG concurrent capable to enhanced concurrent.
Import the new VG on all nodes using smitty importvg or importvg -V 53 -y c23vg vpath5
Create the diskhb network: smitty hacmp -> extended configuration -> extended topology configuration -> configure HACMP networks -> Add a network to the HACMP cluster -> choose diskhb
Add 2 communication devices: smitty hacmp -> extended configuration -> extended topology configuration -> Configure HACMP communication interfaces/devices -> Add communication interfaces/devices -> Add pre-defined communication interfaces and devices -> communication devices -> choose the diskhb
Create one communication device for the other node also

Testing Disk Heartbeat connectivity: /usr/sbin/rsct/bin/dhb_read is used to test the validity of a diskhb connection.

dhb_read -p vpath0 -r receives data over the diskhb network

dhb_read -p vpath3 -t transmits data over the diskhb network.

Monitoring disk heartbeat: Monitor the activity of the disk heartbeats via lssrc -ls topsvcs. Monitor the Missed HBs field.

Configure HACMP Application Monitoring: smitty cm_cfg_appmon -> Add a process application monitor -> give process names, app startup/stop scripts

Application availability analysis tool: smitty hacmp -> system management -> Resource group and application management -> application availability analysis

Commands:

List the cluster topology : /usr/es/sbin/cluster/utilities/cllsif

/usr/es/sbin/cluster/clstat

Start cluster : smitty clstart -> Monitor with /tmp/hacmp.out and check for node_up_complete.

Stop the cluster : smitty clstop -> Monitor with /tmp/hacmp.out and check for node_down_complete.

Determine the state of cluster: /usr/es/sbin/cluster/utilities/clcheck_server

Display the status of HACMP subsystems: clshowsrv -v/-a

Display the topology information: cltopinfo -c/-n/-w/-i

Monitor the heartbeat activity: lssrc -ls topsvcs [check for dropped, errors]

Display resource group attributes: clRGinfo -v, -p, -t, -c, -a OR clfindres

Prasad Banisetti

Tuesday, April 19, 2011
