
Front cover

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
Implementing high availability for ITWS and Tivoli Framework

Windows 2000 Cluster Service and HACMP scenarios

Best practices and tips

Vasfi Gucer Satoko Egawa David Oswald Geoff Pusey John Webb Anthony Yen

ibm.com/redbooks

International Technical Support Organization

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

March 2004

SG24-6632-00

Note: Before using this information and the product it supports, read the information in "Notices" on page vii.

First Edition (March 2004) This edition applies to IBM Tivoli Workload Scheduler Version 8.2 and IBM Tivoli Management Framework Version 4.1.

Copyright International Business Machines Corporation 2004. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Notices
Trademarks

Preface
  The team that wrote this redbook
  Become a published author
  Comments welcome

Chapter 1. Introduction
  1.1 IBM Tivoli Workload Scheduler architectural overview
  1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework
  1.3 High availability terminology used in this book
  1.4 Overview of clustering technologies
    1.4.1 High availability versus fault tolerance
    1.4.2 Server versus job availability
    1.4.3 Standby versus takeover configurations
    1.4.4 IBM HACMP
    1.4.5 Microsoft Cluster Service
  1.5 When to implement IBM Tivoli Workload Scheduler high availability
    1.5.1 High availability solutions versus Backup Domain Manager
    1.5.2 Hardware failures to plan for
    1.5.3 Summary
  1.6 Material covered in this book

Chapter 2. High level design and architecture
  2.1 Concepts of high availability clusters
    2.1.1 A bird's-eye view of high availability clusters
    2.1.2 Software considerations
    2.1.3 Hardware considerations
  2.2 Hardware configurations
    2.2.1 Types of hardware cluster
    2.2.2 Hot standby system
  2.3 Software configurations
    2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster
    2.3.2 Software availability within IBM Tivoli Workload Scheduler
    2.3.3 Load balancing software
    2.3.4 Job recovery


Chapter 3. High availability cluster implementation
  3.1 Our high availability cluster scenarios
    3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler
    3.1.2 Hot standby for IBM Tivoli Management Framework
  3.2 Implementing an HACMP cluster
    3.2.1 HACMP hardware considerations
    3.2.2 HACMP software considerations
    3.2.3 Planning and designing an HACMP cluster
    3.2.4 Installing HACMP 5.1 on AIX 5.2
  3.3 Implementing a Microsoft Cluster
    3.3.1 Microsoft Cluster hardware considerations
    3.3.2 Planning and designing a Microsoft Cluster installation
    3.3.3 Microsoft Cluster Service installation

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster
  4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster
    4.1.1 IBM Tivoli Workload Scheduler implementation overview
    4.1.2 Preparing to install
    4.1.3 Installing the IBM Tivoli Workload Scheduler engine
    4.1.4 Configuring the IBM Tivoli Workload Scheduler engine
    4.1.5 Installing IBM Tivoli Workload Scheduler Connector
    4.1.6 Setting the security
    4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance
    4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster
    4.1.9 Applying IBM Tivoli Workload Scheduler fix pack
    4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler
    4.1.11 Add IBM Tivoli Management Framework
    4.1.12 Production considerations
    4.1.13 Just one IBM Tivoli Workload Scheduler instance
  4.2 Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster
    4.2.1 Single instance of IBM Tivoli Workload Scheduler
    4.2.2 Configuring the cluster group
    4.2.3 Two instances of IBM Tivoli Workload Scheduler in a cluster
    4.2.4 Installation of the IBM Tivoli Management Framework
    4.2.5 Installation of Job Scheduling Services
    4.2.6 Installation of Job Scheduling Connector
    4.2.7 Creating Connector instances
    4.2.8 Interconnecting the two Tivoli Framework Servers
    4.2.9 Installing the Job Scheduling Console
    4.2.10 Scheduled outage configuration

Chapter 5. Implement IBM Tivoli Management Framework in a cluster
  5.1 Implement IBM Tivoli Management Framework in an HACMP cluster


    5.1.1 Inventory hardware
    5.1.2 Planning the high availability design
    5.1.3 Create the shared disk volume
    5.1.4 Install IBM Tivoli Management Framework
    5.1.5 Tivoli Web interfaces
    5.1.6 Tivoli Managed Node
    5.1.7 Tivoli Endpoints
    5.1.8 Configure HACMP
  5.2 Implementing Tivoli Framework in a Microsoft Cluster
    5.2.1 TMR server
    5.2.2 Tivoli Managed Node
    5.2.3 Tivoli Endpoints

Appendix A. A real-life implementation
  Rationale for IBM Tivoli Workload Scheduler and HACMP integration
  Our environment
  Installation roadmap
  Software configuration
  Hardware configuration
  Installing the AIX operating system
  Finishing the network configuration
  Creating the TTY device within AIX
  Testing the heartbeat interface
  Configuring shared disk storage devices
  Copying installation code to shared storage
  Creating user accounts
  Creating group accounts
  Installing IBM Tivoli Workload Scheduler software
  Installing HACMP software
  Installing the Tivoli TMR software
    Patching the Tivoli TMR software
    TMR versus Managed Node installation
  Configuring IBM Tivoli Workload Scheduler start and stop scripts
  Configuring miscellaneous start and stop scripts
  Creating and modifying various system files
  Configuring the HACMP environment
  Testing the failover procedure
    HACMP Cluster topology
    HACMP Cluster Resource Group topology
    ifconfig -a
  Skills required to implement IBM Tivoli Workload Scheduling/HACMP
  Observations and questions


Appendix B. TMR clustering for Tivoli Framework 3.7b on MSCS
  Setup
    Configure the wlocalhost
    Install Framework on the primary node
    Install Framework on the secondary node
  Configure the TMR
    Set the root administrator's login
    Force the oserv to bind to the virtual IP
    Change the name of the DBDIR
    Modify the setup_env.cmd and setup_env.sh
    Configure the registry
    Rename the Managed Node
    Rename the TMR
    Rename the top-level policy region
    Rename the root administrator
    Configure the ALIDB
  Create the cluster resources
    Create the oserv cluster resource
    Create the trip cluster resource
    Set up the resource dependencies
  Validate and backup
    Test failover
    Back up the Tivoli databases

Abbreviations and acronyms

Related publications
  IBM Redbooks
  Other publications
  Online resources
  How to get IBM Redbooks

Index


Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AFS, AIX, Balance, DB2, DFS, Enterprise Storage Server, IBM, LoadLeveler, Maestro, NetView, Planet Tivoli, PowerPC, pSeries, Redbooks, Redbooks (logo), RS/6000, SAA, Tivoli Enterprise, Tivoli, TotalStorage, WebSphere, eServer, z/OS

The following terms are trademarks of other companies: Intel, Intel Inside (logos), and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, and service names may be trademarks or service marks of others.


Preface
This IBM Redbook is intended to be used as a major reference for designing and creating highly available IBM Tivoli Workload Scheduler and Tivoli Framework environments. IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. Here, we describe how to install ITWS Version 8.2 in a high availability (HA) environment and configure it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS are also briefly covered.

When implementing a highly available IBM Tivoli Workload Scheduler environment, you have to consider high availability for both the IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework environments, because IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework's services for authentication. Therefore, we discuss techniques you can use to successfully implement IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework (TMR server, Managed Nodes, and Endpoints), and we present two major case studies: High-Availability Cluster Multiprocessing (HACMP) for AIX, and Microsoft Windows Cluster Service.

The implementation of IBM Tivoli Workload Scheduler within a high availability environment will vary from platform to platform and from customer to customer, based on the needs of the installation. Here, we cover the most common scenarios and share practical implementation tips. We also make recommendations for other high availability platforms; although there are many different clustering technologies in the market today, they are similar enough to allow us to offer useful advice regarding the implementation of a highly available scheduling system.
Finally, although our primary focus is highly available scheduling systems, we also include a section for customers who want to implement a highly available IBM Tivoli Management Framework environment, but who are not currently using IBM Tivoli Workload Scheduler.

The team that wrote this redbook


This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.


Vasfi Gucer is an IBM Certified Consultant IT Specialist at the ITSO Austin Center. He has been with IBM Turkey for 10 years, and has worked at the ITSO since January 1999. He has more than 10 years of experience in systems management, networking hardware, and distributed platform software. He has worked on various Tivoli customer projects as a Systems Architect and Consultant in Turkey and in the United States, and is also a Certified Tivoli Consultant.

Satoko Egawa is an I/T Specialist with IBM Japan. She has five years of experience in systems management solutions. Her area of expertise is job scheduling solutions using Tivoli Workload Scheduler. She is also a Tivoli Certified Consultant, and in the past has worked closely with the Tivoli Rome Lab.

David Oswald is a Certified IBM Tivoli Services Specialist in New Jersey, United States, who works on IBM Tivoli Workload Scheduling and Tivoli storage architectures/deployments (TSRM, TSM, TSANM) for IBM customers located in the United States, Europe, and Latin America. He has been involved in disaster recovery, UNIX administration, shell scripting and automation for 17 years, and has worked with TWS Versions 5.x, 6.x, 7.x, and 8.x. While primarily a Tivoli services consultant, he is also involved in Tivoli course development, Tivoli certification exams, and Tivoli training efforts.

Geoff Pusey is a Senior I/T Specialist in the IBM Tivoli Services EMEA region. He is a Certified IBM Tivoli Workload Scheduler Consultant and has been with Tivoli/IBM since January 1998, when Unison Software was acquired by Tivoli Systems. He has worked with the IBM Tivoli Workload Scheduling product for the last 10 years as a consultant, performing customer training, implementing and customizing IBM Tivoli Workload Scheduler, creating customized scripts to generate specific reports, and enhancing IBM Tivoli Workload Scheduler with new functions.

John Webb is a Senior Consultant for Tivoli Services Latin America. He has been with IBM since 1998. Since joining IBM, John has made valuable contributions to the company through his knowledge and expertise in enterprise systems management. He has deployed and designed systems for numerous customers, and his areas of expertise include the Tivoli Framework and Tivoli PACO products.

Anthony Yen is a Senior IT Consultant with IBM Business Partner Automatic IT Corporation, <http://www.AutomaticIT.com>, in Austin, Texas, United States. He has delivered 19 projects involving 11 different IBM Tivoli products over the past six years. His areas of expertise include Enterprise Console, Monitoring, Workload Scheduler, Configuration Manager, Remote Control, and NetView. He has given talks at Planet Tivoli and the Automated Systems And Planning OPC and TWS Users Conference (ASAP), and has taught courses on IBM Tivoli Workload Scheduler. Before that, he worked in the IT industry for 10 years as a UNIX and Windows system administrator. He has been an IBM Certified Tivoli Consultant since 1998.

Thanks to the following people for their contributions to this project:

Octavian Lascu, Dino Quintero
International Technical Support Organization, Poughkeepsie Center

Jackie Biggs, Warren Gill, Elaine Krakower, Tina Lamacchia, Grant McLaughlin, Nick Lopez
IBM USA

Antonio Gallotti
IBM Italy

Become a published author


Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers. Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability. Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/Redbooks/residencies.html

Comments welcome
Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

Use the online "Contact us" review Redbook form found at:
ibm.com/Redbooks

Send your comments in an Internet note to:


Redbook@us.ibm.com


Mail your comments to: IBM Corporation, International Technical Support Organization Dept. JN9B Building 003 Internal Zip 2834 11400 Burnet Road Austin, Texas 78758-3493


Chapter 1. Introduction
In this chapter, we introduce the IBM Tivoli Workload Scheduler suite and identify the need for high availability by IBM Tivoli Workload Scheduler users. Important ancillary concepts in IBM Tivoli Management Framework (also referred to as Tivoli Framework, or TMF) and clustering technologies are introduced for new users as well.

The following topics are covered in this chapter:

- IBM Tivoli Workload Scheduler architectural overview
- IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework
- High availability terminology used in this book
- Overview of clustering technologies
- When to implement IBM Tivoli Workload Scheduler high availability
- Material covered in this book


1.1 IBM Tivoli Workload Scheduler architectural overview


IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. This redbook covers installing ITWS Version 8.2 in a high availability (HA) environment and configuring it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS are also briefly covered.

Understanding specific aspects of IBM Tivoli Workload Scheduler's architecture is key to a successful high availability implementation. In-depth knowledge of the architecture is necessary for resolving some problems that might present themselves during the deployment of IBM Tivoli Workload Scheduler in an HA environment. We identify only those aspects of the architecture that are directly involved in a high availability deployment. For a detailed discussion of IBM Tivoli Workload Scheduler's architecture, refer to Chapter 2, "Overview", in IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256.

IBM Tivoli Workload Scheduler uses the TCP/IP-based network connecting an enterprise's servers to accomplish its mission of scheduling jobs. A job is an executable file, program, or command that is scheduled and launched by IBM Tivoli Workload Scheduler. All servers that run jobs using IBM Tivoli Workload Scheduler make up the scheduling network. A scheduling network contains at least one domain, the master domain, in which a server designated as the Master Domain Manager (MDM) is the management hub. This server contains the definitions of all scheduling objects that define the batch schedule, stored in a database. Additional domains can be used to divide a widely distributed network into smaller, locally managed groups. The management hubs for these additional domains are called Domain Manager servers.
Each server in the scheduling network is called a workstation, or by the interchangeable term CPU. There are different types of workstations that serve different roles. For the purposes of this publication, it is sufficient to understand that a workstation can be one of the following types. You have already been introduced to one of them, the Master Domain Manager. The other types of workstations are Domain Manager (DM) and Fault Tolerant Agent (FTA). Figure 1-1 on page 3 shows the relationship between these architectural elements in a sample scheduling network.
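To make the workstation roles concrete, the following sketch shows how workstations of these types are typically defined in the IBM Tivoli Workload Scheduler database using the composer command-line syntax. The host names, TCP port, and domain names are hypothetical, and the exact attribute set may vary between versions; consult the IBM Tivoli Workload Scheduler Reference Manual for the authoritative syntax.

```
# Hypothetical workstation (CPU) definitions in composer syntax.
# The Master Domain Manager sits in the MASTERDM domain and keeps
# full status of the network; a Fault Tolerant Agent does not.
CPUNAME MDM
  DESCRIPTION "Master Domain Manager"
  OS UNIX
  NODE mdm.example.com
  TCPADDR 31111
  DOMAIN MASTERDM
  FOR MAESTRO
    TYPE MANAGER
    AUTOLINK ON
    FULLSTATUS ON
  END

CPUNAME FTA1
  DESCRIPTION "Fault Tolerant Agent in DomainA"
  OS UNIX
  NODE fta1.example.com
  TCPADDR 31111
  DOMAIN DOMAINA
  FOR MAESTRO
    TYPE FTA
    AUTOLINK ON
    FULLSTATUS OFF
  END
```

In a high availability configuration, the NODE value for the Master Domain Manager would typically be the cluster's virtual (service) host name rather than the name of either physical node, so that agents keep linking to the MDM regardless of which cluster node currently hosts it.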


[Figure 1-1 shows a sample scheduling network: the MASTERDM domain, with a Master Domain Manager on AIX, is the parent of DomainA (Domain Manager DM_A on AIX, managing FTA1 on AIX and FTA2 on OS/400) and DomainB (Domain Manager DM_B on HPUX, managing FTA3 on Windows 2000 and FTA4 on Solaris).]
Figure 1-1 Main architectural elements of IBM Tivoli Workload Scheduler relevant to high availability

The lines between the workstations show how IBM Tivoli Workload Scheduler communicates between them. For example, if the MDM needs to send a command to FTA2, it would pass the command via DM_A. In this example scheduling network, the Master Domain Manager is the management hub for two Domain Managers, DM_A and DM_B. Each Domain Manager in turn is the management hub for two Fault Tolerant Agents: DM_A is the hub for FTA1 and FTA2, and DM_B is the hub for FTA3 and FTA4.

IBM Tivoli Workload Scheduler operations revolve around a production day, a 24-hour cycle initiated by a job called Jnextday that runs on the Master Domain Manager. Interrupting or delaying this process presents serious ramifications for the proper functioning of the scheduling network. Based upon this architecture, we determined that making IBM Tivoli Workload Scheduler highly available requires configuring at least the Master Domain Manager server for high availability. This delivers high availability of the scheduling object definitions. In some sites, even the Domain Manager and Fault Tolerant Agent servers are configured for high availability, depending upon specific business requirements.
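As a concrete illustration of why the Master Domain Manager matters, the conman command line can promote a backup workstation to acting domain manager during an outage. The domain and workstation names below are assumptions for this sketch; verify the exact syntax against the product reference manual before use:

```shell
# On the backup master (hypothetical names MASTERDM / BACKUP),
# promote BACKUP to acting domain manager for the master domain:
conman "switchmgr MASTERDM;BACKUP"

# Show all workstations in all domains to verify which one
# is now acting as the domain manager:
conman "sc @!@"
```

This manual switch is the building block that the automated cluster scenarios later in this redbook drive from fallover scripts.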

1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework
IBM Tivoli Workload Scheduler provides out-of-the-box integration with up to six other IBM products:
- IBM Tivoli Management Framework
- IBM Tivoli Business Systems Manager
- IBM Tivoli Enterprise Console
- IBM Tivoli NetView
- IBM Tivoli Distributed Monitoring (Classic Edition)
- IBM Tivoli Enterprise Data Warehouse

Other IBM Tivoli products, such as IBM Tivoli Configuration Manager, can also be integrated with IBM Tivoli Workload Scheduler, but require further configuration not provided out of the box. Best practices call for implementing IBM Tivoli Management Framework on the same Master Domain Manager server used by IBM Tivoli Workload Scheduler. Figure 1-2 on page 5 shows a typical configuration of all six products, hosted on five servers (IBM Tivoli Business Systems Manager is often hosted on two separate servers).


[Figure: five servers: one running IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework; one running IBM Tivoli Management Framework with IBM Tivoli Enterprise Console and IBM Tivoli Enterprise Data Warehouse; one running IBM Tivoli Management Framework with IBM Tivoli NetView and IBM Tivoli Distributed Monitoring; and two hosting IBM Tivoli Business Systems Manager.]

Figure 1-2 Typical site configuration of all Tivoli products that can be integrated with IBM Tivoli Workload Scheduler out of the box

In this redbook, we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework for high availability, corresponding to the upper left server in the preceding example site configuration. Sites that want to implement other products on an IBM Tivoli Workload Scheduler Master Domain Manager server for high availability should consult their IBM service provider.

IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework to deliver authentication services for the Job Scheduling Console GUI client, and to communicate with the Job Scheduling Console in general. Two components are used within IBM Tivoli Management Framework to accomplish these responsibilities: the Connector, and Job Scheduling Services (JSS). These components are only required on the Master Domain Manager server. For the purposes of this redbook, be aware that high availability of IBM Tivoli Workload Scheduler requires proper configuration of IBM Tivoli Management Framework, all Connector instances, and the Job Scheduling Services component. Figure 1-3 on page 6 shows the relationships between IBM Tivoli Management Framework, the Job Scheduling Services component, the IBM Tivoli Workload Scheduler job scheduling engine, and the Job Scheduling Console.


[Figure: three Job Scheduling Console instances connect to a single Tivoli Management Framework server, which hosts Job Scheduling Services and two Connectors, Connector_A and Connector_B, serving the scheduling networks Production_A and Production_B.]
Figure 1-3 Relationship between major components of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework

In this example, Job Scheduling Console instances on three laptops are connected to a single instance of IBM Tivoli Management Framework. This instance of IBM Tivoli Management Framework serves two different scheduling networks called Production_A and Production_B via two Connectors called Connector_A and Connector_B. Note that there is only ever one instance of the Job Scheduling Services component, no matter how many instances of the Connector and Job Scheduling Console exist in the environment.

It is possible to install IBM Tivoli Workload Scheduler without using the Connector and Job Scheduling Services components. However, without these components the benefits of the Job Scheduling Console cannot be realized; this is only an option if a customer is willing to perform all operations from the command line interface alone. In practice, both IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are typically deployed together in a high availability environment. In this redbook, we show how to deploy IBM Tivoli Workload Scheduler both with and without IBM Tivoli Management Framework.

1.3 High availability terminology used in this book


It helps to share a common terminology for concepts used in this redbook. The high availability field often uses multiple terms for the same concept, but in this redbook, we adhere to conventions set by International Business Machines Corporation whenever possible.

Cluster: A group of servers configured for high availability of one or more applications.
Node: A single server in a cluster.
Primary: A node that initially runs an application when a cluster is started.
Backup: One or more nodes designated as the servers an application will be migrated to if the application's primary node fails.
Joining: The process of a node announcing its availability to the cluster.
Fallover: The process of a backup node taking over an application from a failed primary node.
Reintegration: The process of a repaired primary node rejoining a cluster. Note that the primary node's application does not necessarily have to migrate back to the primary node. See fallback.
Fallback: The process of migrating an application from a backup node to a primary node. Note that the primary node does not have to be the original primary node (for example, it can be a new node that joins the cluster).

For more terms commonly used when configuring high availability, refer to High Availability Cluster Multi-Processing for AIX Master Glossary, Version 5.1, SC23-4867.

1.4 Overview of clustering technologies


In this section we give an overview of clustering technologies with respect to high availability. A cluster is a group of loosely coupled machines networked together, sharing disk resources. While clusters can be used for more than just their high availability benefits (like cluster multi-processing), in this document we are only concerned with illustrating the high availability benefits; consult your IBM service provider for information about how to take advantage of the other benefits of clusters for IBM Tivoli Workload Scheduler.

Clusters provide a highly available environment for mission-critical applications. For example, a cluster could run a database server program which services client applications on other systems. Clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk. A cluster takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, a cluster moves the application (along with resources that ensure access to the application) to another node in the cluster.

1.4.1 High availability versus fault tolerance


It is important for you to understand that we are detailing how to install IBM Tivoli Workload Scheduler in a highly available, but not a fault-tolerant, configuration.

Fault tolerance relies on specialized hardware to detect a hardware fault and instantaneously switch to a redundant hardware component (whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem). Although this cut-over is apparently seamless and offers non-stop service, a high premium is paid in both hardware cost and performance because the redundant components do no processing. More importantly, the fault-tolerant model does not address software failures, by far the most common reason for downtime. High availability views availability not as a series of replicated physical
components, but rather as a set of system-wide, shared resources that cooperate to guarantee essential services. High availability combines software with industry-standard hardware to minimize downtime by quickly restoring essential services when a system, component, or application fails. While not instantaneous, services are restored rapidly, often in less than a minute.


The difference between fault tolerance and high availability, then, is this: a fault-tolerant environment has no service interruption, while a highly available environment has a minimal service interruption. Many sites are willing to absorb a small amount of downtime with high availability rather than pay the much higher cost of providing fault tolerance. Additionally, in most highly available configurations, the backup processors are available for use during normal operation. High availability systems are an excellent solution for applications that can withstand a short interruption should a failure occur, but which must be restored quickly. Some industries have applications so time-critical that they cannot withstand even a few seconds of downtime. Many other industries, however, can withstand small periods of time when their database is unavailable. For those industries, HACMP can provide the necessary continuity of service without total redundancy. Figure 1-4 shows the costs and benefits of availability technologies.

Figure 1-4 Cost and benefits of availability technologies

As you can see, availability is not an all-or-nothing proposition. Think of availability as a continuum. Reliable hardware and software provide the base level of availability. Advanced features such as RAID devices provide an enhanced level of availability. High availability software provides near-continuous access to data and applications. Fault-tolerant systems ensure the constant availability of the entire system, but at a higher cost.
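The continuum can be made concrete with simple arithmetic: the yearly downtime implied by an availability level is just that fraction of the 525,600 minutes in a year. These are generic back-of-the-envelope figures, not measurements of any particular product:

```shell
# minutes of downtime per year = 525600 * (1 - availability)
for a in 0.99 0.999 0.9999; do
    awk -v a="$a" 'BEGIN { printf "%s availability -> %.0f minutes down per year\n", a, 525600 * (1 - a) }'
done
```

A highly available cluster that restores service in under a minute per incident fits comfortably within the 99.9 percent band for a handful of failures per year, while true fault tolerance targets the far right of the continuum at much higher cost.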

1.4.2 Server versus job availability


You should also be aware of the difference between availability of the server and availability of the jobs the server runs. This redbook shows how to implement a highly available server; ensuring the availability of the jobs is addressed on a job-by-job basis. For example, Figure 1-5 shows a production day with four job streams, labeled A, B, C, and D. In this example, a failure incident occurs between job streams B and D, during a period of the production day when no other job streams are running.

[Figure: timeline of the production day with job streams A, B, C, and D; the failure incident falls after job stream B completes and before job stream D starts.]

Figure 1-5 Example disaster recovery incident where no job recovery is required

Because no jobs or job streams are running at the moment of the failure, making IBM Tivoli Workload Scheduler itself highly available is sufficient to bring back scheduling services. No recovery of interrupted jobs is required. Now suppose that job streams B and D must complete before a database change is committed. If the failure happened during job stream D as in Figure 1-6 on page 11, then before IBM Tivoli Workload Scheduler is restarted on a new server, the database needs to be rolled back so that when job stream B is restarted, it will not corrupt the database.


[Figure: the same production day; this time the failure incident occurs while job stream D is running.]

Figure 1-6 Example disaster recovery incident where job recovery not related to IBM Tivoli Workload Scheduler is required

This points out some important observations about high availability with IBM Tivoli Workload Scheduler. It is your responsibility to ensure that the application-specific business logic of your application is preserved across a disaster incident. For example, IBM Tivoli Workload Scheduler cannot know that a database needs to be rolled back before a job stream is restarted as part of a high availability recovery. Knowing what job streams and jobs to restart after IBM Tivoli Workload Scheduler falls over to a backup server depends upon the specific business logic of your production plan. In fact, it is critical to the success of a recovery effort that the precise state of the production day at the moment of failure is communicated to the team performing the recovery. Let's look at Figure 1-7 on page 12, which illustrates an even more complex situation: multiple job streams are interrupted, each requiring its own, separate recovery activity.


[Figure: a failure incident that interrupts several running job streams at once, each requiring its own recovery action.]

Figure 1-7 Example disaster recovery incident requiring multiple, different job recovery actions

The recovery actions for job stream A in this example are different from the recovery actions for job stream B. In fact, depending upon the specifics of what your jobs and job streams run, the recovery actions required for a job stream after a disaster incident could differ depending upon which jobs in the stream finished before the failure. The scenario this redbook is most directly applicable to is restarting an IBM Tivoli Workload Scheduler Master Domain Manager server on a highly available cluster where no job streams other than FINAL are executed. The contents of this redbook can also be applied to Master Domain Manager, Domain Manager, and Fault Tolerant Agent servers that run job streams requiring specific recovery actions as part of a high availability recovery. But implementing these scenarios requires simultaneous implementation of high availability for the individual jobs. The exact details of such implementations are specific to your jobs, and cannot be generalized in a cookbook manner. If high availability at the job level is an important criterion, your IBM service provider can help you implement it.
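Part of that job-level work can at least be declared in the job definitions themselves. The sketch below uses IBM Tivoli Workload Scheduler composer syntax; the workstation, job, script, and logon names are illustrative, and the rollback logic behind the recovery job is entirely site-specific:

```
$JOBS
MASTER#COMMIT_DB
 SCRIPTNAME "/prod/bin/commit_db.sh"
 STREAMLOGON produser
 DESCRIPTION "Commit the database change after streams B and D"
 RECOVERY RERUN AFTER MASTER#ROLLBACK_DB
```

Note that RECOVERY options cover jobs that abend while the scheduler is running; recovering jobs interrupted by a server failure, as discussed above, still requires the operators to know the precise state of the production day at the moment of failure.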

1.4.3 Standby versus takeover configurations


There are two basic types of cluster configurations:

Standby: This is the traditional redundant hardware configuration. One or more standby nodes are set aside idling, waiting for a primary server in the cluster to fail. This is also known as hot standby.


Takeover: In this configuration, all cluster nodes process part of the cluster's workload. No nodes are set aside as standby nodes. When a primary node fails, one of the other nodes assumes the workload of the failed node in addition to its existing primary workload. This is also known as mutual takeover.

Typically, implementations of both configurations involve shared resources. Disks or mass storage like a Storage Area Network (SAN) are most frequently configured as a shared resource. Figure 1-8 shows a standby configuration in normal operation, where Node A is the primary node and Node B is the standby node, currently idling. While Node B has a connection to the shared mass storage resource, that connection is not active during normal operation.

[Figure: a two-node standby cluster; Node A is active and connected to the shared mass storage, while Node B stands by idle.]
Figure 1-8 Standby configuration in normal operation

After Node A falls over to Node B, the connection to the mass storage resource from Node B will be activated, and because Node A is unavailable, its connection to the mass storage resource is inactive. This is shown in Figure 1-9 on page 14.


[Figure: Node A is down; standby Node B is now active and its connection to the shared mass storage has taken over.]
Figure 1-9 Standby configuration in fallover operation

By contrast, in a takeover configuration of this environment, both nodes access the shared disk resource at the same time. For IBM Tivoli Workload Scheduler high availability configurations, this usually means that the shared disk resource has separate logical filesystem volumes, each accessed by a different node. This is illustrated by Figure 1-10 on page 15.


[Figure: Node A runs App 1 against filesystem Node A FS and Node B runs App 2 against filesystem Node B FS, both on the shared mass storage.]


Figure 1-10 Takeover configuration in normal operation

During normal operation of this two-node highly available cluster in a takeover configuration, the filesystem Node A FS is accessed by App 1 on Node A, while the filesystem Node B FS is accessed by App 2 on Node B. If either node fails, the other node will take on the workload of the failed node. For example, if Node A fails, App 1 is restarted on Node B, and Node B opens a connection to filesystem Node A FS. This fallover scenario is illustrated by Figure 1-11 on page 16.


[Figure: Node A is down; Node B now runs both App 1 and App 2 and accesses both Node A FS and Node B FS on the shared mass storage.]
Figure 1-11 Takeover configuration in fallover operation


Takeover configurations are more efficient with hardware resources than standby configurations because there are no idle nodes. Performance can degrade after a node failure, however, because the overall load on the remaining nodes increases. In this redbook we will be showing how to configure IBM Tivoli Workload Scheduler for takeover high availability.
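Whichever configuration is chosen, the cluster manager ultimately needs scripts it can run on any node to start and stop the application. The following is a minimal, testable sketch of that pattern; TWS_HOME, the pid file name, and NETMAN_CMD are illustrative assumptions, and a real HACMP application server script would invoke the product's own StartUp script and a conman shutdown rather than a stub daemon:

```shell
#!/bin/sh
# Sketch of cluster start/stop wrappers for a scheduler instance.
# TWS_HOME and NETMAN_CMD are stand-ins, not product defaults.

start_tws() {
    pidfile="$TWS_HOME/netman.pid"
    # Be idempotent: the cluster manager may call start on a node
    # where the instance is already up.
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running"
        return 0
    fi
    $NETMAN_CMD &               # a real script would run the product StartUp
    echo $! > "$pidfile"
    echo "started"
}

stop_tws() {
    pidfile="$TWS_HOME/netman.pid"
    if [ -f "$pidfile" ]; then
        kill "$(cat "$pidfile")" 2>/dev/null
        rm -f "$pidfile"
    fi
    echo "stopped"
}

# Demonstration against a harmless stub daemon:
TWS_HOME=$(mktemp -d)
NETMAN_CMD="sleep 60"
start_tws
stop_tws
```

Making the start script idempotent and the stop script safe to call when nothing is running matters in a takeover cluster, where the same scripts run on whichever node currently owns the resource group.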

1.4.4 IBM HACMP


The IBM tool for building UNIX-based, mission-critical computing platforms is the HACMP software. The HACMP software ensures that critical resources, such as applications, are available for processing. HACMP has two major components: high availability (HA) and cluster multi-processing (CMP). In this document we focus upon the HA component. The primary reason to create HACMP Clusters is to provide a highly available environment for mission-critical applications. For example, an HACMP Cluster could run a database server program that services client applications. The clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk.


In an HACMP Cluster, to ensure the availability of these applications, the applications are put under HACMP control. HACMP takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, HACMP moves the application (along with resources that ensure access to the application) to another node in the cluster.

Benefits
HACMP helps you with each of the following:
- The HACMP planning process and documentation include tips and advice on the best practices for installing and maintaining a highly available HACMP Cluster.
- Once the cluster is operational, HACMP provides automated monitoring and recovery for all the resources on which the application depends.
- HACMP provides a full set of tools for maintaining the cluster while keeping the application available to clients.

HACMP lets you:
- Set up an HACMP environment using online planning worksheets to simplify initial planning and setup.
- Ensure high availability of applications by eliminating single points of failure in an HACMP environment.
- Leverage high availability features available in AIX.
- Manage how a cluster handles component failures.
- Secure cluster communications.
- Set up fast disk takeover for volume groups managed by the Logical Volume Manager (LVM).
- Manage event processing for an HACMP environment.
- Monitor HACMP components and diagnose problems that may occur.

For a general overview of all HACMP features, see the IBM Web site:
http://www-1.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html

Enhancing availability with the AIX software


HACMP takes advantage of the features in AIX, which is the high-performance UNIX operating system. AIX Version 5.1 adds new functionality to further improve security and system availability. This includes improved availability of mirrored data and


enhancements to Workload Manager that help solve problems of mixed workloads by dynamically providing resource availability to critical applications. Used with the IBM eServer pSeries, HACMP can provide both horizontal and vertical scalability without downtime.

The AIX operating system provides numerous features designed to increase system availability by lessening the impact of both planned (data backup, system administration) and unplanned (hardware or software failure) downtime. These features include:
- Journaled File System and Enhanced Journaled File System
- Disk mirroring
- Process control
- Error notification

The IBM HACMP software provides a low-cost commercial computing environment that ensures that mission-critical applications can recover quickly from hardware and software failures. The HACMP software is a high availability system that ensures that critical resources are available for processing. High availability combines custom software with industry-standard hardware to minimize downtime by quickly restoring services when a system, component, or application fails. While not instantaneous, the restoration of service is rapid, usually 30 to 300 seconds.

Physical components of an HACMP Cluster


HACMP provides a highly available environment by identifying a set of resources essential to uninterrupted processing, and by defining a protocol that nodes use to collaborate to ensure that these resources are available. HACMP extends the clustering model by defining relationships among cooperating processors, where one processor provides the service offered by a peer should the peer be unable to do so. An HACMP Cluster is made up of the following physical components:
- Nodes
- Shared external disk devices
- Networks
- Network interfaces
- Clients

The HACMP software allows you to combine physical components into a wide range of cluster configurations, providing you with flexibility in building a cluster that meets your processing requirements. Figure 1-12 on page 19 shows one


example of an HACMP Cluster. Other HACMP Clusters could look very different, depending on the number of processors, the choice of networking and disk technologies, and so on.

Figure 1-12 Example HACMP Cluster

Nodes
Nodes form the core of an HACMP Cluster. A node is a processor that runs both AIX and the HACMP software. The HACMP software supports pSeries uniprocessor and symmetric multiprocessor (SMP) systems, and the Scalable POWERParallel processor (SP) systems as cluster nodes. To the HACMP software, an SMP system looks just like a uniprocessor. SMP systems provide a cost-effective way to increase cluster throughput. Each node in the cluster can be a large SMP machine, extending an HACMP Cluster far beyond the limits of a single system and allowing thousands of clients to connect to a single database.


In an HACMP Cluster, up to 32 RS/6000 or pSeries stand-alone systems, pSeries servers divided into LPARs, SP nodes, or a combination of these cooperate to provide a set of services or resources to other entities. Clustering these servers to back up critical applications is a cost-effective high availability option. A business can use more of its computing power while ensuring that its critical applications resume running after a short interruption caused by a hardware or software failure.

In an HACMP Cluster, each node is identified by a unique name. A node may own a set of resources (disks, volume groups, filesystems, networks, network addresses, and applications). Typically, a node runs a server or a back-end application that accesses data on the shared external disks. The HACMP software supports from 2 to 32 nodes in a cluster, depending on the disk technology used for the shared external disks. A node in an HACMP Cluster has several layers of software components.

Shared external disk devices


Each node must have access to one or more shared external disk devices. A shared external disk device is a disk physically connected to multiple nodes. The shared disk stores mission-critical data, typically mirrored or RAID-configured for data redundancy. A node in an HACMP Cluster must also have internal disks that store the operating system and application binaries, but these disks are not shared. Depending on the type of disk used, the HACMP software supports two types of access to shared external disk devices: non-concurrent access, and concurrent access. In non-concurrent access environments, only one connection is active at any given time, and the node with the active connection owns the disk. When a node fails, disk takeover occurs when the node that currently owns the disk leaves the cluster and a surviving node assumes ownership of the shared disk. This is what we show in this redbook. In concurrent access environments, the shared disks are actively connected to more than one node simultaneously. Therefore, when a node fails, disk takeover is not required. We do not show this here because concurrent access does not support the use of the Journaled File System (JFS), and JFS is required to use either IBM Tivoli Workload Scheduler or IBM Tivoli Management Framework.
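On AIX, the non-concurrent disk takeover that HACMP performs corresponds to a short sequence of Logical Volume Manager operations on the surviving node. The volume group, logical volume, and mount point names below are hypothetical, and in practice HACMP event scripts run these steps automatically during fallover:

```shell
# On the surviving node, acquire the shared volume group and filesystem:
varyonvg twsvg            # activate the shared volume group
fsck -y /dev/lv_tws       # verify the filesystem after the abrupt failure
mount /opt/tws            # mount the shared filesystem at its mount point
```

The reverse sequence (unmount, then varyoffvg) releases the disk when the resource group moves back, which is why the application must be stopped cleanly before a planned fallback.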

Networks
As an independent, layered component of AIX, the HACMP software is designed to work with any TCP/IP-based network. Nodes in an HACMP Cluster use the network to allow clients to access the cluster nodes, enable cluster nodes to


exchange heartbeat messages and, in concurrent access environments, serialize access to data. The HACMP software has been tested with Ethernet, Token-Ring, ATM, and other networks. The HACMP software defines two types of communication networks, characterized by whether these networks use communication interfaces based on the TCP/IP subsystem (TCP/IP-based), or communication devices based on non-TCP/IP subsystems (device-based).

Clients
A client is a processor that can access the nodes in a cluster over a local area
network. Clients each run a front-end or client application that queries the server application running on the cluster node. The HACMP software provides a highly available environment for critical data and applications on cluster nodes. Note that the HACMP software does not make the clients themselves highly available. AIX clients can use the Client Information (Clinfo) services to receive notice of cluster events. Clinfo provides an API that displays cluster status information. The /usr/es/sbin/cluster/clstat utility, a Clinfo client shipped with the HACMP software, provides information about all cluster service interfaces. The clients for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are the Job Scheduling Console and the Tivoli Desktop applications, respectively. These clients do not support the Clinfo API, but feedback that the cluster server is not available is immediately provided within these clients.

1.4.5 Microsoft Cluster Service


Microsoft Cluster Service (MSCS) provides three primary services:

Availability: Continue providing a service even during hardware or software failure. This redbook focuses upon leveraging this feature of MSCS.
Scalability: Enable additional components to be configured as system load increases.
Simplification: Manage groups of systems and their applications as a single system.
MSCS is a built-in feature of Windows NT/2000 Server Enterprise Edition. It is software that supports the connection of two servers into a cluster for higher availability and easier manageability of data and applications. MSCS can automatically detect and recover from server or application failures. It can be used to move server workload to balance utilization and to provide for planned maintenance without downtime.


MSCS uses software heartbeats to detect failed applications or servers. In the event of a server failure, it employs a shared nothing clustering architecture that automatically transfers ownership of resources (such as disk drives and IP addresses) from a failed server to a surviving server. It then restarts the failed server's workload on the surviving server. All of this, from detection to restart, typically takes under a minute. If an individual application fails (but the server does not), MSCS will try to restart the application on the same server. If that fails, it moves the application's resources and restarts it on the other server.

MSCS does not require any special software on client computers, so the user experience during failover depends on the nature of the client side of the client-server application. Client reconnection is often transparent because MSCS restarts the application using the same IP address. If a client is using stateless connections (such as a browser connection), then it would be unaware of a failover that occurred between server requests. If a failure occurs while a client is connected to the failed resource, then the client will receive whatever standard notification is provided by the client side of the application in use. For a client-side application that holds stateful connections to the server, a new logon is typically required following a server failure.

No manual intervention is required when a server comes back online following a failure. As an example, when a server that is running Microsoft Cluster Service (server A) boots, it starts the MSCS service automatically. MSCS in turn checks the interconnects to find the other server in its cluster (server B). If server A finds server B, then server A rejoins the cluster and server B updates it with current cluster information. Server A can then initiate a failback, moving failed-over workload back from server B to server A.
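Failover and failback can also be driven manually with the cluster.exe command-line utility that ships with Windows 2000 Advanced Server. The group and node names here are illustrative, not defaults:

```
REM Move the group containing the scheduler's resources to the other node:
cluster group "TWS Group" /moveto:SERVERB

REM List all groups in the cluster and their current state and owner:
cluster group
```

A manual group move of this kind is also the usual way to rehearse a fallover, or to drain a node before planned maintenance.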

Microsoft Cluster Service concepts


Microsoft provides an overview of MSCS in a white paper that is available at:
http://www.microsoft.com/ntserver/ProductInfo/Enterprise/clustering/ClustArchit.asp

The key concepts of MSCS are covered in this section.

Shared nothing
Microsoft Cluster employs a shared nothing architecture in which each server owns its own disk resources (that is, they share nothing at any point in time). In the event of a server failure, a shared nothing cluster has software that can transfer ownership of a disk from one server to another.


Cluster Services
Cluster Services is the collection of software on each node that manages all
cluster-specific activity.

Resource
A resource is the canonical item managed by the Cluster Service. A resource
may include physical hardware devices (such as disk drives and network cards), or logical items (such as logical disk volumes, TCP/IP addresses, entire applications, and databases).

Group
A group is a collection of resources to be managed as a single unit. A group contains all of the elements needed to run a specific application and for client systems to connect to the service provided by the application. Groups allow an administrator to combine resources into larger logical units and manage them as a unit. Operations performed on a group affect all resources within that group.

Fallback
Fallback (also referred to as failback) is the ability to automatically rebalance the workload in a cluster when a failed server comes back online. This is a standard feature of MSCS. For example, say server A has crashed, and its workload failed over to server B. When server A reboots, it finds server B and rejoins the cluster. It then checks to see whether any of the cluster groups running on server B would prefer to be running on server A. If so, it automatically moves those groups from server B to server A. Fallback properties include information such as which groups can fall back, which server is preferred, and during what hours a fallback is allowed. These properties can all be set from the cluster administration console.

Quorum Disk
A Quorum Disk is a disk spindle that MSCS uses to determine whether another server is up or down. When a cluster member is booted, it checks whether the cluster software is already running in the network: if it is, the cluster member joins the cluster; if it is not, the booting member establishes the cluster in the network. A problem may occur if two cluster members restart at the same time and each tries to form its own cluster. This potential problem is solved by the Quorum Disk concept. The Quorum Disk is a resource that can be owned by only one server at a time, and for which servers negotiate ownership. The member that owns the Quorum Disk creates the cluster. If that member fails, the resource is reallocated to another member, which in turn creates the cluster.

Chapter 1. Introduction

23

Negotiating for the quorum drive allows MSCS to avoid split-brain situations where both servers are active and think the other server is down.
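The quorum arbitration described above can be illustrated with a toy simulation. This is a hedged sketch of the concept only: the `QuorumDisk` class and the node names are illustrative inventions, not MSCS APIs.

```python
class QuorumDisk:
    """Toy model of an MSCS-style quorum resource: owned by at most
    one server at a time, with ownership decided by arbitration."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node):
        # Grant ownership only if the disk is free (or already owned
        # by this node); the loser of the race must not form a cluster.
        if self.owner is None:
            self.owner = node
        return self.owner == node


quorum = QuorumDisk()
# Both members restart at the same time: neither sees a running
# cluster, so both race for the quorum disk instead of both
# forming independent clusters (split-brain).
node_a_forms_cluster = quorum.try_acquire("Node_A")
node_b_forms_cluster = quorum.try_acquire("Node_B")
print(node_a_forms_cluster, node_b_forms_cluster)  # True False
```

Only the arbitration winner establishes the cluster; the other member then joins it, which is exactly the split-brain avoidance the quorum resource provides.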

Load balancing
Load balancing is the ability to move work from a very busy server to a less busy server.

Virtual server
A virtual server is the logical equivalent of a file or application server; there is no physical component in MSCS that is a virtual server. Resources are associated with a virtual server, and at any point in time different virtual servers can be owned by different cluster members. A virtual server can also be moved from one cluster member to another in the event of a system failure.

1.5 When to implement IBM Tivoli Workload Scheduler high availability


Specifying the appropriate level of high availability for IBM Tivoli Workload Scheduler often depends upon how much reliability needs to be built into the environment, balanced against the cost of the solution. High availability is a spectrum of options, driven by the kinds of failures you want IBM Tivoli Workload Scheduler to survive. These options lead to innumerable permutations of high availability configurations and scenarios. Our goal in this redbook is to demonstrate enough of the principles of configuring IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework to be highly available in a specific, non-trivial scenario that you can apply those principles to implement other configurations.

1.5.1 High availability solutions versus Backup Domain Manager


IBM Tivoli Workload Scheduler provides a degree of high availability through its Backup Domain Manager feature, which can also be implemented as a Backup Master Domain Manager. This works by duplicating the changes to the production plan from a Domain Manager to a Backup Domain Manager. When a failure is detected, a switchmgr command is issued to all workstations in the Domain Manager's domain, causing these workstations to recognize the Backup Domain Manager.

However, properly implementing a Backup Domain Manager is difficult. Custom scripts have to be developed to sense a failure, transfer the scheduling objects database, and start the switchmgr command. The code for sensing a failure is by itself a significant effort; possible failures to code for include network adapter failure, disk I/O adapter failure, network communications failure, and so on.

If any jobs are run on the Domain Manager, the difficulty of implementing a Backup Domain Manager becomes even more obvious. In this case, the custom scripts also have to convert the jobs to run on the Backup Domain Manager, for instance by changing all references to the workstation name of the Domain Manager to the workstation name of the Backup Domain Manager, and changing references to the hostname of the Domain Manager to the hostname of the Backup Domain Manager. Then even more custom scripts have to be developed to migrate scheduling object definitions back to the Domain Manager, because once the failure has been addressed, the entire process has to be reversed.

The effort required can be more than the cost of acquiring a high availability product, which addresses many of the coding issues that surround detecting hardware failures. The Total Cost of Ownership of maintaining the custom scripts also has to be taken into account, especially if jobs are run on the Domain Manager. All the nuances of ensuring that the resources jobs expect on the Domain Manager are also available on the Backup Domain Manager have to be coded into the scripts, then documented and maintained over time, presenting a constant drain on internal programming resources.

High availability products like IBM HACMP and Microsoft Cluster Service provide a well-documented, widely supported means of expressing the resources required by jobs that run on a Domain Manager. This makes it easy to add computational resources (for example, disk volumes) that jobs require into the high availability infrastructure, and keep them easily identified and documented. Software failures, such as a critical IBM Tivoli Workload Scheduler process crashing, are addressed both by the Backup Domain Manager feature and by IBM Tivoli Workload Scheduler configured for high availability.
In both configurations, recovery at the job level is often necessary to resume the production day. Implementing high availability for Fault Tolerant Agents cannot be accomplished using the Backup Domain Manager feature. Providing hardware high availability for a Fault Tolerant Agent server can be accomplished through custom scripting, but using a high availability solution is strongly recommended. Table 1-1 on page 26 illustrates the comparative advantages of using a high availability solution versus the Backup Domain Manager feature to deliver a highly available IBM Tivoli Workload Scheduler configuration.
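To make the scripting burden concrete, the following Python sketch shows only the skeleton of a failure-detection script for promoting a Backup Domain Manager. The hostname, domain, and workstation names are placeholders, and everything hard about the real task (replicating the scheduling objects database, redirecting jobs, reversing the switch) is deliberately omitted; verify the conman switchmgr syntax against your own environment before using anything like this.

```python
import subprocess

def master_is_alive(host):
    """Crude liveness probe: a single ping (Linux ping syntax).
    A production script would also have to detect network adapter,
    disk I/O adapter, and process failures -- the bulk of the effort."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def plan_recovery(master_host, domain, backup_cpu, probe=master_is_alive):
    """Return the command(s) a monitoring script would issue.
    Database transfer, job redirection, and the eventual reversal
    are left out here, but must be scripted in a real implementation."""
    if probe(master_host):
        return []  # master healthy: nothing to do
    return [["conman", "switchmgr {};{}".format(domain, backup_cpu)]]

# Injecting the probe lets the decision logic run without a network:
actions = plan_recovery("mdm1", "MASTERDM", "BKMDM", probe=lambda h: False)
print(actions)  # [['conman', 'switchmgr MASTERDM;BKMDM']]
```

Even this toy version hints at why a clustering product is usually the better investment: the probe shown covers only one failure mode out of the many listed above.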


Table 1-1 Comparative advantages of using a high availability solution

  Solution                    Hardware   Software   FTA   Cost
  High availability product   Yes        Yes        Yes   TCO: $$
  Backup Domain Manager       No         Yes        No    Initially: $; TCO: $$

1.5.2 Hardware failures to plan for


When identifying the level of high availability for IBM Tivoli Workload Scheduler, the potential hardware failures you want to plan for can affect the kind of hardware used for the high availability solution. In this section, we address some of the hardware failures you may want to consider when planning for high availability for IBM Tivoli Workload Scheduler.

Site failure occurs when an entire computer room or data center becomes unavailable. Mitigating this failure involves geographically separate nodes in a high availability cluster. Products like the IBM High Availability Geographic Cluster system (HAGEO) deliver a solution for geographic high availability. Consult your IBM service provider for help with implementing geographic high availability.

Server failure occurs when a node in a high availability cluster fails. The minimum response to mitigate this failure mode is to make a backup node available. However, you might also want to consider providing more than one backup node if the workstation you are making highly available is important enough to warrant redundant backup nodes. In this redbook we show how to implement a two-node cluster, but additional nodes are an extension of the two-node configuration. Consult your IBM service provider for help with implementing multiple-node configurations.

Network failure occurs when either the network itself (through a component like a router or switch) or a network adapter on the server fails. This type of failure is often addressed with redundant network paths in the former case, and redundant network adapters in the latter case.

Disk failure occurs when a shared disk in a high availability cluster fails. Mitigating this failure mode often involves a Redundant Array of Independent Disks (RAID) array. However, even a RAID array can catastrophically fail if two or more disk drives fail at the same time, if a power supply fails, or if a backup power supply fails at the same time as a primary power supply.
Planning for these catastrophic failures usually involves creating one or more mirrors of the RAID array, sometimes even on separate array hardware. Products like the IBM TotalStorage Enterprise Storage Server (ESS) and TotalStorage 7133 Serial Disk System can address these kinds of advanced disk availability requirements.


These are only the most common hardware failures to plan for. Other failures may also be considered while planning for high availability.

1.5.3 Summary
In summary, for all but the simplest configurations of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework, a dedicated high availability solution is the recommended way to satisfy high availability requirements. Identifying the kinds of hardware and software failures you want your IBM Tivoli Workload Scheduler installation to survive is a key part of creating an appropriate high availability solution.

1.6 Material covered in this book


In the remainder of this redbook, we focus upon the applicable high availability concepts for IBM Tivoli Workload Scheduler, and two detailed implementations of high availability for IBM Tivoli Workload Scheduler: one using IBM HACMP, and the other using Microsoft Cluster Service. In particular, we show you:

- Key architectural design issues and concepts to consider when designing highly available clusters for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework; refer to Chapter 2, High level design and architecture on page 31.

- How to implement an AIX HACMP and a Microsoft Cluster Service cluster; refer to Chapter 3, High availability cluster implementation on page 63.

- How to implement a highly available installation of IBM Tivoli Workload Scheduler, and a highly available IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework, on AIX HACMP and Microsoft Cluster Service; refer to Chapter 4, IBM Tivoli Workload Scheduler implementation in a cluster on page 183.

- How to implement a highly available installation of IBM Tivoli Management Framework on AIX HACMP and Microsoft Cluster Service; refer to Chapter 5, Implement IBM Tivoli Management Framework in a cluster on page 415.

The chapters are generally organized around the products we cover in this redbook: AIX HACMP, Microsoft Cluster Service, IBM Tivoli Workload Scheduler, and IBM Tivoli Management Framework. The nature of high availability design and implementation requires that some products and the high availability tool be considered simultaneously, especially during the planning stage. This tends to lead to a haphazard sequence when the material is organized along any single theme, except a straight cookbook recipe approach.

We believe the best results are obtained when we present enough of the theory and practice of implementing highly available IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework installations so that you can apply the illustrated principles to your own requirements. This rules out a cookbook recipe approach in the presentation, but readers who want a recipe will still find value in this redbook.

If you are particularly interested in following a specific configuration we show in this redbook from beginning to end, the following chapter road maps give the order in which you should read the material. If you are not familiar with high availability in general, and AIX HACMP or Microsoft Cluster Service in particular, we strongly recommend that you use the introductory road map shown in Figure 1-13.

Figure 1-13 Introductory high availability road map: Chapter 1, then Chapter 2

If you want an installation of IBM Tivoli Workload Scheduler in a highly available configuration by itself, without IBM Tivoli Management Framework, the road map shown in Figure 1-14 on page 29 gives the sequence of chapters to read. This would be appropriate, for example, for implementing a highly available Fault Tolerant Agent.


Figure 1-14 Road map for implementing highly available IBM Tivoli Workload Scheduler (no IBM Tivoli Management Framework, no Job Scheduling Console access through cluster nodes): Chapter 3, then Chapter 4 (except for the Framework sections)

If you want to implement an installation of IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework, use the road map shown in Figure 1-15.

Figure 1-15 Road map for implementing IBM Tivoli Workload Scheduler in a highly available configuration, with IBM Tivoli Management Framework: Chapter 3, then Chapter 4

If you want to implement an installation of IBM Tivoli Management Framework in a highly available configuration by itself, without IBM Tivoli Workload Scheduler, the road map shown in Figure 1-16 on page 30 should be used. This would be appropriate, for example, for implementing a stand-alone IBM Tivoli Management Framework server as a prelude to installing and configuring other IBM Tivoli products.


Figure 1-16 Road map for implementing IBM Tivoli Management Framework by itself: Chapter 3, then Chapter 5

High availability design is a very broad subject. In this redbook, we provide representative scenarios meant to demonstrate to you the issues that must be considered during implementation. Many ancillary issues are briefly mentioned but not explored in depth here. For further information, we encourage you to read the material presented in Related publications on page 611.


Chapter 2. High level design and architecture


Implementing a high availability cluster is an essential task for most mission-critical systems. In this chapter, we present a high level overview of HA clusters. We cover the following topics:

- Concepts of high availability clusters on page 32
- Hardware configurations on page 43
- Software configurations on page 46

Copyright IBM Corp. 2004. All rights reserved.


2.1 Concepts of high availability clusters


Today, as more and more business and non-business organizations rely on their computer systems to carry out their operations, ensuring high availability (HA) of those systems has become a key issue. A failure of a single system component could result in an extended denial of service. To avoid or minimize that risk, many sites consider an HA cluster as their high availability solution. In this section we describe what an HA cluster normally comprises, then discuss software and hardware considerations and introduce possible ways of configuring an HA cluster.

2.1.1 A bird's-eye view of high availability clusters


We start by defining the components of a high availability cluster.

Basic elements of a high availability cluster


A typical HA cluster, as introduced in Chapter 1, Introduction on page 1, is a group of machines networked together sharing external disk resources. The ultimate purpose of setting up an HA cluster is to eliminate any possible single points of failure. By eliminating single points of failure, the system can continue to run, or recover in an acceptable period of time, with minimal impact to the end users.

Two major elements make a cluster highly available:

- A set of redundant system components
- Cluster software that monitors and controls these components in case of a failure

Redundant system components provide backup in case of a single component failure. In an HA cluster, one or more additional servers are added to provide server-level backup in case of a server failure. Components within a server, such as network adapters, disk adapters, disks, and power supplies, are also duplicated to eliminate single points of failure. However, simply duplicating system components does not provide high availability; cluster software is usually employed to control them.

Cluster software is the core element in HA clusters. It is what ties system components into clusters and takes control of those clusters. Typical cluster software provides a facility to configure clusters and predefine actions to be taken in case of a component failure. The basic function of cluster software is to detect component failures and control the redundant components to restore service after a failure. In the event of a component failure, cluster software quickly transfers whatever service the failed component provided to a backup component, thus ensuring minimum downtime. There are several cluster software products on the market today; Table 2-1 lists common cluster software for each platform.
Table 2-1 Commonly used cluster software, by platform

  Platform type       Cluster software
  AIX                 HACMP
  HP-UX               MC/Service Guard
  Solaris             Sun Cluster, Veritas Cluster Service
  Linux               SCYLD Beowulf, Open Source Cluster Application Resources (OSCAR), IBM Tivoli System Automation
  Microsoft Windows   Microsoft Cluster Service

Each cluster software product has its own unique benefits, and the terminologies and technologies may differ from product to product. However, the basic concepts and functions that most cluster software provides have much in common. In the following sections we describe how an HA cluster is typically configured and how it works, using simplified examples.

Typical high availability cluster configuration


Most cluster software offers various options for configuring an HA cluster. Configurations depend on the system's high availability requirements and the cluster software used. Though there are several variations, the two configuration types most often discussed are idle (or hot) standby, and mutual takeover.

Basically, a hot standby configuration assumes a second physical node capable of taking over for the first node; the second node sits idle except in the case of a fallover. The mutual takeover configuration, by contrast, consists of two nodes, each with its own set of applications, that can take on the function of the other in case of a node failure. In this configuration, each node should have sufficient machine power to run the jobs of both nodes in the event of a node failure. Otherwise, the applications of both nodes will run in a degraded mode after a fallover, since one node is doing the job previously done by two. Mutual takeover is usually considered the more cost-effective choice, since it avoids having a system installed just for hot standby.

Figure 2-1 on page 34 shows a typical mutual takeover configuration. Using this figure as an example, we will describe what comprises an HA cluster. Keep in mind that this is just an example of an HA cluster configuration. Mutual takeover is a popular configuration; however, it may or may not be the best high availability solution for you. For a configuration that best matches your requirements, consult your service provider.

Figure 2-1 A typical HA cluster configuration: Cluster_A consists of Node_A (running App_A, owning Disk_A) and Node_B (running App_B, owning Disk_B), connected by TCP/IP networks subnet1 and subnet2 and a heartbeat network net_hb; each external disk is mirrored

As you can see in Figure 2-1, Cluster_A has Node_A and Node_B, and each node is running an application. The two nodes are set up so that each node is able to provide the function of both nodes in case a node, or a system component on a node, fails. In normal production, Node_A runs App_A and owns Disk_A, while Node_B runs App_B and owns Disk_B. When one of the nodes fails, the other node acquires ownership of both disks and runs both applications.

Redundant hardware components are the bottom-line requirement to enable a high availability scenario. In the scenario shown here, notice that most hardware components are duplicated. The two nodes are each connected to two physical TCP/IP networks, subnet1 and subnet2, providing an alternate network connection in case of a network component failure. They share the same set of external disks, Disk_A and Disk_B, each mirrored to prevent the loss of data in case of a disk failure. Both nodes have a path to the external disks. This enables one node to acquire ownership of an external disk owned by another node in case of a node failure. For example, if Node_A fails, Node_B can acquire ownership of Disk_A and resume whatever service requires Disk_A. Disk adapters connecting the nodes and the external disks are duplicated to provide backup in the event of a disk adapter failure.

In some cluster configurations, there may be an additional non-TCP/IP network that directly connects the two nodes, used for heartbeats. This is shown in the figure as net_hb. To detect failures such as network and node failures, most cluster software uses a heartbeat mechanism. Each node in the cluster sends heartbeat packets to its peer nodes over a TCP/IP network and/or a non-TCP/IP network. If heartbeat packets are not received from a peer node for a predefined amount of time, the cluster software interprets this as a node failure.

When using only TCP/IP networks to send heartbeats, it is difficult to differentiate node failures from network failures. Because of this, most cluster software recommends (or requires) a dedicated point-to-point network for sending heartbeat packets. Used together with TCP/IP networks, the point-to-point network prevents the cluster software from misinterpreting a network component failure as a node failure. The network type for this point-to-point network may vary depending on the types of network the cluster software supports; RS-232C, Target Mode SCSI, and Target Mode SSA are supported by some cluster software.
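The diagnostic value of the second heartbeat path can be sketched in a few lines of Python. This is an illustrative model only; the function name and the timeout value are assumptions, not any product's implementation.

```python
def diagnose_peer(now, last_ip_hb, last_serial_hb, timeout=10.0):
    """Classify a peer's state from heartbeat ages on two paths.

    last_ip_hb / last_serial_hb: timestamps (seconds) of the most
    recent heartbeat received over the TCP/IP network and over the
    dedicated point-to-point link (for example, RS-232C).
    """
    ip_lost = (now - last_ip_hb) > timeout
    serial_lost = (now - last_serial_hb) > timeout
    if ip_lost and serial_lost:
        return "node failure"      # peer is silent on every path
    if ip_lost or serial_lost:
        return "network failure"   # peer still alive on the other path
    return "healthy"

# TCP/IP heartbeats stopped 30 seconds ago, but the serial link is
# current: with only one network this would look like a dead node.
print(diagnose_peer(now=30, last_ip_hb=0, last_serial_hb=28))
# Both paths silent: only now is it reasonable to declare node failure.
print(diagnose_peer(now=30, last_ip_hb=0, last_serial_hb=5))
```

With a single TCP/IP network the two cases above are indistinguishable, which is exactly why a dedicated point-to-point heartbeat path is recommended.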

Managing system components


Cluster software is responsible for managing system components in a cluster. It is typically installed on the local disk of each cluster node, and a set of processes or services runs constantly on the cluster nodes, monitoring system components and taking control of those resources when required. These processes or services are often referred to as the cluster manager.

On a node, applications and the other system components that those applications require are bundled into a group. Here, we refer to each application and system component as a resource, and to a group of these resources as a resource group. A resource group generally comprises one or more applications, one or more logical storage volumes residing on an external disk, and an IP address that is not bound to a node. There may be more or fewer resources in the group, depending on application requirements and on what the cluster software is able to support.


A resource group is associated with two or more nodes in the cluster. A resource group is the unit that a cluster manager uses to move resources from one node to another: it resides on the primary node in normal production and, in the event of a node or component failure on the primary node, the cluster manager moves the group to another node. Figure 2-2 shows an example of resources and resource groups in a cluster.

Figure 2-2 Resource groups in a cluster: Node_A hosts resource group GRP_1 (application APP1; disks DISK1, DISK2; IP address 192.168.1.101), and Node_B hosts resource group GRP_2 (application APP2; disks DISK3, DISK4; IP address 192.168.1.102)

In Figure 2-2, a resource group called GRP_1 comprises an application called APP1 and external disks DISK1 and DISK2, and the IP address 192.168.1.101 is associated with GRP_1. The primary node for GRP_1 is Node_A, and the secondary node is Node_B. GRP_2 comprises application APP2, disks DISK3 and DISK4, and IP address 192.168.1.102. For GRP_2, Node_B is the primary node and Node_A is the secondary node.


Fallover and fallback of a resource group


In normal production, cluster software constantly monitors the cluster resources for any signs of failure. As soon as a cluster manager running on a node detects a node or component failure, it quickly acquires ownership of the resource group and restarts the application.

In our example, assume that Node_A crashed. Through heartbeats, Node_B detects Node_A's failure. Because Node_B is configured as the secondary node for resource group GRP_1, Node_B's cluster manager acquires ownership of GRP_1. As a result, DISK1 and DISK2 are mounted on Node_B, and the IP address associated with GRP_1 moves to Node_B. Using these resources, Node_B restarts APP1 and resumes application processing. Because these operations are initiated automatically, based on predefined actions, it is a matter of minutes before processing of APP1 is restored. This is called a fallover. Figure 2-3 on page 38 shows an image of the cluster after fallover.
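The fallover behavior just described can be modeled with a toy cluster manager in Python. This is a simulation of the concept using the names from our example, not any product's API; real cluster managers also mount disks, move IP addresses, and run application start scripts at each step.

```python
class ToyClusterManager:
    """Each resource group lists its nodes in order of preference and
    runs on the first node in that list that is still up."""

    def __init__(self, groups):
        self.groups = groups  # group name -> [primary, secondary, ...]
        self.up = {node for nodes in groups.values() for node in nodes}

    def owner(self, group):
        for node in self.groups[group]:
            if node in self.up:
                return node
        return None  # no surviving node can host this group

    def node_failed(self, node):    # e.g. detected via missed heartbeats
        self.up.discard(node)

    def node_rejoined(self, node):  # models an automatic fallback policy
        self.up.add(node)


mgr = ToyClusterManager({"GRP_1": ["Node_A", "Node_B"],
                         "GRP_2": ["Node_B", "Node_A"]})
mgr.node_failed("Node_A")
print(mgr.owner("GRP_1"), mgr.owner("GRP_2"))  # Node_B Node_B (fallover)
mgr.node_rejoined("Node_A")
print(mgr.owner("GRP_1"))                      # Node_A (fallback)
```

The last two lines preview the fallback behavior discussed below: when Node_A rejoins, an automatic policy moves GRP_1 straight back, regardless of what APP1 is doing at that moment.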


Figure 2-3 Fallover of a resource group: after Node_A fails, Node_B owns both resource groups, GRP_1 (APP1; DISK1, DISK2; 192.168.1.101) and GRP_2 (APP2; DISK3, DISK4; 192.168.1.102)

Note that this is only a typical fallover scenario. Most cluster software is capable of detecting both hardware and software component failures, if configured to do so. Beyond basic resources such as nodes, networks, and disks, exactly which other resources can be monitored differs by product, and some cluster software may require more or less configuration to monitor the same set of resources. For details on what your choice of cluster software can monitor, consult your service provider.

After a node recovers from a failure, it rejoins the cluster. Depending on the cluster configuration, a resource group that failed over to a standby node is returned to the primary node at the time of rejoining. In this redbook, we refer to this cluster behavior as fallback.


To describe this behavior using our example: when fallback is initiated, resource group GRP_1 moves back to Node_A, and the cluster returns to its normal production state as shown in Figure 2-2 on page 36. There are some considerations about fallback, such as whether it should be automatic or manual; these are summarized in 2.1.2, Software considerations on page 39, under Fallback policy.

To implement a successful HA cluster, certain software and hardware considerations should be met. In the following sections, we describe what you need to consider prior to implementing HA clusters.

2.1.2 Software considerations


In order to make your application highly available, you must either use the high availability functions that the application itself provides, or put the application under the control of cluster software. Many sites look to cluster software as the solution for application high availability, because high availability functions within an application usually do not withstand hardware failure. Though most software programs are able to run in a multi-node HA cluster environment and are controllable by cluster software, there are certain considerations to take into account. If you plan to put your application under the control of any cluster software, check the following criteria to make sure your application will be serviced correctly.

Application behavior
First think about how your application behaves in a single-node environment, then consider how it may behave in a multi-node HA cluster. This determines how you should set up your application. Consider where you should place your application executables, and how you should configure your application to achieve maximum availability. Depending on how your application works, you may have to install it on a shared disk, or just keep a copy of the software on the local disk of the other node. If several instances of the same application may run on one node in the event of a fallover, make sure that your application supports such a configuration.

Licensing
Understand your application licensing requirements and make sure the configuration you plan is not breaching the application license agreements. Some applications are license-protected by incorporating processor-specific information into each instance of application installed. This means that even though you implement your application appropriately and the cluster hardware handles the application correctly in case of a fallover, the application may not be able to start because of your license restrictions. Make sure you have licenses for each node in the cluster that may run your applications. If you plan to have several instances of the same application running on one node, ensure you have the license for each instance.

Dependencies
Check your application dependencies. When configuring your software for an HA cluster, it is important to know what your applications depend upon, but it is even more important to know what your application should not depend upon. Make sure your application is independent of any node-bound resources. Any application dependent on a resource that is bound to a particular node may have dependency problems, as those resources are usually not attached or accessible to the standby node. Binaries or configuration files installed on locally attached drives, hard-coded references to a particular device in a particular location, and hostname dependencies can all become dependency issues. Once you have confirmed that your application does not depend on any local resource, define which resources need to be in place to run your application. Common dependencies are data on external disks and an IP address for client access. Check to see whether your application has other dependencies.

Automation
Most cluster software uses scripts or agents to control software and hardware components in a cluster. For this reason, most cluster software requires that any application it handles can be started and stopped by command, without manual intervention. Scripts to start and stop your applications are generally required; make sure your application provides startup and shutdown commands.


Also, make sure that those commands do not prompt you for operator replies. If you plan to have your application monitored by the cluster software, you may have to develop a script to check the health of your application.
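A common pattern is a single non-interactive control script exposing start, stop, and status entry points for the cluster manager to call. The sketch below shows that pattern in Python; the paths, commands, and pid file are placeholder assumptions to be replaced with your application's own, and the status check is deliberately minimal.

```python
import os
import subprocess
import sys

# Placeholder commands and pid file: substitute your application's own.
APP_START = ["/opt/myapp/bin/start"]
APP_STOP = ["/opt/myapp/bin/stop"]
PIDFILE = "/var/run/myapp.pid"

def control(action):
    """Entry point the cluster manager calls. It must never prompt
    for operator input, and it reports success or failure solely
    through its exit code."""
    if action == "start":
        return subprocess.call(APP_START)
    if action == "stop":
        return subprocess.call(APP_STOP)
    if action == "status":
        try:
            with open(PIDFILE) as f:
                pid = int(f.read().strip())
            os.kill(pid, 0)        # signal 0: existence check only
        except (OSError, ValueError):
            return 1               # not running (or state unknown)
        return 0
    return 2                       # unknown action

if __name__ == "__main__":
    sys.exit(control(sys.argv[1] if len(sys.argv) > 1 else "status"))
```

The status entry point is the piece that grows into a real health monitor: checking a pid file is rarely enough, and most sites extend it to probe the application's actual service (a test query, a port check, and so on).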

Robustness
Applications should be stable enough to withstand sudden hardware failure. This means that your application should be able to restart successfully on the other node after a node failure. Tests should be executed to determine if a simple restart of the application is sufficient to recover your application after a hardware failure. If further steps are needed, verify that your recovery procedure could be automated.
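For example, a monitor along these lines (the process pattern and restart command are placeholders) can attempt one automated restart and, if that does not recover the application, return a non-zero status so the cluster software can escalate to a fallover:

```shell
#!/bin/sh
# Sketch: try a simple automated restart first; escalate if it fails.
is_alive() { pgrep -f "$1" >/dev/null; }   # crude liveness probe

recover_or_escalate() {
  pattern=$1
  restart_cmd=$2
  is_alive "$pattern" && return 0   # healthy: nothing to do
  sh -c "$restart_cmd"              # one automated restart attempt
  sleep 5                          # give the application time to come up
  is_alive "$pattern"              # still down -> non-zero -> fall over
}
```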

Fallback policy
As described in "Fallover and fallback of a resource group" on page 37, cluster software addresses node failure by initiating a fallover of the resource group from the failed node to the standby node. A failed node will eventually recover from the failure and rejoin the cluster. After the failed node rejoins, you have the choice of either keeping the resource group on the secondary node or relocating it to the original node. If you choose to relocate the resource group to the original node, you should consider the timing of when to initiate the fallback.

Most cluster software gives you options on how a resource group should be managed when a node rejoins the cluster. Typically you can either initiate a fallback automatically when the node rejoins, or have the node simply rejoin the cluster and initiate a fallback manually whenever appropriate. When choosing automatic fallback, be aware that the fallback is initiated regardless of the application status. A fallback usually requires stopping the application on the secondary node and restarting it on the primary node. Though a fallback generally takes place in a short period of time, it may disrupt your application processing.
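In HACMP, for example, this choice is expressed per resource group. The following fragment is a hedged illustration (the group and node names are invented); the three policies shown are the ones this decision turns on:

```
Resource Group Name     rg_tws1
Participating Nodes     node1 node2
Startup Policy          Online On Home Node Only
Fallover Policy         Fallover To Next Priority Node In The List
Fallback Policy         Never Fallback
```

With the fallback policy set to Never Fallback, the administrator decides when to move the group back; choosing Fallback To Higher Priority Node In The List instead would trigger the automatic fallback, and the brief application interruption, described above.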

2.1.3 Hardware considerations


In this case, hardware considerations involve how to provide redundancy. A cluster that provides maximum high availability is a cluster with no single points of failure. A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no way of providing that function, and the application or service dependent on that component becomes unavailable.

An HA cluster is able to provide high availability for most hardware components when redundant hardware is supplied and the cluster software is configured to take control of them. Preventing hardware components from becoming single points of failure is not a difficult task; simply duplicating them and configuring the cluster software to handle them in the event of a failure solves the problem for most components. However, we remind you again that adding redundant hardware components usually comes at a cost, and you may have to make compromises at some point. Consider the priority of your application, and balance the cost of a failure against the cost of additional hardware and the work it takes to configure high availability. Depending on the priority and the required level of availability for your application, manual recovery procedures after notifying the system administrator may be enough.

In Table 2-2 we point out basic hardware components that could become single points of failure, and describe how to address them. Some components simply need to be duplicated, with no additional configuration, because the hardware in which they reside automatically switches over to the redundant component in the event of a failure. For other components, you may have to perform further configuration, or write custom code to detect their failure and trigger recovery actions. This may vary depending on the cluster software you use, so consult your service provider for detailed information.
Table 2-2 Eliminating single points of failure

Node: Set up a standby node. An additional node could be a standby for one or more nodes. If an additional node will just be a hot standby for one node during production, a node with the same machine power as the active node is sufficient. If you are planning a mutual takeover, make sure the node has enough power to execute all the applications that will run on that server in the event of a fallover.

Power source: Use multiple circuits or uninterruptible power supplies (UPS).

Network adapter: To recover from a network adapter failure, you will need at least two network adapters per node. If your cluster software requires a dedicated TCP/IP network for heartbeats, additional network adapters may be added.

Network: Have multiple networks to connect nodes.

TCP/IP subsystem: Use a point-to-point network to connect nodes in the cluster. Most cluster software requires, or recommends, at least one active network (TCP/IP or non-TCP/IP) to send heartbeats to the peer nodes. By providing a point-to-point network, cluster software will be able to distinguish a network failure from a node failure. For cluster software that does not support a non-TCP/IP network for heartbeats, consult your service provider for ways to eliminate the TCP/IP subsystem as a single point of failure.

Disk adapter: Add an additional disk adapter to each node. When cabling your disks, make sure that each disk adapter has access to each external disk. This enables an alternate access path to external disks in case of a disk adapter failure.

Disk controller: Use redundant disk controllers.

Disk: Provide redundant disks and enable RAID to protect your data from disk failures.

2.2 Hardware configurations


In this section, we discuss the different types of hardware cluster, concentrating on disk clustering rather than network or IP load balancing scenarios. We also examine the differences between a hardware cluster and a hot standby system.

2.2.1 Types of hardware cluster


There are many types of hardware clustering configurations, but here we concentrate on four different configurations: two-node cluster, multi-node cluster, grid computing, and disk mirroring (these terms may vary, depending on the hardware manufacturer).

Two-node cluster
A two-node cluster is probably the most common form of hardware cluster configuration; it consists of two nodes that are able to access a disk system externally attached to both nodes, as shown in Figure 2-4 on page 44. The external drive system can be attached over the LAN or SAN (for example, an SSA disk system), or even by local SCSI cables. This type of cluster is used when configuring only a couple of applications in a high availability cluster. This type of configuration can accommodate either Active/Passive or Active/Active operation, depending on the operating system and cluster software that is used.
Figure 2-4 Two-node cluster (Node1 and Node2 joined by public and private network connections, both attached to a shared disk)

Multi-node cluster
In a multi-node cluster, two or more nodes are able to access the same disk system, which is externally attached to this group of nodes, as shown in Figure 2-5 on page 45. The external disk system can be attached over the LAN or SAN. This type of configuration can be used for extra fault tolerance: if Node1 were to fail, all work would move onto Node2; if Node2 were to fail as well, all work would then move on to the next node, and so on. It can also support many applications running simultaneously across all nodes configured in the cluster. The number of nodes that this configuration can support depends on the hardware and software manufacturers.


Figure 2-5 Multi-node cluster (Node1 through Node4 joined by public and private network connections, all attached to a shared disk)

Grid computing
Even though grid computing is not necessarily considered a cluster, it acts like one, so we will explain the concepts involved. Grid computing is based on the concept that the IT infrastructure can be managed as a collection of distributed computing resources available over a network that appear to an end user or application as one large virtual computing system. A grid can span locations, organizations, machine architectures, and software boundaries to provide unlimited power, collaboration, and information access to everyone connected to the grid. Grid computing enables you to deliver computing power to applications and users on demand; that is, only when they need it to meet business objectives.

Disk mirroring
Disk mirroring is more commonly used in a hot standby mode, but it is also used in some clustering scenarios, especially when mirroring two systems across large distances; this will depend on the software and/or hardware capabilities. Disk mirroring functionality can be performed by software in some applications and in some clustering software packages, but it can also be performed at the hardware level, where you have a local disk on each side of a cluster and any changes made to one side are automatically sent across to the other side, thus keeping the two sides in synchronization.

2.2.2 Hot standby system


This terminology is used for a system that is connected to the network and fully configured, with all the applications loaded but not enabled. It is normally identical, in both hardware and software, to the system it stands by for. One hot standby system can be on standby for several live systems, which can include application servers with a Fault Tolerant Agent, an IBM Tivoli Workload Scheduler Master Domain Manager, or a Domain Manager.

The advantage over a hardware cluster is that one server can be configured to stand by for several systems, which cuts the cost dramatically. The disadvantages over a hardware cluster are as follows:

- The switchover is not automatic, and it can take several minutes or even hours to bring up the standby server.
- The work that was running on the live server is not visible on the standby server, so an operator would have to know where to restart processing on the standby server.
- The standby server has a different name, so the IBM Tivoli Workload Scheduler jobs would not run on this system as defined in the database. Therefore, the IBM Tivoli Workload Scheduler administrator would have to submit the rest of the jobs by hand or create a script to do this work.
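Such a script might do no more than build and issue conman submit commands for a prepared list of jobs. In this sketch the workstation and job names are invented, and the exact conman submit syntax should be verified against your IBM Tivoli Workload Scheduler version:

```shell
#!/bin/sh
# Sketch: resubmit a prepared list of jobs on the standby workstation.
# BACKUP1 and the job names are placeholders.

build_submit_cmd() {
  # conman's short form for "submit job" is: sbj workstation#jobname
  printf 'sbj %s#%s' "$1" "$2"
}

resubmit_all() {
  cpu=$1; shift
  for job in "$@"; do
    conman "$(build_submit_cmd "$cpu" "$job")" || echo "failed: $job" >&2
  done
}
# An operator would run, e.g.: resubmit_all BACKUP1 NIGHTLY_EXTRACT DAILY_REPORT
```

Preparing such a script in advance shortens the manual recovery window that makes a hot standby slower than a hardware cluster.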

2.3 Software configurations


In this section we cover the different ways to implement IBM Tivoli Workload Scheduler in a cluster and also look at some of the currently available software configurations built into IBM Tivoli Workload Scheduler.

2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster


Here we describe the different configurations of IBM Tivoli Workload Scheduler workstations, how they are affected in a clustered environment, and why each configuration would be put into a cluster. We will also cover the different types of Extended Agents and how they work in a cluster.


Master Domain Manager


The Master Domain Manager is the most critical of all the IBM Tivoli Workload Scheduler workstation configurations. It is strongly recommended to configure this into a cluster, as it manages and controls the scheduling database. From this database, it generates and distributes the 24-hour daily scheduling plan called a Symphony file. It also controls, coordinates, and keeps track of all the scheduling dependencies throughout the entire IBM Tivoli Workload Scheduler network.

Keep the following considerations in mind when setting up a Master Domain Manager in a cluster:

- Connectivity to the IBM Tivoli Workload Scheduler database
- Ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2)
- Ability of the user interface (IBM Tivoli Workload Scheduler Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running
- Starting all the IBM Tivoli Workload Scheduler processes and services
- Coordinating all messages from and to the IBM Tivoli Workload Scheduler network
- Linking all workstations in its domain

Let's examine these considerations in more detail.

IBM Tivoli Workload Scheduler database


The IBM Tivoli Workload Scheduler database is held in the same file system as the IBM Tivoli Workload Scheduler installation directory. Therefore, provided it is not mounted on or linked to a separate file system, the database will follow the IBM Tivoli Workload Scheduler installation. If the version of IBM Tivoli Workload Scheduler used is prior to Version 8.2, then you will also have to consider the TWShome/../unison/ directory, as this is where part of the database is held (workstation and NT user information); the working security file is also held here. The TWShome/../unison/ directory may not be part of the same file system as the TWShome directory, so it may have to be added as part of the cluster package. Because the database is a sequential index link database, there is no requirement to start the database before IBM Tivoli Workload Scheduler can read it.


IBM Tivoli Workload Scheduler components file


All versions prior to IBM Tivoli Workload Scheduler Version 8.2 require a components file. This file contains the location of both the Maestro and Netman installations. On Windows it is installed in the directory c:\win32app\TWS\Unison\netman; under the UNIX operating system it is in /usr/unison/. The components file needs to be accessible on both sides of the cluster.

IBM Tivoli Workload Scheduler console


The IBM Tivoli Workload Scheduler console (called the Job Scheduling Console) connects to the IBM Tivoli Workload Scheduler engine through the IBM Tivoli Management Framework (the Framework). The Framework authenticates the logon user, and communicates with the IBM Tivoli Workload Scheduler engine through two Framework modules (Job Scheduling Services and the Job Scheduling Connector). Therefore, you need to consider both the IP address of the Framework and the location of the IBM Tivoli Workload Scheduler engine code. When a user starts the Job Scheduling Console, it prompts for a user name, the password for that user, and the address where the Framework is located. This address can be a fully qualified domain name or an IP address, but it must be able to reach wherever the Framework is running (after the cluster takeover). The Job Scheduling Console displays an engine symbol. If the IBM Tivoli Workload Scheduler engine is active, the engine symbol displays without a red cross through it; if the engine is not active, the symbol has a red cross through it, as shown in Figure 2-6.

Figure 2-6 Symbol of IBM Tivoli Workload Scheduler engine availability

Domain Manager
The Domain Manager is the second most critical workstation that needs to be protected in an HA cluster, because it controls, coordinates, and keeps track of all scheduling dependencies between workstations that are defined in the domain that this Domain Manager manages (which may be hundreds or even a thousand workstations). Keep the following considerations in mind when setting up a Domain Manager in a cluster:

- The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2).
- The ability of the user interface (Job Scheduling Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation).

In addition, the starting of all IBM Tivoli Workload Scheduler processes and services, the coordination of all messages from and to the IBM Tivoli Workload Scheduler network, and the linking of all workstations in its domain should be taken into account.

Fault Tolerant Agent


The Fault Tolerant Agent may be put in a cluster because a critical application needs to be in an HA environment, so the Fault Tolerant Agent that schedules and controls all of its batch work needs to be in the same cluster. Keep the following considerations in mind when setting up a Fault Tolerant Agent in a cluster:

- The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2)
- The ability of the user interface (Job Scheduling Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation)

In addition, the starting of all IBM Tivoli Workload Scheduler processes and services should be taken into account.

Extended Agents
An Extended Agent (xa or x-agent) serves as an interface to an external, non-IBM Tivoli Workload Scheduler system or application. It is defined as an IBM Tivoli Workload Scheduler workstation with an access method and a host. The access method communicates with the external system or application to launch and monitor jobs and to test Open file dependencies. The host is another IBM Tivoli Workload Scheduler workstation (not another x-agent) that resolves dependencies and issues job launch requests via the method.

In this section, we consider the implications of implementing the currently available Extended Agents in an HA cluster. All Extended Agents are installed partly in the application itself and partly on an IBM Tivoli Workload Scheduler workstation (which can be a Master Domain Manager, a Domain Manager, or a Fault Tolerant Agent), so we also need to consider the needs of the type of workstation on which the Extended Agent is installed. We will cover each type of Extended Agent in turn. The types of agents currently supported are: SAP R/3, Oracle e-Business Suite, PeopleSoft, the z/OS access method, and local and remote UNIX access. For each Extended Agent, we describe how the access method works in a cluster.

SAP R/3 access method


When you install and configure the SAP Extended Agent and then create a workstation definition for the SAP instance you wish to communicate with, there will be an r3batch method in the methods directory. This is a C program that communicates with the remote R/3 system. It finds where to run the job by reading the r3batch.opts file and matching the workstation name with the first field in the r3batch.opts file. r3batch then reads all the parameters in the matched workstation line, and uses these to communicate with the R/3 system. The parameter that we are interested in is the second field of the r3batch.opts file: the R/3 application server. This will be an IP address or domain name. In order for the Extended Agent to operate correctly, this system must be accessible from wherever IBM Tivoli Workload Scheduler is running. (This applies in the same way to both Microsoft and UNIX clusters.)
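Based on the field layout just described, an r3batch.opts entry can be pictured as follows. The workstation name and server address are invented, and the remaining connection fields are elided:

```
# workstation    R/3 application server       (further R/3 connection fields)
SAPCPU1          r3app.cluster.example.com    ...
```

For high availability, the second field is the one that matters: it must name an address that is reachable from whichever cluster node is currently hosting the workstation.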

Oracle e-Business Suite access method


The Oracle e-Business Suite Extended Agent is installed and configured on the same system as the Oracle Applications server. When setting this up in a cluster, you must first configure the Fault Tolerant Agent and Extended Agent to be in the same part of the cluster. When the Oracle Applications x-agent is started, the IBM Tivoli Workload Scheduler host executes the access method mcmagent. Using the x-agent's workstation name as a key, mcmagent looks up the corresponding entry in the mcmoptions file to determine which instance of Oracle Applications it will connect to. The Oracle Applications x-agent can then launch jobs on that instance of Oracle Applications and monitor the jobs through completion, writing job progress and status information to the job's standard list file.

PeopleSoft access method


The PeopleSoft Extended Agent is installed and configured on the same system as the PeopleSoft client. It also requires an IBM Tivoli Workload Scheduler Fault Tolerant Agent to host the PeopleSoft Extended Agent, which is also installed and configured on the same system as the PeopleSoft client.


When setting this configuration up in a cluster, you must first configure the Fault Tolerant Agent and Extended Agent to be in the same part of the cluster as the PeopleSoft client. To launch a PeopleSoft job, IBM Tivoli Workload Scheduler executes the psagent method, passing it information about the job. An options file provides the method with path, executable, and other information about the PeopleSoft process scheduler and application server used to launch the job. The Extended Agent can then access the PeopleSoft process request table and make an entry in the table to launch the job. Job progress and status information are written to the job's standard list file.

z/OS access method


The IBM Tivoli Workload Scheduler z/OS access method has three separate methods, depending on what you would like to communicate with on the z/OS system: JES, OPC, and CA7. All of these methods work in the same way. The Extended Agent communicates with the z/OS gateway over TCP/IP, using the HOST parameter in the workstation definition to reach the gateway. When configuring a z/OS Extended Agent in a cluster, be aware that this Extended Agent is hosted by a Fault Tolerant Agent; the considerations for a Fault Tolerant Agent are described in 2.3.1, "Configurations for implementing IBM Tivoli Workload Scheduler in a cluster" on page 46. The parameter that we are interested in is HOST in the workstation definition. This will be an IP address or domain name. In order for the Extended Agent to operate correctly, this system must be accessible from wherever IBM Tivoli Workload Scheduler is running. (This applies in the same way to both Microsoft and UNIX clusters.) Figure 2-7 on page 52 shows the architecture of the z/OS access method.


Figure 2-7 z/OS access method (the TWS host on UNIX or NT runs the mvs access method, which reads method.opts and communicates with the mvs gateway on the z/OS system; the gateway interfaces with JES2/JES3, OPC, or CA7 to run the job)
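A workstation definition for a z/OS Extended Agent can be sketched along the following lines. All names, addresses, and ports here are invented, and the exact keyword layout should be checked against the reference manual for your version:

```
cpuname MVSXA1                # the z/OS Extended Agent workstation (invented name)
  os other
  node zosgw.example.com      # address of the z/OS gateway; must be reachable
  tcpaddr 5000                #   from wherever the hosting agent runs
  for maestro
    host FTA1                 # the Fault Tolerant Agent that hosts this x-agent
    access "mvsjes"           # or mvsopc / mvsca7, depending on the method in use
end
```

In a cluster, the point to verify is that the gateway address resolves and is reachable from both nodes that may host the Fault Tolerant Agent.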

Local UNIX access method


When IBM Tivoli Workload Scheduler sends a job to a local UNIX Extended Agent, the access method, unixlocl, is invoked by the host to execute the job. The method starts by executing the standard configuration script on the host workstation (jobmanrc). If the job's logon user is permitted to use a local configuration script and the script exists as $HOME/.jobmanrc, the local configuration script is also executed. The job itself is then executed either by the standard or the local configuration script. If neither configuration script exists, the method starts the job. For the local UNIX Extended Agent to function properly in a cluster, the parameter that we are interested in is host, which is in the workstation definition. This will be an IP address or domain name; provided this system can be accessed from wherever IBM Tivoli Workload Scheduler is running, the Extended Agent will still operate correctly.

Remote UNIX access method


Note: In this section we explain how this access method works in a cluster; this explanation is not meant as instructions for setting up and configuring this Extended Agent.

When IBM Tivoli Workload Scheduler sends a job to a remote UNIX Extended Agent, the access method, unixrsh, creates a /tmp/maestro directory on the non-IBM Tivoli Workload Scheduler computer. It then transfers a wrapper script to the directory and executes it. The wrapper then executes the scheduled job. The wrapper is created only once, unless it is deleted, moved, or outdated.


For the remote UNIX Extended Agent to function properly in a cluster, the parameter that we are interested in is host, which is in the workstation definition. This will be an IP address or domain name; provided this system can be accessed from wherever IBM Tivoli Workload Scheduler is running, the Extended Agent will still operate correctly.

One instance of IBM Tivoli Workload Scheduler


In this section, we discuss the circumstances under which you might install one instance of IBM Tivoli Workload Scheduler in a high availability cluster. The first consideration is where the product is to be installed: it must be in the shared file system that moves between the two servers in the cluster. The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated with the cluster.

Why to install only one copy of IBM Tivoli Workload Scheduler


In this configuration there may be three reasons for installing only one copy of IBM Tivoli Workload Scheduler in this cluster:

- Installing a Master Domain Manager (MDM) in a cluster removes the single point of failure of the IBM Tivoli Workload Scheduler database and makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures.
- Installing a Domain Manager (DM) in a cluster makes the segment of the IBM Tivoli Workload Scheduler network that the Domain Manager manages more fault tolerant against failures.
- If an application is running in a clustered environment and is very critical to the business, it may have some critical batch scheduling; you could install a Fault Tolerant Agent in the same cluster to handle the batch work.

When to install only one copy of IBM Tivoli Workload Scheduler


You would install the workstation in this cluster in order to provide high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager in the cluster.

Where to install only one copy of IBM Tivoli Workload Scheduler


To take advantage of the cluster, install this instance of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster.


What to install
Depending on why you are installing one instance of IBM Tivoli Workload Scheduler, you may install a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.

Two instances of IBM Tivoli Workload Scheduler


In this section, we discuss the circumstances under which you might install two instances of IBM Tivoli Workload Scheduler. The first consideration is where the product is to be installed: each IBM Tivoli Workload Scheduler instance must have a different installation directory, and both must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user. The second consideration is how each IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated with the cluster. Each IBM Tivoli Workload Scheduler instance must also have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file only needs to be sourced when upgrading IBM Tivoli Workload Scheduler.

Why to install two instances of IBM Tivoli Workload Scheduler
In this configuration there may be two reasons for installing two copies of IBM Tivoli Workload Scheduler in this cluster:

- Installing a Master Domain Manager and a Domain Manager in the cluster not only removes the single point of failure of the IBM Tivoli Workload Scheduler database, but also makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures.
- If two applications are running in a clustered environment and they are very critical to the business, they may have some critical batch scheduling; you could install a Fault Tolerant Agent for each application running in the cluster to handle the batch work.

When to install two instances of IBM Tivoli Workload Scheduler
You would install both instances of IBM Tivoli Workload Scheduler in this cluster in order to give high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster.

Where to install two instances of IBM Tivoli Workload Scheduler
To take advantage of the cluster, you would install the two instances of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster. You would set up the cluster software in such a way that the first instance of IBM Tivoli Workload Scheduler would have a preference of running on server A, and the second instance would have a preference of running on server B.
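In HACMP terms, this preference is expressed by defining two resource groups whose node lists name different home nodes first. The following is only a hedged sketch, with all names invented:

```
Resource Group          rg_tws_inst1        rg_tws_inst2
Participating Nodes     nodeA nodeB         nodeB nodeA
Service IP Label        tws1_svc            tws2_svc
Shared Volume Group     vg_tws1             vg_tws2
```

In normal operation each node runs one instance; if either node fails, the surviving node temporarily hosts both resource groups (a mutual takeover).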

What to install
Depending on why you are installing two instances of IBM Tivoli Workload Scheduler, you may install a combination of a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.

Three instances of IBM Tivoli Workload Scheduler


In this section, we discuss the circumstances under which you might install three instances of IBM Tivoli Workload Scheduler. The first consideration is where the product is to be installed. When two instances of IBM Tivoli Workload Scheduler run on the same system, each instance must be installed in a different directory, and one of the instances must be installed in the shared file system that moves between the two servers in the cluster. Each instance will have its own installation user. The second consideration is how each IBM Tivoli Workload Scheduler instance is addressed. In this case, one will have the IP address that is associated with the cluster, and the other two will have the IP address of each system that is in the cluster. Each IBM Tivoli Workload Scheduler instance must have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file only needs to be sourced when upgrading IBM Tivoli Workload Scheduler.

Why to install three instances of IBM Tivoli Workload Scheduler
In this configuration, only one instance is installed in a high availability mode; the other two are installed on the local disks, as shown in Figure 2-8 on page 56. Why would you install IBM Tivoli Workload Scheduler in this configuration? Because an application that cannot be configured in a cluster is running on both sides of the cluster, so you need to install an IBM Tivoli Workload Scheduler workstation alongside each copy of the application. You may also wish to install the Master Domain Manager in the cluster, or a third application may be cluster-aware and able to move.

When to install three instances of IBM Tivoli Workload Scheduler
You would install one instance of IBM Tivoli Workload Scheduler in this cluster in order to give high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster, and one instance of IBM Tivoli Workload Scheduler on each local disk. This second type of instance may be scheduling batch work for the systems in the cluster, or for an application that only runs on the local disk subsystem.

Where to install three instances of IBM Tivoli Workload Scheduler
Install one instance of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster, and one instance of IBM Tivoli Workload Scheduler on the local disk allocated to each side of the cluster, as shown in Figure 2-8.

What to install
Depending on why you are installing three instances of IBM Tivoli Workload Scheduler as described above, you may install a Master Domain Manager, Domain Manager, or Fault Tolerant Agent in the cluster, and you would install a Fault Tolerant Agent on each side of the cluster.

Figure 2-8 Three-instance configuration (TWS Engine 1 and TWS Engine 3 reside on the local disk volumes of the two systems, and TWS Engine 2 resides on the shared disk volume that moves between them)

Multiple instances of IBM Tivoli Workload Scheduler


In this section, we discuss the circumstances under which you might install multiple instances of IBM Tivoli Workload Scheduler.

The first consideration is where the product is to be installed, because each IBM Tivoli Workload Scheduler instance must have a different installation directory. These installation directories must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user.

The second consideration is how each IBM Tivoli Workload Scheduler instance is addressed: each must use the IP address that is associated with the cluster. Each IBM Tivoli Workload Scheduler instance must also have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to


access the components file from both sides of the cluster in order to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is needed only when upgrading IBM Tivoli Workload Scheduler.

Why to install multiple instances of IBM Tivoli Workload Scheduler
In this configuration there may be many applications running in the cluster, and each application needs its own workstation associated with it. You might also want to install the Master Domain Manager, or even a Domain Manager, in the cluster to make the entire IBM Tivoli Workload Scheduler network more resilient against failures.

When to install multiple instances of IBM Tivoli Workload Scheduler
You would install multiple instances of IBM Tivoli Workload Scheduler in this cluster to give high availability to an application and to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster.

Where to install multiple instances of IBM Tivoli Workload Scheduler
All instances of IBM Tivoli Workload Scheduler would be installed on the shared disk system that moves between the two sides of the cluster. Each instance needs its own installation directory, its own installation user, and its own port number.

What to install
Depending on why you are installing multiple instances of IBM Tivoli Workload Scheduler, you may install a combination of a Master Domain Manager, Domain Managers, or Fault Tolerant Agents in the cluster.
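As noted above, each IBM Tivoli Workload Scheduler instance must listen on its own port. This is controlled by the nm port entry in each instance's localopts file. The directories and port numbers below are illustrative assumptions, not values mandated by the product:

```
# localopts for the first instance (for example, /usr/maestro/localopts)
nm port = 31111

# localopts for the second instance (for example, /usr/maestro2/localopts)
nm port = 31112
```

Because both instances can end up running on the same node after a fallover, giving each one a distinct netman port is what allows the two netman processes to coexist on that node.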

2.3.2 Software availability within IBM Tivoli Workload Scheduler


In this section we discuss software options currently available with IBM Tivoli Workload Scheduler that will give you a level of high availability if you do not have, or do not want to use, a hardware cluster.

Backup Master Domain Manager


A Backup Master Domain Manager (BMDM) and the Master Domain Manager (MDM) are critical parts of a highly available IBM Tivoli Workload Scheduler environment. If the production Master Domain Manager fails and cannot be immediately recovered, a backup Master Domain Manager will allow production to continue. The Backup Master Domain Manager must be identified when defining your IBM Tivoli Workload Scheduler network architecture; it must be a member of the


same domain as the Master Domain Manager, and the workstation definition must have the Full Status and Resolve Dependencies modes selected.

It may be necessary to transfer files between the Master Domain Manager and its standby. For this reason, the computers must have compatible operating systems: do not combine UNIX with Windows NT computers, and do not combine little-endian with big-endian computers.

When a Backup Master Domain Manager is correctly configured, the Master Domain Manager sends any changes and updates to the production file to the BMDM, but any changes or updates made to the database are not automatically sent to the BMDM. To keep the BMDM and MDM databases synchronized, you must manually copy the TWShome\mozart and TWShome\..\unison\network directories on a daily basis, following start-of-day processing (the unison directory applies only to versions older than 8.2). Any changes to the Security file must also be replicated to the BMDM, as must configuration files such as localopts and globalopts.

The main advantages over a hardware HA solution are that this capability already exists in the IBM Tivoli Workload Scheduler product, and that the basic configuration, in which the BMDM takes over the IBM Tivoli Workload Scheduler network for a short-term loss of the MDM, is fairly easy to set up. In addition, no extra hardware or software is needed to configure this solution.

The main disadvantages are that the IBM Tivoli Workload Scheduler database is not automatically synchronized, so it is the responsibility of the system administrator to keep both databases in sync. Also, for a long-term loss of the MDM, the BMDM will have to generate a new production day plan, which requires an operator to submit the Jnextday job on the BMDM. Finally, any jobs or job streams that ran on the MDM will not run on the BMDM, because the workstation names are different.
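The daily copy described above can be scripted. The following sketch copies the database directories and configuration files from the MDM's TWShome into a staging directory that would then be transferred to the BMDM; the directory layout and file names are examples based on the text above, not a definitive implementation:

```shell
#!/bin/sh
# Illustrative sketch: stage the IBM Tivoli Workload Scheduler database
# directories and configuration files for replication to the BMDM.
# Run daily after start-of-day processing. Paths are examples only.

sync_tws_db() {
    src="$1"   # TWShome on the MDM, e.g. /usr/maestro (assumption)
    dst="$2"   # staging directory destined for the BMDM

    mkdir -p "$dst" || return 1

    # The scheduling object databases live under TWShome/mozart.
    cp -r "$src/mozart" "$dst/" || return 1

    # The unison/network directory applies only to versions older than 8.2.
    if [ -d "$src/../unison/network" ]; then
        cp -r "$src/../unison/network" "$dst/" || return 1
    fi

    # Replicate the Security file and configuration files as well.
    for f in Security localopts; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dst/" || return 1
        fi
    done
    return 0
}
```

The staging directory would then be shipped to the BMDM host with whatever file transfer mechanism your site uses between the two compatible operating systems.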

Backup Domain Manager


The management of a domain can be assumed by any Fault Tolerant Agent that is a member of the same domain. The workstation definition must have the Full Status and Resolve Dependencies modes selected. When the management of a domain is passed to another workstation, all domain member workstations are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain. The identification of Domain Managers is carried forward to each new day's Symphony file, so that switches remain in effect until a subsequent switchmgr command is executed.


Once a new workstation has taken over responsibility for the domain, it can resolve any dependencies for the domain it is managing, and can process any messages to or from the network.

Switch manager command


The switchmgr command is used to transfer the management of an IBM Tivoli Workload Scheduler domain to another workstation. This command can be used on the Master Domain Manager or on a Domain Manager. To use the switchmgr command, the workstation that is to take over the management of a domain must be a member of the same domain, and it must have Resolve Dependencies and Full Status selected to work correctly. The syntax of the command is switchmgr domain;newmgr. The command stops the specified workstation and restarts it as the Domain Manager. All domain member workstations are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain. The identification of Domain Managers is carried forward to each new day's Symphony file, so that switches remain in effect until a subsequent switchmgr command is executed. However, if new day processing (the Jnextday job) is performed on the old Domain Manager, the domain acts as though another switchmgr command had been executed, and the old Domain Manager automatically resumes domain management responsibilities.
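For example, to transfer management of the domain MASTERDM to the Fault Tolerant Agent BKUP1 (both names are hypothetical, chosen here for illustration), you could issue the command from conman:

```
conman "switchmgr MASTERDM;BKUP1"
```

After the switch, BKUP1 acts as the Domain Manager until another switchmgr is executed, or until Jnextday runs on the old Domain Manager as noted above.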

2.3.3 Load balancing software


Using load balancing software is another way of bringing a form of high availability to IBM Tivoli Workload Scheduler jobs. One way to do this is by integrating IBM Tivoli Workload Scheduler with IBM LoadLeveler, because IBM LoadLeveler will detect if a system is unavailable and reschedule the job on one that is available. IBM LoadLeveler is a job management system that allows users to optimize job execution and performance by matching job processing needs with available resources. IBM LoadLeveler schedules jobs and provides functions for submitting and processing jobs quickly and efficiently in a dynamic environment. This distributed environment consists of a pool of machines or servers, often referred to as a LoadLeveler cluster. Jobs are allocated to machines in the cluster by the IBM LoadLeveler scheduler. The allocation of jobs depends on the availability of resources within the cluster and on rules defined by the IBM LoadLeveler administrator. A user submits a job to IBM LoadLeveler, and the scheduler attempts to find resources within the cluster that satisfy the requirements of the job.


At the same time, the objective of IBM LoadLeveler is to maximize the efficiency of the cluster. It attempts to do this by maximizing the utilization of resources, while at the same time minimizing the job turnaround time experienced by users.

2.3.4 Job recovery


In this section we explain how IBM Tivoli Workload Scheduler will treat a job if it has failed; this is covered in three scenarios.

A job abends in a normal job run


Prior to IBM Tivoli Workload Scheduler Version 8.2, if a job finished with a return code other than 0, the job was treated as ABENDED. If this was the correct return code for the job, the IBM Tivoli Workload Scheduler administrator would run a wrapper script around the job, or change the .jobmanrc, to change the job status to SUCCESS. IBM Tivoli Workload Scheduler Version 8.2, however, adds a new field to the job definition that allows you to set a Boolean expression for the return code of the job. This new field is called rccondsucc. In this field you can type a Boolean expression that determines the return code (RC) required to consider a job successful. For example, you can define a successful job as a job that terminates with a return code equal to 3, or with a return code greater than or equal to 5 and less than 10, as follows:
rccondsucc "RC=3 OR (RC>=5 AND RC<10)"
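Read as a predicate over the exit code, the expression above behaves like the following small shell function. This is an illustration of the semantics only; IBM Tivoli Workload Scheduler evaluates rccondsucc itself:

```shell
#!/bin/sh
# Returns 0 (success) exactly when the exit code satisfies
# RC=3 OR (RC>=5 AND RC<10), mirroring the rccondsucc example above.
rc_means_success() {
    rc="$1"
    if [ "$rc" -eq 3 ] || { [ "$rc" -ge 5 ] && [ "$rc" -lt 10 ]; }; then
        return 0
    fi
    return 1
}
```

So return codes 3, 5, 6, 7, 8, and 9 mark the job successful, while 0, 4, and 10 (for example) mark it abended.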

Job process is terminated


A job can be terminated in a number of ways, and in this section we look at some of the more common ones. Keep in mind, however, that it is not the responsibility of IBM Tivoli Workload Scheduler to roll back any actions that a job may have performed while it was executing; it is the responsibility of the person creating the script or command to allow for a rollback or recovery action. When a job abends, IBM Tivoli Workload Scheduler can rerun the abended job, stop, or continue with the next job. You can also generate a prompt that must be replied to, or launch a recovery job. The full combination of the job flow is shown in Figure 2-9 on page 61.
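In the composer job definition language, these recovery choices are expressed with the RECOVERY keyword. The sketch below is illustrative only; the workstation name CPU1, the script path, the logon, and the job names JOB3 and JOB3A are assumptions, and the exact syntax should be checked against the Reference Guide for your version:

```
$JOBS
CPU1#JOB3
 SCRIPTNAME "/scripts/job3.sh"
 STREAMLOGON maestro
 RECOVERY RERUN AFTER CPU1#JOB3A
```

This definition corresponds to the "run recovery job, then rerun" path in the job flow: if JOB3 abends, recovery job JOB3A is launched, and JOB3 is then rerun.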


Figure 2-9 IBM Tivoli Workload Scheduler job flow (JOB 1 through JOB 5 run in sequence; when JOB 3 fails, the available actions are: issue a recovery prompt, run recovery job JOB3A, stop, rerun JOB 3, or continue with JOB 4)

Here are the details of this job flow:

When a job is killed through the conman CLI or the Job Scheduling Console, the job is terminated by terminating its parent process. Terminating any child processes that the parent has started is the responsibility of the operating system, not IBM Tivoli Workload Scheduler. After the job has been terminated, it is displayed in the current plan in the Abend state. Any jobs or job streams that are dependent on a killed job are not released. Killed jobs can be rerun.

When the process ID is killed directly, on either UNIX or Microsoft operating systems, the behavior is the same: the job is terminated by terminating the parent process, the operating system is responsible for any child processes, the job is displayed in the current plan in the Abend state, dependent jobs and job streams are not released, and the killed job can be rerun.

When the system crashes or is powered off, the job is killed by the crash or by the system being powered down. In that case, when the system is rebooted


and IBM Tivoli Workload Scheduler is restarted, IBM Tivoli Workload Scheduler checks whether any jobs are left in the jobtable file:
If jobs are left, IBM Tivoli Workload Scheduler reads the process ID of each job and then checks whether that process ID is still running.
If it is not running, the job is marked as Abend and the normal recovery action runs.
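The "is that process ID still running" check can be illustrated with a small shell probe. This illustrates the idea only and is not IBM Tivoli Workload Scheduler's internal implementation:

```shell
#!/bin/sh
# Probe whether a process ID recorded before a crash is still alive.
# kill -0 sends no signal; it only reports whether the PID exists
# and we are permitted to signal it.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}
```

A scheduler restarting after a crash can use such a probe to decide whether a job recorded in its job table is still executing or should be marked as abended.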


Chapter 3.

High availability cluster implementation


In this chapter, we provide step-by-step installation procedures to help you plan and implement a high availability cluster using High Availability Cluster Multiprocessing for AIX (HACMP) and Microsoft Cluster Service (MSCS), for a mutual takeover scenario of Tivoli Framework and Tivoli Workload Scheduler. We cover the following procedures:
Our high availability cluster scenarios on page 64
Implementing an HACMP cluster on page 67
Implementing a Microsoft Cluster on page 138


3.1 Our high availability cluster scenarios


With numerous cluster software packages on the market, each offering a variety of configurations, there are many ways of configuring a high availability (HA) cluster. We cannot cover all possible scenarios, so in this redbook we focus on two scenarios which we believe are applicable to many sites: a mutual takeover scenario for IBM Tivoli Workload Scheduler, and a hot standby scenario for IBM Tivoli Management Framework. We discuss these scenarios in detail in the following sections.

3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler


In our scenario, we assume a customer that plans to manage jobs for two mission-critical business applications. They plan to have the two business applications running on separate nodes, and would like to install a separate IBM Tivoli Workload Scheduler Master Domain Manager on each node to control the jobs for each application. They are seeking a cost-effective, high availability solution to minimize the downtime of their business application processing in case of a system component failure. Possible solutions for this customer would be the following:
Create separate HA clusters for each node by adding two hot standby nodes and two sets of external disks.
Create one HA cluster by adding an additional node and a set of external disks. Designate the additional node as a hot standby node for the two application servers.
Create one HA cluster by adding a set of external disks. Each node is designated as a standby for the other node.
The first two solutions require additional machines to sit idle until a fallover occurs, while the third solution utilizes all machines in the cluster, leaving no node idle. Here we assume that the customer chose the third solution. This type of configuration is called a mutual takeover, as discussed in Chapter 2, High level design and architecture on page 31. Note that this type of cluster configuration is allowed only if the two business applications in question, and IBM Tivoli Workload Scheduler itself, have no software or hardware restrictions that prevent them from running on the same physical machine. Figure 3-1 on page 65 shows a diagram of our cluster.


Figure 3-1 Overview of our HA cluster scenario (Node1 runs a Tivoli Management Framework server with TWS1 (for APP1) and TWS Connector1; Node2 runs a Tivoli Management Framework server with TWS Connector2 and TWS2 (for APP2); each connector defines two instances, Instance1 and Instance2)

In Figure 3-1, node Node1 controls TWS1 and the application APP1; node Node2 controls TWS2 and application APP2. TWS1 and TWS2 are installed on the shared external disk so that each instance of IBM Tivoli Workload Scheduler can fall over to the other node. We assume that system administrators would like to use the Job Scheduling Console (JSC) to manage the scheduling objects and production plans. To enable the use of JSC, Tivoli Management Framework (TMF) and the IBM Tivoli Workload Scheduler Connector must be installed. Because each IBM Tivoli Workload Scheduler instance requires a running Tivoli Management Framework server or a Managed Node, we need two Tivoli Management Region (TMR) servers. Keep in mind that in our scenario, when a node fails, everything installed on the external disk will fall over to the other node. Note that it is not officially supported to run two TMR servers or Managed Nodes on one node, so the possible configuration of TMF in this scenario is to install the TMR servers on the local disks of each node.


IBM Tivoli Workload Scheduler connector will also be installed on the local disks. To enable JSC access to both IBM Tivoli Workload Scheduler instances during a fallover, each IBM Tivoli Workload Scheduler Connector needs two connector instances defined: Instance1 to control TWS1, and Instance2 to control TWS2.

3.1.2 Hot standby for IBM Tivoli Management Framework


In our mutual takeover scenario, we covered the high availability scenario for IBM Tivoli Workload Scheduler. Here, we cover a simple hot standby scenario for IBM Tivoli Management Framework (TMF). Because running multiple instances of the Tivoli Management Region server (TMR server) on one node is not supported, a possible configuration to provide high availability is a cluster with a primary node, a hot standby node, and a disk subsystem. Figure 3-2 shows a simple hot standby HA cluster with two nodes and a shared external disk. IBM Tivoli Management Framework is installed on the shared disk, and normally resides on Node1. When Node1 fails, TMF will fall over to Node2.

Figure 3-2 A hot standby cluster for a TMR server (a two-node cluster, Node1 and Node2, with the Tivoli Management Framework server on the shared disk)


3.2 Implementing an HACMP cluster


HACMP is clustering software provided by IBM for implementing high availability solutions on AIX platforms. In the following sections we describe the process of planning, designing, and implementing a high availability scenario using HACMP. For each implementation procedure discussed in this section, we provide examples by planning an HACMP cluster for the IBM Tivoli Workload Scheduler high availability scenario.

3.2.1 HACMP hardware considerations


As mentioned in Chapter 2, High level design and architecture on page 31, the ultimate goal in implementing an HA cluster is to eliminate all possible single points of failure. Keep in mind that cluster software alone does not provide high availability; appropriate hardware configuration is also required to implement a highly available cluster. This applies to HACMP as well. For general hardware considerations about an HA cluster, refer to 2.2, Hardware configurations on page 43.

3.2.2 HACMP software considerations


HACMP provides high availability solutions not only for hardware, but also for the mission-critical applications that utilize those hardware resources. Consider the following before you plan high availability for your applications in an HACMP cluster:
Application behavior
Licensing
Dependencies
Automation
Robustness
Fallback policy
For details on what you should consider for each criterion, refer to 2.1.2, Software considerations on page 39.

3.2.3 Planning and designing an HACMP cluster


As mentioned in Chapter 2, High level design and architecture on page 31, the sole purpose of implementing an HACMP cluster is to eliminate possible single points of failure in order to provide high availability for both hardware and software. Thoroughly planning the use of both hardware and software components is required prior to HACMP installation.


To plan our HACMP cluster, we followed the steps described in HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861. Because we cannot cover all possible high availability scenarios, in this section we discuss only the planning tasks needed to run IBM Tivoli Workload Scheduler in a mutual takeover scenario. Planning tasks for a mutual takeover scenario can be extended for a hot standby scenario. The following planning tasks are described in this section:
Planning the cluster nodes
Planning applications for high availability
Planning the cluster network
Planning the shared disk device
Planning the shared LVM components
Planning the resource groups
Planning the cluster event processing

Use planning worksheets


A set of offline and online planning worksheets is provided for HACMP 5.1. For a complete and detailed description of planning an HACMP cluster using these worksheets, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861. By filling out these worksheets, you will be able to plan your HACMP cluster easily. Here we describe some of the offline worksheets. (Note, however, that our description is limited to the worksheets and fields that we used; fields and worksheets that were not essential to our cluster plan are omitted.)

Draw a cluster diagram


In addition to using these worksheets, it is also advisable to draw a diagram of your cluster as you plan. A cluster diagram should provide an image of where your cluster resources are located. In the following planning tasks, we show diagrams of what we planned in each task.

Planning the cluster nodes


The initial step in planning an HACMP cluster is to plan the size of your cluster. This is the phase where you define how many nodes and disk subsystems you need in order to provide high availability for your applications. If you plan high availability for one application, a cluster of two nodes and one disk subsystem may be sufficient. If you are planning high availability for two or more applications installed on several servers, you may want to add more nodes. You may also need more than one disk subsystem, depending on the amount of data you plan to store on external disks.


For our mutual takeover scenario, we plan a cluster with two AIX systems and an SSA disk subsystem to share. The machine types used in the scenario were dictated by our lab environment. When planning for a mutual takeover configuration, make sure that each node has sufficient machine power to perform its own job and the job of the other node in the event that a fallover occurs; otherwise, you may not achieve maximum application performance during a fallover. Figure 3-3 shows a diagram of our cluster node plan. The cluster name is cltivoli. There are two nodes in the cluster, tivaix1 and tivaix2, sharing an external disk subsystem. Each node will run one business application and one instance of IBM Tivoli Workload Scheduler to manage that application. Note that we left some blank space in the diagram for adding cluster resources as we plan. In this section and the following sections, we describe the procedures to plan an HACMP cluster using our scenario as an example. Some of the planning tasks may be extended to configure high availability for other applications; however, we are not aware of application-specific considerations and high availability requirements.

Figure 3-3 Cluster node plan (cluster cltivoli: nodes tivaix1 and tivaix2, each with a disk adapter attached to the shared disk subsystem)


Planning applications for high availability


After you have planned the cluster nodes, the next step is to define where your application executables and data should be located, and how you would like HACMP to control them in the event of a fallover or fallback. For each business application or any other software packages that you plan to make highly available, create an application definition and an application server.

Application definition means giving a user-defined name to your application, and then defining the location of your application and how it should be handled in the event of fallover. An application server is a cluster resource that associates the application and the names of specially written scripts to start and stop the application. Defining an application server enables HACMP to resume application processing on the takeover node when a fallover occurs.
When planning for applications, the following HACMP worksheets may help you record the required information:
Application Worksheet
Application Server Worksheet

Completing the Application Worksheet


The Application Worksheet helps you to define which applications should be controlled by HACMP, and how they should be controlled. After completing this worksheet, you should have at least the following information defined:
Application Name: Assign a name for each application you plan to put under HACMP control. This is a user-defined name associated with an application.

Location of Key Application Files: For each application, define the following information for the executables and data. Make sure you enter the full path when specifying the path of the application files.
- Directory/path where the files reside
- Location (internal disk/external disk)
- Sharing (shared/not shared)
Cluster Name: Name of the cluster where the application resides.
Node Relationship: Specify the takeover relationship of the nodes in the cluster (choose from cascading, concurrent, or rotating). For a description of each takeover relationship,


refer to HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864.
Fallover Strategy: Define the fallover strategy for the application. Specify which node will be the primary and which node will be the takeover.
- The primary node will control the application in normal production.
- The takeover node will control the application in the event of a fallover deriving from a primary node failure or a component failure on the primary node.
Start Commands/Procedures: Specify the commands or procedures for starting the application. This is the command or procedure you will write in your application start script. HACMP invokes the application start script in the event of a cluster start or a fallover.
Verification Commands: Specify the commands to verify that your application is up and running.
Stop Commands/Procedures: Specify the commands or procedures for stopping the application. This is the command or procedure you will write in your application stop script. HACMP invokes the application stop script in the event of a cluster shutdown or a fallover.
Verification Commands: Specify the commands to verify that your application has stopped.

Note: Start, stop, and verification commands specified in this worksheet should not require operator intervention; otherwise, cluster startup, shutdown, fallover, and fallback may halt. Table 3-1 on page 72 and Table 3-2 on page 73 show examples of how we planned IBM Tivoli Workload Scheduler for high availability. Because we plan to have two instances of IBM Tivoli Workload Scheduler running in one cluster, we defined two applications, TWS1 and TWS2. In normal production, TWS1 resides on node tivaix1, while TWS2 resides on node tivaix2.


Note: If you are installing an IBM Tivoli Workload Scheduler version older than 8.2, you cannot use /usr/maestro and /usr/maestro2 as the installation directories. Why? Because in such a case, both installations would use the same Unison directory, and the Unison directory must be unique for each installation. Therefore, if installing a version older than 8.2, we suggest using /usr/maestro1/TWS and /usr/maestro2/TWS as the installation directories, which makes the Unison directory unique. For Version 8.2, this is not important, since the Unison directory is not used in this version.

Notice that we placed the IBM Tivoli Workload Scheduler file systems on the external shared disk, because both nodes must be able to access the two IBM Tivoli Workload Scheduler instances for fallover. The two instances of IBM Tivoli Workload Scheduler should be located in different file systems to allow both instances of IBM Tivoli Workload Scheduler to run on the same node. Node relationship is set to cascading because each IBM Tivoli Workload Scheduler instance should return to its primary node when it rejoins the cluster.
Table 3-1 Application definition for IBM Tivoli Workload Scheduler 1 (TWS1)
Application Name: TWS1
Location of Key Application Files:
1. Directory/path where the files reside: /usr/maestro
2. Location (internal disk/external disk): external disk
3. Sharing (shared/not shared): shared
Cluster Name: cltivoli
Node Relationship: cascading
Fallover Strategy: tivaix1: primary; tivaix2: takeover


Start Commands/Procedures:
1. run conman start to start IBM Tivoli Workload Scheduler processes as the maestro user
2. run conman link @;noask to link all FTAs
Verification Commands:
1. run ps -ef | grep -v grep | grep /usr/maestro
2. check that netman, mailman, batchman and jobman are running
Stop Commands/Procedures:
1. run conman unlink @;noask to unlink all FTAs as the maestro user
2. run conman shut to stop IBM Tivoli Workload Scheduler processes as the maestro user
Verification Commands:
1. run ps -ef | grep -v grep | grep /usr/maestro
2. check that netman, mailman, batchman and jobman are not running

Table 3-2 Application definition for IBM Tivoli Workload Scheduler 2 (TWS2)
Application Name: TWS2
Location of Key Application Files:
1. Directory/path where the files reside: /usr/maestro2
2. Location (internal disk/external disk): external disk
3. Sharing (shared/not shared): shared
Cluster Name: cltivoli
Node Relationship: cascading
Fallover Strategy: tivaix2: primary; tivaix1: takeover


Start Commands/Procedures:
1. run conman start to start IBM Tivoli Workload Scheduler processes as the maestro user
2. run conman link @;noask to link all FTAs
Verification Commands:
1. run ps -ef | grep -v grep | grep /usr/maestro2
2. check that netman, mailman, batchman and jobman are running
Stop Commands/Procedures:
1. run conman unlink @;noask to unlink all FTAs as the maestro user
2. run conman shut to stop IBM Tivoli Workload Scheduler processes as the maestro user
Verification Commands:
1. run ps -ef | grep -v grep | grep /usr/maestro2
2. check that netman, mailman, batchman and jobman are not running

Completing the Application Server Worksheet


This worksheet helps you to plan the application server cluster resource. Define an application server resource for each application that you defined in the Application Worksheet. If you plan to have more than one application server in a cluster, add a server name and define the corresponding start/stop scripts for each application server.
Cluster Name: Enter the name of the cluster. This must be the same name you specified for Cluster Name in the Application Worksheet.
Server Name: For each application in the cluster, specify an application server name.
Start Script: Specify the name of the application start script for the application server, in full path.
Stop Script: Specify the name of the application stop script for the application server, in full path.

We defined two application servers, tws_svr1 and tws_svr2 in our cluster; tws_svr1 is for controlling application TWS1, and tws_svr2 is for controlling application TWS2. Table 3-3 shows the values we defined for tws_svr1.


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Table 3-3 Application server definition for tws_svr1

Items to define   Value
Cluster Name      cltivoli
Server Name       tws_svr1
Start Script      /usr/es/sbin/cluster/scripts/start_tws1.sh
Stop Script       /usr/es/sbin/cluster/scripts/stop_tws1.sh

Table 3-4 shows the values we defined for tws_svr2.


Table 3-4 Application server definition for tws_svr2

Items to define   Value
Cluster Name      cltivoli
Server Name       tws_svr2
Start Script      /usr/es/sbin/cluster/scripts/start_tws2.sh
Stop Script       /usr/es/sbin/cluster/scripts/stop_tws2.sh
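The stop scripts referenced in Tables 3-3 and 3-4 follow the stop procedure from the Application Worksheet. The following is again only a sketch under our lab assumptions (instance home /usr/maestro2, user maestro). Because conman shut returns before the processes are fully gone, a small retry helper is useful; it is written so it can be tested on its own.

```shell
#!/bin/sh
# Minimal sketch of an HACMP application server stop script for the TWS2
# instance. Paths and user name are assumptions from our lab setup.
TWS_HOME=/usr/maestro2
TWS_USER=maestro

# Retry a command up to N times, one second apart: wait_until N cmd [args...]
wait_until() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# True when no processes from this TWS instance remain.
tws_is_down() {
    ! ps -ef | grep "$TWS_HOME" | grep -v grep >/dev/null
}

if [ -d "$TWS_HOME/bin" ]; then
    su - "$TWS_USER" -c "$TWS_HOME/bin/conman 'unlink @;noask'"  # unlink FTAs
    su - "$TWS_USER" -c "$TWS_HOME/bin/conman shut"              # stop TWS
    wait_until 30 tws_is_down || echo "WARNING: TWS still running" >&2
fi
```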

After planning your application, add the information about your applications into your diagram. Figure 3-4 shows an example of our cluster diagram populated with our application plan. We omitted specifics such as start scripts and stop scripts, because the purpose of the diagram is to show the names and locations of cluster resources.


Figure 3-4 Cluster diagram with applications added

Planning the cluster network


The cluster network must be planned so that network components (networks, network interface cards, TCP/IP subsystems) are eliminated as single points of failure. When planning the cluster network, complete the following tasks:

  Design the cluster network topology. The network topology is the combination of IP and non-IP (point-to-point) networks that connect the cluster nodes, and the number of connections each node has to each network.

  Determine whether service IP labels will be made highly available with IP Address Takeover (IPAT) via IP Aliases or IPAT via IP Replacement. Also determine whether IPAT will be done with or without hardware address takeover. Service IP labels are relocatable virtual IP labels that HACMP uses to ensure client connectivity in the event of a fallover. Service IP labels are not bound to a particular network adapter; they can be moved from one adapter to another, or from one node to another.



We used the TCP/IP Network Worksheet, TCP/IP Network Interface Worksheet, and Point-to-point Networks Worksheet to plan our cluster network.

Completing the TCP/IP Network Worksheet


Enter information about all elements of your TCP/IP network that you plan to have in your cluster. The following items should be identified when you complete this worksheet.

Cluster Name    The name of your cluster.

Then, for each network, specify the following:

Network Name    Assign a name for the network.
Network Type    Enter the type of the network (Ethernet, Token Ring, and so on).
Netmask         Enter the subnet mask for the network.
Node Names      Enter the names of the nodes you plan to include in the network.
IPAT via IP Aliases
                Choose whether to enable IP Address Takeover (IPAT) via IP
                Aliases. If you do not enable IPAT via IP Aliases, IPAT via IP
                Replacement is used. For descriptions of the two types of IPAT,
                refer to HACMP for AIX Version 5.1, Planning and Installation
                Guide, SC23-4861.
IP Address Offset for Heartbeating over IP Aliases
                Complete this field if you plan heartbeating over IP Aliases.
                For a detailed description, refer to HACMP for AIX Version 5.1,
                Planning and Installation Guide, SC23-4861.

Table 3-5 lists the values we specified in the worksheet. We defined one TCP/IP network called net_ether_01.

Note: A network in HACMP is a group of network adapters that share one or more service IP labels. Include all physical and logical networks that act as backups for one another in one network. For example, if two nodes are connected to two redundant physical networks, define one network that includes both physical networks.


Table 3-5 TCP/IP Network definition

Items to define                                       Value
Cluster Name                                          cltivoli
Network Name                                          net_ether_01
Network Type                                          Ethernet
Netmask                                               255.255.255.0
Node Names                                            tivaix1, tivaix2
IPAT via IP Aliases                                   enable
IP Address Offset for Heartbeating over IP Aliases    172.16.100.1

Completing the TCP/IP Network Interface Worksheet


After you have planned your TCP/IP network definition, plan your network interfaces. Associate your IP labels and IP addresses with network interfaces. When you complete this worksheet, the following items should be defined. Complete this worksheet for each node you plan to have in your cluster.

Node Name           Enter the node name.
IP Label            Assign an IP label for each IP address you plan to have
                    for the node.
Network Interface   Assign a physical network interface (for example, en0,
                    en1) to the IP label.
Network Name        Assign an HACMP network name. This must be one of the
                    networks you defined in the TCP/IP Network Worksheet.
Interface Function  Specify the function of the interface: service, boot, or
                    persistent.
IP Address          Associate an IP address with the IP label.
Netmask             Enter the netmask.
Hardware Address    Specify the hardware address of the network adapter if you
                    plan IPAT with hardware address takeover.

Note: In HACMP, there are several kinds of IP labels you can define. A boot IP label is bound to one particular network adapter and is used when the system starts. A service IP label is associated with a resource group and is able to move from one adapter to another on the same node, or from one node to another. It floats among the physical TCP/IP network interfaces to provide IP address consistency to an application serviced by HACMP; this IP label exists only when the cluster is active. A persistent IP label is bound to a particular node. It also floats among two or more adapters on one node, to provide constant access to the node regardless of the cluster state.

Table 3-6 and Table 3-7 show the values we entered in our worksheet. We omitted hardware address because we do not plan to have hardware address takeover.
Table 3-6 TCP/IP network interface plan for tivaix1

Node Name  IP Label      Network    Network Name   Interface   IP Address       Netmask
                         Interface                 Function
tivaix1    tivaix1_svc   -          net_ether_01   service     9.3.4.3          255.255.254.0
tivaix1    tivaix1_bt1   en0        net_ether_01   boot        192.168.100.101  255.255.254.0
tivaix1    tivaix1_bt2   en1        net_ether_01   boot        10.1.1.101       255.255.254.0
tivaix1    tivaix1       -          net_ether_01   persistent  9.3.4.194        255.255.254.0

Table 3-7 TCP/IP network interface plan for tivaix2

Node Name  IP Label      Network    Network Name   Interface   IP Address       Netmask
                         Interface                 Function
tivaix2    tivaix2_svc   -          net_ether_01   service     9.3.4.4          255.255.254.0
tivaix2    tivaix2_bt1   en0        net_ether_01   boot        192.168.100.102  255.255.254.0
tivaix2    tivaix2_bt2   en1        net_ether_01   boot        10.1.1.102       255.255.254.0
tivaix2    tivaix2       -          net_ether_01   persistent  9.3.4.195        255.255.254.0
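All of these IP labels must resolve to the same addresses on both nodes. One common way to guarantee this is identical /etc/hosts entries on each node. The following sketch is built from the addresses in Tables 3-6 and 3-7; it writes to a scratch file rather than /etc/hosts directly, so the entries can be reviewed before being merged by hand on each node.

```shell
# Candidate /etc/hosts entries for the cluster IP labels; review the scratch
# file, then merge into /etc/hosts on both nodes. Addresses are from our lab.
cat > /tmp/cluster_hosts.example <<'EOF'
192.168.100.101  tivaix1_bt1
10.1.1.101       tivaix1_bt2
9.3.4.3          tivaix1_svc
9.3.4.194        tivaix1
192.168.100.102  tivaix2_bt1
10.1.1.102       tivaix2_bt2
9.3.4.4          tivaix2_svc
9.3.4.195        tivaix2
EOF
```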

Completing the Point-to-Point Networks Worksheet


You may need a non-TCP/IP point-to-point network in the event of a TCP/IP subsystem failure. The Point-to-Point Networks Worksheet helps you to plan non-TCP/IP point-to-point networks. When you complete this worksheet, you should have the following items defined.

Cluster Name    Enter the name of your cluster.

Then, for each of your point-to-point networks, enter the values for the following items:

Network Name    Enter the name of your point-to-point network.
Network Type    Enter the type of your network (disk heartbeat, Target Mode
                SCSI, Target Mode SSA, and so on). Refer to HACMP for AIX
                Version 5.1, Planning and Installation Guide, SC23-4861, for
                more information.
Node Names      Enter the names of the nodes you plan to connect with the
                network.
Hdisk           Enter the name of the physical disk (required only for disk
                heartbeat networks).

Table 3-8 lists the definition for the point-to-point network we planned in our scenario. We omitted the value for Hdisk because we did not plan disk heartbeats.

Table 3-8 Point-to-point network definition

Cluster Name   Network Name   Network Type      Node Names
cltivoli       net_tmssa_01   Target Mode SSA   tivaix1, tivaix2

After you have planned your network, add your network plans to the diagram. Figure 3-5 on page 81 shows our cluster diagram with our cluster network plans added. There is a TCP/IP network definition net_ether_01. For a point-to-point network, we added net_tmssa_01. For each node, we have two boot IP labels, a service IP label and a persistent IP label.


Figure 3-5 Cluster diagram with network topology added

Planning the shared disk device


Shared disk is an essential part of an HACMP cluster. It usually consists of one or more external disks shared between two or more cluster nodes. In a non-concurrent configuration, only one node at a time has control of the disks. If a node fails in a cluster, the node with the next highest priority in the cluster acquires ownership of the disks and restarts applications to restore mission-critical services. This ensures constant access to application executables and data stored on those disks. When you complete this task, at a minimum the following information should be defined:

  Type of shared disk technology
  The number of disks required
  The number of disk adapters


HACMP supports several disk technologies, such as SCSI and SSA. For a complete list of supported disk devices, consult your service provider. We used an SSA disk subsystem for our scenario, because this was the given environment of our lab. Because we planned to have two instances of IBM Tivoli Workload Scheduler installed in separate volume groups, we needed at least two physical disks. Mirroring SSA disks is recommended, because mirroring enables the replacement of a failed disk drive without powering off the entire system. Mirroring requires an additional disk for each physical disk, so the minimum number of disks would be four. To avoid having disk adapters become single points of failure, redundant disk adapters are also recommended; in our scenario, we had only one disk adapter for each node, due to the limitations of our lab environment. Figure 3-6 shows a cluster diagram with at least four available disks in the SSA subsystem.


Figure 3-6 Cluster diagram with disks added

Planning the shared LVM components


AIX uses the Logical Volume Manager (LVM) to manage disks. LVM components (physical volumes, volume groups, logical volumes, and file systems) map data between physical and logical storage. For more information on the AIX LVM, refer to the AIX System Management Guide. To share and control data in an HACMP cluster, you need to define LVM components. When planning the LVM components, we used the Shared Volume Group/Filesystem Worksheet.



Completing the Shared Volume Group/Filesystem Worksheet


For each field in the worksheet, you should have at least the following information defined. This worksheet should be completed for each shared volume group you plan to have in your cluster.

Node Names          Record the node name of each node in the cluster.
Shared Volume Group Name
                    Specify a name for the volume group shared by the nodes in
                    the cluster.
Major Number        Record the planned major number for the volume group. This
                    field can be left blank to use the system default if you do
                    not plan to have an NFS-exported file system. When
                    configuring the shared volume group, take note of the major
                    number; you may need it when importing the volume group on
                    peer nodes.
Log Logical Volume Name
                    Specify a name for the log logical volume (jfslog). The
                    name of the jfslog must be unique in the cluster. (Do not
                    use the system default name, because a log logical volume
                    on another node may be assigned the identical name.) When
                    creating the jfslog, make sure you rename it to the name
                    defined in this worksheet.
Physical Volumes    For each node, record the names of the physical volumes
                    you plan to include in the volume group. Physical volume
                    names may differ by node, but the PVIDs (16-digit IDs for
                    physical volumes) of the shared physical volumes must be
                    the same on all nodes. To check the PVID, use the lspv
                    command.

Then, for each logical volume you plan to include in the volume group, fill out the following information:

Logical Volume Name Assign a name for the logical volume.
Number of Copies of Logical Partition
                    Specify the number of copies of the logical volume. This
                    number is needed for mirroring the logical volume. If you
                    plan mirroring, the number of copies must be 2 or 3.
Filesystem Mount Point
                    Assign a mount point for the file system on the logical
                    volume.
Size                Specify the size of the file system in 512-byte blocks.
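Because the hdisk numbering can differ between nodes (in our plan, the same disk is hdisk6 on tivaix1 and hdisk7 on tivaix2), comparing PVIDs is the only reliable check that both nodes see the same physical volume. The following small helper, a sketch of our own making, extracts the PVID column from lspv-style output; running lspv on each node (for example over a remote shell) and comparing the values is left to the reader.

```shell
# Extract the PVID for a given hdisk from `lspv` output read on stdin.
# lspv prints: <hdisk name> <PVID> <volume group> [state]
pvid_of() {
    awk -v disk="$1" '$1 == disk { print $2 }'
}

# Typical use on live nodes (commented out here):
#   lspv | pvid_of hdisk6                # on tivaix1
#   rsh tivaix2 lspv | pvid_of hdisk7    # compare with the value on tivaix2
```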

Table 3-9 and Table 3-10 show the definitions of the volume groups planned for our scenario. Because we plan to have a shared volume group for each instance of IBM Tivoli Workload Scheduler, we defined volume groups tiv_vg1 and tiv_vg2. Then we defined one logical volume in each volume group to host a file system. We assigned major numbers instead of using the system default, but this is not mandatory when you are not using NFS-exported file systems.
Table 3-9 Definitions for shared volume group/file system (tiv_vg1)

Items to define               Value
Node Names                    tivaix1, tivaix2
Shared Volume Group Name      tiv_vg1
Major Number                  45
Log Logical Volume Name       lvtws1_log
Physical Volumes on tivaix1   hdisk6
Physical Volumes on tivaix2   hdisk7
Logical Volume Name           lvtws1
Number of Copies              2
Filesystem Mount Point        /usr/maestro
Size                          1048576

Table 3-10 Definitions for shared volume group/file system (tiv_vg2)

Items to define               Value
Node Names                    tivaix1, tivaix2
Shared Volume Group Name      tiv_vg2
Major Number                  46
Log Logical Volume Name       lvtws2_log
Physical Volumes on tivaix1   hdisk7
Physical Volumes on tivaix2   hdisk20
Logical Volume Name           lvtws2
Number of Copies              2
Filesystem Mount Point        /usr/maestro2
Size                          1048576

Figure 3-7 shows the cluster diagram with shared LVM components added.

Figure 3-7 Cluster diagram with shared LVM added



Planning the resource groups


A resource group refers to a set of resources that move from one node to another in the event of an HACMP fallover or fallback. A resource group usually consists of volume groups and a service IP address. For this task, we used the Resource Group Worksheet. One worksheet must be completed for each resource group that you plan. The following items should be defined when you complete the worksheet.

Cluster Name        Specify the name of the cluster where the resource group
                    resides. This should be the name that you defined when
                    planning the cluster nodes.
Resource Group Name Assign a name for the resource group you are planning.
Management Policy   Choose the management policy of the resource group
                    (Cascading, Rotating, Concurrent or Custom). For details on
                    management policies, refer to HACMP for AIX Version 5.1,
                    Concepts and Facilities Guide, SC23-4864.
Participating Nodes/Default Node Priority
                    Specify the names of the nodes that may acquire the
                    resource group. Make sure the nodes are listed in order of
                    priority (nodes with higher priority first).
Service IP Label    Specify the service IP label for IP Address Takeover
                    (IPAT). This IP label is associated with the resource
                    group, and it is transferred to another adapter or node in
                    the event of a resource group fallover.
Volume Groups       Specify the names of the volume group(s) to include in the
                    resource group.
Filesystems         Specify the names of the file systems to include in the
                    resource group.


Note: There is no need to specify file system names if you have specified a volume group name, because all the file systems in the specified volume group are mounted by default. In the worksheet, leave the file system field blank unless you need to include individual file systems.

Filesystems Consistency Check
                    Specify fsck or logredo. This is the method used to check
                    the consistency of the file systems.
Filesystem Recovery Method
                    Specify parallel or sequential. This is the recovery method
                    for the file systems.
Automatically Import Volume Groups
                    Set it to true if you wish to have volume groups imported
                    automatically to any cluster node in the resource chain.
Inactive Takeover   Set it to true or false. If you want the resource group
                    acquired only by the primary node, set this attribute to
                    false.
Cascading Without Fallback Activated
                    Set it to true or false. If you set this to true, a
                    resource group that has fallen over to another node will
                    not fall back automatically when its primary node rejoins
                    the cluster. This option is useful if you do not want HACMP
                    to move resource groups during application processing.
Disk Fencing Activated
                    Set it to true or false.
File systems Mounted before IP Configured
                    Set it to true or false.

Table 3-11 and Table 3-12 show how we planned our resource groups. We defined one resource group for each of the two instances of IBM Tivoli Workload Scheduler, rg1 and rg2. Notice that we set Inactive Takeover to false, because we want the resource group to always be acquired by the node that has the highest priority in the resource chain.


We set Cascading Without Fallback Activated to true because we do not want IBM Tivoli Workload Scheduler to fall back to the original node while jobs are running.
Table 3-11 Definition for resource group rg1

Items to define                             Value
Cluster Name                                cltivoli
Resource Group Name                         rg1
Management policy                           cascading
Participating Nodes/Default Node Priority   tivaix1, tivaix2
Service IP Label                            tivaix1_svc
Volume Groups                               tiv_vg1
Filesystems Consistency Check               fsck
Filesystem Recovery Method                  sequential
Automatically Import Volume Groups          false
Inactive Takeover Activated                 false
Cascading Without Fallback Activated        true
Disk Fencing Activated                      false
File systems Mounted before IP Configured   false

Table 3-12 Definition for resource group rg2

Items to define                             Value
Cluster Name                                cltivoli
Resource Group Name                         rg2
Management policy                           cascading
Participating Nodes/Default Node Priority   tivaix2, tivaix1
Service IP Label                            tivaix2_svc
Volume Groups                               tiv_vg2
Filesystems Consistency Check               fsck
Filesystem Recovery Method                  sequential
Automatically Import Volume Groups          false
Inactive Takeover Activated                 false
Cascading Without Fallback Activated        true
Disk Fencing Activated                      false
File systems Mounted before IP Configured   false

Figure 3-8 shows the cluster diagram with resource groups added.


Figure 3-8 Cluster diagram with resource group added

Planning the cluster event processing


A cluster event is a change of status in the cluster. For example, if a node leaves the cluster, that is a cluster event. HACMP takes action based on these events by invoking scripts related to each event. A default set of cluster events and related scripts is provided. If you want some specific action to be taken on an occurrence of these events, you can define a command or script to execute before or after each event. You may also define events of your own. For details on cluster events and on customizing events to tailor your needs, refer to the HACMP documentation. In this section, we give you an example of customized cluster event processing. In our scenario, we planned our resource groups with Cascading Without Fallback (CWOF) because we do not



want HACMP to move IBM Tivoli Workload Scheduler back during job execution. However, this leaves two instances of IBM Tivoli Workload Scheduler running on one node, even after the failed node has reintegrated into the cluster. The resource group must be manually transferred to the reintegrated node, or some implementation must be done to automate this procedure.

Completing the Cluster Event Worksheet


To plan cluster event processing, you will need to define several items. The Cluster Event Worksheet helps you to plan your cluster events. Here we describe the items that we defined for our cluster events.

Cluster Name        The name of the cluster.
Cluster Event Name  The name of the event you would like to configure.
Post-Event Command  The name of the command or script you would like to execute
                    after the cluster event you specified in the Cluster Event
                    Name field.

Table 3-13 shows the values we defined for each item.

Table 3-13 Definition for cluster event

Items to define      Value
Cluster Name         cltivoli
Cluster Event Name   node_up_complete
Post-Event Command   /usr/es/sbin/cluster/sh/quiesce_tws.sh
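As an illustration of what such a post-event hook might look like, the sketch below decides, from the name of the rejoining node, which resource group should be moved back to it, and only attempts the move on a real HACMP node. The script name quiesce_tws.sh and the node-to-group mapping come from our lab plan; this is not the actual script used in the scenario. The clRGmove utility exists in HACMP 5.x, but its exact flag syntax varies by release, so verify it against your documentation before enabling anything like this. A production version would also first confirm that no IBM Tivoli Workload Scheduler jobs are running.

```shell
#!/bin/sh
# Hypothetical post-event sketch for node_up_complete. The mapping reflects
# our lab plan: rg1 primary on tivaix1, rg2 primary on tivaix2.
CLRGMOVE=/usr/es/sbin/cluster/utilities/clRGmove

# Map a rejoining node to the resource group that should return to it.
rg_for_node() {
    case "$1" in
        tivaix1) echo rg1 ;;
        tivaix2) echo rg2 ;;
        *)       echo ""  ;;
    esac
}

EVENT_NODE=$1
rg=$(rg_for_node "$EVENT_NODE")

# Only act on a real cluster node; check the clRGmove flag syntax for your
# HACMP release before relying on this invocation.
if [ -n "$rg" ] && [ -x "$CLRGMOVE" ]; then
    "$CLRGMOVE" -g "$rg" -n "$EVENT_NODE" -m
fi
```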

3.2.4 Installing HACMP 5.1 on AIX 5.2


This section provides step-by-step instructions for installing HACMP 5.1 on AIX 5.2. First we cover the steps to prepare the system for installing HACMP, then we go through the installation and configuration steps.

Preparation
Before you install the HACMP software, complete the following tasks:

  Meet all hardware and software requirements
  Configure the disk subsystems
  Define the shared LVM components
  Configure network adapters


Meet all hardware and software requirements


Make sure your system meets the hardware and software requirements for HACMP software. The requirements may vary based on the hardware type and software version that you use. Refer to the release notes for requirements.

Configure the disk subsystems


Disk subsystems are an essential part of an HACMP cluster. The external disk subsystems enable physically separate nodes to share the same set of disks. Disk subsystems must be cabled and configured properly so that all nodes in a cluster are able to access the same set of disks. Configuration may differ depending on the type of disk subsystem you use. In our scenario, we used an IBM 7133 Serial Storage Architecture (SSA) Disk Subsystem Model 010. Figure 3-9 shows how we cabled our 7133 SSA Disk Subsystem.

Figure 3-9 SSA Cabling for high availability scenario


The diagram shows a single 7133 disk subsystem containing eight disk drives connected between two nodes in a cluster. Each node has one SSA Four Port Adapter. The disk drives in the 7133 are cabled to the two machines in two loops. Notice that there is a loop that connects Disk Group 1 and the two nodes, and another loop that connects Disk Group 2 and the two nodes. Each loop is connected to a different port pair on the SSA Four Port Adapters, which enables the two nodes to share the same set of disks.

Once again, keep in mind that this is only an example scenario of a 7133 disk subsystem configuration. Configuration may vary depending on the hardware you use. Consult your system administrator for precise instructions on configuring your external disk device.

Important: In our scenario, we used only one SSA adapter per node. In actual production environments, we recommend that an additional SSA adapter be added to each node to eliminate single points of failure.

Define the shared LVM components


Prior to installing HACMP, shared LVM components such as volume groups and file systems must be defined. In this section, we provide a step-by-step example of the following tasks:

  Defining volume groups
  Defining file systems
  Renaming logical volumes
  Importing volume groups
  Testing volume group migrations

Defining volume groups
1. Log in as root user on tivaix1.
2. Open SMIT. The following command takes you to the Volume Groups menu.

# smitty vg

a. In the Volume Groups menu, select Add a Volume Group, as seen in Figure 3-10.


Volume Groups

Move cursor to desired item and press Enter.

[TOP]
  List All Volume Groups
  Add a Volume Group
  Set Characteristics of a Volume Group
  List Contents of a Volume Group
  Remove a Volume Group
  Activate a Volume Group
  Deactivate a Volume Group
  Import a Volume Group
  Export a Volume Group
  Mirror a Volume Group
  Unmirror a Volume Group
  Synchronize LVM Mirrors
  Back Up a Volume Group
  Remake a Volume Group
  Preview Information about a Backup
[MORE...4]

F1=Help        F2=Refresh     F3=Cancel      Esc+8=Image
Esc+9=Shell    Esc+0=Exit     Enter=Do

Figure 3-10 Volume Group SMIT menu

b. In the Add a Volume Group screen (Figure 3-11), enter the following value for each field. Note that the physical volume names and the volume group major number may vary according to your system configuration.

   VOLUME GROUP name: tiv_vg1
   Physical partition SIZE in megabytes: 4
   PHYSICAL VOLUME names: hdisk6 hdisk7
   Activate volume group AUTOMATICALLY at system restart?: no
   Volume Group MAJOR NUMBER: 45
   Create VG Concurrent Capable?: no


Add a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                   [tiv_vg1]
  Physical partition SIZE in megabytes                 4                  +
* PHYSICAL VOLUME names                               [hdisk6 hdisk7]     +
  Force the creation of a volume group?                no                 +
  Activate volume group AUTOMATICALLY                  no                 +
    at system restart?
  Volume Group MAJOR NUMBER                           [45]               +#
  Create VG Concurrent Capable?                        no                 +
  Create a big VG format Volume Group?                 no                 +
  LTG Size in kbytes                                   128                +

F1=Help        F2=Refresh       F3=Cancel        F4=List
Esc+5=Reset    Esc+6=Command    Esc+7=Edit       Esc+8=Image
Esc+9=Shell    Esc+0=Exit       Enter=Do

Figure 3-11 Defining a volume group

c. Verify that the volume group you specified in the previous step (step b) is successfully added and varied on.

# lsvg -o

Example 3-1 shows the command output. With the -o option, you only see the volume groups that are successfully varied on. Notice that volume group tiv_vg1 is added and varied on.

Example 3-1 lsvg -o output
# lsvg -o
tiv_vg1
rootvg
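The SMIT session above can also be expressed as a single mkvg command. The following is a sketch using our worksheet values; the flags shown (-n for no automatic varyon at restart, -s for the partition size, -V for the major number, -y for the name) are standard AIX mkvg options, but verify them on your AIX level. The snippet only prints the command when mkvg is not available on the machine.

```shell
# Command-line equivalent of the SMIT "Add a Volume Group" screen (a sketch).
# -n: do not activate automatically at restart (HACMP varies the VG on)
# -s 4: 4 MB physical partitions    -V 45: volume group major number
MKVG_CMD="mkvg -n -s 4 -V 45 -y tiv_vg1 hdisk6 hdisk7"

if command -v mkvg >/dev/null 2>&1; then
    $MKVG_CMD
else
    echo "would run: $MKVG_CMD"
fi
```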

Defining file systems
1. To create a file system, enter the following command. This command takes you to the Add a Journaled File System menu.
# smitty crjfs


2. Select Add a Standard Journaled File System (Figure 3-12). You are prompted to select a volume group in which the shared filesystem should reside. Select the shared volume group that you defined previously, and proceed to the next step.

Add a Journaled File System

Move cursor to desired item and press Enter.

  Add a Standard Journaled File System
  Add a Compressed Journaled File System
  Add a Large File Enabled Journaled File System

F1=Help        F2=Refresh     F3=Cancel      Esc+8=Image
Esc+9=Shell    Esc+0=Exit     Enter=Do

Figure 3-12 Add a Journaled File System menu

3. Specify the following values for the new journaled file system:

   Volume group name: tiv_vg1
   SIZE of file system - Unit Size: Megabytes
   Number of Units: 512
   MOUNT POINT: /usr/maestro
   Mount AUTOMATICALLY at system restart?: no
   Start Disk Accounting?: no


Note: When creating a file system that will be put under control of HACMP, do not set the attribute of Mount AUTOMATICALLY at system restart to YES. HACMP will mount the file system after cluster start. Figure 3-13 shows our selections.

Add a Standard Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Volume group name                                    tiv_vg1
  SIZE of file system
    Unit Size                                          Megabytes          +
* Number of units                                     [512]               #
* MOUNT POINT                                         [/usr/maestro]
  Mount AUTOMATICALLY at system restart?               no                 +
  PERMISSIONS                                          read/write         +
  Mount OPTIONS                                       []                  +
  Start Disk Accounting?                               no                 +
  Fragment Size (bytes)                                4096               +
  Number of bytes per inode                            4096               +
  Allocation Group Size (MBytes)                       8                  +

F1=Help        F2=Refresh       F3=Cancel        F4=List
Esc+5=Reset    Esc+6=Command    Esc+7=Edit       Esc+8=Image
Esc+9=Shell    Esc+0=Exit       Enter=Do

Figure 3-13 Defining a journaled file system

4. Mount the file system using the following command:


# mount /usr/maestro

5. Using the following command, verify that the filesystem is successfully added and mounted:
# lsvg -l tiv_vg1


Example 3-2 shows a sample of the command output.

Example 3-2 lsvg -l tiv_vg1 output
# lsvg -l tiv_vg1
tiv_vg1:
LV NAME    TYPE     LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv00    jfslog   1    1    1    open/syncd   N/A
lv06       jfs      1    1    1    open/syncd   /usr/maestro

6. Unmount the file system using the following command:


# umount /usr/maestro
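The SMIT steps in this procedure map onto a single crfs invocation. The following sketch uses our worksheet values; -v names the file system type, -g the owning volume group, -m the mount point, -A no suppresses automatic mounting at restart (HACMP mounts it instead), and -a size gives the size in 512-byte blocks (1048576 blocks = 512 MB). Verify the flags on your AIX level; the snippet only prints the command when crfs is not available.

```shell
# Command-line equivalent of the "Defining file systems" SMIT steps (a sketch).
# -A no: do not mount automatically at restart, because HACMP controls mounting
CRFS_CMD="crfs -v jfs -g tiv_vg1 -m /usr/maestro -A no -a size=1048576"

if command -v crfs >/dev/null 2>&1; then
    $CRFS_CMD
else
    echo "would run: $CRFS_CMD"
fi
```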

Renaming logical volumes
Before we proceed to configuring network adapters, we need to rename the logical volumes for the file system we created. This is because in an HACMP cluster, all shared logical volumes need to have unique names.
1. Determine the names of the logical volume and the log logical volume by entering the following command.
# lsvg -l tiv_vg1

Example 3-3 shows the command output. Note that the log logical volume name is loglv00, and the logical volume for the file system is lv06.

Example 3-3 lsvg -l tiv_vg1 output
# lsvg -l tiv_vg1
tiv_vg1:
LV NAME    TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
loglv00    jfslog   1    1    1    closed/syncd   N/A
lv06       jfs      1    1    1    closed/syncd   /usr/maestro

2. Enter the following command. This will take you to the Change a Logical Volume menu.
# smitty chlv

3. Select Rename a Logical Volume (see Figure 3-14 on page 100).

Chapter 3. High availability cluster implementation

99

Change a Logical Volume

Move cursor to desired item and press Enter.

  Change a Logical Volume
  Rename a Logical Volume

F1=Help      F2=Refresh   F3=Cancel   Esc+8=Image
Esc+9=Shell  Esc+0=Exit   Enter=Do

Figure 3-14 Changing a Logical Volume menu

4. Select or type the current logical volume name, and enter the new logical volume name. In our example, we use lv06 for the current name, and lvtws1 for the new name (see Figure 3-15 on page 101).


Rename a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
* CURRENT logical volume name             [lv06]
* NEW logical volume name                 [lvtws1]

F1=Help      F2=Refresh     F3=Cancel   F4=List
Esc+5=Reset  Esc+6=Command  Esc+7=Edit  Esc+8=Image
Esc+9=Shell  Esc+0=Exit     Enter=Do

Figure 3-15 Renaming a logical volume

5. Repeat steps 1 through 4 for the logical log volume. We specified loglv00 as the current logical log volume name, and lvtws1_log as the new name.

6. Verify that the logical volume names have been changed successfully by entering the following command:
# lsvg -l tiv_vg1

Example 3-4 shows the command output.


Example 3-4 Command output of lsvg
# lsvg -l tiv_vg1
tiv_vg1:
LV NAME      TYPE     LPs   PPs    PVs  LV STATE     MOUNT POINT
lvtws1_log   jfslog   1     2      2    open/syncd   N/A
lvtws1       jfs      512   1024   2    open/syncd   /usr/maestro

7. After renaming the logical volume and the logical log volume, check the entry for the file system in the /etc/filesystems file. Make sure the attributes dev and log reflect the change: the value for dev should be the new name of the logical volume, while the value for log should be the name of the jfs log volume. If the log attribute does not reflect the change, issue the following command (we used /dev/lvtws1_log in our example):
# chfs -a log=/dev/lvtws1_log /usr/maestro

Example 3-5 shows how the entry for the file system should look in the /etc/filesystems file. Notice that the value for attribute dev is the new logical volume name (/dev/lvtws1), and the value for attribute log is the new logical log volume name (/dev/lvtws1_log).
Example 3-5 An entry in the /etc/filesystems file
/usr/maestro:
        dev             = /dev/lvtws1
        vfs             = jfs
        log             = /dev/lvtws1_log
        mount           = false
        options         = rw
        account         = false
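This check can also be scripted. The sketch below parses the stanza from Example 3-5 (captured here as a string; on a live system you would read /etc/filesystems itself) and confirms that the dev and log attributes carry the new names:

```shell
# Stanza as it appears after the rename (from Example 3-5).
stanza='/usr/maestro:
        dev             = /dev/lvtws1
        vfs             = jfs
        log             = /dev/lvtws1_log
        mount           = false'

# Extract an attribute value by name from the stanza.
get_attr() {
  printf '%s\n' "$stanza" | awk -v a="$1" '$1 == a && $2 == "=" { print $3 }'
}

[ "$(get_attr dev)" = "/dev/lvtws1" ]     && echo "dev OK"
[ "$(get_attr log)" = "/dev/lvtws1_log" ] && echo "log OK"
```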

Importing the volume groups

At this point, you should have a volume group and a file system defined on one node. The next step is to set up the volume group and the file system so that both nodes are able to access them. We do this by importing the volume group from the source node to the destination node. In our scenario, we import volume group tiv_vg1 to tivaix2.

The following steps describe how to import a volume group from one node to another. In these steps we refer to tivaix1 as the source server, and tivaix2 as the destination server.

1. Log in to the source server.

2. Check the physical volume names and physical volume IDs of the disks on which your volume group resides. In Example 3-6 on page 103, the first column indicates the physical volume name, the second column indicates the physical volume ID, and the third column shows which volume group resides on each physical volume. Note the physical volume IDs of the physical volumes related to your volume group, as this information is required in the steps that follow.
# lspv

Example 3-6 on page 103 shows example output from tivaix1. You can see that volume group tiv_vg1 resides on hdisk6 and hdisk7.


Example 3-6 Output of an lspv command
# lspv
hdisk0    0001813fe67712b5   rootvg    active
hdisk1    0001813f1a43a54d   rootvg    active
hdisk2    0001813f95b1b360   rootvg    active
hdisk3    0001813fc5966b71   rootvg    active
hdisk4    0001813fc5c48c43   None
hdisk5    0001813fc5c48d8c   None
hdisk6    000900066116088b   tiv_vg1   active
hdisk7    000000000348a3d6   tiv_vg1   active
hdisk8    00000000034d224b   None
hdisk9    none               None
hdisk10   none               None
hdisk11   none               None
hdisk12   00000000034d7fad   None

3. Vary off tiv_vg1 from the source node:


# varyoffvg tiv_vg1

4. Log in to the destination node as root.

5. Check the physical volume names and IDs on the destination node. Look for the same physical volume IDs that you identified in step 2:

# lspv

Example 3-7 shows the output of the lspv command run on node tivaix2. Note that hdisk5 has the same physical volume ID as hdisk6 on tivaix1, and hdisk6 has the same physical volume ID as hdisk7 on tivaix1.

Example 3-7 Output of lspv on node tivaix2
# lspv
hdisk0    0001814f62b2a74b   rootvg    active
hdisk1    none               None
hdisk2    none               None
hdisk3    none               None
hdisk4    none               None
hdisk5    000900066116088b   None
hdisk6    000000000348a3d6   None
hdisk7    00000000034d224b   tiv_vg2   active
hdisk16   0001814fe8d10853   None
hdisk17   none               None
hdisk18   none               None
hdisk19   none               None
hdisk20   00000000034d7fad   tiv_vg2   active
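Matching disks across nodes by PVID can be scripted rather than eyeballed. A sketch (the PVIDs and lspv output below are taken from Examples 3-6 and 3-7; on live nodes you would capture `lspv` output directly):

```shell
# PVIDs of the tiv_vg1 disks on the source node (tivaix1, Example 3-6).
src_pvids='000900066116088b
000000000348a3d6'

# lspv output captured on the destination node (tivaix2, Example 3-7).
dest_lspv='hdisk5 000900066116088b None
hdisk6 000000000348a3d6 None
hdisk7 00000000034d224b None'

# For each source PVID, print the matching destination hdisk.
matches=$(for pvid in $src_pvids; do
  printf '%s\n' "$dest_lspv" | awk -v p="$pvid" '$2 == p { print p " -> " $1 }'
done)
echo "$matches"
```

Whichever destination hdisk the first PVID maps to is a valid disk name for the importvg step that follows.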


Importing volume groups

To import a volume group, enter the following command. This will take you to the Import a Volume Group screen:
# smitty importvg

1. Specify the following values:

   VOLUME GROUP name:          tiv_vg1
   PHYSICAL VOLUME name:       hdisk5
   Volume Group MAJOR NUMBER:  45

Note: The physical volume name must be the name of a disk (on the destination node) that carries the same physical volume ID as a disk the imported volume group resides on. Also note that the value for Volume Group MAJOR NUMBER should be the same value that was specified when the volume group was created.

Our selections are shown in Figure 3-16 on page 105.


Import a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                      [Entry Fields]
  VOLUME GROUP name                   [tiv_vg1]
* PHYSICAL VOLUME name                [hdisk5]            +
  Volume Group MAJOR NUMBER           [45]                +#

F1=Help      F2=Refresh     F3=Cancel   F4=List
Esc+5=Reset  Esc+6=Command  Esc+7=Edit  Esc+8=Image
Esc+9=Shell  Esc+0=Exit     Enter=Do

Figure 3-16 Import a Volume Group

2. Use the following command to verify that the volume group is imported on the destination node:

# lsvg -o

Example 3-8 shows the command output on the destination node. Note that tiv_vg1 is now varied on to tivaix2 and is available.

Example 3-8 lsvg -o output
# lsvg -o
tiv_vg1
rootvg

Note: By default, the imported volume group is set to be varied on automatically at system restart. In an HACMP cluster, the HACMP software varies on the volume group. We need to change the property of the volume group so that it will not be automatically varied on at system restart.


3. Enter the following command.


# smitty chvg

4. Select the volume group imported in the previous step. In our example, we use tiv_vg1 (Figure 3-17).
Change a Volume Group

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                      [Entry Fields]
* VOLUME GROUP name                   [tiv_vg1]

F1=Help      F2=Refresh     F3=Cancel   F4=List
Esc+5=Reset  Esc+6=Command  Esc+7=Edit  Esc+8=Image
Esc+9=Shell  Esc+0=Exit     Enter=Do

Figure 3-17 Changing a Volume Group screen

5. Specify the following, as seen in Figure 3-18 on page 107:

   Activate volume group AUTOMATICALLY at system restart?  no


Change a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* VOLUME GROUP name                                      tiv_vg1
* Activate volume group AUTOMATICALLY                    no               +
    at system restart?
* A QUORUM of disks required to keep the volume          yes              +
    group on-line?
  Convert this VG to Concurrent Capable?                 no               +
  Change to big VG format?                               no               +
  LTG Size in kbytes                                     128              +
  Set hotspare characteristics                           n                +
  Set synchronization characteristics of stale           n                +
    partitions

F1=Help      F2=Refresh     F3=Cancel   F4=List
Esc+5=Reset  Esc+6=Command  Esc+7=Edit  Esc+8=Image
Esc+9=Shell  Esc+0=Exit     Enter=Do

Figure 3-18 Changing the properties of a volume group
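You can confirm that the change took effect by checking the AUTO ON field that `lsvg <vgname>` reports. A sketch parsing captured output (the surrounding layout of the lsvg display is an assumption; only the AUTO ON field matters here):

```shell
# Fragment of `lsvg tiv_vg1` output after the change (layout approximate).
lsvg_out='VOLUME GROUP:   tiv_vg1          VG IDENTIFIER:  000900066116088b
ACTIVE PVs:     2                AUTO ON:        no'

# Strip everything up to "AUTO ON:" and keep the value.
auto_on=$(printf '%s\n' "$lsvg_out" | sed -n 's/.*AUTO ON: *//p')
echo "AUTO ON = $auto_on"
```

A value of no means HACMP, not the operating system, will vary the volume group on.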

Note: At this point, you should have shared resources defined on one of the nodes. Perform the steps from "Defining the file systems" through "Testing volume group migrations" to define another set of shared resources that reside on the other node.

Testing volume group migrations

You should manually test the migration of volume groups between cluster nodes before installing HACMP, to ensure that each cluster node can use every volume group. To test volume group migrations in our environment:

1. Log on to tivaix1 as the root user.

2. Ensure all volume groups are available by running the lsvg command. You should see the local volume group(s), such as rootvg, and all shared volume groups. In our environment, we see the shared volume groups tiv_vg1 and tiv_vg2 from the SSA disk subsystem, as shown in Example 3-9 on page 108.


Example 3-9 Verifying all shared volume groups are available on a cluster node
[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2

3. While all shared volume groups are available, they should not be online. Use the following command to verify that no shared volume groups are online:
lsvg -o

In our environment, the output from the command, as shown in Example 3-10, indicates only the local volume group rootvg is online.
Example 3-10 Verifying no shared volume groups are online on a cluster node
[root@tivaix1:/home/root] lsvg -o
rootvg

If you do see shared volume groups listed, vary them offline by running the command:
varyoffvg volume_group_name

where volume_group_name is the name of the volume group. 4. Vary on all available shared volume groups. Run the command:
varyonvg volume_group_name

where volume_group_name is the name of the volume group, for each shared volume group. Example 3-11 shows how we varied on all shared volume groups.
Example 3-11 How to vary on all shared volume groups on a cluster node
[root@tivaix1:/home/root] varyonvg tiv_vg1
[root@tivaix1:/home/root] lsvg -o
tiv_vg1
rootvg
[root@tivaix1:/home/root] varyonvg tiv_vg2
[root@tivaix1:/home/root] lsvg -o
tiv_vg2
tiv_vg1
rootvg

Note how we used the lsvg command to verify at each step that the vary on operation succeeded.

5. Determine the corresponding logical volume(s) for each shared volume group that is varied on.


Use the following command to list the logical volume(s) of each volume group:
lsvg -l volume_group_name

where volume_group_name is the name of a shared volume group. As shown in Example 3-12, in our environment shared volume group tiv_vg1 has two logical volumes, lvtws1_log and lvtws1, and shared volume group tiv_vg2 has logical volumes lvtws2_log and lvtws2.
Example 3-12 Logical volumes in each shared volume group varied on in a cluster node
[root@tivaix1:/home/root] lsvg -l tiv_vg1
tiv_vg1:
LV NAME      TYPE    LPs  PPs   PVs  LV STATE      MOUNT POINT
lvtws1_log   jfslog  1    2     2    closed/syncd  N/A
lvtws1       jfs     512  1024  2    closed/syncd  /usr/maestro
[root@tivaix1:/home/root] lsvg -l tiv_vg2
tiv_vg2:
LV NAME      TYPE    LPs  PPs   PVs  LV STATE      MOUNT POINT
lvtws2_log   jfslog  1    2     2    closed/syncd  N/A
lvtws2       jfs     128  256   2    closed/syncd  /usr/maestro2

6. Mount the corresponding JFS logical volume(s) for each shared volume group. Use the mount command to mount each JFS logical volume to its defined mount point. Example 3-13 shows how we mounted the JFS logical volumes in our environment.
Example 3-13 Mounts of logical volumes on shared volume groups on a cluster node
[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] mount /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lvtws1       2097152   1871112   11%     1439     1% /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] mount /usr/maestro2
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lvtws2        524288    350484   34%     1437     2% /usr/maestro2

Note how we use the df command to verify that the mount point belongs to one file system before the mount command, and to a different file system after it. The differing file systems are highlighted in bold in Example 3-13.


7. Unmount each logical volume on each shared volume group. Example 3-14 shows how we unmount all logical volumes from all shared volume groups.

Example 3-14 Unmount logical volumes on shared volume groups on a cluster node
[root@tivaix1:/home/root] umount /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] umount /usr/maestro2
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr

Again, note how we use the df command to verify that a logical volume is unmounted from a shared volume group.

8. Vary off each shared volume group on the cluster node. Use the following command to vary off a shared volume group:

varyoffvg volume_group_name

where volume_group_name is the name of the volume group. Example 3-15 shows how we vary off the shared volume groups tiv_vg1 and tiv_vg2:

Example 3-15 How to vary off shared volume groups on a cluster node
[root@tivaix1:/home/root] varyoffvg tiv_vg1
[root@tivaix1:/home/root] lsvg -o
tiv_vg2
rootvg
[root@tivaix1:/home/root] varyoffvg tiv_vg2
[root@tivaix1:/home/root] lsvg -o
rootvg

Note how we use the lsvg command to verify that a shared volume group is varied off.

9. Repeat this procedure for the remaining cluster nodes. You must verify that all volume groups and logical volumes can be accessed through the appropriate varyonvg and mount commands on each cluster node.

You now know that if volume groups fail to migrate between cluster nodes after installing HACMP, the problem most likely lies with HACMP, and not with the configuration of the volume groups themselves on the cluster nodes.
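The per-node test above lends itself to scripting. The sketch below shows the core of it: extracting the jfs mount points from `lsvg -l` output (captured here from Example 3-12), which is what you would loop over with mount and umount on a live node:

```shell
# `lsvg -l tiv_vg1` output captured from Example 3-12.
lsvg_l='tiv_vg1:
LV NAME      TYPE    LPs  PPs   PVs  LV STATE      MOUNT POINT
lvtws1_log   jfslog  1    2     2    closed/syncd  N/A
lvtws1       jfs     512  1024  2    closed/syncd  /usr/maestro'

# Column 2 is the LV type; keep only jfs rows and print the mount point.
mounts=$(printf '%s\n' "$lsvg_l" | awk '$2 == "jfs" { print $NF }')
echo "$mounts"

# On a live node, the test loop would then be (not run here):
#   for mp in $mounts; do mount "$mp" && umount "$mp"; done
```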


Configure Network Adapters

Network adapters should be configured prior to installing HACMP.

Important: When configuring network adapters, bind only the boot IP address to each network adapter. No service or persistent IP address configuration is needed at this point; do not bind a service or persistent IP address to any adapter. Service and persistent IP addresses are configured after HACMP is installed.

1. Log in as root on the cluster node.

2. Enter the following command. This will take you to the SMIT TCP/IP menu:
# smitty tcpip

3. From the TCP/IP menu, select Minimum Configuration & Startup (Figure 3-19 on page 112). You are prompted to select a network interface from the Available Network Interface list. Select the network interface you want to configure.


TCP/IP

Move cursor to desired item and press Enter.

  Minimum Configuration & Startup
  Further Configuration
  Use DHCP for TCPIP Configuration & Startup
  IPV6 Configuration
  Quality of Service Configuration & Startup

F1=Help      F2=Refresh   F3=Cancel   Esc+8=Image
Esc+9=Shell  Esc+0=Exit   Enter=Do

Figure 3-19 The TCP/IP SMIT menu

4. For the network interface you have selected, specify the following items and press Enter. Figure 3-20 on page 113 shows the configuration for our cluster.

   HOSTNAME          Hostname for the node.
   Internet ADDRESS  Enter the IP address for the adapter. This must be the
                     boot address that you planned for the adapter.
   Network MASK      Enter the network mask.
   NAME SERVER       Enter the IP address and the domain name of your name
                     server.
   Default Gateway   Enter the IP address of the default gateway.


Minimum Configuration & Startup

To Delete existing configuration data, please use Further Configuration menus

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
* HOSTNAME                                              [tivaix1]
* Internet ADDRESS (dotted decimal)                     [192.168.100.101]
  Network MASK (dotted decimal)                         [255.255.254.0]
* Network INTERFACE                                      en0
  NAMESERVER
        Internet ADDRESS (dotted decimal)               [9.3.4.2]
        DOMAIN Name                                     [itsc.austin.ibm.com]
  Default Gateway
  Address (dotted decimal or symbolic name)             [9.3.4.41]
  Cost                                                  [0]                  #
  Do Active Dead Gateway Detection?                      no                  +
[MORE...2]

F1=Help      F2=Refresh     F3=Cancel   F4=List
Esc+5=Reset  Esc+6=Command  Esc+7=Edit  Esc+8=Image
Esc+9=Shell  Esc+0=Exit     Enter=Do

Figure 3-20 Configuring network adapters
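When checking that each boot address you enter lands in the intended subnet, remember that the network address is just a bitwise AND of the address and the mask. A sketch using the boot address and mask from our cluster:

```shell
# Boot address and netmask from Figure 3-20.
ip=192.168.100.101
mask=255.255.254.0

# Split both into octets, then AND them pairwise.
IFS=. read -r i1 i2 i3 i4 <<EOF
$ip
EOF
IFS=. read -r m1 m2 m3 m4 <<EOF
$mask
EOF
network="$(( i1 & m1 )).$(( i2 & m2 )).$(( i3 & m3 )).$(( i4 & m4 ))"
echo "network address: $network"
```

All boot adapters on the same HACMP network should resolve to the same network address under this calculation.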

5. Repeat steps 1 through 4 for all network adapters in the cluster.

Attention: To implement an HA cluster for IBM Tivoli Workload Scheduler, install IBM Tivoli Workload Scheduler before proceeding to 3.2.4, "Installing HACMP 5.1 on AIX 5.2" on page 92. For instructions on installing IBM Tivoli Workload Scheduler in an HA cluster environment, refer to 4.1, "Implementing IBM Tivoli Workload Scheduler in an HACMP cluster" on page 184.

Install HACMP
The best results when installing HACMP are obtained if you plan the procedure before attempting it. We recommend that you read through the following installation procedures before undertaking them. If you make a mistake, uninstall HACMP; refer to Remove HACMP on page 134.


Tip: Install HACMP after all application servers are installed, configured, and verified operational. This simplifies troubleshooting: if an application server stops working after HACMP is installed, you know the problem lies with HACMP, and you will not have to spend time determining whether the fault is in your application or in HACMP.

The major steps to install HACMP are covered in the following sections:

  "Preparation" on page 114
  "Install base HACMP 5.1" on page 122
  "Update HACMP 5.1" on page 126
  "Remove HACMP" on page 134 (optional; use only if installation or configuration fails)

The details of each step follow.

Preparation
By now you should have all the requirements fulfilled and all the preparation completed. In this section, we provide a step-by-step description of how to install HACMP Version 5.1 on AIX Version 5.2. Installation procedures may differ depending on which version of HACMP software you use. For versions other than 5.1, refer to the installation guide for the HACMP version that you install. Ensure that you are running AIX 5.2 Maintenance Level 02. To verify your current level of AIX, run the oslevel and lslpp commands, as shown in Example 3-16.
Example 3-16 Verifying the currently installed maintenance level of AIX 5.2
[root@tivaix1:/home/root] oslevel -r
5200-02
[root@tivaix1:/home/root] lslpp -l bos.rte.commands
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.rte.commands          5.2.0.12  COMMITTED  Commands

Path: /etc/objrepos
  bos.rte.commands           5.2.0.0  COMMITTED  Commands

If you need to upgrade your version of AIX 5.2, visit the IBM Fix Central Web site:
http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

Be sure to upgrade from AIX 5.2.0.0 to Maintenance Level 01 first, then to Maintenance Level 02.
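The level check can be scripted so it fails loudly on a back-level node. A sketch (the current value is hard-coded from Example 3-16; on a live system you would use `current=$(oslevel -r)`):

```shell
required="5200-02"
current="5200-02"   # live system: current=$(oslevel -r)

# oslevel -r strings (VVRR-MM) sort lexically, so the lower of the two
# sorted strings tells us whether we are below the required level.
lowest=$(printf '%s\n%s\n' "$current" "$required" | sort | head -n 1)
if [ "$lowest" = "$required" ]; then
  msg="AIX maintenance level OK"
else
  msg="upgrade needed: at $current, need $required"
fi
echo "$msg"
```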


Figure 3-21 shows the IBM Fix Central Web page, and the settings you use to select the Web page with AIX 5.2 maintenance packages. (We show the entire Web page in Figure 3-21, but following figures omit the banners in the left-hand, upper, and bottom portions of the page.)

Figure 3-21 IBM Fix Central Web page for downloading AIX 5.2 maintenance packages

At the time of writing, Maintenance Level 02 is the latest available. We recommend that if you are currently running AIX Version 5.2, you upgrade to Maintenance Level 02. Maintenance Level 01 can be downloaded from:
https://techsupport.services.ibm.com/server/mlfixes/52/01/00to01.html

Maintenance Level 02 can be downloaded from:


https://techsupport.services.ibm.com/server/mlfixes/52/02/01to02.html


Note: Check the IBM Fix Central Web site before applying any maintenance packages.

After you ensure the AIX prerequisites are satisfied, you can prepare the HACMP 5.1 installation media. To prepare the HACMP 5.1 installation media on a cluster node, follow these steps:

1. Copy the HACMP 5.1 media to the hard disk on the node. We used /tmp/hacmp on our nodes to hold the HACMP 5.1 media.

2. Copy the latest fixes for HACMP 5.1 to the hard disk on the node. We used /tmp/hacmp_fixes1 and /tmp/hacmp_fixes2 on our nodes to hold the HACMP 5.1 fixes.

3. If you do not have the latest fixes for HACMP 5.1, download them from the IBM Fix Central Web site:
http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

4. From this Web page, select pSeries, RS/6000 for the Server pop-up, AIX OS, Java, compilers for the Product or fix type pop-up, Specific fixes for the Option pop-up, and AIX 5.2 for the OS level pop-up, then press Continue, as shown in Figure 3-22 on page 117.


Figure 3-22 Using the IBM Fix Central Web page for downloading HACMP 5.1 patches

5. The Select fixes Web page is displayed, as shown in Figure 3-23 on page 118. We use this page to search for and download the fixes for APAR IY45695 and also the following PTF numbers: U496114, U496115, U496116, U496117, U496118, U496119, U496120, U496121, U496122, U496123, U496124, U496125, U496126, U496127, U496128, U496129, U496130, U496138, U496274, U496275 We used /tmp/hacmp_fixes1 for storing the fix downloads of APAR IY45695, and /tmp/hacmp_fixes2 for storing the fix downloads of the individual PTFs.


Figure 3-23 Select fixes page of IBM Fix Central Web site

6. To download the fixes for APAR IY45695, select APAR number or abstract for the Search by pop-up, enter IY45695 in the Search string field, and press Go. A browser dialog as shown in Figure 3-24 may appear, depending upon previous actions within IBM Fix Central. If it does, press OK to continue.

Figure 3-24 Confirmation dialog presented in IBM Fix Central Select fixes page

7. The Select fixes page displays the fixes found, as shown in Figure 3-25 on page 119.


Figure 3-25 Select fixes page showing fixes found that match APAR IY45695

8. Highlight the APAR in the list box, then press the Add to my download list link. Press Continue, which displays the Packaging options page.

9. Select AIX 5200-01 for the Indicate your current maintenance level pop-up. At the time of writing, the only available download servers were in North America, so selecting a download server is optional; select one if a more appropriate server is available in the pop-up. Then press Continue, as shown in Figure 3-26 on page 120.


Figure 3-26 Packaging options page for packaging fixes for APAR IY45695

10.The Download fixes page is displayed as shown in Figure 3-27 on page 121. Choose an appropriate option from the Download and delivery options section of the page, then follow the instructions given to download the fixes.


Figure 3-27 Download fixes page for fixes related to APAR IY45695

11.Downloading fixes for PTFs follows the same procedure as for downloading the fixes for APAR IY45695, except you select Fileset or PTF number in the Search by pop-up in the Select fixes Web page.


12.Copy the installation media to each cluster node or make it available via a remote filesystem like NFS, AFS, or DFS.

Install base HACMP 5.1


After the installation media is prepared on a cluster node, install the base HACMP 5.1 Licensed Program Products (LPPs):

1. Enter the command smitty install to start installing the software. The Software Installation and Maintenance SMIT panel is displayed, as in Figure 3-28.

Software Installation and Maintenance

Move cursor to desired item and press Enter.

  Install and Update Software
  List Software and Related Information
  Software Maintenance and Utilities
  Software Service Management
  Network Installation Management
  EZ NIM (Easy NIM Tool)
  System Backup Manager

F1=Help    F2=Refresh   F3=Cancel   F8=Image
F9=Shell   F10=Exit     Enter=Do

Figure 3-28 Screen displayed after running command smitty install

2. Go to Install and Update Software > Install Software and press Enter. This brings up the Install Software SMIT panel (Figure 3-29 on page 123).


Install Software

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                             [Entry Fields]
* INPUT device / directory for software      [/tmp/hacmp]

F1=Help    F2=Refresh    F3=Cancel   F4=List
F5=Reset   F6=Command    F7=Edit     F8=Image
F9=Shell   F10=Exit      Enter=Do

Figure 3-29 Filling out the INPUT device/directory for software field in the Install Software SMIT panel

3. In the INPUT device / directory for software field, enter the directory where the HACMP 5.1 software is stored, and press Enter, as shown in Figure 3-29. In our environment, we entered the directory /tmp/hacmp. This displays the Install Software SMIT panel with all the installation options (Figure 3-30 on page 124).


Install Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                  /tmp/hacmp
* SOFTWARE to install                                   [_all_latest]        +
  PREVIEW only? (install operation will NOT occur)       no                  +
  COMMIT software updates?                               yes                 +
  SAVE replaced files?                                   no                  +
  AUTOMATICALLY install requisite software?              yes                 +
  EXTEND file systems if space needed?                   yes                 +
  OVERWRITE same or newer versions?                      no                  +
  VERIFY install and check file sizes?                   no                  +
  Include corresponding LANGUAGE filesets?               yes                 +
  DETAILED output?                                       no                  +
  Process multiple volumes?                              yes                 +
  ACCEPT new license agreements?                         no                  +
  Preview new LICENSE agreements?                        no                  +

F1=Help    F2=Refresh    F3=Cancel   F4=List
F5=Reset   F6=Command    F7=Edit     F8=Image
F9=Shell   F10=Exit      Enter=Do

Figure 3-30 Install Software SMIT panel with all installation options

4. Press Enter to install all HACMP 5.1 Licensed Program Products (LPPs) in the selected directory. 5. SMIT displays an installation confirmation dialog as shown in Figure 3-31 on page 125. Press Enter to continue. The COMMAND STATUS SMIT panel is displayed. Throughout the rest of this redbook, if a SMIT confirmation dialog is displayed it is assumed you will know how to respond to it, so we do not show this step again.


+--------------------------------------------------------------------------+
                               ARE YOU SURE?

 Continuing may delete information you may want to keep.
 This is your last chance to stop before continuing.
 Press Enter to continue.
 Press Cancel to return to the application.

 F1=Help      F2=Refresh   F3=Cancel   F8=Image
 F10=Exit     Enter=Do
+--------------------------------------------------------------------------+
Figure 3-31 Installation confirmation dialog for SMIT

6. The COMMAND STATUS SMIT panel displays the progress of the installation. Installation will take several minutes, depending upon the speed of your machine. When the installation completes, the panel looks similar to Figure 3-32.

COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[TOP]
geninstall -I "a -cgNpQqwX -J" -Z -d /usr/sys/inst.images/hacmp/hacmp_510 -f File 2>&1

File:
I:cluster.hativoli.client    5.1.0.0
I:cluster.hativoli.server    5.1.0.0
I:cluster.haview.client      4.5.0.0
I:cluster.haview.server      4.5.0.0

*******************************************************************************
[MORE...90]

F1=Help      F2=Refresh   F3=Cancel   F6=Command
F8=Image     F9=Shell     F10=Exit    /=Find
n=Find Next

Figure 3-32 COMMAND STATUS SMIT panel showing successful installation of HACMP 5.1


Update HACMP 5.1


After installing the base HACMP 5.1 Licensed Program Products (LPPs), you must update them with the latest available fixes. To update HACMP 5.1:

1. Enter the command smitty update to start updating HACMP 5.1. The Update Software by Fix (APAR) SMIT panel is displayed, as shown in Figure 3-33.

Update Software by Fix (APAR)

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                             [Entry Fields]
* INPUT device / directory for software      []                +

F1=Help    F2=Refresh    F3=Cancel   F4=List
F5=Reset   F6=Command    F7=Edit     F8=Image
F9=Shell   F10=Exit      Enter=Do

Figure 3-33 Update Software by Fix (APAR) SMIT panel displayed by running command smitty update

2. In the INPUT device / directory for software field, enter the directory that you used to store the fixes for APAR IY45695, then press Enter. We used /tmp/hacmp_fixes1 in our environment, as shown in Figure 3-34.


Update Software by Fix (APAR)

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                             [Entry Fields]
* INPUT device / directory for software      [/tmp/hacmp_fixes1]

F1=Help    F2=Refresh    F3=Cancel   F4=List
F5=Reset   F6=Command    F7=Edit     F8=Image
F9=Shell   F10=Exit      Enter=Do

Figure 3-34 Entering directory of APAR IY45695 fixes into Update Software by Fix (APAR) SMIT panel

3. The Update Software by Fix (APAR) SMIT panel is displayed with all the update options. Move the cursor to the FIXES to install item as shown in Figure 3-35 on page 128, and press F4 (or Esc 4) to select the HACMP fixes to update.


Update Software by Fix (APAR)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                  /tmp/hacmp_fixes1
* FIXES to install                                      []                   +
  PREVIEW only? (update operation will NOT occur)        no                  +
  COMMIT software updates?                               yes                 +
  SAVE replaced files?                                   no                  +
  EXTEND file systems if space needed?                   yes                 +
  VERIFY install and check file sizes?                   no                  +
  DETAILED output?                                       no                  +
  Process multiple volumes?                              yes                 +

F1=Help    F2=Refresh    F3=Cancel   F4=List
F5=Reset   F6=Command    F7=Edit     F8=Image
F9=Shell   F10=Exit      Enter=Do

Figure 3-35 Preparing to select fixes for APAR IY45695 in Update Software by Fix (APAR) SMIT panel

4. The FIXES to install SMIT dialog is displayed as in Figure 3-36 on page 129. This lists all the fixes for APAR IY45695 that can be applied.


+--------------------------------------------------------------------------+
                             FIXES to install

 Move cursor to desired item and press F7. Use arrow keys to scroll.
     ONE OR MORE items can be selected.
 Press Enter AFTER making all selections.

 [TOP]
   IY45538 - ENH: Updated Online Planning Worksheets for HACMP R510
   IY45539 - ENH: clrgmove support of replicated resources
   IY47464   UPDATE WILL PUT IN TWO NAME_SERVER STANZAS
   IY47503   HAES,HAS: BROADCAST ROUTES EXIST ON LO0 INTERFACE AFTER
   IY47577   WITH TCB ACTIVE, MANY MSG 3001-092 IN HACMP.OUT DURING SYNCLVO
   IY47610   HAES: FAILURE TO UMOUNT EXPORTED FILESYSTEM WITH DEVICE BUSY -
   IY47777   IF ONE NODE UPGRADED TO HAES 5.1 SMIT START CLUSTER SERVICES
   IY48184   Fixes for Multiple Site Clusters
 [MORE...36]

 F1=Help      F2=Refresh   F3=Cancel   F7=Select
 F8=Image     F10=Exit     Enter=Do    /=Find
 n=Find Next
+--------------------------------------------------------------------------+
Figure 3-36 Selecting fixes for APAR IY45695 in FIXES to install SMIT dialog

5. Select all fixes in the dialog by pressing F7 (or Esc 7) on each line so that a selection symbol (>) is added in front of each line as shown in Figure 3-37 on page 130. Press Enter after all fixes are selected.

Chapter 3. High availability cluster implementation

129

+--------------------------------------------------------------------------+
                              FIXES to install

 Move cursor to desired item and press F7. Use arrow keys to scroll.
     ONE OR MORE items can be selected.
 Press Enter AFTER making all selections.

 [MORE...36]
 > IY48918   CSPOC:Add a Shared FS gives error in cspoc.log
 > IY48922   CSPOC:disk replacement does not work
 > IY48926   incorrect version info on node_up
 > IY49152   cluster synch changes NW attribute from private to public
 > IY49490   ENH: relax clverify check for nodes in fast connect mt rg
 > IY49495   clstrmgr has memory leaks
 > IY49497   ENH: Need option to leave log files out of cluster snapshot
 > IY49498   Verification dialogs use inconsistent terminology.
 [BOTTOM]

 F1=Help      F2=Refresh     F3=Cancel     F7=Select
 F8=Image     F10=Exit       Enter=Do      /=Find
 n=Find Next
+--------------------------------------------------------------------------+

Figure 3-37 Selecting all fixes of APAR IY45695 in FIXES to install SMIT dialog

6. The Update Software by Fix (APAR) SMIT panel is displayed again (Figure 3-38 on page 131), showing all the selected fixes from the FIXES to install SMIT dialog in the FIXES to install field. Press Enter to begin applying all fixes of APAR IY45695.

130

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

                         Update Software by Fix (APAR)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software               /tmp/hacmp_fixes1
* FIXES to install                                    [IY45538 IY45539 IY474>  +
  PREVIEW only? (update operation will NOT occur)     no                 +
  COMMIT software updates?                            yes                +
  SAVE replaced files?                                no                 +
  EXTEND file systems if space needed?                yes                +
  VERIFY install and check file sizes?                no                 +
  DETAILED output?                                    no                 +
  Process multiple volumes?                           yes                +

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do

Figure 3-38 Applying all fixes of APAR IY45695 in Update Software by Fix (APAR) SMIT panel

7. The COMMAND STATUS SMIT panel is displayed. It shows the progress as the selected fixes for APAR IY45695 are applied to the system. A successful update will appear similar to Figure 3-39 on page 132.


                                COMMAND STATUS

Command: OK            stdout: yes            stderr: no

Before command completion, additional instructions may appear below.

[TOP]
instfix -d /usr/sys/inst.images/hacmp/hacmp_510_fixes -f /tmp/.instfix_selections.12882 > File
installp -acgNpqXd /usr/sys/inst.images/hacmp/hacmp_510_fixes -f File

File:
  cluster.adt.es.client.include         05.01.0000.0002
  cluster.adt.es.client.samples.clinfo  05.01.0000.0002
  cluster.adt.es.client.samples.clstat  05.01.0000.0001
  cluster.adt.es.client.samples.libcl   05.01.0000.0001
  cluster.es.client.lib                 05.01.0000.0002
  cluster.es.client.rte                 05.01.0000.0002
[MORE...67]

F1=Help      F2=Refresh     F3=Cancel     F6=Command
F8=Image     F9=Shell       F10=Exit      /=Find
n=Find Next

Figure 3-39 COMMAND STATUS SMIT panel showing all fixes of APAR IY45695 successfully applied

8. Confirm that the fixes were installed. Press F10 (or Esc 0) to exit SMIT, then enter the following command:
lslpp -l "cluster.*"

The output should be similar to that shown in Example 3-17. Note that some of the Licensed Program Products (LPPs) show a version other than the 5.1.0.0 base version of HACMP. This confirms that the fixes were successfully installed.
Example 3-17 Confirming installation of fixes for APAR IY45695

[root@tivaix1:/home/root]lslpp -l "cluster.*"
  Fileset                               Level    State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.adt.es.client.demos           5.1.0.0  COMMITTED  ES Client Demos
  cluster.adt.es.client.include         5.1.0.2  COMMITTED  ES Client Include Files
  cluster.adt.es.client.samples.clinfo  5.1.0.2  COMMITTED  ES Client CLINFO Samples
  cluster.adt.es.client.samples.clstat  5.1.0.1  COMMITTED  ES Client Clstat Samples
  cluster.adt.es.client.samples.demos   5.1.0.0  COMMITTED  ES Client Demos Samples
  cluster.adt.es.client.samples.libcl   5.1.0.1  COMMITTED  ES Client LIBCL Samples
  cluster.adt.es.java.demo.monitor      5.1.0.0  COMMITTED  ES Web Based Monitor Demo
  cluster.adt.es.server.demos           5.1.0.0  COMMITTED  ES Server Demos
  cluster.adt.es.server.samples.demos   5.1.0.1  COMMITTED  ES Server Sample Demos
  cluster.adt.es.server.samples.images  5.1.0.0  COMMITTED  ES Server Sample Images
  cluster.doc.en_US.es.html             5.1.0.1  COMMITTED  HAES Web-based HTML
                                                            Documentation - U.S. English
  cluster.doc.en_US.es.pdf              5.1.0.1  COMMITTED  HAES PDF Documentation -
                                                            U.S. English
  cluster.es.cfs.rte                    5.1.0.1  COMMITTED  ES Cluster File System Support
  cluster.es.client.lib                 5.1.0.2  COMMITTED  ES Client Libraries
  cluster.es.client.rte                 5.1.0.2  COMMITTED  ES Client Runtime
  cluster.es.client.utils               5.1.0.2  COMMITTED  ES Client Utilities
  cluster.es.clvm.rte                   5.1.0.0  COMMITTED  ES for AIX Concurrent Access
  cluster.es.cspoc.cmds                 5.1.0.2  COMMITTED  ES CSPOC Commands
  cluster.es.cspoc.dsh                  5.1.0.0  COMMITTED  ES CSPOC dsh
  cluster.es.cspoc.rte                  5.1.0.2  COMMITTED  ES CSPOC Runtime Commands
  cluster.es.plugins.dhcp               5.1.0.1  COMMITTED  ES Plugins - dhcp
  cluster.es.plugins.dns                5.1.0.1  COMMITTED  ES Plugins - Name Server
  cluster.es.plugins.printserver        5.1.0.1  COMMITTED  ES Plugins - Print Server
  cluster.es.server.diag                5.1.0.2  COMMITTED  ES Server Diags
  cluster.es.server.events              5.1.0.2  COMMITTED  ES Server Events
  cluster.es.server.rte                 5.1.0.2  COMMITTED  ES Base Server Runtime
  cluster.es.server.utils               5.1.0.2  COMMITTED  ES Server Utilities
  cluster.es.worksheets                 5.1.0.2  COMMITTED  Online Planning Worksheets
  cluster.license                       5.1.0.0  COMMITTED  HACMP Electronic License
  cluster.msg.en_US.cspoc               5.1.0.0  COMMITTED  HACMP CSPOC Messages -
                                                            U.S. English
  cluster.msg.en_US.es.client           5.1.0.0  COMMITTED  ES Client Messages -
                                                            U.S. English
  cluster.msg.en_US.es.server           5.1.0.0  COMMITTED  ES Recovery Driver Messages -
                                                            U.S. English

Path: /etc/objrepos
  cluster.es.client.rte                 5.1.0.0  COMMITTED  ES Client Runtime
  cluster.es.clvm.rte                   5.1.0.0  COMMITTED  ES for AIX Concurrent Access
  cluster.es.server.diag                5.1.0.0  COMMITTED  ES Server Diags
  cluster.es.server.events              5.1.0.0  COMMITTED  ES Server Events
  cluster.es.server.rte                 5.1.0.2  COMMITTED  ES Base Server Runtime
  cluster.es.server.utils               5.1.0.0  COMMITTED  ES Server Utilities

Path: /usr/share/lib/objrepos
  cluster.man.en_US.es.data             5.1.0.2  COMMITTED  ES Man Pages - U.S. English

9. Repeat this procedure for each node in the cluster to install the LPPs for APAR IY45695.

10. Repeat this entire procedure for all the fixes corresponding to the preceding PTFs, entering the directory path where those fixes are stored in the INPUT device / directory for software field referred to in step 2. We used /tmp/hacmp_fixes2 in our environment.
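The verification in step 8 can also be scripted. The sketch below filters an lslpp-style listing for cluster filesets still at the HACMP 5.1.0.0 base level; it parses a few captured sample lines from Example 3-17 here, but on a cluster node you would pipe the real `lslpp -l "cluster.*"` output into the filter instead.

```shell
# Print cluster filesets still at the HACMP 5.1.0.0 base level.
# A fileset at base level is not necessarily an error -- some filesets
# simply have no fix in the APAR -- so treat this as a review list.
at_base_level() {
  awk '$1 ~ /^cluster\./ && $2 == "5.1.0.0" { print $1 }'
}

# Sample input captured from Example 3-17; on AIX, replace the
# here-document with:  lslpp -l "cluster.*" | at_base_level
still_at_base=$(at_base_level <<'EOF'
  cluster.es.client.lib     5.1.0.2  COMMITTED  ES Client Libraries
  cluster.es.cspoc.dsh      5.1.0.0  COMMITTED  ES CSPOC dsh
  cluster.es.server.rte     5.1.0.2  COMMITTED  ES Base Server Runtime
  cluster.license           5.1.0.0  COMMITTED  HACMP Electronic License
EOF
)
echo "$still_at_base"
```

On this sample the filter reports cluster.es.cspoc.dsh and cluster.license, both of which legitimately remain at 5.1.0.0 in Example 3-17.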

Remove HACMP
If you make a mistake with the HACMP installation, or if subsequent configuration fails due to Object Data Manager (ODM) errors or other errors that prevent successful configuration, you can remove HACMP to recover to a known state. Removal resets all ODM entries and removes all HACMP files; re-installing creates fresh ODM entries, which often resolves problems with corrupted HACMP ODM entries. To remove HACMP:

1. Enter the command smitty remove.

2. The Remove Installed Software SMIT panel is displayed. Enter cluster.* in the SOFTWARE name field, as shown in Figure 3-40 on page 135.


                          Remove Installed Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SOFTWARE name                                       [cluster.*]        +
  PREVIEW only? (remove operation will NOT occur)     yes                +
  REMOVE dependent software?                          no                 +
  EXTEND file systems if space needed?                no                 +
  DETAILED output?                                    no                 +

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do

Figure 3-40 How to specify removal of HACMP in Remove Installed Software SMIT panel

3. Move the cursor to the PREVIEW only? (remove operation will NOT occur) field and press Tab to change the value to no, change the EXTEND file systems if space needed? field to yes, and change the DETAILED output field to yes, as shown in Figure 3-41 on page 136.


                          Remove Installed Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SOFTWARE name                                       [cluster.*]        +
  PREVIEW only? (remove operation will NOT occur)     no                 +
  REMOVE dependent software?                          no                 +
  EXTEND file systems if space needed?                yes                +
  DETAILED output?                                    yes                +

F1=Help      F2=Refresh     F3=Cancel     F4=List
F5=Reset     F6=Command     F7=Edit       F8=Image
F9=Shell     F10=Exit       Enter=Do

Figure 3-41 Set options for removal of HACMP in Installed Software SMIT panel

4. Press Enter to start removal of HACMP. The COMMAND STATUS SMIT panel displays the progress and final status of the removal operation. A successful removal looks similar to Figure 3-42 on page 137.


                                COMMAND STATUS

Command: OK            stdout: yes            stderr: no

Before command completion, additional instructions may appear below.

[TOP]
geninstall -u -I "pX -V2 -J -w" -Z -f File 2>&1

File: cluster.*

+-----------------------------------------------------------------------------+
                        Pre-deinstall Verification...
+-----------------------------------------------------------------------------+
Verifying selections...done
Verifying requisites...done
Results...
[MORE...134]

F1=Help      F2=Refresh     F3=Cancel     F6=Command
F8=Image     F9=Shell       F10=Exit      /=Find
n=Find Next

Figure 3-42 Successful removal of HACMP as shown by COMMAND STATUS SMIT panel

5. Press F10 (or Esc 0) to exit SMIT.

When you finish the installation of HACMP, you need to configure it for the application servers you want to make highly available. In this redbook, we show how to do this first for IBM Tivoli Workload Scheduler in 4.1.10, "Configure HACMP for IBM Tivoli Workload Scheduler" on page 210, and then for IBM Tivoli Management Framework in 4.1.11, "Add IBM Tivoli Management Framework" on page 303.


3.3 Implementing a Microsoft Cluster


In this section, we walk you through the installation process for a Microsoft Cluster (also referred to as Microsoft Cluster Service, or MSCS, throughout the book), and discuss the hardware and software aspects of MSCS as well as the installation procedure. The MSCS environment that we create in this chapter is a two-node hot standby cluster. The nodes share two external SCSI drives, connected to each node via a Y-cable. Figure 3-43 illustrates the system configuration.

[Diagram: two nodes, tivw2k1 (public NIC 9.3.4.197, private NIC 192.168.1.1, SCSI ID 6) and tivw2k2 (public NIC 9.3.4.198, private NIC 192.168.1.2, SCSI ID 7). Each node has a local C: drive; the shared external SCSI bus carries drive X: at SCSI ID 5 and drives Y: and Z: at SCSI ID 4.]

Figure 3-43 Microsoft Cluster environment

The cluster is connected using four Network Interface Cards (NICs). Each node has a private NIC and a public NIC. In an MSCS, the heartbeat connection is referred to as a private connection. The private connection is used for internal cluster communications and is connected between the two nodes using a crossover cable. The public NIC is the adapter that is used by the applications that are running locally on the server, as well as cluster applications that may move between the nodes in the cluster. The operating system running on our nodes is Windows 2000 Advanced Server with Service Pack 4 installed. In our initial cluster installation, we will set up the default cluster group. Cluster groups in an MSCS environment are logical groups of resources that can be moved from one node to another. The default cluster group that we will set up will



contain the shared drive X:, an IP address (9.3.4.199), and a network name (tivw2kv1).
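A cluster group, then, is just a named set of resources that fails over between nodes as a unit. As an illustrative sketch (plain shell with a hypothetical record format, not a cluster API), here are the two groups this chapter ends up with, plus a check that each carries the three core resources a client-facing group needs: a disk, an IP address, and a network name.

```shell
# The two resource groups built later in this chapter, written as
# one "name resources" record per line (hypothetical format).
groups='TIVW2KV1 disk=X: ip=9.3.4.199 netname=TIVW2KV1
TIVW2KV2 disk=Y:,Z: ip=9.3.4.175 netname=TIVW2KV2'

# A group is only useful to clients if it carries all three kinds
# of resource; flag any group that is missing one.
echo "$groups" | while read -r name resources; do
  case "$resources" in
    *disk=*ip=*netname=*) echo "$name: complete" ;;
    *)                    echo "$name: missing a core resource" ;;
  esac
done
```

Both groups print as complete; a group lacking, say, its network name would be flagged, which is exactly the condition the resource-configuration steps later in this section are designed to avoid.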

3.3.1 Microsoft Cluster hardware considerations


When designing a Microsoft Cluster, it is important to make sure all the hardware you would like to use is compatible with the Microsoft Cluster software. To make this easy, Microsoft maintains a Hardware Compatibility List (HCL), found at:
http://www.microsoft.com/whdc/hcl/search.mspx

Check the HCL before you order your hardware to ensure your cluster configuration will be supported.

3.3.2 Planning and designing a Microsoft Cluster installation


You need to execute some setup tasks before you start installing Microsoft Cluster Service. Following are the requirements for a Microsoft Cluster:

Configure the Network Interface Cards (NICs)
Each node in the cluster will need two NICs: one for public communications, and one for private cluster communications. The NICs will have to be configured with static IP addresses. Table 3-14 shows our configuration.
Table 3-14 NIC IP addresses

  Node                IP
  tivw2k1 (public)    9.3.4.197
  tivw2k1 (private)   192.168.1.1
  tivw2k2 (public)    9.3.4.198
  tivw2k2 (private)   192.168.1.2

Set up the Domain Name System (DNS)
Make sure all IP addresses for your NICs, and the IP addresses that will be used by the cluster groups, are added to the Domain Name System (DNS). The private NIC IP addresses do not have to be added to DNS. Our configuration will require that the IP addresses and names listed in Table 3-15 on page 140 be added to the DNS.


Table 3-15 DNS entries required for the cluster

  Hostname    IP Address
  tivw2k1     9.3.4.197
  tivw2k2     9.3.4.198
  tivw2kv1    9.3.4.199
  tivw2kv2    9.3.4.175
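These name-to-address mappings must be consistent wherever they are defined. The sketch below (plain shell, not a DNS tool) renders Table 3-15 as hosts-file style lines and sanity-checks that every address falls on the public network, assuming the public network is 9.3.4.0/24, which matches the addresses used throughout this chapter; the private 192.168.1.x addresses are deliberately absent.

```shell
# Table 3-15 as "hostname IP" pairs.
entries='tivw2k1 9.3.4.197
tivw2k2 9.3.4.198
tivw2kv1 9.3.4.199
tivw2kv2 9.3.4.175'

# Emit hosts-file style lines, flagging anything off the assumed
# public 9.3.4.0/24 network (private NIC addresses stay out of DNS).
echo "$entries" | while read -r host ip; do
  case "$ip" in
    9.3.4.*) echo "$ip $host" ;;
    *)       echo "WARNING: $host ($ip) is not on the public network" ;;
  esac
done
```

All four entries pass the check; a private 192.168.1.x address accidentally listed here would be flagged before it ever reached the DNS administrator.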

Set up the shared storage
When setting up the shared storage devices, ensure that all drives are partitioned correctly and that they are all formatted with the NT file system (NTFS). When setting up the drives, ensure that both nodes assign the same drive letters to each partition and that the drives are set up as basic disks. We chose to assign our drive letters starting from the end of the alphabet so we would not interfere with any domain login scripts or temporary storage devices. If you are using SCSI drives, ensure that the drives all use different SCSI IDs and that the bus is terminated correctly.

When you partition your drives, ensure you set up a partition specifically for the quorum. The quorum is a partition used by the cluster service to store cluster configuration database checkpoints and log files. The quorum partition needs to be at least 100 MB in size.

Important: Microsoft recommends that the quorum partition be on a separate disk, and that the partition be 500 MB in size.

Table 3-16 illustrates how we set up our drives.
Table 3-16 Shared drive partition table

  Disk     Drive Letter   Size      Label
  Disk 1   X:             34 GB     Partition 1
  Disk 2   Y:             33.9 GB   Partition 2
  Disk 2   Z:             100 MB    Quorum

Note: When configuring the disks, make sure that you configure them on one node at a time and that the node that is not being configured is powered off. If both nodes try to control the disk at the same time, they may cause disk corruption.
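The quorum sizing guidance above reduces to two thresholds: the 100 MB minimum and the 500 MB Microsoft recommendation. A small sketch that classifies a proposed quorum partition size (sizes in MB; the thresholds come from the text above, the helper name is ours):

```shell
# Classify a proposed quorum partition size (in MB) against the
# 100 MB minimum and the 500 MB recommendation.
check_quorum() {
  if [ "$1" -lt 100 ]; then
    echo "too small: the quorum needs at least 100 MB"
  elif [ "$1" -lt 500 ]; then
    echo "meets the minimum, but 500 MB is recommended"
  else
    echo "meets the recommendation"
  fi
}

check_quorum 100   # the size we used in Table 3-16
# -> meets the minimum, but 500 MB is recommended
```

With the 100 MB partition from Table 3-16 this prints the middle case; a 500 MB partition on its own disk would satisfy the recommendation as well as the minimum.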


Update the operating system
Before installing the cluster service, connect to the Microsoft Software Update Web site to ensure you have all the latest hardware drivers and software patches installed. The Microsoft Software Update Web site can be found at:
http://windowsupdate.microsoft.com

Create a domain account for the cluster
The cluster service requires that a domain account be created under which the cluster service will run. The domain account must be a member of the administrator group on each of the nodes in the cluster. Make sure you set the account so that the user cannot change the password and that the password never expires. We created the account cluster_service for our cluster.

Add nodes to the domain
The cluster service runs under a domain account. In order for the domain account to be able to authenticate against the domain controller, the nodes must join the domain where the cluster user has been created.

3.3.3 Microsoft Cluster Service installation


Here we discuss the Microsoft Cluster Service installation process. The installation is broken into three sections: installation of the primary node, installation of the secondary node, and configuration of the cluster resources. Following is a high-level overview of the installation procedure; detailed information for each step in the process is provided in the following sections.

Installation of the MSCS node 1


Important: Before starting the installation on Node 1, make sure that Node 2 is powered off.

The cluster service is installed as a Windows component. To install the service, the Windows 2000 Advanced Server CD-ROM should be in the CD-ROM drive. You can save time by copying the i386 directory from the CD to the local drive.

1. To start the installation, open the Start menu, select Settings -> Control Panel, and then double-click Add/Remove Programs.


2. Click Add/Remove Windows Components, located on the left side of the window. Select Cluster Service from the list of components as shown in Figure 3-44, then click Next.

Figure 3-44 Windows Components Wizard


3. Make sure that Remote administration mode is checked and click Next (Figure 3-45). You will be asked to insert the Windows 2000 Advanced Server CD if it is not already inserted. If you copied the CD to the local drive, select the location where it was copied.

Figure 3-45 Windows Components Wizard


4. Click Next at the welcome screen (Figure 3-46).

Figure 3-46 Welcome screen


5. The next window (Figure 3-47) is used by Microsoft to verify that you are aware that it will not support hardware that is not included in its Hardware Compatibility List (HCL). To move on to the next step of the installation, click I Understand and then click Next.

Figure 3-47 Hardware Configuration


6. Now that we have located the installation media and acknowledged the support agreement, we can start the actual installation. The next screen is used to select whether you will be installing the first node or an additional node. We will install the first node in the cluster at this point, so make sure that the appropriate radio button is selected and click Next (Figure 3-48). We will return to this screen later when we install the second node in the cluster.

Figure 3-48 Create or Join a Cluster


7. We must now name our cluster. This name is associated with the cluster as a whole; it is not the virtual name associated with a cluster group. It is used by the Microsoft Cluster Administrator utility to administer the cluster resources. We prefer to use the same name as the virtual hostname to prevent confusion; in this case we call it TIVW2KV1. After you have entered a name for your cluster, click Next (Figure 3-49).

Figure 3-49 Cluster Name


8. The next step is to enter the domain account that the cluster service will use. See the pre-installation setup section for details on setting up the domain account that the cluster service will use. Click Next (Figure 3-50).

Figure 3-50 Select an Account


9. The next window is used to determine the disks that the cluster service will manage. In the example we have two partitions, one for the quorum and another for the data. Make sure both are set up as managed disks. Click Next (Figure 3-51).

Figure 3-51 Add or Remove Managed Disks


10. We now need to select where the cluster checkpoint and log files will be stored. This disk is referred to as the Quorum Disk. The quorum is a vital part of the cluster, as it is used for storing critical cluster files; if the data on the Quorum Disk becomes corrupt, the cluster will be unusable. It is important to back up this data regularly so you will be able to recover your cluster. It is recommended that you have at least 100 MB on a separate partition reserved for this purpose; refer to the pre-installation setup section on disk preparation. After you select your Quorum Disk, click Next (Figure 3-52).

Figure 3-52 Cluster File Storage


11.The next step is to configure networking. A window will pop up to recommend that you use multiple public adapters to remove any single point of failure. Click Next to continue (Figure 3-53).

Figure 3-53 Warning window


12. The next section will prompt you to identify each NIC as public, private, or both. Since we named our adapters ahead of time, this is easy. Set the adapter that is labeled Public Network Connection to All communications (mixed network), as shown in Figure 3-54; this allows it to serve as a backup for the private connection (see step 14). Click Next.

Figure 3-54 Network Connections - All communications


13. Now we will configure the private network adapter. This adapter is used as a heartbeat connection between the two nodes of the cluster and is connected via a crossover cable. Since this adapter is not accessible from the public network, this is considered a private connection and should be configured as Internal cluster communications only (private network). Click Next (Figure 3-55).

Figure 3-55 Network Connections - Internal cluster communications only (private network)


14. Because we configured two adapters to be capable of communicating as private adapters, we need to select the priority in which the adapters will communicate. In our case, we want the Private Network Connection to serve as our primary private adapter. We will use the Public Network Connection as our backup adapter. Click Next to continue (Figure 3-56).

Figure 3-56 Network priority setup


15. Once the network adapters have been configured, it is time to create the cluster resources. The first cluster resource is the cluster IP address. The cluster IP address is the IP address associated with the cluster resource group; it will follow the resource group when it is moved from node to node, and is commonly referred to as the virtual IP. To set up the cluster IP address, enter the IP address and subnet mask that you plan to use, and select the Public Network Connection as the network to use. Click Next (Figure 3-57).

Figure 3-57 Cluster IP Address


16. Click Finish to complete the cluster service configuration (Figure 3-58).

Figure 3-58 Cluster Service Configuration Wizard

17. The next window is an informational pop-up letting you know that the Cluster Administrator application is now available; the cluster service is managed using the Cluster Administrator tool. Click OK (Figure 3-59).

Figure 3-59 Cluster Service Configuration Wizard


18. Click Finish one more time to close the installation wizard (Figure 3-60).

Figure 3-60 Windows Components Wizard

At this point, the installation of the cluster service on the primary node is complete. Now that we have created a cluster, we will need to add additional nodes to the cluster.

Installing the second node


The next step is to install the second node in the cluster. To add the second node, you will have to perform the following steps on the secondary node. The installation of the secondary node is relatively easy, since the cluster is configured during the installation on the primary node. The first few steps are identical to installing the cluster service on the primary node. To install the cluster service on the secondary node: 1. Go to the Start Menu and select Settings -> Control Panel and double-click Add/Remove Programs.


2. Click Add/Remove Windows Components, located on the left side of the window, and then select Cluster Service from the list of components. Click Next to start the installation (Figure 3-61).

Figure 3-61 Windows Components Wizard


3. Make sure the Remote administration mode is selected (Figure 3-62); it should be the only option available. Click Next to continue.

Figure 3-62 Windows Components Wizard


4. Click Next past the welcome screen (Figure 3-63).

Figure 3-63 Windows Components Wizard


5. Once again, you will have to verify that the hardware you have selected is compatible with the software you are installing, and that you understand that Microsoft will not support hardware that is not on the HCL. Click I Understand and then Next to continue (Figure 3-64).

Figure 3-64 Hardware Configuration


6. The next step is to select that you will be adding the second node to the cluster. Once the second node option is selected, click Next to continue (Figure 3-65).

Figure 3-65 Create or Join a Cluster


7. You will now have to type in the name of the cluster that you would like the second node to be a member of. Since we set up a domain account to be used for the cluster service, we will not need to check the connect to cluster box. Click Next (Figure 3-66).

Figure 3-66 Cluster Name


8. The next window prompts you for a password for the domain account that we installed the primary node with. Enter the password and click Next (Figure 3-67).

Figure 3-67 Select an Account


9. Click Finish to complete the installation (Figure 3-68).

Figure 3-68 Finish the installation


10. The next step is to verify that the cluster works. To do this, open the Cluster Administrator: from the Start Menu, select Programs -> Administrative Tools -> Cluster Administrator (Figure 3-69). You will notice that the cluster has two groups: one called Cluster Group, and the other called Disk Group 1. The Cluster Group contains the virtual IP, the virtual name, and the cluster shared disk; Disk Group 1 at this time contains only our quorum disk. To verify that the cluster is functioning properly, move the Cluster Group from one node to the other by right-clicking its icon and selecting Move Group. After you have done this, you should see the group icon change for a few seconds while the resources are moved to the secondary node. Once the group has been moved, the icon returns to normal and the owner of the group should now be the second node in the cluster.

Figure 3-69 Verifying that the cluster works

The cluster service is now installed and we are ready to start adding applications to our cluster groups.


Configuring the cluster resources


Now it is time to configure the cluster resources. The default setup using the cluster service installation wizard is not optimal for our Tivoli environment. For the scenarios used later in this book, we have to set up the cluster resources for a mutual takeover scenario. To support this, we have to modify the current resource groups and add two resources. Figure 3-70 illustrates the desired configuration.

[Diagram: node tivw2k1 hosts the TIVW2KV1 resource group (Drive X:, IP address 9.3.4.199, network name TIVW2KV1); node tivw2k2 hosts the TIVW2KV2 resource group (Drives Y: and Z:, IP address 9.3.4.175, network name TIVW2KV2).]

Figure 3-70 Cluster resource diagram

The following steps will guide you through the cluster configuration.


1. The first step is to rename the cluster resource groups.
   a. Right-click the cluster group containing the Y: and Z: drive resource and select Rename (Figure 3-71). Enter the name TIVW2KV1.
   b. Right-click the cluster group containing the X: drive resource and select Rename. Enter the name TIVW2KV2.

Figure 3-71 Rename the cluster resource groups


2. Now we will need to move the disk resources to the correct groups.
   a. Right-click the Disk Y: Z: resource under the TIVW2KV1 resource group and select Change Group -> TIVW2KV2, as shown in Figure 3-72.

Figure 3-72 Changing resource groups

b. Press Yes to complete the move (Figure 3-73).

Figure 3-73 Resource move confirmation

   c. Right-click the Disk X: resource under the TIVW2KV2 resource group and select Change Group -> TIVW2KV1.
   d. Press Yes to complete the move.


3. The next step is to rename the resources. We do this so we can determine which resource group a resource belongs to by its name.
   a. Right-click the Cluster IP Address resource under the TIVW2KV1 resource group and select Rename (Figure 3-74). Enter the name TIVW2KV1 - Cluster IP Address.
   b. Right-click the Cluster Name resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Cluster Name.
   c. Right-click the Disk X: resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Disk X:.
   d. Right-click the Disk Y: Z: resource under the TIVW2KV2 resource group and select Rename. Enter the name TIVW2KV2 - Disk Y: Z:.

Figure 3-74 Rename resources


4. We now need to add two resources under the TIVW2KV2 resource group. The first resource we will add is the IP Address resource.
   a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-75).

Figure 3-75 Add a new resource


b. Enter TIVW2KV2 - IP Address in the name field and set the resource type to IP address. Click Next (Figure 3-76).

Figure 3-76 Name resource and select resource type


c. Select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-77).

Figure 3-77 Select resource owners


d. Click Next past the dependencies screen; no dependencies need to be defined at this time (Figure 3-78).

Figure 3-78 Dependency configuration


e. The next step is to configure the IP address associated with the resource. Enter the IP address 9.3.4.175 in the Address field and add the subnet mask of 255.255.255.254. Make sure the Public Network Connection is selected in the Network field and the Enable NetBIOS for this address box is checked. Click Next (Figure 3-79).

Figure 3-79 Configure IP address

f. Click OK to complete the installation (Figure 3-80).

Figure 3-80 Completion dialog


5. Now that the IP address resource has been created, we need to create the Name resource for the TIVW2KV2 cluster group.
a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-81).

Figure 3-81 Adding a new resource


b. Set the name of the resource to TIVW2KV2 - Cluster Name and specify the resource type to be Network Name. Click Next (Figure 3-82).

Figure 3-82 Specify resource name and type


c. Next select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-83).

Figure 3-83 Select resource owners


d. Click Next in the Dependencies screen (Figure 3-84). We do not need to configure these at this time.

Figure 3-84 Resource dependency configuration


e. Next we will enter the cluster name for the TIVW2KV2 resource group. Enter the cluster name TIVW2KV2 in the Name field. Click Next (Figure 3-85).

Figure 3-85 Cluster name

f. Click OK to complete the cluster name configuration (Figure 3-86).

Figure 3-86 Completion dialog


6. The final step of the cluster configuration is to bring the TIVW2KV2 resource group online. To do this, right-click the TIVW2KV2 resource group and select Bring Online (Figure 3-87).

Figure 3-87 Bring resource group online

This concludes our cluster configuration.
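The GUI steps above can also be driven from the command line. The following dry-run sketch lists what the equivalent Windows 2000 cluster.exe commands might look like for steps 4 through 6; the option syntax here is our assumption about the cluster CLI and should be verified with `cluster /?`, so the commands are written to a file and displayed rather than executed.

```shell
#!/bin/sh
# Dry run: possible cluster.exe equivalents of the resource-creation steps.
# Syntax is an assumption, not taken from this scenario's tested procedure.
cat > /tmp/cluster_cmds.txt <<'EOF'
cluster res "TIVW2KV2 - IP Address" /create /group:TIVW2KV2 /type:"IP Address"
cluster res "TIVW2KV2 - IP Address" /priv Address=9.3.4.175 SubnetMask=255.255.255.254 Network="Public Network Connection" EnableNetBIOS=1
cluster res "TIVW2KV2 - Cluster Name" /create /group:TIVW2KV2 /type:"Network Name"
cluster res "TIVW2KV2 - Cluster Name" /priv Name=TIVW2KV2
cluster group TIVW2KV2 /online
EOF
cat /tmp/cluster_cmds.txt
```

Scripting the resource creation this way makes it repeatable on a rebuilt cluster, but the GUI procedure above remains the verified path.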


Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster


In this chapter, we cover the implementation of IBM Tivoli Workload Scheduler in an HACMP cluster and in a Microsoft Cluster Service (MSCS) cluster. The chapter is divided into the following main sections:
Implementing IBM Tivoli Workload Scheduler in an HACMP cluster on page 184
Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster on page 347


4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster


In this section, we describe the steps to implement IBM Tivoli Workload Scheduler in an HACMP cluster. We use the mutual takeover scenario described in 3.1.1, Mutual takeover for IBM Tivoli Workload Scheduler on page 64.

Note: In this section, we assume that you have finished planning your cluster and have also finished the preparation tasks to install HACMP. If you have not finished these tasks, perform the planning and design steps and the HACMP installation preparation tasks described in Chapter 3.

We strongly recommend that you install IBM Tivoli Workload Scheduler before HACMP, and confirm that IBM Tivoli Workload Scheduler runs without any problems. It is important that you also confirm that IBM Tivoli Workload Scheduler is able to fall over and fall back between nodes by manually moving the volume group between nodes. This verification procedure is described in Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster on page 202.
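The manual volume group move mentioned above follows the pattern below. This is a dry-run sketch (the AIX commands are written to a file and displayed, not executed), and the volume group name tws_vg1 is a placeholder we invented; substitute the volume group and mount point names used in your own cluster.

```shell
#!/bin/sh
# Dry run of a manual volume group move between nodes, used to verify that
# TWS can fall over before HACMP is configured. "tws_vg1" is a hypothetical
# volume group name; /usr/maestro is the TWS file system from our scenario.
cat > /tmp/vg_move.txt <<'EOF'
# On tivaix1: stop TWS, then release the file system and volume group
umount /usr/maestro
varyoffvg tws_vg1
# On tivaix2: acquire the volume group, mount the file system, start TWS
varyonvg tws_vg1
mount /usr/maestro
EOF
cat /tmp/vg_move.txt
```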

4.1.1 IBM Tivoli Workload Scheduler implementation overview


Figure 4-1 on page 185 shows a diagram of an IBM Tivoli Workload Scheduler implementation in a mutual takeover HACMP cluster. Using this diagram, we will describe how IBM Tivoli Workload Scheduler could be implemented, and what you should be aware of. Though we do not describe a hot standby scenario for IBM Tivoli Workload Scheduler, the steps used to configure IBM Tivoli Workload Scheduler for a mutual takeover scenario also cover what should be done for a hot standby scenario.


Figure 4-1 IBM Tivoli Workload Scheduler implementation overview

(The figure shows cluster cltivoli with nodes tivaix1 and tivaix2. Both nodes define Mount Point 1: /usr/maestro for user maestro and Mount Point 2: /usr/maestro2 for user maestro2. TWS Engine1 uses nmport=31111 and IP=tivaix1_svc; TWS Engine2 uses nmport=31112 and IP=tivaix2_svc.)

To make IBM Tivoli Workload Scheduler highly available in an HACMP cluster, the IBM Tivoli Workload Scheduler instance should be installed on the external shared disk. This means that the /TWShome directory should reside on the shared disk and not on a locally attached disk. This is the minimum requirement that enables HACMP to relocate the IBM Tivoli Workload Scheduler engine from one node to another, along with other system components such as external disks and service IP labels.

When implementing IBM Tivoli Workload Scheduler in a cluster, there are certain items you should be aware of, such as the location of the IBM Tivoli Workload Scheduler engine and the IP address used for the IBM Tivoli Workload Scheduler workstation definition. Specifically for a mutual takeover scenario, you have more to consider, as there will be multiple instances of IBM Tivoli Workload Scheduler running on one node.

Following are the considerations you need to keep in mind when implementing IBM Tivoli Workload Scheduler in an HACMP cluster. They apply to Master Domain Managers, Domain Managers, Backup Domain Managers and FTAs.

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster

185

Location of IBM Tivoli Workload Scheduler engine executables
As mentioned earlier, the IBM Tivoli Workload Scheduler engine should be installed on the external disk to be serviced by HACMP. In order to have the same instance of IBM Tivoli Workload Scheduler process its jobs on another node after a fallover, the executables must be installed on the external disk. For Version 8.2, all files essential to IBM Tivoli Workload Scheduler processing are installed in the /TWShome directory. The /TWShome directory should reside on file systems on the shared disk. For versions prior to 8.2, IBM Tivoli Workload Scheduler executables should be installed in a file system with the mount point above the /TWShome directory. For example, if /TWShome is /usr/maestro/maestro, the mount point should be /usr/maestro.

In a mutual takeover scenario, you may have a case where multiple instances of IBM Tivoli Workload Scheduler are installed on the shared disk. In such a case, make sure these instances are installed on separate file systems residing on separate volume groups.

Creating mount points on standby nodes
Create a mount point for the IBM Tivoli Workload Scheduler file system on all nodes that may run that instance of IBM Tivoli Workload Scheduler. When configuring for a mutual takeover, make sure that you create mount points for every IBM Tivoli Workload Scheduler instance that may run on a node. In Figure 4-1 on page 185, nodes tivaix1 and tivaix2 may both have two instances of the IBM Tivoli Workload Scheduler engine running in case of a node failure. Note that in the diagram, both nodes have mount points for TWS Engine1 and TWS Engine2.

IBM Tivoli Workload Scheduler user account and group account
On each node, create an IBM Tivoli Workload Scheduler user and group for all IBM Tivoli Workload Scheduler instances that may run on the node. The user's home directory must be set to /TWShome.
If an IBM Tivoli Workload Scheduler instance will fall over and fall back among several nodes in a cluster, make sure all those nodes have the IBM Tivoli Workload Scheduler user and group defined to control that instance. In the mutual takeover scenario, you may have multiple instances running at the same time on one node. Make sure you create separate users for each IBM Tivoli Workload Scheduler instance in your cluster so that you are able to control them separately. In our scenario, we add user maestro and user maestro2 on both nodes because TWS Engine1 and TWS Engine2 should be able to run on both nodes. The same group accounts should be created on both nodes to host these users.


Netman port
When there will be only one instance of IBM Tivoli Workload Scheduler running on a node, using the default port (31111) is sufficient. For a mutual takeover scenario, you need to consider setting different port numbers for each IBM Tivoli Workload Scheduler instance in the cluster. This is because several instances of IBM Tivoli Workload Scheduler may run on the same node, and no two IBM Tivoli Workload Scheduler instances on the same node should have the same netman port. In our scenario, we set the netman port of TWS Engine1 to 31111, and the netman port of TWS Engine2 to 31112.

IP address
The IP address or IP label specified in the workstation definition should be the service IP address or the service IP label for HACMP. If you plan a fallover or a fallback for an IBM Tivoli Workload Scheduler instance, it should not use an IP address or IP label that is bound to a particular node. (Boot addresses and persistent addresses used in an HACMP cluster are normally bound to one node, so these should not be used.) This ensures that the IBM Tivoli Workload Scheduler instance does not lose its connection with other IBM Tivoli Workload Scheduler instances in case of a fallover or a fallback. In our diagram, note that TWS Engine1 uses a service IP address called tivaix1_svc, and TWS Engine2 uses a service IP address called tivaix2_svc. These service IP addresses will move along with the IBM Tivoli Workload Scheduler instance from one node to another.

Starting and stopping IBM Tivoli Workload Scheduler instances
IBM Tivoli Workload Scheduler instances should be started and stopped from HACMP application start and stop scripts. Generate a custom script to start and stop each IBM Tivoli Workload Scheduler instance in your cluster, then when configuring HACMP, associate your custom scripts with the resource groups that your IBM Tivoli Workload Scheduler instances reside in.
If you put IBM Tivoli Workload Scheduler under the control of HACMP, it should not be started from /etc/inittab or in any other way except through the application start and stop scripts.

Files installed on the local disk
Though most IBM Tivoli Workload Scheduler executables are installed in the IBM Tivoli Workload Scheduler file system, some files are installed on local disks. You may have to copy these local files to other nodes. For IBM Tivoli Workload Scheduler 8.2, copy the /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a file.
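A minimal application start and stop script pair might look like the dry-run sketch below. The commands are written to a file and displayed rather than executed; StartUp and the conman start/stop/shut commands are standard TWS, but packaging them this way for HACMP is our illustration, not a script shipped with the product, and the user name and path are from our scenario.

```shell
#!/bin/sh
# Dry-run sketch of HACMP application start/stop scripts for TWS Engine1
# (user maestro, /usr/maestro). Commands are listed, not executed.
cat > /tmp/tws_hacmp_scripts.txt <<'EOF'
# start_tws1.sh - run by HACMP when the resource group comes online
su - maestro -c "/usr/maestro/StartUp"       # start netman
su - maestro -c "conman 'start'"             # start mailman, batchman, jobman

# stop_tws1.sh - run by HACMP before the resource group is released
su - maestro -c "conman 'unlink @; noask'"
su - maestro -c "conman 'stop; wait'"
su - maestro -c "conman 'shut; wait'"        # stop netman as well
EOF
cat /tmp/tws_hacmp_scripts.txt
```

In a mutual takeover cluster you would create one such pair per instance (maestro2 and /usr/maestro2 for TWS Engine2).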


For IBM Tivoli Workload Scheduler 8.1, you may need to copy the following files to any node in the cluster that will host the IBM Tivoli Workload Scheduler instance:
/usr/unison/components
/usr/lib/libatrc.a
/usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a

Monitoring the IBM Tivoli Workload Scheduler process
HACMP is able to monitor application processes. It can be configured to initiate a cluster event based on application process failures. When considering monitoring TWS using HACMP's application monitoring, keep in mind that IBM Tivoli Workload Scheduler stops and restarts all of its processes (excluding the netman process) every 24 hours. The recycling of the processes is initiated by the FINAL jobstream, which is set to run at a certain time every day. Be aware that if you configure HACMP to initiate an action in the event of a TWS process failure, this expected behavior of IBM Tivoli Workload Scheduler could be interpreted as a failure of IBM Tivoli Workload Scheduler processes, and could trigger an unwanted action. If you simply want to monitor process failures, we recommend that you use monitoring software (for example, IBM Tivoli Monitoring).
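The netman port rule above — no two instances on one node may share a port — can be checked mechanically by comparing the nm port entries of each instance's localopts file. The sketch below uses /tmp demo directories as stand-ins for the real /TWShome directories (/usr/maestro, /usr/maestro2); the "nm port" option name is the localopts setting that corresponds to the netman port.

```shell
#!/bin/sh
# Demo check that two TWS instances use distinct netman ports by comparing
# the "nm port" entries in their localopts files. The /tmp paths stand in
# for the real instance directories.
mkdir -p /tmp/tws1 /tmp/tws2
echo "nm port = 31111" > /tmp/tws1/localopts
echo "nm port = 31112" > /tmp/tws2/localopts

p1=$(awk -F= '/^nm port/ { gsub(/ /, "", $2); print $2 }' /tmp/tws1/localopts)
p2=$(awk -F= '/^nm port/ { gsub(/ /, "", $2); print $2 }' /tmp/tws2/localopts)
if [ "$p1" = "$p2" ]; then
  echo "CONFLICT: both instances listen on port $p1" > /tmp/port_check.out
else
  echo "OK: ports $p1 and $p2 are distinct" > /tmp/port_check.out
fi
cat /tmp/port_check.out
```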

4.1.2 Preparing to install


Before installing IBM Tivoli Workload Scheduler in an HACMP cluster, define the IBM Tivoli Workload Scheduler group and user account on each node that will host IBM Tivoli Workload Scheduler. The following procedure presents an example of how to prepare for an installation of IBM Tivoli Workload Scheduler 8.2 on AIX 5.2. We assume that the IBM Tivoli Workload Scheduler file system is already created as described in 3.2.3, Planning and designing an HACMP cluster on page 67. In our scenario, we added a group named tivoli and users maestro and maestro2 on each node.

1. Creating group accounts
Execute the following on all the nodes on which an IBM Tivoli Workload Scheduler instance will run.
a. Enter the following command; this will take you to the SMIT Groups menu:
# smitty groups

b. From the Groups menu, select Add a Group. c. Enter a value for each of the following items:


Group NAME: Assign a name for the group.
ADMINISTRATIVE Group: true
Group ID: Assign a group ID. Assign the same ID for all nodes in the cluster.

Figure 4-2 shows an example of adding a group. We added group tivoli with an ID 2000.

Add a Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                        [Entry Fields]
* Group NAME                            [tivoli]
  ADMINISTRATIVE group?                 true              +
  Group ID                              [2000]            #
  USER list                             []                +
  ADMINISTRATOR list                    []                +

F1=Help       F2=Refresh      F3=Cancel    F4=List
Esc+5=Reset   Esc+6=Command   Esc+7=Edit   Esc+8=Image
Esc+9=Shell   Esc+0=Exit      Enter=Do

Figure 4-2 Adding a group

2. Adding IBM Tivoli Workload Scheduler users
Perform the following procedure on all nodes in the cluster:
a. Enter the following command; this will take you to the SMIT Users menu:
# smitty user

b. From the Users menu, select Add a User.
c. Enter the values for the following items, then press Enter. The other items should be left as they are.
User NAME: Assign a name for the user.


User ID: Assign an ID for the user. This ID should be the same on all nodes.
ADMINISTRATIVE USER?: false
Primary GROUP: Set the group that you defined in the previous step.
Group SET: Set the primary group and the staff group.
HOME directory: Set /TWShome.

Figure 4-3 shows an example of a IBM Tivoli Workload Scheduler user definition. In the example, we defined maestro user.

Add a User

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                   [Entry Fields]
* User NAME                             [maestro]
  User ID                               [1001]            #
  ADMINISTRATIVE USER?                  false             +
  Primary GROUP                         [tivoli]          +
  Group SET                             [tivoli,staff]    +
  ADMINISTRATIVE GROUPS                 []                +
  ROLES                                 []                +
  Another user can SU TO USER?          true              +
  SU GROUPS                             [ALL]             +
  HOME directory                        [/usr/maestro]
  Initial PROGRAM                       []
  User INFORMATION                      []
  EXPIRATION date (MMDDhhmmyy)          [0]
[MORE...37]

F1=Help       F2=Refresh      F3=Cancel    F4=List
Esc+5=Reset   Esc+6=Command   Esc+7=Edit   Esc+8=Image
Esc+9=Shell   Esc+0=Exit      Enter=Do

Figure 4-3 Defining a user

d. After you have added the user, modify the $HOME/.profile of the user. Modify the PATH variable to include the /TWShome and /TWShome/bin directory. This enables you to run IBM Tivoli Workload Scheduler commands in any directory as long as you are logged in as the IBM Tivoli Workload Scheduler user. Also add the TWS_TISDIR variable. The value for the TWS_TISDIR should be the /TWShome directory. The TWS_TISDIR enables IBM Tivoli Workload Scheduler to display


messages in the correct language codeset. Example 4-1 shows an example of how the variable should be defined. In the example, /usr/maestro is the /TWShome directory.
Example 4-1 An example .profile for the TWSuser

PATH=/usr/maestro:/usr/maestro/bin:$PATH
export PATH
TWS_TISDIR=/usr/maestro
export TWS_TISDIR

4.1.3 Installing the IBM Tivoli Workload Scheduler engine


In this section, we show you the steps to install the IBM Tivoli Workload Scheduler 8.2 engine (Master Domain Manager) from the command line. For procedures to install IBM Tivoli Workload Scheduler using the graphical user interface, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273.

In our scenario, we installed two TWS instances called TIVAIX1 and TIVAIX2 on a shared external disk. TIVAIX1 was installed from node tivaix1, and TIVAIX2 was installed from tivaix2. We used the following steps to do this.

1. Before installing, identify the following items. These items are required when running the installation script.
workstation type - master
workstation name - The name of the workstation. This is the value for the host field that you specify in the workstation definition. It will also be recorded in the globalopts file.
netman port - Specify the listening port for netman. We remind you again that if you plan to have several instances of IBM Tivoli Workload Scheduler running on a machine, make sure you specify different port numbers for each IBM Tivoli Workload Scheduler instance.
company name - Specify this if you would like your company name in reports produced by IBM Tivoli Workload Scheduler report commands.

2. Log in to the node where you want to install the IBM Tivoli Workload Scheduler engine as the root user.

3. Confirm that the IBM Tivoli Workload Scheduler file system is mounted. If it is not mounted, use the mount command to mount the IBM Tivoli Workload Scheduler file system.

4. Insert IBM Tivoli Workload Scheduler Installation Disk 1.


5. Locate the twsinst script in the directory of the platform on which you want to run the script. The following is an example of installing a Master Domain Manager named TIVAIX1.
# ./twsinst -new -uname twsusr -cputype master -thiscpu cpuname -master cpuname -port port_no -company company_name

Where:
twsusr - The name of the IBM Tivoli Workload Scheduler user.
master - The workstation type. Refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273, for other options.
cpuname - The name of the workstation. For -thiscpu, specify the name of the workstation that you are installing. For -master, specify the name of the Master Domain Manager. When installing the Master Domain Manager, specify the same value for -thiscpu and -master.
port_no - Specify the port number that netman uses to receive incoming messages from other workstations.
company_name - The name of your company (optional).

Example 4-2 shows sample command syntax for installing Master Domain Manager TIVAIX1.
Example 4-2 twsinst script example for TIVAIX1

# ./twsinst -new -uname maestro -cputype master -thiscpu tivaix1 -master tivaix1 -port 31111 -company IBM

Example 4-3 shows sample command syntax for installing Master Domain Manager TIVAIX2.
Example 4-3 twsinst script example for TIVAIX2

# ./twsinst -new -uname maestro2 -cputype master -thiscpu tivaix2 -master tivaix2 -port 31112 -company IBM

4.1.4 Configuring the IBM Tivoli Workload Scheduler engine


After you have installed the IBM Tivoli Workload Scheduler engine as a Master Domain Manager, perform the following configuration tasks. These are the minimum tasks that you should perform to get the IBM Tivoli Workload Scheduler Master Domain Manager running. For instructions on configuring other types of workstations, such as Fault Tolerant Agents and Domain Managers, refer to Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552, or Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274.


Checking the workstation definition


In order to have IBM Tivoli Workload Scheduler serviced correctly by HACMP in the event of a fallover, it must have the service IP label or the service IP address defined in its workstation definition. When installing a Master Domain Manager (master), the workstation definition is added automatically. After you have installed IBM Tivoli Workload Scheduler, check the workstation definition of the master and verify that the service IP label or address is associated with the master.
1. Log in to the master workstation as the TWSuser.
2. Execute the following command; this opens a text editor with the master's CPU definition:
$ composer modify cpu=master_name

Where: master_name is the workstation name of the master.

Example 4-4 and Example 4-5 give the workstation definitions for workstations TIVAIX1 and TIVAIX2 that we installed. Notice that the value for NODE is set to the service IP label in each workstation definition.

Example 4-4 Workstation definition for TIVAIX1

CPUNAME TIVAIX1
  DESCRIPTION "MASTER CPU"
  OS UNIX
  NODE tivaix1_svc
  DOMAIN MASTERDM
  TCPADDR 31111
  FOR MAESTRO
    AUTOLINK ON
    RESOLVEDEP ON
    FULLSTATUS ON
END

Example 4-5 Workstation definition for TIVAIX2

CPUNAME TIVAIX2
  DESCRIPTION "MASTER CPU"
  OS UNIX
  NODE tivaix2_svc
  DOMAIN MASTERDM
  TCPADDR 31112
  FOR MAESTRO
    AUTOLINK ON
    RESOLVEDEP ON
    FULLSTATUS ON
END


3. If the value for NODE is set to the service IP label correctly, close the workstation definition. If it is not set correctly, modify the file and save it.
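The check in step 3 can be scripted: dump the workstation definition to a file and compare its NODE field against the expected service IP label. The sketch below uses a trimmed copy of Example 4-4 as demo input; on a live system you would run the same awk test against a dump of the real workstation definition.

```shell
#!/bin/sh
# Demo: verify that the NODE field of a workstation definition is the HACMP
# service IP label. The definition text is a trimmed copy of Example 4-4.
cat > /tmp/tivaix1.def <<'EOF'
CPUNAME TIVAIX1
  DESCRIPTION "MASTER CPU"
  OS UNIX
  NODE tivaix1_svc
  DOMAIN MASTERDM
  TCPADDR 31111
END
EOF
node=$(awk '$1 == "NODE" { print $2 }' /tmp/tivaix1.def)
if [ "$node" = "tivaix1_svc" ]; then
  echo "NODE OK: $node" > /tmp/node_check.out
else
  echo "NODE WRONG: $node" > /tmp/node_check.out
fi
cat /tmp/node_check.out
```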

Adding the FINAL jobstream


The FINAL jobstream is responsible for generating daily production files. Without this jobstream, IBM Tivoli Workload Scheduler is unable to perform daily job processing. IBM Tivoli Workload Scheduler provides a definition file that you can use to add this FINAL jobstream. The following steps describe how to add the FINAL jobstream using this file.
1. Log in as the IBM Tivoli Workload Scheduler user.
2. Add the FINAL schedule by running the following command:
$ composer "add Sfinal"

3. Run Jnextday to create the production file.


$ Jnextday

4. Check the status of IBM Tivoli Workload Scheduler by issuing the following command.
$ conman status

If IBM Tivoli Workload Scheduler started correctly, the status should be Batchman=LIVES.
5. Check that all IBM Tivoli Workload Scheduler processes (netman, mailman, batchman, jobman) are running. Example 4-6 illustrates checking for the IBM Tivoli Workload Scheduler processes.
Example 4-6 Checking for IBM Tivoli Workload Scheduler processes

$ ps -ef | grep -v grep | grep maestro
maestro2 14484 31270 0 16:59:41     -  0:00 /usr/maestro2/bin/batchman -parm 32000
maestro2 16310 13940 1 16:00:29 pts/0  0:00 -ksh
maestro2 26950     1 0 22:38:59     -  0:00 /usr/maestro2/bin/netman
maestro2 28658 16310 2 17:00:07 pts/0  0:00 ps -ef
root     29968 14484 0 16:59:41     -  0:00 /usr/maestro2/bin/jobman
maestro2 31270 26950 0 16:59:41     -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
$

4.1.5 Installing IBM Tivoli Workload Scheduler Connector


If you plan to use JSC to perform administration tasks for IBM Tivoli Workload Scheduler, install the IBM Tivoli Workload Scheduler connector. IBM Tivoli Workload Scheduler connector must be installed on any TMR server or Managed


Node that is running the IBM Tivoli Workload Scheduler Master Domain Manager. Optionally, the connector could be installed on any Domain Manager or FTA, provided that a Managed Node is also installed there.

Note: Tivoli Management Framework should be installed prior to the IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5 or to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804.

In this section, we assume that you have already installed Tivoli Management Framework and have applied the latest set of fix packs. Here we describe the steps to install Job Scheduling Services (a prerequisite of the IBM Tivoli Workload Scheduler Connector) and the IBM Tivoli Workload Scheduler Connector by using the command line. For instructions on installing the IBM Tivoli Workload Scheduler Connector from the Tivoli Desktop, refer to Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552.

For our mutual takeover scenario, each node in our two-node HACMP cluster (tivaix1, tivaix2) hosts a TMR server. We installed the IBM Tivoli Workload Scheduler Connector on each of the two cluster nodes.

1. Before installing, identify the following items. These items are required when running the IBM Tivoli Workload Scheduler Connector installation script.
Node name to install IBM Tivoli Workload Scheduler Connector - This must be the name defined in the Tivoli Management Framework.
The full path to the installation image - For Job Scheduling Services, it is the directory with the TMF_JSS.IND file. For IBM Tivoli Workload Scheduler Connector, it is the directory with the TWS_CONN.IND file.
IBM Tivoli Workload Scheduler installation directory - The /TWShome directory.
Connector Instance Name - A name for the connector instance.
Instance Owner - The name of the IBM Tivoli Workload Scheduler user.

2. Insert the IBM Tivoli Workload Scheduler Installation Disk 1.

3. Log in on the TMR server as the root user.

4.
Run the following command to source the Tivoli environment variables:
# . /etc/Tivoli/setup_env.sh

5. Run the following command to install Job Scheduling Services:


# winstall -c install_dir -i TMF_JSS nodename

Where: install_dir - the path to the installation image


nodename - the name of the TMR server or the Managed Node that you are installing JSS on. The command will perform a prerequisite verification, and you will be prompted to proceed with the installation or not. Example 4-7 illustrates the execution of the command.
Example 4-7 Installing JSS from the command line

# winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_1/TWS_CONN -i TMF_JSS tivaix1
Checking product dependencies...
Product TMF_3.7.1 is already installed as needed.
Dependency check completed.
Inspecting node tivaix2...
Installing Product: Tivoli Job Scheduling Services v1.2

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
  hosts: tivaix2
  need to copy the CAT (generic) to:
    tivaix2:/usr/local/Tivoli/msg_cat
For the machines in the aix4-r1 class:
  hosts: tivaix2
  need to copy the BIN (aix4-r1) to:
    tivaix2:/usr/local/Tivoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:
    tivaix2:/usr/local/Tivoli/spool/tivaix2.db
Continue([y]/n)?
Creating product installation description object...Created.
Executing queued operation(s)
Distributing machine independent Message Catalogs
--> tivaix2
Completed.
Distributing architecture specific Binaries
--> tivaix2
Completed.
Distributing architecture specific Server Database
--> tivaix2
....Product install completed successfully. Completed.
Registering product installation attributes...Registered.


6. Verify that Job Scheduling Services was installed by running the following command:
# wlsinst -p

This command shows a list of all the Tivoli products installed in your environment. You should see Tivoli Job Scheduling Services v1.2 in the list. Example 4-8 shows an example of the command output. The 10th line shows that JSS was installed successfully.
Example 4-8 wlsinst -p command output

# wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Distribution Status Console, Version 4.1
#

7. To install IBM Tivoli Workload Scheduler Connector, run the following command:
# winstall -c install_dir -i TWS_CONN twsdir=/TWShome iname=instance_name owner=twsuser createinst=1 nodename

Where:
install_dir - the path of the installation image.
twsdir - set this to /TWShome.
iname - the name of the IBM Tivoli Workload Scheduler Connector instance.
owner - the name of the IBM Tivoli Workload Scheduler user.

8. Verify that the IBM Tivoli Workload Scheduler Connector was installed by running the following command:
# wlsinst -p

This command shows a list of all the Tivoli products installed in your environment. You should see Tivoli TWS Connector 8.2 in the list. Example 4-9 shows an example of the command output. The 11th line shows that the IBM Tivoli Workload Scheduler Connector was installed successfully.


Example 4-9 wlsinst -p command output

# wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Tivoli TWS Connector 8.2
Distribution Status Console, Version 4.1

4.1.6 Setting the security


After you have installed the IBM Tivoli Workload Scheduler Connectors, apply changes to the IBM Tivoli Workload Scheduler Security file so that users can access IBM Tivoli Workload Scheduler through JSC. If you grant access to a Tivoli Administrator, then any operating system user associated with that Tivoli Administrator is granted access through JSC. For more information on the IBM Tivoli Workload Scheduler Security file, refer to Tivoli Workload Scheduler Version 8.2 Installation Guide, SC32-1273. To modify the security file, follow the procedures described in this section.

For our scenario, we added the names of two Tivoli Administrators, Root_tivaix1-region and Root_tivaix2-region, to the Security file of each Master Domain Manager. Root_tivaix1-region is a Tivoli Administrator on tivaix1, and Root_tivaix2-region is a Tivoli Administrator on tivaix2. This will make each IBM Tivoli Workload Scheduler Master Domain Manager accessible from either of the two TMR servers. In the event of a fallover, the IBM Tivoli Workload Scheduler Master Domain Manager remains accessible from JSC through the Tivoli Administrator on the surviving node.

1. Log in to the IBM Tivoli Workload Scheduler master as the TWSuser. The TWSuser is the user you used to install IBM Tivoli Workload Scheduler.

2. Run the following command to dump the Security file to a text file:
$ dumpsec > /tmp/sec.txt

3. Modify the security file and save your changes. Add the names of the Tivoli Administrators to the LOGON clause.


Example 4-10 illustrates a security file. This security file grants full privileged access to the Tivoli Administrators called Root_tivaix1-region and Root_tivaix2-region.
Example 4-10 Example of a security file

USER MAESTRO
CPU=@+LOGON=maestro,root,Root_tivaix2-region,Root_tivaix1-region
BEGIN
USEROBJ CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS
JOB CPU=@ ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,CONFIRM,DELDEP,DELETE,DISPLAY,KILL,MODIFY,RELEASE,REPLY,RERUN,SUBMIT,USE,LIST
SCHEDULE CPU=@ ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,DELDEP,DELETE,DISPLAY,LIMIT,MODIFY,RELEASE,REPLY,SUBMIT,LIST
RESOURCE CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,RESOURCE,USE,LIST
PROMPT ACCESS=ADD,DELETE,DISPLAY,MODIFY,REPLY,USE,LIST
FILE NAME=@ ACCESS=CLEAN,DELETE,DISPLAY,MODIFY
CPU CPU=@ ACCESS=ADD,CONSOLE,DELETE,DISPLAY,FENCE,LIMIT,LINK,MODIFY,SHUTDOWN,START,STOP,UNLINK,LIST
PARAMETER CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY
CALENDAR ACCESS=ADD,DELETE,DISPLAY,MODIFY,USE
END
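The LOGON change shown in Example 4-10 can also be made non-interactively, which is handy when the Security files of both masters must stay in sync. The sketch below appends the two administrator names with sed; the input here is a cut-down stand-in for dumpsec output, not a full security file.

```shell
#!/bin/sh
# Demo: append two Tivoli Administrator names to the LOGON list of a dumped
# security file. The input below stands in for real "dumpsec" output.
cat > /tmp/sec.txt <<'EOF'
USER MAESTRO
CPU=@+LOGON=maestro,root
BEGIN
USEROBJ CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS
END
EOF
sed 's/^CPU=@+LOGON=maestro,root$/&,Root_tivaix2-region,Root_tivaix1-region/' \
    /tmp/sec.txt > /tmp/sec.new
grep LOGON /tmp/sec.new
```

After editing, the modified file would still be verified with makesec -v and compiled with makesec, as described in steps 4 and 5.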

4. Verify your security file by running the following command. Make sure that no errors or warnings are displayed.
$ makesec -v /tmp/sec.txt

Note: Running the makesec command with the -v option only verifies that your security file contains no syntax errors; it does not update the security database. Example 4-11 shows sample output of the makesec -v command:
Example 4-11 Output of makesec -v command
$ makesec -v /tmp/sec.txt
TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
Licensed Materials Property of IBM
5698-WKB (C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
$

5. If there are no errors, compile the security file with the following command:
$ makesec /tmp/sec.txt

Example 4-12 illustrates output of the makesec command:


Example 4-12 Output of makesec command
$ makesec /tmp/sec.txt
TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
Licensed Materials Property of IBM
5698-WKB (C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
MAKESEC:Security file installed as /usr/maestro/Security
$

6. When you apply changes to the security file, the connector instance must be stopped for the change to take effect. Run the following commands to source the Tivoli environment variables and stop the connector instance:
$ . /etc/Tivoli/setup_env.sh
$ wmaeutil inst_name -stop "*"

where inst_name is the name of the instance you want to stop. Example 4-13 shows the wmaeutil command stopping a connector instance called TIVAIX1.
Example 4-13 Output of wmaeutil command
$ . /etc/Tivoli/setup_env.sh
$ wmaeutil TIVAIX1 -stop "*"
AWSBCT758I Done stopping the ENGINE server
AWSBCT758I Done stopping the DATABASE server
AWSBCT758I Done stopping the PLAN server
$

Note: You do not need to manually restart the connector instance, as it is automatically started when a user logs in to JSC.


7. Verify that the changes in the security file are effective by running the dumpsec command. This dumps the current content of the security file into a text file. Open the text file and confirm that the changes you made are reflected:
$ dumpsec > filename

where filename is the name of the text file.

8. Verify that the changes are effective by logging in to JSC as a user you added to the security file.
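The edit in step 3, appending the Tivoli Administrator names to the LOGON clause, can be scripted. The following is a minimal sketch that works on a scratch copy with illustrative content; on a live system the input would come from dumpsec and the result would be compiled with makesec:

```shell
# Start from a scratch copy of a dumped Security file (illustrative
# content; on a real system this comes from: dumpsec > /tmp/sec.txt).
cat > /tmp/sec_demo.txt <<'EOF'
USER MAESTRO
CPU=@+LOGON=maestro,root
BEGIN
USEROBJ CPU=@ ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS
END
EOF

# Append the two Tivoli Administrator names to the LOGON clause.
sed 's/^CPU=@+LOGON=.*$/&,Root_tivaix1-region,Root_tivaix2-region/' \
    /tmp/sec_demo.txt > /tmp/sec_demo.new
mv /tmp/sec_demo.new /tmp/sec_demo.txt

grep '^CPU=@+LOGON=' /tmp/sec_demo.txt
# CPU=@+LOGON=maestro,root,Root_tivaix1-region,Root_tivaix2-region
```

A temporary file is used instead of an in-place edit so the same commands work on AIX, whose sed has no -i option.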

4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance


One IBM Tivoli Workload Scheduler Connector instance can be mapped to only one IBM Tivoli Workload Scheduler instance. In our mutual takeover scenario, one TMR server hosts two instances of IBM Tivoli Workload Scheduler if a fallover occurs. An additional IBM Tivoli Workload Scheduler Connector instance is therefore required on each node so that a user can access both instances of IBM Tivoli Workload Scheduler on the surviving node. We added a connector instance to each node to control both IBM Tivoli Workload Scheduler Master Domain Managers, TIVAIX1 and TIVAIX2.

To add an additional IBM Tivoli Workload Scheduler Connector instance, perform the following tasks.

Note: You must install the Job Scheduling Services and IBM Tivoli Workload Scheduler Connector products before performing these tasks.

1. Log into a cluster node as root.

2. Source the Tivoli environment variables by running the following command:
# . /etc/Tivoli/setup_env.sh

3. List the existing connector instance:


# wlookup -ar MaestroEngine

Example 4-14 on page 201 shows one IBM Tivoli Workload Scheduler Connector instance called TIVAIX1.

Example 4-14 Output of wlookup command before adding additional instance
# wlookup -ar MaestroEngine
TIVAIX1 1394109314.1.661#Maestro::Engine#

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster

201

4. Add an additional connector instance:


# wtwsconn.sh -create -n instance_name -t TWS_directory

where:

instance_name - the name of the instance you would like to add.

TWS_directory - the path where the IBM Tivoli Workload Scheduler engine associated with the instance resides.

Example 4-15 shows output of the wtwsconn.sh command. We added a TWS Connector instance called TIVAIX2 for accessing the IBM Tivoli Workload Scheduler engine installed in the /usr/maestro2 directory.
Example 4-15 Sample wtwsconn.sh command
# wtwsconn.sh -create -n TIVAIX2 -t /usr/maestro2
Scheduler engine created
Created instance: TIVAIX2, on node: tivaix1
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2

5. Run the wlookup -ar command again to verify that the instance was successfully added. The IBM Tivoli Workload Scheduler Connector that you have just added should show up in the list.
# wlookup -ar MaestroEngine

Example 4-16 shows that IBM Tivoli Workload Scheduler Connector instance TIVAIX2 is added to the list.
Example 4-16 Output of wlookup command after adding additional instance
# wlookup -ar MaestroEngine
TIVAIX1 1394109314.1.661#Maestro::Engine#
TIVAIX2 1394109314.1.667#Maestro::Engine#
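The per-node connector setup above can be sketched as a small script: the wtwsconn.sh command lines are generated from a name-to-directory table, and the wlookup listing (inlined here as a captured sample) is checked for each expected instance name. The instance names and paths match our scenario but are otherwise illustrative:

```shell
# Instance-to-engine-directory table for one node (our scenario's values).
engines='TIVAIX1:/usr/maestro
TIVAIX2:/usr/maestro2'

# Dry run: print the wtwsconn.sh commands the node needs so that it can
# host both engines after a fallover.
printf '%s\n' "$engines" | while IFS=: read -r inst dir; do
    echo "wtwsconn.sh -create -n $inst -t $dir"
done

# Sample output captured from: wlookup -ar MaestroEngine
wlookup_out='TIVAIX1 1394109314.1.661#Maestro::Engine#
TIVAIX2 1394109314.1.667#Maestro::Engine#'

# Check that every expected connector instance appears in the listing.
for inst in TIVAIX1 TIVAIX2; do
    if printf '%s\n' "$wlookup_out" | awk '{print $1}' | grep -qx "$inst"; then
        echo "connector instance $inst is registered"
    else
        echo "connector instance $inst is MISSING"
    fi
done
```

On a live TMR server you would replace the inlined sample with the real wlookup -ar MaestroEngine output.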

4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster


When you have finished installing IBM Tivoli Workload Scheduler, verify that IBM Tivoli Workload Scheduler is able to move from one node to another, and that it is able to run on the standby node(s) in the cluster. It is important that you perform this task manually before applying fix packs, and also before you install HACMP. Making sure that IBM Tivoli Workload Scheduler


behaves as expected before each major change simplifies troubleshooting if you run into issues with IBM Tivoli Workload Scheduler. If you apply IBM Tivoli Workload Scheduler fix packs and install HACMP, and then find that IBM Tivoli Workload Scheduler behaves unexpectedly, it is difficult to determine the cause of the problem. Though it may seem cumbersome, we strongly recommend that you verify IBM Tivoli Workload Scheduler behavior before you make a change to a system.

The sequence of the verification is as follows:

1. Stop IBM Tivoli Workload Scheduler on a cluster node. Log in as TWSuser and run the following command:
$ conman "shut ;wait"

2. Migrate the volume group to another node. Refer to the volume group migration procedure described in "Define the shared LVM components" on page 94.

3. Start IBM Tivoli Workload Scheduler on the node by running the conman start command:
$ conman start

4. Verify the batchman status. Make sure the Batchman status is LIVES.
$ conman status

5. Verify that all IBM Tivoli Workload Scheduler processes are running by issuing the ps command:
$ ps -ef | grep -v grep | grep maestro

Example 4-17 shows an example of the ps command output. Check that the netman, mailman, batchman, and jobman processes are running for each IBM Tivoli Workload Scheduler instance installed.
Example 4-17 Output of ps command
$ ps -ef | grep -v grep | grep maestro
 maestro 26378 43010   1 18:46:58  pts/1  0:00 -ksh
    root 30102 34192   0 18:49:59      -  0:00 /usr/maestro/bin/jobman
 maestro 33836 38244   0 18:49:59      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
 maestro 34192 33836   0 18:49:59      -  0:00 /usr/maestro/bin/batchman -parm 32000
 maestro 38244     1   0 18:49:48      -  0:00 /usr/maestro/bin/netman
 maestro 41214 26378   4 18:54:52  pts/1  0:00 ps -ef
$


6. If using JSC, log into the IBM Tivoli Workload Scheduler Master Domain Manager. Verify that you are able to see the scheduling objects and the production plan.
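The checks in steps 4 and 5 can be scripted for repeated use during verification. This is a hedged sketch: the conman and ps outputs are inlined samples (the "Batchman LIVES" wording comes from step 4; the remaining sample text is illustrative), and on a live node you would substitute the real conman status and ps -ef output:

```shell
# Inlined samples; on a live node replace with:
#   conman_out=$(conman status)
#   ps_out=$(ps -ef | grep -v grep | grep maestro)
conman_out='Batchman LIVES. Limit: 10, Fence: 0'
ps_out='/usr/maestro/bin/netman
/usr/maestro/bin/mailman -parm 32000
/usr/maestro/bin/batchman -parm 32000
/usr/maestro/bin/jobman'

ok=1

# Step 4: batchman must report LIVES.
printf '%s\n' "$conman_out" | grep -q 'Batchman LIVES' || {
    echo 'batchman is not LIVES'; ok=0;
}

# Step 5: all four core processes must be present.
for proc in netman mailman batchman jobman; do
    printf '%s\n' "$ps_out" | grep -q "/bin/$proc" || {
        echo "missing process: $proc"; ok=0;
    }
done

[ "$ok" -eq 1 ] && echo 'TWS verification passed'
```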

4.1.9 Applying IBM Tivoli Workload Scheduler fix pack


When you have completed installing IBM Tivoli Workload Scheduler and the IBM Tivoli Workload Scheduler Connector, apply the latest fix pack available. For instructions on installing the fix pack for the IBM Tivoli Workload Scheduler engine, refer to the README file included in each fix pack. The IBM Tivoli Workload Scheduler engine fix pack can be applied either from the command line by using the twspatch script, or from the Java-based graphical user interface. The IBM Tivoli Workload Scheduler Connector fix pack is applied from the Tivoli Desktop. Because instructions on applying IBM Tivoli Workload Scheduler Connector fix packs are not documented in the fix pack README, we describe the procedures to install them here. Before applying any of the fix packs, make sure you have a viable backup.

Note: The same level of fix pack should be applied to the IBM Tivoli Workload Scheduler engine and the IBM Tivoli Workload Scheduler Connector. If you apply a fix pack to the engine, make sure you apply the same level of fix pack to the Connector.

Applying IBM Tivoli Workload Scheduler Connector fix pack from Tivoli Desktop
Install the IBM Tivoli Workload Scheduler Connector fix pack as follows:

1. Set the installation media. If you are using a CD, insert the CD. If you have downloaded the fix pack from the fix pack download site, extract the tar file in a temporary directory.

2. Log in to the TMR server using the Tivoli Desktop. Enter the host machine name, user name, and password, then press OK as seen in Figure 4-4 on page 205.


Figure 4-4 Logging into IBM Tivoli Management Framework through the Tivoli Desktop

3. Select Desktop -> Install -> Install Patch as seen in Figure 4-5.

Figure 4-5 Installing the fix pack

4. If the error message in Figure 4-6 on page 206 is shown, press OK and proceed to the next step.


Figure 4-6 Error message

5. In the Path Name field, enter the full path of the installation image, as shown in Figure 4-7. The full path should be the directory where the U2_TWS.IND file resides.

Figure 4-7 Specifying the path to the installation image

6. In the Install Patch dialog (Figure 4-8 on page 207), select the fix pack from the Select Patches to Install list. Then make sure the node to install the fix pack is shown in the Clients to Install On list. Press Install.


Figure 4-8 Install Patch

7. Pre-installation verification is performed, and you are then prompted whether to continue. If there are no errors or warnings shown in the dialog, press Continue Install (Figure 4-9 on page 208).


Figure 4-9 Patch Installation

8. Confirm the Finished Patch Installation message, then press Close.

9. Log in, as root user, to the node where you just installed the fix pack.

10. Source the Tivoli environment variables:
# . /etc/Tivoli/setup_env.sh

11.Verify that the fix pack was installed successfully:


# wlsinst -P

For IBM Tivoli Workload Scheduler Connector Fix Pack 01, confirm that Tivoli TWS Connector upgrade to v8.2 patch 1 is included in the list. For Fix Pack 02, confirm that Tivoli TWS Connector upgrade to v8.2 patch 2 is included in the list. Example 4-18 on page 209 shows an output of the wlsinst command after installing Fix Pack 01.


Example 4-18 Verifying the fix pack installation
# wlsinst -P
4.1-TMF-0008 Tier 2 3.7 Endpoint Bundles for Tier1 Gateways
Tivoli Framework Patch 4.1-TMF-0013 (build 05/28)
Tivoli Framework Patch 4.1-TMF-0014 (build 05/30)
Tivoli Framework Patch 4.1-TMF-0015 for linux-ppc (LCF41) (build 05/14)
Tivoli Management Agent 4.1 for iSeries Endpoint (41016)
Tivoli Framework Patch 4.1-TMF-0034 (build 10/17)
Java 1.3 for Tivoli, United Linux
Tivoli Management Framework, Version 4.1 [2928] os400 Endpoint French language
Tivoli Management Framework, Version 4.1 [2929] os400 Endpoint German language
Tivoli Management Framework, Version 4.1 [2931] os400 Endpoint Spanish language
Tivoli Management Framework, Version 4.1 [2932] os400 Endpoint Italian language
Tivoli Management Framework, Version 4.1 [2962] os400 Endpoint Japanese language
Tivoli Management Framework, Version 4.1 [2980] os400 Endpoint Brazilian Portuguese language
Tivoli Management Framework, Version 4.1 [2984] os400 Endpoint DBCS English language
Tivoli Management Framework, Version 4.1 [2986] os400 Endpoint Korean language
Tivoli Management Framework, Version 4.1 [2987] os400 Endpoint Traditional Chinese language
Tivoli Management Framework, Version 4.1 [2989] os400 Endpoint Simplified Chinese language
Tivoli TWS Connector upgrade to v8.2 patch 1
#
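The check in step 11 can be automated. A minimal sketch, with the wlsinst -P product list inlined as a short sample; the patch-level variable is an assumption used for illustration:

```shell
# Fix pack level applied to the engine (assumed value for this sketch).
applied_level=1

# Inlined sample; on a live TMR server use: wlsinst_out=$(wlsinst -P)
wlsinst_out='Tivoli Framework Patch 4.1-TMF-0034 (build 10/17)
Tivoli TWS Connector upgrade to v8.2 patch 1'

# The Connector patch level must match the engine fix pack level.
if printf '%s\n' "$wlsinst_out" | \
        grep -q "Tivoli TWS Connector upgrade to v8.2 patch $applied_level"; then
    echo "Connector is at patch $applied_level, matching the engine fix pack"
else
    echo "Connector patch $applied_level not found; levels may be out of sync"
fi
```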

Best practices for applying IBM Tivoli Workload Scheduler fix pack
As of December 2003, the latest fix pack for IBM Tivoli Workload Scheduler 8.2 is 8.2-TWS-FP02. Because 8.2-TWS-FP02 is dependent on 8.2-TWS-FP01, we applied both fix packs. Here are some hints and tips for applying these fix packs.

Additional disk space required for backup files


Though not mentioned in the README for 8.2-TWS-FP01, a backup copy of the existing binaries is created under the home directory of the user applying the fix. For applying the IBM Tivoli Workload Scheduler fix pack, we used the root user, which means the backup is created under the home directory of root. Before applying the fix, confirm that you have enough space in that directory for the backup file; for UNIX systems, this is 25 MB. If you do not have enough space in that directory, the fix pack installation may fail with the message shown in


Example 4-19. This example shows an installation failure message when installation of the fix pack was initiated from the command line.
Example 4-19 IBM Tivoli Workload Scheduler fix pack installation error
# ./twspatch -install -uname maestro2
Licensed Materials Property of IBM
TWS-WSH
(C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
TWS for UNIX/TWSPATCH 8.2
Revision: 1.5
AWSFAF027E Error: Operation INSTALL failed. For more details see the /tmp/FP_TWS_AIX_maestro2^8.2.0.01.log log file.
#

Check the fix pack installation log file. On UNIX systems, the fix pack installation log files are saved in the /tmp directory. These logs are named twspatchXXXXX.log, where XXXXX is a five-digit random number. Example 4-20 shows the log file we received when we had insufficient disk space.
Example 4-20 The contents of /tmp/twspatchXXXXX.log
Tue Dec 2 19:24:53 CST 2003
DISSE0006E Operation unsuccessful: fatal failure.

If you do not have sufficient disk space in the desired directory, you can either add disk space, or change the backup directory to another directory with sufficient space. For instructions on how to change the backup directory, refer to the README file of 8.2-TWS-FP02.

Note: Changing the backup directory requires modifying a file used by IBM Tivoli Configuration Manager 4.2, and the change may affect the behavior of TCM 4.2 if you have it installed on your system. Consult your IBM service provider for more information.
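The disk space prerequisite described above can be checked before launching twspatch. A minimal sketch, assuming the default backup location (the installing user's home directory) and the 25 MB figure quoted for UNIX systems:

```shell
# Backup directory defaults to the installing user's home directory.
backup_dir=${HOME:-/tmp}

# 25 MB requirement for UNIX systems, expressed in KB for df -kP.
need_kb=$((25 * 1024))

# Portable df output: second line, fourth column is the free space in KB.
free_kb=$(df -kP "$backup_dir" | awk 'NR==2 {print $4}')

if [ "$free_kb" -ge "$need_kb" ]; then
    echo "OK: ${free_kb} KB free in ${backup_dir} (need ${need_kb} KB)"
else
    echo "WARNING: only ${free_kb} KB free in ${backup_dir} (need ${need_kb} KB)"
fi
```

The -P option keeps the df output format predictable across AIX and Linux, so the awk extraction works on both.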

4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler


After you complete the installation of the application server (IBM Tivoli Workload Scheduler, in this redbook) and then HACMP, you configure HACMP as you


planned in 3.2.3, "Planning and designing an HACMP cluster" on page 67, so the application server can be made highly available.

Note: We strongly recommend that you install your application servers and ensure they function properly before installing HACMP. In the environment we used for this redbook, we installed IBM Tivoli Workload Scheduler and/or IBM Tivoli Management Framework as called for by the scenarios we implement.

This section shows how to configure HACMP specifically for IBM Tivoli Workload Scheduler in a mutual takeover cluster. Configuration of HACMP 5.1 can be carried out through the HACMP menu of the SMIT interface, or by the Online Planning Worksheets tool shipped with the HACMP 5.1 software. In this and in the following sections, we describe the steps to configure HACMP using the SMIT interface to support IBM Tivoli Workload Scheduler. We walk you through a series of steps that are specifically tailored to make the following scenarios highly available:

- IBM Tivoli Workload Scheduler
- IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework
- IBM Tivoli Management Framework (shown in Chapter 5, "Implement IBM Tivoli Management Framework in a cluster" on page 415)

Note: There are many other possible scenarios, and many features are not used by our configuration in this redbook and not covered in the following sections. Any other scenario should be planned and configured using the HACMP manuals and IBM Redbooks, or consult your IBM service provider for assistance in planning and implementation.

The Online Planning Worksheet is a Java-based worksheet that helps you plan your HACMP configuration. It generates a configuration file, based on the information you enter, that can be loaded directly into a live HACMP cluster, and it also generates a convenient HTML page documenting the configuration. We do not show how to use this worksheet here; for a complete and detailed explanation, see Chapter 16, "Using Online Planning Worksheets", in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00.

Note: One of the major drawbacks of the Online Planning Worksheet is that certain HACMP configurations accepted by the tool might cause problems on a live HACMP cluster. The SMIT screens that we show in this redbook tend to catch these problems. Our recommendation, as of HACMP Version 5.1, is to use the Online Planning Worksheet to create convenient HTML documentation of the configuration, and then manually configure the cluster through the SMIT screens.

Following is an overview of the steps we use to configure HACMP for our IBM Tivoli Workload Scheduler environment, and where you can find each step:

- "Configure heartbeating" on page 213
- "Configure HACMP topology" on page 219
- "Configure HACMP service IP labels/addresses" on page 252
- "Configure application servers" on page 223
- "Configure application monitoring" on page 227
- "Add custom start and stop HACMP scripts" on page 234
- "Add a custom post-event HACMP script" on page 242
- "Modify /etc/hosts and name resolution order" on page 250
- "Configure HACMP networks and heartbeat paths" on page 254
- "Configure HACMP resource groups" on page 257
- "Configure cascading without fallback" on page 264
- "Configure pre-event and post-event commands" on page 267
- "Configure pre-event and post-event processing" on page 269
- "Configure HACMP persistent node IP label/addresses" on page 272
- "Configure predefined communication interfaces" on page 276
- "Verify the configuration" on page 280
- "Start HACMP Cluster services" on page 287
- "Verify HACMP status" on page 292
- "Test HACMP resource group moves" on page 294
- "Live test of HACMP fallover" on page 298
- "Configure HACMP to start on system restart" on page 300
- "Verify IBM Tivoli Workload Scheduler fallover" on page 301

The details of each step are as follows.

Configure heartbeating
The configuration we used implements two heartbeat mechanisms: one over the IP network, and one over the SSA disk subsystem (called target mode SSA). Best practices call for implementing at least one non-IP point-to-point network for exchanging heartbeat keepalive packets between cluster nodes, in case the TCP/IP-based subsystem, networks, or network NICs fail. Available non-IP heartbeat mechanisms are:

- Target Mode SSA
- Target Mode SCSI
- Serial (also known as RS-232C)
- Heartbeating over disk (only available for enhanced concurrent mode volume groups)

In this section, we describe how to configure a target mode SSA connection between HACMP nodes sharing disks connected to SSA on Multi-Initiator RAID adapters (FC 6215 and FC 6219). The adapters must be at Microcode Level 1801 or later. You can define a point-to-point network to HACMP that connects all nodes on an SSA loop. The major steps of configuring target mode SSA are:

1. "Changing node numbers on systems in SSA loop" on page 213
2. "Configuring Target Mode SSA devices" on page 215

The details of each step follow.

Changing node numbers on systems in SSA loop


By default, SSA node numbers on all systems are zero. These must be changed to unique, non-zero numbers on the nodes to enable target mode SSA. To configure the target mode SSA devices:

1. Assign a unique non-zero SSA node number to all systems on the SSA loop.

Note: The ID on a given SSA node should match the HACMP node ID, which is contained in the node_id field of the HACMP node ODM entry.


The following command retrieves the HACMP node ID:


odmget -q "name = node_name" HACMPnode

where node_name is the HACMP node name of the cluster node. In our environment, we used tivaix1 and tivaix2 as the values for node_name. Example 4-21 shows how we determined the HACMP node ID for tivaix1. Here we determined that tivaix1 uses node ID 1, based upon the information in the line highlighted in bold that starts with the string node_id.
Example 4-21 How to determine a cluster node's HACMP node ID
[root@tivaix1:/home] odmget -q "name = tivaix1" HACMPnode | grep -p COMMUNICATION_PATH

HACMPnode:
        name = "tivaix1"
        object = "COMMUNICATION_PATH"
        value = "9.3.4.194"
        node_id = 1
        node_handle = 1
        version = 6

Note that we piped the output of the odmget command to the grep command to extract a single stanza. If you omit this part of the command string, multiple stanzas are displayed, all of which have the same node_id field.

2. To change the SSA node number:
chdev -l ssar -a node_number=number

where number is the new SSA node number. Best practice calls for using the same number as the HACMP node ID determined in the preceding step.

Note: If you are using IBM AIX General Parallel File System (GPFS), you must make the SSA node number match the HACMP cluster node ID.

In our environment, we assigned SSA node number 1 to tivaix1 and SSA node number 2 to tivaix2.

3. To show the system's SSA node number:
lsattr -El ssar

Example 4-22 shows the output of this command for tivaix1, where the node number is highlighted in bold.
Example 4-22 Show a system's SSA node number, taken from tivaix1
[root@tivaix1:/home] lsattr -El ssar
node_number 1 SSA Network node number True


Repeat this procedure on each cluster node, assigning a different SSA node number for each cluster node. In our environment, Example 4-23 shows that tivaix2 was assigned SSA node number 2.
Example 4-23 Show a system's SSA node number, taken from tivaix2
[root@tivaix2:/home] lsattr -El ssar
node_number 2 SSA Network node number True
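The match between the SSA node number and the HACMP node ID can be cross-checked with a short script. In this sketch the odmget and lsattr outputs are inlined samples taken from Examples 4-21 and 4-22; on a live node you would substitute the real commands:

```shell
# Inlined samples; on a live node replace with:
#   odmget_out=$(odmget -q "name = tivaix1" HACMPnode | grep node_id)
#   lsattr_out=$(lsattr -El ssar)
odmget_out='node_id = 1'
lsattr_out='node_number 1 SSA Network node number True'

# Extract the two numbers.
hacmp_id=$(printf '%s\n' "$odmget_out" | awk -F' = ' '/node_id/ {print $2}')
ssa_num=$(printf '%s\n' "$lsattr_out" | awk '{print $2}')

if [ "$ssa_num" = "$hacmp_id" ]; then
    echo "SSA node number $ssa_num matches HACMP node ID $hacmp_id"
else
    echo "MISMATCH: SSA node number $ssa_num vs HACMP node ID $hacmp_id"
fi
```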

Configuring Target Mode SSA devices


After enabling the target mode interface, run cfgmgr to create the initiator and target devices and make them available. To create the initiator and target devices:

1. Enter: smit devices. SMIT displays a list of devices.

2. Select Install/Configure Devices Added After IPL and press Enter.

3. Exit SMIT after the cfgmgr command completes.

4. Ensure that the devices are paired correctly:
lsdev -C | grep tmssa

Example 4-24 shows this command's output on tivaix1 in our environment.


Example 4-24 Ensure that target mode SSA is configured on a cluster node, taken from tivaix1
[root@tivaix1:/home] lsdev -C | grep tmssa
tmssa2 Available  Target Mode SSA Device
tmssar Available  Target Mode SSA Router

Example 4-25 shows this command's output on tivaix2 in our environment.


Example 4-25 Ensure that target mode SSA is configured on a cluster node, taken from tivaix2
# lsdev -C | grep tmssa
tmssa1 Available  Target Mode SSA Device
tmssar Available  Target Mode SSA Router

Note how each cluster node uses the same target mode SSA router, but different target mode SSA devices: cluster node tivaix1 uses target mode SSA device tmssa2, while cluster node tivaix2 uses tmssa1. Repeat the procedures for enabling and configuring the target mode SSA devices for other nodes connected to the SSA adapters.


Configuring the target mode connection creates two target mode files in the /dev directory of each node:

- /dev/tmssan.im - the initiator file, which transmits data
- /dev/tmssan.tm - the target file, which receives data

where n is a number that uniquely identifies the target mode file. Note that this number is different from the SSA node number and HACMP node ID from the preceding section; these numbers are deliberately set differently. Example 4-26 shows the target mode files created in the /dev directory for tivaix1 in our environment.
Example 4-26 Display the target mode SSA files for tivaix1 [root@tivaix1:/home] ls /dev/tmssa*.im /dev/tmssa*.tm /dev/tmssa2.im /dev/tmssa2.tm

Example 4-27 shows the target mode files created in the /dev directory for tivaix2 in our environment.
Example 4-27 Display the target mode SSA files for tivaix2 [root@tivaix2:/home] ls /dev/tmssa*.im /dev/tmssa*.tm /dev/tmssa1.im* /dev/tmssa1.tm*

Note: On page 273, in section Configuring Target Mode SSA Devices of High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, these target mode SSA files are referred to as /dev/tmscsinn.im and /dev/tmscsinn.tm. We believe this is incorrect, because these are the files used for target mode SCSI heartbeating. This redbook shows what we believe are the correct file names. This includes the corrected unique identifiers, changed from two digits (nn) to one digit (n).
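Given the peer node's SSA node number, the expected initiator and target file names can be derived and checked. In this sketch the /dev listing is an inlined sample from tivaix1; on a real cluster node use a live ls /dev/tmssa* listing:

```shell
# On tivaix1 the peer (tivaix2) is reached through tmssa2.
peer_ssa_node=2

# Inlined sample; on a live node use: dev_listing=$(ls /dev/tmssa*)
dev_listing='/dev/tmssa2.im /dev/tmssa2.tm'

# Both the initiator (.im) and target (.tm) files must exist.
for suffix in im tm; do
    dev="/dev/tmssa${peer_ssa_node}.${suffix}"
    case " $dev_listing " in
        *" $dev "*) echo "found $dev" ;;
        *)          echo "missing $dev" ;;
    esac
done
```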

Testing the target mode connection


In order for the target mode connection to work, initiator and target devices must be paired correctly. To ensure that devices are paired and that the connection is working after enabling the target mode connection on both nodes:

1. Enter the following command on a node connected to the SSA disks:
cat < /dev/tmssa#.tm

where # must be the number of the target node. (This command hangs and waits for the next command.)


In our environment, on tivaix1 we ran the command:


cat < /dev/tmssa2.tm

2. On the target node, enter the following command:


cat filename > /dev/tmssa#.im

where # must be the number of the sending node and filename is any short ASCII file. The contents of the specified file are displayed on the node on which you entered the first command. In our environment, on tivaix2 we ran the command:
cat /etc/hosts > /dev/tmssa1.im

The contents of /etc/hosts on tivaix2 are shown in the terminal session of tivaix1.

3. You can also check that the tmssa devices are available on each system:
lsdev -C | grep tmssa

Defining the Target Mode SSA network to HACMP


To configure the Target Mode SSA point-to-point network in the HACMP cluster, follow these steps:

1. Enter: smit hacmp.

2. In SMIT, select Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter. SMIT displays a choice of types of networks.

3. Select the type of network to configure (select tmssa because we are using target mode SSA) and press Enter. The Add a Serial Network screen is displayed, as shown in Figure 4-10 on page 218.


                   Add a Serial Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                          [net_tmssa_01]
* Network Type                                          tmssa


F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-10 Filling out the Add a Serial Network to the HACMP Cluster SMIT screen

4. Fill in the fields on the Add a Serial Network screen as follows:

Network Name - Name the network, using no more than 32 alphanumeric characters and underscores; do not begin the name with a numeric. Do not use reserved names to name the network. For a list of reserved names, see High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Network Type - Valid types are RS232, tmssa, tmscsi, and diskhb. This is filled in for you by the SMIT screen.

5. Press Enter to configure this network.

6. Return to the Add a Serial Network SMIT screen to configure more networks if necessary. For our environment, we configured net_tmssa_01 as shown in Figure 4-10; no other serial networks were necessary.


Configure HACMP topology


Complete the following procedures to define the cluster topology. You only need to perform these steps on one node. When you verify and synchronize the cluster topology, its definition is copied to the other nodes. To define and configure nodes for the HACMP cluster topology:

1. Enter: smitty hacmp. The HACMP for AIX SMIT screen is displayed as shown in Figure 4-11.

                                HACMP for AIX

Move cursor to desired item and press Enter.

  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools


F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-11 HACMP for AIX SMIT screen

2. Go to Initialization and Standard Configuration -> Add Nodes to an HACMP Cluster and press Enter. The Configure Nodes to an HACMP Cluster (standard) SMIT screen is displayed as shown in Figure 4-12 on page 220.


                Configure Nodes to an HACMP Cluster (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Name                                          [cltivoli]
  New Nodes (via selected communication paths)          [tivaix1 tivaix2]
  Currently Configured Node(s)


F1=Help       F2=Refresh     F3=Cancel     F4=List
F5=Reset      F6=Command     F7=Edit       F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-12 Configure nodes to an HACMP Cluster

3. Enter field values on the Configure Nodes to an HACMP Cluster screen as follows:

Cluster Name - Enter an ASCII text string that identifies the cluster. The cluster name can include alpha and numeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. It can be different from the hostname. Do not use reserved names; for a list of reserved names, see Chapter 6, "Verifying and Synchronizing a Cluster Configuration", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

New Nodes (via selected communication paths) - Enter (or add) one resolvable IP label (this may be the hostname), IP address, or fully qualified domain name for each new node in the cluster, separated by spaces.


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

   This path will be taken to initiate communication with the node (for example, NodeA, 10.11.12.13, NodeC.ibm.com). Use F4 to see the picklist of the hostnames and/or addresses in /etc/hosts that are not already HACMP-configured IP labels/addresses. You can add node names or IP addresses in any order.

Currently Configured Node(s)
   If nodes are already configured, they are displayed here.

In our environment, we entered cltivoli in the Cluster Name field and tivaix1 tivaix2 in the New Nodes (via selected communication paths) field.

4. Press Enter to configure the nodes of the HACMP cluster. A COMMAND STATUS SMIT screen displays the progress of the cluster node configuration. The HACMP software uses this information to create the cluster communication paths in the ODM. Once communication paths are established, HACMP runs the discovery operation and prints results to the SMIT screen.

5. Verify that the results are reasonable for your cluster. At this point HACMP does not know how to locate the cluster nodes; this step only reserves spaces for these nodes. The following steps fill out the remaining information that enables HACMP to associate actual computing resources, such as disks, processes, and networks, with these newly reserved cluster nodes.
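The cluster-name rules above can be checked before you type the name into SMIT. The following sketch is our own illustration, not an HACMP utility: valid_cluster_name is a hypothetical helper that enforces the character, leading-digit, and length rules (it does not check the reserved-name list):

```shell
#!/bin/sh
# Validate a proposed HACMP cluster name against the rules stated above:
# alphanumerics and underscores only, no leading numeric, at most 32 chars.
# valid_cluster_name is our own helper, not an HACMP command.
valid_cluster_name() {
    name=$1
    case "$name" in
        "")              return 1 ;;   # empty names are not allowed
        [0-9]*)          return 1 ;;   # no leading numeric
        *[!A-Za-z0-9_]*) return 1 ;;   # only alphanumerics and underscores
    esac
    [ ${#name} -le 32 ]                # at most 32 characters
}

valid_cluster_name cltivoli && echo "cltivoli is acceptable"
valid_cluster_name 9cluster || echo "9cluster is rejected (leading numeric)"
```

Running the two sample checks prints that cltivoli is acceptable and 9cluster is rejected.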

Configure HACMP service IP labels/addresses


A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes. For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases. The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:

1. Enter: smit hacmp.

2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.


3. Fill in the field values as follows, as shown in Figure 4-13:

IP Label/IP Address
   Enter, or select from the picklist, the IP label/IP address to be kept highly available.

Network Name
   Enter the symbolic name of the HACMP network on which this service IP label/address will be configured. If you leave the field blank, HACMP fills it in automatically with the network type plus an appended number, starting with 1 (for example, netether1).

                  Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                      [tivaix1_svc]     +
* Network Name                                          [net_ether_01]    +

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-13 Enter service IP label for tivaix1

Figure 4-13 shows how we entered the service address label for tivaix1. In our environment, we used tivaix1_svc as the IP label and net_ether_01 as the network name.

4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP interface configuration.

5. Repeat the previous steps until you have configured all IP service labels for each network, as needed.


In our environment, we created another service IP label for cluster node tivaix2, as shown in Figure 4-14. We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.
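Because HACMP discovers only labels that resolve through /etc/hosts, every service IP label must appear, with its address, in the /etc/hosts file on all nodes before this procedure is run. The fragment below is purely illustrative; the addresses are assumptions, not the addresses used in our environment:

```
# /etc/hosts fragment -- example addresses are assumptions
10.1.1.101   tivaix1_svc
10.1.1.102   tivaix2_svc
```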

                  Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                      [tivaix2_svc]     +
* Network Name                                          [net_ether_01]    +

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-14 Enter service IP labels for tivaix2

Configure application servers


An application server is a cluster resource used to control an application that must be kept highly available. Configuring an application server does the following:

- Associates a meaningful name with the server application. For example, you could give an installation of IBM Tivoli Workload Scheduler a name such as itws. You then use this name to refer to the application server when you define it as a resource.
- Points the cluster event scripts to the scripts that they call to start and stop the server application.
- Allows you to then configure application monitoring for that application server.


We show you in Add custom start and stop HACMP scripts on page 234 how to write the start and stop scripts for IBM Tivoli Workload Scheduler.

Note: Ensure that the server start and stop scripts exist on all nodes that participate as possible owners of the resource group where this application server resides.

Complete the following steps to create an application server on any cluster node:

1. Enter: smitty hacmp.

2. Go to Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers and press Enter. The Configure Resources to Make Highly Available SMIT screen is displayed as shown in Figure 4-15.

                Configure Resources to Make Highly Available

Move cursor to desired item and press Enter.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

F1=Help        F2=Refresh     F3=Cancel      F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-15 Configure Resources to Make Highly Available SMIT screen

Go to Add an Application Server and press Enter (Figure 4-16 on page 225).


                        Configure Application Servers

Move cursor to desired item and press Enter.

  Add an Application Server
  Change/Show an Application Server
  Remove an Application Server

F1=Help        F2=Refresh     F3=Cancel      F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-16 Configure Application Servers SMIT screen

3. The Add Application Server SMIT screen is displayed as shown in Figure 4-17 on page 226. Enter field values as follows:

Server Name
   Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.

Start Script
   Enter the full pathname of the script (followed by arguments) called by the cluster event scripts to start the application server (maximum: 256 characters). This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.

Stop Script
   Enter the full pathname of the script called by the cluster event scripts to stop the server (maximum: 256 characters). This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.

                           Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Server Name                                           [tws_svr1]
* Start Script                                          [/usr/es/sbin/cluster/>]
* Stop Script                                           [/usr/es/sbin/cluster/>]

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-17 Fill out the Add Application Server SMIT screen for application server tws_svr1

As shown in Figure 4-17, in our environment on tivaix1 we named the instance of IBM Tivoli Workload Scheduler that normally runs on that cluster node tws_svr1. For the instance of IBM Tivoli Workload Scheduler on tivaix2, we named the corresponding application server object tws_svr2. Note that no mention is made of the cluster nodes when defining an application server; we mention them only to make you aware of the conventions we used in our environment.

For the start script of application server tws_svr1, we entered the following in the Start Script field:
/usr/es/sbin/cluster/utils/start_tws1.sh

The stop script of this application server is:


/usr/es/sbin/cluster/utils/stop_tws1.sh

This is entered in the Stop Script field.

4. Press Enter to add this information to the ODM on the local node.


5. Repeat the procedure for all additional application servers. In our environment, we added a definition for application server tws_svr2, entering the following start script in the Start Script field:
/usr/es/sbin/cluster/utils/start_tws2.sh

For tws_svr2, we entered the following stop script in the Stop Script field:
/usr/es/sbin/cluster/utils/stop_tws2.sh

Figure 4-18 shows how we filled out the SMIT screen to define application server tws_svr2.

                           Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Server Name                                           [tws_svr2]
* Start Script                                          [/usr/es/sbin/cluster/>]
* Stop Script                                           [/usr/es/sbin/cluster/>]

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-18 Fill out the Add Application Server SMIT screen for application server tws_svr2

You only need to perform this on one cluster node. When you verify and synchronize the cluster topology, the new application server definitions are copied to the other nodes.

Configure application monitoring


HACMP can monitor specified applications and automatically take action to restart them upon detecting process death or other application failures.


Note: If a monitored application is under control of the System Resource Controller (SRC), check that its action and multi values are -O and -Q. The -O value specifies that the subsystem is not restarted if it stops abnormally. The -Q value specifies that multiple instances of the subsystem are not allowed to run at the same time. These values can be checked using the following command:
lssrc -Ss Subsystem | cut -d : -f 10,11

If the values are not -O and -Q, they must be changed using the chssys command.

You can select either of two application monitoring methods:

- Process application monitoring detects the death of one or more processes of an application, using RSCT Event Management.
- Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.

Process monitoring is easier to set up because it uses the built-in monitoring capability provided by RSCT and requires no custom scripts. However, process monitoring may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application's performance and is more customizable, but it takes more planning because you must create the custom scripts.

In this section, we show how to configure process monitoring for IBM Tivoli Workload Scheduler. Remember that an application must be defined to an application server before you set up the monitor.

For IBM Tivoli Workload Scheduler, we configure process monitoring for the netman process, because it will always run under normal conditions. If it fails, we want the cluster to automatically fall over, not attempt to restart netman. Because netman starts very quickly, we give it only 60 seconds to start before monitoring begins. For the cleanup and restart scripts, we use the same scripts as the start and stop scripts discussed in Add custom start and stop HACMP scripts on page 234.


Tip: For more comprehensive application monitoring by HACMP, configure process monitoring for the IBM Tivoli Workload Scheduler processes batchman, jobman, mailman, and writer. Define application server resources for each of these processes before defining the process monitoring for them. If you do this, be sure to use the cl_RMupdate command to suspend monitoring before Jnextday starts and to resume monitoring after Jnextday completes. Otherwise, the cluster will interpret the Jnextday-originated shutdown of these processes as a failure of the cluster node and inadvertently start a fallover.

Set up your process application monitor as follows:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resources Configuration -> Configure HACMP Application Monitoring -> Configure Process Application Monitor -> Add Process Application Monitor and press Enter. A list of previously defined application servers appears.

3. Select the application server for which you want to add a process monitor. In our environment, we selected tws_svr1, as shown in Figure 4-19.

+--------------------------------------------------------------------------+
                      Application Server to Monitor

  Move cursor to desired item and press Enter.

    tws_svr1
    tws_svr2

  F1=Help        F2=Refresh     F3=Cancel      F8=Image
  F10=Exit       Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-19 How to select an application server to monitor

4. In the Add Process Application Monitor screen, fill in the field values as follows:

Monitor Name
   The name of the application monitor. If this monitor is associated with an application server, the monitor has the same name as the application server. This field is informational only and cannot be edited.

Application Server Name
   This field is already filled in with the name of the application server you selected; it can also be chosen from the picklist.

Processes to Monitor
   Specify the process(es) to monitor. You can type more than one process name; use spaces to separate the names.

   Note: To be sure you are using correct process names, use the names as they appear in the output of the ps -el command (not ps -f), as explained in the section Identifying Correct Process Names in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Process Owner
   Specify the user ID of the owner of the processes specified above (for example, root). Note that the process owner must own all processes to be monitored.

Instance Count
   Specify how many instances of the application to monitor. The default is 1 instance. The number of instances must match the number of processes to monitor exactly. If you specify one instance and another instance of the application starts, you will receive an application monitor error.

   Note: This number must be 1 if you have specified more than one process to monitor (one instance for each process).

Stabilization Interval
   Specify the time (in seconds) to wait before beginning monitoring. For instance, with a database application, you may wish to delay monitoring until after the start script and initial database search have been completed. You may need to experiment with this value to balance performance with reliability.


   Note: In most circumstances, this value should not be zero.

Restart Count
   Specify the restart count, that is, the number of times to attempt to restart the application before taking any other action. The default is 3.

   Note: Make sure you enter a Restart Method if your Restart Count is any non-zero value.

Restart Interval
   Specify the interval (in seconds) that the application must remain stable before the restart count is reset. Do not set this shorter than (Restart Count) x (Stabilization Interval); the default is 10% longer than that value. If the restart interval is too short, the restart count will be reset too soon and the desired fallover or notify action may not occur when it should.

Action on Application Failure
   Specify the action to be taken if the application cannot be restarted within the restart count. You can keep the default choice notify, which runs an event to inform the cluster of the failure, or select fallover, in which case the resource group containing the failed application moves over to the cluster node with the next highest priority for that resource group. For more information, refer to Note on the Fallover Option and Resource Group Availability in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Notify Method
   (Optional) Define a notify method that will run when the application fails. This custom method runs during the restart process and during notify activity.

Cleanup Method
   (Optional) Specify an application cleanup script to be invoked when a failed application is detected, before the restart method is invoked. The default is the application server stop script defined when the application server was set up.

   Note: With application monitoring, because the application is already stopped when this script is called, the server stop script may fail.

Restart Method
   (Required if Restart Count is not zero.) The default restart method is the application server start script defined previously, when the application server was set up. You can specify a different method here if desired.

In our environment, we entered the process /usr/maestro/bin/netman in the Processes to Monitor field, maestro in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged, as shown in Figure 4-20.

                       Add Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                          tws_svr1
* Application Server Name                               tws_svr1
* Processes to Monitor                                  [/usr/maestro/bin/netm>]
* Process Owner                                         [maestro]
  Instance Count                                        []                     #
* Stabilization Interval                                [60]                   #
* Restart Count                                         [0]                    #
  Restart Interval                                      []                     #
* Action on Application Failure                         [fallover]             +
  Notify Method                                         []
  Cleanup Method                                        [/usr/es/sbin/cluster/>]
  Restart Method                                        [/usr/es/sbin/cluster/>]

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-20 Add Process Application Monitor SMIT screen for application server tws_svr1
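The Restart Interval guidance above can be made concrete with a little arithmetic. This sketch uses illustrative values (a non-zero Restart Count, unlike our netman monitor, which uses 0) to show the minimum interval and the 10%-longer default:

```shell
#!/bin/sh
# Illustrative values only; our netman monitor uses Restart Count 0,
# so Restart Interval does not apply to it.
restart_count=3
stabilization_interval=60                       # seconds
minimum=$(( restart_count * stabilization_interval ))
default_interval=$(( minimum * 110 / 100 ))     # 10% longer than the minimum
echo "Restart Interval should be at least ${minimum}s; default is ${default_interval}s"
```

With these values, the minimum is 180 seconds and the default is 198 seconds.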


In our environment, the COMMAND STATUS SMIT screen displayed two warnings, as shown in Figure 4-21, which we could safely ignore because the default values applied are the desired values.

                               COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

claddappmon warning: The parameter "INSTANCE_COUNT" was not specified. Will use 1.
claddappmon warning: The parameter "RESTART_INTERVAL" was not specified. Will use 0.

F1=Help        F2=Refresh     F3=Cancel      F6=Command
F8=Image       F9=Shell       F10=Exit       /=Find
n=Find Next

Figure 4-21 COMMAND STATUS SMIT screen after creating HACMP process application monitor

5. Press Enter when you have entered the desired information. The values are then checked for consistency and entered into the ODM. When the resource group comes online, the application monitor starts.

6. Repeat the operation for the remaining application servers. In our environment, we repeated the operation for application server tws_svr2, entering the field values as shown in Figure 4-22 on page 234.


                       Add Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                          tws_svr2
* Application Server Name                               tws_svr2
* Processes to Monitor                                  [/usr/maestro/bin/netm>]
* Process Owner                                         [maestro2]
  Instance Count                                        []                     #
* Stabilization Interval                                [60]                   #
* Restart Count                                         [0]                    #
  Restart Interval                                      []                     #
* Action on Application Failure                         [fallover]             +
  Notify Method                                         []
  Cleanup Method                                        [/usr/es/sbin/cluster/>]
  Restart Method                                        [/usr/es/sbin/cluster/>]

F1=Help        F2=Refresh     F3=Cancel      F4=List
F5=Reset       F6=Command     F7=Edit        F8=Image
F9=Shell       F10=Exit       Enter=Do

Figure 4-22 Add Process Application Monitor SMIT screen for application server tws_svr2

We entered the process /usr/maestro2/bin/netman in the Processes to Monitor field, maestro2 in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged.

Add custom start and stop HACMP scripts


For IBM Tivoli Workload Scheduler, custom scripts are required to start and stop the application server under HACMP. These are used when HACMP starts an application server that is part of a resource group, and gracefully shuts down the application server when a resource group is taken offline or moved. The stop script, of course, does not get an opportunity to execute if a cluster node is unexpectedly halted.

We developed the following basic versions of the scripts for our environment. You may need to write your own versions to accommodate your site's specific requirements.

Both of these example scripts are designed to recognize how they were called. That is, the name of the script is passed into itself, and based upon this name it


performs certain actions. Our environment's design has two variable factors when starting and stopping IBM Tivoli Workload Scheduler:

- The name of the TWSuser user account associated with a particular instance of IBM Tivoli Workload Scheduler. In our environment, there are two instances of the application, and the user accounts maestro and maestro2 are associated with these instances.
- The path to the installation of each instance of IBM Tivoli Workload Scheduler, called the TWShome directory. In our environment, the two instances are installed under /usr/maestro and /usr/maestro2.

The scripts are designed so that when they are called with a name that follows a certain format, they compute these variable factors from the name. The format is start_twsn.sh and stop_twsn.sh, where n matches the cluster node number by convention:

- When n equals 1, it is treated as a special case: TWSuser is assumed to be maestro and TWShome is assumed to be /usr/maestro.
- When n equals any other number, TWSuser is assumed to be maestron (for example, if n is 4, TWSuser is maestro4) and TWShome is assumed to be /usr/maestron (using the same example, /usr/maestro4).

You need one pair of start and stop scripts for each instance of IBM Tivoli Workload Scheduler that will run in the cluster. For mutual takeover configurations like the two-node cluster environment we show in this redbook, you need each pair of start and stop scripts on each cluster node that participates in the mutual takeover architecture.

In our environment, we used the start script shown in Example 4-28 on page 236. Most of the script deals with starting correctly. The key line that actually starts IBM Tivoli Workload Scheduler is towards the end of the script, which reads:
su - ${clusterized_TWSuser} -c "./StartUp ; conman start"

This means the su command will execute, as the TWSuser user account, the command:
./StartUp ; conman start

This is a simple command to start IBM Tivoli Workload Scheduler. Your site may require a different start procedure, so you can replace this line with your own procedure to start IBM Tivoli Workload Scheduler.


Example 4-28 Sample start script for IBM Tivoli Workload Scheduler under HACMP

#!/bin/sh
#
# Sample script for starting IBM Tivoli Workload Scheduler Version 8.2
# under IBM HACMP Version 5.1.
#
# Comments and questions to Anthony Yen <sg24-6632-00@AutomaticIT.com>
#
#-----------------------------
# User-Configurable Constants
#-----------------------------
#
# Base TWShome path. Modify this to match your site's standards.
#
root_TWShome=/usr
#
# Base TWSuser. Modify this to match your site's standards.
#
TWSuser="maestro"
#
# Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
#
DBX_DIR=/tmp/ha_cfg

#-------------------
# Main Program Body
#-------------------
#
# Ensure debugging directory is available, create it if necessary
if [ -d ${DBX_DIR} ] ; then
    DBX=1
else
    mkdir ${DBX_DIR}
    rc=$?
    if [ $rc -ne 0 ] ; then
        echo "WARNING: no debugging directory could be created, no debug"
        echo "information will be issued..."
        DBX=0
    else
        DBX=1
    fi
fi
#
# Determine how we are called
CALLED_AS=`basename $0`
#
# Disallow being called as root name
if [ "${CALLED_AS}" = "start_tws.sh" ] ; then
    echo "FATAL ERROR: This script cannot be called as itself. Please create a"
    echo "symbolic link to it of the form start_twsN.sh where N is an integer"
    echo "corresponding to the cluster node number and try again."
    exit 1
fi
#
# Determine cluster node number we are called as.
extracted_node_number=`echo ${CALLED_AS} | sed 's/start_tws\(.*\)\.sh/\1/g'`
#
# Set TWShome path to correspond to cluster node number.
if [ ${extracted_node_number} -eq 1 ] ; then
    clusterized_TWShome=${root_TWShome}/${TWSuser}
    clusterized_TWSuser=${TWSuser}
else
    clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
    clusterized_TWSuser=${TWSuser}${extracted_node_number}
fi

echo "clusterized_TWShome = $clusterized_TWShome"
echo "clusterized_TWSuser = $clusterized_TWSuser"

if [ $DBX -eq 1 ] ; then
    echo "Script for starting TWS ${extracted_node_number} at "`date` > \
        ${DBX_DIR}/start${extracted_node_number}.flag
fi

echo "Starting TWS ${extracted_node_number} at "`date`
su - ${clusterized_TWSuser} -c "./StartUp ; conman start"
echo "Netman on TWS ${extracted_node_number} started, conman start issued"

sleep 10
echo "Process list of ${clusterized_TWSuser}-owned processes..."
ps -ef | grep -v grep | grep ${clusterized_TWSuser}

exit 0

In our environment, we used a stop script with the same execution semantics as the start script described in the preceding discussion: the exact commands it runs depend upon the name under which the stop script is called. Most of the script deals with stopping correctly. The script is oriented towards stopping the cluster node, which in our environment is a Master Domain Manager. The key lines that actually stop IBM Tivoli Workload Scheduler are towards the end of the script; they are extracted and shown in Example 4-29 on page 238.
Example 4-29 Commands used by stop script to stop IBM Tivoli Workload Scheduler

su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
. . .
wmaeutil ${connector} -stop "*"

This means the su command will execute, as the TWSuser user account, the following command:
conman 'unlink cpu=@ ; noask'

This unlinks all CPUs in the scheduling network. This is followed by another su command that executes, as the TWSuser user account, the following command:
conman 'stop @ ; wait ; noask'

This stops the IBM Tivoli Workload Scheduler engine on all CPUs in the scheduling network. A third and final su command executes, as the TWSuser user account, the following command:
conman 'shutdown ; wait'

This stops the netman process of the instance of IBM Tivoli Workload Scheduler on the cluster node. Finally, the wmaeutil command is executed within a loop that passes the name of each IBM Tivoli Workload Scheduler Connector found on the cluster node to each iteration of the command. This stops all Connectors associated with the instance of IBM Tivoli Workload Scheduler that is being stopped.

This is a simple set of commands to stop IBM Tivoli Workload Scheduler. Your site may require a different stop procedure, so you can replace these commands with your own procedure to stop IBM Tivoli Workload Scheduler. Example 4-30 shows our sample stop script.
Example 4-30 Sample stop script for IBM Tivoli Workload Scheduler under HACMP

#!/bin/ksh
#
# Sample script for stopping IBM Tivoli Workload Scheduler Version 8.2
# under IBM HACMP Version 5.1.
#
# Comments and questions to Anthony Yen <sg24-6632-00@AutomaticIT.com>
#
#-----------------------------
# User-Configurable Constants
#-----------------------------
#
# Base TWShome path. Modify this to match your site's standards.
#
root_TWShome=/usr
#
# Base TWSuser. Modify this to match your site's standards.
#
TWSuser="maestro"
#
# Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
#
DBX_DIR=/tmp/ha_cfg

#-------------------
# Main Program Body
#-------------------
#
# Source in environment variables for IBM Tivoli Management Framework.
if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
else
    echo "FATAL ERROR: Tivoli environment could not be sourced, exiting..."
    exit 1
fi
#
# Ensure debugging directory is available, create it if necessary
if [ -d ${DBX_DIR} ] ; then
    DBX=1
else
    mkdir ${DBX_DIR}
    rc=$?
    if [ $rc -ne 0 ] ; then
        echo "WARNING: no debugging directory could be created, no debug"
        echo "information will be issued..."
        DBX=0
    else
        DBX=1
    fi
fi
#
# Determine how we are called
CALLED_AS=`basename $0`
#
# Disallow being called as root name
if [ "${CALLED_AS}" = "stop_tws.sh" ] ; then
    echo "FATAL ERROR: This script cannot be called as itself. Please create a"
    echo "symbolic link to it of the form stop_twsN.sh where N is an integer"
    echo "corresponding to the cluster node number and try again."
    exit 1
fi
#
# Determine cluster node number we are called as.
extracted_node_number=`echo ${CALLED_AS} | sed 's/stop_tws\(.*\)\.sh/\1/g'`
#
# Set TWShome path to correspond to cluster node number.
if [ ${extracted_node_number} -eq 1 ] ; then
    clusterized_TWShome=${root_TWShome}/${TWSuser}
    clusterized_TWSuser=${TWSuser}
else
    clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
    clusterized_TWSuser=${TWSuser}${extracted_node_number}
fi
#
# Source IBM Tivoli Workload Scheduler environment variables.
if [ -f ${clusterized_TWShome}/tws_env.sh ] ; then
    . ${clusterized_TWShome}/tws_env.sh
else
    echo "FATAL ERROR: Unable to source ITWS environment from:"
    echo "    ${clusterized_TWShome}/tws_env.sh"
    echo "Exiting..."
    exit 1
fi

echo "clusterized_TWShome = $clusterized_TWShome"
echo "clusterized_TWSuser = $clusterized_TWSuser"

if [ $DBX -eq 1 ] ; then
    echo "Script for stopping TWS ${extracted_node_number} at "`date` > \
        ${DBX_DIR}/stop${extracted_node_number}.flag
fi

echo "Stopping TWS ${extracted_node_number} at "`date`
su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
echo "Shutdown for TWS ${extracted_node_number} issued..."

echo "Verify netman is stopped..."
ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
rc=$?
while ( [ ${rc} -ne 1 ] )
do
    sleep 10
    ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
    rc=$?
done

echo "Stopping all Connectors..."
#
# Identify all Connector object labels
connector_labels=`wlookup -Lar MaestroEngine`
for connector in ${connector_labels}
do
    echo "Stopping connector ${connector}..."
    wmaeutil ${connector} -stop "*"
done

echo "Process list of ${clusterized_TWSuser}-owned processes:"
ps -ef | grep -v grep | grep ${clusterized_TWSuser}

exit 0

To add the custom start and stop HACMP scripts:
1. Copy both scripts to the directory /usr/es/sbin/cluster/utils on each cluster node.
2. Run the commands in Example 4-31 to install the scripts. These create symbolic links to the scripts. When the script is called via one of these symbolic links, it will know which instance of IBM Tivoli Workload Scheduler to manage.
Example 4-31 Commands to run to install custom HACMP start and stop scripts for IBM Tivoli Workload Scheduler

ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws1.sh
ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws2.sh
ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws1.sh
ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws2.sh

The symbolic links mean that no matter how many instances of IBM Tivoli Workload Scheduler you configure in a mutual takeover HACMP cluster, only two actual scripts need to be maintained. If you ensure that there are no unique variations between installations of IBM Tivoli Workload Scheduler, then maintaining the scripts among all installations is very easy. Only two scripts ever need to be modified, vastly simplifying maintenance and reducing copying errors.
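The name-based dispatch that the scripts rely on can be exercised in isolation. The sketch below is our own illustration (not part of the redbook scripts): it applies the same sed expression that stop_tws.sh uses to recover the instance number from the name the script is invoked under.

```shell
# Recover the cluster node number from the invoked script name, using the
# same sed expression as the stop_tws.sh dispatch logic. Because the number
# comes from $0, one script body serves every symbolic link.
extract_node_number() {
    echo "$1" | sed 's/stop_tws\(.*\)\.sh/\1/'
}

for name in stop_tws1.sh stop_tws2.sh ; do
    echo "$name manages IBM Tivoli Workload Scheduler instance `extract_node_number $name`"
done
```

Because the instance number is derived purely from the link name, adding a third instance needs only a third symbolic link, not a third script.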


Note: Keep in mind that, after a modification is made to either or both scripts, they need to be copied back to all the cluster nodes.

Tip: Console output from the start and stop scripts is sent to /tmp/hacmp.out on the cluster nodes. This is useful for debugging the start and stop scripts while you develop them.

Add a custom post-event HACMP script


IBM Tivoli Workload Scheduler presents a special case that HACMP can be configured to handle. If IBM Tivoli Workload Scheduler falls back to a cluster node, ideally it should fall back only after all currently running jobs have had a chance to finish. For example, consider our environment of a two-node mutual takeover HACMP cluster, shown in Figure 4-23 when it is running normally. Here, cluster node tivaix1 runs an instance of IBM Tivoli Workload Scheduler we will call TWS Engine1 from disk volume group tiv_vg1. Meanwhile, cluster node tivaix2 runs TWS Engine2 from disk volume group tiv_vg2.

Figure 4-23 Normal operation of two-node mutual takeover HACMP cluster


Suppose cluster node tivaix2 suffers an outage, and falls over to tivaix1. This means TWS Engine2 now also runs on tivaix1, and tivaix1 picks up the connection to disk volume group tiv_vg2, as shown in Figure 4-24.

Figure 4-24 Location of application servers after tivaix2 falls over to tivaix1

Due to the sudden nature of a catastrophic failure, the jobs that are in progress on tivaix2 under TWS Engine2 when the disaster incident occurs are lost. When TWS Engine2 starts on tivaix1, you would perform whatever job recovery is necessary.


Figure 4-25 State of cluster after tivaix2 returns to service and reintegrates with the cluster

When tivaix2 is restored to service, it reintegrates with the cluster, but because we chose to use the Cascading WithOut Fallback (CWOF) feature, TWS Engine2 is not immediately transferred back to tivaix2 when it reintegrates with the cluster. This is shown in Figure 4-25, where tivaix2 is shown as available and back in the cluster, but TWS Engine2 is not yet shut down and transferred over to it.

Here is where the special case presents itself. If we simply shut down TWS Engine2 and transfer it back to tivaix2, any jobs TWS Engine2 is currently running on tivaix1 can lose their job state information or, in the worst case where the jobs are executed from the same disk volume group as TWS Engine2 (or use the same disk volume group to read and write their data), be interrupted in mid-execution. This is shown in Figure 4-26 on page 245. As long as there are running jobs under TWS Engine2 in the memory of tivaix1, moving TWS Engine2 to tivaix2 can cause undesirable side effects, because we can move the contents of a disk volume group from one machine to another, but not the contents of memory.


Figure 4-26 Running jobs under TWS Engine2 on tivaix1 prevent TWS Engine2 from transferring back to tivaix2

It is usually too inconvenient to wait for a lull in the jobs that are running under TWS Engine2 on tivaix1; in many environments there simply is no such dead zone in currently running jobs. When this occurs, the jobs currently executing on the cluster node in question (tivaix1, in this example) need to run through to completion, without any new jobs releasing on the cluster node, before moving the application server (TWS Engine2, in this example). The new jobs that are prevented from releasing will have a delayed launch time, but this is often the least disruptive approach to gracefully transferring an application server back to a reintegrated cluster node. This process is called quiescing the application server.

For IBM Tivoli Workload Scheduler, as long as there are no currently running jobs on the cluster node itself that an instance of IBM Tivoli Workload Scheduler needs to move away from, all information that needs to be transferred intact is held on disk. This makes it easy and safe to restart IBM Tivoli Workload Scheduler on the reintegrated cluster node. The job state information that needs to be transferred can be thought of as in hibernation after no jobs are actively running.


We quiesce an instance of IBM Tivoli Workload Scheduler by raising the job fence of the instance on a cluster node high enough that no new jobs on the cluster node will release. See IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256, for more details on job fences. Raising the job fence does not affect currently running jobs.

We do not recommend using a job stream or CPU limit to quiesce the currently running jobs under an instance of IBM Tivoli Workload Scheduler on a CPU. Schedulers and users can still override a limit by forcing the priority of a job to the GO state, which can cause problems for falling back to a cluster node if a job is released at an inopportune time during the fallback.

Tip: While you can quiesce IBM Tivoli Workload Scheduler at any time, you still gain benefits from planning when during the production day you quiesce it. Quiesce when there is as little time left as possible for currently running jobs to complete, because the sooner currently running jobs complete, the less time new jobs will be kept on hold. Use the available reports in IBM Tivoli Workload Scheduler to predict when currently running jobs will complete.

It is very important to understand that when and how to quiesce an instance of IBM Tivoli Workload Scheduler is wholly dependent upon business considerations. When designing a schedule, collect information from the business users of IBM Tivoli Workload Scheduler on which jobs and job streams must not be delayed, which can be delayed if necessary, and how long they can be delayed. This information is used to determine when to quiesce the server, and the impact of the operation. It can also be used to automate the decision and process of falling back an application server. Some considerations external to IBM Tivoli Workload Scheduler usually affect this process as well.
For example, if a database is used by jobs running on the cluster, or is hosted on the disk volume group that the application server uses, falling back would require shutting down the database. In some environments, this can be very time consuming, it can be difficult to obtain authorization for on short notice, or it can be simply unacceptable during certain times of the year (such as quarter-end processing periods). A highly available environment that takes these considerations into account is part of the design process of an actual production deployment. Consult your IBM service provider for advice on how to mitigate these additional considerations.
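The fence rule itself can be pictured with a small model. This is our own illustration, not conman code; it assumes the standard conman priority scale (numeric priorities 0-99, HIGH is 100, GO is 101) and the rule that a job releases only when its priority exceeds the fence:

```shell
# Model of the job fence release rule: a job releases only if its priority
# exceeds the fence. With the fence raised to HIGH (100), only GO (101)
# jobs release; everything else is held without touching running jobs.
FENCE=100

job_releases() {
    [ "$1" -gt "$FENCE" ]
}

for prio in 10 99 100 101 ; do
    if job_releases "$prio" ; then
        echo "priority $prio: releases"
    else
        echo "priority $prio: held by fence"
    fi
done
```

This is also why the fence is safer than a limit for quiescing: an operator forcing a job to GO defeats a limit, but a fence raised to GO holds everything.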


Figure 4-27 TWS Engine2 on tivaix1 is quiesced; only held jobs exist on tivaix1 under TWS Engine2. TWS Engine2 can now fall back to tivaix2

Once an instance of IBM Tivoli Workload Scheduler is quiesced on a CPU, all remaining jobs for that instance on that CPU are held, either because their dependencies have not been satisfied yet or because the job fence is at or above their priority. This is shown in Figure 4-27, in which only TWS Engine1 still has running jobs on tivaix1, while TWS Engine2's jobs are all held, and their state is recorded to the production file on disk volume group tiv_vg2.

Due to the business and other non-IBM Tivoli Workload Scheduler considerations that affect the decision and process of quiescing an application server in preparation for falling it back to its original cluster node, we do not show a sample quiesce script in this redbook. In our lab environment, because we are not running actual production applications, our quiesce script simply exits.


However, when you develop your own quiesce script, we recommend that you design it as a script to be called as a post-event script for the node_up_complete event. Before raising the fence, the script should check for at least the following conditions:

- All business conditions are met for raising the fence. For example, do not raise the fence if a business user still requires scheduling services for a critical job that needs to execute in the near future.
- HACMP is already running on the cluster node that a quiesced application server in a resource group needs to fall back to.
- The cluster node is reintegrated within the cluster, but the resource group that normally belongs on the cluster node is not on that node. This prevents the quiescing process from accidentally running on a new node that joins the cluster and unnecessarily shutting down an application server, for example.
- The resource group that falls back is in the ONLINE state on another cluster node. This prevents the quiescing from accidentally moving resource groups taken down for business reasons, for example.

Example 4-32 shows Korn shell script code that can be used to determine whether HACMP is running. It simply checks the status of the basic HACMP subsystems. You may need to modify it to suit your particular HACMP environment if other HACMP subsystems are used.
Example 4-32 How to determine in a script whether or not HACMP is running

PATH=${PATH}:/usr/es/sbin/cluster/utilities
clstrmgrES=`clshowsrv clstrmgrES | grep -v '^Subsystem' | awk '{ print $3 }'`
clinfoES=`clshowsrv clinfoES | grep -v '^Subsystem' | awk '{ print $3 }'`
clsmuxpdES=`clshowsrv clsmuxpdES | grep -v '^Subsystem' | awk '{ print $3 }'`
if [ "${clstrmgrES}" = 'inoperative' \
  -o "${clinfoES}" = 'inoperative' \
  -o "${clsmuxpdES}" = 'inoperative' ] ; then
    echo "FATAL ERROR: HACMP does not appear to be running, exiting..."
    exit 1
fi

Example 4-33 shows the clRGinfo command and sample output from our environment. This can be used to determine whether or not a resource group is ONLINE, and if so, which cluster node it currently runs upon.
Example 4-33 Using the clRGinfo command to determine the state of resource groups in a cluster

[root@tivaix1:/home/root] clRGinfo
-----------------------------------------------------------------------------
Group Name     Type           State       Location
-----------------------------------------------------------------------------
rg1            cascading      OFFLINE     tivaix1
                              OFFLINE     tivaix2

rg2            cascading      ONLINE      tivaix2
                              OFFLINE     tivaix1

[root@tivaix1:/home/root] clRGinfo -s
rg1:OFFLINE:tivaix1:cascading
rg1:OFFLINE:tivaix2:cascading
rg2:ONLINE:tivaix2:cascading
rg2:OFFLINE:tivaix1:cascading
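Because the -s output is colon-delimited, it is convenient to parse in a script. The sketch below is our own illustration: it works against a captured copy of the output above (rather than querying a live cluster) and extracts the node on which a given resource group is ONLINE.

```shell
# Find the node where a resource group is ONLINE, given `clRGinfo -s`
# style output (group:state:node:type). The sample text stands in for
# the live command on an HACMP node.
sample="rg1:OFFLINE:tivaix1:cascading
rg1:OFFLINE:tivaix2:cascading
rg2:ONLINE:tivaix2:cascading
rg2:OFFLINE:tivaix1:cascading"

rg_online_node() {
    echo "$sample" | awk -F: -v rg="$1" '$1 == rg && $2 == "ONLINE" { print $3 }'
}

rg_online_node rg2    # prints: tivaix2
rg_online_node rg1    # prints nothing: rg1 is not ONLINE anywhere
```

A quiesce script could use a check like this to satisfy the "resource group is ONLINE on another cluster node" condition before proceeding.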

You use conman's fence command to raise the job fence on a CPU. If we want to raise the job fence on cluster node tivaix1 for an instance of IBM Tivoli Workload Scheduler that is running as CPU TIVAIX2, we log into the TWSuser user account of that instance, then run the command:
conman "fence TIVAIX2 ; go ; noask"

In our environment, we would log into maestro2 on tivaix1. A quiesce script would be running on a reintegrated cluster node, and remotely log into the surviving node to perform the job fence operation. Example 4-34 shows one way to have a shell script wait for currently executing jobs under an instance of IBM Tivoli Workload Scheduler on a CPU to exit. It is intended to be run as root user. It simply uses the su command to run a command as the TWSuser user account that owns the instance of IBM Tivoli Workload Scheduler. The command that is run lists all jobs in the EXEC state on the CPU TIVAIX1, then counts the number of jobs returned. As long as the number of jobs in the EXEC state is not equal to zero, the code waits for a minute, then checks the number of jobs in the EXEC state again. Again, a quiesce script would remotely run this code on the surviving node against the desired instance of IBM Tivoli Workload Scheduler.
Example 4-34 Wait for currently executing jobs to exit

num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
  grep -v 'sj @#@.@+state=exec' | wc -l"`
while [ ${num_exec_jobs} -ne 0 ]
do
    sleep 60
    num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
      grep -v 'sj @#@.@+state=exec' | wc -l"`
done
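The counting pipeline in Example 4-34 can be verified against captured output. In this sketch of ours, the sample text (with invented job names) stands in for the live conman command; the grep -v strips the echoed command header, leaving one line per job in the EXEC state:

```shell
# Count EXEC-state jobs from sample `conman sj` output. The grep -v removes
# the echoed command header line, so wc -l counts only job lines.
sample="sj @#@.@+state=exec
TIVAIX1#SCHED_A.JOB_1 EXEC
TIVAIX1#SCHED_B.JOB_2 EXEC"

num_exec_jobs=`echo "$sample" | grep -v 'sj @#@.@+state=exec' | wc -l`
echo "jobs still executing: ${num_exec_jobs}"
```

Note that the count goes to zero only when no jobs at all are in the EXEC state on that CPU, which is exactly the condition for a safe fallback.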

If the implemented quiesce script successfully quiesces the desired instance of IBM Tivoli Workload Scheduler, it can also be designed to automatically perform the resource group move. A script would use the clRGmove command, as shown in Example 4-35, to move resource group rg2 to tivaix2:
Example 4-35 Move a resource group using the clRGmove command

/usr/es/sbin/cluster/utilities/clRGmove -s 'false' -m -i -g 'rg2' -n 'tivaix2'

This command can be run from any cluster node. In our environment, we copy our stub quiesce script to:
/usr/es/sbin/cluster/sh/quiesce_tws.sh

This script is copied to the same location on both cluster nodes tivaix1 and tivaix2. The stub does not perform any actual work, so it has no effect upon HACMP. In our environment, with CWOF set to true, the stub would have to run clRGmove to simulate quiescing. We still perform the quiescing manually as a result.

Tip: Make sure the basic HACMP services work for straight fallover and fallback scenarios before customizing HACMP behavior. In a production deployment, the quiesce script would be implemented and tested only after basic configuration and testing of HACMP is successful.
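Pulling the pieces together, a node_up_complete post-event quiesce script would follow roughly the flow below. This is a structural sketch of ours only: the probes are stubbed out with hypothetical function names, and the real commands they stand for are the ones shown in Examples 4-32 through 4-35.

```shell
# Skeleton of a quiesce post-event flow. Each check is a stub; in a real
# script they would wrap clshowsrv (Example 4-32), clRGinfo (Example 4-33),
# conman fence plus the EXEC-job wait loop (Example 4-34), and clRGmove
# (Example 4-35) respectively.
business_conditions_met() { return 0 ; }   # site-specific policy check
hacmp_running()           { return 0 ; }   # would parse clshowsrv output
rg_online_on_other_node() { return 0 ; }   # would parse clRGinfo output

quiesce_and_fall_back() {
    business_conditions_met || { echo "business conditions not met" ; return 1 ; }
    hacmp_running           || { echo "HACMP is not running" ; return 1 ; }
    rg_online_on_other_node || { echo "resource group not ONLINE elsewhere" ; return 1 ; }
    echo "raise fence, wait for EXEC jobs to drain, move resource group"
}

quiesce_and_fall_back
```

The ordering matters: the cheap policy and cluster-state checks come first, and the disruptive steps (fence, wait, move) run only after every precondition holds.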

Modify /etc/hosts and name resolution order


The IP hostnames we use for HACMP are configured in /etc/hosts so that local name resolution can be performed if access to the DNS server is lost. In our environment, our /etc/hosts file is the same on both cluster nodes tivaix1 and tivaix2, as shown in Figure 4-28 on page 251.


127.0.0.1       loopback localhost              # loopback (lo0) name/address
# 9.3.4.33      tivdce1.itsc.austin.ibm.com

# Administrative addresses (persistent on each node)
9.3.4.194       tivaix1 tivaix1.itsc.austin.ibm.com
9.3.4.195       tivaix2 tivaix2.itsc.austin.ibm.com

# Base IP labels for en1 on both nodes
10.1.1.101      tivaix1_bt2
10.1.1.102      tivaix2_bt2

# Service IP labels
9.3.4.3         tivaix1_svc
9.3.4.4         tivaix2_svc

# Boot IP labels for en0
192.168.100.101 tivaix1_bt1
192.168.100.102 tivaix2_bt1

Figure 4-28 File /etc/hosts copied to all cluster nodes of the cluster we used
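A sanity check that every HACMP IP label has a local /etc/hosts entry can be scripted. This sketch of ours runs against an inline copy of the file rather than the real /etc/hosts:

```shell
# Verify that each required HACMP IP label appears in (a copy of)
# /etc/hosts. The inline text stands in for the real file.
hosts_copy="9.3.4.194 tivaix1 tivaix1.itsc.austin.ibm.com
9.3.4.195 tivaix2 tivaix2.itsc.austin.ibm.com
9.3.4.3 tivaix1_svc
9.3.4.4 tivaix2_svc"

check_labels() {
    missing=0
    for label in "$@" ; do
        echo "$hosts_copy" | grep -qw "$label" || { echo "missing: $label" ; missing=1 ; }
    done
    return $missing
}

check_labels tivaix1 tivaix2 tivaix1_svc tivaix2_svc && echo "all labels present"
```

Running a check like this on each node helps catch a drifted /etc/hosts copy before a DNS outage exposes it.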

Name resolution order is controlled by the following items, in decreasing order of precedence (the first item overrides the second, which in turn overrides the third):

- Environment variable NSORDER
- Host settings in the /etc/netsvc.conf file
- Host settings in the /etc/irs.conf file

In our environment, we used the following line in /etc/netsvc.conf to set the name resolution order on all cluster nodes:
hosts = local, bind

The /etc/netsvc.conf file on all cluster nodes is set to this line.

Note: In our environment, we used some IP hostnames that include underscores to test HACMP's handling of name resolution. In a live production environment, we do not recommend this practice.


Underscores are not officially supported in DNS, so some of the host entries we use for our environment can never be managed by strict DNS servers. The rules for legal IP hostnames are set by RFC 952:
http://www.ietf.org/rfc/rfc952.txt

RFC 1123 also sets the rules for legal IP hostnames:


http://www.ietf.org/rfc/rfc1123.txt
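A quick way to audit candidate IP labels against the RFC 952/RFC 1123 rules (letters, digits, and hyphens only; no leading or trailing hyphen) is a pattern check like this sketch of ours:

```shell
# Flag IP labels that violate the RFC 952/1123 letter-digit-hyphen rule;
# underscored labels such as tivaix1_svc fail the check.
is_legal_hostname() {
    echo "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$'
}

for h in tivaix1 tivaix1_svc ; do
    if is_legal_hostname "$h" ; then
        echo "$h: legal"
    else
        echo "$h: not legal per RFC 952"
    fi
done
```

Running such a check over the planning worksheet entries flags the labels that a strict DNS server would refuse to manage.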

All the entries for /etc/hosts are drawn from the planning worksheets that you fill out when planning for HACMP.

Configure HACMP service IP labels/addresses


A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes. For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases. The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:
1. Enter: smit hacmp.
2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.
3. Fill in the field values as follows, as shown in Figure 4-29 on page 253:

   IP Label/IP Address
      Enter, or select from the picklist, the IP label/IP address to be kept highly available.

   Network Name
      Enter the symbolic name of the HACMP network on which this service IP label/address will be configured. If you leave the field blank, HACMP fills in this field automatically with the network type plus a number appended, starting with 1 (for example, netether1).


                    Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [tivaix1_svc]             +
* Network Name                                       [net_ether_01]            +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Figure 4-29 Enter service IP label for tivaix1

Figure 4-29 shows how we entered the service address label for tivaix1. In our environment, we use tivaix1_svc as the IP label and net_ether_01 as the network name.
4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP interface configuration.
5. Repeat the previous steps until you have configured all IP service labels for each network, as needed. In our environment, we create another service IP label for cluster node tivaix2, as shown in Figure 4-30 on page 254.


                    Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [tivaix2_svc]             +
* Network Name                                       [net_ether_01]            +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Figure 4-30 How to enter service IP labels for tivaix2

We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.

Configure HACMP networks and heartbeat paths


The cluster should have more than one network, to avoid a single point of failure. Often the cluster has both IP and non-IP based networks in order to use different heartbeat paths. Use the Add a Network to the HACMP cluster SMIT screen to define HACMP IP and point-to-point networks. Running HACMP discovery before configuring is recommended, to speed up the process. In our environment, we use IP-based networks, heartbeating over IP aliases, and point-to-point networks over Target Mode SSA. In this section we show how to configure IP-based networks and heartbeating using IP aliases. Refer to Configure heartbeating on page 213 for information about configuring point-to-point networks over Target Mode SSA.


Configure IP-Based networks


To configure IP-based networks, take the following steps:
1. Enter: smit hacmp.
2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter.
3. Select the type of network to configure and press Enter. The Add an IP-Based Network to the HACMP Cluster SMIT screen displays the configuration fields. In our environment, we selected ether for the type of network to configure.
4. Enter the information as follows:

   Network Name
      If you do not enter a name, HACMP will give the network a default network name made up of the type of network with a number appended (for example, ether1). If you change the name for this network, use no more than 32 alphanumeric characters and underscores.

   Network Type
      This field is filled in depending on the type of network you selected.

   Netmask
      The netmask (for example, 255.255.255.0).

   Enable IP Takeover via IP Aliases
      The default is True. If the network does not support IP aliases, then IP Replacement will be used. IP Replacement is the mechanism whereby one IP address is removed from an interface, and another IP address is added to that interface. If you want to use IP Replacement on a network that does support aliases, change the default to False.

   IP Address Offset for Heartbeating over IP Aliases
      Enter the base address of a private address range for heartbeat addresses (for example, 10.10.10.1). HACMP will use this address to automatically generate IP addresses for heartbeat for each boot interface in the configuration. This address range must be unique and must not conflict with any other subnets on the network. Refer to the section "Heartbeat Over IP Aliases" in Chapter 3, "Planning Cluster Network Connectivity", in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, and to your planning worksheet for more information on selecting a base address for use by Heartbeating over IP Aliases. Clear this entry to use the default heartbeat method.

In our environment, we entered the values for the IP-based network as shown in Figure 4-31. We used the network name net_ether_01, with a netmask of 255.255.254.0 for our lab network, and set an IP address offset for heartbeating over IP aliases of 172.16.100.1, corresponding to the offset we chose during the planning stage. Because our lab systems use network interface cards capable of supporting IP aliases, we left the flag Enable IP Address Takeover via IP Aliases set to Yes.

                 Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                       [net_ether_01]
* Network Type                                       ether
* Netmask                                            [255.255.254.0]           +
* Enable IP Address Takeover via IP Aliases          [Yes]                     +
  IP Address Offset for Heartbeating over IP Aliases [172.16.100.1]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Figure 4-31 Add an IP-Based Network to the HACMP Cluster SMIT screen

5. Press Enter to configure this network.
6. Repeat the operation to configure more networks. In our environment, this is the only network we configured, so we did not configure any other HACMP networks.


Configure heartbeating over IP aliases


In HACMP 5.1, you can configure heartbeating over IP aliases to establish IP-based heartbeat rings that run over your existing topology. Heartbeating over IP aliases supports either IP Address Takeover (IPAT) via IP aliases or IPAT via IP Replacement. The type of IPAT configured determines how HACMP handles the service label:

   IPAT via IP Aliases
      The service label, as well as the heartbeat alias, is aliased onto the interface.

   IPAT via IP Replacement
      The service label is swapped with the interface IP address, not the heartbeating alias.

Note: HACMP removes the aliases from the interfaces at shutdown. It creates the aliases again when the network becomes operational. The /tmp/hacmp.out file records these changes.

To configure heartbeating over IP aliases, you specify an IP address offset when configuring an interface; see the preceding section for details. Make sure that this address does not conflict with addresses configured on your network. When you run HACMP verification, the clverify utility verifies that:

- The configuration is valid for the address range
- All interfaces are the same type (for example, Ethernet) and have the same subnet mask
- The offset address allots sufficient addresses and subnets on the network

In our environment, we use IPAT via IP aliases.

Configure HACMP resource groups


This creates a container to organize HACMP resources into logical groups that are defined later. Refer to High Availability Cluster Multi-Processing for AIX Concepts and Facilities Guide Version 5.1, SC23-4864, for an overview of the types of resource groups you can configure in HACMP 5.1. Refer to the chapter on planning resource groups in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for further planning information. You should have your planning worksheets in hand.

Using the standard path, you can configure resource groups that use the basic management policies. These policies are based on the three predefined types of startup, fallover, and fallback policies: cascading, rotating, and concurrent.


In addition to these, you can also configure custom resource groups for which you can specify slightly more refined types of startup, fallover, and fallback policies. Once the resource groups are configured, if it seems necessary for handling certain applications, you can use the Extended Configuration path to change or refine the management policies of particular resource groups (especially custom resource groups).

Configuring a resource group involves two phases:
1. Configuring the resource group name, management policy, and the nodes that can own it
2. Adding the resources and additional attributes to the resource group

Refer to your planning worksheets as you name the groups and add the resources to each one. To create a resource group:
1. Enter: smit hacmp.
2. On the HACMP menu, select Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Add a Standard Resource Group and press Enter. You are prompted to select a resource group management policy.
3. Select Cascading, Rotating, Concurrent or Custom and press Enter. For our environment, we used Cascading.
   Depending on the previous selection, you will see a screen titled Add a Cascading | Rotating | Concurrent | Custom Resource Group. The screen will only show options relevant to the type of resource group you selected. If you select Custom, you will be asked to refine the startup, fallover, and fallback policy before continuing.
4. Enter the field values as follows for a cascading, rotating, or concurrent resource group (Figure 4-32 on page 259):

   Resource Group Name
      Enter the desired name. Use no more than 32 alphanumeric characters or underscores; do not use a leading numeric. Do not use reserved words; see "List of Reserved Words" in Chapter 6 of High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862. Duplicate entries are not allowed.

   Participating Node Names
      Enter the names of the nodes that can own or take over this resource group. Enter the node with the highest priority for ownership first, followed by the nodes with lower priorities, in the desired order. Leave a space between node names (for example, NodeA NodeB NodeX).

If you choose to define a custom resource group, you define additional fields. We do not use custom resource groups in this redbook, for simplicity of presentation.

Figure 4-32 shows how we configured resource group rg1 in the environment implemented by this redbook. We use this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix1.

       Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                [rg1]
* Participating Node Names / Default Node Priority   [tivaix1 tivaix2]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Figure 4-32 Configure resource group rg1


Figure 4-33 shows how we configured resource group rg2 in our environment. We used this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix2.

       Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                [rg2]
* Participating Node Names / Default Node Priority   [tivaix2 tivaix1]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Figure 4-33 How to configure resource group rg2

Configure cascading without fallback, other attributes


We configured all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration. We use this step to also configure other attributes of the resource groups, such as the associated shared volume group and filesystems.

To configure CWOF and other resource group attributes:
1. Enter: smit hacmp.


2. Go to Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Standard Resource Group and press Enter to display a list of defined resource groups.
3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name and Participating Node Names (Default Node Priority) fields filled in.

   Note: SMIT displays only valid choices for resources, depending on the type of resource group that you selected. The fields are slightly different for custom, non-concurrent, and concurrent groups. If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Enter the field values as follows:

   Service IP Label/IP Addresses
      (Not an option for concurrent or custom concurrent-like resource groups.) List the service IP labels to be taken over when this resource group is taken over. Press F4 to see a list of valid IP labels. These include addresses which rotate or may be taken over.

   Filesystems (empty is All for specified VGs)
      (Not an option for concurrent or custom concurrent-like resource groups.) If you leave this field blank and specify the shared volume groups in the Volume Groups field below, all file systems in the volume group will be mounted. If you leave this field blank and do not specify the volume groups in the field below, no file systems will be mounted. You may also select individual file systems to include in the resource group; press F4 to see a list of the file systems. In this case, only the specified file systems will be mounted when the resource group is brought online. Filesystems (empty is All for specified VGs) is a valid option only for non-concurrent resource groups.
Volume Groups (If you are adding resources to a non-concurrent resource group) Identify the shared volume groups

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster


that should be varied on when this resource group is acquired or taken over. Select the volume groups from the picklist, or enter the desired volume group names in this field. Pressing F4 gives you a list of all shared volume groups in the resource group, as well as the volume groups that are currently available for import onto the resource group nodes.

Specify the shared volume groups in this field if you want to leave the field Filesystems (empty is All for specified VGs) blank and mount all file systems in the volume groups. If you specify more than one volume group in this field, all file systems in all specified volume groups will be mounted; you cannot choose to mount all file systems in one volume group and not mount them in another.

For example, in a resource group with two volume groups (vg1 and vg2), if the field Filesystems (empty is All for specified VGs) is left blank, all the file systems in vg1 and vg2 will be mounted when the resource group is brought up. However, if that field lists only file systems that are part of the vg1 volume group, none of the file systems in vg2 will be mounted, because they were not entered in the field along with the file systems from vg1.

If you have previously entered values in the Filesystems field, the appropriate volume groups are already known to the HACMP software.

Concurrent Volume Groups
(Appears only if you are adding resources to a concurrent or custom concurrent-like resource group.) Identify the shared volume groups that can be accessed simultaneously by multiple nodes. Select the volume groups from the picklist, or enter the desired volume group names in this field. If you previously requested that HACMP collect information about the appropriate volume groups, pressing F4 gives you a list of all existing concurrent


capable volume groups that are currently available in the resource group, and concurrent-capable volume groups available to be imported onto the nodes in the resource group. Disk fencing is turned on by default.

Application Servers
Indicate the application servers to include in the resource group. Press F4 to see a list of application servers.

Note: If you are configuring a custom resource group and choose to use a dynamic node priority policy for a cascading-type custom resource group, you will see a field where you can select which of the three predefined node priority policies you want to use.

In our environment, we defined resource group rg1 as shown in Figure 4-34.

Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 rg1
  Participating Node Names (Default Node Priority)    tivaix1 tivaix2
* Service IP Labels/Addresses                         [tivaix1_svc]      +
  Volume Groups                                       [tiv_vg1]          +
  Filesystems (empty is ALL for VGs specified)        []                 +
  Application Servers                                 [tws_svr1]         +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-34 Define resource group rg1


For resource group rg1, we assigned tivaix1_svc as the service IP label, tiv_vg1 as the sole volume group to use, and tws_svr1 as the application server.

5. Press Enter to add the values to the HACMP ODM.

6. Repeat the operation for the other resource groups you want to configure. In our environment, we defined resource group rg2 as shown in Figure 4-35.

Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 rg2
  Participating Node Names (Default Node Priority)    tivaix2 tivaix1
* Service IP Labels/Addresses                         [tivaix2_svc]      +
  Volume Groups                                       [tiv_vg2]          +
  Filesystems (empty is ALL for VGs specified)        []                 +
  Application Servers                                 [tws_svr2]         +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-35 Define resource group rg2

Configure cascading without fallback


We configured all resource groups in our environment for cascading without fallback (CWOF) so that IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration. To configure CWOF:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.


SMIT displays a list of defined resource groups.

3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, Inter-site Management Policy, and Participating Node Names (Default Node Priority) fields filled in, as shown in Figure 4-36. If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Set the Cascading Without Fallback Enabled field to true by pressing Tab in the field until that value is displayed.

Change/Show All Resources and Attributes for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                  rg1
  Resource Group Management Policy                     cascading
  Inter-site Management Policy                         ignore
  Participating Node Names / Default Node Priority     tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)            []                +
  Inactive Takeover Applied                            false             +
  Cascading Without Fallback Enabled                   true              +
  Application Servers                                  [tws_svr1]        +
  Service IP Labels/Addresses                          [tivaix1_svc]     +
  Volume Groups                                        [tiv_vg1]         +
  Use forced varyon of volume groups, if necessary     false             +
  Automatically Import Volume Groups                   false             +
[MORE...19]

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-36 Set cascading without fallback (CWOF) for a resource group

5. Repeat the operation for any other applicable resource groups. In our environment, we applied the same operation to resource group rg2; all resources and attributes for resource group rg1 are shown in Example 4-36 on page 266.


Example 4-36 All resources and attributes for resource group rg1

[TOP]                                                   [Entry Fields]
  Resource Group Name                                  rg1
  Resource Group Management Policy                     cascading
  Inter-site Management Policy                         ignore
  Participating Node Names / Default Node Priority     tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)            []
  Inactive Takeover Applied                            false
  Cascading Without Fallback Enabled                   true
  Application Servers                                  [tws_svr1]
  Service IP Labels/Addresses                          [tivaix1_svc]
  Volume Groups                                        [tiv_vg1]
  Use forced varyon of volume groups, if necessary     false
  Automatically Import Volume Groups                   false
  Filesystems (empty is ALL for VGs specified)         [/usr/maestro]
  Filesystems Consistency Check                        fsck
  Filesystems Recovery Method                          sequential
  Filesystems mounted before IP configured             false
  Filesystems/Directories to Export                    []
  Filesystems/Directories to NFS Mount                 []
  Network For NFS Mount                                []
  Tape Resources                                       []
  Raw Disk PVIDs                                       []
  Fast Connect Services                                []
  Communication Links                                  []
  Primary Workload Manager Class                       []
  Secondary Workload Manager Class                     []
  Miscellaneous Data                                   []
[BOTTOM]

All of the resources and attributes configured for resource group rg2 are shown in Example 4-37.
Example 4-37 All resources and attributes for resource group rg2

[TOP]                                                   [Entry Fields]
  Resource Group Name                                  rg2
  Resource Group Management Policy                     cascading
  Inter-site Management Policy                         ignore
  Participating Node Names / Default Node Priority     tivaix2 tivaix1
  Dynamic Node Priority (Overrides default)            []
  Inactive Takeover Applied                            false
  Cascading Without Fallback Enabled                   true
  Application Servers                                  [tws_svr2]
  Service IP Labels/Addresses                          [tivaix2_svc]
  Volume Groups                                        [tiv_vg2]
  Use forced varyon of volume groups, if necessary     false
  Automatically Import Volume Groups                   false
  Filesystems (empty is ALL for VGs specified)         [/usr/maestro2]
  Filesystems Consistency Check                        fsck
  Filesystems Recovery Method                          sequential
  Filesystems mounted before IP configured             false
  Filesystems/Directories to Export                    []
  Filesystems/Directories to NFS Mount                 []
  Network For NFS Mount                                []
  Tape Resources                                       []
  Raw Disk PVIDs                                       []
  Fast Connect Services                                []
  Communication Links                                  []
  Primary Workload Manager Class                       []
  Secondary Workload Manager Class                     []
  Miscellaneous Data                                   []
[BOTTOM]

We used this SMIT screen to review the resource groups and to configure any resources we may have missed earlier.

Configure pre-event and post-event commands


To define your customized cluster event scripts, take the following steps:

1. Enter: smit hacmp.

2. Go to HACMP Extended Configuration -> Extended Event Configuration -> Configure Pre- or Post-Events -> Add a Custom Cluster Event and press Enter.

3. Enter the field values as follows:

Cluster Event Name
Enter a name for the command. The name can have a maximum of 31 characters.


Cluster Event Description
Enter a short description of the event.

Cluster Event Script Filename
Enter the full pathname of the user-defined script to execute.

In our environment, we entered the cluster event name quiesce_tws in the Cluster Event Name field for the script we added in "Add a custom post-event HACMP script" on page 242. We entered the following pathname in the Cluster Event Script Filename field:
/usr/es/sbin/cluster/sh/quiesce_tws.sh

Figure 4-37 shows how we entered these fields.

Add a Custom Cluster Event

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Event Name                                  [quiesce_tws]
* Cluster Event Description                           []
* Cluster Event Script Filename                       [/usr/es/sbin/cluster/>

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-37 Add a Custom Cluster Event SMIT screen

4. Press Enter to add the information to the HACMP custom event class in the local Object Data Manager (ODM).


5. Go back to the HACMP Extended Configuration menu and select Verification and Synchronization to synchronize your changes across all cluster nodes.

Note: Synchronizing does not propagate the actual new or changed scripts; you must add these to each node manually.
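Custom event scripts such as quiesce_tws.sh are site-specific and are not reproduced here. Purely as an illustration, a post-event command could follow the general shape below; the log path and the guarded conman call are assumptions for this sketch, not the script used in our environment.

```shell
#!/bin/sh
# Hypothetical skeleton for a custom HACMP post-event command such as
# quiesce_tws.sh. HACMP passes a post-event command the event name, the
# event exit status, and the trailing arguments of the event command.

quiesce_tws() {
    event_name=$1
    event_status=$2
    log=${TMPDIR:-/tmp}/quiesce_tws.log    # assumed log location

    echo "post-event: name=${event_name} status=${event_status}" >> "$log"

    if [ "$event_name" = "node_up_complete" ]; then
        # Site-specific quiesce logic for IBM Tivoli Workload Scheduler
        # would go here; guarded so the sketch is harmless on nodes
        # without the TWS command line installed.
        if command -v conman >/dev/null 2>&1; then
            conman "stop;wait" >> "$log" 2>&1
        fi
    fi
    return 0   # a non-zero return would mark the event processing as failed
}

# Example invocation, with arguments as HACMP would pass them:
quiesce_tws node_up_complete 0 && echo "post-event handled"
```

Because synchronization does not copy scripts, a script like this would have to be placed on every cluster node at the same path.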

Configure pre-event and post-event processing


Complete the following steps to set up or change the processing for an event. In this step you indicate to the Cluster Manager that it should use your customized pre-event or post-event commands. You only need to complete these steps on a single node; the HACMP software propagates the information to the other nodes when you verify and synchronize the nodes.

Note: When resource groups are processed in parallel, fewer cluster events occur in the cluster. In particular, only node_up and node_down events take place, and events such as node_up_local or get_disk_vg_fs do not occur if resource groups are processed in parallel. As a result, the use of parallel processing reduces the number of particular cluster events for which you can create customized pre- or post-event scripts. If you start using parallel processing for some of the resource groups in your configuration, be aware that your existing event scripts may not work for these resource groups. For more information, see Appendix C, "Resource Group Behavior During Cluster Events", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, and the chapter on planning events in High Availability Cluster Multi-Processing for AIX Planning and Installation Guide Version 5.1, SC23-4861.

To configure pre- and post-events for customized event processing, and specifically the quiesce_tws post-event script, follow these steps:

1. Enter: smit hacmp.

2. Select HACMP Extended Configuration -> Extended Event Configuration -> Change/Show Pre-defined HACMP Events to display a list of cluster events and subevents.

3. Select the event or subevent that you want to configure and press Enter. SMIT displays the screen with the event name, description, and default event command shown in their respective fields.


In our environment, we used node_up_complete as the event to configure.

4. Enter field values as follows:

Event Name
The name of the cluster event to be customized.

Description
A brief description of the event's function. This information cannot be changed.

Event Command
The full pathname of the command that processes the event. The HACMP software provides a default script. If additional functionality is required, it is strongly recommended that you make changes by adding pre- or post-event processing of your own design, rather than by modifying the default scripts or writing new ones.

Notify Command
(Optional) Enter the full pathname of a user-supplied script to run both before and after a cluster event. This script can notify the system administrator that an event is about to occur or has occurred. The arguments passed to the command are: the event name, one keyword (either start or complete), the exit status of the event (if the keyword was complete), and the same trailing arguments passed to the event command.

Pre-Event Command
(Optional) If you have defined custom cluster events, press F4 for the list, or enter the name of a custom-defined event to run before the HACMP cluster event command executes. This command provides pre-processing before a cluster event occurs. The arguments passed to this command are the event name and the trailing arguments passed to the event command. Remember that the Cluster Manager will not process the event until this pre-event script or command has completed.

Post-Event Command
(Optional) If you have defined custom cluster events, press F4 for the list, or enter the name of the custom event to run after the HACMP cluster event command executes successfully. This script provides post-processing after a cluster event. The arguments passed to this command are the event name, event exit status, and the trailing arguments passed to the event command.

Recovery Command
(Optional) Enter the full pathname of a user-supplied script or AIX command to execute to attempt to recover from a cluster event command failure. If the recovery command succeeds and the retry count is greater than zero, the cluster event command is rerun. The arguments passed to this command are the event name and the arguments passed to the event command.

Recovery Counter
Enter the number of times to run the recovery command. Set this field to zero if no recovery command is specified, and to at least one (1) if a recovery command is specified.

In our environment, we entered the quiesce_tws post-event command for the node_up_complete event, as shown in Figure 4-38.

Change/Show Cluster Events

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Event Name                                          node_up_complete
  Description                                         Script run after the >
* Event Command                                       [/usr/es/sbin/cluster/>
  Notify Command                                      []
  Pre-event Command                                   []                 +
  Post-event Command                                  [quiesce_tws]      +
  Recovery Command                                    []
* Recovery Counter                                    [0]                #

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-38 Add quiesce_tws script in Change/Show Cluster Events SMIT screen

5. Press Enter to add this information to the HACMP ODM.


6. Return to the HACMP Extended Configuration screen and synchronize your event customization by selecting the Verification and Synchronization option.

Note that all HACMP event scripts are maintained in the /usr/es/sbin/cluster/events directory. The parameters passed to a script are listed in the script's header. If you want to modify the node_up_complete event itself, for example, you could customize it by locating the corresponding script in this directory.

See Chapter 8, "Monitoring an HACMP Cluster", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for a discussion of event emulation to see how to emulate HACMP event scripts without actually affecting the cluster.

Configure HACMP persistent node IP label/addresses


A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A persistent node IP label is a label which:

- Always stays on the same node (is node-bound).
- Co-exists with other IP labels present on an interface.
- Does not require installing an additional physical interface on that node.
- Is not part of any resource group.

Assigning a persistent node IP label for a network on a node allows you to have a node-bound address on a cluster network that you can use for administrative purposes to access a specific node in the cluster. Refer to "Configuring HACMP Persistent Node IP Labels/Addresses" in Chapter 3, "Configuring HACMP Cluster Topology and Resources (Extended)", in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for information about persistent node IP label prerequisites.

To add persistent node IP labels, follow these steps:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Persistent Node IP Label/Addresses -> Add a Persistent Node IP Label/Address and press Enter. The Select a Node SMIT dialog shows the cluster nodes currently defined for the cluster.

3. Select the node to add a persistent node IP label/address to, and press Enter. The Add a Persistent Node IP Label/Address SMIT screen is displayed. In our environment, we started with cluster node tivaix1, as shown in Figure 4-39 on page 273.


+--------------------------------------------------------------------------+
                               Select a Node

  Move cursor to desired item and press Enter.

    tivaix1
    tivaix2

  F1=Help       F2=Refresh     F3=Cancel      F8=Image
  F10=Exit      Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-39 Select a Node SMIT dialog

4. Enter the field values as follows:

Node Name
The name of the node on which the IP label/address will be bound.

Network Name
The name of the network on which the IP label/address will be bound.

Node IP Label/Address
The IP label/address to keep bound to the specified node.

In our environment, we entered net_ether_01 in the Network Name field and tivaix1 in the Node IP Label/Address field, as shown in Figure 4-40 on page 274.


Add a Persistent Node IP Label/Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Node Name                                           tivaix1
* Network Name                                        [net_ether_01]     +
* Node IP Label/Address                               [tivaix1]          +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-40 Add a Persistent Node IP Label/Address SMIT screen for tivaix1

Note: If you want to use any HACMP IP address over DNS, do not use underscores in the IP hostname, because DNS does not recognize underscores. The use of underscores in the IP hostnames in our environment was a way to ensure that they were never introduced into the lab's DNS server.

We entered these values by pressing F4 to select them from a list. In our environment, the list for the Network Name field is shown in Figure 4-41 on page 275.


+--------------------------------------------------------------------------+
                               Network Name

  Move cursor to desired item and press Enter.

    net_ether_01 (9.3.4.0/23 192.168.100.0/23 10.1.0.0/23)

  F1=Help       F2=Refresh     F3=Cancel      F8=Image
  F10=Exit      Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-41 Network Name SMIT dialog

The selection list dialog for the Node IP Label/Address is similar.

5. Press Enter. In our environment, we also created a persistent node IP label for cluster node tivaix2, as shown in Figure 4-42 on page 276. Note that we entered the same Network Name field value.


Add a Persistent Node IP Label/Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Node Name                                           tivaix2
* Network Name                                        [net_ether_01]     +
* Node IP Label/Address                               [tivaix2]          +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-42 Add a Persistent Node IP Label/Address SMIT screen for tivaix2
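To make the aliasing behavior concrete, the fragment below sketches the manual AIX equivalent of what HACMP does when it binds a persistent label: adding an IP alias that co-exists with the interface's base address. The interface name, address, and netmask are placeholders, and the call is guarded because HACMP, not the administrator, normally manages these aliases.

```shell
#!/bin/sh
# Sketch only: on AIX, "ifconfig <interface> alias <address>" adds an IP
# alias that co-exists with the base address, which is how a persistent
# node IP label behaves. Values below are placeholders, not lab addresses.

add_demo_alias() {
    iface=${1:-en0}
    addr=${2:-192.0.2.10}          # RFC 5737 documentation address
    mask=${3:-255.255.255.0}

    if [ "$(uname 2>/dev/null)" = "AIX" ]; then
        # Requires root privileges; list the interface to see both addresses.
        ifconfig "$iface" alias "$addr" netmask "$mask" && ifconfig "$iface"
    else
        echo "not AIX: skipping ifconfig alias demo for $iface ($addr)"
    fi
}

add_demo_alias en0
```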

Configure predefined communication interfaces


In our environment, communication interfaces and devices were already configured to AIX, and needed to be configured to HACMP (that is, without HACMP discovery). To add predefined network interfaces to the cluster, follow these steps:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices and press Enter. A SMIT selector screen appears that lets you add previously discovered or previously defined network interfaces:

Add Discovered Communication Interfaces and Devices
Displays a list of interfaces and devices which HACMP has been able to determine as being already


configured to the operating system on a node in the cluster.

Add Pre-defined Communication Interfaces and Devices
Displays a list of all communication interfaces and devices supported by HACMP.

Select the predefined option, as shown in Figure 4-43. SMIT displays a selector screen for the predefined communications type.

+--------------------------------------------------------------------------+
                              Select a category

  Move cursor to desired item and press Enter.

    Add Discovered Communication Interface and Devices
    Add Pre-defined Communication Interfaces and Devices

  F1=Help       F2=Refresh     F3=Cancel      F8=Image
  F10=Exit      Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-43 Select Add a Pre-defined Communication Interface to HACMP Cluster configuration

3. Select Communication Interfaces as shown in Figure 4-44 and press Enter. The Select a Network SMIT selector screen appears.

+--------------------------------------------------------------------------+
                  Select the Pre-Defined Communication type

  Move cursor to desired item and press Enter.

    Communication Interfaces
    Communication Devices

  F1=Help       F2=Refresh     F3=Cancel      F8=Image
  F10=Exit      Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-44 Select the Pre-Defined Communication type SMIT selector screen


4. Select a network, as shown in Figure 4-45, and press Enter.

+--------------------------------------------------------------------------+
                               Select a Network

  Move cursor to desired item and press Enter.

    net_ether_01 (9.3.4.0/23 192.168.100.0/23 10.1.0.0/23)

  F1=Help       F2=Refresh     F3=Cancel      F8=Image
  F10=Exit      Enter=Do       /=Find         n=Find Next
+--------------------------------------------------------------------------+

Figure 4-45 Select a Network SMIT selector screen

The Add a Communication Interface screen appears. In our environment we only had one network, net_ether_01, and we selected it.

5. Fill in the fields as follows:

Node Name
The name of the node on which this network interface physically exists.

Network Name
A unique name for this logical network.

Network Interface
Enter the network interface associated with the communication interface (for example, en0).

IP Label/Address
The IP label/address associated with this communication interface, which will be configured on the network interface when the node boots. The picklist filters out IP labels/addresses already configured to HACMP.

Network Type
The type of network media/protocol (for example, Ethernet, Token Ring, FDDI, and so on). Select the type from the predefined list of network types.

Note: The network interface that you are adding has the base or service function by default. You do not specify the function of the network interface as in releases prior to HACMP 5.1; further configuration defines the function of the interface.

In our environment, we entered the IP label tivaix1_bt1 for interface en0 on cluster node tivaix1, as shown in Figure 4-46 on page 279.


Add a Communication Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                    [tivaix1_bt1]
* Network Type                                        ether
* Network Name                                        net_ether_01
* Node Name                                           [tivaix1]
  Network Interface                                   [en0]

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 4-46 Add a Communication Interface SMIT screen

6. Repeat this operation for any remaining communication interfaces that you planned for earlier. In our environment, we configured the communication interfaces shown in Table 4-1 to HACMP network net_ether_01. Note that the first row corresponds to Figure 4-46.
Table 4-1 Communication interfaces to configure for network net_ether_01

Network Interface    IP Label/Address                 Node Name
en0                  tivaix1_bt1 (192.168.100.101)    tivaix1
en1                  tivaix1_bt2 (10.1.1.101)         tivaix1
en0                  tivaix2_bt1 (192.168.100.102)    tivaix2
en1                  tivaix2_bt2 (10.1.1.102)         tivaix2
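All of these labels must resolve identically on both nodes. As an illustration only (addresses taken from the cltopinfo output shown later in this chapter), the corresponding /etc/hosts entries might look like the following:

```
# /etc/hosts fragment (illustrative)
9.3.4.3          tivaix1_svc    # service IP label, resource group rg1
9.3.4.4          tivaix2_svc    # service IP label, resource group rg2
192.168.100.101  tivaix1_bt1    # boot label, tivaix1 en0
10.1.1.101       tivaix1_bt2    # boot label, tivaix1 en1
192.168.100.102  tivaix2_bt1    # boot label, tivaix2 en0
10.1.1.102       tivaix2_bt2    # boot label, tivaix2 en1
```

As noted elsewhere in this chapter, hostnames containing underscores cannot be served over DNS, so entries like these are suitable for local hosts files only.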


If you configured a Target Mode SSA network as described in "Configure heartbeating" on page 213, you should not have to configure the interfaces listed in Table 4-2; we only show this information so you can verify other HACMP communication interface configurations. For HACMP network net_tmssa_01, we configured the following communication interfaces.
Table 4-2 Communication interfaces to configure for network net_tmssa_01

Device Name          Device Path    Node Name
tivaix1_tmssa1_01    /dev/tmssa2    tivaix1
tivaix2_tmssa1_01    /dev/tmssa1    tivaix2

Verify the configuration


When all the resource groups are configured, verify the cluster components and operating system configuration on all nodes to ensure compatibility. If no errors are found, the configuration is then copied (synchronized) to each node in the cluster. If Cluster Services are running on any node, the configuration changes take effect, possibly causing one or more resources to change state.

Complete the following steps to verify and synchronize the cluster topology and resources configuration:

1. Enter: smit hacmp.

2. Go to Initialization and Standard Configuration -> HACMP Verification and Synchronization and press Enter. SMIT runs the clverify utility. The output from the verification is displayed in the SMIT Command Status window. If you receive error messages, make the necessary changes and run the verification procedure again. You may see warnings if the configuration has a limitation on its availability (for example, only one interface per node per network is configured). Figure 4-47 on page 281 shows a sample SMIT screen of a successful verification of an HACMP configuration.


COMMAND STATUS

Command: OK            stdout: yes            stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Verification to be performed on the following:
        Cluster Topology
        Cluster Resources

Retrieving data from available cluster nodes. This could take a few minutes....

Verifying Cluster Topology...

Verifying Cluster Resources...

WARNING: Error notification stanzas will be added during synchronization
for the following:
[MORE...40]

F1=Help       F2=Refresh     F3=Cancel      F6=Command
F8=Image      F9=Shell       F10=Exit       /=Find
n=Find Next

Figure 4-47 COMMAND STATUS SMIT screen for successful verification of an HACMP Cluster configuration

It is useful to view the cluster configuration to document it for future reference. To display the HACMP Cluster configuration, follow these steps:

1. Enter: smit hacmp.

2. Go to Initialization and Standard Configuration -> Display HACMP Configuration and press Enter. SMIT displays the current topology and resource information. The configuration for our environment is shown in Figure 4-48 on page 282.


COMMAND STATUS

Command: OK            stdout: yes            stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined
NODE tivaix1:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix1_bt2 10.1.1.101
        tivaix1_bt1 192.168.100.101
    Network net_tmssa_01
    Network net_tmssa_02
        tivaix1_tmssa2_01 /dev/tmssa2
[MORE...21]

F1=Help       F2=Refresh     F3=Cancel      F6=Command
F8=Image      F9=Shell       F10=Exit       /=Find
n=Find Next

Figure 4-48 COMMAND STATUS SMIT screen for our environment's configuration

If you want to obtain the same information from the command line, use the cltopinfo command as shown in Example 4-38.
Example 4-38 Obtain the HACMP configuration using the cltopinfo command

[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined
NODE tivaix1:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix1_bt2 10.1.1.101
        tivaix1_bt1 192.168.100.101
    Network net_tmssa_01
        tivaix1_tmssa2_01 /dev/tmssa2
NODE tivaix2:


    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix2_bt1 192.168.100.102
        tivaix2_bt2 10.1.1.102
    Network net_tmssa_01
        tivaix2_tmssa1_01 /dev/tmssa1
Resource Group rg1
    Behavior                 cascading
    Participating Nodes      tivaix1 tivaix2
    Service IP Label         tivaix1_svc
Resource Group rg2
    Behavior                 cascading
    Participating Nodes      tivaix2 tivaix1
    Service IP Label         tivaix2_svc
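Because cltopinfo output has a regular shape, it lends itself to quick scripted summaries. The following sketch extracts each resource group and its participating nodes from sample text modeled on Example 4-38; on a live HACMP node you would feed the cltopinfo command itself into the same filter.

```shell
#!/bin/sh
# Summarize resource groups from cltopinfo-style output. The here-document
# holds sample text modeled on Example 4-38; replace it with the output of
# /usr/es/sbin/cluster/utilities/cltopinfo on a real HACMP node.
prog='
/^Resource Group/     { rg = $3 }
/Participating Nodes/ { sub(/.*Participating Nodes[ \t]*/, ""); print rg ": " $0 }
'
awk "$prog" <<'EOF'
Resource Group rg1
   Behavior                cascading
   Participating Nodes     tivaix1 tivaix2
   Service IP Label        tivaix1_svc
Resource Group rg2
   Behavior                cascading
   Participating Nodes     tivaix2 tivaix1
   Service IP Label        tivaix2_svc
EOF
```

This prints one line per resource group, such as "rg1: tivaix1 tivaix2", which is handy when documenting a cluster with many resource groups.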

The clharvest_vg command can also be used to obtain more detailed configuration information, as shown in Example 4-39.
Example 4-39 Gather detailed shared volume group information with the clharvest_vg command [root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clharvest_vg -w Initializing.. Gathering cluster information, which may take a few minutes... Processing... Storing the following information in file /usr/es/sbin/cluster/etc/config/clvg_config tivaix1: Hdisk: hdisk0 PVID: 0001813fe67712b5 VGname: rootvg VGmajor: active Conc-capable: Yes VGactive: No Quorum-required:Yes Hdisk: hdisk1 PVID: 0001813f1a43a54d VGname: rootvg VGmajor: active Conc-capable: Yes VGactive: No Quorum-required:Yes Hdisk: hdisk2 PVID: 0001813f95b1b360


                                        VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
Hdisk: hdisk3   PVID: 0001813fc5966b71  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
Hdisk: hdisk4   PVID: 0001813fc5c48c43  VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk5   PVID: 0001813fc5c48d8c  VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk6   PVID: 000900066116088b  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk7   PVID: 000000000348a3d6  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk8   PVID: 00000000034d224b  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk9   PVID: none


                                        VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk10  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk11  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk12  PVID: 00000000034d7fad  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk13  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
FREEMAJORS: 48...
tivaix2:
Hdisk: hdisk0   PVID: 0001814f62b2a74b  VGname: rootvg   VGmajor: active  Conc-capable: Yes  VGactive: No  Quorum-required: Yes
Hdisk: hdisk1   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No


                                                                                            VGactive: No  Quorum-required: No
Hdisk: hdisk2   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk3   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk4   PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk5   PVID: 000900066116088b  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk6   PVID: 000000000348a3d6  VGname: tiv_vg1  VGmajor: 45      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk7   PVID: 00000000034d224b  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk16  PVID: 0001814fe8d10853  VGname: None     VGmajor: 0       Conc-capable: No


                                                                                            VGactive: No  Quorum-required: No
Hdisk: hdisk17  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk18  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk19  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
Hdisk: hdisk20  PVID: 00000000034d7fad  VGname: tiv_vg2  VGmajor: 46      Conc-capable: No   VGactive: No  Quorum-required: Yes
Hdisk: hdisk21  PVID: none              VGname: None     VGmajor: 0       Conc-capable: No   VGactive: No  Quorum-required: No
FREEMAJORS: 48...
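The clvg_config output is verbose, and what usually matters when validating shared storage is that each shared physical volume maps to the same volume group on both nodes (the hdisk numbers may differ, as hdisk8 on tivaix1 versus hdisk20 on tivaix2 shows). The helper below is our own sketch, not an HACMP utility: it assumes the clvg_config file lists one `field: value` pair per line, and the volume group prefix `tiv_vg` is from our environment. Here it is fed a captured fragment rather than a live clharvest_vg run.

```shell
# Sketch: reduce clharvest_vg-style records to "VGname PVID hdisk" triples
# for the shared volume groups only, so the two nodes' lists can be diffed.
shared_pvids() {
  awk '$1 == "Hdisk:"  { disk = $2 }
       $1 == "PVID:"   { pvid = $2 }
       $1 == "VGname:" && $2 ~ /^tiv_vg/ { print $2, pvid, disk }'
}

cat <<'EOF' | shared_pvids
Hdisk: hdisk6
PVID: 000900066116088b
VGname: tiv_vg1
Hdisk: hdisk8
PVID: 00000000034d224b
VGname: tiv_vg2
Hdisk: hdisk9
PVID: none
VGname: None
EOF
```

Running the same filter over the records gathered from each node and comparing the `VGname PVID` columns quickly exposes a disk that is missing, or assigned to the wrong shared volume group, on one side of the cluster.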

Start HACMP Cluster services


After verifying the HACMP configuration, start HACMP Cluster services. Before starting HACMP Cluster services, verify that all network interfaces are configured with the boot IP labels. Example 4-40 on page 288 shows how to use the ifconfig and host commands on tivaix1 to verify that the IP addresses configured on the network interfaces (192.168.100.101, 9.3.4.194, and 10.1.1.101 in the example) all correspond to boot IP labels.


Example 4-40 Configured IP addresses before starting HACMP Cluster services on tivaix1

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 192.168.100.101
tivaix1_bt1 is 192.168.100.101, Aliases:   tivaix1
[root@tivaix1:/home/root] host 9.3.4.194
tivaix1 is 9.3.4.194, Aliases:   tivaix1.itsc.austin.ibm.com
[root@tivaix1:/home/root] host 10.1.1.101
tivaix1_bt2 is 10.1.1.101

Example 4-41 shows the configured IP addresses before HACMP starts for tivaix2.
Example 4-41 Configured IP addresses before starting HACMP Cluster services on tivaix2

[root@tivaix2:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
        tcp_sendspace 131072 tcp_recvspace 65536
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix2:/home/root] host 192.168.100.102
tivaix2_bt1 is 192.168.100.102
[root@tivaix2:/home/root] host 9.3.4.195
tivaix2 is 9.3.4.195, Aliases:   tivaix2.itsc.austin.ibm.com
[root@tivaix2:/home/root] host 10.1.1.102
tivaix2_bt2 is 10.1.1.102
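Eyeballing ifconfig -a output for the right addresses is error-prone on a node with several aliases per interface. The small filter below is our own helper, not an HACMP tool; it only assumes the `inet <address> netmask ...` line format that AIX ifconfig prints, and here it is fed a captured fragment instead of live output. The resulting list can be compared mechanically against the planned boot IP labels.

```shell
# Sketch: list just the IPv4 addresses from ifconfig -a style output, so the
# result can be diffed against the expected boot IP labels for the node.
list_inets() { awk '$1 == "inet" { print $2 }'; }

cat <<'EOF' | list_inets
en0: flags=4e080863,80<UP,BROADCAST,RUNNING>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
en1: flags=4e080863,80<UP,BROADCAST,RUNNING>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
EOF
```

The same filter is just as useful after cluster services start, when the service IP address and the heartbeat-over-IP-alias addresses should additionally appear in the list.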

To start HACMP Cluster services:


1. Enter: smit hacmp.
2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter. The Start Cluster Services SMIT screen is displayed.
3. Add all cluster nodes you want to start to the Start Cluster Services on these nodes field as a comma-separated list of cluster node names. Press Enter to start HACMP Cluster services on the selected cluster nodes. In our environment, we enter the cluster node names tivaix1 and tivaix2 as shown in Figure 4-49.

                              Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                now                    +
  Start Cluster Services on these nodes              [tivaix1,tivaix2]       +
  BROADCAST message at startup?                       true                   +
  Startup Cluster Lock Services?                      false                  +
  Startup Cluster Information Daemon?                 true                   +
  Reacquire resources after forced down ?             false                  +

F1=Help         F2=Refresh      F3=Cancel       F4=List
F5=Reset        F6=Command      F7=Edit         F8=Image
F9=Shell        F10=Exit        Enter=Do

Figure 4-49 Start Cluster Services SMIT screen

4. The COMMAND STATUS SMIT screen displays the progress of the start operation, and will appear similar to Figure 4-50 if successful.


                                COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Starting Cluster Services on node: tivaix1
This may take a few minutes.  Please wait...
tivaix2: start_cluster: Starting HACMP
tivaix2: 0513-029 The portmap Subsystem is already active.
tivaix2: Multiple instances are not supported.
tivaix2: 0513-029 The inetd Subsystem is already active.
tivaix2: Multiple instances are not supported.
tivaix2:  8832 - 0:00 syslogd
tivaix2: Setting routerevalidate to 1
tivaix2: 0513-059 The topsvcs Subsystem has been started. Subsystem PID is 19384
[MORE...30]

F1=Help         F2=Refresh      F3=Cancel       F6=Command
F8=Image        F9=Shell        F10=Exit        /=Find
n=Find Next

Figure 4-50 COMMAND STATUS SMIT screen displaying successful start of cluster services

Check the network interfaces again after the start operation is complete. The service IP label and the IP addresses for heartbeating over IP aliases are populated into the network interfaces after HACMP starts. The service IP address is populated into any available network interface; HACMP selects which network interface. One IP address for heartbeating over IP aliases is populated by HACMP for each available network interface. Example 4-42 on page 291 shows the configured IP addresses on the network interfaces of tivaix1 after HACMP is started. Note that three new IP addresses are added into our environment, 172.16.100.2, 172.16.102.2, and 9.3.4.3, highlighted in bold in the example output. The IP addresses for heartbeating over IP aliases are 172.16.100.2 and 172.16.102.2. The service IP address is 9.3.4.3.


Example 4-42 Configured IP addresses after starting HACMP Cluster services on tivaix1

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 172.16.100.2 netmask 0xfffffe00 broadcast 172.16.101.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 172.16.102.2 netmask 0xfffffe00 broadcast 172.16.103.255
        inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 172.16.100.2
host: 0827-803 Cannot find address 172.16.100.2.
[root@tivaix1:/home/root] host 172.16.102.2
host: 0827-803 Cannot find address 172.16.102.2.
[root@tivaix1:/home/root] host 9.3.4.3
tivaix1_svc is 9.3.4.3

In our environment we do not assign IP hostnames to the IP addresses used for heartbeating over IP aliases, so the host commands for these addresses return an error.

Example 4-43 shows the IP addresses populated by HACMP after it is started on tivaix2. The addresses on tivaix2 are 172.16.100.3 and 172.16.102.3 for heartbeating over IP aliases, and 9.3.4.4 for the service IP label, highlighted in bold.
Example 4-43 Configured IP addresses after starting HACMP Cluster services on tivaix2

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 172.16.100.3 netmask 0xfffffe00 broadcast 172.16.101.255
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 172.16.102.3 netmask 0xfffffe00 broadcast 172.16.103.255
        inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>


        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 172.16.100.3
host: 0827-803 Cannot find address 172.16.100.3.
[root@tivaix1:/home/root] host 172.16.102.3
host: 0827-803 Cannot find address 172.16.102.3.
[root@tivaix1:/home/root] host 9.3.4.4
tivaix2_svc is 9.3.4.4

HACMP is now started on the cluster.

Verify HACMP status


Ensure that HACMP has actually started before you begin to use its features. Log into the first node as the root user and follow these steps:
1. Enter: smit hacmp.
2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Show Cluster Services and press Enter. The COMMAND STATUS SMIT screen is displayed with the current status of all HACMP subsystems on the current node, similar to Figure 4-51 on page 293.


                                COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

Subsystem         Group            PID          Status
 clstrmgrES       cluster          16684        active
 clinfoES         cluster          12950        active
 clsmuxpdES       cluster          26856        active
 cllockdES        lock                          inoperative

F1=Help         F2=Refresh      F3=Cancel       F6=Command
F8=Image        F9=Shell        F10=Exit        /=Find
n=Find Next

Figure 4-51 Current status of all HACMP subsystems on a cluster node

3. You can also verify the status of each node on an HACMP Cluster by running the following command:
/usr/es/sbin/cluster/utilities/clshowsrv -a

This produces output similar to Example 4-44.


Example 4-44 Using the command line to obtain the current status of all HACMP subsystems on a cluster node

$ /usr/es/sbin/cluster/utilities/clshowsrv -a
Subsystem         Group            PID          Status
 clstrmgrES       cluster          16684        active
 clinfoES         cluster          12950        active
 clsmuxpdES       cluster          26856        active
 cllockdES        lock                          inoperative

Whether you use SMIT or the command line, the following HACMP subsystems must be active on each node in the cluster: clstrmgrES, clinfoES, and clsmuxpdES. Other subsystems need to be active only if their services are required by your application(s).
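The check for the three required subsystems can be scripted rather than read by eye. The helper below is our own sketch (not an HACMP utility): it parses clshowsrv -a style output and fails, naming the culprits, if clstrmgrES, clinfoES, or clsmuxpdES is missing or not active. Here it is fed the captured output shown above; on a live node you would pipe in /usr/es/sbin/cluster/utilities/clshowsrv -a instead.

```shell
# Sketch: fail unless all three required HACMP subsystems report "active".
check_required() {
  awk 'BEGIN { req["clstrmgrES"]; req["clinfoES"]; req["clsmuxpdES"] }
       ($1 in req) && $NF == "active" { delete req[$1] }
       END { for (s in req) { print s " not active"; bad = 1 }
             exit bad }'
}

cat <<'EOF' | check_required && echo "HACMP core subsystems active"
Subsystem         Group            PID          Status
 clstrmgrES       cluster          16684        active
 clinfoES         cluster          12950        active
 clsmuxpdES       cluster          26856        active
 cllockdES        lock                          inoperative
EOF
```

Run on each node in turn (for example from a loop over rsh or ssh), this turns the per-node SMIT check into a single pass/fail result per node.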


Repeat the procedure for all remaining nodes in the cluster. In our cluster, we repeated the procedure on tivaix2, and verified that the same subsystems are active.

Test HACMP resource group moves


Manually testing the movement of resource groups between cluster nodes further validates the HACMP configuration of the resource groups. If a resource group that can be moved manually later fails to fall over to a cluster node, you immediately know that the problem lies in the HACMP fallover process, and likely not in the resource group configuration. To test HACMP resource group moves, follow these steps:
1. Enter: smit hacmp.
2. Go to System Management (C-SPOC) -> HACMP Resource Group and Application Management -> Move a Resource Group to Another Node and press Enter. The Select a Resource Group SMIT dialog is displayed.
3. Move the cursor to resource group rg1, as shown in Figure 4-52, and press Enter.

+--------------------------------------------------------------------------+
                          Select a Resource Group

  Move cursor to desired item and press Enter. Use arrow keys to scroll.

  #
  # Resource Group               State            Node(s) / Site
  # rg1                          ONLINE           tivaix1 /
  # rg2                          ONLINE           tivaix2 /

  F1=Help       F2=Refresh      F3=Cancel       F8=Image
  F10=Exit      Enter=Do        /=Find          n=Find Next
+--------------------------------------------------------------------------+
Figure 4-52 Select a Resource Group SMIT dialog

4. Move the cursor to destination node tivaix2, as shown in Figure 4-53 on page 295, and press Enter.


+--------------------------------------------------------------------------+
                         Select a Destination Node

  Move cursor to desired item and press Enter. Use arrow keys to scroll.

  # To choose the highest priority available node for the
  # resource group, and to remove any Priority Override Location
  # that is set for the resource group, select
  # "Restore_Node_Priority_Order" below.
      Restore_Node_Priority_Order
  # To choose a specific node, select one below.
  #
  # Node                         Site
      tivaix2

  F1=Help       F2=Refresh      F3=Cancel       F8=Image
  F10=Exit      Enter=Do        /=Find          n=Find Next
+--------------------------------------------------------------------------+
Figure 4-53 Select a Destination Node SMIT dialog

5. The Move a Resource Group SMIT dialog is displayed as in Figure 4-54 on page 296. Press Enter to start moving resource group rg1 to destination node tivaix2.


                            Move a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group to be Moved                          rg1
  Destination Node                                    tivaix2
  Persist Across Cluster Reboot?                      false

F1=Help         F2=Refresh      F3=Cancel       F4=List
F5=Reset        F6=Command      F7=Edit         F8=Image
F9=Shell        F10=Exit        Enter=Do

Figure 4-54 Move a Resource Group SMIT screen

6. A COMMAND STATUS SMIT screen displays the progress of the resource group move. It takes about two minutes to complete the resource group move in our environment (it might take longer, depending upon your environment's specific details). When the resource group move is complete, the COMMAND STATUS screen displays the results of the move. This is shown in Figure 4-55 on page 297, where we move resource group rg1 to cluster node tivaix2.


                                COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Attempting to move group rg1 to node tivaix2.

Waiting for cluster to process the resource group movement request.....

Waiting for the cluster to stabilize............

Resource group movement successful.
Resource group rg1 is online on node tivaix2.

-----------------------------------------------------------------------------
Group Name           Type        State        Location        Priority Override
[MORE...8]

F1=Help         F2=Refresh      F3=Cancel       F6=Command
F8=Image        F9=Shell        F10=Exit        /=Find
n=Find Next

Figure 4-55 COMMAND STATUS SMIT screen for moving a resource group

7. Repeat the process of moving resource groups in comprehensive patterns to verify that all possible resource group moves can be performed by HACMP. Table 4-3 lists all the resource group moves that we performed to test all possible combinations. (Note that you have already performed the resource group move listed in the first line of this table.)
Table 4-3 Resource group movement combinations to test

Resource Group   Destination Node   Resource Groups in tivaix1   Resource Groups in tivaix2
                                    after move                   after move
rg1              tivaix2            none                         rg1, rg2
rg2              tivaix1            rg2                          rg1
rg1              tivaix1            rg1, rg2                     none
rg2              tivaix2            rg1                          rg2


Of course, if you add more cluster nodes to a mutual takeover configuration, you will need to test more combinations of resource group moves. We recommend that you automate the testing if possible for clusters of six or more cluster nodes.
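As a starting point for such automation, the sketch below is purely illustrative: our two node names and two resource group names are hard-coded, and it only prints the same move sequence as Table 4-3. A real test driver would execute each printed step (through SMIT, or the clRGmove utility shipped with HACMP) and verify cluster state between moves.

```shell
# Sketch: print the resource group move sequence of Table 4-3 -- each group
# is moved to the other node, then each group is moved back home.
plan_moves() {
  loc_rg1=tivaix1
  loc_rg2=tivaix2
  for rg in rg1 rg2 rg1 rg2; do
    eval cur=\$loc_$rg                 # current location of this group
    if [ "$cur" = tivaix1 ]; then dest=tivaix2; else dest=tivaix1; fi
    echo "move $rg to $dest"
    eval loc_$rg=$dest                 # record the new location
  done
}
plan_moves
```

For a larger cluster, the same idea generalizes: track each group's current location, and generate one move per (group, destination) pair you want covered, ending with every group back on its home node.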

Live test of HACMP fallover


After testing HACMP manually, perform a live test of its fallover capabilities.

Restriction: Do not perform this procedure unless you are absolutely certain that all users are logged off the node and that restarting the node hardware is allowed. This procedure involves restarting the node, which can lead to lost data if it is performed while users are still logged into the node.

A live test ensures that HACMP performs as expected during fallover and fallback incidents. To perform a live test of HACMP in our environment:
1. Make sure that HACMP is running on all cluster nodes before starting this operation.
2. On the node where you want to simulate a catastrophic failure, run the sync command several times, followed by the halt command:
sync ; sync ; sync ; halt -q

This flushes disk buffers to the hard disks and immediately halts the machine, simulating a catastrophic failure. Running sync multiple times is not strictly necessary on modern AIX systems, but it is performed as a best practice measure. If the operation is successful, the terminal displays the following message:
....Halt completed....

In our environment, we ran the halt command on tivaix2. 3. If you are logged in remotely to the node, your remote connection is disconnected shortly after this message is displayed. To verify the success of the test, log into the node that will accept the failed nodes resource group(s) and inspect the resource groups reported for that node using the lsvg, ifconfig and clRGinfo commands. In our environment, we logged into tivaix2, then ran the halt command. We then logged into tivaix1, and ran the lsvg, ifconfig, and clRGinfo commands to identify the volume groups, service label/service IP addresses, and resource groups that fall over from tivaix2, as shown in Example 4-45.
Example 4-45 Using commands on tivaix1 to verify that tivaix2 falls over to tivaix1

[root@tivaix1:/home/root] hostname
tivaix1
[root@tivaix1:/home/root] lsvg -o


tiv_vg2
tiv_vg1
rootvg
[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     Type        State        Location
-----------------------------------------------------------------------------
rg1            cascading   ONLINE       tivaix1
                           OFFLINE      tivaix2

rg2            cascading   OFFLINE      tivaix2
                           ONLINE       tivaix1

Note how volume group tiv_vg2 and the service IP label/IP address 9.3.4.4, both normally found on tivaix2, fall over to tivaix1. Also note that resource group rg2 is listed in the OFFLINE state for tivaix2, but in the ONLINE state for tivaix1.

4. If you would like a simple list of the resource groups that are in the ONLINE state on a specific node, run the short script shown in Example 4-46 on the node you want to inspect, replacing the string tivaix1 with the cluster node of your choice:
Example 4-46 List resource groups in ONLINE state for a node

/usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | grep tivaix1 | \
    awk -F':' '{ print $1 }'

In our environment, this script is run on tivaix1 and returns the results shown in Example 4-47 on page 300. This indicates that resource group rg2, which used to run on cluster node tivaix2, is now on cluster node tivaix1.


Example 4-47 Obtain a simple list of resource groups that are in the ONLINE state on a specific node

[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | \
> grep tivaix1 | awk -F':' '{ print $1 }'
rg1
rg2

5. After the test, power the halted node back on. In our environment, we powered tivaix2 back on.
6. Start HACMP on the node that was halted after it powers back on. The node reintegrates back into the cluster.
7. Verify that Cascading Without Fallback (CWOF) works. In our environment, we made sure that resource group rg2 still resides on cluster node tivaix1.
8. Move the resource group back to its original node, using the preceding procedure for testing resource group moves. In our environment, we moved resource group rg2 to tivaix2.
9. Repeat the operation for other potential failure modes. In our environment, we tested halting cluster node tivaix1, and verified that resource group rg1 moved to cluster node tivaix2.

Configure HACMP to start on system restart


When you are satisfied with the verification of HACMP's functionality, configure AIX to automatically start the cluster subsystems when the node starts. The node then automatically joins the cluster when the machine restarts.
1. Enter: smit hacmp.
2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter to configure HACMP's cluster start attributes. The Start Cluster Services SMIT dialog is displayed as shown in Figure 4-56 on page 301.


                              Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                restart                +
  Start Cluster Services on these nodes              [tivaix2]               +
  BROADCAST message at startup?                       true                   +
  Startup Cluster Lock Services?                      false                  +
  Startup Cluster Information Daemon?                 true                   +
  Reacquire resources after forced down ?             false                  +

F1=Help         F2=Refresh      F3=Cancel       F4=List
F5=Reset        F6=Command      F7=Edit         F8=Image
F9=Shell        F10=Exit        Enter=Do

Figure 4-56 How to start HACMP on system restart

3. In the Start now, on system restart or both field, press Tab to change the value to restart as shown in Figure 4-56, then press Enter so the cluster subsystems will start when the machine restarts.

HACMP now starts on the cluster nodes automatically when each node restarts.
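Selecting restart causes HACMP to arrange for its startup script to run at boot via an entry in /etc/inittab. The exact entry text varies by HACMP version, so the sketch below (our own, not an HACMP tool) just scans inittab-style input for a reference to rc.cluster; the sample inittab line is illustrative, not a verbatim HACMP entry, and here we feed captured text rather than reading the live /etc/inittab.

```shell
# Sketch: confirm that an inittab-style file contains an rc.cluster entry,
# i.e. that cluster services are configured to start at boot.
has_cluster_entry() { grep -c 'rc\.cluster'; }

cat <<'EOF' | has_cluster_entry
init:2:initdefault:
srcmstr:2:respawn:/usr/sbin/srcmstr
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.cluster -boot  # illustrative entry
EOF
```

A count of zero on a node that should auto-start its cluster services is worth investigating before the next scheduled reboot, rather than after it.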

Verify IBM Tivoli Workload Scheduler fallover


When you halt cluster nodes during the testing described in "Live test of HACMP fallover" on page 298, IBM Tivoli Workload Scheduler also starts appropriately when a resource group is moved. Once you verify that a resource group's disk and network resources have moved, you must verify that IBM Tivoli Workload Scheduler itself functions on its new cluster node (or in HACMP terms, verify that the application server resource of the resource group functions on the new cluster node).

In our environment, we perform the live test of HACMP operation at least twice: once to test HACMP resource group moves of disk and network resources in response to a sudden halt of a cluster node, and again while verifying that IBM Tivoli Workload Scheduler is running on the appropriate cluster node(s).


To verify that IBM Tivoli Workload Scheduler is running during a test of a cluster node fallover from tivaix2 to tivaix1:
1. Log into the surviving cluster node as any user.
2. Run the following command:
ps -ef | grep -v grep | grep maestro

The output should be similar to Example 4-48. Note that there are two instances of IBM Tivoli Workload Scheduler, because there are two instances of the processes batchman, netman, jobman, and mailman. Each pair of instances is made up of one process owned by the TWSuser account maestro, and another owned by maestro2.
Example 4-48 Sample output of command to verify IBM Tivoli Workload Scheduler is moved by HACMP

[root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
  maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
 maestro2 15712     1   0 18:57:44      -  0:00 /usr/maestro2/bin/netman
 maestro2 26840 15712   0 18:57:55      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
  maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
     root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
     root 35960 40926   0 18:57:56      -  0:00 /usr/maestro2/bin/jobman
  maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
 maestro2 40926 26840   0 18:57:56      -  0:00 /usr/maestro2/bin/batchman -parm 32000

The command should be repeated while testing that CWOF works. If CWOF works, then the output remains identical after the halted cluster node reintegrates with the cluster.

The command should be repeated again to verify that falling back works. In our environment, after moving a resource group back to the reintegrated cluster node, so tivaix1 and tivaix2 each have their original resource groups, the output of the command on tivaix1 shows just one set of IBM Tivoli Workload Scheduler processes, as shown in Example 4-49.
Example 4-49 IBM Tivoli Workload Scheduler processes running on tivaix1 after falling back resource group rg2 to tivaix2

[root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
  maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
  maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
     root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
  maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE


The output of the command on tivaix2 in this case also shows only one instance of IBM Tivoli Workload Scheduler. The process IDs are different, but the processes are otherwise the same, as shown in Example 4-50.
Example 4-50 IBM Tivoli Workload Scheduler processes running on tivaix2 after falling back resource group rg2 to tivaix2

[root@tivaix2:/home/root] ps -ef | grep -v grep | grep maestro
 maestro2 17926 39660   0 19:02:17      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
 maestro2 39660     1   0 19:02:06      -  0:00 /usr/maestro2/bin/netman
     root 47242 47366   0 19:02:19      -  0:00 /usr/maestro2/bin/jobman
 maestro2 47366 17926   0 19:02:18      -  0:00 /usr/maestro2/bin/batchman -parm 32000
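Counting engine instances by eye works for two engines, but the check can be made mechanical. The helper below is our own sketch: it counts netman processes in ps-style output, relying on the fact (visible in the examples above) that each IBM Tivoli Workload Scheduler instance runs exactly one netman. Here it is fed captured ps lines rather than live ps output.

```shell
# Sketch: count IBM Tivoli Workload Scheduler instances in ps output by
# counting netman processes (one netman per scheduler engine).
count_engines() { grep -c '/bin/netman$'; }

cat <<'EOF' | count_engines
maestro  30738     1 0 15:56:29 - 0:00 /usr/maestro/bin/netman
maestro2 15712     1 0 18:57:44 - 0:00 /usr/maestro2/bin/netman
root     35410 13440 0 15:56:42 - 0:00 /usr/maestro/bin/jobman
EOF
```

Immediately after a fallover the surviving node should report a count of 2; after CWOF reintegration the count stays at 2; and after falling back, each node should report 1.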

4.1.11 Add IBM Tivoli Management Framework


After IBM Tivoli Workload Scheduler is configured for HACMP and made highly available, you can add IBM Tivoli Management Framework so that the Job Scheduling Console component of IBM Tivoli Workload Scheduler can be used. In this section we show how to plan, install and configure IBM Tivoli Management Framework for a highly available installation of IBM Tivoli Workload Scheduler. The steps include:

- "Planning for IBM Tivoli Management Framework" on page 303
- "Planning the installation sequence" on page 312
- "Stage installation media" on page 313
- "Install base Framework" on page 315
- "Load Tivoli environment variable in .profile files" on page 318
- "Install Tivoli Framework components and patches" on page 318
- "Add IP alias to oserv" on page 320
- "Install IBM Tivoli Workload Scheduler Framework components" on page 322
- "Create additional Connectors" on page 328
- "Configure Framework access" on page 330
- "Interconnect Framework servers" on page 331
- "How to log in using the Job Scheduling Console" on page 339

The details of each step follow.

Planning for IBM Tivoli Management Framework


In this section we show the entire process of iteratively planning the integration of IBM Tivoli Management Framework into an HACMP environment specifically configured for IBM Tivoli Workload Scheduler. We show successively more functional configurations of IBM Tivoli Management Framework.

Note: While we discuss this process after showing you how to configure HACMP for IBM Tivoli Workload Scheduler in this redbook, in an actual deployment this planning occurs alongside the planning for HACMP and IBM Tivoli Workload Scheduler.

Configuring multiple instances of IBM Tivoli Management Framework on the same operating system image is not supported by IBM Support. In our highly available IBM Tivoli Workload Scheduler environment of mutual takeover nodes, this means we cannot use two or more instances of IBM Tivoli Management Framework on a single cluster node. In other words, IBM Tivoli Management Framework cannot be configured as an application server in a resource group configured for mutual takeover in a cluster. At the time of writing, while the configuration is technically feasible and even demonstrated in IBM publications such as the IBM Redbook High Availability Scenarios for Tivoli Software, SG24-2032, IBM Support does not sanction this configuration.

Due to this constraint, we install an instance of IBM Tivoli Management Framework on a local drive on each cluster node. We then create a Connector for both cluster nodes on each instance of IBM Tivoli Management Framework.

The Job Scheduling Console is the primary component of IBM Tivoli Workload Scheduler that uses IBM Tivoli Management Framework. It uses the Job Scheduling Services component in IBM Tivoli Management Framework. The primary object for IBM Tivoli Workload Scheduler administrators to manage in the Job Scheduling Services is the Connector. A Connector holds the specific directory location that an IBM Tivoli Workload Scheduler scheduling engine is installed into.
In our environment, this is /usr/maestro for TWS Engine1 that normally runs on tivaix1 and is configured for resource group rg1, and /usr/maestro2 that normally runs on tivaix2 and is configured for resource group rg2. In our environment, under normal operation the relationship of Connectors to IBM Tivoli Workload Scheduler engines and IBM Tivoli Management Framework on cluster nodes is as shown in Figure 4-57 on page 305.


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Figure 4-57 Relationship of IBM Tivoli Workload Scheduler, IBM Tivoli Management Framework, Connectors, and Job Scheduling Consoles during normal operation of an HACMP Cluster (the figure shows TWS Engine1 in /usr/maestro served by Framework1/Connector1 on tivaix1 at 9.3.4.3 port 94, TWS Engine2 in /usr/maestro2 served by Framework2/Connector2 on tivaix2 at 9.3.4.4 port 94, and Job Scheduling Consoles connecting to both)

We use Job Scheduling Console Version 1.3 Fix Pack 1; best practice calls for using at least this level of the Job Scheduling Console or later because it addresses many user interface issues. Its prerequisite is the base install of Job Scheduling Console Version 1.3 that came with your base installation media for IBM Tivoli Workload Scheduler. If you do not already have it installed, download Fix Pack 1 from:
ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_1.3/1.3-JSC-FP01

You can use the environment in this initial configuration as is. Users can log into either TWS Engine1 or TWS Engine2 by logging into the corresponding service IP address. Users can even log into both, but that requires running two instances of the Job Scheduling Console. Figure 4-58 on page 306 shows the display of a user's Microsoft Windows 2000 computer running two instances of Job Scheduling Console. Each instance of the Job Scheduling Console is logged into a different cluster node as the root user. To run two instances of Job Scheduling Console, simply run it twice.


Figure 4-58 Viewing multiple instances of IBM Tivoli Workload Scheduler on separate cluster nodes on a single display

Note how in the Job Scheduling Console window for Administrator Root_tivaix1-region (root@tivaix1), the scheduling engine for TIVAIX2 is unavailable. The engine for TIVAIX2 is marked by a small icon badge that looks like a red circle with a white X inside it, as shown in Figure 4-59 on page 307.


Figure 4-59 Available scheduling engines when logged into tivaix1 during normal operation

In the Job Scheduling Console window for Administrator Root_tivaix2-region (root@tivaix2), the reverse situation exists: the scheduling engine for TIVAIX1 is unavailable. The engine for TIVAIX1 is similarly marked unavailable as shown in Figure 4-60.

Figure 4-60 Available scheduling engines when logged into tivaix2 during normal operation

This happens because in our environment we actually configure two Connectors (one for each instance of IBM Tivoli Workload Scheduler) on each instance of IBM Tivoli Management Framework, as shown in Figure 4-61 on page 308. If we did not configure multiple Connectors in this manner, then, for example, when resource group rg2 on tivaix2 falls over to tivaix1, no Connector for TWS Engine2 would exist on tivaix1 after the fallover. In normal operation, when a user logs into tivaix1, they use the Connector for TWS Engine1 (called Connector1 in Figure 4-61 on page 308). But on tivaix1 the Connector for TWS Engine2 does not refer to an active instance of IBM Tivoli Workload Scheduler on tivaix1, because /usr/maestro2 is mounted and in use on tivaix2.


Figure 4-61 How multiple instances of the Connector work during normal operation (the figure shows both Connector1 and Connector2 defined on each of Framework1 on tivaix1 and Framework2 on tivaix2, with only the Connector for the locally mounted engine active on each node)

If resource groups rg1 and rg2 are running on a single cluster node, each instance of IBM Tivoli Workload Scheduler in each resource group requires its own Connector. This is why we create two Connectors on each instance of IBM Tivoli Management Framework. The Job Scheduling Console clients connect to IBM Tivoli Workload Scheduler through the IBM Tivoli Management Framework oserv process, which listens on the interfaces that are assigned the service IP labels.

For example, consider the fallover scenario where tivaix2 falls over to tivaix1, causing resource group rg2 to fall over to tivaix1. As part of this resource group move, TWS Engine2 on /usr/maestro2 is mounted on tivaix1. Connector2 on tivaix1 then determines that /usr/maestro2 contains a valid instance of IBM Tivoli Workload Scheduler, namely TWS Engine2. IBM Tivoli Management Framework is configured to listen on both tivaix1_svc (9.3.4.3) and tivaix2_svc (9.3.4.4). Because HACMP moves these service IP labels as part of the resource group, both scheduling engines TWS Engine1 and TWS Engine2 remain available to Job Scheduling Console users who log into either tivaix1_svc or tivaix2_svc, even though both service IP labels in this fallover scenario reside on a single cluster node (tivaix1).
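Because only the engine whose filesystem is currently mounted on a node is reachable through its Connector there, administrators sometimes script a quick check of which engine homes are present on a node. The following is a minimal sketch of such a check, not a product utility; the engine home paths are the ones used in this redbook, and the `components` marker file is only an illustrative assumption (any file that exists in every installed engine home would do):

```shell
#!/bin/sh
# Report which IBM Tivoli Workload Scheduler engine homes are present
# on this node. An engine home is treated as "active" here if its
# directory exists and contains a marker file; in a running cluster the
# filesystem is mounted only on the node that owns the resource group.

engine_active() {
    dir=$1
    # "components" is only an illustrative marker file -- substitute
    # any file that exists in every installed engine home.
    [ -d "$dir" ] && [ -f "$dir/components" ]
}

report_engines() {
    for dir in "$@" ; do
        if engine_active "$dir" ; then
            echo "$dir: active on this node"
        else
            echo "$dir: not mounted here"
        fi
    done
}
```

During normal operation, `report_engines /usr/maestro /usr/maestro2` would report one engine per node; after the fallover scenario described here, both would be reported as active on tivaix1.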


When a Job Scheduling Console session starts, the instance of IBM Tivoli Workload Scheduler it connects to creates authentication tokens for the session. These tokens are held in memory. When the cluster node hosting that instance of IBM Tivoli Workload Scheduler falls over to another cluster node, the authentication tokens in memory are lost.

Note: Users working through the Job Scheduling Console against the instance of IBM Tivoli Workload Scheduler on the cluster node that fails must exit their session and log in through the Job Scheduling Console again. Because the service IP labels are still valid, users simply log into the same service IP label they originally used.

As far as Job Scheduling Console users are concerned, if a fallover occurs, they simply log back into the same IP address or hostname. Figure 4-62 shows the fallover scenario where tivaix2 falls over to tivaix1, and its effect upon the Connectors.

Figure 4-62 Multiple instances of Connectors after tivaix2 falls over to tivaix1 (the figure shows tivaix1 now hosting both TWS Engine1 in /usr/maestro and TWS Engine2 in /usr/maestro2, with Framework1's Connector1 and Connector2 reachable through both 9.3.4.3 and 9.3.4.4 on port 94, while tivaix2 and its Framework2 Connectors are marked failed)


Note how Job Scheduling Console sessions that were connected to 9.3.4.4 on port 94 used to communicate with tivaix2, but now communicate instead with tivaix1. Users in these sessions see an error dialog window similar to the following figure the next time they attempt to perform an operation.

Figure 4-63 Sample error dialog box in Job Scheduling Console indicating possible fallover of cluster node
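One way to spare users guesswork when this dialog appears is to notify them automatically whenever the cluster moves a resource group. The fragment below is only a sketch of such a notification helper; it is not part of HACMP, and the event name, resource group, and recipient list are hypothetical placeholders a site would replace. The sketch just composes and prints the message; for real delivery, pipe it to mail, a pager gateway, or an instant-messaging bridge:

```shell
#!/bin/sh
# Sketch of a fallover notification helper. HACMP can invoke a script
# like this from a post-event; the two arguments used here (event name
# and resource group) are assumptions about how a site might call it.
EVENT=${1:-node_down_complete}
RG=${2:-rg2}
RECIPIENTS="tws-users@example.com"   # hypothetical distribution list

notify_fallover() {
    msg="HACMP event $EVENT: resource group $RG has moved.
Job Scheduling Console users connected to this engine must log out
and log back into the same service IP label they originally used."
    # For real delivery, replace the echo with something like:
    #   echo "$msg" | mail -s "TWS fallover: $RG" $RECIPIENTS
    echo "$msg"
}

notify_fallover
```

A second invocation of the same helper, after the resource group is returned to service, covers the "all clear" notification recommended in the text.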

Users should be trained to identify when this dialog indicates a cluster node failure. Best practice is to arrange for appropriate automatic notification whenever a cluster fallover occurs, whether by e-mail, pager, instant messaging, or other means, and to send another notification when the affected resource group(s) are returned to service. When Job Scheduling Console users receive the second notification, they can proceed to log back in.

Once the resource group falls over, understanding when and how Connectors recognize a scheduling engine is key to knowing why certain scheduling engines appear after certain actions.

Note: While Job Scheduling Console users from the failed cluster node who log in again will see both scheduling engines, Job Scheduling Console users on the surviving cluster node will not see both engines until at least one user logs into the instance of IBM Tivoli Workload Scheduler that fell over, and they themselves log out and log back in.

The scheduling engine that falls over is not available to the Job Scheduling Console of the surviving node until two conditions are met, in the following order:

1. A Job Scheduling Console session against the engine that fell over is started. In the scenario we are discussing, where tivaix2 falls over to tivaix1, this means Job Scheduling Console users must log into tivaix2_svc.

2. The Job Scheduling Console users who originally logged into tivaix1_svc (the users of the surviving node, in other words) log out and log back into tivaix1_svc.


When these conditions are met, Job Scheduling Console users on the surviving node see a scheduling engine pane as shown in Figure 4-64.

Figure 4-64 Available scheduling engines on tivaix1 after tivaix2 falls over to it

Only after a Job Scheduling Console session communicates with the Connector for a scheduling engine is that scheduling engine recognized by other Job Scheduling Console sessions that connect later. Job Scheduling Console sessions that are already connected will not recognize the newly started scheduling engine, because identification of scheduling engines occurs only once, during Job Scheduling Console startup.

While the second iteration of the design is a workable solution, it is still somewhat cumbersome, because it requires users who need to work with both scheduling engines to remember a set of rules. Fortunately, there is one final refinement to our design that helps address some of this awkwardness.

The TMR interconnection feature of IBM Tivoli Management Framework allows objects on one instance of IBM Tivoli Management Framework to be managed by another instance, and vice versa. We used a two-way interconnection between the IBM Tivoli Management Framework instances on the two cluster nodes in the environment we used for this redbook to expose the Connectors on each cluster node to the other cluster node. Now when tivaix2 falls over to tivaix1, Job Scheduling Console users see the available scheduling engines, as shown in Figure 4-65.

Figure 4-65 Available Connectors in interconnected Framework environment after tivaix2 falls over to tivaix1

Note that we now name the Connectors by the cluster node and resource group they are used for. So Connector TIVAIX1_rg1 is for resource group rg1 (that is, scheduling engine TWS Engine1) on tivaix1. In Figure 4-65, we see that Connector TIVAIX1_rg2 is active. It is for resource group rg2 (that is, TWS Engine2) on tivaix1, and it is active only when tivaix2 falls over to tivaix1. Connector TIVAIX2_rg1 is used if resource group rg1 falls over to tivaix2. Connector TIVAIX2_rg2 would normally be active, but because resource group rg2 has fallen over to tivaix1, it is inactive in the preceding figure. During normal operation of the cluster, the active Connectors are TIVAIX1_rg1 and TIVAIX2_rg2, as shown in Figure 4-66.

Figure 4-66 Available Connectors in interconnected Framework environment during normal cluster operation

In this section we show how to install IBM Tivoli Management Framework Version 4.1, with all available patches as of the time of writing, into an HACMP Cluster configured to make IBM Tivoli Workload Scheduler highly available. We specifically show how to install on tivaix1 in the environment we used for this redbook. Installing on tivaix2 is similar, except that the IP hostname is changed where applicable.

Planning the installation sequence


Before installing, plan the sequence of packages to install. The publication Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, describes in detail what needs to be installed. Figure 4-67 on page 313 shows the sequence and dependencies of packages we planned for IBM Tivoli Management Framework Version 4.1 for the environment used for this redbook.


Figure 4-67 IBM Tivoli Framework 4.1.0 application and patch sequence and dependencies as of December 2, 2003 (the figure shows the TMF410 base install followed by patches 4.1-TMF-0008, 4.1-TMF-0014, 4.1-TMF-0015, 4.1-TMF-0016, 4.1-TMF-0017, 4.1-TMF-0032, and 4.1-TMF-0034, with odadmin reexec steps between patch groups)

Stage installation media


We first stage the installation media on a hard disk for ease of installation. If your system does not have sufficient disk space to allow this, you can copy the media to a system that does have enough disk space and use Network File System (NFS), Samba, Andrew File System (AFS) or similar remote file systems to mount the media over the network. In our environment, we created directories and copied the contents of the media and patches to the directories as shown in Table 4-4. The media was copied to both cluster nodes tivaix1 and tivaix2.
Table 4-4 Installation media directories used in our environment

  Sub-directory under /usr/sys/inst.images/   Description of contents or disc title (or electronic download)
  tivoli                                      Top level of installation media directory.
  tivoli/fra                                  Top level of IBM Tivoli Management Framework media.
  tivoli/fra/FRA410_1of2                      Tivoli Management Framework v4.1 1 of 2
  tivoli/fra/FRA410_2of2                      Tivoli Management Framework v4.1 2 of 2
  tivoli/fra/41TMFnnn                         Extracted tar file contents of patch 4.1-TMF-0nnn.
  tivoli/wkb                                  Top level of IBM Tivoli Workload Scheduler media.
  tivoli/wkb/TWS820_1                         IBM Tivoli Workload Scheduler V8.2 1 of 2
  tivoli/wkb/TWS820_2                         IBM Tivoli Workload Scheduler V8.2 2 of 2
  tivoli/wkb/8.2-TWS-FP01                     IBM Tivoli Workload Scheduler V8.2 Fix Pack 1
  tivoli/wkb/JSC130_1                         Job Scheduling Console V1.3 1 of 2
  tivoli/wkb/JSC130_2                         Job Scheduling Console V1.3 2 of 2
  tivoli/wkb/1.3-JSC-FP01                     Job Scheduling Console V1.3 Fix Pack 1

You can download the patches for IBM Tivoli Management Framework Version 4.1 from:
ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1

Note that we extracted the contents of the tar file of each patch into the corresponding patch directory, such that the file PATCH.LST is in the top level of the patch directory. For example, for patch 4.1-TMF-0008, we downloaded the tar file:
ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1/4.1TMF-0008/4.1-TMF-0008.tar

Then we expanded the tar file in /usr/sys/inst.images/tivoli/fra, resulting in a directory called 41TMF008. One of the files beneath that directory is the PATCH.LST file. Example 4-51 shows the top two levels of the directory structure.
Example 4-51 Organization of installation media

[root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/
./    ../   fra/   wkb/
[root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/*
/usr/sys/inst.images/tivoli/fra:
./          41TMF014/   41TMF017/   FRA410_1of2/
../         41TMF015/   41TMF032/   FRA410_2of2/
41TMF008/   41TMF016/   41TMF034/
/usr/sys/inst.images/tivoli/wkb:
./    1.3-JSC-FP01/   JSC130_1/   TWS820_1/
../   8.2-TWS-FP01/   JSC130_2/   TWS820_2/
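The staging convention described here, with each patch tar file expanded so that PATCH.LST lands at the top of its own 41TMFnnn directory, is easy to automate. A minimal sketch, assuming the tar files have already been downloaded into a scratch directory; both directory arguments are placeholders for site-specific paths:

```shell
#!/bin/sh
# Expand downloaded 4.1-TMF-00nn patch tar files into the staging tree,
# then confirm each patch directory carries PATCH.LST at its top level.
stage_patches() {
    DOWNLOADS=$1      # e.g. a scratch directory holding the .tar files
    STAGE=$2          # e.g. /usr/sys/inst.images/tivoli/fra
    for tarfile in "$DOWNLOADS"/*.tar ; do
        [ -f "$tarfile" ] || continue
        # Each patch tar file expands into its own 41TMFnnn directory.
        ( cd "$STAGE" && tar -xf "$tarfile" )
    done
    # Verify: every 41TMF* directory must contain PATCH.LST.
    for dir in "$STAGE"/41TMF* ; do
        if [ -f "$dir/PATCH.LST" ] ; then
            echo "$dir: ok"
        else
            echo "$dir: missing PATCH.LST" >&2
        fi
    done
}
```

Running `stage_patches /tmp/patches /usr/sys/inst.images/tivoli/fra` on each cluster node reproduces the layout shown in Example 4-51.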


After staging the media, install the base product as shown in the following section.

Install base Framework


In this section we show how to install IBM Tivoli Management Framework so it is specifically configured for IBM Tivoli Workload Scheduler on HACMP. This enables you to transition the instances of IBM Tivoli Management Framework used for IBM Tivoli Workload Scheduler to a mutual takeover environment if that becomes a supported feature in the future. We believe the configuration as shown in this section can be started and stopped directly from HACMP in a mutual takeover configuration.

When installing IBM Tivoli Management Framework on an HACMP Cluster node in support of IBM Tivoli Workload Scheduler, use the primary IP hostname as the hostname for IBM Tivoli Management Framework; an IP alias for the service IP label is added later. When this configuration is used with the multiple Connector object configuration described earlier, Job Scheduling Console users can connect through any instance of IBM Tivoli Management Framework, no matter which cluster nodes fall over.

IBM Tivoli Management Framework consists of a base install and various components. First prepare for the base install by performing the commands shown in Example 4-52 for cluster node tivaix1, in our environment. On tivaix2, we change the IP hostname in the first command shown in bold from tivaix1 to tivaix2.
Example 4-52 Preparing for installation of IBM Tivoli Management Framework 4.1

[root@tivaix1:/home/root] HOST=tivaix1
[root@tivaix1:/home/root] echo $HOST > /etc/wlocalhost
[root@tivaix1:/home/root] WLOCALHOST=$HOST
[root@tivaix1:/home/root] export WLOCALHOST
[root@tivaix1:/home/root] mkdir /usr/local/Tivoli/install_dir
[root@tivaix1:/home/root] cd /usr/local/Tivoli/install_dir
[root@tivaix1:/home/root] /bin/sh /usr/sys/inst.images/tivoli/fra/FRA410_1of2/WPREINST.SH
to install, type ./wserver -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2
[root@tivaix1:/home/root] DOGUI=no
[root@tivaix1:/home/root] export DOGUI

After you prepare for the base install, perform the initial installation of IBM Tivoli Management Framework by running the command shown in Example 4-53 on page 316. You will see output similar to this example; depending upon the speed of your server, it will take 5 to 15 minutes to complete.


On tivaix2 in our environment, we run the same command except we change the third line of the command highlighted in bold from tivaix1 to tivaix2.
Example 4-53 Initial installation of IBM Tivoli Management Framework Version 4.1

[root@tivaix1:/home/root] sh ./wserver -y \
-c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
-a tivaix1 -d \
BIN=/usr/local/Tivoli/bin! \
LIB=/usr/local/Tivoli/lib! \
ALIDB=/usr/local/Tivoli/spool! \
MAN=/usr/local/Tivoli/man! \
APPD=/usr/lib/lvm/X11/es/app-defaults! \
CAT=/usr/local/Tivoli/msg_cat! \
LK=1FN5B4MBXBW4GNJ8QQQ62WPV0RH999P99P77D \
RN=tivaix1-region \
AutoStart=1 SetPort=1 CreatePaths=1 @ForceBind@=yes @EL@=None
Using command line style installation.....
Unless you cancel, the following operations will be executed:
need to copy the CAT (generic) to:
  tivaix1:/usr/local/Tivoli/msg_cat
need to copy the CSBIN (generic) to:
  tivaix1:/usr/local/Tivoli/bin/generic
need to copy the APPD (generic) to:
  tivaix1:/usr/lib/lvm/X11/es/app-defaults
need to copy the GBIN (generic) to:
  tivaix1:/usr/local/Tivoli/bin/generic_unix
need to copy the BUN (generic) to:
  tivaix1:/usr/local/Tivoli/bin/client_bundle
need to copy the SBIN (generic) to:
  tivaix1:/usr/local/Tivoli/bin/generic
need to copy the LCFNEW (generic) to:
  tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40
need to copy the LCFTOOLS (generic) to:
  tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40/bin
need to copy the LCF (generic) to:
  tivaix1:/usr/local/Tivoli/bin/lcf_bundle
need to copy the LIB (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/lib/aix4-r1
need to copy the BIN (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/bin/aix4-r1
need to copy the ALIDB (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/spool/tivaix1.db
need to copy the MAN (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/man/aix4-r1
need to copy the CONTRIB (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/bin/aix4-r1/contrib
need to copy the LIB371 (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/lib/aix4-r1
need to copy the LIB365 (aix4-r1) to:
  tivaix1:/usr/local/Tivoli/lib/aix4-r1
Executing queued operation(s)
Distributing machine independent Message Catalogs
  --> tivaix1 ..... Completed.
Distributing machine independent generic Codeset Tables
  --> tivaix1 .... Completed.
Distributing architecture specific Libraries
  --> tivaix1 ...... Completed.
Distributing architecture specific Binaries
  --> tivaix1 ............. Completed.
Distributing architecture specific Server Database
  --> tivaix1 .......................................... Completed.
Distributing architecture specific Man Pages
  --> tivaix1 ..... Completed.
Distributing machine independent X11 Resource Files
  --> tivaix1 ... Completed.
Distributing machine independent Generic Binaries
  --> tivaix1 ... Completed.
Distributing machine independent Client Installation Bundle
  --> tivaix1 ... Completed.
Distributing machine independent generic HTML/Java files
  --> tivaix1 ... Completed.
Distributing architecture specific Public Domain Contrib
  --> tivaix1 ... Completed.
Distributing machine independent LCF Images (new version)
  --> tivaix1 ............. Completed.
Distributing machine independent LCF Tools
  --> tivaix1 ....... Completed.
Distributing machine independent 36x Endpoint Images
  --> tivaix1 ............ Completed.
Distributing architecture specific 371_Libraries
  --> tivaix1 .... Completed.
Distributing architecture specific 365_Libraries
  --> tivaix1 .... Completed.
Registering installation information...Finished.

Load Tivoli environment variables in .profile files


The Tivoli environment variables contain pointers to important directories that IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework use for many commands. Loading the variables in the .profile file of a user account ensures that these environment variables are always available immediately after logging into the user account. Use the commands shown in Example 4-54 to modify the .profile files of the root and TWSuser user accounts on all cluster nodes to source in all Tivoli environment variables for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework.
Example 4-54 Load Tivoli environment variables

PATH=${PATH}:${HOME}/bin
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
fi
if [ -f `maestro`/tws_env.sh ] ; then
    . `maestro`/tws_env.sh
fi

Also enter these commands on the command line, or log out and log back in to activate the environment variables for the following sections.
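Rather than editing each .profile by hand, the environment block can be appended programmatically, and idempotently, so that re-running an installation script does not duplicate it. A sketch covering only the /etc/Tivoli/setup_env.sh portion of Example 4-54 (the marker comment is our own convention, and the same pattern extends to the tws_env.sh lines):

```shell
#!/bin/sh
# Append the Tivoli environment block to a .profile exactly once,
# using a marker comment to detect a previous run.
add_tivoli_env() {
    profile=$1
    marker='# --- Tivoli environment (added by install script) ---'
    if grep -F -q "$marker" "$profile" 2>/dev/null ; then
        echo "$profile: already updated"
        return 0
    fi
    cat >> "$profile" <<'EOF'
# --- Tivoli environment (added by install script) ---
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
fi
EOF
    echo "$profile: updated"
}
```

An install script would call this once per account, for example `add_tivoli_env /home/root/.profile` and again for the TWSuser's .profile, on every cluster node.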

Install Tivoli Framework components and patches


After the base install is complete, you can install all remaining Framework components and patches by running the script shown in Example 4-55 on page 319. If you use this script on tivaix2, change the line that starts with the string HOST= so that tivaix1 is replaced with tivaix2.


Example 4-55 Script for installing IBM Tivoli Management Framework Version 4.1 with patches

#!/bin/ksh
if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
fi

reexec_oserv() {
    echo "Reexecing object dispatchers..."
    if [ `odadmin odlist list_od | wc -l` -gt 1 ] ; then
        #
        # Determine if necessary to shut down any clients
        tmr_hosts=`odadmin odlist list_od | head -1 | cut -c 36-`
        client_list=`odadmin odlist list_od | grep -v ${tmr_hosts}$`
        if [ "${client_list}" = "" ] ; then
            echo "No clients to shut down, skipping shut down of clients..."
        else
            echo "Shutting down clients..."
            odadmin shutdown clients
            echo "Waiting for all clients to shut down..."
            sleep 30
        fi
    fi
    odadmin reexec 1
    sleep 30
    odadmin start clients
}

HOST="tivaix1"
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRE130 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JHELP41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JCF41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRIM41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i MDIST2GU $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISDEPOT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISCLNT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i ADE $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i AEF $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF008 -y -i 41TMF008 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF014 -y -i 41TMF014 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF015 -y -i 41TMF015 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF016 -y -i 41TMF016 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2928 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2929 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2931 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2932 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2962 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2980 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2984 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2986 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2987 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2989 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF034 -y -i 41TMF034 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF032 -y -i JRE130_0 $HOST

This completes the installation of IBM Tivoli Management Framework Version 4.1. After installing IBM Tivoli Management Framework, configure it to meet the requirements of integrating with IBM Tivoli Workload Scheduler over HACMP.

Add IP alias to oserv


Installing IBM Tivoli Management Framework using the primary IP hostname of the server binds the Framework server (also called oserv) to the corresponding IP address; it listens for Framework network traffic only on this IP address. This makes it easy to start IBM Tivoli Management Framework before starting HACMP.

In our environment, we also need oserv to listen on the service IP address. The service IP label/address is moved between cluster nodes along with its parent resource group, but the primary IP hostname remains on the cluster node to ease administrative access (which is why it is called the persistent IP label/address). Job Scheduling Console users depend upon this service IP address, not the primary IP hostname of the server, to access IBM Tivoli Workload Scheduler services.

As a security precaution, IBM Tivoli Management Framework listens only on the IP address it was initially installed against unless this feature is specifically disabled so that it binds against other addresses as well. We show you how to disable this feature in this section.

To add the service IP label as a Framework oserv IP alias, follow these steps:

1. Log in as root user on a cluster node. In our environment, we log in as root user on cluster node tivaix1.

2. Use the odadmin command as shown in Example 4-56 on page 321 to verify the current IP aliases of the oserv, add the service IP label as an IP alias to the oserv, then verify that the service IP label is added to the oserv as an IP alias.


Note that the numeral 1 in the odadmin odlist add_ip_alias command should be replaced by the dispatcher number of your Framework installation.
Example 4-56 Add an IP alias to the Framework oserv server

[root@tivaix1:/home/root] odadmin odlist
Region      Disp Flags Port  IPaddr     Hostname(s)
1369588498     1 ct-   94    9.3.4.194  tivaix1,tivaix1.itsc.austin.ibm.com
[root@tivaix1:/home/root] odadmin odlist add_ip_alias 1 tivaix1_svc
[root@tivaix1:/home/root] odadmin odlist
Region      Disp Flags Port  IPaddr     Hostname(s)
1369588498     1 ct-   94    9.3.4.194  tivaix1,tivaix1.itsc.austin.ibm.com
                             9.3.4.3    tivaix1_svc

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 4-57, the dispatcher number is 7.
Example 4-57 Identify the dispatcher number of a Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region      Disp Flags Port  IPaddr     Hostname(s)
1369588498     7 ct-   94    9.3.4.194  tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number will be something other than 1 if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

3. Use the odadmin command as shown in Example 4-58 to verify that IBM Tivoli Management Framework currently binds against the primary IP hostname, disable the feature, then verify that it is disabled. Note that the numeral 1 in the odadmin set_force_bind command should be replaced by the dispatcher number of your Framework installation.
Example 4-58 Disable set_force_bind object dispatcher option

[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = TRUE
[root@tivaix1:/home/root] odadmin set_force_bind FALSE 1
[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = FALSE

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 4-59 on page 322, the dispatcher number is 7.


Example 4-59 Identify the dispatcher number of a Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region      Disp Flags Port  IPaddr     Hostname(s)
1369588498     7 ct-   94    9.3.4.194  tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number will be something other than 1 if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

Important: Disabling the set_force_bind option can cause unintended side effects for installations of IBM Tivoli Management Framework that also run other IBM Tivoli server products, such as IBM Tivoli Monitoring and IBM Tivoli Configuration Manager. Refer to your IBM service provider for advice on how to address this potential conflict if you plan on deploying other IBM Tivoli server products on top of the instance of IBM Tivoli Management Framework that you use for IBM Tivoli Workload Scheduler. Best practice is to dedicate an instance of IBM Tivoli Management Framework to IBM Tivoli Workload Scheduler, typically on the Master Domain Manager, and not to install other IBM Tivoli server products into it. This simplifies these administrative concerns and does not affect the functionality of a Tivoli Enterprise environment.

4. Repeat the operation on all remaining cluster nodes. For our environment, we repeated the operation on tivaix2, replacing tivaix1 with tivaix2 in the commands.
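Both odadmin steps in this section depend on reading the dispatcher number out of `odadmin odlist` output. That lookup is easy to script; the sketch below parses captured odlist output from standard input rather than calling odadmin itself, so the only assumption is the output layout shown in the preceding examples (the dispatcher number is the second field on the line that names the Framework server's primary IP hostname):

```shell
#!/bin/sh
# Extract the dispatcher number for a given hostname from
# 'odadmin odlist' output supplied on standard input.
dispatcher_for() {
    host=$1
    # Print field 2 of the first line mentioning the hostname.
    awk -v h="$host" '$0 ~ h { print $2; exit }'
}
```

On a live system one might run `disp=$(odadmin odlist | dispatcher_for tivaix1)` and pass `$disp` to `odadmin set_force_bind FALSE` or `odadmin odlist add_ip_alias` instead of hardcoding the numeral 1.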

Install IBM Tivoli Workload Scheduler Framework components


After installing IBM Tivoli Management Framework, install the IBM Tivoli Workload Scheduler Framework components. The components for IBM Tivoli Workload Scheduler Version 8.2 in the environment we use throughout this redbook are:

  - Tivoli Job Scheduling Services v1.2
  - Tivoli TWS Connector 8.2

There are separate versions for Linux environments. See Tivoli Workload Scheduler Job Scheduling Console User's Guide, SH19-4552, to identify the equivalent components for a Linux environment.

Best practice is to back up the Framework object database before installing any Framework components. This enables you to restore the object database to its original state in case the install operation encounters a problem.


Use the wbkupdb command as shown in Example 4-60 to back up the object database.
Example 4-60 Back up the object database of IBM Tivoli Management Framework

[root@tivaix1:/home/root] cd /tmp
[root@tivaix1:/tmp] wbkupdb tivaix1 ; echo DB_`date +%b%d-%H%M`
Starting the snapshot of the database files for tivaix1...
............................................................
..............................
Backup Complete.
DB_Dec09-1958

The last line of the output is produced by the echo command; it returns the name of the backup file created by wbkupdb. All backup files are stored in the directory $DBDIR/../backups. Example 4-61 shows how to list all the available backup files.
Example 4-61 List all available object database backup files
[root@tivaix1:/home/root] ls $DBDIR/../backups
./              ../             DB_Dec08-1705   DB_Dec08-1716
DB_Dec08-1723   DB_Dec08-1724   DB_Dec09-1829

Example 4-61 shows there are five backups taken of the object database on cluster node tivaix1.

Tip: Backing up the object database of IBM Tivoli Management Framework requires that the current working directory that the wbkupdb command is executed from grants write permission to the current user and contains enough disk space to temporarily hold the object database. A common reason wbkupdb fails is that the current working directory either does not grant write permission to the user account running it, or does not have enough space to temporarily hold a copy of the object database directory.

Example 4-62 on page 324 shows how to verify there is enough disk space to run wbkupdb.

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster

323

Example 4-62 Verifying enough disk space in the current working directory for wbkupdb
[root@tivaix1:/tmp] pwd
/tmp
[root@tivaix1:/tmp] du -sk $DBDIR
15764   /usr/local/Tivoli/spool/tivaix1.db
[root@tivaix1:/tmp] df -k /tmp
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd3          1146880    661388   43%      872     1% /tmp
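The manual comparison that Example 4-62 walks through can be automated. The following sketch is ours, not part of the product; it applies the rule of thumb described in the surrounding text that the free space in the working directory should be at least twice the size of the object database directory:

```shell
#!/bin/sh
# check_space <dbdir> <workdir>: succeed when <workdir> has at least twice
# the disk space that <dbdir> currently occupies (a sketch, not a product
# tool; it only automates the du/df comparison shown in Example 4-62).
check_space() {
    dbdir=$1
    workdir=$2
    dbsize=$(du -sk "$dbdir" | awk '{print $1}')         # KB used by the database
    free=$(df -Pk "$workdir" | awk 'NR==2 {print $4}')   # KB available in workdir
    if [ "$free" -ge $((dbsize * 2)) ]; then
        echo "OK: ${free} KB free, database uses ${dbsize} KB"
    else
        echo "Not enough space: ${free} KB free, database uses ${dbsize} KB"
        return 1
    fi
}

# Typical use before a backup (paths as in our environment):
# check_space "$DBDIR" /tmp && wbkupdb tivaix1
```

The df -Pk form is used so that the Available column is always the fourth field, regardless of platform.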

In Example 4-62, the current working directory is /tmp. The du command in the example shows how much space the object database directory occupies. It is measured in kilobytes, and is 15,764 kilobytes in this example (highlighted in bold). The df command in the example shows how much space is available in the current working directory. The third column, labeled Free in the output of the command, shows the available space in kilobytes. In this example, the available disk space in /tmp is 661,388 kilobytes. As long as the latter number is at least twice as large as the former, proceed with running wbkupdb.

If the installation of these critical IBM Tivoli Workload Scheduler components fails, refer to your site's Tivoli administrators for assistance in recovering from the error, and direct them to the file created by wbkupdb (as reported by the echo command).

To install the IBM Tivoli Management Framework components for IBM Tivoli Workload Scheduler:

1. Log in as root user on a cluster node. In our environment, we logged in as root user on tivaix1.

2. Enter the winstall command as shown in Example 4-63 to install Job Scheduling Services.
Example 4-63 Install Job Scheduling Services component on cluster node tivaix1
[root@tivaix1:/home/root] winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN \
-y -i TMF_JSS tivaix1
Checking product dependencies...
Product TMF_3.7.1 is already installed as needed.
Dependency check completed.
Inspecting node tivaix1...
Installing Product: Tivoli Job Scheduling Services v1.2

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
  hosts: tivaix1
  need to copy the CAT (generic) to:
    tivaix1:/usr/local/Tivoli/msg_cat
For the machines in the aix4-r1 class:
  hosts: tivaix1
  need to copy the BIN (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/spool/tivaix2.db

Creating product installation description object...Created.
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix1
Completed.
Distributing architecture specific Binaries --> tivaix1
Completed.
Distributing architecture specific Server Database --> tivaix1
....Product install completed successfully.
Completed.
Registering product installation attributes...Registered.

Note: Both IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.2, SH19-4552 (released for IBM Tivoli Workload Scheduler Version 8.1), on page 26, and IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.3, SC32-1257 (released for IBM Tivoli Workload Scheduler Version 8.2), on page 45, refer to an owner argument to pass to the winstall command to install the Connector. We believe this is incorrect, because the index files TWS_CONN.IND for both versions of IBM Tivoli Workload Scheduler do not indicate support for this argument, and using the argument produces errors in the installation.

3. Enter the winstall command as shown in Example 4-64 on page 326 to install the Connector Framework resource. The command requires two IBM Tivoli Workload Scheduler-specific arguments, twsdir and iname. These arguments create an initial Connector object.

Best practice is to create initial Connector objects on a normally operating cluster. The order that Connector objects are created in does not affect functionality. It is key, however, to ensure that the resource group of the corresponding instance of IBM Tivoli Workload Scheduler the initial Connector is being created for is in the ONLINE state on the cluster node you are working on.


twsdir   Enter the TWShome directory of an active instance of IBM Tivoli Workload Scheduler. The file system of the instance must be mounted and available.

iname    Enter a Connector name for the instance of IBM Tivoli Workload Scheduler.

In our environment, we use /usr/maestro for twsdir, make sure it is mounted, and use TIVAIX1_rg1 as the Connector name for iname because we want to create an initial Connector object for resource group rg1 on tivaix1, as the cluster is in normal operation and resource group rg1 in the ONLINE state on tivaix1 is the normal state.
Example 4-64 Install Connector component for cluster node tivaix1
[root@tivaix1:/home/root] winstall -c \
/usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN -y -i TWS_CONN \
twsdir=/usr/maestro iname=TIVAIX1_rg1 createinst=1 tivaix1
Checking product dependencies...
Product TMF_JSS_1.2 is already installed as needed.
Product TMF_3.7.1 is already installed as needed.
Dependency check completed.
Inspecting node tivaix1...
Installing Product: Tivoli TWS Connector 8.2

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
  hosts: tivaix1
For the machines in the aix4-r1 class:
  hosts: tivaix1
  need to copy the BIN (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/spool/tivaix1.db

Creating product installation description object...Created.
Executing queued operation(s)
Distributing architecture specific Binaries --> tivaix1
.. Completed.
Distributing architecture specific Server Database --> tivaix1
....Product install completed successfully.
Completed.
Registering product installation attributes...Registered.


4. Verify both Framework components are installed using the wlsinst command as shown in the following example. The strings Tivoli Job Scheduling Services v1.2 and Tivoli TWS Connector 8.2 (highlighted in bold in Example 4-65) should display in the output of the command.
Example 4-65 Verify installation of Framework components for IBM Tivoli Workload Scheduler
[root@tivaix1:/home/root] wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Tivoli TWS Connector 8.2
Distribution Status Console, Version 4.1

5. Verify the installation of the initial Connector instance using the wtwsconn.sh command. Pass the same Connector name used for the iname argument in the preceding step as the value to the -n flag argument. Example 4-66 shows the flag argument value TIVAIX1_rg1 (highlighted in bold). In our environment we passed TIVAIX1_rg1 as the value for the -n flag argument.
Example 4-66 Verify creation of initial Connector
[root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg1
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"

The output of the command shows the directory path used as the value for the twsdir argument in the preceding step, repeated on three lines (highlighted in bold in Example 4-66).

6. Repeat the operation for the remaining cluster nodes. In our environment, we repeated the operation for cluster node tivaix2. We used /usr/maestro2 for the twsdir argument and TIVAIX2_rg2 for the iname argument.


Create additional Connectors


The initial Connector objects created as part of the installation of the IBM Tivoli Workload Scheduler Framework components only address one resource group that can run on each cluster node. Create additional Connectors to address all possible resource groups that a cluster node can take over, on all cluster nodes.

To create additional Connector objects:

1. Log in as root user on a cluster node. In our environment we log in as root user on cluster node tivaix1.

2. Use the wlookup command to identify which Connector objects already exist on the cluster node, as shown in Example 4-67.
Example 4-67 Identify which Connector objects already exist on a cluster node
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1

In our environment, the only Connector object that exists is the one created by the installation of the IBM Tivoli Workload Scheduler Framework components, TIVAIX1_rg1, highlighted in bold in Example 4-67.

3. Use the wtwsconn.sh command to create an additional Connector object, as shown in Example 4-68. The command accepts the name of the Connector object to create as the value of the -n flag argument, and the TWShome directory path of the instance of IBM Tivoli Workload Scheduler that the Connector object will correspond to as the value of the -t flag argument. The corresponding resource group does not have to be in the ONLINE state on the cluster node: this step only creates the object, and does not require the presence of the resource group to succeed.

In our environment we created the Connector object TIVAIX1_rg2 to manage resource group rg2 on tivaix1 in case tivaix2 falls over to tivaix1. Resource group rg2 contains scheduling engine TWS Engine2. TWS Engine2 is installed in /usr/maestro2, so we pass /usr/maestro2 as the value to the -t flag argument.
Example 4-68 Create additional Connector object
[root@tivaix1:/home/root] wtwsconn.sh -create -n TIVAIX1_rg2 -t /usr/maestro2
Scheduler engine created
Created instance: TIVAIX1_rg2, on node: tivaix1
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2


4. Verify the creation of the additional Connector objects using the wtwsconn.sh command as shown in Example 4-69.
Example 4-69 Verify creation of additional Connector object
[root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg2
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro2"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro2"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro2"

Pass the name of a new Connector object as the value for the -n flag argument. The output displays the TWShome directory path you used to create the Connector object if the create operation is successful.

5. Repeat the operation for all remaining Connector objects to create on the cluster node. Only create Connector objects for the resource groups that the cluster node can take over. Using the examples in this section, for instance, we would not create any Connector objects on tivaix1 that start with TIVAIX2; the Connector objects TIVAIX2_rg1 and TIVAIX2_rg2 are instead created on tivaix2. In our environment, we did not have any more resource groups to address, so we did not create any more Connectors on tivaix1.

6. Repeat the operation on all remaining cluster nodes. In our environment we created the Connector object TIVAIX2_rg1 as shown in Example 4-70.
Example 4-70 Create additional Connectors on tivaix2
[root@tivaix2:/home/root] wtwsconn.sh -create -n TIVAIX2_rg1 -t /usr/maestro
Scheduler engine created
Created instance: TIVAIX2_rg1, on node: tivaix2
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro
[root@tivaix2:/home/root] wtwsconn.sh -view -n TIVAIX2_rg1
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"
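With more nodes and resource groups, generating the wtwsconn.sh invocations from a small table helps avoid typos in Connector names. This sketch is our own (the node, resource group, and TWShome mappings shown are from our environment); it only prints the commands, so you can review them before running them:

```shell
#!/bin/sh
# print_connector_cmds <NODE> <rg>:<twshome> ...
# Prints one wtwsconn.sh invocation per resource group the node can host
# (a sketch, not a product tool).
print_connector_cmds() {
    node=$1
    shift
    for entry in "$@"; do
        rg=${entry%%:*}          # resource group name
        twshome=${entry#*:}      # TWShome of the matching scheduling engine
        echo "wtwsconn.sh -create -n ${node}_${rg} -t ${twshome}"
    done
}

# On tivaix2, which can host rg1 and rg2:
print_connector_cmds TIVAIX2 rg1:/usr/maestro rg2:/usr/maestro2
```

Drop from the printed list any Connector that already exists (such as the initial Connector created at install time) before executing the remaining commands.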

If you make a mistake creating a Connector, remove the Connector using the wtwsconn.sh command as shown in Example 4-71.
Example 4-71 Remove a Connector
[root@tivaix1:/home/root] wtwsconn.sh -remove -n TIVAIX2
Removed 'MaestroEngine' for 'TIVAIX2' instance
Removed 'MaestroPlan' for 'TIVAIX2' instance
Removed 'MaestroDatabase' for 'TIVAIX2' instance


In Example 4-71 on page 329, the Connector TIVAIX2 is removed.

You can also use wtwsconn.sh to change the one value accepted by a Connector when creating it: the TWShome directory of the instance of IBM Tivoli Workload Scheduler the Connector communicates with. Changing this value is useful to match changes to the location of TWShome if IBM Tivoli Workload Scheduler is moved.

Configure Framework access


After you install IBM Tivoli Management Framework (see "Implementing IBM Tivoli Workload Scheduler in an HACMP cluster" on page 184), configure Framework access for the TWSuser accounts. This gives the TWSuser accounts full access to IBM Tivoli Management Framework so you can add Tivoli Enterprise products like the IBM Tivoli Workload Scheduler Plus Module, and manage IBM Tivoli Workload Scheduler Connectors.

In this redbook we show how to grant access to the root Framework Administrator object. The Tivoli administrators of some sites do not allow this level of access. Consult your Tivoli administrator if this is the case, because other levels of access can be arranged.

Use the wsetadmin command to grant this level of access to your TWSuser accounts. In our environment, we ran the following command as root user to identify which Framework Administrator object to modify:
wlookup -ar Administrator

This command returns output similar to that shown in Example 4-73, taken from tivaix1 in our environment.
Example 4-73 Identify which Framework Administrator object to modify to grant TWSuser account root-level Framework access
[root@tivaix1:/home/root] wlookup -ar Administrator
Root_tivaix1-region   1394109314.1.179#TMF_Administrator::Configuration_GUI#
root@tivaix1          1394109314.1.179#TMF_Administrator::Configuration_GUI#


This shows that the root account is associated with the Administrator object called root@tivaix1. We then used the following command to add the TWSuser accounts to this Administrator object:
wsetadmin -l maestro -l maestro2 root@tivaix1

This grants root-level Framework access to the user accounts maestro and maestro2. Use the wgetadmin command as shown in Example 4-74 to confirm that the TWSuser accounts were added to the root Framework Administrator object. In line 3, the line that starts with the string logins:, the TWSuser accounts maestro and maestro2 (highlighted in bold) indicate these accounts were successfully added to the Administrator object.
Example 4-74 Confirm TWSuser accounts are added to root Framework Administrator object
[root@tivaix1:/home/root] wgetadmin root@tivaix1
Administrator: Root_tivaix1-region
logins: root@tivaix1, maestro, maestro2
roles:
    global  super, senior, admin, user, install_client, install_product, policy
    security_group_any_admin  user
    Root_tivaix1-region  admin, user, rconnect
notice groups: TME Administration, TME Authorization, TME Diagnostics, TME Scheduler
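Checking the logins: line by eye works for one node; across several nodes a small helper can do it. This sketch is ours, not a product tool: it scans captured wgetadmin output for the expected logins, keying on the logins: line described in the text above.

```shell
#!/bin/sh
# has_logins "<wgetadmin output>" user1 [user2 ...]
# Succeeds only when every listed user appears on the "logins:" line of the
# captured output (a sketch; wgetadmin itself is unchanged).
has_logins() {
    out=$1
    shift
    # Pull the logins: line, strip the label and any blanks.
    logins=$(printf '%s\n' "$out" | sed -n 's/^logins: //p' | tr -d ' ')
    for user in "$@"; do
        case ",${logins}," in
            *",${user},"*) ;;       # found, keep checking
            *) return 1 ;;          # missing login
        esac
    done
    return 0
}

# On a live system: has_logins "$(wgetadmin root@tivaix1)" maestro maestro2
```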

Once these are added, you can use the wtwsconn.sh command (and other IBM Tivoli Management Framework commands) to manage Connector objects from the TWSuser user account. If you are not sure which Connectors are available, use the wlookup command to identify the available Connectors, as shown in Example 4-75.
Example 4-75 Identify available Connectors to manage on cluster node
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1

In Example 4-75, the Connector called TIVAIX1 (case is significant for Connector names) is available on tivaix1.

Interconnect Framework servers


The Connectors for each resource group are now configured on each cluster node. Interconnect the Framework servers so that the Connectors on each cluster node can be managed from every other cluster node.

Framework interconnection is a complex subject. We show how to interconnect the Framework servers for our environment, but you should plan your interconnection carefully if your installation of IBM Tivoli Workload Scheduler is part of a larger Tivoli Enterprise environment. Consult your IBM service provider for assistance with planning the interconnection.


Tip: When working with Tivoli administrators, be aware that they are used to hearing Framework resources called managed resources. We use the term Framework resource in this redbook to point out that this is a concept applied to IBM Tivoli Management Framework, and to distinguish it from HACMP resources. It is not an official term, however, so when working with staff who are not familiar with HACMP we advise using the official term managed resources to avoid confusion.

To interconnect the Framework servers for IBM Tivoli Workload Scheduler in our environment, follow these steps:

1. Before starting, make a backup of the IBM Tivoli Management Framework object database using the wbkupdb command as shown in Example 4-76. Log on to each cluster node as root user, and run a backup of the object database on each.
Example 4-76 Back up object database of IBM Tivoli Management Framework
[root@tivaix1:/home/root] cd /tmp
[root@tivaix1:/tmp] wbkupdb tivaix1
Starting the snapshot of the database files for tivaix1...
............................................................
..............................
Backup Complete.

2. Temporarily grant remote shell access to the root user on each cluster node. Edit or create as necessary the .rhosts file in the home directory of the root user on each cluster node. (This is a temporary measure and we will remove it after we finish the interconnection operation.) In our environment we created the .rhosts file with the contents as shown in Example 4-77.
Example 4-77 Contents of .rhosts file in home directory of root user
tivaix1 root
tivaix2 root

3. Temporarily grant the generic root user account (root with no hostname qualifier) a Framework login on the root Framework account. Run the wsetadmin command as shown:
wsetadmin -l root root@tivaix1


If you do not know your root Framework account, consult your Tivoli administrator or IBM service provider. (This is a temporary measure and we will remove it after we finish the interconnection operation.) In our environment the root Framework account is root@tivaix1, so we grant the generic root user account a login on this Framework account.

Note: If an interconnection is made under a user other than root, the /etc/hosts.equiv file also must be configured. Refer to "Secure and Remote Connections" in Tivoli Management Framework Maintenance and Troubleshooting Guide Version 4.1, GC32-0807, for more information.

4. Run the wlookup commands on the cluster node as shown in Example 4-78 to determine the Framework objects that exist before interconnection, so you can refer back to them later in the operation.
Example 4-78 Sampling Framework objects that exist before interconnection on tivaix1
[root@tivaix1:/home/root] wlookup -Lar ManagedNode
tivaix1
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1
TIVAIX1_rg2

In our environment we ran the commands on tivaix1.

5. Run the same sequence of wlookup commands, but on the cluster node on the opposing side of the interconnection operation, as shown in Example 4-79.
Example 4-79 Sampling Framework objects that exist before interconnection on tivaix2
[root@tivaix2:/home/root] wlookup -Lar ManagedNode
tivaix2
[root@tivaix2:/home/root] wlookup -Lar MaestroEngine
TIVAIX2_rg1
TIVAIX2_rg2

In our environment we ran the commands on tivaix2.

6. Interconnect the Framework servers in a two-way interconnection using the wconnect command as shown in Example 4-80 on page 334. Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806, for a complete description of how to use wconnect.


Example 4-80 Interconnect the Framework servers on tivaix1 and tivaix2
[root@tivaix1:/home/root] wconnect -c none -l root -m Two-way -r none tivaix2
Enter Password for user root on host tivaix2:

Note: While writing this redbook, we observed that the wconnect command behaves inconsistently when used in trusted host mode, especially upon frequently restored object databases. Therefore, we enabled trusted host access through .rhosts only as a precaution, and forced wconnect to require a password; then it does not exhibit the same inconsistency.

In our environment we configured an interconnection against tivaix2, using the root account of tivaix2 to perform the operation through the remote shell service, as shown in Example 4-80. Because we do not use interregion encryption (set during Framework installation in the wserver command arguments), we pass none to the -c flag option. Because we do not use encryption in tivaix2's Tivoli region, we pass none to the -r flag option.

We log into tivaix2 and use the odadmin command to determine the encryption used in tivaix2's Tivoli region, as shown in Example 4-81. The line that starts with Inter-dispatcher encryption level displays the encryption setting of the Tivoli region, which is none in the example (highlighted in bold).
Example 4-81 Determine the encryption used in the Tivoli region of tivaix2
[root@tivaix2:/home/root] odadmin
Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
(c) Copyright IBM Corp. 1990, 2003.  All Rights Reserved.
Region = 1221183877
Dispatcher = 1
Interpreter type = aix4-r1
Database directory = /usr/local/Tivoli/spool/tivaix2.db
Install directory = /usr/local/Tivoli/bin
Inter-dispatcher encryption level = none
Kerberos in use = FALSE
Remote client login allowed = version_2
Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
Force socket bind to a single address = FALSE
Perform local hostname lookup for IOM connections = FALSE
Use Single Port BDT = FALSE
Port range = (not restricted)
Single Port BDT service port number = default (9401)
Network Security = none
SSL Ciphers = default
ALLOW_NAT = FALSE
State flags in use = TRUE
State checking in use = TRUE
State checking every 180 seconds
Dynamic IP addressing allowed = FALSE
Transaction manager will retry messages 4 times.

Important: Two-way interconnection operations only need to be performed on one side of the connection. If you have two cluster nodes, you only need to run the wconnect command on one of them.

7. Use the wlsconn and odadmin commands to verify the interconnection as shown in Example 4-82.
Example 4-82 Verify Framework interconnection
[root@tivaix1:/home/root] wlsconn
  MODE     NAME             SERVER    REGION
 <---->    tivaix2-region   tivaix2   1221183877
[root@tivaix1:/home/root] odadmin odlist
Region          Disp  Flags  Port  IPaddr     Hostname(s)
1369588498         1  ct-      94  9.3.4.194  tivaix1,tivaix1.itsc.austin.ibm.com
                                   9.3.4.3    tivaix1_svc
1112315744         1  ct-      94  9.3.4.195  tivaix2,tivaix2.itsc.austin.ibm.com

The output displays the primary IP hostname of the cluster node that is interconnected to in the preceding step. In our environment, the primary IP hostname of cluster node tivaix2 is found under the SERVER column of the output of the wlsconn command (highlighted in bold in Example 4-82, with the value tivaix2). The same value (tivaix2, highlighted in bold in Example 4-82) is found under the Hostname(s) column in the output of the odadmin command, on the row that shows the Tivoli region ID of the cluster node. The Tivoli region ID is found by entering the odadmin command as shown in Example 4-83. It is on the line that starts with Region =.
Example 4-83 Determine Tivoli region ID of cluster node
[root@tivaix2:/home/root] odadmin
Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
(c) Copyright IBM Corp. 1990, 2003.  All Rights Reserved.
Region = 1221183877
Dispatcher = 1
Interpreter type = aix4-r1
Database directory = /usr/local/Tivoli/spool/tivaix2.db
Install directory = /usr/local/Tivoli/bin
Inter-dispatcher encryption level = none
Kerberos in use = FALSE
Remote client login allowed = version_2
Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
Force socket bind to a single address = FALSE
Perform local hostname lookup for IOM connections = FALSE
Use Single Port BDT = FALSE
Port range = (not restricted)
Single Port BDT service port number = default (9401)
Network Security = none
SSL Ciphers = default
ALLOW_NAT = FALSE
State flags in use = TRUE
State checking in use = TRUE
State checking every 180 seconds
Dynamic IP addressing allowed = FALSE
Transaction manager will retry messages 4 times.

In this example, the region ID is shown as 1221183877.

8. Interconnecting Framework servers only establishes a communication path. The Framework resources that need to be shared between Framework servers have to be pulled across the servers using an explicit updating command. Sharing a Framework resource shares all the objects that the resource defines. This enables Tivoli administrators to securely control which Framework objects are shared between Framework servers, and to control the performance of the Tivoli Enterprise environment by leaving unnecessary resources out of the exchange of resources between Framework servers.

Exchange all relevant Framework resources among cluster nodes by using the wupdate command. In our environment we exchanged the following Framework resources:

- ManagedNode
- MaestroEngine
- MaestroDatabase
- MaestroPlan
- SchedulerEngine
- SchedulerDatabase
- SchedulerPlan

Use the script shown in Example 4-84 on page 337 to exchange resources on all cluster nodes.


Important: Unlike the wconnect command, the wupdate command must be run on all cluster nodes, even on two-way interconnected Framework servers.
Example 4-84 Exchange useful and required resources for IBM Tivoli Workload Scheduler
for resource in ManagedNode \
    MaestroEngine MaestroDatabase MaestroPlan \
    SchedulerEngine SchedulerDatabase SchedulerPlan
do
    wupdate -r ${resource} All
done

The SchedulerEngine Framework resource enables the interconnected scheduling engines to present themselves in the Job Scheduling Console. The MaestroEngine Framework resource enables the wmaeutil command to manage running instances of Connectors.

Tip: Best practice is to update the entire Scheduler series (SchedulerDatabase, SchedulerEngine, and SchedulerPlan) and Maestro series (MaestroDatabase, MaestroEngine, and MaestroPlan) of Framework resources, if for no other reason than to deliver administrative transparency, so that all IBM Tivoli Workload Scheduler-related Framework objects can be managed from any cluster node running IBM Tivoli Management Framework. It is much easier to remember that any IBM Tivoli Workload Scheduler-related Framework resource can be seen and managed from any cluster node running a two-way interconnected IBM Tivoli Management Framework server than to remember a list of which resources must be managed locally on each individual cluster node, and which can be managed from anywhere in the cluster.

In our environment, we ran the script in Example 4-84 on tivaix1 and tivaix2.

9. Verify the exchange of Framework resources. Run the wlookup command as shown in Example 4-85 on the cluster node. Note the addition of Framework objects that used to exist only on the cluster node on the opposite side of the interconnection.
Example 4-85 Verify on tivaix1 the exchange of Framework resources
[root@tivaix1:/home/root] wlookup -Lar ManagedNode
tivaix1
tivaix2
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1
TIVAIX1_rg2
TIVAIX2_rg1
TIVAIX2_rg2

In our environment, we ran the commands on tivaix1.

10. Run the same sequence of wlookup commands, but on the cluster node on the opposite side of the interconnection, as shown in Example 4-86. The output from the commands should be identical to that of the same commands run on the cluster node in the preceding step.
Example 4-86 Verify on tivaix2 the exchange of Framework resources
[root@tivaix2:/home/root] wlookup -Lar ManagedNode
tivaix1
tivaix2
[root@tivaix2:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1
TIVAIX1_rg2
TIVAIX2_rg1
TIVAIX2_rg2

In our environment, we ran the commands on tivaix2.

11. Log into both cluster nodes through the Job Scheduling Console, using the service IP labels of the cluster nodes and the root user account. All scheduling engines (corresponding to the configured Connectors) on all cluster nodes appear. The scheduling engines marked inactive are Connectors for potential resource groups that are not active on a cluster node because the resource group is not running on that cluster node. In our environment, the list of available scheduling engines for a cluster in normal operation was as shown in Figure 4-68.

Figure 4-68 Available scheduling engines after interconnection of Framework servers

12. Remove the .rhosts entries, or delete the entire file if the two entries added in this operation were the only ones in it.
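If root's .rhosts already contained other entries that must survive, delete only the two temporary lines instead of the whole file. This is a sketch of ours (hostnames as in our environment):

```shell
#!/bin/sh
# strip_temp_rhosts <file>: delete only the two temporary trust entries
# added for the interconnection, leaving any other lines untouched
# (a sketch; adjust the hostnames to your cluster).
strip_temp_rhosts() {
    f=$1
    sed -e '/^tivaix1 root$/d' -e '/^tivaix2 root$/d' "$f" > "$f.new" &&
        mv "$f.new" "$f"
}

# On AIX the home directory of root is typically /, so:
# strip_temp_rhosts /.rhosts
```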


13. Remove the configuration that allows any root user to access Framework. Enter the wsetadmin command as shown:
wsetadmin -L root root@tivaix1

14. Set up a periodic job to exchange Framework resources using the wupdate command shown in the script in the preceding example. The frequency that the job should run at depends upon how often changes are made to the Connector objects. For most sites, best practice is a daily update about an hour before Jnextday. Timing it before Jnextday makes the Framework resource update compatible with any changes to the installation location of IBM Tivoli Workload Scheduler; these changes are often timed to occur right before Jnextday is run.
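One way to implement this periodic job is a root crontab entry driving the wupdate loop from Example 4-84. The following is a configuration sketch, not from the product: the schedule assumes Jnextday runs at 05:59 (so the exchange runs an hour earlier), and /usr/local/scripts/wupdate_all.sh is a hypothetical path.

```shell
# Root crontab entry (crontab -e), one per cluster node:
#   59 4 * * * /usr/local/scripts/wupdate_all.sh >/dev/null 2>&1

# Contents of the hypothetical /usr/local/scripts/wupdate_all.sh:
#
#   #!/bin/sh
#   . /etc/Tivoli/setup_env.sh      # standard Framework environment script
#   for resource in ManagedNode \
#       MaestroEngine MaestroDatabase MaestroPlan \
#       SchedulerEngine SchedulerDatabase SchedulerPlan
#   do
#       wupdate -r ${resource} All
#   done
```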

How to log in using the Job Scheduling Console


Job Scheduling Console users should log in using the service IP label of the scheduling engine they work with the most. Figure 4-69 shows how to log into TWS Engine1, no matter where it actually resides on the cluster, by using tivaix1_svc as the service label.

Figure 4-69 Log into TWS Engine1

Figure 4-70 on page 340 shows how to log into TWS Engine2.

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster


Figure 4-70 Log into TWS Engine2

While IP hostnames also work during normal operation of the cluster, they are not transferred during an HACMP fallover. Therefore, Job Scheduling Console users must use a service IP label to reach an instance of IBM Tivoli Workload Scheduler that has fallen over to another cluster node.

4.1.12 Production considerations


In this redbook, we present a very straightforward implementation of a highly available configuration of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. An actual production deployment adds considerable complexity. In this section, we identify some of the considerations that have to be managed in an actual deployment.

Naming conventions
In this redbook we used names selected to convey their product function as much as possible. However, such names may be inconvenient for users in a production environment. The IP service labels in our environment, tivaix1_svc and tivaix2_svc, are the primary means for Job Scheduling Console users to specify what to log into. For these users, the _svc string typically holds no significance. We recommend more meaningful names, for example master1 and master2 for two cluster nodes that implement Master Domain Manager servers.

Connector names in this redbook emphasized the cluster node first. In an actual production environment, we recommend emphasizing the resource group first in the name. Furthermore, the name of the resource group is more meaningful if it refers to its primary business function. For example, TIVAIX1_rg1 in the environment we used for this redbook would be changed to mdm1_tivaix1 for Master Domain Manager server 1. Job Scheduling Console users would then see in their GUI a list of resource groups in alphabetical order, in terms they already work with.

Dynamically creating and deleting Connectors


The inactive Connector objects do not have to remain in a static configuration. A Connector only has to exist when its resource group is running on a cluster node. For example, during normal operation of our environment, we do not use Connectors TIVAIX1_rg2 and TIVAIX2_rg1. If Connectors are created and deleted dynamically as needed, Job Scheduling Console users will only ever see active resource groups.

After a resource group is brought up on a cluster node, the rg_move_complete event is posted. A custom post-event script can identify which resource group moved, which cluster node it moved to, and which Connectors are extraneous as a result of the move. Taken together, this information enables the script to create the appropriate new Connector and delete the old one. Job Scheduling Console users then see a GUI that presents the scheduling engines that are active in the cluster at the moment the user logs into the scheduling network.
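A post-event script along these lines could implement this idea. The helper names and the naming convention mapping below are illustrative stand-ins, and the echoed commands are placeholders: a real script would invoke the Framework Connector utilities instead of echo.

```shell
#!/bin/sh
# Sketch of an rg_move_complete post-event script (hypothetical helper names).
# Maps a resource group to the Connector that should exist for it; the
# actions are echoed dry-run because creating and deleting Connectors
# requires a live Framework environment.

connector_for_rg() {
    # Naming convention used in this redbook: resource group TIVAIX1_rg1
    # uses a Connector of the same name.
    echo "$1"
}

handle_rg_move() {
    rg="$1"        # resource group that just moved, e.g. TIVAIX1_rg1
    node="$2"      # cluster node it now runs on, e.g. tivaix2
    conn=$(connector_for_rg "$rg")
    # In a real script, these lines would run the Connector utilities
    # against the Framework instead of echoing the intended action:
    echo "create Connector $conn on $node"
    echo "delete Connector $conn from previous node"
}

# Demonstration: TIVAIX1_rg1 has just fallen over to tivaix2.
handle_rg_move TIVAIX1_rg1 tivaix2
```

The mapping function isolates the naming convention, so changing the Connector naming scheme (for example, to the mdm1_tivaix1 style recommended above) only requires editing one place.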

Time synchronization
Best practice is to use a time synchronization tool to keep the clocks on all cluster nodes synchronized to a known time standard. One such tool we recommend is ntp, an Open Source implementation of the Network Time Protocol. For more information on downloading and implementing ntp for time synchronization, refer to:
http://www.ntp.org/

Network Time Protocol typically works by pulling time signals from the Internet or through a clock tuned to a specific radio frequency (which is sometimes not available in certain parts of the world). This suffices for the majority of commercial applications, even though using the Internet for time signals represents a single point of failure. Sites with extremely high availability requirements for applications that require very precise time keeping can use their own onsite reference clocks to eliminate using the Internet or a clock dependent upon a radio frequency as the single point of failure.
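As a sketch of a typical node configuration (the server names below are placeholders, not part of this redbook's environment), /etc/ntp.conf on each cluster node might look like this:

```text
# Minimal example /etc/ntp.conf for a cluster node (hypothetical servers).
server ntp1.example.com          # primary site time server
server ntp2.example.com          # secondary site time server
server 127.127.1.0               # local clock, used only if both servers fail
fudge  127.127.1.0 stratum 10    # mark the local clock as a poor reference
driftfile /etc/ntp.drift
```

Pointing all cluster nodes at the same pair of internal servers keeps their clocks consistent with each other even if the upstream reference is temporarily unreachable.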


Security
In this redbook we present a very simplified implementation, with as few security details as possible so that they do not obscure the HACMP aspects. In an actual production deployment, however, security is usually a large part of any planning and implementation. Be aware that some sites may not grant access to the Framework at the level that we show. Some sites may also enforce a Framework encryption level across the Managed Nodes, which affects the interconnection of servers. Consult your IBM service provider for information about your site's encryption configuration and about how to interconnect in an encrypted Framework environment.

Other security considerations, such as firewalls between cluster nodes, and firewalls between cluster nodes and client systems such as Job Scheduling Console sessions, require careful consideration and planning. Consult your IBM service provider for assistance with these additional scenarios.

Monitoring
By design, failures of components in the cluster are handled automatically, but you need to be aware of all such events. Chapter 8, Monitoring an HACMP Cluster, in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, describes various tools you can use to check the status of an HACMP Cluster; the nodes, networks, and resource groups within that cluster; and the daemons that run on the nodes.

HACMP software includes the Cluster Information Program (Clinfo), an SNMP-based monitor. HACMP for AIX software provides the HACMP for AIX MIB, which is associated with and maintained by the HACMP for AIX management agent, the Cluster SMUX peer daemon (clsmuxpd). Clinfo retrieves this information from the HACMP for AIX MIB through clsmuxpd. Clinfo can run on cluster nodes and on HACMP for AIX client machines. It makes information about the state of an HACMP Cluster and its components available to clients and applications via an application programming interface (API). Clinfo and its associated APIs enable developers to write applications that recognize and respond to changes within a cluster. The Clinfo program, the HACMP MIB, and the API are documented in High Availability Cluster Multi-Processing for AIX Programming Client Applications Version 5.1, SC23-4865.

Although the combination of HACMP and the inherent high availability features built into the AIX system keeps single points of failure to a minimum, there are still failures that, although detected, can cause other problems. See the chapter on events in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for suggestions about customizing error notification for various problems not handled by the HACMP events.

Geographic high availability


An extension of cluster-based high availability is geographic high availability. As the name implies, these configurations increase the availability of an application even further by treating the cluster's entire site as a single point of failure and introducing additional nodes in a geographically separate location. These geographically separate nodes can themselves be clusters. Consult your IBM service provider for assistance in planning and implementing a geographic high availability configuration.

Enterprise management
Delivering production-quality clusters often involves implementing enterprise systems management tools and processes to ensure the reliability, availability and serviceability of the applications that depend upon the cluster. This section covers some of the considerations that we believe should be given extra attention when implementing a highly available cluster for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. Many IBM Tivoli products reduce the time needed to deliver the additional services that enable you to offer service level guarantees to the users of the cluster. For more information about these products, refer to:
http://www.ibm.com/software/tivoli/

We recommend that you consult your IBM Tivoli service provider for advice on other enterprise systems management issues that should be considered. The issues covered in this section represent only a few of the benefits available for delivery to users of the cluster.

Measuring availability
Availability analysis is a major maintenance tool for clusters. You can use the Application Availability Analysis tool to measure the amount of time that any of your applications is available. The HACMP software collects, time stamps, and logs the following information:

- An application starts, stops, or fails
- A node fails, is shut down, or comes online
- A resource group is taken offline or moved
- Application monitoring is suspended or resumed

Using SMIT, you can select a time period and the tool will display uptime and downtime statistics for a given application during that period.


The tool displays:

- Percentage of uptime
- Amount of uptime
- Longest period of uptime
- Percentage of downtime
- Amount of downtime
- Longest period of downtime
- Percentage of time application monitoring was suspended

The Application Availability Analysis tool reports application availability from the HACMP Cluster infrastructure's point of view. It can analyze only those applications that have been properly configured so that they are managed by the HACMP software. When using the Application Availability Analysis tool, keep in mind that the statistics shown in the report reflect the availability of the HACMP application server, resource group, and (if configured) the application monitor that represent your application to HACMP.

The Application Availability Analysis tool cannot detect availability from an end user's point of view. For example, assume that you have configured a client-server application so that the server is managed by HACMP, and after the server was brought online, a network outage severed the connection between the end-user clients and the server. End users would view this as an application outage because their client software could not connect to the server, but HACMP would not detect it, because the server it was managing did not go offline. As a result, the Application Availability Analysis tool would not report a period of downtime in this scenario.

For this reason, best practice is to monitor everything that affects the entire user experience. We recommend using tools such as IBM Tivoli Monitoring, IBM Tivoli Service Level Advisor, and IBM Tivoli NetView to perform basic monitoring and reporting of the end-user service experience.

Configuration management
When there are many nodes in a cluster, configuration management often makes a difference of hours or even days between the time a new cluster node is requested by users and the time it is available with a fully configured set of highly available applications.

Configuration management tools also enable administrators to enforce the maintenance levels, patches, fix packs and service packs of the operating system and applications on the cluster nodes. They accomplish this by gathering inventory information and comparing it against baselines established by the administrators. This eliminates the errors that are caused in a cluster by mismatched versions of operating systems and applications. We recommend using IBM Tivoli Configuration Manager to implement services that automatically create a new cluster node from scratch, and to enforce the software levels loaded on all nodes in the cluster.

Notification
Large, highly available installations are very complex systems, often involving multiple teams of administrators overseeing different subsystems. Proper notification is key to the timely and accurate response to problems identified by a monitoring system. We recommend using IBM Tivoli Enterprise Console and a notification server to implement robust, flexible and scalable notification services.

Provisioning
For large installations of clusters, serving many highly available applications, with many on demand cluster requirements and change requests each week, provisioning software is recommended as a best practice. In these environments, a commercial-grade provisioning system substantially lowers the administrative overhead involved in responding to customer change requests. We recommend using IBM Tivoli ThinkDynamic Orchestrator to implement provisioning for very complex and constantly changing clusters.

Practical lessons learned about high availability


While writing this redbook, a serial disk in the SSA disk tray we use in our environment failed. Our configuration does not use this disk for any of our volume groups, so we continued to use the SSA disk tray. However, the failed drive eventually impacted the performance of the SSA loop to the point that HACMP functionality was adversely affected. The lesson we learned from this experience was that optimal HACMP performance depends upon a properly maintained system. In other words, using HACMP does not justify delaying normal system preventative and necessary maintenance tasks.

Forced HACMP stops


We observed that forcing HACMP services to stop may leave HACMP in an inconsistent state. If there are problems starting it again, we found that stopping it gracefully before attempting a restart clears up the problem.

4.1.13 Just one IBM Tivoli Workload Scheduler instance


The preceding sections show you how to design, plan and implement a two-node HACMP Cluster for an IBM Tivoli Workload Scheduler Master Domain Manager in a mutual takeover configuration. This requires you to design your overall enterprise workload as two independent, or at most loosely coupled, sets of job streams. You can, however, opt to implement only a single instance of IBM Tivoli Workload Scheduler in a two-node cluster in a hot standby configuration. Best practice is to use a mutual takeover configuration for Master Domain Managers. In this section, we discuss how to implement a single instance of IBM Tivoli Workload Scheduler in a hot standby configuration, which is appropriate for creating highly available Fault Tolerant Agents, for example.

Important: Going from a mutual takeover, dual Master Domain Manager configuration to only one instance of IBM Tivoli Workload Scheduler doubles the risk exposure of the scheduling environment.

You can create a cluster with just one instance of IBM Tivoli Workload Scheduler by essentially following the same instructions, but eliminating one of the resource groups. You can still use local instances of IBM Tivoli Management Framework. With only one resource group, however, there are some other, minor considerations to address in the resulting HACMP configuration.

Create only one IBM Tivoli Workload Scheduler Connector on each cluster node. If the installation of the single instance of IBM Tivoli Workload Scheduler is in /usr/maestro, the instance normally runs on cluster node tivaix1, and the IBM Tivoli Workload Scheduler Connector is named PROD for production, then all instances of IBM Tivoli Management Framework on other cluster nodes also use an IBM Tivoli Workload Scheduler Connector with the same name (PROD), configured the same way. When the resource group containing an instance of IBM Tivoli Workload Scheduler falls over to another cluster node, the IP service label associated with the instance falls over with the resource group.
Configure the instances of IBM Tivoli Management Framework on the cluster nodes to support this IP service label as an IP alias for the Managed Node on each cluster node. Job Scheduling Console sessions can then connect to the corresponding IP service address even after a fallover event. Consult your IBM service provider if you need assistance with configuring a hot standby, single-instance IBM Tivoli Workload Scheduler installation.
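As an illustrative sketch of the IP alias step (the dispatcher number and service label below are assumptions based on this redbook's environment; verify the exact odadmin usage for your Framework version):

```shell
# List the Framework dispatchers to find the od number of each Managed Node.
odadmin odlist

# Hypothetical: add the service IP label as an alias of dispatcher 1 so the
# Managed Node also answers on that address after a fallover.
odadmin odlist add_ip_alias 1 tivaix1_svc
```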

Complex configurations
In this redbook we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework on a cluster with two cluster nodes. More complex configurations include:

- One instance of IBM Tivoli Workload Scheduler across more than two cluster nodes
- More than two instances of IBM Tivoli Workload Scheduler across more than two cluster nodes
- Multiple instances of IBM Tivoli Workload Scheduler on a single cluster node, in a cluster with multiple nodes

The number of permutations of fallover scenarios increases with each additional cluster node beyond the two-node environment we show in this redbook. Best practice is to test each permutation. Consult your IBM service provider if you want assistance with configuring a more complex configuration.

4.2 Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster


In this section, we describe how to implement a Tivoli Workload Scheduler engine in a Microsoft Cluster using Microsoft Cluster Service. We cover both a single installation of Tivoli Workload Scheduler, and two copies of Tivoli Workload Scheduler in a mutual takeover scenario. We do not cover how to perform patch upgrades. For more detailed information about installing IBM Tivoli Workload Scheduler on a Windows platform, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273.

4.2.1 Single instance of IBM Tivoli Workload Scheduler


Figure 4-71 on page 348 shows two Windows 2000 systems in a Microsoft Cluster. In the center of this cluster is a shared disk volume, configured in the cluster as volume X, where we intend to install the Tivoli Workload Scheduler engine.


Figure 4-71 Network diagram of the Microsoft Cluster: tivw2k1 (TWS Engine 1) and tivw2k2, joined by public and private network connections and sharing disk volume X:

Once the cluster is set up and configured properly, as described in 3.3, Implementing a Microsoft Cluster on page 138, you can install the IBM Tivoli Workload Scheduler software on the shared disk volume X:. The following steps will guide you through a full installation.

1. Ensure you are logged on as the local Administrator.

2. Ensure that the shared disk volume X: is owned by System 1 (tivw2k1) and is online. To verify this, open the Cluster Administrator, as shown in Figure 4-72 on page 349.


Figure 4-72 Cluster Administrator

3. Insert the IBM Tivoli Workload Scheduler Installation Disk 1 into the CD-ROM drive.

4. Change directory to the Windows folder and run the setup program, SETUP.exe.


5. Select the language in which you want the wizard to be displayed, and click OK as seen in Figure 4-73.

Figure 4-73 Installation-Select Language


6. Read the welcome information and click Next, as seen in Figure 4-74.

Figure 4-74 Installation-Welcome Information


7. Read the license agreement, select the acceptance radio button, and click Next, as seen in Figure 4-75.

Figure 4-75 Installation-License agreement


8. The Install a new Tivoli Workload Scheduler Agent option is selected by default. Click Next, as seen in Figure 4-76.

Figure 4-76 Installation-Install new Tivoli Workload Scheduler


9. Specify the IBM Tivoli Workload Scheduler user name. Spaces are not permitted. On Windows systems, if this user account does not already exist, it is automatically created by the installation program. Note that the user name must be a domain user (this is mandatory); specify it as domain_name\user_name. Type and confirm the password, then click Next, as seen in Figure 4-77.

Figure 4-77 Installation user information

10. If you specified a user name that does not already exist, an information panel is displayed about extra rights that need to be applied. Review the information and click Next.


11. Specify the installation directory under which the product will be installed. The directory cannot contain spaces. On Windows systems only, the directory must be located on an NTFS file system. If desired, click Browse to select a different destination directory, and click Next, as shown in Figure 4-78.

Figure 4-78 Installation install directory

Note: Make sure that the shared disk is attached to the node that you are installing IBM Tivoli Workload Scheduler on.


12. Select the Custom install option and click Next, as shown in Figure 4-79. This option will allow the custom installation of just the engine and not the Framework or any other features.

Figure 4-79 Type of Installation


13. Select the type of IBM Tivoli Workload Scheduler workstation you would like to install (Master Domain Manager, Backup Master, Fault Tolerant Agent or a Standard Agent), as this installation will only install the parts of the code needed for each configuration. If needed, you can later promote the workstation to a different type of IBM Tivoli Workload Scheduler workstation using this installation program. Select Master Domain Manager and click Next, as shown in Figure 4-80.

Figure 4-80 Type of IBM Tivoli Workload Scheduler workstation


14. Type in the following information and then click Next, as shown in Figure 4-81:

a. The Company Name as you would like it to appear in program headers and reports. The name can contain spaces, provided it is not enclosed in double quotation marks (").

b. The IBM Tivoli Workload Scheduler 8.2 name for this workstation. This name cannot exceed 16 characters, cannot contain spaces, and is not case sensitive.

c. The TCP port number used by the instance being installed. It must be a value in the range 1-65535. The default is 31111.

Figure 4-81 Workstation information


15. In this dialog box you can select the Tivoli Plus Module and/or the Connector. In this case we do not need these options, so leave them blank and click Next, as shown in Figure 4-82.

Figure 4-82 Extra optional features


16. In this dialog box, as shown in Figure 4-83, you have the option of installing additional languages. We did not select any additional languages at this stage, since this requires that the Tivoli Management Framework 4.1 Language CD-ROM be available, in addition to the Tivoli Framework 4.1 Installation CD-ROM, during the install phase.

Figure 4-83 Installation of Additional Languages


17. Review the installation settings and then click Next, as shown in Figure 4-84.

Figure 4-84 Review the installation


18. A progress bar indicates that the installation has started, as shown in Figure 4-85.

Figure 4-85 IBM Tivoli Workload Scheduler Installation progress window


19. After the installation is complete, a final summary panel is displayed, as shown in Figure 4-86. Click Finish to exit the setup program.

Figure 4-86 Completion of a successful install


20. Now that the installation is completed on one side of the cluster (system1), you have to make sure the registry entries are updated on the other side of the cluster pair. The easiest way to do this is by removing the software just installed on system1 (tivw2k1), as follows:

a. Make sure that all the services are stopped by opening the Services screen: go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Verify that the Tivoli Netman, Tivoli Token Service and Tivoli Workload Scheduler services are not running.

b. Using Windows Explorer, go to the IBM Tivoli Workload Scheduler installation directory x:\win32app\TWS\TWS82 and remove all files and directories in this directory.

c. Use the Cluster Administrator to verify that the shared disk volume X: is owned by System 2 (tivw2k2) and is online. Open Cluster Administrator, as shown in Figure 4-87.

Figure 4-87 Cluster Administrator

21. Now install IBM Tivoli Workload Scheduler on the second system by repeating steps 3 through 18.


22. To complete the IBM Tivoli Workload Scheduler installation, you need to add an IBM Tivoli Workload Scheduler user to the database. The install process should have created one for you, but we suggest that you verify that the user exists by running the composer program as shown in Example 4-87.
Example 4-87 Check the user creation

C:\win32app\TWS\maestro82\bin>composer
TWS for WINDOWS NT/COMPOSER 8.2 (1.18.2.1)
Licensed Materials Property of IBM
5698-WKB (C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
-display users tws82#@
CPU id.          User Name
---------------- ---------------------------------------------
TWS82            gb033984
USERNAME TWS82#gb033984
PASSWORD "***************"
END
AWSBIA251I Found 1 users in @.
-

If the user exists in the database, you do not have to do anything.

23. Next, modify the workstation definition by running the composer modify cpu=TWS82 command. This displays, in an editor, the workstation definition that was created during the IBM Tivoli Workload Scheduler installation. The only parameter you have to change is the Node argument; change it to the IP address of the cluster. Table 4-5 lists and describes the arguments.
Table 4-5 IBM Tivoli Workload Scheduler workstation definition

Argument     Value         Description

cpuname      TWS82         Type in a workstation name that is appropriate for this
                           workstation. Workstation names must be unique, and cannot
                           be the same as workstation class and domain names.

Description  "Master CPU"  Type in a description that is appropriate for this
                           workstation.

OS           WNT           Specifies the operating system of the workstation. Valid
                           values include UNIX, WNT, and OTHER.

Node         9.3.4.199     The address of the cluster. This address can be a
                           fully-qualified domain name or an IP address.

Domain       Masterdm      Specify a domain name for this workstation. The default
                           name is MASTERDM.

TCPaddr      31111         Specifies the TCP port number that is used for
                           communications. The default is 31111. If you have two
                           copies of TWS running on the same system, then the port
                           numbers must be different.

For Maestro  (none)        A keyword that starts the extra options for the
                           workstation.

Autolink     On            When set to ON, specifies whether to open the link
                           between workstations at the beginning of each day during
                           startup.

Resolvedep   On            With this set to ON, this workstation will track
                           dependencies for all jobs and job streams, including
                           those running on other workstations.

Fullstatus   On            With this set to ON, this workstation will be updated
                           with the status of jobs and job streams running on all
                           other workstations in its domain and in subordinate
                           domains, but not on peer or parent domains.

End          (none)        A keyword that ends the workstation definition.


Figure 4-88 illustrates the workstation definition.

Figure 4-88 IBM Tivoli Workload Scheduler Workstation definition
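Reconstructed from the values in Table 4-5, the definition shown in the figure corresponds to text along these lines (the exact spacing in your editor may differ):

```text
cpuname TWS82
 description "Master CPU"
 os WNT
 node 9.3.4.199
 domain MASTERDM
 tcpaddr 31111
 for maestro
  autolink on
  resolvedep on
  fullstatus on
end
```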

24. After the workstation definition has been modified, add the FINAL job stream definition to the database; this job stream runs the script that creates the next day's production day file. To do this, log in as the IBM Tivoli Workload Scheduler installation user and run this command:
Maestrohome\bin\composer add Sfinal

This will add the job and job streams to the database.

25. While still logged in as the IBM Tivoli Workload Scheduler installation user, run the Jnextday batch file:

Maestrohome\Jnextday

Verify that Jnextday has worked correctly by running the conman program:
Maestrohome\bin\conman

In the output, shown in Example 4-88, you should see in the conman header Batchman Lives, which indicates that IBM Tivoli Workload Scheduler is installed correctly and is up and running.
Example 4-88 Header output for conman

x:\win32app\TWS\TWS82\bin>conman
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB (C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TWS82.  Batchman LIVES.  Limit: 10, Fence: 0, Audit Level: 0
%

26. When a new workstation is created in an IBM Tivoli Workload Scheduler distributed environment, you need to set the workstation limit of concurrent jobs, because the default value is set to 0, which means no jobs will run. To change the workstation limit from 0 to 10, enter the following command:
Maestrohome\bin\conman limit cpu=tws82;10

Verify that the command has worked correctly by running the conman show cpus command:
Maestrohome\bin\conman sc=tws82

The conman output, shown in Example 4-89, contains the number 10 in the fifth column, indicating that the command has worked correctly.
Example 4-89 conman output

C:\win32app\TWS\maestro82\bin>conman sc=tws82
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB (C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TWS82.  Batchman LIVES.  Limit: 10, Fence: 0, Audit Level: 0
sc=tws82
CPUID   RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TWS82     1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM

27.Before you configure IBM Tivoli Workload Scheduler in the cluster services, you need to set the three IBM Tivoli Workload Scheduler services to manual start up. Do this by opening the Services Screen. Go to Start -> Settings -> Control Panel and open Administrative Tools -> Services. Change Tivoli Netman, Tivoli Token Service and Tivoli Workload Scheduler to manual startup. 28.Now you can configure IBM Tivoli Workload Scheduler in the cluster services by creating a new resource for each of the three IBM Tivoli Workload Scheduler services: Tivoli Netman, Tivoli Token Service, and Tivoli Workload Scheduler. These three new resources have to be created in the same Cluster Services Group as the IBM Tivoli Workload Scheduler installation


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

drive. In this case we used the X: drive, which belongs to cluster group Disk Group1.

29.First create the new resource Tivoli Token Service, as shown in Figure 4-89.

Figure 4-89 New Cluster resource

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster


30.Fill in the first screen (Figure 4-90) as follows, and then click Next:
Name              Enter the name you want to use for this resource, such as Tivoli Token Service.
Description       Enter a description of this resource, such as Tivoli Token Service.
Resource type     Select the resource type of service for Tivoli Token Service: select Generic Service.
Group             Select the group in which you want to create this resource. It must be created in the same group as any dependencies (such as the installation disk drive or network).

Figure 4-90 Resource values


31.Now you need to select the possible nodes that this resource can run on. In this case, select both nodes as shown in Figure 4-91. Then click Next.

Figure 4-91 Node selection for resource


32.Select all the dependencies that you would like this resource (Tivoli Token Service) to be dependent on. In this case, you need the disk, network and IP address to be online before you can start the Tivoli Token Service as shown in Figure 4-92. Then click Next.

Figure 4-92 Dependencies for this resource


33.Add the service parameters for the Tivoli Token Service, then click Next, as shown in Figure 4-93:
Service name      To find the service name, open the Windows Services panel: go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Highlight the service, then click Action -> Properties. Under the General tab, on the first line you can see the service name, which in this case is tws_tokensrv_tws8_2.
Start parameters  Enter any start parameters needed for this service (Tivoli Token Service). In this case, there are no start parameters, so leave this field blank.

Figure 4-93 Resource parameters
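The Windows service names used throughout these steps follow a single pattern, tws_<component>_<instance>: tws_tokensrv_tws8_2 above, and tws_netman_tws8_2 and tws_maestro_tws8_2 later in this section. The following sketch builds the three names for a given instance suffix, together with the sc.exe commands that would set them to manual startup as in step 27. The helper functions are hypothetical, and the availability of sc.exe on the node (for example, from the Windows 2000 Resource Kit) is an assumption.

```python
# Sketch only: derive the three Tivoli service names for an instance
# suffix, and the "sc config" commands that set them to manual (demand)
# startup. The tws_<component>_<instance> pattern matches the names
# shown in this section.
def tivoli_service_names(instance):
    return ["tws_%s_%s" % (component, instance)
            for component in ("tokensrv", "netman", "maestro")]

def manual_startup_commands(instance):
    # "start= demand" is the sc.exe syntax for manual startup
    return ["sc config %s start= demand" % name
            for name in tivoli_service_names(instance)]

for cmd in manual_startup_commands("tws8_2"):
    print(cmd)
```

Deriving the names this way keeps a setup script in step with the naming convention, instead of hard-coding each service name three times.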


34.This screen (Figure 4-94) allows you to replicate registry data to all nodes in the cluster. In the case of this service, Tivoli Token Service, this is not needed, so leave it blank. Then click Finish.

Figure 4-94 Registry Replication

35.Figure 4-95 should then be displayed, indicating that the resource has been created successfully. Click OK.

Figure 4-95 Cluster resource created successfully


36.Now create a new resource for the Tivoli Netman service by repeating step 29 (shown in Figure 4-89 on page 369).

37.Fill in the resource values in the following way, then click Next:
Name              Enter the name you want to use for this resource, such as Tivoli Netman Service.
Description       Enter a description of this resource, such as Tivoli Netman Service.
Resource type     Select the resource type of service for Tivoli Netman Service: select Generic Service.
Group             Select the group in which you want to create this resource. It must be created in the same group as any dependencies (such as the installation disk drive or network).

38.Select the possible nodes that this resource can run on. In this case select both nodes, then click Next.


39.Select all the dependencies that you would like this resource (Tivoli Netman Service) to be dependent on. In this case we only need the Tivoli Token Service to be online before we can start the Tivoli Netman Service, because Tivoli Token Service will not start until the disk, network and IP address are available, as shown in Figure 4-96. Then click Next.

Figure 4-96 Dependencies for IBM Tivoli Workload Scheduler Netman service

40.Add the service parameters for the Tivoli Netman Service, then click Next:
Service name      To find the service name, open the Windows Services panel: go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Highlight the service, then click Action -> Properties. Under the General tab, on the first line you can see the service name, which in this case is tws_netman_tws8_2.
Start parameters  Enter any start parameters needed for the Tivoli Netman Service. In this case, there are no start parameters, so leave this field blank.


41.Repeat steps 34 and 35 by clicking Finish, which should then bring you to a window indicating that the resource was created successfully. Then click OK.

42.Now create a new resource for the IBM Tivoli Workload Scheduler by repeating step 29, as shown in Figure 4-89 on page 369.

43.Fill out the resource values in the following way; when you finish, click Next:
Name              Enter the name you want to use for this resource, such as TWS Workload Scheduler.
Description       Enter a description of this resource, such as TWS Workload Scheduler.
Resource type     Select the resource type of service for TWS Workload Scheduler: select Generic Service.
Group             Select the group in which you want to create this resource. It must be created in the same group as any dependencies, such as the installation disk drive or network.

44.Select the possible nodes that this resource can run on. In this case, select both nodes. Then click Next.

45.Select all dependencies that you would like this resource, TWS Workload Scheduler, to be dependent on. In this case we only need the Tivoli Netman Service to be online before we can start the TWS Workload Scheduler, because Tivoli Netman Service will not start until the Tivoli Token Service is started, and Tivoli Token Service will not start until the disk, network, and IP address are available. When you finish, click Next.

46.Add the service parameters for this service, TWS Workload Scheduler, then click Next:
Service name      To find the service name, open the Windows Services panel: go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Highlight the service, then click Action -> Properties. Under the General tab, on the first line you can see the service name, which in this case is tws_maestro_tws8_2.
Start parameters  Enter any start parameters needed for this service (TWS Workload Scheduler). In this case there are no start parameters, so leave this field blank.

47.Repeat steps 34 and 35 by clicking Finish, which should then display a screen indicating that the resource was created successfully. Then click OK.
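The dependency chain built in the steps above (disk and IP address, then Tivoli Token Service, then Tivoli Netman, then the Workload Scheduler service itself) fixes the order in which the cluster brings the resources online. A minimal sketch of resolving that start order from the dependency graph (resource names here are illustrative, matching this scenario):

```python
# Sketch: resolve the start order implied by the resource dependencies
# described in this section. Each resource lists the resources that must
# be online before it can start.
deps = {
    "Disk X:": [],
    "IP Address": [],
    "Tivoli Token Service": ["Disk X:", "IP Address"],
    "Tivoli Netman": ["Tivoli Token Service"],
    "TWS Workload Scheduler": ["Tivoli Netman"],
}

def start_order(graph):
    # Depth-first topological sort: dependencies before dependents.
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]:
            visit(dep)
        order.append(node)
    for node in graph:
        visit(node)
    return order

print(start_order(deps))
```

The same ordering governs shutdown in reverse: the cluster takes the scheduler offline before Netman, and Netman before the Token Service and disk.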


48.At this point all three resources have been created in the cluster. Now you need to change some of the advanced parameters, but only in the TWS Workload Scheduler resource. To do this, open the Cluster Administrator tool. Click the group in which you have defined the TWS Workload Scheduler resource. Highlight the resource and click Action -> Properties, as shown in Figure 4-97.

Figure 4-97 Cluster Administrator


49.Now click the Advanced tab, as shown in Figure 4-98, and change the Restart setting to Do not restart. Then click OK.

Figure 4-98 The Advanced tab

4.2.2 Configuring the cluster group


Each cluster group has a set of settings that affect the way the cluster fails over and back again. In this section we cover the different options and how they affect IBM Tivoli Workload Scheduler. We describe the three main tabs used when dealing with the properties of the cluster group. To modify any of these options:

1. Open Cluster Administrator.
2. In the console tree (usually the left pane), click the Groups folder.
3. In the details pane (usually the right pane), click the appropriate group.
4. On the File menu, click Properties.
5. On the General tab, next to Preferred owners, click Modify.


The General tab is shown in Figure 4-99. Using this tab, you can define the following:
Name              Enter the name of the cluster group.
Description       Enter a description of this cluster group.
Preferred owner   Select the preferred owner of this cluster group. If no preferred owners are specified, failback does not occur; if more than one node is listed under preferred owners, priority is determined by the order of the list. The group always tries to fail back to the highest-priority node that is available.

Figure 4-99 General tab for Group Properties


The Failover tab is shown in Figure 4-100. Using this tab, you can define the following:
Threshold         Enter the number of times the group can fail over within the set time period. To set an accurate number, consider how long it takes for all products in this group to come back online. Also consider that if a service is not available on both sides of the cluster, the cluster software will continue to move it from side to side until it becomes available or the time period is reached.
Period            Enter the period of time over which failovers are counted. If the group moves more than the threshold number of times within this period, it is not moved again.

Figure 4-100 Failover tab for Group Properties
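The Threshold/Period rule can be stated precisely: another failover is permitted only while the number of failovers inside the current period is still below the threshold. A minimal sketch of that rule (the function and its time representation, hours since an arbitrary epoch, are illustrative; the threshold of 10 and the 6-hour period match the values used in Table 4-16 later in this section):

```python
# Sketch: the failover Threshold/Period rule. A group may fail over
# again only if it has moved fewer than `threshold` times within the
# last `period_hours`. Times are hours since an arbitrary epoch.
def failover_allowed(failover_times, now, threshold, period_hours):
    recent = [t for t in failover_times if now - t <= period_hours]
    return len(recent) < threshold

# With threshold 10 and a 6-hour period, an 11th move inside the
# window is refused:
history = [float(i) * 0.5 for i in range(10)]  # 10 failovers in 4.5 hours
print(failover_allowed(history, 5.0, 10, 6))   # -> False
```

This is why the text advises sizing the threshold against how long the group takes to come online: a threshold that is too low can strand a group offline after a burst of transient failures.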


The Failback tab is shown in Figure 4-101 on page 383. This tab gives you the choice of two options that control whether the group fails back:
Prevent failback  If Prevent failback is set, then (provided that all dependencies of the group are met) the group runs on this side of the cluster until there is a problem, at which point the group moves again. The group can also be moved manually using Cluster Administrator.
Allow failback    If Allow failback is set, you have two further options: Immediately and Failback between. If Immediately is set, the group tries to move back immediately. If Failback between is set, which is the preferred option, you can define the time window within which you would like the cluster group to move back. We recommend using a period of time before Jnextday, while still allowing enough time for the group to come back online before Jnextday has to start. Note that if no preferred owners are specified for the group, failback does not occur.


Figure 4-101 Failback tab for Group Properties
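The "Failback between" window is an hour range; the group moves back only when the current hour falls inside it. A small sketch of that check (the function is hypothetical; the 4-to-6 window matches the Table 4-16 example, chosen to finish before Jnextday):

```python
# Sketch: check whether the current hour falls inside the "Failback
# between" window. Supports windows that wrap past midnight.
def in_failback_window(hour, start=4, end=6):
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight

print(in_failback_window(5))   # -> True
print(in_failback_window(12))  # -> False
```

Choosing the window this way means a failback never collides with the production-day turnover: the group is back on its preferred node, and fully online, before Jnextday starts.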

4.2.3 Two instances of IBM Tivoli Workload Scheduler in a cluster


In this section, we describe how to install two instances of IBM Tivoli Workload Scheduler 8.2 Engine (Master Domain Manager) in a Microsoft Cluster. The configuration will be in a mutual takeover mode, which means that when one side of the cluster is down, you will have two copies of IBM Tivoli Workload Scheduler running on the same node. This configuration is shown in Figure 4-102 on page 384.


Figure 4-102 Network diagram of the Microsoft Cluster (nodes tivw2k1 and tivw2k2 are joined by public and private network connections; TWS Engine 1 uses shared disk volume X: and TWS Engine 2 uses shared disk volume Y:)

1. Before starting the installation, some careful planning must take place. To plan most efficiently, you need the following information.

Workstation type: You need to understand both types of workstations to be installed in the cluster, as this may have other dependencies (such as JSC and Framework connectivity) as well as installation requirements. In this configuration we are installing two Master Domain Managers (MDMs).

Location of the code: This code should be installed on a file system that is external to both nodes in the cluster, but accessible by both nodes. The location should also be in the same part of the file system (or at least the same drive) as the application that the IBM Tivoli Workload Scheduler engine is going to manage. You also need to look at the way the two instances of IBM Tivoli Workload Scheduler will work together, so make sure that the directory structures do not overlap. Finally, you need sufficient disk space to install IBM Tivoli Workload Scheduler into. Refer to IBM Tivoli


Workload Scheduler Release Notes Version 8.2, SC32-1277, for information about these requirements. In this configuration, we will install one copy of IBM Tivoli Workload Scheduler 8.2 in the X drive and the other in the Y drive.

Installation user: Each instance of IBM Tivoli Workload Scheduler needs an individual installation user name, because this user is used to start the services for this instance of IBM Tivoli Workload Scheduler. This installation user must exist on both sides of the cluster, because the IBM Tivoli Workload Scheduler instance can run on both sides of the cluster. It also needs its own home directory to run in, and this home directory must be in the same location, for the same reasons described in the Location of the code section. In our case, we will use the same names as the cluster group names: for the first installation, TIVW2KV1; for the second installation, TIVW2KV2.

Naming convention: Plan your naming convention carefully, because it is difficult to change some of these objects after installing IBM Tivoli Workload Scheduler (in fact, it is easier to reinstall than to change some objects). The naming convention that you choose will be used for installation user names, workstation names, cluster group names, and the different resource names in each of the cluster groups. Use a naming convention that makes it easy to understand and identify what is running where, and that also conforms to the maximum number of characters allowed for each object.

Netman port: This port is used for listening for incoming requests. Because we have a configuration where two instances of IBM Tivoli Workload Scheduler can be running on the same node (mutual takeover scenario), we need to set a different port number for each listening instance of IBM Tivoli Workload Scheduler. The two port numbers that are chosen must not conflict with any other network products installed on these two nodes.


In this installation we use port number 31111 for the first installation, TIVW2KV1, and port 31112 for the second installation, TIVW2KV2.

IP address: The IP address that you define in the workstation definition for each IBM Tivoli Workload Scheduler instance should not be an address that is bound to a particular node, but one that is bound to the cluster group. This IP address should be addressable from the network. If the two IBM Tivoli Workload Scheduler instances are to move separately, you will need two IP addresses, one for each cluster group. In this installation, we use 9.3.4.199 for cluster group TIVW2KV1, and 9.3.4.175 for cluster group TIVW2KV2.

2. After gathering all the information in step 1 and deciding on a naming convention, you can install the first IBM Tivoli Workload Scheduler engine in the cluster. To do this, repeat steps 1 through 20 in 4.2.1, Single instance of IBM Tivoli Workload Scheduler on page 347, but use the parameters listed in Table 4-6.
Table 4-6 IBM Tivoli Workload Scheduler installation arguments

Argument                Value                     Description
Installation User Name  TIVW2KV1                  In our case, we used the name of the cluster group as the installation user name.
Password                TIVW2KV1                  To keep the installation simple, we used the same password as the installation user name. However, in a real customer installation, you would use the password provided by the customer.
Destination Directory   X:\win32app\tws\tivw2kv1  This has to be installed on the disk that is associated with cluster group TIVW2KV1. In our case, that is the X drive.
Company Name            IBM ITSO                  This is used for the heading of reports, so enter the name of the company that this installation is for. In our case, we used IBM ITSO.
Master CPU name         TIVW2KV1                  Because we are installing a Master Domain Manager, the Master CPU name is the same as This CPU name.
TCP port Number         31111                     This specifies the TCP port number that is used for communications. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port numbers must be different.
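In a mutual takeover configuration, every planning value gathered above must be unique per instance: installation user, installation drive, Netman port, and cluster-group IP address. A small sketch of validating that before installing (the helper is hypothetical; the values are the ones used in this scenario):

```python
# Sketch: validate the planning values for a mutual takeover setup.
# Each instance needs its own installation user, install drive, Netman
# port, and cluster-group IP address; duplicates would collide when
# both instances run on the same node.
instances = [
    {"user": "TIVW2KV1", "drive": "X:", "port": 31111, "ip": "9.3.4.199"},
    {"user": "TIVW2KV2", "drive": "Y:", "port": 31112, "ip": "9.3.4.175"},
]

def check_unique(instances):
    problems = []
    for key in ("user", "drive", "port", "ip"):
        values = [inst[key] for inst in instances]
        if len(values) != len(set(values)):
            problems.append("duplicate %s" % key)
    return problems

print(check_unique(instances))  # -> []
```

An empty result means the two instances can coexist on one node; any "duplicate ..." entry points at a value that must be changed before installation.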

3. When you get to step 20, replace the Installation Arguments with the values listed in Table 4-6 on page 386.

4. When you get to step 22, replace the workstation definition with the arguments listed in Table 4-7.
Table 4-7 IBM Tivoli Workload Scheduler workstation definition

Argument     Value                                   Description
cpuname      TIVW2KV1                                Verify that the workstation name is TIVW2KV1, as this should be filled in during the installation.
Description  Master CPU for the first cluster group  Enter a description that is appropriate for this workstation.
OS           WNT                                     Specifies the operating system of the workstation. Valid values include UNIX, WNT, and OTHER.
Node         9.3.4.199                               This field is the address that is associated with the first cluster group. This address can be a fully-qualified domain name or an IP address.
Domain       Masterdm                                Specify a domain name for this workstation. The default name is MASTERDM.
TCPaddr      31111                                   Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port numbers must be different.
For Maestro                                          This field has no value; it is a keyword that starts the extra options for the workstation.
Autolink     On                                      When set to ON, this specifies whether to open the link between workstations at the beginning of each day during startup.
Resolvedep   On                                      When set to ON, this workstation will track dependencies for all jobs and job streams, including those running on other workstations.
Fullstatus   On                                      When set to ON, this workstation will be updated with the status of jobs and job streams running on all other workstations in its domain and in subordinate domains, but not on peer or parent domains.
End                                                  This field has no value; it is a keyword that ends the workstation definition.

5. Now finish off the first installation by repeating steps 23 through 27. However, at step 26, use the following command:
Maestrohome\bin\conman limit cpu=tivw2kv1;10

To verify that this command has worked correctly, run the conman show cpus command:
Maestrohome\bin\conman sc=tivw2kv1

The conman output, shown in Example 4-90, contains the number 10 in the LIMIT column, illustrating that the command has worked correctly.
Example 4-90 conman output

X:\win32app\TWS\tivw2kv1\bin>conman sc=tivw2kv1
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM 5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TIVW2KV1.  Batchman LIVES.  Limit: 10, Fence: 0, Audit Level: 0
sc=tivw2kv1
CPUID     RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TIVW2KV1    1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM


6. After installing the first IBM Tivoli Workload Scheduler instance in the cluster, you can now install the second IBM Tivoli Workload Scheduler engine by repeating steps 1 through 20 in 4.2.1, Single instance of IBM Tivoli Workload Scheduler on page 347, using the parameters listed in Table 4-8.
Table 4-8 IBM Tivoli Workload Scheduler installation arguments

Argument                Value                     Description
Installation User Name  TIVW2KV2                  In this case, we used the name of the cluster group as the installation user name.
Password                TIVW2KV2                  To keep this installation simple, we used the same password as the installation user name, but in a real customer installation you would use the password provided by the customer.
Destination Directory   Y:\win32app\tws\tivw2kv2  This has to be installed on the disk that is associated with cluster group TIVW2KV2; in this case, that is the Y drive.
Company Name            IBM ITSO                  This is used for the heading of reports, so enter the name of the company that this installation is for. In our case, we used IBM ITSO.
Master CPU name         TIVW2KV2                  Because we are installing a Master Domain Manager, the Master CPU name is the same as This CPU name.
TCP Port Number         31112                     Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port numbers must be different.

7. When you get to step 20, replace the Installation Arguments with the values in Table 4-8.


8. When you get to step 22, replace the workstation definition with the arguments listed in Table 4-9.
Table 4-9 IBM Tivoli Workload Scheduler workstation definition

Argument     Value                                   Description
cpuname      TIVW2KV2                                Check that the workstation name is TIVW2KV2, as this should be filled in during the installation.
Description  Master CPU for the first cluster group  Type in a description that is appropriate for this workstation.
OS           WNT                                     Specifies the operating system of the workstation. Valid values include UNIX, WNT, and OTHER.
Node         9.3.4.175                               This field is the address that is associated with the first cluster group. This address can be a fully-qualified domain name or an IP address.
Domain       Masterdm                                Specify a domain name for this workstation. The default name is MASTERDM.
TCPaddr      31112                                   Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port numbers must be different.
For Maestro                                          This field has no value; it is a keyword that starts the extra options for the workstation.
Autolink     On                                      When set to ON, it specifies whether to open the link between workstations at the beginning of each day during startup.
Resolvedep   On                                      When set to ON, this workstation will track dependencies for all jobs and job streams, including those running on other workstations.
Fullstatus   On                                      When set to ON, this workstation will be updated with the status of jobs and job streams running on all other workstations in its domain and in subordinate domains, but not on peer or parent domains.
End                                                  This field has no value; it is a keyword that ends the workstation definition.

9. Now finish the second installation by repeating steps 23 through 27.


However, when you reach step 26, use the following command:
Maestrohome\bin\conman limit cpu=tivw2kv2;10

Run the conman show cpus command to verify that the command has worked correctly:
Maestrohome\bin\conman sc=tivw2kv2

The conman output, shown in Example 4-91, contains the number 10 in the LIMIT column, indicating that the command has worked correctly.
Example 4-91 conman output

Y:\win32app\TWS\tivw2kv2\bin>conman sc=tivw2kv2
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM 5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TIVW2KV2.  Batchman LIVES.  Limit: 10, Fence: 0, Audit Level: 0
sc=tivw2kv2
CPUID     RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TIVW2KV2    1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM

10.The two instances of IBM Tivoli Workload Scheduler are now installed in the cluster. Next, you need to configure the cluster software so that the two copies of IBM Tivoli Workload Scheduler will work in mutual takeover.

11.You can configure the two instances of IBM Tivoli Workload Scheduler in the cluster services by creating two sets of new resources, one for each of the three IBM Tivoli Workload Scheduler services: Tivoli Netman, Tivoli Token Service, and Tivoli Workload Scheduler. These two sets of three new resources have to be created in the same cluster group as the corresponding IBM Tivoli Workload Scheduler installation drive. The first set (TIVW2KV1) was installed in the X drive, so this drive is associated with cluster group TIVW2KV1. The second set (TIVW2KV2) was installed in the Y drive, so this drive is associated with cluster group TIVW2KV2.

12.Create the new resource Tivoli Token Service for each of the two IBM Tivoli Workload Scheduler engines by repeating steps 28 through 34 in 4.2.1, Single instance of IBM Tivoli Workload Scheduler on page 347. Use the parameters in Table 4-10 on page 392 for the first set (TIVW2KV1), and use the parameters in Table 4-11 on page 392 for the second set (TIVW2KV2).


Table 4-10 Tivoli Token Service definition for first instance

REF figure  Argument          Value                                        Description
4-90        Name              ITIVW2KV1 - Token Service                    Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90        Description       Tivoli Token Service for the first instance  Enter a description of this resource.
4-90        Resource type     Generic Service                              Select the resource type of service for ITIVW2KV1 - Token Service: select Generic Service.
4-90        Group             ITIVW2KV1                                    Select the group in which you want to create this resource.
4-93        Service name      tws_tokensrv_TIVW2KV1                        Enter the service name; this can be found in the Services panel.
4-93        Start parameters                                               This service does not need any start parameters, so leave this blank.

Table 4-11 Tivoli Token Service definition for second instance

REF figure  Argument          Value                                         Description
4-90        Name              ITIVW2KV2 - Token Service                     Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90        Description       Tivoli Token Service for the second instance  Enter a description of this resource.
4-90        Resource type     Generic Service                               Select the resource type of service for ITIVW2KV2 - Token Service: select Generic Service.
4-90        Group             ITIVW2KV2                                     Select the group in which you want to create this resource.
4-93        Service name      tws_tokensrv_TIVW2KV2                         Enter the service name; this can be found in the Services panel.
4-93        Start parameters                                                This service does not need any start parameters, so leave this blank.


13.Create the new resource Tivoli Netman Service for each of the two IBM Tivoli Workload Scheduler engines by repeating steps 35 through 40 in 4.2.1, Single instance of IBM Tivoli Workload Scheduler on page 347. Use the parameters in Table 4-12 for the first set (TIVW2KV1), and use the parameters in Table 4-13 for the second set (TIVW2KV2).
Table 4-12 Tivoli Netman Service definition for first instance

REF figure  Argument               Value                                         Description
4-90        Name                   ITIVW2KV1 - Netman Service                    Enter the name of the new resource. In this case, we used the cluster group name followed by the service.
4-90        Description            Tivoli Netman Service for the first instance  Enter a description of this resource.
4-90        Resource type          Generic Service                               Select the resource type of service for ITIVW2KV1 - Netman Service: select Generic Service.
4-90        Group                  ITIVW2KV1                                     Select the group in which you want to create this resource.
4-93        Service name           tws_netman_TIVW2KV1                           Type in the service name; this can be found in the Services panel.
4-93        Start parameters                                                     This service does not need any start parameters, so leave this blank.
4-96        Resource Dependencies  ITIVW2KV1 - Token Service                     The only resource dependency is the ITIVW2KV1 - Token Service.

Table 4-13 Tivoli Netman Service definition for second instance

REF figure  Argument               Value                                          Description
4-90        Name                   ITIVW2KV2 - Netman Service                     Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90        Description            Tivoli Netman Service for the second instance  Enter a description of this resource.
4-90        Resource type          Generic Service                                Select the resource type of service for ITIVW2KV2 - Netman Service: select Generic Service.
4-90        Group                  ITIVW2KV2                                      Select the group in which you want to create this resource.
4-93        Service name           tws_netman_TIVW2KV2                            Type in the service name; this can be found in the Services panel.
4-93        Start parameters                                                      This service does not need any start parameters, so leave this blank.
4-96        Resource Dependencies  ITIVW2KV2 - Token Service                      The only resource dependency is the ITIVW2KV2 - Token Service.

14.Create the new resource Tivoli Workload Scheduler for each of the two IBM Tivoli Workload Scheduler engines by repeating steps 41 through 48 in 4.2.1, Single instance of IBM Tivoli Workload Scheduler on page 347. Use the parameters in Table 4-14 for the first set (TIVW2KV1), and use the parameters in Table 4-15 on page 395 for the second set (TIVW2KV2).
Table 4-14 Tivoli Workload Scheduler definition for first instance

REF figure  Argument               Value                                             Description
4-90        Name                   ITIVW2KV1 - Tivoli Workload Scheduler             Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90        Description            Tivoli Workload Scheduler for the first instance  Enter a description of this resource.
4-90        Resource type          Generic Service                                   Select the resource type of service for ITIVW2KV1 - Tivoli Workload Scheduler: select Generic Service.
4-90        Group                  ITIVW2KV1                                         Select the group in which you want to create this resource.
4-93        Service name           tws_maestro_TIVW2KV1                              Enter the service name; this can be found in the Services panel.
4-93        Start parameters                                                         This service does not need any start parameters, so leave this blank.
4-96        Resource Dependencies  ITIVW2KV1 - Netman Service                        The only resource dependency is the ITIVW2KV1 - Netman Service.

Table 4-15 Tivoli Workload Scheduler definition for second instance

REF figure  Argument               Value                                              Description
4-90        Name                   ITIVW2KV2 - Tivoli Workload Scheduler              Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90        Description            Tivoli Workload Scheduler for the second instance  Enter a description of this resource.
4-90        Resource type          Generic Service                                    Select the resource type of service for ITIVW2KV2 - Tivoli Workload Scheduler: select Generic Service.
4-90        Group                  ITIVW2KV2                                          Select the group in which you want to create this resource.
4-93        Service name           tws_maestro_TIVW2KV2                               Enter the service name; this can be found in the Services panel.
4-93        Start parameters                                                          This service does not need any start parameters, so leave this blank.
4-96        Resource Dependencies  ITIVW2KV2 - Netman Service                         The only resource dependency is the ITIVW2KV2 - Netman Service.

15.All resources are now set up and configured correctly. Next, configure the cluster groups by going through the steps in 4.2.2, Configuring the cluster group on page 379. Use the parameters in Table 4-16 on page 396 for the first group (TIVW2KV1), and use the parameters in Table 4-13 on page 393 for the second group (TIVW2KV2).


Table 4-16 Cluster group settings for first instance

Name (General tab, Figure 4-99)
  Value: ITIVW2KV1 Group
  Description: This name should be there by default. If it is not, then verify that the correct group is selected.

Description (General tab, Figure 4-99)
  Value: This group is for the first instance of IBM Tivoli Workload Scheduler
  Description: Enter a description of this group.

Preferred owner (General tab, Figure 4-99)
  Value: TIVW2KV1
  Description: Select the preferred owner for this group. We selected TIVW2KV1.

Threshold (Failover tab, Figure 4-100)
  Value: 10
  Description: Enter a number to define how many times this group can fail over within a set time period.

Period (Failover tab, Figure 4-100)
  Value: 6
  Description: Enter the time period within which the threshold applies. We selected 6 hours.

Allow failback (Failback tab, Figure 4-101)
  Value: Check Allow failback
  Description: This will enable the facility to fail back to the preferred owner.

Failback between (Failback tab, Figure 4-101)
  Value: 4 and 6
  Description: Enter the time range during which you would like the group to fail back.

16.You now have the two instances of the IBM Tivoli Workload Scheduler engine installed on both sides and configured within the cluster, and the cluster configured in the best way to satisfy IBM Tivoli Workload Scheduler.

17.To test this installation, open the Cluster Administrator. Expand the Groups folder to show the two groups. Highlight one group, TIVW2KV1, and go to File -> Move Group. All resources should go offline, the owner should change from TIVW2K1 to TIVW2K2, and then all resources should come back online with the new owner.
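The same Move Group test can also be driven from the command line with Cluster.exe rather than the Cluster Administrator GUI. The following is only a sketch: the group and cluster names are those used in our environment, and the exact spelling of the /moveto option should be checked against the Cluster.exe help on your system.

```
rem Move the first IBM Tivoli Workload Scheduler group to the other node
cluster /cluster:ITIVW2KV1 group "ITIVW2KV1" /moveto:TIVW2K2

rem Show the group again to confirm the new owner and that all
rem resources have come back online
cluster /cluster:ITIVW2KV1 group "ITIVW2KV1"
```

This is convenient when you want to repeat the failover test as part of a scripted verification run.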

4.2.4 Installation of the IBM Tivoli Management Framework


The IBM Tivoli Management Framework (Tivoli Framework) is used as an authenticating layer for any user that is using the Job Scheduling Console to connect with the IBM Tivoli Workload Scheduler engine. There are two products that get installed in the Framework: Job Scheduling Services (JSS), and the Job


Scheduling Connector (JSC). Together these make up the connection between the Job Scheduling Console and the IBM Tivoli Workload Scheduler engine, as shown in Figure 4-103.

Figure 4-103 IBM Tivoli Workload Scheduler user authentication flow (the Job Scheduling Console, built from the GUI base code with TWS and OPC extensions, connects through the TMR)

There are a number of ways to install the Tivoli Framework. You can install the Tivoli Framework separately from the IBM Tivoli Workload Scheduler engine; in this case, install the Tivoli Framework before installing IBM Tivoli Workload Scheduler. Alternatively, if there is no Tivoli Framework installed on the system, you can use the Full install option when installing IBM Tivoli Workload Scheduler. This installs Tivoli Management Framework 4.1, Job Scheduling Services (JSS), and the Job Scheduling Connector (JSC), and adds the Tivoli Job Scheduling administration user. In this section, we describe how to install the IBM Tivoli Management Framework separately. Either before or after IBM Tivoli Workload Scheduler is configured for Microsoft Cluster and made highly available, you can add IBM Tivoli


Management Framework so that the Job Scheduling Console component of IBM Tivoli Workload Scheduler can be used. Note: IBM Tivoli Management Framework should be installed prior to IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5 of Tivoli Enterprise Installation Guide Version 4.1, GC32-0804. Here, we assume that you have already installed Tivoli Management Framework, and have applied the latest set of fix packs.

Because the IBM Tivoli Management Framework is not officially supported in a mutual takeover mode, we will install on the local disk on each side of the cluster, as shown in Figure 4-104.
Figure 4-104 Installation location for TMRs (the TMR, JSS, and JSC are installed on the local disk of each node; the two nodes are joined by public and private network connections and both attach to the shared disk)

The following instructions are only a guide to installing the Tivoli Framework. For more detailed information, refer to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804. To install Tivoli Framework, follow these steps: 1. Select node1 to install the Tivoli Framework on. In our configuration, node 1 is called TIVW2K1. 2. Insert the Tivoli Management Framework (1 of 2) CD into the CD-ROM drive, or map the CD from a drive on a remote system. 3. From the taskbar, click Start, and then select Run to display the Run window. 4. In the Open field, type x:\setup, where x is the CD-ROM drive or the mapped drive. The Welcome window is displayed.


5. Click Next. The License Agreement window is displayed.

6. Read the license agreement and click Yes to accept the agreement. The Accounts and File Permissions window is displayed.

7. Click Next. The Installation Password window is displayed.

8. In the Installation Password window, perform the following steps:

a. In the Password field, type an installation password, if desired. If you specify a password, this password must be used to install Managed Nodes, to create interregional connections, and to perform any installation using Tivoli Software Installation Service.

Note: During installation, the specified password becomes both the installation and the region password. To change the installation password, use the odadmin region set_install_pw command. To change the region password, use the odadmin region set_region_pw command. Note that if you change one of these passwords, the other password is not automatically changed.

b. Click Next. The Remote Access Account window is displayed.

9. In the Remote Access Account window, perform the following steps:

a. Type the Tivoli remote access account name and password through which Tivoli programs will access remote file systems. If you do not specify an account name and password and you use remote file systems, Tivoli programs will not be able to access these remote file systems.

Note: If you are using remote file systems, the password must be at least one character. If the password is null, the object database is created, but you cannot start the object dispatcher (the oserv service).

b. Click Next. The Setup Type window is displayed.

10.In the Setup Type window, do the following:

a. Select one of the following setup types:

Typical - Installs the IBM Tivoli Management Framework product and its documentation library.

Compact - Installs only the IBM Tivoli Management Framework product.

Custom - Installs the IBM Tivoli Management Framework components that you select.


b. Accept the default destination directory or click Browse to select a path to another directory on the local system.

Note: Do not install on remote file systems or share Tivoli Framework files among systems in a Tivoli environment.

c. Click Next. If you selected the Custom option, the Select Components window is displayed. If you selected Compact or Typical, go to step 12.

11.(Custom setup only) In the Select Components window, do the following:

a. Select the components to install. From this window you can preview the disk space required by each component, as well as change the destination directory.

b. If desired, click Browse to change the destination directory.

c. Click Next. The Choose Database Directory window is displayed.

12.In the Choose Database Directory window, do the following:

a. Accept the default destination directory, or click Browse to select a path to another directory on the local system.

b. Click Next. The Enter License Key window is displayed.

13.In the Enter License Key window, do the following:

a. In the Key field, type: IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41.

b. Click Next. The Start Copying Files window is displayed.

14.Click Next. The Setup Status window is displayed.

15.After installing the IBM Tivoli Management Framework files, the setup program initializes the Tivoli object dispatcher server database. When the initialization is complete, you are prompted to press any key to continue.

16.If this is the first time you have installed IBM Tivoli Management Framework on this system, you are prompted to restart the machine.

Tip: Rebooting the system loads the TivoliAP.dll file.

17.After the installation completes, configure the Windows operating system for SMTP e-mail. From a command line prompt, enter the following commands:
%SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd
bash
wmailhost hostname

18.Tivoli Management Framework is installed on node 1, so now install it on node 2. In our configuration, node 2 is called TIVW2K2.


19.Log into node 2 (TIVW2K2) and repeat steps 2 through to 17.

4.2.5 Installation of Job Scheduling Services


To install IBM Workload Scheduler Job Scheduling Services 8.2, you must have the following component installed within your IBM Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1

You must install the Job Scheduling Services on the Tivoli Management Region server or on a Managed Node on the same workstation where the Tivoli Workload Scheduler engine code is installed.

Note: You only have to install this component if you wish to monitor or access the local data on the Tivoli Workload Scheduler engine through the Job Scheduling Console.

You can install and upgrade the components of the Job Scheduling Services using any of the following installation mechanisms:

- By using an installation program, which creates a new Tivoli Management Region server and automatically installs or upgrades the IBM Workload Scheduler Connector and Job Scheduling Services

- By using the Tivoli desktop, where you select which product and patches to install on which machine

- By using the winstall command provided by Tivoli Management Framework, where you specify which products and patches to install on which machine

Here we provide an example of installing the Job Scheduling Services using the Tivoli Desktop. Ensure you have set the Tivoli environment by issuing the command %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd, then follow these steps:

Note: Before installing any new product into the Tivoli Management Region server, make a backup of the Tivoli database.

1. First select node 1 to install the Tivoli Job Scheduling Services on. In our configuration, node 1 is called TIVW2K1.

2. Open the Tivoli Desktop on TIVW2K1.

3. From the Desktop menu choose Install, then Install Product. The Install Product window is displayed.


4. Click Select Media to select the installation directory. The File Browser window is displayed. 5. Type or select the installation path. This path includes the directory containing the CONTENTS.LST file. 6. Click Set Media & Close. You return to the Install Product window. 7. In the Select Product to Install list, select Tivoli Job Scheduling Services v. 1.2. 8. In the Available Clients list, select the nodes to install on and move them to the Clients to Install On list. 9. In the Install Product window, click Install. The Product Install window is displayed, which shows the operations to be performed by the installation program. 10.Click Continue Install to continue the installation, or click Cancel to cancel the installation. 11.The installation program copies the files and configures the Tivoli database with the new classes. When the installation is complete, the message Finished product installation appears. Click Close. 12.Now select node 2 to install the Tivoli Job Scheduling Services on. In our configuration, node 2 is called TIVW2K2. 13.Repeat steps 2 through to 11.
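The winstall mechanism mentioned above can script the same installation from the command line. The following is only a sketch: the image path and product index file name are placeholders for illustration, and the exact flags vary by release, so take the precise invocation from the Tivoli Management Framework documentation.

```
winstall -c /cdrom/JSS -i JSS.IND -y TIVW2K1 TIVW2K2
```

Here -c points at the installation image directory, -i names the product index file on that image, and -y accepts the default installation options for the listed Managed Nodes.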

4.2.6 Installation of Job Scheduling Connector


To install IBM Workload Scheduler Connector 8.2, you must have the following components installed within your Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1
- Tivoli Job Scheduling Services 1.3

You must install IBM Tivoli Workload Scheduler Connector on the Tivoli Management Region server or on a Managed Node on the same workstation where the Tivoli Workload Scheduler engine code is installed.

Note: You only have to install this component if you wish to monitor or access the local data on the Tivoli Workload Scheduler engine through the Job Scheduling Console.

You can install and upgrade the components of IBM Tivoli Workload Scheduler Connector using any of the following installation mechanisms:


- By using an installation program, which creates a new Tivoli Management Region server and automatically installs or upgrades IBM Workload Scheduler Connector and Job Scheduling Services

- By using the Tivoli Desktop, where you select which product and patches to install on which machine

- By using the winstall command provided by Tivoli Management Framework, where you specify which products and patches to install on which machine

Connector installation and customization varies, depending on whether your Tivoli Workload Scheduler master is on a Tivoli server or a Managed Node.

When the Workload Scheduler master is on a Tivoli server, you must install both Job Scheduling Services and the Connector on the Tivoli server of your environment. You must also create a Connector instance for the Tivoli server. You can do this during installation by using the Create Instance check box and completing the required fields. In this example, we are installing the Connector in this type of configuration.

When the Workload Scheduler master is on a Managed Node, you must install Job Scheduling Services on the Tivoli server and on the Managed Node where the master is located. You must then install the Connector on the Tivoli server and on the same nodes where you installed Job Scheduling Services. Ensure that you do not select the Create Instance check box.

If you have more than one node where you want to install the Connector (for example, if you want to access the local data of a fault-tolerant agent through the Job Scheduling Console), you can install Job Scheduling Services and the Connector on multiple machines. However, in this case you should deselect the Create Instance check box.

Following is an example of how to install the Connector using the Tivoli Desktop. Ensure you have installed Job Scheduling Services and have set the Tivoli environment.
Then follow these steps: Note: Before installing any new product into the Tivoli Management Region server, make a backup of the Tivoli database. 1. Select node 1 to install Tivoli Job Scheduling Connector on. In our configuration, node 1 is called TIVW2K1. 2. Open the Tivoli Desktop on TIVW2K1. 3. From the Desktop menu choose Install, then Install Product. The Install Product window is displayed. 4. Click Select Media to select the installation directory. The File Browser window is displayed.


5. Type or select the installation path. This path includes the directory containing the CONTENTS.LST file.

6. Click Set Media & Close. You will return to the Install Product window.

7. In the Select Product to Install list, select Tivoli TWS Connector v. 8.2. The Install Options window is displayed.

8. This window enables you to:

- Install the Connector only.
- Install the Connector and create a Connector instance.

9. To install the Connector without creating a Connector instance, leave the Create Instance check box blank and leave the General Installation Options fields blank. These fields are used only during the creation of the Connector instance.

10.To install the Connector and create a Connector instance:

a. Select the Create Instance check box.

b. In the TWS directory field, specify the directory where IBM Tivoli Workload Scheduler is installed.

c. In the TWS instance name field, specify a name for the IBM Tivoli Workload Scheduler instance on the Managed Node. This name must be unique in the network. It is preferable to use the name of the scheduler agent as the instance name.

11.Click Set to close the Install Options window and return to the Install Product window.

12.In the Available Clients list, select the nodes to install on and move them to the Clients to Install On list.

13.In the Install Product window, click Install. The Product Install window is displayed, which shows you the progress of the installation.

14.Click Continue Install to continue the installation, or click Cancel to cancel the installation.

15.The installation program copies the files and configures the Tivoli database with the new classes. When the installation is complete, the message Finished product installation appears. Click Close.

16.Now select node 2 to install the Tivoli Job Scheduling Connector on. In our configuration, node 2 is called TIVW2K2.

17.Repeat steps 2 through to 15.


4.2.7 Creating Connector instances


You need to create one Connector instance on each Framework server (one on each side of the cluster) for each engine that you want to access with the Job Scheduling Console. If you selected the Create Instance check box when running the installation program or installing from the Tivoli desktop, you do not need to perform the following procedure; in our environment, however, we did need to do this. To create Connector instances from the command line, ensure you set the Tivoli environment, then enter the following command on the Tivoli server or Managed Node where you installed the Connector that you need to access through the Job Scheduling Console:
wtwsconn.sh -create -h node -n instance_name -t TWS_directory

So in our case we need to run this four times, twice on one Framework server, and twice on the other, using these parameters: First, on node TIVW2K1
wtwsconn.sh -create -n TIVW2K1_rg1 -t X:\win32app\TWS\TWS82 wtwsconn.sh -create -n TIVW2K2_rg1 -t Y:\win32app\TWS\TWS82

Then on node TIVW2K2


wtwsconn.sh -create -n TIVW2K1_rg2 -t X:\win32app\TWS\TWS82 wtwsconn.sh -create -n TIVW2K2_rg2 -t Y:\win32app\TWS\TWS82
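Before moving on, you can confirm that each Framework registered its two local Connector instances. At this point the Framework servers are not yet interconnected, so each node should only list the instances created on it; the expected names below follow from the wtwsconn.sh commands just shown.

```
wlookup -Lar MaestroEngine
```

On TIVW2K1 this should list TIVW2K1_rg1 and TIVW2K2_rg1; on TIVW2K2, TIVW2K1_rg2 and TIVW2K2_rg2. All four names appear on both nodes only after the interconnection and resource exchange described in the next section.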

4.2.8 Interconnecting the two Tivoli Framework Servers


We have now successfully installed and configured the two instances of the IBM Tivoli Workload Scheduler engine on the shared disk system in the Microsoft Cluster (4.2.3, Two instances of IBM Tivoli Workload Scheduler in a cluster on page 383), and the two Tivoli Management Frameworks, one on each workstation in the cluster on the local disk (4.2.4, Installation of the IBM Tivoli Management Framework on page 396). We have also successfully installed the Job Scheduling Services (4.2.5, Installation of Job Scheduling Services on page 401) and the Job Scheduling Connectors in both of the Tivoli Management Frameworks. We now need to share the IBM Tivoli Management Framework resources so that if one side of the cluster is down, the operator can log into the other Tivoli Management Framework and see both IBM Tivoli Workload Scheduler engines through the Connectors. To achieve this, we need to share the resources between the two Tivoli Framework servers; this is called interconnection.


Framework interconnection is a complex subject. We will show how to interconnect the Framework servers for our environment, but you should plan your interconnection carefully if your installation of IBM Tivoli Workload Scheduler is part of a larger Tivoli Enterprise environment. To interconnect the Framework servers for IBM Tivoli Workload Scheduler for the environment used in this redbook, first ensure you have set the Tivoli environment by issuing %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd. Then follow these steps:

1. Before starting, make a backup of the IBM Tivoli Management Framework object database using the wbkupdb command. Log onto each node as the Windows Administrator, and run a backup of the object database on each node.

2. Run the following wlookup commands on cluster node 1 to verify that the Framework objects exist before interconnecting them. The syntax of the commands is:
wlookup -Lar ManagedNode

and
wlookup -Lar MaestroEngine

3. Run the same wlookup commands on the other node in the cluster to see if the objects exist. 4. Interconnect the Framework servers in a two-way interconnection using the wconnect command. For a full description of how to use this command, refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806. While logged on to node TIVW2K1, enter the following command:
wconnect -c none -l administrator -m Two-way -r none tivw2k2

Note: The two-way interconnection only needs to be established from one side. If you have two cluster nodes, you only need to run the wconnect command on one of them.

5. Use the wlsconn and odadmin commands to verify that the interconnection between the two Framework servers has worked. The output of the wlsconn command contains the primary IP hostname of the node that was interconnected in the preceding step. In our environment, the primary IP hostname of cluster node TIVW2K2 is found under the SERVER column in the output of the wlsconn command. The same value is found under the Hostname(s) column in the output of the odadmin command, on the row that shows the Tivoli region ID of the cluster node.
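The verification in step 5 amounts to two commands run in the Tivoli environment. The comments indicate what to look for in our environment; the exact column layout depends on the Framework release.

```
wlsconn         # the SERVER column should show the hostname of TIVW2K2
odadmin odlist  # the Hostname(s) column shows the same name on the row
                # carrying the Tivoli region ID of that cluster node
```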


6. Interconnecting Framework servers only establishes a communication path. The Framework resources that need to be shared between Framework servers have to be pulled across the servers by using an explicit updating command. Sharing a Framework resource shares all the objects that the resource defines. This enables Tivoli administrators to securely control which Framework objects are shared between Framework servers, and to control the performance of the Tivoli Enterprise environment by leaving unnecessary resources out of the exchange of resources between Framework servers. Exchange all relevant Framework resources among cluster nodes by using the wupdate command. In our environment we exchanged the following Framework resources:

- ManagedNode
- MaestroEngine
- MaestroDatabase
- MaestroPlan
- SchedulerEngine
- SchedulerDatabase
- SchedulerPlan
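A sketch of the exchange, run from each node in the bash environment set up earlier; wupdate pulls the named resource from the specified remote region, and the region name shown here is a placeholder for the name of the other node's region:

```
for r in ManagedNode MaestroEngine MaestroDatabase MaestroPlan \
         SchedulerEngine SchedulerDatabase SchedulerPlan
do
    wupdate -r $r remote_region_name
done
```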

Important: The wupdate command must be run on all cluster nodes, even on two-way interconnected Framework servers.

The SchedulerEngine Framework resource enables the interconnected scheduling engines to present themselves in the Job Scheduling Console. The MaestroEngine Framework resource enables the wmaeutil command to manage running instances of Connectors.

7. Now verify that the exchange of the Framework resources has worked. You can use the wlookup command with the following parameters:
wlookup -Lar ManagedNode

and
wlookup -Lar MaestroEngine

When you use the wlookup command with the parameter ManagedNode, you will see the two nodes in this cluster. When you use the same command with the parameter MaestroEngine, you should see the four names that are associated with the Connector instances created on the two nodes.

8. Run the same sequence of wlookup commands, but on the cluster node on the opposite side of the interconnection. The output from the commands should be identical to that of the same commands run on the cluster node in the preceding step.


9. Log into both cluster nodes through the Job Scheduling Console, using the service IP labels of the cluster nodes and the root user account. All scheduling engines (corresponding to the configured Connectors) on all cluster nodes appear. Those scheduling engines marked inactive are not active because the resource group is not running on that cluster node. 10.Set up a periodic job to exchange Framework resources by using the wupdate command shown in the preceding steps. The frequency that the job should run at depends upon how often changes are made to the Connector objects. For most sites, best practice is a daily update about an hour before Jnextday. Timing it before Jnextday makes the Framework resource update compatible with any changes to the installation location of IBM Tivoli Workload Scheduler. These changes are often timed to occur right before Jnextday is run.
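The periodic job in step 10 could wrap the resource exchange in a small command file scheduled about an hour before Jnextday on each node. This is a sketch: the file name and the remote region name are assumptions, and the resource list matches the one we exchanged above.

```
@echo off
rem refresh_resources.cmd - daily exchange of Framework resources
rem (schedule roughly one hour before Jnextday)
call %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd
bash -c "for r in ManagedNode MaestroEngine MaestroDatabase MaestroPlan SchedulerEngine SchedulerDatabase SchedulerPlan; do wupdate -r $r remote_region_name; done"
```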

4.2.9 Installing the Job Scheduling Console


The Job Scheduling Console can be installed on any workstation that has a TCP/IP connection. However, to use the Job Scheduling Console Version 1.3, you should have the following components installed within your IBM Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1
- Tivoli Job Scheduling Services 1.3
- IBM Tivoli Workload Scheduler Connector 8.2

For a full description of the installation, refer to IBM Tivoli Workload Scheduler Job Scheduling Console Users Guide Feature Level 1.3, SC32-1257, and to IBM Tivoli Workload Scheduler Version 8.2: New Features and Best Practices, SG24-6628. For the most current information about supported platforms and system requirements, refer to IBM Tivoli Workload Scheduler Job Scheduling Console Release Notes, SC32-1258.

An installation program is available for installing the Job Scheduling Console. You can install directly from the CDs. Alternatively, copy the CD to a network drive and map that network drive. You can install the Job Scheduling Console using any of the following installation mechanisms:

- By using an installation wizard that guides the user through the installation steps

- By using a response file that provides input to the installation program without user intervention

- By using Software Distribution to distribute the Job Scheduling Console files


Here we provide an example of the first method, using the installation wizard interactively. The installation program can perform a number of actions:

- Fresh install
- Add new languages to an existing installation
- Repair an existing installation

Here we assume that you are performing a fresh install. The installation is exactly the same for a non-cluster installation as for a clustered environment.

1. Insert the IBM Tivoli Workload Scheduler Job Scheduling Console CD 1 in the CD-ROM drive.

2. Navigate to the JSC directory.

3. Locate the directory of the platform on which you want to install the Job Scheduling Console, and run the setup program for the operating system on which you are installing: Windows: setup.exe; UNIX: setup.bin

4. The installation program is launched. Select the language in which you want the program to be displayed, and click OK.

5. Read the welcome information and click Next.

6. Read the license agreement, select the acceptance radio button, and click Next.

7. Select the location for the installation, or click Browse to install to a different directory. Click Next.

Note: The Job Scheduling Console installation directory inherits the access rights of the directory where the installation is performed. Because the Job Scheduling Console requires user settings to be saved, it is important to select a directory in which users are granted access rights.

8. On the dialog displayed, you can select the type of installation you want to perform:

Typical - English and the language of the locale are installed. Click Next.

Custom - Select the languages you want to install and click Next.

Full - All languages are automatically selected for installation. Click Next.

9. A panel is displayed where you can select the locations for the program icons. Click Next.

10.Review the installation settings and click Next. The installation is started.


11.When the installation completes, a panel either reports a successful installation, or lists the items that failed to install and the location of the log file containing the details of the errors.

12.Click Finish.

4.2.10 Scheduled outage configuration


After IBM Tivoli Workload Scheduler is installed as described above and working correctly, there are two separate situations in which an IBM Tivoli Workload Scheduler Master Domain Manager or domain manager that is configured in a cluster does not link to the agents that are defined in the network it is managing. Those situations and their solutions are described here:

Situation 1
The first situation is when an IBM Tivoli Workload Scheduler Master Domain Manager or domain manager failover or failback occurs in the cluster. When the IBM Tivoli Workload Scheduler engine restarts, the Fault Tolerant Agents that are defined in the network that this Master Domain Manager is managing can remain in the UNLINKED state.

Solution
The solution for this situation is to issue the conman command conman link @;noask. When you issue this command, the IBM Tivoli Workload Scheduler engine links up all the Fault Tolerant Agents that it is managing in its network. To make this an unattended solution, you can put this command into a command file; once it is in a command file, you can run the command file in the failover/failback procedure. To make a command file run as a service, use the program svrany.exe, which can be found in the Resource Kit. svrany.exe allows any bat, cmd, or exe file to be run as a service. If the bat, cmd, or exe file is not a real service, it is executed once at the start of the service, which is just what is required in this situation. To set up this unattended solution, follow this procedure on each node in the cluster:

1. Create a service with the command INSTSRV service_name full_path_to_svrany.exe. This service will execute as the IBM Tivoli Workload Scheduler installation userid.

2. Run regedit to edit the created service:

a. Add a Parameters key (at the same level as Enum and Security).

b. Add a string value named Application to the added key.


c. Assign the value of the added string to full_path_to_command_link_cmd:

Figure 4-105 New TWS_Link service

3. Set up a cluster service that refers to the Link node service (similar to the cluster services set up for IBM Tivoli Workload Scheduler).

4. Make the TWS_Link cluster service dependent on the IBM Tivoli Workload Scheduler cluster service.

Now when the node fails over or fails back, the cluster will do the following:

- When the Network, IP, and Disk resources are available, the Token cluster service will start.
- When the Token cluster service is available, the Netman cluster service will start.
- When the Netman cluster service is available, the TWS cluster service will start.
- When the TWS cluster service is available, the Link cluster service will start.
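The command file that svrany.exe launches only needs to change to the engine's home directory and issue the link. A minimal sketch, using the drive letter and path of our first instance (the file name link.cmd is an assumption):

```
@echo off
rem link.cmd - run once when the TWS_Link service starts;
rem relinks all Fault Tolerant Agents after a failover or failback
X:
cd \win32app\TWS\TWS82
conman "link @;noask"
```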

Situation 2
The second situation is when IBM Tivoli Workload Scheduler Master Domain Manager executes the Jnextday script. This script is used to create the new production day. When the Jnextday script runs, it shuts down the workstations that are under the control of the Master Domain Manager and restarts them (this is a normal operation). During this process, the Master Domain Manager is also shut down and restarted. During this time the MSCS cluster is monitoring these processes and when the processes are shut down, the MSCS cluster marks them as failed and logs this event in the Windows EventLog.

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster

411

The MSCS cluster, however, expects these services to be stopped and started using cluster administrator commands. A command-line version of these commands exists and is documented on the Microsoft Web site:
http://www.microsoft.com/windows2000/en/datacenter/help/sag_mscsacluscmd_0. htm

Figure 4-106 displays the syntax for the cluster resource command.

The basic cluster resource syntax is:

    cluster [[/cluster:]cluster name] resource resource name /option

The cluster name is optional. If no option is specified, the default option is /status. If the name of your cluster is also a cluster command or its abbreviation, such as cluster or resource, use /cluster: to explicitly specify the cluster name. For a list of all the cluster commands, see Related Topics.

With /online and /offline, the option /wait[:timeout in seconds] specifies how long Cluster.exe waits before canceling the command if it does not successfully complete. If no time-out is specified, Cluster.exe waits indefinitely or until the resource state changes.

Figure 4-106 The basic cluster resource syntax

Solution
The solution for the second situation is to create two cmd files, as discussed here.

The first file


The first file will issue offline commands to the cluster resource, as shown in Example 4-92 on page 412.
Example 4-92 Sample script to bring the TWS Cluster OFFLINE

@echo off
rem ********************************************************
rem * Bring TWS Cluster OFFLINE on MSCS Cluster           *
rem ********************************************************
echo ********************************************************************************
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .
echo Set cluster status
rem FIRST bring 'Linkage' offline, then 'TWS' !!! (reverse order from online)
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage" /offline
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)" /offline
echo .
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .
echo ********************************************************************************

Create an IBM Tivoli Workload Scheduler job for this first script and schedule it two minutes before Jnextday runs (the default for Jnextday is 0559, so set the first script to run at 0557). The successful execution of this script stops the monitoring of the IBM Tivoli Workload Scheduler service (tws_maestro in this MSCS cluster), as the IBM Tivoli Workload Scheduler services are now offline. During the normal execution of Jnextday, a conman stop command is issued, but because the services are already down, this command has no effect and no warning or error messages are produced. Jnextday also issues a conman start command, which brings up the TWS node service; but because the cluster did not start these services, the MSCS cluster will still report them as offline.

Tip: The first script should not stop the netman process. Why? Because if the netman process were stopped, the Master Domain Manager would not be able to restart this agent or Domain Manager during Jnextday.
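The "two minutes before Jnextday" arithmetic generalizes to any start-of-day time. The sketch below (the function name is ours) subtracts two minutes from an HHMM string, wrapping past midnight:

```shell
#!/bin/sh
# minus_two HHMM: print the HHMM time two minutes earlier, wrapping at midnight.
# Example: the Jnextday default of 0559 yields 0557 for the offline script.
minus_two() {
    echo "$1" | awk '{
        t = (substr($1, 1, 2) * 60 + substr($1, 3, 2) + 1438) % 1440
        printf "%02d%02d\n", int(t / 60), t % 60
    }'
}

minus_two 0559
```

minus_two 0001 prints 2359, so a Jnextday that starts just after midnight still gets a valid schedule time on the previous production day.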

The second file


The second file will issue online commands to the cluster resource. Define this second file as a job in IBM Tivoli Workload Scheduler, without any dependencies, so that it runs right after Jnextday.


This job to bring the cluster services online will fail because the node services are already present, but the cluster service status has been updated and now shows the IBM Tivoli Workload Scheduler cluster service as online. Example 4-93 on page 414 displays the second file.
Example 4-93 Sample script to bring the TWS Cluster ONLINE

@echo off
rem ********************************************************
rem * Set TWS FTA Cluster status to ONLINE on MSCS Cluster *
rem ********************************************************
echo ********************************************************************************
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .
echo Set cluster status
rem FIRST bring 'TWS' online, then 'Linkage' !!! (reverse order from offline)
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)" /online
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage" /online
echo .
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .
echo ********************************************************************************

Because the IBM Tivoli Workload Scheduler cluster is brought offline by a cluster command, there are no error entries in the EventLog; only cluster degraded warning entries are displayed, which is normal.


Chapter 5.

Implement IBM Tivoli Management Framework in a cluster


In this chapter, we show you how to implement IBM Tivoli Management Framework in a highly available cluster. Unlike in the preceding chapters, we show an implementation that consists only of IBM Tivoli Management Framework; we do not involve high availability considerations for IBM Tivoli Workload Scheduler. We specifically discuss the following:
- "Implement IBM Tivoli Management Framework in an HACMP cluster" on page 416
- "Implementing Tivoli Framework in a Microsoft Cluster" on page 503

While this is the basis for a highly available Tivoli Enterprise configuration, specific IBM Tivoli products may present unique high availability issues not covered in this redbook. Consult your IBM service provider for assistance with designing and implementing high availability for products such as IBM Tivoli Enterprise Console, IBM Tivoli Configuration Manager, and IBM Tivoli Monitoring.

Copyright IBM Corp. 2004. All rights reserved.

415

5.1 Implement IBM Tivoli Management Framework in an HACMP cluster


IBM Support does not officially support implementing two instances of IBM Tivoli Management Framework on a single operating system image. While it is technically possible to implement this configuration, it is not supported. You can read more about this configuration in the IBM Redbook High Availability Scenarios for Tivoli Software, SG24-2032. In this chapter, we show a supported HA configuration for a Tivoli server.

Important: Even though both this chapter and 4.1.11, "Add IBM Tivoli Management Framework" on page 303 deal with configuring IBM Tivoli Management Framework for HACMP, they should be treated as separate from each other. This chapter describes how to configure IBM Tivoli Management Framework by itself. Chapter 4, "IBM Tivoli Workload Scheduler implementation in a cluster" on page 183, in contrast, deals with how to configure IBM Tivoli Management Framework and IBM Tivoli Workload Scheduler as an integrated whole.

This chapter provides implementation details for IBM Tivoli Management Framework 4.1. For a discussion of how to implement IBM Tivoli Management Framework 3.7b on the MSCS platform, refer to Appendix B, "TMR clustering for Tivoli Framework 3.7b on MSCS" on page 601. We also discuss how to configure Managed Nodes and Endpoints for high availability.

The general steps to implement IBM Tivoli Management Framework for HACMP are:
1. "Inventory hardware" on page 417
2. "Planning the high availability design" on page 418
3. "Create the shared disk volume" on page 420
4. "Install IBM Tivoli Management Framework" on page 453
5. "Tivoli Web interfaces" on page 464
6. "Tivoli Managed Node" on page 464
7. "Tivoli Endpoints" on page 466
8. "Configure HACMP" on page 480

The following sections describe each of these steps in detail.


5.1.1 Inventory hardware


Here we present an inventory of the hardware we used in writing this redbook. By comparing your environment against ours, you can determine what changes you may need to make when using this book as a guide for your own deployment.

Our environment consisted of two IBM RS/6000 7025-F80s, identically configured. Each system has four PowerPC RS64-III 450 MHz processors and 1 GB of RAM. We determined the amount of RAM by using the lsattr command:
lsattr -El mem0

The firmware is at level CL030829, which we verified by using the lscfg command:
lscfg -vp | grep -F .CL

Best practice is to bring your hardware up to the latest firmware and microcode levels. Download the most recent firmware and microcode from:
http://www-1.ibm.com/servers/eserver/support/pseries/fixes/hm.html

The following devices are installed in each system:
- SCSI 8mm Tape Drive (20000 MB)
- 5 x 16-bit LVD SCSI Disk Drive (9100 MB)
- 16-bit SCSI Multimedia CD-ROM Drive (650 MB)

There are four adapter cards in each system:
- IBM 10/100 Mbps Ethernet PCI Adapter
- IBM 10/100/1000 Base-T Ethernet PCI Adapter (14100401)
- IBM SSA 160 SerialRAID Adapter
- IBM PCI Token ring Adapter

We did not use the IBM PCI Token ring Adapter. Shared between the two systems is an IBM 7133 Model 010 Serial Disk System disk tray. Download the most recent SSA drive microcode from:
http://www.storage.ibm.com/hardsoft/products/ssa/index.html

The IBM SSA 160 SerialRAID Adapter is listed in this Web site as the Advanced SerialRAID Adapter. In our environment, the adapters are at loadable microcode level 05, ROS level BD00.


There are 16 SSA drives physically installed in the disk tray, but only 8 are active. The SSA drives are 2 GB type DFHCC2B1, at microcode level 8877. In the preceding Web page, the drives are listed as type DFHC (RAMST).

5.1.2 Planning the high availability design


The restriction against two instances of IBM Tivoli Management Framework on the same operating system image prevents mutual takeover implementations. Instead, we show in this section how to install IBM Tivoli Management Framework and configure it in AIX HACMP for a two-node hot standby cluster. In this configuration, IBM Tivoli Management Framework is active on only one cluster node at a time, but is installed onto a shared volume group available to all cluster nodes. It is configured to always run from the service IP label and corresponding IP address of the cluster node it normally runs upon. Tivoli Desktop sessions connect to this IP address. In our environment we configured the file system /opt/hativoli on the shared volume group. In normal operation in our environment, the oserv server of IBM Tivoli Management Framework runs on tivaix1 as shown in Figure 5-1 on page 419.


Figure 5-1 IBM Tivoli Management Framework in normal operation on tivaix1

If IBM Tivoli Management Framework on tivaix1 falls over to tivaix2, the IP service label and shared file system are automatically configured by HACMP onto tivaix2. Tivoli Desktop sessions are disconnected when the oserv server is shut down, so users of Tivoli Desktop will have to log back in. The fallover scenario is shown in Figure 5-2 on page 420.


Figure 5-2 State of cluster after IBM Tivoli Management Framework falls over to tivaix2

All managed resources are brought over at the same time because the entire object database is contained in /opt/hativoli. As far as IBM Tivoli Management Framework is concerned, there is no functional difference between running on tivaix1 or tivaix2.

5.1.3 Create the shared disk volume


In this section, we show you how to create and configure a shared disk volume to install IBM Tivoli Management Framework into. Before installing HACMP, we create the shared volume group and install the application servers in them. We can then manually test the fallover of the application server before introducing HACMP.


Plan the shared disk


The cluster needs a shared volume group to host IBM Tivoli Management Framework, so that participating cluster nodes can take over and vary on the volume group during a fallover. Here we show how to plan shared volume groups for an HACMP cluster that uses SSA drives. Start by making an assessment of the SSA configuration on the cluster.

Assess SSA links


Ensure that all SSA links are viable, to rule out any SSA cabling issues before starting other assessments. To assess SSA links:
1. Enter: smit diag.
2. Go to Current Shell Diagnostics and press Enter. The DIAGNOSTIC OPERATING INSTRUCTIONS diagnostics screen displays some navigation instructions.
3. Press Enter. The FUNCTION SELECTION diagnostics screen displays diagnostic functions.
4. Go to Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.) -> SSA Service Aids -> Link Verification and press Enter. The LINK VERIFICATION diagnostics screen displays a list of SSA adapters to test. Go to an SSA adapter to test and press Enter. In our environment, we selected the SSA adapter ssa0 on tivaix1, as shown in Figure 5-3 on page 422.


LINK VERIFICATION                                                    802385

Move cursor onto selection, then press <Enter>.

  tivaix1:ssa0    2A-08    IBM SSA 160 SerialRAID Adapter (

F3=Cancel    F10=Exit

Figure 5-3 Start SSA link verification on tivaix1

5. The link verification test screen displays the results of the test. The results of the link verification test in our environment are shown in Figure 5-4 on page 423.


LINK VERIFICATION                                                    802386

SSA Link Verification for: tivaix1:ssa0  2A-08  IBM SSA 160 SerialRAID Adapter (

To Set or Reset Identify, move cursor onto selection, then press <Enter>

Physical           Serial#     A1  A2  B1  B2   Status
tivaix1:pdisk9     AC7D2457     0   4           Good
tivaix1:pdisk8     AC7D200F     1   3           Good
tivaix1:pdisk11    AC7D25F9     2   2           Good
tivaix1:pdisk13    AC7D2654     3   1           Good
tivaix2:ssa0:A                  4   0
tivaix1:pdisk10    AC7D25A4             0   4   Good
tivaix1:pdisk14    AC7D2A94             1   3   Good
tivaix1:pdisk12    AC7D25FE             2   2   Good
tivaix1:pdisk16    29922C0B             3   1   Good
tivaix2:ssa0:B                          4   0

F3=Cancel    F10=Exit

Figure 5-4 Results of link verification test on SSA adapter ssa0 in tivaix1

The link verification test indicates that only the following SSA disks are available on tivaix1: pdisk8, pdisk9, pdisk10, pdisk11, pdisk12, pdisk13, pdisk14, and pdisk16.
6. Repeat the operation for the remaining cluster nodes. In our environment, we ran the link verification test for SSA adapter ssa0 on tivaix2, as shown in Figure 5-5 on page 424.


LINK VERIFICATION                                                    802386

SSA Link Verification for: tivaix2:ssa0  17-08  IBM SSA 160 SerialRAID Adapter (

To Set or Reset Identify, move cursor onto selection, then press <Enter>

Physical           Serial#     A1  A2  B1  B2   Status
tivaix1:ssa0:A                  0   4
tivaix2:pdisk1     AC7D2457     1   3           Good
tivaix2:pdisk0     AC7D200F     2   2           Good
tivaix2:pdisk3     AC7D25F9     3   1           Good
tivaix2:pdisk5     AC7D2654     4   0           Good
tivaix1:ssa0:B                          0   4
tivaix2:pdisk2     AC7D25A4             1   3   Good
tivaix2:pdisk6     AC7D2A94             2   2   Good
tivaix2:pdisk4     AC7D25FE             3   1   Good
tivaix2:pdisk7     29922C0B             4   0   Good

F3=Cancel    F10=Exit

Figure 5-5 Results of SSA link verification test on SSA adapter ssa0 in tivaix2

The link verification test indicates that only the following SSA disks are available on tivaix2: pdisk0, pdisk1, pdisk2, pdisk3, pdisk4, pdisk5, pdisk6, and pdisk7.

Identify the SSA connection addresses


The connection address uniquely identifies an SSA device. To display the connection address of a physical disk, follow these steps:
1. Enter: smit chgssapdsk. The SSA Physical Disk SMIT selection screen displays a list of known physical SSA disks.
   Note: You can also enter: smit devices, then go to SSA Disks -> SSA Physical Disks -> Change/Show Characteristics of an SSA Physical Disk and press Enter.
2. Go to an SSA disk and press Enter, as shown in Figure 5-6 on page 425.


+--------------------------------------------------------------------------+
                              SSA Physical Disk

  Move cursor to desired item and press Enter.

  [TOP]
  pdisk0    Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk1    Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk10   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk11   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk12   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk13   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk14   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk15   Defined    2A-08-P  Other SSA Disk Drive
  pdisk16   Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk2    Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk3    Defined    2A-08-P  2GB SSA C Physical Disk Drive
  [MORE...6]

  F1=Help     F2=Refresh   F3=Cancel   F8=Image
  F10=Exit    Enter=Do     /=Find      n=Find Next
+--------------------------------------------------------------------------+

Figure 5-6 Select an SSA disk from the SSA Physical Disk SMIT selection screen

3. The Change/Show Characteristics of an SSA Physical Disk SMIT screen displays the characteristics of the selected SSA disk. The Connection address field displays the SSA connection address of the selected disk, as shown in Figure 5-7 on page 426.


Change/Show Characteristics of an SSA Physical Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Disk                                     pdisk0
  Disk type                                2000mbC
  Disk interface                           ssa
  Description                              2GB SSA C Physical Di>
  Status                                   Defined
  Location                                 2A-08-P
  Location Label                           []
  Parent                                   ssar
  adapter_a                                none
  adapter_b                                none
  primary_adapter                          adapter_a  +
  Connection address                       0004AC7D205400D

F1=Help    F2=Refresh   F3=Cancel   F4=List
F5=Reset   F6=Command   F7=Edit     F8=Image
F9=Shell   F10=Exit     Enter=Do

Figure 5-7 Identify the connection address of an SSA disk

4. Repeat the operation for all remaining SSA drives.
5. Repeat the operation for all remaining cluster nodes.

An SSA connection address is unique throughout the cluster. Identify the relationship between each connection address and the AIX physical disk definition it represents on each cluster node. This establishes the actual physical relationship between the defined physical disk in AIX and the hardware disk, as identified by its SSA connection address. In our environment, we identified the SSA connection addresses of the disks on tivaix1 and tivaix2 as shown in Table 5-1.
Table 5-1 SSA connection addresses of SSA disks on tivaix1 and tivaix2

Physical disk on tivaix1   Connection address   Physical disk on tivaix2
pdisk0                     0004AC7D205400D      pdisk8
pdisk1                     0004AC7D20A200D      pdisk9
pdisk2                     0004AC7D22A800D      pdisk10
pdisk3                     0004AC7D240D00D      pdisk11
pdisk4                     0004AC7D242500D      pdisk12
pdisk5                     0004AC7D25BC00D      pdisk13
pdisk6                     0004AC7D275E00D      pdisk14
pdisk7                     0004AC7DDACC00D      pdisk15
pdisk8                     0004AC7D200F00D      pdisk0
pdisk9                     0004AC7D245700D      pdisk1
pdisk10                    0004AC7D25A400D      pdisk2
pdisk11                    0004AC7D25F900D      pdisk3
pdisk12                    0004AC7D25FE00D      pdisk4
pdisk13                    0004AC7D265400D      pdisk5
pdisk14                    0004AC7D2A9400D      pdisk6
pdisk15                    08005AEA42BC00D      n/a
pdisk16                    000629922C0B00D      pdisk7
Using the list of disks identified in the link verification test in the preceding section, we can mark in Table 5-1 on page 426 the disks on each cluster node that are physically available to be shared on both nodes. From this list, we identify which disks are also available to be shared as logical elements, using the assessments in the following sections.
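Finding the drives both nodes can reach can be automated by joining the two nodes' connection-address listings: any address that appears on both nodes is one physical SSA drive. The sketch below uses a subset of the Table 5-1 data in two hypothetical capture files:

```shell
#!/bin/sh
# Addresses as seen from each node (subset of Table 5-1; in practice each
# file would be captured on its own node and copied here for comparison).
cat > /tmp/tivaix1.addr <<'EOF'
pdisk11 0004AC7D25F900D
pdisk12 0004AC7D25FE00D
pdisk15 08005AEA42BC00D
EOF
cat > /tmp/tivaix2.addr <<'EOF'
pdisk3 0004AC7D25F900D
pdisk4 0004AC7D25FE00D
EOF
# An address listed on both nodes identifies one drive both nodes can reach.
awk 'NR==FNR { seen[$2]=$1; next }
     $2 in seen { print $2, "tivaix1:" seen[$2], "tivaix2:" $1 }' \
    /tmp/tivaix1.addr /tmp/tivaix2.addr
```

With these sample files, the join reports the two shared drives and the local pdisk name each node uses for them; pdisk15 appears only on tivaix1, so it drops out.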

Assess tivaix1
In our environment, the available SSA physical disks on tivaix1 are shown in Example 5-1.
Example 5-1 Available SSA disks on tivaix1 before configuring shared volume groups

[root@tivaix1:/home/root] lsdev -C -c pdisk -s ssar -H
name     status     location  description

pdisk0   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk1   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk10  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk11  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk12  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk13  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk14  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk15  Defined    2A-08-P   Other SSA Disk Drive
pdisk16  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk2   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk3   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk4   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk5   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk6   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk7   Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk8   Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk9   Available  2A-08-P   2GB SSA C Physical Disk Drive

The logical disks on tivaix1 are defined as shown in Example 5-2. Note the physical volume ID (PVID) field in the second column, and the volume group assignment field in the third column.
Example 5-2 Logical disks on tivaix1 before configuring shared volume groups

[root@tivaix1:/home/root] lspv
hdisk0   0001813fe67712b5  rootvg   active
hdisk1   0001813f1a43a54d  rootvg   active
hdisk2   0001813f95b1b360  rootvg   active
hdisk3   0001813fc5966b71  rootvg   active
hdisk4   0001813fc5c48c43  None
hdisk5   0001813fc5c48d8c  None
hdisk6   000900066116088b  tiv_vg1
hdisk7   000000000348a3d6  tiv_vg1
hdisk8   00000000034d224b  tiv_vg2
hdisk9   none              None
hdisk10  none              None
hdisk11  none              None
hdisk12  00000000034d7fad  tiv_vg2
hdisk13  none              None

The logical-to-physical SSA disk relationship of configured SSA drives on tivaix1 is shown in Example 5-3.
Example 5-3 How to show logical to physical SSA disk relationships on tivaix1

[root@tivaix1:/home/root] for i in $(lsdev -CS1 -t hdisk -sssar -F name)
> do
> echo "$i: "$(ssaxlate -l $i)
> done
hdisk10: pdisk12
hdisk11: pdisk13
hdisk12: pdisk14
hdisk13: pdisk16
hdisk6: pdisk8
hdisk7: pdisk9
hdisk8: pdisk10
hdisk9: pdisk11

Assess tivaix2
The same SSA disks in the same SSA loop that are available on tivaix2 are shown in Example 5-4.
Example 5-4 Available SSA disks on tivaix2 before configuring shared volume groups

[root@tivaix2:/home/root] lsdev -C -c pdisk -s ssar -H
name     status     location  description

pdisk0   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk1   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk10  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk11  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk12  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk13  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk14  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk15  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk2   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk3   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk4   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk5   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk6   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk7   Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk8   Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk9   Defined    17-08-P   2GB SSA C Physical Disk Drive

The logical disks on tivaix2 are defined as shown in Example 5-5.


Example 5-5 Logical disks on tivaix2 before configuring shared volume groups

[root@tivaix2:/home/root] lspv
hdisk0   0001814f62b2a74b  rootvg   active
hdisk1   none              None
hdisk2   none              None
hdisk3   none              None
hdisk4   none              None
hdisk5   000900066116088b  tiv_vg1
hdisk6   000000000348a3d6  tiv_vg1
hdisk7   00000000034d224b  tiv_vg2
hdisk8   0001813f72023fd6  None
hdisk9   0001813f72025253  None
hdisk10  0001813f71dd8f80  None
hdisk11  00000000034d7fad  tiv_vg2
hdisk12  0001814f7ce1d08d  None
hdisk16  0001814fe8d10853  None

The logical-to-physical SSA disk relationship of configured SSA drives on tivaix2 is shown in Example 5-6.
Example 5-6 Show logical-to-physical SSA disk relationships on tivaix2

[root@tivaix2:/home/root] for i in $(lsdev -CS1 -t hdisk -sssar -F name)
> do
> echo "$i: "$(ssaxlate -l $i)
> done
hdisk10: pdisk5
hdisk11: pdisk6
hdisk12: pdisk7
hdisk5: pdisk0
hdisk6: pdisk1
hdisk7: pdisk2
hdisk8: pdisk3
hdisk9: pdisk4

Identify the volume group major numbers


Each volume group is assigned a major device number: a number unique on a cluster node, different from the major number of any other device on that node. Creating a new shared volume group requires a new major device number with the following characteristics:
- It is different from the major number of any other device on the cluster node.
- It is exactly the same as the major number assigned to the same shared volume group on all other cluster nodes that share the volume group.

Satisfy these criteria by identifying the existing volume group major numbers on each cluster node, so that a unique number can be assigned for the new shared volume group. If any other shared volume groups already exist, also identify the major numbers used for these devices. Whenever possible, try to keep major numbers of similar devices in the same range; this eases the administrative burden of keeping track of the major numbers to assign. In our environment, we used the following command to identify all major numbers used by all devices on a cluster node:
ls -al /dev/* | awk '{ print $5 }' | awk -F',' '{ print $1 }' | sort | uniq

In our environment, the major numbers already assigned include the ones shown in Example 5-7 on page 431. We show only a portion of the output for brevity; the parts we left out are indicated by vertical ellipses (...).


Example 5-7 How to list major numbers already in use on tivaix1

[root@tivaix1:/home/root] ls -al /dev/* | awk '{ print $5 }' | \
> awk -F',' '{ print $1 }' | sort -n | uniq
.
.
.
8
11
.
.
.
43
44
45
46
47
512
.
.
.

In this environment, the volume groups tiv_vg1 and tiv_vg2 are shared volume groups that already exist. We use the ls command on tivaix1, as shown in Example 5-8, to identify the major numbers used for these shared volume groups.
Example 5-8 Identify the major numbers used for shared volume groups on tivaix1

[root@tivaix1:/home/root] ls -al /dev/tiv_vg1
crw-rw----   1 root     system    45,  0 Nov 05 15:51 /dev/tiv_vg1
[root@tivaix1:/home/root] ls -al /dev/tiv_vg2
crw-r-----   1 root     system    46,  0 Nov 10 17:04 /dev/tiv_vg2

Example 5-8 shows that shared volume group tiv_vg1 uses major number 45, and shared volume group tiv_vg2 uses major number 46. We perform the same commands on the other cluster nodes that access the same shared volume groups. In our environment, these commands are entered on tivaix2, as shown in Example 5-9.
Example 5-9 Identify the major numbers used for shared volume groups on tivaix2

[root@tivaix2:/home/root] ls -al /dev/tiv_vg1
crw-r-----   1 root     system    45,  0 Dec 15 20:36 /dev/tiv_vg1
[root@tivaix2:/home/root] ls -al /dev/tiv_vg2
crw-r-----   1 root     system    46,  0 Dec 15 20:39 /dev/tiv_vg2

Again, you can see that the major numbers are the same on tivaix2 for the same volume groups. Between the list of all major numbers used by all devices, and


the major numbers already used by the shared volume groups in our cluster, we choose 49 as the major number to assign to the next shared volume group on all cluster nodes that will access the new shared volume group.
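Picking the next free major number common to all nodes can be scripted. The sketch below uses hypothetical used-number lists for the two nodes (note that on AIX, the lvlstmajor command reports free major numbers directly); with 45 through 48 taken on both nodes in this sample data, the first free number is 49, matching the choice above.

```shell
#!/bin/sh
# Pick the lowest major number >= 45 that is unused on both nodes.
# The two lists below are hypothetical captures of the ls/awk pipeline output.
used_tivaix1="8 11 43 44 45 46 47 48 512"
used_tivaix2="8 11 43 44 45 46 47 48 512"

is_used() {  # is_used <number> <list of used numbers...>
    n=$1; shift
    for u in "$@"; do [ "$u" = "$n" ] && return 0; done
    return 1
}

major=45
while is_used "$major" $used_tivaix1 || is_used "$major" $used_tivaix2; do
    major=$((major + 1))
done
echo "$major"
```

The same number must then be passed to mkvg on every node that imports the shared volume group.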

Analyze the assessments


Use the assessment data gathered in the preceding sections to plan the disk sharing design. Identify which physical disks are not yet assigned to any logical elements. List the physical disks available on each cluster node, as well as each disk's physical volume ID (PVID), its corresponding logical disk, and the volume group the physical disk is assigned to. If a physical disk is not assigned to any logical elements yet, describe the logical elements as not available. Disks listed as Defined but not Available usually indicate connection problems or hardware failure on the disk itself, so do not include these disks in the analysis.
Table 5-2 Identify SSA physical disks on tivaix1 available for logical assignments

Physical Disk   PVID              Logical Disk   Volume Group
pdisk8          000900066116088b  hdisk6         tiv_vg1
pdisk9          000000000348a3d6  hdisk7         tiv_vg1
pdisk10         00000000034d224b  hdisk8         tiv_vg2
pdisk11         n/a               hdisk9         n/a
pdisk12         n/a               hdisk10        n/a
pdisk13         n/a               hdisk11        n/a
pdisk14         00000000034d7fad  hdisk12        tiv_vg2
pdisk16         n/a               hdisk13        n/a

The analysis of tivaix1 indicates that four SSA disks are available as logical elements (highlighted in bold in Table 5-2) because no volume groups are allocated to them: pdisk11, pdisk12, pdisk13, and pdisk16. We want the two cluster nodes in our environment to share a set of SSA disks, so we have to apply the same analysis of available disks to tivaix2; see Table 5-3 on page 433.


Table 5-3 Identify SSA physical disks on tivaix2 available for logical assignments

Physical Disk   PVID              Logical Disk   Volume Group
pdisk0          000900066116088b  hdisk5         tiv_vg1
pdisk1          000000000348a3d6  hdisk6         tiv_vg1
pdisk2          00000000034d224b  hdisk7         tiv_vg2
pdisk3          0001813f72023fd6  hdisk8         n/a
pdisk4          0001813f72025253  hdisk9         n/a
pdisk5          0001813f71dd8f80  hdisk10        n/a
pdisk6          00000000034d7fad  hdisk11        tiv_vg2
pdisk7          0001814f7ce1d08d  hdisk12        n/a

The analysis of tivaix2 indicates that four SSA disks are available as logical elements (highlighted in bold in Table 5-3) because no volume groups are allocated to them: pdisk3, pdisk4, pdisk5, and pdisk7. Pooling together the separate analyses from each cluster node, we arrive at the map shown in Table 5-4. The center two columns show the actual, physical SSA drives as identified by their connection address and the shared volume groups hosted on these drives. The outer two columns show the AIX-assigned physical and logical disks on each cluster node, for each SSA drive.
Table 5-4 SSA connection addresses of SSA disks on tivaix1 and tivaix2

tivaix1 disks         Connection address   Volume group   tivaix2 disks
Physical   Logical                                        Physical   Logical
pdisk8     hdisk6     0004AC7D200F00D      tiv_vg1        pdisk0     hdisk5
pdisk9     hdisk7     0004AC7D245700D      tiv_vg1        pdisk1     hdisk6
pdisk10    hdisk8     0004AC7D25A400D      tiv_vg2        pdisk2     hdisk7
pdisk11    hdisk9     0004AC7D25F900D                     pdisk3     hdisk8
pdisk12    hdisk10    0004AC7D25FE00D                     pdisk4     hdisk9
pdisk13    hdisk11    0004AC7D265400D                     pdisk5     hdisk10
pdisk14    hdisk12    0004AC7D2A9400D      tiv_vg2        pdisk6     hdisk11
pdisk16    hdisk13    000629922C0B00D                     pdisk7     hdisk12

Chapter 5. Implement IBM Tivoli Management Framework in a cluster

433

You can think of the AIX physical disk as the handle the SSA drivers in AIX use to communicate with the actual SSA hardware drive. Think of the AIX logical disk as the higher-level construct that presents a uniform interface to the AIX volume management system. These logical disks are allocated to volume groups, and they map back through a chain (logical disk to physical disk to connection address) to reach the actual SSA hardware drive.
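The chain from logical disk to physical disk to connection address can also be walked programmatically. On AIX the two mappings would come from ssaxlate -l and the SMIT connection-address screens; here they are embedded as lookup tables carrying tivaix1's values from Table 5-4:

```shell
#!/bin/sh
# Walk hdisk -> pdisk -> SSA connection address for tivaix1.
# The two tables below stand in for `ssaxlate -l` output and the SMIT
# connection-address screens (values taken from Table 5-4).
log2phys="hdisk9:pdisk11 hdisk10:pdisk12 hdisk11:pdisk13 hdisk13:pdisk16"
phys2addr="pdisk11:0004AC7D25F900D pdisk12:0004AC7D25FE00D pdisk13:0004AC7D265400D pdisk16:000629922C0B00D"

chain() {  # chain <hdisk>: print the full logical-to-hardware chain
    for lp in $log2phys; do
        [ "${lp%%:*}" = "$1" ] && p=${lp#*:}
    done
    for pa in $phys2addr; do
        [ "${pa%%:*}" = "$p" ] && a=${pa#*:}
    done
    echo "$1 -> $p -> $a"
}

chain hdisk9
```

Because the connection address is the only identifier that is the same on every node, resolving each node's hdisk down to its address is what lets you line up the two nodes' views of one drive.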

Allocate the SSA disks to a new volume group


The assessments and analyses show that four SSA drives are available to allocate to a volume group for IBM Tivoli Management Framework, to be shared between both nodes in our two-node cluster. These are highlighted in bold in the preceding table. A basic installation of IBM Tivoli Management Framework requires no more than 2 GB. Our assessments in the preceding sections (Assess tivaix1 on page 427 and Assess tivaix2 on page 429) show that our SSA storage system uses 2 GB drives, so we know the physical capacity of each drive. We use two drives for the volume group that will hold IBM Tivoli Management Framework, as shown in the summary analysis table (Table 5-5), which distills the preceding analysis into the physical SSA disks to use and the order in which to specify them when defining the volume group.
Table 5-5   Summary analysis table of disks to use for new shared volume group

  tivaix1 disks           Connection          Volume     tivaix2 disks
  Physical    Logical     address             group      Physical    Logical
  pdisk11     hdisk9      0004AC7D25F900D     itmf_vg    pdisk3      hdisk8
  pdisk12     hdisk10     0004AC7D25FE00D     itmf_vg    pdisk4      hdisk9

The following section describes how to allocate the new volume group on the selected SSA drives.

Configure volume group on SSA drives


Configure a volume group on the SSA drives selected during the analysis. This volume group is shared among all the cluster nodes. To configure a volume group on the SSA drives:

1. Select a cluster node from the final analysis table (Table 5-5) and log into it as root user. In our environment, we logged into tivaix1 as root user.

434

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

2. Enter the SMIT fast path command: smit mkvg. The Add a Volume Group SMIT screen appears.

3. Enter itmf_vg in the VOLUME GROUP name field.

4. Go to the PHYSICAL VOLUME names field and press F4. The PHYSICAL VOLUME names SMIT dialog appears.

5. Select the physical volumes to include in the new volume group and press Enter to return to the Add a Volume Group SMIT screen. In our environment, we used the summary analysis table to determine that, because we are on tivaix1, we need to select hdisk9 and hdisk10, as shown in Figure 5-8.

  +------------------------------------------------------------------+
  |                      PHYSICAL VOLUME names                       |
  |                                                                  |
  | Move cursor to desired item and press F7.                        |
  |   ONE OR MORE items can be selected.                             |
  | Press Enter AFTER making all selections.                         |
  |                                                                  |
  |   hdisk4                                                         |
  |   hdisk5                                                         |
  | > hdisk9                                                         |
  | > hdisk10                                                        |
  |   hdisk11                                                        |
  |   hdisk13                                                        |
  |                                                                  |
  | F1=Help     F2=Refresh    F3=Cancel     F7=Select                |
  | F8=Image    F10=Exit      Enter=Do                               |
  | /=Find      n=Find Next                                          |
  +------------------------------------------------------------------+

Figure 5-8   Select physical volumes for volume group itmf_vg

6. Go to the Volume Group MAJOR NUMBER field and enter a unique major number. This number must be unique on every cluster node that shares the volume group. Ensure the volume group is not automatically activated at system restart (HACMP must control its activation) by setting the Activate volume group AUTOMATICALLY at system restart field to no.

Tip: Record the volume group major number and the first physical disk you use for the volume group, for later reference in Import the volume group into the remaining cluster nodes on page 448.


In our environment, we entered 49 in the Volume Group MAJOR NUMBER field, and set the Activate volume group AUTOMATICALLY at system restart field to no, as shown in Figure 5-9. We use 49 as determined in Identify the volume group major numbers on page 430, so it will not conflict with the major numbers chosen for other volume groups and devices.

                              Add a Volume Group

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
    VOLUME GROUP name                               [itmf_vg]
    Physical partition SIZE in megabytes             4                  +
  * PHYSICAL VOLUME names                           [hdisk9 hdisk10]    +
    Force the creation of a volume group?            no                 +
    Activate volume group AUTOMATICALLY              no                 +
      at system restart?
    Volume Group MAJOR NUMBER                       [49]                +#
    Create VG Concurrent Capable?                    no                 +
    Create a big VG format Volume Group?             no                 +
    LTG Size in kbytes                               128                +

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-9   Configure settings to add volume group itmf_vg

7. Press Enter. The volume group is created. 8. Use the lsvg and lspv commands to verify the new volume group exists, as shown in Example 5-10.
Example 5-10   Verify creation of shared volume group itmf_vg on tivaix1

[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
itmf_vg
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg     active
hdisk1          0001813f1a43a54d    rootvg     active
hdisk2          0001813f95b1b360    rootvg     active
hdisk3          0001813fc5966b71    rootvg     active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    itmf_vg    active
hdisk10         0001813f72025253    itmf_vg    active
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None

Create the logical volume and Journaled File System


Create a logical volume and a Journaled File System (JFS) on the new volume group. This makes the volume group available to applications running on AIX. To create a logical volume and Journaled File System on the new volume group:

1. Create the mount point for the logical volume's file system. Do this on all cluster nodes. In our environment, we used the following command:
mkdir -p /opt/hativoli

2. Enter: smit crjfsstd. 3. The Volume Group Name SMIT selection screen displays a list of volume groups. Go to the new volume group and press Enter. The Add a Standard Journaled File System SMIT screen displays the attributes for a new standard Journaled File System. In our environment, we selected itmf_vg, as shown in Figure 5-10 on page 438.


  +------------------------------------------------------------------+
  |                        Volume Group Name                         |
  |                                                                  |
  | Move cursor to desired item and press Enter.                     |
  |                                                                  |
  |   rootvg                                                         |
  |   tiv_vg1                                                        |
  |   tiv_vg2                                                        |
  |   itmf_vg                                                        |
  |                                                                  |
  | F1=Help     F2=Refresh    F3=Cancel     F8=Image                 |
  | F10=Exit    Enter=Do      /=Find        n=Find Next              |
  +------------------------------------------------------------------+

Figure 5-10   Select a volume group using the Volume Group Name SMIT selection screen

4. Enter values into the fields:

   Number of units
        The number of megabytes to allocate for the standard Journaled File System.

   MOUNT POINT
        The mount point, which is the directory where the file system is, or will be, made available.

   Mount AUTOMATICALLY at system restart?
        Indicates whether the file system is mounted at each system restart. Possible values are:
        yes - the file system is automatically mounted at system restart.
        no  - the file system is not automatically mounted at system restart.

   In our environment, we entered 2048 in the Number of units field, /opt/hativoli in the MOUNT POINT field, and yes in the Mount AUTOMATICALLY at system restart? field, as shown in Figure 5-11 on page 439.


                    Add a Standard Journaled File System

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
    Volume group name                                itmf_vg
    SIZE of file system
            Unit Size                                Megabytes          +
  *         Number of units                         [2048]              #
  * MOUNT POINT                                     [/opt/hativoli]
    Mount AUTOMATICALLY at system restart?           yes                +
    PERMISSIONS                                      read/write         +
    Mount OPTIONS                                   []                  +
    Start Disk Accounting?                           no                 +
    Fragment Size (bytes)                            4096               +
    Number of bytes per inode                        4096               +
    Allocation Group Size (MBytes)                   8                  +

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-11   Create a standard Journaled File System on volume group itmf_vg in tivaix1

5. Press Enter to create the standard Journaled File System. The COMMAND STATUS SMIT screen displays the progress and result of the operation. A successful operation looks similar to Figure 5-12 on page 440.


                               COMMAND STATUS

  Command: OK            stdout: yes           stderr: no

  Before command completion, additional instructions may appear below.

  Based on the parameters chosen, the new /opt/hativoli JFS file system
  is limited to a maximum size of 134217728 (512 byte blocks)

  New File System size is 4194304

  F1=Help       F2=Refresh     F3=Cancel     F6=Command
  F8=Image      F9=Shell       F10=Exit      /=Find
  n=Find Next

Figure 5-12   Successful creation of JFS file system /opt/hativoli on tivaix1

6. Use the ls, df, mount, and umount commands to verify the new standard Journaled File System, as shown in Example 5-11.
Example 5-11   Verify successful creation of a JFS file system

[root@tivaix1:/home/root] ls /opt/hativoli
[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd10opt       262144     68724   74%     3544     6% /opt
[root@tivaix1:/home/root] mount /opt/hativoli
[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lv09         2097152   2031276    4%       17     1% /opt/hativoli
[root@tivaix1:/home/root] ls /opt/hativoli
lost+found
[root@tivaix1:/home/root] umount /opt/hativoli

The new volume group is now populated with a new standard Journaled File System.
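The SMIT steps above also have command-line equivalents, which can be scripted. The sketch below is an assumption-flagged outline using standard AIX LVM syntax (mkvg -n suppresses activation at restart and -V sets the major number; the crfs size attribute is given in 512-byte blocks, matching the 4194304 reported in Figure 5-12), not a capture from our environment. The DRYRUN guard lets the sequence be previewed without touching any disks.

```shell
# Sketch of the SMIT mkvg/crfs steps as AIX commands.
# With DRYRUN=1 the commands are only echoed, not executed.
run() { if [ "${DRYRUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi; }

create_itmf_fs() {
  # Major number 49; do not activate automatically at restart (HACMP manages it).
  run mkvg -n -V 49 -y itmf_vg hdisk9 hdisk10
  # 2 GB standard JFS, mounted at /opt/hativoli, auto-mount per the SMIT choice.
  run crfs -v jfs -g itmf_vg -a size=4194304 -m /opt/hativoli -A yes
}

DRYRUN=1
create_itmf_fs
```

Run with DRYRUN unset only on the node that owns the disks, after verifying the hdisk names against your own analysis table.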


Important: Our environment does not use multiple SSA adapters due to resource constraints. In a production high availability environment, use multiple disk controllers. Best practice for HACMP is to use multiple disk controllers and multiple disks for volume groups. Specifically, to ensure disk availability, split each volume group on each cluster node between at least two disk controllers and three disks, mirroring across all the disks.

Configure the logical volume


Rename the new logical volume and its log logical volume so that their names are guaranteed to be unique on any cluster node. The new name will be the same on any cluster node that varies on the logical volume's volume group, and must not collide with any other logical volume name on any cluster node. You only need to perform this operation from one cluster node, and the volume group must be online on that node. In our environment, we renamed logical volume lv09 to itmf_lv, and logical log volume loglv00 to itmf_loglv. To rename the logical volume and logical log volume:

1. Use the lsvg command as shown in Example 5-12 to identify the logical volumes in the new volume group. In our environment, the volume group itmf_vg contains two logical volumes: lv09 holds the standard Journaled File System /opt/hativoli, and loglv00 is the log logical volume for lv09.
Example 5-12   Identify logical volumes on new volume group

[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv00      jfslog   1     1     1    closed/syncd  N/A
lv09         jfs      512   512   1    closed/syncd  /opt/hativoli

2. Enter: smit chlv2. You can also enter: smit storage, go to Logical Volume Manager -> Logical Volumes -> Set Characteristic of a Logical Volume -> Rename a Logical Volume and press Enter. The Rename a Logical Volume SMIT screen is displayed. 3. Enter the name of the logical volume to rename in the CURRENT logical volume name field. Enter the new name of the logical volume in the NEW logical volume name field.


In our environment, we entered lv09 in the CURRENT logical volume name field, and itmf_lv in the NEW logical volume name field, as shown in Figure 5-13.

                           Rename a Logical Volume

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
  * CURRENT logical volume name                     [lv09]
  * NEW logical volume name                         [itmf_lv]

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-13   Rename a logical volume

4. Press Enter to rename the logical volume. The COMMAND STATUS SMIT screen displays the progress and the final status of the renaming operation. 5. Repeat the operation for the logical log volume. In our environment, we renamed logical volume loglv00 to itmf_loglv, as shown in Figure 5-14 on page 443.


                           Rename a Logical Volume

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
  * CURRENT logical volume name                     [loglv00]
  * NEW logical volume name                         [itmf_loglv]

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-14   Rename the logical log volume

6. Run the chfs command as shown in Example 5-13 to update the relationship between the logical volume itmf_lv and logical log volume itmf_loglv.
Example 5-13 Update relationship between renamed logical volumes and logical log volumes [root@tivaix1:/home/root] chfs /opt/hativoli

7. Verify the chfs command modified the /etc/filesystems file entry for the file system. In our environment, we used the grep command as shown in Example 5-14 on page 444 to verify that the /etc/filesystems entry for /opt/hativoli matches the new names of the logical volume and logical log volume. The attributes dev and log contain the new names itmf_lv and itmf_loglv, respectively.


Example 5-14   Verify the chfs command

[root@tivaix1:/home/root] grep -p /opt/hativoli /etc/filesystems
/opt/hativoli:
        dev             = /dev/itmf_lv
        vfs             = jfs
        log             = /dev/itmf_loglv
        mount           = true
        check           = false
        options         = rw
        account         = false
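Note that grep -p (paragraph grep) is an AIX extension. On platforms without it, the same stanza lookup can be sketched with awk in paragraph mode (RS=""); the helper name and the sample /etc/filesystems content below are illustrative, not from the live system.

```shell
# Print the /etc/filesystems stanza for a given mount point.
# Equivalent in spirit to AIX `grep -p <mountpoint> /etc/filesystems`.
show_stanza() {
  awk -v mp="$1:" 'BEGIN { RS=""; FS="\n" } $1 == mp { print; exit }' "$2"
}

# Sample stanzas in a scratch file (illustrative):
cat > /tmp/filesystems.sample <<'EOF'
/opt/hativoli:
	dev = /dev/itmf_lv
	vfs = jfs
	log = /dev/itmf_loglv

/opt:
	dev = /dev/hd10opt
	vfs = jfs
EOF

show_stanza /opt/hativoli /tmp/filesystems.sample
```

Paragraph mode treats blank-line-separated stanzas as records, so the first line of each record is the mount point label being matched.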

Export the volume group


Export the volume group from the cluster node it was created upon to make it available to other cluster nodes. To export a volume group: 1. Log into the cluster node that the volume group was created upon. In our environment, we logged into tivaix1 as root user. 2. Note that the volume group is varied on as soon as it is created. Vary off the volume group if necessary, so it can be exported. In our environment, we varied off the volume group itmf_vg by using the following command:
varyoffvg itmf_vg

3. Enter: smit exportvg. The Export a Volume Group SMIT screen displays a VOLUME GROUP name field. 4. Enter the new volume group in the VOLUME GROUP name field. In our environment, we entered itmf_vg in the VOLUME GROUP name field, as shown in Figure 5-15 on page 445.


                            Export a Volume Group

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
  * VOLUME GROUP name                               [itmf_vg]

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-15   Export a Volume Group SMIT screen

5. Press Enter to export the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the export operation. 6. Use the lsvg and lspv commands as shown in Example 5-15 to verify the export of the volume group. Notice that the volume group name does not appear in the output of either command.
Example 5-15   Verify the export of volume group itmf_vg from tivaix1

[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg     active
hdisk1          0001813f1a43a54d    rootvg     active
hdisk2          0001813f95b1b360    rootvg     active
hdisk3          0001813fc5966b71    rootvg     active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    None
hdisk10         0001813f72025253    None
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None

Re-import the volume group


After exporting the volume group, we import it back into the same cluster node it was exported from. We then log into the other cluster nodes on the same SSA loop as the node the volume group was created upon (in Configure volume group on SSA drives on page 434) and import the volume group there, making it a shared volume group. To import the volume group back into the same cluster node it was exported from:

1. Log into the cluster node as root user. In our environment, we logged into tivaix1 as root user.

2. Use the lsvg command as shown in Example 5-16 to verify the volume group is not already imported.
Example 5-16   Verify volume group itmf_vg is not already imported into tivaix1

[root@tivaix1:/home/root] lsvg -l itmf_vg
0516-306 : Unable to find volume group itmf_vg in the Device Configuration Database.

3. Enter: smit importvg. You can also enter: smit storage, go to Logical Volume Manager -> Volume Groups -> Import a Volume Group, and press Enter. The Import a Volume Group SMIT screen is displayed.

4. Enter the following values. Use the values determined in Configure volume group on SSA drives on page 434.

   VOLUME GROUP name
        The volume group name. The name must be unique system-wide, and can range from 1 to 15 characters.

   PHYSICAL VOLUME name
        The name of the physical volume. Physical volume names are typically of the form hdiskx, where x is a system-wide unique number. This name is assigned when the disk is detected for the first time at system startup, or when the system management commands are used at run time to add a disk to the system.

   Volume Group MAJOR NUMBER
        The major number of the volume group. The system kernel accesses devices, including volume groups, through a major and minor number combination. To see which major numbers are available on your system, use the SMIT List feature.

   In our environment, we entered itmf_vg in the VOLUME GROUP name field, hdisk9 in the PHYSICAL VOLUME name field, and 49 in the Volume Group MAJOR NUMBER field, as shown in Figure 5-16.

                            Import a Volume Group

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
    VOLUME GROUP name                               [itmf_vg]
  * PHYSICAL VOLUME name                            [hdisk9]            +
    Volume Group MAJOR NUMBER                       [49]                +#

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-16   Import a volume group

5. Press Enter to import the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the volume group import operation. 6. Vary on the volume group using the varyonvg command.


In our environment, we entered the command:


varyonvg itmf_vg

7. Use the lsvg command as shown in Example 5-17 to verify the volume group import.
Example 5-17   Verify import of volume group itmf_vg into tivaix1

[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli

8. Vary off the volume group using the varyoffvg command so you can import the volume group into the remaining cluster nodes. In our environment, we entered the command:
varyoffvg itmf_vg

Import the volume group into the remaining cluster nodes


Import the volume group into the remaining cluster nodes so it becomes a shared volume group. In our environment, we imported volume group itmf_vg into cluster node tivaix2.

Note: Importing a volume group also varies it on, so be sure to vary it off first with the varyoffvg command if it is in the ONLINE state on a cluster node.

To import a volume group defined on SSA drives so it becomes a shared volume group with other cluster nodes:

1. Log into another cluster node as root user. In our environment, we logged into tivaix2 as root user.

2. Enter the SMIT fast path command: smit importvg. You can also enter: smit storage, go to Logical Volume Manager -> Volume Groups -> Import a Volume Group, and press Enter. The Import a Volume Group SMIT screen is displayed.

3. Use the same volume group name that you used in the preceding operation for the VOLUME GROUP name field. In our environment, we entered itmf_vg in the VOLUME GROUP name field.

4. Use the summary analysis table created in Plan the shared disk on page 421 to determine the logical disk to use. The volume group major number is the same on all cluster nodes, so use the same volume group major number as in the preceding operation. In our environment, we observed that hdisk9 on tivaix1 corresponds to hdisk8 on tivaix2, so we used hdisk8 in the PHYSICAL VOLUME name field, as shown in Figure 5-17.

                            Import a Volume Group

  Type or select values in entry fields.
  Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
    VOLUME GROUP name                               [itmf_vg]
  * PHYSICAL VOLUME name                            [hdisk8]            +
    Volume Group MAJOR NUMBER                       [49]                +#

  F1=Help     F2=Refresh     F3=Cancel     F4=List
  F5=Reset    F6=Command     F7=Edit       F8=Image
  F9=Shell    F10=Exit       Enter=Do

Figure 5-17   Import volume group itmf_vg on tivaix2

5. Press Enter to import the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the volume group import operation. 6. Use the lsvg and lspv commands to verify the volume group import. The output of these commands contains the name of the imported volume group. In our environment, we verified the volume group import as shown in Example 5-18 on page 450.


Example 5-18   Verify the import of volume group itmf_vg into tivaix2

[root@tivaix2:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
itmf_vg
[root@tivaix2:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix2:/home/root] lspv
hdisk0          0001814f62b2a74b    rootvg     active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    tiv_vg1
hdisk6          000000000348a3d6    tiv_vg1
hdisk7          00000000034d224b    tiv_vg2
hdisk8          0001813f72023fd6    itmf_vg    active
hdisk9          0001813f72025253    itmf_vg    active
hdisk10         0001813f71dd8f80    None
hdisk11         00000000034d7fad    tiv_vg2
hdisk12         0001814f7ce1d08d    None
hdisk16         0001814fe8d10853    None

7. Vary off the volume group using the varyoffvg command. In our environment, we entered the following command into tivaix2:
varyoffvg itmf_vg
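Because the hdisk numbering differs between nodes, the PVID is the stable identifier for matching disks across the cluster. The sketch below (a hypothetical helper over saved lspv output; sample values taken from Table 5-5) resolves which local disk name to supply to the import on the second node.

```shell
# Given saved `lspv` output from two nodes, print the disk on node 2 that
# carries the same PVID as the named disk on node 1.
match_disk() {  # match_disk <hdisk-on-node1> <node1-lspv-file> <node2-lspv-file>
  pvid=$(awk -v d="$1" '$1 == d { print $2 }' "$2")
  awk -v p="$pvid" '$2 == p { print $1 }' "$3"
}

# Illustrative samples (values from Table 5-5):
cat > /tmp/tivaix1.lspv <<'EOF'
hdisk9  0001813f72023fd6 itmf_vg
hdisk10 0001813f72025253 itmf_vg
EOF
cat > /tmp/tivaix2.lspv <<'EOF'
hdisk8  0001813f72023fd6 itmf_vg
hdisk9  0001813f72025253 itmf_vg
EOF

match_disk hdisk9 /tmp/tivaix1.lspv /tmp/tivaix2.lspv   # prints hdisk8
```

Capturing lspv output from each node before importing makes this cross-check repeatable, instead of reading the correspondence off the tables by eye.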

Verify the volume group sharing


Manually verify that all imported volume groups can be shared between cluster nodes before configuring HACMP. If volume group sharing fails under HACMP, manual verification usually allows you to rule out a problem in the configuration of the volume groups, and focus upon the definition of the shared volume groups under HACMP. To verify volume group sharing:

1. Log into a cluster node as root user. In our environment, we logged into tivaix1 as root user.

2. Verify the volume group is not already active on the cluster node. Use the lsvg command as shown in Example 5-19 on page 451. The name of the volume group does not appear in the output of the command if the volume group is not active on the cluster node.
Example 5-19 Verify a volume group is not already active on a cluster node [root@tivaix1:/home/root] lsvg -o rootvg

3. Vary on the volume group using the varyonvg command. In our environment, we entered the command:
varyonvg itmf_vg

4. Use the lspv and lsvg commands as shown in Example 5-20 to verify the volume group is put into the ONLINE state. The name of the volume group appears in the output of these commands now, where it did not before.
Example 5-20   How to verify volume group itmf_vg is online on tivaix1

[root@tivaix1:/home/root] lsvg -o
itmf_vg
rootvg
[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg     active
hdisk1          0001813f1a43a54d    rootvg     active
hdisk2          0001813f95b1b360    rootvg     active
hdisk3          0001813fc5966b71    rootvg     active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    itmf_vg    active
hdisk10         0001813f72025253    itmf_vg    active
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None

5. Use the df, mount, touch, ls, and umount commands to verify the availability of the logical volume and to create a test file, as shown in Example 5-21. The file system and mount point change after mounting the logical volume. In our environment, we created the test file /opt/hativoli/node_tivaix1.


Example 5-21 Verify availability of a logical volume in a shared volume group [root@tivaix1:/home/root] df -k /opt/hativoli Filesystem 1024-blocks Free %Used Iused %Iused Mounted on /dev/hd10opt 262144 68724 74% 3544 6% /opt [root@tivaix1:/home/root] mount /opt/hativoli [root@tivaix1:/home/root] df -k /opt/hativoli Filesystem 1024-blocks Free %Used Iused %Iused Mounted on /dev/itmf_lv 2097152 2031276 4% 17 1% /opt/hativoli [root@tivaix1:/home/root] touch /opt/hativoli/node_tivaix1 [root@tivaix1:/home/root] ls -l /opt/hativoli/node_tivaix* -rw-r--r-1 root sys 0 Dec 17 15:25 /opt/hativoli/node_tivaix1 [root@tivaix1:/home/root] umount /opt/hativoli

6. Vary off the volume group using the varyoffvg command. In our environment, we used the command:
varyoffvg itmf_vg

7. Repeat the operation on all remaining cluster nodes. Ensure test files created on other cluster nodes sharing this volume group exist. In our environment, we repeated the operation on tivaix2 as shown in Example 5-22.
Example 5-22   Verify shared volume group itmf_vg on tivaix2

[root@tivaix2:/home/root] lsvg -o
rootvg
[root@tivaix2:/home/root] varyonvg itmf_vg
[root@tivaix2:/home/root] lsvg -o
itmf_vg
rootvg
[root@tivaix2:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix2:/home/root] lspv
hdisk0          0001814f62b2a74b    rootvg     active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    tiv_vg1
hdisk6          000000000348a3d6    tiv_vg1
hdisk7          00000000034d224b    tiv_vg2
hdisk8          0001813f72023fd6    itmf_vg    active
hdisk9          0001813f72025253    itmf_vg    active
hdisk10         0001813f71dd8f80    None
hdisk11         00000000034d7fad    tiv_vg2
hdisk12         0001814f7ce1d08d    None
hdisk16         0001814fe8d10853    None
[root@tivaix2:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd10opt       262144     29992   89%     3587     6% /opt
[root@tivaix2:/home/root] mount /opt/hativoli
[root@tivaix2:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/itmf_lv      2097152   2031276    4%       17     1% /opt/hativoli
[root@tivaix2:/home/root] touch /opt/hativoli/node_tivaix2
[root@tivaix2:/home/root] ls -l /opt/hativoli/node_tivaix*
-rw-r--r--   1 root     sys       0 Dec 17 15:25 /opt/hativoli/node_tivaix1
-rw-r--r--   1 root     sys       0 Dec 17 15:26 /opt/hativoli/node_tivaix2
[root@tivaix2:/home/root] umount /opt/hativoli
[root@tivaix2:/home/root] varyoffvg itmf_vg

5.1.4 Install IBM Tivoli Management Framework


In this section we show how to install IBM Tivoli Management Framework Version 4.1 with all available patches as of the time of writing; specifically, how to install on tivaix1 in the environment used for this redbook. We only need to install once, because we used a hot standby configuration. After installing IBM Tivoli Management Framework, we describe how to install and configure HACMP for it on both tivaix1 and tivaix2. Concurrent access requires application support of the Cluster Lock Manager. IBM Tivoli Management Framework does not support Cluster Lock Manager, so we use shared Logical Volume Manager (LVM) access.

Plan for high availability considerations


We install IBM Tivoli Management Framework before installing and configuring HACMP, so that if IBM Tivoli Management Framework exhibits problems after HACMP is introduced, we know the root cause is likely an HACMP configuration issue. It helps the overall deployment to plan around some of the high availability considerations while installing IBM Tivoli Management Framework.

Installation directories
IBM Tivoli Management Framework uses the following directories on a Tivoli server: /etc/Tivoli Tivoli home directory, where IBM Tivoli Management Framework is installed under, and most Tivoli Enterprise products are usually installed in.


Important: These are not the only directories used in a Tivoli Enterprise deployment of multiple IBM Tivoli products. In our environment, we left /etc/Tivoli on the local drives of each cluster node. This enabled the flexibility to easily use multiple, local Endpoint installations on each cluster node. Putting /etc/Tivoli on the shared disk volume is possible, but it involves adding customized start and stop HACMP scripts that would shuffle the contents of /etc/Tivoli depending upon what Endpoints are active on a cluster node. We use /opt/hativoli as the Tivoli home directory. Following best practice, we first install IBM Tivoli Management Framework into /opt/hativoli, then install and configure HACMP. Note: In an actual production deployment, best practice is to implement /etc/Tivoli on a shared volume group because leaving it on the local disk of a system involves synchronizing the contents of highly available Endpoints across cluster nodes.

Associated IP addresses
Configuring the Tivoli server as a resource group in a hot standby two-node cluster requires that the IP addresses associated with the server remain with the server, regardless of which cluster node it runs upon. The IP address associated with the installation of the Tivoli server should be the service IP address. When the cluster node the Tivoli server is running on falls over, the service IP label falls over to the new cluster node, along with the resource group that contains the Tivoli server.

Plan the installation sequence


Before installing, plan the sequence of the packages you are going to install. Refer to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, for detailed information about what needs to be installed. Figure 5-18 on page 455 shows the sequence and dependencies of packages we planned for IBM Tivoli Management Framework Version 4.1 for the environment we used for this redbook.


Figure 5-18   IBM Tivoli Framework 4.1.0 application and patch sequence and dependencies as of December 2, 2003 (base TMF410, followed by patches 4.1-TMF-0008, 4.1-TMF-0014, 4.1-TMF-0015, 4.1-TMF-0016, 4.1-TMF-0017, 4.1-TMF-0032, and 4.1-TMF-0034; odadmin rexec)

Stage installation media


Complete the procedures listed in Stage installation media on page 313 to stage the IBM Tivoli Management Framework installation media.

Modify /etc/hosts and name resolution order


Complete the procedures in Modify /etc/hosts and name resolution order on page 250 to configure IP hostname lookups.

Install base Framework


In this section we show you how to install IBM Tivoli Management Framework so that it is specifically configured for IBM Tivoli Workload Scheduler on HACMP. This enables you to transition the instances of IBM Tivoli Management Framework used for IBM Tivoli Workload Scheduler to a mutual takeover environment if that becomes a supported feature in the future. We believe the configuration shown in this section can be started and stopped directly from HACMP in a mutual takeover configuration.

When installing IBM Tivoli Management Framework on an HACMP cluster node in support of IBM Tivoli Workload Scheduler, use the primary IP hostname as the hostname for IBM Tivoli Management Framework. Add an IP alias later for the service IP label.

Chapter 5. Implement IBM Tivoli Management Framework in a cluster

455

When this configuration is used with the multiple Connector object configuration described elsewhere in this redbook, Job Scheduling Console users can connect through any instance of IBM Tivoli Management Framework, no matter which cluster nodes fall over.

IBM Tivoli Management Framework itself consists of a base install and various components. You must first prepare for the base install by performing the commands shown in Example 5-23 for cluster node tivaix1 in our environment. On tivaix2, we replace the IP hostname tivaix1_svc with tivaix2_svc in the first command.
Example 5-23 Preparing for installation of IBM Tivoli Management Framework 4.1

[root@tivaix1:/home/root] HOST=tivaix1_svc
[root@tivaix1:/home/root] echo $HOST > /etc/wlocalhost
[root@tivaix1:/home/root] WLOCALHOST=$HOST
[root@tivaix1:/home/root] export WLOCALHOST
[root@tivaix1:/home/root] mkdir /opt/hativoli/install_dir
[root@tivaix1:/home/root] cd /opt/hativoli/install_dir
[root@tivaix1:/opt/hativoli/install_dir] /bin/sh \
> /usr/sys/inst.images/tivoli/fra/FRA410_1of2/WPREINST.SH
to install, type ./wserver -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2
[root@tivaix1:/opt/hativoli/install_dir] DOGUI=no
[root@tivaix1:/opt/hativoli/install_dir] export DOGUI

After you prepare for the base install, perform the initial installation of IBM Tivoli Management Framework by running the command shown in Example 5-24. You will see output similar to this example; depending upon the speed of your server, it will take 5 to 15 minutes to complete. On tivaix2 in our environment, we run the same command except we change the third line of the command from tivaix1_svc to tivaix2_svc.
Example 5-24 Initial installation of IBM Tivoli Management Framework Version 4.1

[root@tivaix1:/home/root] cd /usr/local/Tivoli/install_dir
[root@tivaix1:/usr/local/Tivoli/install_dir] sh ./wserver -y \
-c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
-a tivaix1_svc -d \
BIN=/opt/hativoli/bin! \
LIB=/opt/hativoli/lib! \
ALIDB=/opt/hativoli/spool! \
MAN=/opt/hativoli/man! \
APPD=/usr/lib/lvm/X11/es/app-defaults! \
CAT=/opt/hativoli/msg_cat! \
LK=1FN5B4MBXBW4GNJ8QQQ62WPV0RH999P99P77D \
RN=tivaix1_svc-region \
AutoStart=1 SetPort=1 CreatePaths=1 @ForceBind@=yes @EL@=None


Using command line style installation.....
Unless you cancel, the following operations will be executed:
need to copy the CAT (generic) to:
    tivaix1_svc:/opt/hativoli/msg_cat
need to copy the CSBIN (generic) to:
    tivaix1_svc:/opt/hativoli/bin/generic
need to copy the APPD (generic) to:
    tivaix1_svc:/usr/lib/lvm/X11/es/app-defaults
need to copy the GBIN (generic) to:
    tivaix1_svc:/opt/hativoli/bin/generic_unix
need to copy the BUN (generic) to:
    tivaix1_svc:/opt/hativoli/bin/client_bundle
need to copy the SBIN (generic) to:
    tivaix1_svc:/opt/hativoli/bin/generic
need to copy the LCFNEW (generic) to:
    tivaix1_svc:/opt/hativoli/bin/lcf_bundle.40
need to copy the LCFTOOLS (generic) to:
    tivaix1_svc:/opt/hativoli/bin/lcf_bundle.40/bin
need to copy the LCF (generic) to:
    tivaix1_svc:/opt/hativoli/bin/lcf_bundle
need to copy the LIB (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/lib/aix4-r1
need to copy the BIN (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/bin/aix4-r1
need to copy the ALIDB (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/spool/tivaix1.db
need to copy the MAN (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/man/aix4-r1
need to copy the CONTRIB (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/bin/aix4-r1/contrib
need to copy the LIB371 (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/lib/aix4-r1
need to copy the LIB365 (aix4-r1) to:
    tivaix1_svc:/opt/hativoli/lib/aix4-r1
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix1_svc
..... Completed.
Distributing machine independent generic Codeset Tables --> tivaix1_svc
.... Completed.
Distributing architecture specific Libraries --> tivaix1_svc
...... Completed.
Distributing architecture specific Binaries --> tivaix1_svc
............. Completed.
Distributing architecture specific Server Database --> tivaix1_svc
.......................................... Completed.
Distributing architecture specific Man Pages --> tivaix1_svc
..... Completed.
Distributing machine independent X11 Resource Files --> tivaix1_svc
... Completed.
Distributing machine independent Generic Binaries --> tivaix1_svc
... Completed.
Distributing machine independent Client Installation Bundle --> tivaix1_svc
... Completed.
Distributing machine independent generic HTML/Java files --> tivaix1_svc
... Completed.
Distributing architecture specific Public Domain Contrib --> tivaix1_svc
... Completed.
Distributing machine independent LCF Images (new version) --> tivaix1_svc
............. Completed.
Distributing machine independent LCF Tools --> tivaix1_svc
....... Completed.
Distributing machine independent 36x Endpoint Images --> tivaix1_svc
............ Completed.
Distributing architecture specific 371_Libraries --> tivaix1_svc
.... Completed.
Distributing architecture specific 365_Libraries --> tivaix1_svc
.... Completed.
Registering installation information...Finished.

Load Tivoli environment variables in .profile files


The Tivoli environment variables contain pointers to important directories that IBM Tivoli Management Framework uses for many commands. Loading the variables in the .profile file of a user account ensures that these environment variables are always available immediately after logging into the user account.


Use the commands in Example 5-25 to modify the .profile files of the root user account on all cluster nodes to source in all Tivoli environment variables for IBM Tivoli Management Framework.
Example 5-25 Load Tivoli environment variables on tivaix1

PATH=${PATH}:${HOME}/bin
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
fi

Also enter these commands on the command line, or log out and log back in to activate the environment variables for the following sections.
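A quick way to confirm that the environment is active in the current shell is to test for one of the variables that setup_env.sh sets. The following sketch assumes only the standard /etc/Tivoli/setup_env.sh path used in Example 5-25; BINDIR is one of the standard Tivoli environment variables it defines:

```shell
# Check whether the Tivoli environment is available in the current shell.
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
    echo "Tivoli environment loaded: BINDIR=$BINDIR"
else
    echo "Tivoli environment not found on this host"
fi
```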

Install Framework components and patches


After the base install is complete, you can install all remaining Framework components and patches by running the script shown in Example 5-26. If you use this script on tivaix2, change the line that starts with the string HOST= so that tivaix1 is replaced with tivaix2.
Example 5-26 Script for installing IBM Tivoli Management Framework Version 4.1 with patches

#!/bin/ksh
if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
fi

reexec_oserv()
{
    echo "Reexecing object dispatchers..."
    if [ `odadmin odlist list_od | wc -l` -gt 1 ] ; then
        #
        # Determine if necessary to shut down any clients
        tmr_hosts=`odadmin odlist list_od | head -1 | cut -c 36-`
        client_list=`odadmin odlist list_od | grep -v ${tmr_hosts}$`
        if [ "${client_list}" = "" ] ; then
            echo "No clients to shut down, skipping shut down of clients..."
        else
            echo "Shutting down clients..."
            odadmin shutdown clients
            echo "Waiting for all clients to shut down..."
            sleep 30
        fi
    fi
    odadmin reexec 1
    sleep 30
    odadmin start clients
}

HOST="tivaix1_svc"

winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRE130 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JHELP41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JCF41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRIM41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i MDIST2GU $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISDEPOT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISCLNT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i ADE $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i AEF $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF008 -y -i 41TMF008 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF014 -y -i 41TMF014 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF015 -y -i 41TMF015 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF016 -y -i 41TMF016 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2928 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2929 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2931 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2932 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2962 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2980 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2984 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2986 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2987 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2989 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF034 -y -i 41TMF034 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF032 -y -i JRE130_0 $HOST

This completes the installation of IBM Tivoli Management Framework Version 4.1. The successful completion of the installation performs a gross level verification of IBM Tivoli Management Framework. After installing IBM Tivoli Management Framework, configure it to meet the requirements of integrating with IBM Tivoli Workload Scheduler over HACMP.

Add IP alias to oserv


Installing IBM Tivoli Management Framework using the service IP hostname of the server binds the Framework server (also called oserv) to the corresponding service IP address, so it listens for Framework network traffic only on that address. This ensures that a highly available IBM Tivoli Management Framework starts only after HACMP is running and has brought the service IP address online.


In our environment, we also need oserv to listen on the persistent IP address. The persistent IP label/address is not moved between cluster nodes when a resource group is moved; it remains on the cluster node to ease administrative access (which is why it is called the persistent IP label/address). Job Scheduling Console users depend upon the service IP address to access IBM Tivoli Workload Scheduler services.

As a security precaution, IBM Tivoli Management Framework listens only on the IP address it is initially installed against unless the feature that forces it to bind to a single address is specifically disabled. We show you how to disable this feature in this section.

To add the persistent IP label as a Framework oserv IP alias:

1. Log in as root user on a cluster node. In our environment, we logged in as root user on cluster node tivaix1.

2. Use the odadmin command as shown in Example 5-27 to verify the current IP aliases of the oserv, add the persistent IP label as an IP alias to the oserv, and then verify that the persistent IP label has been added. Note that the numeral 1 in the odadmin odlist add_ip_alias command should be replaced by the dispatcher number of your Framework installation.
Example 5-27 Add IP alias to Framework oserv server

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port  IPaddr       Hostname(s)
1369588498       1     ct     94    9.3.4.3      tivaix1_svc
[root@tivaix1:/home/root] odadmin odlist add_ip_alias 1 tivaix1
[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port  IPaddr       Hostname(s)
1369588498       1     ct     94    9.3.4.3      tivaix1_svc
                                    9.3.4.194    tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 5-28, the dispatcher number is 7.
Example 5-28 Identify dispatcher number of Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port  IPaddr       Hostname(s)
1369588498       7     ct     94    9.3.4.3      tivaix1_svc

The dispatcher number will be something other than 1 if you delete and re-install Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.
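If you script these steps, the dispatcher number can be extracted from the odadmin odlist output instead of being hard-coded. A minimal sketch follows, run here against a captured copy of the output from Example 5-28 rather than a live oserv (the captured text and the awk pattern are illustrative):

```shell
# Extract the Disp column for a given hostname from captured `odadmin odlist`
# output. The sample mirrors Example 5-28, where the dispatcher number is 7.
odlist='Region           Disp  Flags  Port  IPaddr       Hostname(s)
1369588498       7     ct     94    9.3.4.3      tivaix1_svc'

disp=$(printf '%s\n' "$odlist" | awk '/tivaix1_svc/ { print $2 }')
echo "$disp"
```

On a live system, replace the captured text with the real command output, for example `odadmin odlist | awk '/tivaix1_svc/ { print $2 }'`.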


3. Use the odadmin command as shown in Example 5-29 to verify that IBM Tivoli Management Framework currently binds against the primary IP hostname, then disable the feature, and then verify that it is disabled. Note that the numeral 1 in the odadmin set_force_bind command should be replaced by the dispatcher number of your Framework installation.
Example 5-29 Disable set_force_bind object dispatcher option

[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = TRUE
[root@tivaix1:/home/root] odadmin set_force_bind FALSE 1
[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = FALSE

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 5-30, the dispatcher number is 7.
Example 5-30 Identify dispatcher number of Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port  IPaddr       Hostname(s)
1369588498       7     ct     94    9.3.4.3      tivaix1_svc

The dispatcher number will be something other than 1 if you delete and re-install Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

Important: Disabling the set_force_bind variable can cause unintended side effects for installations of IBM Tivoli Management Framework that also run other IBM Tivoli server products, such as IBM Tivoli Monitoring and IBM Tivoli Configuration Manager. Consult your IBM service provider for advice on how to address this potential conflict if you plan on deploying other IBM Tivoli server products on top of the instance of IBM Tivoli Management Framework that you use for IBM Tivoli Workload Scheduler. Best practice is to dedicate an instance of IBM Tivoli Management Framework to IBM Tivoli Workload Scheduler, typically on the Master Domain Manager, and not to install other IBM Tivoli server products into it. This simplifies these administrative concerns and does not affect the functionality of a Tivoli Enterprise environment.

4. Repeat the operation on all remaining cluster nodes. For our environment, we repeat the operation on tivaix2, replacing tivaix1 with tivaix2 in the commands.
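Because both nodes run the same two odadmin operations, they can be wrapped in a small helper so each node receives an identical sequence. The following is a dry-run sketch that only prints the commands it would run; make_alias_cmds and its parameters are hypothetical names, while the odadmin syntax is the one shown in Examples 5-27 and 5-29:

```shell
# Hypothetical dry-run helper: print the odadmin commands that add a
# persistent IP label as an oserv alias and disable forced single-address
# binding for a given dispatcher number.
make_alias_cmds() {
    disp=$1        # dispatcher number (see Example 5-28)
    persistent=$2  # persistent IP label of this cluster node
    echo "odadmin odlist add_ip_alias $disp $persistent"
    echo "odadmin set_force_bind FALSE $disp"
}

make_alias_cmds 1 tivaix1
```

On tivaix2 the call would become `make_alias_cmds 1 tivaix2` (with the dispatcher number adjusted as described above), and a real version would execute the commands instead of echoing them.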


Move the .tivoli directory


The default installation of IBM Tivoli Management Framework on a UNIX system creates the /tmp/.tivoli directory. This directory contains files that are required by the object dispatcher process. In a high availability implementation, the directory needs to move with the resource group that contains IBM Tivoli Management Framework, so we move it into a file system on the shared volume group. In our environment, we moved the directory to /opt/hativoli/tmp/.tivoli.

To use a different directory, you must set an environment variable in both the object dispatcher and the shell. After installing IBM Tivoli Management Framework, perform the following steps to set the necessary environment variables:

1. Create a directory. This directory must have at least public read and write permissions. However, define full permissions and set the sticky bit to ensure that users cannot remove files that they do not own. In our environment, we ran the commands shown in Example 5-31.
Example 5-31 Create the new .tivoli directory

mkdir -p /opt/hativoli/tmp/.tivoli
chmod ugo=rwx /opt/hativoli/tmp/.tivoli
chmod +t /opt/hativoli/tmp/.tivoli
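The intent of these commands is a world-writable directory with the sticky bit set (mode 1777), like /tmp itself. The result can be checked on any UNIX system with ls; the temporary path below is illustrative:

```shell
# Create a 1777 (world-writable plus sticky) directory and inspect its mode.
d=$(mktemp -d)/comm
mkdir -p "$d"
chmod 1777 "$d"
ls -ld "$d" | awk '{ print $1 }'   # the mode string ends in "t" when the sticky bit is set
rm -rf "$(dirname "$d")"
```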

2. Set the environment variable in the object dispatcher: a. Enter the following command:
odadmin environ get > envfile

b. Add the following line to the envfile file and save it:
TIVOLI_COMM_DIR=new_directory_name

c. Enter the following command:


odadmin environ set < envfile

3. Edit the Tivoli-provided setup_env.csh, setup_env.sh, and oserv.rc files in the /etc/Tivoli directory to set the TIVOLI_COMM_DIR variable.

4. For HP-UX and Solaris systems, add the following line to the file that starts the object dispatcher:
TIVOLI_COMM_DIR=new_directory_name

Insert the line near where the other environment variables are set, in a location that runs before the object dispatcher is started. The file that needs to be changed on each operating system is:

- HP-UX operating systems: /sbin/init.d/Tivoli
- Solaris operating systems: /etc/rc3.d/S99Tivoli


5. Shut down the object dispatcher by entering the following command:


odadmin shutdown all

6. Restart the object dispatcher by entering the following command:


odadmin reexec all
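Step 2 above edits the object dispatcher environment offline. The fragment below rehearses that edit against a stand-in file, so the round trip (get, append, set) can be checked before touching a live oserv; the stand-in content is an assumption, and the real `odadmin environ get`/`odadmin environ set` commands appear only in comments:

```shell
# Rehearse the envfile edit from step 2. On a live system the file would come
# from `odadmin environ get > envfile` and would be loaded back with
# `odadmin environ set < envfile`.
envfile=$(mktemp)
printf 'LANG=C\n' > "$envfile"   # stand-in for the output of `odadmin environ get`
echo 'TIVOLI_COMM_DIR=/opt/hativoli/tmp/.tivoli' >> "$envfile"
grep '^TIVOLI_COMM_DIR=' "$envfile"
rm -f "$envfile"
```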

5.1.5 Tivoli Web interfaces


IBM Tivoli Management Framework provides access to Web-enabled Tivoli Enterprise applications from a browser. When a browser sends an HTTP request to the Tivoli server, the request is redirected to a Web server. IBM Tivoli Management Framework provides this Web access by using servlets and support files that are installed on the Web server. The servlets establish a secure connection between the Web server and the Tivoli server. The servlets and support files are called the Tivoli Web interfaces.

IBM Tivoli Management Framework provides a built-in Web server called the spider HTTP service. It is not as robust or secure as a third-party Web server, so if you plan on deploying a Tivoli Enterprise product that requires Web access, consult your IBM service provider for advice about selecting a more appropriate Web server.

IBM Tivoli Management Framework supports any Web server that implements the Servlet 2.2 specifications, but the following Web servers are specifically certified for use with IBM Tivoli Management Framework:

- IBM WebSphere Application Server, Advanced Single Server Edition
- IBM WebSphere Application Server, Enterprise Edition
- IBM WebSphere Enterprise Application Server
- Jakarta Tomcat

The Web server can be hosted on any computer system. If you deploy a Web server on a cluster node, you will likely want to make it highly available. In this redbook we focus upon high availability for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. Refer to IBM WebSphere V5.0 Performance, Scalability, and High Availability: WebSphere Handbook Series, SG24-6198-00, for details on configuring WebSphere Application Server for high availability. Consult your IBM service provider for more details on configuring other Web servers for high availability.

5.1.6 Tivoli Managed Node


Managed Nodes are no different from IBM Tivoli Management Framework Tivoli servers in terms of high availability design. They operate under the same constraint of only one instance per operating system instance. While the AutoStart install variable of the wclient command implies we can configure multiple instances of the object dispatcher on a single operating system instance, IBM Tivoli Support staff confirmed for us that this is not a supported configuration at the time of writing.

Use the wclient command to install a Managed Node in a highly available cluster, as shown in Example 5-32.
Example 5-32 Install a Managed Node

wclient -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
-p ibm.tiv.pr -P @AutoStart@=0 @ForceBind@=yes \
BIN=/opt/hativoli/bin! \
LIB=/opt/hativoli/lib! \
DB=/opt/hativoli/spool! \
MAN=/opt/hativoli/man! \
APPD=/usr/lib/lvm/X11/es/app-defaults! \
CAT=/opt/hativoli/msg_cat! \
tivaix3_svc

In this example, we installed a Managed Node named tivaix3_svc on a system with the IP hostname tivaix3_svc (the service IP label of the cluster node) from the CD image we copied to the local drive in "Stage installation media" on page 455, into the directory /opt/hativoli. We also placed the managed resource object in the ibm.tiv.pr policy region. See the Tivoli Management Framework documentation for details about how to use the wclient command.

Except for the difference in the initial installation (using the wclient command instead of the wserver command), planning and implementing a highly available Managed Node is the same as for a Tivoli server, as described in the preceding sections.

If the constraint is lifted in future versions of IBM Tivoli Management Framework, or if you still want to install multiple instances of the object dispatcher on a single instance of an operating system, configure each instance with a different directory. To configure a different directory, change the BIN, LIB, DB, MAN, CAT and (optionally) APPD install variables that are passed to the wclient command. Configure the Tivoli environment files and the oserv.rc executable in /etc/Tivoli to accommodate the multiple installations. Modify external dependencies upon /etc/Tivoli where appropriate. We recommend using multiple, separate directories, one for each instance of IBM Tivoli Management Framework. Consult your IBM service provider for assistance with configuring this design.
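The wclient and wserver invocations in this chapter repeat the same directory install variables. A small helper can derive them from one installation prefix so every invocation stays consistent; tiv_dirs is a hypothetical name, and the variable list matches the one in Example 5-32:

```shell
# Hypothetical helper: emit the directory install variables used by the
# wclient command in Example 5-32 from a single installation prefix.
tiv_dirs() {
    p=$1
    printf '%s\n' "BIN=$p/bin!" "LIB=$p/lib!" "DB=$p/spool!" \
                  "MAN=$p/man!" "CAT=$p/msg_cat!"
}

tiv_dirs /opt/hativoli
```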


5.1.7 Tivoli Endpoints


Endpoints offer more options for high availability designs. When designing a highly available Tivoli Enterprise deployment, best practice is to keep the number of Managed Nodes as low as possible, and to use Endpoints as much as possible. In some cases (such as very old versions of Plus Modules) this might not be feasible, but the benefits of using Endpoints can often justify the cost of refactoring these older products into an Endpoint form.

Unlike Managed Nodes, multiple Endpoints on a single instance of an operating system are supported. This opens up many possibilities for high availability design. One design is to create an Endpoint to associate with a highly available resource group on a shared volume group, as shown in Figure 5-19.

Figure 5-19 Normal operation of highly available Endpoint

(figure: cluster nodes tivaix1 and tivaix2 each run a Framework oserv; the HA Endpoint lcfd runs on tivaix1 from /opt/hativoli/lcf on the shared volume group)


Under normal operation, cluster node tivaix1 runs the highly available Endpoint from the directory /opt/hativoli/lcf on the shared volume group. When the resource group falls over, tivaix1 is unavailable and the resource group moves to tivaix2. The Endpoint continues to listen on the IP service address of tivaix1, but runs off tivaix2 instead, as shown in Figure 5-20.

Figure 5-20 Fallover operation of highly available Endpoint

(figure: cluster nodes tivaix1 and tivaix2 each run a Framework oserv; after fallover the HA Endpoint lcfd runs on tivaix2 from /opt/hativoli/lcf on the shared volume group)
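For HACMP to move the Endpoint with the resource group as described above, the resource group's application server needs start and stop methods for lcfd. The following is a dry-run sketch that only prints the action it would take; a real script would exec the control script instead. It assumes the Endpoint instance directory dat/1 that winstlcf creates under the installation directory (so the generated control script is /opt/hativoli/lcf/dat/1/lcfd.sh), and ha_endpoint_ctl is a hypothetical name:

```shell
# Dry-run sketch of an HACMP application server control function for the
# highly available Endpoint. dat/1 is the instance directory created by
# winstlcf; a real version would run the lcfd.sh script rather than echo it.
ha_endpoint_ctl() {
    lcfd_sh=/opt/hativoli/lcf/dat/1/lcfd.sh
    case "$1" in
        start|stop) echo "$lcfd_sh $1" ;;
        *)          echo "usage: ha_endpoint_ctl {start|stop}" ; return 2 ;;
    esac
}

ha_endpoint_ctl start
```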

We recommend that you use this configuration to manage HACMP resource group-specific system resources. Examples of complementary IBM Tivoli products that leverage Endpoints in a highly available environment include:

- Monitoring a file system in a resource group with IBM Tivoli Monitoring
- Monitoring a highly available database in a resource group with IBM Tivoli Monitoring for Databases
- Inventorying and distributing software used in a resource group with IBM Tivoli Configuration Manager
- Enforcing software license compliance of applications in a resource group with IBM Tivoli License Manager

Specific IBM Tivoli products may have specific requirements that affect high availability planning and implementation. Consult your IBM service provider for assistance with planning and implementing other IBM Tivoli products on top of a highly available Endpoint.

Another possible design builds on top of a single highly available Endpoint. The highly available Endpoint is sufficient for managing the highly available resource group, but is limited in its ability to manage the cluster hardware. A local instance of an Endpoint can be installed to specifically manage compute resources associated with each cluster node.

For example, assume we use a cluster configured with a resource group for a highly available instance of IBM WebSphere Application Server. The environment uses IBM Tivoli Monitoring for Web Infrastructure to monitor the instance of IBM WebSphere Application Server in the resource group. This is managed through a highly available Endpoint that moves with the Web server's resource group. It also needs to use IBM Tivoli Monitoring to continuously monitor available local disk space on each cluster node.

In one possible fallover scenario, the resource group moves from one cluster node to another such that it leaves both the source and destination cluster nodes running. A highly available Endpoint instance can manage the Web server because they both move with a resource group, but it will no longer be able to manage hardware-based resources because the cluster node hardware itself is changed when the resource group moves.

Under this design, the normal operation of the cluster we used for this redbook is shown in Figure 5-21 on page 469.


Figure 5-21 Normal operation of local and highly available Endpoints

(figure: tivaix1 and tivaix2 each run a Framework oserv and a local Endpoint lcfd from /opt/lcftivoli on rootvg; the HA Endpoint lcfd runs on tivaix1 from /opt/hativoli/lcf on the shared volume group)

In normal operation then, three Endpoints are running. If the cluster moves the resource group containing the highly available Endpoint from tivaix1 to tivaix2, the state of the cluster would still leave three Endpoints, as shown in Figure 5-22 on page 470.


Figure 5-22 Cluster state after moving highly available Endpoint to tivaix2

(figure: both nodes keep their local Endpoints from /opt/lcftivoli on rootvg; the HA Endpoint lcfd and /opt/hativoli/lcf on the shared volume group now run on tivaix2)

However, if cluster node tivaix1 fell over to tivaix2 instead, it would leave only two Endpoint instances running, as shown in Figure 5-23 on page 471.


Figure 5-23 Cluster state after falling over tivaix1 to tivaix2

(figure: tivaix1 is down; tivaix2 runs its local Endpoint from /opt/lcftivoli on rootvg plus the HA Endpoint lcfd from /opt/hativoli/lcf on the shared volume group)

In each scenario in this alternate configuration, an Endpoint instance is always running on all cluster nodes that remain operational, even if HACMP on that cluster node is not running. As long as the system is powered up and the operating system functional, the local Endpoint remains to manage that system.

In this redbook we show how to install and configure a highly available Endpoint, then add a local Endpoint to the configuration. We use the same two-node cluster used throughout this document as the platform upon which we implement this configuration.

Endpoints require a Gateway in the Tivoli environment to log into so they can reach the Endpoint Manager. In our environment, we create a Gateway using the wcrtgate command, and verify the operation using the wlookup and wgateway commands as shown in Example 5-33 on page 472.


Example 5-33 Create a Gateway on tivaix1

[root@tivaix1:/home/root] wlookup -Lar Gateway
[root@tivaix1:/home/root] wcrtgate -h tivaix1 -n tivaix1-gateway
1369588498.1.680#TMF_Gateway::Gateway#
[root@tivaix1:/home/root] wlookup -Lar Gateway
tivaix1-gateway
[root@tivaix1:/home/root] wgateway tivaix1-gateway describe
Object           : 1369588498.1.680#TMF_Gateway::Gateway#
Protocols        : TCPIP
Hostname         : tivaix1
TCPIP Port       : 9494
Session Timeout  : 300
Debug level      : 0
Start Time       : 2003/12/22-18:53:05
Log Dir          : /opt/hativoli/spool/tivaix1.db
Log Size         : 1024000
RPC Threads      : 250
Max. Con. Jobs   : 200
Gwy Httpd        : Disabled
mcache_bwcontrol : Disabled

In Example 5-33, we create a Gateway named tivaix1-gateway on the Managed Node tivaix1.

Best practice is to design and implement multiple sets of Gateways, each set geographically dispersed when possible, to ensure that Endpoints always have a Gateway to log into. Gateways are closely related to repeaters. Sites that use IBM Tivoli Configuration Manager might want to consider using two parallel sets of Gateways to enable simultaneous use of inventory and software distribution operations, which require different bandwidth throttling characteristics. See Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, for more information about how to design a robust Gateway architecture.

As long as at least one Gateway is created, all Endpoints in a Tivoli Enterprise installation can log into that Gateway.

To install a highly available Endpoint:

1. Use the wlookup command to verify that the Endpoint does not already exist. In our environment, no Endpoints have been created yet, so the command does not return any output, as shown in Example 5-34.
Example 5-34 Verify no Endpoints exist within a Tivoli Enterprise installation

[root@tivaix1:/home/root] wlookup -Lar Endpoint
[root@tivaix1:/home/root]


2. Use the winstlcf command as shown in Example 5-35 to install the Endpoint. Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for details about how to use the winstlcf command. In our environment, we used the -d flag option to specify the installation destination of the Endpoint, the -g flag option to specify the Gateway we create, the -n flag option to specify the name of the Endpoint, the -v flag option for verbose output, and we use the IP hostname tivaix1_svc to bind the Endpoint to the IP service label of the cluster node.
Example 5-35 Install a highly available Endpoint on cluster node tivaix1 [root@tivaix1:/home/root] winstlcf -d /opt/hativoli/lcf -g tivaix1 -n hativoli \ -v tivaix1_svc Trying tivaix1_svc... password for root: ********** sh -c ' echo "__START_HERE__" uname uname uname uname -m -r -s -v || || || || hostinfo hostinfo hostinfo hostinfo | | | | grep grep grep grep NeXT NeXT NeXT NeXT

cd /tmp mkdir .tivoli.lcf.tmp.16552 cd .tivoli.lcf.tmp.16552 tar -xBf - > /dev/null || tar -xf tar -xBf tivaix1_svc-16552-lcf.tar generic/epinst.sh tivaix1_svc-16552-lcf.env > /dev/null || tar -xf tivaix1_svc-16552-lcf.tar generic/epinst.sh tivaix1_svc-16552-lcf.env sh -x generic/epinst.sh tivaix1_svc-16552-lcf.env tivaix1_svc-16552-lcf.tar cd .. rm -rf .tivoli.lcf.tmp.16552 ' ********** AIX:2:5:0001813F4C00 locating files in /usr/local/Tivoli/bin/lcf_bundle.41000... locating files in /usr/local/Tivoli/bin/lcf_bundle... Ready to copy files to host tivaix1_svc: destination: tivaix1_svc:/opt/hativoli/lcf source: tivaix1:/usr/local/Tivoli/bin/lcf_bundle.41000 files: generic/lcfd.sh generic/epinst.sh

Chapter 5. Implement IBM Tivoli Management Framework in a cluster

473

generic/as.sh generic/lcf_env.sh generic/lcf_env.csh generic/lcf_env.cmd generic/lcf.inv bin/aix4-r1/mrt/lcfd lib/aix4-r1/libatrc.a lib/aix4-r1/libcpl272.a lib/aix4-r1/libdes272.a lib/aix4-r1/libmd2ep272.a lib/aix4-r1/libmrt272.a lib/aix4-r1/libtis272.a lib/aix4-r1/libio.a lib/aix4-r1/libtos.a lib/aix4-r1/libtoslog.a lib/aix4-r1/libtthred.a source: tivaix1:/usr/local/Tivoli/bin/lcf_bundle files: lib/aix4-r1/libmrt.a lib/aix4-r1/libcpl.a lib/aix4-r1/libdes.a Continue? [yYna?]y Tivoli Light Client Framework starting on tivaix1_svc Dec 22 19:00:53 1 lcfd Command line argv[0]='/opt/hativoli/lcf/bin/aix4-r1/mrt/lcfd' Dec 22 19:00:53 1 lcfd Command line argv[1]='-Dlcs.login_interfaces=tivaix1_svc' Dec 22 19:00:53 1 lcfd Command line argv[2]='-n' Dec 22 19:00:53 1 lcfd Command line argv[3]='hativoli' Dec 22 19:00:53 1 lcfd Command line argv[4]='-Dlib_dir=/opt/hativoli/lcf/lib/aix4-r1' Dec 22 19:00:53 1 lcfd Command line argv[5]='-Dload_dir=/opt/hativoli/lcf/bin/aix4-r1/mrt' Dec 22 19:00:53 1 lcfd Command line argv[6]='-C/opt/hativoli/lcf/dat/1' Dec 22 19:00:53 1 lcfd Command line argv[7]='-Dlcs.machine_name=tivaix1_svc' Dec 22 19:00:53 1 lcfd Command line argv[8]='-Dlcs.login_interfaces=tivaix1' Dec 22 19:00:53 1 lcfd Command line argv[9]='-n' Dec 22 19:00:53 1 lcfd Command line argv[10]='hativoli' Dec 22 19:00:53 1 lcfd Starting Unix daemon Performing auto start configuration Done. + set -a + WINSTENV=tivaix1_svc-16552-lcf.env + [ -z tivaix1_svc-16552-lcf.env ] + . ./tivaix1_svc-16552-lcf.env + INTERP=aix4-r1 + LCFROOT=/opt/hativoli/lcf + NOAS= + ASYNCH= + DEBUG= + LCFOPTS= -Dlcs.login_interfaces=tivaix1_svc -n hativoli + NOTAR=

474

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

+ MULTIINSTALL= + BULK_COUNT= + BULK_PORT= + HOSTNAME=tivaix1_svc + VERBOSE=1 + PRESERVE= + LANG= + LC_ALL= + LCFDVRMP=LCF41015 + rm -f ./tivaix1_svc-16552-lcf.env + [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp ] + umask 022 + + pwd stage=/tmp/.tivoli.lcf.tmp.16552 + [ -n ] + [ aix4-r1 = w32-ix86 -o aix4-r1 = os2-ix86 -o aix4-r1 = w32-axp ] + [ -d /opt/hativoli/lcf/bin/aix4-r1 ] + [ ! -z ] + MKDIR_CMD=/bin/mkdir -p /opt/hativoli/lcf/dat + [ -d /opt/hativoli/lcf/dat ] + /bin/mkdir -p /opt/hativoli/lcf/dat + [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp -a aix4-r1 != os2-ix86 ] + chmod 755 /opt/hativoli/lcf/dat + cd /opt/hativoli/lcf + [ aix4-r1 = os2-ix86 -a ! -d /tmp ] + [ -n ] + [ aix4-r1 = w32-ix86 -a -z ] + [ aix4-r1 = w32-axp -a -z ] + mv generic/lcf.inv bin/aix4-r1/mrt/LCF41015.SIG + PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java130/jre/bin:/usr/java130/bin: /opt/hativoli/lcf/generic + export PATH + [ -n ] + [ -n ] + K=1 + fixup=1 + [ 1 -gt 0 ] + unset fixup + [ -n ] + [ -n ] + [ -n ] + [ -n ] + [ -z ] + port=9494 + [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ] + ET=/etc/Tivoli/lcf + + getNextDirName /opt/hativoli/lcf/dat /etc/Tivoli/lcf uniq=1


+ LCF_DATDIR=/opt/hativoli/lcf/dat/1 + [ aix4-r1 != openstep4-ix86 ] + mkdir -p dat/1 + s=/opt/hativoli/lcf/dat/1/lcfd.sh + cp /opt/hativoli/lcf/generic/lcfd.sh /opt/hativoli/lcf/dat/1/lcfd.sh + sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g + 0< /opt/hativoli/lcf/dat/1/lcfd.sh 1> t + mv t /opt/hativoli/lcf/dat/1/lcfd.sh + [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp -a aix4-r1 != os2-ix86 ] + chmod 755 /opt/hativoli/lcf/dat/1/lcfd.sh + chmod 755 /opt/hativoli/lcf/bin/aix4-r1/mrt/lcfd + chmod 755 /opt/hativoli/lcf/lib/aix4-r1/libatrc.a /opt/hativoli/lcf/lib/aix4-r1/libcpl.a /opt/hativoli/lcf/lib/aix4-r1/libcpl272.a /opt/hativoli/lcf/lib/aix4-r1/libdes.a /opt/hativoli/lcf/lib/aix4-r1/libdes272.a /opt/hativoli/lcf/lib/aix4-r1/libio.a /opt/hativoli/lcf/lib/aix4-r1/libmd2ep272.a /opt/hativoli/lcf/lib/aix4-r1/libmrt.a /opt/hativoli/lcf/lib/aix4-r1/libmrt272.a /opt/hativoli/lcf/lib/aix4-r1/libtis272.a /opt/hativoli/lcf/lib/aix4-r1/libtos.a /opt/hativoli/lcf/lib/aix4-r1/libtoslog.a /opt/hativoli/lcf/lib/aix4-r1/libtthred.a + s=/opt/hativoli/lcf/generic/lcf_env.sh + [ -f /opt/hativoli/lcf/generic/lcf_env.sh ] + sed -e s!@LCFROOT@!/opt/hativoli/lcf!g + 0< /opt/hativoli/lcf/generic/lcf_env.sh 1> t + mv t /opt/hativoli/lcf/generic/lcf_env.sh + label=tivaix1_svc + [ 1 -ne 1 ] + [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ] + [ -n ] + /opt/hativoli/lcf/dat/1/lcfd.sh install -C/opt/hativoli/lcf/dat/1 -Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli + + expr 1 - 1 K=0 + [ 0 -gt 0 ] + set +e + ET=/etc/Tivoli/lcf/1 + [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ] + [ aix4-r1 != openstep4-ix86 ] + [ ! 
-d /etc/Tivoli/lcf/1 ] + mkdir -p /etc/Tivoli/lcf/1 + mv /opt/hativoli/lcf/generic/lcf_env.sh /etc/Tivoli/lcf/1/lcf_env.sh + sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g + 0< /etc/Tivoli/lcf/1/lcf_env.sh 1> /etc/Tivoli/lcf/1/lcf_env.sh.12142 + mv /etc/Tivoli/lcf/1/lcf_env.sh.12142 /etc/Tivoli/lcf/1/lcf_env.sh + [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp -o aix4-r1 = os2-ix86 ] + mv /opt/hativoli/lcf/generic/lcf_env.csh /etc/Tivoli/lcf/1/lcf_env.csh + sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g + 0< /etc/Tivoli/lcf/1/lcf_env.csh 1> /etc/Tivoli/lcf/1/lcf_env.csh.12142


+ mv /etc/Tivoli/lcf/1/lcf_env.csh.12142 /etc/Tivoli/lcf/1/lcf_env.csh
+ cp /etc/Tivoli/lcf/1/lcf_env.csh /etc/Tivoli/lcf/1/lcf_env.sh /opt/hativoli/lcf/dat/1
+ [ aix4-r1 = os2-ix86 ]
+ [ -z ]
+ sh /opt/hativoli/lcf/generic/as.sh 1
+ echo 1 1> /etc/Tivoli/lcf/.instance
+ echo Done.

3. Use the wlookup and wep commands as shown in Example 5-36 to verify the installation of the highly available Endpoint.
Example 5-36 Verify installation of highly available Endpoint
[root@tivaix1:/home/root] wlookup -Lar Endpoint
hativoli
[root@tivaix1:/home/root] wep ls
G 1369588498.1.680 tivaix1-gateway
   1369588498.2.522+#TMF_Endpoint::Endpoint# hativoli
[root@tivaix1:/home/root] wep hativoli
object              1369588498.2.522+#TMF_Endpoint::Endpoint#
label               hativoli
version             41014
id                  0001813F4C00
gateway             1369588498.1.680#TMF_Gateway::Gateway#
pref_gateway        1369588498.1.680#TMF_Gateway::Gateway#
netload             OBJECT_NIL
interp              aix4-r1
login_mode          desktop, constant
protocol            TCPIP
address             192.168.100.101+9495
policy              OBJECT_NIL
httpd               tivoli:r)T!*`un
alias               OBJECT_NIL
crypt_mode          NONE
upgrade_mode        enable
last_login_time     2003/12/22-19:00:54
last_migration_time 2003/12/22-19:00:54
last_method_time    NOT_YET_SET
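The registration check above can also be scripted. The helper below is a sketch: it reads a list of Endpoint labels (one per line, as produced by wlookup -Lar Endpoint) on standard input and reports whether a given label is present. The hativoli label and the pipeline shown in the comment follow this redbook's environment.

```shell
#!/bin/sh
# Check whether a given Endpoint label appears in a list of labels
# read from standard input (one label per line).
ep_registered() {
    grep -x "$1" > /dev/null
}

# Typical use in our environment (sketch):
#   wlookup -Lar Endpoint | ep_registered hativoli && echo "hativoli registered"
```

Because the helper reads from standard input rather than calling wlookup directly, it can be reused in any verification script that already has the label list in hand.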

4. If this is the first time an Endpoint is installed on the system, the Lightweight Client Framework (LCF) environment file is installed in the /etc/Tivoli/lcf/1 directory, as shown in Example 5-37 on page 478. The directory with the highest number under /etc/Tivoli/lcf holds the most recently installed environment files. Identify this directory and record it.


Tip: Best practice is to delete all unused instances of the LCF environment directories. This eliminates the potential for misleading configurations.
Example 5-37 Identify directory location of LCF environment file
[root@tivaix1:/home/root] ls /etc/Tivoli/lcf/1
./    ../    lcf_env.csh    lcf_env.sh

If you are unsure of which directory contains the appropriate environment files, use the grep command as shown in Example 5-38 to identify which instance of an Endpoint an LCF environment file is used for.
Example 5-38 Identify which instance of an Endpoint an LCF environment file is used for [root@tivaix1:/home/root] grep LCFROOT= /etc/Tivoli/lcf/1/lcf_env.sh LCFROOT="/opt/hativoli/lcf"
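When several instances are installed, checking them one by one with grep gets tedious. The sketch below loops over the numbered instance directories and prints the LCFROOT line from each; the default base directory is the standard /etc/Tivoli/lcf location, and it is a parameter so the helper can be pointed elsewhere.

```shell
#!/bin/sh
# List every numbered LCF environment directory together with the
# LCFROOT line recorded in its lcf_env.sh, so you can see at a glance
# which instance belongs to which Endpoint installation.
list_lcf_instances() {
    base=${1:-/etc/Tivoli/lcf}
    for d in "$base"/[0-9]*; do
        [ -d "$d" ] || continue
        echo "instance `basename $d`: `grep LCFROOT= $d/lcf_env.sh 2>/dev/null`"
    done
}

list_lcf_instances
```

On a system with no LCF instances the loop simply prints nothing, which makes the helper safe to run anywhere.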

Important: Ensure the instance number of a highly available Endpoint is the same on all cluster nodes that the Endpoint can fall over to. This enables the same scripts to be used on every cluster node.

5. Stop the new Endpoint to prepare it for control by HACMP. Use the ps and grep commands to identify the running instances of Endpoints, source the LCF environment, then use the lcfd.sh command to stop the Endpoint (the environment that is sourced identifies which instance of the Endpoint is stopped). Finally, use ps and grep again to verify that the Endpoint is stopped, as shown in Example 5-39.
Example 5-39 Stop an instance of an Endpoint
[root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep
    root 21520     1   0   Dec 22      -  0:00 /opt/hativoli/bin/aix4-r1/mrt/lcfd -Dlcs.login_interfaces=tivaix1_svc -n hativoli -Dlib_dir=/opt/hativoli/lib/aix4-r1 -Dload_dir=/opt/hativoli/bin/aix4-r1/mrt -C/opt/hativoli/dat/1 -Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli
[root@tivaix1:/home/root] . /etc/Tivoli/lcf/1/lcf_env.sh
[root@tivaix1:/home/root] lcfd.sh stop
[root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep
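In an automated stop script you may want to wait for the daemon to actually exit rather than eyeball the ps output. A minimal polling sketch, using the same ps/grep check as above:

```shell
#!/bin/sh
# Poll for remaining lcfd processes after "lcfd.sh stop", giving the
# daemon a few seconds to exit before declaring failure. The bracket
# in the grep pattern keeps grep from matching its own command line.
wait_for_lcfd_exit() {
    tries=${1:-5}
    while [ "$tries" -gt 0 ]; do
        ps -ef | grep '[l]cfd' > /dev/null || return 0
        sleep 2
        tries=`expr $tries - 1`
    done
    return 1
}
```

A stop script could call lcfd.sh stop and then wait_for_lcfd_exit, treating a nonzero return as a failed shutdown.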

Disable automatic start


Disable the automatic start of any highly available Tivoli server, Managed Node, or Endpoint so that instead of starting as soon as the system restarts, they start under the control of HACMP.


Endpoint installations configure the Endpoint to start every time the system restarts. High availability implementations need to start and stop highly available Endpoints after HACMP is running, so the automatic start after system restart needs to be disabled. Determine how an Endpoint starts on your platform after a system restart and disable it. In our environment, the highly available Endpoint is installed on an IBM AIX system. Under IBM AIX, the file /etc/rc.tma<n> starts an Endpoint, where <n> is the instance number of the installed Endpoint. Example 5-40 shows the content of this file. We remove the file to disable automatic start after system restart.
Example 5-40 Identify how an Endpoint starts during system restart
[root@tivaix1:/etc] cat /etc/rc.tma1
#!/bin/sh
#
# Start the Tivoli Management Agent
#
if [ -f /opt/hativoli/dat/1/lcfd.sh ]; then
    /opt/hativoli/dat/1/lcfd.sh start
fi

The oserv.rc program starts Tivoli servers and Managed Nodes. In our environment, the highly available Tivoli server is installed on an IBM AIX system. We use the find command as shown in Example 5-41 to identify the files in the /etc directory used to start the object dispatcher. The files are /etc/inittab and /etc/inetd.conf. We remove the lines found by the find command to disable the automatic start mechanism.
Example 5-41 Find all instances where IBM Tivoli Management Framework is started
[root@tivaix1:/etc] find /etc -type f -exec grep 'oserv.rc' {} \; -print
oserv:2:once:/etc/Tivoli/oserv.rc start > /dev/null 2>&1
/etc/inittab
objcall dgram udp wait root /etc/Tivoli/oserv.rc /etc/Tivoli/oserv.rc inetd
/etc/inetd.conf

You can use the same find command to determine how the object dispatcher starts on your platform. Use the following find command to search for instances of the string lcfd.sh in the files in the /etc directory if you need to identify the files where the command is used to start an Endpoint:
find /etc -type f -exec grep 'lcfd.sh' {} \; -print

Note that the line containing the search string appears first, followed by the file location.
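Instead of deleting the rc.tma file outright, you may prefer a reversible disable. The sketch below moves the startup file aside; the .disabled suffix is our own convention, not a product feature, and the default path matches the instance-1 file in our environment.

```shell
#!/bin/sh
# Disable the automatic Endpoint start after a system restart by
# moving the rc.tma startup file aside instead of deleting it.
disable_rc_tma() {
    rc=${1:-/etc/rc.tma1}
    if [ -f "$rc" ]; then
        mv "$rc" "$rc.disabled" && echo "disabled: $rc"
    else
        echo "nothing to do: $rc not found"
    fi
}

disable_rc_tma "$@"
```

Restoring the automatic start is then a single mv back to the original name.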


5.1.8 Configure HACMP


After verifying that the installation of IBM Tivoli Management Framework (whether it is a Tivoli server, Managed Node, or Endpoint) you want to make highly available functions correctly, install and configure HACMP on the system. If IBM Tivoli Management Framework subsequently fails to start or function properly, it is then highly likely that the cause is an HACMP issue rather than an IBM Tivoli Management Framework issue.

Restriction: These procedures are mutually exclusive from the instructions given in Chapter 4, IBM Tivoli Workload Scheduler implementation in a cluster on page 183. While some steps are the same, you can implement either the scenario given in that chapter or the one in this chapter, but you cannot implement both at the same time.

In this section we show how to install and configure HACMP for an IBM Tivoli Management Framework Tivoli server.

Install HACMP
Complete the procedures in Install HACMP on page 113.

Configure HACMP topology


Complete the procedures in Configure HACMP topology on page 219 to define the cluster topology.

Configure service IP labels/addresses


Complete the procedures in Configure HACMP service IP labels/addresses on page 221 to configure service IP labels and addresses.

Configure application servers


An application server is a cluster resource used to control an application that must be kept highly available. Configuring an application server does the following:

It associates a meaningful name with the server application. For example, you could give an installation of IBM Tivoli Management Framework a name such as itmf. You then use this name to refer to the application server when you define it as a resource.

It points the cluster event scripts to the scripts that they call to start and stop the server application.


It allows you to then configure application monitoring for that application server.

We show in Add custom HACMP start and stop scripts on page 489 how to write the start and stop scripts for IBM Tivoli Management Framework.

Note: Ensure that the server start and stop scripts exist on all nodes that participate as possible owners of the resource group where this application server resides.

Complete the following steps to create an application server on any cluster node:

1. Enter: smitty hacmp.

2. Go to Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers and press Enter. The Configure Resources to Make Highly Available SMIT screen is displayed as shown in Figure 5-24.

Configure Resources to Make Highly Available

Move cursor to desired item and press Enter.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-24 Configure Resources to Make Highly Available SMIT screen


3. Go to Configure Application Servers and press Enter. The Configure Application Servers SMIT screen is displayed as shown in Figure 5-25.
Configure Application Servers

Move cursor to desired item and press Enter.

  Add an Application Server
  Change/Show an Application Server
  Remove an Application Server

F1=Help       F2=Refresh     F3=Cancel     F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-25 Configure Application Servers SMIT screen

4. Go to Add an Application Server and press Enter. The Add Application Server SMIT screen is displayed as shown in Figure 5-26 on page 483. Enter field values as follows:

Server Name    Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.

Start Script   Enter the name of the script and its full pathname (followed by arguments) called by the cluster event scripts to start the application server. (Maximum 256 characters.) This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.


Stop Script    Enter the full pathname of the script called by the cluster event scripts to stop the server. (Maximum 256 characters.) This script must be in the same location on each cluster node that may start the server. The contents of the script, however, may differ.

Add Application Server

Type or select values in entry fields. Press Enter AFTER making all desired changes.

                                        [Entry Fields]
* Server Name                           [itmf]
* Start Script                          [/usr/es/sbin/cluster/>
* Stop Script                           [/usr/es/sbin/cluster/>

F1=Help       F2=Refresh     F3=Cancel    F4=List
F5=Reset      F6=Command     F7=Edit      F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-26 Fill out the Add Application Server SMIT screen for application server itmf

As shown in Figure 5-26, in our environment on tivaix1 we named the instance of IBM Tivoli Management Framework that normally runs on that cluster node itmf (for IBM Tivoli Management Framework). Note that no mention is made of the cluster nodes when defining an application server. We only mention the cluster node so you are familiar with the conventions we use in our environment. For the start script of application server itmf, we enter the following in the Start Script field:
/usr/es/sbin/cluster/utils/start_itmf.sh

The stop script of this application server is:


/usr/es/sbin/cluster/utils/stop_itmf.sh


This is entered in the Stop Script field.

5. Press Enter to add this information to the ODM on the local node.

6. Repeat the procedure for all additional application servers. For our environment, there are no further application servers to configure.

Configure the application monitoring


HACMP can monitor specified applications and automatically take action to restart them upon detecting process death or other application failures.

Note: If a monitored application is under control of the system resource controller, check to be certain that the action:multi values are -O and -Q. The -O specifies that the subsystem is not restarted if it stops abnormally. The -Q specifies that multiple instances of the subsystem are not allowed to run at the same time. These values can be checked using the following command:
lssrc -Ss Subsystem | cut -d : -f 10,11

If the values are not -O and -Q, then they must be changed using the chssys command.

You can select either of two application monitoring methods:

Process application monitoring detects the death of one or more processes of an application, using RSCT Event Management.

Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.

Process monitoring is easier to set up, as it uses the built-in monitoring capability provided by RSCT and requires no custom scripts. However, process monitoring may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application's performance and is more customizable, but it takes more planning, as you must create the custom scripts. In this section we show how to configure process monitoring for IBM Tivoli Management Framework. Remember that an application must be defined to an application server before you set up the monitor.

For IBM Tivoli Management Framework, we configure process monitoring for the oserv process because it will always run under normal conditions. If it fails, we want the cluster to automatically fall over, and not attempt to restart oserv. Because oserv starts very quickly, we only give it 60 seconds to start before monitoring begins. For cleanup and restart scripts, we use the same scripts as the start and stop scripts discussed in Add custom HACMP start and stop scripts on page 489.
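For contrast with the process monitor we configure here, a custom monitor is just a script whose exit status reports health (0 = healthy). A minimal sketch follows; using "odadmin odlist" as the probe is our assumption — any command that fails when oserv is down would serve.

```shell
#!/bin/sh
# Minimal custom application monitor sketch for the object dispatcher.
# HACMP judges health by the monitor script's exit status: 0 = healthy.
check_oserv() {
    # The probe command is a parameter so it can be substituted;
    # "odadmin odlist" is our assumed default.
    probe=${1:-"odadmin odlist"}
    $probe > /dev/null 2>&1
}
```

A custom monitor script would call check_oserv and exit with its return status, letting HACMP count failures against the restart count.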


Set up your process application monitor as follows:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resources Configuration -> Configure HACMP Application Monitoring -> Configure Process Application Monitor -> Add Process Application Monitor and press Enter. A list of previously defined application servers appears.

3. Select the application server for which you want to add a process monitor. In our environment, we selected itmf, as shown in Figure 5-27.

+--------------------------------------------------------------------------+
                      Application Server to Monitor

  Move cursor to desired item and press Enter.

    itmf

  F1=Help       F2=Refresh     F3=Cancel     F8=Image
  F10=Exit      Enter=Do       /=Find        n=Find Next
+--------------------------------------------------------------------------+

Figure 5-27 Select an application server to monitor

4. In the Add Process Application Monitor screen, fill in the field values as follows:

Monitor Name    This is the name of the application monitor. If this monitor is associated with an application server, the monitor has the same name as the application server. This field is informational only and cannot be edited.

Application Server Name    (This field can be chosen from the picklist. It is already filled with the name of the application server you selected.)

Processes to Monitor    Specify the process(es) to monitor. You can type more than one process name. Use spaces to separate the names.


Note: To be sure you are using correct process names, use the names as they appear from the ps -el command (not ps -f), as explained in the section Identifying Correct Process Names in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Process Owner    Specify the user id of the owner of the processes specified above, for example, root. Note that the process owner must own all processes to be monitored.

Instance Count    Specify how many instances of the application to monitor. The default is 1 instance. The number of instances must match the number of processes to monitor exactly. If you put 1 instance, and another instance of the application starts, you will receive an application monitor error.

Note: This number must be 1 if you have specified more than one process to monitor (one instance for each process).

Stabilization Interval    Specify the time (in seconds) to wait before beginning monitoring. For instance, with a database application, you may wish to delay monitoring until after the start script and initial database search have been completed. You may need to experiment with this value to balance performance with reliability.

Note: In most circumstances, this value should not be zero.

Restart Count    Specify the restart count, that is, the number of times to attempt to restart the application before taking any other actions. The default is 3.

Note: Make sure you enter a Restart Method if your Restart Count is any non-zero value.


Restart Interval    Specify the interval (in seconds) that the application must remain stable before resetting the restart count. Do not set this to be shorter than (Restart Count) x (Stabilization Interval). The default is 10% longer than that value. If the restart interval is too short, the restart count will be reset too soon and the desired fallover or notify action may not occur when it should.

Action on Application Failure    Specify the action to be taken if the application cannot be restarted within the restart count. You can keep the default choice notify, which runs an event to inform the cluster of the failure, or select fallover, in which case the resource group containing the failed application moves over to the cluster node with the next highest priority for that resource group. Refer to Note on the Fallover Option and Resource Group Availability in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862 for more information.

Notify Method    (Optional) Define a notify method that will run when the application fails. This custom method runs during the restart process and during notify activity.

Cleanup Method    (Optional) Specify an application cleanup script to be invoked when a failed application is detected, before invoking the restart method. The default is the application server stop script defined when the application server was set up. Note: With application monitoring, since the application is already stopped when this script is called, the server stop script may fail.

Restart Method    (Required if Restart Count is not zero.) The default restart method is the application server start script defined previously, when the application server was set up. You can specify a different method here if desired.

In our environment, we entered the process /usr/hativoli/bin/aix4-r1/bin/oserv in the Processes to Monitor field, root in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left as is, as shown in Figure 5-28.

Add Process Application Monitor

Type or select values in entry fields. Press Enter AFTER making all desired changes.

                                                  [Entry Fields]
* Monitor Name                                     itmf
* Application Server Name                          itmf
* Processes to Monitor                             [/usr/hativoli/bin/aix>
* Process Owner                                    [root]
  Instance Count                                   []                      #
* Stabilization Interval                           [60]                    #
* Restart Count                                    [0]                     #
  Restart Interval                                 []                      #
* Action on Application Failure                    [fallover]              +
  Notify Method                                    []
  Cleanup Method                                   [/usr/es/sbin/cluster/>
  Restart Method                                   [/usr/es/sbin/cluster/>

F1=Help       F2=Refresh     F3=Cancel    F4=List
F5=Reset      F6=Command     F7=Edit      F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-28 Add Process Application Monitor SMIT screen for application server itmf
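The restart-interval rule quoted in the field descriptions (no shorter than Restart Count x Stabilization Interval, with a default 10% above that product) can be checked with simple shell arithmetic. The values below are illustrative only, not the settings used for itmf:

```shell
#!/bin/sh
# Compute the minimum and default Restart Interval from the rule
# quoted in the field descriptions above.
restart_count=3        # illustrative values only
stabilization=60
minimum=`expr $restart_count \* $stabilization`
default=`expr $minimum \* 110 / 100`
echo "minimum restart interval: $minimum seconds"
echo "default restart interval: $default seconds"
```

For these sample values the minimum is 180 seconds, so a Restart Interval below that would risk resetting the restart count too soon.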

5. Press Enter after you have entered the desired information. The values are then checked for consistency and entered into the ODM. When the resource group comes online, the application monitor starts. In our environment, the COMMAND STATUS SMIT screen displays two warnings as shown in Figure 5-29 on page 489, which we safely ignore because the default values applied are the desired values.


COMMAND STATUS

Command: OK            stdout: yes            stderr: no

Before command completion, additional instructions may appear below.

claddappmon warning: The parameter "INSTANCE_COUNT" was not specified. Will use 1.

claddappmon warning: The parameter "RESTART_INTERVAL" was not specified. Will use 0.

F1=Help       F2=Refresh     F3=Cancel     F6=Command
F8=Image      F9=Shell       F10=Exit      /=Find
n=Find Next

Figure 5-29 COMMAND STATUS SMIT screen after creating HACMP process application monitor

6. Repeat the operation for any remaining application servers. In our environment, there are no other IBM Tivoli Management Framework application servers to configure. You can create similar application monitors for a highly available Endpoint.

Add custom HACMP start and stop scripts


For IBM Tivoli Management Framework, custom scripts for HACMP are required to start and stop the application server (in this case, the object dispatcher for Managed Nodes or the lightweight client framework for Endpoints). These are used when HACMP starts an application server that is part of a resource group, and gracefully shuts down the application server when a resource group is taken offline or moved. The stop script, of course, does not get an opportunity to execute if a cluster node is unexpectedly halted. We developed the following basic versions of the scripts for our environment. You may need to write your own versions to accommodate your site's specific requirements.


The following example shows a start script for a highly available object dispatcher (Managed Node or Tivoli server).
Example 5-42 Script to start highly available IBM Tivoli Management Framework
#!/bin/sh
#
# Start IBM Tivoli Management Framework
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
    /etc/Tivoli/oserv.rc start
else
    exit 1
fi

The following example shows a stop script for a highly available object dispatcher.
Example 5-43 Script to stop highly available IBM Tivoli Management Framework
#!/bin/sh
#
# Shut down IBM Tivoli Management Framework
odadmin shutdown 1

The following example shows a start script for a highly available Endpoint.
Example 5-44 Start script for highly available Endpoint
#!/bin/sh
#
# Starts the highly available Endpoint
if [ -f /etc/Tivoli/lcf/1/lcf_env.sh ] ; then
    . /etc/Tivoli/lcf/1/lcf_env.sh
    lcfd.sh start
else
    exit 1
fi

The stop script for a highly available Endpoint is similar, except that it passes the argument stop in the call to lcfd.sh, as shown in the following example.


Example 5-45 Stop script for highly available Endpoint
#!/bin/sh
#
# Stops the highly available Endpoint
if [ -f /etc/Tivoli/lcf/1/lcf_env.sh ] ; then
    . /etc/Tivoli/lcf/1/lcf_env.sh
    lcfd.sh stop
else
    exit 1
fi

If you want to implement a highly available object dispatcher and Endpoint in the same resource group, merge the corresponding start and stop scripts into a single script. The configuration we show in this redbook is for a hot standby cluster, so using the same start and stop scripts on all cluster nodes is sufficient. Mutual takeover configurations will need to use more sophisticated scripts that determine the state of the cluster and start (or stop) the appropriate instances of object dispatchers and Endpoints.
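When merging the scripts, a small helper keeps the pattern uniform: source an environment file, then run the start or stop command, failing if the environment file is absent. This is a sketch following the examples above; the paths in the usage comments are those of our environment.

```shell
#!/bin/sh
# Source an environment file, then run the given command. Returns
# nonzero if the environment file is missing, mirroring the
# "else exit 1" branches of the scripts above.
start_component() {
    env_file=$1
    shift
    if [ -f "$env_file" ]; then
        . "$env_file"
        "$@"
    else
        echo "missing $env_file" >&2
        return 1
    fi
}

# Usage in a merged start script (our environment's paths):
#   start_component /etc/Tivoli/setup_env.sh /etc/Tivoli/oserv.rc start || rc=1
#   start_component /etc/Tivoli/lcf/1/lcf_env.sh lcfd.sh start || rc=1
```

A merged stop script would call the same helper with the stop commands, accumulating a single exit status for HACMP.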

Modify /etc/hosts and the name resolution order


Complete the procedures in Modify /etc/hosts and name resolution order on page 455 to modify /etc/hosts and name resolution order on both tivaix1 and tivaix2.

Configure HACMP networks and heartbeat paths


Complete the procedures in Configure HACMP networks and heartbeat paths on page 254 to configure HACMP networks and heartbeat paths.

Configure HACMP resource group


This creates a container to organize HACMP resources into logical groups that are defined later. Refer to High Availability Cluster Multi-Processing for AIX Concepts and Facilities Guide Version 5.1, SC23-4864 for an overview of the types of resource groups you can configure in HACMP 5.1. Refer to the chapter on planning resource groups in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00 for further planning information. You should have your planning worksheets in hand.

Using the standard path, you can configure resource groups that use the basic management policies. These policies are based on the three predefined types of startup, fallover, and fallback policies: cascading, rotating, and concurrent. In addition to these, you can also configure custom resource groups for which you can specify slightly more refined types of startup, fallover, and fallback policies. Once the resource groups are configured, if it seems necessary for handling certain applications, you can use the Extended Configuration path to change or refine the management policies of particular resource groups (especially custom resource groups).

Configuring a resource group involves two phases:

Configuring the resource group name, management policy, and the nodes that can own it.

Adding the resources and additional attributes to the resource group.

Refer to your planning worksheets as you name the groups and add the resources to each one. To create a resource group:

1. Enter smit hacmp.

2. On the HACMP menu, select Initialization and Standard Configuration > Configure HACMP Resource Groups > Add a Standard Resource Group and press Enter. You are prompted to select a resource group management policy.

3. Select Cascading, Rotating, Concurrent or Custom and press Enter. For our environment we use Cascading. Depending on the previous selection, you will see a screen titled Add a Cascading | Rotating | Concurrent | Custom Resource Group. The screen shows only options relevant to the type of resource group you selected. If you select Custom, you are asked to refine the startup, fallover, and fallback policy before continuing.

4. Enter the field values as follows for a cascading, rotating, or concurrent resource group:

Resource Group Name    Enter the desired name. Use no more than 32 alphanumeric characters or underscores; do not use a leading numeric. Do not use reserved words. See Chapter 6, section List of Reserved Words in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862. Duplicate entries are not allowed.

Participating Node Names

492

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Enter the names of the nodes that can own or take over this resource group. Enter the node with the highest priority for ownership first, followed by the nodes with the lower priorities, in the desired order. Leave a space between node names. For example: NodeA NodeB NodeX.

If you choose to define a custom resource group, you define additional fields. We do not use custom resource groups in this redbook for simplicity of presentation.

Figure 5-30 shows how we configured resource group itmf_rg in the environment implemented by this redbook. We use this resource group to contain the instance of IBM Tivoli Management Framework normally running on tivaix1.

Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields. Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
* Resource Group Name                                 [itmf_rg]
* Participating Node Names / Default Node Priority    [tivaix1 tivaix2]

F1=Help       F2=Refresh     F3=Cancel    F4=List
F5=Reset      F6=Command     F7=Edit      F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-30 Configure resource group rg1
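The cascading policy we selected determines ownership by node priority order: the resource group runs on the highest-priority participating node that is currently up. The following sketch (our own illustration, not an HACMP utility) expresses that rule in plain shell, using the node names from our environment:

```shell
# Illustrative sketch only: pick the owner of a cascading resource group.
# A cascading group is owned by the highest-priority node that is up.
PRIORITY_LIST="tivaix1 tivaix2"   # Participating Node Names, priority order
UP_NODES="tivaix2"                # assume tivaix1 has just failed

owner=""
for node in $PRIORITY_LIST; do
  case " $UP_NODES " in
    *" $node "*) owner=$node; break ;;
  esac
done
echo "itmf_rg owner: ${owner:-none}"   # prints: itmf_rg owner: tivaix2
```

With both nodes up, the same loop selects tivaix1, which is why itmf_rg normally runs there.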

Configure resources in the resource groups


Once you have defined a resource group, you assign resources to it. SMIT can list possible shared resources for the node if the node is powered on (helping you to avoid configuration errors).

Chapter 5. Implement IBM Tivoli Management Framework in a cluster

When you are adding or changing resources in a resource group, HACMP displays only valid choices for resources, based on the resource group management policies that you have selected.

To assign the resources for a resource group:
1. Enter smit hacmp.
2. Go to Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Standard Resource Group and press Enter to display a list of defined resource groups.
3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name and Participating Node Names (Default Node Priority) fields filled in.

   Note: SMIT displays only valid choices for resources, depending on the type of resource group that you selected. The fields are slightly different for custom, non-concurrent, and concurrent groups.

   If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.
4. Enter the field values as follows:

   Service IP Label/IP Addresses
      (Not an option for concurrent or custom concurrent-like resource groups.) List the service IP labels to be taken over when this resource group is taken over. Press F4 to see a list of valid IP labels. These include addresses which rotate or may be taken over.

   Filesystems (empty is All for specified VGs)
      (Not an option for concurrent or custom concurrent-like resource groups.) If you leave this field blank and specify the shared volume groups in the Volume Groups field below, all file systems in those volume groups will be mounted. If you leave this field blank and do not specify the volume groups, no file systems will be mounted. You may also select individual file systems to include in the resource group; press F4 to see a list of the file systems. In that case, only the specified file systems will be mounted when the resource group is brought online. This field is a valid option only for non-concurrent resource groups.

   Volume Groups
      (If you are adding resources to a non-concurrent resource group.) Identify the shared volume groups that should be varied on when this resource group is acquired or taken over. Select the volume groups from the picklist or enter the desired volume group names in this field. Pressing F4 gives you a list of all shared volume groups in the resource group and the volume groups that are currently available for import onto the resource group nodes.
      Specify the shared volume groups in this field if you want to leave the Filesystems (empty is All for specified VGs) field blank and mount all file systems in the volume groups. If you specify more than one volume group in this field, all file systems in all specified volume groups will be mounted; you cannot choose to mount all file systems in one volume group and none in another. For example, in a resource group with two volume groups, vg1 and vg2, if the Filesystems (empty is All for specified VGs) field is left blank, all the file systems in vg1 and vg2 will be mounted when the resource group is brought up. However, if that field lists only file systems that are part of the vg1 volume group, none of the file systems in vg2 will be mounted, because they were not entered along with the file systems from vg1.
      If you have previously entered values in the Filesystems field, the appropriate volume groups are already known to the HACMP software.

   Concurrent Volume Groups
      (Appears only if you are adding resources to a concurrent or custom concurrent-like resource group.) Identify the shared volume groups that can be accessed simultaneously by multiple nodes. Select the volume groups from the picklist, or enter the desired volume group names in this field. If you previously requested that HACMP collect information about the appropriate volume groups, pressing F4 gives you a list of all existing concurrent-capable volume groups that are currently available in the resource group, and concurrent-capable volume groups available to be imported onto the nodes in the resource group. Disk fencing is turned on by default.

   Application Servers
      Indicate the application servers to include in the resource group. Press F4 to see a list of application servers.

   Note: If you are configuring a custom resource group and choose to use a dynamic node priority policy for a cascading-type custom resource group, you will see a field where you can select which of the three predefined node priority policies to use.

In our environment, we defined resource group itmf_rg as shown in Figure 5-31 on page 497.


Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
  Resource Group Name                                itmf_rg
  Participating Node Names (Default Node Priority)   tivaix1 tivaix2
* Service IP Labels/Addresses                        [tivaix1_svc]     +
  Volume Groups                                      [itmf_vg]         +
  Filesystems (empty is ALL for VGs specified)       []                +
  Application Servers                                [itmf]            +

F1=Help     F2=Refresh    F3=Cancel    F4=List
F5=Reset    F6=Command    F7=Edit      F8=Image
F9=Shell    F10=Exit      Enter=Do

Figure 5-31 Define resource group itmf_rg

For resource group itmf_rg, we assign tivaix1_svc as the service IP label, itmf_vg as the sole volume group to use, and itmf as the application server.
5. Press Enter to add the values to the HACMP ODM.
6. Repeat the operation for any other resource groups you need to configure. In our environment, we did not have any further resource groups to configure.
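The interaction between the Filesystems field and the Volume Groups field described above can be summarized in a small sketch (our own illustration; the volume group names and mount points are hypothetical):

```shell
# Illustrative sketch: a blank Filesystems field mounts every file system
# in the listed volume groups; a non-blank field mounts only those listed.
VGS="vg1 vg2"
FS_FIELD=""                # contents of "Filesystems (empty is All ...)"

fs_in_vg() {               # hypothetical lookup table for this sketch
  case $1 in
    vg1) echo "/data1 /data2" ;;
    vg2) echo "/logs" ;;
  esac
}

mounted=""
if [ -z "$FS_FIELD" ]; then
  for vg in $VGS; do mounted="$mounted $(fs_in_vg "$vg")"; done
else
  mounted=" $FS_FIELD"
fi
echo "mounted:$mounted"    # prints: mounted: /data1 /data2 /logs
```

Setting FS_FIELD to, say, "/data1" would restrict the mount to that file system alone, which matches the vg1/vg2 example in the text.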

Configure cascading without fallback, other attributes


We configure all resource groups in our environment for cascading without fallback (CWOF), so IBM Tivoli Management Framework can be given enough time to quiesce before falling back. This is part of the extended resource group configuration. We also use this step to configure other attributes of the resource groups, such as the associated shared volume group and file systems.

To configure CWOF and other resource group attributes:
1. Enter smit hacmp.
2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter. SMIT displays a list of defined resource groups.
3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, Inter-site Management Policy, and Participating Node Names (Default Node Priority) fields filled in.
   If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.
4. Enter true in the Cascading Without Fallback Enabled field by pressing Tab in the field until the value is displayed (Figure 5-32).

Change/Show All Resources and Attributes for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                [Entry Fields]
  Resource Group Name                                itmf_rg
  Resource Group Management Policy                   cascading
  Inter-site Management Policy                       ignore
  Participating Node Names / Default Node Priority   tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)          []                +
  Inactive Takeover Applied                          false             +
  Cascading Without Fallback Enabled                 true              +
  Application Servers                                [itmf]            +
  Service IP Labels/Addresses                        [tivaix1_svc]     +
  Volume Groups                                      [itmf_vg]         +
  Use forced varyon of volume groups, if necessary   false             +
  Automatically Import Volume Groups                 false             +
[MORE...19]

F1=Help     F2=Refresh    F3=Cancel    F4=List
F5=Reset    F6=Command    F7=Edit      F8=Image
F9=Shell    F10=Exit      Enter=Do

Figure 5-32 Set cascading without fallback (CWOF) for a resource group

5. Repeat the operation for any other applicable resource groups. In our environment, we applied the same operation to resource group rg2.
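The CWOF behavior configured above can be illustrated with a short sketch (our own, not HACMP code): after the failed, higher-priority node reintegrates, the resource group stays where it is when CWOF is true, and falls back automatically when it is false.

```shell
# Illustrative sketch of cascading without fallback (CWOF).
CWOF=true
CURRENT_OWNER="tivaix2"    # node that acquired itmf_rg during fallover
REINTEGRATED="tivaix1"     # higher-priority node, now back in the cluster

if [ "$CWOF" = "true" ]; then
  owner=$CURRENT_OWNER     # group stays put until moved manually
else
  owner=$REINTEGRATED      # group falls back automatically
fi
echo "after reintegration, itmf_rg runs on: $owner"
```

With CWOF enabled, you decide when to move itmf_rg back, giving the Framework time to quiesce first.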


For the environment in this redbook, all resources and attributes for resource group itmf_rg are shown in Example 5-46.
Example 5-46 All resources and attributes for resource group itmf_rg

[TOP]                                                [Entry Fields]
  Resource Group Name                                itmf_rg
  Resource Group Management Policy                   cascading
  Inter-site Management Policy                       ignore
  Participating Node Names / Default Node Priority   tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)          []                +
  Inactive Takeover Applied                          false             +
  Cascading Without Fallback Enabled                 true              +
  Application Servers                                [itmf]            +
  Service IP Labels/Addresses                        [tivaix1_svc]     +
  Volume Groups                                      [itmf_vg]         +
  Use forced varyon of volume groups, if necessary   false             +
  Automatically Import Volume Groups                 false             +
  Filesystems (empty is ALL for VGs specified)       [/usr/local/itmf] +
  Filesystems Consistency Check                      fsck              +
  Filesystems Recovery Method                        sequential        +
  Filesystems mounted before IP configured           false             +
  Filesystems/Directories to Export                  []                +
  Filesystems/Directories to NFS Mount               []                +
  Network For NFS Mount                              []                +
  Tape Resources                                     []                +
  Raw Disk PVIDs                                     []                +
  Fast Connect Services                              []                +
  Communication Links                                []                +
  Primary Workload Manager Class                     []                +
  Secondary Workload Manager Class                   []                +
  Miscellaneous Data                                 []                +
[BOTTOM]

Configure HACMP persistent node IP label/addresses


Complete the procedure in Configure HACMP persistent node IP label/addresses on page 272 to configure HACMP persistent node IP labels and addresses.


Configure predefined communication interfaces


Complete the procedure in Configure predefined communication interfaces on page 276 to configure predefined communication interfaces to HACMP.

Verify the configuration


Complete the procedure in Verify the configuration on page 280 to verify the HACMP configuration. The output of the cltopinfo command for our environment is shown in Example 5-47.
Example 5-47 Output of cltopinfo command for hot standby Framework configuration

Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined

NODE tivaix1:
    Network net_ether_01
        tivaix1_svc  9.3.4.3
        tivaix2_svc  9.3.4.4
        tivaix1_bt2  10.1.1.101
        tivaix1_bt1  192.168.100.101
    Network net_tmssa_01
        tivaix1_tmssa2_01  /dev/tmssa2

NODE tivaix2:
    Network net_ether_01
        tivaix1_svc  9.3.4.3
        tivaix2_svc  9.3.4.4
        tivaix2_bt1  192.168.100.102
        tivaix2_bt2  10.1.1.102
    Network net_tmssa_01
        tivaix2_tmssa1_01  /dev/tmssa1

Resource Group itmf_rg
    Behavior             cascading
    Participating Nodes  tivaix1 tivaix2
    Service IP Label     tivaix1_svc

The output would be the same for configurations that add highly available Endpoints, because we use the same resource group in the configuration we show in this redbook.
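When repeating this verification, it can be convenient to check the captured topology output programmatically. The following is a hedged sketch: the sample text below mirrors the resource group section of Example 5-47, and on a live cluster you would capture the real output (for example, cltopinfo > /tmp/cltopinfo.out) instead.

```shell
# Hedged sketch: pull the participating nodes of itmf_rg out of saved
# cltopinfo output (sample data mirrors Example 5-47).
cat > /tmp/cltopinfo.out <<'EOF'
Resource Group itmf_rg
  Behavior             cascading
  Participating Nodes  tivaix1 tivaix2
  Service IP Label     tivaix1_svc
EOF

# Fields 3 and 4 of the "Participating Nodes" line are the node names.
nodes=$(awk '/Participating Nodes/ { print $3, $4 }' /tmp/cltopinfo.out)
echo "Participating nodes: $nodes"   # prints: Participating nodes: tivaix1 tivaix2
```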


Start HACMP Cluster services


Complete the procedure in Start HACMP Cluster services on page 287 to start HACMP on the cluster.

Verify HACMP status


Complete the procedure in Verify HACMP status on page 292 to verify HACMP is running on the cluster.

Test HACMP resource group moves


Complete the procedure in Test HACMP resource group moves on page 294 to test moving resource group itmf_rg from cluster node tivaix1 to tivaix2, then from tivaix2 to tivaix1.

Live test of HACMP fallover


Complete the procedure in Live test of HACMP fallover on page 298 to test HACMP fallover of the itmf_rg resource group. Verify that the lsvg command displays the volume group itmf_vg and that the clRGinfo command displays the resource group itmf_rg.
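These two checks can be scripted against captured command output. This is a hedged sketch: the sample strings are ours, and on a live node you would run lsvg -o and clRGinfo directly and inspect their real output instead.

```shell
# Hedged sketch: confirm from captured output that the shared volume
# group is varied on and the resource group is online after fallover.
lsvg_o="itmf_vg
rootvg"
clrginfo_line="itmf_rg ONLINE tivaix2"   # assumed one-line summary

echo "$lsvg_o" | grep -q '^itmf_vg$' && echo "itmf_vg is varied on"
case "$clrginfo_line" in
  *ONLINE*) echo "itmf_rg is online" ;;
  *)        echo "itmf_rg is NOT online" ;;
esac
```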

Configure HACMP to start on system restart


Complete the procedure in Configure HACMP to start on system restart on page 300 to set HACMP to start when the system restarts.

Verify Managed Node fallover


When halting cluster nodes during the testing described in Live test of HACMP fallover, a highly available Managed Node (or Tivoli server) should also start appropriately when the itmf_rg resource group is moved. Once you verify that a resource group's disk and network resources have moved, you must verify that the Managed Node itself functions on its new cluster node (or in HACMP terms, verify that the application server resource of the resource group functions on the new cluster node).

In our environment, we perform the live test of HACMP operation at least twice: once to test HACMP resource group moves of disk and network resources in response to a sudden halt of a cluster node, and again to verify that the highly available Managed Node is running on the appropriate cluster node(s).

To verify that a highly available Managed Node is running during a test of a cluster node fallover from tivaix1 to tivaix2, follow these steps:
1. Log into the surviving cluster node as any user.
2. Use the odadmin command, as shown in Example 5-48 on page 502.


Example 5-48 Sample output of command to verify IBM Tivoli Management Framework is moved by HACMP

[root@tivaix1:/home/root] . /etc/Tivoli/setup_env.sh
[root@tivaix1:/home/root] odadmin odlist
Region       Disp  Flags  Port  IPaddr      Hostname(s)
1369588498   1     ct     94    9.3.4.3     tivaix1_svc
                                9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com

The command should be repeated while testing that CWOF works. If CWOF works, then the output will remain identical after the halted cluster node reintegrates with the cluster. The command should be repeated again to verify that falling back works. In our environment, after moving a resource group back to the reintegrated cluster node so the cluster is in its normal operating mode (tivaix1 has the itmf_rg resource group, and tivaix2 has no resource group), the output of the odadmin command on tivaix1 verifies that the Managed Node runs on the cluster node, but the same command fails on tivaix2 because the resource group is not on that cluster node.
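The repeated check lends itself to a small script. This is a hedged sketch: the sample output below mirrors Example 5-48, and on the surviving node you would capture the real listing (for example, odadmin odlist > /tmp/odlist.out) rather than use canned text.

```shell
# Hedged sketch: check saved `odadmin odlist` output for the service IP
# label, confirming the object dispatcher moved with the resource group.
cat > /tmp/odlist.out <<'EOF'
Region       Disp  Flags  Port  IPaddr     Hostname(s)
1369588498   1     ct     94    9.3.4.3    tivaix1_svc
EOF

if grep -q 'tivaix1_svc' /tmp/odlist.out; then
  echo "Managed Node is reachable via service label tivaix1_svc"
else
  echo "service label missing - fallover may have failed"
fi
```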

Verify Endpoint fallover


Verifying an Endpoint fallover is similar to verifying a Managed Node fallover. Instead of using the odadmin command to verify that a cluster node is running a Managed Node, use the ps and grep commands as shown in Example 5-49 to verify that a cluster node is running a highly available Endpoint.
Example 5-49 Identify that an Endpoint is running on a cluster node [root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep root 21520 1 0 Dec 22 - 0:00 /opt/hativoli/bin/aix4-r1/mrt/lcfd -Dlcs.login_interfaces=tivaix1_svc -n hativoli -Dlib_dir=/opt/hativoli/lib/aix4-r1 -Dload_dir=/opt/hativoli/bin/aix4-r1/mrt -C/opt/hativoli/dat/1 -Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli

If there are multiple instances of Endpoints, identify the instance by the directory the Endpoint starts from (the -C argument in Example 5-49).
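Extracting that directory can also be scripted. This is a hedged sketch: the sample process line is abridged from Example 5-49, and in practice you would feed it the real ps -ef | grep lcf output.

```shell
# Hedged sketch: extract the -C (dat directory) argument that identifies
# an Endpoint instance from a captured lcfd process line.
line='/opt/hativoli/bin/aix4-r1/mrt/lcfd -n hativoli -C/opt/hativoli/dat/1 -Dlcs.machine_name=tivaix1_svc'

# Split the command line into one argument per line, then strip the -C prefix.
datdir=$(echo "$line" | tr ' ' '\n' | sed -n 's/^-C//p')
echo "Endpoint instance directory: $datdir"   # prints: /opt/hativoli/dat/1
```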

Save HACMP configuration snapshot


Take a snapshot to save a record of the HACMP configuration.

Production considerations
In this document, we show an example implementation, leaving out many ancillary considerations that would obscure the presentation. In this section we discuss


some of the issues that you might face in an actual deployment for a production environment.

Security
IBM Tivoli Management Framework offers many configurable security options and mechanisms. One of these is an option to encrypt communications using Secure Sockets Layer (SSL). This requires a certificate authority (CA) to sign the SSL certificates. Highly available instances of IBM Tivoli Management Framework that use this option should plan and implement the means to make the CA highly available as well.

Tivoli Enterprise products


Not all Tivoli Enterprise products that leverage the Tivoli server, Managed Nodes and Endpoints are addressed by the high availability designs presented in this redbook. You should carefully examine each product's requirements and modify your high availability design to accommodate them.

5.2 Implementing Tivoli Framework in a Microsoft Cluster


In this section we cover the installation of Tivoli on a Microsoft Cluster, which includes the following topics:
- Installation of a TMR server on a Microsoft Cluster
- Installation of a Managed Node on a Microsoft Cluster
- Installation of an Endpoint on a Microsoft Cluster

5.2.1 TMR server


In the following sections, we walk you through the installation of Tivoli Framework in a MSCS environment.
- Installation overview provides an overview of the cluster installation procedures. It also provides a reference for administrators who are already familiar with configuring cluster resources and might not need detailed installation instructions.
- Framework installation on node 1 provides instructions for installing and configuring Tivoli Framework on the first node in the cluster. In this part of the install, node 1 owns the cluster resources required for the installation.


- Framework installation on node 2 provides instructions for installing and configuring Tivoli Framework on the second node in the cluster. The majority of the configuration takes place in this section. The second node must own the cluster resources in this section.
- Cluster resource configuration describes how the Tivoli Framework services are configured as cluster resources. After the cluster resources are configured, the Framework can be moved between the nodes.

Installation overview
In this section we walk through the installation and configuration of the Framework. The sections following provide greater detail.

Node 1 installation
1. Make sure node 1 is the owner of the cluster group that contains the drive where the Framework will be installed (X:, in our example).
2. Insert the Tivoli Framework disc 1 in the CD-ROM drive and execute the following command: setup.exe advanced
3. Click Next past the welcome screen.
4. Click Yes at the license screen.
5. Click Next at the accounts and permissions page.
6. Enter the name of the cluster name resource in the advanced screen (tivw2kv1, in our example). Make sure that the start services automatically box is left unchecked.
7. Specify an installation password if you would like. Click Next.
8. Specify a remote administration account and password if applicable. Click Next.
9. Select the Typical installation option. Click Browse and specify a location on the shared drive as the installation location (X:\tivoli, in our example).
10. Enter IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41 as the license key. Click Next.
11. Click Next to start copying files.
12. Press any key after the oserv service has been installed.
13. Click Finish to end the installation on node 1.

Node 2 installation
1. Copy tivoliap.dll from node 1 to node 2.
2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node 1 to node 2.
3. Move the cluster group from node 1 to node 2.
4. Source the Tivoli environment.
5. Create the Tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.
6. Load tivoliap.dll with the LSA by executing wsettap -a.
7. Set up the TRAA account using wsettap.
8. Install TRIP using trip -install -auto.
9. Install the Autotrace service using %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.
10. Install the object dispatcher using oinstall -install %DBDIR%\oserv.exe.

Cluster resource configuration


1. Open the Microsoft Cluster Administrator.
2. Create a new resource for the TRIP service.
   a. Name the TRIP resource (TIVW2KV1 - Trip, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.
   c. Select the cluster disk, cluster name and cluster IP as dependencies.
   d. Set the service name to trip and check the box Use network name for computer name.
   e. There is no registry setting required for the TRIP service.
3. Create a new resource for the oserv service.
   a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.
   c. Select the cluster disk, cluster name, cluster IP and TRIP as dependencies.
   d. Set the service name to oserv and check the box Use network name for computer name.
   e. Set the registry key SOFTWARE\Tivoli as a key to replicate across nodes.
4. Bring the cluster group online.
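The dependency chains above imply a strict bring-online order: a resource starts only after every one of its dependencies is online. The following sketch (our own illustration, not cluster tooling; resource names are abbreviated) walks that order for the two Tivoli resources:

```shell
# Illustrative sketch only: start resources in dependency order.
deps_trip="disk ip name"          # TRIP depends on disk, IP, network name
deps_oserv="disk ip name trip"    # oserv additionally depends on TRIP

online="disk ip name"             # base cluster resources already online

start() {   # start $1 only if every dependency listed in $2 is online
  for d in $2; do
    case " $online " in
      *" $d "*) ;;
      *) echo "cannot start $1: $d offline"; return 1 ;;
    esac
  done
  online="$online $1"
  echo "started $1"
}

start trip  "$deps_trip"    # prints: started trip
start oserv "$deps_oserv"   # prints: started oserv
```

Reversing the two start calls would fail, which is why the TRIP resource must be created (and brought online) before oserv.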

TMR installation on node 1


The installation of a TMR server on an MSCS is very similar to a normal Tivoli Framework installation. In order to perform the installation, make sure that the Framework 4.1 Disk 1 is in the CD-ROM drive or has been copied locally.


1. Start the installation by executing setup.exe advanced. Figure 5-33 illustrates how to initiate the setup using the Windows Run window.

Figure 5-33 Start the installation using setup.exe

2. After the installation is started, [advanced] is displayed after the Welcome to confirm that you are in advanced mode. Click Next to continue (Figure 5-34 on page 506).

Figure 5-34 Framework [advanced] installation screen

3. The license agreement will be displayed; click Yes to accept and continue (Figure 5-35 on page 507).


Figure 5-35 Tivoli License Agreement

4. The next setup screen (Figure 5-36 on page 508) informs you that the tmersrvd account and the Tivoli_Admin_Privileges group will be created. If an Endpoint has already been installed on the machine, the account and group will already exist. Click Next to continue.


Figure 5-36 Accounts and file permissions screen

5. Now you need to enter the hostname of the virtual server where you want the TMR to be installed. The hostname that you enter here will override the default value of the local hostname. Make sure that the Services start automatically box remains unchecked; you will handle the services via the Cluster Administrator. Click Next to continue (Figure 5-37 on page 509).


Figure 5-37 Framework hostname configuration

6. You can now enter an installation password, if desired. An installation password must be entered to install Managed Nodes, create interregion connections, or install software using Tivoli Software Installation Service. An installation password is not required in this configuration. Click Next to continue (Figure 5-38 on page 510).


Figure 5-38 Framework installation password

7. Next you can specify a Tivoli Remote Access Account (TRAA). The TRAA is the user name and password that Tivoli will use to access remote file systems. This is not a required field and can be left blank. Click Next to continue (Figure 5-39 on page 511).


Figure 5-39 Tivoli Remote Access Account (TRAA) setup

8. You can now select from the different installation types. In our example, we show a Typical installation. For information about the other types of installations, refer to the Framework 4.1 documentation. You will want to change the location where the Tivoli Framework is installed. The installation defaults to C:\Program Files\Tivoli, so it needs to be changed to X:\Tivoli. To change the installation directory, click Browse. Use the Windows browser to select the correct location for the installation directory. In our example, the drive shared by the cluster is the X: drive. Make sure you select the shared cluster drive as the installation location on your system. After the installation directory has been set, click Next to move to the next step (Figure 5-40 on page 512).


Figure 5-40 Framework installation type

9. In the License key dialog (Figure 5-41), enter the following: IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41. Click Next to continue.


Figure 5-41 Framework license key setup

The setup program will ask you to review the settings that you have specified (Figure 5-42 on page 514). If settings need to be changed, you can modify them by clicking Back. After you are satisfied with the settings, click Next to continue.


Figure 5-42 Framework setting review

10.After the files have been copied, the oserv will be installed (see Figure 5-43). You will have to select the DOS window and press any key to continue the installation.

Figure 5-43 Tivoli oserv service installation window


11.The Framework installation is now complete on the first node. Click Finish to exit the installation wizard (Figure 5-44). If the installation prompts you to restart the computer, select the option to restart later. You will need to copy some files off node 1 prior to rebooting.

Figure 5-44 Framework installation completion

TMR installation on node 2


The Tivoli Framework installation on the second node is not as straightforward as the installation on the first node. This installation consists of the following manual steps. 1. Before you fail over the X: drive and start the installation on node 2, you need to copy the %SystemRoot%\system32\drivers\etc\Tivoli directory and the %SystemRoot%\system32\tivoliap.dll file from node 1. The easiest way to do this is to copy the files to the shared drive and simply move the drive. However, you can also copy the files from one machine to another. One way to copy the files is to open a DOS window and copy the files using the DOS commands; see Figure 5-45 on page 516.


The commands are as follows:


x:
mkdir tmp
xcopy /E c:\winnt\system32\drivers\etc\tivoli x:\tmp
copy c:\winnt\system32\tivoliap.dll x:\

Figure 5-45 shows the output.

Figure 5-45 File copy output

2. After the files are copied, you can fail over the X: drive to node 2. You can do this manually by using the Cluster Administrator; but because you will need to restart node 1 anyway to register tivoliap.dll, you can simply restart node 1 and the drive should fail over automatically. After node 1 has started to reboot, the X: drive should fail over to node 2. To continue the Framework installation, open a DOS window on node 2 and create the c:\winnt\system32\drivers\etc\tivoli directory on node 2:
mkdir c:\winnt\system32\drivers\etc\tivoli

This is shown in Figure 5-46 on page 517.


Figure 5-46 Create the etc\tivoli directory on node 2

3. Now you need to copy the Tivoli environment files from the X:\tmp directory to the c:\winnt\system32\drivers\etc\tivoli directory just created in node 2. To do this, execute:
xcopy /E x:\tmp\* c:\winnt\system32\drivers\etc\tivoli

Figure 5-47 shows the output of this command.

Figure 5-47 Copy the Tivoli environment files

4. Source the Tivoli environment:


c:\winnt\system32\drivers\etc\tivoli\setup_env.cmd

Figure 5-48 on page 518 shows the output of this command.


Figure 5-48 Source the Tivoli environment

5. Now that the Tivoli environment is sourced, you can start configuring node 2 of the TMR. First you need to create the tmersrvd account and the Tivoli_Admin_Privileges group. To do this, execute the ntconfig.exe executable:
%BINDIR%\TAS\Install\ntconfig -e

See Figure 5-49.


Figure 5-49 Add the Tivoli account

6. Copy tivoliap.dll from the X: drive to c:\winnt\system32:


copy x:\tivoliap.dll c:\winnt\system32

The output is shown in Figure 5-50 on page 520.


Figure 5-50 Copy the tivoliap.dll

7. After tivoliap.dll has been copied, you can load it with the wsettap.exe utility:
wsettap -a

A reboot will be required before the tivoliap.dll is completely loaded.


Figure 5-51 Register the tivoliap.dll

8. Install the Autotrace service. Framework 4.1 includes a new embedded Autotrace service for use by IBM Support. Autotrace uses shared memory segments for logging purposes. To install Autotrace:
%BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin


Figure 5-52 Installing Autotrace

9. Finally, you need to install and start the oserv service. To install the oserv service:
oinstall -install %DBDIR%\oserv.exe

Figure 5-53 on page 523 shows the output of the command, indicating that oserv service has been installed.


Figure 5-53 Create the oserv service

After the oserv service is installed, your setup of node 2 is complete. Now you need to restart node 2 to load tivoliap.dll.

Setting up cluster resources


Now that the binaries are installed on both nodes of the cluster, you need to create two cluster resources: one for the oserv service and one for the TRIP service. Because the oserv service depends on the TRIP service, you need to create the TRIP resource first. Create the resources using the Cluster Administrator.
1. Open the Cluster Administrator by selecting Start -> Programs -> Administrative Tools -> Cluster Administrator.
2. After the Cluster Administrator is open, you can create a new resource by right-clicking your cluster group and selecting New -> Resource, as shown in Figure 5-54 on page 524.


Figure 5-54 Create a new resource

3. Select the type of resource and add a name. You can name the resource however you would like. In our example, we chose TIVW2KV1 - TRIP, in order to adhere to our naming convention (see Figure 5-55 on page 525). The Description field is optional. Make sure that you change the resource type to a generic service, and that the resource belongs to the cluster group that contains the drive where the Framework was installed. Click Next to continue.


Figure 5-55 Resource name and type setup

4. Define which nodes can own the resource. Since you are configuring your TMR for a hot standby scenario, you need to ensure that both nodes are added as possible owners (see Figure 5-56 on page 526). Click Next to continue.


Figure 5-56 Configure possible resource owners

5. Define the dependencies for the TRIP service. On an MSCS, dependencies are resources that must be active in order for another resource to run properly. If a dependency is not running, the cluster will fail over and attempt to start on the secondary node. To configure TRIP, select the shared disk, the cluster IP, and the cluster name resources as dependencies, as shown in Figure 5-57 on page 527. Click Next to continue.


Figure 5-57 TRIP dependencies

6. Define which service is associated with your resource. The name of the Tivoli Remote Execution Service is trip, so enter that in the Service name field. There are no start parameters. Make sure that the Use Network Name for computer name check box is selected (see Figure 5-58 on page 528). Click Next to continue.


Figure 5-58 TRIP service name

7. One of the options available with MSCS is to replicate registry keys between the nodes of a cluster. This option is not required for the TRIP service, but you will use it later when you create the oserv service. Click Finish to continue (see Figure 5-59 on page 529).


Figure 5-59 Registry replication

The resource has now been created. You will notice that when a resource is created, it is offline; this is normal. You will start the resources after the configuration is complete. Next, create the oserv cluster resource, using the same process used to create the TRIP resource.
8. Open the Cluster Administrator, right-click your cluster group, and select New -> Resource, as shown in Figure 5-60 on page 530.


Figure 5-60 Create a new resource

9. Select a name for the resource. We used oserv in our example, as seen in Figure 5-61 on page 531. Add a description if desired. Make sure you specify the resource type to be a Generic Service. Click Next to continue.


Figure 5-61 Resource name and type setup

10.Select both nodes as owners for the oserv resource, as shown in Figure 5-62 on page 532. Click Next to continue.


Figure 5-62 Select owners of the resource

11.Select all the cluster resources in the cluster group as dependencies for the oserv resource, as seen in Figure 5-63 on page 533. Click Next to continue.


Figure 5-63 Select resource dependencies

12.Specify oserv as the service name. Make sure to check the box Use Network Name for computer name (see Figure 5-64 on page 534). Click Next to continue.


Figure 5-64 Service and parameter setup

13.Click Add and specify the registry key SOFTWARE\Tivoli as the key to replicate (see Figure 5-65 on page 535). Click Finish to complete the cluster setup.


Figure 5-65 Registry replication

14.At this point, the installation of Framework on an MSCS is almost complete. Now you have to bring the cluster resources online. To do this, right-click the cluster group and select Bring Online, as seen in Figure 5-66 on page 536.


Figure 5-66 Bringing cluster resources online

The Framework service should now fail over whenever the cluster group or one of the nodes fails.
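The GUI steps above can also be scripted. The sketch below builds the equivalent command lines for the Windows 2000 cluster.exe utility; this is illustrative only -- the exact private-property and checkpoint flags should be verified against your cluster.exe version, and the group, disk, and resource names (TIVW2KV1, Disk X:, and so on) are placeholders from our example environment.

```python
# Sketch: build cluster.exe commands that mirror the Cluster Administrator
# steps above. Names and flag syntax are assumptions from our example.

def cluster_resource_commands(group, name, service, deps, replicate_key=None):
    """Return the cluster.exe commands creating one Generic Service resource."""
    res = f"{group} - {name}"
    cmds = [
        f'cluster res "{res}" /create /group:"{group}" /type:"Generic Service"',
        f'cluster res "{res}" /priv ServiceName="{service}" UseNetworkName=1',
    ]
    cmds += [f'cluster res "{res}" /adddep:"{dep}"' for dep in deps]
    if replicate_key:
        # registry checkpoint replicated across nodes
        cmds.append(f'cluster res "{res}" /addcheckpoints:"{replicate_key}"')
    return cmds

deps = ["Disk X:", "Cluster IP Address", "Cluster Name"]
trip = cluster_resource_commands("TIVW2KV1", "TRIP", "trip", deps)
oserv = cluster_resource_commands(
    "TIVW2KV1", "Oserv", "oserv",
    deps + ["TIVW2KV1 - TRIP"],       # oserv also depends on TRIP
    replicate_key=r"SOFTWARE\Tivoli",
)
for cmd in trip + oserv:
    print(cmd)
```

Reviewing the generated command list before running it is a convenient way to double-check that the dependency chain matches the GUI configuration.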

5.2.2 Tivoli Managed Node


In this section, we cover the Managed Node Framework installation process on an MSCS. The Managed Node installation method we have chosen is via the Tivoli Desktop. However, the same concepts should apply for a Managed Node installed using Tivoli Software Installation Service (SIS), or using the wclient command. The following topics are covered in this section:
- Installation overview - provides a brief overview of the steps required to install Tivoli Framework on an MSCS Managed Node
- TRIP installation - describes the installation of the Tivoli Remote Execution Protocol (TRIP), which is a required prerequisite for Managed Node installation
- Managed Node installation - covers the steps to install a Managed Node on an MSCS from the Tivoli Desktop
- Managed Node configuration - covers the setup process on the second node, as well as configuring oserv to bind to the cluster IP address


- Cluster resource configuration - covers the cluster configuration, which consists of the setup of the oserv and TRIP resources

The Managed Node installation process has many installation steps in common with the installation of the TMR server. For these steps, we refer you back to the previous section for the installation directives.

Installation overview
Here we give a brief outline of the Managed Node installation process on an MSCS system. The following sections describe the steps listed here in greater detail. Figure 5-67 on page 538 illustrates the configuration we use in our example.


Figure 5-67 Tivoli setup (the TMR server edinburgh manages the clustered Managed Node; physical nodes tivw2k1 and tivw2k2 share the TIVW2KV1 resource group with drive X:, IP address 9.3.4.199, and network name TIVW2KV1)

TRIP installation
To install TRIP, follow these steps:
1. Insert Framework CD 2 in the CD-ROM drive and run setup.exe.
2. Click Next at the welcome screen.
3. Click Yes at the license agreement.
4. Select a local installation directory to install TRIP (c:\tivoli\trip, in our example).
5. Click Next to start copying files.
6. Press any key after the TRIP service has been installed.
7. Click Finish to complete the installation.
8. Follow steps 1-7 again on node 2 so TRIP is installed on both nodes of the cluster.


Managed Node installation on node 1


1. Open the Tivoli desktop and log in to the TMR that will own the Managed Node.
2. Open a policy region where the Managed Node should reside and select Create -> ManagedNode.
3. Click Add Clients and enter the name associated with the cluster group where the Managed Node will be installed (tivw2kv1, in our example).
4. Click Select Media and browse to the location of Framework disc 1.
5. Click Install Options and make sure that the installation directories are all located on the cluster's shared drive (X:\tivoli, in our example). Verify that Arrange for start of the Tivoli daemon at system (re)boot time is unchecked.
6. Select Account as the default access method, and specify an account and password with administrator access to the Managed Node you are installing.
7. Click Install & Close to start the installation.
8. Click Continue Install at the Client Install screen.
9. Specify a Tivoli Remote Access Account if necessary (in our example, we used the default access method option).
10. Click Close at the reboot screen. You do not want to reboot at this time.
11. Click Close after the Client Install window states that it has finished the client install.

Managed Node installation on node 2


1. Copy tivoliap.dll from node 1 to node 2.
2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node 1 to node 2.
3. Move the cluster group from node 1 to node 2.
4. Source the Tivoli environment.
5. Create the tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.
6. Load tivoliap.dll with the LSA by executing wsettap -a.
7. Set up the TRAA account using wsettap.
8. Install the Autotrace service: %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.
9. Install the object dispatcher: oinstall -install %DBDIR%\oserv.exe.


10.Start the oserv service:


net start oserv /-Nali /-k%DBDIR% /-b%BINDIR%\..

11.Change the IP address of the Managed Node from the physical IP to the cluster IP address:
odadmin odlist change_ip <dispatcher> <cluster ip> TRUE

12.Set the oserv to bind to a single IP:


odadmin set_force_bind TRUE <dispatcher>

Cluster resource configuration


1. Open the Microsoft Cluster Administrator.
2. Create a new resource for the TRIP service.
   a. Name the TRIP resource (TIVW2KV1 - TRIP, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.
   c. Select the cluster disk, cluster name, and cluster IP as dependencies.
   d. Set the service name to trip and check the box Use network name for computer name.
   e. There are no registry settings required for the TRIP service.
3. Create a new resource for the oserv service.
   a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.
   c. Select the cluster disk, cluster name, and cluster IP as dependencies.
   d. Set the service name to oserv and check the box Use network name for computer name.
   e. Set the registry key SOFTWARE\Tivoli as the key to replicate across nodes.
4. Bring the cluster group online.

TRIP installation
Tivoli Remote Execution Service (TRIP) must be installed before installing a Tivoli Managed Node. Install TRIP as follows:
1. Insert Tivoli Framework CD 2 in the CD-ROM drive of node 1 and execute the setup.exe found in the TRIP directory (see Figure 5-68 on page 541).


Figure 5-68 Start TRIP installation

2. Click Next past the installation Welcome screen (Figure 5-69).

Figure 5-69 TRIP Welcome screen

3. Click Yes at the License agreement (see Figure 5-70 on page 542).


Figure 5-70 The TRIP license agreement

4. Select the desired installation directory. We used the local directory c:\tivoli, as shown in Figure 5-71 on page 543. Click Next to continue.


Figure 5-71 Installation directory configuration

5. Click Next to start the installation (see Figure 5-72 on page 544).


Figure 5-72 Installation confirmation

6. Press any key after the TRIP service has been installed and started (Figure 5-73).

Figure 5-73 TRIP installation screen

7. Click Next to complete the installation (see Figure 5-74 on page 545).


Figure 5-74 TRIP installation completion

8. Repeat the TRIP installation steps 1-7 on node 2.

Managed Node installation on node 1


In this section we describe the steps needed to install the Managed Node software on node 1 of the cluster. The Managed Node software will be installed on the cluster's shared drive X:, so you need to make sure that node 1 is the owner of the resource group that contains the X: drive. We will be initiating the installation from the Tivoli Desktop, so log in to the TMR (edinburgh).
1. After you are logged in to the TMR, navigate to a policy region where the Managed Node will reside and click Create -> ManagedNode (see Figure 5-75 on page 546).


Figure 5-75 ManagedNode installation

2. Click the Add Clients button and enter the virtual name of the cluster group. In our case, it is tivw2kv1. Click Add & Close (Figure 5-76).

Figure 5-76 Add Clients dialog


3. Insert the Tivoli Framework CD 1 in the CD-ROM drive on the TMR server and click Select Media.... Navigate to the directory where the Tivoli Framework binaries are located on the CD-ROM. Click Set Media & Close (Figure 5-77).

Figure 5-77 Tivoli Framework installation media

4. Click Install Options.... Set all installation directories to the shared disk (X:). Make sure you check the boxes When installing, create Specified Directories if missing and Configure remote start capability of the Tivoli daemon. Do not check the box Arrange for start of the Tivoli daemon at system (re)boot time. Let the cluster service handle the oserv service. Click Set to continue (see Figure 5-78 on page 548).


Figure 5-78 Tivoli Framework installation options

5. You need to specify the account that Tivoli will use to perform the installation on the cluster. Since you are only installing one Managed Node at this time, use the default access method. Make sure the Account radio button is selected, then enter the userid and password of an account on node 1 with administrative rights on the machine. If an installation password is used on your TMR, enter it now. Click Install & Close (see Figure 5-79 on page 549).


Figure 5-79 Specify a Tivoli access account

6. Now the Tivoli installation program will attempt to contact the Managed Node and query it to see what needs to be installed. You should see output similar to Figure 5-80 on page 550.
7. If there are no errors, click Continue Install to begin the installation; see Figure 5-80 on page 550.


Figure 5-80 Client installation screen

8. If your environment requires the use of a Tivoli Remote Access Account (TRAA), then specify the account here. In our example we selected Use Installation Access Method Account for our TRAA account. Click Continue (see Figure 5-81 on page 551).


Figure 5-81 Tivoli Remote Access Account (TRAA) setup

9. Select Close at the client reboot window (Figure 5-82). You do not want your servers to reboot until after you have configured them.

Figure 5-82 Managed Node reboot screen


10.The binaries will now start to copy from the TMR server to the Managed Node. The installation may take a while, depending on the speed of your network and the type of machines where you are installing the Managed Node software. After the installation is complete, you should see the following message at the bottom of the scrolling installation window: Finished client install. Click Close to complete the installation (Figure 5-83).

Figure 5-83 Managed Node installation window


Managed Node installation on node 2


Now you need to replicate manually on node 2 what the Tivoli installation performed on node 1. Because steps 1 to 9 of the Managed Node configuration are the same as the TMR installation of node 2 (see 5.2.1, TMR server on page 503), we do not cover those steps in great detail here.
1. Copy the tivoliap.dll from node 1 to node 2.
2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node 1 to node 2.
3. Move the cluster group from node 1 to node 2.
4. Source the Tivoli environment on node 2.
5. Create the tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.
6. Load the tivoliap.dll with the LSA by executing wsettap -a.
7. Set up the TRAA account by using wsettap.
8. Install the Autotrace service: %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.
9. Install the object dispatcher: oinstall -install %DBDIR%\oserv.exe.
10.Start the oserv service:
net start oserv /-Nali /-k%DBDIR% /-b%BINDIR%\..

Figure 5-84 Starting the oserv service

11.Change the IP address of the Managed Node from the physical IP to the cluster IP address:
odadmin odlist change_ip <dispatcher> <cluster ip> TRUE


12.Set the oserv to bind to a single IP address:


odadmin set_force_bind TRUE <dispatcher>

Figure 5-85 Configure Managed Node IP address

13.Restart both systems to register tivoliap.dll.
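For repeat deployments, the node 2 command sequence above can be collected into an ordered, reviewable list before it is typed in. The sketch below is illustrative only: node2_setup_commands is not a Tivoli tool, and the BINDIR/DBDIR paths, dispatcher number, interpreter directory (w32-ix86), and cluster IP are assumptions from our example environment.

```python
# Sketch: steps 5-12 above, in order, with placeholders substituted.

def node2_setup_commands(bindir, dbdir, dispatcher, cluster_ip):
    """Return the node 2 setup commands in the order they must run."""
    return [
        rf"{bindir}\TAS\Install\ntconfig -e",                 # create the tivoli account
        "wsettap -a",                                         # load tivoliap.dll with the LSA
        rf"{bindir}\bin\atinstall --quietcopy {bindir}\bin",  # install Autotrace
        rf"oinstall -install {dbdir}\oserv.exe",              # install the object dispatcher
        rf"net start oserv /-Nali /-k{dbdir} /-b{bindir}\..", # start oserv
        f"odadmin odlist change_ip {dispatcher} {cluster_ip} TRUE",
        f"odadmin set_force_bind TRUE {dispatcher}",
    ]

for cmd in node2_setup_commands(r"X:\tivoli\bin\w32-ix86",
                                r"X:\tivoli\db", 2, "9.3.4.199"):
    print(cmd)
```

Because the order matters (the object dispatcher must exist before it is started, and oserv must be running before the odadmin calls), keeping the steps in one list helps avoid skipping or reordering them.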

Cluster resource configuration


The steps needed for cluster resource configuration here are the same as for the cluster resource configuration of a TMR as discussed in 5.2.1, TMR server on page 503, so refer to that section for detailed information. In this section, we simply guide you through the overall process.
1. Open the Microsoft Cluster Administrator.
2. Create a new resource for the TRIP service.
   a. Name the TRIP resource (TIVW2KV1 - TRIP, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.
   c. Select the cluster disk, cluster name, and cluster IP as dependencies.
   d. Set the service name to trip and check the box Use network name for computer name.
   e. There are no registry settings required for the TRIP service.
3. Create a new resource for the oserv service.
   a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.
   b. Select both nodes as possible owners.


   c. Select the cluster disk, cluster name, and cluster IP as dependencies.
   d. Set the service name to oserv and check the box Use network name for computer name.
   e. Set the registry key SOFTWARE\Tivoli as the key to replicate across nodes.
4. Bring the cluster group online.

5.2.3 Tivoli Endpoints


In this section we provide a detailed overview describing how to install multiple Tivoli Endpoints (TMAs) on a Microsoft Cluster Service (MSCS). The general requirements for this delivery are as follows:
- Install a Tivoli Endpoint on each physical server in the cluster.
- Install a Tivoli Endpoint on a resource group in the cluster (Logical Endpoint). This Endpoint will have the hostname and IP address of the virtual server. The Endpoint resource will roam with the cluster resources. During a failover, the cluster services will control the startup and shutdown of the Endpoint.

The purposes of this section are to clearly demonstrate what has been put in place (or implemented) by IBM/Tivoli Services, and to provide a detailed document of custom configurations, installation procedures, and information that is generally not provided in user manuals. This information is intended to be a starting place for troubleshooting, extending the current implementation, and documenting further work.

Points to consider
Note the following points regarding IBM's current solution for managing HA cluster environments for Endpoints.
The Endpoint for the physical nodes represents the physical characteristics (Physical Endpoint):
- Always stays at the local system
- Does not fail over to the alternate node in the cluster
- Monitors only the underlying infrastructure
The Endpoint for every cluster resource group represents the logical characteristics (Logical Endpoint):
- Moves together with the cluster group
- Stops and starts under control of HA
- Monitors only the application components within the resource group
Several limitations apply (for instance, Endpoints have different labels and listen on different ports).


Platforms:
- Solaris, AIX, HP-UX, Windows NT, Windows 2000
- Platform versions as supported by our products today

Installation and configuration


The complete solution for managing/monitoring the MSCS involves installing three Tivoli Endpoints on the two physical servers. One Physical Endpoint will reside on each server, while the third Endpoint will run where the cluster resource is running. For example, if node 1 is the active cluster node or contains the cluster group, this node will also be running the Logical Endpoint alongside its own Endpoint (see Figure 5-86).

Figure 5-86 Endpoint overview

An Endpoint is installed on each node to manage the physical components, and we call this the "Physical Endpoint". This Endpoint is installed on the local disk of the system using the standard Tivoli mechanism. This Endpoint is installed first, so its instance id is "1" on both physical servers (for example, \Tivoli\lcf\dat\1).


A second Endpoint instance (its instance id is "2") is installed on the shared file system. This Endpoint represents the application that runs on the cluster, and we call it the "Logical Endpoint". The Endpoints will not share any path, cache, or content; their disk layout is completely separated. The Logical Endpoint will have an Endpoint label that is different from the Physical Endpoint's, and will be configured to listen on a different port than the Physical Endpoint. The general steps to implementing this configuration are as follows:
1. Install the Tivoli Endpoint on node 1, local disk.
2. Install the Tivoli Endpoint on node 2, local disk.
3. Manually install the Tivoli Endpoint on the logical server, shared drive X: (while logged onto the currently active cluster node).
4. Configure the new LCFD service as a generic service in the cluster group (using the Cluster Administrator).
5. Move the cluster group to node 2 and register the new LCFD service on this node by using the lcfd.exe -i command (along with other options).

Environment preparation and configuration


Before beginning the installation, make sure there are no references to lcfd in the Windows Registry. Remove any references to previously installed Endpoints, or you may run into problems during the installation.
Note: This is very important to the success of the installation. If there are any references (typically legacy_lcfd), you will need to delete them using regedt32.exe.
Verify that you have two-way communication to and from the Tivoli Gateways from the cluster server via hostname and IP address. Do this by updating your name resolution system (DNS, hosts files, and so on). We strongly recommend that you enter the hostname and IP address of the logical node in the hosts file of each physical node. This will locally resolve the logical server's hostname when issuing the ping -a command.
Finally, note that this solution works only with version 96 and higher of the Tivoli Endpoint.
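The hosts-file check recommended above can be automated when preparing many nodes. The parsing sketch below is generic; the hosts_has_entry helper is not a Tivoli tool, and the sample IP address and hostname come from our example environment (on Windows the hosts file lives under %SystemRoot%\system32\drivers\etc).

```python
# Sketch: confirm an uncommented hosts entry maps the cluster IP to the
# logical hostname, so the virtual name resolves locally on each node.

def hosts_has_entry(hosts_text, ip, hostname):
    """True if an uncommented hosts line maps ip to hostname."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()   # strip comments, tokenize
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return True
    return False

sample = "# local cluster names\n9.3.4.199  tivw2kv1\n"
print(hosts_has_entry(sample, "9.3.4.199", "tivw2kv1"))  # True
```

Running such a check on both physical nodes before installing the Logical Endpoint catches name-resolution gaps early, which are otherwise hard to diagnose after the Endpoint fails to log in.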


Install the Tivoli Endpoint on node 1


To install the Tivoli Endpoint on node 1, follow these steps: 1. Install the Tivoli Endpoint using the standard CD InstallShield setup program on one of the physical nodes in the cluster. 2. In our case, we leave the ports as default, but enter optional commands to configure the Endpoint and ensure its proper login.

Figure 5-87 Endpoint advanced configuration

The configuration arguments in the Other field are:


-n <ep label> -g <preferred gw> -d3 -D local_ip_interface=<node primary IP> -D bcast_disable=1

3. The Endpoint should install successfully and log in to the preferred Gateway. You can verify the installation and login by issuing the following commands on the TMR or Gateway (Figure 5-88 on page 559).
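When several Endpoints are rolled out, the advanced-configuration string shown in the Other field can be assembled consistently from a few per-node values. A small sketch follows; endpoint_other_args is a hypothetical helper, and the label, gateway, and IP values used below are illustrative examples, not values from any real environment.

```python
# Sketch: build the lcfd "Other" field arguments shown above
# (-n label, -g preferred gateway, -d debug level, -D options).

def endpoint_other_args(label, gateway, node_ip, debug_level=3):
    """Return the advanced-configuration argument string for one Endpoint."""
    return (f"-n {label} -g {gateway} -d{debug_level} "
            f"-D local_ip_interface={node_ip} -D bcast_disable=1")

print(endpoint_other_args("tivw2k1-ep", "edinburgh", "9.3.4.61"))
```

Generating the string per node avoids the common mistake of reusing one node's local_ip_interface value on the other physical node.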


Figure 5-88 Endpoint login verification

Install the Tivoli Endpoint on node 2


To install the Tivoli Endpoint on node 2, follow these steps: 1. Install the Tivoli Endpoint on the physical node 2 in the cluster. Follow the same steps and options as in node 1 (refer to Install the Tivoli Endpoint on node 1 on page 558). 2. Verify that you have a successful installation and then log in as described.

Manually install the Tivoli Endpoint on the virtual node


To install the Tivoli Endpoint on the virtual node, follow these steps.
Note: You will only be able to do this from the active cluster server, because the non-active node will not have access to the shared volume X: drive.
1. On the active node, copy only the Tivoli installation directory (c:\Program Files\Tivoli) to the root of the X: drive. Rename X:\Tivoli\lcf\dat\1 to X:\Tivoli\lcf\dat\2.


Note: Do not use the Program Files naming convention on the X: drive.
2. Edit the X:\Tivoli\lcf\dat\2\last.cfg file, changing all of the references of c:\Program Files\Tivoli\lcf\dat\1 to X:\Tivoli\lcf\dat\2.
3. On both physical node 1 and physical node 2, copy the c:\winnt\Tivoli\lcf\1 directory to c:\winnt\Tivoli\lcf\2.
4. On both physical node 1 and physical node 2, edit the c:\winnt\Tivoli\lcf\2\lcf_env.cmd and lcf_env.sh files, replacing all references of c:\Program Files\Tivoli\lcf\dat\1 with X:\Tivoli\lcf\dat\2.
5. Remove the lcfd.id, lcfd.sh, lcfd.log, lcfd.bk and lcf.dat files from the X:\Tivoli\lcf\dat\2 directory.
6. Add or change the entries listed in Example 5-50 in the X:\Tivoli\lcf\dat\2\last.cfg file.
Example 5-50 X:\Tivoli\lcf\dat\2\last.cfg file

lcfd_port=9497
lcfd_preferred_port=9497
lcfd_alternate_port=9498
local_ip_interface=<IP of the virtual cluster>
lcs.login_interfaces=<gw hostname or IP>
lcs.machine_name=<hostname of virtual Cluster>
UDP_interval=30
UDP_attempts=3
login_interval=120

The complete last.cfg file should resemble the output shown in Figure 5-89 on page 561.
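The last.cfg edits in steps 2 and 6 can also be sketched as a small text-rewriting helper. This is illustrative only -- patch_last_cfg is not a Tivoli tool, and the sample keys, paths, and values are taken from our example environment.

```python
# Sketch: rewrite dat\1 path references and force key=value settings in a
# last.cfg-style file, appending any settings that were not present.

def patch_last_cfg(text, old_path, new_path, overrides):
    """Return text with old_path replaced and overrides applied."""
    out, seen = [], set()
    for line in text.splitlines():
        line = line.replace(old_path, new_path)
        key = line.split("=", 1)[0]
        if key in overrides:
            line = f"{key}={overrides[key]}"
            seen.add(key)
        out.append(line)
    out += [f"{k}={v}" for k, v in overrides.items() if k not in seen]
    return "\n".join(out) + "\n"

overrides = {
    "lcfd_port": "9497",
    "lcfd_preferred_port": "9497",
    "local_ip_interface": "9.3.4.199",
    "lcs.machine_name": "tivw2kv1",
}
cfg = ("lcfd_port=9494\n"
       "cache_loc=c:\\Program Files\\Tivoli\\lcf\\dat\\1\\cache\n")
print(patch_last_cfg(cfg, r"c:\Program Files\Tivoli\lcf\dat\1",
                     r"X:\Tivoli\lcf\dat\2", overrides))
```

Applying the edits programmatically makes it easy to diff the result against Example 5-50 before starting the Logical Endpoint.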


Figure 5-89 Sample last.cfg file

7. Execute the following command:


X:\Tivoli\lcf\bin\w32-ix86\mrt\lcfd.exe -i -n <virtual_name> -C X:\Tivoli\lcf\dat\2 -P 9497 -g <gateway_label> -D local_ip_interface=<virtual_ip_address>

Note: The IP address and name are irrelevant as long as a unique label is specified with -n. Every time the Endpoint logs in, the Gateway registers the IP that contacted it, and it will use that IP from that point forward for down calls. A single interface cannot be bound on multi-interface machines, so the routing must be very good; otherwise, with every upcall generated, or every time the Endpoint starts, the IP address will be changed if it differs from what the Gateway recorded. However, if the Endpoint is routing out of an interface that is not reachable by the Gateway, then all downcalls will fail, even though the Endpoint logged in successfully. This will obviously cause some problems with the Endpoint.


8. Set the Endpoint manager login_interval to a smaller number (the default is 270; we used 20). Run the following command on the TMR:
wepmgr set login_interval 20

Set up physical node 2 to run the Logical Endpoint


To set up physical node 2 to run the Logical Endpoint, follow these steps:
1. Move the cluster group containing the X: drive to node 2, using the Cluster Administrator.
2. On node 2, which is now the active node (the node on which you have not yet registered the Logical Endpoint), open a command prompt window and again run the following command to create and register the lcfd-2 service on this machine:
X:\Tivoli\lcf\bin\w32-ix86\mrt\lcfd.exe -i -n <virtual_name> -C X:\Tivoli\lcf\dat\2 -P 9497 -g <gateway_label> -D local_ip_interface=<virtual_ip_address>

The output listed in Figure 5-90 is similar to what you should see.

Figure 5-90 Registering the lcfd service

3. Verify that the new service was installed correctly by viewing the services list (use the net start command or Control Panel -> Services). Also view the new registry entries using the Registry Editor. You will see two entries for the lcfd service, lcfd and lcfd-2, as shown in Figure 5-91 on page 563.


Figure 5-91 lcfd and lcfd-2 services in the registry

4. Verify that the Endpoint successfully started and logged into the Gateway/TMR and that it is reachable (Figure 5-92).

Figure 5-92 Endpoint login verification


Configure the cluster resources for the failover


To configure the cluster resources for the failover, follow these steps:
1. Add a new resource to the cluster.
2. Log on to the active cluster node and start the Cluster Administrator, using the virtual IP address or hostname.
3. Click Resource, then right-click in the right pane and select New -> Resource (Figure 5-93).

Figure 5-93 Add a new cluster resource

4. Fill in the information as shown in the next dialog (see Figure 5-94 on page 565).


Figure 5-94 Name and resource type configuration

5. Select both TIVW2KV1 and TIVW2KV2 as possible owners of the cluster Endpoint resource (see Figure 5-95 on page 566).


Figure 5-95 Possible Owners

6. Move all available resources to the Resources dependencies box (see Figure 5-96 on page 567).


Figure 5-96 Dependency configuration

7. Enter the new service name of the Endpoint just installed (see Figure 5-97 on page 568).


Figure 5-97 Add lcfd-2 service name

8. Click Next past the registry replication screen (see Figure 5-98 on page 569). No registry replication is required.


Figure 5-98 Registry replication

9. Click Next at the completion dialog (Figure 5-99).

Figure 5-99 Completion dialog

10.Bring the new service resource online by right-clicking the resource and selecting Bring Online (Figure 5-100 on page 570). You will see the icon first change to the resource icon with a clock, and then the resource will come online and display the standard icon indicating it is online.


Figure 5-100 Bring resource group online

11.Test the failover mechanism and failover of the Cluster Endpoint service, as follows:
   a. Move the resource group from one server to the other, using the Cluster Administrator.
   b. After the resource group has been moved, log in to the new active server and verify that the Endpoint service Tivoli Endpoint-1 is running alongside the physical server's Endpoint service, Tivoli Endpoint.
   c. Fail over again and repeat the verification.
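The failover test above can also be driven from the command line instead of the Cluster Administrator GUI. The sketch below assembles the commands, assuming the Windows 2000 cluster.exe and sc.exe utilities; the group, node, and service names are placeholders from our example environment.

```python
# Sketch: command sequence for the failover test in step 11 -- move the
# resource group, then check that the logical and physical Endpoint
# services are both running on the new active node.

def failover_test_commands(group, target_node, logical_service="lcfd-2"):
    """Return commands to move the group and verify both Endpoint services."""
    return [
        f'cluster group "{group}" /moveto:"{target_node}"',  # fail over
        f"sc query {logical_service}",   # logical Endpoint should be RUNNING
        "sc query lcfd",                 # physical Endpoint keeps running too
    ]

for cmd in failover_test_commands("TIVW2KV1", "tivw2k2"):
    print(cmd)
```

Running the same sequence back toward node 1 completes the round-trip test described in step 11c.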


Appendix A.

A real-life implementation
In this appendix, we describe the implementation tasks within a deployment of an HACMP Version 4.5 and IBM Tivoli Workload Scheduler Version 8.1 scheduling environment at a real customer site. We cover the installation roadmap and actual installation steps, and provide our observations from this real-life implementation.

Copyright IBM Corp. 2004. All rights reserved.


Rationale for IBM Tivoli Workload Scheduler and HACMP integration


The rationale for the integration of IBM Tivoli Workload Scheduler and HACMP was to use a proactive approach to a highly available scheduling solution, rather than a reactive approach. The IBM AIX/SP frame hardware environment has been an impressively stable system. However, on occasion when a TCP/IP network issue arises, customers new to IBM Tivoli Workload Scheduler scheduling environments naturally become concerned that schedules and jobs are not running on FTAs as expected; in fact, the FTAs continue to run their jobs even during these temporary network disruptions. This concern then developed into a risk assessment where the actual loss of the IBM Tivoli Workload Scheduler Master Domain Manager was considered. The loss of an IBM Tivoli Workload Scheduler Master Domain Manager can be a serious concern for many customers. Where some customers feel an IBM Tivoli Workload Scheduler Backup Domain Manager is sufficient for a failover scenario, other customers will realize that their entire data center, which is now controlled by IBM Tivoli Workload Scheduler, could potentially go idle for several hours during this failover period. This could be a very serious problem for a large customer environment, especially if an MDM failure were to occur shortly before the release of the (05:59) Jnextday job. Data centers running business-critical applications or 10,000 to 20,000 jobs a day simply cannot afford a lapse in scheduling service. Therefore, a highly available IBM Tivoli Workload Scheduler scheduling solution must be implemented.

Our environment
Figure A-1 on page 573 shows an overview of the environments used in this implementation.


Figure A-1 Our environment

Installation roadmap
Figure A-2 on page 574 shows our installation roadmap. This flowchart is provided to help visualize the steps required to perform a successful IBM Tivoli Workload Scheduler HACMP integration. The steps are arranged sequentially, although there are certain tasks that can be performed in parallel. This flowchart can be considered to be at least a partial checklist for the tasks that must be performed in your installation.

Appendix A. A real-life implementation


Figure A-2 IBM Tivoli Workload Scheduler HACMP configuration flowchart

Software configuration
The following is a description of the IBM Tivoli Workload Scheduler software configuration that is in production.


- AIX 5.1 (Fix Pack 5100-03)
- IBM Tivoli Workload Scheduler 8.1 (Patch 08)
- Anywhere from 500 to 1500 business-critical jobs running per day
- Currently 56 FTAs (both AIX and NT), with an average of one FTA node being added per month
- 125 defined schedules
- 325 defined jobs
- Nine different workstation classes
- Four customized calendars

Hardware configuration
The hardware design and configuration for this type of work must be carefully planned and thought out before purchasing any devices for the configuration. If this is not done properly, the deployment of your design may be stalled until all component issues are resolved. There are several groups of people who would be involved in this design, and various team members may be able to assist in the configuration.

Disk storage design


The disk storage design and configuration is a critical component of a successful HACMP failover design. The disks must be visible to all nodes within the cluster. Our selection for this centralized disk storage is based on IBM 7133 SSA storage arrays.

Note: The redundant SSA controllers must be at the same version and revision level. Different levels of controllers provide different RAID levels, speeds, or other functions, thereby introducing incompatibility problems into the HACMP design.

Heartbeat interface
The HACMP heartbeat design is a critical component to a stable HACMP deployment.


Our design uses the non-IP serial cable network method, for the following reasons:

- Simplicity: once the cable is installed and tested, the configuration will probably never be touched again.
- There are no electrical or power issues associated with this configuration.
- The design is portable in the event you migrate from one disk technology to another (for example, SCSI to SSA).
- There are no moving parts in this configuration, so mean time between failures (MTBF) is virtually a non-issue for a serial cable.

Ethernet connectivity
Proper network connectivity is critical to a successful HACMP deployment. There is little purpose in continuing without it, as HACMP will not validate or accept the configuration if the network is not properly configured. Currently we have three Ethernet adapters per machine (en0, en1, en2), for a total of six adapters. This configuration has six IP addresses, plus one more that is actually used for the IBM Tivoli Workload Scheduler service to which all FTAs connect (the service address). We will use IP aliasing in the final production environment; this aliasing promotes a very fast HACMP failover.

Notes:

- Understanding the network configuration is probably one of the most critical parts of the HACMP configuration. Find assistance with this step if you do not have a good understanding of the relationship between HACMP and networking.
- All adapters to be utilized within the HACMP solution must reside in different network subnets (but the netmask must be the same).
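The subnet rule in the note above can be checked mechanically. The following sketch uses two adapter addresses from this configuration under the 0xffffffe0 netmask; the subnet helper function is our own illustration, not an HACMP utility:

```shell
subnet() {  # $1 = dotted IP, $2 = dotted netmask; prints the network address
    IFS=. read -r a b c d <<EOF
$1
EOF
    IFS=. read -r m n o p <<EOF
$2
EOF
    echo "$((a & m)).$((b & n)).$((c & o)).$((d & p))"
}

# Two adapter addresses from this appendix, netmask 0xffffffe0:
mask=255.255.255.224
if [ "$(subnet 9.149.248.72 "$mask")" = "$(subnet 9.149.248.113 "$mask")" ]; then
    echo "adapters share a subnet: HACMP will reject this configuration"
else
    echo "adapters are in different subnets, as required"
fi
```

Here 9.149.248.72 falls in the 9.149.248.64/27 network and 9.149.248.113 in 9.149.248.96/27, so the check passes.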

Installing the AIX operating system


AIX 5.1 must be installed on both nodes. The same version must be installed on both machines, and both nodes must be running at the same patch level. The files that should be backed up and restored to the new configuration are:

- root: .rhosts, /etc/hosts
- root: .profile, .kshrc
- opermps: .profile, .kshrc
- maestro: .profile, .kshrc
- operha: .profile, .kshrc
- Installation files: maestro.tar, HACMP, IBM Tivoli Workload Scheduler connectors, Plus module, and the IBM Tivoli Workload Scheduler Windows Java Console code
- /etc/passwd
- /usr/local/HACMP/scripts/*

Patching the operating system


HACMP 4.5 requires that the AIX operating system be patched to version 5100-02; the current HACMP test configuration is at 5100-03. IBM recommends that the latest level of operating system patches be installed on the nodes before going into production; the latest available patch level is 5100-04. Tip: To identify the current maintenance level on an AIX node, enter:
oslevel -r
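This check can be scripted across the nodes. In the sketch below, the level_ok helper and the fallback value are our own additions (not IBM tooling); the lexical comparison via sort works because oslevel -r reports a fixed-width VVVV-MM string:

```shell
required="5100-02"    # minimum level HACMP 4.5 needs, per the text above

level_ok() {  # $1 = current level, $2 = required minimum (VVVV-MM strings)
    # current >= required when the required level sorts first (or equal)
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort | head -n 1)" = "$2" ]
}

# Fall back to an example value when oslevel is unavailable (non-AIX host)
current=$(oslevel -r 2>/dev/null) || current="5100-03"
if level_ok "$current" "$required"; then
    echo "AIX level $current meets the minimum $required"
else
    echo "AIX level $current is below the required $required" >&2
fi
```

Run the same check on every node in the cluster, since both nodes must be at the same patch level.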

Finishing the network configuration


After the operating system installation (and patching) has been completed, all the network adapters should be reviewed for accuracy. Tip: As root, run the command ifconfig -a, which displays all information about the configured adapters in the machine.

Creating the TTY device within AIX


The creation of a tty device on each node is required for the serial heartbeat. This is done through the SMIT interface (it must be run by root). At this point, you can connect your serial cable (null modem cable).

Note: If you connect the cable before you define the device, your graphical display may not work, because the boot process will see a device connected to the serial port and assume it is a terminal.

Use the following SMIT path to create the tty device within AIX:

SMIT -> Devices -> TTY -> Add a TTY -> tty rs232 Asynchronous Terminal -> sa0 Available 00-00-S1 Standard I/O Serial Port1


Figure A-3 shows our selections.

Figure A-3 Add a TTY
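The same definition can likely be made non-interactively with mkdev instead of walking the SMIT panels. The flags below are our untested assumption mirroring the SMIT selections above; verify them against your AIX level before relying on them. The guard makes the sketch harmless on a non-AIX host:

```shell
# Guarded sketch: define a tty on serial port sa0 without SMIT.
# The mkdev flags are illustrative; confirm them on your AIX level.
if command -v mkdev >/dev/null 2>&1; then
    mkdev -c tty -t tty -s rs232 -p sa0 -w s1
else
    echo "mkdev not found: run this on the AIX cluster node"
fi
```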

Testing the heartbeat interface


To test the heartbeat interface, run the following tests.

The stty test


To test communication over the serial line after creating the tty device on both nodes, do the following:

1. On the first node, enter:
stty < /dev/ttyx

where /dev/ttyx is the newly added tty device. The command line on the first node should hang until the command is run on the second node.

2. On the second node, enter:
stty < /dev/ttyx

where /dev/ttyx is the newly added tty device.


If the nodes are able to communicate over the serial line, both nodes display their tty settings and return to the prompt.

Note: This is a valid communication test of a newly added serial connection before the HACMP/ES for AIX /usr/es/sbin/cluster/clstrmgr daemon has been started. This test yields different results after the daemon has been started, because the daemon changes the initial settings of the tty devices and applies its own settings. The original settings are restored when the HACMP/ES for AIX software exits.

The cat test


To perform the cat test on two nodes connected by an RS232 cable, do the following:

1. On the first node, run:
cat < /dev/ttyN

where ttyN is the tty device that the RS232 connection is using on the first node. Press Enter. The command line on the first node should hang.

2. On the second node, run:
cat /etc/hosts > /dev/ttyN

where ttyN is the tty device that the RS232 connection is using on the second node. Press Enter.

3. If the data is transmitted successfully from one node to the other, the text of the /etc/hosts file from the second node scrolls on the console of the first node.

Note: You can use any text file for this test; you do not need to use the /etc/hosts file specifically.

Configuring shared disk storage devices


Disk storage must be configured between both nodes. Both nodes must be able to mount the file system(s) at the same location. The file system resides in a non-concurrent volume group, because IBM Tivoli Workload Scheduler has no way of properly working with raw file systems.

Note: Testing of the disk storage can be done (as root) by issuing the commands:

varyonvg twsvg
mount /opt/tws
umount /opt/tws
varyoffvg twsvg
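The manual test in the note above can be wrapped in a small script to run on each node in turn. This is a sketch (the run helper is our own); it assumes the twsvg volume group and /opt/tws mount point named in this appendix, and degrades to a message on a host without the AIX LVM commands:

```shell
VG=twsvg
MP=/opt/tws

run() { echo "+ $*" && "$@"; }   # print each step, then execute it

if command -v varyonvg >/dev/null 2>&1; then
    run varyonvg "$VG"  &&
    run mount "$MP"     &&
    run umount "$MP"    &&
    run varyoffvg "$VG" &&
    echo "shared storage test passed on this node"
else
    echo "varyonvg not found: run this test as root on an AIX cluster node"
fi
```

If any step fails on either node, resolve the storage configuration before continuing with the HACMP integration.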


Copying installation code to shared storage


Since the machines in this cluster are not physically accessible, it is not realistic to assume you will be able to put CDs into their CD-ROM drives as required in a normal installation. Therefore, it is important to copy the installation code to a central location within the cluster: a shared volume group that all cluster nodes can see. Following is a list of the code that should be copied into this shared location:

- IBM Tivoli Workload Scheduler installation code: /opt/tws/tmp/swinst/tws_81
- IBM Tivoli Workload Scheduler patch code: /opt/tws/tmp/swinst/tws_81.patch
- IBM Tivoli Workload Scheduler Java Console (latest version): /opt/tws/tmp/swinst/javacon_1.2.x
- Tivoli Framework: /opt/tws/tmp/swinst/framework_3.7
- Tivoli Framework patch code: /opt/tws/tmp/swinst/framework_3.7b.patch
- IBM Tivoli Workload Scheduler Connector for the Framework: /opt/tws/tmp/swinst/connector_1.2
- IBM Tivoli Workload Scheduler Connector patch code: /opt/tws/tmp/swinst/connector_1.2.x.patch
- IBM Tivoli Workload Scheduler Plus Module for the Framework: /opt/tws/tmp/swinst/plusmod_1.2
- IBM Tivoli Workload Scheduler Plus Module patch code: /opt/tws/tmp/swinst/plusmod_1.2.x.patch
- HACMP installation code: /opt/tws/tmp/swinst/hacmp_4.5
- HACMP patch code: /opt/tws/tmp/swinst/hacmp_4.5.x.patch

Documentation is also located in the same volume group so that users can easily access it. The Adobe documentation (*.pdf) is copied into this shared location:

- IBM Tivoli Workload Scheduler documentation: /opt/tws/tmp/docs/tws_v81
- HACMP documentation: /opt/tws/tmp/docs/hacmp_v45

Note: It is critical that all data copied up to the UNIX cluster through FTP be transferred in binary mode. This prevents data corruption between dissimilar nodes (for example, Windows and UNIX).
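The staging tree above can be pre-created in one pass. In this sketch the loop and the /tmp/tws_staging dry-run default are our own additions; on the cluster you would point BASE at /opt/tws/tmp once the shared file system is mounted:

```shell
# Pre-create the shared staging tree for installation code and docs.
# BASE defaults to a scratch directory so the script can be dry-run.
BASE=${BASE:-/tmp/tws_staging}

for d in swinst/tws_81 swinst/tws_81.patch swinst/javacon_1.2.x \
         swinst/framework_3.7 swinst/framework_3.7b.patch \
         swinst/connector_1.2 swinst/connector_1.2.x.patch \
         swinst/plusmod_1.2 swinst/plusmod_1.2.x.patch \
         swinst/hacmp_4.5 swinst/hacmp_4.5.x.patch \
         docs/tws_v81 docs/hacmp_v45
do
    mkdir -p "$BASE/$d"
done
echo "staging tree created under $BASE"
```

Remember to transfer the CD images into these directories in binary mode, as the note above warns.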


Creating user accounts


Create the user accounts (maestro and operha) after the shared disk storage is configured and tested.

maestro: The maestro account must be created on both machines while the volume group/file system is mounted on the machine. This means mounting the file system, creating the account, unmounting the file system, logging on to the next machine in the cluster, mounting the file system on the second machine, creating the maestro account, and then unmounting the file system.

operha: The operha account provides a login other than the maestro account (currently we are using an opermps account). The operha account is important because there are moments when we will need access to one or all nodes in the cluster but should not be logged in as maestro, because we would not have access to the shared file systems (/opt/tws). Also, during a failover procedure, a user logged in as maestro will create problems as the system tries to unmount the /opt/tws file system.

Note: Because the users are created on both machines, their user IDs must be synchronized across both machines. This is critical to the successful configuration of a cluster.
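One way to verify the note about synchronized user IDs is to compare the passwd entries from the two nodes. The check_uids helper below is our own illustration; it reads two passwd-format files (for example, /etc/passwd copied from each node), and the sample data and UIDs are invented for the demonstration:

```shell
check_uids() {   # $1, $2 = passwd-format files; remaining args = accounts
    f1=$1; f2=$2; shift 2
    for u in "$@"; do
        u1=$(awk -F: -v u="$u" '$1 == u { print $3 }' "$f1")
        u2=$(awk -F: -v u="$u" '$1 == u { print $3 }' "$f2")
        if [ -n "$u1" ] && [ "$u1" = "$u2" ]; then
            echo "$u: UID $u1 consistent"
        else
            echo "$u: mismatch or missing (${u1:-none} vs ${u2:-none})"
        fi
    done
}

# Example with synthetic data standing in for each node's /etc/passwd:
printf 'maestro:!:300:200::/opt/tws/maestro:/bin/ksh\noperha:!:301:1::/home/operha:/bin/ksh\n' > /tmp/node1.passwd
printf 'maestro:!:300:200::/opt/tws/maestro:/bin/ksh\noperha:!:305:1::/home/operha:/bin/ksh\n' > /tmp/node2.passwd
check_uids /tmp/node1.passwd /tmp/node2.passwd maestro operha
```

Any mismatch reported here must be corrected before the failover design can work, since file ownership on the shared volume is recorded by numeric UID.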

Creating group accounts


The Tivoli user group should be created after the shared disk storage is configured and tested. Keep the following in mind when creating the group accounts:

- This group was formerly known as unison.
- The Tivoli group must be associated with the creation of the maestro account.
- The Tivoli group must not be associated with the creation of the operha account.

Installing IBM Tivoli Workload Scheduler software


Installation of the IBM Tivoli Workload Scheduler software (Version 8.1) must at this time be done on all nodes in the cluster; if there are two nodes in the cluster, then two IBM Tivoli Workload Scheduler installations must occur.

Note: You must complete the creation of the maestro user and the Tivoli group before starting the installation of the IBM Tivoli Workload Scheduler software. The actual software installation can be done following the creation of the user and group on a single machine, or you can create the user and group on all nodes first, and then cycle around to install the software again (requiring you to issue the umount, varyoffvg/varyonvg, and mount commands).

Patching the IBM Tivoli Workload Scheduler configuration


Patching the IBM Tivoli Workload Scheduler engine (on both the master and the FTAs) is highly recommended. The method for deploying patches varies among customers; some patch manually, while others use a software distribution mechanism.

Note: It is advised that the patching of the IBM Tivoli Workload Scheduler Master be done manually, because the IBM Tivoli Workload Scheduler administration staff has access to the machine, and you need to be very careful about the procedures that are performed, especially given the added complexities that the HACMP environment introduces.

Installing HACMP software


The installation of the HACMP software must be performed on all nodes within the HACMP cluster (in our case, a two-node cluster).

Notes:

- The current version of HACMP in our environment is 4.5.
- The HACMP documentation (*.pdf) should reside under the volume group (twsvg) in /opt/tws/tmp/docs. These Adobe *.pdf files are delivered during the installation of HACMP and should be copied into /opt/tws/tmp/docs so that they are easily located.

Patching the HACMP software


Patching the HACMP software is critical within the HACMP environment; it is advisable to patch the HACMP system twice a year.

Whenever an HACMP upgrade occurs, it must be applied to all nodes within the HACMP cluster. You cannot have nodes within the cluster out of code synchronization for an extended period of time (IBM will not support this configuration).

Installing the Tivoli TMR software


Installing the Tivoli TMR (or Tivoli server) must be done on all nodes in the cluster; if there are two nodes in the cluster, then two TMR installations must occur. This is best done after the HACMP software is up and running, so that you can install the TMR over the intended HACMP TCP/IP service address.

Patching the Tivoli TMR software


In contrast to the frequent patching of many TMR production environments, it is recommended that you patch your TMR to the latest code during the initial installation and then leave the TMR alone. Since IBM Tivoli Workload Scheduler uses the TMR solely for authentication, patching the TMR rarely provides added benefit in this IBM Tivoli Workload Scheduler/standalone TMR configuration.

TMR versus Managed Node installation


Tivoli recommends that the TMR used to facilitate the connection of the JSC to the IBM Tivoli Workload Scheduler engine be configured as a standalone TMR, for the following reasons:

- As mentioned, the TMR associated with IBM Tivoli Workload Scheduler rarely needs maintenance applied to it. Generally speaking, this has not proven to be the case for Framework infrastructures that are supporting other applications such as monitoring and software distribution. Keeping the TMR associated with IBM Tivoli Workload Scheduler separate allows the mission-critical scheduling application to be isolated from the risks and downtime associated with patching that may be necessary for other Framework applications, but is not necessary for IBM Tivoli Workload Scheduler.
- The Framework is a critical component of the JSC GUI. Unlike monitoring, software distribution, or other applications, IBM Tivoli Workload Scheduler operations can typically tolerate very little downtime. By isolating the IBM Tivoli Workload Scheduler TMR from other Managed Nodes in the environment, different service level agreements can be established and adhered to for the environment.


In some cases, customers may decide not to follow Tivoli's recommended practice of using a dedicated TMR. In such cases, they will need to install a Tivoli Managed Node instead. Regardless of the customer's decision, the Managed Node must still be installed into the HACMP cluster in the same way as a TMR. If customers require a Tivoli Endpoint on the IBM Tivoli Workload Scheduler Master, that is an optional installation procedure that they will need to perform in the HACMP cluster; to save time, this installation step should be coordinated with the TMR installation.

Configuring IBM Tivoli Workload Scheduler start and stop scripts


The start and stop scripts for the IBM Tivoli Workload Scheduler application must be prepared and located on each node within the cluster. Those scripts, located in /usr/local/HACMP/scripts on each machine, are called:

- tws_mdm_up.ksh
- tws_mdm_down.ksh

Keep the following in mind when configuring the IBM Tivoli Workload Scheduler start and stop scripts:

- The start and stop scripts must not be located within the shared disk volume; the HACMP verification mechanism will flag this as an error.
- This particular location is consistent with other HACMP installations that reside within the IBM England North Harbor Data Center.
- The start and stop scripts should be tested for their functionality before HACMP integration begins.
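As an illustration of what a tws_mdm_up.ksh-style start script might contain, here is a minimal sketch. The commands, paths, and the DRYRUN guard are our own assumptions (not the production script): it starts netman via StartUp and then the rest of the engine with conman:

```shell
#!/bin/sh
# Sketch of an HACMP application-server start script for the scheduler.
# Set DRYRUN=0 on the cluster node to actually execute the commands.
TWSHOME=${TWSHOME:-/opt/tws/maestro}
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = "1" ]; then
        echo "would run: $*"
    else
        eval "$@"
    fi
}

echo "tws_mdm_up: starting IBM Tivoli Workload Scheduler"
run "su - maestro -c '$TWSHOME/StartUp'"            # bring up netman
run "su - maestro -c '$TWSHOME/bin/conman start'"   # start the engine
echo "tws_mdm_up: done"
```

The matching stop script would mirror this with conman shutdown-style commands; as noted above, both should be tested by hand before HACMP is allowed to invoke them.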

Configuring miscellaneous start and stop scripts


Following the creation of the IBM Tivoli Workload Scheduler start and stop scripts, it is likely that other applications will need to be included in the HACMP design. Examples of applications that might be included in the IBM Tivoli Workload Scheduler HACMP design are:

- Apache Web services
- DB2
- TSM (Tivoli Storage Manager, for data backups)


Note: The creation of these start and stop scripts can occasionally be rather complicated, especially when the application is expected to run under an HACMP environment, so it is useful to have the subject matter expert for the application available, as well as a contact that can provide UNIX startup and shutdown shell scripts for that application.

Creating and modifying various system files


You will need to create or modify various files within this configuration; these files are required for IBM Tivoli Workload Scheduler and HACMP to work properly:

- /etc/hosts
- root's .rhosts file (needed for HACMP communications)
- maestro's .profile file
- root's .profile file
- operha's / opermps's .profile file

Configuring the HACMP environment


After the IBM Tivoli Workload Scheduler start and stop scripts have been developed and tested, you can begin your HACMP configuration. You will need to configure the following:

- Cluster definition (cluster ID)
- Cluster nodes (all nodes in the cluster)
- Cluster adapters (TCP/IP network adapters)
- Cluster adapters (non-TCP/IP serial heartbeat)
- Application servers (IBM Tivoli Workload Scheduler start and stop script references)
- Resource groups (IBM Tivoli Workload Scheduler resource group)
- Synchronize cluster topology
- Synchronize cluster resources

Testing the failover procedure


Testing the HACMP failover is a procedure that can take several days, depending on the complexity of the configuration. The configuration we test here has no complicated failover requirements, but it must still be tested and understood. As we gain further experience in this area, we will begin to understand and tune both our HACMP environment and its test procedures. Figure A-4 shows our implementation environment in detail.

Figure A-4 Our environment in more detail

The details for specific configurations on our IBM Tivoli Workload Scheduler HACMP environment are described in the following sections.

HACMP Cluster topology


Example A-1 on page 587 shows our HACMP Cluster topology.


Example: A-1 /usr/es/sbin/cluster/utilities/cllscf > cl_top.txt

Cluster Description of Cluster tws
Cluster ID: 71
There were 2 networks defined : production, serialheartbeat
There are 2 nodes in this cluster.

NODE tehnigaxhasa01:
  This node has 2 service interface(s):
  Service Interface emeamdm:
    IP address: 9.149.248.77
    Hardware Address:
    Network: production
    Attribute: public
    Aliased Address?: Not Supported
  Service Interface emeamdm has 1 boot interfaces.
    Boot (Alternate Service) Interface 1: tehnigaxhasa01
      IP address: 9.149.248.72
      Network: production
      Attribute: public
  Service Interface emeamdm has 1 standby interfaces.
    Standby Interface 1: ha01stby
      IP address: 9.149.248.113
      Network: production
      Attribute: public
  Service Interface nodetwo:
    IP address: /dev/tty1
    Hardware Address:
    Network: serialheartbeat
    Attribute: serial
    Aliased Address?: Not Supported
  Service Interface nodetwo has no boot interfaces.
  Service Interface nodetwo has no standby interfaces.

NODE tehnigaxhasa02:
  This node has 2 service interface(s):
  Service Interface tehnigaxhasa02:
    IP address: 9.149.248.74
    Hardware Address:
    Network: production
    Attribute: public
    Aliased Address?: Not Supported
  Service Interface tehnigaxhasa02 has no boot interfaces.
  Service Interface tehnigaxhasa02 has 1 standby interfaces.
    Standby Interface 1: ha02stby
      IP address: 9.149.248.114
      Network: production
      Attribute: public
  Service Interface nodeone:
    IP address: /dev/tty1
    Hardware Address:


    Network: serialheartbeat
    Attribute: serial
    Aliased Address?: Not Supported
  Service Interface nodeone has no boot interfaces.
  Service Interface nodeone has no standby interfaces.

Breakdown of network connections:

Connections to network production
  Node tehnigaxhasa01 is connected to network production by these interfaces:
    tehnigaxhasa01
    emeamdm
    ha01stby
  Node tehnigaxhasa02 is connected to network production by these interfaces:
    tehnigaxhasa02
    ha02stby

Connections to network serialheartbeat
  Node tehnigaxhasa01 is connected to network serialheartbeat by these interfaces:
    nodetwo
  Node tehnigaxhasa02 is connected to network serialheartbeat by these interfaces:
    nodeone

HACMP Cluster Resource Group topology


Example A-2 shows our HACMP Cluster Resource Group topology.
Example: A-2 /usr/es/sbin/cluster/utilities/clshowres -g'twsmdmrg' > rg_top.txt

Resource Group Name                          twsmdmrg
Node Relationship                            cascading
Site Relationship                            ignore
Participating Node Name(s)                   tehnigaxhasa01 tehnigaxhasa02
Dynamic Node Priority
Service IP Label                             emeamdm
Filesystems                                  /opt/tws
Filesystems Consistency Check                fsck
Filesystems Recovery Method                  sequential
Filesystems/Directories to be exported       /opt/tws
Filesystems to be NFS mounted                /opt/tws
Network For NFS Mount
Volume Groups                                twsvg
Concurrent Volume Groups
Disks
GMD Replicated Resources
PPRC Replicated Resources
Connections Services
Fast Connect Services
Shared Tape Resources
Application Servers                          twsmdm
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Miscellaneous Data
Automatically Import Volume Groups           false
Inactive Takeover                            false
Cascading Without Fallback                   true
SSA Disk Fencing                             false
Filesystems mounted before IP configured     false
Run Time Parameters:
Node Name                                    tehnigaxhasa01
Debug Level                                  high
Format for hacmp.out                         Standard
Node Name                                    tehnigaxhasa02
Debug Level                                  high
Format for hacmp.out                         Standard

ifconfig -a
Example A-3 shows the output of ifconfig -a in our environment.
Example: A-3 ifconfig -a output

Node01
$ ifconfig -a
en0: flags=e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 9.164.212.104 netmask 0xffffffe0 broadcast 9.164.212.127
en1: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 9.149.248.72 netmask 0xffffffe0 broadcast 9.149.248.95
en2: flags=7e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,CHECKSUM_SUPPORT,PSEG>
        inet 9.149.248.113 netmask 0xffffffe0 broadcast 9.149.248.127
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>

Node02
$ ifconfig -a
en0: flags=e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 9.164.212.105 netmask 0xffffffe0 broadcast 9.164.212.127


en1: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 9.149.248.74 netmask 0xffffffe0 broadcast 9.149.248.95
en2: flags=7e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,CHECKSUM_SUPPORT,PSEG>
        inet 9.149.248.114 netmask 0xffffffe0 broadcast 9.149.248.127
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>

Skills required to implement IBM Tivoli Workload Scheduling/HACMP


Many skills are needed to place such a system into production, and it is unlikely that any one person will perform this complex task alone. A large environment requiring this type of solution generally has specialists who administer the various technology sectors. It is therefore critical that all participants become involved early in the design process, so that there are minimal delays in implementing the project. Note that while this particular exercise was specific to an IBM Tivoli Workload Scheduler/HACMP integration, the complexity and involvement needed would be no different for a design utilizing HP Service Guard or Sun Cluster to provide high availability on a UNIX-based architecture other than AIX. Following is a summary of the roles and skill levels needed for this effort.

Networking Administration team


The networking team must have ample time to prepare the network switches and segments for an HACMP Cluster design. They may need to supply multiple network drops at the data center floor location. Since a large HACMP configuration may require six or more network drops, there may also be a need to purchase additional switches or blades. The skill set for these activities is medium to high. It is likely that several members of a networking team would be involved in these activities. Required time for activity: 2 to 5 days


AIX Administration team


This team is responsible for the following tasks:

- General setup of the RS/6000 and the AIX operating systems within the cluster
- Patching of the AIX operating systems
- DASD configuration
- Configuring and testing the serial heartbeat cable at the OS level
- Network configuration and connectivity testing
- Possibly assisting with the HACMP and IBM Tivoli Workload Scheduler installation

The skill level for these tasks is high; they are best performed by an AIX administrator/specialist.

Required time for activity: 5 to 15+ days

HACMP Administration team


The HACMP administration team is responsible for the daily operations of the HACMP environment. Many large customers have a team dedicated to maintaining these complicated HACMP clusters. Among the duties they perform are installations, upgrades, troubleshooting, and tuning. It is not unusual for them to have strong AIX skills, and their duties may overlap into AIX administration.

The required skill level for these activities is high. The whole purpose of this environment is to provide a highly available 24-hour, 365-day-a-year operation. HACMP administrators with no training and a minimal skill level place the HACMP system, the application, and the business at risk. Therefore, training for HACMP (or any clustering product) is required. Training for seasoned HACMP administrators is also suggested, as HACMP has seen significant changes over the last several revisions.

Required time for activity: 10 to 15 days, and ongoing support

Tivoli Framework Administration team


In larger shops, there may be a Framework team that will install the TMRs (or Managed Nodes, if you decide against a dedicated TMR) for you. This team needs to be aware that, although it is performing multiple installations of a TMR, the effort must be coordinated with the HACMP administrators.


The required skill level for this activity is medium to high. Administrators may have procedures that will make the installation more efficient. Required time for activity: 10 to 15 days, and ongoing support

IBM Tivoli Workload Scheduling Administration team


The IBM Tivoli Workload Scheduler administration team may be well versed in the installation of the IBM Tivoli Workload Scheduler code (and patches) into the cluster; otherwise, this task might be handled by the AIX administrators.

The skill level for this type of configuration is high. This is a process requiring a thorough understanding of the following areas:

- The IBM Tivoli Workload Scheduler application and its recommended installation procedures
- The AIX operating system
- RAID levels and file system tuning configurations
- A fundamental understanding of the HACMP environment (which introduces complexities into the normal IBM Tivoli Workload Scheduler application installation)

Required time for activity: 3 to 5 days

Hardware Purchasing Agent


This resource is responsible for purchasing all RS/6000 and AIX-related hardware, software, cables, storage cabinets, DASD, null modem serial cables, additional TCP/IP network switches, and other hardware components required for the IBM Tivoli Workload Scheduler/HACMP implementation.

The skill level for this activity is estimated to be low to medium. IBM sales has resources that are capable of quickly generating a robust configuration based on a customer's general hardware requirements.

Required time for activity: 1 to 2 days

Data Center Manager


The tasks that are performed and coordinated by the data center management team can vary greatly. Tasks that need to be coordinated are floor space allocation and various procedures for placing machines into production. They also coordinate with other personnel such as electricians, HVAC specialists, and maintenance teams, who may need to prepare or reinforce the raised floor structure for the new system being delivered.


While the estimated technical skill level of this activity is low, it requires a great deal of coordination. These activities can be time-consuming and must be coordinated properly; otherwise, they will negatively impact the implementation schedule.

Required time for activity: 2 to 3 days

Electrical Engineers
Tasks performed by a licensed electrical engineer typically deal with potentially hazardous, high-voltage situations.

The skill level for this type of activity is high. As this is a specialized trade, it should not be performed by anyone other than a licensed engineer.

Required time for activity: 1 to 2 days

HVAC Engineers
Heating, ventilation, and air conditioning configurations are generally installed in large data centers before any equipment is delivered onto the data center floor. As the data center equipment population grows, however, cooling requirements should be reviewed whenever new equipment is placed on the floor.

The skill level for these types of activities is high. As this is a specialized trade, it should not be performed by anyone other than a licensed engineer.

Required time for activity: 1 to 3 days

Service Engineers
IBM Service Engineers (SEs) are responsible for installing and testing the base functionality of the RS/6000 and possibly the base AIX operating system. The SE may also consult with the customer and assist in activities such as:
- SSA adapter configuration and tuning
- SSA RAID configuration and tuning
- TCP/IP network configuration and tuning
The skill level for these installation activities is high. The IBM Service Engineer is a resource that is critical to a properly installed cluster configuration (for example, an improperly installed cable can cause false HACMP takeovers). Required time for activity: 2 to 3 days

Appendix A. A real-life implementation


Backup Administration team


This team provides the vital service of integrating the HACMP solution into the backup configuration. In the case of this effort, a TSM client was installed into the configuration and the cluster is backed up nightly. This team is also responsible for providing assistance with disaster recovery testing, adding one more level of security to the complete environment. The skill level for any enterprise backup solution is high. Large backup environments require personnel who are trained and specialized in a very critical business activity. Required time for activity: 1 to 2 days

Observations and questions


In this section we offer our observations, together with questions and answers related to our implementation.

Observation 1
HACMP startup does not occur until both cluster nodes are running HACMP. After rebooting both nodes, we started the HACMP services on Node01 first and checked whether IBM Tivoli Workload Scheduler had started. But after 10 or 15 minutes, IBM Tivoli Workload Scheduler still had not started. After waiting for some time, we started the HACMP services on Node02. Shortly after Node02 started its HACMP services, we saw the IBM Tivoli Workload Scheduler application come up successfully on Node01. We have placed UNIX wall commands in the IBM Tivoli Workload Scheduler startup (and shutdown) scripts, so we will see exactly when these IBM Tivoli Workload Scheduler-related scripts are invoked.
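For illustration, a minimal sketch of such a start script (not the exact script used in our environment; the /opt/tws path and the maestro user follow the conventions used elsewhere in this book):

```shell
#!/bin/sh
# Sketch of an HACMP application-server start script for ITWS that
# announces itself, as described in Observation 1. Assumptions: the
# TWShome directory is /opt/tws and the ITWS user is maestro.
banner() { echo "HACMP: $1 on $(uname -n)"; }

if [ "${1:-}" = "start" ]; then
  banner "starting IBM Tivoli Workload Scheduler" | wall
  su - maestro -c "/opt/tws/StartUp"          # starts netman
  banner "ITWS start script finished" | wall
fi
```

The matching stop script can wrap the conman shutdown command with the same banners, so both transitions are visible on every logged-in terminal.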

Question
Our environment is a two-node cluster dedicated to running the IBM Tivoli Workload Scheduler process tree (the second node in the cluster sits idle). Therefore, wouldn't it make sense for us to start the HACMP IBM Tivoli Workload Scheduler Resource Group as soon as possible, regardless of which node comes up first?

Answer
Yes, and that is normal. Your cascading configuration, in terms of node priority, is set to start the Resource Group on Node01.


Question
If this is acceptable (and advisable), exactly how is the HACMP configuration modified to accomplish this goal?

Answer
The reason you are dependent on the second node starting is probably related to how your NFS is set up. You can leave the file system as a local mount and export it, but do not NFS-mount it back onto the same node.
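As a sketch of that arrangement (the host name node02 and the /opt/tws mount point are assumptions based on our environment):

```
# /etc/exports on the node that currently owns the shared volume group.
# /opt/tws remains an ordinary locally mounted JFS file system here and
# is exported to the other cluster node; it is NOT NFS-mounted back
# onto this same node.
/opt/tws -access=node02,root=node02
```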

Observation 2
During the startup of the HACMP cluster, the connection to Node01 is lost. What occurs during this procedure is that the IP address on the Ethernet adapter is migrated to the EMEAMDM service address (9.149.248.77). During this migration, the connection is broken and you must reconnect to the machine through the EMEAMDM address.

Question
Does the addition of a third IP address (IP aliasing) resolve this issue?

Answer
Yes. You set up what is called a node alias, and probably also change your topology configuration so that the boot and standby adapters are both defined as boot adapters. This implements IP address takeover via aliasing (which is also fast). However, a node alias by itself may not resolve the issue if it comes up on the boot adapter, which we believe is normal. So we think you would want to implement both a node alias and IPAT via aliasing.

Question
Would this third IP address require an additional Ethernet adapter?

Answer
No, it does not.

Question
Would this third IP address need to be in a different subnet from the other two addresses?

Answer
Yes, it would. Here is what to do: change your standby adapter to be of type boot, and change your existing boot adapter(s) to a different subnet than your service adapter subnet. This will give you a total of three subnets being used.


Then you can create a node alias, which can be on the same subnet as the service address; it is actually quite normal to do so. Figure A-5 shows a generic example of a topology configuration with IPAT via aliasing and a node alias, which is listed as persistent. This configuration requires a minimum of three subnets. The persistent and service addresses can be on the same subnet (which is normal) or on separate subnets. This is also true when using multiple service addresses. (This example shows mutual takeover, which means node B also fails over to node A, so Service 1b does not apply in your case, but it should give you the idea.)

Node A                                  Node B

  Boot 1a      IP 10.10.1.9               Boot 1b      IP 10.10.1.10
  Boot 2a      IP 10.10.2.9               Boot 2b      IP 10.10.2.10
  Persistent   IP 9.19.163.12             Persistent   IP 9.19.163.13
  Service 1a   IP 9.19.163.15             Service 1b   IP 9.19.163.25

Netmask 255.255.255.0

Figure A-5 IPAT via aliasing topology example

Observation 3
During the failover process from Node01 to Node02, the service address on Node02 (9.149.248.74) remains unchanged, while the standby adapter (EN2 9.149.248.114) is migrated to the EMEAMDM service address (9.149.248.77). (In contrast, when HACMP services are started, we do get disconnected from the primary adapter in Node01, which is what we expected.) In this configuration, when we telnet into the EN1 adapters (9.149.248.72 and 9.149.248.74) on both machines, we do not get disconnected from the machine during the failover process.

Question
Is this behavior expected (or desired)?


Answer
This is normal when doing traditional IPAT and one-sided takeover, because fallover of a service address will always move to the standby adapter, either locally for NIC failure, or remotely on system failure. If you implemented aliasing, you would not see any significant difference.

Question
Is this situation something we would like to see our Node01 do? (For example, have the secondary adapter (EN3) switch over to the EMEAMDM Service address, while EN2 (9.149.248.72) remains untouched and essentially acts as the backup Ethernet adapter.)

Answer
You could see the desired results if you implement aliasing.

Observation 4
Upon starting the HACMP Services on the nodes, we see content like that shown in Example A-4 in our smit logs.
Example A-4 smit logs

Oct 17 2003 20:56:39 Starting execution of /usr/es/sbin/cluster/etc/rc.cluster
with parameters: -boot -N -b
0513-029 The portmap Subsystem is already active. Multiple instances are not supported.
0513-029 The inetd Subsystem is already active. Multiple instances are not supported.
Oct 17 2003 20:56:51 Checking for srcmstr active... Oct 17 2003 20:56:51 complete.
23026 - 0:00 syslogd
Oct 17 2003 20:56:52 /usr/es/sbin/cluster/utilities/clstart : called with flags -sm -b
0513-059 The topsvcs Subsystem has been started. Subsystem PID is 20992.
0513-059 The grpsvcs Subsystem has been started. Subsystem PID is 17470.
0513-059 The grpglsm Subsystem has been started. Subsystem PID is 20824.
0513-059 The emsvcs Subsystem has been started. Subsystem PID is 19238.

Question
Are the 0513 statements shown in Example A-4 normal?

Answer
Yes, especially after starting the first time. These services are started by HA on Node01, and by reboot on Node02. When stopping HA, it does not stop these particular services, so it is fine.


Observation 5
When attempting to test failover on the cluster, never be logged in as the maestro user. Since this user's home file system resides in the shared volume group (twsvg, mounted at /opt/tws), we will most likely have problems with:
- The cluster actually failing over, because it will not be able to unmount the file system
- Possible corruption of a file, or of the file system

Observation 6
The failover of the HACMP cluster seems to work fine. We decided to benchmark the failover timings:
- Shutdown of HA services on Node1: Wed Oct 22 17:45:51 EDT 2003
- Startup of HA services on Node2: Wed Oct 22 17:47:37 EDT 2003
Result: a failover benchmark of approximately 106 seconds.
The test is performed as follows. Have a machine that is external to the cluster prepared to ping emeamdm (9.149.248.77). This machine is called doswald.pok.ibm.com (you will need two terminals open to this machine).
1. In the first terminal, type the UNIX date command (do not press Enter).
2. In the second terminal, type the UNIX command ping 9.149.248.77 (do not press Enter).
3. Have terminals open to both nodes in the cluster. (We had both nodes in the cluster running the HACMP services, with the IBM Tivoli Workload Scheduler Resource Group running on Node1.) On Node1, select smit hacmp -> Cluster Services -> Stop Cluster Services -> shutdown mode = takeover (press Enter only one time).
4. In the first terminal, from doswald, press Enter. This will give you the begin time of the cluster failover.
5. Very quickly go back to Node1, and press Enter. This will start the cluster failover.
6. In the second terminal, from doswald, press Enter. This will execute the ping command.
7. In the first terminal, from doswald, type the UNIX date command again (do not press Enter).
8. Wait for the ping command to resolve. Then press Enter for the final date command.


9. Subtract the first date command results from the second date command results.
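The two-terminal procedure above can also be approximated with a small script run from the external machine (a sketch; the ping timeout flag and date +%s are GNU-style and may need adjusting on AIX, and the address is the EMEAMDM service address used in this appendix):

```shell
#!/bin/sh
# Sketch: time how long a service address stays unreachable during an
# HACMP takeover. Run this from a host outside the cluster, then stop
# cluster services on Node1 with shutdown mode = takeover.
is_up() { ping -c 1 -w 1 "$1" >/dev/null 2>&1; }  # -w: per-ping timeout (GNU ping)
elapsed() { echo $(( $2 - $1 )); }

measure() {
  ip=${1:-9.149.248.77}                  # EMEAMDM service address
  while is_up "$ip"; do sleep 1; done    # wait for the takeover to begin
  down=$(date +%s)
  until is_up "$ip"; do sleep 1; done    # wait for the address to return
  up=$(date +%s)
  echo "$ip was unreachable for $(elapsed "$down" "$up") seconds"
}
# measure 9.149.248.77
```

As noted in the answer below, this measures only IP connectivity; application readiness takes longer.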

Question
Does 106 seconds sound like a reasonable duration of time?

Answer
It does sound reasonable. However, the overall time should be measured from the instant of failure until the application is up and running, with user connectivity, on the other machine. You seem to be testing only IP connectivity time. You should also test a hard failure, meaning halt the system.

Question
Would the addition of another IP address possibly improve this failover time of 106 seconds?

Answer
Only implementing IPAT via aliasing should improve this time (by perhaps a few seconds).

Question
Would the addition of another IP address require another physical Ethernet card?

Answer
No.



Appendix B.

TMR clustering for Tivoli Framework 3.7b on MSCS


In this appendix, we provide step-by-step instructions on how the Tivoli Management Framework 3.7b on Windows 2000 was configured in a high availability environment. We guide you through the steps needed to install and configure the TMR. In this environment, the Windows server is configured with Windows 2000 Advanced Server Edition SP and is running the Microsoft Cluster Manager.

Copyright IBM Corp. 2004. All rights reserved.


Setup
The setup shown in Table B-1 was used during Windows TMR installation. The cluster includes physical nodes SJC-TDB-01 and SJC-TDB-02, with a virtual node named tivoli-cluster. The shared resource that is configured to fail over is defined as drive D:.
Table B-1 Installation setup

Hostname         IP address       Description
SJC-TDB-01       10.254.47.191    Physical node
SJC-TDB-02       10.254.47.192    Physical node
tivoli-cluster   10.254.47.190    Virtual node

Configure the wlocalhost


Framework 3.7b for Windows does not read the /etc/wlocalhosts file or the wlocalhosts environment variable. Instead, with Framework 3.7b, there is a wlocalhost command that is used to configure the value of the wlocalhost. The command will create the localhost registry key in the HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform registry path. If you have installed Framework on another Windows machine, you can copy the $BINDIR/bin/wlocalhost binary from another machine and run it locally to set this value. The syntax we used to set the value of the wlocalhost was wlocalhost tivoli-cluster. If you are installing Framework for the first time on the Windows platform, you can manually create this value using regedit.
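For reference, a sketch of the equivalent .reg file (the localhost value name is taken from the description above; verify the exact value name on a machine where wlocalhost has already been run before importing anything):

```
REGEDIT4

[HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform]
"localhost"="tivoli-cluster"
```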

Install Framework on the primary node


After the wlocalhost is set, the next step is to install Framework on the primary node. This is done by using the same procedures that are provided in the 3.7 Installation guide; the only exception is that you will want to specify the installation directory to be the shared drive (in our case, it is D:\tivoli). Once Framework is installed, open a command prompt and run the odadmin odlist command to verify that the oserv is bound to the virtual IP and hostname defined by the wlocalhost command. Restart the primary node to register the tivoliap.dll.


Install Framework on the secondary node


Prior to installing Framework 3.7b on the secondary node, you will need to open the Cluster Manager and initiate a failover. Once the failover has occurred, you will need to delete the %DBDIR% directory and set the wlocalhost on the secondary node. If all went well during the installation on the primary node, you will be able to find the wlocalhost binary in the %BINDIR%/bin directory. After the %DBDIR% has been removed and the wlocalhost has been set, you can install Framework on the secondary node. The Framework installation should be identical to the installation on the primary node, with the installation directory being on the shared drive (D:\tivoli). After the installation, run the odadmin odlist command to verify that the oserv is bound to the virtual IP and hostname. Restart the secondary node, if it has not already been restarted.

Configure the TMR


Follow these steps, in the order specified, to configure the TMR.

Set the root administrators login


When installing Framework, a default high-level administrator is created, named Root_SJC-TDB-02. This administrator is, by default, bound to a login at the hostname where the TMR was installed. In order to log in to the Tivoli Desktop, you need to modify the login so the user will be able to log in at the virtual host. First, open a command prompt and run the following command to set up an alias that allows the root user to log in:
odadmin odlist add_hostname_alias 1 10.254.47.190 SJC-TDB-02

Once the alias has been set, log in to the desktop and set the login with the appropriate hostname. Then use the following command to remove the alias:
odadmin odlist del_hostname_alias 1 10.254.47.190 SJC-TDB-02

Force the oserv to bind to the virtual IP


In order for the oserv to work properly, you need to bind it to the virtual IP address. This can be done with the following command:
odadmin set_force_bind TRUE 1


Change the name of the DBDIR


When Framework is installed, it will still point to SJC-TDB-02.db for the DBDIR, regardless of whether or not the wlocalhost is set. To resolve this, manually rename the DBDIR from the SJC-TDB-02.db to the tivoli-cluster.db directory.

Modify the setup_env.cmd and setup_env.sh


Next, modify the c:\winnt\system32\drivers\etc\tivoli\setup_env.* files that are used to set up the environment variables. Since Framework on Windows installs the DBDIR using the <hostname>.db directory instead of in the <virtual hostname>.db directory, you need to open a text editor and modify the directory where the environment variables point by changing all references of SJC-TDB-02.db to tivoli-cluster.db. Once this is done, copy the modified setup_env.cmd and setup_env.sh to the c:\winnt\system32\drivers\etc\tivoli on both nodes.

Configure the registry


There are two places to modify in the Windows registry when Tivoli is installed. You can modify these locations by using the regedit command. The first place is under the HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\oserv94 path: modify the Service directory key and the Database directory key to point to the new D:\tivoli\db\tivoli-cluster.db directory, instead of to the SJC-TDB-02.db directory. The second place is where the oserv service looks for the oserv.exe executable; the location in the registry is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\oserv. You will only need to modify the path to the oserv.exe to d:\tivoli\db\tivoli-cluster.db. The modifications will have to be made on both the primary and secondary nodes.
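As a sketch, the same changes could be captured in a .reg file and imported on both nodes with regedit. The two directory value names are as given above; the service-path value name (commonly ImagePath for Windows services) is an assumption and should be confirmed in regedit first:

```
REGEDIT4

; Sketch only -- confirm key and value names in regedit before importing
[HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\oserv94]
"Service directory"="D:\\tivoli\\db\\tivoli-cluster.db"
"Database directory"="D:\\tivoli\\db\\tivoli-cluster.db"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\oserv]
"ImagePath"="D:\\tivoli\\db\\tivoli-cluster.db\\oserv.exe"
```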

Rename the Managed Node


The TMR's Managed Node, which was created during the installation of Tivoli, was named after the hostname instead of the virtual hostname. This is not necessarily a problem, since the oserv is bound to the virtual hostname and IP. To maintain consistency, however, in our case we opted to rename the ManagedNode to the name of the virtual hostname.


This was done with the following command from the Windows bash shell:
MN=`wlookup -r ManagedNode SJC-TDB-02`
idlcall $MN _set_label tivoli-cluster

If you perform this task, run the wlookup -ar tivoli-cluster command afterward to verify that the rename was successful.

Rename the TMR


The default name of the TMR when it was installed on Windows was still SJC-TDB-01-region instead of tivoli-cluster-region. This is not a problem, but to maintain consistency we renamed the TMR using the following command:
wtmrname <virtual hostname>-region

If you perform this task, verify the result of the command by running the wtmrname command and check that the output shows tivoli-cluster-region.

Rename the top-level policy region


When the Framework was installed, it created a top-level policy region called SJC-TDB-02-region. This is not a problem, but to maintain consistency we chose to rename the region. This can be done from the Tivoli Desktop by right-clicking the SJC-TDB-02-region icon on the root administrator's desktop and selecting Properties. Once the Properties dialog is open, you can change the name to tivoli-cluster-region, then click Set & Close to activate the changes. We chose to change the name of the top-level policy region from the command line by using the following command:
PR=`wlookup -r PolicyRegion SJC-TDB-02-region`
idlcall $PR _set_label tivoli-cluster-region

If you perform this task, run the following command to verify the change:
wlookup -r PolicyRegion tivoli-cluster-region

Rename the root administrator


The default Tivoli administrator that was created was named Root_SJC-TDB-02-region. This is not a problem, but for consistency we chose to change the name to Root_tivoli-cluster-region. This was done from the Tivoli Desktop by opening the administrators window, right-clicking the Root_SJC-TDB-02-region administrator, and selecting


Properties. Once the properties window was open, we modified the name to Root_tivoli-cluster-region. If you perform this task, then click Save & Close and the configuration is complete.

Configure the ALIDB


When Tivoli was installed, the ALIDB was set to SJC-TDB-02.db; this is an internal value that is hardcoded into the Tivoli object database. In order to change this value, we had to output the sequence list to a file, then modify the file, and re-import the sequence list. In order to get the sequence list, we ran the following command from a bash shell:
MN=`wlookup -r ManagedNode tivoli-cluster`
idlcall $MN _get_locations > c:/locations.txt

We opened the c:\locations.txt file with a text editor and changed all occurrences of SJC-TDB-02 to tivoli-cluster. When the editing was complete, we re-imported the sequence list using the following command:
idlcall $MN _set_locations < c:/locations.txt

If you perform this task, once the value is set you should be able to install software successfully.
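The export-edit-import cycle can also be scripted so that no manual editing is needed; the substitution itself is plain sed (a sketch, shown here as a function; the wlookup and idlcall calls are the same ones used above):

```shell
# relabel: rewrite the hardcoded physical hostname in the sequence list
relabel() { sed 's/SJC-TDB-02/tivoli-cluster/g'; }

# On the TMR (bash shell), the whole cycle then becomes:
#   MN=`wlookup -r ManagedNode tivoli-cluster`
#   idlcall $MN _get_locations | relabel > /tmp/locations.txt
#   idlcall $MN _set_locations < /tmp/locations.txt
```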

Create the cluster resources


We followed these steps to create the cluster resources.

Create the oserv cluster resource


In order for the oserv service to fail over, we created a resource in the cluster manager for both oserv services. We opened up the cluster manager first on the primary node, and then on the secondary node. We right-clicked the cluster group and selected new resource. We defined the oserv as a Generic Service and added the required information.

Create the trip cluster resource


The trip service is required for the oserv to process correctly, so we also had to create a resource for it in the cluster manager. We opened up the cluster manager on either the primary or secondary node, right-clicked the cluster group, and selected new resource. We defined trip as a generic service and added the required information.


Set up the resource dependencies


To set up the resource dependencies, right-click the oserv resource and set the virtual hostname, virtual IP, quorum disk, shared disk, and trip as dependencies. Without these dependencies, the oserv could possibly get into an infinite failover loop.

Validate and back up


Follow these steps to validate and back up your configuration.

Test failover
Open the cluster manager and initiate a failover. Verify that the oserv service starts on each node. If failover works, bring down the oserv on each node and verify that the cluster fails over successfully. If all of these validation tests, and the database backup that follows, succeed, you have successfully installed Framework 3.7b on a Windows cluster.

Back up the Tivoli databases


This is the most important part of the installation: if all the validation tests are positive, back up your Tivoli databases by running the wbkupdb command.



Abbreviations and acronyms


AFS  Andrew File System
AIX  Advanced Interactive Executive
APAR  authorized program analysis report
API  Application Program Interface
BDM  Backup Domain Manager
BMDM  Backup Master Domain Manager
CLI  command line interface
CMP  cluster multi-processing
CORBA  Common Object Request Broker Architecture
CPU  ITWS workstation
CWOF  cascading without fallback
DHCP  Dynamic Host Configuration Protocol
DM  Domain Manager
DNS  Domain Name System
ESS  IBM TotalStorage Enterprise Storage Server
FTA  Fault Tolerant Agent
FTP  File Transfer Protocol
HA  high availability
HACMP  High Availability Cluster Multi-Processing
HAGEO  High Availability Geographic Cluster system
HCL  Hardware Compatibility List
IBM  International Business Machines Corporation
IP  Internet Protocol
IPAT  IP Address Takeover
ITSO  International Technical Support Organization
ITWS  IBM Tivoli Workload Scheduler
JFS  Journaled File System
JSC  Job Scheduling Console
JSS  Job Scheduling Services
JVM  Java Virtual Machine
LCF  Lightweight Client Framework
LVM  Logical Volume Manager
MDM  Master Domain Manager
MIB  Management Information Base
MSCS  Microsoft Cluster Service
NFS  Network File System
NIC  Network Interface Card
ODM  Object Data Manager
PERL  Practical Extraction and Report Language
PID  process ID
PTF  program temporary fix
PV  physical volume
PVID  physical volume id
RAM  random access memory
RC  return code
SA  Standard Agent
SAF  System Authorization Facility
SAN  Storage Area Network
SMIT  System Management Interface Tool
SNMP  Simple Network Management Protocol
SSA  Serial Storage Architecture


SCSI  Small Computer System Interface
STLIST  standard list
TCP  Transmission Control Protocol
TMA  Tivoli Management Agent
TMF  Tivoli Management Framework
TMR  Tivoli Management Region
TRIP  Tivoli Remote Execution Service
X-agent  Extended Agent


Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this Redbook.

IBM Redbooks
For information on ordering these publications, see "How to get IBM Redbooks" on page 613. Note that some of the documents referenced here may be available in softcopy only.
High Availability Scenarios for Tivoli Software, SG24-2032
IBM Tivoli Workload Scheduler Version 8.2: New Features and Best Practices, SG24-6628

Other publications
These publications are also relevant as further information sources:
Tivoli Workload Scheduler Version 8.2, Error Message and Troubleshooting, SC32-1275
IBM Tivoli Workload Scheduler Version 8.2, Planning and Installation, SC32-1273
Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274
Tivoli Workload Scheduler Version 8.2, Plus Module User's Guide, SC32-1276
Tivoli Management Framework Maintenance and Troubleshooting Guide, GC32-0807
Tivoli Management Framework Reference Manual Version 4.1, SC32-0806
Tivoli Workload Scheduler for Applications User Guide, SC32-1278
Tivoli Workload Scheduler Release Notes, SC32-1277
IBM Tivoli Workload Scheduler Job Scheduling Console Release Notes, SC32-1258
Tivoli Enterprise Installation Guide Version 4.1, GC32-0804
HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861


High Availability Cluster Multi-Processing for AIX Master Glossary, Version 5.1, SC23-4867
HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864
High Availability Cluster Multi-Processing for AIX Programming Client Applications Version 5.1, SC23-4865
High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862
Tivoli Enterprise Installation Guide Version 4.1, GC32-0804
IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.2, SH19-4552
IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.3, SC32-1257
Tivoli Management Framework Reference Manual Version 4.1, SC32-0806

Online resources
These Web sites and URLs are also relevant as further information sources: FTP site for downloading Tivoli patches
ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_1.3/

HTTP site for downloading Tivoli patches


http://www3.software.ibm.com/ibmdl/pub/software/tivoli_support/patches_1.3/

Tivoli public Web site


http://www.ibm.com/software/tivoli/

IBM Fix Central Web site


http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

Microsoft Software Update Web site


http://windowsupdate.microsoft.com

IBM site for firmware and microcode download for storage devices


http://www.storage.ibm.com/hardsoft/products/ssa/index.html

IBM site for firmware and microcode download for pSeries servers


http://www-1.ibm.com/servers/eserver/support/pseries/fixes/hm.html

Microsoft Hardware Compatibility List Web site


http://www.microsoft.com/hcl


Microsoft Cluster Server white paper location


http://www.microsoft.com/ntserver/ProductInfo/Enterprise/clustering/ClustArchit.asp

IBM Web site that summarizes HACMP features


http://www-1.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html


RFC 952 document


http://www.ietf.org/rfc/rfc952.txt

RFC 1123 document


http://www.ietf.org/rfc/rfc1123.txt

Web page for more information on downloading and implementing ntp for time synchronization
http://www.ntp.org/

How to get IBM Redbooks


You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:
ibm.com/Redbooks



Index
Symbols
.jobmanrc 60
.profile files 318, 458
.rhosts file 332
.tivoli directory 463
/etc/filesystems 101-102
/etc/inittab 187
/etc/wlocalhosts 602
Cluster name 70
Fallover Strategy 71
Location of key application files 70
Node Relationship 70
Start Commands/Procedures 71
Stop Commands/Procedures 71
Verification Commands 71
ATM 21
authentication services 5
Autolink 387
Automatically Import Volume Groups 88
Automation 40
AutoStart install variable 465
Autotrace service 521
Availability analysis 343
available Connectors 331

Numerics
7133 Serial Disk System 26
8.2-TWS-FP02 210

A
Abend state 61
ABENDED 60
access method 50
active cluster server 559
active instance 307
Active/Active 44
Active/Passive 44
add IP alias to oserv 460
additional domains 2
advanced mode 506
AFS 313
AIX 33
AIX 5.2.0.0 114
AIX logical disk 434
AIX physical disk 434
ALIDB 606
Allow failback 382
Amount of downtime 344
Amount of uptime 344
Andrew File System See AFS
APAR IY45695 117
Application Availability Analysis tool 344
application healthiness 41
application monitoring 228, 487
Application Server Worksheet 70
Application Worksheet 70
  Application name 70

B
Backup Administrations Team 594
Backup Domain Manager 25, 572
Backup Master Domain Manager 57-58
backup processors 9
base Framework install 459
batchman 194, 203
Batchman Lives 367
Batchman=LIVES 194
BDM See Backup Domain Manager
best practice 408
big endian 58
bind 111
BMDM See Backup Master Domain Manager
boolean expression 60
boot 78
boot IP label 78
built-in web server 464
business requirements 4

C
C program 50
CA7 51
cascading 72, 257


cascading without fallback 260
Cascading Without Fallback Activated 88
cat test 579
certificate authority 503
cfgmgr 215
chdev 214
cl_RMupdate 229
Cleanup Method 487
clharvest_vg command 283
Client reconnection 22
Clinfo 21
Clinfo API 21
clRGinfo command 298
clsmuxpd 342
clsmuxpdES 293
clstrmgrES 293
cluster 7
cluster administration console 23
Cluster Administrator tool 156
Cluster Event Name 92
cluster event processing 91
Cluster Event Worksheet 92
cluster events 91
cluster group 138, 379
cluster hardware 468
cluster IP address 155
cluster manager 35, 269
cluster multi-processing 16
Cluster Service 23, 157
Cluster Service Configuration Wizard 156
Cluster Services 23
Cluster Services Group 368
cluster software 23, 38
cluster state 78
cluster status information 21
Clustering Technologies
  Basic elements 32
  Managing system component 35
  Typical configuration 33
Clustering technologies ix, 1, 8
  High availability versus fault tolerance 8
  loosely coupled machines 8
  MC/Service Guard 33
  Open Source Cluster Application Resources 33
  overview 8

  SCYLD Beowulf 33
  Sun Cluster 33
  terminology 7
  types of cluster configurations 12
  Veritas Cluster Service 33
clverify utility 257, 280
command 2, 270, 388
command line 191
communication networks 21
company name 191, 389
component failures 17
components file 48, 54-55
composer program 365
computing power 45
concurrent 257
concurrent access environments 20
concurrent jobs 368
Configuration management 344
Configure the registry 604
Configure the TMR 603
Configure the wlocalhost 602
Configuring a resource group 492
conman CL 61
Connector 328
Connector Framework resource 325
Connector instance 5, 405
Connector name 327
Connector objects 331
container 491
CONTENTS.LST file 402
cookbook approach 28
CPU 2
cpuname 387
crossover cable 138
current plan 61
current working directory 323
Custom application monitoring 484
custom monitor 228
customized event processing 269
customizing cluster events 91
CWOF See cascading without fallback

D
Data Center Manager 592
database 58
Database directory registry key 604
databases in sync 58


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

default cluster group 138 default Gateway 112 Default Node Priority 87 Dependencies 40 Destination Directory 389 df command 110, 324 disaster recovery 11 disk adapter 82 Disk fencing 496 Disk Fencing Activated 88 Disk Mirroring 18, 45 disk technologies 82 dispatcher number 461 distributed computing resources 45 DM See Domain Manager DNS configuration 4 DNS server 252, 274 domain 2, 24, 47–48 domain account 141 domain management responsibilities 59 Domain Manager 2, 4, 25, 48, 54, 192 domain name 50 Domain Name System See DNS domain user 354 domain workstations 58 downtime 9, 21 du command 324 dual Master Domain Manager configuration 346 dumpsec 198 duplicating system components 32

E
echo command 323 efficiency of the cluster 60 Electrical Engineers 593 Enable NetBIOS 175 Endpoint 479, 489, 500, 502–503 Endpoint manager login_interval 562 Enhanced Journaled File System 18 Enterprise management 343 environment variable 463 Error notification 18 ESS 26 Ethernet 21 Ethernet PCI Adapter 417 exchange Framework resources 337, 408
executable file 2 Extended Agent 49, 51 external disk 35 external disk device 20 external drive system 43

F
Failback 23 failed applications 22 failed disk drive 82 failed job 60 fallback 7, 23, 38 fallback policy 258, 492 fallover 7, 37–38, 229, 257, 294 fallover policy 492 Fallover Strategy 73 Fault tolerance 8 fault tolerant 57 Fault Tolerant Agent 2, 4, 25, 28, 49–50, 54, 192, 357 fence 249 Filesystem Recovery Method 88 Filesystems Consistency Check 88 FINAL 12 FINAL job stream definition 367 For Maestro 390 Forced HACMP stops 345 Framework 48 Framework 3.7b 601 Framework oserv IP alias 320 frequency of resource exchange 408 front-end application 21 fsck 88 FTA See Fault Tolerant Agent Full Status 58

G
Gateway architecture 472 generic service 377, 554 Geographic high availability 343 geographically dispersed Gateways 472 get_disk_vg_fs 269 globalopts 58 globalopts file 191 grid 45 Grid Computing 45 grid computing 45

Index


Group 23

H
HA See high availability HACMP 33, 67–71, 78, 82 HACMP 4.5 577 HACMP 5.1 Benefits 17 Implementing 67 Install base 122 Removing 134 Updating 126 HACMP Administrations Team 591 HACMP Cluster topology 586 HAGEO 26 halt command 298 hardware address 79 Hardware Compatibility List 139, 145 Hardware configurations 43 Hardware considerations Disk 43 Disk adapter 43 Disk controller 43 Network 42 Network adapter 42 Node 42 Power source 42 TCP/IP subsystem 43 hardware HA solution 58 Hardware Purchasing Agent 592 Hdisk 80 heartbeat 35 heartbeat mechanism 35 heartbeat packet 35 heartbeating 255–257 Heartbeating over disk 213 high availability ix, 2, 8, 16, 32 high availability design 27 High Availability Geographic Cluster system See HAGEO High availability terminology Backup 7 Cluster 7 Fallback 7 Fallover 7 Joining 7 Node 7

Primary 7 Reintegration 7 High-Availability Cluster Multiprocessing See HACMP highest-priority node 380 highly available object dispatcher 490–491 hostname 250–251 hosts files 557 hot standby 12, 33 hot standby node 66 Hot standby scenario 66 Hot standby systems 46 HP-UX 33 HP-UX operating systems 463 HVAC Engineers 593

I
IBM Fix Central web site 114 IBM LoadLeveler 59 IBM PCI Tokenring Adapter 417 IBM RS/6000 7025-F80 417 IBM service provider 464 IBM SSA 160 SerialRAID Adapter 417 IBM Tivoli Business Systems Manager 4 IBM Tivoli Configuration Manager 4, 345 IBM Tivoli Configuration Manager 4.2 210 IBM Tivoli Distributed Monitoring (Classic Edition) 4 IBM Tivoli Enterprise Console 4, 345 IBM Tivoli Enterprise Data Warehouse 4 IBM Tivoli Management Framework 45, 48, 66, 304, 318 IBM Tivoli NetView 4 IBM Tivoli ThinkDynamic Orchestrator 345 IBM Tivoli Workload Scheduler 5, 49, 54, 260, 318, 324 architectural overview 2 Backup Domain Manager 58 Backup Domain Manager feature 25 Backup Domain Manager feature versus high availability solutions 24 Backup Master Domain Manager 57 components file 48 Console 48 CPU 2 database 47 Domain Manager 2, 48 engine code 48 Extended Agent 49


Fault Tolerant Agent 25, 49 geographically separate nodes 26 hardware failures to plan for 26 highly available configuration 25 instance 72 job flow 61 Job recovery 60 Job Scheduling Console 48 managed groups 2 Master Domain Manager 2, 4, 47 Multiple instances 56 out of the box integration 4 pre 8.2 versions 56 relationship between major components 6 scheduling network 2 scheduling objects 2 Software availability 57 Switch manager command 59 switchmgr command 24 Two instances 54–56 when to implement high availability 24 workstation 2 IBM Tivoli Workload Scheduler high availability Advantages 26 HA solutions versus Backup Domain Manager 24 Hardware failures to plan for 26 in a nutshell 27 Possible failures 24 When to implement 24 IBM Tivoli Workload Scheduler Version 8.1 571 IBM Tivoli Workload Scheduler z/OS access method 51 IBM Tivoli Workload Scheduler/HACMP integration Add custom post-event HACMP script 242 Add custom start and stop HACMP scripts 234 Add IBM Tivoli Management Framework 303 Adding the FINAL jobstream 194 Applying fix pack 204 Checking the workstation definition 193 Configure application servers 223 Configure cascading without fallback 260, 264 Configure Framework access 330 Configure HACMP networks and heartbeat paths 254 Configure HACMP persistent node IP label/addresses 272 Configure HACMP resource groups 257 Configure HACMP service IP labels/addresses

221, 252 Configure HACMP to start on system restart 300 Configure heartbeating 213 Configure predefined communication interfaces 276 Configure pre-event and post-event commands 267 Configure pre-event and post-event processing 269 Configuring the engine 192 Create additional Connectors 328 Creating mount points on standby nodes 186 example .profile 191 implementation 184 implementation overview 184 Install base Framework 315 Installing the Connector 194 Installing the engine 191 Interconnect Framework servers 331 lessons learned 345 Live test of HACMP fallover 298 Log in using Job Scheduling Console 339 Modify /etc/hosts and name resolution order 250 one IBM Tivoli Workload Scheduler instance 345 Planning for IBM Tivoli Management Framework 303 Planning the installation sequence 312 Production considerations Configuration management 344 Dynamically creating and deleting Connectors 341 Enterprise management 343 forced HACMP stops 345 Geographic high availability 343 Measuring availability 343 Monitoring 342 Naming conventions 340 Notification 345 Provisioning 345 Security 342 Time synchronization 341 Preparing to install 188 Required skills 590 Setting the security 198 Start HACMP cluster services 287 Test HACMP resource group moves 294


Things to consider Creating mount points on standby nodes 186 Files installed on the local disk 187 IP address 187 Location of engine executables 186 Netman port 187 Starting and stopping instances 187 user account and group account 186 Verify fallover 301 Verify the configuration 280 IBM Tivoli Workload Scheduling Administrations Team 592 IBM TotalStorage Enterprise Storage Server See ESS IBM WebSphere Application Server 464 ifconfig 298 Inactive Takeover 88 index file 325 industry-standard hardware 18 initializing oserv 400 initiator file 216 installation code 580 installation password 399 installation roadmap 573 installation user 53–54, 56 Installation User Name 389 installation wizard 408 Installing additional languages 360 Autotrace service 505 Base Framework 315 Connector 194 Connector fix pack 204 Framework 3.7b 602 HACMP 92 highly available Endpoint 472 IBM Tivoli Management Framework Version 4.1 312 IBM Tivoli Workload Scheduler engine 191 IBM Tivoli Workload Scheduler Framework components 322 IBM Tivoli Workload Scheduler on MSCS 348 installation directory 355 Job Scheduling Connector 402 Job Scheduling Console 408 Job Scheduling Services 195, 401 Microsoft Cluster Service 141

multiple Tivoli Endpoints 555 Tivoli Framework components and patches 318 Tivoli Managed Node 536 TRIP 538 InstallShield 558 Instance Count 486 Instance Owner 195 instant messaging 310 Interconnecting Framework Servers 405 Inter-dispatcher encryption level 334 Interface Function 78 internal cluster communications 138 interregion encryption 334 interregional connections 399 Inter-site Management Policy 498 IP 78 IP address 155 IP Address Takeover 87 IP Alias 257 IP hostname lookup 455 IP label 78 IPAT 76 IPAT via IP Aliases 77 IPAT via IP Replacement 76

J
Jakarta Tomcat 464 Java interface 61 JES 51 JFS filesystem 437 jfs log volume 102 JFS logical volume 109 jfslog 84 Jnextday 194, 367 Jnextday job 58 job 2, 60 job abend 60 job definition 60 job execution 92 job management system 59 job progress information 51 job recovery 60 Job Scheduling Connector 48 Job Scheduling Console 5–6, 21, 49, 61, 320 Job Scheduling Services 5, 48 job status information 51 job turnaround time 60 jobs standard list file 50


jobman 203 jobmanrc 52 jobtable file 62 joining 7 JSC See Job Scheduling Console JSS See Job Scheduling Services

K
kill a job 61 kill a process 61 killed job 61

L
LAN 43 laptop 6 LCF 477 less busy server 24 License Key 400 license restrictions 40 licensing requirements 40 Lightweight Client Framework See LCF lightweight client framework 489 link verification test 422 Linux 33 Linux environment 322 little endian 58 Load balancing 59 Load balancing software 59 LoadLeveler administrator 59 LoadLeveler cluster 59 local configuration script 52 local disk 56 Local UNIX access method 52 local UNIX Extended Agent 52 local user 354 localhost registry key 602 localopts 58 logical storage 35 logical unit 23 logical volume 437 Logical Volume Manager 17, 83 logical volume name 99 logredo 88 Longest period of downtime 344 Longest period of uptime 344
loosely coupled machines 8 lsattr 214, 417 lspv 84 lsvg 298 LVD SCSI Disk Drive 417 LVM See Logical Volume Manager

M
MaestroDatabase 337, 407 MaestroEngine 337, 407 MaestroPlan 337, 407 mailman 203 Maintenance Level 02 115 major number 84, 436 makesec 199 Managed Node 195–196, 321, 462, 489, 536, 552 Managed Node software 545 ManagedNode resource 407 management hub 2 Management Policy 87 manual startup 368 -master 192 Master CPU name 389 Master Domain Manager 47, 54, 57–58, 192, 357 Masters CPU definition 193 MC/Service Guard 33 mcmagent 50 mcmoptions 50 MDM See Master Domain Manager Measuring availability 343 method 52 methods directory 50 Microsoft 33, 145 Microsoft Cluster Administrator utility 147 Microsoft Cluster Service cluster group 166 concepts 22 Failback 23 Group 23 Load balancing 24 Quorum disk 23 Resource 23 Shared nothing 22 default cluster group 138 hardware considerations 139 installation 141


network name 139 our environment 138 Planning for installation 139 Pre-installation setup Add nodes to the domain 141 Configure Network Interface Cards 139 Create a domain account for the cluster 141 quorum partition size 140 Setup Domain Name System 139 Setup the shared storage 140 Update the operating system 141 Primary services Availability 21 Scalability 21 Simplification 21 private NIC 138 public NIC 138 service 22 Microsoft Windows 2000 305 Mirroring SSA disks 82 modify cpu 193 monitor jobs 49 mount 109 MSCS 21–24 MSCS white paper 22 multi node cluster 44 multiple SSA adapters 441 mutual takeover 13, 346, 391 mutual takeover scenario 195

N
naming convention 340, 385, 524 netman 203 Netmask 77–78, 256 network adapter 77, 79, 155 Network File System See NFS network interface 290 Network Interface Card See NIC Network Name 77 Network Time Protocol 341 Network Type 77 Networking Administrations Team 590 new day processing 59 new logical volume 441 NFS 313 NFS exported filesystem 84–85
NIC 138–139 node 35 Node Name 77 node_down event 269 node_id 213 node_up event 269 node_up_local 269 node-bound connection 221 non-active node 559 non-concurrent access 20 non-concurrent access environments 20 non-TCP/IP subsystems 21 normal job run 60 Notification 345 notification services 345 Notify Method 487 NT filesystem 140 NTFS 140 NTFS file system 355

O
Object Data Manager See ODM object database 406 object dispatcher 399, 489 observations 594 odadmin 406 odadmin command 320 ODM 134, 268 ODM entry 213 odmget 214 Online Planning Worksheet 211 OPC 51 Open Source Cluster Application Resources 33 Opens file dependency 49 Oracle Applications 50 Oracle e-Business Suite 50 oserv 320 oserv service 399, 606 oserv.exe 604 oserv.rc 479 oslevel 114

P
parent process 61 Participating Node Names 492 Participating Nodes 87 Patching


Best practices 209 Connector 204 HACMP 5.1 117 IBM Fix Central web page 117 IBM Tivoli Workload Scheduler 204, 582 Job Scheduling Console 305 log file 210 operating system 141, 577 patch apply sequence for Framework 4.1 313 Tivoli Framework and components 318 Tivoli TMR software 583 twspatch script 204 PeopleSoft 50 PeopleSoft Client 51 PeopleSoft Extended Agent 50 PeopleSoft job 51 Percentage of uptime 344 persistent 78 Persistent IP label 78 physical disks 82 physical network 77 Planning applications for high availability 70 HA hardware considerations 41 HA software considerations Application behavior 39 Automation 40 Dependencies 40 Fallback policy 41 Licensing 40 Robustness 41 HACMP Cluster network 76 HACMP Cluster nodes 68 HACMP resource groups 87 HACMP shared disk device 81 HACMP shared LVM components 83 high availability design 418 IBM Tivoli Workload Scheduler in an HACMP Cluster 184 MSCS hardware 139 MSCS installation 139 shared disks for HACMP 421 point-to-point network 35 policy region 465, 539, 605 polling interval 228 port address 57 port number 56 post-event commands 92, 269 PowerPC 417

pre-event commands 269 Preferred owner 380 Prevent failback 382 primary IP hostname 335 primary node 72 private connection 138 Private Network Connection 154 private NIC 138 Process application monitoring 484 Process control 18 process ID 62 Process monitoring 228 Process Owner 486 process request table 51 Production considerations 340 production day 3 production file 194 production plan 204 program 2 promote the workstation 357 Provisioning 345 psagent 51 public NIC 138 PVID 84

Q
quiesce script 247, 269 quiescing the application server 245 quorum 149 Quorum Disk 23, 150

R
R/3 Application Server 50 R3batch 50 RAID 26 RAID array 26 rccondsucc 60 real life implementation 571 recovery procedure 41 Redbooks Web site 613 Contact us xi redundant disk adapters 82 redundant disk controllers 43 Redundant hardware 34 redundant network adapter 26 redundant network path 26 redundant physical networks 77 regedt32.exe 557


region password 399 registry key 534 Registry replication 569 reintegrated node 92 reintegration 7 remote filesystem 313, 399 remote R3 System 50 remote shell access 332 Remote UNIX access method 52 Remote UNIX Extended Agent 52 replicate registry keys 528 Required skills 590 Resolve Dependencies 58 Resolvedep 387 resource 23, 35 resource group 35–36, 87 resource group fallover 87 Resource Group Name 87 Resource group policy Cascading 87 Concurrent 87 Custom 87 Rotating 87 Resource Group Worksheet 87 Automatically Import Volume Groups 88 Cascading Without Fallback Activated 88 Cluster Name 87 Disk Fencing Activated 88 File systems Mounted before IP Configured 88 Filesystem Recovery Method 88 Filesystems 87 Filesystems Consistency Check 88 Inactive Takeover 88 Management Policy 87 Participating Nodes 87 Resource Group Name 87 Service IP Label 87 Volume Groups 87 response file 408 Restart Count 486 Restart Interval 487 Restart Method 487 restoration of service 18 return code 60 RFC 1123 252 RFC 952 252 Robustness 41 root user 195 rotating 257

RS-232C 35, 213

S
Samba 313 Sample last.cfg file 561 SAN 13 SAN network 43 SAP Extended Agent 50 SAP instance 50 SAP R/3 50 SchedulerDatabase 337, 407 SchedulerEngine 337, 407 SchedulerPlan 337, 407 scheduling network 2 scheduling objects 65, 204 SCSI 82, 417 SCSI drives 140 Secure Sockets Layer 503 Security 342 security file 198 Serial 213 Serial Storage Architecture See SSA SERVER column 406 server failure 22 Server versus job availability 10 service 78 Service directory registry key 604 Service Engineers 593 Service IP Label 78, 87, 320 Service Pack 4 138 Servlet 2.2 specifications 464 set_force_bind 462 set_force_bind variable 322 setup_env.cmd 604 setup_env.sh 604 shared disk volume 348 shared LVM access 453 shared memory segments 521 Shared nothing 22 shared nothing clustering architecture 22 shared resource 13 Shared Volume Group/Filesystem Worksheet 84 Filesystem Mount Point 85 Log Logical Volume name 84 Logical Volume Name 84 Major Number 84 Node Names 84


Number of Copies of Logical Partition 84 Physical Volume 84 Shared Volume Group Name 84 Size 85 single point of failure 43 single points of failure 82 Small Computer System Interfaces See SCSI SMIT 343 SMTP e-mail 400 SMUX 342 Software configurations 46 Software considerations Application behavior 39 Automation 40 Dependencies 40 Fallback policy 41 Licensing 40 Robustness 41 software heartbeat 22 Solaris 33 Solaris operating systems 463 spider HTTP service 464 SSA 82, 93 See also Serial Storage Architecture SSA connection address 424, 426 SSA disk subsystem 82 SSA Disk system 43 SSA disk tray 345 SSA links 421 Stabilization Interval 486, 488 Standard Agent 357 standard list file 51 start and stop scripts 584 start-of-day processing 58 startup policy 492 stateful connection 22 stateless connection 22 static IP address 139 Stop Commands/Procedures 74 Storage Area Network See SAN stty test 578 subevent 269 subnet 77, 155, 175, 596 subnet mask 257 SUCCES 60 successful job 60 Sun Cluster 33

supported HA configuration for a Tivoli server 416 supported platforms 408 switch manager command 59 switchmgr 58–59 Symphony file 58–59 synchronize the configuration 280 system crash 61

T
tar file 204 target 213 target file 216 target mode interface 215 Target Mode SCSI 35, 213 Target Mode SSA 35, 213 TCP port number 389 TCP/IP Network Interface Worksheet 78 Interface Function 78 IP Address 78 Netmask 79 Network Interface 78 Network Name 78 Node Name 78 TCP/IP Networks Worksheet 77–78 Cluster Name 77 IP Address Offset for Heart beating over IP Aliases 77 IPAT via IP Aliases 77 Netmask 77 Network Name 77 Network Type 77 TCP/IP subsystem 21 TCPaddr 387 TCPIP 51 -thiscpu 192 Threshold 381 Time synchronization 341 Tivoli administrator 330 Tivoli database 401 Tivoli Desktop 204, 401 Tivoli Desktop applications 21 Tivoli Desktop users 419 Tivoli Endpoint 555, 584 Tivoli Enterprise environment 462 Tivoli Enterprise products 503 Tivoli environment variable 318


Tivoli Framework 3.7.1 408 Tivoli Framework Administrations Team 591 Tivoli Framework/HACMP integration Analyze assessments 432 Configure HACMP 480 Configure the application monitoring 484 Configure the logical volume 441 Create a logical volume and a JFS filesystem 437 Create shared disk volume 420 Export the volume group 444 Implementing 416 Install Tivoli Framework 453 Plan for high availability 453 Production considerations 502 Re-import the volume group 446 Security 503 Tivoli Endpoints 466 Tivoli Enterprise products 503 Tivoli Managed Node 464 Tivoli Web interfaces 464 Verify the volume group sharing 450 Tivoli Job Scheduling administration user 397 Tivoli Job Scheduling Services 1.3 408 Tivoli Management Region 65 See TMR Tivoli Management Region server 66 Tivoli Netman 368 Tivoli region ID 335 Tivoli Remote Access Account 399, 510 Tivoli Remote Execution Service See TRIP Tivoli Software Installation Service 399 Tivoli TMR software 583 Tivoli Token Service 368 Tivoli Web interfaces 464 Tivoli Workload Scheduler 368 Tivoli_Admin_Privileges group 507 tivoliap.dll 520, 602 TivoliAP.dll file 400 tmersrvd account 507 TMF_JSS.IND 195 TMR 65–66 TMR interconnection 311 TMR server 65 TMR versus Managed Node installation 583 Token-Ring 21 top-level policy region 605 TRIP 540

TRIP resource 540 TRIP service 528 TTY Device 577 two node cluster 43 two-way interconnected TMR 337 two-way interconnection 335, 406 TWS_CONN.IND 325 TWShome directory 326 twspatch script 204 Types of hardware clusters Disk Mirroring 45 Grid Computing 45 Multi node cluster 44 Two node cluster 43

U
UNIX cluster 50–51 unixlocl 52 unixrsh 52 upgrade AIX 114

V
varyoffvg 110 varyonvg 110 Verification Commands 71, 74 Verify Endpoint fallover 502 Verify Managed Node fallover 501 Veritas Cluster Service 33 virtual IP 155 virtual IP label 76 virtual server 24, 508 volume group 83, 102 volume group major number 435

W
wbkupdb 406 wclient 465 wconnect 333–334, 406 wcrtgate 471 wgateway 471 wgetadmin command 331 Windows 2000 Advanced Edition 138 Windows 2000 Advanced Server 141 Windows Components Wizard 159 Windows NT/2000 Server Enterprise Edition 21 Windows registry 604 winstall command 325


wkbkupdb 323 wlocalhost binary 603 wlocalhost command 602 wlookup 333, 406 wlookup command 328 wlsconn 335, 406 wmaeutil 407 workstation 2 workstation definition 387 workstation limit 368 workstation name 50 wrapper 52 wrapper script 60 wserver 465 wserver command 334 wsetadmin command 330 wtmrname 605 wtwsconn.sh 327 wupdate 407

X
x-agent 49

Y
Y-cable 138

Z
z/OS access method 51 z/OS gateway 51


Back cover

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework
Implementing high availability for ITWS and Tivoli Framework
Windows 2000 Cluster Service and HACMP scenarios
Best practices and tips
In this IBM Redbook, we show you how to design and create highly available IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework (TMR server, Managed Nodes and Endpoints) environments. We present High Availability Cluster Multiprocessing (HACMP) for AIX and Microsoft Windows Cluster Service (MSCS) case studies.

The implementation of IBM Tivoli Workload Scheduler within a high availability environment will vary from platform to platform and from customer to customer, based on the needs of the installation. Here, we cover the most common scenarios and share practical implementation tips.

We also give recommendations for other high availability platforms; although there are many different clustering technologies in the market today, they are similar enough to allow us to give useful advice regarding the implementation of a highly available scheduling system.

Finally, although we basically address highly available scheduling systems, we also offer a section for customers who want to implement a highly available IBM Tivoli Management Framework environment, but who are not currently using IBM Tivoli Workload Scheduler.

This publication is intended to be used as a major reference for designing and creating highly available IBM Tivoli Workload Scheduler and Tivoli Framework environments.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


SG24-6632-00 ISBN 0738498874
