© 2009 IBM Corporation
Overview
The purpose of this presentation is to demonstrate how to find the cause of poor performance for an
IBM Integration Bus node (broker) for two different types of problem.
The examples were captured on a Windows system, but the principles of investigation and problem
determination apply equally on all platforms. Only the system-level tools differ.
Introduction
Tools
Techniques
Demonstration
Root.Body.Level1.Level2.Level3.Description.Line[1];
Set OutputRoot = InputRoot;
Bipservice
– Lightweight and resilient process that starts and monitors the bipbroker process
– If the bipbroker process fails, bipservice will restart it
Bipbroker
– A more substantial process. Contains the deployment manager and administrative agent. All commands, toolkit connections and WebUI go through this process.
– Responsible for starting and monitoring the biphttplistener, bipMQTT and DataFlowEngine processes.
– If any of these processes fails, bipbroker will restart it.
[Diagram: an Integration Node containing Integration Servers 1 to n; each Integration Server holds Applications, Message flows and Libraries]
Understand typical resource utilisation – you need to know whether current resource utilisation is higher than
expected or normal
In busy times expect to use what is needed (!)
– Exactly what will depend on the configuration and the applications
– Typical to use CPU and memory plus I/O to some level
In quiet times Message Broker and MQ processes
– Should use very little CPU
– Should use very little I/O capacity
– Will retain memory
Some memory sizes whilst running the Coordinated Request Reply sample
– Bipservice 3.7 MB
– Bipbroker 112 MB
– Biphttplistener 35 MB
– DataFlowEngine 154 MB
• Can use from ~100 MB to gigabytes, depending on the number of flows, the complexity of the
message flows and the size of the messages
MQ processes
– Expect it to be less than IBM Integration Bus (76 MB for a simple queue manager)
– Will depend on number of open queues, channels, queue buffer sizes etc.
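One way to make "higher than expected" concrete is to record a baseline per process and compare later samples against it. The sketch below is a minimal illustration in Python. The baseline values are the sample figures listed above; the 50% tolerance, the function name `flag_memory_deviations` and the observed values are assumptions made for the example, not IBM guidance.

```python
# Baseline memory in MB, taken from the sample figures above
# (Coordinated Request Reply sample). Replace with your own measurements.
BASELINE_MB = {
    "bipservice": 3.7,
    "bipbroker": 112.0,
    "biphttplistener": 35.0,
    "DataFlowEngine": 154.0,
}


def flag_memory_deviations(observed_mb, baseline_mb=BASELINE_MB, tolerance=0.5):
    """Return {process: observed/baseline} for processes above baseline * (1 + tolerance).

    tolerance=0.5 (50% above baseline) is an arbitrary illustrative threshold.
    Processes without a baseline entry are ignored.
    """
    flagged = {}
    for name, observed in observed_mb.items():
        baseline = baseline_mb.get(name)
        if baseline is None:
            continue
        ratio = observed / baseline
        if ratio > 1.0 + tolerance:
            flagged[name] = round(ratio, 2)
    return flagged


if __name__ == "__main__":
    # Hypothetical observation: the DataFlowEngine has grown, the others have not.
    sample = {"bipbroker": 118.0, "DataFlowEngine": 410.0, "biphttplistener": 36.0}
    print(flag_memory_deviations(sample))
```

A flagged process is only a prompt to look closer: a DataFlowEngine legitimately grows with the number of flows and the message sizes it handles.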
Monitoring tools
– At the operating system level to observe
• System resource usage – CPU, memory, I/O activity
• Heaviest resource users
Driving tools
– Needed to generate a continuous workload
• Important to assess performance after warm-up during sustained activity
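The point about warm-up can be illustrated with a small calculation: the rate reported for a run should exclude the initial intervals. The following Python sketch assumes one message count per one-second interval; the function name `sustained_rate` and the sample numbers are hypothetical.

```python
def sustained_rate(completed_per_second, warmup_seconds):
    """Average messages/second after discarding the first warmup_seconds samples.

    completed_per_second: list with one completed-message count per
    one-second interval, in time order.
    """
    steady = completed_per_second[warmup_seconds:]
    if not steady:
        raise ValueError("no samples left after the warm-up period")
    return sum(steady) / len(steady)


if __name__ == "__main__":
    # Hypothetical run: the first three seconds are slower while the flow warms up.
    samples = [2, 5, 9, 10, 10, 11, 9, 10]
    print("whole run:", sum(samples) / len(samples))   # includes warm-up
    print("sustained:", sustained_rate(samples, 3))    # warm-up discarded
```

Including the warm-up intervals understates the steady-state rate, which is why the load generator needs to run long enough to leave a clear sustained period.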
filemon
DataFlowEngine.exe
– This is the Integration Server
amqzlaa0.exe
– This is the MQ agent for LOCAL connections (including the broker)
amqrmppa.exe
– This is the MQ agent for CLIENT connections
Integration Bus
– User trace
– Trace nodes
– Activity Log
– WebUI
• Accounting & Statistics: Compare flow statistics at the node (broker), server (execution group),
container (application or library) or at an individual message flow level
• Resource Statistics: View resource use at the execution group level
MQ Explorer
Java Health Center
[Figure: statistics reporting levels. Surviving labels: Resource Statistics, Node (broker), Message Flow, Message Model, Node, Terminals]
$SYS/Broker/brokerName/StatisticsAccounting/recordType/executionGroupLabel/messageFlowLabel
Regular reporting
• Data published approximately every 20 seconds
$SYS/Broker/brokerName/ResourceStatistics/executionGroupLabel
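The two topic templates above can be filled in programmatically when setting up a subscription. The Python sketch below only builds the topic strings and does not connect to anything. The helper names, the `+` single-level wildcard default, the broker name `IB9NODE`, the server name `default` and the record type value `SnapShot` are assumptions for illustration; check the values that apply to your installation.

```python
def accounting_topic(broker="+", record_type="+", execution_group="+", message_flow="+"):
    """Build an accounting and statistics topic from the template
    $SYS/Broker/brokerName/StatisticsAccounting/recordType/executionGroupLabel/messageFlowLabel.

    Any part left at its default is replaced by '+', assumed here to be the
    single-level wildcard understood by the subscribing client.
    """
    return "/".join(
        ["$SYS", "Broker", broker, "StatisticsAccounting",
         record_type, execution_group, message_flow]
    )


def resource_topic(broker="+", execution_group="+"):
    """Build a resource statistics topic from the template
    $SYS/Broker/brokerName/ResourceStatistics/executionGroupLabel."""
    return "/".join(["$SYS", "Broker", broker, "ResourceStatistics", execution_group])


if __name__ == "__main__":
    # Hypothetical names: node IB9NODE, integration server 'default'.
    print(accounting_topic("IB9NODE", "SnapShot", "default", "BackendReplyApp"))
    print(resource_topic("IB9NODE", "default"))
    print(accounting_topic())  # every level wildcarded
```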
– BackendReplyApp
• Sets the completion time in the message
• Writes a reply message
– Reply
• Reads the message from the back end message flow
• Retrieves the original message saved by the request message flow
• Writes an output message
Queues used by each flow (input → output):
– Request: CSIM_SERVER_IN_Q → GET_BACKEND_REQ, GET_REPTO_STORE
– BackendReplyApp: GET_BACKEND_REQ → GET_BACKEND_REP
– Reply: GET_BACKEND_REP (plus GET_REPTO_STORE, read mid-flow) → CSIM_COMMON_REPLY_Q
Steps
1. Ensure all components are started and the applications work as expected
- Message flows, databases, external applications etc.
2. Start a load generator [JMSPerfharness in this case]
3. Look at activity
- Is processing happening at the expected rate?
- Is CPU usage as expected?
- Is memory usage as expected?
4. If things do not seem as expected
- Look for build up of messages
- Poor service times
5. Enable and view statistics
6. Analyse statistics
7. Examine message flows
Run JMSPerfharness
– Using 10 threads
– GET_REPTO_STORE is used mid-flow (so flows using this are less likely to be the problem)
– GET_BACKEND_REQ is the input queue for the BackendReplyApp
• Indicates flow is not running fast enough or not enough instances allocated
Need to investigate what is happening with BackendReplyApp
– For this use WebUI flow statistics
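The reasoning above (find the queue where messages are building up, then look at the flow that reads it) can be sketched as a comparison of queue depth samples over time. The Python example below is illustrative only: the depth values are invented, the function name `growing_queues` is hypothetical, and in practice the depths would come from MQ Explorer or another MQ monitoring tool.

```python
def growing_queues(depth_samples, min_growth=1):
    """Return [(queue, growth)] for queues whose depth rose between the
    first and last sample by at least min_growth, largest growth first.

    depth_samples: {queue_name: [depth at t0, depth at t1, ...]}
    """
    growth = {
        queue: samples[-1] - samples[0]
        for queue, samples in depth_samples.items()
        if len(samples) >= 2
    }
    candidates = [(queue, g) for queue, g in growth.items() if g >= min_growth]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented depth samples, taken at regular intervals during the load test.
    samples = {
        "CSIM_SERVER_IN_Q": [0, 1, 0, 0],
        "GET_BACKEND_REQ": [5, 40, 90, 140],
        "GET_REPTO_STORE": [4, 38, 88, 136],
        "GET_BACKEND_REP": [0, 0, 1, 0],
    }
    for queue, grew_by in growing_queues(samples):
        print(queue, "grew by", grew_by)
```

With these invented numbers both GET_BACKEND_REQ and GET_REPTO_STORE grow; as noted above, GET_REPTO_STORE is read mid-flow, so the flow reading GET_BACKEND_REQ is the first one to examine.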
14 July 2015 © 2015 IBM
Corporation
Step 5 – Enable flow statistics
A 1-second sleep in the Compute node within the message flow is causing slow processing times and no
CPU usage
– Matches the observations at the start
• Low CPU and low message rate
Real problems are unlikely to be this easy to find, but slow service times, such as slow synchronous web
service invocations, would have the same effect
If the cause were slow web service response times, allocate additional instances to improve the
processing rate
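The effect of additional instances on a flow that spends its time waiting can be estimated with simple arithmetic: each instance completes at most one message per service time, so the upper bound on the rate is instances divided by service time. The Python sketch below assumes the flow is wait-bound (not CPU-bound), that there is no other bottleneck, and that one instance exists by default with any others counted as additional; the function names are hypothetical.

```python
import math


def max_rate(instances, service_time_s):
    """Upper bound on messages/second for a wait-bound flow.

    Each instance can complete at most one message per service_time_s.
    """
    return instances / service_time_s


def additional_instances_needed(target_rate, service_time_s):
    """Additional instances (beyond the first) needed to reach target_rate,
    under the same wait-bound assumption."""
    total = math.ceil(target_rate * service_time_s)
    return max(total - 1, 0)


if __name__ == "__main__":
    # With the 1-second sleep from the example, a single instance is capped at 1 msg/s.
    print(max_rate(1, 1.0))
    # Extra instances needed to reach 10 msg/s with a 1-second service time.
    print(additional_instances_needed(10, 1.0))
```

This is an estimate of the ceiling only. If the time is being spent on CPU rather than waiting, adding instances will not help once the processors are saturated.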
JAVA_COMPUTE_IN JAVA_COMPUTE_OUT
All of the elapsed and CPU time is in the JavaCompute message flow, so continue investigation here
Environment variable:
IBM_JAVA_OPTIONS=-Xhealthcenter
Opens ports starting at 1972; the Integration Server
running the JavaComputeTransform scenario is
using port 1974
-----------------------------------
[biphttplistener.exe]
[DataFlowEngine.exe]
[bipbroker.exe]
Wide range of tools available covering operating system and component performance
– Expect to use multiple tools
– After all it is important to understand what is happening at different levels
– Demonstration has shown how to use the key tools for MQ and IIB to debug a problem
Practice beforehand
– Being familiar with the tools is a great help in a crisis
– Learning a new tool and solving a crisis is not a good combination
Know your applications and systems
– What is normal in terms of processing rate, CPU usage etc.
– This information lets you know whether there is a problem and to what extent
WebSphere Message Broker: Message display, test & performance utilities (IH03)
– http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg24000637
IBM Monitoring and Diagnostic Tools for Java – Getting started with Health Center
– http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/getting_started.html
MQ processes
Additional Instances usage and tuning
Task      Function
AMQALMPX  The checkpoint processor that periodically takes journal checkpoints.
AMQZMUC0  Utility manager. This job executes critical queue manager utilities, for example the journal chain manager.
AMQZXMA0  The execution controller that is the first job started by the queue manager. It handles MQCONN requests, and starts agent processes to process WebSphere MQ API calls.
AMQZFUMA  Object authority manager (OAM).
AMQZLAA0  Queue manager agents that perform most of the work for applications that connect to the queue manager using MQCNO_STANDARD_BINDING.
AMQZLAS0  Queue manager agent.
AMQZMUF0  Utility manager.
AMQZMGR0  Process controller. This job is used to start up and manage listeners and services.
AMQZMUR0  Utility manager. This job executes critical queue manager utilities, for example the journal chain manager.
AMQZDMAA  Deferred message processor.
AMQFQPUB  Publish/subscribe process.
AMQFCXBA  Broker worker job.
RUNMQBRK  Broker control job.
AMQRMPPA  Channel process pooling job.
AMQCRSTA  TCP/IP-invoked channel responder.
AMQCRS6B  LU62 receiver channel and client connection.
AMQRRMFA  Repository manager for clusters.
AMQPCSEA  PCF command processor that handles PCF and remote administration requests.
RUNMQTRM  Trigger monitor.
Integration Server level data contains the following data for each message flow in it:
– MessageFlowName
– TotalElapsedTime
– MaximumElapsedTime
– MinimumElapsedTime
– TotalCPUTime
– MaximumCPUTime
– MinimumCPUTime
– CPUTimeWaitingForInputMessage
– ElapsedTimeWaitingForInputMessage
– TotalInputMessages
– TotalSizeOfInputMessages
– MaximumSizeOfInputMessages
– MinimumSizeOfInputMessages
– NumberOfThreadsInPool
– TimesMaximumNumberOfThreadsReached
– TotalNumberOfMQErrors
– TotalNumberOfMessagesWithErrors
– TotalNumberOfErrorsProcessingMessages
– TotalNumberOfCommits
– TotalNumberOfBackouts
– TotalNumberOfTimeOutsWaitingForRepliesToAggregateMessages
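Several of these fields are most useful when combined. As a minimal sketch, the Python function below derives the average elapsed and CPU time per message from a statistics record and the fraction of elapsed time spent on CPU. It assumes the record has already been parsed into a dictionary keyed by the field names above and that the time fields are in microseconds; the function name and the sample values are hypothetical.

```python
def per_message_averages(record):
    """Derive per-message averages from a message flow statistics record.

    record: dict holding TotalInputMessages, TotalElapsedTime and TotalCPUTime
    (times assumed to be in microseconds). Returns None if no messages were
    processed in the interval.
    """
    messages = record["TotalInputMessages"]
    if messages == 0:
        return None
    avg_elapsed = record["TotalElapsedTime"] / messages
    avg_cpu = record["TotalCPUTime"] / messages
    return {
        "avg_elapsed": avg_elapsed,
        "avg_cpu": avg_cpu,
        # Close to 0: the flow is mostly waiting. Close to 1: the flow is CPU-bound.
        "cpu_fraction": avg_cpu / avg_elapsed if avg_elapsed else 0.0,
    }


if __name__ == "__main__":
    # Invented record resembling a flow that sleeps for about 1 second per message.
    record = {
        "MessageFlowName": "BackendReplyApp",
        "TotalInputMessages": 200,
        "TotalElapsedTime": 200_400_000,
        "TotalCPUTime": 400_000,
    }
    print(per_message_averages(record))
```

A very low CPU fraction matches the first scenario (long elapsed time, little CPU), whereas a fraction near 1 points to CPU-intensive processing inside the flow, as in the JavaCompute scenario.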