
Analytic Modeling Techniques for

Predicting Batch Window Elapsed Time

Debbie Sheetz, BMC Software

Techniques for capacity planning/modeling were first developed in order to predict system responsiveness for interactive work. But many companies have critical work which runs periodically (daily, monthly), non-interactively, and with an elapsed time requirement (“batch window”). Elapsed time prediction utilizes different modeling and analysis techniques than interactive workload analysis does. These techniques are outlined and applied in four case studies (UNIX/Windows systems): (1) one job, (2) multiple simultaneous jobs, (3) a sequence of smaller jobs, (4) hundreds of sequences of smaller jobs.

1. INTRODUCTION
The original motivation for developing capacity planning tools was the advent of “interactive” computing, where the customer was a person, and they cared how long it took for their “transaction” to process. Prior to that, most computing consisted of various varieties of batch processing and the customers were internal, so not much was done in the way of capacity planning. So the first analytic models had three types of workloads available: Interactive Time Sharing (TS), Interactive Transaction Processing (TP), and Batch Processing (BP). Usually analysis and modeling was focused on the peak of the interactive workload(s), and batch was just a low-priority stream of work that utilized the resources not required for the interactive processing. The overnight period was then used exclusively for “production” batch and the “interactive” day would begin again the next morning. All the while, there continued to be particular batch work whose completion was critical to the business and whose precise time of completion was known. Examples of this type of work would be reconciliation of financial accounts, database updates, or billing cycles.

The key similarity between interactive and critical batch work is that there is a very specific performance expectation, e.g. “under .1 second response time” or “finish by 8 AM”. The key difference between interactive and batch work is that interactive work is typically evaluated one transaction at a time where critical batch is typically a very large number of transactions which are submitted for processing all at once, and the completion of the last request is the only timing that is of interest.

It is important to note that examples of critical batch exist on all platforms – it is the oldest use of computing systems and continues to this day. On a PC you have overnight backups and virus scans, and on large mainframes and distributed systems you have overnight database updates and multi-day sequences to produce bills. With all critical batch, there is always an explicit scheduling and frequency; for example, you might schedule a backup to be performed daily, at midnight. Billing cycles are usually executed monthly. The scheduling is frequently also explicit about what state the server will be in, i.e. not running any other work, for two reasons:

(1) There may be a requirement that everything else be “finished”. For example, open files cannot be backed up or nightly account reconciliation cannot occur if customers are still entering additional transactions.
(2) Having all computing resources available for a compute-intensive activity always favors the best possible performance!

For the largest examples of critical batch work, one or more large servers may be required to perform the necessary processing, and these servers are likely dedicated to this processing. The timing is usually dictated by business requirements rather than by server availability. For smaller examples, such as a PC backup or virus scan, you intentionally schedule these activities outside of your “interactive” use of the PC, e.g. 3 AM, expecting the work to be completed before you need to use the PC interactively again.

It is also common to have explicit expectations about when the batch activity can start as well as the time by which it must be complete; for example, account reconciliation at a financial institution cannot start until the “interactive” day has completed, and the reconciliation must be completed before the next “interactive” day begins. And this is exactly why capacity planning analysis is almost always required: the amount of work to be completed and the time available for completion often conflict with each other.

This is where our work begins.

2. METHODS

So how can the performance demands of batch be represented? How does this differ from characterizing interactive work?

2.1 Study Time Period Selection

One of the most striking differences between batch and interactive work is the length of time selected for analysis.

Most commonly interactive work peaks at one or two times during the day, so likely selections range from a one hour peak to a five minute peak. The amount of time selected depends only on how SLAs are defined: if the business will spend money to guarantee performance as measured for an hour, then a one hour peak should be selected; if the business will spend money to guarantee performance for the highest activity for five minutes, then a five minute peak should be selected. Peaks may vary by day of week, time of month, or time of year, and that needs to be taken into account when selecting which day(s) to select the peak periods from.

Batch work runs until it is done, e.g. 1 hour, 8 hours, 3 days, so the selected time period needs to match the specific characteristics of the batch work being studied. Depending on the frequency of the batch work, e.g. daily, weekly, monthly, particular day(s) will be chosen for study. More discussion of how to make these selections will be presented in the case studies, particularly how to deconstruct what appears as one batch sequence into component sections as required.

What is completely similar for all performance analysis is that historical data is of the utmost importance as it enables analysis of

(1) performance results trends, e.g. that the batch run time is lengthening
(2) workload trends, e.g. that the amount of workload is increasing

Objectively it doesn’t matter what the trend is or isn’t – what matters is that the analyst can confirm what has been stated as the current performance results and changes (if any) in the workload.

Also historical measurement data allows the selection of multiple “baseline” study periods – no performance analysis should be attempted using only one selected time period, or at least such results should be considered tentative at best!

2.2 Resource Performance Objectives

Another striking difference between batch and interactive work is the interpretation of server resource utilization.

For interactive work, there is endless discussion of what the proper settings are for resource thresholds: CPU queue length less than 2 per processor, 80% CPU utilization, 40% disk utilization, etc. All of these thresholds are trying to capture what level of delays due to queueing will degrade performance too much. Each interactive transaction contributes a tiny amount to overall utilization – the fact that there are hundreds of simultaneous users causes the aggregate higher utilizations. Also the simultaneity isn’t coordinated – no matter what your favorite distribution of arrivals (e.g. random, hyper-exponential, etc.), no one is going to argue that the loading is even. The summary of the interactive technique is that capacity planning seeks to keep the aggregate utilizations and resulting queueing below the level at which response time becomes unacceptable.

Batch analysis couldn’t be more different: maximum throughput/minimum elapsed time is achieved when one or more resources achieve 100% utilization. The first reaction to this principle is that it has to be wrong – think of the queueing! But the key insight here is that there is usually no queueing at server resources because the queueing is occurring inside the batch processing, i.e. another transaction is not presented to the server to process until the last one has already completed. So expect 100% utilization with a zero queue – that means that the processing using current resources is optimal. If more than one resource is measured at 100% utilization, even better! And if you don’t see at least one server resource (CPU or disk) at 100%, that demonstrates that there is a constraint in either the application design or software implementation that is preventing the batch stream from achieving optimal performance.

2.3 Characterizing the Work

The biggest difference between representing interactive and batch work is the source: multiple users issuing uncoordinated requests vs. a stream of requests occurring precisely one after the other. Also, it’s important to represent the difference between multiple simultaneous requests vs. a single (or multiple) stream(s) of requests.

The resource demands are not represented differently, i.e. a “transaction” requires a certain number of milliseconds of CPU, a certain amount of I/O, etc. How you choose to define a transaction can be different, but after that choice has been made, the results have the same form.

Another practical difference is that interactive work is usually characterized by aggregating the activities of many users where batch work often focuses on a very small number of processes, sometimes only one. And even when there are multiple batch processes, they are usually represented separately rather than being aggregated.

2.4 Analysis Techniques

The most common approach for predicting interactive performance is analytic modeling; batch performance analysis begins with a bit of arithmetic, frequently supplemented by analytic modeling. Building a suitable analytic model depends on having adequate system measurement data available, i.e. both system resource measurements as well as detailed process data representing the application components. As introduced above, elapsed time analysis typically indicates selecting all of the elapsed time, or as will be shown in the case studies, all of the time within a phase of the batch processing. The process data is used to isolate process(es) of interest into model-ready “workload/transactions”.

2.4.1 Applying Business Volume to the Analysis

The standard model is then supplemented using two types of additional information:

(1) total job/process elapsed time (referred to as BATCH-RESPONSE)
(2) total number of business requests processed during the entire execution (referred to as BATCH-TRANSACTIONS)

Four categories of changes may now be evaluated with a model and a calculator:

(1) Application change (e.g. more I/Os, I/O to different devices, change in CPU requirements, fewer physical I/Os)
(2) Competition from other workloads (e.g. new workload(s), changed workload(s), workload(s) removed)
(3) Hardware change (e.g. new CPU, additional CPUs, new devices, new caching)
(4) Change in application volume

The first three will be expressed as changes in various model parameters (using standard analytic model “what-if” procedures). The fourth case, change in volume, requires no change to the model.

After the necessary changes have been applied to the model and the model re-evaluated, the new throughput and response time will be calculated by the model.

In order to determine the new job response time (for all cases):

New BATCH-RESPONSE = New BATCH-TRANSACTIONS * New BATCH-RESPONSE TIME    (2.4.1)

When only the application volume is changing, no model evaluation is required: the New BATCH-RESPONSE TIME is the baseline response time, thus only the new volume is needed to calculate the result. For cases where transaction volume is not changing, New BATCH-TRANSACTIONS is the same as Baseline BATCH-TRANSACTIONS.
2.4.2 Applying Stream Constraints to the Analysis

Another change applied to the standard model is the representation of a stream of transactions vs. interactive arrivals of transactions. This is effected by augmenting the transaction description with the maximum possible concurrency, e.g. for a single job/process without threads, the maximum concurrency is 1. Also, calibration of such a transaction requires an addition to the standard model calibration procedures: check the result of multiplying the transaction throughput and the transaction response time. Ideally, this should come out to about 3600 seconds (1 hour). If it comes out to less than 3600, some other processing is missing and should be added. If it comes out to more than 3600, you would have received notice of saturation during model evaluation and need to consider what might be removed from the current transaction processing requirements. If the maximum concurrency is 2, you use 7200 seconds as the standard of comparison.
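
A minimal sketch of this calibration check, assuming a one-hour measurement interval (Python; the function name and the example values are illustrative):

    def stream_calibration(throughput_per_hour, response_time_sec, max_concurrency=1):
        # Busy seconds accounted for by the transaction vs. the target of
        # 3600 seconds per unit of maximum concurrency.  Below target:
        # some processing is missing from the workload.  Above target:
        # the model would have reported saturation, so some work should
        # be removed from the transaction.
        busy_seconds = throughput_per_hour * response_time_sec
        target_seconds = 3600.0 * max_concurrency
        return busy_seconds, target_seconds

    busy, target = stream_calibration(10_000, 0.33)   # hypothetical single-stream job
    print(f"{busy:.0f} of {target:.0f} seconds accounted for ({busy/target:.0%})")
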
Although the examples presented use distributed systems data (Unix and Windows), the principles and techniques apply to any computing system. These techniques were first developed for mainframes, then enhanced for mid-range systems, then applied to distributed systems.

An important observation here is that not every analysis includes using an analytic model, but the process of interpreting and preparing the data for a model provides the necessary context for an effective analysis.

3. RESULTS
Four case studies will be presented, demonstrating various aspects of the analysis techniques described above. As suggested in the introduction, there is both great variety in particular examples of elapsed time analysis as well as common principles applied to every one.

Each case study reflects consulting work done for one phase of performance analysis, so “final” conclusions for each case study are unknown. The intended contributions were to identify the most important characteristics and factors relating to capacity planning solutions, if there was a capacity planning solution at all. And even though there is variety amongst these cases, these examples certainly don’t cover all types of batch analysis!

3.1 Daily Batch Job Analysis Case

The first case study was a request to analyze why a particular batch job took so long to run and thus how it could be made to run faster. The staff performance analysts had suggested allocating additional CPUs to the job as well as increasing the amount of memory available. The system had 4 processors configured. Two weeks of measurement data were obtained to study the characteristics of the job.

The focus here is to find what the constraint is to the job running faster, and given the server data we have, checking server resources is the first step.

We are hoping to find either CPU or disk resources at 100% utilization, which would indicate that the job is making maximum utilization of the available resources, and our only task is to identify what kind of hardware upgrade will produce the most improvement in performance.

First check on the CPU resource utilization, checking both the overall system utilization as well as the per physical processor utilization (Figure 1, Figure 2):

Figure 1. Server CPU Utilization (possible 400%)

Figure 2. Physical Processor Utilization

The job runs within Sybase (which is represented by the “dataserver” processes) and executes from about 7:00 to 7:30 AM. What’s notable here is that not only is the system utilization well below 400% (400% = 4 processors) and no individual processor above 75%, the overall system utilization actually decreases when the job of interest is running! The dataserver processes do record increased utilization (Figure 3):

Figure 3. CPU Utilization by Process Name

Maybe the dataserver processes are individually constrained?

Figure 4. CPU Utilization per Sybase dataserver Process

No 100% utilization here (Figure 4).

Similar checks are made for I/O (maximum disk utilization is 4% and CPU wait for I/O is 2% and under), paging (close to zero), and network (all not shown due to paper length requirements). All of this indicates that hardware constraints of any type are unlikely to be the cause for slowness, so instead it must be the application implementation itself. Another possibility is that there are some hardware constraints, but they don’t appear using 15-minute average resource reporting.

For the next phase of the study, measurement data is instead summarized for each minute (instead of for every 15 minutes), and the resource utilization is higher than what was seen before, but nothing is near to 100%. For example, it is still evident that the three Sybase processes are not saturated and in fact during the heart of the processing, two of the three have no load at all (Figure 5):

Figure 5. CPU Utilization per Sybase dataserver Process

The CPU Idle due to I/O is as high as 10%, disk utilization at 20%, paging peaks at 15 pages/sec – still no indication that any server resource is over used.

The final step in this analysis is to look at the relative use, i.e. “service time”, for CPU and I/O, since from a hardware perspective that is the only response time component that can be reduced (response time = service time + queueing time). CPU service time dominates, so the only upgrade that would offer a speedup is a faster CPU. Note that additional CPUs offer no relief since there is no measured CPU queueing and the primary portion of the job is a single process using its one processor. The most important question to be answered is why is the application structure the way it is and what is the true constraint within the application which limits performance? We have shown that it’s not the server resources! This is definitely not a capacity planning problem.

Although it’s not shown in the paper due to space constraints, the two weeks of measurement data was overwhelmingly consistent, so representative single samples are shown above.

3.2 Monthly Batch Processing Analysis Case

The second case study was a request to analyze why a particular batch cycle took so long to run and to predict how it would perform when the workload was increased. It is a billing application and the performance requirement is that when the workload is doubled, the billing cycle still needs to complete within nine days. The system had 12 processors configured and the application uses Sybase for the database backend. Two examples of this billing cycle were measured in order to study the characteristics of the cycle: the March cycle processed 1.7 million accounts and April processed 1.825 million accounts.

The billing cycle is completing within the required 9 days now, but without much time to spare. We begin the analysis by selecting the longest phase of the billing cycle, which currently requires 1 ¼ days to complete and, applying the prediction formula (2.4.1), is projected to take 2 ½ days to complete, enough to violate the performance objective.
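
Since doubling the account volume is a volume-only change, equation (2.4.1) applies directly with the baseline response time per account; the projection above is just the following arithmetic (a sketch, with Python used only as notation):

    baseline_phase_days = 1.25        # longest phase of the current billing cycle
    volume_growth = 2.0               # accounts are doubled
    projected_phase_days = baseline_phase_days * volume_growth
    print(projected_phase_days)       # 2.5 days, enough to threaten the 9-day window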

How is this billing application structured? This sample is taken from a day when the billing cycle is executing continuously from midnight to around 6 PM (Figure 6):

Figure 6. CPU Utilization by Process

Sybase is configured with 6 “dataserver” processes (which enables it to use 6 out of the 12 processors configured on the system), and the customer accounts are extracted by 5 “extract98” processes, each of which extracts accounts from one database. Based on this profile, the workload will be characterized by isolating each of the dataserver processes, and each of the extract processes.

Another interesting piece of data is that even though the number of accounts increased by 7% between March and April, the elapsed time increased by 30%. This is even worse than anticipated, so we know significant performance degradation is already occurring.

We are hoping to find either CPU or disk resources at 100% utilization, which would indicate that the application is making maximum utilization of the available resources, and thus indicate the appropriate hardware upgrade. Figure 6 shows that system CPU utilization is at about 6 to 7 processors out of 12. So there is ample available CPU capacity. Disk utilization is as high as 80% – no saturation here, either. So the main question has to be why the application is not making full use of the available CPU resource.

Figure 7. CPU Utilization per Analysis Workload

Figure 7 shows two runs of the database extract phase, and we can clearly see the important characteristics here:

(1) the database extracts are quite evenly balanced (good),
(2) the Sybase dataservers are not as evenly matched (not so good), and
(3) some dataservers are using just about 100% CPU, i.e. one processor (good).

The second run completes more quickly than the first, and most likely it is due to the much more even loading of the dataservers, all showing about 100% CPU utilization.

So the most immediate “upgrade” would be to configure more dataservers in order to access the available CPU capacity. Also, each time the cycle is being run, less than even utilization of the dataservers most likely indicates a constraint other than hardware inhibiting maximum server performance. Uneven loading also appears to result in uneven loading of the extract processes, too, resulting in a longer runtime (Figure 8).

Figure 8. CPU Utilization by extract Process

An analytic model predicts that about 75% of the growth could be absorbed with the current configuration, assuming that even loading of the dataservers (and extracts) is possible and sustainable.

Similar analysis would be required for each of the additional segments of the billing cycle as different recommendations could result.

More detailed application analysis might reveal application tuning opportunities, but most companies choose adding hardware as being more cost-effective than actually trying to alter the application performance directly.

3.3 Batch Processing Sequence Analysis Case

The third case study analyzes a stream of batch processes which currently complete in 4 hours and predicts what system upgrade(s) would be needed to maintain that elapsed time when the workload is doubled. The system has 8 processors configured and the application uses Oracle for the database backend.

The sequence begins in the evening and ends early the next morning. Measurement data for the two days is analyzed in order to see the detailed process structure. Specifically, there is a sequence of processes named “FE_EMC_PRO” which generate the activity (Figure 9).

Figure 9. Process CPU Utilization Profile

Details of the process start times are observed, allowing division of the sequence into five phases, corresponding to the “FE_EMC_PRO” process running at the time: “A” from 20:31, “B” from 21:08, “C” from 22:41, “D” from 23:37, and “E” from 00:22 (Figure 10).

Figure 10. Listing of selected process Start Times

An analytic model is constructed for each phase, labelling the main transaction as “EMC-A”, “EMC-B”, etc. for easy identification. The BATCH-TRANSACTIONS for each phase is 20,000 (preparing for use in equation 2.4.1).

The model for “EMC-B” is used as an example. The original process run time is about 1.7 hours. The model is evaluated (Figure 11).

Figure 11. Analytic Model Transaction Report

To check calibration for the transaction of interest, 11.44K transactions * .30 sec/transaction = 3434 seconds. Compared to 3600 seconds (because throughput is expressed as transactions per hour), that’s 95%, or 5% from perfect calibration. This is acceptable as is.
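
This is the calibration check from section 2.4.2 applied to the measured values quoted above (a sketch; the small difference from 3434 comes from rounding the throughput to 11.44K):

    throughput_per_hour = 11_440      # "11.44K" transactions per hour
    response_time_sec = 0.30
    busy = throughput_per_hour * response_time_sec
    print(busy, busy / 3600)          # ~3432 seconds, about 95% of the hour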

Now for the upgrade analysis. We observe the transaction profile, i.e. how much time is service, and how much is wait:

This transaction is almost entirely made up of CPU service time. This immediately indicates that the only type of upgrade that will have a significant effect on performance will be one where the individual processor is much faster than the current processor. Adding additional processors will not have any effect (because there’s no CPU Wait time) and I/O upgrades will have very little effect because not much time is spent doing I/O.

The proposed upgrade is from the current Sun 15-processor system to an 8-processor Sun system where the individual processor is about 9 times faster. We “install” the upgrade on the system, and reevaluate the model:

The relative response time of .29 indicates that we will easily be able to double the transaction volume.

Also note that a substantial CPU upgrade makes the I/O relatively more important to performance in the future.

The transaction volume is doubled, and the model is re-evaluated (Figure 12):

Figure 12. Analytic Model Transaction Results

So, if the new desired volume is 40000 transactions, and the new response time is .09 seconds, the total job run time is 1 hour, which is actually better than is required.

This is repeated for each of the 5 models, and in every case, the upgrade is adequate.

The summary of the baseline performance is that the transaction profiles are dominated by CPU service time in each of the first four phases. These four phases account for about 4 hours of processing. The fifth phase, which is a clean up and FTP phase, uses I/O more than CPU, but is a very short phase (about 20 minutes).

The baseline job stream response time is approximately 4.3 hours:

20000 transactions * (.09 + .31 + .13 + .14 + .11) sec/tran = 4.3 hours

Figure 13. Baseline Model Response Time by Phase

The summary of the ‘what-if’ modeling result is that the relative response times for the first four phases show that the planned CPU upgrade is adequate, i.e. a relative response time of .5 or less indicates that double the business volume can be handled without extending run time. The fifth phase does not show the same improvement, only .70 of original, so that phase will run a bit longer than the original (40000 transactions * .08 sec/tran = 53 minutes).

Figure 14. ‘What-if’ Model Response Time by Phase
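
The stream arithmetic above can be written out with equation (2.4.1) per phase (a sketch; the per-transaction times are the figures quoted in the text, and only the phase E ‘what-if’ time is quoted explicitly):

    transactions = 20_000
    baseline_sec_per_tran = [0.09, 0.31, 0.13, 0.14, 0.11]   # phases A..E

    baseline_hours = transactions * sum(baseline_sec_per_tran) / 3600
    print(baseline_hours)             # ~4.3 hours for the whole baseline stream

    # 'What-if': the volume doubles to 40,000 and the upgraded model supplies
    # new per-transaction times.  Phase E is quoted as 0.08 sec/tran (a
    # relative response time of about .70), so it runs slightly longer:
    whatif_phase_e_minutes = 40_000 * 0.08 / 60
    print(whatif_phase_e_minutes)     # ~53 minutes
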
3.4 Batch Processing with a Large Number of Jobs Analysis Case

The fourth case study analyzes a stream of batch processes which currently complete in 8 hours and predicts what system upgrade(s) would be needed to maintain that elapsed time when the workload is quadrupled. The system has 16 processors configured and the application uses Oracle for the database backend. The system is an AIX Power6 frame, with 4 SPLPARs processing particular portions of the workload (2 application partitions, 2 database partitions). The workload consists of thousands of jobs.

There are hundreds of job sequences to be performed, but no single sequence occupies the entire batch window – generally individual sequences are around an hour in length, beginning with a data acquisition phase, then followed by data analysis, database update, then a cleanup phase. The set of sequences begins at midnight and ends the next morning. Measurement data was obtained for two weeks (14 samples of daily activity), and one day is shown as an example here.

First, an overall view of the AIX frame (Figure 15):

Figure 15. Frame CPU Utilization by SPLPAR

The sequence of data acquisition, processing, and cleanup is shown in pink, and the backend database activity in blue (Figure 16).

Figure 16. Workload CPU Utilization

When the individual jobs/processes are examined, the underlying application structure becomes evident – many, many small jobs are used (Figure 17). Given the relatively short sequences and the number of pieces the work is divided amongst, the sequencing aspect does not require explicit representation, unlike the third case study.

Figure 17. Number of processes by process name

The ‘what-if’ analysis is fairly straightforward: if no change is made to the application, since the AIX frame is already at 100% utilization, 4 times the workload will require 4 times the number of processors, e.g. 64 instead of the existing 16 (see section 2.4.1 above). However, the measurement data shows opportunities to improve the throughput if the existing CPU capacity can be more fully utilized, i.e. the ideal batch CPU utilization of 100% is being achieved for about only 3 of the 8 hours available (Figure 18):

Figure 18. Frame Processors Used

Another way to think about this is that the green area represents available capacity. Considering that the first phase is data acquisition, tuning the deployment to use the available CPU capacity for the first few hours is unlikely. More promising is to examine the cause(s) for the “tail” of the utilization, from 4 to 8 AM. If the “tail” could be eliminated, the batch processing could end 2 hours earlier (Table 1), which creates 25% additional capacity. So instead of needing 64 processors for the increased workload, only 48 will be needed.

Time AM   Current Processors Used   Current Available Processors   Ideal Processors Used
4:00              13.4                       2.6                          16.0
4:30              10.6                       5.4                          16.0
5:00               7.3                       8.7                          16.0
5:30               5.0                      11.0                           2.3
6:00               4.6                      11.4                            --
6:30               4.9                      11.1                            --
7:00               2.9                      13.1                            --
7:30               1.7                      14.4                            --
Total             50.3                                                    50.3

Table 1. ‘What-if’ the processing tail was eliminated
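
The processor arithmetic behind this ‘what-if’ is a direct scaling of the figures quoted above (a sketch in Python):

    current_processors = 16
    growth = 4.0                       # workload is quadrupled
    window_hours = 8.0

    # No application change: 4x the work needs 4x the processors.
    needed = current_processors * growth                # 64

    # If the 4:00-8:00 AM tail is repacked, the same daily work fits in a
    # 6-hour window; 2 of the 8 hours (25%) become additional capacity.
    hours_recovered = 2.0
    needed_after_repack = needed * (window_hours - hours_recovered) / window_hours
    print(needed, needed_after_repack)                  # 64 and 48
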
4. CONCLUSIONS

Analytic models and their associated techniques can be applied to elapsed time analysis as well as to traditional interactive workload analysis. Maximum performance is characterized by server resource utilization of 100%, completely the opposite of what’s desirable for interactive workloads. Selected measurement periods usually encompass the entire batch processing window. Variation in processing profiles indicates the need to break the processing up into phases, each of which is analyzed/modeled separately. Many examples of batch processing are in fact not constrained by server resources, and coming to that conclusion is the single most important step in moving forward to identify what about the application or its implementation is in fact the real constraint. Lack of accurate elapsed time analysis causes many poorly performing applications to continue in that condition for longer than necessary as well as having both human and server resources expended without achieving the desired effect!
