1. INTRODUCTION
The original motivation for developing capacity planning tools was the advent of "interactive" computing, where the customer was a person, and they cared how long it took for their "transaction" to process. Prior to that, most computing consisted of several varieties of batch processing and the customers were internal, so not much was done in the way of capacity planning. So the first analytic models had three types of workloads available: Interactive Time Sharing (TS), Interactive Transaction Processing (TP), and Batch Processing (BP). Usually analysis and modeling were focused on the peak of the interactive workload(s), and batch was just a low-priority stream of work that utilized the resources not required for the interactive processing. The overnight period was then used exclusively for "production" batch and the "interactive" day would begin again the next morning. All the while, there continued to be particular batch work whose completion was critical to the business and whose precise time of completion was known. Examples of this type of work would be reconciliation of financial accounts, database updates, or billing cycles.

The key similarity between interactive and critical batch work is that there is a very specific performance expectation, e.g. "under 0.1 second response time" or "finish by 8 AM". The key difference between interactive and batch work is that interactive work is typically evaluated one transaction at a time, whereas critical batch is typically a very large number of transactions which are submitted for processing all at once, and the completion of the last request is the only timing that is of interest.

It is important to note that examples of critical batch exist on all platforms – it is the oldest use of computing systems and continues to this day. On a PC you have overnight backups and virus scans, and on large mainframes and distributed systems you have overnight database updates and multi-day sequences to produce bills. With all critical batch, there is always an explicit scheduling and frequency; for example, you might schedule a backup to be performed daily, at midnight. Billing cycles are usually executed monthly. The scheduling is frequently also explicit about what state the server will be in, i.e. not running any other work, for two reasons:

(1) There may be a requirement that everything else be "finished". For example, open files cannot be backed up, and nightly account reconciliation cannot occur if customers are still entering additional transactions.
(2) Having all computing resources available for a compute-intensive activity always favors the best possible performance!

For the largest examples of critical batch work, one or more large servers may be required to perform the necessary processing, and these servers are likely dedicated to this processing. The timing is usually dictated by business requirements rather than by server availability. For smaller examples, such as a PC backup or virus scan, you intentionally schedule these activities outside of your "interactive" use of the PC, e.g. 3 AM, expecting the work to be completed before you need to use the PC interactively again.

It is also common to have explicit expectations about when the batch activity can start as well as the time by which it must be complete. For example, account reconciliation at a financial institution cannot start until the "interactive" day has completed, and the reconciliation must be completed before the next "interactive" day begins. And this is exactly why capacity planning analysis is almost always required – the amount of work to be completed and the time available for completion often conflict with each other.

This is where our work begins.

Objectively it doesn't matter what the trend is or isn't – what matters is that the analyst can confirm what has been stated as the current performance results and changes (if any) in the workload.

Also, historical measurement data allows the selection of multiple "baseline" study periods – no performance analysis should be attempted using only one selected time period, or at least such results should be considered tentative at best!
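The conflict between the amount of work and the completion window can be made concrete with a quick feasibility check. The sketch below is purely illustrative; the transaction count, per-transaction CPU time, processor count, and window are assumed numbers, not figures from any of the case studies:

```python
# Hypothetical feasibility check for a critical batch window.
# All numbers here are illustrative assumptions, not measured values.

def batch_elapsed_hours(transactions, cpu_sec_per_txn, processors, utilization_cap=0.9):
    """Lower-bound elapsed time if the work is CPU-bound and spread
    evenly across the processors, capped at a sustainable utilization."""
    total_cpu_hours = transactions * cpu_sec_per_txn / 3600.0
    return total_cpu_hours / (processors * utilization_cap)

window_hours = 8.0  # e.g. midnight until an 8 AM deadline
elapsed = batch_elapsed_hours(transactions=2_000_000, cpu_sec_per_txn=0.05, processors=4)
print(f"estimated elapsed: {elapsed:.1f} h, window: {window_hours} h, "
      f"feasible: {elapsed <= window_hours}")
```

Even a rough check of this form shows immediately whether the stated window and the stated volume are in conflict, which is the question every critical batch analysis must answer first.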
3. RESULTS
Four case studies will be presented, demonstrating various aspects of the analysis techniques described above. As suggested in the introduction, there is both great variety in particular examples of elapsed time analysis as well as common principles applied to every one.

Figure 1. Server CPU Utilization (possible 400%)
Similar checks are made for I/O (maximum disk utilization is 4% and CPU wait for I/O is 2% and under), paging (close to zero), and network (all not shown due to paper length requirements). All of this indicates that hardware constraints of any type are unlikely to be the cause for slowness, so instead it must be the application implementation itself. Another possibility is that there are some hardware constraints, but they don't appear using 15-minute average resource reporting.

For the next phase of the study, measurement data is instead summarized for each minute (instead of for every 15 minutes), and the resource utilization is higher than what was seen before, but nothing is near to 100%. For example, it is still evident that the three Sybase processes are not saturated and in fact during the heart of the processing, two of the three have no load at all (Figure 5).

Although it's not shown in the paper due to space constraints, the two weeks of measurement data was overwhelmingly consistent, so representative single samples are shown above.

3.2 Monthly Batch Processing Analysis Case

The second case study was a request to analyze why a particular batch cycle took so long to run and to predict how it would perform when the workload was increased. It is a billing application and the performance requirement is that when the workload is doubled, the billing cycle still needs to complete within nine days. The system had 12 processors configured and the application uses Sybase for the database backend. Two examples of this billing cycle were measured in order to study the characteristics of the cycle: the March cycle processed 1.7 million accounts and April processed 1.825 million accounts.
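The shape of this requirement can be framed with a back-of-the-envelope calculation before any modeling is done. In the sketch below, only the 2x growth target and the nine-day limit come from the case study; the current cycle length is a hypothetical example:

```python
# Feasibility framing for the billing-cycle requirement: the workload
# doubles and the cycle must still finish within nine days (from the
# text). The current cycle length is an assumed, illustrative number.

current_days = 6.0                    # assumed length of today's cycle
deadline_days = 9.0                   # requirement from the text
naive_doubled = current_days * 2.0    # if elapsed time scaled linearly with accounts

# How much faster than naive linear scaling the doubled cycle must run:
required_speedup = naive_doubled / deadline_days
print(f"naive 2x estimate: {naive_doubled} days; "
      f"required speedup vs. linear scaling: {required_speedup:.2f}x")
```

Whenever the naive linear projection exceeds the deadline, the analysis must find where the extra capacity comes from, for example from more even loading of the dataservers and extract processes.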
Figure 8. CPU Utilization by extract Process

An analytic model predicts that about 75% of the growth could be absorbed with the current configuration, assuming that even loading of the dataservers (and extracts) is possible and sustainable.

The third case study analyzes a stream of batch processes which currently complete in 4 hours and predicts what system upgrade(s) would be needed to maintain that elapsed time when the workload is doubled. The system has 8 processors configured and the application uses Oracle for the database backend.

The sequence begins in the evening and ends early the next morning. Measurement data for the two days is analyzed in order to see the detailed process structure. Specifically, there is a sequence of processes named "FE_EMC_PRO" which generate the activity (Figure 9).

Details of the process start times are observed, allowing division of the sequence into five phases, corresponding to the "FE_EMC_PRO" process running at the time: "A" from 20:31, "B" from 21:08, "C" from 22:41, "D" from 23:37, and "E" from 00:22 (Figure 10).

An analytic model is constructed for each phase, labelling the main transaction as "EMC-A", "EMC-B", etc. for easy identification. The BATCH-TRANSACTIONS for each phase is 20,000 (preparing for use in equation 2.4.1).

The model for "EMC-B" is used as an example. The original process run time is about 1.7 hours. The model is evaluated (Figure 11).
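The general shape of a per-phase elapsed-time estimate can be sketched as follows. The exact equation 2.4.1 is defined earlier in the paper; here only the 20,000 BATCH-TRANSACTIONS count and the roughly 1.7-hour run time come from the text, while the split between per-transaction service and wait time is a hypothetical assumption:

```python
# Sketch of a per-phase batch elapsed-time estimate in the spirit of
# equation 2.4.1 (defined earlier in the paper). BATCH-TRANSACTIONS =
# 20,000 is from the text; the per-transaction service and wait times
# below are hypothetical values chosen for illustration.

def phase_elapsed_hours(batch_transactions, service_sec, wait_sec):
    # With the phase's transactions processed by its main process,
    # elapsed time accumulates both service and wait per transaction.
    return batch_transactions * (service_sec + wait_sec) / 3600.0

emc_b = phase_elapsed_hours(20_000, service_sec=0.25, wait_sec=0.056)
print(f"EMC-B elapsed estimate: {emc_b:.2f} hours")
```

With assumed values like these, the estimate lands near the measured 1.7-hour run time, which is the kind of calibration each phase model undergoes before it is used for prediction.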
Now for the upgrade analysis. We observe the transaction profile, i.e. how much time is service, and how much is wait.

The summary of the baseline performance is that the transaction profiles are dominated by CPU service time in each of the first four phases. These four phases account for about 4 hours of processing. The fifth phase, which is a clean-up and FTP phase, uses I/O more than CPU, but is a very short phase (about 20 minutes).

This is repeated for each of the 5 models, and in every case, the upgrade is adequate.
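The reasoning behind the upgrade evaluation can be sketched in a few lines: because the profiles are CPU-dominated, only the CPU-service portion of a phase shrinks with faster processors, while the wait (I/O) portion is unchanged. The numbers below are illustrative assumptions, not the actual model inputs:

```python
# Sketch of the upgrade evaluation: only the CPU-service portion of a
# phase's transaction profile benefits from faster processors; the wait
# (I/O) portion does not. All numbers here are illustrative.

def upgraded_elapsed(elapsed_hours, cpu_fraction, cpu_speedup, workload_factor=2.0):
    cpu_part = elapsed_hours * cpu_fraction / cpu_speedup   # shrinks with the upgrade
    io_part = elapsed_hours * (1.0 - cpu_fraction)          # unaffected by CPU speed
    return workload_factor * (cpu_part + io_part)

# e.g. a CPU-dominated phase (90% service) on processors 2.5x faster,
# with the workload doubled:
print(upgraded_elapsed(1.7, cpu_fraction=0.9, cpu_speedup=2.5, workload_factor=2.0))
```

Under these assumptions the doubled workload still finishes below the baseline 1.7 hours, illustrating how a CPU-dominated profile lets a processor upgrade absorb workload growth.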