Capacity Planning of SOA-Based Systems

SOA in the Telco Domain
Part II: Capacity Planning of SOA-Based Systems

by Masykur Marhendra, SOA Solution Architect at XL Axiata
SERVICE TECHNOLOGY MAGAZINE Issue LIV September 2011
Abstract - Service-oriented architecture in the telecommunication industry is a first but huge step toward
answering many challenges, from management demands to meeting the product timelines requested by marketing. To
implement service-oriented architecture, we need to define at least the technology we will use, the design of
the system architecture, the implementation strategy, and the roadmap itself. Last of all comes how we manage the
established service-oriented system: monitoring service performance, the lifecycle of services, risk management,
and so on. To keep service performance at its best, we need good service capacity planning
in terms of high availability, service throughput, and resource consumption. Along this journey, services will
also evolve and expand. At that point, we also need good capacity planning of the platform, including the
Enterprise Service Bus, Messaging Bus, and other supporting platforms such as the database.
Introduction
Nowadays, the telecommunication industry competes very tightly on delivering the best quality of service
for short messages, data, and subscriptions to those services. Service-oriented architecture is a first
but huge step toward answering this challenge and meeting the high demand from subscribers. To implement
service-oriented architecture, there are a few things we need to do. We need to define the technology we
will use, determine the design of the system architecture, and plan the implementation strategy. Choosing the
best-fit technology is the first critical point. We need a set of criteria for evaluating whether the
technology itself can satisfy our requirements. In the telecommunication domain, five-nines (99.999%) high availability
is mandatory. Afterwards, we can design the system architecture to fit our needs.
The last step is how we manage a well-established SOA-based system. Managing the
service-oriented system includes managing service availability, service performance, service lifecycle, risk
management, and so on. This step is important to keep the system stable, and it must mitigate the risks that
may occur in the future. As the number of subscribers keeps growing every day, more transactions are loaded onto
the system. Managing the system well enables telecommunication providers to keep delivering services at their best
performance with high availability. To keep service performance at its best, we need to keep evaluating the service
capacity in terms of high availability, throughput, and resource consumption. Moreover, when services evolve
and expand, we also need to define the platform capacity itself, including the Enterprise Service Bus, Messaging
Bus, and other supporting platforms such as the database.
Capacity planning of an SOA-based system is a mandatory step to keep the system running at its best. It involves
two main activities: capacity planning of the services, and capacity planning of the platform. Services
capacity planning is mostly about sizing services horizontally so that they can handle increasing incoming
transaction requests with the allocated system resources. Platform capacity planning is mostly about sizing
the platform's ability to provide system resources to all running services, including the Enterprise Service Bus,
Messaging Bus, and other supporting platforms such as the database. This article discusses these two
activities.
Capacity Planning on Services Level
Services Instance Sizing

Like other applications, SOA-based services run as several instances. Each instance can hold the
same capability and capacity. To handle the required incoming requests, we need to define how many
service instances we will run. First, we need to know how much traffic can be handled by one service instance.
Copyright Arcitura Education Inc. 1 www.servicetechmag.com

This can be measured through benchmarking and performance tests in an environment that most closely resembles the
production one.
For example, let us measure a service for purchasing Blackberry package registration products from the UMB
channel, which we will refer to as service-X. The forecasted load is 300 transactions per second (tps),
with a service level agreement of not more than 25 seconds. To do the benchmarking, we apply the load test step
by step, from 100 tps up to y tps, where the service performance starts to degrade. As a sample, we have the
performance test results below:
No.  Load (tps)  Response Time (ms)  Increase from Row 1  Row 2     Row 3    Row 4
1    100         1200
2    150         1250                4.17%
3    200         2300                91.67%               84.00%
4    250         3210                167.50%              156.80%   39.57%
5    300         4000                233.33%              220.00%   73.91%   24.61%
Table 1 Service-X Performance Test Result
We can see that the best response time is obtained at 100 tps, which is our baseline. Performance degrades only
slightly at 150 tps (4.17%) and starts to degrade sharply when we give a 200 tps load to the service. Running the
service at 150 tps on 2 instances to handle a 300 tps load is faster than running 300 tps on 1 instance. On row
no. 2, one transaction takes about 8.33 ms to finish (1250 ms / 150). For 300 transactions, it needs only around
2500 ms. By comparison, 1 instance at 300 tps needs 4000 ms to complete them. Within 4000 ms, two instances of the
service at 150 tps can complete around 480 transactions.
From this analysis, we can conclude that service-X should handle at most 150 transactions per second per instance,
and that running two service instances at 150 tps gives better results than running one instance at 300 tps.
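The sizing reasoning above can be sketched in a few lines of Python. This is only an illustration using the Table 1 figures; the function names and the 10% degradation tolerance are assumptions introduced here, not values from the benchmark policy.

```python
import math

# Benchmark rows from Table 1: load (tps) -> response time (ms).
BENCHMARK = {100: 1200, 150: 1250, 200: 2300, 250: 3210, 300: 4000}


def increase_pct(base_ms, new_ms):
    """Percentage increase of new_ms over base_ms."""
    return (new_ms - base_ms) / base_ms * 100


def max_load_within(benchmark, max_increase_pct):
    """Highest tested load whose response time stays within the tolerance.

    The lowest tested load is the baseline; scanning stops at the first
    load that exceeds the tolerance.
    """
    loads = sorted(benchmark)
    baseline_ms = benchmark[loads[0]]
    best = loads[0]
    for load in loads[1:]:
        if increase_pct(baseline_ms, benchmark[load]) > max_increase_pct:
            break
        best = load
    return best


def instances_needed(forecast_tps, per_instance_tps):
    """Number of instances needed to carry the forecasted load."""
    return math.ceil(forecast_tps / per_instance_tps)


# Assumed tolerance: accept up to 10% slower than the baseline.
per_instance = max_load_within(BENCHMARK, 10.0)
print(per_instance, instances_needed(300, per_instance))
```

With a different tolerance the per-instance load, and therefore the instance count, would change, so the tolerance should come from the service level agreement rather than from this sketch.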
Services Resources Sizing

Another sizing that we need to do at the services level is the system resources. System resources sizing is mostly
about the processing unit and memory usage needed by a single service instance to run at a certain transaction
load level. The planned service resource sizes then feed into the platform capacity sizing. There are two
main aspects we need to plan:
1. Processing Unit - is defined by how many cores are needed by a single instance to run a certain
transaction load. This is one of the most important aspects we need to calculate, because service
availability depends a lot on the availability of the processing unit; we cannot run more services if there is
no processing unit available. We also need to take it into account when designing a Disaster Recovery Site for the
SOA.
Processing unit usage can be found through benchmark and performance tests in an environment that most
closely resembles the production environment. For example, we have a performance test result of service-X
as given in table 2 below for processing unit usage, and we are
using 150 tps with 2 instances as we concluded from the previous example.
No.  Load (tps)  Response Time (ms)  Increase from Row 1  Row 2     Row 3    Row 4
1    100         1200
2    150         1250                4.17%
3    200         2300                91.67%               84.00%
4    250         3210                167.50%              156.80%   39.57%
5    300         4000                233.33%              220.00%   73.91%   24.61%
Table 2 Service-X Processing Unit Usage
In the production environment, when the service runs and reaches a 150 tps load, each instance adds 2.75%
to the platform processing unit usage. For example, suppose our platform currently
uses only 40% of the processing unit (at peak period). Two instances of service-X would raise this to 45.5%
(40% + 2 x 2.75%), so it is still safe to run them on the platform.
This baseline data is also useful when we need to do projection planning. Projection planning is
important for management decisions regarding the expansion of the platform, both horizontally
and vertically. This way they can prepare for future peak events (like Idul Fitri, Christmas Eve, New Year, etc.).
2. Memory - is used by the services to store data while transactions run, and is released when the
transaction is finished. In some cases a memory leak can also happen. Whenever a memory leak
happens, a service cannot release all of its memory back to the platform. This is caused by the
quality of the object management in the service implementation code. So, whenever we want to put a service in
production, we need to make sure that it is free from memory leak problems. This way it
will not disturb our production runtime environment.
Unlike the processing unit, to determine how much memory is needed by the services, we need to
make an estimate from the service's activity process itself. For example, figure 1 describes the five main
activities service-X contains.

Figure 1 Service-X Activity
In the first activity, the service translates the msisdn and keyword, received as incoming request parameters, into the
internal data structure. This internal data structure is called the Request Payload. The Request Payload consists of two
main parts:
1. Header - a payload that defines the properties of every single message request. This part contains several
elements like RequestID, EndSystem, TimestampIn, TimestampOut, Channel, UUID, ESBUUID, and more.
The Header part is carried until the end of the activities.
2. Body - the main payload of the request. The Body part varies for each service and depends on the specific
internal data structure implementation. For example, in service-X the body payload consists of the msisdn,
keyword, subscriberNo, and soccd elements.
Figure 2 Service-X Body Part

For example, suppose the header part contains at most 2,048 bytes and the body payload at most 327 bytes. We
then have 2,375 bytes of overhead.
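The Request Payload described above could be modelled roughly as follows. This is a hypothetical sketch: the element names come from the text, but the Python field names, the string types, and the class layout are assumptions, and the real header carries more elements than the ones listed.

```python
from dataclasses import dataclass

# Maximum sizes quoted in the text (bytes).
MAX_HEADER_BYTES = 2048
MAX_BODY_BYTES = 327


@dataclass
class Header:
    """Per-request properties; carried until the last activity."""
    request_id: str
    end_system: str
    timestamp_in: str
    timestamp_out: str
    channel: str
    uuid: str
    esb_uuid: str


@dataclass
class Body:
    """Service-X body; subscriber_no and soccd are filled in by activity 2."""
    msisdn: str
    keyword: str
    subscriber_no: str = ""
    soccd: str = ""


@dataclass
class RequestPayload:
    header: Header
    body: Body


def max_payload_bytes():
    """Worst-case size of header plus body."""
    return MAX_HEADER_BYTES + MAX_BODY_BYTES


print(max_payload_bytes())
```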
In the next activity, service-X loads the subscriber profile (subscriber number and soccd) from the database based
on the msisdn. The subscriber number and soccd are then mapped into the request payload body. For example,
the subscriber number takes 32 bytes at maximum and the soccd takes 16 bytes at maximum. The second activity
therefore adds 48 bytes of memory usage.
In the third activity, service-X registers the subscriber number to the corresponding package based on the soccd
and keyword being input. The return value of this activity is only a boolean value, which takes 1 byte of data. So
this activity adds 1 byte of memory usage.
Unlike the previous activities, the size of the fourth activity varies based on the package that the subscriber
has been registered to, but we can take its maximum value into account. For example, to have a
good response message (which is defined by the marketing team) we need 256 bytes. We can use this number as
a guideline. In the last activity, the response message is only sent out through the defined channels (UMB or
SMS).
If we sum all of the activities above, we will have the following estimation:
Activity  Header  Body  Extra Payload  Total
1         2048    327   24             2399
2         2048    327   48             2423
3         2048    327   1              2376
4         2048    327   256            2631
5         2048    327   0              2375
Total                   329            12204
Table 3 Memory Usage Estimation for Service-X
For a single transaction, service-X needs at least 12,204 bytes of memory. As mentioned in the previous
example, if service-X runs at 150 tps per service instance, it needs at least 1,830,600 bytes of memory
(around 1.74 MB).
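The estimate can be reproduced with a short calculation. The sketch below simply sums the rows of Table 3 and scales by the load; the variable and function names are illustrative.

```python
# (header, body, extra payload) in bytes for activities 1..5, from Table 3.
ACTIVITIES = [
    (2048, 327, 24),
    (2048, 327, 48),
    (2048, 327, 1),
    (2048, 327, 256),
    (2048, 327, 0),
]


def bytes_per_transaction(activities):
    """Sum of header + body + extra payload over all activities."""
    return sum(header + body + extra for header, body, extra in activities)


def bytes_per_second(activities, tps):
    """Memory needed to hold one second of transactions at the given load."""
    return bytes_per_transaction(activities) * tps


per_tx = bytes_per_transaction(ACTIVITIES)
per_sec = bytes_per_second(ACTIVITIES, 150)
print(per_tx, per_sec, per_sec / (1024 * 1024))
```

The last printed value is the per-second figure in MB (1 MB taken as 1,048,576 bytes), which the article quotes as around 1.74 MB.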
Capacity Planning on Infrastructure Level

The platform is the system on which applications run. However, not all applications can run on multiple platforms.
Because the platform is the foundation on which applications run, we need to make sure
it has the availability and scalability to keep up with the growth of the applications. Some important aspects of
platform capacity are the processing unit, memory, and storage. In an SOA-based system, the Enterprise
Service Bus, the Messaging Bus, and the (optional) database all need good capacity
planning in terms of these platform capacity aspects.

Enterprise Services Bus Capacity Planning

The Enterprise Service Bus is a system where collections of services run to perform mediation, routing,
transformation, and orchestration, processing incoming requests into the desired results. From the previous section,
we already know the resource requirements a service needs in order to run. We then sum those
requirements, and the total becomes the requirement for Enterprise Service Bus capacity planning.
In the telecommunication domain, five-nines (99.999%) high availability is a mandatory attribute. The Enterprise
Service Bus (ESB) should be able to serve all requests 24 hours a day. To maintain the availability of the ESB, we
should have a good capacity planning and high availability strategy for it:
1. Processing unit usage on one ESB should not exceed a threshold, which depends on the policy we use.
2. The memory unit of the ESB should have adequate free paging space to serve services that need more memory
allocation.
3. Network bandwidth should be large enough to distribute the transaction load to the SOA system
(ESB, Messaging Bus, Database, Service Providers, etc.).
4. The high availability strategy must be able to sustain the ESB system so that it keeps serving
requests.
For example, our system consists of four ESBs. The threshold of the processing unit on each ESB depends on
the policy we use:
1. One-to-one pair - means that one ESB acts as the fault-tolerant backup for one primary ESB. In this policy,
the processing unit of the primary ESB can run at up to 100% capacity, since the secondary ESB only has to hold a
single primary ESB's capacity. But this approach is very expensive, because we must have a backup for every single
primary ESB.
Figure 3 One-to-One Pair
2. N+1 - means that one ESB acts as the secondary ESB for all primary ESBs. If
one ESB fails, it should fail over to the secondary ESB. In this policy, the processing unit of each primary
ESB should not exceed 100/N % capacity, since the secondary ESB has to hold the capacity of N primary ESBs.

Figure 4 N+1 Policy
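The two threshold rules above can be written as a small helper. This is a sketch of the rule exactly as stated in the text; the function name and policy labels are made up for illustration, and the way the four ESBs are split into primaries and secondaries is an assumption.

```python
def cpu_threshold_pct(policy, n_primaries=1):
    """Maximum processing unit usage (%) allowed on each primary ESB.

    "one-to-one": every primary has its own secondary, so 100%.
    "n+1": one secondary backs N primaries, so 100 / N per the text.
    """
    if policy == "one-to-one":
        return 100.0
    if policy == "n+1":
        if n_primaries < 1:
            raise ValueError("n_primaries must be at least 1")
        return 100.0 / n_primaries
    raise ValueError("unknown policy: " + policy)


# Assumed split of the four ESBs in the example:
# one-to-one -> 2 primaries + 2 secondaries; N+1 -> 3 primaries + 1 secondary.
print(cpu_threshold_pct("one-to-one"))
print(round(cpu_threshold_pct("n+1", n_primaries=3), 2))
```

Under these assumptions, the one-to-one layout leaves only two ESBs serving traffic at full capacity, while N+1 leaves three serving traffic at roughly a third of capacity each.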
Unlike the processing unit, the memory of the ESB is much more straightforward. We just need to have
paging space available for services to allocate memory. For example, suppose we have 64 GB of memory and, on
average, each service in one primary ESB needs 128 MB to run at 150 tps. Our single primary ESB can then serve
up to 512 services running at 150 tps. However, we should also set a threshold for memory rather than utilizing
it up to 100%. For example, once memory utilization reaches 60%, we should add another memory unit to the ESB.
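As a quick check of the memory example, the following sketch computes how many services fit in the ESB memory, with and without a utilization threshold. The function name is illustrative, and applying the 60% threshold as a hard cap on the number of services is an interpretation of the text, not something it states.

```python
def max_services(total_memory_mb, per_service_mb, threshold_pct=100):
    """Number of services that fit within threshold_pct of total memory.

    Integer arithmetic is used so that the result is rounded down exactly.
    """
    return (total_memory_mb * threshold_pct) // (per_service_mb * 100)


TOTAL_MB = 64 * 1024  # 64 GB expressed in MB

print(max_services(TOTAL_MB, 128))      # no threshold
print(max_services(TOTAL_MB, 128, 60))  # assumed 60% utilization cap
```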
Messaging Bus Capacity Planning

Just like the Enterprise Service Bus, capacity planning on the Messaging Bus includes the processing unit,
memory, and high availability features. Its high availability features are more or less the same
as those of the Enterprise Service Bus. However, the Messaging Bus memory unit is not as straightforward as the
ESB memory unit. There are additional aspects we need to consider, such as the storage size needed for
persisting messages.
Capacity planning of memory on the Messaging Bus correlates with the message throughput (input and
output). The bus uses memory to receive messages from the service producer and send them to the service consumer
while keeping up performance. For example, from the previous section, a message has a size of 12,204 bytes. If
a service producer can create messages at a rate of up to 150 tps, we need at least around 1.74 MB of
memory to hold the messages for sending/receiving activity. Sometimes the service
consumer is not available (e.g. periodic maintenance, restarting instances). In this case, the messages
produced by the service producer should not be sent immediately but kept in persistent storage until the service
consumer becomes available again. With this mechanism, we will not lose any messages/transactions. The
persistent storage size depends on how long the service consumer is usually unavailable, and also on
the message rate itself. For example, suppose that in the worst case the service consumer is unavailable for 3
hours while the transaction load is at 150 tps. We then need 18,792 MB of persistent storage (1.74 MB x 3,600 x 3).

Message volume at 150 tps: 1.74 MB per second

Hours  Load (tps)  Persistent Size (MB)
2      150         12,528
3      150         18,792
4      150         25,056
5      150         31,320
6      150         37,584
Table 4 Persistent Unit Size Needed
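The persistent storage figure quoted in the text (18,792 MB for a 3-hour outage at 150 tps) follows from multiplying the per-second message volume by the outage duration. A minimal sketch, assuming the rounded 1.74 MB per second figure used in the article and an illustrative function name:

```python
SECONDS_PER_HOUR = 3600


def persistent_size_mb(outage_hours, mb_per_second=1.74):
    """Storage (MB) needed to persist all messages produced during an outage.

    mb_per_second is the message volume at the planned load
    (1.74 MB per second at 150 tps in the article's example).
    """
    return mb_per_second * SECONDS_PER_HOUR * outage_hours


for hours in range(2, 7):
    print(hours, round(persistent_size_mb(hours)))
```

Because 1.74 MB is itself a rounded value, an estimate computed from the exact 12,204-byte message size comes out slightly higher.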
Database Capacity Planning

The database is usually used to keep business logs, application configuration, and transaction checkpoints.
Business logs are kept for tracing transactions or troubleshooting the production runtime
environment. Application configuration refers to all the configuration used by applications when
starting or setting up their runtime environment. The transaction checkpoint is used by specific applications to
maintain the transaction state whenever a transaction fails.
Capacity planning of the database is slightly different from that of the other two components: besides the
processing unit and high availability features, storage must be sized. At the enterprise level, the database
usually keeps its data on persistent storage that is installed separately from it. Some key aspects when
defining storage capacity are:
1. Traffic distribution - the traffic pattern observed in the production runtime. We can divide it into two
segments:
a. Non-busy hours: 12 hours at 5% and 4 hours at 10% of peak load.
b. Busy hours: 4 hours at 50% in the morning, 2 hours at 80% at noon, and 2 hours at 100% at night.
2. Record size of a transaction - the space needed to store one business log record in the database
system, including:
a. Database overhead, used to define the overhead value for clob, lob, or other byte data.
b. Redo log
c. Archive log
d. Index
3. Data retention - defines how long we need to keep our business logs in persistent storage.

Figure 5 Database Storage Architecture Using SAN
For example, suppose we are required to keep business logs for 45 days for a 150 tps load, and one log record
contains at most 15.5 KB of data. We can estimate the storage capacity we need by using the
following formulation:
Estimated Peak TPS              150
Data Retention (days)            45

Traffic Level   Share   x Peak TPS   x Seconds/Hour   x Hours   = Transactions
Traffic 5%      0.05    150          3,600            12        324,000
Traffic 10%     0.1     150          3,600            4         216,000
Traffic 50%     0.5     150          3,600            4         1,080,000
Traffic 80%     0.8     150          3,600            2         864,000
Peak Traffic    1       150          3,600            2         1,080,000
Estimated Total Daily Traffic                                   3,564,000

Log Size per Record (KB)                   15.5
Size (KB)                            55,242,000
DB Overhead (multiplier)                      3
Total Size (KB)                     165,726,000
KB per GB                             1,048,576
Total Size Daily (GB)                       158
Data Retention (days)                        45
Grand Total (GB)                          7,112
Table 5 Storage Capacity Formulation
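The formulation in Table 5 can be reproduced with the sketch below. The constants are the example values from the article; the function names are illustrative, and treating the DB overhead as a plain multiplier of 3 is how the table's totals work out rather than something the text states explicitly.

```python
PEAK_TPS = 150
# (share of peak load in percent, hours per day) for each traffic segment.
SEGMENTS = [(5, 12), (10, 4), (50, 4), (80, 2), (100, 2)]
LOG_SIZE_KB = 15.5
DB_OVERHEAD = 3          # multiplier applied to the raw log size
RETENTION_DAYS = 45
KB_PER_GB = 1024 * 1024
SECONDS_PER_HOUR = 3600


def daily_transactions(peak_tps, segments):
    """Total transactions per day across all traffic segments."""
    total = sum(pct * peak_tps * SECONDS_PER_HOUR * hours
                for pct, hours in segments)
    return total // 100  # shares were given in percent


def daily_storage_gb(transactions, log_size_kb, overhead):
    """Storage consumed per day, in GB."""
    return transactions * log_size_kb * overhead / KB_PER_GB


daily_tx = daily_transactions(PEAK_TPS, SEGMENTS)
daily_gb = daily_storage_gb(daily_tx, LOG_SIZE_KB, DB_OVERHEAD)
total_gb = daily_gb * RETENTION_DAYS
print(daily_tx, round(daily_gb), round(total_gb))
```

Changing the retention period or the traffic shares in `SEGMENTS` gives a quick way to compare scenarios before committing to a storage size.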

Conclusion
The database system/storage is not a mandatory component of an SOA-based system, but it is very helpful for
seeing what is happening in our production environment. Mostly it provides data to management regarding
how much revenue is being produced by the SOA-based system. Knowing this, they can decide
whether it is worthwhile to put a service into an SOA-based system, considering all the costs and benefits.
About the Author: Masykur Marhendra

Masykur Marhendra Sukmanegara graduated from Bandung Institute of Technology. He
studied Informatics Engineering, graduating summa cum laude, and placed first in Global
Warming Solution Technology from the Environmental Ministerial Department. Starting his career
as a junior telecommunication developer at Switchlab, he implemented MSC encoding/decoding
modules based on 3GPP standard specifications. After the project was done, he
joined Javan IT Services consultancy as a J2EE Engineer, working on various
Web applications and mobile applications in various industries. His career as a
J2EE Engineer then continued at XL Axiata, the 2nd telecommunication provider in Indonesia,
working on south and north applications and integrating various telecommunication subsystems
with Java technology, such as IBM Netcool, SMSC, SMS Gateway, and others. In 2010, he
was appointed to handle the first SOA implementation at XL Axiata, for the Billing domain, as a pilot
project. In this role he designed and architected the SOA platform, delivered the service-oriented porting
development from satellite applications, ensured that the service-oriented principles were being
followed, maintained SOA capacity, and set the baseline for the next architecture, development,
and monitoring processes.
http://www.servicetechmag.com/contributors/masykurmarhendra
