
Sizing Guide

Sizing SAP BusinessObjects Data Services, Version 4.1

Reserved for SAP and SAP Partners Consumption and Usage
Document Version 1.1, March 2013

Copyright 2012 SAP AG. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without
the express permission of SAP AG. The information contained herein may be changed without prior
notice.

Some software products marketed by SAP AG and its distributors contain proprietary software
components of other software vendors.

Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.

IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390,
OS/400, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity,
Tivoli, and Informix are trademarks or registered trademarks of IBM Corporation in the United States
and/or other countries.

Oracle is a registered trademark of Oracle Corporation.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or
registered trademarks of Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C, World Wide Web Consortium,
Massachusetts Institute of Technology.

Java is a registered trademark of Sun Microsystems, Inc.

JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology
invented and implemented by Netscape.

MaxDB is a trademark of MySQL AB, Sweden.

SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned
herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany
and in several other countries all over the world. All other product and service names mentioned are
the trademarks of their respective companies. Data contained in this document serves informational
purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials are provided by SAP AG and its
affiliated companies ("SAP Group") for informational purposes only, without representation or
warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the
materials. The only warranties for SAP Group products and services are those that are set forth in
the express warranty statements accompanying such products and services, if any. Nothing herein
should be construed as constituting an additional warranty.

Disclaimer
Some components of this product are based on Java. Any code change in these components may cause
unpredictable and severe malfunctions and is therefore expressly prohibited, as is any decompilation
of these components.

SAP Library document classification: CUSTOMERS & PARTNERS

2012 SAP AG. All rights reserved.

Sizing SAP Data Services - SAP Customers


__________________________________________________________________________________

TABLE OF CONTENTS
1 Introduction
1.1 Functions of SAP BusinessObjects Data Services Data Quality
1.2 Functions of SAP BusinessObjects Data Services Text Data Processing
1.3 Functions of SAP BusinessObjects Data Services Data Integration
1.4 Architecture of SAP BusinessObjects Data Services
1.5 Common Factors that influence performance
2 Sizing Fundamentals and Terminology
3 Memory (RAM) Requirements
4 Initial Sizing for SAP BusinessObjects Data Services Data Quality
4.1 Assumptions
4.2 Batch sizing guidelines
4.3 Transactional sizing guidelines
5 Initial Sizing for SAP BusinessObjects Data Services Text Data Processing
5.1 Assumptions
5.2 Batch sizing guidelines
6 Initial Sizing for SAP BusinessObjects Data Services Data Integration Processing
6.1 Assumptions
6.2 Batch sizing guidelines
7 Miscellaneous
8 Comments and Feedback

Sizing SAP BusinessObjects Data Services - SAP Customers and Partners


__________________________________________________________________________________

1 INTRODUCTION
SAP BusinessObjects Data Services delivers a single enterprise-class solution for data integration, data
quality, data profiling and text data processing that allows you to integrate, transform, improve and
deliver trusted data to critical business processes. It provides one development UI, metadata
repository, data connectivity layer, run-time environment and management console enabling IT
organizations to lower total cost of ownership and accelerate time to value. With SAP BusinessObjects
Data Services, IT organizations can maximize operational efficiency with a single solution to improve
data quality and gain access to heterogeneous sources and applications.

1.1 Functions of SAP BusinessObjects Data Services Data Quality

• Data quality dashboards that show the impact of data quality problems on all downstream
  systems or applications
• Ability to apply data quality transformations to all types of data, regardless of industry or
  data domain, from structured to unstructured data, including customer, product, supplier, and
  material information
• Intuitive business user interfaces and data quality blueprints to guide you through the process of
  standardizing, correcting, and matching data to reduce duplicates and identify relationships
• Comprehensive global data quality coverage with support for over 230 countries
• Comprehensive reference data
• Broad, heterogeneous application and system support for both SAP and non-SAP sources and
  targets
• Prepackaged native integration of data quality best practices for SAP environments
• Optimized developer productivity and application maintenance through intuitive transformations,
  a centralized business rule repository, and object reuse
• High performance and scalability with software that can meet high volume needs through
  parallel processing, grid computing, and bulk data loading support
• Flexible technology deployment options, from an enterprise platform to intuitive APIs that allow
  developers to deploy data quality functionality quickly

1.2 Functions of SAP BusinessObjects Data Services Text Data Processing

• Analyzes text and automatically identifies and extracts entities, including people, dates, places,
  organizations and so on, in multiple languages.
• Looks for patterns, activities, events, and relationships among entities and enables their
  extraction.
• Goes beyond conventional character matching tools for information retrieval, which can only
  seek exact matches for specific strings. It understands the semantics of words.
• Supports extraction in 31 different languages.
• Supports not only text, HTML, and XML but also binary document formats such as PDF and
  Microsoft Word.
• Allows you to specify your own list of entities in a custom dictionary. These dictionaries enable
  you to store entities and manage name variations. Known entity names can be standardized using
  a dictionary.
• Allows you to write custom rules to customize extraction output, in addition to the pre-defined
  rules provided to support sentiment analysis, enterprises, and the public sector.
• Broad, heterogeneous application and system support for both SAP and non-SAP sources and
  targets
• High performance and scalability with software that can meet high volume needs through
  parallel processing, grid computing, and bulk data loading support


1.3 Functions of SAP BusinessObjects Data Services Data Integration

• Easy-to-configure transforms for typical complex tasks such as Slowly Changing Dimensions,
  Hierarchy Flattening, etc.
• Everything you need to build large jobs, including error handling, dependency handling and
  restartability
• Extensive operational statistics
• Rich connectivity to many sources and targets, most using the vendor's native format for
  maximum performance
• Easy-to-use parallelization and performance optimization options
• Functionality to simplify daily operations and project hand-over, such as the web-based
  management console, auto-documentation features, and impact and lineage information


1.4 Architecture of SAP BusinessObjects Data Services


The following diagram illustrates how SAP BusinessObjects Data Services components fit in with other
software in the SAP BusinessObjects portfolio.

More details about the architecture of Data Quality Management and SAP BusinessObjects Data
Services can be found in the SAP BusinessObjects Data Services Administrator's Guide.

1.5 Common Factors that influence performance


Many factors can influence the performance of SAP BusinessObjects Data Services.

• Access to source and targets - The bandwidth to the source and target can affect how fast
  data can be passed through the dataflow.
• Availability of additional RAM - If caching is needed, allocating enough free RAM within the
  system will speed up the dataflow, not only to cache lookup data but also reference data for
  Data Quality transforms.
• Configuration and System Landscape - This sizing guide was created with SAP
  BusinessObjects Data Services installed on the target database system. Source RDBMS was
  located on a separate machine.
• Competing applications - Running multiple resource intensive applications may cause
  competition for the resources and reduce the throughput for an individual job.
• Operating System - This sizing guide was created with Windows (2003/2008 Server) and
  Linux (RedHat 5/6, Suse 10/11) in mind. Please contact SAP for specifics on sizing for other
  operating systems.
• Degree of Parallelization (DOP) - The DOP setting can greatly influence performance when
  the appropriate hardware is utilized. Increasing this setting will generally increase throughput.


Additional factors that influence performance of data integration:

• Loader Method - Depending on the database and its version, the different loader options
  (regular load, Bulkloading, AutocorrectLoad) can differ dramatically in performance. There is
  no single best method; each has pros and cons depending on how the database vendor
  implemented it.
• Transactional Loaders - Loading data in one transaction means the dataflow cannot use
  parallel sessions to speed up the loading.
• Lookup and Join settings - SAP BusinessObjects Data Services lets the user choose the
  best lookup strategy. If the wrong strategy is chosen for the amount of data to be processed
  relative to the size of the lookup table, the performance impact can be severe.
• Heterogeneous sources or all in one database - If all data is in one database, or a database
  link exists between the databases, the SAP BusinessObjects Data Services optimizer has more
  options and can decide to delegate part or all of the processing to the database.

Additional factors that influence performance of text data processing:

• Document Characteristics - The format, length, and density of the input documents impact
  performance:
  o Format - XML and HTML require de-tagging before the text is processed, which has
    more overhead than processing text directly. Additionally, converting a binary document
    into a textual representation during processing has overhead.
  o Length - Longer input documents require more processing time.
  o Density - Denser input documents, rich in entities and facts, require more processing
    time.
• Rule-based Extraction - Using one or more rules to customize extraction may require more
  processing time.

Additional factors that influence performance of data quality:

• Complexity of processing - A data quality transform can do varying degrees of simple or
  complex processing based on the options set for the transform. Generally, the more complex
  the processing, the more hardware resources are needed.
• Location of reference data - Several of the cleansing transforms use reference data located
  on the file system and can be I/O or network dependent. The speed of this access will affect the
  performance of these transforms.


2 SIZING FUNDAMENTALS AND TERMINOLOGY


SAP provides general sizing information on the SAP Service Marketplace. For the purpose of this guide,
we assume that you are familiar with sizing fundamentals. You can find more information at
http://service.sap.com/sizing -> Sizing Guidelines -> General Sizing Procedures.
This section explains the most important sizing terms, as these terms are used within this document.

Sizing
Sizing means determining the hardware requirements of an SAP application, such as the physical
memory, CPU processing power, and I/O capacity. The size of the hardware and database is influenced
by both business aspects and technological aspects. This means that the number of users using the
various application components and the data load they put on the server must be taken into account.

Benchmarking
Sizing information can be determined using SAP Standard Application Benchmarks and scalability
tests (www.sap.com/benchmark). Released for technology partners, benchmarks provide basic sizing
recommendations to customers by placing a substantial load upon a system during the testing of new
hardware, system software components, and relational database management systems (RDBMS). All
performance data relevant to the system, user, and business applications are monitored during a
benchmark run and can be used to compare platforms.

Initial Sizing
Initial sizing refers to the sizing approach that provides statements about platform-independent
requirements of the hardware resources necessary for representative, standard delivery SAP
applications. The initial sizing guidelines assume optimal system parameter settings, standard business
scenarios, and so on.

Expert Sizing
This term refers to a sizing exercise where customer-specific data is being analyzed and used to put
more detail on the sizing result. The main objective is to determine the resource consumption of
customized content and applications (not SAP standard delivery) by comprehensive measurements. For
more information, see http://service.sap.com/sizing -> Sizing Guidelines -> General Sizing Procedures
-> Expert Sizing.

Configuration and System Landscaping


Hardware resource and optimal system configuration greatly depend on the requirements of the
customer-specific project. This includes the implementation of distribution, security, and high availability
solutions by different approaches using various third-party tools. In the case of high availability through
redundant resources, for example, the final resource requirements must be adjusted accordingly.
There are some "best practices" which may be valid for a specific combination of operating system and
database. To provide guidance, SAP created the NetWeaver configuration guides
(http://service.sap.com/instguides -> SAP NetWeaver).

3 MEMORY (RAM) REQUIREMENTS


Each section below lists an amount of RAM required per CPU core. The current SAP EIM standard for
the amount of memory (RAM) needed for SAP BusinessObjects Data Services is 4 GB per CPU core.
This is the minimum amount of RAM required per CPU core. Customers can always provision more
RAM than this minimum.
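
The rule above reduces to a single multiplication. The following is a minimal Python sketch,
assuming the 4 GB per core minimum applies uniformly to every core in the server; the function name
min_ram_gb is illustrative only and is not part of any SAP tool.

```python
GB_PER_CORE = 4  # minimum RAM per CPU core stated in this guide


def min_ram_gb(cpu_cores: int, gb_per_core: int = GB_PER_CORE) -> int:
    """Return the minimum RAM in GB for a server with the given number of CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores * gb_per_core


print(min_ram_gb(16))  # 16 cores * 4 GB per core = 64
```

The result is a floor, not a target: workloads that cache large lookup tables or reference data may
benefit from more memory.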


4 INITIAL SIZING FOR SAP BUSINESSOBJECTS DATA SERVICES DATA QUALITY
4.1 Assumptions

• There is a mix of data from various regions of the world. For this sizing guide we assume 50% NA,
  40% EMEA, 10% APJ. Major variations of this mix will affect needs for sizing.
• Only data quality transforms are considered. Utilizing non-data quality transforms in a job may
  affect the sizing requirements and performance of the overall job.
• The reference data used for the transforms that require it was located on a local disk and enough
  free RAM was allocated to allow for caching the majority of this reference data.
• Address validation transforms are able to perform certified and non-certified processing of
  addresses for those countries that provide a certification program. Running with certification mode
  enabled requires the collection of processing statistics and the use of stricter rules. This sizing
  guide assumes that address data is not being processed with certification mode enabled.
• The data quality transforms have options to enable and disable the generation of processing
  statistics for reporting purposes. The sizing in this document assumes that generation of these
  statistics is disabled.

4.2 Batch sizing guidelines


The input to the T-shirt sizing is the desired throughput (records per hour) for the batch scenario
that best matches the type of processing you are interested in. The throughput metric (records per
hour) reflects the fact that the data quality transforms process records, and a record is generally
a subset of a row of data. An input source may have rows composed of 50 columns, but customers may
be matching and cleansing only 20 of those columns.
For example, let's assume that you have 10 million records that need to be cleansed within a 5-hour
maintenance window. You would then need a minimum throughput of 2 million records per hour. Table 1
(the simple scenario, which cleanses name and address data) shows that this requirement matches the
Small category and that you would need 2 CPU cores to achieve this throughput. If your maintenance
window were 2 hours, you would need to process 5 million records per hour, which matches the Medium
category and would require 6 CPU cores.
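
The worked example above can be expressed as a small lookup. This is a minimal Python sketch, not an
SAP tool: the names SIMPLE_SCENARIO, required_throughput and pick_category are illustrative, the
thresholds are the three throughput categories of Table 1, and treating each category as an upper
bound on throughput is an assumption.

```python
import math

# (category, maximum records per hour, CPU cores) for the simple cleansing scenario.
# The core counts for Small and Medium are taken from the worked example in the text.
SIMPLE_SCENARIO = [
    ("Small", 2_000_000, 2),
    ("Medium", 5_000_000, 6),
    ("Large", 15_000_000, 16),
]


def required_throughput(records: int, window_hours: float) -> int:
    """Records per hour needed to finish within the maintenance window."""
    return math.ceil(records / window_hours)


def pick_category(records_per_hour: int, table=SIMPLE_SCENARIO):
    """Return (category, cores) for the smallest category that covers the rate."""
    for name, max_rate, cores in table:
        if records_per_hour <= max_rate:
            return name, cores
    raise ValueError("throughput exceeds the largest category")


print(pick_category(required_throughput(10_000_000, 5)))  # ('Small', 2)
print(pick_category(required_throughput(10_000_000, 2)))  # ('Medium', 6)
```

A rate above the Large category raises an error rather than extrapolating, because the guide gives
no figures beyond 15 million records per hour.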

Simple scenario - Cleanse party data

In this scenario, address and name data is being parsed, corrected, and standardized.
Table 1
Category   Throughput           Number of    Memory requirements
           (records per hour)   CPU Cores    in GB (per CPU Core)
Small      2 million            2            [missing]
Medium     5 million            6            [missing]
Large      15 million           16           [missing]


Moderate scenario - Cleanse and match party data

In this scenario, address and name data is being parsed, corrected, and standardized and then
matched on name and address to find duplicates.
Table 2
Category   Throughput           Number of    Memory requirements
           (records per hour)   CPU Cores    in GB (per CPU Core)
Small      2 million            [missing]    [missing]
Medium     5 million            [missing]    [missing]
Large      15 million           20           [missing]

Complex scenario - Cleanse, enhance, and perform householding on party data

In this scenario, address and name data is being parsed, corrected, and standardized, appended with
geographic data, and then matched to find records belonging to a household. For this householding,
matching is performed on the address, then on family names within the address, and finally on
individuals within the family.
Table 3
Category   Throughput           Number of    Memory requirements
           (records per hour)   CPU Cores    in GB (per CPU Core)
Small      2 million            [missing]    [missing]
Medium     5 million            10           [missing]
Large      15 million           24           [missing]

4.3 Transactional sizing guidelines


SAP BusinessObjects Data Services is also able to process data in transactional mode. Transactional
processing has slightly more overhead than batch because of the work required for processing client
requests and distributing them to the Job Servers via the Access Servers. The processing overhead for
handling requests in Data Services outweighs the variances in performance of the Data Quality
transforms, so whether you're doing cleansing, geocoding, matching, etc., the overall sizing
requirements are similar.
When sizing for transactional processing, it is important to consider response time requirements,
estimated peak transactional throughput needs, and the number of potential concurrent client requests
that need to be supported.

Simple to Moderate scenario - Cleansing, matching, and enhancing data
This scenario encompasses any simple to moderate Data Quality scenario that parses, standardizes,
and corrects, along with any scenario that matches or enhances name and address data.
Table 4
Number of Concurrent   Average Transaction   Peak Transactional      Number of    Memory requirements
Client Requests        Response Time         Throughput (per hour)   CPU Cores    in GB (per CPU Core)
[missing]              <50ms                 ~375 thousand           [missing]    [missing]
20                     <50ms                 ~1.5 million            [missing]    [missing]
50                     <50ms                 ~3.75 million           20           [missing]

Based on the numbers above, a general guideline for transactional processing would be that for every 5
concurrent clients, 2 CPU cores are required to maintain the <50ms response time.
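
The guideline above can be turned into a quick estimate. This is a minimal Python sketch; the
function name cores_for_clients is illustrative, and rounding a partial group of clients up to a
full group of 5 is an assumption, since the guide states the ratio only for multiples of 5.

```python
import math

CLIENTS_PER_GROUP = 5  # concurrent clients covered by one group of cores
CORES_PER_GROUP = 2    # CPU cores needed per group to keep responses under 50 ms


def cores_for_clients(concurrent_clients: int) -> int:
    """Estimate the CPU cores needed for a number of concurrent client requests."""
    if concurrent_clients < 1:
        raise ValueError("concurrent_clients must be at least 1")
    groups = math.ceil(concurrent_clients / CLIENTS_PER_GROUP)
    return groups * CORES_PER_GROUP


print(cores_for_clients(50))  # 10 groups * 2 cores = 20
print(cores_for_clients(7))   # 2 groups * 2 cores = 4
```

For 50 concurrent clients the estimate is 20 cores, which agrees with the 50-client row of Table 4.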

5 INITIAL SIZING FOR SAP BUSINESSOBJECTS DATA SERVICES TEXT DATA PROCESSING
5.1 Assumptions

• Only text data processing transforms are considered. Utilizing non-text data processing transforms
  in a job may affect the sizing requirements and performance of the overall job.
• The input data is stored on disk and there is no interaction between processes.
• Multiple input languages are supported but only English is used.
• Any extraction dictionaries or rules used are stored locally.

5.2 Batch sizing guidelines


The input to the T-shirt sizing is the desired throughput (MB per hour) for the batch scenario that
best matches the type of processing you are interested in. The throughput metric reflects the fact
that Text Data Processing (TDP) doesn't process records or rows of data, but unstructured documents.
For example, let's assume that you have 900 MB of text, where each document is less than 1 MB in
size, that needs to be processed within a 3-hour maintenance window. You would then need a minimum
throughput of 300 MB per hour. Table 1 in the Small Document scenario shows that this requirement
matches the first row and that you would need 1 CPU core to achieve this throughput. If your
maintenance window were 1 hour, you would need to process 900 MB per hour, which matches the second
row and would require 4 CPU cores.
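
The same reasoning applies to document volume. This is a minimal Python sketch that uses only the
two rows of the Small Document table whose core counts are stated in the example above (450 MB per
hour on 1 core, 1650 MB per hour on 4 cores); the names SMALL_DOC_ROWS and cores_needed are
illustrative.

```python
# (maximum MB per hour, CPU cores) for the Small Document scenario,
# limited to the two rows whose core counts are given in the text.
SMALL_DOC_ROWS = [(450, 1), (1650, 4)]


def mb_per_hour(total_mb: float, window_hours: float) -> float:
    """Throughput needed to process the data set within the window."""
    return total_mb / window_hours


def cores_needed(rate_mb_per_hour: float, rows=SMALL_DOC_ROWS):
    """Return the core count of the first row that covers the rate, or None."""
    for max_rate, cores in rows:
        if rate_mb_per_hour <= max_rate:
            return cores
    return None  # beyond the rows listed here


print(cores_needed(mb_per_hour(900, 3)))  # 300 MB/h fits the first row: 1
print(cores_needed(mb_per_hour(900, 1)))  # 900 MB/h fits the second row: 4
```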


Small Document scenario - Extracting entities from documents smaller than 1 MB

In this scenario, entities are being extracted from text product reviews.
Table 1
Data Set Size   Throughput      Number of    Memory requirements
(MB)            (MB per hour)   CPU Cores    in GB (per CPU Core)
85              450             1            [missing]
85              1650            4            [missing]
85              2750            [missing]    [missing]

Large Document scenario - Extracting entities from documents 1 MB or larger

In this scenario, entities are being extracted from text product reviews.
Table 2
Data Set Size   Throughput      Number of    Memory requirements
(MB)            (MB per hour)   CPU Cores    in GB (per CPU Core)
160             180             [missing]    [missing]
160             720             [missing]    [missing]
160             1360            [missing]    [missing]

Sentiment Analysis scenario - Extracting entities and facts

In this scenario, entities, sentiments, and requests are being extracted from text product reviews.
Table 3
Data Set Size   Throughput      Number of    Memory requirements
(MB)            (MB per hour)   CPU Cores    in GB (per CPU Core)
85              360             [missing]    [missing]
85              1200            [missing]    [missing]
85              1825            [missing]    [missing]


6 INITIAL SIZING FOR SAP BUSINESSOBJECTS DATA SERVICES DATA INTEGRATION PROCESSING
6.1 Assumptions

• A source database with the TPC-C schema is in use.
• A target server has both the target database and SAP BusinessObjects Data Services installed.
• Between the source database and the target server is a 1 Gbit Ethernet connection.
• The source database is never the bottleneck; otherwise everything downstream would simply wait
  for more data.
• Please note that to a large degree this is more about sizing the target database server than
  hardware for SAP BusinessObjects Data Services.

6.2 Batch sizing guidelines


The throughput metric (rows per second) was used because of the speed at which data integration can
process rows of data. Data quality uses a similar metric (records per hour), but the hour unit would
make the numbers quite large for data integration. The idea behind the performance metrics below was
to implement a realistic scenario, execute it on different platforms, and keep track of the resource
utilization. The TPC-C data model is an official and well-adopted example of an OLTP system. We
used it as the source and built a Data Warehouse with that data. The task of SAP BusinessObjects
Data Services was to load this Data Warehouse star schema data model and apply all the
transformations commonly used.
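
To see why the two metrics use different time units, it helps to convert between them. This is a
minimal Python sketch; the function names are illustrative, the 200,000 rows per second rate is a
round example value, and 15 million records per hour is the Large category used in the data quality
sizing tables.

```python
SECONDS_PER_HOUR = 3600


def rows_per_hour(rows_per_sec: float) -> float:
    """Convert a data integration rate (rows/sec) to an hourly rate."""
    return rows_per_sec * SECONDS_PER_HOUR


def records_per_sec(records_per_hour: float) -> float:
    """Convert a data quality rate (records/hour) to a per-second rate."""
    return records_per_hour / SECONDS_PER_HOUR


print(rows_per_hour(200_000))               # 720000000
print(round(records_per_sec(15_000_000)))   # 4167
```

A rate of 200,000 rows per second corresponds to 720 million rows per hour, whereas 15 million
records per hour is only about 4,167 records per second.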

Material Dimension
The material dimension is built out of two source tables: item and stock. The item table contains all
100,000 products and each product has different stock levels per warehouse (200 warehouses). This
results in 20 million stock rows. The idea is to store in the material dimension all item attributes plus the
information of the stock level.
The delta load use case is identical to the initial load use case except that the data has to be compared
with the target before loading the changes.
Table 1
Test Case      Throughput (rows/sec)   CPU Cores/Disks   Memory requirements
                                                         in GB (per CPU Core)
Initial Load   212,000                 16 core/8 disk    [missing]
Initial Load   227,000                 8 core/2 disk     [missing]
Initial Load   204,000                 4 core/1 disk     [missing]
Delta Load     219,000                 16 core/8 disk    [missing]
Delta Load     224,000                 8 core/2 disk     [missing]
Delta Load     202,000                 4 core/1 disk     [missing]

Customer Dimension (SCD2)


The customer dimension is a very simple use case. We read the customer source table and copy all
attributes over to the target dimension table. We query for the customer's WAREHOUSE_KEY and
DISCTRICT_KEY to know the surrogate keys used in the data warehouse.
The delta load use case below suffers from the table data not having a delta indicator. For each
delta, the entire source has to be read and compared with the target to find changes. Generating the
history for the SCD2 can be more complex: for each incoming row, we need to find the most recent
version stored in the target table, compare all priority fields, update the currently active version
to mark it as not current, and insert the new version.
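
The SCD2 steps described above can be sketched in a few lines. This is a minimal in-memory Python
illustration, not how Data Services implements its transforms: the DimRow class, the apply_scd2
function and the valid_from/valid_to columns are hypothetical, and the sketch compares all
attributes rather than a selected set of priority fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    customer_id: int
    attributes: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the currently active version


def apply_scd2(target: list, incoming: dict, load_date: date) -> int:
    """Apply incoming rows (customer_id -> attributes) to the target dimension.

    Returns the number of new versions inserted.
    """
    # Step 1: find the most recent (active) version of each customer.
    current = {row.customer_id: row for row in target if row.valid_to is None}
    inserted = 0
    for customer_id, attrs in incoming.items():
        active = current.get(customer_id)
        # Step 2: compare the incoming row with the active version.
        if active is not None and active.attributes == attrs:
            continue  # unchanged, nothing to do
        # Step 3: mark the active version as no longer current.
        if active is not None:
            active.valid_to = load_date
        # Step 4: insert the new version as the active one.
        target.append(DimRow(customer_id, dict(attrs), load_date))
        inserted += 1
    return inserted


dimension = []
apply_scd2(dimension, {1: {"city": "Berlin"}}, date(2013, 1, 1))
apply_scd2(dimension, {1: {"city": "Munich"}}, date(2013, 3, 1))
print(len(dimension))  # 2 versions of customer 1
```

Because every incoming row must be matched against the target before anything is written, the delta
load does more work per row than the initial load, which is in line with the lower delta throughput
reported for this dimension.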

Table 2
Test Case      Throughput (rows/sec)   CPU Cores/Disks   Memory requirements
                                                         in GB (per CPU Core)
Initial Load   101,000                 16 core/8 disk    [missing]
Initial Load   78,000                  8 core/2 disk     [missing]
Initial Load   56,000                  4 core/1 disk     [missing]
Delta Load     21,000                  16 core/8 disk    [missing]
Delta Load     15,000                  8 core/2 disk     [missing]
Delta Load     15,000                  4 core/1 disk     [missing]

Fact Load
The fact table load is a typical case where two source tables have to be joined (the order master
and order line item tables) and then some transformation has to happen, most importantly the lookup
of the surrogate keys in the dimension tables. Whereas the previous use cases had many attributes
per row and little transformation, this scenario has a narrow table with a large number of rows and
many transformations.
For the delta load, the main difference from the previous use cases is that a list of potential
changes can be identified in the source, e.g. by reading based on a timestamp.
Table 3
Test Case      Throughput (rows/sec)   CPU Cores/Disks   Memory requirements
                                                         in GB (per CPU Core)
Initial Load   200,000                 16 core/8 disk    [missing]
Initial Load   130,000                 8 core/2 disk     [missing]
Initial Load   80,000                  4 core/1 disk     [missing]
Delta Load     1,400,000               16 core/8 disk    [missing]
Delta Load     1,600,000               8 core/2 disk     [missing]
Delta Load     1,200,000               4 core/1 disk     [missing]

7 MISCELLANEOUS
Additional performance-related information can be found at
http://wiki.sdn.sap.com/wiki/display/BOBJ/Performance

8 COMMENTS AND FEEDBACK


Both are very welcome; please send them to Ken Beutler, Product Owner, Information Quality & Insight
(ken.beutler@sap.com)
