
11/20/2016

Optimizing BW Query Performance | Running SAP Applications on the Microsoft Platform


Running SAP Applications on the Microsoft Platform
This blog provides information about running SAP applications on the Microsoft platform. The blog is written by people who have been working with SAP on the Microsoft platform for decades.

Optimizing BW Query Performance

March 19, 2013 by Martin Merdes // 5 Comments



SAP BW query performance depends on many factors: hardware, database configuration, BW configuration, and, last but not least, BW cube design and BW query design. When running a query, there are several caches involved: disk caches, database caches, and the BW OLAP cache. You should keep this in mind when comparing the BW query performance of different configurations or even different systems. In the following, we discuss the configuration options that are specific to Microsoft SQL Server.

Prerequisites
First of all, you have to make sure that there is no bottleneck in the most important system resources: CPU, memory, and I/O. You can configure the maximum number of CPU threads used for a single database query (see below). However, SQL Server may reduce the actual number of threads used if there are not enough free worker threads. Therefore, the runtime of the same query can vary greatly, depending on the current system load. A memory bottleneck on SQL Server may result in additional I/O. When there are sufficient CPU and memory resources, repeatedly run queries are fully cached. In this case the performance of the I/O system is not crucial.
A huge part of the overall runtime of a BW query can be consumed on the SAP application server, not on the
database server. Therefore the system resources on the application server are important, too. A simple BW
query typically consists of 3 parts:
A database query running against the F fact table of the cube
A parallel running database query against the E fact table of the cube
An aggregation of the two result sets running in the OLAP processor on the application server (this process has nothing to do with BW Aggregates, see below)
If you have never run BW cube compression (see below), then all data is in the F fact table. If you run BW cube compression after each data load, then all data is in the E fact table. In both cases there is no need to aggregate the two result sets, which reduces the BW query runtime on the application server.
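The two parallel fact-table queries can be sketched roughly as follows. This is only a sketch: BW generates the actual statements, and the cube name SALES and all table and column names below are hypothetical.

```sql
-- Sketch only: BW generates the actual statements; all object names here
-- (cube SALES, its fact and dimension tables) are hypothetical.

-- Query 1: uncompressed requests in the F fact table
SELECT D.SID_0CALMONTH, SUM(F.AMOUNT) AS AMOUNT
  FROM [/BIC/FSALES] F
  JOIN [/BIC/DSALEST] D ON D.DIMID = F.KEY_SALEST
 GROUP BY D.SID_0CALMONTH;

-- Query 2, running in parallel: compressed requests in the E fact table
SELECT D.SID_0CALMONTH, SUM(E.AMOUNT) AS AMOUNT
  FROM [/BIC/ESALES] E
  JOIN [/BIC/DSALEST] D ON D.DIMID = E.KEY_SALEST
 GROUP BY D.SID_0CALMONTH;

-- The OLAP processor on the application server then merges both result sets.
```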

https://blogs.msdn.microsoft.com/saponsqlserver/2013/03/19/optimizingbwqueryperformance/

In SAP BW, data is typically loaded into cubes using BW process chains. These chains contain steps for dropping/re-creating indexes and updating the database statistics of small dimension tables. If one of these steps fails, you may clearly see a BW query performance issue.

1. Number of requests in the F fact table


The F fact table of a cube is optimized for data load (DTP: Data Transfer Process), request deletion, and BW cube compression. On SQL Server, the F fact table is partitioned by the data load request ID, as is the case for most other database systems supported by SAP BW. Having a separate partition per request ID, the request deletion is very fast and does not consume considerable transaction log space. The BW cube compression also benefits from this, because part of the BW cube compression is the deletion of the compressed requests. However, BW query performance does not benefit from this kind of partitioning. Quite the contrary: having more than a few hundred partitions results in decreased BW query performance. We ran a suite of queries against F fact tables with the same data but a different number of partitions:

Number of partitions on F fact table    10      100     200     500
cold (empty SQL data cache)             100%    125%    152%    202%
warm (table already in data cache)      100%    110%    135%    142%

Average query runtime compared with 10 partitions, in percent; lower number is better

Since the partitions on the F fact table are created and dropped automatically, a BW administrator has no direct influence on the number of partitions. However, there are two ways to reduce the number of partitions.
First of all, you should keep the number of data loads (DTPs) into a cube as low as possible by avoiding small requests. You can combine many small requests into a single, large one by loading the requests first into a DataStore Object (DSO). The following DTP from the DSO to the cube then creates fewer but larger requests in the cube.
Secondly, you can perform BW cube compression, which reduces the number of partitions again. For best query performance you should compress all requests anyway, which deletes all rows in the F fact table. BW query performance is still good if just most of the requests are compressed. Some customers keep new requests uncompressed in the F fact table for at least one week. Thereby they can easily delete faulty requests, which were loaded into the cube by mistake.
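You can check the current number of partitions of an F fact table directly in SQL Server; the table name below is hypothetical:

```sql
-- Number of partitions of a (hypothetical) F fact table
SELECT COUNT(*) AS partition_count
  FROM sys.partitions
 WHERE object_id = OBJECT_ID(N'[/BIC/FSALES]')
   AND index_id IN (0, 1);  -- count the heap/clustered index only, not secondary indexes
```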

2. BW cube compression
The E fact table of a cube is optimized for query performance. The BW cube compression process moves (aggregates) single requests from the F fact table to the E fact table. Depending on the kind of data, the total number of rows can be dramatically reduced by this aggregation. SAP generally recommends BW cube compression for performance reasons, see
http://help.sap.com/saphelp_nw73/helpdata/en/4a/8f4e8463dd3891e10000000a42189c/content.htm. BW cube compression has further advantages for inventory cubes. As a side effect, it reduces the number of partitions on the F fact tables.
For Microsoft SQL Server, we did not always benefit from BW cube compression in the past. The reason was the index layout, which used a heap for the E fact table when having conventional B-tree indexes.


However, when using the SQL Server 2012 columnstore index, we strongly benefit from BW cube compression for BW query performance. The process of cube compression became much faster, although it contains an additional step: it fully reorganizes the columnstore index. Since the creation of a columnstore index scales very well, we use 8 CPU threads for this by default. You can change the default by setting the RSADMIN parameter MSS_MAXDOP_INDEXING using report SAP_RSADMIN_MAINTAIN.
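The statement issued during cube compression has roughly the following shape. This is a sketch: the index and column names are hypothetical, and MSS_MAXDOP_INDEXING determines the MAXDOP value.

```sql
-- Rebuild of the E fact table's columnstore index during BW cube compression (sketch)
CREATE NONCLUSTERED COLUMNSTORE INDEX [/BIC/ESALES~CS]
    ON [/BIC/ESALES] (KEY_SALESP, KEY_SALEST, AMOUNT)
    WITH (DROP_EXISTING = ON, MAXDOP = 8);  -- 8 = default of MSS_MAXDOP_INDEXING
```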

3. SQL Server compression


In typical customer scenarios, with sufficient CPU resources and a fast I/O system, the performance impact of using SQL Server PAGE compression is minimal. When CPU is a bottleneck, PAGE compression may result in reduced query performance. On the other hand, query performance increases with PAGE compression if I/O is a bottleneck. Therefore you should simply keep SAP's default: PAGE compression.
In our suite of test queries we have clearly seen this expected behavior of PAGE compression. When the table was already in the SQL Server data cache, the increased CPU usage resulted in slightly slower BW queries. On the other hand, query performance was better if the data had to be read from disk first.

SQL Server compression                  NONE    ROW     PAGE
cold (empty SQL data cache)             100%    76%     75%
warm (table already in data cache)      100%    101%    110%

Average query runtime compared with the uncompressed (NONE) F fact table, in percent; lower number is better

The disk space savings of ROW compression were as expected. The additional space savings of PAGE compression were only moderate, because the F fact table contains only numeric fields. The best compression ratios have been seen with string fields. A large number of partitions also results in increased space usage per table.
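If you want to verify the effect of compression yourself, SQL Server can estimate the savings before you rebuild a table; the table name below is again hypothetical:

```sql
-- Estimate the savings of PAGE compression for a (hypothetical) F fact table
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'/BIC/FSALES',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Rebuild with SAP's default, PAGE compression
ALTER TABLE [/BIC/FSALES] REBUILD WITH (DATA_COMPRESSION = PAGE);
```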


[Chart] Average disk space usage compared with the uncompressed (NONE) F fact table with 100 partitions, in percent; lower number is better

4. Degree of parallelism
By default, SAP BW requests two SQL Server threads per query by using a MaxDop hint. You can change this default behavior by setting the RSADMIN parameter MSS_MAXDOP_QUERY using report SAP_RSADMIN_MAINTAIN.
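The hint appears at the end of the BW-generated SQL statement; a sketch with hypothetical object names:

```sql
-- BW appends the MaxDop hint to its generated queries (sketch)
SELECT D.SID_0CALMONTH, SUM(F.AMOUNT) AS AMOUNT
  FROM [/BIC/FSALES] F
  JOIN [/BIC/DSALEST] D ON D.DIMID = F.KEY_SALEST
 GROUP BY D.SID_0CALMONTH
OPTION (MAXDOP 2);  -- 2 = default of MSS_MAXDOP_QUERY
```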
We measured the performance when running BW queries on BW compressed cubes with a columnstore index, depending on the degree of parallelism used. Two suites of queries were run against two different cubes. The biggest improvement was seen when moving from 1 CPU thread to 2 threads (this is typically only the case for columnstore indexes, not for conventional B-tree indexes). Increasing to 4 threads improved performance noticeably. A further increase to 8 threads did not have any impact in many cases:

MaxDop                        1       2       4       8
Cube 1 (100,000,000 rows)     1.00    8.13    12.26   15.61
Cube 2 (10,000,000 rows)      1.00    4.11    4.62    4.39

Average speed increase of columnstore indexes compared with MaxDop 1, in factors; higher number is better

The impact of MaxDop depends on many factors like cube design, query design, actual data, and the hardware used. However, in all cases we have clearly seen the negative impact of using only a single CPU thread per query. That is why you should never set MSS_MAXDOP_QUERY to 1.
When there is a temporary CPU bottleneck, SQL Server can reduce the actual number of threads used for a query. In extreme cases, this could end up in one thread, even when MSS_MAXDOP_QUERY is set to 4 or higher. In particular, you can run into this issue when running BW queries against MultiProviders. A BW MultiProvider is a logical cube, which retrieves the data from multiple basis cubes at the same point in time. This results in many simultaneously running SQL queries: 2 SQL queries (one on the F fact table and one on the E fact table) per basis cube. When using MaxDop 4 for a MultiProvider consisting of 4 basis cubes, SQL Server may need up to 32 threads (2 tables * 4 cubes * MaxDop 4 = 32 threads). On a database server with fewer than 32 CPU threads this can result in actually using MaxDop 1 for at least one SQL query. Keep in mind that the response time of a BW query is determined by the slowest participating SQL query.
That is why the default value of MSS_MAXDOP_QUERY (= 2) was chosen relatively low. On a database server with more than 64 CPU threads, you may increase MSS_MAXDOP_QUERY to 3 or 4, depending on your workload while BW queries are running.
For best BW query performance, you should avoid running BW queries during periods of high workload. For example, BW cube compression can be very CPU intensive, since it includes the re-creation of the columnstore index using 8 CPU threads by default.

5. SQL Server 2012 Columnstore


Using the SQL Server 2012 columnstore results in 50% to 80% space savings on E fact tables, even compared with SQL Server PAGE compression. Customers report an increased query performance of a factor of 3 to 7 on average, compared with conventional B-tree indexes on the same system. Some BW queries hardly benefit from the columnstore, while a few queries are even 20 times faster. However, for SAP BW and SQL Server 2012, the columnstore is only available for the E fact table. Therefore you must perform BW cube compression if you want to benefit from the columnstore.
A columnstore index is optimized for large tables having some million rows. It is described in detail in a white paper (http://scn.sap.com/docs/DOC33129) and in SAP note 1771177. Cubes with small fact tables, having only a few thousand rows, hardly benefit from the columnstore. However, the query performance of such small cubes is typically not an issue. Therefore we recommend using the columnstore for all of your BW cubes, independent of their current size. Keep in mind that BW cubes typically keep growing over time.

6. Partitioning of the E fact table


In SAP BW you can optionally and manually partition the E fact table by a time characteristic. In contrast to the partitioning of the F fact table, this partitioning is intended to increase query performance. In the past we have seen mixed results when testing the impact of partitioning on BW query performance on SQL Server. Therefore it was typically not worth repartitioning a cube. However, when using the SQL Server columnstore, we benefit from partitioning of large E fact tables.
Our test suite running on a 100,000,000 row cube showed the following results with MaxDop 4:

Index type                       B-tree               columnstore
Partition type                   non-part.   part.    non-part.   part.
cold (empty SQL buffer cache)    1.00        1.06     4.77        9.83
warm (data in cache)             1.00        0.92     6.78        7.64
cold (empty SQL buffer cache)                         1.00        2.03
warm (data in cache)                                  1.00        1.14

Average speed increase in factors, higher number is better:
compared with B-tree, non-partitioned (lines 1 and 2)
and compared with columnstore, non-partitioned (lines 3 and 4)

For SQL Server columnstore indexes we see consistent query performance improvements when partitioning the E fact table. The performance improvements compared with the non-partitioned columnstore are moderate (factor 1.14) if the data is already in cache. However, this is an additional performance increase compared with conventional B-trees. For example, where a non-partitioned columnstore index was 6.78 times faster than a B-tree index, the partitioned columnstore index was 7.64 times faster.
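The E fact table partitioning by a time characteristic maps to a SQL Server partition function and scheme. A sketch, with hypothetical names and boundary values (BW creates these objects itself):

```sql
-- Sketch: partitioning an E fact table by a time characteristic.
-- Names and boundary values are hypothetical; BW creates these objects itself.
CREATE PARTITION FUNCTION PF_SALES_CALMONTH (int)
    AS RANGE LEFT FOR VALUES (201301, 201302, 201303);

CREATE PARTITION SCHEME PS_SALES_CALMONTH
    AS PARTITION PF_SALES_CALMONTH ALL TO ([PRIMARY]);

-- The clustered index of the E fact table is then created on this scheme,
-- so each calendar month ends up in its own partition.
```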


A columnstore index is optimized for large tables having some million rows. Internally, a columnstore index is divided into segments. Each segment contains up to one million rows. When using 8 CPU threads for creating the columnstore index, you typically see 8 segments per column, which are not fully filled with one million rows each. When using partitioning for small tables, you further decrease the average segment size of the columnstore index. Having too-small segments decreases query performance. Therefore you should consider partitioning only for tables with at least a few dozen million rows.
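You can inspect the segment sizes of a columnstore index in SQL Server to see whether partitioning has made the segments too small; the table name below is hypothetical:

```sql
-- Row count per columnstore segment of a (hypothetical) E fact table
SELECT s.segment_id, s.row_count
  FROM sys.column_store_segments AS s
  JOIN sys.partitions AS p ON p.hobt_id = s.hobt_id
 WHERE p.object_id = OBJECT_ID(N'[/BIC/ESALES]')
   AND s.column_id = 1;  -- all columns of a row group share the same segment row count
```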
To fully benefit from the performance improvements of the partitioned columnstore, you first have to apply the latest version of SAP note 1771177 (which will be released in April 2013). Then you should re-create the indexes of existing, partitioned cubes. The new code improvements optimize columnstore segment elimination, in addition to having partitions. Therefore, you get a performance benefit for BW queries containing a filter on a time characteristic, even when creating only a single partition.

7. BW Aggregates
A BW aggregate is neither a configuration option nor is it SQL Server specific. However, we want to discuss
aggregates here, since they were the preferred means to increase BW query performance in the past. A BW
aggregate is a copy of an existing BW basis cube with a restricted number of characteristics and/or applied
filters. BW aggregates are optimized for one or a few BW queries. Therefore you typically have to create many
aggregates in order to support all BW queries running against a single basis cube. Technically, a BW
aggregate looks like a conventional cube. It has two fact tables, each of them having its own database
indexes. Since BW aggregates are logical copies of the cube, they have to be manually loaded and
compressed each time data is loaded into the basis cube.
In contrast, a columnstore index on a basis cube is maintained automatically. The size of the cube decreases when using the columnstore, instead of increasing by creating additional copies of the cube. There is no need to create new aggregates to support new or ad-hoc BW queries when using the columnstore. Therefore the columnstore is the new preferred means of increasing BW query performance. Once you define a columnstore index on a BW cube on Microsoft SQL Server, all existing aggregates of this cube are deactivated. This is done because the BW OLAP processor is not aware of the columnstore. It prevents a query from using an aggregate (which never has a columnstore index) once the basis cube has one.

Comparing BW query performance


The easiest way to measure the runtime of a BW query is using SAP transaction RSRT. The runtime of a BW query consists of many components: SQL query runtime, BW OLAP processor runtime, the time for transferring the result set to the BW application server and the BW client (for example BEx or BOBJ), and the rendering of the results on the client.
When comparing BW query runtimes between different systems or different configurations, you have to ensure that you measure under the same conditions. For example, when running a SQL query as part of a BW query, SQL Server may or may not perform physical I/O. By repeating the same BW query you can ensure that all data is already in the SQL Server buffer cache. Alternatively, you could run the SQL command DBCC DROPCLEANBUFFERS to ensure that no data is in the buffer cache.
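A cold/warm measurement on a test system can be sketched as follows; never flush the buffer cache on a production system, and note that the table name is hypothetical:

```sql
-- Cold run: flush all clean pages, so the query must read from disk (test systems only!)
CHECKPOINT;               -- write dirty pages first, so DROPCLEANBUFFERS flushes everything
DBCC DROPCLEANBUFFERS;

SET STATISTICS TIME ON;   -- elapsed/CPU time is printed in the Messages output
SELECT COUNT(*) FROM [/BIC/ESALES];  -- hypothetical table; run twice for cold vs. warm
SET STATISTICS TIME OFF;
```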
If you want to measure the database performance, then you should turn off the SAP OLAP cache. Otherwise you typically measure the cache performance (single-threaded performance of the SAP application server) rather

than the database performance. In the worst case, the database is not accessed at all if the OLAP cache can be used. You can turn off the OLAP cache at BW cube level using the InfoProvider Properties in SAP transaction RSA1.

Alternatively, you can turn off the OLAP cache at BW query level using SAP transaction RSRT. However, in productive customer systems the OLAP cache is typically turned on. So why should you turn it off for performance tests? There are two reasons: Firstly, the likelihood of a fully filled OLAP cache is much higher in a test environment than in a productive system. Therefore you would benefit much more from the OLAP cache in a test system, which results in unrealistic measurements. Secondly, you typically want to tune the slowest BW queries running under the worst conditions, when the OLAP cache does not happen to contain fitting entries.
Some BW queries are by nature independent of database performance. When there is a big result set with millions of rows, a huge part of the runtime is consumed by transferring the result set from the database server to the SAP application server, and finally to the BW client. In this case you are measuring the network throughput rather than the database performance.


Join the conversation


FK

4 years ago

Hi Martin,
we have been productive with the columnstore index for 10 days. Approximately 80% of all cubes were converted; the rest are daily full loads with a data volume below 2 million rows each. The biggest 6 cubes are in the 60 million rows range. We have eliminated nearly all aggregates; the few remaining aggregates all have a reduction factor > 20 and are used for highly aggregated daily reports.
User response is great. Naturally, the ones with formerly long-running queries say it is remarkably faster, and the others with former response times below 10 sec see no significant improvement. For some it opens new possibilities of analyzing and controlling data quality. If you can execute two or three steps in the time it took for one step before the upgrade, you are more interested and motivated to do so.
The effect is not so prominent for users in an Excel environment because of transfer time and Excel overhead, but significant enough to be recognized.
The columnstore index in SAP BW may not be as impressively fast as SAP BW on HANA, but it is a really big step performance-wise. And you can get it for only the cost of properly configured standard hardware, with little upgrade and conversion effort and, most important, no license overhead.
Thanks

NK

3 years ago

Hi Martin,
One of your blogs mentions the number of indexes dropping from around 10 to 2 for an E fact table. We have set up a POC environment and have executed MSSCSTORE for a test cube. The E table now contains the earlier dimension indexes created by the DDIC, a primary index P, and the CS index.
My question is whether we need to drop the dimension indexes on the E fact table. If yes, then won't they get re-created again during the next transport? Is this what you allude to as the reduction of indexes?
Cheers

Martin Merdes

3 years ago

Hi NK,
the indexes 010, 020, … still exist in the DDIC, but not on the DB, once you create a CS index. If the indexes still exist on the DB (which I have never seen or heard about), then open a support message at SAP in component BW-SYS-DB-MSS.


Thanks
Martin

NK

3 years ago

Thank you for the response Martin. Understood, my mistake. Cameron clarified too.
They do not exist on the database.
Cheers

Naresh Anakamatla

3 years ago

Very nice information
