
A Common Database Approach for OLTP and OLAP
Using an In-Memory Column Database – Hasso Plattner
Nuzhi Meyen – 168247G
Content

▪ Introduction
▪ Column Storage
▪ Suitability of Column Storage for Update Intensive
Applications
▪ Consequences of Insert Only Approach
▪ Memory Consumption of Column Storage Vs Row Storage
▪ Impact of Column Store Databases on Application
Development



Introduction

▪ Enterprise RDBMS landscapes usually come in two flavors:


1. Online Transaction Processing (OLTP) Systems
2. Online Analytical Processing (OLAP) Systems

▪ Both are based on relational theory but rely on different technical approaches. This paper explores running both OLTP and OLAP workloads on a single RDBMS.



Introduction (Cont…)

▪ Over the years, growth in main memory capacity and CPU parallelism has enabled enterprise systems to grow in functionality and data volume.
▪ In-memory column storage has already been used successfully for OLAP. This paper attempts to answer whether an in-memory column store database is also feasible for OLTP systems, which would enable both OLTP and OLAP queries to be handled by a single column store database.



Column Storage

[Figure: column storage layout. Source: SAP HANA – An Introduction]



Column Storage (Cont…)

▪ The size of a column store table is significantly smaller than that of the equivalent row-based table. In the example given in the paper, a table with 35 million rows and 160 attributes consumes 35 GB as a row-based table, while the column store table requires only 8 GB.
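To make the effect concrete, below is a minimal back-of-the-envelope sketch (not from the paper) of dictionary compression on a single low-cardinality attribute. The row count matches the example above, but the 200 distinct values and 20-byte field width are invented for illustration.

```python
# Hypothetical model of dictionary compression for one low-cardinality column.
# The distinct-value count and field width are invented, not the paper's data.

rows = 35_000_000
distinct_values = 200            # e.g. a "country" attribute
value_width_bytes = 20           # fixed-width text field in a row store

# Row store: every row materialises the full value.
row_store_bytes = rows * value_width_bytes

# Column store: dictionary with each distinct value once, plus a bit-packed
# integer code per row (enough bits to address every dictionary entry).
bits_per_code = (distinct_values - 1).bit_length()
column_store_bytes = distinct_values * value_width_bytes + rows * bits_per_code / 8

print(f"row store    ≈ {row_store_bytes / 2**30:.2f} GiB")
print(f"column store ≈ {column_store_bytes / 2**30:.2f} GiB")
```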



Column Storage (Cont…)

[Figure: compression due to columnar format. Source: SAP HANA – An Introduction]



Column Storage (Cont…)

▪ Because real-world scenarios are based on set operations, and these operate on the compressed integer format, the author observed a performance gain of 100-1000 compared to non-compressed data formats.
▪ In addition, parallelization can easily be applied.
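As a small illustration of both points (not from the paper), the sketch below scans a dictionary-encoded column by comparing integer codes and splits the scan into independent chunks. The column contents, chunk count, and worker count are invented.

```python
# Sketch: a filter over a dictionary-encoded column is just an integer scan,
# and the scan splits naturally into independent chunks. Illustrative only.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

dictionary = ["open", "paid", "overdue"]                       # hypothetical status column
codes = np.random.randint(0, len(dictionary), size=10_000_000).astype(np.uint8)

target = dictionary.index("paid")                              # compare codes, not strings

def count_matches(chunk):
    # Vectorised integer comparison over one chunk of the column.
    return int((chunk == target).sum())

chunks = np.array_split(codes, 8)                              # one chunk per worker
with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(count_matches, chunks))

print(f"'paid' items: {total}")
```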



Column Storage (Cont…)

[Figures: dictionary encoding and parallel execution. Source: SAP Simple Finance – An Introduction]



Column Storage (Cont…)

▪ One advantage is that, at current CPU speeds, there is no need to provide even a primary key index; a full table scan can be used instead.
▪ Since applications are no longer restricted to selecting data only along predefined navigation paths, this leads to a better separation of concerns between the database layer and the application layer.
▪ The author suggests that hard disks have become "yesterday's tape" and should be used only for transaction logging and snapshot recovery.
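A rough calculation of why a full column scan can stand in for a key index. The scan rate and core count below are assumptions for illustration, not figures taken from the paper.

```python
# Back-of-the-envelope: full column scan instead of an index lookup.
# The scan rate and core count are assumed, not taken from the paper.

rows = 35_000_000
bytes_per_code = 4                              # dictionary code per row
scan_rate_bytes_per_ms_per_core = 2_000_000     # assumed ~2 MB/ms per core
cores = 8

column_bytes = rows * bytes_per_code
scan_time_ms = column_bytes / (scan_rate_bytes_per_ms_per_core * cores)
print(f"full scan of one column ≈ {scan_time_ms:.1f} ms")   # ≈ 8.8 ms
```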
Suitability of Column Storage for
Update Intensive Applications
▪ Updates on column store databases are considered expensive because, even though the data is in memory, the attribute dictionaries might have to be recalculated.
▪ The author analyzed the updates in SAP's financial system and categorized them into three main types:
1. Aggregate Updates
2. Status Updates
3. Value Updates



Suitability of Column Storage for
Update Intensive Applications (Cont…)
▪ Aggregate Updates –
Most of the updates taking place in financial applications apply to total records that follow the structure of the coding block. The coding block can contain, e.g., account number, legal organization, year, etc. These total records are basically materialized views on the journal entries, kept to give fast response times when aggregations are requested. In a column store, aggregates can be computed on the fly, so the more instances of aggregates are requested, the better the relative performance of column storage becomes (see the sketch below).
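A minimal sketch of the idea, assuming a toy journal-entry layout (the field names and values are invented): totals are computed on request from the line items instead of being maintained as materialized aggregate records.

```python
# Sketch: totals computed on the fly from journal entries instead of being
# maintained as materialized aggregate records. Field names are made up.

from collections import defaultdict

journal_entries = [
    # (account, legal_org, year, amount)
    ("4711", "DE01", 2009, 120.0),
    ("4711", "DE01", 2009, -20.0),
    ("4712", "US02", 2009, 310.0),
]

def totals_by_coding_block(entries):
    """Aggregate journal entries by (account, legal_org, year) on request."""
    totals = defaultdict(float)
    for account, legal_org, year, amount in entries:
        totals[(account, legal_org, year)] += amount
    return dict(totals)

print(totals_by_coding_block(journal_entries))
# An insert is now just appending a journal entry; no totals record to update.
```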



Suitability of Column Storage for
Update Intensive Applications (Cont…)
▪ Status Updates –
Status variables (e.g. unpaid, paid) typically use a predefined
set of values and thus create no problem when performing an
in-place update since the cardinality of the variable does not
change.
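A small sketch, assuming a byte-sized code array and a fixed status dictionary (names invented): the in-place update only rewrites one code and never touches the dictionary.

```python
# Sketch: a status change only rewrites one small dictionary code in place;
# the dictionary itself is untouched because the set of statuses is fixed.

from array import array

status_dictionary = ["unpaid", "paid"]            # predefined, stable value set
status_codes = array("B", [0, 0, 1, 0])           # one byte-sized code per row

def set_status(row_id, new_status):
    status_codes[row_id] = status_dictionary.index(new_status)   # in-place update

set_status(1, "paid")
print([status_dictionary[c] for c in status_codes])   # ['unpaid', 'paid', 'paid', 'unpaid']
```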



Suitability of Column Storage for
Update Intensive Applications (Cont…)
▪ Value Updates – A column-store table comprises a main
store and a delta store.

[Figure: main store and delta store. Source: SAP Simple Finance – An Introduction]
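A minimal sketch of the main/delta split with invented class and method names; it is not SAP's implementation, only an illustration of reads spanning both stores, writes going to the delta store, and a periodic merge.

```python
# Sketch of the main/delta split: reads see main + delta, value "updates" are
# appended to the delta store, and a merge folds the delta into the main store.
# Class and method names are invented for illustration.

class ColumnTable:
    def __init__(self):
        self.main_store = []    # read-optimised, compressed in a real system
        self.delta_store = []   # write-optimised buffer for recent changes

    def insert(self, row):
        self.delta_store.append(row)                  # writes only touch the delta

    def scan(self):
        return self.main_store + self.delta_store     # queries see both stores

    def merge(self):
        # Periodic merge: rebuild the main store including the delta rows.
        self.main_store += self.delta_store
        self.delta_store = []

t = ColumnTable()
t.insert({"item": 1, "amount": 100.0})
t.insert({"item": 1, "amount": 90.0})   # corrected value: a new version, not an overwrite
print(t.scan())
```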



Suitability of Column Storage for
Update Intensive Applications (Cont…)

[Figure. Source: SAP Simple Finance – An Introduction]



Consequences of Insert Only Approach

▪ Application Level Locks –
Many business transactions deal with several relational tables and multiple tuples of one table simultaneously. The applications "think" in objects, a layer established on top of the relational model. For example, in the reconciliation of open items in accounts payable or receivable, multiple open items are marked as paid in one transaction. The lock is therefore not taken on the accounting line-items table but on the business objects creditor or debtor.
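A hedged sketch of object-level locking (names invented): one lock per business object, e.g. a creditor, covers all of its line-item writes.

```python
# Sketch: locking at the level of the business object (e.g. one creditor)
# rather than at the level of individual line-item rows. Names are invented.

import threading
from collections import defaultdict

object_locks = defaultdict(threading.Lock)   # one lock per business object key

def mark_item_paid(item_id):
    pass  # placeholder for the actual database write

def reconcile_open_items(creditor_id, item_ids):
    # One application-level lock covers all line items of this creditor.
    with object_locks[("creditor", creditor_id)]:
        for item_id in item_ids:
            mark_item_paid(item_id)          # many row-level writes, one object lock

reconcile_open_items("C-1001", [1, 2, 3])
```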



Consequences of Insert Only Approach

▪ Database Level Locks –
Inserts are added to the delta store at the appropriate partition of a table. The timestamp at the start of a query defines which tuples are valid (only tuples with a lower timestamp). If an insert is in progress (one or several), the start timestamp of a new query is set to the timestamp of the insert transaction minus one, so the ongoing insert(s) are again ignored. This procedure is equivalent to snapshot isolation via timestamps.
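A small sketch of the timestamp rule described above; the timestamps and tuples are invented.

```python
# Sketch of timestamp-based visibility: a query only sees tuples whose commit
# timestamp is lower than the query's start timestamp. Illustrative only.

def query_start_timestamp(now, in_progress_insert_ts=None):
    # If an insert transaction is still running, start the query just before
    # it, so the ongoing insert is ignored (snapshot isolation via timestamps).
    return in_progress_insert_ts - 1 if in_progress_insert_ts is not None else now

def visible(versioned_tuples, query_ts):
    # Only tuples committed before the query's start timestamp are valid.
    return [row for ts, row in versioned_tuples if ts < query_ts]

delta_store = [(10, {"item": 1}), (11, {"item": 2}), (15, {"item": 3})]
qts = query_start_timestamp(now=16, in_progress_insert_ts=15)
print(visible(delta_store, qts))   # the in-flight insert (ts=15) is not seen
```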



Memory Consumption of Column Storage Vs Row
Storage
▪ For the memory consumption estimate, the author uses a factor of 10 in favor of column storage based on compression.
▪ A further factor of 2 in favor of column storage is estimated based on the elimination of redundant aggregates.
▪ With regard to main memory requirements, the author takes a factor of 5 in favor of column storage into account.
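A toy calculation applying the overall factor of 5 to a hypothetical row-store footprint; the 1 TB starting size is invented, not a figure from the paper.

```python
# Toy illustration: applying the slide's overall factor of 5 to a hypothetical
# row-store footprint. The 1 TB starting point is invented, not from the paper.

row_store_footprint_gb = 1_000          # hypothetical existing row-store database
overall_factor = 5                      # factor in favor of column storage (slide)

main_memory_needed_gb = row_store_footprint_gb / overall_factor
print(f"estimated main memory requirement ≈ {main_memory_needed_gb:.0f} GB")
```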



Impact of Column Store Databases on
Application Development
▪ When rewriting existing applications using SQL, the author expects a reduction in the amount of code by more than 30% (40-50% in more formal applications such as financials).
▪ No indices are required.
▪ There are no aggregates in the form of materialized views; they are instead created via algorithms on the fly.



THANK YOU!!
QUESTIONS ?

