Lesson: Architecture
SAP HANA Database Architecture
A running SAP HANA system consists of multiple communicating processes (services). The
following shows the main SAP HANA database services in a classical application context.
Such traditional database applications use well-defined interfaces (for example, ODBC and
JDBC) to communicate with the database management system functioning as a data source,
usually over a network connection. Often running in the context of an application server, these
traditional applications use Structured Query Language (SQL) to manage and query the data
stored in the database.
The main SAP HANA database management component is known as the index server, which
contains the actual data stores and the engines for processing the data. The index server
processes incoming SQL or MDX statements in the context of authenticated sessions and
transactions.
The SAP HANA database has its own scripting language named SQLScript. SQLScript embeds
data-intensive application logic into the database. Classical applications tend to offload only
very limited functionality into the database using SQL. This results in extensive copying of data
from and to the database, and in programs that slowly iterate over huge data loops and are hard
to optimize and parallelize. SQLScript is based on side-effect-free functions that operate on
tables using SQL queries for set processing, and is therefore parallelizable over multiple
processors.
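Why side-effect-free functions parallelize well can be shown with a small analogy. This is a minimal Python sketch, not SQLScript: "tables" are modelled as immutable tuples of rows, and the two functions (hypothetical names) each derive a new table without touching shared state, so nothing prevents running them at the same time. A thread pool stands in for the database engine's parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor

# A "table" is modelled as a tuple of row dicts; tuples are immutable,
# mirroring the side-effect-free style of SQLScript functions.
Table = tuple

def high_value_orders(orders: Table) -> Table:
    # Pure: reads its input and returns a new table; no shared state is modified.
    return tuple(r for r in orders if r["amount"] > 100)

def revenue_by_region(orders: Table) -> Table:
    # Pure set-style aggregation: the local dict is private to this call.
    totals: dict[str, int] = {}
    for r in orders:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return tuple({"region": k, "revenue": v} for k, v in sorted(totals.items()))

def run_independent(orders: Table) -> tuple[Table, Table]:
    # Neither function depends on the other's output, so both can be
    # evaluated concurrently without locks.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(high_value_orders, orders)
        f2 = pool.submit(revenue_by_region, orders)
        return f1.result(), f2.result()

orders = (
    {"region": "EMEA", "amount": 120},
    {"region": "APJ", "amount": 80},
    {"region": "EMEA", "amount": 40},
)
big, revenue = run_independent(orders)
print(big)
print(revenue)
```

The same independence is what lets a database engine schedule such functions across multiple processors; a procedure that updated shared state in a loop would force sequential execution.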
In addition to SQLScript, SAP HANA supports a framework for the installation of specialized
and optimized functional libraries, which are tightly integrated with different data engines of the
index server. Two of these functional libraries are the SAP HANA Business Function Library
(BFL) and the SAP HANA Predictive Analytics Library (PAL). BFL and PAL functions can be
called directly from within SQLScript.
SAP HANA also supports the development of programs written in the R language. SQL and
SQLScript are implemented using a common infrastructure of built-in data engine functions that
have access to various meta definitions, such as definitions of relational tables, columns, views,
and indexes, and definitions of SQLScript procedures. This metadata is stored in one common
catalog.
The database persistence layer is responsible for durability and atomicity of transactions. It
ensures that the database can be restored to the most recent committed state after a restart
and that transactions are either completely executed or completely undone. The index server
uses the preprocessor server for analyzing text data and extracting the information on which the
text search capabilities are based. The name server owns the information about the topology of
the SAP HANA system. In a distributed system, the name server knows where the components are
running and which data is located on which server.
Business Example
A customer needs to transform his landscape to improve overall performance and reduce
his IT administration workload. He turns to SAP HANA technology and needs to learn the
architecture of the SAP HANA appliance.
At the top is the Connection and Session Management component, which creates and manages
sessions and connections for database clients such as SAP BusinessObjects
reporting tools or applications.
The Transaction Manager is the component that coordinates transactions, controls
transactional isolation, and keeps track of running and closed transactions.
Client requests are analyzed and executed by the set of components
summarized as Request Processing and Execution Control. Once a session
is established, database clients typically use SQL statements to communicate
with Request Processing and Execution Control. For analytical applications, the
multidimensional query language MDX is also supported.
Incoming SQL requests are received by the SQL Processor. Data manipulation
statements are executed by the SQL Processor itself. Other types of requests are
delegated to other components.
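The routing rule above (DML handled by the SQL Processor itself, everything else delegated) can be pictured as a simple dispatcher. This Python sketch is illustrative only: classifying statements by their first keyword is a simplification, and the delegate component names other than the SQL Processor and Transaction Manager, as well as the procedure name `my_proc`, are assumptions rather than details taken from this text.

```python
from enum import Enum

class Component(Enum):
    SQL_PROCESSOR = "SQL Processor"                     # executes DML itself
    METADATA_MANAGER = "Metadata Manager"               # assumed target for DDL
    TRANSACTION_MANAGER = "Transaction Manager"         # assumed target for COMMIT/ROLLBACK
    PROCEDURE_PROCESSOR = "Stored Procedure Processor"  # assumed target for CALL

# Hypothetical classification by leading keyword; a real SQL front end
# parses the full statement.
DML = {"SELECT", "INSERT", "UPDATE", "DELETE", "UPSERT"}
DDL = {"CREATE", "ALTER", "DROP"}
TRANSACTION_CONTROL = {"COMMIT", "ROLLBACK"}

def route(statement: str) -> Component:
    parts = statement.split(None, 1)
    if not parts:
        raise ValueError("empty statement")
    keyword = parts[0].upper()
    if keyword in DML:
        return Component.SQL_PROCESSOR
    if keyword in DDL:
        return Component.METADATA_MANAGER
    if keyword in TRANSACTION_CONTROL:
        return Component.TRANSACTION_MANAGER
    if keyword == "CALL":
        return Component.PROCEDURE_PROCESSOR
    raise ValueError(f"unsupported statement type: {keyword}")

for sql in ("select * from sales", "CREATE TABLE t (id INT)", "COMMIT", "CALL my_proc()"):
    print(f"{sql!r:28} -> {route(sql).value}")
```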
High Availability per Data Center: Scale-Out with Standby. Available today; HW partners: Fujitsu, HP, IBM, more to come.
High Availability across Data Centers: Disaster Tolerance. Planned for General Availability of BW on HANA; HW partners (planned): Fujitsu, HP, IBM, more to come.
Solutions depend on HW partner technology.
High availability enables the failover of a node within one distributed SAP HANA
appliance. Failover uses a cold standby node and is triggered automatically.
Landscape
Up to three master name servers can be defined. During startup, one of them is elected
as the active master. The active master assigns a volume to each starting index server,
and no volume to standby servers.
Master name server failure
If the master name server fails, one of the remaining name servers becomes the
active master.
Index server failure
The master name server detects an index server failure and executes the failover.
During the failover, the master name server assigns the volume of the failed
index server to the standby server.
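The election and failover rules above can be modelled as a toy landscape. This Python sketch makes simplifying assumptions that are not from the source: the first live candidate in configuration order becomes active master, volumes are plain integers, and host names are hypothetical. It only illustrates the rules "at most three master candidates", "no volume for standby servers", and "the failed index server's volume moves to a standby".

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_MASTER_CANDIDATES = 3  # up to three master name servers can be defined

@dataclass
class Landscape:
    master_candidates: list[str]
    # host -> assigned volume id; None marks a standby host without a volume
    volumes: dict[str, Optional[int]] = field(default_factory=dict)
    active_master: Optional[str] = None

    def elect_master(self, alive: set[str]) -> str:
        # Assumption: the first live candidate in configuration order wins.
        for host in self.master_candidates[:MAX_MASTER_CANDIDATES]:
            if host in alive:
                self.active_master = host
                return host
        raise RuntimeError("no master name server candidate is available")

    def start_index_servers(self, workers: list[str], standbys: list[str]) -> None:
        # The active master assigns one volume per worker and none to standbys.
        for volume_id, host in enumerate(workers, start=1):
            self.volumes[host] = volume_id
        for host in standbys:
            self.volumes[host] = None

    def fail_over(self, failed_host: str) -> str:
        # Move the failed index server's volume to a free (cold) standby host.
        volume = self.volumes.get(failed_host)
        if volume is None:
            raise ValueError(f"{failed_host} holds no volume to fail over")
        standby = next((h for h, v in self.volumes.items() if v is None), None)
        if standby is None:
            raise RuntimeError("no standby host available")
        del self.volumes[failed_host]
        self.volumes[standby] = volume
        return standby

landscape = Landscape(master_candidates=["hana01", "hana02", "hana03"])
alive = {"hana01", "hana02", "hana03", "hana04"}
print(landscape.elect_master(alive))   # hana01
landscape.start_index_servers(["hana01", "hana02", "hana03"], standbys=["hana04"])
print(landscape.fail_over("hana02"))   # hana04, now holding volume 2
alive.discard("hana02")
alive.discard("hana01")                # the active master fails as well
print(landscape.elect_master(alive))   # hana03
```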
The General Parallel File System (GPFS) is a shared-disk clustered file system
developed by IBM.
Mirroring is offered at the storage system level. It will be offered together
with the appliance as a special offering by our partners. Each hardware partner
defines how the concept is finally realized with its own operational capabilities.
A performance impact on data-changing operations is to be expected as soon as
synchronous mirroring is activated. The impact depends strongly on many
external factors, such as the distance and the connection between the data centers.
The synchronous log write that concludes each COMMIT is the crucial part
here.
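A back-of-the-envelope calculation shows why distance matters for synchronous log writes. This Python sketch assumes a signal speed in optical fiber of roughly 200,000 km/s (about 5 µs per km one way) and ignores switch, storage, and protocol overhead, so real figures will be higher; the local log-write time is an arbitrary placeholder, not a measurement.

```python
FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s signal speed in optical fiber

def added_commit_latency_ms(distance_km: float) -> float:
    """Extra time a synchronous mirror adds to the log write that concludes a COMMIT.

    The write must reach the secondary site and be acknowledged, so one
    round trip (2 x distance) is added; equipment and protocol overhead is ignored.
    """
    return 2 * distance_km / FIBER_KM_PER_MS

LOCAL_LOG_WRITE_MS = 0.5  # placeholder for a local log flush, not a measured value

for km in (10, 50, 100):
    extra = added_commit_latency_ms(km)
    print(f"{km:>3} km: +{extra:.2f} ms -> {LOCAL_LOG_WRITE_MS + extra:.2f} ms per commit")
```

Under these assumptions, 100 km between data centers adds about 1 ms of pure propagation delay to every commit, on top of whatever the storage and network equipment contribute.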
In an emergency, when the primary data center is no longer available, a take-over
process must be initiated. So far, many customers have preferred a manual process here,
but an automated process can also be implemented.
This take-over process would officially end the mirroring, mount the disks
to the already installed HANA software and instances, and start up the secondary
database side of the cluster. If the hostnames and instance names on both sides of
the cluster are identical, no further steps with hdbrename are necessary.
It would be possible to run a development and/or QA instance of the three-tier
installation on the secondary cluster hardware, simply to utilize it until the
take-over is executed. The take-over would then stop these development and/or QA
instances and mount the production disks to the hosts. This requires an
additional set of disks for the development and QA instances.
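The take-over described above can be written down as an ordered runbook. In this Python sketch every step is only recorded as text, not executed; the host and instance names are hypothetical, and the relative order of stopping dev/QA and ending the mirroring is an assumption. Real procedures depend on the hardware partner's storage and cluster tooling.

```python
from dataclasses import dataclass, field

@dataclass
class TakeoverRunbook:
    secondary_hosts: list[str]
    nonprod_instances: list[str] = field(default_factory=list)  # dev/QA on the secondary hardware
    same_host_and_instance_names: bool = True

    def steps(self) -> list[str]:
        plan: list[str] = []
        # 1. Free the secondary hardware if dev/QA instances are using it.
        plan += [f"stop instance {inst}" for inst in self.nonprod_instances]
        # 2. Officially end the storage-level mirroring.
        plan.append("end mirroring")
        # 3. Mount the production disks to the already installed HANA software.
        plan += [f"mount production disks on {host}" for host in self.secondary_hosts]
        # 4. Renaming is only needed if host or instance names differ between sites.
        if not self.same_host_and_instance_names:
            plan.append("rename system with hdbrename")
        # 5. Start the secondary database side of the cluster.
        plan.append("start secondary database")
        return plan

plan = TakeoverRunbook(["dr-host01", "dr-host02"], nonprod_instances=["QAS"]).steps()
for step in plan:
    print(step)
```

Keeping the steps as data makes it easy to review a manual runbook and an automated one against the same sequence.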
So far, neither a hot standby via log shipping nor log shipping by recovering
log backups on a standby host is available. Both require changes in the
engines of the HANA database, which take time to realize. Both solutions are on
the agenda for HANA's future.
There are different technologies for loading data into SAP HANA (different data
provisioning scenarios), which are covered in the Data Provisioning unit.
The methods are:
SAP Landscape Transformation
SAP Data Services
Flat file upload
Direct Extractor Connection (DXC)
Current log position is determined (log position from which logs must be
read during restart).
Change lock is released.
Phase #3:
All data is written to disk. Changes are allowed again during this phase.
Temporary buffers created in phase 2.
List of open transactions
Row store check point is invoked
Log queue is flushed up to the save point log position
Restart record is written (containing e.g. the save point log position)
Shadow paging is used to undo changes that were persisted since the last
save point. With the shadow page concept, physical disk pages written by the last
save point are not overwritten until the next save point is successfully completed.
Instead, new physical pages are used to persist changed logical pages. Until the
next save point is complete, two physical pages may exist for one logical page: The
shadow page, which still contains the version of the last save point, and the current
physical page which contains the changes written to disk after the last save point.
After restart, the system is restored from the save point versions of the data pages.
This way all data changes written since the last save point are automatically rolled
back. After the save point is restored, the log is replayed to restore the most recent
committed state.
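The shadow-page mechanism and the subsequent log replay can be illustrated with a toy page store. This Python sketch is a simplification under stated assumptions: logical pages map to physical page ids, a save point simply adopts the current mapping, and the redo log is a plain list of `(logical page, content, committed)` entries written after the last save point. None of these structures reflect SAP HANA's real on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowPagedStore:
    disk: dict[int, str] = field(default_factory=dict)           # physical page id -> content
    savepoint_map: dict[int, int] = field(default_factory=dict)  # logical -> physical at last save point
    current_map: dict[int, int] = field(default_factory=dict)    # logical -> physical since then
    next_physical: int = 0

    def write(self, logical: int, content: str) -> None:
        # Never overwrite a page referenced by the last save point:
        # always persist the change to a fresh physical page.
        physical = self.next_physical
        self.next_physical += 1
        self.disk[physical] = content
        self.current_map[logical] = physical

    def savepoint(self) -> None:
        # Once complete, the current mapping becomes the new save point;
        # the previous shadow pages could now be reused.
        self.savepoint_map = dict(self.current_map)

    def restart(self, redo_log: list[tuple[int, str, bool]]) -> dict[int, str]:
        # 1. Restore the save point versions: dropping current_map rolls back
        #    every page change persisted after the last save point.
        self.current_map = dict(self.savepoint_map)
        # 2. Replay committed changes from the log to reach the most recent
        #    committed state (uncommitted entries are simply skipped here).
        for logical, content, committed in redo_log:
            if committed:
                self.write(logical, content)
        return {lp: self.disk[pp] for lp, pp in self.current_map.items()}

store = ShadowPagedStore()
store.write(1, "A v1")
store.savepoint()
store.write(1, "A v2 (committed)")    # persisted after the save point
store.write(2, "B v1 (uncommitted)")  # persisted after the save point
log_since_savepoint = [(1, "A v2 (committed)", True), (2, "B v1 (uncommitted)", False)]
result = store.restart(log_since_savepoint)
print(result)  # {1: 'A v2 (committed)'}
```

Note that the shadow page holding "A v1" is never overwritten, which is what makes the roll-back to the save point state possible without an undo pass over the data pages.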
SAP HANA SPS06 Onwards: Backup and Recovery Database copy from m nodes to n nodes
With SPS06, a database copy using backup/recovery is possible for a scale-out system.
Figure 2: SAP HANA SPS 6 onwards: Backup and Recovery Database copy from m nodes to n nodes
out):
/usr/sap/<SID>/SYS/global/hdb/custom/config
/usr/sap/<SID>/HDB<instance no>/<hostname> (without sub-directories!)
Figure 5: SAP HANA SPS6 Onwards: Backup Information in SAP HANA Studio
SAP HANA SPS6: Backup catalog in SAP HANA studio overview tab
The Backup Catalog provides detailed information on data and log backups,
e.g. start/end time, duration, size, and throughput.
In the navigator in SAP HANA studio, you can choose the Backup Catalog tab.
By default, only data backups are displayed. There is a checkbox to also
display log backups.
More detailed information, e.g. status, destination type, location, and
services included in the backup, is available and can be selected.
Figure 123: SAP HANA SPS6 Onwards: Backup Catalog in SAP HANA Studio
Recovery:
In general there are three data sources involved in the recovery process:
Data backups stored in the file system
Log backups stored in the file system
Online logs
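One way to picture how the three sources combine: the data backup provides the starting image, and the log backups plus the online log supply the changes to replay after it. This Python sketch rests on simplifying assumptions (log segments identified by start/end log positions, hypothetical names and positions) and only selects a contiguous replay chain for a recovery to the most recent state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogSegment:
    source: str     # "log backup" or "online log"
    start_pos: int  # first log position covered (inclusive)
    end_pos: int    # end of the covered range (exclusive)

def replay_chain(backup_pos: int, segments: list[LogSegment]) -> list[LogSegment]:
    """Return the log segments to replay after restoring a data backup taken at backup_pos."""
    chain: list[LogSegment] = []
    pos = backup_pos
    for seg in sorted(segments, key=lambda s: s.start_pos):
        if seg.end_pos <= pos:
            continue  # already contained in the data backup
        if seg.start_pos > pos:
            raise ValueError(f"log gap between positions {pos} and {seg.start_pos}")
        chain.append(seg)
        pos = seg.end_pos
    return chain

segments = [
    LogSegment("log backup", 0, 100),
    LogSegment("log backup", 100, 250),
    LogSegment("online log", 250, 300),
]
chain = replay_chain(backup_pos=120, segments=segments)
for seg in chain:
    print(seg.source, seg.start_pos, seg.end_pos)
```

A missing log backup shows up as a gap, which is why a recovery to the most recent state needs an unbroken sequence from the data backup through to the online log.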
Lesson: Hardware
General SAP HANA Hardware Specifications
SAP HANA is sold as a pre-configured, pre-installed appliance that is delivered directly from
the hardware partner. SUSE Linux SLES 11 and Red Hat Enterprise Linux for SAP HANA are
the only supported operating systems, and Intel E7 processors are the primary supported chips.
Samsung RAM is currently the primary memory used by most of the hardware partners.
Most partner systems use on-board 15k RPM hard disks (4x ratio to main memory) for the data volume and Fusion I/O SSD cards (1:1 ratio to main memory) for the log volume.
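Applying the ratios above to a concrete memory size gives a quick disk estimate. This Python sketch only multiplies by the 4x and 1:1 factors quoted in the text; the 512 GB input is an arbitrary example, and actual sizing should follow the current guidelines of SAP and the hardware partner.

```python
DATA_VOLUME_RATIO = 4  # hard-disk capacity for the data volume: 4x main memory
LOG_VOLUME_RATIO = 1   # SSD capacity for the log volume: 1x main memory

def disk_sizing_gb(main_memory_gb: int) -> dict[str, int]:
    return {
        "data_volume_hdd_gb": main_memory_gb * DATA_VOLUME_RATIO,
        "log_volume_ssd_gb": main_memory_gb * LOG_VOLUME_RATIO,
    }

print(disk_sizing_gb(512))  # {'data_volume_hdd_gb': 2048, 'log_volume_ssd_gb': 512}
```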
SAP ensures the quality, availability, and performance of the certified systems through a
rigorous process of end-to-end quality testing, performance testing, and continuous early
access to next-generation technologies from all of its partners.
SAP HANA Product Availability Matrix (PAM)
SAP has recently made the SAP HANA supported hardware matrix available on an open
website. Since the supported configurations change so rapidly, we'll simply insert the link
to the site here so you can always get the most up-to-date listing.
SAP Certified Appliance Hardware for SAP HANA
Additional Infrastructure
SAP recommends that customers deploy 10 Gb network data connections. SAP has no
preference regarding external storage/SAN; rather, the choice is determined by the server vendor.
Multi-Node and Scale-Out Options
SAP HANA is a linearly scalable database, meaning that you can string together multiple physical
servers into a single logical database instance and achieve linear performance results for every
additional server added to the landscape.
Currently, SAP HANA has certified several vendors for multi-node scale-out. You simply
add another node/server to the landscape and immediately gain a corresponding increase
in performance, in addition to the additional memory. Refer to the SAP HANA hardware partner
section of this chapter for more information on the various scale-out offerings from the individual
partners.
In 2012, SAP completed the first 100 TB benchmark for the 16-node scale-out solution.
The data set consisted of five years of Sales and Distribution records (100 billion records) and
was run on a single logical server consisting of 16 nodes. Each node was a certified IBM X5
machine with eight 10-core Intel E7-8870 processors running at 2.40 GHz. The total cost
of the 16-node system was roughly USD 640K.
SAP HANA was able to scan 100 billion rows per second on the 100 TB dataset and to load
16 million records per minute. SAP HANA's compression algorithms achieved more than 20x
compression on the raw data when loading into memory, going from 100 TB on disk to 3.8 TB in
memory.
Typical query results were:
No database tuning, indexing, or caching was needed to achieve these results. To put that in
context, the closest competitive database is roughly 1000x slower in the same benchmark and
several times more expensive.
Joint roadmap enablement. Early in the design process, Intel and SAP decision-makers identify
complementary features and capabilities in their upcoming products, and those insights help to
direct the development cycle for maximum value.
Collaborative product optimization. Intel engineers located on-site at SAP work with their SAP
counterparts to provide tuning expertise that enables SAP HANA and other software solutions to
take advantage of the latest hardware features.
Combined research efforts. Together, researchers from Intel and SAP continually explore and
drive the future of business computing. As a result of these efforts, customer solutions achieve
performance, scalability, reliability, and energy efficiency that translate into favorable ROI and
TCO, for increased business value.
Operational Success and Management of Real-Time Events
In-memory computing based on SAP solutions on the Intel Xeon processor E7 family enables
greater business agility and innovative usage models that let companies respond to changing
conditions in real time.
Scenarios such as monitoring customer and supplier activity can generate petabytes of data,
the value of which depends on the ability to distill it into actionable intelligence.
SAP HANA and the latest Intel Xeon processor E7 v2 family with up to 15 cores per socket
deliver rapid data analysis that discerns patterns and trends so you can adjust your just-in-time
supply chain rapidly. You can also model what-if scenarios to structure sales and promotions
for optimal outcomes based on the latest sales and pipeline information.
Features of the Intel Xeon processor E7 v2 family, such as 37.5 MB of L3 cache, Intel
QuickPath Interconnect, and quad-channel integrated memory controllers with a maximum memory
speed of up to 1600 MHz, deliver extraordinary capabilities for businesses of all sizes that
implement SAP HANA for functionality such as business intelligence and data analytics.
In particular, support for up to 24 DIMMs per socket with a maximum DIMM density of 64 GB, allowing
up to 6 TB in 4-socket, 12 TB in 8-socket, or 24 TB in 16-socket servers, gives SAP customers
significant improvements for large transactional workloads such as Business Suite on HANA.
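The maximum capacities quoted above follow directly from the DIMM figures. A quick Python check, using binary units (1 TB = 1024 GB) as memory capacities are conventionally quoted:

```python
DIMMS_PER_SOCKET = 24
MAX_DIMM_GB = 64

def max_memory_tb(sockets: int) -> float:
    # Per socket: 24 DIMMs x 64 GB = 1536 GB = 1.5 TB
    return sockets * DIMMS_PER_SOCKET * MAX_DIMM_GB / 1024

for sockets in (4, 8, 16):
    print(f"{sockets:>2}-socket: {max_memory_tb(sockets):g} TB")  # 6, 12 and 24 TB
```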
Performance Optimizations of SAP HANA with the Intel Xeon Processor E7 Family
SAP HANA benefits dramatically from high-speed Intel QuickPath processor-to-memory
interconnects and the latest processor instructions, such as the SIMD extensions Intel SSE and
Intel AVX. Those features eliminate many I/O bottlenecks, so processor headroom is available to
generate excellent throughput and responsiveness. SAP HANA is also engineered to take
particular advantage of the RAS (reliability, availability, and serviceability) features of the Intel Xeon
processor E7 family, especially error correction through Machine Check Architecture Recovery,
for mission-critical implementations.
As a result of the high level of performance optimization for servers based on the Intel Xeon
processor E7 family, SAP HANA can provide businesses of all sizes superior results for data
warehousing implementations such as business intelligence and data analytics.
Assured Performance with Mission-Critical Advanced Reliability of the Intel Xeon Processor E7 Family
Machine Check Architecture Recovery, a reliability, availability, and serviceability (RAS) feature
built into the Intel Xeon processor E7 family, enables the hardware platform to generate
Machine Check Exceptions. In many cases, these notifications enable the system to take
corrective action that allows SAP HANA to keep running where an outage would otherwise
occur.
Hardware based on the Intel Xeon processor E7 family enables SAP HANA to fail over from
one processor socket to another in the event of a processor failure and to handle memory errors
with as little impact to workloads as possible.
Copyright 2014 Intel Corporation. All rights reserved
production environments such as the one used by Medtronic, a large, worldwide manufacturer
of medical devices (see customer example). The persistency layer is provided by two Fusion IO
cards to avoid possible bottlenecks in duo card configurations sharing the same PCI slot.
SAP HANA Scale-out offering
The Cisco UCS solution that has been certified for large SAP HANA implementations is a
uniquely scalable appliance. It allows customers to easily adapt to the growing demands of their
individual environment by incrementally adding Cisco B440 M2 blade servers with 4 Intel
Xeon Processors E7-4870 (2.4 GHz) and up to 512 GB usable memory each, as needed. For
every four Cisco UCS blade servers, the persistency layer is provided by an EMC VNX 5300 or
a NetApp FAS 3240, depending on customer preference.
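As a rough planning aid, the two ratios in this paragraph (up to 512 GB usable memory per blade, one storage system per four blades) translate a target memory size into blade and storage counts. This Python sketch uses only those two figures; it ignores standby blades, and the target sizes are arbitrary examples, since real configurations are defined by Cisco and the storage partner.

```python
import math

USABLE_GB_PER_BLADE = 512
BLADES_PER_STORAGE_UNIT = 4  # one EMC VNX 5300 or NetApp FAS 3240 per four blades

def scale_out_plan(target_memory_gb: int) -> tuple[int, int]:
    # Round up: a partially used blade or storage unit still has to be bought.
    blades = math.ceil(target_memory_gb / USABLE_GB_PER_BLADE)
    storage_units = math.ceil(blades / BLADES_PER_STORAGE_UNIT)
    return blades, storage_units

for tb in (4, 12):
    blades, storage = scale_out_plan(tb * 1024)
    print(f"{tb} TB -> {blades} blades, {storage} storage units")
```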
The basic configuration of the Cisco scale-out offering is made up of redundant fabric
interconnects with embedded infrastructure management, a Cisco UCS C200 server for SAP
HANA studio, a Cisco 2911 for secure remote management, and one enclosure with support for
up to 4 Cisco B440 blades. The basic configuration can easily be scaled by adding a virtually
unlimited number of Cisco B440 M2 blade servers and the corresponding storage from
EMC or NetApp. The beauty of Cisco's scale-out architecture is that it can be extended by
additional blades and storage units without shutting down the HANA system, as we have proven
at eBay, where an existing HANA system with 4 TB was extended to 12 TB on the fly without
any downtime.
Copyright 2014 Cisco. All rights reserved
For the other hardware vendors, please refer to the PAM for further details.