
PUBLIC
SAP HANA Vora 1.3

Document Version: 1.2 – 2017-03-14
SAP HANA Vora Installation and Administration Guide
Content
1 Introduction
1.1 SAP HANA Vora and Apache Hadoop
1.2 SAP HANA Vora and Apache Spark
2 Installation
2.1 Installation Prerequisites
    Hadoop Distributions
    Cluster Provisioning Tools
    Operating Systems
    Supported Platforms
    Cluster Sizing
    Required Components
    Browser Support
    Prepare the Distributed Log Server
    Prepare the Document Store Server
    Prepare the Disk Engine Server
    Prepare the Cluster Manager
    Configure Sudo Access
    Validate the Cluster
    Collect Hadoop Cluster Information
2.2 SAP HANA Vora Software Download
2.3 SAP HANA Vora Manager and SAP HANA Vora Services
2.4 Node Types and Node Assignments
2.5 Installing SAP HANA Vora
    Prepare for Installation
    Generate an Initial Password for SAP HANA Vora
    Installing the SAP HANA Vora Manager
    Deploy the SAP HANA Vora Services
2.6 Validate the SAP HANA Vora Installation
2.7 Install the SAP HANA Vora Zeppelin Interpreter
2.8 Connect SAP HANA Spark Controller to SAP HANA Vora
2.9 Connect SAP Lumira to SAP HANA Vora
2.10 Updating SAP HANA Vora
    Export Metadata from SAP HANA Vora 1.2
    Update SAP HANA Vora Using Ambari
    Update SAP HANA Vora Using Cloudera
    Update SAP HANA Vora for MapR
    Import Metadata into SAP HANA Vora 1.3
2.11 SAP HANA Vora Default Ports
3 Administration
3.1 Configure Proxy Settings
3.2 Enable Spark Auto-Registration
3.3 Sizing Configuration
    Vora Disk Engine Sizing
    Vora In-Memory Engine Swapping Mechanism
    Spark
    Spark Controller
3.4 Run SAP HANA Vora As a Non-Root User
3.5 Start and Stop the SAP HANA Vora Manager
3.6 Start and Stop the SAP HANA Vora Services
3.7 Examine the SAP HANA Vora Nodes
3.8 Check the Connection Status
3.9 Manage Ports
3.10 Manage Users
3.11 Delete the SAP HANA Vora Service State
3.12 SAP HANA Vora Logs
3.13 Cluster Utilities
3.14 Accessing SAP HANA Vora from SAP HANA
    Enable the SAP HANA Wire for Smart Data Access
    Create an SAP HANA Vora Remote Source
    Create Virtual Tables
    SQL Query and Data Type Restrictions
    Reroute Stored Procedures
3.15 Best Practices: Administration and Operations
    HDFS
    Choosing a Cluster Manager
    Example Cluster Configuration Including a Client Machine (Jump Box)
4 Security
4.1 Enabling Kerberos Authentication for SAP HANA Vora
    Kerberos Overview and Requirements
    Enable Access to a Secured Hadoop Cluster
    Use SAP HANA Vora with the MIT Kerberos Distribution
    Use SAP HANA Vora with Active Directory
    Enabling Authentication Between SAP HANA Vora and HDFS
    Enable Authentication Between SAP HANA Vora Components
    Configure Authentication Between Apache Spark and SAP HANA Vora
    Run the Spark Shell with Kerberos Authentication

    Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster
    Configuring Authentication for SAP HANA Vora with MapR
    Configure the Thrift Server
4.2 Configure SAP HANA Vora UI Security
4.3 Verifying Consul UI Security Measures

1 Introduction
SAP HANA Vora provides a set of in-memory query engines and a disk-based processing engine that are
integrated in the Hadoop ecosystem and Spark execution framework. Able to scale to thousands of nodes,
SAP HANA Vora is designed for use in large distributed clusters and for handling big data.
Fast Query Execution
The SAP HANA Vora relational in-memory engine holds data in memory and boosts the execution
performance of Spark. Supporting just-in-time code compilation, it translates incoming SQL queries into
machine-level code on the fly using an LLVM compiler, enabling them to be executed quickly and efficiently.
Data Analytics
SAP HANA Vora provides OLAP-style capabilities for data on Hadoop, in particular a hierarchy
implementation that allows you to define hierarchical data structures and perform complex computations on
different levels of data. Extensions to Spark SQL also include enhancements to the data source API to enable
Spark SQL queries or parts of the queries to be pushed down to the appropriate SAP HANA Vora engines.
SAP HANA Integration
Data processing between the SAP HANA and Hadoop environments lets you combine data in SAP HANA with
big data stored in Hadoop systems and process it in Spark or SAP HANA applications.
Graph Processing
A distributed in-memory graph engine allows you to execute commonly used graph operations on data stored
in SAP HANA Vora and is optimally designed for complex read-only analytical queries on very large graphs.
Time Series Analysis
The in-memory time series engine supports time series analysis algorithms that work directly on top of the
compressed data, providing features such as standard aggregation, granularization, and advanced analysis.

Document Store
A distributed in-memory JSON document store supports rich query processing over JSON data.
Disk Storage
The disk engine provides relational column-based storage, allowing you to use relational capabilities without
loading data into memory.
Business Functions
Business functions, such as currency conversion and unit of measure conversion, make it easier to use data in
business settings.
1.1 SAP HANA Vora and Apache Hadoop
The SAP HANA Vora solution is built on the Hadoop ecosystem, an open-source project providing a collection
of components that support distributed processing of large data sets across a cluster of machines. Hadoop
allows both structured as well as complex, unstructured data to be stored, accessed, and analyzed across the
cluster.
The main components used in this environment are shown in the figure below:

Component     Description                                             More Information
Ambari        An open operational framework for provisioning,         Apache Ambari
              managing and monitoring Apache Hadoop clusters.
Cloudera      Cloudera Manager - Cloudera's automated cluster         Cloudera
              management tool.
MapR          MapR Control System (MCS) - a cluster administration    MapR
              tool for configuring, monitoring, and managing clusters.
HDFS          The Hadoop Distributed File System.                     HDFS Users Guide
ZooKeeper     A centralized service for maintaining configuration     Apache ZooKeeper
              information and naming, and for providing distributed
              synchronization and group services.
YARN          Hadoop’s resource manager and job scheduler.            Apache Hadoop YARN
HBase         The Hadoop database.                                    Apache HBase
Pig           A high-level data-flow language and execution           Apache Pig
              framework for parallel computation.
Spark SQL     A module for structured and semi-structured data        Spark SQL and DataFrame Guide
              processing.
Apache Hive   A data warehouse infrastructure supporting data         Apache Hive
              summarization, query, and analysis.
MLlib         A machine learning tool that runs on Spark.             Machine Learning Library (MLlib) Guide
1.2 SAP HANA Vora and Apache Spark
The SAP HANA Vora system consists of two main components: the SAP HANA Vora engines (a set of
in-memory query engines and a disk-based engine) and the SAP HANA Vora Spark extension library, which
provides access to the engines and their functional features.
SAP HANA Vora Engines
The SAP HANA Vora engines are services that you add to your existing Hadoop installation. SAP HANA Vora
instances (with the exception of the disk engine) hold data in memory and boost the performance of
out-of-the-box Spark. To increase execution performance on the node level, you add an SAP HANA Vora
instance to each compute node so that it contains the following:
● A Spark worker (and the necessary Hadoop components)

● One or more SAP HANA Vora engines

The integration of the SAP HANA Vora engine with Spark is shown in the overview below:
SAP HANA Vora Spark Extension Library
The SAP HANA Vora extension library allows SAP HANA Vora to be accessed through Spark. It also makes
available additional functionality, such as a hierarchy implementation, which allows you to build hierarchies
and run hierarchical queries.
Both components are contained in the SAP HANA Vora installation package.
Related Information
SAP HANA Vora Software Download [page 19]

Node Types and Node Assignments [page 21]

2 Installation
Before installing SAP HANA Vora, review the installation prerequisites to ensure your Hadoop cluster is
properly configured. Then download the SAP HANA Vora installation package and install SAP HANA Vora on
your cluster.
Complete the individual tasks in the following order:
Task                                                            See
1. Ensure your Hadoop cluster is correctly set up and meets     Installation Prerequisites [page 10]
   the installation requirements for SAP HANA Vora
2. Find out which package is required to install SAP HANA       SAP HANA Vora Software Download [page 19]
   Vora and where it is available
3. Understand the purpose of the SAP HANA Vora Manager          SAP HANA Vora Manager and SAP HANA Vora Services [page 20]
   and SAP HANA Vora services
4. Check the node type overview to see where the SAP HANA       Node Types and Node Assignments [page 21]
   Vora Manager and SAP HANA Vora services should be deployed
5. Install the SAP HANA Vora Manager and SAP HANA Vora          Installing SAP HANA Vora [page 23]
   services
6. Ensure SAP HANA Vora is correctly installed                  Validate the SAP HANA Vora Installation [page 39]
7. Optionally enable the Zeppelin interpreter if you want to    Install the SAP HANA Vora Zeppelin Interpreter [page 41]
   use Zeppelin (an interactive data analytics tool)
8. Set up the Spark controller if you want to query tables      Connect SAP HANA Spark Controller to SAP HANA Vora [page 45]
   accessible through Spark from SAP HANA
9. Connect SAP Lumira if you want to visualize SAP HANA         Connect SAP Lumira to SAP HANA Vora [page 48]
   Vora data in SAP Lumira
10. Update your SAP HANA Vora installation with the latest      Updating SAP HANA Vora [page 51]
    versions of the installation packages
Related Information
SAP HANA Vora Default Ports [page 59]

2.1 Installation Prerequisites
A Hadoop cluster is a prerequisite for installing SAP HANA Vora. Review the installation requirements to
ensure that the cluster you use is correctly set up.
Installation Prerequisite Checklist
☐ Hadoop Distributions [page 10]

☐ Cluster Provisioning Tools [page 10]
☐ Operating Systems [page 11]
☐ Supported Platforms [page 12]
☐ Browser Support [page 13]
☐ Cluster Sizing [page 12]
☐ Required Components [page 13]
☐ Prepare the Distributed Log Server [page 14]
☐ Prepare the Document Store Server [page 15]
☐ Prepare the Disk Engine Server [page 16]
☐ Prepare the Cluster Manager [page 16]
☐ Configure Sudo Access [page 17]
☐ Validate the Cluster [page 18]
☐ Collect Hadoop Cluster Information [page 19]
2.1.1 Hadoop Distributions
SAP HANA Vora can only be used with selected Hadoop distributions:
● Hortonworks Data Platform (HDP)

● Cloudera Enterprise (CDH)
● MapR
2.1.2 Cluster Provisioning Tools
The cluster must be managed by one of the following cluster provisioning tools:
● Apache Ambari 2.2.1 and above

● Cloudera Manager 5.7

● MapR Control System (MCS) 5.1
2.1.3 Operating Systems
The following operating systems are supported:
● SUSE Linux Enterprise Server (SLES) 11 SP4

● Red Hat Enterprise Linux (RHEL) 6.8 (see compatibility pack details below) and 7.2
● CentOS 6.7 (see compatibility pack details below) and 7.2
C++ runtime compatibility packages are required for certain operating system versions (RHEL 6 and CentOS
6). For more information, see SAP Note 2228351. The installation instructions given for the SAP HANA
database also apply to SAP HANA Vora.
Note
You need to configure Spark with the C++ runtime compatibility package. For more information, see
Configure Spark with the SAP C++ Compatibility Package [page 11].
For an up-to-date list of supported operating systems, see SAP Note 2213226.
2.1.3.1 Configure Spark with the SAP C++ Compatibility Package
If you installed the C++ runtime compatibility package, you need to configure the environment of the user
running Spark as well as YARN (yarn-site.xml).
Context
The SAP HANA Vora extension communicates with the SAP HANA Vora catalog through the Java Native
Interface (JNI). It makes use of C++ libraries that require the C++ runtime compatibility package to be
configured. The connection can be instantiated either from the Spark driver (which is run as the user who
initiates the Spark session), or from a Spark worker process, which is controlled by YARN. Therefore, the
Spark user and YARN need to be configured appropriately as described below.
Procedure
1. On all hosts and for each user who is able to run Spark, set the LD_PRELOAD environment variable
so that it points to the path of the C++ compatibility package:
export LD_PRELOAD=/opt/rh/SAP/lib64/compat-sap-c++.so:${LD_PRELOAD}

2. Open the yarn-site.xml configuration file on your system and add the following XML fragment:
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>LD_PRELOAD=/opt/rh/SAP/lib64/compat-sap-c++.so</value>
  <description>LD_PRELOAD</description>
</property>
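The export in step 1 leaves a trailing colon in LD_PRELOAD when the variable was previously empty, and it adds the library again each time the profile is sourced. The following is a minimal sketch of one way to avoid both, assuming a POSIX shell; the helper name prepend_preload is hypothetical and not part of SAP HANA Vora.

```shell
# Hypothetical helper (not shipped with SAP HANA Vora): prepend a library to an
# LD_PRELOAD-style list without a dangling ":" and without duplicate entries.
# $1 = library path, $2 = current LD_PRELOAD value (may be empty)
prepend_preload() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;      # library already listed, keep value as-is
    "::")     printf '%s\n' "$1" ;;      # current value is empty
    *)        printf '%s\n' "$1:$2" ;;   # prepend to the existing list
  esac
}

# Path taken from step 1 above; only exported if the library is readable.
COMPAT_LIB=/opt/rh/SAP/lib64/compat-sap-c++.so
if [ -r "$COMPAT_LIB" ]; then
  LD_PRELOAD=$(prepend_preload "$COMPAT_LIB" "${LD_PRELOAD:-}")
  export LD_PRELOAD
fi
```

Placing these lines in the login profile of each Spark user would have the same effect as the plain export, with the value staying stable across repeated logins.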
2.1.4 Supported Platforms
The following combinations of operating system, cluster provisioning tool, and Hadoop distribution are
supported:
Operating System   Hadoop Distribution   Hadoop Version   Cluster Provisioning Tool
SLES 11 SP4(1)     CDH 5.7               2.6.0            Cloudera Manager 5.7
SLES 11 SP4(1)     HDP 2.4.2             2.7.1            Ambari 2.2.1 and above
RHEL 6.8           CDH 5.7               2.6.0            Cloudera Manager 5.7
RHEL 6.8           HDP 2.4.2             2.7.1            Ambari 2.2.1 and above
RHEL 6.8           MapR 5.1              2.7.0            MapR Control System 5.1
RHEL 7.2           MapR 5.1              2.7.0            MapR Control System 5.1
CentOS 7.2(2)      HDP 2.4.2             2.7.1            Ambari 2.2.1 and above
CentOS 6.7(2)      CDH 5.7               2.6.0            Cloudera Manager 5.7
● (1) This depends on the operating system version/SP released for the respective Hadoop Distribution.
● (2) Only selected combinations of CentOS versions and Hadoop Distributions are supported.
2.1.5 Cluster Sizing
To enable efficient cluster computation using the SAP HANA Vora extension, the cluster nodes should have at
least the following:
● 4 cores
● 16 GB of RAM
● 20 GB of free disk space for HDFS data
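The minimums above can be checked on each node before installation. The following is a minimal sketch, assuming a Linux node; the function name meets_vora_minimum and the directory /hadoop/hdfs are placeholders, not names used by SAP HANA Vora.

```shell
# Hypothetical helper: return 0 if a node meets the minimums listed above
# (4 cores, 16 GB of RAM, 20 GB of free disk space), 1 otherwise.
# $1 = CPU cores, $2 = RAM in GB, $3 = free disk space in GB
meets_vora_minimum() {
  [ "$1" -ge 4 ] && [ "$2" -ge 16 ] && [ "$3" -ge 20 ]
}

# One way to gather the inputs on a Linux node. RAM is rounded to the nearest
# GB because the kernel reports slightly less than the installed amount.
# /hadoop/hdfs is a placeholder for your HDFS data directory.
#   cores=$(nproc)
#   ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)
#   disk_gb=$(df -Pk /hadoop/hdfs | awk 'NR==2 {print int($4 / 1024 / 1024)}')
#   meets_vora_minimum "$cores" "$ram_gb" "$disk_gb" && echo "node meets minimum sizing"
```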

2.1.6 Required Components
The following components are required on the cluster:
Component                     More Information
HDFS 2.6.0, 2.7.0, or 2.7.1   https://hadoop.apache.org/docs/stable/
Spark 1.6                     https://spark.apache.org/releases/
YARN cluster manager          https://spark.apache.org/docs/latest/running-on-yarn.html
Zeppelin v0.6.0               Optional – allows you to use Zeppelin integration: http://zeppelin.apache.org/
Spark Controller 1.6.1        Optional – allows you to query SAP HANA Vora tables using Smart Data Access
                              from SAP HANA
2.1.7 Browser Support
SAP HANA Vora supports the following desktop browsers.
● Google Chrome
○ Latest release cycle for Windows and OS X (recommended)
● Microsoft Internet Explorer
○ IE11 Desktop
● Microsoft Edge
● Mozilla Firefox
○ Latest Extended Support Release cycle
○ Latest Rapid Release cycle (conditionally supported)
● Apple Safari
○ On OS X for 3 years from version release date
Note
Mobile browsers are not yet supported.

2.1.8 Prepare the Distributed Log Server
The SAP HANA Vora Distributed Log (DLog) component requires the RPM package libaio to be installed on
the target machines and the file descriptor limits as well as the locale to be set appropriately.
Procedure
1. Install the libaio package as follows:
Platform      Command
RHEL/CentOS   sudo yum install libaio
SLES          sudo zypper install libaio
2. Increase the system file descriptor limit if necessary:

a. Check the current limit:
cat /proc/sys/fs/file-max
You are generally advised to set the limit to 65536 per 1 GB of RAM.
b. If necessary, increase the limit by adding or modifying the following line in the /etc/sysctl.conf
file:
fs.file-max=<limit>
c. Run the following to load the new setting:
sysctl --load=/etc/sysctl.conf
3. Set the default ulimit value:

a. Add or modify the following line in the /etc/security/limits.conf file:
* - nofile 1000000
Caution
Do not set the limit to a value larger than 1048576 or you may be unable to log in to your system
(notably on RHEL 7.1).
b. Log out or reboot so that the ulimit change takes effect.

4. Make sure that the system locale is configured.
○ To list the locales available on the system, use:
locale -a

○ To list the current settings, use:
locale
○ To globally set the locale, configure the LANG and/or LC_* variables appropriately for your system (for
more information about these variables, see man 7 locale) :
Platform      Procedure
RHEL/CentOS   1. To set the system locale, configure the variables in:
                 ○ RHEL/CentOS 6: /etc/sysconfig/i18n
                 ○ RHEL/CentOS 7: /etc/locale.conf
                 For example, LANG="en_US.UTF-8" will default all locale settings to
                 en_US.UTF-8.
              2. To set an individual user’s locale, configure the variables in $HOME/.i18n.
              3. Log out and back in for the changes to take effect.
SLES          1. To set the system locale, prefix the variable names with RC_ and configure
                 them in /etc/sysconfig/language (for example,
                 RC_LANG="en_US.UTF-8" will default all locale settings to en_US.UTF-8).
              2. To set an individual user’s locale, configure the variables (without the RC_
                 prefix) in $HOME/.i18n.
              3. Log out and back in for the changes to take effect.
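The limits in steps 2 and 3 follow simple arithmetic: 65536 file descriptors per GB of RAM for fs.file-max, and a ceiling of 1048576 for the nofile entry. The following is a minimal sketch of both checks; the function names are hypothetical helpers and not part of SAP HANA Vora.

```shell
# Hypothetical helper: suggested fs.file-max for a given amount of RAM,
# using the 65536-per-GB guideline from step 2. $1 = RAM in GB.
recommended_file_max() {
  echo $(( $1 * 65536 ))
}

# Hypothetical helper: return 0 if a nofile value respects the 1048576
# ceiling named in the caution under step 3. $1 = proposed nofile limit.
nofile_is_safe() {
  [ "$1" -le 1048576 ]
}

# Possible use on a node with 64 GB of RAM:
#   current=$(cat /proc/sys/fs/file-max)
#   wanted=$(recommended_file_max 64)
#   [ "$current" -lt "$wanted" ] && echo "consider fs.file-max=$wanted"
#   nofile_is_safe 1000000 && echo "nofile value is within the safe range"
```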
2.1.9 Prepare the Document Store Server
The SAP HANA Vora Document Store component requires the RPM package numactl to be installed on the
target machines.
Procedure
Install the numactl package as follows:
Platform      Command
RHEL/CentOS   sudo yum install numactl
SLES          sudo zypper install numactl

2.1.10 Prepare the Disk Engine Server
The SAP HANA Vora Disk Engine component requires the RPM packages libtool and libaio to be installed
on the target machines.
Procedure
1. Install the libtool package as follows:
Platform      Command
RHEL/CentOS   sudo yum install libtool libtool-ltdl
SLES          sudo zypper install libtool
2. Install the libaio package as follows:
Platform      Command
RHEL/CentOS   sudo yum install libaio
SLES          sudo zypper install libaio
2.1.11 Prepare the Cluster Manager
The SAP HANA Vora Manager component requires the lsof and ifconfig RPM packages to be installed on
the target machines.
Procedure
1. Install the lsof package as follows:
Platform      Command
RHEL/CentOS   sudo yum install lsof
SLES          sudo zypper install lsof
2. Install ifconfig (contained in the net-tools package) as follows:
Platform      Command
RHEL/CentOS   sudo yum install net-tools
SLES          sudo zypper install net-tools
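The package installations in sections 2.1.8 to 2.1.11 can be combined into a single command per platform. The following is a minimal sketch that only prints the command so that it can be reviewed before it is run; the function name and the platform keys "rhel" and "sles" are assumptions made for this example.

```shell
# Hypothetical helper: print one install command covering the prerequisite
# packages named in sections 2.1.8 to 2.1.11.
# $1 = platform family: "rhel" (RHEL/CentOS) or "sles"
vora_prereq_install_cmd() {
  case "$1" in
    rhel) echo "sudo yum install libaio numactl libtool libtool-ltdl lsof net-tools" ;;
    sles) echo "sudo zypper install libaio numactl libtool lsof net-tools" ;;
    *)    echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Possible use: review the command, then run it on every target machine.
#   vora_prereq_install_cmd rhel
```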
2.1.12 Configure Sudo Access
To run scripts that use sudo, you need to ensure that the requiretty setting is disabled and that the user
(except root) has sudo permission. Make the necessary changes in the /etc/sudoers file using the visudo
command.
Context
For some operating systems, requiretty is a default setting and requires you to have a terminal when
executing sudo. You can either disable requiretty globally by commenting it out or disable it per user. If
necessary, assign sudo permission to the specified user (that will deploy SAP HANA Vora) and set the
NOPASSWD parameter so that a password is not requested when sudo is run.
Procedure
1. Open the /etc/sudoers file:
sudo visudo
2. Disable requiretty using either of the options below:
#option 1: comment out

#Defaults requiretty
...
#option 2: allow user <user> to run sudo without a terminal
Defaults:<user> !requiretty
...
MapR only: The mapr user needs to execute scripts using sudo, so you need to disable requiretty for
that user as well.
3. MapR only: Enable a user to run sudo without a password by adding the following:
user_name ALL = NOPASSWD: /path/to/program

Sample Code
mapr ALL=NOPASSWD:ALL
4. Do this on all nodes where SAP HANA Vora will be installed.
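Because the same sudoers entries are needed on every node, it can help to generate them once and check them before they are added. The following is a minimal sketch that prints option 2 from step 2 together with the NOPASSWD rule from the MapR sample; the function name is hypothetical, and whether the NOPASSWD rule applies to your user depends on your distribution as described above.

```shell
# Hypothetical helper: print the sudoers lines for one user, combining
# option 2 from step 2 with the NOPASSWD rule from the MapR sample in step 3.
# $1 = user name
vora_sudoers_lines() {
  printf 'Defaults:%s !requiretty\n' "$1"
  printf '%s ALL=NOPASSWD:ALL\n' "$1"
}

# The lines can be written to a scratch file and validated with the check
# mode of visudo before they are added to /etc/sudoers:
#   vora_sudoers_lines mapr > /tmp/vora-sudoers
#   sudo visudo -cf /tmp/vora-sudoers
# Afterwards, "sudo -n true" run as that user should succeed without a prompt.
```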
2.1.13 Validate the Cluster
To ensure that the components have been correctly installed, run a sample Spark application on the cluster,
such as SparkPi, which calculates the approximate value of Pi.
Prerequisites
● SPARK_HOME has been set correctly.
Example
Ambari     SPARK_HOME=/usr/hdp/current/spark-client/
Cloudera   SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
MapR       SPARK_HOME=/opt/mapr/spark/spark-1.6.1
● You are able to access HDFS
Procedure
Execute the following:
Sample Code
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client \
  --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 2 \
  --queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null
You should see something like this:
Pi is roughly 3.140292
For more information, see Spark Examples.
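When the validation is run from a script, the result line can be checked automatically instead of by eye. The following is a minimal sketch, assuming the output format shown above; the function name spark_pi_ok is hypothetical, and the accepted range (values starting with 3.1) is a choice made for this example.

```shell
# Hypothetical helper: read spark-submit output on stdin and succeed only if
# it contains a "Pi is roughly <number>" line whose value starts with 3.1.
spark_pi_ok() {
  grep -Eq '^Pi is roughly 3\.1[0-9]*'
}

# Possible use with the command from the procedure above:
#   spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client \
#     --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 2 \
#     --queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null \
#     | spark_pi_ok && echo "cluster validated"
```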

2.1.14 Collect Hadoop Cluster Information
Before proceeding with the installation, collect and document the following information about your Hadoop
cluster. You will need to have this information at hand during the installation.
Procedure
Make a note of the following information:
○ User and password for Ambari/Cloudera/MapR

○ Operating system user and password
○ HDFS user and password
○ Installation directories of Ambari/Cloudera/MapR, and so on
2.2 SAP HANA Vora Software Download

The SAP HANA Vora engine and extension library are contained in installation packages provided specifically
for each of the cluster provisioning tools. You can download the installation package you require from the SAP
Software Download Center.
The installation packages are as follows:
● SAP HANA Vora for Ambari: VORA_AM<version>.TGZ

● SAP HANA Vora for Cloudera: VORA_CL<version>.TGZ
● SAP HANA Vora for MapR: VORA_MR<version>.TGZ
Find the package in the SAP Software Download Center as follows:
1. Open the SAP Support Launchpad.

2. Choose Software Downloads.
3. Locate the SAP HANA Vora installation package. For example:
○ Search directly using "vora" name combinations, for example, "vora 1.3".
○ Search by alphabetical index (A-Z), for example, under "H" or "V" for "SAP HANA Vora".
○ Search by category: Choose SAP In-Memory (SAP HANA) VORA, SAP IN-MEMORY DISTRIBUTED
COMPUTE ENGINE SAP HANA VORA 1.

2.3 SAP HANA Vora Manager and SAP HANA Vora Services
SAP HANA Vora is installed as a set of services on your cluster. These consist of the SAP HANA Vora Manager,
which you install and manage using your cluster provisioning tool, and the SAP HANA Vora services, which you
install and manage from the SAP HANA Vora administration UI.
SAP HANA Vora Manager
The SAP HANA Vora Manager is the base deployment for the SAP HANA Vora Services and provides the
infrastructure for managing their configuration and deployment. It consists of the following components:
Component                  Description
Consul (HashiCorp)         Consul is used to implement the discovery service, which manages the service
                           endpoints in the cluster and provides embedded health checks.
Nomad (HashiCorp)          Nomad is both a process scheduler and resource manager. It is responsible for
                           managing the SAP HANA Vora services as well as their node assignments. If a
                           service fails, Nomad will automatically keep trying to restart it until the
                           predefined number of retries has been reached.
SAP HANA Vora Manager UI   The SAP HANA Vora Manager UI shows the status of all SAP HANA Vora services
                           and allows them to be started, stopped, and configured, by specifying node
                           assignments and setting service parameters.
SAP HANA Vora Services
The SAP HANA Vora Manager UI is used to manage the individual SAP HANA Vora services, which are listed
below:
Service                         Description
Vora Catalog                    A metadata store
Vora Disk                       An engine for storing data to disk
Vora Distributed Log            A distributed commit log providing persistence for the Vora Catalog
Vora Document Store             An engine for working with documents
Vora Graph                      An engine for processing graph data
Vora In-Memory Engine           SAP HANA Vora relational in-memory engine
Vora Landscape Server           A service that controls data partitioning and placement across database engines
Vora Thriftserver               A gateway compatible with the Hive JDBC Driver
Vora Time Series                An engine for processing time series data
Vora Tools                      A web-based user interface with a SQL editor and OLAP modeler
Vora Transaction Broker         A service that manages user transactions
Vora Transaction Coordinator    A service that enforces consistent (meta)data modifications
Vora Transaction Lock Manager   A driver for query execution on the database engines with user session
                                semantics
SAP HANA Vora Binaries
When you deploy the SAP HANA Vora Manager, the SAP HANA Vora binaries included in the installation
package are distributed to all nodes in the cluster. These binaries also include the SAP HANA Vora Spark
extension library, which is contained in the JAR file spark-sap-datasources-<VERSION>-assembly.jar.
Related Information

Installing SAP HANA Vora [page 23]
2.4 Node Types and Node Assignments
When you deploy the SAP HANA Vora services on the cluster, you need to choose appropriate nodes. An
overview of the different node types and how and where SAP HANA Vora services should be deployed is given
below.
Node Types
For the purposes of setting up a cluster, four different types of cluster nodes are distinguished:
Node Type Description
Management node Contains the cluster provisioning tool, for example, Ambari, Cloudera, or MapR.
Master nodes Contain central cluster components, such as the NameNode server.
Worker nodes These are the compute nodes of the cluster. They contain components such as
DataNodes or NodeManagers.
Jump boxes Contain only client components, such as the HDFS client, and serve as an entry
point for users to start compute jobs using Spark.

Node Assignments
Install the SAP HANA Vora services on the cluster as outlined below:
Service Node Assignment
Vora Manager Install on all nodes in the cluster as follows:
● Masters: Install on at least one master node. Install on at least three master
nodes in production environments (recommended).
● Workers: Install on all nodes
● Clients: Install on all nodes
Vora Catalog Install on at least one node*.
Vora Disk Install on one or more nodes.
Vora Distributed Log Install on at least the same number of nodes as defined by the Distributed Log
replication factor for the Vora Catalog:
● Five nodes are recommended for production environments. This allows
you to use three-way replication and two standby nodes for failover. Any
additional nodes can also serve as standbys.
● When reassigning nodes, make sure that the number of nodes involved remains
below the replication factor (that is, does not exceed REPLICATION_FACTOR-1).
You might otherwise lose all nodes where data is persisted.
Vora Document Store Install on one or more nodes.
Vora Graph Install on one or more nodes.
Vora In-Memory Engine Install on worker nodes (those nodes where a DataNode is deployed):
● All worker nodes (recommended)
Vora Landscape Server Install on at least one node*.
Vora Thriftserver Install on at least one node, typically the jump box (recommended)*.
Vora Time Series Install on one or more nodes.
Vora Tools Install on at least one node, typically the jump box (recommended)*.
Vora Transaction Broker Install on at least one node*.
Vora Transaction Coordinator Install on at least one node*.
Vora Transaction Lock Manager Install on at least one node*.
Note
* This service runs on a single node. When started, it automatically runs on only one of the assigned nodes.
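The Distributed Log rules in the table above reduce to two integer checks. The following shell sketch is illustrative only: the function names dlog_nodes_ok and dlog_max_reassign are hypothetical helpers and are not part of SAP HANA Vora, and the node count and replication factor must be taken from your own configuration.

```shell
# Hypothetical helpers (not shipped with SAP HANA Vora).

# Succeeds if the number of Distributed Log nodes is at least
# the replication factor.
dlog_nodes_ok() {
    local nodes="$1" replication_factor="$2"
    [ "$nodes" -ge "$replication_factor" ]
}

# Prints the largest number of nodes that may be reassigned at once
# (REPLICATION_FACTOR - 1).
dlog_max_reassign() {
    local replication_factor="$1"
    echo $(( replication_factor - 1 ))
}
```

For the recommended production setup, dlog_nodes_ok 5 3 succeeds, and dlog_max_reassign 3 reports that at most two nodes should be reassigned in one operation.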

2.5 Installing SAP HANA Vora
To install SAP HANA Vora, first install and deploy the SAP HANA Vora Manager on your cluster. Once the SAP
HANA Vora Manager is up and running, you can configure and start the SAP HANA Vora services from the SAP
HANA Vora Manager UI.
The high-level installation steps are outlined below:
Step 1 (Terminal):
1. Download the SAP HANA Vora package
2. Unpack it
3. Restart the cluster manager
See: Prepare for Installation [page 23]
Step 2 (Terminal):
Generate an initial username and password for the SAP HANA Vora Manager and SAP HANA Vora Tools
See: Generate an Initial Password for SAP HANA Vora [page 25]
Note (Kerberos-enabled Hadoop clusters):
Before proceeding with the installation, refer to the Security section of the guide
See: Enabling Kerberos Authentication for SAP HANA Vora [page 91]
Review the step required before installation
See: Enable Access to a Secured Hadoop Cluster [page 93]
Step 3 (Cluster manager):
1. Add the Vora Manager service
2. Deploy it
See: Installing the SAP HANA Vora Manager [page 27]
Step 4 (SAP HANA Vora Manager UI):
1. Configure the SAP HANA Vora services
2. Start them
See: Deploy the SAP HANA Vora Services [page 36]
Note
If your Hadoop cluster requires an HTTP(S) proxy to access content through the HTTP(S) protocol, make
sure that the proxy is configured before starting SAP HANA Vora. For more information, see Configure
Proxy Settings [page 61].
2.5.1 Prepare for Installation
Download and extract the SAP HANA Vora installation package.
Procedure
Cluster Provisioning Tool Steps

Ambari 1. Log on to the Ambari cluster management node.

2. Download VORA_AM<version>.TGZ from the SAP Software Download Center
(https://launchpad.support.sap.com/#/softwarecenter) to the management
node.
3. Go to /var/lib/ambari-server/resources/stacks/HDP/
<HDP_version>/services.
4. Copy VORA_AM<version>.TGZ to that directory and extract it.
5. Restart the Ambari server with the following command:
$ ambari-server restart
Depending on your cluster configuration, you may need to be the root user or a
user with administrator rights to do so.
Ambari is now able to provision the SAP HANA Vora Manager on the Hadoop cluster.
Cloudera 1. Log on to the Cloudera cluster management node.

2. Download VORA_CL<version>.TGZ from the SAP Software Download Center
(https://launchpad.support.sap.com/#/softwarecenter) to a temporary directory
on the management node.
3. Extract the package.
4. Copy all files contained in the csd directory to /opt/cloudera/csd, the default
local descriptor repository path.
5. Copy all files contained in the parcel-repo directory to
/opt/cloudera/parcel-repo, the default local parcel repository path.
6. Restart the Cloudera server, for example as follows:
$ service cloudera-scm-server restart
Depending on your cluster configuration, you may need to be the root user or a
user with administrator rights to do so.
Cloudera is now able to provision the SAP HANA Vora Manager on the Hadoop cluster.
Note
Do not remove the temporary directory until you have generated the initial password
for SAP HANA Vora.
Note
SAP HANA Vora can only be installed as a Cloudera parcel and not as a Cloudera
package.
MapR 1. Download the file VORA_MR<VERSION>.TGZ from the SAP Software Download
Center (https://launchpad.support.sap.com/#/softwarecenter) to the cluster
host.

2. Extract the package to a directory, for example, /tmp/vora-install.

3. Create a group vora and user vora on all nodes of the cluster.
When adding a user to the cluster nodes, make sure that the user ID (UID) is always
the same. The same applies to the group ID (GID). For example:
sudo groupadd vora --gid 44936

sudo useradd vora --uid 44936 -g vora
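Once the user and group exist on every node, it can be worth confirming that the IDs really match. The sketch below is one possible check, not part of the product: ids_consistent is a hypothetical helper that reads lines of the form "<host> <id>" and succeeds only if every line carries the same ID. How the lines are collected (for example, over password-less SSH) is an assumption and is shown only as a comment.

```shell
# Hypothetical helper: reads "<host> <id>" lines on standard input and
# succeeds only if at least one line is given and all IDs are identical.
ids_consistent() {
    awk '
        NR == 1      { first = $2 }
        $2 != first  { mismatch = 1 }
        END          { if (NR == 0 || mismatch) exit 1 }
    '
}

# Possible usage (host names are placeholders, SSH access is assumed):
#   for h in node1 node2 node3; do
#       echo "$h $(ssh "$h" id -u vora)"
#   done | ids_consistent && echo "UID is consistent"
```

The same function can be fed with the output of id -g vora to compare the group ID.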
2.5.2 Generate an Initial Password for SAP HANA Vora
SAP HANA Vora is shipped with two UI tools: the SAP HANA Vora Manager, which is used to administer the
SAP HANA Vora services, and the SAP HANA Vora Tools, which allow you to query data and create relational
models. Both UIs require a username and password to log on.
Prerequisites
You have downloaded and extracted the SAP HANA Vora installation package as described in Prepare for
Installation [page 23].
Context
As the administrator, you need to create the initial username and password for both UIs during the installation
of SAP HANA Vora.
The password needs to be stored in an encrypted form in a file named htpasswd on the file system where
either the SAP HANA Vora Tools or SAP HANA Vora Manager will run. You therefore need to distribute the
htpasswd file to all nodes that have the master role (that is, where the SAP HANA Vora Manager will be
installed as a master) or that will host the SAP HANA Vora Tools.
Tip
If in doubt about the node assignment of the SAP HANA Vora Manager or SAP HANA Vora Tools (or to have
the flexibility to re-assign these services), copy the htpasswd file to the
/etc/vora/{manager,datatools} directories on all nodes and set the ownership and permissions there.
Set up the password file as described below.

Procedure
1. Execute the genpasswd.sh script.

You may need to run the script as the root user.
2. Enter the username and password.
Note
The same username and password will be used by the SAP HANA Vora Tools and the SAP HANA Vora
Manager as the initial username and password.
3. Enter a directory on the file system where the htpasswd file should be stored. If the path does not exist,
the script will create the path.
Note
The directory should have limited access permissions to prevent other users from being able to modify
files in the directory.
4. Set up the htpasswd file on all hosts that will host the SAP HANA Vora Manager service. Log in to each
host and do the following:
a. Create the group vora and user vora.
b. As the root user, create the directory /etc/vora/manager:
mkdir -p /etc/vora/manager
c. Copy htpasswd from the host where it was generated in step 3 to /etc/vora/manager.
d. Change the ownership of htpasswd to the user vora:
chown vora htpasswd
e. Change the permissions to rw for vora:
chmod 600 htpasswd
5. Set up the htpasswd file on all hosts that will host the SAP HANA Vora Tools. Log in to each host and do
the following:
a. Create the group vora and user vora.
b. As the root user, create the directory /etc/vora/datatools:
mkdir -p /etc/vora/datatools
c. Copy htpasswd from the host where it was generated in step 3 to /etc/vora/datatools.
d. Change the ownership of htpasswd to the user vora:
chown vora htpasswd
e. Change the permissions to rw for vora:
chmod 600 htpasswd
6. Continue with the installation as described in Installing the SAP HANA Vora Manager.
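Sub-steps b to e of steps 4 and 5 are identical apart from the target directory, so they can be wrapped in a small function. The following is a minimal sketch under stated assumptions: install_htpasswd is a hypothetical helper, it expects the htpasswd file to have already been copied to the local host, and it must be run by a user who is allowed to change ownership to the given owner (typically root).

```shell
# Hypothetical helper that bundles sub-steps b to e for one target directory.
# Usage: install_htpasswd <local htpasswd file> <target directory> <owner>
install_htpasswd() {
    local src="$1" dir="$2" owner="$3"
    mkdir -p "$dir" &&                    # b. create the directory
    cp "$src" "$dir/htpasswd" &&          # c. copy the password file
    chown "$owner" "$dir/htpasswd" &&     # d. hand ownership to the user
    chmod 600 "$dir/htpasswd"             # e. rw for the owner only
}
```

On a host that runs the SAP HANA Vora Manager this could be called as install_htpasswd /tmp/htpasswd /etc/vora/manager vora, and on a host that runs the SAP HANA Vora Tools with /etc/vora/datatools as the target directory. The source path /tmp/htpasswd is a placeholder.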

Related Information
Installing the SAP HANA Vora Manager [page 27]
2.5.3 Installing the SAP HANA Vora Manager
Use the Ambari, Cloudera, or MapR cluster provisioning tool to install and deploy the SAP HANA Vora Manager
on your cluster.
Roles
The SAP HANA Vora Manager is installed with the following roles:
Role Description
Masters The master role makes available the SAP HANA Vora Manager UI application for configuring
the SAP HANA Vora services. Install the master role on at least one node.
Workers The worker role provides agent functionality for the particular node on which it is installed.
Install the worker role on each node of the cluster.
Clients This package contains all SAP HANA Vora executables and basic configuration files. Install
the client on each node of the cluster.
Procedure
Install the SAP HANA Vora Manager as follows:
Cluster Administration Tool Procedure
Ambari Install the SAP HANA Vora Manager for Ambari [page 28]
Cloudera Install the SAP HANA Vora Manager for Cloudera [page 29]
MapR Installing the SAP HANA Vora Manager for MapR [page 31]
The cluster administration tool will configure and start Consul, Nomad, and the SAP HANA Vora Manager UI
(note that the individual components are not shown on the UI).
SAP HANA Vora Environment Variables
The /etc/vora/vora-env.sh file is automatically generated on each node by the SAP HANA Vora Manager
before the service is started. It is generated for both the master and worker roles.

The file contains environment variables for improved interaction with the SAP HANA Vora software, for
example, the variable VORA_SPARK_HOME. It is recommended to set these variables and source the file when
using SAP HANA Vora.
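A session can pick up these variables by sourcing the file. The sketch below shows one defensive way to do so; load_vora_env is a hypothetical helper name, and the only facts taken from this guide are the file location /etc/vora/vora-env.sh and the variable VORA_SPARK_HOME.

```shell
# Hypothetical helper: sources the generated environment file if it is
# readable and reports whether VORA_SPARK_HOME is set afterwards.
load_vora_env() {
    local env_file="${1:-/etc/vora/vora-env.sh}"
    if [ -r "$env_file" ]; then
        . "$env_file"
    fi
    [ -n "${VORA_SPARK_HOME:-}" ]
}
```

Calling load_vora_env || echo "VORA_SPARK_HOME is not set" at the start of a script makes a missing or incomplete environment file visible early.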
2.5.3.1 Install the SAP HANA Vora Manager for Ambari
Use the Ambari cluster provisioning tool to install the SAP HANA Vora Manager on your cluster.
Procedure
1. On the Ambari Administration UI, add the Vora Manager service.
a. On the Ambari dashboard, choose Actions > Add Service.

b. On the Choose Services screen, select the Vora Manager option and click Next.
2. On the Assign Masters screen, add the hosts on which the Vora Manager Master should run.
a. Select at least one master host.
Note
SAP HANA Vora requires that there is always at least one master instance running. You should
therefore consider installing the master on at least three nodes in production environments.
b. Click Next.
3. On the Assign Slaves and Clients screen, add the Vora Manager Worker and Vora Client as follows:
a. Add the Vora Manager Worker to all nodes.
b. Add the Vora Client to all nodes.
This distributes the SAP HANA Vora binaries to all nodes in the cluster.
c. Click Next.
4. Customize the service:
a. In the Advanced vora-manager-config section, correct the default log and data directory settings if
necessary.
b. If you want to run SAP HANA Vora with a non-root user, set vora_manager_run_as_user. For more
information, see Run SAP HANA Vora As a Non-Root User [page 67].
5. Deploy the service and complete the installation.
Results
When deployment has completed, the Vora Manager service should be up and running and its status should be
shown as green.
Both Consul and Nomad should also be up and running and you should be able to access the Vora Manager UI
at <VORA MASTER HOST>:19000.
Related Information
Deploy the SAP HANA Vora Services [page 36]
2.5.3.2 Install the SAP HANA Vora Manager for Cloudera
Use the Cloudera cluster provisioning tool to install the SAP HANA Vora Manager on your cluster.
Prerequisites
Cloudera (CDH) has been installed as a parcel.
Note
SAP HANA Vora can only be installed as a Cloudera parcel and not as a Cloudera package.
Context
Remember
● Install the master role on at least one node.

● Install the worker role on all nodes of the cluster.
● Install the gateway role (client) on all nodes of the cluster.
Procedure
1. In the Cloudera Manager, distribute and activate the Vora Manager parcel.
a. In the main menu, choose Hosts > Parcels.

b. In the parcel list, locate SAPHanaVora and choose the Distribute button.
Wait until the parcel's status is shown as distributed.
c. Choose the Activate button.
d. Choose OK to confirm.
The parcel's status is shown as distributed and activated.
2. Add the Vora Manager service.
a. Go to the Home screen.
b. Open the drop-down menu next to the cluster name and choose Add Service.
A list of service types is displayed.
c. On the Add Service screen, select the Vora Manager option and choose Continue.
3. On the role assignment page, assign the hosts.
a. Click the box below Vora Manager Master.
The Hosts Selected dialog box appears.
b. Select at least one master host.
Note
SAP HANA Vora requires that there is always at least one master instance running. You should
therefore consider installing the master on at least three nodes in production environments.
c. Choose OK.
d. Click the box below Vora Manager Worker.
e. Add the Vora Manager worker role to all nodes.
Note
All nodes need the worker role.
f. Choose OK.
g. Click the box below Gateway.
h. Add the Vora Manager gateway role to all nodes.
This distributes the SAP HANA Vora binaries to all nodes in the cluster.
Note
All nodes need the gateway role.

i. Choose OK and then Continue.
4. Review the changes:
a. Correct the default log and data directory settings if necessary.
b. If you want to run SAP HANA Vora with a non-root user, set User to run vora services, Group to run
vora services, System User, and System Group. For more information, see Run SAP HANA Vora As a
Non-Root User [page 67].
c. Choose Continue.
5. When the SAP HANA Vora Manager has been successfully deployed and started, choose Continue and
then Finish.
Results
When deployment has completed, the Vora Manager service should be up and running and its status should be
shown as green.
Both Consul and Nomad should also be up and running and you should be able to access the Vora Manager UI
at <VORA MASTER HOST>:19000.
Related Information
Deploy the SAP HANA Vora Services [page 36]
2.5.3.3 Installing the SAP HANA Vora Manager for MapR
Install the SAP HANA Vora package for MapR on your cluster. This is currently a manual installation process.
Prerequisites
● The MapR cluster is already set up.

● The MapR File System must be accessible through NFS on every node where SAP HANA Vora is deployed.
● The mechanism for the MapR central configuration has been established.
● Apache Spark (version 1.6.1) has been installed and is fully functional (for example, the Spark shell can be
launched without any errors).
● It is recommended to install Hive and the Hive Metastore, which should be properly configured to allow it
to be accessed by Spark.

SAP HANA Vora RPM Packages
The files contained in the SAP HANA Vora package are RPM packages that can be installed with package
management tools like yum (for the Red Hat or CentOS Linux distribution). The following table describes the
RPM packages required to install SAP HANA Vora:
Package Name Description
mapr-vora-base-<version>.<arch>.rpm SAP HANA Vora base package: This package contains all SAP HANA
Vora executables and basic configuration files.
It needs to be installed on each node of the cluster.
mapr-vora-manager-<version>.<arch>.rpm Configuration files for the SAP HANA Vora Manager.
It needs to be installed on each node on which the SAP HANA Vora services
are deployed. Depending on the role to be played by the node, the
mapr-vora-manager-master package, the mapr-vora-manager-worker package,
or both need to be deployed in addition.
Prerequisites: vora-base and mapr-core
mapr-vora-manager-master-<version>.<arch>.rpm Configuration files for the master role of the SAP HANA Vora Manager.
The master role of the SAP HANA Vora Manager makes available the
SAP HANA Vora Manager UI application for configuring the SAP HANA
Vora services. It is recommended to install this role on ZooKeeper,
CLDB, or resource manager nodes.
SAP HANA Vora requires that there is always at least one instance of this
role running. You should therefore consider installing this role on at least
three nodes in production environments.
Prerequisites: mapr-vora-manager
mapr-vora-manager-worker-<version>.<arch>.rpm Configuration files for the worker role of the SAP HANA Vora Manager.
The worker role of the SAP HANA Vora Manager provides agent functionality
for the particular node on which it is installed. Install this role on
all nodes of the cluster.
Prerequisites: mapr-vora-manager
Note
The MapR installer cannot yet be used to deploy the HANA Vora components across the cluster. However,
the manual installation steps required can be easily automated, using password-less SSH access as
described in the MapR installation guide.
Procedure
1. Install the SAP HANA Vora Manager [page 33]

2. Configure the SAP HANA Vora Manager [page 34]
3. Start the SAP HANA Vora Manager [page 35]
2.5.3.3.1 Install the SAP HANA Vora Manager
Install the SAP HANA Vora roles on the appropriate nodes of the cluster.
Prerequisites
The tool used in step 4 requires a password-less SSH connection to all nodes in the cluster. The user must
either be root or able to invoke sudo. For more information, see Configure Sudo Access [page 17].
Context
It is recommended that you distribute the SAP HANA Vora Manager roles on the cluster as follows:
● On master nodes, for example, nodes containing the ZooKeeper or CLDB service: Deploy the packages
vora-base, mapr-vora-manager, mapr-vora-manager-master, and mapr-vora-manager-worker.
● On worker nodes, for example, nodes containing the NodeManager service: Deploy the packages
vora-base, mapr-vora-manager, and mapr-vora-manager-worker.
Perform the steps outlined below on all nodes of the cluster.
Procedure
1. Log on to a cluster node with an administrative user, for example, the mapr user.
2. Navigate to the installation directory. For example:
cd /tmp/vora-install
3. Install the packages as follows:
○ For the master role:
sudo yum install vora-deps-..rpm

sudo yum install vora-base-<version>rpm
sudo yum install mapr-vora-manager-<version>rpm
sudo yum install mapr-vora-manager-master-<version>rpm
sudo yum install mapr-vora-manager-worker-<version>rpm
sudo /opt/mapr/server/configure.sh -R -no-autostart
○ For the worker role:
sudo yum install vora-deps-..rpm

sudo yum install vora-base-<version>rpm
sudo yum install mapr-vora-manager-<version>rpm
sudo yum install mapr-vora-manager-worker-<version>rpm
sudo /opt/mapr/server/configure.sh -R -no-autostart
4. Repeat this procedure on further nodes. You can use a small utility tool to distribute the software and
installation across the nodes. For example:
a. Deploy the vora-manager-master role to all nodes containing the CLDB service:
/opt/mapr/vora/service-control.sh manager-master deploy \
--ref=cldb
b. Deploy the vora-manager-worker role to all nodes containing the NodeManager service:
/opt/mapr/vora/service-control.sh manager-worker deploy \
--ref=nodemanager
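When scripting the per-node installation, it can help to keep the role-to-package mapping in one place. The sketch below encodes only the recommended distribution from the Context section above; vora_packages_for_role is a hypothetical helper, and it deliberately prints package names without versions or file extensions, because those depend on the downloaded package. The vora-deps package from the installation commands is not covered.

```shell
# Hypothetical helper: prints the package names for a role, one per line,
# following the recommended distribution of the SAP HANA Vora Manager roles.
vora_packages_for_role() {
    case "$1" in
        master)
            printf '%s\n' vora-base mapr-vora-manager \
                mapr-vora-manager-master mapr-vora-manager-worker ;;
        worker)
            printf '%s\n' vora-base mapr-vora-manager \
                mapr-vora-manager-worker ;;
        *)
            echo "unknown role: $1" >&2
            return 1 ;;
    esac
}
```

The output can be used to check which of the RPM files in the installation directory belong on a given node before running yum.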
2.5.3.3.2 Configure the SAP HANA Vora Manager
After the installation of the packages, you can adjust the SAP HANA Vora Manager configuration to suit your
own requirements.
Context
The SAP HANA Vora Manager configuration is contained in two configuration files:
● /opt/mapr/conf/conf.d/vora_default_settings.sh
This file contains all configuration parameters for the SAP HANA Vora services. It is realized as a shell
script and uses environment variables for interaction with the SAP HANA Vora Manager. You can change
the parameters for the ports and log location in this file.
● /etc/vora/vora-env.sh
This file contains environment variables for working with the SAP HANA Vora software.
If possible, only make changes to the configuration in the vora_default_settings.sh file.
Procedure
1. Copy the file /opt/mapr/conf/conf.d/vora_default_settings.sh to a different local directory. For
example:
cp /opt/mapr/conf/conf.d/vora_default_settings.sh /tmp/vora_default_settings.sh
2. Edit the temporary configuration file with a text editor.

3. Upload the temporary configuration file to the central configuration:
hadoop fs -mkdir -p /var/mapr/configuration/conf/conf.d
hadoop fs -put /tmp/vora_default_settings.sh /var/mapr/configuration/conf/conf.d
After some time, the central configuration will have been replicated to all cluster nodes.
The same procedure can be applied to the environment variables file, if required.
2.5.3.3.3 Start the SAP HANA Vora Manager
After the installation of the SAP HANA Vora Manager, two new services are available as MapR services. These
are the vora-manager-master and vora-manager-worker.
Context
The services are visible on the installed nodes using either the MapR Control System or the maprcli
command-line tool. By default, the services are installed but not automatically started.
Note
The SAP HANA Vora Manager only becomes functional in the master role if the Vora Manager is started on
all nodes on which the master role is installed.
In order to start the SAP HANA Vora Manager, proceed as described below.
Procedure
1. Start the Vora Manager (masters only) as follows:
sudo /opt/mapr/vora/service-control.sh manager-master start
2. Start the Vora Manager (workers only) as follows:
sudo /opt/mapr/vora/service-control.sh manager-worker start
3. Log on to the MapR Control System and verify the service status on the various cluster nodes.

2.5.4 Deploy the SAP HANA Vora Services
Use the SAP HANA Vora Manager UI to configure and deploy the SAP HANA Vora services on your cluster.
Prerequisites
The SAP HANA Vora Manager is up and running.
Context
The SAP HANA Vora Manager UI allows you to start and stop services as well as manage their configuration
and node assignments.
When initially installed, the SAP HANA Vora services are not yet configured. Before starting the services, work
through the service list and for each service:
● Ensure that the configuration parameters are correctly set

● Assign the nodes on which the service should be deployed
Note that you can also run individual services or all services straight away by simply loading their default
configuration and starting them. You might find this useful for a quick test. However, it is recommended that
you explicitly configure the services before starting them.
Procedure
1. Open the SAP HANA Vora Manager UI.

a. Point your browser to <VORA MASTER HOST>:19000.
b. Log in using the initial user and password defined earlier.
2. Choose Services.
3. In the list on the Services screen, select the service to be configured.
The Configuration and Node Assignment tabs for the selected service appear.

4. On the Configuration tab, enter any required values and correct the default log settings and other default
values if necessary.
○ For the Vora Catalog, check in particular the following setting:
Parameter Description
Distributed Log replication factor This value defines the availability and durability guarantees
for the metadata. It can be at most the number of nodes assigned to the Distributed Log.
○ For the Vora Thriftserver, enter the following required information:
Location of Spark installation for SAP HANA Vora Thriftserver This value depends on where Spark is installed on your system.
Location of Java installation for SAP HANA Vora Thriftserver This value depends on where Java is installed on your system.
Note
The SAP HANA Vora Thriftserver runs an instance of Hive ThriftServer2. Since Hive is used
internally, you need to have either a working Hive configuration or no Hive configuration at all.
5. On the Node Assignment tab, assign the selected service to the appropriate nodes.
○ Specify the number of instances to run:

Number of instances The number of instances to run on the assigned nodes.

If a service only supports one instance, this parameter
is set to 1 and cannot be changed.
Run instances on distinct hosts If selected (default), only one instance is run on each
assigned host.
Note that for the Vora Distributed Log the number of instances automatically equals the number of
nodes selected.
○ Select the nodes on which the service should run.

You need to select at least the same number of nodes as specified in the Number of instances field, if
the Run instances on distinct hosts option is also selected.
For more information about which nodes to assign, see Node Types and Node Assignments [page 21].
6. Choose Apply to save.

The status of the service is now shown as configured.
Note
You need to save the configuration for each individual service. Once a configuration has been saved the
status of the service changes from Not Configured to Configured. You can also start a service even if it
has not been configured. In this case, the default configuration will be applied (you will be prompted to
confirm that you want to start the service with the default configuration).
7. Start all services.

When you have configured and completed the node assignments for all services, choose Start All.
All services are started and their status shown as running. The health of each service as given by Consul is
also indicated.
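The node-selection rule from step 5 (at least as many nodes as instances when Run instances on distinct hosts is selected) can be checked before choosing Apply. The following sketch only restates that rule as a shell test; assignment_ok is a hypothetical helper and does not query the SAP HANA Vora Manager.

```shell
# Hypothetical helper.
# Usage: assignment_ok <number of instances> <number of selected nodes> <yes|no>
# The third argument states whether "Run instances on distinct hosts" is selected.
assignment_ok() {
    local instances="$1" nodes="$2" distinct="$3"
    if [ "$distinct" = "yes" ]; then
        # One instance per host: enough hosts are needed for all instances.
        [ "$nodes" -ge "$instances" ]
    else
        return 0
    fi
}
```

For example, assignment_ok 3 2 yes fails because three instances cannot be placed on two distinct hosts, whereas assignment_ok 3 2 no succeeds.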
Related Information

Start and Stop the SAP HANA Vora Services [page 71]
Examine the SAP HANA Vora Nodes [page 74]

2.6 Validate the SAP HANA Vora Installation
To check that the SAP HANA Vora engine and extension library have been correctly installed and that you can
use the SAP HANA Vora features in Spark, create a table and load data into it from a file stored in HDFS.
Prerequisites
● You have already successfully deployed the SAP HANA Vora services on the cluster and the instances are
running.
● You have already installed Spark.
Context
The SAP HANA Vora Spark extension is located in the vora-spark directory. The exact location of the
directory depends on which cluster manager you are using. It is recommended to set the $VORA_SPARK_HOME
environment variable to point to this directory. It is contained in the /etc/vora/vora-env.sh file together
with other environment variables, which allow you to interact more easily with SAP HANA Vora.
Example
Ambari
VORA_SPARK_HOME=/var/lib/ambari-agent/cache/stacks/HDP/<HDP_version>/services/vora-manager/package/lib/vora-spark
Cloudera
VORA_SPARK_HOME=/opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-spark
MapR
VORA_SPARK_HOME=/opt/vora/lib/vora-spark
The vora-spark directory contains the following folders:
● lib/: Contains the spark-sap-datasources-<VERSION>-assembly.jar file with all necessary

dependencies (excluding Spark).
● bin/: Contains scripts for ease of use.
● META-INF/: Contains the pom.properties and pom.xml files.
Procedure
1. Create a file in HDFS. Note that in this example the test file, test.csv, is stored in a directory set up for
the user "vora" (/user/vora):

Sample Code
echo "1,2,Hello" > test.csv

hadoop fs -put test.csv /user/vora/test.csv
hadoop fs -cat /user/vora/test.csv
1,2,Hello
2. Open a Spark shell, for example, by using the shell script:
$VORA_SPARK_HOME/bin/start-spark-shell.sh
3. Enter the following statements in the Spark shell to create a table and check that it has been successfully
created:
scala> import org.apache.spark.sql.SapSQLContext

scala> val vc = new SapSQLContext(sc)
scala> val testsql = """
CREATE TABLE table001 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
OPTIONS (
files "/user/vora/test.csv"
)"""
scala> vc.sql(testsql)
scala> vc.sql("show tables").show
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| table001| false|
+---------+-----------+
scala> vc.sql("SELECT * FROM table001").show
+---+--+-----+
| a1|a2| a3|
+---+--+-----+
|1.0| 2|Hello|
+---+--+-----+
scala > <Ctrl-D to quit>
Results
You have now successfully validated the SAP HANA Vora extension and can use it as follows:
● The JAR file in the lib folder (spark-sap-datasources-<VERSION>-assembly.jar) can be provided to
Spark using the --jars option.
For example, assuming the spark-shell command is on the user's path:
$ spark-shell --jars $VORA_SPARK_HOME/lib/spark-sap-datasources-<VERSION>-assembly.jar
● Alternatively, the shell scripts in the bin folder can be used to run a Spark shell with the SAP HANA Vora
extension library. To do so, the SPARK_HOME environment variable needs to point to the Spark folder on
the jump box.
You can then start the Spark shell in Yarn client mode as follows:
$ ./start-spark-shell.sh --master yarn-client
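Because the JAR file name contains a version, scripts may prefer to resolve it rather than hard-code it. The sketch below is one way to do that, assuming exactly one matching file exists in the lib folder; find_vora_jar is a hypothetical helper and is not one of the scripts shipped in the bin folder.

```shell
# Hypothetical helper: prints the path of the extension JAR found under
# <vora-spark directory>/lib and fails unless exactly one file matches.
# Without an argument, the directory is taken from VORA_SPARK_HOME.
find_vora_jar() {
    local home="${1:-${VORA_SPARK_HOME:-}}"
    # Expand the pattern into the function's positional parameters.
    set -- "$home"/lib/spark-sap-datasources-*-assembly.jar
    [ "$#" -eq 1 ] && [ -f "$1" ] && printf '%s\n' "$1"
}
```

With VORA_SPARK_HOME set and spark-shell on the path, the shell could then be started with spark-shell --jars "$(find_vora_jar)".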

2.7 Install the SAP HANA Vora Zeppelin Interpreter
Zeppelin is a graphical user interface that allows you, as a data scientist, to interact easily with a cluster. The
SAP HANA Vora Spark extension provides an interpreter for the Zeppelin user interface.
Prerequisites
Zeppelin is properly installed and functioning correctly on the cluster:
● You require Zeppelin 0.6.x built against Spark 1.6, Hadoop 2.7, Yarn, and Scala 2.10.
● Zeppelin 0.6.0 is available as a binary package for Scala 2.10 (http://zeppelin.apache.org/download.html).
Note that the Zeppelin 0.6.1 binary download is for Scala 2.11 only.
Note
The Zeppelin binaries made available by Hortonworks Ambari are not compatible with SAP HANA Vora.
Context
The SAP HANA Vora extension library has its own SQLContext class. A modified Zeppelin interpreter,
spark.vora, is therefore required to allow Zeppelin to run in the modified context. To enable the interpreter,
you need to register it with Zeppelin.
Procedure
1. Copy zeppelin/zeppelin*.jar to <ZEPPELIN_HOME>/interpreter/spark:
$ cp $VORA_SPARK_HOME/zeppelin/zeppelin-<VERSION>.jar \
<ZEPPELIN_HOME>/interpreter/spark/
Note
The location of the zeppelin*.jar file depends on your installation:
○ Ambari, for example: /var/lib/ambari-agent/cache/stacks/HDP/<HDP_version>/
services/vora-manager/package/lib/vora-spark/zeppelin/
○ Cloudera, for example: /opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-
spark/zeppelin/
○ MapR, for example: /opt/vora/lib/vora-spark/zeppelin/
<ZEPPELIN_HOME> refers to the directory to which the Zeppelin binaries have been extracted.

2. Extract the shipped interpreter-setting.json and include it in the zeppelin-spark.jar file:
$ cd <ZEPPELIN_HOME>/interpreter/spark
# extract the new interpreter settings
$ jar xf zeppelin-<VERSION>.jar interpreter-setting.json
# replace the old one in the zeppelin-spark jar and remove it
$ jar uf zeppelin-spark-<ZEPPELIN_VERSION>.jar interpreter-setting.json
$ rm interpreter-setting.json
3. Add the following variables to the <ZEPPELIN_HOME>/conf/zeppelin-env.sh file:
○ HDP/CDH:
export MASTER=yarn-client
○ MapR 5.x:
export MASTER=yarn-client
export HADOOP_CONF_DIR="/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop"
export HADOOP_HOME="/opt/mapr/hadoop/hadoop-2.7.0/"
export ZEPPELIN_JAVA_OPTS="-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"
Example
1. cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template $ZEPPELIN_HOME/conf/zeppelin-env.sh
2. chmod 0755 $ZEPPELIN_HOME/conf/zeppelin-env.sh
3. vi $ZEPPELIN_HOME/conf/zeppelin-env.sh
4. Insert the variables shown above and save your changes.
Note
Zeppelin also requires the environment variables SPARK_HOME and HADOOP_CONF_DIR to be set. If
these are not already set, you can add them to the zeppelin-env.sh file as well.
4. Add the interpreter class sap.zeppelin.spark.SapSqlInterpreter to the zeppelin.interpreters
property in the <ZEPPELIN_HOME>/conf/zeppelin-site.xml file:
...
<property>
<name>zeppelin.interpreters</name>
<value>INTERPRETER_1,...,INTERPRETER_N,sap.zeppelin.spark.SapSqlInterpreter</value>
<description>Comma separated interpreter configurations.
First interpreter becomes the default</description>
</property>
...
Note
Make sure that the SAP interpreter class sap.zeppelin.spark.SapSqlInterpreter occurs after
the Spark interpreter class org.apache.zeppelin.spark.SparkInterpreter in the resulting list
of interpreters.

5. Optional: Add the following port information to the zeppelin-site.xml file:
<property>
<name>zeppelin.server.port</name>
<value>9099</value>
<description>Server port.</description>
</property>
6. For HDP with Ambari only: Update the YARN configuration as follows:
a. Check the installed HDP version (<HDP_VERSION>), for example, from the following directory
name: /usr/hdp/<HDP_VERSION>
b. On the Ambari administration interface, select the YARN service and choose the Configs > Advanced
tab. Scroll down to the Custom yarn-site section and choose Add Property.
c. Add a property with the key hdp.version and value <HDP_VERSION>.
7. Start the Zeppelin server:
$ <ZEPPELIN_HOME>/bin/zeppelin-daemon.sh start
8. In a web browser, open Zeppelin: http://<VORA_JUMPBOX_HOST>:9099

9. Remove and re-add the Spark interpreter:
a. In the top right corner, click your user name and in the dropdown menu choose Interpreter:
b. Remove the Spark interpreter and confirm its removal.

c. Choose the Create button to create a new interpreter.
d. Re-add the Spark interpreter, name it spark, and choose spark as the interpreter group:
e. MapR only: Add the mapr-zookeeper JAR file as a dependency of your SAP HANA Vora installation.
For example, /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/zookeeper-3.4.5-mapr-1503.jar.
The mapr-zookeeper dependency must come before the spark-sap-datasources-assembly
JAR file (which you add in the next step).
f. Add the spark-sap-datasources-assembly JAR file as a dependency of your SAP HANA Vora
installation. For example, $VORA_SPARK_HOME/lib/spark-sap-datasources-<VERSION>-assembly.jar.
g. Make sure that master is still set to yarn-client.
h. Make sure that the Spark-specific properties match your cluster's environment.
The spark.executor.memory property should not be set to a value higher than the available
memory on the host where the Spark and SAP HANA Vora jobs will be executed. Typically the default
value is 512m.
i. Save your changes.
The Spark interpreter should be visible again and should now include spark.vora:
10. Test that the Zeppelin interpreter has been successfully installed:
a. Create a new notebook and add the following two scripts:
%spark.vora CREATE TABLE table01 (a1 double, a2 int, a3 string)
USING com.sap.spark.vora
%spark.vora SHOW TABLES
b. Execute the scripts.
The execution of the first snippet might take some time (1-3 minutes), since a Spark application needs to
be started on the server. Once the application is running, subsequent calls will be much faster (depending
on the actual query).
Example output:

Note
The log files are available as follows:
○ <ZEPPELIN_HOME>/logs/zeppelin-*-.log: Contains the Web-UI related output.
○ <ZEPPELIN_HOME>/logs/zeppelin-interpreter-*-.log: Contains the output you would see
in a Spark shell.
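The ordering requirement from step 4 (the SAP interpreter class must come after the Spark interpreter class) can be checked before starting the server. The following is a minimal sketch in Python, not part of the SAP HANA Vora or Zeppelin tooling; the function names are hypothetical, and it assumes that zeppelin-site.xml contains <property> elements with <name> and <value> children as shown above.

```python
import xml.etree.ElementTree as ET
from typing import List

SPARK_INTERPRETER = "org.apache.zeppelin.spark.SparkInterpreter"
SAP_INTERPRETER = "sap.zeppelin.spark.SapSqlInterpreter"


def interpreters_from_site_xml(xml_text: str) -> List[str]:
    """Return the class names listed in the zeppelin.interpreters property."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "zeppelin.interpreters":
            value = prop.findtext("value") or ""
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def sap_interpreter_order_ok(interpreters: List[str]) -> bool:
    """True if both classes are listed and the SAP class follows the Spark class."""
    if SPARK_INTERPRETER not in interpreters or SAP_INTERPRETER not in interpreters:
        return False
    return interpreters.index(SAP_INTERPRETER) > interpreters.index(SPARK_INTERPRETER)
```

To use it, read <ZEPPELIN_HOME>/conf/zeppelin-site.xml into a string, pass it to interpreters_from_site_xml, and pass the result to sap_interpreter_order_ok.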
Related Information
Spark Interpreter for Apache Zeppelin
2.8 Connect SAP HANA Spark Controller to SAP HANA Vora
Configure the Spark controller to use SAP HANA Vora. This allows you to connect from SAP HANA to SAP
HANA Vora and query SAP HANA Vora tables.
Prerequisites
The Spark controller has been installed and configured. For more information, see Set up Spark Controller
Manually in the SAP HANA Administration Guide.
Note that the Confirm Connection to Hive Metastore step is not necessary when you run the Spark controller
with SAP HANA Vora. If you copy hive-site.xml into the Spark controller’s conf directory, you might
encounter issues unless you have a valid Hive installation that is appropriately configured and your Hive
metastore is running properly.
Context
Note
If the Spark controller has been installed through Ambari, you should also configure the service using the
Ambari UI. This applies to the settings that you need to make in the following configuration files:
● hana_hadoop-env.sh: Use the Advanced hana_hadoop-env section on the Spark controller Configs
tab.
● hanaes-site.xml: Use the Custom hanaes-site section on the Spark controller Configs tab.
Then save your configuration changes and restart the Spark controller service.

Note
To use the Spark controller with MapR, see SAP Note 2408096 for more information.
Procedure
1. Make the SAP HANA Vora data sources JAR and the Spark assembly JAR available to the Spark controller:
a. Identify the SAP HANA Vora data sources JAR file. It is usually located under the following path:
$ echo $VORA_SPARK_HOME/lib/spark-sap-datasources-<TAB>
If the VORA_SPARK_HOME environment variable is not set, you can identify the file by searching as
follows:
$ sudo find / -name "spark-sap-datasources-*.jar"
b. If not done during the general Spark controller setup, identify the Spark assembly JAR:
$ echo $SPARK_HOME/lib/spark-assembly-<TAB>
If the SPARK_HOME environment variable is not set, you can identify the file by searching as follows:
$ sudo find / -name "spark-assembly-*.jar"
c. Set the following environment variables in /usr/sap/spark/controller/conf/hana_hadoop-env.sh:
export HANA_SPARK_ASSEMBLY_JAR=<PATH_TO_SPARK_ASSEMBLY_JAR>
export HANA_SPARK_ADDITIONAL_JARS=<PATH_TO_SAP_HANA_VORA_DATASOURCE_JAR>
Make sure that you use the same versions that you are using to create tables. Compatibility between
different packages is not always guaranteed.
2. Configure the Spark controller.
In the Spark controller configuration file /usr/sap/spark/controller/conf/hanaes-site.xml,
change the value of the property sap.hana.hadoop.datastore from hive to vora. It should look like
this:
<property>
<name>sap.hana.hadoop.datastore</name>
<value>vora</value>
<final>true</final>
</property>
Note
You need to make sure that the Spark-specific properties match your cluster's environment, that is,
spark.executor.memory and spark.executor.instances. Otherwise the Spark controller may
not be able to start up properly because of resource allocation issues. For more information, see Spark
Controller [page 67].
3. For Cloudera only:

a. Add the following line to /usr/sap/spark/controller/conf/hana_hadoop-env.sh:
export HADOOP_CLASSPATH=`hadoop classpath`
b. In the /usr/sap/spark/controller/bin/hanaes script, change the CLASSPATH line so that the
Hadoop classpath comes first.
Change:
CLASSPATH="${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"
To:
CLASSPATH="${HADOOP_CLASSPATH}:${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}"
4. Restart the Spark controller.
For the configuration changes to take effect, restart the Spark controller, for example, using the following
commands:
$ cd /usr/sap/spark/controller/bin
$ ./hanaes stop
$ ./hanaes start
5. Verify the configuration changes.
To verify whether the configuration changes were successful, check the Spark controller log
file: /var/log/hanaes/hana_controller.log
After initialization, the file should contain the following lines at the end:
(DATE and TIME) INFO Server: Running Spark Controller
(DATE and TIME) INFO CommandRouter: Connecting to Vora Engine
(DATE and TIME) INFO CommandRouter: Initialized Router
(DATE and TIME) INFO CommandRouter: Server started
If these lines are missing, double-check whether the spark-sap-datasources-<VERSION>-assembly.jar
is present and the configuration settings are correct.
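The log check in step 5 can be scripted. The sketch below is an illustration only: it assumes that each expected message appears verbatim somewhere in the log text, and it ignores the date and time prefix as well as the order of the lines. The function name is hypothetical.

```python
from typing import List

# Messages the guide expects at the end of hana_controller.log after a successful start.
EXPECTED_MESSAGES = [
    "INFO Server: Running Spark Controller",
    "INFO CommandRouter: Connecting to Vora Engine",
    "INFO CommandRouter: Initialized Router",
    "INFO CommandRouter: Server started",
]


def missing_startup_messages(log_text: str) -> List[str]:
    """Return the expected messages that do not occur in any line of the log text."""
    lines = log_text.splitlines()
    return [
        message
        for message in EXPECTED_MESSAGES
        if not any(message in line for line in lines)
    ]
```

An empty result means that all four messages were found. For example, read /var/log/hanaes/hana_controller.log into a string and pass it to missing_startup_messages.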
Results
After successful configuration, you can see the tables stored in SAP HANA Vora in SAP HANA Studio, and you
can add virtual tables and submit queries, as described in the SAP HANA Spark Controller documentation.
Related Information
Using SAP HANA Spark Controller

Accessing SAP HANA Vora from SAP HANA [page 80]

2.9 Connect SAP Lumira to SAP HANA Vora
Connect SAP Lumira to SAP HANA Vora to visualize data from SAP HANA Vora, Spark, and SAP HANA, in SAP
Lumira.
Prerequisites
● You need SAP Lumira version 1.29 or higher.

● MapR only: You need an OS user lumira with the password lumira on all nodes that could be running the
SAP HANA Vora Thriftserver.
Context
To use SAP Lumira with SAP HANA Vora, you need to install the relevant drivers in SAP Lumira to be able to
connect from it using JDBC. This allows you to create a connection to SAP HANA Vora using the SAP HANA
Vora Thriftserver.
Procedure
1. Install the JDBC driver. You need to use the Spark drivers.
a. Open SAP Lumira and choose Preferences > SQL Drivers.
b. Select Generic JDBC datasource – JDBC Drivers and choose Install Drivers.
c. Select all *.jar files under C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC,
choose Open, and then Done.

d. To apply the driver changes, restart SAP Lumira.
2. Start the SAP HANA Vora Thriftserver from the SAP HANA Vora Manager UI.
3. Create a connection to SAP HANA Vora.
a. In SAP Lumira, choose File > New.
The Add new dataset dialog box appears.
b. Select Query with SQL and choose Next.
c. Select Generic JDBC datasource – JDBC Drivers and choose Next. Note that the green tick indicates
that the drivers are installed.
d. Enter the required credentials and connection URLs as follows:

Field        Value
User Name    lumira
Password     lumira
JDBC URL     jdbc:spark://<host>:<port>/default;CatalogSchemaSwitch=0;UseNativeQuery=1
             ○ host: Host name of the Thrift server
             ○ port: The default value is 19123
JDBC Class   com.simba.spark.jdbc4.Driver
e. Choose Connect.
You should now see the CATALOG_VIEW, where you can select tables and enter SQL queries.
4. Use Beeline, a JDBC client, to register tables created in SAP HANA Vora in the Thrift server.
a. Open the Beeline command line client:
./beeline
b. Execute the following statement to connect to the Thrift server, replacing the host name and port as
needed:
!connect jdbc:hive2://<hostname of thrift server>:<port, default: 19123>
c. When prompted for a user name and password, enter lumira in both cases.
d. Register the tables by running the following command:
REGISTER ALL TABLES USING com.sap.spark.vora;
Note
Table definitions are stored in the SAP HANA Vora catalog. This allows you to register and
re-register tables whenever you start or restart the Thrift server. The tables are persisted as
long as the Thrift server is connected.
5. View the data in SAP Lumira.

a. In SAP Lumira, refresh the CATALOG_VIEW (see step 3 above) by choosing Previous and then Next.
b. Drill down in the CATALOG_VIEW into Spark to see the tables available on the Thrift server.

c. In the Query field, enter a select statement and choose Preview. Note that you need to use the same
format for select statements as in the Beeline command line client.
A preview of the selected data is displayed.
d. Use the standard SAP Lumira functionality to create a report and visualize the data.
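The two connection strings used in this procedure differ only in their scheme and suffix. As a small sketch (the helper names are hypothetical and not part of SAP Lumira or Beeline), they can be derived from the Thrift server host and port as follows, using the default port 19123 given above:

```python
DEFAULT_THRIFT_PORT = 19123  # default Thrift server port stated in this guide


def lumira_jdbc_url(host: str, port: int = DEFAULT_THRIFT_PORT) -> str:
    """JDBC URL for the SAP Lumira connection dialog (step 3d)."""
    return f"jdbc:spark://{host}:{port}/default;CatalogSchemaSwitch=0;UseNativeQuery=1"


def beeline_connect_command(host: str, port: int = DEFAULT_THRIFT_PORT) -> str:
    """Connect statement to enter at the Beeline prompt (step 4b)."""
    return f"!connect jdbc:hive2://{host}:{port}"
```

For a Thrift server on the hypothetical host thrift.example.com, lumira_jdbc_url("thrift.example.com") yields the value for the JDBC URL field, and beeline_connect_command("thrift.example.com") yields the statement for Beeline.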
Related Information
SAP Lumira
Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster [page 102]
2.10 Updating SAP HANA Vora
Update your SAP HANA Vora installation by downloading and installing the latest version of the installation
package. The update process involves a complete uninstall of SAP HANA Vora followed by a fresh install.
Table and View Definitions
The table, view, and partitioning function definitions in the SAP HANA Vora Catalog will not be automatically
migrated. Use the SAP HANA Vora data migration feature to recreate tables, views, and partitioning functions
after an update. This applies to support package updates (SAP HANA Vora 1.2 to 1.3) only.
Alternatively, use scripts to recreate objects after an update. This applies in particular to patch updates (1.3.x
to 1.3.y).

Service Configuration Settings
Existing configurations, including node assignments, are not retained. Reassign services to nodes after an
update. For patch updates, optionally export service configurations from the SAP HANA Vora Manager UI. You
can reimport them after the update if they are still compatible.
Distributed Log Persistence Directory
SAP HANA Vora 1.2 to 1.3 only: Use a new directory for the distributed log's persistence. Alternatively, remove
the old directory entirely (back up first, if necessary), for example, using one of the options below:
● Remove the old directory: rm -rf <store-directory>

● Overwrite the old directory: Call the v2dlog format tool with the parameter --force-format
Old Data
Patch updates (1.3.x to 1.3.y) only: Remove old data by deleting the following Vora directories on all hosts:
● /var/log/vora*
● /var/local/vora/
● /lib/vora*
● /etc/vora/
● /var/run/vora/
Related Information
Export Metadata from SAP HANA Vora 1.2 [page 53]

Update SAP HANA Vora Using Ambari [page 54]
Update SAP HANA Vora Using Cloudera [page 55]
Update SAP HANA Vora for MapR [page 57]
Import Metadata into SAP HANA Vora 1.3 [page 58]
Install the SAP HANA Vora Zeppelin Interpreter [page 41]

2.10.1 Export Metadata from SAP HANA Vora 1.2
The SAP HANA Vora data migration feature allows you to dump the metadata for tables, views, and
partitioning functions defined on an SAP HANA Vora cluster to a local file system as JSON files. You can use
these files to import the metadata into an SAP HANA Vora 1.3 cluster.
Context
The data migration JAR file is available in SAP HANA Vora 1.3.
Procedure
1. Extract the data migration JAR:

a. Extract the $VORA_SPARK_HOME/lib/data-migration.jar file from the SAP HANA Vora
installation package.
b. Copy it to the master machine of your cluster.
c. Include it as an additional JAR file when you run the start-spark-shell.sh script.
2. Use the data migration utility to export the data as follows:
import com.sap.spark.vora.client.DataMigrationUtil
DataMigrationUtil.dumpMetadata(
path: String = DEFAULT_PATH, // = "/"
voraCatalogTimeout: Int = DEFAULT_VORA_CATALOG_TIMEOUT, // = 30
discoveryAddress: Option[String] = None): Unit
path The path to the location where you want to write the JSON files containing the metadata.
voraCatalogTimeout The timeout duration for the SAP HANA Vora catalog connection in seconds. The
default is 30.
discoveryUrl The connection URL for the Discovery service. This is needed if the Discovery service
agent is not running on every node in the cluster.
The dumpMetadata function, when called with the appropriate parameters, writes the metadata as JSON
files to the specified path. Three files are written:
○ tables.json: metadata for all tables
○ views.json: metadata for all views
○ partitioningFunctions.json: metadata for all partitioning functions
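Before uninstalling SAP HANA Vora 1.2, it can be worth confirming that all three files were actually written. The following Python sketch is one way to do that; the function name is hypothetical, and it only checks that the files exist in the export directory, not that their contents are complete.

```python
from pathlib import Path
from typing import List

# File names written by dumpMetadata according to this guide.
EXPECTED_EXPORT_FILES = ("tables.json", "views.json", "partitioningFunctions.json")


def missing_export_files(export_dir: str) -> List[str]:
    """Return the expected metadata files that are not present in export_dir."""
    base = Path(export_dir)
    return [name for name in EXPECTED_EXPORT_FILES if not (base / name).is_file()]
```

An empty list means that all three files are present in the directory you passed as the path parameter.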
Related Information
Import Metadata into SAP HANA Vora 1.3 [page 58]

2.10.2 Update SAP HANA Vora Using Ambari
Use the Ambari cluster provisioning tool to install the latest version of SAP HANA Vora on your cluster. To
allow a fresh install, you first need to perform a complete uninstall of SAP HANA Vora.
Procedure
1. SAP HANA Vora 1.3 only: Stop the SAP HANA Vora services on the SAP HANA Vora Manager UI.
a. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000).
b. Choose Stop All to stop all services.
c. Optional: Export the service configuration if you want to use it again after the update, provided it is still
compatible.
2. Stop the SAP HANA Vora services on the Ambari dashboard.
a. In the Services panel, select an SAP HANA Vora service (SAP HANA Vora 1.2) or the Vora Manager
service (SAP HANA Vora 1.3).
b. In the Service Actions dropdown menu on the Services page, choose Stop.
c. SAP HANA Vora 1.2 only: Repeat for all other SAP HANA Vora services.
3. Remove the services.
a. Run the following command from any machine where curl is available, for example, the management
node of the cluster, replacing the placeholders with appropriate values:
curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE -H 'X-Requested-By:admin' \
http://<YOUR_MGMT_NODE_FQDN>:8080/api/v1/clusters/\
<YOUR_CLUSTER_NAME>/services/<SERVICE_NAME>
Replace <SERVICE_NAME> as follows:
Service                   Service Name
SAP HANA Vora 1.2
Vora Base                 HANA_VORA_BASE
Vora Catalog              HANA_VORA_CATALOG
Vora Discovery            HANA_VORA_DISCOVERY
Vora Distributed Log      HANA_VORA_DLOG
Vora Thriftserver         HANA_VORA_THRIFTSERVER
Vora Tools                HANA_VORA_TOOLS
Vora V2Server             HANA_VORA_V2SERVER
SAP HANA Vora 1.3
Vora Manager              HANA_VORA_MANAGER

Note
If a service is shown as stopped on the Ambari UI, but Ambari reports that it is still running when
you try to remove it, you can use the following commands to stop it:
To stop a component, run the following command for every component of the SAP HANA Vora
service:
curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -H "X-Requested-By: ambari" -X PUT \
-d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' \
http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$COMPONENT_MACHINE/host_components/$COMPONENT_NAME
To stop a service, run the following command once for the SAP HANA Vora service:
curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -H "X-Requested-By: ambari" -X PUT \
-d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/$SERVICENAME
b. On the Ambari cluster management node, remove the vora-<component> folders from the
directory /var/lib/ambari-server/resources/stacks/HDP/<HDP_version>/services/.
4. For patch updates only: Remove old data.

Remove the following Vora directories on all hosts:
○ /var/log/vora*
○ /var/local/vora/
○ /lib/vora*
○ /etc/vora/
○ /var/run/vora/
You might also want to remove /run/lock/vora/, /var/lock/vora/, and /var/log/messages-*.
5. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Install the SAP HANA Vora Manager for Ambari [page 28].
d. Configure and start the SAP HANA Vora services. See Deploy the SAP HANA Vora Services [page 36].
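When you update from SAP HANA Vora 1.2, the DELETE call in step 3 has to be repeated for seven services. The sketch below generates the command lines so that they can be reviewed before being run. It is an illustration only: the function names are hypothetical, the credentials are left as placeholders, and the default port 8080 is taken from the command shown in step 3.

```python
from typing import List

# Service names from the table in step 3.
VORA_12_SERVICES = [
    "HANA_VORA_BASE",
    "HANA_VORA_CATALOG",
    "HANA_VORA_DISCOVERY",
    "HANA_VORA_DLOG",
    "HANA_VORA_THRIFTSERVER",
    "HANA_VORA_TOOLS",
    "HANA_VORA_V2SERVER",
]
VORA_13_SERVICES = ["HANA_VORA_MANAGER"]


def service_url(mgmt_node_fqdn: str, cluster_name: str, service_name: str, port: int = 8080) -> str:
    """Ambari REST resource URL for one service of the cluster."""
    return (
        f"http://{mgmt_node_fqdn}:{port}/api/v1/clusters/"
        f"{cluster_name}/services/{service_name}"
    )


def delete_commands(mgmt_node_fqdn: str, cluster_name: str, services: List[str]) -> List[str]:
    """Build one curl DELETE command line per service (credentials stay placeholders)."""
    return [
        "curl -u <AMBARI_USER>:<AMBARI_PASSWORD> -X DELETE "
        f"-H 'X-Requested-By:admin' {service_url(mgmt_node_fqdn, cluster_name, name)}"
        for name in services
    ]
```

For example, delete_commands("mgmt.example.com", "mycluster", VORA_12_SERVICES) returns seven command lines, where the host and cluster names are placeholders for your own values.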
2.10.3 Update SAP HANA Vora Using Cloudera
Use the Cloudera cluster provisioning tool to install the latest version of SAP HANA Vora on your cluster. To
allow a fresh install, you first need to perform a complete uninstall of SAP HANA Vora.
Procedure
1. SAP HANA Vora 1.3 only: Stop the SAP HANA Vora services on the SAP HANA Vora Manager UI.

a. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000).
b. Choose Stop All to stop all services.
c. Optional: Export the service configuration if you want to use it again after the update, provided it is still
compatible.
2. Stop the SAP HANA Vora services on the Cloudera Manager UI.
a. On the Cloudera Manager Home page, click to the right of an SAP HANA Vora service (SAP HANA
Vora 1.2) or the Vora Manager (SAP HANA Vora 1.3) and choose Stop in the dropdown menu.
b. Choose Stop to confirm.
3. Delete the SAP HANA Vora services.
a. On the Home page, click to the right of an SAP HANA Vora service (SAP HANA Vora 1.2) or the Vora
Manager (SAP HANA Vora 1.3) and choose Delete in the dropdown menu.
b. Choose Delete to confirm.
4. Delete the parcels.
a. Choose Hosts > Parcels.
b. Choose the Deactivate button next to SAPHanaVora and confirm.
c. In the dropdown menu next to SAPHanaVora, choose Remove From Hosts and confirm.
d. In the dropdown menu next to SAPHanaVora, choose Delete and confirm.
e. On the management node, delete the SAP HANA Vora files in the directories /opt/cloudera/csd
and /opt/cloudera/parcel-repo/.
5. For patch updates only: Remove old data.
Remove the following Vora directories on all hosts:
○ /var/log/vora*
○ /var/local/vora/
○ /lib/vora*
○ /etc/vora/
○ /var/run/vora/
You might also want to remove /run/lock/vora/, /var/lock/vora/, and /var/log/messages-*.
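6. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Install the SAP HANA Vora Manager for Cloudera [page 29].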

2.10.4 Update SAP HANA Vora for MapR
To update SAP HANA Vora for MapR, you need to perform an uninstall followed by a fresh install.
Prerequisites
In order to avoid data loss:
● Use the same hosts as before for the Distributed Log service
● Do not change the persistency of the Distributed Log service
Procedure
1. Stop the SAP HANA Vora services completely, using either the MapR Control System or the maprcli
command line tool.
2. Back up the configuration file:
cd /opt/mapr/conf/conf.d
cp vora_default_settings.sh vora_default_settings.sh.bak
3. On all cluster nodes, remove the "mapr-vora-base" package. This will also remove all dependent SAP
HANA Vora packages:
yum remove mapr-vora-base

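4. Reinstall SAP HANA Vora.
a. Download and extract the new SAP HANA Vora version. See Prepare for Installation [page 23].
b. Create an initial user and password. See Generate an Initial Password for SAP HANA Vora [page 25].
c. Install the SAP HANA Vora Manager. See Installing the SAP HANA Vora Manager for MapR [page 31].
Adjust the configuration file vora_default_settings.sh based on your previous settings.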

2.10.5 Import Metadata into SAP HANA Vora 1.3
Use the JSON files containing the metadata you exported from SAP HANA Vora 1.2 to import it into an SAP
HANA Vora 1.3 cluster.
Prerequisites
In order for the data import into SAP HANA Vora 1.3 to work, the file paths on which the tables depend, as
specified in the metadata, still need to be valid. If this is not the case, the metadata will not be loaded
successfully.
Context
You can load metadata that was exported to JSON files from SAP HANA Vora 1.2 either directly from the JSON
files or using JSON strings.
Procedure
Use the load data utility to load the data as follows:
import com.sap.spark.vora.client.LoadOldMetadataUtil
val util = new LoadOldMetadataUtil(
sqlContext: SQLContext,
voraCatalogTimeout: Int = DEFAULT_VORA_CATALOG_TIMEOUT,
discoveryUrls: List[String])
util.loadPartitioningFunctionMetadata(
jsonFile: Option[File] = None,
jsonString: Option[String] = None): Unit
util.loadTableMetadata(
jsonFile: Option[File] = None,
jsonString: Option[String] = None): Unit
util.loadViewMetadata(
jsonFile: Option[File] = None,
jsonString: Option[String] = None): Unit
sqlContext The SapSQLContext
voraCatalogTimeout The timeout duration for the SAP HANA Vora catalog connection in seconds. The
default is 30.
discoveryUrls The connection URLs for the Discovery service. If there is a Discovery service agent
running on every node, then ("localhost" :: Nil) is enough. However, if there
are nodes in the cluster without a Discovery service agent running, the parameter
should contain a list of valid connection URLs.
jsonFile The JSON file representing the metadata
jsonString The JSON string representing the metadata
As the code above shows, you first need to create an instance of the LoadOldMetadataUtil class with
appropriate parameters.
You then call the loadTableMetadata, loadViewMetadata, and loadPartitioningFunctionMetadata
functions to recreate the corresponding tables, views, and partitioning functions. These three functions can be
called with either a JSON file or a JSON string, but not with both. Each function parses the corresponding
metadata to create the tables, views, or partitioning functions.
Note the following points to ensure that the recreation works without any problems:
○ If there is partitioning metadata, it should be loaded first because the tables might depend on it.
○ Tables should be loaded before views, since views depend on tables.
○ To be able to load tables, the files specified in the table metadata must exist, otherwise the tables cannot
be created and loaded.
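The ordering rules above (partitioning functions first, then tables, then views) can be captured in a few lines. This Python sketch only decides the order in which the exported files should be processed; it does not call the SAP HANA Vora API. The file names are those written by the export, and the function name is hypothetical.

```python
from typing import Iterable, List

# Load order required by the dependency rules: tables may depend on
# partitioning functions, and views depend on tables.
LOAD_ORDER = ("partitioningFunctions.json", "tables.json", "views.json")


def ordered_metadata_files(available: Iterable[str]) -> List[str]:
    """Return the available export files in the order in which they should be loaded."""
    present = set(available)
    return [name for name in LOAD_ORDER if name in present]
```

The first entry of the result corresponds to loadPartitioningFunctionMetadata, the second to loadTableMetadata, and the third to loadViewMetadata, provided that all three files are present.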
Related Information
Export Metadata from SAP HANA Vora 1.2 [page 53]
2.11 SAP HANA Vora Default Ports
By default, SAP HANA Vora is configured to use the port numbers given below.
Component Port Number
Zeppelin 9099
Thrift server 19123
SAP HANA Vora Tools 9225
SAP HANA Vora Manager UI 19000
Related Information
Manage Ports [page 75]

3 Administration
This section describes the standard administration tasks and best practices for the ongoing
operation of your SAP HANA Vora services and Hadoop cluster.
See the following topics:
Topic Description
Configure Proxy Settings [page 61] If your cluster runs behind a proxy, set up your proxy settings
Enable Spark Auto-Registration [page 62] Automatically load data sources on startup
Sizing Configuration [page 63] Configure the SAP HANA Vora disk engine sizing, the SAP HANA Vora in-memory engine sizing, the Spark parameters related to the result handling and performance of SAP HANA Vora queries, as well as the parameters related to SAP HANA Spark Controller resources
Run SAP HANA Vora As a Non-Root User [page 67] Set up a non-root user to run SAP HANA Vora in the Ambari or Cloudera environment
Start and Stop the SAP HANA Vora Manager [page 69] Start, stop, and restart the SAP HANA Vora Manager on your cluster
Start and Stop the SAP HANA Vora Services [page 71] Start, stop, and restart the SAP HANA Vora services on your cluster
Examine the SAP HANA Vora Nodes [page 74] Check your SAP HANA Vora cluster nodes' service assignments and their resource usage
Check the Connection Status [page 74] Check the status of the connections between SAP HANA Vora and other components and systems
Manage Ports [page 75] Manage the ports used by the SAP HANA Vora Manager and SAP HANA Vora services
Manage Users [page 76] Manage the users for the SAP HANA Vora Manager UI and SAP HANA Vora Tools
Delete the SAP HANA Vora Service State [page 77] Remove the complete in-memory and on-disk state of all SAP HANA Vora services
SAP HANA Vora Logs [page 78] Check the locations of the SAP HANA Vora logs
Cluster Utilities [page 79] Use these methods, for example, to force a data reload, clear the catalog, or clear health information from the Consul discovery service
Accessing SAP HANA Vora from SAP HANA [page 80] Connect from SAP HANA to SAP HANA Vora using SAP HANA smart data access (SDA)
Best Practices: Administration and Operations [page 87] Achieve higher performance on your cluster by observing some basic best practices

3.1 Configure Proxy Settings
If your cluster runs behind a proxy, you need to set up your proxy settings correctly so that the SAP HANA
Vora engine and Spark are able to access external services, such as Amazon S3.
Procedure
1. Make sure that the following environment variables have been configured with the appropriate URLs in
the /etc/environment file:
http_proxy
HTTP_PROXY
https_proxy
HTTPS_PROXY
FTP_PROXY
ftp_proxy
no_proxy
You can add variables to the /etc/environment file as follows:
Sample Code
export http_proxy=http://proxy.example.com:8080
export HTTP_PROXY=http://proxy.example.com:8080
export https_proxy=https://proxy.example.com:8080
export HTTPS_PROXY=https://proxy.example.com:8080
If any of the variables are not set up properly, make the necessary corrections and then restart the SAP
HANA Vora service using the cluster provisioning tool (for example, Ambari or Cloudera Manager).
2. Make sure that the following variables are passed to the JVM running the Spark driver:
http.proxyHost
http.proxyPort
https.proxyHost
https.proxyPort
You can do this by setting the extraJavaOptions property in the spark-defaults.conf file.
○ If you are running Spark in YARN client mode, you can set the property as follows:
spark.yarn.am.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>
○ If you are running Spark in YARN cluster mode, you can set the property as follows:
spark.driver.extraJavaOptions -Dhttp.proxyHost=<HTTP_HOST> -Dhttp.proxyPort=<HTTP_PORT> -Dhttps.proxyHost=<HTTPS_HOST> -Dhttps.proxyPort=<HTTPS_PORT>
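The JVM options in step 2 carry the same host and port information as the proxy URLs from step 1. As a sketch of how the two relate (the function names are hypothetical, and the proxy URLs are assumed to include an explicit port), the spark-defaults.conf line can be derived in Python as follows:

```python
from urllib.parse import urlsplit


def proxy_java_options(http_proxy: str, https_proxy: str) -> str:
    """Turn proxy URLs such as http://proxy.example.com:8080 into JVM -D options."""
    http = urlsplit(http_proxy)
    https = urlsplit(https_proxy)
    return (
        f"-Dhttp.proxyHost={http.hostname} -Dhttp.proxyPort={http.port} "
        f"-Dhttps.proxyHost={https.hostname} -Dhttps.proxyPort={https.port}"
    )


def spark_defaults_line(deploy_mode: str, http_proxy: str, https_proxy: str) -> str:
    """Build the spark-defaults.conf entry for YARN client or YARN cluster mode."""
    keys = {
        "client": "spark.yarn.am.extraJavaOptions",
        "cluster": "spark.driver.extraJavaOptions",
    }
    if deploy_mode not in keys:
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return f"{keys[deploy_mode]} {proxy_java_options(http_proxy, https_proxy)}"
```

Calling spark_defaults_line("client", ...) with the URLs from the sample /etc/environment entries produces the line for YARN client mode; "cluster" produces the spark.driver.extraJavaOptions variant.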

3.2 Enable Spark Auto-Registration
The spark.sap.autoregister option is a Spark configuration parameter that specifies which data sources
should be automatically loaded on startup. This allows all tables that were previously loaded and saved in the
SAP HANA Vora catalog to be re-registered in the Spark context automatically.
Prerequisites
To use Spark auto-registration, the Discovery Service must be up and running.
Context
When you run the Thriftserver, for example, all tables will be automatically registered at startup if Spark auto-
registration is enabled.
To enable Spark auto-registration, you can set the Spark auto-registration option in the Spark defaults
configuration file or when executing spark-submit.
Procedure
● Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) in
the spark-defaults.conf file:
Sample Code
spark.sap.autoregister com.sap.spark.vora
spark.vora.discovery <discovery_service_url>
● Set the spark.sap.autoregister parameter and spark.vora.discovery parameter (optional) when
executing spark-submit:
Sample Code
spark-submit --conf spark.sap.autoregister=com.sap.spark.vora \
--conf spark.vora.discovery=<discovery_service_url>

3.3 Sizing Configuration
Configure the parameters related to SAP HANA Vora disk engine sizing, SAP HANA Vora in-memory engine
sizing, the Spark parameters related to the result handling and performance of SAP HANA Vora queries, as
well as the parameters related to SAP HANA Spark Controller resources.
Related Information
Vora Disk Engine Sizing [page 63]

Vora In-Memory Engine Swapping Mechanism [page 65]
Spark [page 66]
Spark Controller [page 67]
3.3.1 Vora Disk Engine Sizing
Configure the disk engine memory sizing and database sizing.
3.3.1.1 Disk Engine Memory Sizing
You can set the maximum memory usage for the SAP HANA Vora disk engine using the following parameters
on the SAP HANA Vora Manager UI:
Memory limitation for underlying disk engine
This value sets the maximum memory for the following:
● The main persistent data cache size
● The temporary transient data cache size
● The main segment designed to be used for loading operations
If you set this parameter to 3000, each of the three memory segments can be allocated up to
3000 MB separately. You may find it helpful to increase this limit when certain queries or load
operations starve the memory. For the best overall performance, you should set this parameter
to 25% of the total machine RAM.
Memory limitation for disk engine catalog
This value sets the catalog store cache size upper memory limit of the underlying disk engine.
The default is 256m (256 MB).
In some extreme cases, the standard catalog cache size might be too small, for example, to
accommodate certain queries that need a lot of parsing. In these cases, you may find it helpful
to increase this limit.
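Because the memory limit applies to each of the three segments separately, the total memory the disk engine may claim is up to three times the configured value. The following Python sketch works through that arithmetic; the function names are hypothetical, and rounding down to whole megabytes is an assumption.

```python
SEGMENT_COUNT = 3  # persistent cache, temporary cache, and load segment


def recommended_memory_limit_mb(total_ram_mb: int) -> int:
    """25% of the machine RAM, as recommended above (rounded down to whole MB)."""
    return total_ram_mb // 4


def worst_case_usage_mb(limit_mb: int) -> int:
    """Upper bound if all three segments grow to the configured limit."""
    return SEGMENT_COUNT * limit_mb
```

With the example value of 3000 from the text, worst_case_usage_mb(3000) is 9000 MB. For a host with 65536 MB of RAM, recommended_memory_limit_mb gives 16384, which corresponds to a worst case of 49152 MB across the three segments.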

3.3.1.2 Disk Engine Database Sizing
You can configure the database sizing for the SAP HANA Vora disk engine using the following parameters on
the SAP HANA Vora Manager UI:
Initial main database size for The initial database file size in megabytes. The default is 10000.
disk engine
This parameter determines the size of the initial main user space for the database. Us
ing the default value, 10 GB of main space will be allocated on disk. The main space
stores the database objects such as tables, indexes, and table metadata. Depending on
the nature of the data to be loaded into the engine, you can expect to have larger loads
fit into smaller main spaces since the data is compressed in the engine.
You can let the engine dynamically add space by setting a lower initial size than the size
of the load per node. For example, you set 50 GB as the initial main database size. For a
500 GB load per node, the space could be increased to 200 GB during the load, allow
ing it to successfully finish.
Initial temp database size for disk engine
The initial temp database file size in megabytes. The default is 1000.
This parameter determines the size of the initial temporary space for the database. Using
the default value, 1 GB of temporary space will be allocated on disk. The temporary
space stores temporary database objects such as temporary tables and indexes. Data
exchanged between nodes is stored in temporary space before it is stored in main
space. It is recommended to increase this value to 10-15 GB if the database size
exceeds 100 GB.
Initial system database size for disk engine
The initial system database file size in megabytes. The default is 10000.
This parameter determines the size of the initial system space for the database. Using
the default value, 10 GB of system space will be allocated on disk. The system space
stores important database structures, including the free list, which lists blocks in use,
transaction information, and other internal structures required for proper operation of
the engine. For databases with less than 100 GB per node, it is recommended that the
system space be 5-10% of the size of the main database space with a minimum of 4 GB.
For databases that exceed 100 GB, it should be 1-2% of the size of the main database
space with a minimum of 8 GB. For example, 10-20 GB if one node is to have a 1 TB
database. Note that the engine will try to increase the size if the space usage exceeds a
certain limit. However, it is recommended to set proper initial sizes.
The database is created on the node after the first CREATE TABLE statement is issued on the disk engine.
These values are only effective when creating a new database on the node. If the node already has a database
in the database directory, the engine connects to that database and omits creating the database and ignores
the initial size parameters. The initial size parameters will be used again on the same node either after a
wipeout or when different database directories are chosen. These database spaces will be dynamically
increased as the database grows. The higher you set these values, the more time it will take for the initial
database creation.
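The system space recommendation above can be turned into a quick calculation. The following Python sketch is illustrative only: the function names are hypothetical, a main space of exactly 100 GB is treated like the larger case (the text leaves this boundary open), and 1 GB is taken as 1000 MB to match the parameter defaults (10000 for 10 GB).

```python
from typing import Tuple


def recommended_system_space_gb(main_space_gb: float) -> Tuple[float, float]:
    """Return the (low, high) recommended system space in GB for one node."""
    if main_space_gb < 100:
        # Less than 100 GB per node: 5-10% of main space, at least 4 GB.
        low_pct, high_pct, minimum_gb = 5, 10, 4.0
    else:
        # Assumption: exactly 100 GB is handled like a database exceeding 100 GB.
        # 1-2% of main space, at least 8 GB.
        low_pct, high_pct, minimum_gb = 1, 2, 8.0
    low = max(minimum_gb, main_space_gb * low_pct / 100)
    high = max(minimum_gb, main_space_gb * high_pct / 100)
    return low, high


def gb_to_parameter_mb(size_gb: float) -> int:
    """Convert a size in GB to the megabyte value entered in the UI."""
    return int(round(size_gb * 1000))


if __name__ == "__main__":
    # A node that is to hold a 1 TB database.
    low, high = recommended_system_space_gb(1000)
    print(gb_to_parameter_mb(low), gb_to_parameter_mb(high))
```

For the 1 TB case this reproduces the 10-20 GB range given in the example above.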

3.3.1.3 Disk Engine Database Directories
When you allocate system directories for database files, do not use file systems that are shared over a local
area network. Doing so can lead to poor I/O performance and other problems, including overloading the local
area network. Overall performance of the engine can be improved by locating the database log files on a
dedicated disk drive.
You can configure the disk engine directory locations for the SAP HANA Vora disk engine using the following
parameters on the SAP HANA Vora Manager UI:
Disk engine database directory
The path name of files containing the database main, temporary, and system spaces
for the underlying disk engine.
Disk engine database log directory
The path name of the segment containing the message trace file and the transaction
log file of the underlying engine.
Disk engine temporary data directory
The path name of the intermediate temporary files that are created to load data into or
exchange data in the engine. After injecting data into the engine, the temporary files
are deleted. The required temporary size depends on the exchange chunk size (typically
less than 500 MB) and the number of disk nodes in the cluster.
3.3.2 Vora In-Memory Engine Swapping Mechanism
You can activate a swapping mechanism for the relational in-memory engine by setting a memory limit in its
configuration.
It is recommended to set the limit at no more than half the RAM space of each worker node. When the tables
stored in the in-memory engine on any of the nodes exceed this limit, the engine on that node tries to unload
data to disk, based on a least-recently-used algorithm, that is, the data that hasn’t been accessed for a long
period of time is unloaded first.
The unload happens on the granularity of table columns. A column can only be unloaded if it is not currently
being used by any queries. When an unloaded column is needed again, the entire column is loaded back into
memory. However, since the in-memory engine is optimized to handle all data in memory, heavy use of the
unload mechanism has a negative impact on performance.
Ideally the amount of data loaded into the in-memory engine should therefore not exceed about half the total
RAM space of the cluster. If required, however, and negative performance effects are acceptable, this limit can
be exceeded as long as there is free disk space.
Furthermore, it is not possible to load a table that is itself larger than (number of nodes * memory
limit), because during the load the entire table has to be in memory.

You can configure the swapping mechanism for the relational in-memory engine using the following
parameters on the SAP HANA Vora Manager UI:
Memory limit for Vora In-Memory Engine swap
The default value for this parameter is -1, which means that there is no memory limit.
When this value is changed to a non-negative value, the in-memory engine considers it
as a memory limit in bytes on each node where it is started. When the tables stored in
the in-memory engine on any of the nodes exceed this limit, the engine on that node
tries to unload data to disk.
If the unload is not sufficient to reduce the memory used by the in-memory engine’s
tables below the limit, an out of memory error is thrown and the corresponding host is
marked as failed.
Swap directory
The local path to the folder into which unloaded data is written. By default, this
is /var/local/vora/vora-v2server/swap. In general, it has to be a folder
where the vora user has write access.
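Because the memory limit is entered in bytes, it can help to compute it from the node's RAM. The Python sketch below is a rough illustration, not part of the product: the function names are hypothetical, RAM is assumed to be given in GiB, and the limit is set to exactly half the node RAM (the recommended maximum). It also applies the rule that a single table cannot be larger than (number of nodes * memory limit).

```python
GIB = 1024 ** 3  # assumption: RAM sizes are given in GiB


def swap_memory_limit_bytes(node_ram_gib: int) -> int:
    """Half of a worker node's RAM, expressed in bytes."""
    return node_ram_gib * GIB // 2


def table_can_be_loaded(table_bytes: int, num_nodes: int, limit_bytes: int) -> bool:
    """Check a table size against (number of nodes * memory limit)."""
    if limit_bytes < 0:
        # -1 means no memory limit is configured, so this rule does not apply.
        return True
    return table_bytes <= num_nodes * limit_bytes


if __name__ == "__main__":
    limit = swap_memory_limit_bytes(64)  # worker nodes with 64 GiB RAM
    print(limit)
    print(table_can_be_loaded(100 * GIB, num_nodes=3, limit_bytes=limit))
```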
3.3.3 Spark
When SAP HANA Vora is integrated into Spark, we recommend running the respective Spark jobs as YARN
applications.
The following Spark parameters affect the result handling and query performance of SAP HANA Vora queries:
spark.executor.instances
This affects parallelism when data is queried from SAP HANA Vora tables. This parameter
should be at least equal to the number of installed engines (for example, 5 if 5 relational
in-memory engines are installed).
spark.executor.memory
This affects the intermediate result size that can be stored in memory. This parameter
should be at least 2 GB and must be increased if Spark has problems transferring huge
results in shuffle stages or when writing data to disk.
spark.yarn.am.memory (yarn-client mode)
This affects the result size of SAP HANA Vora queries that can be transferred or shown in
client applications, such as the Thrift server or Zeppelin. This parameter should be at
least 2 GB.
spark.driver.memory (yarn-cluster mode)
Depending on the Spark application, the driver might need to handle intermediate results.
This parameter should be at least 2 GB.
Note
Please consult the Spark documentation for information about Spark sizing.
Note that SAP HANA Vora resource management is not controlled by YARN.

Related Information
Spark Hardware Provisioning
3.3.4 Spark Controller
The Spark controller is an SAP HANA component. The section below outlines some basic best practices for
configuring the SAP HANA Spark controller resources.
1. Hadoop resources are typically shared across multiple engines and use cases. As the SAP HANA
administrator, work together with your Hadoop administrator and agree upon the allowed resource
allocation for the SAP HANA Spark controller.
2. It is good practice to create a separate YARN queue with a percentage of resources specifically for the
Spark controller. This allows better resource management and monitoring.
3. You can use the spark.yarn.queue property to leverage the queue created above.
4. There are two other properties that define resource allocation:
○ spark.executor.memory
We recommend a minimum of 3g for optimal performance (this can be lowered to 1g in a development
environment). This setting generally works well in most use cases. You only need to increase the
value if out-of-memory exceptions occur (due to skewed partitioning or data-intensive operations).
○ spark.executor.instances
This is basically the number you get from the parameters above:
Min(number of virtual cores allocated to queue, (available memory in queue /
spark.executor.memory))
A higher number of instances will not cause any issues. Spark runs with the maximum number of
executors it manages to commission. It is better not to set a lower value, since the performance for
queries on large data sets or concurrent queries is directly proportional to it.
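The formula for spark.executor.instances can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name and the sample queue figures are hypothetical, and the memory ratio is rounded down to a whole number of executors, which the formula above does not specify.

```python
def executor_instances(queue_vcores: int,
                       queue_memory_gb: float,
                       executor_memory_gb: float) -> int:
    """Min(virtual cores in the queue, queue memory / spark.executor.memory)."""
    # Assumption: round the memory ratio down to whole executors.
    by_memory = int(queue_memory_gb // executor_memory_gb)
    return min(queue_vcores, by_memory)


if __name__ == "__main__":
    # Hypothetical queue: 16 virtual cores, 36 GB memory, 3g per executor.
    print(executor_instances(16, 36, 3))
```

In this example memory is the limiting factor, so the result is lower than the number of virtual cores.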
3.4 Run SAP HANA Vora As a Non-Root User
You can run SAP HANA Vora with a non-root user in the Ambari and Cloudera environments.
Prerequisites
You have created a password-less sudo user on all nodes by adding the following line to /etc/sudoers:
%<USER> ALL=(ALL) NOPASSWD: ALL

Context
A user cannot be changed from root to non-root automatically. This procedure involves manual steps that
need to be performed on all applicable nodes of the cluster. We recommend that you configure the user or
group correctly during the initial deployment of SAP HANA Vora. In this case you only need to perform steps 3
and 4 below.
Procedure
1. Make sure that you have stopped all running Vora services from the SAP HANA Vora Manager UI.
2. From the cluster provisioning tool (Ambari or Cloudera), stop the SAP HANA Vora Manager.
3. Set permissions for the new user or group for the following files:
○ The password files for the SAP HANA Vora Manager and SAP HANA Vora Tools
○ The SAP HANA Vora keytabs, certificates, and private keys
○ The following directories:
○ chown -R <user>:<group> <log directories of all vora services>
For the default configuration: chown -R <user>:<group> /var/log/vora
○ chown -R <user>:<group> <vora_disk_data_dir> <vora_disk_tmp_dir> <vora_disk_database_log_dir>
○ chown -R <user>:<group> <vora_dlog_store_dir>
○ chown -R <user>:<group> <vora_thriftserver_metastore_dir>
○ chown -R <user>:<group> /etc/vora /var/run/vora /var/lock/vora
○ chown -R <user>:<group> <vora_scheduler_data_dir> <vora_discovery_data_dir>
Note that this step needs to be performed manually on all applicable nodes of the cluster.
4. In Ambari or Cloudera, go to the Vora Manager configuration screen.
Option Description
Ambari
In the Advanced vora-manager-config section, set vora_manager_run_as_user and save your changes.
Note that for Ambari, SAP HANA Vora currently only supports the same name for the user and group, for
example, user vora, group vora.
Cloudera
1. Set the following and save your changes:
○ User to run vora services
○ Group to run vora services
○ System User
○ System Group
2. Click Actions > Deploy client configuration.
5. Start the SAP HANA Vora Manager.

6. Start the Vora services from the SAP HANA Vora Manager UI.
3.5 Start and Stop the SAP HANA Vora Manager
Use the cluster provisioning tool to start, stop, and restart the SAP HANA Vora Manager on your cluster.
Context
Note that Ambari is used in the procedure below. The procedure is similar for Cloudera and MapR.
Procedure
1. On the Ambari dashboard, select the Vora Manager service in the Services panel.
The Services summary tab shows how many instances of the SAP HANA Vora Manager are running.
2. On the Services page, you have the following options:

○ To start, stop, or restart all instances of the SAP HANA Vora Manager, choose the appropriate option
in the Service Actions dropdown menu:
Option Description
Start Starts the Vora Manager service on all hosts
Stop Stops the Vora Manager service on all hosts
Restart All Stops and then starts the Vora Manager service on all hosts
Restart Vora Manager Workers Performs a rolling restart of the Vora Manager Workers across all hosts.
You can specify the following:
○ The number of instances to be started at a time
○ How long to wait between batches
○ The number of allowed restart failures
○ To only restart instances with stale configuration
○ To activate maintenance mode
Turn On Maintenance Mode Suppresses alerts generated by the Vora Manager service
○ To start, stop, or restart the instances by host:

1. Click the Vora <Master/Workers/Clients> link.
If the selected service is running on more than one host, a list of hosts is displayed.
2. Click the relevant host link.
The component list and host details are displayed.
3. In the component list, locate the SAP HANA Vora Manager service and choose the appropriate
option from the dropdown menu.

Related Information
SAP HANA Vora Manager and SAP HANA Vora Services [page 20]
3.6 Start and Stop the SAP HANA Vora Services
SAP HANA Vora provides a dedicated Web UI for managing the configuration and deployment of the SAP
HANA Vora services. It allows you to start, stop, and configure the SAP HANA Vora services on your cluster.
Context
The SAP HANA Vora Manager UI is available at: <VORA MASTER HOST>:19000
Choose the Services tab to display the list of SAP HANA Vora services and access the functions for configuring
and deploying them.
Note
SAP HANA Vora instances hold data in memory and boost the performance of the compute nodes. When
you stop or restart the SAP HANA Vora engine instances, the data is removed completely from the in-
memory database. This means that the fraction of data a certain instance was responsible for will have to
be reloaded from disk when it is needed by a query again.
Procedure
Start, stop, and manage the configuration and node assignments of the SAP HANA Vora services as follows:
To ... Do the following
Start services
○ All services:
1. In the menu bar on the left, choose Start All.
○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Start.
Stop services
○ All services:
1. In the menu bar on the left, choose Stop All.
○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Stop.
Download service configuration
○ All services:
1. In the menu bar on the left, choose Download Configuration. The configuration
is downloaded as a JSON file: vora-services.json
○ Selected service:
1. Select a service in the list.
2. In the menu bar on the right, choose Download Configuration. The configuration
is downloaded as a JSON file: vora-<service>.json
Upload service configuration
1. In the menu bar on the left, choose the Upload button.
2. Browse to the relevant directory and double-click the applicable JSON configuration
file to upload it. If services are running, you will be prompted to confirm
that the services should be stopped to apply the uploaded configuration.
Configure a service
1. Select a service in the list.
2. On the configuration tab on the right, enter the configuration details or choose
one of the following:
○ Load Default to load the default configuration
○ Upload to upload the service configuration from a selected JSON file
3. Choose Apply to save the configuration.

Remove a configuration
1. Select a service in the list.
2. In the menu bar on the right, choose Clear. This removes all settings. The service
status is reverted to Not Configured.
Assign nodes (hosts)
1. Select a service in the list.
2. Switch to the Node Assignment tab.
3. In the Number of instances field, enter the number of instances you want to run
on the assigned nodes.
Note that if a service only supports one instance, this parameter is set to 1 and
cannot be changed.
Note also that for the Vora Distributed Log the number of instances automatically
equals the number of nodes selected.
4. Select the Distinct hosts flag for instances option if only one instance should run
on each selected host.
5. To assign nodes, select the individual nodes or choose Select All.
6. Choose Apply to save the node configuration for the selected service.
If the service is running, the Change a running service dialog appears, prompting
you to confirm that the service should be migrated to run on the selected nodes.
This allows you to apply your updates to the affected nodes only, without stopping
all instances of the service.
7. Choose OK to accept the service migration option.
Unassign nodes (hosts)
1. Select a service in the list.
2. Switch to the Node Assignment tab.
3. Deselect individual nodes or choose Clear All.
4. Choose Apply to save the node configuration.
If the service is running, the Change a running service dialog appears, prompting
you to confirm that the service should be migrated to run on the selected nodes.
This allows you to apply your updates to the affected nodes only, without stopping
all instances of the service.
5. Choose OK to accept the service migration option.
Note
To run services on new hosts that are not yet part of your cluster, you first need to add the new hosts to the
cluster using the standard procedure supported by your cluster provisioning tool (Ambari, Cloudera, or
MapR). Then follow the steps described above to configure and run the services on these hosts.
Next Steps
After restarting the SAP HANA Vora services, any tables held in the SAP HANA Vora in-memory database will
have been removed, but the associated metadata will still be available. This allows you to force a table reload.
To do so, start the Spark shell and run the markAllHostsAsFailed() function in the ClusterUtils object:
com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress:
Option[String] = None): Unit
Spark will assume that the SAP HANA Vora engine instances are empty and reload the data according to the
metadata information.
Note that discoveryAddress is the address of the Consul Discovery service. If no argument is passed, the
method will try to connect to the local Consul Discovery agent.
Related Information

SAP HANA Vora Manager and SAP HANA Vora Services [page 20]

3.7 Examine the SAP HANA Vora Nodes
The Nodes tab of the SAP HANA Vora Manager UI provides an overview of your cluster nodes, the SAP HANA
Vora services running on them, and each node's resource usage.
Procedure
1. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000) and log in.
2. Choose the Nodes tab.
The nodes are listed on the left with the following information:
○ Their roles (master, worker, or both)
○ The number of service instances running on them and their status (passing, critical, or warning)
3. Display the service details for a specific node:
a. Select a node in the list.
b. On the right, choose the Services tab, which shows the following:
○ Each running service instance with its technical name
○ The service status (passing, critical, or warning)
○ The health status for the Vora catalog (0 - passing, 1 - warning, 2 - critical)
○ The port used by the service (except the Vora catalog)
4. Display the statistics for a specific node:
a. Select a node in the list.
b. On the right, choose the Stats tab, which shows the following:
○ The amount of memory: total, available, used, and free
○ The CPU usage: user, system, and idle
○ The amount of disk space: total, used, and available
3.8 Check the Connection Status
The connection status indicates whether there are active connections to the components and systems used
by the SAP HANA Vora Manager and SAP HANA Vora Tools.
Procedure
1. Open the web UI and log in:
○ SAP HANA Vora Manager: <VORA MASTER HOST>:19000

○ SAP HANA Vora Tools: http://<VORA TOOLS HOST>:9225

2. In the top right corner, choose the Connection Status/Connection: <status> button.
3. Check the information displayed in the Connection Status dialog box:
SAP HANA Vora Manager Description
Consul The status of the connection to the Consul service, at the given IP address
and port
Nomad The status of the connection to the Nomad service, at the given IP address
and port
SAP HANA Vora Version The version of SAP HANA Vora currently in use
SAP HANA Vora Tools Description
Client Version The client version of the SAP HANA Vora Tools
Vora The status of the connection to the Thrift server, shown in the form host
name and port, together with the user used to connect to the Thrift server.
For example, vora@thriftserverhost:19123.
HANA The status of the connection to SAP HANA, as defined in the Spark defaults
configuration file (spark-defaults.conf)
3.9 Manage Ports
You can manage the ports used by the SAP HANA Vora Manager and SAP HANA Vora services.
Context
The SAP HANA Vora Manager, SAP HANA Vora Tools and SAP HANA Vora Thriftserver are assigned default
ports during the installation of SAP HANA Vora. The SAP HANA Vora services, however, are dynamically
assigned port numbers by Nomad. The port numbers are between 20000 and 60000.
You can use the vora_manager_reserved_ports parameter to exclude the ports you do not want to be
assigned by Nomad. You might want to do this, for example, if your operating system is using some of the
ports within this range.
For information about the Vora transaction coordinator port used for the SAP HANA Wire, see Enable the SAP
HANA Wire for Smart Data Access.

Procedure
Choose the appropriate option to make changes to the port assignments:
Option Description
Change the Vora Manager port In the cluster manager, enter the new port number in the vora_manager_gui_port
field. In Ambari, for example, this is in the Advanced vora-manager-config section
on the Configs tab.
Change the Vora Tools port On the SAP HANA Vora Manager UI, change the port number in the Network port
for binding field on the Configuration tab of the Vora Tools service.
Change the Vora Thriftserver port On the SAP HANA Vora Manager UI, change the port number in the Network port
for binding field on the Configuration tab of the Vora Thriftserver service.
Exclude ports In the cluster manager, enter the port numbers to be excluded in the
vora_manager_reserved_ports field. In Ambari, for example, this is in the
Advanced vora-manager-config section on the Configs tab.
Related Information
SAP HANA Vora Default Ports [page 59]

Enable the SAP HANA Wire for Smart Data Access [page 81]
3.10 Manage Users
You can create, edit, and delete users for the SAP HANA Vora Manager UI and SAP HANA Vora Tools.
Context
All users can create new users, delete users, and change their own or other users' passwords. The user name
cannot be changed.
Note
If the SAP HANA Vora Manager is installed on multiple master nodes, any users you create on one of the
master nodes will not exist on the other master nodes. User names and passwords are stored in a file that is
not shared between the SAP HANA Vora Manager instances.

Procedure
1. Open the web UI and log in:
○ SAP HANA Vora Manager: <VORA MASTER HOST>:19000

○ SAP HANA Vora Tools: http://<VORA TOOLS HOST>:9225
2. Choose the User Management tab.
3. Choose the appropriate option:
Option Description
Create a new user 1. Choose Create.

2. In the Create User dialog box, enter the new user's name.
3. Enter the new user's password twice.
4. Choose OK to save your entries.
Change a user's password 1. Select the user and choose Edit.

2. Enter the new password twice.
3. Choose OK to save your entries.
Delete a user 1. Select the user and choose Delete.

2. Choose OK to confirm.
3.11 Delete the SAP HANA Vora Service State
The wipe-out tool allows you to delete the complete in-memory and on-disk state of all SAP HANA Vora
services.
Context
The wipe-out tool deletes data such as database schemas and tables, database transaction logs, metadata in
the SAP HANA Vora catalog, and so on. It does not, however, touch the configuration settings or the traces
and logs on the individual hosts.
The effects of applying the wipe-out procedure are similar to the results of a fresh install. The wipe-out option
should therefore be used with caution.
Procedure
1. Open the SAP HANA Vora Manager UI by pointing your browser to <VORA MASTER HOST>:19000 and log
in.
2. On the Services page, stop all services.
3. In the top right corner, choose the Wipe Out button.

You will be prompted to confirm that you want to proceed.
Caution
Use this option with care. It will stop all services and remove their data, resulting in potential data loss.
4. Confirm to start the wipe-out process.

When the wipe-out process has completed, you should see the following:
○ All services have been stopped.
○ The data has been removed, for example, in the /var/local/vora/* directories.
3.12 SAP HANA Vora Logs

The SAP HANA Vora services save their log files to the /var/log directories.
Log Directories
/var/log/vora-manager (file)
/var/log/vora/vora-catalog
/var/log/vora/vora-dlog
/var/log/vora/vora-disk
/var/log/vora/vora-docstore
/var/log/vora/vora-graph
/var/log/vora/vora-landscape
/var/log/vora/vora-manager
/var/log/vora/vora-thriftserver
/var/log/vora/vora-timeseries
/var/log/vora/vora-tools
/var/log/vora/vora-txbroker
/var/log/vora/vora-txcoordinator
/var/log/vora/vora-txlocker
/var/log/vora/vora-v2server
You can change the locations of each log folder except /var/log/vora-manager.
Note
The Ambari vora_manager_log_dir parameter specifies the directory used if Nomad, Consul, or the SAP
HANA Vora Manager UI generates exceptions or core dumps (for example, stderr, stdout).

3.13 Cluster Utilities
The ClusterUtils class provides a set of utility methods designed for administrators of the SAP HANA Vora
system.
numOfLoadingThreads()
This method can be used to determine the number of parallel blocking threads that can be generated within
the current execution environment.
com.sap.spark.vora.client.ClusterUtils.numOfLoadingThreads(maxNumOfLoadingThreads: Int,
creationTimeMillis: Int): Int
markAllHostsAsFailed()
Marks all hosts as failed. This method is useful for force loading data.
com.sap.spark.vora.client.ClusterUtils.markAllHostsAsFailed(discoveryAddress:
Option[String] = None): Unit
The parameter discoveryAddress is the address of the Consul discovery service. If no argument is passed,
the method will try to connect to the local Consul discovery agent.
cleanVoraCatalog()
Deletes all content in the metadata store for the SAP HANA Vora catalog.
com.sap.spark.vora.client.ClusterUtils.cleanVoraCatalog(discoveryAddress:
Use this method to clear the SAP HANA Vora catalog for test purposes or if it has become inconsistent, for
example, after updating the SAP HANA Vora engine or extension library. Bear in mind that once the catalog
has been cleared, you will need to re-create your tables in SAP HANA Vora.
Sample Code
import com.sap.spark.vora.client._
ClusterUtils.cleanVoraCatalog(Some("discoveryService:8500"))
clearPersistentHealthInformation()
Clears all persistent health information from the Consul discovery service.
com.sap.spark.vora.client.ClusterUtils.clearPersistentHealthInformation(
discoveryAddress: Option[String] = None): Boolean
releaseAllLocks()
Releases all locks stored inside the Consul discovery service.
com.sap.spark.vora.client.ClusterUtils.releaseAllLocks(discoveryAddress:

releaseLock()
Releases a single lock stored in the Consul discovery service, without affecting other locks.
releaseLock(lockId: String, discoveryAddress: Option[String] = None): Unit
workerParallelismReport()
Returns a textual report that shows how a given number of workers work in parallel.
com.sap.spark.vora.client.ClusterUtils.workerParallelismReport(sc:
SparkContext): String
3.14 Accessing SAP HANA Vora from SAP HANA
You can connect to and access data in SAP HANA Vora from SAP HANA using SAP HANA smart data access
(SDA). You can establish an SDA connection either through the SAP HANA Spark Controller or directly using
the SAP HANA Vora remote source adapter.
SAP HANA Spark Controller
The Spark controller provides access to a Hadoop cluster. When connecting through the SAP HANA Spark
Controller, an additional process is started on the Hadoop cluster that communicates with the SAP HANA Vora
engines through Spark. For more information about the Spark controller, see Using SAP HANA Spark
Controller.
You can use the SAP HANA Spark controller as follows:
1. Install and configure the Spark controller in SAP HANA and configure SAP HANA Vora to use it. See
Connect SAP HANA Spark Controller to SAP HANA Vora [page 45].
2. Create remote sources and virtual tables as described in the SAP HANA Administration Guide. See Create
a Remote Source and Managing Virtual Tables.
SAP HANA Vora Remote Source Adapter
You can create an SDA remote source connection directly to the SAP HANA Vora cluster using the SAP HANA
Vora remote source adapter voraodbc.
You can use the SAP HANA Vora remote source adapter as follows:
1. Ensure that the SAP HANA Wire protocol is enabled. See Enable the SAP HANA Wire for Smart Data
Access [page 81].
2. Create a remote source using the voraodbc remote source adapter. See Create an SAP HANA Vora
Remote Source [page 82].

3. Create virtual tables that represent the tables you want to access in the SAP HANA Vora remote source.
See Create Virtual Tables [page 83].
4. Optionally reroute stored procedures from SAP HANA to SAP HANA Vora, so that they run directly on the
applicable objects. See Reroute Stored Procedures [page 86].
Note
Note the following:
● The voraodbc SDA adapter is delivered with SAP HANA SPS12 and higher.
● You cannot connect to a Kerberos-enabled SAP HANA Vora cluster.
● You can currently only create virtual tables based on tables in the SAP HANA Vora disk engine.
3.14.1 Enable the SAP HANA Wire for Smart Data Access
SAP HANA Vora supports the SAP HANA Wire protocol, which allows a direct connection to be established
from SAP HANA to SAP HANA Vora using SAP HANA smart data access (SDA).
Context
The SAP HANA Wire is implemented in the SAP HANA Vora transaction coordinator and is enabled by default.
Procedure
1. Open the SAP HANA Vora Manager UI (<VORA MASTER HOST>:19000) and log in.
2. Choose Services.
3. In the list on the Services screen, select Vora Transaction Coordinator.
4. On the Configuration tab, select the HANA Wire activation option.
5. In the Instance number for Vora Transaction Coordinator field, enter the instance number of the Vora
cluster. This number will be used to derive the port number of the Vora transaction coordinator for the
remote source connection.
6. Stop the Vora Transaction Coordinator service.
7. Choose Apply to save your settings.
8. Start the Vora Transaction Coordinator service again.

3.14.2 Create an SAP HANA Vora Remote Source
Create an SDA remote source connection directly to the SAP HANA Vora cluster using the SAP HANA Vora
remote source adapter voraodbc.
Prerequisites
You have enabled the SAP HANA Wire. See Enable the SAP HANA Wire for Smart Data Access [page 81].
Procedure
On the SAP HANA instance, create a remote source using the following SQL statement:
Note
The SAP HANA Studio remote source editor (UI) does not currently support the SAP HANA Vora remote
source adapter.
CREATE REMOTE SOURCE <Name> ADAPTER "voraodbc"

CONFIGURATION 'ServerNode=<TC Server>:<TC HANA Wire Port>;Driver=libodbcHDB'
WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=my_user;password=my_password';
Replace the variables as follows:
○ <Name>: Name of the remote source to be displayed in the SAP HANA Studio.
○ <TC Server>: IP address/domain name of the server in the SAP HANA Vora cluster on which the
transaction coordinator is running.
○ <TC HANA Wire Port>: SAP HANA Wire port of the transaction coordinator, which is determined as 3XX15,
where XX is the instance number of the SAP HANA Vora cluster as configured in the SAP HANA Vora
Manager. (Note that it is not the number of the SAP HANA instance adding the SAP HANA Vora cluster as
a remote source). For example, when the instance number of the SAP HANA Vora cluster is configured as
25, the SAP HANA Wire port is 32515.
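The 3XX15 port rule can be checked with a small calculation. The Python sketch below is only an illustration: the function names and the host name are hypothetical, and the instance number is assumed to be a two-digit value between 00 and 99.

```python
def hana_wire_port(instance_number: int) -> int:
    """Derive the SAP HANA Wire port 3XX15 from the Vora instance number XX."""
    # Assumption: the instance number is a two-digit value (00-99).
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 0 and 99")
    return 30015 + instance_number * 100


def server_node_configuration(tc_server: str, instance_number: int) -> str:
    """Build the CONFIGURATION string used in CREATE REMOTE SOURCE."""
    port = hana_wire_port(instance_number)
    return f"ServerNode={tc_server}:{port};Driver=libodbcHDB"


if __name__ == "__main__":
    # Hypothetical transaction coordinator host, instance number 25.
    print(server_node_configuration("vora-tc.example.com", 25))
```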
Note
SAP HANA Vora does not currently check credentials. You can use any user and password.
Results
The remote source is now listed under Provisioning > Remote Sources. Expand the remote source to see
the users and tables.

Related Information
CREATE REMOTE SOURCE
3.14.3 Create Virtual Tables
Virtual tables represent the tables you want to access in the SAP HANA Vora remote source. You can add one
or more remote objects as virtual tables.
Prerequisites
A remote source connection has been created using the voraodbc adapter. See Create an SAP HANA Vora
Remote Source [page 82].
Context
You can access data stored in the SAP HANA Vora disk engine. You cannot currently access data in the
in-memory relational engine, document store, time series engine, or graph engine.
Procedure
Add the table you want to access in the remote source as a virtual table:
Option Steps
SAP HANA Studio Provisioning UI
1. Expand the remote source to see the users and tables: Provisioning > Remote Sources >
<remote-source> > <user>.
2. Browse the tables and select the table you want to access.
3. From the context menu, choose Add as Virtual Table.
4. Enter a table name, select the schema where the virtual table should be stored on
your SAP HANA instance, and choose Create.
SQL Command
CREATE VIRTUAL TABLE <local schema name>.<local table name>
AT "<remote source>"."<NULL>"."<remote schema>"."<remote table>"
Replace the variables as follows:
○ <local schema name>: Name of the schema on the SAP HANA instance in which
the virtual table should be created.
○ <local table name>: Name to be assigned to the virtual table.
○ <remote source>: Name of the remote source in which the remote table is located.
○ <remote schema>: Name of the schema in the remote source in which the remote
table is located.
○ <remote table>: Name of the table to be added as a virtual table from the remote
source.
Note that <NULL> is the NULL database item displayed when you browse the remote
source in the SAP HANA Studio.
Note
○ SAP HANA imposes a maximum length of 256 characters for the names of schemas, tables, and
columns. If an SAP HANA Vora table does not meet these requirements, it cannot be added as a virtual
table.
○ Table names have to be uppercase so that SAP HANA can access the tables.
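The following sketch assembles the CREATE VIRTUAL TABLE statement from its parts and applies the two checks from the note above. The function name, the example names, and the decision to apply the uppercase check to the remote table name are assumptions made for illustration; the 256-character limit is the one stated in this section.

```python
MAX_NAME_LENGTH = 256  # limit stated in the note above


def create_virtual_table_sql(local_schema, local_table,
                             remote_source, remote_schema, remote_table):
    """Build a CREATE VIRTUAL TABLE statement after basic name checks."""
    for name in (remote_schema, remote_table):
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"name exceeds {MAX_NAME_LENGTH} characters: {name[:20]}...")
    # Assumption: the uppercase requirement applies to the remote table name.
    if remote_table != remote_table.upper():
        raise ValueError(f"table name must be uppercase: {remote_table}")
    return (
        f"CREATE VIRTUAL TABLE {local_schema}.{local_table} "
        f'AT "{remote_source}"."<NULL>"."{remote_schema}"."{remote_table}"'
    )


# Hypothetical names, for illustration only.
print(create_virtual_table_sql("MYSCHEMA", "V_SALES", "VORA", "VORA", "SALES"))
```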
Results
The new virtual table is now listed under Catalog > <schema> > Tables. You can run SQL queries on
virtual tables in the same way as with normal SAP HANA tables.
Related Information
Managing Virtual Tables

SQL Query and Data Type Restrictions [page 84]
3.14.4 SQL Query and Data Type Restrictions
When creating and querying virtual tables based on SAP HANA Vora remote sources created through the
voraodbc adapter, be aware of the following restrictions.
SQL Queries
The following types of SQL queries are supported:
● SELECT queries

● Joins between SAP HANA and SAP HANA Vora tables
● INSERT/UPDATE
The following SQL queries are not currently supported:
● Prepared INSERT/UPDATE
● SELECT queries with LIMIT on one or more CHAR or VARCHAR columns. As a result, the SAP HANA
studio feature of selecting a virtual table and choosing Open Content from the context menu does not
work.
Note the following:
● Avoid SQL queries with excessively large result sets, for example, SELECT * without any WHERE
conditions on a table with 10^7 rows. This applies equally when you execute SQL queries directly in SAP
HANA Vora.
● Close sessions frequently and open a new session. Query results that are not completely fetched by the
SAP HANA client are not automatically freed by a timeout, so you could have a resource leak when a
session is left open for a long time.
Data Types
The main differences between the SAP HANA and SAP HANA Vora data types are listed below:
Data Type: String types
Difference: Maximum VARCHAR length in SAP HANA: 5000, CHAR: 2000.
Maximum length in SAP HANA Vora: 2bn. An SAP HANA Vora table cannot be
added as a virtual table if one of its columns exceeds the limit. Note that in this
case the error message in SAP HANA may be inconclusive.

Data Type: TINYINT, SMALLINT, INTEGER, BIGINT
Difference: There are different SQL integer types in SAP HANA, while there is only one
(INTEGER) in SAP HANA Vora. SAP HANA Vora INTEGER columns are exposed as
BIGINT columns to SAP HANA.

Data Type: Numeric types
Difference: The following values are used to represent null in SAP HANA Vora and cannot be
inserted into SAP HANA Vora virtual tables from SAP HANA:
● INTEGER: min value
● FLOAT, DOUBLE: negative infinity value

Data Type: DECIMAL
Difference: Maximum precision in SAP HANA Vora: 18 digits

Data Type: TIME
Difference: SAP HANA Vora has split seconds, SAP HANA only has full seconds. On SELECT,
split seconds are cut off or rounded down.

Data Type: TIMESTAMP
Difference: Split-second precision in SAP HANA Vora is higher than in SAP HANA. On
SELECT, digits that cannot be represented in SAP HANA are cut off or rounded
down.

Data Type: DATE, TIMESTAMP
Difference: Ancient date values are not currently supported, for example, earlier than the
year 1500.
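Because the error message for an oversized string column may be inconclusive, it can help to check column lengths before adding a table. The sketch below applies the VARCHAR and CHAR limits listed above to a list of column definitions. The function name and the (name, type, length) tuple layout are assumptions; how you obtain the column metadata from SAP HANA Vora is not covered here.

```python
# SAP HANA string length limits as listed in the table above.
HANA_MAX_LENGTH = {"VARCHAR": 5000, "CHAR": 2000}


def columns_exceeding_hana_limits(columns):
    """Return the names of string columns that are too long for SAP HANA.

    columns: iterable of (column_name, type_name, declared_length) tuples.
    """
    too_long = []
    for name, type_name, length in columns:
        limit = HANA_MAX_LENGTH.get(type_name.upper())
        if limit is not None and length > limit:
            too_long.append(name)
    return too_long


# Hypothetical column list, for illustration only.
cols = [("ID", "INTEGER", 0), ("NAME", "VARCHAR", 5000), ("NOTES", "VARCHAR", 100000)]
print(columns_exceeding_hana_limits(cols))  # ['NOTES']
```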

3.14.5 Reroute Stored Procedures
You can reroute the execution of simple stored procedures from SAP HANA to SAP HANA Vora. In order to do
so, the stored procedure must be defined in both SAP HANA and SAP HANA Vora.
Prerequisites
A remote source connection has been created using the voraodbc adapter. See Create an SAP HANA Vora
Procedure
1. Create the stored procedure in both SAP HANA and SAP HANA Vora as follows:
a. SAP HANA:
CREATE PROCEDURE <ProcedureName> ( <ParameterMode> <ParameterIdentifier> <ParameterType> )
READS SQL DATA AS BEGIN <Statement>; END;
b. SAP HANA Vora:
CREATE PROCEDURE <ProcedureName> ( <ParameterMode> <ParameterIdentifier> <ParameterType> )
AS BEGIN <Statement>; END;
Note the following:

○ <ParameterMode>: IN/INOUT (OUT parameters are currently not supported)
○ <ParameterType>: SQL parameter type. Only primitive types are currently supported (not dates,
timestamps, or blobs). For example:
○ CHAR
○ VARCHAR
○ INTEGER
○ REAL
○ DOUBLE
2. Register or deregister a rerouting from SAP HANA to the SAP HANA Vora remote source as follows:
Option: Register a rerouting
Steps: alter procedure <ProcedureName> add route to remote source <VoraRemoteSourceName>;

Option: Deregister a rerouting
Steps: alter procedure <ProcedureName> drop route to remote source <VoraRemoteSourceName>;

3. Optionally check which routes are registered in SAP HANA as follows:
select * from PROCEDURE_ROUTES;
4. Call a rerouted procedure as follows:
call <ProcedureName>(<Parameters>);
3.15 Best Practices: Administration and Operations
By observing some basic best practices, you can achieve higher performance on your Hadoop cluster.
A Hadoop cluster typically involves a very large number of relatively similar computers so, in general, a good
way to install a cluster is by distinguishing between four types of machines:
1. Cluster provisioning system with Ambari, Cloudera, or MapR installed

2. Master cluster nodes that contain systems such as HDFS NameNodes and central cluster management
tools (such as the Yarn resource manager)
3. Worker nodes that do the actual computing and contain HDFS data
4. Jump boxes that contain only client components. These machines allow users to start their jobs.
If you have a very specific setup where you have, for example, divided compute nodes and HDFS data nodes,
be aware that this might not be the best choice.
Related Information
HDFS [page 87]

Choosing a Cluster Manager [page 88]
Example Cluster Configuration Including a Client Machine (Jump Box) [page 88]
3.15.1 HDFS
By default HDFS stores three replicas of each data block on different machines. Besides providing the
necessary fault tolerance, this also increases data locality.
Be aware that if the data that is used for SQL processing is not evenly distributed, this might lead to longer
loading times for tables and therefore affect the performance of the cluster when used in combination with
SAP HANA Vora. This might be the case if you delete a large amount of data (the remaining data is then
unbalanced) or if you also use HDFS for data that is not used for processing with SAP HANA Vora.
Note
It is important to keep the data that you use in SAP HANA Vora/Spark as evenly distributed as possible on
HDFS to increase speed. There are a number of HDFS tools available to re-balance the data.
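One way to judge whether rebalancing is worthwhile is to compare each DataNode's utilization with the utilization of the cluster as a whole, which is the same criterion the threshold of the hdfs balancer tool is based on. The sketch below assumes you have already collected used and total capacity per node (for example, from an HDFS report); the function name, the input layout, and the 10 percent default are assumptions for illustration.

```python
def unbalanced_nodes(usage, threshold_pct=10.0):
    """Return nodes whose utilization deviates from the cluster utilization.

    usage: dict mapping node name to (used_bytes, capacity_bytes).
    A node is reported if its utilization differs from the overall cluster
    utilization by more than threshold_pct percentage points.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(capacity for _, capacity in usage.values())
    cluster_pct = 100.0 * total_used / total_capacity
    return sorted(
        node
        for node, (used, capacity) in usage.items()
        if abs(100.0 * used / capacity - cluster_pct) > threshold_pct
    )


# Hypothetical figures: cluster utilization is 60 percent.
print(unbalanced_nodes({"n1": (90, 100), "n2": (50, 100), "n3": (40, 100)}))
```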

3.15.2 Choosing a Cluster Manager
The cluster manager is responsible for distributing tasks throughout the compute nodes of the cluster. Each
node that assumes computation tasks is managed by a cluster manager.
In order to run, an application requests resources from the cluster manager. If this is successful, the cluster
manager transfers the actual application to the nodes in question and starts it.
The cluster manager therefore serves as an abstraction layer for the application, allowing it to be developed
independently of the cluster setup. This means that Spark, as well as all its extensions for SAP HANA Vora, can
be installed on a single node and will then be automatically transferred to the compute nodes. The problem
with this, however, is that Spark itself also includes a cluster manager, called Spark standalone mode.
Logically, however, it is an independent system that is not related to the computational capabilities of Spark.
The system provided by SAP HANA Vora is completely independent of the cluster manager. If you are
deploying a test and development environment with a small number of nodes, we recommend that you choose
Spark’s standalone cluster manager. For information about how to install it, see the Spark manual.
Your Hadoop distribution usually comes with a built-in cluster manager. In most cases, this is Yarn. Yarn
distinguishes between Node Managers, which are responsible for a compute node, and the Resource Manager,
which keeps track of the overall workload of the cluster and distributes tasks to the Node Managers.
Note
If your cluster manager has central components, such as the Resource Manager, you should put them on
separate machines that do not compute jobs.
Related Information
Spark Standalone Mode
3.15.3 Example Cluster Configuration Including a Client Machine (Jump Box)
This example shows how a small Hadoop system consisting of 60 nodes in total can be configured.
Each node is quite small and contains 32 GB of RAM. Yarn is used as the cluster manager. The nodes are
configured as follows:
● 1 Ambari server
● 2 master nodes (Resource Manager and NameNodes)
● 56 worker/compute nodes
● 1 jump box containing client components
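As a quick check of the figures above, the node counts add up to the stated total, and the worker nodes together provide the memory available for computation. The dictionary keys below are illustrative labels, not configuration names.

```python
# Node layout and memory size taken from the example above.
NODE_LAYOUT = {"ambari_server": 1, "master": 2, "worker": 56, "jump_box": 1}
RAM_PER_NODE_GB = 32


def total_nodes(layout):
    """Total number of machines in the cluster."""
    return sum(layout.values())


def aggregate_worker_ram_gb(layout, ram_per_node_gb):
    """Combined RAM of all worker/compute nodes in GB."""
    return layout["worker"] * ram_per_node_gb


print(total_nodes(NODE_LAYOUT))                               # 60
print(aggregate_worker_ram_gb(NODE_LAYOUT, RAM_PER_NODE_GB))  # 1792
```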
All components are provisioned by Ambari with the standard settings. Particularly noteworthy is the way the
jump box is configured to enable a user to easily deploy applications and use the platform.

Each user is assigned a separate Linux user, including a home directory containing Spark binaries as well as a
shaded JAR of all the components and dependencies provided by SAP. Each user then has the following
directory structure:
● /home/user/spark: Symlink to the current Spark installation

● /home/user/sapjars: Shaded JARs
● Each user also has a home directory on HDFS
For convenience, the environment variables are configured as follows in the .profile file:
# Include spark home

export SPARK_HOME="$HOME/spark"
# Hadoop conf dir
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export YARN_CONF_DIR="/etc/hadoop/conf"
export JAVA_HOME="/usr/jdk64/jdk1.7.0_67/"
export PATH="$PATH:$SPARK_HOME/bin"
To use the SAP HANA Vora Spark integration component, certain system-specific variables need to be
configured in Spark. See the developer manual for more details. For convenience, these are configured in the
spark-defaults.conf file so that all system-specific variables are located in one place:
spark.driver.extraJavaOptions -XX:MaxPermSize=256m
# Uncomment the following line and enter your Amazon S3 secret access key, if
# you have one
# spark.vora.s3secretaccesskeyid <S3 secret access key>
Based on this configuration, users can easily start a shell or deploy an application with the following
commands:
spark-shell --num-executors 3 --driver-memory 4g --executor-memory 2g \
  --master yarn-client --jars ~/sapjars/shaded.jar
spark-submit --class com.sap.spark.vora.example.ExampleQueryHDFS \
  --master yarn-client --jars sapjars/shaded.jar SparkVoraTrialProject-0.0.1.jar

4 Security
When using a distributed system, you need to be sure that your data and processes support your business
needs without allowing unauthorized access to critical information. User errors, negligence, or attempted
manipulation of your system should not result in loss of information or processing time.
These demands on security apply likewise to SAP HANA Vora.
Security Guides
SAP HANA Vora functions as an execution engine within a Spark/Hadoop landscape. When installed on nodes
in an Ambari/Cloudera/MapR cluster, SAP HANA Vora becomes an available service that can be added
through the Ambari/Cloudera/MapR administration interface, in parallel with existing services. Therefore, the
corresponding security guides also apply to SAP HANA Vora:
Guide Noteworthy Sections
Ambari Security Guide Configuring Ambari and Hadoop for Kerberos
Cloudera Security Guide 5.7 Enabling Kerberos Authentication Using the Wizard
MapR Security Guide Enabling and Disabling Security Features on Your Cluster
Generating a maprticket from a Kerberos Ticket
Spark Security Full document
Related Information
Enabling Kerberos Authentication for SAP HANA Vora [page 91]

Configure SAP HANA Vora UI Security [page 108]
Verifying Consul UI Security Measures [page 109]

90 PUBLIC Security
4.1 Enabling Kerberos Authentication for SAP HANA Vora
It is assumed that you already have an active Kerberos environment and that Kerberos is enabled on the
underlying Hadoop cluster. SAP HANA Vora does not provide a Kerberos environment or any default security
configuration.
Task Overview
Task: Before installing SAP HANA Vora: Ensure access to a secured cluster
Description: To install SAP HANA Vora, the HDFS superuser needs to be able to access the secured cluster.
See: Enable Access to a Secured Hadoop Cluster [page 93]

Task: Create principals and keytabs for SAP HANA Vora
Description: To set up Kerberos authentication for SAP HANA Vora, you need to generate valid principals
and keytabs specifically for SAP HANA Vora, distribute them to the relevant nodes in the cluster, and
protect them with the necessary security measures. You need to manually create and copy the keytab
files (that is, by hand using scp) to all nodes.
See: Use SAP HANA Vora with the MIT Kerberos Distribution [page 93]
Use SAP HANA Vora with Active Directory [page 94]
Hadoop in Secure Mode

Task: Configure SAP HANA Vora to access a secured HDFS
Description: If HDFS security is enabled, the SAP HANA Vora services need to be correctly configured to
access it.
See: Enabling Authentication Between SAP HANA Vora and HDFS [page 95]

Task: Configure the SAP HANA Vora components' authentication to each other
Description: The SAP HANA Vora components can mutually authenticate each other to prevent any
interaction by malicious parties. Like Hadoop, SAP HANA Vora uses Kerberos as the authentication
mechanism. This also works standalone for SAP HANA Vora and doesn't require Hadoop security to be
enabled.
See: Enable Authentication Between SAP HANA Vora Components [page 98]

Task: Configure Spark on a security-enabled SAP HANA Vora cluster
Description: If Spark is used on a security-enabled Hadoop or SAP HANA Vora cluster, configuration is
needed to allow it to access the required resources.
See: Configure Authentication Between Apache Spark and SAP HANA Vora [page 100]
Run the Spark Shell with Kerberos Authentication [page 101]

Task: Configure SAP Lumira to connect to a security-enabled SAP HANA Vora cluster
Description: If SAP Lumira is used together with a Kerberos-enabled SAP HANA Vora cluster, it needs to
be configured to allow it to connect to a Kerberos-enabled Thrift server through its Simba JDBC driver.
See: Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster [page 102]

Task: Use Kerberos with MapR
Description: Use MapR tickets on top of Kerberos tickets for user and service authentication.
See: Configuring Authentication for SAP HANA Vora with MapR [page 105]

Task: Configure the Thrift server to run on a security-enabled SAP HANA Vora cluster
Description: Configure the SAP HANA Vora Thriftserver for Kerberos authentication.
See: Configure the Thrift Server [page 106]
4.1.1 Kerberos Overview and Requirements
Strong authentication is the basis of a secure Hadoop environment. To establish secure communication
among its components, Hadoop uses Kerberos.
Kerberos has three main components at a high level:
● A database of the users and services (known as principals) that it knows about and their respective
Kerberos passwords
● An authentication server (AS) that performs the initial authentication and issues a Ticket Granting Ticket
(TGT)
● A Ticket Granting Server (TGS) that issues subsequent service tickets based on the initial TGT
The set of hosts, users, and services over which the Kerberos server has control is called a realm. Note the
following Kerberos terminology:
Term Description
Key Distribution Center (KDC) The trusted source for authentication in a Kerberos-enabled environment
Kerberos KDC server The machine or server that serves as the Key Distribution Center
Kerberos client Any machine in the cluster that authenticates against the KDC
Principal The unique name of a user or service that authenticates against the KDC
Keytab A file that includes one or more principals and their keys
Realm The Kerberos network that includes a KDC and a number of clients
Requirements
For SAP HANA Vora to be able to access a Kerberos-enabled Hadoop system with an MIT or Active Directory
backend, the following principals are needed:
● A user principal for all the following tasks:

○ Enable engines to access HDFS
○ Submit jobs from SAP HANA Vora Tools to the Thrift server
○ Submit Spark jobs to the Thrift server
● A service principal for the following:
○ SAP HANA Vora Thriftserver
○ SAP HANA Vora engines
SAP suggests using a user principal with the name vora and a service principal called vora/<fqdn> for all
service principals.
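The naming scheme suggested above follows the usual Kerberos form primary/instance@REALM, where the instance part is present only for service principals. The sketch below splits a principal name into these parts; the class and function names are hypothetical, and the parsing deliberately ignores escaping rules that full Kerberos implementations handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str              # for example, vora
    instance: Optional[str]   # fully qualified domain name for service principals
    realm: Optional[str]      # for example, SAP.COM


def parse_principal(name: str) -> Principal:
    """Split primary[/instance][@REALM] into its components."""
    rest, _, realm = name.partition("@")
    primary, _, instance = rest.partition("/")
    return Principal(primary, instance or None, realm or None)


print(parse_principal("vora/server.example.com@SAP.COM"))
print(parse_principal("vora@SAP.COM"))
```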

Related Information
Enabling Kerberos Authentication for SAP HANA Vora [page 91]
4.1.2 Enable Access to a Secured Hadoop Cluster
If Kerberos is enabled on your cluster, the SAP HANA Vora Ambari and Cloudera packages need to access
HDFS as the superuser during the installation of SAP HANA Vora. The HDFS superuser therefore needs to be
assigned the necessary credentials and tickets to be able to access a secured cluster.
Context
Prepare these credentials on the machine where the Ambari server or Cloudera management server is running
before starting the SAP HANA Vora installation.
Procedure
Execute the following command as the HDFS superuser:
Sample Code
kinit -kt <path/to/hdfs_user_keytab> <hdfs_superuser>
Superuser rights are also needed for MapR. Use the maprlogin command for this purpose.
4.1.3 Use SAP HANA Vora with the MIT Kerberos Distribution
SAP HANA Vora needs one user principal and one service principal to run properly.
Procedure
1. Create the necessary principals as shown in the example below:
sudo kadmin -p admin/admin -q "addprinc -randkey vora@<REALM>"

sudo kadmin -p admin/admin -q "addprinc -randkey vora/<fqdn>@<REALM>"

2. Generate the keytabs as shown in the example below. Note that <fqdn> refers to all nodes where the SAP
HANA Vora services run:
sudo kadmin -p admin/admin -q "xst -k /etc/security/keytabs/vora.keytab vora@EXAMPLE.COM"
sudo kadmin -p admin/admin -q "xst -k /etc/security/keytabs/vora.service.keytab vora/FQDN@EXAMPLE.COM"
Remember
These are example commands that you need to adapt as appropriate for your environment.
3. Distribute the generated keytabs to the same location on every node using, for example, scp.
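When SAP HANA Vora services run on many nodes, the commands from steps 1 and 2 have to be repeated for each fully qualified domain name. The sketch below only generates the command strings shown above for a list of hosts so that you can review them before running anything. The function name is hypothetical, and the admin/admin principal, the keytab directory, and the choice to export all service principals into one keytab file are taken from or assumed on the basis of the example commands; adapt them to your environment.

```python
def kadmin_commands(fqdns, realm, keytab_dir="/etc/security/keytabs"):
    """Return the kadmin command lines for one user principal and one
    service principal per host, following the example commands above."""

    def kadmin(query):
        return f'sudo kadmin -p admin/admin -q "{query}"'

    commands = [kadmin(f"addprinc -randkey vora@{realm}")]
    commands += [kadmin(f"addprinc -randkey vora/{fqdn}@{realm}") for fqdn in fqdns]
    commands.append(kadmin(f"xst -k {keytab_dir}/vora.keytab vora@{realm}"))
    commands += [
        kadmin(f"xst -k {keytab_dir}/vora.service.keytab vora/{fqdn}@{realm}")
        for fqdn in fqdns
    ]
    return commands


# Hypothetical host names, for illustration only.
for line in kadmin_commands(["node1.example.com", "node2.example.com"], "EXAMPLE.COM"):
    print(line)
```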
4.1.4 Use SAP HANA Vora with Active Directory
Use this procedure if your Hadoop cluster uses Active Directory (AD) instead of the MIT Kerberos distribution.
Procedure
1. Add users and service principals to Active Directory.

You need to add one service principal per host and assign it to a distinct user. This is exactly the same
approach as that followed by standard cluster managers (for example, Cloudera Management Server)
during Kerberos configuration.
For example, you could use vora-<hostname> as a distinct user:
dsadd user CN=vora-<hostname>,CN=Users,DC=AD,DC=HADOOP -upn vora/<fqdn>@<REALM> -pwdneverexpires yes -disabled no -acctexpires never -mustchpwd no -pwd <password>
Note that you cannot add multiple service principals to one user. For more information, see the Microsoft
Ktpass documentation.
2. Create keytab files for each service principal.
Create one keytab file for each service principal and host:
ktpass.exe /out vora-<hostname>.keytab /princ vora/<fqdn>@<REALM> /mapuser AD\v2server-<host> /crypto all /ptype KRB5_NT_PRINCIPAL /pass +rndPass
3. Distribute the generated keytabs (with the scp command, for example) to each host, using the same file
name and location on each host.
4. Verify with kinit.
Use kinit to verify that the above setup can be successfully run in your environment:
kinit -kt <keytab file> <SPN>

For client principals the configuration could look like this:
Sample Code
dsadd user CN=vora,CN=Users,DC=AD,DC=HADOOP -upn vora@AD.HADOOP -pwdneverexpires yes -disabled no -acctexpires never -mustchpwd no -pwd <password>
ktpass.exe /out vora.keytab /princ vora@AD.HADOOP /mapuser vora@AD.HADOOP /crypto all /ptype KRB5_NT_PRINCIPAL /pass +rndPass
Related Information
Microsoft Ktpass
4.1.5 Enabling Authentication Between SAP HANA Vora and HDFS
SAP HANA Vora is able to read and write data from and to a security-enabled Hadoop Distributed File System
(HDFS) by means of Kerberos authentication. SAP HANA Vora currently only supports a single user for
accessing HDFS. It does not support Hadoop user impersonation.
SAP HANA Vora has two plugins to access HDFS. They can be set on the SAP HANA Vora Manager UI for
each Vora engine if needed.
Note that if the following parameters are set on the Hadoop cluster, you need to use the native HDFS plugin:
● hadoop.rpc.protection is set to privacy

● dfs.encrypt.data.transfer is set
● Extended ACLs are used on the cluster
● Encrypted HDFS zones are used
Otherwise it is recommended that you use the default HDFS plugin.
Note
MapR clusters do not need HDFS configuration. For more information, see Configuring Authentication for
SAP HANA Vora with MapR.
Related Information
Enable Authentication for Default HDFS [page 96]

Enable Authentication for Native HDFS [page 97]
Configuring Authentication for SAP HANA Vora with MapR [page 105]

4.1.5.1 Enable Authentication for Default HDFS
To enable Kerberos authentication, you need to configure the vora.security.kerberos.hdfs.principal
and vora.security.kerberos.hdfs.keytab.path parameters.
Context
● vora.security.kerberos.hdfs.principal
Set this parameter to a valid Kerberos principal. Both user principals and service principals can be used:
○ User principal: For example, vora@SAP.COM, where vora is the identifier of the principal and SAP.COM
is the realm of the principal.
○ Service principal: For example, vora/server.example.com@SAP.COM, where vora is the identifier
of the service principal, server.example.com is the fully qualified domain name of the service
principal, and SAP.COM is the realm of the service principal.
● vora.security.kerberos.hdfs.keytab.path
Set this parameter to the path of a valid keytab. A keytab is a file containing pairs of Kerberos principals
and encrypted copies of their corresponding keys. The keytab files are used to acquire a TGT (Ticket
Granting Ticket) and tickets from the TGS (Ticket-Granting Service) of the Kerberos Server (KDC). Since
they contain sensitive keys for authentication, they should be protected with strict file permissions. For
example, only the vora user should have read permission on the keytab files used by the Vora services.
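A simple way to confirm that a keytab is protected is to check that neither the group nor other users have any permission bits set on the file. The sketch below does this with the standard library; the function name is hypothetical, and it does not verify that the file owner is actually the vora user, which you would check separately.

```python
import os
import stat


def keytab_is_private(path: str) -> bool:
    """Return True if only the file owner has any access to the keytab."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit means the keytab is too widely readable.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    keytab = "/etc/security/keytabs/vora.keytab"  # path used in the examples in this guide
    if os.path.exists(keytab):
        print(keytab, "is private:", keytab_is_private(keytab))
```

A file with mode 600 or 400 passes this check, whereas mode 644 does not.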
Procedure
Set the parameters in /etc/hadoop/conf/core-site.xml.
If the core-site.xml is located elsewhere, make sure that each node has a link to the path /etc/hadoop/
conf/core-site.xml. An example of the core-site.xml file is shown below:
<configuration>
<property>
<name>hadoop.security.authentication</name>
<value>KERBEROS</value> 
</property>
<property>
<name>vora.security.hdfs.keytab.path</name>
<value>/etc/security/keytabs/vora.keytab</value>
</property>
<property>
<name>vora.security.hdfs.principal</name>
<value>vora</value>
</property>
</configuration>
Note
The hadoop.security.authentication parameter is a general parameter for enabling security in
Hadoop and is not related specifically to SAP HANA Vora. For more information, see the Hadoop
documentation.

It is recommended that you set the parameters directly from the cluster management systems (Apache
Ambari or Cloudera Manager), since they are able to overwrite manually edited configuration files. In Apache
Ambari, for example:
SAP HANA Vora automatically checks the HDFS authentication type from the core-site.xml file (located
at /etc/hadoop/conf/core-site.xml). If it is set to KERBEROS, it uses the provided principal and keytab to
establish Kerberos-authenticated connections with HDFS. Otherwise, authentication is not performed.
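To see which authentication type a given core-site.xml declares, you can read the hadoop.security.authentication property yourself. The sketch below parses the file content with the standard library; it illustrates the kind of check described above and is not SAP HANA Vora's own implementation. The function names and the case-insensitive comparison are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def hadoop_auth_type(core_site_xml: str) -> Optional[str]:
    """Return the value of hadoop.security.authentication, or None if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hadoop.security.authentication":
            return (prop.findtext("value") or "").strip()
    return None


def uses_kerberos(core_site_xml: str) -> bool:
    """True if the declared authentication type is KERBEROS (any case)."""
    return (hadoop_auth_type(core_site_xml) or "").lower() == "kerberos"


sample = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>KERBEROS</value>
  </property>
</configuration>"""
print(uses_kerberos(sample))  # True
```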
4.1.5.2 Enable Authentication for Native HDFS
If native HDFS is enabled on the underlying Hadoop cluster, you need to activate the native HDFS plugin for all
SAP HANA Vora engine types and enable HDFS user impersonation.
Prerequisites
All configuration settings described for default HDFS are needed for native HDFS as well.
Procedure
1. On the SAP HANA Vora Manager UI, activate the native HDFS plugin for all SAP HANA Vora engine types
by selecting the Use native hdfs library option.
2. Enable HDFS impersonation for the vora user in the core-site.xml file or any other relevant Hadoop
configuration file.
This allows the user principal defined previously to impersonate the user vora, since all SAP HANA Vora
services run under the vora user. For more information, see the Hadoop Proxy User documentation.
For example:
Sample Code
<property>
<name>hadoop.proxyuser.vora.groups</name>
<value>*</value>
</property>

<property>
<name>hadoop.proxyuser.vora.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.vora.users</name>
<value>*</value>
</property>
The example above shows one of the easiest ways to enable HDFS impersonation. However, it also allows
all users to impersonate the vora user at HDFS, so SAP recommends that you configure the necessary
ACLs on a case-by-case basis for your cluster.
Related Information
Hadoop: Proxy User
4.1.6 Enable Authentication Between SAP HANA Vora Components
SAP HANA Vora uses Kerberos authentication to secure communication between its components. You can
use the SAP HANA Vora Manager to configure Kerberos authentication for the SAP HANA Vora services and
tools.
Configure the Vora Services
Procedure
On the Services tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora components using the
following parameters on each component's configuration page.
Make sure that the principals and keytabs are the same for all SAP HANA Vora services (note that this does
not include the SAP HANA Vora Tools).
Caution
For SAP HANA Vora to run correctly, you must set all services to either KERBEROS or NONE. If the options
are only partially set, this causes stability issues.
Kerberos principal: Enter the service principal identifier. For example, if the full principal name is
v2server/server.example.com@SAP.COM, enter only v2server. The fully qualified domain name is
automatically resolved from the DNS by the Vora in-memory engine, while the realm SAP.COM is resolved
from the default Kerberos configuration. SAP HANA Vora currently only works with the default Kerberos
realm.

Kerberos keytab: The file system path of the keytab file. The keytab should contain a service principal
whose fully qualified domain name is consistent with the domain name of the machine where the keytab
file is located. For example, if a service principal in a keytab is v2server/server.example.com@SAP.COM,
reverse name resolution should be properly configured and should provide the same fully qualified
domain name as in the principal name (that is, server.example.com in the example).

Authentication type: There are two possible values for this field, NONE and KERBEROS. The default
value is NONE.
Configure the Vora Tools
Procedure
1. On the Vora Tools Configuration tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora
Tools using the following parameters:
Kerberos principal: Enter a valid Kerberos principal. Both user principals and service principals
can be used for this parameter.

Kerberos principal of Hive Thrift Server 2: The Kerberos principal as configured for Hive Thrift Server 2
in hive-site.xml. Enter only the service principal identifier. The fully qualified domain name is
automatically resolved from the DNS. Ensure that the vora user has read access to the Hive keytab.

Kerberos keytab: The file system path of the keytab file.

Authentication type: There are two possible values for this field, NONE and KERBEROS. The default
value is NONE.
Note that problems may occur when the Hive configuration file is used by both the SAP HANA Vora
Thriftserver and Hive. Most Hive installations create the hive.service.keytab file with the hive user
as the owner. Ensure that the vora user has read access to this file in order to be able to run the Thrift
server.
2. On the Vora Thriftserver tab of the SAP HANA Vora Manager UI, configure the SAP HANA Vora
Thriftserver.
The SAP HANA Vora Thriftserver needs Kerberos tickets to submit Spark jobs that run in a Kerberized
YARN environment.
Extra arguments for SAP HANA Vora Thriftserver: Specify additional arguments. You can pass the Kerberos
principal and keytab from this field using the --keytab and --principal parameters.
For example:
You need to set additional parameters to configure the Thrift server on MapR clusters. For more
information, see Configure the Thrift Server.
Note
The required library libgssapi_krb5.so should be on your library path to run the SAP HANA Vora
Tools properly. On some Linux distributions, it could be named differently or be missing.
Related Information
Configure the Thrift Server [page 106]
4.1.7 Configure Authentication Between Apache Spark and SAP HANA Vora
The authentication type of the SAP HANA Vora JDBC driver is controlled by the Spark parameter
spark.jdbcvora.authenticate. If it is set to KERBEROS, the SAP HANA Vora JDBC driver performs
Kerberos authentication. Otherwise, the driver performs no authentication at all.
Prerequisites
A JAAS file is needed by the JDBC driver for Kerberos authentication. It must exist on every machine on which
the Spark driver and Spark executors are running. A sample JAAS file is shown below:
vora {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/security/keytabs/vora.keytab"
storeKey=true
useTicketCache=false
principal="vora@ANSIBLE"
doNotPrompt=true;
};
The owner of the JAAS file should be the vora user. The vora.keytab file, which is a keytab derived from the
client principal, must be the same on all nodes.
You can set the principal and keytab in the JAAS file using JAAS syntax. For more information, see Class
Krb5LoginModule.

Procedure
1. Pass the spark.jdbcvora.authenticate parameter to Spark.

You have two options:
○ Pass it to ./spark-submit as an argument (--conf key=value). For example, pass the following if
you want to perform authentication:
--conf spark.jdbcvora.authenticate=KERBEROS
○ Set the parameter in the Spark default configuration file located at $spark_home/conf/spark-defaults.conf.
The example below shows the spark-defaults.conf file with Kerberos authentication enabled for the
SAP HANA Vora JDBC driver:
2. Pass the following parameters (lines 3, 4, and 5 in the figure above) to Spark as well:
○ Principal name of the v2server: This is specified with the parameter spark.v2server.principal.
This value must be the same as the principal name defined on the SAP HANA Vora Manager's
V2Server configuration tab.
○ Path of the JAAS file: This can be passed to the Spark driver and executors using the following
parameters, together with the -Djava.security.auth.login.config option:
○ spark.executor.extraJavaOptions
○ spark.driver.extraJavaOptions
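The parameters named in this procedure can be collected in one place and turned into --conf arguments for spark-submit. The sketch below is one possible way to do this; the function names, the principal value, and the JAAS file path are hypothetical placeholders, while the parameter names are the ones listed above.

```python
def kerberos_spark_conf(v2server_principal: str, jaas_path: str) -> dict:
    """Spark settings for Kerberos authentication of the Vora JDBC driver."""
    jaas_option = f"-Djava.security.auth.login.config={jaas_path}"
    return {
        "spark.jdbcvora.authenticate": "KERBEROS",
        "spark.v2server.principal": v2server_principal,
        "spark.executor.extraJavaOptions": jaas_option,
        "spark.driver.extraJavaOptions": jaas_option,
    }


def as_submit_args(conf: dict) -> list:
    """Turn a settings dictionary into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values: replace with your principal name and JAAS file location.
print(" ".join(as_submit_args(kerberos_spark_conf("v2server", "/path/to/jaas.conf"))))
```

If extraJavaOptions is already set elsewhere (for example, in spark-defaults.conf), the JAAS option needs to be appended to the existing value rather than replacing it.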
4.1.8 Run the Spark Shell with Kerberos Authentication
To run the Spark shell in a Kerberized environment, you need to configure the spark-env.sh files and then
run the start-spark-shell.sh script with the --principal and --keytab parameters.
Prerequisites
The cluster has been configured for Kerberos authentication.
Procedure
1. Make the following changes to the $SPARK_HOME/conf/spark-env.sh file on each node of your cluster:
V2_AUTH_CONFIG='
{
  "auth_type": "KERBEROS",
  "components": [{
    "kerberos": {
      "keytab": "<VORA_SERVICE_PRINCIPAL_PATH>",
      "principal": "<VORA_SERVICE_PRINCIPAL>"
    },
    "name": "CAUTH_SERVER"
  }, {
    "kerberos": {
      "keytab": "<VORA_SERVICE_PRINCIPAL_PATH>",
      "principal": "<VORA_SERVICE_PRINCIPAL>"
    },
    "name": "CAUTH_CLIENT"
  }]
}'
export V2_AUTH_CONFIG
The keytab and principal names specified above must match the entries for all other SAP HANA Vora
services.
2. Run the start-spark-shell.sh script with the --principal and --keytab parameters:
○ The value of the principal parameter must be the user logged into the system where
start-spark-shell.sh is run.
○ The keytab parameter refers to the specific keytab file for this user. Note that there should be no
spaces in the path or name of the keytab file.
Sample Code
./start-spark-shell.sh --principal vora --keytab /etc/security/keytabs/vora.keytab
You can find the start-spark-shell.sh script under the following paths:
○ Ambari installations: /var/lib/ambari-agent/cache/stacks/HDP/2.4/services/vora-manager/package/lib/vora-spark/bin/start-spark-shell.sh
○ Cloudera installations: /opt/cloudera/parcels/SAPHanaVora-<version>/lib/vora-spark/bin/start-spark-shell.sh
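Because the V2_AUTH_CONFIG value from step 1 must be identical on every node, it can help to generate it rather than type it by hand. The following Python sketch renders the two spark-env.sh lines from a keytab path and a principal. The function names and the sample values are hypothetical; only the JSON layout is taken from step 1.

```python
import json
import shlex


def v2_auth_config(keytab, principal):
    """Return the Kerberos settings document used for V2_AUTH_CONFIG."""
    kerberos = {"keytab": keytab, "principal": principal}
    return {
        "auth_type": "KERBEROS",
        "components": [
            {"kerberos": dict(kerberos), "name": "CAUTH_SERVER"},
            {"kerberos": dict(kerberos), "name": "CAUTH_CLIENT"},
        ],
    }


def spark_env_lines(keytab, principal):
    """Render the lines to add to spark-env.sh, with the JSON shell-quoted."""
    payload = json.dumps(v2_auth_config(keytab, principal))
    return "V2_AUTH_CONFIG=" + shlex.quote(payload) + "\nexport V2_AUTH_CONFIG"


if __name__ == "__main__":
    # Hypothetical keytab path and principal.
    print(spark_env_lines("/etc/security/keytabs/vora.keytab", "vora@EXAMPLE"))
```

Generating the value from one place keeps the CAUTH_SERVER and CAUTH_CLIENT entries in sync, which this sketch assumes is what you want because step 1 uses the same placeholders for both.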
4.1.9 Connect SAP Lumira to a Kerberized SAP HANA Vora Cluster
SAP Lumira connects to a Kerberos-enabled SAP HANA Vora cluster through a generic JDBC driver
configured with special security parameters.
Context
You need to add the generic JDBC driver to SAP Lumira. You also need to create Kerberos configuration files
to configure the connection to SAP HANA Vora and specify those files in the SAPLumira.ini file so that the
configuration is propagated to the environment. Finally, you can use SAP Lumira to create a connection with
the necessary parameters.
Procedure
1. Install the generic JDBC driver by loading the required JAR files from the C:\Program Files\SAP
Lumira\Desktop\utilities\SparkJDBC directory:
a. Open SAP Lumira and choose Preferences > SQL Drivers.
b. Select Generic JDBC datasource – JDBC Drivers and choose Install Drivers.
c. Select all JAR files under C:\Program Files\SAP Lumira\Desktop\utilities\SparkJDBC,
choose Open, and then Done.
d. To apply the driver changes, restart SAP Lumira.
2. Create the LumiraKerberosLogin.conf file with the following content:
Client {
com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true
keyTab="<VORA_PRINCIPAL_PATH>"
principal="<VORA_PRINCIPAL>" doNotPrompt=true;
};
○ The configuration above tells the Krb5LoginModule to use the keytab file without prompting the user
for a password.
○ The principal attribute is the user principal that SAP Lumira uses for authentication by the KDC.
3. Make sure that the krb5.ini file contains the following:
[libdefaults]
default_realm = <VORA_REALM>
dns_lookup_kdc = true
dns_lookup_realm = true
default_tkt_enctypes = rc4-hmac
default_tgs_enctypes = rc4-hmac
[domain_realm]
.example.com = <VORA_REALM>
example.com = <VORA_REALM>
[realms]
<VORA_REALM> = {
default_domain = <customer_domain>
kdc = <customer_kdc>
}
[capaths]
The configuration above specifies that the default realm is VORA_REALM and that it corresponds to
<customer_domain>. It also specifies that the KDC service is running at the <customer_kdc> address in
the <customer_domain>.
4. Add the configuration file paths to the SAPLumira.ini file.
SAP Lumira reads the SAPLumira.ini file under the SAP Lumira installation directory to get the
necessary starting parameters. You need to pass the file paths of the files above as parameters to make
them effective throughout the authentication process.
Add the following lines to C:\Program Files\SAP Lumira\Desktop\SAPLumira.ini:
-Djava.security.krb5.conf=C:/Windows/krb5.ini
-Djava.security.auth.login.config=C:/LumiraKerberosLogin.conf
SAP Lumira is now able to propagate the necessary values to the authentication module.
5. Start SAP Lumira.
6. Configure the database source to establish the data flow:
a. In SAP Lumira, choose File > New.
The Add new dataset dialog box appears.
b. Select Query with SQL and choose Next.
c. Select Generic JDBC datasource – JDBC Drivers and choose Next.
d. Enter the following values in the fields below:
Field        Value
User Name    lumira
Password     lumira
JDBC URL     jdbc:spark://<THRIFT_URL>/default;AuthMech=1;KrbRealm=<VORA_REALM>;KrbHostFQDN=<THRIFT_FQDN>;KrbServiceName=hive
JDBC Class   com.simba.spark.jdbc4.Driver
○ JDBC Class: The JDBC class specifies which driver is used for the JDBC implementation. For a
secured SAP HANA Vora connection, SAP Lumira should be configured to use Simba (this option
has been tested). The selected driver determines which URL parameters you need to set to
connect to an authenticated service (see next point).
○ JDBC URL: The <THRIFT_URL> should point to the Thrift server host to be contacted. The default
port of the Thrift server is 19123 (for example, thriftserverhost:19123/default).
The JDBC URL also has special parameters required for Kerberos authentication. These are
defined in the Simba JDBC documentation as follows:
JDBC URL Parameter   Description
AuthMech             Set to 1 to indicate Kerberos authentication
KrbRealm             Set to VORA_REALM (the running SAP HANA Vora cluster and the krb5.ini file above are already configured to operate in VORA_REALM)
KrbHostFQDN          Set to the Thrift server's fully qualified domain name (FQDN)
KrbServiceName       Set to hive
e. Choose Connect.
A connection is created to the security-enabled SAP HANA Vora cluster.
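The JDBC URL entered in step 6 is easy to mistype because it packs four Kerberos parameters into one string. The following Python sketch assembles the URL from its parts using the parameter names and the default port 19123 given above. The function name vora_jdbc_url and the sample host names are hypothetical.

```python
def vora_jdbc_url(thrift_host, realm, thrift_fqdn, port=19123, schema="default"):
    """Assemble the Simba Spark JDBC URL for a Kerberized Thrift server."""
    params = [
        ("AuthMech", "1"),            # 1 selects Kerberos authentication
        ("KrbRealm", realm),
        ("KrbHostFQDN", thrift_fqdn),
        ("KrbServiceName", "hive"),
    ]
    suffix = ";".join(f"{key}={value}" for key, value in params)
    return f"jdbc:spark://{thrift_host}:{port}/{schema};{suffix}"


if __name__ == "__main__":
    # Hypothetical host and realm names.
    print(vora_jdbc_url("thriftserverhost", "VORA_REALM",
                        "thriftserverhost.example.com"))
```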
Related Information
Connect SAP Lumira to SAP HANA Vora [page 48]

2210624 - How do I configure SAP Lumira for Kerberos Authentication

4.1.10 Configuring Authentication for SAP HANA Vora with MapR
By design, the MapR security architecture differs from that of other cluster managers such as Ambari and
Cloudera.
MapR introduces MapR tickets on top of Kerberos tickets and uses them for user and service authentication.
For this reason, the SAP HANA Vora services also need MapR tickets to access MapR cluster services, such as
MapR-FS.
However, the SAP HANA Vora services communicate internally with native Kerberos, so a Kerberos
infrastructure is still needed to secure SAP HANA Vora. All Kerberos configuration for the SAP HANA Vora
services is therefore also applicable to MapR clusters.
Related Information
Access MapR-FS [page 105]
4.1.10.1 Access MapR-FS
SAP HANA Vora needs MapR service tickets to access MapR-FS.
Prerequisites
The vora user needs read access permission on the file system to access tickets.
Context
The SAP HANA Vora services access MapR-FS with the user to which the ticket belongs. SAP recommends
that this user is also named vora. If the ticket is obtained by using Kerberos tickets, make sure that the
Kerberos principal has the user name vora. Also make sure that the tickets have a long expiration period to
avoid having to refresh and distribute them to all cluster nodes again.
Procedure
1. Create the MapR service tickets.
Example
maprlogin generateticket -type service -out /etc/vora/vora_ticket -duration 365:0:0 -renewal 365:0:0
The command above creates a long-lived ticket for the user that is logged in. The ticket is valid for 365
days and the maximum renewal time is also 365 days. SAP HANA Vora does not provide a ticket renewal
service for MapR tickets and therefore, after this period, you will need to create another ticket and
distribute it to the cluster nodes. To create service tickets using Kerberos tickets, make sure that you run
kinit first and then run maprlogin kerberos to obtain the initial user ticket.
2. Distribute the tickets to the same directory on all nodes in the cluster.
3. Set the MapR ticket location on the SAP HANA Vora Manager UI.
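Since SAP HANA Vora does not renew MapR tickets, it is useful to know when the ticket created in step 1 expires. The following Python sketch computes the expiry date from the creation time and the -duration value. It assumes that the value is read as Days:Hours:Minutes, which is consistent with 365:0:0 meaning 365 days in the example above. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_duration(text):
    """Parse a Days:Hours:Minutes string such as '365:0:0' (assumed format)."""
    days, hours, minutes = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes)


def ticket_expiry(created, duration):
    """Return the point in time at which a ticket created at 'created' expires."""
    return created + parse_duration(duration)


if __name__ == "__main__":
    # Prints the expiry of a ticket generated now with -duration 365:0:0.
    print(ticket_expiry(datetime.now(), "365:0:0"))
```

Scheduling a reminder some days before this date leaves time to create a new ticket and distribute it to all cluster nodes.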
Next Steps
Configure the SAP HANA Vora Thriftserver to use the MapR service tickets.
Related Information
Configure the Thrift Server [page 106]

MapR Security Architecture
4.1.11 Configure the Thrift Server
The SAP HANA Vora Thriftserver can be configured for Kerberos authentication and can be run on a
Kerberized Hadoop cluster. The SAP HANA Vora Thriftserver does not support impersonation in the Hadoop
cluster.
Procedure
1. To run the SAP HANA Vora Thriftserver on a Kerberized Hadoop cluster, set the following HiveServer2
security properties:
hive.server2.enable.doAs                         Disable Hive impersonation
hive.server2.authentication                      Enable Kerberos authentication
hive.server2.authentication.kerberos.principal   Kerberos principal name to be used by the Thriftserver
hive.server2.authentication.kerberos.keytab      Kerberos keytab file to be used by the Thriftserver
If you have HiveServer2 and/or Hive metastore installed in your cluster, you may have to adjust additional
configuration parameters specific to your cluster. For more information, see the HiveServer2
configuration guide.
2. MapR only:
a. Configure the SAP HANA Vora Thriftserver to use an internal metastore.
The SAP HANA Vora Thriftserver shipped within MapR is capable of authenticating through MapR
tickets. MapR does not support running the Spark Thrift JDBC/ODBC server in a secured cluster. The
SAP HANA Vora Thriftserver therefore cannot be run against the Hive metastore in a secured MapR
cluster.
To run the SAP HANA Vora Thriftserver in a secured MapR cluster, you need to configure it to use an
internal metastore. An example hive-site.xml configuration for running the SAP HANA Vora
Thriftserver in a secured MapR cluster is shown below:
<configuration>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>disable impersonation on hive server</description>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>mapr/_HOST@ANSIBLE</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/opt/mapr/conf/mapr.keytab</value>
  </property>
</configuration>
b. Configure the SAP HANA Vora Thriftserver to use the MapR service ticket generated for the vora user
when running in a secured MapR cluster (for more information, see Access MapR-FS).
To do so, you need to set the MAPR_TICKETFILE_LOCATION environment variable in $SPARK_HOME/conf/spark-env.sh.
An example configuration line in spark-env.sh is shown below:
export MAPR_TICKETFILE_LOCATION=/etc/vora/vora.service.ticket
The ticket at the specified location should be readable only by the vora user.
c. SAP HANA Vora supports only the auth (authentication) level of SASL Quality of Protection (QoP) for the
GSSAPI mechanism. Add the following string in the Extra arguments for SAP HANA Vora Thriftserver
field on the SAP HANA Vora Manager UI:
--hiveconf hive.server2.thrift.sasl.qop=auth

3. Cloudera only: Configure the Thriftserver in the SAP HANA Vora Manager.
The Thrift server configuration on Cloudera is overwritten by the Cloudera Manager after deploying the
client configuration.
To avoid problems, SAP suggests you add the following string in the Extra arguments for SAP HANA Vora
Thriftserver field:
--conf spark.jdbcvora.authenticate=kerberos --conf spark.v2server.principal=vora
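The HiveServer2 security properties listed in step 1 can be written to a hive-site.xml file programmatically, which avoids malformed XML when the file is maintained on several nodes. The following Python sketch uses the standard library to render the property elements. The function name hive_site_xml is hypothetical, and the principal and keytab values are copied from the MapR example in step 2 purely as placeholders.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(properties):
    """Render a hive-site.xml document from a mapping of name to value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values; use the principal and keytab of your own cluster.
THRIFT_SECURITY = {
    "hive.server2.enable.doAs": "false",
    "hive.server2.authentication": "KERBEROS",
    "hive.server2.authentication.kerberos.principal": "mapr/_HOST@ANSIBLE",
    "hive.server2.authentication.kerberos.keytab": "/opt/mapr/conf/mapr.keytab",
}

if __name__ == "__main__":
    print(hive_site_xml(THRIFT_SECURITY))
```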
Related Information
Setting Up HiveServer2
Access MapR-FS [page 105]
Spark Feature Support
4.2 Configure SAP HANA Vora UI Security
You can enable SSL/TLS for the SAP HANA Vora Manager and SAP HANA Vora Tools UIs. By default they use
plain HTTP. A public key infrastructure (PKI) is needed to enable this.
Context
The UIs of the SAP HANA Vora Tools and SAP HANA Vora Manager can be served through HTTPS instead of
HTTP. To enable this feature, you need to make the following changes in the associated configuration files.
Procedure
1. Open the configuration files:
○ Vora Tools:
<VORA-INSTALL-PATH>/vora-tools/bin/service/authweb/meta.json
○ Vora Manager:
<VORA-INSTALL-PATH>/vora-manager-gui/bin/service/authweb/meta.json
2. Make the following changes:
{
  "constructor": "webserver",
  "config": {
    "HTTPAddr": ":9225",
    "EnableAuth": true,
    "Userstores": ["/etc/vora/datatools/htpasswd", "htpasswd"],
    "HTTPSAddr": ":9443", <- ADD THIS
    "TLSCertFilePath": "/path/to/certificate", <- ADD THIS
    "TLSKeyFilePath": "/path/to/key" <- ADD THIS
  }
}
Note that TLSCertFilePath should point to a PEM-encoded X.509v3 certificate file and
TLSKeyFilePath should point to a PEM-encoded and unencrypted private key file. Also make sure that
you select different ports for SAP HANA Vora Tools and SAP HANA Vora Manager in case they run on the
same node.
If both HTTP and HTTPS endpoints are enabled, the HTTP endpoint will be automatically redirected to
HTTPS by default.
3. For Internet Explorer: SAP also recommends that you set the Access data sources across domains option
to Disable, since leaving it enabled can cause security issues for SAP HANA Vora as well as other applications:
Tools > Internet options > Security > Trusted sites > Custom level > Miscellaneous > Access data
sources across domains.
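The change to meta.json in step 2 consists of adding three keys to the existing config object. The following Python sketch applies that change to an already parsed document and leaves all other settings untouched. The function name enable_https is hypothetical, and the certificate and key paths are the placeholders from step 2.

```python
import json


def enable_https(meta, https_addr, cert_path, key_path):
    """Return a copy of a parsed meta.json document with the HTTPS keys added."""
    updated = json.loads(json.dumps(meta))  # deep copy via a JSON round trip
    updated["config"].update({
        "HTTPSAddr": https_addr,
        "TLSCertFilePath": cert_path,
        "TLSKeyFilePath": key_path,
    })
    return updated


if __name__ == "__main__":
    meta = {
        "constructor": "webserver",
        "config": {
            "HTTPAddr": ":9225",
            "EnableAuth": True,
            "Userstores": ["/etc/vora/datatools/htpasswd", "htpasswd"],
        },
    }
    result = enable_https(meta, ":9443", "/path/to/certificate", "/path/to/key")
    print(json.dumps(result, indent=2))
```

When SAP HANA Vora Tools and SAP HANA Vora Manager run on the same node, call the function with a different HTTPSAddr value for each file.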
4.3 Verifying Consul UI Security Measures
SAP HANA Vora is delivered with Consul as a key-value store persistency layer. For security reasons, the Consul UI
is disabled by default when SAP HANA Vora is delivered. It is recommended to follow third-party best
practices for the Consul service to make the production environment more secure.
If you have your own Consul attached to SAP HANA Vora, it is strongly recommended to disable the Consul UI
or enable the necessary protections for it (that is, SSL/TLS, link encryption, and so on) to avoid security
vulnerabilities in SAP HANA Vora.
Although the Consul UI is disabled, Consul still serves requests through its REST API on the external interface.
This is required for the SAP HANA Vora cluster to work properly and for the cluster nodes to communicate
with each other using this interface. However, SAP strongly recommends blocking this connection to the
external world using standard measures like external firewalls. For more information, refer to the Consul
resources to find the best measures for your infrastructure.
Note that you can re-enable the Consul UI by adding a file named consul_ui to the path /etc/vora and
restarting the SAP HANA Vora Manager service from the cluster manager:
touch /etc/vora/consul_ui
This will activate the Consul UI only on the node where the file is placed.
Related Information
Consul Security Model

Important Disclaimers and Legal Information
Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system
environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and
completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP
intentionally or by SAP's gross negligence.
Accessibility
The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be
a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however,
does not apply in cases of willful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations
of SAP.
Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as
"sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun
does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.
Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does
not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any
damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for
transparency (see: http://help.sap.com/disclaimer).

go.sap.com/registration/contact.html
© 2017 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any
form or for any purpose without the express permission of SAP SE
or an SAP affiliate company. The information contained herein may
be changed without prior notice.
Some software products marketed by SAP SE and its distributors
contain proprietary software components of other software
vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company
for informational purposes only, without representation or warranty
of any kind, and SAP or its affiliated companies shall not be liable for
errors or omissions with respect to the materials. The only
warranties for SAP or SAP affiliate company products and services
are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein
should be construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well
as their respective logos are trademarks or registered trademarks
of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the
trademarks of their respective companies.
Please see http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
