
Hortonworks on Nutanix
Nutanix Reference Architecture

Version 2.1 • October 2018 • RA-2030



Copyright
Copyright 2018 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws.
Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other
marks and names mentioned herein may be trademarks of their respective companies.


Contents

1. Executive Summary................................................................................ 5

2. Audience and Purpose........................................................................... 6

3. Solution Overview................................................................................... 7
3.1. Invisible Infrastructure....................................................................................................7
3.2. Nutanix Architecture...................................................................................................... 8
3.3. Hadoop 2.0 (HDFS and YARN) Architecture.............................................................. 11
3.4. What Is the Hortonworks Data Platform?....................................................................12
3.5. Nutanix Prism Central................................................................................................. 14
3.6. Hadoop the Nutanix Way............................................................................................ 15

4. Solution Design..................................................................................... 19
4.1. Hadoop 2.0 Storage Consumption.............................................................................. 19
4.2. Hadoop 2.0 on Nutanix: Storage Consumption...........................................................21
4.3. Hadoop VM Operating System Changes.................................................................... 25
4.4. Hadoop Cluster Design............................................................................................... 30
4.5. General Sizing............................................................................................................. 33
4.6. ESXi Networking..........................................................................................................37
4.7. AHV Networking.......................................................................................................... 38

5. Validation and Benchmarking..............................................................41


5.1. Environment Overview.................................................................................................41
5.2. ESXi and AHV Test Environment Configuration......................................................... 44

6. Solution Application..............................................................................46
6.1. Scenario: Large Hadoop Cluster with Rack Awareness..............................................46

7. Conclusion............................................................................................. 48

Appendix......................................................................................................................... 49


Configuration....................................................................................................................... 49
Disabling Transparent Huge Pages (THP) and Setting read_ahead_kb per vDisk............ 51
Configuring Chronyd (NTP) Service................................................................................... 52
Kernel Settings....................................................................................................................54
Glossary of Terms.............................................................................................................. 55
About the Authors............................................................................................................... 55
About Nutanix......................................................................................................................56

List of Figures................................................................................................................57

List of Tables................................................................................................................. 59


1. Executive Summary
This document makes recommendations for designing, optimizing, and scaling Hortonworks Data
Platform (HDP) deployments on Nutanix. It shows the scalability of the Enterprise Cloud Platform
and provides detailed performance and configuration information regarding the cluster’s scale-out
capabilities when supporting Hadoop deployments.
We have performed extensive testing to simulate real-world workloads and conditions for an HDP
Hadoop environment on both AHV—the native Nutanix hypervisor—and ESXi. We based the
sizing data and recommendations made in this document on multiple test iterations and thorough
technical validation. For validation, we used the Hortonworks-supplied Independent Hardware
Vendor (IHV) tests; this testing suite has allowed us to self-certify HDP on a Nutanix Cloud OS
cluster.
This reference architecture details the underlying Nutanix platform configuration we tested,
as well as our layout for the HDP software components, which follows current best practice
guidance and serves as the basis for ongoing work with the Hortonworks partner team.

Table 1: Solution Details

Product Name                Product Version   Nutanix AOS Version   Hypervisor   Hypervisor Version
Hortonworks Data Platform   2.6.2             5.5                   AHV          20160925.71
Hortonworks Data Platform   2.6.2             5.5                   ESXi         3620759


2. Audience and Purpose


This reference architecture is part of the Nutanix Solutions Library. It is intended for architects
and systems engineers responsible for designing, managing, and supporting Nutanix running
Hadoop. Readers should already be familiar with Hadoop and Nutanix.
We cover the following subject areas:
• Overview of the Nutanix solution for delivering Hadoop on AHV and ESXi.
• The benefits of Hortonworks on Nutanix.
• Architecting a complete Hortonworks solution on the Nutanix platform.
• Sizing guidance for scaling Hortonworks deployments on Nutanix.
• Design and configuration considerations when architecting Hortonworks Hadoop on AHV and
ESXi and the Acropolis Distributed Storage Fabric (DSF).
Unless otherwise stated, the solution described in this document is valid on all supported AOS
releases.

Table 2: Document Version History

Version Number   Published       Notes
1.0              October 2015    Original publication.
1.1              April 2016      Updated platform overview.
1.2              October 2017    Updated platform overview.
2.0              May 2018        Updated to include both ESXi and AHV support.
2.1              October 2018    Added Solution Details table and updated Usable Storage with HDFS and the Acropolis DSF table.


3. Solution Overview

3.1. Invisible Infrastructure


Invisible infrastructure means that enterprises can focus on solving business problems instead of
on repetitive and tedious management and maintenance tasks. Eliminating complexity frees your
IT staff to do other work, while achieving shorter response times for application owners. Nutanix
can provide a highly available common framework for a variety of workloads, ensuring that your
Hadoop infrastructure is consistent throughout your datacenter.

Figure 1: Nutanix Web-Scale Properties

The Nutanix Enterprise Cloud Platform has an architecture very similar to Hadoop’s, allowing
data localization, high throughput, and the capacity to achieve great scale. Nutanix provides
all the benefits of virtualization without the pitfalls of shared storage, such as the complexity
of managing a separate data fabric and the introduction of bottlenecks and silos as you go to
scale your environment. Nutanix offers one-click management via the Prism UI. The experience
that comes from building our own distributed file system uniquely positions Nutanix to maintain,
remediate, and deliver insights into Hadoop’s infrastructure. As you scale from dozens to


thousands of nodes with Nutanix, you can enjoy enterprise quality with consumer-grade look and
feel.

3.2. Nutanix Architecture


The Nutanix hyperconverged infrastructure is a scale-out cluster of high-performance nodes
(or servers), each running a standard hypervisor and containing processors, memory, and local
storage (consisting of SSD flash and high-capacity SATA disk drives). Each node runs VMs just
like a standard VM host.
In addition, the Acropolis Distributed Storage Fabric (DSF) virtualizes local storage from all nodes
into a unified pool. In effect, the DSF acts like an advanced NAS that uses local SSDs and disks
from all nodes to store VM data. VMs running on the cluster write data to the DSF as if they were
writing to shared storage.

Figure 2: Nutanix Architecture

The DSF understands the concept of a VM and provides advanced data management features.
It brings data closer to VMs by storing the data locally on the system, resulting in higher
performance at a lower cost. Nutanix platforms can horizontally scale from as few as three nodes
to a very large number, providing agility and flexibility as a customer’s infrastructure needs grow.
The Nutanix Capacity Optimization Engine (COE) transforms data to increase data efficiency on
disk, using compression as one of its key techniques. The DSF provides both inline and post-
process compression to suit the customer’s needs and the types of data involved.
Inline compression condenses sequential streams of data or large I/O sizes in memory before
writing them to disk, while post-process compression initially writes the data as usual (in an
uncompressed state), then uses the Nutanix MapReduce framework to compress the data
cluster-wide. When using inline compression with random I/O, the system writes data to the


oplog uncompressed, coalesces it, and then compresses it in memory before writing it to the
extent store. From the AOS 5.0 release onward, Nutanix has used LZ4 and LZ4HC for data
compression. Releases prior to AOS 5.0 use the Google Snappy compression library. Both
methods provide good compression ratios with minimal computational overhead and extremely
fast compression and decompression rates.
The following figure shows an example of how inline compression interacts with the DSF write I/O path.

Figure 3: ILM and Compression

Nutanix relies on a replication factor for data protection and availability. This method provides the
highest degree of availability, because it does not require reading from more than one storage
location or data recomputation on failure. However, this advantage comes at the cost of storage
resources, as it requires full copies.


To provide a balance between availability and a reduced storage requirement, the DSF can
encode data using the patented Nutanix version of erasure coding, called EC-X.
EC-X encodes a strip of data blocks on different nodes and determines parity, in a manner
similar to RAID (levels 4, 5, 6, and so on). In the event of a host or disk failure, the system can
use the parity to reconstruct any missing data blocks. In the DSF, the data block is an
extent group, and each data block must be on a different node and belong to a different vDisk.
You can configure the number of data and parity blocks in a strip based on how many failures
you need to tolerate. In most cases, we can think of the configuration as <number of data
blocks> / <number of parity blocks>.
For example, “replication factor 2-like” availability (N + 1) could consist of three or four data
blocks and one parity block in a strip (3/1 or 4/1). “Replication factor 3-like” availability (N + 2)
could consist of three or four data blocks and two parity blocks in a strip (3/2 or 4/2).

Figure 4: Forming a 4/1 Strip with Replication Factor 2

The calculation for expected overhead is <# parity blocks> / <# parity blocks + # data blocks>.
For example, a 4/1 strip devotes 20 percent of its capacity to parity, for an effective 1.25x
overhead compared to the 2x of replication factor 2.
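The arithmetic can be sanity-checked in the shell; the 4/1 and 4/2 strips below are the examples from the text, not the only supported configurations:

```shell
# Overhead per the formula above: parity / (parity + data) blocks;
# the effective space multiplier is (data + parity) / data.
for strip in "4 1" "4 2"; do
  set -- $strip
  data=$1; parity=$2
  pct=$((100 * parity / (parity + data)))
  mult=$(awk -v d="$data" -v p="$parity" 'BEGIN { printf "%.2f", (d + p) / d }')
  echo "${data}/${parity} strip: ${pct}% parity, ${mult}x vs raw"
done
```

A 4/1 strip works out to 20 percent parity (1.25x), and a 4/2 strip to 33 percent parity (1.50x), still well below the 2x and 3x of pure replication.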
EC-X is a post-process framework that doesn’t affect the traditional write I/O path. The encoding
uses the internal MapReduce framework for task distribution. Once the strip is formed, the
system removes the redundant copies of the data.


EC-X complements compression savings to decrease the amount of storage needed. EC-X
can generally encode about 80 percent of the cluster's total capacity. The other 20 percent is
write-hot data that isn't a good candidate for encoding.

3.3. Hadoop 2.0 (HDFS and YARN) Architecture

Figure 5: Hadoop 2.0

YARN
Yet Another Resource Negotiator (YARN) is a core part of Hadoop. YARN manages the compute
resources (memory and CPU) in a Hadoop cluster. A YARN application can request these
resources, and YARN makes them available according to its scheduler policy. In a Hadoop
cluster, YARN aims to drive full usage of all resources on the physical system while letting
every application perform at its maximum potential. YARN is made up of the following components:
• ResourceManager: A single entry point for clients to submit YARN applications. Responsible
for application management and scheduling.
• NodeManager: Runs on the DataNode worker nodes in a Hadoop cluster to enable job
scheduling.
• Container: Resource unit in YARN. Containers are sized by memory and CPU and are
dynamic.


• Application Master: Makes sure that the application gets the necessary resources from the
ResourceManager. Also directs cleanup when the job finishes or if there is an application
failure.
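NodeManager capacity and container sizing map to a handful of standard yarn-site.xml properties. The values below are placeholders to illustrate the knobs, not recommendations; size them against the DataNode VM configuration:

```xml
<!-- yarn-site.xml: resources each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value> <!-- example value; match the DataNode VM's memory -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value> <!-- example value; match the VM's vCPU count -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value> <!-- smallest container the scheduler will grant -->
</property>
```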

HDFS
A Hadoop Distributed File System (HDFS) instance consists of a cluster of machines, often
referred to as the Hadoop cluster or HDFS cluster. There are two main components of an HDFS
cluster:
• NameNode: The master HDFS node that manages the data (without actually storing it),
determining and maintaining how chunks of data are distributed across DataNodes.
• DataNode: A worker node that stores chunks of data and replicates the chunks across other
DataNodes.
NameNode and DataNode are daemon processes running in the cluster. In the HDFS master/
worker node architecture, the NameNode is the master server that manages the file system
namespace and regulates clients’ access to files. It determines how chunks of data are
distributed across DataNodes, but data never resides on or passes through the NameNode.
As worker nodes for the NameNode, DataNodes store chunks of data and replicate the chunks
across other DataNodes. DataNodes constantly communicate their state with the NameNode as
well as with other DataNodes, through commands from the NameNode, to replicate data.

3.4. What Is the Hortonworks Data Platform?


The Hortonworks Data Platform (HDP) is the only 100 percent open source data management
platform for Apache Hadoop. HDP allows you to capture, process, and share data in any format
and at scale. Built and packaged by the core architects, builders, and operators of Hadoop, HDP
includes all of the components necessary to manage a cluster at scale and uncover business
insights from existing and new big data sources.

Enterprise Ready
HDP provides Apache Hadoop for the enterprise, developed completely in the open and
supported by the deepest technology expertise. HDP incorporates current community innovation
and is tested on the most mature Hadoop test suite and on thousands of nodes. Engineers with
the broadest and most complete knowledge of Apache Hadoop develop and support HDP.


Figure 6: HDP Is Enterprise Ready

Security
HDP is designed to meet the changing needs of big data processing within a single platform
and supports all big data scenarios, from batch, to interactive, to real-time and streaming. With
YARN, HDP offers a versatile data access layer at the core of enterprise Hadoop that can
incorporate new processing engines as they become ready for enterprise consumption. HDP
provides the comprehensive enterprise capabilities of security, governance, and operations for
enterprise Hadoop implementations.

Hortonworks Data Platform Key Features

Open Source Cluster Management: Apache Ambari


HDP includes Apache Ambari, the only open source management and monitoring tool available
for Hadoop. Ambari is free and uses existing, well-known technologies and protocols for
simplified integration with your existing management and monitoring tools. Ambari provides
rollback options and makes installing and configuring HDFS, YARN, HBase, and Hive easy.

Developers Can Get to Work Quickly


HDP’s integrated experience allows for SQL query building, displays a visual “explain plan,”
and provides for an extended debugging experience when using the Tez execution engine. In


addition to the SQL builder, a Pig Latin Editor brings a modern browser-based IDE experience to
Pig. There is also a file browser for HDFS.

Data Management
The Apache Falcon web forms-based approach allows for rapid development of feeds and
processes. The Falcon UI also allows you to search and browse processes that have executed,
visualize lineage, and set up mirroring jobs to replicate files and databases between clusters
or to cloud storage such as Microsoft Azure Storage. As an additional benefit, centralized
authorization and policy management for security and audit information with Apache Atlas allows
companies to ensure data governance compliance.

Data Access
As organizations strive to store their data in a single repository efficiently and to interact with it
simultaneously in different ways, they need SQL, streaming, data science, batch, and in-memory
processing from a tool such as Spark. HDP uses YARN as its architectural center, allowing
Hadoop to connect easily to hundreds of data systems, including Spark, without having to write
code. HDP’s support for all Apache Hadoop projects makes it a great choice.

3.5. Nutanix Prism Central


Prism Central is a powerful management tool that allows administrators to centrally manage and
control more than 100 Nutanix clusters around the world from a single pane of glass. It removes
the operational complexity from managing multiple clusters, whether they are in a single location
or geographically distributed across different datacenters.
The HTML 5-based interface provides a bird’s-eye view of IT resources across multiple clusters,
enabling administrators to select and manage individual clusters as required. Single sign-on
streamlines large-scale management by eliminating the need to log on to each cluster
individually. Prism Central also provides an aggregate view of all environment resources, so
administrators can quickly and efficiently monitor all virtual DataNodes and storage resources
and identify potential issues in individual clusters. Prism Central is hypervisor-agnostic and can
manage multiple clusters running different hypervisors. You can deploy Prism Central in any of
the Nutanix clusters in the global environment.
With Prism Central, you can manage multiple petabytes in a single HDFS namespace while
maintaining separate failure domains. As the reliance on Hadoop grows inside the enterprise,
Prism allows you to manage reliability and cluster health from a central vantage point.


Figure 7: Prism

3.6. Hadoop the Nutanix Way


The Nutanix system operates and scales HDP in conjunction with other hosted services,
providing a single scalable platform for all deployments. Existing sources and platforms can send
data to the Hortonworks platform on Nutanix over the network. The figure below shows a high-
level view of the HDP on Nutanix solution.


Figure 8: HDP on Nutanix Conceptual Architecture

The Nutanix modular scale-out approach enables customers to select any initial deployment size
and grow in granular increments. This design removes the hurdle of a large
up-front infrastructure purchase that a customer may need many months or years to grow into,
ensuring a faster time to value for an HDP implementation.

Why Virtualize Hadoop Nodes on the Enterprise Cloud Platform?


• Make Hadoop an app: With one-click upgrades, Prism’s HTML 5 user interface makes
managing infrastructure easy. Manage golden images for Hadoop across multiple Nutanix
clusters using integrated data protection. Easily address problematic firmware upgrades and
save time.


Note: Whenever possible, incorporate all best practices (including those listed in the
Appendix) into the base image. However, do not install any Hortonworks software
into this golden image.

• Acropolis high availability and automated Security Technical Implementation Guides (STIGs)
keep your data available and secure.
• Data locality: Nutanix is the only virtualization vendor that uses data locality. Data locality is a
core architectural principle for both Hadoop and Nutanix. Nutanix support makes working with
current and newer Apache projects very easy, with consistent performance as you scale.
• Scale out with workload demands: The Nutanix cluster can scale out seamlessly to suit
project-specific Hortonworks workload requirements. On a Nutanix cluster, adding and
removing nodes are one-click operations, with the cluster automatically rebalancing compute
and storage across available nodes.
• DevOps: Big data scientists demand performance, reliability, and a flexible scale model.
IT operations rely on virtualization to tame server sprawl, increase utilization, encapsulate
workloads, manage capacity growth, and alleviate disruptive outages caused by hardware
downtime. By virtualizing Hadoop, data scientists and IT operations mutually achieve all these
objectives, while preserving autonomy and independence for their respective responsibilities.
• Batch scheduling and stacked workloads: Allow all workloads and applications, such as
Hadoop, virtual desktops, and servers, to coexist. Schedule jobs to run during off-peak hours
to take advantage of idle night and weekend hours that would otherwise go to waste. Nutanix
also allows you to bypass the flash tier for sequential workloads, eliminating the time it takes
to rewarm the cache for mixed workloads.
• New Hadoop economics: Bare-metal implementations are expensive and can spiral out of
control. The downtime and underutilized CPU that are consequences of physical servers’
workloads can jeopardize project viability. Virtualizing Hadoop reduces complexity and
ensures success for sophisticated projects with a scale-out, grow-as-you-go model—a perfect
fit for big data projects.
• Unified data platform: Run multiple data processing platforms along with Hadoop YARN on a
single unified data platform, the Acropolis DSF.
• Analytic high-density engine: With the Nutanix solution, you can start small and scale, letting
you accurately match supply with demand and minimize the up-front capital expenditure.
• Change management: Maintain control and separation between development, test, staging,
and production environments. Snapshots and fast clones can help you share production data
with nonproduction jobs, without requiring full copies and unnecessary data duplication.
• Data efficiency: The Nutanix solution is truly VM-centric for all compression policies.
Unlike traditional solutions that perform compression mainly at the LUN level, the Nutanix
solution provides all of these capabilities at the VM and file levels, greatly increasing


efficiency and simplicity. These capabilities ensure the highest possible compression and
decompression performance on a subblock level. While developers may or may not run jobs
with compression, IT operations can ensure that the system is storing cold data effectively.
You can also apply Nutanix erasure coding (EC-X) on top of compression savings.
• Automatic leveling and automatic archive: Nutanix spreads data evenly across the cluster,
ensuring that local drives don’t fill up and cause an outage when space is available elsewhere
on the network. Cold data can move from compute nodes to cold storage nodes, freeing up
room for hot data without consuming additional licenses.
• Time-sliced clusters: Like public cloud EC2 environments, Nutanix can provide a truly
converged cloud infrastructure, allowing you to run your Hadoop, server, and desktop
virtualization on a single cloud to get the efficiency and savings you require.


4. Solution Design
Using the Hortonworks Data Platform on Nutanix, you have the flexibility to start small, with
a single block, and scale up incrementally one node at a time. This approach provides the
best of both worlds—the ability to start small and grow to massive scale without any impact on
performance.

4.1. Hadoop 2.0 Storage Consumption


The diagram below shows a typical workflow when a client starts a MapReduce job. Here, we
focus on what happens when a DataNode writes to disk.


Figure 9: Hadoop 2.0 Workflow

Hadoop 2.0 Workflow


1. The client submits a job.
2. The ResourceManager responds with an ApplicationID.
3. The client submits the container launch context.
4. The ResourceManager starts the ApplicationMaster.
5. The ApplicationMaster gets its capabilities.

6. The ApplicationMaster requests and receives containers.
7. The ApplicationMaster sends container launch requests.
8. The containers write data.
In step eight of the figure above, Node 1 writes to the local disk and creates local copies. The
HDFS replication factor is set to 3 by default. This means that for every piece of data, the system
creates three copies of that data. The first copy is stored on the local node (A1), the second copy
is placed off-rack if possible, and the third copy is placed on a random node in the same rack as
the second copy. This placement strategy promotes data availability and allows multiple nodes
to use the copies of data, parallelizing their efforts to get fast results. When new jobs run, the
system selects NodeManagers where the data involved resides to reduce network congestion
and increase performance. Replication factor 3 with Hadoop therefore carries a 3x overhead.

4.2. Hadoop 2.0 on Nutanix: Storage Consumption


Hadoop and Nutanix are similar in the way they use data locality for performance and replication
factor for availability and throughput. This section gives an overview of the impact of changing
the replication factor on HDFS and the DSF.

Production Environments
In production environments, you should use a minimum of HDFS replication factor 2, so that
the NameNode has multiple options for placing containers and YARN can work with local data.
Replication factor 2 on HDFS also helps with job reliability. If a physical node or VM goes down
because of maintenance or an error, YARN jobs can quickly restart.

Table 3: Hadoop on Acropolis DSF Parameters for Production

Item Detail Rationale


HDFS Replication Factor 2 Hadoop job reliability and parallelization
Acropolis Replication Factor 2 Data availability


Figure 10: HDFS on Acropolis DSF

In the diagram above, once the local DataNode writes A1 to HDFS, the DSF creates B1 locally
and creates the second copy based on Nutanix availability domains. HDFS also writes A2 (a
second copy) so that the same process creates C1 and C2 (two additional copies on the DSF)
synchronously. Because the Hadoop DataNode has knowledge of A1 and A2, you can use both
copies for task parallelization.

Note: Using HDFS replication factor 2 may affect performance in certain situations
because it provides limited options for scheduling compute tasks to use the local
copies of data of which HDFS is aware. However, as the additional copies created
at higher HDFS replication factors simply offer more availability and are not used for
any consensus or voting majority, Nutanix recommends HDFS replication factor 2.
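Assuming an Ambari-managed stack, this recommendation maps to the standard dfs.replication property in hdfs-site.xml; a minimal sketch:

```xml
<!-- hdfs-site.xml: HDFS-level replication; the Acropolis DSF adds its own second copy -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```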

As an example, a 36-node HDP cluster formed with NX-8150 appliances would have
approximately 1,088 TB of raw capacity. Because the Acropolis DSF uses a replication factor of
2, the usable capacity on Nutanix is 544 TB before any capacity transformations. The table below
breaks down the HDFS usable storage on the DSF. Hadoop uses approximately 20 percent of
the usable storage for running the DataNode’s OS for the YARN and log directories. We must
divide the remaining usable storage by 2, because HDFS maintains its own replication factor.

Table 4: Usable Storage with HDFS and the Acropolis DSF

Total Hadoop Usable Storage = (20% * total Acropolis DSF replication factor 2 capacity) + (80%
* total Acropolis DSF replication factor 2 capacity / HDFS replication factor)
Total Hadoop Usable Storage = (0.2 * 544) + (0.8 * 544 / 2)
Total Hadoop Usable Storage = 108.8 TB + 217.6 TB
Total Hadoop Usable Storage = 326.4 TB
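The same math as a quick shell check, using the 1,088 TB raw figure from the 36-node example above:

```shell
# Usable-storage math from Table 4 (example 36-node NX-8150 cluster)
raw_tb=1088
dsf_usable=$(awk -v r="$raw_tb" 'BEGIN { print r / 2 }')                    # Nutanix RF2
hadoop_usable=$(awk -v u="$dsf_usable" 'BEGIN { print 0.2*u + 0.8*u/2 }')   # 20% OS/YARN/logs; rest / HDFS RF2
echo "DSF usable: ${dsf_usable} TB; Hadoop usable: ${hadoop_usable} TB"
```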


Nutanix Storage Container Layout


Storage setup on Nutanix is very simple. We recommend having two Nutanix storage containers:
one container for the OS, localized YARN temporary data, and logs, and the second container for
HDFS data.

Figure 11: Nutanix Containers Mounted as Datastores on ESXi Hosts

In specific use cases, such as when deploying Hadoop, Nutanix maintains the ability to change
the default priority of the tiered storage pool on a per-container basis. A Nutanix container—
not to be confused with the YARN unit of compute resources—is a thinly provisioned unit of
storage, created from the global cluster storage pool consisting of every local disk and SSD and
accessible to all nodes in the Nutanix cluster.
ncli container edit name=HDFS sequential-io-priority-order=DAS-SATA,SSD-SATA,SSD-PCIe

The command syntax above results in the following changes to the HDFS container; compare
these changes to the YARN container parameters. Because of the highly sequential nature of
the HDFS workload, the command changes the container’s sequential I/O priority ordering so
that the DAS-SATA tier takes precedence for the HDFS container. After this change, writes go
directly to the spinning disk and use all the available spindles at once.


HDFS Container Parameters


ncli container list
Id : 000554b4-30a9-b4fa-0000-000000007e02::20197
Uuid : 2e7dce39-f21f-49ec-a34e-9a39d7f0c307
Name : HDFS
Storage Pool Id : 000554b4-30a9-b4fa-0000-000000007e02::8
Storage Pool Uuid : 3b2d9011-7d1f-4711-895b-c614cf2589f1
Free Space (Logical) : 32.03 TiB (35,216,103,712,295 bytes)
Used Space (Logical) : 31.58 TiB (34,719,287,361,536 bytes)
Allowed Max Capacity : 63.61 TiB (69,935,391,073,831 bytes)
Used by other Containers : 8.35 TiB (9,182,409,310,208 bytes)
Explicit Reservation : 0 bytes
Thick Provisioned : 0 bytes
Replication Factor : 2
Oplog Replication Factor : 2
NFS Whitelist Inherited : true
Container NFS Whitelist :
VStore Name(s) : HDFS
Random I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Sequential I/O Pri Order : DAS-SATA, SSD-SATA, SSD-PCIe
Compression : off
Fingerprint On Write : off
On-Disk Dedup : none
Erasure Code : off


YARN Container Parameters


Id : 000554b4-30a9-b4fa-0000-000000007e02::20199
Uuid : 3fed1462-c98d-4cee-a800-bb32b078a7f6
Name : YARN
Storage Pool Id : 000554b4-30a9-b4fa-0000-000000007e02::8
Storage Pool Uuid : 3b2d9011-7d1f-4711-895b-c614cf2589f1
Free Space (Logical) : 32.03 TiB (35,216,103,712,295 bytes)
Used Space (Logical) : 8.35 TiB (9,182,409,310,208 bytes)
Allowed Max Capacity : 40.38 TiB (44,398,513,022,503 bytes)
Used by other Containers : 31.58 TiB (34,719,287,361,536 bytes)
Explicit Reservation : 0 bytes
Thick Provisioned : 305 GiB (327,491,256,320 bytes)
Replication Factor : 2
Oplog Replication Factor : 2
NFS Whitelist Inherited : true
Container NFS Whitelist :
VStore Name(s) : YARN
Random I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Sequential I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Compression : off
Fingerprint On Write : off
On-Disk Dedup : none
Erasure Code : off

4.3. Hadoop VM Operating System Changes


I/O Scheduler
We used the noop I/O scheduler, which allows the host or hypervisor to optimize the I/O
requests, for vDisks presented to both data and management node VMs. The noop scheduler
spends as few CPU cycles as possible in the guest for I/O scheduling. The host or hypervisor
receives an overview of the requests of all guests and has a separate strategy for handling I/O.
For full details, please refer to the Appendix.


Kernel Tuning
The primary changes to kernel tuning involved disabling Transparent Huge Pages (THP) and
anonymous page swapping (vm.swappiness), as well as reducing read-ahead on a per-vDisk
basis. Full kernel parameters are available in the Appendix.

SELinux
HDP requires that you disable SELinux on all guest VMs that are part of the Hadoop cluster.
Disable SELinux on RHEL or CentOS-based Linux distributions by editing /etc/selinux/config
and setting SELINUX=disabled. This change must be performed as root (or with proper sudo
access) and requires a reboot.

Network Time Protocol


Because all VMs that form part of the Hadoop cluster require the same time and date settings,
we highly recommend using the Network Time Protocol (NTP). The CentOS 7 VMs based on
systemd that we used in testing have chronyd as their default NTP daemon. Attempting to start
the ntpd service fails, as systemd causes ntpd to exit if chronyd is already running. The Appendix
provides full details on how to configure and enable the chrony service via systemd.

Hostname Resolution
We highly recommend using DNS for hostname resolution. For Hadoop to function properly,
forward and reverse lookups for all hosts in the HDP cluster must be the inverse of each other.
To ensure proper DNS resolution, you can perform this easy test on the hosts:
dig <hostname>
dig -x <ip_address_returned_from_hostname_lookup>

File System Creation and Mount Options


File system creation options:
1. Enable journal mode.
2. Reduce superuser block reservation from 5 percent to 1 percent for root (using the -m1
option).
3. Set the options sparse_super (to minimize number of superblock backups), dir_index (to use
b-tree indexes for directory trees), and extent (for extent-based allocations):
# mkfs -t ext4 -m1 -O sparse_super,dir_index,extent,has_journal /dev/sdb

Mount options:
1. Disable atime on the data disks (and from root fs).


2. In the /etc/fstab file, ensure that the appropriate file systems have the noatime mount option
specified:
LABEL=ROOT / ext4 noatime,data=ordered 0 0
UUID=99d77c99-1506-474c-99ce-282fe9e2dbcf /disks/sde ext4 noatime,data=ordered 0 0

Example:
/dev/sde on /disks/sde type ext4 (rw,noatime,data=ordered)
/dev/sdh on /disks/sdh type ext4 (rw,noatime,data=ordered)
/dev/sdm on /disks/sdm type ext4 (rw,noatime,data=ordered)
/dev/sdk on /disks/sdk type ext4 (rw,noatime,data=ordered)
/dev/sdg on /disks/sdg type ext4 (rw,noatime,data=ordered)
/dev/sdd on /disks/sdd type ext4 (rw,noatime,data=ordered)
/dev/sdb on /disks/sdb type ext4 (rw,noatime,data=ordered)
/dev/sdi on /disks/sdi type ext4 (rw,noatime,data=ordered)
/dev/sdc on /disks/sdc type ext4 (rw,noatime,data=ordered)
/dev/sdj on /disks/sdj type ext4 (rw,noatime,data=ordered)
/dev/sdf on /disks/sdf type ext4 (rw,noatime,data=ordered)
/dev/sdl on /disks/sdl type ext4 (rw,noatime,data=ordered)


Figure 12: Acropolis DSF Container Layout

Nutanix recommends having four vDisks on the YARN container for each DataNode VM, sized so
that together they hold about 20 percent of the node's usable capacity. If each node has 4 TB of
usable capacity, each of the four YARN vDisks would be about 200 GB. Each DataNode should
also have eight vDisks on the HDFS container.
We recommend sizing your HDFS vDisks based on the available usable capacity, enabling you
to make the best use of your Hadoop storage. This approach requires that you adjust the Hadoop
configuration, but the Ambari Cluster Manager makes Hadoop configuration adjustment simple.


YARN Configuration
We used the Hortonworks-provided yarn-utils.py utility script to configure the required YARN
parameters for the Hadoop cluster. The arguments passed to the script reflect the configuration of
an individual DataNode VM:
# python yarn-utils.py -c 12 -m 67 -d 12 -k False
Using cores=12 memory=67GB disks=12 hbase=False
Profile: cores=12 memory=67584MB reserved=1GB usableMem=66GB disks=12
Num Container=22
Container Ram=3072MB
Used Ram=66GB
Unused Ram=1GB
yarn.scheduler.minimum-allocation-mb=3072
yarn.scheduler.maximum-allocation-mb=67584
yarn.nodemanager.resource.memory-mb=67584
mapreduce.map.memory.mb=3072
mapreduce.map.java.opts=-Xmx2457m
mapreduce.reduce.memory.mb=3072
mapreduce.reduce.java.opts=-Xmx2457m
yarn.app.mapreduce.am.resource.mb=3072
yarn.app.mapreduce.am.command-opts=-Xmx2457m
mapreduce.task.io.sort.mb=1228
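The script's output can be approximated with the published Hortonworks heuristic: the container count is bounded by CPU (two per core), by spindles (roughly 1.8 per disk), and by usable memory divided by a minimum container size. The sketch below is illustrative, not a substitute for yarn-utils.py; the 2,048 MB minimum container size is our assumption for a node in the 64-72 GB RAM range:

```python
import math

def yarn_container_sizing(cores, memory_gb, disks,
                          reserved_gb=1, min_container_mb=2048):
    """Approximate the yarn-utils.py container math (illustrative only)."""
    usable_mb = (memory_gb - reserved_gb) * 1024
    containers = min(2 * cores,                      # CPU bound
                     math.ceil(1.8 * disks),         # disk bound
                     usable_mb // min_container_mb)  # memory bound
    container_mb = max(min_container_mb, usable_mb // containers)
    return containers, container_mb

# 12 cores, 67 GB RAM, 12 disks -> 22 containers of 3,072 MB, as above
print(yarn_container_sizing(12, 67, 12))
```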

Storage

Table 5: Disk and File System Layout for ESXi

Node / Role           Disk and File System Layout   Description

Management / Master   /dev/sdb on /disk1            4 disks are used for
                      /dev/sdc on /disk2            management functions
                      /dev/sdd on /disk3
                      /dev/sde on /disk4

YARN NodeManager      /dev/sdb on /disk1            4 disks are used for
                      /dev/sdc on /disk2            YARN functions
                      /dev/sdd on /disk3
                      /dev/sde on /disk4

DataNode              /dev/sdf on /disk5            8 disks make up HDFS
                      /dev/sdg on /disk6            and are used for
                      /dev/sdh on /disk7            additional services
                      /dev/sdi on /disk8            like Solr
                      /dev/sdj on /disk9
                      /dev/sdk on /disk10
                      /dev/sdl on /disk11
                      /dev/sdm on /disk12

4.4. Hadoop Cluster Design


Because the HDFS namespace can spread out over many physical nodes, it can also spread out
over multiple Nutanix clusters with a single namespace. Using HDFS rack awareness, you can
create multiple failure domains to build a system that meets the highest uptime requirements.
During testing, we used eight NX-8150-G5s with both ESXi and AHV. Because each NX-8150-G5
block contains only one node, we made each individual block a NodeGroup. The figure below
shows how we deployed the DataNodes, with three DataNodes per host where master services
were not running. The master services VMs ran on a separate NX-3460-G4. DRS antiaffinity rules
kept each master services VM on a separate node.
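When Ambari's rack mapping is not used, HDFS can learn its topology from a small script referenced by the net.topology.script.file.name property. The sketch below assumes one management subnet per rack; the subnet prefixes and rack names are placeholders for your own addressing plan:

```python
#!/usr/bin/env python
"""Minimal HDFS topology script: map DataNode addresses to rack paths."""
import sys

# Hypothetical subnet-to-rack mapping; replace with your own addressing plan.
RACKS = {"10.1.1.": "/rack1", "10.1.2.": "/rack2"}

def rack_for(host):
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return "/default-rack"

if __name__ == "__main__":
    # HDFS passes one or more IPs or hostnames and expects one rack per line
    print("\n".join(rack_for(h) for h in sys.argv[1:]))
```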


Figure 13: Proposed DataNode VM Layout


Figure 14: Easy to Configure Rack Awareness with Ambari

In general, keep individual Nutanix clusters to 32 nodes when the DSF replication factor is set
to two. This configuration fits nicely into one rack using the NX-6000 line. With DSF replication
factor 3 you can have 64 nodes in one cluster, but because replication factor 3 consumes more
SSD space, move to this setting only if you need to scale past 32 nodes in one cluster.
Using this 32-node design with replication factor 2, you can lose one full rack and sustain a node
outage in another rack while still keeping your cluster running. This failure scenario is depicted in
the figure in the Failure Domain Scenarios section below.


Failure Domain Scenarios

HDFS Replication Factor = 2 and DSF Replication Factor = 2

Figure 15: DFS Replication Factor 2 Rack Awareness Failure Domain

If you allow each rack to be a separate Nutanix cluster, HDFS can survive an outage in an entire
rack, and it can lose one additional node in every other available rack.

4.5. General Sizing


In the following section we cover the design decisions and rationale for Hadoop deployments
on the Enterprise Cloud Platform. Sizing Hadoop across the different Apache projects is
something of a black art, which makes it a great fit for virtualization, because you can easily
adjust your VM settings.

Sizing Compute
If you have an existing physical deployment, you can use the formula below. Typically, you
don't want to oversubscribe your CPU; however, if your system is not busy 24/7, oversubscribing
simply means that jobs run longer than expected. It's best to find out how quickly the business
needs jobs to finish.

Table 6: Applied Hyperthreading Bonus

Use Case                    Hyperthreading Bonus   Rationale

Existing Physical Install   1.3                    Account for hyperthreading;
                                                   account for underutilization
                                                   of physical installs

Amazon Web Services         2                      Amazon VMs count hyperthreaded
                                                   cores as full cores when
                                                   running VMs

I/O-Bound Workloads         2                      Less CPU is needed (indexing,
                                                   grouping, data importing and
                                                   exporting, data transformation)

Table 7: Available vCPUs for Hadoop VMs per Node

Available vCPUs for Hadoop VMs per node =
  (node total core count * hyperthreading bonus) - CVM CPU core count
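Applied to the worker nodes used in this document (dual 12-core CPUs and a 12-vCPU CVM), the formula works out as follows. The function is an illustrative sketch; the 1.3 bonus is the existing-physical-install figure from Table 6:

```python
def available_hadoop_vcpus(total_cores, ht_bonus, cvm_vcpus):
    # (node total core count * hyperthreading bonus) - CVM CPU core count
    return int(total_cores * ht_bonus) - cvm_vcpus

# 24 physical cores, 1.3 bonus, 12-vCPU CVM -> 19 vCPUs left for Hadoop VMs
print(available_hadoop_vcpus(24, 1.3, 12))
```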

NameNode Compute
The NameNode doesn’t require a lot of resources. Four vCPUs are a great starting place for
most Hortonworks clusters. As the Hortonworks cluster grows, you can always assign more
resources later within both ESXi and AHV.


Table 8: General NameNode CPU Sizing

Number of DataNodes   NameNode vCPUs

< 50                  4
> 50                  6

DataNode Compute
You can divide the DataNode role based on how many VMs you want to run on each node. The
CVM has access to eight vCPUs but under load typically uses only about half of them. If you
plan on running multiple VMs, size each VM to fit within a physical CPU socket: for example,
with 10-core CPUs, make sure a single VM doesn't exceed 10 vCPUs.

Sizing Memory

NameNode Memory
NameNode sizing depends on how many data blocks you are storing. One million blocks equal 1
GB of NameNode memory. Because the NameNode is virtualized, you can easily adjust as your
environment changes.
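The one-million-blocks-per-gigabyte rule can be turned into a quick estimate. The helper below is illustrative and assumes files average a full block (128 MB); real namespaces also spend heap on file and directory objects, so treat the result as a floor:

```python
def namenode_heap_gb(hdfs_data_tb, block_size_mb=128):
    """Estimate NameNode heap from the 1 GB-per-million-blocks rule."""
    blocks = hdfs_data_tb * 1024 * 1024 / block_size_mb
    return blocks / 1_000_000

# ~128 TB of HDFS data at 128 MB blocks -> roughly 1 GB of heap
print(round(namenode_heap_gb(128), 2))
```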

Table 9: General NameNode Memory Sizing

Number of DataNodes   NameNode Memory (GB)

< 50                  40
> 50                  80

The other master nodes, which run services such as Zookeeper for high availability, Ambari, and
monitoring, do not take a lot of memory resources.

DataNode Memory
Typical sizing for the DataNode is 6 GB to 8 GB per vCPU assigned. More RAM is always helpful
and can avoid spilling to disk when running MapReduce jobs. If you are running Spark on the
same DataNodes, you might want to size for 12 GB to 16 GB per core.
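As a quick planning aid, the per-vCPU guidance above translates to a memory range per DataNode VM. The function is an illustrative sketch; the ranges come straight from the text:

```python
def datanode_memory_range_gb(vcpus, spark_colocated=False):
    """Return (low, high) DataNode VM memory per the 6-8 GB
    (or 12-16 GB with Spark) per-vCPU guidance."""
    low, high = (12, 16) if spark_colocated else (6, 8)
    return vcpus * low, vcpus * high

# A 12-vCPU DataNode VM: 72-96 GB for MapReduce, 144-192 GB with Spark
print(datanode_memory_range_gb(12))
print(datanode_memory_range_gb(12, spark_colocated=True))
```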

CVM Memory
Most Hadoop workloads do not benefit from the local RAM cache; however, to help with
metadata, the minimum CVM memory is 40 GB of RAM.


Network
Nutanix uses a leaf-spine network architecture for true linear scaling. A leaf-spine architecture
consists of two network tiers: an L2 leaf and an L3 spine based on 40 GbE and nonblocking
switches. This architecture maintains consistent performance without any throughput reduction
because there is a static maximum of three hops between any two nodes in the network.
The figure below shows a design of a scale-out leaf-spine network architecture that provides
20 Gb active throughput from each node to its L2 leaf and 80 Gb active throughput from each
leaf-to-spine switch that is scalable from one Nutanix block to thousands without any impact on
available bandwidth.

Figure 16: Leaf-Spine Network Architecture


MTU Size
On each of the DataNode and management node VMs, we add the following setting to the
interface configuration file /etc/sysconfig/network-scripts/ifcfg-<interface> to implement large
(jumbo) frames:

MTU=9000

4.6. ESXi Networking

Figure 17: ESXi Networking Using 10 GbE Physical Adapters

Similarly, all of the portgroups that attach to VMs need to use jumbo frames. The configuration is
as follows:
# esxcfg-vswitch -m 9000 <vswitch-name>


Figure 18: MTU Settings for Jumbo Frames on ESXi Standard vSwitch

Note: You must adjust the MTU size on both the switch side and the host side. The
specific commands and details for adjustments on the switch side vary depending on
the switch vendor.

4.7. AHV Networking


AHV uses Open vSwitch (OVS) to connect the CVM, the hypervisor, and guest VMs to each
other and to the physical network on each node. OVS is an open source software switch
implemented in the Linux kernel and designed to work in a multiserver virtualization
environment. By default, OVS behaves like a layer-2 learning switch that maintains a MAC
address table. The hypervisor host and VMs connect to virtual ports on the switch.


Figure 19: Configuration for AHV Networking

To take advantage of the bandwidth provided by multiple upstream switch links, you can use
the balance-slb bond mode. (The default mode for AHV is active-backup.) The balance-slb bond
mode in OVS takes advantage of all links in a bond and uses measured traffic load to rebalance
VM traffic from highly used to less used interfaces. When the configurable bond-rebalance interval
expires, OVS uses the measured load for each interface and the load for each source MAC hash
to spread traffic evenly among links in the bond. Traffic from some source MAC hashes may
move to a less active link to more evenly balance bond member utilization.
Configure the balance-slb algorithm for each bond on all AHV nodes in the Nutanix cluster with
the following commands:
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up bond_mode=balance-slb"
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up other_config:bond-rebalance-interval=30000"

Repeat this configuration on all CVMs in the cluster.


Verify the proper bond mode on each CVM with the following commands:
nutanix@CVM$ ssh root@192.168.5.1 "ovs-appctl bond/show br0-up"
---- br0-up ----
bond_mode: balance-slb
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 29108 ms
lacp_status: off
slave eth2: enabled
may_enable: true
hash 120: 138065 kB load
hash 182: 20 kB load
slave eth3: enabled
active slave
may_enable: true
hash 27: 0 kB load
hash 31: 20 kB load
hash 104: 1802 kB load
hash 206: 20 kB load

Run the following command to enable jumbo frames on the interfaces:


nutanix@cvm$ manage_ovs --interfaces <interfaces> --mtu 9000 update_uplinks

Be sure to plan carefully before enabling jumbo frames. The entire network infrastructure,
Nutanix cluster, and components that connect to Nutanix must all be configured properly to
avoid packet fragmentation. Configuration details for increasing the network MTU to enable
jumbo frames on AHV hosts can be found in KB 3529 and KB 1807 for CVMs. Work with Nutanix
support before enabling jumbo frames.


5. Validation and Benchmarking


We completed the solution and testing presented in this document with HDP 2.6.2 deployed on
ESXi and AHV with AOS 5.5.

5.1. Environment Overview


We set up the worker nodes as a separate, single Nutanix cluster, colocating three DataNodes
per physical server. The three DataNodes are three individual VMs configured as described in
the Test Environment Configuration section below. We configured an additional four-node
NX-3060 Nutanix cluster to host the majority of the management services. Using antiaffinity
rules, each master services VM ran on a separate node. All Nutanix nodes connected to an
Arista 7050S top-of-rack switch via 10 GbE.

Figure 20: Management Cluster Hosting HDP Master Services

The VMs on the management cluster ran the following key HDP components:
• MN9: ZooKeeper
• MN8: ZooKeeper, App Timeline Server
• MN7: ZooKeeper
• MN6: ResourceManager
• MN5: NameNode
• MN4: Secondary NameNode
Various other ancillary components, including those dealing with metrics collection, were spread
across all other nodes forming the management cluster. Below, we describe the configuration of
each VM on the management cluster.


Figure 21: Data Cluster Hosting Three DataNodes per Hypervisor Host

Each physical hypervisor node hosts three worker node VMs, with an additional hypervisor node
left unoccupied for failover and resiliency. The HDP component layout across the VMs was:
• DN1–21: DataNode, NodeManager

5.2. ESXi and AHV Test Environment Configuration


Hardware
Storage and compute:
• Nutanix NX-8150
• 480 GB SSD
Network:
• Arista 7050Q (L3 spine) / 7050S (L2 leaf) Series switches

Hadoop Server Configuration


First master node VM:
• OS: CentOS 7.2 x64
• CPU and memory: 24 vCPU, 100 GB
• Disk:
# 1x 150 GB (OS)
# 4x 250 GB (Data)
Second master node VM:
• OS: CentOS 7.2 x64
• CPU and memory: 24 vCPU, 100 GB
• Disk:
# 1x 60 GB (OS)
# 4x 250 GB (Data)
Third master node VM:
• OS: CentOS 7.2 x64
• CPU and memory: 24 vCPU, 100 GB
• Disk:
# 1x 60 GB (OS)
# 4x 250 GB (Data)


ESXi worker node VMs:


• OS: CentOS 7.2 x64
• CPU and memory: 12 vCPU, 67 GB (If using Impala, a minimum of 128 GB RAM is
recommended.)
• Disk:
# 1x 150 GB (OS)
# 4x 102 GB (YARN)
# 8x 205 GB (HDFS)
Hortonworks Data Platform:
• Version: 2.6.2
Acropolis Operating System (AOS):
• Version: 5.5+
• Management cluster
# CVM: 8 vCPU, 32 GB RAM
• Worker cluster
# CVM: 12 vCPU, 40 GB RAM
ESXi:
• Version: 3620759+
AHV:
• Version: 20170830.58


6. Solution Application
In this section we consider real-world scenarios and outline sizing metrics and components. The
scenarios below assume a typical user workload; results can vary based on utilization and
workload mix.

6.1. Scenario: Large Hadoop Cluster with Rack Awareness

Table 10: Detailed Component Breakdown

Components                         Infrastructure

# of Nutanix nodes          40     # of Hadoop nodes                40
# of Nutanix blocks         37     # of Nutanix clusters            2
# of RU (Nutanix)           74     # of datastores (2 per cluster)  4
# of 10 GbE ports           72     # of VMs                         108
# of 100/1000 ports (IPMI)  40
# of L2 leaf switches       4
# of L3 spine switches      2

The solution below has over 544 TB of usable capacity on Nutanix storage. Every Nutanix cluster
has a YARN container and an HDFS container configured. Recall that we have changed the
tiering priority for the HDFS container to write to the DAS-SATA layer in the first instance. Prism
provides insight into all of the Nutanix clusters.
Master node services run on a separate smaller NX-3460 cluster. The master services VMs
are very important but don’t have the same performance requirements as the DataNodes. This
difference in performance requirements allows you to isolate them, so they are not impacted by
the heavy sequential workloads.
You can use Nutanix per-VM replication to manage a golden Hadoop image for the DataNodes.
The golden image is configured using the good practices detailed throughout this document.
When you need to deploy a new DataNode, you can use Nutanix data protection workflows to
clone it from this master image. Such agile workflows allow for speedy deployments, and you
can enhance them further, either by adding automation through the Nutanix API or with Nutanix
Calm.

Figure 22: 544 TB Usable Nutanix Storage


7. Conclusion
The Nutanix Enterprise Cloud Platform with either AHV or ESXi gives customers an easy way
to deploy and manage the Hortonworks Data Platform without additional costs. Density for
HDP deployments is driven primarily by CPU requirements and not by any I/O bottleneck. In
testing, the platform was easily able to handle the IOPS and throughput requirements for Hadoop
instances. We determined sizing for the pods after carefully considering performance as well as
accounting for the additional resources needed for N + 1 failover capabilities.
The HDP on Nutanix solution provides a single high-density platform for Hadoop deployments,
VM hosting, and application delivery. This modular, pod-based approach means that you can
easily scale your deployments. Our testing proves that HDFS can run natively on both AHV and
ESXi, reducing the overhead associated with traditional Hadoop deployments.


Appendix

Configuration
In this section, we detail a minimum configuration for an HDP deployment on Nutanix.

Storage and Compute

Table 11: NX-3460-G5 Configuration

SKU                    Quantity  Description

NX-3460-G5-11130       1         Nutanix Hardware Platform
                                 —NX-3160-G5, 1 node
                                 Nutanix Software
                                 —Foundation: Hypervisor Agnostic Installer
                                 —Controller VM
                                 —Prism Starter Management
                                 —Acropolis Starter License Entitlement

C-CPU-2640v4           8         Intel Xeon Processor E5-2640 v4, 2.4 GHz,
                                 10-core Broadwell, 25M Cache
C-MEM-32GBDDR4-2400    32        32 GB DDR4 Memory Module
C-HDD-2TB-2.5          16        2 TB 2.5" HDD
C-SSD-1200GB-2.5-C     8         1.2 TB 2.5" SSD
C-NIC-10G-2-SI         4         10 GbE Dual SFP+ Network Adapter


Table 12: NX-8150 Configuration

SKU                    Quantity  Description

NX-8150-G5-11130       12        Nutanix Hardware Platform
                                 —NX-8150-G5, 1 node
                                 Nutanix Software
                                 —Foundation: Hypervisor Agnostic Installer
                                 —Controller VM
                                 —Prism Starter Management
                                 —Acropolis Starter License Entitlement

C-CPU-2680v3           16        Intel Xeon Processor E5-2667 v4, 2.5 GHz,
                                 12-core Broadwell, 20M Cache
C-MEM-32GBDDR4-2400    64        32 GB DDR4 Memory Module
C-HDD-2TB-2.5          160       2 TB 2.5" HDD
C-SSD-1200GB-2.5-C     32        1.2 TB 2.5" SSD
C-NIC-10G-2-SI         8         10 GbE Dual SFP+ Network Adapter

Software for All Solutions


Nutanix: AOS 5.5+
VMware: ESXi, Build 3620759
Nutanix: AHV, 20170830.58

Table 13: Recommended SKUs

SKU              Quantity  Description

CNS-INST-1-NC    4         Nutanix cluster deployment, per-node basis

EDU-C-ADMOLPLUS            CUSTOMER COURSE: Nutanix Platform Administration
                           course (Online Plus)
                           DELIVERY: Online Plus (self-paced online, plus
                           guided experience and hosted labs)
                           FOCUS: Platform Administration, prepares for NPP
                           certification
                           PRICED: per student


Disabling Transparent Huge Pages (THP) and Setting read_ahead_kb per vDisk

Add the code below to the /etc/rc.d/rc.local script on your Linux system to disable THP and set
the per-vDisk read-ahead to 512 KB. blockdev --setra takes its argument in 512-byte sectors, so
512 KB corresponds to the value 1,024 used below. On a CentOS 7 operating system controlled
via systemd, make the following changes.
Add the following to your rc.local script:
#disable THP at boot time
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
#set read_ahead to 512k (1024 sectors)
/usr/sbin/blockdev --setra 1024 /dev/sda
/usr/sbin/blockdev --setra 1024 /dev/sdb
/usr/sbin/blockdev --setra 1024 /dev/sdc
/usr/sbin/blockdev --setra 1024 /dev/sdd
/usr/sbin/blockdev --setra 1024 /dev/sde
/usr/sbin/blockdev --setra 1024 /dev/sdf
/usr/sbin/blockdev --setra 1024 /dev/sdg
/usr/sbin/blockdev --setra 1024 /dev/sdh
/usr/sbin/blockdev --setra 1024 /dev/sdi
/usr/sbin/blockdev --setra 1024 /dev/sdj
/usr/sbin/blockdev --setra 1024 /dev/sdk
/usr/sbin/blockdev --setra 1024 /dev/sdl
/usr/sbin/blockdev --setra 1024 /dev/sdm

Ensure that the script is executable:


chmod +x /etc/rc.d/rc.local


Configuring Chronyd (NTP) Service


Based on your geographic location or zone, add the necessary NTP server entries from
http://www.pool.ntp.org to the file /etc/chrony.conf. These entries must replace whatever other
entries currently exist in the file. For the North American zone, for example, add:
server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 2.us.pool.ntp.org iburst
server 3.us.pool.ntp.org iburst

Now start the chronyd service using the system interface:


# systemctl enable chronyd
# systemctl start chronyd

If necessary, allow NTP traffic ingress and egress access to the server:
# firewall-cmd --add-service=ntp --permanent
success
# firewall-cmd --reload
success


You can verify chronyd operation using the following commands. For further troubleshooting,
please refer to documentation available online.
# chronyc tracking
Reference ID : 208.75.89.4 (time.tritn.com)
Stratum : 3
Ref time (UTC) : Wed Aug 30 12:01:15 2017
System time : 0.000030019 seconds slow of NTP time
Last offset : -0.000030078 seconds
RMS offset : 0.000167859 seconds
Frequency : 4.386 ppm fast
Residual freq : -0.000 ppm
Skew : 0.019 ppm
Root delay : 0.021557 seconds
Root dispersion : 0.001798 seconds
Update interval : 1038.1 seconds
Leap status : Normal
# chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
============================================================================
^* time.tritn.com 2 10 377 25m -853us[ -883us] +/- 11ms
^- 104.131.53.252 2 10 377 866 +1581us[+1581us] +/- 71ms
^+ time-b.timefreq.bldrdoc.g 1 10 377 276 +1705us[+1705us] +/- 20ms
^- palpatine.steven-mcdonald 2 10 377 618 +1266us[+1266us] +/- 40ms


Kernel Settings
Add the following changes to /etc/sysctl.d/99-sysctl.conf on a systemd-based CentOS 7 Linux
distribution:
vm.swappiness=1
vm.dirty_background_ratio = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mtu_probing=1

For vSphere deployments, set the queue depth, read-ahead, and I/O scheduler per vDisk using
udev rules: create a file /etc/udev/rules.d/99-nutanix.rules and add the following entry for ESXi:
ACTION=="add|change", KERNEL=="sd*", ATTR{bdi/read_ahead_kb}="512", RUN+="/bin/sh -c '/bin/
echo 16 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"

For AHV deployments, we can use higher numbers for nr_requests, because AHV Turbo allows
us to enable multiqueue. To set queue depth and the I/O scheduler per vDisk using udev rules,
create a file /etc/udev/rules.d/99-nutanix.rules and add the following entries for AHV.
YARN disks:
ACTION=="add|change", KERNEL=="sd[a-g]", ATTR{bdi/read_ahead_kb}="4096", RUN+="/bin/sh -c '/
bin/echo 128 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"

HDFS disks:
ACTION=="add|change", KERNEL=="sd[h-m]", ATTR{bdi/read_ahead_kb}="4096", RUN+="/bin/sh -c '/
bin/echo 32 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"

Changes made using udev may be tested and implemented from the command line as follows:
/sbin/udevadm control --reload-rules
/sbin/udevadm trigger --type=devices --action=change

Tip: Be sure to verify that these changes persist after a system reboot.

To enable multiqueue for AHV, edit the /etc/default/grub file and append scsi_mod.use_blk_mq=y
dm_mod.use_blk_mq=y to the GRUB_CMDLINE_LINUX line:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y"

Run the grub2-mkconfig -o /boot/grub2/grub.cfg command:


[root@datanode ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot.


Glossary of Terms
• DataNode
Worker node in the cluster, to which HDFS data is written.
• HDFS
Hadoop Distributed File System.
• High Availability
Configuration that addresses availability issues in a cluster. In a standard configuration, the
NameNode is a single point of failure. Each cluster has one NameNode, and if that machine or
process becomes unavailable, the cluster as a whole is also unavailable until you either restart
the NameNode or bring it up on a new host.
High availability enables running two NameNodes in the same cluster: the active NameNode
and the standby NameNode. The standby NameNode allows a fast failover to a new
NameNode in case of machine crash or planned maintenance.
• NameNode
The metadata master of HDFS, essential for the integrity and proper functioning of the
distributed file system.
• NodeManager
The process that starts application processes and manages resources on the DataNodes.
• ResourceManager
The resource management component of YARN. This component initiates application startup
and controls scheduling on the DataNodes of the cluster (one instance per cluster).
• ZooKeeper
A centralized service for maintaining configuration information and naming, as well as for
providing distributed synchronization and group services.

About the Authors


Dwayne Lessner is a technical marketing engineer on the Product Marketing team at Nutanix. In
this role, Dwayne helps design, test, and build solutions on top of the Enterprise Cloud Platform.
Dwayne has worked in healthcare and oil and gas for over 10 years in various roles. A strong
background in server and desktop virtualization has given Dwayne the opportunity to work
with many different application frameworks and architectures. Dwayne has been a speaker at
BriForum, Nutanix, VMware, and various industry events and conferences.


Follow Dwayne on Twitter at @dlink7.


Ray Hassan is part of the Global Solutions Engineering team at Nutanix, where he focuses on
cloud native and emerging technologies, including containers, NoSQL databases, big data, and
search. He develops reference architectures, best practice guides, white papers, and more on
how to make next-generation applications integrate and perform with minimal friction on the
Nutanix platform. Before Nutanix, he spent over 15 years working on clustering and storage
technologies.
Follow Ray on Twitter @cannybag.

About Nutanix
Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that
power their business. The Nutanix Enterprise Cloud OS leverages web-scale engineering and
consumer-grade design to natively converge compute, virtualization, and storage into a resilient,
software-defined solution with rich machine intelligence. The result is predictable performance,
cloud-like infrastructure consumption, robust security, and seamless application mobility for a
broad range of enterprise applications. Learn more at www.nutanix.com or follow us on Twitter
@nutanix.


List of Figures
Figure 1: Nutanix Web-Scale Properties........................................................................... 7

Figure 2: Nutanix Architecture........................................................................................... 8

Figure 3: ILM and Compression........................................................................................9

Figure 4: Forming a 4/1 Strip with Replication Factor 2..................................................10

Figure 5: Hadoop 2.0.......................................................................................................11

Figure 6: HDP Is Enterprise Ready.................................................................................13

Figure 7: Prism................................................................................................................ 15

Figure 8: HDP on Nutanix Conceptual Arch................................................................... 16

Figure 9: Hadoop 2.0 Workflow.......................................................................................20

Figure 10: HDFS on Acropolis DSF................................................................................ 22

Figure 11: Nutanix Containers Mounted as Datastores on ESXi Hosts.......................... 23

Figure 12: Acropolis DSF Container Layout....................................................................28

Figure 13: Proposed DataNode VM Layout.................................................................... 31

Figure 14: Easy to Configure Rack Awareness with Ambari...........................................32

Figure 15: DFS Replication Factor 2 Rack Awareness Failure Domain.......................... 33

Figure 16: Leaf-Spine Network Architecture................................................................... 36

Figure 17: ESXi Networking Using 10 GbE Physical Adapters....................................... 37

Figure 18: MTU Settings for Jumbo Frames on ESXi Standard vSwitch........................ 38

Figure 19: Configuration for AHV Networking................................................................. 39

Figure 20: Management Cluster Hosting HDP Master Services..................................... 41

Figure 21: Data Cluster Hosting Three DataNodes per Hypervisor Host........................ 43

Figure 22: 544 TB Usable Nutanix Storage.................................................................... 47

List of Tables
Table 1: Solution Details................................................................................................... 5

Table 2: Document Version History.................................................................................. 6

Table 3: Hadoop on Acropolis DSF Parameters for Production......................................21

Table 4: Usable Storage with HDFS and the Acropolis DSF.......................................... 22

Table 5: Disk and File System Layout for ESXi..............................................................29

Table 6: Applied Hyperthreading Bonus..........................................................................34

Table 7: Available vCPU for Hadoop VMs per Node...................................................... 34

Table 8: General NameNode CPU Sizing.......................................................................35

Table 9: General NameNode Memory Sizing................................................................. 35

Table 10: Detailed Component Breakdown.....................................................................46

Table 11: NX-3460-G5 Configuration.............................................................................. 49

Table 12: NX-8150 Configuration.................................................................................... 50

Table 13: Recommended SKUs...................................................................................... 50
