Nutanix
Nutanix Reference Architecture
Copyright
Copyright 2018 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws.
Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other
marks and names mentioned herein may be trademarks of their respective companies.
Hortonworks on Nutanix
Contents
1. Executive Summary................................................................................ 5
3. Solution Overview................................................................................... 7
3.1. Invisible Infrastructure....................................................................................................7
3.2. Nutanix Architecture...................................................................................................... 8
3.3. Hadoop 2.0 (HDFS and YARN) Architecture.............................................................. 11
3.4. What Is the Hortonworks Data Platform?....................................................................12
3.5. Nutanix Prism Central................................................................................................. 14
3.6. Hadoop the Nutanix Way............................................................................................ 15
4. Solution Design..................................................................................... 19
4.1. Hadoop 2.0 Storage Consumption.............................................................................. 19
4.2. Hadoop 2.0 on Nutanix: Storage Consumption...........................................................21
4.3. Hadoop VM Operating System Changes.................................................................... 25
4.4. Hadoop Cluster Design............................................................................................... 30
4.5. General Sizing............................................................................................................. 33
4.6. ESXi Networking..........................................................................................................37
4.7. AHV Networking.......................................................................................................... 38
6. Solution Application..............................................................................46
6.1. Scenario: Large Hadoop Cluster with Rack Awareness..............................................46
7. Conclusion............................................................................................. 48
Appendix......................................................................................................................... 49
Configuration....................................................................................................................... 49
Disabling Transparent Huge Pages (THP) and Setting read_ahead_kb per vDisk............ 51
Configuring Chronyd (NTP) Service................................................................................... 52
Kernel Settings....................................................................................................................54
Glossary of Terms.............................................................................................................. 55
About the Authors............................................................................................................... 55
About Nutanix......................................................................................................................56
List of Figures................................................................................................................57
List of Tables................................................................................................................. 59
1. Executive Summary
This document makes recommendations for designing, optimizing, and scaling Hortonworks Data
Platform (HDP) deployments on Nutanix. It shows the scalability of the Enterprise Cloud Platform
and provides detailed performance and configuration information regarding the cluster’s scale-out
capabilities when supporting Hadoop deployments.
We have performed extensive testing to simulate real-world workloads and conditions for an HDP
Hadoop environment on both AHV—the native Nutanix hypervisor—and ESXi. We based the
sizing data and recommendations made in this document on multiple test iterations and thorough
technical validation. For validation, we used the Hortonworks-supplied Independent Hardware
Vendor (IHV) tests; this testing suite has allowed us to self-certify HDP on a Nutanix Cloud OS
cluster.
This reference architecture details the underlying Nutanix platform configuration we tested,
as well as our layout for the HDP software components, which follows current best practice
guidance and serves as the basis for ongoing work with the Hortonworks partner team.
Version Number | Published     | Notes
1.0            | October 2015  | Original publication.
1.1            | April 2016    | Updated platform overview.
1.2            | October 2017  | Updated platform overview.
2.0            | May 2018      | Updated to include both ESXi and AHV support.
2.1            | October 2018  | Added Solution Details table and updated Usable Storage with HDFS and the Acropolis DSF table.
3. Solution Overview
The Nutanix Enterprise Cloud Platform has an architecture very similar to Hadoop’s, allowing
data localization, high throughput, and the capacity to achieve great scale. Nutanix provides
all the benefits of virtualization without the pitfalls of shared storage, such as the complexity
of managing a separate data fabric and the introduction of bottlenecks and silos as you go to
scale your environment. Nutanix offers one-click management via the Prism UI. The experience
that comes from building our own distributed file system uniquely positions Nutanix to maintain,
remediate, and deliver insights into Hadoop’s infrastructure. As you scale from dozens to
thousands of nodes with Nutanix, you can enjoy enterprise quality with consumer-grade look and
feel.
The DSF understands the concept of a VM and provides advanced data management features.
It brings data closer to VMs by storing the data locally on the system, resulting in higher
performance at a lower cost. Nutanix platforms can horizontally scale from as few as three nodes
to a very large number, providing agility and flexibility as a customer’s infrastructure needs grow.
The Nutanix Capacity Optimization Engine (COE) transforms data to increase data efficiency on
disk, using compression as one of its key techniques. The DSF provides both inline and post-
process compression to suit the customer’s needs and the types of data involved.
Inline compression condenses sequential streams of data or large I/O sizes in memory before
writing them to disk, while post-process compression initially writes the data as usual (in an
uncompressed state), then uses the Nutanix MapReduce framework to compress the data
cluster-wide. When using inline compression with random I/O, the system writes data to the
oplog uncompressed, coalesces it, and then compresses it in memory before writing it to the
extent store. From the AOS 5.0 release onward, Nutanix has used LZ4 and LZ4HC for data
compression. Releases prior to AOS 5.0 use the Google Snappy compression library. Both
methods provide good compression ratios with minimal computational overhead and extremely
fast compression and decompression rates.
The following figure shows an example of how inline compression interacts with the DSF write I/O path.
Nutanix relies on a replication factor for data protection and availability. This method provides the
highest degree of availability, because it does not require reading from more than one storage
location or data recomputation on failure. However, this advantage comes at the cost of storage
resources, as it requires full copies.
To provide a balance between availability and a reduced storage requirement, the DSF can
encode data using the patented Nutanix version of erasure coding, called EC-X.
EC-X encodes a strip of data blocks on different nodes and determines parity, in a manner
similar to RAID (levels 4, 5, 6, and so on). In the event of a host or disk failure, the system can
use the parity to reconstruct any missing data blocks. In the DSF, the data block is an
extent group, and each data block must be on a different node and belong to a different vDisk.
You can configure the number of data and parity blocks in a strip based on how many failures
you need to tolerate. In most cases, we can think of the configuration as <number of data
blocks> / <number of parity blocks>.
For example, “replication factor 2-like” availability (N + 1) could consist of three or four data
blocks and one parity block in a strip (3/1 or 4/1). “Replication factor 3-like” availability (N + 2)
could consist of three or four data blocks and two parity blocks in a strip (3/2 or 4/2).
The calculation for expected overhead is <# parity blocks> / <# data blocks>.
For example, a 4/1 strip has a 25 percent overhead, or 1.25x, compared to the 2x of replication
factor 2.
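To illustrate the arithmetic, the following shell sketch (not a Nutanix tool, simply the formula above applied to a few strip shapes) prints the overhead and storage multiplier for each configuration:

# Illustrative only: overhead = parity / data for a <data>/<parity> strip
for strip in 3/1 4/1 3/2 4/2; do
  awk -v s="$strip" 'BEGIN {
    split(s, p, "/")
    printf "%s strip: %.0f%% overhead (%.2fx)\n", s, 100 * p[2] / p[1], (p[1] + p[2]) / p[1]
  }'
done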
EC-X is a post-process framework that doesn’t affect the traditional write I/O path. The encoding
uses the internal MapReduce framework for task distribution. Once the strip is formed, the
system removes the redundant copies of the data.
EC-X complements compression savings to decrease the amount of storage needed. Write-cold
data, which is what EC-X encodes, generally accounts for about 80 percent of the cluster's total
capacity; the other 20 percent is write-hot data that isn't a good candidate for encoding.
YARN
Yet Another Resource Negotiator (YARN) is a core part of Hadoop. YARN manages the compute
resources (memory and CPU) in a Hadoop cluster. A YARN application can request these
resources, and YARN makes them available according to its scheduler policy. In a Hadoop
cluster, YARN aims to drive utilization of all physical resources as high as possible while letting
every application perform at its maximum potential. YARN is made up of the following components:
• ResourceManager: A single entry point for clients to submit YARN applications. Responsible
for application management and scheduling.
• NodeManager: Runs on the DataNode worker nodes in a Hadoop cluster to enable job
scheduling.
• Container: Resource unit in YARN. Containers are sized by memory and CPU and are
dynamic.
• Application Master: Makes sure that the application gets the necessary resources from the
ResourceManager. Also directs cleanup when the job finishes or if there is an application
failure.
HDFS
A Hadoop Distributed File System (HDFS) instance consists of a cluster of machines, often
referred to as the Hadoop cluster or HDFS cluster. There are two main components of an HDFS
cluster:
• NameNode: The master HDFS node that manages the data (without actually storing it),
determining and maintaining how chunks of data are distributed across DataNodes.
• DataNode: A worker node that stores chunks of data and replicates the chunks across other
data nodes.
NameNode and DataNode are daemon processes running in the cluster. In the HDFS master/
worker node architecture, the NameNode is the master server that manages the file system
namespace and regulates clients’ access to files. It determines how chunks of data are
distributed across DataNodes, but data never resides on or passes through the NameNode.
As worker nodes for the NameNode, DataNodes store chunks of data and replicate the chunks
across other DataNodes. DataNodes constantly communicate their state with the NameNode as
well as with other DataNodes, through commands from the NameNode, to replicate data.
Enterprise Ready
HDP provides Apache Hadoop for the enterprise, developed completely in the open and
supported by the deepest technology expertise. HDP incorporates current community innovation
and is tested on the most mature Hadoop test suite and on thousands of nodes. Engineers with
the broadest and most complete knowledge of Apache Hadoop develop and support HDP.
Security
HDP is designed to meet the changing needs of big data processing within a single platform
and supports all big data scenarios, from batch, to interactive, to real-time and streaming. With
YARN, HDP offers a versatile data access layer at the core of enterprise Hadoop that can
incorporate new processing engines as they become ready for enterprise consumption. HDP
provides the comprehensive enterprise capabilities of security, governance, and operations for
enterprise Hadoop implementations.
In addition to the SQL builder, a Pig Latin Editor brings a modern browser-based IDE experience to
Pig. There is also a file browser for HDFS.
Data Management
The Apache Falcon web forms-based approach allows for rapid development of feeds and
processes. The Falcon UI also allows you to search and browse processes that have executed,
visualize lineage, and set up mirroring jobs to replicate files and databases between clusters
or to cloud storage such as Microsoft Azure Storage. As an additional benefit, centralized
authorization and policy management for security and audit information with Apache Atlas allows
companies to ensure data governance compliance.
Data Access
As organizations strive to store their data in a single repository efficiently and to interact with it
simultaneously in different ways, they need SQL, streaming, data science, batch, and in-memory
processing from a tool such as Spark. HDP uses YARN as its architectural center, allowing
Hadoop to connect easily to hundreds of data systems, including Spark, without having to write
code. HDP’s support for all Apache Hadoop projects makes it a great choice.
Figure 7: Prism
The Nutanix modular scale-out approach enables customers to select any initial deployment size
and grow in granular increments. This design removes the hurdle of a large up-front infrastructure
purchase that a customer may need many months or years to grow into, ensuring a faster time to
value for an HDP implementation.
Note: Whenever possible, incorporate all best practices (including those listed in the
Appendix) into the base image. However, do not install any Hortonworks software
into this golden image.
efficiency and simplicity. These capabilities ensure the highest possible compression and
decompression performance on a subblock level. While developers may or may not run jobs
with compression, IT operations can ensure that the system is storing cold data effectively.
You can also apply Nutanix erasure coding (EC-X) on top of compression savings.
• Automatic leveling and automatic archive: Nutanix spreads data evenly across the cluster,
ensuring that local drives don’t fill up and cause an outage when space is available elsewhere
on the network. Cold data can move from compute nodes to cold storage nodes, freeing up
room for hot data without consuming additional licenses.
• Time-sliced clusters: Like public cloud EC2 environments, Nutanix can provide a truly
converged cloud infrastructure, allowing you to run your Hadoop, server, and desktop
virtualization on a single cloud to get the efficiency and savings you require.
4. Solution Design
Using the Hortonworks Data Platform on Nutanix, you have the flexibility to start small, with
a single block, and scale up incrementally one node at a time. This approach provides the
best of both worlds—the ability to start small and grow to massive scale without any impact on
performance.
Production Environments
In production environments, you should use a minimum of HDFS replication factor 2, so that
HDFS stores each block on more than one DataNode, giving YARN multiple options for placing
containers that can work with local data. Replication factor 2 on HDFS also helps with job
reliability: if a physical node or VM goes down because of maintenance or an error, YARN jobs
can quickly restart.
In the diagram above, once the local DataNode writes A1 to HDFS, the DSF creates B1 locally
and creates the second copy based on Nutanix availability domains. HDFS also writes A2 (its
second copy), and the same process synchronously creates C1 and C2 (two additional copies on
the DSF). Because the Hadoop DataNode has knowledge of A1 and A2, you can use both copies
for task parallelization.
Note: Using HDFS replication factor 2 may affect performance in certain situations
because it provides limited options for scheduling compute tasks to use the local
copies of data of which HDFS is aware. However, as the additional copies created
at higher HDFS replication factors simply offer more availability and are not used for
any consensus or voting majority, Nutanix recommends HDFS replication factor 2.
As an example, a 36-node HDP cluster formed with NX-8150 appliances would have
approximately 1,088 TB of raw capacity. Because the Acropolis DSF uses a replication factor of
2, the usable capacity on Nutanix is 544 TB before any capacity transformations. The table below
breaks down the HDFS usable storage on the DSF. Hadoop uses approximately 20 percent of
the usable storage for the DataNode operating system and for the YARN and log directories. We
must divide the remaining usable storage by 2, because HDFS maintains its own replication factor.
Total Hadoop Usable Storage = (20% * total Acropolis DSF replication factor 2 capacity) + (80%
* total Acropolis DSF replication factor 2 capacity / HDFS replication factor)
Total Hadoop Usable Storage = (0.2 * 544) + (0.8 * 544 / 2)
Total Hadoop Usable Storage = 108.8 TB + 217.6 TB
Total Hadoop Usable Storage = 326.4 TB
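The same arithmetic can be scripted for other cluster sizes. The following shell sketch is illustrative only; the raw capacity and replication factors are the example values above, not values queried from a cluster:

# Illustrative only: Hadoop usable storage from raw capacity
RAW_TB=1088    # 36 x NX-8150 raw capacity in TB (example value)
DSF_RF=2       # Acropolis DSF replication factor
HDFS_RF=2      # HDFS replication factor
awk -v raw="$RAW_TB" -v dsf="$DSF_RF" -v hdfs="$HDFS_RF" 'BEGIN {
  usable = raw / dsf                             # DSF usable capacity in TB
  hadoop = 0.2 * usable + 0.8 * usable / hdfs    # OS/YARN/logs + HDFS data
  printf "DSF usable: %.1f TB  Hadoop usable: %.1f TB\n", usable, hadoop
}'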
In specific use cases, such as when deploying Hadoop, Nutanix maintains the ability to change
the default priority of the tiered storage pool on a per-container basis. A Nutanix container—
not to be confused with the YARN unit of compute resources—is a thinly provisioned unit of
storage, created from the global cluster storage pool consisting of every local disk and SSD and
accessible to all nodes in the Nutanix cluster.
ncli container edit name=HDFS sequential-io-priority-order=DAS-SATA,SSD-SATA,SSD-PCIe
The command syntax above results in the following changes to the HDFS container; compare
these changes to the YARN container parameters. Because of the highly sequential nature of
the HDFS workload, the command changes the container’s sequential I/O priority ordering so
that the DAS-SATA tier takes precedence for the HDFS container. After this change, writes go
directly to the spinning disk and use all the available spindles at once.
Kernel Tuning
The primary changes to kernel tuning involved disabling Transparent Huge Pages (THP) and
anonymous page swapping (vm.swappiness), as well as reducing read-ahead on a per-vDisk
basis. Full kernel parameters are available in the Appendix.
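For reference, a minimal runtime sketch of these changes on RHEL or CentOS 7 follows. The sysfs paths are the usual ones for these distributions, and the settings shown here do not persist across reboots; see the Appendix for the persistent settings we applied.

# Disable Transparent Huge Pages for the running system
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Discourage anonymous page swapping for the running system
sysctl -w vm.swappiness=1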
SELinux
HDP requires that you disable SELinux on all guest VMs that are part of the Hadoop cluster.
Disable SELinux on RHEL or CentOS-based Linux distributions by editing /etc/selinux/config
and setting SELINUX=disabled. This change must be performed as root (or with proper sudo
access) and requires a reboot.
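One possible way to apply the change, assuming a stock /etc/selinux/config file:

# Disable SELinux persistently (takes effect after the reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Optionally switch to permissive mode for the current session until you reboot
setenforce 0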
Hostname Resolution
We highly recommend using DNS for hostname resolution. For Hadoop to function properly,
forward and reverse lookups for all hosts in the HDP cluster must be the inverse of each other.
To ensure proper DNS resolution, you can perform this easy test on the hosts:
dig <hostname>
dig -x <ip_address_returned_from_hostname_lookup>
Mount options:
1. Disable atime on the data disks (and on the root file system).
2. In the /etc/fstab file, ensure that the appropriate file systems have the noatime mount option
specified:
LABEL=ROOT / ext4 noatime,data=ordered 0 0
UUID=99d77c99-1506-474c-99ce-282fe9e2dbcf /disks/sde ext4 noatime,data=ordered 0 0
Example:
/dev/sde on /disks/sde type ext4 (rw,noatime,data=ordered)
/dev/sdh on /disks/sdh type ext4 (rw,noatime,data=ordered)
/dev/sdm on /disks/sdm type ext4 (rw,noatime,data=ordered)
/dev/sdk on /disks/sdk type ext4 (rw,noatime,data=ordered)
/dev/sdg on /disks/sdg type ext4 (rw,noatime,data=ordered)
/dev/sdd on /disks/sdd type ext4 (rw,noatime,data=ordered)
/dev/sdb on /disks/sdb type ext4 (rw,noatime,data=ordered)
/dev/sdi on /disks/sdi type ext4 (rw,noatime,data=ordered)
/dev/sdc on /disks/sdc type ext4 (rw,noatime,data=ordered)
/dev/sdj on /disks/sdj type ext4 (rw,noatime,data=ordered)
/dev/sdf on /disks/sdf type ext4 (rw,noatime,data=ordered)
/dev/sdl on /disks/sdl type ext4 (rw,noatime,data=ordered)
Nutanix recommends having four vDisks on the YARN container for each DataNode VM. Try to
size the YARN vDisks to be 20 percent of the node’s usable capacity. If each node has 4 TB of
usable capacity, then each vDisk would be about 200 GB. Each DataNode should have eight
vDisks on the HDFS container.
We recommend sizing your HDFS vDisks based on the available usable capacity, enabling you
to make the best use of your Hadoop storage. This approach requires that you adjust the Hadoop
configuration, but the Ambari Cluster Manager makes Hadoop configuration adjustment simple.
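As a sizing sketch only (the 4 TB node capacity, and the assumption that the HDFS vDisks consume the remaining 80 percent of usable capacity, are illustrative; adjust both to your own nodes):

# Illustrative vDisk sizing: 4 YARN vDisks at 20% of usable, 8 HDFS vDisks
awk -v usable_gb=4096 'BEGIN {
  yarn_total = usable_gb * 0.20
  printf "YARN vDisks: 4 x %.0f GB\n", yarn_total / 4
  printf "HDFS vDisks: 8 x %.0f GB\n", (usable_gb - yarn_total) / 8
}'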
YARN Configuration
We used the following Hortonworks-provided utility scripts to configure the required YARN
parameters for the Hadoop cluster. Note that the arguments provided to the utility script are
based on the individual configuration of a DataNode VM:
# python yarn-utils.py -c 12 -m 67 -d 12 -k False
Using cores=12 memory=67GB disks=12 hbase=False
Profile: cores=12 memory=67584MB reserved=1GB usableMem=66GB disks=12
Num Container=22
Container Ram=3072MB
Used Ram=66GB
Unused Ram=1GB
yarn.scheduler.minimum-allocation-mb=3072
yarn.scheduler.maximum-allocation-mb=67584
yarn.nodemanager.resource.memory-mb=67584
mapreduce.map.memory.mb=3072
mapreduce.map.java.opts=-Xmx2457m
mapreduce.reduce.memory.mb=3072
mapreduce.reduce.java.opts=-Xmx2457m
yarn.app.mapreduce.am.resource.mb=3072
yarn.app.mapreduce.am.command-opts=-Xmx2457m
mapreduce.task.io.sort.mb=1228
Storage

Management / Master (4 disks are used for management functions):
/dev/sdb on /disk1
/dev/sdc on /disk2
/dev/sdd on /disk3
/dev/sde on /disk4

YARN NodeManager (4 disks are used for YARN functions):
/dev/sdb on /disk1
/dev/sdc on /disk2
/dev/sdd on /disk3
/dev/sde on /disk4

DataNode (8 disks make up HDFS and are used for additional services like Solr):
/dev/sdf on /disk5
/dev/sdg on /disk6
/dev/sdh on /disk7
/dev/sdi on /disk8
/dev/sdj on /disk9
/dev/sdk on /disk10
/dev/sdl on /disk11
/dev/sdm on /disk12
In general, keep individual Nutanix clusters to 32 nodes when setting the DSF replication factor
to two. This configuration fits nicely into one rack using the NX-6000 line. When using replication
factor 3 for the DSF you can have 64 nodes in one cluster, but as replication factor 3 uses more
SSD space, only move to this setting if you need to scale past 32 nodes in one cluster.
Using this 32-node design with replication factor 2, you can have one full rack go down and
have a node outage in another rack—and still keep your cluster running. This failure scenario is
depicted in the figure in the Failure Domain Scenarios section below.
If you allow each rack to be a separate Nutanix cluster, HDFS can survive an outage in an entire
rack, and it can lose one additional node in every other available rack.
Sizing a Hadoop cluster is often considered a black art, and this makes it a great fit for
virtualization, because you can easily adjust your VM settings.
Sizing Compute
If you have an existing physical deployment, you can use the formula below. Typically, you
don’t want to oversubscribe your CPU, but if your system is not busy 24/7, an underutilized
environment just has jobs running longer than expected. It’s best to find out how quickly the
business needs jobs to finish.
Table: Hyperthreading Bonus by Use Case and Rationale
NameNode Compute
The NameNode doesn’t require a lot of resources. Four vCPUs are a great starting place for
most Hortonworks clusters. As the Hortonworks cluster grows, you can always assign more
resources later within both ESXi and AHV.
DataNode Compute
DataNode compute can be divided based on how many VMs you want to run on each node. The
CVM has access to eight vCPUs, but under load it typically uses only half of the vCPUs assigned.
If you plan on running multiple VMs, size them to stay within the physical CPU: for example, if
you're using a 10-core CPU, make sure the VMs don't go over 10 vCPUs.
Sizing Memory
NameNode Memory
NameNode sizing depends on how many data blocks you are storing. One million blocks equal 1
GB of NameNode memory. Because the NameNode is virtualized, you can easily adjust as your
environment changes.
The other master nodes, which run services such as Zookeeper for high availability, Ambari, and
monitoring, do not take a lot of memory resources.
DataNode Memory
Typical sizing for the DataNode is 6 GB to 8 GB per vCPU assigned. More RAM is always helpful
and can avoid spilling to disk when running MapReduce jobs. If you are running Spark on the
same DataNodes, you might want to size for 12 GB to 16 GB per core.
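A back-of-the-envelope sketch of these memory guidelines follows; the block count and vCPU count are assumptions chosen purely for illustration:

# NameNode: ~1 GB of memory per million HDFS blocks
# DataNode: 6-8 GB per vCPU (12-16 GB per core if Spark shares the nodes)
awk -v blocks=50000000 -v vcpus=12 'BEGIN {
  printf "NameNode memory: ~%.0f GB\n", blocks / 1000000
  printf "DataNode memory: %d-%d GB\n", vcpus * 6, vcpus * 8
}'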
CVM Memory
Most Hadoop workloads do not benefit from the local RAM cache; however, to help with
metadata, the minimum CVM memory is 40 GB of RAM.
Network
Nutanix uses a leaf-spine network architecture for true linear scaling. A leaf-spine architecture
consists of two network tiers: an L2 leaf and an L3 spine based on 40 GbE and nonblocking
switches. This architecture maintains consistent performance without any throughput reduction
because there is a static maximum of three hops between any two nodes in the network.
The figure below shows a design for a scale-out leaf-spine network architecture that provides
20 Gb of active throughput from each node to its L2 leaf and 80 Gb of active throughput from
each leaf to the spine. This design scales from one Nutanix block to thousands without any
impact on available bandwidth.
MTU Size
On each of the DataNode and management node VMs, we add the following setting to the
interface configuration file /etc/sysconfig/network-scripts/ifcfg-<interface> to implement large or
jumbo frames:
…
MTU=9000
…
Similarly, all of the portgroups that attach to VMs need to use jumbo frames. The configuration is
as follows:
# esxcfg-vswitch -m 9000 <vswitch-name>
Figure 18: MTU Settings for Jumbo Frames on ESXi Standard vSwitch
Note: You must adjust the MTU size on both the switch side and the host side. The
specific commands and details for adjustments on the switch side vary depending on
the switch vendor.
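Once the VM, vSwitch, and physical switch MTUs all match, you can sanity-check the end-to-end path from a guest VM. This is a generic Linux check rather than a Nutanix-specific procedure; the target address is a placeholder:

# 8972 bytes = 9000-byte MTU minus the 20-byte IP and 8-byte ICMP headers;
# -M do sets the don't-fragment bit so an undersized hop fails immediately
ping -M do -s 8972 -c 3 <remote_datanode_ip>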
To take advantage of the bandwidth provided by multiple upstream switch links, you can use
the balance-slb bond mode. (The default mode for AHV is active-backup.) The balance-slb bond
mode in OVS takes advantage of all links in a bond and uses measured traffic load to rebalance
VM traffic from highly used to less used interfaces. When the configurable bond-rebalance interval
expires, OVS uses the measured load for each interface and the load for each source MAC hash
to spread traffic evenly among links in the bond. Traffic from some source MAC hashes may
move to a less active link to more evenly balance bond member utilization.
Configure the balance-slb algorithm for each bond on all AHV nodes in the Nutanix cluster with
the following commands:
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up bond_mode=balance-slb"
nutanix@CVM$ ssh root@192.168.5.1 "ovs-vsctl set port br0-up other_config:bond-rebalance-interval=30000"
Verify the proper bond mode on each CVM with the following commands:
nutanix@CVM$ ssh root@192.168.5.1 "ovs-appctl bond/show br0-up "
---- br0-up ---
bond_mode: balance-slb
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 29108 ms
lacp_status: off
slave eth2: enabled
may_enable: true
hash 120: 138065 kB load
hash 182: 20 kB load
slave eth3: enabled
active slave
may_enable: true
hash 27: 0 kB load
hash 31: 20 kB load
hash 104: 1802 kB load
hash 206: 20 kB load
Be sure to plan carefully before enabling jumbo frames. The entire network infrastructure,
Nutanix cluster, and components that connect to Nutanix must all be configured properly to
avoid packet fragmentation. Configuration details for increasing the network MTU to enable
jumbo frames on AHV hosts can be found in KB 3529 and KB 1807 for CVMs. Work with Nutanix
support before enabling jumbo frames.
The VMs on the management cluster ran the following key HDP components:
• MN9: ZooKeeper
• MN8: ZooKeeper, App Timeline Server
• MN7: ZooKeeper
• MN6: ResourceManager
• MN5: NameNode
• MN4: Secondary NameNode
Various other ancillary components, including those dealing with metrics collection, were spread
across all other nodes forming the management cluster. Below, we describe the configuration of
each VM on the management cluster.
Figure 21: Data Cluster Hosting Three DataNodes per Hypervisor Host
Each physical hypervisor node hosts three worker node VMs, with an additional hypervisor node
left unoccupied for failover and resiliency. The HDP component layout across the VMs was:
• DN1–21: DataNode, NodeManager
6. Solution Application
In this section we consider real-world scenarios and outline sizing metrics and components. The
applications below assume a knowledge user workload; however, results can vary based on
utilization and workload.
The solution below has over 544 TB of usable capacity on Nutanix storage. Every Nutanix cluster
has a YARN container and an HDFS container configured. Recall that we have changed the
tiering priority for the HDFS container to write to the DAS-SATA layer in the first instance. Prism
provides insight into all of the Nutanix clusters.
Master node services run on a separate smaller NX-3460 cluster. The master services VMs
are very important but don’t have the same performance requirements as the DataNodes. This
difference in performance requirements allows you to isolate them, so they are not impacted by
the heavy sequential workloads.
You can configure Nutanix per-VM replication to provide management over a golden Hadoop
image for the DataNodes. The golden image is configured using current good practice as
detailed throughout this document. When you need to deploy a new DataNode, you can use
Nutanix data protection workflows to clone it from such a master image. Such agile workflows
allow for speedy deployments, and you can further enhance them, either by using the Nutanix
API to add automation or with Nutanix Calm.
7. Conclusion
The Nutanix Enterprise Cloud Platform with either AHV or ESXi gives customers an easy way
to deploy and manage the Hortonworks Data Platform without additional costs. Density for
HDP deployments is driven primarily by CPU requirements and not by any I/O bottleneck. In
testing, the platform was easily able to handle the IOPS and throughput requirements for Hadoop
instances. We determined sizing for the pods after carefully considering performance as well as
accounting for the additional resources needed for N + 1 failover capabilities.
The HDP on Nutanix solution provides a single high-density platform for Hadoop deployments,
VM hosting, and application delivery. This modular, pod-based approach means that you can
easily scale your deployments. Our testing demonstrates that HDFS runs well on both AHV and
ESXi, reducing the overhead associated with traditional Hadoop deployments.
Appendix
Configuration
In this section, we detail a minimum configuration for an HDP deployment on Nutanix.
Configuring Chronyd (NTP) Service
If necessary, allow NTP traffic ingress and egress access to the server:
# firewall-cmd --add-service=ntp --permanent
success
# firewall-cmd --reload
success
You can verify chronyd operation using the following commands. For further troubleshooting,
please refer to documentation available online.
# chronyc tracking
Reference ID : 208.75.89.4 (time.tritn.com)
Stratum : 3
Ref time (UTC) : Wed Aug 30 12:01:15 2017
System time : 0.000030019 seconds slow of NTP time
Last offset : -0.000030078 seconds
RMS offset : 0.000167859 seconds
Frequency : 4.386 ppm fast
Residual freq : -0.000 ppm
Skew : 0.019 ppm
Root delay : 0.021557 seconds
Root dispersion : 0.001798 seconds
Update interval : 1038.1 seconds
Leap status : Normal
# chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
============================================================================
^* time.tritn.com 2 10 377 25m -853us[ -883us] +/- 11ms
^- 104.131.53.252 2 10 377 866 +1581us[+1581us] +/- 71ms
^+ time-b.timefreq.bldrdoc.g 1 10 377 276 +1705us[+1705us] +/- 20ms
^- palpatine.steven-mcdonald 2 10 377 618 +1266us[+1266us] +/- 40ms
Kernel Settings
Add the following changes to /etc/sysctl.d/99-sysctl.conf on a systemd-based CentOS 7 Linux
distribution:
vm.swappiness=1
vm.dirty_background_ratio = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mtu_probing=1
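After editing the file, you can load the new values without a reboot; this reapplies every file under /etc/sysctl.d/:

# Reload all sysctl configuration files, including 99-sysctl.conf
sysctl --system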
For vSphere deployments, set queue depth, read-ahead, and the I/O scheduler per vDisk using
udev rules. Create a file /etc/udev/rules.d/99-nutanix.rules and add the following entries for ESXi:
ACTION=="add|change", KERNEL=="sd*", ATTR{bdi/read_ahead_kb}="512", RUN+="/bin/sh -c '/bin/echo 16 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"
For AHV deployments, we can use higher numbers for nr_requests, because AHV Turbo allows
us to enable multiqueue. To set queue depth and the I/O scheduler per vDisk using udev rules,
create a file /etc/udev/rules.d/99-nutanix.rules and add the following entries for AHV.
YARN disks:
ACTION=="add|change", KERNEL=="sd[a-g]", ATTR{bdi/read_ahead_kb}="4096", RUN+="/bin/sh -c '/
bin/echo 128 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"
HDFS disks:
ACTION=="add|change", KERNEL=="sd[h-m]", ATTR{bdi/read_ahead_kb}="4096", RUN+="/bin/sh -c '/
bin/echo 32 > /sys%p/queue/nr_requests && /bin/echo noop > /sys%p/queue/scheduler'"
Changes made using udev may be tested and implemented from the command line as follows:
/sbin/udevadm control --reload-rules
/sbin/udevadm trigger --type=devices --action=change
Tip: Be sure to verify that these changes persist after a system reboot.
To enable multiqueue for AHV, edit the /etc/default/grub file and append scsi_mod.use_blk_mq=y
dm_mod.use_blk_mq=y to the GRUB_CMDLINE_LINUX line:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y"
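On CentOS 7 the new kernel command line typically takes effect only after the GRUB configuration is regenerated. A minimal sketch, assuming a BIOS-booted system with the default configuration path:

# Rebuild grub.cfg so the new kernel command line is used at the next boot
grub2-mkconfig -o /boot/grub2/grub.cfg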
Reboot.
Glossary of Terms
• DataNode
Worker node in the cluster, to which HDFS data is written.
• HDFS
Hadoop Distributed File System.
• High Availability
Configuration that addresses availability issues in a cluster. In a standard configuration, the
NameNode is a single point of failure. Each cluster has one NameNode, and if that machine or
process becomes unavailable, the cluster as a whole is also unavailable until you either restart
the NameNode or bring it up on a new host.
High availability enables running two NameNodes in the same cluster: the active NameNode
and the standby NameNode. The standby NameNode allows a fast failover to a new
NameNode in case of machine crash or planned maintenance.
• NameNode
The metadata master of HDFS, essential for the integrity and proper functioning of the
distributed file system.
• NodeManager
The process that starts application processes and manages resources on the DataNodes.
• ResourceManager
The resource management component of YARN. This component initiates application startup
and controls scheduling on the DataNodes of the cluster (one instance per cluster).
• ZooKeeper
A centralized service for maintaining configuration information and naming, as well as for
providing distributed synchronization and group services.
About Nutanix
Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that
power their business. The Nutanix Enterprise Cloud OS leverages web-scale engineering and
consumer-grade design to natively converge compute, virtualization, and storage into a resilient,
software-defined solution with rich machine intelligence. The result is predictable performance,
cloud-like infrastructure consumption, robust security, and seamless application mobility for a
broad range of enterprise applications. Learn more at www.nutanix.com or follow us on Twitter
@nutanix.
List of Figures
Figure 1: Nutanix Web-Scale Properties........................................................................... 7
Figure 7: Prism................................................................................................................ 15
Figure 18: MTU Settings for Jumbo Frames on ESXi Standard vSwitch........................ 38
Figure 21: Data Cluster Hosting Three DataNodes per Hypervisor Host........................ 43
List of Tables
Table 1: Solution Details................................................................................................... 5