OpenStack Sahara Essentials
By Omar Khedher
()
About this ebook
- A fast paced guide to help you utilize the benefits of Sahara in OpenStack to meet the Big Data world of Hadoop.
- A step by step approach to simplify the complexity of Hadoop configuration, deployment and maintenance.
This book is intended for developers who want to become proficient with OpenStack Sahara. The book is best suited for readers who are familiar with databases, Hadoop, and Spark. Basic prior knowledge of OpenStack is expected. The readers should also be familiar with different Linux boxes and virtualization technology.
Related to OpenStack Sahara Essentials
Related ebooks
OpenStack Administration with Ansible 2 - Second Edition Rating: 0 out of 5 stars0 ratingsOpenStack Networking Essentials Rating: 0 out of 5 stars0 ratingsPractical OneOps Rating: 0 out of 5 stars0 ratingsIntroducing Azure Kubernetes Service: A Practical Guide to Container Orchestration Rating: 0 out of 5 stars0 ratingsCassandra Design Patterns - Second Edition Rating: 0 out of 5 stars0 ratingsOpenStack Object Storage (Swift) Essentials Rating: 0 out of 5 stars0 ratingsMastering OpenStack Rating: 1 out of 5 stars1/5OpenStack Orchestration Rating: 5 out of 5 stars5/5Docker Networking Cookbook Rating: 0 out of 5 stars0 ratingsHybrid Cloud Management with Red Hat CloudForms Rating: 0 out of 5 stars0 ratingsOpenStack Administration with Ansible Rating: 0 out of 5 stars0 ratingsLearning Windows Server Containers Rating: 0 out of 5 stars0 ratingsNginx Troubleshooting Rating: 0 out of 5 stars0 ratingsOpenStack Trove Essentials Rating: 0 out of 5 stars0 ratingsOpenStack Essentials - Second Edition Rating: 0 out of 5 stars0 ratingsTroubleshooting OpenStack Rating: 0 out of 5 stars0 ratingsVMware vSphere Design Essentials Rating: 0 out of 5 stars0 ratingsLearning CoreOS Rating: 0 out of 5 stars0 ratingsInstant VMware vCloud Starter Rating: 0 out of 5 stars0 ratingsApache Cassandra Essentials Rating: 4 out of 5 stars4/5Implementing Cloud Storage with OpenStack Swift Rating: 0 out of 5 stars0 ratingsInstant Apache ActiveMQ Messaging Application Development How-to Rating: 0 out of 5 stars0 ratingsMastering Ansible - Second Edition Rating: 0 out of 5 stars0 ratingsOpenStack Essentials Rating: 0 out of 5 stars0 ratingsExtending Docker Rating: 0 out of 5 stars0 ratingsMastering Cloud Development using Microsoft Azure Rating: 0 out of 5 stars0 ratingsLearning Apache Mahout Classification Rating: 0 out of 5 stars0 ratingsMicrosoft Azure IaaS Essentials Rating: 4 out of 5 stars4/5Instant CloudFlare Starter Rating: 0 out of 5 stars0 ratingsDemystifying the Azure Well-Architected Framework: Guiding Principles and Design Best Practices for Azure Workloads Rating: 0 out of 5 stars0 ratings
System Administration For You
Cybersecurity: The Beginner's Guide: A comprehensive guide to getting started in cybersecurity Rating: 5 out of 5 stars5/5Learn PowerShell in a Month of Lunches, Fourth Edition: Covers Windows, Linux, and macOS Rating: 0 out of 5 stars0 ratingsCompTIA A+ Complete Review Guide: Core 1 Exam 220-1101 and Core 2 Exam 220-1102 Rating: 5 out of 5 stars5/5Practical Data Analysis Rating: 4 out of 5 stars4/5Learn Windows PowerShell in a Month of Lunches Rating: 0 out of 5 stars0 ratingsNetworking for System Administrators: IT Mastery, #5 Rating: 5 out of 5 stars5/5Mastering Windows PowerShell Scripting Rating: 4 out of 5 stars4/5Linux: Learn in 24 Hours Rating: 5 out of 5 stars5/5PowerShell: A Comprehensive Guide to Windows PowerShell Rating: 4 out of 5 stars4/5Bash Command Line Pro Tips Rating: 5 out of 5 stars5/5Linux Command-Line Tips & Tricks Rating: 0 out of 5 stars0 ratingsGit Essentials Rating: 4 out of 5 stars4/5Learn PowerShell Scripting in a Month of Lunches Rating: 0 out of 5 stars0 ratingsLinux for Beginners: Linux Command Line, Linux Programming and Linux Operating System Rating: 4 out of 5 stars4/5The Complete Powershell Training for Beginners Rating: 0 out of 5 stars0 ratingsPowerShell in Depth Rating: 0 out of 5 stars0 ratingsMastering Linux Shell Scripting Rating: 4 out of 5 stars4/5UNIX Shell Programming Interview Questions You'll Most Likely Be Asked Rating: 0 out of 5 stars0 ratingsOperating Systems DeMYSTiFieD Rating: 0 out of 5 stars0 ratingsLinux Bible Rating: 0 out of 5 stars0 ratingsPowerShell: A Beginner's Guide to Windows PowerShell Rating: 4 out of 5 stars4/5Arduino: A Quick-Start Beginner's Guide Rating: 4 out of 5 stars4/5e-Discovery For Dummies Rating: 0 out of 5 stars0 ratingsLinux Commands By Example Rating: 5 out of 5 stars5/5ConfigMgr - An Administrator's Guide to Deploying Applications using PowerShell Rating: 5 out of 5 stars5/5
Reviews for OpenStack Sahara Essentials
0 ratings0 reviews
Book preview
OpenStack Sahara Essentials - Omar Khedher
Table of Contents
OpenStack Sahara Essentials
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. The Essence of Big Data in the Cloud
It is all about data
The dimensions of big data
The big challenge of big data
The revolution of big data
A key of big data success
Use case: Elastic MapReduce
OpenStack crossing big data
Sahara: bringing big data to the cloud
Sahara in OpenStack
The Sahara OpenStack mission
The Sahara's architecture
Summary
2. Integrating OpenStack Sahara
Preparing the test infrastructure environment
OpenStack test topology environment
OpenStack test networking layout
OpenStack test environment design
Installing OpenStack
Network requirements
System requirements
Running the RDO installation
Integrating Sahara
Installing and configuring OpenStack Sahara
Installing the Sahara user interface
Summary
3. Using OpenStack Sahara
Planning a Hadoop deployment
Assigning Hadoop nodes
Sahara provisioning plugins
Creating a Hadoop cluster
Preparing the image from Horizon
Preparing the image using CLI
Creating the Node Group Template
Creating the Node Group Template in Horizon
Creating a Node Group Template using CLI
Creating the Node Cluster Template
Creating the Node Cluster Template with Horizon
Creating the Node Cluster Template using CLI
Launching the Hadoop cluster
Launching the Hadoop cluster with Horizon
Launching the Hadoop cluster using the CLI
Summary
4. Executing Jobs with Sahara
Job glossary in Sahara
Job binaries in Sahara
Jobs in Sahara
Running jobs in Sahara
Executing jobs via Horizon
Executing jobs using the Sahara RESTful API
API authentication
Launching an EDP job
Registering a Spark image using REST API
Creating Spark node group templates
Creating a Spark cluster template
Launching the Spark cluster
Creating a job binary
Creating a Spark job template
Executing the Spark job
Extending the Spark job
Summary
5. Discovering Advanced Features with Sahara
Sahara plugins
Vanilla Apache Hadoop
Building an image for the Apache Vanilla plugin
Vanilla Apache requirements and limitations
Hortonworks Data Platform plugin
Building an image for the HDP plugin
HDP requirements and limitations
Cloudera Distribution Hadoop plugin
Building an image for the CDH plugin
CDH requirements and limitations
Apache Spark plugin
Building an image for the Spark plugin
Spark requirements and limitations
Affinity and anti-affinity
Anti-affinity in action
Boosting Elastic Data Processing performance
Defining the network
Increasing data reliability
Summary
6. Hadoop High Availability Using Sahara
HDP high-availability support
Minimum requirements for the HA Hadoop cluster in Sahara
HA Hadoop cluster templates
CDH high-availability support
Summary
7. Troubleshooting
Troubleshooting OpenStack
OpenStack debug tool
Troubleshooting SELinux
Troubleshooting identity
Troubleshooting networking
Troubleshooting data processing
Debugging Sahara
Logging Sahara
Troubleshooting missing services
Troubleshooting cluster creation
Troubleshooting user quota
Troubleshooting cluster scaling
Troubleshooting cluster access
Summary
Index
OpenStack Sahara Essentials
OpenStack Sahara Essentials
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2016
Production reference: 1200416
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-596-9
www.packtpub.com
Credits
Author
Omar Khedher
Reviewer
Ami Shmueli
Acquisition Editor
Nitin Dasan
Content Development Editor
Abhishek Jadhav
Technical Editor
Nirant Carvalho
Copy Editors
Madhusudan Uchil
Jonathan Todd
Project Coordinator
Judie Jose
Proofreader
Safis Editing
Indexer
Priya Sane
Graphics
Kirk D'Penha
Production Coordinator
Shantanu N. Zagade
Cover Work
Shantanu N. Zagade
About the Author
Omar Khedher is a systems and network engineer. He worked for a few years in cloud computing environments and was involved in several private cloud projects based on OpenStack. Leveraging his skills as a system administrator in virtualisation, storage, and networking, he works as cloud system engineer for a leading advertising technology company, Fyber, based in Berlin. Currently, together with several highly skilled professional teams in the market, they collaborate to build a high scalable infrastructure based on the cloud platform.
Omar is also the author of another OpenStack book, Mastering OpenStack, Packt Publishing. He has authored also few academic publications based on new researches for the cloud performance improvement.
I would like to dedicate this book to my lovely parents, my brothers, and my dear 'schat', who have supported me remotely throughout the writing of this book. Big thanks go out to my PhD supervisor Dr. Mohamed in KSA, to my professional friend Andre Van De Water , to Belgacem in Tunisia for their guidance and critique and to my colleagues at Fyber for sharing knowledge. I extend a special thanks to Kenji Shioda for providing resources in his cloud platform to make the book labs happen. I would like to thank all the reviewers of this book for their accurate notices and precious remarks. A thank you to Abhishek Jadhav for the continued and great work on this book, which has taken a good piece of work. And with no doubt, a great thanks to the immense work provided by the OpenStack community to deliver such an amazing project, Sahara, that empowers the OpenStack journey.
About the Reviewer
Ami Shmueli has 18 years of experience in IT, System, DevOps, Automation.
He fulfilled different roles over those years as Infrastructure manager, Technical product manager, DevOps leader, and other technical and managerial roles.
He has vast experience in cloud design and architecture for telco enterprises and financial companies.
Ami has a proven experience in both R&D and infrastructure domains. Currently, Ami is involved in building high-scale on-premise clouds based on OpenStack as well as leading several application modernization projects on top of AWS, Azure, and SoftLayer.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface
OpenStack, the ultimate cloud computing operating system, keeps growing and gaining more popularity around the globe. One of the main reasons of OpenStack's success is the collaboration of several big enterprises and companies worldwide. Within every new release, the OpenStack community brings a new incubated project to the cloud computing open source world. Lately, big data has also taken a very important role in the OpenStack journey. Within its broad definition of the complexity of data management and its value extraction, the big-data business faces several challenges that need to be tackled. With the growth of the concept of cloud paradigm in the last decade, the big-data world can also be offered as a service. Specifically, the OpenStack community has taken on such a challenge to turn it into a very unique opportunity: Big Data as a Service. The Sahara project makes provisioning a complete elastic Hadoop cluster a very seamless operation with no need for touching the underlying infrastructure. Running on OpenStack, Sahara becomes a very mature project that supports Hadoop and Spark, the open source in-memory computing framework. That becomes a very good deal to find a parallel world about Big Data and Data Processing in Sahara named Elastic Data Processing. Sahara, formerly known as Savanna, has become a very attractive project, mature and supporting several big data providers.
In this book, we will explore the main motivation of using Sahara and how it interacts with other services of OpenStack. The main motivation of using Sahara is the facilities exposed from a central dashboard to manage big-data infrastructure and simplify data-processing tasks. We will walk through the installation and integration of Sahara OpenStack, launch clusters, execute sample jobs, explore more functions, and troubleshoot some common errors. By the end of this book, you should not only understand how Sahara operates and functions within the OpenStack ecosystem but also realize its major use cases of cluster and workload management.
What this book covers
Chapter 1, The Essence of Big Data in the Cloud, introduces the motivation of using the cloud computing paradigm in big-data management. The chapter will focus on the need of a different way to resolve big-data analysis complexity by looking at the Sahara project and its internal architectural design.
Chapter 2, Integrating OpenStack Sahara, walks through all the necessary steps for installing a multi-node OpenStack environment and integrating Sahara, and it shows you how to run it successfully along with the existing OpenStack environment.
Chapter 3, Using OpenStack Sahara, describes the workflow of Hadoop cluster creation using Sahara. The chapter shows you how to speed up launching clusters using templates through Horizon and via the command line in OpenStack.
Chapter 4, Executing Jobs with Sahara, focuses on executing sample