
Taking Command of Big Data: Analytics and Storage Solutions for High Impact Business Insight

Ryan Peterson, Director, Solutions Architecture, Isilon Storage Division

Copyright 2013 EMC Corporation. All rights reserved.

Roadmap Information Disclaimer


EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, Roadmap Information). Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby. Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC NonDisclosure Agreement in place with your organization.


Agenda
Quick Review of Isilon Key Features
Quick Review of Hadoop
Lessons Learned: Common Misconceptions
Hadoop Technology Review
Hadoop Technology Challenges
Lessons Learned: Seeing Hadoop Differently
Case Study Example
Resources


The Unstructured Data Challenge


[Chart: Exabytes (axis 0 to 90) by year, 2009 to 2014]

File based: 61.8% CAGR

Block based: 23.7% CAGR

By 2013, 80% of all storage capacity sold will be for unstructured data
Source: Scale Out Storage in the Content Driven Enterprise: Unleashing the Value of Information Assets, IDC White Paper
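
To make the two growth rates concrete, here is a minimal sketch of the compound annual growth rate (CAGR) arithmetic. It assumes the rates apply across the five yearly steps on the chart axis (2009 to 2014) and uses a normalized starting value of 1.0, because the chart's absolute exabyte figures are not recoverable from the slide. The function names are illustrative only.

```python
def project(start: float, cagr: float, years: int) -> float:
    """Compound `start` forward by `years` at the annual growth rate `cagr`."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Recover the annual growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


FILE_CAGR = 0.618   # from the slide: file-based capacity
BLOCK_CAGR = 0.237  # from the slide: block-based capacity
YEARS = 5           # assumption: 2009 -> 2014, the span of the chart axis

# Growth multiples relative to a normalized 2009 value of 1.0.
file_multiple = project(1.0, FILE_CAGR, YEARS)
block_multiple = project(1.0, BLOCK_CAGR, YEARS)

print(f"File-based capacity grows about {file_multiple:.1f}x over {YEARS} years")
print(f"Block-based capacity grows about {block_multiple:.1f}x over {YEARS} years")
```

Under these assumptions, file-based capacity grows roughly elevenfold while block-based capacity roughly triples over the same period.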


EMC Isilon Scale-Out NAS


Single file system, single volume, global namespace for simplicity and ease of use
Scales to over 20 PB
Stripes data across all nodes for high resiliency and up to N+4 data protection
Robust data backup and disaster recovery options
Unmatched efficiency with > 80% storage utilization and automated storage tiering
World's fastest NAS with over 100 GB/s throughput, 1.6M SPECsfs ops
Integrated support for industry-standard protocols including NFS, SMB, HTTP, FTP, and HDFS for operational flexibility
Native HDFS and HDFS 2.0 support


Hadoop: Finding Your Gold Nugget of Data


Hadoop
Created 6+ years ago
Software platform designed to analyze massive amounts of unstructured data
Two core components:
Hadoop Distributed File System (HDFS) (storage)
MapReduce (compute)
Now a top-level Apache project backed by a large, open source development community
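
For readers new to the compute half of Hadoop, the MapReduce model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases using the classic word-count example. It is not the Hadoop API, and the function names are made up for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big storage"]))
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network. The sketch only shows the data flow.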

Hadoop Lessons Learned: Common Misperceptions

Hadoop is a complete solution
Hadoop is a share-nothing architecture
Hadoop is a mainstream technology
Hadoop is only for Data Scientists
Hadoop is only good with DAS
HDFS is a robust file system
Hadoop is an Engineering Exercise


Isilon HDFS interface


Isilon supports the HDFS interfaces for the NameNode and DataNode to host both metadata and data
Underlying file system is OneFS
As simple as pointing the HDFS clients to the DNS name of the Isilon cluster!
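
As a hedged illustration of "pointing the HDFS clients to the DNS name", the sketch below generates a `core-site.xml` fragment that sets the Hadoop client's default file system. The host name `isilon.example.com` is hypothetical, and port 8020 is assumed as the conventional NameNode RPC port. `fs.defaultFS` is the Hadoop 2 property name; Hadoop 1 clients use `fs.default.name` instead. Check the values for your own cluster before using them.

```python
import xml.etree.ElementTree as ET


def core_site_xml(cluster_dns_name: str, port: int = 8020) -> str:
    """Build a minimal core-site.xml that points HDFS clients at one DNS name."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical DNS name for the storage cluster.
xml_text = core_site_xml("isilon.example.com")
print(xml_text)
```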

Technology Review


Technology Review
NameNode
Secondary NameNode
Job Tracker
DataNode / Task Tracker


NameNode
Manages the file system namespace
Stores all metadata in RAM: file names, owners, groups, access info
Knows the blocks associated with each file
Manages block replication
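
The NameNode's responsibilities can be pictured as two in-memory maps: one from path to file metadata, and one from block ID to the DataNodes holding a copy. The toy class below is a sketch of that idea only. The class and method names are invented for illustration and do not mirror the real Hadoop implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileEntry:
    """Per-file metadata: owner, group, access info, and block list."""
    owner: str
    group: str
    mode: int
    block_ids: List[int] = field(default_factory=list)


class ToyNameNode:
    """Illustrative only: keeps the whole namespace in process memory."""

    def __init__(self, replication: int = 3) -> None:
        self.replication = replication
        self.namespace: Dict[str, FileEntry] = {}
        self.block_locations: Dict[int, List[str]] = {}

    def create(self, path: str, entry: FileEntry) -> None:
        self.namespace[path] = entry
        for block_id in entry.block_ids:
            self.block_locations.setdefault(block_id, [])

    def report_block(self, block_id: int, datanode: str) -> None:
        """Record that a DataNode holds a replica of a block."""
        holders = self.block_locations.setdefault(block_id, [])
        if datanode not in holders:
            holders.append(datanode)

    def locate(self, path: str) -> List[List[str]]:
        """Return, for each block of the file, the DataNodes holding it."""
        return [self.block_locations[b] for b in self.namespace[path].block_ids]

    def under_replicated(self) -> List[int]:
        """Blocks with fewer replicas than the target replication factor."""
        return sorted(b for b, holders in self.block_locations.items()
                      if len(holders) < self.replication)


nn = ToyNameNode(replication=3)
nn.create("/logs/mail.log", FileEntry("hadoop", "analytics", 0o640, [1, 2]))
for node in ("dn1", "dn2", "dn3"):
    nn.report_block(1, node)
nn.report_block(2, "dn1")
print(nn.locate("/logs/mail.log"))
print(nn.under_replicated())
```

Because both maps live only in one process's memory in this sketch, losing that process loses every block location, which is the single point of failure the deck goes on to discuss.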


Secondary NameNode
Manages edit log and check-pointing of NameNode metadata
Does NOT provide NameNode failover
Is not a backup or hot standby for the NameNode
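
Check-pointing can be sketched as replaying a log of namespace edits onto the last saved image and then starting a fresh log. The snippet below is a simplified model with made-up operation names (`create`, `delete`) and plain dictionaries. The real on-disk image and edit log formats are not shown here.

```python
from typing import Dict, List, Tuple

# One edit: (operation, path, metadata). Operation names are illustrative.
Edit = Tuple[str, str, dict]


def checkpoint(fsimage: Dict[str, dict], edit_log: List[Edit]) -> Dict[str, dict]:
    """Return a new image with every logged edit applied, in order."""
    image = dict(fsimage)  # leave the previous image untouched
    for op, path, meta in edit_log:
        if op == "create":
            image[path] = meta
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return image


fsimage = {"/a": {"owner": "hadoop"}}
edit_log: List[Edit] = [("create", "/b", {"owner": "etl"}), ("delete", "/a", {})]

new_image = checkpoint(fsimage, edit_log)
edit_log.clear()  # after a checkpoint the log can start over
print(new_image)
```

The merge produces a compact image but does nothing to keep the file system available while the NameNode is down, which is why check-pointing is not failover.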


Job Tracker
Manages all jobs submitted to the cluster
Tracks and reports the status of jobs and tasks
Provides job queuing functionality
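
The queuing and status-tracking role can be illustrated with a small first-in, first-out model. This is a sketch under the assumption of simple FIFO ordering. The class name and the status strings are invented for the example and are not taken from Hadoop.

```python
from collections import deque
from typing import Deque, Dict, Optional


class ToyJobTracker:
    """Illustrative FIFO job queue with per-job status reporting."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.status: Dict[str, str] = {}

    def submit(self, job_id: str) -> None:
        self.queue.append(job_id)
        self.status[job_id] = "QUEUED"

    def start_next(self) -> Optional[str]:
        """Start the oldest queued job, or return None if the queue is empty."""
        if not self.queue:
            return None
        job_id = self.queue.popleft()
        self.status[job_id] = "RUNNING"
        return job_id

    def finish(self, job_id: str, succeeded: bool = True) -> None:
        self.status[job_id] = "SUCCEEDED" if succeeded else "FAILED"


tracker = ToyJobTracker()
tracker.submit("job-1")
tracker.submit("job-2")
running = tracker.start_next()
tracker.finish(running)
print(tracker.status)
```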


DataNode / Task Tracker


Stores blocks of files on top of the native host OS file system (e.g. EXT3, ZFS)
Serves read/write requests from the clients
Performs block creation, deletion, and replication
Same block can be stored on multiple DataNodes for redundancy
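
The block and replica ideas above can be sketched as two small functions: one that cuts a file into fixed-size blocks, and one that assigns each block to several distinct DataNodes. The 64 MB block size reflects the default in Hadoop 1.x, and the round-robin placement is a deliberate simplification. Real HDFS placement also considers racks and node load.

```python
from typing import Dict, List

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, assumed Hadoop 1.x default


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block needed to hold the file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


MB = 1024 * 1024
blocks = split_into_blocks(150 * MB)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print([size // MB for size in blocks])
print(placement)
```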


Technology Challenges


Hadoop Technology Challenges


Traditional Hadoop NameNode Architecture and Data Resiliency
Data Protection and Version Control with Hadoop
Manual Import and Export of Data
Scalability of Traditional Hadoop Infrastructure
Protocol Support
Time to Results


Traditional NameNode Architecture


No automatic recovery of NameNode = downtime
Even with NameNode failover due out soon in Hadoop, manual recovery required

NameNode map provides location details of all stored information
When NameNode is lost or damaged, data location information no longer exists


Distributed (Clustered) NameNode When Using Isilon


Metadata is stored across systems the same way as standard file metadata
Built-in clustered redundancy across many nodes

[Diagram labels: NameNode; Clustered NameNode]

Clustering the NameNode on Isilon gives it the same failure protection level Isilon already provides


Snapshot/Version Control

Before
Traditional HDFS does not have remote (off-cluster) replication
No snapshotting of data
Loss of version control
Not designed for mission-critical use

After
Full SnapshotIQ™ integration identifies changes
Multi-threaded, multi-node scale-out replication
Improved RPO/RTO for business continuity
Geo-replicated Hadoop!


Traditional Share-Nothing Hadoop


[Diagram labels: Unstructured Data; Existing Primary Storage; Existing Virtualized Data Center; SHARE-NOTHING Hadoop Infrastructure]

How long would it take to copy all of your data to another storage platform?
Hadoop on a Stick (R=3) means 5 data copies ($$$$)
How would you maintain data consistency when a file changes on your primary storage?
Data has to copy to the Hadoop cluster before analysis can begin (Time to Results)
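
The cost side of replication comes down to simple arithmetic. The sketch below compares the raw capacity needed to hold a data set under a replication factor of 3 (R=3, as on the slide) with the raw capacity needed at the 80% storage utilization quoted earlier in the deck for Isilon. The 100 TB data set size is a hypothetical input, and the functions ignore temporary space, compression, and the primary-storage copies the slide also counts.

```python
def raw_capacity_replicated(usable_tb: float, replication: int = 3) -> float:
    """Raw TB needed when every block is stored `replication` times."""
    return usable_tb * replication


def raw_capacity_at_utilization(usable_tb: float, utilization: float = 0.80) -> float:
    """Raw TB needed when `utilization` of raw capacity holds user data."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


DATA_TB = 100.0  # hypothetical data set size

replicated = raw_capacity_replicated(DATA_TB)     # R=3 from the slide
parity = raw_capacity_at_utilization(DATA_TB)     # 80% from the Isilon slide

print(f"R=3 replication:  {replicated:.0f} TB raw for {DATA_TB:.0f} TB of data")
print(f"80% utilization:  {parity:.0f} TB raw for {DATA_TB:.0f} TB of data")
```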


Isilon Share-Everything Hadoop


[Diagram labels: Unstructured Data; Existing Primary Storage; Use Native HDFS Protocol; Existing Virtualized Data Center; New Hadoop Compute Nodes]

Start using Hadoop NOW with unused processing and RAM available in your VMware environment
No replication required (use your existing data)
Access to same data via NAS and HDFS protocols
Time to results extremely fast using already existing data with NO COPIES or wasted $$$$

Protocol Support
Before
HDFS is not natively visible to Windows, Unix, Linux, Apple, or any other file system
Big Data is only used for Big Data

After
Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems including Hadoop
Big Data is actual data!

[Diagram labels: Servers]
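
One way to picture multi-protocol access is that a single stored file has more than one address, depending on the protocol used to reach it. The sketch below maps one storage-side path to an HDFS URI and to an NFS client path. Every name in it is hypothetical: the storage path, the HDFS root directory, the NFS export, the mount point, the host name, and the port are assumptions made for illustration, not values taken from the deck.

```python
from pathlib import PurePosixPath


def to_hdfs_uri(storage_path: str, hdfs_root: str, host: str, port: int = 8020) -> str:
    """Address of the file as an HDFS client would see it."""
    relative = PurePosixPath(storage_path).relative_to(hdfs_root)
    return f"hdfs://{host}:{port}{PurePosixPath('/') / relative}"


def to_nfs_path(storage_path: str, export_root: str, mount_point: str) -> str:
    """Address of the same file on a client that mounted the NFS export."""
    relative = PurePosixPath(storage_path).relative_to(export_root)
    return str(PurePosixPath(mount_point) / relative)


# All values below are hypothetical examples.
STORAGE_PATH = "/ifs/data/logs/2013/mail.log"

hdfs_uri = to_hdfs_uri(STORAGE_PATH, hdfs_root="/ifs/data", host="isilon.example.com")
nfs_path = to_nfs_path(STORAGE_PATH, export_root="/ifs/data", mount_point="/mnt/isilon")

print(hdfs_uri)
print(nfs_path)
```

Both addresses refer to the same bytes, so a file written over NFS can be analyzed over HDFS without an intermediate copy.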


Time-to-Results
Have you ever copied 100TB from Primary Storage to a Hadoop system?
How long does it take to copy 100TB from one place to another over a 10 Gb/s link? >24 Hours

[Diagram comparing Hadoop on a Stick with In-Place Analysis. Labels: Existing Primary Storage, Data Center Network, Data Copy, Hadoop Processing Nodes, Analysis, Reading relevant data to analysis]
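
The ">24 Hours" figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes the link is 10 gigabits per second, that 1 TB is 10^12 bytes, and that a sustained copy achieves some fraction of line rate. The 90% efficiency value is an illustrative assumption, not a measured number.

```python
def copy_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_tb` terabytes over a `link_gbps` Gb/s link."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                         # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)   # effective bits per second
    return seconds / 3600.0


ideal = copy_hours(100, 10)            # perfect line rate
realistic = copy_hours(100, 10, 0.9)   # assumed 90% sustained efficiency

print(f"At line rate:      {ideal:.1f} hours")
print(f"At 90% efficiency: {realistic:.1f} hours")
```

Even at perfect line rate the copy takes roughly 22 hours, and any protocol overhead or contention pushes it past a full day before analysis can begin.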


Dependent Scaling
Traditional Hadoop HDFS
Storage to compute ratio is fixed
Scaling compute means scaling capacity
Difficult to provide QoS
Compute upgrade is a forklift

[Diagram labels: Required Hadoop Cluster Nodes; Storage; Compute; Required performance/capacity]


Independent Scaling

Isilon HDFS
Scale compute independent of storage
Achieve optimal performance balance even as workloads evolve
No data migrations, ever!
Add new performance as hardware evolves

[Diagram labels: Required Hadoop Cluster Nodes; Storage; Compute; Required performance/capacity]
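
The difference between dependent and independent scaling can be shown with a small sizing calculation. When storage and compute live in the same nodes, the node count is driven by whichever requirement is larger. When they scale separately, each is sized on its own. All of the numbers below (per-node capacity, cores per node, and the workload itself) are hypothetical and chosen only to make the arithmetic visible.

```python
import math
from typing import Tuple


def coupled_nodes(capacity_tb: float, cores: int,
                  tb_per_node: float, cores_per_node: int) -> int:
    """Nodes needed when each node supplies both storage and compute."""
    return max(math.ceil(capacity_tb / tb_per_node),
               math.ceil(cores / cores_per_node))


def decoupled_nodes(capacity_tb: float, cores: int,
                    tb_per_storage_node: float,
                    cores_per_compute_node: int) -> Tuple[int, int]:
    """(storage nodes, compute nodes) when the two are sized separately."""
    return (math.ceil(capacity_tb / tb_per_storage_node),
            math.ceil(cores / cores_per_compute_node))


# Hypothetical workload: storage-heavy, light on compute.
CAPACITY_TB, CORES = 480, 64
TB_PER_NODE, CORES_PER_NODE = 24, 16

together = coupled_nodes(CAPACITY_TB, CORES, TB_PER_NODE, CORES_PER_NODE)
storage, compute = decoupled_nodes(CAPACITY_TB, CORES, TB_PER_NODE, CORES_PER_NODE)

print(f"Coupled:   {together} nodes, {together * CORES_PER_NODE} cores purchased")
print(f"Decoupled: {storage} storage nodes + {compute} compute nodes, "
      f"{compute * CORES_PER_NODE} cores purchased")
```

In this made-up case the coupled design buys 320 cores to satisfy a 64-core requirement, purely because capacity dictates the node count.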


Hadoop Lessons Learned: See Hadoop Differently

Hadoop can be inexpensive
Hadoop can be easy to deploy
Hadoop can use my existing data
Hadoop NameNode data can be protected
Hadoop data can have uptime guarantees
HDFS is better as a protocol than as a file system
Isilon addresses many Hadoop challenges


Return Path Captures Competitive Advantage with Hadoop Analytics and EMC Isilon

Challenge
Data growing 25-50 terabytes per year
Limited performance and capacity to support intensive Hadoop analytics
Disparate systems lacked performance and capacity

Solution
X-series, SmartPools, SmartConnect, SmartQuotas, InsightIQ

Applications
Hadoop, internally developed email intelligence solutions

Results
Enables unconstrained access to email data for analysis
Reduces shared storage data center footprint by 30 percent
Improves availability and reliability for Hadoop analytics
Savings of $350,000 from lower power, cooling, and maintenance

"Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That's a significant business enabler, allowing Return Path to develop customer solutions much faster."
Diz Carter, VP Infrastructure Operations, Return Path


For More Information


EMC.com:
EMC Isilon Scale-Out NAS: http://www.emc.com/isilon
Scale-Out Storage Solutions for Hadoop: http://www.emc.com/big-data/scale-out-storage-hadoop.htm

Solution Brief: EMC Big Data Storage and Analytics Solution
White Paper: Hadoop on EMC Isilon Scale-Out NAS
Analyst Report: EMC's Enterprise Hadoop Solution, Enterprise Strategy Group, 2012
Email me: ryan.peterson@emc.com


Related Sessions
Session Name, with Date and Time

Isilon Scale-Out NAS Overview and Future Directions
Monday 5/6, 1-2pm; Wednesday 5/8, 8:30-9:30am

Protecting & Backing Up the Isilon Cluster at Enterprise Scale
Tuesday 5/7, 10-11am; Thursday 5/9, 8:30-9:30am

Get Better Insight into Your Isilon Cluster with Tools that Help You Manage Your Performance & Capacity
Tuesday 5/7, 10-11am; Thursday 5/9, 11:30am-12:30pm

Birds of a Feather

Online File Sharing & Collaboration: Opportunities and Challenges in Deploying with On-Premise Storage
Tuesday, 5/7, 1-2pm

Hadoop: Opportunities and Challenges in Deploying with an Enterprise Infrastructure
Wednesday, 5/8, 1-2pm

Stop by the EMC ISILON Booth #124 and Wednesday Keynote for a chance to win
Weigh your current Big Data at the EMC ISILON booth and get a t-shirt.
Join one of our theater presentations and receive a FREE drink ticket at the Captains Lab.

Discover the future of enterprise storage at Isilon's Keynote
Wednesday, May 8, 11:30 AM, Venetian Ballroom
Drawing immediately following for a 3D Printer (Makerbot)
