Agenda
Quick Review of Isilon Key Features
Quick Review of Hadoop
Lessons Learned
Common Misconceptions
Hadoop Technology Review
Hadoop Technology Challenges
Lessons Learned
Seeing Hadoop Differently
Case Study Example
Resources
[Chart: Exabytes by year, 2009 to 2014]
By 2013, 80% of all storage capacity sold will be for unstructured data
Source: Scale Out Storage in the Content Driven Enterprise: Unleashing the Value of Information Assets, IDC White Paper
Hadoop
Created 6+ years ago
Software platform designed to analyze massive amounts of unstructured data
Two core components:
Hadoop Distributed File System (HDFS) (storage)
MapReduce (compute)
Now a top-level Apache project backed by a large open source development community
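To make the storage/compute split concrete, here is a minimal sketch of the MapReduce programming model in plain Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and the input lines stand in for data that would normally be read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce steps run in parallel on many nodes; the sketch only shows the data flow between the phases.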
Copyright 2013 EMC Corporation. All rights reserved.
Common Misperceptions
Technology Review
NameNode
Secondary NameNode
Job Tracker
NameNode
Manages the file system namespace
Stores all metadata in RAM
Filenames, owners, groups, access info
Knows the blocks associated with each file
Manages block replication
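The NameNode responsibilities listed above can be pictured as two in-memory maps. The sketch below is a toy model, not Hadoop code: the class `ToyNameNode`, its methods, and the block and DataNode names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Per-file metadata held in RAM (illustrative fields only)."""
    owner: str
    group: str
    mode: int                                   # access info, e.g. 0o644
    block_ids: list = field(default_factory=list)


class ToyNameNode:
    def __init__(self, replication=3):
        self.replication = replication
        self.namespace = {}          # path -> FileEntry
        self.block_locations = {}    # block id -> set of DataNode names

    def add_file(self, path, owner, group, mode, block_ids):
        """Record a file and the blocks that make it up."""
        self.namespace[path] = FileEntry(owner, group, mode, list(block_ids))
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def report_block(self, block_id, datanode):
        """A DataNode reports that it holds a replica of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return (block id, sorted DataNode list) for each block of a file."""
        entry = self.namespace[path]
        return [(b, sorted(self.block_locations[b])) for b in entry.block_ids]

    def under_replicated(self):
        """Blocks with fewer replicas than the target replication factor."""
        return sorted(
            b for b, nodes in self.block_locations.items()
            if len(nodes) < self.replication
        )
```

Because both maps live only in memory in this model, losing them means losing the only record of where each file's blocks are stored.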
Secondary NameNode
Manages the edit log and checkpointing of NameNode metadata
Does NOT provide NameNode failover
Is not a backup or hot standby for the NameNode
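Checkpointing can be sketched as replaying the edit log on top of the last saved namespace image. The code below is a simplified illustration under assumed data shapes (a dict for the image, tuples for edits); the operation names `create` and `delete` and the functions `apply_edit` and `checkpoint` are hypothetical.

```python
def apply_edit(image, edit):
    """Apply one logged namespace change to an image (a dict of path -> blocks)."""
    op, path, *rest = edit
    if op == "create":
        image[path] = rest[0]
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image.

    Returns the new image and an empty edit log. The original image is
    left untouched, mirroring the idea that the merge happens off to the side.
    """
    new_image = dict(image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []
```

The merge shortens NameNode restart time by keeping the edit log small. It does not produce a second live NameNode, which is why it offers no failover.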
Job Tracker
Manages all jobs submitted to the cluster
Tracks and reports the status of jobs and tasks
Provides job queuing functionality
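The three Job Tracker duties above (accepting jobs, queuing them, and reporting status) can be sketched with a small in-memory model. This is an illustration only: `ToyJobTracker`, its state names, and the FIFO ordering are assumptions, not the real Hadoop scheduler.

```python
from collections import deque


class ToyJobTracker:
    def __init__(self):
        self.queue = deque()   # job ids waiting to run, oldest first
        self.status = {}       # job id -> state and task counters

    def submit(self, job_id, num_tasks):
        """Queue a job and start tracking it."""
        self.queue.append(job_id)
        self.status[job_id] = {
            "state": "QUEUED", "tasks_total": num_tasks, "tasks_done": 0,
        }

    def start_next(self):
        """Move the oldest queued job to RUNNING; return its id or None."""
        if not self.queue:
            return None
        job_id = self.queue.popleft()
        self.status[job_id]["state"] = "RUNNING"
        return job_id

    def task_finished(self, job_id):
        """Record one completed task; mark the job done when all tasks finish."""
        job = self.status[job_id]
        job["tasks_done"] += 1
        if job["tasks_done"] == job["tasks_total"]:
            job["state"] = "SUCCEEDED"

    def report(self, job_id):
        """Return a one-line status summary for a job."""
        job = self.status[job_id]
        return (f"{job_id}: {job['state']} "
                f"({job['tasks_done']}/{job['tasks_total']} tasks)")
```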
Technology Challenges
NameNode
NameNode provides location details of all stored information
When the NameNode map is lost or damaged, data location information no longer exists
NameNode
Clustering the NameNode on Isilon gives it the same failure protection level Isilon already provides
Snapshot/Version Control
Before
Traditional HDFS does not have replication
No snapshotting of data
Loss of version control
Not designed for mission-critical use
After
Full SnapshotIQ™ integration identifies changes
Multi-threaded, multi-node scale-out replication
Improved RPO/RTO for business continuity
Geo-replicated Hadoop!
How long would it take to copy all of your data to another storage platform?
Hadoop on a Stick (R=3) means 5 data copies ($$$$)
How would you maintain data consistency when a file changes on your primary storage?
Data has to copy to the Hadoop cluster before analysis can begin (Time to Results)
Start using Hadoop NOW with unused processing and RAM available in your VMware environment
No replication required (use your existing data)
Access to the same data via NAS and HDFS protocols
Time to results is extremely fast using already existing data with NO COPIES or wasted $$$$
Protocol Support
Before
HDFS is not visible to Windows, Unix, Linux, Apple, or any other file system natively
Big Data is only used for Big Data
After
Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems, including Hadoop
Big Data is actual data!
[Diagram: Servers]
Time-to-Results
Have you ever copied 100TB from Primary Storage to a Hadoop system?
How long does it take to copy 100TB from one place to another over a 10Gb link? >24 Hours
[Diagram: Existing Primary Storage, Data Copy, Hadoop on a Stick, Analysis]
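The ">24 Hours" figure for the 100TB copy can be checked with simple arithmetic. The sketch below assumes the link is 10 gigabits per second and that 1 TB is 10^12 bytes; the 80% link efficiency is a hypothetical value chosen only to show how protocol overhead pushes the result past a day.

```python
def copy_time_hours(data_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move data_bytes over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


TB = 10**12     # decimal terabyte, in bytes (assumption)
GBIT = 10**9    # gigabit, in bits

# Perfect line rate: about 22.2 hours.
ideal = copy_time_hours(100 * TB, 10 * GBIT)

# Hypothetical 80% usable throughput: about 27.8 hours.
realistic = copy_time_hours(100 * TB, 10 * GBIT, efficiency=0.8)
```

Even at full line rate the copy takes most of a day, so any real-world overhead puts it beyond 24 hours before analysis can start.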
Dependent Scaling
Required Hadoop Cluster Nodes
[Diagram: Storage, Compute]
Independent Scaling
Isilon HDFS
Scale compute independently of storage
Achieve optimal performance balance even as workloads evolve
No data migrations, ever!
Add new performance as hardware evolves
[Diagram: Storage, Compute]
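The difference between dependent and independent scaling can be illustrated with a rough sizing calculation. Everything in this sketch is hypothetical: the function names, the per-node capacities, and the workload numbers are made up for illustration, and the cost of the separate storage tier is not modeled.

```python
import math


def coupled_nodes(data_tb, replication, tb_per_node, cores_needed, cores_per_node):
    """Nodes required when every node must supply both storage and compute."""
    for_storage = math.ceil(data_tb * replication / tb_per_node)
    for_compute = math.ceil(cores_needed / cores_per_node)
    # The cluster must be big enough for whichever need is larger.
    return max(for_storage, for_compute)


def decoupled_compute_nodes(cores_needed, cores_per_node):
    """Compute nodes required when storage is served by a separate tier."""
    return math.ceil(cores_needed / cores_per_node)


# Hypothetical workload: 500 TB of data, replication factor 3,
# 24 TB and 16 cores per node, 160 cores needed for the analysis.
coupled = coupled_nodes(500, 3, 24, 160, 16)     # storage-bound: 63 nodes
decoupled = decoupled_compute_nodes(160, 16)     # compute only: 10 nodes
```

With these made-up numbers the coupled cluster is sized by storage, so most of its compute sits idle; sizing compute separately avoids that imbalance.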
Hadoop NameNode data can be protected
Hadoop data can have uptime guarantees
HDFS is better as a protocol than as a file system
Isilon addresses many Hadoop challenges
Return Path
"Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That's a significant business enabler, allowing Return Path to develop customer solutions much faster."
DIZ CARTER, VP Infrastructure Operations
Solution
X-series, SmartPools, SmartConnect, SmartQuotas, InsightIQ
Applications
Hadoop, internally developed email intelligence solutions
Results
Enables unconstrained access to email data for analysis
Reduces shared storage data center footprint by 30 percent
Improves availability and reliability for Hadoop analytics
Savings of $350,000 from lower power, cooling, and maintenance
Solution Brief: EMC Big Data Storage and Analytics Solution
White Paper: Hadoop on EMC Isilon Scale-Out NAS
Analyst Report: EMC's Enterprise Hadoop Solution, Enterprise Strategy Group, 2012
Email me: ryan.peterson@emc.com
Related Sessions
Session: Isilon Scale-Out NAS Overview and Future Directions
Dates: Monday 5/6, 1-2pm; Wednesday 5/8, 8:30-9:30am
Session: Protecting & Backing Up the Isilon Cluster at Enterprise Scale
Dates: Tuesday 5/7, 10-11am; Thursday 5/9, 8:30-9:30am
Session: Get Better Insight into Your Isilon Cluster with Tools that Help You Manage Your Performance & Capacity
Dates: Tuesday 5/7, 10-11am; Thursday 5/9, 11:30am-12:30pm
Birds of a Feather
Session: Online File Sharing & Collaboration: Opportunities and Challenges in Deploying with On-Premise Storage
Date: Tuesday, 5/7, 1-2pm
Session: Hadoop: Opportunities and Challenges in Deploying with an Enterprise Infrastructure
Date: Wednesday, 5/8, 1-2pm
Stop by the EMC ISILON Booth #124 and Wednesday Keynote for a chance to win
Weigh your current Big Data at the EMC ISILON booth and get a t-shirt. Join one of our theater presentations and receive a FREE drink ticket at the Captains Lab.
Discover the future of enterprise storage at Isilon's Keynote
Wednesday, May 8, 11:30 AM, Venetian Ballroom
Drawing immediately following for a 3D Printer (Makerbot)