Agenda
Introduction to Big Data
Why Big Data?
Big Data Overview
Hadoop Overview
Why Hadoop?
Who can learn Hadoop?
[Infographic: big-data statistics; figures 2.9 million, 20 hours, 20 PetaBytes, 50 million, and 700 billion appear without recoverable labels]
Data sent & received by mobile users per day: 1.3 ExaBytes
Products ordered on Amazon per second: 73 items
*Source: http://www.ibm.com/
*Source: http://www.forbes.com/
Hadoop Overview
Hadoop enables batch processing of colossal data sets
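Hadoop's batch model is MapReduce: a map step emits key-value pairs and a reduce step aggregates them per key. Below is a minimal pure-Python sketch of that model using the classic word-count example; the real framework shards input across many machines, whereas here the "cluster" is a single process and the function names are illustrative, not Hadoop APIs.

```python
# Toy word count in the MapReduce style (single-process sketch).
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word, like a Hadoop mapper."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Shuffle + reduce step: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    docs = ["big data needs batch processing",
            "hadoop does batch processing of big data"]
    print(reduce_phase(map_phase(docs)))
```

In a real Hadoop job the map outputs are partitioned and shipped over the network to reducer nodes; the two-function structure, however, is exactly the programming model a job author writes against.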
Commodity Hardware
Commodity hardware means average, readily available computing resources.
It does not imply low quality, but affordability.
Hadoop clusters run on commodity servers.
Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.
These servers are not designed specifically for a distributed storage and processing framework, but they fit the purpose well.
Benefits of Hadoop
Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers operating in parallel.
Cost-effective: Hadoop's scale-out architecture stores all of a company's data for later use, offering computing and storage capabilities at a reasonable price.
*Source: http://www.datanami.com/
Why Hadoop
It provides insights into daily operations
Drives new product ideas
Used by companies for research and development and marketing
analysis
Image and text processing.
Analyzes huge amounts of data in comparatively little time.
Network monitoring
Log and/or click stream analysis of various kinds.
Hadoop Forecast
*Source: http://www.alliedmarketresearch.com/
Even if you have not worked with Java and Linux before, you can learn them in parallel with Hadoop.
*Source: http://www.the451group.com/
*Source: http://www.itproportal.com/
Hadoop Architecture
The two main components of Hadoop are:
Hadoop Distributed File System (HDFS) is the storage component that breaks files into blocks, then replicates and stores them across the cluster.
MapReduce is the processing component that runs computation in parallel on the nodes where the data is stored.
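The HDFS storage scheme described above can be sketched in a few lines: split a file into fixed-size blocks, then place each block on several DataNodes. This is a toy model, not the HDFS API; the tiny block size is illustrative (the classic HDFS default is 128 MB with a replication factor of 3), the DataNode names are made up, and the round-robin placement assumes there are at least as many DataNodes as the replication factor.

```python
# Toy model of HDFS storage: fixed-size blocks, replicated across DataNodes.
from itertools import cycle

BLOCK_SIZE = 8    # bytes here; stand-in for HDFS's 128 MB default
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` consecutive DataNodes, round-robin.
    Assumes len(datanodes) >= replication so each copy lands on a distinct node."""
    placement = {}
    node_cycle = cycle(datanodes)
    for idx in range(len(blocks)):
        placement[idx] = [next(node_cycle) for _ in range(replication)]
    return placement

if __name__ == "__main__":
    blocks = split_into_blocks(b"hello hadoop distributed file system")
    print(place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"]))
```

The real NameNode uses rack-aware placement (e.g., one replica local, two on a remote rack) rather than simple round-robin, but the block-and-replica bookkeeping is the same idea.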
*Source: http://www.cloudera.com/
Hadoop Ecosystem
Apache Hadoop Distributed File System offers storage of large
files across multiple machines.
Hadoop Ecosystem
Apache Flume is a service that aggregates unstructured data (such as log data) into HDFS.
Apache Sqoop is a system for transferring bulk data
between HDFS and relational databases.
Apache Oozie is a workflow scheduler system to manage Apache
Hadoop jobs.
Apache Zookeeper is a coordinator with tools to write correct
distributed applications.
Apache Avro is a framework for data modelling, serialization, and Remote Procedure Calls.
Q&A
THANK YOU