
Shahbaz Khan

Shahbaz Khan , MS - IT 5/19/2019 1


• Traditional structured data vs. unstructured data.
• Variety (multiple structures)
• Volume (terabytes to petabytes)
• Velocity (large amounts of data arriving in a short time), such as financial data for decision making. This is data in motion (Hadoop is for data at rest).
• Examples: pictures, emails, click events, messaging, open data, public posts, etc.



• Because data is changing so rapidly, traditional file systems and tools can no longer handle it, owing to its scale and complexity.
• Hadoop is a platform built to deal with this.



• In 2002 the Apache Nutch project (an open source web search engine) needed infrastructure that could deal with billions of web pages.
• Google published its GFS paper in 2003 and its MapReduce paper in 2004, and by the middle of 2005 Nutch was running on both MapReduce and NDFS (the Nutch Distributed File System, the forerunner of HDFS).
• In 2006 these two parts moved out of Nutch to become a Lucene subproject named Hadoop.



• Hadoop is a framework for storing data on large clusters of commodity hardware (everyday hardware that is easily available and affordable) and running applications against that data.
• Hadoop is the name that the son of Doug Cutting (co-creator of Hadoop) gave to his toy elephant.



• It contains two main components:
  • Distributed processing framework: MapReduce.
  • Distributed file system: HDFS.
• It is a master-slave architecture.
• Master nodes control the storage and processing systems, and slave nodes store and process the data.



• MapReduce processes a sequence of operations on distributed data sets. The data consists of key-value pairs, and the computation has two phases: the map phase and the reduce phase.
• Its daemons are the JobTracker (master) and the TaskTrackers (slaves).
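
The two phases can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step that Hadoop performs between the phases is simulated with a dictionary.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit one (word, 1) key-value pair per word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, standing in for what the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Run the map phase over every input line, then reduce each key group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big clusters", "data at rest"])
print(counts)
```

On a real cluster the map calls would run in parallel on the nodes that hold the input, and the reduce calls would run after the framework has grouped the intermediate pairs by key.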



• NameNode (master; holds the metadata)
• DataNode (slave node; stores and reads the data)
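
The split between metadata and data can be sketched as a toy model. Everything here is an assumption made for illustration: the classes, the helper functions `put` and `get`, the 4-character block size, and the round-robin placement are hypothetical and are not the HDFS API. Real HDFS also replicates each block across several DataNodes, which this sketch leaves out.

```python
class DataNode:
    """Slave: stores block contents and serves reads."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> data

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Master: holds only metadata, i.e. which blocks form a file and where they live."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.metadata = {}  # filename -> list of (block_id, DataNode)

    def allocate(self, filename, num_blocks):
        # Toy round-robin placement (an assumption, not HDFS's real policy).
        placement = [
            (f"{filename}#{i}", self.datanodes[i % len(self.datanodes)])
            for i in range(num_blocks)
        ]
        self.metadata[filename] = placement
        return placement

    def locate(self, filename):
        return self.metadata[filename]


def put(namenode, filename, data, block_size=4):
    """Client write: ask the NameNode where blocks go, then send data to DataNodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = namenode.allocate(filename, len(chunks))
    for (block_id, node), chunk in zip(placement, chunks):
        node.write(block_id, chunk)


def get(namenode, filename):
    """Client read: look up block locations, then read each block from its DataNode."""
    return "".join(node.read(block_id) for block_id, node in namenode.locate(filename))


nodes = [DataNode("dn1"), DataNode("dn2")]
nn = NameNode(nodes)
put(nn, "log.txt", "hello hadoop")
print(get(nn, "log.txt"))
```

The point of the design is that file contents never pass through the NameNode: it only answers "where are the blocks?", while the DataNodes do the storing and reading.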

