old structured vs. un structured . Variety (Multiple structure) Volume (terabytes to petabytes) Velocity (short time and large amount of data) like financial data for decision making. Data in motion(hadoop is for rest) Example: Pictures, Emails, Clicked events, messaging, open data, public posts etc
Shahbaz Khan , MS - IT 5/19/2019 2
Due to rapid change in data now traditional file system and tools can not handle these data due to scale and complexity. Hadoop is platform that is available to deal with this.
Shahbaz Khan , MS - IT 5/19/2019 3
In 2002 apache nutch(open source web search engine) project was interested for infrastructure to deal the with billions of web pages. In 2004 Google published white paper of GFS and Mapreduce and in middle of 2005 nutch was using both Mapreduce and HDFS. In 2006 they become the part of Lucene Subproject named Hadoop.
Shahbaz Khan , MS - IT 5/19/2019 4
Hadoop is a framework for storing data on large clusters of commodity hardware(everyday hardware that is easy available and affordable) and running applications against that data. Hadoop was the name of that (co creator of hadoop) Doug cutting’s son gave to his elephant toy.
Shahbaz Khan , MS - IT 5/19/2019 5
It contains two main components Distributed processing framework mapreduce. Distributed file system HDFS. It is mater slave architecture. Master Nodes that controls the storage and processing system and slave nodes which store the data and process the data.
Shahbaz Khan , MS - IT 5/19/2019 6
Mapreduce involves the processing of a sequence of operations on distributed data sets. The data consists on key value pairs and components have two phases that is map phase and reduce phase. The job tracker and task tracker.
Shahbaz Khan , MS - IT 5/19/2019 7
Name Node(master, metadata) Data Node (slave node, stores and read)
HADOOP and PYTHON For BEGINNERS - 2 BOOKS in 1 - Learn Coding Fast! HADOOP and PYTHON Crash Course, A QuickStart Guide, Tutorial Book by Program Examples, in Easy Steps!