
Big Data & Hadoop

Agenda
Introduction to Big Data
Why Big Data?
Big Data Overview
Hadoop Overview
Why Hadoop?
Who can learn Hadoop?

#Trending: Jobs for Hadoop and Java


Hadoop : Architecture & Ecosystem

Introduction to Big Data


Big Data: a term for collections of data sets with large and complex volumes of data.
Volumes are in Petabytes (1,024 TB) or Exabytes (1,024 PB) and will soon be in Zettabytes (1,024 EB).
Hence, the data are hard to interpret and process with existing, traditional data processing applications and tools.

Why Big Data


To manage huge volumes of data in a better way.
To benefit from the speed, capacity and scalability of cloud storage.
To gain potential insights through data analysis methods.
Companies can find new prospects and business opportunities.
Unlike other methods, with Big Data, business users can visualize the data.

Big Data Overview


Big Data includes:
Traditional structured databases from inventories, orders and customer information.
Unstructured data from the web, social networking sites, etc.
The problem with these massive data sets is that they cannot be analyzed with standard tools and procedures.
Processing these data appropriately can help an organization gain useful insights into its business prospects.

Unstructured data Growth

No. of emails sent per second: 2.9 Million
Video uploaded to YouTube per minute: 20 hours
Data processed by Google per day: 20 Petabytes
Tweets per day: 50 Million
Minutes spent on Facebook per month: 700 Billion
Data sent & received by mobile users per day: 1.3 Exabytes
Products ordered on Amazon per second: 73 items

*Source: http://ibm.com/

Unstructured data Growth

*Source: http://forbes.com/

Hadoop Overview
Hadoop allows batch processing of colossal data sets (Petabytes & Exabytes) as a series of parallel processes.
A Hadoop cluster comprises a number of server "nodes".
Nodes store and process data in a parallel and distributed fashion.
It is a parallelized, distributed storage & processing framework that can operate on commodity servers.

Commodity Hardware
Commodity hardware means an average amount of computing resources.
It does not imply low quality, but affordability.
Hadoop clusters run on commodity servers.
Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.
These servers are not designed specifically for a distributed storage and processing framework, but they are made to fit the purpose.

Benefits of Hadoop
Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
Fault Tolerant: HDFS replicates each file a specified number of times and automatically re-replicates data blocks that were stored on failed nodes (see the sketch below).
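To make the fault-tolerance point concrete, here is a minimal sketch (not part of the original slides) of how the replication factor can be controlled with the standard Hadoop FileSystem Java API; the file path /data/orders.csv and the replication values are illustrative assumptions.

```java
// Illustrative sketch: setting HDFS replication via the Hadoop FileSystem API.
// The path and replication factors below are assumptions for the example.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");      // default number of copies per block

        FileSystem fs = FileSystem.get(conf);  // connect to the configured HDFS
        // Keep 5 copies of an existing (hypothetical) file; if a node fails,
        // the NameNode schedules re-replication of the blocks it held.
        fs.setReplication(new Path("/data/orders.csv"), (short) 5);
        fs.close();
    }
}
```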

Benefits of Hadoop
Cost-Effective: Hadoop is a scale-out architecture that can store all of a company's data for later use, offering computing and storage capabilities at a reasonable price.
Speed: Hadoop's storage method is based on a distributed file system, resulting in much faster data processing.
Flexible: Hadoop easily accesses new data sources and different types of data to generate insights.

*Source: http://datanami.com/

Why Hadoop
It provides insights into daily operations.
Drives new product ideas.
Used by companies for research and development and for marketing analysis.
Image and text processing.
Analyzes huge amounts of data in comparatively little time.
Network monitoring.
Log and/or clickstream analysis of various kinds.

Hadoop Forecast

*Source: http://alliedmarketresearch.com/

Who can Learn Hadoop


Anyone with basic knowledge of Java & Linux.
Even if you have not been introduced to Java & Linux before, you can learn them in parallel with Hadoop.
Hadoop project roles include Architect, Developer, Tester and Linux/Network/Hardware Administrator.
Some of these roles need knowledge of Java and some don't.

Who can Learn Hadoop


SQL knowledge will help in learning HiveQL, the query language of Hive in the Hadoop ecosystem.
Knowledge of Linux will be helpful in understanding Hadoop command-line parameters.
But even without any prerequisite knowledge of Java & Linux, you can learn Hadoop with the help of a few basic classes.

#Trending: Hadoop Jobs

*Source: http://the451group.com/

Job Opportunities in Hadoop


MNCs like IBM, Microsoft & Oracle have integrated Hadoop into their products.
Also, companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals.
So, companies are looking for IT professionals with solid Hadoop MapReduce skills.

Salary Trend in Hadoop

*Source: http://itproportal.com/

Hadoop Architecture
The 2 main components of Hadoop are:
Hadoop Distributed File System (HDFS), the storage component, which breaks files into blocks, then replicates and stores them across the cluster.
MapReduce, the processing component, which distributes the workload for operations on files stored in HDFS and automatically restarts failed work (see the sketch after this slide).

*Source: http://cloudera.com/
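To show how the two components fit together, here is a hedged sketch of the classic WordCount job in Java: HDFS supplies the input blocks, mappers run in parallel on the nodes holding those blocks, and a reducer sums the per-word counts. The class name and the input/output paths are illustrative, not from the slides.

```java
// Illustrative WordCount sketch using the standard Hadoop MapReduce Java API.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in the input split it is given.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word across all mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```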

Hadoop Ecosystem
Apache Hadoop Distributed File System (HDFS) offers storage of large files across multiple machines.
Apache MapReduce is a programming model for processing large data sets with a parallel & distributed algorithm on a cluster.
Apache Hive is a data warehouse on top of distributed storage, facilitating data summarization, queries and management of large datasets.
Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.
Apache HBase is a non-relational distributed database performing real-time operations on large tables (see the sketch below).
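As a small illustration of the real-time reads and writes mentioned for HBase, the sketch below stores and fetches one row with the standard HBase Java client; the table name "orders", the column family "info" and the row key are assumptions for the example.

```java
// Illustrative HBase client sketch; table, column family and row key are assumptions.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("orders"))) {

            // Write one cell: row "order#1001", column family "info", qualifier "status".
            Put put = new Put(Bytes.toBytes("order#1001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("status"), Bytes.toBytes("SHIPPED"));
            table.put(put);

            // Read the same row back in real time.
            Result result = table.get(new Get(Bytes.toBytes("order#1001")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("status"))));
        }
    }
}
```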

Hadoop Ecosystem
Apache Flume collects and aggregates unstructured (log and event) data into HDFS.
Apache Sqoop is a system for transferring bulk data between HDFS and relational databases.
Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.
Apache Avro is a framework for modelling, serializing and making Remote Procedure Calls.

Q&A


THANK YOU
