
2/8/2014 Vital Hadoop tools for crunching Big Data | Big Data Analytics News

http://bigdataanalyticsnews.com/vital-hadoop-tools-crunching-big-data/ 1/12
Vital Hadoop tools for crunching Big Data
by bigdata · 07 February 2014
Categories: Analytics, Big Data, Cassandra, Hadoop, Hadoop Tutorials, HBase, Hive, Impala, MapReduce, MongoDB, NoSQL, Pig
Tags: Ambari, Apache Flume, Apache Pig, Apache Spark, Avro, Big Data, Hadoop, Hadoop Interview Questions, Hadoop Tutorials, HBase, HDFS, Hive, Hive Interview Questions, Mahout, NoSQL, Oozie, Solr, SQL on Hadoop, Sqoop, Zookeeper
Today, one of the most popular terms in the IT world is Hadoop. In a short span of time, Hadoop has grown massively and has proved useful for a large collection of diverse projects. The Hadoop community is evolving fast and plays a prominent role in its ecosystem.

Here is a look at the essential tools and code that come under the collective heading of Hadoop.
Hadoop:
When we think of Hadoop, the first thing that comes to mind is its map and reduce tools. Generally, the entire group of map and reduce tools is termed Hadoop, but strictly the name refers to the small pile of code at the center, which is licensed under Apache. This Java-based core synchronizes worker nodes in executing a function on data stored locally on each node; the results are then aggregated and reported. The first step, processing the data in place, is called Map, and the second step, aggregating and reporting the results, is called Reduce. Hadoop allows programmers to concentrate on writing code for data analysis, because the framework is designed to work around the faults and errors that are expected of individual machines.
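The two steps can be sketched in plain Python. This is a conceptual illustration only; real Hadoop jobs are written in Java against the MapReduce API and run distributed across nodes.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: each 'worker' emits (word, 1) pairs for its local chunk of data."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: aggregate the counts emitted for each key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

counts = reduce_phase(map_phase(["Hadoop crunches big data", "big data big wins"]))
print(counts["big"])  # -> 3
```

In a real cluster the map calls run in parallel on many machines and the framework shuffles each key to a reducer; the logic per record is the same.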
Ambari:
Ambari is an Apache project supported by Hortonworks. It offers a web-based GUI (Graphical User Interface) with wizard scripts for setting up clusters with most of the standard components. Ambari provisions, manages, and monitors Hadoop clusters.
HDFS (Hadoop Distributed File System):
HDFS, distributed under the Apache license, offers a basic framework for splitting up data collections between multiple nodes. Large files are broken into blocks, and several nodes may each hold blocks from a single file. The file system is designed to mix fault tolerance with high throughput: blocks are read in steady streams and are not usually cached, a design that favors sustained throughput over minimal latency.
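The block model can be illustrated with a short sketch. This is a toy with an 8-byte block size and round-robin placement; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks, as HDFS does conceptually."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_to_nodes(blocks, nodes, replication=2):
    """Place each block on `replication` distinct nodes, round-robin."""
    placement = {}
    for i, _block in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hadoop-distributed-file-system", 8)
print(len(blocks))  # -> 4
```

Because every block lives on several nodes, losing one machine loses no data, and readers can stream different blocks of the same file from different nodes at once.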
HBase:
HBase is a column-oriented database management system that runs on top of HDFS. HBase applications are written in Java, much like MapReduce applications. It comprises a set of tables, where each table contains rows and columns like a traditional database. When data lands in a big table, HBase stores it, makes it searchable, and automatically shards the table across multiple nodes so that MapReduce jobs can run against it locally. HBase offers a limited guarantee for local changes: the changes made to a single row succeed or fail together.
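That row-level guarantee can be sketched with a toy in-memory model (this is not the HBase client API; the class and column names are invented for illustration): all column updates for one row are validated first, then applied together, so a bad mutation leaves the row untouched.

```python
class ToyTable:
    """Toy column-oriented table: rows are keyed, columns grouped per row."""
    def __init__(self):
        self.rows = {}

    def put_row(self, row_key, columns):
        """Apply all column updates for one row atomically:
        validate everything first, then install in one step."""
        if not all(isinstance(c, str) for c in columns):
            raise ValueError("column names must be strings")
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key, column):
        return self.rows.get(row_key, {}).get(column)

t = ToyTable()
t.put_row("user1", {"info:name": "Ada", "info:city": "London"})
print(t.get("user1", "info:name"))  # -> Ada
```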
Hive:
If you are already fluent in SQL, you can leverage Hadoop using Hive, which was developed by some folks at Facebook. Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems. It provides an SQL-like language called HiveQL (HQL) that reaches into the files and extracts the required snippets for the code.
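The flavor of a HiveQL query can be imitated with Python's built-in sqlite3 module. This only illustrates the SQL-over-data idea; Hive itself compiles HiveQL into MapReduce jobs over files in HDFS, and the table here is invented for the example.

```python
import sqlite3

# A small in-memory table standing in for a dataset stored in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (host TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [("a", 100), ("b", 250), ("a", 50)])

# A HiveQL-style aggregate query: total bytes per host.
rows = conn.execute(
    "SELECT host, SUM(bytes) FROM logs GROUP BY host ORDER BY host"
).fetchall()
print(rows)  # -> [('a', 150), ('b', 250)]
```

The appeal is exactly this familiarity: an analyst who can write the query above can query terabytes in Hive without writing a Java MapReduce job.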
Sqoop:
Apache Sqoop is specially designed to transfer bulk data efficiently from traditional databases into Hive or HBase. It can also extract data from Hadoop and export it to external structured data stores like relational databases and enterprise data warehouses. Sqoop is a command-line tool that maps between the tables and the data-storage layer, translating the tables into a configurable combination of HDFS, HBase, or Hive.
Pig:
When stored data is visible to Hadoop, Apache Pig dives into it and runs code written in Pig's own language, called Pig Latin, which is filled with abstractions for handling the data. Pig comes with standard functions for common tasks like averaging data, working with dates, or finding differences between strings. When the standard functions fall short, Pig allows users to write functions of their own, called UDFs (User Defined Functions).
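The Pig Latin style of dataflow (load, filter, group, then aggregate) can be mimicked in Python; the `avg` function below plays the role a UDF or built-in like AVG would play in a real Pig script. The records and names are invented for illustration.

```python
from collections import defaultdict

records = [("2014-02-07", "hadoop", 4), ("2014-02-07", "pig", 9),
           ("2014-02-08", "hadoop", 6)]

# FILTER: keep rows matching a predicate.
filtered = [r for r in records if r[2] > 3]

# GROUP BY topic.
groups = defaultdict(list)
for day, topic, score in filtered:
    groups[topic].append(score)

# A "UDF": a user-supplied aggregate applied to each group.
def avg(xs):
    return sum(xs) / len(xs)

result = {topic: avg(scores) for topic, scores in groups.items()}
print(result["hadoop"])  # -> 5.0
```

In Pig each of these stages is one Pig Latin statement, and the engine turns the whole pipeline into MapReduce jobs behind the scenes.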
Zookeeper:
Zookeeper is a centralized service that maintains configuration information, provides naming, and offers distributed synchronization across a cluster. It imposes a file-system-like hierarchy on the cluster and stores all of the metadata for the machines, so the work of the various machines can be synchronized.
NoSQL:
Some Hadoop clusters
integrate with NoSQL data
stores that come with their own
mechanisms for storing data
across a cluster of nodes. This
allows them to store and
retrieve data with all the
features of the NoSQL
database, after which Hadoop
can be used to schedule data
analysis jobs on the same
cluster.
Mahout:
Mahout is designed to bring a great number of algorithms for classification and filtering in data analysis to the Hadoop cluster. Many of the standard algorithms, like K-means, Dirichlet clustering, parallel pattern mining, and Bayesian classification, are ready to run on the data with Hadoop-style map and reduce.
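One of those algorithms, K-means, in miniature: a pure-Python sketch on 1-D points. Mahout's distributed version runs the same assign/update loop, with the assignment step as a map pass and the center recomputation as a reduce pass.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    center (map-like step), then recompute each center as the mean of
    its cluster (reduce-like step)."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

print(kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))  # -> [1.5, 9.5]
```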
Lucene/Solr:
Lucene, written in Java, integrates easily with Hadoop and is a natural companion for it. It is a tool meant for indexing large blocks of unstructured text. Lucene handles the indexing, while Hadoop handles the distributed queries across the cluster. Lucene-Hadoop features are rapidly evolving as new projects are developed.
Avro:
Avro is a serialization system that bundles data together with a schema for understanding it. Each packet carries a JSON-defined structure: the header specifies the schema for the data, so there is no need to write extra tags in the data to mark the fields. The output is considerably more compact than traditional formats like XML.
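The schema-in-the-header idea can be sketched with the standard json module. This is an illustration of the principle only: real Avro encodes the rows in a compact binary format, not JSON, and the schema shown is invented for the example.

```python
import json

def pack(schema, records):
    """Write the schema once, in the header, then only the bare field
    values per row: no per-record field tags are needed."""
    fields = [f["name"] for f in schema["fields"]]
    return json.dumps({"schema": schema,
                       "rows": [[rec[f] for f in fields] for rec in records]})

def unpack(blob):
    """Recover full records by reading the field names from the header."""
    doc = json.loads(blob)
    fields = [f["name"] for f in doc["schema"]["fields"]]
    return [dict(zip(fields, row)) for row in doc["rows"]]

schema = {"name": "User", "fields": [{"name": "id"}, {"name": "city"}]}
blob = pack(schema, [{"id": 1, "city": "Pune"}])
print(unpack(blob))  # -> [{'id': 1, 'city': 'Pune'}]
```

Note how the field names appear once in the header regardless of how many rows follow; that is where the size advantage over tag-per-field formats like XML comes from.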
Oozie:
A big job can be simplified by breaking it into steps. When a project is broken into multiple Hadoop jobs, Oozie starts processing them in the right sequence. It manages the workflow as specified by a DAG (Directed Acyclic Graph), so there is no need to monitor each stage by hand.
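Running jobs "in the right sequence" from a DAG is a topological sort. A minimal sketch of that ordering logic (Oozie workflows are actually declared in XML, and the job names here are invented):

```python
from collections import deque

def run_order(deps):
    """Kahn's algorithm: deps maps each job to the jobs it depends on.
    Returns a sequence in which every job runs after its prerequisites."""
    indegree = {job: len(pre) for job, pre in deps.items()}
    dependents = {job: [] for job in deps}
    for job, pre in deps.items():
        for p in pre:
            dependents[p].append(job)
    ready = deque(sorted(j for j, d in indegree.items() if d == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in dependents[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

deps = {"ingest": [], "clean": ["ingest"], "join": ["ingest"],
        "report": ["clean", "join"]}
print(run_order(deps))  # -> ['ingest', 'clean', 'join', 'report']
```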
GIS Tools:
Working with geographic maps is a big job for clusters running Hadoop. The GIS (Geographic Information System) tools for Hadoop have adapted the best Java-based tools for understanding geographic information to run with Hadoop. Databases can now handle geographic queries using coordinates, and code can deploy the GIS tools.
Flume:
Gathering the data is only part of the job; it also has to be stored and analyzed. Apache Flume dispatches special agents to gather information destined for HDFS. The information gathered can be log files, data from the Twitter API, or website scrapes, and these data flows can be chained together and subjected to analysis.
Spark:
Spark is the next generation: it works much like Hadoop, but processes data cached in memory. Its objective is to make data analysis fast to run and fast to write, with a general execution model. It can optimize arbitrary operator graphs and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
SQL on Hadoop:
When a quick ad-hoc query over all the data in the cluster is required, a new Hadoop job could be written, but that takes time. As programmers found themselves doing this more often, tools emerged that accept queries in the simple language of SQL. These tools offer quick access to the results.
Copyright 2014. Big Data Analytics News
