Big Data and Hadoop
Article · April 2015


Review: Big Data and Hadoop
Big Data and its application

Zahid Javed
Department of Computer Science
Summit College
Faisalabad, Pakistan
zahidjaved_uaf@yahoo.com
Tariq Shahzad
Department of Computer Science
Government College University for Women
Faisalabad, Pakistan

Badarqa Shakoor
Department of Computer Science
University of Agriculture
Faisalabad, Pakistan
badarqa25@gmail.com

Muhammad Tehseen Qureshi
Department of Computer Science
University of Agriculture
Faisalabad, Pakistan
tehseen.qureshi@gmail.com

Fozia Mushtaq
Department of Computer Science
University of Agriculture
Faisalabad, Pakistan
emman941@gmail.com

Abstract— The term 'Big Data' designates advanced methods and tools to capture, store, distribute, manage and analyze petabyte-sized or larger datasets that arrive at high velocity and in varied structures. Big data can be structured, unstructured or semi-structured, which puts it beyond the capability of conventional data management methods. Put another way, big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies. Hadoop is the main platform for organizing Big Data, and it solves the problem of making Big Data usable for analytics purposes. Hadoop is an open source software project that enables the distributed processing of large datasets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.

Index Terms—Big Data, Hadoop, Components of Hadoop

I. INTRODUCTION
Big data and its analysis are at the center of modern science and business. This data is created from online transactions, e-mails, videos, audio, images, click streams, logs, posts, search queries, health records, social networking activity, scientific data, sensors, and mobile phones and their applications. It is stored in databases that grow massively, and it becomes difficult to capture, store, manage, share, analyze and visualize with typical database software tools.

A. Four Dimensions of Big Data
Volume: large volumes of data. Velocity: quickly moving data. Variety: structured, unstructured, images, etc. Veracity: trust and integrity are a challenge and a must, and are as important for big data as for traditional relational DBs.

Figure: Big Data Platform

B. Big Data Mystery
 Although there is a massive spike in available data, the percentage of that data an enterprise can understand is on the decline.
 The data that the enterprise is trying to understand is saturated with both useful signals and lots of noise.
II. SOME CONCEPTS
NoSQL (Not Only SQL): databases that "move out" of the relational data model (i.e., no tables, no restriction to SQL):
 Focus on appending new data and on retrieving data (not necessarily from tables).
 Focus on key-value data stores, where data objects are found by their key.
 Focus on supporting the storage of large amounts of unstructured data.
 SQL is not used to retrieve data, and storage is not necessarily ACID (atomicity, consistency, isolation, durability).
 NoSQL architecture is schema-less (i.e., there is no predefined data structure). In contrast, a traditional relational DB needs its schema defined before the database is built and populated.

Figure: Layered Architecture of Big Data System

III. THREE CHALLENGES FACING BIG DATA
Big data is a big challenge. Arriving data will flood your storage. The data processing tools you need are new and unfamiliar. And there is real work in analyzing the data to find commercially valuable insights.

A. Storage
For firms handling oodles of data, one of the puzzles is how to store it. The traditional method was the "spinning platter" hard disk drive. These are slow, but they save a lot of money. Firms with cash for whom speed is a priority can move to solid-state drives (SSDs), which are ten to one hundred times faster but more expensive. And for firms that really want to turbo-charge their computing, there is "in-memory" computing: data is held in RAM, which can be 10,000 times faster than the old methods.

B. Process
Talk to a consultant about Big Data and Hadoop, and before long the two begin to blur. In fact, Big Data and Hadoop are so inseparable that many people assume they are the same thing. For newcomers, Hadoop is a way to store and process data in separate sections, or clusters. It has huge reserves for dealing with data, making it ideal for growth and easy to manage by simple means.

So what is new? The whisper on the road is that Hadoop is starting to show its age. Here is Sanjay Joshi, Global Big Data lead at the Indian giant Tech Mahindra: "Hadoop has been around for nearly a decade and, when it comes to technology, that is a long time - and there seem to be few limits."

C. Analysis
In the old days, only highly paid data scientists could analyze data in large quantities. They baffled and amazed onlookers with their coding skills. No longer. Today it is possible for additional staff to search data for valuable patterns using simple graphical interfaces. The most popular drag-and-drop charting tools include Tableau, Qlikview, SAS Visual Analytics and Opinurate.

Lucky Voice, the karaoke chain founded by Martha Lane-Fox, uses Tableau to plan staffing, to see which songs, food and drink customers prefer, and to look at forward-booking data, which can play an important role in securing re-bookings. Tableau eschews coding altogether: users simply take the data and create a new chart from lists.

Walmart uses the Neo4j graph database to identify products suitable for cross-selling and up-selling promotions. It is fairly easy for non-technically minded staff to use. Big data software makers are aware of this trend, and they now collaborate to ensure these systems mesh easily. For example, one log-analytics tool maker recently partnered with HP's Vertica Analytics Platform with the stated aim of ending the reliance on data scientists: the connections between the two are automatic, so data scientists barely touch the data, and the two tools can be considered as one.

IV. HADOOP TO OVERCOME BIG DATA CHALLENGES
Hadoop is a programming framework used to support the processing of large data sets in a distributed computing environment. Hadoop is a software framework, developed from Google's MapReduce, that breaks an application down into different parts. The Hadoop ecosystem consists of the current Apache Hadoop kernel together with MapReduce, HDFS and further components such as Apache Hive, HBase and ZooKeeper. HDFS and MapReduce are described in the following points.
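The schema-less, key-value storage style described in Section II can be sketched in a few lines of Python. This is a toy illustration only, not any particular NoSQL product; the class and field names are invented for the example:

```python
# Toy key-value store illustrating the NoSQL ideas from Section II:
# append-focused writes, lookup by key, and no predefined schema.
class ToyKeyValueStore:
    def __init__(self):
        self._data = {}  # key -> list of appended records

    def append(self, key, record):
        # Records need not share a schema: any dict (or blob) is accepted.
        self._data.setdefault(key, []).append(record)

    def get(self, key):
        # Retrieval is by key only -- no SQL, no joins, no tables.
        return self._data.get(key, [])

store = ToyKeyValueStore()
store.append("user:42", {"name": "Ali", "city": "Faisalabad"})
store.append("user:42", {"clickstream": ["home", "search", "cart"]})
print(store.get("user:42"))
```

Note how the two records under one key have entirely different shapes; a relational table would reject this without a prior schema change.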
From an analyst's point of view, this processing can be approached along multiple dimensions. For example, analytics to be applied to a very large dataset can instead be applied to a reduced small subset. In a traditional data storage scenario, creating something usable from the data might entail the analyst applying an ETL operation. In Hadoop, this type of operation is written as a MapReduce job in Java, and a number of high-level languages like Hive and Pig make such programs easy to write. The results of these jobs can either be written back to HDFS or placed in a traditional data warehouse.
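The shape of such a MapReduce job can be sketched in plain Python. This is a minimal single-process simulation of the map, shuffle, and reduce phases of a word count, not real Hadoop code (real jobs would be written in Java or generated via Hive or Pig):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data and hadoop", "big data analytics"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result["big"], result["data"], result["hadoop"])  # prints: 2 2 1
```

In a real cluster the map and reduce functions run in parallel on many machines, with the framework handling the shuffle over the network.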

Figure: Hadoop Architecture
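The HDFS section below describes how files are broken into blocks and replicated three ways. As a rough sketch of that idea (the tiny block size and round-robin placement here are simplifying assumptions, not the real HDFS defaults or placement algorithm):

```python
# Toy sketch of HDFS-style block splitting and 3-way replication.
BLOCK_SIZE = 4   # bytes; tiny for illustration (real HDFS blocks are huge)
REPLICATION = 3  # three complete copies, as described below

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # Break a file's bytes into fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, servers, replication=REPLICATION):
    # Assign each block to `replication` distinct servers (round-robin here;
    # real HDFS uses a rack-aware placement policy).
    placement = {}
    for n, _block in enumerate(blocks):
        placement[n] = [servers[(n + r) % len(servers)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs world")
placement = place_replicas(blocks, ["s1", "s2", "s3", "s4", "s5"])
# Every block now lives on three different servers, so the loss of
# any single server never loses data.
```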


A. HDFS
Hadoop includes a fault-tolerant storage system: the Hadoop Distributed File System, or HDFS. HDFS is able to store huge amounts of information, to scale up incrementally, and to survive the failure of significant parts of the storage infrastructure without losing data. Hadoop creates clusters of machines and coordinates the work among them. Clusters can be built with cheap computers; if one fails, Hadoop shifts its work to the remaining machines in the cluster, and the cluster continues to operate without losing data or interrupting work. HDFS manages storage on the cluster by breaking incoming files into pieces, known as "blocks", and storing each block redundantly across the pool of servers. In the normal case, HDFS stores three complete copies of each file by copying each piece to three different servers.

Figure: HDFS Architecture

B. MapReduce
The Hadoop MapReduce framework is the processing pillar of the Hadoop ecosystem. The framework allows the specification of an operation, distributes the problem and the data, and runs the operation in parallel so that it can be applied to a very large data set.

Figure: MapReduce Architecture

CONCLUSION
We have entered the era of big data. The paper describes the concept of Big Data in terms of its volume, velocity and variety. It also focuses on the problems of Big Data processing: solutions to the technical challenges of processing big data must be efficient and fast. Finally, the paper describes Hadoop, the open source software used for the processing of Big Data.