
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org


Volume 5, Issue 5, September - October 2016

ISSN 2278-6856

Airline Recommender: A MapReduce Way of Optimizing Analytical Workloads

B Naveen Kumar
V R Siddhartha Engineering College, Vijayawada-520007

ABSTRACT
Most people who travel by air experience flight delays on a daily basis, yet the average delay is nowhere reported on public web sites. The model proposed in this paper analyses a large set of international flight data to build a Pearson correlation of delay times using Hadoop MapReduce algorithms. The Pearson correlation algorithm is implemented with Mahout to compute flight delay results, and the output helps the user choose a flight based on its typical delay. The procedure has very low time complexity and high efficiency.

Key words: Hadoop, Mahout, MapReduce, Pearson correlation

1. INTRODUCTION
Most people who travel by air experience flight delays on a daily basis, yet the average delay is nowhere reported on public web sites. The model proposed in this paper analyses a large set of international flight data to build a Pearson correlation of delay times, producing an output that helps the user choose a flight based on its typical delay.
Most online flight searches are based on arrival time, departure time, the amount of time taken to reach the destination, the number of stops in between, and sometimes the choice of airline itself. All of these criteria are fine, but what happens in a real situation where the user actually buys a ticket following all of them and the flight is still delayed? This is not a rare problem; it is a routine, daily one, and most passengers who travel by air face delays on a daily basis.
Most flights are delayed by 5 to 10 minutes, and sometimes the delay extends to 20 or 30 minutes, or even hours or days. In that situation a well-informed choice matters for travellers for whom time is money. For them I want to recommend a new way to calculate the average delay time of a flight.

This project takes real airline data for international flights and analyses it before the user buys a ticket. Because real data are used, the same analysis can also be employed by airport authorities to monitor and control the delay of each plane, since it computes the total delay time of every plane flying from every airport.
1.1 HADOOP:
The most widely used implementation of MapReduce is Hadoop. The popularity of Hadoop stems from its strong performance in handling failures and its ability to scale to a very large number of worker nodes. Hadoop is a project of the Apache Software Foundation written in Java. It enables the management of petabytes of data across a large number of machines. The inspiration comes from Google's MapReduce and Google File System papers. Hadoop's most notable contributor has been the search engine company Yahoo. A job is first parallelized in the Map phase; all the intermediate results are then merged into one result in the Reduce phase. There are two modules in Hadoop, the Job Tracker and the Task Trackers. The Job Tracker (a Java process) is responsible for monitoring the job, managing the Map/Reduce phases, and managing retries in case of errors. [1]
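To illustrate how such a job is submitted to the Job Tracker, the sketch below configures a Hadoop job whose mapper and reducer compute per-flight average delays. The class names (DelayMapper, DelayReducer), the job name, and the input/output paths are assumptions for this example rather than details taken from the paper; the two classes are sketched later in the Map and Reduce operation sections.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlightDelayJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative job; DelayMapper and DelayReducer are the classes sketched later.
        Job job = Job.getInstance(conf, "flight average delay");
        job.setJarByClass(FlightDelayJob.class);
        job.setMapperClass(DelayMapper.class);
        job.setReducerClass(DelayReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. HDFS directory holding airline CSV files
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```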
1.2 HDFS:
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences from other distributed file systems are significant. HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware.
HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few operating system requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web crawler project and is now an Apache Hadoop subproject.
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes.
The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories, and it determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients; they also perform block creation, deletion, and replication upon instruction from the NameNode. The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux operating system (OS). HDFS is built using the Java language, so any machine that supports Java can run the NameNode or the DataNode software; use of the highly portable Java language means that HDFS can be deployed on a broad range of machines. A typical deployment has a dedicated machine that runs only the NameNode software.
Many mappers, and likewise many reducers, can run at the same time. While this process can often appear inefficient compared with more sequential algorithms, MapReduce can be applied to significantly larger data sets than "commodity" servers can otherwise handle; a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from the partial failure of servers or storage during the operation: if one mapper or reducer fails, its work can be rescheduled, provided the input data is still available.
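As a brief illustration of how a client reads data stored in HDFS, the following sketch opens a file through the Java FileSystem API. The path /data/airline/2008.csv is an assumed example location for the flight records, not a path given in the paper.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);           // connects to the configured NameNode
        Path file = new Path("/data/airline/2008.csv"); // assumed location of the flight records
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            // The NameNode resolves block locations; the bytes themselves are served by DataNodes.
            System.out.println(reader.readLine());      // print the header row
        }
    }
}
```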
1.3 MAPREDUCE:
MapReduce represents just one point in a larger space of distributed processing frameworks, which also includes, for instance, Dryad, Nephele, PACT, and Hive. Accordingly, the techniques and components we present should fit different frameworks, and ultimately cleaner models beyond the latest processing systems. MapReduce is described as a new way of processing huge amounts of data in distributed computing environments, but it has also been criticised as "a major step backwards" compared with a DBMS. It is understood that MapReduce is schema-free and index-free, so the MapReduce system must parse every record when reading its input. As a result, the comparisons show that neither approach is good at what the other does well and that the two technologies are complementary. [5] Some DBMS vendors have likewise added MapReduce front-ends to their systems, including Aster, HadoopDB, Greenplum and Vertica; most of these are parallel databases that basically provide a MapReduce front-end to a DBMS. MapReduce has attracted attention in many fields, including data mining, information retrieval, image retrieval, machine learning, and pattern recognition. [3] Some works extend the MapReduce runtime by adding a Join stage before the Reduce stage to perform complex data analysis tasks on very large collections; others evaluate MapReduce with particular workloads that process only a small fraction of the whole data set, and so fail to examine the limits of the MapReduce design under heavy workloads that process exponentially growing data volumes.
Different frameworks can provide distributed processing services. The most famous implementation is MapReduce, proposed by Google in 2004. It is a programming model and an associated implementation for processing and generating large data sets. [4] It supports many different operations on very large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. [5]
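The classic illustration of this pair of functions, used in [5], is word counting. A compact Hadoop sketch of it is given below purely to show the map and reduce contracts; it is not part of the proposed system.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // map: (offset, line) -> list of (word, 1)
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    ctx.write(new Text(token), ONE);
                }
            }
        }
    }

    // reduce: (word, [1, 1, ...]) -> (word, total)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(word, new IntWritable(sum));
        }
    }
}
```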
1.4 APACHE MAHOUT:
To support our work on collaborative filtering, several recommender system platforms were surveyed, including LensKit, easyrec, and MyMediaLite. We chose Mahout because it provides many of the desired characteristics required for a recommender development workbench platform. [6] Mahout is a production-level, open-source framework and comprises a wide variety of applications that are useful for a recommender system developer: collaborative filtering algorithms, data clustering, and data classification. Mahout is also highly scalable and can support distributed processing of large data sets across clusters of computers using Hadoop. Mahout recommenders support various similarity and neighbourhood formation computations; the recommendation prediction algorithms include user-based, item-based, Slope One, and Singular Value Decomposition (SVD), and it also incorporates Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) evaluation methods. Mahout is readily extensible and provides a wide variety of Java classes for customisation. As an open-source project, the Mahout developer/contributor community is very active; the Mahout wiki also provides a list of developers and a list of sites that have implemented Mahout. [6]
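A minimal sketch of the kind of Mahout (Taste) recommender used here, built on Pearson correlation similarity, is shown below. The input file ratings.csv (lines of the form userID,flightID,rating), the neighbourhood size of 10, and the user ID 42 are assumptions chosen for illustration rather than values taken from the paper.

```java
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class FlightRecommenderExample {
    public static void main(String[] args) throws Exception {
        // ratings.csv holds lines of the form userID,flightID,rating (rating derived from delay).
        DataModel model = new FileDataModel(new File("ratings.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Recommend 5 flights for user 42 (user ID chosen only for illustration).
        List<RecommendedItem> items = recommender.recommend(42L, 5);
        for (RecommendedItem item : items) {
            System.out.println("flight " + item.getItemID() + " score " + item.getValue());
        }
    }
}
```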
1.5 MOTIVATION:
Airline flight delays have come under increased scrutiny of late in the mainstream press, with Federal Aviation Administration data revealing that airline on-time performance in 2006 was at its worst level in 14 years. Flight delays have been attributed to several causes, for example weather conditions, airport congestion, airspace congestion, use of smaller aircraft by carriers, and so forth. We investigate empirical flight data published by the Bureau of Transportation Statistics to estimate the scheduled on-time arrival probability of every commercial domestic flight flown in the United States in 2007 by a major carrier. A structural estimation approach from econometrics is then used to impute the overage to underage cost ratio of the newsvendor model for every flight.
In this paper, we analyse the effect of the scheduled block time allotted to a flight, a variable controlled by the airlines, on on-time arrival performance. Considering these factors, I want to build a scenario that also takes the delay times of the flight into account at the time of showing results.

2. EXISTING WORKS:
Victoria Lopez et al. [8] proposed a technique for classification with big data, which has become one of the latest trends when learning from the available information. The data growth of recent years has heightened the interest in effectively acquiring knowledge to analyse and predict trends. The variety and veracity associated with big data introduce a degree of uncertainty that must be handled in addition to the volume and velocity requirements. This data usually also exhibits what is known as the problem of classification with imbalanced data sets, a class distribution where the most important concepts to be learned are represented by a negligible number of examples in relation to the number of examples from the other classes. In order to deal adequately with imbalanced big data, they propose the Chi-FRBCS-BigDataCS algorithm, a fuzzy rule based classification system that can handle the uncertainty introduced by large volumes of data without neglecting the learning of the underrepresented class. The method uses the MapReduce framework to distribute the computational operations of the fuzzy model while incorporating cost-sensitive learning techniques in its design to address the imbalance present in the data. The good performance of this approach is supported by an experimental analysis carried out over twenty-four imbalanced big data cases of study. The results obtained show that the proposal can handle these problems and obtain competitive results, both in the classification performance of the model and in the time required for the computation.
Rama Satish K.V. [9] in the year 2014 presented big data processing by harnessing Hadoop MapReduce for optimising analytical workloads. In that paper, an efficient procedure is presented to classify data using the firefly algorithm and a naive Bayes classifier. The proposed system is organised in two stages: (i) a MapReduce structure for training and (ii) a MapReduce structure for testing. Initially, the input test data is fed into the framework to select the suitable features for big data classification. The firefly algorithm is used and the optimised feature space is chosen with the best fitness. Once the best feature space is found through the firefly algorithm, the classification of the big data is done using the naive Bayes classifier. These two procedures are effectively distributed based on the concepts of the MapReduce framework. The results of the reduce output are validated through evaluation metrics, namely accuracy, specificity, sensitivity, and time. For comparative analysis, the proposed big data classification is compared with existing works such as naive Bayes and a neural network on Twitter data sets.
Priyanka Shenoy and Manoj Jain [11] in the year 2013 observed that recommendation systems are a powerful tool for extracting additional value for a business from its user databases. These systems help us find the items the user wants to buy or would like to view; they benefit users by helping them find the items they like, and they are increasingly used as a tool for e-commerce on the web. Their paper lays an emphasis on collaborative filtering using Pearson's coefficient, which allows scaling to large data sets as well as giving accurate recommendations.
Qingchen Zhang et al. [10] addressed the fact that, with the fast development of the Internet of Things and electronic commerce, we have entered the era of big data. Its characteristics, such as great volume and heterogeneity, bring challenges to storage and analysis. The paper presented a universal storage architecture for big data in a cloud environment. They use clustering analysis to divide the cloud nodes into different clusters according to the communication cost between the nodes. The cluster with the strongest computing power is chosen to provide the universal storage and query interface for users. Each of the other clusters is responsible for storing the data of a specific model, for example relational data, key-value data, document data, and so on. Experiments demonstrate that the architecture can store all kinds of heterogeneous big data and provide users with a unified storage and query interface for big data effectively and quickly.

3. PROPOSED WORK:
The presented storage architecture can support different data models, including a wide variety of relational data and non-relational heterogeneous data alongside MySQL data, by dividing the nodes of the distributed storage centre into a few groups, each of which stores data with a particular model, for instance the key-value model or the document model. In addition, the design provides users with a unified storage interface and query interface. The whole design can basically be divided into two layers, a data analysis layer and a data storage layer. Recently, various researchers have presented several algorithms for big data classification based on clustering techniques. However, the challenge lies not only in reducing memory use but also in how dimensionality and scalability are taken into account for data classification, because in reality the processing deals with large and high-dimensional data. Accordingly, (i) the curse of dimensionality and (ii) scalability are taken as the challenges that are essential to address when designing a big data classification technique. A big data classification system that handles all of these criteria is critically required for improving classification accuracy. To address these two challenges in this work, feature selection and the MapReduce framework are incorporated into the analysis. The feature selection technique can relieve the curse of dimensionality by identifying the relevant features. The second challenge can then be solved using the MapReduce framework, which is a well-known distributed architecture for carrying out complex computational tasks in an effective manner. Moreover, the accuracy of identifying significant records in big data is also an important constraint, so the effective methods ought to be hybridised.
MAP OPERATION:
In this section we describe the processing procedure of the MapReduce framework for big data classification. The proposed MapReduce based data classification framework consists of three important functions: Map, an Intermediate system, and Reduce. The overall operation of the proposed architecture is given by

DS -> M -> IMS -> R -> Final value   (1)

where DS is the dataset, M is the mapper, IMS is the intermediate system, and R is the reducer.
A big data dataset DS is first partitioned into a number of subsets, and each subset contains many attributes.

Figure 1: Mapper operation [3]


Here DS1, DS2, ..., DSn are the subsets. Typically, the map function is written by the user; it takes an input pair and produces a set of intermediate key/value pairs. In the MapReduce architecture of Figure 2, a map operation is associated with every input. The first step is to partition the input dataset, typically stored in a distributed file system, among the computers that execute the map functionality. From the logical viewpoint, all data is treated as a Key (K), Value (V) pair, and every attribute in the input dataset is represented as a <key1, value1> pair. In the second step, every mapper applies the map function to every single attribute to produce a list of the form (<key2, value2>)*, where ()* denotes sequences of length zero or more.
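A minimal sketch of such a map operation for the airline data is given below. It assumes the comma-separated attribute order described in the DATA SET section below (ArrDelay in column 15, UniqueCarrier and FlightNum in columns 9 and 10); the class name and the key format are illustrative rather than taken from the paper.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DelayMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] f = line.toString().split(",");
        // Skip the header row and short or malformed records.
        if (f.length < 16 || "Year".equals(f[0])) {
            return;
        }
        try {
            int arrDelay = Integer.parseInt(f[14]);  // ArrDelay in minutes (assumed column order)
            String flight = f[8] + "_" + f[9];       // UniqueCarrier_FlightNum used as the key
            context.write(new Text(flight), new IntWritable(arrDelay));
        } catch (NumberFormatException e) {
            // "NA" delay values (cancelled or diverted flights) are ignored.
        }
    }
}
```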
REDUCER OPERATION:
The third step is to route the output of the mappers to the nodes that execute the reduce functionality. A reduce operation takes all the values associated with the same key in the intermediate list and processes them accordingly, emitting a final new list. Here, once the best feature space is identified through the firefly algorithm, the big data processing is done using the Pearson correlation. The output from all Map nodes, the <key1> and <value1> pairs, is grouped by the key1 values before being distributed to the Reduce operation. It is the job of the Reduce operation to combine the value1 values belonging to a specific key1. The result of a Reduce operation may be in the form of a list, <value2>, or just a single value, value2.
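A matching sketch of the reduce side, which merges all delay values that share a flight key into an average delay, is shown below; the class name and output types are assumptions for this example.

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DelayReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text flight, Iterable<IntWritable> delays, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (IntWritable d : delays) {
            sum += d.get();
            count++;
        }
        if (count > 0) {
            // Emit the average arrival delay (in minutes) for this flight key.
            context.write(flight, new DoubleWritable((double) sum / count));
        }
    }
}
```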
ARCHITECTURE DIAGRAM:

Figure 2: System Architecture Diagram


Carrier flight delays have come under increased scrutiny of late in the mainstream press, with Federal Aviation Administration data revealing that airline on-time performance in 2007 was at its worst level in 13 years. Flight delays have been ascribed to several causes, for example weather conditions, airport congestion, airspace congestion, use of smaller aircraft by carriers, and so forth. In this paper, we inspect the effect of the scheduled block time allocated to a flight, a variable controlled by the airlines, on on-time arrival performance. We examine empirical flight data published by the Bureau of Transportation Statistics to estimate the scheduled on-time arrival probability of every commercial domestic flight flown in the United States in 2008 by a major carrier.
The structural estimation approach from econometrics is then used to impute the overage to underage cost ratio of the newsvendor model for every flight. Our results show that carriers systematically "underemphasize" flight delays, i.e., the flight delay costs implied by the newsvendor model are less than the implied costs of early arrivals for a large fraction of flights. Our results show that revenue drivers (e.g., average fare) and competitive measures (e.g., market share) significantly affect the scheduled on-time arrival probability. We also show that the scheduled on-time arrival probability is not strongly influenced by the total number of passengers on the aircraft rotation who could be affected by a flight delay, or by the number of inbound and outbound connecting passengers on a flight. Operational characteristics, for example the hub-and-spoke network structure, also significantly affect the scheduled on-time arrival probability. Finally, full-service airlines place a higher weight on the cost of late arrivals than do low-cost carriers, and flying on the lowest-fare flight on a route brings about a drop in the scheduled on-time arrival probability.
DATA SET:
The data set is obtained from the American Statistical Association's official web site. [11] The data consists of flight arrival and departure details for all commercial flights within the United States. This is a large dataset: there are nearly 130 million records in total, taking up 1.8 gigabytes of space compressed and 14 gigabytes when uncompressed. To keep the analysis manageable despite the size of the data, we work with the following important attributes of the airline dataset.

Name - Description
1. Year - 1989 to 2008
2. Month - 1 to 12
3. DayofMonth - 1 to 31
4. DayOfWeek - 1 (Monday) to 7 (Sunday)
5. DepTime - Actual departure time (local, hhmm)
6. CRSDepTime - Scheduled departure time (local, hhmm)
7. ArrTime - Actual arrival time (local, hhmm)
8. CRSArrTime - Scheduled arrival time
9. UniqueCarrier - Unique carrier code
10. FlightNum - Flight number
11. TailNum - Plane tail number
12. ActualElapsedTime - In minutes
13. CRSElapsedTime - In minutes
14. AirTime - In minutes
15. ArrDelay - Arrival delay, in minutes
16. DepDelay - Departure delay, in minutes
17. Origin - Origin IATA airport code
18. Dest - Destination IATA airport code
19. Distance - In miles
20. TaxiIn - Taxi in time, in minutes
21. TaxiOut - Taxi out time, in minutes
22. Cancelled - Was the flight cancelled?
23. CancellationCode - Reason for cancellation
24. Diverted - 1 = yes, 0 = no
25. CarrierDelay - In minutes
26. WeatherDelay - In minutes
27. NASDelay - In minutes
28. SecurityDelay - In minutes
29. LateAircraftDelay - In minutes

Recommendation:
For Pearson's correlation, a pair of quantitative variables is frequently measured on each member of a sample. If we consider two such variables, it is often important to establish whether there is a relationship between the two, i.e. to check whether they are correlated. We can categorise the type of association by considering what happens to the other variable as one variable increases:
Positive correlation: the other variable also tends to increase;
Negative correlation: the other variable tends to decrease;
No correlation: the other variable does not tend to either increase or decrease.
The starting point of any such analysis should therefore be the construction and subsequent examination of a scatter plot. Examples of negative, no, and positive correlation are as follows.

Data Cleaning:
The data set is first sent for cleaning before the next phase. The main purpose of data set cleaning is to obtain only the required data.

Correlation coefficient:
Pearson's correlation coefficient is a statistical measure of the strength of a linear relationship between paired data. In a sample it is denoted by r and is by design constrained as follows:
-1 <= r <= 1
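For completeness, the sample coefficient can be written out explicitly; a standard form (not reproduced in the original text) for paired observations $(x_i, y_i)$, $i = 1, \dots, n$, is

$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables.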

Data Classification:
The cleaned data is then sent for classification. After classification we obtain the plane details, in which the plane with the highest rating has the least delay. Ratings are assigned from the delay as follows:
1. Delay >= 0 and delay < 25: rating 5
2. Delay >= 25 and delay < 100: rating 4
3. Delay >= 100 and delay < 125: rating 3
4. Delay >= 125 and delay < 150: rating 2
5. Delay >= 150: rating 1
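A small helper that encodes this delay-to-rating mapping, as it might feed the Mahout data model described earlier, is sketched below; the method name and the use of rating 1 for delays of 150 minutes or more are assumptions consistent with the rules above.

```java
public final class DelayRating {

    private DelayRating() {
    }

    /** Maps an average delay in minutes to the 1-5 rating used for recommendation. */
    public static int rate(double delayMinutes) {
        if (delayMinutes < 25) {
            return 5;   // essentially on time
        } else if (delayMinutes < 100) {
            return 4;
        } else if (delayMinutes < 125) {
            return 3;
        } else if (delayMinutes < 150) {
            return 2;
        } else {
            return 1;   // 150+ minutes: worst rating (assumed from the pattern above)
        }
    }
}
```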

Furthermore, for the correlation coefficient:
Positive values denote a positive linear relationship; negative values denote a negative linear relationship; a value of 0 denotes no linear relationship. The closer the value is to 1 or -1, the stronger the linear relationship. In the figures, different cases and their corresponding sample correlation coefficient values are presented; the first three represent the "extreme" correlation values of -1, 0 and 1.


When r = +-1 we say we have perfect correlation, with the points lying on a perfect straight line. In practice, the values we observe in a sample lie between these extremes.
Assumptions:
The computation of Pearson's correlation coefficient, and the subsequent significance testing of it, requires the following data assumptions to hold:
interval or ratio level of measurement;
linearly related variables;
bivariate normally distributed.
In practice the last assumption is checked by requiring both variables to be individually normally distributed (which is a by-product consequence of bivariate normality). Pearson's correlation coefficient is sensitive to skewed distributions and outliers, so if these conditions do not hold the results should be interpreted with caution.

4. RESULTS
So that the user is not overwhelmed by the size of the data, the large dataset is processed only through the MapReduce framework. This way of working produced fast and accurate results, whereas the previous approaches based on a database or plain files have a higher time complexity.

Data cleaning: the data set is first sent for cleaning before the next phase; the main purpose of data set cleaning is to obtain only the required data.

Figure 3: Data cleaning

Data classification: the cleaned data is then sent for classification. After classification we obtain the plane details, in which the plane with the highest rating has the least delay.

Figure 4: Data classification

Recommendation:

Figure 5: Recommendation


Figure 6: Time complexity

5. Conclusion
After following a systematic flow of events to analyse real flight data, I conclude that the steps of the above procedure produce an output that can be useful to every air traveller. This data can be included at the time of searching for a flight or sorting flight information, so that every end user benefits. Since the search tools presently available online do not offer any such delay-time criterion, I can say that I am the first to provide this kind of search information to the public.

References
[1] Benedikt Elser, Alberto Montresor, "An Evaluation Study of Big Data Frameworks for Graph Processing", IEEE International Conference on Big Data, pp. 60-67, 2013.
[2] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Proceedings of the 6th Symposium on Operating Systems Design & Implementation, USENIX Association, Berkeley, CA, USA, pp. 137-150, 2004.
[3] Rama Satish K V and N P Kavya, "Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads", International Conference on Contemporary Computing and Informatics (IC3I), May 2014.
[4] T. Kovacs, "A fast classification based method for fractal image encoding", Image and Vision Computing, vol. 26, no. 8, pp. 1129-1136, 2008.
[5] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Proceedings of the 6th Symposium on Operating Systems Design & Implementation, USENIX Association, Berkeley, CA, USA, pp. 137-150, 2004.
[6] Jiangtao Yin, Yong Liao, Mario Baldi, Lixin Gao and Antonio Nucci, "Efficient Analytics on Ordered Datasets using MapReduce", Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, New York, NY, USA, pp. 125-126, 2013.
[7] Rosmy C Jose and Shaiju Paul, "Privacy in MapReduce Based Systems: A Review", International Journal of Computer Science and Mobile Computing, Vol. 3, No. 2, pp. 463-466, 2014.
[8] Victoria López, Sara del Río, José Manuel Benítez, Francisco Herrera, "Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data", Fuzzy Sets and Systems, In Press, 2014.
[9] Praveen Kumar K R, R. Aparna, "Storage and Access in Product Review System using Hadoop", International Journal of Recent Advances in Engineering & Technology, ISSN 2347-2812, Volume 2, Issue 6, 2014.
[10] Qingchen Zhang, Zhikui Chen, Ailing Lv, Liang Zhao, Fangyi Liu and Jian Zou, "A Universal Storage Architecture for Big Data in Cloud Environment", IEEE International Conference on Green Computing and Communications, Beijing, pp. 447-480, 2013.
[11] Priyanka Shenoy, Manoj Jain, Abhishek Shetty, Deepali Vora, International Journal of Engineering Research and Applications (IJERA), ISSN 2248-9622, Vol. 3, Issue 2, March-April 2013, pp. 676-679.

