
TECHNICAL SEMINAR REPORT

on
APACHE HADOOP

SANDHYA G.V
1AP06CS042
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Jnana Sangama, Belgaum-590014, Karnataka, INDIA

Seminar Report
On
“APACHE HADOOP”

Submitted in partial fulfillment of the requirements for the
VIII Semester
Bachelor of Engineering
IN
COMPUTER SCIENCE AND ENGINEERING

For the Academic year
2009-2010
BY
SANDHYA G.V
(1AP06CS042)
Department of Computer Science and Engineering
APS COLLEGE OF ENGINEERING
SOMANAHALLI, BANGALORE-560082
2009-2010

APS COLLEGE OF ENGINEERING
SOMANAHALLI, BANGALORE-560082

Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the seminar entitled “APACHE
HADOOP” is a bona fide work carried out by
SANDHYA G.V, bearing register number 1AP06CS042,
in partial fulfillment of the requirements for the award
of the degree of Bachelor of Engineering in Computer
Science and Engineering of Visvesvaraya Technological
University, Belgaum, during the year 2009-2010.

Signatures:

Seminar Guide

Head of the Department


Prof Sri. C.P Sameerana

HOD, CSE

APS College of Engineering

Bangalore-82

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the
successful completion of any task would be
incomplete without mention of the people who
made it possible, and whose constant guidance and
encouragement crowned my effort with success.

I express my sincere gratitude to Prof C.P
Sameerana, HOD of CSE, for providing me the
necessary facilities to carry out my seminar work.

With a profound sense of gratitude, I
acknowledge the guidance and support extended by
Asst. Prof. Shewtha and Mr. Somshekar, Dept of
CSE. Their incessant encouragement and invaluable
technical support have been of immense help in
realizing this seminar. Their guidance gave me the
environment to enhance my knowledge and skills and
to reach the pinnacle with sheer determination,
dedication and hard work.

I also extend my thanks to all the faculty
members of the Department of Computer Science &
Engineering, APSCE, Bengaluru, who have encouraged
us throughout the Bachelor of Engineering course,
and to my friends.

SANDHYA G.V (1AP06CS042)

ABSTRACT

APACHE HADOOP is a software framework for data-
intensive, distributed applications, released under a
free license. It enables applications to work with
thousands of nodes and petabytes of data. Hadoop was
inspired by Google’s Map/Reduce and the Google File
System (GFS).

Hadoop is a top-level Apache project, built and
used by a community of contributors from all over the
world. Yahoo! has been the largest contributor to the
project and uses Hadoop extensively in its web search
and advertising businesses. Hadoop was originally
developed to support distribution for the Nutch search
engine, which crawls the web and builds a search
engine index for the crawled pages.

Hadoop is a framework for running applications on
large clusters of commodity hardware. The Hadoop
framework transparently provides applications with both
reliability and data motion. Hadoop implements a
computational paradigm named map/reduce. It also
provides a distributed file system that stores data on
the compute nodes, providing very high aggregate
bandwidth across the cluster.

Hadoop, a free software program named after a toy
elephant, now runs behind some of the world’s biggest
Web sites. It drives the top search engines and
helps determine the ads displayed next to the results. It
helps decide what people see on Yahoo’s homepage and
helps find long-lost friends on Facebook. The New York
Times and Facebook are two examples of organizations
with Hadoop implementations. IBM and Google have
announced a major initiative to use Hadoop to support
university courses in distributed computer programming.

TABLE OF CONTENTS

1. Introduction
2. Motivation for Hadoop
2.1 Nutch search engine
3. Hadoop architecture
4. Hadoop core
4.1 HDFS
4.2 Hadoop map reduce
5. Hadoop implementation
6. Software requirements
7. Hardware requirements
8. Usage pattern
9. Hadoop Usage
9.1 Hadoop usage @Yahoo
9.2 Hadoop usage @Facebook
9.3 Hadoop usage @Hive
9.4 Hadoop usage @Amazon
10. Advantages
11. Disadvantages
12. Conclusion and future work
13. References
