
Hadoop Interview Questions

As Hadoop becomes one of the hottest careers today, many candidates find it difficult to get through the interview.


In this post we have put together a comprehensive list
of frequently asked interview questions and answers to
help you get through your Hadoop interview.
1) What is MapReduce?

It is a programming model and accompanying framework for processing large data sets in parallel across clusters of computers.
2) What are maps and reduces?

Map and Reduce are the two phases of solving a query in MapReduce. The map phase reads data from the input location and, based on the input format, generates key-value pairs, that is, intermediate output on the local machine. The reduce phase processes the intermediate output received from the mappers and generates the final output. (The mapper and reducer sketches under questions 3 and 4 make both phases concrete.)

3) What are the four basic parameters of a mapper?

In the classic word-count example, the four basic parameters of a mapper are LongWritable, Text, Text, and IntWritable. The first two represent the input key and value types; the last two represent the intermediate output key and value types, as the sketch below shows.
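
A minimal word-count mapper sketch using the org.apache.hadoop.mapreduce API illustrates these four type parameters; the class name WordCountMapper is our own illustration, not something fixed by Hadoop:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Type parameters: input key, input value, output key, output value.
public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // The input key is the byte offset of the line; the value is the line text.
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE); // emit the intermediate pair (word, 1)
    }
  }
}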
4) What are the four basic parameters of a reducer?

The four basic parameters of the corresponding reducer are Text, IntWritable, Text, and IntWritable. The first two represent the intermediate key and value types received from the mapper; the last two represent the final output key and value types, again shown in the sketch below.
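
A matching reducer sketch from the same word-count example; WordCountReducer is likewise an illustrative name:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Type parameters: intermediate key and value in, final key and value out.
public class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable total = new IntWritable();

  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get(); // add up the 1s emitted by the mappers
    }
    total.set(sum);
    context.write(word, total); // final output pair (word, total count)
  }
}

A driver class then wires the two together with job.setMapperClass(WordCountMapper.class) and job.setReducerClass(WordCountReducer.class).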

5) What are the three modes in which Hadoop can be run?

The three modes in which Hadoop can be run are:
1. Standalone (local) mode
2. Pseudo-distributed mode
3. Fully distributed mode
6) What are the features of standalone (local) mode?

In standalone mode there are no daemons; everything runs in a single JVM. It has no DFS and uses the local file system. Standalone mode is suitable only for running MapReduce programs during development, which makes it one of the least used environments.

7) What are the features of Pseudo mode?

Pseudo-distributed mode is used both for development and in the QA environment. In pseudo-distributed mode, all the daemons run on the same machine; the configuration sketch below shows what sets it apart from standalone mode.
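
As a rough sketch, the difference between the modes shows up in the configuration files. The fragment below switches HDFS on for pseudo-distributed mode; in standalone mode the fs.defaultFS property keeps its default of file:///, so the local file system is used. (Exact property names vary by version; Hadoop 1.x calls this fs.default.name, and the port here is only a common convention.)

<!-- core-site.xml fragment for pseudo-distributed mode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>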
8) What is BloomMapFile used for?

The BloomMapFile is a class that extends MapFile, so its functionality is similar to MapFile. BloomMapFile uses dynamic Bloom filters to provide a quick membership test for keys. It is also used in the HBase table format.
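
A hedged sketch of how BloomMapFile is typically used, written against the older Hadoop 1.x-style org.apache.hadoop.io API (later releases changed these constructor signatures); the path and keys are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.BloomMapFile;
import org.apache.hadoop.io.Text;

public class BloomMapFileExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    String dir = "/tmp/bloom-demo"; // illustrative path

    // Keys must be appended in sorted order, as with a plain MapFile;
    // a dynamic Bloom filter is built alongside the index.
    BloomMapFile.Writer writer =
        new BloomMapFile.Writer(conf, fs, dir, Text.class, Text.class);
    writer.append(new Text("alpha"), new Text("1"));
    writer.append(new Text("beta"), new Text("2"));
    writer.close();

    // probablyHasKey() consults only the Bloom filter: false means the key
    // is definitely absent, so no disk seek is wasted on missing keys.
    BloomMapFile.Reader reader = new BloomMapFile.Reader(fs, dir, conf);
    System.out.println(reader.probablyHasKey(new Text("alpha"))); // true
    System.out.println(reader.probablyHasKey(new Text("gamma"))); // almost surely false
    reader.close();
  }
}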
9) What is PIG?

Pig is a platform for analyzing large data sets. It consists of a high-level language (Pig Latin) for expressing data analysis programs, coupled with infrastructure for evaluating those programs. Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs.
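
A small, illustrative Pig Latin script (the file and field names are our assumptions) gives a feel for the high-level language:

-- count users per age group
users   = LOAD 'users.txt' AS (name:chararray, age:int);
grouped = GROUP users BY age;
counts  = FOREACH grouped GENERATE group AS age, COUNT(users) AS n;
DUMP counts;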
10) What is the difference between logical and physical
plans?

The logical plan describes the logical operators that Pig has to execute; it is produced first, when Pig parses the script. After this, Pig produces a physical plan, which describes the physical operators needed to execute the script. The EXPLAIN example below shows both plans.
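
You can see both plans for yourself with Pig's EXPLAIN command. Applied to the counts alias from the script above, it prints the logical plan, the physical plan, and the MapReduce plan that the script compiles to:

EXPLAIN counts;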


We've covered some of the frequently asked interview questions.
If you are looking out for more Hadoop Interview
Questions that are frequently asked by employers, visit

http://www.edureka.in/blog/category/big-data-and-hadoop/
