As Hadoop has become one of today's most sought-after career paths, many candidates find it difficult to get through the interview.
In this post we have put together a comprehensive list of frequently asked interview questions and answers to help you get through your Hadoop interview.
1) What is MapReduce?
It is a framework and programming model for processing large data sets in parallel across clusters of computers.
2) What are maps and reduces?
Map and Reduce are the two phases of a MapReduce job. The map phase reads data from the input location and, based on the input type, generates key-value pairs as intermediate output on the local machine. The reduce phase processes the intermediate output received from the mappers and generates the final output.
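The two phases can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and the grouping step stands in for the work the framework does between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Emit an intermediate (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (hypothetical stand-in for the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single final (key, total) pair."""
    return key, sum(values)


def word_count(lines):
    """Run map, then group by key, then reduce, all in one process."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "MapReduce runs jobs"]))
```

In a real cluster the map calls run on many machines and the grouped data is moved over the network, but the shape of the data at each step is the same as in this sketch.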
3) What are the four basic parameters of a mapper?
In the standard word-count example, the four type parameters of a mapper are LongWritable, Text, Text and IntWritable. The first two are the input key and value types, and the last two are the intermediate output key and value types.
4) What are the four basic parameters of a reducer?
In a word-count job, the four type parameters of a reducer are Text, IntWritable, Text and IntWritable. The first two are the intermediate output key and value types, and the last two are the final output key and value types.
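To see why these particular types show up, here is a rough Python analogy. It assumes line-oriented text input, where each record's key is the byte offset of the line in the file (a long) and the value is the line itself (text). Python `int` and `str` stand in for the Writable classes, and `read_records`, `mapper` and `reducer` are hypothetical names used only for this sketch.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line) pairs: the analogue of the mapper's two input types."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next record starts where this line ended


def mapper(offset: int, line: str):
    """(int, str) in, (str, int) out: the intermediate key and value types."""
    for word in line.split():
        yield word, 1


def reducer(word: str, counts):
    """(str, ints) in, (str, int) out: the final output key and value types."""
    return word, sum(counts)


if __name__ == "__main__":
    for offset, line in read_records(b"a b\nc\n"):
        print(offset, line, list(mapper(offset, line)))
```

The mapper's output types must match the reducer's input types, which is why the middle pair appears in both answers. Other jobs are free to use different key and value types.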
5) Which are the three modes in which Hadoop can be run?
The three modes in which Hadoop can be run are:
1. Standalone (local) mode
2. Pseudo-distributed mode
3. Fully distributed mode
6) What are the features of standalone (local) mode?
In standalone mode there are no daemons; everything runs in a single JVM. It has no DFS and uses the local file system. Standalone mode is suitable only for running MapReduce programs during development, and it is one of the least used environments.
7) What are the features of Pseudo mode?
Pseudo mode is used both for development and in the QA environment. In pseudo mode all the daemons run on the same machine.
8) What is BloomMapFile used for?
BloomMapFile is a class that extends MapFile, so its functionality is similar to MapFile. It uses dynamic Bloom filters to provide a quick membership test for keys. It is used in the HBase table format.
9) What is Pig?
Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs.
10) What is the difference between logical and physical plans?
The logical plan describes the logical operators that have to be executed by Pig during execution. After this, Pig produces a physical plan. The physical plan describes the physical operators that are needed to execute the script.
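Going back to question 8 for a moment, the "quick membership test" that a Bloom filter provides can be sketched as follows. This is a minimal fixed-size filter written for illustration only. It is not the dynamic Bloom filter that BloomMapFile uses, and the class name, method names and default sizes are assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal fixed-size Bloom filter (illustrative, not Hadoop's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions per key by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row-1")
    print(bf.might_contain("row-1"))
```

The useful property is that a negative answer is always correct, so a reader can skip the expensive lookup for keys the filter rules out. A positive answer may occasionally be a false positive and still needs the real lookup.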
We've covered some of the frequently asked interview questions. If you are looking for more Hadoop interview questions that are frequently asked by employers, visit