
Hadoop Interview Questions

For Free Demo Ph:+1 (646) 880-9474


For Details : Folkstrain.com

1. What is Hadoop MapReduce?


Hadoop MapReduce is the framework used to process large data sets in
parallel across a Hadoop cluster. Data analysis follows a two-step
process: a map phase and a reduce phase.

2. How does Hadoop MapReduce work?


In MapReduce, the map phase counts the words in each document, while the
reduce phase aggregates the counts across the whole collection. During
the map phase the input data is divided into splits, which are analyzed
by map tasks running in parallel across the Hadoop cluster.
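The two phases above can be sketched in plain Python. This is a minimal simulation of the word-count flow, not Hadoop's actual implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Group the intermediate values by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate per-document counts into collection-wide totals."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big cluster", "big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

In a real job each map task would process one input split and the reduce tasks would run on separate nodes; here everything runs in one process to show the data flow.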


3. Differentiate between structured and unstructured data.


Data that can be stored in traditional database systems in the form of
rows and columns, for example online purchase transactions, is referred
to as structured data. Data that can be stored only partially in
traditional database systems (DBMS) is semi-structured data, while data
that fits neither category, such as images, videos, or free text, is
unstructured data.

4. Explain what shuffling in MapReduce is.


The process by which the system sorts the map outputs and transfers them
to the reducers as input is known as the shuffle.
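A minimal sketch of the shuffle's two jobs, partitioning and sorting, in Python. This mimics the idea behind Hadoop's default hash partitioner; the names `default_partition` and `shuffle_and_sort` are illustrative, not Hadoop API:

```python
def default_partition(key, num_reducers):
    """Hash the key to choose which reducer receives it."""
    return hash(key) % num_reducers

def shuffle_and_sort(map_outputs, num_reducers=2):
    """Route each (key, value) pair to a reducer partition, then sort
    each partition by key so a reducer sees all values for a key together."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_outputs:
        partitions[default_partition(key, num_reducers)].append((key, value))
    return [sorted(p) for p in partitions]

pairs = [("data", 1), ("big", 1), ("data", 1), ("cluster", 1)]
for i, partition in enumerate(shuffle_and_sort(pairs)):
    print(f"reducer {i}: {partition}")
```

Because all copies of a key hash to the same partition, every reducer can aggregate its keys without seeing any other reducer's data.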


5. On what concept does the Hadoop framework work?


1) HDFS - The Hadoop Distributed File System is the Java-based file
system for scalable and reliable storage of large datasets. Data in
HDFS is stored in the form of blocks, and it operates on a master-slave
architecture.
2) Hadoop MapReduce - This is the Java-based programming paradigm of the
Hadoop framework that provides scalability across various Hadoop
clusters. MapReduce distributes the workload into multiple tasks that
can run in parallel.
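The block storage idea in HDFS can be illustrated with a few lines of Python. This is only a conceptual sketch of how a file is cut into fixed-size blocks on write; `split_into_blocks` is an illustrative name, not an HDFS API:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (128 MB)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte stream into fixed-size blocks, as HDFS does on write;
    each block is then replicated across DataNodes, while the NameNode
    (the master) only records which blocks make up the file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Small demo: a 10-byte "file" with a 4-byte block size.
blocks = split_into_blocks(b"0123456789", block_size=4)
print(blocks)  # [b'0123', b'4567', b'89']
```

Note that the last block may be smaller than the block size; HDFS does not pad it.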


6.What are the main components of a Hadoop Application?


Core components of a Hadoop application are:
Hadoop Common
HDFS
Hadoop MapReduce
YARN

7. Explain what distributed cache in the MapReduce framework is.


Distributed cache is an important feature provided by the MapReduce
framework. When you want to share some files across all nodes in a
Hadoop cluster, DistributedCache is used. The files can be executable
jar files or simple properties files.


8. What is Hadoop streaming?


The Hadoop distribution includes a generic application programming
interface for writing map and reduce jobs in any desired programming
language such as Perl, Ruby, or Python. This is referred to as Hadoop
Streaming. Users can create and run jobs with any kind of shell script
or executable as the mapper or reducer.
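With Hadoop Streaming, a mapper and reducer are ordinary programs that read lines from stdin and write tab-separated records to stdout. The sketch below shows the classic word-count pair as testable Python functions; in a real job each would be a standalone script reading `sys.stdin`:

```python
from itertools import groupby

def mapper(lines):
    """Streaming mapper: emit one tab-separated "word<TAB>1" record per
    word (Hadoop Streaming reads these from the script's stdout)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Streaming reducer: its input arrives sorted by key, so records for
    the same word are adjacent and can be summed with groupby."""
    parsed = (line.split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Hadoop sorts the mapper output before it reaches the reducer:
map_out = sorted(mapper(["big data big", "data"]))
print(list(reducer(map_out)))  # ['big\t2', 'data\t2']
```

The sort between the two functions stands in for the framework's shuffle; the reducer relies on that ordering, exactly as a streaming reducer does in a real job.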

9. Explain what NameNode in Hadoop is.


The NameNode is the master node of HDFS. It maintains the file system
namespace and the metadata for all files and directories, including the
mapping of files to blocks and the DataNodes on which those blocks are
stored. It does not store the actual data itself; that is the job of the
DataNodes.

10. What is the best hardware configuration to run Hadoop?


A good configuration for executing Hadoop jobs is dual-core machines or
dual processors with 4GB or 8GB RAM that use ECC memory. Hadoop benefits
greatly from ECC memory, although it is not low-end hardware. ECC memory
is recommended for running Hadoop because most Hadoop users have
experienced various checksum errors when using non-ECC memory.


11. What are the most commonly defined input formats in Hadoop?


The three most common input formats defined in Hadoop are:
Text Input Format: the default input format in Hadoop.
Key Value Input Format: this input format is used for plain text files
where the files are broken down into lines.
Sequence File Input Format: this input format is used for reading files
in sequence.
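The key/value splitting behavior of the Key Value Input Format can be mimicked in a few lines. This is only a sketch of the idea (each line is split at the first separator, tab by default); `key_value_records` is an illustrative name, not a Hadoop API:

```python
def key_value_records(lines, separator="\t"):
    """Split each line at the first separator: the text before it
    becomes the key, everything after it becomes the value."""
    for line in lines:
        key, _, value = line.partition(separator)
        yield (key, value)

records = list(key_value_records(["user1\tlogin", "user2\tlogout ok"]))
print(records)  # [('user1', 'login'), ('user2', 'logout ok')]
```

By contrast, the default Text Input Format hands the mapper the byte offset of each line as the key and the whole line as the value.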


12. What do the four Vs of Big Data denote?


A simple explanation of the four critical features of big data:
a) Volume: scale of data
b) Velocity: analysis of streaming data
c) Variety: different forms of data
d) Veracity: uncertainty of data

13. Explain what heartbeat in HDFS is.


A heartbeat is a signal sent between a DataNode and the NameNode, and
between a TaskTracker and the JobTracker. If the NameNode or JobTracker
does not receive the signal, it is assumed that there is some issue with
the DataNode or TaskTracker.


14. What is WebDAV in Hadoop?


WebDAV is a set of extensions to HTTP that supports editing and updating
files. On most operating systems, WebDAV shares can be mounted as file
systems, so it is possible to access HDFS as a standard file system by
exposing HDFS over WebDAV.

15. What is Sqoop in Hadoop?


Sqoop is a tool used to transfer data between Hadoop HDFS and relational
database management systems (RDBMS). Using Sqoop, data can be imported
from an RDBMS such as MySQL or Oracle into HDFS, as well as exported
from HDFS back into an RDBMS.

Thank You
