This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on the History of Hadoop.
1. IBM and ________ have announced a major initiative to use Hadoop to support university
courses in distributed computer programming.
a) Google Latitude
b) Android (operating system)
c) Google Variations
d) Google
Answer:d
Explanation:Google and IBM announced a joint university initiative to address internet-scale computing challenges.
2. Point out the correct statement :
a) Hadoop is an ideal environment for extracting and transforming small volumes of data
b) Hadoop stores data in HDFS and supports data compression/decompression
c) The Giraph framework is less useful than a MapReduce job to solve graph and machine learning problems
d) None of the mentioned
Answer:b
Explanation:Data compression can be achieved using compression algorithms like bzip2, gzip,
LZO, etc. Different algorithms can be used in different scenarios based on their capabilities.
3. What license is Hadoop distributed under ?
a) Apache License 2.0
b) Mozilla Public License
c) Shareware
d) Commercial
Answer:a
Explanation:Hadoop is Open Source, released under Apache 2 license.
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully functional
Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux

Answer:b
Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.
5. Which of the following genres does Hadoop produce ?
a) Distributed file system
b) JAX-RS
c) Java Message Service
d) Relational Database Management System
Answer:a
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.
6. What was Hadoop written in ?
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)
Answer:c
Explanation: The Hadoop framework itself is mostly written in the Java programming language,
with some native code in C and command line utilities written as shell-scripts.
7. Which of the following platforms does Hadoop run on ?
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
Answer:c
Explanation:Hadoop supports cross-platform operating systems.
8. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not
require ________ storage on hosts.
a) RAID
b) Standard RAID levels
c) ZFS
d) Operating system

Answer:a
Explanation:With the default replication value, 3, data is stored on three nodes: two on the same
rack, and one on a different rack.
9. Above the file systems comes the ________ engine, which consists of one Job Tracker, to
which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook
Answer:a
Explanation:The MapReduce engine is used to distribute work around a cluster.

10. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and
matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence
Answer:a
Explanation: The Apache Mahout project's goal is to build a scalable machine learning tool.
1. As companies move past the experimental phase with Hadoop, many cite the need for
additional capabilities, including:
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management and SQL support
Answer:d
Explanation:Adding security to Hadoop is challenging because all the interactions do not follow the classic client-server pattern.
2. Point out the correct statement :
a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework, output files are divided into lines or records

d) None of the mentioned


Answer:b
Explanation:Hadoop batch-processes data distributed over a number of computers ranging in the hundreds and thousands.
3. According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop ?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Answer:a
Explanation:Data warehousing integrated with Hadoop would give a better understanding of the data.
4. Hadoop is a framework that works with a variety of related tools. Common cohorts include:
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
Answer:a
Explanation:To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.
5. Point out the wrong statement :
a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
b) Hadoop uses a programming model called MapReduce; all programs should conform to this model in order to work on the Hadoop platform
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
d) All of the mentioned
Answer:c
Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.
6. What was Hadoop named after?
a) Creator Doug Cutting's favorite circus act
b) Cutting's high school rock band
c) The toy elephant of Cutting's son
d) A sound Cutting's laptop made during Hadoop's development
Answer:c
Explanation:Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
7. All of the following accurately describe Hadoop, EXCEPT:
a) Open source
b) Real-time
c) Java-based
d) Distributed computing approach
Answer:b
Explanation:Apache Hadoop is an open-source software framework for distributed storage and
distributed processing of Big Data on clusters of commodity hardware.
8. __________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned
Answer:a
Explanation:MapReduce is a programming model and an associated implementation for
processing and generating large data sets with a parallel, distributed algorithm.
9. __________ has the world's largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned
Answer:c
Explanation:Facebook has many Hadoop clusters; the largest among them is the one used for data warehousing.
10. Facebook Tackles Big Data With _______ based on Hadoop.
a) Project Prism

b) Prism
c) Project Big
d) Project Data
Answer:a
Explanation:Prism automatically replicates and moves data wherever it's needed across a vast network of computing facilities.
Hadoop Questions and Answers Hadoop Ecosystem
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Hadoop
Ecosystem.
1. ________ is a platform for constructing data flows for extract, transform, and load (ETL)
processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
Answer:c
Explanation:Apache Pig is a platform for analyzing large data sets that consists of a high-level
language for expressing data analysis programs.
2. Point out the correct statement :
a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to
querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned
Answer:a
Explanation:Hive is a SQL-based data warehouse system for Hadoop that facilitates data
summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible
file systems.
3. _________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
a) Scalding
b) HCatalog
c) Cascalog

d) All of the mentioned


Answer:c
Explanation:Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the
name Cascalog is a contraction of Cascading and Datalog.
4. Hive also supports custom extensions written in :
a) C#
b) Java
c) C
d) C++
Answer:b
Explanation:Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.
5. Point out the wrong statement :
a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering
b) Amazon Web Services Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned
Answer:a
Explanation:Rather than building Hadoop deployments manually on EC2 (Elastic Compute
Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation
commands, either through the AWS Web Console or through command-line tools.
6. ________ is the most popular high-level Java API in the Hadoop ecosystem.
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
Answer:d
Explanation:Cascading hides many of the complexities of MapReduce programming behind
more intuitive pipes and data flow abstractions.
7. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) Mapreduce

b) Drill
c) Oozie
d) None of the mentioned
Answer:a
Explanation:Mapreduce provides a flexible and scalable foundation for analytics, from
traditional reporting to leading-edge machine learning algorithms.
8. The Pig Latin scripting language is not only a higher-level data flow language but also has
operators similar to :
a) SQL
b) JSON
c) XML
d) All of the mentioned
Answer:a
Explanation:Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL
and the low-level procedural style of MapReduce.
9. _______ jobs are optimized for scalability but not latency.
a) Mapreduce
b) Drill
c) Oozie
d) Hive
Answer:d
Explanation:Hive Queries are translated to MapReduce jobs to exploit the scalability of
MapReduce.

10. ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa
Answer:c
Explanation:In the context of Hadoop, Avro can be used to pass data from one program or
language to another.

Hadoop Questions and Answers Introduction to Mapreduce


This set of Multiple Choice Questions & Answers (MCQs) focuses on MapReduce.
1. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the
JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
Answer:c
Explanation:The TaskTracker receives the information necessary for execution of a Task from the JobTracker, executes the Task, and sends the results back to the JobTracker.
2. Point out the correct statement :
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned
Answer:a
Explanation:This feature of MapReduce is called Data Locality.
3. ___________ part of the MapReduce is responsible for processing one or more chunks of data
and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
Answer:a
Explanation:Map Task in MapReduce is performed using the Map() function.
4. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
a) Reduce
b) Map
c) Reducer

d) All of the mentioned


Answer:a
Explanation:Reduce function collates the work and resolves the results.
5. Point out the wrong statement :
a) A MapReduce job usually splits the input data-set into independent chunks which are
processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and
reduce methods
d) None of the mentioned
Answer:d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.
6. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in :
a) Java
b) C
c) C#
d) None of the mentioned
Answer:a
Explanation:Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non-JNI based).
7. ________ is a utility which allows users to create and run jobs with any executables as the
mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer:b
Explanation:Hadoop streaming is one of the most important utilities in the Apache Hadoop
distribution.

8. __________ maps input key/value pairs to a set of intermediate key/value pairs.


a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
Answer:a
Explanation:Maps are the individual tasks that transform input records into intermediate records.
9. The number of maps is usually driven by the total size of :
a) inputs
b) outputs
c) tasks
d) None of the mentioned
Answer:a
Explanation:Total size of inputs means total number of blocks of the input files.
10. _________ is the default Partitioner for partitioning key space.
a) HashPar
b) Partitioner
c) HashPartitioner
d) None of the mentioned
Answer:c
Explanation: The default partitioner in Hadoop is the HashPartitioner, which has a getPartition() method that decides which partition a given key/value pair goes to.
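For reference, a minimal sketch of a partitioner in the classic org.apache.hadoop.mapred API that routes keys the same way HashPartitioner does (the class name HashLikePartitioner is purely illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class HashLikePartitioner implements Partitioner<Text, IntWritable> {
    // Same idea as HashPartitioner: mask off the sign bit, then take the
    // modulo of the number of reducers so every key maps to one partition.
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public void configure(JobConf job) {
        // No configuration needed for this sketch.
    }
}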
Hadoop Questions and Answers Analyzing Data with Hadoop
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Analyzing Data
with Hadoop.
1. Mapper implementations are passed the JobConf for the job via the ________ method
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned

Answer:b
Explanation:Mapper implementations override the JobConfigurable.configure method to initialize themselves.
2. Point out the correct statement :
a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by
the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value)
format
d) All of the mentioned
Answer: d
Explanation:Reporters can be used to set application-level status messages and update Counters.
3. Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned
Answer:a
Explanation:In Shuffle phase the framework fetches the relevant partition of the output of all the
mappers, via HTTP.
4. The right number of reduces seems to be :
a) 0.90
b) 0.80
c) 0.36
d) 0.95
Answer:d
Explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum).
5. Point out the wrong statement :
a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases load
balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have output the

same key) in sort stage


Answer:a
Explanation:Reducer has 3 primary phases: shuffle, sort and reduce.
6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned
Answer:d
Explanation: The output of the reduce task is typically written to the FileSystem. The output of
the Reducer is not sorted.
7. Which of the following phases occur simultaneously ?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned
Answer:a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being
fetched they are merged.
8. Mapper and Reducer implementations can use the ________ to report progress or just indicate
that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
Answer:c
Explanation:Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.
9. __________ is a generalization of the facility provided by the MapReduce framework to
collect data output by the Mapper or the Reducer
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer:b
Explanation:Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
10. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop
framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the mentioned
Answer:b
Explanation:JobConf represents a MapReduce job configuration.
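As an illustration, a minimal sketch of describing a job through JobConf in the classic mapred API, using the bundled IdentityMapper and IdentityReducer so the example stays self-contained (input and output paths come from the command line):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class IdentityJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentityJobDriver.class);
        conf.setJobName("identity-job");
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setNumReduceTasks(2);                 // reduce count is set by the user
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);                    // submit the job and wait
    }
}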
Hadoop Questions and Answers Scaling out in Hadoop
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Scaling out in
Hadoop.
1. ________ systems are scale-out file-based (HDD) systems moving to more uses of memory in
the nodes.
a) NoSQL
b) NewSQL
c) SQL
d) All of the mentioned
Answer:a
Explanation: NoSQL systems make the most sense whenever the application is based on data
with varying data types and the data can be stored in key-value notation.
2. Point out the correct statement :
a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
b) HDFS runs on a small cluster of commodity-class nodes
c) NEWSQL is frequently the collection point for big data
d) None of the mentioned
Answer:a
Explanation:Hadoop, together with a relational data warehouse, can form a very effective data warehouse infrastructure.

3. Hadoop data is not sequenced and is in 64MB to 256 MB block sizes of delimited record
values with schema applied on read based on:
a) HCatalog
b) Hive
c) Hbase
d) All of the mentioned
Answer:a
Explanation:Other means of tagging the values also can be used.
4. __________ are highly resilient and eliminate the single-point-of-failure risk with traditional
Hadoop deployments
a) EMR
b) Isilon solutions
c) AWS
d) None of the mentioned
Answer:b
Explanation:Enterprise data protection and security options, including file system auditing and data-at-rest encryption to address compliance requirements, are also provided by the Isilon solution.
5. Point out the wrong statement :
a) EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and
highly efficient storage platform
b) Isilon's native HDFS integration means you can avoid the need to invest in a separate Hadoop
infrastructure
c) NoSQL systems do provide high latency access and accommodate less concurrent users
d) None of the mentioned
Answer:c
Explanation:NoSQL systems do provide low latency access and accommodate many concurrent
users.
6. HDFS and NoSQL file systems focus almost exclusively on adding nodes to :
a) Scale out
b) Scale up
c) Both Scale out and up
d) None of the mentioned

Answer:a
Explanation:HDFS and NoSQL file systems focus almost exclusively on adding nodes to
increase performance (scale-out) but even they require node configuration with elements of scale
up.
7. Which is the most popular NoSQL database for scalable big data store with Hadoop ?
a) Hbase
b) MongoDB
c) Cassandra
d) None of the mentioned
Answer:a
Explanation:HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables (billions of rows by millions of columns) on clusters built with commodity hardware.
8. The ___________ can also be used to distribute both jars and native libraries for use in the
map and/or reduce tasks.
a) DataCache
b) DistributedData
c) DistributedCache
d) All of the mentioned
Answer:c
Explanation: The child-jvm always has its current working directory added to the
java.library.path and LD_LIBRARY_PATH.
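For illustration, a minimal sketch of using the classic DistributedCache API to ship a jar and a native library to the map/reduce tasks; the HDFS paths are placeholders and must already exist:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    public static void configureCache(JobConf conf) throws Exception {
        // Add a jar to the classpath of the map/reduce tasks (path is illustrative).
        DistributedCache.addFileToClassPath(new Path("/lib/myjob.jar"), conf);
        // Ship a native library and symlink it into the task's working directory.
        DistributedCache.addCacheFile(new URI("/native/libfoo.so#libfoo.so"), conf);
        DistributedCache.createSymlink(conf);
    }
}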
9. HBase provides ___________ like capabilities on top of Hadoop and HDFS.
a) TopTable
b) BigTop
c) Bigtable
d) None of the mentioned
Answer:c
Explanation: Google Bigtable leverages the distributed data storage provided by the Google File
System.
10. _______ refers to incremental costs with no major impact on solution design, performance
and complexity.
a) Scale-out

b) Scale-down
c) Scale-up
d) None of the mentioned
Answer:c
Explanation:Adding more CPU/RAM/disk capacity to a Hadoop DataNode that is already part of a cluster does not require additional network switches.
Hadoop Questions and Answers Hadoop Streaming
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Hadoop
Streaming.
1. Streaming supports streaming command options as well as _________ command options.
a) generic
b) tool
c) library
d) task
Answer:a
Explanation:Place the generic options before the streaming options, otherwise the command will
fail.
2. Point out the correct statement :
a) You can specify any executable as the mapper and/or the reducer
b) You cannot supply a Java class as the mapper and/or the reducer
c) The class you supply for the output format should return key/value pairs of Text class
d) All of the mentioned
Answer:a
Explanation:If you do not specify an input format class, the TextInputFormat is used as the
default.
3. Which of the following Hadoop streaming command option parameter is required ?
a) output directoryname
b) mapper executable
c) input directoryname
d) All of the mentioned

Answer:d
Explanation:The required parameters specify the input and output locations and the mapper and reducer executables.
4. To set an environment variable in a streaming command use:
a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/
Answer:c
Explanation:An environment variable is set using the -cmdenv option.
5. Point out the wrong statement :
a) Hadoop has a library package called Aggregate
b) Aggregate allows you to define a mapper plugin class that is expected to generate
aggregatable items for each input key/value pair of the mappers
c) To use Aggregate, simply specify -mapper aggregate
d) None of the mentioned
Answer:c
Explanation:To use Aggregate, simply specify -reducer aggregate.
6. The ________ option allows you to copy jars locally to the current working directory of tasks
and automatically unjar the files.
a) archives
b) files
c) task
d) None of the mentioned
Answer:a
Explanation:The -archives option is also a generic option.
7. ______________ class allows the Map/Reduce framework to partition the map outputs based
on certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
d) None of the mentioned

Answer:b
Explanation: The primary key is used for partitioning, and the combination of the primary and
secondary keys is used for sorting.
8. Which of the following class provides a subset of features provided by the Unix/GNU Sort ?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
d) All of the mentioned
Answer:c
Explanation:Hadoop has a library class, KeyFieldBasedComparator, that is useful for many
applications.
9. Which of the following class is provided by Aggregate package ?
a) Map
b) Reducer
c) Reduce
d) None of the mentioned
Answer:b
Explanation:Aggregate provides a special reducer class and a special combiner class, and a list of
simple aggregators that perform aggregations such as sum, max, min and so on over a
sequence of values.
10.Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that
effectively allows you to process text data like the unix ______ utility.
a) Copy
b) Cut
c) Paste
d) Move
Answer:b
Explanation: The map function defined in the class treats each input key/value pair as a list of
fields.
Hadoop Questions and Answers Introduction to HDFS
This set of Multiple Choice Questions & Answers (MCQs) focuses on Hadoop Filesystem
HDFS.

1. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
Answer:b
Explanation:All the metadata related to HDFS including the information about data nodes, files
stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
2. Point out the correct statement :
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault
tolerance
d) None of the mentioned
Answer:a
Explanation:There can be any number of DataNodes in a Hadoop Cluster.
3. HDFS works in a __________ fashion.
a) master-worker
b) master-slave
c) worker/slave.
d) All of the mentioned
Answer:a
Explanation:The NameNode serves as the master and each DataNode serves as a worker/slave.
4. ________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned
Answer:c
Explanation:Secondary namenode is used for all time availability and reliability.
5. Point out the wrong statement :
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file

level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that
DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to
Answer:d
Explanation: The NameNode, not the DataNode, is aware of the files to which the blocks stored on a DataNode belong.
6. Which of the following scenario may not be a good fit for HDFS ?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned
Answer:a
Explanation:HDFS can be used for storing archive data since it is cheaper, as HDFS allows storing the data on low-cost commodity hardware while ensuring a high degree of fault-tolerance.
7. The need for data replication can arise in various scenarios like :
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
Answer:d
Explanation:Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.
8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
Answer:a
Explanation: A DataNode stores data in the Hadoop File System. A functional filesystem has more than one DataNode, with data replicated across them.

9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) HDFS Shell
b) FS Shell
c) DFS Shell
d) None of the mentioned
Answer:b
Explanation: The File System (FS) shell includes various shell-like commands that directly
interact with the Hadoop Distributed File System (HDFS).
10. HDFS is implemented in _____________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned
Answer:b
Explanation:HDFS is implemented in Java and any computer which can run Java can host a
NameNode/DataNode on it.
Hadoop Questions and Answers Java Interface
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Java Interface.
1. In order to read any file in HDFS, an instance of __________ is required.
a) filesystem
b) datastream
c) outstream
d) inputstream
Answer:a
Explanation:The FileSystem's open() method returns an FSDataInputStream, which is used to read data from the file.
2. Point out the correct statement :
a) The framework groups Reducer inputs by keys
b) The shuffle and sort phases occur simultaneously i.e. while outputs are being fetched they are
merged
c) Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how intermediate
keys are grouped, these can be used in conjunction to simulate secondary sort on values
d) All of the mentioned

Answer:d
Explanation:If equivalence rules for keys while grouping the intermediates are different from
those for grouping keys before reduction, then one may specify a Comparator.
3. ______________ provides a method to copy bytes from an input stream to any other stream in Hadoop.
a) IOUtils
b) Utils
c) IUtils
d) All of the mentioned
Answer:a
Explanation:The IOUtils class provides static utility methods for I/O; copyBytes() copies data from one stream to another.
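Putting the last two questions together, a minimal sketch of reading an HDFS file: obtain a FileSystem instance, open the file, and copy its bytes to standard output with IOUtils.copyBytes():

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);        // FileSystem instance
        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));         // returns an FSDataInputStream
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}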
4. _____________ is used to read data from byte buffers.
a) write()
b) read()
c) readwrite()
d) All of the mentioned
Answer:b
Explanation:The readFully() method can also be used instead of the read() method.
5. Point out the wrong statement :
a) The framework calls reduce method for each pair in the grouped inputs
b) The output of the Reducer is re-sorted
c) reduce method reduces values for a given key
d) None of the mentioned
Answer:b
Explanation: The output of the Reducer is not re-sorted.
6. Interface ____________ reduces a set of intermediate values which share a key to a smaller
set of values.
a) Mapper
b) Reducer
c) Writable
d) Readable
Answer:b
Explanation:Reducer implementations can access the JobConf for the job.

7. Reducer is input the grouped output of a :


a) Mapper
b) Reducer
c) Writable
d) Readable
Answer:a
Explanation:In the shuffle phase, the framework fetches the relevant partition of the output of all the Mappers, via HTTP, for each Reducer.
8. The output of the reduce task is typically written to the FileSystem via :
a) OutputCollector
b) InputCollector
c) OutputCollect
d) All of the mentioned
Answer:a
Explanation:In reduce phase the reduce(Object, Iterator, OutputCollector, Reporter) method is
called for each pair in the grouped inputs.
9. Applications can use the _________ provided to report progress or just indicate that they are
alive.
a) Collector
b) Reporter
c) Dashboard
d) None of the mentioned
Answer:b
Explanation:In scenarios where the application takes a significant amount of time to process
individual key/value pairs, this is crucial since the framework might assume that the task has
timed-out and kill that task.
10. Which of the following parameters is used to collect keys and combined values ?
a) key
b) values
c) reporter
d) output

Answer:d
Explanation:The output parameter collects keys and combined values, while the reporter parameter provides the facility to report progress.
Hadoop Questions and Answers Data Flow
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Data Flow.
1. ________ is a programming model designed for processing large volumes of data in parallel
by dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene
Answer:b
Explanation:MapReduce is the heart of Hadoop.
2. Point out the correct statement :
a) Data locality means movement of algorithm to the data instead of data to algorithm
b) When the processing is done on the data, the algorithm is moved across the Action Nodes rather than data to the algorithm
c) Moving Computation is more expensive than Moving Data
d) None of the mentioned
Answer:a
Explanation:Data flow framework possesses the feature of data locality.
3. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) All of the mentioned
Answer:a
Explanation:Map-Reduce jobs are submitted on job-tracker.
4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the
work as close to the data as possible
a) DataNodes
b) TaskTracker
c) ActionNodes

d) All of the mentioned


Answer:b
Explanation:A heartbeat is sent from the TaskTracker to the JobTracker every few minutes so the JobTracker can check whether the node is dead or alive.
5. Point out the wrong statement :
a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) -> list(K2, V2)
b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) -> list(K3, V3)
c) MapReduce has a complex model of data processing: inputs and outputs for the map and
reduce functions are key-value pairs
d) None of the mentioned
Answer:c
Explanation:MapReduce is relatively simple model to implement in Hadoop.
6. InputFormat class calls the ________ function and computes splits for each file and then sends
them to the jobtracker.
a) puts
b) gets
c) getSplits
d) All of the mentioned
Answer:c
Explanation:The jobtracker uses the splits' storage locations to schedule map tasks to process them on the tasktrackers.
7. On a tasktracker, the map task passes the split to the createRecordReader() method on
InputFormat to obtain a _________ for that split.
a) InputReader
b) RecordReader
c) OutputReader
d) None of the mentioned
Answer:b
Explanation: The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.

8. The default InputFormat is __________, which treats each line of input as a new value; the associated key is the byte offset of the line.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned
Answer:b
Explanation:A RecordReader is little more than an iterator over records, and the map task uses
one to generate record key-value pairs.
9. __________ controls the partitioning of the keys of the intermediate map-outputs.
a) Collector
b) Partitioner
c) InputFormat
d) None of the mentioned
Answer:b
Explanation: The output of the mapper is sent to the partitioner.
10. Output of the mapper is first written on the local disk for sorting and _________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing
Answer:a
Explanation:All values corresponding to the same key will go to the same reducer.
Hadoop Questions and Answers Hadoop Archives
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Hadoop
Archives.
1. _________ is the name of the archive you would like to create.
a) archive
b) archiveName
c) Name
d) None of the mentioned

Answer:b
Explanation: The name should have a *.har extension.
2. Point out the correct statement :
a) A Hadoop archive maps to a file system directory
b) Hadoop archives are special format archives
c) A Hadoop archive always has a *.har extension
d) All of the mentioned
Answer:d
Explanation:A Hadoop archive directory contains metadata (in the form of _index and
_masterindex) and data (part-*) files.
3. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem
than the default file system.
a) Hive
b) Pig
c) MapReduce
d) All of the mentioned
Answer:c
Explanation:Since Hadoop Archives are exposed as a file system, MapReduce is able to use all the logical input files in Hadoop Archives as input.
4. The __________ guarantees that excess resources taken from a queue will be restored to it
within N minutes of its need for them.
a) capacitor
b) scheduler
c) datanode
d) None of the mentioned
Answer:b
Explanation:Free resources can be allocated to any queue beyond its guaranteed capacity.
5. Point out the wrong statement :
a) The Hadoop archive exposes itself as a file system layer
b) Hadoop archives are immutable
c) Archive renames, deletes and creates return an error
d) None of the mentioned

Answer:d
Explanation:All the fs shell commands in the archives work but with a different URI.
6. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share
large clusters.
a) Flow Scheduler
b) Data Scheduler
c) Capacity Scheduler
d) None of the mentioned
Answer:c
Explanation: The Capacity Scheduler supports multiple queues, where a job is submitted to a queue.
7. Which of the following parameter describes destination directory which would contain the
archive ?
a) -archiveName
b) <source>
c) <destination>
d) None of the mentioned
Answer:c
Explanation: -archiveName is the name of the archive to be created.
8. _________ identifies filesystem pathnames which work as usual with regular expressions.
a) -archiveName <name>
b) <source>
c) <destination>
d) None of the mentioned
Answer:b
Explanation: <source> identifies filesystem pathnames which work as usual with regular expressions, while <destination> identifies the destination directory which would contain the archive.
9. __________ is the parent argument used to specify the relative path to which the files should
be archived to
a) -archiveName <name>
b) -p <parent_path>
c) <destination>
d) <source>

Answer:b
Explanation: The hadoop archive command creates a Hadoop archive, a file that contains other
files.
10. Which of the following is a valid syntax for hadoop archive ?
a)
hadooparchive [ Generic Options ] archive
-archiveName <name>
[-p <parent>]
<source>
<destination>
b)
hadooparch [ Generic Options ] archive
-archiveName <name>
[-p <parent>]
<source>
<destination>
c)
hadoop [ Generic Options ] archive
-archiveName <name>
[-p <parent>]
<source>
<destination>
d) None of the mentioned
Answer:c
Explanation: The Hadoop archiving tool can be invoked using the following command format:
hadoop archive -archiveName <name> -p <parent> <source> <destination>

Hadoop Questions and Answers Hadoop I/O


This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Hadoop I/O.
1. Hadoop comes with a set of ________ for data I/O.
a) methods
b) commands
c) classes
d) None of the mentioned
Answer:d
Explanation:Hadoop I/O consists of primitives for serialization and deserialization.
2. Point out the correct statement :
a) The sequence file also can contain a secondary key-value list that can be used as file
Metadata
b) SequenceFile formats share a header that contains some information which allows the reader
to recognize is format
c) There are Key and Value Class Names that allow the reader to instantiate those classes, via
reflection, for reading
d) All of the mentioned
Answer:d
Explanation:In contrast with other persistent key-value data structures like B-Trees, you can't seek to a specified key to edit, add or remove it.
3. Apache Hadoop's ___________ provides a persistent data structure for binary key-value pairs.
a) GetFile
b) SequenceFile
c) Putfile
d) All of the mentioned
Answer:b
Explanation:SequenceFile is append-only.
4. How many formats of SequenceFile are present in Hadoop I/O ?
a) 2
b) 3
c) 4
d) 5

Answer:b
Explanation:SequenceFile has 3 available formats: an uncompressed format, a record-compressed format and a block-compressed format.
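A minimal sketch of writing key-value records to a SequenceFile in the uncompressed format, using the classic createWriter() signature (the output path comes from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]);
        SequenceFile.Writer writer = null;
        try {
            // Key class and value class are recorded in the file header.
            writer = SequenceFile.createWriter(fs, conf, path,
                    IntWritable.class, Text.class);
            for (int i = 0; i < 5; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}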
5. Point out the wrong statement :
a) The data file contains all the key, value records but key N + 1 must be greater than or equal to
the key N
b) Sequence file is a kind of hadoop file based data structure
c) Map file type is splittable as it contains a sync point after several records
d) None of the mentioned
Answer:c
Explanation:Map file is again a kind of hadoop file based data structure and it differs from a
sequence file in a matter of the order.
6. Which of the following format is more compression-aggressive ?
a) Partition Compressed
b) Record Compressed
c) Block-Compressed
d) Uncompressed
Answer:c
Explanation:In the block-compressed format, both keys and values are collected in blocks and compressed together, which makes it the most compression-aggressive of the SequenceFile formats.
7. The __________ is a directory that contains two SequenceFile.
a) ReduceFile
b) MapperFile
c) MapFile
d) None of the mentioned
Answer:c
Explanation:The two SequenceFiles are the data file (/data) and the index file (/index).
8. The ______ file is populated with the key and a LongWritable that contains the starting byte
position of the record.
a) Array
b) Index
c) Immutable

d) All of the mentioned


Answer:b
Explanation:The index does not contain all the keys but just a fraction of the keys.
9. The _________ uses just the value field, append(value), and the key is a LongWritable that contains the record number, count + 1.
a) SetFile
b) ArrayFile
c) BloomMapFile
d) None of the mentioned
Answer:b
Explanation: The SetFile, instead of append(key, value), uses just the key field, append(key), and the value is always the NullWritable instance.
10. The ____________ data file format is based on the Avro serialization framework, which was primarily created for Hadoop.
a) Oozie
b) Avro
c) cTakes
d) Lucene
Answer:b
Explanation:Avro is a splittable data format with a metadata section at the beginning and then a sequence of Avro-serialized objects.
Hadoop Questions and Answers Compression
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Compression.
1. The _________ codec from Google provides modest compression ratios.
a) Snapcheck
b) Snappy
c) FileCompress
d) None of the mentioned
Answer:b
Explanation:Snappy has fast compression and decompression speeds.

2. Point out the correct statement :


a) Snappy is licensed under the GNU Public License (GPL)
b) BgCIK needs to create an index when it compresses a file
c) The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports
other Hadoop subprojects
d) None of the mentioned
Answer:c
Explanation:You can use Snappy as an add-on for more recent versions of Hadoop that do not yet
provide Snappy codec support.
3. Which of the following compression is similar to Snappy compression ?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned
Answer:a
Explanation:LZO is only really desirable if you need to compress text files.
4. Which of the following supports splittable compression ?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned
Answer:a
Explanation:LZO enables the parallel processing of compressed text file splits by your
MapReduce jobs.
5. Point out the wrong statement :
a) From a usability standpoint, LZO and Gzip are similar.
b) Bzip2 generates a better compression ratio than does Gzip, but its much slower
c) Gzip is a compression utility that was adopted by the GNU project
d) None of the mentioned
Answer:a
Explanation:From a usability standpoint, Bzip2 and Gzip are similar.

6. Which of the following is the slowest compression technique ?


a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned
Answer:b
Explanation:Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
7. Gzip (short for GNU zip) generates compressed files that have a _________ extension.
a) .gzip
b) .gz
c) .gzp
d) .g
Answer:b
Explanation:You can use the gunzip command to decompress files that were created by a number
of compression utilities, including Gzip.
8. Which of the following is based on the DEFLATE algorithm ?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned
Answer:c
Explanation:gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and
Huffman Coding.
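As a sketch of how a codec is wired in (classic mapred API), the following turns on gzip compression for a job's output files, so each part file gets a .gz extension:

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class GzipOutputConfig {
    // Compress the job's output files with the gzip codec.
    public static void enableGzipOutput(JobConf conf) {
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
    }
}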
9. __________ typically compresses files to within 10% to 15% of the best available techniques.
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned
Answer:b
Explanation:bzip2 is a freely available, patent-free, high-quality data compressor.
10. The LZO compression format is composed of approximately __________ blocks of
compressed data.

a) 128k
b) 256k
c) 24k
d) 36k
Answer:b
Explanation:LZO was designed with speed in mind: it decompresses about twice as fast as gzip, meaning it's fast enough to keep up with hard drive read speeds.
Hadoop Questions and Answers Data Integrity
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Data Integrity.
1. The HDFS client software implements __________ checking on the contents of HDFS files.
a) metastore
b) parity
c) checksum
d) None of the mentioned
Answer:c
Explanation:When a client creates an HDFS file, it computes a checksum of each block of the
file and stores these checksums in a separate hidden file in the same HDFS namespace.
2. Point out the correct statement :
a) The HDFS architecture is compatible with data rebalancing schemes
b) Datablocks support storing a copy of data at a particular instant of time.
c) HDFS currently support snapshots.
d) None of the mentioned
Answer:a
Explanation:A scheme might automatically move data from one DataNode to another if the free
space on a DataNode falls below a certain threshold.
3. The ___________ machine is a single point of failure for an HDFS cluster.
a) DataNode
b) NameNode
c) ActionNode
d) All of the mentioned

Answer:b
Explanation:If the NameNode machine fails, manual intervention is necessary. Currently,
automatic restart and failover of the NameNode software to another machine is not supported.
4. The ____________ and the EditLog are central data structures of HDFS.
a) DsImage
b) FsImage
c) FsImages
d) All of the mentioned
Answer:b
Explanation:A corruption of these files can cause the HDFS instance to be non-functional
5. Point out the wrong statement :
a) HDFS is designed to support small files only
b) Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get
updated synchronously
c) NameNode can be configured to support maintaining multiple copies of the FsImage and
EditLog
d) None of the mentioned
Answer:a
Explanation:HDFS is designed to support very large files.
6. __________ support storing a copy of data at a particular instant of time.
a) Data Image
b) Datanots
c) Snapshots
d) All of the mentioned
Answer:c
Explanation:One usage of the snapshot feature may be to roll back a corrupted HDFS instance to
a previously known good point in time.
7. Automatic restart and ____________ of the NameNode software to another machine is not
supported.
a) failover
b) end
c) scalability

d) All of the mentioned


Answer:a
Explanation:If the NameNode machine fails, manual intervention is necessary.
8. HDFS, by default, replicates each data block _____ times on different nodes and on at least
____ racks.
a) 3,2
b) 1,2
c) 2,3
d) All of the mentioned
Answer:a
Explanation:HDFS has a simple yet robust architecture that was explicitly designed for data
reliability in the face of faults and failures in disks, nodes and networks.
9. _________ stores its metadata on multiple disks that typically include a non-local file server.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer:b
Explanation:HDFS tolerates failures of storage servers (called DataNodes) and its disks
10. The HDFS file system is temporarily unavailable whenever the HDFS ________ is down.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned
Answer:b
Explanation:When the HDFS NameNode is restarted it recovers its metadata.
Hadoop Questions and Answers Serialization
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Serialization.
1. Apache _______ is a serialization framework that produces data in a compact binary format.
a) Oozie
b) Impala

c) kafka
d) Avro
Answer:d
Explanation:Apache Avro doesn't require proxy objects or code generation.
2. Point out the correct statement :
a) Apache Avro is a framework that allows you to serialize data in a format that has a schema
built in
b) The serialized data is in a compact binary format that doesn't require proxy objects or code
generation
c) Including schemas with the Avro messages allows any application to deserialize the data
d) All of the mentioned
Answer:d
Explanation:Instead of using generated proxy libraries and strong typing, Avro relies heavily on
the schemas that are sent along with the serialized data.
3. Avro schemas describe the format of the message and are defined using :
a) JSON
b) XML
c) JS
d) All of the mentioned
Answer:a
Explanation: The JSON schema content is put into a file.
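For illustration, a minimal sketch of an Avro schema written as JSON and parsed from Java; the record name and fields are made up for the example:

import org.apache.avro.Schema;

public class SchemaDemo {
    public static void main(String[] args) {
        // A simple record schema defined in JSON (field names are illustrative).
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        System.out.println(schema.toString(true));   // pretty-print the schema
    }
}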
4. The ____________ is an iterator which reads through the file and returns objects using the
next() method.
a) DatReader
b) DatumReader
c) DatumRead
d) None of the mentioned
Answer:b
Explanation:DatumReader reads the content through the DataFileReader implementation.
5. Point out the wrong statement :
a) Java code is used to deserialize the contents of the file into objects
b) Avro allows you to use complex data structures within Hadoop MapReduce jobs

c) The m2e plug-in automatically downloads the newly added JAR files and their dependencies
d) None of the mentioned
Answer:d
Explanation:A unit test is useful because you can make assertions to verify that the values of the
deserialized object are the same as the original values.
6. The ____________ class extends and implements several Hadoop-supplied interfaces.
a) AvroReducer
b) Mapper
c) AvroMapper
d) None of the mentioned
Answer:c
Explanation:AvroMapper is used to provide the ability to collect or map data.
7. ____________ class accepts the values that the ModelCountMapper object has collected.
a) AvroReducer
b) Mapper
c) AvroMapper
d) None of the mentioned
Answer:a
Explanation:AvroReducer summarizes them by looping through the values.
8. The ________ method in the ModelCountReducer class reduces the values the mapper
collects into a derived value
a) count
b) add
c) reduce
d) All of the mentioned
Answer:c
Explanation:In some cases, it can be a simple sum of the values.
9. Which of the following works well with Avro ?
a) Lucene
b) kafka
c) MapReduce

d) None of the mentioned


Answer:c
Explanation:You can use Avro and MapReduce together to process many items serialized with Avro's small binary format.
10. __________ tools is used to generate proxy objects in Java to easily work with the objects.
a) Lucene
b) kafka
c) MapReduce
d) Avro
Answer:d
Explanation:Avro serialization includes the schema with it in JSON format which allows
you to have different versions of objects.
Hadoop Questions and Answers Avro-1
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Avro-1.
1. Avro schemas are defined with _____
a) JSON
b) XML
c) JAVA
d) All of the mentioned
Answer:a
Explanation:JSON implementation facilitates implementation in languages that already have
JSON libraries.
2. Point out the correct statement :
a) Avro provides functionality similar to systems such as Thrift
b) When Avro is used in RPC, the client and server exchange data in the connection handshake
c) Apache Avro, Avro, Apache, and the Avro and Apache logos are trademarks of The Java
Foundation
d) None of the mentioned
Answer:a
Explanation:Avro differs from these systems in the fundamental aspects like untagged data.

3. __________ facilitates construction of generic data-processing systems and languages.


a) Untagged data
b) Dynamic typing
c) No manually-assigned field IDs
d) All of the mentioned
Answer:b
Explanation:Avro does not require that code be generated.
4. With ______ we can store data and read it easily with various programming languages
a) Thrift
b) Protocol Buffers
c) Avro
d) None of the mentioned
Answer:c
Explanation:Avro is optimized to minimize the disk space needed by our data and it is flexible.
5. Point out the wrong statement :
a) Apache Avro is a data serialization system
b) Avro provides simple integration with dynamic languages
c) Avro provides rich data structures
d) All of the mentioned
Answer:d
Explanation: Code generation is not required to read or write data files nor to use or implement
RPC protocols in Avro.
6. ________ are a way of encoding structured data in an efficient yet extensible format.
a) Thrift
b) Protocol Buffers
c) Avro
d) None of the mentioned
Answer:b
Explanation:Google uses Protocol Buffers for almost all of its internal RPC protocols and file
formats.
7. Thrift resolves possible conflicts through _________ of the field.
a) Name

b) Static number
c) UID
d) None of the mentioned
Answer:b
Explanation:Avro resolves possible conflicts through the name of the field.
8. Avro is said to be the future _______ layer of Hadoop.
a) RMC
b) RPC
c) RDC
d) All of the mentioned
Answer:b
Explanation:When Avro is used in RPC, the client and server exchange schemas in the
connection handshake.
9. When using reflection to automatically build our schemas without code generation, we need to
configure Avro using :
a) AvroJob.Reflect(jConf);
b) AvroJob.setReflect(jConf);
c) Job.setReflect(jConf);
d) None of the mentioned

Answer:c
Explanation:For strongly typed languages like Java, it also provides a generation code layer,
including RPC services code generation.
10. We can declare the schema of our data either in a ______ file.
a) JSON
b) XML
c) SQL
d) R
Answer:a
Explanation:The schema can be declared in a JSON file, through an IDL, or simply through Java beans by using reflection-based schema building.

Hadoop Questions and Answers Avro-2


This set of Interview Questions and Answers focuses on Avro.
1. Which of the following is a primitive data type in Avro ?
a) null
b) boolean
c) float
d) All of the mentioned
Answer:d
Explanation:Primitive type names are also defined type names.
2. Point out the correct statement :
a) Records use the type name record and support three attributes
b) Enum are represented using JSON arrays
c) Avro data is always serialized with its schema
d) All of the mentioned
Answer:a
Explanation:A record is encoded by encoding the values of its fields in the order that they are
declared.
3. Avro supports ______ kinds of complex types.
a) 3
b) 4
c) 6
d) 7
Answer:c
Explanation:Avro supports six kinds of complex types: records, enums, arrays, maps, unions and
fixed.
4.________ are encoded as a series of blocks.
a) Arrays
b) Enum
c) Unions
d) Maps

Answer:a
Explanation:Each block of an array consists of a long count value, followed by that many array items. A block with count zero indicates the end of the array. Each item is encoded per the array's item schema.
5. Point out the wrong statement :
a) Record, enums and fixed are named types
b) Unions may immediately contain other unions
c) A namespace is a dot-separated sequence of such names
d) All of the mentioned
Answer:b
Explanation:Unions may not immediately contain other unions.
6. ________ instances are encoded using the number of bytes declared in the schema.
a) Fixed
b) Enum
c) Unions
d) Maps
Answer:a
Explanation:Except for unions, the JSON encoding is the same as is used to encode field default
values.
7. ________ permits data written by one system to be efficiently sorted by another system.
a) Complex Data type
b) Order
c) Sort Order
d) All of the mentioned
Answer:c
Explanation:Avro binary-encoded data can be efficiently ordered without deserializing it to
objects.
8. _____________ are used between blocks to permit efficient splitting of files for MapReduce
processing.
a) Codec
b) Data Marker
c) Synchronization markers

d) All of the mentioned


Answer:c
Explanation:Avro includes a simple object container file format.
9. The __________ codec uses Google's Snappy compression library.
a) null
b) snappy
c) deflate
d) None of the mentioned
Answer:b
Explanation:Snappy is a compression library developed at Google, and, like many technologies
that come from Google, Snappy was designed to be fast.
10. Avro messages are framed as a list of _________
a) buffers
b) frames
c) rows
d) None of the mentioned
Answer:b
Explanation:Framing is a layer between messages and the transport. It exists to optimize certain
operations.
Hadoop Questions and Answers Metrics in HBase
This set of Interview Questions & Answers focuses on HBase.
1. _______ can change the maximum number of cells of a column family.
a) set
b) reset
c) alter
d) select
Answer:c
Explanation:Alter is the command used to make changes to an existing table.
2. Point out the correct statement :
a) You can add a column family to a table using the method addColumn()
b) Using alter, you can also create a column family

c) Using disable-all, you can truncate a column family


d) None of the mentioned
Answer:a
Explanation:Columns can also be added through HBaseAdmin.
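A minimal sketch (assuming the older HBaseAdmin client API) of adding a column family to an existing table; the table and family names are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AddColumnFamily {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Roughly equivalent to the shell command: alter 'employee', {NAME => 'contact'}
            admin.addColumn("employee", new HColumnDescriptor("contact"));
        } finally {
            admin.close();
        }
    }
}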
3. Which of the following is not a table scope operator ?
a) MEMSTORE_FLUSH
b) MEMSTORE_FLUSHSIZE
c) MAX_FILESIZE
d) All of the mentioned
Answer:a
Explanation:Using alter, you can set and remove table scope operators such as MAX_FILESIZE,
READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc.
4. You can delete a column family from a table using the method _________ of the HBaseAdmin
class.
a) delColumn()
b) removeColumn()
c) deleteColumn()
d) All of the mentioned
Answer:c
Explanation:Alter command also can be used to delete a column family.
5. Point out the wrong statement :
a) To read data from an HBase table, use the get() method of the HTable class
b) You can retrieve data from the HBase table using the get() method of the HTable class
c) While retrieving data, you can get a single row by id, or get a set of rows by a set of row ids,
or scan an entire table or a subset of rows
d) None of the mentioned
Answer:d
Explanation:You can retrieve HBase table data using the add method variants in the Get class.
6. __________ class adds HBase configuration files to its object.
a) Configuration
b) Collector
c) Component

d) None of the mentioned


Answer:a
Explanation:You can create a configuration object using the create() method of the
HbaseConfiguration class.
7. The ________ class provides the getValue() method to read the values from its instance.
a) Get
b) Result
c) Put
d) Value
Answer:b
Explanation:Get the result by passing your Get class instance to the get method of the HTable
class. This method returns the Result class object, which holds the requested result.
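A minimal sketch (older HTable client API) of reading one row and pulling a single cell out of the Result with getValue(); the table, row and column names are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadRow {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "employee");
        try {
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            // Read the value stored in column family "personal", qualifier "name".
            byte[] value = result.getValue(Bytes.toBytes("personal"),
                                           Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        } finally {
            table.close();
        }
    }
}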
8. ________ communicate with the client and handle data-related operations.
a) Master Server
b) Region Server
c) Htable
d) All of the mentioned
Answer:b
Explanation:Region Servers handle read and write requests for all the regions under them.
9. _________ is the main configuration file of HBase.
a) hbase.xml
b) hbase-site.xml
c) hbase-site-conf.xml
d) None of the mentioned
Answer:b
Explanation:Set the data directory to an appropriate location by opening the HBase home folder
in /usr/local/HBase.
10. HBase uses the _______ File System to store its data.
a) Hive
b) Imphala
c) Hadoop

d) Scala
Answer:c
Explanation: The data storage will be in the form of regions (tables). These regions will be split
up and stored in region servers.
Hadoop Questions and Answers MapReduce Development-2
This set of Questions & Answers focuses on Hadoop MapReduce.
1. The Mapper implementation processes one line at a time via _________ method.
a) map
b) reduce
c) mapper
d) reducer
Answer:a
Explanation: The Mapper outputs are sorted and then partitioned per Reducer.
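For illustration, here is a minimal word-count style mapper using the classic org.apache.hadoop.mapred API (JobConf, OutputCollector, Reporter) that these questions refer to; class and field names are illustrative. The framework calls map() once per input line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Word-count style mapper: map() is invoked once per line of input.
public class LineMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, ONE);   // emit intermediate key/value pairs
        }
    }
}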
2. Point out the correct statement :
a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
b) Applications typically implement the Mapper and Reducer interfaces to provide the map and
reduce methods
c) Mapper and Reducer interfaces form the core of the job
d) None of the mentioned
Answer:d
Explanation: The transformed intermediate records do not need to be of the same type as the
input records.
3. The Hadoop MapReduce framework spawns one map task for each __________ generated by
the InputFormat for the job.
a) OutputSplit
b) InputSplit
c) InputSplitStream
d) All of the mentioned
Answer:b
Explanation:Mapper implementations are passed the JobConf for the job via the
JobConfigurable.configure(JobConf) method and override it to initialize themselves.

4. Users can control which keys (and hence records) go to which Reducer by implementing a
custom :
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer:a
Explanation:Users can control the grouping by specifying a Comparator via
JobConf.setOutputKeyComparatorClass(Class).
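A hedged sketch of such a custom Partitioner with the old-style mapred interface; the first-letter routing rule is purely illustrative. It would be registered on the job with JobConf.setPartitionerClass(FirstLetterPartitioner.class).

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Partitioner;

// Routes keys to reducers by the first character of the key, so all keys that
// start with the same character land in the same partition (and hence reducer).
public class FirstLetterPartitioner extends MapReduceBase
        implements Partitioner<Text, IntWritable> {

    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}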
5. Point out the wrong statement :
a) The Mapper outputs are sorted and then partitioned per Reducer
b) The total number of partitions is the same as the number of reduce tasks for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value)
format
d) None of the mentioned
Answer:d
Explanation:All intermediate values associated with a given output key are subsequently grouped
by the framework, and passed to the Reducer(s) to determine the final output.
6. Applications can use the ____________ to report progress and set application-level status
messages
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer:c
Explanation:Reporter is also used to update Counters, or just indicate that they are alive.
7. The right level of parallelism for maps seems to be around _________ maps per-node
a) 1-10
b) 10-100
c) 100-150
d) 150-200

Answer:b
Explanation:Task setup takes a while, so it is best if the maps take at least a minute to execute.
8. The number of reduces for the job is set by the user via :
a) JobConf.setNumTasks(int)
b) JobConf.setNumReduceTasks(int)
c) JobConf.setNumMapTasks(int)
d) All of the mentioned
Answer:b
Explanation:Reducer has 3 primary phases: shuffle, sort and reduce.
9. The framework groups Reducer inputs by key in _________ stage.
a) sort
b) shuffle
c) reduce
d) None of the mentioned
Answer:a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being
fetched they are merged.
10. The output of the reduce task is typically written to the FileSystem via _____________ .
a) OutputCollector.collect
b) OutputCollector.get
c) OutputCollector.receive
d) OutputCollector.put
Answer:a
Explanation: The output of the Reducer is not sorted.
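To make the shuffle/sort/reduce flow concrete, here is a minimal summing reducer in the same old-style API, writing its output through OutputCollector.collect; names are illustrative.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums all values that the framework has grouped under the same key
// and writes the total to the FileSystem via OutputCollector.collect.
public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}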
Hadoop Questions and Answers MapReduce Features-1
This set of Hadoop Questions & Answers for freshers focuses on MapReduce Features.
1. Which of the following is the default Partitioner for Mapreduce ?
a) MergePartitioner
b) HashedPartitioner
c) HashPartitioner
d) None of the mentioned

Answer:c
Explanation: The total number of partitions is the same as the number of reduce tasks for the job.
2. Point out the correct statement :
a) The right number of reduces seems to be 0.95 or 1.75
b) Increasing the number of reduces increases the framework overhead
c) With 0.95 all of the reduces can launch immediately and start transferring map outputs as the
maps finish
d) All of the mentioned
Answer:c
Explanation:With 1.75 the faster nodes will finish their first round of reduces and launch a
second wave of reduces doing a much better job of load balancing.
3. Which of the following partitions the key space ?
a) Partitioner
b) Compactor
c) Collector
d) All of the mentioned
Answer:a
Explanation:Partitioner controls the partitioning of the keys of the intermediate map-outputs.
4. ____________ is a generalization of the facility provided by the MapReduce framework to
collect data output by the Mapper or the Reducer
a) OutputCompactor
b) OutputCollector
c) InputCollector
d) All of the mentioned
Answer:b
Explanation:Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
5. Point out the wrong statement :
a) It is legal to set the number of reduce-tasks to zero if no reduction is desired
b) The outputs of the map-tasks go directly to the FileSystem
c) The Mapreduce framework does not sort the map-outputs before writing them out to the
FileSystem

d) None of the mentioned


Answer:d
Explanation:Outputs of the map-tasks go directly to the FileSystem, into the output path set by
setOutputPath(Path).
6. __________ is the primary interface for a user to describe a MapReduce job to the Hadoop
framework for execution.
a) JobConfig
b) JobConf
c) JobConfiguration
d) All of the mentioned
Answer:b
Explanation:JobConf is typically used to specify the Mapper, combiner (if any), Partitioner,
Reducer, InputFormat, OutputFormat and OutputCommitter implementations.
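A short sketch of describing a job through JobConf, reusing the LineMapper and SumReducer classes sketched earlier; the input/output paths and the reduce-task count are hypothetical placeholders.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(LineMapper.class);      // mapper sketched earlier
        conf.setCombinerClass(SumReducer.class);    // combiner is optional
        conf.setReducerClass(SumReducer.class);     // reducer sketched earlier
        conf.setNumReduceTasks(2);                  // JobConf.setNumReduceTasks(int)

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path("/user/example/input"));   // hypothetical path
        FileOutputFormat.setOutputPath(conf, new Path("/user/example/output")); // hypothetical path

        JobClient.runJob(conf);   // submits the job and waits for completion
    }
}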
7. The ___________ executes the Mapper/Reducer task as a child process in a separate JVM.
a) JobTracker
b) TaskTracker
c) TaskScheduler
d) None of the mentioned
Answer:b
Explanation: The child-task inherits the environment of the parent TaskTracker.
8. Maximum virtual memory of the launched child-task is specified using :
a) mapv
b) mapred
c) mapvim
d) All of the mentioned
Answer:b
Explanation:Admins can also specify the maximum virtual memory of the launched child-task,
and any sub-process it launches recursively, using mapred.
9. Which of the following parameter is the threshold for the accounting and serialization
buffers ?
a) io.sort.spill.percent
b) io.sort.record.percent

c) io.sort.mb
d) None of the mentioned
Answer:a
Explanation:When percentage of either buffer has filled, their contents will be spilled to disk in
the background.
10. ______________ is the percentage of memory relative to the maximum heap size in which map
outputs may be retained during the reduce.
a) mapred.job.shuffle.merge.percent
b) mapred.job.reduce.input.buffer.percent
c) mapred.inmem.merge.threshold
d) io.sort.factor
Answer:b
Explanation:When the reduce begins, map outputs will be merged to disk until those that remain
are under the resource limit this defines.
Hadoop Questions and Answers MapReduce Features-2
This set of Interview Questions & Answers focuses on MapReduce.
1. ____________ specifies the number of segments on disk to be merged at the same time.
a) mapred.job.shuffle.merge.percent
b) mapred.job.reduce.input.buffer.percent
c) mapred.inmem.merge.threshold
d) io.sort.factor
Answer:d
Explanation:io.sort.factor limits the number of open files and compression codecs during the
merge.
2. Point out the correct statement :
a) The number of sorted map outputs fetched into memory before being merged to disk
b) The memory threshold for fetched map outputs before an in-memory merge is finished
c) The percentage of memory relative to the maximum heapsize in which map outputs may not
be retained during the reduce
d) None of the mentioned

Answer:a
Explanation:When the reduce begins, map outputs will be merged to disk until those that remain
are under the resource limit this defines.
3. Map output larger than ___ percent of the memory allocated to copying map outputs is written directly to disk.
a) 10
b) 15
c) 25
d) 35
Answer:c
Explanation:Map output will be written directly to disk without first staging through memory.
4. Jobs can enable task JVMs to be reused by specifying the job configuration :
a) mapred.job.recycle.jvm.num.tasks
b) mapissue.job.reuse.jvm.num.tasks
c) mapred.job.reuse.jvm.num.tasks
d) All of the mentioned
Answer:c
Explanation:Setting mapred.job.reuse.jvm.num.tasks to a value greater than 1 (or to -1 for no
limit) allows a task JVM to be reused for multiple tasks of the same job.
5. Point out the wrong statement :
a) The task tracker has local directory to create localized cache and localized job
b) The task tracker can define multiple local directories
c) The Job tracker cannot define multiple local directories
d) None of the mentioned
Answer:d
Explanation:When the job starts, task tracker creates a localized job directory relative to the local
directory specified in the configuration.
6. During the execution of a streaming job, the names of the _______ parameters are
transformed.
a) vmap
b) mapvim
c) mapreduce
d) mapred

Answer:d
Explanation:To get the values in a streaming job's mapper/reducer, use the parameter names with
the underscores (for example, mapred.job.id becomes mapred_job_id).
7. The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker
and logged to :
a) ${HADOOP_LOG_DIR}/user
b) ${HADOOP_LOG_DIR}/userlogs
c) ${HADOOP_LOG_DIR}/logs
d) None of the mentioned
Answer:b
Explanation: The child-jvm always has its current working directory added to the
java.library.path and LD_LIBRARY_PATH.
8. ____________ is the primary interface by which user-job interacts with the JobTracker.
a) JobConf
b) JobClient
c) JobServer
d) All of the mentioned
Answer:b
Explanation:JobClient provides facilities to submit jobs, track their progress, access component-task reports and logs, get the MapReduce cluster's status information, and so on.
9. The _____________ can also be used to distribute both jars and native libraries for use in the
map and/or reduce tasks.
a) DistributedLog
b) DistributedCache
c) DistributedJars
d) None of the mentioned
Answer:b
Explanation:Cached libraries can be loaded via System.loadLibrary or System.load.
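A hedged sketch of staging a side file and an extra jar through DistributedCache before job submission; the paths are hypothetical and the static helpers shown are from the classic org.apache.hadoop.filecache.DistributedCache API.

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetupExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetupExample.class);

        // Ship a lookup file to every task's local working area (hypothetical path)
        DistributedCache.addCacheFile(new URI("/user/example/lookup.dat"), conf);

        // Make an extra jar available on the task classpath (hypothetical path)
        DistributedCache.addFileToClassPath(new Path("/user/example/lib/extra.jar"), conf);

        // Tasks can later locate the localized copies via
        // DistributedCache.getLocalCacheFiles(conf).
    }
}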
10. __________ is used to filter log files from the output directory listing.
a) OutputLog
b) OutputLogFilter
c) DistributedLog
d) DistributedJars



Answer:b
Explanation:Users can view the history log summary for a specified directory using the following
command: $ bin/hadoop job -history output-dir.
Hadoop Questions and Answers Hadoop Configuration
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Hadoop
Configuration.
1. Which of the following class provides access to configuration parameters ?
a) Config
b) Configuration
c) OutputConfig
d) None of the mentioned
Answer:b
Explanation:Configurations are specified by resources.
2. Point out the correct statement :
a) Configuration parameters may be declared static
b) Unless explicitly turned off, Hadoop by default specifies two resources
c) Configuration class provides access to configuration parameters
d) None of the mentioned
Answer:a
Explanation:Once a resource declares a value final, no subsequently-loaded resource can alter
that value.
3. ___________ gives site-specific configuration for a given hadoop installation.
a) core-default.xml
b) core-site.xml
c) coredefault.xml
d) All of the mentioned
Answer:b
Explanation:core-default.xml contains the read-only defaults for Hadoop; core-site.xml holds the site-specific configuration that overrides them.
4. Administrators typically define parameters as final in __________ for values that user
applications may not alter.

a) core-default.xml
b) core-site.xml
c) coredefault.xml
d) All of the mentioned
Answer:b
Explanation:Value strings are first processed for variable expansion.
5. Point out the wrong statement :
a) addDeprecations adds a set of deprecated keys to the global deprecations
b) Configuration parameters cannot be declared final
c) addDeprecations method is lockless
d) None of the mentioned
Answer:b
Explanation:Configuration parameters may be declared final.
6. _________ method clears all keys from the configuration.
a) clear
b) addResource
c) getClass
d) None of the mentioned
Answer:a
Explanation:getClass is used to get the value of the name property as a Class.
7. ________ method adds the deprecated key to the global deprecation map.
a) addDeprecits
b) addDeprecation
c) keyDeprecation
d) None of the mentioned
Answer:b
Explanation:addDeprecation does not override any existing entries in the deprecation map.
8. ________ checks whether the given key is deprecated.
a) isDeprecated
b) setDeprecated
c) isDeprecatedif

d) All of the mentioned


Answer:a
Explanation:Method returns true if the key is deprecated and false otherwise.
9. _________ is useful for iterating the properties when all deprecated properties for currently set
properties need to be present.
a) addResource
b) setDeprecatedProperties
c) addDefaultResource
d) None of the mentioned
Answer:b
Explanation:setDeprecatedProperties sets all deprecated properties that are not currently set but
have a corresponding new property that is set.
10. Which of the following adds a configuration resource ?
a) addResource
b) setDeprecatedProperties
c) addDefaultResource
d) addResource
Answer:d
Explanation: The properties of this resource will override properties of previously added
resources, unless they were marked final.
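As a small illustration of the Configuration API discussed above, the sketch below loads an extra resource and reads a property; the extra resource path and the example.buffer.kb key are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfigurationExample {
    public static void main(String[] args) {
        // Loads core-default.xml and core-site.xml from the classpath by default
        Configuration conf = new Configuration();

        // addResource: later resources override earlier ones unless a value is marked final
        conf.addResource(new Path("/etc/hadoop/conf/my-site.xml"));   // hypothetical file

        // Read a value, falling back to a default if the key is unset
        String fsUri = conf.get("fs.defaultFS", "file:///");
        System.out.println("fs.defaultFS = " + fsUri);

        // Programmatic overrides are also possible
        conf.setInt("example.buffer.kb", 64);   // hypothetical property
    }
}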
Hadoop Questions and Answers Security
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Security.
1. For running hadoop service daemons in Hadoop in secure mode, ___________ principals are
required.
a) SSL
b) Kerberos
c) SSH
d) None of the mentioned
Answer:b
Explanation:Each service reads authentication information saved in a keytab file with appropriate
permissions.
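For client code, a keytab-based login commonly looks roughly like the following sketch; the principal and keytab path are placeholders, and UserGroupInformation is Hadoop's client-side security entry point.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable Kerberos authentication for this client
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from a keytab instead of an interactive kinit
        // (principal and keytab path below are placeholders)
        UserGroupInformation.loginUserFromKeytab(
                "hdfs/node1.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/hdfs.keytab");

        System.out.println("Logged in as: "
                + UserGroupInformation.getCurrentUser().getUserName());
    }
}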

2. Point out the correct statement :


a) Hadoop does have the definition of group by itself
b) MapReduce JobHistory server run as same user such as mapred
c) SSO environment is managed using Kerberos with LDAP for Hadoop in secure mode
d) None of the mentioned
Answer:c
Explanation:You can change a way of mapping by specifying the name of mapping provider as a
value of hadoop.security.group.mapping.
3. The simplest way to do authentication is using _________ command of Kerberos.
a) auth
b) kinit
c) authorize
d) All of the mentioned
Answer:b
Explanation:HTTP web-consoles should be served by a principal different from the RPC one.
4. Data transfer between Web-console and clients are protected by using :
a) SSL
b) Kerberos
c) SSH
d) None of the mentioned
Answer:a
Explanation:AES offers the greatest cryptographic strength and the best performance.
5. Point out the wrong statement :
a) Data transfer protocol of DataNode does not use the RPC framework of Hadoop
b) Apache Oozie which access the services of Hadoop on behalf of end users need to be able to
impersonate end users
c) DataNode must authenticate itself by using privileged ports which are specified by
dfs.datanode.address and dfs.datanode.http.address
d) None of the mentioned
Answer:d
Explanation:Authentication is based on the assumption that the attacker won't be able to get root
privileges.

6. In order to turn on RPC authentication in Hadoop, set the value of the
hadoop.security.authentication property to :
a) zero
b) kerberos
c) false
d) None of the mentioned
Answer:b
Explanation:Security settings need to be modified properly for robustness.
7. The __________ provides a proxy between the web applications exported by an application
and an end user.
a) ProxyServer
b) WebAppProxy
c) WebProxy
d) None of the mentioned
Answer:b
Explanation:If security is enabled it will warn users before accessing a potentially unsafe web
application. Authentication and authorization using the proxy is handled just like any other
privileged web application.
8. ___________ is used by the YARN framework to define how any container is launched and
controlled.
a) Container
b) ContainerExecutor
c) Executor
d) All of the mentioned
Answer:b
Explanation: The container process has the same Unix user as the NodeManager.
9. The ____________ requires that paths including and leading up to the directories specified in
yarn.nodemanager.local-dirs be set with appropriate (755) permissions.
a) TaskController
b) LinuxTaskController
c) LinuxController
d) None of the mentioned

Answer:b
Explanation:LinuxTaskController keeps track of all paths and directories on the DataNode.
10. The configuration file must be owned by the user running :
a) DataManager
b) NodeManager
c) ValidationManager
d) None of the mentioned
Answer:b
Explanation:To recap, local file-system permissions need to be modified.
Hadoop Questions and Answers MapReduce Job-1
This set of Hadoop Interview Questions & Answers for freshers focuses on MapReduce Job.
1. __________ storage is a solution to decouple growing storage capacity from compute
capacity.
a) DataNode
b) Archival
c) Policy
d) None of the mentioned
Answer:b
Explanation:Nodes with higher density and less expensive storage with low compute power are
becoming available.
2. Point out the correct statement :
a) When there is enough space, block replicas are stored according to the storage type list
b) One_SSD is used for storing all replicas in SSD
c) Hot policy is useful only for single replica blocks
d) All of the mentioned
Answer:a
Explanation: The first phase of Heterogeneous Storage changed the DataNode storage model from a
single storage to a collection of storages.
3. ___________ is added for supporting writing single replica files in memory.
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK

d) All of the mentioned


Answer:c
Explanation:DISK is the default storage type.
4. Which of the following has high storage density ?
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK
d) All of the mentioned
Answer:b
Explanation:Little compute power is added for supporting archival storage.
5. Point out the wrong statement :
a) A Storage policy consists of the Policy ID
b) The storage policy can be specified using the dfsadmin -setStoragePolicy command
c) dfs.storage.policy.enabled is used for enabling/disabling the storage policy feature
d) None of the mentioned
Answer:d
Explanation: The effective storage policy can be retrieved by the dfsadmin -getStoragePolicy
command.
6. Which of the following storage policy is used for both storage and compute ?
a) Hot
b) Cold
c) Warm
d) All_SSD
Answer:a
Explanation:When a block is hot, all replicas are stored in DISK.
7. Which of the following is only for storage with limited compute ?
a) Hot
b) Cold
c) Warm
d) All_SSD

Answer:b
Explanation:When a block is cold, all replicas are stored in ARCHIVE.
8. When a block is warm, some of its replicas are stored in DISK and the remaining replicas are
stored in :
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK
d) All of the mentioned
Answer:b
Explanation:Warm storage policy is partially hot and partially cold.
9. ____________ is used for storing one of the replicas in SSD.
a) Hot
b) Lazy_Persist
c) One_SSD
d) All_SSD
Answer:c
Explanation: The remaining replicas are stored in DISK.
10. ___________ is used for writing blocks with single replica in memory.
a) Hot
b) Lazy_Persist
c) One_SSD
d) All_SSD
Answer:b
Explanation: The replica is first written in RAM_DISK and then it is lazily persisted in DISK.
Hadoop Questions and Answers MapReduce Job-2
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on MapReduce Job-2.
1. _________ is a data migration tool added for archiving data.
a) Mover
b) Hiver
c) Serde

d) None of the mentioned


Answer:a
Explanation:Mover periodically scans the files in HDFS to check if the block placement satisfies
the storage policy.
2. Point out the correct statement :
a) Mover is not similar to Balancer
b) hdfs dfsadmin -setStoragePolicy puts a storage policy to a file or a directory.
c) addCacheArchive add archives to be localized
d) None of the mentioned
Answer:c
Explanation:addArchiveToClassPath(Path archive) adds an archive path to the current set of
classpath entries.
3. Which of the following is used to list out the storage policies ?
a) hdfs storagepolicies
b) hdfs storage
c) hd storagepolicies
d) All of the mentioned
Answer:a
Explanation:The hdfs storagepolicies command takes no arguments and lists out all the storage policies.
4. Which of the following statement can be used get the storage policy of a file or a directory ?
a) hdfs dfsadmin -getStoragePolicy path
b) hdfs dfsadmin -setStoragePolicy path policyName
c) hdfs dfsadmin -listStoragePolicy path policyName
d) All of the mentioned
Answer:a
Explanation: The path argument of -getStoragePolicy refers to either a directory or a file.
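Besides the command line, storage policies can also be set from Java. The sketch below assumes a Hadoop release (2.6 or later) that exposes FileSystem#setStoragePolicy; the path and the choice of the COLD policy are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path archiveDir = new Path("/data/archive");   // hypothetical directory

        // Roughly equivalent to setting the policy from the command line
        // (hdfs dfsadmin -setStoragePolicy / hdfs storagepolicies) for this path
        fs.setStoragePolicy(archiveDir, "COLD");

        fs.close();
    }
}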
Hadoop Questions and Answers Task Execution
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Task
Execution.
1. Which of the following node is responsible for executing a Task assigned to it by the
JobTracker ?

a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
Answer:c
Explanation:TaskTracker receives the information necessary for execution of a Task from
JobTracker, Executes the Task, and Sends the Results back to JobTracker.
2. Point out the correct statement :
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function.
c) Reduce Task in MapReduce is performed using the Map() function.
d) All of the mentioned
Answer:a
Explanation:This feature of MapReduce is Data Locality.
3. ___________ part of the MapReduce is responsible for processing one or more chunks of data
and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned
Answer:a
Explanation:Map Task in MapReduce is performed using the Map() function.
4. _________ function is responsible for consolidating the results produced by each of the Map()
functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
Answer:a
Explanation:Reduce function collates the work and resolves the results.
5. Point out the wrong statement :
a) A MapReduce job usually splits the input data-set into independent chunks which are

processed by the map tasks in a completely parallel manner


b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and
reduce methods
d) None of the mentioned
Answer:d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
6. Although the Hadoop framework is implemented in Java, MapReduce applications need not
be written in :
a) Java
b) C
c) C#
d) None of the mentioned
Answer:a
Explanation:Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce
applications (not based on JNI).
7. ________ is a utility which allows users to create and run jobs with any executable as the
mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
Answer:b
Explanation:Hadoop streaming is one of the most important utilities in the Apache Hadoop
distribution.
8. __________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned

Answer:a
Explanation:Maps are the individual tasks that transform input records into intermediate records.
9. The number of maps is usually driven by the total size of :
a) inputs
b) outputs
c) tasks
d) None of the mentioned
Answer:a
Explanation:Total size of inputs means total number of blocks of the input files.
10. Running a ___________ program involves running mapping tasks on many or all of the
nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned
Answer:a
Explanation: In some applications, component tasks need to create and/or write to side-files,
which differ from the actual job-output files.
Hadoop Questions and Answers YARN-1
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on YARN-1.
1. ________ is the architectural center of Hadoop that allows multiple data processing engines.
a) YARN
b) Hive
c) Incubator
d) Chuckwa
Answer:a
Explanation:YARN is the prerequisite for Enterprise Hadoop, providing resource management
and a central platform to deliver consistent operations, security, and data governance tools across
Hadoop clusters.
2. Point out the correct statement :
a) YARN also extends the power of Hadoop to incumbent and new technologies found within the
data center

b) YARN is the central point of investment for Hortonworks within the Apache community
c) YARN enhances a Hadoop compute cluster in many ways
d) All of the mentioned
Answer:d
Explanation:YARN provides ISVs and developers a consistent framework for writing data access
applications that run IN Hadoop.
3. YARN's dynamic allocation of cluster resources improves utilization over more static _______
rules used in early versions of Hadoop.
a) Hive
b) MapReduce
c) Impala
d) All of the mentioned
Answer:b
Explanation:Multi-tenant data processing improves an enterprise's return on its Hadoop
investments.
4. The __________ is a framework-specific entity that negotiates resources from the
ResourceManager
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned
Answer:c
Explanation:Each ApplicationMaster has responsibility for negotiating appropriate resource
containers from the Scheduler.
5. Point out the wrong statement :
a) From the system perspective, the ApplicationMaster runs as a normal container.
b) The ResourceManager is the per-machine slave, which is responsible for launching the
applications containers
c) The NodeManager is the per-machine slave, which is responsible for launching the
applications containers, monitoring their resource usage
d) None of the mentioned

Answer:b
Explanation:ResourceManager has a scheduler, which is responsible for allocating resources to
the various applications running in the cluster, according to constraints such as queue capacities
and user limits.
6. Apache Hadoop YARN stands for :
a) Yet Another Reserve Negotiator
b) Yet Another Resource Network
c) Yet Another Resource Negotiator
d) All of the mentioned
Answer:c
Explanation:YARN is a cluster management technology.
7. MapReduce has undergone a complete overhaul in hadoop :
a) 0.21
b) 0.23
c) 0.24
d) 0.26
Answer:b
Explanation: The fundamental idea of MRv2 is to split up the two major functionalities of the
JobTracker.
8. The ____________ is the ultimate authority that arbitrates resources among all the
applications in the system.
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned
Answer:b
Explanation: The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework.
9. The __________ is responsible for allocating resources to the various running applications
subject to familiar constraints of capacities, queues etc.
a) Manager
b) Master
c) Scheduler

d) None of the mentioned


Answer:c
Explanation: The Scheduler is a pure scheduler in the sense that it performs no monitoring or
tracking of status for the application.
10. The CapacityScheduler supports _____________ queues to allow for more predictable
sharing of cluster resources.
a) Networked
b) Hierarchical
c) Partition
d) None of the mentioned
Answer:b
Explanation: The Scheduler has a pluggable policy plug-in, which is responsible for partitioning
the cluster resources among the various queues, applications etc.
Hadoop Questions and Answers YARN-2
This set of Hadoop Question Bank focuses on YARN.
1. Yarn commands are invoked by the ________ script.
a) hive
b) bin
c) hadoop
d) home
Answer:b
Explanation:Running the yarn script without any arguments prints the description for all
commands.
2. Point out the correct statement :
a) Each queue has strict ACLs which controls which users can submit applications to individual
queues
b) Hierarchy of queues is supported to ensure resources are shared among the sub-queues of an
organization
c) Queues are allocated a fraction of the capacity of the grid in the sense that a certain capacity of
resources will be at their disposal
d) All of the mentioned

Answer:d
Explanation:All applications submitted to a queue will have access to the capacity allocated to
the queue.
3. The queue definitions and properties such as ________, ACLs can be changed, at runtime.
a) tolerant
b) capacity
c) speed
d) All of the mentioned
Answer:b
Explanation:Administrators can add additional queues at runtime, but queues cannot be deleted
at runtime.
4. The CapacityScheduler has a pre-defined queue called :
a) domain
b) root
c) rear
d) All of the mentioned
Answer:b
Explanation:All queues in the system are children of the root queue.
5. Point out the wrong statement :
a) The multiple of the queue capacity which can be configured to allow a single user to acquire
more resources
b) Changing queue properties and adding new queues is very simple
c) Queues cannot be deleted, only addition of new queues is supported
d) None of the mentioned
Answer:d
Explanation:You need to edit conf/capacity-scheduler.xml and run yarn rmadmin -refreshQueues
for changing queue properties.
6. The updated queue configuration should be a valid one i.e. queue-capacity at each level should
be equal to :
a) 50%
b) 75%
c) 100%

d) 0%
Answer:c
Explanation:Queues cannot be deleted, only addition of new queues is supported.
7. Users can bundle their Yarn code in a _________ file and execute it using jar command.
a) java
b) jar
c) C code
d) xml
Answer:b
Explanation:Usage: yarn jar <jar> [mainClass] args...
8. Which of the following command is used to dump the log container ?
a) logs
b) log
c) dump
d) All of the mentioned
Answer:a
Explanation:Usage: yarn logs -applicationId <application ID>.
9. __________ will clear the RMStateStore and is useful if past applications are no longer
needed.
a) -format-state
b) -form-state-store
c) -format-state-store
d) None of the mentioned
Answer:c
Explanation:-format-state-store formats the RMStateStore.
10. Which of the following command runs ResourceManager admin client ?
a) proxyserver
b) run
c) admin
d) rmadmin

Answer:d
Explanation:proxyserver command starts the web proxy server.
Hadoop Questions and Answers Mapreduce Types
This set of Hadoop Questions & Answers for experienced focuses on MapReduce Types.
1. ___________ generates keys of type LongWritable and values of type Text.
a) TextOutputFormat
b) TextInputFormat
c) OutputInputFormat
d) None of the mentioned
Answer:b
Explanation:If K2 and K3 are the same, you don't need to call setMapOutputKeyClass().
2. Point out the correct statement :
a) The reduce input must have the same types as the map output, although the reduce output
types may be different again
b) The map input key and value types (K1 and V1) are different from the map output types
c) The partition function operates on the intermediate key
d) All of the mentioned
Answer:d
Explanation:In practice, the partition is determined solely by the key (the value is ignored).
3. In _____________, the default job is similar, but not identical, to the Java equivalent.
a) Mapreduce
b) Streaming
c) Orchestration
d) All of the mentioned
Answer:b
Explanation:MapReduce Types and Formats MapReduce has a simple model of data processing.
4. An input _________ is a chunk of the input that is processed by a single map.
a) textformat
b) split
c) datanode
d) All of the mentioned

Answer:b
Explanation:Each split is divided into records, and the map processes each record (a key-value
pair) in turn.
5. Point out the wrong statement :
a) If V2 and V3 are the same, you only need to use setOutputValueClass()
b) The overall effect of Streaming job is to perform a sort of the input
c) A Streaming application can control the separator that is used when a key-value pair is turned
into a series of bytes and sent to the map or reduce process over standard input
d) None of the mentioned
Answer:d
Explanation:If a combine function is used then it is the same form as the reduce function, except
its output types are the intermediate key and value types (K2 and V2), so they can feed the
reduce function.
6. An ___________ is responsible for creating the input splits, and dividing them into records.
a) TextOutputFormat
b) TextInputFormat
c) OutputInputFormat
d) InputFormat
Answer:d
Explanation:As a MapReduce application writer, you don't need to deal with InputSplits directly,
as they are created by an InputFormat.
7. ______________ is another implementation of the MapRunnable interface that runs mappers
concurrently in a configurable number of threads.
a) MultithreadedRunner
b) MultithreadedMap
c) MultithreadedMapRunner
d) SinglethreadedMapRunner
Answer:c
Explanation:A RecordReader is little more than an iterator over records, and the map task uses
one to generate record key-value pairs, which it passes to the map function.
8. Which of the following is the only way of running mappers ?
a) MapReducer
b) MapRunner

c) MapRed
d) All of the mentioned
Answer:b
Explanation:Having calculated the splits, the client sends them to the jobtracker.
9. _________ is the base class for all implementations of InputFormat that use files as their data
source .
a) FileTextFormat
b) FileInputFormat
c) FileOutputFormat
d) None of the mentioned
Answer:b
Explanation:FileInputFormat provides implementation for generating splits for the input files.
10. Which of the following method add a path or paths to the list of inputs ?
a) setInputPaths()
b) addInputPath()
c) setInput()
d) None of the mentioned
Answer:b
Explanation:FileInputFormat offers four static convenience methods for setting a JobConf's
input paths.
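Roughly, those convenience methods are used like this in the old mapred API; the paths are placeholders.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputPathsExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(InputPathsExample.class);

        // Replace the whole list of inputs in one call
        FileInputFormat.setInputPaths(conf, new Path("/data/2014"), new Path("/data/2015"));

        // Or build the list up incrementally
        FileInputFormat.addInputPath(conf, new Path("/data/2016"));

        // String variants accept comma-separated paths
        FileInputFormat.addInputPaths(conf, "/data/2017,/data/2018");
    }
}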
Hadoop Questions and Answers Mapreduce Formats-1
This set of Hadoop Interview Questions & Answers for experienced focuses on MapReduce
Formats.
1. The split size is normally the size of an ________ block, which is appropriate for most
applications.
a) generic
b) task
c) library
d) HDFS

Answer:d
Explanation:FileInputFormat splits only large files (here, large means larger than an HDFS
block).
2. Point out the correct statement :
a) The minimum split size is usually 1 byte, although some formats have a lower bound on the
split size
b) Applications may impose a minimum split size.
c) The maximum split size defaults to the maximum value that can be represented by a Java long
type
d) All of the mentioned
Answer:a
Explanation: The maximum split size has an effect only when it is less than the block size,
forcing splits to be smaller than a block.
3. Which of the following Hadoop streaming command option parameter is required ?
a) output directoryname
b) mapper executable
c) input directoryname
d) All of the mentioned
Answer:d
Explanation:The input location, output location, and mapper executable are all required streaming parameters.
4. To set an environment variable in a streaming command use:
a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/
Answer:c
Explanation:Environment variables are set using the -cmdenv option.
5. Point out the wrong statement :
a) Hadoop works better with a small number of large files than a large number of small files
b) CombineFileInputFormat is designed to work well with small files
c) CombineFileInputFormat does not compromise the speed at which it can process the input in a
typical MapReduce job

d) None of the mentioned


Answer:c
Explanation:If the file is very small (small means significantly smaller than an HDFS block)
and there are a lot of them, then each map task will process very little input, and there will be a
lot of them (one per file), each of which imposes extra bookkeeping overhead.
6. The ________ option allows you to copy jars locally to the current working directory of tasks
and automatically unjar the files.
a) archives
b) files
c) task
d) None of the mentioned
Answer:a
Explanation:The -archives option is also a generic option.
7. ______________ class allows the Map/Reduce framework to partition the map outputs based
on certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
d) None of the mentioned
Answer:b
Explanation: The primary key is used for partitioning, and the combination of the primary and
secondary keys is used for sorting.
8. Which of the following class provides a subset of features provided by the Unix/GNU Sort ?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
d) All of the mentioned
Answer:c
Explanation:Hadoop has a library class, KeyFieldBasedComparator, that is useful for many
applications.
9. Which of the following class is provided by Aggregate package ?
a) Map

b) Reducer
c) Reduce
d) None of the mentioned
Answer:b
Explanation:Aggregate provides a special reducer class and a special combiner class, and a list of
simple aggregators that perform aggregations such as sum, max, min and so on over a
sequence of values.
10.Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that
effectively allows you to process text data like the unix ______ utility.
a) Copy
b) Cut
c) Paste
d) Move
Answer:b
Explanation: The map function defined in the class treats each input key/value pair as a list of
fields.
Hadoop Questions and Answers Mapreduce Formats-2
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Mapreduce
Formats-2.
1. ___________ takes node and rack locality into account when deciding which blocks to place
in the same split
a) CombineFileOutputFormat
b) CombineFileInputFormat
c) TextFileInputFormat
d) None of the mentioned
Answer:b
Explanation:CombineFileInputFormat does not compromise the speed at which it can process the
input in a typical MapReduce job.
2. Point out the correct statement :
a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable
number of lines of input
b) StreamXmlRecordReader, the page elements can be interpreted as records for processing by a
mapper

c) The number depends on the size of the split and the length of the lines.
d) All of the mentioned
Answer:d
Explanation:Large XML documents that are composed of a series of records can be broken
into these records using simple string or regular-expression matching to find start and end tags of
records.
3. The key, a ____________, is the byte offset within the file of the beginning of the line.
a) LongReadable
b) LongWritable
c) LongWritable
d) All of the mentioned
Answer:b
Explanation: The value is the contents of the line, excluding any line terminators (newline,
carriage return), and is packaged as a Text object.
4. _________ is the output produced by TextOutputFormat, Hadoop's default OutputFormat.
a) KeyValueTextInputFormat
b) KeyValueTextOutputFormat
c) FileValueTextInputFormat
d) All of the mentioned
Answer:b
Explanation:To interpret such files correctly, KeyValueTextInputFormat is appropriate.
5. Point out the wrong statement :
a) Hadoop's sequence file format stores sequences of binary key-value pairs
b) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the
sequence file's keys and values as opaque binary objects
c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the
sequence file's keys and values as opaque binary objects.
d) None of the mentioned
Answer:c
Explanation:SequenceFileAsBinaryInputFormat is used for reading keys, values from
SequenceFiles in binary (raw) format.

6. __________ is a variant of SequenceFileInputFormat that converts the sequence file's keys
and values to Text objects.
a) SequenceFile
b) SequenceFileAsTextInputFormat
c) SequenceAsTextInputFormat
d) All of the mentioned
Answer:b
Explanation:With multiple reducers, records will be allocated evenly across reduce tasks, with all
records that share the same key being processed by the same reduce task.
7. __________ class allows you to specify the InputFormat and Mapper to use on a per-path
basis.
a) MultipleOutputs
b) MultipleInputs
c) SingleInputs
d) None of the mentioned
Answer:b
Explanation:One might be tab-separated plain text, the other a binary sequence file. Even if they
are in the same format, they may have different representations, and therefore need to be parsed
differently.
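A hedged per-path configuration sketch follows; the paths are hypothetical, and IdentityMapper stands in for the distinct Mapper implementations a real job would supply for each format.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class MultipleInputsExample {
    public static void configure(JobConf conf) {
        // One path holds tab-separated text, the other a binary sequence file;
        // each gets its own InputFormat (and, in real use, its own Mapper class).
        MultipleInputs.addInputPath(conf, new Path("/data/text"),       // hypothetical path
                KeyValueTextInputFormat.class, IdentityMapper.class);
        MultipleInputs.addInputPath(conf, new Path("/data/sequence"),   // hypothetical path
                SequenceFileInputFormat.class, IdentityMapper.class);
    }
}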
8. ___________ is an input format for reading data from a relational database, using JDBC.
a) DBInput
b) DBInputFormat
c) DBInpFormat
d) All of the mentioned
Answer:b
Explanation:DBInputFormat is the most frequently used format for reading data.
9. Which of the following is the default output format ?
a) TextFormat
b) TextOutput
c) TextOutputFormat
d) None of the mentioned

Answer:c
Explanation:TextOutputFormat keys and values may be of any type.
10. Which of the following writes MapFiles as output ?
a) DBInpFormat
b) MapFileOutputFormat
c) SequenceFileAsBinaryOutputFormat
d) None of the mentioned
Answer:c
Explanation:SequenceFileAsBinaryOutputFormat writes keys and values in raw binary format
into a SequenceFile container.
Hadoop Questions and Answers Hadoop Cluster-1
This set of Questions and Answers focuses on Hadoop Cluster
1. Mapper implementations are passed the JobConf for the job via the ________ method
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configureable
d) None of the mentioned
Answer:b
Explanation:The JobConfigurable.configure method is overridden so that Mapper implementations can initialize themselves.
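A sketch of that initialization hook: the mapper below reads a hypothetical job parameter in configure() and uses it in map().

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Initializes itself from the JobConf in configure(), then filters short lines.
public class ConfigurableMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {

    private int minLength;

    @Override
    public void configure(JobConf job) {
        // "example.min.line.length" is a hypothetical job parameter
        minLength = job.getInt("example.min.line.length", 10);
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, NullWritable> output, Reporter reporter)
            throws IOException {
        if (value.getLength() >= minLength) {
            output.collect(value, NullWritable.get());
        }
    }
}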
2. Point out the correct statement :
a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by
the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value)
format
d) All of the mentioned
Answer: d
Explanation:Reporters can be used to set application-level status messages and update Counters.
3. Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle

d) All of the mentioned


Answer:a
Explanation:In Shuffle phase the framework fetches the relevant partition of the output of all the
mappers, via HTTP.
4. The right number of reduces seems to be :
a) 0.90
b) 0.80
c) 0.36
d) 0.95
Answer:d
Explanation: The right number of reduces seems to be 0.95 or 1.75.
5. Point out the wrong statement :
a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases load
balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have output the
same key) in sort stage
Answer:a
Explanation:Reducer has 3 primary phases: shuffle, sort and reduce.
6. The output of the _______ is not sorted in the Mapreduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned
Answer:d
Explanation: The output of the reduce task is typically written to the FileSystem. The output of
the Reducer is not sorted.
7. Which of the following phases occur simultaneously ?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map

d) All of the mentioned


Answer:a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being
fetched they are merged.
8. Mapper and Reducer implementations can use the ________ to report progress or just indicate
that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
Answer:c
Explanation:Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.
9. __________ is a generalization of the facility provided by the MapReduce framework to
collect data output by the Mapper or the Reducer
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
Answer:b
Explanation:Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
10. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop
framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the mentioned
Answer:b
Explanation:JobConf represents a MapReduce job configuration.
Hadoop Questions and Answers Hadoop Cluster-2

This set of Hadoop assessment questions focuses on Hadoop Cluster.


1. ________ systems are scale-out file-based (HDD) systems moving to more uses of memory in
the nodes.
a) NoSQL
b) NewSQL
c) SQL
d) All of the mentioned
Answer:a
Explanation: NoSQL systems make the most sense whenever the application is based on data
with varying data types and the data can be stored in key-value notation.
2. Point out the correct statement :
a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
b) HDFS runs on a small cluster of commodity-class nodes
c) NEWSQL is frequently the collection point for big data
d) None of the mentioned
Answer:a
Explanation:Hadoop together with a relational data warehouse, they can form very effective data
warehouse infrastructure.
3. Hadoop data is not sequenced and is in 64MB to 256 MB block sizes of delimited record
values with schema applied on read based on:
a) HCatalog
b) Hive
c) Hbase
d) All of the mentioned
Answer:a
Explanation:Other means of tagging the values also can be used.
4. __________ are highly resilient and eliminate the single-point-of-failure risk with traditional
Hadoop deployments
a) EMR
b) Isilon solutions
c) AWS
d) None of the mentioned

Answer:b
Explanation:Enterprise data protection and security options, including file-system auditing and
data-at-rest encryption to address compliance requirements, are also provided by the Isilon solution.
5. Point out the wrong statement :
a) EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and
highly efficient storage platform
b) Isilon's native HDFS integration means you can avoid the need to invest in a separate Hadoop
infrastructure
c) NoSQL systems do provide high latency access and accommodate less concurrent users
d) None of the mentioned
Answer:c
Explanation:NoSQL systems do provide low latency access and accommodate many concurrent
users.
6. HDFS and NoSQL file systems focus almost exclusively on adding nodes to :
a) Scale out
b) Scale up
c) Both Scale out and up
d) None of the mentioned
Answer:a
Explanation:HDFS and NoSQL file systems focus almost exclusively on adding nodes to
increase performance (scale-out) but even they require node configuration with elements of scale
up.
7. Which is the most popular NoSQL database for scalable big data store with Hadoop ?
a) Hbase
b) MongoDB
c) Cassandra
d) None of the mentioned
Answer:a
Explanation:HBase is the Hadoop database: a distributed, scalable big data store that lets you
host very large tables (billions of rows by millions of columns) on clusters built
with commodity hardware.
8. The ___________ can also be used to distribute both jars and native libraries for use in the
map and/or reduce tasks.

a) DataCache
b) DistributedData
c) DistributedCache
d) All of the mentioned
Answer:c
Explanation: The child-jvm always has its current working directory added to the
java.library.path and LD_LIBRARY_PATH.
9. HBase provides ___________ like capabilities on top of Hadoop and HDFS.
a) TopTable
b) BigTop
c) Bigtable
d) None of the mentioned
Answer:c
Explanation: Google Bigtable leverages the distributed data storage provided by the Google File
System.
10. _______ refers to incremental costs with no major impact on solution design, performance
and complexity.
a) Scale-out
b) Scale-down
c) Scale-up
d) None of the mentioned
Answer:c
Explanation:Adding more CPU/RAM/disk capacity to a Hadoop DataNode that is already part of a
cluster does not require additional network switches.
Hadoop Questions and Answers HDFS Maintenance
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on HDFS
Maintenance.
1. Which of the following is a common hadoop maintenance issue ?
a) Lack of tools
b) Lack of configuration management
c) Lack of web interface
d) None of the mentioned

Answer:b
Explanation:Without a centralized configuration management framework, you end up with a
number of issues that can cascade just as your usage picks up.
2. Point out the correct statement :
a) RAID is turned off by default
b) Hadoop is designed to be a highly redundant distributed system
c) Hadoop has a networked configuration system
d) None of the mentioned
Answer:b
Explanation:Hadoop deployment is sometimes difficult to implement.
3. ___________ mode allows you to suppress alerts for a host, service, role, or even the entire
cluster.
a) Safe
b) Maintenance
c) Secure
d) All of the mentioned
Answer:b
Explanation:Maintenance mode can be useful when you need to take actions in your cluster and
do not want to see the alerts that will be generated due to those actions.
4. Which of the following is a configuration management system ?
a) Alex
b) Puppet
c) Acem
d) None of the mentioned
Answer:b
Explanation:Administrators may use configuration management systems such as Puppet and
Chef to manage processes.
5. Point out the wrong statement :
a) If you set the HBase service into maintenance mode, then its roles (HBase Master and all
Region Servers) are put into effective maintenance mode
b) If you set a host into maintenance mode, then any roles running on that host are put into
effective maintenance mode
c) Putting a component into maintenance mode prevent events from being logged

d) None of the mentioned


Answer:c
Explanation:Maintenance mode only suppresses the alerts that those events would otherwise
generate.
6. Which of the following is a common reason to restart hadoop process ?
a) Upgrade Hadoop
b) React to incidents
c) Remove worker nodes
d) All of the mentioned
Answer:d
Explanation: The most common reason administrators restart Hadoop processes is to enact
configuration changes.
7. __________ Manager's Service feature monitors dozens of service health and performance
metrics about the services and role instances running on your cluster.
a) Microsoft
b) Cloudera
c) Amazon
d) None of the mentioned
Answer:b
Explanation:Manager's Service feature presents health and performance data in a variety of
formats.
8. Which of the tab shows all the role instances that have been instantiated for this service ?
a) Service
b) Status
c) Instance
d) All of the mentioned
Answer:c
Explanation: The Instances page displays the results of the configuration validation checks it
performs for all the role instances for this service.
9. __________ is a standard Java API for monitoring and managing applications.
a) JVX
b) JVM

c) JMX
d) None of the mentioned
Answer:c
Explanation:Hadoop includes several managed beans (MBeans), which expose Hadoop metrics
to JMX-aware applications.
10. NameNode is monitored and upgraded in a __________ transition.
a) safemode
b) securemode
c) servicemode
d) None of the mentioned
Answer:a
Explanation: The HDFS service has some unique functions that may result in additional
information on its Status and Instances pages.
Hadoop Questions and Answers Monitoring HDFS
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on Monitoring
HDFS.
1. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication
Answer:c
Explanation:All the metadata related to HDFS including the information about data nodes, files
stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
2. Point out the correct statement :
a) The Hadoop framework publishes the job flow status to an internally running web server on
the master nodes of the Hadoop cluster
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault
tolerance
d) None of the mentioned

Answer:a
Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows
information about the NameNode itself.
3. For ________, the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned
Answer:a
Explanation:HBase Master UI provides information about the number of live, dead and
transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.
4. ________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned
Answer:c
Explanation:Secondary namenode is used for all time availability and reliability.
5. Point out the wrong statement :
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file
level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that
DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to
Answer:d
Explanation:NameNode is aware of the files to which the blocks stored on it belong to.
6. Which of the following scenario may not be a good fit for HDFS ?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned

Answer:a
Explanation:HDFS can be used for storing archive data since it is cheaper, as HDFS allows
storing the data on low-cost commodity hardware while ensuring a high degree of fault tolerance.
7. The need for data replication can arise in various scenarios like :
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned
Answer:d
Explanation:Data is replicated across different DataNodes to ensure a high degree of fault tolerance.
8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
Answer:a
Explanation: A DataNode stores data in the [HadoopFileSystem]. A functional filesystem has
more than one DataNode, with data replicated across them.
9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) HDFS Shell
b) FS Shell
c) DFS Shell
d) None of the mentioned
Answer:b
Explanation: The File System (FS) shell includes various shell-like commands that directly
interact with the Hadoop Distributed File System.
10. During start up, the ___________ loads the file system state from the fsimage and the edits
log file.
a) DataNode
b) NameNode
c) ActionNode

d) None of the mentioned


Answer:b
Explanation:HDFS is implemented in Java, so any computer that can run Java can host a
NameNode/DataNode.
