
SAPTHAGIRI COLLEGE OF ENGINEERING

(Affiliated to VTU, Belagavi, Approved by AICTE, New Delhi)


14/5, CHIKKASANDRA, HESARAGHATTA MAIN ROAD
BENGALURU-560 057
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
Big Data Analytics Question Bank
Module 1

1. Describe the features of HDFS.


2. Explain in detail the different components of HDFS.
3. How does HDFS block replication happen? Show it pictorially.
4. Explain how HDFS works in safe mode. Define rack awareness.
5. Explain the NameNode high availability design with a diagram.
6. Explain the Apache MapReduce parallel data flow.
7. Define HDFS NameNode Federation with an example. Explain the HDFS NFS Gateway.
8. Explain HDFS checkpoints, backups and snapshots.
9. List a few HDFS commands.
10. List the Hadoop benchmarks and explain the TeraSort and TestDFSIO benchmarks in detail.
11. Explain the “mapred” command in detail.
12. Explain the mapper and reducer with simple scripts.
13. Explain how the MapReduce model functions.
14. Explain in detail the MapReduce parallel data flow with a diagram.
15. Explain process placement during MapReduce.
16. How is HDFS fault tolerant? Explain speculative execution in MapReduce.
17. Write a WordCount program in Java, C++ and Python.
18. Explain the Streaming interface. What are its limitations?
19. Explain the Pipes interface.
20. How is debugging done in HDFS? Explain Hadoop log management.

Module 2
1. Explain Apache Pig along with its commands.
2. Explain Apache Hive along with its commands.
3. Explain Apache Sqoop. Explain the Apache Sqoop import and export methods.
4. Describe the Apache Flume agent components with a neat sketch. [Include the pipeline and the consolidation network]
5. Explain Apache Oozie in detail with a workflow DAG.
6. Explain HBase in detail.
7. Explain the structure of YARN applications.
8. Explain the following:
Apache Tez, Apache Giraph, Apache Storm, Apache Spark, Apache Flink
9. Explain the YARN architecture with two clients, using a neat diagram.

Module 3
1. How can BI be used for better decisions?
2. Explain BI tools in detail.
3. Explain any two BI applications in detail.
4. List three business intelligence applications in healthcare and wellness.
5. List three business intelligence applications in education.
6. List three business intelligence applications in customer relationship management.
7. What are the design considerations for a data warehouse?
8. Compare a data mart and a data warehouse.
9. Describe data warehouse architecture.
10. Explain the data loading process.
11. Explain data warehouse design.
12. Explain data warehouse best practices.
13. What is data mining? What are supervised and unsupervised learning techniques?
14. What are the possible outputs of data mining?
15. How are data mining results evaluated?
16. Explain data mining techniques.
17. List data mining tools.
18. List data mining best practices.
19. What are the major mistakes to avoid when doing data mining?
20. What is a confusion matrix?
21. Why is data preparation so important and time-consuming?

Module 4 & Module 5

1. What is clustering? Explain the applications of clustering. Write generic pseudocode for clustering.

2. Compare decision trees with table lookup.

3. Explain the K-Means algorithm for clustering with an example.

4. Explain the construction of a decision tree and write pseudocode for building one.

5. Draw an architectural diagram for text mining and explain it. What are the applications of text mining?

6. Explain web content mining, web structure mining and web usage mining in detail.

7. Write the differences between text mining and data mining.

8. How can data be stored and accessed in big data technologies?

9. Write a note on web mining algorithms.

10. Explain the term-document matrix.
