CP7019-Managing Big Data-Anna University - Question Paper

Reg No.
Question Paper Code : 13277

M.E. DEGREE EXAMINATION, NOVEMBER / DECEMBER 2014
Elective
Computer Science and Engineering
CP7019 MANAGING BIG DATA
(Common to M.E. Computer Science and Engineering (with specialization in Networks)
and M.E. Biometrics and Cyber Security)
(Regulations 2013)
Time : Three hours
Maximum : 100 Marks

Answer ALL questions
PART A (10 x 2 = 20 Marks)
1. What is Big Data? Why do we need to analyze Big Data?
2. Write down any four industry examples of Big Data.
3. Compare and contrast NoSQL and relational databases.
4. Write down the disadvantages of aggregate-oriented databases. How can they be overcome?
5. Define Data Locality Optimization.
6. State the purpose of Hadoop Pipes.
7. What do you mean by Apache Oozie? What are its contents?
8. List the entities of YARN.
9. How is Cassandra integrated with Hadoop?
10. List the tools related to Hadoop.

PART B (5 x 16 = 80 Marks)
11. (a) (i) Discuss the three dimensions of Big Data. (10)
        (ii) Why is Crowd Sourcing Analytics needed? Explain. (6)
    Or
    (b) (i) Why is Hadoop called a Big Data technology? Explain how it supports Big Data. (10)
        (ii) Illustrate how Cloud and Big Data are related to each other. (6)
12. (a) (i) With the help of a data model, explain aggregations and relations. (8)
        (ii) Give an example of MapReduce calculations. (8)
    Or
    (b) (i) Describe Graph Databases and Schemaless Databases. (10)
        (ii) Explain how to combine Sharding and Replication. (6)
13. (a) With example code, explain how Hadoop analyzes data. (16)
    Or
    (b) (i) Discuss the steps involved in designing HDFS. (8)
        (ii) Show how a client reads and writes data in HDFS. Give example code. (8)
14. (a) With the necessary diagram, explain the anatomy of a MapReduce job run. (16)
    Or
    (b) Discuss the different types and formats of MapReduce, with an example of each one. (16)
15. (a) Give an overview of:
        (i) HBase Data Model (8)
        (ii) Pig Data Model (8)
    Or
    (b) (i) What are the different ways to insert data into a table using Hive? Give a sample query for each kind. (10)
        (ii) Write down the queries involved in Hive Data Definition. (6)

B.BHUVANESWARAN / AP (SS) / CSE / REC - 2

Reg No.
Question Paper Code : 63326
M.E. DEGREE EXAMINATION, NOVEMBER / DECEMBER 2015
Elective
Computer Science and Engineering
CP7019 MANAGING BIG DATA
(Common to M.E. Computer Science and Engineering (with specialization in Networks)
and M.E. Biometrics and Cyber Security)
(Regulations 2013)
Time : Three hours
Maximum : 100 Marks

Answer ALL questions
PART A (10 x 2 = 20 Marks)
1. What are the three dimensions used in Big Data?
2. Define Crowd Sourcing Analytics.
3. What is a schemaless database?
4. What are aggregates?
5. What is MapReduce data flow with multiple related tasks?
6. What are failover and fencing?
7. Define TaskTracker failure.
8. What is shuffle and sort?
9. What are the different ways of executing a Pig program?
10. Contrast TINYINT and SMALLINT in Hive data types.

PART B (5 x 16 = 80 Marks)
11. (a) What is the role of Big Data Analytics in industries? Illustrate with three domain examples. (16)
    Or
    (b) (i) Explain the components of the Hadoop system. (8)
        (ii) Write briefly about Inter-firewall and Trans-firewall analytics. (8)
12. (a) (i) Explain the process of partitioning and combining in a MapReduce system. (8)
        (ii) Explain peer-to-peer replication in distribution models. (8)
    Or
    (b) Describe Materialized Views and Sharding. (16)
13. (a) Describe the following in terms of HDFS:
        (i) Name Node and Data Node (4)
        (ii) Basic Filesystem operations in Hadoop (4)
        (iii) Query in Filesystem (4)
        (iv) Coherency Model in Hadoop Filesystem (4)
    Or
    (b) (i) Explain the concept of serialization in Hadoop. (10)
        (ii) Write short notes on sequence file and map file in file-based data structures. (6)
14. (a) (i) Write down and explain the MapReduce workflow for the following system:
            Find the mean maximum recorded temperature for every day of the year and every weather station. (8)
        (ii) Explain packaging, deployment, and running for the above workflow job. (8)
    Or
    (b) (i) Explain YARN MapReduce in the anatomy of MapReduce job runs. (8)
        (ii) Explain any two multi-user schedulers in MapReduce. (8)
15. (a) (i) Explain in detail the HBase implementation. (8)
        (ii) Discuss common issues when running an HBase cluster under load. (8)
    Or
    (b) (i) Briefly explain the join operations using Hive. (8)
        (ii) Explain the four types of functions in Pig. (8)
