
SRM VALLIAMMAI ENGINEERING COLLEGE
SRM Nagar, Kattankulathur – 603 203

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
QUESTION BANK

VIII SEMESTER

IT6006- DATA ANALYTICS

Regulation – 2013

Academic Year 2019 – 20

Prepared by

Ms. G.SANGEETHA, Assistant Professor/CSE


VALLIAMMAI ENGINEERING COLLEGE
SRM Nagar, Kattankulathur – 603 203
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Academic Year 2019-2020
QUESTION BANK – ODD SEMESTER
NAME OF THE SUBJECT: DATA ANALYTICS
SUBJECT CODE: IT6006
SEMESTER: VII
YEAR: IV
DEPARTMENT: COMPUTER SCIENCE AND ENGINEERING
HANDLED & PREPARED BY: Ms. G.SANGEETHA

UNIT 1 INTRODUCTION TO BIG DATA


Introduction to Big Data Platform – Challenges of conventional systems - Web data –
Evolution of Analytic scalability, analytic processes and tools, Analysis vs reporting -
Modern data analytic tools, Statistical concepts: Sampling distributions, resampling,
statistical inference, prediction error.

PART – A
Q.No Question Competence Level
1 List the main characteristics of Big Data. Remember BTL 1
2 Differentiate Big Data and Conventional Data. Understand BTL 2
3 List the various dimensions of growth of Big Data. Remember BTL 1
4 List the advantages of using a Massive Parallel Processing system. Remember BTL 1
5 Show why Artificial Neural Networks are not commonly used for Data Mining tasks. Apply BTL 3
6 Examine why inferential statistics are used in Big Data. Remember BTL 1
7 Define Bayes' Rule. Remember BTL 1
8 Classify the types of web mining. Analyze BTL 4
9 Why is domain expertise required for any type of Data Analytics? Remember BTL 1
10 Give reasons: “Web Data is the most popular Big Data”. Understand BTL 2
11 Justify: “Accuracy in big data is beneficial”. Evaluate BTL 5
12 Give the advantages of using a Semi-Structured Data format. Understand BTL 2
13 Differentiate Private and Public Cloud. Understand BTL 2
14 Compare how Semi-Structured Data is different from Unstructured and Structured Data. Analyze BTL 4
15 Can you generalize the role of analytical tools in big data? Create BTL 6
16 Analyse the similarity between data mining and data analysis. Apply BTL 3
17 Show the use of MapReduce in data analytics. Apply BTL 3
18 Why do you need to tame big data? Justify. Evaluate BTL 5
19 Summarize the benefits of an analytic sandbox. Analyze BTL 4
20 Generalize the ideas of hypothesis testing. Create BTL 6

PART-B
Q.No. Question Competence Level
1 i. What is Big Data? Describe the main features of a data analytical system. (6) Remember BTL 1
  ii. Describe in detail the role of statistical models in Big Data. (7)
2 i. List the main characteristics of Big Data. (4) Remember BTL 1
  ii. Describe big data architecture with a neat schematic diagram. (9)
3 Formulate the different statistical concepts in inference. (13) Create BTL 6
4 i. Define prediction. (3) Remember BTL 1
  ii. Describe the various prediction techniques in detail. (10)
5 i. Point out the features of a Massive Parallel Processing system. (5) Analyze BTL 4
  ii. Explain the use of a Massive Parallel Processing system in big data analytics. (8)
6 i. Show how you would use the sampling distribution system. (6) Apply BTL 3
  ii. Illustrate the resampling methods in detail. (7)
7 Differentiate in detail the analysis tools and reporting tools used in Big Data. (13) Understand BTL 2
8 i. What are the best practices in Big Data Analytics? (5) Analyze BTL 4
  ii. Explain the techniques used in Data Analytics. (8)
9 i. What makes a great analysis? State reasons with an example. (6) Understand BTL 2
  ii. Distinguish between attrition and response modeling. (7)
10 i. Describe hypothesis testing in detail. (6) Remember BTL 1
  ii. Describe probability distribution and entropy in detail. (7)
11 i. Illustrate the importance of tools in big data. (6) Apply BTL 3
  ii. Examine in detail the trends and technology in big data. (7)
12 i. Assess the difficulties faced by conventional systems. (5) Evaluate BTL 5
  ii. Explain the differences between big data architecture and the traditional one. (8)
13 Summarize the modern data analytic tools in detail. (13) Understand BTL 2
14 i. Point out the importance of data mining in data analytics. (4) Analyze BTL 4
  ii. Analyse the similarity between data mining and data analysis. (9)
PART – C
1 Analyse in detail the challenges of Big Data in Modern Data Analytics. (15) Analyze BTL 4
2 Justify the statement “Web Data is the Most Popular Big Data” with reference to a data analytic professional. (15) Evaluate BTL 5
3 Comment on the statement “Relationship between parallel databases and Big Data with respect to three V’s”. (15) Evaluate BTL 5
4 Develop the role of the Analytic Sandbox and its benefits in the Analytic Process, and differentiate how the views of an IT professional differ from those of an Analytic Professional. (15) Create BTL 6
UNIT II DATA ANALYSIS
Regression modeling, Multivariate analysis, Bayesian modeling, inference and Bayesian
networks, Support vector and kernel methods, Analysis of time series: linear systems
analysis, nonlinear dynamics - Rule induction - Neural networks: learning and
generalization, competitive learning, principal component analysis and neural networks;
Fuzzy logic: extracting fuzzy models from data, fuzzy decision trees, Stochastic search
methods.
PART – A
Q.No Question Competence Level
1 What is data analysis? Remember BTL 1
2 Show how you would envision the “flow” of dynamics. Apply BTL 3
3 Generalize support-vector machines. Create BTL 6
4 What are the sub processes of Intelligent Data Analysis? Understand BTL 2
5 List out the stages in Bayesian Data Analysis. Remember BTL 1
6 Define multivariate analysis. Remember BTL 1
7 Classify the stages in the process of IDA. Analyze BTL 4
8 Assess the importance of neural networks in data analysis. Evaluate BTL 5
9 What information is used to analyse data using Kohonen’s self-organizing maps? Remember BTL 1
10 Generalize propositional rule learning. Create BTL 6
11 Give the main idea of principal component analysis. Understand BTL 2
12 What is Multiple Linear Regression? Remember BTL 1
13 Show how logistic regression is different from linear regression. Apply BTL 3
14 Classify the levels of the techniques in multivariate analysis. Analyze BTL 4
15 What can you say about Hebbian learning? Understand BTL 2
16 List the sub processes of Intelligent Data Analysis. Remember BTL 1
17 What is Rule Induction? Understand BTL 2
18 Relate confidence and support in an association rule. Apply BTL 3
19 Point out the difference between linear and nonlinear regression. Analyze BTL 4
20 Summarize the parameters used to characterize any fuzzy membership function. Evaluate BTL 5

PART-B
Q.No. Question Competence Level

1 i. Examine the purpose of using Regression Modeling in Data Analysis. (7) Remember BTL 1
  ii. What kind of inferences does it provide? (6)
2 Give a short note on Data Analysis and its importance. (13) Understand BTL 2
3 i. Assess when multivariate analysis is used. (5) Evaluate BTL 5
  ii. Explain in detail the various Multivariate Analysis techniques with examples. (8)
4 i. What is the main idea of analyzing time series? (5) Understand BTL 2
  ii. Distinguish between linear and non-linear dynamics in brief. (8)
5 i. Analyse and write a short note on Bayesian Data Analysis. (8) Analyze BTL 4
  ii. Explain the Bayesian Inference process in detail. (5)
6 Point out some of the applications of Data Analysis and its impact on various fields. (13) Analyze BTL 4
7 i. What is prediction error? (4) Remember BTL 1
  ii. State and explain the prediction error in regression and classification with a suitable example. (9)
8 i. Identify the different mechanisms needed for learning. (6) Apply BTL 3
  ii. How do you use the generalization techniques needed to illustrate neural networks? (7)
9 List the types of evolution strategies in search analysis and explain them in detail. (13) Remember BTL 1
10 i. Distinguish between supervised and unsupervised learning with an example. (6) Analyze BTL 4
  ii. Given the following 3D input data, identify the principal component: 1 1 9; 2 4 6; 3 7 4; 4 11 4; 5 9 2. (7)
11 List out and explain some of the applications of SVM in detail. (13) Remember BTL 1
12 i. What is Principal Component Analysis? (7) Understand BTL 2
  ii. Discuss how it is useful in explaining data patterns. (6)
13 i. Explain the structure of neural networks. (6) Apply BTL 3
  ii. Illustrate the mathematical functions used in the data analysis process. (7)
14 Generalize the purpose that Fuzzy logic serves in the field of data analysis. (13) Create BTL 6
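
A hand computation for Part B, Question 10 (ii) of this unit can be checked with a small sketch like the one below. It is only one possible approach and rests on assumptions that the question does not state: each semicolon-separated triple is read as one 3D sample, the sample covariance divides by n − 1, and the first principal component is found by power iteration. The function names are illustrative.

```python
# Samples from the question, read as five points in 3D (an assumption).
DATA = [(1, 1, 9), (2, 4, 6), (3, 7, 4), (4, 11, 4), (5, 9, 2)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of the given rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=200):
    """Return (eigenvalue, unit eigenvector) of the largest eigenvalue
    of the covariance matrix, using plain power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    # Fix the sign so the first coordinate is non-negative.
    if v[0] < 0:
        v = [-x for x in v]
    return eigenvalue, v


if __name__ == "__main__":
    value, vector = first_principal_component(DATA)
    print("variance along first component:", value)
    print("direction:", vector)
```

The printed direction is the unit vector along which the centered data varies most; projecting each centered sample onto it gives the first principal component scores.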
PART – C
1 Justify the statement in detail: “Data Analysis is not a decision-making system, but a decision-supporting system”. (15) Evaluate BTL 5
2 Create a Regression Model for “Happy people get many hours of sleep” using your own data, and state what kind of inferences it provides. (15) Create BTL 6
3 Analyze how Fuzzy Analytics fits Financial Market Researchers. (15) Analyze BTL 4
4 Design and explain a Naïve Bayes Classifier. Use the data given below to indicate the probability of a player who enjoys playing a sport. Here, P stands for Yes and N stands for No. Now, let us say we want to predict “Enjoy Sport” on a day with the following conditions: <Outlook = sunny; Temperature = cool; Humidity = high; Windy = strong>. (15) Create BTL 6
UNIT III - MINING DATA STREAMS
Introduction to Streams Concepts – Stream data model and architecture - Stream
Computing, Sampling data in a stream – Filtering streams – Counting distinct elements
in a stream – Estimating moments – Counting oneness in a window – Decaying window -
Realtime Analytics Platform (RTAP) applications - case studies - real time sentiment
analysis, stock market predictions.
PART – A
Q.No. Question Competence Level
1 List the main characteristics of stream sources. Remember BTL 1
2 What factors lead to Concept Drift? Remember BTL 1
3 Analyse why data stream management is relevant in data mining. Analyze BTL 4
4 Define decaying window. Remember BTL 1
5 List out the examples for stream sources. Remember BTL 1
6 List out a few challenges of data mining algorithms. Remember BTL 1
7 What is a cardinality estimation problem? Understand BTL 2
8 Analyse the statement “Filtering a Data Stream”. Apply BTL 3
9 How are “moments” estimated? Understand BTL 2
10 What is Real-Time Analysis? Understand BTL 2
11 Show how to deal with infinite streams. Apply BTL 3
12 What is a Data Stream Management System? Remember BTL 1
13 Show what examples you can find for stream sources. Apply BTL 3
14 What is Data Stream Mining? Understand BTL 2
15 Compare and contrast RTAP (real time analytics platform) and RTSA (real time sentiment analysis). Analyze BTL 4
16 Prove by induction on m that 1 + 3 + 5 + ··· + (2m − 1) = m^2. Analyze BTL 4
17 Can you identify the following? Evaluate BTL 5
  Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Our hash functions will all be of the form h(x) = ax + b mod 32 for some a and b. You should treat the result as a 5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the number of distinct elements if the hash function is:
  (a) h(x) = 2x + 1 mod 32.
  (b) h(x) = 3x + 7 mod 32.
  (c) h(x) = 4x mod 32.
18 Compute the surprise number (second moment) for the stream 3, 1, 4, 1, 3, 4, 2, 1, 2. What is the third moment of this stream? Evaluate BTL 5
19 Based on what you know, plan how you would partition the following bit stream into buckets: 1001011011101. Find all of them. Create BTL 6
20 Generalize the storage requirement for the DGIM algorithm. Create BTL 6
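
For Part A, Question 17, the tail lengths can be checked mechanically. The sketch below is a minimal illustration under stated assumptions, not a prescribed solution: the tail length is taken as the number of trailing zeros in the 5-bit hash value, a hash value of 0 is treated as having tail length 5, and the distinct-element estimate is 2^R where R is the largest tail length seen. The function names are illustrative.

```python
STREAM = [3, 1, 4, 1, 5, 9, 2, 6, 5]  # stream from Question 17


def tail_length(value, bits=5):
    """Number of trailing zero bits of value, viewed as a bits-wide integer.
    A value of 0 is treated as all zeros (assumed convention)."""
    if value == 0:
        return bits
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count


def fm_estimate(stream, a, b, modulus=32, bits=5):
    """Tail length per element for h(x) = (a*x + b) mod modulus, and the
    estimate 2**R of the number of distinct elements."""
    tails = [tail_length((a * x + b) % modulus, bits) for x in stream]
    return tails, 2 ** max(tails)


if __name__ == "__main__":
    for label, (a, b) in {"(a)": (2, 1), "(b)": (3, 7), "(c)": (4, 0)}.items():
        tails, estimate = fm_estimate(STREAM, a, b)
        print(label, "tails:", tails, "estimate:", estimate)
```

Under these assumptions, hash (a) always produces odd values, so every tail length is 0 and the estimate is 1, which shows why the choice of hash function matters.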
PART-B
Q.No. Question Competence Level
1 i. Define data stream. (3) Remember BTL 1
  ii. Describe the Big Data Stream Analytics Framework (BDSAF) with a neat architecture diagram. (10)
2 i. Explain Sampling in Data Streams. (5) Evaluate BTL 5
  ii. Explain the sampling types in detail. (8)
3 i. Describe briefly how to count the distinct elements in a stream. (9) Remember BTL 1
  ii. What do you mean by the count-distinct problem? (4)
4 i. Analyze how sentiment analysis plays a major role in data mining. (6) Analyze BTL 4
  ii. Infer what approaches are used to perform sentiment analysis. (7)
5 i. Express what Bloom filters are. (3) Understand BTL 2
  ii. Summarize the function of Bloom filters with an example. (10)
6 Discuss in detail the real time analytics platform applications. (13) Understand BTL 2
7 i. Examine what real time sentiment analysis is. (3) Apply BTL 3
  ii. Show how the mining concept is used in real time sentiment analysis. (10)
8 Describe how data analysis is used in Remember BTL 1
  i. Stock market predictions. (7)
  ii. Weather forecasting predictions. (6)
9 i. Illustrate what approaches are used to estimate the moments. (8) Apply BTL 3
  ii. Examine the cost of exact counts. (5)
10 Discuss the concept of decaying window in detail. (13) Understand BTL 2
11 Describe data stream management systems in detail. (13) Remember BTL 1
12 Analyze the phases involved in real time data analytics. (13) Analyze BTL 4
13 i. Assuming a real time stock market situation, generalize various ideas used in prediction analysis. (5) Create BTL 6
  ii. Explain the Flajolet-Martin Algorithm in detail. (8)
14 i. Explain in detail the Alon-Matias-Szegedy algorithm for second moments. (8) Analyze BTL 4
  ii. Explain the concept of higher order moments. (5)
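
The moment questions (Part A, Question 18 and Part B, Question 14 of this unit) can be explored with the short sketch below. It computes exact moments by counting, and then forms one Alon-Matias-Szegedy style estimate n(2c − 1) for every possible starting position, where c is the number of occurrences of the element from that position onward. Averaging over all positions is an assumption made here for illustration; a real stream algorithm would sample only a few positions. The function names are illustrative.

```python
from collections import Counter

STREAM = [3, 1, 4, 1, 3, 4, 2, 1, 2]  # stream from Part A, Question 18


def moment(stream, k):
    """Exact k-th moment: sum of (occurrence count)**k over distinct elements."""
    return sum(count ** k for count in Counter(stream).values())


def ams_second_moment_estimates(stream):
    """One estimate n * (2c - 1) per starting position, where c counts the
    occurrences of the element at that position from there to the end."""
    n = len(stream)
    estimates = []
    for i, element in enumerate(stream):
        c = stream[i:].count(element)
        estimates.append(n * (2 * c - 1))
    return estimates


if __name__ == "__main__":
    print("second moment (surprise number):", moment(STREAM, 2))
    print("third moment:", moment(STREAM, 3))
    estimates = ams_second_moment_estimates(STREAM)
    print("average AMS estimate:", sum(estimates) / len(estimates))
```

The average of the per-position estimates equals the exact second moment, which is the property that makes the estimator unbiased.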
PART – C
1 Formulate the process of Data Stream Mining with suitable examples. (15) Create BTL 6
2 Analyze the Bloom Filter in detail with an algorithm. Apply this Bloom filter algorithm to the Aadhaar card (Unique Identification number). (15) Analyze BTL 4
3 Assess the role of Decaying Windows in data stream analysis and give examples. (15) Evaluate BTL 5
4 Prepare a generic design for a Realtime Analytics Platform (RTAP). Discuss your answer in relation to real time sentiment analysis. (15) Create BTL 6
UNIT IV FREQUENT ITEMSETS AND CLUSTERING
Mining Frequent itemsets - Market based model – Apriori Algorithm – Handling large
data sets in Main memory – Limited Pass algorithm – Counting frequent itemsets in a
stream – Clustering Techniques – Hierarchical – K- Means – Clustering high
dimensional data – CLIQUE and PROCLUS – Frequent pattern based clustering
methods – Clustering in non-euclidean space – Clustering for streams and Parallelism.

PART – A
Q.No. Question Competence Level
1 Define frequent itemset. Remember BTL 1
2 Compare and contrast the Multistage and Multi-Hash algorithms. Understand BTL 2
3 List the features of a cluster. Remember BTL 1
4 Show what the role of monotonicity is. Apply BTL 3
5 Define singleton. Remember BTL 1
6 Assess how to pick K in a K-Means Algorithm. Evaluate BTL 5
7 What can you say about CLIQUE and PROCLUS? Understand BTL 2
8 Define Hierarchical Clustering. Remember BTL 1
9 Analyse the association rule of frequent items. Analyze BTL 4
10 List the clustering strategies. Remember BTL 1
11 Show how to stop the Merger Process. Apply BTL 3
12 Explain the role of the hash tree in association rule discovery. Remember BTL 1
13 What is meant by Merging Buckets in BDMO? Understand BTL 2
14 Formulate the applications of frequent itemsets. Create BTL 6
15 Give an outline of the strengths and weaknesses of CLIQUE. Understand BTL 2
16 Show how to use the main memory for Itemset Counting. Apply BTL 3
17 Compare and contrast the relationship between centroids and clustering. Analyze BTL 4
18 Explain the working of Toivonen’s algorithm with an example. Analyze BTL 4
19 Can you identify the Pair Counting Bottleneck? Evaluate BTL 5
20 Generalize how to initialize the K-Means algorithm. Create BTL 6

PART-B
Q.No. Question Competence Level
1 i. Define the K-Means algorithm. How will you initialize the clusters and pick the value for K? (8) Remember BTL 1
  ii. Examine how the data is processed in the BFR Algorithm. (5)
2 i. Illustrate briefly Mining Frequent Itemsets with its applications. (9) Apply BTL 3
  ii. Illustrate how you will find Association Rules with high confidence. (4)
3 i. Explain the k-means clustering algorithm with an example. (6) Analyze BTL 4
  ii. List the different hierarchical clustering techniques and explain any one. (7)
4 Summarize hierarchical clustering in Euclidean and non-Euclidean spaces with its efficiency. (13) Understand BTL 2
5 Analyse and write a short note on the Market-Basket Model with a suitable example. (13) Analyze BTL 4
6 i. What are the main features of the GRGPF Algorithm? (4) Remember BTL 1
  ii. Examine how to initialize the cluster tree and add points in the GRGPF Algorithm. (9)
7 Describe stream clustering and parallel clustering. (13) Understand BTL 2
8 A database has five transactions. Let min sup = 60% and min conf = 80%. Create BTL 6
  TID   ITEMS
  T100  Milk, Onion, Nuts, Kiwi, Egg, Yoghurt
  T200  Dhal, Onion, Nuts, Kiwi, Egg, Yoghurt
  T300  Milk, Apple, Kiwi, Egg
  T400  Milk, Curd, Kiwi, Yoghurt
  T500  Curd, Onion, Kiwi, Ice cream, Egg
  Find all frequent itemsets using the Apriori method. (13)
9 Discuss the various steps of the PROCLUS clustering algorithm and its significance. (13) Understand BTL 2
10 Quote short notes on Remember BTL 1
  i. Simple Randomized Algorithm. (4)
  ii. SON Algorithm. (4)
  iii. Toivonen’s Algorithm. (5)
11 Illustrate how you would describe the various steps of the CLIQUE clustering algorithm and its significance. (13) Apply BTL 3
12 i. List the difficulties of handling large datasets. (4) Remember BTL 1
  ii. What approach can be used to handle large datasets in main memory? (9)
13 Explain the two-pass A-Priori Algorithm in detail. (13) Analyze BTL 4
14 Suppose that A, B, C, D, E and F are all the items. For a particular support threshold, the maximal frequent itemsets are {A, B, C} and {D, E}. What is the negative border? (13) Evaluate BTL 5
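
The transaction database in Part B, Question 8 of this unit is small enough to check by program. The following is a minimal level-wise Apriori sketch, assuming that min sup = 60% of five transactions means an itemset must appear in at least 3 transactions; the function name and the integer `min_count` parameter are illustrative choices, and rule generation with min conf is left out.

```python
from itertools import combinations

# Transactions from Question 8.
TRANSACTIONS = [
    {"Milk", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},   # T100
    {"Dhal", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},   # T200
    {"Milk", "Apple", "Kiwi", "Egg"},                      # T300
    {"Milk", "Curd", "Kiwi", "Yoghurt"},                   # T400
    {"Curd", "Onion", "Kiwi", "Ice cream", "Egg"},         # T500
]
MIN_COUNT = 3  # assumed reading of "60% of 5 transactions"


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, MIN_COUNT)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The prune step is where monotonicity is used: a candidate is counted only if all of its immediate subsets were already found frequent.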
PART – C
1 Evaluate the Apriori algorithm for discovering frequent itemsets of the following table. (15) Evaluate BTL 5
2 Summarize hierarchical clustering in detail. Given a one-dimensional dataset {1, 5, 8, 10, 2}, use the agglomerative clustering algorithm with complete link and Euclidean distance to establish a hierarchical grouping relationship. By using the maximal lifetime as the cutting threshold, how many clusters are there? What is the membership of each cluster? (15) Evaluate BTL 5
3 Develop the steps in the K-means algorithm and use the K-means algorithm with Euclidean distance to cluster the following 8 examples into 3 clusters: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9). Suppose that the initial seeds are A1, A4 and A7. Run the k-means algorithm for 1 epoch only. At the end of this epoch show Create BTL 6
  i) The new clusters (5)
  ii) The centres of the new clusters (5)
  iii) How many more iterations are needed to converge? Draw the result for each epoch. (5)
4 Compose a Kohonen self-organizing net with two cluster units and five input units. The weight vectors for the cluster units are given by Create BTL 6
  W1 = [1.0, 0.9, 0.7, 0.5, 0.3]
  W2 = [0.3, 0.5, 0.7, 0.9, 1.0]
  Use the square of the Euclidean distance to find the winning cluster unit for the input pattern x = [0.0, 0.5, 1.0, 0.5, 0.0]. Using a learning rate of 0.25, find the new weights for the winning unit. [HINT: the winner unit is the one with the smaller index.] (15)
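
The K-means exercise in Part C, Question 3 of this unit can be traced epoch by epoch with a sketch such as the one below. It assumes the usual batch form of the algorithm: every point is assigned to its nearest current centre, and only then are the centres recomputed as cluster means. Ties go to the lower-numbered cluster. The function names are illustrative.

```python
# Points and seeds from Question 3.
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
SEEDS = [POINTS["A1"], POINTS["A4"], POINTS["A7"]]


def squared_distance(p, q):
    """Squared Euclidean distance; it ranks centres the same as the distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_epoch(points, centers):
    """Assign every point to its nearest centre, then recompute the centres."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(name)
    new_centers = []
    for members, old in zip(clusters, centers):
        if not members:  # keep the old centre if a cluster becomes empty
            new_centers.append(old)
            continue
        coords = zip(*(points[m] for m in members))
        new_centers.append(tuple(sum(c) / len(members) for c in coords))
    return clusters, new_centers


def run_kmeans(points, centers, max_epochs=100):
    """Repeat epochs until the centres stop changing; return every epoch."""
    history = []
    for _ in range(max_epochs):
        clusters, new_centers = kmeans_epoch(points, centers)
        history.append((clusters, new_centers))
        if new_centers == centers:
            break
        centers = new_centers
    return history


if __name__ == "__main__":
    for epoch, (clusters, centers) in enumerate(run_kmeans(POINTS, SEEDS), 1):
        print("epoch", epoch, clusters, centers)
```

Under these assumptions the first epoch gives the clusters {A1}, {A3, A4, A5, A6, A8} and {A2, A7} with centres (2, 10), (6, 6) and (1.5, 3.5). Two further epochs change the assignments, and a fourth epoch only confirms that nothing moves.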

UNIT V FRAMEWORKS AND VISUALIZATION


MapReduce – Hadoop, Hive, MapR – Sharding – NoSQL Databases - S3 - Hadoop
Distributed file systems – Visualizations - Visual data analysis techniques, interaction
techniques; Systems and applications:
PART – A
Q.No. Question Competence Level
1 What is the CAP theorem? State its significance. Remember BTL 1
2 Describe Relational Database. Understand BTL 2
3 Deduce the components of the Hadoop framework. Evaluate BTL 5
4 Point out how you can manage compute node failures. Analyze BTL 4
5 What is the advantage of MapR? Remember BTL 1
6 Give the applications of IDA. Understand BTL 2
7 What is Hadoop Distributed File System? Remember BTL 1
8 Show, if we use a triangular matrix to count pairs and n, the number of items, is 20, what pair’s count is in a[100]. Apply BTL 3
9 Who is generating big data and what are the ecosystem projects used for processing? Create BTL 6
10 Illustrate the strengths and weaknesses of MapReduce. Apply BTL 3
11 List the data types to be visualized. Remember BTL 1
12 Express how a MapReduce computation executes. Understand BTL 2
13 What is NoSQL? Remember BTL 1
14 Illustrate the benefits of visual data exploration. Apply BTL 3
15 Classify visualization techniques. Analyze BTL 4
16 Give the features of Hive. Understand BTL 2
17 Classify interaction techniques. Analyze BTL 4
18 What is Hive in Big Data? Remember BTL 1
19 Judge why the partitions are shuffled in MapReduce. Evaluate BTL 5
20 How will you formulate Hadoop development? Create BTL 6
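
Part A, Question 8 depends on how pairs are laid out in the triangular array. The sketch below assumes the commonly taught layout, which the question itself does not spell out: items are numbered 1 to n, pairs {i, j} with i < j are stored in lexicographic order, and the array index is 1-based, so the position is k = (i − 1)(n − i/2) + j − i. The function names are illustrative.

```python
def triangular_index(i, j, n):
    """1-based position of pair {i, j}, with 1 <= i < j <= n, in the
    one-dimensional triangular array (lexicographic order assumed)."""
    # (i - 1) * (2n - i) is always even, so integer division is exact.
    return (i - 1) * (2 * n - i) // 2 + j - i


def pair_at(k, n):
    """Inverse lookup by brute force: the pair stored at position k."""
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            if triangular_index(i, j, n) == k:
                return i, j
    return None


if __name__ == "__main__":
    print(pair_at(100, 20))
```

With a different layout, for example 0-based indexing, the same position would hold a different pair, so the convention should be stated alongside the answer.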

PART-B
Q.No. Question Competence Level
1 i. List the features of Hadoop and explain the functionalities of a Hadoop cluster. (6) Remember BTL 1
  ii. Describe briefly Hadoop input and output and write a note on data integrity. (7)
2 i. Illustrate in detail Hive data manipulation, queries, data definition and data types. (7) Apply BTL 3
  ii. Illustrate in brief composing MapReduce calculations. (6)
3 Describe the system architecture and components of Hive and Hadoop. (13) Remember BTL 1
4 Explain briefly Analyze BTL 4
  i. MapR (4)
  ii. Sharding (5)
  iii. S3 (4)
5 Consider a collection of literature surveys made by a researcher in the form of a text document with respect to cloud and big data analytics. Using Hadoop and MapReduce, write a program to count the occurrences of predominant keywords. (13) Create BTL 6
6 i. Describe the MapReduce framework in detail. Draw the architectural diagram for the physical organization of compute nodes. (7) Remember BTL 1
  ii. Define HDFS. Explain HDFS in detail. (6)
7 i. Analyse the visualization techniques used to visualize data. (7) Apply BTL 3
  ii. Explain any two approaches. (6)
8 Summarize briefly Understand BTL 2
  i. Features of the MapR distribution. (8)
  ii. The architecture of MapR. (5)
9 Quote short notes on Remember BTL 1
  i. NoSQL Databases and its types. (6)
  ii. Visualization for Big Data. (7)
10 Discuss the various core components of Hadoop. (13) Understand BTL 2
11 Compare and contrast Hadoop and MapR. (13) Analyze BTL 4
12 i. Explain the purpose of sharding. (7) Analyze BTL 4
  ii. Explain the process of sharding in MongoDB. (6)
13 Describe in detail the issues in the development of IDA. (13) Understand BTL 2
14 i. Assess the significance of MapReduce. (4) Evaluate BTL 5
  ii. Explain the Hadoop Distributed File System architecture with a neat diagram. (9)
PART-C
1 Recommend a procedure to find the number of occurrences of a word in a document. (15) Evaluate BTL 5
2 Analyse the use of Hive. Explain in detail how Hive interacts with Hadoop. (15) Analyze BTL 4
3 Develop a visualization technique to represent the following data. (15) Create BTL 6
  i. uni-variate data
  ii. 2D data
  iii. multi-dimensional data
  iv. pyramid-type data
4 Formulate how big data analysis helps business people to increase their revenue. Discuss with any one real time application. (15) Create BTL 6
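
The word-count questions in this unit (Part B, Question 5 and Part C, Question 1) follow the map, shuffle and reduce pattern. The sketch below imitates that pattern in plain Python on a single machine; it does not use Hadoop or any Hadoop API, and the function names and the sample lines are hypothetical, chosen only to show the data flow.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z0-9]+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would
    do between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence and return {word: count}."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Big data and cloud", "big data analytics"]  # hypothetical input
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run on different compute nodes, and the grouping done here by `shuffle` would be carried out by the framework.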
