
DEPARTMENT OF COMPUTER APPLICATIONS

SUBJECT: DATA MINING TECHNIQUES & WAREHOUSING


UNIT 1
1. Describe the knowledge discovery in databases (KDD) process. List the essential stages of KDD.
2. Why do we need data mining? Discuss. Describe the various types of databases and the scope of
data mining in each.
3. How is each of the following helpful in data mining?
(i) Fuzzy logic
(ii) Neural networks
(iii) Genetic algorithms
4. Discuss the major issues in data mining regarding:
(i) Mining methodology and user interaction
(ii) Performance
(iii) Diverse data types
5. Draw and discuss the architecture of a data mining system.
6. Explain the typical Data Warehouse architecture in detail. Justify the role of Metadata in
Data Warehouse design.
7. Explain the characteristic features of a data warehouse with examples. Every data structure
in a data warehouse contains the time dimension. Justify.
8. A data warehouse is subject-oriented by design. What could be the likely subjects of
decision-making or in-depth analysis for a local community bank?
9. Explain the star, snowflake and fact constellation schemas for representing multidimensional databases, using suitable examples.
10. Describe the different types of OLAP operations supported by multidimensional databases,
with an example.
11. Distinguish between:
(i) MIS and data warehouses
(ii) OLAP and OLTP systems
(iii) Operational database systems and data warehouses
(iv) ROLAP and MOLAP
(v) Dependent and independent data marts
(vi) Supervised and unsupervised learning
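For Question 10, a minimal Python sketch of the common OLAP operations (roll-up, slice, dice and pivot) on a tiny in-memory sales cube. The fact rows, the city-to-country hierarchy and the function names are hypothetical and chosen only for illustration; a real multidimensional database would typically precompute or index such aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, quarter, item, units_sold).
FACTS = [
    ("Vancouver", "Canada", "Q1", "phone", 100),
    ("Toronto", "Canada", "Q1", "phone", 50),
    ("Vancouver", "Canada", "Q2", "phone", 150),
    ("Toronto", "Canada", "Q1", "computer", 200),
    ("Chicago", "USA", "Q1", "phone", 120),
    ("New York", "USA", "Q1", "phone", 30),
    ("Chicago", "USA", "Q2", "computer", 80),
    ("New York", "USA", "Q2", "phone", 90),
]

def roll_up_to_country(facts):
    """Roll-up: climb the location hierarchy (city -> country) and aggregate units."""
    totals = defaultdict(int)
    for _city, country, quarter, item, units in facts:
        totals[(country, quarter, item)] += units
    return dict(totals)

def slice_(facts, quarter):
    """Slice: fix a single dimension (time = quarter), giving a sub-cube."""
    return [row for row in facts if row[2] == quarter]

def dice(facts, countries, items):
    """Dice: select on two or more dimensions at once."""
    return [row for row in facts if row[1] in countries and row[3] in items]

def pivot(facts):
    """Pivot: re-orient the data as a country x quarter grid of total units."""
    grid = defaultdict(lambda: defaultdict(int))
    for _city, country, quarter, _item, units in facts:
        grid[country][quarter] += units
    return {country: dict(row) for country, row in grid.items()}
```

Drill-down is the inverse of `roll_up_to_country`: it returns to the finer city level that the fact rows still hold.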
UNIT 2
1. What are the types of storage repositories in the data staging component of the data warehouse (DWH)
architecture?
2. Discuss the issues to consider during data integration.
3. List five reasons why you think data quality is critical in a data warehouse.
4. List and describe the five primitives for specifying a data mining task.
5. Describe why concept hierarchies are useful in data mining. Briefly define the four major
types of concept hierarchies, with examples.
6. Describe the differences between the following architectures for the integration of a data
mining system with a database or a data warehouse system: no coupling, loose coupling,
semitight coupling and tight coupling.
7. For class characterization, what are the major differences between a data cube-based
implementation and relational implementation such as attribute-oriented induction?
8. Discuss why analytical characterization is needed and how it can be performed. Compare
the results of two induction methods: (1) with relevance analysis and (2) without relevance
analysis.
9. Outline a data cube-based incremental algorithm for mining analytical class comparisons.
10. Outline a method for (1) parallel and (2) distributed mining of statistical measures of data
dispersion in a data cube environment.
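For Question 10 of this unit, a minimal Python sketch of one possible approach, assuming the cube's cells are split across workers or sites: variance and standard deviation are algebraic measures, so each partition computes the distributive sub-aggregates count, sum and sum of squares, and a coordinator simply adds them up. The function names are hypothetical, and `ThreadPoolExecutor` merely stands in for parallel workers; in a distributed setting each `partial_stats` call would run at a separate site.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_stats(values):
    """Distributive sub-aggregates computed locally on one partition."""
    return len(values), sum(values), sum(v * v for v in values)

def merge_stats(stats):
    """Count, sum and sum of squares combine by plain addition across partitions."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    return n, total, total_sq

def dispersion(partitions):
    """Mean, population variance and standard deviation from per-partition summaries."""
    # Each partition is summarised independently (in parallel threads here).
    with ThreadPoolExecutor() as pool:
        stats = list(pool.map(partial_stats, partitions))
    n, total, total_sq = merge_stats(stats)
    mean = total / n
    variance = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, variance, math.sqrt(variance)

mean, var, std = dispersion([[2, 4, 4], [4, 5, 5], [7, 9]])
```

The one-pass formula E[x^2] - (E[x])^2 can lose precision on large values. Holistic measures such as the median or quartiles cannot be combined exactly from fixed-size summaries like these and need approximation instead.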

UNIT 3
1. Describe the stages of mining temporal data. Also explain some temporal
association rules.
2. Discuss the concepts of frequent set, confidence and support. Define an association rule.
What are the steps in association rule mining?
3. Define an FP-tree. Discuss the method of constructing an FP-tree.
4. What is Market basket analysis?
5. Briefly outline the major steps of decision tree classification.
6. Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of samples to evaluate pruning?
7. Why is naïve Bayesian classification called naïve? Briefly outline the major ideas of naïve
Bayesian classification.
8. Compare the advantages and disadvantages of eager classification versus lazy
classification.
9. Encode the K-nearest neighbour classification algorithm for mining a database.
10. What is boosting? State why it may improve the accuracy of decision tree induction.
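For Question 2 (and the market basket analysis of Question 4), a minimal Python sketch of support, confidence and frequent-itemset enumeration. The transactions are made up for illustration, and the brute-force enumeration is only feasible on toy data; algorithms such as Apriori or FP-growth exist precisely to prune this search.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(A => B) = support(A u B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute force: test every itemset up to max_size against min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result
```

Here the rule diapers => beer has support 0.6 and confidence 0.75, while beer => diapers has the same support but confidence 1.0, which shows that confidence is not symmetric.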
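For Question 3, a minimal Python sketch of FP-tree construction following the usual two-scan procedure: first count item supports, then insert each transaction's frequent items in descending support order so that transactions with common prefixes share nodes. The class and function names are hypothetical, ties in support are broken alphabetically (an assumption; any fixed order works), and mining the tree (FP-growth) is not shown.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item      # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}    # item -> FPNode

def build_fp_tree(transactions, min_count):
    # Scan 1: support count of every item; keep only frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {i: [] for i in frequent}  # header table: item -> its nodes (node-links)
    # Scan 2: insert each transaction's frequent items, most frequent first.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:  # prefix not shared yet: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# Illustrative transactions.
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
root, header = build_fp_tree(TRANSACTIONS, min_count=2)
```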
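For the K-nearest neighbour question (Question 9 above), a minimal Python sketch of k-NN classification over tuples already loaded from a database table. Euclidean distance, simple majority voting, the function name and the toy training tuples are assumptions for illustration; in practice numeric attributes are usually normalized first so that no single attribute dominates the distance.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Label query by majority vote among its k nearest training tuples.

    training: list of (feature_tuple, label) pairs.
    """
    neighbours = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, most_common(1) keeps the label met first, i.e. the nearer one.
    return votes.most_common(1)[0][0]

# Hypothetical training tuples (two numeric attributes, class A or B).
TRAINING = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
```

k-NN is a lazy learner: no model is built in advance, and all the work happens at query time, which is the trade-off Question 8 asks about.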
1. What is a classification problem? How is a decision tree useful in classification?
2. Write the major issues regarding classification and prediction.
3. Briefly outline the major steps of decision tree classification.
4. Why is naïve Bayesian classification called naïve? Explain naïve Bayesian
classification with an example.
5. Compare the advantages and disadvantages of eager classification versus lazy
classification.
6. Why is tree pruning useful in decision tree induction? What is a drawback of
using a separate set of samples to evaluate pruning?
7. Encode the K-nearest neighbour classification algorithm for mining a database.
8. The following table shows the midterm and final exam grades obtained by
students in a database course.

Midterm exam (X)   Final exam (Y)
72                 84
50                 63
81                 77
74                 78
84                 90
96                 75
59                 49
83                 79
65                 77
33                 52
88                 74
81                 90

(i) Plot the data. Do X and Y seem to have a linear relationship?
(ii) Use the method of least squares to find an equation for predicting a
student's final exam grade from the student's midterm grade in the course.

(iii) Predict the final exam grade of a student who received an 86 on the midterm
exam.

9. Consider the following set of training samples:
            F1   F2   F3    Category
Example 1   A    T    0.2   +
Example 2   B    F    0.5   +
Example 3   B    F    0.9   +
Example 4   B    T    0.6   -
Example 5   A    T    0.1   -
Example 6   A    T    0.7   -
(a) How might a naïve Bayes system classify the following test example?
F1 = c, F2 = T, F3 = 0.8
(b) Show the calculations that ID3 would perform to determine the root node of a
decision tree using the above training examples.
10. The sample dataset below contains the profiles of 12 customers and their buy or
no-buy responses to a new promotional email:

Customer  Income  Uses high connection  Education level  Buy decision
1         Low     No                    High school      No-buy
2         Low     Yes                   High school      No-buy
3         Low     No                    College          No-buy
4         Low     Yes                   College          Buy
5         Medium  No                    High school      No-buy
6         Medium  Yes                   High school      No-buy
7         Medium  No                    College          Buy
8         Medium  Yes                   College          Buy
9         High    No                    High school      No-buy
10        High    Yes                   High school      Buy
11        High    No                    College          Buy
12        High    Yes                   College          Buy
Try to predict the buying decision of a new customer whose annual income is Rs.
15,00,000, who uses a 512 KB modem and who majored in business management.
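For the midterm/final grade question (Question 8 above), a minimal Python sketch of the least-squares fit and the requested prediction, using the grades exactly as they appear in the table. The slope is w1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept is w0 = ȳ - w1·x̄. With these values the fit comes out at roughly Y ≈ 35.6 + 0.532 X, which predicts a final grade of about 81.4 for a midterm score of 86.

```python
# Midterm (X) and final (Y) grades, in the order listed in the table.
X = [72, 50, 81, 74, 84, 96, 59, 83, 65, 33, 88, 81]
Y = [84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90]

def least_squares(xs, ys):
    """Fit y = w0 + w1 * x by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1

w0, w1 = least_squares(X, Y)
predicted = w0 + w1 * 86  # part (iii): midterm score of 86
print(f"final = {w0:.2f} + {w1:.4f} * midterm; prediction for 86: {predicted:.1f}")
```

Plotting the points for part (i) can be done with any charting tool; the sketch only covers the arithmetic.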
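For part (b) of the training-samples question (Question 9 above), a minimal Python sketch of the information-gain calculations used to choose the root node. Two assumptions are made and flagged in the code: Examples 4 to 6 are taken as negative (-), since the class is binary and their category is blank in the extracted table, and the numeric attribute F3 is split at midpoints between sorted values (a C4.5-style treatment; classic ID3 expects categorical attributes). Under these assumptions F2 has the highest gain (about 0.459 bits, against about 0.082 for F1 and about 0.191 for the best F3 split), so it would become the root.

```python
import math
from collections import Counter

# ASSUMPTION: Examples 4-6 are negative; their category is blank in the table.
DATA = [
    ({"F1": "A", "F2": "T", "F3": 0.2}, "+"),
    ({"F1": "B", "F2": "F", "F3": 0.5}, "+"),
    ({"F1": "B", "F2": "F", "F3": 0.9}, "+"),
    ({"F1": "B", "F2": "T", "F3": 0.6}, "-"),
    ({"F1": "A", "F2": "T", "F3": 0.1}, "-"),
    ({"F1": "A", "F2": "T", "F3": 0.7}, "-"),
]

def entropy(labels):
    """Info(D): expected bits needed to classify a sample drawn from this set."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(partitions, labels):
    """Gain = Info(D) - sum(|Dj| / |D| * Info(Dj)) over non-empty partitions Dj."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions if p)

def categorical_gain(data, attr):
    groups = {}
    for x, y in data:
        groups.setdefault(x[attr], []).append(y)
    return gain(list(groups.values()), [y for _, y in data])

def numeric_gain(data, attr):
    """ASSUMPTION (C4.5-style): best binary split at midpoints of sorted values."""
    labels = [y for _, y in data]
    values = sorted({x[attr] for x, _ in data})
    best = 0.0
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in data if x[attr] <= t]
        right = [y for x, y in data if x[attr] > t]
        best = max(best, gain([left, right], labels))
    return best

gains = {
    "F1": categorical_gain(DATA, "F1"),
    "F2": categorical_gain(DATA, "F2"),
    "F3": numeric_gain(DATA, "F3"),
}
root = max(gains, key=gains.get)
```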
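For the customer question (Question 10 above), a minimal Python sketch of a naïve Bayesian classifier trained on the customer table; this is one reasonable choice of method, and a decision tree would also fit the question. Mapping the new customer onto the table's attributes needs assumptions: an income of Rs. 15,00,000 is treated as High and a business management degree as College, while whether a 512 KB modem counts as a high connection is not stated, so both values are tried. Under these assumptions both variants are classified as Buy.

```python
from collections import Counter, defaultdict

# (income, uses high connection, education level, buy decision), as in the table.
ROWS = [
    ("Low", "No", "High school", "No-buy"),
    ("Low", "Yes", "High school", "No-buy"),
    ("Low", "No", "College", "No-buy"),
    ("Low", "Yes", "College", "Buy"),
    ("Medium", "No", "High school", "No-buy"),
    ("Medium", "Yes", "High school", "No-buy"),
    ("Medium", "No", "College", "Buy"),
    ("Medium", "Yes", "College", "Buy"),
    ("High", "No", "High school", "No-buy"),
    ("High", "Yes", "High school", "Buy"),
    ("High", "No", "College", "Buy"),
    ("High", "Yes", "College", "Buy"),
]

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[-1] for r in rows)
    cond = defaultdict(Counter)  # (class, attribute index) -> value counts
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            cond[(label, i)][value] += 1
    return class_counts, cond

def classify(x, class_counts, cond):
    """Pick the class maximising P(C) * prod_i P(x_i | C), without smoothing."""
    n = sum(class_counts.values())
    scores = {}
    for label, nc in class_counts.items():
        score = nc / n
        for i, value in enumerate(x):
            score *= cond[(label, i)][value] / nc
        scores[label] = score
    return max(scores, key=scores.get), scores

model = train(ROWS)
# ASSUMPTION: Rs. 15,00,000 -> "High"; business management graduate -> "College".
for connection in ("Yes", "No"):
    decision, scores = classify(("High", connection, "College"), *model)
    print(connection, decision, scores)
```

Without smoothing, an attribute value never seen with a class drives that class's score to zero; Laplace correction is the usual remedy when that happens.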
UNIT 4
1. What is Clustering?
2. Given two objects represented by the tuples (10, 44, 3, 28, 18) and (13, 50, 2, 18, 25):
(i) Compute the Euclidean distance between the two objects.
(ii) Compute the Manhattan distance between the two objects.
3. Given the following measurements for the variable marks:
40, 65, 54, 78, 92, 72, 46, 59, 80, 63
standardize the variable as follows:
(i) Compute the mean absolute deviation of marks.
(ii) Compute the z-score for the first four measurements.

4. Describe the working of the PAM algorithm. Compare its performance with
CLARA and CLARANS.
5. Briefly describe density-based clustering methods and grid-based clustering
methods. Give examples in each case.
6. Why is outlier mining important? Explain the distance-based and
deviation-based outlier detection approaches.
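For Question 2 of this unit, a minimal Python sketch using the Minkowski distance, which reduces to the Manhattan distance for h = 1 and the Euclidean distance for h = 2. For the two tuples the coordinate differences are 3, 6, 1, 10 and 7, so the Manhattan distance is 27 and the Euclidean distance is √195 ≈ 13.96.

```python
import math

def minkowski(p, q, h):
    """Minkowski distance of order h between two equal-length numeric tuples."""
    return sum(abs(a - b) ** h for a, b in zip(p, q)) ** (1 / h)

x = (10, 44, 3, 28, 18)
y = (13, 50, 2, 18, 25)
manhattan = minkowski(x, y, 1)  # h = 1
euclidean = minkowski(x, y, 2)  # h = 2; same value as math.dist(x, y)
```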
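For Question 3, a minimal Python sketch of standardization with the mean absolute deviation s_f = (1/n) Σ |x_i - m_f| and z_i = (x_i - m_f) / s_f. Here the mean m_f is 64.9 and s_f is 12.5, so the first four z-scores are about -1.992, 0.008, -0.872 and 1.048.

```python
marks = [40, 65, 54, 78, 92, 72, 46, 59, 80, 63]

n = len(marks)
mean = sum(marks) / n                            # m_f
mad = sum(abs(x - mean) for x in marks) / n      # mean absolute deviation s_f
z_scores = [(x - mean) / mad for x in marks[:4]] # standardized first four values
```

Because the deviations are not squared, s_f is less affected by outliers than the standard deviation.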
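For Question 4, a simplified Python sketch of PAM's swap phase. It starts from the first k objects as medoids (real PAM uses a greedy BUILD step instead, so this start is an assumption), repeatedly performs the single medoid/non-medoid swap that most reduces the total distance of objects to their nearest medoid, and stops when no swap helps. Each pass evaluates k(n - k) swaps, each costed over all n objects, which is why CLARA runs PAM on samples and CLARANS examines only randomly chosen swaps.

```python
import itertools
import math

def total_cost(points, medoids):
    """Sum of distances from every object to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: greedy best-swap search from an arbitrary start."""
    medoids = list(points[:k])  # ASSUMPTION: first k objects as initial medoids
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_medoids = cost, None
        # Try replacing each medoid i with each non-medoid object.
        for i, candidate in itertools.product(range(k), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            trial_cost = total_cost(points, trial)
            if trial_cost < best_cost:
                best_cost, best_medoids = trial_cost, trial
        if best_medoids is None:  # no swap lowers the cost: local optimum
            return medoids, cost
        medoids, cost = best_medoids, best_cost

# Hypothetical one-dimensional objects forming two obvious groups.
POINTS = [(1,), (2,), (3,), (10,), (11,), (12,)]
medoids, cost = pam(POINTS, k=2)
```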
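For Question 6, a minimal Python sketch of distance-based outlier detection in the DB(pct, dmin) sense: an object is flagged when at least a fraction pct of the other objects lie farther than dmin from it. Counting only the other objects, the parameter values and the sample points are assumptions for illustration. This nested-loop version compares every pair of objects, so it needs O(n²) distance computations.

```python
import math

def db_outliers(points, pct, dmin):
    """Flag o as a DB(pct, dmin)-outlier if at least a fraction pct of the
    other objects lie at a distance greater than dmin from o."""
    outliers = []
    for i, o in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        far = sum(1 for p in others if math.dist(o, p) > dmin)
        if far >= pct * len(others):
            outliers.append(o)
    return outliers

# Hypothetical 2-D objects: a tight cluster plus one distant point.
POINTS = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
print(db_outliers(POINTS, pct=0.9, dmin=3.0))
```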

UNIT 5
1. Text mining is different from conventional data mining. Comment.

2. Discuss the usefulness of data mining techniques in e-commerce.


3. Can data mining be performed on data generated by the Web? Explain.
4. Explain the use of data mining techniques in intrusion detection systems (IDS)
and the future challenges in improving them.
5. Describe the stages of mining Temporal Data.
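For Question 1 of this unit, a minimal Python sketch of one step that sets text mining apart from mining structured tables: unstructured documents must first be turned into term vectors, here weighted by tf-idf (term frequency times log(N / document frequency)). The tokenizer, the weighting variant and the sample sentences are assumptions for illustration. A term that occurs in every document, such as "mining" in the samples, gets weight 0.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep runs of letters as terms (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Map each document to a sparse {term: tf * idf} vector."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

# Hypothetical sample documents.
DOCS = [
    "Data mining finds patterns",
    "Text mining handles text",
    "Web mining uses web data",
]
vectors = tf_idf(DOCS)
```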
