UNIT-3
1. Describe the stages of mining Temporal Data. Also explain some of the Temporal
Association rules.
2. Discuss the concept of frequent set, confidence and support. Define an association rule.
What are the steps in Association Rule Mining?
3. Define an FP-tree. Discuss the method of computing an FP-tree.
4. What is Market basket analysis?
5. Briefly outline the major steps of decision tree classification.
6. Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of samples to evaluate pruning?
7. Why is naive Bayesian classification called naive? Briefly outline the major ideas of naive
Bayesian classification.
8. Compare the advantages and disadvantages of eager classification versus lazy
classification.
9. Encode the k-nearest neighbour classification algorithm for mining a database.
10. What is boosting? State why it may improve the accuracy of decision tree induction.
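As a starting point for question 9, the following is a minimal sketch of k-nearest neighbour classification, not a definitive implementation. The function name `knn_classify`, the Euclidean distance measure, the majority vote, and the tiny training set are all assumptions made for illustration; none of them come from the question bank.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; other distance measures could be substituted.
    """
    # Sort the training samples by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among the k neighbours and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-class training data, made up for this example.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify(train, (1.1, 1.0), k=3))
print(knn_classify(train, (5.0, 5.2), k=3))
```

Because the classifier does no work until a query arrives, it is also a convenient concrete case when discussing lazy versus eager classification in question 8.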
1.
2.
3.
4.
5.
6.
7.
8.
(i)
(ii)
9. Predict the final exam grade of a student who received an 86 on the midterm
exam. Consider the set of training samples:
           F1   F2   F3    Category
Example1   A    T    0.2   +
Example2   B    F    0.5   +
Example3   B    F    0.9   +
Example4   B    T    0.6
Example5   A    T    0.1
Example6   A    T    0.7
(a) How might a Naive Bayes system classify the following test example?
F1=c, F2=T, F3=0.8
(b) Show the calculations that ID3 would perform to determine the root node of a
decision tree using the above training examples.
10. The sample dataset below contains the profiles of 12 customers, together with their
buy or no-buy responses to a new promotional email:

Customer   Customer income   Uses high connection   Education level   Buy decision
1          Low               No                     High school       No-buy
2          Low               Yes                    High school       No-buy
3          Low               No                     College           No-buy
4          Low               Yes                    College           Buy
5          Medium            No                     High school       No-buy
6          Medium            Yes                    High school       No-buy
7          Medium            No                     College           Buy
8          Medium            Yes                    College           Buy
9          High              No                     High school       No-buy
10         High              Yes                    High school       Buy
11         High              No                     College           Buy
12         High              Yes                    College           Buy
Try to predict the buying decision of a new customer whose annual income is Rs.
15,00,000, who uses a 512 KB modem, and who majored in business management.
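One possible way to approach question 10 is decision tree induction, whose first step is to choose a root attribute by information gain. The sketch below computes the gain of each attribute over the 12 customers listed above. The variable and function names are assumptions for illustration. Mapping the new customer onto the table's categories (for example, whether Rs. 15,00,000 counts as "High" income, or whether a 512 KB modem counts as a high connection) is a separate judgment that the question leaves open and that this sketch does not make.

```python
import math
from collections import Counter

# Rows copied from the customer table:
# (income, uses high connection, education level, buy decision)
customers = [
    ("Low", "No", "High school", "No-buy"),
    ("Low", "Yes", "High school", "No-buy"),
    ("Low", "No", "College", "No-buy"),
    ("Low", "Yes", "College", "Buy"),
    ("Medium", "No", "High school", "No-buy"),
    ("Medium", "Yes", "High school", "No-buy"),
    ("Medium", "No", "College", "Buy"),
    ("Medium", "Yes", "College", "Buy"),
    ("High", "No", "High school", "No-buy"),
    ("High", "Yes", "High school", "Buy"),
    ("High", "No", "College", "Buy"),
    ("High", "Yes", "College", "Buy"),
]

# Column index of each attribute (names chosen for this sketch).
ATTRIBUTES = {"income": 0, "connection": 1, "education": 2}

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the class labels minus the weighted entropy after splitting on `column`."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[column] for row in rows):
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(customers, col) for name, col in ATTRIBUTES.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("highest gain:", best)
```

On this data the gains come out at roughly 0.126 for income, 0.082 for connection, and 0.350 for education, so education level would be the root split under this criterion.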
UNIT IV
1. What is Clustering?
2. Given two objects represented by the tuples (10, 44, 3, 28, 18) and (13, 50, 2, 18, 25):
(i) Compute the Euclidean distance between the two objects.
(ii) Compute the Manhattan distance between the two objects.
3. Given the following measurements for the variable marks:
40, 65, 54, 78, 92, 72, 46, 59, 80, 63
standardize the variable by the following:
(i) Compute the mean absolute deviation of marks.
(ii) Compute the z-score for the first four measurements.
4. Describe the working of the PAM algorithm. Compare its performance with
CLARA and CLARANS.
5. Briefly describe density-based clustering methods and grid-based clustering
methods. Give examples in each case.
6. Why is outlier mining important? Explain distance-based outlier detection and
deviation-based outlier detection approaches.
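The arithmetic in questions 2 and 3 of this unit can be checked with a short script. This is a sketch under one stated assumption: the z-score divides by the mean absolute deviation rather than the standard deviation, which is what the two-step wording of question 3 suggests. If the standard deviation is intended, the second step changes. The function names are chosen for this example.

```python
import math

# Tuples from question 2.
x = (10, 44, 3, 28, 18)
y = (13, 50, 2, 18, 25)

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(a, b))

# Measurements from question 3.
marks = [40, 65, 54, 78, 92, 72, 46, 59, 80, 63]

def mean_abs_dev(values):
    """Average absolute distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def z_scores(values):
    """Standardized values, ASSUMING the divisor is the mean absolute deviation."""
    mean = sum(values) / len(values)
    spread = mean_abs_dev(values)
    return [(v - mean) / spread for v in values]

print("Euclidean:", round(euclidean(x, y), 3))   # sqrt(195)
print("Manhattan:", manhattan(x, y))             # 3 + 6 + 1 + 10 + 7
print("Mean absolute deviation:", round(mean_abs_dev(marks), 3))
print("First four z-scores:", [round(z, 3) for z in z_scores(marks)[:4]])
```

With these inputs the Euclidean distance is the square root of 195 (about 13.964), the Manhattan distance is 27, the mean of marks is 64.9, and the mean absolute deviation is 12.5.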
UNIT-V
1. Text Mining is different from conventional Data Mining. Comment.