t1  t2  t3  t4  t5  t6  t7  t8  d
84   8  64   6  94  36  51  21  1
39  67  61  77  80  35  89  80  1
Use rpart with the training examples to come up with a small set of rules that
correctly classify the output variable d based on input variable values (t1, t2,
t3, t4, t5, t6, t7, and t8).
Answer: Completed
Command:
> library(rpart)
> trainingdata = read.csv("dt_train.csv")
> modeldata <- rpart(d ~ t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8,
+                    data = trainingdata, method = "class")
> modeldata
Output:
n= 600

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 600 95 1 (0.1583333 0.8416667)
  2) t7< 33.5 196 95 1 (0.4846939 0.5153061)
    4) t5< 60.5 125 30 0 (0.7600000 0.2400000)
      8) t3< 76 95 0 0 (1.0000000 0.0000000) *
      9) t3>=76 30 0 1 (0.0000000 1.0000000) *
    5) t5>=60.5 71 0 1 (0.0000000 1.0000000) *
  3) t7>=33.5 404 0 1 (0.0000000 1.0000000) *
Analysis:
Terminal nodes (leaves) are marked with * at the end of their row. In this case
the terminal nodes are 3, 5, 8 and 9.
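The node rows can be cross-checked by hand. For a two-class tree, loss is the number of training examples in the node whose class differs from yval, and (yprob) lists the proportions of class 0 and class 1. The sketch below recomputes (yprob) for three nodes from the printed counts only; the helper name node_probs is made up for illustration and is not part of rpart.

```r
# Recompute (yprob) for a node from its printed n, loss and yval.
# Assumes a two-class problem with classes 0 and 1.
node_probs <- function(n, loss, yval) {
  p_other <- loss / n         # share of examples that disagree with yval
  p_yval  <- (n - loss) / n   # share of examples that agree with yval
  if (yval == 1) {
    c("0" = p_other, "1" = p_yval)
  } else {
    c("0" = p_yval, "1" = p_other)
  }
}

root  <- node_probs(600, 95, 1)   # node 1: root
node2 <- node_probs(196, 95, 1)   # node 2: t7 < 33.5
node4 <- node_probs(125, 30, 0)   # node 4: t5 < 60.5

print(round(root, 7))
print(round(node2, 7))
print(round(node4, 7))
```

The rounded values should agree with the (yprob) columns printed for nodes 1, 2 and 4.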
Specify the rules.
Command:
> rule <- path.rpart(modeldata, nodes= 3)
> rule <- path.rpart(modeldata, nodes= 5)
> rule <- path.rpart(modeldata, nodes= 8)
> rule <- path.rpart(modeldata, nodes= 9)
Output:
node number: 3
root
t7>=33.5
node number: 5
root
t7< 33.5
t5>=60.5
node number: 8
root
t7< 33.5
t5< 60.5
t3< 76
node number: 9
root
t7< 33.5
t5< 60.5
t3>=76
Analysis:
The predicted value of 'd' for each rule is taken from the 'yval' column of the
corresponding terminal node in the 'modeldata' output. The rules can be specified as:
IF t7 >= 33.5 THEN d = 1
IF t7 < 33.5 AND t5 >= 60.5 THEN d = 1
IF t7 < 33.5 AND t5 < 60.5 AND t3 < 76 THEN d = 0
IF t7 < 33.5 AND t5 < 60.5 AND t3 >= 76 THEN d = 1
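As an illustration, the four rules can be written as a small base R function. This is a hand-coded sketch, not something rpart produces; the function name classify_d and the four input rows are made up, with one row chosen to hit each rule.

```r
# Hand-coded version of the four rules (hypothetical helper).
classify_d <- function(t3, t5, t7) {
  ifelse(t7 >= 33.5, 1,          # node 3
  ifelse(t5 >= 60.5, 1,          # node 5
  ifelse(t3 >= 76,   1, 0)))     # node 9, otherwise node 8
}

# Made-up inputs, one per rule, in the order nodes 3, 5, 8, 9
cases <- data.frame(t3 = c(10, 10, 10, 80),
                    t5 = c(10, 70, 10, 10),
                    t7 = c(50, 20, 20, 20))
cases$d <- classify_d(cases$t3, cases$t5, cases$t7)
print(cases)
```

Because three of the four rules predict d = 1, the rule set is equivalent to a single condition: d = 0 only when t7 < 33.5, t5 < 60.5 and t3 < 76 all hold.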
The file dt_test.csv contains 200 test examples with the same 10 variables.
Test your trained classifier on these test examples and present your confusion
matrix. Comment on your classification accuracy.
Command & Confusion Matrix (highlighted in Grey):
> testdata = read.csv("dt_test.csv")
> testdataRule1 <- subset(testdata, t7 >= 33.5)
> table(testdataRule1$d, testdataRule1$d == "1")
    TRUE
  1  129
> testdataRule2 <- subset(testdata, t7 < 33.5 & t5 >= 60.5)
> table(testdataRule2$d, testdataRule2$d == "1")
    TRUE
  1   22
> testdataRule3 <- subset(testdata, t7< 33.5 & t5< 60.5 & t3 < 76)
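A more conventional confusion matrix cross-tabulates the actual classes against the predicted classes in a single table. With the fitted tree, the predicted classes could be obtained with predict(modeldata, testdata, type = "class"). The sketch below shows only the tabulation and accuracy step, on short made-up vectors; these are not the dt_test.csv results.

```r
# Made-up actual and predicted labels, for illustration only.
# With the real model: predicted <- predict(modeldata, testdata, type = "class")
actual    <- factor(c(1, 1, 0, 1, 0, 1), levels = c(0, 1))
predicted <- factor(c(1, 1, 0, 0, 0, 1), levels = c(0, 1))

# Rows are actual classes, columns are predicted classes
cm <- table(actual = actual, predicted = predicted)
print(cm)

# Accuracy = correctly classified examples (the diagonal) / all examples
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Fixing the factor levels to c(0, 1) keeps the table square even when one class never appears among the predictions, so the diagonal always holds the correct classifications.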
t1  t2  t3  t4  t5  t6  t7  t8  d
 8  86  55  53  36  12  82  19  1
22  36  80  69  90  33  22   6  1
74  26  32  26  38  52  63  12  1
66  71  71  52  42  88  89  70  1
55  72  61  41  91  39  50  96  1
34  58  22  84  84  61  95  57  1
23  70  39  65  16  71  96  78  1
 9  19  67  43   2  20  92   3  1
 6  71  20   6  27  58   6  22  0
68  40  86  82  82  44  61  48  1
Command:
> newdata = read.csv("dt_new.csv")
> predict(modeldata, newdata)
Output:
   0 1
1  0 1
2  0 1
3  0 1
4  0 1
5  0 1
6  0 1
7  0 1
8  0 1
9  1 0
10 0 1
Analysis:
d = 1 for new_case = 1, 2, 3, 4, 5, 6, 7, 8 and 10
d = 0 for new_case = 9
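The predict() output above is a matrix of class probabilities, with one column per class. As a sketch of how the class labels in the analysis follow from it, the code below rebuilds that matrix by hand from the printed output and picks the most probable class per row. Calling predict(modeldata, newdata, type = "class") on the fitted model should return the labels directly.

```r
# Class-probability matrix copied from the printed predict() output
prob <- matrix(c(0, 1,
                 0, 1,
                 0, 1,
                 0, 1,
                 0, 1,
                 0, 1,
                 0, 1,
                 0, 1,
                 1, 0,
                 0, 1),
               ncol = 2, byrow = TRUE,
               dimnames = list(as.character(1:10), c("0", "1")))

# For every row, take the column name (class label) with the highest probability
pred <- colnames(prob)[max.col(prob, ties.method = "first")]
names(pred) <- rownames(prob)
print(pred)

# Cases predicted as d = 0
print(which(pred == "0"))
```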