
1 Exercise session 1

1.1 Exercise 3.1 – modified

Represent as a decision tree:


a. A ∧ ¬B
b. ¬A ∨ B
c. A ∨ (B ∧ C)
d. (¬A ∧ B) ∨ (A ∧ ¬B)
e. (A ∨ B) ∧ (C ∨ D ∨ ¬E)
f. (A ∨ B ∨ C) ∧ (D ∨ E ∨ F)

Solution

a.–f. (The answers are decision-tree diagrams, which did not survive the text extraction. Each tree tests one variable per internal node and has true/false leaves; for example, for a. (A ∧ ¬B) the root tests A: the A = false branch ends in a false leaf, and the A = true branch tests B, leading to false for B = true and true for B = false. The remaining formulas are handled in the same way.)
1.2 Exercise 3.2

Consider the following table of training examples:


Instance  Classification  a1  a2
1         +               T   T
2         +               T   T
3         -               T   F
4         +               F   F
5         -               F   T
6         -               F   T

a. What is the entropy with respect to the "Classification" attribute?


b. What is the information gain of a2 relative to these training examples?
Solution

a. Equal probabilities (3+, 3-) → entropy = 1.

b. Splitting on a2 gives the subsets [2+, 2-] and [1+, 1-]; the entropy stays 1 in each branch → gain = 0.
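These numbers can be checked by recomputing entropy and information gain directly from the table. This is a small Python sketch; the dictionary encoding of the table is my own, not part of the exercise:

from math import log2

def entropy(labels):
    """Entropy of a list of +/- class labels."""
    if not labels:
        return 0.0
    p = labels.count('+') / len(labels)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(examples, attr):
    """Information gain of splitting `examples` on attribute `attr`."""
    g = entropy([e['class'] for e in examples])
    for value in set(e[attr] for e in examples):
        subset = [e['class'] for e in examples if e[attr] == value]
        g -= len(subset) / len(examples) * entropy(subset)
    return g

table = [
    {'a1': 'T', 'a2': 'T', 'class': '+'},
    {'a1': 'T', 'a2': 'T', 'class': '+'},
    {'a1': 'T', 'a2': 'F', 'class': '-'},
    {'a1': 'F', 'a2': 'F', 'class': '+'},
    {'a1': 'F', 'a2': 'T', 'class': '-'},
    {'a1': 'F', 'a2': 'T', 'class': '-'},
]

print(entropy([e['class'] for e in table]))  # 1.0
print(gain(table, 'a2'))                     # 0.0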
1.3 Exercise 3.3

True or false? If a boolean decision tree D2 is an elaboration of D1 ("elaboration" meaning that a leaf
of D1 has been changed into a subtree in D2) then D1 is more general than D2.
Solution

False. Refining a "false" leaf into an internal node with one "false" and one "true" leaf makes the tree more general, so D2 can be more general than D1.

1.4 Exercise 3.4

a. Show a decision tree that could be learned by ID3 assuming it gets the following examples: (Table
2.1 in Mitchell's book)
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        sunny  warm     normal    strong  warm   same      yes
2        sunny  warm     high      strong  warm   same      yes
3        rainy  cold     high      strong  warm   change    no
4        sunny  warm     high      strong  cool   change    yes
b. Add this example:
5        sunny  warm     normal    weak    warm   same      no
then show how ID3 would induce a decision tree for these 5 examples.
Solution

a. Sky: sunny → yes; rainy → no

Alternative: AirTemp: warm → yes; cold → no

b. S = [3+, 2-], Entropy(S) = 0.97
S_sunny = [3+, 1-], E = 0.81
S_rainy = [0+, 1-], E = 0
Gain(S, Sky) = 0.97 - (4/5)·0.81 - (1/5)·0 = 0.32

Attribute  Gain
Sky        0.32
AirTemp    0.32
Humidity   0.02
Wind       0.32
Water      0.17
Forecast   0.02

Sky is the best attribute; the rainy node is then pure (E = 0); compute the split for the sunny node, S_sunny = [3+, 1-]:

Attribute  Gain
AirTemp    0.00
Humidity   0.31
Wind       0.81
Water      0.12
Forecast   0.12

Wind is best; it splits the sunny node into two pure leaves → done.

Sky
  sunny → Wind
    strong → yes  [3+, 0-]
    weak   → no   [0+, 1-]
  rainy → no  [0+, 1-]
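The gain tables above can be double-checked programmatically. This is a small Python sketch that recomputes them for the five examples (the tuple encoding of the table is my own); the values match 0.32/0.02/0.17 at the root and 0.81/0.31/0.12 in the sunny branch up to rounding:

from math import log2

examples = [
    # Sky, AirTemp, Humidity, Wind, Water, Forecast, EnjoySport
    ('sunny', 'warm', 'normal', 'strong', 'warm', 'same',   'yes'),
    ('sunny', 'warm', 'high',   'strong', 'warm', 'same',   'yes'),
    ('rainy', 'cold', 'high',   'strong', 'warm', 'change', 'no'),
    ('sunny', 'warm', 'high',   'strong', 'cool', 'change', 'yes'),
    ('sunny', 'warm', 'normal', 'weak',   'warm', 'same',   'no'),
]
attrs = ['Sky', 'AirTemp', 'Humidity', 'Wind', 'Water', 'Forecast']

def entropy(rows):
    """Entropy of the EnjoySport labels (last tuple position) of `rows`."""
    p = sum(1 for r in rows if r[-1] == 'yes') / len(rows)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(rows, i):
    """Information gain of splitting `rows` on attribute index i."""
    g = entropy(rows)
    for v in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

# Gains at the root:
print({a: round(gain(examples, i), 2) for i, a in enumerate(attrs)})
# Gains inside the Sky = sunny branch (Wind comes out on top):
sunny = [r for r in examples if r[0] == 'sunny']
print({a: round(gain(sunny, i), 2) for i, a in enumerate(attrs)})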
1.5 Exercise 4.2

Design a two-input perceptron that implements the boolean function A ∧ ¬B, and a 2-layer network of perceptrons that implements A XOR B.
Solution
A and not B: a single perceptron suffices, with a positive weight on A, a negative weight on B, and a negative threshold weight.
A xor B: XOR is not linearly separable, so a single perceptron cannot represent it; write A XOR B = (A ∧ ¬B) ∨ (¬A ∧ B) and use a hidden layer of two perceptrons (one per conjunct) feeding an OR perceptron in the output layer. (The network diagrams with the concrete weight values did not survive the text extraction.)
1.6 Exercise: the Perceptron training rule

The perceptron training rule can be used to learn a perceptron's weight vector for a given set of training
examples. One starts with random weights and then iteratively applies the perceptron training rule to
each example. The loop over the training examples is repeated until all examples are classified correctly.
The weight for input i is updated in each step according to
w_i ← w_i + Δw_i,   Δw_i = η (t - o) x_i
with t the target output for the current example, o the output generated by the perceptron, and η a constant called the learning rate. If η is sufficiently small and the training examples are linearly separable, then the perceptron is guaranteed to correctly predict all training examples after a finite number of iterations (see also Section 4.4.2 in Mitchell's book).
Consider the following training examples (x1, x2, t):
(1, 3, +1), (2, 2, +1), (3, 1, -1), (4, 2, -1)
a. Train a perceptron with two inputs (plus a threshold input x0 = 1) so that it predicts t from x1 and x2. Start with weights w0 = w1 = w2 = 1 and take η = 0.1.
b. Plot the input points in (x1, x2) space and show the decision surface after each training step.
c. Show on the same graph the decision surface for a decision tree.
Solution
Figure 1: Evolution of the decision surface when learning a perceptron. (The plot did not survive the text extraction; it shows the four training points, the successive decision lines x + y + 1 = 0, 0.4x + 0.8y + 0.8 = 0 and -0.4x + 0.4y + 0.6 = 0, and the decision-tree boundary x = 2.5.)


a. For each example (here we use the order in which they are mentioned), apply the perceptron rule.
- Initial decision criterion: pos if 1·x + 1·y + 1 ≥ 0, or equivalently, y ≥ -x - 1.
- Example (1, 3, +1): t = 1, o = 1 ⇒ t - o = 0 ⇒ no change to the weights.
- Example (2, 2, +1): same remark.
- Example (3, 1, -1): t = -1, o = 1 ⇒ t - o = -2
  w_x ← w_x + η(t - o)x = 1 + 0.1·(-2)·3 = 0.4
  w_y ← w_y + η(t - o)y = 1 + 0.1·(-2)·1 = 0.8
  w_1 ← w_1 + η(t - o)·1 = 1 + 0.1·(-2)·1 = 0.8
  New decision criterion: pos if 0.4x + 0.8y + 0.8 ≥ 0, or equivalently, y ≥ -0.5x - 1.
- Example (4, 2, -1): t = -1, o = 1 ⇒ t - o = -2
  w_x ← w_x + η(t - o)x = 0.4 + 0.1·(-2)·4 = -0.4
  w_y ← w_y + η(t - o)y = 0.8 + 0.1·(-2)·2 = 0.4
  w_1 ← w_1 + η(t - o)·1 = 0.8 + 0.1·(-2)·1 = 0.6
  New decision criterion: pos if -0.4x + 0.4y + 0.6 ≥ 0, or equivalently, y ≥ x - 1.5.
After seeing 4 examples, the perceptron is consistent with all the examples.
b. See Figure 1 for an overview of how the decision surface changes.
c. A possible decision tree given the training examples is:
x ≤ 2.5 : pos
x > 2.5 : neg
The decision surface for this tree is also shown in Figure 1.
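The hand computation above can be reproduced in a few lines. This is a minimal Python sketch of the perceptron training rule under the conventions of the exercise (output o = +1 if the weighted sum is ≥ 0, else -1; bias input x0 = 1); it prints the same weight sequence (1, 1, 1) → (0.4, 0.8, 0.8) → (-0.4, 0.4, 0.6), listed here as (w_x, w_y, w_1):

def output(w, x):
    """Perceptron output: +1 if w1*x1 + w2*x2 + w0*1 >= 0, else -1."""
    return 1 if w[0] * x[0] + w[1] * x[1] + w[2] >= 0 else -1

examples = [((1, 3), 1), ((2, 2), 1), ((3, 1), -1), ((4, 2), -1)]
w = [1.0, 1.0, 1.0]   # w_x, w_y, w_1 (threshold weight)
eta = 0.1

# Sweep over the examples until all of them are classified correctly.
while any(output(w, x) != t for x, t in examples):
    for x, t in examples:
        o = output(w, x)
        w[0] += eta * (t - o) * x[0]
        w[1] += eta * (t - o) * x[1]
        w[2] += eta * (t - o) * 1      # bias input is constant 1
        print([round(v, 2) for v in w])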


1.7 Exercise: Decision regions in neural networks

Consider a two-dimensional space XY; X and Y are inputs of a perceptron. We have seen that the
decision surface of a perceptron is always a straight line. We also know that perceptrons can apply OR
and AND operations to boolean values.
Now consider a neural network with two layers of perceptrons (a hidden layer with n perceptrons and
an output layer of 1 perceptron). Based on the above observations, what kind of decision surface do you
think such a network can at least form?
Now look at Figure 4.5 in Mitchell's book, and compare the decision regions seen there to the results you
just obtained. What is your conclusion?
Solution

Figure 2: Output of a simple network with 5 hidden nodes, where the output node is not thresholded. All output weights are assumed to be 1 here. What decision regions could you get in this case by adding a threshold to the last node? (The plot and the hidden-unit weights did not survive the text extraction.)
2-layer case: each perceptron in the hidden layer represents a straight line; the perceptron fires if the input signal is on one side of that line. Viewing the outputs of the hidden layer as booleans, the output perceptron can combine these booleans with an AND operation. It will fire if the input signal is on one specific side of each line. This allows the network to represent any convex polygon.
More generally, the network can form decision regions bounded by n straight lines, where n is the number of nodes in the hidden layer. A simple example is shown in Figure 2.
In Fig. 4.5, the decision region corresponding to each single output node of the network indeed resembles an area bounded by a number of lines; however, the lines are not straight. This is due to the nonlinear character of the network (it uses sigmoids instead of strict threshold functions).
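To make the "AND of half-planes" argument concrete, here is a small Python sketch that builds a triangular decision region from three hidden threshold units and one output AND unit. The particular lines (x > 0, y > 0, x + y < 4) are my own example, not the five-node network of Figure 2:

def unit(w0, wx, wy):
    """Threshold unit over the plane: 1 if w0 + wx*x + wy*y > 0, else 0."""
    return lambda x, y: 1 if w0 + wx * x + wy * y > 0 else 0

# Three hidden units, one per edge of the triangle with corners
# (0, 0), (4, 0), (0, 4): the half-planes x > 0, y > 0, x + y < 4.
hidden = [unit(0, 1, 0), unit(0, 0, 1), unit(4, -1, -1)]

def network(x, y):
    """Output unit: AND of the hidden units (fires only inside the triangle)."""
    h = [u(x, y) for u in hidden]
    return 1 if sum(h) > len(h) - 0.5 else 0   # fires iff every hidden unit fires

print(network(1, 1))   # 1: inside the triangle
print(network(3, 3))   # 0: outside (x + y > 4)
print(network(-1, 1))  # 0: outside (x < 0)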
