Feature Selection

Step 1: Use domain knowledge to guide you whenever possible
Step 2: Visualize attributes
Remove attributes with no values, or with too many missing values
Check for obvious outliers and remove them
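A minimal pandas sketch of Step 2. The input file data.csv, the 50% missing-value threshold, and the 3-standard-deviation outlier rule are illustrative assumptions, not part of the slides.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")  # hypothetical input file

# Visualize attribute distributions (one histogram per numeric attribute)
df.hist(figsize=(10, 8))
plt.show()

# Remove attributes with no values or too many missing values
df = df.dropna(axis=1, how="all")        # attributes with no values at all
df = df.loc[:, df.isna().mean() <= 0.5]  # attributes >50% missing (threshold is a choice)

# Remove rows with obvious outliers, here: beyond 3 standard deviations
num = df.select_dtypes("number")
z = (num - num.mean()) / num.std()
df = df[~(z.abs() > 3).any(axis=1)]
```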
Step 3: Construct new attributes (if it makes sense)
Combine attributes
Normalize numeric attributes (for regression, Naive Bayes, NN; http://www.tufts.edu/~gdallal/regtrans.htm)
Create binary attributes from nominal attributes
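A sketch of Step 3 with pandas and scikit-learn. The toy columns (income, age, city) and the derived ratio attribute are hypothetical examples of each operation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "income": [30_000, 52_000, 87_000],
    "age": [23, 41, 36],
    "city": ["Boston", "Medford", "Boston"],
})

# Combine attributes into a new one (only if it makes sense for the domain)
df["income_per_year_of_age"] = df["income"] / df["age"]

# Normalize numeric attributes (helps regression, Naive Bayes, neural nets)
num_cols = ["income", "age", "income_per_year_of_age"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Create binary (one-hot) attributes from the nominal attribute
df = pd.get_dummies(df, columns=["city"])
```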
Step 4: Select the best subset of attributes for the problem
IF IN DOUBT, CHOOSE A METHOD THAT DOES THE FEATURE SELECTION FOR YOU (for example, decision trees)
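A minimal sketch of the "let the method select for you" advice, using scikit-learn's decision tree on the Iris data purely for illustration: attributes the tree never splits on receive importance 0, so the fitted tree has effectively chosen the useful subset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Attributes the tree never splits on get importance 0,
# i.e. the tree has done the feature selection for us.
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```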
The Basics

Basic Ideas
We are usually faced with the problem of selecting a subset of the possible predictors, and we have to balance conflicting objectives:
Include all variables that have legitimate predictive skill
Exclude all extraneous variables that fit only sample-specific noise; such variables reduce predictive skill and increase the standard errors of regression coefficients (and likewise degrade classification, etc.)
Feature Selection
Ranking by some objective (for example, information gain)
Subset algorithms (see next slide)
Wrapper (try subsets within the context of the algorithm you know you are going to use)
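A sketch contrasting the first and third styles, assuming scikit-learn: mutual information as the ranking objective (a close relative of information gain), and SequentialFeatureSelector as a wrapper around the classifier you actually plan to use. The dataset, k=3 neighbors, and the two-feature target are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif, SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Ranking: score each attribute independently, keep the top-scoring ones
scores = mutual_info_classif(X, y, random_state=0)
print("ranking scores:", scores)

# Wrapper: search subsets by cross-validating the classifier you will deploy
knn = KNeighborsClassifier(n_neighbors=3)
selector = SequentialFeatureSelector(knn, n_features_to_select=2).fit(X, y)
print("wrapper keeps features:", selector.get_support())
```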
Backward elimination
The only feasible method for very large predictor pools
Local optimization at each step; no guarantee of finding the overall optimum
Start with all predictors in the equation
Remove the predictor with the smallest t value
Continue until a numerical criterion signals a maximum (minimum)
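A hedged sketch of that loop using statsmodels OLS: start with all predictors, repeatedly drop the one with the smallest |t|, and stop once every remaining predictor passes a numerical criterion. The synthetic data and the |t| >= 2 cutoff are assumptions for illustration, not from the slides.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=100)  # only two real predictors

cols = list(range(X.shape[1]))
while cols:
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    tvals = np.abs(model.tvalues[1:])   # skip the intercept
    if tvals.min() >= 2.0:              # all remaining predictors pass the criterion
        break
    cols.pop(int(tvals.argmin()))       # drop the weakest predictor

print("retained predictors:", cols)
```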
Multicollinearity (regression)
The degree of correlation between the Xs. A high degree of multicollinearity produces unacceptable uncertainty (large variance) in the regression coefficient estimates, i.e., large sampling variation. Estimates of the slopes become imprecise, and even the signs of the coefficients may be misleading; t-tests may fail to reveal significant factors. The analysis of variance for the overall model may show a highly significant fit when, paradoxically, the tests for the individual predictors are non-significant.
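The slide describes the symptoms; one standard diagnostic (an addition here, not from the slides) is the variance inflation factor. A minimal statsmodels sketch on synthetic, nearly collinear data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF well above ~10 flags predictors whose coefficient estimates are unstable
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, variance_inflation_factor(X, i))
```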