
S7 Extra: Feature Selection


Shawndra Hill, Spring 2013, TR 1:30-3pm and 3-4:30



Feature Selection
Step 1: Use domain knowledge to guide you whenever possible.
Step 2: Visualize attributes. Remove attributes with no values or too many missing values. Check for obvious outliers and remove them.
Step 3: Construct new attributes (if it makes sense). Combine attributes. Normalize numeric attributes (for regression, Naive Bayes, NN; see http://www.tufts.edu/~gdallal/regtrans.htm). Create binary attributes from nominal attributes.
Step 4: Select the best subset of attributes for the problem. IF IN DOUBT, CHOOSE A METHOD THAT DOES THE FEATURE SELECTION FOR YOU (for example, decision trees).
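The steps above map onto only a few lines of code. Below is a minimal sketch in Python (pandas/scikit-learn); the file name, column names, missing-value threshold, and label column are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of Steps 2-4 above. "data.csv", the column names, the 50%
# missing-value threshold, and the "label" column are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("data.csv")                               # hypothetical input file

# Step 2: drop attributes with no values or too many missing values
df = df.dropna(axis=1, how="all")
df = df.loc[:, df.isna().mean() < 0.5]

# Step 3: construct and transform attributes
df["debt_to_income"] = df["debt"] / df["income"]           # combine attributes (example)
num_cols = df.select_dtypes("number").columns
df[num_cols] = StandardScaler().fit_transform(df[num_cols])  # normalize numeric attributes
df = pd.get_dummies(df, columns=["region"])                # binary attributes from nominal ones

# Step 4 (one option): let a tree-based model do the selection for you,
# assuming the remaining columns are numeric and "label" is the target
tree = DecisionTreeClassifier(max_depth=5)
tree.fit(df.drop(columns="label"), df["label"])
ranked = pd.Series(tree.feature_importances_,
                   index=df.drop(columns="label").columns).sort_values(ascending=False)
```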

The Basics
Basic Ideas
Usually faced with the problem of selecting a subset of possible predictors; have to balance conflicting objectives:
Want to include all variables that have legitimate predictive skill
Want to exclude all extraneous variables that fit only sample-specific noise; extraneous variables reduce predictive skill and increase standard errors (of regression coefficients, classification, etc.)

Ideally would be able to determine single best subset of predictors to include


But no single definition of "best"
Different algorithms will produce different "best" subsets
Problems magnified by correlation among predictors

Feature Selection
Ranking
By some objective (for example, information gain)

Subset
Algorithms (see next slide)
Wrapper (try subsets within the context of the algorithm you know you are going to use)
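To make the two approaches concrete, here is a hedged sketch in Python (scikit-learn): first rank features by mutual information (a close relative of information gain), then run a wrapper search that evaluates subsets inside the model you actually plan to use. The dataset, model, and the choice of ten features are illustrative assumptions.

```python
# Sketch: (1) rank features by an objective (mutual information), (2) wrapper
# search for a subset within the target model. Dataset and model are assumptions.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Ranking: score every feature independently, keep the top k
scores = pd.Series(mutual_info_classif(X, y), index=X.columns).sort_values(ascending=False)
top10 = scores.head(10).index.tolist()

# Wrapper: search for a subset using cross-validated performance of the chosen model
model = LogisticRegression(max_iter=5000)
wrapper = SequentialFeatureSelector(model, n_features_to_select=10, cv=5).fit(X, y)
chosen = X.columns[wrapper.get_support()].tolist()
```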

Feature Selection Algorithms

All possible subsets
Only feasible with a small number of potential predictors (maybe 10 or fewer)
Then can use one or more of the possible numerical criteria to find the overall best

Forward stepwise regression
Start with no predictors
First include the predictor with the highest correlation with the response
In subsequent steps add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation
Stop when the numerical criterion signals a maximum (minimum)
Sometimes eliminate variables when their t value gets too small
Only possible method for very large predictor pools
Local optimization at each step, no guarantee of finding the overall optimum
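The sketch below is one hand-rolled version of forward stepwise selection in Python/statsmodels. At each step it adds the candidate that most improves adjusted R², which ranks candidates the same way as the partial correlation with the response given the variables already included, and it stops when the criterion no longer improves. The slides leave the numerical criterion generic, so adjusted R² here is an assumption.

```python
# Hedged sketch of forward stepwise regression; the adjusted-R^2 criterion is
# an illustrative choice, not prescribed by the slides.
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X: pd.DataFrame, y: pd.Series) -> list:
    selected, remaining = [], list(X.columns)
    best_crit = -float("inf")                     # criterion of the current model
    while remaining:
        # fit the current model plus each remaining candidate, one at a time
        trials = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().rsquared_adj
                  for c in remaining}
        best_col, best_val = max(trials.items(), key=lambda kv: kv[1])
        if best_val <= best_crit:                 # criterion signals a maximum: stop
            break
        selected.append(best_col)
        remaining.remove(best_col)
        best_crit = best_val
    return selected
```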

Backward elimination
Start with all predictors in the equation
Remove the predictor with the smallest t value
Continue until the numerical criterion signals a maximum (minimum)
Often produces a different final model than the forward stepwise method
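And the mirror image for backward elimination, again as a Python/statsmodels sketch rather than the slides' exact recipe; the 0.05 significance cutoff used as the stopping rule is an illustrative assumption.

```python
# Hedged sketch of backward elimination: start with all predictors, repeatedly
# drop the one with the smallest |t| until everything left is significant.
# The alpha = 0.05 cutoff is an assumption.
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X: pd.DataFrame, y: pd.Series, alpha: float = 0.05) -> list:
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        tvals = fit.tvalues.drop("const").abs()    # ignore the intercept
        weakest = tvals.idxmin()                   # predictor with smallest t value
        if fit.pvalues[weakest] <= alpha:          # everything remaining is significant
            break
        cols.remove(weakest)
    return cols
```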

Multicollinearity (regression)
The degree of correlation between the Xs. A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates, i.e., large sampling variation. The slope estimates are imprecise, and even the signs of the coefficients may be misleading; t-tests may fail to reveal significant factors. The analysis of variance for the overall model may show a highly significant, good fit when, paradoxically, the tests for the individual predictors are non-significant.
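A quick diagnostic before trusting the coefficient estimates is the variance inflation factor (VIF), sketched below with statsmodels; the common "flag VIF above 10" rule of thumb mentioned in the comment is an assumption, not from the slides.

```python
# Sketch: compute a variance inflation factor (VIF) per predictor to diagnose
# multicollinearity. A common rule of thumb (an assumption) flags VIF > 10.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    Xc = sm.add_constant(X)                        # include an intercept in each auxiliary fit
    vifs = {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)
```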

