Introduction
In this lab, you will investigate other classification techniques with the labor contract data set used in the previous labs. These techniques apply statistical, distance-based, and neural network approaches to classification. You will compare these models to each other and to the decision tree models from lab 2.
Outcomes
By completing this lab, students should
- Understand how to build naïve Bayesian models for classification;
- Understand how to build a feed-forward, back-propagation neural network for classification;
- Understand how to build a distance-based model for classification;
- Understand how to assess the accuracy of several models using cross-validation;
- Understand how to assess the interpretability and size of different types of models for classification.
Preparatory Reading
- Chapters 3 and 4 in Dunham.
- Chapters 4 and 6 in Witten and Frank (available on reserve).
- Sections 7.4, 7.5, and 7.7 in Han and Kamber (available on reserve).
- Documentation on the various classifier implementations, available via http://www.cs.waikato.ac.nz/~ml/weka/doc_gui/packages.html
Materials
The following materials will be used in this lab:
- Weka, which is available on studsys;
- The data file /home/cstruble/class/mscs228/data/UCI/labor.arff, located on studsys and available from the course web site.
Pre-lab Questions
These questions should be answered before you perform the lab assignment. Record your answers in the introduction section of your lab assignment in your lab notebook.

1. Using a naïve Bayesian model with maximum likelihood estimates for classification with the credit risk data below, what would the data item D = (good, high, none, $15–35k) be classified as?

Customer  Credit History  Debt  Collateral  Income    Credit Risk?
1         bad             high  none        $0–15k    high
2         unknown         high  none        $15–35k   high
3         unknown         low   none        $15–35k   moderate
4         unknown         low   none        $0–15k    high
5         unknown         low   none        > $35k    low
6         unknown         low   adequate    > $35k    low
7         bad             low   none        $0–15k    high
8         bad             low   adequate    > $35k    moderate
9         good            low   none        > $35k    low
10        good            high  adequate    > $35k    low
11        good            high  none        $0–15k    high
12        good            high  none        $15–35k   moderate
13        good            high  none        > $35k    low
14        bad             high  none        $15–35k   high
2. What would a 3-NN model classify the data item D as?
3. Draw a feed-forward neural network architecture to classify the credit risk data above.
4. Hypothesize whether your best decision tree model from lab 2 will be better than, the same as, or worse than the best naïve Bayesian, K-NN, or neural network model that you create in this assignment. Give the reasoning behind your hypothesis.
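To check hand calculations for questions 1 and 2, the two classifiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the Weka workflow used in the lab: it assumes the query item D is (good, high, none, $15–35k), it estimates probabilities from raw counts with no smoothing, and for 3-NN it assumes a simple mismatch-count (Hamming) distance over the four nominal attributes, with distance ties resolved in table order. The function names are hypothetical.

```python
from collections import Counter

# Credit risk table: (credit history, debt, collateral, income) -> credit risk.
# Income brackets are written without the dollar sign.
DATA = [
    (("bad", "high", "none", "0-15k"), "high"),
    (("unknown", "high", "none", "15-35k"), "high"),
    (("unknown", "low", "none", "15-35k"), "moderate"),
    (("unknown", "low", "none", "0-15k"), "high"),
    (("unknown", "low", "none", ">35k"), "low"),
    (("unknown", "low", "adequate", ">35k"), "low"),
    (("bad", "low", "none", "0-15k"), "high"),
    (("bad", "low", "adequate", ">35k"), "moderate"),
    (("good", "low", "none", ">35k"), "low"),
    (("good", "high", "adequate", ">35k"), "low"),
    (("good", "high", "none", "0-15k"), "high"),
    (("good", "high", "none", "15-35k"), "moderate"),
    (("good", "high", "none", ">35k"), "low"),
    (("bad", "high", "none", "15-35k"), "high"),
]

D = ("good", "high", "none", "15-35k")  # assumed reading of the query item


def naive_bayes_scores(data, item):
    """Unnormalized P(class) * product of P(value | class), from raw counts (MLE)."""
    class_counts = Counter(label for _, label in data)
    cond_counts = Counter()  # (column index, value, class) -> row count
    for attrs, label in data:
        for i, value in enumerate(attrs):
            cond_counts[(i, value, label)] += 1
    scores = {}
    for label, count in class_counts.items():
        score = count / len(data)  # prior P(class)
        for i, value in enumerate(item):
            # No smoothing: a value never seen with this class gives probability 0.
            score *= cond_counts[(i, value, label)] / count
        scores[label] = score
    return scores


def hamming(a, b):
    """Number of attribute positions where two items differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(data, item, k=3):
    """Majority vote among the k closest rows; distance ties keep table order."""
    neighbours = sorted(data, key=lambda row: hamming(row[0], item))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    scores = naive_bayes_scores(DATA, D)
    print(max(scores, key=scores.get), scores)
    print(knn_classify(DATA, D, k=3))
```

Two details are worth noticing when comparing this with a hand calculation. Without smoothing, a single zero count removes a class from consideration entirely. For 3-NN, several customers can lie at the same distance from D, so the prediction may depend on how ties are broken; a different tie-breaking rule or distance measure is an equally valid modelling choice.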
Procedure
This section provides the steps to take for this lab assignment. As you carry out each step, record the observations you make in your lab notebook. Your notes do not have to be polished writing, but you'll use them to generate your final lab report.

When you work on a data mining problem, it is important to record the steps you take along with the observations you make at each step. Remember, the goal of a data mining project is often to produce a final report. That report includes a summary of the steps you took to achieve your final results.

For all of the models below, use 10-fold cross-validation to evaluate the accuracy of your models. I will not provide explicit instructions for doing so.
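For intuition about what 10-fold cross-validation measures, here is a minimal Python sketch of the idea. It is not how you will run the evaluation in Weka, and Weka's own procedure may differ in details such as stratification. The helper names and the majority-class stand-in classifier are hypothetical, chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(rows, labels, fit_predict, k=10, seed=0):
    """Pooled accuracy: each instance is predicted once, by a model trained without it."""
    correct = 0
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        predictions = fit_predict(
            [rows[i] for i in train],
            [labels[i] for i in train],
            [rows[i] for i in held_out],
        )
        correct += sum(p == labels[i] for p, i in zip(predictions, held_out))
    return correct / len(rows)


def majority_baseline(train_rows, train_labels, test_rows):
    """Stand-in classifier: always predict the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority for _ in test_rows]


if __name__ == "__main__":
    # Toy data only: 15 instances of class "a" and 5 of class "b".
    rows = list(range(20))
    labels = ["a"] * 15 + ["b"] * 5
    print(cross_val_accuracy(rows, labels, majority_baseline, k=10))
```

Because every instance is held out exactly once, the resulting accuracy estimates performance on unseen data, which is what makes it a fair basis for comparing the different model types in this lab.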