Ping Yin
11/10/2016
Contents
Executive Summary
Introduction
Purpose
Methodology
Executive Summary
After data preparation and partitioning, three models are built: one each in SAS Studio, Enterprise Miner (EM), and DataRobot
The same test dataset is scored by all three models
The model built in EM has the best performance
Introduction
Can we predict people's income level based on their age, gender, education, etc.?
What will my income level be after I graduate?
Purpose
Find the best predictive model for the Income dataset
Predict my income level
Practice skills in data preparation, model building, and model assessment
Data Selection
Exploration
Using SAS Studio to explore the data
32,561 observations
15 variables: 6 Num, 9 Char
Num: Age Capitalgain Capitalloss Weekhour Edunum Fnlwgt
Char: Income Relationship Education Occupation Sex Marital Workclass Race Nativecountry
Target: Income (>50K, <=50K)
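The variable breakdown above (numeric versus character variables, plus the levels of the target) can also be checked outside SAS Studio. The following is a minimal Python sketch, not the SAS code used in this project. The helper name `summarize` and the three sample rows are made up for illustration and are not records from the dataset.

```python
from collections import Counter

def summarize(rows, target):
    """Split columns into numeric and character, and count target levels."""
    numeric, character = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # A column counts as numeric only if every value is a number.
        if all(isinstance(v, (int, float)) for v in values):
            numeric.append(name)
        else:
            character.append(name)
    target_counts = Counter(row[target] for row in rows)
    return numeric, character, target_counts

# Made-up sample rows (illustrative only, not real observations).
rows = [
    {"Age": 39, "Weekhour": 40, "Sex": "Male", "Income": "<=50K"},
    {"Age": 52, "Weekhour": 45, "Sex": "Female", "Income": ">50K"},
    {"Age": 28, "Weekhour": 38, "Sex": "Female", "Income": "<=50K"},
]

numeric, character, target_counts = summarize(rows, "Income")
print("Num:", numeric)
print("Char:", character)
print("Target levels:", dict(target_counts))
```

On the full dataset, the same check would be expected to report 6 numeric and 9 character variables, matching the counts listed above.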
Data issues:
[Diagram: the training dataset and test dataset used with SAS Studio, Enterprise Miner, and DataRobot]
Model Comparison
The best model in this project:
[Figure: comparison of the EM, Studio, and DataRobot models]
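Scoring one test dataset with several models and picking the best performer can be sketched as follows. This is a minimal Python illustration under stated assumptions: the slides do not say which assessment statistic was used, so misclassification rate is assumed here, and the names `model_a` and `model_b` and all labels are made up rather than taken from the project results.

```python
def misclassification_rate(actual, predicted):
    """Fraction of test observations whose predicted level is wrong."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

def best_model(actual, predictions_by_model):
    """Score every model on the same test labels; lowest error rate wins."""
    rates = {
        name: misclassification_rate(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return min(rates, key=rates.get), rates

# Made-up test labels and predictions (illustrative only).
actual = [">50K", "<=50K", "<=50K", ">50K"]
predictions_by_model = {
    "model_a": [">50K", "<=50K", ">50K", ">50K"],   # one error
    "model_b": ["<=50K", "<=50K", ">50K", ">50K"],  # two errors
}

winner, rates = best_model(actual, predictions_by_model)
print(winner, rates)
```

Using the same test labels for every model is what makes the rates comparable; a model scored on different data could look better or worse for reasons unrelated to the model itself.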
Using 70% of the data to build a model
DataRobot Project
The overall best model
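Holding out part of the data while building a model on 70% of it can be sketched as follows. This is a minimal Python illustration, not the partitioning performed by DataRobot or SAS. The simple random shuffle, the seed value, and the helper name `partition` are assumptions; the slides do not state how the 70% was selected.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training part and a held-out part."""
    shuffled = list(rows)
    # A fixed seed (an arbitrary choice here) makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Row indices stand in for the 32,561 observations.
train, test = partition(range(32561))
print(len(train), len(test))
```

A stratified split that preserves the share of each Income level in both parts is a common alternative when the target levels are unbalanced.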
Options
Using DataRobot to build models without handling data issues
Summary
We can predict people's income level based on their characteristics
For the Income dataset, DataRobot is the most robust tool for building models
Be aware of unexpected outcomes during data preparation
Go back and forth until an acceptable result is reached
Appendix
Link to Data:
https://www.kaggle.com/uciml/adult-census-Income
Thanks!