
20 marks objective test

10 marks ppt
20 marks project

UCI website for data mining


google data for data mining
data dictionary where all variables are defined
Variable names are in German.
This is a case of logistic regression, so the answer will be in the form: is the
person a good person (will repay the amount)?
Data is categorical, so create dummy variables.
Bivariate profiling next class.
Duration and amount of credit are continuous.
create a new column called random =RANDBETWEEN(1,10)
save and close
spss
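
The random column can also be sketched outside Excel. This is a minimal Python illustration, assuming 1000 rows and a fixed seed (both are assumptions, not from the notes); it mimics =RANDBETWEEN(1,10) and applies the rule used later in the notes, where random <= 7 marks training rows.

```python
import random

def add_random_column(n_rows, seed=0):
    """One integer in 1..10 per row, like Excel's RANDBETWEEN(1,10)."""
    rng = random.Random(seed)  # seeded so the split can be reproduced
    return [rng.randint(1, 10) for _ in range(n_rows)]

def split_by_random(values, threshold=7):
    """Row indices with random <= threshold (training) and the rest (validation)."""
    train = [i for i, r in enumerate(values) if r <= threshold]
    valid = [i for i, r in enumerate(values) if r > threshold]
    return train, valid

random_col = add_random_column(1000)  # 1000 rows is an assumed data size
train_idx, valid_idx = split_by_random(random_col)
print(len(train_idx), len(valid_idx))  # roughly a 70/30 split
```

Because the numbers are random, the split is only approximately 70/30, which is why the case counts in the output are not round numbers.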

Convert duration to a scale variable; hoehe and alter are scale variables as well.

Analyze --> Regression --> Binary Logistic
Dependent variable is credit.
Balance and everything else except credit and random are covariates.
Random is the selection variable.
1) Click on Rule and choose "less than or equal to" 7: rows whose random number is
less than or equal to 7 become the training data and the remaining rows are the
validation data.

2) So there are two types of data: training data and validation data.
3) Click on Categorical, select all variables, and unselect duration, hoehe and
alter.
4) Choose First as the reference category; the other categories are compared with
it. The reference is 0, which means NO.
5) Click on Save and, under Predicted Values, check Probabilities.
6) Click on Options and check Hosmer-Lemeshow.
7) Method is Forward: LR, the same idea as the stepwise method in linear regression.
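
What the Binary Logistic dialog fits can be illustrated with a small sketch. This is not the SPSS algorithm and it does no forward LR selection; it is plain gradient ascent on the log likelihood, with one made-up covariate (duration in years) and made-up credit outcomes, all of which are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, xi):
    """Predicted probability of credit = 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient ascent on the log likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical training rows: longer duration goes with more bad credits
durations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
X = [[d] for d in durations]
credit = [1, 1, 1, 0, 1, 0, 0, 0]

weights = fit_logistic(X, credit)
print(weights)                        # slope for duration comes out negative
print(predict_proba(weights, [0.5]))  # short duration: higher chance of good credit
```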
Analysis
case processing summary
736 cases selected (random <= 7); this is the training data

Dependent variable is coded 0 or 1.
Categorical variable codings: 0 marks the reference category of each dummy variable.
The table shows how the different variables are defined in categorical form.
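
The coding table can be reproduced by hand. A minimal sketch of indicator (dummy) coding with the first category as reference, assuming categories are compared by their sorted order; the sample values are made up.

```python
def dummy_code(values):
    """Indicator coding: first (lowest) category is the reference, coded all zeros."""
    categories = sorted(set(values))
    reference, others = categories[0], categories[1:]
    rows = [[1 if v == c else 0 for c in others] for v in values]
    return reference, others, rows

# Hypothetical categorical variable with levels 1, 2, 3
ref, cols, coded = dummy_code([1, 2, 3, 1, 2])
for value, row in zip([1, 2, 3, 1, 2], coded):
    print(value, row)  # level 1 (reference) gets zeros in every dummy column
```

A variable with k categories therefore produces k - 1 dummy columns, and each coefficient is read relative to the reference category.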
Beginning block (Block 0)
The model with no predictors classifies 69.6% correctly in selected cases and 66.7%
in unselected cases.
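
The Block 0 percentages are just the share of the majority class, because a model with no predictors assigns everyone to the same class. A small sketch with made-up outcome lists (the 7/3 and 2/1 counts are assumptions chosen to give similar percentages):

```python
def percent_correct(ys, predicted_class):
    return 100.0 * sum(1 for y in ys if y == predicted_class) / len(ys)

def block0_classification(selected_y, unselected_y):
    """Predict the majority class of the selected cases for every case."""
    majority = 1 if 2 * sum(selected_y) >= len(selected_y) else 0
    return (majority,
            percent_correct(selected_y, majority),
            percent_correct(unselected_y, majority))

selected = [1] * 7 + [0] * 3   # hypothetical training outcomes
unselected = [1, 1, 0]         # hypothetical validation outcomes
majority, sel_pct, unsel_pct = block0_classification(selected, unselected)
print(majority, sel_pct, round(unsel_pct, 1))
```

Any later block is worth keeping only if it classifies better than this baseline.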

Variables in the Equation


Block 1: Method = Forward Stepwise. The method tells the computer how to run the
logistic regression.

Model summary
-2 log likelihood: the value becomes smaller as the steps increase, so the last
model has the lowest value.
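
For reference, -2 log likelihood can be computed directly from the outcomes and the predicted probabilities. The sketch below uses made-up numbers to show the value dropping when the predictions improve.

```python
import math

def minus_two_log_likelihood(y, p):
    """-2 * sum of log likelihood contributions; smaller means a better fit."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )

y = [1, 0, 1, 1]                       # hypothetical outcomes
null_p = [0.75] * 4                    # intercept-only model: overall share of 1s
fitted_p = [0.9, 0.2, 0.8, 0.7]        # hypothetical probabilities from a later step

null_value = minus_two_log_likelihood(y, null_p)
fitted_value = minus_two_log_likelihood(y, fitted_p)
print(null_value, fitted_value)        # the fitted model has the smaller value
```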

Hosmer-Lemeshow test tells how closely the expected cases match the observed cases.
Reject the null hypothesis if p <= alpha.
Null hypothesis: observed = expected.
Significance value is 0.969, so do not reject the null hypothesis (the model fits).
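
The statistic behind this test can be sketched as follows: sort cases by predicted probability, cut them into groups, and compare observed and expected counts in each group. This sketch returns only the chi-square statistic and its degrees of freedom (groups - 2); turning it into a significance value needs a chi-square table or a statistics library. The data and the choice of 4 groups are assumptions to keep the example short (SPSS uses 10 groups by default).

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic and degrees of freedom."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        obs1 = sum(y[i] for i in idx)   # observed 1s in the group
        exp1 = sum(p[i] for i in idx)   # expected 1s in the group
        obs0 = len(idx) - obs1
        exp0 = len(idx) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, groups - 2

# Hypothetical, perfectly calibrated data: observed counts equal expected counts
p = [0.2] * 10 + [0.8] * 10
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
stat, df = hosmer_lemeshow(y, p, groups=4)
print(stat, df)  # statistic near 0: observed = expected, so the null is not rejected
```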

Variables in the equation


All the significant variables that predict 0 and 1 in the model.

Now look at PRE_1, which holds the answer (the saved predicted probability).
Wald = (B / S.E.)^2; a high Wald value is good.
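
SPSS reports the Wald statistic as the square of B divided by its standard error. A tiny worked example, with made-up values for B and S.E.:

```python
def wald_statistic(b, se):
    """Wald chi-square for one coefficient: (B / S.E.) squared."""
    return (b / se) ** 2

wald = wald_statistic(0.5, 0.25)  # hypothetical B and S.E.
print(wald)  # 4.0
```

With 1 degree of freedom, a Wald value above about 3.84 is significant at the 5% level.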

To see whether the model has predicted well or not, use the ROC curve.


Test variable is predicted prob
state variable is credit
value of state variable is 1
roc curve with diagonal reference line

Area under the curve: the closer it is to 1, the better the model has predicted.

ROC curve is sensitivity vs 1 - specificity.
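
The curve and its area can be computed by hand from the predicted probabilities and the state variable. A minimal sketch with made-up data: each distinct cutoff gives one point, and the area is summed with the trapezoid rule.

```python
def roc_points(y, p):
    """(1 - specificity, sensitivity) pairs, one per distinct cutoff."""
    pos = sum(y)
    neg = len(y) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(p), reverse=True):
        tp = sum(1 for yi, pi in zip(y, p) if pi >= cut and yi == 1)
        fp = sum(1 for yi, pi in zip(y, p) if pi >= cut and yi == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

credit = [1, 1, 0, 1, 0, 0]                 # hypothetical state variable
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical predicted probabilities
points = roc_points(credit, predicted)
auc = area_under_curve(points)
print(round(auc, 4))  # 0.8889, i.e. 8 of the 9 good/bad pairs are ranked correctly
```

An area of 0.5 corresponds to the diagonal reference line, which is what random guessing would give.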

Click on File --> Save As and save as an Excel file.


Open the Excel file and apply a filter.
Click on random and uncheck everything except 8, 9, 10, since the test is applied in
Excel on the validation rows; SPSS built the model on rows with random 1 to 7.

create new sheet after applying random filter


Sort the predicted value from largest to smallest.
Copy the credit column to just after the predicted column.

Create a decile column, which means dividing the data set into 10 buckets.


Now give 1 to each of the first 265 rows.
265 is the total count.

select credit and decile and create a new pivot table


Decile to Row Labels
and credit to Values
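
The decile column and the pivot table can be sketched in a few lines. This assumes the rows are already sorted by predicted value, largest first, and that decile 1 is the highest-scoring bucket; the small credit list at the end is made up.

```python
def assign_deciles(n_rows, buckets=10):
    """Decile number for each row position of an already sorted list (1 = top bucket)."""
    return [i * buckets // n_rows + 1 for i in range(n_rows)]

def pivot_goods(deciles, credit):
    """Like the pivot table: per decile, (row count, sum of credit)."""
    table = {}
    for d, c in zip(deciles, credit):
        count, goods = table.get(d, (0, 0))
        table[d] = (count + 1, goods + c)
    return table

deciles = assign_deciles(265)          # 265 validation rows, as in the notes
print(deciles.count(1), deciles.count(2))  # buckets of 27 or 26 rows

# Hypothetical tiny example of the pivot step
small_table = pivot_goods([1, 1, 2, 2], [1, 0, 1, 1])
print(small_table)
```

Since 265 is not divisible by 10, the buckets cannot all be the same size; here they alternate between 27 and 26 rows.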

Cumulative good
To determine the cutoff for maximum profit, go to the cumulative bad avoided %
column; here it is the fourth decile.
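
The cumulative columns can be built from the per-decile counts. In this sketch "bad avoided %" is taken to mean the share of all bad cases that fall below the cutoff decile, which is an assumed definition; the counts are made up, and no profit figures are computed because the notes give none.

```python
def gains_table(per_decile):
    """per_decile: (goods, bads) per decile, decile 1 (highest score) first.

    Returns rows of (decile, cumulative good %, bad avoided %).
    """
    total_good = sum(g for g, _ in per_decile)
    total_bad = sum(b for _, b in per_decile)
    rows, cum_good, cum_bad = [], 0, 0
    for decile, (goods, bads) in enumerate(per_decile, start=1):
        cum_good += goods
        cum_bad += bads
        rows.append((decile,
                     100.0 * cum_good / total_good,
                     100.0 * (total_bad - cum_bad) / total_bad))
    return rows

# Hypothetical (goods, bads) counts for deciles 1..10
counts = [(25, 2), (24, 3), (23, 3), (22, 5), (20, 6),
          (18, 9), (16, 10), (14, 13), (12, 14), (10, 17)]
table = gains_table(counts)
for decile, cum_good_pct, bad_avoided_pct in table:
    print(decile, round(cum_good_pct, 1), round(bad_avoided_pct, 1))
```

Reading down the table shows the trade-off: accepting more deciles captures more good cases but avoids fewer bad ones.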
