Setup: SAS Control Center -> SAS Enterprise Miner (SASEM) -> Configuration setup.
Used by large banks and consulting companies; tremendous productivity in minutes. Business users are less tuned to programming, so the tool is aimed mainly at business use.

Java setup: Cmd -> javaws (first test pass) -> Java 1.8 is required, in 32-bit (Java SE Runtime 8u45). To check for the right version: javaws viewer -> Java -> View -> pick the x86 entry for the 32-bit version. Run the application with this Java runtime.

Covariance and correlation: covariance captures the degree of linear association between 2 variables. Correlation is the standardized form of covariance: unitless, ranging from -1 to 1. Covariance is not scale invariant and hence its values are not comparable across variables; correlation is scale invariant (independent of scale) because the covariance is divided by the product of the standard deviations, which always exists. So correlation can compare totally different quantities. (Aside: agency problems.)

Regression - what and why? Making predictions of a target from the input (independent) variables; used here for modelling.

Create a New Project -> Next -> OK -> enter project name -> Next twice -> Finish -> stored in SAS Studio (click on the link in the web interface).
Create a library: New -> Library -> Create New Library -> path will be given by sir.
Applied Analytics using Enterprise Miner (AAEM), 3 steps: create project, create library, create diagram.

Lapsed donors: segment people who are lapsing donors - they donated last year but not in the present - so we target them. Figure out who will respond or not with the help of a label (obs 1 donated, obs 2 didn't, etc.): how people responded to past campaigns tells us whom to target today. E.g., someone donated in 98 and in 96-97; those whose last donation was in 95-96 are lapsing donors.

Train/test split: divide the data into training and test sets, because we need evidence of how the model we make is performing. We build the model on the training data and then predict on the held-out set (like an exam whose questions are not from the book but similar). This guards against overfitting and gives evidence of generalization.

Oversampling: when building a model for defaulters (or responders), the response rate of a real campaign is very low - e.g., an actual response rate of 5% - so we can't build a model on a 19:1 ratio: we would only have information about the good guys and would not be able to predict the defaulters.
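The covariance/correlation distinction above can be sketched numerically; a minimal illustration with made-up numbers (NumPy used only for the computation):

```python
import numpy as np

# Two variables with a roughly linear association (toy data).
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# Covariance captures linear association but carries the units of x*y,
# so its magnitude changes if we rescale either variable.
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Correlation = covariance / (std_x * std_y): unitless, always in [-1, 1].
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Scale invariance: rescaling x (e.g., rupees -> thousands of rupees)
# shrinks the covariance by the same factor but leaves correlation unchanged.
cov_scaled = np.cov(x / 1000.0, y, ddof=1)[0, 1]
corr_scaled = np.corrcoef(x / 1000.0, y)[0, 1]

print(round(cov_xy, 4), round(cov_scaled, 7), round(corr_xy, 4))
```

This is why correlations from totally different variables can be compared directly, while raw covariances cannot.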
The data is non-homogeneous, so take a sample: e.g., from 1 million records overall with 5% responders, build a data set with a 50:50 split, get the probability estimates, and then apply the adjustment formula to map them back to the true rate. Models can't be built on unbalanced data. Separate sampling: extract responders and non-responders arbitrarily (independently of their population proportions).

Metadata Advisor options: the first column lists the variables. An ID variable is one not required in the modelling process; it serves only as an identifier. The role and the level of each variable are guessed by the tool from its name and its values respectively.

Dummy variables: e.g. salary = a + b*gender, with gender = 0 -> male and gender = 1 -> female. For region, coding salary = a + b*region with r=0 North, r=1 East, r=2 West, r=3 South hardcodes an ordering into the data, which comes down to forcing a relationship. So instead salary = a + b_n*N + b_e*E + b_w*W + b_s*S, one dummy per region. The tool is not always able to identify a nominal variable on its own.

Class levels count threshold = 2 (try it out). Variables with loads of classes that are nominal in nature and aren't required can be rejected. Note that a qualitative variable will continue to be used even if it has 99 classes.

Credit scoring: a model to identify defaulters; H0: not a defaulter, H1: defaulter. Type I error: non-defaulters labelled as defaulters; Type II error: vice versa (defaulters labelled as non-defaulters).
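The adjustment step after separate sampling is usually the standard prior-correction formula: reweight each class by (population prior / sample prior). A minimal sketch - the function name and the 5%/50% rates are illustrative, not SAS EM's API:

```python
def adjust_probability(p_sample, true_prior=0.05, sample_prior=0.5):
    """Map a probability estimated on an oversampled (e.g. 50:50) training
    set back to the population scale (e.g. a 5% true responder rate)."""
    # Weight each class by (population prior / sample prior).
    w1 = true_prior / sample_prior              # responders
    w0 = (1 - true_prior) / (1 - sample_prior)  # non-responders
    num = p_sample * w1
    return num / (num + (1 - p_sample) * w0)

# A score of 0.5 on the balanced sample is just the sample base rate,
# so it maps back to the 5% population base rate.
print(adjust_probability(0.5))  # -> 0.05
```

Enterprise Miner does this internally when you declare the true prior probabilities in the decision/prior settings; the sketch only shows the arithmetic.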
Confusion matrix (actual, predicted):

             Predicted 0             Predicted 1
Actual 0     True negative (0,0)     Type I / false positive (0,1)
Actual 1     False negative (1,0)    True positive (1,1)
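The four cells and the rates derived from them can be computed directly; a small sketch with toy labels (1 = defaulter, everything here is made-up data):

```python
# Actual vs predicted class for 10 hypothetical customers.
actual    = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # (0,0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # (0,1) Type I
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # (1,0) Type II
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # (1,1)

specificity = tn / (tn + fp)   # share of non-defaulters correctly cleared
sensitivity = tp / (tp + fn)   # share of defaulters actually caught

print(tn, fp, fn, tp, specificity, sensitivity)
```

Lowering the classification threshold flags more customers, which raises fp (Type I) while lowering fn (Type II) - the trade-off discussed below.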
Which error to minimize depends on its cost to the company: e.g., the cost of a Type II error is the outstanding loan, while the cost of a Type I error is the opportunity cost measured through CLTV (customer lifetime value). Segment the customers, find the CLTV, calculate the expected loss, and minimize the most important error - hence Type I here. (Recovering from actual defaulters can fall back on extortionate interest rates or the goons.) Estimate a probability, then classify into groups based on a threshold; as the threshold moves, Type I error increases while Type II decreases. The cost differentiation is based on the segment's historical data.

Specificity = TN / (TN + FP).

Lift is a measure of performance; the lift curve is downward sloping, with depth (% of the population targeted) on the x-axis. Related curves: % response (response rate) and % captured response.

Caveats for the campaign: if it is not communicated well among the target customers, inflow will be less; insufficient advertisement can lead to less inflow; and the paying capacity of the targeted people may not let them avail all our facilities, which can pose a risk to our profit margin.
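Lift at a given depth is the response rate among the top-scored d% of customers divided by the overall response rate; % captured response is the share of all responders found in that top d%. A sketch with toy scores and labels (the helper name is hypothetical):

```python
# Model scores and actual responses for 10 hypothetical customers.
scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1,   1,   0,   1,   0,   0,   0,   1,   0,   0]

def lift_at_depth(scores, responded, depth):
    """Return (lift, % captured response) when targeting the top `depth`
    fraction of customers ranked by score."""
    ranked = sorted(zip(scores, responded), key=lambda t: -t[0])
    n_top = max(1, int(round(depth * len(ranked))))
    hits = sum(r for _, r in ranked[:n_top])
    top_rate = hits / n_top                      # % response at this depth
    overall_rate = sum(responded) / len(responded)
    captured = hits / sum(responded)             # % captured response
    return top_rate / overall_rate, captured

lift, captured = lift_at_depth(scores, responded, depth=0.2)
print(lift, captured)  # top 20% responds at 100% vs 40% overall
```

As depth grows toward 100%, lift necessarily falls toward 1.0 - which is why the lift curve slopes downward.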