Project
Installment – II
Group – 8
Abhishek Singh (173)
Girish Matada (194)
Jai Kishore Jangir (198)
Paras Kohli (210)
Tushar Rane (233)
Unnati Kandelwal (234)
1. If the average PRSM of all loans is less than 0.9, then management may
need to seek concessions from the large bank that funds its lending
operation. Management would prefer to avoid this negotiation unless the
data have compelling information. What do you recommend to
management?
A. We apply a hypothesis test to help management decide whether to pursue the
negotiation. The following conditions need to be checked first:
Condition 1: SRS Condition - Sample comprises less than 10% of the population. Sample
Size is 628 which is less than 10% of the population.
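Condition 2: Sample size condition - n > 10|K4|, where K4 is the kurtosis. Here n = 628 and
10|K4| = 10(0.089767) = 0.9, so the condition is met.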
Summary Statistics
Mean = 0.860211
Std Dev = 0.2228784
Std Err Mean = 0.0088938
Upper 95% Mean = 0.8776762
Lower 95% Mean = 0.8427457
N = 628
Variance = 0.0496748
Kurtosis = -0.089767
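Hypothesis Testing:
Ho: u >= 0.9
H1: u < 0.9
Calculation:
Test statistic = (sample mean - 0.9) / Std Err Mean
= (0.860211 - 0.9) / 0.0088938
= -4.47
Additionally, as per JMP the p-value for the test statistic of -4.47 is 0.0001, which is well
below the 0.05 significance level (95% confidence). Ho is therefore rejected: the data indicate
that the average PRSM is below 0.9, which supports seeking concessions from the bank.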
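The test statistic quoted above can be reproduced from the summary statistics. The following is a minimal sketch, assuming a one-sided test of Ho: u >= 0.9 against H1: u < 0.9. The function name is illustrative, and the p-value uses a normal approximation, whereas JMP presumably uses the t distribution, so its reported p-value may differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, sample_sd, n, mu0):
    """Return (standard error, test statistic, lower-tail p-value).

    Hypothetical helper for illustration; the p-value is a normal
    approximation, which is reasonable for a sample as large as n = 628.
    """
    se = sample_sd / sqrt(n)            # standard error of the mean
    z = (sample_mean - mu0) / se        # standardized distance from mu0
    p_lower = NormalDist().cdf(z)       # P(Z <= z), for H1: u < mu0
    return se, z, p_lower


# Summary statistics for PRSM as listed in the report
se, z, p_value = one_sample_z(0.860211, 0.2228784, 628, 0.9)
print(f"se = {se:.7f}, test statistic = {z:.2f}, p-value = {p_value:.6f}")
```

The standard error computed here agrees with the "Std Err Mean" value in the summary statistics, which is a useful cross-check on the inputs.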
2. The categorical variable Loan Size Class organizes the loans into size
categories, S1 for the smallest loans, then S2, S3, and S4 for the largest.
Business processes currently used by the lender are well matched to
dealing with smaller loans. Management plans to retain the current
processes unless the population share of loans in classes S1 and S2 is less
than 50%. Use a hypothesis test and tell management whether to change
processes.
A. We apply a hypothesis test to help management decide whether to retain the current
processes. The following conditions need to be checked first:
Condition 1 : SRS Condition - Sample comprises less than 10% of the population. Sample
Size is 628 which is less than 10% of the population.
Condition 2 : Both np and n(1 - p) are larger than 10.
np = 628(.4697) = 294.97 which is > 10
n(1 - p) = 628(1 - .4697) = 333.03 which is > 10
where p is the sample proportion of S1+S2 loans, p = .4697
Hypothesis Testing:
A. We apply a hypothesis test to help management decide whether to retain the current
processes. For hypothesis testing the following conditions need to be validated.
Condition 1: SRS Condition - Sample comprises less than 10% of the population. Sample
Size is 628 which is less than 10% of the population.
Condition 2: Both np and n(1 - p) are larger than 10.
np = 628(.45) = 282.6 which is > 10
n(1 - p) = 628(1 - .45) = 345.4 which is > 10
where p is the proportion of S1+S2, p = .45 (as stated in the question). The same check
with the hypothesized value po = 0.5 gives 628(.5) = 314 for both counts, which is also > 10.
Hypothesis Testing:
Ho: p >= 0.5
H1: p < 0.5
po = 0.5
Proportion of S1+S2 in the sample = 0.45 (p) (as mentioned in the question). Applying a
z-test, we calculate the z value to test the hypothesis.
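The z calculation for this test can be sketched as follows. This is an illustrative example rather than the report's own computation: it assumes the standard one-proportion z-test, with the standard error evaluated at the hypothesized value po = 0.5, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """Return (z statistic, lower-tail p-value) for H1: p < p0."""
    se0 = sqrt(p0 * (1 - p0) / n)       # standard error under Ho
    z = (p_hat - p0) / se0
    return z, NormalDist().cdf(z)       # P(Z <= z)


# Sample proportion 0.45 and n = 628, as stated in the report
z_prop, p_prop = one_proportion_z(0.45, 0.5, 628)
print(f"z = {z_prop:.2f}, one-sided p-value = {p_prop:.4f}")
```

With these inputs z is about -2.51, which is below the one-sided 5% critical value of about -1.645.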
Technical Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population. Sample
size is 628 which is less than 10% of the population.
Condition 2 : Both npR and n(1 - pR) are larger than 10, and both npO and n(1 - pO) are
larger than 10.
Loan type R
= 628(.5769) = 362.29 which is > 10
= 628(1-.5769) = 265.71 which is >10
Loan type O
= 628(.48822) = 306.6 which is > 10
= 628(1-.48822) = 321.4 which is >10
Substantive Validation
No Lurking variable: We assume there are no lurking variables, since the sample comes from
randomized data (an SRS, as discussed under the Technical Validation conditions).
Hypothesis Testing:
Ho: pR - pO = 0; the difference is not statistically significant;
H1: pR - pO not equal to 0; the difference is statistically significant.
Technical Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population. Sample
size is 628 which is less than 10% of the population.
Condition 2 : Both nR> 10(|Kurtosis|) and no> 10(|Kurtosis|) Loan type R:
Substantive Validation:
Condition 1 - No Lurking variable : We assume there are no lurking variables, since the
sample comes from randomized data (an SRS, as discussed under the Technical Validation
conditions).
Condition 2: The variances are taken to be similar, based on the IQRs of the two plots:
Loan type R : IQR = .77 to 1.01
Loan type O : IQR = .67 to 1.01
Hypothesis Testing:
(a) FICO
(b) Years in business SMMD Term 1, 2017
(c) Current delinquent credit lines
(d) Number of derogatory legal items
A.
a) For hypothesis testing the following conditions need to be validated.
Technical Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 2.72, sR = 3.28, Kurtosis = 2.04
Loan type O: nO = 394, uO = 2.88, sO = 3.95, Kurtosis = 28.7
nR > 10(|Kurtosis|): 234 > 20.4
nO > 10(|Kurtosis|): 394 > 287
Substantive Validation:
Condition 1 - No Lurking variable : We assume there are no lurking variables, since the
sample comes from randomized data (an SRS, as discussed under the Technical Validation
conditions).
Condition 2: The variances are taken to be similar, based on the IQRs of the two plots:
Loan type R : IQR = 0 to 4
Loan type O : IQR = 0 to 4
Hypothesis Testing:
Ho: uR - uO = 0; the difference is not statistically significant;
H1: uR - uO not equal to 0; the difference is statistically significant.
At alpha = 0.05 the critical z value is +-2 (two-sided test, hence |z| = 2). The z statistic
calculated for the sample is smaller in magnitude than the critical z value, hence Ho is not
rejected. Therefore, the difference is not statistically significant.
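As an illustration, the two-sample z statistic can be computed from the summary statistics listed above for loan types R and O (nR = 234, uR = 2.72, sR = 3.28; nO = 394, uO = 2.88, sO = 3.95). This is a sketch under stated assumptions: the function name is hypothetical, the standard error uses the unpooled formula sqrt(sR^2/nR + sO^2/nO), and the critical value of 2 is the rounded two-sided 5% cutoff used in the report.

```python
from math import sqrt


def two_sample_z(mean_r, sd_r, n_r, mean_o, sd_o, n_o):
    """Return (z statistic, standard error) for the difference in means."""
    se = sqrt(sd_r**2 / n_r + sd_o**2 / n_o)   # unpooled standard error
    return (mean_r - mean_o) / se, se


z_diff, se_diff = two_sample_z(2.72, 3.28, 234, 2.88, 3.95, 394)
reject_h0 = abs(z_diff) > 2                    # two-sided test at about 5%
print(f"se = {se_diff:.3f}, z = {z_diff:.2f}, reject Ho: {reject_h0}")
```

For these inputs |z| is well below 2, which matches the report's statement that Ho is not rejected.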
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 13.07, sR = 14.54, Kurtosis = 8.9
Loan type O: nO = 394, uO = 7.93, sO = 9.28, Kurtosis = 13.18
nR > 10(|Kurtosis|): 234 > 89
nO > 10(|Kurtosis|): 394 > 131.8
Validation
Condition 1 - No Lurking variable : We assume there are no lurking variables, since the
sample comes from randomized data (an SRS, as discussed under the Technical Validation
conditions).
Condition 2: The variances are taken to be similar, based on the IQRs of the two plots:
Loan type R : IQR = 4 to 17
Loan type O : IQR = 2 to 10
Hypothesis Testing:
Ho: uR- uo = 0; the difference is not statistically significant ;
H1: uR- uo not equal to 0; the difference is statistically significant.
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 0.891, sR = 0.192
Loan type O: nO = 394, uO = 0.842, sO = 0.012
nR > 10(|Kurtosis|): 234 > .6
nO > 10(|Kurtosis|): 394 > 2.2
Validation:
Condition 1 - No Lurking variable : We assume there are no lurking variables, since the
sample comes from randomized data (an SRS, as discussed under the Technical Validation
conditions).
Condition 2: The variances are taken to be similar, based on the IQRs of the two plots:
Loan type R : IQR = .77 to 1.01
Loan type O : IQR = .67 to 1.01
Hypothesis Testing:
Ho: uR - uO = 0; the difference is not statistically significant
H1: uR - uO not equal to 0; the difference is statistically significant.
Sample z statistic = (uR - uO) / se(uR - uO)
= (13.07 - 7.93) / sqrt(14.54^2/234 + 9.28^2/394)
= 5.14 / 1.06
= 4.85
At alpha = 0.05 the critical z value is +-2 (two-sided test, hence |z| = 2). The z statistic
calculated for the sample is greater than the critical z value, hence Ho is rejected.
Therefore the difference is statistically significant.
Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 1.213, sR = 3.22, Kurtosis = 38.06
Loan type O: nO = 394, uO = 0.91, sO = 2.35, Kurtosis = 19.99
nR > 10(|Kurtosis|) fails: 234 < 380.6
nO > 10(|Kurtosis|) holds: 394 > 199.9
Since the kurtosis condition is not met for Loan type R for Number of derogatory legal
items, we cannot perform the z-test, and the result is inconclusive.
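The sample size condition used throughout (n > 10|Kurtosis|) is simple to check programmatically. The sketch below applies it to the Number of derogatory legal items figures quoted above; the function and variable names are hypothetical.

```python
def sample_size_condition(n, kurtosis):
    """True when n exceeds 10 times the absolute kurtosis."""
    return n > 10 * abs(kurtosis)


# (n, kurtosis) per loan type, as quoted in the report
checks = {
    "R": sample_size_condition(234, 38.06),   # 234 vs 380.6
    "O": sample_size_condition(394, 19.99),   # 394 vs 199.9
}
print(checks)
```

The condition fails for loan type R, so a z-test on this variable is not supported by the condition.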
7. Find the 95% confidence interval for the average FICO score and give a
brief interpretation of the meaning of this interval.
Mean 581.25478
Std Dev 70.971536
Std Err Mean 2.8320726
Upper 95% Mean 586.81627
Lower 95% Mean 575.69328
N 628
Variance 5036.9589
Kurtosis -0.374674
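The confidence interval indicates that we are 95% confident that the mean FICO score for the
population of borrowers lies between 575.7 and 586.8. That is, intervals constructed this way
from repeated samples would contain the population mean about 95% of the time.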
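The interval can be approximately reproduced from the mean and standard error listed above. This is a minimal sketch that assumes a normal critical value (about 1.96). The reported bounds are presumably t-based, so the two agree only to about one decimal place. The function name is illustrative.

```python
from statistics import NormalDist


def normal_ci(mean, std_err, confidence=0.95):
    """Return (lower, upper) bounds using a normal critical value."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z_crit * std_err
    return mean - margin, mean + margin


# Mean and Std Err Mean for FICO as listed in the report
lower, upper = normal_ci(581.25478, 2.8320726)
print(f"95% CI for mean FICO: {lower:.1f} to {upper:.1f}")
```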
A. From the plot it is not possible to draw conclusions about the relation between SIC codes
and PRSM score, for the following reasons:
The sample size for each SIC code is very small, as shown in the plots. Additionally,
the normal quantile plots show that the distributions are not normal.
For most SIC codes the sample size of PRSM scores is less than 10|K|.
Thus we cannot use the normal distribution formulas to compare the variation of
PRSM with respect to the SIC codes.
Confounding Effect: Inferring the value of PRSM from the data distribution by
SIC code would be biased, since the data are not randomly distributed across SIC codes,
leading to confounding effects.