
BUSINESS STATISTICS

Project
Installment – II

Group – 8
Abhishek Singh (173)
Girish Matada (194)
Jai Kishore Jangir (198)
Paras Kohli (210)
Tushar Rane (233)
Unnati Kandelwal (234)
Installment 2

1. If the average PRSM of all loans is less than 0.9, then management may
need to seek concessions from the large bank that funds its lending
operation. Management would prefer to avoid this negotiation unless the
data have compelling information. What do you recommend to
management?

A. We apply a hypothesis test to help management decide whether to negotiate. The following conditions must hold first:

Condition 1: SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2: Sample size condition - n > 10|K4|, where K4 is the kurtosis.

Kurtosis = -0.09, therefore the condition holds: n = 628 > 10*|-0.09| = 0.9

Summary Statistics
Mean = 0.860211
Std Dev = 0.2228784
Std Err Mean = 0.0088938
Upper 95% Mean = 0.8776762
Lower 95% Mean = 0.8427457
N = 628
Variance = 0.0496748
Kurtosis = -0.089767

Hypothesis Testing:

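Ho: u >= 0.9 (the average PRSM of all loans is at least 0.9)

H1: u < 0.9 (the average PRSM of all loans is below 0.9)

We apply a one-sided t-test and calculate the t value to test the hypothesis.

Assumption: The significance level is a = 0.05; that is, management accepts a 5% chance of
being misled by the sample into seeking the negotiation when the true average PRSM is at least 0.9.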

Calculation:

t = (x - uo) / (s / sqrt(n))
= (0.860211 - 0.9) / (0.2228784 / sqrt(628))
= -0.039789 / 0.0088938
= -4.47

where, rounded to two decimals:
x (sample mean) = 0.86
uo (test mean) = 0.9
s (sample standard deviation) = 0.22
n (sample size) = 628

At a = 0.05, the critical t value = -1.645. The calculated t value is less than the critical t value,
hence Ho is rejected. Therefore, management should seek the negotiation.

Additionally, JMP reports a p-value of 0.0001 for the test statistic t = -4.47, which is far below
a = 0.05 and supports rejecting Ho.
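
The t calculation above can be reproduced from the summary statistics. The following is a minimal Python sketch, not the JMP procedure used for this report. The function name one_sample_t is hypothetical, and the critical value is taken from the standard normal distribution, which is an assumption that closely approximates the t distribution at n = 628.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_t(mean, sd, n, mu0):
    """Return the t statistic for testing a sample mean against mu0."""
    std_err = sd / sqrt(n)  # standard error of the mean
    return (mean - mu0) / std_err


# Summary statistics taken from the table in this answer
t_stat = one_sample_t(mean=0.860211, sd=0.2228784, n=628, mu0=0.9)

# Assumption: normal approximation for the one-sided critical value at a = 0.05
critical = NormalDist().inv_cdf(0.05)

# One-sided (lower-tail) test: reject Ho when t falls below the critical value
reject_h0 = t_stat < critical
print(f"t = {t_stat:.2f}, critical = {critical:.3f}, reject Ho: {reject_h0}")
```

Working from the unrounded mean and standard deviation avoids the small drift that appears when the two-decimal values 0.86 and 0.22 are used instead.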

2. The categorical variable Loan Size Class organizes the loans into size
categories, S1 for the smallest loans, then S2, S3, and S4 for the largest.
Business processes currently used by the lender are well matched to
dealing with smaller loans. Management plans to retain the current
processes unless the population share of loans in classes S1 and S2 is less
than 50%. Use a hypothesis test and tell management whether to change
Processes.

A. We apply a hypothesis test to help management decide whether to retain the current
processes. The following conditions must hold first:

Condition 1 : SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2 : Both np and n(1 - p) are larger than 10.
np = 628(.4697) = 295, which is > 10
n(1 - p) = 628(1 - .4697) = 333, which is > 10
where p = .4697 is the sample proportion of S1 + S2 loans.

Hypothesis Testing:

Ho: p >= 0.5

H1: p < 0.5
Proportion of S1+S2 in the sample = (75+220)/628
= 295/628
= 0.4697 (p)
We apply a one-sided z-test and calculate the z value to test the hypothesis.

Sample z statistic = (p - po) / sqrt(po(1 - po)/n), with po = 0.5

= (.4697 - .5) / sqrt(.25/628)
= -1.52
At a = 0.05, the critical z value = -1.645.
The z statistic calculated for the sample is greater than the critical z value, hence
Ho is not rejected. Therefore, management should retain the current processes.
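
As a check on the arithmetic, the one-proportion z-test can be sketched in a few lines of Python. This is an illustrative sketch only; the function name one_proportion_z is hypothetical, and the counts 75 and 220 are the S1 and S2 counts quoted in this answer.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """Return the z statistic for testing a sample proportion against p0."""
    p_hat = successes / n
    std_err = sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    return (p_hat - p0) / std_err


z_stat = one_proportion_z(successes=75 + 220, n=628, p0=0.5)

# One-sided (lower-tail) critical value and p-value at a = 0.05
critical = NormalDist().inv_cdf(0.05)
p_value = NormalDist().cdf(z_stat)

reject_h0 = z_stat < critical
print(f"z = {z_stat:.2f}, p-value = {p_value:.3f}, reject Ho: {reject_h0}")
```

The lower-tail p-value is above 0.05, which matches the decision not to reject Ho.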
3. Suppose that loans in size categories S1 and S2 together make up 45% in
the population of loans served by this lender. What is the probability that
the procedure used in Question 2 would fail to recommend the proper
action, namely to revise business processes.

A. We apply a hypothesis test to help management decide whether to retain the current processes.
The following conditions need to be validated first.

Condition 1: SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2: Both np and n(1 - p) are larger than 10.
np = 628(.45) = 282.6, which is > 10
n(1 - p) = 628(1 - .45) = 345.4, which is > 10
where p = .45 is the proportion of S1 + S2 loans (as stated in the question).

Hypothesis Testing:
Ho: p >= 0.5
H1: p < 0.5
po = 0.5
Proportion of S1+S2 = 0.45 (p) (as stated in the question). We apply the z-test and
calculate the z value to test the hypothesis.

Sample z statistic = (p - po) / sqrt(po(1 - po)/n)

= (.45 - .5) / sqrt(.25/628)
= -2.5
At a = 0.05, the critical z value = -1.645. The z statistic is less than the critical z value,
hence Ho is rejected. Therefore, management should change the current processes.
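
The question asks for a probability: the chance that the Question 2 procedure fails to reject Ho when the population share is actually 45%, which is a Type II error probability. One possible way to obtain it is sketched below in Python. This is not part of the original answer; the function name type_ii_error is hypothetical, and the sketch assumes the Question 2 rejection rule (one-sided z-test, a = 0.05, n = 628) and a normal approximation for the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(p_true, p0, n, alpha):
    """Probability of failing to reject Ho: p >= p0 when the true share is p_true."""
    std_normal = NormalDist()
    # Ho is rejected when the sample proportion falls below this cutoff
    se_null = sqrt(p0 * (1 - p0) / n)
    cutoff = p0 + std_normal.inv_cdf(alpha) * se_null
    # Sampling distribution of the sample proportion under the true share
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Failing to reject means the sample proportion lands above the cutoff
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)


beta = type_ii_error(p_true=0.45, p0=0.5, n=628, alpha=0.05)
print(f"P(fail to recommend a change | p = 0.45) = {beta:.3f}")
```

Under these assumptions the procedure would miss the needed change in roughly one sample out of five.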
4. Question 8 of the first installment compares the proportions of loans that are
“above” and “below” for repeat and original loans. Is there a statistically
significant difference between the proportions in the two samples? Use an
appropriate rather than informal procedure.

A. The proportions for the two loan types are:

Loan type R (repeat): pR = 0.5769

Loan type O (original): pO = 0.48822
We apply a hypothesis test to compare the two loan types. The following conditions need to be
validated first.

Technical Validation
Condition 1 : SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2 : nR(pR), nR(1 - pR), nO(pO) and nO(1 - pO) are all larger than 10.
Loan type R (nR = 234)
nR(pR) = 234(.5769) = 135.0, which is > 10
nR(1 - pR) = 234(1 - .5769) = 99.0, which is > 10
Loan type O (nO = 394)
nO(pO) = 394(.48822) = 192.4, which is > 10
nO(1 - pO) = 394(1 - .48822) = 201.6, which is > 10

Substantive Validation
No lurking variable: We assume that there are no lurking variables, as the sample comes from
randomized data (an SRS, as discussed under the technical validation conditions).

Hypothesis Testing:
Ho: pR - pO = 0; the difference is not statistically significant;
H1: pR - pO not equal to 0; the difference is statistically significant.

Sample z statistic = (pR - pO) / se(pR - pO) = 2.019

At a = 0.05, the critical z value is +-2 (two-sided test). The z statistic calculated for the
sample is more than the critical z value, hence Ho is rejected. Therefore, the difference between
the two proportions is statistically significant.

5. Rather than compare proportions as in Question 4, compare the average
PRSM scores for repeat and original loans. Does a test of the difference in
average PRSM scores produce the same result as the test of proportions?

A. The following conditions need to be validated for the hypothesis test.

Technical Validation
Condition 1 : SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)

Loan type R: nR = 234, uR = 0.891, sR = 0.192

Loan type O: nO = 394, uO = 0.842, sO = 0.237 (standard error of the mean = 0.012)
nR > 10(|Kurtosis|): 234 > .6
and nO > 10(|Kurtosis|): 394 > 2.2

Substantive Validation:
Condition 1 - No lurking variable : We assume that there are no lurking variables, as the
sample comes from randomized data (an SRS, as discussed under the technical validation
conditions).
Condition 2: The variances are similar, since the interquartile ranges in the two plots are comparable:
Loan type R : IQR = .77 to 1.01
Loan type O : IQR = .67 to 1.01

Hypothesis Testing:

Ho: uR - uO = 0; the difference is not statistically significant

H1: uR - uO not equal to 0; the difference is statistically significant.

Sample z statistic = (uR - uO) / se(uR - uO)

= (.89 - .84) / se(.89 - .84)
= .05 / sqrt(0.000158 + 0.0001428)
= .05 / .0173 = 2.89
At a = 0.05, the critical z value is +-2 (two-sided test). The z statistic calculated for the
sample is more than the critical z value, hence Ho is rejected. Therefore, the difference between
the average PRSM scores is statistically significant.
Yes, the test of the difference in average PRSM scores produces the same result as the test of
proportions.
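
The two-sample statistic above combines the squared standard errors of the two group means. A minimal Python sketch of that step follows; the function name two_sample_z is hypothetical, and the inputs are the rounded figures used in the calculation above.

```python
from math import sqrt


def two_sample_z(mean_1, mean_2, se_sq_1, se_sq_2):
    """z statistic for a difference in means, given squared standard errors."""
    std_err_diff = sqrt(se_sq_1 + se_sq_2)  # standard error of the difference
    return (mean_1 - mean_2) / std_err_diff


# Rounded means and squared standard errors (s^2 / n) from the calculation above
z_stat = two_sample_z(mean_1=0.89, mean_2=0.84, se_sq_1=0.000158, se_sq_2=0.0001428)

# Two-sided test with the report's critical value of +-2
reject_h0 = abs(z_stat) > 2
print(f"z = {z_stat:.2f}, reject Ho: {reject_h0}")
```

The result agrees with the hand calculation up to rounding of the standard error.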
6. Using two-sample comparisons, do original/repeat loans appear
significantly different in any of the following respects? Report a p-value for
each of the following and summarize your findings. No elaborate
comparison is necessary.

(a) FICO
(b) Years in business SMMD Term 1, 2017
(c) Current delinquent credit lines
(d) Number of derogatory legal items
A.
a) For hypothesis testing the following conditions need to be validated.
Technical Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 2.72, sR = 3.28, Kurtosis = 2.04
Loan type O: nO = 394, uO = 2.88, sO = 3.95, Kurtosis = 28.7
nR > 10(|Kurtosis|): 234 > 20.4
and nO > 10(|Kurtosis|): 394 > 287

Substantive Validation:
Condition 1 - No Lurking variable : Assuming that there are no lurking variables as the
sample comes from a randomized data. SRS sample as discussed under
Technical validation conditions.
Condition 2: The variances are similar since the IQR for both the plots are:
Loan type R : IQR = 0 to 4
Loan type O : IQR = 0 to 4
Hypothesis Testing:
Ho: uR- uo = 0; the difference is not statistically significant ;
H1: uR - uo not equal to 0; the difference is statistically significant.

Sample z statistic = (uR - uO) / se(uR - uO)

= (2.72 - 2.88) / se(2.72 - 2.88)
= -(.16 / 0.29)
= -0.55

At a = 0.05, the critical z value is +-2 (two-sided test). The absolute value of the z statistic
is less than the critical z value, hence Ho is not rejected. Therefore, the difference is
not statistically significant.

b) For hypothesis testing the following conditions need to be validated.


Technical Validation

Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 13.07, sR = 14.54, Kurtosis = 8.9
Loan type O: nO = 394, uO = 7.93, sO = 9.28, Kurtosis = 13.18
nR > 10(|Kurtosis|): 234 > 89
and nO > 10(|Kurtosis|): 394 > 131

Substantive Validation
Condition 1 - No Lurking variable : Assuming that there are no lurking variables as the
sample comes from a randomized data. SRS sample as discussed under
Technical validation conditions.
Condition 2: The variances are similar since the IQR for both the plots are:
Loan type R : IQR = 4 to 17
Loan type O : IQR = 2 to 10

Hypothesis Testing:
Ho: uR- uo = 0; the difference is not statistically significant ;
H1: uR- uo not equal to 0; the difference is statistically significant.

Sample z statistic = (uR - uO) / se(uR - uO)

= (13.07 - 7.93) / se(13.07 - 7.93)
= 5.14 / 1.18
= 4.32
At a = 0.05, the critical z value is +-2 (two-sided test). The z statistic calculated for the
sample is more than the critical z value, hence Ho is rejected. Therefore, the difference is
statistically significant.

c) For hypothesis testing the following conditions need to be validated.


Technical Validation:

Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and no > 10(|Kurtosis|)
Loan type R:
nR = 234, u R = 0.891, s R = 0.192 Loan type O
no = 394, u o = 0.842, sO = 0.012
nR> 10(|Kurtosis|) = 234 > .6
and no> 10(|Kurtosis|) = 394 > 2.2

Substantive Validation:
Condition 1 - No Lurking variable : Assuming that there are no lurking variables as the
sample comes from a randomized data. SRS sample as discussed under
Technical validation conditions.
Condition 2: The variances are similar since the IQR for both the plots are:
Loan type R : IQR = .77 to 1.01
Loan type O : IQR = .67 to 1.01

Hypothesis Testing:
Ho: uR- uo = 0; the difference is not statistically significant
H1: uR - uo not equal to 0; the difference is statistically significant.
Sample z statistics = (uR - uo) / se( (uR - uo)
= (13.07 -7.93)/se(13.07 -7.93)
= 5.14/
= 0.28
At a = 0.05, the critical z value is +-2 (two-sided test). The z statistic calculated for
the sample is less than the critical z value, hence Ho is not rejected. Therefore, the
difference is not statistically significant.

d) For hypothesis testing the following conditions need to be validated.

Technical Validation
Condition 1 : SRS Condition - Sample comprises less than 10% of the population.
Sample size is 628 which is less than 10% of the population.
Condition 2 : Both nR > 10(|Kurtosis|) and nO > 10(|Kurtosis|)
Loan type R: nR = 234, uR = 1.213, sR = 3.22, Kurtosis = 38.06
Loan type O: nO = 394, uO = 0.91, sO = 2.35, Kurtosis = 19.99
nR > 10(|Kurtosis|) fails: 234 < 380.6
nO > 10(|Kurtosis|) holds: 394 > 199.9

Since the kurtosis condition fails for loan type R, the z-test is not valid for the number of
derogatory legal items, and the result is inconclusive.
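
The question also asks for a p-value for each comparison. Taking the z statistics reported in parts (a) to (c) at face value, two-sided p-values can be obtained from the standard normal distribution. The Python sketch below is illustrative only; the function name two_sided_p_value is hypothetical, and part (d) is left out because its sample size condition fails.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# z statistics as reported in parts (a), (b) and (c) of this answer
reported_z = {"(a)": -0.55, "(b)": 4.32, "(c)": 0.28}

p_values = {part: two_sided_p_value(z) for part, z in reported_z.items()}
for part, p in p_values.items():
    print(f"{part}: z = {reported_z[part]:+.2f}, p-value = {p:.4f}")
```

A p-value below 0.05 corresponds to a z statistic beyond the critical value in the tests above, so only part (b) shows a significant difference under these reported figures.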
7. Find the 95% confidence interval for the average FICO score and give a
brief interpretation of the meaning of this interval.

A. The following conditions need to be validated for the confidence interval.

Condition 1 : SRS Condition - The sample comprises less than 10% of the population. The sample
size is 628, which is less than 10% of the population.
Condition 2 : n > 10|K4|, where K4 is the kurtosis.
Kurtosis = -0.37, therefore the condition holds:
n = 628 > 10*|-0.37| = 3.7

Summary Statistics
Mean = 581.25478
Std Dev = 70.971536
Std Err Mean = 2.8320726
Upper 95% Mean = 586.81627
Lower 95% Mean = 575.69328
N = 628
Variance = 5036.9589
Kurtosis = -0.374674

We are 95% confident that the mean FICO score for the population of borrowers lies between
575.7 and 586.8. That is, intervals constructed this way from repeated samples would capture
the population mean about 95% of the time.
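
The interval limits follow from the mean and its standard error. A minimal Python sketch is given below; the function name mean_confidence_interval is hypothetical, and it assumes a normal multiplier rather than the t multiplier JMP uses, which differ only slightly at n = 628.

```python
from statistics import NormalDist


def mean_confidence_interval(mean, std_err, level=0.95):
    """Confidence interval for a mean using a normal multiplier (assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * std_err
    return mean - margin, mean + margin


# Mean and standard error taken from the FICO summary statistics
lower, upper = mean_confidence_interval(mean=581.25478, std_err=2.8320726)
print(f"95% CI for mean FICO: {lower:.1f} to {upper:.1f}")
```

To one decimal place this agrees with the JMP limits quoted above.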

8. Going forward, management would like to estimate the average PRSM in
the population of loans produced by a new ISO to 2 decimal places (i.e., to
have the margin of error less than 0.005). How large a sample of loans
from this ISO would you recommend?
A. Margin of error = 2(SD)/sqrt(n)

Considering SD = 0.5:

0.005 = 2(.5)/sqrt(n)
n = 40000 (sample loans if we consider SD = 0.5).
This sample size is very large. Instead, we check the 95% CI for the SD in the dataset and take
its upper limit, the value closest to 0.5, as a conservative estimate.
95% CI for the SD = 0.2077 to 0.24. Taking SD = 0.24 in the margin of error formula, we
have:

0.005 = 2(.24)/sqrt(n)

n = 9216
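
Solving the margin of error formula for n gives n = (2(SD)/margin)^2. The Python sketch below applies that rearrangement to both SD values used above; the function name required_sample_size is hypothetical, and the multiplier of 2 is the report's approximation for 95% confidence.

```python
from math import ceil


def required_sample_size(sd, margin, multiplier=2):
    """Smallest n with multiplier * sd / sqrt(n) <= margin."""
    n_exact = (multiplier * sd / margin) ** 2
    # Round away floating-point noise before taking the ceiling
    return ceil(round(n_exact, 6))


n_worst_case = required_sample_size(sd=0.5, margin=0.005)
n_from_data = required_sample_size(sd=0.24, margin=0.005)
print(n_worst_case, n_from_data)
```

Because n grows with the square of SD, halving the assumed SD cuts the required sample to about a quarter.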
9. The Chief Risk Officer (CRO) is concerned about the percentage of loans
that have PRSM less than 0.60. What can you tell the CRO about this
percentage from your data?
A. Based on the graph shown, we can tell the CRO that we are 95% confident that the
percentage of loans with PRSM less than 0.60 lies between 8.8% and 15.5%,
with a sample percentage of 11.7%.

10. Do businesses operating in any specific SIC code appear to have
particularly high or low PRSM scores? List at least 2 aspects of the data
that complicate formulating an answer to this question.

A. From the plot it is not possible to infer a relation between SIC codes and PRSM scores, for
the following reasons:

- The sample size for each SIC code is very small, as shown in the plots. Additionally,
the normal quantile plots show that the distributions are not normal.

- For most SIC codes, the sample size of PRSM scores is smaller than 10|K|, so the sample
size condition fails. Thus we cannot use the normal distribution formulas to compare the
variation of PRSM with respect to the SIC codes.

- Confounding effect: Inferring the value of PRSM from the data distribution by
SIC code would be biased, since loans are not randomly assigned to SIC codes, leading to
confounding effects.
