Question 01.
(a) Describe the distribution of the PRSM scores using both graphics and descriptive
statistics.
(b) Does it appear reasonable to use the Empirical Rule to describe the variation of PRSM?
Briefly explain your reasoning.
(c) If you were to remedy any evident anomalies with this variable, how well would the
Empirical Rule work now?
Solution 01
Part a)
The PRSM score was calculated as:
PRSM Score = (2*Amt paid in 6 months)/Total amt to be paid
Next, we check whether the data are normally distributed. For this test we used the
first 628 data records.
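The formula above can be sketched as a small function. This is only an illustration: the function name, argument names, and the example amounts are hypothetical and do not come from the assignment data.

```python
def prsm_score(paid_in_six_months, total_to_be_paid):
    """PRSM score as defined above: twice the 6-month payments over the total due."""
    if total_to_be_paid <= 0:
        raise ValueError("total_to_be_paid must be positive")
    return 2 * paid_in_six_months / total_to_be_paid


# Hypothetical loan: 4,000 paid in six months out of 10,000 owed in total.
example_score = prsm_score(4000, 10000)
print(example_score)
```

Under this definition, a score of 1 corresponds to half of the total amount being repaid within six months.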
Descriptive Method:
Mean of PRSM scores = 1.932137
Standard deviation of PRSM scores = 28.04
To check whether the data follow a normal distribution, we apply the Empirical Rule and
check whether about 68% of the values fall within one standard deviation of the mean, 95%
within two, and 99.73% within three. Using R, we found that 99.84% of the values fall
within one standard deviation of the mean, far more than the expected 68%, which means
that the data are not normally distributed.
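The coverage check described above can be sketched as follows. The solution used R; this Python version is only an illustration, and the scores in it are made up (they are not the assignment data). It shows how a single extreme value inflates the standard deviation so much that almost every other value falls within one standard deviation of the mean.

```python
import statistics


def empirical_rule_coverage(values):
    """Share of values within 1, 2 and 3 standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    coverage = {}
    for k in (1, 2, 3):
        inside = sum(1 for v in values if abs(v - mean) <= k * sd)
        coverage[k] = inside / len(values)
    return coverage


# Made-up scores: 19 ordinary values plus one extreme value.
ordinary = [0.5, 0.6, 0.6, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.8,
            0.9, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0, 1.1, 1.1]
with_outlier = ordinary + [700.0]

print(empirical_rule_coverage(with_outlier))  # dominated by the extreme value
print(empirical_rule_coverage(ordinary))      # same check without it
```

Comparing the two printed results shows why the rule should be re-checked once an anomalous record has been dealt with.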
Graphical Method:
We plotted a density histogram with a normal curve overlaid. Fig. 1 shows that the data are
highly skewed to the right and do not follow a normal distribution.
We noticed that one of the PRSM scores had an abnormally high value of 703.535, occurring at
row no. 527. We assumed that this was an error in recording the data and therefore
removed this record from the dataset.
After removing this record, we found:
Mean of PRSM scores = 0.81
Standard deviation of PRSM scores = 0.22
Descriptive method:
Applying the Empirical Rule, we obtained the following coverage:
Within one standard deviation: 69.69%
Within two standard deviations: 94.89%
Within three standard deviations: 99.68%
Hence, by the Empirical Rule, we can say that the data approximately follow a normal
distribution.
Graphical Method:
After re-plotting the histogram and the Q-Q plot, we found that the data nearly follow a
normal distribution (Fig. 3 and 4).
The t-test results show that the null hypothesis is rejected and that the mean is less than the target of 1.
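As a rough check of this conclusion, the t statistic can be recomputed from the summary figures reported earlier (mean 0.81, standard deviation 0.22). The sample size of 627 is an assumption (628 records minus the one removed), and the exact form of the test is not shown in this document, so this is only a sketch of a one-sample t statistic against the target of 1.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, target):
    """t statistic for testing a sample mean against a target value."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - target) / standard_error


# Summary values from the text; n = 627 is an assumption (628 records minus one).
t_stat = one_sample_t(0.81, 0.22, 627, 1.0)
print(round(t_stat, 1))
```

A statistic this far below zero lies well beyond the usual one-sided 5% critical value (about -1.645 for a large sample), which is consistent with rejecting the null hypothesis.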
Question 03:
Control charts can be used to measure the stability of many types of data,
including the performance of loans. Assume that the loans in your data table are
arranged in chronological order, starting from the first row through row 628.
Generate a control chart of the PRSM scores for your sample, completing the JMP
dialog as shown below. These choices set the process mean = 0.9, the
standard deviation = 0.24, and group the loans into batches of size 40. Be sure
to resolve the anomaly noted in the first question.
Part a)
Do the resulting x-bar and s-charts indicate that the lending process has been in
control over the sampled period? What are the implications, especially with
regard to the confidence interval in Question 2?
Part b)
Why are the control limits for the mean in the x-bar chart so much wider than the
confidence interval for the mean used in Question 2? Two reasons, please!
Solution 03
Part a)
On drawing the x-bar and s-charts, it can be noted that the process is in control (Fig. 5 and 6).
Fig 6. S-chart
Part b)
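The control limits are based on 6σ (the mean ± 3σ), while the confidence interval in Question 2
was based on 4σ (the mean ± 2σ) for 95% confidence.
In addition, the control limits apply to the mean of a batch of 40 loans, whereas the
confidence interval uses the whole sample, so its standard error is much smaller.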
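The difference in width can be illustrated numerically. The sketch below uses the process settings given in the question (mean 0.9, standard deviation 0.24, batches of 40) and the conventional three-standard-error limits for an x-bar chart. For the confidence interval it assumes the summary values from Question 1 (standard deviation 0.22) and a sample size of 627, which is an assumption rather than a figure stated in the text.

```python
import math


def xbar_limits(process_mean, process_sd, batch_size, k=3.0):
    """Lower and upper control limits for batch means: mean +/- k * sd / sqrt(n)."""
    half_width = k * process_sd / math.sqrt(batch_size)
    return process_mean - half_width, process_mean + half_width


# Settings from the question: mean 0.9, standard deviation 0.24, batches of 40.
lcl, ucl = xbar_limits(0.9, 0.24, 40)

# Assumed values for the 95% confidence interval: sd 0.22, n = 627.
ci_half_width = 1.96 * 0.22 / math.sqrt(627)

print(round(lcl, 3), round(ucl, 3))
print(round(ci_half_width, 4))
```

Both effects show up here: the multiplier is 3 rather than about 2, and the divisor is the square root of 40 rather than the square root of the full sample size.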
The plot shows that the data follow a negative exponential distribution rather than a normal
distribution.
(b) Describe the distribution of the variable defined by log(1+Years in Business). In
particular, is the variation in the transformed variable nearly normal?
Mean: 0.8094
Std Dev: 0.2067
Kurtosis: -0.0216
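A summary of this kind for a log(1 + x) transformed variable can be sketched as below. The function name and the input values are hypothetical, and the skewness and kurtosis use plain moment formulas; statistical packages often apply small-sample corrections, so their output can differ slightly.

```python
import math
import statistics


def log1p_summary(years):
    """Mean, standard deviation, skewness and excess kurtosis of log(1 + years)."""
    transformed = [math.log1p(y) for y in years]
    mean = statistics.fmean(transformed)
    sd = statistics.pstdev(transformed)  # population standard deviation
    if sd == 0:
        raise ValueError("all values are identical; shape statistics are undefined")
    m3 = statistics.fmean((x - mean) ** 3 for x in transformed)
    m4 = statistics.fmean((x - mean) ** 4 for x in transformed)
    return {
        "mean": mean,
        "sd": sd,
        "skewness": m3 / sd ** 3,
        "excess_kurtosis": m4 / sd ** 4 - 3,  # 0 for a normal distribution
    }


# Hypothetical years-in-business values, not taken from the data set.
print(log1p_summary([0, 1, 2, 3, 5, 8, 13, 21, 34]))
```

Skewness and excess kurtosis both close to zero are what a nearly normal variable would show.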
8. For this question, recode the PRSM score into a two-level categorical
variable, labeled as Above or Below depending on whether the PRSM
score is above or below the average PRSM score in your dataset.
It has been suggested that loans given to repeat customers (Loan Type
identifies this variable) perform worse than those given to first time
borrowers. To what extent is this assertion supported by the data? Provide
a brief discussion using ideas from the first two lectures to support your
answer.
In the cross table, Type 1 refers to original customers and Type 2 refers to repeat customers.
Below and Above are categorized relative to the target PRSM score of 1.
As the table shows, 85.5% of original customers are not meeting the target, whereas 75.5%
of repeat customers are not meeting the target PRSM. So the assertion that repeat
customers perform worse is not supported by this table.
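The comparison in the cross table can be sketched as a below-target share per loan type. The records here are invented for illustration (they are not the assignment data), and treating a score exactly equal to the target as "Above" is an assumption.

```python
from collections import Counter


def below_target_share(loans, target=1.0):
    """Share of loans per loan type whose PRSM score is below the target."""
    counts = Counter()
    for loan_type, score in loans:
        status = "Below" if score < target else "Above"  # ties count as Above
        counts[(loan_type, status)] += 1
    shares = {}
    for loan_type in sorted({t for t, _ in loans}):
        below = counts[(loan_type, "Below")]
        total = below + counts[(loan_type, "Above")]
        shares[loan_type] = below / total
    return shares


# Invented (loan type, PRSM score) records.
loans = [("Original", 0.7), ("Original", 0.9), ("Original", 1.1), ("Original", 0.8),
         ("Repeat", 0.6), ("Repeat", 1.2), ("Repeat", 1.05), ("Repeat", 0.95)]
print(below_target_share(loans))
```

A higher below-target share for one group is only descriptive; whether the gap is larger than chance variation would need a formal comparison of the two proportions.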
9. Does your answer to the previous question imply that Original/Repeat loan
status is the cause of an improved PRSM score or might there be another
explanation? If so, suggest a possible lurking variable that could
influence the comparison (the lurking variable can be a hypothetical one,
it doesn't have to exist in the data set). Otherwise, explain briefly why it is
not possible. (Reading Section 5.2 of SF could be helpful here.)
From the correlation matrix, we can see that there is no significant correlation between the
variables and PRSM; hence, a new variable obtained by transformation is required.
10. If you wanted to construct an approximate 95% confidence interval for the
proportion of future loans that originate from ISO Loan Masters, and this CI
was to have a margin of error of +/-2%, then what sample size would you
recommend?
Zalpha = 1.96
E = 0.02
Standard Deviation = 0.211651
N = Zalpha^2 * Standard Deviation^2 / E^2
N = 430.2, rounded up to 431
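The calculation can be reproduced with a few lines of Python. The inputs are the values listed above; the function name is hypothetical, and rounding up to the next whole loan is the usual convention for keeping the margin of error within the requested bound.

```python
import math


def sample_size_for_margin(z, sd, margin):
    """Unrounded sample size from N = z^2 * sd^2 / E^2."""
    return (z * sd / margin) ** 2


# Values from the solution: z = 1.96, standard deviation = 0.211651, E = 0.02.
raw_n = sample_size_for_margin(1.96, 0.211651, 0.02)
recommended_n = math.ceil(raw_n)  # round up so the margin is not exceeded

print(round(raw_n, 1), recommended_n)
```

The unrounded value is about 430.2, so rounding up gives a sample of 431 loans.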