
MTH 357 Final Exam - Dr. Minghui Yang
From: Francisco Almonte
December 19, 2010

Q1: Data: EXCEL files Chp2 IBD.xls Questions:

1. How many variables are in this data set and their type (nominal, ordinal, interval, ratio)?
Answer:
There are six variables in the data set. Here are their names and types:

Variable Name              Variable Type
Company                    Nominal
EPS Rating                 Ordinal
Relative Price Strength    Ordinal
Relative Strength          Ordinal
Sales/Margins/ROE          Ordinal
PE Ratio                   Ratio

2. Draw frequency distribution for Sales/Margins/ROE

Figure 1: Histogram displaying the distribution for Sales/Margins/ROE (x-axis: Sales/Margins/ROE categories A to E; y-axis: Frequency, 0 to 12).

3. Generate descriptive statistics for EPS rating and explain them (mean, variance, median, mode,
range, skewness, kurtosis, etc.)
Answer:
I generated the following EPS rating descriptive statistics using the “R” statistics package:

EPS Rating Descriptive Statistics

mean: 62.028
    The sum of a list of numbers, divided by the total number of numbers in the list.

variance: 688.142
    Measure of the spread of a set of data points around their mean value; equal to the sum of the squared deviations from the mean, divided by n − 1 for a sample (or by n for a population).

median: 71
    “Middle value” of a list. The smallest number such that at least half the numbers in the list are no greater than it.

mode(s): 58, 76, 80, 84
    For lists, the mode is the most common (frequent) value. A list can have more than one mode. A data set has no mode when all the numbers appear in the data with the same frequency.

range: 92
    The range of a set of numbers is the largest value in the set minus the smallest value in the set.

skewness: -0.738
    Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. In this case the data are skewed left.

kurtosis: 2.299
    The kurtosis is a measure of the peakedness of the data distribution. Measured as excess kurtosis relative to a normal distribution, a negative value indicates a flat distribution, which is said to be platykurtic, and a positive value indicates a peaked distribution, which is said to be leptokurtic. Reading the reported value as excess kurtosis, the distribution is leptokurtic.
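
The statistics above can also be reproduced outside of R. The following is a minimal Python sketch using only the standard library. The `ratings` list is a small hypothetical placeholder, not the actual EPS Rating column, and the moment-based skewness and excess-kurtosis formulas are one common convention that may differ slightly from whatever R function produced the values above.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments (population form) used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "median": statistics.median(values),
        "modes": statistics.multimode(values),    # all most-frequent values
        "range": max(values) - min(values),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

# Hypothetical placeholder data; replace with the EPS Rating column.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(ratings)
for name, value in stats.items():
    print(name, value)
```

Returning every mode with `multimode` matters here because the EPS Rating data has four modes.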

4. Use regression analysis to determine if EPS Rating is related to Relative Price Strength
Answer:
See Figure 2 for a plot of “EPS Rating Vs. Relative Price”. “EPS Rating” is the independent variable, whereas “Relative Price” is the dependent variable. I fitted a linear regression line through the plotted points. From visually inspecting the fit, there appears to be no correlation between the variables. I base this assessment on the observation that the points do not cluster around the line.

I confirmed this assessment by computing the correlation coefficient (using Spearman’s method)[a]. Its value is 0.094. Taking an absolute value of 0.6 or higher as the threshold for the variables being related, we can conclude that they are not related.

[a] Spearman’s correlation coefficient is preferred over Pearson’s when the variables being tested are ordinal. Pearson’s coefficient would have been 0.18.

Figure 2: Regression Analysis for EPS Rating Vs. Relative Price (x-axis: EPS Rating; y-axis: Relative Price).

Q2: 20% of applicants for a VISA card are rejected. Write down the probability distribution formula. What
is its expected rejection number and variance (formula and your calculation)?
Answer:
The general probability distribution function for this binomial distribution is:

\[ P(X = x) = {}_{n}C_{x}\, p^{x} q^{n-x} = {}_{n}C_{x}\, 0.2^{x}\, 0.8^{n-x} \]

Now let’s specialize this function for the n = 15 applicants in this problem:

\[ P(X = x) = {}_{15}C_{x}\, 0.2^{x}\, 0.8^{15-x} \]

Also, as I’ve done on previous homework, I will use existing technology to generate a
distribution table I can use to answer any questions:

x     P(X = x)[a]     x     P(X = x)
0     0.0352          8     0.0035
1     0.1319          9     0.0007
2     0.2309          10    0.0001
3     0.2501          11    0.0000
4     0.1876          12    0.0000
5     0.1032          13    0.0000
6     0.0430          14    0.0000
7     0.0138          15    0.0000

[a] Note: I rounded to four decimal places.
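
The table, together with the expected value and variance the question asks for, can be checked with a short script. This is a minimal Python sketch using only the standard library; the function name `binomial_pmf` is my own choice. For a binomial variable the standard formulas are E[X] = np and Var[X] = npq, which give 15 × 0.2 = 3 expected rejections and a variance of 15 × 0.2 × 0.8 = 2.4.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 15, 0.2  # 15 applicants, 20% rejection rate (from the problem)

# Distribution table, rounded to four decimal places.
table = {x: round(binomial_pmf(x, n, p), 4) for x in range(n + 1)}
for x, prob in table.items():
    print(f"{x:2d}  {prob:.4f}")

# Expected number of rejections and variance for a binomial variable.
expected = n * p            # E[X] = np
variance = n * p * (1 - p)  # Var[X] = npq
print(expected, variance)
```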

What is the probability that among the next 15 applicants:

a) None will be rejected


Answer:
From the table I constructed, P (X = 0) = 0.0352

b) All will be rejected


Answer:
From the table I constructed, P (X = 15) ≈ 0 (zero to four decimal places)

The above result means that there is virtually no chance of this happening.

c) Less than 4 will be rejected


Answer:
From the table I constructed:

P (X < 4) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
= 0.0352 + 0.1319 + 0.2309 + 0.2501
= 0.6481

d) More than 6 will be rejected


Answer:
From the table I constructed and re-using the previous results:

P (X > 6) = 1 − [P (X < 4) + P (X = 4) + P (X = 5) + P (X = 6)]


= 1 − (0.6481 + 0.1876 + 0.1032 + 0.043)
= 0.0181
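
As a cross-check on parts a) through d), the same probabilities can be summed directly from the binomial formula instead of from the rounded table. This is a minimal Python sketch; the variable names are my own. Because it sums unrounded terms, the results can differ from the hand calculation in the fourth decimal place.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 15, 0.2

p_none = binomial_pmf(0, n, p)                                        # a) X = 0
p_all = binomial_pmf(n, n, p)                                         # b) X = 15
p_fewer_than_4 = sum(binomial_pmf(x, n, p) for x in range(0, 4))      # c) X < 4
p_more_than_6 = sum(binomial_pmf(x, n, p) for x in range(7, n + 1))   # d) X > 6

for label, value in [("none", p_none), ("all", p_all),
                     ("fewer than 4", p_fewer_than_4),
                     ("more than 6", p_more_than_6)]:
    print(f"{label}: {value:.6g}")
```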

Q3: Data: EXCEL files Chp7 MetAreas.xls. 100 random samples were selected from US and Canada cities
with Overall Rating for each city.
1. What is the sample mean and variance?
Answer:
The sample mean is: 67.599; and the sample variance is: 16.499.

2. What is the variance of the mean? What is its distribution?


Answer:
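The variance of the mean is computed as follows:

\[ \sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} = \frac{16.499}{100} = 0.165 \]

Since the population standard deviation is unknown, the standardized sample mean follows a Student’s t distribution with n − 1 = 99 degrees of freedom.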

3. What is the expected Population mean for Rating?


Answer:
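The expected population mean for the Rating is the sample mean plus or minus one standard error, where \sigma_{\bar{x}} = \sqrt{0.165} \approx 0.406:

\[ \mu_{\bar{x}} \pm \sigma_{\bar{x}} = 67.599 \pm 0.406 \]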

4. What is the 95% confidence interval of the population mean?


Answer:
Since the Student’s t table does not show values for degree of freedom (df) equal to 99 (it
jumps from 60 to 120), I’ll have to use the normal (Z) distribution table for this portion of
the problem.

Looking up Z95% in the distribution table, we get: Z95% = 1.96

Now, we calculate the lower and upper limits:

\[ \text{Lower limit} = \mu_{\bar{x}} - \sigma_{\bar{x}} Z_{95\%} = 67.599 - 0.406 \times 1.96 = 66.803 \]

\[ \text{Upper limit} = \mu_{\bar{x}} + \sigma_{\bar{x}} Z_{95\%} = 67.599 + 0.406 \times 1.96 = 68.395 \]

So the 95% confidence interval is: 66.803 ≤ µ ≤ 68.395
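
The interval can be verified numerically. This is a minimal Python sketch under the same normal approximation; it uses `statistics.NormalDist` from the standard library to obtain the critical value instead of a printed table, and the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n = 100
sample_mean = 67.599
sample_variance = 16.499

# Standard error of the mean: sqrt(s^2 / n).
standard_error = sqrt(sample_variance / n)

# Two-sided 95% critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(round(lower, 3), round(upper, 3))
```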

5. If we want the error to be 1% (plus, minus 1%, total 2%), what should the sample size be?
Answer:
Assuming that the desired confidence level is still 95%, we use the formula:

\[ n = \left( \frac{\sigma_{\bar{x}} Z_{95\%}}{0.02} \right)^2 = \left( \frac{0.406 \times 1.96}{0.02} \right)^2 = 1583 \]

6. If one states “the average Rating score for US and Canada cities is at least 70”, develop your null
and alternative test at 1% and 5% level to see if the statement is true?
Answer:
Again, because the book’s Student’s t table does not have values for df = 99, we will have
to calculate the z-score and use the normal probability table.

First, let’s establish the hypotheses:

H0 : µ ≥ 70 ⇒ Ha : µ < 70

Now let’s calculate the z-score:

\[ z_{calc} = \frac{67.599 - 70}{4.062 / \sqrt{100}} = -5.9138 \]

Now let’s look up the left-tailed critical z-scores for each significance level:

z0.01 ≈ −2.326 and z0.05 ≈ −1.645

Since zcalc lies far to the left of both critical values, we can reject the null hypothesis at both the 1% and 5% significance levels and conclude that: “the average Rating score for US and Canada cities is less than 70”.
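
The left-tailed test can be checked numerically. This is a minimal Python sketch under the same normal approximation; it computes the critical values with `statistics.NormalDist` from the standard library instead of reading them from a table, and the variable names are my own. Because the inputs are the rounded figures quoted above, the computed z-score may differ from the hand value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

n = 100
sample_mean = 67.599
sample_sd = sqrt(16.499)  # sample standard deviation, about 4.062
mu0 = 70                  # hypothesized mean under H0: mu >= 70

z_calc = (sample_mean - mu0) / (sample_sd / sqrt(n))

decisions = {}
for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value
    decisions[alpha] = z_calc < z_crit    # True means reject H0
    print(alpha, round(z_crit, 3), decisions[alpha])
```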

Q4: Data: EXCEL files Chp10 CheckAcct.xls. Some samples from two branches of BECU were selected
about the check balance.
1. What is the sample mean and variance for each branch? What is the difference between the two
sample mean?
Answer:
Here are the sample mean and variance for each branch (the df is also included):

Branch          Sample Mean    Sample Variance    df
Cherry Grove    1025           22500.296          27
Beechmont       910.045        15624.141          21

A confidence level was not stated, so I will assume 95% (α = 0.05). So for the difference, we use the following formula:

\[ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

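To find t_{α/2} we must compute the combined df:

\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{1}{n_1 - 1}\left( \frac{s_1^2}{n_1} \right)^2 + \frac{1}{n_2 - 1}\left( \frac{s_2^2}{n_2} \right)^2} = \frac{\left( \frac{22500.296}{28} + \frac{15624.141}{22} \right)^2}{\frac{1}{28 - 1}\left( \frac{22500.296}{28} \right)^2 + \frac{1}{22 - 1}\left( \frac{15624.141}{22} \right)^2} = 47.81 \approx 48 \]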
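
The combined degrees of freedom are easy to mistype by hand, so here is a minimal Python sketch of the same formula. The function name `welch_df` is my own; the sample sizes 28 and 22 follow from the df values of 27 and 21 listed for each branch.

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Combined degrees of freedom for two samples with unequal variances."""
    a = s1_sq / n1  # variance of the first sample mean
    b = s2_sq / n2  # variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Cherry Grove: s^2 = 22500.296, n = 28; Beechmont: s^2 = 15624.141, n = 22.
df = welch_df(22500.296, 28, 15624.141, 22)
print(round(df, 2))
```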

The book’s t table jumps from df = 40 to df = 60. Instead of using the book’s t table, I found one online[a] that included two-tail values for df = 48. This yielded t_{df, α/2} = t_{48, 0.025} = 2.011

Now we go back to the first formula and plug in values:

\[ 1025 - 910.045 \pm 2.011 \sqrt{\frac{22500.296}{28} + \frac{15624.141}{22}} = 114.955 \pm 78.242 \]

The above means that we can be 95% confident that the difference between the population means is between 36.713[b] and 193.197[c].

[a] This online table is located at: http://www.medcalc.org/manual/t-distribution.php
[b] 114.955 - 78.242 = 36.713
[c] 114.955 + 78.242 = 193.197
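
The interval arithmetic can be reproduced as follows. This is a minimal Python sketch in which the critical value 2.011 is entered by hand from the t table cited above, since the Python standard library has no Student's t quantile function; the variable names are my own.

```python
from math import sqrt

mean1, var1, n1 = 1025, 22500.296, 28     # Cherry Grove
mean2, var2, n2 = 910.045, 15624.141, 22  # Beechmont
t_crit = 2.011  # table value for df = 48, two-tailed alpha = 0.05

diff = mean1 - mean2
margin = t_crit * sqrt(var1 / n1 + var2 / n2)  # half-width of the interval
lower, upper = diff - margin, diff + margin
print(round(diff, 3), round(margin, 3), round(lower, 3), round(upper, 3))
```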

2. Use F test to see if their variance is the same?
Answer:
Again, a confidence level was not stated, so I will assume 95% (α = 0.05).

First, let’s state the hypotheses:

\[ H_0 : \sigma_1 = \sigma_2 \quad \Rightarrow \quad H_a : \sigma_1 \neq \sigma_2 \]

Second, calculate the F-value:

\[ F_{calc} = \frac{s_1^2}{s_2^2} = \frac{22500.296}{15624.141} = 1.44 \]

Since most F tables do not have df = 27 for v1, we will use F_{0.05}(30, 21). Note that since this is a two-tailed test, using the 0.05 upper-tail value reduces our confidence level to 90%.

Third, look up F_{30,21} in the table:

F_{30,21} = 2.0102

Lastly, we compare F-values and come to a conclusion:

Since F_{calc} < F_{30,21}, we fail to reject the null hypothesis: there is no significant evidence that the standard deviations differ, so we treat them as equal (σ1 = σ2).

3. If one states “the difference of checking balance between the 2 branches is at least $200”, do you
use t-test or z-test for your test? Why? Is it one-tail or two-tails test? Develop your test at 1%
and 5% level and show your result.
Answer:
First off, we will need to use a t-test because the population standard deviations are unknown. It is a two-tail test where the critical regions fall on the left and right tails; this is because we are interested in whether the difference between the two branches is ±$200.

Here are the hypotheses:

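\[ H_0 : \mu_1 - \mu_2 - 200 = 0 \quad \Rightarrow \quad H_a : \mu_1 - \mu_2 - 200 \neq 0 \]

Now let’s calculate the t-value:

\[ t_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - 200}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(1025 - 910.045) - 200}{\sqrt{\frac{22500.296}{28} + \frac{15624.141}{22}}} = -2.186 \]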

We’re also going to need the combined df, which we already calculated for the first problem. It is ≈ 48.

The book’s t table jumps from df = 40 to df = 60. Instead of using the book’s t table, I found one online[a] that included two-tail values for df = 48.

Comparing against the left-tail critical value from the online table, with α/2 = 0.005 at the 1% level:

tcalc = −2.186 > −t48,0.005 = −2.682

Since tcalc does not fall in the critical region, at the 1% level we fail to reject the null: the data are consistent with a $200 difference of checking balance between the 2 branches.

Now, comparing against the left-tail critical value from the online table, with α/2 = 0.025 at the 5% level:

tcalc = −2.186 < −t48,0.025 = −2.011

Since tcalc falls just inside the left critical region, at the 5% level we reject the null and conclude that the difference of checking balance between the 2 branches is not $200; the negative t-value indicates that it is less than $200.

[a] This online table is located at: http://www.medcalc.org/manual/t-distribution.php
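
The two-tail decision at each level can be double-checked with a few lines of code. This is a minimal Python sketch; the critical values 2.682 and 2.011 are entered by hand as standard two-tailed t-table values for df = 48 (an assumption of this sketch, since the Python standard library has no Student's t quantile function), and the variable names are my own.

```python
from math import sqrt

mean1, var1, n1 = 1025, 22500.296, 28     # Cherry Grove
mean2, var2, n2 = 910.045, 15624.141, 22  # Beechmont
claimed_diff = 200                        # hypothesized difference under H0

t_calc = ((mean1 - mean2) - claimed_diff) / sqrt(var1 / n1 + var2 / n2)

# Two-tailed critical values for df = 48, keyed by significance level.
critical = {0.01: 2.682, 0.05: 2.011}

# Reject H0 when |t_calc| exceeds the critical value.
decisions = {alpha: abs(t_calc) > t_crit for alpha, t_crit in critical.items()}
print(round(t_calc, 3), decisions)
```

The rule in a two-tail test is to reject only when the statistic lies beyond the critical value in either tail.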

Q5: Data: EXCEL files Chp15 HomeValue.xls. Which should be your independent and dependent variables?
Perform your multiple regression analysis.
Answer:
Since it appears that we are trying to predict the score values, the “Score” column will be
our dependent variable. Meanwhile, both the “RecRes” and “Afford” columns will be our
independent variables.

I used the “R” statistics package to generate the following multiple-regression fit
summary:

    Residuals:
        Min      1Q  Median      3Q     Max
    -6.0603 -2.2437 -0.1644  1.6569  7.7508

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  33.4848     3.2281  10.373 9.03e-09 ***
    RecRes        1.8998     0.2603   7.298 1.24e-06 ***
    Afford        2.6108     0.4545   5.745 2.38e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.803 on 17 degrees of freedom
    Multiple R-squared: 0.8067, Adjusted R-squared: 0.784
    F-statistic: 35.47 on 2 and 17 DF, p-value: 8.57e-07

Most importantly from the above multiple-regression analysis results, we can distill
the following linear model formula to predict “Score” values:

y(x1 , x2 ) = 33.48 + 1.9x1 + 2.61x2 (1)

In the above linear model, y = “Score”; x1 = “RecRes”; and x2 = “Afford”

The fit summary (multiple R-squared of 0.8067, with both coefficients significant at the 0.001 level) and some test predictions using formula #1 lead me to conclude that there is a strong linear relationship between the variable “Score” and the variables “RecRes” and “Afford”.

Here are two examples of how I tested the linear model represented by formula #1:

y(8, 2) = 33.48 + 1.9(8) + 2.61(2) = 53.9

y(2, 7) = 33.48 + 1.9(2) + 2.61(7) = 55.55

The above predicted “Score” values are relatively close to the actual “Score” values from
the tabulated dataset.
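
The two test predictions can be reproduced with a small helper. This is a minimal Python sketch; the function name `predict_score` is my own, and it uses the rounded coefficients from formula #1, not the full-precision estimates in the R summary.

```python
def predict_score(rec_res, afford):
    """Predicted Score from formula #1 (rounded coefficients)."""
    return 33.48 + 1.9 * rec_res + 2.61 * afford

# The two example inputs used above: (RecRes, Afford) = (8, 2) and (2, 7).
print(round(predict_score(8, 2), 2))
print(round(predict_score(2, 7), 2))
```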
