Minghui Yang
From: Francisco Almonte
December 19, 2010
1. How many variables are in this data set and their type (nominal, ordinal, interval, ratio)?
Answer:
There are six variables in the data set. Here are their names and types:
[Figure: Frequency Distribution of Sales/Margins/ROE. Bar chart; x-axis: Sales/Margins/ROE (ratings A to E); y-axis: Frequency (0 to 12).]
3. Generate descriptive statistics for EPS rating and explain them (mean, variance, median, mode,
range, skewness, kurtosis, etc.)
Answer:
I generated the following EPS rating descriptive statistics using the “R” statistics package:
mode(s): 58, 76, 80, 84
For lists, the mode is the most common (frequent) value. A list can have more than one
mode. A data set has no mode when all the numbers appear in the data with the same
frequency.
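The definition of the mode above can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the ratings in `sample_ratings` are hypothetical and are not the EPS data used in this report.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, sorted.

    Returns [] when there is no mode, i.e. when several distinct values
    all appear with the same frequency.
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # No mode: more than one distinct value and every value is equally frequent.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical ratings, chosen so that four values tie for most frequent.
sample_ratings = [58, 58, 76, 76, 80, 80, 84, 84, 91, 33]
print(modes(sample_ratings))
print(modes([1, 2, 3]))
```

With the hypothetical ratings the function reports four modes, and for a list in which every value occurs once it reports none.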
4. Use regression analysis to determine if EPS Rating is related to Relative Price Strength
Answer:
See Figure #2 for a plot of “EPS Rating Vs. Relative Price”. “EPS Rating” is the
independent variable, whereas “Relative Price” is the dependent variable. I fitted a linear
regression line through the plotted points. From visual inspection of the fit, there appears
to be little or no linear correlation between the variables. I base this assessment on the
observation that the points do not cluster around the line.
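A visual judgment like this can be backed by numbers. The sketch below fits a least-squares line and computes the Pearson correlation coefficient r; a value of r near 0 would support the reading of the plot, while a value near ±1 would contradict it. The function name `fit_line` and the data pairs are hypothetical, not the values from the data set.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope, intercept and Pearson r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical (EPS Rating, Relative Price) pairs, for illustration only.
eps_rating = [20, 35, 50, 65, 80, 95]
relative_price = [40, 15, 62, 28, 55, 33]

slope, intercept, r = fit_line(eps_rating, relative_price)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} r^2={r * r:.3f}")
```

The squared coefficient r^2 is the share of the variation in the dependent variable that the fitted line explains.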
[Figure #2: EPS Rating Vs. Relative Price. Scatter plot with a fitted linear regression line; x-axis: EPS Rating (20 to 100); y-axis: Relative Price (10 to 70).]
Q2: 20% of applicants for a VISA card are rejected. Write down the probability distribution formula. What
is its expected rejection number and variance (formula and your calculation)?
Answer:
The general probability distribution function for this binomial distribution is:
Now let’s customize this function even further for this problem:
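For reference, the standard binomial forms can be written out as follows. The values are assumptions drawn from context: p = 0.20 comes from the 20% rejection rate in the question, and n = 15 is inferred from the distribution table, which runs from x = 0 to x = 15.

```latex
% General binomial probability function
P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \dots, n

% Specialized to this problem, assuming n = 15 and p = 0.20
P(X = x) = \binom{15}{x} (0.20)^{x} (0.80)^{15 - x}

% Expected number of rejections and variance under the same assumptions
E[X] = np = 15 \times 0.20 = 3, \qquad
\mathrm{Var}(X) = np(1 - p) = 15 \times 0.20 \times 0.80 = 2.4
```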
Also, as I’ve done on previous homework, I will use existing technology to generate a
distribution table I can use to answer any questions:
 x   P(X = x) [a]     x   P(X = x)
 0   0.0352           8   0.0035
 1   0.1319           9   0.0007
 2   0.2309          10   0.0001
 3   0.2501          11   0.0000
 4   0.1876          12   0.0000
 5   0.1032          13   0.0000
 6   0.0430          14   0.0000
 7   0.0138          15   0.0000
[a] Note: I rounded to four decimal places.
The above result means that there is virtually no chance of this happening.
P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
         = 0.0352 + 0.1319 + 0.2309 + 0.2501
         = 0.6481
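The table and the cumulative probability can be regenerated with a short Python sketch. It assumes n = 15 trials (inferred from the table's range, not stated in the question) and p = 0.20; the names `binom_pmf`, `N_APPLICANTS` and `P_REJECT` are illustrative.

```python
from math import comb

N_APPLICANTS = 15   # ASSUMPTION: inferred from the table, which runs x = 0..15
P_REJECT = 0.20     # 20% rejection rate from the question

def binom_pmf(x, n=N_APPLICANTS, p=P_REJECT):
    """Probability of exactly x rejections among n applicants."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Distribution table, rounded to four decimal places.
table = {x: round(binom_pmf(x), 4) for x in range(N_APPLICANTS + 1)}
for x, prob in table.items():
    print(f"{x:2d}  {prob:.4f}")

# P(X < 4) from the rounded table entries and from unrounded terms.
p_lt4_rounded = sum(table[x] for x in range(4))
p_lt4_exact = sum(binom_pmf(x) for x in range(4))
print(f"rounded terms: {p_lt4_rounded:.4f}  unrounded terms: {p_lt4_exact:.4f}")
```

Summing the rounded table entries reproduces 0.6481, while summing the unrounded terms gives a value that rounds to 0.6482; the difference is rounding error only.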
Q3: Data: EXCEL files Chp7 MetAreas.xls. 100 random samples were selected from US and Canada cities
with Overall Rating for each city.
1. What is the sample mean and variance?
Answer:
The sample mean is 67.599 and the sample variance is 16.499.
\[ \sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} = \frac{16.499}{100} = 0.165 \]
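The same arithmetic in Python, as a small check. The sketch also takes the square root to get the standard error of the mean, which is where the value 0.406 used later in this question comes from. Variable names are illustrative.

```python
import math

sample_variance = 16.499  # sample variance reported above
n = 100                   # number of sampled cities

variance_of_mean = sample_variance / n        # variance of the sample mean
standard_error = math.sqrt(variance_of_mean)  # standard error of the mean

print(f"variance of the mean: {variance_of_mean:.3f}")
print(f"standard error:       {standard_error:.3f}")
```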
Since the population standard deviation is unknown, the sample mean follows a Student’s t
distribution.
5. If we want the error to be 1% (plus, minus 1%, total 2%), what should the sample size be?
Answer:
Assuming that the desired confidence level is still 95%, we use the formula:
\[ n = \left( \frac{\sigma_{\bar{x}} \, Z_{95\%}}{0.02} \right)^2 = \left( \frac{0.406 \times 1.96}{0.02} \right)^2 = 1583 \]
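The arithmetic of this formula can be checked with a short sketch. It reproduces the calculation exactly as written in this answer, using the same inputs (0.406, 1.96 and 0.02); the function name `sample_size` is hypothetical.

```python
def sample_size(spread, z, error):
    """Sample size from the (spread * z / error)^2 form used in this answer."""
    return (spread * z / error) ** 2

# Inputs taken from the answer text.
n_required = sample_size(spread=0.406, z=1.96, error=0.02)
print(f"n = {n_required:.2f}, reported as {round(n_required)}")
```

The raw result is slightly above 1583. Rounding up with `math.ceil` instead of rounding to the nearest integer would give 1584, which is the more conservative choice when a sample size must guarantee the stated error.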
6. If one states “the average Rating score for US and Canada cities is at least 70”, develop your null
and alternative test at 1% and 5% level to see if the statement is true?
Answer:
Again, because the book’s Student’s t table does not have values for df = 99, we will have
to calculate the z-score and use the normal probability table.
\[ H_0 : \mu \geq 70 \;\Rightarrow\; H_a : \mu < 70 \]
Now let’s look up the left-tailed critical z-scores for each significance level:
Since \( z_{calc} \) is off the chart to the far left, at both the 1% and 5% significance
levels we can reject the null hypothesis and conclude that “the average Rating score for US
and Canada cities is less than 70”.
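A sketch of this left-tailed z-test in Python, using the sample mean (67.599) and the standard error (0.406) reported earlier in this question. The critical values are computed with the standard library's `statistics.NormalDist` rather than read from a printed table; variable names are illustrative.

```python
from statistics import NormalDist

sample_mean = 67.599   # from part 1 of this question
claimed_mean = 70.0    # value stated in the null hypothesis
standard_error = 0.406 # from the sample-size calculation above

z_calc = (sample_mean - claimed_mean) / standard_error

# Left-tailed test: reject H0 when z_calc falls below the critical value.
rejects = {}
for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(alpha)
    rejects[alpha] = z_calc < z_crit
    print(f"alpha={alpha}: z_crit={z_crit:.3f}, z_calc={z_calc:.3f}, "
          f"reject H0: {rejects[alpha]}")
```

The computed statistic is roughly -5.9, far below both critical values (about -2.33 and -1.64), which matches the "off the chart" observation.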
Q4: Data: EXCEL files Chp10 CheckAcct.xls. Samples of checking-account balances were selected from two
branches of BECU.
1. What are the sample mean and variance for each branch? What is the difference between the two
sample means?
Answer:
Here are the sample mean and variance for each branch (the df is also included):
= 47.81 ≈ 48
The book’s t table jumps from df = 40 to df = 60. Instead of using the book’s t
table, I found one online [a] that included two-tail values for df = 48. This yielded:
\[ t_{n-1,\alpha/2} = t_{48,0.025} = 2.011 \]
The above means that we can be 95% confident that the difference between the population
means lies between 36.713 [b] and 193.197 [c].
[a] This online table is located at: http://www.medcalc.org/manual/t-distribution.php
[b] 114.955 - 78.242 = 36.713
[c] 114.955 + 78.242 = 193.197
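The degrees of freedom and the interval above can be reproduced with the unequal-variance (Welch-Satterthwaite) approximation. This is a sketch under stated assumptions: the sample variances are the ones quoted in the F test of the next part, and the sample sizes n1 = 28 and n2 = 22 are inferred from the degrees of freedom 27 and 21 mentioned there. They are not stated directly in this report.

```python
import math

s1_sq, s2_sq = 22500.296, 15624.141  # sample variances quoted in the F test
n1, n2 = 28, 22                      # ASSUMPTION: inferred from df = 27 and df = 21
mean_diff = 114.955                  # difference between the two sample means
t_crit = 2.011                       # two-tail t value for df = 48 at 95%

a, b = s1_sq / n1, s2_sq / n2
std_err = math.sqrt(a + b)           # standard error of the difference

# Welch-Satterthwaite approximation for the combined degrees of freedom.
welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

margin = t_crit * std_err
lower, upper = mean_diff - margin, mean_diff + margin

print(f"df = {welch_df:.2f}")
print(f"margin = {margin:.3f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the sketch agrees with the reported df of 47.81, the margin of 78.242 and the interval endpoints.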
2. Use F test to see if their variance is the same?
Answer:
Again, a confidence level was not stated, so I will assume 95% (α = 0.05).
\[ H_0 : \sigma_1 = \sigma_2 \;\Rightarrow\; H_a : \sigma_1 \neq \sigma_2 \]
\[ F_{calc} = \frac{s_1^2}{s_2^2} = \frac{22500.296}{15624.141} = 1.44 \]
Since most F tables do not have df = 27 for v1, we will use \( F_{0.05}(30, 21) \). Note that
since this is a two-tailed test, our confidence level will be reduced to 90%.
\[ F_{30,21} = 2.0102 \]
Since \( F_{calc} < F_{30,21} \), we fail to reject the null hypothesis: the samples give no
evidence that the standard deviations differ, so we proceed as if they are equal (σ1 = σ2).
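The F comparison is a single ratio and one inequality, sketched below. The critical value is the tabulated F(30, 21) figure quoted in this answer, not one computed by the code; variable names are illustrative.

```python
s1_sq = 22500.296  # larger sample variance (numerator)
s2_sq = 15624.141  # smaller sample variance (denominator)
f_crit = 2.0102    # tabulated F(30, 21) value quoted in the answer

f_calc = s1_sq / s2_sq
reject_equal_variances = f_calc > f_crit

print(f"F_calc = {f_calc:.2f}, F_crit = {f_crit}")
print("reject H0" if reject_equal_variances else "fail to reject H0")
```

Placing the larger variance in the numerator keeps the ratio at or above 1, so only the upper critical value needs to be checked.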
3. If one states “the difference of checking balance between the 2 branches is at least $200”, do you
use t-test or z-test for your test? Why? Is it one-tail or two-tails test? Develop your test at 1%
and 5% level and show your result.
Answer:
First off, we will need to use a t-test because the population standard deviations are
unknown. It is a two-tail test where the critical regions fall on the left and right tails;
this is because we are interested in whether the difference from one account to the other is
±$200.
\[ H_0 : \mu_1 - \mu_2 - 200 = 0 \;\Rightarrow\; H_a : \mu_1 - \mu_2 - 200 \neq 0 \]
We’re also going to need the combined df, which we already calculated for the first
problem. It is ≈ 48.
The book’s t table jumps from df = 40 to df = 60. Instead of using the book’s t
table, I found one online [a] that included two-tail values for df = 48.
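The standard error of the difference is 78.242 / 2.011 ≈ 38.907, so the test statistic is
t_calc = (114.955 − 200) / 38.907 ≈ −2.186.
Comparing the left tail according to the online table, and since α/2 = 0.005 at 1%, the
critical value for df = 48 is −2.682, and t_calc does not fall below it.
According to this comparison, at a 1% level, we fail to reject the null: the data do not
rule out that “the difference of checking balance between the 2 branches is $200”.
Now, comparing the left tail according to the online table, and since α/2 = 0.025 at
5%, the critical value is −2.011, and t_calc falls just below it.
According to this comparison, at a 5% level, we just barely reject the null and conclude
that: “the difference of checking balance between the 2 branches is less than $200”.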
[a] This online table is located at: http://www.medcalc.org/manual/t-distribution.php
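A sketch of the comparison at both levels, using only figures quoted in this report plus one assumption. The standard error is recovered from the earlier margin (78.242) and t value (2.011). The two-tail critical value 2.682 for df = 48 at the 1% level is an assumption taken from standard t tables; it is not quoted in this report.

```python
mean_diff = 114.955          # difference between the two sample means
claimed_diff = 200.0         # hypothesized difference
std_err = 78.242 / 2.011     # standard error recovered from the earlier interval

t_calc = (mean_diff - claimed_diff) / std_err

# Two-tail critical values for df = 48.
# 2.011 is quoted in the text; 2.682 is an ASSUMPTION from standard t tables.
critical = {0.05: 2.011, 0.01: 2.682}

# Two-tail test: reject H0 when |t_calc| exceeds the critical value.
decisions = {alpha: abs(t_calc) > t_crit for alpha, t_crit in critical.items()}

print(f"t_calc = {t_calc:.3f}")
for alpha, reject in decisions.items():
    print(f"alpha={alpha}: critical=±{critical[alpha]}, reject H0: {reject}")
```

Because the 1% critical value is always further from zero than the 5% one, a statistic that is rejected at 1% must also be rejected at 5%; the printed decisions make that ordering easy to verify.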
Q5: Data: EXCEL files Chp15 HomeValue.xls. Which should be your independent and dependent variables?
Perform your multiple regression analysis.
Answer:
Since it appears that we are trying to predict the score values, the “Score” column will be
our dependent variable. Meanwhile, both the “RecRes” and “Afford” columns will be our
independent variables.
I used the “R” statistics package to generate the following multiple-regression fit
summary:
Residuals:
    Min      1Q  Median      3Q     Max
-6.0603 -2.2437 -0.1644  1.6569  7.7508

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  33.4848     3.2281  10.373 9.03e-09 ***
RecRes        1.8998     0.2603   7.298 1.24e-06 ***
Afford        2.6108     0.4545   5.745 2.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
After using formula #1 to predict some “Score” values, I concluded that there is a
strong linear correlation between the variable “Score” and the variables “RecRes” and “Afford”.
Here are two examples of how I tested the linear model represented by formula #1:
The above predicted “Score” values are relatively close to the actual “Score” values from
the tabulated dataset.
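The prediction step can be sketched as a small function. It assumes that formula #1 is the fitted equation implied by the coefficient table, Score = 33.4848 + 1.8998 × RecRes + 2.6108 × Afford. The function name `predict_score` and the input values are hypothetical and are not rows from the data set.

```python
# Coefficients from the regression summary above.
INTERCEPT = 33.4848
B_RECRES = 1.8998
B_AFFORD = 2.6108

def predict_score(rec_res, afford):
    """Predicted Score from the fitted linear model (assumed form of formula #1)."""
    return INTERCEPT + B_RECRES * rec_res + B_AFFORD * afford

# Hypothetical inputs, for illustration only.
example = predict_score(rec_res=10, afford=5)
print(f"predicted Score: {example:.4f}")
```

Comparing such predictions with the actual “Score” values row by row gives the residuals, whose range is summarized in the “Residuals” section of the fit summary.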