Face        1   2   3   4   5   6
Frequency  22  21  22  27  22  36
Of course, if the die is fair, the probability of each face should be the same, namely 1/6. In
150 rolls you would therefore expect each face to appear about 25 times. Yet the 6 appears 36
times. Is this coincidence, or does it indicate a biased die?
The key to answering this question is to look at how far the data are from what we would
expect for a fair die. If we call $f_i$ the observed frequency of category $i$, and $e_i$ the
expected count of category $i$, then the $\chi^2$ test statistic is defined to be
$$\chi^2 = \sum_{i=1}^{n} \frac{(f_i - e_i)^2}{e_i},$$
where $n$ is the number of categories.
Statistical inference is based on the assumption that none of the expected counts is smaller
than 1 and most (80%) are bigger than 5. To perform the analysis in R, we proceed as
follows:
> freq=c(22,21,22,27,22,36)
> probs=c(1,1,1,1,1,1)/6
> chisq.test(freq,p=probs)
Chi-squared test for given probabilities
data: freq
X-squared = 6.72, df = 5, p-value = 0.2423
As we see from the output, the chi-squared test statistic is 6.72. On 5 degrees of freedom and
at the 5% significance level this is not statistically significant as indicated by the p-value of
0.2423.
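The figures reported by chisq.test can be reproduced directly from the definition of the statistic. The following is a minimal sketch using the die counts above; the variable names (expected, stat, p_value, assumption_ok) are illustrative and not part of the original solution.

```r
# Observed counts for faces 1..6 and the fair-die probabilities
freq <- c(22, 21, 22, 27, 22, 36)
probs <- rep(1 / 6, 6)

# Expected counts under the null hypothesis: 150 * 1/6 = 25 per face
expected <- sum(freq) * probs

# Chi-squared statistic: sum over categories of (f_i - e_i)^2 / e_i
stat <- sum((freq - expected)^2 / expected)

# Degrees of freedom: number of categories minus one
df <- length(freq) - 1

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Rule of thumb from the text: no expected count below 1, at least 80% above 5
assumption_ok <- all(expected >= 1) && mean(expected > 5) >= 0.8

print(c(statistic = stat, df = df, p.value = p_value))
print(assumption_ok)
```

The hand-computed statistic and p-value should agree with the X-squared and p-value lines printed by chisq.test.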
(a) The letter distribution of the 5 most popular letters in the English language is known to be
approximately
Letter      E   T   N   R   O
Frequency  29  21  17  17  16
That is, when E, T, N, R, O appear, on average 29 times out of 100 it is an E and not the other
4. This information can be useful in breaking codes where the letters of a message are
jumbled up. To break the code, it is useful to first know if the English language was used
before the letters were jumbled!
Suppose the following distribution of the above 5 letters is found in a particular code:
Letter       E    T    N    R    O
Frequency  100  110   80   55   14
Do a chi-squared goodness of fit test using R to see if the letter proportions for this text are
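A goodness-of-fit test for the observed letter counts against the English proportions could be set up along the following lines. This is a sketch only: the object names observed, english and letters_test are illustrative, and the 5% significance level is assumed rather than stated in the question.

```r
# Observed counts of the five letters in the coded text
observed <- c(E = 100, T = 110, N = 80, R = 55, O = 14)

# Known English proportions among these five letters (they sum to 1)
english <- c(E = 29, T = 21, N = 17, R = 17, O = 16) / 100

# Chi-squared goodness-of-fit test against the English proportions
letters_test <- chisq.test(observed, p = english)
print(letters_test)

# Expected counts, to check the rule of thumb on small expected counts
print(letters_test$expected)

# Critical value at the assumed 5% level, on 5 - 1 = 4 degrees of freedom
print(qchisq(0.95, df = 4))
```

If the printed statistic exceeds the critical value, the observed letter proportions are not consistent with the English proportions at that level.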
Student  Beers  BAL
1        5      0.10
2        2      0.03
3        9      0.19
4        8      0.12
5        3      0.04
6        7      0.095
7        3      0.07
8        5      0.06
9        3      0.02
10       5      0.05
Question Three
(a) Using the R concatenate function for creating vectors c(), create a vector x that contains
the values (3,1,4,5,9,3). Also, create a vector y that contains the values (2,7,1,8,2,8).
(b) Enter an expression to evaluate x + y without storing x + y.
(c) Create a vector z that contains the sum of x and y.
(d) Remove the vector z (type ?rm to find out about the remove function). Then create a new
vector z containing the values (1,4,1). Enter expressions to evaluate x + z and y + z . This
illustrates the recycling rule adopted by R when adding vectors of different lengths.
Question Four
(a) Create a matrix x equal to
$$\begin{pmatrix} 3 & 1 \\ 4 & 5 \end{pmatrix}$$
and a matrix y equal to
$$\begin{pmatrix} 2 & 7 \\ 1 & 2 \end{pmatrix}.$$
Question Five
(a) Read the R help on the function seq. seq is a helpful function for creating vectors with
entries which form regular sequences. Once you have read the help, try to do the
following:
i. Create a vector x containing the integers from 1 to 100.
ii. Create a vector x of length 20 where the first entry is 1 and the difference between
successive entries is 0.1.
iii. Create a vector x where the first entry is 1, the last entry is 2, x has length 6 and the
entries of x are equally spaced.
(b) Read the R help on the function rep. rep is a helpful function for creating vectors with
repetitious entries. Once you have read the help, try to do the following:
i. Create a vector x containing the values (3,1,4,1,5). Using rep, create a new vector
y of length 10 in which each entry of x is repeated twice.
ii. Create a vector z where the whole of x is repeated twice (that is, we have two
copies of x stacked one on top of each other).
iii. Create a vector z where the first three entries of x are repeated four times, and the
remaining entries are repeated twice.
Question Six
The ocean swell produces spectacular eruptions of water through a hole in the cliff at Kiama
Blowhole, about 120km south of Sydney. The durations of 65 successive eruptions of the
blowhole, starting at 1.40pm on 12 July 1998, were recorded. The first twenty of these
observations are shown in the table below.
(a) Enter the durations in R as a vector kiama using the concatenate function c().
(b) Produce a summary of the data by typing summary(kiama).
(c) Produce a histogram of the data by typing hist(kiama). Read the R help to find out more
about options to control the appearance of the histogram: type ?hist.
(d) Produce a boxplot of the data by typing boxplot(kiama). Read the R help to learn more
about options to control the appearance of the boxplot: type ?boxplot.
(e) Produce a scatterplot of the eruption durations against the order of observation using the
plot() function.
Observation  Duration    Observation  Duration
Number                   Number
1            83          11           18
2            51          12           16
3            87          13           29
4            60          14           54
5            28          15           91
6            95          16           8
7            8           17           17
8            27          18           55
9            15          19           10
10           10          20           35
Question Seven
The table below shows the number of deaths caused by firearms in Australia over a number
of years expressed as a rate per 100,000 of population.
Year  Rate
1983  4.31
1984  4.42
1985  4.52
1986  4.35
1987  4.39
1988  4.21
1989  3.40
1990  3.61
1991  3.67
1992  3.61
1993  2.98
1994  2.95
1995  2.72
1996  2.96
1997  2.30
Question Eight
Write a function in R which returns a loan repayment schedule. Your function should take as
inputs: the interest rate (nominal per annum), the loan principal, the compounding frequency
(per annum), the loan term (in years) and the repayment frequency (number of repayments
per annum). Your function should return a table with one row for each repayment. Each row
should contain: the time period (which will be in years divided by the repayment frequency),
the loan still outstanding at the beginning of the time period, the capital repaid during the
time period, the interest paid during the time period and the loan still outstanding at the end
of the time period.
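One possible shape for such a function is sketched below. It assumes level repayments made in arrears and a strictly positive interest rate, neither of which the question states; the name loan_schedule and its argument names are illustrative only. The nominal rate is first converted to an effective rate per repayment period, so the compounding and repayment frequencies may differ.

```r
# Sketch of a loan repayment schedule (assumes level repayments in arrears, rate > 0)
loan_schedule <- function(rate, principal, comp_freq, term_years, pay_freq) {
  n <- term_years * pay_freq                      # total number of repayments
  # Effective interest rate per repayment period
  j <- (1 + rate / comp_freq)^(comp_freq / pay_freq) - 1
  # Level repayment that clears the loan after n periods
  payment <- principal * j / (1 - (1 + j)^(-n))

  opening <- interest <- capital <- closing <- numeric(n)
  balance <- principal
  for (t in seq_len(n)) {
    opening[t]  <- balance                        # loan outstanding at start of period
    interest[t] <- balance * j                    # interest paid during the period
    capital[t]  <- payment - interest[t]          # capital repaid during the period
    balance     <- balance - capital[t]
    closing[t]  <- balance                        # loan outstanding at end of period
  }
  data.frame(time = seq_len(n) / pay_freq, opening, capital, interest, closing)
}

# Example: 6% nominal, compounded monthly, 10000 over 2 years, monthly repayments
schedule <- loan_schedule(0.06, 10000, 12, 2, 12)
print(head(schedule, 3))
```

A quick sanity check is that the final closing balance is zero (up to rounding) and that the capital column sums to the principal.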
Question Nine
Recall the back-shift function from time series: $BY_t = Y_{t-1}$ and $B^d Y_t = Y_{t-d}$. Write a backshift
function in R that takes a vector as its argument and returns the once backshifted version of
the vector. Check that your function works. Can you write the function so that it does not
include loops?
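A loop-free version can be written with vector indexing. The sketch below is one way to do it, under the assumption that the first element of the result, which has no predecessor, should be NA; the function name backshift is illustrative.

```r
# Once-backshifted version of a vector: element t of the result is y[t - 1]
backshift <- function(y) {
  n <- length(y)
  if (n == 0) return(y)      # nothing to shift
  c(NA, y[-n])               # drop the last value and pad the front with NA
}

# Check on a short series
y <- c(5, 7, 9, 11)
print(backshift(y))
```

The result has the same length as the input, which makes it convenient to place alongside the original series.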
Question Ten
Below is shown a function that when run will simulate values from a particular beta
distribution. Study it and then answer the questions which follow.
betasim=function(nsims) {
  # x and y must already exist in the global environment, e.g. x=numeric(0); y=numeric(0)
  for(i in 1:nsims) {
    repeat{
      temp1<<-runif(1)
      temp2<<-runif(1)
      y[i]<<-temp2
      if(temp2<dbeta(temp1,2,2)/1.5) {x[i]<<-temp1;break}
    }
  }
}
(a) What are the parameters of the beta distribution that the above function simulates values
from?
(b) Modify the function betasim so that the user of the function inputs the values of the two
parameters as well as the number of simulations required.
Your function should take x and the vector of polynomial coefficients as arguments and it
should return the value of the evaluated polynomial. Call this function directpoly.
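A direct evaluation could look like the sketch below. The ordering of the coefficient vector is not given in the surviving text, so the sketch assumes coefficients are supplied in increasing order of power (constant term first); the argument name coef is also an assumption.

```r
# Direct polynomial evaluation.
# Assumption: coef = c(c0, c1, c2, ...) represents c0 + c1*x + c2*x^2 + ...
directpoly <- function(x, coef) {
  powers <- seq_along(coef) - 1
  # Evaluate the polynomial separately at each element of x
  vapply(x, function(xi) sum(coef * xi^powers), numeric(1))
}

# 1 + 2x + 3x^2 evaluated at x = 2
print(directpoly(2, c(1, 2, 3)))
```

Because each power of x is computed from scratch, this direct approach does more arithmetic than a nested (Horner-style) evaluation would.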
(a)
> beers=c(5,2,9,8,3,7,3,5,3,5)
> bal=c(.1,.03,.19,.12,.04,.095,.07,.06,.02,.05)
> plot(beers,bal,main="Plot of BAL against number of beers")
> junk=lm(bal~beers)
> junk
Call:
lm(formula = bal ~ beers)
Coefficients:
(Intercept)        beers
    -0.0185       0.0192
> summary(junk)
Call:
lm(formula = bal ~ beers)
Residuals:
    Min      1Q  Median      3Q     Max
-0.0275 -0.0187 -0.0071  0.0194  0.0357
Coefficients:
$$t = \frac{0.0192 - 0.02}{0.003511} = -0.22786.$$
This is clearly not statistically significant. Note the degrees of freedom for the t-test here.
They are 8 - this is equal to the error degrees of freedom.
Note that R can easily supply us with the critical value associated with the t distribution. For
a one-sided test in the negative direction at the 5% significance level, the critical
value is the value on the horizontal axis that has 5% of probability to the left of it.
> qt(0.05,df=8)
[1] -1.859548
$$t = \frac{-0.0185 - 0}{0.01923} = -0.96204.$$
This is clearly not statistically significant. Note the degrees of freedom for the t-test here.
They are 8 - this is equal to the error degrees of freedom.
Note that R can easily supply us with the rejection region associated with this t-test. For a
two-sided test at the 5% significance level, the critical value is the value on the horizontal
axis that has 2.5% of probability to the right of it.
> qt(0.975,df=8)
[1] 2.306004
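Both t statistics can be recomputed from the fitted model rather than by hand. The sketch below refits the regression from the beers and BAL data; the hypothesised slope of 0.02 is inferred from the arithmetic of the first statistic (the statement of the hypothesis is missing from the text), and the object names are illustrative.

```r
# Data from the BAL question
beers <- c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5)
bal <- c(0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.05)

fit <- lm(bal ~ beers)
est <- coef(summary(fit))        # matrix of estimates and standard errors
df_error <- df.residual(fit)     # error degrees of freedom (n - 2)

# Slope tested against an assumed hypothesised value of 0.02 (one-sided, lower tail)
t_slope <- (est["beers", "Estimate"] - 0.02) / est["beers", "Std. Error"]

# Intercept tested against 0 (two-sided)
t_intercept <- est["(Intercept)", "Estimate"] / est["(Intercept)", "Std. Error"]

# Critical values and the resulting decisions at the 5% level
reject_slope <- t_slope < qt(0.05, df = df_error)
reject_intercept <- abs(t_intercept) > qt(0.975, df = df_error)

print(c(t_slope = t_slope, t_intercept = t_intercept, df = df_error))
print(c(reject_slope = reject_slope, reject_intercept = reject_intercept))
```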
Solution to Question Three
(a)
> x=c(3,1,4,5,9,3)
> y=c(2,7,1,8,2,8)
(b)
> x+y
[1]  5  8  5 13 11 11
(c)
> z=x+y
> z
[1]  5  8  5 13 11 11
(d)
> rm(z)
> z
Error: object "z" not found
> z=c(1,4,1)
> x+z
[1] 4 5 5 6 13 4
> y+z
[1] 3 11 2 9 6 9
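Solution to Question Four
(a)
> x=matrix(c(3,1,4,5),nrow=2,ncol=2)
> x
     [,1] [,2]
[1,]    3    4
[2,]    1    5
R fills a matrix column by column, so this first attempt gives the transpose of the required
matrix. Supplying the entries in column order gives the correct result:
> x=matrix(c(3,4,1,5),nrow=2,ncol=2)
> x
     [,1] [,2]
[1,]    3    1
[2,]    4    5
> y=matrix(c(2,1,7,2),nrow=2,ncol=2)
> y
     [,1] [,2]
[1,]    2    7
[2,]    1    2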
(c)
> x*y
     [,1] [,2]
[1,]    6    7
[2,]    4   10
(d)
> x%*%y
     [,1] [,2]
[1,]    7   23
[2,]   13   38
(e)
> z=t(x)
> z
     [,1] [,2]
[1,]    3    4
[2,]    1    5
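An alternative way to enter these matrices is to type the entries row by row and pass byrow=TRUE to matrix(), which avoids having to reorder the values into column order. The sketch below is an illustration, not part of the original solution; the names elementwise and product are illustrative.

```r
# Enter each matrix row by row, as it is written on the page
x <- matrix(c(3, 1, 4, 5), nrow = 2, byrow = TRUE)
y <- matrix(c(2, 7, 1, 2), nrow = 2, byrow = TRUE)

elementwise <- x * y     # entry-by-entry product
product <- x %*% y       # matrix product

print(x)
print(elementwise)
print(product)
```

The operator * multiplies matching entries, whereas %*% forms the usual row-by-column matrix product, so the two results generally differ.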
Solution to Question Five
(a) (i)
> x=seq(1,100,1)
> x
 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
[19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
[37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
[55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
[73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
[91]  91  92  93  94  95  96  97  98  99 100
(ii)
> x=seq(from=1,by=0.1,length.out=20)
> x
(iii)
> x=seq(from=1,to=2,length.out=6)
> x
[1] 1.0 1.2 1.4 1.6 1.8 2.0
Note that the even spacing is automatically generated by the seq function.
(b)
(i)
> ?rep
> x=c(3,1,4,1,5)
> y=rep(x,c(2,2,2,2,2))
> y
[1] 3 3 1 1 4 4 1 1 5 5
(ii)
> z=rep(x,2)
> z
[1] 3 1 4 1 5 3 1 4 1 5
(iii)
> z=rep(x,c(rep(4,3),rep(2,2)))
> z
[1] 3 3 3 3 1 1 1 1 4 4 4 4 1 1 5 5
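The same three vectors can also be produced with the named arguments each and times of rep, which some readers find easier to read. This is an alternative sketch, not the original solution; the name w for the third vector is illustrative.

```r
x <- c(3, 1, 4, 1, 5)

# Each entry of x repeated twice
y <- rep(x, each = 2)

# The whole of x repeated twice
z <- rep(x, times = 2)

# First three entries repeated four times, remaining entries twice
w <- rep(x, times = c(4, 4, 4, 2, 2))

print(y)
print(z)
print(w)
```

When times is a vector of the same length as x, each element of x is repeated the corresponding number of times.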
Solution to Question Six
(a)
> kiama=c(83,51,87,60,28,95,8,27,15,10,18,16,29,54,91,8,17,55,10,35)
(b)
> summary(kiama)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8.00   15.75   28.50   39.85   56.25   95.00
(c)
> hist(kiama,main="Histogram of the Kiama Data")
(d)
> boxplot(kiama,main="Boxplot of the Kiama Data")
(e)
> plot(kiama,main="Scatterplot of the Kiama Data")
Solution to Question Seven
(a)
> q7=read.csv("addl unit one q7.csv",header=T)
> attach(q7)
(b)
> plot(Year,Rate, main="Scatterplot of Rate against Year")
(c)
> junk=lm(Rate~Year)
> junk
Call:
lm(formula = Rate ~ Year)
Coefficients:
(Intercept)         Year
   306.3199      -0.1521
> summary(junk)
Call:
lm(formula = Rate ~ Year)
Residuals:
     Min       1Q   Median       3Q      Max
-0.38142 -0.16824 -0.01667  0.22071  0.30701
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 306.3199    29.8444   10.26 1.33e-07 ***
Year         -0.1521     0.0150  -10.14 1.53e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(d)
> plot(junk)
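For readers without the CSV file, the analysis can be reproduced by typing the firearm death rates from the question straight into a data frame. The sketch below does this; the object names q7 and fit are illustrative, and the plotting calls are left commented out so the block runs without a graphics device.

```r
# Firearm death rates per 100,000 of population, 1983 to 1997
q7 <- data.frame(
  Year = 1983:1997,
  Rate = c(4.31, 4.42, 4.52, 4.35, 4.39, 4.21, 3.40, 3.61,
           3.67, 3.61, 2.98, 2.95, 2.72, 2.96, 2.30)
)

# Simple linear regression of Rate on Year, using the data argument
# instead of attaching the data frame
fit <- lm(Rate ~ Year, data = q7)
print(coef(fit))

# Scatterplot with the fitted line, and the standard diagnostic plots:
# plot(Rate ~ Year, data = q7, main = "Scatterplot of Rate against Year")
# abline(fit)
# plot(fit)
```

The printed coefficients should match the intercept and slope shown in the regression output.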
Solution to Question Ten
(a) $(\alpha, \beta) = (2, 2)$.
(c) Use the command betasim(5,10,50000). The output is stored in a vector x. Find the mean
of x. It should be close to 1/3.
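The call betasim(5,10,50000) presupposes a version of the function that takes the two shape parameters followed by the number of simulations. A minimal sketch of such a version follows; it is not the original solution. It assumes both parameters exceed 1 so that the density has an interior mode, uses the density at that mode as the envelope constant in place of the hard-coded 1.5, and returns the simulated values instead of writing to a global vector x.

```r
# Rejection sampler for a beta(a, b) distribution (sketch; assumes a > 1 and b > 1)
betasim <- function(a, b, nsims) {
  stopifnot(a > 1, b > 1)
  mode <- (a - 1) / (a + b - 2)      # location of the density maximum
  M <- dbeta(mode, a, b)             # maximum density, used as the envelope height
  x <- numeric(nsims)
  for (i in seq_len(nsims)) {
    repeat {
      cand <- runif(1)               # candidate value on (0, 1)
      u <- runif(1)                  # uniform draw for the accept/reject step
      if (u < dbeta(cand, a, b) / M) {
        x[i] <- cand
        break
      }
    }
  }
  x
}

set.seed(1)
x <- betasim(5, 10, 5000)
print(mean(x))                       # theoretical mean is a / (a + b) = 1/3
```

With a = b = 2 the envelope height M equals 1.5, so the sketch reduces to the acceptance rule used in the function given in the question.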