
ACST357/862 General Insurance Pricing and Reserving

ADDITIONAL EXERCISES INVOLVING R


Question One
The aim of this question is to demonstrate how we can use R to perform the chi-squared test
for assessing whether observed and expected outcomes are compatible.
Recall from earlier Statistics units, that a goodness of fit test can be used to check whether
observed data come from a specified population. The chi-squared test is a specific type of
goodness of fit test that allows one to test if categorical data correspond to a model where the
data are chosen from the categories according to some specified set of probabilities.
We give a simple example of this here that serves two purposes:
1. To remind you of the chi-squared test and how it works;
2. To illustrate to you how the R software can be used to perform chi-squared tests.
If we toss a die 150 times and find that we observe the following distribution of rolls, is the
die fair?
Face               1    2    3    4    5    6
Number of Rolls   22   21   22   27   22   36

Of course, if the die is fair, the probability of each face should be the same or 1/6. In 150 rolls
you would therefore expect each face to have about 25 appearances. Yet the 6 appears 36
times. Is this coincidence or perhaps indicative of a biased die?
The key to answering this question is to look at how far off the data are from what we would
expect for an even die. If we call $f_i$ the frequency of category $i$, and $e_i$ the expected count of
category $i$, then the $\chi^2$ test statistic is defined to be

$$\chi^2_{n-1} = \sum_{i=1}^{n} \frac{(f_i - e_i)^2}{e_i}.$$

Statistical inference is based on the assumption that none of the expected counts is smaller
than 1 and most (80%) are bigger than 5. To perform the analysis in R, we proceed as
follows:
> freq=c(22,21,22,27,22,36)
> probs=c(1,1,1,1,1,1)/6
> chisq.test(freq,p=probs)
Chi-squared test for given probabilities
data: freq
X-squared = 6.72, df = 5, p-value = 0.2423

As we see from the output, the chi-squared test statistic is 6.72. On 5 degrees of freedom and
at the 5% significance level this is not statistically significant as indicated by the p-value of
0.2423.
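
The statistic reported by chisq.test can be reproduced directly from the definition. The following is a minimal sketch using the die data above; the names expected, stat and pval are illustrative choices and are not part of the original exercise.

```r
# Observed counts for faces 1 to 6 (150 rolls in total)
freq <- c(22, 21, 22, 27, 22, 36)
probs <- rep(1/6, 6)

# Expected counts under a fair die: 25 for each face
expected <- sum(freq) * probs

# Chi-squared statistic: sum over categories of (f_i - e_i)^2 / e_i
stat <- sum((freq - expected)^2 / expected)

# Upper-tail p-value on (number of categories - 1) degrees of freedom
pval <- pchisq(stat, df = length(freq) - 1, lower.tail = FALSE)

stat   # 6.72
pval   # approximately 0.2423
```

Both values agree with the chisq.test output, which is a useful check that the test has been set up as intended.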
(a) The letter distribution of the 5 most popular letters in the English language is known to be
approximately
Letter       E    T    N    R    O
Frequency   29   21   17   17   16

That is, when E, T, N, R, O appear, on average 29 times out of 100 it is an E and not the other
4. This information can be useful in breaking codes where the letters of a message are
jumbled up. To break the code, it is useful to first know if the English language was used
before the letters were jumbled!
Suppose the following distribution of the above 5 letters is found in a particular code:
Letter        E    T    N    R    O
Frequency   100  110   80   55   14

Do a chi-squared goodness of fit test using R to see if the letter proportions for this text are

E = 0.29, T = 0.21, N = 0.17, R = 0.17, O = 0.16.


Question Two
It is well known that the more beer you drink, the more your blood alcohol level (BAL) rises.
Suppose we have the following data on student beer consumption:

Student   1     2     3     4     5     6      7     8     9     10
Beers     5     2     9     8     3     7      3     5     3     5
BAL       0.10  0.03  0.19  0.12  0.04  0.095  0.07  0.06  0.02  0.05

Use R to help answer the following questions:


(a) Make a scatterplot and fit the data with a regression line including a slope and intercept.
(b) Test the hypothesis that another beer raises your BAL by 0.02 percent against the
alternative that the increase in BAL is less than this.
(c) Perform a hypothesis test to see if the intercept is zero. Use a two-sided alternative
hypothesis.


Question Three
(a) Using the R concatenate function for creating vectors c(), create a vector x that contains
the values (3,1,4,5,9,3). Also, create a vector y that contains the values (2,7,1,8,2,8).
(b) Enter an expression to evaluate x + y without storing x + y.
(c) Create a vector z that contains the sum of x and y.
(d) Remove the vector z (type ?rm to find out about the remove function). Then create a new
vector z containing the values (1,4,1). Enter expressions to evaluate x + z and y + z . This
illustrates the recycling rule adopted by R when adding vectors of different lengths.

Question Four

(a) Create a matrix x equal to

$$\begin{pmatrix} 3 & 1 \\ 4 & 5 \end{pmatrix}$$

and a matrix y equal to

$$\begin{pmatrix} 2 & 7 \\ 1 & 2 \end{pmatrix}.$$

(b) Use R to evaluate the sum of x and y.


(c) Use R to evaluate an element by element product of x and y.
(d) Use R to evaluate the matrix product of x and y.
(e) Store the transpose of x in the matrix z. Use the Help to find a suitable function for this.
Question Five

(a) Read the R help on the function seq. seq is a helpful function for creating vectors with
entries which form regular sequences. Once you have read the help, try to do the
following:
i. Create a vector x containing the integers from 1 to 100.
ii. Create a vector x of length 20 where the first entry is 1 and the difference between
successive entries is 0.1.
iii. Create a vector x where the first entry is 1, the last entry is 2, x has length 6 and the
entries of x are equally spaced.
(b) Read the R help on the function rep. rep is a helpful function for creating vectors with
repetitious entries. Once you have read the help, try to do the following:
i. Create a vector x containing the values (3,1,4,1,5). Using rep, create a new vector
y of length 10 in which each entry of x is repeated twice.
ii. Create a vector z where the whole of x is repeated twice (that is, we have two
copies of x stacked one on top of each other).
iii. Create a vector z where the first three entries of x are repeated four times, and the
remaining entries are repeated twice.



Question Six

The ocean swell produces spectacular eruptions of water through a hole in the cliff at Kiama
Blowhole, about 120km south of Sydney. The duration of 65 successive eruptions of the
blowhole starting at 1.40pm on 12 July, 1998 were recorded. The first twenty of these
observations are shown in the table below.
(a) Enter the durations in R as a vector kiama using the concatenate function c().
(b) Produce a summary of the data by typing summary(kiama).
(c) Produce a histogram of the data by typing hist(kiama). Read the R help to find out more
about options to control the appearance of the histogram (type ?hist).
(d) Produce a boxplot of the data by typing boxplot(kiama). Read the R help to learn more
about options to control the appearance of the boxplot (type ?boxplot).
(e) Produce a scatterplot of the eruption durations against the order of observation using the
plot() function.
Observation Number   Duration      Observation Number   Duration
         1              83                  11              18
         2              51                  12              16
         3              87                  13              29
         4              60                  14              54
         5              28                  15              91
         6              95                  16               8
         7               8                  17              17
         8              27                  18              55
         9              15                  19              10
        10              10                  20              35

Question Seven

The table below shows the number of deaths caused by firearms in Australia over a number
of years expressed as a rate per 100,000 of population.
Year   Rate      Year   Rate
1983   4.31      1991   3.67
1984   4.42      1992   3.61
1985   4.52      1993   2.98
1986   4.35      1994   2.95
1987   4.39      1995   2.72
1988   4.21      1996   2.96
1989   3.40      1997   2.30
1990   3.61


(a) Copy these data into an Excel spreadsheet, save the spreadsheet as a .CSV file and import
the data into an R dataframe.
(b) Produce a scatterplot of Rate against Year. Do you think that a simple linear regression
model is appropriate for these data?
(c) Fit a simple linear regression model. From the summary output for the fitted model, state
whether you think there is strong evidence for a linear trend in death rates over time.
(d) By using the plot command, produce residual and diagnostic plots for the fitted model.
What do the plots tell you?
Question Eight

Write a function in R which returns a loan repayment schedule. Your function should take as
inputs: the interest rate (nominal per annum), the loan principal, the compounding frequency
(per annum), the loan term (in years) and the repayment frequency (number of repayments
per annum). Your function should return a table with one row for each repayment. Each row
should contain: the time period (which will be in years divided by the repayment frequency),
the loan still outstanding at the beginning of the time period, the capital repaid during the
time period, the interest paid during the time period and the loan still outstanding at the end
of the time period.
Question Nine

Recall the back-shift function from time series: $BY_t = Y_{t-1}$ and $B^d Y_t = Y_{t-d}$. Write a backshift
function in R that takes a vector as its argument and returns the once backshifted version of
the vector. Check that your function works. Can you write the function so that it does not
include loops?
Question Ten

Below is shown a function that when run will simulate values from a particular beta
distribution. Study it and then answer the questions which follow.
betasim=function(nsims) {
  #x and y must already exist in the workspace, for example x=numeric(0); y=numeric(0)
  for(i in 1:nsims) {
    repeat{
      temp1<<-runif(1)
      temp2<<-runif(1)
      y[i]<<-temp2
      if(temp2<dbeta(temp1,2,2)/1.5) {x[i]<<-temp1;break}
    }
  }
}

(a) What are the parameters of the beta distribution that the above function simulates values
from?
(b) Modify the function betasim so that the user of the function inputs the values of the two
parameters as well as the number of simulations required.



(c) Test out your modified code in R and use it to simulate 50,000 values from the beta
distribution with parameters $(\alpha, \beta) = (5, 10)$. Calculate the sample mean and sample
variance of your simulated values.
Question Eleven

Write a function which will evaluate polynomials of the form


$$P(x) = c_n x^{n-1} + c_{n-1} x^{n-2} + \cdots + c_2 x + c_1.$$

Your function should take x and the vector of polynomial coefficients as arguments and it
should return the value of the evaluated polynomial. Call this function directpoly.



Solution to Question One
> x=c(100,110,80,55,14)
> probs=c(29,21,17,17,16)/100
> chisq.test(x,p=probs)
Chi-squared test for given probabilities
data: x
X-squared = 55.3955, df = 4, p-value = 2.685e-11

This indicates that this text is unlikely to be written in English.
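
Before relying on this result it is worth confirming the assumption on expected counts stated in the question, and seeing which letters drive the large statistic. The following is a minimal sketch; res and contrib are illustrative names, and the expected component is part of the object returned by chisq.test.

```r
# Observed letter counts (E, T, N, R, O) and the hypothesised English proportions
x <- c(100, 110, 80, 55, 14)
probs <- c(29, 21, 17, 17, 16) / 100

res <- chisq.test(x, p = probs)

# Expected counts n * p_i, with n = 359 letters in total
res$expected

# All expected counts are well above 5, so the chi-squared approximation is reasonable
all(res$expected >= 5)

# Contribution of each letter to the statistic; T and O account for most of it
contrib <- (x - res$expected)^2 / res$expected
names(contrib) <- c("E", "T", "N", "R", "O")
contrib
```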


Solution to Question Two

(a)
> beers=c(5,2,9,8,3,7,3,5,3,5)
> bal=c(.1,.03,.19,.12,.04,.095,.07,.06,.02,.05)
> plot(beers,bal,main="Plot of BAL against number of beers")


> junk=lm(bal~beers)
> junk

Call:
lm(formula = bal ~ beers)

Coefficients:
(Intercept)        beers
    -0.0185       0.0192

> summary(junk)

Call:
lm(formula = bal ~ beers)

Residuals:
    Min      1Q  Median      3Q     Max
-0.0275 -0.0187 -0.0071  0.0194  0.0357

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.018500   0.019230  -0.962 0.364200
beers        0.019200   0.003511   5.469 0.000595 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02483 on 8 degrees of freedom
Multiple R-Squared: 0.789,     Adjusted R-squared: 0.7626
F-statistic: 29.91 on 1 and 8 DF,  p-value: 0.0005953

(b) The required statistical test has:

$$H_0: \text{slope} = 0.02$$
$$H_1: \text{slope} < 0.02.$$

The test statistic is

$$t_8 = \frac{0.0192 - 0.02}{0.003511} = -0.22786.$$

This is clearly not statistically significant. Note the degrees of freedom for the t-test here.
They are 8, which is equal to the error degrees of freedom.
Note that R can easily supply us with the critical value associated with the t distribution. For
a one-sided test (applying in the negative direction) at the 5% significance level, the critical
value is the value on the horizontal axis that has 5% of probability to the left of it.
> qt(0.05,df=8)
[1] -1.859548
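
The summary output only reports the t value for a slope of zero, so the statistic for a slope of 0.02 has to be built from the estimate and its standard error. The following is a minimal sketch of one way to do that in R; the names fit, coefs, t_stat and p_val are illustrative.

```r
beers <- c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5)
bal <- c(0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.05)
fit <- lm(bal ~ beers)

# Coefficient table with columns "Estimate" and "Std. Error"
coefs <- coef(summary(fit))
slope_hat <- coefs["beers", "Estimate"]
slope_se <- coefs["beers", "Std. Error"]

# t statistic for H0: slope = 0.02 against H1: slope < 0.02
t_stat <- (slope_hat - 0.02) / slope_se

# Lower-tail p-value on the residual degrees of freedom (8 here)
p_val <- pt(t_stat, df = df.residual(fit))

t_stat                                  # approximately -0.228
t_stat < qt(0.05, df.residual(fit))     # FALSE, so H0 is not rejected
```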

(c) The required statistical test has:

$$H_0: \text{intercept} = 0$$
$$H_1: \text{intercept} \neq 0.$$

The test statistic is

$$t_8 = \frac{-0.0185}{0.01923} = -0.96204.$$

This is clearly not statistically significant. Note the degrees of freedom for the t-test here.
They are 8 - this is equal to the error degrees of freedom.
Note that R can easily supply us with the rejection region associated with this t-test. For a
two-sided test at the 5% significance level, the critical value is the value on the horizontal
axis that has 2.5% of probability to the right of it.
> qt(0.975,df=8)
[1] 2.306004

The rejection region therefore includes values of the test statistic less than -2.306004 and
greater than 2.306004.
Solution to Question Three

(a)
> x=c(3,1,4,5,9,3)
> y=c(2,7,1,8,2,8)

(b)
> x+y
[1]  5  8  5 13 11 11

(c)
> z=x+y
> z
[1]  5  8  5 13 11 11

(d)
> rm(z)
> z
Error: object "z" not found     #as you would expect
> z=c(1,4,1)
> x+z
[1]  4  5  5  6 13  4
> y+z
[1]  3 11  2  9  6  9
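
The recycling rule can be made explicit by building the recycled vector by hand. The following is a minimal sketch; z_recycled is an illustrative name.

```r
x <- c(3, 1, 4, 5, 9, 3)
z <- c(1, 4, 1)

# Recycling: z is repeated until it matches the length of x
z_recycled <- rep(z, length.out = length(x))   # 1 4 1 1 4 1

x + z            # 4 5 5 6 13 4
x + z_recycled   # the same result, with the recycling written out
```

Here the length of x is a multiple of the length of z, so R recycles silently. When it is not, R still recycles but issues a warning that the longer object length is not a multiple of the shorter object length.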

Solution to Question Four

(a)
> x=matrix(c(3,1,4,5),nrow=2,ncol=2)   #R fills a matrix column by column, so this gives the transpose of the required x
> x
     [,1] [,2]
[1,]    3    4
[2,]    1    5
> x=matrix(c(3,4,1,5),nrow=2,ncol=2)
> x
     [,1] [,2]
[1,]    3    1
[2,]    4    5
> y=matrix(c(2,1,7,2),nrow=2,ncol=2)
> y
     [,1] [,2]
[1,]    2    7
[2,]    1    2

(b)
> x+y
     [,1] [,2]
[1,]    5    8
[2,]    5    7

(c)
> x*y
     [,1] [,2]
[1,]    6    7
[2,]    4   10

(d)
> x%*%y
     [,1] [,2]
[1,]    7   23
[2,]   13   38

(e)
> z=t(x)
> z
     [,1] [,2]
[1,]    3    4
[2,]    1    5

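
Because matrix() fills column by column by default, the entries have to be supplied in column order. An alternative, sketched below, is the byrow argument of matrix(), which lets the entries be typed in the order they are read across the rows; x_bycol and x_byrow are illustrative names.

```r
# Entries supplied column by column (the default)
x_bycol <- matrix(c(3, 4, 1, 5), nrow = 2, ncol = 2)

# Entries supplied row by row, as the matrix is written on the page
x_byrow <- matrix(c(3, 1, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)

identical(x_bycol, x_byrow)   # TRUE

# The transpose swaps rows and columns
t(x_byrow)
```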
Solution to Question Five

(a) (i)
> x=seq(1,100,1)
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100

(ii)
> x=seq(from=1,by=0.1,length.out=20)
> x
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9

(iii)
> x=seq(from=1,to=2,length.out=6)
> x
[1] 1.0 1.2 1.4 1.6 1.8 2.0

Note that the even spacing is automatically generated by the seq function.
(b)
(i)
> ?rep
> x=c(3,1,4,1,5)
> y=rep(x,c(2,2,2,2,2))
> y
[1] 3 3 1 1 4 4 1 1 5 5

(ii)
> z=rep(x,2)
> z
[1] 3 1 4 1 5 3 1 4 1 5

(iii)
> z=rep(x,c(rep(4,3),rep(2,2)))
> z
[1] 3 3 3 3 1 1 1 1 4 4 4 4 1 1 5 5

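
The three cases can also be written with the named each and times arguments of rep, which some readers find easier to read than a bare second argument. The following is a minimal sketch; y_each, z_whole and z_mixed are illustrative names.

```r
x <- c(3, 1, 4, 1, 5)

# Each entry repeated twice
y_each <- rep(x, each = 2)                   # 3 3 1 1 4 4 1 1 5 5

# The whole vector repeated twice
z_whole <- rep(x, times = 2)                 # 3 1 4 1 5 3 1 4 1 5

# First three entries four times each, remaining two entries twice each
z_mixed <- rep(x, times = c(4, 4, 4, 2, 2))  # 3 3 3 3 1 1 1 1 4 4 4 4 1 1 5 5
```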
Solution to Question Six

(a)
> kiama=c(83,51,87,60,28,95,8,27,15,10,18,16,29,54,91,8,17,55,10,35)

(b)
> summary(kiama)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8.00   15.75   28.50   39.85   56.25   95.00

(c)
> hist(kiama,main="Histogram of the Kiama Data")

(d)
> boxplot(kiama,main="Boxplot of the Kiama Data")

(e)
> plot(kiama,main="Scatterplot of the Kiama Data")


Solution to Question Seven

(a)
> q7=read.csv("addl unit one q7.csv",header=T)
> attach(q7)   #makes the columns Year and Rate available by name

(b)
> plot(Year,Rate, main="Scatterplot of Rate against Year")

(c)
> junk=lm(Rate~Year)
> junk

Call:
lm(formula = Rate ~ Year)

Coefficients:
(Intercept)         Year
   306.3199      -0.1521

> summary(junk)

Call:
lm(formula = Rate ~ Year)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38142 -0.16824 -0.01667  0.22071  0.30701

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 306.3199    29.8444   10.26 1.33e-07 ***
Year         -0.1521     0.0150  -10.14 1.53e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.251 on 13 degrees of freedom
Multiple R-Squared: 0.8878,     Adjusted R-squared: 0.8792
F-statistic: 102.9 on 1 and 13 DF,  p-value: 1.527e-07

(d)
> plot(junk)
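
If the CSV file is not to hand, the same fit can be reproduced by typing the data into a data frame directly and passing it to lm through the data argument. The following is a minimal sketch under that assumption; fit and res are illustrative names. It extracts the residuals and fitted values that the first diagnostic plot displays.

```r
# Firearm death rates per 100,000 of population, entered directly
q7 <- data.frame(
  Year = 1983:1997,
  Rate = c(4.31, 4.42, 4.52, 4.35, 4.39, 4.21, 3.40, 3.61,
           3.67, 3.61, 2.98, 2.95, 2.72, 2.96, 2.30)
)

# Simple linear regression of Rate on Year
fit <- lm(Rate ~ Year, data = q7)
coef(fit)

# Residuals and fitted values, the quantities behind the residual plot
res <- data.frame(Year = q7$Year, fitted = fitted(fit), residual = resid(fit))
res
```

Calling plot(fit) on this object produces the same diagnostic plots as plot(junk).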



Solution to Question Eight
repaysched=function(int_rate, principal, comp_freq, loan_term, repay_freq) {
  #effective interest rate per repayment period
  eff_rate=(1+int_rate/comp_freq)^(comp_freq/repay_freq)-1
  #total number of repayments
  n=loan_term*repay_freq
  #level repayment: principal divided by the annuity factor (1-v^n)/i
  repayment=principal/((1-(1+eff_rate)^(-n))/eff_rate)
  period=seq(0,n-1,1)
  period2=period+1
  #loan outstanding = present value of the repayments still to be made
  loan_os_beg=repayment*(1-(1+eff_rate)^(-(n-period)))/eff_rate
  loan_os_end=repayment*(1-(1+eff_rate)^(-(n-period2)))/eff_rate
  cap_rpd=loan_os_beg-loan_os_end
  int_pd=repayment-cap_rpd
  out=data.frame(period2, loan_os_beg, cap_rpd, int_pd, loan_os_end)
  print(out,digits=6)
}
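
A schedule of this kind can be cross-checked by building it recursively: interest each period is the opening balance times the effective rate, and the rest of the repayment reduces the capital. The following is a minimal sketch of that check. The input values are illustrative assumptions (6% nominal per annum compounded monthly, a loan of 1000 over one year with quarterly repayments), and sched is an illustrative name.

```r
# Illustrative inputs (assumptions, not taken from the exercise)
int_rate <- 0.06; principal <- 1000; comp_freq <- 12
loan_term <- 1; repay_freq <- 4

n <- loan_term * repay_freq
# Effective interest rate per repayment period
eff_rate <- (1 + int_rate / comp_freq)^(comp_freq / repay_freq) - 1
# Level repayment: principal divided by the annuity factor (1 - v^n) / i
repayment <- principal * eff_rate / (1 - (1 + eff_rate)^(-n))

sched <- matrix(NA_real_, nrow = n, ncol = 5,
                dimnames = list(NULL, c("period", "loan_os_beg", "cap_rpd",
                                        "int_pd", "loan_os_end")))
balance <- principal
for (k in 1:n) {
  interest <- balance * eff_rate      # interest on the opening balance
  capital <- repayment - interest     # remainder of the repayment repays capital
  sched[k, ] <- c(k, balance, capital, interest, balance - capital)
  balance <- balance - capital
}
sched <- as.data.frame(sched)
sched
```

If the repayment has been calculated correctly, the closing balance in the final row is zero up to rounding error and the capital repaid sums to the principal.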
Solution to Question Nine
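
The function below builds the backshifted vector without loops and uses it to return the first differences $Y_t - BY_t$.
> firstdiff
function(input){
temp1=c(0,input)   #backshifted series B Y_t, padded with a leading 0
temp2=c(input,0)   #original series Y_t, padded with a trailing 0
temp=temp2-temp1   #Y_t - B Y_t, with a spurious value at each end
output=temp[2:length(input)]   #drop the two padded ends
return(output)}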
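
If the backshifted series itself is wanted rather than its difference from the original, one loop-free possibility is sketched below. The name backshift, its argument d and the choice to pad with NA are assumptions made for illustration and are not part of the original solution.

```r
# Hypothetical helper: returns B^d y, with NA where no earlier value exists
backshift <- function(y, d = 1) {
  n <- length(y)
  stopifnot(d >= 0, d <= n)
  c(rep(NA, d), y[seq_len(n - d)])
}

y <- c(5, 7, 4, 9)
backshift(y)       # NA  5  7  4
backshift(y, 2)    # NA NA  5  7
y - backshift(y)   # NA  2 -3  5, the first differences after the leading NA
```

Padding with NA rather than 0 keeps the result the same length as the input and avoids treating the unknown earlier values as zeros.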
Solution to Question Ten

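(a) $(\alpha, \beta) = (2, 2)$.

(b) The modified code is:

betasim=function(alpha,beta,nsims) {
  #create x and y in the global workspace so the results remain available after the call
  x<<-numeric(nsims); y<<-numeric(nsims)
  for(i in 1:nsims) {
    repeat{
      temp1<<-runif(1)
      temp2<<-runif(1)
      y[i]<<-temp2
      #divide by the density at the mode (alpha-1)/(alpha+beta-2); this needs alpha>1 and beta>1
      if(temp2<dbeta(temp1,alpha,beta)/dbeta((alpha-1)/(alpha+beta-2),alpha,beta)) {x[i]<<-temp1;break}
    }
  }
}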

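(c) Use the command betasim(5,10,50000). The output is stored in a vector x. Find the mean
and variance of x using mean(x) and var(x). The mean should be close to 1/3 and the variance
close to 1/72 (approximately 0.0139), the theoretical values for this beta distribution.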
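
Simulating one value at a time with global assignment is slow for 50,000 values. The following is a minimal sketch of the same acceptance-rejection idea written in vectorised form, returning the accepted values instead of writing to the workspace. The name betasim2 and its internals are illustrative assumptions, not the original solution, and the approach assumes both parameters exceed 1 so that the density has an interior mode.

```r
# Hypothetical vectorised variant of the acceptance-rejection sampler
betasim2 <- function(alpha, beta, nsims) {
  stopifnot(alpha > 1, beta > 1)
  # Maximum of the density, attained at the mode
  peak <- (alpha - 1) / (alpha + beta - 2)
  fmax <- dbeta(peak, alpha, beta)
  out <- numeric(0)
  while (length(out) < nsims) {
    cand <- runif(nsims)   # candidate values from U(0,1)
    u <- runif(nsims)      # uniforms used for the accept/reject decision
    out <- c(out, cand[u < dbeta(cand, alpha, beta) / fmax])
  }
  out[seq_len(nsims)]
}

set.seed(1)
sims <- betasim2(5, 10, 50000)
mean(sims)   # theoretical value 1/3
var(sims)    # theoretical value 1/72
```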


Solution to Question Eleven
> directpoly
function(xin,vecin){
temp=0
#vecin[i] is the coefficient c_i of x^(i-1)
for(i in 1:length(vecin)){
temp=temp+vecin[i]*xin^(i-1)}
return(temp)}
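
The direct method computes each power of x separately. A common alternative is Horner's method, which rewrites the polynomial as nested multiplications. The following is a minimal sketch; hornerpoly is a hypothetical name chosen to parallel directpoly, and the method is offered as an alternative rather than as the intended answer.

```r
# Hypothetical alternative: Horner's method
# P(x) = c1 + x*(c2 + x*(c3 + ... + x*cn))
hornerpoly <- function(xin, vecin) {
  temp <- 0
  # Work from the highest-order coefficient down to c1
  for (coef_i in rev(vecin)) {
    temp <- temp * xin + coef_i
  }
  temp
}

hornerpoly(2, c(1, 2, 3))   # 1 + 2*2 + 3*2^2 = 17
```

This uses one multiplication and one addition per coefficient and, like the direct version, also works when xin is a vector.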
