
An Easy Introduction To R

Ismail Basoglu
October 2, 2009

1 Introduction
This document contains an easy introduction to the programming language R. With the help of the
examples given in this document, you should be able to acquire a basic knowledge of R, which will
help you to comprehend and create financial applications in IE 586 and statistical applications in IE 508
courses. In order to learn this programming language, it is recommended that you try each and
every step of the applications presented in this document.
You can download the latest version of R from “http://cran.r-project.org/”. For Windows users, click
the “Windows” link, then the “base” link, and you will see the download link for the “*.exe” file.
Have fun!!!

2 R Works with Vectors


2.1 Creating Vectors
In order to assign a value to a specified variable (e.g. 3 to x), we do the following:
x<-3
or
x=3
We will use the operator <- in our future examples for assigning values.
When we assign a number to a variable, R treats it as a vector with a single element. So, by using
[.] next to the specified variable, we can assign another element to any index we want. Finally, if we
want to see what is stored in that variable, we simply write its name and press enter.
x[4]<-7.5
x # press enter to display the content of x
# [1] 3.0 NA NA 7.5
Here, NA stands for “not available”. In fact, we have not assigned any values for the second and the
third indices.
You can use # to add comments on a command line. R will ignore the rest of the line after this
symbol. However, the next line will be executed by R. (So you do not have to close your comment line
with # when it ends)
We can create a vector of consecutive integers between two integers with a simple command.
x<-1:8 # creates a consecutive integer vector
x
# [1] 1 2 3 4 5 6 7 8
y<-15:11
y
# [1] 15 14 13 12 11

The following operation adds 3 to each element of x and stores the result as y.
y<-x+3
y
# [1] 4 5 6 7 8 9 10 11
The previous command actually adds a vector of length 8 to a single-element vector. Here, R
repeats the short vector again and again until it reaches the length of the long vector. The following sequence
of commands shows this operation clearly.
x<-1:8
y<-1:4
x
# [1] 1 2 3 4 5 6 7 8
y
# [1] 1 2 3 4
x+y # we can see the summation without storing them to any new variable
# [1] 2 4 6 8 6 8 10 12
In this summation y is repeated up to index 8 (since x is a vector of length 8). So, the fifth element of
x is added to the first element of y, the sixth element of x is added to the second element
of y and so on. Yet, we might wonder what would happen if the length of x was not a multiple of the length
of y. We can try it and see.
x<-1:8
y<-1:3
x+y
# [1] 2 4 6 5 7 9 8 10
# Warning message:
# In x + y : longer object length is not a multiple of shorter object length
R again repeats the short vector until it reaches the length of the long vector. However, the last
repetition may not be complete. R returns a warning message about this, yet it executes the operation.
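The recycling rule can be made explicit with rep_len(), a base R function that repeats a vector up to a given length. The following is a small sketch (the names y_recycled, manual and automatic are only illustrative) suggesting that adding the manually extended vector gives the same result as letting R recycle y on its own.

```r
x <- 1:8
y <- 1:3

# extend y to the length of x by hand: 1 2 3 1 2 3 1 2
y_recycled <- rep_len(y, length(x))

manual <- x + y_recycled               # lengths match, so no warning
automatic <- suppressWarnings(x + y)   # R recycles y by itself

identical(manual, automatic)
# [1] TRUE
```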
You should also know that subtraction, multiplication, division, power and modular
arithmetic operations work in the same way. We will get into these operations in section 2.4.
We can also create vectors with specified values. For instance, let us create a vector of length 6 with
values 4, 8, 15, 16, 23, 42 and another vector of length 4 with values 501, 505, 578, 586. We use the function
c in order to combine those values in a vector. We can also find the number of elements in a vector
by using the length() command.
x<-c(4,8,15,16,23,42) # "c"ombines a series of values
y<-c(501,505,578,586)
x
# [1] 4 8 15 16 23 42
y
# [1] 501 505 578 586
z<-c(x,y) # we can also combine two vectors
z
# [1] 4 8 15 16 23 42 501 505 578 586

length(z)
# [1] 10
We can also reverse a vector, so that it runs from the last element to the first.
z<-rev(z) # we can use the same object to reassign that object
z
# [1] 586 578 505 501 42 23 16 15 8 4

Suppose we would like to create a vector of length 10, elements of which will all be equal to 5. We do
the following.
x<-rep(5,10) # "rep"eat 5 ten times
x
# [1] 5 5 5 5 5 5 5 5 5 5
y<-c(3,5,7)
z<-rep(y,4) # repeat vector y 4 times
z
# [1] 3 5 7 3 5 7 3 5 7 3 5 7

From the previous example, we see that we can also repeat vectors. As a last example for this section,
we would like to create a vector of length 21 between values 2 and 3, so that the difference between
consecutive elements will all be equal.
x<-seq(2,3,length.out=21) # "seq" stands for sequence
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00
If we are interested in the step size rather than the length of the sequence, we can use the by parameter
instead of length.out.
x<-seq(2,3,by=0.05)
x
# [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
# [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00

2.2 Logical Expressions


You can use
• < : less
• <=: less or equal
• > : greater
• >=: greater or equal
• ==: equal (do not forget that a single = symbol is used for assigning values)
• !=: not equal
to write logical expressions, which return a vector of TRUEs and FALSEs (TRUE counts as 1 and FALSE
as 0 in arithmetic). In the following sequence of examples, we create a vector and use it in different
logical expressions. If a vector element satisfies the expression, it returns a TRUE, otherwise a FALSE in
the corresponding index. You can use && as “and” and || as “or” between logical expressions that each
yield a single value; for element-wise operations on whole vectors, use & and | instead.
x<-10:20
x
# [1] 10 11 12 13 14 15 16 17 18 19 20
x<17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
x<=17
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
x>14
# [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
x>=14

# [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x==16
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
x!=16
# [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE

x<-5
(x<=10) && (x>=8)
# [1] FALSE
(x<=10) || (x>=8)
# [1] TRUE
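The && and || operators above are applied to a single value. For whole vectors, the element-wise operators & and | are the usual choice. The following sketch (in_range and outside are illustrative names) combines two conditions on every element of x and uses the result to pick elements.

```r
x <- 10:20

# & and | compare element by element and return a logical vector
in_range <- (x >= 12) & (x <= 15)
outside  <- (x < 12) | (x > 18)

x[in_range]
# [1] 12 13 14 15
x[outside]
# [1] 10 11 19 20
```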
So what can we do with these logical expressions? As a first simple example, we have a vector x of
integers from 1 to 20. We want to obtain a vector in which every element of x that is less than 8 is
replaced by zero and the other elements remain the same as they are in x.
x<-1:20
y<-(x>=8)*(x)
y
# [1] 0 0 0 0 0 0 0 8 9 10 11 12 13 14 15 16 17
#[18] 18 19 20
As for the second example, we will evaluate the ordering costs of some goods. We can order at least
30 and at most 50 units of goods from our supplier in a single order. We have a fixed cost of 50$ if we
order less than or equal to 45 units and 15$ otherwise. A single unit costs 7$ if we order less than 40
units and 6.5$ otherwise. If we want to evaluate the total ordering cost for each alternative:
units<-30:50
marginalcost<-7*units*(units<40)+6.5*units*(units>=40)
marginalcost
# [1] 210.0 217.0 224.0 231.0 238.0 245.0 252.0 259.0
# [9] 266.0 273.0 260.0 266.5 273.0 279.5 286.0 292.5
#[17] 299.0 305.5 312.0 318.5 325.0

fixedcost<-50*(units<=45)+15*(units>45)
fixedcost
# [1] 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 15
#[18] 15 15 15 15

totalcost<-fixedcost+marginalcost
totalcost
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 323.0 310.0 316.5 323.0 329.5 336.0 342.5
#[17] 314.0 320.5 327.0 333.5 340.0
Following from the previous example, say we are not interested in an order that costs more than
318$. Under these circumstances, we just want to list the numbers of units that we can order
and the costs that correspond to those numbers of units.
units[totalcost<=318]
# [1] 30 31 32 33 34 35 36 37 38 40 41 46
totalcost[totalcost<=318]
# [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
# [9] 316.0 310.0 316.5 314.0
The first of the previous two commands yields only those elements of the units vector for which the
corresponding elements of the totalcost vector are less than or equal to 318. The second command
yields only those elements of the totalcost vector which are less than or equal to 318.
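If the positions themselves are of interest, the base R functions which() and which.min() can be used alongside logical indexing. The sketch below recomputes the cost vector from the example so that it runs on its own; affordable and cheapest are illustrative names.

```r
units <- 30:50
totalcost <- 50*(units<=45) + 15*(units>45) +
             7*units*(units<40) + 6.5*units*(units>=40)

affordable <- which(totalcost <= 318)     # indices, not values
units[affordable]
# [1] 30 31 32 33 34 35 36 37 38 40 41 46

cheapest <- units[which.min(totalcost)]   # order size with the lowest total cost
cheapest
# [1] 30
```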

As we did in the previous example, we can extract a subvector (a subset of a vector which keeps the
same order) from a vector in different ways. Check out the following examples:

x<-seq(5,8,by=0.3) # we will have 11 elements in this vector


x
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0

y1<-x[3:7] # extract a subvector from the indices 3 to 7


y1
# [1] 5.6 5.9 6.2 6.5 6.8

y2<-x[2*(1:5)] # extract a subvector from even indices


y2
# [1] 5.3 5.9 6.5 7.1 7.7

y3<-x[-1] # extract a subvector by eliminating the first index


y3
# [1] 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0

y4<-x[-length(x)] # extract a subvector by eliminating the last index


y4
# [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7

y5<-x[-seq(1,11,3)] # extract a subvector by eliminating all indices given


y5
# [1] 5.3 5.6 6.2 6.5 7.1 7.4 8.0

y6<-x[seq(1,11,3)] # extract a subvector by choosing all indices given


y6
# [1] 5.0 5.9 6.8 7.7

2.3 Creating Matrices


Every vector we create with the methods given in sections 2.1 and 2.2 is treated as a vertical (column)
vector by default. Do not be misled by the way the vector is displayed. We can create a horizontal (row)
vector by using the function t(), where t stands for transpose.
x<-1:5
y<-t(x)
y
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
As you can see, R displays a horizontal vector in a completely different way. And if we take the
transpose of vector y, we will see the actual display of a vertical vector.
t(y)
# [,1]
# [1,] 1
# [2,] 2
# [3,] 3
# [4,] 4
# [5,] 5
In order to create an m × n matrix in R, we first need to create a vector (let us name it vec) which
contains the columns of the matrix sequentially from the first to the last. Then we simply call
matrix(vec,nrow=m,ncol=n).

vec<-1:12
x<-matrix(vec,nrow=3,ncol=4)

x
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12

t(x)
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9
# [4,] 10 11 12
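When the values are listed row by row instead, the byrow argument of matrix() can be used. The following sketch (by_row is an illustrative name) fills the same 3 × 4 matrix by rows and checks that this matches transposing a 4 × 3 matrix filled by columns.

```r
vec <- 1:12

# fill row by row instead of the default column by column
by_row <- matrix(vec, nrow=3, ncol=4, byrow=TRUE)
by_row
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

# same result as transposing a 4 x 3 matrix filled by columns
identical(by_row, t(matrix(vec, nrow=4, ncol=3)))
# [1] TRUE
```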
You can take the inverse of an n × n matrix by using the solve() function.
x<-matrix(c(1,2,-1,1,2,1,2,-2,-1),nrow=3,ncol=3)
x
# [,1] [,2] [,3]
# [1,] 1 1 2
# [2,] 2 2 -2
# [3,] -1 1 -1

xinv<-solve(x)
xinv
# [,1] [,2] [,3]
# [1,] 0.0000000 0.25000000 -0.5
# [2,] 0.3333333 0.08333333 0.5
# [3,] 0.3333333 -0.16666667 0.0
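A quick way to check an inverse is to multiply it by the original matrix and compare with the identity matrix. The sketch below does this with all.equal(), which tolerates small rounding errors, and also shows that solve() accepts a right-hand side vector (b is an illustrative choice) to solve a linear system directly.

```r
x <- matrix(c(1,2,-1,1,2,1,2,-2,-1), nrow=3, ncol=3)
xinv <- solve(x)

# x times its inverse should be the identity matrix (up to rounding)
check <- x %*% xinv
isTRUE(all.equal(check, diag(3)))
# [1] TRUE

# solve(x, b) solves the system x %*% sol = b without forming the inverse
b <- c(1, 2, 3)
sol <- solve(x, b)
isTRUE(all.equal(as.vector(x %*% sol), b))
# [1] TRUE
```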
You can create a matrix whose elements are all equal by writing that specific value in the first
parameter position of the function matrix(). You can also assign a vector to the diagonal elements of
a square matrix with the function diag().
x<-matrix(0,nrow=4,ncol=4)
x
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 0 0
# [4,] 0 0 0 0

diag(x)<-1 # assigns 1 to all diagonal elements of x


x
# [,1] [,2] [,3] [,4]
# [1,] 1 0 0 0
# [2,] 0 1 0 0
# [3,] 0 0 1 0
# [4,] 0 0 0 1
You can find the number of columns, the number of rows and the total number of elements in a
matrix with the following functions.
x<-matrix(0,ncol=5,nrow=4)
ncol(x)

# [1] 5
nrow(x)
# [1] 4
length(x)
# [1] 20

2.4 Arithmetic Operations on R


We gave a short introduction to the arithmetic operations in section 2.1. We stated that we can
add and multiply two vectors componentwise and that we can also do subtraction, division and modular
arithmetic operations in the same way.
x<-2*(1:5)
x
# [1] 2 4 6 8 10

y<-1:5
y
# [1] 1 2 3 4 5

x+y
# [1] 3 6 9 12 15

x*y
# [1] 2 8 18 32 50

x/y
# [1] 2 2 2 2 2

x-y
# [1] 1 2 3 4 5

x^2 # makes a power operation


# [1] 4 16 36 64 100
x^y
# [1] 2 16 216 4096 100000

x%%3 # yields mod(3) of every element in x


# [1] 2 1 0 2 1

y<-3:7
y
# [1] 3 4 5 6 7

x%%y # makes a componentwise modular operation


# [1] 2 0 1 2 3

x%/%y # makes an integer division


# [1] 0 1 1 1 1
In the previous example, x and y were vertical vectors. Even if one of them were defined as a horizontal
vector, R again would do those operations but this time the results would also be horizontal vectors.
You can find the maximum value of a vector with max() and its minimum value with min(). You can sum up
all the elements of a vector with sum() and take the product of all the elements of a vector with prod().

x<-c(3,1,6,5,8,10,9,12,3)

min(x)
# [1] 1
max(x)
# [1] 12
sum(x)
# [1] 57
prod(x)
# [1] 2332800

You can compare two vectors componentwise with pmax() and pmin(), so you can obtain either the
componentwise maximum or the componentwise minimum of two vectors (this will come in very handy,
especially in option pricing simulation). You can also sort a vector with the function sort(), and the
function order() yields the sequence of indices that sorts a vector (both functions sort values from
minimum to maximum by default, but we can use the additional parameter decreasing=TRUE to obtain an
order from maximum to minimum).
x<-1:10
y<-10:1
z<-c(3,2,1,6,5,4,10,9,8,7)

a<-pmax(x,y,z) # you can write as many vectors as you want


a
# [1] 10 9 8 7 6 6 10 9 9 10
sort(a)
# [1] 6 6 7 8 9 9 9 10 10 10
order(a)
# [1] 5 6 4 3 2 8 9 1 7 10

b<-pmin(x,y,z)
b
# [1] 1 2 1 4 5 4 4 3 2 1
sort(b,decreasing=TRUE)
# [1] 5 4 4 4 3 2 2 1 1 1
order(b,decreasing=TRUE)
# [1] 5 4 6 7 8 2 9 1 3 10
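The index vector returned by order() is most useful for rearranging a second vector in the same way. In the following sketch, the vector labels is a made-up companion to a; indexing a itself with order(a) reproduces sort(a).

```r
a <- c(10, 9, 8, 7, 6, 6, 10, 9, 9, 10)
labels <- letters[1:10]   # hypothetical labels, one per element of a

idx <- order(a)
identical(a[idx], sort(a))   # indexing with order() sorts the vector
# [1] TRUE

labels[idx]                  # labels rearranged to follow the sorted a
# [1] "e" "f" "d" "c" "b" "h" "i" "a" "g" "j"
```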
R can also do matrix multiplications with the operator %*%. This operator should be handled carefully
to obtain correct results. Be sure about the dimensions of your matrices. When an operand is a plain
vector, R can also treat it as a row or a column vector, as needed, to make the dimensions conform.
x<-matrix(1:6,ncol=2,nrow=3)
x
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6

y<-matrix(1:4,ncol=2,nrow=2)
y
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4

x%*%y
# [,1] [,2]
# [1,] 9 19

# [2,] 12 26
# [3,] 15 33

y%*%x
# Error in y %*% x : non-conformable arguments

y%*%t(x) # taking the transpose should help


# [,1] [,2] [,3]
# [1,] 13 17 21
# [2,] 18 24 30
Consider the matrix multiplication of two vertical vectors. R treats the first vector as a horizontal
vector and the operation yields a scalar. If we were to multiply two horizontal
vectors, R would not be able to make any such correction and would return an error. To return an
outer product, the second vector must strictly be horizontal.
x<-1:3
y<-3:1

x%*%y
# [,1]
# [1,] 10

t(x)%*%t(y)
# Error in t(x) %*% t(y) : non-conformable arguments

t(x)%*%y # same as the first operation but we have a correct notation now
# [,1]
# [1,] 10

x%*%t(y) # only this one returns an outer product


# [,1] [,2] [,3]
# [1,] 3 2 1
# [2,] 6 4 2
# [3,] 9 6 3
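The same inner and outer products can also be written without %*%, which avoids any ambiguity about row and column vectors. This is only an alternative sketch using the base R functions sum() and outer(); inner and outer_xy are illustrative names.

```r
x <- 1:3
y <- 3:1

inner <- sum(x*y)          # inner product as a plain number
inner
# [1] 10

outer_xy <- outer(x, y)    # element [i,j] is x[i]*y[j]
all(outer_xy == x %*% t(y))
# [1] TRUE
```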
Given a vector of real values, you can obtain the cumulative sums vector by the function cumsum()
and the cumulative products vector by the function cumprod().
x<-c(1,4,5,6,2,12)
y<-cumsum(x)
y
# [1] 1 5 10 16 18 30
# every index has the sum of the elements in x up to that index

z<-cumprod(x)
z
# [1] 1 4 20 120 240 2880
# every index has the product of the elements in x up to that index
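One common use of cumsum() is a running mean, which is handy later for watching a simulation estimate settle down. A minimal sketch, assuming the same vector x as above (running_mean is an illustrative name):

```r
x <- c(1,4,5,6,2,12)

# mean of the first k elements, for k = 1, ..., length(x)
running_mean <- cumsum(x)/seq_along(x)

running_mean[2]                       # mean of 1 and 4
# [1] 2.5
running_mean[length(x)] == mean(x)    # the last entry is the overall mean
# [1] TRUE
```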
You can evaluate the factorial of a non-negative real number with factorial() and the absolute value of
any real number with abs(). You can take the square root of a non-negative real number with sqrt(), and
the logarithm of a positive real number with log(). You can compute the exponential function of a real
number with exp() and the gamma function of a positive real number with gamma(). For integer rounding,
floor() yields the largest integer which is less than or equal to the specified value and ceiling() yields
the smallest integer which is greater than or equal to the specified value. as.integer() keeps only the
integer part of the specified value (it truncates toward zero).

x<-c(1,4,5,6,2,12)
factorial(3)
# [1] 6
factorial(1:6)
# [1] 1 2 6 24 120 720

abs(-4)
# [1] 4
abs(c(-3:3))
# [1] 3 2 1 0 1 2 3

sqrt(4)
# [1] 2
sqrt(1:9)
# [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
# [9] 3.000000

log(100) # this is natural logarithm unless any base is defined


# [1] 4.60517
log10(100) # this is logarithm with base 10
# [1] 2
log2(100) # this is logarithm with base 2
# [1] 6.643856
log(100,5) # this is logarithm with base 5, which is the second parameter in log()
# [1] 2.861353
log(c(10,20,30,40))
# [1] 2.302585 2.995732 3.401197 3.688879

exp(4.60517) # must yield 100, maybe with a rounding error


# [1] 99.99998
exp(log(100)) # no rounding errors
# [1] 100
exp(seq(-2,2,0.4))
# [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247
# [8] 2.2255409 3.3201169 4.9530324 7.3890561

gamma(5) # equivalent to factorial(4)


# [1] 24
gamma(5.5) # equivalent to factorial(4.5)
# [1] 52.34278

x<-c(-3,-3.5,4,4.2)
floor(x)
# [1] -3 -4 4 4
ceiling(x)
# [1] -3 -3 4 5
as.integer(x)
# [1] -3 -3 4 4

3 Probability and Statistical Basis of R
3.1 Probability Functions in R
There are four functions related to the distributions which are well-known and commonly used in
probability theory and statistics. Let us give the definitions of those functions for the normal distribution and then
talk about the probability distributions which are available in R.

• dnorm(x,y,z): returns the pdf (probability density function) value of x in a normal distribution
with mean y and standard deviation z.
• pnorm(x,y,z): returns the cdf (cumulative distribution function) value of x in a normal distribution
with mean y and standard deviation z.
• qnorm(x,y,z): returns the inverse cdf value of x in a normal distribution with mean y and standard
deviation z. Clearly x must be in the closed unit interval (x ∈ [0, 1]).
• rnorm(x,y,z): returns a vector of random variates (RVs) which has length x. The variates will
follow a normal distribution with mean y and standard deviation z.

Check out the following examples about normal distribution:


dnorm(0.5) # if no parameter is defined, R assumes a std. normal distribution
# [1] 0.3520653
dnorm(0,2,1)
# [1] 0.05399097
dnorm(3,3,5)
# [1] 0.07978846

pnorm(0) # the area below the curve


# on the left side of "0" in a std. normal distribution
# [1] 0.5
pnorm(2)
# [1] 0.9772499
pnorm(5,3,1)
# [1] 0.9772499

# following are the inverse of the previous "pnorm()" functions


qnorm(0.5)
# [1] 0
qnorm(0.9772499)
# [1] 2.000001
qnorm(0.9772499,3,1)
# [1] 5.000001

rnorm(20,2,1) # will generate 20 RVs which follow normal dist.


# with mean 2 and std. dev. 1
# [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615
# [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629
# [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998
# [19] 1.30906099 2.18648566
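Since pnorm() gives the area to the left of a point, the probability of an interval is a difference of two pnorm() values, and qnorm() undoes pnorm(). The following sketch illustrates both facts for the standard normal distribution; p_within1 and roundtrip are illustrative names.

```r
# probability that a standard normal RV falls between -1 and 1
p_within1 <- pnorm(1) - pnorm(-1)
p_within1
# [1] 0.6826895

# qnorm() is the inverse of pnorm()
roundtrip <- qnorm(pnorm(1.96))
roundtrip
# [1] 1.96
```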

Here is a list of useful distributions that are available for computation in R. Other distributions
are also available in R but are not in this list. (For each distribution below, you can obtain
the cdf by changing the initial d to p, the inverse cdf by changing it to q and the random variate
generator by changing it to r.) Apart from the normal distribution, please practice the
d, p, q, r functions on the first six distributions in this list.

• dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with
mean (rate) y.
• dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with number of trials y
and success probability z.
• dgeom(x,y) : returns the pmf value of x (number of failures before the first success) in a geometric
distribution with success probability y.
• dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper
bound z.
• dexp(x,y) : returns the pdf value of x in an exponential distribution with rate parameter y.
• dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with shape parameter
y and scale parameter z. (If you do not write scale in the parameter definition, R takes z as the
rate parameter, which is equal to 1/scale)
• dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with location parameter y
and scale parameter z.
• dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with degrees of freedom y
and non-centrality parameter z.
• dt(x,y,z) : returns the pdf value of x in a t-distribution with degrees of freedom y and
non-centrality parameter z.
• df(x,y,z,a) : returns the pdf value of x in an F-distribution with first degrees of freedom y, second
degrees of freedom z and non-centrality parameter a.
• dnbinom(x,y,z) : returns the pmf value of x in a negative binomial distribution with dispersion
(size) parameter y and success probability z.
• dhyper(x,y,z,a) : returns the pmf value of x (number of white balls drawn) in a hypergeometric
distribution with white population size y, black population size z and number of drawings a made
from the whole population.
• dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and
log-standard deviation z.
• dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and
shape-2 parameter z.
• dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with location parameter y
and scale parameter z.
• dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with shape parameter y
and scale parameter z.
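The d, p, q, r naming pattern carries over to the other distributions unchanged. The sketch below tries it on the binomial, Poisson and exponential distributions; the parameter values (10 trials, probability 0.5, rates 3 and 2) are arbitrary choices for illustration.

```r
# binomial with 10 trials and success probability 0.5
dbinom(3, 10, 0.5)     # P(X = 3) = choose(10,3)/2^10
# [1] 0.1171875
pbinom(3, 10, 0.5)     # P(X <= 3)
# [1] 0.171875
qbinom(0.5, 10, 0.5)   # smallest k with P(X <= k) >= 0.5
# [1] 5

# Poisson with mean 3: P(X <= 2) is the sum of the first three pmf values
pois_cdf <- ppois(2, 3)
pois_sum <- sum(dpois(0:2, 3))

# exponential with rate 2: the density at 0 equals the rate
dexp(0, 2)
# [1] 2
```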

3.2 Statistical Functions in R and Analyzing Simulation Output


You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance
with var() and its median with median(). You can use the function summary() to obtain the 25 and 75
per cent quantiles (which, together with the median, are called quartiles).
x<-rnorm(1000000,5,2) # x is a vector of 1000000 RVs
# which follow a normal dist. with mean 5 and std. dev. 2

mean(x)
# [1] 4.997776
sd(x)

# [1] 2.000817
var(x)
# [1] 4.003268
median(x)
# [1] 4.997408
summary(x)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.904 3.650 4.997 4.998 6.346 14.420
summary(x,digits=6)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -4.90360 3.65020 4.99741 4.99778 6.34564 14.42310
quantile(x) # this command yields the quartiles also
# 0% 25% 50% 75% 100%
# -4.903599 3.650201 4.997408 6.345639 14.423129

# quartiles can also be obtained by the following way


sort(x)[1000000*0.25]
# [1] 3.650189
sort(x)[1000000*0.5]
# [1] 4.997408
sort(x)[1000000*0.75]
# [1] 6.345639
Of course, when you try this sequence of commands, you will get different results since rnorm() will
produce RVs from a different seed.
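If you want a run to be repeatable, you can fix the seed yourself with set.seed() before generating the variates. In the following sketch the seed value 123 is an arbitrary choice; any integer works.

```r
set.seed(123)   # arbitrary seed value
a <- rnorm(5)

set.seed(123)   # resetting the same seed restarts the same stream
b <- rnorm(5)

identical(a, b)
# [1] TRUE
```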
In Monte Carlo simulation, we gather a vector of random variables $x_i$, $i = 1, \ldots, n$ as the output.
Since we are interested in the expectation of the output random variable, we estimate this expectation
with the mean of the vector, which is $\bar{x} = \sum_{i=1}^{n} x_i / n$. This estimate is also another random variable,
since when we run the simulation again, we will get a completely different mean. Due to central limit
theorem, as n goes to infinity, the difference between the actual value and this estimate follows a normal
distribution with mean 0 and standard deviation:

$$\frac{1}{n}\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{1}{\sqrt{n}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\mathrm{sd}(x)}{\sqrt{n}}$$

where sd(x) is the standard deviation of output vector and evaluated with sd() in R.

n<-1000000
x<-rexp(n,3) # suppose x is our simulation output vector
# it actually follows exponential distribution with rate 3
# but assume we do not know this fact

mean(x) # this is our expectation that we are interested in
# [1] 0.33318
# we also need the standard deviation of this expectation
sd(x)/sqrt(n) # this is the standard deviation of the mean
# [1] 0.0003337626
sd(x) # DO NOT CONFUSE this with the std. dev. of the mean.
# this is the std. dev. of the output vector which might not
# even follow a normal distribution
# [1] 0.3337626

# Here is an elegant way to summarize your simulation output


xest<-mean(x)

# Since xest approximately follows a normal distribution, we can obtain a 95% confidence interval for it
error<-qnorm(0.975)*sd(x)/sqrt(n) # this is the half-width of the 95% confidence interval
ubound<-xest+error
lbound<-xest-error
res<-c(xest,error,ubound,lbound)
names(res)<-c("result","error estimate","%95ub","%95lb")
res
# result error estimate %95ub %95lb
# 0.3331800165 0.0006541626 0.3338341792 0.3325258539
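The same summary can be wrapped into a small helper so that it can be reused for any output vector and confidence level. This is only a sketch: the function name mcsummary, its level argument and the seed value are assumptions made for illustration, not part of R.

```r
# hypothetical helper: point estimate and normal-approximation confidence interval
mcsummary <- function(x, level = 0.95) {
  n <- length(x)
  est <- mean(x)
  # half-width of the interval; 0.975 quantile when level = 0.95
  err <- qnorm(1 - (1 - level)/2) * sd(x) / sqrt(n)
  c(result = est, error = err, lower = est - err, upper = est + err)
}

set.seed(1)                          # arbitrary seed, only for repeatability
out <- mcsummary(rexp(100000, 3))    # true expectation is 1/3
out
```

A wider level, such as mcsummary(x, level = 0.99), gives a wider interval for the same data.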

4 Creating Functions and Defining Loops in R


4.1 Creating Functions in R
We use the following structure in order to create a specific function which is not already defined in R.
# f<-function(p1,p2,....) # define necessary parameters for the function
# {
# use defined parameters (arguments) and other tools to obtain your result
# write your result variable into the last line so the function will return it
# }
So the sequence of commands says f is a function with parameters (p1,p2,....) which does the
operations in {}.
Check out the following examples of simple functions to comprehend how to create functions in R.
# EXAMPLE 01
# A function that yields the circumference and the area of a circle given the radius
circle<-function(
r # radius length
){
cf<-2*pi*r # evaluates the circumference
a<-pi*r^2 # evaluates the enclosed area
res<-c(cf,a)
names(res)<-c("circumference","area")
res
}

circle(3)
# circumference area
# 18.84956 28.27433
circle(1)
# circumference area
# 6.283185 3.141593

# EXAMPLE 02
# A function that yields the perimeter and the area of a triangle
# given corner coordinates
# Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
triangle<-function(
a, # coordinate of 1st corner (must be a vector of length 2)
b, # coordinate of 2nd corner (must be a vector of length 2)
c # coordinate of 3rd corner (must be a vector of length 2)
){
if(length(a)!=2 || length(b)!=2 || length(c)!=2){
stop("error, coordinates inappropriate") # stop() aborts the function; print() would let it continue
}
# evaluating the perimeter
ab<-sqrt((a[1]-b[1])^2+(a[2]-b[2])^2)
bc<-sqrt((c[1]-b[1])^2+(c[2]-b[2])^2)
ac<-sqrt((a[1]-c[1])^2+(a[2]-c[2])^2)
pm<-ab+bc+ac
# evaluating the area
trab<-abs((a[1]-b[1])*(a[2]-b[2]))/2
trbc<-abs((c[1]-b[1])*(c[2]-b[2]))/2
trac<-abs((a[1]-c[1])*(a[2]-c[2]))/2

maxxy<-pmax(a,b,c)
minxy<-pmin(a,b,c)

sqa<-min(max((a[1]-minxy[1])*(a[2]-minxy[2]),0),max((maxxy[1]-a[1])*(maxxy[2]-a[2]),0))
sqb<-min(max((b[1]-minxy[1])*(b[2]-minxy[2]),0),max((maxxy[1]-b[1])*(maxxy[2]-b[2]),0))
sqc<-min(max((c[1]-minxy[1])*(c[2]-minxy[2]),0),max((maxxy[1]-c[1])*(maxxy[2]-c[2]),0))
area<-(maxxy[1]-minxy[1])*(maxxy[2]-minxy[2])-trab-trbc-trac-sqa-sqb-sqc

pm<-(area!=0)*pm # if area=0, then there is no triangle

res<-c(pm,area)
names(res)<-c("perimeter","area")
res
}

coora<-c(23,18)
coorb<-c(13,34)
coorc<-c(50,5)
triangle(coora,coorb,coorc)
# perimeter area
# 95.84525 151.00000

coora<-c(10,18)
coorb<-c(13,34)
coorc<-c(50,5)
triangle(coora,coorb,coorc)
# perimeter area
# 105.3489 339.5000

coora<-c(3,5)
coorb<-c(9,15)
coorc<-c(6,10)
triangle(coora,coorb,coorc)
# perimeter area
# 0 0

Remember the ordering cost problem in section 2.2. We will create a function that yields the output
in case of a change in unit costs and ordering costs. In this function we will also assign default values to
input parameters. So, whenever a parameter is undefined in the function call, R will assume the default
value for this parameter.

# EXAMPLE 03
orderingcostlist<-function(
huc=7, # higher unit cost
luc=6.5, # lower unit cost

ucc=40, # minimum order amount with the lower unit cost
hfc=50, # higher fixed cost
lfc=15, # lower fixed cost
fcc=45, # maximum order amount with the higher fixed cost
tcub=318 # total cost upper bound
){
units<-30:50
marginalcost<-huc*units*(units<ucc)+luc*units*(units>=ucc)
fixedcost<-hfc*(units<=fcc)+lfc*(units>fcc)
totalcost<-fixedcost+marginalcost
res<-totalcost[totalcost<=tcub]
names(res)<-units[totalcost<=tcub]
res
}

orderingcostlist() # will yield the same results as before


# 30 31 32 33 34 35 36 37 38 40 41 46
# 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0 316.0 310.0 316.5 314.0

orderingcostlist(hfc=55,luc=6.3) # we just change two parameter values


# 30 31 32 33 34 35 36 37 40 41 46 47 48
# 265.0 272.0 279.0 286.0 293.0 300.0 307.0 314.0 307.0 313.3 304.8 311.1 317.4

In order to see the construction of an if-else statement in R, we will implement the following function as
a last example.
 2

 x x < −2
x+6 −2 ≤ x < 0

f (x) =
−x + 6 0≤x<4
 √


x x≥4

# EXAMPLE 04
f<-function(x){
if(x<(-2)){
x^2
}else if(x<0){
x+6
}else if(x<4){
-x+6
}else{
sqrt(x)
}
}

c(f(-4),f(-1),f(3),f(9))
# [1] 16 5 3 3
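Note that if() looks at a single value, so f as written is meant to be called with one number at a time. One way to apply it to a whole vector is sapply(), which calls f once per element. The sketch below repeats the definition of f so that it runs on its own; xs is an illustrative name.

```r
f <- function(x){
  if(x<(-2)){
    x^2
  }else if(x<0){
    x+6
  }else if(x<4){
    -x+6
  }else{
    sqrt(x)
  }
}

xs <- c(-4, -1, 3, 9)
sapply(xs, f)   # calls f on each element and collects the results
# [1] 16  5  3  3
```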

Note that you can also use predefined functions (R functions) as parameters. You will see an example
of this in section 4.2.
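As a small illustration of passing a function as a parameter, the following sketch defines a function that applies its first argument twice. The name applytwice is made up for this example.

```r
# hypothetical helper: g is a function, x is the value it is applied to
applytwice <- function(g, x){
  g(g(x))
}

applytwice(sqrt, 16)                 # sqrt(sqrt(16))
# [1] 2
applytwice(function(x){x+3}, 1)      # a function defined on the spot also works
# [1] 7
```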

4.2 Defining Loops in R


For a predefined number of iterations, we use the following structure:
# for(i in x){ # as i gets sequential values from vector x in each loop

# do required operations depending on i variable
# }
You can do every vectorized operation with a for-loop. But in R, loops take longer to execute. Thus,
it is better to use vectorized operations when possible. The following example estimates the expectation
of the maximum of two standard uniform random variates, Y = max(U1 , U2 ), which is actually equal
to 2/3. We will not use the pmax() function. Instead, we will define a for-loop. This is our first Monte
Carlo simulation in this document.
simmax2unif<-function(n){
y<-0
# in order to record the output of our simulation in "y"
# we should define it before the for-loop
for(i in 1:n){ # i will take integer values from 1 to n
u1<-runif(1)
u2<-runif(1)
y[i]<-max(u1,u2) # record the estimate as the "i"th entry
}
res<-mean(y)
res[2]<-qnorm(0.975)*sd(y)/sqrt(n)
names(res)<-c("expectation","error estimate")
res
}

simmax2unif(100000)
# expectation error estimate
# 0.665354266 0.001463458
system.time(x<-simmax2unif(100000)) # execution time in seconds
# user system elapsed
# 35.30 0.08 35.43

# Do the same simulation with pmax()


simmax2unif_2<-function(n){
u1<-runif(n)
u2<-runif(n)
y<-pmax(u1,u2)
res<-mean(y)
res[2]<-qnorm(0.975)*sd(y)/sqrt(n)
names(res)<-c("expectation","error estimate")
res
}

simmax2unif_2(1000000)
# expectation error estimate
# 0.6665182787 0.0004621282
system.time(x<-simmax2unif_2(100000)) # execution time in seconds
# user system elapsed
# 0.03 0.00 0.03
As you can see, vectorized operations work much faster than loops. Still, under some circumstances,
loops might be the only option to make a computation.
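When a loop cannot be avoided, part of its cost can often be reduced by creating the output vector at its full length before the loop, instead of letting it grow one element at a time. The following variant of the loop version is a sketch under that assumption; the name simmax2unif_3 and the seed are illustrative, and the actual speed-up depends on your R version and machine.

```r
simmax2unif_3 <- function(n){
  y <- numeric(n)              # preallocate n zeros instead of starting from y<-0
  for(i in seq_len(n)){        # seq_len(n) is safe even when n is 0
    y[i] <- max(runif(1), runif(1))
  }
  res <- c(mean(y), qnorm(0.975)*sd(y)/sqrt(n))
  names(res) <- c("expectation", "error estimate")
  res
}

set.seed(42)                   # arbitrary seed, only for repeatability
res <- simmax2unif_3(10000)
res
```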
While-loops are especially useful for convergence algorithms. For an undefined number of iterations, we
use a while-loop, which is defined as follows:
# while(condition){ # as long as the condition is satisfied, run the loop
# do required operations
# }

Here is a basic root finding algorithm that uses a while-loop:
# a root finding algorithm
# finds the unique real root of a continuous function in an interval
# the function should intersect with x-axis and should not be a tangent to x-axis
findroot<-function(
f, # continuous function that we will solve for zero
interval, # the interval where we have a single solution (a vector of length 2)
errbound=1e-12, # maximum approximation error
trace=FALSE # if trace is true, print the convergent sequence
){
a<-interval[1]
b<-interval[2]
if(f(a)*f(b)>0){
print("error - no solution or more than one solution")
}else{
counter<-0
res<-0
err<-abs(a-b)
while(err>errbound){
c<-(a+b)/2
fc<-f(c)
if(f(a)*fc>0){
a<-c
}else{
b<-c
}
err<-abs(a-b)
counter<-counter+1
res[counter]<-a
}
print(c(a,counter))
if(trace){
print(res)
}
}
}

func<-function(x){x^2-2}
int<-c(1,2)
findroot(func,int)
# [1] 1.414214 40.000000
findroot(func,int,trace=TRUE)
# [1] 1.414214 40.000000
# [1] 1.000000 1.250000 1.375000 1.375000 1.406250 1.406250 1.414062 1.414062
# [9] 1.414062 1.414062 1.414062 1.414062 1.414185 1.414185 1.414185 1.414200
# [17] 1.414207 1.414211 1.414213 1.414213 1.414213 1.414213 1.414214 1.414214
# [25] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
# [33] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
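For comparison, base R already ships a root finder, uniroot(), which takes a function and an interval in much the same way. The following sketch solves the same equation with it; the tolerance value is chosen here only to mirror errbound above.

```r
func <- function(x){x^2-2}

# uniroot() returns a list; the root itself is stored in $root
sol <- uniroot(func, interval=c(1,2), tol=1e-12)
sol$root                      # close to sqrt(2)
abs(sol$root - sqrt(2))       # approximation error
```

The returned list also contains the number of iterations in sol$iter, which can be compared with the counter printed by findroot().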

5 Drawing Plot Diagrams and Histograms in R


We would like to draw a plot diagram for the density function of the standard normal distribution in the
interval (-4,4). We should create a dense vector on the x-axis (it should be dense in order to make a good
approximation), and evaluate the function values at those points as a second vector.

x<-seq(-4,4,length.out=51) # this is not dense enough


y<-dnorm(x)
plot(x,y) # plots hollow dots (Figure 1, first diagram)

windows() # opens a new graphics window (Windows only; dev.new() works on any platform)
plot(x,y,type="l") # connects the same dots (Figure 1, second diagram)

x<-seq(-4,4,length.out=10001) # this is a dense vector

y<-dnorm(x)
windows()
plot(x,y,type="l") # connects denser dots (Figure 1, third diagram)
# the diagrams are shown in Figure 1
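To make "dense" more concrete, the sketch below computes the spacing between neighbouring points for the two grids used above. It also shows curve(), a base R alternative that builds the grid itself; the variable names and the choice n = 1001 are assumptions made for this illustration.

```r
# spacing between neighbouring grid points for the two vectors used above
coarse <- seq(-4, 4, length.out = 51)
fine <- seq(-4, 4, length.out = 10001)
step_coarse <- diff(coarse)[1]   # 8/50
step_fine <- diff(fine)[1]       # 8/10000
step_coarse
step_fine

# curve() evaluates the expression in x on its own grid of n points
curve(dnorm(x), from = -4, to = 4, n = 1001)
```

With curve() there is no need to create x and y explicitly, which is convenient for a quick look at a function.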

Now, we want to see how to draw a histogram of a vector in R with hist(). Histograms are
handy tools for seeing the distribution of given data. You can obtain a finer histogram by changing the breaks
parameter.
x<-rnorm(1000000,3,1.5)
# a vector of normal RVs with mean 3 and std. dev. 1.5

hist(x)

windows()
hist(x,breaks=50)

windows()
hist(x,breaks=100)
# the histograms are shown in Figure 2
You can also add new lines and curves to a plot or a histogram which is already displayed.
We use the lines() command, which is used in a similar way to the plot() command. This time, there is no need
for a type parameter. You can also add straight lines to existing diagrams with the abline() command. Check
out the following examples.
hist(x,breaks=100)
y<-seq(-5,10,length.out=100001)
lines(y,dnorm(y,3,1.5)*200000)

y<-seq(-5,10,length.out=101)
windows()
plot(y,dnorm(y,3,1.5))
lines(y,dnorm(y,3,1.5))

windows()
plot(y,dnorm(y,3,1.5),type="l")
abline(v=4.5) # add a "v"ertical line on x=4.5
abline(v=1.5) # add a "v"ertical line on x=1.5
abline(h=dnorm(1.5,3,1.5)) # add a "h"orizontal line on y=dnorm(1.5,3,1.5)
abline(a=0.10,b=0.01) # add a line with slope=0.01 and intercept=0.10
# the diagrams are shown in Figure 3
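The factor 200000 in the first example rescales the density to the count scale of the histogram: a count histogram has total area equal to the number of observations times the bin width, so if hist() chooses bins of width 0.2, the factor is 1000000 × 0.2 = 200000. The sketch below computes this factor from the histogram object instead of hard-coding it, and also shows the alternative of drawing the histogram on the density scale with freq=FALSE. The seed and the smaller sample size are assumptions made to keep the example quick and reproducible.

```r
set.seed(1)                      # hypothetical seed, for reproducibility only
x <- rnorm(100000, 3, 1.5)       # smaller sample than in the text

# variant 1: count histogram, density scaled by n * bin width
h <- hist(x, breaks = 100)
binwidth <- diff(h$breaks)[1]    # width of the bins chosen by hist()
scale <- length(x) * binwidth
y <- seq(-5, 10, length.out = 1001)
lines(y, dnorm(y, 3, 1.5) * scale)

# variant 2: histogram on the density scale, no scaling needed
hist(x, breaks = 100, freq = FALSE)
lines(y, dnorm(y, 3, 1.5))
```

The second variant stays correct even if the sample size or the number of breaks is changed later.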

Figure 1: Plot diagrams for the density function of standard normal distribution

Figure 2: Histograms of a vector of normal RVs with mean 3 and std. dev. 1.5

Figure 3: Adding lines on existing diagrams with lines() (1-2) and abline() (3) commands

6 Basic User Information
6.1 Scanning and Printing Data
Assume that you have data (containing real numbers) written in a text file in the following format.
3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values
You can use the command scan() to store this data in a vector by scanning it from left to
right and top to bottom. Spaces and new lines separate the values, and each value is stored at a new index.
x<-scan()
# press enter after writing this line, it will display "1:" on the command line
# Press CTRL+V to paste the copied data, 15 real values will be stored in x
# it will display "16:" on the command line
# Press enter in order to finish scanning process, 16th index will be ignored

# 1: 3 25 94.9 12
# 5: 547 32556 56
# 8: 89 567
# 10: 435 342.1
# 12: 76.5 983.2
# 14: 0 343
# 16:
# Read 15 items

x
# [1] 3.0 25.0 94.9 12.0 547.0 32556.0 56.0 89.0 567.0
# [10] 435.0 342.1 76.5 983.2 0.0 343.0
You can also scan a column of cells from an Excel sheet, but not rows. Be careful: the decimal
separator in R is (.), so by default you can only scan values that use (.) as the decimal separator.
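If your data uses a comma as the decimal separator, one possible workaround is the dec argument of scan(). The following sketch reads a short string through the text argument instead of pasted input; the sample values are made up for illustration.

```r
# values written with a comma as the decimal separator
x <- scan(text = "3,5 25 94,9", dec = ",")
x   # stored as ordinary numeric values
```

The same dec argument can be used when scanning pasted data or a file.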
You can also read tables from a text file. Assume you have a text file containing data in a format
similar to the following:
length weight age
1.72 72.3 25
1.69 85.3 23
1.80 75.0 26
1.61 66 23
1.73 69 24
# 3 values in each row
Right-click the R shortcut on your desktop. Choose Properties and look at your “Start In” directory
(you can also change it). Copy your text file and paste it into that directory. Suppose it is named data.txt.
Write the following command:
x<-read.table(file="data.txt",header=TRUE)
# if you do not have any headers in your data, choose header as FALSE
x # press enter to display x table
# length weight age
# 1 1.72 72.3 25

# 2 1.69 85.3 23
# 3 1.80 75.0 26
# 4 1.61 66.0 23
# 5 1.73 69.0 24
x$length
# [1] 1.72 1.69 1.80 1.61 1.73
x$weight
# [1] 72.3 85.3 75.0 66.0 69.0
x$age
# [1] 25 23 26 23 24
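The directory in which R looks for data.txt can also be checked from within R with getwd() and changed with setwd(). The sketch below is self-contained: it writes the sample table to a temporary file (a stand-in for data.txt, used here only so that the example runs anywhere) and reads it back.

```r
getwd()   # shows the current working directory

# write the sample table to a temporary file (stand-in for data.txt)
datafile <- tempfile(fileext = ".txt")
writeLines(c("length weight age",
             "1.72 72.3 25",
             "1.69 85.3 23",
             "1.80 75.0 26",
             "1.61 66 23",
             "1.73 69 24"), datafile)

x <- read.table(file = datafile, header = TRUE)
str(x)        # a data frame with 5 rows and 3 columns
mean(x$age)   # columns behave like ordinary vectors
unlink(datafile)   # remove the temporary file
```

Because each column is a vector, everything shown earlier for vectors can be applied to x$length, x$weight and x$age.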
In order to read a table from an Excel sheet, you can simply copy and paste it into a text file and then read
the table from that file.
You can print a message or a vector within a function by using the print() command. To print a
message, do not forget to put it in quotation marks.

print("error")
# [1] "error"
x<-1:5
print(x)
# [1] 1 2 3 4 5
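When a message should combine text and values, paste() and cat() are two further base R options. The sketch below is only an illustration; the variable names and the message text are made up.

```r
iterations <- 40
# paste() glues text and values into one string, which print() can display
msg <- paste("finished after", iterations, "iterations")
print(msg)
# cat() writes its arguments directly, without the [1] prefix and the quotes
cat("finished after", iterations, "iterations\n")
```

cat() does not add a line break by itself, which is why the last argument ends with \n.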

6.2 Session Management


You can find detailed information about the functions which come with R. The help page of a function
describes its parameters (arguments) and gives a few examples of its use.
Just write ? followed by the name of the function that you want to learn about. Check out the
explanations given in R for the following functions.
?det
?sample
?sin
?cbind
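If you only need a quick reminder of the arguments of a function, without opening the help page, args() and formals() can be used, as in the sketch below. The commented lines are further help commands that are more useful in an interactive session; the search term is just an example.

```r
args(sample)             # prints the argument list of sample()
names(formals(sample))   # the argument names as a character vector

# example(sample)              # runs the examples from the help page
# help.search("determinant")   # searches the help pages; same as ??determinant
```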
You can use apropos(".") to list all functions and other objects whose names contain a specific string.
These may be provided by the loaded packages or defined by you in the current work session.
apropos("norm")
# [1] "dlnorm" "dnorm" "normalizePath" "plnorm"
# [5] "pnorm" "qlnorm" "qnorm" "qqnorm"
# [9] "qqnorm.default" "rlnorm" "rnorm"

apropos("exp")
# [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"
# [4] ".mergeExportMethods" ".standard_regexps" "as.expression"
# [7] "as.expression.default" "char.expand" "dexp"
# [10] "exp" "expand.grid" "expand.model.frame"
# [13] "expm1" "expression" "getExportedValue"
# [16] "getNamespaceExports" "gregexpr" "is.expression"
# [19] "namespaceExport" "path.expand" "pexp"
# [22] "qexp" "regexpr" "rexp"
# [25] "SSbiexp" "USPersonalExpenditure"
If you need to see all the objects that you have created in your work session, simply write objects().
objects()
# [1] "a" "b" "circle" "coora"

# [5] "coorb" "coorc" "error" "f"
# [9] "findroot" "fixedcost" "func" "int"
# [13] "lbound" "marginalcost" "n" "orderingcostlist"
# [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound" "units" "vec"
# [25] "x" "xest" "xinv" "y"
# [29] "y1" "y2" "y3" "y4"
# [33] "y5" "y6" "z"
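Objects that are no longer needed can be removed with rm(), and exists() checks whether a name is currently defined. A minimal sketch, using a made-up object name:

```r
tmp_demo <- 1:3           # hypothetical object, created only for this example
"tmp_demo" %in% ls()      # ls() returns the same list as objects()
exists("tmp_demo")        # TRUE while the object exists
rm(tmp_demo)              # remove it from the work session
exists("tmp_demo")        # FALSE after removal
# rm(list = ls())         # removes all objects; use with care
```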

7 Exercises
1. Write an R function that takes
• your initial capital K = 100,
• continuously compounding interest rate r = 0.12,
• a vector ty of times (yearly)
as parameters (arguments). The function should return the state of your capital at the end of the time
periods given in the ty vector. The function should also produce a plot which shows the state
of the capital on the y-axis and time on the x-axis.
Note: The state of the capital at time t is evaluated by $K(t) = K \times e^{rt}$.

(a) Run the function in order to show your capital state at the end of every year until the end of
the 10th year.
(b) Run the function in order to show your capital state at the end of every month until the end
of the 3rd year.

2. We are interested in finding π with Monte Carlo simulation.


Hint: Generate two vectors of standard uniform RVs of length n = 10000. Coupling them will yield
uniformly distributed points on [0, 1] × [0, 1]. Count the number of points which fall into the unit
circle, say it is x. Now, 4 × x/n should yield an estimate for π. Repeat the procedure nout = 100
times.
Note: You can find interesting information about π in the link:
http://www.sixtysymbols.com/videos/035.htm
3. James and Dwight are flipping a coin which has a head probability p. James scores 1 point whenever
head comes and Dwight scores 1 point whenever tail comes. The game ends whenever somebody
gets 10 points ahead. We are interested in the number of coin flips that James and Dwight should
realize in order to claim a winner. Use Monte Carlo simulation to solve the following questions.

(a) What is the expected number of coin flips if p = 0.4? Draw a histogram of the simulation output.
(b) What is the expected number of coin flips if p = 0.5? Draw a histogram of the simulation output.
(c) What is the expected number of coin flips if p = 0.55? Draw a histogram of the simulation output.

Note: The difference between the scores at the i-th coin flip, D(i) = j(i) − d(i), is called a random walk
process. The problem can be solved analytically with a Markov chain structure; however, we are
interested in a solution with Monte Carlo simulation.
Hint 1 (primitive method): Create a for-loop of length n that stores the number of coin flips needed
to claim a winner (in a result vector).
In each iteration of the for-loop, run a while-loop that generates a standard uniform random variate to identify the
winner of the flip and adds one point to his score. Before closing the while-loop, evaluate the absolute difference
between the scores (this decides whether to break out of the while-loop or not). Also put a counter into the
while-loop, in order to find the number of coin flips. Store this number in the corresponding index
of your result vector just when you break out of the while-loop.
Hint 2 (a faster and more professional way): Generate a vector of standard uniform RVs (of
length k). Identify the winner of each round with a single uniform RV. Use a trick with cumsum()
(How?). If a sequence of k rounds is not enough to reach an absolute difference of 10 points, append another
k uniform RVs to the end of the previous vector of uniform RVs. Go on until you get a 10-point
difference. Then, find the first index that yields a 10-point absolute difference.
Repeat the whole procedure n times.
