R Tutorial PDF

An Easy Introduction To R for IE 460, IE 508 and IE 586 Course Participants
İsmail Başoğlu, February 23, 2012
Introduction
This document is an easy introduction to the programming language R. Working through its examples should give you a basic knowledge of R, which will help you

- to use predefined functions of statistical forecasting models and carry out an effective analysis of given data or time series in the IE 460 Statistical Forecasting and Time Series course,
- to run statistical tests, build statistical models and apply inferential methods on the topics of the IE 508 Statistical Inference course,
- to create financial applications and implement Monte Carlo methods in the IE 586 Quantitative Finance course.

To get comfortable with the language, we recommend that you try every step of the applications presented in this document.

You can download the latest version of R from http://cran.r-project.org/. Windows users should click the Windows link, then the base link, which shows the download link for the *.exe file. Once R is installed, we recommend writing your code in script files. Click File in the quick access bar, then New script, and write your code inside this script. When the script is complete, press Ctrl+A and then Ctrl+R to run all of it in the R console. You can save your script files and reopen them later through File and Open script. Have fun!
2 R Works with Vectors

2.1 Creating Vectors
To assign a value to a variable (e.g. 3 to x), we write one of the following:

    x <- 3
    x = 3

We will use the operator <- for assignments in all later examples (see footnote 1). When we assign a number to a variable, R treats it as a vector with a single element. So, by using [.] next to the variable name, we can assign another element to any index we want. To see what is stored in a variable, we simply type its name and press enter.

    x[4] <- 7.5
    x # press enter to display the content of x
    # [1] 3.0 NA NA 7.5

Here, NA stands for "not available": we have not assigned any values to the second and third indices of the vector x. You can use # to add a comment on a command line. R ignores the rest of the line after this symbol and resumes execution on the next line, so you do not have to close a comment with another #.

We can create a vector of consecutive integers between two integers with a simple command.

    x <- 1:8 # creates a consecutive integer vector
    x
    # [1] 1 2 3 4 5 6 7 8
    y <- 15:11
    y
    # [1] 15 14 13 12 11

The following operation adds 3 to each element of x and stores the result as y.

    y <- x+3
    y
    # [1] 4 5 6 7 8 9 10 11
The previous command actually adds a vector of length 8 to a vector with a single element. R repeats the shorter vector again and again until it reaches the length of the longer one. The following sequence of commands shows this operation clearly.

    x <- 1:8
    y <- 1:4
    x
    # [1] 1 2 3 4 5 6 7 8
    y
    # [1] 1 2 3 4
    x+y # we can see the sum without storing it in a new variable
    # [1] 2 4 6 8 6 8 10 12

In this sum, y is repeated up to index 8 (since x is a vector of length 8). So the fifth element of x is added to the first element of y, the sixth element of x to the second element of y, and so forth. We might wonder what happens if the length of x is not a multiple of the length of y. We can try it and see.
Footnote 1: There are significant differences between the assignment operators, but they are not important for the applications in the IE 460, IE 508 and IE 586 courses. A brief explanation is available at http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
    x <- 1:8
    y <- 1:3
    x+y
    # [1] 2 4 6 5 7 9 8 10
    # Warning message:
    # In x + y : longer object length is not a multiple of shorter object length

R again repeats the shorter vector until it reaches the length of the longer one, but the last repetition is incomplete. R returns a warning message about this, yet it still executes the operation. Subtraction, multiplication, division, powers and modular arithmetic work in the same way. We will get into these operations in section 2.4.

We can also create vectors from specified values. For instance, let us create a vector of length 6 with values 4, 8, 15, 16, 23, 42 and another vector of length 4 with values 501, 505, 578, 586. We use the function c() to combine those values into a vector. The function length() returns the number of elements in a vector.

    x <- c(4,8,15,16,23,42) # "c"ombines a series of values
    y <- c(501,505,578,586)
    x
    # [1] 4 8 15 16 23 42
    y
    # [1] 501 505 578 586
    z <- c(x,y) # we can also combine two vectors
    z
    # [1] 4 8 15 16 23 42 501 505 578 586
    length(z)
    # [1] 10

We can also reverse a vector, from the last element to the first.

    z <- rev(z) # we can assign the result back to the same object
    z
    # [1] 586 578 505 501 42 23 16 15 8 4

Suppose we would like to create a vector of length 10 whose elements are all equal to 5. We do the following.

    x <- rep(5,10) # "rep"eat 5 ten times
    x
    # [1] 5 5 5 5 5 5 5 5 5 5
    y <- c(3,5,7)
    z <- rep(y,4) # repeat vector y 4 times
    z
    # [1] 3 5 7 3 5 7 3 5 7 3 5 7
    rep(y,c(2,3,5)) # repeat each element of y as many times as
                    # the corresponding element of the second vector
    # [1] 3 3 5 5 5 7 7 7 7 7

As the previous example shows, we can also repeat whole vectors. As a last example for this section, we would like to create a vector of length 21 between the values 2 and 3, such that the differences between consecutive elements are all equal.

    x <- seq(2,3,length.out=21) # "seq" stands for sequence
    x
    # [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
    # [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00
If we care about the step size rather than the length of the sequence, we can use the by parameter instead of length.out.

    x <- seq(2,3,by=0.05)
    x
    # [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50
    # [12] 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00
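The repetition of the shorter vector described earlier in this section can be written out by hand. The following is only a sketch of that rule, using the base function rep_len() to stretch the shorter vector explicitly; the variable names are chosen here for illustration.

```r
# Sketch of the recycling rule: the shorter operand is stretched
# to the length of the longer one before the elementwise addition.
x <- 1:8
y <- 1:4

y_recycled <- rep_len(y, length(x))  # 1 2 3 4 1 2 3 4
explicit   <- x + y_recycled         # addition with the stretched vector
implicit   <- x + y                  # R stretches y on its own

identical(explicit, implicit)        # both ways give the same vector
```

Writing the stretched vector explicitly is rarely needed in practice, but it makes it easier to predict the result when the lengths differ.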
2.2 Logical Expressions
You can use the following comparison operators to write logical expressions. They return a vector of TRUEs and FALSEs, which R treats as ones and zeros when the vector is used in arithmetic operations.

- <  : less than
- <= : less than or equal to
- >  : greater than
- >= : greater than or equal to
- == : equal to (remember that a single = symbol is used for assigning values)
- != : not equal to

In the following sequence of examples, we create a vector and use it in different logical expressions. If a vector element satisfies the expression, the result holds TRUE at the corresponding index, otherwise FALSE. Between logical expressions that each yield a single value, you can use && as "and" and || as "or". (The elementwise versions for longer vectors are & and |.)

    x <- 10:20
    x
    # [1] 10 11 12 13 14 15 16 17 18 19 20
    x<17
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
    x<=17
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
    x>14
    # [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
    x>=14
    # [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    x==16
    # [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
    x!=16
    # [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
    x <- 5
    (x<=10) && (x>=8)
    # [1] FALSE
    (x<=10) || (x>=8)
    # [1] TRUE

So, what can we do with these logical expressions? As a first simple example, take a vector x of the integers from 1 to 20. We want to obtain a vector that holds zero wherever the element of x is less than 8 and keeps the other elements as they are in x.

    x <- 1:20
    y <- (x>=8)*(x)
    y
    # [1] 0 0 0 0 0 0 0 8 9 10 11 12 13 14 15 16 17
    #[18] 18 19 20
As a second example, we will evaluate the ordering costs of some goods. We can order at least 30 and at most 50 units of goods from our supplier in a single order. We have a fixed cost of 50$ if we order 45 units or fewer and 15$ otherwise. A single unit costs 7$ if we order fewer than 40 units and 6.5$ otherwise. To evaluate the total ordering cost for each alternative:

    units <- 30:50
    marginalcost <- 7*units*(units<40)+6.5*units*(units>=40)
    marginalcost
    # [1] 210.0 217.0 224.0 231.0 238.0 245.0 252.0 259.0
    # [9] 266.0 273.0 260.0 266.5 273.0 279.5 286.0 292.5
    #[17] 299.0 305.5 312.0 318.5 325.0
    fixedcost <- 50*(units<=45)+15*(units>45)
    fixedcost
    # [1] 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 15
    #[18] 15 15 15 15
    totalcost <- fixedcost+marginalcost
    totalcost
    # [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
    # [9] 316.0 323.0 310.0 316.5 323.0 329.5 336.0 342.5
    #[17] 314.0 320.5 327.0 333.5 340.0

Following from the previous example, say we are not interested in any order that costs more than 318$. Under these circumstances, we want a list of the order sizes that remain possible and a list of the costs that correspond to them.

    units[totalcost<=318] # returns the order sizes corresponding
                          # to a cost less than or equal to 318
    # [1] 30 31 32 33 34 35 36 37 38 40 41 46
    totalcost[totalcost<=318] # returns the total costs that are less than
                              # or equal to 318 (got the idea?)
    # [1] 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0
    # [9] 316.0 310.0 316.5 314.0

The first of the previous two commands yields only those elements of the units vector for which the corresponding element of the totalcost vector is less than or equal to 318. The second command yields only those elements of the totalcost vector that are less than or equal to 318. As in the previous example, we can extract a subvector (a subset of a vector that keeps the original ordering) from a vector in different ways. Check out the following examples:
    x <- seq(5,8,by=0.3) # we will have 11 elements in this vector
    x
    # [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0
    length(x)
    # [1] 11
    y1 <- x[3:7] # extract a subvector from the indices 3 to 7
    y1
    # [1] 5.6 5.9 6.2 6.5 6.8
    y2 <- x[2*(1:5)] # extract a subvector from even indices
    y2
    # [1] 5.3 5.9 6.5 7.1 7.7
    y3 <- x[-1] # extract a subvector by eliminating the first index
    y3
    # [1] 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7 8.0
    y4 <- x[-length(x)] # extract a subvector by eliminating the last index
    y4
    # [1] 5.0 5.3 5.6 5.9 6.2 6.5 6.8 7.1 7.4 7.7
    y5 <- x[-seq(1,11,3)] # extract a subvector by eliminating all indices given
    y5
    # [1] 5.3 5.6 6.2 6.5 7.1 7.4 8.0
    y6 <- x[seq(1,11,3)] # extract a subvector by choosing all indices given
    y6
    # [1] 5.0 5.9 6.8 7.7
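Logical expressions and subvector extraction combine naturally when a condition has two parts. The sketch below assumes the elementwise operators & and | of base R (as opposed to && and ||, which are meant for single values); the variable names are chosen only for illustration.

```r
# Sketch: elementwise logical operators used for subsetting.
# & and | compare vectors element by element, while && and ||
# are meant for single TRUE/FALSE values (e.g. inside if()).
x <- seq(5, 8, by = 0.3)

inside  <- x[x > 5.5 & x < 7]    # elements strictly between 5.5 and 7
outside <- x[x <= 5.5 | x >= 7]  # all remaining elements

idx <- which(x > 5.5 & x < 7)    # positions of the matches instead of values

n_inside <- sum(x > 5.5 & x < 7) # TRUE counts as 1, so this counts matches
```

Here idx holds the same positions as the range 3:7 used for y1 above, so x[idx] and y1 describe the same subvector.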
2.3 Creating Matrices
Every vector we create with the methods of sections 2.1 and 2.2 behaves as a vertical (column) vector by default. Do not be confused by the way R displays it on a single line. We can create a horizontal (row) vector with the function t(), where t stands for transpose.

    x <- 1:5
    y <- t(x)
    y
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5

As you can see, R displays a horizontal vector in a completely different way. If we take the transpose of the vector y, we see the actual display of a vertical vector.

    t(y) # or we can just go with t(t(x))
    #      [,1]
    # [1,]    1
    # [2,]    2
    # [3,]    3
    # [4,]    4
    # [5,]    5

In order to create an m × n matrix in R, we first create a vector (let us name it vec) that contains the columns of the matrix sequentially, from the first to the last. Then we simply use the function matrix(vec,nrow=m,ncol=n).
    vec <- 1:12
    x <- matrix(vec,nrow=3,ncol=4)
    x
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    4    7   10
    # [2,]    2    5    8   11
    # [3,]    3    6    9   12
    t(x)
    #      [,1] [,2] [,3]
    # [1,]    1    2    3
    # [2,]    4    5    6
    # [3,]    7    8    9
    # [4,]   10   11   12

It is also possible to assign the elements of a matrix row by row.

    vec <- 1:12
    x <- matrix(vec,nrow=3,ncol=4,byrow=TRUE)
    x
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    2    3    4
    # [2,]    5    6    7    8
    # [3,]    9   10   11   12

You can take the inverse of an n × n matrix with the solve() function.

    x <- matrix(c(1,2,-1,1,2,1,2,-2,-1),nrow=3,ncol=3)
    x
    #      [,1] [,2] [,3]
    # [1,]    1    1    2
    # [2,]    2    2   -2
    # [3,]   -1    1   -1
    xinv <- solve(x)
    xinv
    #           [,1]        [,2] [,3]
    # [1,] 0.0000000  0.25000000 -0.5
    # [2,] 0.3333333  0.08333333  0.5
    # [3,] 0.3333333 -0.16666667  0.0

You can create a matrix whose elements are all equal by writing that specific value into the first parameter position of the function matrix(). You can also assign a vector to the diagonal elements of a square matrix with the function diag().
    x <- matrix(0,nrow=4,ncol=4)
    x
    #      [,1] [,2] [,3] [,4]
    # [1,]    0    0    0    0
    # [2,]    0    0    0    0
    # [3,]    0    0    0    0
    # [4,]    0    0    0    0
    diag(x) <- 1 # assigns 1 to all diagonal elements of x
    x
    #      [,1] [,2] [,3] [,4]
    # [1,]    1    0    0    0
    # [2,]    0    1    0    0
    # [3,]    0    0    1    0
    # [4,]    0    0    0    1

You can learn the number of columns, the number of rows and the total number of elements of a matrix with the following functions.

    x <- matrix(0,ncol=5,nrow=4)
    ncol(x)
    # [1] 5
    nrow(x)
    # [1] 4
    length(x)
    # [1] 20
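A quick way to convince yourself that solve() returned the inverse is to multiply it back and compare with the identity matrix. The sketch below uses the matrix product operator %*%, which is covered in section 2.4, and compares with all.equal() because floating point results are rarely exactly equal. The right-hand side b is a hypothetical vector chosen for illustration.

```r
# Sketch: checking an inverse and solving a linear system.
x <- matrix(c(1, 2, -1, 1, 2, 1, 2, -2, -1), nrow = 3, ncol = 3)
xinv <- solve(x)

# x times its inverse should be the 3 x 3 identity matrix
# (diag(3) creates that identity matrix)
is_identity <- isTRUE(all.equal(x %*% xinv, diag(3)))

# solve(x, b) solves the system x %*% z = b without forming the inverse
b <- c(1, 2, 3)   # hypothetical right-hand side
z <- solve(x, b)
residual <- as.vector(x %*% z) - b   # should be numerically zero
```

When only the solution of a linear system is needed, calling solve(x, b) directly is usually preferable to computing the inverse first.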
2.4 Arithmetic Operations on R
We gave a short introduction to arithmetic operations in section 2.1. We stated that we can add and multiply two vectors componentwise, and that subtraction, division and modular arithmetic work in the same manner.

    x <- 2*(1:5)
    x
    # [1] 2 4 6 8 10
    y <- 1:5
    y
    # [1] 1 2 3 4 5
    x+y
    # [1] 3 6 9 12 15
    x*y
    # [1] 2 8 18 32 50
    x/y
    # [1] 2 2 2 2 2
    x-y
    # [1] 1 2 3 4 5
    x^2 # makes a power operation
    # [1] 4 16 36 64 100
    x^y
    # [1] 2 16 216 4096 100000
    x%%3 # yields mod(3) of every element in x
    # [1] 2 1 0 2 1
    y <- 3:7
    y
    # [1] 3 4 5 6 7
    x%%y # makes a componentwise modular operation
    # [1] 2 0 1 2 3
    x%/%y # makes an integer division
    # [1] 0 1 1 1 1

In the previous example, x and y were vertical vectors. Even if one of them were defined as a horizontal vector, R would still do those operations, but this time the result would also be a horizontal vector.

You can find the maximum value of a vector with max() and its minimum value with min(). You can sum up all the elements of a vector with sum() and take the product of all the elements of a vector with prod().

    x <- c(3,1,6,5,8,10,9,12,3)
    min(x)
    # [1] 1
    max(x)
    # [1] 12
    sum(x)
    # [1] 57
    prod(x)
    # [1] 2332800

You can compare two vectors componentwise with pmax() and pmin(), which return the componentwise maximum and the componentwise minimum of the vectors (see footnote 2). You can also sort a vector with the function sort(), while the function order() yields the index sequence of the sorted vector. Both functions sort from minimum to maximum by default, but the additional parameter decreasing=TRUE gives an order from maximum to minimum.

    x <- 1:10
    y <- 10:1
    z <- c(3,2,1,6,5,4,10,9,8,7)
    a <- pmax(x,y,z) # you can write as many vectors as you want
    a
    # [1] 10 9 8 7 6 6 10 9 9 10
    sort(a)
    # [1] 6 6 7 8 9 9 9 10 10 10
    order(a)
    # [1] 5 6 4 3 2 8 9 1 7 10
    b <- pmin(x,y,z)
    b
    # [1] 1 2 1 4 5 4 4 3 2 1
    sort(b,decreasing=TRUE)
    # [1] 5 4 4 4 3 2 2 1 1 1
    order(b,decreasing=TRUE)
    # [1] 5 4 6 7 8 2 9 1 3 10
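The relation between sort() and order() is easiest to see by using the index sequence from order() as a subscript. The following sketch also reorders a second vector by the ranking of the first; the labels vector is hypothetical and only serves as an illustration.

```r
# Sketch: order() returns indices, and indexing with them sorts the vector.
b <- c(1, 2, 1, 4, 5, 4, 4, 3, 2, 1)

idx <- order(b, decreasing = TRUE)
same_as_sort <- identical(b[idx], sort(b, decreasing = TRUE))

# Reorder a second vector by the ranking of the first one
labels <- letters[1:10]          # hypothetical labels "a" ... "j"
labels_by_rank <- labels[idx]    # label of the largest element comes first

# Index of the first maximum and of the first minimum
i_max <- which.max(b)
i_min <- which.min(b)
```

This is the usual way to sort one vector according to the values of another, for example sorting order sizes by their total cost.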
R can also do matrix multiplication with the %*% operator. Handle this operator carefully to obtain correct results, and make sure of the dimensions of your matrices. When one of the operands is a plain vector, R can make some corrections by treating it as a horizontal or a vertical vector so that the dimensions agree.
Footnote 2: For IE 586 students, this might come in very handy, especially in option pricing simulation.
    x <- matrix(1:6,ncol=2,nrow=3)
    x
    #      [,1] [,2]
    # [1,]    1    4
    # [2,]    2    5
    # [3,]    3    6
    y <- matrix(1:4,ncol=2,nrow=2)
    y
    #      [,1] [,2]
    # [1,]    1    3
    # [2,]    2    4
    x%*%y
    #      [,1] [,2]
    # [1,]    9   19
    # [2,]   12   26
    # [3,]   15   33
    y%*%x
    # Error in y %*% x : non-conformable arguments
    y%*%t(x) # taking the transpose should help
    #      [,1] [,2] [,3]
    # [1,]   13   17   21
    # [2,]   18   24   30

Consider the matrix multiplication of two vertical vectors. R treats the first vector as a horizontal vector, and the operation yields a scalar (displayed as a 1 × 1 matrix). If we tried a matrix multiplication of two horizontal vectors, R would not be able to make any correction and would return an error. To obtain an outer product, the second vector must strictly be horizontal.

    x <- 1:3
    y <- 3:1
    x%*%y # R makes a correction by applying transpose to x
    #      [,1]
    # [1,]   10
    t(x)%*%t(y)
    # Error in t(x) %*% t(y) : non-conformable arguments
    t(x)%*%y # same as the first operation but we have a correct notation now
    #      [,1]
    # [1,]   10
    x%*%t(y) # only this one returns an outer product
    #      [,1] [,2] [,3]
    # [1,]    3    2    1
    # [2,]    6    4    2
    # [3,]    9    6    3

Given a vector of real values, you can obtain the vector of cumulative sums with the function cumsum() and the vector of cumulative products with the function cumprod(). On the other hand, diff() gives you the differences between the consecutive elements of a vector.
    x <- c(1,4,5,6,2,12)
    y <- cumsum(x)
    y
    # [1] 1 5 10 16 18 30
    # every index has the sum of the elements in x up to that index
    z <- cumprod(x)
    z
    # [1] 1 4 20 120 240 2880
    # every index has the product of the elements in x up to that index
    diff(z)
    # [1] 3 16 100 120 2640
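cumsum() and diff() undo each other, which is a useful check when moving between increments and running totals. The sketch below illustrates this with the vector from the example above; the running mean at the end is an extra illustration and the variable names are chosen here.

```r
# Sketch: diff() recovers the original increments from a cumulative sum.
x <- c(1, 4, 5, 6, 2, 12)
y <- cumsum(x)

# Prepend 0 so that the first difference is the first element itself
recovered <- diff(c(0, y))
round_trip_ok <- isTRUE(all.equal(recovered, x))

# Running mean: cumulative sum divided by the number of elements so far
running_mean <- cumsum(x) / seq_along(x)
```

The last entry of running_mean equals mean(x), since by then all elements have been included.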
You can evaluate the factorial of a non-negative real number with factorial() and the absolute value of any real number with abs(). You can take the square root of a non-negative real number with sqrt() and the logarithm of a positive real number with log(). You can compute the exponential function of a real number with exp() and the gamma function of a positive real number with gamma(). For integer rounding, floor() yields the largest integer that is less than or equal to the specified value, and ceiling() yields the smallest integer that is greater than or equal to the specified value. as.integer() yields only the integer part of the specified value, that is, it truncates toward zero.

    factorial(3)
    # [1] 6
    factorial(1:6)
    # [1] 1 2 6 24 120 720
    abs(-4)
    # [1] 4
    abs(c(-3:3))
    # [1] 3 2 1 0 1 2 3
    sqrt(4)
    # [1] 2
    sqrt(1:9)
    # [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
    # [9] 3.000000
    log(100) # this is the natural logarithm unless a base is given
    # [1] 4.60517
    log10(100) # this is the logarithm with base 10
    # [1] 2
    log2(100) # this is the logarithm with base 2
    # [1] 6.643856
    log(100,5) # this is the logarithm with base 5, the second parameter of log()
    # [1] 2.861353
    log(c(10,20,30,40))
    # [1] 2.302585 2.995732 3.401197 3.688879
    exp(4.60517) # must yield 100, maybe with a rounding error
    # [1] 99.99998
    exp(log(100)) # no rounding errors
    # [1] 100
    exp(seq(-2,2,0.4))
    # [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247
    # [8] 2.2255409 3.3201169 4.9530324 7.3890561
    gamma(5) # equivalent to factorial(4)
    # [1] 24
    gamma(5.5) # equivalent to factorial(4.5)
    # [1] 52.34278
    x <- c(-3,-3.5,4,4.2)
    floor(x)
    # [1] -3 -4 4 4
    ceiling(x)
    # [1] -3 -3 4 5
    as.integer(x)
    # [1] -3 -3 4 4
3 Probability and Statistical Basis of R

3.1 Probability Functions in R
There are four functions for each of the well-known and commonly used distributions of probability theory and statistics. Let us define those functions for the normal distribution first and then talk about the other probability distributions available in R.

- dnorm(x,y,z): returns the pdf (probability density function) value at x of a normal distribution with mean y and standard deviation z.
- pnorm(x,y,z): returns the cdf (cumulative distribution function) value at x of a normal distribution with mean y and standard deviation z.
- qnorm(x,y,z): returns the inverse cdf value at x of a normal distribution with mean y and standard deviation z. Clearly x must be in the unit interval (x ∈ [0, 1]).
- rnorm(x,y,z): returns a vector of random variates (RVs) of length x. The variates follow a normal distribution with mean y and standard deviation z.

Check out the following examples on the normal distribution:

    dnorm(0.5) # if no parameter is defined, R assumes a std. normal distribution
    # [1] 0.3520653
    dnorm(0,2,1)
    # [1] 0.05399097
    dnorm(3,3,5)
    # [1] 0.07978846
    pnorm(0) # the area below the curve
             # on the left side of "0" in a std. normal distribution
    # [1] 0.5
    pnorm(2)
    # [1] 0.9772499
    pnorm(5,3,1)
    # [1] 0.9772499
    # following are the inverse of the previous "pnorm()" functions
    qnorm(0.5)
    # [1] 0
    qnorm(0.9772499)
    # [1] 2.000001
    qnorm(0.9772499,3,1)
    # [1] 5.000001
    rnorm(20,2,1) # will generate 20 RVs which follow a normal dist.
                  # with mean 2 and std. dev. 1
    # [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615
    # [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629
    # [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998
    # [19] 1.30906099 2.18648566
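The d, p and q functions of the normal distribution are tied together by a few identities, which the following sketch checks numerically. The step size h and the variable names are chosen here for illustration only.

```r
# Sketch: relations between dnorm(), pnorm() and qnorm().

# qnorm() inverts pnorm()
back_to_2 <- qnorm(pnorm(2))

# The pdf is the derivative of the cdf: a central difference
# of pnorm() around 0.5 should be close to dnorm(0.5)
h <- 1e-5   # small step for the numerical derivative
num_deriv <- (pnorm(0.5 + h) - pnorm(0.5 - h)) / (2 * h)
pdf_value <- dnorm(0.5)

# Standardization: P(X <= 5) for mean 3, sd 1 equals P(Z <= (5-3)/1)
p_direct       <- pnorm(5, 3, 1)
p_standardized <- pnorm((5 - 3) / 1)

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(1) - pnorm(-1)
```

The last quantity is the familiar value of roughly 68 percent, obtained as a difference of two cdf values.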
Here is a list of useful distributions that are available for computation in R. R offers other distributions as well that are not in this list. (For each distribution below, you obtain the cdf by changing the initial letter d to p, the inverse cdf by changing it to q and the random variate generator by changing it to r.) Apart from the normal distribution, please practice and learn the d, p, q, r functions of the first nine distributions in this list (see footnote 3).

- dpois(x,y) : returns the pmf (probability mass function) value at x of a Poisson distribution with mean (rate) y.
- dbinom(x,y,z) : returns the pmf value at x of a binomial distribution with y trials and success probability z.
- dgeom(x,y) : returns the pmf value at x of a geometric distribution with success probability y.
- dunif(x,y,z) : returns the pdf value at x of a uniform distribution with lower bound y and upper bound z.
- dexp(x,y) : returns the pdf value at x of an exponential distribution with rate parameter y.
- dgamma(x,y,scale=z) : returns the pdf value at x of a gamma distribution with shape parameter y and scale parameter z. (If you do not write scale in the parameter definition, R takes z as the rate parameter, which is equal to 1/scale.)
- dchisq(x,y,z) : returns the pdf value at x of a chi-square distribution with y degrees of freedom and non-centrality parameter z.
- dt(x,y,z) : returns the pdf value at x of a t-distribution with y degrees of freedom and non-centrality parameter z.
- df(x,y,z,a) : returns the pdf value at x of an F-distribution with first degrees of freedom y, second degrees of freedom z and non-centrality parameter a.
- dcauchy(x,y,z) : returns the pdf value at x of a Cauchy distribution with location parameter y and scale parameter z.
- dnbinom(x,y,z) : returns the pmf value at x of a negative binomial distribution with dispersion parameter y and success probability z.
- dhyper(x,y,z,a) : returns the pmf value at x (number of white balls) of a hypergeometric distribution with a white population of size y, a black population of size z and a drawings made from the whole population.
- dlnorm(x,y,z) : returns the pdf value at x of a log-normal distribution with log-mean y and log-standard deviation z.
- dbeta(x,y,z) : returns the pdf value at x of a beta distribution with shape-1 parameter y and shape-2 parameter z.
- dlogis(x,y,z) : returns the pdf value at x of a logistic distribution with location parameter y and scale parameter z.
- dweibull(x,y,z) : returns the pdf value at x of a Weibull distribution with shape parameter y and scale parameter z.
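The d/p/q/r naming pattern from the list above can be tried on a discrete distribution. The following sketch uses a binomial distribution with 10 trials and success probability 0.3; these parameter values, the seed and the variable names are arbitrary choices for illustration.

```r
# Sketch: the four function types for a binomial distribution.
set.seed(1)   # arbitrary seed, only to make the sketch repeatable
n <- 10       # number of trials (illustrative)
p <- 0.3      # success probability (illustrative)

p_exactly_3 <- dbinom(3, n, p)           # P(X = 3)
p_at_most_3 <- pbinom(3, n, p)           # P(X <= 3)
p_summed    <- sum(dbinom(0:3, n, p))    # the same probability, added up by hand

med <- qbinom(0.5, n, p)                 # smallest k with P(X <= k) >= 0.5

draws <- rbinom(100000, n, p)            # 100000 random variates
sample_mean <- mean(draws)               # should be close to n*p = 3
```

For a discrete distribution the cdf is just the running sum of the pmf, which is why p_at_most_3 and p_summed agree.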
Footnote 3: For IE 586 students, it is sufficient to practice and learn the first five distributions in the list.
3.2 Statistical Functions in R
You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance with var() and its median with median(). You can use the function summary() to learn the 25 and 75 percent quantiles (which, together with the median, are called quartiles).

    x <- rnorm(1000000,5,2) # x is a vector of 1000000 RVs
                            # which follow a normal dist. with mean 5 and std. dev. 2
    mean(x)
    # [1] 4.997776
    sd(x)
    # [1] 2.000817
    var(x)
    # [1] 4.003268
    median(x)
    # [1] 4.997408
    summary(x)
    #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    # -4.904   3.650   4.997   4.998   6.346  14.420
    summary(x,digits=6)
    #     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
    # -4.90360  3.65020  4.99741  4.99778  6.34564 14.42310
    quantile(x) # this command yields the quartiles also
    #        0%       25%       50%       75%      100%
    # -4.903599  3.650201  4.997408  6.345639 14.423129
    # quartiles can also be obtained in the following way
    sort(x)[1000000*0.25]
    # [1] 3.650189
    sort(x)[1000000*0.5]
    # [1] 4.997408
    sort(x)[1000000*0.75]
    # [1] 6.345639

Of course, when you try this sequence of commands, you will get different results, since rnorm() will produce RVs from a different seed (see footnote 4).
Footnote 4: In particular for implementing common random numbers, you can use the set.seed() command with an arbitrary value in the parentheses to fix your random number sequence. For more details, see https://stat.ethz.ch/pipermail/r-help/2006-June/107399.html
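As a small illustration of the set.seed() command mentioned in the footnote, the sketch below draws the same normal variates twice. The seed value 460 and the variable names are arbitrary choices made here.

```r
# Sketch: fixing the seed makes random draws repeatable.
set.seed(460)                  # 460 is an arbitrary value
a <- rnorm(5, 5, 2)

set.seed(460)                  # reset to the same seed
b <- rnorm(5, 5, 2)
same_draws <- identical(a, b)  # the two vectors coincide

next_draw <- rnorm(5, 5, 2)    # continues the stream, so it differs from a

# quantile() also accepts explicit probabilities
x <- rnorm(10000, 5, 2)
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
```

Setting the seed at the top of a script is a common way to make simulation results reproducible when sharing code.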
4 Creating Functions and Defining Loops in R

4.1 Creating Functions in R
We use the following structure to create a specific function that is not already defined in R.

    # f <- function(p1,p2,...) # define necessary parameters for the function
    # {
    #   use defined parameters (arguments) and other tools to obtain your result
    #   print values or objects if necessary
    #   draw diagrams if necessary
    #   write your result variable into the last line so the function will return it
    # }
So the sequence of commands says that f is a function with parameters (p1,p2,...) which does the operations in {}. Check out the following examples of simple functions to see how functions are created in R.

    # EXAMPLE 01
    # A function that yields the circumference and the area of a circle given the radius
    circle <- function(r) # radius length
    {
      cf <- 2*pi*r # evaluates the circumference
      a <- pi*r^2  # evaluates the enclosed area
      res <- c(cf,a)
      names(res) <- c("circumference","area")
      res
    }
    circle(3)
    # circumference          area
    #      18.84956      28.27433
    circle(1)
    # circumference          area
    #      6.283185      3.141593
    # EXAMPLE 02
    # A function that yields the perimeter and the area of a triangle
    # given corner coordinates in R2
    # Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
    triangle <- function(
      a, # coordinate of 1st corner (must be a vector of length 2)
      b, # coordinate of 2nd corner (must be a vector of length 2)
      c  # coordinate of 3rd corner (must be a vector of length 2)
    ){
      if(length(a)!=2 || length(b)!=2 || length(c)!=2){
        stop("error, coordinates inappropriate") # stop instead of computing on bad input
      }
      # evaluating the perimeter
      ab <- sqrt((a[1]-b[1])^2+(a[2]-b[2])^2)
      bc <- sqrt((c[1]-b[1])^2+(c[2]-b[2])^2)
      ac <- sqrt((a[1]-c[1])^2+(a[2]-c[2])^2)
      pm <- ab+bc+ac
      # evaluating the area
      trab <- abs((a[1]-b[1])*(a[2]-b[2]))/2
      trbc <- abs((c[1]-b[1])*(c[2]-b[2]))/2
      trac <- abs((a[1]-c[1])*(a[2]-c[2]))/2
      maxxy <- pmax(a,b,c)
      minxy <- pmin(a,b,c)
      sqa <- min(max((a[1]-minxy[1])*(a[2]-minxy[2]),0),
                 max((maxxy[1]-a[1])*(maxxy[2]-a[2]),0))
      sqb <- min(max((b[1]-minxy[1])*(b[2]-minxy[2]),0),
                 max((maxxy[1]-b[1])*(maxxy[2]-b[2]),0))
      sqc <- min(max((c[1]-minxy[1])*(c[2]-minxy[2]),0),
                 max((maxxy[1]-c[1])*(maxxy[2]-c[2]),0))
      area <- (maxxy[1]-minxy[1])*(maxxy[2]-minxy[2])-trab-trbc-trac-sqa-sqb-sqc
      pm <- (area!=0)*pm # if area=0, then there is no triangle
      res <- c(pm,area)
      names(res) <- c("perimeter","area")
      res
    }
    coora <- c(23,18)
    coorb <- c(13,34)
    coorc <- c(50,5)
    triangle(coora,coorb,coorc)
    # perimeter      area
    #  95.84525 151.00000
    coora <- c(10,18)
    coorb <- c(13,34)
    coorc <- c(50,5)
    triangle(coora,coorb,coorc)
    # perimeter     area
    #  105.3489 339.5000
    coora <- c(3,5)
    coorb <- c(9,15)
    coorc <- c(6,10)
    triangle(coora,coorb,coorc)
    # perimeter      area
    #         0         0

Remember the ordering cost problem in section 2.2. We will create a function that yields the output when the unit costs and the ordering costs change. In this function we will also assign default values to the input parameters. So, whenever a parameter is left undefined in the function call, R will assume the default value for this parameter.

    # EXAMPLE 03
    orderingcostlist <- function(
      huc=7,    # higher unit cost
      luc=6.5,  # lower unit cost
      ucc=40,   # minimum order amount with the lower unit cost
      hfc=50,   # higher fixed cost
      lfc=15,   # lower fixed cost
      fcc=45,   # maximum order amount with the higher fixed cost
      tcub=318  # total cost upper bound
    ){
      units <- 30:50
      marginalcost <- huc*units*(units<ucc)+luc*units*(units>=ucc)
      fixedcost <- hfc*(units<=fcc)+lfc*(units>fcc)
      totalcost <- fixedcost+marginalcost
      res <- totalcost[totalcost<=tcub]
      names(res) <- units[totalcost<=tcub]
      res
    }
    orderingcostlist() # will yield the same results as before
    #    30    31    32    33    34    35    36    37    38    40    41    46
    # 260.0 267.0 274.0 281.0 288.0 295.0 302.0 309.0 316.0 310.0 316.5 314.0
    orderingcostlist(hfc=55,luc=6.3) # we just change two parameter values
    #    30    31    32    33    34    35    36    37    40    41    46    47    48
    # 265.0 272.0 279.0 286.0 293.0 300.0 307.0 314.0 307.0 313.3 304.8 311.1 317.4

In order to see the construction of an if-else statement in R, we will implement the following function as a last example.

f(x) = \begin{cases}
  x^2,      & x < -2 \\
  x + 6,    & -2 \le x < 0 \\
  -x + 6,   & 0 \le x < 4 \\
  \sqrt{x}, & x \ge 4
\end{cases}
    # EXAMPLE 04
    f <- function(x){
      if(x<(-2)){
        x^2
      }else if(x<0){
        x+6
      }else if(x<4){
        -x+6
      }else{
        sqrt(x)
      }
    }
    c(f(-4),f(-1),f(3),f(9))
    # [1] 16 5 3 3

Note that you can also pass predefined functions as parameters. You will see an example of this in section 4.2.
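The function f of Example 04 relies on if(), which expects a single value, so it cannot be called on a whole vector directly. The sketch below shows two possible ways around this: passing f as a parameter to the base function sapply(), and a vectorized variant (named f_vec here, a name introduced only for this illustration) built with the logical-mask trick of section 2.2.

```r
# Scalar version, as in Example 04
f <- function(x){
  if(x < (-2)){
    x^2
  }else if(x < 0){
    x + 6
  }else if(x < 4){
    -x + 6
  }else{
    sqrt(x)
  }
}

xs <- c(-4, -1, 3, 9)

# Option 1: pass f as a parameter; sapply() calls it once per element
by_sapply <- sapply(xs, f)

# Option 2: a vectorized variant using logical masks.
# Each mask is 1 on its own interval and 0 elsewhere.
# Note the spaces in "x < -2": without them, x<-2 would be an assignment.
f_vec <- function(x){
  (x < -2) * x^2 +
  (x >= -2 & x < 0) * (x + 6) +
  (x >= 0 & x < 4) * (-x + 6) +
  (x >= 4) * sqrt(pmax(x, 0))  # pmax() avoids NaN from sqrt of negatives
}
by_masks <- f_vec(xs)
```

Both variants return the same four values as the element-by-element calls in Example 04. The masked version avoids the loop that sapply() performs internally.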
4.2 Defining Loops in R
For a predetermined number of iterations, we use the following structure:

    # for(i in x){ # i takes the values of vector x sequentially, one per iteration
    #   do required operations depending on the variable i
    # }

You can do every vectorized operation with a for-loop as well. But in R, loops take longer to execute than they do in C, so it is better to use vectorized operations when possible. The following example estimates (see footnote 5) the expectation of the maximum of two standard uniform random variates, Y = max(U1, U2), which is actually equal to 2/3. We will not use the pmax() function. Instead, we will define a for-loop. This is the first Monte Carlo simulation in this document.

    simmax2unif <- function(n){
      y <- 0 # in order to record the output of our simulation in "y"
             # we should define it before the for-loop
      for(i in 1:n){ # i will take integer values from 1 to n
        u1 <- runif(1)
        u2 <- runif(1)
        y[i] <- max(u1,u2) # record the estimate as the "i"th entry
      }
      res <- mean(y)
      names(res) <- c("expectation")
      res
    }
Footnote 5: Especially for IE 586 students, we have to emphasize the importance of returning confidence intervals for Monte Carlo (simulation) estimates. In the functions simmax2unif and simmax2unif_2, we have omitted that output to avoid complications at this level.
    simmax2unif(100000)
    # expectation
    # 0.665354266
    system.time(x <- simmax2unif(100000)) # execution time in seconds
    #  user system elapsed
    # 35.30   0.08   35.43

    # Do the same simulation with pmax()
    simmax2unif_2 <- function(n){
      u1 <- runif(n)
      u2 <- runif(n)
      y <- pmax(u1,u2)
      res <- mean(y)
      names(res) <- c("expectation")
      res
    }
    simmax2unif_2(1000000)
    # expectation
    # 0.6665182787
    system.time(x <- simmax2unif_2(100000)) # execution time in seconds
    # user system elapsed
    # 0.03   0.00    0.03

As you can see, vectorized operations work much faster than loops. Still, under some circumstances, loops might be the only option to implement your algorithms. While-loops are especially useful for convergence algorithms. For an undetermined number of iterations, we use a while-loop, which is defined as follows:

    # while(condition){ # as long as the condition is satisfied, run the loop
    #   do required operations
    # }
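As a possible illustration of the while-loop structure, the sketch below keeps adding batches of simulated values of max(U1, U2) until the half-width of an approximate 95 percent confidence interval falls below a target. The batch size, the target half-width, the seed and all variable names are assumptions made for this example, and the interval uses the usual normal approximation.

```r
# Sketch: simulate in batches until the confidence interval is narrow enough.
set.seed(586)     # arbitrary seed for repeatability
batch  <- 10000   # hypothetical number of new variates per iteration
target <- 0.002   # hypothetical target half-width of the 95% interval

y <- pmax(runif(batch), runif(batch))   # first batch
halfwidth <- qnorm(0.975) * sd(y) / sqrt(length(y))

while(halfwidth > target){
  # the number of iterations is not known in advance
  y <- c(y, pmax(runif(batch), runif(batch)))
  halfwidth <- qnorm(0.975) * sd(y) / sqrt(length(y))
}

est <- mean(y)
ci <- c(lower = est - halfwidth, estimate = est, upper = est + halfwidth)
n_used <- length(y)   # total number of simulated values
```

The interval in ci should contain the true value 2/3 in most runs. Each batch is still generated with vectorized calls, so the loop only runs a handful of times.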
Here is a basic root-finding algorithm that uses a while-loop:

# a root finding algorithm
# finds the unique real root of a continuous function in an interval
# the function should cross the x-axis and should not be tangent to the x-axis
findroot <- function(
  f,              # continuous function that we will solve for zero
  interval,       # the interval where we have a single solution (a vector of length 2)
  errbound=1e-12, # maximum approximation error
  trace=FALSE     # if trace is TRUE, print the convergent sequence
){
  a <- interval[1]
  b <- interval[2]
  if(f(a)*f(b)>0){
    print("error - no solution or more than one solution")
  }else{
    counter <- 0
    res <- 0
    err <- abs(a-b)
    while(err>errbound){
      c <- (a+b)/2
      fc <- f(c)
      if(f(a)*fc>0){
        a <- c
      }else{
        b <- c
      }
      err <- abs(a-b)
      counter <- counter+1
      res[counter] <- a
    }
    print(c(a,counter))
    if(trace){
      print(res)
    }
  }
}
func <- function(x){x^2-2}
int <- c(1,2)
findroot(func,int)
# [1] 1.414214 40.000000
findroot(func,int,trace=TRUE)
# [1] 1.414214 40.000000
#  [1] 1.000000 1.250000 1.375000 1.375000 1.406250 1.406250 1.414062 1.414062
#  [9] 1.414062 1.414062 1.414062 1.414062 1.414185 1.414185 1.414185 1.414200
# [17] 1.414207 1.414211 1.414213 1.414213 1.414213 1.414213 1.414214 1.414214
# [25] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
# [33] 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214 1.414214
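For comparison, R also provides the function uniroot() in the stats package, which searches for a root of a function inside a given interval. The sketch below applies it to the same function and interval as above; the tolerance value is an arbitrary choice for illustration.

```r
func <- function(x){x^2-2}
sol <- uniroot(func,interval=c(1,2),tol=1e-10)
sol$root  # approximate location of the root
sol$iter  # number of iterations that were needed
```

Like findroot(), uniroot() expects the function values at the two end points of the interval to have opposite signs.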
Drawing Plot Diagrams and Histograms in R
We would like to plot the density function of the standard normal distribution on the interval (-4,4). We create a dense vector of points on the x-axis (it should be dense in order to give a good approximation of the curve) and evaluate the function at these points to obtain a second vector.

x <- seq(-4,4,length.out=51) # this is not dense enough
y <- dnorm(x)
plot(x,y) # plots hollow dots (first diagram)
windows() # you can use this command to display your diagram in a new window (on Windows)
plot(x,y,type="l") # connects the same dots (second diagram)
x <- seq(-4,4,length.out=10001) # this is a dense vector
y <- dnorm(x)
windows()
plot(x,y,type="l") # connects denser dots (third diagram)

The plot diagrams are given in Figure 1. Now, we want to see how to draw a histogram of a vector in R with hist(). Histograms are a handy tool for seeing the distribution of given data. You can obtain a better histogram by changing the breaks parameter.

x <- rnorm(1000000,3,1.5) # a vector of normal RVs with mean 3 and std. dev. 1.5
hist(x)
windows()
hist(x,breaks=50)
windows()
hist(x,breaks=100)

The histograms are given in Figure 2. You can also add new lines and functions to a plot diagram or a histogram that is already displayed (see footnote 6). The lines() command is used much like the plot() command, except that no type parameter is needed. You can also add straight lines to existing diagrams with the abline() command. Check out the following examples.

hist(x,breaks=100)
y <- seq(-5,10,length.out=100001)
lines(y,dnorm(y,3,1.5)*200000)
y <- seq(-5,10,length.out=101)
windows()
plot(y,dnorm(y,3,1.5))
lines(y,dnorm(y,3,1.5))
windows()
plot(y,dnorm(y,3,1.5),type="l")
abline(v=4.5) # add a "v"ertical line on x=4.5
abline(v=1.5) # add a "v"ertical line on x=1.5
abline(h=dnorm(1.5,3,1.5)) # add a "h"orizontal line on y=dnorm(1.5,3,1.5)
abline(a=0.10,b=0.01) # add a line with slope=0.01 and intercept=0.10

The diagrams are given in Figure 3.
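When a density curve is drawn over a histogram of counts, the density has to be multiplied by the sample size times the bin width. The following sketch computes that factor from the histogram itself instead of typing it in by hand, and also shows the freq=FALSE option of hist(), which draws the histogram on the density scale so that no factor is needed. The smaller sample size, the seed and the names h, binwidth and scalefactor are assumptions made for illustration.

```r
set.seed(460)                      # arbitrary seed, only for reproducibility
x <- rnorm(10000,3,1.5)
h <- hist(x,breaks=100,plot=FALSE) # compute the bins without drawing them
binwidth <- diff(h$breaks)[1]      # the bins have equal width here
scalefactor <- length(x)*binwidth  # turns a density into expected counts
y <- seq(-5,10,length.out=1001)

hist(x,breaks=100)                 # histogram of counts
lines(y,dnorm(y,3,1.5)*scalefactor)

hist(x,breaks=100,freq=FALSE)      # histogram on the density scale
lines(y,dnorm(y,3,1.5))            # the density can be added directly
```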
6 One can use the points() command to add new points to a plot diagram. For more details, see the following link: http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/points.html
Figure 1: Plot diagrams for the density function of standard normal distribution
Figure 2: Histograms of a vector of normal RVs with mean 3 and standard deviation 1.5
Figure 3: Adding lines on existing diagrams with lines() (1-2) and abline() (3) commands
6 Basic User Information

6.1 Scanning and Printing Data
Assume that you have data (see footnote 7) written in a text file in the following format.

3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values

You can use the command scan() to store this data in a vector. It reads the values from left to right and from top to bottom. Spaces and line breaks separate the values, and each value is stored at the next index.

x <- scan()
# press enter after writing this line, it will display "1:" on the command line
# Press CTRL+V to paste the copied data, 15 real values will be stored in x
# it will display "16:" on the command line
# Press enter in order to finish scanning process, 16th index will be ignored
# 1: 3 25 94.9 12
# 5: 547 32556 56
# 8: 89 567
# 10: 435 342.1
# 12: 76.5 983.2
# 14: 0 343
# 16:
# Read 15 items
x
#  [1]     3.0    25.0    94.9    12.0   547.0 32556.0    56.0    89.0   567.0
# [10]   435.0   342.1    76.5   983.2     0.0   343.0
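If the values are already available as a character string, scan() can read them without any typing at the prompt. The sketch below passes the same 15 values through the text argument of scan(); the variable name txt is chosen only for illustration.

```r
txt <- "3 25 94.9 12
547 32556 56
89 567
435 342.1
76.5 983.2
0 343"
x <- scan(text=txt) # reads from the string instead of the keyboard
length(x)           # number of values that were read
x[6]                # the sixth value
```

Giving scan() the name of a text file as its first argument reads the values from that file in the same way.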
You can also scan a column of cells from an Excel sheet, but not a row. Be careful: the decimal separator in R is the period (.), so by default you can only scan values that use (.) as the decimal separator. You can also read tables from a text file. Assume you have a text file containing data in the following format:

length weight age
1.72 72.3 25
1.69 85.3 23
1.80 75.0 26
1.61 66 23
1.73 69 24
# 3 values in each row

Right-click the R shortcut on your desktop, choose Properties and look up the "Start in" directory (see footnote 8). Copy your text file into that directory. Suppose it is named data.txt. Write the following command:
7 Such data should only contain rational numbers.
8 You can also change your "Start in" directory.
x <- read.table(file="data.txt",header=TRUE)
# if you do not have any headers in your data, set header to FALSE
x # press enter to display the table x
#   length weight age
# 1   1.72   72.3  25
# 2   1.69   85.3  23
# 3   1.80   75.0  26
# 4   1.61   66.0  23
# 5   1.73   69.0  24
x$length
# [1] 1.72 1.69 1.80 1.61 1.73
x$weight
# [1] 72.3 85.3 75.0 66.0 69.0
x$age
# [1] 25 23 26 23 24

To read a table from an Excel sheet, you can simply copy and paste it into a text file and then read the table from that file. You can print a message or an object (see footnote 9) from within a function by using the print() command. To print a message, do not forget to put it in quotation marks.

print("error")
# [1] "error"
x <- 1:5
print(x)
# [1] 1 2 3 4 5
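Returning to read.table(), the following sketch is a self-contained version of the table example that does not depend on the "Start in" directory: it first writes the sample table to a temporary file and then reads it back. The temporary file and the name tablefile are assumptions made for illustration; in practice you would pass the name of your own file.

```r
# Write the sample table to a temporary file, then read it back.
tablefile <- tempfile(fileext=".txt")
writeLines(c("length weight age",
             "1.72 72.3 25",
             "1.69 85.3 23",
             "1.80 75.0 26",
             "1.61 66 23",
             "1.73 69 24"),tablefile)
x <- read.table(file=tablefile,header=TRUE)
str(x)      # column names and column types
mean(x$age) # average of the age column
getwd()     # the directory where R looks for files given without a path
```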
6.2 Session Management
You can find detailed information about the functions that come predefined with R. You can learn about the parameters (arguments) of a function and see a few examples of its use. Just type ? followed by the name of the function that you want to learn about. Check out the explanations given in R for the following functions.

?det
?sample
?sin
?cbind

You can use apropos() to find a list of all functions whose names contain a specific word. These functions may come with the default libraries or may have been defined by you in the current R session.

apropos("norm")
# [1] "dlnorm"         "dnorm"          "normalizePath"  "plnorm"
# [5] "pnorm"          "qlnorm"         "qnorm"          "qqnorm"
# [9] "qqnorm.default" "rlnorm"         "rnorm"
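The argument of apropos() is treated as a regular expression, so the search can be narrowed with the usual anchors. The sketch below is one possible use; the pattern is chosen only for illustration. It also shows args(), which prints the arguments of a function without opening its help page.

```r
apropos("^q.*norm$") # names that start with "q" and end with "norm"
args(sample)         # the arguments of sample()
```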
9 Objects can be vectors, matrices, arrays, functions, lists (lists are similar to structures in C), tables etc.
apropos("exp")
#  [1] ".__C__expression"      ".expand_R_libs_env_var" ".Export"
#  [4] ".mergeExportMethods"   ".standard_regexps"      "as.expression"
#  [7] "as.expression.default" "char.expand"            "dexp"
# [10] "exp"                   "expand.grid"            "expand.model.frame"
# [13] "expm1"                 "expression"             "getExportedValue"
# [16] "getNamespaceExports"   "gregexpr"               "is.expression"
# [19] "namespaceExport"       "path.expand"            "pexp"
# [22] "qexp"                  "regexpr"                "rexp"
# [25] "SSbiexp"               "USPersonalExpenditure"
If you need to see all the objects that you have created in your work session, simply type objects().

objects()
#  [1] "a"        "b"            "circle"        "coora"
#  [5] "coorb"    "coorc"        "error"         "f"
#  [9] "findroot" "fixedcost"    "func"          "int"
# [13] "lbound"   "marginalcost" "n"             "orderingcostlist"
# [17] "res"      "simmax2unif"  "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound"       "units"         "vec"
# [25] "x"        "xest"         "xinv"          "y"
# [29] "y1"       "y2"           "y3"            "y4"
# [33] "y5"       "y6"           "z"
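Objects that are no longer needed can be deleted with rm(). The short sketch below creates two objects with illustrative names, checks that they are listed, and removes one of them.

```r
tmp1 <- 1:3
tmp2 <- "hello"
"tmp1" %in% objects() # TRUE: tmp1 is listed
rm(tmp1)              # delete a single object
"tmp1" %in% objects() # FALSE: tmp1 is gone
exists("tmp2")        # TRUE: tmp2 is still there
```

The function ls() gives the same list as objects().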
You can always save your R session, together with the objects that you have created, by clicking File and then Save Workspace in the quick access bar. You can reopen a saved workspace at any time by double-clicking the saved file.
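The same can be done from the command line. The sketch below stores two objects in a file with save(), removes them, and restores them with load(). The temporary file and the name workfile are assumptions made for illustration.

```r
a <- 1:5
b <- "some text"
workfile <- tempfile(fileext=".RData") # illustrative file location
save(a,b,file=workfile) # store only the objects a and b
rm(a,b)                 # remove them from the session
load(workfile)          # bring them back
a
b
```

To store every object of the session at once, save.image() can be used instead of save().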