
Mini-course

Introdução ao R (Introduction to R)

Prof. Dr. Henrique Castro
hcastro@usp.br

Faculdade de Economia, Administração
e Contabilidade da Universidade de São Paulo, FEA-USP

2013
Working directory
Introdução ao R (Prof. Henrique Castro, FEA-USP)

Get or Set Working Directory

• getwd() returns an absolute filepath representing the current
working directory of the R process.
• setwd("C:/Users/Henrique/Documents") sets the working
directory to the given path.
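A minimal sketch of the two calls, switching to the session's temporary directory (which is guaranteed to exist) and back:

```r
# Remember where we are, move to the session temp directory, then return
old <- getwd()
setwd(tempdir())
cat("now in:", getwd(), "\n")
setwd(old)
stopifnot(getwd() == old)   # we are back where we started
```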

Installing and loading R packages

Install packages

install.packages("foreign")

Loading packages

library(foreign)

Importing data

• Importing data into R is fairly simple.


• For Stata and Systat, use the foreign package.
• For SPSS and SAS I would recommend the Hmisc package for
ease and functionality.
From a Comma Delimited Text File
# first row contains variable names, comma is separator
# assign the variable id to row names

mydata <- read.table("mydata.csv", header=TRUE, sep=",", row.names="id")

Exercise
• Create a CSV file and import it.
Importing data

From Excel
• The best way to read an Excel file is to export it to a comma
delimited file and import it using the method above.
• On Windows systems you can use the RODBC package to access
Excel files.
• The first row should contain variable/column names.

# first row contains variable names


# we will read in workSheet mysheet

library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)
Importing data

From SPSS
# save SPSS dataset in transport format
get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.

# in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors

From Stata
# input Stata file
library(foreign)
mydata <- read.dta("c:/mydata.dta")

Getting Information on a Dataset

# list objects in the working environment


ls()

# list the variables in mydata


names(mydata)

# list the structure of mydata


str(mydata)

# dimensions of an object
dim(object)

# class of an object (numeric, matrix, data frame, etc)


class(object)

# print mydata
mydata

# print first 10 rows of mydata


head(mydata, n=10)

# print last 5 rows of mydata


tail(mydata, n=5)

Selecting Elements

Identify rows, columns or elements using subscripts


> mydata[,3] # 3rd column of matrix
[1] 0.02138096 0.53024983 0.81832713 0.68361351 0.57439575 0.65139071
[7] 0.77752234 0.78719712 0.99746985 0.91793588 0.08434072 0.22034468
[13] 0.41735257 0.02295936 0.38344190 0.59126108 0.96521616 0.56327819
[19] 0.38097578 0.40297945 0.24743431 0.40990097 0.95087157 NA
[25] 0.86153218

> mydata[3,] # 3rd row of matrix


y x1 x2
3 0.327052 0.2195572 0.8183271

> mydata[2:4,1:3] # rows 2,3,4 of columns 1,2,3


y x1 x2
2 0.6111948 0.3738518 0.5302498
3 0.3270520 0.2195572 0.8183271
4 0.9560805 0.2477098 0.6836135

Vectors and Data Frames

Vectors

> d <- c(1,2,3,4)
> e <- c("red", "white", "red", NA)
> f <- c(TRUE,TRUE,TRUE,FALSE)
> d
[1] 1 2 3 4
> e
[1] "red" "white" "red" NA
> f
[1] TRUE TRUE TRUE FALSE

Data Frame

> mydata2 <- data.frame(d,e,f)
> mydata2
  d     e     f
1 1   red  TRUE
2 2 white  TRUE
3 3   red  TRUE
4 4  <NA> FALSE
> names(mydata2) <- c("ID","Color","Passed") # variable names
> mydata2
  ID Color Passed
1  1   red   TRUE
2  2 white   TRUE
3  3   red   TRUE
4  4  <NA>  FALSE

Missing Data

• In R, missing values are represented by the symbol NA (not available).
Excluding Missing Values from Analyses
• Arithmetic functions on missing values yield missing values.
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2

Function complete.cases()

> complete.cases(mydata)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[25] TRUE

Missing Data

List rows of data that have missing values


> mydata[!complete.cases(mydata),]
y x1 x2
23 0.54610883 NA 0.9508716
24 0.03563499 0.07205155 NA

Function na.omit()

> # create new dataset without missing data


> newdata <- na.omit(mydata)
> dim(newdata)
[1] 23 3
> tail(newdata)
y x1 x2
18 0.3477647 0.1576324 0.5632782
19 0.4663852 0.5410509 0.3809758
20 0.6129595 0.8376189 0.4029794
21 0.6741362 0.1673157 0.2474343
22 0.4829047 0.1548488 0.4099010
25 0.5821938 0.0698297 0.8615322
Data Management

Creating new variables


> mydata$sum <- mydata$x1 + mydata$x2
> mydata$mean <- (mydata$x1 + mydata$x2)/2
> head(mydata)
y x1 x2 sum mean
1 0.8122147 0.4219462 0.02138096 0.4433272 0.2216636
2 0.6111948 0.3738518 0.53024983 0.9041016 0.4520508
3 0.3270520 0.2195572 0.81832713 1.0378844 0.5189422
4 0.9560805 0.2477098 0.68361351 0.9313233 0.4656617
5 0.9247670 0.2061416 0.57439575 0.7805373 0.3902687
6 0.8935983 0.6727692 0.65139071 1.3241599 0.6620799
> tail(mydata)
y x1 x2 sum mean
20 0.61295952 0.83761895 0.4029794 1.2405984 0.6202992
21 0.67413620 0.16731569 0.2474343 0.4147500 0.2073750
22 0.48290474 0.15484882 0.4099010 0.5647498 0.2823749
23 0.54610883 NA 0.9508716 NA NA
24 0.03563499 0.07205155 NA NA NA
25 0.58219381 0.06982970 0.8615322 0.9313619 0.4656809

Data Management

Recoding variables
> mydata$x1cat<-ifelse(mydata$x1>0.5,c("high"),c("low"))
> head(mydata)
y x1 x2 sum mean x1cat
1 0.8122147 0.4219462 0.02138096 0.4433272 0.2216636 low
2 0.6111948 0.3738518 0.53024983 0.9041016 0.4520508 low
3 0.3270520 0.2195572 0.81832713 1.0378844 0.5189422 low
4 0.9560805 0.2477098 0.68361351 0.9313233 0.4656617 low
5 0.9247670 0.2061416 0.57439575 0.7805373 0.3902687 low
6 0.8935983 0.6727692 0.65139071 1.3241599 0.6620799 high

> mydata$x1cat[mydata$x1>0.5] <- "High"


> mydata$x1cat[mydata$x1<=0.5] <- "Low"
> head(mydata)
y x1 x2 sum mean x1cat
1 0.8122147 0.4219462 0.02138096 0.4433272 0.2216636 Low
2 0.6111948 0.3738518 0.53024983 0.9041016 0.4520508 Low
3 0.3270520 0.2195572 0.81832713 1.0378844 0.5189422 Low
4 0.9560805 0.2477098 0.68361351 0.9313233 0.4656617 Low
5 0.9247670 0.2061416 0.57439575 0.7805373 0.3902687 Low
6 0.8935983 0.6727692 0.65139071 1.3241599 0.6620799 High

Data Management

Merging Data
> x<-data.frame(1:10,rep(1))
> names(x)<-c("id","x1")
> head(x,2)
id x1
1 1 1
2 2 1
> y<-data.frame(1:10,rep(2))
> names(y)<-c("id","y1")
> head(y,2)
id y1
1 1 2
2 2 2
> dataxy<-merge(x,y,by="id")
> head(dataxy)
id x1 y1
1 1 1 2
2 2 1 2
3 3 1 2
4 4 1 2
5 5 1 2
6 6 1 2

Data Management

Reshaping Data
> d1 <- data.frame(subject = c("1", "2"),
+ x0 = c("male", "female"),
+ x1_2000 = 1:2,
+ x1_2005 = 5:6,
+ x2_2000 = 1:2,
+ x2_2005 = 5:6
+ )

> d1
subject x0 x1_2000 x1_2005 x2_2000 x2_2005
1 1 male 1 5 1 5
2 2 female 2 6 2 6

> reshape(d1, dir = "long", varying = 3:6, sep = "_")


subject x0 time x1 x2 id
1.2000 1 male 2000 1 1 1
2.2000 2 female 2000 2 2 2
1.2005 1 male 2005 5 5 1
2.2005 2 female 2005 6 6 2

Subsetting Data

Selecting (Keeping) Variables


> # select variables y, x1
> myvars <- c("y", "x1")
> newdata <- mydata[myvars]
> head(newdata)
y x1
1 0.8122147 0.4219462
2 0.6111948 0.3738518
3 0.3270520 0.2195572
4 0.9560805 0.2477098
5 0.9247670 0.2061416
6 0.8935983 0.6727692

Subsetting Data

Excluding (Dropping) Variables


> newdata <- mydata[c(-3,-5)]
> head(newdata)
y x1 sum x1cat
1 0.8122147 0.4219462 0.4433272 Low
2 0.6111948 0.3738518 0.9041016 Low
3 0.3270520 0.2195572 1.0378844 Low
4 0.9560805 0.2477098 0.9313233 Low
5 0.9247670 0.2061416 0.7805373 Low
6 0.8935983 0.6727692 1.3241599 High

Subsetting Data

Selecting Observations
# first 5 observations
> newdata <- mydata[1:5,]
> newdata
y x1 x2 sum mean x1cat
1 0.8122147 0.4219462 0.02138096 0.4433272 0.2216636 Low
2 0.6111948 0.3738518 0.53024983 0.9041016 0.4520508 Low
3 0.3270520 0.2195572 0.81832713 1.0378844 0.5189422 Low
4 0.9560805 0.2477098 0.68361351 0.9313233 0.4656617 Low
5 0.9247670 0.2061416 0.57439575 0.7805373 0.3902687 Low

# based on variable values


> newdata <- mydata[which(mydata$x1cat=='Low' & mydata$sum > 1), ]
> newdata
y x1 x2 sum mean x1cat
3 0.3270520 0.2195572 0.8183271 1.037884 0.5189422 Low
17 0.1994746 0.1669037 0.9652162 1.132120 0.5660599 Low

Subsetting Data

Selection using the Subset Function


> newdata <- subset(mydata, x1>=.6 | x2<.4, select=c(y:x2))
> newdata
y x1 x2
1 0.81221468 0.4219462 0.02138096
6 0.89359831 0.6727692 0.65139071
7 0.59623223 0.7039585 0.77752234
9 0.05417394 0.6776613 0.99746985
11 0.79439514 0.4500744 0.08434072
12 0.21072273 0.3740807 0.22034468
14 0.02172214 0.2773447 0.02295936
15 0.14615300 0.7039209 0.38344190
16 0.89172957 0.8183909 0.59126108
19 0.46638519 0.5410509 0.38097578
20 0.61295952 0.8376189 0.40297945
21 0.67413620 0.1673157 0.24743431

Subsetting Data

Random Samples
> set.seed(1)
> x<-1:25
> sample(x, 7, replace = F)
[1] 7 9 14 20 5 18 22

> set.seed(1)
> mysample <- mydata[sample(1:nrow(mydata), 7, replace=FALSE),]
> mysample
y x1 x2 sum mean x1cat
7 0.59623223 0.7039585 0.77752234 1.4814808 0.7407404 High
9 0.05417394 0.6776613 0.99746985 1.6751312 0.8375656 High
14 0.02172214 0.2773447 0.02295936 0.3003041 0.1501520 Low
20 0.61295952 0.8376189 0.40297945 1.2405984 0.6202992 High
5 0.92476700 0.2061416 0.57439575 0.7805373 0.3902687 Low
18 0.34776470 0.1576324 0.56327819 0.7209106 0.3604553 Low
22 0.48290474 0.1548488 0.40990097 0.5647498 0.2823749 Low

Numeric Functions

abs(x) absolute value


sqrt(x) square root
ceiling(x) ceiling(3.475) is 4
floor(x) floor(3.475) is 3
trunc(x) trunc(5.99) is 5
round(x, digits=n) round(3.475, digits=2) is 3.48
signif(x, digits=n) signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x) also acos(x), cosh(x), acosh(x), etc.
log(x) natural logarithm
log10(x) common logarithm
exp(x) exponential function, e^x
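The table's example values can be verified interactively; a small self-check script (values taken from the table, with one caveat noted for round()):

```r
# Check the example values from the table
stopifnot(ceiling(3.475) == 4)
stopifnot(floor(3.475) == 3)
stopifnot(trunc(5.99) == 5)
stopifnot(all.equal(signif(3.475, digits = 2), 3.5))

# round() on values ending in 5 is subject to binary floating-point
# representation, so the printed result can differ from naive expectation
round(3.475, digits = 2)

stopifnot(sqrt(16) == 4)
stopifnot(abs(-2) == 2)
stopifnot(all.equal(log(exp(1)), 1))
stopifnot(all.equal(log10(100), 2))
```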

Statistical Functions
Function Description

mean(x, trim=0, na.rm=T) mean of object x


# trimmed mean, removing any missing values and
# 5 percent of highest and lowest scores
mx <- mean(x,trim=.05,na.rm=TRUE)

sd(x) standard deviation of object x. Also look at var(x) for variance and
mad(x) for median absolute deviation.

median(x) median

quantile(x, probs) quantiles, where x is the numeric vector whose quantiles
are desired and probs is a numeric vector with probabilities in [0,1].
# 30th and 84th percentiles of x
y <- quantile(x, c(.3,.84))

range(x) range

sum(x) sum

diff(x, lag=1) lagged differences, with lag indicating which lag to use

min(x) minimum

max(x) maximum

scale(x, center=TRUE, scale=TRUE) column center or standardize a matrix.
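A few of the table's functions exercised on a toy vector (the data are invented purely for illustration):

```r
x <- c(1, 2, NA, 4, 10)   # invented data, with one missing value

stopifnot(mean(x, na.rm = TRUE) == 4.25)
stopifnot(median(x, na.rm = TRUE) == 3)
stopifnot(sum(x, na.rm = TRUE) == 17)
stopifnot(identical(range(x, na.rm = TRUE), c(1, 10)))
stopifnot(identical(diff(c(1, 2, 4), lag = 1), c(1, 2)))

# scale() centers each column and divides it by its standard deviation
z <- scale(c(2, 4, 6))
stopifnot(all.equal(as.numeric(z), c(-1, 0, 1)))
```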


Descriptive Statistics

Summary function
> # mean, median, 1st and 3rd quartiles, min, max
> summary(mydata)
y x1 x2 sum
Min. :0.02172 Min. :0.03389 Min. :0.02138 Min. :0.3003
1st Qu.:0.32603 1st Qu.:0.16459 1st Qu.:0.38283 1st Qu.:0.5796
Median :0.54611 Median :0.32560 Median :0.56884 Median :0.9313
Mean :0.50232 Mean :0.37177 Mean :0.55256 Mean :0.9200
3rd Qu.:0.67414 3rd Qu.:0.57398 3rd Qu.:0.79498 3rd Qu.:1.1864
Max. :0.95608 Max. :0.83762 Max. :0.99747 Max. :1.6751
NA's :1 NA's :1 NA's :2
mean x1cat
Min. :0.1502 Length:25
1st Qu.:0.2898 Class :character
Median :0.4657 Mode :character
Mean :0.4600
3rd Qu.:0.5932
Max. :0.8376
NA's :2

Descriptive Statistics

Using the pastecs package

> library(pastecs)
> stat.desc(mydata)
y x1 x2 sum mean x1cat
nbr.val 25.00000000 24.00000000 24.00000000 23.00000000 23.00000000 NA
nbr.null 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 NA
nbr.na 0.00000000 1.00000000 1.00000000 2.00000000 2.00000000 NA
min 0.02172214 0.03388894 0.02138096 0.30030407 0.15015204 NA
max 0.95608055 0.83761895 0.99746985 1.67513117 0.83756558 NA
range 0.93435840 0.80373000 0.97608890 1.37482709 0.68741355 NA
sum 12.55789913 8.92258074 13.26137199 21.16102961 10.58051480 NA
median 0.54610883 0.32559825 0.56883697 0.93132330 0.46566165 NA
mean 0.50231597 0.37177420 0.55255717 0.92004477 0.46002238 NA
SE.mean 0.05751775 0.05243628 0.06117871 0.07890179 0.03945090 NA
CI.mean.0.95 0.11871081 0.10847271 0.12655781 0.16363231 0.08181615 NA
var 0.08270730 0.06598952 0.08982804 0.14318634 0.03579659 NA
std.dev 0.28758876 0.25688425 0.29971327 0.37839971 0.18919986 NA
coef.var 0.57252563 0.69096848 0.54241133 0.41128402 0.41128402 NA

Descriptive Statistics

Using the psych package

> library(psych)
> describe(mydata[c(-6)])
var n mean sd median trimmed mad min max range skew kurtosis se
y 1 25 0.50 0.29 0.55 0.51 0.33 0.02 0.96 0.93 -0.11 -1.17 0.06
x1 2 24 0.37 0.26 0.33 0.36 0.26 0.03 0.84 0.80 0.39 -1.31 0.05
x2 3 24 0.55 0.30 0.57 0.56 0.32 0.02 1.00 0.98 -0.21 -1.13 0.06
sum 4 23 0.92 0.38 0.93 0.91 0.50 0.30 1.68 1.37 0.13 -1.08 0.08
mean 5 23 0.46 0.19 0.47 0.45 0.25 0.15 0.84 0.69 0.13 -1.08 0.04

Descriptive Statistics

Summary Statistics by Group


> library(psych)
> describeBy(mydata[,c(-6)], group = mydata$x1cat)
group: High
var n mean sd median trimmed mad min max range skew kurtosis se
y 1 8 0.52 0.31 0.54 0.52 0.32 0.05 0.89 0.84 -0.20 -1.49 0.11
x1 2 8 0.68 0.12 0.69 0.68 0.11 0.51 0.84 0.33 -0.16 -1.44 0.04
x2 3 8 0.62 0.23 0.62 0.62 0.28 0.38 1.00 0.62 0.26 -1.55 0.08
sum 4 8 1.30 0.23 1.31 1.30 0.20 0.92 1.68 0.75 -0.09 -1.13 0.08
mean 5 8 0.65 0.12 0.65 0.65 0.10 0.46 0.84 0.38 -0.09 -1.13 0.04
------------------------------------------------------------------
group: Low
var n mean sd median trimmed mad min max range skew kurtosis se
y 1 16 0.49 0.30 0.52 0.49 0.35 0.02 0.96 0.93 -0.01 -1.32 0.07
x1 2 16 0.22 0.13 0.19 0.21 0.15 0.03 0.45 0.42 0.36 -1.24 0.03
x2 3 15 0.49 0.32 0.53 0.49 0.43 0.02 0.97 0.94 -0.04 -1.46 0.08
sum 4 15 0.72 0.26 0.72 0.71 0.31 0.30 1.13 0.83 0.02 -1.55 0.07
mean 5 15 0.36 0.13 0.36 0.36 0.16 0.15 0.57 0.42 0.02 -1.55 0.03

Creating a Graph

Scatter plot

# Creating a Graph
attach(mtcars)
plot(mpg~wt, xlab="Car weight", ylab="Miles per gallon")
title("Regression of MPG on Weight")

[Figure: scatter plot "Regression of MPG on Weight"; x-axis "Car weight", y-axis "Miles per gallon"]

Saving Graphs
• You can save the graph in a variety of formats from the
menu File → Save As.
Creating a Graph

Histograms
• You can create histograms with the function hist(x), where x
is a numeric vector of values to be plotted.
• The option freq=FALSE plots probability densities instead of
frequencies.
• The option breaks= controls the number of bins.

# Simple Histogram
hist(mtcars$mpg)

[Figure: "Histogram of mtcars$mpg"; x-axis "mtcars$mpg", y-axis "Frequency"]

Creating a Graph

Histograms
• You can create histograms with the function hist(x), where x
is a numeric vector of values to be plotted.
• The option freq=FALSE plots probability densities instead of
frequencies.
• The option breaks= controls the number of bins.

# Colored Histogram with Bins = 12
hist(mtcars$mpg, breaks=12, col="red")

[Figure: "Histogram of mtcars$mpg" drawn with 12 red bins]

Creating a Graph

Dot Plots
• Create dot plots with the dotchart(x, labels=) function, where
x is a numeric vector and labels is a vector of labels for each
point.
• cex controls the size of the labels.

# Simple Dotplot
dotchart(mtcars$mpg, labels=row.names(mtcars), cex=.7,
         main="Gas Mileage for Car Models",
         xlab="Miles Per Gallon")

[Figure: dot chart "Gas Mileage for Car Models", one point per car model; x-axis "Miles Per Gallon"]
Creating a Graph

Dot Plots
• You can add a groups= option to designate a factor specifying
how the elements of x are grouped.
• If so, the option gcolor= controls the color of the group labels.

# Dotplot: Grouped, Sorted, and Colored
# Sort by mpg, group and color by cylinder
x <- mtcars[order(mtcars$mpg),] # sort by mpg
x$cyl <- factor(x$cyl) # it must be a factor
x$color[x$cyl==4] <- "red"
x$color[x$cyl==6] <- "blue"
x$color[x$cyl==8] <- "darkgreen"
dotchart(x$mpg, labels=row.names(x), cex=.7, groups=x$cyl,
         main="Gas Mileage for Car Models\ngrouped by cylinder",
         xlab="Miles Per Gallon", gcolor="black", color=x$color)

[Figure: dot chart "Gas Mileage for Car Models grouped by cylinder", groups 4, 6, and 8; x-axis "Miles Per Gallon"]
Creating a Graph

Bar Plots
• Create bar plots with the barplot(height) function, where
height is a vector or matrix.
• If height is a vector, the values determine the heights of the
bars in the plot.

# Simple Bar Plot
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
        xlab="Number of Gears")

[Figure: bar plot "Car Distribution"; x-axis "Number of Gears"]
Creating a Graph

Bar Plots
• Create bar plots with the barplot(height) function, where
height is a vector or matrix.
• If height is a vector, the values determine the heights of the
bars in the plot.

# Simple Horizontal Bar Plot with Added Labels
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution", horiz=TRUE,
        names.arg=c("3 Gears", "4 Gears", "5 Gears"))

[Figure: horizontal bar plot "Car Distribution" with bars labeled "3 Gears", "4 Gears", "5 Gears"]
Creating a Graph

Stacked Bar Plot
• If height is a matrix and the option beside=FALSE, then each
bar of the plot corresponds to a column of height, with the
values in the column giving the heights of stacked “sub-bars”.

# Stacked Bar Plot with Colors and Legend
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
        xlab="Number of Gears", col=c("darkblue","red"),
        legend = rownames(counts))

[Figure: stacked bar plot "Car Distribution by Gears and VS" with a legend for VS = 0, 1; x-axis "Number of Gears"]

Creating a Graph

Grouped Bar Plot
• If height is a matrix and beside=TRUE, then the values in each
column are juxtaposed rather than stacked.

# Grouped Bar Plot
counts <- table(mtcars$vs, mtcars$gear)
barplot(counts, main="Car Distribution by Gears and VS",
        xlab="Number of Gears", col=c("darkblue","red"),
        legend = rownames(counts), beside=TRUE)

[Figure: grouped bar plot "Car Distribution by Gears and VS" with a legend for VS = 0, 1; x-axis "Number of Gears"]

Creating a Graph

Line Charts
• Line charts are created with the function lines(x, y, type=),
where x and y are numeric vectors of (x,y) points to connect.
• type= can take the values in the table below.
• The lines() function adds information to a graph. It cannot
produce a graph on its own.
• Usually it follows a plot(x, y) command that produces a graph.
• By default, plot() plots the (x,y) points. Use the type="n"
option in the plot() command to create the graph with axes,
titles, etc., but without plotting the points.

type  description
p     points
l     lines
o     overplotted points and lines
b, c  points (empty if "c") joined by lines
s, S  stair steps
h     histogram-like vertical lines
n     does not produce any points or lines
Creating a Graph

Line Charts

x <- 1:5 # create some data
y <- x # create some data
# plotting symbol and color
plot(x, y, type="n")
lines(x, y, type="o")

[Figure: line chart of y against x drawn with type="o"]
Creating a Graph

Line Charts

# plotting symbol and color
par(pch=22, col="red")
# all plots on one page
par(mfrow=c(2,4))
opts = c("p","l","o","b","c","s","S","h")
for(i in 1:length(opts)){
  heading = paste("type=",opts[i])
  plot(x, y, type="n", main=heading)
  lines(x, y, type=opts[i])
}
# return to original setting
par(mfrow=c(1,1))

[Figure: 2x4 panel of line charts, one panel per type= value: "p", "l", "o", "b", "c", "s", "S", "h"]
Creating a Graph

Line Charts

par(pch=22, col="blue")
par(mfrow=c(2,4))
opts = c("p","l","o","b","c","s","S","h")
for(i in 1:length(opts)){
  heading = paste("type=",opts[i])
  plot(x, y, main=heading)
  lines(x, y, type=opts[i])
}
# return to original setting
par(mfrow=c(1,1))

[Figure: 2x4 panel as before, with plot() now also drawing the points in each panel]
Creating a Graph

Pie Charts
• Pie charts are not recommended in the R documentation, and
their features are somewhat limited.
• The authors recommend bar or dot plots over pie charts
because people are able to judge length more accurately than
volume.

Simple Pie Chart
# Simple Pie Chart
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")

[Figure: "Pie Chart of Countries" with slices labeled US, UK, Australia, Germany, France]
Creating a Graph

Pie Chart with Annotated Percentages

# Pie Chart with Percentages
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # add % to labels
pie(slices, labels = lbls, col=rainbow(length(lbls)),
    main="Pie Chart of Countries")

[Figure: "Pie Chart of Countries" with labels such as "US 20%", "UK 24%", "Germany 32%"]
Creating a Graph

3D Pie Chart
• The pie3D() function in the plotrix package provides 3D
exploded pie charts.

# 3D Exploded Pie Chart
library(plotrix)
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie3D(slices, labels=lbls, explode=0.1,
      main="Pie Chart of Countries")

[Figure: 3D exploded "Pie Chart of Countries"]
Creating a Graph

Boxplot
• Boxplots can be created for individual variables or for variables
by group.
• The format is boxplot(x, data=), where x is a formula and
data= denotes the data frame providing the data.

# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl, data=mtcars, main="Car Mileage Data",
        xlab="Number of Cylinders", ylab="Miles Per Gallon")

[Figure: "Car Mileage Data" boxplots of MPG by number of cylinders (4, 6, 8)]
Creating a Graph

Simple Scatterplot
• There are many ways to create a scatterplot in R.
• The basic function is plot(x, y), where x and y are numeric
vectors denoting the (x,y) points to plot.

# Simple Scatterplot
attach(mtcars)
plot(wt, mpg, main = "Scatterplot Example",
     xlab = "Car Weight",
     ylab = "Miles Per Gallon", pch=19)

[Figure: "Scatterplot Example"; x-axis "Car Weight", y-axis "Miles Per Gallon"]
Creating a Graph

Add Fit Line to Scatterplot

# Add fit lines
abline(lm(mpg~wt), col="red")

[Figure: the previous scatterplot with a red least-squares line added]
Creating a Graph

Simple Scatterplot
• The scatterplot() function in the car package offers many
enhanced features, including fit lines, marginal box plots,
conditioning on a factor, and interactive point identification.
• Each of these features is optional.

# Enhanced Scatterplot of MPG vs. Weight
# by Number of Car Cylinders
library(car)
scatterplot(mpg ~ wt | cyl, data=mtcars, smoother = F,
            xlab="Weight of Car", ylab="Miles Per Gallon",
            main="Enhanced Scatter Plot",
            labels=row.names(mtcars))

[Figure: "Enhanced Scatter Plot" conditioned on cyl (4, 6, 8); x-axis "Weight of Car", y-axis "Miles Per Gallon"]
Creating a Graph

Scatterplot Matrices
• There are at least 4 useful functions for creating scatterplot
matrices.

# Basic Scatterplot Matrix
pairs(~mpg+disp+drat+wt, data=mtcars,
      main="Simple Scatterplot Matrix")

[Figure: "Simple Scatterplot Matrix" of mpg, disp, drat, and wt]
Creating a Graph

Scatterplot Matrices
• The car package can condition the scatterplot matrix on a
factor, and optionally include lowess and linear best fit lines,
and boxplots, densities, or histograms in the principal diagonal,
as well as rug plots in the margins of the cells.

# Scatterplot Matrices from the car Package
library(car)
scatterplotMatrix(~mpg+disp+drat+wt|cyl,
                  data=mtcars, smoother=F, by.group=TRUE,
                  diagonal = "density")

[Figure: scatterplot matrix of mpg, disp, drat, and wt conditioned on cyl, with density plots on the diagonal]
Creating a Graph

3D Scatterplots
• You can create a 3D scatterplot with the scatterplot3d package.
• Use the function scatterplot3d(x, y, z).

# 3D Scatterplot
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt, disp, mpg, main="3D Scatterplot")

[Figure: "3D Scatterplot" of wt, disp, and mpg]
Creating a Graph

3D Scatterplots with Coloring and Vertical Drop Lines

library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt, disp, mpg, pch=16,
              highlight.3d=TRUE,
              type="h", main="3D Scatterplot")

[Figure: "3D Scatterplot" with color-highlighted points and vertical drop lines]
Creating a Graph

3D Scatterplots with Coloring and Vertical Drop Lines and
Regression Plane

library(scatterplot3d)
attach(mtcars)
s3d <- scatterplot3d(wt, disp, mpg, pch=16,
                     highlight.3d=TRUE,
                     type="h", main="3D Scatterplot")
fit <- lm(mpg ~ wt+disp)
s3d$plane3d(fit)

[Figure: "3D Scatterplot" with a fitted regression plane added]
Creating a Graph

Combining Plots
• R makes it easy to combine multiple plots into one overall
graph, using the par() function.
• With the par() function, you can include the option
mfrow=c(nrows, ncols) to create a matrix of nrows x ncols
plots that are filled in by row.
• mfcol=c(nrows, ncols) fills in the matrix by columns.

# 4 figures arranged in 2 rows and 2 columns
attach(mtcars)
par(mfrow=c(2,2))
plot(wt, mpg, main="Scatterplot of wt vs. mpg")
plot(wt, disp, main="Scatterplot of wt vs disp")
hist(wt, main="Histogram of wt")
boxplot(wt, main="Boxplot of wt")

[Figure: 2x2 panel: "Scatterplot of wt vs. mpg", "Scatterplot of wt vs disp", "Histogram of wt", "Boxplot of wt"]
Hypothesis Testing

Student’s t-Test
• The t.test() function produces a variety of t-tests.
• Unlike most statistical packages, the default assumes unequal
variances and applies the Welch df modification.

Function Help
t.test(x, ...)
## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

# independent 2-group t-test
# where y is numeric and x is a binary factor
t.test(y~x)

# independent 2-group t-test
t.test(y1,y2) # where y1 and y2 are numeric

# paired t-test
t.test(y1,y2,paired=TRUE) # y1 & y2 are numeric

# one sample t-test
t.test(y,mu=3) # Ho: mu=3
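As a self-contained illustration (simulated data, not from the slides), a one-sample test against the true mean:

```r
set.seed(42)
y <- rnorm(100, mean = 3, sd = 1)   # simulated sample with true mu = 3

tt <- t.test(y, mu = 3)             # Ho: mu = 3

stopifnot(unname(tt$parameter) == 99)   # df = n - 1 for the one-sample test
stopifnot(length(tt$conf.int) == 2)     # lower and upper confidence limits
print(tt$p.value)
```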

Multiple (Linear) Regression

• R provides comprehensive support for multiple linear regression.

Fitting the Model
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results

Other useful functions
coefficients(fit) # model coefficients
confint(fit, level=0.95) # CIs for model parameters
fitted(fit) # predicted values
residuals(fit) # residuals
anova(fit) # anova table
vcov(fit) # covariance matrix for model parameters
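The extractor functions can be tried on any lm fit; a sketch on the built-in mtcars data (the choice of variables is arbitrary, for illustration only):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

stopifnot(length(coefficients(fit)) == 3)          # intercept + 2 slopes
stopifnot(nrow(confint(fit, level = 0.95)) == 3)   # one CI per coefficient
# fitted values and residuals reconstruct the observed response
stopifnot(all.equal(unname(fitted(fit) + residuals(fit)), mtcars$mpg))
stopifnot(identical(dim(vcov(fit)), c(3L, 3L)))    # square covariance matrix
```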

Linear Regression

Simple Linear Regression

• We begin with a small example to provide a feel for the process.


• The data set Journals is taken from Stock and Watson (2007).
• The data provide some information on subscriptions to economics
journals at US libraries for the year 2000.
• Bergstrom (2001) argues that commercial publishers are charging
excessive prices for academic journals and also suggests ways that
economists can deal with this problem.
• The Journals data frame contains 180 observations (the journals) on 10
variables, among them the number of library subscriptions (subs), the
library subscription price (price), and the total number of citations for the
journal (citations).
Linear Regression

Simple Linear Regression


• The data can be loaded, transformed, and summarized via
> library(AER)
> data("Journals")
> names(Journals)
[1] "title" "publisher" "society" "price" "pages" "charpp" "citations"
[8] "foundingyear" "subs" "field"
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> summary(journals)
subs price citeprice
Min. : 2.0 Min. : 20.0 Min. : 0.005223
1st Qu.: 52.0 1st Qu.: 134.5 1st Qu.: 0.464495
Median : 122.5 Median : 282.0 Median : 1.320513
Mean : 196.9 Mean : 417.7 Mean : 2.548455
3rd Qu.: 268.2 3rd Qu.: 540.8 3rd Qu.: 3.440171
Max. :1098.0 Max. :2120.0 Max. :24.459459

Simple Linear Regression

• In view of the wide range of the variables, combined with a considerable amount of skewness, it is useful to take logarithms.

R code

par(mfrow=c(2,2))
hist(journals$subs, main="")
hist(journals$citeprice, main="")
hist(log(journals$subs), main="")
hist(log(journals$citeprice), main="")
par(mfrow=c(1,1))

[Figure: 2 × 2 histograms of journals$subs, journals$citeprice, log(journals$subs), and log(journals$citeprice)]
Simple Linear Regression

• The goal is to estimate the effect of the price per citation on the
number of library subscriptions.
> jmodel<-lm(log(subs)~log(citeprice), data = journals)
> summary(jmodel)

Call:
lm(formula = log(subs) ~ log(citeprice), data = journals)

Residuals:
Min 1Q Median 3Q Max
-2.72478 -0.53609 0.03721 0.46619 1.84808

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.76621 0.05591 85.25 <2e-16 ***
log(citeprice) -0.53305 0.03561 -14.97 <2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.7497 on 178 degrees of freedom


Multiple R-squared: 0.5573, Adjusted R-squared: 0.5548
F-statistic: 224 on 1 and 178 DF, p-value: < 2.2e-16

Simple Linear Regression

Confidence intervals
• It is good practice to give a measure of error along with every
estimate.
• One way to do this is to provide a confidence interval.
• This is available via the extractor function confint().

> confint(jmodel, level = 0.95)


2.5 % 97.5 %
(Intercept) 4.6558822 4.8765420
log(citeprice) -0.6033319 -0.4627751
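Under the hood, confint() is just estimate ± t-quantile × standard error. A sketch using R's built-in cars data as a stand-in (jmodel needs the AER Journals data, so the model fit here is ours):

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in regression
est <- coef(fit)["speed"]
se  <- coef(summary(fit))["speed", "Std. Error"]
df  <- df.residual(fit)

lower <- est + qt(0.025, df) * se      # 2.5 % bound
upper <- est + qt(0.975, df) * se      # 97.5 % bound

all.equal(unname(c(lower, upper)), unname(confint(fit)["speed", ]))  # TRUE
```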

Simple Linear Regression

Prediction
• Often a regression model is used for prediction.
• There are two types of predictions: the prediction of points on the regression
line and the prediction of a new data value.
• The standard errors of predictions for new data take into account both the
uncertainty in the regression line and the variation of the individual points
about the line.
• Thus, the prediction interval is larger than that for prediction of points on the
line.
• The function predict() provides both types of standard errors.
> predict(jmodel, newdata=data.frame(citeprice=2.11), interval = "confidence")
fit lwr upr
1 4.368188 4.247485 4.48889
> predict(jmodel, newdata=data.frame(citeprice=2.11), interval = "prediction")
fit lwr upr
1 4.368188 2.883746 5.852629

Simple Linear Regression

Diagnostic plots

• The plot() command for an object of class lm provides four diagnostic plots: Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage.
• The figure depicts the result for the journals regression.
• We set the graphical parameter mfrow to c(2, 2) using the par() function, creating a 2 × 2 matrix of plotting areas to see all four plots simultaneously.

[Figure: the four lm diagnostic plots for the journals regression; the journals IO, BoIES, and MEPiTE are flagged as extreme observations]
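A self-contained sketch of the commands described above, using the built-in cars data as a stand-in (jmodel is defined on an earlier slide; the null pdf device only keeps the example from opening a window):

```r
fit <- lm(log(dist) ~ log(speed), data = cars)  # stand-in for jmodel

pdf(NULL)              # draw to a null device (no window, no file)
par(mfrow = c(2, 2))   # 2 x 2 grid: all four diagnostic plots at once
plot(fit)              # Residuals vs Fitted, Normal Q-Q,
                       # Scale-Location, Residuals vs Leverage
par(mfrow = c(1, 1))   # restore the default layout
invisible(dev.off())
```

Interactively, plot(jmodel) after par(mfrow = c(2, 2)) is all that is needed.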

Simple Linear Regression

Testing a linear hypothesis

• Often it is necessary to test more general hypotheses.


• This is possible using the function linearHypothesis() from the car
package
> linearHypothesis(jmodel, "log(citeprice)=-0.5")
Linear hypothesis test

Hypothesis:
log(citeprice) = - 0.5

Model 1: restricted model


Model 2: log(subs) ~ log(citeprice)

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    179 100.54
2    178 100.06  1   0.48421 0.8614 0.3546

R and LaTeX: texreg package

> library(texreg)
> texreg(jmodel, dcolumn = TRUE, booktabs = TRUE)
\begin{table}
\begin{center}
\begin{tabular}{l D{.}{.}{3.5}@{} }
\toprule
 & \multicolumn{1}{c}{Model 1} \\
\midrule
(Intercept) & 4.77^{***} \\
 & (0.06) \\
log(citeprice) & -0.53^{***} \\
 & (0.04) \\
\midrule
R$^2$ & 0.56 \\
Adj. R$^2$ & 0.55 \\
Num. obs. & 180 \\
\bottomrule
\multicolumn{2}{l}{\scriptsize{
\textsuperscript{***}$p<0.001$,
\textsuperscript{**}$p<0.01$,
\textsuperscript{*}$p<0.05$}}
\end{tabular}
\caption{Statistical models}
\label{table:coefficients}
\end{center}
\end{table}

The compiled result (Table: Statistical models):

                 Model 1
(Intercept)      4.77∗∗∗
                 (0.06)
log(citeprice)  −0.53∗∗∗
                 (0.04)
R2               0.56
Adj. R2          0.55
Num. obs.        180
∗∗∗ p < 0.001, ∗∗ p < 0.01, ∗ p < 0.05
R and LaTeX: stargazer package

> library(stargazer)
> stargazer(jmodel, align=T)
\begin{table}[!htbp] \centering
\caption{}
\label{}
\begin{tabular}{@{\extracolsep{5pt}}lD{.}{.}{-3} } \\[-1.8ex]\hline \hline \\[-1.8ex]
& \multicolumn{1}{c}{\textit{Dependent variable:}} \\ \cline{2-2}
\\[-1.8ex] & \multicolumn{1}{c}{log(subs)} \\ \hline \\[-1.8ex]
log(citeprice) & -0.533^{***} \\
& (0.036) \\
& \\
Constant & 4.766^{***} \\
& (0.056) \\
& \\ \hline \\[-1.8ex]
Observations & \multicolumn{1}{c}{180} \\
R$^{2}$ & \multicolumn{1}{c}{0.557} \\
Adjusted R$^{2}$ & \multicolumn{1}{c}{0.555} \\
Residual Std. Error & \multicolumn{1}{c}{0.750 (df = 178)} \\
F Statistic & \multicolumn{1}{c}{224.037$^{***}$ (df = 1; 178)} \\ \hline \hline \\[-1.8ex]
\textit{Note:} & \multicolumn{1}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
\normalsize
\end{tabular}
\end{table}

R and LaTeX: stargazer package

Table: caption

Dependent variable:
log(subs)
log(citeprice) −0.533∗∗∗
(0.036)

Constant 4.766∗∗∗
(0.056)

Observations 180
R2 0.557
Adjusted R2 0.555
Residual Std. Error 0.750 (df = 178)
F Statistic 224.037∗∗∗ (df = 1; 178)
Note: ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01

Multiple Linear Regression

• To illustrate how to deal with multiple regression in R, we consider a standard task in labor economics, estimation of a wage equation in semilogarithmic form.
• Here, we employ the CPS1988 data frame collected in the March 1988 Current
Population Survey (CPS) by the US Census Bureau and analyzed by Bierens
and Ginther (2001).

> data("CPS1988")
> head(CPS1988)
wage education experience ethnicity smsa region parttime
1 354.9 7 45 cauc yes northeast no
2 123.5 12 1 cauc yes northeast yes
3 370.4 9 9 cauc yes northeast no
4 754.9 11 46 cauc yes northeast no
5 593.5 12 36 cauc yes northeast no
6 377.2 16 22 cauc yes northeast no

Multiple Linear Regression

> options(digits=4)
> educmodel<-lm(log(wage)~experience+I(experience^2)+education+ethnicity,data=CPS1988)
> summary(educmodel)

Call:
lm(formula = log(wage) ~ experience + I(experience^2) + education +
ethnicity, data = CPS1988)

Residuals:
Min 1Q Median 3Q Max
-2.943 -0.316 0.058 0.376 4.383

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.321395 0.019174 225.4 <2e-16 ***
experience 0.077473 0.000880 88.0 <2e-16 ***
I(experience^2) -0.001316 0.000019 -69.3 <2e-16 ***
education 0.085673 0.001272 67.3 <2e-16 ***
ethnicityafam -0.243364 0.012918 -18.8 <2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.584 on 28150 degrees of freedom


Multiple R-squared: 0.335, Adjusted R-squared: 0.335
F-statistic: 3.54e+03 on 4 and 28150 DF, p-value: <2e-16

Multiple Linear Regression

Interactions
• Let us consider an interaction between ethnicity and education.

> cps_int<-lm(log(wage)~experience+I(experience^2)+education*ethnicity,data=CPS1988)
> coeftest(cps_int)

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 4.313059 0.019590 220.17 <2e-16 ***
experience 0.077520 0.000880 88.06 <2e-16 ***
I(experience^2) -0.001318 0.000019 -69.34 <2e-16 ***
education 0.086312 0.001309 65.94 <2e-16 ***
ethnicityafam -0.123887 0.059026 -2.10 0.036 *
education:ethnicityafam -0.009648 0.004651 -2.07 0.038 *
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Multiple Linear Regression

Interactions
• Let us consider only the interaction between ethnicity and education.

> cps_int2<-lm(log(wage)~experience+I(experience^2)+education:ethnicity,data=CPS1988)
> coeftest(cps_int2)

t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


(Intercept) 4.303774 0.019085 225.5 <2e-16 ***
experience 0.077567 0.000880 88.1 <2e-16 ***
I(experience^2) -0.001320 0.000019 -69.5 <2e-16 ***
education:ethnicitycauc 0.086986 0.001269 68.5 <2e-16 ***
education:ethnicityafam 0.067812 0.001642 41.3 <2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
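The difference between education*ethnicity and education:ethnicity can be seen directly in the design matrix. A sketch with a tiny made-up data frame (the names d, educ, and eth are ours, with factor levels ordered as in CPS1988):

```r
d <- data.frame(y    = rnorm(6),
                educ = c(8, 10, 12, 12, 14, 16),
                eth  = factor(rep(c("cauc", "afam"), 3),
                              levels = c("cauc", "afam")))

# a*b expands to main effects plus interaction: a + b + a:b
colnames(model.matrix(y ~ educ * eth, d))
# "(Intercept)" "educ" "ethafam" "educ:ethafam"

# a:b keeps only the interaction: one education slope per ethnicity
colnames(model.matrix(y ~ educ:eth, d))
# "(Intercept)" "educ:ethcauc" "educ:ethafam"
```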

R and LaTeX: stargazer package

                     Dependent variable: log(wage)
                     (1)                  (2)
education            0.101∗∗∗             0.076∗∗∗
                     (0.001)              (0.001)

experience           0.020∗∗∗
                     (0.0003)

Constant             4.489∗∗∗             5.178∗∗∗
                     (0.020)              (0.019)

Observations         28,155               28,155
R2                   0.213                0.095
Adjusted R2          0.213                0.095
Residual Std. Error  0.635 (df = 28152)   0.681 (df = 28153)
F Statistic          3,806.000∗∗∗ (df = 2; 28152)   2,942.000∗∗∗ (df = 1; 28153)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Linear Regression with Time Series Data

• The package dynlm (Zeileis 2008) provides the function dynlm(), which allows for formulas such as

  ∆y_t = β₀ + β₁ ∆y_{t−1} + β₂ x_{t−4} + ε_t ,

• that describes a regression of the first differences of a variable y on its first difference lagged by one period and on the fourth lag of a variable x.
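Outside dynlm(), the same transformations can be built from base R's diff() and lag() on a ts object. A sketch on a simulated series (y is ours, not from any dataset):

```r
set.seed(1)
y <- ts(cumsum(rnorm(24)), start = c(2005, 1), frequency = 12)

dy  <- diff(y)      # d(y): first differences, one observation shorter
ldy <- lag(dy, -1)  # L(d(y)): the same values, shifted one period back

length(y)   # 24
length(dy)  # 23
# lag() only moves the time stamps; the underlying values are unchanged
identical(as.numeric(ldy), as.numeric(dy))  # TRUE
```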

Linear Regression with Time Series Data

Example: create data and estimate a linear regression


> set.seed(1)
> x<-rnorm(100)
> y<-0.5*x+rnorm(100)/2
> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
Min 1Q Median 3Q Max
-0.9384 -0.3069 -0.0697 0.2697 1.1731

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0188 0.0485 -0.39 0.7
x 0.4995 0.0539 9.27 4.6e-15 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.481 on 98 degrees of freedom


Multiple R-squared: 0.467, Adjusted R-squared: 0.462
F-statistic: 86 on 1 and 98 DF, p-value: 4.58e-15
Linear Regression with Time Series Data

Example: bind data, create a time series, and estimate a dynamic model
> data<-cbind(y,x)
> head(data)
y x
[1,] -0.6234 -0.6265
[2,] 0.1129 0.1836
[3,] -0.8733 -0.8356
[4,] 0.8767 1.5953
[5,] -0.1625 0.3295
[6,] 0.4734 -0.8205
> data.ts<-ts(data,start=c(2005,1), frequency = 12)
> window(data.ts, start=c(2005,1), end=c(2005,6))
y x
Jan 2005 -0.6234 -0.6265
Feb 2005 0.1129 0.1836
Mar 2005 -0.8733 -0.8356
Apr 2005 0.8767 1.5953
May 2005 -0.1625 0.3295
Jun 2005 0.4734 -0.8205
> library(dynlm)
> dynmodel<-dynlm(d(y)~L(d(y))+L(x, 4), data=data.ts)

Linear Regression with Time Series Data

Example: summarize dynamic linear model

> summary(dynmodel)

Time series regression with "ts" data:


Start = 2005(5), End = 2013(4)

Call:
dynlm(formula = d(y) ~ L(d(y)) + L(x, 4), data = data.ts)

Residuals:
Min 1Q Median 3Q Max
-2.0278 -0.4267 0.0013 0.5882 2.0425

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0143 0.0838 -0.17 0.86
L(d(y)) -0.5087 0.0876 -5.81 8.9e-08 ***
L(x, 4) 0.0214 0.0940 0.23 0.82
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.809 on 93 degrees of freedom


Multiple R-squared: 0.271, Adjusted R-squared: 0.256
F-statistic: 17.3 on 2 and 93 DF, p-value: 4.06e-07
Linear Regression with Panel Data

• There has been considerable interest in panel data econometrics over the last two decades, and hence it is almost mandatory to include a brief discussion of some common specifications in R.
• The package plm (Croissant and Millo 2008) contains the relevant
fitting functions and methods.
• The main difference between cross-sectional data and panel data is
that panel data have an internal structure, indexed by a
two-dimensional array, which must be communicated to the fitting
function.
• We refer to the cross-sectional objects as “individuals” and the
time identifier as “time”.

Linear Regression with Panel Data

• For illustrating the basic fixed- and random-effects methods, we use the
well-known Grunfeld data (Grunfeld 1958) comprising 20 annual
observations on the three variables for 11 large US firms for the years
1935-1954.
• The basic one-way panel regression is

invest_it = β₁ value_it + β₂ capital_it + η_i + ν_it ,

• where invest is the real gross investment, value is the real value of the
firm, and capital is the real value of the capital stock.
• Originally employed in a study of the determinants of corporate
investment in a University of Chicago Ph.D. thesis, these data have been
a textbook classic since the 1970s.
• The package AER provides the full data set comprising all 11 firms.
Linear Regression with Panel Data

• We use the Grunfeld data from package plm.


> data("Grunfeld", package = "plm")
> head(Grunfeld)
firm year inv value capital
1 1 1935 317.6 3078.5 2.8
2 1 1936 391.8 4661.7 52.6
3 1 1937 410.6 5387.1 156.9
4 1 1938 257.7 2792.2 209.2
5 1 1939 330.8 4313.2 203.4
6 1 1940 461.2 4643.9 207.2

• Utilizing plm.data(), we tell R that the individuals are called “firm”, whereas the time identifier is called “year”:
> library("plm")
> pgr <- plm.data(Grunfeld, index = c("firm", "year"))
Linear Regression with Panel Data

• Panel data models are estimated using

Pooled OLS (POLS)

> gr_pool <- plm(inv ~ value + capital, data = pgr, model = "pooling")

Fixed effects (FE)

> gr_fe <- plm(inv ~ value + capital, data = pgr, model = "within")

Random effects (RE)

> gr_re <- plm(inv ~ value + capital, data = pgr, model = "random")
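The model = "within" fit can be checked against base R: the FE estimator is OLS after subtracting each firm's mean from every variable. A sketch on simulated data (all names are ours, and the true slope is set to 2):

```r
set.seed(42)
n_firm <- 10; n_year <- 20
firm  <- rep(1:n_firm, each = n_year)
alpha <- rnorm(n_firm)[firm]                   # firm fixed effects
x <- rnorm(n_firm * n_year)
y <- 2 * x + alpha + rnorm(n_firm * n_year, sd = 0.5)

# within transformation: remove the firm-specific mean from y and x
demean <- function(v) v - ave(v, firm)
fe_fit <- lm(demean(y) ~ demean(x) - 1)        # no intercept after demeaning
coef(fe_fit)                                   # close to the true slope 2
```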

Linear Regression with Panel Data

library(stargazer)
stargazer(gr_pool, gr_fe, gr_re, align=T, column.labels = c("POLS", "FE", "RE"),
dep.var.caption = "Dependent variable: investment")

              Dependent variable: investment
                                 inv
              POLS          FE            RE
              (1)           (2)           (3)
value         0.116∗∗∗      0.110∗∗∗      0.110∗∗∗
              (0.006)       (0.012)       (0.010)

capital       0.231∗∗∗      0.310∗∗∗      0.308∗∗∗
              (0.025)       (0.017)       (0.017)

Constant      −42.714∗∗∗                  −57.834∗∗
              (9.512)                     (28.899)

Observations  200           200           200
R2            0.812         0.767         0.770
Adjusted R2   0.800         0.721         0.758
F Statistic   426.576∗∗∗ (df = 2; 197)  309.014∗∗∗ (df = 2; 188)  328.837∗∗∗ (df = 2; 197)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Linear Regression with Panel Data

• It is of interest to check whether the fixed effects are really needed.
• This is done by comparing the fixed effects and the pooled OLS fits by means of pFtest() and yields
> pFtest(gr_fe, gr_pool)

F test for individual effects

data: inv ~ value + capital


F = 49.1766, df1 = 9, df2 = 188, p-value < 2.2e-16
alternative hypothesis: significant effects

Linear Regression with Panel Data

• A comparison of the regression coefficients shows that fixed- and random-effects methods yield rather similar results for these data.
• Hausman test can be also used to differentiate between fixed
effects model and random effects model in panel data.
• In this case, Random effects (RE) is preferred under the null
hypothesis due to higher efficiency, while under the alternative
Fixed effects (FE) is at least consistent and thus preferred.
> phtest(gr_re, gr_fe)

Hausman Test

data: inv ~ value + capital


chisq = 2.3304, df = 2, p-value = 0.3119
alternative hypothesis: one model is inconsistent

Dynamic Linear Regression with Panel Data

• Now we present a more advanced example, the dynamic panel data model estimated by the method of Arellano and Bond (1991).
• Recall that their estimator is a generalized method of moments
(GMM) estimator utilizing lagged endogenous regressors after a
first-differences transformation.
• plm comes with the original Arellano-Bond data (EmplUK) dealing
with determinants of employment (emp) in a panel of 140 UK firms
for the years 1976–1984.
• The data are unbalanced, with seven to nine observations per firm.

Dynamic Linear Regression with Panel Data

Arellano and Bond (1991): Difference GMM

> ab1991 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
+   + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:8),
+   data = EmplUK, effect = "twoways", model = "twosteps")
> summary(ab1991)
> summary(ab1991)
Twoways effects Two steps model
Unbalanced Panel: n=140, T=7-9, N=1031
Number of Observations Used: 611
Coefficients
Estimate Std. Error z-value Pr(>|z|)
lag(log(emp), 1:2)1 0.4742 0.1854 2.56 0.01054 *
lag(log(emp), 1:2)2 -0.0530 0.0517 -1.02 0.30605
lag(log(wage), 0:1)0 -0.5132 0.1456 -3.53 0.00042 ***
lag(log(wage), 0:1)1 0.2246 0.1419 1.58 0.11353
log(capital) 0.2927 0.0626 4.67 3.0e-06 ***
lag(log(output), 0:1)0 0.6098 0.1563 3.90 9.5e-05 ***
lag(log(output), 0:1)1 -0.4464 0.2173 -2.05 0.03996 *
---
Sargan Test: chisq(25) = 30.11 (p.value=0.22)
Autocorrelation test (1): normal = -1.538 (p.value=0.062)
Autocorrelation test (2): normal = -0.2797 (p.value=0.39)
Wald test for coefficients: chisq(7) = 142 (p.value=<2e-16)
Wald test for time dummies: chisq(6) = 16.97 (p.value=0.00939)

Dynamic Linear Regression with Panel Data

Blundell and Bond (1998): System GMM

> bb1998 <- pgmm(log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
+   lag(log(capital), 0:1) | lag(log(emp), 2:8) + lag(log(wage), 2:8)
+   + lag(log(capital), 2:8), data = EmplUK, effect = "twoways",
+   model = "onestep", transformation = "ld")
> summary(bb1998, robust = TRUE)
Twoways effects One step model
Unbalanced Panel: n=140, T=7-9, N=1031
Number of Observations Used: 1642
Coefficients
Estimate Std. Error z-value Pr(>|z|)
lag(log(emp), 1) 0.9356 0.0263 35.58 < 2e-16 ***
lag(log(wage), 0:1)0 -0.6310 0.1181 -5.34 9.1e-08 ***
lag(log(wage), 0:1)1 0.4826 0.1369 3.53 0.00042 ***
lag(log(capital), 0:1)0 0.4839 0.0539 8.98 < 2e-16 ***
lag(log(capital), 0:1)1 -0.4244 0.0585 -7.26 4.0e-13 ***
---
Sargan Test: chisq(100) = 118.8 (p.value=0.0971)
Autocorrelation test (1): normal = -4.808 (p.value=7.61e-07)
Autocorrelation test (2): normal = -0.28 (p.value=0.39)
Wald test for coefficients: chisq(5) = 11175 (p.value=<2e-16)
Wald test for time dummies: chisq(7) = 14.71 (p.value=0.0399)

Time Series

The quantmod Package

• quantmod is the short form of Quantitative Financial Modelling Framework, a package written by Jeffrey A. Ryan.
• The quantmod package for R is designed to assist the quantitative trader in the
development, testing, and deployment of statistically based trading models.
• What quantmod IS: A rapid prototyping environment, with comprehensive
tools for data management and visualization, where quant traders can quickly
and cleanly explore and build trading models.
• What quantmod is NOT: A replacement for anything statistical; It has no
‘new’ modelling routines or analysis tool to speak of; It does now offer charting
not currently available elsewhere in R, but most everything else is more of a
wrapper to what you already know and love about the language and packages
you currently use.

Time Series: quantmod Package

• It is possible with one quantmod function to load data from a variety of sources, including
  ◦ Yahoo! Finance (OHLC data)
  ◦ Federal Reserve Bank of St. Louis FRED (11,000 economic series)
  ◦ Google Finance (OHLC data)
  ◦ Oanda, The Currency Site (FX and Metals)
  ◦ MySQL databases (Your local data)
  ◦ R binary formats (.RData and .rda)
  ◦ Comma Separated Value files (.csv)

Getting data

> # from google finance
> getSymbols("YHOO",src="google")
[1] "YHOO"

> # from yahoo finance
> getSymbols("GOOG",src="yahoo")
[1] "GOOG"

> # FX rates from FRED
> getSymbols("DEXJPUS",src="FRED")
[1] "DEXJPUS"

> # Platinum from Oanda
> getSymbols("XPT/USD",src="Oanda")
[1] "XPTUSD"
Time Series: quantmod Package

Apple stock prices

> library(quantmod)

> getSymbols("AAPL",src="yahoo")
[1] "AAPL"

> head(AAPL)
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2007-01-03 86.29 86.58 81.90 83.80 44225700 81.03
2007-01-04 84.05 85.95 83.82 85.66 30259300 82.83
2007-01-05 85.77 86.20 84.40 85.05 29812200 82.24
2007-01-08 85.96 86.53 85.28 85.47 28468100 82.64
2007-01-09 86.45 92.98 85.15 92.57 119617800 89.51
2007-01-10 94.75 97.80 93.45 97.00 105460000 93.79

Time Series: quantmod Package

> chartSeries(AAPL, theme="white")

[Figure: AAPL daily OHLC chart, 2007−01−03 to 2013−11−22, last price 519.8, with volume panel (last volume in millions: 7,979,500)]

Time Series: quantmod Package

Aggregating to a different time scale


> periodicity(AAPL)
Daily periodicity from 2007-01-03 to 2013-11-22

> head(to.weekly(AAPL))
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2007-01-05 86.29 86.58 81.90 85.05 104297200 82.24
2007-01-12 85.96 97.80 85.15 94.62 351865300 91.49
2007-01-19 95.68 97.60 88.12 88.50 236407700 85.57
2007-01-26 89.14 89.16 84.99 85.38 195789700 82.55
2007-02-02 86.30 86.65 83.70 84.75 129342000 81.95
2007-02-09 84.30 86.51 82.86 83.27 144630100 80.51

> head(to.monthly(AAPL))
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
jan 2007 86.29 97.80 81.90 85.73 971777900 82.89
fev 2007 86.23 90.81 82.86 84.61 490084100 81.81
mar 2007 84.03 96.83 83.75 92.91 568523000 89.84
abr 2007 94.14 102.50 89.60 99.80 480705000 96.50
mai 2007 99.59 122.17 98.55 121.19 620181000 117.18
jun 2007 121.10 127.61 115.40 122.04 831412200 118.00

Time Series: quantmod Package

Period Simple Returns

> head(dailyReturn(AAPL[,6]))
           daily.returns
2007-01-03      0.000000
2007-01-04      0.022214
2007-01-05     -0.007123
2007-01-08      0.004864
2007-01-09      0.083132
2007-01-10      0.047816

> head(weeklyReturn(AAPL[,6]))
           weekly.returns
2007-01-05       0.014933
2007-01-12       0.112476
2007-01-19      -0.064707
2007-01-26      -0.035293
2007-02-02      -0.007268
2007-02-09      -0.017572

> head(monthlyReturn(AAPL[,6]))
           monthly.returns
2007-01-31        0.022954
2007-02-28       -0.013029
2007-03-30        0.098154
2007-04-30        0.074132
2007-05-31        0.214301
2007-06-29        0.006998

> head(quarterlyReturn(AAPL[,6]))
           quarterly.returns
2007-03-30            0.1087
2007-06-29            0.3134
2007-09-28            0.2575
2007-12-31            0.2907
2008-03-31           -0.2756
2008-06-30            0.1668
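These period simple returns relate to the log returns on the next slide through r_log = log(1 + r_simple). A quick base-R sketch with made-up prices (the vector prices is hypothetical):

```r
prices <- c(100, 102, 101, 105)            # hypothetical adjusted closes

simple <- diff(prices) / head(prices, -1)  # (P_t - P_{t-1}) / P_{t-1}
logret <- diff(log(prices))                # log(P_t / P_{t-1})

# the two are linked exactly by log(1 + r)
all.equal(logret, log(1 + simple))  # TRUE
```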
Time Series: Basic Plots

Monthly Log Returns

> r.aapl <- monthlyReturn(AAPL[,6], type = "log")
> head(r.aapl)
           monthly.returns
2007-01-31        0.022695
2007-02-28       -0.013115
2007-03-30        0.093631
2007-04-30        0.071513
2007-05-31        0.194168
2007-06-29        0.006973

[Figure: time series plot of r.aapl, Jan 2007 to 2013]

ACF and PACF

> library(TSA)
> par(mfrow=c(2,1))
> acf(r.aapl, main = "acf(r.aapl)")
> pacf(r.aapl, main = "pacf(r.aapl)")
> par(mfrow=c(1,1))

[Figure: ACF and PACF of r.aapl]

Time Series: ARIMA models

• The package forecast (Hyndman and Khandakar 2008) contains a function auto.arima() that performs a search over a user-defined set of models.
• auto.arima() returns best ARIMA model according to either
AIC, AICc or BIC value.
> library(forecast)
> auto.arima(r.aapl, ic = "bic")
Series: r.aapl
ARIMA(0,0,0) with zero mean

sigma^2 estimated as 0.0113: log likelihood=68.37


AIC=-134.7 AICc=-134.7 BIC=-132.3

Time Series: ARIMA models

Estimate an ARIMA model


> ar.aapl<-arima(r.aapl, order = c(1,0,0), include.mean = T)
> ar.aapl
Series: x
ARIMA(1,0,0) with non-zero mean

Coefficients:
ar1 intercept
0.170 0.022
s.e. 0.107 0.013

sigma^2 estimated as 0.0105: log likelihood=71.49


AIC=-139 AICc=-138.7 BIC=-131.7

> coeftest(ar.aapl)

z test of coefficients:

Estimate Std. Error z value Pr(>|z|)


ar1 0.1702 0.1075 1.58 0.113
intercept 0.0223 0.0135 1.66 0.098 .
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Time Series: ARIMA models

Ljung-Box Test of the Residuals of the Estimated Model


> Box.test(resid(ar.aapl), lag = 5, type = "Ljung-Box", fitdf = 1)

Box-Ljung test

data: resid(ar.aapl)
X-squared = 4.592, df = 4, p-value = 0.3318

> Box.test(resid(ar.aapl), lag = 10, type = "Ljung-Box", fitdf = 1)

Box-Ljung test

data: resid(ar.aapl)
X-squared = 9.166, df = 9, p-value = 0.4221

> Box.test(resid(ar.aapl), lag = 15, type = "Ljung-Box", fitdf = 1)

Box-Ljung test

data: resid(ar.aapl)
X-squared = 12.21, df = 14, p-value = 0.5896
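As a sanity check on how the test behaves, one can run it on pure white noise, where no autocorrelation should be found. A sketch (wn is simulated; fitdf stays at its default 0 because no model was fitted):

```r
set.seed(1)
wn <- rnorm(200)                                  # white noise
bt <- Box.test(wn, lag = 10, type = "Ljung-Box")  # fitdf defaults to 0
bt$parameter  # df = 10 (lag - fitdf)
bt$p.value    # typically large for white noise: no autocorrelation found
```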

Time Series: ARCH effects

Ljung-Box Test of the Squared Residuals of the Estimated Model
> Box.test(resid(ar.aapl)^2, lag = 5, type = "Ljung-Box")

Box-Ljung test

data: resid(ar.aapl)^2
X-squared = 2.762, df = 5, p-value = 0.7367

> Box.test(resid(ar.aapl)^2, lag = 10, type = "Ljung-Box")

Box-Ljung test

data: resid(ar.aapl)^2
X-squared = 26.59, df = 10, p-value = 0.003021

> Box.test(resid(ar.aapl)^2, lag = 15, type = "Ljung-Box")

Box-Ljung test

data: resid(ar.aapl)^2
X-squared = 27.31, df = 15, p-value = 0.02633

Time Series: ARCH effects

ARCH LM Test

> library(FinTS) # Necessary package for ArchTest


> ArchTest(r.aapl)

ARCH LM-test; Null hypothesis: no ARCH effects

data: r.aapl
Chi-squared = 20.93, df = 12, p-value = 0.05143

Time Series: GARCH Model

GARCH(1, 1)

> library(fGarch)
> model<-garchFit(~garch(1,1), data = r.aapl, trace=F)
> summary(model)
Title: GARCH Modelling
Conditional Distribution: norm

Coefficient(s):
Estimate Std. Error t value Pr(>|t|)
mu 0.0215996 0.0104962 2.058 0.0396 *
omega 0.0006796 0.0005811 1.169 0.2422
alpha1 0.1014303 0.0807611 1.256 0.2091
beta1 0.8378938 0.0795071 10.539 <2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

References

• Crawley, M. J. (2012). The R book. 2nd ed. Wiley.


• Dalgaard, P. (2008). Introductory statistics with R. 2nd ed. Springer.
• Kleiber, C., & Zeileis, A. (2008). Applied econometrics with R. Springer.
• Ruppert, D. (2011). Statistics and data analysis for financial engineering.
Springer.
• Tsay, R. S. (2010). Analysis of financial time series. 3rd ed. Wiley.
• Tsay, R. S. (2013). An Introduction to Analysis of Financial Data with R.
Wiley.
• Wooldridge, J. M. (2010). Econometric analysis of cross section and
panel data. 2nd ed. MIT.
• Wooldridge, J. M. (2012). Introductory econometrics: a modern
approach. 5th ed. Cengage.
Mini-curso
Introdução ao R

Prof. Dr. Henrique Castro


hcastro@usp.br

Faculdade de Economia, Administração


e Contabilidade da Universidade de São
Paulo, FEA-USP

2013