
A Short Introduction to R

By Richard Harris, School of Geographical Sciences, University of Bristol


A Short Introduction to R by Richard Harris is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at www.social-statistics.org.
You are free:
to Share: to copy, distribute and transmit the work
to Remix: to adapt the work
Under the following conditions:
Attribution: You must attribute the work in the following manner: Based on A Short Introduction to R by Richard Harris (www.social-statistics.org).
Noncommercial: You may not use this work for commercial purposes. Use for education in a recognised higher education institution (a University) is permissible.
Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
With the understanding that:
Waiver: Any of the above conditions can be waived if you get permission from the copyright holder (Richard Harris, rich.harris@bris.ac.uk).
Public Domain: Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
Other Rights: In no way are any of the following rights affected by the license:
Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
The author's moral rights;
Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
Notice: For any reuse or distribution, you must make clear to others the license terms of this work (which applies also to derivatives).
(Document version 1, 2012)
Introduction
This document presents a short introduction to R, highlighting some geographical functionality. Specifically, it provides:
A basic overview of the 'nuts and bolts' of R (Session 1)
Some examples of data analysis and simple mapping in R (Session 2)
Some further information about the workings of R (Session 3)
The document is provided in good faith and the contents have been tested by the author. However, use is entirely at the user's risk. Absolutely no responsibility or liability is accepted by the author for consequences arising from however this document is used. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (see above).
Before starting, the following should be considered:
First, you will notice that in this document the pages and, more unusually, the lines are numbered. The reason is educational: it makes directing a class to a specific part of a page easier and faster. For other readers, the line numbers can be ignored.
Second, the sessions presume that, as well as R, a number of additional R packages (libraries) have been installed and are available to use. The complete list of packages used is RgoogleMaps, png, sp and spdep. To install these packages, use
> install.packages(c("RgoogleMaps", "png", "sp", "spdep"))
Further instructions for how to install packages can be found in Section 3.5.1, 'Installing and loading one or more of the packages', on page 35.
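If you are not sure which of these packages are already installed, a small helper can check first and install only what is missing. This is a minimal sketch: the helper name missing_packages() is our own invention, not part of R, and the install step is wrapped in interactive() so that nothing is downloaded when the code is run non-interactively.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
# in any library on .libPaths()
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

needed <- c("RgoogleMaps", "png", "sp", "spdep")
to_install <- missing_packages(needed)
to_install   # character(0) when everything is already installed

# Only contact CRAN when something is missing and R is being used interactively
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```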
Third, each session is written to be completed in a single sitting. If that is not possible, then it would normally be possible to stop at a convenient point, save the workspace before quitting R, then reload the saved workspace when you wish to continue. Note, however, that whilst the additional packages (libraries) need only be installed once, they must be loaded again in each new R session that requires them. Any objects that were attached before quitting R also need to be attached again to take you back to the point at which you left off. See the sections entitled 'Saving and loading workspaces', 'Attaching a data frame' and 'Installing and loading one or more of the packages' on pages 10, 29 and 35 for further information.
Session 1: Getting Started with R
This session provides a brief introduction to how R works and introduces some of the more common commands and procedures.
1.1 About R
R is an open source software package, licensed under the GNU General Public Licence. You can obtain and install it for free, with versions available for PCs, Macs and Linux. To find out what's available, go to the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/
Being free is not necessarily a good reason to use R. However, R is not just free, it is also well developed, well documented, widely used and well supported by an extensive user community. It is not just software for 'hobbyists'. It is widely used in research, both academic and commercial.
In his book R in a Nutshell (O'Reilly, 2010), Joseph Adler writes, "R is very good at plotting graphics, analyzing data, and fitting statistical models using data that fits in the computer's memory."
Nevertheless, no software is a perfect tool for every job and Adler adds that "it's not good at storing data in complicated structures, efficiently querying data, or working with data that doesn't fit in the computer's memory."
To these caveats it should be added that R does not offer spreadsheet editing of data to the level found, for example, in Microsoft Excel. Consequently, it is often easier to prepare and 'clean' data prior to loading them into R. There is an add-in to R that provides some integration with Excel. Go to http://rcom.univie.ac.at/ and look for RExcel.
A possible barrier to learning R is that it is generally command-line driven. That is, the user types a command that the software interprets and responds to. This can be daunting for those who are used to extensive graphical user interfaces (GUIs) with drop-down menus, tabs, pop-up menus, left or right-clicking and other navigational tools to steer them through a process. It may mean that R takes a while longer to learn; however, that time is well spent. Once you know the commands it is usually much faster to type them than to work through a series of menu options. They can be easily edited to change things such as the size or colour of symbols on a graph, and a log or script of the commands can be saved for use on another occasion.
That said, a fairly simple and platform-independent GUI called R Commander can be installed (see http://cran.r-project.org/web/packages/Rcmdr/index.html). Field et al.'s book Discovering Statistics Using R provides a comprehensive introduction to statistical analysis in R using both command lines and R Commander.
1.2 Getting Started
Assuming R has been installed in the normal way on your computer, clicking on the link/shortcut to R on the desktop will open the RGui, offering some drop-down menu options, and also the R Console, within which R commands are typed and executed. The appearance of the RGui differs a little depending upon the operating system being used (Windows, Mac or Linux) but, having used one, it should be fairly straightforward to navigate around another.
Figure 1.1. Screen shot of the R Gui for Windows
1.2.1 Using R as a calculator
At its simplest, R can be used as a calculator. Typing 1 + 1 after the prompt > will (after pressing the return/enter key) produce the result 2, as in the following example:
> 1 + 1
[1] 2
Comments are indicated with a hash symbol (#); everything after it on the line is ignored by R:
> # This is a comment, no need to type it
Some other simple mathematical expressions are given below.
> 10 - 5
[1] 5
> 10 * 2
[1] 20
> 10 - 5 * 2 # The order of operations gives priority to multiplication
[1] 0
> (10 - 5) * 2 # The use of brackets changes the order
[1] 10
> sqrt(100) # Uses the function that calculates the square root
[1] 10
> 10^2 # 10 squared
[1] 100
> 100^0.5 # 100 to the power 0.5, i.e. the square root again
[1] 10
> 10^3
[1] 1000
> log10(100) # Uses the function that calculates the common log
[1] 2
> log10(1000)
[1] 3
> 100 / 5
[1] 20
> 100^0.5 / 5
[1] 2
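A few more operators are worth knowing at this stage. The sketch below shows integer division, the remainder (modulus) operator, and one precedence rule that often surprises newcomers: the power operator binds more tightly than a leading minus sign.

```r
10 %/% 3       # integer division: 3
10 %% 3        # remainder after integer division: 1
-2^2           # the power is evaluated first, then negated: -4
(-2)^2         # brackets change the order again: 4
exp(log(10))   # exp() reverses the natural log: 10
```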
1.2.2 Incomplete commands
If you see the + symbol instead of the usual > prompt, it is because what has been typed prior to pressing the return key is incomplete. Often there is a missing bracket. For example,
> sqrt(
+ 100
+ )
[1] 10
> (1 + 2) * (5 - 1
+ )
[1] 12
Commands broken over multiple lines can be easier to read.
> for(i in 1:10) {
+ print(i)
+ }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
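Printing is not the only thing a loop can do. A common pattern is to create an empty vector first and fill it in on each pass; the sketch below does that, then shows that R can usually produce the same result in one vectorised step, without a loop at all. The name squares is illustrative only.

```r
squares <- numeric(10)   # a vector of ten zeros to be filled in
for (i in 1:10) {
  squares[i] <- i^2      # store the result rather than print it
}
squares
(1:10)^2                 # the same values, computed without a loop
```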
1.2.3 Repeating or modifying a previous command
If a mistake is made that needs to be corrected, or if some previously typed commands are to be repeated, then the up and down arrow keys (↑ and ↓) on the keyboard can be used to scroll between previous entries in the R Console. Try it!
1.3 Scripting and Logging in R
1.3.1 Scripting
You can create a new script file from the drop-down menu File > New script (in Windows) or File > New Document (Mac OS). It is basically a text file in which you could write, for example,
a <- 1:10
print(a)
In Windows, if you move the cursor up to the required line of the script and press Ctrl + R, then it will be run in the R Console. So, for example, move the cursor to where you have typed a <- 1:10 and press Ctrl + R. Then move down a line and do the same. The contents of a, the numbers 1 to 10, should be printed in the R Console. If you keep the focus on the Scripting window and go to Edit in the RGui, you will find an option to run everything. Similar commands are available for other operating systems. You can save script files and load previously saved ones.
Scripting is both good practice and good sense. It is good practice because it allows for reproducibility of your work. It is good sense because if you need to go back and change things you can do so easily without having to start from scratch.
All the commands for this session are contained in the file session1.R. If you prefer not to type the commands yourself then you may wish to open this file and use it.
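A saved script can also be run in one go with source(), which is what Session 2 does. As a self-contained sketch, the code below writes the two-line script shown above to a temporary file (standing in for a file you have saved yourself) and then runs it, echoing each command as it goes.

```r
script <- tempfile(fileext = ".R")               # temporary stand-in for a saved script
writeLines(c("a <- 1:10", "print(a)"), script)
source(script, echo = TRUE)                      # runs every line; echo = TRUE shows the commands
```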
1.3.2 Logging
You can save the contents of the R Console window to a text file. The easiest way to do this is to click on the R Console (to take the focus from the Scripting window) and then use File > Save History (in Windows) or File > Save As (Mac). Note that graphics are not usually plotted in the R Console and therefore need to be saved separately.
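Output can also be diverted to a file from the command line with sink(), which works the same way on every operating system. Note that sink() captures what R prints, not the commands you type; the temporary file name below is only for illustration.

```r
log_file <- tempfile(fileext = ".txt")   # illustrative location for the log
sink(log_file)        # from now on, printed output goes to the file
print(1 + 1)
sink()                # restore output to the console
readLines(log_file)   # "[1] 2"
```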
1.4 Some R Basics
1.4.1 Functions, assignments and getting help
It is helpful to understand R as an object-oriented system that assigns information to objects within the current workspace. The workspace is simply all the objects that have been created or loaded since beginning the session in R. Look at it this way: the objects are like box files, containing useful information, and the workspace is a larger storage box, keeping the information together. A useful feature of this is that R can operate on multiple tables of data at once: they are just stored as separate objects within the workspace.
To view the objects currently in the workspace, type
> ls()
character(0)
Doing this runs the function ls(), which lists the contents of the workspace. The result, character(0), indicates that the workspace is empty (assuming it currently is).
To find out more about a function, type ? or help with the function name,
> ?ls
> help(ls)
This will provide details about the function, including examples of its use. It will also list the arguments required to run the function, some of which may be optional and some of which may have default values which can be changed if required. Consider, for example,
> ?log
A required argument is x, which is the data value or values. Typing log() omits any data and generates an error. However, log(100) works just fine. The argument base takes a default value of e^1 (written exp(1) in R), which is approximately 2.72 and means the natural logarithm is calculated. Using log(100, base=10) gives the common logarithm, which can also be calculated using the convenience function log10(100).
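The arguments of a function, with their defaults, can also be listed without opening the help page by using args(). The short sketch below confirms the behaviour described above: with no base given, log() uses exp(1).

```r
args(log)              # function (x, base = exp(1))
log(100)               # natural log: base defaults to exp(1)
log(100, base = 10)    # common log: 2
log10(100)             # convenience function, same result: 2
exp(1)                 # the default base, about 2.718282
```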
The results of mathematical expressions can be assigned to objects, as can the outcome of any commands executed in the R Console. When the object is given a name that is different to other objects within the current workspace, a new object will be created. Where the name and object already exist, the previous contents of the object will be overwritten, without warning, so be careful!
> a <- 10 - 5
> print(a)
[1] 5
> b <- 10 * 2
> print(b)
[1] 20
> print(a * b)
[1] 100
> a <- a * b
> print(a)
[1] 100
In these examples the assignment is achieved using the combination of < and -, as in a <- 100. Alternatively, 100 -> a could be used or, more simply, a = 100. The print(...) command can often be omitted, though it is useful, and sometimes necessary (for example, when what you had hoped would appear on screen doesn't).
> f = a * b
> print(f)
[1] 2000
> f
[1] 2000
> sqrt(b)
[1] 4.472136
> print(sqrt(b), digits=3) # The additional parameter now specifies the number of significant figures
[1] 4.47
> c(a,b) # The c(...) function combines its arguments
[1] 100  20
> c(a,sqrt(b))
[1] 100.000000   4.472136
> print(c(a,sqrt(b)), digits=3)
[1] 100.00   4.47
Although the naming of objects is flexible, there are some exceptions,
> _a <- 10
Error: unexpected input in "_"
> 2a <- 10
Error: unexpected symbol in "2a"
Note also that R is case sensitive, so a and A are different objects
> a <- 10
> A <- 20
> a == A
[1] FALSE
The following is not sensible because the object won't appear when the workspace is listed, although it is there,
> .a <- 10
> ls()
[1] "a" "A" "b" "f"
> .a
[1] 10
> rm(.a, A) # Removes the objects .a and A (see below)
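If you are unsure whether a name is allowed, make.names() converts a candidate into a syntactically valid one, and ls() can be told to show hidden names beginning with a dot. The sketch below illustrates both; .hidden is an arbitrary example name and is removed again at the end.

```r
make.names(c("_a", "2a", "a"))        # "X_a" "X2a" "a": invalid names get an "X" prefix
.hidden <- 1
".hidden" %in% ls()                   # FALSE: ls() skips names starting with "."
".hidden" %in% ls(all.names = TRUE)   # TRUE: but the object is there
rm(.hidden)
```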
1.4.2 Removing objects from the workspace
From typing ls() we know that the workspace is no longer empty. To remove an object from the workspace it can be referenced explicitly, by its name, or indirectly, by its position in the workspace. To see how the second of these options will work, type
> ls()
[1] "a" "b" "f"
The output returned from the ls() function here is a vector of length three where the first element is the first object (alphabetically) in the workspace, the second is the second object, and so forth. We can access specific elements by using notation of the form ls()[index.number]. So, the first element, the first object in the workspace, can be obtained using,
> ls()[1] # Get the brackets right: some rounded, some square
[1] "a"
> ls()[2]
[1] "b"
Note how the square brackets [...] are being used to reference specific elements within the vector. Similarly,
> ls()[3]
[1] "f"
> ls()[c(1,3)]
[1] "a" "f"
> ls()[c(1,2,3)]
[1] "a" "b" "f"
> ls()[c(1:3)] # 1:3 means the numbers 1 to 3
[1] "a" "b" "f"
Using the remove function, rm(...), the first and third objects in the workspace can be removed using
> rm(list=ls()[c(1,3)])
> ls()
[1] "b"
Alternatively, objects can be removed by name
> rm(b)
To delete all the objects in the workspace and therefore empty it, type the following code but, be warned, there is no undo function. Whenever rm(...) is used the objects are deleted permanently.
> rm(list=ls())
> ls()
character(0) # In other words, the workspace is empty
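Rather than picking objects by position, rm() can also be given the names that match a pattern, via ls(pattern = ...), which takes a regular expression. In the sketch below, tmp1, tmp2 and keep are hypothetical objects created only to show that just the names beginning with tmp are removed.

```r
tmp1 <- 1
tmp2 <- 2
keep <- 3
ls(pattern = "^tmp")              # "tmp1" "tmp2"
rm(list = ls(pattern = "^tmp"))   # remove only the matching objects
exists("tmp1")                    # FALSE
exists("keep")                    # TRUE
```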
1.4.3 Saving and loading workspaces
Because objects are deleted permanently, a sensible precaution prior to using rm(...) is to save the workspace. Doing so permits the workspace to be reloaded if necessary and the objects recovered. One way to save the workspace is to use
> save.image(file.choose(new=T))
Alternatively, the drop-down menus can be used (File > Save Workspace in the Windows version of the RGui). In either case, type the extension .RData manually, otherwise it risks being omitted, making it harder to locate and reload what has been saved. Try creating a couple of objects in your workspace and then save it with the name workspace1.RData
To load a previously saved workspace, use
> load(file.choose())
or the drop-down menus.
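save.image() stores everything; save() stores only the objects you name, and load() returns the names of the objects it restores. The sketch below writes to a file in R's temporary directory so it does not touch your own folders; x and y are throwaway example objects.

```r
x <- 1:5
y <- "hello"
ws_file <- file.path(tempdir(), "workspace1.RData")  # illustrative location
save(x, y, file = ws_file)   # save just these two objects
rm(x, y)                     # they are now gone from the workspace...
restored <- load(ws_file)    # ...and back again
restored                     # "x" "y"
```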
When quitting R, it will prompt to save the workspace image. If the option to save is chosen it will be saved to the file .RData within the working directory. Assuming that directory is the default one, the workspace will be reloaded automatically each and every time R is opened, which could be useful or it could be irritating. To stop it, locate and delete the file. The current working directory is identified using the get working directory function, getwd(), and changed most easily using the drop-down menus.
> getwd()
[1] "/Users/ggrjh"
Your working directory will differ from the above.
A good strategy for file management is to create a new folder for each project in R, saving the workspace regularly in it using a naming convention such as Dec_8_1.RData, Dec_8_2.RData etc. That way you can easily find and recover work.
1.5 Quitting R
Before quitting R, you may wish to save the workspace. To quit R use either the drop-down menus or
> q()
As promised, you will be prompted whether to save the workspace. Answering yes will save the workspace to the file .RData in the current working directory (see section 1.4.3, 'Saving and loading workspaces', on page 10, above). To exit without the prompt, use
> q(save = "no")
Or, more simply,
> q("no")
1.6 Getting Help
In addition to the use of the ? or help(...) documentation and the material available at CRAN, http://cran.r-project.org/, R has an active user community. Helpful mailing lists can be accessed from www.r-project.org/mail.html.
Perhaps the best all-round introduction to R is An Introduction to R, which is freely available at CRAN (http://cran.r-project.org/manuals.html) or by using the drop-down Help menus in the RGui. It is clear and succinct.
I also have a free introduction to statistical analysis in R which accompanies the book Statistics for Geography and Environmental Science. It can be obtained from http://www.social-statistics.org/?p=354.
There are many books available. My favourite, with a moderate-level statistical leaning and written with clarity, is,
Maindonald, J. & Braun, J., 2007. Data Analysis and Graphics using R (2nd edition). Cambridge: CUP.
I also find useful,
Adler, J., 2010. R in a Nutshell. O'Reilly: Sebastopol, CA.
Crawley, M.J., 2005. Statistics: An Introduction using R. Chichester: Wiley (which is a shortened version of The R Book by the same author).
Field, A., Miles, J. & Field, Z., 2012. Discovering Statistics Using R. London: Sage.
However, none of these books is about mapping or spatial analysis (of particular interest to me as a geographer). For that, the authoritative guide making the links between geographical information science, geographical data analysis and R (but not really written for R newcomers) is,
Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Berlin: Springer.
Also helpful is,
Ward, M.D. & Skrede Gleditsch, K., 2008. Spatial Regression Models. London: Sage. (Which uses R code examples.)
The following book has a short section on maps as well as other graphics in R (and is also, as the title suggests, good for practical guidance on how to analyse surveys using cluster and stratified sampling, for example):
Lumley, T., 2010. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley.
Springer publish an ever-growing series of books under the banner Use R! If you are interested in visualization, time-series analysis, Bayesian approaches, econometrics, data mining, etc., then you'll find something of relevance at http://www.springer.com/series/6991. But you may well also find what you are looking for, for free, on the Internet.
Session 2: A Demonstration of R
This session provides a quick tour of some of R's functionality, with a focus on some geographical applications. The idea here is to showcase a little of what R can do rather than to provide a comprehensive explanation of all that is going on. Aim for an intuitive understanding of the commands and procedures but do not worry about the detail. More information about the workings of R is given in the next session. A simple introduction to graphics and statistical analysis in R is given in Statistics for Geography and Environmental Science: An Introduction in R, available at http://www.social-statistics.org/?p=354.
2.1 Getting Started
Instead of requiring you to type a series of commands, in this session they can be executed automatically from a previously written source file (a script: see Section 1.3.1, page 7). As the commands are executed we will ask R to echo (print) them on screen so you can follow what is going on. At regular intervals you will be prompted to press return before the script continues.
To begin, either:
type,
> source(file.choose(), echo=T)
and load the file session2.R. After some comments that you can ignore, you will be prompted to load the .csv file schools.csv:
> ## Read in the schools.csv file
> wait()
Please press return
schools.data <- read.csv(file.choose())
Assuming there is no error, we will now proceed to a simple inspection of the data.
Or:
type,
> schools.data <- read.csv(file.choose())
in which case you will need to type all the commands that follow manually as well (no need to type the comments, of course: #...).
2.2 Checking the data
It is always sensible to check a data table for any obvious errors.
> head(schools.data) # Shows the first few rows of the data
> tail(schools.data) # Shows the bottom few rows of the data
We can produce a summary of each column in the data table using
> summary(schools.data)
In this instance, each column is a continuous variable so we obtain a six-number summary of the centre and spread of each variable.
The names of the variables are
> names(schools.data)
Next, the number of columns and rows, and a check, row by row, to see if the data are complete (have no missing data).
> ncol(schools.data)
> nrow(schools.data)
> complete.cases(schools.data)
It is not the most comprehensive check but everything appears to be in order.
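complete.cases() returns one TRUE or FALSE per row, which is hard to scan for a large table. Wrapping it in sum() or all() gives a one-number answer. The sketch below uses R's built-in airquality data, which does contain missing values, so that there is something to find.

```r
dim(airquality)                    # rows and columns in one call: 153 6
all(complete.cases(airquality))    # FALSE: at least one row has a missing value
sum(!complete.cases(airquality))   # number of incomplete rows: 42
colSums(is.na(airquality))         # where the missing values are, column by column
```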
2.3 Some simple graphics
The file schools.csv contains information about the location and some attributes of schools in Greater London (in 2008). The locations are given as a grid reference (Easting, Northing). The information is not real but is realistic. It should not, however, be used to make inferences about real schools in London.
Of particular interest is the average attainment on leaving primary school of pupils entering their first year of secondary school. Do some schools in London attract higher attaining pupils more than others? The variable attainment contains this information.
A stripchart and then a histogram will show that (not surprisingly) there is variation in the average prior attainment by school.
> attach(schools.data)
> stripchart(attainment, method="stack", xlab="Mean Prior Attainment by School")
> hist(attainment, col="light blue", border="dark blue", freq=F, ylim=c(0,0.30),
+ xlab="Mean attainment")
Here the histogram is scaled so the total area sums to one. To this we can add a rug plot,
> rug(attainment)
and also a density curve, a Normal curve for comparison and a legend.
> lines(density(sort(attainment)))
> xx <- seq(from=23, to=35, by=0.1)
> yy <- dnorm(xx, mean(attainment), sd(attainment))
> lines(xx, yy, lty="dotted")
> rm(xx, yy)
> legend("topright", legend=c("density curve","Normal curve"),
+ lty=c("solid","dotted"))
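The claim that a density-scaled histogram has a total area of one is easy to check: multiply each bar's height by its width and add them up. The sketch below does this for R's built-in precip data (so it runs without schools.csv) and overlays a Normal curve using curve(), a shorter alternative to building the xx and yy vectors by hand.

```r
v <- as.numeric(precip)   # built-in data: average annual precipitation for US cities
h <- hist(v, freq = FALSE, col = "lightblue", main = "",
          xlab = "Precipitation (inches)")
area <- sum(h$density * diff(h$breaks))   # bar heights x bar widths
area                                      # 1
lines(density(v))
# curve() evaluates the expression over a grid of x values; add = TRUE draws on top
curve(dnorm(x, mean = mean(v), sd = sd(v)), add = TRUE, lty = "dotted")
```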
It would be interesting to know if attainment varies by school type. A simple way to consider this is to produce a box plot. The data contain a series of dummy variables, one for each of a series of school types (Voluntary Aided Church of England: coe = 1; Voluntary Aided Roman Catholic: rc = 1; Voluntary Controlled faith school: vol.con = 1; another type of faith school: other.faith = 1; a selective school (with an entrance exam): selective = 1). We will combine these into a single categorical variable and then produce the box plot showing the distribution of average attainment by school type.
First the categorical variable:
> school.type <- rep("Not Faith/Selective", times=nrow(schools.data))
> school.type[coe==1] <- "VA CoE"
> school.type[rc==1] <- "VA RC"
> school.type[vol.con==1] <- "VC"
> school.type[other.faith==1] <- "Other Faith"
> school.type[selective==1] <- "Selective"
> school.type <- factor(school.type)
> levels(school.type)
[1] "Not Faith/Selective" "Other Faith" "Selective" [etc.]
Now the box plots:
> par(mai=c(1,1.2,0.5,0.5)) # Changes the graphic margins
> boxplot(attainment ~ school.type, horizontal=T, xlab="Mean attainment", las=1,
+ cex.axis=0.8) # Includes options to draw the boxes and labels horizontally
> abline(v=mean(attainment), lty="dashed") # Adds the mean value to the plot
> legend("topright", legend="Grand Mean", lty="dashed")
Figure 2.1. A histogram with annotation in R
Figure 2.2. Mean prior attainment by school type
Not surprisingly, the selective schools recruit the pupils with the highest average prior attainment.
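Because each assignment overwrites what went before, the order of the school.type lines above matters: a school flagged as both VA CoE and selective would end up as "Selective", since that line comes last. The toy example below makes this visible; the indicator vectors are invented for illustration and deliberately given names that do not clash with the attached columns.

```r
is_coe <- c(1, 0, 0, 1)   # hypothetical indicators for four schools
is_sel <- c(0, 0, 1, 1)
demo_type <- rep("Not Faith/Selective", length(is_coe))   # default category
demo_type[is_coe == 1] <- "VA CoE"
demo_type[is_sel == 1] <- "Selective"   # applied last, so it wins for school 4
demo_type <- factor(demo_type)
demo_type
table(demo_type)
```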
2.4 Some simple statistics
It appears that there are differences in the levels of prior attainment of pupils in different school types. We can test whether the variation is significant using an analysis of variance.
> summary(aov(attainment ~ school.type))
             Df Sum Sq Mean Sq F value Pr(>F)
school.type   5  479.8   95.95   71.42 <2e-16 ***
Residuals   361  485.0    1.34
It is, at greater than 99.9% confidence (F = 71.42, p < 0.001).
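The F value and p-value can also be pulled out of the summary programmatically rather than read off the screen. The sketch below applies the same one-way analysis of variance to R's built-in iris data (Sepal.Length by Species), so it runs without schools.csv.

```r
fit <- aov(Sepal.Length ~ Species, data = iris)
tab <- summary(fit)[[1]]    # the ANOVA table, as a data frame
tab
tab[["F value"]][1]         # F statistic for Species
tab[["Pr(>F)"]][1]          # its p-value
```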
We might also be interested in comparing those schools with the highest and lowest proportions of Free School Meal (FSM) eligible pupils to see if they are recruiting pupils with equal or differing mean prior attainment.
> attainment.high.fsm.schools <- attainment[fsm > quantile(fsm, probs=0.75)]
# Finds the attainment scores for schools with the highest proportions of FSM pupils
> attainment.low.fsm.schools <- attainment[fsm < quantile(fsm, probs=0.25)]
# Finds the attainment scores for schools with the lowest proportions of FSM pupils
> t.test(attainment.high.fsm.schools, attainment.low.fsm.schools)
Welch Two Sample t-test
data: attainment.high.fsm.schools and attainment.low.fsm.schools
t = -15.0431, df = 154.16, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.437206 -2.639240
sample estimates:
mean of x mean of y
 26.58352  29.62174
It comes as little surprise to learn that those schools with the greatest proportions of FSM eligible pupils are also those recruiting lower attaining pupils on average (mean attainment 26.6 vs 29.6, t = -15.0, p < 0.001).
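The same pattern of splitting one variable at its upper and lower quartiles and comparing another variable between the two groups can be tried on R's built-in mtcars data. Here, as an arbitrary illustration, fuel economy (mpg) is compared for the most and least powerful cars (hp). Note that t.test() performs the Welch test by default, which does not assume the two groups have equal variances.

```r
hp_q <- quantile(mtcars$hp, probs = c(0.25, 0.75))
mpg_high_hp <- mtcars$mpg[mtcars$hp > hp_q[2]]   # most powerful quarter
mpg_low_hp  <- mtcars$mpg[mtcars$hp < hp_q[1]]   # least powerful quarter
res <- t.test(mpg_high_hp, mpg_low_hp)           # Welch two-sample t-test
res$statistic   # t
res$p.value
res$estimate    # the two group means
```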
Exploring this further, the Pearson correlation between the mean prior attainment of pupils entering each school and the proportion of them that are FSM eligible is -0.689, and significant (p < 0.001):
> round(cor(fsm, attainment),3)
> cor.test(fsm, attainment)
Pearson's product-moment correlation
data: fsm and attainment
t = -18.1731, df = 365, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7394165 -0.6313939
sample estimates:
       cor
-0.6892159
Of course, the use of the Pearson correlation assumes that the relationship is linear, so let's check:
> plot(attainment ~ fsm)
> abline(lm(attainment ~ fsm)) # Adds a line of best fit (a regression line)
There is some suggestion the relationship might be curvilinear. However, we will ignore that here.
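When a relationship may be curved rather than straight, a rank-based measure such as Spearman's correlation is one alternative, since it assumes only that the relationship is monotonic (consistently rising or falling). The sketch below compares the two on R's built-in mtcars data (car weight against fuel economy), chosen only because it ships with R.

```r
cor(mtcars$wt, mtcars$mpg)                        # Pearson: strength of linear association
cor(mtcars$wt, mtcars$mpg, method = "spearman")   # Spearman: correlation of the ranks
# exact = FALSE uses a large-sample approximation, which copes with tied values
cor.test(mtcars$wt, mtcars$mpg, method = "spearman", exact = FALSE)
```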
Finally, some regression models. The first seeks to explain the mean prior attainment scores for the schools in London by the proportion of their intake who are free school meal eligible. (The result is the regression line shown on the scatterplot above.)
The second adds a variable giving the proportion of the intake of a white ethnic group.
The third adds a dummy variable indicating whether the school is selective or not.
> model1 <- lm(attainment ~ fsm, data=schools.data)
> summary(model1)
Call:
lm(formula = attainment ~ fsm, data = schools.data)
Residuals:
    Min      1Q  Median      3Q     Max
    ...     ... -0.1168     ...  3.6681
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  29.6190     0.1148  258.12   <2e-16 ***
fsm          -6.5469     0.3603  -18.17   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.178 on 365 degrees of freedom
Multiple R-squared: 0.475, Adjusted R-squared: 0.4736
F-statistic: 330.3 on 1 and 365 DF, p-value: < 2.2e-16
> model2 <- lm(attainment ~ fsm + white, data=schools.data)
> summary(model2)
Call:
lm(formula = attainment ~ fsm + white, data = schools.data)
Residuals:
    Min      1Q  Median      3Q     Max
    ...     ... -0.1335  0.5111  3.7837
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  30.1250     0.1979  152.21  < 2e-16 ***
fsm          -7.250      0.4214  -17.20  < 2e-16 ***
white        -0.872      0.2796   -3.12  0.00196 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.164 on 364 degrees of freedom
Multiple R-squared: 0.4887, Adjusted R-squared: 0.4859
F-statistic: 173.9 on 2 and 364 DF, p-value: < 2.2e-16
> model3 <- update(model2, . ~ . + selective)
> summary(model3)
Call:
lm(formula = attainment ~ fsm + white + selective, data = schools.data)
Residuals:
    Min      1Q  Median      3Q     Max
    ...     ...  0.0537  0.5607     ...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  29.1706     0.1689  172.71   <2e-16 ***
fsm          -5.2381     0.3591 -14.586   <2e-16 ***
white        -0.2299     0.2249  -1.022    0.307
selective     3.4768     0.2338   14.87   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9189 on 363 degrees of freedom
Multiple R-squared: 0.6823, Adjusted R-squared: 0.6796
F-statistic: 259.8 on 3 and 363 DF, p-value: < 2.2e-16
Looking at the adjusted R-squared value, each model appears to be an improvement on the one that precedes it (marginally so for model 2). However, looking at the last (model 3), we may suspect that we could drop the white ethnicity variable with no significant loss in the amount of variance explained. An analysis of variance confirms that to be the case.
> model4 <- update(model3, . ~ . - white)
> anova(model4, model3)
Analysis of Variance Table
Model 1: attainment ~ fsm + selective
Model 2: attainment ~ fsm + white + selective
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    364 307.42
2    363 306.54  1    0.8822 1.0447 0.3074
The residual error, measured by the residual sum of squares (RSS), is not very different for the two models, and that difference, 0.882, is not significant (F = 1.045, p = 0.307).
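The update() shorthand works with any fitted model: . ~ . + term adds a term and . ~ . - term removes one, after which anova() compares the nested pair with an F test. A sketch on R's built-in mtcars data, with arbitrarily chosen variables:

```r
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- update(m1, . ~ . + hp)     # mpg ~ wt + hp
m3 <- update(m2, . ~ . + qsec)   # mpg ~ wt + hp + qsec
m4 <- update(m3, . ~ . - qsec)   # back to mpg ~ wt + hp
# Adjusted R-squared for each model in turn
sapply(list(m1 = m1, m2 = m2, m3 = m3), function(m) summary(m)$adj.r.squared)
anova(m4, m3)                    # does adding qsec significantly reduce the RSS?
```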
2.5 Some simple maps
The schools data contain geographical coordinates and are therefore geographical data. Consequently they can be mapped. The simplest way for point data is to use a 2-dimensional plot, making sure the aspect ratio is fixed correctly (asp=1, so that one unit of Northing is drawn the same length as one unit of Easting).
> plot(Easting, Northing, asp=1, main="Map of London schools")
Amongst the attribute data for the schools, the variable esl gives the proportion of pupils who speak English as an additional language. It would be interesting for the size of the symbol on the map to be proportional to it. Because cex scales the width of a symbol, and its area grows with the square of its width, the square root is taken below so that the area of each circle is proportional to esl.
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5))
It might also be nice to add a little colour to the map. We might, for example, change the default plotting 'character' to a filled circle with a yellow background.
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg="yellow")
A more interesting option would be to have the circles filled with a colour gradient that is related to a second variable in the data, the proportion of pupils eligible for free school meals, for example. To achieve this, we can begin by creating a simple colour palette:
> palette <- c("yellow","orange","red","purple")
We now cut the free school meals eligibility variable into quartiles (four classes, each containing approximately the same number of observations).
> map.class <- cut(fsm, quantile(fsm), labels=FALSE, include.lowest=TRUE)
What has happened is that the fsm variable has been split into four groups, with the value 1 given to the first quarter of the data (schools with the lowest proportions of eligible pupils), the value 2 given to the next quarter, then 3, and finally the value 4 for schools with the highest proportions of FSM eligible pupils.
There are, then, now four map classes and the same number of colours in the palette. Schools in map class 1 (and with the lowest proportion of FSM-eligible pupils) will be coloured yellow, the next class will be orange, and so forth.
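To see what cut() is doing, it helps to try it on a vector small enough to check by eye. In the sketch below, 1:20 stands in for fsm: quantile() supplies the five break points (minimum, three quartiles, maximum), labels = FALSE returns the class numbers 1 to 4, and include.lowest = TRUE ensures the minimum value falls into class 1 rather than being returned as NA. The colour vector is given a different name, pal, so that the palette created above is not overwritten.

```r
v <- 1:20
breaks <- quantile(v)   # 1.00  5.75 10.50 15.25 20.00
classes <- cut(v, breaks, labels = FALSE, include.lowest = TRUE)
table(classes)          # five values in each of the four classes
pal <- c("yellow", "orange", "red", "purple")
pal[classes]            # one colour per value, from yellow (lowest) to purple (highest)
```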
Bringing it all together,
> plot(Easting, Northing, asp=1, main="Map of London schools",
+ cex=sqrt(esl*5), pch=21, bg=palette[map.class])
It would be good to add a legend, and perhaps a scale bar and North arrow. Nevertheless, as a first map in R this isn't too bad!
Figure 2.3. A simple point map in R
Why don't we be a bit more ambitious and overlay the map on a Google Maps tile, adding a legend as we do so? This requires us to load an additional library for R and to have an active Internet connection.
> library(RgoogleMaps)
(If it hasn't been installed, it can be installed using install.packages(c("RgoogleMaps","png")), which installs both it and another package, png, that it requires for many functions.)
Assuming that the data frame, schools.data, remains in the workspace and attached (it will be if you have followed the instructions above), and that the colour palette created above has not been deleted, then the map shown in Figure 2.4 is created with the following code:
> MMap 7" MapWackg%o(nd(lat9Aat, lon9Aong)
> HlotMnBtaticMap(MMap, Aat, Aong, ce>9s*%t(esl$#), pch921,
6@
$
=
<
67
6:
6>
76
7?
7@
)g9palette[map.class])
> legend("tople&t", legend9paste("7",tappl(&sm, map.class, ma>)),
pch921, pt.)g9palette, pt.ce>91.#, )g9";hite", title9"H(@BM"eligi)le)")
> legKals 7" se*(&%om9!.2,to91,)9!.2)
> legend("top%ight", legend9%o(nd(legKals,-), pch921, pt.)g9";hite",
pt.ce>9s*%t(legKals$#), )g9";hite", title9"H(=BA)")
Remember that the data are simulated. The points shown on the map are not the true locations of
schools in London.
Figure 2.4. A slightly less simple map produced in R
2.6 Some simple geographical analysis
Remember the regression models from earlier? It would be interesting to test the assumption that
the residuals exhibit independence by looking for spatial dependencies. To do this we will consider
to what degree the residual value for any one school correlates with the mean residual value for its
six nearest other schools (the choice of six is completely arbitrary).
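The idea can be illustrated without any extra packages. The following base-R sketch finds each point's six nearest neighbours from a distance matrix, then correlates each residual with the mean residual of its neighbours. It uses simulated coordinates and residuals; xy, resid.demo and the other names are hypothetical, and the schools data are not used. Because these residuals are generated independently of location, there is no reason to expect a strong correlation here. With spatially dependent residuals it would tend to be positive.

```r
set.seed(42)                        # reproducible simulated data
n <- 50
xy <- cbind(runif(n, 0, 1000), runif(n, 0, 1000))  # hypothetical coordinates
resid.demo <- rnorm(n)              # hypothetical residuals, unrelated to location
k <- 6                              # number of neighbours, as in the text

d <- as.matrix(dist(xy))            # n x n matrix of Euclidean distances
diag(d) <- Inf                      # stop each point counting as its own neighbour

# For each row, the indices of the k smallest distances (an n x k matrix)
nn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Mean residual of each point's k nearest neighbours
lag.resid <- apply(nn, 1, function(idx) mean(resid.demo[idx]))

cor(resid.demo, lag.resid)
```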
First, we will take a copy of the schools data and convert that into an explicitly spatial object in R:
> detach(schools.data)
> schools.xy <- schools.data
> library(sp)
> attach(schools.xy)
> coordinates(schools.xy) <- c("Easting", "Northing")
> # Converts into a spatial object
> class(schools.xy)
> detach(schools.xy)
> proj4string(schools.xy) <- CRS("+proj=tmerc +datum=OSGB36")
> # Sets the Coordinate Reference System
Second, we find the six nearest neighbours for each school.
> library(spdep)
> nearest.six <- knearneigh(schools.xy, k=6, RANN=F)
> # RANN = F to override the use of the RANN package that may not be installed
We can learn from this that the six nearest schools to the first school in the data (row 1) are schools
5, 38, 2, 40, 223 and 6:
> nearest.six$nn[1,]
[1] 5 38 2 40 223 6
The neighbours object, nearest.six, is an object of class knn:
> class(nearest.six)
[1] "knn"
It is next converted into the more generic class of neighbours.
> neighbours <- knn2nb(nearest.six)
> class(neighbours)
[1] "nb"
> summary(neighbours)
Neighbour list object:
Number of regions: 367
Number of nonzero links: 2202
Percentage nonzero weights: 1.634877
Average number of links: 6
[etc.]
The connections between each point and its neighbours can then be plotted. It may take a few
minutes.
> plot(neighbours, coordinates(schools.xy))
Having identified the six nearest neighbours to each school, we could give each equal weight in a
spatial weights matrix. Alternatively, we could decrease the weight with distance, so that the first
nearest neighbour gets most weight and the sixth nearest the least. Creating a matrix with equal
weight given to all neighbours is straightforward.
> spatial.weights <- nb2listw(neighbours)
(The other possibility will not be considered further here. It is achieved by creating a list of general
weights and then supplying it to the function through its glist argument.)
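To see what the weights do, here is a base-R sketch that builds an equal-weight, row-standardised nearest-neighbour weights matrix and computes Moran's I by hand. It is not the spdep implementation. It applies the usual formula I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z holds deviations from the mean and S0 is the sum of all weights. The four coordinates, the values z and the choice k = 1 are all made up so that the answer can be checked by hand. Giving each row equal weights that sum to 1 corresponds to the "W" style that nb2listw() uses by default.

```r
# Four hypothetical points on a line and a hypothetical variable
coords <- cbind(c(0, 1, 3, 6), 0)
z <- c(1, 1, -1, -1)
k <- 1                               # one nearest neighbour, for hand-checking

n <- nrow(coords)
d <- as.matrix(dist(coords))
diag(d) <- Inf                       # exclude self-neighbours

# Equal weights of 1/k for the k nearest neighbours: each row sums to 1
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, order(d[i, ])[1:k]] <- 1 / k
}

z <- z - mean(z)                     # deviations from the mean
S0 <- sum(W)                         # sum of all weights (equals n here)
moran.I <- (n / S0) * sum(z * (W %*% z)) / sum(z^2)
moran.I                              # 0.5
```

Here neighbouring points tend to share the same sign, so I is positive. spdep's moran.test() and lm.morantest() add the expectation, the variance and a significance test. lm.morantest() uses an expectation adjusted for regression residuals rather than the simple -1/(n - 1).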
We now have all the information required to test whether there are spatial dependencies in the
residuals. The answer is yes (Moran's I = 0.218, p < 0.001, indicating positive spatial
autocorrelation).
> lm.morantest(model2, spatial.weights)
Global Moran's I for regression residuals
data:
model: lm(formula = attainment ~ fsm + selective, data = schools.data)
weights: spatial.weights

Moran I statistic standard deviate = 7.9152, p-value = 1.235e-15
alternative hypothesis: greater
sample estimates:
Observed Moran's I Expectation Variance
0.2181912682 -0.0038585702 0.0007870118
2.7 Tidying up
It is better to save your workspace regularly whilst you are working (see Section 1.4.3, 'Saving and
loading workspaces', page 10), and certainly before you finish. Don't forget to include the
extension .RData when saving. Having done so, you can tidy up the workspace.
> save.image(file.choose(new=T))
> rm(list=ls()) # Be careful, it deletes everything!
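The same save-and-restore cycle can be scripted without the file dialog. In this sketch the file name comes from tempfile() rather than file.choose(). Only one hypothetical object, demo.value, is removed instead of the whole workspace, so nothing of yours is lost if you try it.

```r
ws.file <- tempfile(fileext = ".RData")  # temporary file in place of file.choose()
demo.value <- 42                          # hypothetical object to save

save.image(file = ws.file)   # saves every object in the global workspace
rm(demo.value)               # remove just this object (not rm(list = ls()))
exists("demo.value")         # FALSE: it has gone

load(ws.file)                # restores the saved objects
demo.value                   # 42 again
```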
Session 3: A Little More about the workings of R
This session provides a little more guidance on the 'inner workings' of R. All the commands are
contained in the file session3.R and can be run from it (see 'Scripting' on p.7).
3.1 Classes and types
Let us create two objects, each a vector containing ten elements. The first will be the numbers from
one to ten, recorded as integers. The second will be the same sequence but now recorded as real
numbers (that is, 'floating point' numbers, those with a decimal place).
> b <- 1:10
> b
[1] 1 2 3 4 5 6 7 8 9 10
> c <- seq(from=1.0, to=10.0, by=1)
> c
[1] 1 2 3 4 5 6 7 8 9 10
Note that in the second case, we could just type:
> c <- seq(1, 10, 1)
> c
[1] 1 2 3 4 5 6 7 8 9 10
This works because, if we don't explicitly name the arguments (that is, we omit from= and so on), R
assumes that we are giving values to the arguments in their default order. In this case that order is
from, to and by. Type ?seq and look under Usage for this to make a little more sense.
In any case, the two objects, b and c, are printed the same on screen. However, one is an object of
class integer, whereas the other is an object of class numeric and of type double (double precision
in the memory space).
> class(b)
[1] "integer"
> class(c)
[1] "numeric"
> typeof(c)
[1] "double"
It is often possible to coerce an object from one class and type to another.
> b <- 1:10
> class(b)
[1] "integer"
> b <- as.double(b)
> class(b)
[1] "numeric"
> typeof(b)
[1] "double"
> class(c)
[1] "numeric"
> c <- as.integer(c)
> class(c)
[1] "integer"
> c
[1] 1 2 3 4 5 6 7 8 9 10
> c <- as.character(c)
> class(c)
[1] "character"
> c
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
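A few further coercions are worth knowing about, because they can silently change values. This short sketch uses only base R functions. It shows three things:

- as.integer() truncates towards zero.
- Text that cannot be read as a number becomes NA.
- identical() distinguishes an integer vector from a double one, whereas all.equal() compares only the values.

```r
b <- 1:10
typeof(b)                      # "integer"
typeof(as.double(b))           # "double"

as.integer(c(3.9, -3.9))       # truncates towards zero: 3 -3
as.numeric(c("1", "2.5"))      # text that looks like numbers: 1.0 2.5
suppressWarnings(as.numeric("ten"))  # cannot be converted, so NA

identical(1:10, seq(1, 10, 1))           # FALSE: integer versus double
isTRUE(all.equal(1:10, seq(1, 10, 1)))   # TRUE: the values are the same
```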
The examples above are trivial. However, it is important to understand that seemingly generic
functions like summary(...) may produce outputs that depend upon the class type. Try, for
example:
> class(b)
[1] "numeric"
> summary(b)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.50 5.50 7.75 10.00
> class(c)
[1] "character"
> summary(c)
Length Class Mode
10 character character
In the first instance, a six-number summary of the centre and spread of the numeric data is given.
That makes no sense for character data. The second summary gives the length of the vector, its class
type and its storage mode.
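What lies behind this is that summary() is a generic function. It looks at the class of its argument and dispatches to a matching method, such as summary.default, summary.factor or summary.lm. The sketch below shows a factor being summarised differently from the same values stored as character. It also defines a method for a made-up class, "demo.class", purely to illustrate the dispatch; that class and its method are hypothetical.

```r
x <- c("a", "b", "a", "c", "a")
summary(x)          # character vector: Length, Class and Mode
summary(factor(x))  # factor: a count for each level (a = 3, b = 1, c = 1)

# A hypothetical class with its own summary method
summary.demo.class <- function(object, ...) {
  cat("An object of class demo.class holding", length(unclass(object)), "values\n")
  invisible(object)
}
y <- structure(1:3, class = "demo.class")
summary(y)          # dispatches to summary.demo.class
```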
A more interesting example is provided by the plot(...) command. We use it first with a single data
variable, secondly with two variables in a data table, and finally on a model of the relationship
between those two variables.
The first variable is created by generating 100 observations drawn randomly from a Normal
distribution with a mean of 100 and a standard deviation of 20.
> var1 <- rnorm(n=100, mean=100, sd=20)
Being random, the data assigned to the variable will differ from user to user. Usually we would
want this. However, in this case it might be easier to ensure we all get the same values:
> set.seed(1)
> var1 <- rnorm(n=100, mean=100, sd=20)
Always check the data:
> class(var1)
[1] "numeric"
> length(var1) # The number of elements in the vector
[1] 100
> summary(var1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
55.71 90.12 102.30 102.20 113.80 148.00
> head(var1) # The first few elements
[1] 87.47092 103.67287 83.28743 131.90562 106.59016 83.59063
> tail(var1) # The last few elements
[1] 131.73667 111.16973 74.46816 88.53469 75.50775 90.53199
They seem fine! Returning to the plot(...) command, in this instance it simply plots the data in the
order of their position in the vector.
> plot(var1)
Figure 3.1. A simple plot of a numeric vector
To demonstrate a different interpretation of the plot command, a second variable is now created. It is
a function of the first but with some random error.
> set.seed(101)
> var2 <- 3 * var1 + 10 + rnorm(100, 0, 25)
# which, because n, mean and sd are the first three arguments into rnorm,
# is the same as writing var2 <- 3 * var1 + 10 + rnorm(n=100, mean=0, sd=25)
> head(var2)
[1] 264.2619 334.8301 242.9887 411.0758 337.5397 290.1211
Next, the two variables are gathered together in a data table of class data frame, where each row is
an observation and each column is a variable. There is more about data frames on page 27, in
Section 3.2 ('Data frames').
> mydata <- data.frame(x = var1, y = var2)
> class(mydata)
[1] "data.frame"
> head(mydata)
x y
1 87.47092 264.2619
2 103.67287 334.8301
3 83.28743 242.9887
4 131.90562 411.0758
5 106.59016 337.5397
6 83.59063 290.1211
> nrow(mydata) # The number of rows in the data
[1] 100
> ncol(mydata) # The number of columns
[1] 2
In this case, plotting the data frame will produce a scatter plot. (The line of best fit also shown in
Figure 3.2 will be added shortly.)
> plot(mydata)
If there had been more than two columns in the data table, or if they had not been arranged in x, y
order, then the plot could be produced by referencing the columns directly. All the following are
equivalent:
> with(mydata, plot(x, y)) # Here the order is x, y
> with(mydata, plot(y ~ x)) # Here it is y ~ x
> plot(mydata$x, mydata$y)
> plot(mydata[,1], mydata[,2]) # Plot using the first and second columns
> plot(mydata[,2] ~ mydata[,1])
The attach(...) command could also be used. It is introduced in Section 3.2.2, 'Attaching a data
frame', on page 29.
Figure 3.2. A scatter plot. A line of best fit has been added.
The line of best fit in Figure 3.2 is a regression line. To fit the regression model summarising the
relationship between y and x, use:
> model1 <- lm(y ~ x, data=mydata) # lm is short for linear model
> class(model1)
[1] "lm"
model1 is an object of class lm, short for linear model. The summary(...) function summarises the
relationship between y and x.
> summary(model1)
Call:
lm(formula = y ~ x, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-57.102 -16.272 0.282 15.188 47.290
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.6462 13.6208 0.635 0.527
x 3.0042 0.1313 22.878 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.47 on 98 degrees of freedom
Multiple R-squared: 0.8423, Adjusted R-squared: 0.8407
F-statistic: 523.4 on 1 and 98 DF, p-value: < 2.2e-16
Using the plot(...) function on an object of class lm has an effect that is somewhat different from
the previous two cases. It produces a series of diagnostic plots to help check that the assumptions
of regression have been met.
> plot(model1)
The four plots do the following:
- The first (residuals against fitted values) is a check for non-constant variance and outliers.
- The second (a Normal Q-Q plot) checks the normality of the model residuals.
- The third (scale-location) is similar to the first.
- The fourth (residuals against leverage) identifies both extreme residuals and influential
observations.
These four plots can be viewed together by changing the default graphical parameters to show the
plots in a 2-by-2 array (as in Figure 3.3).
> par(mfrow = c(2,2)) # Sets the graphical output to be 2 x 2
> plot(model1)
Finally, we might like to go back to our previous scatter plot and add the regression line of best fit
to it:
> par(mfrow = c(1,1)) # Resets the window to a single graph
> plot(mydata)
> abline(model1)
Figure 3.3. Default plots for an object of class linear model
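The quantities reported by summary(model1) can also be extracted programmatically, which is handy when they are needed in later calculations. The sketch below regenerates var1 and var2 exactly as above, using the same seeds, so it reproduces the same values rather than drawing new random ones. It then applies the standard extractor functions coef(), summary(...)$r.squared, confint() and residuals() to the fitted model.

```r
set.seed(1)
var1 <- rnorm(n = 100, mean = 100, sd = 20)
set.seed(101)
var2 <- 3 * var1 + 10 + rnorm(n = 100, mean = 0, sd = 25)
mydata <- data.frame(x = var1, y = var2)

model1 <- lm(y ~ x, data = mydata)
coef(model1)                    # intercept and slope
summary(model1)$r.squared       # multiple R-squared
confint(model1)                 # 95% confidence intervals for the coefficients
head(residuals(model1))         # the first few residuals
```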
3.2 Data frames
The preceding section introduced the data frame as a class of object containing a table of data, where
the variables are the columns of the data and the rows are the observations.
> class(mydata)
> summary(mydata)
Looking at the data summary, the object mydata contains two columns, labelled x and y. These
column headers can also be revealed by using:
> names(mydata)
[1] "x" "y"
or with:
> colnames(mydata)
[1] "x" "y"
The row names appear to be the numbers from 1 to 100 (the number of rows in the data), though
actually they are character data:
> rownames(mydata)
[1] "1" "2" "3" "4" "5" "6" "7" "8" [etc.]
> class(rownames(mydata))
[1] "character"
The column names can be changed either individually or together. Individually:
> names(mydata)[1] <- "v1"
> names(mydata)[2] <- "v2"
> names(mydata)
[1] "v1" "v2"
And all at once:
> names(mydata) <- c("x","y")
> names(mydata)
[1] "x" "y"
The row names can be changed in the same way:
> rownames(mydata)[1] <- "0"
> rownames(mydata)
[1] "0" "2" "3" "4" "5" "6" "7" "8" [etc.]
> rownames(mydata) = seq(from=0, by=1, length.out=nrow(mydata))
> rownames(mydata)
[1] "0" "1" "2" "3" "4" "5" "6" "7" "8" [etc.]
The above can be especially useful when merging data tables with GIS shapefiles in R, because the
first entry in an attribute table for a shapefile is usually given an ID of 0. Otherwise, it is usually
easiest for the first row in a data table to be labelled 1, so let's put the row names back to how they
were.
> rownames(mydata) = 1:nrow(mydata)
> rownames(mydata)
[1] "1" "2" "3" "4" "5" "6" "7" "8" [etc.]
3.2.1 Referencing rows and columns in a data frame
The square bracket notation can be used to index specific rows, columns or cells in the data frame.
For example:
> mydata[1,] # The first row of data
x y
1 87.47092 264.2619
> mydata[2,] # The second row of data
x y
2 103.6729 334.8301
> round(mydata[2,],2) # The second row, rounded to 2 decimal places
x y
2 103.67 334.83
> mydata[nrow(mydata),] # The final row of the data
x y
100 90.53199 261.236
> mydata[,1] # The first column of data
[1] 87.47092 103.67287 83.28743 131.90562 [etc.]
> mydata[,2] # The second column of data
[1] 264.2619 334.8301 242.9887 411.0758 337.5397 [etc.]
> mydata[,ncol(mydata)] # The final column, here the same as the second
[1] 264.2619 334.8301 242.9887 411.0758 337.5397 [etc.]
> mydata[1,1] # The data in the first row of the first column
[1] 87.47092
> mydata[5,2] # The data in the fifth row of the second column
[1] 337.5397
> round(mydata[5,2],0)
[1] 338
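Rows and columns can also be selected by name, by excluding positions with a minus sign, or with a vector of positions. The small data frame below, demo.df, is hypothetical and is not part of the worked example. It is there only so the results can be checked.

```r
demo.df <- data.frame(x = c(10, 20, 30, 40), y = c(1.5, 2.5, 3.5, 4.5))

demo.df[, "y"]              # a column selected by name: 1.5 2.5 3.5 4.5
demo.df[-1, ]               # every row except the first
demo.df[c(2, 4), "x"]       # rows 2 and 4 of column x: 20 40
demo.df[, 1, drop = FALSE]  # keeps the result as a one-column data frame
```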
Specific columns of data can also be referenced using the $ notation:
> mydata$x # Equivalent to mydata[,1] because the column name is x
[1] 87.47092 103.67287 83.28743 131.90562 106.59016 [etc.]
> mydata$y
[1] 264.2619 334.8301 242.9887 411.0758 337.5397 290.1211 [etc.]
> summary(mydata$x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
55.71 90.12 102.30 102.20 113.80 148.00
> summary(mydata$y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
140.2 282.1 314.1 315.6 355.7 427.6
> mean(mydata$x)
[1] 102.1777
> median(mydata$y)
[1] 314.1226
> sd(mydata$x) # Gives the standard deviation of x
[1] 17.96399
> boxplot(mydata$y)
> boxplot(mydata$y, horizontal=T, main="Boxplot of variable y")
(Boxplots are sometimes said to be easier to read when drawn horizontally.)
One way to avoid the $ notation is to use the function with(...) instead:
> with(mydata, var(x)) # Gives the variance of x
[1] 322.7048
> with(mydata, plot(y, xlab="Observation number"))
3.2.2 Attaching a data frame
Sometimes the many ways to access a specific part of a data table become tiresome, and it is useful
to reference the column or variable name directly. For example, instead of having to type
mean(mydata[,1]), mean(mydata$x) or with(mydata, mean(x)), it would be easier just to refer to the
variable of interest, x, as in mean(x).
To achieve this, the attach(...) command is used. Compare, for example,
> mean(x)
Error in mean(x) : object 'x' not found
(which generates an error because there is no object called x in the workspace; it is only a column
name within the data frame mydata) with
> attach(mydata)
> mean(x)
[1] 102.1777
which works fine. If, to use the earlier analogy, objects in R's workspace are like box files, then you
have now opened one up and its contents (which include the variable x) are visible.
To detach the contents of the data frame, use detach(...):
> detach(mydata)
> mean(x)
Error in mean(x) : object 'x' not found
It is sensible to use detach when the data frame is no longer being used. Otherwise, confusion can
arise when multiple data frames contain the same column names, as in the following example:
> attach(mydata)
> mean(x) # This will give the mean of mydata$x
[1] 102.1777
> mydata2 = data.frame(x = 1:10, y = 11:20)
> head(mydata2)
x y
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
> attach(mydata2)
The following object(s) are masked from 'mydata':
x, y
> mean(x) # This will now give the mean of mydata2$x
[1] 5.5
> detach(mydata2)
> mean(x)
[1] 102.1777
> detach(mydata)
> rm(mydata2)
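One further pitfall is worth a short demonstration. attach() places a copy of the data frame on the search path. Assigning to an attached column name therefore creates a new object in the workspace and leaves the data frame itself unchanged. The data frame demo.df below is hypothetical.

```r
demo.df <- data.frame(x = 1:3)
attach(demo.df)
"demo.df" %in% search()   # TRUE: the copy is on the search path
x <- x * 10               # creates a NEW object x in the workspace
demo.df$x                 # still 1 2 3: the data frame is unchanged
detach(demo.df)
rm(x)                     # tidy up the stray copy
```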
3.2.3 Subsetting the data table and logical queries
Subsets of a data frame can be created by referencing specific rows within it. For example, imagine
we want a table of only those observations that have a value above the mean of some variable.
> attach(mydata)
> subset <- which(x > mean(x))
> class(subset)
[1] "integer"
> subset
[1] 2 4 5 7 8 9 11 12 15 18 19 20 21 22 25 30 31 33 [etc.]
> mydata.sub <- mydata[subset,]
> head(mydata.sub)
x y
2 103.6729 334.8301
4 131.9056 411.0758
5 106.5902 337.5397
7 109.7486 354.7155
8 114.7665 351.4811
9 111.5156 367.4726
Note how the row names of this subset have been inherited from the parent data frame.
A more direct approach is to define the subset as a logical vector that is either true or false,
depending upon whether a condition is met.
> subset <- x > mean(x)
> class(subset)
[1] "logical"
> subset
[1] FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE [etc.]
> mydata.sub <- mydata[subset,]
> head(mydata.sub)
x y
2 103.6729 334.8301
4 131.9056 411.0758
5 106.5902 337.5397
7 109.7486 354.7155
8 114.7665 351.4811
9 111.5156 367.4726
A yet more parsimonious way of achieving the same is:
> mydata.sub <- mydata[x > mean(x),]
# Selects those rows that meet the logical condition, and all columns
> head(mydata.sub)
x y
2 103.6729 334.8301
4 131.9056 411.0758
5 106.5902 337.5397
7 109.7486 354.7155
8 114.7665 351.4811
9 111.5156 367.4726
In the same way, to select those rows where x is greater than or equal to the mean of x and y is
greater than or equal to the mean of y:
> mydata.sub <- mydata[x >= mean(x) & y >= mean(y),]
# The symbol & is used for and
Or, to select those rows where x is less than the mean of x or y is less than the mean of y:
> mydata.sub <- mydata[x < mean(x) | y < mean(y),]
# The symbol | is used for or
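These logical queries can be checked on a tiny, made-up data frame, demo.df, where the answer is easy to work out by hand: the mean of x is 4.25 and the mean of y is 7. The sketch also shows base R's subset() function. It evaluates the condition inside the data frame, so the column names can be used without $ or attach(). The text above also uses the name subset for a vector. That does not stop the function subset() from being called, but a different name avoids confusion.

```r
demo.df <- data.frame(x = c(1, 5, 3, 8), y = c(10, 2, 7, 9))

# Both conditions must hold (&): only row 4 (x = 8, y = 9)
rows.and <- demo.df[demo.df$x >= mean(demo.df$x) & demo.df$y >= mean(demo.df$y), ]

# Either condition may hold (|): rows 1, 2 and 3
rows.or <- demo.df[demo.df$x < mean(demo.df$x) | demo.df$y < mean(demo.df$y), ]

# The same query as rows.and, written with subset()
rows.sub <- subset(demo.df, x >= mean(x) & y >= mean(y))

rownames(rows.and)   # "4"
rownames(rows.or)    # "1" "2" "3"
rownames(rows.sub)   # "4"
```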
3.2.4 Missing data
Missing data are given the value NA. For example:
> mydata[1,1] = NA
> mydata[2,2] = NA
> head(mydata)
x y
1 NA 264.2619
2 103.67287 NA
3 83.28743 242.9887
4 131.90562 411.0758
5 106.59016 337.5397
6 83.59063 290.1211
By default, R will report NA or an error when some calculations are tried with missing data:
> mean(mydata$x)
[1] NA
> quantile(mydata$y)
Error in quantile.default(mydata$y) :
missing values and NaN's not allowed if 'na.rm' is FALSE
To overcome this, the default can be changed or the missing data removed.
For the first option:
> mean(mydata$x, na.rm=T)
[1] 102.3263
> quantile(mydata$y, na.rm=T) # Divides the data into quartiles
0% 25% 50% 75% 100%
140.2275 282.1062 313.7536 356.5862 427.6020
For the second, there are various ways to remove the missing data. For example,
> subset <- !is.na(mydata$x)
creates a logical vector that is TRUE where the data values of x are not missing (the ! in the
expression means not):
> head(subset)
[1] FALSE TRUE TRUE TRUE TRUE TRUE
Using the subset:
> x2 <- mydata$x[subset]
> mean(x2)
[1] 102.3263
More succinctly:
> with(mydata, mean(x[!is.na(x)]))
[1] 102.3263
Alternatively, a new data frame could be created without any missing data, omitting every row that
has at least one missing value.
> subset <- complete.cases(mydata)
> head(subset)
[1] FALSE FALSE TRUE TRUE TRUE TRUE
> mydata.complete = mydata[subset,]
> head(mydata.complete)
x y
3 83.28743 242.9887
4 131.90562 411.0758
5 106.59016 337.5397
6 83.59063 290.1211
7 109.74858 354.7155
8 114.76649 351.4811
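The options above can be compared on a small, made-up data frame, demo.na, which has one missing value in each column. The sketch also shows na.omit(), a base R alternative to indexing with complete.cases(), and a quick way to count the missing values in each column.

```r
demo.na <- data.frame(x = c(NA, 2, 3, 4), y = c(10, NA, 30, 40))

mean(demo.na$x)                  # NA, because one value is missing
mean(demo.na$x, na.rm = TRUE)    # 3, the mean of 2, 3 and 4
complete.cases(demo.na)          # FALSE FALSE TRUE TRUE
colSums(is.na(demo.na))          # one missing value in each column

# na.omit() keeps the same rows as demo.na[complete.cases(demo.na), ]
# (it also records which rows were dropped in an "na.action" attribute)
demo.complete <- na.omit(demo.na)
nrow(demo.complete)              # 2
```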
3.2.5 Reading data from a file into a data frame
The accompanying file schools.csv contains information about the location and some attributes of
schools in Greater London (in 2008). The locations are given as a grid reference (Easting,
Northing). The information is not real but is realistic. It should not, however, be used to make
inferences about real schools in London.
A standard way to read a file into a data frame, with cases corresponding to lines and variables to
fields in the file, is to use the read.table(...) command.
> ?read.table
The file schools.csv is comma delimited and has column headers. Looking through the arguments
for read.table, the data might be read into R using:
> schools.data <- read.table("schools.csv", header=T, sep=",")
This will only work if the file is located in the working directory. Otherwise, either the location (path)
of the file will need to be specified or the working directory changed (with setwd(...)). More
conveniently, use file.choose():
> schools.data <- read.table(file.choose(), header=T, sep=",")
The usage section of the read.table help page lists a variant of the command whose defaults are set
for comma delimited data. So, most simply, we could use:
> schools.data <- read.csv(file.choose())
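read.csv() can also read comma-delimited text directly through its text argument, which is a convenient way to try it out without a file. In the sketch below, the three columns and their values are invented for illustration; they are not rows from schools.csv. The sketch then writes the data frame out with write.csv() to a temporary file and reads it back, which is the usual round trip for exporting data.

```r
csv.text <- "Easting,Northing,fsm
531000,181000,0.25
532500,179800,0.41"

demo.schools <- read.csv(text = csv.text)   # header = TRUE and sep = "," by default
str(demo.schools)

# Round trip: write to a temporary CSV file, then read it back in
tmp <- tempfile(fileext = ".csv")
write.csv(demo.schools, tmp, row.names = FALSE)  # row.names = FALSE avoids an extra column
back <- read.csv(tmp)
all.equal(back, demo.schools)   # TRUE
```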
Having read in the data, some basic checks of it are helpful:
> head(schools.data, n=3) # Views the first three lines of the data
FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
1 0.659 0.583 0.031 0.217 0.032 0.222 0.002 0.020
2 0.391 0.222 0.001 0.350 0.087 0.126 0.003 0.012
3 0.708 0.923 0.038 0.028 0.000 0.239 0.000 0.002
> ncol(schools.data)
[1] 17
> nrow(schools.data)
[1] 366
> summary(schools.data)
FSM EAL SEN [etc.]
Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.1323 1st Qu.:0.1272 1st Qu.:0.00800
Median :0.2500 Median :0.3165 Median :0.02000
Mean :0.2702 Mean :0.3291 Mean :0.02305
3rd Qu.:0.3897 3rd Qu.:0.5122 3rd Qu.:0.03200
Max. :0.7730 Max. :1.0000 Max. :0.11300
It seems to be fine.
For more about importing and exporting data in R, consult the R help document, R Data
Import/Export.
3.3 Lists
A list is a little like a data frame but offers a more flexible way to gather objects of different classes
together. For example:
> mylist <- list(schools.data, model1, "a")
> class(mylist)
[1] "list"
To find the number of components in a list, use length(...):
> length(mylist)
[1] 3
Here the first component is the data frame containing the schools data. The second component is the
linear model created earlier. The third is the character "a". To reference a specific component,
double square brackets are used:
> head(mylist[[1]], n=3)
FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
1 0.659 0.583 0.031 0.217 0.032 0.222 0.002 0.020
2 0.391 0.222 0.001 0.350 0.087 0.126 0.003 0.012
3 0.708 0.923 0.038 0.028 0.000 0.239 0.000 0.002
> summary(mylist[[2]])
Call:
lm(formula = y ~ x, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-57.102 -16.272 0.282 15.188 47.290
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.6462 13.6208 0.635 0.527
x 3.0042 0.1313 22.878 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.47 on 98 degrees of freedom
Multiple R-squared: 0.8423, Adjusted R-squared: 0.8407
F-statistic: 523.4 on 1 and 98 DF, p-value: < 2.2e-16
> class(mylist[[3]])
[1] "character"
The double square brackets can be combined with single ones. For example,
> mylist[[1]][1,]
FSM EAL SEN white blk.car blk.afr indian pakistani [etc.]
1 0.659 0.583 0.031 0.217 0.032 0.222 0.002 0.020
is the first row of the schools data. The first cell of the same data is:
> mylist[[1]][1,1]
[1] 0.659
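Components of a list can also be given names when the list is created. They can then be referenced with the $ notation as well as with double square brackets. The list demo.list below is hypothetical and is only there to show the syntax; str() gives a compact view of its structure.

```r
demo.list <- list(numbers = 1:5,
                  label = "a",
                  table = data.frame(x = 1:2, y = c(0.5, 1.5)))

length(demo.list)              # 3 components
names(demo.list)               # "numbers" "label" "table"
demo.list$label                # "a", the same as demo.list[["label"]]
demo.list[["table"]][2, "y"]   # 1.5: double then single brackets
class(demo.list[1])            # "list": single brackets return a sub-list
class(demo.list[[1]])          # "integer": double brackets return the component
str(demo.list)
```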
3.4 Writing a function
In brief, a function is written in R in the following way:
> function.name <- function(list of arguments) {
+ function code
+ return(result)
+ }
So, a simple function to divide the product of two numbers by their sum could be:
> my.function <- function(x1, x2) {
+ result <- (x1 * x2) / (x1 + x2)
+ return(result)
+ }
Now run the function:
> my.function(3, 7)
[1] 2.1
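Because R's arithmetic works element by element, my.function also accepts vectors. The sketch below repeats the definition so that it runs on its own. It then adds a hypothetical variant, ratio.product.sum, which has a default value for its second argument. If the sum is zero, the variant stops with an error message rather than silently returning Inf or NaN.

```r
my.function <- function(x1, x2) {
  result <- (x1 * x2) / (x1 + x2)
  return(result)
}
my.function(3, 7)                # 2.1
my.function(c(3, 2), c(7, 2))    # works element by element: 2.1 1.0

# Hypothetical variant: x2 defaults to 1, and a zero sum is reported as an error
ratio.product.sum <- function(x1, x2 = 1) {
  if (any(x1 + x2 == 0)) stop("x1 + x2 must not be zero")
  (x1 * x2) / (x1 + x2)          # the last value evaluated is returned
}
ratio.product.sum(4)             # 4 * 1 / (4 + 1) = 0.8
```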
3.5 R packages for mapping and spatial data analysis
By default, R comes with a base set of packages and methods for data analysis and visualization.
However, many other packages are also available that greatly extend R's value and functionality.
These packages are listed alphabetically at
http://cran.r-project.org/web/packages/available_packages_by_name.html.
Because there are so many, it can be useful to browse the packages by topic (at
http://cran.r-project.org/web/views/). The topic, or 'task view', of particular interest here is the
analysis of spatial data: http://cran.r-project.org/web/views/Spatial.html
3.5.1 Installing and loading one or more of the packages
Note: If you are reading this in class, it is likely that the packages have already been installed or that
you will not have the administrative rights to install them. If so, this section is for information only.
To install a specific package, the install.packages(...) command is used, as in:
> install.packages("ctv")
Installing package(s) into '/Users/ggrjh/Library/R/2.13/library'
(as 'lib' is unspecified)
trying URL 'http://cran.uk.r-project.org/bin/macosx/leopard/contrib/2.13/ctv_0.7-2.tgz'
Content type 'application/x-gzip' length 289693 bytes (282 Kb)
opened URL
==================================================
downloaded 282 Kb
The package needs to be installed once but loaded each time R is started, using the library(...)
command:
> library("ctv")
In this case, the package that has been installed will now allow all the packages associated with the
spatial task view to be installed together, using:
> install.views("Spatial")
Note that installing packages may, by default, require access to a directory/folder for which
administrative rights are required. If necessary, it is entirely possible to install R (and therefore the
additional packages) in, for example, 'My Documents' or on a USB stick.
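Before installing, it can help to check which packages are already available and where R will put new ones. This sketch uses base R only:

- .libPaths() lists the library folders R searches. The first of these is where install.packages() writes by default.
- requireNamespace() reports whether a package can be loaded, without attaching it.

The package names are those used earlier in this document. The lib folder in the commented-out line is a placeholder, not a real path.

```r
.libPaths()   # library folders, in search order

pkgs <- c("sp", "spdep", "RgoogleMaps", "ctv")
# TRUE where the package is installed; quietly = TRUE suppresses messages
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
available

# Install only the missing ones, optionally into a folder you can write to:
# install.packages(pkgs[!available], lib = "path/to/your/library")
```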
3.6 Tidying up and quitting
You may want to save and/or tidy up your workspace before quitting R. See sections 1.5 and 2.7 on
pages 11 and 22.
3.7 Further Information
See An Introduction to R, available at CRAN (http://cran.r-project.org/manuals.html) or from the
drop-down Help menus in the RGui.