Ben Fanson, Simeon Lisovski

Lecture Outline
1) introduction to R graphics
2) introduction to ggplot
Helpful references
- http://www.cookbook-r.com/Graphs/
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham

R graphics

Pros
1) You can make almost any graph that you can think of
2) Graphics are publishable quality
3) Combined with the programming learned previously, you can 'easily' make very complex graphs to visualize your data and statistical models
4) You can make lots of graphs easily [e.g. a plot for each individual]
Cons 1) it takes some effort to learn the language and quirks of the graphing approach
Overview of R main graphics

R graphics

base plot [original R graphics]
- plot()
- hist()
- barplot()
- pairs()
- image(), persp(), and lots more...

Some advantages of base plot
1) I find it the easiest way to build a very customized plot, since you build the plot one element at a time
#--- example code to build a plot by each element ---#
plot.new()                                          # empty canvas, both axes run from 0 to 1
points(seq(0,1,0.1), seq(0,1,0.1), pch=1:10)        # pch is recycled over the 11 points
axis(1, at=c(0.2,0.7))                              # x-axis ticks
axis(2, at=c(0.1,0.8))                              # y-axis ticks
mtext('xlab', 1, line=2)
mtext('ylab', 2, line=2)
box()
abline(0, 1, col='red')                             # intercept 0, slope 1
title('Title')                                      # base R; mtitle() is only available via the Hmisc package
2) being the original, it is the most integrated with packages
base plot and methods

ds <- data.frame(x=1:10, y=rnorm(10,1:10,3))
plot(ds$y ~ ds$x)

lm_model <- lm(y ~ x, data=ds)
par(mfrow=c(2,2))
plot(lm_model)

how can plot() give you very different results?

plot() is not a single function
How does plot() work?
1) plot() looks at the class of the object(s) and then chooses another function
   e.g. plot( y ~ x )
   plot() sees a formula and hands it to plot.formula(), which asks what class(y) and class(x) are and, since both are numeric vectors, makes a scatterplot
2) the same lookup happens for any other class
   e.g. plot( lm_model )
   plot() asks what class(lm_model) is and, since it is class 'lm', it runs the function plot.lm(), which makes four diagnostic graphs by default

to list every class that plot() has a method for, run methods(plot)
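The dispatch described above can be tried out on a class of your own. This is only a minimal sketch: the class name random_walk and the object walk are made up for illustration and do not come from the lecture files.

```r
# A toy object with a made-up class (hypothetical, for illustration only)
walk <- structure(
  list(steps = cumsum(c(0, 1, -1, 1, 1, -1, 1))),
  class = "random_walk"
)

# A method for that class: plot() finds it by its name, plot.<class>
plot.random_walk <- function(x, ...) {
  plot(seq_along(x$steps), x$steps, type = "l",
       xlab = "step", ylab = "position", ...)
  invisible(x)
}

class(walk)                 # the class that plot() looks at
plot(walk)                  # dispatches to plot.random_walk()
getS3method("plot", "lm")   # the function that plot(lm_model) ends up running
```

Calling plot(walk) never mentions plot.random_walk() directly, which is the same reason plot(lm_model) and plot(ds$y ~ ds$x) can look so different.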
Overview of R main graphics

R graphics

base plot [original R graphics]
- plot()
- hist()
- barplot()
- pairs()

grid graphics [alternative framework]

lattice
- xyplot()
- barchart()
- wireframe()
- faceting (aka trellising)
- lattice can also do most things that base plot can
ggplot
- ggplot(...) + geom_point(...) + facet_wrap(...)
- ggmap(...) + geom_tile()
- http://mandymejia.wordpress.com/2013/11/13/10-reasons-to-switch-to-ggplot-7/

why I use ggplot?
1) I like the faceting and grouping. It makes it easy to make quick, yet complex, graphs for data exploration
2) I find it easier to add a new layer
3) I like the grouping options and colour schemes in ggplot
4) You can make up your own 'theme' that you can use over and over again
5) Lots of active development in the area

cons of ggplot...
1) I find working with grid graphics more difficult than base plot. This makes it harder to do some of those final touches on the graph. [Note: the ggplot2 community is active, so you can often find the answer or get help easily enough]
2) no 3d plotting
3) Customising axis labels for faceted graphs can be annoying
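4) double axes are not well supported
   a) Hadley Wickham has objected to this feature on philosophical grounds
   b) there is a workaround: newer ggplot2 releases include sec_axis(), which adds a second axis only as a one-to-one transformation of the first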
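On the double-axes point: as a hedged sketch, ggplot2 releases from 2.2.0 onwards ship sec_axis(), which draws a second axis as a transformation of the first one rather than as an independent scale. The temps data frame and the Celsius/Fahrenheit labels below are invented for illustration.

```r
library(ggplot2)

# Hypothetical data: daily temperature in degrees Celsius
temps <- data.frame(
  day     = 1:10,
  celsius = c(12, 14, 15, 13, 17, 19, 18, 16, 15, 14)
)

p <- ggplot(temps, aes(x = day, y = celsius)) +
  geom_line() +
  scale_y_continuous(
    name     = "Temperature (C)",
    # second axis = the primary axis pushed through a formula
    sec.axis = sec_axis(~ . * 9 / 5 + 32, name = "Temperature (F)")
  )
p
```

Because the second axis is only a relabelling of the first, this does not let you overlay two unrelated variables on separate scales.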
Raster vs. vector graphics

Raster images
- method: based on a grid of dots (pixels). Each pixel is assigned a colour.
- file formats: jpg, tiff, bitmap, psd
- use: best for photographs

Vector images
- method: based on mathematical equations to redraw the image
- file formats: eps, ps, pdf, ai
- use: best for drawings, logos, graphics. Much easier to do post-processing revisions
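In R, the choice between raster and vector comes down to which graphics device you open before plotting. A minimal sketch, assuming you just want one vector and one raster copy of the same figure; the file names and the draw_plot() helper are made up, and png() is guarded because not every R build supports it.

```r
# Hypothetical helper so the same figure is drawn on both devices
draw_plot <- function() {
  plot(1:10, (1:10)^2, type = "b", main = "Example")
}

# Vector output: pdf() stores drawing instructions, so it scales cleanly
pdf_file <- file.path(tempdir(), "example.pdf")
pdf(pdf_file, width = 5, height = 4)      # size in inches
draw_plot()
dev.off()

# Raster output: png() stores pixels, so resolution is fixed at save time
png_file <- file.path(tempdir(), "example.png")
if (capabilities("png")) {
  png(png_file, width = 5, height = 4, units = "in", res = 300)
  draw_plot()
  dev.off()
}
```

The pdf version is the one to open in a vector editor for post-processing; the png version is fixed at 300 dpi.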
Adobe Illustrator for post-processing
Illustrator is great for minor touches to the graphs or for collating multiple graphs into a single page.
<< Illustrator quick demo >>

Short introduction to ggplot
ggplot jargon

geoms geometric objects [think of as plot type] e.g. scatterplot, line graph, histogram
  e.g. geom_point(), geom_line(), geom_bar()
aes aesthetics are the attributes associated with each geometric object
  e.g. aesthetics of a point: x-value = 2.4, y-value = 0.4, shape = dot, colour = black, transparency = opaque
  e.g. aesthetics of a line: x-value = c(1.7,2.4,2.7...), y-value = c(-0.5,0.4,0.6...), line type = solid, colour = black, transparency = opaque
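An aesthetic can either be mapped to a column of the data (inside aes()) or set to one fixed value for the whole layer (outside aes()). A minimal sketch of the difference, reusing the small random dataset style from this lecture; the object names p_mapped and p_fixed are made up.

```r
library(ggplot2)

set.seed(100)
ds <- data.frame(x = 1:10, y = rnorm(10))

# Mapped: size sits inside aes(), so it varies with the data and gets a legend
p_mapped <- ggplot(ds, aes(x = x, y = y)) +
  geom_point(aes(size = y))

# Set: constants outside aes() apply to every point, no legend is drawn
p_fixed <- ggplot(ds, aes(x = x, y = y)) +
  geom_point(shape = 16, colour = "black", alpha = 1, size = 3)

p_mapped
p_fixed
```

Here shape = 16 is the filled dot, colour = "black" and alpha = 1 (fully opaque) mirror the point attributes listed above.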
scales attributes of the x-axis and y-axis [and any z-axis]
  e.g. a continuous scale that ranges from -1.5 to 2.1, with tick marks at every 0.5
  e.g. a continuous scale that ranges from -1.0 to 1.0, with tick marks at every 0.5

library(ggplot2)
set.seed(100)
ds <- data.frame(x=1:10, y=rnorm(10))
ggplot(ds, aes(x=x, y=y)) + geom_point(aes(size=y))
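ggplot picks the range and tick marks of each scale for you, but both can be overridden. A minimal sketch, assuming you want a tick at every 0.5 on the y-axis; the limits of -3 to 3 are an arbitrary choice wide enough for standard normal draws.

```r
library(ggplot2)

set.seed(100)
ds <- data.frame(x = 1:10, y = rnorm(10))

p <- ggplot(ds, aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +                  # one tick per observation
  scale_y_continuous(limits = c(-3, 3),                # fixed range
                     breaks = seq(-3, 3, by = 0.5))    # tick marks at every 0.5
p
```

Note that points falling outside limits are dropped with a warning rather than clipped.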
facets making separate plots broken up by one or two variables

set.seed(100)
ds <- data.frame(x=1:30, y=rnorm(30), sex=rep(c('m','f'), each=15))
ggplot(ds, aes(x=x, y=y)) + geom_point(aes(size=y)) + facet_grid(.~sex)

ggplot grammar
similar to dplyr grammar, think of it as a sentence that you are building

'specify dataset' +                                                 # ggplot(ds,...)
'specify x, y, grouping variables' +                                # aes(x=,y=,col=, shape=)
'specify plot layers (e.g. point, line, stat function)' +           # geom_name()
'specify if you want faceting' +                                    # facet_grid()
'specify minor details/options [labels, position of legend..]'      # scale_name(), theme(), labs()

example dataset [= ds]

bird_id   sex      trt   growth_rate
1         male     t1    12.3
2         male     t2    10.3
3         male     t3    14.5
4         female   t1    14.3
5         female   t2     9.3
6         female   t3    15.6

scatterplot of data
ggplot( ds ) + geom_point( aes(x=sex, y=growth_rate) )
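To make the grammar 'sentence' concrete, here is a minimal sketch that types in the six-bird example dataset and uses every part of the sentence once. The column names follow the ones used in the plotting calls (sex, trt, growth_rate); the axis labels and legend position are arbitrary choices.

```r
library(ggplot2)

# The example dataset, entered by hand
ds <- data.frame(
  bird_id     = 1:6,
  sex         = rep(c("male", "female"), each = 3),
  trt         = rep(c("t1", "t2", "t3"), times = 2),
  growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)
)

p <- ggplot(ds,                                           # 'specify dataset'
            aes(x = trt, y = growth_rate, col = sex)) +   # 'specify x, y, grouping variables'
  geom_point() +                                          # 'specify plot layers'
  facet_grid(. ~ sex) +                                   # 'specify if you want faceting'
  labs(x = "Treatment", y = "Growth rate") +              # 'specify minor details/options'
  theme(legend.position = "bottom")
p
```

Each term after a + can be removed or swapped without touching the others, which is what makes the sentence analogy useful.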
scatterplot of data
ggplot( ds ) + geom_point( aes(x=trt, y=growth_rate) )

scatterplot of data, coloured by sex
ggplot( ds ) + geom_point( aes(x=trt, y=growth_rate, col=sex) )

what if we want to add a line?
ggplot( ds ) +
  geom_point( aes(x=trt, y=growth_rate, col=sex) ) +
  geom_line( aes(x=trt, y=growth_rate, col=sex, group=sex) )
group=sex is needed only because trt is categorical. If trt were numeric, it would not be needed

you can move aes() to ggplot()
ggplot( ds, aes(x=trt, y=growth_rate, col=sex, group=sex) ) + geom_point( ) + geom_line( )
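The remark about group=sex follows from how ggplot2 forms groups by default: it uses the combination of all discrete variables in the plot. With a categorical trt that means one group per trt-by-sex cell, each holding a single bird, so there is nothing to join with a line. A minimal sketch of both cases; trt_num is a made-up numeric coding of the treatment (t1 = 1, t2 = 2, t3 = 3).

```r
library(ggplot2)

ds <- data.frame(
  bird_id     = 1:6,
  sex         = rep(c("male", "female"), each = 3),
  trt         = rep(c("t1", "t2", "t3"), times = 2),
  growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)
)
ds$trt_num <- as.numeric(factor(ds$trt))   # hypothetical numeric version of trt

# Categorical x: group = sex tells geom_line() which points belong together
p_cat <- ggplot(ds, aes(x = trt, y = growth_rate, col = sex, group = sex)) +
  geom_point() +
  geom_line()

# Numeric x: col = sex is the only discrete variable, so it defines the groups
p_num <- ggplot(ds, aes(x = trt_num, y = growth_rate, col = sex)) +
  geom_point() +
  geom_line()

p_cat
p_num
```

Both plots draw one line per sex; only the first needs the explicit group.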
with aes() in ggplot(), adding a facet takes one more term
ggplot( ds, aes(x=trt, y=growth_rate, col=sex, group=sex) ) + geom_point( ) + geom_line( ) + facet_grid(.~sex)
in facet_grid(), the formula reads row ~ column ['.' just means no grouping variable]

key points so far
1) I did not have to specify the dataset more than once
2) all the geoms adopt the same scales (no specifying x-range or y-range)
3) grouping by colour, shape, fill, etc. is easy
4) faceting is quick
5) a common language for everything (i.e. not a bunch of separate packages for different plot types)

What's next
Learning about base plot
- introducing the basics of plot()
- overlaying plots and customizing your plots
- discussing some more advanced plotting functions

Lecture 8: Hands on Section
Lecture 8 files
1) get Lecture8.R from github
2) make sure that you have data/lecture7/ [same files as last week]
3) open up Lecture8.R in Rcourse_proj.Rproj
4) start working through the example and then try the exercise