rstudio::conf 2019 / CHEATSHEETS
RStudio IDE : : CHEAT SHEET

Documents and Apps
Open Shiny, R Markdown, knitr, Sweave, LaTeX, .Rd files and more in Source Pane
• Check spelling
• Render output
• Choose output format
• Choose output location
• Insert code chunk
• Jump to previous chunk
• Jump to next chunk
• Run selected lines
• Publish to server
• Show file outline
Access markdown guide at Help > Markdown Quick Reference
• Jump to chunk
• Set knitr chunk options
• Run this and all previous code chunks
• Run this code chunk
RStudio recognizes that files named app.R, server.R, ui.R, and global.R belong to a shiny app
• Run app
• Choose location to view app
• Publish to shinyapps.io or server
• Manage publish accounts

Write Code
• Navigate tabs
• Open in new window
• Save
• Find and replace
• Compile as notebook
• Run selected code
• Cursors of shared users
• Re-run previous code
• Source with or without Echo
• Show file outline
• Multiple cursors/column selection with Alt + mouse drag.
• Code diagnostics that appear in the margin. Hover over diagnostic symbols for details.
• Syntax highlighting based on your file's extension
• Tab completion to finish function names, file paths, arguments, and more.
• Multi-language code snippets to quickly use common blocks of code.
• Jump to function in file
• Change file type
• Working Directory
• Maximize, minimize panes
• Press ↑ to see command history
• Drag pane boundaries

R Support
• Import data with wizard
• History of past commands to run/copy
• Display .RPres slideshows: File > New File > R Presentation
• Load workspace
• Save workspace
• Delete all saved objects
• Search inside environment
• Choose environment to display from list of parent environments
• Display objects as list or grid
• Displays saved objects by type with short description
• View in data viewer
• View function source code

RStudio opens plots in a dedicated Plots pane
• Navigate recent plots
• Open in window
• Export plot
• Delete plot
• Delete all plots

GUI Package manager lists every installed package
• Install Packages
• Update Packages
• Create reproducible package library for your project
• Click to load package with library(). Unclick to detach package with detach()
• Package version installed
• Delete from library

A File browser keyed to your working directory. Click on file or directory name to open.
• Create folder
• Upload file
• Delete file
• Rename file
• Change directory
• Path to displayed directory

RStudio opens documentation in a dedicated Help pane
• Home page of helpful links
• Search within help file
• Search for help file

Viewer Pane displays HTML content, such as Shiny apps, RMarkdown reports, and interactive visualizations
• Stop Shiny app
• Publish to shinyapps.io, rpubs, RSConnect, …
• Refresh

View(<data>) opens spreadsheet like view of data set
• Filter rows by value or value range
• Sort by values
• Search for value

Pro Features
• Share Project with Collaborators
• Active shared collaborators
• Start new R Session in current project
• Close R Session in project
• Select R Version

PROJECT SYSTEM
File > New Project
RStudio saves the call history, workspace, and working directory associated with a project. It reloads each when you re-open a project.
• Name of current project

Version Control with Git or SVN
Turn on at Tools > Project Options > Git/SVN
• Stage files
• Show file diff
• Commit staged files
• Push/Pull to remote
• View History
• File status: A = Added, D = Deleted, M = Modified, R = Renamed, ? = Untracked
• Open shell to type commands
• current branch

Package Writing
File > New Project > New Directory > R Package
Turn project into package, Enable roxygen documentation with Tools > Project Options > Build Tools
Roxygen guide at Help > Roxygen Quick Reference

Debug Mode
Open with debug(), browser(), or a breakpoint. RStudio will open the debugger mode when it encounters a breakpoint while executing code.
• Click next to line number to add/remove a breakpoint.
• Highlighted line shows where execution has paused
• Launch debugger mode from origin of error
• Open traceback to examine the functions that R called before the error occurred
• Run commands in environment where execution has paused
• Examine variables in executing environment
• Select function in traceback to debug
• Step through code one line at a time
• Step into and out of functions to run
• Resume execution
• Quit debug mode
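
The Debug Mode notes mention debug() and the traceback of calls made before an error. The sketch below shows one way these base R tools might be exercised from a script rather than from the IDE buttons. The function scale_to_unit() and the variable names are hypothetical and only serve the illustration; the function is never called while flagged, because that would open the interactive debugger.

```r
# Hypothetical helper with a latent problem: it fails on non-numeric input.
scale_to_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Flag the function for debugging; the next call would open the debugger.
debug(scale_to_unit)
flagged <- isdebugged(scale_to_unit)  # TRUE while the flag is set

# Remove the flag so that later calls run normally again.
undebug(scale_to_unit)
result <- scale_to_unit(c(2, 4, 6))

# Capture the call stack at the moment an error is signalled, which is
# similar in spirit to what the traceback view displays.
calls <- NULL
outcome <- tryCatch(
  withCallingHandlers(
    scale_to_unit("a"),
    error = function(e) calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)
```

In an interactive session, leaving the debug() flag in place and calling scale_to_unit() would pause at the first line of the function, where the step, resume, and quit controls listed in the Debug Mode notes apply.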

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com • RStudio IDE 0.99.832 • Updated: 2016-01
1 LAYOUT | Windows/Linux | Mac
Move focus to Source Editor | Ctrl+1 | Ctrl+1
Move focus to Console | Ctrl+2 | Ctrl+2
Move focus to Help | Ctrl+3 | Ctrl+3
Show History | Ctrl+4 | Ctrl+4
Show Files | Ctrl+5 | Ctrl+5
Show Plots | Ctrl+6 | Ctrl+6
Show Packages | Ctrl+7 | Ctrl+7
Show Environment | Ctrl+8 | Ctrl+8
Show Git/SVN | Ctrl+9 | Ctrl+9
Show Build | Ctrl+0 | Ctrl+0

2 RUN CODE | Windows/Linux | Mac
Search command history | Ctrl+Up | Cmd+Up
Navigate command history | Up/Down | Up/Down
Move cursor to start of line | Home | Cmd+Left
Move cursor to end of line | End | Cmd+Right
Change working directory | Ctrl+Shift+H | Ctrl+Shift+H
Interrupt current command | Esc | Esc
Clear console | Ctrl+L | Ctrl+L
Quit Session (desktop only) | Ctrl+Q | Cmd+Q
Restart R Session | Ctrl+Shift+F10 | Cmd+Shift+F10
Run current line/selection | Ctrl+Enter | Cmd+Enter
Run current (retain cursor) | Alt+Enter | Option+Enter
Run from current to end | Ctrl+Alt+E | Cmd+Option+E
Run the current function | Ctrl+Alt+F | Cmd+Option+F
Source a file | Ctrl+Alt+G | Cmd+Option+G
Source the current file | Ctrl+Shift+S | Cmd+Shift+S
Source with echo | Ctrl+Shift+Enter | Cmd+Shift+Enter

3 NAVIGATE CODE | Windows/Linux | Mac
Goto File/Function | Ctrl+. | Ctrl+.
Fold Selected | Alt+L | Cmd+Option+L
Unfold Selected | Shift+Alt+L | Cmd+Shift+Option+L
Fold All | Alt+O | Cmd+Option+O
Unfold All | Shift+Alt+O | Cmd+Shift+Option+O
Go to line | Shift+Alt+G | Cmd+Shift+Option+G
Jump to | Shift+Alt+J | Cmd+Shift+Option+J
Switch to tab | Ctrl+Shift+. | Ctrl+Shift+.
Previous tab | Ctrl+F11 | Ctrl+F11
Next tab | Ctrl+F12 | Ctrl+F12
First tab | Ctrl+Shift+F11 | Ctrl+Shift+F11
Last tab | Ctrl+Shift+F12 | Ctrl+Shift+F12
Navigate back | Ctrl+F9 | Cmd+F9
Navigate forward | Ctrl+F10 | Cmd+F10
Jump to Brace | Ctrl+P | Ctrl+P
Select within Braces | Ctrl+Shift+Alt+E | Ctrl+Shift+Option+E
Use Selection for Find | Ctrl+F3 | Cmd+E
Find in Files | Ctrl+Shift+F | Cmd+Shift+F
Find Next | Win: F3, Linux: Ctrl+G | Cmd+G
Find Previous | Win: Shift+F3, Linux: Ctrl+Shift+G | Cmd+Shift+G
Jump to Word | Ctrl+Left/Right | Option+Left/Right
Jump to Start/End | Ctrl+Up/Down | Cmd+Up/Down
Toggle Outline | Ctrl+Shift+O | Cmd+Shift+O

4 WRITE CODE | Windows/Linux | Mac
Attempt completion | Tab or Ctrl+Space | Tab or Cmd+Space
Navigate candidates | Up/Down | Up/Down
Accept candidate | Enter, Tab, or Right | Enter, Tab, or Right
Dismiss candidates | Esc | Esc
Undo | Ctrl+Z | Cmd+Z
Redo | Ctrl+Shift+Z | Cmd+Shift+Z
Cut | Ctrl+X | Cmd+X
Copy | Ctrl+C | Cmd+C
Paste | Ctrl+V | Cmd+V
Select All | Ctrl+A | Cmd+A
Delete Line | Ctrl+D | Cmd+D
Select | Shift+[Arrow] | Shift+[Arrow]
Select Word | Ctrl+Shift+Left/Right | Option+Shift+Left/Right
Select to Line Start | Alt+Shift+Left | Cmd+Shift+Left
Select to Line End | Alt+Shift+Right | Cmd+Shift+Right
Select Page Up/Down | Shift+PageUp/Down | Shift+PageUp/Down
Select to Start/End | Shift+Alt+Up/Down | Cmd+Shift+Up/Down
Delete Word Left | Ctrl+Backspace | Ctrl+Opt+Backspace
Delete Word Right | - | Option+Delete
Delete to Line End | - | Ctrl+K
Delete to Line Start | - | Option+Backspace
Indent | Tab (at start of line) | Tab (at start of line)
Outdent | Shift+Tab | Shift+Tab
Yank line up to cursor | Ctrl+U | Ctrl+U
Yank line after cursor | Ctrl+K | Ctrl+K
Insert yanked text | Ctrl+Y | Ctrl+Y
Insert <- | Alt+- | Option+-
Insert %>% | Ctrl+Shift+M | Cmd+Shift+M
Show help for function | F1 | F1
Show source code | F2 | F2
New document | Ctrl+Shift+N | Cmd+Shift+N
New document (Chrome) | Ctrl+Alt+Shift+N | Cmd+Shift+Opt+N
Open document | Ctrl+O | Cmd+O
Save document | Ctrl+S | Cmd+S
Close document | Ctrl+W | Cmd+W
Close document (Chrome) | Ctrl+Alt+W | Cmd+Option+W
Close all documents | Ctrl+Shift+W | Cmd+Shift+W
Extract function | Ctrl+Alt+X | Cmd+Option+X
Extract variable | Ctrl+Alt+V | Cmd+Option+V
Reindent lines | Ctrl+I | Cmd+I
(Un)Comment lines | Ctrl+Shift+C | Cmd+Shift+C
Reflow Comment | Ctrl+Shift+/ | Cmd+Shift+/
Reformat Selection | Ctrl+Shift+A | Cmd+Shift+A
Select within braces | Ctrl+Shift+E | Ctrl+Shift+E
Show Diagnostics | Ctrl+Shift+Alt+P | Cmd+Shift+Opt+P
Transpose Letters | - | Ctrl+T
Move Lines Up/Down | Alt+Up/Down | Option+Up/Down
Copy Lines Up/Down | Shift+Alt+Up/Down | Cmd+Option+Up/Down
Add New Cursor Above | Ctrl+Alt+Up | Ctrl+Option+Up
Add New Cursor Below | Ctrl+Alt+Down | Ctrl+Option+Down
Move Active Cursor Up | Ctrl+Alt+Shift+Up | Ctrl+Option+Shift+Up
Move Active Cursor Down | Ctrl+Alt+Shift+Down | Ctrl+Opt+Shift+Down
Find and Replace | Ctrl+F | Cmd+F
Use Selection for Find | Ctrl+F3 | Cmd+E
Replace and Find | Ctrl+Shift+J | Cmd+Shift+J

5 DEBUG CODE | Windows/Linux | Mac
Toggle Breakpoint | Shift+F9 | Shift+F9
Execute Next Line | F10 | F10
Step Into Function | Shift+F4 | Shift+F4
Finish Function/Loop | Shift+F6 | Shift+F6
Continue | Shift+F5 | Shift+F5
Stop Debugging | Shift+F8 | Shift+F8

6 VERSION CONTROL | Windows/Linux | Mac
Show diff | Ctrl+Alt+D | Ctrl+Option+D
Commit changes | Ctrl+Alt+M | Ctrl+Option+M
Scroll diff view | Ctrl+Up/Down | Ctrl+Up/Down
Stage/Unstage (Git) | Spacebar | Spacebar
Stage/Unstage and move to next | Enter | Enter

7 MAKE PACKAGES | Windows/Linux | Mac
Build and Reload | Ctrl+Shift+B | Cmd+Shift+B
Load All (devtools) | Ctrl+Shift+L | Cmd+Shift+L
Test Package (Desktop) | Ctrl+Shift+T | Cmd+Shift+T
Test Package (Web) | Ctrl+Alt+F7 | Cmd+Opt+F7
Check Package | Ctrl+Shift+E | Cmd+Shift+E
Document Package | Ctrl+Shift+D | Cmd+Shift+D

8 DOCUMENTS AND APPS | Windows/Linux | Mac
Preview HTML (Markdown, etc.) | Ctrl+Shift+K | Cmd+Shift+K
Knit Document (knitr) | Ctrl+Shift+K | Cmd+Shift+K
Compile Notebook | Ctrl+Shift+K | Cmd+Shift+K
Compile PDF (TeX and Sweave) | Ctrl+Shift+K | Cmd+Shift+K
Insert chunk (Sweave and Knitr) | Ctrl+Alt+I | Cmd+Option+I
Insert code section | Ctrl+Shift+R | Cmd+Shift+R
Re-run previous region | Ctrl+Shift+P | Cmd+Shift+P
Run current document | Ctrl+Alt+R | Cmd+Option+R
Run from start to current line | Ctrl+Alt+B | Cmd+Option+B
Run the current code section | Ctrl+Alt+T | Cmd+Option+T
Run previous Sweave/Rmd code | Ctrl+Alt+P | Cmd+Option+P
Run the current chunk | Ctrl+Alt+C | Cmd+Option+C
Run the next chunk | Ctrl+Alt+N | Cmd+Option+N
Sync Editor & PDF Preview | Ctrl+F8 | Cmd+F8
Previous plot | Ctrl+Alt+F11 | Cmd+Option+F11
Next plot | Ctrl+Alt+F12 | Cmd+Option+F12
Show Keyboard Shortcuts | Alt+Shift+K | Option+Shift+K

WHY RSTUDIO SERVER PRO?
RSP extends the open source server with a commercial license, support, and more:
• open and run multiple R sessions at once
• tune your resources to improve performance
• edit the same project at the same time as others
• see what you and others are doing on your server
• switch easily from one version of R to a different version
• integrate with your authentication, authorization, and audit practices
Download a free 45 day evaluation at www.rstudio.com/products/rstudio-server-pro/

Learn more at www.rstudio.com • RStudio IDE 0.1.0 • Updated: 2017-09


Shiny : : CHEAT SHEET

Basics
A Shiny app is a web page (UI) connected to a computer running a live R session (Server).
Users can manipulate the UI, which will cause the server to update the UI's displays (by running R code).

APP TEMPLATE
Begin writing a new app with this template. Preview the app by running the code at the R command line.

    library(shiny)
    ui <- fluidPage()
    server <- function(input, output){}
    shinyApp(ui = ui, server = server)

• ui - nested R functions that assemble an HTML user interface for your app
• server - a function with instructions on how to build and rebuild the R objects displayed in the UI
• shinyApp - combines ui and server into an app. Wrap with runApp() if calling from a sourced script or inside a function.

SHARE YOUR APP
The easiest way to share your app is to host it on shinyapps.io, a cloud based service from RStudio
1. Create a free or professional account at http://shinyapps.io
2. Click the Publish icon in the RStudio IDE or run: rsconnect::deployApp("<path to directory>")
Build or purchase your own Shiny Server at www.rstudio.com/products/shiny-server/

Building an App
Complete the template by adding arguments to fluidPage() and a body to the server function.
Add inputs to the UI with *Input() functions
Add outputs with *Output() functions
Tell server how to render outputs with R in the server function. To do this:
1. Refer to outputs with output$<id>
2. Refer to inputs with input$<id>
3. Wrap code in a render*() function before saving to output

    library(shiny)
    ui <- fluidPage(
      numericInput(inputId = "n",
        "Sample size", value = 25),
      plotOutput(outputId = "hist")
    )
    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n))
      })
    }
    shinyApp(ui = ui, server = server)

Save your template as app.R. Alternatively, split your template into two files named ui.R and server.R.

ui.R contains everything you would save to ui.

    # ui.R
    fluidPage(
      numericInput(inputId = "n",
        "Sample size", value = 25),
      plotOutput(outputId = "hist")
    )

server.R ends with the function you would save to server. No need to call shinyApp().

    # server.R
    function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n))
      })
    }

Save each app as a directory that holds an app.R file (or a server.R file and a ui.R file) plus optional extra files.

    app-name         The directory name is the name of the app
      app.R
      global.R       (optional) defines objects available to both ui.R and server.R
      DESCRIPTION    (optional) used in showcase mode
      README         (optional) used in showcase mode
      <other files>  (optional) data, scripts, etc.
      www            (optional) directory of files to share with web browsers (images, CSS, .js, etc.) Must be named "www"

Launch apps with runApp(<path to directory>)

Outputs - render*() and *Output() functions work together to add R output to the UI
Each render function works with the output function listed after it.
• DT::renderDataTable(expr, options, callback, escape, env, quoted) works with dataTableOutput(outputId, icon, …)
• renderImage(expr, env, quoted, deleteFile) works with imageOutput(outputId, width, height, click, dblclick, hover, hoverDelay, inline, hoverDelayType, brush, clickId, hoverId)
• renderPlot(expr, width, height, res, …, env, quoted, func) works with plotOutput(outputId, width, height, click, dblclick, hover, hoverDelay, inline, hoverDelayType, brush, clickId, hoverId)
• renderPrint(expr, env, quoted, func, width) works with verbatimTextOutput(outputId)
• renderTable(expr,…, env, quoted, func) works with tableOutput(outputId)
• renderText(expr, env, quoted, func) works with textOutput(outputId, container, inline)
• renderUI(expr, env, quoted, func) works with uiOutput(outputId, inline, container, …) & htmlOutput(outputId, inline, container, …)

Inputs
collect values from the user
Access the current value of an input object with input$<inputId>. Input values are reactive.
• actionButton(inputId, label, icon, …)
• actionLink(inputId, label, icon, …)
• checkboxGroupInput(inputId, label, choices, selected, inline)
• checkboxInput(inputId, label, value)
• dateInput(inputId, label, value, min, max, format, startview, weekstart, language)
• dateRangeInput(inputId, label, start, end, min, max, format, startview, weekstart, language, separator)
• fileInput(inputId, label, multiple, accept)
• numericInput(inputId, label, value, min, max, step)
• passwordInput(inputId, label, value)
• radioButtons(inputId, label, choices, selected, inline)
• selectInput(inputId, label, choices, selected, multiple, selectize, width, size) (also selectizeInput())
• sliderInput(inputId, label, min, max, value, step, round, format, locale, ticks, animate, width, sep, pre, post)
• submitButton(text, icon) (Prevents reactions across entire app)
• textInput(inputId, label, value)
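
As an illustration of how the *Input(), *Output(), and render*() functions on this page pair up, here is a minimal sketch that extends the histogram template with a second input and a second output. The input and output ids ("n", "title", "hist", "caption") and the labels are hypothetical choices for this example, not part of the cheat sheet. The app object is assigned to a variable so that the script can be sourced without launching the app.

```r
library(shiny)

# UI: two inputs collect values, two outputs reserve space for results.
ui <- fluidPage(
  sliderInput(inputId = "n", label = "Sample size",
              min = 5, max = 100, value = 25),
  textInput(inputId = "title", label = "Plot title",
            value = "Random normal draws"),
  plotOutput(outputId = "hist"),
  textOutput(outputId = "caption")
)

# Server: each output$<id> is built by the render*() function that
# matches its *Output() function in the UI.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = input$title)
  })
  output$caption <- renderText({
    paste("Drawing", input$n, "values")
  })
}

# Combine ui and server into an app object without running it yet.
app <- shinyApp(ui = ui, server = server)

# To preview interactively, uncomment the next line:
# runApp(app)
```

Because input$n is read inside both render*() calls, moving the slider should rebuild both the plot and the caption, while editing the title only affects the plot.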

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at shiny.rstudio.com • shiny 0.12.0 • Updated: 2016-01
UI - An app's UI is an HTML document.
Use Shiny's functions to assemble this HTML with R.

    fluidPage(
      textInput("a","")
    )

Returns HTML:

    ## <div class="container-fluid">
    ##   <div class="form-group shiny-input-container">
    ##     <label for="a"></label>
    ##     <input id="a" type="text"
    ##       class="form-control" value=""/>
    ##   </div>
    ## </div>

Add static HTML elements with tags, a list of functions that parallel common HTML tags, e.g. tags$a(). Unnamed arguments will be passed into the tag; named arguments will become tag attributes.

tags$a, tags$abbr, tags$address, tags$area, tags$article, tags$aside, tags$audio, tags$b, tags$base, tags$bdi, tags$bdo, tags$blockquote, tags$body, tags$br, tags$button, tags$canvas, tags$caption, tags$cite, tags$code, tags$col, tags$colgroup, tags$command, tags$data, tags$datalist, tags$dd, tags$del, tags$details, tags$dfn, tags$div, tags$dl, tags$dt, tags$em, tags$embed, tags$eventsource, tags$fieldset, tags$figcaption, tags$figure, tags$footer, tags$form, tags$h1, tags$h2, tags$h3, tags$h4, tags$h5, tags$h6, tags$head, tags$header, tags$hgroup, tags$hr, tags$HTML, tags$i, tags$iframe, tags$img, tags$input, tags$ins, tags$kbd, tags$keygen, tags$label, tags$legend, tags$li, tags$link, tags$mark, tags$map, tags$menu, tags$meta, tags$meter, tags$nav, tags$noscript, tags$object, tags$ol, tags$optgroup, tags$option, tags$output, tags$p, tags$param, tags$pre, tags$progress, tags$q, tags$ruby, tags$rp, tags$rt, tags$s, tags$samp, tags$script, tags$section, tags$select, tags$small, tags$source, tags$span, tags$strong, tags$style, tags$sub, tags$summary, tags$sup, tags$table, tags$tbody, tags$td, tags$textarea, tags$tfoot, tags$th, tags$thead, tags$time, tags$title, tags$tr, tags$track, tags$u, tags$ul, tags$var, tags$video, tags$wbr

The most common tags have wrapper functions. You do not need to prefix their names with tags$

Layouts
Combine multiple elements into a "single element" that has its own properties with a panel function, e.g.

    wellPanel(dateInput("a", ""),
      submitButton()
    )

Panel functions: absolutePanel(), conditionalPanel(), fixedPanel(), headerPanel(), inputPanel(), mainPanel(), navlistPanel(), sidebarPanel(), tabPanel(), tabsetPanel(), titlePanel(), wellPanel()

Organize panels and elements into a layout with a layout function. Add elements as arguments of the layout functions.

fluidRow()

    ui <- fluidPage(
      fluidRow(column(width = 4),
        column(width = 2, offset = 3)),
      fluidRow(column(width = 12))
    )

flowLayout()

    ui <- fluidPage(
      flowLayout( # object 1,
        # object 2,
        # object 3
      )
    )

sidebarLayout()

    ui <- fluidPage(
      sidebarLayout(
        sidebarPanel(),
        mainPanel()
      )
    )

Reactivity
Reactive values work together with reactive functions. Call a reactive value from within the arguments of one of these functions to avoid the error Operation not allowed without an active reactive context.

CREATE YOUR OWN REACTIVE VALUES

    # example snippets
    ui <- fluidPage(
      textInput("a","","A")
    )
    server <- function(input,output){
      rv <- reactiveValues()
      rv$number <- 5
    }

*Input() functions (see front page)
Each input function creates a reactive value stored as input$<inputId>
reactiveValues(…)
reactiveValues() creates a list of reactive values whose values you can set.

RENDER REACTIVE OUTPUT

    library(shiny)
    ui <- fluidPage(
      textInput("a","","A"),
      textOutput("b")
    )
    server <- function(input,output){
      output$b <- renderText({
        input$a
      })
    }
    shinyApp(ui, server)

render*() functions (see front page)
Builds an object to display. Will rerun code in body to rebuild the object whenever a reactive value in the code changes. Save the results to output$<outputId>
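
The reactivity notes describe reactiveValues() as a list of reactive values you can set, and render*() as code that reruns whenever a reactive value it reads changes. A minimal sketch of the two working together is a click counter. The ids "go" and "total", the field name count, and the button label are hypothetical, and observeEvent() is used here as one possible way to update the stored value in response to the button.

```r
library(shiny)

ui <- fluidPage(
  actionButton("go", "Add one"),
  textOutput("total")
)

server <- function(input, output, session) {
  # A list of reactive values that the server code may modify.
  rv <- reactiveValues(count = 0)

  # Update the stored value each time the button is pressed.
  observeEvent(input$go, {
    rv$count <- rv$count + 1
  })

  # Reads rv$count, so this text is rebuilt whenever the count changes.
  output$total <- renderText({
    paste("Clicks so far:", rv$count)
  })
}

# Build the app object without launching it; call runApp(app) to try it.
app <- shinyApp(ui, server)
```

Reading rv$count outside a render*(), observer, or other reactive function would raise the "Operation not allowed without an active reactive context" error mentioned in the notes.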
PREVENT REACTIONS
isolate(expr)
Runs a code block. Returns a non-reactive copy of the results.

library(shiny)
ui <- fluidPage(
  textInput("a","","A"),
  textOutput("b")
)
server <- function(input,output){
  output$b <- renderText({
    isolate({input$a})
  })
}
shinyApp(ui, server)

TRIGGER ARBITRARY CODE
observeEvent(eventExpr, handlerExpr, event.env, event.quoted, handler.env, handler.quoted, label, suspended, priority, domain, autoDestroy, ignoreNULL)
Runs code in 2nd argument when reactive values in 1st argument change. See observe() for alternative.

library(shiny)
ui <- fluidPage(
  textInput("a","","A"),
  actionButton("go","Go")
)
server <- function(input,output){
  observeEvent(input$go,{
    print(input$a)
  })
}
shinyApp(ui, server)

MODULARIZE REACTIONS
reactive(x, env, quoted, label, domain)
Creates a reactive expression that
• caches its value to reduce computation
• can be called by other code
• notifies its dependencies when it has been invalidated
Call the expression with function syntax, e.g. re()

ui <- fluidPage(
  textInput("a","","A"),
  textInput("z","","Z"),
  textOutput("b"))
server <- function(input,output){
  re <- reactive({
    paste(input$a,input$z)})
  output$b <- renderText({
    re()
  })
}
shinyApp(ui, server)

DELAY REACTIONS
eventReactive(eventExpr, valueExpr, event.env, event.quoted, value.env, value.quoted, label, domain, ignoreNULL)
Creates reactive expression with code in 2nd argument that only invalidates when reactive values in 1st argument change.

library(shiny)
ui <- fluidPage(
  textInput("a","","A"),
  actionButton("go","Go"),
  textOutput("b")
)
server <- function(input,output){
  re <- eventReactive(
    input$go,{input$a})
  output$b <- renderText({
    re()
  })
}
shinyApp(ui, server)

ui <- fluidPage(
  h1("Header 1"),
  hr(),
  br(),
  p(strong("bold")),
  p(em("italic")),
  p(code("code")),
  a(href="", "link"),
  HTML("<p>Raw html</p>")
)

To include a CSS file, use includeCSS(), or
1. Place the file in the www subdirectory
2. Link to it with
tags$head(tags$link(rel = "stylesheet", type = "text/css", href = "<file name>"))

To include JavaScript, use includeScript() or
1. Place the file in the www subdirectory
2. Link to it with
tags$head(tags$script(src = "<file name>"))

IMAGES
To include an image
1. Place the file in the www subdirectory
2. Link to it with img(src="<file name>")

splitLayout()
ui <- fluidPage(
  splitLayout( # object 1,
               # object 2
  )
)

verticalLayout()
ui <- fluidPage(
  verticalLayout( # object 1,
                  # object 2,
                  # object 3
  )
)

Layer tabPanels on top of each other, and navigate between them, with:
ui <- fluidPage( tabsetPanel(
  tabPanel("tab 1", "contents"),
  tabPanel("tab 2", "contents"),
  tabPanel("tab 3", "contents")))

ui <- fluidPage( navlistPanel(
  tabPanel("tab 1", "contents"),
  tabPanel("tab 2", "contents"),
  tabPanel("tab 3", "contents")))

ui <- navbarPage(title = "Page",
  tabPanel("tab 1", "contents"),
  tabPanel("tab 2", "contents"),
  tabPanel("tab 3", "contents"))
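As a rough illustration of how the reactive helpers on this page differ, the sketch below places reactive(), eventReactive() and isolate() side by side in one app. It assumes only that the shiny package is installed; the input and output IDs (a, z, go, live, held, frozen) are made up for this example.

```r
library(shiny)

ui <- fluidPage(
  textInput("a", "", "A"),
  textInput("z", "", "Z"),
  actionButton("go", "Go"),
  textOutput("live"),
  textOutput("held"),
  textOutput("frozen")
)

server <- function(input, output) {
  # reactive(): invalidated whenever input$a or input$z changes
  live <- reactive(paste(input$a, input$z))
  # eventReactive(): invalidated only when the Go button value changes
  held <- eventReactive(input$go, paste(input$a, input$z))

  output$live <- renderText(live())
  output$held <- renderText(held())
  # isolate(): reads input$a without depending on it, so this output
  # refreshes only when input$go changes
  output$frozen <- renderText({
    input$go
    isolate(input$a)
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```

Typing in either text box should update only the first output, while the other two should wait for a click on Go.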
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at shiny.rstudio.com • shiny 0.12.0 • Updated: 2016-01
R Markdown : : CHEAT SHEET

What is R Markdown?
.Rmd files · An R Markdown (.Rmd) file is a record of your research. It contains the code that a scientist needs to reproduce your work along with the narration that a reader needs to understand your work.
Reproducible Research · At the click of a button, or the type of a command, you can rerun the code in an R Markdown file to reproduce your work and export the results as a finished report.
Dynamic Documents · You can choose to export the finished report in a variety of formats, including html, pdf, MS Word, or RTF documents; html or pdf based slides, Notebooks, and more.

[Figure: RStudio IDE with an .Rmd file open. Callouts: file path to output document; find in document; synch publish button to accounts at rpubs.com, shinyapps.io, RStudio Connect; set preview location; insert code chunk; go to code chunk; run code chunk(s); reload document; publish; show outline; run all previous chunks; modify chunk options; run current chunk]

.rmd Structure
YAML Header · Optional section of render (e.g. pandoc) options written as key:value pairs (YAML). At start of file, between lines of ---
Text · Narration formatted with markdown, mixed with:
Code Chunks · Chunks of embedded code. Each chunk begins with ```{r} and ends with ```. R Markdown will run the code and append the results to the doc. It will use the location of the .Rmd file as the working directory.

Workflow
1 Open a new .Rmd file at File ▶ New File ▶ R Markdown. Use the wizard that opens to pre-populate the file with a template
2 Write document by editing template
3 Knit document to create report; use knit button or render() to knit
4 Preview Output in IDE window
5 Publish (optional) to web server
6 Examine build log in R Markdown console
7 Use output file that is saved along side .Rmd

Use rmarkdown::render() to render/knit at cmd line. Important args:
input - file to render
output_format
output_options - List of render options (as in YAML)
output_file
output_dir
params - list of params to use
envir - environment to evaluate code chunks in
encoding - of input file

Parameters
Parameterize your documents to reuse with different inputs (e.g., data, values, etc.)
1. Add parameters · Create and set parameters in the header as sub-values of params
---
params:
  n: 100
  d: !r Sys.Date()
---
2. Call parameters · Call parameter values in code as params$<name>
Today’s date is `r params$d`
3. Set parameters · Set values with Knit with parameters or the params argument of render():
render("doc.Rmd", params = list(n = 1, d = as.Date("2015-01-01")))

Interactive Documents
Turn your report into an interactive Shiny document in 4 steps
1. Add runtime: shiny to the YAML header.
2. Call Shiny input functions to embed input objects.
3. Call Shiny render functions to embed reactive output.
4. Render with rmarkdown::run or click Run Document in RStudio IDE

---
output: html_document
runtime: shiny
---

```{r, echo = FALSE}
numericInput("n",
  "How many cars?", 5)

renderTable({
  head(cars, input$n)
})
```

Embed a complete app into your document with shiny::shinyAppDir()
NOTE: Your report will be rendered as a Shiny app, which means you must choose an html output format, like html_document, and serve it with an active R Session.

Embed code with knitr syntax

INLINE CODE
Insert with `r <code>`. Results appear as text without code.
Built with `r getRversion()` renders as: Built with 3.2.3

CODE CHUNKS
One or more lines surrounded with ```{r} and ```. Place chunk options within curly braces, after r. Insert with the insert code chunk button in the IDE.
```{r echo=TRUE}
getRversion()
```

GLOBAL OPTIONS
Set with knitr::opts_chunk$set(), e.g.
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

IMPORTANT CHUNK OPTIONS
cache - cache results for future knits (default = FALSE)
cache.path - directory to save cached results in (default = "cache/")
child - file(s) to knit and then include (default = NULL)
collapse - collapse all output into single block (default = FALSE)
comment - prefix for each line of results (default = '##')
dependson - chunk dependencies for caching (default = NULL)
echo - Display code in output document (default = TRUE)
engine - code language used in chunk (default = 'R')
error - Display error messages in doc (TRUE) or stop render when errors occur (FALSE) (default = FALSE)
eval - Run code in chunk (default = TRUE)
fig.align - 'left', 'right', or 'center' (default = 'default')
fig.cap - figure caption as character string (default = NULL)
fig.height, fig.width - Dimensions of plots in inches
highlight - highlight source code (default = TRUE)
include - Include chunk in doc after running (default = TRUE)
message - display code messages in document (default = TRUE)
results (default = 'markup')
  'asis' - passthrough results
  'hide' - do not display results
  'hold' - put all results below all code
tidy - tidy code for display (default = FALSE)
warning - display code warnings in document (default = TRUE)

Options not listed above: R.options, aniopts, autodep, background, cache.comments, cache.lazy, cache.rebuild, cache.vars, dev, dev.args, dpi, engine.opts, engine.path, fig.asp, fig.env, fig.ext, fig.keep, fig.lp, fig.path, fig.pos, fig.process, fig.retina, fig.scap, fig.show, fig.showtext, fig.subcap, interval, out.extra, out.height, out.width, prompt, purl, ref.label, render, size, split, tidy.opts
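To see what inline code and a chunk option do without rendering a whole document, one option is to knit a short piece of R Markdown text directly with knitr::knit(). This is a minimal sketch, assuming the knitr package is installed; the document text in `src` and the variable names are made up for illustration.

```r
library(knitr)

# Three backticks, built programmatically to keep this example readable
fence <- strrep("`", 3)

# A tiny R Markdown source held in a character vector (illustrative only)
src <- c(
  "Two plus two is `r 2 + 2`.",
  "",
  paste0(fence, "{r, echo = FALSE}"),
  "x <- c(1, 2, 3)",
  "sum(x)",
  fence
)

# knit() runs the code and returns the resulting markdown as one string
md <- knit(text = src, quiet = TRUE)
cat(md)
```

Because the chunk sets echo = FALSE, the knitted markdown should contain the results of the code but not the code itself.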

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at rmarkdown.rstudio.com • rmarkdown 1.6 • Updated: 2016-02
Pandoc’s Markdown
Write with syntax on the left to create effect on right (after render)

Plain text
End a line with two spaces
to start a new paragraph.
*italics* and **bold**
`verbatim code`
sub/superscript^2^~2~
~~strikethrough~~
escaped: \* \_ \\
endash: --, emdash: ---
equation: $A = \pi*r^{2}$
equation block:
$$E = mc^{2}$$
> block quote
# Header1 {#anchor}
## Header 2 {#css_id}
### Header 3 {.css_class}
#### Header 4
##### Header 5
###### Header 6
<!--Text comment-->
\textbf{Tex ignored in HTML}
<em>HTML ignored in pdfs</em>
<http://www.rstudio.com>
[link](www.rstudio.com)
Jump to [Header 1](#anchor)
image:
![Caption](smallorb.png)
* unordered list
    + sub-item 1
    + sub-item 2
        - sub-sub-item 1
* item 2
    Continued (indent 4 spaces)
1. ordered list
2. item 2
    i) sub-item 1
        A. sub-sub-item 1
(@) A list whose numbering
continues after
(@) an interruption
Term 1

: Definition 1

Set render options with YAML
When you render, R Markdown
1. runs the R code, embeds results and text into .md file with knitr
2. then converts the .md file into the finished format with pandoc

Set a document’s default output format in the YAML header:
---
output: html_document
---
# Body

output value             creates
html_document            html
pdf_document             pdf (requires TeX)
word_document            Microsoft Word (.docx)
odt_document             OpenDocument Text
rtf_document             Rich Text Format
md_document              Markdown
github_document          Github compatible markdown
ioslides_presentation    ioslides HTML slides
slidy_presentation       slidy HTML slides
beamer_presentation      Beamer pdf slides (requires TeX)

Customize output with sub-options (listed to the right):
---
output:
  html_document:         # indent 2 spaces
    code_folding: hide   # indent 4 spaces
    toc_float: TRUE
---
# Body

html tabsets
Use the .tabset css class to place sub-headers into tabs
# Tabset {.tabset .tabset-fade .tabset-pills}
## Tab 1
text 1
## Tab 2
text 2
### End tabset

sub-option - description (each X marks one supporting output format among html, pdf, word, odt, rtf, md, github, ioslides, slidy, beamer)
citation_package - The LaTeX package to process citations, natbib, biblatex or none X X X
code_folding - Let readers toggle the display of R code, "none", "hide", or "show" X
colortheme - Beamer color theme to use X
css - CSS file to use to style document X X X
dev - Graphics device to use for figure output (e.g. "png") X X X X X X X
duration - Add a countdown timer (in minutes) to footer of slides X
fig_caption - Should figures be rendered with captions? X X X X X X X
fig_height, fig_width - Default figure height and width (in inches) for document X X X X X X X X X X
highlight - Syntax highlighting: "tango", "pygments", "kate", "zenburn", "textmate" X X X X X
includes - File of content to place in document (in_header, before_body, after_body) X X X X X X X X
incremental - Should bullets appear one at a time (on presenter mouse clicks)? X X X
keep_md - Save a copy of .md file that contains knitr output X X X X X X
keep_tex - Save a copy of .tex file that contains knitr output X X
latex_engine - Engine to render latex, "pdflatex", "xelatex", or "lualatex" X X
lib_dir - Directory of dependency files to use (Bootstrap, MathJax, etc.) X X X
mathjax - Set to local or a URL to use a local/URL version of MathJax to render equations X X X
md_extensions - Markdown extensions to add to default definition of R Markdown X X X X X X X X X X
number_sections - Add section numbering to headers X X
pandoc_args - Additional arguments to pass to Pandoc X X X X X X X X X X
preserve_yaml - Preserve YAML front matter in final document? X
reference_docx - docx file whose styles should be copied when producing docx output X
self_contained - Embed dependencies into the doc X X X
slide_level - The lowest heading level that defines individual slides X
smaller - Use the smaller font size in the presentation? X
smart - Convert straight quotes to curly, dashes to em-dashes, … to ellipses, etc. X X X
template - Pandoc template to use when rendering file X X X X X
theme - Bootswatch or Beamer theme to use for page X X
toc - Add a table of contents at start of document X X X X X X X
toc_depth - The lowest level of headings to add to table of contents X X X X X X
toc_float - Float the table of contents to the left of the main content X
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |

- slide bullet 1
- slide bullet 2
(>- to have bullets appear on click)
horizontal rule/slide break:
***
A footnote [^1]

[^1]: Here is the footnote.

Create a Reusable Template
1. Create a new package with a inst/rmarkdown/templates directory
2. In the directory, place a folder that contains:
   template.yaml (see below)
   skeleton.Rmd (contents of the template)
   any supporting files
3. Install the package
4. Access template in wizard at File ▶ New File ▶ R Markdown

template.yaml
---
name: My Template
---

Table Suggestions
Several functions format R data into tables

data <- faithful[1:4, ]
```{r results = 'asis'}
knitr::kable(data, caption = "Table with kable")
```
```{r results = "asis"}
print(xtable::xtable(data, caption = "Table with xtable"),
  type = "html", html.table.attributes = "border=0")
```
```{r results = "asis"}
stargazer::stargazer(data, type = "html", title = "Table with stargazer")
```
Learn more in the stargazer, xtable, and knitr packages.

Citations and Bibliographies
Create citations with .bib, .bibtex, .copac, .enl, .json, .medline, .mods, .ris, .wos, and .xml files
1. Set bibliography file and CSL 1.0 Style file (optional) in the YAML header
---
bibliography: refs.bib
csl: style.csl
---
2. Use citation keys in text
Smith cited [@smith04].
Smith cited without author [-@smith04].
@smith04 cited in line.
3. Render. Bibliography will be added to end of document
Data Import : : CHEAT SHEET
R’s tidyverse is built around tidy data stored in tibbles, which are enhanced data frames.
The front side of this sheet shows how to read text files into R with readr.
The reverse side shows how to create tibbles with tibble and to layout tidy data with tidyr.

OTHER TYPES OF DATA
Try one of the following packages to import other types of files
• haven - SPSS, Stata, and SAS files
• readxl - excel files (.xls and .xlsx)
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)

Save Data
Save x, an R object, to path, a file path, as:
Comma delimited file
write_csv(x, path, na = "NA", append = FALSE, col_names = !append)
File with arbitrary delimiter
write_delim(x, path, delim = " ", na = "NA", append = FALSE, col_names = !append)
CSV for excel
write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append)
String to file
write_file(x, path, append = FALSE)
String vector to file, one element per line
write_lines(x, path, na = "NA", append = FALSE)
Object to RDS file
write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...)
Tab delimited files
write_tsv(x, path, na = "NA", append = FALSE, col_names = !append)

Read Tabular Data - These functions share the common arguments:
read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = interactive())

Comma Delimited Files
read_csv("file.csv")
To make file.csv run:
write_file(x = "a,b,c\n1,2,3\n4,5,NA", path = "file.csv")
Semi-colon Delimited Files
read_csv2("file2.csv")
write_file(x = "a;b;c\n1;2;3\n4;5;NA", path = "file2.csv")
Files with Any Delimiter
read_delim("file.txt", delim = "|")
write_file(x = "a|b|c\n1|2|3\n4|5|NA", path = "file.txt")
Fixed Width Files
read_fwf("file.fwf", col_positions = c(1, 3, 5))
write_file(x = "a b c\n1 2 3\n4 5 NA", path = "file.fwf")
Tab Delimited Files
read_tsv("file.tsv") Also read_table().
write_file(x = "a\tb\tc\n1\t2\t3\n4\t5\tNA", path = "file.tsv")

USEFUL ARGUMENTS
Example file
write_file("a,b,c\n1,2,3\n4,5,NA","file.csv")
f <- "file.csv"
No header: read_csv(f, col_names = FALSE)
Provide header: read_csv(f, col_names = c("x", "y", "z"))
Skip lines: read_csv(f, skip = 1)
Read in a subset: read_csv(f, n_max = 1)
Missing Values: read_csv(f, na = c("1", "."))

Read Non-Tabular Data
Read a file into a single string
read_file(file, locale = default_locale())
Read each line into its own string
read_lines(file, skip = 0, n_max = -1L, na = character(), locale = default_locale(), progress = interactive())
Read a file into a raw vector
read_file_raw(file)
Read each line into a raw vector
read_lines_raw(file, skip = 0, n_max = -1L, progress = interactive())
Read Apache style log files
read_log(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = -1, progress = interactive())

Data types
readr functions guess the types of each column and convert types when appropriate (but will NOT convert strings to factors automatically).
A message shows the type of each column in the result.
## Parsed with column specification:
## cols(
##   age = col_integer(),      (age is an integer)
##   sex = col_character(),    (sex is a character)
##   earn = col_double()       (earn is a double (numeric))
## )
1. Use problems() to diagnose problems.
x <- read_csv("file.csv"); problems(x)
2. Use a col_ function to guide parsing.
• col_guess() - the default
• col_character()
• col_double(), col_euro_double()
• col_datetime(format = "") Also col_date(format = ""), col_time(format = "")
• col_factor(levels, ordered = FALSE)
• col_integer()
• col_logical()
• col_number(), col_numeric()
• col_skip()
x <- read_csv("file.csv", col_types = cols(
  A = col_double(),
  B = col_logical(),
  C = col_factor()))
3. Else, read in as character vectors then parse with a parse_ function.
• parse_guess()
• parse_character()
• parse_datetime() Also parse_date() and parse_time()
• parse_double()
• parse_factor()
• parse_integer()
• parse_logical()
• parse_number()
x$A <- parse_number(x$A)
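The three data-type steps above (let readr guess, guide parsing with a col_ function, then parse a character column afterwards) can be sketched end to end as follows. This is only an illustration, assuming the readr package is installed; it writes the sheet's example data to a temporary file, and the column names a, b, c follow the lowercase header in that data rather than the A, B, C used in the sheet's col_types example.

```r
library(readr)

# Write the small example file to a temporary location
f <- tempfile(fileext = ".csv")
write_file("a,b,c\n1,2,3\n4,5,NA", f)

# Step 1: let readr guess the column types, then look for parsing problems
guessed <- read_csv(f)
problems(guessed)

# Step 2: override the guesses with an explicit column specification
typed <- read_csv(f, col_types = cols(
  a = col_double(),
  b = col_integer(),
  c = col_character()
))

# Step 3: parse a character column after the fact
typed$c <- parse_number(typed$c)
```

Reading a column as character first and parsing it later can be useful when a few malformed values would otherwise derail type guessing.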
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at tidyverse.org • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
Tibbles - an enhanced data frame
The tibble package provides a new S3 class for storing tabular data, the tibble. Tibbles inherit the data frame class, but improve three behaviors:
• Subsetting - [ always returns a new tibble, [[ and $ always return a vector.
• No partial matching - You must use full column names when subsetting
• Display - When you print a tibble, R provides a concise view of the data that fits on one screen
[Figure: a large table printed as a tibble (first 10 rows, "# ... with 224 more rows, and 3 more variables") next to the same table printed as a data frame (rows until getOption("max.print") is reached)]
• Control the default appearance with options:
options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf)
• View full data set with View() or glimpse()
• Revert to data frame with as.data.frame()

CONSTRUCT A TIBBLE IN TWO WAYS
tibble(…) Construct by columns.
tibble(x = 1:3, y = c("a", "b", "c"))
tribble(…) Construct by rows.
tribble( ~x, ~y,
          1, "a",
          2, "b",
          3, "c")
Both make this tibble:
A tibble: 3 × 2
      x     y
  <int> <chr>
1     1     a
2     2     b
3     3     c
as_tibble(x, …) Convert data frame to tibble.
enframe(x, name = "name", value = "value") Convert named vector to a tibble
is_tibble(x) Test whether x is a tibble.

Tidy Data with tidyr
Tidy data is a way to organize tabular data. It provides a consistent data structure across packages.
A table is tidy if:
• Each variable is in its own column
• Each observation, or case, is in its own row
Tidy data:
• Makes variables easy to access as vectors
• Preserves cases during vectorized operations

Reshape Data - change the layout of values in a table
Use gather() and spread() to reorganize the values of a table into a new layout.

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
gather() moves column names into a key column, gathering the column values into a single value column.
gather(table4a, `1999`, `2000`, key = "year", value = "cases")
table4a                    result
country  1999  2000        country  year  cases
A        0.7K  2K          A        1999  0.7K
B        37K   80K         B        1999  37K
C        212K  213K        C        1999  212K
                           A        2000  2K
                           B        2000  80K
                           C        2000  213K

spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)
spread() moves the unique values of a key column into the column names, spreading the values of a value column across the new columns.
spread(table2, type, count)
table2                          result
country  year  type   count     country  year  cases  pop
A        1999  cases  0.7K      A        1999  0.7K   19M
A        1999  pop    19M       A        2000  2K     20M
A        2000  cases  2K        B        1999  37K    172M
A        2000  pop    20M       B        2000  80K    174M
B        1999  cases  37K       C        1999  212K   1T
B        1999  pop    172M      C        2000  213K   1T
B        2000  cases  80K
B        2000  pop    174M
C        1999  cases  212K
C        1999  pop    1T
C        2000  cases  213K
C        2000  pop    1T

Handle Missing Values
drop_na(data, ...) Drop rows containing NA’s in … columns.
drop_na(x, x2)
fill(data, ..., .direction = c("down", "up")) Fill in NA’s in … columns with most recent non-NA values.
fill(x, x2)
replace_na(data, replace = list(), ...) Replace NA’s by column.
replace_na(x, list(x2 = 2))
x            drop_na      fill         replace_na
x1  x2       x1  x2       x1  x2       x1  x2
A   1        A   1        A   1        A   1
B   NA       D   3        B   1        B   2
C   NA                    C   1        C   2
D   3                     D   3        D   3
E   NA                    E   3        E   2

Expand Tables - quickly create tables with combinations of values
complete(data, ..., fill = list()) Adds to the data missing combinations of the values of the variables listed in …
complete(mtcars, cyl, gear, carb)
expand(data, ...) Create new tibble with all possible combinations of the values of the variables listed in …
expand(mtcars, cyl, gear, carb)

Split Cells
Use these functions to split or combine cells into individual, isolated values.

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)
Separate each cell in a column to make several columns.
separate(table3, rate, into = c("cases", "pop"))
table3                       result
country  year  rate          country  year  cases  pop
A        1999  0.7K/19M      A        1999  0.7K   19M
A        2000  2K/20M        A        2000  2K     20M
B        1999  37K/172M      B        1999  37K    172M
B        2000  80K/174M      B        2000  80K    174M
C        1999  212K/1T       C        1999  212K   1T
C        2000  213K/1T       C        2000  213K   1T

separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
Separate each cell in a column to make several rows. Also separate_rows_().
separate_rows(table3, rate)
result
country  year  rate
A        1999  0.7K
A        1999  19M
A        2000  2K
A        2000  20M
B        1999  37K
B        1999  172M
B        2000  80K
B        2000  174M
C        1999  212K
C        1999  1T
C        2000  213K
C        2000  1T

unite(data, col, ..., sep = "_", remove = TRUE)
Collapse cells across several columns to make a single column.
unite(table5, century, year, col = "year", sep = "")
table5                       result
country  century  year       country  year
Afghan   19       99         Afghan   1999
Afghan   20       0          Afghan   2000
Brazil   19       99         Brazil   1999
Brazil   20       0          Brazil   2000
China    19       99         China    1999
China    20       0          China    2000
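The reshaping calls above can be tried directly, since tidyr ships small example tibbles named table2, table3 and table4a (their values differ from the abbreviated ones pictured on this sheet). The sketch below assumes tidyr is installed; the result names long, wide and split are made up. It passes sep = "/" to separate() explicitly, because the default separator pattern matches any non-alphanumeric character and would also split values that contain a decimal point.

```r
library(tidyr)

# Column names `1999` and `2000` become values of a new "year" column
long <- gather(table4a, `1999`, `2000`, key = "year", value = "cases")

# Values of the "type" column become new column names
wide <- spread(table2, type, count)

# Split "rate" at the slash only
split <- separate(table3, rate, into = c("cases", "pop"), sep = "/")

print(long)
print(wide)
print(split)
```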
Data Transformation with dplyr : : CHEAT SHEET

dplyr functions work with pipes and expect tidy data. In tidy data:
• Each variable is in its own column
• Each observation, or case, is in its own row
pipes: x %>% f(y) becomes f(x, y)

Summarise Cases
These apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).
summarise(.data, …) Compute table of summaries.
summarise(mtcars, avg = mean(mpg))
count(x, ..., wt = NULL, sort = FALSE) Count number of rows in each group defined by the variables in … Also tally().
count(iris, Species)
VARIATIONS
summarise_all() - Apply funs to every column.
summarise_at() - Apply funs to specific columns.
summarise_if() - Apply funs to all cols of one type.

Group Cases
Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))
group_by(.data, ..., add = FALSE) Returns copy of table grouped by …
g_iris <- group_by(iris, Species)
ungroup(x, …) Returns ungrouped copy of table.
ungroup(g_iris)

Manipulate Cases
EXTRACT CASES
Row functions return a subset of rows as a new table.
filter(.data, …) Extract rows that meet logical criteria.
filter(iris, Sepal.Length > 7)
distinct(.data, ..., .keep_all = FALSE) Remove rows with duplicate values.
distinct(iris, Species)
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select fraction of rows.
sample_frac(iris, 0.5, replace = TRUE)
sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame()) Randomly select size rows.
sample_n(iris, 10, replace = TRUE)
slice(.data, …) Select rows by position.
slice(iris, 10:15)
top_n(x, n, wt) Select and order top n entries (by group if grouped data).
top_n(iris, 5, Sepal.Width)

Logical and boolean operators to use with filter()
<   <=   is.na()    %in%   |   xor()
>   >=   !is.na()   !      &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …) Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))

ADD CASES
add_row(.data, ..., .before = NULL, .after = NULL) Add one or more rows to a table.
add_row(faithful, eruptions = 1, waiting = 1)

Manipulate Variables
EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.
pull(.data, var = -1) Extract column values as a vector. Choose by name or index.
pull(iris, Sepal.Length)
select(.data, …) Extract columns as a table. Also select_if().
select(iris, Sepal.Length, Species)
Use these helpers with select(), e.g. select(iris, starts_with("Sepal"))
contains(match)     num_range(prefix, range)   :, e.g. mpg:cyl
ends_with(match)    one_of(…)                  -, e.g. -Species
matches(match)      starts_with(match)

MAKE NEW VARIABLES
These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).
mutate(.data, …) Compute new column(s).
mutate(mtcars, gpm = 1/mpg)
transmute(.data, …) Compute new column(s), drop others.
transmute(mtcars, gpm = 1/mpg)
mutate_all(.tbl, .funs, …) Apply funs to every column. Use with funs(). Also mutate_if().
mutate_all(faithful, funs(log(.), log2(.)))
mutate_if(iris, is.numeric, funs(log(.)))
mutate_at(.tbl, .cols, .funs, …) Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().
mutate_at(iris, vars( -Species), funs(log(.)))
add_column(.data, ..., .before = NULL, .after = NULL) Add new column(s). Also add_count(), add_tally().
add_column(mtcars, new = 1:32)
rename(.data, …) Rename columns.
rename(iris, Length = Sepal.Length)
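The verbs on this page are usually chained with the pipe. As a small sketch of that style, assuming only that dplyr is installed and using the built-in mtcars data, the code below groups and summarises cases, then makes a new variable and filters on it; the result names avg_by_cyl and efficient are made up for this example.

```r
library(dplyr)

# Group cases, summarise each group, then order the summary table
avg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg), n = n()) %>%
  arrange(desc(avg))

# Make a new variable, keep two columns, and extract the matching cases
efficient <- mtcars %>%
  mutate(gpm = 1 / mpg) %>%
  select(mpg, gpm) %>%
  filter(mpg > 30)

print(avg_by_cyl)
print(efficient)
```

Each step receives the table produced by the previous one, which follows from x %>% f(y) becoming f(x, y).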

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2017-03
Vector Functions Summary Functions Combine Tables
TO USE WITH MUTATE () TO USE WITH SUMMARISE () COMBINE VARIABLES COMBINE CASES dplyr
mutate() and transmute() apply vectorized summarise() applies summary functions to x y
functions to columns to create new columns. columns to create a new table. Summary A B C A B D A B C A B D A B C

Vectorized functions take vectors as input and


return vectors of the same length as output.
functions take vectors as input and return single
values as output.
a
b
c
t
u
v
1
2
3
+ a
b
d
t
u
w
3
2
1
= a
b
c
t
u
v
1
2
3
a
b
d
t
u
w
3
2
1 x
a
b
c
t
u
v
1
2
3

A B C
vectorized function summary function Use bind_cols() to paste tables beside each
other as they are. + y
C v 3
d w 4

COUNTS bind_cols(…) Returns tables placed side by


OFFSETS
dplyr::n() - number of values/rows side as a single table.  Use bind_rows() to paste tables below each
dplyr::lag() - Offset elements by 1 BE SURE THAT ROWS ALIGN.
dplyr::n_distinct() - # of uniques other as they are.
dplyr::lead() - Offset elements by -1
sum(!is.na()) - # of non-NA’s
CUMULATIVE AGGREGATES Use a "Mutating Join" to join one table to bind_rows(…, .id = NULL)
LOCATION DF
x
A
a
B
t
C
1
dplyr::cumall() - Cumulative all() columns from another, matching values with Returns tables one on top of the other
mean() - mean, also mean(!is.na()) the rows that they correspond to. Each join
x b u 2
as a single table. Set .id to a column
dplyr::cumany() - Cumulative any() x c v 3
median() - median retains a different combination of values from name to add a column of the original
cummax() - Cumulative max() z
z
c
d
v
w
3
4
dplyr::cummean() - Cumulative mean() the tables. table names (as pictured)
LOGICALS
cummin() - Cumulative min()
cumprod() - Cumulative prod() mean() - Proportion of TRUE’s A B C D left_join(x, y, by = NULL, A B C intersect(x, y, …)
cumsum() - Cumulative sum() sum() - # of TRUE’s a
b
t
u
1
2
3
2
copy=FALSE, suffix=c(“.x”,“.y”),…) c v 3
Rows that appear in both x and y.
c v 3 NA Join matching values from y to x.
RANKINGS POSITION/ORDER A B C setdiff(x, y, …)
dplyr::first() - first value
dplyr::last() - last value
dplyr::nth() - value in nth location of vector

RANK
quantile() - nth quantile
min() - minimum value
max() - maximum value

SPREAD
IQR() - Inter-Quartile Range
mad() - median absolute deviation
sd() - standard deviation
var() - variance

dplyr::cume_dist() - proportion of all values <= the current value
dplyr::dense_rank() - rank with ties = min, no gaps
dplyr::min_rank() - rank with ties = min
dplyr::ntile() - bins into n bins
dplyr::percent_rank() - min_rank scaled to [0,1]
dplyr::row_number() - rank with ties = "first"

MATH
+, -, *, /, ^, %/%, %% - arithmetic ops
log(), log2(), log10() - logs
<, <=, >, >=, !=, == - logical comparisons
dplyr::between() - x >= left & x <= right
dplyr::near() - safe == for floating point numbers

MISC
dplyr::case_when() - multi-case if_else()
dplyr::coalesce() - first non-NA values by element across a set of vectors
dplyr::if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()
dplyr::recode() - Vectorized switch()
dplyr::recode_factor() - Vectorized switch() for factors

Row Names
Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column.
rownames_to_column() Move row names into col. a <- rownames_to_column(iris, var = "C")
column_to_rownames() Move col in row names. column_to_rownames(a, var = "C")

Example tables used in the join diagrams:
x: A B C      y: A B D
   a t 1         a t 3
   b u 2         b u 2
   c v 3         d w 1

right_join(x, y, by = NULL, copy = FALSE, suffix=c(".x",".y"),…) Join matching values from x to y.
inner_join(x, y, by = NULL, copy = FALSE, suffix=c(".x",".y"),…) Join data. Retain only rows with matches.
full_join(x, y, by = NULL, copy=FALSE, suffix=c(".x",".y"),…) Join data. Retain all values, all rows.

Use by = c("col1", "col2", …) to specify one or more common columns to match on. left_join(x, y, by = "A")
Result columns: A B.x C B.y D
Use a named vector, by = c("col1" = "col2"), to match on columns that have different names in each table. left_join(x, y, by = c("C" = "D"))
Result columns: A.x B.x C A.y B.y
Use suffix to specify the suffix to give to unmatched columns that have the same name in both tables. left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))
Result columns: A1 B1 C A2 B2

Rows that appear in x but not y.
union(x, y, …) Rows that appear in x or y. (Duplicates removed). union_all() retains duplicates.
Use setequal() to test whether two data sets contain the exact same rows (in any order).

EXTRACT ROWS
Use a "Filtering Join" to filter one table against the rows of another.
semi_join(x, y, by = NULL, …) Return rows of x that have a match in y. USEFUL TO SEE WHAT WILL BE JOINED.
anti_join(x, y, by = NULL, …) Return rows of x that do not have a match in y. USEFUL TO SEE WHAT WILL NOT BE JOINED.

Also has_rownames(), remove_rownames()
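The ranking and MISC helpers listed on this page can be tried on a small vector. The following is a minimal sketch, assuming dplyr is installed; the vector `scores` and the result names are made up for illustration.

```r
library(dplyr)

# Hypothetical input with one tie and one missing value.
scores <- c(10, 20, 20, NA, 40)

# Ranking helpers: NA stays NA in all three.
r_min   <- min_rank(scores)    # ties share the smallest rank, gaps follow
r_dense <- dense_rank(scores)  # ties share a rank, no gaps
r_row   <- row_number(scores)  # ties broken by position

# Element-wise helpers from the MISC list.
label <- case_when(
  is.na(scores) ~ "missing",
  scores >= 20  ~ "high",
  TRUE          ~ "low"
)
filled   <- coalesce(scores, 0)        # first non-NA value per element
capped   <- pmin(filled, 30)           # element-wise minimum against 30
in_range <- between(filled, 10, 20)    # filled >= 10 & filled <= 20
same     <- near(sqrt(2)^2, 2)         # tolerant comparison for doubles
```

`near()` is the safer choice here because `sqrt(2)^2 == 2` compares doubles exactly.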

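The join rules above (by, named by, suffix, filtering joins) and the row name helpers can be sketched as follows. This is an illustration only: the tibbles `x` and `y` rebuild the small tables from the join diagrams, and the column name `model` is an arbitrary choice.

```r
library(dplyr)
library(tibble)

# The two example tables from the join diagrams.
x <- tibble(A = c("a", "b", "c"), B = c("t", "u", "v"), C = c(1, 2, 3))
y <- tibble(A = c("a", "b", "d"), B = c("t", "u", "w"), D = c(3, 2, 1))

# Match on one common column; the clashing column B gets .x/.y suffixes.
by_a <- left_join(x, y, by = "A")

# Match columns with different names and choose the suffixes.
by_cd <- left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))

# Filtering joins keep only columns of x.
kept   <- semi_join(x, y, by = "A")   # rows of x with a match in y
missed <- anti_join(x, y, by = "A")   # rows of x without a match in y

# Row names: move them into a column, then back again.
cars     <- rownames_to_column(head(mtcars, 3), var = "model")
restored <- column_to_rownames(cars, var = "model")
```

Running `anti_join()` before a mutating join is a cheap way to see which keys would be left unmatched.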
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2017-03
Data Visualization with ggplot2 : : CHEAT SHEET
Basics Geoms Use a geom function to represent data points, use the geom’s aesthetic properties to represent variables. 

Each function returns a layer.
GRAPHICAL PRIMITIVES TWO VARIABLES 

ggplot2 is based on the grammar of graphics, the idea continuous bivariate distribution
that you can build every graph from the same a <- ggplot(economics, aes(date, unemploy)) continuous x , continuous y
components: a data set, a coordinate system, b <- ggplot(seals, aes(x = long, y = lat)) h <- ggplot(diamonds, aes(carat, price))
e <- ggplot(mpg, aes(cty, hwy))
and geoms—visual marks that represent data points. a + geom_blank()
 e + geom_label(aes(label = cty), nudge_x = 1, h + geom_bin2d(binwidth = c(0.25, 500))

(Useful for expanding limits) nudge_y = 1, check_overlap = TRUE) x, y, label, x, y, alpha, color, fill, linetype, size, weight
F M A alpha, angle, color, family, fontface, hjust,
b + geom_curve(aes(yend = lat + 1,
 lineheight, size, vjust
+ = xend=long+1,curvature=z)) - x, xend, y, yend,
alpha, angle, color, curvature, linetype, size e + geom_jitter(height = 2, width = 2) 

h + geom_density2d()

x, y, alpha, colour, group, linetype, size
x, y, alpha, color, fill, shape, size
data geom coordinate plot a + geom_path(lineend="butt", linejoin="round", h + geom_hex()

x=F·y=A system linemitre=1)
 e + geom_point(), x, y, alpha, color, fill, shape, x, y, alpha, colour, fill, size
x, y, alpha, color, group, linetype, size size, stroke

To display values, map variables in the data to visual a + geom_polygon(aes(group = group))
 e + geom_quantile(), x, y, alpha, color, group,
properties of the geom (aesthetics) like size, color, and x x, y, alpha, color, fill, group, linetype, size linetype, size, weight
 continuous function
and y locations. i <- ggplot(economics, aes(date, unemploy))
b + geom_rect(aes(xmin = long, ymin=lat, xmax=
F M A long + 1, ymax = lat + 1)) - xmax, xmin, ymax, e + geom_rug(sides = "bl"), x, y, alpha, color, i + geom_area()

ymin, alpha, color, fill, linetype, size x, y, alpha, color, fill, linetype, size
+ =
linetype, size
a + geom_ribbon(aes(ymin=unemploy - 900, e + geom_smooth(method = lm), x, y, alpha, i + geom_line()

ymax=unemploy + 900)) - x, ymax, ymin, color, fill, group, linetype, size, weight x, y, alpha, color, group, linetype, size
data geom coordinate plot alpha, color, fill, group, linetype, size
x=F·y=A system e + geom_text(aes(label = cty), nudge_x = 1,
color = F i + geom_step(direction = "hv")

size = A nudge_y = 1, check_overlap = TRUE), x, y, label, x, y, alpha, color, group, linetype, size

alpha, angle, color, family, fontface, hjust, 

LINE SEGMENTS lineheight, size, vjust 

common aesthetics: x, y, alpha, color, linetype, size 

b + geom_abline(aes(intercept=0, slope=1)) visualizing error
Complete the template below to build a graph. b + geom_hline(aes(yintercept = lat)) df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
required b + geom_vline(aes(xintercept = long)) discrete x , continuous y j <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))
ggplot (data = <DATA> ) + f <- ggplot(mpg, aes(class, hwy))
b + geom_segment(aes(yend=lat+1, xend=long+1)) j + geom_crossbar(fatten = 2)

<GEOM_FUNCTION> (mapping = aes( <MAPPINGS> ), x, y, ymax, ymin, alpha, color, fill, group, linetype,
b + geom_spoke(aes(angle = 1:1155, radius = 1)) f + geom_col(), x, y, alpha, color, fill, group,
stat = <STAT> , position = <POSITION> ) + Not 
 linetype, size size
<COORDINATE_FUNCTION> + required,
sensible j + geom_errorbar(), x, ymax, ymin, alpha, color,
f + geom_boxplot(), x, y, lower, middle, upper, group, linetype, size, width (also
<FACET_FUNCTION> + defaults
supplied ONE VARIABLE continuous ymax, ymin, alpha, color, fill, group, linetype, geom_errorbarh())
<SCALE_FUNCTION> + c <- ggplot(mpg, aes(hwy)); c2 <- ggplot(mpg) shape, size, weight
j + geom_linerange()

<THEME_FUNCTION> f + geom_dotplot(binaxis = "y", stackdir = x, ymin, ymax, alpha, color, group, linetype, size
c + geom_area(stat = "bin")
 "center"), x, y, alpha, color, fill, group
x, y, alpha, color, fill, linetype, size j + geom_pointrange()

ggplot(data = mpg, aes(x = cty, y = hwy)) Begins a plot f + geom_violin(scale = "area"), x, y, alpha, color, x, y, ymin, ymax, alpha, color, fill, group, linetype,
that you finish by adding layers to. Add one geom c + geom_density(kernel = "gaussian")
 fill, group, linetype, size, weight shape, size
function per layer. 
 x, y, alpha, color, fill, group, linetype, size, weight
aesthetic mappings data geom
c + geom_dotplot() 
 maps
qplot(x = cty, y = hwy, data = mpg, geom = "point") x, y, alpha, color, fill data <- data.frame(murder = USArrests$Murder,

Creates a complete plot with given data, geom, and discrete x , discrete y state = tolower(rownames(USArrests)))

mappings. Supplies many useful defaults. c + geom_freqpoly() x, y, alpha, color, group, g <- ggplot(diamonds, aes(cut, color)) map <- map_data("state")

linetype, size k <- ggplot(data, aes(fill = murder))
last_plot() Returns the last plot g + geom_count(), x, y, alpha, color, fill, shape,
c + geom_histogram(binwidth = 5) x, y, alpha, k + geom_map(aes(map_id = state), map = map)
ggsave("plot.png", width = 5, height = 5) Saves last plot color, fill, linetype, size, weight size, stroke + expand_limits(x = map$long, y = map$lat),
as a 5 in x 5 in file named "plot.png" in working directory. map_id, alpha, color, fill, linetype, size
Matches file type to file extension. c2 + geom_qq(aes(sample = hwy)) x, y, alpha,
color, fill, linetype, size, weight
THREE VARIABLES
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
l <- ggplot(seals, aes(long, lat))
discrete l + geom_contour(aes(z = z))
 l + geom_raster(aes(fill = z), hjust=0.5, vjust=0.5,
d <- ggplot(mpg, aes(fl)) x, y, z, alpha, colour, group, linetype, 
 interpolate=FALSE)

size, weight x, y, alpha, fill
d + geom_bar() 

x, alpha, color, fill, linetype, size, weight l + geom_tile(aes(fill = z)), x, y, alpha, color, fill,
linetype, size, width
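The template described under Basics (data, geom function with mappings, then optional coordinate, facet, scale, and theme functions) can be filled in like this. It is a minimal sketch, assuming ggplot2 is installed; the particular geoms, palette, and output path are illustrative choices, and the file is written as PDF to a temporary directory to show that ggsave() matches the file type to the extension.

```r
library(ggplot2)

# Required parts: data, a geom function, and aesthetic mappings.
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(aes(color = class), alpha = 0.6) +  # layer 1: points
  geom_smooth(method = "lm") +                   # layer 2: linear fit
  # Optional parts: sensible defaults are supplied when these are omitted.
  coord_cartesian() +
  facet_wrap(~ drv) +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()

# Save a 5 x 5 inch file; the ".pdf" extension selects the device.
out <- file.path(tempdir(), "plot.pdf")
ggsave(out, plot = p, width = 5, height = 5)
```

Each `+` adds one component, so a layer can be removed or swapped without touching the rest of the plot.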

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at http://ggplot2.tidyverse.org • ggplot2 2.1.0 • Updated: 2016-11
Stats An alternative way to build a layer Scales Coordinate Systems Faceting
A stat builds new variables to plot (e.g., count, prop). Scales map data values to the visual values of an r <- d + geom_bar() Facets divide a plot into 

fl cty cyl aesthetic. To change a mapping, add a new scale. r + coord_cartesian(xlim = c(0, 5)) 
 subplots based on the 

xlim, ylim
 values of one or more 

+ =
x ..count.. (n <- d + geom_bar(aes(fill = fl))) The default cartesian coordinate system discrete variables.
aesthetic prepackaged scale-specific r + coord_fixed(ratio = 1/2) 

scale_ to adjust scale to use arguments ratio, xlim, ylim
 t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
data stat geom coordinate plot Cartesian coordinates with fixed aspect ratio
x = x ·
 system n + scale_fill_manual( between x and y units
y = ..count.. values = c("skyblue", "royalblue", "blue", "navy"), r + coord_flip() 
 t + facet_grid(. ~ fl)

Visualize a stat by changing the default stat of a geom limits = c("d", "e", "p", "r"), breaks =c("d", "e", "p", "r"), xlim, ylim
 facet into columns based on fl
name = "fuel", labels = c("D", "E", "P", "R")) Flipped Cartesian coordinates
function, geom_bar(stat="count") or by using a stat t + facet_grid(year ~ .)

r + coord_polar(theta = "x", direction=1 ) 
 facet into rows based on year
function, stat_count(geom="bar"), which calls a default range of title to use in labels to use breaks to use in theta, start, direction

geom to make a layer (equivalent to a geom function). values to include legend/axis in legend/axis legend/axis
in mapping Polar coordinates t + facet_grid(year ~ fl)

Use ..name.. syntax to map stat variables to aesthetics. r + coord_trans(ytrans = "sqrt") 
 facet into both rows and columns
xtrans, ytrans, limx, limy

GENERAL PURPOSE SCALES Transformed cartesian coordinates. Set xtrans and t + facet_wrap(~ fl)

wrap facets into a rectangular layout
geom to use stat function geom mappings ytrans to the name of a transformation function.
Use with most aesthetics
i + stat_density2d(aes(fill = ..level..), Set scales to let axis limits vary across facets
scale_*_continuous() - map cont’ values to visual ones π + coord_quickmap()
geom = "polygon") 60

variable created by stat scale_*_discrete() - map discrete values to visual ones π + coord_map(projection = "ortho", t + facet_grid(drv ~ fl, scales = "free")


lat
scale_*_identity() - use data values as visual ones orientation=c(41, -74, 0)) projection, orientation, x and y axis limits adjust to individual facets

xlim, ylim "free_x" - x axis limits adjust

c + stat_bin(binwidth = 1, origin = 10)
 scale_*_manual(values = c()) - map discrete values to long

Map projections from the mapproj package


manually chosen visual ones "free_y" - y axis limits adjust
x, y | ..count.., ..ncount.., ..density.., ..ndensity.. (mercator (default), azequalarea, lagrange, etc.)
c + stat_count(width = 1) x, y, | ..count.., ..prop..
scale_*_date(date_labels = "%m/%d", date_breaks = "2 weeks") - treat data values as dates.
Set labeller to adjust facet labels
c + stat_density(adjust = 1, kernel = "gaussian") 
 scale_*_datetime() - treat data x values as date times. t + facet_grid(. ~ fl, labeller = label_both)
x, y, | ..count.., ..density.., ..scaled..
e + stat_bin_2d(bins = 30, drop = T)

Use same arguments as scale_x_date(). See ?strptime for
label formats. Position Adjustments fl: c fl: d fl: e fl: p fl: r

t + facet_grid(fl ~ ., labeller = label_bquote(alpha ^ .(fl)))


x, y, fill | ..count.., ..density.. Position adjustments determine how to arrange geoms ↵c ↵d ↵e ↵p ↵r
e + stat_bin_hex(bins=30) x, y, fill | ..count.., ..density..
X & Y LOCATION SCALES that would otherwise occupy the same space. t + facet_grid(. ~ fl, labeller = label_parsed)
e + stat_density_2d(contour = TRUE, n = 100)
 Use with x or y aesthetics (x shown here) c d e p r
x, y, color, size | ..level.. scale_x_log10() - Plot x on log10 scale s <- ggplot(mpg, aes(fl, fill = drv))
s + geom_bar(position = "dodge")

Labels
e + stat_ellipse(level = 0.95, segments = 51, type = "t") scale_x_reverse() - Reverse direction of x axis
scale_x_sqrt() - Plot x on square root scale Arrange elements side by side
l + stat_contour(aes(z = z)) x, y, z, order | ..level.. s + geom_bar(position = "fill")

Stack elements on top of one another, 
 t + labs( x = "New x axis label", y = "New y axis label",

l + stat_summary_hex(aes(z = z), bins = 30, fun = max)
 COLOR AND FILL SCALES (DISCRETE) normalize height
x, y, z, fill | ..value.. title ="Add a title above the plot", 

n <- d + geom_bar(aes(fill = fl)) e + geom_point(position = "jitter")
 Use scale functions
subtitle = "Add a subtitle below title",
 to update legend
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean)
 Add random noise to X and Y position of each caption = "Add a caption below plot", labels
n + scale_fill_brewer(palette = "Blues") 
 element to avoid overplotting
x, y, z, fill | ..value.. For palette choices: <aes> = "New <aes>
<AES> <AES> legend title")
A
RColorBrewer::display.brewer.all() e + geom_label(position = "nudge")

f + stat_boxplot(coef = 1.5) x, y | ..lower.., 
 B Nudge labels away from points
 t + annotate(geom = "text", x = 8, y = 9, label = "A")
..middle.., ..upper.., ..width.. , ..ymin.., ..ymax.. n + scale_fill_grey(start = 0.2, end = 0.8, 

na.value = "red") geom to place manual values for geom’s aesthetics
f + stat_ydensity(kernel = "gaussian", scale = "area") x, y | s + geom_bar(position = "stack")

..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width.. Stack elements on top of one another
COLOR AND FILL SCALES (CONTINUOUS)
e + stat_ecdf(n = 40) x, y | ..x.., ..y..
e + stat_quantile(quantiles = c(0.1, 0.9), formula = y ~
o <- c + geom_dotplot(aes(fill = ..x..)) Each position adjustment can be recast as a function with
manual width and height arguments
Legends
log(x), method = "rq") x, y | ..quantile.. o + scale_fill_distiller(palette = "Blues") s + geom_bar(position = position_dodge(width = 1)) n + theme(legend.position = "bottom")

e + stat_smooth(method = "lm", formula = y ~ x, se=T, 
 Place legend at "bottom", "top", "left", or "right"
level=0.95) x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax.. o + scale_fill_gradient(low="red", high="yellow") n + guides(fill = "none")


Themes
Set legend type for each aesthetic: colorbar, legend, or
ggplot() + stat_function(aes(x = -3:3), n = 99, fun = o + scale_fill_gradient2(low="red", high="blue", none (no legend)
dnorm, args = list(sd=0.5)) x | ..x.., ..y.. mid = "white", midpoint = 25) n + scale_fill_discrete(name = "Title", 

e + stat_identity(na.rm = TRUE) 
 labels = c("A", "B", "C", "D", "E"))

o + scale_fill_gradientn(colours=topo.colors(6)) r + theme_bw()
 r + theme_classic() Set legend title and labels with a scale function.
ggplot() + stat_qq(aes(sample=1:100), dist = qt, White background

Also: rainbow(), heat.colors(), terrain.colors(), with grid lines r + theme_light()
dparam=list(df=5)) sample, x, y | ..sample.., ..theoretical..
Zooming
cm.colors(), RColorBrewer::brewer.pal() r + theme_gray()
 r + theme_linedraw()
e + stat_sum() x, y, size | ..n.., ..prop.. Grey background 
 r + theme_minimal()

e + stat_summary(fun.data = "mean_cl_boot") SHAPE AND SIZE SCALES (default theme) Minimal themes
h + stat_summary_bin(fun.y = "mean", geom = "bar") r + theme_dark()
 r + theme_void()
 Without clipping (preferred)
p <- e + geom_point(aes(shape = fl, size = cyl)) dark for contrast
p + scale_shape() + scale_size() Empty theme t + coord_cartesian(

e + stat_unique() xlim = c(0, 100), ylim = c(10, 20))
p + scale_shape_manual(values = c(3:7))
With clipping (removes unseen data points)
t + xlim(0, 100) + ylim(10, 20)
p + scale_radius(range = c(1,6))
p + scale_size_area(max_size = 6) t + scale_x_continuous(limits = c(0, 100)) +
scale_y_continuous(limits = c(0, 100))
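The ideas on this page (a stat computes new variables, a scale changes how values map to an aesthetic, facets split the plot, and zooming can be done with or without clipping) can be sketched together. This assumes ggplot2 3.3.0 or later, where after_stat(prop) is the newer spelling of the ..prop.. syntax shown above; the object names and the fifth color are illustrative.

```r
library(ggplot2)

# A stat computes new variables; after_stat() maps one of them to an aesthetic.
props <- ggplot(mpg, aes(fl)) +
  geom_bar(aes(y = after_stat(prop), group = 1))

# A manual scale replaces the default fill mapping and legend title.
n <- ggplot(mpg, aes(fl)) + geom_bar(aes(fill = fl))
recolored <- n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy", "grey50"),
  name = "fuel"
)

pts <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# Facets: rows by year, columns by fl, axis limits free per facet.
faceted <- pts + facet_grid(year ~ fl, scales = "free")

# Zoom without clipping: every point is still available to stats.
zoomed <- pts + coord_cartesian(xlim = c(10, 20), ylim = c(20, 30))

# Zoom with clipping: points outside the limits are dropped before drawing.
clipped <- pts + xlim(10, 20) + ylim(20, 30)
```

The difference between `zoomed` and `clipped` matters most when a layer such as geom_smooth() is added, because the clipped version fits only the points that remain.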

Apply functions with purrr : : CHEAT SHEET
Apply Functions
Map functions apply a function iteratively to each element of a list or vector.
map(.x, .f, …) Apply a function to each element of a list or vector. map(x, is.logical)
map2(.x, .y, .f, …) Apply a function to pairs of elements from two lists, vectors. map2(x, y, sum)
pmap(.l, .f, …) Apply a function to groups of elements from list of lists, vectors. pmap(list(x, y, z), sum, na.rm = TRUE)
invoke_map(.f, .x = list(NULL), …, .env=NULL) Run each function in a list. Also invoke. l <- list(var, sd); invoke_map(l, x = 1:9)
lmap(.x, .f, ...) Apply function to each list-element of a list or vector.
imap(.x, .f, ...) Apply .f to each element of a list or vector and its index.

OUTPUT
map(), map2(), pmap(), imap and invoke_map each return a list. Use a suffixed version to return the results as a specific type of flat vector, e.g. map2_chr, pmap_lgl, etc.
function   returns
map        list
map_chr    character vector
map_dbl    double (numeric) vector
map_dfc    data frame (column bind)
map_dfr    data frame (row bind)
map_int    integer vector
map_lgl    logical vector
walk       triggers side effects, returns the input invisibly
Use walk, walk2, and pwalk to trigger side effects. Each return its input invisibly.

Work with Lists

FILTER LISTS
pluck(.x, ..., .default=NULL) Select an element by name or index, pluck(x,"b"), or its attribute with attr_getter. pluck(x,"b",attr_getter("n"))
keep(.x, .p, …) Select elements that pass a logical test. keep(x, is.na)
discard(.x, .p, …) Select elements that do not pass a logical test. discard(x, is.na)
compact(.x, .p = identity) Drop empty elements. compact(x)
head_while(.x, .p, …) Return head elements until one does not pass. Also tail_while. head_while(x, is.character)

SUMMARISE LISTS
every(.x, .p, …) Do all elements pass a test? every(x, is.character)
some(.x, .p, …) Do some elements pass a test? some(x, is.character)
has_element(.x, .y) Does a list contain an element? has_element(x, "foo")
detect(.x, .f, ..., .right=FALSE, .p) Find first element to pass. detect(x, is.character)
detect_index(.x, .f, ..., .right = FALSE, .p) Find index of first element to pass. detect_index(x, is.character)
vec_depth(x) Return depth (number of levels of indexes). vec_depth(x)

TRANSFORM LISTS
modify(.x, .f, ...) Apply function to each element. Also map, map_chr, map_dbl, map_dfc, map_dfr, map_int, map_lgl. modify(x, ~.+ 2)
modify_at(.x, .at, .f, ...) Apply function to elements by name or index. Also map_at. modify_at(x, "b", ~.+ 2)
modify_if(.x, .p, .f, ...) Apply function to elements that pass a test. Also map_if. modify_if(x, is.numeric,~.+2)
modify_depth(.x,.depth,.f,...) Apply function to each element at a given level of a list. modify_depth(x, 1, ~.+ 2)

RESHAPE LISTS
flatten(.x) Remove a level of indexes from a list. Also flatten_chr, flatten_dbl, flatten_dfc, flatten_dfr, flatten_int, flatten_lgl. flatten(x)
transpose(.l, .names = NULL) Transposes the index order in a multi-level list. transpose(x)

JOIN (TO) LISTS
append(x, values, after = length(x)) Add to end of list. append(x, list(d = 1))
prepend(x, values, before = 1) Add to start of list. prepend(x, list(d = 1))
splice(…) Combine objects into a list, storing S3 objects as sub-lists. splice(x, y, "foo")

WORK WITH LISTS
array_tree(array, margin = NULL) Turn array into list. Also array_branch. array_tree(x, margin = 3)
cross2(.x, .y, .filter = NULL) All combinations of .x and .y. Also cross, cross3, cross_df. cross2(1:3, 4:6)
set_names(x, nm = x) Set the names of a vector/list directly or with a function. set_names(x, c("p", "q", "r")) set_names(x, tolower)
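The map family and its typed variants can be compared on a small list. The following is a minimal sketch, assuming purrr is installed; the list `x` and the result names are invented for illustration.

```r
library(purrr)

x <- list(a = 1:3, b = c(2.5, 4), c = 10)

lens  <- map(x, length)               # always a list
means <- map_dbl(x, mean)             # named double vector
big   <- map_lgl(x, ~ sum(.x) > 6)    # formula shortcut: .x is the element

# Two inputs in parallel, then three inputs via a list.
sums <- map2_dbl(1:3, 4:6, ~ .x + .y)
trip <- pmap_dbl(list(1:2, 3:4, 5:6), function(a, b, c) c + a - b)

# String shortcut: extract the element named "a" from each sub-list.
firsts <- map(list(list(a = 1, b = "u"), list(a = 2, b = "v")), "a")

# walk() is called for its side effect and returns its input invisibly.
same <- walk(x, function(el) cat(length(el), "\n"))
```

Choosing a typed variant such as map_dbl() makes a type mismatch fail early instead of silently producing a list.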
SHORTCUTS - within a purrr function:
"name" becomes function(x) x[["name"]], e.g. map(l, "a") extracts a from each element of l
~ .x becomes function(x) x, e.g. map(l, ~ 2 +.x) becomes map(l, function(x) 2 + x )
~ .x .y becomes function(.x, .y) .x .y, e.g. map2(l, p, ~ .x +.y ) becomes map2(l, p, function(l, p) l + p )
~ ..1 ..2 etc becomes function(..1, ..2, etc) ..1 ..2 etc, e.g. pmap(list(a, b, c), ~ ..3 + ..1 - ..2) becomes pmap(list(a, b, c), function(a, b, c) c + a - b)

Reduce Lists
reduce(.x, .f, ..., .init) Apply function recursively to each element of a list or vector. Also reduce_right, reduce2, reduce2_right. reduce(x, sum)
accumulate(.x, .f, ..., .init) Reduce, but also return intermediate results. Also accumulate_right. accumulate(x, sum)

Modify function behavior
compose() Compose multiple functions.
lift() Change the type of input a function takes. Also lift_dl, lift_dv, lift_ld, lift_lv, lift_vd, lift_vl.
rerun() Rerun expression n times.
negate() Negate a predicate function (a pipe friendly !)
partial() Create a version of a function that has some args preset to values.
safely() Modify func to return list of results and errors.
quietly() Modify function to return list of results, output, messages, warnings.
possibly() Modify function to return default value whenever an error occurs (instead of error).
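Filtering, reducing, and wrapping functions combine naturally. The following is a minimal sketch, assuming purrr is installed; the list `x` and the wrapper name `safe_log` are hypothetical.

```r
library(purrr)

x <- list(a = 1, b = "u", c = NULL, d = 4)

present <- compact(x)                    # drops the empty element c
nums    <- keep(present, is.numeric)     # elements that pass the test
chars   <- discard(present, is.numeric)  # elements that do not pass

all_num  <- every(nums, is.numeric)      # do all elements pass?
any_chr  <- some(x, is.character)        # does at least one pass?
first_ch <- detect(x, is.character)      # first element to pass
d_value  <- pluck(x, "d")                # select by name

# reduce() folds left to right: ((1 + 2) + 3) + 4
total   <- reduce(1:4, `+`)
# accumulate() keeps every intermediate result of the same fold.
running <- accumulate(1:4, `+`)

# possibly() returns a default instead of raising an error.
safe_log <- possibly(log, otherwise = NA_real_)
logs <- map_dbl(list(1, "a", 100), safe_log)
```

Wrapping with possibly() keeps a long map() call running when a single element is malformed; safely() is the alternative when the error message itself is needed.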

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at purrr.tidyverse.org • purrr 0.2.3 • Updated: 2017-09
Nested Data
A nested data frame stores individual tables within the cells of a larger, organizing table.

nested data frame n_iris
Species     data
setosa      <tibble [50 x 4]>
versicolor  <tibble [50 x 4]>
virginica   <tibble [50 x 4]>

"cell" contents

n_iris$data[[1]]
Sepal.L Sepal.W Petal.L Petal.W
5.1     3.5     1.4     0.2
4.9     3.0     1.4     0.2
4.7     3.2     1.3     0.2
4.6     3.1     1.5     0.2
5.0     3.6     1.4     0.2

n_iris$data[[2]]
Sepal.L Sepal.W Petal.L Petal.W
7.0     3.2     4.7     1.4
6.4     3.2     4.5     1.5
6.9     3.1     4.9     1.5
5.5     2.3     4.0     1.3
6.5     2.8     4.6     1.5

List Column Workflow
Nested data frames use a list column, a list that is stored as a column vector of a data frame. A typical workflow for list columns:
1 Make a list column
2 Work with list columns
3 Simplify the list column

[Workflow diagram: iris (Species, S.L, S.W, P.L, P.W) is nested into Species + data, a model column of lm(S.L ~ ., df) fits is added, and the list column is simplified to one value per species.]

Species  data             model
setosa   <tibble [50x4]>  <S3: lm>
versi    <tibble [50x4]>  <S3: lm>
virgini  <tibble [50x4]>  <S3: lm>

Coefs per model (lm(S.L ~ ., df)):
(Int) S.W  P.L  P.W
2.3   0.6  0.2  0.2
1.8   0.3  0.9  -0.6
0.6   0.3  0.9  -0.1

Species  beta
setos    2.35
versi    1.89
virgini  0.69
n_iris$data[[3]]
Sepal.L Sepal.W Petal.L Petal.W
6.3     3.3     6.0     2.5
5.8     2.7     5.1     1.9
7.1     3.0     5.9     2.1
6.3     2.9     5.6     1.8
6.5     3.0     5.8     2.2

n_iris <- iris %>%
  group_by(Species) %>%
  nest()

mod_fun <- function(df)
  lm(Sepal.Length ~ ., data = df)
m_iris <- n_iris %>%
  mutate(model = map(data, mod_fun))

b_fun <- function(mod)
  coefficients(mod)[[1]]
m_iris %>% transmute(Species,
  beta = map_dbl(model, b_fun))

Use a nested data frame to:
• preserve relationships between observations and subsets of data
• manipulate many sub-tables at once with the purrr functions map(), map2(), or pmap().

1. MAKE A LIST COLUMN - You can create list columns with functions in the tibble and dplyr packages, as well as tidyr’s nest()
tibble::tribble(…) tibble::tibble(…) dplyr::mutate(.data, …) Also transmute()
Makes list column when needed Saves list input as list columns Returns list col when result returns list.
Use a two step process to create a nested data frame: tribble( ~max, ~seq, max seq tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5)) mtcars %>% mutate(seq = map(cyl, seq))
1. Group the data frame into groups with dplyr::group_by() 3, 1:3, 3 <int [3]>
4 <int [4]>
2. Use nest() to create a nested data frame 4, 1:4, 5 <int [5]>
with one row per group S.L S.W P.L P.W 5, 1:5) tibble::enframe(x, name="name", value="value") dplyr::summarise(.data, …)
5.1 3.5 1.4 0.2 Converts multi-level list to tibble with list cols Returns list col when result is wrapped with list()
4.9 3.0 1.4 0.2
Species S.L S.W P.L P.W
setosa 5.1 3.5 1.4 0.2
Species
setosa
S.L S.W P.L P.W
5.1 3.5 1.4 0.2 4.7 3.2 1.3 0.2
enframe(list('3'=1:3, '4'=1:4, '5'=1:5), 'max', 'seq') mtcars %>% group_by(cyl) %>%
setosa 4.9 3.0 1.4 0.2 setosa 4.9 3.0 1.4 0.2 4.6 3.1 1.5 0.2 summarise(q = list(quantile(mpg)))
setosa 4.7 3.2 1.3 0.2 setosa 4.7 3.2 1.3 0.2 5.0 3.6 1.4 0.2
setosa 4.6 3.1 1.5 0.2 setosa 4.6 3.1 1.5 0.2
setosa 5.0 3.6 1.4 0.2 setosa 5.0 3.6 1.4 0.2 S.L S.W P.L P.W 2. WORK WITH LIST COLUMNS - Use the purrr functions map(), map2(), and pmap() to apply a function that returns a result element-wise
versi 7.0 3.2 4.7 1.4 versi 7.0 3.2 4.7 1.4 Species data 7.0 3.2 4.7 1.4
versi 6.4 3.2 4.5 1.5 versi 6.4 3.2 4.5 1.5 setos <tibble [50x4]> 6.4 3.2 4.5 1.5 to the cells of a list column. walk(), walk2(), and pwalk() work the same way, but return a side effect.
versi 6.9 3.1 4.9 1.5 versi 6.9 3.1 4.9 1.5 versi <tibble [50x4]> 6.9 3.1 4.9 1.5
versi 5.5 2.3 4.0 1.3 versi 5.5 2.3 4.0 1.3 virgini <tibble [50x4]> 5.5 2.3 4.0 1.3 purrr::map(.x, .f, ...) data data result
versi 6.5 2.8 4.6 1.5 versi 6.5 2.8 4.6 1.5 6.5 2.8 4.6 1.5 fun( <tibble [50x4]> , …)
virgini virgini 6.3 3.3 6.0 2.5
Apply .f element-wise to .x as .f(.x) map( <tibble [50x4]>
, fun, …) fun( , …)
result 1
6.3 3.3 6.0 2.5 <tibble [50x4]> <tibble [50x4]> result 2
virgini 5.8 2.7 5.1 1.9 virgini 5.8 2.7 5.1 1.9 S.L S.W P.L P.W n_iris %>% mutate(n = map(data, dim)) <tibble [50x4]> fun( <tibble [50x4]> , …) result 3
virgini 7.1 3.0 5.9 2.1 virgini 7.1 3.0 5.9 2.1 6.3 3.3 6.0 2.5
virgini 6.3 2.9 5.6 1.8 virgini 6.3 2.9 5.6 1.8 5.8 2.7 5.1 1.9 purrr::map2(.x, .y, .f, ...) data model data model result
virgini 6.5 3.0 5.8 2.2 virgini 6.5 3.0 5.8 2.2 7.1 3.0 5.9 2.1
Apply .f element-wise to .x and .y as .f(.x, .y) <tibble [50x4]> <S3: lm> fun( <tibble [50x4]> , <S3: lm> ,…) result 1
6.3 2.9 5.6 1.8 map2( <tibble [50x4]>
, <S3: lm>
, fun, …) fun( <tibble [50x4]> , <S3: lm> ,…) result 2
n_iris <- iris %>% group_by(Species) %>% nest() 6.5 3.0 5.8 2.2 m_iris %>% mutate(n = map2(data, model, list)) <tibble [50x4]> <S3: lm> fun( <tibble [50x4]> , <S3: lm> ,…) result 3

tidyr::nest(data, ..., .key = data) purrr::pmap(.l, .f, ...)


data model funs data model funs result
For grouped data, moves groups into cells as data frames. Apply .f element-wise to vectors saved in .l fun( <tibble [50x4]> , <S3: lm> , ,…)
m_iris %>% pmap(list( <tibble [50x4]>
<tibble [50x4]> ,
<S3: lm>
<S3: lm> , coef
AIC
), fun, …) fun( <tibble [50x4]> , <S3: lm> ,
coef
AIC ,…)
result 1
result 2
mutate(n = pmap(list(data, model, data), list)) <tibble [50x4]> <S3: lm> BIC fun( <tibble [50x4]> , <S3: lm> , BIC ,…) result 3

Unnest a nested data frame Species data


setos <tibble [50x4]>
Species
setosa
S.L S.W P.L P.W
5.1 3.5 1.4 0.2
with unnest(): versi <tibble [50x4]> setosa 4.9 3.0 1.4 0.2 3. SIMPLIFY THE LIST COLUMN (into a regular column)
virgini <tibble [50x4]> setosa 4.7 3.2 1.3 0.2
n_iris %>% unnest() setosa 4.6 3.1 1.5 0.2
Use the purrr functions map_lgl(), purrr::map_lgl(.x, .f, ...) purrr::map_dbl(.x, .f, ...)
versi 7.0 3.2 4.7 1.4
tidyr::unnest(data, ..., .drop = NA, .id=NULL, .sep=NULL) versi 6.4 3.2 4.5 1.5 map_int(), map_dbl(), map_chr(), Apply .f element-wise to .x, return a logical vector Apply .f element-wise to .x, return a double vector
versi 6.9 3.1 4.9 1.5
as well as tidyr’s unnest() to reduce n_iris %>% transmute(n = map_lgl(data, is.matrix)) n_iris %>% transmute(n = map_dbl(data, nrow))
Unnests a nested data frame. versi 5.5 2.3 4.0 1.3
virgini 6.3 3.3 6.0 2.5 a list column into a regular column. purrr::map_chr(.x, .f, ...)
virgini 5.8 2.7 5.1 1.9 purrr::map_int(.x, .f, ...)
virgini 7.1 3.0 5.9 2.1 Apply .f element-wise to .x, return an integer vector Apply .f element-wise to .x, return a character vector
virgini 6.3 2.9 5.6 1.8
n_iris %>% transmute(n = map_int(data, nrow)) n_iris %>% transmute(n = map_chr(data, nrow))
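The three workflow steps (make a list column, work with it, simplify it) can be run end to end on iris. This is a minimal sketch following the code fragments on this page, assuming dplyr, tidyr, and purrr are installed. The ungroup() call and the positional unnest(n_iris, data) form are additions meant to behave the same across old and new tidyr releases, and the names `betas`, `sizes`, and `flat` are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# 1. Make a list column: one row per species, data frames in the cells.
n_iris <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup()

# 2. Work with the list column: fit one model per nested data frame.
mod_fun <- function(df) lm(Sepal.Length ~ ., data = df)
m_iris <- n_iris %>% mutate(model = map(data, mod_fun))

# 3. Simplify: pull one number out of each model (the intercept).
b_fun <- function(mod) coefficients(mod)[[1]]
betas <- m_iris %>% transmute(Species, beta = map_dbl(model, b_fun))

# Typed map variants also summarise the nested tables directly.
sizes <- n_iris %>% transmute(Species, n = map_int(data, nrow))

# unnest() reverses step 1.
flat <- unnest(n_iris, data)
```

Keeping the models in a column next to their data means every later step stays a single mutate() or transmute() call per species.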
String manipulation with stringr : : CHEAT SHEET
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches Subset Strings Manage Lengths


TRUE str_detect(string, pattern) Detect the str_sub(string, start = 1L, end = -1L) Extract 4 str_length(string) The width of strings (i.e.
TRUE
FALSE
presence of a pattern match in a string. substrings from a character vector. 6 number of code points, which generally equals
2
TRUE str_detect(fruit, "a") str_sub(fruit, 1, 3); str_sub(fruit, -2) 3
the number of characters). str_length(fruit)
1 str_which(string, pattern) Find the indexes of str_subset(string, pattern) Return only the str_pad(string, width, side = c("left", "right",
2
4
strings that contain a pattern match. strings that contain a pattern match. "both"), pad = " ") Pad strings to constant
str_which(fruit, "a") str_subset(fruit, "b") width. str_pad(fruit, 17)
0 str_count(string, pattern) Count the number str_extract(string, pattern) Return the first str_trunc(string, width, side = c("right", "left",
3
1
of matches in a string. NA pattern match found in each string, as a vector. "center"), ellipsis = "...") Truncate the width of
2 str_count(fruit, "a") Also str_extract_all to return every pattern strings, replacing content with ellipsis.
match. str_extract(fruit, "[aeiou]") str_trunc(fruit, 3)
str_locate(string, pattern) Locate the
start end

2 4
4 7 positions of pattern matches in a string. Also str_match(string, pattern) Return the first str_trim(string, side = c("both", "left", "right"))
NA NA str_locate_all. str_locate(fruit, "a") NA NA
pattern match found in each string, as a Trim whitespace from the start and/or end of a
3 4
matrix with a column for each ( ) group in string. str_trim(fruit)
pattern. Also str_match_all.
str_match(sentences, "(a|the) ([^ ]+)")

Mutate Strings

str_sub() <- value. Replace substrings by identifying the substrings with str_sub() and assigning into the results. str_sub(fruit, 1, 3) <- "str"
str_replace(string, pattern, replacement) Replace the first matched pattern in each string. str_replace(fruit, "a", "-")
str_replace_all(string, pattern, replacement) Replace all matched patterns in each string. str_replace_all(fruit, "a", "-")
str_to_lower(string, locale = "en") [1] Convert strings to lower case. str_to_lower(sentences)
str_to_upper(string, locale = "en") [1] Convert strings to upper case. str_to_upper(sentences)
str_to_title(string, locale = "en") [1] Convert strings to title case. str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)
str_c(..., sep = "", collapse = NULL) Collapse a vector of strings into a single string. str_c(letters, collapse = "")
str_dup(string, times) Repeat strings times times. str_dup(fruit, times = 2)
str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings. str_split_fixed(fruit, " ", n=2)
str_glue(…, .sep = "", .envir = parent.frame()) Create a string from strings and {expressions} to evaluate. str_glue("Pi is {pi}")
str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA") Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate. str_glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1] Return the vector of indexes that sorts a character vector. x[str_order(x)]
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1] Sort a character vector. str_sort(x)

Helpers

str_conv(string, encoding) Override the encoding of a string. str_conv(fruit,"ISO-8859-1")
str_view(string, pattern, match = NA) View HTML rendering of first regex match in each string. str_view(fruit, "[aeiou]")
str_view_all(string, pattern, match = NA) View HTML rendering of all regex matches. str_view_all(fruit, "[aeiou]")
str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)
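The mutate, join, split, and order functions above can be tried together on a small vector. The following is a minimal sketch; the vector x is made up for illustration and is not one of the cheat sheet's data sets.

```r
library(stringr)

# Illustrative input (an assumption, not from the cheat sheet)
x <- c("banana", "apple", "cherry pie")

# Replace the first "a" versus every "a" in each string
first_only <- str_replace(x, "a", "-")
every_a    <- str_replace_all(x, "a", "-")

# Collapse the whole vector into a single string
joined <- str_c(x, collapse = ", ")

# Split each string into exactly two pieces at the first space
pieces <- str_split_fixed(x, " ", n = 2)

# Sort directly, or sort through the index vector from str_order()
sorted     <- str_sort(x)
sorted_idx <- x[str_order(x)]

print(first_only)
print(every_a)
print(joined)
print(pieces)
print(sorted)
```

str_sort(x) and x[str_order(x)] give the same result; the index form is useful when a second vector has to be reordered alongside the first.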

1 See bit.ly/ISO639-1 for a complete list of locales.


RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor  • stringr 1.2.0 • Updated: 2017-10
Regular Expressions - Regular expressions, or regexps, are a concise language for describing patterns in strings.

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character   Represents
\\                  \
\"                  "
\n                  new line
Run ?"'" to see a complete list

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

writeLines("\\.")
# \.

writeLines("\\ is a backslash")
# \ is a backslash

MATCH CHARACTERS   see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string        regexp           matches                                       example
(type this)   (to mean this)   (which matches this)
a (etc.)      a (etc.)         a (etc.)                                      see("a")
\\.           \.               .                                             see("\\.")
\\!           \!               !                                             see("\\!")
\\?           \?               ?                                             see("\\?")
\\\\          \\               \                                             see("\\\\")
\\(           \(               (                                             see("\\(")
\\)           \)               )                                             see("\\)")
\\{           \{               {                                             see("\\{")
\\}           \}               }                                             see("\\}")
\\n           \n               new line (return)                             see("\\n")
\\t           \t               tab                                           see("\\t")
\\s           \s               any whitespace (\S for non-whitespaces)       see("\\s")
\\d           \d               any digit (\D for non-digits)                 see("\\d")
\\w           \w               any word character (\W for non-word chars)    see("\\w")
\\b           \b               word boundaries                               see("\\b")
              [:digit:] [1]    digits                                        see("[:digit:]")
              [:alpha:] [1]    letters                                       see("[:alpha:]")
              [:lower:] [1]    lowercase letters                             see("[:lower:]")
              [:upper:] [1]    uppercase letters                             see("[:upper:]")
              [:alnum:] [1]    letters and numbers                           see("[:alnum:]")
              [:punct:] [1]    punctuation                                   see("[:punct:]")
              [:graph:] [1]    letters, numbers, and punctuation             see("[:graph:]")
              [:space:] [1]    space characters (i.e. \s)                    see("[:space:]")
              [:blank:] [1]    space and tab (but not new line)              see("[:blank:]")
              .                every character except a new line             see(".")

[1] Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

Character classes (from the diagram):
[:space:]   new line, plus everything in [:blank:]
[:blank:]   space, tab
[:graph:]   everything in [:punct:] and [:alnum:]
[:punct:]   . , : ; ? ! \ | / ` = * + - ^ _ ~ " ' [ ] { } ( ) < > @ # $
[:alnum:]   everything in [:digit:] and [:alpha:]
[:digit:]   0 1 2 3 4 5 6 7 8 9
[:alpha:]   everything in [:lower:] and [:upper:]
[:lower:]   a b c ... z
[:upper:]   A B C ... Z

INTERPRETATION
Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n. str_detect("I", regex("i", TRUE))

fixed() Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))

coll() Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary() Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))

ALTERNATES   alt <- function(rx) str_view_all("abcde", rx)

regexp     matches         example
ab|d       or              alt("ab|d")
[abe]      one of          alt("[abe]")
[^abe]     anything but    alt("[^abe]")
[a-c]      range           alt("[a-c]")

ANCHORS   anchor <- function(rx) str_view_all("aaa", rx)

regexp     matches           example
^a         start of string   anchor("^a")
a$         end of string     anchor("a$")

LOOK AROUNDS   look <- function(rx) str_view_all("bacad", rx)

regexp     matches            example
a(?=c)     followed by        look("a(?=c)")
a(?!c)     not followed by    look("a(?!c)")
(?<=b)a    preceded by        look("(?<=b)a")
(?<!b)a    not preceded by    look("(?<!b)a")

QUANTIFIERS   quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp     matches              example
a?         zero or one          quant("a?")
a*         zero or more         quant("a*")
a+         one or more          quant("a+")
a{n}       exactly n            quant("a{2}")
a{n, }     n or more            quant("a{2,}")
a{n, m}    between n and m      quant("a{2,4}")

GROUPS   ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp     matches            example
(ab|d)e    sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

string        regexp           matches                  example
(type this)   (to mean this)   (which matches this)     (the result is the same as ref("abba"))
\\1           \1 (etc.)        first () group, etc.     ref("(a)(b)\\2\\1")
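The escaping rule and a few of the pattern types above can be checked outside str_view_all(), which needs an HTML viewer. This is a small sketch using str_extract(), str_locate(), and str_detect() on the same test strings as the helper functions; the variable names are illustrative only.

```r
library(stringr)

s <- "abc ABC 123 .!?"

# The R string "\\." is parsed to the regexp \. which matches a literal dot
writeLines("\\.")
dot <- str_extract(s, "\\.")

# Escaped class plus a quantifier: one or more digits
digits <- str_extract(s, "\\d+")

# Look-ahead: an "a" only when it is followed by "c"
ahead <- str_locate("bacad", "a(?=c)")

# Back-references: \\2 and \\1 repeat what groups 2 and 1 matched
back <- str_extract("abbaab", "(a)(b)\\2\\1")

# regex() changes how the pattern is interpreted, here ignoring case
ci <- str_detect("I", regex("i", ignore_case = TRUE))

print(dot)
print(digits)
print(ahead)
print(back)
print(ci)
```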

Dates and times with lubridate : : CHEAT SHEET

Date-times

A date-time is a point on the timeline, stored as the number of seconds since 1970-01-01 00:00:00 UTC.
dt <- as_datetime(1511870400)
## "2017-11-28 12:00:00 UTC"

A date is a day stored as the number of days since 1970-01-01.
d <- as_date(17498)
## "2017-11-28"

An hms is a time stored as the number of seconds since 00:00:00.
t <- hms::as.hms(85)
## 00:01:25

PARSE DATE-TIMES (Convert strings or numbers to date-times)

1. Identify the order of the year (y), month (m), day (d), hour (h), minute (m) and second (s) elements in your data.
2. Use the function below whose name replicates the order. Each accepts a wide variety of input formats.

Input                  Functions                           Example
2017-11-28T14:02:00    ymd_hms(), ymd_hm(), ymd_h()        ymd_hms("2017-11-28T14:02:00")
2017-22-12 10:00:00    ydm_hms(), ydm_hm(), ydm_h()        ydm_hms("2017-22-12 10:00:00")
11/28/2017 1:02:03     mdy_hms(), mdy_hm(), mdy_h()        mdy_hms("11/28/2017 1:02:03")
1 Jan 2017 23:59:59    dmy_hms(), dmy_hm(), dmy_h()        dmy_hms("1 Jan 2017 23:59:59")
20170131               ymd(), ydm()                        ymd(20170131)
July 4th, 2000         mdy(), myd()                        mdy("July 4th, 2000")
4th of July '99        dmy(), dym()                        dmy("4th of July '99")
2001: Q3               yq() Q for quarter.                 yq("2001: Q3")
2:01                   hms::hms()                          hms::hms(sec = 0, min= 1, hours = 2)
                       Also lubridate::hms(), hm() and ms(), which return periods.*
2017.5                 date_decimal(decimal, tz = "UTC")   date_decimal(2017.5)

now(tzone = "") Current time in tz (defaults to system tz). now()
today(tzone = "") Current date in a tz (defaults to system tz). today()
fast_strptime() Faster strptime. fast_strptime('9/1/01', '%y/%m/%d')
parse_date_time() Easier strptime. parse_date_time("9/1/01", "ymd")

GET AND SET COMPONENTS

Use an accessor function to get a component.
d ## "2017-11-28"
day(d) ## 28

Assign into an accessor function to change a component in place.
day(d) <- 1
d ## "2017-11-01"

date(x) Date component. date(dt)
year(x) Year. year(dt)
isoyear(x) The ISO 8601 year.
epiyear(x) Epidemiological year.
month(x, label, abbr) Month. month(dt)
day(x) Day of month. day(dt)
wday(x,label,abbr) Day of week.
qday(x) Day of quarter.
hour(x) Hour. hour(dt)
minute(x) Minutes. minute(dt)
second(x) Seconds. second(dt)
week(x) Week of the year. week(dt)
isoweek() ISO 8601 week.
epiweek() Epidemiological week.
quarter(x, with_year = FALSE) Quarter. quarter(dt)
semester(x, with_year = FALSE) Semester. semester(dt)
am(x) Is it in the am? am(dt)
pm(x) Is it in the pm? pm(dt)
dst(x) Is it daylight savings? dst(d)
leap_year(x) Is it a leap year? leap_year(d)
update(object, ..., simple = FALSE) update(dt, mday = 2, hour = 1)

Round Date-times

floor_date(x, unit = "second") Round down to nearest unit. floor_date(dt, unit = "month")
round_date(x, unit = "second") Round to nearest unit. round_date(dt, unit = "month")
ceiling_date(x, unit = "second", change_on_boundary = NULL) Round up to nearest unit. ceiling_date(dt, unit = "month")
rollback(dates, roll_to_first = FALSE, preserve_hms = TRUE) Roll back to last day of previous month. rollback(dt)

Stamp Date-times

stamp() Derive a template from an example string and return a new function that will apply the template to date-times. Also stamp_date() and stamp_time().

1. Derive a template, create a function
sf <- stamp("Created Sunday, Jan 17, 1999 3:34")
Tip: use a date with day > 12

2. Apply the template to dates
sf(ymd("2010-04-05"))
## [1] "Created Monday, Apr 05, 2010 00:00"

Time Zones

R recognizes ~600 time zones. Each encodes the time zone, Daylight Savings Time, and historical calendar variations for an area. R assigns one time zone per vector.

Use the UTC time zone to avoid Daylight Savings.

OlsonNames() Returns a list of valid time zone names. OlsonNames()
with_tz(time, tzone = "") Get the same date-time in a new time zone (a new clock time). with_tz(dt, "US/Pacific")
force_tz(time, tzone = "") Get the same clock time in a new time zone (a new date-time). force_tz(dt, "US/Pacific")
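The difference between with_tz() and force_tz() is easiest to see on one value. The sketch below parses a date-time, changes a component, rounds it, and then applies both time zone functions; the input value is chosen for illustration only.

```r
library(lubridate)

# Parse: the function name mirrors the order of the components
dt <- ymd_hms("2017-11-28 14:02:00", tz = "UTC")

# Get a component, then assign into the accessor to change it in place
yr <- year(dt)
hour(dt) <- 9

# Round down and up to the enclosing month boundaries
month_start <- floor_date(dt, unit = "month")
month_end   <- ceiling_date(dt, unit = "month")

# Same instant shown on a Pacific clock (the clock time changes)
same_instant <- with_tz(dt, "US/Pacific")

# Same clock reading placed in Pacific time (the instant changes)
same_clock <- force_tz(dt, "US/Pacific")

print(dt)
print(month_start)
print(month_end)
print(same_instant)
print(same_clock)
```

with_tz() leaves the underlying number of seconds untouched, while force_tz() shifts it by the offset between the two zones.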
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at lubridate.tidyverse.org • lubridate 1.6.0 • Updated: 2017-12
Math with Date-times: Lubridate provides three classes of timespans to facilitate math with dates and date-times

Math with date-times relies on the timeline, which behaves inconsistently. Consider how the timeline behaves during:

A normal day
nor <- ymd_hms("2018-01-01 01:30:00",tz="US/Eastern")

The start of daylight savings (spring forward)
gap <- ymd_hms("2018-03-11 01:30:00",tz="US/Eastern")

The end of daylight savings (fall back)
lap <- ymd_hms("2018-11-04 00:30:00",tz="US/Eastern")

Leap years and leap seconds
leap <- ymd("2019-03-01")

Periods track changes in clock times, which ignore time line irregularities.
nor + minutes(90)
gap + minutes(90)
lap + minutes(90)
leap + years(1)

Durations track the passage of physical time, which deviates from clock time when irregularities occur.
nor + dminutes(90)
gap + dminutes(90)
lap + dminutes(90)
leap + dyears(1)

Intervals represent specific intervals of the timeline, bounded by start and end date-times.
interval(nor, nor + minutes(90))
interval(gap, gap + minutes(90))
interval(lap, lap + minutes(90))
interval(leap, leap + years(1))

Not all years are 365 days due to leap days.
Not all minutes are 60 seconds due to leap seconds.

It is possible to create an imaginary date by adding months, e.g. February 31st
jan31 <- ymd(20180131)
jan31 + months(1)
## NA

%m+% and %m-% will roll imaginary dates to the last day of the previous month.
jan31 %m+% months(1)
## "2018-02-28"

add_with_rollback(e1, e2, roll_to_first = TRUE) will roll imaginary dates to the first day of the new month.
add_with_rollback(jan31, months(1), roll_to_first = TRUE)
## "2018-03-01"
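The contrast between periods and durations shows up on the spring-forward night used above. A minimal sketch, reusing the gap and jan31 values from the sheet:

```r
library(lubridate)

# Spring-forward night in US/Eastern: clocks jump from 2:00 to 3:00
gap <- ymd_hms("2018-03-11 01:30:00", tz = "US/Eastern")

# Period: advances the clock reading by 90 minutes
by_period <- gap + minutes(90)

# Duration: advances physical time by 90 * 60 seconds
by_duration <- gap + dminutes(90)

print(by_period)
print(by_duration)

# Month arithmetic that would land on February 31st
jan31 <- ymd(20180131)
plain  <- jan31 + months(1)     # NA, the date does not exist
rolled <- jan31 %m+% months(1)  # rolls back to the last day of February

print(plain)
print(rolled)
```

The two results differ because one hour of clock time is skipped that night, so the same 90 minutes of physical time covers more of the clock face.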

PERIODS

Add or subtract periods to model events that happen at specific clock times, like the NYSE opening bell.

Make a period with the name of a time unit pluralized, e.g.
p <- months(3) + days(12)
p
"3m 12d 0H 0M 0S"
(number of months, number of days, etc.)

years(x = 1) x years.
months(x) x months.
weeks(x = 1) x weeks.
days(x = 1) x days.
hours(x = 1) x hours.
minutes(x = 1) x minutes.
seconds(x = 1) x seconds.
milliseconds(x = 1) x milliseconds.
microseconds(x = 1) x microseconds.
nanoseconds(x = 1) x nanoseconds.
picoseconds(x = 1) x picoseconds.

period(num = NULL, units = "second", ...) An automation friendly period constructor. period(5, unit = "years")
as.period(x, unit) Coerce a timespan to a period, optionally in the specified units. Also is.period(). as.period(i)
period_to_seconds(x) Convert a period to the "standard" number of seconds implied by the period. Also seconds_to_period(). period_to_seconds(p)

DURATIONS

Add or subtract durations to model physical processes, like battery life. Durations are stored as seconds, the only time unit with a consistent length. Difftimes are a class of durations found in base R.

Make a duration with the name of a period prefixed with a d, e.g.
dd <- ddays(14)
dd
"1209600s (~2 weeks)"
(exact length in seconds, equivalent in common units)

dyears(x = 1) 31536000x seconds.
dweeks(x = 1) 604800x seconds.
ddays(x = 1) 86400x seconds.
dhours(x = 1) 3600x seconds.
dminutes(x = 1) 60x seconds.
dseconds(x = 1) x seconds.
dmilliseconds(x = 1) x × 10^-3 seconds.
dmicroseconds(x = 1) x × 10^-6 seconds.
dnanoseconds(x = 1) x × 10^-9 seconds.
dpicoseconds(x = 1) x × 10^-12 seconds.

duration(num = NULL, units = "second", ...) An automation friendly duration constructor. duration(5, unit = "years")
as.duration(x, …) Coerce a timespan to a duration. Also is.duration(), is.difftime(). as.duration(i)
make_difftime(x) Make difftime with the specified number of units. make_difftime(99999)

INTERVALS

Divide an interval by a duration to determine its physical length, divide an interval by a period to determine its implied length in clock time.

Make an interval with interval() or %--%, e.g.
i <- interval(ymd("2017-01-01"), d) ## 2017-01-01 UTC--2017-11-28 UTC
j <- d %--% ymd("2017-12-31") ## 2017-11-28 UTC--2017-12-31 UTC

a %within% b Does interval or date-time a fall within interval b? now() %within% i
int_start(int) Access/set the start date-time of an interval. Also int_end(). int_start(i) <- now(); int_start(i)
int_aligns(int1, int2) Do two intervals share a boundary? Also int_overlaps(). int_aligns(i, j)
int_diff(times) Make the intervals that occur between the date-times in a vector. v <-c(dt, dt + 100, dt + 1000); int_diff(v)
int_flip(int) Reverse the direction of an interval. Also int_standardize(). int_flip(i)
int_length(int) Length in seconds. int_length(i)
int_shift(int, by) Shifts an interval up or down the timeline by a timespan. int_shift(i, days(-1))
as.interval(x, start, …) Coerce a timespan to an interval with the start date-time. Also is.interval(). as.interval(days(1), start = now())
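Dividing an interval by a duration or by a period, as described under INTERVALS, can be sketched as follows. The start and end dates match the interval i on the sheet; the variable names start and end are introduced here for illustration.

```r
library(lubridate)

start <- ymd("2017-01-01")
end   <- ymd("2017-11-28")

# Two equivalent ways to build the same interval
i  <- interval(start, end)
i2 <- start %--% end

# Is a date inside the interval?
inside <- ymd("2017-06-15") %within% i

# Length in seconds, and physical length in days (interval / duration)
len_sec  <- int_length(i)
len_days <- i / ddays(1)

# Implied length in clock-time months (interval / period)
len_months <- i / months(1)

print(inside)
print(len_sec)
print(len_days)
print(len_months)
```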
Package Development : : CHEAT SHEET

Package Structure

A package is a convention for organizing files into directories. This sheet shows how to work with the 7 most common parts of an R package:

Package
  DESCRIPTION    SETUP
  R/             WRITE CODE
  tests/         TEST
  man/           DOCUMENT
  vignettes/     TEACH
  data/          ADD DATA
  NAMESPACE      ORGANIZE

The contents of a package can be stored on disk as a:
• source - a directory with sub-directories (as above)
• bundle - a single compressed file (.tar.gz)
• binary - a single compressed file optimized for a specific OS

Or installed into an R library (loaded into memory during an R session) or archived online in a repository. Use the functions below to move between these states.

Functions that move a package between states (Repository, Bundle, Source, Binary, Installed, In memory; that is, Internet, on disk, library, memory):
install.packages() (CRAN)
install.packages(type = "source") (CRAN)
R CMD install
devtools::install()
devtools::build()
devtools::install_github() (github)
devtools::load_all()
Build & Reload (RStudio)
library()

devtools::use_build_ignore("file")
Adds file to .Rbuildignore, a list of files that will not be included when package is built.

Visit r-pkgs.had.co.nz to learn much more about writing and publishing packages for R

Setup (DESCRIPTION)

The DESCRIPTION file describes your work, sets up how your package will work with other packages, and applies a copyright.

• You must have a DESCRIPTION file
• Add the packages that yours relies on with devtools::use_package()
  Adds a package to the Imports or Suggests field

Licenses:
CC0      No strings attached.
MIT      MIT license applies to your code if re-shared.
GPL-2    GPL-2 license applies to your code, and all code anyone bundles with it, if re-shared.

Package: mypackage
Title: Title of Package
Version: 0.1.0
Authors@R: person("Hadley", "Wickham", email =
    "hadley@me.com", role = c("aut", "cre"))
Description: What the package does (one paragraph)
Depends: R (>= 3.1.0)
License: GPL-2
LazyData: true
Imports:
    dplyr (>= 0.4.0),
    ggvis (>= 0.2)
Suggests:
    knitr (>= 0.1.0)

Imports: Import packages that your package must have to work. R will install them when it installs your package.
Suggests: Suggest packages that are not very essential to yours. Users can install them manually, or not, as they like.

Write Code (R/)

All of the R code in your package goes in R/. A package with just an R/ directory is still a very useful package.

• Create a new package project with devtools::create("path/to/name")
  Create a template to develop into a package.
• Save your code in R/ as scripts (extension .R)

WORKFLOW
1. Edit your code.
2. Load your code with one of
   devtools::load_all() Re-loads all saved files in R/ into memory.
   Ctrl/Cmd + Shift + L (keyboard shortcut) Saves all open files then calls load_all().
3. Experiment in the console.
4. Repeat.

• Use consistent style with r-pkgs.had.co.nz/r.html#style
• Click on a function and press F2 to open its definition
• Search for a function with Ctrl + .

Test (tests/)

Use tests/ to store tests that will alert you if your code breaks.

• Add a tests/ directory
• Import testthat with devtools::use_testthat(), which sets up package to use automated tests with testthat
• Write tests with context(), test_that(), and expect statements
• Save your tests as .R files in tests/testthat/

WORKFLOW
1. Modify your code or tests.
2. Test your code with one of
   devtools::test() Runs all tests in tests/
   Ctrl/Cmd + Shift + T (keyboard shortcut)
3. Repeat until all tests pass

Example Test
context("Arithmetic")

test_that("Math works", {
  expect_equal(1 + 1, 2)
  expect_equal(1 + 2, 3)
  expect_equal(1 + 3, 4)
})

Expect statement      Tests
expect_equal()        is equal within small numerical tolerance?
expect_identical()    is exactly equal?
expect_match()        matches specified string or regular expression?
expect_output()       prints specified output?
expect_message()      displays specified message?
expect_warning()      displays specified warning?
expect_error()        throws specified error?
expect_is()           output inherits from certain class?
expect_false()        returns FALSE?
expect_true()         returns TRUE?
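Several of the expect statements listed above can be combined in one test. The following is a minimal sketch of what a file in tests/testthat/ might contain; the function add() is a hypothetical stand-in for package code and is defined inline so the example runs on its own.

```r
library(testthat)

# Hypothetical function under test (in a real package this would live in R/)
add <- function(x, y) x + y

test_that("add() sums two numbers", {
  expect_equal(add(1, 1), 2)              # equal within numerical tolerance
  expect_identical(add(1L, 1L), 2L)       # exactly equal, including type
  expect_error(add(1, "a"))               # non-numeric input throws an error
  expect_true(is.numeric(add(0.1, 0.2)))  # returns TRUE
})
```

expect_equal() suits floating-point results, while expect_identical() also fails when only the type differs, for example integer versus double.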

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at http://r-pkgs.had.co.nz/ • devtools 1.5.1 • Updated: 2015-01
Document (man/)

man/ contains the documentation for your functions, the help pages in your package.

• Use roxygen comments to document each function beside its definition
• Document the name of each exported data set
• Include helpful examples for each function

WORKFLOW
1. Add roxygen comments in your .R files
2. Convert roxygen comments into documentation with one of:
   devtools::document() Converts roxygen comments to .Rd files and places them in man/. Builds NAMESPACE.
   Ctrl/Cmd + Shift + D (Keyboard Shortcut)
3. Open help pages with ? to preview documentation
4. Repeat

.Rd FORMATTING TAGS
\emph{italic text}
\strong{bold text}
\code{function(args)}
\pkg{package}
\dontrun{code}
\dontshow{code}
\donttest{code}
\deqn{a + b (block)}
\eqn{a + b (inline)}
\email{name@@foo.com}
\href{url}{display}
\url{url}
\link[=dest]{display}
\linkS4class{class}
\code{\link{function}}
\code{\link[package]{function}}
\tabular{lcr}{
  left \tab centered \tab right \cr
  cell \tab cell \tab cell \cr
}

ROXYGEN2

The roxygen2 package lets you write documentation inline in your .R files with a shorthand syntax. devtools implements roxygen2 to make documentation.

• Add roxygen documentation as comment lines that begin with #'.
• Place comment lines directly above the code that defines the object documented.
• Place a roxygen @ tag (see COMMON ROXYGEN TAGS) after #' to supply a specific section of documentation.
• Untagged lines will be used to generate a title, description, and details section (in that order)

#' Add together two numbers.
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' @export
add <- function(x, y) {
  x + y
}

COMMON ROXYGEN TAGS
@aliases       @inheritParams    @seealso
@concepts      @keywords         @format (data)
@describeIn    @param            @source (data)
@examples      @rdname           @include
@export        @return           @slot (S4)
@family        @section          @field (RC)

Add Data (data/)

The data/ directory allows you to include data with your package.

• Save data as .Rdata files (suggested)
• Store data in one of data/, R/sysdata.rda, inst/extdata
• Always use LazyData: true in your DESCRIPTION file.

devtools::use_data()
Adds a data object to data/ (R/sysdata.rda if internal = TRUE)

devtools::use_data_raw()
Adds an R Script used to clean a data set to data-raw/. Includes data-raw/ on .Rbuildignore.

Store data in
• data/ to make data available to package users
• R/sysdata.rda to keep data internal for use by your functions.
• inst/extdata to make raw data available for loading and parsing examples. Access this data with system.file()

Organize (NAMESPACE)

The NAMESPACE file helps you make your package self-contained: it won't interfere with other packages, and other packages won't interfere with it.

• Export functions for users by placing @export in their roxygen comments
• Import objects from other packages with package::object (recommended) or @import, @importFrom, @importClassesFrom, @importMethodsFrom (not always recommended)

WORKFLOW
1. Modify your code or tests.
2. Document your package (devtools::document())
3. Check NAMESPACE
4. Repeat until NAMESPACE is correct

SUBMIT YOUR PACKAGE
r-pkgs.had.co.nz/release.html

Teach (vignettes/)

vignettes/ holds documents that teach your users how to solve real problems with your tools.

• Create a vignettes/ directory and a template vignette with devtools::use_vignette()
  Adds template vignette as vignettes/my-vignette.Rmd.
• Append YAML headers to your vignettes (like the example below)
• Write the body of your vignettes in R Markdown (rmarkdown.rstudio.com)

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
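The NAMESPACE advice (export with @export, call other packages as package::object) can be combined with the roxygen layout in one file. This is a sketch only: count_vowels() is a hypothetical function, and stringr is assumed to be listed under Imports in DESCRIPTION.

```r
#' Count the vowels in each string.
#'
#' Untagged lines like this one become the description section.
#'
#' @param x A character vector.
#' @return An integer vector the same length as \code{x}.
#' @examples
#' count_vowels(c("apple", "sky"))
#' @export
count_vowels <- function(x) {
  # Calling stringr::str_count() with package::object means no @import
  # or @importFrom tag is needed in the roxygen block
  stringr::str_count(x, "[aeiou]")
}

print(count_vowels(c("apple", "sky")))
```

Running devtools::document() on a package containing this file would be expected to write man/count_vowels.Rd and add an export line to NAMESPACE.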

Deep Learning with Keras : : CHEAT SHEET

Intro

Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. It supports multiple back-ends, including TensorFlow, CNTK and Theano.

TensorFlow is a lower level mathematical library for building deep neural network architectures. The keras R package makes it easy to use Keras and TensorFlow in R.

Workflow:
Define      Model, Sequential model, Multi-GPU model
Compile     Optimiser, Loss, Metrics
Fit         Batch size, Epochs, Validation split
Evaluate    Evaluate, Plot
Predict     classes, probability

https://keras.rstudio.com
https://www.manning.com/books/deep-learning-with-r

INSTALLATION

The keras R package uses the Python keras library. You can install all the prerequisites directly from R.
https://keras.rstudio.com/reference/install_keras.html

library(keras)
install_keras()

See ?install_keras for GPU instructions

This installs the required libraries in an Anaconda environment or virtual environment 'r-tensorflow'.

Working with keras models

DEFINE A MODEL
keras_model() Keras Model
keras_model_sequential() Keras Model composed of a linear stack of layers
multi_gpu_model() Replicates a model on different GPUs

COMPILE A MODEL
compile(object, optimizer, loss, metrics = NULL) Configure a Keras model for training

FIT A MODEL
fit(object, x = NULL, y = NULL, batch_size = NULL, epochs = 10, verbose = 1, callbacks = NULL, …) Train a Keras model for a fixed number of epochs (iterations)
fit_generator() Fits the model on data yielded batch-by-batch by a generator
train_on_batch() test_on_batch() Single gradient update or model evaluation over one batch of samples

EVALUATE A MODEL
evaluate(object, x = NULL, y = NULL, batch_size = NULL) Evaluate a Keras model
evaluate_generator() Evaluates the model on a data generator

PREDICT
predict() Generate predictions from a Keras model
predict_proba() and predict_classes() Generates probability or class probability predictions for the input samples
predict_on_batch() Returns predictions for a single batch of samples
predict_generator() Generates predictions for the input samples from a data generator

OTHER MODEL OPERATIONS
summary() Print a summary of a Keras model
export_savedmodel() Export a saved model
get_layer() Retrieves a layer based on either its name (unique) or index
pop_layer() Remove the last layer in a model
save_model_hdf5(); load_model_hdf5() Save/Load models using HDF5 files
serialize_model(); unserialize_model() Serialize a model to an R object
clone_model() Clone a model instance
freeze_weights(); unfreeze_weights() Freeze and unfreeze weights

CORE LAYERS
layer_input() Input layer
layer_dense() Add a densely-connected NN layer to an output
layer_activation() Apply an activation function to an output
layer_dropout() Applies Dropout to the input
layer_reshape() Reshapes an output to a certain shape
layer_permute() Permute the dimensions of an input according to a given pattern
layer_repeat_vector() Repeats the input n times
layer_lambda(object, f) Wraps arbitrary expression as a layer
layer_activity_regularization() Layer that applies an update to the cost function based input activity
layer_masking() Masks a sequence by using a mask value to skip timesteps
layer_flatten() Flattens an input

TRAINING AN IMAGE RECOGNIZER ON MNIST DATA
The "Hello, World!" of deep learning

# input layer: use MNIST images
mnist <- dataset_mnist()
x_train <- mnist$train$x; y_train <- mnist$train$y
x_test <- mnist$test$x; y_test <- mnist$test$y

# reshape and rescale
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255; x_test <- x_test / 255

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

# defining the model and layers
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu',
              input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

# compile (define loss and optimizer)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

# train (fit)
model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
model %>% evaluate(x_test, y_test)
model %>% predict_classes(x_test)
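The rescaling and label-encoding steps in the MNIST example can be illustrated without installing Keras. The sketch below uses base R only and is an approximation of the idea behind to_categorical(y, 10), not the keras implementation; the labels and pixel values are made up.

```r
# Integer class labels in 0..9, as in MNIST (illustrative values)
y <- c(5, 0, 4)
num_classes <- 10

# One-hot encode: row i has a single 1 in column y[i] + 1
one_hot <- diag(num_classes)[y + 1, , drop = FALSE]

# Each row sums to 1, and the position of the 1 recovers the label
row_totals <- rowSums(one_hot)
recovered  <- max.col(one_hot) - 1

# Rescale 8-bit pixel intensities (0..255) to the range 0..1
pixels <- c(0, 128, 255)
scaled <- pixels / 255

print(dim(one_hot))
print(recovered)
print(scaled)
```

One-hot rows are the target shape that the 'categorical_crossentropy' loss pairs with a 10-unit softmax output layer.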

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at keras.rstudio.com • keras 2.1.2 • Updated: 2017-12
More layers

CONVOLUTIONAL LAYERS
layer_conv_1d() 1D, e.g. temporal convolution
layer_conv_2d_transpose() Transposed 2D (deconvolution)
layer_conv_2d() 2D, e.g. spatial convolution over images
layer_conv_3d_transpose() Transposed 3D (deconvolution)
layer_conv_3d() 3D, e.g. spatial convolution over volumes
layer_conv_lstm_2d() Convolutional LSTM
layer_separable_conv_2d() Depthwise separable 2D
layer_upsampling_1d() layer_upsampling_2d() layer_upsampling_3d() Upsampling layer
layer_zero_padding_1d() layer_zero_padding_2d() layer_zero_padding_3d() Zero-padding layer
layer_cropping_1d() layer_cropping_2d() layer_cropping_3d() Cropping layer

POOLING LAYERS
layer_max_pooling_1d() layer_max_pooling_2d() layer_max_pooling_3d() Maximum pooling for 1D to 3D
layer_average_pooling_1d() layer_average_pooling_2d() layer_average_pooling_3d() Average pooling for 1D to 3D
layer_global_max_pooling_1d() layer_global_max_pooling_2d() layer_global_max_pooling_3d() Global maximum pooling
layer_global_average_pooling_1d() layer_global_average_pooling_2d() layer_global_average_pooling_3d() Global average pooling

ACTIVATION LAYERS
layer_activation(object, activation) Apply an activation function to an output
layer_activation_leaky_relu() Leaky version of a rectified linear unit
layer_activation_parametric_relu() Parametric rectified linear unit
layer_activation_thresholded_relu() Thresholded rectified linear unit
layer_activation_elu() Exponential linear unit

DROPOUT LAYERS
layer_dropout() Applies dropout to the input
layer_spatial_dropout_1d() layer_spatial_dropout_2d() layer_spatial_dropout_3d() Spatial 1D to 3D version of dropout

RECURRENT LAYERS
layer_simple_rnn() Fully-connected RNN where the output is to be fed back to input
layer_gru() Gated recurrent unit - Cho et al
layer_cudnn_gru() Fast GRU implementation backed by CuDNN
layer_lstm() Long-Short Term Memory unit - Hochreiter 1997
layer_cudnn_lstm() Fast LSTM implementation backed by CuDNN

LOCALLY CONNECTED LAYERS
layer_locally_connected_1d() layer_locally_connected_2d() Similar to convolution, but weights are not shared, i.e. different filters for each patch

Pre-trained models

Keras applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.

application_xception() xception_preprocess_input() Xception v1 model
application_inception_v3() inception_v3_preprocess_input() Inception v3 model, with weights pre-trained on ImageNet
application_inception_resnet_v2() inception_resnet_v2_preprocess_input() Inception-ResNet v2 model, with weights trained on ImageNet
application_vgg16(); application_vgg19() VGG16 and VGG19 models
application_resnet50() ResNet50 model
application_mobilenet() mobilenet_preprocess_input() mobilenet_decode_predictions() mobilenet_load_model_hdf5() MobileNet model architecture

ImageNet is a large database of images with labels, extensively used for deep learning
imagenet_preprocess_input() imagenet_decode_predictions() Preprocesses a tensor encoding a batch of images for ImageNet, and decodes predictions

Callbacks

A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.

callback_early_stopping() Stop training when a monitored quantity has stopped improving
callback_learning_rate_scheduler() Learning rate scheduler

Preprocessing

SEQUENCE PREPROCESSING
pad_sequences() Pads each sequence to the same length (length of the longest sequence)
skipgrams() Generates skipgram word pairs
make_sampling_table() Generates word rank-based probabilistic sampling table

TEXT PREPROCESSING
text_tokenizer() Text tokenization utility
fit_text_tokenizer() Update tokenizer internal vocabulary
save_text_tokenizer(); load_text_tokenizer() Save a text tokenizer to an external file
texts_to_sequences(); texts_to_sequences_generator() Transforms each text in texts to sequence of integers
texts_to_matrix(); sequences_to_matrix() Convert a list of sequences into a matrix
text_one_hot() One-hot encode text to word indices
text_hashing_trick() Converts a text to a sequence of indexes in a fixed-size hashing space
text_to_word_sequence() Convert text to a sequence of words (or tokens)

IMAGE PREPROCESSING
image_load() Loads an image into PIL format.
flow_images_from_data() flow_images_from_directory() Generates batches of augmented/normalized data from images and labels, or a directory
image_data_generator() Generate minibatches of image data with real-time data augmentation.
fit_image_data_generator() Fit image data generator internal statistics to some sample data
generator_next() Retrieve the next item
image_to_array(); image_array_resize()
 callback_tensorboard() TensorBoard basic
image_array_save() 3D array representation visualizations
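
The callback idea described above can be sketched in plain R without keras. This is only an illustration of the concept, not keras's implementation: make_early_stopping(), run_training(), and the simulated validation losses are hypothetical names and data invented for the example. With keras itself, the built-in callback_early_stopping() would instead be passed in the callbacks argument of fit().

```r
# Hypothetical helper: builds a "callback" as a list of hook functions.
# The hook is called at a given stage of training (here: the end of each epoch).
make_early_stopping <- function(patience = 2) {
  best <- Inf   # best validation loss seen so far
  wait <- 0     # epochs since the last improvement
  list(
    on_epoch_end = function(epoch, val_loss) {
      if (val_loss < best) {
        best <<- val_loss
        wait <<- 0
      } else {
        wait <<- wait + 1
      }
      wait >= patience   # TRUE signals "stop training"
    }
  )
}

# Hypothetical training loop: the losses are simulated, no model is fitted.
run_training <- function(val_losses, callback) {
  for (epoch in seq_along(val_losses)) {
    if (callback$on_epoch_end(epoch, val_losses[epoch])) {
      return(epoch)   # epoch at which training stopped early
    }
  }
  length(val_losses)  # ran to completion
}

losses <- c(0.9, 0.7, 0.6, 0.65, 0.62, 0.5)
stopped_at <- run_training(losses, make_early_stopping(patience = 2))
stopped_at  # 5: two epochs in a row without improving on 0.6
```

The hook keeps its own state (best and wait) in a closure, which is why the loop only needs to ask whether to stop.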
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at keras.rstudio.com • keras 2.1.2 • Updated: 2017-12
Data Science in Spark with Sparklyr : : CHEAT SHEET
Intro
sparklyr is an R interface for Apache Spark™. It provides a complete dplyr backend and the option to query directly using Spark SQL statements. With sparklyr, you can orchestrate distributed machine learning using either Spark’s MLlib or H2O Sparkling Water.

Starting with version 1.0.44, RStudio Desktop, Server and Pro include integrated support for the sparklyr package. You can create and manage connections to Spark clusters and local Spark instances from inside the IDE.

RStudio Integrates with sparklyr
[Screenshot of the IDE connections pane: Open connection log, Disconnect, Open the Spark UI, Spark & Hive Tables, Preview 1K rows]

Cluster Deployment
[Diagram, MANAGED CLUSTER: Driver Node, Cluster Manager (YARN or Mesos), Worker Nodes]
[Diagram, STAND ALONE CLUSTER: Driver Node, Worker Nodes]

Data Science Toolchain with Spark + sparklyr
(R for Data Science, Grolemund & Wickham)
Import: • Export an R DataFrame • Read a file • Read existing Hive table
Tidy: • dplyr verb • Direct Spark SQL (DBI) • SDF function (Scala API)
Transform: Transformer function
Model: • Spark MLlib • H2O Extension
Visualize: Collect data into R for plotting
Communicate: • Collect data into R • Share plots, documents, and apps
(Tidy and Transform together form Wrangle; Transform, Visualize and Model form Understand.)

Getting Started

LOCAL MODE (No cluster required)
1. Install a local version of Spark:
   spark_install("2.0.1")
2. Open a connection
   sc <- spark_connect(master = "local")

ON A MESOS MANAGED CLUSTER
1. Install RStudio Server or Pro on one of the existing nodes
2. Locate path to the cluster’s Spark directory
3. Open a connection
   spark_connect(master = "[mesos URL]", version = "1.6.2", spark_home = [Cluster’s Spark path])

USING LIVY (Experimental)
1. The Livy REST application should be running on the cluster
2. Connect to the cluster
   sc <- spark_connect(method = "livy", master = "http://host:port")

ON A YARN MANAGED CLUSTER
1. Install RStudio Server or RStudio Pro on one of the existing nodes, preferably an edge node
2. Locate path to the cluster’s Spark Home Directory, it normally is "/usr/lib/spark"
3. Open a connection
   spark_connect(master = "yarn-client", version = "1.6.2", spark_home = [Cluster’s Spark path])

ON A SPARK STANDALONE CLUSTER
1. Install RStudio Server or RStudio Pro on one of the existing nodes or a server in the same LAN
2. Install a local version of Spark:
   spark_install(version = "2.0.1")
3. Open a connection
   spark_connect(master = "spark://host:port", version = "2.0.1", spark_home = spark_home_dir())

Tuning Spark

EXAMPLE CONFIGURATION
config <- spark_config()
config$spark.executor.cores <- 2
config$spark.executor.memory <- "4G"
sc <- spark_connect(master = "yarn-client", config = config, version = "2.0.1")

IMPORTANT TUNING PARAMETERS with defaults
• spark.yarn.am.cores
• spark.yarn.am.memory 512m
• spark.network.timeout 120s
• spark.executor.memory 1g
• spark.executor.cores 1
• spark.executor.instances
• spark.executor.extraJavaOptions
• spark.executor.heartbeatInterval 10s
• sparklyr.shell.executor-memory
• sparklyr.shell.driver-memory

Using sparklyr
A brief example of a data analysis using Apache Spark, R and sparklyr in local mode

library(sparklyr); library(dplyr); library(ggplot2)
library(tidyr)
set.seed(100)

# Install Spark locally
spark_install("2.0.1")

# Connect to local version
sc <- spark_connect(master = "local")

# Copy data to Spark memory
import_iris <- copy_to(sc, iris, "spark_iris", overwrite = TRUE)

# Partition data
partition_iris <- sdf_partition(import_iris, training = 0.5, testing = 0.5)

# Create a hive metadata for each partition
sdf_register(partition_iris, c("spark_iris_training", "spark_iris_test"))

tidy_iris <- tbl(sc, "spark_iris_training") %>%
  select(Species, Petal_Length, Petal_Width)

# Spark ML Decision Tree Model
model_iris <- tidy_iris %>%
  ml_decision_tree(response = "Species",
    features = c("Petal_Length", "Petal_Width"))

# Create reference to Spark table
test_iris <- tbl(sc, "spark_iris_test")

# Bring data back into R memory for plotting
pred_iris <- sdf_predict(model_iris, test_iris) %>%
  collect

pred_iris %>%
  inner_join(data.frame(prediction = 0:2,
    lab = model_iris$model.parameters$labels)) %>%
  ggplot(aes(Petal_Length, Petal_Width, col = lab)) +
  geom_point()

# Disconnect
spark_disconnect(sc)

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at spark.rstudio.com • sparklyr 0.5 • Updated: 2016-12
Import

COPY A DATA FRAME INTO SPARK
sdf_copy_to(sc, iris, "spark_iris")
sdf_copy_to(sc, x, name, memory, repartition, overwrite)

IMPORT INTO SPARK FROM A FILE
Arguments that apply to all functions: sc, name, path, options = list(), repartition = 0, memory = TRUE, overwrite = TRUE
CSV      spark_read_csv(header = TRUE, columns = NULL, infer_schema = TRUE, delimiter = ",", quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL)
JSON     spark_read_json()
PARQUET  spark_read_parquet()

SPARK SQL COMMANDS
DBI::dbWriteTable(sc, "spark_iris", iris)
DBI::dbWriteTable(conn, name, value)

FROM A TABLE IN HIVE
my_var <- tbl_cache(sc, name = "hive_iris")
tbl_cache(sc, name, force = TRUE) Loads the table into memory
my_var <- dplyr::tbl(sc, name = "hive_iris")
dplyr::tbl(src, ...) Creates a reference to the table without loading it into memory

Wrangle

SPARK SQL VIA DPLYR VERBS
Translates into Spark SQL statements
my_table <- my_var %>%
  filter(Species == "setosa") %>%
  sample_n(10)

DIRECT SPARK SQL COMMANDS
my_table <- DBI::dbGetQuery(sc, "SELECT * FROM iris LIMIT 10")
DBI::dbGetQuery(conn, statement)

SCALA API VIA SDF FUNCTIONS
sdf_mutate(.data) Works like dplyr mutate function
sdf_partition(x, ..., weights = NULL, seed = sample(.Machine$integer.max, 1))
sdf_partition(x, training = 0.5, test = 0.5)
sdf_register(x, name = NULL) Gives a Spark DataFrame a table name
sdf_sample(x, fraction = 1, replacement = TRUE, seed = NULL)
sdf_sort(x, columns) Sorts by >=1 columns in ascending order
sdf_with_unique_id(x, id = "id")
sdf_predict(object, newdata) Spark DataFrame with predicted values

ML TRANSFORMERS
ft_binarizer(my_table, input.col = "Petal_Length", output.col = "petal_large", threshold = 1.2)
Arguments that apply to all functions: x, input.col = NULL, output.col = NULL
ft_binarizer(threshold = 0.5) Assigned values based on threshold
ft_bucketizer(splits) Numeric column to discretized column
ft_discrete_cosine_transform(inverse = FALSE) Time domain to frequency domain
ft_elementwise_product(scaling.col) Element-wise product between 2 cols
ft_index_to_string() Index labels back to label as strings
ft_one_hot_encoder() Continuous to binary vectors
ft_quantile_discretizer(n.buckets = 5L) Continuous to binned categorical values
ft_sql_transformer(sql)
ft_string_indexer(params = NULL) Column of labels into a column of label indices.
ft_vector_assembler() Combine vectors into single row-vector

Visualize & Communicate

DOWNLOAD DATA TO R MEMORY
r_table <- collect(my_table)
plot(Petal_Width ~ Petal_Length, data = r_table)
dplyr::collect(x) Download a Spark DataFrame to an R DataFrame
sdf_read_column(x, column) Returns contents of a single column to R

SAVE FROM SPARK TO FILE SYSTEM
Arguments that apply to all functions: x, path
CSV      spark_write_csv(header = TRUE, delimiter = ",", quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL)
JSON     spark_write_json(mode = NULL)
PARQUET  spark_write_parquet(mode = NULL)

Reading & Writing from Apache Spark
[Diagram of data movement between R, Spark, Hive tables and the file system, labelled: tbl_cache, dplyr::tbl, sdf_copy_to, dplyr::copy_to, DBI::dbWriteTable, spark_read_<fmt>, sdf_collect, dplyr::collect, sdf_read_column, spark_write_<fmt>, File System]

Extensions
Create an R package that calls the full Spark API & provide interfaces to Spark packages.

CORE TYPES
spark_connection() Connection between R and the Spark shell process
spark_jobj() Instance of a remote Spark object
spark_dataframe() Instance of a remote Spark DataFrame object

CALL SPARK FROM R
invoke() Call a method on a Java object
invoke_new() Create a new object by invoking a constructor
invoke_static() Call a static method on an object

MACHINE LEARNING EXTENSIONS
ml_options()
ml_create_dummy_variables()
ml_model()
ml_prepare_dataframe()
ml_prepare_response_features_intercept()

Model (MLlib)
ml_decision_tree(my_table, response = "Species", features = c("Petal_Length", "Petal_Width"))

ml_als_factorization(x, user.column = "user", rating.column = "rating", item.column = "item", rank = 10L, regularization.parameter = 0.1, iter.max = 10L, ml.options = ml_options())
ml_decision_tree(x, response, features, max.bins = 32L, max.depth = 5L, type = c("auto", "regression", "classification"), ml.options = ml_options()) Same options for: ml_gradient_boosted_trees
ml_generalized_linear_regression(x, response, features, intercept = TRUE, family = gaussian(link = "identity"), iter.max = 100L, ml.options = ml_options())
ml_kmeans(x, centers, iter.max = 100, features = dplyr::tbl_vars(x), compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options())
ml_lda(x, features = dplyr::tbl_vars(x), k = length(features), alpha = (50/k) + 1, beta = 0.1 + 1, ml.options = ml_options())
ml_linear_regression(x, response, features, intercept = TRUE, alpha = 0, lambda = 0, iter.max = 100L, ml.options = ml_options()) Same options for: ml_logistic_regression
ml_multilayer_perceptron(x, response, features, layers, iter.max = 100, seed = sample(.Machine$integer.max, 1), ml.options = ml_options())
ml_naive_bayes(x, response, features, lambda = 0, ml.options = ml_options())
ml_one_vs_rest(x, classifier, response, features, ml.options = ml_options())
ml_pca(x, features = dplyr::tbl_vars(x), ml.options = ml_options())
ml_random_forest(x, response, features, max.bins = 32L, max.depth = 5L, num.trees = 20L, type = c("auto", "regression", "classification"), ml.options = ml_options())
ml_survival_regression(x, response, features, intercept = TRUE, censor = "censor", iter.max = 100L, ml.options = ml_options())
ml_binary_classification_eval(predicted_tbl_spark, label, score, metric = "areaUnderROC")
ml_classification_eval(predicted_tbl_spark, label, predicted_lbl, metric = "f1")
ml_tree_feature_importance(sc, model)
Tidy evaluation with rlang : : CHEAT SHEET
Vocabulary
Tidy Evaluation (Tidy Eval) is not a package, but a framework for doing non-standard evaluation (i.e. delayed evaluation) that makes it easier to program with tidyverse functions.

Symbol - a name that represents a value or object stored in R. is_symbol(expr(pi))

Environment - a list-like object that binds symbols (names) to objects stored in memory. Each env contains a link to a second, parent env, which creates a chain, or search path, of environments. is_environment(current_env())
  rlang::caller_env(n = 1) Returns calling env of the function it is in.
  rlang::child_env(.parent, ...) Creates new env as child of .parent. Also env.
  rlang::current_env() Returns execution env of the function it is in.

Constant - a bare value (i.e. an atomic vector of length 1). is_bare_atomic(1)

Call object - a vector of symbols/constants/calls that begins with a function name, possibly followed by arguments. is_call(expr(abs(1)))

Code - a sequence of symbols/constants/calls that will return a result if evaluated. Code can be:
  1. Evaluated immediately (Standard Eval)
  2. Quoted to use later (Non-Standard Eval)
  is_expression(expr(pi))

Expression - an object that stores quoted code without evaluating it. is_expression(expr(a + b))

Quosure - an object that stores both quoted code (without evaluating it) and the code's environment. is_quosure(quo(a + b))
  rlang::quo_get_env(quo) Return the environment of a quosure.
  rlang::quo_set_env(quo, env) Set the environment of a quosure.
  rlang::quo_get_expr(quo) Return the expression of a quosure.

Expression Vector - a list of pieces of quoted code created by base R's expression and parse functions. Not to be confused with expression.

Quoting Code
Quote code in one of two ways (if in doubt use a quosure):

QUOSURES
Quosure - An expression that has been saved with an environment (aka a closure). A quosure can be evaluated later in the stored environment to return a predictable result.

rlang::quo(expr) Quote contents as a quosure. Also quos to quote multiple expressions.
  a <- 1; b <- 2; q <- quo(a + b); qs <- quos(a, b)
rlang::enquo(arg) Call from within a function to quote what the user passed to an argument as a quosure. Also enquos for multiple args.
  quote_this <- function(x) enquo(x)
  quote_these <- function(...) enquos(...)
rlang::new_quosure(expr, env = caller_env()) Build a quosure from a quoted expression and an environment.
  new_quosure(expr(a + b), current_env())

EXPRESSION
Quoted Expression - An expression that has been saved by itself. A quoted expression can be evaluated later to return a result that will depend on the environment it is evaluated in.

rlang::expr(expr) Quote contents. Also exprs to quote multiple expressions.
  a <- 1; b <- 2; e <- expr(a + b); es <- exprs(a, b, a + b)
rlang::enexpr(arg) Call from within a function to quote what the user passed to an argument. Also enexprs to quote multiple arguments.
  quote_that <- function(x) enexpr(x)
  quote_those <- function(...) enexprs(...)
rlang::ensym(x) Call from within a function to quote what the user passed to an argument as a symbol, accepts strings. Also ensyms.
  quote_name <- function(name) ensym(name)
  quote_names <- function(...) ensyms(...)

Parsing and Deparsing
Parse - Convert a string to a saved expression.
Deparse - Convert a saved expression to a string.

rlang::parse_expr(x) Convert a string to an expression. Also parse_exprs, sym, parse_quo, parse_quos.
  e <- parse_expr("a+b")
rlang::expr_text(expr, width = 60L, nlines = Inf) Convert expr to a string. Also quo_name.
  expr_text(e)

Building Calls
rlang::call2(.fn, ..., .ns = NULL) Create a call from a function and a list of args. Use exec to create and then evaluate the call. (See back page for !!!)
  args <- list(x = 4, base = 2)
  call2("log", x = 4, base = 2)
  call2("log", !!!args)
  exec("log", x = 4, base = 2)
  exec("log", !!!args)

Evaluation
To evaluate an expression, R:
  1. Looks up the symbols in the expression in the active environment (or a supplied one), followed by the environment's parents
  2. Executes the calls in the expression
The result of an expression depends on which environment it is evaluated in.

QUOTED EXPRESSION
rlang::eval_bare(expr, env = parent.frame()) Evaluate expr in env.
  eval_bare(e, env = .GlobalEnv)

QUOSURES (and quoted exprs)
rlang::eval_tidy(expr, data = NULL, env = caller_env()) Evaluate expr in env, using data as a data mask. Will evaluate quosures in their stored environment.
  eval_tidy(q)

Data Mask - If data is non-NULL, eval_tidy inserts data into the search path before env, matching symbols to names in data.
Use the pronoun .data$ to force a symbol to be matched in data, and !! (see back) to force a symbol to be matched in the environments.
  a <- 1; b <- 2
  p <- quo(.data$a + !!b)
  mask <- tibble(a = 5, b = 6)
  eval_tidy(p, data = mask)
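
The difference between a quoted expression, a quosure, and a data mask can be checked with a short sketch. It assumes only that the rlang package is installed; env_small, env_large, mask, and b_outside are hypothetical names chosen for the illustration, and a base data.frame stands in for the tibble used above.

```r
library(rlang)

# Two hypothetical environments that bind the same symbols to different values.
env_small <- env(a = 1, b = 2)
env_large <- env(a = 10, b = 20)

e <- expr(a + b)                          # quoted expression: code only
q <- new_quosure(expr(a + b), env_small)  # quosure: code plus env_small

# A quoted expression gives whatever the evaluation environment dictates.
expr_in_small <- eval_bare(e, env_small)  # 3
expr_in_large <- eval_bare(e, env_large)  # 30

# A quosure is evaluated in its stored environment.
quo_result <- eval_tidy(q)                # 3

# A data mask is searched before the quosure's environment.
mask <- data.frame(a = 5, b = 6)
masked <- eval_tidy(q, data = mask)       # 11, both symbols come from mask

# .data$ pins a symbol to the mask; !! inlines a value from the environment.
b_outside <- 2
p <- quo(.data$a + !!b_outside)
pinned <- eval_tidy(p, data = mask)       # 7, i.e. 5 from mask plus 2
```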

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at tidyeval.tidyverse.org • rlang 0.3.0 • Updated: 2018-11
Quasiquotation (!!, !!!, :=)

QUOTATION
Storing an expression without evaluating it.
  e <- expr(a + b)

QUASIQUOTATION
Quoting some parts of an expression while evaluating and then inserting the results of others (unquoting others).
  e <- expr(a + b)
  expr(log(e))     gives log(e)
  expr(log(!!e))   gives log(a + b)

rlang provides !!, !!!, and := for doing quasiquotation.
!!, !!!, and := are not functions but syntax (symbols recognized by the functions they are passed to). Compare this to how
  . is used by magrittr::%>%()
  . is used by stats::lm()
  .x is used by purrr::map(), and so on.
!!, !!!, and := are only recognized by some rlang functions and functions that use those functions (such as tidyverse functions).

!! Unquotes the symbol or call that follows. Pronounced "unquote" or "bang-bang."
  a <- 1; b <- 2
  expr(log(!!a + b))     gives log(1 + b)

Combine !! with () to unquote a longer expression.
  a <- 1; b <- 2
  expr(log(!!(a + b)))   gives log(3)

!!! Unquotes a vector or list and splices the results as arguments into the surrounding call. Pronounced "unquote splice" or "bang-bang-bang."
  x <- list(8, b = 2)
  expr(log(!!!x))        gives log(8, b = 2)

:= Replaces an = to allow unquoting within the name that appears on the left hand side of the =. Use with !!
  n <- expr(uno)
  tibble::tibble(!!n := 1)   gives a column uno = 1

Programming Recipes
Quoting function - A function that quotes any of its arguments internally for delayed evaluation in a chosen environment. You must take special steps to program safely with a quoting function.

How to spot a quoting function?
A function quotes an argument if the argument returns an error when run on its own.
  dplyr::filter(cars, speed == 25)   returns the row speed = 25, dist = 85
  speed == 25                        Error!
Many tidyverse functions are quoting functions: e.g. filter, select, mutate, summarise, etc.

WRITE A FUNCTION THAT RECOGNIZES QUASIQUOTATION (!!, !!!, :=)
  add1 <- function(x) {
    q <- rlang::enquo(x)        # (1)
    rlang::eval_tidy(q) + 1     # (2)
  }
1. Capture the quasiquotation-aware argument with rlang::enquo.
2. Evaluate the arg with rlang::eval_tidy.

PROGRAM WITH A QUOTING FUNCTION
  data_mean <- function(data, var) {
    require(dplyr)
    var <- rlang::enquo(var)            # (1)
    data %>%
      summarise(mean = mean(!!var))     # (2)
  }
1. Capture user argument that will be quoted with rlang::enquo.
2. Unquote the user argument into the quoting function with !!.

PASS MULTIPLE ARGUMENTS TO A QUOTING FUNCTION
  group_mean <- function(data, var, ...) {
    require(dplyr)
    var <- rlang::enquo(var)
    group_vars <- rlang::enquos(...)    # (1)
    data %>%
      group_by(!!!group_vars) %>%       # (2)
      summarise(mean = mean(!!var))
  }
1. Capture user arguments that will be quoted with rlang::enquos.
2. Unquote splice the user arguments into the quoting function with !!!.

PASS TO ARGUMENT NAMES OF A QUOTING FUNCTION
  named_mean <- function(data, var) {
    require(dplyr)
    var <- rlang::ensym(var)                # (1)
    data %>%
      summarise(!!var := mean(!!var))       # (2)
  }
1. Capture user argument that will be quoted with rlang::ensym.
2. Unquote the name into the quoting function with !! and :=.

MODIFY USER ARGUMENTS
  my_do <- function(f, v, df) {
    f <- rlang::enquo(f)                # (1)
    v <- rlang::enquo(v)
    todo <- rlang::quo((!!f)(!!v))      # (2)
    rlang::eval_tidy(todo, df)          # (3)
  }
1. Capture user arguments with rlang::enquo.
2. Unquote user arguments into a new expression or quosure to use.
3. Evaluate the new expression/quosure instead of the original argument.

APPLY AN ARGUMENT TO A DATA FRAME
  subset2 <- function(df, rows) {
    rows <- rlang::enquo(rows)                     # (1)
    vals <- rlang::eval_tidy(rows, data = df)      # (2)
    df[vals, , drop = FALSE]
  }
1. Capture user argument with rlang::enquo.
2. Evaluate the argument with rlang::eval_tidy. Pass the data frame to data to use as a data mask.
3. Suggest in your documentation that your users use the .data and .env pronouns.

PASS CRAN CHECK
  # (1)
  #' @importFrom rlang .data
  mutate_y <- function(df) {
    dplyr::mutate(df, y = .data$a + 1)   # (2)
  }
Quoted arguments in tidyverse functions can trigger an R CMD check NOTE about undefined global variables. To avoid this:
1. Import rlang::.data to your package, perhaps with the roxygen2 tag @importFrom rlang .data
2. Use the .data$ pronoun in front of variable names in tidyverse functions
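
The unquoting operators and the capture-then-evaluate recipe can be tried without dplyr. The sketch below assumes only that rlang is installed; col_mean() is a hypothetical helper written for this illustration, and rlang::list2() is used in place of tibble::tibble() to show := on a plain list.

```r
library(rlang)

a <- 1; b <- 2
x <- list(8, b = 2)

# Deparse each result to a string so it is easy to compare.
unquote_one <- expr_text(expr(log(!!a + b)))    # "log(1 + b)"
unquote_sum <- expr_text(expr(log(!!(a + b))))  # "log(3)"
splice_args <- expr_text(expr(log(!!!x)))       # "log(8, b = 2)"

# := lets the name on the left hand side be unquoted as well.
n <- "uno"
named <- list2(!!n := 1)                        # list(uno = 1)

# Hypothetical helper following the recipe: capture with enquo(),
# then evaluate with eval_tidy() using the data frame as a data mask.
col_mean <- function(df, var) {
  var <- enquo(var)
  mean(eval_tidy(var, data = df))
}

mpg_mean <- col_mean(mtcars, mpg)   # same value as mean(mtcars$mpg)
```

Because col_mean() captures a quosure, callers can also pass an expression such as mpg * 2 rather than a bare column name.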

caret Package Cheat Sheet

Specifying the Model
Possible syntaxes for specifying the variables in the model:

train(y ~ x1 + x2, data = dat, ...)
train(x = predictor_df, y = outcome_vector, ...)
train(recipe_object, data = dat, ...)

• rfe, sbf, gafs, and safs only have the x/y interface.
• The train formula method will always create dummy variables.
• The x/y interface to train will not create dummy variables (but the underlying model function might).

Remember to:
• Have column names in your data.
• Use factors for a classification outcome (not 0/1 or integers).
• Have valid R names for class levels (not "0"/"1")
• Set the random number seed prior to calling train repeatedly to get the same resamples across calls.
• Use the train option na.action = na.pass if you will be imputing missing data. Also, use this option when predicting new data containing missing values.

To pass options to the underlying model function, you can pass them to train via the ellipses:

train(y ~ ., data = dat, method = "rf",
      # options to `randomForest`:
      importance = TRUE)

Parallel Processing
The foreach package is used to run models in parallel. The train code does not change but a "do" package must be called first.

# on MacOS or Linux
library(doMC)
registerDoMC(cores=4)

# on Windows
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

The function parallel::detectCores can help too.

Preprocessing
Transformations, filters, and other operations can be applied to the predictors with the preProc option.

train(, preProc = c("method1", "method2"), ...)

Methods include:
• "center", "scale", and "range" to normalize predictors.
• "BoxCox", "YeoJohnson", or "expoTrans" to transform predictors.
• "knnImpute", "bagImpute", or "medianImpute" to impute.
• "corr", "nzv", "zv", and "conditionalX" to filter.
• "pca", "ica", or "spatialSign" to transform groups.

train determines the order of operations; the order that the methods are declared does not matter.
The recipes package has a more extensive list of preprocessing operations.

Adding Options
Many train options can be specified using the trainControl function:

train(y ~ ., data = dat, method = "cubist",
      trControl = trainControl(<options>))

Resampling Options
trainControl is used to choose a resampling method:

trainControl(method = <method>, <options>)

Methods and options are:
• "cv" for K-fold cross-validation (number sets the # folds).
• "repeatedcv" for repeated cross-validation (repeats for # repeats).
• "boot" for bootstrap (number sets the iterations).
• "LGOCV" for leave-group-out (number and p are options).
• "LOO" for leave-one-out cross-validation.
• "oob" for out-of-bag resampling (only for some models).
• "timeslice" for time-series data (options are initialWindow, horizon, fixedWindow, and skip).

Performance Metrics
To choose how to summarize a model, the trainControl function is used again.

trainControl(summaryFunction = <R function>,
             classProbs = <logical>)

Custom R functions can be used but caret includes several: defaultSummary (for accuracy, RMSE, etc), twoClassSummary (for ROC curves), and prSummary (for information retrieval). For the last two functions, the option classProbs must be set to TRUE.

Grid Search
To let train determine the values of the tuning parameter(s), the tuneLength option controls how many values per tuning parameter to evaluate.
Alternatively, specific values of the tuning parameters can be declared using the tuneGrid argument:

grid <- expand.grid(alpha = c(0.1, 0.5, 0.9),
                    lambda = c(0.001, 0.01))
train(x = x, y = y, method = "glmnet",
      preProc = c("center", "scale"),
      tuneGrid = grid)

Random Search
For tuning, train can also generate random tuning parameter combinations over a wide range. tuneLength controls the total number of combinations to evaluate. To use random search:

trainControl(search = "random")

Subsampling
With a large class imbalance, train can subsample the data to balance the classes prior to model fitting.

trainControl(sampling = "down")

Other values are "up", "smote", or "rose". The latter two may require additional package installs.
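
Several of the options above can be combined in one call. The following is a minimal sketch, assuming the caret package is installed; the choice of the built-in iris data, the "knn" method, five folds, and the three candidate values of k are all assumptions made for the illustration, not recommendations from the sheet.

```r
library(caret)

# Set the seed before train() so the resamples are reproducible.
set.seed(42)

# Resampling option: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Grid search: explicit candidate values for the tuning parameter k.
grid <- expand.grid(k = c(3, 5, 7))

# Formula interface; Species is a factor with valid R names as levels.
fit <- train(Species ~ ., data = iris, method = "knn",
             preProcess = c("center", "scale"),
             trControl = ctrl,
             tuneGrid = grid)

fit$results    # one row of resampled Accuracy and Kappa per value of k
fit$bestTune   # the value of k that train selected
```

The full argument name preProcess is used here; the preProc spelling shown above relies on R's partial argument matching.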

CC BY SA Max Kuhn • max@rstudio.com • https://github.com/topepo/ Learn more at https://topepo.github.io/caret/ • Updated: 9/17


Notes:

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com
RStudio Community rstd.io/community
Developer Blog rstd.io/dev-blog
R Views Blog rstd.io/rviews-blog
Tidyverse Blog rstd.io/tidy-blog
Tensorflow Blog rstd.io/tf-blog
Twitter rstd.io/twitter
GitHub rstd.io/github
LinkedIn rstd.io/linkedin
YouTube rstd.io/youtube
Facebook rstd.io/facebook

