https://districtdatalabs.silvrback.com/intro-to-r-for-microsoft-excel-users Page 1 of 17
District Data Labs - How to Transition from Excel to R 09/08/2014 14:03
The Basics
Let's start with the basics. You'll want to make sure you have
downloaded and installed R. I'm also using RStudio as my IDE, so
you should install that as well. You'll be glad you did; it's awesome.
You'll also want to install and load the ggplot2 library, which not
only contains the data set we want to use but will also come in
handy when we get to creating charts and graphs later. We will also
install and load the dplyr library to help with manipulating the
data.
install.packages("ggplot2")
install.packages("dplyr")
library(ggplot2)
library(dplyr)
We are going to use the diamonds data set that comes with
ggplot2. The data set contains prices and other attributes of
53,940 diamonds.
OK, so let's take an initial look at the data. You can type diamonds
into the R console and it will print out the data set in the console
screen, but I advise against doing this. If you're an Excel user,
you're used to viewing data in a tabular format. You can do that in
one line of code.
diamonds<- data.frame(diamonds)
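The spreadsheet-style view mentioned above is most likely meant to be opened with View(), a base R function that RStudio shows as a sortable data grid. Here is a minimal sketch, assuming an interactive RStudio session; head() and str() are console-friendly alternatives that I've added for illustration.

```r
library(ggplot2)  # provides the diamonds data set

# Work on a plain data frame copy, as in the line above
diamonds <- data.frame(diamonds)

# Open a spreadsheet-style viewer (only meaningful in an interactive session)
if (interactive()) {
  View(diamonds)
}

# Console-friendly alternatives: the first rows and the column types
first_rows <- head(diamonds)
print(first_rows)
str(diamonds)
```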
The first 7 columns are pretty well labeled, so we won't mess with
those, but the last 3 aren't labeled very well. So let's rename
columns 8, 9, and 10. We'll call them length, width, and depth
respectively.
names(diamonds)[8]<-"length"
names(diamonds)[9]<-"width"
names(diamonds)[10]<-"depth"
You'll notice that we now have two columns named depth. Let's
rename the first one (column 5) to depthperc.
names(diamonds)[5]<-"depthperc"
Calculated Columns
Here we are using the mutate function on the diamonds data set
to multiply length, width, and depth. It's assigning the outcome of
that to a new column called cubic.
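The mutate call described here can be sketched roughly as follows. The block repeats the column renames from The Basics so that it runs on its own; the exact form of the original call is an assumption.

```r
library(ggplot2)  # diamonds data set
library(dplyr)    # mutate()

diamonds <- data.frame(diamonds)

# Repeat the renames from "The Basics" so the column names match the text
names(diamonds)[5] <- "depthperc"
names(diamonds)[8:10] <- c("length", "width", "depth")

# Add a calculated column: length x width x depth for every row
diamonds <- mutate(diamonds, cubic = length * width * depth)

head(diamonds[, c("length", "width", "depth", "cubic")])
```

This is the R counterpart of typing =H2*I2*J2 in a new Excel column and filling it down, except that the formula is applied to the whole column at once.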
Summaries
The next most common task that Excel is used for is summarizing
data. These tasks range from simply calculating column totals to
the more intermediate pivot tables. I'll show you how to do both in
R.
First, let's say that we want to summarize our data set and calculate
the overall averages for all the numeric fields (carat, depthperc,
table, price, length, width, depth, and cubic). This would be the
equivalent of going to the bottom of each column in Excel,
typing =AVERAGE(A2:A53941), and then copying that formula
across to the bottom of every other column you wanted to average.
colMeans(diamonds[,c(1,5:11)])
Let's say you wanted to add carat to the non-numeric fields and
then calculate the averages for each combination of the new group
of non-numeric fields. This would take a bit of work in Excel
(maybe even some pivot-tabling), but is pretty easy in R.
First, let's round the carat values to the nearest 0.25 carat so that
our numbers are not all over the place.
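One way to sketch the rounding and the grouped averages is shown below. The column name carat2 and the exact aggregate() call are assumptions made for illustration; the grouping fields (cut, color, clarity, plus the rounded carat) follow the description above, and the result is stored in a data frame named Summary.

```r
library(ggplot2)  # diamonds data set

diamonds <- data.frame(diamonds)
names(diamonds)[5] <- "depthperc"
names(diamonds)[8:10] <- c("length", "width", "depth")
diamonds$cubic <- diamonds$length * diamonds$width * diamonds$depth

# Round carat to the nearest 0.25 (carat2 is a hypothetical column name)
diamonds$carat2 <- round(diamonds$carat / 0.25) * 0.25

# Average every numeric field for each cut/color/clarity/carat2 combination
Summary <- aggregate(
  cbind(depthperc, table, price, length, width, depth, cubic) ~
    cut + color + clarity + carat2,
  data = diamonds,
  FUN = mean
)

head(Summary)
```

With this layout the four grouping fields come first, so price lands in column 7 of Summary.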
install.packages("reshape2")
library(reshape2)
Then, we'll use the dcast function to get our data into the same
pivot table format.
pivot_table <- dcast(diamonds[, c("color", "clarity", "price")],
                     color ~ clarity, mean, value.var = "price")
Here, we're taking the color, clarity, and price columns from the
diamonds data frame, casting (pivoting) them out by color (rows)
and clarity (columns), and calculating the average price for each
combination.
VLookups
Another very common task in Excel is the VLOOKUP. The
scenario arises when you have two related data sets and you want
to pull some values from data set B over to their appropriate place
in data set A. So you type something like
=VLOOKUP(A2,K2:L50,2,0) and Excel looks up the value of A2
in column K and returns the value from the column next to the
matching value.
First, let's change the name of the price column in the Summary
data frame to avgprice. This way, we won't have two price fields
when we bring it over.
names(Summary)[7]<-"avgprice"
Next, let's merge the data sets and bring over the average price.
We merged the diamonds data frame with just the columns that we
needed from the Summary data frame and the result was that it
added the avgprice field to our diamonds data frame.
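A merge along these lines might look like the sketch below. To keep the block runnable on its own, it builds a simplified stand-in for Summary keyed only on cut, color, and clarity; that key choice and the stand-in itself are assumptions, not the post's exact data frame.

```r
library(ggplot2)  # diamonds data set

diamonds <- data.frame(diamonds)

# Simplified stand-in for Summary: average price per cut/color/clarity
Summary <- aggregate(price ~ cut + color + clarity, data = diamonds, FUN = mean)
names(Summary)[names(Summary) == "price"] <- "avgprice"

n_before <- nrow(diamonds)

# all.x = TRUE keeps every diamond and looks up avgprice by the key columns,
# which is the role VLOOKUP plays in Excel
diamonds <- merge(
  diamonds,
  Summary[, c("cut", "color", "clarity", "avgprice")],
  by = c("cut", "color", "clarity"),
  all.x = TRUE
)

head(diamonds)
```

Unlike VLOOKUP, merge can match on several key columns at once, and it may reorder the rows of the result.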
Conditional Statements
Excel users also periodically use conditional (IF) statements for
filling in values according to whether certain conditions are met. R
is also very good for doing this.
Here we've set anything less than 0.5 carat to small, anything
between 0.5 and 1 carat to medium, and anything 1 carat and above
to large.
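The thresholds above can be expressed with nested ifelse() calls, which read much like nested =IF() formulas in Excel. This is a sketch; the column name size is an assumption based on how the field is referred to in the chart section.

```r
library(ggplot2)  # diamonds data set

diamonds <- data.frame(diamonds)

# Below 0.5 carat -> small, 0.5 up to (not including) 1 -> medium, else large
diamonds$size <- ifelse(diamonds$carat < 0.5, "small",
                 ifelse(diamonds$carat < 1, "medium", "large"))

# Count how many diamonds fall into each bucket
table(diamonds$size)
```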
Bar Charts
Taking a look at our diamonds data set, let's say we want to create
a chart that shows how many diamonds of each size
(small/medium/large) are in our data. Here's how you would do
that in R.
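A bar chart of this kind can be sketched with geom_bar, which counts the rows in each category by default. The block rebuilds the size column so that it runs on its own; the axis labels, the title, and the factor ordering are my own choices rather than the original code.

```r
library(ggplot2)  # diamonds data set and plotting functions

diamonds <- data.frame(diamonds)
diamonds$size <- ifelse(diamonds$carat < 0.5, "small",
                 ifelse(diamonds$carat < 1, "medium", "large"))

# Order the bars small -> medium -> large instead of alphabetically
diamonds$size <- factor(diamonds$size, levels = c("small", "medium", "large"))

# geom_bar() counts the rows per size, so no pre-aggregation is needed
size_plot <- ggplot(diamonds, aes(x = size)) +
  geom_bar() +
  labs(x = "Size", y = "Number of Diamonds", title = "Diamonds by Size")

print(size_plot)
```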
Line Charts
The second type of chart we're going to create is a line chart. These
are usually used when you have data that changes over some
period of time and you want to see the magnitude and velocity of
those changes. Since our diamonds data set doesn't have any time
series data in it, we'll do something a little different. We will create
a line for each color and see how the number of diamonds of that
color changes across clarity categories.
ggplot(diamonds, aes(clarity)) +
  # stat = "count" is needed in current ggplot2 because clarity is categorical
  geom_freqpoly(aes(group = color, colour = color), stat = "count") +
  labs(x = "Clarity", y = "Number of Diamonds",
       title = "Clarity by Color")
It looks like most diamonds fall into the middle clarity categories.
Also, pretty interesting that there are more G color diamonds in
the higher clarity categories than any other color.
Scatterplots
Now let's do a fairly simple scatter plot so you can get a sense of
how to do one in R. For this, we are going to use the ggplot
command that comes with the ggplot2 package.
Alternatively, you can produce the same thing with the qplot
function as well.
The resulting plot shows the relationship between the carat weight
and the price of the diamonds in our data set, and we've also set
the points to be different colors according to the clarity of the
diamond. The graph below shows us that the larger the diamond
and the better the clarity, the more expensive it tends to be.
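A scatter plot matching this description (carat on the x-axis, price on the y-axis, points colored by clarity) can be sketched as follows. The labels and title are assumptions, and the qplot form is left as a comment because qplot is deprecated in recent ggplot2 releases.

```r
library(ggplot2)  # diamonds data set and plotting functions

# One point per diamond, colored by its clarity grade
scatter <- ggplot(diamonds, aes(x = carat, y = price, colour = clarity)) +
  geom_point() +
  labs(x = "Carat", y = "Price", title = "Price by Carat and Clarity")

print(scatter)

# Shorter qplot spelling of the same plot (deprecated in recent ggplot2):
# qplot(carat, price, data = diamonds, colour = clarity)
```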
To create other types of charts and graphs, the ggplot2 index site is
a wonderful resource that has code and visuals for different types
of graphs.
Conclusion
Well, there you have it - a guide to get almost any Excel user
started in R. Two things I want to mention before I leave you to
explore some more on your own.
First, I've found that if you want to get good at R (or anything
really), the trick is to find a reason to use it every day. It doesn't
matter if it's something small, just open up RStudio and go
Finally, if you found this post useful, go to the blog's home page
and click the Subscribe button. We'll be cranking out a lot more
great, educational data science content in the near future and I'd
hate for you to miss any of it.