Rajesh Jakhotia
22 Jun 2014
About K2 Analytics
At K2 Analytics, we believe that skill development is essential to the
growth of an individual, which in turn leads to the growth of society and industry,
and ultimately of the nation as a whole. Access to knowledge and skills training
should therefore be easy and affordable for every individual.
Our Vision: To be the preferred partner for training and skill development
Our Mission: To provide skill development training to individuals,
make them skilled and industry ready, and create a pool of skilled resources
readily available to the industry
We have chosen Business Intelligence and Analytics as our focus area. With
this endeavour we make this Self-Learning R Essentials accessible to all
those who wish to learn R. We hope it is of help to you. If you have feedback or
suggestions, or are looking for a job in analytics, feel free to write to
us at ar.jakhotia@k2analytics.co.in
Welcome to R!!!
K2Analytics.co.in
Content
Introduction to R
Understanding R data structures
Importing Data
Managing Data
R Programming Structures
Basic Charting and Plotting
Introduction to R
What is R?
Why R?
Installing R
Understanding the R interface
R environment variables and startup files
How to get help in R
R Console & R Editor
What is R?
Free software environment for statistical computing and graphics
Compiles and runs on a wide variety of UNIX platforms, Windows
and Mac OS
Official website: http://www.r-project.org/ (downloads are hosted on CRAN: http://cran.r-project.org/)
R's source code is freely available under the GNU General Public
License
Why R?
Free and exceptionally good statistical tool
Provides cutting-edge statistical techniques comparable to those in many
expensive commercial packages
Has decent data handling and data manipulation capabilities
Provides connectors to social media sites, so one can also easily
get streaming data
R can work with Big Data
Installing R
Go to website: http://cran.r-project.org/
Click the link based on OS Environment
Click base
Download R installer
Double click on the installer
Select Run
R Interface
R Console is where you execute the code
R Editor is where you write and save code
Sys.getenv("R_HOME")
Sys.getenv("R_PROFILE")
Sys.getenv("R_PROFILE_USER")
Sys.getenv("R_DATA")
R_DATA: The path from which R loads the last saved image, if there is one,
from the current directory. The file extension is .RData
Customizing R Startup
At startup, R searches for the Renviron.site file
The default location is R_HOME/etc/Renviron.site
R_HOME is the path where you installed R. In my case it is
C:/Program Files/R/R-3.0.3
The factory installation does not include a Renviron.site file.
You have to create one in Notepad and save it at the above path
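As a minimal sketch of how to locate that file from within R, using base R only (the variable name site_file is just illustrative):

```r
# Build the default path R_HOME/etc/Renviron.site for this installation
site_file <- file.path(R.home("etc"), "Renviron.site")
print(site_file)

# FALSE on a factory installation, TRUE once you have created the file
print(file.exists(site_file))
```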
Note: when the site file and profile file are loaded, only the base
package is loaded. If you need to refer to any other packages,
they must be loaded explicitly.
If R is running, close and restart it
Go to R Editor
Click File > Save or Save As
Note the default folder path
Note: the above is just an example. There is a lot you can do as part of
customizing your R startup
The utility of startup.Rprofile is that you can define in it all the
functions that you wish to use frequently
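A profile file of this kind might look like the sketch below. The helper pct() and the greeting are hypothetical, chosen only for illustration; both rely solely on base functions, since other packages are not yet loaded when the profile runs.

```r
# Hypothetical contents of a startup profile file

# A helper you might want available in every session (illustrative name)
pct <- function(x, digits = 1) {
  paste0(round(100 * x, digits), "%")
}

# .First() is run by R at the end of startup, if it is defined
.First <- function() {
  cat("Welcome to R\n")
}
```

Once the profile has been loaded, pct(0.256) can be called directly from the console without defining it again.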
Running R Code
Interactive Mode
You run R code by typing it at the R command prompt
Script Mode
You run code written in a script file saved with the .R extension
Syntax: source("myprog.R")
Let us create the file and save it in the working directory.
To get the working directory path, use the getwd() command
Assume we have the following statement in the myprog.R file
cat("Welcome to R\n")
## \n is the escape sequence for a new line
Batch Mode
R CMD BATCH c:\Training\myprog.R c:\Training\myprog.Rout
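Script mode can be tried end to end with the sketch below. It writes the one-line script to a temporary folder, which stands in for your working directory; script_path is an illustrative name.

```r
# Write a one-line script to a temporary folder (stand-in for your working directory)
script_path <- file.path(tempdir(), "myprog.R")
writeLines('cat("Welcome to R\\n")', script_path)

# Script mode: run every statement in the file
source(script_path)
```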
R Tip
Set up your R interface with the R Console
and R Editor placed side by side
Write all your code in the R Editor
Variables in R
Variable names in R are case sensitive (A and a are two different
variables in R)
A name can be alphanumeric and can contain _ or . as part of the variable
name
It cannot contain operators (+ - / * < > % =) or special characters like
~{}?#$@
A variable name cannot start with a number or an underscore
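The naming rules above can be checked with a short sketch. The variable names are illustrative, and make.names() is used only to show how R would repair an invalid name.

```r
# Valid names: letters, digits, "." and "_" are allowed
total_sales <- 100
total.sales <- 200
Total_Sales <- 300   # different from total_sales: names are case sensitive

# make.names() shows how R would repair an invalid name
make.names("2ndValue")    # "X2ndValue" (a name cannot start with a digit)
make.names("net-profit")  # "net.profit" (a name cannot contain an operator)
```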
Scalar Variables
A scalar variable holds a single value. Scalars in R are vectors
of length 1
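A small sketch with illustrative values shows that a scalar is simply a vector of length 1:

```r
a <- 5          # a single number
b <- "hello"    # a single character string

length(a)       # 1
is.vector(a)    # TRUE
a * 2           # 10
```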
Vector Variables
A vector variable is an ordered sequence of values of the same mode, for example numbers
Note that small x and capital X are two different vectors. R is case sensitive
c is the concatenate (combine) function
You can easily do mathematical operations on two vectors of the same size, just
as you would on two scalar variables
All vector elements must be of the same mode, such as logical, integer, numeric
or character (string)
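The points above can be sketched as follows; the values are illustrative only.

```r
x <- c(1, 2, 3, 4)
X <- c(10, 20, 30, 40)   # capital X is a separate object

x + X    # 11 22 33 44 (element by element)
x * 2    # 2 4 6 8

# Mixing modes coerces everything to one mode
v <- c(1, "a", TRUE)
mode(v)  # "character"
```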
Matrices
A matrix variable is a two-way table structure having rows and columns
Note the subtle difference in how the values are populated in matrices m and M
Also note that to create a matrix we use the R function named matrix,
passing it certain arguments and values
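A minimal sketch of two matrices built from the same values, assuming the difference between m and M is the fill order controlled by the byrow argument (the values 1 to 6 are illustrative):

```r
m <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column (default)
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row

m[1, ]   # 1 3 5
M[1, ]   # 1 2 3
dim(m)   # 2 3
```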
Lists
In a vector, all values must be of one mode
If you wish to store values of different modes, you
should use a list. Sample syntax:
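One possible form of that syntax, sketched with a hypothetical record that mixes character, numeric and logical values:

```r
# Hypothetical record mixing character, numeric and logical values
emp <- list(name = "Asha", age = 30, permanent = TRUE)

str(emp)            # shows each element with its own type
mode(emp)           # "list"
sapply(emp, mode)   # "character" "numeric" "logical"
```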
Lists contd
Vectors in R are similar to arrays in C: they are stored contiguously, so elements
cannot be inserted or deleted in place. If you need to do that, use lists
Adding to Lists
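Adding and deleting list elements can be sketched as below; the list emp and its contents are hypothetical.

```r
emp <- list(name = "Asha", age = 30)

emp$city <- "Pune"        # add a new element by name
emp[["salary"]] <- 50000  # another way to add
length(emp)               # 4

emp$age <- NULL           # delete an element by assigning NULL
names(emp)                # "name" "city" "salary"

x <- c(10, 20, 30)
x <- x[-2]                # a vector "deletion" really builds a new vector
```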
List elements can also be accessed by their name tags
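A short sketch of access by name tag and by position, using a hypothetical list:

```r
emp <- list(name = "Asha", age = 30)

emp$name        # "Asha"
emp[["age"]]    # 30
emp[[1]]        # "Asha" (access by position)
emp["age"]      # single brackets return a list of length 1, not the value
```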
Lists: unlist
E.g. 2: Name tags exist. The mode of the resulting vector is character (lowest common denominator rule).
E.g. 3: Note the suffixes 1, 2, 3 and 4 given to the VectorElement tag of the list
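The behaviours noted above can be reproduced with the sketch below. The lists are hypothetical stand-ins and are not claimed to match the original examples.

```r
# All elements numeric, with name tags: result is a named numeric vector
u1 <- unlist(list(a = 1, b = 2))
mode(u1)    # "numeric"

# Mixed modes: everything is coerced to character
u2 <- unlist(list(Name = "Asha", Age = 30))
mode(u2)    # "character"

# A vector inside the list: its tag gets numeric suffixes
u3 <- unlist(list(Name = "Asha", VectorElement = c(10, 20, 30, 40)))
names(u3)   # "Name" "VectorElement1" "VectorElement2" "VectorElement3" "VectorElement4"
```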
Summarizing List
Lists are a kind of vector that can store values of different modes
We can add values to and delete values from a list
List values can be given name tags
Data Frames
A data frame is used for storing data tables.
Put simply, what is called a table in SQL parlance or a dataset in
SAS is called a data frame in R terminology
The columns are vectors
A small example of creating a data frame
The first line of the data table, showing the
column names, is called the header.
Each horizontal line, representing a record,
is called a row
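A small data frame can be built as in the sketch below; the column names and values are hypothetical.

```r
# Hypothetical data: each column is a vector of the same length
df <- data.frame(
  id    = c(1, 2, 3),
  name  = c("A", "B", "C"),
  score = c(80.5, 72, 91),
  stringsAsFactors = FALSE
)

names(df)   # the header: "id" "name" "score"
nrow(df)    # 3 rows (records)
df[2, ]     # the second row
df$score    # one column, returned as a vector
```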
rm - Remove
rm removes objects that are no longer needed (cleanup)
Note: R processes data in memory, so it is
advisable to remove objects that are no longer required.
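A minimal sketch of cleaning up with rm(); the object names tmp_a and tmp_b are illustrative.

```r
tmp_a <- 1:10
tmp_b <- "temp"
exists("tmp_a")    # TRUE

rm(tmp_a, tmp_b)   # remove specific objects
exists("tmp_a")    # FALSE

# rm(list = ls()) would remove every object in the workspace
```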
Note: c() joins them head to tail into one vector; cbind() combines them as
columns of a matrix; rbind() combines them as rows
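The three functions can be compared on the same pair of vectors, here with illustrative values:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

c(a, b)       # 1 2 3 4 5 6 (one longer vector)
cbind(a, b)   # 3 x 2 matrix, a and b become columns
rbind(a, b)   # 2 x 3 matrix, a and b become rows
```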
Factors
We are creating a vector named
data and it is of type character
Factors provide an efficient way of storing data in R. If you have a large data frame with a
categorical variable, a factor converts the categorical values into levels, and each level
corresponds to an integer. For a factor column, this integer value is stored in the
data frame rather than the actual value.
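The relationship between values, levels and integer codes can be sketched as follows. The vector is named data as in the text above, but its contents are hypothetical.

```r
data <- c("Low", "High", "Medium", "High", "Low")   # character vector
class(data)        # "character"

f <- factor(data)
levels(f)          # "High" "Low" "Medium"
as.integer(f)      # 2 1 3 1 2 (the integer codes that are stored)
```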
Factors contd
From the previous example and this one
you can see that the levels are stored in
sorted (ascending) order
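A brief sketch of the default ordering, and of supplying your own order through the levels argument; the values are illustrative.

```r
f1 <- factor(c(30, 10, 20, 10))
levels(f1)    # "10" "20" "30"

# To impose your own order, pass the levels explicitly
f2 <- factor(c("Low", "High", "Medium"), levels = c("Low", "Medium", "High"))
levels(f2)    # "Low" "Medium" "High"
```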
Importing Data
Reading tabular data files
Reading CSV files
Importing data from Excel
Importing data from SAS
Accessing Database
Saving in RData
Loading RData Objects
Writing to files
read.table
The read.table function reads data from a txt / csv file and returns a data
frame
Arguments
file = <the file path>
sep = the field separator
header = TRUE if the first row of the data contains column names
stringsAsFactors = FALSE prevents character
variables from being converted to factors
as.is = can be used to suppress factor conversion for
specific columns; TRUE suppresses factor
conversion
There are many other arguments; run ?read.table command to get full help on all the arguments
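The arguments above can be tried on a small file created on the fly. This is a sketch with hypothetical data; csv_path and the column names are illustrative.

```r
# Create a small comma-separated file to read back (hypothetical data)
csv_path <- file.path(tempdir(), "sample.csv")
writeLines(c("id,name,balance", "1,Asha,2500.5", "2,Ravi,1200"), csv_path)

loans <- read.table(csv_path, sep = ",", header = TRUE, stringsAsFactors = FALSE)

str(loans)   # id is integer, name is character, balance is numeric
```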
read.table e.g.
Sample Data file
Loan_Cross_Sell_Logistic_Regression_Sample.CSV
Data Import Syntax
read.table contd
Note that the columns have the proper names, as given in the first row of
the data file
If the file is tab delimited, the sep argument becomes
sep = "\t"
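For a tab-delimited file, the call might look like this sketch, which writes a tiny hypothetical file to a temporary folder first:

```r
tsv_path <- file.path(tempdir(), "sample.txt")
writeLines(c("id\tname", "1\tAsha", "2\tRavi"), tsv_path)

d <- read.table(tsv_path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
names(d)   # "id" "name"
```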
read.fwf
read.fwf is used to read a fixed-width format file
~ is to be replaced by the full folder path
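A minimal sketch of read.fwf on a hypothetical file in which the id occupies 3 characters, the name 5 and the age 2. The file, widths and column names are assumptions made for illustration.

```r
fwf_path <- file.path(tempdir(), "sample_fwf.txt")
writeLines(c("001Asha 25", "002Ravi 31"), fwf_path)

# widths: 3 characters for id, 5 for name, 2 for age
people <- read.fwf(fwf_path, widths = c(3, 5, 2),
                   col.names = c("id", "name", "age"),
                   stringsAsFactors = FALSE, strip.white = TRUE)
people
```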
Load the library and call the read function to import from a SAS dataset
Close Connection
Saving Objects
Let us start the R session afresh and try the below
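Saving and restoring an object can be sketched as below. Removing the object with rm() stands in for restarting the session; the object scores and the file name are hypothetical.

```r
rdata_path <- file.path(tempdir(), "my_objects.RData")

scores <- c(80, 72, 91)
save(scores, file = rdata_path)   # write the object to disk

rm(scores)                        # simulate a fresh session
load(rdata_path)                  # restores the object under its original name
scores
```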
Use the header=TRUE option if the column headers are in the first row of the file
Note: the above command writes the row names (here the row numbers are the row names)
as an additional column in the output file. To avoid this, use the option row.names=FALSE
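The effect of row.names = FALSE can be seen in the sketch below. It assumes the file is written with write.csv, and the data frame is hypothetical.

```r
df <- data.frame(id = 1:2, name = c("Asha", "Ravi"))
out_path <- file.path(tempdir(), "out.csv")

write.csv(df, out_path, row.names = FALSE)   # no extra row-name column
readLines(out_path)                          # first line is the header: "id","name"
```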
Thank you
End of Part 1