
Basic Functions in R

CREATING A VECTOR, MATRIX, ARRAY, FACTOR, LIST AND DATA FRAME
• Vector: a vector is a one-dimensional sequence of elements of the
same type; a scalar in R is simply a vector of length one
• Vector in R:
>x=c(1,2,3) #c() must be lowercase, R is case-sensitive
>x
Output
[1] 1 2 3
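• Matrix: a matrix is two-dimensional; it contains
rows and columns
• Creating a matrix in R: create a 2 x 5 matrix
x=matrix(1:10,2,5) #rows=2,columns=5
x=matrix(1:100,2,5)
#rows=2,columns=5 (wrong: a 2 x 5 matrix holds
rows x columns = 10 values, but here we gave
1:100. R does not stop with an error: it keeps only
the first 10 values, and newer versions of R may warn
that the data length differs from the size of the matrix)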
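As a minimal sketch of the two structures above (the names v, m and m_by_row are illustrative and not from the slides), the following shows that R fills a matrix column by column unless byrow = TRUE is given:

```r
# A vector is one-dimensional: it has a length but no dim attribute.
v <- c(1, 2, 3)
length(v)        # 3
dim(v)           # NULL

# A matrix is filled column by column by default.
m <- matrix(1:10, nrow = 2, ncol = 5)
dim(m)           # 2 5
m[2, 3]          # 6 (row 2, column 3)

# byrow = TRUE fills it row by row instead.
m_by_row <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
m_by_row[2, 3]   # 8
```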
• Array: an array can have any number of dimensions; here we
build a three-dimensional one
• Syntax in R
array(data,dim=c(rows,columns,matrices))
• Create an array with 2x2x5 (2 rows, 2 columns, 5
matrices)
>array(1:20,dim=c(2,2,5))
Output:
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12

, , 4

     [,1] [,2]
[1,]   13   15
[2,]   14   16

, , 5

     [,1] [,2]
[1,]   17   19
[2,]   18   20
• List: a list can hold vectors, arrays and matrices
together
• A list can also contain another list, so it is
called a recursive structure
• Syntax in R:
>l1=list(c(1,2,3),matrix(1:10,2,5),array(1:12,dim=c(2,2,3)))
>l2=list(c(4,5,6),matrix(11:20,2,5),array(1:20,dim=c(2,5,4)),l1)
>l2
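The nesting described above can be explored with a short sketch. It reuses l1 and l2 from the slide; the names first, nested, inner_vec and sub are illustrative assumptions:

```r
l1 <- list(c(1, 2, 3), matrix(1:10, 2, 5), array(1:12, dim = c(2, 2, 3)))
l2 <- list(c(4, 5, 6), matrix(11:20, 2, 5), array(1:20, dim = c(2, 5, 4)), l1)

length(l2)                  # 4 top-level elements
first     <- l2[[1]]        # [[ ]] extracts the element itself (a vector)
nested    <- l2[[4]]        # the fourth element is the inner list l1
inner_vec <- l2[[4]][[1]]   # first element of the inner list: 1 2 3
sub       <- l2[1]          # [ ] keeps the list wrapper (a list of length one)

# Compact overview of the top level only.
str(l2, max.level = 1)
```

The difference between [[ ]] and [ ] matters in practice: functions such as mean() work on l2[[1]] but not on l2[1].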
Data frame (Test 2, 12-mark question)
• Syntax for creating a data frame
• Output of the created data frame:
b) Attributes syntax:
attributes(df)
Attributes output
3) Subsetting the first four rows of df
4) Subsetting the last row of df
5) Subsetting all the columns except the 2nd column of df
6) Subsetting the first row and first two columns of df
7) Subsetting all rows of the first column of df
8) Subsetting the element in the 10th row and second column of df
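The original screenshots for these steps are not available, so the following is only a sketch under assumptions: the data frame df and its columns id, score and group are hypothetical stand-ins, chosen to have 10 rows so that every subsetting step applies.

```r
# Hypothetical data frame with 10 rows and 3 columns.
df <- data.frame(
  id    = 1:10,
  score = c(55, 62, 71, 48, 90, 67, 73, 81, 59, 77),
  group = factor(rep(c("A", "B"), times = 5))
)

attributes(df)    # b) names, class and row.names

df[1:4, ]         # 3) first four rows
df[nrow(df), ]    # 4) last row
df[, -2]          # 5) all columns except the 2nd
df[1, 1:2]        # 6) first row, first two columns
df[, 1]           # 7) all rows of the first column (returned as a vector)
df[10, 2]         # 8) element in the 10th row, 2nd column: 77
```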
Importing CSV and txt files into R
• First convert the Excel file to a CSV file
• Here I saved my CSV file on the desktop in the
folder example
• Before importing the CSV file I should set my
working directory to the folder example
• getwd() gives the present working directory
• Now I set my working directory (with setwd()) to
the folder example, which is on the desktop
• Now my working directory is set; to see the files
present in the directory, just type dir()
• >read.csv("Book3.csv") #Syntax for importing a
csv file
• Importing a txt file into R
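The import steps can be sketched end to end as follows. To keep the example runnable anywhere, it creates a hypothetical example folder inside R's temporary directory instead of the desktop, and writes its own Book3.csv and Book3.txt from the built-in trees data; the tab separator for the txt file is an assumption.

```r
# Hypothetical folder, created inside tempdir() so the sketch runs anywhere.
example_dir <- file.path(tempdir(), "example")
dir.create(example_dir, showWarnings = FALSE)

old_wd <- setwd(example_dir)   # setwd() returns the previous directory
getwd()                        # present working directory

# Create small demo files so there is something to import.
write.csv(head(trees), "Book3.csv", row.names = FALSE)
write.table(head(trees), "Book3.txt", sep = "\t", row.names = FALSE)
dir()                          # lists the files in the working directory

csv_data <- read.csv("Book3.csv")
txt_data <- read.table("Book3.txt", header = TRUE, sep = "\t")

setwd(old_wd)                  # restore the original working directory
```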
Basic Statistical Functions in R
• str()
• summary()
• mean()
• median()
• var()
• sd()
• scale()
• rank()
• sort()
• sum()
• rowSums()
• colSums()
• prod()
• cor()
• plot()
• table()
• chisq.test()
• lm()
• t.test()
• sample()
• subset()
• rnorm()
• quantile()
• seq()
• options()
Note: the functions marked in blue are covered in the Word document that was sent previously.
• str()
summary()
Mean, median, mode, variance, sd
scale()
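A minimal sketch of these functions on the built-in trees data set follows. Base R has no function for the statistical mode (mode() reports the storage type instead), so the frequency-table approach below is one common workaround rather than a standard function; the names h, freq, stat_mode and z are illustrative.

```r
str(trees)        # structure: variable names, types and first values
summary(trees)    # min, quartiles, median, mean and max per column

h <- trees$Height
mean(h)
median(h)
var(h)
sd(h)             # square root of the variance

mode(h)           # "numeric": the storage mode, not the statistical mode

# Statistical mode: the most frequent value(s), taken from a frequency table.
freq <- table(h)
stat_mode <- as.numeric(names(freq)[freq == max(freq)])

# scale() centres each column to mean 0 and rescales it to sd 1.
z <- scale(trees)
```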
• sort(): sorts in ascending order by default; set
decreasing=TRUE for descending order
rank()
rowSums(), colSums()
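The behaviour of these four functions can be checked on a small made-up vector and matrix (x and m are illustrative names):

```r
x <- c(30, 10, 20, 10)

sort(x)                      # 10 10 20 30
sort(x, decreasing = TRUE)   # 30 20 10 10
rank(x)                      # 4.0 1.5 3.0 1.5 (tied values share the average rank)

m <- matrix(1:6, nrow = 2)   # rows are 1 3 5 and 2 4 6
rowSums(m)                   # 9 12
colSums(m)                   # 3 7 11
```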
• table(): builds a frequency (contingency) table;
it needs nominal (categorical) variables
Chi-square test: chisq.test() needs two nominal variables
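As a sketch, the two steps can be combined on the built-in mtcars data, treating cyl and am as nominal variables (this choice of data set and variables is an assumption, not taken from the slides). R may warn that the chi-squared approximation is unreliable here because some expected counts are small.

```r
# One-way frequency table of a single nominal variable.
table(mtcars$cyl)

# Two-way contingency table of two nominal variables.
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
tab

# Chi-square test of independence on the contingency table.
res <- chisq.test(tab)
res$statistic   # test statistic
res$parameter   # degrees of freedom: (3 - 1) * (2 - 1) = 2
res$p.value
```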
• cor(): by default we get the Pearson
correlation
• Spearman correlation: cor(x,y,method="spearman")
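A short sketch on the trees data shows both variants; it also illustrates that the Spearman coefficient is the Pearson coefficient computed on ranks. The variable names are illustrative.

```r
g <- trees$Girth
v <- trees$Volume

pearson  <- cor(g, v)                        # method = "pearson" is the default
spearman <- cor(g, v, method = "spearman")

# Spearman is Pearson applied to the ranks of the data.
spearman_by_hand <- cor(rank(g), rank(v))

# Correlation matrix of all numeric columns at once.
cor(trees)
```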
Types of Graphs
For qualitative variables (nominal/ordinal):
• Bar graph
• Pie chart
For quantitative variables (interval/ratio):
• Line graph
• Histogram
• Leaf and stem plot
• Box plot
• Scatter plot
Bar diagram
• A bar diagram is the graphical representation of a
nominal/ordinal variable.
• Three types of bar diagrams: simple, subdivided
(stacked) and multiple (grouped)
• First we have to build the frequency table
• Then convert that table into a bar plot
R syntax for bar diagram
>barplot(table(chickwts$feed)) # simple bar plot
>barplot(table(var1,var2)) # subdivided (stacked) bar plot
>barplot(table(var1,var2),beside=TRUE) # multiple (grouped) bar plot
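Since var1 and var2 above are placeholders, here is a runnable sketch that uses am and cyl from the built-in mtcars data as two nominal variables (an assumed choice). The titles and the legend are optional additions.

```r
# Frequency tables come first; the bar plots are drawn from them.
one_way <- table(mtcars$cyl)
counts  <- table(am = mtcars$am, cyl = mtcars$cyl)   # rows: am, columns: cyl

barplot(one_way, main = "Simple bar plot")

# Default for a two-way table: one bar per column, subdivided by row.
barplot(counts, legend.text = TRUE, main = "Subdivided (stacked)")

# beside = TRUE places the row categories next to each other instead.
barplot(counts, beside = TRUE, legend.text = TRUE, main = "Multiple (grouped)")
```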
Pie chart
• A pie chart is useful to show the relative
importance of the various categories of a nominal
variable.
• To draw the pie chart we first need to generate
the frequency table, from which the
pie chart is drawn.
R syntax for Pie chart
>pie(table(chickwts$feed))
Line Graph
• Used to track trends and patterns in time
series data.
• We can draw a line graph for one or more
variables
R code for univariate line graph
>plot(trees$Height, type="l")
R code for multivariate line graph
>matplot(trees, type="l")
Histogram
• A histogram is a common graphical presentation of
quantitative data.
• The variable of interest is placed on the
horizontal axis.
• A rectangle is drawn above each class interval
with its height corresponding to the interval’s
frequency, relative frequency, or percent
frequency.
R code for histogram
>hist(trees$Volume)
Scatter Diagram
• A scatter diagram is a graphical tool for
analyzing correlation between two variables.
• One variable is plotted on the horizontal axis
and the other is plotted on the vertical axis.
• The pattern of their intersecting points can
graphically show correlation patterns
R code for scatter diagram
>plot(trees$Height,trees$Girth)
Leaf and Stem Plot
• A stem and leaf plot is a method used to organize
statistical data.
• To construct a stem-and-leaf display,
the observations must first be sorted in ascending
order.
• Then it must be determined what the stems will represent
and what the leaves will represent.
• Typically, the leaf contains the last digit of the
number and the stem contains all of the other digits.
• Example: 44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88
106.
• In this example, the leaf represents the ones place
and the stem represents the rest of the number.
R code for leaf and stem plot
>stem(trees$Height)
Leaf and Stem Plot: illustration
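The construction rule above can be sketched on the example data. The manual split into stems and leaves follows the slide's rule (leaf = last digit); stem() picks its own scale, so its display may group the values differently.

```r
x <- c(44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106)
x <- sort(x)          # observations must be in ascending order

stems  <- x %/% 10    # everything except the last digit
leaves <- x %% 10     # the last digit (ones place)

# Leaves grouped by stem, e.g. stem 6 has the leaves 3 4 6 8 8.
by_stem <- split(leaves, stems)
by_stem

# Built-in text display.
stem(x)
```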
Box and Whisker Plot
• It is a type of graph used to show the shape of a
distribution, its central value, and its variability.
• In statistics, the five-number summary is a descriptive
statistic that provides information about the distribution,
central value and variability of a set of observations.
It consists of:
– the sample minimum (smallest observation)
– the lower quartile or first quartile
– the median (middle value)
– the upper quartile or third quartile
– the sample maximum (largest observation)

R code for boxplot


>boxplot(chickwts$weight)
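The five-number summary behind the box plot can be computed directly. This is a sketch using the same chickwts data; the names w, five and b are illustrative. Note that fivenum() uses hinges, which can differ slightly from the quartiles reported by summary() or quantile().

```r
w <- chickwts$weight

five <- fivenum(w)    # minimum, lower hinge, median, upper hinge, maximum
five
summary(w)            # quartile-based version, plus the mean

# boxplot() invisibly returns the statistics it draws.
b <- boxplot(w, main = "Chick weights", ylab = "weight")
b$stats               # lower whisker, lower hinge, median, upper hinge, upper whisker
```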
Visualising Time series data in R
#Plotting the time series
>plot(AirPassengers)
#Decomposition of the time series
>decompose(AirPassengers)
#Plotting the additive decomposition
>plot(decompose(AirPassengers))
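The decomposition can be inspected component by component. A minimal sketch, with parts, rebuilt and parts_mult as illustrative names:

```r
plot(AirPassengers)                 # monthly airline passenger counts

parts <- decompose(AirPassengers)   # type = "additive" is the default
names(parts)                        # includes seasonal, trend and random
plot(parts)

# In the additive model the components add back up to the series
# (the trend is NA at both ends because of the moving average).
rebuilt <- parts$trend + parts$seasonal + parts$random

# A multiplicative decomposition is requested explicitly.
parts_mult <- decompose(AirPassengers, type = "multiplicative")
```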
Introduction to Business Analytics
Case Study-I
1. Extract a data set from the datasets package in R
2. How do you find the various attributes of the data set?
3. How do you find the measures of central tendency of
the variables?
4. How do you find the dispersion measures, including the
coefficient of variation?
5. How do you subset the data?
6. How do you add up all the rows?
7. How do you compute a correlation?
8. How do you get a scatter plot?
9. How do you get a frequency table?
10. How do you draw a bar plot?
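Base R has no dedicated function for the coefficient of variation in question 4, so one possible sketch computes it from sd() and mean(). The choice of the trees data set and the name cv are assumptions.

```r
x <- trees$Volume

range(x)    # smallest and largest value
var(x)      # variance
sd(x)       # standard deviation
IQR(x)      # interquartile range

# Coefficient of variation: standard deviation as a percentage of the mean.
cv <- sd(x) / mean(x) * 100
cv
```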
Case Study II
• Explain the structure of the data set mtcars
• Draw appropriate graphs for the variables
• Find the correlations between the variables
• Do a linear regression to find out the
relationship between mpg and hp, cyl.
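One way to approach the regression step is sketched below. It assumes mpg is the response and hp and cyl are the predictors, which is one reading of the task; fit is an illustrative name.

```r
str(mtcars)                                # structure of the data set
cor(mtcars[, c("mpg", "hp", "cyl")])       # pairwise correlations

# Linear regression of mpg on hp and cyl.
fit <- lm(mpg ~ hp + cyl, data = mtcars)
coef(fit)       # intercept and one slope per predictor
summary(fit)    # standard errors, t-tests and R-squared
```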
Looping Functions in R
What is a loop?
• A structure, series, or process, the end of which is
connected to the beginning.
• A looping function performs the same task
repeatedly.
• Important looping functions in R
– apply # applies a function across the rows or columns of a
matrix or data frame and returns a vector, matrix or list
– lapply # applies a function to each element of a list (each
column of a data frame) and returns a list
– tapply # applies a function to a variable within each group
and returns a vector (an array in general)
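The three functions can be compared on built-in data sets. This is a sketch; the result names are illustrative.

```r
# apply: the second argument is the margin, 1 = rows, 2 = columns.
col_means <- apply(trees, 2, mean)   # one mean per column
row_max   <- apply(trees, 1, max)    # one maximum per row

# lapply: runs over the columns of a data frame and always returns a list.
col_medians <- lapply(trees, median)

# tapply: applies the function to weight within each feed group.
group_means <- tapply(chickwts$weight, chickwts$feed, mean)
group_means
```

sapply() is a close relative of lapply() that simplifies the result to a vector or matrix where possible.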
Lab Test in R Programming
Write R code and execute it in RStudio
• Create a working directory on the desktop
• Create an Excel spreadsheet with three numeric and
two factor variables, convert it into a CSV file and
place it in the working directory.
• Import the data into the R console
• Summarise the data, including averages,
dispersion measures and shape statistics.
• Draw up frequency tables for the factor variables and
correlations for the numeric variables
• Draw appropriate graphs for both numeric and factor
variables in the data frame.
