List of R Commands

Descriptive Statistics Related

1. Summary descriptive statistics

>summary(var.name)

2. Constructing boxplot

>boxplot(var.name)

3. Constructing histogram

>hist(var.name)

4. Calculating normal probability or finding value on a normal distribution for a given probability

>pnorm(x, mean, standard.deviation)

#This gives you the probability that a normal random variable with the given mean and standard deviation is less than or equal to x.

>qnorm(p, mean, standard.deviation)

#This gives you the value x such that the probability that the normal random variable is less than or equal to x is equal to p.
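
The commands in this section can be tried together on simulated data. The sketch below is illustrative only: the variable name scores and the mean of 100 and standard deviation of 15 are made-up values, not part of the original list.

```r
# Hypothetical data: 50 simulated values (not from the handout)
set.seed(1)
scores = rnorm(50, mean = 100, sd = 15)

summary(scores)   # minimum, quartiles, median, mean, maximum
boxplot(scores)   # boxplot of the variable
hist(scores)      # histogram of the variable

# P(X <= 115) for a normal X with mean 100 and sd 15;
# 115 is exactly one standard deviation above the mean
p = pnorm(115, 100, 15)
p                 # about 0.8413

# qnorm() inverts pnorm(): it recovers the value 115 from the probability p
qnorm(p, 100, 15)
```

Because qnorm() is the inverse of pnorm(), feeding the output of one into the other (with the same mean and standard deviation) returns the starting value.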

Regression Analysis Related

1. Linear regression analysis

>lm(y~x)

#Here the command is lm and y~x indicates that it is a regression of y (response) on x (predictor). You can save the analysis into an object by assigning a name to it.

>lm.fit1=lm(y~x) #this saves the fit into an object called lm.fit1

>summary(lm.fit1) #this will give you a summary report of the regression analysis
>plot(lm.fit1)

#This generates 4 plots, including the residuals vs. fitted values and normal probability plot of
the residuals. The residuals vs. fitted values plot can also be done in the following way

>plot(lm.fit1$fitted.values,lm.fit1$residuals) #plot(x,y) draws a scatterplot of y against x; here, residuals against fitted values

>abline(h=0) #this adds a horizontal reference line at y = 0

>qqnorm(lm.fit1$residuals)

#The qqnorm function draws the normal probability plot of the given variable, here the residuals of lm.fit1
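
The whole simple-regression workflow can be run end to end on simulated data. This is a minimal sketch under stated assumptions: the x and y values are made up, fitted() and resid() are the standard extractor functions used in place of the $ components, and qqline() is an extra call, not part of the original list, that adds a reference line to the normal probability plot.

```r
# Hypothetical data: y depends linearly on x plus random noise
set.seed(2)
x = 1:30
y = 5 + 2 * x + rnorm(30, sd = 3)

lm.fit1 = lm(y ~ x)      # regression of y (response) on x (predictor)
summary(lm.fit1)         # full summary report
coef(lm.fit1)            # estimated intercept and slope only

# Residuals vs. fitted values, with a reference line at zero
plot(fitted(lm.fit1), resid(lm.fit1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)

# Normal probability plot of the residuals, with a reference line
qqnorm(resid(lm.fit1))
qqline(resid(lm.fit1))
```

If the model is adequate, the residuals should scatter evenly around the zero line and the points of the normal probability plot should fall close to the reference line.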

Multiple Linear Regression Related

>pairs(meddicorp.sales[,c(-4,-5)],pch=16)

#The R command pairs produces a matrix of pairwise scatterplots. Since we only look at the
first three variables, SALES, ADV and BONUS, meddicorp.sales[,c(-4,-5)] tells R
to retrieve only the first three columns (c(-4,-5) means dropping the 4th and 5th columns). The
data frame meddicorp.sales has 25 rows and 5 columns. The argument pch=16 tells R to use
plotting character 16, which is a filled circle.
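
Negative column indexing can be checked on a small stand-in data frame. In the sketch below, demo.sales is a hypothetical substitute for meddicorp.sales: only the names SALES, ADV and BONUS come from the text, while V4, V5 and all the numbers are placeholders.

```r
# Hypothetical stand-in for meddicorp.sales: 25 rows, 5 columns of made-up numbers
set.seed(3)
demo.sales = data.frame(SALES = rnorm(25, 1200, 100),
                        ADV   = rnorm(25, 500, 50),
                        BONUS = rnorm(25, 250, 25),
                        V4    = rnorm(25),
                        V5    = rnorm(25))

# Dropping columns 4 and 5 leaves the same data as keeping columns 1 to 3
first.three = demo.sales[, c(-4, -5)]
names(first.three)
identical(first.three, demo.sales[, 1:3])

# Scatterplot matrix of the three remaining variables, drawn with filled circles
pairs(first.three, pch = 16)
```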

>attach(meddicorp.sales)
>pairs(cbind(SALES,ADV,BONUS),pch=16)

#The attach() command lets you work with a specific data frame (meddicorp.sales in this
case), so you can refer to its variables by name without having to specify
which data frame they come from.

Alternatively, you can skip attach() and use data=data.name in lm() as

>meddicorp.fits=lm(SALES~ADV+BONUS,data=meddicorp.sales)

If the data frame has already been attached, the data argument can be omitted:

>meddicorp.fits=lm(SALES~ADV+BONUS)

#The linear regression command lm() is still used. Note that the general format for fitting a
multiple regression is y~x1+x2+... with one term per predictor. Also note that the regression analysis is saved into
an object called meddicorp.fits.

>summary(meddicorp.fits)
#This prints out the regression analysis
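
A multiple regression with the data argument can be tried without attaching anything. The sketch below uses a hypothetical data frame, demo.sales, whose column names follow the text but whose values are simulated; the coefficients used to generate it are arbitrary.

```r
# Hypothetical data frame; column names follow the handout, values are simulated
set.seed(4)
demo.sales = data.frame(ADV = runif(25, 400, 600), BONUS = runif(25, 200, 350))
demo.sales$SALES = 100 + 2 * demo.sales$ADV + 1.5 * demo.sales$BONUS +
  rnorm(25, sd = 20)

# data= tells lm() where to find the variables, so attach() is not needed
demo.fits = lm(SALES ~ ADV + BONUS, data = demo.sales)
summary(demo.fits)

coef(demo.fits)                # intercept and one slope per predictor
summary(demo.fits)$r.squared   # proportion of variation in SALES explained
```

Passing data= keeps the variable lookup explicit, which avoids confusion when several data frames contain columns with the same name.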

Variance Inflation Factor

To calculate VIF, you first need to install a package called HH. Select the Packages menu at the
top of the R console screen, then select Install package(s). When a box containing a list of
packages pops up, select HH, then click OK. The HH package will now be installed in
your copy of R. However, in order to use it, you have to type

>library(HH)

#You need to do this once each time you open R. If you exit R and reopen it,
you need to run library(HH) again before calculating VIF.

>vif(Bodyfat~Tricep+Thigh+Midarm,data=bodyfat)

#This calculates the VIF for each predictor in this linear regression model. Note that the model is written as a formula with ~, not =. This assumes that the
data file has already been read and saved into a data frame called bodyfat.
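
To see what a VIF measures, it can be computed by hand in base R from its textbook definition: for each predictor, VIF = 1/(1 - R-squared), where R-squared comes from regressing that predictor on the other predictors. The sketch below is an illustration of that definition, not a description of how HH computes it internally; demo.bodyfat is a hypothetical data frame with simulated values, and only the column names come from the text.

```r
# Hypothetical data frame; column names follow the handout, values are simulated
set.seed(5)
Tricep = rnorm(20, 25, 5)
Thigh  = 20 + 1.2 * Tricep + rnorm(20, sd = 2)   # deliberately correlated with Tricep
Midarm = rnorm(20, 28, 3)
demo.bodyfat = data.frame(Tricep, Thigh, Midarm)

# R-squared from regressing each predictor on the other two
r2.tricep = summary(lm(Tricep ~ Thigh + Midarm, data = demo.bodyfat))$r.squared
r2.thigh  = summary(lm(Thigh ~ Tricep + Midarm, data = demo.bodyfat))$r.squared
r2.midarm = summary(lm(Midarm ~ Tricep + Thigh, data = demo.bodyfat))$r.squared

# VIF = 1 / (1 - R-squared); values far above 1 signal multicollinearity
vif.by.hand = c(Tricep = 1 / (1 - r2.tricep),
                Thigh  = 1 / (1 - r2.thigh),
                Midarm = 1 / (1 - r2.midarm))
vif.by.hand
```

In this simulated example Tricep and Thigh are built to be strongly correlated, so their VIFs come out much larger than the VIF of Midarm, which was generated independently.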