
Commands to be used in R

1. To get descriptive statistics (minimum, first quartile, median, mean, third quartile and
maximum) for every variable in a data set, use the command: summary(dataset_name)
2. To get the same descriptive statistics for a single variable in a data set, use the
command: summary(dataset_name$variable_name)
Note that summary() does not report the standard deviation; obtain it with
sd(dataset_name$variable_name).
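As a minimal sketch, the commands can be tried on mtcars, a data set that ships with R. It is used here only as a stand-in for "dataset name" and is not part of the original notes.

```r
# mtcars ships with R; it stands in for the data set (an assumption for illustration)
print(summary(mtcars))      # descriptive statistics for every column

s <- summary(mtcars$mpg)    # descriptive statistics for a single variable
print(s)

mpg_sd <- sd(mtcars$mpg)    # the standard deviation is not part of summary()
print(mpg_sd)
```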
3. To compare the actual mean of a variable with an assumed mean we need to perform a
one-sample t-test. Supply the assumed mean through the mu argument (the default is mu=0).
The data= argument is used only by the formula interface, so refer to the variable as
dataset_name$variable_name. The commands for this are as follows:
a. For two-tailed test: t.test(dataset_name$variable_name, mu=assumed_mean)
b. For right-tailed test: t.test(dataset_name$variable_name, mu=assumed_mean, alternative="greater")
c. For left-tailed test: t.test(dataset_name$variable_name, mu=assumed_mean, alternative="less")
d. For right-tailed test at 90% level of confidence: t.test(dataset_name$variable_name,
mu=assumed_mean, alternative="greater", conf.level=0.90)
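The four variants can be sketched as follows. The variable mtcars$mpg and the assumed mean of 20 are illustrative assumptions, not values from the original notes.

```r
# mtcars$mpg stands in for the variable; 20 is a hypothetical assumed mean
two_sided <- t.test(mtcars$mpg, mu = 20)
right     <- t.test(mtcars$mpg, mu = 20, alternative = "greater")
left      <- t.test(mtcars$mpg, mu = 20, alternative = "less")
right_90  <- t.test(mtcars$mpg, mu = 20, alternative = "greater", conf.level = 0.90)

print(two_sided)
print(right_90$conf.int)   # one-sided interval: the upper bound is Inf
```

The argument values "greater" and "less" must be quoted strings; unquoted, R would look for objects with those names.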
4. To compare the means of two related variables (for example, two measurements taken on the
same subjects) we need to perform a paired-sample t-test. The commands for this are as follows:
a. For two-tailed test: t.test(dataset_name$variable1, dataset_name$variable2, paired=TRUE)
b. For right-tailed test: t.test(dataset_name$variable1, dataset_name$variable2, paired=TRUE,
alternative="greater")
c. For left-tailed test: t.test(dataset_name$variable1, dataset_name$variable2, paired=TRUE,
alternative="less")
d. For right-tailed test at 90% level of confidence: t.test(dataset_name$variable1,
dataset_name$variable2, paired=TRUE, alternative="greater", conf.level=0.90)
If the two variables come from independent groups, omit paired=TRUE to get the
independent-samples t-test.
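A short sketch of the paired test, using the sleep data set that ships with R (chosen here for illustration). It records the extra hours of sleep of the same 10 patients under two drugs.

```r
# sleep ships with R: 'extra' is measured on the same 10 patients for groups 1 and 2
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

paired_two_sided <- t.test(drug1, drug2, paired = TRUE)
paired_left      <- t.test(drug1, drug2, paired = TRUE, alternative = "less")
print(paired_two_sided)

# A paired t-test is a one-sample t-test on the pairwise differences
diff_test <- t.test(drug1 - drug2, mu = 0)
print(diff_test$p.value)
```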

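To perform a t-test, the data should satisfy the assumption of normality, which can be checked
with the Shapiro-Wilk test. The command for this is: shapiro.test(dataset_name$variable_name).
R is case-sensitive, so the function name must be written in lower case. A p-value above the
chosen significance level (commonly 0.05) gives no evidence against normality.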

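If normality is not confirmed we need to perform a Wilcoxon test instead (the signed-rank test
for one sample or paired samples; its two-independent-samples version is the Mann-Whitney
U-test). The commands are as follows:

a. For two-tailed test: wilcox.test(dataset_name$variable_name, mu=assumed_value, exact=FALSE)
b. For right-tailed test: wilcox.test(dataset_name$variable_name, mu=assumed_value,
alternative="greater", exact=FALSE)
c. For left-tailed test: wilcox.test(dataset_name$variable_name, mu=assumed_value,
alternative="less", exact=FALSE)
d. For right-tailed test with a 90% confidence interval: wilcox.test(dataset_name$variable_name,
mu=assumed_value, alternative="greater", conf.int=TRUE, conf.level=0.90, exact=FALSE)
In wilcox.test, conf.level takes effect only when conf.int=TRUE. For the paired case, pass both
variables and add paired=TRUE.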
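The normality check and the fallback to the Wilcoxon test can be sketched together. The variable mtcars$hp, the assumed value of 120 and the 0.05 cut-off are illustrative assumptions.

```r
x <- mtcars$hp                      # stand-in variable; 120 is a hypothetical assumed value

normality <- shapiro.test(x)        # note the lower-case function name
print(normality)

# Use the t-test when normality is not rejected, otherwise the Wilcoxon test
if (normality$p.value > 0.05) {
  result <- t.test(x, mu = 120)
} else {
  result <- wilcox.test(x, mu = 120, exact = FALSE)
}
print(result)

# conf.level is only used by wilcox.test when conf.int = TRUE
ci <- wilcox.test(x, mu = 120, alternative = "greater",
                  conf.int = TRUE, conf.level = 0.90, exact = FALSE)
print(ci$conf.int)
```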
5. To compare the means of more than two different populations (variables) we need to perform
One-Way ANOVA. The commands for this are as follows:

Step-1: Access or create a data set with one column per group.
Step-2: Merge all the columns (variables) into one column using the command
file=stack(dataset_name). The result has two columns: values (the observations) and ind (the
name of the column each observation came from).
Step-3: Run the following commands to get the output of ANOVA:
results=aov(values~ind, data=file)
summary(results)
Step-4: Run the following command to get the output of the post hoc test (Tukey HSD):
Posthoc=TukeyHSD(results)
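The four steps can be sketched on a small made-up data set. The group names and scores below are hypothetical and exist only to make the example runnable.

```r
# Step-1: hypothetical wide data set, one column of scores per group (made-up values)
scores <- data.frame(
  groupA = c(12, 15, 14, 10, 13),
  groupB = c(18, 20, 17, 19, 21),
  groupC = c(11,  9, 12, 10, 13)
)

# Step-2: stack() produces two columns named 'values' and 'ind'
long <- stack(scores)
print(head(long))

# Step-3: one-way ANOVA
results <- aov(values ~ ind, data = long)
print(summary(results))

# Step-4: Tukey HSD post hoc test (note the spelling: TukeyHSD)
posthoc <- TukeyHSD(results)
print(posthoc)
```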

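6. To test the association between categorical variables, or the equality of proportions, we
need to perform a chi-square test. The command expects a contingency table (or matrix) of counts:
chisq.test(dataset_name)
If the data set holds raw observations, build the table first:
chisq.test(table(dataset_name$variable1, dataset_name$variable2))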
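A sketch of both ways of calling the test, first from a table of counts and then from raw observations. All counts and labels are made up for illustration.

```r
# Hypothetical 2x2 table of counts (made-up numbers)
counts <- matrix(c(30, 20, 25, 45), nrow = 2,
                 dimnames = list(gender = c("male", "female"),
                                 preference = c("A", "B")))
test <- chisq.test(counts)
print(test)
print(test$expected)        # expected counts under independence

# The same data as raw observations: tabulate first, then test
raw <- data.frame(
  gender     = rep(c("male", "female", "male", "female"), times = c(30, 20, 25, 45)),
  preference = rep(c("A", "A", "B", "B"),                 times = c(30, 20, 25, 45))
)
raw_test <- chisq.test(table(raw$gender, raw$preference))
print(raw_test$p.value)
```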
7. To measure the relationship between two or more variables, we need to perform correlation
analysis. cor() has no data= argument, so refer to the variables as dataset_name$x and
dataset_name$y. The commands for this are as follows:
cor(dataset_name)   (correlation matrix of all columns; they must be numeric)
cor(dataset_name$x, dataset_name$y)

For correlation analysis with missing values:

cor(dataset_name$x, dataset_name$y, use="complete.obs")

For hypothesis testing using correlation analysis:

cor.test(dataset_name$x, dataset_name$y)
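A brief sketch using two data sets that ship with R, chosen here for illustration: mtcars for the correlation matrix and airquality, whose Ozone column contains missing values.

```r
# Correlation matrix of several numeric columns
print(cor(mtcars[, c("mpg", "hp", "wt")]))

# airquality$Ozone has missing values, so the plain call returns NA
r_na <- cor(airquality$Ozone, airquality$Temp)
print(r_na)

# use = "complete.obs" drops the incomplete pairs first
r <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")
print(r)

# cor.test() removes incomplete pairs itself and adds a p-value and confidence interval
ct <- cor.test(airquality$Ozone, airquality$Temp)
print(ct)
```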
8. To assess the impact of one variable on another we need to perform a regression analysis.
In the formula x~y, x is the dependent variable and y is the predictor:
results=lm(x~y, data=dataset_name)
summary(results)
9. To assess the impact of more than one variable on another we need to perform a multiple
regression analysis:
results=lm(x~y+z, data=dataset_name)
summary(results)
To check multicollinearity in multiple regression analysis we need to compute the VIF using
the following commands (the car package must be installed):
library(car)
vif(results)
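A sketch of simple and multiple regression on mtcars (a stand-in data set). To stay within base R, the VIF is computed by hand from its definition, 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others; for a plain linear model with numeric predictors this should agree with what vif() from the car package reports.

```r
# mpg is the dependent variable; wt and hp are predictors (illustrative choice)
simple   <- lm(mpg ~ wt, data = mtcars)
multiple <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(simple))
print(summary(multiple))

# VIF for wt by hand: regress wt on the remaining predictor(s)
r2_wt  <- summary(lm(wt ~ hp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
print(vif_wt)
```

Note that the object name must match exactly in both calls: R treats Results and results as different objects.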
10. To perform logistic regression analysis (used when the dependent variable is categorical
with two levels and follows a binomial distribution):
results=glm(x~y+z, data=dataset_name, family=binomial)
summary(results)
Without family=binomial, glm() fits an ordinary linear (Gaussian) model.
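11. To get predicted values from a simple/multiple/logistic regression, use the following
command:
predict(results, data.frame(y=given_value, z=given_value))
For logistic regression, add type="response" to get predicted probabilities instead of log-odds:
predict(results, data.frame(y=given_value, z=given_value), type="response")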
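A sketch of logistic regression followed by prediction. The choice of mtcars, the model am ~ wt + hp and the values for the new observation are illustrative assumptions.

```r
# am (0 = automatic, 1 = manual) is a binary dependent variable in mtcars
logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
print(summary(logit))

# Hypothetical new observation; column names must match the predictors
new_car <- data.frame(wt = 2.8, hp = 120)

log_odds <- predict(logit, new_car)                     # default: link scale (log-odds)
prob     <- predict(logit, new_car, type = "response")  # predicted probability
print(c(log_odds = unname(log_odds), prob = unname(prob)))
```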
12. Doing exploratory factor analysis in R (used to reduce items into factors):
Step-1: Check sampling adequacy:
library(psych)
KMO(dataset_name)
Step-2: Check scale reliability (Cronbach's alpha, also from the psych package):
alpha(dataset_name)
Step-3: Estimate the number of factors to be extracted:
PCA=princomp(dataset_name)
summary(PCA)
screeplot(PCA)
Step-4: Perform the factor analysis:
factanal(dataset_name, factors=no_of_factors, rotation="varimax", scores="regression")
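Steps 3 and 4 use only base R and can be sketched on simulated data. The six items and two latent factors below are made up for illustration; steps 1 and 2 are left out because they need the psych package.

```r
set.seed(1)
n  <- 300
f1 <- rnorm(n)   # hypothetical latent factor 1
f2 <- rnorm(n)   # hypothetical latent factor 2

# Six simulated items: items 1-3 driven by f1, items 4-6 by f2
items <- data.frame(
  item1 = f1 + rnorm(n, sd = 0.6),
  item2 = f1 + rnorm(n, sd = 0.6),
  item3 = f1 + rnorm(n, sd = 0.6),
  item4 = f2 + rnorm(n, sd = 0.6),
  item5 = f2 + rnorm(n, sd = 0.6),
  item6 = f2 + rnorm(n, sd = 0.6)
)

# Step-3: principal components to judge how many factors to extract
pca <- princomp(items, cor = TRUE)
print(summary(pca))

# Step-4: factor analysis; rotation and scores are quoted strings
efa <- factanal(items, factors = 2, rotation = "varimax", scores = "regression")
print(efa$loadings, cutoff = 0.3)
```

Calling screeplot(pca) afterwards draws the scree plot described in Step-3.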
13. Doing confirmatory factor analysis in R (it follows the results of exploratory factor
analysis and is used for scale validation). The model is written as a quoted string, with one
latent variable per line and no spaces inside the variable names:
library(lavaan)
model='factor1=~item1+item2+item3
factor2=~item4+item5+item6
factor3=~item7+item8'
results=cfa(model, data=dataset_name)
summary(results, fit.measures=TRUE)

14. Cluster analysis is used for grouping/classifying objects/populations/characteristics on
the basis of similarity.
Doing hierarchical clustering in R (clustering on the basis of the distance or dissimilarity
of one object from another):
Step-1: Compute the Euclidean distance:
D=dist(dataset_name)
Step-2: Perform hierarchical clustering on the basis of the Euclidean distance:
C=hclust(D)
Step-3: Draw the dendrogram:
plot(C)

Step-4: To show the cluster membership numerically (k may be a single number or a range such
as 2:4):

M=cutree(C, k=no_of_clusters)

Step-5: To show the cluster membership graphically (run this after plot(C)):

rect.hclust(C, k=no_of_clusters, border="blue")
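Steps 1, 2 and 4 can be sketched as follows. The choice of mtcars columns and of three clusters is illustrative, and the scale() call is an addition not found in the original notes: it standardises the variables so that none dominates the distance.

```r
# Stand-in data: three numeric columns of mtcars, standardised (scale() is an added step)
X <- scale(mtcars[, c("mpg", "hp", "wt")])

D <- dist(X)       # Step-1: Euclidean distance (the default method)
C <- hclust(D)     # Step-2: hierarchical clustering

M <- cutree(C, k = 3)           # Step-4: membership for 3 clusters
print(table(M))

M_range <- cutree(C, k = 2:4)   # a range of k gives a matrix, one column per k
print(head(M_range))
```

Running plot(C) followed by rect.hclust(C, k = 3, border = "blue") draws the dendrogram with the three clusters boxed.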

Doing K-means clustering in R (each observation is assigned to the cluster with the nearest
mean):

km=kmeans(dataset_name, centers=no_of_clusters)

To plot the clusters, load the cluster package:

library(cluster)

clusplot(dataset_name, km$cluster, labels=2)
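A sketch of k-means in base R on the iris measurements (a stand-in data set). The seed, the standardisation and nstart = 25 are additions for illustration: k-means starts from random centres, so a seed makes the run repeatable and several starts make the result more stable.

```r
set.seed(42)                      # k-means uses random starting centres
X <- scale(iris[, 1:4])           # numeric columns only, standardised (added step)

km <- kmeans(X, centers = 3, nstart = 25)
print(km$size)                            # number of observations per cluster
print(km$centers)                         # cluster means
print(table(km$cluster, iris$Species))    # compare clusters with the known species
```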

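15. To save outputs (objects) from the R environment to a file:

save(object_name, file="file_name.RData")

The file name must be a quoted string. The saved objects can be restored later with
load("file_name.RData").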
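A sketch of a save and load round trip. The object and the file name are hypothetical, and the file is written to a temporary directory so nothing is left behind.

```r
results <- lm(mpg ~ wt, data = mtcars)          # some output worth keeping

path <- file.path(tempdir(), "results.RData")   # hypothetical file name
save(results, file = path)

rm(results)                 # remove the object to show that load() restores it
loaded <- load(path)        # load() returns the names of the restored objects
print(loaded)
print(coef(results))
```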
16. To create a subset of a data set / to remove columns (variables) from a data set:
If you want to remove the variables in the 2nd and 5th columns, run the following command:
Mydata=dataset_name[c(-2,-5)]
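A sketch on mtcars (a stand-in data set), where the 2nd and 5th columns are cyl and drat. Dropping by name is shown as an alternative that does not depend on column order.

```r
# Drop the 2nd and 5th columns by position
Mydata <- mtcars[c(-2, -5)]
print(names(Mydata))

# Alternative: drop the same columns by name
by_name <- mtcars[, !(names(mtcars) %in% c("cyl", "drat"))]
print(names(by_name))
```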
