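1. To get descriptive statistics such as the minimum, quartiles, median, mean and maximum for
every variable in a data set, use the command: summary(dataset name)
Note that summary() does not report the standard deviation; use sd(dataset name$variablename)
for that.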
2. To get the minimum, quartiles, median, mean and maximum for a single variable in a data set,
use the command: summary(dataset name$variablename)
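A minimal sketch of items 1 and 2, assuming the built-in mtcars data set stands in for "dataset name" and its mpg column for "variablename" (both are illustrative choices, not part of the original notes):

```r
# mtcars ships with base R; it stands in for "dataset name" here.
data(mtcars)

# Minimum, quartiles, median, mean and maximum for every column.
summary(mtcars)

# The same six figures for a single variable.
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# summary() does not include the standard deviation; sd() does.
mpg_mean <- mean(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)
print(c(mean = mpg_mean, sd = mpg_sd))
```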
3. To compare the actual mean of a variable with an assumed mean we need to perform a one-sample
t-test. Supply the assumed mean through mu (the default is mu=0) and refer to the variable as
dataset name$variablename, because this form of t.test does not use a data argument. The
commands for this are mentioned as below:
a. For two tailed test: t.test(dataset name$variablename, mu=assumed mean)
b. For right tailed test: t.test(dataset name$variablename, mu=assumed mean,
alternative="greater")
c. For left tailed test: t.test(dataset name$variablename, mu=assumed mean,
alternative="less")
d. For right tailed test at 90% level of confidence: t.test(dataset name$variablename,
mu=assumed mean, alternative="greater", conf.level=0.90)
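The four variants above can be sketched as follows. The mtcars data set and the assumed mean of 20 are illustrative assumptions only:

```r
data(mtcars)

# H0: the mean of mpg equals the assumed mean of 20 (an arbitrary value for illustration).
two_sided <- t.test(mtcars$mpg, mu = 20)
right     <- t.test(mtcars$mpg, mu = 20, alternative = "greater")
left      <- t.test(mtcars$mpg, mu = 20, alternative = "less")
right_90  <- t.test(mtcars$mpg, mu = 20, alternative = "greater", conf.level = 0.90)

# Each result is a list; the p-value and confidence interval can be pulled out directly.
print(two_sided$p.value)
print(right_90$conf.int)
```

The option values "greater" and "less" are character strings, so the quotes are required.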
4. To compare the means of two related measurements (for example, the same subjects measured
before and after a treatment) we need to perform a paired sample t-test. Both variables must
have the same length, and each is referred to as dataset name$variable. The commands for this
are mentioned as below:
a. For two tailed test: t.test(dataset name$variable1, dataset name$variable2, paired=TRUE)
b. For right tailed test: t.test(dataset name$variable1, dataset name$variable2, paired=TRUE,
alternative="greater")
c. For left tailed test: t.test(dataset name$variable1, dataset name$variable2, paired=TRUE,
alternative="less")
d. For right tailed test at 90% level of confidence: t.test(dataset name$variable1,
dataset name$variable2, paired=TRUE, alternative="greater", conf.level=0.90)
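A small sketch of the paired test, using a made-up data frame (the names scores, before and after, and all values, are hypothetical):

```r
# Hypothetical measurements on the same eight subjects before and after a treatment.
scores <- data.frame(
  before = c(72, 68, 75, 80, 66, 71, 77, 69),
  after  = c(70, 65, 74, 76, 66, 68, 75, 66)
)

# Two tailed paired test: is the mean of (before - after) different from zero?
paired_two_sided <- t.test(scores$before, scores$after, paired = TRUE)

# Right tailed paired test at 90% confidence: is before greater than after on average?
paired_right_90 <- t.test(scores$before, scores$after, paired = TRUE,
                          alternative = "greater", conf.level = 0.90)

print(paired_two_sided)

# A paired t-test is the same as a one-sample t-test on the differences.
one_sample_on_diff <- t.test(scores$before - scores$after, mu = 0)
```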
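5. To perform a t-test, we need to satisfy the assumption of normality, which can be checked with
the Shapiro-Wilk test. The command for this is: shapiro.test(dataset name$variablename). R is
case sensitive, so the function name must start with a lowercase s.
If normality is not confirmed we need to perform the U-test (Mann-Whitney / Wilcoxon rank-sum
test) instead. The command for the U-test is:
wilcox.test(dataset name$variable1, dataset name$variable2)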
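A minimal sketch of the normality check followed by the U-test, assuming mtcars as the data set and the am column (0 = automatic, 1 = manual) as an illustrative way of forming two independent groups:

```r
data(mtcars)

# Shapiro-Wilk test: a p-value above 0.05 gives no evidence against normality.
norm_check <- shapiro.test(mtcars$mpg)
print(norm_check)

# Two independent groups of mpg values.
auto_mpg   <- mtcars$mpg[mtcars$am == 0]
manual_mpg <- mtcars$mpg[mtcars$am == 1]

# Mann-Whitney U-test (Wilcoxon rank-sum test).
# exact = FALSE uses the normal approximation, which avoids a warning when values are tied.
u_test <- wilcox.test(auto_mpg, manual_mpg, exact = FALSE)
print(u_test)
```

For paired data, wilcox.test also accepts paired=TRUE, which gives the Wilcoxon signed-rank test.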
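6. To test the association between categorical variables, or to check the equality of
proportions, we need to perform a chi-square test. The test works on a table of counts. If the
data set already holds the counts, use:
chisq.test(dataset name)
For raw data (one row per observation), build the table first:
chisq.test(table(dataset name$variable1, dataset name$variable2))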
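A sketch of the raw-data case, with a made-up survey of 100 respondents (the data frame survey and its columns gender and choice are hypothetical):

```r
# Hypothetical raw data: one row per respondent.
survey <- data.frame(
  gender = rep(c("F", "M"), times = c(50, 50)),
  choice = c(rep(c("A", "B"), times = c(30, 20)),   # 50 F respondents
             rep(c("A", "B"), times = c(10, 40)))   # 50 M respondents
)

# Cross-tabulate the two categorical variables into a table of counts.
tab <- table(survey$gender, survey$choice)
print(tab)

# Chi-square test of association (Yates' continuity correction is applied to 2x2 tables by default).
chi <- chisq.test(tab)
print(chi)

# Expected counts under independence; these should not be too small for the test to be reliable.
print(chi$expected)
```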
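7. To measure the relationship between two or more variables, we need to perform correlation
analysis. cor() has no data argument, so refer to variables as dataset name$x. The commands for
this are mentioned as below:
cor(dataset name) (correlation matrix; all columns must be numeric)
cor(dataset name$x, dataset name$y)
cor(dataset name$x, dataset name$y, use="complete.obs") (leaves out rows with missing values)
cor.test(~ x + y, data=dataset name) (tests whether the correlation differs from zero)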
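The correlation commands can be tried out as below, again assuming mtcars as a stand-in data set; the missing value is introduced on purpose to show what use="complete.obs" does:

```r
data(mtcars)

# Correlation matrix for a few numeric columns.
print(cor(mtcars[, c("mpg", "wt", "hp")]))

# Correlation between two variables.
r <- cor(mtcars$mpg, mtcars$wt)
print(r)

# With a missing value, cor() returns NA unless incomplete pairs are dropped.
mpg_with_na <- mtcars$mpg
mpg_with_na[1] <- NA
r_na       <- cor(mpg_with_na, mtcars$wt)
r_complete <- cor(mpg_with_na, mtcars$wt, use = "complete.obs")

# Significance test; the formula form is the one that accepts a data argument.
ct_vectors <- cor.test(mtcars$mpg, mtcars$wt)
ct_formula <- cor.test(~ mpg + wt, data = mtcars)
print(ct_formula)
```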
8. To assess the impact of one variable (y) on another (x) we need to perform a regression
analysis. The dependent variable goes on the left of ~ and the independent variable on the right:
results=lm(x~y,data=dataset name)
summary(results)
9. To assess the impact of more than one variable on another we need to perform a multiple
regression analysis. R is case sensitive, so the object name must be spelled the same way in
both commands:
results=lm(x~y+z,data=dataset name)
summary(results)
To check multicollinearity in multiple regression analysis we need to compute the VIF (variance
inflation factor) using the following commands:
require(car)
vif(results)
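A sketch of simple and multiple regression with mtcars as a stand-in data set. To keep the example free of extra packages, the VIF is computed by hand from its definition, 1/(1 - R squared), where R squared comes from regressing one predictor on the others; this is an illustration of what vif() reports, not a replacement for it:

```r
data(mtcars)

# Simple regression: mpg (dependent) on wt (independent).
simple <- lm(mpg ~ wt, data = mtcars)
print(summary(simple))

# Multiple regression: mpg on wt and hp.
multi <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(multi))

# VIF for wt by hand: regress wt on the remaining predictor(s) and use 1 / (1 - R^2).
r2_wt  <- summary(lm(wt ~ hp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
print(vif_wt)
```

A common rule of thumb treats VIF values above 5 or 10 as a sign of problematic multicollinearity.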
10. To perform logistic regression analysis (it is used when the dependent variable is
categorical in nature and follows a binomial distribution), set family=binomial. Without it, glm
fits an ordinary linear model:
results=glm(x~y+z,data=dataset name,family=binomial)
summary(results)
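A minimal sketch of a logistic regression, assuming mtcars as the data set and its 0/1 column am as the binary dependent variable (an illustrative choice):

```r
data(mtcars)

# am is coded 0/1, so it can be used directly as a binomial response.
logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
print(summary(logit))

# Coefficients are on the log-odds scale; exponentiate them to get odds ratios.
odds_ratios <- exp(coef(logit))
print(odds_ratios)

# Fitted probabilities for the observations used in the fit.
fitted_prob <- fitted(logit)
```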
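11. To get predicted values from a simple, multiple or logistic regression, use the following
command. The column names in the data frame must match the predictor names used in the model.
For a logistic regression, add type="response" to get probabilities instead of log-odds:
predict(results,data.frame(y=the given value, z=the given value))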
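A sketch of prediction for both a linear and a logistic model, assuming mtcars as the data set; the new values wt = 3 and hp = 120 are arbitrary:

```r
data(mtcars)

# New observation; column names must match the predictors in the model formula.
new_car <- data.frame(wt = 3, hp = 120)

# Linear model: the prediction is on the scale of the dependent variable.
lin <- lm(mpg ~ wt + hp, data = mtcars)
pred_mpg <- predict(lin, newdata = new_car)
print(pred_mpg)

# Logistic model: type = "response" returns a probability between 0 and 1.
logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
pred_prob <- predict(logit, newdata = new_car, type = "response")
print(pred_prob)
```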
12. Doing exploratory factor analysis in R (it is used to reduce items into factors)
Step-1: check sampling adequacy:
require(psych)
KMO(dataset name)
Step-2: check scale reliability:
require(psych)
alpha(dataset name)
Step-3: estimate the number of factors to be extracted:
PCA=princomp(dataset name)
summary(PCA)
screeplot(PCA)
Step-4 : perform the factor analysis:
factanal(dataset name, factors=no of factors to be extracted, rotation="varimax",
scores="regression")
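Steps 3 and 4 can be sketched with base R alone. The data below are simulated so that six items load on two underlying factors; the names items, q1 to q6 and the simulation settings are hypothetical, and cor=TRUE (PCA on the correlation matrix) is a choice made here rather than something the notes specify:

```r
set.seed(42)
n <- 300

# Two hypothetical latent factors and six items, three items per factor.
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  q1 = f1 + rnorm(n, sd = 0.5),
  q2 = f1 + rnorm(n, sd = 0.5),
  q3 = f1 + rnorm(n, sd = 0.5),
  q4 = f2 + rnorm(n, sd = 0.5),
  q5 = f2 + rnorm(n, sd = 0.5),
  q6 = f2 + rnorm(n, sd = 0.5)
)

# Step-3: principal components; eigenvalues above 1 suggest how many factors to keep.
pca <- princomp(items, cor = TRUE)
print(summary(pca))
eigenvalues <- pca$sdev^2
screeplot(pca)

# Step-4: factor analysis with varimax rotation and regression factor scores.
efa <- factanal(items, factors = 2, rotation = "varimax", scores = "regression")
print(efa$loadings, cutoff = 0.3)
```

The KMO() and alpha() checks in Steps 1 and 2 need the psych package and are left out of this sketch.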
13. Doing confirmatory factor analysis in R (it follows the results of exploratory factor analysis
and is used for scale validation). The model is written as one quoted string, with one line per
latent variable, and latent variable names must not contain spaces:
require(lavaan)
model='factor1=~item1+item2+item3
factor2=~item4+item5+item6
factor3=~item7+item8'
results=cfa(model,data=dataset name)
summary(results, fit.measures=TRUE)
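A sketch of a three-factor CFA, assuming the lavaan package is installed and using its bundled HolzingerSwineford1939 data set with items x1 to x9 (the factor names visual, textual and speed follow the lavaan tutorial, not these notes):

```r
library(lavaan)

# The model is a single quoted string; =~ reads "is measured by".
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Fit the model; HolzingerSwineford1939 is loaded together with lavaan.
fit <- cfa(hs_model, data = HolzingerSwineford1939)

# Parameter estimates plus fit indices such as CFI, TLI and RMSEA.
summary(fit, fit.measures = TRUE)

# Individual fit measures can also be extracted by name.
fit_indices <- fitMeasures(fit, c("cfi", "rmsea"))
print(fit_indices)
```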
14. Doing K-means clustering in R (each observation is assigned to the cluster with the nearest
mean)
require(cluster)