HOMEWORK
Excel fluency
MR-B
Raghav Sharma
2018229
Cleaning up of data:
Two rows had missing values in Gender. Because Gender is categorical rather than continuous, these values cannot be imputed, so we delete the two rows.
Next, we replace the missing values in the other variables with the series mean.
Descriptive Statistics
Variable       N    Missing  Mean     Median   Mode   Std. Dev.  Variance  Skewness  SE Skew.  Kurtosis  SE Kurt.  Min    Max
Playful_1      382  2        2.7304   3.0000   2.00   .83120     .691      .291      .125      -.395     .249      1.00   5.00
Playful_2      382  2        2.8586   3.0000   2.00   .89379     .799      .104      .125      -.804     .249      1.00   5.00
Playful_3      382  2        2.7225   3.0000   2.00   .88526     .784      .299      .125      -.475     .249      1.00   5.00
Playful_4      382  2        3.0916   3.0000   4.00   .91329     .834      -.286     .125      -.681     .249      1.00   5.00
Playful_5      382  2        2.9948   3.0000   4.00   .94465     .892      -.083     .125      -.883     .249      1.00   5.00
Playful_6      382  2        3.0262   3.0000   4.00   .95260     .907      -.107     .125      -.925     .249      1.00   5.00
Playful_7      382  2        2.8796   3.0000   3.00   .89686     .804      .064      .125      -.627     .249      1.00   5.00
CompLatent_1   382  2        2.6518   2.0000   2.00   .97029     .941      .277      .125      -.798     .249      1.00   5.00
CompLatent_3   382  2        3.2382   3.0000   4.00   .96299     .927      -.422     .125      -.700     .249      1.00   5.00
CompLatent_4   382  2        2.5288   2.0000   2.00   .99234     .985      .438      .125      -.430     .249      1.00   5.00
CompLatent_5   382  2        2.6702   2.0000   2.00   1.03030    1.062     .304      .125      -.876     .249      1.00   5.00
AtypUse_1      382  2        2.3979   2.0000   2.00   .93842     .881      .835      .125      .193      .249      1.00   5.00
AtypUse_2      382  2        2.3586   2.0000   2.00   .97726     .955      .706      .125      -.090     .249      1.00   5.00
AtypUse_3      382  2        2.2042   2.0000   2.00   .82620     .683      .811      .125      .596      .249      1.00   5.00
AtypUse_4      382  2        2.2513   2.0000   2.00   .91341     .834      .707      .125      .020      .249      1.00   5.00
AtypUse_5      382  2        2.2461   2.0000   2.00   .85553     .732      .794      .125      .449      .249      1.00   5.00
Useful_1       382  2        4.0262   4.0000   4.00   .70940     .503      -.703     .125      .989      .249      2.00   5.00
Useful_4       382  2        3.9791   4.0000   4.00   .69083     .477      -.597     .125      1.135     .249      1.00   5.00
Useful_6       382  2        4.0681   4.0000   4.00   .69914     .489      -.881     .125      2.176     .249      1.00   5.00
Useful_7       382  2        4.0890   4.0000   4.00   .68920     .475      -.359     .125      -.055     .249      2.00   5.00
Joy_2          382  2        3.4895   4.0000   4.00   .92990     .865      -.452     .125      -.189     .249      1.00   5.00
Joy_3          382  2        3.5393   4.0000   4.00   .90914     .827      -.623     .125      .250      .249      1.00   5.00
Joy_4          382  2        3.7199   4.0000   4.00   .83088     .690      -.982     .125      1.270     .249      1.00   5.00
Joy_5          382  2        3.3717   3.0000   4.00   .95188     .906      -.289     .125      -.435     .249      1.00   5.00
Joy_6          381  3        3.6404   4.0000   4.00   .85492     .731      -.659     .125      .346      .249      1.00   5.00
Joy_7          382  2        3.7173   4.0000   4.00   .85951     .739      -.767     .125      .678      .249      1.00   5.00
InfoAcq_1      382  2        3.9948   4.0000   4.00   .64395     .415      -.825     .125      2.004     .249      2.00   5.00
InfoAcq_2      382  2        3.8534   4.0000   4.00   .68694     .472      -.681     .125      .946      .249      2.00   5.00
InfoAcq_4      382  2        3.9188   4.0000   4.00   .70708     .500      -.869     .125      1.880     .249      1.00   5.00
InfoAcq_5      382  2        3.7016   4.0000   4.00   .79364     .630      -.586     .125      .513      .249      1.00   5.00
DecQual_2      382  2        3.8796   4.0000   4.00   .72536     .526      -.891     .125      1.892     .249      1.00   5.00
DecQual_3      382  2        3.8796   4.0000   4.00   .73969     .547      -.626     .125      .980      .249      1.00   5.00
DecQual_4      382  2        3.7513   4.0000   4.00   .76226     .581      -.727     .125      .989      .249      1.00   5.00
DecQual_5      382  2        3.7513   4.0000   4.00   .71610     .513      -.671     .125      1.034     .249      1.00   5.00
DecQual_6      382  2        3.91     4.00     4      .813       .661      -.715     .125      .621      .249      1      5
DecQual_7      382  2        3.82     4.00     4      .740       .547      -.520     .125      .733      .249      1      5
DecQual_8      382  2        3.25     3.00     3      .899       .808      -.059     .125      -.467     .249      1      5
Age            380  4        21.6500  22.0000  22.00  2.12067    4.497     1.366     .125      7.369     .250      17.00  35.00
Education (years of formal college/university education completed)
               382  2        2.1832   2.0000   2.00   .94906     .901      1.534     .125      11.746    .249      .00    10.00
Frequency      384  0        5.10     5.00     6      1.596      2.546     -.287     .125      -.506     .248      1      8
Experience     381  3        4.4199   3.0000   3.00   2.94190    8.655     1.581     .125      6.627     .249      .00    25.00
Inference:
1. Minimum & Maximum: To confirm these, we check the "Variable view" of the SPSS data sheet and verify that the range defined there matches the data we have. In this case it does.
2. Mean: The mean of every variable lies between its minimum and maximum values, which shows there is no error there.
3. Standard Deviation: This shows the variability (spread) of the data.
4. Skewness: The skewness is negative for 26 of the 37 variables, so most of the data is skewed towards the left (a long left tail). The histograms obtained for these variables confirm this.
5. Kurtosis: For 26 of the 37 variables the (excess) kurtosis is below 0, which tells us that the outliers are not extreme and that the curve, when plotted, has a flatter peak than a normal distribution.
6. The ratio of skewness to its standard error can be used as a test of normality: reject normality if the ratio is less than -2 or greater than +2. A large positive skewness indicates a long right tail; an extreme negative value indicates a long left tail.
For the first row (Playful_1) the ratio is .291/.125 ≈ 2.3, and for the second row (Playful_2) it is .104/.125 ≈ .83.
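The skewness/standard-error rule of thumb from point 6 can be sketched directly. The responses below are synthetic (the real survey data is not reproduced here); only the standard-error formula, which depends on N alone, matches the table above.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 Likert item with the same N as the survey (synthetic data)
rng = np.random.default_rng(0)
item = rng.integers(1, 6, size=382).astype(float)

n = len(item)
skew = stats.skew(item)  # sample skewness, close to what SPSS reports
# SPSS standard error of skewness: sqrt(6n(n-1) / ((n-2)(n+1)(n+3)))
se_skew = np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Rule of thumb: reject normality if |skewness / SE| > 2
ratio = skew / se_skew
print(f"skewness={skew:.3f}, SE={se_skew:.3f}, ratio={ratio:.2f}")
```

For n = 382 the standard error evaluates to about .125, which is exactly the SE of skewness shown for every full-N row in the table.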
Test of Normality:
Tests of Normality (Kolmogorov-Smirnov and Shapiro-Wilk)
Since the p value is <0.05, the data is not normally distributed. We now change the confidence level to 99% (significance level 0.01) and check again.
Tests of Normality (Kolmogorov-Smirnov and Shapiro-Wilk)
Since the p value is still below the 0.01 threshold, the data is not normally distributed even at the 99% confidence level.
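A minimal sketch of the Shapiro-Wilk check, using a synthetic 5-point Likert item in place of the real data; discrete bounded data of this kind usually fails normality tests at a sample size of 382.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 Likert item (synthetic stand-in for a survey variable)
rng = np.random.default_rng(1)
likert = rng.integers(1, 6, size=382).astype(float)

w, p = stats.shapiro(likert)
print(f"Shapiro-Wilk W={w:.3f}, p={p:.4g}")
if p < 0.01:  # 99% confidence level
    print("Reject normality at the 0.01 level")
```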
Correlation:
Factor Analysis:
KMO and Bartlett's Test: Sig. = .000
This table shows two tests that indicate the suitability of your data for structure detection. The Kaiser-Meyer-Olkin
Measure of Sampling Adequacy is a statistic that indicates the proportion of variance in your variables that might be
caused by underlying factors. High values (close to 1.0) generally indicate that a factor analysis may be useful with
your data. If the value is less than 0.50, the results of the factor analysis probably won't be very useful.
Since the value is very close to 1, it means factor analysis would be useful.
Bartlett's test of sphericity tests the hypothesis that your correlation matrix is an identity matrix, which would
indicate that your variables are unrelated and therefore unsuitable for structure detection. Small values (less than
0.05) of the significance level indicate that a factor analysis may be useful with your data.
Since the value is < 0.05, it means factor analysis would be useful.
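Bartlett's statistic itself is straightforward to compute. The sketch below implements the textbook formula (the same test SPSS reports) on synthetic correlated data; the two-factor data-generating structure is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix.
    X: (n, p) data matrix. Returns (chi-square statistic, p-value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Textbook statistic: -(n - 1 - (2p + 5)/6) * ln(det(R)), df = p(p-1)/2
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0
    return stat, chi2.sf(stat, df)

# Hypothetical correlated items driven by two underlying factors, n = 382
rng = np.random.default_rng(2)
f = rng.normal(size=(382, 2))
X = f @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(382, 6))

stat, p = bartlett_sphericity(X)
print(f"chi2={stat:.1f}, p={p:.4g}")  # a small p suggests factor analysis is useful
```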
Table of Communalities:
It tells us to what extent our 8 underlying factors account for the variance of the input variables; this is answered by the R-squared values. Variables with low communalities (< 0.40) don't contribute much to measuring the underlying factors, so we can remove such variables from the analysis.
Table of Component Matrix:
The component matrix shows the Pearson correlations between the items and the components. These correlations are called factor loadings.
Component Matrix (components 1-8)
For instance, V12 measures (correlates with) components 1, 2, and 3. If a variable has more than one substantial factor loading, we call those cross loadings.
Rotated Component Matrix (components 1-7):
Playful_1 .657
Playful_2 .828
Playful_3 .759
Playful_4 .710
Playful_5 .809
Playful_6 .825
Playful_7 .653
CompLatent_1 .778
CompLatent_3 .774
CompLatent_4 .769
CompLatent_5 .723
AtypUse_1 .797
AtypUse_2 .851
AtypUse_3 .886
AtypUse_4 .866
AtypUse_5 .869
Useful_1 .806
Useful_4 .845
Useful_6 .767
Useful_7 .713
Joy_2 .815
Joy_3 .783
Joy_4 .811
Joy_5 .801
Joy_7 .831
InfoAcq_1 .676
InfoAcq_2 .674
InfoAcq_4 .480 .596
InfoAcq_5 .550
DecQual_2 .764
DecQual_3 .819
DecQual_4 .789
DecQual_5 .775
DecQual_6 .520
DecQual_7 .655
DecQual_8 .589
Gender .
SMEAN(Useful_2) .793
SMEAN(Useful_3) .832
SMEAN(Useful_5) .838
SMEAN(Joy_1) .687
SMEAN(InfoAcq_3) .682
SMEAN(DecQual_1) .570 .435
SMEAN(Joy_6) .782
For instance, all the variables containing "Playful" are best explained by component 4.
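A rotated solution like the one above can be reproduced in outline with scikit-learn's FactorAnalysis and a varimax rotation. The data below is synthetic: two hypothetical constructs (stand-ins for, say, "Playful" and "Useful") each drive three items, so after rotation each item should load mainly on one factor.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Two synthetic blocks of three items, each driven by its own latent factor
rng = np.random.default_rng(3)
factor_a = rng.normal(size=(382, 1)) @ np.ones((1, 3)) + 0.4 * rng.normal(size=(382, 3))
factor_b = rng.normal(size=(382, 1)) @ np.ones((1, 3)) + 0.4 * rng.normal(size=(382, 3))
X = np.hstack([factor_a, factor_b])

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(X)
loadings = fa.components_.T  # rows = items, columns = rotated factors
for i, row in enumerate(loadings):
    print(f"item {i}: " + "  ".join(f"{v: .2f}" for v in row))
```

After varimax rotation each item's large loading concentrates in a single column, which is exactly the pattern read off the rotated component matrix above.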
Cronbach's alpha:
Cronbach's alpha ranges from 0 to 1; a cutoff of 0.7 is usually used in social science research, so an alpha of 0.7 or higher is generally considered reliable.
Factor 1: Cronbach's alpha ≈ 0.943
Factor 2: Cronbach's alpha ≈ 0.94
Factor 3: Cronbach's alpha ≈ 0.904
Factor 4: Cronbach's alpha ≈ 0.912
Factor 5: Cronbach's alpha ≈ 0.934
Factor 6: Cronbach's alpha ≈ 0.807
Factor 7: Cronbach's alpha ≈ 0.817
All seven factors are above the 0.7 cutoff, so each factor can be considered reliable.
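Cronbach's alpha can be computed directly from the item responses. The sketch below uses synthetic data for a hypothetical 7-item scale; only the standard formula, not the alpha values reported above, is reproduced.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array.
    alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical internally consistent scale: 7 items sharing one latent trait
rng = np.random.default_rng(4)
trait = rng.normal(size=(382, 1))
scale = trait + 0.5 * rng.normal(size=(382, 7))
alpha = cronbach_alpha(scale)
print(f"alpha = {alpha:.3f}")
```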
In the model fit summary, we look at the CMIN table, where the CMIN/DF value is 1.756, which is less than 3. Hence, our model can be considered fit.
After the CMIN/DF value, we look at the CFI (Comparative Fit Index) in the Baseline Comparisons, which is 0.894, close to 1. Hence, our factor analysis is confirmed at this step.
Finally, we look at the RMSEA (Root Mean Square Error of Approximation), which measures the badness of model fit. Our default model shows an RMSEA value of 0.048, which is well below the usual 0.08 cutoff. Hence, our model is a good fit.
Removing all the constructs, we obtain a clear version of the above path analysis depicting only the factors
of the model.
Results for path analysis:
In the model fit summary, the CMIN/DF value is 2.369, which is less than 3. Hence, our model can be considered fit.
Next, the CFI (Comparative Fit Index) in the Baseline Comparisons is 0.838, reasonably close to 1. Hence, our factor analysis is confirmed at this step.
Finally, the RMSEA (Root Mean Square Error of Approximation), which measures the badness of model fit, is 0.064 for our default model, below the usual 0.08 cutoff. Hence, our model is a good fit.
Inference:
The first table provides basic information (mean, standard deviation, etc.) about the selected sample.
In the second table, we check the p value, which should be greater than 0.05 to retain the null hypothesis. Here, however, the p value is <0.05, so we reject the null hypothesis and infer that the average experience of the population is not 5.
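The one-sample t test can be sketched with scipy. The experience values below are synthetic (a right-skewed distribution with mean near 4, standing in for the actual Experience column), so the exact t and p values will differ from the SPSS output.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed "Experience" values in years (synthetic, n = 381)
rng = np.random.default_rng(5)
experience = rng.gamma(shape=2.0, scale=2.0, size=381)  # mean around 4

# H0: the population mean experience is 5
t, p = stats.ttest_1samp(experience, popmean=5)
print(f"t={t:.3f}, p={p:.4g}")
if p < 0.05:
    print("Reject H0: mean experience differs from 5")
```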
Here, we check the significance value of Levene's test first; since p is >0.05, the variances are homogeneous, so we read the significance value from the first row (equal variances assumed). That significance is >0.05, so the null hypothesis is retained: there is no significant difference in Creativity when interacting with Excel between males and females.
A paired t test cannot be performed because we do not have before-and-after measurements of the same variable.
To test the homogeneity of variances we run Levene's test and check the column named "Sig.". If the p value is >0.05, equal variances are assumed; if it is <0.05, they are not.
The means are significantly different: we reject the null hypothesis because p<0.05.
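The two-step procedure (Levene's test first, then the matching t test) can be sketched as follows. The two groups are synthetic stand-ins for the male and female creativity scores; the group sizes echo the report's 290 males and 92 females.

```python
import numpy as np
from scipy import stats

# Hypothetical creativity scores for two gender groups (synthetic data)
rng = np.random.default_rng(6)
male = rng.normal(loc=3.0, scale=0.8, size=290)
female = rng.normal(loc=3.0, scale=0.8, size=92)

# Step 1: Levene's test. p > 0.05 -> equal variances can be assumed.
w, p_lev = stats.levene(male, female)
equal_var = p_lev > 0.05

# Step 2: independent-samples t test, pooled (equal_var=True) or
# Welch (equal_var=False) depending on the Levene result.
t, p = stats.ttest_ind(male, female, equal_var=equal_var)
print(f"Levene p={p_lev:.3f}, t={t:.3f}, p={p:.3f}")
```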
Test 5: MANOVA
Research Question: Is there a significant difference in how playful users find Excel with respect to the number of times they use Excel and their gender?
Hypothesis (H0): There is no significant difference in how playful users find Excel with respect to the number of times they use Excel and gender.
Between-Subjects Factors
            Value  Label                 N
Frequency   2      Almost Never          27
            7      Daily                 38
            8      Multiple times daily  24
Gender      1      Male                  290
            2      Female                92
This table shows the basic data like mean, standard deviation, etc.
Descriptive Statistics
Box's Test of Equality of Covariance Matrices
Box's M  115.504
F        1.476
df1      72
df2      11948.872
Sig.     .006
Box's Test: It checks the equality of the covariance matrices. Here the p value (.006) is less than 0.05, and even at the 99% confidence level it remains below the threshold (.006 < .01). Therefore, we reject the null hypothesis: the covariance matrices are not homogeneous.
Multivariate Tests
Interpretation: We check only Pillai's trace and Wilks' lambda, looking at the significance value for each of the factors.
a) Frequency: The significance value is 0.287, so the test value is insignificant.
b) Gender: The significance value is 0.366, so the test value is insignificant.
c) Frequency * Gender: The significance value is 0.948, so the test value is insignificant.
Therefore, these factors have no significant effect.
To test the homogeneity of variances we run Levene's test and check the column named "Sig.". If the p value is >0.05, equal variances are assumed.
SPSS regression with default settings produces four tables. The most important is the last one, "Coefficients". A b coefficient is statistically significant if its p value is smaller than 0.05; since here it is greater than 0.05, the coefficient is insignificant.
The beta coefficients allow us to compare the relative strengths of our predictors; the beta here is 0.035.
The value of R is 0.035, which shows very weak (nearly no) correlation.
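A simple regression of this kind can be sketched with scipy. The data below is synthetic: the true slope is set near zero (0.03) to mirror the near-zero R reported above, so the fitted coefficient should come out small and insignificant.

```python
import numpy as np
from scipy import stats

# Hypothetical predictor/outcome pair with almost no linear relationship
rng = np.random.default_rng(7)
x = rng.normal(size=382)
y = 0.03 * x + rng.normal(size=382)

res = stats.linregress(x, y)
print(f"b={res.slope:.3f}, r={res.rvalue:.3f}, p={res.pvalue:.3f}")
if res.pvalue > 0.05:
    print("Coefficient not statistically significant")
```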
Test 6 – Discriminant Analysis:
Research Question: We want to predict the frequency of Excel usage from Useful_2, Joy_1, and InfoAcq_3.
The group statistics table gives the mean and standard deviation of each independent variable within each Frequency group.
Since the Sig. value is <0.05, the result is statistically significant.
In the pooled within-groups matrix we can see that we don't have highly correlated variables, which is one of the prerequisites of discriminant analysis. The log determinant values are nearly equal.
The Sig. value of Box's test is >0.01, so we fail to reject the null hypothesis of equal covariance matrices.
Standardized Canonical Discriminant Function
Coefficients
Function
1 2 3
The coefficients rank the variables by how much each one explains; as we can see, Useful_2 explains the most.
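The discriminant analysis can be approximated with scikit-learn's LinearDiscriminantAnalysis. The data below is synthetic: three stand-ins for Useful_2, Joy_1, and InfoAcq_3 predict a four-level usage-frequency group, with the Useful_2 stand-in deliberately given the strongest group separation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(8)
n_per = 90
groups = np.repeat([0, 1, 2, 3], n_per)  # four synthetic frequency levels
X = np.column_stack([
    groups * 0.8 + rng.normal(size=4 * n_per),  # "Useful_2": strongest separation
    groups * 0.2 + rng.normal(size=4 * n_per),  # "Joy_1": weak separation
    rng.normal(size=4 * n_per),                 # "InfoAcq_3": no separation
])

lda = LinearDiscriminantAnalysis()
lda.fit(X, groups)
# Coefficients of the first discriminant function: the largest absolute
# value marks the variable that discriminates the most.
print(np.abs(lda.scalings_[:, 0]))
```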
Structure Matrix
Function
1 2 3
Frequency categories: Almost Never, Once per month, 2-3 times per month, Once per week, 2-3 times per week, Daily, Multiple times daily
a. Hierarchical cluster
Interpretation:
According to the hierarchical cluster (dendrogram), the maximum number of good clusters is 4.
b. K-Means Cluster
ANOVA
Cluster Error
Mean Square df Mean Square df F Sig.
Playful 25.774 4 .268 377 96.123 .000
AtypUse 33.209 4 .297 377 111.627 .000
CompLatent 23.703 4 .386 377 61.364 .000
Useful 16.108 4 .199 377 81.083 .000
Joy 25.841 4 .319 377 80.990 .000
InfoAcq 13.515 4 .151 377 89.566 .000
DecQual 20.847 4 .242 377 86.089 .000
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
Interpretation: We first check the significance level of all the variables we have taken. Since the significance for all of them is less than 0.05 at the 95% confidence level, we can say that all these variables are significant.
Interpretation: We selected 5 clusters, but the number of cases in each cluster isn't uniform, so we change the number of clusters to 4 and check again.
Number of Cases in each Cluster
Cluster  1    96.000
         2   112.000
         3   103.000
         4    71.000
Valid        382.000
Missing         .000
There is still no uniformity in the cluster sizes, so we reduce the number of clusters by one more.
The sizes have now become fairly uniform, so we finalize 3 clusters.
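The K-means step can be sketched with scikit-learn. The cluster centres and group sizes below are synthetic assumptions, chosen so that three well-separated clusters of roughly equal size emerge from 382 cases in the seven-construct space.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
# Three hypothetical cluster centres in the 7-dimensional construct space
# (Playful, AtypUse, CompLatent, Useful, Joy, InfoAcq, DecQual)
centres = rng.normal(scale=2.0, size=(3, 7))
sizes_true = [128, 127, 127]  # 382 synthetic respondents in total
X = np.vstack([c + 0.5 * rng.normal(size=(n, 7))
               for c, n in zip(centres, sizes_true)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
print("cases per cluster:", sizes)
```

With well-separated centres the recovered cluster sizes match the generating sizes, which is the kind of roughly uniform split the report settles on.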
c. Two-step cluster