
APSY 607: Assignment 2
Jaylene Bettcher

Dear Mr. Drake (employer),

In order to provide you with several insights into the data that you have given me, I first had to prepare the data for analysis. Preparing the data ensures that the assumptions of the statistical measures used are met and that the data are an accurate representation of the general population, which allows us to make valid inferences. The following procedures were carried out to prepare your data for analysis; not to worry, the data were not deleted or altered in any way.

Data Preparation

Examining out of range values and missing values

I was very impressed to see that there is no missing data for any of the variables, which will contribute to the accuracy of the statistical measures. The data for each of the variables appear to be within range. However, it is worth noting that confidence in ability to do math has a relatively large range (62.4), which will be examined further throughout the process.

Ensuring plausible means and standard deviations

The mean and median of each variable are similar, which is consistent with a normal distribution. Again, it is important to note that confidence in ability to do math has a relatively large standard deviation (10.84), which may be indicative of greater variance among its values. I will continue to examine this variable.

Investigating univariate outliers

Figure 1. Investigating Univariate Outliers in Math Related Variables

The circles in the above graph mark values that lie more than 1.5 interquartile ranges (IQR) beyond the edge of the box. Although such a value differs from the rest of its distribution, it is not an extreme outlier. From this figure it is evident that there are no extreme univariate outliers that will distort the statistics and increase Type I and Type II errors.

Checking pairwise plots for non-linearity and heteroscedasticity

Figure 2. The Relationship between Confidence in Ability and Math Enjoyment

The data in figure 2 indicate a moderately strong positive relationship between confidence in ability to do math and math enjoyment. However, this is not of great concern: the pattern is not an elongated thin oval, so the relationship is not strong enough to suggest that the two variables are redundant.

Figure 3. The Relationship between Confidence in Ability and General Stress Scale

The relationship displayed in figure 3 between confidence in ability to do math and score on the general stress scale is fairly scattered and cloud-like, so it is unlikely that there is a significant correlation between the two variables. Many of the variable pairs examined were consistent with this pattern, which increases power as the variables remain independent.

Figure 4. The Relationship between Hours of Math Homework and Math Enjoyment

Figure 4 depicts a weak positive relationship between hours of math homework and math enjoyment. However, the values are not as scattered as those in the previous figure, which may indicate bias in the relationship; this will be examined further.

Examining skewness and kurtosis

Figure 5. Examining Skewness and Kurtosis among Math Related Variables
Statistics (N valid = 250 and N missing = 0 for every variable)

Variable                            Skewness   Std. Error of Skewness   Kurtosis   Std. Error of Kurtosis
confidence in ability to do math      .222            .154                -.376           .307
hours of math homework per month      .940            .154                2.994           .307
math enjoyment                        .009            .154                -.169           .307
teacher math support                 -.027            .154                -.512           .307
peer math support                    -.004            .154                -.132           .307
score on general stress scale        -.328            .154                 .391           .307
parent math enjoyment                -.198            .154                 .202           .307
age (years)                          -.002            .154                 .022           .307

Skewness and kurtosis are two components of normality. When a distribution is normal, the values of skewness and kurtosis should be close to 0. Most of the values in Figure 5 are close to zero, which is indicative of a normal distribution; the exception is hours of math homework per month (skewness .940, kurtosis 2.994), which departs from normality more than the other variables. It is important that the distributions are normal in order to reduce error.

Checking for multivariate outliers

Please see the multiple regression section for further details. I chose to report the check for multivariate outliers there because it is easier to understand when paired with a research question and a multiple regression equation.

Evaluating variables for multicollinearity and singularity

Figure 6. Evaluating Variables for Multicollinearity and Singularity in Math Variables

Correlations (N = 250 for every pair)

Variables: (1) confidence in ability to do math, (2) hours of math homework per month, (3) math enjoyment, (4) teacher math support, (5) peer math support, (6) score on general stress scale, (7) parent math enjoyment, (8) age (years)

Pearson Correlation
       (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
(1)    1         .429**    .616**    .053      .141*     .043     -.210**    .171**
(2)    .429**    1         .585**   -.033      .124*    -.003      .006     -.005
(3)    .616**    .585**    1        -.083      .245**    .043     -.226**   -.084
(4)    .053     -.033     -.083      1        -.290**   -.074      .120      .070
(5)    .141*     .124*     .245**   -.290**    1         .011     -.190**   -.100
(6)    .043     -.003      .043     -.074      .011      1         .005      .063
(7)   -.210**    .006     -.226**    .120     -.190**    .005      1         .040
(8)    .171**   -.005     -.084      .070     -.100      .063      .040      1

Sig. (2-tailed)
       (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
(1)    -         .000      .000      .406      .026      .495      .001      .007
(2)    .000      -         .000      .599      .050      .961      .925      .938
(3)    .000      .000      -         .192      .000      .500      .000      .187
(4)    .406      .599      .192      -         .000      .244      .058      .270
(5)    .026      .050      .000      .000      -         .861      .003      .115
(6)    .495      .961      .500      .244      .861      -         .940      .325
(7)    .001      .925      .000      .058      .003      .940      -         .529
(8)    .007      .938      .187      .270      .115      .325      .529      -

**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).

Figure 6 allows us to see that the two variables that have the most positive correlations are confidence in ability to do math and math enjoyment. Since these variables are significantly correlated with many other variables, they may weaken the analysis and provide inaccurate results. It is also important to note that parent math enjoyment is negatively correlated with many variables, which may also weaken results.
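Whether a correlation in Figure 6 is flagged as significant can be checked by converting r to a t statistic with N - 2 degrees of freedom. The following is a minimal Python sketch of that conversion, not the procedure SPSS itself ran; the function name pearson_t and the cutoff of roughly 1.97 (an approximate two-tailed .05 critical value for 248 degrees of freedom) are assumptions introduced for illustration.

```python
import math


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N = 250
# Approximate two-tailed .05 critical value for t with 248 df
# (an assumed constant for this sketch, not computed here).
T_CRIT_05 = 1.97

# A few correlations taken from Figure 6.
examples = {
    "confidence vs. math enjoyment": 0.616,
    "teacher support vs. peer support": -0.290,
    "confidence vs. teacher support": 0.053,
}

results = {}
for label, r in examples.items():
    t = pearson_t(r, N)
    exceeds = abs(t) > T_CRIT_05
    results[label] = (t, exceeds)
    print(f"{label}: r = {r:+.3f}, t = {t:+.2f}, exceeds cutoff: {exceeds}")
```

With N = 250, even a correlation as small as about .13 clears the cutoff, which is why several weak correlations in Figure 6 carry asterisks.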

Multiple Regression

Mr. Drake, after preparing your data for analysis, it is evident that teacher math support and peer math support are correlated at -.290. This seems problematic to me, as one would assume that teacher math support and peer math support would coexist in a classroom environment. Therefore, I thought it would be beneficial to further examine this correlation and its impact on students' enjoyment of math. The multiple regression analysis is based on the following research question: How much do teacher support and peer support contribute to math enjoyment in youth?

Figure 7. Descriptive Statistics of Math Related Variables
Descriptive Statistics

Variable                Mean       Std. Deviation   N
math enjoyment          20.0827    3.33981          250
teacher math support    27.5500    5.25000          250
peer math support       11.770     1.7000           250

The standard deviation for peer math support is 1.7, signifying low variance between scores, while the standard deviation for teacher math support is 5.25, signifying higher variance between scores. The standard deviation of 5.25 may be indicative of unequal variance, and therefore it should be examined further. Equal variance is important because it is an assumption of multiple regression analysis.

Figure 8. The Relationship between Teacher and Peer Math Support and Math Enjoyment
Correlations (N = 250 for every pair)

                                             math enjoyment   teacher math support   peer math support
Pearson Correlation   math enjoyment             1.000             -.083                  .245
                      teacher math support       -.083             1.000                 -.290
                      peer math support           .245             -.290                 1.000
Sig. (1-tailed)       math enjoyment              .                 .096                  .000
                      teacher math support        .096              .                     .000
                      peer math support           .000              .000                  .

Figure 8 shows a correlation of -.290 between teacher math support and peer math support, a correlation of -.083 between teacher math support and math enjoyment, and a correlation of .245 between peer math support and math enjoyment. Of the two predictors, only peer math support is significantly (and positively) correlated with math enjoyment. It is important that independent variables have a higher correlation with the dependent variable than they do with each other.

After further examining the predictors, I found that R square is .06, which means that the predictors account for 6% of the variability in the dependent variable. The Durbin-Watson statistic is 1.921, which is an acceptable level because it indicates that the residuals are most likely not autocorrelated. The ANOVA for the overall regression model returned a value of 7.94 (the F statistic), indicating that teacher and peer support together contribute to the variability of math enjoyment. A subsequent examination of the coefficients shows that peer math support is a significant predictor of math enjoyment, with a standardized coefficient of about .24, whereas teacher math support does not contribute significantly to math enjoyment.

To follow up on the Durbin-Watson statistic, collinearity statistics were examined to determine whether our variables are independent enough and not too highly correlated. The tolerance was .916, which indicates that the two independent variables are independent enough. A variance inflation factor (VIF) close to one is acceptable, and ours is 1.092, so we can presume that the independent variables are not too highly correlated and are acceptable for this model. On average the Mahalanobis distance is under the critical value, but there is one value that is well above it (11.687), which we have to investigate further to ensure that there are no multivariate outliers. To do so, I examined Cook's distance and found that there are no values over one, which means that there are no highly influential data points.

Figure 10. Histogram of Math Enjoyment

Figure 11. P-P Plot of Regression Standardized Residual

Figure 12. Scatterplot of the Math Related Variables
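Because the regression model has only two predictors, the collinearity statistics and coefficients reported for it can be cross-checked from the three correlations in Figure 8 alone. The sketch below uses the standard two-predictor formulas rather than the SPSS output itself; the variable names are illustrative, and small differences from the reported values are expected because the correlations are rounded to three decimals.

```python
# Correlations reported in Figure 8.
r_y_teacher = -0.083      # math enjoyment vs. teacher math support
r_y_peer = 0.245          # math enjoyment vs. peer math support
r_teacher_peer = -0.290   # teacher math support vs. peer math support

# With exactly two predictors, each predictor's tolerance is 1 - r^2,
# where r is the correlation between the two predictors.
tolerance = 1.0 - r_teacher_peer ** 2
vif = 1.0 / tolerance

# Standardized regression coefficients for the two-predictor case.
beta_peer = (r_y_peer - r_y_teacher * r_teacher_peer) / tolerance
beta_teacher = (r_y_teacher - r_y_peer * r_teacher_peer) / tolerance

# R square equals the sum of each beta times that predictor's
# correlation with the outcome.
r_square = beta_peer * r_y_peer + beta_teacher * r_y_teacher

print(f"tolerance    = {tolerance:.3f}")
print(f"VIF          = {vif:.3f}")
print(f"beta peer    = {beta_peer:.3f}")
print(f"beta teacher = {beta_teacher:.3f}")
print(f"R square     = {r_square:.3f}")
```

The tolerance and VIF recovered this way agree with the reported .916 and 1.092, and the peer support coefficient of roughly .24 matches the figure cited for that predictor.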

Figure 10 indicates that the dependent variable is normally distributed, figure 11 indicates that the standardized residuals are normally distributed, and figure 12 indicates a non-linear relationship. Therefore, it is evident that the assumptions for the multiple regression were met and that the results are accurate, which will aid us in drawing our inference.

Strengths and Limitations of a Multiple Regression

Strengths: A multiple regression has many stringent assumptions that must be met in order to ensure that your inference is accurate. The assumptions include: errors are independent, errors follow a normal distribution, and errors have constant variance.

Limitations: A limitation of using a multiple regression is that many people without statistical backgrounds assume that correlation means causation. For instance, they may look at this data and believe that peer support causes math enjoyment when, in fact, the data merely indicate that peer support is correlated with math enjoyment. Furthermore, a multiple regression can only predict a single dependent variable from a set of independent variables.

Recommendations (What I would have done differently)

Mr. Drake, it is important that you create your research questions before collecting your data in order to help eliminate biases. Also, many of the variables were similar, and as a result they were correlated with one another (several of them significantly, as Figure 6 shows). It is important that the independent variables are not highly correlated with one another because this weakens the analysis.

Inference

Since the assumptions were met, it is plausible to believe that our results are an accurate representation of the given population. The results tell us that teacher support has a weak, non-significant negative correlation with math enjoyment in youth (-.083), and that peer support is positively correlated with math enjoyment in youth (.245). However, these results need to be interpreted with caution, as the two independent variables were correlated with each other (-.290). These results may be beneficial to teachers of the given population; it may be of benefit for teachers to incorporate group projects, whereby peers are required to support one another, as peer support is positively correlated with math enjoyment.

MANOVA

It is important to review the criteria for the groups in order to eliminate confusion. Group 1 represents youth whose parents/guardians have a high socioeconomic status (SES), group 2 represents youth whose parents/guardians have an average SES, and group 3 represents youth whose parents/guardians have a low SES. I thought it would be interesting to examine whether SES impacts parent math enjoyment and, furthermore, youths' confidence in their ability to do math and the amount of time that they spend on math homework. The following research question was examined by the MANOVA: Do different levels of SES have significantly different impacts on parental math enjoyment, youths' confidence in their ability to do math, and the amount of time that youth dedicate to their math homework?

When looking at the between-subjects factors, we can see that there are 88 subjects in group 1 (high SES), 90 subjects in group 2 (average SES), and 72 subjects in group 3 (low SES). It is beneficial to have equal numbers in each group; unfortunately our groups are not equal, which may lead to biases in the results. Next I ran Box's Test of Equality of Covariance Matrices and found that the results were not significant, so we can assume that the covariance matrices are equal. Furthermore, the results of Bartlett's Test of Sphericity were significant, so we can assume that the variables are correlated highly enough to use in a multivariate analysis.

Figure 13. Multivariate Tests to determine significance of Math Related Variables
Multivariate Tests (c)

Effect      Statistic             Value     F              Hypothesis df   Error df   Sig.   Partial Eta Squared
Intercept   Pillai's Trace        .986      5773.811 (a)   3.000           245.000    .000   .986
            Wilks' Lambda         .014      5773.811 (a)   3.000           245.000    .000   .986
            Hotelling's Trace     70.700    5773.811 (a)   3.000           245.000    .000   .986
            Roy's Largest Root    70.700    5773.811 (a)   3.000           245.000    .000   .986
group       Pillai's Trace        .037      1.550          6.000           492.000    .160   .019
            Wilks' Lambda         .963      1.551 (a)      6.000           490.000    .160   .019
            Hotelling's Trace     .038      1.551          6.000           488.000    .160   .019
            Roy's Largest Root    .032      2.635 (b)      3.000           246.000    .050   .031

a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept + group
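The Partial Eta Squared column in Figure 13 can be related to the F column and its degrees of freedom. The sketch below applies the usual F-based identity, partial eta squared = (F x hypothesis df) / (F x hypothesis df + error df), to the reported rows; it is offered as a consistency check on the table rather than as a description of how SPSS computes the column, and the function name is an assumption made for this example.

```python
def partial_eta_squared(f_value: float, df_hyp: float, df_err: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_hyp) / (f_value * df_hyp + df_err)


# (label, F, hypothesis df, error df) taken from Figure 13.
rows = [
    ("Intercept, all four statistics", 5773.811, 3.0, 245.0),
    ("group, Pillai's Trace", 1.550, 6.0, 492.0),
    ("group, Wilks' Lambda", 1.551, 6.0, 490.0),
    ("group, Hotelling's Trace", 1.551, 6.0, 488.0),
    ("group, Roy's Largest Root", 2.635, 3.0, 246.0),
]

eta = {label: partial_eta_squared(f, dfh, dfe) for label, f, dfh, dfe in rows}
for label, value in eta.items():
    print(f"{label}: partial eta squared = {value:.3f}")
```

Read this way, the group effect accounts for about 2% to 3% of the multivariate variance, which fits the non-significant result.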

Wilks' Lambda is the best indicator of significance among these tests. According to Wilks' Lambda, the group effect is not significant (p = .160), and as such there may not be significant results with which to answer the research question. After running Levene's Test of Equality of Error Variances, it is clear that the error variances are equal across groups, which increases power. Furthermore, the Tests of Between-Subjects Effects indicated that there are no significant group differences in confidence in ability to do math, hours of math homework per month, or parent math enjoyment.

Figure 14. Residual Sums of Squares Cross Products Matrix of Math Related Variables
Residual SSCP Matrix

Columns: confidence = confidence in ability to do math, hours = hours of math homework per month, parent = parent math enjoyment

                                                           confidence      hours       parent
Sum-of-Squares and   confidence in ability to do math      28658.376     3257.289    -1102.557
Cross-Products       hours of math homework per month       3257.289     2124.273        5.980
                     parent math enjoyment                 -1102.557        5.980      905.200
Covariance           confidence in ability to do math        116.026       13.187       -4.464
                     hours of math homework per month         13.187        8.600         .024
                     parent math enjoyment                    -4.464         .024        3.665
Correlation          confidence in ability to do math          1.000         .417        -.216
                     hours of math homework per month           .417        1.000         .004
                     parent math enjoyment                     -.216         .004        1.000

Based on Type III Sum of Squares
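The three blocks of Figure 14 are tied together arithmetically: dividing the residual sums of squares and cross-products by the error degrees of freedom gives the covariances, and scaling by the diagonal gives the correlations. The Python sketch below illustrates this with the reported values; the error degrees of freedom of 247 (250 cases minus 3 groups) is an assumption taken from the design, and the variable names are illustrative.

```python
import math

N_TOTAL = 250
N_GROUPS = 3
df_error = N_TOTAL - N_GROUPS  # 247, assumed from the one-factor design

# Order of rows and columns: confidence, hours, parent.
sscp = [
    [28658.376, 3257.289, -1102.557],
    [3257.289, 2124.273, 5.980],
    [-1102.557, 5.980, 905.200],
]
size = len(sscp)

# Residual covariance matrix: SSCP divided by the error degrees of freedom.
cov = [[value / df_error for value in row] for row in sscp]

# Residual correlation matrix: each cross-product scaled by the two diagonals.
corr = [
    [sscp[i][j] / math.sqrt(sscp[i][i] * sscp[j][j]) for j in range(size)]
    for i in range(size)
]

for name, matrix in (("Covariance", cov), ("Correlation", corr)):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:10.3f}" for value in row))
```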

Figure 14 tells us about the correlations between the dependent variables. There is a moderate positive correlation between hours of math homework per month and confidence in ability to do math (.417). There is a very weak correlation between hours of math homework per month and parent math enjoyment (.004), and there is a negative correlation between confidence in ability to do math and parent math enjoyment (-.216).

Figure 15. Confidence in Ability to do Math and SES

Figure 16. Hours of Math Homework per month and SES

Figure 17. Parent Math Enjoyment and SES

Although figures 15, 16, and 17 appear to show large differences between the groups, the group means are merely a few points apart, and as such the differences are not significant. With that being said, the largest group difference in figure 15 is between high SES and average SES, the largest group difference in figure 16 is between high SES and low SES, and the largest group difference in figure 17 is between average SES and low SES. There are no trends or significant differences between the groups, and the data appear to be randomly dispersed.

Figure 18. Contrast Results of Math Related Variables

Contrast Results (K Matrix)

Dependent variables: (1) confidence in ability to do math, (2) hours of math homework per month, (3) parent math enjoyment

grouping variable Helmert Contrast                                       (1)       (2)       (3)
Level 1 vs. Later     Contrast Estimate                                 3.137      .932      .066
                      Hypothesized Value                                0          0         0
                      Difference (Estimate - Hypothesized)              3.137      .932      .066
                      Std. Error                                        1.430      .389      .254
                      Sig.                                               .029      .017      .794
                      95% Confidence Interval      Lower Bound           .321      .165     -.434
                      for Difference               Upper Bound          5.952     1.699      .567
Level 2 vs. Level 3   Contrast Estimate                                 -.574      .178     -.265
                      Hypothesized Value                                0          0         0
                      Difference (Estimate - Hypothesized)              -.574      .178     -.265
                      Std. Error                                        1.703      .464      .303
                      Sig.                                               .737      .701      .383
                      95% Confidence Interval      Lower Bound         -3.928     -.735     -.861
                      for Difference               Upper Bound          2.781     1.092      .332

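Figure 18 tells us about the confidence intervals for the contrasts on the dependent variables. The confidence intervals are not very wide, and therefore there is not a lot of variability, which allows you to be more confident in your results. The confidence interval for confidence in ability to do math is the widest. The Level 2 vs. Level 3 contrasts are not significant; the Level 1 vs. Later contrasts for confidence in ability to do math (p = .029) and hours of math homework per month (p = .017) fall below .05, but they should be read with caution because the overall multivariate test was not significant.

Figure 19. Multiple Comparisons between SES and Math Related Variables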
Multiple Comparisons (Tukey HSD)

                                                                      Mean                                  95% Confidence Interval
Dependent Variable                 (I) grouping    (J) grouping      Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
confidence in ability to do math   High SES        Average SES        3.4234            1.61482      .088    -.3842        7.2311
                                   High SES        Low SES            2.8498            1.71171      .221   -1.1863        6.8859
                                   Average SES     High SES          -3.4234            1.61482      .088   -7.2311         .3842
                                   Average SES     Low SES            -.5736            1.70313      .939   -4.5895        3.4423
                                   Low SES         High SES          -2.8498            1.71171      .221   -6.8859        1.1863
                                   Low SES         Average SES         .5736            1.70313      .939   -3.4423        4.5895
hours of math homework per month   High SES        Average SES         .8428             .43965      .136    -.1939        1.8794
                                   High SES        Low SES            1.0211             .46602      .075    -.0777        2.1200
                                   Average SES     High SES           -.8428             .43965      .136   -1.8794         .1939
                                   Average SES     Low SES             .1784             .46369      .922    -.9150        1.2717
                                   Low SES         High SES          -1.0211             .46602      .075   -2.1200         .0777
                                   Low SES         Average SES        -.1784             .46369      .922   -1.2717         .9150
parent math enjoyment              High SES        Average SES         .1987             .28699      .768    -.4780         .8754
                                   High SES        Low SES            -.0660             .30421      .974    -.7833         .6514
                                   Average SES     High SES           -.1987             .28699      .768    -.8754         .4780
                                   Average SES     Low SES            -.2647             .30269      .657    -.9784         .4491
                                   Low SES         High SES            .0660             .30421      .974    -.6514         .7833
                                   Low SES         Average SES         .2647             .30269      .657    -.4491         .9784

Based on observed means. The error term is Mean Square(Error) = 3.665.

Figure 19 displays all of the comparisons between SES and the math related variables in the research question. As expected, none of the comparisons between SES groups on the dependent variables are significant. The comparisons that come closest to significance (although not significant, as previously stated) are confidence in ability to do math between high SES and average SES (.088), and hours spent on math homework between high SES and low SES (.075). I also explored the dependent variables separately with the Tukey HSD, and it is evident that the means for confidence in ability to do math have the farthest spread, which reflects the data from figure 19. The means for parent math enjoyment are extremely similar, and the means for hours of math homework per month differ slightly between high SES and low SES, which reflects the previous figure.

Strengths and Limitations

Strengths: The MANOVA also has stringent assumptions that include normality, linearity, and homoscedasticity. These assumptions, which were met, allow us to be confident in our results. The MANOVA examines variance and covariance, helps to keep Type I error from inflating, and provides more valuable results (when used appropriately) than a series of univariate analyses.

Limitations: The results from the MANOVA, when paired with ad-hoc contrasts and post-hoc analysis, may be difficult and time-consuming to interpret. Also, the power of the MANOVA declines as the number of dependent variables increases. Furthermore, if the variables are unreliable, their multivariate representation may be somewhat distorted.

Recommendations (what I would do differently)

I would have chosen my independent variables and written my research question before I collected my data, which may eliminate potential biases in the results.

Inference

Mr. Drake, from the results of the MANOVA, ad-hoc contrasts, and post-hoc comparisons, it is evident that there is no significant overall group effect. With that being said, there was a slight group difference between high SES and average SES in confidence in ability to do math, as well as a small difference between high SES and low SES in hours of math homework per month. Therefore, you may wish to do further research on the impact of SES (particularly high SES and average SES) on confidence in ability to do math, and the impact of SES (particularly high SES and low SES) on the hours of math homework that youth do per month.

Discriminant Analysis

Mr. Drake, it is important to run a discriminant analysis in order to distinguish which independent variables (confidence in math ability, hours of math homework, or parent math enjoyment) are contributing to the differences (if any) in socioeconomic status. Therefore, the following research question was examined by a discriminant analysis: Do confidence in math ability, hours of math homework, and/or parent math enjoyment contribute to the differences in socioeconomic status (SES)? I reviewed the analysis case processing summary and found that there is no missing data, as previously reported in the data cleaning exercise.

Figure 20. Group Statistics for Math Related Variables and SES
Group Statistics

                                                                                 Valid N (listwise)
grouping variable   Variable                            Mean      Std. Deviation   Unweighted   Weighted
High SES            confidence in ability to do math    43.5015   11.00970          88           88.000
                    hours of math homework per month    13.1309    3.48480          88           88.000
                    parent math enjoyment               11.8025    1.91176          88           88.000
Average SES         confidence in ability to do math    40.0781   10.05532          90           90.000
                    hours of math homework per month    12.2881    2.47349          90           90.000
                    parent math enjoyment               11.6038    1.84407          90           90.000
Low SES             confidence in ability to do math    40.6517   11.32992          72           72.000
                    hours of math homework per month    12.1097    2.71471          72           72.000
                    parent math enjoyment               11.8685    2.00203          72           72.000
Total               confidence in ability to do math    41.4483   10.83725         250          250.000
                    hours of math homework per month    12.5334    2.95483         250          250.000
                    parent math enjoyment               11.7500    1.91000         250          250.000
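The Total rows of Figure 20 and the mean differences in Figure 19 both follow from the three sets of group means. The short Python sketch below recomputes them as a consistency check; the dictionary layout and the helper names grand_mean and mean_diff are illustrative choices for this example, not part of the SPSS output.

```python
# Group sizes and means from Figure 20.
groups = {
    "High SES": {"n": 88, "confidence": 43.5015, "hours": 13.1309, "parent": 11.8025},
    "Average SES": {"n": 90, "confidence": 40.0781, "hours": 12.2881, "parent": 11.6038},
    "Low SES": {"n": 72, "confidence": 40.6517, "hours": 12.1097, "parent": 11.8685},
}


def grand_mean(variable: str) -> float:
    """Mean over all cases: group means weighted by group size."""
    total_n = sum(g["n"] for g in groups.values())
    return sum(g["n"] * g[variable] for g in groups.values()) / total_n


def mean_diff(variable: str, group_i: str, group_j: str) -> float:
    """Mean difference (I - J), as reported in the Tukey HSD table."""
    return groups[group_i][variable] - groups[group_j][variable]


for variable in ("confidence", "hours", "parent"):
    print(f"{variable}: grand mean = {grand_mean(variable):.4f}")

print(f"confidence, High - Average = {mean_diff('confidence', 'High SES', 'Average SES'):.4f}")
print(f"hours, High - Low          = {mean_diff('hours', 'High SES', 'Low SES'):.4f}")
```

Differences in the last decimal place relative to Figure 19 come from the group means being rounded to four decimals.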

There is a large difference between the standard deviation of confidence in ability to do math and those of the other independent variables. However, there is not a lot of variability across groups for confidence in ability to do math, so we can assume that the variance is equal.

Figure 21. Tests of Equality of Group Means for Math Related Variables
Tests of Equality of Group Means

Variable                            Wilks' Lambda   F       df1   df2   Sig.
confidence in ability to do math    .980            2.524   2     247   .082
hours of math homework per month    .977            2.893   2     247   .057
parent math enjoyment               .997             .433   2     247   .649
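For a single variable, Wilks' Lambda and the F statistic in Figure 21 carry the same information: Lambda is the share of variance left within groups, so Lambda = 1 / (1 + F x df1 / df2). The sketch below applies that identity to the three rows as a consistency check; the function name is an assumption introduced for the example.

```python
def wilks_from_f(f_value: float, df1: int, df2: int) -> float:
    """Univariate Wilks' Lambda implied by a one-way F statistic."""
    return 1.0 / (1.0 + f_value * df1 / df2)


# (variable, F, df1, df2) taken from Figure 21.
rows = [
    ("confidence in ability to do math", 2.524, 2, 247),
    ("hours of math homework per month", 2.893, 2, 247),
    ("parent math enjoyment", 0.433, 2, 247),
]

lambdas = {name: wilks_from_f(f, df1, df2) for name, f, df1, df2 in rows}
for name, value in lambdas.items():
    print(f"{name}: Wilks' Lambda = {value:.3f}")
```

Values this close to 1 mean that group membership explains very little of the variance in each variable, which fits the non-significant tests.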

Figure 21 allows us to see that hours of math homework per month approaches significance (p = .057), but because it does not meet the .05 criterion it is not significant. I also explored the log determinants, which should be close together. The log determinants for average SES and low SES were 7.45 and 7.96 respectively, and the log determinant for high SES was 8.28. The test results for Box's M were not significant, and consequently we can assume equality of the covariance matrices for this model.

Figure 22. Eigenvalues of function 1 and 2
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .032 (a)     84.2            84.2           .176
2          .006 (a)     15.8            100.0          .077

a. First 2 canonical discriminant functions were used in the analysis.
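The columns of Figure 22 can all be derived from the two eigenvalues, which helps show why a function can claim 84.2% of the variance and still explain very little. The Python sketch below is an illustration using the standard relationships (canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)), and overall Wilks' Lambda as the product of 1 / (1 + eigenvalue)); the variable names are assumptions made for this example.

```python
import math

# Eigenvalues of discriminant functions 1 and 2 from Figure 22.
eigenvalues = [0.032, 0.006]

# Share of the between-group (explained) variance carried by each function.
total = sum(eigenvalues)
pct_of_variance = [100.0 * ev / total for ev in eigenvalues]

# Canonical correlation of each function with group membership.
canonical_corr = [math.sqrt(ev / (1.0 + ev)) for ev in eigenvalues]

# Overall Wilks' Lambda across both functions.
wilks_lambda = 1.0
for ev in eigenvalues:
    wilks_lambda *= 1.0 / (1.0 + ev)

for number, (pct, corr) in enumerate(zip(pct_of_variance, canonical_corr), start=1):
    print(f"Function {number}: {pct:.1f}% of variance, "
          f"canonical correlation {corr:.3f}, squared {corr ** 2:.3f}")
print(f"Overall Wilks' Lambda = {wilks_lambda:.3f}")
```

The percentages describe how the explained variance is split between the two functions, while the squared canonical correlations (about .03 and .006) show how little of the total variance is explained. The overall Lambda of about .963 agrees with the Wilks' Lambda for the group effect in Figure 13.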

It is better to have a higher eigenvalue, which signifies that the function accounts for more of the variance. Mr. Drake, the highest eigenvalue within your data is .032, which accounts for 84.2% of the variance explained by the two functions. However, if we look closer at the canonical correlations (.176 and .077), it is evident that overall these functions are not accounting for much of the variance. I also ran a Wilks' Lambda test and discovered that neither of the functions is significant. Since neither function 1 nor function 2 was significant, the results of the structure matrix are irrelevant, and as such I will not be reporting them.

Strengths and Limitations

Strengths: Discriminant analysis identifies the best linear combination of continuous variables for discriminating among groups in the categorical DV. It can verify results on multiple samples for generalizability. And last but not least, it has stringent assumptions of normality, linearity, and homoscedasticity.

Limitations: The discriminant analysis does not provide further insight into the data if the results from the MANOVA were not significant. We are unable to use continuous dependent variables in a discriminant analysis. Also, groups are not randomly assigned, and consequently there may be many other contributing factors associated with certain groups that bias the results.

Recommendations (what I would do differently)

Once again, I would have formulated my research question before I collected my data. It also would have been a lot more riveting to conduct a discriminant analysis following a MANOVA with significant differences. Also, it does not seem plausible for math related variables to impact SES; therefore, had the variables and research questions been prepared in advance, the results may have been a more accurate representation of the population at hand.

Inference

Although all of the assumptions were met, and we were able to assume equality of the covariance matrices, there were no significant results to report. Thus, it is unlikely that confidence in math ability, hours of math homework, and/or parent math enjoyment contribute to the differences in socioeconomic status (SES).

Thank you for choosing JB data analyses; please find your bill enclosed. Have a nice day.

Sincerely,
Jaylene Bettcher
