
Part Five

Statistical techniques to compare groups

In Part Five of this book we will be exploring some of the techniques available in SPSS to assess differences between groups or conditions. The techniques used are quite complex, drawing on a lot of underlying theory and statistical principles. Before you start your analysis using SPSS it is important that you have at least a basic understanding of the statistical techniques that you intend to use. There are many good statistical texts available that can help you with this (a list of some suitable books is provided in the References for this part and in the Recommended references at the end of the book). It would be a good idea to review this material now. This will help you understand what SPSS is calculating for you, what it means and how to interpret the complex array of numbers generated in the output. In the remainder of this part and the following chapters I have assumed that you have a basic grounding in statistics and are familiar with the terminology used.

Techniques covered in Part Five

There is a whole family of techniques that can be used to test for significant differences between groups. Although there are many different statistical techniques available in the SPSS package, only the main techniques are covered here:

• independent-samples t-test;
• paired-samples t-test;
• one-way analysis of variance (between groups);
• one-way analysis of variance (repeated measures);
• two-way analysis of variance (between groups);
• mixed between-within groups analysis of variance;
• multivariate analysis of variance (MANOVA);
• one-way and two-way analysis of covariance (ANCOVA); and
• non-parametric techniques.


In Chapter 10 you were guided through the process of deciding which statistical technique would suit your research question. This depends on the nature of your research question, the type of data you have and the number of variables and groups you have. (If you have not read through that chapter, then you should do so before proceeding any further.)

Some of the key points to remember when choosing which technique is the right one for you are as follows:

• T-tests are used when you have only two groups (e.g. males/females).
• Analysis of variance techniques are used when you have two or more groups.
• Paired-samples or repeated measures techniques are used when you test the same people on more than one occasion, or you have matched pairs.
• Between-groups or independent-samples techniques are used when the subjects in each group are different people (or independent).
• One-way analysis of variance is used when you have only one independent variable.
• Two-way analysis of variance is used when you have two independent variables.
• Multivariate analysis of variance is used when you have more than one dependent variable.
• Analysis of covariance (ANCOVA) is used when you need to control for an additional variable which may be influencing the relationship between your independent and dependent variable.

Before we begin to explore some of the techniques available, there are a number of common issues that need to be considered. These topics will be relevant to many of the chapters included in this part of the book, so you may need to refer back to this section as you work through each chapter.

Assumptions

Each of the tests in this section has a number of assumptions underlying its use. There are some general assumptions that apply to all of the parametric techniques discussed here (e.g. t-tests, analysis of variance), and additional assumptions associated with specific techniques. The general assumptions are presented in this section and the more specific assumptions are presented in the following chapters, as appropriate. You will need to refer back to this section when using any of the techniques presented in Part Five. For information on the procedures used to check for violation of assumptions, see Tabachnick and Fidell (2001, Chapter 4). For further discussion of the consequences of violating the assumptions, see Stevens (1996, Chapter 6) and Glass, Peckham and Sanders (1972).


Level of measurement

Each of these approaches assumes that the dependent variable is measured at the interval or ratio level, that is, using a continuous scale rather than discrete categories. Wherever possible when designing your study, try to make use of continuous, rather than categorical, measures of your dependent variable. This gives you a wider range of possible techniques to use when analysing your data.

Random sampling

The techniques covered in Part Five assume that the scores are obtained using a random sample from the population. This is often not the case in real-life research.

Independence of observations

The observations that make up your data must be independent of one another. That is, each observation or measurement must not be influenced by any other observation or measurement. Violation of this assumption, according to Stevens (1996, p. 238), is very serious. There are a number of research situations that may violate this assumption of independence. Examples of some such studies are described below (these are drawn from Stevens, 1996, p. 239; and Gravetter & Wallnau, 2000, p. 262):

• Studying the performance of students working in pairs or small groups. The behaviour of each member of the group influences all other group members, thereby violating the assumption of independence.
• Studying the TV watching habits and preferences of children drawn from the same family. The behaviour of one child in the family (e.g. watching Program A) is likely to influence all children in that family; therefore the observations are not independent.
• Studying teaching methods within a classroom and examining the impact on students' behaviour and performance. In this situation all students could be influenced by the presence of a small number of trouble-makers; therefore individual behavioural or performance measurements are not independent.

Any situation where the observations or measurements are collected in a group setting, or subjects are involved in some form of interaction with one another, should be considered suspect. In designing your study you should try to ensure that all observations are independent. If you suspect some violation of this assumption, Stevens (1996, p. 241) recommends that you set a more stringent alpha value (e.g. p<.01).


Normal distribution

It is assumed that the populations from which the samples are taken are normally distributed. In a lot of research (particularly in the social sciences), scores on the dependent variable are not nicely normally distributed. Fortunately, most of the techniques are reasonably robust or tolerant of violations of this assumption. With large enough sample sizes (e.g. 30+), the violation of this assumption should not cause any major problems (see discussion of this in Gravetter & Wallnau, 2000, p. 302; Stevens, 1996, p. 242). The distribution of scores for each of your groups can be checked using histograms obtained as part of the Descriptive Statistics, Explore option of SPSS (see Chapter 6). For a more detailed description of this process, see Tabachnick and Fidell (2001, pp. 99–104).
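If you prefer to check this assumption outside SPSS, the sketch below shows one way to do so in Python using the pandas and SciPy libraries. The file name survey.csv and the column names score and group are hypothetical placeholders for your own data, and the Shapiro-Wilk test shown here is just one of several reasonable checks you might add alongside the histograms.

```python
# A minimal sketch (not from the manual) of checking approximate normality
# per group outside SPSS. Column names "score" and "group" are hypothetical
# placeholders for your own dependent and grouping variables.
import pandas as pd
from scipy import stats

df = pd.read_csv("survey.csv")              # hypothetical data file

for name, grp in df.groupby("group"):
    stat, p = stats.shapiro(grp["score"])   # Shapiro-Wilk test of normality
    skew = grp["score"].skew()
    print(f"{name}: n={len(grp)}, skew={skew:.2f}, Shapiro-Wilk p={p:.3f}")
```

Marked skewness, or a very small Shapiro-Wilk p-value in a small group, would be a cue to inspect the histograms more closely, just as you would with the SPSS Explore output.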

Homogeneity of variance

Techniques in this section make the assumption that samples are obtained from populations of equal variances. This means that the variability of scores for each of the groups is similar. To test this, SPSS performs the Levene test for equality of variances as part of the t-test and analysis of variance procedures. The results are presented in the output of each of these techniques. Be careful in interpreting the results of this test: you are hoping to find that the test is not significant (i.e. a significance level of greater than .05). If you obtain a significance value of less than .05, this suggests that variances for the two groups are not equal, and you have therefore violated the assumption of homogeneity of variance. Don't panic if you find this to be the case. Analysis of variance is reasonably robust to violations of this assumption, provided the size of your groups is reasonably similar (e.g. largest/smallest = 1.5; Stevens, 1996, p. 249). For t-tests you are provided with two sets of results, for situations where the assumption is not violated and for when it is violated. In this case, you just consult whichever set of results is appropriate for your data.
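For readers who want to see what the Levene test is doing outside the SPSS output, here is a minimal sketch using SciPy; the two groups of scores are made-up illustrative data, not taken from this book.

```python
# Illustrative sketch of Levene's test for equality of variances using SciPy
# (SPSS reports the same kind of test in its t-test and ANOVA output).
# The two score lists below are invented example data.
from scipy import stats

group1 = [24, 28, 31, 27, 25, 30, 29, 26]
group2 = [22, 35, 19, 40, 27, 33, 18, 38]

stat, p = stats.levene(group1, group2)
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")
# p > .05 -> no evidence that the variances differ (assumption holds)
# p < .05 -> variances appear unequal; use the 'equal variances not assumed' results
```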

Type 1 error, Type 2 error and power

As you will recall from your statistics texts, the purpose of t-tests and analysis of variance is to test hypotheses. With these types of analyses there is always the possibility of reaching the wrong conclusion. There are two different errors that we can make. We may reject the null hypothesis when it is, in fact, true (this is referred to as a Type 1 error). This occurs when we think there is a difference between our groups, but there really isn't. We can minimise this possibility by selecting an appropriate alpha level (the two levels often used are .05 and .01).
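To make the idea of a Type 1 error rate concrete, the following small simulation (a sketch in Python, not part of SPSS) draws two samples from the same population many times and counts how often an independent-samples t-test is, wrongly, significant at the .05 level. The sample sizes and distribution parameters are arbitrary choices for illustration.

```python
# Simulation sketch: how often does a t-test reject the null when the null is
# actually true? With alpha = .05 we expect roughly 5% false rejections.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_simulations = 10_000
false_rejections = 0

for _ in range(n_simulations):
    a = rng.normal(loc=50, scale=10, size=30)   # both groups drawn from
    b = rng.normal(loc=50, scale=10, size=30)   # the SAME population
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_rejections += 1                    # a Type 1 error

print(f"Type 1 error rate = {false_rejections / n_simulations:.3f}")  # about .05
```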

There is also a second type of error that we can make (Type 2 error). This occurs when we fail to reject a null hypothesis when it is, in fact, false (i.e. believing that the groups do not differ, when in fact they do). Unfortunately these two errors are inversely related. As we try to control for a Type 1 error, we actually increase the likelihood that we will commit a Type 2 error.

Ideally we would like the tests that we use to correctly identify whether in fact there is a difference between our groups. This is called the power of a test. Tests vary in terms of their power (e.g. parametric tests such as t-tests, analysis of variance etc. are more powerful than non-parametric tests); however, there are other factors that can influence the power of a test in a given situation:

• sample size;
• effect size (the strength of the difference between groups, or the influence of the independent variable); and
• alpha level set by the researcher (e.g. .05/.01).

The power of a test is very dependent on the size of the sample used in the study. According to Stevens (1996), when the sample size is large (e.g. 100 or more subjects), then "power is not an issue" (p. 6). However, when you have a study where the group size is small (e.g. n=20), then you need to be aware of the possibility that a non-significant result may be due to insufficient power. Stevens (1996) suggests that when small group sizes are involved it may be necessary to adjust the alpha level to compensate (e.g. set a cut-off of .10 or .15, rather than the traditional .05 level).

There are tables available (see Cohen, 1988) that will tell you how large your sample size needs to be to achieve sufficient power, given the effect size you wish to detect. Ideally you would want an 80 per cent chance of detecting a relationship (if in fact one did exist). Some of the SPSS programs also provide an indication of the power of the test that was conducted, taking into account effect size and sample size. If you obtain a non-significant result and are using quite a small sample size, you need to check these power values. If the power of the test is less than .80 (80 per cent chance of detecting a difference), then you would need to interpret the reason for your non-significant result carefully. This may suggest insufficient power of the test, rather than no real difference between your groups. The power analysis gives an indication of how much confidence you should have in the results when you fail to reject the null hypothesis. The higher the power, the more confident you can be that there is no real difference between the groups.
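As an illustration of the kind of figure Cohen's tables provide, the sketch below uses the Python statsmodels library (an assumption on my part; it is not part of SPSS) to ask how many participants per group are needed for 80 per cent power to detect a medium effect (d = .5) with an independent-samples t-test, and what power a study with only 20 participants per group would have.

```python
# Sketch of a power calculation outside SPSS, using statsmodels.
# Cohen's (1988) tables give equivalent figures.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many participants per group for 80% power, medium effect, alpha = .05?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required n per group = {n_per_group:.0f}")          # roughly 64

# Working backwards: what power did a study with n = 20 per group have?
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"Power with n = 20 per group = {achieved_power:.2f}")  # roughly .33
```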

Planned comparisons/Post-hoc analyses

When you conduct analysis of variance you are determining whether there are significant differences among the various groups or conditions. Sometimes you may be interested in knowing if, overall, the groups differ (that your independent variable in some way influences scores on your dependent variable). In other research contexts, however, you might be more focused and interested in testing the differences between specific groups, not between all the various groups. It is important that you distinguish which applies in your case, as different analyses are used for each of these purposes.

Planned comparisons (also known as a priori comparisons) are used when you wish to test specific hypotheses (usually drawn from theory or past research) concerning the differences between a subset of your groups (e.g. do Groups 1 and 3 differ significantly?). These comparisons need to be specified, or planned, before you analyse your data, not after fishing around in your results to see what looks interesting!

Some caution needs to be exercised with this approach if you intend to specify a lot of different comparisons. Planned comparisons do not control for the increased risks of Type 1 errors. A Type 1 error involves rejecting the null hypothesis (e.g. there are no differences among the groups) when it is actually true. In other words there is an increased risk of thinking that you have found a significant result when in fact it could have occurred by chance. If there are a large number of differences that you wish to explore, it may be safer to use the alternative approach (post-hoc comparisons), which is designed to protect against Type 1 errors.

The other alternative is to apply what is known as a Bonferroni adjustment to the alpha level that you will use to judge statistical significance. This involves setting a more stringent alpha level for each comparison, to keep the alpha across all the tests at a reasonable level. To achieve this you can divide your alpha level (usually .05) by the number of comparisons that you intend to make, and then use this new value as the required alpha level. For example, if you intend to make three comparisons the new alpha level would be .05 divided by 3, which equals .017. For a discussion on this technique, see Tabachnick and Fidell (2001, p. 50).
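The adjustment itself is simple arithmetic; the following few lines of Python simply mirror the worked example above.

```python
# Bonferroni adjustment: share the overall alpha across the planned comparisons.
# Values here mirror the worked example in the text.
alpha = 0.05
n_comparisons = 3
adjusted_alpha = alpha / n_comparisons
print(f"Use p < {adjusted_alpha:.3f} for each of the {n_comparisons} comparisons")
# -> Use p < 0.017 for each of the 3 comparisons
```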

Post-hoc comparisons (also known as a posteriori) are used when you want to conduct a whole set of comparisons, exploring the differences between each of the groups or conditions in your study. If you choose this approach your analysis consists of two steps. First, an overall F ratio is calculated which tells you whether there are any significant differences among the groups in your design. If your overall F ratio is significant (indicating that there are differences among your groups), you can then go on and perform additional tests to identify where these differences occur (e.g. does Group 1 differ from Group 2 or Group 3? Do Groups 2 and 3 differ?).

Post-hoc comparisons are designed to guard against the possibility of an increased Type 1 error due to the large number of different comparisons being made. This is done by setting more stringent criteria for significance, and therefore it is often harder to achieve significance. With small samples this can be a problem, as it can be very hard to find a significant result, even when the apparent difference in scores between the groups is quite large.

There are a number of different post-hoc tests that you can use, and these vary in terms of their nature and strictness. The assumptions underlying the post-hoc tests also differ. Some assume equal variances for the two groups (e.g. Tukey), others do not assume equal variance (e.g. Dunnett's C test). Two of the most commonly used post-hoc tests are Tukey's Honestly Significant Difference (HSD) test and the Scheffe test. Of the two, the Scheffe test is the most cautious method for reducing the risk of a Type 1 error. However, the cost here is power. You may be less likely to detect a difference between your groups using this approach.
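To show the two-step logic in a form you can experiment with outside SPSS, the sketch below runs an overall one-way ANOVA and then Tukey's HSD using SciPy. The three groups of scores are invented for illustration, and scipy.stats.tukey_hsd requires SciPy 1.8 or later.

```python
# Sketch of the two-step post-hoc logic described above, outside SPSS:
# an overall one-way ANOVA F test, followed by Tukey HSD pairwise comparisons.
from scipy import stats

group1 = [18, 22, 20, 25, 24, 19, 23]
group2 = [28, 31, 27, 30, 33, 29, 26]
group3 = [21, 24, 23, 26, 22, 25, 20]

f_stat, p = stats.f_oneway(group1, group2, group3)
print(f"Overall ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

if p < 0.05:                        # only follow up a significant overall F
    result = stats.tukey_hsd(group1, group2, group3)
    print(result)                   # pairwise comparisons with p-values and CIs
```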