
Statistics with the SPSS Package

4. Descriptive statistics
From the Analyze menu choose the Descriptive Statistics option.
4.0 Choosing your data set
If you only wish to analyse a subset of the data, use the select cases option from the data menu,
highlight If condition is satisfied and click on if. Suppose we only wish to analyse the sub-sample
made up of males. In the select cases field we enter the condition sex="m". The quotation marks are
necessary when the data are entered in non-numerical (string) form. Observations which do not
satisfy this condition are crossed out in the first column (the row numbers) of the data window.
To return to the complete data set, use select cases from the data menu again and click on the All
Cases option.
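The same kind of selection can be sketched outside SPSS. The following is only an illustration in Python: the records are made up, and the helper name select_cases is hypothetical (it is not an SPSS or library function).

```python
# Hypothetical records; the variable names mirror those used in this handout.
records = [
    {"sex": "m", "height": 181},
    {"sex": "f", "height": 166},
    {"sex": "m", "height": 175},
    {"sex": "f", "height": 170},
]


def select_cases(rows, condition):
    """Return only the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


# Counterpart of the condition sex="m": a string value must be quoted.
males = select_cases(records, lambda row: row["sex"] == "m")
print(len(males), "of", len(records), "cases selected")
```

As in SPSS, the original records are left untouched, so the complete data set is still available afterwards.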
4.1 Frequency tables
These are used to illustrate the distribution of qualitative random variables, or of discrete random
variables which take a relatively small number of values (e.g. the results of die rolls).
Choose Frequencies from the Descriptive Statistics option.
For example, the table below gives the number of students in 3 departments and the percentage
share of each department (in terms of the total number of students).
department

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Chemistry           31      31.0            31.0                 31.0
        French              32      32.0            32.0                 63.0
        Mech. Eng.          37      37.0            37.0                100.0
        Total              100     100.0           100.0
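To make the columns of the frequency table concrete, the short Python sketch below recomputes the percentages and cumulative percentages from the three counts in the table. The function name frequency_table is an illustrative choice, not part of SPSS.

```python
from itertools import accumulate

# Counts taken from the department table above.
counts = {"Chemistry": 31, "French": 32, "Mech. Eng.": 37}


def frequency_table(counts):
    """Return (category, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    percents = [100 * c / total for c in counts.values()]
    cumulative = list(accumulate(percents))  # running sum of the percentages
    return [
        (name, c, round(p, 1), round(cum, 1))
        for (name, c), p, cum in zip(counts.items(), percents, cumulative)
    ]


rows = frequency_table(counts)
for name, freq, pct, cum in rows:
    print(f"{name:<12}{freq:>5}{pct:>8.1f}{cum:>8.1f}")
```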

4.2 Contingency Tables

The crosstabs option presents a contingency table illustrating the relationship between two
qualitative traits. One variable is placed in the rows field and one variable in the columns field. If
we use the cells option and highlight the expected option, we obtain the number of observations
that we would expect in each cell if the traits were independent (row total x column total / overall
total). Considering the relation between sex and department we obtain the following contingency
table.
sex * department Crosstabulation

                                          department
                              Chemistry   French   Mech. Eng.    Total
sex     f   Count                    19       16           15       50
            Expected Count         15.5     16.0         18.5     50.0
        m   Count                    12       16           22       50
            Expected Count         15.5     16.0         18.5     50.0
Total       Count                    31       32           37      100
            Expected Count         31.0     32.0         37.0    100.0

In order to describe the nature of any possible association between the two variables, it is necessary
to compare the count (number observed) with the expected count. Here, it can be seen that females
are more likely to study chemistry (there are more females studying chemistry than you would
expect if there were no relation between sex and department). Similarly, males are more likely to
study mechanical engineering. Since there will always be variation resulting from the fact that we
are observing a sample, such a description is qualitative and should not be used to state whether an
association is significant or not (see χ² tests of independence).
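The expected counts can be reproduced by hand: under independence, the expected count in a cell is the row total times the column total divided by the overall total. The Python sketch below does this for the sex by department table; the function name expected_counts is illustrative only.

```python
# Observed counts taken from the sex * department table.
observed = {
    "f": {"Chemistry": 19, "French": 16, "Mech. Eng.": 15},
    "m": {"Chemistry": 12, "French": 16, "Mech. Eng.": 22},
}


def expected_counts(table):
    """Expected cell counts assuming the row and column traits are independent."""
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())
    return {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in col_totals}
        for r in table
    }


expected = expected_counts(observed)
for sex, cols in observed.items():
    for dept, count in cols.items():
        # A positive difference means more observations than expected under independence.
        print(sex, dept, count, expected[sex][dept], count - expected[sex][dept])
```

The sign of the difference only describes the direction of a possible association; it says nothing about significance.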
4.3 Descriptive Statistics
The descriptives option calculates various measures of centrality (mean) and dispersion (range,
standard deviation and variance) for quantitative variables. By default the mean, standard
deviation, minimum value and maximum value are displayed. By choosing the options menu, the
variance, range and standard error of the mean can also be displayed.
Descriptive Statistics

                                                            Mean
                       N   Range   Minimum   Maximum   Statistic   Std. Error   Std. Deviation   Variance
height               100      65       139       204      170.40        1.340           13.398    179.495
weightbef            100      57        43       100       67.15        1.049           10.487    109.987
weightaft            100      57        42        99       68.29        1.068           10.676    113.970
bmi                  100    8.66     19.37     28.03     23.0575       .17887          1.78872      3.200
Valid N (listwise)   100

Here, there are 100 observations of the body mass index (BMI); the smallest is 19.37, the largest is
28.03 and the range is 28.03 - 19.37 = 8.66. The mean BMI is 23.0575 with a standard error of
0.17887. The standard error is the standard deviation divided by the square root of the number of
observations (here 1.78872/10); it measures the typical error made when the sample mean is used to
estimate the unknown population mean BMI. The standard deviation of the BMI in the sample is
1.78872 and the variance is $1.78872^2 \approx 3.200$.
It should be noted that the median and the other quartiles can be calculated by highlighting the
corresponding statistics in the frequencies menu. When the frequencies menu is used to analyse a
continuous variable, the option display frequency tables should be unmarked.
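The relations between the figures in the bmi row can be checked with a few lines of Python. This is just a sketch using the summary values quoted above, not the raw data (which are not reproduced in this handout).

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation over the square root of n."""
    return sd / math.sqrt(n)


# Summary values for bmi, taken from the Descriptive Statistics table.
n, sd = 100, 1.78872
minimum, maximum = 19.37, 28.03

value_range = maximum - minimum
se = standard_error(sd, n)
variance = sd ** 2

print(round(value_range, 2), round(se, 5), round(variance, 3))
```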

5. Tests regarding the population mean or two population means

Such tests are carried out using the Compare Means option on the Analyze menu.
5.1 Tests regarding a population mean
We choose the one-sample t test option and place the appropriate variable in the test variable field.
Suppose we wish to carry out the following test
H0: the mean height of all Irish students is 170 cm
against the alternative
HA: the mean height of all Irish students is not 170 cm
Input height into the field for the test variable. The test value to be input at the bottom of this
window is 170. We obtain the following output
One-Sample Test

                                  Test Value = 170
                                                            95% Confidence Interval
                                                               of the Difference
              t    df   Sig. (2-tailed)   Mean Difference      Lower      Upper
height     .299    99              .766              .400      -2.26       3.06

t is the realisation of the test statistic (a measure of the distance between the sample mean and
the mean from the null hypothesis, in units of the standard error). df is the number of degrees of
freedom; this is the number of observations minus 1. Sig. is the p-value for the test. The
confidence interval given is a confidence interval for the difference between the population mean
and the mean from the null hypothesis. By adding the test value to the end points of this interval
we obtain a 95% confidence interval for the mean height of all Irish students. Hence, the 95%
confidence interval for this mean is
[170-2.26, 170+3.06] = [167.74, 173.06]
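The figures in the One-Sample Test table can be reproduced from the summary statistics for height (mean 170.40, standard deviation 13.398, 100 observations). The Python sketch below is an illustration only; the function name one_sample_t is hypothetical, and the critical value 1.984 is an assumed table value for the t distribution with 99 degrees of freedom, since the standard library cannot compute it.

```python
import math


def one_sample_t(sample_mean, sd, n, test_value, t_crit):
    """t statistic and confidence intervals from summary statistics."""
    se = sd / math.sqrt(n)                 # standard error of the mean
    diff = sample_mean - test_value        # the "Mean Difference" column
    t = diff / se
    ci_diff = (diff - t_crit * se, diff + t_crit * se)
    # Adding the test value back gives the interval for the mean itself.
    ci_mean = (test_value + ci_diff[0], test_value + ci_diff[1])
    return t, ci_diff, ci_mean


T_CRIT_99 = 1.984  # assumption: 97.5% point of the t distribution with 99 df

t, ci_diff, ci_mean = one_sample_t(170.40, 13.398, 100, 170, T_CRIT_99)
print(round(t, 3))
print([round(x, 2) for x in ci_diff])
print([round(x, 2) for x in ci_mean])
```

For a different confidence level only the critical value changes, which is what the confidence level setting in the options menu controls.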
The conclusion of the test is based on the p-value. We normally test at a significance level of 5%
(0.05). If the p-value is less than 0.05, we reject the null hypothesis at this significance level (we
have evidence that the null hypothesis is false). In addition, if p<0.01, we have strong evidence that
the null hypothesis is false and if p<0.001, we have very strong evidence that the null hypothesis is
false. In this example p=0.766>0.05, thus there is no evidence that the null hypothesis is false.
We can also use the duality between confidence intervals and tests to carry out a test. SPSS gives
the appropriate confidence interval for the difference between the population mean and its
hypothesised value (here 170). If 0 lies within this interval, we do not reject the null hypothesis at
the corresponding significance level. Here a 95% confidence interval is given. We can use this to
test the null hypothesis at a significance level of 5%. Since 0 belongs to the confidence interval, we
do not reject the null hypothesis that the mean height of all Irish students is 170.
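The two decision rules described above (one based on the p-value, one based on the confidence interval) can be written down compactly. The following Python sketch simply encodes the wording used in this handout; the function names are illustrative.

```python
def evidence_against_h0(p_value):
    """Translate a p-value into the verbal scale used in this handout."""
    if p_value < 0.001:
        return "very strong evidence"
    if p_value < 0.01:
        return "strong evidence"
    if p_value < 0.05:
        return "evidence"
    return "no evidence"


def reject_by_interval(lower, upper, null_difference=0.0):
    """Reject H0 when the hypothesised difference lies outside the interval."""
    return not (lower <= null_difference <= upper)


# One-sample test of the mean height: p = 0.766, interval [-2.26, 3.06].
print(evidence_against_h0(0.766))
print(reject_by_interval(-2.26, 3.06))
```

A 95% confidence interval corresponds to a test at the 5% significance level, so for that level both rules always lead to the same decision.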
Note: By default a 95% confidence interval is calculated for the difference between the population
mean and its hypothesised value. We can change the confidence level by opening the options menu
in the one-sample t test window and choosing the appropriate confidence level.
In cases where we reject H0, we are interested in how the population mean appears to differ from
the mean from the null hypothesis. If t is positive, then the population mean appears to be larger
than the mean from the null hypothesis. If t is negative, then the population mean appears to be
smaller than the mean from the null hypothesis. Note that we can calculate the means of the
variables which interest us using the means option in the compare means menu.
5.2 Tests regarding the difference between 2 population means
5.2.1 For dependent samples
Such tests are used when pairs of observations are made on one group of individuals (e.g. mass
before and after a diet, blood pressure before and after treatment). In such a case we have pairs of
observations $(X_i, Y_i)$, where the i-th pair of observations is made on the i-th member of the test
group. We wish to test the hypothesis
H0: $\mu_X - \mu_Y = 0$ (i.e. there is no difference between the population means)
against
HA: $\mu_X - \mu_Y \neq 0$ (i.e. there is a difference between the population means)
Suppose we wish to test whether the mean weight of students changes during their studies. The
samples should be entered in 2 columns. We choose the paired-samples t-test option and select the
pair of variables we are interested in (weightbef - weight at the start of studies, and weightaft -
weight after studies). The output is as follows
Paired Samples Statistics

                        Mean     N   Std. Deviation   Std. Error Mean
Pair 1   weightbef     67.15   100           10.487             1.049
         weightaft     68.29   100           10.676             1.068

Paired Samples Test

                                                  Paired Differences
                                                                          95% Confidence Interval
                                                                             of the Difference
                                   Mean   Std. Deviation   Std. Error Mean     Lower     Upper        t   df   Sig. (2-tailed)
Pair 1   weightbef - weightaft   -1.142            1.966              .197    -1.533     -.752   -5.810   99              .000

The first table gives the two sample means together with the sample standard deviations. The second
table gives the mean of the paired differences, which equals the difference between the sample
means (i.e. the mean weight before minus the mean weight after for the samples). The confidence
interval is a 95% confidence interval for the difference between the two corresponding population
means (i.e. the mean weight of all Irish students at the start of their studies minus the mean weight
of all Irish students at the end of their studies). As before, the confidence level may be changed
within the options menu.
Sig. gives the p-value for the test. As before, the conclusion of the test is based on this p-value.
SPSS displays .000, which means that the p-value is smaller than 0.0005. Since p<0.001, we reject the
null hypothesis (we have very strong evidence that the null hypothesis is false). In order to see
how the mean weight of students changes during their studies, we return to the first table. It can
be seen that on average students are heavier at the end of their studies. Our final conclusion is:
We have very strong evidence that on average students are heavier after their studies than when
they start their studies.
Again we can use duality to carry out the hypothesis test (here at a significance level of 5%). Since
0 does not belong to this confidence interval, we have evidence that the mean weight of students
changes during their studies.
It should be noted that this test is in essence a one-sample test. If we calculate the amount by which
the weight of each student changes (i.e. $D_i = X_i - Y_i$), we can test the hypothesis
H0: $\mu_D = 0$
HA: $\mu_D \neq 0$

This may be done in SPSS by defining the variable d = weightbef-weightaft and using the one-sample
t-test option with 0 as the test value. The results are exactly the same as the results obtained
using the paired samples option.
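The reduction of the paired test to a one-sample test on the differences can be sketched in Python as follows. The four pairs of weights are made up purely for illustration, and paired_t is a hypothetical helper name; the last lines redo the calculation with the rounded summary values from the Paired Samples Test table.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t test computed as a one-sample test on the differences."""
    diffs = [x - y for x, y in zip(before, after)]   # D_i = X_i - Y_i
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n)    # standard error of the mean difference
    return mean_d, se_d, mean_d / se_d, n - 1


# Hypothetical weights (kg) for four students, before and after their studies.
before = [70, 65, 80, 58]
after = [72, 66, 81, 61]
mean_d, se_d, t, df = paired_t(before, after)
print(mean_d, round(se_d, 4), round(t, 4), df)

# Same formula applied to the rounded summary values in the SPSS output.
t_summary = -1.142 / (1.966 / math.sqrt(100))
print(round(t_summary, 2))
```

The value obtained from the rounded summary figures differs from the reported t only in the third decimal place, because SPSS works with unrounded values.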

5.2.2 For independent samples


Such tests are used when one type of observation is made on two groups of individuals (for
example, the weight of British and Irish students, the height of males and females). All the
observations should be entered in one column and a grouping variable defines which group an
individual belongs to. The independent samples t-test should be chosen. We wish to test the
hypothesis
H0: $\mu_X - \mu_Y = 0$ (i.e. there is no difference between the population means)
against
HA: $\mu_X - \mu_Y \neq 0$ (i.e. there is a difference between the population means).
Suppose we want to test the hypothesis that the mean BMI of males is equal to the mean BMI of
females. BMI is chosen as the test variable, sex is chosen as the grouping variable. In order to
define the 2 groups, it is necessary to click on the define groups option. We define the labels of the
2 groups in this window. The output is as follows
Group Statistics

       sex    N      Mean   Std. Deviation   Std. Error Mean
bmi    m     50   22.8234          1.88803            .26701
       f     50   23.2916          1.66969            .23613

Independent Samples Test

                                      Levene's Test for
                                    Equality of Variances       t-test for Equality of Means
                                         F       Sig.             t       df   Sig. (2-tailed)
bmi   Equal variances assumed         .527       .469        -1.314       98              .192
      Equal variances not assumed                            -1.314   96.556              .192

                                        t-test for Equality of Means (continued)
                                                                        95% Confidence Interval
                                                                           of the Difference
                                    Mean Difference   Std. Error Difference       Lower      Upper
bmi   Equal variances assumed               -.46821                  .35644    -1.17555     .23914
      Equal variances not assumed           -.46821                  .35644    -1.17569     .23927

The first table gives the sample means (22.8234 for males and 23.2916 for females). We may be
interested in whether the variance of the BMI depends on sex. The result of Levene's test for this
is given in the first two columns of the second table. The null hypothesis is that the variance does
not depend on sex, the alternative is that the variance depends on sex. The p-value for this test is
0.469 (>0.05), hence we do not reject the null hypothesis.
Now we test whether the mean BMI depends on sex. We can always use the equal variances not
assumed row (if there is no significant difference between the variances, the results of both tests
are almost identical). The p-value for this test is 0.192 (>0.05), hence we do not reject the null
hypothesis that the mean BMI does not depend on sex. The final two columns give a confidence
interval for the difference between the mean BMI of all males and the mean BMI of all females
(the mean for males minus the mean for females). The 95% confidence interval for this difference
is [-1.17569, 0.23927]. Since 0 is contained in this interval, at a significance level of 5% we do
not reject the null hypothesis. Again, the confidence level may be changed using the options
window.
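The equal variances not assumed row can be reproduced from the Group Statistics table using the standard Welch formulas for the standard error and the degrees of freedom. The Python sketch below is an illustration based on the rounded summary values; welch_t is a hypothetical helper name, and the p-value is not computed because the standard library has no t distribution.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic, standard error and Welch degrees of freedom from summary statistics."""
    a = sd1 ** 2 / n1                      # squared standard error of the first mean
    b = sd2 ** 2 / n2                      # squared standard error of the second mean
    se = math.sqrt(a + b)                  # standard error of the difference
    t = (mean1 - mean2) / se
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, se, df


# Males first, then females, as in the Group Statistics table.
t, se, df = welch_t(22.8234, 1.88803, 50, 23.2916, 1.66969, 50)
print(round(t, 3), round(se, 5), round(df, 3))
```

Because both groups contain 50 observations, the pooled (equal variances assumed) standard error coincides with the one above, which is why the two rows of the SPSS table show the same t and differ only in the degrees of freedom and, slightly, in the confidence limits.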
