SW388R7
Data Analysis & Computers II
Slide 1
Assumption of normality
Assumption of normality
Transformations
Assumption of normality script
Practice problems
Slide 2
Assumption of Normality

Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed.

With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution.

Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.
Slide 3
Evaluating normality

There are both graphical and statistical methods for evaluating normality.

Graphical methods include the histogram and normality plot.

Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0.

None of the methods is absolutely definitive.
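
As a rough illustration of the rule of thumb, the following Python sketch computes a sample's skewness and kurtosis and checks whether both fall between -1.0 and +1.0. The function names and the example data are hypothetical. The formulas are the common bias-adjusted estimators, which is an assumption about what a statistics package reports, not something stated on the slides.

```python
import math

def _mean_and_sd(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd

def skewness(values):
    """Bias-adjusted sample skewness; 0 for a symmetric distribution."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    cubes = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubes

def kurtosis(values):
    """Bias-adjusted sample excess kurtosis; 0 for a normal distribution."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    fourths = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourths - correction

def passes_rule_of_thumb(values, limit=1.0):
    """True when skewness and kurtosis both lie between -limit and +limit."""
    return (-limit <= skewness(values) <= limit
            and -limit <= kurtosis(values) <= limit)

# Hypothetical data: one extreme value produces strong positive skew.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(passes_rule_of_thumb(skewed))
```

Because the rule of thumb is only a screening device, a result from this check would normally be read alongside the plots and the hypothesis tests.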


Slide 4
Transformations

When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis.

Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation.

All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
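
The three transformations can be sketched in a few lines of Python. This is a minimal illustration, assuming all values are greater than zero (the logarithm and the inverse are undefined at zero). The function names and the example values are hypothetical.

```python
import math

def log_transform(values):
    """Base-10 logarithm of each value; requires values > 0."""
    return [math.log10(x) for x in values]

def sqrt_transform(values):
    """Square root of each value; requires values >= 0."""
    return [math.sqrt(x) for x in values]

def inverse_transform(values):
    """Reciprocal of each value; requires values != 0."""
    return [1.0 / x for x in values]

# Hypothetical positive values with one large observation.
hours = [0.5, 1.0, 4.0, 25.0, 100.0]
print(log_transform(hours))
print(sqrt_transform(hours))
print(inverse_transform(hours))
```

For positive values, the logarithm and square root keep the cases in the same order while pulling in large values. The inverse reverses the order, which matters when interpreting the direction of a relationship.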
Slide 5
When transformations do not work

When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.

We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.
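
One possible reading of "substitute several dichotomous variables for a single metric variable" is to cut the metric variable into ranges and create one 0/1 indicator per range. The sketch below assumes that reading. The function name, the cut points, and the data are hypothetical.

```python
def dichotomize(values, cut_points):
    """Return one 0/1 indicator list per range defined by ascending cut points.

    With cut points [c1, c2] the ranges are x < c1, c1 <= x < c2, and x >= c2.
    """
    bounds = [float("-inf")] + list(cut_points) + [float("inf")]
    indicators = []
    for low, high in zip(bounds[:-1], bounds[1:]):
        indicators.append([1 if low <= x < high else 0 for x in values])
    return indicators

# Hypothetical hours per week, split into low, medium, and high use.
hours = [0.5, 5.0, 50.0, 8.0]
low, medium, high = dichotomize(hours, cut_points=[1.0, 10.0])
print(low, medium, high)
```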
Slide 6
Problem 1
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
Based on a diagnostic hypothesis test of normality, total hours spent on the Internet is normally distributed.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 7
Computing "Explore" descriptive statistics
To compute the statistics needed for evaluating the normality of a variable, select the Explore command from the Descriptive Statistics menu.
Slide 8
Adding the variable to be evaluated
First, click on the variable to be included in the analysis to highlight it.
Second, click on the right arrow button to move the highlighted variable to the Dependent List.
Slide 9
Selecting statistics to be computed
To select the statistics for the output, click on the Statistics command button.
Slide 10
Including descriptive statistics
First, click on the Descriptives checkbox to select it. Clear the other checkboxes.
Second, click on the Continue button to complete the request for statistics.
Slide 11
Selecting charts for the output
To select the diagnostic charts for the output, click on the Plots command button.
Slide 12
Including diagnostic plots and statistics
First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.
Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.
Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.
Finally, click on the Continue button to complete the request.
Slide 13
Completing the specifications for the analysis
Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.
Slide 14
The histogram
[Figure: Histogram of TOTAL TIME SPENT ON THE INTERNET. Vertical axis: Frequency. Std. Dev = 15.35, Mean = 10.7, N = 93.00]
An initial impression of the normality of the distribution can be gained by examining the histogram.
In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.
Slide 15
The normality plot
[Figure: Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET. Horizontal axis: Observed Value. Vertical axis: Expected Normal]
The problem with the normality of this variable's distribution is reinforced by the normality plot.
If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.
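
To show what a normality plot compares, the sketch below computes the point coordinates for a normal Q-Q plot: each sorted observed value is paired with the z-score expected at its rank under a normal distribution. The function name and the data are hypothetical. The plotting position (rank - 0.5) / n is one common convention and is an assumption here, since statistics packages use several variants.

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Return (observed value, expected normal z-score) pairs, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, observed in enumerate(ordered, start=1):
        # Assumed plotting position; other conventions shift it slightly.
        expected_z = std_normal.inv_cdf((rank - 0.5) / n)
        points.append((observed, expected_z))
    return points

# Hypothetical data with one extreme value.
for observed, expected_z in normal_qq_points([2.0, 1.0, 3.0, 5.0, 90.0]):
    print(observed, round(expected_z, 2))
```

If the data were close to normal, the pairs would fall near a straight line. An extreme value shows up as a point far to the right of the line at the top of the plot.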
Slide 16
The test of normality
Tests of Normality
                                    Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                    Statistic   df   Sig.        Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET    .246        93   .000        .606        93   .000
a. Lilliefors Significance Correction
Problem 1 asks about the results of the test of normality. Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.
The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as <0.001 instead of .000 to be clear that the probability is not really zero.)
The answer to problem 1 is false.
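
The decision rule used for problem 1 can be written as a short Python sketch. The function names are hypothetical, and the p-value of 0.0005 is only a stand-in for a reported Sig. of .000 (that is, p < 0.001), not a value from the output.

```python
def choose_normality_test(sample_size):
    """Follow the slide's guideline: more than 50 cases, use Kolmogorov-Smirnov."""
    return "Kolmogorov-Smirnov" if sample_size > 50 else "Shapiro-Wilk"

def is_normal_by_test(p_value, alpha):
    """Normality is retained only when p is greater than the significance level."""
    return p_value > alpha

n = 93             # df from the Tests of Normality table
p_value = 0.0005   # stand-in for Sig. = .000, i.e. p < 0.001 (assumption)
alpha = 0.01       # level of significance given in problem 1

print(choose_normality_test(n))
print(is_normal_by_test(p_value, alpha))
```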
Slide 17
The assumption of normality script
An SPSS script to produce all of the output that we have produced manually is available on the course web site.
After downloading the script, run it to test the assumption of normality.
Select Run Script from the Utilities menu.
Slide 18
Selecting the assumption of normality script
First, navigate to the folder containing your scripts and highlight the NormalityAssumptionAndTransformations.SBS script.
Second, click on the Run button to activate the script.
Slide 19
Specifications for normality script
First, move variables from the list of variables in the data set to the Variables to Test list box.
Second, choose the transformations to compute. The default output is to do all of the transformations of the variable. To exclude some transformations from the calculations, clear the checkboxes.
Third, click on the OK button to run the script.
Slide 20
The test of normality
Tests of Normality
                                    Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                    Statistic   df   Sig.        Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET    .246        93   .000        .606        93   .000
a. Lilliefors Significance Correction
The script produces the same output that we computed manually: in this example, the tests of normality.
Slide 21
Problem 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic?
Based on the rule of thumb for the allowable magnitude of skewness and kurtosis, total hours spent on the Internet is normally distributed.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 22
Table of descriptive statistics
Descriptives: TOTAL TIME SPENT ON THE INTERNET
                                                  Statistic   Std. Error
Mean                                              10.731      1.5918
95% Confidence Interval for Mean   Lower Bound    7.570
                                   Upper Bound    13.893
5% Trimmed Mean                                   8.295
Median                                            5.500
Variance                                          235.655
Std. Deviation                                    15.3511
Minimum                                           .2
Maximum                                           102.0
Range                                             101.8
Interquartile Range                               10.200
Skewness                                          3.532       .250
Kurtosis                                          15.614      .495
To answer problem 2, we look at the values for skewness and kurtosis in the Descriptives table.
The skewness and kurtosis for the variable both exceed the rule of thumb criterion of 1.0. The variable is not normally distributed.
The answer to problem 2 is false.
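
The Std. Error column for skewness and kurtosis depends only on the sample size. The sketch below uses the standard formulas for the standard errors of the bias-adjusted skewness and kurtosis. It is an assumption that these formulas produced the table, but with N = 93 they reproduce the .250 and .495 shown. The function names are hypothetical.

```python
import math

def se_skewness(n):
    """Standard error of bias-adjusted sample skewness for sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of bias-adjusted sample excess kurtosis for sample size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 93                       # sample size from the output
skew, kurt = 3.532, 15.614   # statistics copied from the Descriptives table

print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))

# Rule of thumb from the slides: both values must lie between -1.0 and +1.0.
within_rule_of_thumb = -1.0 <= skew <= 1.0 and -1.0 <= kurt <= 1.0
print(within_rule_of_thumb)
```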
Slide 23
Problem 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance.
Based on a diagnostic hypothesis test of normality, "total hours spent on the Internet" is not normally distributed. A logarithmic transformation of "total hours spent on the Internet" results in a variable that is normally distributed.
1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Slide 24
The test of normality
Tests of Normality
                                        Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                        Statistic   df   Sig.        Statistic   df   Sig.
Logarithm of NETIME [LG10(NETIME)]      .047        93   .200*       .994        93   .951
Square Root of NETIME [SQRT(NETIME)]    .118        93   .003        .868        93   .000
Inverse of NETIME [1/(NETIME)]          .288        93   .000        .495        93   .000
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Problem 3 specifically asks about the results of the test of normality for the logarithmic transformation. Since our sample size is larger than 50, we use the Kolmogorov-Smirnov test.
The null hypothesis for the Kolmogorov-Smirnov test of normality states that the actual distribution of the transformed variable is equal to the expected distribution, i.e., the transformed variable is normally distributed. Since the probability associated with the test of normality (0.200) is greater than the level of significance, we fail to reject the null hypothesis and conclude that the logarithmic transformation of total hours spent on the Internet is normally distributed.
The answer to problem 3 is true.
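
The same comparison can be applied to all three transformations at once. The sketch below copies the Kolmogorov-Smirnov Sig. values from the table and keeps the transformations whose probability exceeds the 0.01 level of significance. The variable names are hypothetical.

```python
ALPHA = 0.01  # level of significance given in problem 3

# Kolmogorov-Smirnov Sig. values copied from the Tests of Normality table.
ks_sig = {
    "LG10(NETIME)": 0.200,   # reported as a lower bound of the true significance
    "SQRT(NETIME)": 0.003,
    "1/(NETIME)": 0.000,     # reported as .000, i.e. p < 0.001
}

# A transformation is treated as normal when we fail to reject the null hypothesis.
normal_candidates = [name for name, p in ks_sig.items() if p > ALPHA]
print(normal_candidates)  # ['LG10(NETIME)']
```

Only the logarithmic transformation passes at this level of significance, which agrees with the answer to problem 3.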
Slide 25
Other problems on assumption of normality

A problem may ask about the assumption of normality for a nominal level variable. The answer will be "An inappropriate application of a statistic" since there is no expectation that a nominal variable be normal.

A problem may ask about the assumption of normality for an ordinal level variable. If the variable or transformed variable is normal, the correct answer to the question is "True with caution" since we may be required to defend treating an ordinal variable as metric.

Questions will specify a level of significance to use and the statistical evidence upon which you should base your answer.
Slide 26
Steps in answering questions about the assumption of normality - question 1
The following is a guide to the decision process for answering problems about the normality of a variable:
Is the variable to be evaluated metric?
    No: Incorrect application of a statistic
    Yes: Does the statistical evidence support normality assumption?
        No: False
        Yes: Are any of the metric variables ordinal level?
            No: True
            Yes: True with caution
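
The decision process above can be sketched as a small Python function. The function name and the level labels ('nominal', 'ordinal', 'interval') are hypothetical. The branch order follows the guide: an ordinal variable is treated as metric but earns the "with caution" qualifier.

```python
def answer_normality_question(level, evidence_supports_normality):
    """Return the answer for a question about the normality of a variable.

    level: 'nominal', 'ordinal', or 'interval' (hypothetical labels).
    evidence_supports_normality: result of the test or rule of thumb named
    in the question.
    """
    if level == "nominal":
        return "Incorrect application of a statistic"
    if not evidence_supports_normality:
        return "False"
    if level == "ordinal":
        return "True with caution"
    return "True"

# Problem 1: an interval level variable whose test rejects normality.
print(answer_normality_question("interval", evidence_supports_normality=False))
```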
Slide 27
Steps in answering questions about the assumption of normality - question 2
The following is a guide to the decision process for answering problems about the normality of a transformation:
Is the variable to be evaluated metric?
    No: Incorrect application of a statistic
    Yes: Statistical evidence supports normality?
        Yes: False
        No: Statistical evidence for transformation supports normality?
            No: False
            Yes: Either variable ordinal level?
                No: True
                Yes: True with caution
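
For questions shaped like problem 3 ("the variable is not normal, but the transformation is"), the decision process can be sketched as follows. The branch layout is an interpretation of the guide, inferred from the wording of problem 3, so treat it as an assumption. The function name and level labels are hypothetical.

```python
def answer_transformation_question(level, original_is_normal, transformed_is_normal):
    """Return the answer for a statement of the form
    'X is not normally distributed; a transformation of X is'.

    level: 'nominal', 'ordinal', or 'interval' (hypothetical labels).
    """
    if level == "nominal":
        return "Incorrect application of a statistic"
    if original_is_normal:
        # The statement claims the original variable is not normal.
        return "False"
    if not transformed_is_normal:
        return "False"
    if level == "ordinal":
        return "True with caution"
    return "True"

# Problem 3: original variable not normal, logarithmic transformation normal.
print(answer_transformation_question("interval", False, True))
```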
