
Statistics

Dr Sue Hill MA PhD FRCA
University Hospital
Southampton, UK

What is statistics?
Mathematical calculations based on sample data, used to estimate population values
Summary statistics
Inferential statistics (tests!)

Types of Data
Categorical data can be:
Nominal or ordinal
special categorical (binary) data
Numerical data can be:
Discrete
Continuous

Data Type: decide which type applies to each of these:
- Women's EU shoe sizes: 36 (n=2), 37 (n=5), 38 (n=42), 39 (n=55), 40 (n=25), 41 (n=6) → Numerical, Discrete
- Heights of six 8-year-old boys: 1.1 m, 1.05 m, 1.2 m, 1.15 m, 1.15 m, 1.4 m → Numerical, Continuous
- Number of patients with a given blood group: O (n=55), … (n=45), … (n=12), AB (n=20) → Categorical, Nominal
- Throwing a head or tail: Head → Special categorical, Binary

Think of some relevant examples of categorical, ordinal data...

Categorical, ordinal data:
ASA grade
GCS score
Aldrete score
Cancer stage

Summary Population Statistics
Measure of average value:
mean
median
mode
Measure of spread of values:
variance/standard deviation
interquartile range

Mode, median, mean?
What type of data?
[Figure: bar chart of the number of absentees from class over 250 consecutive days]
Mode = 3 (the most frequently occurring value)
Median? (the value of the (n+1)/2 observation, when ranked in ascending order; with n = 250 days this is position 125.5, i.e. midway between the 125th and 126th values)
Mean = 2.84
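The counts behind the chart are not given in the text, so the sketch below uses a hypothetical frequency table of 250 days (not the slide's data, so its mean differs from 2.84). It shows how the three averages are obtained once each day is treated as one observation; the variable names are illustrative only.

```python
import statistics

# Hypothetical frequency table (NOT the slide's data):
# number of absentees on a day -> number of days with that count
freq = {0: 10, 1: 30, 2: 55, 3: 75, 4: 45, 5: 25, 6: 10}

# Expand the table into one observation per day (250 days in total)
days = [absentees for absentees, count in freq.items() for _ in range(count)]
n = len(days)

mode = statistics.mode(days)      # most frequently occurring value
median = statistics.median(days)  # n is even, so the mean of the 125th and 126th ranked values
mean = statistics.mean(days)      # sum of all values divided by n
```

With this made-up table the mode and median are both 3 and the mean is 2.92.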

Confidence Interval for the Mean
The 95% CI around the sample mean indicates the range of values which, on 95% of occasions (i.e. if the sampling were repeated many times), will contain the true population mean
For large samples:
a. find the mean
b. calculate the SEM (SEM = s.d./√n)
c. multiply SEM by 1.96
d. CI = mean ± 1.96 × SEM
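Steps a to d above can be sketched in Python as follows. This is a minimal sketch assuming a reasonably large sample; the function name ci95_large_sample and the sample values are hypothetical.

```python
import math
import statistics

def ci95_large_sample(sample):
    """Approximate 95% CI for a population mean (large-sample Normal method)."""
    n = len(sample)
    mean = statistics.mean(sample)        # a. find the mean
    sd = statistics.stdev(sample)         # sample s.d. (divisor n - 1)
    sem = sd / math.sqrt(n)               # b. SEM = s.d. / sqrt(n)
    margin = 1.96 * sem                   # c. multiply SEM by 1.96
    return mean - margin, mean + margin   # d. CI = mean +/- 1.96 x SEM

# Hypothetical sample of 100 observations, for illustration only
sample = [8, 9, 10, 11, 12] * 20
low, high = ci95_large_sample(sample)
```

For small samples the multiplier 1.96 would normally be replaced by a value from the t-distribution with n - 1 degrees of freedom.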

Symmetric distribution of values: mean, median, mode?
mode = 40
mean = 40.27
median = 40

Asymmetric distribution of values: skewed data, negative skew
mode = 60
median = 50
mean = 45

Asymmetric distribution of values: skewed data, positive skew
mode = 20
median = 30
mean = 33.6

Transforming Skewed data
To make the distribution more symmetric and approximately Normally distributed
Positively skewed:
logarithm
square root
Negatively skewed:
squaring
third power etc.
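To see a log transform pull in a long right tail, the sketch below measures skewness with the moment coefficient m3/m2^1.5 (one common definition, chosen here as an assumption) before and after taking logs. The data are hypothetical and were picked so that their logarithms are exactly symmetric.

```python
import math

def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical positively skewed data (long tail to the right)
raw = [1, 2, 2, 4, 4, 4, 8, 8, 16]
logged = [math.log(x) for x in raw]   # log transform for positive skew

raw_skew = skewness(raw)        # clearly positive
log_skew = skewness(logged)     # essentially zero: the logs are symmetric
```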

Spread of Data Values


Range
Variance and standard deviation
Interquartile range


Variance & Standard Deviation
Useful particularly for symmetric distributions
To estimate the population variance:
- calculate the sample mean (m)
- subtract the mean from each observation and square the result (why?), then sum these values and divide by the number of observations minus 1 (why?)
- s² = Σ(x - m)²/(n - 1)
- the standard deviation is the square root of the variance
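The recipe above translates directly into code. The sketch below applies it to the six boys' heights from the data-type example; sample_variance is a hypothetical helper name, and Python's statistics.variance is used only as a cross-check because it uses the same n - 1 divisor.

```python
import math
import statistics

def sample_variance(xs):
    """Estimate of the population variance: sum((x - m)^2) / (n - 1)."""
    n = len(xs)
    m = sum(xs) / n                          # sample mean
    squared = [(x - m) ** 2 for x in xs]     # squaring stops +/- deviations cancelling out
    return sum(squared) / (n - 1)            # divide by n - 1, not n

# Heights (m) of the six 8-year-old boys from the data-type example
heights = [1.1, 1.05, 1.2, 1.15, 1.15, 1.4]
var = sample_variance(heights)       # about 0.01475 m^2
sd = math.sqrt(var)                  # about 0.121 m
check = statistics.variance(heights) # same n - 1 divisor
```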

Symmetric distribution: mean ± 1 s.d.
mean = 42.6
s.d. = 16.1

Interquartile range
Most useful for non-symmetrical, skewed data
Identifies the spread of values for the middle 50% of the data (between the 25th and 75th percentiles)
- rank the observations from lowest to highest
- find the lower quartile boundary: the value in the (n+1)/4 position
- find the upper quartile boundary: the value in the 3(n+1)/4 position
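A sketch of the ranking rule above, again using the six boys' heights. When (n+1)/4 is not a whole number it interpolates linearly between neighbouring ranks; that interpolation is one common convention and an assumption here, since software packages differ slightly. The helper names value_at_rank and quartiles are hypothetical. Python's statistics.quantiles, in its default "exclusive" method, uses the same (n+1)-based positions and serves as a cross-check.

```python
import statistics

def value_at_rank(sorted_xs, pos):
    """Value at a 1-based rank position, interpolating between neighbours if pos is fractional."""
    whole = int(pos)              # rank at or just below pos
    frac = pos - whole            # how far towards the next rank
    if whole < 1:                 # position below the first observation
        return sorted_xs[0]
    if whole >= len(sorted_xs):   # position at or beyond the last observation
        return sorted_xs[-1]
    below, above = sorted_xs[whole - 1], sorted_xs[whole]
    return below + frac * (above - below)

def quartiles(xs):
    """Lower and upper quartile boundaries at the (n+1)/4 and 3(n+1)/4 positions."""
    ranked = sorted(xs)           # rank from lowest to highest
    n = len(ranked)
    return value_at_rank(ranked, (n + 1) / 4), value_at_rank(ranked, 3 * (n + 1) / 4)

data = [1.1, 1.05, 1.2, 1.15, 1.15, 1.4]    # the six boys' heights (m)
q1, q3 = quartiles(data)                    # positions 1.75 and 5.25
iqr = q3 - q1
check = statistics.quantiles(data, n=4)     # [Q1, median, Q3]
```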

Symmetric distribution: IQR
median = 40

Symmetric distribution: IQR narrower than mean ± 1 s.d.
median = 40, IQR = 30 to 50
mean = 42.6, mean ± 1 s.d. = 26.5 to 58.7

Asymmetric distribution of values: IQR
median = 30

Asymmetric distribution of values: mean ± 1 s.d.
mean = 33.6

Box-and-whisker plot
[Figure: box-and-whisker plots, with the median marked, for a positively skewed and a negatively skewed distribution]

Data Distributions
Continuous data may follow a known distribution:
Normal distribution
t-distribution
Discrete data may follow a known distribution:
Poisson distribution
Binomial distribution
The data may not follow any known distribution; they may be skewed or multi-modal

The Normal (Gaussian) Distribution
Mean = median = mode
About 68% of observations lie within 1 s.d. of the mean and about 95% within 2 s.d. (more precisely, 1.96 s.d.)
A value may lie more than 2 s.d. from the mean and still belong to the distribution; this happens about 5% of the time (a probability of about 0.05)
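The 68% and 95% figures can be checked against the standard Normal cumulative distribution. The sketch below uses NormalDist from Python's statistics module; prob_within is a hypothetical helper name.

```python
from statistics import NormalDist

snd = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution N(0, 1)

def prob_within(k):
    """Probability that a Normal observation lies within k s.d. of the mean."""
    return snd.cdf(k) - snd.cdf(-k)

within_1 = prob_within(1.0)     # about 0.683
within_2 = prob_within(2.0)     # about 0.954
within_196 = prob_within(1.96)  # about 0.950
```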

A Normal Distribution
The exact shape of a Normal frequency distribution depends on two parameters:
- the mean (μ)
- the variance (σ²)

Standard Normal Distribution
All Gaussian distributions, N(μ, σ²), can be transformed into the standard normal distribution (s.n.d.) with mean 0 and variance 1 [N(0, 1)]
z-tables display cumulative probability in terms of multiples of s.d. from the mean in a s.n.d.
for z = +1.96, p = 0.975 (only 2.5% of observations lie more than 1.96 s.d. above the mean)
To find the z-score for x from N(μ, σ²):
z = (x - μ)/σ
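A sketch of the z-score formula and the z-table lookup. For illustration only, it treats the earlier symmetric example (mean 42.6, s.d. 16.1) as if it were exactly Normal; that is an assumption, not something the slides state. z_score is a hypothetical helper name, and NormalDist().cdf plays the role of the z-table.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean of N(mu, sigma^2)."""
    return (x - mu) / sigma

# Assumption for illustration: the symmetric example is exactly Normal
mu, sigma = 42.6, 16.1
z = z_score(58.7, mu, sigma)     # one s.d. above the mean, so z is about 1.0
p_below = NormalDist().cdf(z)    # cumulative probability, as read from a z-table

# The table value quoted above: z = +1.96 gives p = 0.975
p_196 = NormalDist().cdf(1.96)
```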

The t-distribution
- symmetrical distribution
- differs from the Normal distribution in that slightly more observations lie more than 2 s.d. from the mean (heavier tails)
- exact shape depends on the number of degrees of freedom

Binomial Distribution
Number of successes in a series of independent events, each of which has two possible outcomes
Shape depends on the probability of success (p); symmetrical only if p = 0.5
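To show how the shape depends on p, the sketch below computes binomial probabilities for 10 independent events using the standard formula C(n, k) p^k (1 - p)^(n - k). The choice of n = 10 and p = 0.2 is arbitrary, and binomial_pmf is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent events, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
fair = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]    # symmetric about k = 5
biased = [binomial_pmf(k, n, 0.2) for k in range(n + 1)]  # asymmetric, peak at k = 2
```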

Poisson distribution
Probability of n events (successes) in a given interval when the mean number of events is known and a random process determines when each event occurs, e.g. number of cardiac arrests in a hospital per day
- shape depends on the mean: as the mean increases, the distribution becomes more symmetrical
mean = variance
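The "mean = variance" property can be checked numerically from the Poisson formula P(k) = e^(-λ) λ^k / k!. In the sketch below the rate λ = 3 events per interval is a hypothetical value, and the infinite sum is cut off at k = 59, where the remaining terms are negligible for this λ. poisson_pmf is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events in an interval) when events occur randomly at mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0           # hypothetical mean number of events per interval
ks = range(60)      # truncate the infinite sum; later terms are negligible for lam = 3
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
```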

Summary
Statistics is all about estimating population values from samples
95% confidence intervals indicate a range which, on 95% of occasions, will contain the true population value
Summary statistics are used to describe the average value and spread of values:
mean and variance, for symmetric data
median and IQR, better for asymmetric data
The Normal distribution is continuous, whereas the Poisson and Binomial distributions are discrete
