Overview
Descriptive statistics
The what and why of descriptive statistics
Types of variables
Purpose of Descriptive Statistics
Characterize subjects in a study
Sample size
Patterns of sampling
Summary measures
Distribution
Purpose (cont.)
Validity of assumptions
Distribution
Outliers
Equal variance
Linearity
Hypothesis generating
Exploring unanticipated effects
Difference in effects across subgroups
Characterization of dose response
Linear
Exponential
Types of Descriptive statistics
Univariate
Describing one variable
Bivariate
Describing two variables simultaneously
Trivariate
Describing three variables simultaneously
Types of variables
Definitions
Variable: a characteristic that changes or
varies over time and/or across the different
subjects under consideration.
Quantitative variables
Discrete variables: can only take values from
a list of possible values
Number of co-morbidities
Categorical variables
Nominal: unordered categories
Race/ethnicity
Gender
Univariate statistics
(numerical variables)
Summary measures
Measures of location
Measures of spread
Summary Statistics:
Measures of central tendency (location)
Mean: The mean of a data set is the sum of the
observations divided by the number of observations.
Population mean: $$ \mu = \frac{1}{n}\sum_{i=1}^{n} x_i $$
Sample mean: $$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$
Note: The mean is sensitive to skewness and outliers, so it is most informative when the data are roughly
symmetric (for example, normally distributed). If this is not the case, it is better to report the median as the measure of location.
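As a small illustration of why the median can be the better measure of location, the sketch below computes both for a short list containing one extreme observation. The values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from statistics import mean, median

# Hypothetical, illustrative values (not from the slides): one large outlier.
values = [2, 3, 3, 4, 5, 6, 40]

# Mean: sum of the observations divided by the number of observations.
sample_mean = mean(values)

# Median: the middle value of the sorted observations.
sample_median = median(values)

print("mean:", sample_mean)
print("median:", sample_median)
```

The single outlier pulls the mean well above most of the observations, while the median stays near the centre of the data.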
Summary statistics
Measures of spread (scale)
Variance: The average of the squared deviations of
each sample value from the sample mean, except that
instead of dividing the sum of the squared deviations
by the sample size n, the sum is divided by n-1.
$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 $$
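The definition of the sample variance can be sketched directly in code. The function name `sample_variance` and the data values are assumptions made for this illustration; the result is checked against `statistics.variance` from the Python standard library, which also divides by n-1.

```python
from statistics import variance

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical, illustrative values (not from the slides).
values = [2.0, 4.0, 4.0, 5.0, 7.0]

s2 = sample_variance(values)
print("sample variance:", s2)
print("standard library:", variance(values))
```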
Summary statistics: measures of spread
(scale)
We can describe the spread of a distribution by using
percentiles.
[Figure: a distribution divided at the quartiles Q1, Q2, and Q3]
Five-number summary
Minimum
Lower quartile Q1=25th percentile
Median=50th percentile
Upper quartile Q3=75th percentile
Maximum
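A minimal sketch of computing these five numbers, assuming hypothetical data and the "inclusive" quartile method of `statistics.quantiles`. Several quartile conventions exist, so other software may report slightly different values for Q1 and Q3 on the same data.

```python
from statistics import quantiles

def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return (min(xs), q1, q2, q3, max(xs))

# Hypothetical, illustrative values (not from the slides).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(values)
print(summary)
```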
Graphical display of numerical
variables
(histogram)
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
[Figure: histogram of the frequencies above; x-axis: Years (0 to 80), y-axis: Frequency (0 to 20)]
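To show how a frequency table maps onto a histogram, the sketch below draws one bar of asterisks per class interval, using the class intervals and frequencies given above. The function name `text_histogram` is an assumption made for this illustration; a plotting library would normally be used instead.

```python
# Class intervals (years) and frequencies, as given in the table above.
classes = [
    ("20-under 30", 6),
    ("30-under 40", 18),
    ("40-under 50", 11),
    ("50-under 60", 11),
    ("60-under 70", 3),
    ("70-under 80", 1),
]

def text_histogram(rows, symbol="*"):
    """Build one text line per class: label, a bar of symbols, and the count."""
    return [f"{label:<12} {symbol * freq} ({freq})" for label, freq in rows]

bars = text_histogram(classes)
total = sum(freq for _, freq in classes)

for line in bars:
    print(line)
print("total observations:", total)
```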
Graphical display of numerical
variables
(stem and leaf plot)
Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81

Stem  Leaf
2     3
3     9
4     79
5     569
6     07788
7     0245567789
8     11233689
9     11247
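The construction of a stem and leaf plot can be sketched as follows, using the raw data listed above: each value is split into a stem (the tens digit) and a leaf (the units digit), and the leaves are sorted within each stem. The function name `stem_and_leaf` is an assumption made for this illustration.

```python
from collections import defaultdict

# Raw data as listed on the slide.
raw = [
    86, 77, 91, 60, 55,
    76, 92, 47, 88, 67,
    23, 59, 72, 75, 83,
    77, 68, 82, 97, 89,
    81, 75, 74, 39, 67,
    79, 83, 70, 78, 91,
    68, 49, 56, 94, 81,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(plot.items())}

plot = stem_and_leaf(raw)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
```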
Graphical display of numerical
variables
(box plot)
[Figure: box plot marking the minimum, Q1, Q2 (median), Q3, and maximum]
Graphical display of numerical
variables
(box plot)
[Figure: three box plots labeled S<0, S=0, and S>0]
Bivariate Relationships
Two quantitative variables
Scatter plot
Side by side stem and leaf plots
Two quantitative variables
Correlation
A relationship between two variables.
Explanatory (independent) variable x    Response (dependent) variable y
Hours of training                       Number of accidents
Shoe size                               Height
Cigarettes smoked per day               Lung capacity
Score on SAT                            Grade point average
Height                                  IQ
What type of relationship exists between the two variables
and is the correlation significant?
Scatter Plots and Types of Correlation
[Figure: scatter plot of number of accidents (y) against hours of training (x)]
[Figure: scatter plot with Math SAT on the x-axis]
[Figure: scatter plot of IQ against height; no linear correlation]
Correlation Coefficient
A measure of the strength and direction of a linear relationship
between two variables
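$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$
r ranges from -1 to 1:
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive correlation.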
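The correlation coefficient can be computed directly from the sums in the formula above. The sketch below assumes hypothetical training and accident figures, invented for illustration so that the relationship is exactly linear and decreasing; the function name `pearson_r` is also an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from the computational (sums-based) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical, illustrative values (not from the slides).
hours = [1, 2, 3, 4, 5]
accidents = [10, 8, 6, 4, 2]

r = pearson_r(hours, accidents)
print("r =", r)
```

Because every extra hour of training in this made-up data lowers the accident count by the same amount, r comes out at (approximately) -1, the strongest possible negative linear correlation.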
Positive and negative correlation
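If two variables x and y are positively correlated this means
that:
large values of x are associated with large values of y, and
small values of x are associated with small values of y
Negative correlation is the reverse: large values of x are associated with
small values of y, and small values of x with large values of y.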
Positive correlation
Negative correlation
Two qualitative variables
(Contingency Tables)
Contingency table:
heart attack example
Heart Attack    No Heart Attack    Total
Two qualitative variables
Marijuana Use in College: x=parental use, y=student use
              Both    Neither    One    Total
Never         17      141        68     226
Occasional    11      54         44     109
Regular       19      40         51     110
Total         47      235        163    445
[Figure: bar chart by parental use (Both, Neither, One); vertical axis from 0 to 60]
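The margins of a contingency table, and the percentages within each column, can be computed from the cell counts. The sketch below uses the counts from the marijuana-use table above; the variable names are assumptions made for this illustration.

```python
# Cell counts from the table: rows are student use, columns are parental use.
counts = {
    "Never":      {"Both": 17, "Neither": 141, "One": 68},
    "Occasional": {"Both": 11, "Neither": 54,  "One": 44},
    "Regular":    {"Both": 19, "Neither": 40,  "One": 51},
}
columns = ["Both", "Neither", "One"]

# Row totals, column totals, and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}
grand_total = sum(row_totals.values())

# Percentage of students in each use category within each parental-use group.
col_pct = {
    col: {row: 100 * counts[row][col] / col_totals[col] for row in counts}
    for col in columns
}

print(row_totals, col_totals, grand_total)
for col in columns:
    print(col, {row: round(p, 1) for row, p in col_pct[col].items()})
```

Comparing the percentages across the three parental-use groups, rather than the raw counts, is what makes the groups comparable despite their different sizes.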
One quantitative, One qualitative
[Figure: box plot of age by low birth weight (yes/no)]
[Figure: bar chart of mean age by low birth weight (yes/no); bar values 23.66 and 22.31]
Case Study #1
Birth weight and age
[Figure: scatter plot of birth weight (bwt) against age; r = 0.09]
Trivariate Relationships
An extension of bivariate descriptive statistics
Confounding and effect modification
A factor, Z, is said to confound a relationship
between a risk factor, X, and an outcome, Y, if it is
not an effect modifier and the unadjusted strength of
the relationship between X and Y differs from the
common strength of the relationship between X and Y
for each level of Z.
Case Study #1
Race and smoking status
[Figure: bar chart of smoking status (yes/no, in percent) by race (white, black, other); plotted values: 82.09, 61.54, 54.17, 45.83, 38.46, 17.91]
Case Study #1
Race, smoking status, LBW
[Figure, smokers panel: bar chart (yes/no, in percent) by race (white, black, other); plotted values: 63.46, 60, 58.33, 41.67, 40, 36.54]
[Figure, non-smokers panel: bar chart (yes/no, in percent) by race (white, black, other); plotted values: 90.91, 68.75, 63.64, 36.36, 31.25, 9.09]
Multivariate Statistics
Allows one to calculate the association
between a risk factor and an outcome of interest,
after controlling for potential confounders.