Types of Descriptive Statistics

Source: http://search.yahoo.com/search;_ylt=AnAUD4pLlnU4Dk9H04wQYrebvZx4?fr=yfp-t-701-s&toggle=1&cop=mss&ei=UTF8&fp_ip=ph&p=types%20of%20descriptive%20statistics
1
Overview
 Descriptive statistics
 The what and why of descriptive statistics
 Types of variables
 Formulas and interpretations of commonly used descriptive statistics
 Pictorial representations of descriptive statistics
 Examining the relationship between two or more variables

2
Descriptive Statistics
 Used to describe the basic features of the data
in the study
 Types of variables
 Summary statistics
 Distribution of variables
 Pictorial representation
 Allows you to get a feel for the data
3
Purpose of Descriptive Statistics
 Characterize subjects in a study
 Sample size
 Patterns of sampling
 Summary measures
 Distribution
 Finding errors in data collection or data entry

 Impossible, improbable, or inappropriate values
 Values too high or too low
 Outliers
 Strange combinations
 Missing data
 Response rates
4
Purpose (cont'd)
 Validity of assumptions
 Distribution
 outliers
 Equal variance
 Linearity
 Hypothesis generating
 Exploring unanticipated effects
 Difference in effects across subgroups
 Characterization of dose response
 Linear
 exponential
5
Types of Descriptive statistics
 Univariate
 Describing one variable
 Bivariate
 Describing two variables simultaneously
 Trivariate
 Describing three variables simultaneously
6
Types of variables
7
Definitions
 Variable: a characteristic that changes or
varies over time and/or different subjects
under consideration.
 Changing over time

 Blood pressure, height, weight
 Changing across a population

 gender, race/ethnicity
8
Definitions (cont'd)
 Quantitative variables (numeric): measure a
numerical quantity or amount on each
experimental unit
 Qualitative variables (categorical): measure
a non-numeric quality or characteristic on each
experimental unit by classifying each subject
into a category
9
Quantitative variables
 Discrete variables: can only take values from
a list of possible values
 Number of co-morbidities
 Continuous variables: can assume the
infinitely many values corresponding to the
points on a line interval
 weight, height
10
Categorical variables
 Nominal: unordered categories
 Race/ethnicity
 Gender
 Ordinal: ordered categories
 Likert scales (disagree, neutral, agree)
 Income categories
11
Univariate statistics
(numerical variables)
 Measures of location
 Measures of spread
 Overall pattern (distribution)

 Unimodal (one major peak) vs. bimodal (two peaks)
 Symmetric vs. skewed
 Outliers: individual values that fall outside the
overall pattern
12
Summary Statistics:
Measures of central tendency (location)
 Mean: The mean of a data set is the sum of the
observations divided by the number of observations
 Population mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
 Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
 Median: The median of a data set is the “middle value”
 For an odd number of observations, the median is the
observation exactly in the middle of the ordered list
 For an even number of observations, the median is the mean
of the two middle observations in the ordered list
 Mode: The mode is the single most frequently
occurring data value
13
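The three measures of location defined above can be checked on a small sample. The sketch below uses Python's standard `statistics` module; the sample values are hypothetical and were chosen only so that each result is easy to verify by hand.

```python
import statistics

# Hypothetical sample (not from the slides), small enough to check by hand.
sample = [2, 3, 3, 5, 7, 10]

# Mean: sum of the observations divided by the number of observations.
mean = statistics.mean(sample)

# Median: with an even number of observations, the mean of the two
# middle values of the ordered list (here 3 and 5).
median = statistics.median(sample)

# Mode: the single most frequently occurring value.
mode = statistics.mode(sample)

print(mean, median, mode)
```

With six observations the median falls between the third and fourth ordered values, which illustrates the even-count rule.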
Skewness
The skewness of a distribution can be gauged by
comparing the relative positions of the mean, median
and mode.
 Distribution is symmetrical
 Mean = Median = Mode
 Distribution skewed right
 Median lies between mode and mean, and
mode is less than mean
 Distribution skewed left
 Median lies between mode and mean, and
mode is greater than mean
14
Relative positions of the mean and median for (a)
right-skewed, (b) symmetric, and
(c) left-skewed distributions
Note: The mean is sensitive to skewness and outliers. If the data are not approximately symmetric, it is
better to report the median as the measure of location.
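A short sketch of the right-skewed case, assuming a hypothetical sample with a long upper tail (the values are not from the slides). It shows the ordering mode < median < mean and why the median is the steadier summary when a few large values are present.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
skewed = [1, 2, 2, 2, 3, 4, 5, 9, 17]

mode = statistics.mode(skewed)      # most frequent value
median = statistics.median(skewed)  # middle value of the ordered list
mean = statistics.mean(skewed)      # pulled upward by the large values

print(f"mode={mode}, median={median}, mean={mean}")
```

Dropping the largest value changes the mean noticeably but leaves the median close to where it was, which is the practical reason for preferring the median with skewed data.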
15
Summary statistics
Measures of spread (scale)
 Variance: The average of the squared deviations of
each sample value from the sample mean, except that
instead of dividing the sum of the squared deviations
by the sample size n, the sum is divided by n-1.
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
 Standard deviation: The square root of the sample
variance
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
 Range: the difference between the maximum and
minimum values in the sample.
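The definitions of variance, standard deviation, and range translate directly into code. The sketch below writes out the n-1 divisor explicitly and compares the result with the standard library; the sample values and the helper name `sample_variance` are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical sample (not from the slides); its mean is 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(sample)             # sample variance
s = math.sqrt(s2)                        # sample standard deviation
data_range = max(sample) - min(sample)   # maximum minus minimum

# The standard library uses the same n - 1 divisor.
print(s2, statistics.variance(sample))
print(s, statistics.stdev(sample))
print(data_range)
```

The squared deviations here sum to 32, so the variance is 32/7 rather than 32/8.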
16
Normal curves
same mean but different standard deviation
17
Summary statistics: measures of spread
(scale)
 We can describe the spread of a distribution by using
percentiles.
 The pth percentile of a distribution is the value such that p
percent of the observations fall at or below it.
 Median=50th percentile
 Quartiles divide data into four equal parts.

 First quartile—Q1
 25% of observations are below Q1 and 75% above Q1
 Second quartile—Q2
 Third quartile—Q3
18
Quartiles
Q1 Q2 Q3
25% 25% 25% 25%
19
Five-number summary
 Maximum
 Minimum
 Median=50th percentile
 Lower quartile Q1=25th percentile
 Upper quartile Q3=75th percentile
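The five values listed above can be computed with the standard library. This is a sketch on a hypothetical sample; note that several conventions exist for interpolating quartiles, and the `method="inclusive"` option used here is only one of them, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical sample (not from the slides).
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")

summary = (min(sample), q1, q2, q3, max(sample))
print("min, Q1, median, Q3, max:", summary)
```

Q2 from this call coincides with the median, matching the statement that the median is the 50th percentile.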
20
Graphical display of numerical variables
(histogram)
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram of the frequencies above; x-axis: Years, y-axis: Frequency]
21
Graphical display of numerical variables
(stem and leaf plot)
Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81
Stem | Leaf
2 | 3
3 | 9
4 | 79
5 | 569
6 | 07788
7 | 0245567789
8 | 11233689
9 | 11247
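A stem and leaf plot can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below applies this to the raw data shown on the slide; the function name `stem_and_leaf` is an illustrative choice, and the split by `divmod(v, 10)` assumes two-digit values.

```python
from collections import defaultdict

# Raw data from the slide, read row by row.
raw = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83,
       77, 68, 82, 97, 89, 81, 75, 74, 39, 67, 79, 83, 70, 78, 91,
       68, 49, 56, 94, 81]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its ordered leaves (units digits)."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves_by_stem[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(leaves_by_stem.items())}

plot = stem_and_leaf(raw)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
```

Unlike a histogram, the plot keeps every original value recoverable from its stem and leaf.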
22
Graphical display of numerical variables
(box plot)
[Figure: box plot marking the minimum, Q1, Q2 (median), Q3, and maximum]
23
Graphical display of numerical variables
(box plot)
[Figure: box plots for negatively skewed (S<0), symmetric or not skewed (S=0), and positively skewed (S>0) distributions]
24
Univariate statistics
(categorical variables)
 Count=frequency
 Percent=frequency/total sample
 The distribution of a categorical variable lists

the categories and gives either a count or a
percent of individuals who fall in each
category
25
Displaying categorical variables
Rank    Cause of Death    Frequency (%)
1       Heart Disease     710,760 (43%)
2       Cancer            553,091 (33%)
3       Stroke            167,661 (11%)
4       CLRD              122,009 (7%)
5       Accidents         97,900 (6%)
Total   All five causes   1,651,421
[Figure: charts of percent by cause of death (heart, cancer, stroke, CLRD, accident)]
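The rule percent = frequency / total sample can be applied to the cause-of-death counts above. A minimal sketch, with dictionary keys chosen for illustration:

```python
# Counts taken from the cause-of-death table.
counts = {
    "Heart disease": 710_760,
    "Cancer": 553_091,
    "Stroke": 167_661,
    "CLRD": 122_009,
    "Accidents": 97_900,
}

total = sum(counts.values())

# Percent = frequency / total sample, expressed per 100.
percents = {cause: 100 * n / total for cause, n in counts.items()}

for cause, pct in percents.items():
    print(f"{cause:<14}{counts[cause]:>10,}  {pct:5.1f}%")
```

Computed this way the share for stroke comes out near 10%, whereas the table lists 11%; the table's figures were presumably adjusted so that the column sums to 100%.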
26
27
Bivariate relationships
 An extension of univariate descriptive
statistics
 Used to detect evidence of association in the
sample
 Two variables are said to be associated if the
distribution of one variable differs across groups or
values defined by the other variable
28
Bivariate Relationships
 Two quantitative variables
 Scatter plot
 Side by side stem and leaf plots
 Two qualitative variables

 Tables
 Bar charts
 One quantitative and one qualitative variable

 Side by side box plots
 Bar chart
29
Response and explanatory variables
 Response variable: the variable which we
intend to model.
 we intend to explain through statistical modeling
 Explanatory variable: the variable or variables
which may be used to model the response
variable
 values may be related to the response variable
30
Two quantitative variables
Correlation
A relationship between two variables.
Explanatory (Independent) Variable x   Response (Dependent) Variable y
Hours of Training                      Number of Accidents
Shoe Size                              Height
Cigarettes smoked per day              Lung Capacity
Score on SAT                           Grade Point Average
Height                                 IQ
What type of relationship exists between the two variables,
and is the correlation significant?
31
Scatter Plots and Types of Correlation
[Figure: scatter plot; x = hours of training, y = number of accidents]
Negative correlation: as x increases, y decreases

32
[Figure: scatter plot; x = Math SAT score, y = GPA]
Positive correlation: as x increases, y increases

33
[Figure: scatter plot; x = height, y = IQ]
No linear correlation
34
Correlation Coefficient
A measure of the strength and direction of a linear relationship
between two variables
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
The range of r is from -1 to 1.
 If r is close to -1, there is a strong negative correlation
 If r is close to 0, there is no linear correlation
 If r is close to 1, there is a strong positive correlation
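The computational formula for r can be written out term by term. The sketch below is one way to do it in plain Python; the function name `pearson_r` and the hours/accidents values are hypothetical, since the slides show the scatter plot but not the underlying numbers.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: more training hours, fewer accidents.
hours = [2, 4, 6, 8, 10]
accidents = [48, 41, 30, 22, 9]

r = pearson_r(hours, accidents)
print(f"r = {r:.3f}")
```

For these made-up values r is close to -1, the strong negative case described above. The function divides by zero if either variable is constant, which a fuller implementation would need to handle.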
35
Positive and negative correlation
1 If two variables x and y are positively correlated this means
that:
 large values of x are associated with large values of y, and
 small values of x are associated with small values of y
2 If two variables x and y are negatively correlated this means
that:
 large values of x are associated with small values of y, and
 small values of x are associated with large values of y
36
Positive correlation
37
Negative correlation
38
Two qualitative variables
(Contingency Tables)
 Categorical data are usually displayed using a
contingency table, which shows the frequency
of each combination of categories observed in
the data
 The rows correspond to the categories of the
explanatory variable
 The columns correspond to the categories of the
response variable
39
Example
 Aspirin and Heart Attacks
 Explanatory variable=drug received
 Placebo
 Aspirin
 Response variable=heart attack status
 Yes
 No
40
Contingency table:
heart attack example
          Heart Attack   No Heart Attack   Total
Aspirin   104            10,933            11,037
Placebo   189            10,845            11,034
Total     293            21,778            22,071
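One way to read the table is to compare the proportion of heart attacks in each row. The sketch below computes that proportion for each group and their ratio; the slides do not state these derived quantities, so the ratio is shown here only as an illustration of how rows of a contingency table can be compared.

```python
# Counts from the contingency table: rows = drug received,
# columns = heart attack status.
table = {
    "Aspirin": {"heart_attack": 104, "no_heart_attack": 10_933},
    "Placebo": {"heart_attack": 189, "no_heart_attack": 10_845},
}

row_totals = {group: sum(row.values()) for group, row in table.items()}

# Proportion of subjects in each group who had a heart attack.
risk = {group: table[group]["heart_attack"] / row_totals[group]
        for group in table}

# Ratio of the two proportions (aspirin relative to placebo).
relative_risk = risk["Aspirin"] / risk["Placebo"]

for group in table:
    print(f"{group}: {100 * risk[group]:.2f}% of {row_totals[group]:,}")
print(f"ratio (aspirin / placebo): {relative_risk:.2f}")
```

Because the two groups are nearly the same size, the raw counts (104 vs. 189) already suggest the difference; with unequal group sizes only the proportions would be comparable.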
41
Two qualitative variables
Marijuana Use in College: x=parental use, y=student use
Student use   Both   Neither   One   Total
Never         17     141       68    226
Occasional    11     54        44    109
Regular       19     40        51    110
Total         47     235       163   445
[Figure: bar chart of student use (never, occasional, regular) by parental use (both, neither, one)]
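Since parental use is the explanatory variable here, a natural summary is the distribution of student use within each parental-use group. The sketch below computes those conditional proportions from the counts in the table; the variable names are illustrative.

```python
# Counts from the table: outer keys = student use (y),
# inner keys = parental use (x).
counts = {
    "Never":      {"Both": 17, "Neither": 141, "One": 68},
    "Occasional": {"Both": 11, "Neither": 54,  "One": 44},
    "Regular":    {"Both": 19, "Neither": 40,  "One": 51},
}
parental_levels = ["Both", "Neither", "One"]

# Total number of students in each parental-use group.
column_totals = {p: sum(row[p] for row in counts.values())
                 for p in parental_levels}

# Distribution of student use within each parental-use group.
conditional = {p: {s: counts[s][p] / column_totals[p] for s in counts}
               for p in parental_levels}

for p in parental_levels:
    shares = ", ".join(f"{s}: {100 * v:.1f}%"
                       for s, v in conditional[p].items())
    print(f"{p} (n={column_totals[p]}): {shares}")
```

If the two variables were not associated, the three conditional distributions would look alike; comparing them is the tabular counterpart of the grouped bar chart.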

42
Case Study #1
Mean birth weight by race
[Figure: bar chart of mean birth weight for white, black, and other; bar labels 3103.74, 2804.01, and 2719.69]
43
One quantitative, One qualitative
[Figure: box plot of age by low birth weight (lbw = 0, 1)]
[Figure: bar chart of mean age by low birth weight (yes, no); bar labels 23.66 and 22.31]
44
Case Study #1
Birth weight and age
[Figure: scatter plot of birth weight (bwt) against age; r = 0.09]
45
Trivariate Relationships
 An extension of bivariate descriptive statistics
 We focus on description that helps us decide
about the role variables might play in the
ultimate statistical analyses
 Identify variables that can increase the
precision of the data analysis used to assess
associations between two other variables
46
Confounding and effect modification
 A factor, Z, is said to confound a relationship
between a risk factor, X, and an outcome, Y, if it is
not an effect modifier and the unadjusted strength of
the relationship between X and Y differs from the
common strength of the relationship between X and Y
for each level of Z.
 A factor, Z, is said to be an effect modifier of a
relationship between a risk factor, X, and an outcome
measure, Y, if the strength of the relationship between
the risk factor, X, and the outcome, Y, varies among
the levels of Z.
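The two definitions can be illustrated numerically. The sketch below uses hypothetical counts (not the low birth weight data) and measures the strength of the X-Y relationship with a risk ratio, which is an assumption made for illustration; the slides do not name a specific measure.

```python
import math

# Hypothetical (events, group size) counts for each level of Z.
strata = {
    "Z=0": {"exposed": (10, 100), "unexposed": (20, 400)},
    "Z=1": {"exposed": (160, 400), "unexposed": (20, 100)},
}

def risk_ratio(exposed, unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    (e_events, e_n), (u_events, u_n) = exposed, unexposed
    return (e_events / e_n) / (u_events / u_n)

# Strength of the X-Y relationship within each level of Z.
stratum_rr = {z: risk_ratio(g["exposed"], g["unexposed"])
              for z, g in strata.items()}

def pooled(group):
    """Add up events and group sizes over all levels of Z."""
    events = sum(s[group][0] for s in strata.values())
    size = sum(s[group][1] for s in strata.values())
    return events, size

# Unadjusted (crude) relationship, ignoring Z.
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))

print("stratum-specific:", stratum_rr)
print("crude:", crude_rr)
```

In this made-up example the ratio is 2 at both levels of Z, so Z is not an effect modifier, yet the crude ratio is 4.25; by the definition above, Z confounds the relationship. If the stratum-specific ratios had differed from each other, Z would instead be an effect modifier.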
47
Example: confounding
 In our low birth weight data suppose we wish
to investigate the association between race and
low birth weight.
 Our ability to detect this association might be
affected by:
 Smoking status being associated with low birth
weight
 Smoking status being associated with race
48
Case study #1
Race and smoking status
[Figure: bar chart of smoking status (yes, no) in percent for white, black, and other; bar labels 82.09, 61.54, 54.17, 45.83, 38.46, and 17.91]
49
Case Study #1
Race, smoking status, LBW
[Figure: smokers, low birth weight (yes, no) in percent for white, black, and other; bar labels 63.46, 60, 58.33, 41.67, 40, and 36.54]
[Figure: non-smokers, low birth weight (yes, no) in percent for white, black, and other; bar labels 90.91, 68.75, 63.64, 36.36, 31.25, and 9.09]
50
Multivariate Statistics
 Allows one to calculate the association
between an explanatory variable and an outcome of interest,
after controlling for potential confounders.
 Allows one to assess the association
between an outcome and multiple explanatory
variables of interest.
Statistical Models
51
