Describing & Presenting Data
Nooriah Mohamed Salleh
MBBS (Malaya), MPH (Tulane), DrPH (Tulane)
Recap: Variables & Data

Variables are labels. The value of a variable can vary.
Example: age, gender, occupation, ethnicity

Data is the value you get from observation through measuring, counting etc.
Example: # of patients, weight of baby (kg), # of doctors
Variable               Data
Gender                 male
Ethnic group           Malay
Occupation of mother   Housewife
Other data values shown: 23 years old, 3.0 kg

Types of Data
Data
  Categorical Data: Nominal Data, Ordinal Data
  Numerical Data: Discrete Data, Continuous Data
Nominal categorical data
Each value can be allocated to one of a number of categories.
The categories have no meaningful order.
Example: blood type (A, B, AB, O), sex (M, F)
Discrete numerical data
Countable variables.
Integer form (discrete).
Numbers of things.
Example: number of pregnancies, number of patients, number of teeth
Ordinal categorical data
Each value can be allocated to one of a number of categories arranged in a meaningful order.
Example:
Very satisfied, satisfied, neutral, unsatisfied, very unsatisfied.
Grade I, Grade II, Grade III (tumor grading)
Moderate, severe, very severe (pain)
Continuous numerical data
Measurable variables.
Can take any value within a range; recorded values are often rounded, for example to the nearest integer.
Example:
Weight (kg)
Height (metre)
BP (mmHg)
Age (years)
Duration of surgery (hour)
Describing data with tables
1) frequency table
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation
1. Frequency table
A picture of the frequency distribution of a variable.

Mortality (%)   Tally        No. of ICU patients
11.2-15.1       |||||||||    9
15.2-20.1       ||||||||     8
20.2-25.1       |||||        5
25.2-30.1       |||          3
30.2-35.1       |            1
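2. Relative frequency, cumulative frequency
Relative frequency: each frequency expressed as a percentage of the total.
Cumulative frequency: the running total of the frequencies (or percentages) up to and including each category.

Frequency of women by parity (rows in order of increasing parity)
No. of women   Percentage (relative frequency)   Cumulative percentage
5              12.5                              12.5
6              15                                27.5
14             35                                62.5
10             25                                87.5
3              7.5                               95
1              2.5                               97.5
1              2.5                               100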
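The calculation behind a relative and cumulative frequency table can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and the counts are the ones implied by the percentages in the parity table (total of 40 women), which is an assumption about the original slide.

```python
from itertools import accumulate


def relative_and_cumulative(counts):
    """Return (relative %, cumulative %) for a list of frequencies."""
    total = sum(counts)
    relative = [100 * c / total for c in counts]  # percentage of the total
    cumulative = list(accumulate(relative))       # running total of the percentages
    return relative, cumulative


# Counts assumed from the parity table (they sum to 40).
counts = [5, 6, 14, 10, 3, 1, 1]
relative, cumulative = relative_and_cumulative(counts)

for c, r, cum in zip(counts, relative, cumulative):
    print(f"{c:>3} {r:>6.1f} {cum:>6.1f}")
```

The last cumulative percentage should always be 100, which is a quick check that no category was left out.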
3. Grouped frequency
Grouped frequency: for continuous metric data.
Each group (class) has a class lower limit and a class upper limit. Here the group width is 300 g.

Table of No. of infants by birthweight (g), with classes:
2700-2999
3000-3299
3300-3599
3600-3899
3900-4199
4200-4499
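Grouping continuous values into classes of equal width can be sketched as below. The birthweights are hypothetical values invented for the illustration (the counts on the original slide were not available), and the function name is an assumption; only the class limits and the 300 g width come from the slide.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count how many values fall in each class of the given width.

    Returns a list of (lower limit, upper limit, count) rows.
    Values outside the classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        idx = (v - start) // width  # which class the value belongs to
        if 0 <= idx < n_classes:
            counts[idx] += 1
    return [
        (start + i * width, start + (i + 1) * width - 1, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical birthweights in grams (whole numbers), for illustration only.
birthweights = [2750, 2980, 3010, 3150, 3299, 3300,
                3420, 3550, 3610, 3890, 3950, 4210]

table = grouped_frequency(birthweights, start=2700, width=300, n_classes=6)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

Because the upper limit is written as the lower limit plus 299, this sketch assumes weights recorded to the nearest gram.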
Table for display of data
Type of data                    Table
Ordinal, discrete numerical     Frequency table
Continuous numerical data       Grouped frequency

4. Open-ended group
One or two values, called outliers, are a long way from the general mass of the data.
Use ≤ or ≥ to define open-ended groups that include them.

5. Cross-tabulation
Association between breast lump and parity

2 or fewer children   Malignant   Benign   Totals
Yes                   21          4        25
No                    11          4        15
Totals                32          8        40
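A cross-tabulation is a count of how often each pair of categories occurs together. A minimal sketch, assuming one (parity group, diagnosis) record per woman: the records are rebuilt from the breast lump table, with the benign counts taken as implied by the totals, and the function name is hypothetical.

```python
from collections import Counter


def cross_tabulate(records):
    """Build a two-way table of counts from (row category, column category) pairs."""
    cells = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # A Counter returns 0 for a pair that never occurs.
    return {r: {c: cells[(r, c)] for c in cols} for r in rows}


# One record per woman: (2 or fewer children?, diagnosis).
records = (
    [("Yes", "Malignant")] * 21 + [("Yes", "Benign")] * 4
    + [("No", "Malignant")] * 11 + [("No", "Benign")] * 4
)

table = cross_tabulate(records)
row_totals = {r: sum(cells.values()) for r, cells in table.items()}
col_totals = {c: sum(table[r][c] for r in table) for c in ("Malignant", "Benign")}
grand_total = sum(row_totals.values())

print(table)
print(row_totals, col_totals, grand_total)
```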
3. Describing data with charts
1. Nominal data:
(1) the pie chart
(2) the simple bar chart
(3) the cluster bar chart
(4) the stacked bar chart
2. Ordinal data:
(1) the pie chart
(2) the bar chart
(3) the dotplot

3. Discrete numerical data
4. Continuous numerical data: histogram

5. Cumulative ordinal or discrete data: step chart

6. Cumulative continuous data: cumulative frequency curve or ogive

7. Time-based data: time series chart

1.1. Pie chart
1.2 Simple bar chart
4-5 categories
Describes 1 variable
Bars start at 0 and are in the same order as the table
Same widths, equal spaces between bars
[Figure: pie chart and bar chart of the hair colour (blonde, brown, red, dark) of the children receiving d-phenothrin]
1.3 Clustered bar chart
[Figure: clustered percentage bar chart of the hair colour (blonde, brown, red, dark) of children receiving malathion and d-phenothrin]
[Figure: clustered bar charts of number of workers by profession (Dentists, Doctors, Nurses, Pharmacists) and sector (Private, Public)]
1.4 Stacked bar chart
[Figure: stacked percentage bar chart with categories Former smokers and Smokers, for Breast-fed and Bottle-fed]
[Figure: stacked bar charts of number of workers and of percent by sector, by profession (Dentists, Doctors, Nurses, Pharmacists) and sector (Private, Public)]
Time Trend
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.
Table 1: Response under two treatments

            Response to treatment
Treatment   None   Partial   Complete   Total
A           3      15        9          27
B           2      22        30         54
Histogram
Divide the range of the data into a suitably chosen number of intervals, all of the same width.
The number of observations that fall within each interval is plotted.
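The two steps above (equal-width intervals, then counting) can be sketched as follows. The values are hypothetical and the function name is an assumption; a plotting library would normally draw the bars, but a text version is enough to show the counts.

```python
def histogram_counts(values, n_intervals):
    """Split the range of the data into equal-width intervals and count values in each.

    Assumes the values are not all identical (the range must be non-zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        # The maximum value is placed in the last interval.
        idx = min(int((v - lo) / width), n_intervals - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_intervals + 1)]
    return edges, counts


# Hypothetical measurements, for illustration only.
values = [42, 45, 47, 50, 51, 53, 55, 56, 58, 60, 61, 63, 66, 70, 82]
edges, counts = histogram_counts(values, n_intervals=4)

for i, count in enumerate(counts):
    print(f"{edges[i]:>5.1f} to {edges[i + 1]:>5.1f}: {'#' * count}")
```

The number of intervals is a judgment call: too few hides the shape, too many leaves most intervals nearly empty.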
[Figure: bar chart of response to treatment (None, Partial, Complete) in percent, by treatment]
[Figure: histograms of SysVol, shown as frequency and as percent]
Step-up chart
[Figure: Exercise 3.8, step chart of cumulative percentage of infants]
[Figure: chart of thrombosis cases by age group]
Cumulative frequency curve
[Figure: cumulative frequency curve of cumulative percentage of infants]
4. Describing data from its distributional shape
1. Symmetric mound-shaped distributions
[Figure: charts by age group (15-24 to > 85) with series "Attempting suicide" and "Later successful", and a chart of thrombosis cases by age group]
Skewed distributions
Common shapes of histograms: non-symmetrical (skewed) histograms
Skewed right (positive skew)
[Figure: Exercise 4.2, shape of the histogram for "Attempting suicide"]
Bimodal distributions
A bimodal distribution is one with two distinct humps or peaks.

Shape of data distributions
Symmetrical or skewed
Symmetric: Mean = Median = Mode
Left-skewed: typically Mean < Median < Mode
Right-skewed: typically Mode < Median < Mean
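The relation between mean and median gives a rough check on the direction of skew. The sketch below uses hypothetical lengths of hospital stay, and the comparison is only a rule of thumb, not a formal test of skewness; the function name is made up for this example.

```python
from statistics import mean, median, mode


def describe_shape(values):
    """Rough rule of thumb: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"  # a long tail of large values pulls the mean up
    if m < med:
        return "left-skewed"   # a long tail of small values pulls the mean down
    return "symmetric"


# Hypothetical lengths of stay in days, with a few long stays.
stay_days = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
print(mode(stay_days), median(stay_days), mean(stay_days))
print(describe_shape(stay_days))

# A symmetric example: mean, median and mode coincide.
symmetric = [1, 2, 3, 3, 3, 4, 5]
print(describe_shape(symmetric))
```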
Scatter Plots
Scatter plots are similar to line graphs in that each graph uses the horizontal (x) axis and vertical (y) axis to plot data points.

Scatter plots are most often used to show correlations or relationships among data.

Scatter Plots: Positive Correlation
[Figure: scatter plot of overall grade against time in hours]

Study Time (hours)   Class Grade
0                    55
0.5                  61
1                    67
1.5                  73
2                    81
2.5                  89
3                    91
3.5                  93
4                    95
4.5                  97
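The strength of a linear relationship like this one is usually summarised by Pearson's correlation coefficient r. A minimal sketch, assuming the study times run from 0 to 4.5 hours in half-hour steps (the whole-hour values are an assumption made for this example); the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Study time in hours (assumed half-hour steps) and class grade.
study_hours = [0.5 * i for i in range(10)]
grades = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]

r = pearson_r(study_hours, grades)
print(f"r = {r:.2f}")  # close to +1: strong positive correlation
```

A value of r near +1 means the points lie close to a rising straight line; a value near -1 means they lie close to a falling one.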
Scatter Plots: Negative Correlation
Weight Loss Over Time
[Figure: scatter plot of weight against time]

Time   Weight
0      200
0.5    205
1      190
1.5    195
2      180
2.5    190
3      170
3.5    177
4      160
4.5    170
5      150
5.5    168
6      140
6.5    150
7      130
7.5    170
8      120
8.5    130
9      110
9.5    115
10     100
10.5   120
11     90
11.5   90
12     80

Scatter Plot of the Data

Sandwich          Total Fat (g) (x)   Total Calories (y)
Hamburger         (missing)           260
Cheeseburger      13                  320
Quarter Pounder   21                  420
(missing)         30                  530
Big Mac           31                  560
(missing)         31                  550
(missing)         34                  590
Crispy Chicken    25                  500
Fish Fillet       28                  560
Grilled Chicken   20                  440
(missing)         (missing)           300

[Figure: scatter plot of total calories (y) against total fat in g (x)]

Damaged for life by too much TV
N Z Herald (04/10/2005)
[Figure: scatter plot of health score against TV watching, r = -0.93]
Causal relationship?
5. Describing data with numeric summary values
1. numbers, percentages and proportions
2. summary measures of central location/central tendency
3. summary measures of spread/dispersion
5.1. Numbers, percentages and proportions
Numbers: the numerical summaries of data.
A percentage is a proportion multiplied by 100.
1) Prevalence: the number of existing cases in some population at a given time.
2) Incidence (inception): the number of new cases occurring per 100, or per 1000, of the population during some period of time.
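The arithmetic behind proportions, percentages, prevalence and incidence is shown below as a small sketch. All figures are hypothetical (a town of 20,000 people with 300 existing cases and 50 new cases in a year), and the function names are made up for this example.

```python
def proportion(cases, population):
    """Fraction of the population that are cases."""
    return cases / population


def percentage(cases, population):
    """A percentage is a proportion multiplied by 100."""
    return 100 * proportion(cases, population)


def rate_per(cases, population, per=1000):
    """Number of cases per `per` people in the population."""
    return per * cases / population


# Hypothetical figures, for illustration only.
population = 20_000
existing_cases = 300     # cases present at a given time (prevalence)
new_cases_in_year = 50   # new cases during one year (incidence)

prevalence_pct = percentage(existing_cases, population)
incidence_per_1000 = rate_per(new_cases_in_year, population, per=1000)

print(f"Prevalence: {prevalence_pct}%")
print(f"Incidence: {incidence_per_1000} per 1000 per year")
```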
5.2. Summary measures of central location
1) Mode: the category or value that occurs most often.
   - Use: categorical and discrete numerical data.
2) Median: the middle value when the data are in ascending order; a measure of central-ness.
   - Use: ordinal and numerical data.
3) Mean (average): the sum of the values divided by the number of values.
4) Percentile: the percentiles divide the ordered values into 100 equal-sized groups.
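These four measures can be computed with Python's standard `statistics` module, as in the sketch below. The ages are hypothetical, and `statistics.quantiles` uses one particular convention for interpolating between values (its default "exclusive" method); other software may give slightly different percentiles for small samples.

```python
from statistics import mean, median, mode, quantiles

# Hypothetical ages in years, already in ascending order.
ages = [21, 23, 23, 25, 27, 29, 30, 31, 34, 38, 49]

print("Mode:", mode(ages))      # value that occurs most often
print("Median:", median(ages))  # middle value of the ordered data
print("Mean:", mean(ages))      # sum of values / number of values

# 99 cut points that divide the ordered data into 100 groups.
percentiles = quantiles(ages, n=100)
print("25th percentile:", percentiles[24])
print("50th percentile:", percentiles[49])
print("75th percentile:", percentiles[74])
```

The 50th percentile is the median, and the 25th and 75th percentiles are the quartiles Q1 and Q3.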
Choosing the most appropriate measure
Type of data           Mode   Median                    Mean
Nominal                yes    no                        no
Ordinal                yes    yes                       no
Numerical discrete     yes    yes, if markedly skewed   yes
Numerical continuous   yes    yes, if markedly skewed   yes

5.3. Summary measures of spread/dispersion/variability
Range: maximum value - minimum value
IQR (interquartile range) = 75th percentile - 25th percentile = Q3 - Q1
Both describe the spread in a set of data.

Box-and-Whisker Plot
Graphical display of data using the 5-number summary.
Box-and-whisker plot: graphical summary of the three quartile values, the minimum and maximum values, and outliers.
[Figure: box-and-whisker plot marking the minimum value, Q1, median, Q3 and the maximum value]

Standard deviation
A measure of the average distance of all the data values from the mean value.
The smaller the average distance is, the narrower the spread, and vice versa.
Use: numerical data only.
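The measures of spread above can be sketched with the standard `statistics` module. The ages are hypothetical; the quartiles follow the default convention of `statistics.quantiles`, and `stdev` is the sample standard deviation (dividing by n - 1), which is an assumption about which version is wanted.

```python
from statistics import quantiles, stdev


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4)  # the three quartile values
    return min(values), q1, q2, q3, max(values)


# Hypothetical ages in years, for illustration only.
ages = [21, 23, 23, 25, 27, 29, 30, 31, 34, 38, 49]

minimum, q1, med, q3, maximum = five_number_summary(ages)
value_range = maximum - minimum  # Range = maximum value - minimum value
iqr = q3 - q1                    # IQR = Q3 - Q1
sd = stdev(ages)                 # sample standard deviation

print("5-number summary:", minimum, q1, med, q3, maximum)
print("Range:", value_range)
print("IQR:", iqr)
print(f"Standard deviation: {sd:.2f}")
```

The range depends only on the two most extreme values, so a single outlier can change it a lot, while the IQR describes the middle half of the data.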