
Describing & Presenting Data
Nooriah Mohamed Salleh
MBBS (Malaya), MPH (Tulane), DrPH (Tulane)

Recap: Variables & Data
Variables are labels. The value of a variable can vary.
Examples: age, gender, occupation, ethnicity

Data are the values you get from observation, through measuring, counting, etc.
Examples: # of patients, weight of baby (kg), # of doctors

Examples of variables and their data:

Variable                 Data
Age (of mother)          23 years old
Weight (of baby)         3.0 kg
Gender                   male
Ethnic group             Malay
Occupation of mother     Housewife

Types of Data

Data
  Categorical data
    Nominal data
    Ordinal data
  Numerical data
    Discrete data
    Continuous data

Nominal categorical data
Values can be allocated to one of a number of categories that have no meaningful order.
Examples: blood type (A, B, AB, O), sex (M, F)

Ordinal categorical data
Values can be allocated to one of a number of categories arranged in a meaningful order.
Examples:
Very satisfied, satisfied, neutral, unsatisfied, very unsatisfied
Grade I, Grade II, Grade III (tumor grading)
Moderate, severe, very severe (pain)

Discrete numerical data
Countable variables that take integer (whole-number) values: numbers of things.
Examples: number of pregnancies, number of patients, number of teeth

Continuous numerical data
Measurable variables that can take any value within a range. In practice they are recorded to a chosen precision, e.g. rounded to the nearest integer.
Examples: weight (kg), height (m), BP (mmHg), age (years), duration of surgery (hours)
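The nominal/ordinal distinction determines which operations make sense. Ordinal categories can be ranked, while nominal categories can only be checked for equality. Below is a minimal Python sketch that uses an `IntEnum` to encode the pain scale above. The names `Pain` and `BLOOD_TYPES` are illustrative and do not come from the slides.

```python
from enum import IntEnum

# Ordinal: the integer values encode the meaningful order of the categories.
class Pain(IntEnum):
    MODERATE = 1
    SEVERE = 2
    VERY_SEVERE = 3

# Nominal: a plain set of labels with no order to compare.
BLOOD_TYPES = {"A", "B", "AB", "O"}

reports = [Pain.SEVERE, Pain.MODERATE, Pain.VERY_SEVERE, Pain.MODERATE]

# Sorting ordinal data is meaningful (least to most severe).
ordered = sorted(reports)
print([p.name for p in ordered])

# For nominal data, only equality or membership checks make sense.
print("AB" in BLOOD_TYPES)
```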

Describing data with tables

1) frequency table
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation

1. Frequency table
A picture of the frequency distribution of a variable.

Mortality (%)   Tally        No. of ICU patients
11.2-15.1       |||||||||    9
15.2-20.1       ||||||||     8
20.2-25.1       |||||        5
25.2-30.1       |||          3
30.2-35.1       |            1
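Replacing tally marks with counts takes a single counting pass over the raw values. Below is a minimal Python sketch that uses `collections.Counter`. The list of parities is hypothetical and only illustrates how an ungrouped frequency table is built.

```python
from collections import Counter

# Hypothetical raw data: number of previous births for 12 women.
parities = [0, 2, 1, 2, 3, 0, 2, 1, 4, 2, 1, 3]

# Counter does the "tally": value -> number of occurrences.
freq = Counter(parities)

# Print the frequency table in ascending order of value.
for value in sorted(freq):
    print(value, freq[value])
```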

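2. Relative frequency, cumulative frequency
Relative frequency: the frequency expressed as a percentage of the total.
Cumulative frequency: the running total of the frequencies (here, of the relative frequencies). Each category's value is added to the values of all the categories below it.

Parity of 40 women (rows in increasing order of parity)

No. of women   Percentage (relative frequency)   Cumulative percentage
 5             12.5                               12.5
 6             15.0                               27.5
14             35.0                               62.5
10             25.0                               87.5
 3              7.5                               95.0
 1              2.5                               97.5
 1              2.5                              100.0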
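The relative and cumulative columns can be computed directly from the counts. Below is a minimal Python sketch. The counts are the ones implied by the table's percentages: 14 women correspond to 35%, so there are 40 women in total. They are listed in increasing order of parity. The sketch accumulates the counts before converting them to percentages, which avoids summing rounded percentages.

```python
from itertools import accumulate

def relative_and_cumulative(counts):
    """Return (relative %, cumulative %) lists for counts given in category order."""
    total = sum(counts)
    relative = [100 * c / total for c in counts]
    # Accumulate the counts first, then convert each running total to a percentage.
    cumulative = [100 * c / total for c in accumulate(counts)]
    return relative, cumulative

# Counts of women in increasing order of parity (40 women in total).
counts = [5, 6, 14, 10, 3, 1, 1]
rel, cum = relative_and_cumulative(counts)
print(rel)
print(cum)
```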

3. Grouped frequency
Grouped frequency: for continuous metric data.
Each group has a width of 300 g. In the group 2700-2999, for example, 2700 is the class lower limit and 2999 is the class upper limit.

Birthweight (g)   No. of infants
2700-2999
3000-3299
3300-3599
3600-3899
3900-4199
4200-4499
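Assigning a continuous value to a fixed-width class comes down to integer division. Below is a minimal Python sketch. It assumes birthweights recorded in whole grams, a first class starting at 2700 g, and the 300 g group width from the slide. The `group_label` helper and the birthweights are hypothetical.

```python
from collections import Counter

WIDTH = 300    # group width (g), as on the slide
LOWEST = 2700  # lower limit of the first group (g)

def group_label(weight_g):
    """Return the 'lower-upper' label of the 300 g group containing weight_g."""
    lower = LOWEST + WIDTH * ((weight_g - LOWEST) // WIDTH)
    upper = lower + WIDTH - 1  # upper class limit, assuming whole-gram weights
    return f"{lower}-{upper}"

# Hypothetical birthweights in grams (not from the slides).
weights = [2750, 3010, 3290, 3300, 3420, 3599, 3610, 3950, 4210]
table = Counter(group_label(w) for w in weights)
for label in sorted(table):
    print(label, table[label])
```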

Table for display of data

Type of data                   Table
Ordinal                        Frequency table
Discrete numerical             Frequency table
Continuous numerical data      Grouped frequency

4. Open-ended groups
One or two values, called outliers, lie a long way from the general mass of the data. Extending the table with many near-empty groups just to reach them is wasteful. Instead, use an open-ended group at the extreme of the range, defined with ≤ or ≥.

5. Cross-tabulation
Association between breast lump and parity

                         Breast lump diagnosis
2 or fewer children?     Malignant   Benign   Totals
Yes                      21          25       46
No                       11          15       26
Totals                   32          40       72
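Totals and within-row percentages make a cross-tabulation easier to read. Below is a minimal Python sketch that uses the counts from the breast lump table. Comparing the percentage of malignant lumps in each parity group is one simple way to look at the association. It is not a formal statistical test.

```python
# Counts from the table: rows = "2 or fewer children?", columns = diagnosis.
table = {
    "Yes": {"Malignant": 21, "Benign": 25},
    "No":  {"Malignant": 11, "Benign": 15},
}

# Row totals, column totals and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(cells[col] for cells in table.values())
              for col in ("Malignant", "Benign")}
grand_total = sum(row_totals.values())

# Percentage malignant within each row, rounded to one decimal place.
pct_malignant = {row: round(100 * cells["Malignant"] / row_totals[row], 1)
                 for row, cells in table.items()}

print(row_totals, col_totals, grand_total)
print(pct_malignant)
```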

3. Describing data with charts
1. Nominal data:
   (1) the pie chart
   (2) the simple bar chart
   (3) the clustered bar chart
   (4) the stacked bar chart
2. Ordinal data:
   (1) the pie chart
   (2) the bar chart
   (3) the dotplot
3. Discrete numerical data
4. Continuous numerical data: histogram
5. Cumulative ordinal or discrete data: step chart
6. Cumulative continuous data: cumulative frequency curve or ogive
7. Time-based data: time-series chart

1.1 Pie chart

1.2 Simple bar chart
4-5 categories
Describes one variable
Bars start at 0 and appear in the same order as in the table
Same widths, equal spaces between bars

[Figure: Pie chart of the hair color of children receiving d-phenothrin: blonde 18 (18%), brown 55 (57%), red 4 (4%), dark 21 (21%)]
[Figure: Bar chart of the hair color of children receiving d-phenothrin (same counts)]
[Figure: Bar chart for number of health professionals; number of workers by profession (dentists, doctors, nurses, pharmacists)]
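A plain-text bar chart can follow the construction rules above: bars start at 0, have equal widths, and keep the table order. Below is a minimal Python sketch that uses the hair-color counts from the charts. The `text_bar_chart` helper and the scale of one `#` per 5 children are illustrative choices.

```python
def text_bar_chart(categories, counts, unit=5):
    """Return one line per category; each '#' stands for about `unit` observations.

    Every bar starts at zero (the left margin), all bars have the same
    one-character thickness, and categories keep the order they are given in.
    """
    width = max(len(c) for c in categories)
    lines = []
    for category, count in zip(categories, counts):
        bar = "#" * round(count / unit)
        lines.append(f"{category:<{width}} | {bar} {count}")
    return lines

# Hair color of children receiving d-phenothrin, in table order.
chart = text_bar_chart(["blonde", "brown", "red", "dark"], [18, 55, 4, 21])
print("\n".join(chart))
```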

1.3 Clustered bar chart
[Figure: Clustered percentage bar chart of the hair color (blonde, brown, red, dark) of children receiving malathion and d-phenothrin]

Plotting by sector rather than by profession:
Look at the data from a different angle
Highlight different aspects of the data

[Figure: Clustered bar chart for number of health professionals; number of workers by profession (dentists, doctors, nurses, pharmacists), with bars for the private and public sectors]
[Figure: Clustered bar charts of number of health professionals; number of workers by sector (private, public), with bars for each profession]

1.4 Stacked bar chart
A variation of the basic bar chart.

[Figure: Stacked bar chart for number of health professionals; number of workers by profession (dentists, doctors, nurses, pharmacists), stacked by sector (private, public)]
[Figure: Stacked bar charts by sector; segmented bar charts by profession and by sector; percentage bar charts by sector (percent within sector); clustered bar chart of number of health professionals for comparison]
[Figure: 100% stacked bar chart of smoking status (smokers, former smokers, non-smokers) for breast-fed and bottle-fed groups]
[Figure: Stacked bar chart of yearly mortality rate per 1000 births]

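Time trend
[Figure: Time trend in the number of prescriptions written per person]
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.
Source: Pagano & Gauvreau (1999), Principles of Biostatistics, Duxbury.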
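The size of this distortion can be quantified, because drawn bar heights are measured from the axis start rather than from 0. Below is a small Python sketch with hypothetical values: prescriptions per person rising from 9 to 10. These are not the figures in the chart.

```python
def apparent_ratio(before, after, axis_start):
    """Ratio of the two bar heights as drawn when the axis starts at axis_start."""
    return (after - axis_start) / (before - axis_start)

# Hypothetical values: prescriptions per person rising from 9 to 10.
true_ratio = apparent_ratio(9, 10, axis_start=0)   # bars drawn from 0
drawn_ratio = apparent_ratio(9, 10, axis_start=8)  # bars drawn from 8
print(f"true increase {true_ratio:.2f}x, drawn increase {drawn_ratio:.2f}x")
```

An 11% rise is drawn as a doubling once the axis starts at 8.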

Table 1: Response under two treatments

              Response to treatment
Treatment     None    Partial    Complete    Total
A             3       15         9           27
B             2       22         30          54

Within-treatment percentages make it possible to compare the response-type percentages for the two treatments.
[Figure: Stacked bar charts for percentage figures; within-treatment percentage (0-100) of response (none, partial, complete) for treatments A and B]
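Converting each row of Table 1 to within-treatment percentages lets the two treatments be compared in the stacked percentage chart, even though they have different numbers of patients. Below is a minimal Python sketch that uses the counts from Table 1.

```python
# Table 1 counts: response to treatment (None, Partial, Complete).
responses = {"A": [3, 15, 9], "B": [2, 22, 30]}

# Divide each count by its own treatment total (row percentages), rounded to 0.1%.
within_pct = {
    treatment: [round(100 * n / sum(counts), 1) for n in counts]
    for treatment, counts in responses.items()
}
print(within_pct)
```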

Histogram
Divide the range of the data into a suitably chosen number of intervals, all of the same width.
The number of observations that fall within each interval is plotted.

Relative frequency histogram
Plot the proportions of observations that fall within the class intervals.

[Figure: Histogram of end-systolic volume (SysVol) for 45 male heart attack patients; frequency vs SysVol (40-220)]
[Figure: Relative frequency polygon for SysVol; percent vs SysVol (40-220)]
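The two histogram steps (equal-width intervals over the range, then counts per interval) can be sketched in a few lines of Python. This minimal illustration uses hypothetical data, not the SysVol measurements. It assumes half-open intervals, with the maximum placed in the last interval. Plotting libraries make similar but not always identical choices. Dividing the counts by the number of observations gives the proportions for a relative frequency histogram.

```python
def histogram_counts(values, k):
    """Split [min, max] into k equal-width intervals and count values in each.

    Intervals are half-open [lo, lo + width), except the last, which also
    includes the maximum so that no observation is lost.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot form k intervals")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        i = int((v - lo) / width)
        counts[min(i, k - 1)] += 1  # the maximum goes into the last interval
    return lo, width, counts

# Hypothetical measurements (not the SysVol data from the figure).
data = [42, 55, 61, 63, 70, 74, 78, 85, 91, 120]
lo, width, counts = histogram_counts(data, k=4)
proportions = [c / len(data) for c in counts]  # for a relative frequency histogram
print(lo, width, counts, proportions)
```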

Histogram
[Figure: Exercise 3-5, histogram of the percentage age distribution of pregnant women (vertical axis: thrombosis cases); age groups ≤19, 20-24, 25-29, 30-34, ≥35]

Step-up chart
[Figure: Exercise 3.8, step chart of the cumulative percentage of infants]

Cumulative frequency curve
[Figure: Exercise 3.9, ogive: percentage cumulative frequency curves of age for male suicide attempters and later succeeders; age groups 15-24 to 75-84 and > 85]

4. Describing data from its distributional shape
1. Symmetric mound-shaped distributions
[Figure: Exercise 3-5, histogram of the percentage age distribution of pregnant women]

Non-symmetrical histograms
These histograms are skewed.

Common shapes of histograms: skewed histograms
[Figure: Skewed left (negative skew) and skewed right (positive skew) histograms]

Skewed distributions
[Figure: Exercise 4.2 shape: age distribution for female suicide attempters and later succeeders; frequency (0-160) by age group, 15-24 to 75-84 and > 85]
Note: the SKEW follows the TAIL.

Bimodal distributions
A bimodal distribution is one with two distinct humps or peaks.

Shape of data distributions: symmetrical or skewed
Left-skewed: mean < median < mode
Symmetric: mean = median = mode
Right-skewed: mode < median < mean
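Python's `statistics` module can check the ordering of mean, median and mode for a right-skewed distribution. Below is a minimal sketch on a small hypothetical data set with a long right tail.

```python
import statistics

# Hypothetical right-skewed data: mostly small values, a few large ones (long right tail).
data = [1, 2, 2, 2, 3, 4, 5, 9, 15]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the ordered data
mean = statistics.mean(data)      # sum / count, pulled towards the tail

print(mode, median, round(mean, 2))
```

The few large values pull the mean above the median, while the mode stays at the peak. A long left tail would reverse the order.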

Scatter plots
Scatter plots are similar to line graphs in that each graph uses the horizontal (x) axis and the vertical (y) axis to plot data points.
Scatter plots are most often used to show correlations or relationships among data.

Scatter plots: positive correlation
How study time affects grades

Study time (hours)   Class grade
0                    55
0.5                  61
1                    67
1.5                  73
2                    81
2.5                  89
3                    91
3.5                  93
4                    95
4.5                  97

[Figure: Scatter plot of overall grade (0-120) vs time in hours (0-4.5)]

Scatter plots: negative correlation
[Figure and table: Weight loss over time; weight (0-250) vs days worked out per month (work-out time)]

Scatter plot of the data
Fat grams and calories in food

Sandwich                       Total fat (g) (x)   Total calories (y)
Hamburger                      …                   260
Cheeseburger                   13                  320
Quarter Pounder                21                  420
Quarter Pounder with Cheese    30                  530
Big Mac                        31                  560
Arch Sandwich Special          31                  550
Arch Special with Bacon        34                  590
Crispy Chicken                 25                  500
Fish Fillet                    28                  560
Grilled Chicken                20                  440
Grilled Chicken Light          …                   300

[Figure: Scatter plot of total calories (0-700) vs total fat grams (0-40)]

Damaged for life by too much TV
N Z Herald (04/10/2005)
[Figure: Scatter plot of health score vs TV watching; r = -0.93]
Causal relationship?
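The strength of a linear relationship in a scatter plot is usually summarised by Pearson's correlation coefficient r, as in the TV example (r = -0.93). Below is a minimal Python sketch that computes r from its definition. It uses the study-time and grade pairs, with the hours as read from the positive-correlation slide. `pearson_r` is an illustrative helper; Python 3.10+ also provides `statistics.correlation`.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of paired data x, y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Study time (hours) and class grade from the positive-correlation example.
hours = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
grades = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]
r = pearson_r(hours, grades)
print(round(r, 2))
```

A value close to +1 indicates a strong positive linear association. As the question on the TV slide suggests, r alone says nothing about cause and effect.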

5. Describing data with numeric summary values
1. numbers, percentages and proportions
2. summary measures of central location/central tendency
3. summary measures of spread/dispersion

5.1. Numbers, percentages and proportions
Numbers: the numerical summaries of data.
A percentage is a proportion multiplied by 100.
1) Prevalence: the number of existing cases in some population at a given time.
2) Incidence (inception): the number of new cases occurring per 100, or per 1000, of the population during some period of time.
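Prevalence and incidence differ in which cases they count (existing vs new) and over what time frame. Below is a minimal Python sketch with hypothetical numbers. The helper names are illustrative.

```python
def prevalence_percent(existing_cases, population):
    """Existing cases at a given time, as a percentage of the population."""
    return 100 * existing_cases / population

def incidence_per_1000(new_cases, population):
    """New cases arising during a period, per 1000 of the population."""
    return 1000 * new_cases / population

# Hypothetical town of 20 000 people (illustrative numbers only).
print(prevalence_percent(300, 20_000))  # 1.5 (% have the condition now)
print(incidence_per_1000(50, 20_000))   # 2.5 (new cases per 1000 during the year)
```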

5.2. Summary measures of central location
1) Mode: the category or value that occurs most often.
   Use: categorical and discrete numerical data.
2) Median: the middle value when the data are arranged in ascending order; a measure of central-ness.
   Use: ordinal and numerical data.
3) Mean (average): the sum of the values divided by the number of values.
4) Percentile: the values that divide the ordered data into 100 equal-sized groups.
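Percentiles can be computed with `statistics.quantiles` (Python 3.8+), which returns the cut points between equal-sized groups. Below is a minimal sketch on hypothetical data. Several interpolation conventions exist, so other software may give slightly different values for small data sets. The 50th percentile is the median.

```python
import statistics

# Hypothetical data set of 10 values.
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9]

# n=100 gives the 99 cut points that split the ordered data into 100 groups
# (Python's default 'exclusive' method).
pct = statistics.quantiles(data, n=100)
p25, p50, p75 = pct[24], pct[49], pct[74]

print(p25, p50, p75, statistics.median(data))
```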

Choosing the most appropriate measure

                       Mode   Median                    Mean
Nominal                yes    no                        no
Ordinal                yes    yes                       no
Numerical discrete     yes    yes, if markedly skewed   yes
Numerical continuous   yes    yes, if markedly skewed   yes

5.3. Summary measures of spread/dispersion/variability
The spread in a set of data:
Range: maximum value - minimum value
IQR (interquartile range) = 75th percentile - 25th percentile = Q3 - Q1
Standard deviation: roughly, the average distance of the data values from the mean value. Formally, it is the square root of the mean squared deviation from the mean, dividing by n - 1 for a sample. The smaller the average distance, the narrower the spread, and vice versa.
Use: numerical data only.

Box-and-whisker plot
A box-and-whisker plot is a graphical summary of the three quartile values, the minimum and maximum values, and any outliers.
[Figure: Graphical display of data using the 5-number summary: minimum value, Q1, median, Q3, maximum value]
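The spread measures and the five-number summary behind a box-and-whisker plot can be computed together. Below is a minimal Python sketch on hypothetical data that uses the `statistics` module. To flag outliers it uses the 1.5 × IQR rule (Tukey's fences), a common convention assumed here, because the slides do not state which rule the box plot uses.

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9]

# Quartiles with Python's default 'exclusive' method; other software may differ slightly.
q1, median, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, median, q3, max(data))

value_range = max(data) - min(data)  # range: maximum - minimum
iqr = q3 - q1                        # interquartile range: Q3 - Q1
sd = statistics.stdev(data)          # sample SD: sqrt(sum((x - mean)**2) / (n - 1))

# Assumed outlier rule (Tukey's fences): beyond 1.5 * IQR from the box.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(five_number, value_range, iqr, round(sd, 2), outliers)
```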
