
Chapter 2

Displaying and Describing Categorical Data

Announcements
Clarification on labs: There is a lab every week.
Please fill out the student survey.
If you have not already, read the syllabus!
Online homework is due Wednesday, September 3rd, by 8am, so you may want to finish it by Tuesday. You can complete this assignment now: it covers only chapter 1, which we have finished! You can start the assignment, save, and go back to it later. You cannot take it more than once.

JMP download problems


If you are having difficulties, please send me an email with details on what you are having difficulties with. Also, include the following information:
Laptop or desktop computer?
Mac or PC?

Chapter Outline
Review 5 Ws of Data
Review Categorical Variables
Distribution of One Categorical Variable
Relationship between Two Categorical Variables

The Five Ws of Data

Who: Cases (rows in the data file)
What: Variables (columns in the data file)
Where: Where did the data come from?
When: When were the data collected?
Why: Why did we collect the data?
How: How did we get the data?

Categorical Variables
Variables whose values are category labels rather than measured numbers.
Gender
Eye Color
Year in School

Example Eye Colors

Who: 2068 Stat 101 students
What: Eye color (Blue, Brown, Green, Hazel, Other)
Where: Iowa State University
When: Spring 2004 through Spring 2007
Why: As part of a survey to collect interesting demographic data on Stat 101 students to use during the course.
How: Data collected through an Internet survey.

Summarizing a Single Categorical Variable
Number, proportion, or percentage of Whos in each category.
Summarize with
Frequency Table (Relative Frequency Table)
Bar Chart
Pie Chart

Frequency Table (Relative Frequency Table)
Lists the categories and the number or proportion of Whos in each category.
Compare numbers and/or proportions.

Example of a Relative Frequency Table

Eye Color   Frequency   Relative Frequency
blue              729              0.35251
brown             642              0.31044
green             308              0.14894
hazel             347              0.16779
other              42              0.02031
Total            2068              1.00000
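As a quick check on the table, the short Python sketch below recomputes each relative frequency as the category count divided by the total number of students. The counts come from the frequency table; the variable names are illustrative choices only.

```python
# Eye color counts for the 2068 Stat 101 students (from the frequency table)
counts = {"blue": 729, "brown": 642, "green": 308, "hazel": 347, "other": 42}

total = sum(counts.values())  # 2068 students in all

# Relative frequency = count in the category / total count
rel_freq = {color: n / total for color, n in counts.items()}

for color, n in counts.items():
    print(f"{color:<6} {n:>5} {rel_freq[color]:.5f}")
print(f"{'Total':<6} {total:>5} {sum(rel_freq.values()):.5f}")
```

The relative frequencies always sum to 1 (up to rounding), because every student falls in exactly one category.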

Bar Chart
Displays either the number or the percentage for each category.
Compare heights of bars.
The display does not need to include all categories.

Example of a Bar Chart
[Figure: bar chart titled "Eye Colors of Stat 101 Students" with bars for Blue, Brown, Green, Hazel, and Other]

Pie Chart
Displays the percentage of the whole for each category.
Compare sizes of pie slices.
The display must include all categories.

Example of a Pie Chart
[Figure: pie chart titled "Eye Colors of Stat 101 students" with slices for Blue, Brown, Green, Hazel, and Other]
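Because a pie chart shows each category's share of the whole, a slice's angle is that category's proportion times 360 degrees; this is also why every category has to appear, or the slices would not fill the circle. A minimal Python sketch using the eye-color counts from the frequency table:

```python
# Eye color counts from the frequency table
counts = {"blue": 729, "brown": 642, "green": 308, "hazel": 347, "other": 42}
total = sum(counts.values())

# Each slice's angle is its category's share of the full 360-degree circle
angles = {color: 360 * n / total for color, n in counts.items()}

for color, angle in angles.items():
    print(f"{color:<6} {angle:6.1f} degrees")

# With every category included, the slices add up to the whole circle
print(f"Total  {sum(angles.values()):6.1f} degrees")
```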

Describing the Relationship between Two Categorical Variables
Is there an association between the two categorical variables?
Two variables
Variable of interest = Response variable
Other variable = Explanatory variable
The explanatory variable is used to explain the differences in the response variable.

Is the distribution of the response variable different for different categories of the explanatory variable?
There is an association between the two variables.

Is the distribution of the response variable approximately the same for the different categories of the explanatory variable?
There is NOT an association between the two variables.

Friday Announcements
First online homework due Wednesday at 8am, no exceptions.
After you submit, you will be able to go to your grades and see which questions you got correct and incorrect.

Online Homework 2:
It displays bar graphs with no spaces between the bars, which is incorrect.
We have only learned pie charts, bar charts, frequency/relative frequency tables, and mosaic plots.
If an online homework question mentions a different display (scatterplot, histogram), it cannot be a correct answer!

Personal glossaries are now available for chapters 1 and 2 on Blackboard. Go to course content, then chapter introduction. The document whose name ends in PG is the Personal Glossary.
Some chapters are short and some are longer.

JMP
Still working on communicating with our IT person.
If you would prefer to use a campus computer rather than your personal computer, many labs have JMP software: https://www-it.sws.iastate.edu/labsdb/

Describing the Relationship between Two Categorical Variables
Data = Two-Way Table (Contingency Table)
Rows = Categories of the explanatory variable
Columns = Categories of the response variable
Table entries = Number of observations belonging to a particular category of the explanatory variable and a particular category of the response variable.
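To see where the table entries come from, here is a minimal Python sketch that tallies a two-way table from raw data, one (gender, eye color) pair per case. The six records are hypothetical, made up for illustration; they are not the Stat 101 data.

```python
from collections import Counter

# Hypothetical raw data: one (gender, eye color) pair per case (row of the data file)
records = [
    ("Female", "Blue"), ("Male", "Brown"), ("Female", "Brown"),
    ("Male", "Blue"), ("Female", "Blue"), ("Male", "Hazel"),
]

genders = ["Female", "Male"]                          # rows: explanatory variable
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]  # columns: response variable

# Each cell counts the cases in one gender category AND one eye-color category;
# Counter returns 0 for any pair that never occurs
cell_counts = Counter(records)

print(f"{'Gender':<7}" + "".join(f"{c:>7}" for c in colors))
for g in genders:
    print(f"{g:<7}" + "".join(f"{cell_counts[(g, c)]:>7}" for c in colors))
```

Every case lands in exactly one cell, so the cell counts add up to the number of cases.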

Example Eye Color and Gender

Who: 2068 Stat 101 students
What: Eye color (Blue, Brown, Green, Hazel, Other) and gender (Male, Female)
Where, When, Why, and How: Same as before

Contingency Table

                        Eye Color
Gender    Blue   Brown   Green   Hazel   Other   Total
Female     370     352     198     187      18    1125
Male       359     290     110     160      24     943
Total      729     642     308     347      42    2068

Marginal Distributions
Look at percentages for each variable separately (ignoring the other variable)
Found in the margins (total row and total column) of the contingency table
Same as looking at the two variables separately


Example:
Marginal Distribution of Eye Color
Blue = 729/2068
Brown = 642/2068
Green = 308/2068
Hazel = 347/2068
Other = 42/2068
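The marginal distribution can be recomputed from the cells of the contingency table: add down each eye-color column (ignoring gender), then divide by the grand total. A minimal Python sketch, with the table stored as a nested dictionary (an illustrative choice):

```python
# Cell counts from the Eye Color by Gender contingency table
table = {
    "Female": {"Blue": 370, "Brown": 352, "Green": 198, "Hazel": 187, "Other": 18},
    "Male":   {"Blue": 359, "Brown": 290, "Green": 110, "Hazel": 160, "Other": 24},
}
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]

# Bottom margin: add down each eye-color column, ignoring gender
col_totals = {c: sum(row[c] for row in table.values()) for c in colors}
grand_total = sum(col_totals.values())  # 2068

# Marginal distribution of eye color = column total / grand total
marginal = {c: col_totals[c] / grand_total for c in colors}

for c in colors:
    print(f"{c:<6} {col_totals[c]}/{grand_total} = {marginal[c]:.4f}")
```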

Conditional Distributions
Look at percentages for one variable contingent upon (conditioned on) a particular category of the other variable
Conditioning variable = explanatory variable
Other variable = response variable

Compare conditional distributions to the marginal distribution for the same variable
Differences indicate a potential dependence (association) between the two variables


Example:
Eye Color conditioned on Gender = Female
Blue = 370/1125
Brown = 352/1125
Green = 198/1125
Hazel = 187/1125
Other = 18/1125
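The only change from the marginal calculation is the denominator: a conditional distribution divides by the row total for the conditioning category (1125 females) instead of the grand total. A minimal Python sketch; `conditional_distribution` is a hypothetical helper name, not a JMP function.

```python
# Cell counts from the Eye Color by Gender contingency table
table = {
    "Female": {"Blue": 370, "Brown": 352, "Green": 198, "Hazel": 187, "Other": 18},
    "Male":   {"Blue": 359, "Brown": 290, "Green": 110, "Hazel": 160, "Other": 24},
}

def conditional_distribution(table, row_label):
    """Distribution of eye color within one gender (hypothetical helper)."""
    row = table[row_label]
    row_total = sum(row.values())  # divide by the row total, not the grand total
    return {color: n / row_total for color, n in row.items()}

female = conditional_distribution(table, "Female")
female_total = sum(table["Female"].values())  # 1125
for color, p in female.items():
    print(f"{color:<6} {table['Female'][color]}/{female_total} = {p:.4f}")
```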

Eye Color conditioned on Gender
(counts, with row percentages in parentheses)

Gender   Blue          Brown         Green         Hazel         Other        Total
Female   370 (32.89)   352 (31.29)   198 (17.60)   187 (16.62)   18 (1.60)    1125 (100)
Male     359 (38.07)   290 (30.75)   110 (11.66)   160 (16.97)   24 (2.55)     943 (100)
Total    729 (35.25)   642 (31.04)   308 (14.89)   347 (16.78)   42 (2.03)    2068 (100)
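One way to compare the two conditional distributions is to line up the row percentages and take their differences, category by category. The Python sketch below does this (the helper `row_percentages` is hypothetical). It only reports the gaps; it does not decide whether they are large enough to indicate an association.

```python
# Cell counts from the Eye Color by Gender contingency table
table = {
    "Female": {"Blue": 370, "Brown": 352, "Green": 198, "Hazel": 187, "Other": 18},
    "Male":   {"Blue": 359, "Brown": 290, "Green": 110, "Hazel": 160, "Other": 24},
}

def row_percentages(row):
    """Conditional distribution of one row, as percentages (hypothetical helper)."""
    row_total = sum(row.values())
    return {color: 100 * n / row_total for color, n in row.items()}

female = row_percentages(table["Female"])
male = row_percentages(table["Male"])

# Gap in percentage points for each eye color (female minus male)
for color in female:
    gap = female[color] - male[color]
    print(f"{color:<6} female {female[color]:6.2f}  male {male[color]:6.2f}  gap {gap:+6.2f}")
```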

What proportion have blue eyes?

729/2068 = .35

What proportion have brown eyes?

642/2068 = .31

What proportion have brown OR blue eyes?

.35 + .31 = .66



What proportion have brown eyes AND are female?

352/2068 = .17
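The four proportion questions use two kinds of calculation: a column total over the grand total, or a single cell over the grand total. Brown and blue can be added because they are separate categories, so no student is counted twice; the exact OR proportion is 1371/2068, about 0.663, which matches the .66 from adding the rounded values. A minimal Python sketch reproducing the four answers (`column_total` is a hypothetical helper):

```python
# Cell counts from the Eye Color by Gender contingency table
table = {
    "Female": {"Blue": 370, "Brown": 352, "Green": 198, "Hazel": 187, "Other": 18},
    "Male":   {"Blue": 359, "Brown": 290, "Green": 110, "Hazel": 160, "Other": 24},
}
grand_total = sum(sum(row.values()) for row in table.values())  # 2068

def column_total(color):
    """Total for one eye color across both genders (hypothetical helper)."""
    return sum(row[color] for row in table.values())

p_blue = column_total("Blue") / grand_total
p_brown = column_total("Brown") / grand_total
# Blue and Brown are separate categories (no one is in both), so the counts add
p_blue_or_brown = (column_total("Blue") + column_total("Brown")) / grand_total
# AND picks out a single cell: Female row, Brown column
p_brown_and_female = table["Female"]["Brown"] / grand_total

for label, p in [("blue", p_blue), ("brown", p_brown),
                 ("blue OR brown", p_blue_or_brown),
                 ("brown AND female", p_brown_and_female)]:
    print(f"{label:<17} {p:.2f}")
```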

Mosaic Plot
Graphical summary of the conditional distributions in a contingency table.
Similar to segmented bar charts (in textbook).
Includes summary of marginal distributions.

Mosaic Plot and Association
Association
The lines (segments) in the mosaic plot do not line up.
Means the conditional distributions are different.
No association
The lines (segments) in the mosaic plot line up.
Means the conditional distributions are the same.
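The heights where segment lines fall in each column of a mosaic plot are the cumulative conditional proportions of the eye colors. The minimal Python sketch below computes those boundaries for each gender; if the two lists matched, the lines would line up. The helper name `segment_boundaries` is hypothetical, and the column widths assume the common mosaic-plot convention of making each column's width proportional to its group's share of the cases.

```python
# Cell counts from the Eye Color by Gender contingency table
table = {
    "Female": {"Blue": 370, "Brown": 352, "Green": 198, "Hazel": 187, "Other": 18},
    "Male":   {"Blue": 359, "Brown": 290, "Green": 110, "Hazel": 160, "Other": 24},
}
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]

def segment_boundaries(row):
    """Cumulative proportions where each eye-color segment ends (hypothetical helper)."""
    row_total = sum(row.values())
    bounds, running = [], 0
    for color in colors:
        running += row[color]
        bounds.append(round(running / row_total, 3))
    return bounds

grand_total = sum(sum(row.values()) for row in table.values())
for gender, row in table.items():
    # Assumed convention: column width = this group's share of all cases
    width = sum(row.values()) / grand_total
    print(f"{gender:<6} width {width:.3f}  boundaries {segment_boundaries(row)}")
```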

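Eye Color conditioned on Gender
[Figure: mosaic plot with sex (female, male) on the horizontal axis and eye color proportion (0.00 to 1.00) on the vertical axis, stacked in segments for blue, brown, green, hazel, and other]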

Are the conditional distributions of eye color given gender different?

So, does that mean there is or is not an association between the variables?