
# Chapter 2

## Displaying and Describing Categorical Data

Announcements
Clarification on labs: There is a lab every week.
Please fill out the student survey.
Online homework is due Wednesday, September 3rd, by 8am, so you may want to finish it by Tuesday. You can complete this assignment now. It is only about chapter 1, and we have finished chapter 1! You can start the assignment, save, and go back to it later. You cannot take it more than once.
If you are having difficulties, please send me an email with details on what you are having difficulties with. Also, include the following information:
Laptop or desktop computer?
Mac or PC?

Chapter Outline
Review 5 Ws of Data
Review Categorical Variables
Distribution of One Categorical Variable
Relationship between Two Categorical Variables

## The Five Ws of Data

Who
Cases, rows in data file

What
What variables, columns in data file

Where
Where did the data come from?

When
When were the data collected?

Why
Why did we collect data?

How
How did we get the data?

Categorical Variables
Variables with labels as values.
Gender
Eye Color
Year in School

## Example Eye Colors

Who: 2068 Stat 101 Students
What: Eye Color (Blue, Brown, Green, Hazel, Other)
Where: Iowa State University
When: Spring 2004 through Spring 2007
Why: As a part of a survey to collect interesting demographic data on Stat 101 students to use during the course.
How: Data collected through Internet survey.

Summarizing a Single Categorical Variable
Number, proportion or percentage of Whos in each category.
Summarize with
Frequency Table (Relative Frequency Table)
Bar Chart
Pie Chart

## Frequency Table (Relative Frequency Table)

Lists categories and number or proportion of Whos in each category.
Compare numbers and/or proportions.

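## Example of a Relative Frequency Table

| Eye Color | Frequency | Relative Frequency |
|-----------|-----------|--------------------|
| blue      | 729       | 0.35251            |
| brown     | 642       | 0.31044            |
| green     | 308       | 0.14894            |
| hazel     | 347       | 0.16779            |
| other     | 42        | 0.02031            |
| Total     | 2068      | 1.00000            |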
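The relative frequencies in this table are each count divided by the total number of Whos. A minimal Python sketch of that arithmetic, using the eye color counts from the table (the variable names `counts` and `rel_freq` are illustrative, not from the course materials):

```python
# Eye color counts taken from the frequency table.
counts = {"blue": 729, "brown": 642, "green": 308, "hazel": 347, "other": 42}

total = sum(counts.values())  # number of Whos (students surveyed)

# Relative frequency = category count / total count.
rel_freq = {color: n / total for color, n in counts.items()}

for color, n in counts.items():
    print(f"{color:<6} {n:>5} {rel_freq[color]:.5f}")
print(f"{'Total':<6} {total:>5} {sum(rel_freq.values()):.5f}")
```

Multiplying each relative frequency by 100 gives the percentage for that category.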

Bar Chart
Displays either number or percentage for each category.
Compare heights of bars.
Do not need to have all categories in display.

Example of a Bar Chart
[Figure: bar chart titled "Eye Colors of Stat 101 Students" with one bar each for Blue, Brown, Green, Hazel, and Other]

Pie Chart
Displays percentage of whole for each category.
Compare sizes of pie slices.
Must have all categories in display.

## Example of Pie Chart

[Figure: pie chart titled "Eye Colors of Stat 101 students" with one slice each for Blue, Brown, Green, Hazel, and Other]

## Describing the Relationship between Two Categorical Variables

Is there an association between the two categorical variables?
Two variables
Variable of interest = Response variable
Other variable = Explanatory variable
The explanatory variable is being used to explain the response variable.

## Describing the Relationship between Two Categorical Variables

Is the distribution of the response variable different for different categories of the explanatory variable?
There is an association between the two variables.
Is the distribution of one variable approximately the same for the different categories of the other variable?
There is NOT an association between the two variables.

Friday Announcements
First online homework due Wednesday at 8am, no exceptions.
After you submit, you will be able to go to your grades and see which questions you got correct and incorrect.

Online Homework 2:
displays bar graphs with no spaces, which is incorrect.
We have only learned pie charts, bar charts,
frequency/relative frequency tables, and mosaic plots.
If an online homework question mentions a different
display (scatterplot, histogram), it cannot be a

Friday Announcements
Personal glossaries are now available for chapters 1 and 2 on Blackboard. Go to course content, then chapter introduction. The document whose name ends in PG is the Personal Glossary.
Some chapters are short and some are longer.

JMP
Still working on communicating with our IT person.
If you would prefer to use a campus computer and not your personal computer, many labs have JMP software: https://www-it.sws.iastate.edu/labsdb/

## Describing the Relationship between Two Categorical Variables

Data = Two-Way Table (Contingency Table)
Rows = Categories of the explanatory variable
Columns = Categories of the response variable
Cells = Number of Whos belonging to particular category of explanatory variable and particular category of response variable.
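A two-way table is built by counting how many cases fall into each combination of categories. The sketch below illustrates this on a small, made-up list of cases (the six `cases` are hypothetical, not the Stat 101 survey data), with the explanatory variable in the rows:

```python
from collections import Counter

# Hypothetical raw cases (Whos): one (gender, eye color) pair per student.
cases = [
    ("Female", "Blue"), ("Female", "Brown"), ("Male", "Blue"),
    ("Male", "Hazel"), ("Female", "Blue"), ("Male", "Brown"),
]

# Each cell counts the Whos in one (explanatory, response) combination.
cells = Counter(cases)

rows = sorted({gender for gender, _ in cases})  # explanatory categories
cols = sorted({eye for _, eye in cases})        # response categories

print("Gender".ljust(8) + "".join(c.rjust(7) for c in cols) + "Total".rjust(7))
for gender in rows:
    row_counts = [cells[(gender, c)] for c in cols]
    line = gender.ljust(8) + "".join(str(n).rjust(7) for n in row_counts)
    print(line + str(sum(row_counts)).rjust(7))
```

A combination that never occurs in the data simply gets a count of 0, because `Counter` returns 0 for missing keys.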

## Example Eye Color and Gender

Who: 2068 Stat 101 Students
What: Eye Color (Blue, Brown, Green, Hazel, Other) and Gender (Male, Female)
Where, When, Why and How: Same as before

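Contingency Table
Rows = Gender, Columns = Eye Color

| Gender | Blue | Brown | Green | Hazel | Other | Total |
|--------|------|-------|-------|-------|-------|-------|
| Female | 370  | 352   | 198   | 187   | 18    | 1125  |
| Male   | 359  | 290   | 110   | 160   | 24    | 943   |
| Total  | 729  | 642   | 308   | 347   | 42    | 2068  |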

Marginal Distributions
Looks at percentages for each variable separately (ignoring the other variable)
Margins of contingency table
Same as looking at two variables separately


Example:
Marginal Distribution of Eye Color
Blue = 729/2068
Brown = 642/2068
Green = 308/2068
Hazel = 347/2068
Other = 42/2068
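The marginal fractions above come from the totals in the margins of the contingency table. As a rough sketch, assuming the table is stored as a dictionary of rows (the names `table`, `eye_marginal`, and `gender_marginal` are illustrative), both marginal distributions can be computed like this:

```python
# Counts from the contingency table: rows are Gender, columns are Eye Color.
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]
table = {
    "Female": [370, 352, 198, 187, 18],
    "Male":   [359, 290, 110, 160, 24],
}

grand_total = sum(sum(row) for row in table.values())

# Column totals (bottom margin) give the marginal distribution of eye color.
col_totals = [sum(row[i] for row in table.values()) for i in range(len(colors))]
eye_marginal = {c: t / grand_total for c, t in zip(colors, col_totals)}

# Row totals (right margin) give the marginal distribution of gender.
gender_marginal = {g: sum(row) / grand_total for g, row in table.items()}

for c in colors:
    print(f"{c:<6} {eye_marginal[c]:.4f}")
for g, p in gender_marginal.items():
    print(f"{g:<6} {p:.4f}")
```

Each marginal distribution ignores the other variable entirely, which is why only the totals are used.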

Conditional Distributions
Looks at percentages for one variable contingent upon (conditioned on) a particular category for the other variable
Conditioning variable = explanatory variable
Other variable = response variable
... distribution for same variable
Differences indicate a potential dependence (association) between the two variables


Example:
Eye Color conditioned on Gender = Females
Blue = 370/1125
Brown = 352/1125
Green = 198/1125
Hazel = 187/1125
Other = 18/1125
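A conditional distribution divides each cell by its row total instead of the grand total. The following sketch, with an illustrative helper named `conditional`, computes the distribution of eye color for each gender so the two can be compared side by side:

```python
# Counts from the contingency table: rows are Gender, columns are Eye Color.
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]
table = {
    "Female": [370, 352, 198, 187, 18],
    "Male":   [359, 290, 110, 160, 24],
}

def conditional(gender):
    """Distribution of eye color among students of the given gender."""
    row = table[gender]
    row_total = sum(row)  # condition on this row only
    return {c: n / row_total for c, n in zip(colors, row)}

female = conditional("Female")
male = conditional("Male")

for c in colors:
    print(f"{c:<6} female={female[c]:.3f} male={male[c]:.3f}")
```

Differences between the two columns of output are what would point to a potential association between gender and eye color.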

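Eye Colors
Row percentages in parentheses.

| Gender | Blue | Brown | Green | Hazel | Other     | Total      |
|--------|------|-------|-------|-------|-----------|------------|
| Female | 370  | 352   | 198   | 187   | 18 (1.60) | 1125 (100) |
| Male   | 359  | 290   | 110   | 160   | 24 (2.55) | 943 (100)  |
| Total  | 729  | 642   | 308   | 347   | 42 (2.03) | 2068 (100) |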

## What proportion have blue eyes?

729/2068 = .35

## What proportion have brown eyes?

642/2068 = .31

## What proportion have blue or brown eyes?

.35 + .31 = .66


## What proportion have brown eyes AND are female?

352/2068 = .17
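The proportions on these slides can be checked directly from the table counts. This is a small sketch with illustrative names (`column_total`, `p_blue`, and so on). It relies on the fact that each student reports exactly one eye color, so the blue and brown groups do not overlap and their proportions can be added:

```python
# Counts from the contingency table: rows are Gender, columns are Eye Color.
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]
table = {
    "Female": [370, 352, 198, 187, 18],
    "Male":   [359, 290, 110, 160, 24],
}
total = sum(sum(row) for row in table.values())

def column_total(color):
    """Number of students with the given eye color, ignoring gender."""
    i = colors.index(color)
    return sum(row[i] for row in table.values())

# Marginal proportions: column total over grand total.
p_blue = column_total("Blue") / total
p_brown = column_total("Brown") / total

# "Or" for non-overlapping categories: add the proportions.
p_blue_or_brown = p_blue + p_brown

# "And" across two variables: a single cell over the grand total.
p_brown_and_female = table["Female"][colors.index("Brown")] / total

print(round(p_blue, 2), round(p_brown, 2))
print(round(p_blue_or_brown, 2), round(p_brown_and_female, 2))
```

Note that the "AND" proportion divides by the grand total of 2068, not by the number of females. Dividing by 1125 would instead give the conditional proportion of brown eyes among females.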

Mosaic Plot
Graphical summary of conditional distributions in contingency table.
Similar to segmented bar charts (in textbook).
Includes summary of marginal distributions.

## Mosaic Plot and Association

Association
The lines (segments) in the mosaic plot do not line up.
Means conditional distributions are different.
No association
The lines (segments) in the mosaic plot line up.
Means conditional distributions are same.
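Whether the segments "line up" can also be checked numerically. In a mosaic plot, the segment boundaries within each column are the cumulative conditional proportions for that category of the explanatory variable. The sketch below computes those boundaries for the eye color and gender counts and reports the largest gap between matching boundaries. It does not draw the plot, and how large a gap counts as "not lining up" is a judgment call that the code does not make.

```python
from itertools import accumulate

# Counts from the contingency table: rows are Gender, columns are Eye Color.
colors = ["Blue", "Brown", "Green", "Hazel", "Other"]
table = {
    "Female": [370, 352, 198, 187, 18],
    "Male":   [359, 290, 110, 160, 24],
}

def segment_boundaries(row):
    """Cumulative conditional proportions: where each segment ends."""
    row_total = sum(row)
    return [running / row_total for running in accumulate(row)]

female = segment_boundaries(table["Female"])
male = segment_boundaries(table["Male"])

# Vertical gap between matching boundaries in the two columns.
gaps = [abs(f - m) for f, m in zip(female, male)]
max_gap = max(gaps)

for c, f, m in zip(colors, female, male):
    print(f"{c:<6} female={f:.3f} male={m:.3f}")
print(f"largest gap: {max_gap:.3f}")
```

If the conditional distributions were identical, every gap would be 0 and the segments would line up exactly.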

[Figure: mosaic plot of eye color (blue, brown, green, hazel, other) by sex (female, male); vertical axis runs from 0.00 to 1.00]

## Are the conditional distributions of eye color given gender different?

[Figure: mosaic plot of eye color (blue, brown, green, hazel, other) by sex (female, male); vertical axis runs from 0.00 to 1.00]

## So, does that mean there is or is not an association between the variables?

[Figure: mosaic plot of eye color (blue, brown, green, hazel, other) by sex (female, male); vertical axis runs from 0.00 to 1.00]