
Section 2

Descriptive Statistics
Part 1: Organizing Data

Learning Objectives
Variable and data types
Distribution tables and histograms
Distribution shapes

What is Descriptive Statistics?


As briefly seen in section 1, descriptive
statistics deals with
Methods of organizing or summarizing data
(graphs, tables, etc.)
Obtaining descriptive measures of centre,
variability, etc.

Descriptive Statistics: Data


The type of statistical analysis, or the
type of graph(s) that will be appropriate,
will depend on the type of DATA you
have.

Data: Definitions
Variable: A characteristic that varies
from one individual to another. In
statistics, we call these random
variables, because their values occur
by chance.
Example: Hair colour, height, number
of cells in a bacterial culture, etc.

Variable and Data Types


Qualitative (descriptive): The values
are usually words; that is, they take on
non-numerical values.

Variable and Data Types


Quantitative (numerical): Values are
numbers.
Two types:
Discrete: Possible values are just whole
numbers. Usually COUNTED.
Continuous: Like a calculus variable.
Possible values could theoretically be any
number to any decimal degree of accuracy.
Usually MEASURED.

Variable and Data Types


Previously we mentioned variables
(hair colour, height, number of cells in a
bacterial culture)
Which of those variables are
(a) Qualitative?
(b) Discrete?
(c) Continuous?

Data
Observing several values of a variable
gives a set of data.
Qualitative data comes from several
values of a qualitative variable.
Similar for discrete and continuous
data.

Goal 1 of Descriptive
Statistics
Make the data easy to read by
somehow grouping and graphing it.
The method of grouping and graphing
will depend on what type of data you
have!
Normally we use a computer to group and
graph for us.

Qualitative Data
This is the easiest to group and graph.
All you can do is count the number of
individual values in each category and
record these in a table.
Most common graphing options: bar charts and pie charts.

Qualitative Data: Example


Survey for favourite video game
console yielded the following data (next
slide).
N stands for Nintendo WiiU, P stands
for Playstation 4, and X stands for Xbox
One.


Qualitative Data: Example


Game console data:
P, N, N, N, X, P, P, X, N, X, N, N, P, N, N

First: Tabulate (frequency and/or relative frequency distribution).

Console type     Frequency   Relative Frequency
Nintendo WiiU
Playstation 4
Xbox One
TOTAL
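
As a cross-check on the hand tally, here is a minimal Python sketch that fills in the table from the game console data. The helper name `frequency_table` and the `labels` mapping are made up for this illustration; the course itself uses Minitab for this step.

```python
from collections import Counter

# Survey responses from the slide: N = Nintendo WiiU, P = Playstation 4, X = Xbox One
responses = "P, N, N, N, X, P, P, X, N, X, N, N, P, N, N".split(", ")
labels = {"N": "Nintendo WiiU", "P": "Playstation 4", "X": "Xbox One"}

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

table = frequency_table(responses)

print(f"{'Console type':<15}{'Frequency':>10}{'Relative frequency':>20}")
for code in ("N", "P", "X"):
    freq, rel = table[code]
    print(f"{labels[code]:<15}{freq:>10}{rel:>20.3f}")
print(f"{'TOTAL':<15}{len(responses):>10}{1.0:>20.3f}")
```

The relative frequency of a category is its frequency divided by the total number of observations, so the relative frequencies always add up to 1.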

Qualitative Data: Example


Next step: Graph (bar or pie chart).
Height of bar = frequency OR relative
frequency of that group.
[Bar chart: bars for Nintendo, Playstation, and Xbox; y-axis from 2 to 10]

Frequency Chart
[Bar chart: frequency (y-axis, 2 to 10) for Nintendo, Playstation, and Xbox]

Relative Frequency Chart
[Bar chart: relative frequency (y-axis, 0.2 to 0.6) for Nintendo, Playstation, and Xbox]

The bars look the same in both chart
types; only the scale on the y-axis
changes.
[Bar chart: relative frequency (y-axis, 0.2 to 0.6) for Nintendo, Playstation, and Xbox]
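
A rough text-only sketch of why the two charts have the same shape. The counts 8, 4, and 3 are tallied from the 15 responses in the example; the helper `bar_lengths` and the bar width of 40 characters are arbitrary choices for this illustration.

```python
# Frequencies for the console survey (tallied from the 15 responses in the example)
frequencies = {"Nintendo": 8, "Playstation": 4, "Xbox": 3}
total = sum(frequencies.values())
relative = {name: n / total for name, n in frequencies.items()}

def bar_lengths(values, width=40):
    """Scale values so the largest bar is `width` characters long."""
    largest = max(values.values())
    return {name: round(width * v / largest) for name, v in values.items()}

freq_bars = bar_lengths(frequencies)
rel_bars = bar_lengths(relative)

for title, values, bars in (("Frequency", frequencies, freq_bars),
                            ("Relative frequency", relative, rel_bars)):
    print(title)
    for name in values:
        print(f"  {name:<12}{'#' * bars[name]} {values[name]:.3g}")
```

Dividing every frequency by the same total rescales all bars by the same factor, so their heights relative to one another do not change.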

Qualitative Data: Example


In-class survey!
What's your major?
We'll record the results in Minitab and
see how to group and graph with bar
charts and pie charts!
Give appropriate titles to your graphs,
and include units where appropriate.

Quantitative Data
Graphs and grouping methods are
similar whether data is discrete or
continuous.


Grouping Quantitative Data


Similar to grouping qualitative data.
Still have frequency and relative
frequency.
In addition, have:
Cumulative frequency
Cumulative relative frequency


Grouping and Graphing Quantitative Data: Definitions
Class: Range of values.
Frequency, Relative Frequency:
Same as for qualitative data.
Cumulative Frequency: TOTAL count
(frequency) of observations SO FAR.
Cumulative Relative Frequency:
Total proportion (relative frequency) SO
FAR.

Grouping and Graphing Quantitative Data: Definitions
Lower Cutpoint: Smallest possible
value in a class.
Upper Cutpoint: Smallest value that
can go into the next higher class (same
as lower cutpoint of next higher class).
Midpoint: Middle of a class (average
of upper and lower cutpoints).
Width: Size of a class (difference
between upper and lower cutpoints).
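
A small sketch of the midpoint and width definitions above. The function name `class_summary` and the class [30, 40) are chosen only for illustration.

```python
def class_summary(lower, upper):
    """Midpoint and width of the class [lower, upper)."""
    midpoint = (lower + upper) / 2   # average of the two cutpoints
    width = upper - lower            # difference between the cutpoints
    return midpoint, width

# Illustrative class: lower cutpoint 30, upper cutpoint 40
midpoint, width = class_summary(30, 40)
print(f"midpoint = {midpoint}, width = {width}")
```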

Grouping and Graphing Quantitative Data: Definitions
Frequency Histogram: Like a bar
chart. Heights of bars = frequency of
class.
Relative Frequency / Cumulative
Frequency / Cumulative Relative
Frequency Histogram are done
similarly.
Outliers: Single observations that are
FAR AWAY from most of the data.

Notation For Classes

[ x, y )
Means everything from x up to, but not including y.
Example: All values from 30 to 40, including 30
but NOT including 40 itself, would be:

[30, 40)
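
In code, the half-open notation corresponds to a chained comparison. The helper `in_class` below is a hypothetical name used only for this sketch.

```python
def in_class(value, lower, upper):
    """True when value lies in the half-open class [lower, upper)."""
    return lower <= value < upper

print(in_class(30, 30, 40))    # the lower cutpoint is included
print(in_class(39.9, 30, 40))
print(in_class(40, 30, 40))    # the upper cutpoint belongs to the next class
```

Because the upper cutpoint is excluded, every observation falls into exactly one class, even when it lands exactly on a boundary.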

In Class Exercise 2.1.1


Suppose a survey was done to
determine daily TV viewing times.
Data is recorded in minutes:
61.4, 64.7, 70.1, 73.0, 73.5, 74.9, 75.4,
75.8, 77.9, 78.7, 79.5, 81.2, 83.6, 84.3,
85.9, 86.9, 96.1, 96.1, 101.4, 104.9.
Where should the classes begin? End?
What class width would be appropriate?

In Class Exercise 2.1.1


61.4, 64.7, 70.1, 73.0, 73.5, 74.9, 75.4,
75.8, 77.9, 78.7, 79.5, 81.2, 83.6, 84.3,
85.9, 86.9, 96.1, 96.1, 101.4, 104.9.
Create a distribution table for the data,
including all four types of frequencies.
Give an appropriate title and include
UNITS.

Partial Solution
Distribution of TV Viewing Times (minutes)

Class        Frequency   Relative    Cumulative   Cumulative Relative
                         Frequency   Frequency    Frequency
[60, 65)
[65, 70)
[70, 75)
[75, 80)
[80, 85)
[85, 90)
[90, 95)
[95, 100)
[100, 105)

Fill these in on the handout!
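
To check a hand-filled table, the four frequency columns can be computed with a short Python sketch like the one below. It assumes the classes of the partial solution (width 5, from 60 to 105); the function name `distribution_table` is made up for this illustration and is not a Minitab feature.

```python
times = [61.4, 64.7, 70.1, 73.0, 73.5, 74.9, 75.4, 75.8, 77.9, 78.7,
         79.5, 81.2, 83.6, 84.3, 85.9, 86.9, 96.1, 96.1, 101.4, 104.9]

def distribution_table(data, start, stop, width):
    """Rows of (lower, upper, freq, rel freq, cum freq, cum rel freq)
    for the classes [start, start + width), ... up to stop."""
    n = len(data)
    rows, cumulative = [], 0
    lower = start
    while lower < stop:
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq          # total count SO FAR
        rows.append((lower, upper, freq, freq / n, cumulative, cumulative / n))
        lower = upper
    return rows

rows = distribution_table(times, 60, 105, 5)
print("Distribution of TV Viewing Times (minutes)")
print(f"{'Class':<12}{'Freq':>6}{'Rel freq':>10}{'Cum freq':>10}{'Cum rel freq':>14}")
for lower, upper, freq, rel, cum, cum_rel in rows:
    label = f"[{lower}, {upper})"
    print(f"{label:<12}{freq:>6}{rel:>10.2f}{cum:>10}{cum_rel:>14.2f}")
```

The last row is a useful sanity check: the cumulative frequency should equal the number of observations (20) and the cumulative relative frequency should equal 1.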


Labelling your Histograms


Title: Should say "Histogram of", followed
by the variable.
Example: Histogram of TV Viewing Times.

The x-axis: Label the variable AND the units.
Example: TV Viewing Times (minutes).

The y-axis: Label the TYPE of count (i.e., one
of: frequency / relative frequency / cumulative
frequency / cumulative relative frequency).

In Class Exercise 2.1.1 (continued)
Using your table, construct the
following histograms. Give appropriate
labels to both.
Frequency
Cumulative Relative Frequency


How to use Minitab to do Histograms
Usually you'll use a computer to do
histograms for you.
On assignments, always use a
computer.
On exams, I'll give you any histograms
you may need: know how to read them
to get information from them.

How to Read Histograms and Cumulative Histograms
Frequency histograms give the
NUMBER of observations in a SINGLE
class.
Relative frequency histograms give the
PROPORTION of observations in a
SINGLE class.


How to Read Histograms and Cumulative Histograms
Cumulative frequency histograms give
the NUMBER of observations that fall in
a class or one of the PREVIOUS
classes.
Cumulative relative frequency
histograms give the PROPORTION of
observations that fall in a class or one
of the PREVIOUS classes.

Example:
Reading Histograms
Going back to the TV viewing times
data, the following histograms (next
slides) were done in Minitab.
Use the appropriate histogram to
answer the questions that follow.


How many people watched TV for between 80 and 85 minutes?

How many people watched TV for no more than 90 minutes?

What proportion of people watched TV for between 70 and 80 minutes?

What percentage of people watched TV for at least 70 minutes?
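
Readings taken from the histograms can be cross-checked against the raw TV viewing times from Exercise 2.1.1. The sketch below assumes that "between a and b" means the half-open range [a, b), matching the class notation, and that "no more than 90" is read off the classes below the cutpoint 90 (no observation equals 90 exactly). The helper `count_in` is a made-up name.

```python
times = [61.4, 64.7, 70.1, 73.0, 73.5, 74.9, 75.4, 75.8, 77.9, 78.7,
         79.5, 81.2, 83.6, 84.3, 85.9, 86.9, 96.1, 96.1, 101.4, 104.9]
n = len(times)

def count_in(data, lower, upper):
    """Number of observations in the half-open range [lower, upper)."""
    return sum(1 for x in data if lower <= x < upper)

# Frequency histogram: height of the single class [80, 85)
between_80_85 = count_in(times, 80, 85)

# Cumulative frequency histogram: everything up to the end of [85, 90)
below_90 = count_in(times, 60, 90)

# Relative frequency histogram: add the bars for [70, 75) and [75, 80)
prop_70_80 = count_in(times, 70, 80) / n

# Cumulative relative frequency: 1 minus the proportion below 70, as a percentage
pct_at_least_70 = 100 * (1 - count_in(times, 60, 70) / n)

print(between_80_85, below_90, prop_70_80, pct_at_least_70)
```

Each question maps to a different histogram type: counts in one class use the frequency histogram, "so far" counts use the cumulative one, and proportions or percentages use the relative versions.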

How Many Classes?


Obviously, one class is useless!
Similarly, having too many classes is
also useless.
Rule of thumb: Normally between 5
and 15 classes.
Computer will do it for you.
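
To see how the rule of thumb plays out, the sketch below counts the classes needed for the TV viewing times (61.4 to 104.9 minutes) at a few candidate widths. It assumes classes start at a multiple of the width; `class_count` is a hypothetical helper, and statistical software may pick its classes differently.

```python
import math

def class_count(lowest, highest, width):
    """Number of equal-width classes needed to cover lowest..highest,
    with the first class starting at a multiple of `width`."""
    start = math.floor(lowest / width) * width
    return math.floor((highest - start) / width) + 1

# TV viewing times range from 61.4 to 104.9 minutes
for width in (1, 2, 5, 10, 25):
    k = class_count(61.4, 104.9, width)
    verdict = "ok" if 5 <= k <= 15 else "outside 5-15"
    print(f"width {width:>2}: {k:>2} classes ({verdict})")
```

Widths of 5 and 10 minutes land inside the 5 to 15 range for this data set; very narrow or very wide classes do not.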


Distribution Shapes
When you make a histogram, it has a
particular shape. This is called the
distribution shape.
The shape of your data will determine
the type of statistical analysis that is
appropriate.
Therefore, this is information you will
use for the rest of the course and in
EVERY statistical study you conduct.

How to Construct Distribution Shapes
Distribution shapes ALWAYS come
from FREQUENCY or RELATIVE
FREQUENCY histograms.
Most commonly from RELATIVE
FREQUENCY.
Take a RELATIVE FREQUENCY
histogram and try to draw a smooth
curve over it.

How to Construct a
Distribution Shape
[Figure: histogram; y-axis: Relative Frequency, x-axis: Observation Classes]

Distribution Shapes
The following are perfect distribution
shapes.
In real life, you will almost never get a
histogram that looks perfect.
Therefore, these are meant as
guidelines.


Most Common Distribution Shapes
[Figures: four slides of distribution shapes]

Most Common Distribution Shapes: Definitions
Modality: The number of modes
(peaks / local maxima) in the
distribution.
You can have any number of modes (0,
1, 2, ...).
Unimodal: One mode.
Bimodal: Two modes.
Multimodal: Three or more modes.
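
A deliberately simple sketch of counting modes from a list of bar heights. It only counts bars that are strictly taller than both neighbours, so flat-topped peaks are missed and small bumps caused by sampling noise would be over-counted; in practice modality is judged by eye from the smoothed shape. The bar heights are made up for illustration.

```python
def count_modes(heights):
    """Count bars strictly taller than both neighbours.
    Missing neighbours at the two ends are treated as height 0."""
    padded = [0] + list(heights) + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i - 1] < padded[i] > padded[i + 1])

def modality(heights):
    """Name the modality using the definitions above."""
    k = count_modes(heights)
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

# Hypothetical bar heights
print(modality([1, 3, 6, 3, 1]))
print(modality([2, 6, 2, 1, 5, 2]))
print(modality([4, 4, 4, 4]))
```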

Most Common Distribution Shapes: Definitions
Symmetric: If cut in half, the
distribution would look mirrored on the
left and right pieces.
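
The mirror-image idea can be sketched as a comparison of each bar with its counterpart on the other side. The `tolerance` parameter is an assumption added here because real histograms are rarely exactly symmetric; the bar heights are made up.

```python
def is_symmetric(heights, tolerance=0):
    """True when each bar matches its mirror image within `tolerance`."""
    return all(abs(a - b) <= tolerance
               for a, b in zip(heights, reversed(heights)))

# Hypothetical bar heights
print(is_symmetric([1, 3, 6, 3, 1]))               # exact mirror image
print(is_symmetric([1, 6, 3, 2, 1]))               # long tail to the right
print(is_symmetric([1, 3, 6, 4, 1], tolerance=1))  # close enough
```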


Most Common Distribution Shapes
Examples of bimodal and multimodal
distributions:
[Figures: a bimodal distribution and a multimodal distribution]

Most Common Distribution Shapes
The most common distribution shapes
used in all of statistics are the NORMAL
and RIGHT SKEWED distributions.
In this course, these are the only ones
we'll study in detail, beginning in section
4.

Sample Distributions
What you just saw are perfect
distribution shapes.
In real life, you'll likely NEVER see a
perfect distribution shape in your
histograms from your samples.
This discrepancy between a sample and
its population is called Sampling Error.

Sampling Error
Sampling error will cause some minor
variations in a histogram.
Usually it is good enough to say a
histogram's shape is close to one of
our distributions.

Example
The following histogram came from a
normal distribution:

It's not perfect, but it looks close
enough to a bell curve. Any outliers?

Example
What do you think is the distribution
shape of this histogram?

Are there any potential outliers?



In Class Exercise 2.1.2


For the graphs on the next slides:
Determine the distribution shape.


In Class Exercise 2.1.2: Shape and Outliers?

Population and Sample Distribution
We take samples to predict properties
of a population, for example:
Mean
Standard deviation
DISTRIBUTION SHAPE


Distribution Shapes: Example


Can you determine the distribution
shapes of the following histograms?


Accuracy
Why do the shapes of those histograms
become less and less accurate?
A histogram's shape is most obvious
(and therefore, most representative of
the population's distribution) when the
sample size is ?
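
One way to explore this question is a small simulation. The sketch below is built on assumptions that are not from the slides: a standard normal population, classes of width 1 from -4 to 4, sample sizes of 20, 200, and 20000, and a fixed random seed. It measures how far the sample's relative frequencies stray from the population's.

```python
import random
from statistics import NormalDist

def relative_frequencies(data, cutpoints):
    """Relative frequency of each class [cutpoints[i], cutpoints[i+1])."""
    n = len(data)
    return [sum(1 for x in data if lo <= x < hi) / n
            for lo, hi in zip(cutpoints, cutpoints[1:])]

# Assumed setup: standard normal population, classes of width 1 from -4 to 4
cutpoints = list(range(-4, 5))
population = NormalDist(0, 1)
expected = [population.cdf(hi) - population.cdf(lo)
            for lo, hi in zip(cutpoints, cutpoints[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
worst_gap = {}
for n in (20, 200, 20000):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    observed = relative_frequencies(sample, cutpoints)
    # Largest difference between a sample bar and the matching population bar
    worst_gap[n] = max(abs(o - e) for o, e in zip(observed, expected))
    print(f"n = {n:>5}: largest gap from the population shape = {worst_gap[n]:.3f}")
```

Comparing the printed gaps across the three sample sizes suggests how sample size relates to how faithfully a histogram reflects the population's shape.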

