
# CHAPTER 1: Description of Samples and Populations

Objectives
After completing this chapter, you should be able to
➢ identify the different types of variables;
➢ show and illustrate the relationship between populations and
samples;
➢ illustrate how to construct a frequency distribution for use in making charts
and graphs.

## INTRODUCTION

Statistics is widely used in almost all fields of human interest. Business, engineering, the sciences, social research, agriculture, and other fields seek the broadest possible factual basis for decision-making. Statistics is defined as a science by which data are collected, organized, presented, analyzed, and interpreted. Sampling is a procedure wherein a portion of the data is taken from a larger set of data, called the population, and the inference drawn from the sample is extended to the whole group. If sampling is found appropriate for a study, the researcher first identifies the target population as precisely as possible, and in a way that makes sense in terms of the purpose of the study. Second, the researcher puts together a list of the target population from which the sample will be selected. Third, the researcher decides on a sampling technique. Finally, the researcher makes an inference about the population. These four steps are interwoven and cannot be considered in isolation from one another. Simple random sampling, systematic sampling, and stratified sampling fall into the category of simple sampling techniques. Complex sampling techniques are used only in the presence of large experimental data sets.

Students study statistics for several reasons:

1. Like professional people, you must be able to read and understand the
various statistical studies performed in your fields. To have this
understanding, you must be knowledgeable about the vocabulary,
symbols, concepts, and statistical procedures used in these studies.

2. You may be called on to conduct research in your field, since statistical
procedures are basic to research. To accomplish this, you must be able to
1 STATISTICS
design experiments; collect, organize, analyze, and summarize data; and
possibly make reliable predictions or forecasts for future use. You must
also be able to communicate the results of the study in your own words.

3. You can also use the knowledge gained from studying statistics to
become better consumers and citizens. For example, you can make
intelligent decisions about what products to purchase based on consumer
studies, about government spending based on utilization studies, and so
on.

These reasons can be considered the goals for studying statistics. It is the
purpose of this chapter to introduce the goals for studying statistics by answering
questions such as the following:
What are the branches of statistics?
What are data?
How are samples selected?

## 1.1. Descriptive and Inferential Statistics

To gain knowledge about seemingly haphazard situations, statisticians
collect information for variables, which describe the situation.

Data are the values (measurements or observations) that the variables can assume. Variables whose values are determined by chance are called random variables. Suppose that an insurance company studies its records over the past several years and determines that, on average, 3 out of every 100 automobiles the company insured were involved in accidents during a 1-year period. Although there is no way to predict the specific automobiles that will be involved in an accident (random occurrence), the company can adjust its rates accordingly, since the company knows the general pattern over the long run. (That is, on average, 3% of the insured automobiles will be involved in an accident each year.) A collection of data values forms a data set. Each value in the data set is called a data value or a datum.

Data can be used in different ways. The body of knowledge called statistics is sometimes divided into two main areas, depending on how data are used. The two areas are
1. Descriptive statistics
2. Inferential statistics

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

In descriptive statistics the statistician tries to describe a situation. Consider the national census conducted by the U.S. government every 10 years. Results of this census give you the average age, income, and other characteristics of the U.S. population. To obtain this information, the Census Bureau must have some means to collect relevant data. Once data are collected, the bureau must organize and summarize them. Finally, the bureau needs a means of presenting the data in some meaningful form, such as charts, graphs, or tables.

The second area of statistics is called inferential statistics.

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Here, the statistician tries to make inferences from samples to populations. Inferential statistics uses probability, i.e., the chance of an event occurring. You may be familiar with the concepts of probability through various forms of gambling. If you play cards, dice, bingo, or lotteries, you win or lose according to the laws of probability. Probability theory is also used in the insurance industry and other areas.

It is important to distinguish between a sample and a population.

A population consists of all subjects (human or otherwise) that are being studied. Most of the time, due to the expense, time, size of population, medical concerns, etc., it is not possible to use the entire population for a statistical study; therefore, researchers use samples.

A sample is a group of subjects selected from a population.

If the subjects of a sample are properly selected, most of the time they should possess the same or similar characteristics as the subjects in the population.

An area of inferential statistics called hypothesis testing is a decision-making process for evaluating claims about a population, based on information obtained from samples.

To apply the best sampling technique for a particular study, the researcher should understand the concepts of constants and variables. Constants are fundamental quantities that do not change throughout the course of a study. Variables, on the other hand, are quantities that may take any of a specific set of values. A variable refers to a characteristic of a person or thing that can be assigned a number or category. Blood type, blood pressure, gender, and student grades are examples of variables. The values of a variable can be classified as qualitative (categorical) or quantitative (numerical). Qualitative variables are non-measurable characteristics that cannot assume a numerical value but can be classified into two or more categories. Gender is a qualitative dichotomous variable. The drinking habits of an individual in different situations, classified as “Very Often”, “Often”, “Seldom”, “Very Seldom”, or “Never”, are an example of a multinomous variable.
In an ordinal scale of measurement, categorical variables are usually coded numerically for the purpose of obtaining a weighted average that typically represents a group of responses. For example, medical practitioners’ perceptions of an issue can be classified and coded as 5 for “Certainly Agree”, 4 for “Agree”, 3 for “Undecided”, 2 for “Disagree”, and 1 for “Certainly Disagree”.
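
As a rough illustration of how such coded responses yield a weighted average, the sketch below applies the 5-to-1 coding described above. The response counts are invented for illustration, and the name `weighted_average` is an assumption rather than anything defined in the text.

```python
# Numeric codes for the ordinal responses, as given in the text.
CODES = {
    "Certainly Agree": 5,
    "Agree": 4,
    "Undecided": 3,
    "Disagree": 2,
    "Certainly Disagree": 1,
}

def weighted_average(response_counts):
    """Average of the codes, weighted by how many respondents chose each one."""
    total_respondents = sum(response_counts.values())
    weighted_sum = sum(CODES[r] * n for r, n in response_counts.items())
    return weighted_sum / total_respondents

# Hypothetical tally of 20 respondents (not data from the text).
counts = {
    "Certainly Agree": 4,
    "Agree": 10,
    "Undecided": 3,
    "Disagree": 2,
    "Certainly Disagree": 1,
}
average = weighted_average(counts)
print(round(average, 2))
```

An average close to 4 would suggest that the group, taken as a whole, leans toward “Agree”.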
Quantitative variables are numerical and can be ordered or ranked. Ages, heights, weights, and body temperatures are examples of quantitative variables. A variable such as weight is continuous because, in principle, two weights can be arbitrarily close together. Some types of numeric variables are not continuous but fall on a discrete scale, with gaps between the possible values.

Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring.

Discrete variables assume values that can be counted.

A discrete variable is a numeric variable for which we can list the possible values. For example, the number of eggs in a bird’s nest is a discrete variable because only the values 0, 1, 2, 3, . . . are possible. Another example of a discrete variable is the number of bacteria colonies in a petri dish.

In summary, variables can be classified as follows:

[Diagram: DATA, divided into QUALITATIVE and QUANTITATIVE]

In addition to being classified as qualitative or quantitative, variables can be classified by how they are categorized, counted, or measured. For example, can the data be organized into specific categories, such as area of residence (rural, suburban, or urban)? Can the data values be ranked, such as first place, second place, etc.? Or are the values obtained from measurement, such as heights, IQs, or temperature? This type of classification (i.e., how variables are categorized, counted, or measured) uses measurement scales, and four common types of scales are used: nominal, ordinal, interval, and ratio.

The first level of measurement is called the nominal level of measurement. A sample of college instructors classified according to subject taught (e.g., English, history, psychology, or mathematics) is an example of nominal-level measurement. Classifying survey subjects as male or female is another example of nominal-level measurement. No ranking or order can be placed on the data. Classifying residents according to zip codes is also an example of the nominal level of measurement. Even though numbers are assigned as zip codes, there is no meaningful order or ranking. Other examples of nominal-level data are political party (Democratic, Republican, Independent, etc.), religion (Christianity, Judaism, Islam, etc.), and marital status (single, married, divorced, widowed, separated).

The nominal level of measurement classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data.

The next level of measurement is called the ordinal level. Data measured at this level can be placed into categories, and these categories can be ordered, or ranked. For example, from student evaluations, guest speakers might be ranked as superior, average, or poor. Floats in a homecoming parade might be ranked as first place, second place, etc. Note that precise measurement of differences in the ordinal level of measurement does not exist. For instance, when people are classified according to their build (small, medium, or large), a large variation exists among the individuals in each class. Other examples of ordinal data are letter grades (A, B, C, D, F).

The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

The third level of measurement is called the interval level. This level differs from the ordinal level in that precise differences do exist between units. For example, many standardized psychological tests yield values measured on an interval scale. IQ is an example of such a variable. There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110. Temperature is another example of interval measurement, since there is a meaningful difference of 1 degree Fahrenheit between each unit, such as 72 and 73 degrees F. One property is lacking in the interval scale: there is no true zero. For example, IQ tests do not measure people who have no intelligence.

The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.

The final level of measurement is called the ratio level. Examples of ratio scales are those used to measure height, weight, area, and number of phone calls received. Ratio scales have differences between units (1 inch, 1 pound, etc.) and a true zero. In addition, the ratio scale contains a true ratio between values. For example, if one person can lift 200 pounds and another can lift 100 pounds, then the ratio between them is 2 to 1. Put another way, the first person can lift twice as much as the second person.

The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero.

In addition, true ratios exist when the same variable is measured on two
different members of the population. There is not complete agreement among
statisticians about the classification of data into one of the four categories. For
example, some researchers classify IQ data as ratio data rather than interval.
Also, data can be altered so that they fit into a different category. For instance, if
the incomes of all professors of a college are classified into the three categories
of low, average, and high, then a ratio variable becomes an ordinal variable.
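
The regrouping of incomes described above can be sketched in a few lines. The cutoff values and the income figures below are hypothetical, chosen only to show a ratio-level variable being collapsed into three ordered categories.

```python
def income_category(income, low_cutoff, high_cutoff):
    """Collapse a ratio-level income into an ordinal category.

    The cutoffs are assumptions supplied by the caller; the text does not
    specify where "low" ends or where "high" begins.
    """
    if income < low_cutoff:
        return "low"
    if income <= high_cutoff:
        return "average"
    return "high"

# Hypothetical annual incomes of five professors.
incomes = [42000, 58000, 75000, 96000, 120000]
categories = [income_category(x, low_cutoff=50000, high_cutoff=90000) for x in incomes]
print(categories)
```

Once the data are grouped this way, the categories can still be ranked, but the exact differences and ratios between incomes are no longer available.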

Table 1–2 gives some examples of each type of data.

## 1.3. Data Collection and Sampling Techniques

## Steps in Statistical Investigation

1. Collection
2. Organization
3. Presentation
4. Analysis
5. Interpretation

## Methods of Collection of Data

1. Direct or Interview Method
2. Indirect or Questionnaire Method
3. Registration
4. Observation
5. Experiment

## Sampling Techniques

1. Random Sampling. Random samples are selected by using chance methods or random numbers. One such method is to number each subject in the population. Then place numbered cards in a bowl, mix them thoroughly, and select as many cards as needed. The subjects whose numbers are selected constitute the sample. Since it is difficult to mix the cards thoroughly, there is a chance of obtaining a biased sample. For this reason, statisticians use another method of obtaining numbers. They generate random numbers with a computer or calculator. Before the invention of computers, random numbers were obtained from tables.

2. Systematic Sampling. Researchers obtain systematic samples by numbering each subject of the population and then selecting every kth subject.

3. Stratified Sampling. Researchers obtain stratified samples by dividing the population into groups (called strata) according to some characteristic that is important to the study, then sampling from each group. Samples within the strata should be randomly selected. For example, suppose the president of a two-year college wants to learn how students feel about a certain issue. Furthermore, the president wishes to see if the opinions of the first-year students differ from those of the second-year students. The president will select students from each group to use in the sample.

4. Cluster Sampling. Researchers also use cluster samples. Here the population is divided into groups called clusters by some means such as geographic area or schools in a large school district, etc. Then the researcher randomly selects some of these clusters and uses all members of the selected clusters as the subjects of the samples. Suppose a researcher wishes to survey apartment dwellers in a large city. If there are 10 apartment buildings in the city, the researcher can select at random 2 buildings from the 10 and interview all the residents of these buildings.
Cluster sampling is used when the population is large or when it involves subjects residing in a large geographic area. For example, if one wanted to do a study involving the patients in the hospitals in New York City, it would be very costly and time-consuming to try to obtain a random sample of patients since they would be spread over a large area. Instead, a few hospitals could be selected at random, and the patients in these hospitals would be interviewed in a cluster.
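
The four sampling methods above can be sketched with Python's standard `random` module. This is only an illustration under stated assumptions: the populations are made up, the function names are not from the text, and the systematic sample uses a random starting point among the first k subjects, which the text does not specify.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n subjects by chance; every subject is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Select every kth subject (assumption: random start within the first k)."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Randomly sample n_per_stratum subjects from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of those chosen."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2024)            # fixed seed only so the sketch is repeatable
subjects = list(range(1, 101))       # hypothetical population numbered 1..100

random_sample = simple_random_sample(subjects, 10, rng)
every_tenth = systematic_sample(subjects, 10, rng)

# Hypothetical two-year college: strata are first-year and second-year students.
students = {
    "first-year": [f"F{i}" for i in range(1, 61)],
    "second-year": [f"S{i}" for i in range(1, 41)],
}
by_year = stratified_sample(students, 5, rng)

# Hypothetical city with 10 apartment buildings of 3 residents each.
buildings = {f"Building {b}": [f"B{b}-R{r}" for r in range(1, 4)]
             for b in range(1, 11)}
interviewed = cluster_sample(buildings, 2, rng)
```

Note the contrast between the last two functions: stratified sampling takes some subjects from every group, while cluster sampling takes every subject from some groups.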

In addition to the four basic sampling methods, researchers use other methods to obtain samples. One such method is called a convenience sample. Here a researcher uses subjects that are convenient. For example, the researcher may interview subjects entering a local mall to determine the nature of their visit or perhaps what stores they will be patronizing. This sample is probably not representative of the general customers for several reasons. For one thing, it was probably taken at a specific time of day, so not all customers entering the mall have an equal chance of being selected since they were not there when the survey was being conducted. But convenience samples can be representative of the population. If the researcher investigates the characteristics of the population and determines that the sample is representative, then it can be used.

EXERCISES 2.1

1. In each of these statements, tell whether descriptive or inferential
statistics have been used.
A. In the year 2010, 148 million Americans will be enrolled in an HMO
(Source: USA TODAY).
B. Nine out of ten on-the-job fatalities are men (Source: USA TODAY
Weekend).
C. Expenditures for the cable industry were \$5.66 billion in 1996 (Source:
USA TODAY ).
D. The median household income for people aged 25–34 is \$35,888
(Source: USA TODAY ).
E. Allergy therapy makes bees go away (Source: Prevention).
F. Drinking decaffeinated coffee can raise cholesterol levels by 7%
(Source: American Heart Association).
G. The national average annual medicine expenditure per person is
\$1052 (Source: The Greensburg Tribune Review).

H. Experts say that mortgage rates may soon hit bottom (Source: USA
TODAY ).
2. Classify each as nominal-level, ordinal-level, interval-level, or ratio-level
measurement.
A. Pages in the city of Cleveland telephone book.
B. Rankings of tennis players.
C. Weights of air conditioners.
D. Temperatures inside 10 refrigerators.
E. Salaries of the top five CEOs in the United States.
F. Ratings of eight local plays (poor, fair, good, excellent).
G. Times required for mechanics to do a tune-up.
H. Ages of students in a classroom.
I. Marital status of patients in a physician’s office.
J. Horsepower of tractor engines.
3. Classify each variable as qualitative or quantitative.
A. Number of bicycles sold in 1 year by a large sporting goods store.
B. Colors of baseball caps in a store.
C. Times it takes to cut a lawn.
D. Capacity in cubic feet of six truck beds.
E. Classification of children in a day care center (infant, toddler,
preschool).
F. Weights of fish caught in Lake George.
G. Marital status of faculty members in a large university
4. Classify each variable as discrete or continuous.
A. Number of doughnuts sold each day by Doughnut Heaven.
B. Water temperatures of six swimming pools in Pittsburgh on a given
day.
C. Weights of cats in a pet shelter.
D. Lifetime (in hours) of 12 flashlight batteries.
E. Number of cheeseburgers sold each day by a hamburger stand on a
college campus.
F. Number of DVDs rented each day by a video store.
G. Capacity (in gallons) of six reservoirs in Jefferson County.
5. Give three examples each of nominal, ordinal, interval, and ratio data.

6. For each of these statements, define a population and state how a sample
might be obtained.
A. The average cost of an airline meal is \$4.55 (Source: Everything Has
Its Price, Richard E. Donley, Simon and Schuster).
B. More than 1 in 4 United States children have cholesterol levels of 180
milligrams or higher (Source: The American Health Foundation).
C. Every 10 minutes, 2 people die in car crashes and 170 are injured
(Source: National Safety Council estimates).
D. When older people with mild to moderate hypertension were given
mineral salt for 6 months, the average blood pressure reading dropped
by 8 points systolic and 3 points diastolic (Source: Prevention).
E. The average amount spent per gift for Mom on Mother’s Day is \$25.95
(Source: The Gallup Organization)
7. Select a newspaper or magazine article that involves a statistical study,
and write a paper answering these questions.
B. What are the variables used in the study? In your opinion, what level of
measurement was used to obtain the data from the variables?
C. Does the article define the population? If so, how is it defined? If not,
how could it be defined?
D. Does the article state the sample size and how the sample was
obtained? If so, determine the size of the sample and explain how it
was selected. If not, suggest a way it could have been obtained.
E. Explain in your own words what procedure (survey, comparison of
groups, etc.) might have been used to determine the study’s
conclusions.
F. Do you agree or disagree with the conclusions? State your reasons.
8. Information from research studies is sometimes taken out of context.
Explain why the claims of these studies might be suspect.
A. The average salary of the graduates of the class of 1980 is \$32,500.
B. It is estimated that in Podunk there are 27,256 cats.
C. Only 3% of the men surveyed read Cosmopolitan magazine.
D. Based on a recent mail survey, 85% of the respondents favored gun
control.
E. A recent study showed that high school dropouts drink more coffee
than students who graduated; therefore, coffee dulls the brain.
F. Since most automobile accidents occur within 15 miles of a person’s
residence, it is safer to make long trips.
9. Identify each study as being either observational or experimental.
A. Subjects were randomly assigned to two groups, and one group was given
an herb and the other group a placebo. After 6 months, the numbers of
respiratory tract infections each group had were compared.
B. A researcher stood at a busy intersection to see if the color of the
automobile that a person drives is related to running red lights.

## The Frequency Distributions

Now that we know how the data under study are classified, the first step is to describe the data carefully and systematically in tables and graphs. An appropriate form of organization and presentation should be used in order to arrive at a meaningful interpretation of the data.
A frequency distribution is simply a display of the frequency, or number of occurrences, of each value in the data set. The data can be presented in tabular form or with a graph. Graphical representation is an effective means of organizing and presenting statistical data because the important relationships are brought out more clearly in visual form. A bar chart is a simple graphic showing the categories that a categorical variable takes on and the number of observations in each category for the data in the sample.
A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Let us consider the following example taken from Elementary Statistics by Bluman, 2008. Suppose a researcher wished to study the number of miles that employees of a certain company traveled to work each day. The researcher first would have to collect the data by asking the employees the approximate distance they travel from their homes. Data that are not yet arranged are called raw data. The data collected are as follows:

1 2 6 7 12 13 2 6 9 5
18 7 3 15 15 4 17 1 14 5
4 16 4 5 8 6 6 18 5 2
9 11 12 1 9 2 10 11 4 10
9 18 8 8 4 14 7 3 2 6

To construct the frequency distribution, we first arrange the set of data into an array. An array is the arrangement of data from highest to lowest or from lowest to highest. The frequency distribution consists of classes and their corresponding frequencies. Each raw data value is placed into a category called a class. The class frequency refers to the number of observations belonging to a class interval, or the number of items within a category.

The frequency distribution of the above set of data is shown below.

| Class Limits (in miles) | Frequency |
|---|---|
| 1 - 3 | 10 |
| 4 - 6 | 14 |
| 7 - 9 | 10 |
| 10 - 12 | 6 |
| 13 - 15 | 5 |
| 16 - 18 | 5 |
| Total | 50 |
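
The tallying behind this table can be checked with a short sketch. The data and class limits are the ones given above; the function name `tally` is an assumption made for illustration.

```python
# Raw data: miles traveled to work by 50 employees (from the example above).
miles = [
    1, 2, 6, 7, 12, 13, 2, 6, 9, 5,
    18, 7, 3, 15, 15, 4, 17, 1, 14, 5,
    4, 16, 4, 5, 8, 6, 6, 18, 5, 2,
    9, 11, 12, 1, 9, 2, 10, 11, 4, 10,
    9, 18, 8, 8, 4, 14, 7, 3, 2, 6,
]

# Class limits (lower, upper), both inclusive.
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]

def tally(data, class_limits):
    """Count how many data values fall within each class."""
    return {(lo, hi): sum(lo <= x <= hi for x in data) for lo, hi in class_limits}

frequencies = tally(miles, classes)
for (lo, hi), f in frequencies.items():
    print(f"{lo:>2} - {hi:<2} {f}")
print("Total =", sum(frequencies.values()))
```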

Using this table, general observations can be made. For example, it can be gleaned from the table that the majority of the employees (34 of 50) live within 9 miles of the company.

The classes in this distribution are 1 – 3, 4 – 6, etc. For the interval 1 – 3, 1 is the lower class limit and 3 is the upper class limit. These values are called class limits. Class boundaries are more precise expressions of the class limits: each boundary lies halfway between the upper limit of one class and the lower limit of the next, so that no gaps remain between classes. In the table above, 0.5 – 3.5 are the class boundaries of the interval 1 – 3. Notice that the lower class boundary is 0.5 below the lower class limit and the upper class boundary is 0.5 above the upper class limit. The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. The class width of the above distribution is given by 4 – 1 = 7 – 4 = 3.
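
A minimal sketch of these two calculations follows, assuming the data are recorded in whole units so that each boundary sits half a unit beyond the class limit. The function names and the `unit` parameter are illustrative assumptions.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend each class limit by half a unit to close the gaps between classes."""
    half_unit = unit / 2
    return (lower_limit - half_unit, upper_limit + half_unit)

def class_width(lower_limit, next_lower_limit):
    """Width = lower limit of the next class minus lower limit of this class."""
    return next_lower_limit - lower_limit

limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
boundaries = [class_boundaries(lo, hi) for lo, hi in limits]
width = class_width(limits[0][0], limits[1][0])

print(boundaries[0])   # boundaries of the class 1 - 3
print(width)           # 4 - 1
```

The upper boundary of each class coincides with the lower boundary of the next, which is what allows the bars of a histogram to touch.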

1. Determine the classes.
Find the range of the scores in the given data. The range is the difference between the highest and the lowest number: $R = H - L$.

Divide the range by the size of the class interval desired:

$$\text{Number of Classes} = \frac{\text{Range}}{\text{Class Width}}$$

Notice that if a series contains fewer than 50 cases, 10 classes or fewer are enough. If a series contains 50 to 100 cases, 10 to 15 classes are recommended. If it contains more than 100 cases, 15 or more classes are good.

2. Prepare the class interval and class frequency columns.
3. Tally the data and find the numerical frequencies from the tallies.

Example 2. Consider the scores of 50 students in Statistics.

46 46 45 43 43 43 42 41 40 40
39 37 37 37 36 35 34 32 31 30
29 29 29 29 28 28 28 28 28 28
27 27 27 26 26 26 25 25 24 24
24 23 23 22 19 19 18 14 13 9

Step 1: Determine the classes.
The highest value is 46 and the lowest value is 9. Find the range: $R = H - L = 46 - 9 = 37$.
Select the number of classes desired (usually between 5 and 20). In this case, 13 is arbitrarily chosen.

Step 2: Prepare the class interval and class frequency columns.

$$\text{Class width} = \frac{37}{13} \approx 2.85, \text{ rounded up to } 3$$

Step 3: Tally the data and find the numerical frequencies from the tallies.

| Class Intervals (Scores) | Frequency |
|---|---|
| 45 - 47 | 3 |
| 42 - 44 | 4 |
| 39 - 41 | 4 |
| 36 - 38 | 4 |
| 33 - 35 | 2 |
| 30 - 32 | 3 |
| 27 - 29 | 13 |
| 24 - 26 | 8 |
| 21 - 23 | 3 |
| 18 - 20 | 3 |
| 15 - 17 | 0 |
| 12 - 14 | 2 |
| 9 - 11 | 1 |

N = 50
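
The three steps of this example can be traced in code. The scores and the choice of 13 classes come from the example; rounding the class width up with `math.ceil` and starting the first class at the lowest score are assumptions that happen to reproduce the table above.

```python
import math

# Scores of 50 students in Statistics (from the example above).
scores = [
    46, 46, 45, 43, 43, 43, 42, 41, 40, 40,
    39, 37, 37, 37, 36, 35, 34, 32, 31, 30,
    29, 29, 29, 29, 28, 28, 28, 28, 28, 28,
    27, 27, 27, 26, 26, 26, 25, 25, 24, 24,
    24, 23, 23, 22, 19, 19, 18, 14, 13, 9,
]

# Step 1: range R = H - L and the desired number of classes.
highest, lowest = max(scores), min(scores)
value_range = highest - lowest
num_classes = 13

# Step 2: class width (rounded up) and the class intervals, lowest class first.
width = math.ceil(value_range / num_classes)
classes = [(lo, lo + width - 1) for lo in range(lowest, highest + 1, width)]

# Step 3: tally the scores into the classes.
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in classes]

# Print highest class first, matching the layout of the table.
for (lo, hi), f in reversed(list(zip(classes, frequencies))):
    print(f"{lo:>2} - {hi:<2} {f}")
```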

Frequency distributions are constructed for the following reasons, all of which are helpful when one is organizing and presenting data.

1. To organize the data in a meaningful, intelligible way.

2. To enable the reader to determine the nature or shape of the
distribution.
3. To facilitate computational procedures for measures of average and
4. To enable the researcher to draw charts and graphs for the
presentation of data.
5. To enable the reader to make comparisons among different data sets.

## Histograms, Frequency Polygons and Ogives

After all the data have been organized into a frequency distribution, they can be presented in graphical form. Graphical representations of data are helpful tools for conveying the mathematical relations of one variable to another.

Statistical graphs can be used to describe and analyze a data set. The three most commonly used graphs in research are as follows:

1. Histogram. A graph composed of a series of rectangles constructed with the steps (classes) as the base and the frequencies as the height.
2. Frequency Polygon. A graph constructed by connecting points placed above the midpoint of each step at a height equal to the frequency of the step.
3. Cumulative Frequency Graph or Ogive. A graph of a cumulative frequency distribution, drawn as a cumulative frequency curve (ogive).

Example: Consider the frequency distribution presented in section 2.2. Construct the histogram, frequency polygon and the ogives.

| Class Intervals (Scores) | Frequency |
|---|---|
| 45 - 47 | 3 |
| 42 - 44 | 4 |
| 39 - 41 | 4 |
| 36 - 38 | 4 |
| 33 - 35 | 2 |
| 30 - 32 | 3 |
| 27 - 29 | 13 |
| 24 - 26 | 8 |
| 21 - 23 | 3 |
| 18 - 20 | 3 |
| 15 - 17 | 0 |
| 12 - 14 | 2 |
| 9 - 11 | 1 |

N = 50

| Class Intervals (Scores) | Class Boundaries | Frequency | Cumulative Frequency |
|---|---|---|---|
| 45 - 47 | 44.5 - 47.5 | 3 | 50 |
| 42 - 44 | 41.5 - 44.5 | 4 | 47 |
| 39 - 41 | 38.5 - 41.5 | 4 | 43 |
| 36 - 38 | 35.5 - 38.5 | 4 | 39 |
| 33 - 35 | 32.5 - 35.5 | 2 | 35 |
| 30 - 32 | 29.5 - 32.5 | 3 | 33 |
| 27 - 29 | 26.5 - 29.5 | 13 | 30 |
| 24 - 26 | 23.5 - 26.5 | 8 | 17 |
| 21 - 23 | 20.5 - 23.5 | 3 | 9 |
| 18 - 20 | 17.5 - 20.5 | 3 | 6 |
| 15 - 17 | 14.5 - 17.5 | 0 | 3 |
| 12 - 14 | 11.5 - 14.5 | 2 | 3 |
| 9 - 11 | 8.5 - 11.5 | 1 | 1 |

N = 50

A. Histogram. Draw and label the x and y axes. The frequency is represented on the y-axis and the class boundaries on the x-axis. Using the frequencies as the heights, draw vertical bars for each class.

[Figure: Histogram of the scores. x-axis: Class Boundaries (8.5 to 47.5); y-axis: Frequency (1 to 14).]
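
As a rough stand-in for the plotted histogram, the sketch below prints one bar per class using the class boundaries and frequencies from the table. It is a text rendering with horizontal bars rather than the vertical bars described above, and the function name `text_histogram` is an assumption.

```python
# Class boundaries 8.5, 11.5, ..., 47.5 and frequencies, lowest class first.
boundaries = [8.5 + 3 * i for i in range(14)]
frequencies = [1, 2, 0, 3, 3, 8, 13, 3, 2, 4, 4, 4, 3]

def text_histogram(boundaries, frequencies):
    """Return one line per class: its boundaries, then a bar of '#' marks."""
    lines = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lower:>4} - {upper:<4} | {'#' * f} {f}")
    return lines

lines = text_histogram(boundaries, frequencies)
for line in lines:
    print(line)
```

The longest bar belongs to the class with boundaries 26.5 to 29.5, where the scores peak.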

B. Frequency Polygon.
Steps in making a frequency polygon:
1. Label the points on the base line.
2. Plot the midpoints. Scores within the interval are treated as concentrated at the
midpoint.
3. When all points are plotted, join them by a series of short lines.

The figure above shows the frequency polygon. Notice that the class boundaries are marked along the x-axis and the frequencies along the y-axis. Each point on the line is plotted at the class mark, or midpoint, of each class interval.

C. The Cumulative Frequency Graph. To construct the cumulative frequency graph, we first construct the cumulative frequency column of our frequency distribution.

| Class Intervals (Scores) | Class Boundaries | Frequency | Cumulative Frequency |
|---|---|---|---|
| 45 - 47 | 44.5 - 47.5 | 3 | 50 |
| 42 - 44 | 41.5 - 44.5 | 4 | 47 |
| 39 - 41 | 38.5 - 41.5 | 4 | 43 |
| 36 - 38 | 35.5 - 38.5 | 4 | 39 |
| 33 - 35 | 32.5 - 35.5 | 2 | 35 |
| 30 - 32 | 29.5 - 32.5 | 3 | 33 |
| 27 - 29 | 26.5 - 29.5 | 13 | 30 |
| 24 - 26 | 23.5 - 26.5 | 8 | 17 |
| 21 - 23 | 20.5 - 23.5 | 3 | 9 |
| 18 - 20 | 17.5 - 20.5 | 3 | 6 |
| 15 - 17 | 14.5 - 17.5 | 0 | 3 |
| 12 - 14 | 11.5 - 14.5 | 2 | 3 |
| 9 - 11 | 8.5 - 11.5 | 1 | 1 |

N = 50
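
The cumulative frequency column is a running total of the frequencies, starting from the lowest class. A minimal sketch using `itertools.accumulate` from the standard library, with the class intervals and frequencies taken from the table:

```python
from itertools import accumulate

# Classes and frequencies from the table, listed from the lowest class upward.
classes = ["9 - 11", "12 - 14", "15 - 17", "18 - 20", "21 - 23", "24 - 26",
           "27 - 29", "30 - 32", "33 - 35", "36 - 38", "39 - 41", "42 - 44",
           "45 - 47"]
frequencies = [1, 2, 0, 3, 3, 8, 13, 3, 2, 4, 4, 4, 3]

# Running total: each entry is the number of scores in that class or below.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:>7}  {f:>2}  {cf:>2}")
```

The last cumulative frequency must equal N, which gives a quick check on the tallies.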

Example 2: The data below show the record high temperatures observed for each of the 50 provinces in the country. Construct the histogram, frequency polygon and cumulative frequency graph (ogive).

| Class Boundaries (in degrees Celsius) | Frequency |
|---|---|
| 99.5 - 104.5 | 2 |
| 104.5 - 109.5 | 8 |
| 109.5 - 114.5 | 18 |
| 114.5 - 119.5 | 13 |
| 119.5 - 124.5 | 7 |
| 124.5 - 129.5 | 1 |
| 129.5 - 134.5 | 1 |

N = 50

Solution:
A. Histogram
1. Draw and label the x and y axes.
2. Represent the frequency on the y-axis and the class boundaries on the
x-axis.
3. Using the frequencies as the heights, draw vertical bars for each class.

B. Frequency Polygon.
1. Find the midpoints for each class.

| Class Boundaries (in degrees Celsius) | Midpoint | Frequency |
|---|---|---|
| 99.5 - 104.5 | 102 | 2 |
| 104.5 - 109.5 | 107 | 8 |
| 109.5 - 114.5 | 112 | 18 |
| 114.5 - 119.5 | 117 | 13 |
| 119.5 - 124.5 | 122 | 7 |
| 124.5 - 129.5 | 127 | 1 |
| 129.5 - 134.5 | 132 | 1 |

N = 50

2. Draw and label the x and y axes. Label the x-axis with the midpoints
of each class, and use a suitable scale on the y-axis for the
frequencies.
3. Using the midpoints as the x values and the frequencies as the y
values, plot the points.
4. Connect the adjacent points with line segments.
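
The points of the frequency polygon in steps 1 to 3 can be computed as follows. Each midpoint is the average of the lower and upper class boundaries; the variable names are illustrative.

```python
# Class boundaries and frequencies from the temperature table above.
boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Step 1: midpoint = (lower boundary + upper boundary) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

# Step 3: the points to plot, with the midpoint as x and the frequency as y.
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(x, y)
```

Connecting these seven points in order with line segments gives the frequency polygon of step 4.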

C. The Cumulative Frequency (Ogive) Graph
1. Find the cumulative frequency for each class.

| Class Boundaries (in degrees Celsius) | Midpoint | Frequency | Cumulative Frequency |
|---|---|---|---|
| 99.5 - 104.5 | 102 | 2 | 2 |
| 104.5 - 109.5 | 107 | 8 | 10 |
| 109.5 - 114.5 | 112 | 18 | 28 |
| 114.5 - 119.5 | 117 | 13 | 41 |
| 119.5 - 124.5 | 122 | 7 | 48 |
| 124.5 - 129.5 | 127 | 1 | 49 |
| 129.5 - 134.5 | 132 | 1 | 50 |

N = 50

2. Draw and label the x and y axes. Label the x-axis with the class
boundaries. Use an appropriate scale on the y-axis to represent the
cumulative frequencies.
3. Plot the cumulative frequency at each upper class boundary. Upper
boundaries are used since the cumulative frequencies represent the
number of data values accumulated up to the upper boundary of each
class.
4. Connect the adjacent points with line segments.
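
The points of the ogive can be listed in the same way. The sketch below pairs each upper class boundary with its cumulative frequency; starting the curve at the lowest boundary with a cumulative frequency of 0 is a common convention and is an assumption here, since the steps above do not mention it.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the temperature table above.
lowest_boundary = 99.5
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Step 1: cumulative frequencies as a running total.
cumulative = list(accumulate(frequencies))

# Step 3: one point per class at (upper boundary, cumulative frequency).
# Assumption: the curve starts at (lowest boundary, 0).
ogive_points = [(lowest_boundary, 0)] + list(zip(upper_boundaries, cumulative))

for x, y in ogive_points:
    print(x, y)
```

Because cumulative frequencies never decrease, the resulting curve rises from 0 to N = 50 from left to right.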

EXERCISES 2.2

1. List five reasons for organizing data into a frequency distribution.
2. Name the three types of frequency distributions, and explain when each
should be used.
3. Find the class boundaries, midpoints, and widths for each class.
A. 12–18
B. 56–74
C. 695–705
D. 13.6–14.7
E. 2.15–3.93

4. How many classes should frequency distributions have? Why should the
class width be an odd number?
5. Shown here are four frequency distributions. Each is incorrectly
constructed. State the reason why.

A.

| Class | Frequency |
|---|---|
| 27–32 | 1 |
| 33–38 | 0 |
| 39–44 | 6 |
| 45–49 | 4 |
| 50–55 | 2 |

B.

| Class | Frequency |
|---|---|
| 5–9 | 1 |
| 9–13 | 2 |
| 13–17 | 5 |
| 17–20 | 6 |
| 20–24 | 3 |

C.

| Class | Frequency |
|---|---|
| 123–127 | 3 |
| 128–132 | 7 |
| 138–142 | 2 |
| 143–147 | 19 |

D.

| Class | Frequency |
|---|---|
| 9–13 | 1 |
| 14–19 | 6 |
| 20–25 | 2 |
| 26–28 | 5 |
| 29–32 | 9 |

6. What are open-ended frequency distributions? Why are they necessary?
7. State Gasoline Tax. The state gas tax in cents per gallon for 25 states is
given below. Construct a grouped frequency distribution and a cumulative
frequency distribution with 5 classes.

7.5 16 23.5 17 22 21.5 19 20 27.1 20 22
20.7 17 28 20 23 18.5 25.3 24 31 14.5 25.9 18
30 31.5
Source: The World Almanac and Book of Facts.

8. Weights of the NBA’s Top 50 Players Listed are the weights of the NBA’s
top 50 players. Construct a grouped frequency distribution and a
cumulative frequency distribution with 8 classes. Analyze the results in
terms of peaks, extreme values, etc.

240 210 220 260 250 195 230 270 325 225 165
295 205 230 250 210 220 210 230 202 250 265
230 210 240 245 225 180 175 215 215 235 245
250 215 210 195 240 240 225 260 210 190 260
230 190 210 230 185 260
Source: www.msn.foxsports.com

9. Stories in the World’s Tallest Buildings. The number of stories in each of
the world’s 30 tallest buildings is listed below. Construct a grouped
frequency distribution and a cumulative frequency distribution with 7
classes.

88 88 110 88 80 69 102 78 70 55 79
85 80 100 60 90 77 55 75 55 54 60
75 64 105 56 71 70 65 72
Source: New York Times Almanac.

10. GRE Scores at Top-Ranked Engineering Schools. The average
quantitative GRE scores for the top 30 graduate schools of engineering
are listed. Construct a grouped frequency distribution and a cumulative
frequency distribution with 5 classes.

767 770 761 760 771 768 776 771 756 770 763
760 747 766 754 771 771 778 766 762 780 750
746 764 769 759 757 753 758 746
Source: U.S. News & World Report Best Graduate Schools.
