# Data Collection

## Populations and Samples

In statistics, a population is the whole set of items that are of interest, e.g. all the
people in a town.

A census observes or measures every member of a population.

A sample is a selection of observations taken from a subset of the population which is used
to find out information about the population as a whole.

| Method | Advantages | Disadvantages |
|---|---|---|
| Census | Should give a completely accurate result | Time consuming and expensive; cannot be used when the testing process destroys the item; hard to process the large quantity of data |
| Sample | Less time consuming; fewer people have to respond; less data to process | The data may not be as accurate; the sample may not be large enough to give information about small sub-groups |

The size of the sample affects the validity of any conclusions drawn, and the size needed
depends on the required accuracy. Generally, the larger the sample, the greater the accuracy.
Larger samples are required if the population is very varied. Different samples could lead to
different conclusions.
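The effect of sample size can be seen in a small simulation. This is only a sketch: the
population values, the sample sizes and the helper name `sample_means` are all made up for
illustration, and the samples are drawn at random with Python's standard `random` module.

```python
import random
import statistics

def sample_means(population, sample_size, repeats, seed=0):
    """Return the means of `repeats` separate random samples (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]

# Assumed population: 1000 whole-number measurements between 0 and 99.
rng = random.Random(1)
population = [rng.randrange(100) for _ in range(1000)]

small = sample_means(population, sample_size=5, repeats=200)
large = sample_means(population, sample_size=200, repeats=200)

# The range of the sample means shows how far conclusions can differ
# from one sample to the next.
print("range of means with n=5:  ", max(small) - min(small))
print("range of means with n=200:", max(large) - min(large))
```

Taking a "sample" equal to the whole population is a census, so its mean matches the
population mean exactly.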

Individual units of a population are known as sampling units.

Often sampling units of a population are individually named or numbered to form a list
called a sampling frame.

## Sampling

The sample should be representative of the population.

Random sampling helps to remove bias from a sample. There are three methods:
• Simple random sampling
• Systematic sampling
• Stratified sampling

A simple random sample of size n is one where every sample of size n has an equal chance
of being selected.

To carry out a simple random sample, each item in the sampling frame is allocated a unique
number, and the required quantity of numbers is then selected at random.

These random numbers can be selected either using a computer or using lottery sampling
(taking numbers out of a hat).
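Selecting the numbers by computer could look like the sketch below. The sampling frame of 50
numbered units and the function name `simple_random_sample` are assumptions for illustration;
`random.sample` from Python's standard library picks without replacement, so no unit is chosen
twice.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct units so that every sample of size n is equally likely."""
    rng = random.Random(seed)  # pass a seed only if a repeatable run is wanted
    return rng.sample(sampling_frame, n)

# Assumed sampling frame: 50 units numbered 1 to 50.
frame = list(range(1, 51))
chosen = simple_random_sample(frame, n=5, seed=42)
print(chosen)
```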

In systematic sampling, the required elements are chosen at regular intervals from an ordered
list.
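A minimal sketch of systematic sampling is given below. It assumes the usual convention, which
these notes do not spell out, that the interval k is the list length divided by the sample size
and that the starting point is chosen at random from the first k items. The function name and
the list of 100 numbered units are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th item, starting at a random position among the first k."""
    k = len(ordered_list) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the list length")
    start = random.Random(seed).randrange(k)  # random index from 0 to k-1
    return [ordered_list[start + i * k] for i in range(n)]

# Assumed ordered list: 100 units numbered 1 to 100, sample of 10, so k = 10.
units = list(range(1, 101))
every_tenth = systematic_sample(units, n=10, seed=3)
print(every_tenth)
```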

In stratified sampling, the population is divided into mutually exclusive strata, e.g. males
and females, and a random sample is taken from each. The proportion of each stratum sampled
should be the same.

$$\text{number sampled in a stratum} = \frac{\text{number in stratum}}{\text{number in population}} \times \text{overall sample size}$$
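As a worked example, each stratum's share of the sample is its share of the population
multiplied by the overall sample size. The sketch below assumes a made-up school with three
year groups and an overall sample of 40 students; the function name is hypothetical.

```python
def stratum_sample_sizes(stratum_counts, sample_size):
    """Share the overall sample between strata in proportion to their size."""
    population_size = sum(stratum_counts.values())
    # Rounded to whole units, so the totals may not always add up
    # exactly to sample_size.
    return {
        name: round(count * sample_size / population_size)
        for name, count in stratum_counts.items()
    }

# Assumed school: 400 students in three year groups, overall sample of 40.
counts = {"Year 7": 120, "Year 8": 180, "Year 9": 100}
sizes = stratum_sample_sizes(counts, 40)
print(sizes)  # {'Year 7': 12, 'Year 8': 18, 'Year 9': 10}
```

Here one tenth of each year group is sampled, so the proportion is the same in every stratum.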

| Method | Advantages | Disadvantages |
|---|---|---|
| Simple random | Free of bias; easy and cheap for small populations and samples; each unit has a known and equal chance of selection | Not suitable when the population is large; a sampling frame is needed |
| Systematic | Simple and quick; suitable for large populations and samples | A sampling frame is needed; can introduce bias if the sampling frame is not random |
| Stratified | Sample accurately reflects the population structure; guarantees proportional representation of groups within the population | Population must be clearly divided into strata; selection within each stratum suffers from the same disadvantages as simple random sampling |
## Non-random sampling

• Quota sampling
• Opportunity sampling

Quota sampling – the researcher selects a sample that reflects the characteristics of the
entire population. The researcher meets people and assigns each of them to a quota according
to particular characteristics.
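One way to picture quota sampling is as filling a fixed number of places per group, in the
order people are met. The sketch below is an illustration only: the names, the two age groups,
the quota sizes and the function `quota_sample` are all invented.

```python
def quota_sample(people, quotas, group_of):
    """Accept people in the order they are met until every quota is full."""
    remaining = dict(quotas)  # places still open in each group
    sample = []
    for person in people:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample

# Assumed passers-by as (name, age group) pairs, in the order met.
passers_by = [
    ("Ana", "under 30"), ("Ben", "under 30"), ("Cal", "30 and over"),
    ("Dee", "under 30"), ("Eli", "30 and over"), ("Fay", "30 and over"),
]
picked = quota_sample(
    passers_by,
    {"under 30": 2, "30 and over": 2},
    group_of=lambda person: person[1],
)
print([name for name, _ in picked])  # ['Ana', 'Ben', 'Cal', 'Eli']
```

Dee is passed over because the "under 30" quota is already full, which is one way this method
can introduce bias.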

Opportunity sampling – taking the sample from people who are available at the time and who
fit the criteria, e.g. the first 20 people outside a supermarket.

| Method | Advantages | Disadvantages |
|---|---|---|
| Quota | Allows a small sample to represent the population; no sampling frame required; quick and easy; easy comparison between groups | Can introduce bias; population must be divided into groups; non-responses are not recorded as such |
| Opportunity | Easy; inexpensive | Unlikely to provide a representative sample; highly dependent on the individual researcher |

## Types of data

Large amounts of data can be stored in a frequency table or as grouped data.

When data is presented in a grouped frequency table, the specific data values are not
shown. The groups are known as classes:

• Class boundaries tell you the maximum and minimum values in a class
• Midpoint is the average of the class boundaries
• Class width is the difference between the upper and lower boundaries
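
The midpoint and class width can be worked out directly from the boundaries. A short sketch,
using a made-up class with boundaries 10 and 20 and a hypothetical function name:

```python
def class_summary(lower_boundary, upper_boundary):
    """Return the midpoint and width of a class from its boundaries."""
    midpoint = (lower_boundary + upper_boundary) / 2  # average of the boundaries
    width = upper_boundary - lower_boundary           # upper minus lower
    return midpoint, width

# Assumed class with lower boundary 10 and upper boundary 20.
print(class_summary(10, 20))  # (15.0, 10)
```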

## The large data set

Some questions will be based on weather data from the large data set provided by Edexcel.