## Lecture Notes on Introductory Statistics, I

(P.P. Leung)
Lecture notes are based on the following textbook:
N.A. Weiss (2012), Introductory Statistics, 9th edition, Pearson.
Chapter 1 The Nature of Statistics
1.1 Two kinds of Statistics
1.2 Simple Random Sampling
1.4 Other Sampling Designs

## Chapter 1 The Nature of Statistics

What is Statistics?
From Wikipedia, the free encyclopaedia:
Statistics is a mathematical science pertaining to the collection, analysis, interpretation
or explanation, and presentation of data. It is applicable to a wide variety of academic
disciplines, from the natural and social sciences to the humanities. Statistics is also used for
making informed decisions in government and business.
Statistical methods can be used to summarize or describe a collection of data; this is
called descriptive statistics. In addition, patterns in the data may be modeled in a way that
accounts for randomness and uncertainty in the observations, and then used to draw
inferences about the process or population being studied; this is called inferential statistics.
Both descriptive and inferential statistics comprise applied statistics. There is also a discipline
called mathematical statistics, which is concerned with the theoretical basis of the subject.
From our textbook:
1. Facts or data of a numerical kind, assembled, classified and tabulated so as to present significant information about a given subject.
2. The science of assembling, classifying, and tabulating such facts or data.
3. Collecting and analyzing data for the purpose of making generalizations and decisions.
From :

Statistics (in a wider and simpler sense):

1. Let the data talk.
2. Quantify the uncertainties.
3. Make decisions without enough information.

## 1.1 Two kinds of Statistics

Descriptive Statistics consists of methods for organizing and
summarizing information, e.g. the statistics compiled for each NBA/CBA season.
Inferential Statistics consists of methods for drawing and measuring the
reliability of conclusions about a population based on information obtained
from a sample of the population, e.g. the 1948 presidential election.
Technical Terms:
Population: the collection of all individuals or items under consideration in a
statistical study.
Sample: a subset (part) of the population from which information is collected.
Statistics in this course means either descriptive statistics or inferential statistics (both are applied
statistics).
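The distinction between the two kinds of statistics can be sketched in a few lines of Python. This is only an illustration: the population of heights, the seed and the sample size are all made-up assumptions, not data from the textbook.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 individuals (made-up data).
population = [rng.gauss(170, 10) for _ in range(1000)]

# A sample is a subset of the population.
sample = rng.sample(population, k=50)

# Descriptive statistics: organize and summarize the data actually in hand.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to draw a conclusion about the
# population, here by taking the sample mean as an estimate of the
# population mean (which would normally be unknown).
population_mean = statistics.mean(population)
estimation_error = abs(sample_mean - population_mean)

print(f"sample mean = {sample_mean:.1f}, sample sd = {sample_sd:.1f}")
print(f"error of the estimate = {estimation_error:.2f}")
```

In practice the population mean cannot be computed, which is why measuring the reliability of the estimate is part of inferential statistics.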

## 1.2 Simple Random Sampling

Census: acquire information on the entire population of interest.
Sampling: acquire information on only part of the population of interest.
Experimentation: acquire information by conducting an experiment.
Why is sampling needed?
Surveying the whole population is usually laborious, time-consuming, expensive, frequently
impractical and sometimes impossible.
Simple Random Sampling: a sampling procedure for which each possible
sample of a given size is equally likely to be the one obtained.
Simple Random Sample: a sample obtained by simple random sampling.
Why is a random sample so important?
The sample being considered must be a representative sample, i.e. it should
reflect as closely as possible the relevant characteristics of the population.

A random sample is a representative sample.

Simple random sampling with replacement
Simple random sampling without replacement
(In this course, unless we specify otherwise, assume that simple random sampling is done
with replacement).
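The difference between the two variants can be sketched with Python's standard `random` module. The population of ten numbered members and the seed are assumptions chosen only for illustration.

```python
import random

# Hypothetical population: ten members numbered 1 to 10.
population = list(range(1, 11))

rng = random.Random(2012)  # fixed seed only so the sketch is repeatable

# Without replacement: a member can appear at most once in the sample.
without_replacement = rng.sample(population, k=2)

# With replacement: every draw is made from the full population,
# so the same member may be selected more than once.
with_replacement = rng.choices(population, k=2)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```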
Example: Simple Random Sampling (P.14, Ex 1.7)
Sampling Oklahoma State Officials. As reported by The World Almanac, the top five state
officials of Oklahoma are as shown in Table 1.2. Consider these five officials a population of
interest.
Table 1.2:
Governor (G)
Lieutenant Governor (L)
Secretary of State (S)
Attorney General (A)
Treasurer (T)
(a) List the possible samples (without replacement) of two officials from this population of
five officials.
(b) Describe a method for obtaining a simple random sample of two officials from this
population of five officials.
(c) For the sampling method described in part (b), what are the chances that any particular
sample of two officials will be the one selected?
(d) Repeat parts (a)-(c) for samples of size 4.
Solution For convenience, we represent the officials by using the letters in parentheses.
(a) Table 1.3 lists the 10 possible samples of two officials from this population of five
officials.
Table 1.3:
{(G, L), (G, S), (G, A), (G, T), (L, S), (L, A), (L, T), (S, A), (S, T), (A, T)}
(b) To obtain a simple random sample of size 2, we could write the letters that correspond to
the five officials (G, L, S, A, and T) on separate pieces of paper. After placing these five
slips of paper in a box and shaking it, we could, while blindfolded, pick two slips of
paper.
(c) The procedure described in part (b) will provide a simple random sample. Consequently,
each of the possible samples of two officials is equally likely to be the one selected. There
are 10 possible samples, so the chances are 1/10 (1 in 10) that any particular sample of
two officials will be the one selected.

(d) The five possible samples of four officials from this population of five officials are listed below. A
simple random sampling procedure, such as picking four slips of paper out of a box, gives
each of these samples a 1 in 5 chance of being the one selected.
{(G, L, S, A), (G, L, S, T), (G, L, A, T), (G, S, A, T), (L, S, A, T)}
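The counts in parts (a), (c) and (d) can be checked by enumerating the samples. The sketch below uses `itertools.combinations` from the Python standard library; the helper name `possible_samples` is only an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations

# The five officials, represented by the letters used in the solution.
officials = ["G", "L", "S", "A", "T"]

def possible_samples(population, size):
    """Return every sample of the given size, drawn without replacement."""
    return list(combinations(population, size))

for size in (2, 4):
    samples = possible_samples(officials, size)
    # Under simple random sampling each sample is equally likely.
    chance = Fraction(1, len(samples))
    print(f"size {size}: {len(samples)} possible samples, chance {chance} each")
    print(samples)
```

Because the order within a sample does not matter, (G, L) and (L, G) count as the same sample, which is why combinations rather than permutations are enumerated.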
A random-number table can be used to obtain random numbers (P.15, A5).
## 1.4 Other Sampling Designs

Why do we need other sampling designs?
Simple random sampling is relatively laborious, time-consuming, costly and sometimes
impractical. With limited resources, it is necessary to look for other sampling designs.
The rule of thumb is that the sample obtained should be as close as possible to a simple random
sample.
Systematic Random Sampling
S1. Divide the population size by the sample size and round the result down to the nearest whole number, m.
S2. Use a random-number table or a similar device to obtain a number, k, between 1 and m.
S3. Select for the sample those members of the population that are numbered k, k+m, k+2m, ...
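Steps S1 to S3 can be sketched as follows. The function name, the use of Python's `random` module in place of a random-number table, and the population of 728 numbered members are assumptions made for illustration. The sketch also assumes the selection stops once the requested sample size is reached.

```python
import random

def systematic_sample(population, sample_size, rng=random):
    """Systematic random sample; members are numbered 1, 2, ..., N in list order."""
    if not 1 <= sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    # S1: divide the population size by the sample size and round down.
    m = len(population) // sample_size
    # S2: obtain a random number k between 1 and m (inclusive).
    k = rng.randint(1, m)
    # S3: select the members numbered k, k+m, k+2m, ... until the
    # sample holds sample_size members (numbering is 1-based).
    numbers = range(k, k + m * sample_size, m)
    return [population[number - 1] for number in numbers]

# Hypothetical population of 728 numbered members and a sample of size 15.
population = list(range(1, 729))
sample = systematic_sample(population, 15, random.Random(7))
print(sample)
```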

Cluster Sampling
S1. Divide the population into groups (clusters).
S2. Obtain a simple random sample of the clusters.
S3. Use all the members of the clusters obtained in step 2 as the sample.
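A minimal sketch of the three cluster sampling steps is given here. It assumes that step S1 has already been done and that the clusters are supplied as a dictionary. The function name and the city-block data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """clusters maps a cluster label to the list of its members (step S1 done)."""
    # S2: obtain a simple random sample of the clusters (without replacement).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # S3: use all the members of the chosen clusters as the sample.
    return [member for label in chosen for member in clusters[label]]

# Hypothetical population: households grouped by city block.
blocks = {
    "block 1": ["h1", "h2", "h3"],
    "block 2": ["h4", "h5"],
    "block 3": ["h6", "h7", "h8", "h9"],
    "block 4": ["h10"],
}
sample = cluster_sample(blocks, 2, random.Random(3))
print(sample)
```

Note that the final sample size depends on which clusters are drawn, since whole clusters are taken.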
Stratified Sampling
S1. Divide the population into subpopulations (strata).
S2. From each stratum, obtain a simple random sample of size proportional to the size of the
stratum; that is, the sample size for a stratum equals the total sample size times the stratum
size divided by the population size.
S3. Use all the members obtained in S2 as the sample.
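The proportional allocation in S2 can be sketched as below. The function name and the three strata are hypothetical. Rounding each stratum's sample size to a whole number is an assumption of this sketch, so with awkward proportions the stratum sizes may not add up exactly to the requested total.

```python
import random

def stratified_sample(strata, total_sample_size, rng=random):
    """strata maps a stratum label to the list of its members (step S1 done)."""
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for label, members in strata.items():
        # S2: stratum sample size = total sample size * stratum size / population size,
        # rounded to a whole number (an assumption of this sketch).
        stratum_n = round(total_sample_size * len(members) / population_size)
        sample.extend(rng.sample(members, stratum_n))
    # S3: all the members obtained in S2 form the sample.
    return sample

# Hypothetical population of 100 members split into strata of 60, 30 and 10.
strata = {
    "first": list(range(0, 60)),
    "second": list(range(100, 130)),
    "third": list(range(200, 210)),
}
sample = stratified_sample(strata, 10, random.Random(0))
print(sample)
```

With these sizes the allocation is exact: 6 members come from the first stratum, 3 from the second and 1 from the third.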
Multistage Sampling
Most large-scale surveys combine one or more of simple random sampling, systematic
random sampling, cluster sampling and stratified sampling.

Review Problems
Understanding the Concepts and Skills

1. In a newspaper or magazine, or on the Internet, find an example of

a. a descriptive study.
b. an inferential study.
2. Almost any inferential study involves aspects of descriptive statistics. Explain why.
3. Baseball Scores On September 3, 2005, the following baseball scores were printed in The
Major League Baseball
Giants 6, D'backs 3
Cubs 7, Pirates 3
Marlins 4, Mets 2
Phillies 7, Nationals 1
Braves 7, Reds 4
Astros 6, Cardinals 5
Rockies 11, Dodgers 3

Blue Jays 4, Devil Rays 3

Orioles 7, Red Sox 3
White Sox 9, Tigers 1
Indians 6, Twins 1
Rangers 8, Royals 7
A's 12, Yankees 0
Angels 4, Mariners 1

4. Serious Energy Situation. In a USA TODAY/CNN Gallup Poll, 94% of those surveyed
said that the United States faced a serious energy situation, but, by 47% to 35%, they
preferred an emphasis on conservation rather than on more production. Is this study
5. British Backpacker Tourists. Research by Gustav Visser and Charles Barker in "A Geography
of British Backpacker tourists in South Africa" (Geography, Vol. 89, No. 3, pp. 226) reflects on the impact of British backpacker tourists visiting South Africa. A sample of
British backpackers was interviewed. The information obtained from the sample was used
to construct the following table for the age distribution of all British backpackers. Classify

| Age (yrs) | Percentage |
|---|---|
| Less than 21 | 9% |
| 21-25 | 46% |
| 26-30 | 27% |
| 31-35 | 10% |
| 36-40 | 4% |
| Over 40 | 4% |

6. Teen Drug Abuse. In an article dated April 24, 2005, USA TODAY reported on the 17th
annual study on teen drug abuse, conducted by the Partnership for a Drug-Free America.
According to the survey of 7300 teens, the most popular prescription drug abused by teens
was Vicodin, with 18% or about 4.3 million youths reporting that they had used it to
get high. OxyContin and drugs for attention deficit disorder, such as Ritalin/Adderall,
followed with one in 10 teens reporting that they had tried them. Answer the following
a. Is the statement about 18% of youths abusing Vicodin inferential or descriptive?
b. Is the statement about 4.3 million youths abusing Vicodin inferential or descriptive?
7. Regarding observational studies and designed experiments:
a. Describe each type of statistical study.
b. With respect to possible conclusions, what important difference exists between these
two types of statistical studies?
8. Persistent Poverty and IQ. An article appearing in an issue of The Arizona Republic
reported on a study conducted by Greg Duncan of the University of Michigan. According
to the report, "Persistent poverty during the first 5 years of life leaves children with IQs 9.1
points lower at age 5 than children who suffer no poverty during that period...." Is this
statistical study an observational study or is it a designed experiment? Explain your