
Presented by Dr. Hien Tran
School of Engineering – Tan Tao University
What is statistics?
• Statistics is the science of collecting, organizing,
summarizing, and analyzing information (DATA) to
draw conclusions or answer questions. In addition,
statistics is about providing a measure of confidence
in any conclusions.
• A key aspect of data is that they vary. Is everyone in your
class the same height? No! Does everyone have the same
hair color? No! So, among individuals there is variability.
• In fact, data vary when measured on ourselves as well.
Do you sleep the same number of hours every night? No!
Do you consume the same number of calories every day?
No!
• One goal of statistics is to describe and understand sources
of variability.
• The entire group of individuals to be studied
is called the population. An individual is a
person or object that is a member of the
population being studied. A sample is a
subset of the population that is being studied.
• Descriptive statistics consist of organizing and
summarizing data. Descriptive statistics describe data
through numerical summaries, tables, and graphs. A
statistic is a numerical summary based on a sample.

• Inferential statistics uses methods that take results
from a sample, extend them to the population, and
measure the reliability of the result.
(Inferential statistics ~ Data + Probability)
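
The definitions above (population, sample, statistic, reliability) can be illustrated with a minimal Python sketch. The heights are simulated, and the population size, sample size, and seed are assumptions chosen only for illustration; the standard error is used here as one simple measure of reliability, not the only one.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated heights (cm) of 10,000 individuals.
population = [random.gauss(170, 10) for _ in range(10_000)]

# A sample is a subset of the population; here a simple random sample of 100.
sample = random.sample(population, 100)

# Numerical summary of the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# A statistic: a numerical summary based on the sample.
sample_mean = statistics.mean(sample)

# Rough measure of reliability: estimated standard error of the sample mean.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error about {standard_error:.2f})")
```

The sample mean will usually land close to, but not exactly on, the population mean; the standard error gives a sense of how far off it is likely to be.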
Why study statistics?
• To be able to read and fully understand journals and newspapers
• To be able to conduct research effectively
• To further develop critical and analytical thinking skills
• To be an informed consumer
• To know when you need to hire outside statistical help
How to study statistics?
The Process of Statistics
1. Identify the research objective
2. Collect the data needed to answer the questions posed in step 1
3. Describe the data
4. Perform inference
• In both studies, the goal of the research was to determine if
radio frequencies from cell phones increase the risk of
contracting brain tumors. Whether or not brain cancer was
contracted is the response variable. The level of cell phone
usage is the explanatory variable.
• In research, we wish to determine how varying the amount of
an explanatory variable affects the value of a response
variable.
A convenience sample is one in which the
individuals in the sample are easily obtained.

Studies that use this type of sampling generally
have suspect results, which should be viewed
with extreme skepticism.
If the results of the sample are not representative of the
population, then the sample has bias.

• Three Sources of Bias
1. Sampling Bias
2. Non-response Bias
3. Response Bias
Sampling bias means that the technique used to obtain
the individuals to be in the sample tends to favor one part
of the population over another.

Undercoverage is a type of sampling bias.
Undercoverage occurs when the proportion of one
segment of the population is lower in a sample than it
is in the population.
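
As a rough illustration of undercoverage, the sketch below compares each segment's share of a sample with its share of the population. The age segments and all counts are hypothetical, invented only for this example.

```python
def segment_shares(counts):
    """Return each segment's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}

# Hypothetical counts, invented for illustration only.
population_counts = {"under 30": 4000, "30 to 59": 4000, "60 and over": 2000}
sample_counts = {"under 30": 30, "30 to 59": 100, "60 and over": 70}

population_shares = segment_shares(population_counts)
sample_shares = segment_shares(sample_counts)

# A segment is undercovered when its sample share is below its population share.
undercovered = [
    segment
    for segment in population_counts
    if sample_shares[segment] < population_shares[segment]
]

for segment in population_counts:
    print(f"{segment}: population {population_shares[segment]:.0%}, "
          f"sample {sample_shares[segment]:.0%}")
print("undercovered segments:", undercovered)
```

With these made-up numbers, people under 30 make up 40% of the population but only 15% of the sample, so that segment is undercovered.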
Non-response bias exists when individuals selected to be
in the sample who do not respond to the survey have
different opinions from those who do.

Non-response can be reduced through the use of
callbacks or rewards/incentives.
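
A small worked example may help show how non-response distorts a result. The survey, the group sizes, and the response rates below are all assumptions made up for illustration.

```python
# Hypothetical survey: 1,000 people are selected; half are satisfied customers.
selected = {"satisfied": 500, "dissatisfied": 500}

# Assumed response rates (invented): dissatisfied people reply more often.
response_rate = {"satisfied": 0.20, "dissatisfied": 0.60}

# Number of people in each group who actually answer the survey.
respondents = {
    group: round(count * response_rate[group])
    for group, count in selected.items()
}

true_share = selected["satisfied"] / sum(selected.values())
observed_share = respondents["satisfied"] / sum(respondents.values())

print(f"satisfied among everyone selected: {true_share:.0%}")
print(f"satisfied among respondents:       {observed_share:.0%}")
```

Under these assumptions, half of the selected individuals are satisfied, yet only a quarter of the respondents are, because those who answer differ from those who do not.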
Response bias exists when the answers on a survey do
not reflect the true feelings of the respondent.

Types of Response Bias
1. Interviewer error
2. Misrepresented answers
3. Words used in survey question
4. Order of the questions or words within the question
3 kinds of lies? (by Mark Twain)
• Lies
• Damned lies
• Statistics
Lies, Damned lies, and Statistics
• Availability bias
The problem is the following: because the use of available samples is
sometimes OK, we are perhaps fooled into thinking that they are OK
even when they’re not. And then we come up with arguments like:

i) Smoking can’t be all that bad. I know a lot of smokers who have lived
long and healthy lives.
ii) Cats must have a special ability to fall from great heights and
survive, because I’ve seen a lot of press reports about such events
(and I forget that I’ll rarely read a report about a cat falling and
dying).
iii) Violent criminals should be locked up for life because I’m always
reading newspaper articles about re-offenders (again, very unlikely
that I’ll read anything about non-re-offenders).
• Wording of questions
Example (Schuman and Presser, Questions and Answers in Attitude
Surveys, 1981, p. 277):
Q1: Do you think the United States should forbid public speeches
against democracy?
YES: 21.4% (i.e. NO: 78.6%)
Q2: Do you think the United States should allow public speeches
against democracy?
NO: 47.8% (i.e. YES: 52.2%)
• Sampling technique
The U.S. Presidential Election, 1948: Harry S. Truman versus Thomas E. Dewey.
Pre-election polls predicted a Dewey victory, but Truman won.
• Jumping to conclusions / bad interpretation
• Misleading Averages
Did you hear about the statistician who put her head in the oven
and her feet in the refrigerator? She said, “On average, I feel just
fine.”

A statistician drowning in a pond with an average depth of 3 ft.
• Misleading Averages - Match your facts with your questions

Grade    # Received
100      4
98       5
95       2
63       4
58       6

i. The professor felt that the test must have been too easy, because
the average (MEDIAN) grade was a 95.
ii. When a colleague asked her about how the midterm grades came out,
she answered, knowing that her classes were gaining a reputation for
being “too easy,” that the average (MEAN) grade was an 80.
iii. A student may tell his parents: “Don’t worry about my 63. It is not
as bad as it sounds. The average (MODE) grade was a 58.”
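
The three "averages" quoted in this example can be checked directly from the grade table. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Grade distribution from the grade table in this example: grade -> number of students.
grade_counts = {100: 4, 98: 5, 95: 2, 63: 4, 58: 6}

# Expand the frequency table into one entry per student.
grades = [grade for grade, count in grade_counts.items() for _ in range(count)]

mean_grade = statistics.mean(grades)
median_grade = statistics.median(grades)
mode_grade = statistics.mode(grades)

print("mean:", mean_grade)      # 80
print("median:", median_grade)  # 95
print("mode:", mode_grade)      # 58
```

All three speakers report a true "average" of the same 21 grades; they differ only in which one they choose to quote.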
• Graphical Misrepresentations of Data
The data in the table below represent the historical life expectancies
(in years) of residents of the United States.

Year, x   Life Expectancy, y
1950      68.2
1960      69.7
1970      70.8
1980      73.7
1990      75.4
2000      77.0

(a) Construct a misleading time series graph that implies that life
expectancies have risen sharply.
(b) Construct a time series graph that is not misleading.

[Figure: time series graphs (a) and (b)]
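
One common way to make such a graph misleading is to start the vertical axis just below the smallest value. The sketch below does not draw anything; it only computes how many times higher the last point would be drawn than the first for a given axis start. The function name and the choice of 68 as the truncated axis start are assumptions for illustration, since the source does not say how its graphs were drawn.

```python
# Life expectancy data from the table (year -> life expectancy in years).
life_expectancy = {1950: 68.2, 1960: 69.7, 1970: 70.8,
                   1980: 73.7, 1990: 75.4, 2000: 77.0}

def apparent_growth(values, axis_start):
    """Ratio of the last point's plotted height to the first point's plotted
    height when the vertical axis begins at axis_start."""
    first, last = values[0], values[-1]
    return (last - axis_start) / (first - axis_start)

values = [life_expectancy[year] for year in sorted(life_expectancy)]

honest = apparent_growth(values, axis_start=0)       # axis starts at zero
misleading = apparent_growth(values, axis_start=68)  # axis truncated at 68 (assumed)

print(f"axis from 0:  last point drawn {honest:.2f} times as high as the first")
print(f"axis from 68: last point drawn {misleading:.0f} times as high as the first")
```

The underlying increase is about 13%, but with the axis truncated at 68 the last point appears roughly 45 times as high as the first.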
• Lurking variable/hidden factor
Simpson’s Paradox example: kidney stone treatment

               Treatment A              Treatment B
Small Stones   Group 1: 93% (81/87)     Group 2: 87% (234/270)
Large Stones   Group 3: 73% (192/263)   Group 4: 69% (55/80)
Both           78% (273/350)            83% (289/350)

Which treatment is better?
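
The success rates in the kidney stone table can be recomputed from the raw counts to see the paradox. This is a minimal sketch; the dictionary layout and helper names are illustrative choices.

```python
from fractions import Fraction

# (successes, patients) for each treatment and stone size, from the table.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    """Exact success rate as a fraction."""
    return Fraction(successes, patients)

def overall(treatment):
    """Success rate for a treatment with both stone sizes pooled."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(n for _, n in results[treatment].values())
    return rate(successes, patients)

for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A {float(a):.0%}, B {float(b):.0%}")

print(f"both: A {float(overall('A')):.0%}, B {float(overall('B')):.0%}")
```

Treatment A has the higher success rate within each stone size, yet Treatment B looks better when the groups are pooled. The counts suggest why: 263 of the 350 patients on Treatment A had large stones, which have lower success rates under either treatment, so stone size acts as the lurking variable.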


Thank You
Have a Wonderful School Year!
