
1/25/2015

Lecture 1 Outline

Course logistics and details


What is Stat 139?
A few example problems
A quick R demonstration

Kevin's Contact Info

Kevin's office: Science Center, Room SC-614

Office Hours:
Mon 11am-noon and Fri 11am-noon
Also by appointment (via email)

Phone numbers:
Statistics Department: (617) 495-5496
My office (SC-614): (617) 495-8711

Email: krader@fas.harvard.edu (best way to get a
hold of me).

Stat 139

Prereqs:
AP Stat, Stat 100, 101, 102, 104 (Intro Stat),
or 110 (Intro Prob).
Math 21a & 21b (Multivariable Calculus &
Linear Algebra)

The course material goes further than what
you learned in an intro stat course, and
addresses the question: what happens when
assumptions are not met?


Teaching Staff

Teaching Fellows (may not be complete):
Ryan Lee: ryanwonlee@gmail.com
Patrick Xu: patrickxu@college.harvard.edu

Teaching fellows will be teaching sections, holding
office hours, answering questions via email, and
grading HWs and exams.

Course Website

Course website:
https://canvas.harvard.edu/courses/2421

There you will find (eventually):
Syllabus
Administrative Announcements
Lecture Notes
R Tutorial (including download and install
instructions)
Assigned Homeworks
HW #1
Other Study Material (practice exams, web links,
etc.)

Class Meetings

Lectures:
Mon, Wed, & Fri, 10am-11am, SC-Hall E

Sections:
Optional (but strongly recommended) weekly section to
discuss homework, do extra problems, and review
difficult concepts.
Held mostly Wed and Thurs afternoons.
No sections this week (begin week of Feb. 2).
Look for announcement on the course website for
permanent times (OHs too).

Lecture Notes

Paper copies will NOT be handed out at the beginning of
lecture after this week (we will provide copies on Wed and
Fri).
The notes will loosely follow the order of the text, and will
reference specific sections in the text.
Lecture notes will be posted online at least 24 hours in
advance. An email will be sent when they are posted.
Notes are very concise; you are encouraged to add your
own annotations and develop your own notes.
Occasionally mistakes appear in lecture notes; corrected
versions will be posted after class.


Recommended Textbook (not required)

Statistical Sleuth: A Course in Methods of Data Analysis,
Ramsey & Schafer, 3rd edition. Amazon link:
www.amazon.com/Statistical-Sleuth-Course-Methods-Analysis/dp/1133490670

Some of the assigned homework problems will be taken
from the text, but will always be reproduced for
you on the assignment.
From time to time, specific reading assignments may come
from the text as well.
Exams will be based directly on the lectures. Apart from the
specific assigned readings, nothing from the text will be
tested unless it also appeared in the lectures, notes, or HWs.

R Software (+ RStudio)

R will be used throughout the course and is required
on most homework assignments (including the first).
Reasons for R:
Completely free software. Can be downloaded from
http://cran.r-project.org/
Available for PC, Mac, Linux, and even iPhone and iPad!
Flexible stat toolkit, access to cutting-edge methods,
powerful graphics capabilities, large and vibrant
community, unlimited possibilities.

RStudio helps organize/streamline the program:
http://www.rstudio.com/
Tutorials this week and early next week

R Help Guides (all found on course website)

On the course website:
https://canvas.harvard.edu/courses/2421/files/folder/R+Guides
R for Beginners, by E. Paradis
Using R for Data Analysis and Graphics: Introduction, Code
and Commentary, by J.H. Maindonald
Simple R - Using R for Intro Statistics, by J. Verzani
The R Guide, by W.J. Owen
An Introduction to R, by L.H. Lam
Comprehensive introduction to R:
http://cran.r-project.org/doc/manuals/R-intro.pdf

Exams

2 Midterms, both 10-11am (in class):
Wed, March 4th
Wed, April 10th
Final Exam, Date and Time TBD (May 8-16)

You will be allowed one reference sheet for the
first midterm, 2 sheets for the second midterm, and
3 sheets for the final exam.


Homeworks

Posted to course website on Fridays. Due the following
Friday.
HW #1 will be posted soon, and will be due Friday,
Feb. 6.

Hard copies must be handed in to the 3rd floor HW
boxes.

We allow one late HW, no questions asked (due by the
beginning of the following lecture).

Other late homework will only be accepted with an
official University excuse (either from UHS or from
your resident dean's office). NO HW scores will be
dropped!

HW Collaboration

You are encouraged to discuss homework with other
students (and with the instructor and TFs, of course),
but you must write your final answers yourself, in your
own words.

Solutions prepared in committee or by copying or
paraphrasing someone else's work are not acceptable;
your handed-in assignment must represent your own
thoughts. All computer output you submit must come
from work that you have done yourself.

Please indicate on your problem sets the names of
the students with whom you worked.

Course Grading

Component     Weight1   Weight2   Weight3
Homework        30%       30%       30%
Project         15%       15%       15%
Midterm         20%        5%       20%
Midterm          5%       20%       20%
Final Exam      30%       30%       15%
Total          100%      100%      100%

Your overall score for the course will be the highest of the 3 weighting
schemes presented above. Final course letter grades are not assigned
according to a fixed percentage of A's, B's, etc.
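As a rough illustration of the "highest of the 3 weighting schemes" rule, the sketch below computes a weighted total under each scheme and takes the maximum. The weights are copied from the grading table; the component scores are hypothetical, and the names `midterm_a` and `midterm_b` simply follow the order of the two Midterm rows (which midterm each row refers to is an assumption).

```r
# Weights copied from the grading table: rows are components, columns are schemes.
# midterm_a / midterm_b follow the table's row order (an assumption).
weights <- rbind(
  homework  = c(0.30, 0.30, 0.30),
  project   = c(0.15, 0.15, 0.15),
  midterm_a = c(0.20, 0.05, 0.20),
  midterm_b = c(0.05, 0.20, 0.20),
  final     = c(0.30, 0.30, 0.15)
)
colnames(weights) <- c("Weight1", "Weight2", "Weight3")

# Hypothetical component scores on a 0-100 scale, for illustration only.
scores <- c(homework = 90, project = 85, midterm_a = 70,
            midterm_b = 88, final = 80)

# Weighted total under each scheme (scores are matched to rows by name),
# then the best of the three.
scheme_totals <- colSums(weights * scores[rownames(weights)])
overall <- max(scheme_totals)

scheme_totals
overall
```

With these made-up scores the second scheme wins, because it puts the most weight on the stronger of the two midterm scores.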


Sleuthing: What's in a word?

Merriam-Webster (m-w.com):
to act as a detective, search for information
to search for and discover

Urban dictionary:
To play the role of detective, to gather facts and
information usually in the traditional Sherlock
Holmes inconspicuous way.
A proper Sleuth needs to be intelligent, witty, and
always a few steps ahead of others. ... His wisdom
is his greatest asset.

Richard D. De Veaux, Paul F. Velleman,
Amstat News, Sept 2008:

Mathematics has a long history of prodigies
and geniuses, with many of the most
famous luminaries showing their genius
at remarkably early ages
but why not Statistics?

Course Goal: learn statistical judgment

1. Improve understanding of statistical reasoning and
measures of uncertainty.
2. Learn to translate long computer output to a short
summary of results in scientific as well as common
languages.
3. Expand your statistical toolkit and, at the same time,
deepen your understanding.
Change the way you reason about the world.

Lies, Damned Lies, and Statistics

Reasons?
Conclusion is not supported by the method used
(e.g., causation vs. association)
Assumptions of the method are not satisfied, i.e.,
the model does NOT fit the data (e.g., units are
not independent, relationship is nonlinear, etc.)
Unreliable source of the data themselves or poor
data collection techniques.


Course Goal #3, Expanded: Statistical Toolkit and Understanding

a. Based on formulated question of interest, be able to
choose the appropriate statistical tool;
b. Know its assumptions and understand the consequences
of violating each of them;
c. Clean the data and prepare them for analysis;
d. Fit the chosen model using R and check model-fit
qualitatively and quantitatively;
e. Formulate exactly what can be inferred from the results
in a language common to all scientists as well as in
layman's terms;
f. Understand the limitations of the model.
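To make step (d) concrete, here is a minimal sketch of fitting a model in R and checking its fit both quantitatively and qualitatively. It uses R's built-in `cars` data set and a simple linear regression purely as a stand-in; neither the data nor the choice of model comes from the lecture.

```r
# Built-in example data: stopping distance (ft) versus speed (mph) for 50 cars.
data(cars)

# Fit a simple linear regression of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

# Quantitative checks: coefficient table with R-squared, and a
# Shapiro-Wilk normality test on the residuals.
summary(fit)
shapiro.test(residuals(fit))

# Qualitative checks: residuals vs. fitted values (look for curvature or
# fanning) and a normal Q-Q plot of the residuals.
par(mfrow = c(1, 2))
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit))
qqline(residuals(fit))
```

The numerical summaries and the plots answer different questions, which is why the goal asks for both: a respectable R-squared can coexist with a residual plot that clearly shows a violated assumption.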

Examples of misleading conclusions if key statistical principles are ignored

Bethany L. Peters & Edward Stringham (2006). "No Booze? You
May Lose: Why Drinkers Earn More Money Than
Nondrinkers,"

Examining the General Social Survey, we find that self-reported
drinkers earn 10-14 percent more than abstainers, which
replicates results from other data sets. [...] These results
suggest that social drinking leads to increased social capital.

What could possibly go wrong with this argument?
What are the relevant statistical principles or concepts here?
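One possible concern with the drinking-and-earnings argument is confounding, which is one form of the causation vs. association issue raised earlier. The simulation below is only a sketch on made-up data: `sociability` is a hypothetical trait that raises both the chance of drinking and earnings, while drinking itself is given no causal effect at all. It is not an analysis of the General Social Survey.

```r
set.seed(139)
n <- 5000

# Hypothetical confounder (invented for illustration).
sociability <- rnorm(n)

# Drinking status depends on the confounder only.
drinker <- rbinom(n, size = 1, prob = plogis(sociability))

# Log earnings depend on the confounder only; drinking has NO causal effect.
log_earnings <- 10 + 0.15 * sociability + rnorm(n, sd = 0.5)

# Naive comparison of group means: drinkers appear to earn more.
naive_gap <- mean(log_earnings[drinker == 1]) -
  mean(log_earnings[drinker == 0])

# Regression that adjusts for the confounder: the coefficient on
# drinking should be close to zero.
adjusted_gap <- coef(lm(log_earnings ~ drinker + sociability))["drinker"]

naive_gap
adjusted_gap
```

In real observational data the confounder is usually not measured, so the adjustment shown here is often unavailable, which is why the causal claim in the quote deserves scrutiny.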


Space Shuttle Challenger crash in 1986

Was caused by a failure of the O-rings used to control the flow
of fuel gases.

On the day of the launch the outside temperature was
unusually low (31°F).

The previous shuttles were launched at temperatures between
53°F and 81°F.

A statistical model showed an association between cold
temperatures and O-ring failures, but the evidence was not
conclusive (partly due to the small sample size).

Subprime mortgage crisis

In 2007, the US economy entered a mortgage crisis followed
by a recession.

A proximate cause was the rise in subprime lending.
Many subprime loans were packaged into mortgage-backed
securities (MBS) and ultimately defaulted.

Subsequently, some flaws were highlighted in models used to
price and rate securities based on mortgages:
Assumptions on housing prices,
Assumptions on correlation between defaults.

What are the relevant statistical principles or concepts here?
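To see why an assumption about correlation between defaults can matter so much, the following sketch compares a pool of loans that default independently with a pool whose defaults share a common shock. Everything here is an illustrative assumption (pool size, default probability, correlation, and the latent-variable setup); it is not the model that was actually used to price or rate these securities.

```r
set.seed(139)
n_loans   <- 100    # loans in a hypothetical pool
p_default <- 0.05   # assumed default probability for every loan
rho       <- 0.3    # assumed correlation between loans' latent variables
n_sims    <- 5000
cutoff    <- qnorm(p_default)

# Independent defaults: the count is Binomial(n_loans, p_default).
defaults_indep <- rbinom(n_sims, size = n_loans, prob = p_default)

# Correlated defaults: a loan defaults when its latent standard normal
# variable, which shares a common shock with every other loan, falls
# below the cutoff. Each loan still defaults with probability p_default.
defaults_corr <- replicate(n_sims, {
  common_shock <- rnorm(1)
  latent <- sqrt(rho) * common_shock + sqrt(1 - rho) * rnorm(n_loans)
  sum(latent < cutoff)
})

# Average number of defaults under each assumption.
c(independent = mean(defaults_indep), correlated = mean(defaults_corr))

# Chance that 15 or more of the 100 loans default together.
c(independent = mean(defaults_indep >= 15),
  correlated  = mean(defaults_corr >= 15))
```

Both pools have the same expected number of defaults, so a model can match the average and still badly understate the chance of many loans failing at once if it assumes independence.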

An R Demonstration

A friend of mine said that this winter has been much
milder than last year, to date.

Is there evidence of this in the data?
How should we collect the data?
What summary statistics should we measure?
What comparison should we make?
What statistical model or test should we use?


An R Demonstration (cont.)

# Read the weather data from a CSV file chosen interactively
f = file.choose()
data = read.csv(f)
n = dim(data)[1]
# Convert to Date class so the comparisons and the -365 below work
# (assumes the Date column is in year-month-day form)
data$Date = as.Date(data$Date)
data$maxtemp = data$Max.TemperatureF
# This winter to date, and the same stretch of days one year earlier
winter15 = data[data$Date >= as.Date("2015-01-01"),]
winter14 = data[data$Date >= as.Date("2014-01-01")
  & data$Date <= (data$Date[n]-365),]
# Visualize the data
boxplot(winter14$maxtemp, winter15$maxtemp, col=c("rosybrown","green3"))
# As a 2-sample unpooled t-test
t.test(winter14$maxtemp, winter15$maxtemp)
# As a 2-sample pooled t-test
t.test(winter14$maxtemp, winter15$maxtemp, var.equal=TRUE)
# As a 2-sample paired t-test
t.test(winter14$maxtemp, winter15$maxtemp, paired=TRUE)
# As a Rank Sum test
w.test = wilcox.test(winter14$maxtemp, winter15$maxtemp)
w.test
# As a Resampled test
diff.obs = mean(winter14$maxtemp) - mean(winter15$maxtemp)
combined.sample = c(winter14$maxtemp, winter15$maxtemp)
n14 = length(winter14$maxtemp)
nsims = 10000
diff.sim = rep(NA, nsims)
for(i in 1:nsims){
  # Shuffle the pooled temperatures and split them into two groups
  resampled.temp = sample(combined.sample, length(combined.sample))
  diff.sim[i] = mean(resampled.temp[1:n14]) -
    mean(resampled.temp[(n14+1):length(combined.sample)])
}
# Two-sided p-value: share of shuffled differences larger in
# magnitude than the observed difference
mean(abs(diff.sim) > abs(diff.obs))

Some Logistical Details

R tutorials this week and next: Wed, Jan 28 - Mon, Feb 2.
Very basic introduction. Note: schedule may change a bit.
Wed: 7-8pm in SC-107.
Thurs: 7-8pm in Hall A, 8-9 & 9-10pm in SC-B09
Fri: 12-1, 1-2, 2-3pm in SC-B09
Sun: 4-5pm in SC-B09
Mon: 7-8pm in SC-B09

Sections will begin next week (Feb 2).
TF OH schedule to come; starts Feb 2.
First HW due next Friday, 2/6 @ 2pm. Will be posted by
the end of this week.
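The resampled test from the R demonstration needs a CSV file chosen interactively, so it cannot be rerun as-is without the data. The sketch below applies the same shuffling idea to simulated temperatures so it runs on its own. The numbers are made up and are not real weather data; `perm_test` is a hypothetical helper name; and the p-value adds 1 to the numerator and denominator, a common convention that differs slightly from the plain proportion used in the demonstration.

```r
set.seed(139)

# Simulated daily maximum temperatures (deg F); NOT real weather data.
winter14 <- rnorm(25, mean = 30, sd = 8)
winter15 <- rnorm(25, mean = 36, sd = 8)

# Two-sample permutation test for a difference in means.
perm_test <- function(x, y, nsims = 10000) {
  diff_obs <- mean(x) - mean(y)
  combined <- c(x, y)
  n_x <- length(x)
  # Each replicate shuffles the pooled values, splits them into groups
  # of the original sizes, and recomputes the difference in means.
  diff_sim <- replicate(nsims, {
    shuffled <- sample(combined)
    mean(shuffled[1:n_x]) - mean(shuffled[-(1:n_x)])
  })
  # Two-sided p-value; the +1 counts the observed split as one shuffle.
  (sum(abs(diff_sim) >= abs(diff_obs)) + 1) / (nsims + 1)
}

p_value <- perm_test(winter14, winter15)
p_value

# Unpooled t-test on the same simulated data, for comparison.
t.test(winter14, winter15)$p.value
```

Wrapping the loop in a function makes it easy to rerun the test on other pairs of samples, and `replicate` avoids having to preallocate and index the vector of simulated differences.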

The Last Word

Correlation does not imply causation, but it does
waggle its eyebrows suggestively and gesture
furtively while mouthing "look over there."
