
“Statistics”

Lecturer: Dr Gery Geenens

RC-2053

ggeenens@unsw.edu.au

Office hours: please email to arrange an appointment

Lecture slides and other course materials will be made available on the course web page (Blackboard9)

Tutorials start in Week 1 in the labs (Red Center)

Tutes in the computer lab: Weeks 1, 2, 4, 6, 8 and 12

Tutes in a classroom: Weeks 3, 5, 7, 9, 11 and 13

Help with the course:

• You will be assigned a tutor for tutorials/labs, who should be your first point of contact

• Keep up! Don't get behind. Seek help if you need it. Prepare for tuts/labs and ask your tutor

• Attempt all the online quizzes

• Consultation hours will be available with your lecturer and other statistics staff. Details are on the course web page and on the door of RC-2053. Times can be adjusted if necessary: contact the lecturer if all advertised hours clash with your timetable

• Peer support for statistics is available through the Student Support Scheme (SSS) in RC-3064

Textbooks and reference books

1. Required textbook:

and N. Farnum, 2nd Edition, Duxbury Press, Thomson Publishers.

Bundled with the Student Solution Manual, it costs around $120.00 from the UNSW Bookstore.

2. Additional references:

J.L. Devore. (Any edition is useful. There is now a new 7th edition.) Duxbury Press, Thomson Publishers.

Press, Thomson Publishers.

Assessments

• Matlab online quiz (4%): start as early in the session as you can. Due Monday 1pm, Week 3

• 3 Stats online quizzes (2% + 2% + 2%): due Monday 1pm, Weeks 5, 9 and 13

The online quizzes are run through Maple TA: there is a link to it from your Blackboard9 web page

• Midsession test (15%): in tutorial in Week 7

• Matlab test (15%): Week 10; you should book a time to suit you

• Final exam (60%): in exam period

Week 1: Data and Distributions

• Textbook reference: Ch 1, sections 1.1, 1.2, 1.3

- Stemplot (stem-and-leaf display): Q1, page 19

- Histogram: Q17, page 23

- Uniform distribution: Q19, page 31

What is Statistics?

• Describing data

• Producing data

• Drawing conclusions from data

• Statistical Science is the science of collecting,

organising and interpreting numerical facts, which

are referred to as data

• “Statistical Science is the Science of turning data into

information for decision making”

• Statistical science provides methods to enable us to

make intelligent judgements and informed decisions

in the presence of uncertainty and variation


Variation is endemic

• Natural Variation: Variation is the usual

situation and arises in nature, raw materials,

experimental conditions etc.

measurements on the same physical quantity.


Branches of Statistical Methods

• Descriptive: Summarize and describe

important features of the data

• Inferential: Make inferences about some characteristic of a population based on measurements on a sample of individuals selected (how?) from the population

Example 1: Does Cloud Seeding Work?

(from Devore, 1995, p. 39):

Is cloud seeding really effective in increasing rainfall?

For 26 pairs of days with similar weather, cloud seeding was tried on one of the two days.

Pair   Rainfall: Seeded (mm)   Rainfall: Unseeded (mm)
1      4.1                     1.0
2      7.7                     4.9
3      17.5                    4.9
4      31.4                    11.5
5      32.7                    17.3
…      …                       …

[Figure: Rainfall difference (Seeded - Unseeded) for each pair]
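As a minimal sketch, the paired differences behind this plot can be computed directly. Only the five pairs listed in the table are used; the remaining 21 pairs are not reproduced in these notes, so this does not recreate the full figure.

```python
# Rainfall (mm) for the five pairs of days listed in the table; the other
# 21 pairs are not shown in these notes, so they are left out here.
seeded = [4.1, 7.7, 17.5, 31.4, 32.7]
unseeded = [1.0, 4.9, 4.9, 11.5, 17.3]

# Paired design: subtract within each pair of days with similar weather.
# Rounding to 1 decimal place removes floating-point noise such as 3.0999...
differences = [round(s - u, 1) for s, u in zip(seeded, unseeded)]
n_positive = sum(d > 0 for d in differences)

print("Differences (seeded - unseeded):", differences)
print(f"{n_positive} of {len(differences)} differences are positive")
```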

Cloud seeding experiment

• Most of the rainfall differences are positive, which

seems to suggest that cloud seeding increases

rainfall. However, can we be sure that the apparent

effect is not due to chance?

• We have a decision to make (does cloud seeding work or not?) based on experimental data

experiments like this one, for collecting data and for

answering questions of interest based on the data


Example 2

Hair colour and pain tolerance

Melbourne suggests that there may be a difference

in pain threshold for blonds and brunettes

• Subjects were divided into light blond, dark blond, light brunette and dark brunette groups, and a pain threshold score was measured for each subject

(A higher score means a higher pain threshold)

[Figure: Data for Pain and Hair Colour Experiment. Pain threshold scores (axis from 0 to 70) for the D Brunette, L Brunette, D Blond and L Blond groups]

Hair colour and pain tolerance

related to hair colour?) based on experimental data

colour, but is this effect real or just due to chance?

(the sample size is quite small)

differences between groups defined by different

characteristics (inherent or experimental)


Example 3: Defective bricks

• A sample of 214 bricks from a batch of bricks

yields 18 defective

• If the proportion of defective bricks is no more than 5% of the entire batch, the batch is considered acceptable

the process?


Defective bricks continued

• Again we have a decision to make (is our

manufacturing process behaving as it should?) based

on experimental data

• Does the sample indicate that the long-run fraction of defective bricks is bigger than 5%?

this one, as well as for deciding on what data should

be collected (e.g. the sample size and so on)


Example 4: Challenger Space Shuttle

• For the 23 space shuttle missions prior to the Challenger disaster of January 28, 1986, the following variables were recorded:

– Temperature at launch

– Prelaunch pressure

– Number of O-rings which failed (out of six)

[Figure: Data used the night before launch (note the scale limits)]

Does temperature have an effect on O-ring incidents?

[Figure: Data including launches with no incidents]

Probability of Field Failure O-rings

• For each field joint, let $p_F$ be the probability of a field joint failure:

$p_F = p_a p_b p_c p_d$

• Let $p_{FF}$ be the probability of at least one field joint failure among the six field joints:

$p_{FF} = 1 - (1 - p_F)^6$

Predictions under 2 scenarios:

Conditions for the ill-fated launch:

At 200 psi, 31 °F: $\hat{p}_F \approx 0.023$, $\hat{p}_{FF} \approx 1 - (1 - 0.023)^6 \approx 0.13$

At 200 psi, 60 °F: $\hat{p}_F \approx 0.0032$, $\hat{p}_{FF} \approx 1 - (1 - 0.0032)^6 \approx 0.019$
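The $p_{FF}$ formula can be evaluated for both scenarios in a few lines. This is a sketch: the function name is our own, and, like the formula itself, it treats the six field joints as failing independently, each with the same probability $p_F$.

```python
def prob_any_field_joint_failure(p_f, n_joints=6):
    """P(at least one of n_joints field joints fails) = 1 - (1 - p_f) ** n_joints.

    Assumes independent joints, each failing with probability p_f,
    which is what p_FF = 1 - (1 - p_F)^6 encodes.
    """
    return 1 - (1 - p_f) ** n_joints


# Estimated per-joint failure probabilities at 200 psi, from the two scenarios
scenarios = {"31 F": 0.023, "60 F": 0.0032}
results = {temp: prob_any_field_joint_failure(p) for temp, p in scenarios.items()}

for temp, p_ff in results.items():
    print(f"{temp}: p_FF = {p_ff:.3f}")
```

Raising $1 - p_F$ to the sixth power turns a small per-joint risk into a noticeably larger risk for the launch as a whole, which is why the 31 °F estimate is so much worse than the 60 °F one.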

Experimental procedure:

1. Formulate the question(s)

2. Decide what data is required/most appropriate

3. Collect the data

4. Analyse/interpret the data

5. Draw conclusions (answer the questions)



Sample from Populations

• Population: set of all objects of interest in a

problem

• Sample: a subset of the population, selected for the purpose of learning about characteristics of the population

• The sample should be REPRESENTATIVE

Sampling Populations for Inference

[Figure: Sampling populations for inference. Individuals are sampled from the population (X = sampled individuals); calculate the sample average, then infer that the population average is "close" to the sample average]

Example of Inference

• Population average height for adult males is unknown

– A sample of 1000 randomly chosen males gave a sample

average height of 176.5cm

– We infer that the population average height is near

176.5cm

• Question: Can we quantify the accuracy or precision of

this inference?

• Answer: Statistical Science can!

• BUT the sample should ideally be REPRESENTATIVE of

the population studied

(RANDOM samples can help with this)
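A small simulation can make this idea concrete. The sketch below invents a population of heights (normal, mean 176 cm, standard deviation 7 cm; these values are hypothetical and chosen only for illustration), draws a simple random sample of 1000 individuals, and compares the sample average with the population average, which in a real study would be unknown.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is reproducible

# HYPOTHETICAL population of 100,000 adult male heights (cm). In practice
# these values are unknown, which is why we take a sample at all.
population = [rng.gauss(176.0, 7.0) for _ in range(100_000)]

# Simple random sample of 1000 individuals, drawn without replacement
sample = rng.sample(population, 1000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population average: {population_mean:.2f} cm")
print(f"sample average:     {sample_mean:.2f} cm")
```

Re-running with different seeds gives different sample averages that stay close to the population average; quantifying how close is the question raised above.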


Defective bricks example revisited

• Bricks are manufactured in batches

• The contract requires that the proportion of defective bricks (π) in a batch be no larger than 5%

• Decide if a batch should be sent to a customer

• The population is the set of all bricks in the batch

• It is impractical to inspect every brick

• Select a random subset of bricks from the batch and calculate the proportion p of defective bricks in this sample
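Using the numbers from Example 3 (18 defective bricks in a sample of 214), the sample proportion is a one-line computation. A minimal sketch:

```python
# Numbers from Example 3: 18 defective bricks in a random sample of 214
n_sampled = 214
n_defective = 18
contract_limit = 0.05  # the contract requires pi <= 5%

p = n_defective / n_sampled  # sample proportion defective, an estimate of pi
print(f"sample proportion defective p = {p:.4f} ({p:.1%})")
print("p exceeds the 5% limit" if p > contract_limit else "p is within the 5% limit")
```

Here p ≈ 0.084 is above 0.05, but p varies from sample to sample, so this comparison on its own does not settle whether the long-run fraction defective π is bigger than 5%.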

Descriptive Statistics

• Given sample data, our most basic statistical task is

to summarize it in some way

for different types of data

• Types of data:

– Quantitative (Numerical)

– Qualitative (Categorical)


Individuals and Variables

• Individuals are the objects (people, animals,

things) described by the data. Individuals are

sometimes referred to as elements, units or

participants

• A variable is any characteristic of an

individual. A variable can take different values

for different individuals

Variables can be multidimensional (univariate/bivariate/multivariate).

Categorical & Numerical Variables

• Categorical (or qualitative) Variable

– individuals are placed into one of several groups

or categories

• Numerical (or quantitative) Variable

– takes numerical values for which numerical operations apply

Revisit : Hair colour and pain tolerance

What kind of variable is pain score?

[Figure: Data for Pain and Hair Colour Experiment. Pain threshold scores (axis from 0 to 70) for the D Brunette, L Brunette, D Blond and L Blond groups]

Displaying Distributions

• The distribution of a variable tells us what values it

takes and how often it takes these values

in a data set using suitable graphs and numerical

summaries

is to look at the relationships between variables


Displaying Distributions

Graphs for categorical variables:

Show frequencies of individuals or observations in each category of the variable: e.g. bar charts, pie charts

The pattern of variation in a quantitative variable is often displayed in a histogram or a stemplot (or stem-and-leaf display)

Qualitative data: Bar Charts

Qualitative data: Pie Charts

Quantitative data: Stemplot (Stem-and-Leaf Display)

• Key point: Numbers must have a context for

sensible conclusions to be made from data

• A stemplot is a quick and easy way to

graphically display the key features of a

distribution of data

• These are best suited to a small to moderate

number of observations


To make a stemplot:

• Separate each observation into a stem (all but last

digit) and a leaf (final digit)

• E.g.,

24 := 2|4,   139 := 13|9,   5 := 0|5

• Write all unique stems in vertical column with the

smallest at the top, and draw a vertical line at the

right of this column

• Write each leaf in the row to the right of its stem, in

increasing order out from the stem
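The steps above translate directly into code. This sketch handles non-negative integers only (the function names are our own); it splits each value into stem and leaf, lists every stem from smallest to largest, and sorts the leaves. It is demonstrated on the three values above and on Data set 1 from the "Example of Stem and Leaf" slide.

```python
from collections import defaultdict


def stem_and_leaf(value):
    """Split a non-negative integer into (stem, leaf):
    stem = all but the last digit, leaf = the final digit."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return divmod(value, 10)


def stemplot_lines(values):
    """Return the rows of a stemplot for non-negative integers.

    Stems run from the smallest (top) to the largest (bottom), including
    empty stems, and each row's leaves are written in increasing order.
    """
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = stem_and_leaf(v)
        leaves[stem].append(leaf)
    width = len(str(max(leaves)))  # right-align stems so the bars line up
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>{width}} | {row}".rstrip())
    return lines


for v in (24, 139, 5):
    stem, leaf = stem_and_leaf(v)
    print(f"{v} := {stem}|{leaf}")

data_set_1 = [9, 10, 15, 22, 9, 15, 16, 24, 11, 46]
print("\n".join(stemplot_lines(data_set_1)))
```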


Example 5: Expenditure ($) of 50 shoppers

3.11 8.88 9.26 10.81 12.69 13.78

15.23 15.62 17.00 17.39 18.36 18.43

19.27 19.50 19.54 20.16 20.59 22.22

23.04 24.47 24.58 25.13 26.24 26.26

27.65 28.06 28.08 28.38 32.03 34.98

36.37 38.64 39.16 41.02 42.97 44.08

44.67 45.40 46.69 48.65 50.39 52.75

54.80 59.07 61.22 70.32 82.70 85.76

86.37 93.34


Ex 5: Stemplot (amounts rounded to the nearest dollar; stem = tens of dollars, leaf = dollars)

0 | 3 9 9
1 | 1 3 4 5 6 7 7 8 8 9
2 | 0 0 0 1 2 3 4 5 5 6 6 8 8 8 8
3 | 2 5 6 9 9
4 | 1 3 4 5 5 7 9
5 | 0 3 5 9
6 | 1
7 | 0
8 | 3 6 6
9 | 3
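The stemplot can be rebuilt from the raw data. The rounding rule is an assumption: rounding each amount to the nearest dollar, with halves rounded up (so 19.50 becomes 20), is consistent with the split-stem display of the same data. The stem is then the tens digit and the leaf the units digit.

```python
import math
from collections import defaultdict

# Expenditure ($) of the 50 shoppers in Example 5
expenditure = [
    3.11, 8.88, 9.26, 10.81, 12.69, 13.78,
    15.23, 15.62, 17.00, 17.39, 18.36, 18.43,
    19.27, 19.50, 19.54, 20.16, 20.59, 22.22,
    23.04, 24.47, 24.58, 25.13, 26.24, 26.26,
    27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
    36.37, 38.64, 39.16, 41.02, 42.97, 44.08,
    44.67, 45.40, 46.69, 48.65, 50.39, 52.75,
    54.80, 59.07, 61.22, 70.32, 82.70, 85.76,
    86.37, 93.34,
]


def to_whole_dollars(x):
    """Round to the nearest dollar, rounding halves up (19.50 -> 20).

    Python's built-in round() uses round-half-to-even, so floor(x + 0.5)
    is used instead to make the rule explicit.
    """
    return math.floor(x + 0.5)


def stemplot_rows(values):
    """Map each stem (tens of dollars) to its sorted leaves (units of dollars)."""
    rows = defaultdict(list)
    for x in values:
        stem, leaf = divmod(to_whole_dollars(x), 10)
        rows[stem].append(leaf)
    return {stem: sorted(rows[stem]) for stem in range(min(rows), max(rows) + 1)}


rows = stemplot_rows(expenditure)
for stem, leaves in rows.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Counting the leaves is a quick check: there must be exactly 50, one per shopper.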

Variations on Stemplots

• Rounding or truncating the numbers to a few

digits before making a stemplot to avoid too

much detail in the stems

distribution

compare two related distributions


Ex 5: Splitting each stem

0 | 3
0 | 9 9
1 | 1 3 4
1 | 5 6 7 7 8 8 9
2 | 0 0 0 1 2 3 4
2 | 5 5 6 6 8 8 8 8
3 | 2
3 | 5 6 9 9
4 | 1 3 4
4 | 5 5 7 9
5 | 0 3
5 | 5 9
6 | 1
6 |
7 | 0
7 |
8 | 3
8 | 6 6
9 | 3

[Figure: Stemplots of the amounts spent by 50 consecutive supermarket shoppers: (a) without splitting stems; (b) splitting stems]
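Splitting stems only changes how leaves are grouped: leaves 0 to 4 go on the first row of a stem and leaves 5 to 9 on the second. A sketch under the same rounding assumption as before (nearest dollar, halves up); the function name is our own.

```python
import math
from collections import defaultdict

# Expenditure ($) of the 50 shoppers in Example 5
expenditure = [
    3.11, 8.88, 9.26, 10.81, 12.69, 13.78,
    15.23, 15.62, 17.00, 17.39, 18.36, 18.43,
    19.27, 19.50, 19.54, 20.16, 20.59, 22.22,
    23.04, 24.47, 24.58, 25.13, 26.24, 26.26,
    27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
    36.37, 38.64, 39.16, 41.02, 42.97, 44.08,
    44.67, 45.40, 46.69, 48.65, 50.39, 52.75,
    54.80, 59.07, 61.22, 70.32, 82.70, 85.76,
    86.37, 93.34,
]


def split_stemplot(values):
    """Stemplot with each stem split into two rows: leaves 0-4, then 5-9.

    Each amount is rounded to the nearest dollar (halves up); the stem is the
    tens digit and the leaf the units digit. Returns (stem, leaves) rows in
    display order, keeping empty rows inside the range so the scale is even.
    """
    rows = defaultdict(list)
    for x in values:
        stem, leaf = divmod(math.floor(x + 0.5), 10)
        rows[(stem, leaf >= 5)].append(leaf)  # False = low row, True = high row
    stems = [stem for stem, _ in rows]
    ordered = [
        (stem, sorted(rows.get((stem, high), [])))
        for stem in range(min(stems), max(stems) + 1)
        for high in (False, True)
    ]
    while ordered and not ordered[-1][1]:  # drop empty rows after the largest leaf
        ordered.pop()
    return ordered


rows = split_stemplot(expenditure)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
```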

Example of Stem and Leaf

• Data set 1

9, 10, 15, 22, 9, 15, 16, 24, 11, 46

• Data set 2

25, 27, 28, 36, 38, 39, 42, 50

What is the stem and what is the leaf?


Types of Quantitative Variables

• A variable is discrete if its set of possible

values constitutes a finite set or infinite

sequence (countable)

• A variable is continuous if its set of possible values consists of an entire interval on a number line (uncountable)

Frequency or relative frequency

Histograms for Discrete Data

• Determine the frequency and relative frequency for

each value

• Above each value, draw a rectangle whose height is the relative frequency of that value

Example 6: Credit cards

Students from a statistics class were asked how

many credit cards they carry. X is the variable

representing the number of cards

x (number of cards)   # people   Relative frequency
0                     12
1                     42
2                     57
3                     24
4                     9
5                     4
6                     2

[Figure: Credit Card Histogram]
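The "Relative frequency" column of the table is left blank on the slide. This sketch fills it in by dividing each count by the total number of students; these relative frequencies are the bar heights of the histogram.

```python
# Example 6 data: number of credit cards carried (x) -> number of students
frequency = {0: 12, 1: 42, 2: 57, 3: 24, 4: 9, 5: 4, 6: 2}

n = sum(frequency.values())  # total number of students asked
relative_frequency = {x: count / n for x, count in frequency.items()}

for x, rf in relative_frequency.items():
    # In the histogram, the bar for value x has height rf
    print(f"x = {x}: frequency {frequency[x]:>2}, relative frequency {rf:.3f}")
```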

Histograms: Continuous Data

• Subdivide the measurement axis into a suitable

number of classes (or class intervals). Try to choose

sensible end points. Choose enough intervals to

avoid too much detail while retaining information

about important features of the distribution

• Determine the frequency and relative frequency for each class. Divide each relative frequency by the corresponding class width; the result is called the density

• Then mark the class boundaries on a horizontal measurement axis

• Above each class interval, draw a rectangle whose height is the density

Histogram for continuous data: property

• Multiplying both sides of the formula for the density by the class width gives

relative frequency = (class width) × (density)

= (rect. width) × (rect. height)

= rectangle area

• The area of each rectangle is the relative frequency of the corresponding class

• The total area of all rectangles must therefore be 1, since the relative frequencies of all classes add up to 1
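The procedure and its area property can be checked on the Example 5 expenditures. The class intervals below are a hypothetical choice with deliberately unequal widths, which is exactly the case where heights must be densities rather than relative frequencies.

```python
# Expenditure ($) of the 50 shoppers in Example 5
expenditure = [
    3.11, 8.88, 9.26, 10.81, 12.69, 13.78,
    15.23, 15.62, 17.00, 17.39, 18.36, 18.43,
    19.27, 19.50, 19.54, 20.16, 20.59, 22.22,
    23.04, 24.47, 24.58, 25.13, 26.24, 26.26,
    27.65, 28.06, 28.08, 28.38, 32.03, 34.98,
    36.37, 38.64, 39.16, 41.02, 42.97, 44.08,
    44.67, 45.40, 46.69, 48.65, 50.39, 52.75,
    54.80, 59.07, 61.22, 70.32, 82.70, 85.76,
    86.37, 93.34,
]

# HYPOTHETICAL class intervals [lower, upper) in dollars, with unequal widths
classes = [(0, 20), (20, 30), (30, 50), (50, 100)]

n = len(expenditure)
table = []
for lower, upper in classes:
    count = sum(lower <= x < upper for x in expenditure)
    rel_freq = count / n
    density = rel_freq / (upper - lower)  # rectangle height
    table.append((lower, upper, count, rel_freq, density))
    print(f"[{lower}, {upper}): count {count}, rel. freq. {rel_freq:.2f}, density {density:.4f}")

# Area of each rectangle = width x density = relative frequency,
# so the total area is 1 (up to floating-point rounding).
total_area = sum((upper - lower) * density for lower, upper, _, _, density in table)
print(f"total area = {total_area:.6f}")
```

Plotting relative frequencies as heights here would make the wide [50, 100) class look as prominent as [20, 30), even though its values are spread over five times the width.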

Histogram shapes

Typical words/phrases used to describe histograms and other graphical displays (e.g. stem-and-leaf displays) of data:

• symmetric, or skewed to the right/left;

• unimodal, or bimodal/multimodal;

• bell-shaped (if symmetric and unimodal);

• there are possible outliers around …, or there are no obvious outliers;

• the typical value of the data is …;

• the range of the data is …;

• compared to the typical value, the spread of the data is fairly big/small

Histograms and Stemplots

• Histograms replace:

– the stems in a stemplot by class intervals;

– the leaves (showing the values, possibly rounded) by counts, percentages or densities

• Stemplots are useful for displaying distributions of smaller data sets.

• Histograms are useful for moderate to larger data sets.

Example: A histogram with a density curve

[Figure: Histogram of survival times of 72 guinea pigs injected with tubercle bacilli (Moore & McCabe), with a smooth density curve]

The smooth density curve is estimated using software. There is no easy mathematical formula for this curve!

Note: there are extra bumps in the right-hand tail. Ignoring these, there is some positive skew.

General density curves

A density curve is a smooth curve through a relative

frequency histogram used to summarise its key

features succinctly

Usually the smooth curve is described by a mathematical formula f(x), which is

• non-negative

• such that the total area under it is 1 (it integrates to unity)

Properties of density functions

• A density function of a real-valued variable must be non-negative:

$f(x) \ge 0$

• Its integral over the real numbers is unity:

$\int_{-\infty}^{\infty} f(x)\,dx = 1$

Example 6: Time Between Industrial Accidents

[Figure: Density histogram of 177 times (in days) between accidents at a DuPont facility over a 10-year period [Vining, p.51]. Density is relative frequency per day, in bins of width 5 days. Horizontal axis: Time_bw_acdnts, 0 to 200; vertical axis: Density, 0.00 to 0.05]

Ex 6: Exponential Density for Time Between Industrial Accidents

Fitted p.d.f. is

$f(y) = \frac{1}{\lambda} \exp\left(-\frac{y}{\lambda}\right), \quad y \ge 0,$

with $\lambda = 20.412$

[Figure: Fitted exponential density over the density histogram of Time_bw_acdnts (0 to 200 days)]
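Both properties of a density can be checked numerically for this fitted curve. The sketch uses a plain trapezoidal rule (our choice of method; the exact integral is 1 analytically) and truncates the integral at 1000 days, where the remaining tail $e^{-1000/20.412}$ is negligible.

```python
import math

LAMBDA = 20.412  # fitted parameter (days) from Ex 6


def exponential_pdf(y, lam=LAMBDA):
    """f(y) = (1/lam) * exp(-y/lam) for y >= 0, and 0 for y < 0."""
    return math.exp(-y / lam) / lam if y >= 0 else 0.0


def trapezoid(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


# Property 1: f(y) >= 0 (checked on a grid from -10 to 1000 days)
grid = [i * 0.5 for i in range(-20, 2001)]
print("non-negative on grid:", all(exponential_pdf(y) >= 0 for y in grid))

# Property 2: total area is 1
print("area from 0 to 1000 days:", round(trapezoid(exponential_pdf, 0, 1000), 6))
```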

Using the Density Function

• The proportion of values between a and b is the area

$\int_a^b f(x)\,dx$

• E.g. calculate the chance that the time to the next industrial accident after the one just observed exceeds 80 days.

• Answer:

$\int_{80}^{\infty} \frac{1}{20.412} e^{-y/20.412}\,dy = e^{-80/20.412} \approx 0.0199$
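For the exponential density the area has a closed form, because $-e^{-y/\lambda}$ is an antiderivative of $f(y)$. A minimal sketch (function names are our own) that reproduces the 0.0199 answer:

```python
import math

LAMBDA = 20.412  # fitted exponential parameter (days) from Ex 6


def prob_between(a, b, lam=LAMBDA):
    """Area under f(y) = (1/lam) exp(-y/lam) from a to b, for 0 <= a <= b.

    An antiderivative of f is -exp(-y/lam), so the area is
    exp(-a/lam) - exp(-b/lam).
    """
    return math.exp(-a / lam) - math.exp(-b / lam)


def prob_exceeds(t, lam=LAMBDA):
    """P(Y > t): the limit of prob_between(t, b) as b grows, i.e. exp(-t/lam)."""
    return math.exp(-t / lam)


p80 = prob_exceeds(80)
print(f"P(time to next accident > 80 days) = {p80:.4f}")
```

Since the total area is 1, `prob_between(0, 80)` and `p80` add up to 1, which is a convenient sanity check.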
