Lecture-1
COURSE OVERVIEW
Major Knowledge Areas
PROBABILITY
Probability: concepts of probability and their relevance to
statistical analysis
Probability distributions relevant to transportation data analysis
INFERENTIAL STATISTICS
Data Collection: Survey planning and design
Statistical distributions, confidence intervals, hypothesis testing
CAUSAL STATISTICS
ANOVA
Regression analysis
Course philosophy; Basic theme and
Concepts
The elements in probability allow us
to draw conclusions about
characteristics of hypothetical data
taken from the population, based on
known features of the population.
This type of reasoning is deductive in nature.
For a statistical problem, the sample,
along with inferential statistics, allows
us to draw conclusions about the
population, with inferential statistics
making clear use of elements of
probability.
This reasoning is inductive in nature.
Week  Topics to be covered
1     Course philosophy; basic theme and concepts
2     Probability: concepts of probability and their relevance to statistical analysis
3     Probability distributions relevant to transportation data analysis
4     Probability distribution types
5     Probability distribution types; problems
6     Data collection: survey planning and design
7     Data collection: statistical concepts; problems
8     Traffic survey practice, inventory surveys, transport usage surveys, travel time and congestion surveys (Test One)
9     Matrix surveys, questionnaires and interviews, sources and use of secondary data (project description)
10    Statistics: summary measures
11    Statistical distributions, confidence intervals, hypothesis testing
12    Contingency tables, correlation and linear regression
13    ANOVA: basic concepts
14    ANOVA: applications
15    Multivariate analysis
16    Presentations; course conclusion
The background
FIGURE 1 — The history of road fatalities
COMMENTARY: There was a steady increase in the per capita road fatality
rate, with the exception of the Great Depression and the Second World War, until 1970. Since 1970,
the toll has trended downwards, although it has recently stalled.
The engineering approach
Probability
When we know the underlying model that governs an experiment,
we use probability to figure out the chance that different
outcomes will occur.
For example, if we flip a fair coin 3 times, what is the probability
of obtaining 3 heads?
By definition, probability values are between 0 and 1.
What does it mean if Outcome A of an experiment has
a probability of 1/3 of occurring?
If the experiment is repeated a large number of times, Outcome
A will occur about 1/3 of the time.
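The coin question above and the long-run reading of probability can be checked together. The following is a minimal sketch, not part of the original slides: it computes the exact chance of 3 heads in 3 fair flips and compares it with the relative frequency over many simulated repetitions. The function name, the trial count and the seed are illustrative choices.

```python
import random
from fractions import Fraction

# Exact probability of 3 heads in 3 flips of a fair coin: (1/2)^3.
p_exact = Fraction(1, 2) ** 3

def estimate_three_heads(trials, seed=0):
    """Estimate P(3 heads in 3 flips) by relative frequency."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each flip comes up heads with probability 0.5.
        if all(rng.random() < 0.5 for _ in range(3)):
            hits += 1
    return hits / trials

p_estimate = estimate_three_heads(100_000)
print(f"exact = {float(p_exact):.4f}, simulated = {p_estimate:.4f}")
```

The simulated value should settle near 1/8 as the number of trials grows, which is what "Outcome A will occur that fraction of the time" means in practice.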
Statistics
Data analysis, random variables, stochastic
processes, probability, statistical modelling and processes,
inference, time series, reliability, multivariate analysis, SPC, ...
- everywhere in modern engineering
- workplaces, applications, research
Maths
Mathematical thinking is the lifeblood of engineering
Engineering needs the most technical maths, faster than
any other discipline, and
engineering needs the most generic maths skills, faster
than
any other discipline
Maths is like language
Specific & generic skills become part of the person
People forget how they acquired such skills
Transferability needs more than specifics required
Maths fitness is like physical fitness
Underpins development of field-specific skills
Necessary but not sufficient for excellence in specific
fields
Probability and Statistics
In Probability, we use our knowledge of the underlying model to
determine the probability that different outcomes will occur.
How does statistics compare to probability?
In statistics, we don’t know the underlying model governing
an experiment.
All we get to see is a sample of some outcomes of the
experiment.
We use that sample to try to make inferences about the underlying
model governing the experiment.
So a thorough understanding of probability is essential to
understanding statistics.
Probability and Statistics
Collect, organize, and display data
Use appropriate statistical methods to analyze data
Develop inferences and predictions that are based on data
Understand and apply basic concepts of probability
Probability and Statistics-Example
Suppose for a manufacturing process we have an upper limit of 5%
defective items produced for the process to be “in control”.
We take a sample of 100 items produced and find 10 defective
items. Is the manufacturing process in control?
One way to look at this is to ask: “If the process has 5%
defective items, what is the probability that there will be 10 or
more defective items in a sample of size 100?”
The probability of this outcome is called a P-value.
In this case the P-value is only 0.0282. What does this mean?
That an outcome this extreme would occur by chance only 2.82% of the time.
So what is the definition of a P-value?
The P-value is the probability of getting an outcome at least as
extreme as the measured one if the assumed underlying model were
true.
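The P-value quoted in this example can be reproduced with a short calculation. This sketch assumes that items are defective independently of one another, so the number of defectives in the sample follows a binomial distribution; that modelling assumption and the helper name `binomial_upper_tail` are not stated in the slides.

```python
from math import comb

def binomial_upper_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Assumed model: 100 independent items, each defective with probability 0.05.
p_value = binomial_upper_tail(n=100, p=0.05, k_min=10)
print(f"P-value = {p_value:.4f}")
```

Under that assumption the tail probability agrees with the 0.0282 given above.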
Statistics
• Science of data collection,
summarization, presentation and
analysis for better decision making.
Observational study
Observe the system
Historical data
The objective is to build a model of the system, usually called
an empirical model
Design of experiment
Descriptive Statistics
Inferential Statistics
Forms of Data Description
Point summary
Tabular format
Graphical format
Diagrams
The application
FIGURE 2 — Rural risk
The projections
FIGURE 3 — Newer vehicles are safer
Vehicle safety standards and vehicle design will
be improved to further increase the
protection provided to occupants and minimise
the hazard to non-occupants struck by
a vehicle. This will include designing vehicles so
that they cause less damage to other
vehicles and road users in a crash.
Statistical analysis
FIGURE 4 — Recent trends in fatalities among
vulnerable road users
Demand Modelling on rail Lines, Stations and Trains
FIGURE: Travellers getting on (Suben) and getting off (Bajan) versus time period (Tramo horario), with polynomial curve fits of order 4, 5 and 6; the R2 values shown range from 0.1521 to 0.8267.
• same Station, same Line (Madrid, C1, path 0), same day Type (L)
Highway construction-Solution
P(A∪B) = P(A) + P(B) - P(A∩B), and since events A and B are
independent, we can also write
P(A∩B) = P(A)P(B) = 0.8 × 0.75 = 0.60. Thus,
P(A∪B) = P(A) + P(B) - P(A)P(B)
= 0.8 + 0.75 - 0.8 × 0.75 = 0.95.
Notice that the probability P(A∩B) = P(“availability of construction
workers AND favorable weather conditions”) is only 60%, i.e.,
construction will be possible only 60% of the time.
Example Highway Construction
The result P(A∪B) = 0.95 simply indicates that at least one of the two conditions
for highway construction (either availability of construction
workers OR favorable weather conditions, but not necessarily both)
is found 95% of the time. From the point of view of predicting
the ability to carry out construction activities, the probability
P(A∩B) = 0.60 is more important.
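The two results in this example follow from the product rule for independent events and the addition rule. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
p_a = 0.80  # P(A): construction workers are available
p_b = 0.75  # P(B): weather conditions are favorable

# Independence lets the joint probability factor into a product.
p_a_and_b = p_a * p_b
# Addition rule for the union of two events.
p_a_or_b = p_a + p_b - p_a_and_b

print(f"P(A and B) = {p_a_and_b:.2f}")
print(f"P(A or B)  = {p_a_or_b:.2f}")
```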
Example- Wave heights in a lake
In the process of re-designing a harbor in a lake, data is collected
on wind velocity in the area as well as water temperature to
check what effect these two variables have on wave height in
the harbor. Of interest for the designer are the conditions
A = “strong wind velocity” (registered when wind velocity is
larger than 15 mph) and
B = “warm waters” (registered when water temperature is
larger than 70 °F). Records indicate that P(A) = 0.350, P(B)
= 0.150, and P(A∩B) = 0.052. Are the events A and B,
i.e., “strong wind velocity” and “warm waters”,
independent?
Wave Height- Solution
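Two events are independent when P(A∩B) = P(A)P(B), so one way to approach the question is to compare the recorded joint probability with the product of the two marginals. The sketch below does only that comparison; it is an illustration rather than the worked solution from the original slide, which did not survive extraction.

```python
p_a = 0.350        # P(A): strong wind velocity
p_b = 0.150        # P(B): warm waters
p_a_and_b = 0.052  # P(A and B), from the records

# Value that P(A and B) would take if A and B were exactly independent.
product = p_a * p_b
difference = abs(product - p_a_and_b)

print(f"P(A)P(B) = {product:.4f}, P(A and B) = {p_a_and_b:.4f}")
print(f"difference = {difference:.4f}")
```

The product is 0.0525 against a recorded 0.052, so the two agree to within the three-decimal rounding of the records. Whether that is close enough to treat the events as independent is a judgment about tolerance; the numbers do not settle it exactly.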
Conditional probability
Conditional probability is the probability associated with an
event, say A, given the occurrence of a related event, say B.
Example 1. Highway closing under snow
conditions
Solution to Example 1
Let events A = “highway closure” and
B = “significant snow observed in the winter months”.
The data indicate that P(B) = P(“significant snow observed in the winter months”) = 30/90
= 1/3, and P(A∩B) = P(“highway closure and significant snow observed”) = 15/90 = 1/6.
The conditional probability, P(A|B), is, therefore, P(“highway closure given snowy conditions”)
= P(A|B) = P(A∩B)/P(B) = (1/6)/(1/3) = 1/2.
This is interpreted as saying that, for that particular highway, if significant
snowfall is recorded, highway closure will occur about half of the time.
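The same calculation can be carried out with exact fractions. In this sketch the counts 30, 15 and 90 come from the solution above; reading them as numbers of observations in the record is an assumption, since the problem statement itself is not reproduced here.

```python
from fractions import Fraction

n_total = 90             # observations in the record (assumed interpretation)
n_snow = 30              # observations with significant snow (event B)
n_snow_and_closure = 15  # observations with snow and a closure (A and B)

p_b = Fraction(n_snow, n_total)
p_a_and_b = Fraction(n_snow_and_closure, n_total)

# Definition of conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)
```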
Theorems on conditional probability
The following are two important theorems related to
conditional probability:
(a) For any three events A1, A2, and A3 the following relationship
holds true:
P(A1∩A2 ∩ A3) =P(A1)P(A2|A1)P(A3|A1 ∩ A2).
(b) If an event A must result in one of the mutually exclusive events
A1, A2, …, An, then
P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A | An)
Example 2. Defective computer chips.
Suppose you are in the process of fixing a computer
by replacing three identical computer chips and you have a
container with 20 computer chips from which to select the
replacements. The chips are selected at random.
5 of the computer chips in the container are defective. What is
the probability that you would select three defective chips for
your computer repair?
Solution to example 2
Let A1, A2, A3 be the events that you select a defective
computer chip in the 1st, 2nd, and 3rd picks out of the
container. Thus, you are interested in calculating
P(A1∩A2∩A3) = P(A1)P(A2|A1)P(A3|A1∩A2)
1. Since there are 5 defective chips out of 20 chips,
P(A1) = 5/20 = ¼ = 0.25
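The remaining factors follow the same pattern: each defective chip removed leaves one fewer defective chip and one fewer chip overall. The sketch below multiplies the chain-rule factors, assuming the chips are drawn without replacement (consistent with the example, though not stated in so many words). The function name is illustrative.

```python
from fractions import Fraction

def prob_all_defective(total, defective, picks):
    """Chain-rule probability that every pick is defective, sampling without replacement."""
    prob = Fraction(1)
    for i in range(picks):
        # After i defective chips are removed, (defective - i) of (total - i) remain.
        prob *= Fraction(defective - i, total - i)
    return prob

p_three = prob_all_defective(total=20, defective=5, picks=3)
print(p_three, float(p_three))
```

Under that assumption the factors are 5/20, 4/19 and 3/18, giving 1/114, or a little under 1%.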
Condition 2-Conditional probability
If an event A must result in one of the mutually exclusive events A1, A2, …, An, then
P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A | An)
The event A and its relation to the mutually exclusive events A1, A2, …, An, is illustrated in
the following Venn diagram:
Example 3. Irrigation methods
While conducting a study on the effects of different irrigation methods on a given crop, you
define the following events:
· A1 = sprinkler irrigation
· A2 = steady furrow irrigation
· A3 = surge furrow irrigation
· A4 = drip irrigation
Solution to Example 3
You also find that the crop is successful 85% of the time if using sprinkler irrigation,
90% of the time if using steady furrow irrigation, 70% of the time if using surge furrow
irrigation, and 60% of the time if using drip irrigation. Thus, if event A represents “a
successful crop”, then we have that
P(A|A1) = 0.85, P(A|A2) = 0.90, P(A|A3) = 0.70, P(A|A4) = 0.60.
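The total-probability theorem combines these conditional success rates with the probabilities P(A1), ..., P(A4) of each irrigation method being used. Those usage probabilities are not reproduced in this text, so the sketch below uses equal shares of 0.25 purely as a hypothetical placeholder. Only the four success rates come from the example.

```python
# Conditional success rates P(A | Ai) quoted in the example.
p_success_given = {"sprinkler": 0.85, "steady furrow": 0.90,
                   "surge furrow": 0.70, "drip": 0.60}

# HYPOTHETICAL usage shares P(Ai): not given in the text, equal shares assumed here.
p_method = {"sprinkler": 0.25, "steady furrow": 0.25,
            "surge furrow": 0.25, "drip": 0.25}

# The methods are mutually exclusive and exhaustive, so the shares must sum to 1.
assert abs(sum(p_method.values()) - 1.0) < 1e-12

# Total probability: P(A) = sum over i of P(Ai) * P(A | Ai).
p_success = sum(p_method[m] * p_success_given[m] for m in p_method)
print(f"P(A) = {p_success:.4f}")
```

With the actual usage shares substituted for the placeholder values, the same sum gives the overall probability of a successful crop.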
Example 4. Highway traveling
To reach Grenoble (France) from Turin (Italy) one can follow either of two routes. The first connects Turin
and Grenoble directly, whereas the second passes through Chambery (France), i.e., the second route is
Turin-Chambery-Grenoble. During extreme weather conditions in winter, travel between Turin
and Grenoble is not always possible because some parts of the highway may not be open to traffic.
Define the following events:
· A = the highway from Turin to Grenoble is open
· B = the highway from Turin to Chambery is open
· C = the highway from Chambery to Grenoble is open
Example 4
In anticipation of driving from Turin to Grenoble, a traveler listens to the next day’s weather
forecast. If snow is forecast for the next day over the southern Alps, one can assume (on the
basis of past records) that
P(A) = 0.6, P(B) = 0.7, P(C) = 0.4, P(C|B) = 0.5, and P(A|B∩C) = 0.4.
(a) What is the probability that the traveler will be able to reach Grenoble from Turin?
(b) What is the probability the traveler will be able to drive from Turin to Grenoble by way of
Chambery?
(c) Which route should be taken in order to maximize the chance of reaching Grenoble?
• Probability can be a discrete or a continuous variable.
Discrete probability: P can have certain values only.
examples:
tossing a six-sided die: $P(x_i) = P_i$, here $x_i = 1, 2, 3, 4, 5, 6$ and $P_i = 1/6$ for all $x_i$.
tossing a coin: only 2 choices, heads or tails.
NOTATION
for both of the above discrete examples (and in general), $x_i$ is called a random variable;
when we sum over all mutually exclusive possibilities:
$\sum_i P(x_i) = 1$
Continuous probability: P can be any number between 0 and 1.
define a “probability density function”, pdf, $f(x)$:
$f(x)\,dx = dP(x \le a \le x + dx)$, with $a$ a continuous variable
Probability for x to be in the range $a \le x \le b$ is:
$P(a \le x \le b) = \int_a^b f(x)\,dx$    Probability = “area under the curve”
Just like the discrete case, the sum of all probabilities must equal 1:
$\int_{-\infty}^{+\infty} f(x)\,dx = 1$
We say that f(x) is normalized to one.
Probability for x to be exactly some number is zero since:
$\int_{x=a}^{x=a} f(x)\,dx = 0$
Note: in the above example the pdf depends on only 1 variable, x. In general, the pdf can depend on many
variables, i.e. f=f(x,y,z,…). In these cases the probability is calculated using multi-dimensional integration.
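The three properties of a pdf listed above (area as probability, normalization to one, zero probability at a single point) can be checked numerically. The sketch below uses the exponential pdf f(x) = exp(-x) for x ≥ 0 purely as an illustration; the choice of pdf, the midpoint-rule integrator and the finite upper limit standing in for infinity are all assumptions made here.

```python
import math

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

def pdf(x):
    """Exponential pdf f(x) = exp(-x) for x >= 0, chosen only as an illustration."""
    return math.exp(-x) if x >= 0 else 0.0

total = integrate(pdf, 0.0, 50.0)    # upper limit 50 stands in for infinity
p_0_to_1 = integrate(pdf, 0.0, 1.0)  # P(0 <= x <= 1), the area under the curve
p_point = integrate(pdf, 1.0, 1.0)   # zero-width interval: probability of exactly x = 1

print(f"total = {total:.6f}, P(0<=x<=1) = {p_0_to_1:.6f}, P(x=1) = {p_point}")
```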
47 P416 Lecture 1 R.Kass/Sp07
• Examples of some common P(x)’s and f(x)’s:
Discrete = P(x)     Continuous = f(x)
binomial            uniform, i.e. constant
Poisson             Gaussian
                    exponential
                    chi square
• How do we describe a probability distribution?
- mean, mode, median, and variance
- for a discrete distribution, the mean and variance are defined by:
$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
FIGURES: Chi-square distribution; Student t distribution
• Calculation of mean and variance:
example: a discrete data set consisting of three numbers: {1, 2, 3}
average ($\mu$) is just:
$\mu = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{1 + 2 + 3}{3} = 2$
Complication: suppose some measurements are more precise than others.
Let each measurement $x_i$ have a weight $w_i$ associated with it; then:
$\mu = \sum_{i=1}^{n} x_i w_i \Big/ \sum_{i=1}^{n} w_i$    “weighted average”
variance ($\sigma^2$), or average squared deviation from the mean, is just:
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
$\sigma$ is called the standard deviation. The variance describes the width of the pdf!
rewrite the above expression by expanding the summations:
$\sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \mu^2 - 2\mu\sum_{i=1}^{n} x_i\right]$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \mu^2 - 2\mu^2$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2$
This is sometimes written as $\langle x^2 \rangle - \langle x \rangle^2$, with $\langle\ \rangle$ denoting the average of whatever is in the brackets.
Note: The n in the denominator would be n -1 if we determined the average ($\mu$) from the data itself.
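The formulas above can be checked on the three-number data set {1, 2, 3}. This is a minimal sketch: the function names are illustrative, and the weights in the weighted-average line are hypothetical values chosen only to show the call. It uses the 1/n form of the variance, as in the slides.

```python
def mean(xs):
    """Plain average: sum of the values divided by their count."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Weighted average: sum of x_i * w_i divided by the sum of the weights."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

def variance(xs):
    """Average squared deviation from the mean (1/n form)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def variance_shortcut(xs):
    """Same quantity written as <x^2> - <x>^2."""
    return mean([x * x for x in xs]) - mean(xs) ** 2

data = [1, 2, 3]
weights = [1, 1, 2]  # hypothetical weights, not from the slides

print("mean =", mean(data))
print("weighted mean =", weighted_mean(data, weights))
print("variance =", variance(data))
print("variance via <x^2> - <x>^2 =", variance_shortcut(data))
```

Both variance forms give 2/3 for this data set (up to floating-point rounding), which is the point of the expansion.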
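Example: a continuous probability distribution,
$f(x) = c\,\sin^2 x$ for $0 \le x \le 2\pi$, $c$ = constant
This “pdf” has two modes!
It has the same mean and median, but they differ from the mode(s).
Note: $\int_0^{2\pi} \sin^2 x\,dx = \pi$
mean: $\mu = \int_0^{2\pi} x \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \pi$
mode: $\frac{\partial}{\partial x}\sin^2 x = 0 \Rightarrow x = \frac{\pi}{2}, \frac{3\pi}{2}$
median: $\int_0^{a} \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \frac{1}{2} \Rightarrow a = \pi$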
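The mean, median and modes of this pdf can be confirmed numerically. The sketch below evaluates sin²x on a fine grid of midpoints over [0, 2π]; the grid size and the simple search loops are choices made here for illustration. The constant c cancels in every ratio, so it is left out.

```python
import math

STEPS = 200_000
WIDTH = 2 * math.pi / STEPS
xs = [(i + 0.5) * WIDTH for i in range(STEPS)]  # midpoints of the grid cells
ws = [math.sin(x) ** 2 for x in xs]             # unnormalized pdf values
norm = sum(ws)

# Mean: pdf-weighted average of x.
mean = sum(x * w for x, w in zip(xs, ws)) / norm

# Median: first grid point where the cumulative weight reaches half the total.
running = 0.0
median = None
for x, w in zip(xs, ws):
    running += w
    if running >= norm / 2:
        median = x
        break

# Modes: grid points that are local maxima of the pdf.
modes = [xs[i] for i in range(1, STEPS - 1) if ws[i - 1] < ws[i] >= ws[i + 1]]

print(f"mean = {mean:.4f}, median = {median:.4f}")
print("modes =", [round(m, 4) for m in modes])
```

The mean and median both come out close to π, while the two modes sit near π/2 and 3π/2.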
Steady increase in precision of the neutron lifetime but are any of these measurements
accurate?
The non-overlapping blue area is pizza for lunch, no pizza for dinner.
The non-overlapping red area is pizza for dinner, no pizza for lunch.