Dr J Frost (jfrost@tiffin.kingston.sch.uk)
www.drfrostmaths.com
Note: Some of these slides will likely be unsuitable for
those who have not done S1/S2. For those that haven’t,
I recommend just covering Part 1 and Part 3d
(Geometric).
Topic 6: Probability
Part 1 – Manipulating Probabilities
But some basic probability/statistics will broaden your maths ‘general knowledge’. You’ll know for example what scientists at CERN mean when they refer in the news to the “5σ test” needed to verify that a new particle has been discovered!
ζ
Topic 6 – Probability
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A | B): The probability that A occurred given B occurred.
To appreciate conditional probabilities, consider a probability tree:
[Probability tree diagram: e.g. a branch with probability 3/7 leading to Green]
Conditional probabilities
Using the tree, we can construct the following identity for conditional probabilities:
P(B | A) = P(A ∩ B) / P(A)
or P(A ∩ B) = P(B | A) P(A)
Conditional probabilities
Using the tree we can also find P(B), P(B′ | A) and P(A′ ∩ B).
Bayes’ Rule
Bayes’ Rule relates causes and effects. It allows us to find the probability of the cause given the effect, if we know the probability of the effect given the cause.
P(C | E) = P(E | C) P(C) / P(E)
Dr House is trying to find the cause of a disease. He suspects Lupus (as he always does) due to the patient’s kidney failure. The probability that someone has this symptom if they did have Lupus is 0.2. The probability that a random patient has kidney damage is 0.001, and the probability they have Lupus is 0.0001. What is the probability they have Lupus given their observed symptom?
Answer: P(Lupus | symptom) = (0.2 × 0.0001) / 0.001 = 0.02
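The calculation above can be sketched in a few lines (the numbers are those from the Dr House example):

```python
# Bayes' Rule: P(C|E) = P(E|C) * P(C) / P(E)
p_symptom_given_lupus = 0.2   # P(E|C), likelihood
p_lupus = 0.0001              # P(C), prior
p_symptom = 0.001             # P(E), probability of the effect

p_lupus_given_symptom = p_symptom_given_lupus * p_lupus / p_symptom
print(round(p_lupus_given_symptom, 6))  # 0.02
```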
Bayes’ Rule
But we don’t always need to know the probability of the effect.
P(C | E) = P(E | C) P(C) / P(E)
Notice that in the distribution P(C | E), the effect E is fixed, and the distribution is over the different causes C; the denominator P(E) does not depend on C. This suggests we can write:
P(C | E) = k P(E | C) P(C)
So k = 1/P(E). But the probabilities over causes must sum to 1: Σ_C P(C | E) = 1. So k Σ_C P(E | C) P(C) = 1, so k = 1 / Σ_C P(E | C) P(C). We can therefore normalise at the end, without ever computing P(E) directly.
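A minimal sketch of this normalisation trick, using made-up likelihoods and priors for three hypothetical causes:

```python
# Hypothetical likelihoods P(E|C) and priors P(C) for three causes.
likelihood = {"flu": 0.8, "cold": 0.5, "lupus": 0.2}
prior      = {"flu": 0.05, "cold": 0.10, "lupus": 0.0001}

# Unnormalised posterior: P(C|E) is proportional to P(E|C) * P(C)
unnorm = {c: likelihood[c] * prior[c] for c in likelihood}

# k = 1 / sum of P(E|C)P(C), so the posterior sums to 1 over causes
k = 1 / sum(unnorm.values())
posterior = {c: k * v for c, v in unnorm.items()}
print(posterior)
```

Note we never needed P(E) itself; dividing by the sum plays the role of 1/P(E).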
[Bar chart: probability P(X) against outcomes X = 1, 2, 3, 4, 5, 6]
Discrete vs Continuous Distributions
Discrete distributions are ones where the outcomes are discrete, e.g. throw of a die,
number of Heads seen in 10 throws, etc.
In contrast continuous distributions allow us to model things like height, weight, etc.
[Left: bar chart of P(X = k) (discrete). Right: density curve p(h) (continuous). Vertical axes marked 0.1 to 0.4]
Discrete
[Bar chart of P(X = k)]
Probabilities add up to 1, i.e. Σ_k P(X = k) = 1.
Because our probability function is ultimately just a plain old function (provided it meets the above properties), we usually see the function written as “p(k)” rather than P(X = k).
Continuous Distributions
p(1.4 ≤ h < 1.6) = ∫_{1.4}^{1.6} p(h) dh
i.e. We find the area under the graph. Note that the area under the entire graph will be 1: ∫ p(h) dh = 1.
Does it make sense to talk about the probability of someone being exactly 2m? Clearly not, but we could for example find the probability of a height being in a particular range.
Probability is proportional to distance. Maximum distance is 1m.
So p(x) = kx for 0 ≤ x ≤ 1, and since ∫_0^1 kx dx = k/2 = 1, we have k = 2 and thus p(x) = 2x. (Note that the density exceeds 1 near x = 1!)
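A quick numerical check of this density, in pure Python with a midpoint-rule integral:

```python
# Density p(x) = 2x on [0, 1]: check it integrates to 1,
# and find P(0.5 <= X <= 1) as an example range probability.
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

p = lambda x: 2 * x
total = integrate(p, 0, 1)         # should be ~1.0
upper_half = integrate(p, 0.5, 1)  # should be ~0.75
print(total, upper_half)
```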
Archery scores: 57, 94, 25, 42
X is “times target hit out of 3 shots”.
x:        0     1    2     3
P(X = x): 0.25  0.5  0.05  0.2
E[X] = 0(0.25) + 1(0.5) + 2(0.05) + 3(0.2) = 1.2
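The expected-value sum above can be sketched directly:

```python
# Expected value of a discrete random variable: E[X] = sum of x * P(X = x)
dist = {0: 0.25, 1: 0.5, 2: 0.05, 3: 0.2}

expectation = sum(x * p for x, p in dist.items())
print(round(expectation, 6))  # 1.2
```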
Expected Value
Question: Two people randomly think of a real number between 0 and 100. What is the expected difference between their numbers? (i.e. the average range)
(Source: Frosty Special)
(Hint: Make your random variable the difference between the two numbers)
The difference D = |X − Y| has density p(d) = 2(100 − d)/100² for 0 ≤ d ≤ 100, so E[D] = ∫_0^100 d p(d) dd. Integrating we get (1/5000)(50d² − d³/3). If the limits are 0 and 100, we get 100/3, so the expected difference is 33⅓.
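A Monte Carlo sketch confirms the 100/3 answer without any calculus:

```python
import random

# Estimate the expected |difference| of two uniform random
# numbers on [0, 100]; the theoretical answer is 100/3.
random.seed(42)
n = 200_000
total = sum(abs(random.uniform(0, 100) - random.uniform(0, 100))
            for _ in range(n))
estimate = total / n
print(estimate)  # close to 33.33
```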
Modifying Random Variables
We often modify the value of random variables.
What does it mean? We add 1 to all the outcomes of the die (i.e. we now have 2 to 7). The probabilities remain unaffected, so E[X + 1] = E[X] + 1.
In general: E[aX + b] = aE[X] + b.
Since:
k:        1    2
P(X = k): 0.6  0.4
E[X] = 1(0.6) + 2(0.4) = 1.4. So, for example, E[X + 1] = 1.4 + 1 = 2.4.
STEP Question
Fire extinguishers may become faulty at any time after manufacture and are tested
annually on the anniversary of manufacture. The time T years after manufacture until
a fire extinguisher becomes faulty is modelled by the continuous probability density
function:
A faulty fire extinguisher will fail an annual test with probability p, in which case it is
destroyed immediately. A non-faulty fire extinguisher will always pass the test. All of
the annual tests are independent.
a) Show that the probability that a randomly chosen fire extinguisher will be destroyed exactly three years after its manufacture is (p/10)(5p² − 13p + 9).
b) Find the probability that a randomly chosen fire extinguisher that was destroyed exactly three years after its manufacture was faulty 18 months after its manufacture.
(We’ll do part (b) a bit later)
f(t) = 2t/(1 + t²)²  for t ≥ 0
f(t) = 0  otherwise
Since we have a PDF, it makes sense to integrate it so we can find the probability of the extinguisher becoming faulty within some range of times. The CDF is F(t) = ∫_0^t f(u) du = t²/(1 + t²).
The probability the extinguisher becomes faulty sometime in the first year is F(1) = 1/2, during the second year F(2) − F(1) = 3/10, and during the third year F(3) − F(2) = 1/10.
Let’s consider the three cases:
a) If it becomes faulty during the first year, it must survive the first two tests, before failing the third. This gives a probability of (1/2)(1 − p)²p.
b) If it becomes faulty during the second year, it must survive the second test and fail on the third, giving (3/10)(1 − p)p (note that on the first test, the probability of it surviving given it’s not faulty is 1).
c) If it becomes faulty during the third year, it simply needs to fail the third test. We get (1/10)p.
Adding these: p[(1/2)(1 − p)² + (3/10)(1 − p) + 1/10] = (p/10)(5p² − 13p + 9).
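A quick numerical sanity check of part (a), assuming the density f(t) = 2t/(1 + t²)² and hence the CDF F(t) = t²/(1 + t²):

```python
# Check that the sum of the three cases matches (p/10)(5p^2 - 13p + 9)
# for the fire-extinguisher model with CDF F(t) = t^2 / (1 + t^2).
def F(t):
    return t * t / (1 + t * t)

for p in (0.1, 0.5, 0.9):
    year1, year2, year3 = F(1), F(2) - F(1), F(3) - F(2)
    cases = (year1 * (1 - p) ** 2 * p   # faulty in year 1
             + year2 * (1 - p) * p      # faulty in year 2
             + year3 * p)               # faulty in year 3
    closed_form = (p / 10) * (5 * p * p - 13 * p + 9)
    assert abs(cases - closed_form) < 1e-12
print("all cases match")
```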
Summary
• Random variables have a number of possible outcomes, each with an associated
probability.
• Random variables can be discrete or continuous.
• Discrete random variables have an associated probability mass function. We require that 0 ≤ P(X = k) ≤ 1 and Σ_k P(X = k) = 1 across the domain of the function (i.e. possible outcomes).
• Continuous random variables have an associated probability density function. Unlike ‘conventional’ probabilities, these can have a value greater than 1. We require that ∫ p(x) dx = 1, i.e. the total area under the graph is 1. We can find a probability mass (i.e. the ‘conventional’ kind of probability) by finding the area under the graph in a particular range, using definite integration.
• While we have a ‘mean’ for a sample, we have an ‘expected value’ for a random variable, written E[X]. It can be calculated using Σ x P(X = x) for a discrete random variable, and ∫ x p(x) dx for a continuous random variable. The expected value for a fair die for example is 3.5.
• The variance Var(X) gives a measure of spread. Specifically it’s the average squared distance from the mean. We can calculate it using Var(X) = E[X²] − E[X]², which can be remembered using the mnemonic “mean of the square minus the square of the mean”, or “msmsm”.
• E[aX + b] = aE[X] + b, i.e. scaling our outcomes/adding a constant has the same effect on the mean.
e.g. The possible time before a volcano next erupts.
e.g. The possible probability distributions for the throw of a die, given I threw a die 60 times and saw 10 ones, 10 twos, 10 threes, 10 fours, 10 fives and 10 sixes.
Bernoulli Distribution
The Bernoulli Distribution is perhaps the simplest distribution. It models an experiment with just two outcomes, often referred to as ‘success’ and ‘failure’.
Question: If I throw a biased coin (with probability of heads p) 8 times, what is the probability I see 3 heads?
One such sequence is H H H T T T T T, which has probability p³(1 − p)⁵. There are (8 choose 3) = 56 such orderings, giving 56p³(1 − p)⁵.
We can write B(n,p) to represent the Binomial Distribution, where n and p are
the parameters for the number of trials and probability of a single success.
Suppose X is the random variable representing the number of left-handed people. Then X ~ B(n, p).
(This example points out one of the assumptions of the Binomial Distribution: that each trial is independent. But this was unlikely to be the case, since most on the table were related, and left-handedness is in part hereditary. Sometimes when we model a scenario using an ‘off-the-shelf’ distribution, we have to compromise by making simplifying assumptions.)
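The Binomial probability function can be sketched directly from the counting argument above:

```python
from math import comb

# Binomial pmf: P(X = k) for X ~ B(n, p)
def binomial_pmf(k, n, p):
    # comb(n, k) orderings, each with probability p^k (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# e.g. probability of exactly 3 heads in 8 throws of a fair coin
print(binomial_pmf(3, 8, 0.5))  # 0.21875
```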
Summary of Distributions so far
A multinomial distribution is a generalisation of the Binomial Distribution, which gives us the probability of counts when we have multiple outcomes. Similarly, a multivariate distribution represents a single trial with any number of outcomes.
Bernoulli (e.g. “What’s the probability of getting a Heads?”)
→ generalise to n trials → Binomial (e.g. “What’s the probability of getting 3 Heads and 2 Tails?”)
Bernoulli → generalise to k outcomes → Multivariate (e.g. “What’s the probability of getting a 5?”)
Multivariate → generalise to n trials → Multinomial (e.g. “What’s the probability of rolling 3 sixes, 2 fours and a 1?”)
(Use your combinatorics knowledge to try and work out the probability function for this!)
Poisson Distribution
Cars pass you on a road at an average rate of 5 cars a minute. What’s the probability that 3 cars will pass you in the next minute?
p(X = 3) = (5³/3!) e⁻⁵ = 0.140
What’s the probability that exactly 10 cars pass?
p(X = 10) = (5¹⁰/10!) e⁻⁵ = 0.018
What’s the probability that at least one car passes?
1 − p(X = 0) = 1 − (5⁰/0!) e⁻⁵ = 1 − e⁻⁵ = 0.993
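The Poisson probabilities above can be checked with a few lines:

```python
from math import exp, factorial

# Poisson pmf: P(X = k) events in an interval with average rate lam
def poisson_pmf(k, lam):
    return lam ** k / factorial(k) * exp(-lam)

print(round(poisson_pmf(3, 5), 3))      # 0.14
print(round(poisson_pmf(10, 5), 3))     # 0.018
print(round(1 - poisson_pmf(0, 5), 3))  # 0.993
```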
Q3) What’s the probability that it next erupts between 2 and 3 years after the current date?
i.e. It erupts 0 times in the first year, 0 times in the second year, and at least once in the third year. If the average rate is λ eruptions per year, this gives e⁻λ · e⁻λ · (1 − e⁻λ) = e⁻²λ(1 − e⁻λ).
Relationship to the Binomial Distribution
Imagine that we segment this fixed period into a number of smaller chunks of time, in
each of which an event can occur (which we’ll describe as a ‘success’), or not occur.
1 minute
If we presumed that we only had at most one car passing in each of these smaller
periods of time, then we could use a Binomial Distribution to model the total number
of cars that pass across 1 minute, because it models the number of successes.
Of course, multiple cars could actually pass within each smaller segment of time.
How would we fix this?
Relationship to the Binomial Distribution
We could simply use smaller chunks of time – in the limit, we have tiny slivers of time, so
instantaneous that we couldn’t possibly have two cars passing at exactly the same time.
1 minute
Now if we’d divided up our time into n chunks where n is large, and we expect an average of λ cars to pass, what then is the probability of a car passing in one chunk of time? (Only Year 8 probability needed!)
p = λ/n
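We can watch the Binomial B(n, λ/n) approach the Poisson numerically as the chunks shrink, a sketch:

```python
from math import comb, exp, factorial

lam, k = 5, 3

# Poisson probability of k events at rate lam
poisson = lam ** k / factorial(k) * exp(-lam)

# Binomial B(n, lam/n) for increasingly many tiny time slices
for n in (10, 100, 10_000):
    p = lam / n
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(n, binom)  # approaches the Poisson value as n grows
print(poisson)
```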
Q1) What’s the probability that you die on the first shot? 1/6
Q2) What’s the probability that you die on the second shot? (5/6)(1/6) = 5/36
Geometric Distribution
If you have a number of trials, where in each trial you can have a ‘success’ or ‘failure’, and you repeat the trial until you have a success (at which point you stop), then the geometric distribution gives you the probability of succeeding on the 1st trial, the 2nd trial, and so on.
Note that if X ~ Geo(p), then P(X = k) = (1 − p)^(k−1) p and E[X] = 1/p.
For example, if we tossed a fair die until we saw a 1, we’d expect to have to throw the die 6 times on average before we see a 1 (where the count includes the last throw).
Side Note: The distribution is called ‘geometric’ because if we were to list out the probabilities for X = 1, X = 2, X = 3, and so on, we’d have a geometric series!
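A simulation sketch of the die example, checking that the average number of throws until a 1 is about 1/p = 6:

```python
import random

# Roll a fair die until a 1 appears; the number of rolls
# is Geometric(p = 1/6), so the mean should be about 6.
random.seed(1)

def rolls_until_one():
    count = 0
    while True:
        count += 1
        if random.randint(1, 6) == 1:
            return count

n = 100_000
mean = sum(rolls_until_one() for _ in range(n)) / n
print(mean)  # close to 6
```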
Geometric Distribution
Tom and Geri have a competition. Initially, each player has one attempt at hitting a target. If
one player hits the target and the other does not then the successful player wins. If both
players hit the target, or if both players miss the target, then each has another attempt,
with the same rules applying. If the probability of Tom hitting the target is always and the
probability of Geri hitting the target is always , what is the probability that Tom wins the
competition?
(Source: SMC)
The probability that they both hit or both miss in a round is r.
So Tom can win by either winning immediately, or initially drawing once before winning, or drawing twice and then winning, and so on. This gives us an infinite geometric series with first term a = P(Tom wins a round outright) and common ratio r. Using S∞ = a/(1 − r), we get the answer.
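The structure of the argument can be sketched with illustrative (made-up) hit probabilities, say 1/2 for Tom and 1/3 for Geri:

```python
import random

t, g = 1 / 2, 1 / 3            # hypothetical hit probabilities
a = t * (1 - g)                # Tom hits, Geri misses: Tom wins outright
r = t * g + (1 - t) * (1 - g)  # both hit or both miss: another round

analytic = a / (1 - r)         # sum of the infinite geometric series

# Verify by simulating many competitions
random.seed(7)
wins, trials = 0, 100_000
for _ in range(trials):
    while True:
        tom, geri = random.random() < t, random.random() < g
        if tom and not geri:
            wins += 1
            break
        if geri and not tom:
            break
print(analytic, wins / trials)
```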
Frost Real-Life Example
My mum (who works at John Lewis) was selling London Olympics ‘trading cards’, of which there were about 200 different cards to collect; they could be bought in packs. Her manager was curious how many cards you would have to buy on average before you collected them all. The problem was passed on to me!
(Note: Assume for simplicity that each card is equally likely to be acquired – unlike say ‘Pokemon cards’ [a childhood fad I never got into], where lower numbered cards are rarer)
Hint: Perhaps think of the trials needed to collect the next card as a geometric process? Then consider these processes all combined together.
Answer: 200 × H(200) ≈ 1176 cards
Explanation on next slide…
Frost Real-Life Example
Answer: 200 × H(200) ≈ 1176 cards
To get the first card, we just need to buy 1 card.
To get the second card, each time we buy a card, we have a 199/200 chance of buying a new card (if not, we keep buying until we have a new one). Since the number of cards we need to buy to get this next card is geometrically distributed, the expected number of cards is 200/199.
Combining the expected numbers of cards we need to buy for each new card, we get 200/200 + 200/199 + ⋯ + 200/1 = 200(1 + 1/2 + ⋯ + 1/200). The bracketed expression is known as a ‘Harmonic Series’, which can be represented as H(200). Typing “200 * H(200)” into www.wolframalpha.com got me the answer above.
This problem is more generally known as the “Coupon Collector’s Problem”: http://en.wikipedia.org/wiki/Coupon_collector%27s_problem
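The WolframAlpha step can be reproduced exactly with a short sketch:

```python
from fractions import Fraction

# Coupon collector: expected cards to collect all 200 designs
# is 200 * H(200), where H(200) = 1 + 1/2 + ... + 1/200.
n = 200
expected = float(n * sum(Fraction(1, k) for k in range(1, n + 1)))
print(expected)  # about 1175.6, i.e. roughly 1176 cards
```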
Coin Conundrums
How would you model a fair coin given you have just a fair die?
Solution: Easy! Roll the die. If you get say an even number, declare ‘Heads’, else declare ‘Tails’.
How would you model a fair die given you have just a fair coin?
Solution: A bit harder! Throw the coin 3 times, giving us 8 possible outcomes. Label the first 6 of these outcomes 1 to 6 (e.g. HHH, HHT, …). If we get either of the last two outcomes, reject and repeat.
An interesting side question is how many times on average we’d expect to have to throw the coin. If the probability of being able to stop is 6/8 = 3/4, then since the expected value of a geometric distribution is 1/p, we expect 4/3 rounds. Since we throw the coin 3 times each round, we expect an average of 4 throws.
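The rejection scheme can be sketched and checked by simulation:

```python
import random

# Model a fair die using only a fair coin: read 3 flips as a
# 3-bit number 0-7; accept 0-5 (mapped to faces 1-6), else retry.
random.seed(3)

def die_from_coin():
    while True:
        value = 0
        for _ in range(3):
            value = 2 * value + random.randint(0, 1)  # one coin flip
        if value < 6:
            return value + 1

counts = [0] * 7
for _ in range(60_000):
    counts[die_from_coin()] += 1
print(counts[1:])  # each face appears roughly 10,000 times
```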
[Normal curve of IQ (“Intelligence Quotient”) x, with marks at 70, 85, 100, 115, 130, 145]
If μ = 100 and σ = 15, how many Standard Deviations above the mean is 130?
Answer: z = (130 − 100)/15 = 2
Z-values
[z-table] We look up the units and tenths digit of our z-value down the side of the table (and the hundredths digit along the top).
Z-values
It’s useful to remember that 68% of values are within 1 s.d. of the mean,
95% within two, and 99.7% within 3 (when the variable is ‘normally
distributed’)
When scientists referred to a “5σ test” needed to officially ‘discover’ the Higgs Boson, they mean that were the observed data to occur ‘by chance’ in the situation where the Higgs Boson didn’t exist (known as the null hypothesis), then the probability is less than that of being 5 standard deviations away from the mean of a normally distributed variable: about a 1 in 3.5 million chance.
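The “1 in 3.5 million” figure is just the one-sided normal tail beyond 5σ, which we can sketch with the standard library:

```python
from math import erfc, sqrt

# One-sided tail probability of a normal variable being more
# than 5 standard deviations above its mean.
p = 0.5 * erfc(5 / sqrt(2))
print(p, 1 / p)  # about 2.87e-07, i.e. roughly 1 in 3.5 million
```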