Statistics for Physicists, 1: Basic Concepts
F. James (CERN)
May 2011, Stockholm

Introduction

[Diagram: detector → data]
Probability

All statistical methods are based on calculations of probability.
We will define three different kinds of probability:
  - Mathematical
  - Frequentist
  - Bayesian
Mathematical Probability

The Theory of Probability was properly formalized as a branch of
mathematics only in 1930 by Kolmogorov.

Let the set {Xi} be exclusive events
(one and only one Xi can occur).
Then P(Xi) is a probability if it satisfies the Kolmogorov axioms:
  - P(Xi) ≥ 0 for all i
  - P(Xi or Xj) = P(Xi) + P(Xj) for exclusive events
  - Σi P(Xi) = 1
Mathematical Probability 2

Some basic properties that follow directly from the axioms:
  - 0 ≤ P(Xi) ≤ 1
  - P(not A) = 1 − P(A)
However, surprisingly:
  - P(A) = 1 does not mean that A is certain, only "almost certain".

Example: Let R be a real number drawn randomly between zero and one.
Then P(R ≠ 1/2) = 1, but it is only almost certain, since R could be
1/2, with probability zero!

But mathematical probability is an abstract concept. It cannot be
measured. We need a probability with an operational definition. There are
two such definitions of probability: Frequentist and Bayesian.
Frequentist Probability

First defined by John Venn in 1866.

The frequentist probability of an event A is defined as the number of times
A occurs, divided by the total number of trials N, in the limit of a large
number of identical trials:

    P(A) = lim_{N→∞} N(A)/N
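This limiting definition is easy to watch numerically. The sketch below (an illustration, not part of the original slides) simulates identical trials of a fair coin and shows the frequency N(A)/N approaching the true probability 1/2 as N grows:

```python
import random

random.seed(42)  # reproducible trials

def frequentist_estimate(n_trials):
    """Estimate P(heads) as N(A)/N for a simulated fair coin."""
    heads = sum(1 for _ in range(n_trials) if random.random() < 0.5)
    return heads / n_trials

# The estimate approaches the true probability 0.5 as N grows.
for n in (100, 10_000, 1_000_000):
    print(n, frequentist_estimate(n))
```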
Frequentist Probability 2

Objection to the definition of Frequentist probability:
It requires an infinite number of experiments.

Objection overruled: Many scientific concepts are defined as limits.
For example the electric field:

    E⃗ = lim_{q→0} F⃗/q

where F⃗ is the force due to the field on the charge q.
Since the charge disturbs the field, it has to be infinitesimally small. In this
case, the limit is not even possible because charge is quantized, but still
the definition is perfectly valid.
Frequentist Probability 3

However, Frequentist Probability has an important limitation:
It can only be applied to repeatable phenomena.

Objection: There are no repeatable phenomena, since
no experiment can be repeated exactly;
for example the age of the universe will be greater the second time.

Objection overruled: We will not apply it to phenomena that depend
critically on the age of the universe.

Conclusion: Most scientific work involves repeatable phenomena, and
frequentist probability is well defined for this work.
But we need in addition a more general kind of probability if we want to
apply it to non-repeatable phenomena.
Bayesian Probability

For phenomena that are not repeatable, we need a more general definition
(for example, we may want to know the probability that it will rain
tomorrow).

Bayesian Probability is defined as the degree of belief that A will happen.
It depends not only on the phenomenon itself, but also on the state of
knowledge and beliefs of the observer, and it will in general change with
time as the observer gains more knowledge.

We cannot verify if the Bayesian probability is correct by counting
frequencies.

The operational definition of Bayesian Probability is based on
the coherent bet of de Finetti (around 1930).
There are problems, such as whether the value of money is linear.
Properties of Probability
Bayes' Theorem

Rev. Thomas Bayes; published posthumously in 1763.

Bayes' Theorem says that the probability of both A and B being true
simultaneously can be written:

    P(A and B) = P(A|B) P(B) = P(B|A) P(A)

which implies:

    P(B|A) = P(A|B) P(B) / P(A)
           = P(A|B) P(B) / [ P(A|B) P(B) + P(A|not B) P(not B) ]
Example: a diagnostic test. Let K be the condition being tested for and
T+ a positive test result. Then by Bayes' Theorem:

    P(K|T+) = P(T+|K) P(K) / [ P(T+|K) P(K) + P(T+|not K) P(not K) ]
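A quick numerical sketch of this formula (the prevalence, sensitivity, and false-positive rate below are hypothetical numbers chosen for illustration, not taken from the slides):

```python
def posterior(p_k, p_pos_given_k, p_pos_given_not_k):
    """P(K|T+) from Bayes' theorem, expanding P(T+) over K and not-K."""
    numerator = p_pos_given_k * p_k
    evidence = numerator + p_pos_given_not_k * (1.0 - p_k)
    return numerator / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
print(posterior(0.01, 0.95, 0.05))  # a positive test gives only about 16% probability of K
```

The small prior P(K) dominates: even a fairly accurate test leaves the posterior probability low when the condition is rare.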
Other Basics

[Content of these slides was lost in extraction.]
Probability Distributions

For a probability density f(X), the expectation and normalization are:

    E(X) = ∫ X f(X) dX ,    ∫ f(X) dX = 1

and for a discrete distribution:  Σi f(Xi) = 1 .
The correlation coefficient of X and Y is defined as:

    ρ = cov(X, Y) / (σX σY) .
For the ratio of two random variables, the usual error-propagation rule
of thumb gives:

    V(X/Y) ≈ (μX/μY)² [ σX²/μX² − 2 ρ σX σY /(μX μY) + σY²/μY² ]

The above rule of thumb is well known, but may be very wrong.
In particular, if X and Y have Gaussian distributions,
V(X/Y) is infinite.
See Cousins and James, Am. J. Phys. 74, 159 (Feb 2006).
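A small simulation (an illustration, not from the slides) shows why the rule of thumb fails: when the Gaussian denominator can come close to zero, the sample variance of X/Y becomes erratic and huge compared with the propagated estimate.

```python
import random

random.seed(1)

def sample_var_ratio(n, mu_y, mu_x=10.0, sigma=1.0):
    """Sample variance of X/Y for independent Gaussians X ~ N(mu_x, sigma), Y ~ N(mu_y, sigma)."""
    r = [random.gauss(mu_x, sigma) / random.gauss(mu_y, sigma) for _ in range(n)]
    m = sum(r) / n
    return sum((x - m) ** 2 for x in r) / (n - 1)

def rule_of_thumb(mu_y, mu_x=10.0, sigma=1.0):
    """Error-propagation estimate of V(X/Y) with rho = 0 (independent X, Y)."""
    return (mu_x / mu_y) ** 2 * (sigma**2 / mu_x**2 + sigma**2 / mu_y**2)

# Denominator far from zero: the rule of thumb works well.
print(rule_of_thumb(10.0), sample_var_ratio(100_000, 10.0))
# Denominator close to zero: the sample variance far exceeds the rule of thumb.
print(rule_of_thumb(1.0), sample_var_ratio(100_000, 1.0))
```

The exact variance is infinite in both cases; the first case merely hides it because values of Y near zero are astronomically improbable there.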
Discrete Distributions

The distribution of the number of events in a single bin of a histogram
is binomial (if the bin contents are independent,
but the total number of events N is fixed).
The Binomial Distribution

Variable:   r = 0, 1, . . . , N.
Parameters: N, positive integer;  p, 0 ≤ p ≤ 1.

Probability function:
    P(r) = (N choose r) p^r (1 − p)^(N−r) ,   r = 0, 1, . . . , N.

Expected value:  E(r) = Np .
Variance:        V(r) = Np(1 − p) .
Probability generating function:  G(Z) = [pZ + (1 − p)]^N .
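These formulas can be checked directly from the probability function; a short sketch (illustration only) using Python's `math.comb`:

```python
from math import comb

def binom_pmf(r, n, p):
    """P(r) = C(n, r) p^r (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 10, 0.3
probs = [binom_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * pr for r, pr in enumerate(probs))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))

print(sum(probs))  # normalization: 1.0
print(mean, var)   # Np = 3.0 and Np(1-p) = 2.1
```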
Discrete
ri , i = 1, 2, . . . , k, positive integers N.
N, positive integer.
k, positive integer.
Pk
p1 0, p2 0, . . . , pk 0,
i=1 pi = 1
Probability function
P(r1 , r2 , . . . , rk ) =
Expected values: E (ri ) = Npi .
Variances: V (ri ) = Npi (1 pi ) .
Covariances: cov(ri , rj ) = Npi pj ,
Probability generating function
N!
p r1 p r2 pkrk .
r1 !r2 ! rk ! 1 2
i 6= j .
G (Z2 , . . . , Zk ) = (p1 + p2 Z2 + + pk Zk )N .
F. James (CERN)
26 / 42
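The negative covariance (a large count in one cell forces smaller counts elsewhere, since the total N is fixed) can be verified by simulation. This is a sketch with a simple hand-rolled sampler; the function names and the chosen N and cell probabilities are illustrative, not from the slides:

```python
import random

random.seed(7)

def sample_multinomial(n, probs):
    """Draw category counts for n independent trials with cell probabilities probs."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point rounding in the cumulative sum
    return counts

N, probs, trials = 30, [0.2, 0.5, 0.3], 20_000
samples = [sample_multinomial(N, probs) for _ in range(trials)]
mean0 = sum(s[0] for s in samples) / trials
mean1 = sum(s[1] for s in samples) / trials
cov01 = sum(s[0] * s[1] for s in samples) / trials - mean0 * mean1
print(mean0)   # close to N p1 = 6
print(cov01)   # close to -N p1 p2 = -3
```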
The Poisson Distribution

Variable:  r, non-negative integer.
Parameter: μ, positive real number.

Probability function:
    P(r) = μ^r e^(−μ) / r!

Expected value:  E(r) = μ .
Variance:        V(r) = μ .
Skewness:        γ1 = 1/√μ .
Kurtosis:        γ2 = 1/μ .
Probability generating function:  G(Z) = e^(μ(Z−1)) .
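A quick numerical check of the Poisson moments (an illustration; the infinite sum is truncated far into the tail where the terms are negligible):

```python
from math import exp, factorial

def poisson_pmf(r, mu):
    """P(r) = mu^r e^(-mu) / r!"""
    return mu**r * exp(-mu) / factorial(r)

mu = 4.0
probs = [poisson_pmf(r, mu) for r in range(100)]
mean = sum(r * p for r, p in enumerate(probs))
var = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
print(sum(probs), mean, var)  # normalization 1, E(r) = mu, V(r) = mu
```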
For a Gaussian variable X with mean μ and standard deviation σ:

    P( −1.64 ≤ (X − μ)/σ ≤ 1.64 ) = 0.90
    P( −1.96 ≤ (X − μ)/σ ≤ 1.96 ) = 0.95
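These two standard coverage probabilities can be reproduced with the error function from the standard library (a sketch, not part of the slides):

```python
from math import erf, sqrt

def std_normal_central_prob(z):
    """P(-z <= Z <= z) for a standard normal Z, via the error function."""
    return erf(z / sqrt(2))

print(round(std_normal_central_prob(1.64), 3))  # about 0.90
print(round(std_normal_central_prob(1.96), 3))  # about 0.95
```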
Continuous Distributions

The Normal (Gaussian) Distribution

Parameters: μ, real;  σ, positive real number.

Probability density function:
    f(X) = N(μ, σ) = [ 1/(σ√(2π)) ] exp( −(X − μ)² / (2σ²) )

Cumulative distribution:
    F(X) = Φ( (X − μ)/σ ) ,
    where  Φ(Z) = [ 1/√(2π) ] ∫_{−∞}^{Z} e^(−x²/2) dx .

Expected value:  E(X) = μ .
Variance:        V(X) = σ² .
Characteristic function:  φ(t) = exp( iμt − σ²t²/2 ) .
The Multinormal Distribution

Parameters: a vector of expected values μ and a covariance matrix V.

Covariances:
    cov(X) = V ,   V(Xi) = Vii ,   cov(Xi, Xj) = Vij , the (i,j)th element of V.

Characteristic function:  φ(t) = exp( i μᵀt − (1/2) tᵀ V t ) .

(F. James, CERN. Statistics for Physicists, 1: Basic Concepts. May 2011, Stockholm.)
The Chi-square (χ²) Distribution

Parameter: N, the number of degrees of freedom.

Probability density:
    f(X) = X^(N/2 − 1) e^(−X/2) / ( 2^(N/2) Γ(N/2) )

Expected value:  E(X) = N .
Variance:        V(X) = 2N .
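A χ² variable with N degrees of freedom can be constructed as the sum of N squared standard-normal variables (a standard construction, used here only to illustrate E(X) = N and V(X) = 2N by simulation; not part of the surviving slide text):

```python
import random

random.seed(3)

def chi2_sample(n_dof):
    """One draw: the sum of n_dof squared standard-normal variables."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))

n_dof, trials = 4, 50_000
draws = [chi2_sample(n_dof) for _ in range(trials)]
mean = sum(draws) / trials
var = sum((x - mean) ** 2 for x in draws) / (trials - 1)
print(mean, var)  # close to N = 4 and 2N = 8
```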
Continuous
or
or
F. James (CERN)
34 / 42
[Figure: χ² probability density curves for N = 1, 2, 3, 4, 5, 6,
plotted as functions of the variable X from 0 to 10.]
The Cauchy (Breit-Wigner) Distribution

Probability density:
    f(X) = (1/π) · 1/(1 + X²)

or, with location and scale parameters:
    f(X) = (1/π) · Γ / ( Γ² + (X − X0)² )

The parameters X0 and Γ represent location and scale parameters
respectively, being the mode and half-width at half-height respectively.
However it should be noted that the mean and moments of the
distribution are still undefined.
Continuous
1
,
ba
f (X ) = 0 ,
Expected value
Variance
Skewness
Kurtosis
Characteristic function
otherwise
a+b
.
2
(b a)2
.
V (X ) =
12
1 = 0 .
2 = 1.2 .
E (X ) =
(t) =
F. James (CERN)
aX b
e itb e ita
.
it(b a)
37 / 42
The Landau Density

[Figure: the Landau probability density plotted as a function of X from −5 to 25.]
Convergence

We say that the sequence tn , (n = 1, . . . ) converges to T if:

The Usual Convergence:
You give me an ε > 0,
and I'll give you an N
such that, for all n > N,
|tn − T| < ε .
Convergence in Probability:

For sequences of random variables there are several kinds of convergence:
  - In quadratic mean
  - In probability
  - In distribution
ordered here from stronger to weaker convergence.
The Law of Large Numbers

    lim_{N→∞} X̄ = lim_{N→∞} (1/N) Σ_{i=1}^N Xi

If the μi are all the same, one can show that the above converges in
probability. Depending on the behaviour of the μi for large i, one can
prove stronger laws of large numbers which differ from this weak law by
having stronger types of convergence.
These stronger laws will not be of interest to us.
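The weak law is easy to see numerically. A minimal sketch (illustration only; the common mean μ = 2 is an arbitrary choice) showing the sample mean settling onto the common expectation as N grows:

```python
import random

random.seed(5)

def running_mean(n, mu=2.0, sigma=1.0):
    """Sample mean of n i.i.d. Gaussian variables with common mean mu."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# The sample mean approaches the common expectation mu = 2 as N grows.
for n in (10, 1_000, 100_000):
    print(n, running_mean(n))
```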
The Central Limit Theorem

    [ Σ_{i=1}^N (Xi − μi) ] / √( Σ_{i=1}^N σi² )   →   N(0, 1) .
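The theorem can be illustrated by standardizing sums of uniform variables (a sketch; the choice of 12 terms per sum and 20 000 trials is arbitrary) and checking that the ±1.96 interval of the standard normal covers about 95% of the standardized sums:

```python
import random

random.seed(9)

def standardized_sum(n):
    """Sum of n uniform(0,1) variables, centered and scaled per the CLT."""
    mu, var = 0.5, 1.0 / 12.0            # mean and variance of uniform(0,1)
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (n * var) ** 0.5

# The fraction of standardized sums inside +-1.96 approaches 0.95.
trials = 20_000
inside = sum(abs(standardized_sum(12)) <= 1.96 for _ in range(trials)) / trials
print(inside)
```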