
Introduction to Statistical

Methods for Data Analysis

Dr Lorenzo Moneta
CERN PH-SFT
CH-1211 Geneva 23

sftweb.cern.ch
root.cern.ch

Outline

Probability definition
Probability Density Functions
Some typical distributions
Bayes Theorem
Parameter Estimation
Hypothesis Testing

Lorenzo Moneta
CERN PH-SFT

Data Analysis Tutorial at UERJ 2015: Introduction to Statistics

References
Much of the material for this introduction to statistical methods is extracted from the course:
Statistical Methods for Data Analysis
(Luca Lista, INFN Napoli)

The material is also available in his book:

Statistical Methods for Data Analysis in Particle Physics
(Springer)
http://www.springer.com/us/book/9783319201757

Another suggested book is:

Data Analysis in High Energy Physics (Wiley)

Definition Of Probability
Two main definitions:
Frequentist
Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
Can only be applied to specific classes of events (repeatable experiments)
Meaningless to state: probability that the lightest SUSY particle's mass is less than 1 TeV

Bayesian
Probability measures someone's degree of belief that something is or will be true: would you bet on it?
e.g. the probability that Barcelona will win the next Champions League

Classical Probability
Assumes all accessible cases are equally probable
Valid for discrete cases only
Problematic in continuous cases (requires a choice of metric)


Binomial Distribution
Distribution of the number of successes in N trials
e.g. tossing a coin or rolling a die N times

Each trial has a probability p of success

Average: ⟨n⟩ = Np
Variance: ⟨n²⟩ − ⟨n⟩² = Np(1 − p)
Used for efficiency estimates
In ROOT is available as
ROOT::Math::binomial_pdf(n,p,N)


Frequentist Probability
Law of large numbers:
P(A) = lim N→∞ n(A)/N

This also implies a circular definition of probability:
a phenomenon can be proven to be random only if we observe infinitely many cases

Conditional Probability
Probability of A, given B : P(A|B)
probability that an event known to belong to set B is also
member of set A
P(A | B) = P(A ∩ B) / P(B)
A is independent of B if
the conditional probability
of A given B is equal to the
probability of A:
P(A | B) = P(A)

Hence, if A is independent of B:
P(A ∩ B) = P(A) P(B)

If A is independent of B, then B is independent of A

Prob. Density Functions (PDF)


Gaussian (Normal) Distribution

Average = μ
Variance = σ²
Widely used because of the central limit theorem
TMath::Gaus(x, μ, σ, true)
ROOT::Math::normal_pdf(x, μ, σ)
TF1 f("f", "gausn", xmin, xmax);
x = gRandom->Gaus(μ, σ);

[Figure: Gaussian PDF(x) versus x in [−5, 5] for (μ=0, σ=0.3), (μ=0, σ=1), (μ=0, σ=3), (μ=−2, σ=1)]

N.B. gausn for a normalised (PDF) Gaussian



Central limit theorem


The sum of n random variables xᵢ converges to a Gaussian, irrespective of the original distributions of the variables xᵢ (only some basic regularity conditions must hold):
x₁ + x₂ + … + xₙ → Gaussian
Example: adding n flat distributions
[Figure: distributions of ⟨x⟩ for n = 2 and n = 5, x uniform in [0,10], each fitted with a Gaussian.
n = 2: χ²/ndf = 422.9/97, Constant = 190.8 ± 2.3, Mean = 4.989 ± 0.022, Sigma = 2.031 ± 0.015.
n = 5: χ²/ndf = 87.47/83, Constant = 306.4 ± 3.7, Mean = 5.011 ± 0.013, Sigma = 1.293 ± 0.009.]

Uniform (flat) distribution

Uniform in [a, b]: mean ⟨x⟩ = (a + b)/2, standard deviation σ = (b − a)/√12

Model for the position of rain drops, the time of a cosmic ray passage, etc.
Basic distribution for pseudo-random number generation
ROOT::Math::uniform_pdf( x, a, b)
x = gRandom->Uniform(a, b);


Cumulative Distribution
Given a PDF f(x), the cumulative distribution is defined as
F(x) = ∫₋∞ˣ f(x′) dx′

F(x) is uniformly distributed in [0, 1]

Inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution

Example of Cumulative Distributions


Probability density function:
ROOT::Math::normal_pdf(x, μ, σ)

Cumulative distribution and its complement (right-tail integral):
ROOT::Math::normal_cdf(x, μ, σ)
ROOT::Math::normal_cdf_c(x, μ, σ)

Inverse of the cumulative distributions (quantile functions):
ROOT::Math::normal_quantile(p, σ)
ROOT::Math::normal_quantile_c(p, σ)

[Figure: normal_pdf, normal_cdf / normal_cdf_c, and normal_quantile / normal_quantile_c plotted for a standard normal distribution]

Poisson Distribution
Probability to observe n entries in an interval x, a small subset of a much larger interval X (X ≫ x):
P(n | ν) = νⁿ e^(−ν) / n!

Limit of the binomial distribution when
p = x/X = ν/N ≪ 1:
P(n | ν, N) for N → ∞ is a Poisson(n | ν)

Limit of the Poisson for large ν is a Gaussian

ROOT::Math::poisson_pdf(n, ν)

Poisson limit for large ν

The Poisson becomes a Gaussian for large ν


Crystal Ball Function


Add an asymmetric power-law tail to a Gaussian PDF, with proper normalisation and continuity of the PDF and its first derivative

ROOT::Math::crystalball_pdf(x, α, n, σ, μ)
TF1 f("f", "crystalballn", xmin, xmax)

Landau Distribution
Models the fluctuations in the energy loss of particles in thin layers

ROOT::Math::landau_pdf(x, s, m)
TF1 f("f", "landaun", xmin, xmax)

Bayes Theorem

P(A | B) = P(B | A) P(A) / P(B)

A concrete example
A person received a diagnosis of a serious illness
The probability to detect an ill person positively is ~100%
The probability to give a positive result on a healthy person is 0.2%

What is the probability that the person is really ill?


Is 99.8% a reasonable answer ?


Result using Bayes theorem


We know:
P(+ | ill) ~ 100%    P(− | ill) ≪ 1
P(+ | healthy) = 0.2%    P(− | healthy) = 99.8%

Using Bayes theorem we want to know


P(ill | +) = P( + | ill) P(ill)/P(+) ~ P(ill)/P(+)

We need to know
P(ill) = probability that a random person is ill << 1
P(healthy) = 1-P(ill)

We also have
P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy)
~ P(ill) + P(+ | healthy)

Result: P(ill | +) ~ P(ill) / (P(ill) + P(+| healthy) )



Result from Bayes theorem (2)


Result:
P(ill | +) ~ P(ill) / (P(ill) + P(+| healthy) )
Using some numbers
P(ill) = 0.1 %
P(+ | healthy) = 0.2%
Then we have:
P(ill | +) = 0.1 / (0.1 + 0.2) ≈ 33%


Likelihood Function
Likelihood function:
given some observed events x₁, …, xₙ,
the likelihood function is the PDF of the variables x₁, …, xₙ, viewed as a function of the parameters:
L(x₁, …, xₙ | θ₁, …, θₘ)

Bayes theorem can be written as

P(θ | x₁, …, xₙ) = L(x₁, …, xₙ | θ) π(θ) / ∫ L(x₁, …, xₙ | θ′) π(θ′) dθ′

posterior = (likelihood function × prior probability) / normalisation term

Repeated use of Bayes theorem


Bayesian Inference
The posterior summarises all information on the unknown parameters, given the data

From the posterior one can estimate the best parameter values and probability intervals (credible intervals)
The result depends on the prior distribution


How to compute the Posterior PDF


Perform analytical integration
feasible in very few simple cases

Use numerical integration
may be CPU intensive
difficult for large multi-dimensional cases

Markov Chain Monte Carlo
sample the parameter space efficiently using a random walk heading to the regions of higher probability
Metropolis algorithm to sample according to a PDF f(x)


Markov-Chain Monte Carlo

Available in ROOT in the RooStats package



Problem with Bayesian approach


Bayesian probability is subjective
depends on prior probabilities or degrees of belief about
the unknown parameters

Problem of how to represent lack of knowledge

e.g. a uniform distribution is not invariant under coordinate transformations
uniform in the log is scale-invariant
Jeffreys prior: a prior invariant under parameter transformations

Recommended: study the sensitivity of the result to the chosen prior PDF


Frequentist vs Bayesian Inference


Parameter Estimation

Parameter estimate
Likelihood function
Maximum Likelihood method
Property of estimators


Statistical Inference


Parameter estimators


Likelihood Function


Maximum Likelihood Estimates


Gaussian approximation


Estimator properties

Consistency
Bias
Efficiency
Robustness


Estimator consistency


Bias


Efficiency


Robustness


Parameter uncertainties with ML


Error Determination


Hypothesis Testing
Definition of hypothesis testing
Neyman-Pearson lemma and
Likelihood ratio


Hypothesis Tests


Hypothesis Test
H0 : null hypothesis
the hypothesis we want to prove false
e.g. the data contain only background (no Higgs signal)

H1 : alternative hypothesis
e.g. the data contain signal (Higgs) plus background

α : significance level, the probability to reject H0 if it is true (error of the first kind)
α = 1 − selection efficiency

β : the probability to accept H0 if H1 is true (error of the second kind)
power (probability to reject H0 if H1 is true) = 1 − β
β = misidentification probability

Example: Cut analysis


Likelihood Ratio


Neyman-Pearson Lemma


Summary
In the next lectures we will look at how to:
use multivariate (machine learning) methods for classification and more
estimate the parameter uncertainties (errors) in maximum likelihood fits
estimate confidence intervals
use hypothesis tests to estimate the discovery significance of new particles

We will complement this with examples in TMVA, RooFit and RooStats

