
Introduction to Statistical

Methods for Data Analysis

Dr Lorenzo Moneta
CERN PH-SFT
CH-1211 Geneva 23

sftweb.cern.ch
root.cern.ch

Outline

Probability definition
Probability Density Functions
Some typical distributions
Bayes Theorem
Parameter Estimation
Hypothesis Testing

Lorenzo Moneta
CERN PH-SFT

Data Analysis Tutorial at UERJ 2015: Introduction to Statistics

References
Much of the material for this introduction to statistical methods is extracted from the course:
Statistical Methods for Data Analysis
(Luca Lista, INFN Napoli)

The material is also available in his book:

Statistical Methods for Data Analysis in Particle Physics
(Springer)
http://www.springer.com/us/book/9783319201757

Another suggested book is:

Data Analysis in High Energy Physics (Wiley)

Definition Of Probability
Two main definitions:
Frequentist
Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
Can only be applied to specific classes of events (repeatable experiments)
Meaningless to state: probability that the lightest SUSY particle's mass is less than 1 TeV

Bayesian
Probability measures someone's degree of belief that something is or will be true: would you bet on it?
e.g. the probability that Barcelona will win the next Champions League

Classical Probability
Assumes all accessible cases are equally probable
Valid for discrete cases only
Problematic in continuous cases (requires a choice of metric)


Binomial Distribution
Distribution of the number of successes in N trials
e.g. tossing a coin or rolling a die N times

Each trial has a probability p of success

Average: ⟨n⟩ = Np
Variance: ⟨n²⟩ − ⟨n⟩² = Np(1 − p)
Used for efficiency estimates
In ROOT is available as
ROOT::Math::binomial_pdf(n,p,N)


Frequentist Probability
Law of large numbers:
P(A) = lim N→∞ n(A)/N

This also implies a circular definition of probability:
a phenomenon can be proven to be random only if we observe infinitely many cases

Conditional Probability
Probability of A, given B : P(A|B)
probability that an event known to belong to set B is also
member of set A
P(A | B) = P(A ∩ B) / P(B)
A is independent of B if
the conditional probability
of A given B is equal to the
probability of A:
P(A | B) = P(A)

Hence, if A is independent of B:
P(A ∩ B) = P(A) P(B)

If A is independent of B, then B is independent of A

Prob. Density Functions (PDF)


Gaussian (Normal) Distribution

Average = μ
Variance = σ²
Widely used because of the central limit theorem
TMath::Gaus(x, μ, σ, true)
ROOT::Math::normal_pdf(x, μ, σ)
TF1 f("f", "gausn", xmin, xmax);
x = gRandom->Gaus(μ, σ);

[Figure: Gaussian PDF(x) versus x in [−5, 5] for (μ=0, σ=0.3), (μ=0, σ=1), (μ=0, σ=3), (μ=−2, σ=1)]

N.B. gausn for a normalised (PDF) Gaussian



Central limit theorem


The sum of n random variables xᵢ converges to a Gaussian, irrespective of the original distributions of the variables xᵢ (only some basic regularity conditions must hold):
x₁ + x₂ + … + xₙ → Gaussian
Example: adding n flat distributions
[Figure: distributions of ⟨x⟩ for n = 2 and n = 5, x uniform in [0,10], each fitted with a Gaussian.
n = 2: χ²/ndf = 422.9/97, Constant = 190.8 ± 2.3, Mean = 4.989 ± 0.022, Sigma = 2.031 ± 0.015.
n = 5: χ²/ndf = 87.47/83, Constant = 306.4 ± 3.7, Mean = 5.011 ± 0.013, Sigma = 1.293 ± 0.009.]

Uniform (flat) distribution

Uniform in [a, b]: mean ⟨x⟩ = (a + b)/2, standard deviation σ = (b − a)/√12

Model for the position of rain drops, the time of a cosmic ray passage, etc.
Basic distribution for pseudo-random number generation
ROOT::Math::uniform_pdf( x, a, b)
x = gRandom->Uniform(a, b);


Cumulative Distribution
Given a PDF f(x), the cumulative distribution is defined as
F(x) = ∫₋∞ˣ f(x′) dx′

F(x) is uniformly distributed in [0, 1]

Inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution

Example of Cumulative Distributions


Probability density function:
ROOT::Math::normal_pdf(x, μ, σ)

Cumulative distribution and its complement (right-tail integral):
ROOT::Math::normal_cdf(x, μ, σ)
ROOT::Math::normal_cdf_c(x, μ, σ)

Inverse of the cumulative distributions (quantile functions):
ROOT::Math::normal_quantile(p, σ)
ROOT::Math::normal_quantile_c(p, σ)

[Figure: normal_pdf, normal_cdf / normal_cdf_c, and normal_quantile / normal_quantile_c plotted for a standard normal distribution]

Poisson Distribution
Probability to observe n entries in an interval x, a small subset of a much larger interval X (X ≫ x):
P(n | ν) = νⁿ e^(−ν) / n!

Limit of the binomial distribution when
p = x/X = ν/N ≪ 1:
P(n | ν, N) for N → ∞ is a Poisson(n | ν)

Limit of the Poisson for large ν is a Gaussian

ROOT::Math::poisson_pdf(n, ν)

Poisson limit for large ν

The Poisson becomes a Gaussian for large ν


Crystal Ball Function


Add an asymmetric power-law tail to a Gaussian PDF, with proper normalisation and continuity of the PDF and its first derivative

ROOT::Math::crystalball_pdf(x, α, n, σ, μ)
TF1 f("f", "crystalballn", xmin, xmax)

Landau Distribution
Models the fluctuations in the energy loss of particles in thin layers

ROOT::Math::landau_pdf(x, s, m)
TF1 f("f", "landaun", xmin, xmax)

Bayes Theorem

P(A | B) = P(B | A) P(A) / P(B)

A concrete example
A person received a diagnosis of a serious illness
The probability to detect an ill person positively is ~100%
The probability to give a positive result on a healthy person is 0.2%

What is the probability that the person is really ill?


Is 99.8% a reasonable answer ?


Result using Bayes theorem


We know:
P(+ | ill) ~ 100%    P(− | ill) ≪ 1
P(+ | healthy) = 0.2%    P(− | healthy) = 99.8%

Using Bayes theorem we want to know


P(ill | +) = P( + | ill) P(ill)/P(+) ~ P(ill)/P(+)

We need to know
P(ill) = probability that a random person is ill << 1
P(healthy) = 1-P(ill)

We also have
P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy)
~ P(ill) + P(+ | healthy)

Result: P(ill | +) ~ P(ill) / (P(ill) + P(+| healthy) )



Result from Bayes theorem (2)


Result:
P(ill | +) ~ P(ill) / (P(ill) + P(+| healthy) )
Using some numbers
P(ill) = 0.1 %
P(+ | healthy) = 0.2%
Then we have:
P(ill | +) = 0.1 / (0.1 + 0.2) ≈ 33%


Likelihood Function
Likelihood function:
given some observed events x₁, …, xₙ,
the likelihood function is the PDF of the variables x₁, …, xₙ, viewed as a function of the parameters:
L(x₁, …, xₙ | θ₁, …, θₘ)

Bayes theorem can be written as

P(θ | x₁, …, xₙ) = L(x₁, …, xₙ | θ) π(θ) / ∫ L(x₁, …, xₙ | θ′) π(θ′) dθ′

posterior = (likelihood function × prior probability) / normalisation term

Repeated use of Bayes theorem


Bayesian Inference
The posterior summarises all information on the unknown parameters, given the data

From the posterior one can estimate the best parameter values and probability intervals (credible intervals)
The result depends on the prior distribution


How to compute the Posterior PDF


Perform analytical integration
feasible in very few simple cases

Use numerical integration
may be CPU intensive
difficult for large multi-dimensional cases

Markov Chain Monte Carlo
sample the parameter space efficiently using a random walk heading to the regions of higher probability
Metropolis algorithm to sample according to a PDF f(x)


Markov-Chain Monte Carlo

Available in ROOT in the RooStats package



Problem with Bayesian approach


Bayesian probability is subjective
depends on prior probabilities or degrees of belief about
the unknown parameters

Problem of how to represent lack of knowledge

e.g. a uniform distribution is not invariant under coordinate transformations
uniform in the log is scale-invariant
Jeffreys prior: a prior invariant under parameter transformations

Recommended: study the sensitivity of the result to the chosen prior PDF


Frequentist vs Bayesian Inference


Parameter Estimation

Parameter estimate
Likelihood function
Maximum Likelihood method
Property of estimators


Statistical Inference


Parameter estimators


Likelihood Function


Maximum Likelihood Estimates


Gaussian approximation


Estimator properties

Consistency
Bias
Efficiency
Robustness


Estimator consistency


Bias


Efficiency


Robustness


Parameter uncertainties with ML


Error Determination


Hypothesis Testing
Definition of hypothesis testing
Neyman-Pearson lemma and
Likelihood ratio


Hypothesis Tests


Hypothesis Test
H0 : null hypothesis
the hypothesis we want to prove false
e.g. the data contain only background (no Higgs signal)

H1 : alternative hypothesis
e.g. the data contain signal (Higgs) plus background

α : significance level, the probability to reject H0 if it is true (error of the first kind)
α = 1 − selection efficiency

β : the probability to accept H0 if H1 is true (error of the second kind)
power (probability to reject H0 if H1 is true) = 1 − β
β = misidentification probability

Example: Cut analysis


Likelihood Ratio


Neyman-Pearson Lemma


Summary
In the next lectures we will look at how to:
use multivariate (machine learning) methods for classification and more
estimate the parameter uncertainties (errors) in maximum likelihood fits
estimate confidence intervals
use hypothesis tests to estimate the discovery significance of new particles

We will complement this with examples in TMVA, RooFit and RooStats

