Medical Imaging Informatics

MEDICAL IMAGING INFORMATICS:
Lecture # 1
Estimation Theory
Norbert Schuff
Professor of Radiology
VA Medical Center and UCSF
Norbert.schuff@ucsf.edu
Medical Imaging Informatics 2012 Nschuff

UCSF VA, Department of Radiology & Biomedical Imaging
Course # 170.03
Objectives Of Today’s Lecture
• Understand basic concepts of data modeling

– Deterministic
– Probabilistic
– Bayesian
• Learn the form of the most common estimators
– Least squares estimator
– Maximum likelihood estimator
– Maximum a-posteriori estimator
• Learn how estimators are used in image processing
What Is Medical Imaging Informatics?
• Signal Processing
– Digital Image Acquisition
– Image Processing and Enhancement
• Data Mining
– Computational anatomy
– Statistics
– Databases
– Data-mining
– Workflow and Process Modeling and Simulation
• Data Management
– Picture Archiving and Communication System (PACS)
– Imaging Informatics for the Enterprise
– Image-Enabled Electronic Medical Records
– Radiology Information Systems (RIS) and Hospital Information Systems (HIS)
– Quality Assurance
– Archive Integrity and Security
• Data Visualization
– Image Data Compression
– 3D, Visualization and Multi-media
– DICOM, HL7 and other Standards
• Teleradiology
– Imaging Vocabularies and Ontologies
– Transforming the Radiological Interpretation Process (TRIP)[2]
– Computer-Aided Detection and Diagnosis (CAD).
– Radiology Informatics Education
• Etc.
What Is The Focus Of This Course?
Learn how to maximize information using efficient
computational tools
[Diagram: two complementary workflows linking measurements, model, and image knowledge.
Generative statistic: collect data; the data drive the model.
Inference statistic: start from a model; collect data; compare the data with the model.]
Challenge: Maximizing Information Gain
1. Q: How to estimate quantities from a given set of

uncertain (noisy) measurements?
A: Apply estimation theory (1st lecture today)
2. Q: How to quantify information?

A: Apply information theory (2nd lecture next week)
Medical Imaging Informatics 2009, Nschuff

Motivation Example I: Tissue Classification
Gray/White Matter Segmentation
[Figure: hypothetical histogram of gray matter (GM) and white matter (WM) intensities; x-axis: Intensity]
GM/WM overlap 50:50.
Can we do better than flipping a coin?
Motivation Example II:
MR Spectroscopy
Colored spectra: from singular value decomposition (SVD)
Insets: from Fourier transformation (FT)
Motivation Example III:
Signal Decomposition
Diffusion Tensor Imaging (DTI)
• Sensitive to random motion of water
• Senses the neighborhood on a microscopic scale
Goal: Represent fiber bundles
[Figure: microscopic tissue sample and quantitative diffusion maps. Dr. Van Wedeen, MGH]
Basic Concepts of Modeling
θ: unknown world state of interest
φ: measurement
θ̂: estimator, a good guess of θ based on measurements
[Cartoon: World (θ), Measurement (φ), Estimator (θ̂)]
Cartoon adapted from: Rajesh P. N. Rao, Bruno A. Olshausen, Probabilistic Models of the Brain. MIT Press 2002.
Deterministic Model
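N = number of measurements
M = number of states (M = 1 is possible)
Usually N > M and ||noise||² > 0

$$
\begin{pmatrix} \phi_1 \\ \vdots \\ \phi_N \end{pmatrix}
=
\begin{pmatrix} h_{11} & \cdots & h_{1M} \\ \vdots & \ddots & \vdots \\ h_{N1} & \cdots & h_{NM} \end{pmatrix}
\begin{pmatrix} \hat{\theta}_1 \\ \vdots \\ \hat{\theta}_M \end{pmatrix}
+
\begin{pmatrix} \mathrm{noise}_1 \\ \vdots \\ \mathrm{noise}_N \end{pmatrix}
$$

$$
\boldsymbol{\phi}_N = \mathbf{H}_{N \times M}\,\hat{\boldsymbol{\theta}}_M + \mathbf{noise}_N
$$

The entries of θ̂ are the unknowns.
The model is deterministic because discrete values of θ are solutions.
Note:
1) We make no assumption about the distribution of θ.
2) Each value is as likely as any other value.
What is the best estimator under these circumstances?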
Least-Squares Estimator (LSE)
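The best we can do is to minimize the noise:

$$
\boldsymbol{\phi} = \mathbf{H}\boldsymbol{\theta} + \mathbf{noise}
$$

$$
\boldsymbol{\phi} - \mathbf{H}\hat{\boldsymbol{\theta}}_{LSE} \approx 0
$$

$$
\mathbf{H}^{T}\boldsymbol{\phi} - \left(\mathbf{H}^{T}\mathbf{H}\right)\hat{\boldsymbol{\theta}}_{LSE} = 0
$$

where $\mathbf{H}^{T}\mathbf{H}$ is the covariance matrix, so that

$$
\hat{\boldsymbol{\theta}}_{LSE} = \left(\mathbf{H}^{T}\mathbf{H}\right)^{-1}\mathbf{H}^{T}\boldsymbol{\phi}
$$

• LSE is a popular choice for model fitting
• Useful for obtaining a descriptive measure
But
• LSE makes no assumptions about the distributions of data or parameters
• Has no basis for statistics → “deterministic model”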
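The least-squares solution can be tried out numerically. The following is a minimal sketch, assuming a straight-line model and made-up measurement values (neither is from the lecture); it solves the normal equations directly and compares the result with NumPy's `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical example: fit a straight line phi = theta_0 + theta_1 * x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
phi = np.array([1.1, 2.9, 5.2, 7.1, 8.8])   # made-up noisy measurements

# Design matrix H (N x M): a column of ones and a column of x values.
H = np.column_stack([np.ones_like(x), x])

# Normal equations: (H^T H) theta = H^T phi.
theta_normal = np.linalg.solve(H.T @ H, H.T @ phi)

# Same estimate from NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(H, phi, rcond=None)

# Residual of the fit; it is orthogonal to the columns of H.
residual = phi - H @ theta_normal
print("theta (normal equations):", theta_normal)
print("theta (lstsq):           ", theta_lstsq)
print("sum of squared residuals:", residual @ residual)
```

Solving the normal equations mirrors the formula on the slide; for poorly conditioned H a routine such as `lstsq` is usually the safer choice.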
Prominent Examples of LSE
Mean value:

$$
\hat{\theta}_{mean} = \frac{1}{N}\sum_{j=1}^{N}\phi_j
$$

Variance:

$$
\hat{\theta}_{variance} = \frac{1}{N-1}\sum_{j=1}^{N}\left(\phi_j - \hat{\theta}_{mean}\right)^{2}
$$

[Figure: noisy data; x-axis: Measurements (x), y-axis: Intensities (Y)]
[Figure: signal; x-axis: Measurements, y-axis: Intensity], with fitted parameters:
Amplitude: θ̂₁
Frequency: θ̂₂
Phase: θ̂₃
Decay: θ̂₄
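That the mean value is a least-squares estimate can be checked with a small sketch. It assumes a constant-signal model, in which H is a single column of ones, and uses made-up intensities; both are illustrative assumptions rather than part of the lecture.

```python
import numpy as np

phi = np.array([98.0, 103.0, 101.0, 95.0, 104.0, 99.0])  # made-up intensities
N = phi.size

# Model: phi_j = theta + noise_j, so H is an N x 1 column of ones.
H = np.ones((N, 1))
theta_lse = np.linalg.solve(H.T @ H, H.T @ phi)[0]

# Closed-form mean and variance estimators.
theta_mean = phi.sum() / N
theta_variance = ((phi - theta_mean) ** 2).sum() / (N - 1)

print(theta_lse, theta_mean)                  # both equal the sample mean
print(theta_variance, np.var(phi, ddof=1))    # ddof=1 gives the 1/(N-1) form
```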
Likelihood Model
Likelihood of θ
We believe θ is governed by chance, but we don’t know the governing probability.
We perform measurements for all possible values of θ.
We obtain the likelihood function of θ given a series of measurements φ.
Note:
θ is random and unknown
φ has been measured
The likelihood of parameter θ is the probability of the observed data φ as a function of this parameter.
Likelihood Function
Let Φ be a random variable with a discrete probability distribution p depending on the parameter θ. The likelihood function of θ (given the outcome φ_j of Φ) is:

$$
L(\theta \mid \phi_j) = \Pr(\Phi = \phi_j ; \theta)
$$

Example: Coin toss
Let the probability that a coin lands heads up when tossed be pH.
The probability of getting two heads in two tosses (HH) is pH * pH.
Thus, if pH = 0.5, the probability of seeing two heads is 0.25.
Another way of saying this is that the likelihood that pH = 0.5, given the observation HH, is 0.25:

$$
L(p_H = 0.5 \mid HH) = \Pr(HH ; p_H = 0.5) = 0.25
$$

NOTE: this is not the same as saying that the probability that pH = 0.5 is 0.25, given the observation HH. The likelihood function is NOT a probability density function, which would integrate to 1.
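The closing note can be illustrated numerically. This is a minimal sketch (the function name `likelihood_hh` is introduced here for illustration only): it evaluates the likelihood of pH for the observation HH and integrates it over all pH between 0 and 1, which gives about 1/3 rather than 1.

```python
def likelihood_hh(p_h: float) -> float:
    """Likelihood of p_h given the observation HH (two heads in two tosses)."""
    return p_h * p_h

print(likelihood_hh(0.5))

# Integrate the likelihood over p_h in [0, 1] with a midpoint rule.
steps = 10_000
area = sum(likelihood_hh((i + 0.5) / steps) for i in range(steps)) / steps
print(area)   # close to 1/3, so the likelihood is not a normalized density
```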
Likelihood Model (cont’d)
New Goal:
Find an estimator which gives the most likely probability of θ underlying L(θ | φ).
Maximum Likelihood Estimator (MLE)
Goal: Find the estimator which gives the most likely value of θ underlying the likelihood function of θ.
[Figure: likelihood function with its maximum marked "Highest probability"]

$$
\hat{\theta}_{MLE} = \arg\max_{\theta}\,\Pr(\phi ; \theta)
$$

The MLE can be found by setting the derivative of the likelihood function with respect to θ to zero.
Example I: MLE Of Normal Distribution
[Figure: normal distribution plotted over the axes a1 and a2]
Normal distribution:

$$
\Pr(\phi_N \mid \mu, \sigma^2) \propto \exp\left(-\frac{1}{2\sigma^2}\sum_{j=1}^{N}\left(\phi_j - \mu\right)^2\right)
$$

Log of the normal distribution (normD):

$$
\ln \Pr(\phi_N \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2}\sum_{j=1}^{N}\left(\phi_j - \mu\right)^2 + \text{const}
$$

[Figure: log of the normal distribution plotted over a1]
MLE of the mean (1st derivative):

$$
\frac{d}{d\mu}\ln \Pr(\phi_N \mid \mu, \sigma^2) = \frac{1}{\sigma^2}\sum_{j=1}^{N}\left(\phi_j - \mu\right) = 0
$$

$$
\hat{\mu}_{MLE} = \frac{1}{N}\sum_{j=1}^{N}\phi_j
$$
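The result that the MLE of the mean is the sample mean can be checked without calculus. The sketch below assumes made-up measurements and a known noise level `sigma` (both illustrative); it evaluates the Gaussian log-likelihood on a grid of candidate means and picks the largest value.

```python
import numpy as np

phi = np.array([4.8, 5.3, 5.1, 4.6, 5.2])   # made-up measurements
sigma = 0.5                                  # assumed known noise level

def log_likelihood(mu: float) -> float:
    """Gaussian log-likelihood of mu, up to an additive constant."""
    return -np.sum((phi - mu) ** 2) / (2.0 * sigma ** 2)

# Evaluate on a grid of candidate means (step 0.001) and pick the maximum.
grid = np.linspace(4.0, 6.0, 2001)
values = np.array([log_likelihood(mu) for mu in grid])
mu_grid = grid[np.argmax(values)]

print(mu_grid, phi.mean())   # grid maximum and sample mean agree
```

The value chosen for `sigma` changes the height of the log-likelihood curve but not the location of its maximum.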
Example II: MLE Of Binomial Distribution
(Coin Tosses)
n = # of tosses
φ = # of heads in n tosses
θ = probability of obtaining a head in a toss (unknown)
Pr(φ; θ; n) = probability of φ heads from n tosses given θ
[Figure: bar plots of Pr(φ; θ = 0.7; n) and Pr(φ; θ = 0.3; n); x-axis: 1 to 10]
MLE Of Coin Toss (cont’d)
Goal:
Given Pr(φ; θ), estimate the most likely probability distribution that produced the data:

$$
\hat{\theta}_{MLE} = \arg\max_{\theta}\,\Pr(\phi ; \theta)
$$

Intuitively:

$$
\hat{\theta}_{MLE} = \frac{\#\,\text{heads}}{\#\,\text{tosses}}
$$

For a fair coin, θ̂_MLE = 0.5
Coin tosses follow a binomial distribution (y = number of heads in n tosses):

$$
\Pr(Y = y \mid n, \theta) = \frac{n!}{y!\,(n-y)!}\,\theta^{y}\,(1-\theta)^{n-y}
$$

Likelihood function of coin tosses:

$$
L(\theta \mid y, n) = \frac{n!}{y!\,(n-y)!}\,\theta^{y}\,(1-\theta)^{n-y}
$$

[Figure: likelihood as a function of θ; x-axis: 0.1 to 0.9, y-axis: Likelihood]
What is the likelihood of observing 7 heads given that we tossed a coin 10 times?
For a fair coin, θ = 0.5:

$$
L(\theta = 0.5 \mid n = 10, y = 7) = \frac{10!}{7!\,(10-7)!}\,0.5^{7}\,(1-0.5)^{10-7} \approx 0.12
$$

For an unfair coin, θ = 0.6:

$$
L(\theta = 0.6 \mid n = 10, y = 7) = \frac{10!}{7!\,(10-7)!}\,0.6^{7}\,(1-0.6)^{10-7} \approx 0.21
$$
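The two likelihood values for 7 heads in 10 tosses can be reproduced with a few lines. This is a minimal sketch; the function name `binomial_likelihood` and the extra candidate θ = 0.7 are illustrative choices, not part of the lecture.

```python
from math import comb

def binomial_likelihood(theta: float, y: int, n: int) -> float:
    """Likelihood of theta given y heads in n tosses."""
    return comb(n, y) * theta ** y * (1.0 - theta) ** (n - y)

# Compare a fair coin with two unfair candidates for y = 7, n = 10.
for theta in (0.5, 0.6, 0.7):
    print(theta, round(binomial_likelihood(theta, y=7, n=10), 4))
```

The first two printed values round to 0.12 and 0.21; the likelihood keeps rising toward θ = 0.7, which is the ratio of heads to tosses.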
Likelihood function of coin tosses:

$$
L(\theta \mid y) = \frac{n!}{y!\,(n-y)!}\,\theta^{y}\,(1-\theta)^{n-y}
$$

[Figure: likelihood as a function of θ; x-axis: 0.1 to 0.9, y-axis: Likelihood]
Log-likelihood function:

$$
\ln L(\theta \mid y) = \ln\frac{n!}{y!\,(n-y)!} + y\ln\theta + (n-y)\ln(1-\theta)
$$

[Figure: log-likelihood as a function of θ; x-axis: 0.1 to 0.9]
MLE Of Coin Toss
Evaluate the MLE equation (1st derivative):

$$
\ln L(\theta \mid y) = \ln\frac{n!}{y!\,(n-y)!} + y\ln\theta + (n-y)\ln(1-\theta)
$$

$$
\frac{d\ln L}{d\theta} = \frac{y}{\theta} - \frac{n-y}{1-\theta} = 0
\;\;\Rightarrow\;\;
\hat{\theta}_{MLE} = \frac{y}{n} = \frac{\#\,\text{heads}}{\#\,\text{tosses}}
$$

According to the MLE principle, the distribution Pr(y; θ̂_MLE; n) for a given n is the most likely distribution to have generated the observed data y.
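The closed-form result can be cross-checked by a brute-force search. The sketch below assumes the 7-heads-in-10-tosses example and a grid of candidate values with step 0.001 (an arbitrary resolution); it maximizes the log-likelihood numerically and compares the maximizer with y/n.

```python
import numpy as np
from math import comb, log

def log_likelihood(theta, y, n):
    """Binomial log-likelihood; theta may be a scalar or a NumPy array."""
    return log(comb(n, y)) + y * np.log(theta) + (n - y) * np.log(1.0 - theta)

n, y = 10, 7
theta_grid = np.linspace(0.001, 0.999, 999)   # step 0.001, avoids log(0)
theta_hat = theta_grid[np.argmax(log_likelihood(theta_grid, y, n))]

print(theta_hat, y / n)   # numerical maximizer and heads/tosses agree
```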
Relationship between MLE and LSE
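Assume:
• θ is independent of noise
• Measurements and noise have the same probability distribution

$$
\Pr\nolimits_{\theta}(\phi_N \mid \theta) = \Pr\nolimits_{noise}(\phi_N - \mathbf{H}\theta \mid \theta)
$$

For zero-mean Gaussian noise, the likelihood is maximized when the sum of squared residuals is minimized, so MLE and LSE coincide.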
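The step from the noise distribution to least squares can be written out. The derivation below is a sketch under an added assumption that the slide does not state explicitly: the noise is independent, zero-mean Gaussian with a common variance σ².

```latex
\begin{align}
\Pr(\phi_N \mid \theta)
  &\propto \exp\!\left(-\frac{1}{2\sigma^2}\,\lVert \phi - H\theta \rVert^2\right) \\
\ln \Pr(\phi_N \mid \theta)
  &= -\frac{1}{2\sigma^2}\,\lVert \phi - H\theta \rVert^2 + \text{const} \\
\hat{\theta}_{MLE}
  &= \arg\max_{\theta}\, \ln \Pr(\phi_N \mid \theta)
   = \arg\min_{\theta}\, \lVert \phi - H\theta \rVert^2
   = \hat{\theta}_{LSE}
\end{align}
```

For other noise distributions the maximum of the likelihood generally does not fall at the least-squares solution.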
Bayesian Model
Prior knowledge
Now the daemon comes into play, but we know the daemon’s preference for θ (prior knowledge).

$$
\text{prior} = \Pr(\theta)
$$

New Goal:
Find the estimator which gives the most likely probability distribution of θ given everything we know.
Bayesian Model
Posterior ∝ Likelihood × Prior

$$
\Pr(\theta \mid \phi) \propto \Pr(\phi ; \theta)\,\Pr(\theta)
$$
Thomas Bayes (1701-1761)
Gerolamo Cardano (1501-1576): first to formulate elementary rules of probability
Abraham de Moivre (1667-1754): first to formally derive the normal distribution curve
Maximum A-Posteriori (MAP) Estimator
Goal:
Find the most likely θ̂_MAP (maximum of the posterior density of θ) given φ.
Maximize the joint density:

$$
\hat{\theta}_{MAP} = \arg\max_{\theta}\,\Pr(\phi ; \theta)\,\Pr(\theta)
$$

The MAP estimate can be found by taking the partial derivative of the joint density with respect to θ and setting it to zero.
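The effect of a prior on the estimate can be seen in the coin-toss example. The following is a minimal sketch: the prior shape θ²(1−θ)², which mildly favors fair coins, is an assumption made here for illustration and is not taken from the lecture. The joint density is maximized on a grid.

```python
import numpy as np

n, y = 10, 7                                  # 7 heads in 10 tosses
theta = np.linspace(0.001, 0.999, 999)        # candidate values, step 0.001

# Binomial likelihood with the constant factor dropped.
likelihood = theta ** y * (1.0 - theta) ** (n - y)
# Assumed (hypothetical) prior favoring values near 0.5, unnormalized.
prior = theta ** 2 * (1.0 - theta) ** 2

theta_mle = theta[np.argmax(likelihood)]
theta_map = theta[np.argmax(likelihood * prior)]
print(theta_mle, theta_map)   # the prior pulls the estimate toward 0.5
```

With this prior the joint density is proportional to θ⁹(1−θ)⁵, whose maximum lies at 9/14, between the MLE of 0.7 and the prior's preferred value of 0.5.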
Example III: MAP Of Normal Distribution
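The MAP estimate of the mean is (for measurements with variance σ_φ² and a zero-mean Gaussian prior on μ with variance σ_μ²):

$$
\hat{\mu}_{MAP} = \frac{\sigma_{\mu}^{2}}{\sigma_{\phi}^{2} + N\sigma_{\mu}^{2}}\sum_{j=1}^{N}\phi_j
$$

MAP is a linear combination of the prior mean and the sample mean, weighted by their respective variances.
The case σ_μ² → ∞ represents a non-informative prior, leading to μ̂_MAP → μ̂_MLE.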
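How the prior variance controls the MAP estimate can be explored numerically. This sketch assumes the standard Gaussian-likelihood, Gaussian-prior result with a general prior mean `mu0`; the function name, the measurements, and the variances are illustrative. Setting `mu0 = 0` corresponds to a zero-mean prior.

```python
import numpy as np

def map_mean(phi, sigma_phi2, mu0, sigma_mu2):
    """MAP estimate of a Gaussian mean under a Gaussian prior N(mu0, sigma_mu2)."""
    n = phi.size
    return (sigma_mu2 * phi.sum() + sigma_phi2 * mu0) / (n * sigma_mu2 + sigma_phi2)

phi = np.array([4.8, 5.3, 5.1, 4.6, 5.2])    # made-up measurements, mean 5.0

# A narrow prior keeps the estimate near mu0; a wide prior recovers the MLE.
for sigma_mu2 in (0.01, 1.0, 1e6):
    estimate = map_mean(phi, sigma_phi2=0.25, mu0=0.0, sigma_mu2=sigma_mu2)
    print(sigma_mu2, estimate)
```

As the prior variance grows, the printed estimate approaches the sample mean, which is the MLE.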
Posterior Distribution and Decision Rules
[Figure: posterior distribution of θ given φ, with the MSE and MAP estimates marked]
Some Desirable Properties of Estimators I:
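Unbiased: the mean value of the error should be zero:

$$
E\left[\hat{\theta} - \theta\right] = 0
$$

Consistent: the error of the estimator should decrease asymptotically as the number of measurements increases (mean square error, MSE):

$$
MSE = E\left[\left(\hat{\theta} - \theta\right)^{2}\right] \to 0 \quad \text{for large } N
$$

What happens to the MSE when the estimator is biased? With the bias b = E[θ̂ − θ]:

$$
MSE = \underbrace{E\left[\left(\hat{\theta} - \theta - b\right)^{2}\right]}_{\text{variance}} + \underbrace{E\left[b^{2}\right]}_{\text{bias}^2}
$$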
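The split of the MSE into a variance term and a bias term can be checked by simulation. The sketch below uses a deliberately biased, hypothetical estimator (the sample mean shrunk by a factor 0.9) and made-up simulation settings; none of these values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
n_trials, n_samples = 5000, 20

# Hypothetical biased estimator: the sample mean shrunk by a factor 0.9.
samples = rng.normal(theta_true, 1.0, size=(n_trials, n_samples))
estimates = 0.9 * samples.mean(axis=1)

mse = np.mean((estimates - theta_true) ** 2)
bias = estimates.mean() - theta_true
variance = estimates.var()            # spread around the estimator's own mean

print(mse, variance + bias ** 2)      # the two numbers agree
```

The expected bias of this estimator is 0.9 × 2.0 − 2.0 = −0.2, so the bias term does not shrink as more trials are added.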
Some Desirable Properties of Estimators II:
Efficient: the covariance matrix of the error should decrease asymptotically to its minimal value for large N:

$$
C_{ik} = E\left[\left(\hat{\theta}_i - \theta_i\right)\left(\hat{\theta}_k - \theta_k\right)^{T}\right] \to \text{some very small value}
$$
Example:
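Properties Of Estimators Mean and Variance
Mean:

$$
E\left[\hat{\mu}\right] = E\left[\frac{1}{N}\sum_{j=1}^{N}\phi_j\right] = \frac{1}{N}\,N\mu = \mu
$$

The sample mean is an unbiased estimator of the true mean.
Variance (for independent measurements):

$$
E\left[\left(\hat{\mu} - \mu\right)^{2}\right] = E\left[\left(\frac{1}{N}\sum_{j=1}^{N}\phi_j - \mu\right)^{2}\right] = \frac{1}{N^{2}}\,N\sigma^{2} = \frac{\sigma^{2}}{N}
$$

The sample mean is a consistent estimator because its variance approaches zero for a large number of measurements.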
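Both properties of the sample mean can be observed in a small simulation. This is a minimal sketch with assumed values for the true mean, the noise level, and the number of repetitions; it draws many data sets of size N and compares the spread of their sample means with σ²/N.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 100.0, 10.0      # assumed true mean and noise level
n_trials = 4000              # repeated experiments per sample size

results = {}
for n in (5, 50, 500):
    # One sample mean per simulated experiment of n measurements.
    means = rng.normal(mu, sigma, size=(n_trials, n)).mean(axis=1)
    results[n] = (means.mean(), means.var())
    print(n, means.mean(), means.var(), sigma ** 2 / n)
```

The average of the sample means stays near the true mean for every N (unbiased), while their variance falls roughly as σ²/N (consistent).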
Summary
• LSE is a descriptive method to accurately fit data to a

model.
• MLE is a method to seek the probability distribution that
makes the observed data most likely.
• MAP is a method to seek the most probable parameter
value given prior information about the parameters and
the observed data.
• If the influence of prior information decreases, MAP

approaches MLE.

Some Priors in Imaging
• Smoothness of the brain

• Anatomical boundaries
• Intensity distributions
• Anatomical shapes
• Physical models
– Point spread function
– Bandwidth limits
• Etc.

Estimation Theory: Motivation Example I
Gray/White Matter Segmentation
[Figure: hypothetical histogram of gray matter (GM) and white matter (WM) intensities; x-axis: Intensity]
What works better than flipping a coin?
Design likelihood functions based on
– anatomy
– co-occurrence of signal intensities
– others
Determine prior distribution
– population-based atlas of regional intensities
– model-based distributions of intensities
– others
Data-Driven Brain MRI Segmentation On Edge
Confidence And Prior Tissue Information
J.R. Jiménez-Alaniz, et al., IEEE Transactions on Medical Imaging, Vol. 25, No. 1, January 2006
Estimation Theory: Motivation Example III
Diffusion Spectrum Imaging – Human Cingulum Bundle
Goal: Capture directions of fiber bundles
Improvements to identify tracts:
Design likelihood functions based on
– similarity measures of adjacent voxels, e.g. correlations
– fiber anatomy
– maximum bend
Determine prior distributions from
– anatomy
– fiber skeletons from a population
– others
Dr. Van Wedeen, MGH
Bayesian Framework For Global Tractography
J.R.S. Jbabdi, et al., NeuroImage 37 (2007) 116–129
MAP Estimation In Image Reconstruction
Human brain MRI. (a) The original LR data. (b) Zero-padding interpolation. (c) SR with box-PSF. (d) SR with Gaussian-PSF. (LR = low resolution, SR = super-resolution, PSF = point spread function)
From: A. Greenspan, The Computer Journal, Advance Access published February 19, 2008
Improved ASL Perfusion Results
zDFT = zero-filled DFT
By Dr. John Kornak, UCSF
Bayesian Automated Image Segmentation
Bruce Fischl, MGH

Population Atlases As Priors
Dr. Sarang Joshi, U Utah, Salt Lake City

Imaging Software Using MLE And MAP
Packages      Applications      Languages
VoxBo         fMRI              C/C++/IDL
MEDx          sMRI, fMRI        C/C++/Tcl/Tk
SPM           fMRI, sMRI        MATLAB/C
iBrain        -                 IDL
FSL           fMRI, sMRI, DTI   C/C++
fmristat      fMRI              MATLAB
BrainVoyager  sMRI              C/C++
BrainTools    -                 C/C++
AFNI          fMRI, DTI         C/C++
Freesurfer    sMRI              C/C++
NiPy          -                 Python
