
Notes on Jensen’s inequality

There are various concrete representations of Jensen’s inequality.

Jensen’s inequality in Andrew Ng’s Lecture Notes

Let f be a convex function, and let X be a random variable. Then:

E[f (X)] ≥ f (EX)

Moreover, if f is strictly convex, then E[f (X)] = f (EX) holds true if and only if
X = E[X] with probability 1 (i.e., if X is a constant).

Jensen’s inequality also holds for concave functions f, but with the direction of all the
inequalities reversed (E[f (X)] ≤ f (EX), etc.).

For an interpretation of the theorem, consider the figure below.


Here, f is a convex function shown by the solid line. Also, X is a random variable that
has a 0.5 chance of taking the value a, and a 0.5 chance of taking the value b (indicated
on the x-axis). Thus, the expected value of X is given by the midpoint between a and b.
We also see the values f (a), f (b) and f (E[X]) indicated on the y-axis. Moreover, the
value E[f (X)] is now the midpoint on the y-axis between f (a) and f (b). From our
example, we see that because f is convex, it must be the case that E[f (X)] ≥ f (EX).
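This two-point picture can be checked numerically. Below is a minimal sketch, assuming the convex function f(x) = x² and the sample points a and b, which are arbitrary choices for illustration (not from the original notes):

```python
# Two-point illustration of Jensen's inequality: E[f(X)] >= f(E[X]).
# f(x) = x**2 is convex; a and b are arbitrary sample points.

def f(x):
    return x ** 2  # a convex function

a, b = 1.0, 5.0
E_X = 0.5 * a + 0.5 * b          # midpoint between a and b on the x-axis
E_fX = 0.5 * f(a) + 0.5 * f(b)   # midpoint between f(a) and f(b) on the y-axis

print(E_fX, f(E_X))  # 13.0 9.0 -> E[f(X)] >= f(E[X])
assert E_fX >= f(E_X)
```

Geometrically, E[f(X)] sits on the chord joining (a, f(a)) and (b, f(b)), which lies above the curve at E[X] precisely because f is convex.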

Jensen’s inequality in David McAllester’s Lecture Notes

Consider a probability distribution P on a set M and a function X assigning real values
X(m) for m ∈ M. If f is convex, then for any distribution P on M we have the
following:

Em∼P [f (X(m))] ≥ f (Em∼P [X(m)])

Jensen’s inequality in Richard Yida Xu’s Lecture Notes

If Φ is a convex function and 0 < t < 1, then

Φ((1 − t) × x1 + t × x2 ) ≤ (1 − t) × Φ(x1 ) + t × Φ(x2 )


With ∑_{i=1}^{n} pi = 1, we can generalize the above inequality:

Φ(p1 x1 + p2 x2 + ... + pn xn) ≤ p1 Φ(x1) + p2 Φ(x2) + ... + pn Φ(xn)


Φ(∑_{i=1}^{n} pi xi) ≤ ∑_{i=1}^{n} pi Φ(xi)
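The finite form is easy to verify numerically. Below is a small sketch, assuming an arbitrary convex Φ(x) = x² and arbitrary weights pi summing to 1 (both chosen for illustration):

```python
# Finite Jensen's inequality: Phi(sum p_i x_i) <= sum p_i Phi(x_i).

def phi(x):
    return x * x  # a convex function

# Arbitrary probability weights p_i (summing to 1) and points x_i.
p = [0.2, 0.3, 0.5]
x = [1.0, 4.0, -2.0]

lhs = phi(sum(pi * xi for pi, xi in zip(p, x)))   # Phi of the weighted mean
rhs = sum(pi * phi(xi) for pi, xi in zip(p, x))   # weighted mean of Phi values

assert lhs <= rhs
print(lhs, rhs)
```

Any choice of non-negative weights summing to 1 preserves the inequality; the gap closes only when all xi coincide (for strictly convex Φ).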

If both the values xi and f (xi) are in the domain of Φ, we can replace xi with f (xi) and
still get

Φ(∑_{i=1}^{n} pi f(xi)) ≤ ∑_{i=1}^{n} pi Φ(f(xi))

For the continuous case with ∫_{x∈S} p(x) dx = 1, if both the values x and f (x) are in the
domain of Φ, we get

Φ(∫_{x∈S} f(x) p(x) dx) ≤ ∫_{x∈S} Φ(f(x)) p(x) dx

Written in expectation notation, the above inequality is

Φ(E[f (x)]) ≤ E[Φ(f (x))]

Jensen’s inequality from Wikipedia

Form involving a probability density function

Suppose Ω is a measurable subset of the real line and f (x) is a non-negative function
such that

∫_{−∞}^{∞} f (x) dx = 1

In probabilistic language, f is a probability density function.

Then Jensen’s inequality becomes the following statement about convex integrals:

If g is any real-valued measurable function and φ is convex over the range of g, then

φ(∫_{−∞}^{∞} g(x) f(x) dx) ≤ ∫_{−∞}^{∞} φ(g(x)) f(x) dx.

If g(x) = x, then this form of the inequality reduces to a commonly used special case:
φ(∫_{−∞}^{∞} x f(x) dx) ≤ ∫_{−∞}^{∞} φ(x) f(x) dx.
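This special case can be checked with simple numerical integration. The sketch below assumes a uniform density f on [0, 1] and the convex function φ(x) = eˣ, both arbitrary choices for illustration:

```python
import math

# Special case of Jensen: phi(integral of x f(x) dx) <= integral of phi(x) f(x) dx,
# with f the uniform density on [0, 1] and phi(x) = exp(x) (convex).

def phi(x):
    return math.exp(x)

n = 100_000
xs = [(i + 0.5) / n for i in range(n)]   # midpoint rule on [0, 1]

E_X = sum(xs) / n                        # approximates integral of x * f(x) dx = 1/2
E_phiX = sum(phi(x) for x in xs) / n     # approximates integral of phi(x) * f(x) dx = e - 1

# phi(E[X]) = e^0.5 (about 1.6487) <= E[phi(X)] = e - 1 (about 1.7183)
assert phi(E_X) <= E_phiX
```

The exact values here are φ(E[X]) = e^{1/2} and E[φ(X)] = e − 1, so the inequality holds with a visible gap.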

Alternative finite form

Let Ω = {x1, ..., xn}, and take µ to be the counting measure on Ω; then the general
form reduces to a statement about sums:

φ(∑_{i=1}^{n} g(xi) f(xi)) ≤ ∑_{i=1}^{n} φ(g(xi)) f(xi)

provided that f (xi ) = λi ≥ 0 and

λ1 + ⋯ + λn = 1

Gibbs’ inequality

If p(x) is the true probability distribution for x, and q(x) is another distribution, then
applying Jensen’s inequality for the random variable Y (x) = q(x)/p(x) and the
function φ(y) = −log(y) gives
E[φ(Y )] ≥ φ(E[Y ])

Therefore:

KL(p(x) ∥ q(x)) = ∫ p(x) log(p(x)/q(x)) dx
                = −∫ p(x) log(q(x)/p(x)) dx
                ≥ −log(∫ p(x) (q(x)/p(x)) dx)
                = −log(∫ q(x) dx) = 0

a result called Gibbs’ inequality.

It shows that the average message length is minimized when codes are assigned on the
basis of the true probabilities p rather than any other distribution q. The quantity that is
non-negative is called the Kullback–Leibler divergence of q from p.

Since −log(x) is a strictly convex function for x > 0, equality holds if and only if
p(x) equals q(x) almost everywhere.
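The derivation above can be mirrored in the discrete case. Below is a minimal sketch of Gibbs' inequality, where the two distributions p and q are arbitrary examples chosen for illustration:

```python
import math

# Discrete Gibbs' inequality: KL(p || q) >= 0, with equality iff p == q.
def kl(p, q):
    """Kullback-Leibler divergence of q from p (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Arbitrary example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

assert kl(p, q) >= 0.0          # Gibbs' inequality
assert abs(kl(p, p)) < 1e-12    # equality when the distributions match
```

The non-negativity follows exactly the chain in the derivation: the sum over p of log(q/p), pushed inside −log by Jensen, collapses to −log(∑ q) = −log(1) = 0.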

Notes:
Compared with the other notes, the versions from Richard Yida Xu and Wikipedia
better match the derivation of EM and the KL divergence.

Reference

David McAllester, Jensen’s Inequality (http://ttic.uchicago.edu/~dmcallester/ttic101-07/lectures/jensen/jensen.pdf)

Richard YiDa Xu, Expectation-Maximization (http://www-staff.it.uts.edu.au/~ydxu/ml_course/em.pdf)

Jensen’s inequality from Wikipedia (https://en.wikipedia.org/wiki/Jensen’s_inequality)
