Kullback's inequality


In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1] If $P$ and $Q$ are probability distributions on the real line, such that $P$ is absolutely continuous with respect to $Q$, i.e. $P \ll Q$, and whose first moments exist, then

$$D_{KL}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),$$

where $\Psi_Q^*$ is the rate function, i.e. the convex conjugate of the cumulant-generating function, of $Q$, and $\mu'_1(P)$ is the first moment of $P$.

The Cramér–Rao bound is a corollary of this result.
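As a quick illustration (a numeric sketch with arbitrarily chosen Gaussian parameters, not taken from the source), the inequality can be checked in closed form for two normal distributions, since both the KL divergence and the rate function of a Gaussian are known explicitly:

```python
import math

# Hedged numeric check of Kullback's inequality for two Gaussians.
# The parameter values below are illustrative choices.
mu0, s0 = 0.0, 1.0   # Q = N(mu0, s0^2)
mu1, s1 = 1.5, 0.7   # P = N(mu1, s1^2)

# Closed-form KL divergence between the two Gaussians.
kl = math.log(s0 / s1) + (s1**2 + (mu1 - mu0)**2) / (2 * s0**2) - 0.5

# For Q = N(mu0, s0^2) the cumulant-generating function is
# Psi_Q(t) = mu0*t + s0^2*t^2/2, whose convex conjugate (rate function)
# is Psi_Q*(x) = (x - mu0)^2 / (2*s0^2).
rate = (mu1 - mu0)**2 / (2 * s0**2)

assert kl >= rate   # Kullback's inequality holds
print(kl, rate)
```

Here the bound is strict: the rate function sees only the first moment of $P$, while the KL divergence also penalizes the variance mismatch.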

Proof

Let P and Q be probability distributions (measures) on the real line, whose first moments exist, and such that P<<Q. Consider the natural exponential family of Q given by

$$Q_\theta(A) = \frac{\int_A e^{\theta x}\,Q(dx)}{\int_{-\infty}^{\infty} e^{\theta x}\,Q(dx)} = \frac{1}{M_Q(\theta)} \int_A e^{\theta x}\,Q(dx)$$

for every measurable set $A$, where $M_Q$ is the moment-generating function of $Q$. (Note that $Q_0 = Q$.) Then

$$D_{KL}(P \parallel Q) = D_{KL}(P \parallel Q_\theta) + \int_{\operatorname{supp} P} \left(\log\frac{dQ_\theta}{dQ}\right) dP.$$
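The tilted family $Q_\theta$ can be sketched for a small discrete distribution (a toy example, assumed for illustration only); the check below confirms that $Q_\theta$ normalizes to 1 and that its mean equals $\Psi'_Q(\theta)$, the derivative of the cumulant-generating function:

```python
import math

# Sketch of the exponential tilting Q_theta for a toy discrete Q.
support = [0, 1, 2, 3]
q = [0.4, 0.3, 0.2, 0.1]   # Q (illustrative values)
theta = 0.8

M = sum(qi * math.exp(theta * x) for x, qi in zip(support, q))  # M_Q(theta)
q_theta = [qi * math.exp(theta * x) / M for x, qi in zip(support, q)]

assert abs(sum(q_theta) - 1.0) < 1e-12   # Q_theta is a probability distribution

# The mean of Q_theta equals Psi_Q'(theta); compare against a
# central-difference derivative of log M_Q.
mean_tilted = sum(x * p for x, p in zip(support, q_theta))
logM = lambda t: math.log(sum(qi * math.exp(t * x) for x, qi in zip(support, q)))
eps = 1e-6
deriv = (logM(theta + eps) - logM(theta - eps)) / (2 * eps)
assert abs(deriv - mean_tilted) < 1e-6
```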

By Gibbs' inequality we have $D_{KL}(P \parallel Q_\theta) \ge 0$, so that

$$D_{KL}(P \parallel Q) \ge \int_{\operatorname{supp} P} \left(\log\frac{dQ_\theta}{dQ}\right) dP = \int_{\operatorname{supp} P} \left(\log\frac{e^{\theta x}}{M_Q(\theta)}\right) P(dx).$$
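The decomposition used above is an exact identity, which can be verified numerically for discrete toy distributions (values below are illustrative, not from the source):

```python
import math

# Check D_KL(P||Q) = D_KL(P||Q_theta) + E_P[log(dQ_theta/dQ)]
# for small discrete distributions on a common support.
xs = [0, 1, 2, 3]
p = [0.1, 0.2, 0.3, 0.4]   # P (illustrative)
q = [0.4, 0.3, 0.2, 0.1]   # Q (illustrative)
theta = 0.5

M = sum(qi * math.exp(theta * x) for x, qi in zip(xs, q))
q_t = [qi * math.exp(theta * x) / M for x, qi in zip(xs, q)]

def kl(a, b):
    # Discrete KL divergence, skipping zero-probability terms of a.
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

lhs = kl(p, q)
rhs = kl(p, q_t) + sum(pi * math.log(qt / qi)
                       for pi, qt, qi in zip(p, q_t, q))
assert abs(lhs - rhs) < 1e-12
```

The identity holds for any $\theta$, which is what lets the proof optimize over $\theta$ afterwards.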

Simplifying the right side, we have, for every real $\theta$ where $M_Q(\theta) < \infty$:

$$D_{KL}(P \parallel Q) \ge \mu'_1(P)\,\theta - \Psi_Q(\theta),$$

where $\mu'_1(P)$ is the first moment, or mean, of $P$, and $\Psi_Q = \log M_Q$ is called the cumulant-generating function. Taking the supremum completes the process of convex conjugation and yields the rate function:

$$D_{KL}(P \parallel Q) \ge \sup_\theta \left\{\mu'_1(P)\,\theta - \Psi_Q(\theta)\right\} = \Psi_Q^*(\mu'_1(P)).$$
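The final supremum can be approximated by a simple grid search over $\theta$ (a sketch, reusing the illustrative Gaussian pair $P = N(1.5, 0.7^2)$, $Q = N(0,1)$; the grid bounds are arbitrary choices):

```python
import math

# Approximate Psi_Q*(mu'_1(P)) = sup_theta { mu1*theta - Psi_Q(theta) }
# by brute force, and confirm it lower-bounds D_KL(P||Q).
mu0, s0 = 0.0, 1.0   # Q = N(0, 1)
mu1, s1 = 1.5, 0.7   # P = N(1.5, 0.49)

kl = math.log(s0 / s1) + (s1**2 + (mu1 - mu0)**2) / (2 * s0**2) - 0.5

psi = lambda t: mu0 * t + 0.5 * s0**2 * t * t    # cgf of Q
thetas = [i / 1000 for i in range(-5000, 5001)]  # grid on [-5, 5]
rate = max(mu1 * t - psi(t) for t in thetas)     # approx Psi_Q*(mu1)

assert rate <= kl + 1e-9   # Kullback's inequality
```

For this Gaussian $Q$ the conjugate is known exactly, $\Psi_Q^*(x) = (x-\mu_0)^2/(2\sigma_0^2) = 1.125$, and the grid search recovers it.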

Corollary: the Cramér–Rao bound


Start with Kullback's inequality:

Let $X_\theta$ be a family of probability distributions on the real line indexed by the real parameter $\theta$, and satisfying certain regularity conditions. Then

$$\lim_{h\to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} \ge \lim_{h\to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2},$$

where $\Psi_\theta^*$ is the convex conjugate of the cumulant-generating function of $X_\theta$ and $\mu_{\theta+h}$ is the first moment of $X_{\theta+h}$.

Left side

The left side of this inequality can be simplified as follows:

$$\lim_{h\to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} = \lim_{h\to 0} \frac{1}{h^2} \int \left(\log\frac{dX_{\theta+h}}{dX_\theta}\right) dX_{\theta+h}$$

$$= \lim_{h\to 0} \frac{1}{h^2} \int \left[\left(1-\frac{dX_\theta}{dX_{\theta+h}}\right) + \frac{1}{2}\left(1-\frac{dX_\theta}{dX_{\theta+h}}\right)^2 + o\!\left(\left(1-\frac{dX_\theta}{dX_{\theta+h}}\right)^2\right)\right] dX_{\theta+h},$$

where we have expanded the logarithm $\log x$ in a Taylor series in $1 - 1/x$. The first-order term integrates to zero, since $\int \left(1-\frac{dX_\theta}{dX_{\theta+h}}\right) dX_{\theta+h} = 1 - 1 = 0$, leaving

$$= \lim_{h\to 0} \frac{1}{h^2} \int \left[\frac{1}{2}\left(1-\frac{dX_\theta}{dX_{\theta+h}}\right)^2\right] dX_{\theta+h}$$

$$= \lim_{h\to 0} \frac{1}{h^2} \int \left[\frac{1}{2}\left(\frac{dX_{\theta+h}-dX_\theta}{dX_{\theta+h}}\right)^2\right] dX_{\theta+h} = \frac{1}{2}\mathcal{I}_X(\theta),$$

which is half the Fisher information of the parameter θ.
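This limit can be checked numerically for a family with closed-form KL divergence; a Poisson family is an illustrative choice (not from the source), where the Fisher information of the rate parameter is $1/\lambda$:

```python
import math

# Sketch: for Poisson(lam), check that D_KL(Poi(lam+h)||Poi(lam)) / h^2
# approaches I(lam)/2 = 1/(2*lam) as h -> 0.
lam = 4.0

def kl_poisson(a, b):
    # Closed-form KL divergence between Poisson(a) and Poisson(b).
    return b - a + a * math.log(a / b)

target = 1 / (2 * lam)   # = 0.125 here
for h in (1e-2, 1e-3, 1e-4):
    print(h, kl_poisson(lam + h, lam) / h**2)
# The printed ratios approach 0.125 as h shrinks.
```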

Right side

The right side of the inequality can be developed as follows:

$$\lim_{h\to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \lim_{h\to 0} \frac{1}{h^2} \sup_t \left\{\mu_{\theta+h}\,t - \Psi_\theta(t)\right\}.$$

This supremum is attained at a value of $t = \tau$ where the first derivative of the cumulant-generating function is $\Psi'_\theta(\tau) = \mu_{\theta+h}$, but we have $\Psi'_\theta(0) = \mu_\theta$, so that

$$\Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta}\,\lim_{h\to 0}\frac{h}{\tau}.$$

Moreover,

$$\lim_{h\to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \frac{1}{2\,\Psi''_\theta(0)}\left(\frac{d\mu_\theta}{d\theta}\right)^2 = \frac{1}{2\operatorname{Var}(X_\theta)}\left(\frac{d\mu_\theta}{d\theta}\right)^2.$$
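The right-side limit can also be checked for the Poisson family (an illustrative choice), where the cumulant-generating function and its conjugate are explicit:

```python
import math

# Sketch: for Poisson(lam), Psi(t) = lam*(e^t - 1) and its convex
# conjugate is Psi*(x) = x*log(x/lam) - x + lam. Check that
# Psi*(lam + h) / h^2 approaches 1/(2*Var) = 1/(2*lam) as h -> 0.
lam = 4.0

def rate(x):
    return x * math.log(x / lam) - x + lam   # Psi*_lam(x)

for h in (1e-2, 1e-3, 1e-4):
    print(h, rate(lam + h) / h**2)
# The printed ratios approach 1/(2*lam) = 0.125.
```

For the Poisson family both limits agree, consistent with equality holding in the Cramér–Rao bound for this exponential family.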

Putting both sides back together

We have:

$$\frac{1}{2}\mathcal{I}_X(\theta) \ge \frac{1}{2\operatorname{Var}(X_\theta)}\left(\frac{d\mu_\theta}{d\theta}\right)^2,$$

which can be rearranged as:

$$\operatorname{Var}(X_\theta) \ge \frac{\left(d\mu_\theta/d\theta\right)^2}{\mathcal{I}_X(\theta)}.$$
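As a final sketch (an illustrative family, not from the source), the bound can be strict: for a Laplace location family $X_\theta \sim \mathrm{Laplace}(\theta, b)$ one has $\mu_\theta = \theta$, $\operatorname{Var}(X_\theta) = 2b^2$, and Fisher information $\mathcal{I}(\theta) = 1/b^2$:

```python
# Sketch: Cramér–Rao bound for a Laplace location family (illustrative).
# mu_theta = theta, so (d mu_theta / d theta)^2 = 1;
# Var(X_theta) = 2*b^2; Fisher information I(theta) = 1/b^2.
b = 1.5
var = 2 * b**2
fisher = 1 / b**2
bound = 1.0 / fisher      # (d mu/d theta)^2 / I(theta) = b^2

assert var >= bound       # holds strictly here: 2*b^2 > b^2
```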


Notes and references

  1. Aimé Fuchs and Giorgio Letta, "L'inégalité de Kullback. Application à la théorie de l'estimation" [Kullback's inequality: application to estimation theory]. Séminaire de Probabilités (Strasbourg), vol. 4, pp. 108–131, 1970. http://www.numdam.org/item?id=SPS_1970__4__108_0