In [[information theory]] and [[statistics]], '''Kullback's inequality''' is a lower bound on the [[Kullback–Leibler divergence]] expressed in terms of the [[large deviations theory|large deviations]] [[rate function]].<ref>Aimé Fuchs and Giorgio Letta, ''L'inégalité de Kullback. Application à la théorie de l'estimation.'' Séminaire de probabilités (Strasbourg), vol. 4, pp. 108–131, 1970. http://www.numdam.org/item?id=SPS_1970__4__108_0</ref> If ''P'' and ''Q'' are [[probability distribution]]s on the real line whose first moments exist, and ''P'' is [[Absolute continuity#Absolute continuity of measures|absolutely continuous]] with respect to ''Q'' (written ''P'' ≪ ''Q''), then
:<math>D_{KL}(P\|Q) \ge \Psi_Q^*(\mu'_1(P)),</math>
where <math>\Psi_Q^*</math> is the rate function of <math>Q</math>, i.e. the [[convex conjugate]] of its [[cumulant]]-generating function, and <math>\mu'_1(P)</math> is the first [[Moment (mathematics)|moment]] of <math>P.</math>
The [[Cramér–Rao bound]] is a corollary of this result.
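As an illustrative numerical check, not part of the original statement, the following sketch compares both sides of the inequality when ''P'' and ''Q'' are Gaussian; the closed forms for the divergence and for the rate function of the standard normal are assumed from standard results.
<syntaxhighlight lang="python">
import numpy as np

# Illustrative check of Kullback's inequality for P = N(m, s^2) and Q = N(0, 1).
# Assumed standard closed forms:
#   D_KL(P || Q)  = -log s + (s^2 + m^2)/2 - 1/2
#   Psi_Q(theta)  = theta^2/2, hence the convex conjugate Psi_Q*(x) = x^2/2.

def kl_gaussian_vs_std_normal(m, s):
    """D_KL(N(m, s^2) || N(0, 1))."""
    return -np.log(s) + 0.5 * (s**2 + m**2) - 0.5

def rate_std_normal(x):
    """Rate function of N(0, 1): convex conjugate of its cumulant-generating function."""
    return 0.5 * x**2

rng = np.random.default_rng(0)
for _ in range(5):
    m, s = rng.normal(), rng.uniform(0.2, 3.0)
    lhs = kl_gaussian_vs_std_normal(m, s)   # D_KL(P || Q)
    rhs = rate_std_normal(m)                # Psi_Q*(first moment of P)
    print(f"m={m:+.2f}, s={s:.2f}: D_KL={lhs:.4f} >= Psi*={rhs:.4f} -> {lhs >= rhs}")
</syntaxhighlight>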
==Proof==
Let ''P'' and ''Q'' be [[probability distribution]]s (measures) on the real line, whose first moments exist, and such that [[Absolutely continuous#Absolute continuity of measures|''P'' ≪ ''Q'']]. Consider the '''[[natural exponential family]]''' of ''Q'' given by
:<math>Q_\theta(A) = \frac{\int_A e^{\theta x}Q(dx)}{\int_{-\infty}^\infty e^{\theta x}Q(dx)}
= \frac{1}{M_Q(\theta)} \int_A e^{\theta x}Q(dx)</math>
for every measurable set ''A'', where <math>M_Q</math> is the '''[[moment-generating function]]''' of ''Q''. (Note that ''Q''<sub>0</sub> = ''Q''.)
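A minimal sketch of this construction, for a hypothetical discrete ''Q'' chosen only for illustration: tilting by <math>e^{\theta x}</math> and renormalising by <math>M_Q(\theta)</math> recovers ''Q'' itself at θ = 0.
<syntaxhighlight lang="python">
import numpy as np

# Sketch of the natural exponential family of a discrete base distribution Q.
# The support points and weights below are an arbitrary illustrative choice.
x = np.array([0.0, 1.0, 2.0, 3.0])
q = np.array([0.4, 0.3, 0.2, 0.1])

def tilted(theta):
    """Q_theta(x) proportional to exp(theta * x) * Q(x), normalised by M_Q(theta)."""
    w = np.exp(theta * x) * q
    return w / w.sum()        # w.sum() is M_Q(theta) for this discrete Q

print(tilted(0.0))   # equals q: Q_0 = Q
print(tilted(1.0))   # mass shifts toward larger x
</syntaxhighlight>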
Then
:<math>D_{KL}(P\|Q) = D_{KL}(P\|Q_\theta)
+ \int_{\operatorname{supp} P}\left(\log\frac{\mathrm dQ_\theta}{\mathrm dQ}\right)\mathrm dP.</math>
By [[Gibbs' inequality]] we have <math>D_{KL}(P\|Q_\theta) \ge 0,</math> so that
:<math>D_{KL}(P\|Q) \ge
\int_{\operatorname{supp} P}\left(\log\frac{\mathrm dQ_\theta}{\mathrm dQ}\right)\mathrm dP
= \int_{\operatorname{supp} P}\left(\log\frac{e^{\theta x}}{M_Q(\theta)}\right) P(dx).</math>
Simplifying the right side, we have, for every real θ where <math>M_Q(\theta) < \infty:</math>
:<math>D_{KL}(P\|Q) \ge \mu'_1(P) \theta - \Psi_Q(\theta),</math>
where <math>\mu'_1(P)</math> is the first moment, or mean, of ''P'', and <math>\Psi_Q = \log M_Q</math> is called the '''[[cumulant|cumulant-generating function]]'''. Taking the supremum completes the process of [[convex conjugate|convex conjugation]] and yields the [[rate function]]:
:<math>D_{KL}(P\|Q) \ge \sup_\theta \left\{ \mu'_1(P) \theta - \Psi_Q(\theta) \right\}
= \Psi_Q^*(\mu'_1(P)).</math>
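The final supremum can also be evaluated numerically. The sketch below, again for the standard normal (an illustrative assumption), computes <math>\Psi_Q^*</math> by maximising <math>x\theta - \Psi_Q(\theta)</math> over θ and compares it with the closed form <math>x^2/2</math>.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: evaluate the rate function Psi_Q*(x) = sup_theta {x*theta - Psi_Q(theta)}
# numerically, for Q = N(0, 1) with cumulant-generating function Psi_Q(theta) = theta^2/2.

def cgf(theta):
    return 0.5 * theta**2

def rate(x):
    res = minimize_scalar(lambda t: -(x * t - cgf(t)))  # maximise by minimising the negative
    return -res.fun

for x in (0.0, 0.5, 1.5):
    print(x, rate(x), 0.5 * x**2)   # numerical supremum vs. closed form
</syntaxhighlight>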
==Corollary: the Cramér–Rao bound==
{{main|Cramér–Rao bound}}
===Start with Kullback's inequality===
Let ''X''<sub>θ</sub> be a family of probability distributions on the real line indexed by the real parameter θ, and satisfying certain [[Cramér–Rao bound#Regularity conditions|regularity conditions]]. Then
:<math> \lim_{h\rightarrow 0} \frac {D_{KL}(X_{\theta+h}\|X_\theta)} {h^2}
\ge \lim_{h\rightarrow 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2},
</math>
where <math>\Psi^*_\theta</math> is the [[convex conjugate]] of the [[cumulant|cumulant-generating function]] of <math>X_\theta</math> and <math>\mu_{\theta+h}</math> is the first moment of <math>X_{\theta+h}.</math>
===Left side===
The left side of this inequality can be simplified as follows:
:<math>\lim_{h\rightarrow 0}
\frac {D_{KL}(X_{\theta+h}\|X_\theta)} {h^2}
=\lim_{h\rightarrow 0}
\frac 1 {h^2}
\int_{-\infty}^\infty \left( \log\frac{\mathrm dX_{\theta+h}}{\mathrm dX_\theta} \right)
\mathrm dX_{\theta+h}
</math>
:<math> = \lim_{h\rightarrow 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[
\left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right)
+\frac 1 2 \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2
+ o \left( \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2 \right)
\right]\mathrm dX_{\theta+h},
</math>
::where we have expanded the logarithm <math>\log x</math> in a [[Taylor series]] in <math>1-1/x</math>; the first-order term integrates to zero, since both measures have total mass 1, leaving
:<math> = \lim_{h\rightarrow 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[
\frac 1 2 \left( 1 - \frac{\mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2
\right]\mathrm dX_{\theta+h}
</math>
:<math>
= \lim_{h\rightarrow 0} \frac 1 {h^2} \int_{-\infty}^\infty \left[
\frac 1 2 \left( \frac{\mathrm dX_{\theta+h} - \mathrm dX_\theta}{\mathrm dX_{\theta+h}} \right) ^ 2
\right]\mathrm dX_{\theta+h}
= \frac 1 2 \mathcal I_X(\theta),</math>
which is half the [[Fisher information]] of the parameter θ.
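As a sanity check on this limit, one can take a concrete family. The sketch below uses exponential distributions with rate θ (an illustrative choice), for which the divergence has a standard closed form and the Fisher information is <math>1/\theta^2</math>.
<syntaxhighlight lang="python">
import numpy as np

# Sketch: for X_theta = Exp(rate theta), check that D_KL(X_{theta+h} || X_theta) / h^2
# tends to I(theta)/2, using the standard closed form
# D_KL(Exp(a) || Exp(b)) = log(a/b) + b/a - 1 and I(theta) = 1/theta^2.

def kl_exponential(a, b):
    return np.log(a / b) + b / a - 1.0

theta = 2.0
half_fisher = 0.5 / theta**2
for h in (0.1, 0.01, 0.001):
    print(h, kl_exponential(theta + h, theta) / h**2, "->", half_fisher)
</syntaxhighlight>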
===Right side===
The right side of the inequality can be developed as follows:
:<math>
\lim_{h\rightarrow 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2}
= \lim_{h\rightarrow 0} \frac 1 {h^2} {\sup_t \{\mu_{\theta+h}t - \Psi_\theta(t)\} }.
</math>
This supremum is attained at a value of ''t'' = τ where the first derivative of the cumulant-generating function satisfies <math>\Psi'_\theta(\tau) = \mu_{\theta+h},</math> while <math>\Psi'_\theta(0) = \mu_\theta.</math> Since τ → 0 as h → 0, expanding <math>\Psi'_\theta</math> to first order about zero gives <math>\tau \Psi''_\theta(0) \approx \mu_{\theta+h} - \mu_\theta,</math> so that
:<math>\Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta} \lim_{h \rightarrow 0} \frac h \tau.</math>
Moreover,
:<math>\lim_{h\rightarrow 0} \frac {\Psi^*_\theta (\mu_{\theta+h})}{h^2}
= \frac 1 {2\Psi''_\theta(0)}\left(\frac {d\mu_\theta}{d\theta}\right)^2
= \frac 1 {2\mathrm{Var}(X_\theta)}\left(\frac {d\mu_\theta}{d\theta}\right)^2.</math>
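The same exponential-rate family (again an illustrative assumption) can be used to check this limit numerically: for <math>X_\theta</math> exponential with rate θ, the convex conjugate of the cumulant-generating function is <math>\Psi_\theta^*(x) = \theta x - 1 - \log(\theta x)</math> for ''x'' > 0, and <math>\mu_{\theta+h} = 1/(\theta+h).</math>
<syntaxhighlight lang="python">
import numpy as np

# Sketch: for X_theta = Exp(rate theta), Psi_theta*(x) = theta*x - 1 - log(theta*x) (x > 0)
# and mu_{theta+h} = 1/(theta+h). The limit below should equal
# (d mu_theta / d theta)^2 / (2 Var(X_theta)), with Var(X_theta) = 1/theta^2.

def rate_exponential(theta, x):
    return theta * x - 1.0 - np.log(theta * x)

theta = 2.0
dmu = -1.0 / theta**2                    # derivative of mu_theta = 1/theta
target = dmu**2 / (2.0 / theta**2)       # = 1 / (2 theta^2)
for h in (0.1, 0.01, 0.001):
    print(h, rate_exponential(theta, 1.0 / (theta + h)) / h**2, "->", target)
</syntaxhighlight>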
===Putting both sides back together===
We have:
:<math>\frac 1 2 \mathcal I_X(\theta)
\ge \frac 1 {2\mathrm{Var}(X_\theta)}\left(\frac {d\mu_\theta}{d\theta}\right)^2,</math>
which can be rearranged as:
:<math>\mathrm{Var}(X_\theta) \ge \frac{(d\mu_\theta / d\theta)^2} {\mathcal I_X(\theta)}.</math>
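For concreteness, the sketch below plugs standard values for two location families, Gaussian and Laplace (both chosen only for illustration), into this bound: it is attained with equality in the first case and is strict in the second.
<syntaxhighlight lang="python">
# Sketch: Var(X_theta) >= (d mu_theta / d theta)^2 / I(theta) for two location families,
# using standard closed-form values (not derived here):
#   N(theta, 1):       Var = 1, I = 1, d mu/d theta = 1   -> equality
#   Laplace(theta, 1): Var = 2, I = 1, d mu/d theta = 1   -> strict inequality
cases = {
    "Gaussian N(theta, 1)": dict(var=1.0, fisher=1.0, dmu=1.0),
    "Laplace(theta, 1)":    dict(var=2.0, fisher=1.0, dmu=1.0),
}
for name, c in cases.items():
    bound = c["dmu"]**2 / c["fisher"]
    print(f"{name}: Var = {c['var']} >= bound = {bound}")
</syntaxhighlight>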
==See also==
* [[Kullback–Leibler divergence]]
* [[Cramér–Rao bound]]
* [[Fisher information]]
* [[Large deviations theory]]
* [[Convex conjugate]]
* [[Rate function]]
* [[Moment-generating function]]
==Notes and references==
<references/>
{{DEFAULTSORT:Kullback's Inequality}}
[[Category:Information theory]]
[[Category:Statistical inequalities]]
[[Category:Estimation theory]]