In [[mathematics]], more specifically [[measure theory]], there are various notions of the '''convergence of measures'''. For an intuitive general sense of what is meant by ''convergence in measure'', consider a sequence of measures μ<sub>''n''</sub> on a space, sharing a common collection of measurable sets. Such a sequence might represent an attempt to construct 'better and better' approximations to a desired measure μ that is difficult to obtain directly. The meaning of 'better and better' is subject to all the usual caveats for taking [[Limit of a sequence|limit]]s; for any error tolerance ε > 0 we require there be ''N'' sufficiently large for ''n'' ≥ ''N'' to ensure the 'difference' between μ<sub>''n''</sub> and μ is smaller than ε. Various notions of convergence specify precisely what the word 'difference' should mean in that description; these notions are not equivalent to one another, and vary in strength.

Three of the most common notions of convergence are described below.

==Informal descriptions==
This section attempts to provide a rough intuitive description of three notions of convergence, using terminology developed in [[calculus]] courses; this section is necessarily imprecise as well as inexact, and the reader should refer to the formal clarifications in subsequent sections. In particular, the descriptions here do not address the possibility that the measure of some sets could be infinite, or that the underlying space could exhibit pathological behavior, and additional technical assumptions are needed for some of the statements. The statements in this section are however all correct if <math>\mu_n</math> is a sequence of probability measures on a [[Polish space]].

The various notions of convergence formalize the assertion that the 'average value' of each 'sufficiently nice' function should converge:

:<math>\int f\, d\mu_n \to \int f\, d\mu</math>

To formalize this requires a careful specification of the set of functions under consideration and how uniform the convergence should be.

The notion of ''weak convergence'' requires this convergence to take place for every continuous bounded function <math>f</math>. This notion treats convergence for different functions ''f'' independently of one another; ''i.e.'', different functions ''f'' may require different values of ''N'' for the approximation to be equally good (thus, convergence is non-uniform in <math>f</math>).

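To see the non-uniformity concretely (a hypothetical illustration, not part of the formal development below): let <math>\mu_n = \delta_{1/n}</math> be the [[Dirac measure]] at 1/''n'', let <math>\mu = \delta_0</math>, and consider the bounded continuous functions <math>f_k(x) = \max(0,\, 1 - k|x|)</math>. Then

:<math>\left| \int f_k\, d\mu_n - \int f_k\, d\mu \right| = 1 - \max(0,\, 1 - k/n) = \min(1,\, k/n),</math>

so keeping the error below a fixed ε requires ''n'' > ''k''/ε: the threshold ''N'' that works for <math>f_k</math> grows without bound in ''k''.
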
The notion of ''strong convergence'' formalizes the assertion that the measure of each measurable set should converge:

:<math>\mu_n(A) \to \mu(A)</math>

Again, no uniformity over the set <math>A</math> is required. Intuitively, considering integrals of 'nice' functions, this notion provides more uniformity than weak convergence. As a matter of fact, when considering sequences of measures with uniformly bounded variation on a [[Polish space]], strong convergence implies the convergence <math>\int f\, d\mu_n \to \int f\, d\mu</math> for any bounded measurable function <math>f</math>. As before, this convergence is non-uniform in <math>f</math>.

The notion of ''total variation convergence'' formalizes the assertion that the measure of all measurable sets should converge ''uniformly'', i.e. for every <math>\epsilon > 0</math> there exists ''N'' such that <math>|\mu_n(A) - \mu(A)| < \epsilon</math> for every ''n'' > ''N'' and for every measurable set <math>A</math>. As before, this implies convergence of integrals against bounded measurable functions, but this time convergence is uniform over all functions bounded by any fixed constant.

==Total variation convergence of measures==
This is the strongest notion of convergence shown on this page and is defined as follows. Let <math>(X, \mathcal{F})</math> be a [[measurable space]]. The [[total variation]] distance between two (positive) measures μ and ν is then given by

:<math> \left \|\mu- \nu \right \|_{TV} = \sup_f \left \{ \int_X f\, d\mu - \int_X f\, d\nu \right \}.</math>

Here the supremum is taken over ''f'' ranging over the set of all [[measurable function]]s from ''X'' to [−1, 1]. This is in contrast, for example, to the [[Wasserstein metric]], where the definition is of the same form, but the supremum is taken over ''f'' ranging over the set of measurable functions from ''X'' to [−1, 1] which have [[Lipschitz constant]] at most 1; and also in contrast to the [[Radon metric]], where the supremum is taken over ''f'' ranging over the set of continuous functions from ''X'' to [−1, 1]. In the case where ''X'' is a [[Polish space]], the total variation metric coincides with the Radon metric.

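For illustration, the following Python sketch (not drawn from the sources cited; the point masses are an arbitrary example) contrasts the two metrics on the pair <math>\delta_0</math>, <math>\delta_{1/n}</math>: the total variation distance is constantly 2, while the Wasserstein distance vanishes as ''n'' grows.

<syntaxhighlight lang="python">
# Contrast total variation and Wasserstein distances for delta_0 vs delta_{1/n}.
# Uses scipy.stats.wasserstein_distance; the TV value follows from the discrete
# formula ||mu - nu||_TV = sum_x |mu({x}) - nu({x})|.
from scipy.stats import wasserstein_distance

for n in (1, 10, 100, 1000):
    tv = 2.0  # the two point masses sit on disjoint points for every n
    w1 = wasserstein_distance([0.0], [1.0 / n])  # cost of moving unit mass from 1/n to 0
    print(f"n={n:5d}  TV={tv:.1f}  W1={w1:.4f}")
# TV stays 2 while W1 -> 0: the point masses converge weakly but not in total variation.
</syntaxhighlight>
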
If μ and ν are both [[probability measure]]s, then the total variation distance is also given by

:<math>\left \|\mu- \nu \right \|_{TV} = 2\cdot\sup_{A\in \mathcal{F}} | \mu (A) - \nu (A) |.</math>

The equivalence between these two definitions can be seen as a particular case of the [[transportation theory (mathematics)#Monge and Kantorovich formulations|Monge-Kantorovich duality]]. From the two definitions above, it is clear that the total variation distance between probability measures is always between 0 and 2.

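On a finite space the equivalence can be checked directly. The following Python sketch (with arbitrary hypothetical distributions) verifies by brute force over all subsets that the supremum form above agrees with the <math>L^1</math> sum <math>\textstyle\sum_x |\mu(\{x\}) - \nu(\{x\})|</math>, which is the value of the first definition for discrete measures.

<syntaxhighlight lang="python">
# Check sum_x |mu({x}) - nu({x})| == 2 * sup_A |mu(A) - nu(A)| on a finite space.
from itertools import chain, combinations

mu = [0.5, 0.3, 0.2]   # hypothetical probability vector
nu = [0.2, 0.2, 0.6]   # hypothetical probability vector

# First definition: the supremum over f : X -> [-1, 1] is attained at
# f = sign(mu - nu), giving the L1 sum of the pointwise differences.
l1 = sum(abs(m - v) for m, v in zip(mu, nu))

# Second definition: brute force over every subset A of the space.
points = range(len(mu))
subsets = chain.from_iterable(combinations(points, r) for r in range(len(mu) + 1))
sup_a = max(abs(sum(mu[i] for i in A) - sum(nu[i] for i in A)) for A in subsets)

print(l1, 2 * sup_a)   # both print 0.8
</syntaxhighlight>
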
To illustrate the meaning of the total variation distance, consider the following thought experiment. Assume that we are given two probability measures μ and ν, as well as a random variable ''X''. We know that ''X'' has law either μ or ν but we do not know which one of the two. Assume that these two measures have prior probabilities 0.5 each of being the true law of ''X''. Assume now that we are given ''one'' single sample distributed according to the law of ''X'' and that we are then asked to guess which one of the two distributions describes that law. The quantity

:<math>{2+\|\mu-\nu\|_{TV} \over 4}</math>

then provides a sharp upper bound on the prior probability that our guess will be correct.

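This bound can be verified exactly on a finite support. In the Python sketch below (with arbitrary hypothetical distributions), the Bayes-optimal strategy guesses whichever measure assigns the observed point more mass; its success probability <math>\tfrac{1}{2}\textstyle\sum_x \max(\mu(\{x\}), \nu(\{x\}))</math> coincides with the bound above.

<syntaxhighlight lang="python">
# Verify that the optimal guessing probability equals (2 + ||mu - nu||_TV) / 4.
mu = [0.5, 0.3, 0.2]   # hypothetical probability vector
nu = [0.2, 0.2, 0.6]   # hypothetical probability vector

tv = sum(abs(m - v) for m, v in zip(mu, nu))      # ||mu - nu||_TV = 0.8
# With prior 1/2 on each law, guessing the measure that gives the observed
# point more mass succeeds with probability sum_x max(mu(x), nu(x)) / 2.
bayes = sum(max(m, v) for m, v in zip(mu, nu)) / 2

print(bayes, (2 + tv) / 4)   # both print 0.7
</syntaxhighlight>
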
Given the above definition of total variation distance, a sequence μ<sub>''n''</sub> of measures defined on the same measure space is said to '''converge''' to a measure μ in total variation distance if for every ε > 0, there exists an ''N'' such that for all ''n'' > ''N'', one has that<ref>{{cite journal |last=Madras |first=Neil |coauthors=Sezer, Deniz |title=Quantitative bounds for Markov chain convergence: Wasserstein and total variation distances |journal=[[arXiv]] |date=25 Feb 2011 |url=http://arxiv.org/abs/1102.5245 |accessdate=21 July 2012}}</ref>

:<math>\|\mu_n - \mu\|_{TV} < \epsilon</math>.

==Strong convergence of measures==
For <math>(X, \mathcal{F})</math> a [[measurable space]], a sequence μ<sub>''n''</sub> is said to converge strongly to a limit μ if

:<math> \lim_{n \to \infty} \mu_n(A) = \mu(A)</math>

for every set <math>A\in\mathcal{F}</math>.

For example, as a consequence of the [[Riemann–Lebesgue lemma]], the sequence μ<sub>''n''</sub> of measures on the interval [−1, 1] given by μ<sub>''n''</sub>(''dx'') = (1 + sin(''nx''))''dx'' converges strongly to Lebesgue measure, but it does not converge in total variation.

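The following Python sketch (a numerical illustration only) exhibits both halves of this claim: the mass μ<sub>''n''</sub>(''A'') of a fixed test interval approaches its Lebesgue measure, while the total variation distance <math>\|\mu_n - \lambda\|_{TV} = \int_{-1}^{1} |\sin(nx)|\, dx</math> stays near 4/π instead of vanishing.

<syntaxhighlight lang="python">
# mu_n(dx) = (1 + sin(nx)) dx on [-1, 1]: strong convergence to Lebesgue
# measure without total variation convergence, checked by Riemann sums.
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
for n in (1, 10, 100, 1000):
    diff = np.sin(n * x)                    # density of mu_n minus density of Lebesgue measure
    A = (x >= 0.0) & (x <= 0.5)             # test set A = [0, 0.5], Lebesgue measure 0.5
    mu_n_A = np.sum(1.0 + diff[A]) * dx     # Riemann sum for mu_n(A)
    tv = np.sum(np.abs(diff)) * dx          # Riemann sum for ||mu_n - lambda||_TV
    print(f"n={n:5d}  mu_n(A)={mu_n_A:.4f}  TV={tv:.4f}")
# mu_n(A) -> 0.5 = lambda(A), but TV stays near 4/pi ~ 1.2732.
</syntaxhighlight>
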
== Weak convergence of measures ==
In [[mathematics]] and [[statistics]], '''weak convergence''' (also known as '''narrow convergence''' or '''weak-* convergence''', which is a more appropriate name from the point of view of [[functional analysis]], but less frequently used) is one of many types of convergence relating to the convergence of [[measure theory|measures]]. It depends on a topology on the underlying space and thus is not a purely measure-theoretic notion.

There are several equivalent [[definition]]s of weak convergence of a sequence of measures, some of which are (apparently) more general than others. The equivalence of these conditions is sometimes known as the '''portmanteau theorem'''.<ref>Achim Klenke, ''Probability Theory'' (2006) Springer-Verlag, ISBN 978-1-84800-047-6, doi:10.1007/978-1-84800-048-3</ref>

<blockquote>
'''Definition.''' Let ''S'' be a [[metric space]] with its [[Borel sigma algebra|Borel σ-algebra]] Σ. We say that a sequence of [[probability measure]]s ''P''<sub>''n''</sub> on (''S'', Σ), ''n'' = 1, 2, ..., converges weakly to the probability measure ''P'', and write
:<math>P_n\Rightarrow P</math>
if any of the following equivalent conditions is true (here E<sub>''n''</sub> denotes expectation with respect to ''P<sub>n</sub>'' while E denotes expectation with respect to ''P''):
* E<sub>''n''</sub>''f'' → E''f'' for all [[Bounded function|bounded]], [[continuous function]]s ''f'';
* E<sub>''n''</sub>''f'' → E''f'' for all bounded and [[Lipschitz function]]s ''f'';
* limsup E<sub>''n''</sub>''f'' ≤ E''f'' for every [[upper semi-continuous]] function ''f'' bounded from above;
* liminf E<sub>''n''</sub>''f'' ≥ E''f'' for every [[lower semi-continuous]] function ''f'' bounded from below;
* limsup ''P''<sub>''n''</sub>(''C'') ≤ ''P''(''C'') for all [[closed set]]s ''C'' of space ''S'';
* liminf ''P''<sub>''n''</sub>(''U'') ≥ ''P''(''U'') for all [[open set]]s ''U'' of space ''S'';
* lim ''P''<sub>''n''</sub>(''A'') = ''P''(''A'') for all [[continuity set]]s ''A'' of measure ''P''.
</blockquote>

In the case ''S'' = '''R''' with its usual topology, if ''F''<sub>''n''</sub>, ''F'' denote the [[cumulative distribution function]]s of the measures ''P''<sub>''n''</sub>, ''P'' respectively, then ''P''<sub>''n''</sub> converges weakly to ''P'' if and only if lim<sub>''n''→∞</sub> ''F''<sub>''n''</sub>(''x'') = ''F''(''x'') for all points ''x'' ∈ '''R''' at which ''F'' is continuous.

For example, the sequence where ''P''<sub>''n''</sub> is the [[Dirac measure]] located at 1/''n'' converges weakly to the Dirac measure located at 0 (if we view these as measures on '''R''' with the usual topology), but it does not converge strongly. This is intuitively clear: we only know that 1/''n'' is "close" to 0 because of the topology of '''R'''.

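A short Python sketch (illustration only) makes this concrete: integrals of a bounded continuous test function against ''P''<sub>''n''</sub> converge to its value at 0, yet ''P''<sub>''n''</sub>({0}) = 0 for every ''n'' while the limit measure assigns {0} full mass.

<syntaxhighlight lang="python">
# P_n = delta_{1/n}: weak convergence to delta_0 without strong convergence.
import math

f = math.cos                      # a bounded continuous test function
for n in (1, 10, 100, 1000):
    # integrating f against delta_{1/n} just evaluates f at 1/n
    print(f"n={n:5d}  integral of f dP_n = {f(1.0 / n):.6f}")
print("integral of f dP =", f(0.0))   # = 1.0, the weak limit
# For A = {0}: P_n(A) = 0 for all n but P(A) = 1, so strong convergence fails.
# Likewise F_n(0) = 0 while F(0) = 1; this is consistent with weak convergence
# because 0 is a discontinuity point of the limit CDF F.
</syntaxhighlight>
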
This definition of weak convergence can be extended for ''S'' any [[metrizable]] [[topological space]]. It also defines a weak topology on '''''P'''''(''S''), the set of all probability measures defined on (''S'', Σ). The weak topology is generated by the following basis of open sets:

:<math>\left\{ U_{\phi, x, \delta} \left| \begin{array}{c} \phi \colon S \to \mathbf{R} \text{ is bounded and continuous,} \\ x \in \mathbf{R} \text{ and } \delta > 0 \end{array} \right. \right\},</math>

where

:<math>U_{\phi, x, \delta} := \left\{ \mu \in \boldsymbol{P}(S) \left| \left| \int_{S} \phi \, \mathrm{d} \mu - x \right| < \delta \right. \right\}.</math>

If ''S'' is also [[separable space|separable]], then '''''P'''''(''S'') is metrizable and separable, for example by the [[Lévy–Prokhorov metric]]; if ''S'' is also compact or [[Polish space|Polish]], so is '''''P'''''(''S'').

If ''S'' is separable, it naturally embeds into '''''P'''''(''S'') as the (closed) set of [[Dirac measure]]s, and its [[convex hull]] is [[Dense set|dense]].

There are many "arrow notations" for this kind of convergence: the most frequently used are <math>P_{n} \Rightarrow P</math>, <math>P_{n} \rightharpoonup P</math> and <math>P_{n} \xrightarrow{\mathcal{D}} P</math>.

===Weak convergence of random variables===
{{Main|Convergence of random variables}}
Let <math>(\Omega, \mathcal{F}, \mathbb{P})</math> be a [[probability space]] and '''X''' be a metric space. If {{nowrap|''X<sub>n</sub>'', ''X'': Ω → '''X'''}} is a sequence of [[random variable]]s, then ''X<sub>n</sub>'' is said to '''converge weakly''' (or '''in distribution''' or '''in law''') to ''X'' as {{nowrap|''n'' → ∞}} if the sequence of [[pushforward measure]]s (''X<sub>n</sub>'')<sub>∗</sub>('''P''') converges weakly to ''X''<sub>∗</sub>('''P''') in the sense of weak convergence of measures on '''X''', as defined above.

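As a sketch (an arbitrary hypothetical example, not drawn from the sources cited), take Ω = [0, 1] with Lebesgue measure '''P''' and ''X<sub>n</sub>''(ω) = ω<sup>1+1/''n''</sup>. The pushforward measures then have cumulative distribution functions ''F<sub>n</sub>''(''t'') = ''t''<sup>''n''/(''n''+1)</sup> on [0, 1], which converge pointwise to ''F''(''t'') = ''t'', the continuous CDF of ''X''(ω) = ω, so ''X<sub>n</sub>'' → ''X'' in distribution.

<syntaxhighlight lang="python">
# X_n(w) = w**(1 + 1/n) on ([0, 1], Lebesgue): convergence in distribution to
# X(w) = w, seen through the CDFs of the pushforward measures.
for n in (1, 10, 100, 1000):
    t = 0.25                              # an arbitrary test point
    F_n = t ** (n / (n + 1))              # CDF of (X_n)_*(P):  P(X_n <= t) = t**(n/(n+1))
    print(f"n={n:5d}  F_n(0.25) = {F_n:.6f}")
print("F(0.25) =", 0.25)                  # CDF of X_*(P) at the same point
</syntaxhighlight>
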
{{More footnotes|date=February 2010}}

==References==
<references />

* {{cite book | author=Ambrosio, L., Gigli, N. & Savaré, G. | title=Gradient Flows in Metric Spaces and in the Space of Probability Measures | publisher=ETH Zürich, Birkhäuser Verlag | location=Basel | year=2005 | isbn=3-7643-2428-7 }}
* {{cite book | last=Billingsley | first=Patrick | title=Probability and Measure | publisher=John Wiley & Sons, Inc. | location=New York, NY | year=1995 | isbn=0-471-00710-2}}
* {{cite book | last=Billingsley | first=Patrick | title=Convergence of Probability Measures | publisher=John Wiley & Sons, Inc. | location=New York, NY | year=1999 | isbn=0-471-19745-9}}

==See also==
* [[Convergence of random variables]]
* [[Prokhorov's theorem]]
* [[Lévy–Prokhorov metric]]
* [[Tightness of measures]]

==External links==
* [http://www.encyclopediaofmath.org/index.php/Convergence_of_measures Convergence of measures] at [http://www.encyclopediaofmath.org/ Encyclopedia of Mathematics]

[[Category:Measure theory]]