{{More footnotes|date=November 2010}}

In [[statistics]], '''Bessel's correction''', named after [[Friedrich Bessel]], is the use of ''n'' − 1 instead of ''n'' in the formula for the [[sample variance]] and [[sample standard deviation]], where ''n'' is the number of observations in a sample. It corrects the bias in the estimation of the population variance, and some (but not all) of the bias in the estimation of the population standard deviation.

That is, when [[estimation theory|estimating]] the population [[variance]] and [[standard deviation]] from a sample when the population mean is unknown, the sample variance computed as the ''mean'' of the squared deviations of sample values from their mean (that is, using a multiplicative factor 1/''n'') is a [[biased estimator]] of the population variance: on average it underestimates it. Multiplying the sample variance as computed in that fashion by ''n''/(''n'' − 1) (equivalently, using 1/(''n'' − 1) instead of 1/''n'' in the estimator's formula) corrects for this, and gives an unbiased estimator of the population variance. The cost of this correction is that the unbiased estimator has uniformly higher [[mean squared error]] than the biased estimator. In some terminology,<ref>Reichmann, W.J. (1961) ''Use and abuse of statistics'', Methuen. Reprinted 1964–1970 by Pelican. Appendix 8.</ref><ref>Upton, G.; Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. ISBN 978-0-19-954145-4 (entry for "Variance (data)")</ref> the factor ''n''/(''n'' − 1) is itself called '''Bessel's correction'''.

A subtle point is that, while the sample variance (using Bessel's correction) is an unbiased estimate of the population variance, its [[square root]], the sample standard deviation, is a ''biased'' estimate of the population standard deviation; because the square root is a [[concave function]], the bias is downward, by [[Jensen's inequality]]. There is no general formula for an unbiased estimator of the population standard deviation, though there are correction factors for particular distributions, such as the normal; see [[unbiased estimation of standard deviation]] for details. An approximation for the exact correction factor for the normal distribution is given by using ''n'' − 1.5 in the formula: the bias then decays quadratically (rather than linearly, as in the uncorrected form and Bessel's corrected form).

One can understand Bessel's correction intuitively as the [[Degrees of freedom (statistics)|degrees of freedom]] in the [[errors and residuals in statistics|residuals]] vector:

:<math>(x_1-\overline{x},\,\dots,\,x_n-\overline{x}),</math>

where <math>\overline{x}</math> is the sample mean. While there are ''n'' independent samples, there are only ''n'' − 1 independent residuals, as they sum to 0. This is explained further in the article [[Degrees of freedom (statistics)#Residuals|Degrees of freedom (statistics)]].

== The source of the bias ==

Suppose the mean of the whole population is 2050, but the statistician does not know that, and must estimate it based on this small sample chosen randomly from the population:

: <math> 2051,\quad 2053,\quad 2055,\quad 2050,\quad 2051 \, </math>

One may compute the sample average:

: <math> \frac{1}{5}\left(2051 + 2053 + 2055 + 2050 + 2051\right) = 2052</math>

This may serve as an observable estimate of the unobservable population average, which is 2050. Now we face the problem of estimating the population variance. That is the average of the squares of the deviations from 2050. If we knew that the population average is 2050, we could proceed as follows:

: <math>\begin{align}
{} & \frac{1}{5}\left[(2051 - 2050)^2 + (2053 - 2050)^2 + (2055 - 2050)^2 + (2050 - 2050)^2 + (2051 - 2050)^2\right] \\
=\; & \frac{36}{5} = 7.2
\end{align}</math>

But our estimate of the population average is the sample average, 2052, not 2050. Therefore we do what we can:

: <math>\begin{align}
{} & \frac{1}{5}\left[(2051 - 2052)^2 + (2053 - 2052)^2 + (2055 - 2052)^2 + (2050 - 2052)^2 + (2051 - 2052)^2\right] \\
=\; & \frac{16}{5} = 3.2
\end{align}</math>

This is a substantially smaller estimate. Now a question arises: is the estimate of the population variance that arises in this way using the sample mean ''always'' smaller than what we would get using the population mean? The answer is ''yes'', except when the sample mean happens to be the same as the population mean.

We are seeking the sum of squared distances from the population mean, but end up calculating the sum of squared differences from the sample mean, which, as will be seen, is the number that minimizes that sum of squared distances. So unless the sample happens to have the same mean as the population, this estimate will always underestimate the population variance.
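The two estimates in this worked example are easy to reproduce numerically; the following sketch (plain Python, using the same five observations) computes both the average squared deviation from the population mean and from the sample mean:

```python
sample = [2051, 2053, 2055, 2050, 2051]
n = len(sample)

population_mean = 2050            # known only in this illustration
sample_mean = sum(sample) / n     # 2052

# Average squared deviation from the true (population) mean
var_about_population_mean = sum((x - population_mean) ** 2 for x in sample) / n

# Average squared deviation from the sample mean -- never larger
var_about_sample_mean = sum((x - sample_mean) ** 2 for x in sample) / n

print(var_about_population_mean)  # 7.2
print(var_about_sample_mean)      # 3.2
```

As the text explains, the second number is smaller precisely because the sample mean is the point that minimizes the sum of squared deviations.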

To see why this happens, we use a simple algebraic identity:

: <math>(a + b)^2 = a^2 + 2ab + b^2\,</math>

with <math>a</math> representing the deviation of an individual observation from the sample mean, and <math>b</math> representing the deviation of the sample mean from the population mean. We have simply decomposed the actual deviation from the (unknown) population mean into two components: the deviation from the sample mean, which we can compute, and the additional deviation of the sample mean from the population mean, which we cannot. Now apply this identity to the squares of deviations from the population mean:

: <math>\begin{align}
{[}\,\underbrace{2053 - 2050}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the population} \\ \text{mean} \end{smallmatrix}}\,]^2 & = [\,\overbrace{(\,\underbrace{2053 - 2052}_{\begin{smallmatrix} \text{Deviation from} \\ \text{the sample mean} \end{smallmatrix}}\,)}^{\text{This is }a.} + \overbrace{(2052 - 2050)}^{\text{This is }b.}\,]^2 \\
& = \overbrace{(2053 - 2052)^2}^{\text{This is }a^2.} + \overbrace{2(2053 - 2052)(2052 - 2050)}^{\text{This is }2ab.} + \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.}
\end{align}</math>

Now apply this to all five observations and observe certain patterns:

: <math>\begin{align}
\overbrace{(2051 - 2052)^2}^{\text{This is }a^2.}\ +\ \overbrace{2(2051 - 2052)(2052 - 2050)}^{\text{This is }2ab.}\ +\ \overbrace{(2052 - 2050)^2}^{\text{This is }b^2.} \\
(2053 - 2052)^2\ +\ 2(2053 - 2052)(2052 - 2050)\ +\ (2052 - 2050)^2 \\
(2055 - 2052)^2\ +\ 2(2055 - 2052)(2052 - 2050)\ +\ (2052 - 2050)^2 \\
(2050 - 2052)^2\ +\ 2(2050 - 2052)(2052 - 2050)\ +\ (2052 - 2050)^2 \\
(2051 - 2052)^2\ +\ \underbrace{2(2051 - 2052)(2052 - 2050)}_{\begin{smallmatrix} \text{The sum of the entries in this} \\ \text{middle column must be 0.} \end{smallmatrix}}\ +\ (2052 - 2050)^2
\end{align}</math>

The sum of the entries in the middle column must be zero because the sum of the deviations from the sample average must be zero. When the middle column has vanished, we then observe that

* The sum of the entries in the first column (''a''<sup>2</sup>) is the sum of the squares of the deviations from the sample mean;

* The sum of ''all'' of the entries in the remaining two columns (''a''<sup>2</sup> and ''b''<sup>2</sup>) is the sum of squares of the deviations from the population mean, because of the way we started with [2053 − 2050]<sup>2</sup>, and did the same with the other four observations;

* The sum of ''all'' the entries must be bigger than the sum of the entries in the first column, since all the entries that have not vanished are positive (except when the population mean is the same as the sample mean, in which case all of the numbers in the last column will be 0).

Therefore:

* The sum of squares of the deviations from the ''population'' mean will be bigger than the sum of squares of the deviations from the ''sample'' mean (except when the population mean is the same as the sample mean, in which case the two are equal).

That is why the sum of squares of the deviations from the ''sample'' mean is too small to give an unbiased estimate of the population variance when the average of those squares is found.
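The column bookkeeping above can be verified directly. This short Python check (using the same sample) confirms that the middle (2''ab'') column sums to zero and that the three columns together reproduce the sum of squared deviations from the population mean:

```python
sample = [2051, 2053, 2055, 2050, 2051]
mu = 2050                          # population mean (known only in this illustration)
xbar = sum(sample) / len(sample)   # sample mean, 2052

a = [x - xbar for x in sample]     # deviations from the sample mean
b = xbar - mu                      # deviation of the sample mean from the population mean

first_column = sum(ai ** 2 for ai in a)       # sum of a^2 terms: 16
middle_column = sum(2 * ai * b for ai in a)   # sum of 2ab terms: 0, residuals sum to zero
last_column = len(sample) * b ** 2            # sum of b^2 terms: 20

total = first_column + middle_column + last_column
assert total == sum((x - mu) ** 2 for x in sample)  # 36, deviations from the population mean
```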

== Terminology ==

This correction is so common that the terms "sample variance" and "sample standard deviation" are frequently used to mean the corrected estimators (unbiased sample variance, less biased sample standard deviation), using ''n'' − 1. However, caution is needed: some calculators and software packages may provide for both, or only the more unusual formulation. <!-- For precision, in this article we use "standard deviation of the sample" to mean the actual standard deviation of the sample, which by definition uses ''n,'' and is a biased ''estimator'' of the population standard deviation. --> This article uses the following symbols and definitions:

:''μ'' is the population mean

:<math>\overline{x}\,</math> is the sample mean

:''σ''<sup>2</sup> is the population variance

:''s<sub>n</sub>''<sup>2</sup> is the biased sample variance (i.e. without Bessel's correction)

:''s''<sup>2</sup> is the unbiased sample variance (i.e. with Bessel's correction)

The standard deviations are then the square roots of the respective variances. Since the square root introduces bias, the terminology "uncorrected" and "corrected" is preferred for the standard deviation estimators:

:''s<sub>n</sub>'' is the uncorrected sample standard deviation (i.e. without Bessel's correction)

:''s'' is the corrected sample standard deviation (i.e. with Bessel's correction), which is less biased, but still biased
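Python's standard library reflects this split: the `statistics` module provides population (uncorrected, dividing by ''n'') and sample (Bessel-corrected, dividing by ''n'' − 1) versions of both estimators. A small illustration using the sample from the example above:

```python
import statistics

data = [2051, 2053, 2055, 2050, 2051]

s_n2 = statistics.pvariance(data)  # biased sample variance, divides by n      -> 3.2
s2 = statistics.variance(data)     # unbiased sample variance, divides by n-1  -> 4.0
s_n = statistics.pstdev(data)      # uncorrected sample standard deviation
s = statistics.stdev(data)         # corrected sample standard deviation (still slightly biased)
```

This is exactly the "both formulations" situation the text warns about: a user must check which convention a given function uses.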

== Formula ==

The sample mean is given by

:<math>\overline{x}=\frac{1}{n}\sum_{i=1}^n x_i.</math>

The biased sample variance is then written:

:<math>s_n^2 = \frac {1}{n} \sum_{i=1}^n \left(x_i - \overline{x} \right)^2 = \frac{\sum_{i=1}^n \left(x_i^2\right)}{n} - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n^2}</math>

and the unbiased sample variance is written:

:<math>s^2 = \frac {1}{n-1} \sum_{i=1}^n \left(x_i - \overline{x} \right)^2 = \frac{\sum_{i=1}^n \left(x_i^2\right)}{n-1} - \frac{\left(\sum_{i=1}^n x_i\right)^2}{(n-1)n} = \left(\frac{n}{n-1}\right)\,s_n^2.</math>
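As a sanity check on these formulas, the following sketch implements both estimators directly and confirms the shortcut (sum-of-squares) form and the ''n''/(''n'' − 1) relationship; the function names are illustrative, not from any standard library:

```python
def biased_sample_variance(xs):
    """s_n^2: mean squared deviation from the sample mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def unbiased_sample_variance(xs):
    """s^2: Bessel-corrected sample variance (divides by n - 1)."""
    n = len(xs)
    return biased_sample_variance(xs) * n / (n - 1)

xs = [2051, 2053, 2055, 2050, 2051]
n = len(xs)

# Shortcut form: sum of squares over n, minus the squared sum over n^2
shortcut = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2
assert abs(shortcut - biased_sample_variance(xs)) < 1e-6
assert abs(unbiased_sample_variance(xs) - 4.0) < 1e-9
```

Note that the shortcut form, while algebraically equal, can lose precision in floating point when the values are large relative to their spread, which is why the tolerance above is loose.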

== Proof of correctness - Alternate 1 ==

<div class="NavFrame collapsed" style="text-align: left">
<div class="NavHead">Click [show] to expand</div>
<div class="NavContent">
As a background fact, we use the identity <math>E[x^2] = \mu^2 + \sigma^2</math>, which follows from the definition of the standard deviation and [[linearity of expectation]].

A very helpful observation is that for any distribution, the variance equals half the expected value of <math>(x_1 - x_2)^2</math> when <math>x_1, x_2</math> are independent samples. To prove this observation we use that <math>E[x_1x_2] = E[x_1]E[x_2]</math> (which follows from independence) as well as linearity of expectation:

:<math>E[(x_1 - x_2)^2] = E[x_1^2] - E[2x_1x_2] + E[x_2^2] = (\sigma^2 + \mu^2) - 2\mu^2 + (\sigma^2 + \mu^2) = 2\sigma^2</math>

Now that the observation is proven, it suffices to show that the expected squared difference of two samples drawn with replacement from the sample <math>x_1, \ldots, x_n</math> equals <math>(n-1)/n</math> times the expected squared difference of two samples from the original distribution. To see this, note that when we pick <math>x_u</math> and <math>x_v</math> via ''u'', ''v'' being integers selected independently and uniformly from 1 to ''n'', a fraction <math>n/n^2 = 1/n</math> of the time we will have ''u'' = ''v'', and then the sampled squared difference is zero regardless of the original distribution. The remaining <math>1-1/n</math> of the time, the value of <math>E[(x_u-x_v)^2]</math> is the expected squared difference between two independent samples from the original distribution. Therefore, dividing the sample expected squared difference by <math>(1-1/n)</math>, or equivalently multiplying by <math>1/(1-1/n) = n/(n-1),</math> gives an unbiased estimate of the original expected squared difference.
</div>
</div>
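The pairwise argument can be checked exactly on the earlier five-observation sample: averaging half the squared difference over all ordered pairs (''u'', ''v'') reproduces the biased variance, while excluding the ''u'' = ''v'' pairs applies exactly the ''n''/(''n'' − 1) rescaling. A sketch:

```python
from itertools import product

xs = [2051, 2053, 2055, 2050, 2051]
n = len(xs)
mean = sum(xs) / n

sq_diffs = [(a - b) ** 2 for a, b in product(xs, repeat=2)]  # all n^2 ordered pairs

# Half the mean squared difference over all pairs equals the biased variance
biased = sum((x - mean) ** 2 for x in xs) / n
assert abs(sum(sq_diffs) / (2 * n * n) - biased) < 1e-9      # both 3.2

# Dropping the n zero-valued u = v pairs rescales by n/(n - 1): Bessel's correction
unbiased = sum(sq_diffs) / (2 * n * (n - 1))
assert abs(unbiased - biased * n / (n - 1)) < 1e-9           # both 4.0
```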

== Proof of correctness - Alternate 2 ==

<div class="NavFrame collapsed" style="text-align: left">
<div class="NavHead">Click [show] to expand</div>
<div class="NavContent">
Recycling an [[Variance#Population variance and sample variance|identity for variance]],

:<math>
\begin{align}
\sum_{i=1}^n \left(x_i - \overline{x} \right)^2 &= \sum_{i=1}^n \left(x_i - \frac 1 n \sum_{j=1}^n x_j \right)^2 \\
&= \sum_{i=1}^n x_i^2 - n \left(\frac 1 n \sum_{j=1}^n x_j \right)^2 \\
&= \sum_{i=1}^n x_i^2 - n \overline{x}^2
\end{align}
</math>

so

:<math>
\begin{align}
\operatorname{E}\left(\sum_{i=1}^n \left[x_i - \mu - \left(\overline{x} - \mu\right)\right]^2 \right)
&= \operatorname{E}\left(\sum_{i=1}^n (x_i-\mu)^2 - n (\overline{x}-\mu)^2 \right) \\
&= \sum_{i=1}^n \operatorname{E}\left((x_i-\mu)^2 \right) - n \operatorname{E}\left((\overline{x}-\mu)^2\right) \\
&= \sum_{i=1}^n \operatorname{Var}\left(x_i \right) - n \operatorname{Var}\left(\overline{x} \right)
\end{align}
</math>

and by definition,

:<math>
\begin{align}
\operatorname{E}(s^2)
& = \operatorname{E}\left(\sum_{i=1}^n \frac{(x_i-\overline{x})^2}{n-1} \right)\\
& = \frac{1}{n-1} \operatorname{E}\left(\sum_{i=1}^n \left[x_i - \mu - \left(\overline{x} - \mu\right)\right]^2 \right)\\
&= \frac{1}{n-1} \left[\sum_{i=1}^n \operatorname{Var}\left(x_i \right) - n \operatorname{Var}\left(\overline{x} \right)\right]
\end{align}
</math>

Note that, since ''x''<sub>1</sub>, ''x''<sub>2</sub>, . . . , ''x<sub>n</sub>'' are a random sample from a distribution with variance ''σ''<sup>2</sup>, it follows that for each ''i'' = 1, 2, . . . , ''n'':

:<math> \operatorname{Var}(x_i) = \sigma^2</math>

and also

:<math>\operatorname{Var}(\overline{x}) = \sigma^2/n</math>

This is a property of the variance of uncorrelated variables, arising from the [[Variance#Sum of uncorrelated variables (Bienaymé formula)|Bienaymé formula]]. The required result is then obtained by substituting these two formulae:

:<math>
\operatorname{E}(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^n \sigma^2 - n(\sigma^2/n)\right] = \frac{1}{n-1}(n\sigma^2-\sigma^2) = \sigma^2. \,
</math>
</div>
</div>

== Proof of correctness - Alternate 3 ==

<div class="NavFrame collapsed" style="text-align: left">
<div class="NavHead">Click [show] to expand</div>
<div class="NavContent">
The expected discrepancy between the biased estimator and the true variance is

:<math>
\begin{align}
E \left[ \sigma^2 - s_{\text{biased}}^2 \right] &= E\left[ \frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2 - \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2 \right] \\
&= \frac{1}{n} E\left[ \sum_{i=1}^n\left((x_i^2 - 2 x_i \mu + \mu^2) - (x_i^2 - 2 x_i \overline{x} + \overline{x}^2)\right) \right] \\
&= E\left[ \mu^2 - 2 \overline{x} \mu + \overline{x}^2 \right] \\
&= E\left[ (\overline{x} - \mu)^2 \right] \\
&= \operatorname{Var} (\overline{x}) \\
&= \frac{\sigma^2}{n}
\end{align}
\end{align}
</math>

(In the third line, summing over ''i'' yields a factor of ''n'' that cancels the 1/''n''.)

So, the expected value of the biased estimator is

:<math> \operatorname{E} \left[ s^2_{\text{biased}} \right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n} \sigma^2 </math>

So, an unbiased estimator is given by

:<math> s_{\text{unbiased}}^2 = \frac{n}{n-1} s_{\text{biased}}^2 </math>

=== Intuition ===

In the biased estimator, by using the sample mean instead of the true mean, you are underestimating each ''x<sub>i</sub>'' − ''µ'' by <span style="text-decoration: overline">''x''</span> − ''µ''. We know that the variance of a sum is the sum of the variances (for uncorrelated variables). So, to find the discrepancy between the biased estimator and the true variance, we just need to find the variance of <span style="text-decoration: overline">''x''</span> − ''µ''.

This is just the variance of the sample mean, which is ''σ''<sup>2</sup>/''n''. So, we expect that the biased estimator underestimates ''σ''<sup>2</sup> by ''σ''<sup>2</sup>/''n'', and so the biased estimator = (1 − 1/''n'') × the unbiased estimator = (''n'' − 1)/''n'' × the unbiased estimator.
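This expectation can be observed in a quick Monte Carlo experiment (a sketch; the sample size and trial count are arbitrary choices): with ''n'' = 5 draws from a standard normal (''σ''<sup>2</sup> = 1), the biased estimator averages close to (''n'' − 1)/''n'' = 0.8.

```python
import random

random.seed(0)            # fixed seed for reproducibility
n, trials = 5, 200_000
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    acc += sum((x - m) ** 2 for x in xs) / n   # biased estimator s_n^2

mean_biased = acc / trials
print(mean_biased)        # close to (n - 1)/n * sigma^2 = 0.8
```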
</div>
</div>

== See also ==

* [[Bias of an estimator]]
* [[Standard deviation]]
* [[Unbiased estimation of standard deviation]]

== Notes ==
{{Reflist}}

== External links ==
* {{MathWorld|urlname=BesselsCorrection|title=Bessel's Correction}}
* [http://www.khanacademy.org/cs/fishy-statistics-unbiased-estimate-of-population-variance/1183564841 Animated experiment demonstrating the correction, at Khan Academy]

{{DEFAULTSORT:Bessel's Correction}}
[[Category:Statistical deviation and dispersion]]
[[Category:Statistical inference]]