{{See also|Least squares|Mean squared error|Partition of sums of squares|Residual sum of squares}}

In [[probability theory]] and [[statistics]], '''[[variance]]''' is defined as the [[expected value]] (for a theoretical [[probability distribution|distribution]]) or the average value (for observed data) of the '''squared deviations''' from the mean. Computations for '''[[analysis of variance]]''' involve partitioning a sum of squared deviations into components. These computations are easier to follow after a detailed look at the expected value of the square of a random variable:

: <math>\operatorname{E}( X ^ 2 ).</math>

For a [[random variable]] <math>X</math> with mean <math>\mu</math> and variance <math>\sigma^2</math>,

: <math>\sigma^2 = \operatorname{E}( X ^ 2 ) - \mu^2.</math><ref>Mood & Graybill: ''An Introduction to the Theory of Statistics'' (McGraw-Hill)</ref>

Therefore

: <math>\operatorname{E}( X ^ 2 ) = \sigma^2 + \mu^2.</math>

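As a quick exact check of this identity, the following Python sketch uses a fair six-sided die (an illustrative choice, not part of the derivation) and rational arithmetic from the standard <code>fractions</code> module. It computes <math>\mu</math>, <math>\operatorname{E}(X^2)</math> and <math>\sigma^2</math> directly from their definitions.

```python
from fractions import Fraction

# Illustrative distribution (an assumption for this sketch):
# a fair six-sided die, each face 1..6 with probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mu = sum(p * x for x in faces)                    # E(X)
e_x2 = sum(p * x * x for x in faces)              # E(X^2)
sigma2 = sum(p * (x - mu) ** 2 for x in faces)    # E((X - mu)^2), the definition of variance

print(mu, e_x2, sigma2)           # 7/2 91/6 35/12
print(e_x2 == sigma2 + mu ** 2)   # True
```
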
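If <math>X_1, \ldots, X_n</math> are ''n'' [[independent and identically distributed random variables|independent observations]] of <math>X</math>, applying this identity to each <math>X_i</math> gives

: <math>\operatorname{E}\left( \sum_{i=1}^n X_i^2 \right) = n\sigma^2 + n\mu^2.</math>

The sum <math>\sum_{i=1}^n X_i</math> has mean <math>n\mu</math> and, by independence, variance <math>n\sigma^2</math>, so applying the same identity to the sum gives

: <math>\operatorname{E}\left( \left(\sum_{i=1}^n X_i \right)^ 2 \right) = n\sigma^2 + n^2\mu^2.</math>
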
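Both expectations can be checked exactly for a small case by enumerating every possible sample and weighting it by its probability. In this sketch the three-point distribution and the sample size ''n'' = 3 are arbitrary assumptions; any discrete distribution with independent draws should show the same agreement.

```python
from fractions import Fraction
from itertools import product

# Illustrative distribution (an assumption for this sketch): X takes the
# values 0, 1 and 3 with probabilities 1/2, 1/4 and 1/4.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
mu = sum(p * x for x, p in dist.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in dist.items())

n = 3
e_sum_sq = Fraction(0)   # E(sum of X_i^2)
e_sq_sum = Fraction(0)   # E((sum of X_i)^2)
# Enumerate every outcome of n independent draws; the probability of an
# outcome is the product of the individual probabilities.
for outcome in product(dist, repeat=n):
    prob = Fraction(1)
    for x in outcome:
        prob *= dist[x]
    e_sum_sq += prob * sum(x * x for x in outcome)
    e_sq_sum += prob * sum(outcome) ** 2

print(e_sum_sq == n * sigma2 + n * mu ** 2)        # True
print(e_sq_sum == n * sigma2 + n ** 2 * mu ** 2)   # True
```
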
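If <math>\hat{Y}</math> is a vector of ''n'' predictions and <math>Y</math> is the vector of the corresponding true values, the sum of squared errors (SSE) of the predictor is

: <math>\mathrm{SSE} = \sum_{i=1}^n(\hat{Y}_i - Y_i)^2.</math>

Some texts, particularly in machine learning, multiply this sum by <math>\tfrac{1}{2}</math> so that its derivative is simpler.
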
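A minimal Python sketch of the SSE for two equal-length sequences; the function name <code>sse</code> and the sample values are illustrative only. It computes the plain sum of squared differences; halve the result if the ½ convention is wanted.

```python
def sse(predictions, targets):
    """Sum of squared errors between predictions and true values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets))


y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 3.0]
print(sse(y_pred, y_true))  # 0.25 + 0 + 1 = 1.25
```
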
== Sample variance ==

The sum of squared deviations from the sample mean <math>\bar{x}</math>, <math>S = \sum (x - \bar{x})^2</math>, is needed to calculate the variance (before deciding whether to divide by ''n'' or ''n'' − 1). It is most easily calculated as

: <math>S = \sum x ^ 2 - \frac{\left(\sum x\right)^2}{n}.</math>

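The following sketch compares this computational formula with the definition <math>\sum (x - \bar{x})^2</math> on an illustrative data set, then divides by ''n'' − 1 and by ''n'' and compares the results with Python's <code>statistics.variance</code> and <code>statistics.pvariance</code>. The helper names are hypothetical. In floating-point arithmetic the shortcut can lose precision when the mean is large relative to the spread, because it subtracts two nearly equal quantities; the direct form avoids this.

```python
import statistics


def sum_sq_dev_shortcut(xs):
    """S = sum(x^2) - (sum x)^2 / n, the computational form."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def sum_sq_dev_direct(xs):
    """S = sum((x - mean)^2), the definition."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]
s1 = sum_sq_dev_shortcut(data)
s2 = sum_sq_dev_direct(data)
print(s1, s2)  # 32.0 32.0

# Divisor n - 1 gives the sample variance, divisor n the population variance.
print(s1 / (len(data) - 1), statistics.variance(data))
print(s1 / len(data), statistics.pvariance(data))
```
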
Substituting the two expectations derived above, the expected value of this sum is

: <math>\operatorname{E}(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n},</math>

which simplifies to

: <math>\operatorname{E}(S) = (n - 1)\sigma^2. </math>

This justifies the divisor ''n'' − 1: <math>S/(n - 1)</math> is an '''unbiased''' sample estimate of ''σ''<sup>2</sup>, whereas dividing by ''n'' underestimates it on average.

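To illustrate the bias argument, this sketch enumerates every possible sample of size ''n'' from a fair coin scored 0 or 1 (an assumed toy distribution with <math>\sigma^2 = 1/4</math>) and computes the expected value of ''S'' exactly with fractions for several values of ''n''.

```python
from fractions import Fraction
from itertools import product

# Toy distribution (an assumption for this sketch): a fair coin scored 0 or 1,
# so mu = 1/2 and sigma^2 = 1/4.
dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}
mu = sum(p * x for x, p in dist.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in dist.items())


def sum_sq_dev(xs):
    """S = sum(x^2) - (sum x)^2 / n, kept exact with Fraction."""
    return sum(x * x for x in xs) - Fraction(sum(xs) ** 2, len(xs))


expected_s = {}
for n in range(2, 6):
    total = Fraction(0)
    # Every outcome of n independent tosses, weighted by its probability.
    for outcome in product(dist, repeat=n):
        prob = Fraction(1)
        for x in outcome:
            prob *= dist[x]
        total += prob * sum_sq_dev(outcome)
    expected_s[n] = total

for n, e_s in expected_s.items():
    # E(S) equals (n - 1) sigma^2; dividing by n instead gives a smaller value.
    print(n, e_s, (n - 1) * sigma2, e_s / (n - 1), e_s / n)
```
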
== Partition — analysis of variance ==

Suppose data are available for ''k'' different treatment groups, where group ''i'' (<math>i = 1, \ldots, k</math>) has size ''n<sub>i</sub>'' and the total number of observations is <math>n = \sum_{i=1}^k n_i</math>. It is assumed that the expected mean of group ''i'' is

: <math>\mu_i = \mu + T_i,</math>

where <math>T_i</math> is the effect of treatment ''i'', and that the variance within each treatment group is unchanged from the population variance <math>\sigma^2</math>.

Under the null hypothesis that the treatments have no effect, every <math>T_i</math> is zero.

Writing <math>x_{ij}</math> for the ''j''th observation in group ''i'', three sums of squares can now be calculated.

;Individual

:<math>I = \sum_{i=1}^k \sum_{j=1}^{n_i} x_{ij}^2</math>

:<math>\operatorname{E}(I) = n\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2</math>

Under the null hypothesis this simplifies to

:<math>\operatorname{E}(I) = n\sigma^2 + n\mu^2.</math>

;Treatments

:<math>T = \sum_{i=1}^k \frac{\left(\sum_{j=1}^{n_i} x_{ij}\right)^2}{n_i}</math>

:<math>\operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2</math>

:<math>\operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^k n_iT_i + \sum_{i=1}^k n_iT_i^2</math>

Under the null hypothesis that the treatments cause no differences and all the <math>T_i</math> are zero, the expectation simplifies to

:<math>\operatorname{E}(T) = k\sigma^2 + n\mu^2.</math>

;Combination

:<math>C = \frac{\left(\sum_{i=1}^k \sum_{j=1}^{n_i} x_{ij}\right)^2}{n}</math>

:<math>\operatorname{E}(C) = \sigma^2 + \frac{1}{n}\left(n\mu + \sum_{i=1}^k n_iT_i\right)^2</math>

Under the null hypothesis this simplifies to

:<math>\operatorname{E}(C) = \sigma^2 + n\mu^2.</math>

===Sums of squared deviations===

Under the null hypothesis, the expected difference between any pair of ''I'', ''T'' and ''C'' depends only on <math>\sigma^2</math>, not on <math>\mu</math>:

:<math>\operatorname{E}(I - C) = (n - 1)\sigma^2</math> for the total squared deviations, also known as the ''[[total sum of squares]]''

:<math>\operatorname{E}(T - C) = (k - 1)\sigma^2</math> for the treatment squared deviations, also known as the ''[[explained sum of squares]]''

:<math>\operatorname{E}(I - T) = (n - k)\sigma^2</math> for the residual squared deviations, also known as the ''[[residual sum of squares]]''

The constants (''n'' − 1), (''k'' − 1), and (''n'' − ''k'') are referred to as the numbers of [[degrees of freedom (statistics)|degrees of freedom]]. Dividing each sum of squared deviations by its degrees of freedom therefore gives, under the null hypothesis, an unbiased estimate of <math>\sigma^2</math>.

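Under the null hypothesis these three expectations can be verified exactly on a toy design. The sketch assumes ''k'' = 2 groups of sizes 1 and 2 and a three-point distribution shared by all observations (both arbitrary choices), enumerates all 27 possible outcomes, and averages ''I'' − ''C'', ''T'' − ''C'' and ''I'' − ''T'' weighted by their probabilities. The helper <code>sums_of_squares</code> is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumptions for this sketch: k = 2 groups of sizes 1 and 2 (n = 3), and,
# under the null hypothesis, every observation is drawn independently from
# the same distribution: values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
sizes = [1, 2]
k, n = len(sizes), sum(sizes)
mu = sum(p * x for x, p in dist.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in dist.items())


def sums_of_squares(groups):
    """Return (I, T, C) for a list of groups of observations."""
    everything = [x for g in groups for x in g]
    I = sum(x * x for x in everything)
    T = sum(Fraction(sum(g) ** 2, len(g)) for g in groups)
    C = Fraction(sum(everything) ** 2, len(everything))
    return I, T, C


e_total = e_treat = e_resid = Fraction(0)
for outcome in product(dist, repeat=n):
    prob = Fraction(1)
    for x in outcome:
        prob *= dist[x]
    # Split the flat outcome into consecutive groups of the given sizes.
    groups, start = [], 0
    for size in sizes:
        groups.append(outcome[start:start + size])
        start += size
    I, T, C = sums_of_squares(groups)
    e_total += prob * (I - C)
    e_treat += prob * (T - C)
    e_resid += prob * (I - T)

print(e_total == (n - 1) * sigma2)  # True
print(e_treat == (k - 1) * sigma2)  # True
print(e_resid == (n - k) * sigma2)  # True
```
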
===Example===

As a very simple example, five observations arise from two treatments. The first treatment gives the three values 1, 2, and 3, and the second treatment gives the two values 4 and 6.

:<math>I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66</math>

:<math>T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62</math>

:<math>C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2</math>

With ''n'' = 5 and ''k'' = 2, this gives:

: Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
: Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
: Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.

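The arithmetic of this example can be reproduced with a short Python sketch; <code>Fraction</code> keeps the intermediate values exact, and the results are converted to floats only for display.

```python
from fractions import Fraction

# The two treatment groups from the example.
groups = [[1, 2, 3], [4, 6]]
values = [x for g in groups for x in g]
n, k = len(values), len(groups)

I = sum(x * x for x in values)                          # 66
T = sum(Fraction(sum(g) ** 2, len(g)) for g in groups)  # 12 + 50 = 62
C = Fraction(sum(values) ** 2, n)                       # 256/5 = 51.2

print("total:    ", float(I - C), "with", n - 1, "df")  # 14.8 with 4 df
print("treatment:", float(T - C), "with", k - 1, "df")  # 10.8 with 1 df
print("residual: ", float(I - T), "with", n - k, "df")  # 4.0 with 3 df
```
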
==Two-way analysis of variance==

The following hypothetical example gives the yields of 15 plants subject to two different environmental variations and three different fertilisers.

{| class="wikitable"
|-
!
! Extra CO<sub>2</sub>
! Extra humidity
|-
| No fertiliser
| 7, 2, 1
| 7, 6
|-
| Nitrate
| 11, 6
| 10, 7, 3
|-
| Phosphate
| 5, 3, 4
| 11, 4
|}

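Five sums of squares are calculated. As in the one-way case, the <math>\sigma^2</math> column gives the coefficient of <math>\sigma^2</math> in the expected value of each sum, which equals the number of totals being squared (15 individual values, 6 cells, 3 fertilisers, 2 environments and 1 grand total).
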
{| class="wikitable"
|-
! Factor
! Calculation
! Sum
! <math>\sigma^2</math>
|-
| Individual
| <math>7^2+2^2+1^2 + 7^2+6^2 + 11^2+6^2 + 10^2+7^2+3^2 + 5^2+3^2+4^2 + 11^2+4^2</math>
| 641
| 15
|-
| Fertiliser × Environment
| <math>\frac{(7+2+1)^2}{3} + \frac{(7+6)^2}{2} + \frac{(11+6)^2}{2} + \frac{(10+7+3)^2}{3} + \frac{(5+3+4)^2}{3} + \frac{(11+4)^2}{2}</math>
| 556.1667
| 6
|-
| Fertiliser
| <math>\frac{(7+2+1+7+6)^2}{5} + \frac{(11+6+10+7+3)^2}{5} + \frac{(5+3+4+11+4)^2}{5}</math>
| 525.4
| 3
|-
| Environment
| <math>\frac{(7+2+1+11+6+5+3+4)^2}{8} + \frac{(7+6+10+7+3+11+4)^2}{7} </math>
| 519.2679
| 2
|-
| Composite
| <math>\frac{(7+2+1+11+6+5+3+4+7+6+10+7+3+11+4)^2}{15} </math>
| 504.6
| 1
|}

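The five sums can be recomputed from the yield table with the following sketch. The nested-dictionary layout of the data and the helper <code>squared_total</code> are assumptions made for this illustration; exact fractions are used and rounded only when printed.

```python
from fractions import Fraction

# Yields from the data table: data[fertiliser][environment] -> plant yields.
data = {
    "No fertiliser": {"Extra CO2": [7, 2, 1], "Extra humidity": [7, 6]},
    "Nitrate": {"Extra CO2": [11, 6], "Extra humidity": [10, 7, 3]},
    "Phosphate": {"Extra CO2": [5, 3, 4], "Extra humidity": [11, 4]},
}


def squared_total(values):
    """(sum of values)^2 / count, kept exact as a Fraction."""
    return Fraction(sum(values) ** 2, len(values))


cells = [ys for by_env in data.values() for ys in by_env.values()]
everything = [y for ys in cells for y in ys]
environments = sorted({env for by_env in data.values() for env in by_env})

sums = {
    "Individual": sum(y * y for y in everything),
    "Fertiliser x Environment": sum(squared_total(ys) for ys in cells),
    "Fertiliser": sum(
        squared_total([y for ys in by_env.values() for y in ys])
        for by_env in data.values()
    ),
    "Environment": sum(
        squared_total([y for by_env in data.values() for y in by_env[env]])
        for env in environments
    ),
    "Composite": squared_total(everything),
}

for name, value in sums.items():
    print(f"{name:<26}{float(value):10.4f}")
```
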
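Finally, the sums of squared deviations required for the [[analysis of variance]] can be calculated. Each of the last five columns lists the coefficients (+1 or −1) with which the five sums are combined; applying the same coefficients to the <math>\sigma^2</math> column gives the degrees of freedom.
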
{| class="wikitable"
|-
! Factor
! Sum
! <math>\sigma^2</math>
! Total
! Environment
! Fertiliser
! Fertiliser × Environment
! Residual
|-
| Individual
| 641
| 15
| 1
|
|
|
| 1
|-
| Fertiliser × Environment
| 556.1667
| 6
|
|
|
| 1
| −1
|-
| Fertiliser
| 525.4
| 3
|
|
| 1
| −1
|
|-
| Environment
| 519.2679
| 2
|
| 1
|
| −1
|
|-
| Composite
| 504.6
| 1
| −1
| −1
| −1
| 1
|
|-
|
|
|
|
|
|
|
|
|-
| Squared deviations
|
|
| 136.4
| 14.668
| 20.8
| 16.099
| 84.833
|-
| Degrees of freedom
|
|
| 14
| 1
| 2
| 2
| 9
|}

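The last two rows follow mechanically from the coefficient columns. This sketch copies the five sums (as rounded in the table of sums) and their <math>\sigma^2</math> coefficients, then applies each column's +1/−1 coefficients to both; the dictionary names are illustrative.

```python
from fractions import Fraction

# Five sums and their sigma^2 coefficients, copied from the table of sums
# (non-integer sums are already rounded to four decimal places there).
sums = {
    "Individual": (Fraction("641"), 15),
    "Fertiliser x Environment": (Fraction("556.1667"), 6),
    "Fertiliser": (Fraction("525.4"), 3),
    "Environment": (Fraction("519.2679"), 2),
    "Composite": (Fraction("504.6"), 1),
}

# The +1/-1 entries of each column of the coefficient table.
coefficients = {
    "Total": {"Individual": 1, "Composite": -1},
    "Environment": {"Environment": 1, "Composite": -1},
    "Fertiliser": {"Fertiliser": 1, "Composite": -1},
    "Fertiliser x Environment": {
        "Fertiliser x Environment": 1,
        "Fertiliser": -1,
        "Environment": -1,
        "Composite": 1,
    },
    "Residual": {"Individual": 1, "Fertiliser x Environment": -1},
}

results = {}
for column, coefs in coefficients.items():
    # One linear combination gives both the squared deviations (from the sums)
    # and the degrees of freedom (from the sigma^2 coefficients).
    squared_dev = sum(c * sums[name][0] for name, c in coefs.items())
    dof = sum(c * sums[name][1] for name, c in coefs.items())
    results[column] = (squared_dev, dof)
    print(f"{column:<26}{float(squared_dev):10.3f}{dof:4d}")
```
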
==See also==
* [[Variance decomposition]]
* [[Errors and residuals in statistics]]
* [[Absolute deviation]]

==References==
<references/>

[[Category:Statistical deviation and dispersion]]
[[Category:Analysis of variance]]