[[Image:Total least squares.svg|right|thumb|200px|Deming regression. The red lines show the error in both ''x'' and ''y''. This is different from the traditional least squares method, which measures error parallel to the ''y'' axis. The case shown, with deviations measured perpendicularly, arises when ''x'' and ''y'' have equal error variances.]]
In [[statistics]], '''Deming regression''', named after [[W. Edwards Deming]], is an [[errors-in-variables model]] which tries to find the [[line of best fit]] for a two-dimensional dataset. It differs from [[simple linear regression]] in that it accounts for [[errors and residuals in statistics|errors]] in observations on both the ''x''- and the ''y''-axis. It is a special case of [[total least squares]], which allows for any number of predictors and a more complicated error structure.

Deming regression is equivalent to the [[maximum likelihood]] estimation of an [[errors-in-variables model]] in which the errors for the two variables are assumed to be independent and [[normal distribution|normally distributed]], and the ratio of their variances, denoted ''δ'', is known.<ref>{{harv|Linnet|1993}}</ref> In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.

Deming regression is only slightly more difficult to compute than [[simple linear regression]]. Many software packages used in clinical chemistry, such as [[Analyse-it]], EP Evaluator, [[MedCalc]] and [[S-PLUS]], offer Deming regression.

The model was originally introduced by {{harvtxt|Adcock|1878}}, who considered the case ''δ'' = 1, and then more generally by {{harvtxt|Kummell|1879}} with arbitrary ''δ''. However, their ideas remained largely unnoticed for more than 50 years, until they were revived by {{harvtxt|Koopmans|1937}} and later propagated even more by {{harvtxt|Deming|1943}}. The latter book became so popular in [[clinical chemistry]] and related fields that the method was even dubbed ''Deming regression'' in those fields.<ref>Cornbleet, Gochman (1979)</ref>
== Specification ==

Assume that the available data (''y<sub>i</sub>'', ''x<sub>i</sub>'') are measured observations of the "true" values (''y<sub>i</sub>*'', ''x<sub>i</sub>*''):

: <math>\begin{align}
y_i &= y^*_i + \varepsilon_i, \\
x_i &= x^*_i + \eta_i,
\end{align}</math>

where errors ''ε'' and ''η'' are independent and the ratio of their variances is assumed to be known:

: <math> \delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}. </math>
In practice, the error variances of the ''x'' and ''y'' measurements are often unknown, which complicates the estimation of <math>\delta</math>. However, when ''x'' and ''y'' are measured with the same method, their error variances are likely to be equal, so that <math>\delta = 1</math>.

We seek to find the line of "best fit" ''y*'' = ''β''<sub>0</sub> + ''β''<sub>1</sub>''x*'', such that the weighted sum of squared residuals of the model is minimized:<ref>Fuller, ch.1.3.3</ref>

: <math>SSR = \sum_{i=1}^n\bigg(\frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2}\bigg) = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^n\Big((y_i-\beta_0-\beta_1x^*_i)^2 + \delta(x_i-x^*_i)^2\Big) \ \to\ \min_{\beta_0,\beta_1,x_1^*,\ldots,x_n^*}</math>
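
The objective can be evaluated directly. The following is a minimal Python sketch; the helper names <code>scaled_ssr</code> and <code>best_x_star</code> are hypothetical and not from any library. It computes ''σ''<sub>ε</sub><sup>2</sup>·SSR (the constant factor 1/''σ''<sub>ε</sub><sup>2</sup> does not change the minimizer) and, for a fixed line, the ''x''<sub>''i''</sub>* that minimize it, showing that each term then collapses to a function of the ordinary vertical residual.

```python
# Hypothetical helpers illustrating the Deming objective; not from any library.

def scaled_ssr(x, y, beta0, beta1, x_star, delta):
    """sigma_eps^2 * SSR: sum of (y_i - beta0 - beta1*x*_i)^2 + delta*(x_i - x*_i)^2."""
    return sum(
        (yi - beta0 - beta1 * xs) ** 2 + delta * (xi - xs) ** 2
        for xi, yi, xs in zip(x, y, x_star)
    )


def best_x_star(x, y, beta0, beta1, delta):
    """x*_i minimizing scaled_ssr for a fixed line (beta0, beta1).

    Setting the derivative of each term with respect to x*_i to zero gives
    x*_i = x_i + beta1 / (beta1**2 + delta) * (y_i - beta0 - beta1 * x_i).
    """
    k = beta1 / (beta1 ** 2 + delta)
    return [xi + k * (yi - beta0 - beta1 * xi) for xi, yi in zip(x, y)]


# With the optimal x*, each term collapses to delta * r_i**2 / (beta1**2 + delta),
# where r_i = y_i - beta0 - beta1 * x_i is the ordinary vertical residual.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.2, 2.8]
beta0, beta1, delta = 0.0, 1.0, 2.0
xs = best_x_star(x, y, beta0, beta1, delta)
profiled = sum(
    delta * (yi - beta0 - beta1 * xi) ** 2 / (beta1 ** 2 + delta)
    for xi, yi in zip(x, y)
)
print(scaled_ssr(x, y, beta0, beta1, xs, delta), profiled)  # equal up to rounding
```

Minimizing the profiled sum over ''β''<sub>0</sub> and ''β''<sub>1</sub> alone is what leads to the closed-form estimates of the Solution section.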
== Solution ==

The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from ''i'' = 1 to ''n''):

: <math>\begin{align}
& \overline{x} = \frac{1}{n}\sum x_i, \quad \overline{y} = \frac{1}{n}\sum y_i, \\
& s_{xx} = \tfrac{1}{n-1}\sum (x_i-\overline{x})^2, \\
& s_{xy} = \tfrac{1}{n-1}\sum (x_i-\overline{x})(y_i-\overline{y}), \\
& s_{yy} = \tfrac{1}{n-1}\sum (y_i-\overline{y})^2.
\end{align}</math>
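
These moments translate directly into code. The function below is a minimal sketch; its name <code>sample_moments</code> is hypothetical, and it uses the same 1/(''n'' − 1) normalization as the formulas above.

```python
def sample_moments(x, y):
    """Return (xbar, ybar, sxx, sxy, syy) with the 1/(n-1) normalization."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return xbar, ybar, sxx, sxy, syy


print(sample_moments([1.0, 2.0, 3.0], [2.0, 4.0, 7.0]))
```

The choice of divisor does not affect the slope estimate: multiplying ''s''<sub>''xx''</sub>, ''s''<sub>''xy''</sub> and ''s''<sub>''yy''</sub> by the same positive factor scales the numerator and denominator of the slope formula equally.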
Finally, the least-squares estimates of the model's parameters are<ref>Glaister (2001)</ref>

: <math>\begin{align}
& \hat\beta_1 = \frac{s_{yy}-\delta s_{xx} + \sqrt{(s_{yy}-\delta s_{xx})^2 + 4\delta s_{xy}^2}}{2s_{xy}}, \\
& \hat\nu_1 = -\frac{1}{\hat\beta_1} = \frac{-2 s_{xy}}{s_{yy}-\delta s_{xx} + \sqrt{(s_{yy}-\delta s_{xx})^2 + 4\delta s_{xy}^2}}, \\
& \hat\beta_0 = \overline{y} - \hat\beta_1\overline{x}, \\
& \hat{x}_i^* = x_i + \frac{\hat\beta_1}{\hat\beta_1^2+\delta}(y_i-\hat\beta_0-\hat\beta_1x_i),
\end{align}</math>

where <math>\hat\nu_1</math> is the slope of the line perpendicular to the fitted regression line.
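
The closed-form estimates can be sketched in Python as follows. The function name <code>deming</code> and its signature are hypothetical; it assumes ''δ'' > 0 is known and ''s''<sub>''xy''</sub> ≠ 0, since the slope formula divides by ''s''<sub>''xy''</sub>.

```python
import math


def deming(x, y, delta=1.0):
    """Deming regression estimates (beta0, beta1, x_star) for a known
    error-variance ratio delta = var(eps) / var(eta).

    Follows the closed-form moment formulas; requires s_xy != 0.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if delta <= 0:
        raise ValueError("delta must be positive")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("s_xy is zero; the slope formula is undefined")
    d = syy - delta * sxx
    beta1 = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    beta0 = ybar - beta1 * xbar
    # Estimated "true" x values: projection of each point onto the fitted line.
    k = beta1 / (beta1 ** 2 + delta)
    x_star = [xi + k * (yi - beta0 - beta1 * xi) for xi, yi in zip(x, y)]
    return beta0, beta1, x_star


# Points lying exactly on y = 1 + 2x recover that line for any delta.
b0, b1, xs = deming([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], delta=1.0)
print(b0, b1)  # close to 1.0 and 2.0
```

As a sanity check, for very large ''δ'' (errors in ''x'' negligible compared with errors in ''y'') the slope approaches the ordinary least-squares slope ''s''<sub>''xy''</sub>/''s''<sub>''xx''</sub>.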
==The case of equal error variances==

When <math>\delta=1</math>, Deming regression becomes [[orthogonal regression]]: it minimizes the sum of squared perpendicular distances from the data points to the regression line. In this case, denote each observation as a point ''z''<sub>''j''</sub> in the complex plane (i.e., the point (''x''<sub>''j''</sub>, ''y''<sub>''j''</sub>) is written as ''z''<sub>''j''</sub> = ''x''<sub>''j''</sub> + ''iy''<sub>''j''</sub>, where ''i'' is the [[imaginary unit]]). Let ''c'' be the [[centroid]] of the data points, that is, the point whose horizontal and vertical coordinates are the averages of those of the data points (also written as a complex number). Denote as ''Z'' the sum of the squared differences of the data points from the centroid, where the squares are taken as complex numbers: ''Z'' = Σ<sub>''j''</sub> (''z''<sub>''j''</sub> − ''c'')<sup>2</sup>. Then:<ref>Minda and Phelps (2008), Theorem 2.3.</ref>

*If ''Z'' = 0, then every line through the centroid is a line of best orthogonal fit.
*If ''Z'' ≠ 0, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to <math>\sqrt{Z}</math>.
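
The complex-number recipe above fits in a few lines of Python. This is a minimal sketch; the function name <code>orthogonal_fit_complex</code> and the tolerance used to decide ''Z'' = 0 are assumptions made for illustration. It returns the centroid and a direction vector √''Z'' for the fitted line.

```python
import cmath


def orthogonal_fit_complex(x, y, tol=1e-12):
    """Orthogonal (delta = 1) regression line via the complex-number recipe.

    Returns (centroid, direction) as complex numbers, or (centroid, None)
    when |Z| <= tol, i.e. every line through the centroid fits equally well.
    The tolerance tol is an arbitrary illustrative choice.
    """
    z = [complex(xi, yi) for xi, yi in zip(x, y)]
    c = sum(z) / len(z)                    # centroid
    Z = sum((zj - c) ** 2 for zj in z)     # complex squares, not |zj - c|**2
    if abs(Z) <= tol:
        return c, None
    return c, cmath.sqrt(Z)                # line through c, parallel to sqrt(Z)


c, w = orthogonal_fit_complex([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(c, w.imag / w.real)  # centroid (1.5+4j) and a slope of approximately 2
```

For non-degenerate data, the slope of √''Z'' agrees with the Deming slope <math>\hat\beta_1</math> computed with ''δ'' = 1.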
A [[trigonometry|trigonometric]] representation of the orthogonal regression line was given by Coolidge in 1913.<ref>Coolidge, J. L. (1913).</ref>

===Application===

In the case of three [[Line (geometry)|non-collinear]] points in the plane, the [[triangle]] with these points as its [[vertex (geometry)|vertices]] has a unique [[Steiner inellipse]] that is tangent to the triangle's sides at their midpoints. The [[Ellipse#Elements of an ellipse|major axis of this ellipse]] falls on the orthogonal regression line for the three vertices.<ref>Minda and Phelps (2008), Corollary 2.4.</ref>
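
This corollary can be checked numerically. The sketch below relies on an additional fact not stated in this article, Marden's theorem: the foci of the Steiner inellipse of a triangle with complex vertices ''a'', ''b'', ''c'' are the roots of the derivative of ''p''(''z'') = (''z'' − ''a'')(''z'' − ''b'')(''z'' − ''c''). The major axis is the line through the two foci, so its direction should be parallel to √''Z'' and its midpoint should be the centroid. The function names are hypothetical.

```python
import cmath


def steiner_foci(a, b, c):
    """Foci of the Steiner inellipse of triangle (a, b, c), assuming Marden's
    theorem: they are the roots of p'(z) = 3z^2 - 2(a+b+c)z + (ab+bc+ca)."""
    s = a + b + c
    e = a * b + b * c + c * a
    root = cmath.sqrt(s * s - 3 * e)   # p'(z) = 0  <=>  z = (s +/- root) / 3
    return (s + root) / 3, (s - root) / 3


def orthogonal_direction(points):
    """sqrt(Z) for points given as complex numbers (orthogonal fit direction)."""
    g = sum(points) / len(points)
    return cmath.sqrt(sum((p - g) ** 2 for p in points))


a, b, c = 0j, 4 + 0j, 1 + 3j
f1, f2 = steiner_foci(a, b, c)
w = orthogonal_direction([a, b, c])
axis = f1 - f2
# Two directions are parallel when Im(conj(axis) * w), their "cross product", is zero.
print(abs((axis.conjugate() * w).imag) < 1e-9)
# The midpoint of the foci is the centroid, so the axis passes through it.
print(abs((f1 + f2) / 2 - (a + b + c) / 3) < 1e-12)
```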
==Notes==
{{Reflist}}
==References==
* {{cite journal|last=Adcock|first=R. J.|year=1878|title=A problem in least squares|journal=The Analyst|volume=5|issue=2|pages=53–54|publisher=Annals of Mathematics|doi=10.2307/2635758|jstor=2635758}}
* {{cite journal|last=Coolidge|first=J. L.|year=1913|title=Two geometrical applications of the mathematics of least squares|journal=[[American Mathematical Monthly|The American Mathematical Monthly]]|volume=20|issue=6|pages=187–190}}
* {{cite journal|last1=Cornbleet|first1=P. J.|last2=Gochman|first2=N.|year=1979|title=Incorrect Least-Squares Regression Coefficients|journal=Clin. Chem.|volume=25|issue=3|pages=432–438|pmid=262186}}
* {{cite book|last=Deming|first=W. E.|authorlink=W. Edwards Deming|year=1943|title=Statistical adjustment of data|publisher=Wiley, NY (Dover Publications edition, 1985)|isbn=0-486-64685-8}}
* {{cite book|last=Fuller|first=Wayne A.|year=1987|title=Measurement error models|publisher=John Wiley & Sons, Inc|isbn=0-471-86187-1}}
* {{cite journal|last=Glaister|first=P.|date=March 2001|title=Least squares revisited|journal=[[The Mathematical Gazette]]|volume=85|pages=104–107}}
* {{cite book|last=Koopmans|first=T. C.|year=1937|title=Linear regression analysis of economic time series|publisher=DeErven F. Bohn, Haarlem, Netherlands}}
* {{cite journal|last=Kummell|first=C. H.|year=1879|title=Reduction of observation equations which contain more than one observed quantity|journal=The Analyst|volume=6|issue=4|pages=97–105|publisher=Annals of Mathematics|doi=10.2307/2635646|jstor=2635646}}
* {{cite journal|last=Linnet|first=K.|year=1993|title=Evaluation of regression procedures for method comparison studies|journal=Clinical Chemistry|volume=39|issue=3|pages=424–432|url=http://www.clinchem.org/cgi/reprint/39/3/424|pmid=8448852}}
* {{cite journal|last1=Minda|first1=D.|author1-link=David Minda|last2=Phelps|first2=S.|year=2008|title=Triangles, ellipses, and cubic polynomials|journal=[[American Mathematical Monthly]]|volume=115|issue=8|pages=679–689|mr=2456092|url=http://www.geogebra.org/en/upload/files/english/steve_phelps/minda%20phelps.pdf}}
{{DEFAULTSORT:Deming Regression}}
[[Category:Regression analysis]]