|
|
| {{Regression bar}}
| |
| {{About|the mathematics that underlie curve fitting using linear least squares|statistical regression analysis using least squares|linear regression|linear regression on a single variable|simple linear regression|other uses|ordinary least squares|and|regression analysis}}
| |
|
| |
|
| In [[statistics]] and [[mathematics]], '''linear least squares''' is an approach to fitting a [[mathematical model|mathematical]] or [[statistical model]] to [[data]] in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown [[parameter]]s of the model. The resulting fitted model can be used to [[descriptive statistics|summarize]] the data, to [[prediction|predict]] unobserved values from the same system, and to understand the mechanisms that may underlie the system.
| |
|
| |
|
| Mathematically, linear least squares is the problem of approximately solving an [[overdetermined system]] of linear equations, where the best approximation is defined as that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called "linear" least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are [[Convex function|convex]] and have a [[closed-form expression|closed-form solution]] that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, [[non-linear least squares]] problems generally must be solved by an [[iterative method|iterative procedure]], and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the [[Minimum mean square error|Bayesian MMSE estimator]].
| | |
| In statistics, linear least squares problems correspond to a particularly important type of [[statistical model]] called [[linear regression]] which arises as a particular form of [[regression analysis]]. One basic form of such a model is an [[ordinary least squares]] model. The present article concentrates on the mathematical aspects of linear least squares problems, with discussion of the formulation and interpretation of statistical regression models and [[statistical inference]]s related to these being dealt with in the articles just mentioned. See [[outline of regression analysis]] for an outline of the topic.
| |
| | |
| == Motivational example ==
| |
| {{see also|Polynomial regression}}
| |
| [[Image:Linear least squares example2.svg|right|thumb|A plot of the data points (in red), the least squares line of best fit (in blue), and the residuals (in green).]]
| |
| | |
| As a result of an experiment, four <math>(x, y)</math> data points were obtained, <math>(1, 6),</math> <math>(2, 5),</math> <math>(3, 7),</math> and <math>(4, 10)</math> (shown in red in the picture on the right). We hope to find a line <math>y=\beta_1+\beta_2 x</math> that best fits these four points. In other words, we would like to find the numbers <math>\beta_1</math> and <math>\beta_2</math> that approximately solve the overdetermined linear system
| |
| :<math>\begin{alignat}{3} \beta_1 + 1\beta_2 &&\; = \;&& 6 & \\ \beta_1 + 2\beta_2 &&\; = \;&& 5 & \\ \beta_1 + 3\beta_2 &&\; = \;&& 7 & \\ \beta_1 + 4\beta_2 &&\; = \;&& 10 & \end{alignat}</math>
| |
| of four equations in two unknowns in some "best" sense.
| |
| | |
| The [[least squares]] approach to solving this problem is to try to make as small as possible the sum of squares of "errors" between the right- and left-hand sides of these equations, that is, to find the [[maxima and minima|minimum]] of the function
| |
| | |
| : <math>\begin{align}S(\beta_1, \beta_2) =& \left[6-(\beta_1+1\beta_2)\right]^2 +\left[5-(\beta_1+2\beta_2) \right]^2 \\ &+\left[7-(\beta_1 + 3\beta_2)\right]^2 +\left[10-(\beta_1 + 4\beta_2)\right]^2 \\ &= 4\beta_1^2 + 30\beta_2^2 + 20\beta_1\beta_2 - 56\beta_1 - 154\beta_2 + 210 .\end{align}</math>
| |
| | |
| The minimum is determined by calculating the [[partial derivative]]s of <math>S(\beta_1, \beta_2)</math> with respect to <math>\beta_1</math> and <math>\beta_2</math> and setting them to zero
| |
| | |
| :<math>\frac{\partial S}{\partial \beta_1}=0=8\beta_1 + 20\beta_2 -56</math>
| |
| :<math>\frac{\partial S}{\partial \beta_2}=0=20\beta_1 + 60\beta_2 -154.</math>
| |
| | |
| This results in a system of two equations in two unknowns, called the normal equations, which, when solved, give
| |
| | |
| :<math>\beta_1=3.5</math>
| |
| :<math>\beta_2=1.4</math>
| |
| | |
| and the equation <math>y=3.5+1.4x</math> of the line of best fit. The [[residual (statistics)|residual]]s, that is, the discrepancies between the <math>y</math> values from the experiment and the <math>y</math> values calculated using the line of best fit are then found to be <math>1.1,</math> <math>-1.3,</math> <math>-0.7,</math> and <math>0.9</math> (see the picture on the right). The minimum value of the sum of squares of the residuals is <math>S(3.5, 1.4)=1.1^2+(-1.3)^2+(-0.7)^2+0.9^2=4.2.</math>
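The arithmetic of this example can be checked with a short script. The following is only a sketch, assuming a straight-line model and exact rational arithmetic; the variable names are illustrative and not part of the article. It builds the sums that appear in the normal equations (the equations above are the same system multiplied by 2), solves the 2×2 system by Cramer's rule, and evaluates the residuals.

```python
from fractions import Fraction

# Data points from the example.
xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]

# Sums that appear in the normal equations for y = beta1 + beta2 * x.
m = len(xs)
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Normal equations:  m*beta1 + sx*beta2 = sy,  sx*beta1 + sxx*beta2 = sxy.
# Solve the 2x2 system by Cramer's rule with exact fractions.
det = Fraction(m * sxx - sx * sx)
beta1 = (sy * sxx - sx * sxy) / det
beta2 = (m * sxy - sx * sy) / det

# Residuals and the minimal sum of squares.
residuals = [y - (beta1 + beta2 * x) for x, y in zip(xs, ys)]
S_min = sum(r * r for r in residuals)

print(float(beta1), float(beta2))     # 3.5 1.4
print([float(r) for r in residuals])  # [1.1, -1.3, -0.7, 0.9]
print(float(S_min))                   # 4.2
```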
| |
| | |
| ===Using a quadratic model===
| |
| Importantly, in "linear least squares", we are not restricted to using a line as the model as in the above example. For instance, we could have chosen the restricted quadratic model <math>y=\beta_1 x^2</math>. This model is still linear in the <math>\beta_1</math> parameter, so we can still perform the same analysis, constructing a system of equations from the data points:
| |
| | |
| :<math>\begin{alignat}{2} 6 &&\; = \beta_1 (1)^2 \\ 5 &&\; = \beta_1 (2)^2 \\ 7 &&\; = \beta_1 (3)^2 \\ 10 &&\; = \beta_1 (4)^2 \end{alignat}</math>
| |
| | |
| The sum of squared errors is now <math>S(\beta_1) = \sum_{i=1}^{4} \left(y_i - \beta_1 x_i^2\right)^2</math>. Its partial derivative with respect to the parameter (this time there is only one) is again computed and set to 0:
| |
| | |
| :<math>\frac{\partial S}{\partial \beta_1} = 0 = 708 \beta_1 - 498</math>
| |
| | |
| and solved
| |
| | |
| :<math>\beta_1 = 0.703</math>
| |
| | |
| leading to the resulting best fit model <math>y = 0.703 x^2.</math>
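The coefficients 708 and 498 come from expanding the derivative of the sum of squares for the one-parameter model. A minimal sketch of that calculation, with illustrative variable names:

```python
from fractions import Fraction

xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]

# For y = beta1 * x**2 the objective is S(beta1) = sum((y - beta1 * x**2)**2), so
# dS/dbeta1 = 2 * beta1 * sum(x**4) - 2 * sum(x**2 * y).
sx4 = sum(x**4 for x in xs)                    # 1 + 16 + 81 + 256
sx2y = sum(x**2 * y for x, y in zip(xs, ys))   # 6 + 20 + 63 + 160

# Setting the derivative to zero gives beta1 = sum(x**2 * y) / sum(x**4).
beta1 = Fraction(sx2y, sx4)

print(2 * sx4, 2 * sx2y)       # 708 498
print(round(float(beta1), 3))  # 0.703
```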
| |
| | |
| ==The general problem==
| |
| [[Image:Linear least squares2.png|right|thumb|The result of fitting a quadratic function <math>y=\beta_1+\beta_2x+\beta_3x^2\,</math> (in blue) through a set of data points <math>(x_i, y_i)</math> (in red). In linear least squares the function need not be linear in the argument <math>x,</math> but only in the parameters <math>\beta_j</math> that are determined to give the best fit.]]
| |
| Consider an [[overdetermined system]]
| |
| | |
| :<math>\sum_{j=1}^{n} X_{ij}\beta_j = y_i,\ (i=1, 2, \dots, m),</math>
| |
| | |
| of ''m'' [[linear equation]]s in ''n'' unknown [[coefficients]], ''β''<sub>1</sub>,''β''<sub>2</sub>,…,''β''<sub>''n''</sub>, with ''m'' > ''n''. This can be written in [[matrix (mathematics)|matrix]] form as
| |
| | |
| :<math>\mathbf {X} \boldsymbol {\beta} = \mathbf {y},</math>
| |
| | |
| where
| |
| | |
| :<math>\mathbf {X}=\begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{bmatrix} , \qquad \boldsymbol \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix} , \qquad \mathbf y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}. </math>
| |
| | |
| Such a system usually has no solution, so the goal is instead to find the coefficients '''''β''''' which fit the equations "best," in the sense of solving the [[Quadratic form (statistics)|quadratic]] [[minimization]] problem
| |
| | |
| :<math>\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\,S(\boldsymbol{\beta}), </math>
| |
| | |
| where the objective function ''S'' is given by
| |
| | |
| :<math>S(\boldsymbol{\beta}) = \sum_{i=1}^{m}\bigl| y_i - \sum_{j=1}^{n} X_{ij}\beta_j\bigr|^2 = \bigl\|\mathbf y - \mathbf X \boldsymbol \beta \bigr\|^2.</math>
| |
| | |
| A justification for choosing this criterion is given in [[#Properties of the least-squares estimators|properties]] below. This minimization problem has a unique solution, provided that the ''n'' columns of the matrix ''X'' are [[linearly independent]], given by solving the '''normal equations'''
| |
| | |
| :<math>(\mathbf X^{\rm T} \mathbf X )\hat{\boldsymbol{\beta}}= \mathbf X^{\rm T} \mathbf y.</math>
| |
| | |
| ==Derivation of the normal equations==
| |
| Define the <math>i</math>th '''residual''' to be
| |
| | |
| :<math>r_i= y_i - \sum_{j=1}^{n} X_{ij}\beta_j</math>.
| |
| | |
| Then <math>S</math> can be rewritten
| |
| | |
| :<math>S = \sum_{i=1}^m r_i^2.</math>
| |
| | |
| ''S'' is [[Maxima and minima|minimized]] when its gradient vector is zero. (If the gradient vector were not zero, there would be a direction in which we could move to decrease ''S'' further; see [[maxima and minima]]. Because ''S'' is convex, a zero gradient is also sufficient for a minimum.) The elements of the gradient vector are the partial derivatives of ''S'' with respect to the parameters:
| |
| | |
| :<math>\frac{\partial S}{\partial \beta_j}=2\sum_{i = 1}^m r_i\frac{\partial r_i}{\partial \beta_j} \ (j=1,2,\dots, n).</math>
| |
| | |
| The derivatives are
| |
| | |
| :<math>\frac{\partial r_i}{\partial \beta_j}=-X_{ij}.</math>
| | |
| Substitution of the expressions for the residuals and the derivatives into the gradient equations gives
| |
| | |
| :<math>\frac{\partial S}{\partial \beta_j} = 2\sum_{i=1}^{m} \left( y_i-\sum_{k=1}^{n} X_{ik}\beta_k \right) (-X_{ij})\ (j=1,2,\dots, n).</math>
| |
| | |
| Thus if <math>\hat \beta</math> minimizes ''S'', we have
| |
| | |
| :<math>2\sum_{i=1}^{m} \left( y_i-\sum_{k=1}^{n} X_{ik}\hat \beta_k \right) (-X_{ij}) = 0\ (j=1,2,\dots, n).</math>
| |
| | |
| Upon rearrangement, we obtain the '''normal equations''':
| |
| | |
| :<math>\sum_{i=1}^{m}\sum_{k=1}^{n} X_{ij}X_{ik}\hat \beta_k=\sum_{i=1}^{m} X_{ij}y_i\ (j=1,2,\dots, n).</math>
| |
| | |
| The normal equations are written in matrix notation as
| |
| | |
| :<math>(\mathbf X^\mathrm{T} \mathbf X) \hat{\boldsymbol{\beta}} = \mathbf X^\mathrm{T} \mathbf y</math> (where ''X''<sup>T</sup> is the [[matrix transpose]] of ''X'').
| |
| | |
| The solution of the normal equations yields the vector <math>\hat{\boldsymbol{\beta}}</math> of the optimal parameter values.
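The derivation translates directly into a small program: form <math>X^{\rm T}X</math> and <math>X^{\rm T}\mathbf y</math> from the sums in the normal equations, then solve the resulting ''n''×''n'' system. The sketch below assumes ''X'' is given as a list of rows and uses plain Gaussian elimination with partial pivoting; the function names are hypothetical, and this is an illustration rather than a recommended numerical method.

```python
def normal_equations(X, y):
    """Return (A, b) with A = X^T X and b = X^T y, for X given as a list of rows."""
    n = len(X[0])
    A = [[sum(row[j] * row[k] for row in X) for k in range(n)] for j in range(n)]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n)]
    return A, b


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ValueError("X^T X is singular: columns of X are linearly dependent")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """Least-squares parameters obtained from the normal equations."""
    A, b = normal_equations(X, y)
    return solve(A, b)


# Straight-line fit to the points (1, 6), (2, 5), (3, 7), (4, 10): columns are [1, x].
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [6, 5, 7, 10]
beta = least_squares(X, y)
print(beta)  # approximately [3.5, 1.4]
```

The same function handles any model that is linear in its parameters; for a quadratic fit, each row of ''X'' would be <code>[1, x, x**2]</code>.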
| |
| | |
| ===Derivation directly in terms of matrices===
| |
| | |
| The normal equations can be derived directly from a matrix representation of the problem as follows. The objective is to minimize
| |
| | |
| :<math>S(\boldsymbol{\beta}) = \bigl\|\mathbf y - \mathbf X \boldsymbol \beta \bigr\|^2 = (\mathbf y-\mathbf X \boldsymbol \beta)^{\rm T}(\mathbf y-\mathbf X \boldsymbol \beta) = \mathbf y ^{\rm T} \mathbf y - \boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf y - \mathbf y ^{\rm T} \mathbf X \boldsymbol \beta + \boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf X \boldsymbol \beta .</math>
| |
| | |
| Note that <math>\boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf y</math> has dimension 1×1, so it is a scalar and equals its own transpose, <math>( \boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf y ) ^{\rm T} = \mathbf y ^{\rm T} \mathbf X \boldsymbol \beta</math>. The quantity to minimize therefore becomes
| |
| | |
| :<math>S(\boldsymbol{\beta}) = \mathbf y ^{\rm T} \mathbf y - 2\boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf y + \boldsymbol \beta ^{\rm T} \mathbf X ^{\rm T} \mathbf X \boldsymbol \beta .</math>
| |
| | |
| | |
| [[Matrix differentiation#Scalar-by-vector|Differentiating]] this with respect to <math>\boldsymbol \beta</math>, equating to zero to satisfy the first-order conditions, and dividing by 2 gives
| |
| | |
| :<math>- \mathbf X^{\rm T} \mathbf y+ (\mathbf X^{\rm T} \mathbf X )\hat{\boldsymbol{\beta}} = 0,</math>
| |
| | |
| which is equivalent to the above-given normal equations. A sufficient condition for satisfaction of the second-order conditions for a minimum is that <math>\mathbf X</math> have full column rank, in which case <math>\mathbf X^{\rm T} \mathbf X</math> is [[Positive definite matrix|positive definite]].
| |
| | |
| ==Computation==
| |
| A general approach to the least squares problem <math>\min_{\boldsymbol \beta} \, \big\|\mathbf y - X \boldsymbol \beta \big\|^2</math> can be described as follows. Suppose that we can find an ''n'' by ''m'' matrix '''S''' such that '''XS''' is an [[Linear projection|orthogonal projection]] onto the image of '''X'''. Then a solution to our minimization problem is given by
| |
| | |
| :<math>\boldsymbol \beta = S \mathbf y </math>
| |
| | |
| simply because
| |
| | |
| :<math> X \boldsymbol \beta = X ( S \mathbf y) = (X S) \mathbf y</math>
| |
| | |
| is exactly the sought-for orthogonal projection of <math> \mathbf y </math> onto the image of '''X''' ([[#Properties_of_the_least-squares_estimators|see the picture below]]; as explained in the [[#Properties_of_the_least-squares_estimators|section on properties]], the image of '''X''' is just the subspace generated by the column vectors of '''X'''). A few popular ways to find such a matrix '''S''' are described below.
| |
| | |
| ===Inverting the matrix of the normal equations===
| |
| The algebraic solution of the normal equations can be written as
| |
| | |
| : <math> \hat{\boldsymbol{\beta}} = (\mathbf X^ {\rm T} \mathbf X )^{-1} \mathbf X^ {\rm T} \mathbf y = \mathbf X^+ \mathbf y</math>
| |
| | |
| where ''X''<sup>+</sup> is the [[Moore–Penrose pseudoinverse]] of ''X'' (the expression above for it holds when ''X'' has full column rank). Although this equation is correct, and can work in many applications, it is not computationally efficient to invert the normal equations matrix explicitly. An exception occurs in [[numerical smoothing and differentiation]] where an analytical expression is required.
| |
| | |
| If the matrix ''X''<sup>T</sup>''X'' is [[Condition number|well-conditioned]] and [[Positive-definite matrix|positive definite]], implying that it has full [[rank (linear algebra)|rank]], the normal equations can be solved directly by using the [[Cholesky decomposition]] ''X''<sup>T</sup>''X'' = ''R''<sup>T</sup>''R'', where ''R'' is an upper [[triangular matrix]], giving:
| |
| | |
| : <math> R^{\rm T} R \hat{\boldsymbol{\beta}} = X^{\rm T} \mathbf y. </math>
| |
| | |
| The solution is obtained in two stages, a [[forward substitution]] step, solving for '''z''':
| |
| | |
| : <math> R^{\rm T} \mathbf z = X^{\rm T} \mathbf y,</math>
| |
| | |
| followed by a backward substitution, solving for <math>\hat{\boldsymbol{\beta}}</math>
| |
| | |
| : <math>R \hat{\boldsymbol{\beta}}= \mathbf z.</math>
| |
| | |
| Both substitutions are facilitated by the triangular nature of ''R''.
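The two-stage Cholesky procedure can be sketched as follows. This is an illustrative, unoptimized implementation using only the standard library; the function names are hypothetical, and the code assumes that ''X''<sup>T</sup>''X'' is positive definite.

```python
import math


def cholesky_upper(A):
    """Return upper-triangular R with A = R^T R (A symmetric positive definite)."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R


def solve_cholesky(X, y):
    """Solve the normal equations X^T X beta = X^T y via X^T X = R^T R."""
    n = len(X[0])
    A = [[sum(r[j] * r[k] for r in X) for k in range(n)] for j in range(n)]
    c = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(n)]
    R = cholesky_upper(A)
    # Forward substitution: R^T z = X^T y (R^T is lower triangular).
    z = [0.0] * n
    for i in range(n):
        z[i] = (c[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    # Backward substitution: R beta = z.
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (z[i] - sum(R[i][k] * beta[k] for k in range(i + 1, n))) / R[i][i]
    return beta, R


# Straight-line fit to the points (1, 6), (2, 5), (3, 7), (4, 10).
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [6, 5, 7, 10]
beta, R = solve_cholesky(X, y)
print(beta)  # approximately [3.5, 1.4]
```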
| |
| | |
| See [[linear regression#Example|example of linear regression]] for a worked-out numerical example with three parameters.
| |
| | |
| ===Orthogonal decomposition methods===
| |
| Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more [[Numerical stability|numerically stable]], because they avoid forming the product ''X''<sup>T</sup>''X''.
| |
| | |
| The residuals are written in matrix notation as
| |
| | |
| :<math>\mathbf r= \mathbf y - X \hat{\boldsymbol{\beta}}.</math>
| |
| | |
| The matrix ''X'' is subjected to an orthogonal decomposition; the [[QR decomposition]] will serve to illustrate the process.
| |
| :<math>X=QR \ </math>
| |
| where ''Q'' is an ''m''×''m'' [[orthogonal matrix]] and ''R'' is an ''m''×''n'' matrix which is [[block matrix|partitioned]] into an ''n''×''n'' [[triangular matrix|upper triangular]] block, ''R''<sub>''n''</sub>, and an (''m'' − ''n'')×''n'' zero block '''0'''.
| |
| | |
| :<math>R= \begin{bmatrix} R_n \\ \mathbf{0} \end{bmatrix}. </math>
| |
| | |
| The residual vector is left-multiplied by ''Q''<sup>T</sup>.
| |
| | |
| :<math>Q^{\rm T} \mathbf r = Q^{\rm T} \mathbf y - \left( Q^{\rm T} Q \right) R \hat{\boldsymbol{\beta}}= \begin{bmatrix} \left(Q^{\rm T} \mathbf y \right)_n - R_n \hat{\boldsymbol{\beta}} \\ \left(Q^{\rm T} \mathbf y \right)_{m-n} \end{bmatrix} = \begin{bmatrix} \mathbf u \\ \mathbf v \end{bmatrix} </math>
| |
| | |
| Because ''Q'' is [[orthogonal matrix|orthogonal]], the sum of squares of the residuals, ''s'', may be written as:
| |
| :<math>s = \|\mathbf r \|^2 = \mathbf r^{\rm T} \mathbf r = \mathbf r^{\rm T} Q Q^{\rm T} \mathbf r = \mathbf u^{\rm T} \mathbf u + \mathbf v^{\rm T} \mathbf v </math>
| |
| Since '''v''' doesn't depend on '''''β''''', the minimum value of ''s'' is attained when the upper block, '''u''', is zero. Therefore the parameters are found by solving:
| |
| :<math> R_n \hat{\boldsymbol{\beta}} =\left(Q^{\rm T} \mathbf y \right)_n.</math>
| |
| These equations are easily solved as ''R''<sub>''n''</sub> is upper triangular.
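The QR route can be illustrated with Householder reflections, which is one common way (assumed here, not prescribed by the text) to build the orthogonal factor. The sketch never forms ''Q'' explicitly: each reflection is applied to ''X'' and to '''y''' in turn, so that on exit the arrays hold ''R'' and ''Q''<sup>T</sup>'''y'''. The function name is hypothetical, and the rank check only catches exactly zero columns.

```python
import math


def householder_least_squares(X, y):
    """Solve min ||y - X beta||^2 via Householder QR.

    Returns (beta, min_ss), where min_ss = ||v||^2 is the minimal sum of squares.
    """
    m, n = len(X), len(X[0])
    R = [list(map(float, row)) for row in X]  # overwritten by R
    q = list(map(float, y))                   # overwritten by Q^T y
    for k in range(n):
        # Reflection vector w that zeroes column k below the diagonal.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("X does not have full column rank")
        alpha = -norm if R[k][k] >= 0 else norm
        w = [R[i][k] for i in range(k, m)]
        w[0] -= alpha
        ww = sum(t * t for t in w)
        # Apply H = I - 2 w w^T / (w^T w) to the remaining columns and to q.
        for j in range(k, n):
            dot = sum(w[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= 2.0 * dot / ww * w[i - k]
        dot = sum(w[i - k] * q[i] for i in range(k, m))
        for i in range(k, m):
            q[i] -= 2.0 * dot / ww * w[i - k]
    # Back substitution on R_n beta = (Q^T y)_n.
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (q[i] - s) / R[i][i]
    # The lower block of Q^T y does not depend on beta; its squared norm is min S.
    min_ss = sum(t * t for t in q[n:])
    return beta, min_ss


# Straight-line fit to the points (1, 6), (2, 5), (3, 7), (4, 10).
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [6, 5, 7, 10]
beta, min_ss = householder_least_squares(X, y)
print(beta, min_ss)  # approximately [3.5, 1.4] and 4.2
```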
| |
| | |
| An alternative decomposition of ''X'' is the [[singular value decomposition]] (SVD)<ref>{{cite book |title=Solving Least Squares Problems |last=Lawson |first=C. L. |authorlink= |coauthors=Hanson, R. J. |year=1974 |publisher=Prentice-Hall |location=Englewood Cliffs, NJ |isbn=0-13-822585-0 |pages= |url= }}</ref>
| |
| | |
| :<math> X = U \Sigma V^{\rm T}. \ </math>
| |
| | |
| where ''U'' is an ''m'' by ''m'' orthogonal matrix, ''V'' is an ''n'' by ''n'' orthogonal matrix and <math>\Sigma</math> is an ''m'' by ''n'' matrix with all its elements outside of the main diagonal equal to ''0''. The pseudoinverse <math>\Sigma^+</math> of <math>\Sigma</math> is easily obtained by transposing it and inverting its non-zero diagonal elements. Hence,
| |
| | |
| :<math> \mathbf X V \Sigma^+ U^{\rm T} = U \Sigma V^{\rm T} V \Sigma^+ U^{\rm T} = U P U^{\rm T},</math>
| |
| | |
| where <math>P = \Sigma \Sigma^+</math> is the ''m'' by ''m'' diagonal matrix with ones in the positions of the non-zero singular values and zeros elsewhere. Since ''X'' and <math>\Sigma</math> have the same rank (one of the many advantages of the [[singular value decomposition]]),
| |
| | |
| :<math> \mathbf X V \Sigma^+ U^{\rm T} = U P U^{\rm T} </math>
| |
| | |
| is an orthogonal projection onto the image (column-space) of ''X'' and, in accordance with the general approach described at the beginning of this section,
| |
| | |
| :<math> \beta = V\Sigma^+ U^{\rm T} \mathbf y </math>
| |
| | |
| is a solution of the least squares problem. This method is the most computationally intensive, but is particularly useful if the normal equations matrix, ''X''<sup>T</sup>''X'', is very ill-conditioned (i.e. if its [[condition number]] multiplied by the machine's relative [[round-off error]] is appreciably large). In that case, including the smallest [[singular value]]s in the inversion merely adds numerical noise to the solution. This can be cured with the truncated SVD approach, which gives a more stable answer by explicitly setting to zero all singular values below a certain threshold and so ignoring them, a process closely related to [[factor analysis]].
| |
| | |
| == Properties of the least-squares estimators ==
| |
| [[Image:Linear least squares geometric interpretation.png|right|thumb|The residual vector, <math>y-X \hat{\boldsymbol\beta},</math> which corresponds to the solution of a least squares system, <math>y=X\boldsymbol \beta +\epsilon,</math> is orthogonal to the [[column space]] of the matrix <math>X.</math>]]
| |
| The gradient equations at the minimum can be written as
| |
| | |
| :<math>(\mathbf y - X \hat{\boldsymbol{\beta}})^{\rm T} X=0.</math>
| |
| | |
| A geometrical interpretation of these equations is that the vector of residuals, <math>\mathbf y - X \hat{\boldsymbol{\beta}}</math>, is orthogonal to the [[column space]] of ''X'', since the dot product <math>(\mathbf y-X\hat{\boldsymbol{\beta}})\cdot X \mathbf v</math> is equal to zero for ''any'' conformable vector, '''v'''. This means that <math>\mathbf y - X \boldsymbol{\hat \beta}</math> is the shortest of all possible vectors <math>\mathbf{y}- X \boldsymbol \beta</math>, that is, the sum of squared residuals is the minimum possible. This is illustrated at the right.
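The orthogonality condition is easy to verify numerically. The sketch below reuses the straight-line fit to the points (1, 6), (2, 5), (3, 7), (4, 10), whose least-squares parameters are 3.5 and 1.4, and checks with exact fractions that the residual vector has zero dot product with each column of ''X''. The comparison parameter vector is an arbitrary illustrative choice.

```python
from fractions import Fraction

X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [6, 5, 7, 10]
beta_hat = [Fraction(7, 2), Fraction(7, 5)]  # least-squares line y = 3.5 + 1.4 x


def residual(beta):
    """Residual vector y - X beta."""
    return [yi - sum(a * b for a, b in zip(row, beta)) for row, yi in zip(X, y)]


def ss(beta):
    """Squared length of the residual vector."""
    return sum(r * r for r in residual(beta))


# (y - X beta_hat)^T X: one dot product per column of X; both vanish.
r = residual(beta_hat)
rTX = [sum(ri * row[j] for ri, row in zip(r, X)) for j in range(len(X[0]))]
print(rTX)  # [Fraction(0, 1), Fraction(0, 1)]

# A different parameter vector gives a longer residual vector.
other = [Fraction(7, 2), Fraction(3, 2)]
print(ss(beta_hat) < ss(other))  # True
```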
| |
| | |
| Introducing <math>\hat{\boldsymbol{\gamma}}</math> and a matrix ''K'' with the assumption that a matrix <math>[X \ K]</math> is non-singular and ''K''<sup>T</sup> ''X'' = 0 (cf. [[Linear_projection#Orthogonal_projections|Orthogonal projections]]), the residual vector should satisfy the following equation:
| |
| :<math>\hat{\mathbf{r}} \triangleq \mathbf{y} - X \hat{\boldsymbol{\beta}} = K \hat{{\boldsymbol{\gamma}}}.</math>
| |
| The equation and solution of linear least squares are thus described as follows:
| |
| :<math> \mathbf{y} = \begin{bmatrix}X & K\end{bmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\gamma}} \end{pmatrix} ,</math>
| |
| :<math> \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\gamma}} \end{pmatrix} = \begin{bmatrix}X & K\end{bmatrix}^{-1} \mathbf{y} = \begin{bmatrix} (X^{\rm T} X)^{-1} X^{\rm T} \\ (K^{\rm T} K)^{-1} K^{\rm T} \end{bmatrix} \mathbf{y} .</math>
| |
| | |
| If the experimental errors, <math>\epsilon \,</math>, are uncorrelated, have a mean of zero and a constant variance, <math>\sigma^2</math>, the [[Gauss-Markov theorem]] states that the least-squares estimator, <math>\hat{\boldsymbol{\beta}}</math>, has the minimum variance of all unbiased estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical [[distribution function]] of the errors. In other words, ''the distribution function of the errors need not be a [[normal distribution]]''. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased.
| |
| | |
| For example, it is easy to show that the [[arithmetic mean]] of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss-Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be.
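To see why the arithmetic mean is a least-squares estimator, take the model <math>y_i = \beta</math>, so that ''X'' is a single column of ones and the normal equations reduce to <math>m\beta = \sum_i y_i</math>. The sketch below uses made-up measurements (an assumption for illustration only) and confirms that moving away from the mean by any amount ''d'' increases the sum of squares by exactly <math>m d^2</math>.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical repeated measurements of one quantity.
measurements = [Fraction(s) for s in ("9.8", "10.1", "10.0", "10.3")]
m = len(measurements)

# Normal equations for X = column of ones: (X^T X) beta = X^T y  ->  m * beta = sum(y).
beta_hat = sum(measurements) / m


def S(beta):
    """Sum of squared residuals for the constant model y_i = beta."""
    return sum((yi - beta) ** 2 for yi in measurements)


d = Fraction(1, 10)
print(float(beta_hat))                     # 10.05
print(beta_hat == mean(measurements))      # True
print(S(beta_hat + d) - S(beta_hat) == m * d * d)  # True
```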
| |
| | |
| However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a [[maximum likelihood]] estimator.<ref>{{cite book |title=The Mathematics of Physics and Chemistry |last=Margenau |first=Henry |authorlink= |coauthors=Murphy, George Moseley |year=1956 |publisher=Van Nostrand |location=Princeton |isbn= |pages= |url= }}</ref>
| |
| | |
| These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.
| |
| | |
| === Limitations ===
| |
| An assumption underlying the treatment given above is that the independent variable, ''x'', is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case, [[total least squares]] or more generally [[errors-in-variables models]], or ''rigorous least squares'', should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.<ref name="pg">{{cite book |title=Data fitting in the Chemical Sciences |last=Gans |first=Peter |authorlink= |coauthors= |year=1992 |publisher=Wiley |location=New York |isbn=0-471-93412-7 |pages= |url= }}</ref><ref>{{cite book |title=Statistical adjustment of Data |last=Deming |first=W. E. |authorlink= |coauthors= |year=1943 |publisher=Wiley |location=New York |isbn= |pages= |url= }}</ref>
| |
| | |
| In some cases the (weighted) normal equations matrix ''X''<sup>T</sup>''X'' is [[ill-conditioned]]. When fitting polynomials the design matrix ''X'' is a [[Vandermonde matrix]]. Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases.{{citation needed|date=December 2010}} In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate.{{citation needed|date=December 2010}} Various [[regularization (mathematics)|regularization]] techniques can be applied in such cases, the most common of which is called [[Tikhonov regularization|ridge regression]]. If further information about the parameters is known, for example, a range of possible values of <math>\mathbf{\hat{\boldsymbol{\beta}}}</math>, then various techniques can be used to increase the stability of the solution. For example, see [[#Constrained_linear_least_squares|constrained least squares]].
| |
| | |
| Another drawback of the least squares estimator is the fact that the norm of the residuals, <math>\| \mathbf y - X\hat{\boldsymbol{\beta}} \|</math>, is minimized, whereas in some cases one is truly interested in obtaining a small error in the parameter <math>\mathbf{\hat{\boldsymbol{\beta}}}</math>, i.e., a small value of <math>\|{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}\|</math>.{{citation needed|date=December 2010}} However, since the true parameter <math>{\boldsymbol{\beta}}</math> is necessarily unknown, this quantity cannot be directly minimized. If a [[prior probability]] on <math>{\boldsymbol{\beta}}</math> is known, then a [[Minimum mean square error|Bayes estimator]] can be used to minimize the [[mean squared error]], <math>E \left\{ \| {\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}} \|^2 \right\} </math>. The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as [[Stein's phenomenon]]. For example, if the measurement error is [[Normal distribution|Gaussian]], several estimators are known which [[dominating decision rule|dominate]], or outperform, the least squares technique; the best known of these is the [[James–Stein estimator]]. This is an example of more general [[shrinkage estimator]]s that have been applied to regression problems.
| |
| | |
| ==Weighted linear least squares==
| |
| {{see also|Least squares#Weighted least squares}}
| |
| {{see also|Weighted mean}}
| |
| | |
| In some cases the observations may be weighted—for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:
| |
| | |
| :<math>\hat{\boldsymbol{\beta}} = \underset{\boldsymbol \beta}{ \operatorname{arg\,min} }\, \sum_{i=1}^{m} w_i \left|y_i - \sum_{j=1}^{n} X_{ij}\beta_j\right|^2 = \underset{\boldsymbol \beta}{ \operatorname{arg\,min} } \, \big\|W^{1/2} (\mathbf y - X \boldsymbol \beta) \big\|^2.</math>
| |
| | |
| where ''w''<sub>''i''</sub> > 0 is the weight of the ''i''th observation, and ''W'' is the [[diagonal matrix]] of such weights.
| |
| | |
| The weights should, ideally, be equal to the [[multiplicative inverse|reciprocal]] of the [[variance]] of the measurement.<ref>This implies that the observations are uncorrelated. If the observations are [[correlated]], the expression <math>\textstyle S=\sum_k \sum_j r_k W_{kj} r_j\,</math> applies. In this case the weight matrix should ideally be equal to the inverse of the [[variance-covariance matrix]] of the observations.</ref> <ref>{{cite book|author=Strutz, T.| title=Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond) |publisher=Vieweg+Teubner | year=2010 | isbn= 978-3-8348-1022-9}}, chapter 3</ref>
| |
| The normal equations are then:
| |
| | |
| :<math>\left(X^{\rm T} W X \right)\hat{\boldsymbol{\beta}} = X^{\rm T} W \mathbf y.</math>
| |
| | |
| This method is used in [[iteratively reweighted least squares]].
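The weighted normal equations above can be sketched for the simplest case, a straight line with two parameters. The function name <code>weighted_line_fit</code>, the data and the standard deviations below are illustrative assumptions, not part of the method; the weights are taken as reciprocal variances, as recommended above.

```python
def weighted_line_fit(x, y, w):
    """Solve (X^T W X) beta = X^T W y for the model y = b1 + b2*x.

    Row i of X is (1, x_i) and W is diagonal with entries w_i > 0.
    """
    # Entries of the 2x2 matrix X^T W X.
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    # Entries of the vector X^T W y.
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Explicit inverse of the 2x2 system.
    det = s_w * s_wxx - s_wx ** 2
    b1 = (s_wxx * s_wy - s_wx * s_wxy) / det
    b2 = (s_w * s_wxy - s_wx * s_wy) / det
    return b1, b2


# Illustrative data lying exactly on y = 1 + 2x (assumed for the example).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.1, 0.2, 0.1, 0.4]          # assumed measurement standard deviations
w = [1.0 / s ** 2 for s in sigma]      # weights = reciprocal variances
b1, b2 = weighted_line_fit(x, y, w)
print(b1, b2)
```

Because the example data lie exactly on a line, any choice of positive weights recovers the same parameters; with noisy data the weights change the estimate.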
| |
| | |
| ===Parameter errors and correlation===
| |
| The estimated parameter values are linear combinations of the observed values
| |
| | |
| :<math>\hat{\boldsymbol{\beta}} = (X^{\rm T} W X)^{-1} X^{\rm T} W \mathbf y. \, </math>
| |
| | |
| Therefore, an expression for the estimated variance-covariance matrix of the parameters can be obtained by [[error propagation]] from the errors in the observations. Let the [[variance-covariance matrix]] for the observations be denoted by ''M'' and that of the parameters by ''M<sup>β</sup>''. Then,
| |
| | |
| :<math>M^\beta= (X^{\rm T} W X)^{-1} X^{\rm T} W M W^{\rm T} X (X^{\rm T} W^{\rm T} X)^{-1}.</math>
| |
| <!-- Commented out: W is a diagonal matrix. so it is equal to its transpose {{Citation needed|date=August 2009|reason=Shouldn't that last inverted (X'*W*X) be transposed as well?}} -->
| |
| | |
| When ''W'' = ''M''<sup> −1</sup> this simplifies to
| |
| | |
| :<math>M^\beta=(X^{\rm T} W X)^{-1}.</math>
| |
| | |
| When unit weights are used (''W'' = ''I'') it is implied that the experimental errors are uncorrelated and all equal: ''M'' = ''σ''<sup>2</sup>''I'', where ''σ''<sup>2</sup> is the variance of an observation, and ''I'' is the [[identity matrix]]. In this case ''σ''<sup>2</sup> is approximated by <math>\frac{S}{m-n}</math>, where ''S'' is the minimum value of the objective function, so that
| |
| | |
| :<math>M^\beta=\frac{S}{m-n}(X^{\rm T} X)^{-1}.</math>
| |
| | |
| The denominator, ''m'' − ''n'', is the number of [[Degrees of freedom (statistics)|degrees of freedom]]; see [[Degrees of freedom (statistics)#Effective degrees of freedom|effective degrees of freedom]] for generalizations for the case of correlated observations. In all cases, the [[variance]] of the parameter <math>\beta_i</math> is given by <math>M^\beta_{ii}</math> and the [[covariance]] between parameters <math>\beta_i</math> and <math>\beta_j</math> is given by <math>M^\beta_{ij}</math>. [[Standard deviation]] is the square root of variance, and the correlation coefficient is given by <math>\rho_{ij} = M^\beta_{ij}/(\sigma_i \sigma_j)</math>. These error estimates reflect only [[random errors]] in the measurements. The true uncertainty in the parameters is larger due to the presence of [[systematic errors]] which, by definition, cannot be quantified.
| |
| Note that even though the observations may be uncorrelated, the parameters are typically [[Pearson product-moment correlation coefficient|correlated]].
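As a minimal sketch of the unit-weight case, the following computes the parameter covariance matrix, the standard deviations and the correlation coefficient for a straight-line fit. The function name <code>line_fit_covariance</code> and the data are assumptions made for illustration only.

```python
import math


def line_fit_covariance(x, y):
    """Unit-weight fit of y = b1 + b2*x with M^beta = S/(m-n) * (X^T X)^{-1}."""
    m, n = len(x), 2
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = m * sxx - sx ** 2                      # determinant of X^T X
    b1 = (sxx * sy - sx * sxy) / det
    b2 = (m * sxy - sx * sy) / det
    # Minimum of the objective function and the variance estimate S/(m-n).
    S = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    var = S / (m - n)
    # (X^T X)^{-1} = (1/det) * [[sxx, -sx], [-sx, m]]
    cov = [[var * sxx / det, -var * sx / det],
           [-var * sx / det, var * m / det]]
    sd = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
    rho = cov[0][1] / (sd[0] * sd[1])
    return (b1, b2), cov, sd, rho


# Illustrative, uncorrelated observations (assumed values).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
beta, cov, sd, rho = line_fit_covariance(x, y)
print(beta, sd, rho)
```

For these abscissae the intercept and slope are negatively correlated even though the observations are treated as uncorrelated; in this two-parameter case the correlation coefficient depends only on the ''x'' values.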
| |
| | |
| ===Parameter confidence limits===
| |
| {{Main|Confidence interval}}
| |
| It is often ''assumed'', for want of any concrete evidence but often appealing to the [[central limit theorem]] (see [[Normal distribution#Occurrence]]), that the error on each observation belongs to a [[normal distribution]] with a mean of zero and standard deviation <math>\sigma</math>. Under that assumption the following probabilities can be derived for a single scalar parameter estimate in terms of its estimated standard error <math>se_{\beta}</math> (given [[Ordinary least squares#Large sample properties|here]]):
| |
| :68% that the interval <math>\hat \beta \pm se_{\beta}</math> encompasses the true coefficient value
| |
| :95% that the interval <math>\hat \beta \pm 2se_{\beta}</math> encompasses the true coefficient value
| |
| :99% that the interval <math>\hat \beta \pm 2.5se_{\beta}</math> encompasses the true coefficient value
| |
| The assumption is not unreasonable when ''m'' >> ''n''. If the experimental errors are normally distributed the parameters will belong to a [[Student's t-distribution]] with ''m'' − ''n'' [[Degrees of freedom (statistics)|degrees of freedom]]. When ''m'' >> ''n'' Student's t-distribution approximates a normal distribution. Note, however, that these confidence limits cannot take systematic error into account. Also, parameter errors should be quoted to one significant figure only, as they are subject to [[sampling error]].<ref>{{cite book |title=The Statistical Analysis of Experimental Data |last=Mandel |first=John |authorlink= |coauthors= |year=1964 |publisher=Interscience |location=New York |isbn= |pages= |url= }}</ref>
| |
| | |
| When the number of observations is relatively small, [[Chebychev's inequality|Chebyshev's inequality]] can be used for an upper bound on probabilities, regardless of any assumptions about the distribution of experimental errors: the maximum probabilities that a parameter will be more than 1, 2 or 3 standard deviations away from its expectation value are 100%, 25% and 11% respectively.
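The probabilities quoted in this section can be checked numerically. The sketch below uses only the Python standard library; the helper names <code>normal_coverage</code> and <code>chebyshev_tail_bound</code> are assumptions introduced for the illustration.

```python
import math


def normal_coverage(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))


def chebyshev_tail_bound(k):
    """Distribution-free upper bound on P(|Z - E[Z]| >= k standard deviations)."""
    return min(1.0, 1.0 / k ** 2)


# Coverage of the intervals beta_hat +/- k * se under the normal assumption.
coverage = {k: normal_coverage(k) for k in (1.0, 2.0, 2.5)}
# Chebyshev bounds for 1, 2 and 3 standard deviations.
bounds = {k: chebyshev_tail_bound(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"normal coverage within {k} se: {p:.4f}")
for k, b in bounds.items():
    print(f"Chebyshev bound beyond {k} sd: {b:.4f}")
```

The normal coverages come out close to the rounded 68%, 95% and 99% figures above, and the Chebyshev bounds equal 1, 1/4 and 1/9.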
| |
| | |
| === Residual values and correlation ===
| |
| | |
| The [[errors and residuals in statistics|residuals]] are related to the observations by
| |
| | |
| :<math>\mathbf{\hat r} = \mathbf y- X \hat{\boldsymbol{\beta}}= \mathbf y- H \mathbf y = (I - H) \mathbf y </math>
| |
| | |
| where ''H'' is the [[idempotent matrix]] known as the [[hat matrix]]:
| |
| | |
| :<math>H = X \left(X^{\rm T} W X \right)^{-1}X^{\rm T} W </math>
| |
| | |
| and ''I'' is the [[identity matrix]]. The variance-covariance matrix of the residuals, '''M<sup>r</sup>''', is given by
| |
| | |
| :<math>M^{\mathbf r}=\left(I-H \right) M \left(I-H \right)^{\rm T}.</math>
| |
| | |
| Thus the residuals are correlated, even if the observations are not.
| |
| | |
| When <math>W = M^{-1}</math>,
| |
| :<math>M^{\mathbf r}=\left(I-H \right) M.</math>
| |
| | |
| The sum of residual values is equal to zero whenever the model function contains a constant term. To see this for unit weights, left-multiply the expression for the residuals by ''X''<sup>T</sup>:
| |
| | |
| :<math>X^{\rm T} \hat{\mathbf r}=X^{\rm T} \mathbf y- X^{\rm T} X \hat{\boldsymbol{\beta}} = X^{\rm T} \mathbf y- (X^{\rm T} X)(X^{\rm T} X)^{-1}X^{\rm T} \mathbf y= \mathbf 0</math>
| |
| | |
| Say, for example, that the first term of the model is a constant, so that <math>X_{i1}=1</math> for all ''i''. In that case it follows that
| |
| | |
| :<math>\sum_i^m X_{i1} \hat r_i=\sum_i^m \hat r_i=0.</math>
| |
| | |
| Thus, in the [[#Motivational example|motivational example]], above, the fact that the sum of residual values is equal to zero is not accidental but is a consequence of the presence of the constant term, α, in the model.
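The role of the constant term can be illustrated with a small unit-weight example. The two helper functions and the data below are assumptions chosen for the sketch: one model contains a constant term, the other is forced through the origin.

```python
def residuals_with_constant(x, y):
    """Residuals of the unit-weight fit y = b1 + b2*x."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = m * sxx - sx ** 2
    b1 = (sxx * sy - sx * sxy) / det
    b2 = (m * sxy - sx * sy) / det
    return [yi - b1 - b2 * xi for xi, yi in zip(x, y)]


def residuals_through_origin(x, y):
    """Residuals of the unit-weight fit y = b*x (no constant term)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return [yi - b * xi for xi, yi in zip(x, y)]


# Illustrative data (assumed values).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

sum_with = sum(residuals_with_constant(x, y))
sum_without = sum(residuals_through_origin(x, y))
print(sum_with, sum_without)
```

Up to rounding, the residuals of the model with a constant term sum to zero, while those of the model through the origin do not.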
| |
| | |
| If experimental error follows a [[normal distribution]], then, because of the linear relationship between residuals and observations, so should residuals,<ref>{{cite book |title=Multivariate analysis |last=Mardia |first=K. V. |authorlink= |coauthors=Kent, J. T.; Bibby, J. M. |year=1979 |publisher=Academic Press |location=New York |isbn=0-12-471250-9 |pages= |url= }}</ref> but since the observations are only a sample of the population of all possible observations, the residuals should belong to a [[Student's t-distribution]]. [[Studentized residual]]s are useful in making a statistical test for an [[outlier]] when a particular residual appears to be excessively large.
| |
| | |
| == Objective function ==
| |
| The optimal value of the objective function, found by substituting in the optimal expression for the coefficient vector, can be written as
| |
| | |
| :<math>S=\mathbf y^{\rm T} (I-H)^{\rm T} (I-H) \mathbf y= \mathbf y^{\rm T} (I-H) \mathbf y,</math>
| |
| | |
| the latter equality holding since (''I'' – ''H'') is symmetric and idempotent. It can be shown from this<ref>{{cite book |title=Statistics in Physical Science |last=Hamilton |first=W. C. |authorlink= |coauthors= |year=1964 |publisher=Ronald Press |location=New York |isbn= |pages= |url= }}</ref> that under an appropriate assignment of weights the [[expected value]] of ''S'' is ''m'' − ''n''. If instead unit weights are assumed, the expected value of ''S'' is <math>(m-n)\sigma^2</math>, where <math>\sigma^2</math> is the variance of each observation.
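The unit-weight result can be checked with a short calculation. As a sketch, assume an ''m'' × ''n'' matrix ''X'' of full column rank and observations <math>\mathbf y = X\boldsymbol\beta + \boldsymbol\varepsilon</math> whose errors have zero mean and covariance <math>\sigma^2 I</math> (the symbol <math>\boldsymbol\varepsilon</math> is introduced here only for this illustration). Since <math>(I-H)X = 0</math>, the objective function depends on the errors alone, and since <math>\operatorname{tr}(H) = \operatorname{tr}\left((X^{\rm T}X)^{-1}X^{\rm T}X\right) = n</math>,

:<math>E[S] = E\left[\boldsymbol\varepsilon^{\rm T}(I-H)\boldsymbol\varepsilon\right] = \sigma^2 \operatorname{tr}(I-H) = (m-n)\sigma^2.</math>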
| |
| | |
| If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a [[Chi-squared distribution|chi-squared (<math>\chi ^2</math>) distribution]] with ''m'' − ''n'' [[Degrees of freedom (statistics)|degrees of freedom]]. Some illustrative percentile values of <math>\chi ^2</math> are given in the following table.<ref>{{cite book |title=Schaum's outline of theory and problems of probability and statistics |last=Spiegel |first=Murray R. |authorlink= |coauthors= |year=1975 |publisher=McGraw-Hill |location=New York |isbn=0-585-26739-1 |pages= |url= }}</ref>
| |
| :{| class="wikitable"
| |
| |-
| |
| ! m-n
| |
| ! <math>\chi ^2 _{0.50}</math>
| |
| ! <math>\chi ^2 _{0.95}</math>
| |
| ! <math>\chi ^2 _{0.99}</math>
| |
| |-
| |
| | 10
| |
| | 9.34
| |
| | 18.3
| |
| | 23.2
| |
| |-
| |
| | 25
| |
| | 24.3
| |
| | 37.7
| |
| | 44.3
| |
| |-
| |
| | 100
| |
| | 99.3
| |
| | 124
| |
| | 136
| |
| |}
| |
| These values can be used for a statistical criterion as to the [[goodness-of-fit]]. When unit weights are used, ''S'' should be divided by the variance of an observation before it is compared with these values.
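One possible way to apply such a criterion is sketched below. The percentile values are copied from the table above; the function name <code>exceeds_percentile</code>, the choice of the 95% level and the numbers in the usage lines are assumptions made for the illustration.

```python
# Percentile values copied from the table, keyed by degrees of freedom m - n.
CHI2_PERCENTILES = {
    10: {0.50: 9.34, 0.95: 18.3, 0.99: 23.2},
    25: {0.50: 24.3, 0.95: 37.7, 0.99: 44.3},
    100: {0.50: 99.3, 0.95: 124.0, 0.99: 136.0},
}


def exceeds_percentile(S, dof, level=0.95, sigma2=None):
    """Return True if the objective function exceeds the tabulated percentile.

    S      : minimum value of the objective function
    dof    : degrees of freedom m - n (must be a row of the table)
    sigma2 : variance of an observation; pass it only when unit weights
             were used, so that S is rescaled before the comparison
    """
    statistic = S if sigma2 is None else S / sigma2
    return statistic > CHI2_PERCENTILES[dof][level]


# Hypothetical unit-weight fits with 10 degrees of freedom and sigma^2 = 0.04.
suspicious = exceeds_percentile(S=1.0, dof=10, level=0.95, sigma2=0.04)
acceptable = exceeds_percentile(S=0.5, dof=10, level=0.95, sigma2=0.04)
print(suspicious, acceptable)
```

A value above the chosen percentile suggests that the model or the assumed error variance is inadequate; it does not by itself identify which.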
| |
| | |
| ==Constrained linear least squares==
| |
| | |
| Often it is of interest to solve a linear least squares problem with an additional constraint on the solution. With constrained linear least squares, the original equation
| |
| | |
| :<math>\mathbf {X} \boldsymbol {\beta} = \mathbf {y}</math>
| |
| | |
| must be satisfied (in the least squares sense) while also ensuring that some other property of <math>\boldsymbol {\beta}</math> is maintained. There are often special purpose algorithms for solving such problems efficiently. Some examples of constraints are given below:
| |
| | |
| * [[Constrained generalized inverse|Equality constrained]] least squares: the elements of <math>\boldsymbol {\beta}</math> must exactly satisfy <math>\mathbf {L} \boldsymbol {\beta} = \mathbf {d}</math>
| |
| * [[Tikhonov regularization|Regularized]] least squares: the elements of <math>\boldsymbol {\beta}</math> must satisfy <math>\| \mathbf {L} \boldsymbol {\beta} - \mathbf {d} \| \le \rho </math>
| |
| * [[Non-negative least squares]] (NNLS): The vector <math>\boldsymbol {\beta}</math> satisfies the [[ordered vector space|vector inequality]] <math>\boldsymbol {\beta} \geq \boldsymbol{0}</math> that is defined componentwise; that is, each component must be either positive or zero.
| |
| * Box-constrained least squares: The vector <math>\boldsymbol {\beta}</math> satisfies the [[ordered vector space|vector inequalities]] <math> \boldsymbol{lb} \leq \boldsymbol{\beta} \leq \boldsymbol{ub}</math>, each of which is defined componentwise.
| |
| * Integer constrained least squares: all elements of <math>\boldsymbol {\beta}</math> must be [[integer]] (instead of [[real number]]s).
| |
| * Real constrained least squares: all elements of <math>\boldsymbol {\beta}</math> must be real (rather than [[complex number]]s).
| |
| * Phase constrained least squares: all elements of <math>\boldsymbol {\beta}</math> must have the same [[Arg (mathematics)|phase]].
| |
| | |
| When the constraint only applies to some of the variables, the mixed problem may be solved using '''separable least squares''' by letting <math>\mathbf {X} = [\mathbf {X_1} \ \mathbf {X_2} ]</math> and <math>\boldsymbol {\beta}^{\rm T} = [\boldsymbol {\beta_1}^{\rm T} \ \boldsymbol {\beta_2}^{\rm T}]</math> represent the unconstrained (1) and constrained (2) components. Then substituting the least squares solution for <math>\boldsymbol {\beta_1}</math>, i.e.
| |
| | |
| :<math>\hat{\boldsymbol {\beta_1}} = \mathbf {X_1}^+ (\mathbf {y} - \mathbf {X_2} \boldsymbol {\beta_2})</math>
| |
| | |
| back into the original expression gives (following some rearrangement) an equation that can be solved as a purely constrained problem in <math>\boldsymbol {\beta_2}</math>:
| |
| | |
| :<math> \mathbf{P} \mathbf {X_2} \boldsymbol {\beta_2} = \mathbf{P}\mathbf {y}</math>
| |
| | |
| where <math>\mathbf{P}:=\mathbf{I}-\mathbf {X_1} \mathbf {X_1}^+</math> is a [[projection matrix]]. Following the constrained estimation of <math>\hat{\boldsymbol {\beta_2}}</math> the vector <math>\hat{\boldsymbol {\beta_1}}</math> is obtained from the expression above.
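A minimal sketch of the separable approach, under assumptions chosen to keep it small: the unconstrained block is a single constant column (so the projection simply subtracts the mean) and the constrained block is a single slope required to be non-negative. The function name and data are illustrative.

```python
def separable_fit_nonneg_slope(x, y):
    """Fit y = b1 + b2*x with b1 unconstrained and b2 >= 0."""
    m = len(x)
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    # With X1 a column of ones, P = I - X1 X1^+ subtracts the mean.
    px = [xi - x_mean for xi in x]
    py = [yi - y_mean for yi in y]
    # Reduced problem P X2 b2 = P y in the least squares sense, b2 >= 0.
    # For a single parameter the constrained minimizer is the
    # unconstrained one clipped at zero.
    b2_free = sum(a * b for a, b in zip(px, py)) / sum(a * a for a in px)
    b2 = max(0.0, b2_free)
    # Back-substitution: b1 = X1^+ (y - X2 b2), the mean of y - b2*x.
    b1 = y_mean - b2 * x_mean
    return b1, b2


rising = separable_fit_nonneg_slope([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
falling = separable_fit_nonneg_slope([0.0, 1.0, 2.0, 3.0], [3.0, 2.5, 1.9, 1.2])
print(rising, falling)
```

For the rising data the constraint is inactive and the ordinary fit is returned; for the falling data the slope is held at zero and the constant becomes the mean of the observations. With several constrained parameters the reduced problem would need a dedicated solver rather than simple clipping.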
| |
| | |
| ==Typical uses and applications==
| |
| | |
| * [[Polynomial regression|Polynomial fitting]]: models are [[polynomial]]s in an independent variable, ''x'':
| |
| ** Straight line: <math>f(x, \boldsymbol \beta)=\beta_1 +\beta_2 x</math>.<ref>{{cite book |title=Analysis of Straight-Line Data |last=Acton |first=F. S. |authorlink= |coauthors= |year=1959 |publisher=Wiley |location=New York |isbn= |pages= |url= }}</ref>
| |
| ** Quadratic: <math>f(x, \boldsymbol \beta)=\beta_1 + \beta_2 x +\beta_3 x^2</math>.
| |
| ** Cubic, quartic and higher polynomials. For [[polynomial regression|regression with high-order polynomials]], the use of [[orthogonal polynomials]] is recommended.<ref>{{cite book |title=Numerical Methods of Curve Fitting |last=Guest |first=P. G. |authorlink= |coauthors= |year=1961 |publisher=Cambridge University Press |location=Cambridge |isbn= |pages= |url= }}{{page needed|date=December 2010}}</ref>
| |
| *[[Numerical smoothing and differentiation]] — this is an application of polynomial fitting.
| |
| *Multinomials in more than one independent variable, including surface fitting
| |
| *Curve fitting with [[B-spline]]s <ref name=pg/>
| |
| *[[Chemometrics]], [[Calibration curve]], [[Standard addition]], [[Gran plot]], [[Beer-Lambert law#Chemical analysis|analysis of mixtures]]
| |
| | |
| ===Uses in data fitting===
| |
| | |
| The primary application of linear least squares is in [[data fitting]]. Given a set of ''m'' data points <math>y_1, y_2,\dots, y_m,</math> consisting of experimentally measured values taken at ''m'' values <math>x_1, x_2,\dots, x_m</math> of an independent variable (<math>x_i</math> may be scalar or vector quantities), and given a model function <math>y=f(x, \boldsymbol \beta),</math> with <math>\boldsymbol \beta = (\beta_1, \beta_2, \dots, \beta_n),</math> it is desired to find the parameters <math>\beta_j</math> such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to parameters <math>\beta_j,</math> so
| |
| | |
| :<math>f(x, \boldsymbol \beta) = \sum_{j=1}^{n} \beta_j \phi_j(x).</math>
| |
| | |
| Here, the functions <math>\phi_j</math> may be '''nonlinear''' with respect to the variable '''x'''.
| |
| | |
| Ideally, the model function fits the data exactly, so
| |
| | |
| : <math>y_i = f(x_i, \boldsymbol \beta)</math>
| |
| | |
| for all <math>i=1, 2, \dots, m.</math> This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the [[residual (statistics)|residual]]s
| |
| :<math>r_i(\boldsymbol \beta)= y_i - f(x_i, \boldsymbol \beta),\ (i=1, 2, \dots, m) </math>
| |
| that is, to minimize the function
| |
| | |
| :<math>S(\boldsymbol \beta)=\sum_{i=1}^{m}r_i^2(\boldsymbol \beta).</math>
| |
| | |
| After substituting for <math>r_i</math> and then for <math>f</math>, this minimization problem becomes the quadratic minimization problem above with
| |
| | |
| :<math>X_{ij}=\phi_j(x_i),</math>
| |
| | |
| and the best fit can be found by solving the normal equations.
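The construction can be sketched as follows: build the matrix with entries <math>X_{ij}=\phi_j(x_i)</math> from a list of basis functions and solve the normal equations. The helper names <code>solve</code> and <code>fit</code>, the quadratic basis and the data are assumptions for the illustration; the elimination routine is a plain textbook solver and is not tuned for ill-conditioned problems.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit(basis, xs, ys):
    """Least squares fit of f(x) = sum_j beta_j * phi_j(x) via normal equations."""
    X = [[phi(x) for phi in basis] for x in xs]          # X_ij = phi_j(x_i)
    n = len(basis)
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(n)]
           for j in range(n)]
    Xty = [sum(row[j] * y for row, y in zip(X, ys)) for j in range(n)]
    return solve(XtX, Xty)


# Basis functions that are nonlinear in x; the model stays linear in beta.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]           # assumed exact data
beta = fit(basis, xs, ys)
print(beta)
```

Only the list <code>basis</code> changes when a different linear-in-parameters model is fitted.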
| | |
| ==Further discussion==
| |
| The ''numerical methods for linear least squares'' are important because [[linear regression]] models are among the most important types of model, both as formal [[statistical model]]s and for exploration of data-sets. The majority of [[Comparison of statistical packages|statistical computer packages]] contain facilities for regression analysis that make use of linear least squares computations. Hence it is appropriate that considerable effort has been devoted to the task of ensuring that these computations are undertaken efficiently and with due regard to [[round-off error]].
| |
| | |
| Individual statistical analyses are seldom undertaken in isolation, but rather are part of a sequence of investigatory steps. Some of the topics involved in considering numerical methods for linear least squares relate to this point. Important topics include:
| |
| *Computations where a number of similar, and often nested, models are considered for the same data-set. That is, where models with the same [[dependent variable]] but different sets of [[independent variables]] are to be considered, for essentially the same set of data-points.
| |
| *Computations for analyses that occur in a sequence, as the number of data-points increases.
| |
| *Special considerations for very extensive data-sets.
| |
| | |
| Fitting of linear models by least squares often, but not always, arises in the context of [[statistical analysis]]. It can therefore be important that considerations of computation efficiency for such problems extend to all of the auxiliary quantities required for such analyses, and are not restricted to the formal solution of the [[linear least squares (mathematics)|linear least squares]] problem.
| |
| | |
| ===Rounding errors===
| |
| Matrix calculations, like any other, are affected by [[rounding error]]s. An early summary of these effects, regarding the choice of computation methods for matrix inversion, was provided by Wilkinson. <ref>Wilkinson, J.H. (1963) "Chapter 3: Matrix Computations", ''Rounding Errors in Algebraic Processes'', London: Her Majesty's Stationery Office (National Physical Laboratory, Notes in Applied Science, No.32)</ref>
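A small illustration of how rounding can affect a least squares computation, using a matrix chosen for this purpose (it is an assumption of this sketch, not an example taken from the cited source). The columns of ''X'' are linearly independent, yet the matrix <math>X^{\rm T}X</math> formed in double-precision arithmetic is exactly singular.

```python
eps = 1e-8

# Two linearly independent columns; exact X^T X has determinant 2*eps^2 + eps^4.
X = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

# Form the normal-equations matrix X^T X in floating point.
XtX = [[sum(row[j] * row[k] for row in X) for k in range(2)]
       for j in range(2)]

# 1 + eps^2 rounds to 1 in double precision, so every entry becomes 1.0.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
print(XtX, det)
```

The computed determinant is zero, so the normal equations cannot be solved in this form even though the original problem has a unique solution.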
| |
| | |
| ==References==
| |
| {{reflist}}
| |
| | |
| ==Further reading==
| |
| *{{Cite book | author=Bevington, Philip R | coauthors=Robinson, Keith D | title=Data Reduction and Error Analysis for the Physical Sciences | year=2003 | publisher=McGraw Hill | location= | isbn=0-07-247227-8 | pages=}}
| |
| *{{Citation
| |
| | last=Barlow
| |
| | first=Jesse L.
| |
| | author-link=
| |
| | chapter=Chapter 9: Numerical aspects of Solving Linear Least Squares Problems
| |
| | editor-last=Rao | editor-first=C.R.
| |
| | title=Computational Statistics | series=Handbook of Statistics | volume=9
| |
| | publisher=North-Holland
| |
| | publication-date=1993
| |
| | isbn=0-444-88096-8
| |
| }}
| |
| *{{Cite book | last1=Björck |first1= Åke | authorlink= | coauthors= | title=Numerical methods for least squares problems | year=1996 | publisher=SIAM | location=Philadelphia | isbn=0-89871-360-9 | pages=}}
| |
| *{{Citation
| |
| | last=Goodall
| |
| | first=Colin R.
| |
| | author-link=
| |
| | chapter=Chapter 13: Computation using the QR decomposition
| |
| | editor-last=Rao | editor-first=C.R.
| |
| | title=Computational Statistics | series=Handbook of Statistics | volume=9
| |
| | publisher=North-Holland
| |
| | publication-date=1993
| |
| | isbn=0-444-88096-8
| |
| }}
| |
| *{{Citation
| |
| | last=National Physical Laboratory
| |
| | first=
| |
| | chapter=Chapter 1: Linear Equations and Matrices: Direct Methods
| |
| | title=Modern Computing Methods
| |
| |edition =2nd
| |
| |series= Notes on Applied Science
| |
| | volume=16
| |
| | publisher=Her Majesty's Stationery Office
| |
| | publication-date=1961
| |
| }}
| |
| *{{Citation
| |
| | last=National Physical Laboratory
| |
| | first=
| |
| | chapter=Chapter 2: Linear Equations and Matrices: Direct Methods on Automatic Computers
| |
| | title=Modern Computing Methods
| |
| |edition =2nd
| |
| |series= Notes on Applied Science
| |
| | volume=16
| |
| | publisher=Her Majesty's Stationery Office
| |
| | publication-date=1961
| |
| }}
| |
| | |
| ==External links==
| |
| | |
| *[http://mathworld.wolfram.com/LeastSquaresFitting.html Least Squares Fitting – From MathWorld]
| |
| *[http://mathworld.wolfram.com/LeastSquaresFittingPolynomial.html Least Squares Fitting-Polynomial – From MathWorld]
| |
| | |
| {{Least Squares and Regression Analysis}}
| |
| | |
| {{DEFAULTSORT:Linear Least Squares}}
| |
| [[Category:Regression analysis]]
| |
| [[Category:Computational statistics]]
| |
| [[Category:Numerical linear algebra]]
| |
| [[Category:Least squares]]
| |
| | |
| [[af:Kleinste-kwadratemetode]]
| |
| [[cs:Metoda nejmenších čtverců]]
| |
| [[de:Methode der kleinsten Quadrate]]
| |
| [[es:Mínimos cuadrados]]
| |
| [[fa:کمینه مربعات خطی]]
| |
| [[fr:Méthode des moindres carrés]]
| |
| [[gl:Mínimos cadrados]]
| |
| [[gl:Mínimos cadrados lineais]]
| |
| [[it:Minimi Quadrati]]
| |
| [[he:שיטת הריבועים הפחותים]]
| |
| [[la:Methodus quadratorum minimorum]]
| |
| [[hu:Legkisebb négyzetek módszere]]
| |
| [[nl:Kleinste-kwadratenmethode]]
| |
| [[ja:最小二乗法]]
| |
| [[pl:Metoda najmniejszych kwadratów]]
| |
| [[pt:Método dos mínimos quadrados]]
| |
| [[ru:Метод наименьших квадратов]]
| |
| [[su:Kuadrat leutik]]
| |
| [[uk:Метод найменших квадратів]]
| |
| [[fi:Pienimmän neliösumman menetelmä]]
| |
| [[sv:Minstakvadratmetoden]]
| |
| [[tr:En küçük kareler yöntemi]]
| |
| [[ur:لکیری اقل مربعات]]
| |
| [[vi:Bình phương tối thiểu]]
| |
| [[vi:Bình phương tối thiểu tuyến tính]]
| |
| [[zh:最小二乘法]]
| |