In [[numerical linear algebra]], the [[conjugate gradient method]] is an [[iterative method]] for numerically solving the [[System of linear equations|linear system]]

:<math>\boldsymbol{Ax}=\boldsymbol{b}</math>

where <math>\boldsymbol{A}</math> is [[Symmetric matrix|symmetric]] [[Positive-definite matrix|positive-definite]]. The conjugate gradient method can be derived from several different perspectives, including specialization of the [[conjugate direction method]] for [[Optimization (mathematics)|optimization]] and variation of the [[Arnoldi iteration|Arnoldi]]/[[Lanczos iteration|Lanczos]] iteration for [[eigenvalue]] problems.

This article documents the important steps in these derivations.
==Derivation from the conjugate direction method==
{{Expand section|date=April 2010}}
The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

:<math>f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{.}</math>
===The conjugate direction method===
In the conjugate direction method for minimizing

:<math>f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{,}</math>

one starts with an initial guess <math>\boldsymbol{x}_0</math> and the corresponding residual <math>\boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0</math>, and computes the iterate and residual by the formulae

:<math>\begin{align}
\alpha_i&=\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{,}\\
\boldsymbol{x}_{i+1}&=\boldsymbol{x}_i+\alpha_i\boldsymbol{p}_i\text{,}\\
\boldsymbol{r}_{i+1}&=\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i
\end{align}</math>

where <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are a sequence of mutually conjugate directions, i.e.,

:<math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0</math>

for any <math>i\neq j</math>.

The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math>. Specific choices lead to various methods, including the conjugate gradient method and [[Gaussian elimination]].

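The following is a minimal NumPy sketch of these updates, with the conjugate directions supplied as an input, reflecting that the method itself does not prescribe them; the function name and signature are illustrative assumptions, not from the source.

<syntaxhighlight lang="python">
import numpy as np

def conjugate_direction(A, b, x0, directions):
    """Minimize f(x) = x^T A x - 2 b^T x along given mutually A-conjugate directions."""
    x = x0.astype(float)
    r = b - A @ x                   # residual r_0 = b - A x_0
    for p in directions:
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)  # alpha_i = p_i^T r_i / p_i^T A p_i
        x = x + alpha * p           # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap          # r_{i+1} = r_i - alpha_i A p_i
    return x
</syntaxhighlight>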
==Derivation from the Arnoldi/Lanczos iteration==
{{see|Arnoldi iteration|Lanczos iteration}}
The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.
===The general Arnoldi method===
In the Arnoldi iteration, one starts with a vector <math>\boldsymbol{r}_0</math> and gradually builds an [[orthonormal]] basis <math>\{\boldsymbol{v}_1,\boldsymbol{v}_2,\boldsymbol{v}_3,\ldots\}</math> of the [[Krylov subspace]]

:<math>\mathcal{K}(\boldsymbol{A},\boldsymbol{r}_0)=\operatorname{span}\{\boldsymbol{r}_0,\boldsymbol{Ar}_0,\boldsymbol{A}^2\boldsymbol{r}_0,\ldots\}</math>

by defining <math>\boldsymbol{v}_i=\boldsymbol{w}_i/\lVert\boldsymbol{w}_i\rVert_2</math> where

:<math>\boldsymbol{w}_i=\begin{cases}
\boldsymbol{r}_0 & \text{if }i=1\text{,}\\
\boldsymbol{Av}_{i-1}-\sum_{j=1}^{i-1}(\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_{i-1})\boldsymbol{v}_j & \text{if }i>1\text{.}
\end{cases}</math>

In other words, for <math>i>1</math>, <math>\boldsymbol{v}_i</math> is found by [[Gram-Schmidt orthogonalization|Gram-Schmidt orthogonalizing]] <math>\boldsymbol{Av}_{i-1}</math> against <math>\{\boldsymbol{v}_1,\boldsymbol{v}_2,\ldots,\boldsymbol{v}_{i-1}\}</math> followed by normalization.

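As an illustration, here is a minimal NumPy sketch of this Gram-Schmidt construction of <math>\boldsymbol{V}_i</math> and <math>\boldsymbol{\tilde{H}}_i</math>; the helper name <code>arnoldi</code> and its signature are assumptions for illustration, not from the source.

<syntaxhighlight lang="python">
import numpy as np

def arnoldi(A, r0, m):
    """Build an orthonormal Krylov basis V (n x (m+1)) and the (m+1) x m
    upper Hessenberg matrix H satisfying A V_m = V_{m+1} H."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)   # v_1 = r_0 / ||r_0||
    for i in range(m):
        w = A @ V[:, i]                 # expand the space with A v_i
        for j in range(i + 1):          # Gram-Schmidt against v_1, ..., v_i
            H[j, i] = V[:, j] @ w       # h_{ji} = v_j^T A v_i
            w = w - H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w) # h_{i+1,i} = ||w_{i+1}||
        if H[i + 1, i] == 0.0:
            break                       # breakdown: invariant subspace reached
        V[:, i + 1] = w / H[i + 1, i]   # normalize
    return V, H
</syntaxhighlight>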
Put in matrix form, the iteration is captured by the equation

:<math>\boldsymbol{AV}_i=\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i</math>

where

:<math>\begin{align}
\boldsymbol{V}_i&=\begin{bmatrix}
\boldsymbol{v}_1 & \boldsymbol{v}_2 & \cdots & \boldsymbol{v}_i
\end{bmatrix}\text{,}\\
\boldsymbol{\tilde{H}}_i&=\begin{bmatrix}
h_{11} & h_{12} & h_{13} & \cdots & h_{1,i}\\
h_{21} & h_{22} & h_{23} & \cdots & h_{2,i}\\
& h_{32} & h_{33} & \cdots & h_{3,i}\\
& & \ddots & \ddots & \vdots\\
& & & h_{i,i-1} & h_{i,i}\\
& & & & h_{i+1,i}
\end{bmatrix}=\begin{bmatrix}
\boldsymbol{H}_i\\
h_{i+1,i}\boldsymbol{e}_i^\mathrm{T}
\end{bmatrix}
\end{align}</math>

with

:<math>h_{ji}=\begin{cases}
\boldsymbol{v}_j^\mathrm{T}\boldsymbol{Av}_i & \text{if }j\leq i\text{,}\\
\lVert\boldsymbol{w}_{i+1}\rVert_2 & \text{if }j=i+1\text{,}\\
0 & \text{if }j>i+1\text{.}
\end{cases}</math>

When applying the Arnoldi iteration to solving linear systems, one starts with <math>\boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0</math>, the residual corresponding to an initial guess <math>\boldsymbol{x}_0</math>. After each step of iteration, one computes <math>\boldsymbol{y}_i=\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)</math> and the new iterate <math>\boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i</math>.

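A sketch of this solution step follows, reusing the hypothetical <code>arnoldi</code> helper above; forming and solving with <math>\boldsymbol{H}_i</math> directly is for illustration only.

<syntaxhighlight lang="python">
import numpy as np

def fom_solve(A, b, x0, m):
    """Krylov solve after m Arnoldi steps: x_m = x_0 + V_m H_m^{-1} (||r_0|| e_1)."""
    r0 = b - A @ x0
    V, H = arnoldi(A, r0, m)            # basis and Hessenberg matrix from above
    rhs = np.zeros(m)
    rhs[0] = np.linalg.norm(r0)         # right-hand side ||r_0||_2 e_1
    y = np.linalg.solve(H[:m, :m], rhs) # y_m = H_m^{-1} (||r_0||_2 e_1)
    return x0 + V[:, :m] @ y            # x_m = x_0 + V_m y_m
</syntaxhighlight>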
===The direct Lanczos method===
For the rest of the discussion, we assume that <math>\boldsymbol{A}</math> is symmetric positive-definite. By the symmetry of <math>\boldsymbol{A}</math>, the [[upper Hessenberg matrix]] <math>\boldsymbol{H}_i=\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i</math> becomes symmetric and thus tridiagonal. It can then be more clearly denoted by

:<math>\boldsymbol{H}_i=\begin{bmatrix}
a_1 & b_2\\
b_2 & a_2 & b_3\\
& \ddots & \ddots & \ddots\\
& & b_{i-1} & a_{i-1} & b_i\\
& & & b_i & a_i
\end{bmatrix}\text{.}</math>

This enables a short three-term recurrence for <math>\boldsymbol{v}_i</math> in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.

Since <math>\boldsymbol{A}</math> is symmetric positive-definite, so is <math>\boldsymbol{H}_i</math>. Hence, <math>\boldsymbol{H}_i</math> can be [[LU factorization|LU factorized]] without [[partial pivoting]] into
:<math>\boldsymbol{H}_i=\boldsymbol{L}_i\boldsymbol{U}_i=\begin{bmatrix}
1\\
c_2 & 1\\
& \ddots & \ddots\\
& & c_{i-1} & 1\\
& & & c_i & 1
\end{bmatrix}\begin{bmatrix}
d_1 & b_2\\
& d_2 & b_3\\
& & \ddots & \ddots\\
& & & d_{i-1} & b_i\\
& & & & d_i
\end{bmatrix}</math>

with convenient recurrences for <math>c_i</math> and <math>d_i</math>:

:<math>\begin{align}
c_i&=b_i/d_{i-1}\text{,}\\
d_i&=\begin{cases}
a_1 & \text{if }i=1\text{,}\\
a_i-c_ib_i & \text{if }i>1\text{.}
\end{cases}
\end{align}</math>
Rewrite <math>\boldsymbol{x}_i=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i</math> as

:<math>\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{H}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{U}_i^{-1}\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\\
&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i
\end{align}</math>

with

:<math>\begin{align}
\boldsymbol{P}_i&=\boldsymbol{V}_{i}\boldsymbol{U}_i^{-1}\text{,}\\
\boldsymbol{z}_i&=\boldsymbol{L}_i^{-1}(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)\text{.}
\end{align}</math>
It is now important to observe that

:<math>\begin{align}
\boldsymbol{P}_i&=\begin{bmatrix}
\boldsymbol{P}_{i-1} & \boldsymbol{p}_i
\end{bmatrix}\text{,}\\
\boldsymbol{z}_i&=\begin{bmatrix}
\boldsymbol{z}_{i-1}\\
\zeta_i
\end{bmatrix}\text{.}
\end{align}</math>

In fact, there are short recurrences for <math>\boldsymbol{p}_i</math> and <math>\zeta_i</math> as well:

:<math>\begin{align}
\boldsymbol{p}_i&=\frac{1}{d_i}(\boldsymbol{v}_i-b_i\boldsymbol{p}_{i-1})\text{,}\\
\zeta_i&=-c_i\zeta_{i-1}\text{,}
\end{align}</math>

with the initial values <math>\boldsymbol{p}_1=\boldsymbol{v}_1/d_1</math> and <math>\zeta_1=\lVert\boldsymbol{r}_0\rVert_2</math>.

With this formulation, we arrive at a simple recurrence for <math>\boldsymbol{x}_i</math>:

:<math>\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_0+\boldsymbol{P}_i\boldsymbol{z}_i\\
&=\boldsymbol{x}_0+\boldsymbol{P}_{i-1}\boldsymbol{z}_{i-1}+\zeta_i\boldsymbol{p}_i\\
&=\boldsymbol{x}_{i-1}+\zeta_i\boldsymbol{p}_i\text{.}
\end{align}</math>

The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method derived below.

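Putting the three-term Lanczos recurrence together with the recurrences for <math>d_i</math>, <math>c_i</math>, <math>\boldsymbol{p}_i</math>, <math>\zeta_i</math> and <math>\boldsymbol{x}_i</math> gives the following minimal sketch of the direct Lanczos method; it is illustrative NumPy code, not tuned for numerical robustness.

<syntaxhighlight lang="python">
import numpy as np

def direct_lanczos(A, b, x0, m):
    """Direct Lanczos solve via the three-term recurrence and LU-based updates."""
    x = x0.astype(float)
    r0 = b - A @ x
    zeta = np.linalg.norm(r0)       # zeta_1 = ||r_0||_2
    v = r0 / zeta                   # v_1
    v_prev = np.zeros_like(v)
    p_prev = np.zeros_like(v)
    b_i = 0.0                       # off-diagonal b_i (none before step 2)
    d = 0.0
    for i in range(1, m + 1):
        w = A @ v
        a_i = v @ w                 # diagonal a_i = v_i^T A v_i
        if i == 1:
            d = a_i                 # d_1 = a_1
        else:
            c = b_i / d             # c_i = b_i / d_{i-1}
            d = a_i - c * b_i       # d_i = a_i - c_i b_i
            zeta = -c * zeta        # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d  # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p            # x_i = x_{i-1} + zeta_i p_i
        w = w - a_i * v - b_i * v_prev   # three-term Lanczos recurrence
        b_i = np.linalg.norm(w)          # becomes b_{i+1}
        if b_i == 0.0:
            break                        # Krylov subspace exhausted; x is exact
        v_prev, v = v, w / b_i
        p_prev = p
    return x
</syntaxhighlight>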
===The conjugate gradient method from imposing orthogonality and conjugacy===
If we allow <math>\boldsymbol{p}_i</math> to be rescaled and compensate for the scaling in the constant factors, we can potentially have simpler recurrences of the form

:<math>\begin{align}
\boldsymbol{x}_i&=\boldsymbol{x}_{i-1}+\alpha_{i-1}\boldsymbol{p}_{i-1}\text{,}\\
\boldsymbol{r}_i&=\boldsymbol{r}_{i-1}-\alpha_{i-1}\boldsymbol{Ap}_{i-1}\text{,}\\
\boldsymbol{p}_i&=\boldsymbol{r}_i+\beta_{i-1}\boldsymbol{p}_{i-1}\text{.}
\end{align}</math>

As premises for the simplification, we now derive the orthogonality of <math>\boldsymbol{r}_i</math> and the conjugacy of <math>\boldsymbol{p}_i</math>, i.e., for <math>i\neq j</math>,

:<math>\begin{align}
\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_j&=0\text{,}\\
\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j&=0\text{.}
\end{align}</math>
The residuals are mutually orthogonal because <math>\boldsymbol{r}_i</math> is essentially a multiple of <math>\boldsymbol{v}_{i+1}</math>: for <math>i=0</math>, <math>\boldsymbol{r}_0=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1</math>; for <math>i>0</math>,

:<math>\begin{align}
\boldsymbol{r}_i&=\boldsymbol{b}-\boldsymbol{Ax}_i\\
&=\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_0+\boldsymbol{V}_i\boldsymbol{y}_i)\\
&=\boldsymbol{r}_0-\boldsymbol{AV}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_{i+1}\boldsymbol{\tilde{H}}_i\boldsymbol{y}_i\\
&=\boldsymbol{r}_0-\boldsymbol{V}_i\boldsymbol{H}_i\boldsymbol{y}_i-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{v}_1-\boldsymbol{V}_i(\lVert\boldsymbol{r}_0\rVert_2\boldsymbol{e}_1)-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\\
&=-h_{i+1,i}(\boldsymbol{e}_i^\mathrm{T}\boldsymbol{y}_i)\boldsymbol{v}_{i+1}\text{.}
\end{align}</math>
To see the conjugacy of <math>\boldsymbol{p}_i</math>, it suffices to show that <math>\boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i</math> is diagonal:

:<math>\begin{align}
\boldsymbol{P}_i^\mathrm{T}\boldsymbol{AP}_i&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{V}_i^\mathrm{T}\boldsymbol{AV}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{H}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i\boldsymbol{U}_i\boldsymbol{U}_i^{-1}\\
&=\boldsymbol{U}_i^{-\mathrm{T}}\boldsymbol{L}_i
\end{align}</math>

is symmetric and lower triangular simultaneously and thus must be diagonal.
Now we can derive the constant factors <math>\alpha_i</math> and <math>\beta_i</math> with respect to the scaled <math>\boldsymbol{p}_i</math> solely by imposing the orthogonality of <math>\boldsymbol{r}_i</math> and the conjugacy of <math>\boldsymbol{p}_i</math>.

Due to the orthogonality of <math>\boldsymbol{r}_i</math>, it is necessary that <math>\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_i=(\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i)^\mathrm{T}\boldsymbol{r}_i=0</math>. As a result,

:<math>\begin{align}
\alpha_i&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{(\boldsymbol{p}_i-\beta_{i-1}\boldsymbol{p}_{i-1})^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{.}
\end{align}</math>

Similarly, due to the conjugacy of <math>\boldsymbol{p}_i</math>, it is necessary that <math>\boldsymbol{p}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i=(\boldsymbol{r}_{i+1}+\beta_i\boldsymbol{p}_i)^\mathrm{T}\boldsymbol{Ap}_i=0</math>. As a result,

:<math>\begin{align}
\beta_i&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{Ap}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=-\frac{\boldsymbol{r}_{i+1}^\mathrm{T}(\boldsymbol{r}_i-\boldsymbol{r}_{i+1})}{\alpha_i\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\\
&=\frac{\boldsymbol{r}_{i+1}^\mathrm{T}\boldsymbol{r}_{i+1}}{\boldsymbol{r}_i^\mathrm{T}\boldsymbol{r}_i}\text{.}
\end{align}</math>

This completes the derivation.

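Assembled from the recurrences and the expressions for <math>\alpha_i</math> and <math>\beta_i</math> just derived, the conjugate gradient method reads as follows in a minimal NumPy sketch; the function name and stopping tolerance are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """Conjugate gradient iteration for symmetric positive-definite A."""
    x = x0.astype(float)
    r = b - A @ x                 # r_0 = b - A x_0
    p = r.copy()                  # p_0 = r_0
    rr = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rr / (p @ Ap)     # alpha_i = r_i^T r_i / p_i^T A p_i
        x += alpha * p            # x_{i+1} = x_i + alpha_i p_i
        r -= alpha * Ap           # r_{i+1} = r_i - alpha_i A p_i
        rr_next = r @ r
        if np.sqrt(rr_next) < tol:
            break
        beta = rr_next / rr       # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = r + beta * p          # p_{i+1} = r_{i+1} + beta_i p_i
        rr = rr_next
    return x
</syntaxhighlight>

Note that each iteration requires only one matrix-vector product with <math>\boldsymbol{A}</math> and a handful of vector operations.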
==References==
#{{cite journal|last1 = Hestenes|first1 = M. R.|authorlink1 = Magnus Hestenes|last2 = Stiefel|first2 = E.|authorlink2 = Eduard Stiefel|title = Methods of conjugate gradients for solving linear systems|journal = Journal of Research of the National Bureau of Standards|volume = 49|issue = 6|pages = 409–436|date = December 1952|url = http://nvl.nist.gov/pub/nistpubs/jres/049/6/V49.N06.A08.pdf|format = PDF}}
#{{cite book|last = Saad|first = Y.|title = Iterative methods for sparse linear systems|edition = 2nd|chapter = Chapter 6: Krylov Subspace Methods, Part I|publisher = SIAM|year = 2003|isbn = 978-0-89871-534-7}}

{{Numerical linear algebra}}

[[Category:Numerical linear algebra]]
[[Category:Optimization algorithms and methods]]
[[Category:Gradient methods]]
[[Category:Articles containing proofs]]