In mathematics, '''low-rank approximation''' is a [[mathematical optimization|minimization]] problem, in which the [[Loss function|cost function]] measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced [[rank (linear algebra)|rank]]. The problem is used for [[mathematical model]]ing and [[data compression]]. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, there are often other constraints on the approximating matrix apart from the rank constraint, e.g., [[nonnegative matrix|non-negativity]] and [[Hankel matrix|Hankel structure]].

Low-rank approximation is closely related to:
* [[principal component analysis]],
* [[factor analysis]],
* [[total least squares]],
* [[latent semantic analysis]], and
* [[orthogonal regression]].

== Definition ==

Given
* a structure specification <math>\mathcal{S} : \mathbb{R}^{n_p} \to \mathbb{R}^{m\times n}</math>,
* a vector of structure parameters <math>p\in\mathbb{R}^{n_p}</math>, and
* a desired rank <math>r</math>,

the structured low-rank approximation problem is
:<math>
\text{minimize} \quad \text{over } \widehat p \quad \|p - \widehat p\| \quad\text{subject to}\quad \operatorname{rank}\big(\mathcal{S}(\widehat p)\big) \leq r.
</math>

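For example, a [[Hankel matrix|Hankel]] structure specification maps the parameter vector to a matrix with constant anti-diagonals; with <math>m = 2</math>, <math>n = 3</math>, and <math>n_p = 4</math> (illustrative values),
:<math>
\mathcal{S}(p) = \begin{bmatrix} p_1 & p_2 & p_3 \\ p_2 & p_3 & p_4 \end{bmatrix},
</math>
so that any approximation <math>\mathcal{S}(\widehat p)</math> is by construction Hankel structured.
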
== Applications ==

* Linear [[system identification]], in which case the approximating matrix is [[Hankel matrix|Hankel structured]].<ref name=sysid-aut>I. Markovsky, Structured low-rank approximation and its applications, Automatica, Volume 44, Issue 4, April 2008, Pages 891–909. {{doi|10.1016/j.automatica.2007.09.011}}</ref><ref name=sysid-ac>I. Markovsky, J. C. Willems, S. Van Huffel, B. De Moor, and R. Pintelon, Application of structured total least squares for system identification and model reduction. IEEE Transactions on Automatic Control, Volume 50, Number 10, 2005, Pages 1490–1500.</ref>
* [[Machine learning]], in which case the approximating matrix is nonlinearly structured.<ref name=book-springer>I. Markovsky, Low-Rank Approximation: Algorithms, Implementation, Applications, Springer, 2012, ISBN 978-1-4471-2226-5</ref>
* [[Recommender system]]s, in which case the data matrix has [[missing values]] and the approximation is [[categorical data|categorical]].
* Distance [[matrix completion]], in which case there is a positive definiteness constraint.
* [[Natural language processing]], in which case the approximation is [[nonnegative matrix|nonnegative]].
* [[Computer algebra]], in which case the approximation is [[Sylvester matrix|Sylvester structured]].

== Basic low-rank approximation problem ==

The unstructured problem with fit measured by the [[Frobenius norm]], i.e.,
:<math>
\text{minimize} \quad \text{over } \widehat D \quad \|D - \widehat D\|_{\text{F}}
\quad\text{subject to}\quad \operatorname{rank}\big(\widehat D\big) \leq r,
</math>
has an analytic solution in terms of the [[singular value decomposition]] of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem.<ref name=EYM-thm>C. Eckart, G. Young, The approximation of one matrix by another of lower rank. Psychometrika, Volume 1, 1936, Pages 211–218. {{doi|10.1007/BF02288367}}</ref> Let
:<math>
D = U\Sigma V^{\top} \in \mathbb{R}^{m\times n}, \quad m \leq n,
</math>
be the singular value decomposition of <math>D</math> and partition <math>U</math>, <math>\Sigma =: \operatorname{diag}(\sigma_1, \ldots, \sigma_m)</math>, and <math>V</math> as follows:
:<math>
U =: \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \quad
\Sigma =: \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \quad\text{and}\quad
V =: \begin{bmatrix} V_1 & V_2 \end{bmatrix},
</math>
where <math>\Sigma_1</math> is <math>r\times r</math>, <math>U_1</math> is <math>m\times r</math>, and <math>V_1</math> is <math>n\times r</math>. Then the rank-<math>r</math> matrix obtained from the truncated singular value decomposition,
:<math>
\widehat D^* = U_1 \Sigma_1 V_1^{\top},
</math>
is such that
:<math>
\|D - \widehat D^*\|_{\text{F}} = \min_{\operatorname{rank}(\widehat D) \leq r} \|D - \widehat D\|_{\text{F}} = \sqrt{\sigma^2_{r+1} + \cdots + \sigma^2_m}.
</math>
The minimizer <math>\widehat D^*</math> is unique if and only if <math>\sigma_{r+1}\neq\sigma_{r}</math>.

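The truncated singular value decomposition solution translates directly into a few lines of [[Matlab]] code; the following is a minimal sketch (the function name <code>lra_svd</code> is chosen here for illustration):

<source lang="matlab">
function dh = lra_svd(d, r)
% unstructured low-rank approximation by the truncated
% singular value decomposition (Eckart–Young–Mirsky theorem)
[u, s, v] = svd(d);
dh = u(:, 1:r) * s(1:r, 1:r) * v(:, 1:r)';
</source>
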
== Weighted low-rank approximation problems ==

The Frobenius norm weights all elements of the approximation error <math>D - \widehat D</math> uniformly. Prior knowledge about the distribution of the errors can be taken into account by considering the weighted low-rank approximation problem
:<math>
\text{minimize} \quad \text{over } \widehat D \quad
\operatorname{vec}^{\top}(D - \widehat D)\, W \operatorname{vec}(D - \widehat D)
\quad\text{subject to}\quad \operatorname{rank}(\widehat D) \leq r,
</math>
where <math>\operatorname{vec}(A)</math> [[vectorization (mathematics)|vectorizes]] the matrix <math>A</math> column-wise and <math>W</math> is a given positive (semi)definite weight matrix.

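For example, when the weight matrix is diagonal, say <math>W = \operatorname{diag}\big(\operatorname{vec}(\Omega)\big)</math> for a matrix <math>\Omega</math> of element-wise weights (the symbol <math>\Omega</math> is introduced here for illustration), the cost function reduces to a weighted sum of squared errors:
:<math>
\operatorname{vec}^{\top}(D - \widehat D)\, W \operatorname{vec}(D - \widehat D)
= \sum_{i=1}^m \sum_{j=1}^n \Omega_{ij} \big(D_{ij} - \widehat D_{ij}\big)^2.
</math>
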
The general weighted low-rank approximation problem does not admit an analytic solution in terms of the singular value decomposition and is solved by local optimization methods.

== Image and kernel representations of the rank constraints ==

Using the equivalences
:<math>
\operatorname{rank}(\widehat D) \leq r
\quad\iff\quad
\text{there are } P\in\R^{m\times r} \text{ and } L\in\R^{r\times n}
\text{ such that } \widehat D = PL
</math>
and
:<math>
\operatorname{rank}(\widehat D) \leq r
\quad\iff\quad
\text{there is full row rank } R\in\R^{(m-r)\times m} \text{ such that } R \widehat D = 0,
</math>
the weighted low-rank approximation problem becomes equivalent to the parameter optimization problems
:<math>
\text{minimize} \quad \text{over } \widehat D, P, \text{ and } L \quad
\operatorname{vec}^{\top}(D - \widehat D)\, W \operatorname{vec}(D - \widehat D)
\quad\text{subject to}\quad \widehat D = PL
</math>
and
:<math>
\text{minimize} \quad \text{over } \widehat D \text{ and } R \quad
\operatorname{vec}^{\top}(D - \widehat D)\, W \operatorname{vec}(D - \widehat D)
\quad\text{subject to}\quad R \widehat D = 0 \quad\text{and}\quad RR^{\top} = I_{m-r},
</math>
where <math>I_{m-r}</math> is the [[identity matrix]] of size <math>m - r</math>.

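Both representations can be read off from the singular value decomposition of a matrix of rank at most <math>r</math>; a minimal [[Matlab]] sketch (variable names chosen for illustration):

<source lang="matlab">
% image and kernel representations of a matrix dh with rank(dh) <= r
[u, s, v] = svd(dh);
p  = u(:, 1:r) * s(1:r, 1:r);  % image representation: dh = p * l
l  = v(:, 1:r)';
rk = u(:, r+1:end)';           % kernel representation: rk * dh = 0, rk * rk' = I
</source>
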
== Alternating projections algorithm ==

The image representation of the rank constraint suggests a parameter optimization method in which the cost function is minimized alternately over one of the variables (<math>P</math> or <math>L</math>) with the other one fixed. Although simultaneous minimization over both <math>P</math> and <math>L</math> is a difficult non[[convex optimization]] problem, minimization over one of the variables alone is a [[linear least squares (mathematics)|linear least squares]] problem and can be solved globally and efficiently.

The resulting optimization algorithm (called alternating projections) is globally convergent with a linear convergence rate to a locally optimal solution of the weighted low-rank approximation problem. A starting value for the <math>P</math> (or <math>L</math>) parameter must be given. The iteration is stopped when a user-defined convergence condition is satisfied.

[[Matlab]] implementation of the alternating projections algorithm for weighted low-rank approximation:

<source lang="matlab">
function [dh, f] = wlra_ap(d, w, p, tol, maxiter)
% alternating projections for weighted low-rank approximation:
% dh is the approximation, f the cost function value per iteration
[m, n] = size(d); r = size(p, 2); f = inf;
for i = 2:maxiter
    % minimization over L with P fixed (linear least squares)
    bp = kron(eye(n), p);
    vl = (bp' * w * bp) \ bp' * w * d(:);
    l  = reshape(vl, r, n);
    % minimization over P with L fixed (linear least squares)
    bl = kron(l', eye(m));
    vp = (bl' * w * bl) \ bl' * w * d(:);
    p  = reshape(vp, m, r);
    % check exit condition
    dh = p * l; dd = d - dh;
    f(i) = dd(:)' * w * dd(:);
    if abs(f(i - 1) - f(i)) < tol, break, end
end
</source>

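A usage sketch, with hypothetical test data and an identity weight matrix, in which case the problem reduces to the Frobenius-norm problem of the Eckart–Young–Mirsky theorem:

<source lang="matlab">
d  = rand(5, 4);          % hypothetical data matrix
w  = eye(numel(d));       % identity weight: Frobenius-norm approximation
p0 = rand(5, 2);          % random starting value for P, rank r = 2
[dh, f] = wlra_ap(d, w, p0, 1e-6, 100);
</source>
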
== Variable projections algorithm ==

The alternating projections algorithm exploits the fact that the low-rank approximation problem, parameterized in the image form, is bilinear in the variables <math>P</math> and <math>L</math>. The bilinear nature of the problem is effectively used in an alternative approach, called variable projections.<ref>G. Golub and V. Pereyra, Separable nonlinear least squares: the variable projection method and its applications, Institute of Physics, Inverse Problems, Volume 19, 2003, Pages 1–26.</ref>

Consider again the weighted low-rank approximation problem, parameterized in the image form. Minimization with respect to the <math>L</math> variable (a linear least squares problem) leads to the closed-form expression of the approximation error as a function of <math>P</math>:
:<math>
f(P) = \sqrt{\operatorname{vec}^{\top}(D)\Big(
W - W (I_n \otimes P) \big( (I_n \otimes P)^{\top} W (I_n \otimes P) \big)^{-1} (I_n \otimes P)^{\top} W
\Big) \operatorname{vec}(D)}.
</math>
The original problem is therefore equivalent to the [[Least squares#Non-linear least squares|nonlinear least squares problem]] of minimizing <math>f(P)</math> with respect to <math>P</math>. For this purpose, standard optimization methods, e.g. the [[Levenberg–Marquardt algorithm]], can be used.

[[Matlab]] implementation of the variable projections algorithm for weighted low-rank approximation:

<source lang="matlab">
function [dh, f] = wlra_varpro(d, w, p, tol, maxiter)
% variable projections for weighted low-rank approximation:
% L is eliminated in cost_fun and lsqnonlin minimizes over P only
prob = optimset(); prob.solver = 'lsqnonlin';
prob.options = optimset('MaxIter', maxiter, 'TolFun', tol);
prob.x0 = p; prob.objective = @(p) cost_fun(p, d, w);
[p, f]  = lsqnonlin(prob);
[f, vl] = cost_fun(p, d, w);
dh = p * reshape(vl, size(p, 2), size(d, 2));

function [f, vl] = cost_fun(p, d, w)
bp = kron(eye(size(d, 2)), p);
vl = (bp' * w * bp) \ bp' * w * d(:);
f  = d(:)' * w * (d(:) - bp * vl);
</source>
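The call signature matches that of <code>wlra_ap</code> above, so it can be exercised in the same way (hypothetical test data; <code>lsqnonlin</code> requires the Optimization Toolbox):

<source lang="matlab">
d  = rand(5, 4); w = eye(numel(d)); p0 = rand(5, 2);
[dh, f] = wlra_varpro(d, w, p0, 1e-6, 100);
</source>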
|
The variable projections approach can also be applied to low-rank approximation problems parameterized in the kernel form. The method is effective when the number of eliminated variables is much larger than the number of optimization variables left at the stage of the nonlinear least squares minimization. Such problems occur in system identification, parameterized in the kernel form, where the eliminated variables are the approximating trajectory and the remaining variables are the model parameters. In the context of [[LTI system theory|linear time-invariant systems]], the elimination step is equivalent to [[Kalman filter|Kalman smoothing]].

== See also ==
* [[CUR matrix approximation]] is made from the rows and columns of the original matrix

== References ==
{{reflist}}

* M. T. Chu, R. E. Funderlic, R. J. Plemmons, Structured low-rank approximation, Linear Algebra and its Applications, Volume 366, 1 June 2003, Pages 157–172. {{doi|10.1016/S0024-3795(02)00505-0}}

== External links ==
* [https://github.com/slra/slra C++ package for structured low-rank approximation]

[[Category:Numerical linear algebra]]
[[Category:Dimension reduction]]
[[Category:Mathematical optimization]]