The '''Frank–Wolfe algorithm''' is a simple [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite doi|10.1016/0041-5553(66)90114-5|noedit}}</ref> the '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe (mathematician)|Philip Wolfe]] in 1956.<ref>{{cite doi|10.1002/nav.3800030109|noedit}}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).

==Problem statement==

:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>,

where the function <math> f</math> is [[Convex function|convex]] and [[differentiable function|differentiable]], and the domain / feasible set <math>\mathcal{D}</math> is a [[Convex set|convex]] and bounded set in some [[vector space]].
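
A typical instance of this form is, for example, a [[least squares]] problem constrained to the probability [[simplex]]:

:<math>
\min_{\mathbf{x} \in \Delta_n} \|A\mathbf{x} - \mathbf{b}\|_2^2,
\qquad
\Delta_n = \left\{\mathbf{x} \in \mathbb{R}^n : x_i \geq 0, \; \textstyle\sum_{i=1}^n x_i = 1\right\},
</math>

where the objective is convex and differentiable, and the simplex <math>\Delta_n</math> is a convex and bounded set.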

==Algorithm==

[[File:Frank-Wolfe-Algorithm.png|thumbnail|right|A step of the Frank–Wolfe algorithm]]

:''Initialization:'' Let <math>k \leftarrow 0</math>, and let <math>\mathbf{x}_0 \!</math> be any point in <math>\mathcal{D}</math>.

:'''Step 1.''' ''Direction-finding subproblem:'' Find <math>\mathbf{s}_k</math> solving
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math>.)''

:'''Step 2.''' ''Step size determination:'' Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma</math> that minimizes <math> f(\mathbf{x}_k+\gamma(\mathbf{s}_k -\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math>.

:'''Step 3.''' ''Update:'' Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\gamma(\mathbf{s}_k-\mathbf{x}_k)</math>, let <math>k \leftarrow k+1</math> and go to Step 1.
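
A minimal Python sketch of these steps, using the default step size <math>\gamma = \tfrac{2}{k+2}</math>, might look as follows; the names <code>grad_f</code>, <code>lmo</code> and <code>x0</code> are illustrative: <code>grad_f</code> computes <math>\nabla f</math>, and <code>lmo</code> solves the direction-finding subproblem over <math>\mathcal{D}</math>.

<syntaxhighlight lang="python">
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_iters=100):
    """Minimal Frank-Wolfe sketch.

    grad_f(x) -- gradient of f at x
    lmo(g)    -- linear minimization oracle: argmin_{s in D} <s, g>
    x0        -- starting point inside the feasible set D
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad_f(x)               # Step 1: gradient at the current iterate
        s = lmo(g)                  # Step 1: solve the linear subproblem over D
        gamma = 2.0 / (k + 2.0)     # Step 2: default step size
        x = x + gamma * (s - x)     # Step 3: convex combination of feasible points
    return x
</syntaxhighlight>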

==Properties==

While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective value relative to the optimum is <math>O(1/k)</math> after ''k'' iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{cite doi|10.1016/0022-247X(78)90137-3|noedit}}</ref>

The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{cite doi|10.1145/1824777.1824783|noedit}}</ref> as well as, for example, the optimization of [[flow network|minimum–cost flow]]s in [[transportation network]]s.<ref>{{cite doi|10.1016/0191-2615(84)90029-8|noedit}}</ref>

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
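
For some feasible sets this linear subproblem even admits a closed-form solution. As an illustration, a minimal Python oracle for the <math>\ell_1</math>-ball of radius <math>\tau</math> (the name <code>lmo_l1_ball</code> is illustrative), whose minimizer is always a signed coordinate vertex, could look as follows; such an oracle can be passed as <code>lmo</code> to the loop sketched above.

<syntaxhighlight lang="python">
import numpy as np

def lmo_l1_ball(g, tau=1.0):
    """Linear minimization oracle for the l1-ball of radius tau:
    returns argmin_{||s||_1 <= tau} <s, g>, attained at the vertex
    -tau * sign(g[i]) * e_i for the coordinate i with largest |g[i]|."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g, dtype=float)
    s[i] = -tau * np.sign(g[i])   # move against the sign of the largest gradient entry
    return s
</syntaxhighlight>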

While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>{{Cite book|title=Nonlinear Programming|first=Dimitri|last=Bertsekas|year=2003|page=222|publisher=Athena Scientific|isbn=1-886529-00-0}}</ref>

==Lower bounds on the solution value, and primal-dual analysis==

Since <math>f</math> is convex, for any two points <math>\mathbf{x}, \mathbf{y} \in \mathcal{D}</math> the value <math>f(\mathbf{y})</math> lies above the [[Tangent|tangent plane]] of <math>f</math> at <math>\mathbf{x}</math>:

:<math>
f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x})
</math>

This holds in particular for the (unknown) optimal solution <math>\mathbf{x}^*</math>. The best lower bound with respect to a given point <math>\mathbf{x}</math> is given by

:<math>
f(\mathbf{x}^*) \geq \min_{\mathbf{y} \in \mathcal{D}} f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) = f(\mathbf{x}) - \mathbf{x}^T \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^T \nabla f(\mathbf{x})
</math>

The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm; therefore, the solution <math>\mathbf{s}_k</math> of the direction-finding subproblem of the <math>k</math>-th iteration can be used to determine increasing lower bounds <math>l_k</math> during each iteration by setting <math>l_0 = - \infty</math> and

:<math>
l_k := \max (l_{k - 1}, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k))
</math>

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always <math>l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)</math>.

It has been shown that the corresponding [[duality gap]], that is, the difference between <math>f(\mathbf{x}_k)</math> and the lower bound <math>l_k</math>, decreases with the same convergence rate, i.e.,

:<math>
f(\mathbf{x}_k) - l_k = O(1/k) .
</math>
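
In practice these bounds come essentially for free from the direction-finding subproblem. A minimal sketch of how they can serve as a stopping criterion (continuing the illustrative names <code>f</code>, <code>grad_f</code> and <code>lmo</code> from above) is:

<syntaxhighlight lang="python">
import numpy as np

def frank_wolfe_with_certificate(f, grad_f, lmo, x0, tol=1e-6, max_iters=1000):
    """Frank-Wolfe using the lower bound l_k as a stopping criterion."""
    x = np.asarray(x0, dtype=float)
    lower_bound = -np.inf
    for k in range(max_iters):
        g = grad_f(x)
        s = lmo(g)                                          # direction-finding subproblem
        lower_bound = max(lower_bound, f(x) + (s - x) @ g)  # l_k
        if f(x) - lower_bound <= tol:                       # certified duality gap
            break
        gamma = 2.0 / (k + 2.0)
        x = x + gamma * (s - x)
    return x, lower_bound
</syntaxhighlight>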

==Notes==
{{Reflist}}

==Bibliography==
*{{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization|journal=Journal of Machine Learning Research: Workshop and Conference Proceedings|volume=28|issue=1|pages=427–435|year=2013|url=http://jmlr.csail.mit.edu/proceedings/papers/v28/jaggi13.html}} (Overview paper)
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank–Wolfe algorithm] description

== See also ==
* [[Proximal gradient method]]s

{{Optimization algorithms|convex}}

{{DEFAULTSORT:Frank-Wolfe algorithm}}
[[Category:Optimization algorithms and methods]]
[[Category:Iterative methods]]
[[Category:First order methods]]
[[Category:Gradient methods]]