'''Stability''', also known as '''algorithmic stability''', is a notion in [[computational learning theory]] of how a [[machine learning|machine learning algorithm]]'s output changes in response to small perturbations of its inputs. A stable learning algorithm is one for which the prediction does not change much when the training data is modified slightly. For instance, consider a machine learning algorithm that is being trained to recognize handwritten letters of the alphabet, using 1000 examples of handwritten letters and their labels ("A" to "Z") as a training set. One way to modify this training set is to leave out an example, so that only 999 examples of handwritten letters and their labels are available. A stable learning algorithm would produce a similar [[statistical classification|classifier]] with both the 1000-element and 999-element training sets.
Stability can be studied for many types of learning problems, from [[Natural language processing|language learning]] to [[inverse problem]]s in physics and engineering, as it is a property of the learning process rather than the type of information being learned. The study of stability gained importance in [[computational learning theory]] in the 2000s when it was shown to have a connection with [[Machine_learning#Generalization|generalization]]. It was shown that for large classes of learning algorithms, notably [[empirical risk minimization]] algorithms, certain types of stability ensure good generalization.
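The comparison described above can be carried out numerically. The following is a minimal sketch in Python (assuming the [[NumPy]] library; the synthetic dataset and the 1-nearest-neighbor rule are illustrative choices, not part of the definition):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 1000 labeled examples in R^2 with binary labels.
X_train = rng.normal(size=(1000, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

def one_nn_predict(X_tr, y_tr, X_te):
    """Predict each test point's label as that of its nearest training point."""
    dists = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    return y_tr[np.argmin(dists, axis=1)]

# Fresh test points on which to compare the two classifiers.
X_test = rng.normal(size=(500, 2))

# Train on all 1000 examples, and on 999 examples (the first one left out).
preds_full = one_nn_predict(X_train, y_train, X_test)
preds_loo = one_nn_predict(X_train[1:], y_train[1:], X_test)

# A stable algorithm changes few predictions when one example is removed.
agreement = np.mean(preds_full == preds_loo)
print(f"fraction of unchanged predictions: {agreement:.3f}")
</syntaxhighlight>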
== History ==
A central goal in designing a [[machine learning|machine learning system]] is to guarantee that the learning algorithm will [[Machine_learning#Generalization|generalize]], or perform accurately on new examples after being trained on a finite number of them. In the 1990s, milestones were reached in obtaining generalization bounds for [[supervised learning|supervised learning algorithms]]. The technique historically used to prove generalization was to show that an algorithm was [[consistent estimator|consistent]], using the [[uniform convergence]] properties of empirical quantities to their means. This technique was used to obtain generalization bounds for the large class of [[empirical risk minimization]] (ERM) algorithms. An ERM algorithm is one that selects a solution from a hypothesis space <math>H</math> in such a way as to minimize the empirical error on a training set <math>S</math>.
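In symbols, writing the empirical error of a function <math>f</math> on <math>S</math> as <math>I_S[f]</math> (defined below), an ERM algorithm selects

<math>f_S \in \arg\min_{f \in H} I_S[f].</math>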
A general result, proved by [[Vladimir Vapnik]] for ERM binary classification algorithms, is that for any target function and input distribution, any hypothesis space <math>H</math> with [[VC dimension|VC-dimension]] <math>d</math>, and <math>n</math> training examples, the algorithm is consistent and will produce a training error that is at most <math>O\left(\sqrt{\frac{d}{n}}\right)</math> (plus logarithmic factors) from the true error. The result was later extended to almost-ERM algorithms with function classes that do not have unique minimizers.
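As a concrete reading of this bound: with, say, <math>d = 10</math> and <math>n = 10{,}000</math>, the gap between the training error and the true error is of order <math>\sqrt{10/10{,}000} \approx 0.03</math>, ignoring constants and logarithmic factors.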
Vapnik's work, using what became known as [[VC theory]], established a relationship between the generalization of a learning algorithm and properties of the hypothesis space <math>H</math> of functions being learned. However, these results could not be applied to algorithms with hypothesis spaces of unbounded VC-dimension. Put another way, these results could not be applied when the information being learned had a complexity that was too large to measure. Some of the simplest machine learning algorithms, for instance certain regression algorithms, have hypothesis spaces with unbounded VC-dimension. Another example is language learning algorithms that can produce sentences of arbitrary length.

Stability analysis was developed in the 2000s for [[computational learning theory]] and is an alternative method for obtaining generalization bounds. The stability of an algorithm is a property of the learning process, rather than a direct property of the hypothesis space <math>H</math>, and it can be assessed in algorithms that have hypothesis spaces with unbounded or undefined VC-dimension, such as nearest neighbor. A stable learning algorithm is one for which the learned function does not change much when the training set is slightly modified, for instance by leaving out an example. A measure of [[leave one out error]] is used in a Cross Validation Leave One Out (CVloo) procedure to evaluate a learning algorithm's stability with respect to the loss function. As such, stability analysis is the application of [[sensitivity analysis]] to machine learning.
== Summary of classic results ==
* '''Early 1900s''' - Stability in learning theory was first described in terms of continuity of the learning map <math>L</math>, a notion traced to [[Andrey Nikolayevich Tikhonov]].
* '''1979''' - Devroye and Wagner observed that the leave-one-out behavior of an algorithm is related to its sensitivity to small changes in the sample.<ref>L. Devroye and Wagner, Distribution-free performance bounds for potential function rules, IEEE Trans. Inform. Theory 25(5) (1979) 601–604.</ref>
* '''1999''' - Kearns and Ron discovered a connection between finite VC-dimension and stability.<ref>M. Kearns and [[Dana Ron|D. Ron]], Algorithmic stability and sanity-check bounds for leave-one-out cross-validation, Neural Comput. 11(6) (1999) 1427–1453.</ref>
* '''2002''' - In a landmark paper, Bousquet and Elisseeff proposed the notion of ''uniform hypothesis stability'' of a learning algorithm and showed that it implies low generalization error. Uniform hypothesis stability, however, is a strong condition that does not apply to large classes of algorithms, including ERM algorithms with a hypothesis space of only two functions.<ref>O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.</ref>
* '''2002''' - Kutin and Niyogi extended Bousquet and Elisseeff's results by providing generalization bounds for several weaker forms of stability which they called ''almost-everywhere stability''. Furthermore, they took an initial step in establishing the relationship between stability and consistency in ERM algorithms in the Probably Approximately Correct (PAC) setting.<ref>S. Kutin and P. Niyogi, Almost-everywhere algorithmic stability and generalization error, Technical Report TR-2002-03, University of Chicago (2002).</ref>
* '''2006''' - In an unusual publication (on a theorem) for the journal [[Nature (journal)|Nature]], Mukherjee et al. proved the relationship between stability and ERM consistency in the general case. They proposed a statistical form of leave-one-out stability which they called ''CVEEEloo stability'', and showed that it is a) sufficient for generalization in bounded loss classes, and b) necessary and sufficient for consistency (and thus generalization) of ERM algorithms for certain loss functions (such as the square loss, the absolute value loss and the binary classification loss).<ref>S. Mukherjee, P. Niyogi, T. Poggio, and R. M. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math., 25(1-3):161–193, 2006.</ref>
* '''2010''' - Shalev-Shwartz et al. noticed problems with the original results of Vapnik due to the complex relations between hypothesis space and loss class. They discuss stability notions that capture different loss classes and different types of learning, supervised and unsupervised.<ref>Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K., Learnability, Stability and Uniform Convergence, Journal of Machine Learning Research, 11(Oct):2635–2670, 2010.</ref>
== Preliminary definitions ==
We define several terms related to learning algorithms and training sets, so that we can then define stability in multiple ways and present theorems from the field.
A machine learning algorithm, also known as a learning map <math>L</math>, maps a training data set, which is a set of labeled examples <math>(x,y)</math>, onto a function <math>f</math> from <math>X</math> to <math>Y</math>, where <math>X</math> is the space of the examples <math>x</math> and <math>Y</math> is the space of the labels <math>y</math>. The functions <math>f</math> are selected from a hypothesis space of functions called <math>H</math>.
The training set from which an algorithm learns is defined as

<math>S = \{z_1 = (x_1,\ y_1),\ \ldots,\ z_m = (x_m,\ y_m)\},</math>

a set of size <math>m</math> in <math>Z = X \times Y</math>, drawn i.i.d. from an unknown distribution <math>D</math>.
Thus, the learning map <math>L</math> is defined as a mapping from <math>Z^m</math> into <math>H</math>, mapping a training set <math>S</math> onto a function <math>f_S</math> from <math>X</math> to <math>Y</math>. Here, we consider only deterministic algorithms where <math>L</math> is symmetric with respect to <math>S</math>, i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable.
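In code, a deterministic learning map is simply a function from a list of labeled examples to a prediction function. A minimal sketch of this signature in Python (the type names are illustrative, not standard notation):

<syntaxhighlight lang="python">
from typing import Callable, List, Tuple, TypeVar

X = TypeVar("X")  # the input space
Y = TypeVar("Y")  # the label space

# A hypothesis f: X -> Y, i.e. an element of the hypothesis space H.
Hypothesis = Callable[[X], Y]

# The learning map L: Z^m -> H, taking a training set S to a function f_S.
LearningMap = Callable[[List[Tuple[X, Y]]], Hypothesis]
</syntaxhighlight>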
The loss <math>V</math> of a hypothesis <math>f</math> with respect to an example <math>z = (x,y)</math> is then defined as <math>V(f,z) = V(f(x),y)</math>.
The empirical error of <math>f</math> is <math>I_S[f] = \frac{1}{m}\sum_{i=1}^m V(f,z_i)</math>.
The true error of <math>f</math> is <math>I[f] = \mathbb{E}_z V(f,z)</math>, where the expectation is taken over <math>z</math> drawn from <math>D</math>.
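Both quantities translate directly into code. A minimal sketch, assuming the square loss and a Monte-Carlo approximation of the expectation (both illustrative assumptions; in practice <math>D</math> is unknown, so <math>I[f]</math> can only be approximated when fresh samples are available):

<syntaxhighlight lang="python">
import numpy as np

def square_loss(prediction, label):
    """The loss V(f(x), y); the square loss is an illustrative choice."""
    return (prediction - label) ** 2

def empirical_error(f, S):
    """I_S[f] = (1/m) * sum of V(f, z_i) over the m examples z_i = (x_i, y_i) in S."""
    return np.mean([square_loss(f(x), y) for (x, y) in S])

def true_error_estimate(f, sample_z, n_samples=100_000):
    """Monte-Carlo estimate of I[f] = E_z V(f, z); sample_z() draws one z = (x, y) from D."""
    losses = []
    for _ in range(n_samples):
        x, y = sample_z()
        losses.append(square_loss(f(x), y))
    return np.mean(losses)
</syntaxhighlight>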
Given a training set <math>S</math> of size <math>m</math>, we will build, for all <math>i = 1,\ldots,m</math>, modified training sets as follows:

* By removing the <math>i</math>-th element:

<math>S^{|i} = \{z_1,\ \ldots,\ z_{i-1},\ z_{i+1},\ \ldots,\ z_m\}</math>

* By replacing the <math>i</math>-th element:

<math>S^i = \{z_1,\ \ldots,\ z_{i-1},\ z_i',\ z_{i+1},\ \ldots,\ z_m\}</math>
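With a training set represented as a Python list, these perturbed copies are one-liners (a sketch; <code>z_new</code> plays the role of the replacement example <math>z_i'</math>):

<syntaxhighlight lang="python">
def remove_ith(S, i):
    """S^{|i}: the training set S with the i-th element removed (0-based index)."""
    return S[:i] + S[i + 1:]

def replace_ith(S, i, z_new):
    """S^i: the training set S with the i-th element replaced by z_new."""
    return S[:i] + [z_new] + S[i + 1:]
</syntaxhighlight>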
== Definitions of stability ==
===Hypothesis Stability===
An algorithm <math>L</math> has hypothesis stability β with respect to the loss function <math>V</math> if the following holds:

<math>\forall i\in \{1,...,m\}, \mathbb{E}_{S,z} [|V(f_S,z)-V(f_{S^{|i}},z)|]\leq\beta.</math>
===Point-wise Hypothesis Stability===
An algorithm <math>L</math> has point-wise hypothesis stability β with respect to the loss function <math>V</math> if the following holds:

<math>\forall i\in\{1,...,m\}, \mathbb{E}_{S} [|V(f_S,z_i)-V(f_{S^{|i}},z_i)|]\leq\beta.</math>
===Error Stability===
An algorithm <math>L</math> has error stability β with respect to the loss function <math>V</math> if the following holds:

<math>\forall S\in Z^m, \forall i\in\{1,...,m\}, |\mathbb{E}_z[V(f_S,z)]-\mathbb{E}_z[V(f_{S^{|i}},z)]|\leq\beta</math>
===Uniform Stability===
An algorithm <math>L</math> has uniform stability β with respect to the loss function <math>V</math> if the following holds:

<math>\forall S\in Z^m, \forall i\in\{1,...,m\}, \sup_{z\in Z}|V(f_S,z)-V(f_{S^i},z)|\leq\beta</math>

A probabilistic version of uniform stability β is:

<math>\forall S\in Z^m, \forall i\in\{1,...,m\}, \mathbb{P}_S\{\sup_{z\in Z}|V(f_S,z)-V(f_{S^i},z)|\leq\beta\}\geq1-\delta</math>
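For instance, Tikhonov regularization algorithms in a reproducing kernel Hilbert space, such as the regularized least squares algorithm listed below, have uniform stability on the order of <math>O\left(\tfrac{1}{\lambda m}\right)</math>, where <math>\lambda</math> is the regularization parameter, so <math>\beta</math> decreases as the training set grows.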
===Leave-one-out cross-validation (CVloo) Stability===
An algorithm <math>L</math> has CVloo stability β with respect to the loss function <math>V</math> if the following holds:

<math>\forall i\in\{1,...,m\}, \mathbb{P}_S\{|V(f_S,z_i)-V(f_{S^{|i}},z_i)|\leq\beta_{CV}\}\geq1-\delta_{CV}</math>
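The difference <math>|V(f_S,z_i)-V(f_{S^{|i}},z_i)|</math> can be estimated empirically. A minimal sketch for regularized least squares, which is listed below among the stable algorithms (the synthetic data, dimension, and regularization parameter <code>lam</code> are illustrative assumptions):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
m, lam = 200, 1.0

# Hypothetical regression data: y = <w, x> + noise, with w fixed below.
X = rng.normal(size=(m, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=m)

def fit_rls(X_tr, y_tr, lam):
    """Regularized least squares: minimize (1/n)||Xw - y||^2 + lam*||w||^2."""
    n, d = X_tr.shape
    return np.linalg.solve(X_tr.T @ X_tr + lam * n * np.eye(d), X_tr.T @ y_tr)

w_full = fit_rls(X, y, lam)
diffs = []
for i in range(m):
    mask = np.arange(m) != i
    w_loo = fit_rls(X[mask], y[mask], lam)
    # |V(f_S, z_i) - V(f_{S^{|i}}, z_i)| for the square loss.
    diffs.append(abs((X[i] @ w_full - y[i]) ** 2 - (X[i] @ w_loo - y[i]) ** 2))

# beta_CV should bound these differences with high probability over S.
print(f"largest leave-one-out loss change: {max(diffs):.2e}")
</syntaxhighlight>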
===Expected-leave-one-out error (<math>Eloo_{err}</math>) Stability===
An algorithm <math>L</math> has <math>Eloo_{err}</math> stability if for each <math>m</math> there exist <math>\beta_{EL}^m</math> and <math>\delta_{EL}^m</math> such that:

<math>\mathbb{P}_S\left\{\left|I[f_S]-\frac{1}{m}\sum_{i=1}^m V(f_{S^{|i}},z_i)\right|\leq\beta_{EL}^m\right\}\geq1-\delta_{EL}^m,</math>

with <math>\beta_{EL}^m</math> and <math>\delta_{EL}^m</math> going to zero as <math>m\rightarrow\infty</math>.
== Classic theorems ==
'''From Bousquet and Elisseeff (02)''':
For symmetric learning algorithms with bounded loss, if the algorithm satisfies the probabilistic definition of Uniform Stability above, then the algorithm generalizes.

Uniform Stability is a strong condition that is not met by all algorithms, but it is, surprisingly, met by the large and important class of regularization algorithms. The generalization bound is given in the article.
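For reference, the uniform-stability generalization bound proved there takes the following form: for a loss bounded by <math>M</math> and uniform stability <math>\beta</math>, with probability at least <math>1-\delta</math> over the draw of the training set,

<math>I[f_S] \leq I_S[f_S] + 2\beta + (4m\beta + M)\sqrt{\frac{\ln(1/\delta)}{2m}}.</math>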
'''From Mukherjee et al. (06)''':

*For symmetric learning algorithms with bounded loss, if the algorithm has ''both'' Leave-one-out cross-validation (CVloo) Stability and Expected-leave-one-out error (<math>Eloo_{err}</math>) Stability as defined above, then the algorithm generalizes.
*Neither condition alone is sufficient for generalization, but together they ensure it (the converse is not true).
*For ERM algorithms specifically (say for the square loss), Leave-one-out cross-validation (CVloo) Stability is both necessary and sufficient for consistency and generalization.

This is an important result for the foundations of learning theory, because it shows that two previously unrelated properties of an algorithm, stability and consistency, are equivalent for ERM (and certain loss functions). The generalization bound is given in the article.
==Algorithms that are stable==
This is a list of algorithms that have been shown to be stable, together with the articles in which the associated generalization bounds are provided.

* [[Linear regression]]<ref>Elisseeff, A. A study about algorithmic stability and their relation to generalization performances. Technical report. (2000)</ref>
*k-NN classifier with a {0-1} loss function.<ref>L. Devroye and Wagner, Distribution-free performance bounds for potential function rules, IEEE Trans. Inform. Theory 25(5) (1979) 601–604.</ref>
*[[Support Vector Machine]] (SVM) classification with a bounded kernel and where the regularizer is a norm in a reproducing kernel Hilbert space. A large regularization constant <math>C</math> leads to good stability.<ref>O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.</ref>
*Soft margin SVM classification.<ref>O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.</ref>
*[[regularization (machine learning)|Regularized]] least squares regression.<ref>O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.</ref>
*The minimum relative entropy algorithm for classification.<ref>O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.</ref>
*A version of [[bootstrap aggregating|bagging]] regularizers with the number <math>k</math> of regressors increasing with <math>n</math>.<ref>Rifkin, R. Everything Old is New Again: A fresh look at historical approaches in machine learning. Ph.D. Thesis, MIT, 2002.</ref>
*Multi-class SVM classification.<ref>Rifkin, R. Everything Old is New Again: A fresh look at historical approaches in machine learning. Ph.D. Thesis, MIT, 2002.</ref>
== References ==
{{Reflist}}
==Further reading==
{{Refbegin}}
*S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proc. of UAI 18, 2002.
*S. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4):397–419, 2005.
*V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
*V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
*T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. Learning theory: general conditions for predictivity. Nature, 428:419–422, 2004.
*A. Elisseeff, T. Evgeniou, and M. Pontil. Stability of randomized learning algorithms. Journal of Machine Learning Research, 6:55–79, 2005.
*A. Elisseeff and M. Pontil. Leave-one-out error and stability of learning algorithms with applications. NATO Science Series Sub Series III: Computer and Systems Sciences, 190:111–130, 2003.
*S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Learnability, stability and uniform convergence. Journal of Machine Learning Research, 11(Oct):2635–2670, 2010.
{{Refend}}
[[Category:Machine learning| ]]
[[Category:Learning]]