In [[Vapnik–Chervonenkis theory|statistical learning theory]], or sometimes [[computational learning theory]], the '''VC dimension''' (for '''Vapnik–Chervonenkis dimension''') is a measure of the [[membership function (mathematics)|capacity]] of a [[statistical classification]] [[algorithm]], defined as the [[cardinality]] of the largest set of points that the algorithm can [[Shattering (machine learning)|shatter]]. It is a core concept in [[Vapnik–Chervonenkis theory]], and was originally defined by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]].
 
Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the [[Heaviside step function|thresholding]] of a high-[[degree of a polynomial|degree]] [[polynomial]]: if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so it can fit a given set of training points well. But one can expect that the classifier will make errors on other points, because it is too wiggly. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This function may not fit the training set well, because it has a low capacity. This notion of capacity is made more rigorous below.
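
The following is a minimal Python sketch of this contrast (the 1-D points, the alternating labels, and the polynomial degree are illustrative choices, not taken from the literature): thresholding an interpolating high-degree polynomial reproduces the training labels exactly, while thresholding a least-squares line does not.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative 1-D training points with an alternating labelling.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1, -1, 1, -1, 1])            # desired signs (+1 / -1)

# High capacity: a degree-4 polynomial interpolates the labels exactly,
# so thresholding it at zero makes no training errors.
poly = np.polyfit(x, y, deg=len(x) - 1)
poly_pred = np.sign(np.polyval(poly, x))

# Low capacity: the best least-squares line cannot follow the alternation,
# so thresholding it at zero makes training errors.
a, b = np.polyfit(x, y, deg=1)
lin_pred = np.sign(a * x + b)

print("polynomial threshold training errors:", int(np.sum(poly_pred != y)))  # 0
print("linear threshold training errors:    ", int(np.sum(lin_pred != y)))   # 2
</syntaxhighlight>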
 
== Shattering ==
 
A classification model <math>f</math> with some parameter vector <math>\theta</math> is said to ''shatter'' a set of data points <math>(x_1,x_2,\ldots,x_n)</math> if, for all assignments of labels to those points, there exists a <math>\theta</math> such that the model <math>f</math> makes no errors when evaluating that set of data points.
 
The VC dimension of a model <math>f</math> is the maximum number of points that can be arranged so that <math>f</math> shatters them. More formally, it is the maximum <math>h</math> such that some set of data points of [[cardinality]] <math>h</math> can be shattered by <math>f</math>.
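
These definitions translate directly into a brute-force check: a set of points is shattered if every one of the <math>2^n</math> possible labellings can be realized by some parameter choice. A minimal Python sketch, assuming the model is given as a function <code>model(theta, x)</code> returning ±1 and that only a finite collection of candidate parameter vectors is searched (both are illustrative assumptions, not part of the definition):

<syntaxhighlight lang="python">
from itertools import product

def shatters(model, thetas, points):
    """Return True if, for every labelling of `points` with +1/-1, some
    parameter vector in the finite candidate set `thetas` makes the model
    reproduce that labelling exactly."""
    thetas = list(thetas)  # the candidate set is reused for every labelling
    for labels in product([-1, 1], repeat=len(points)):
        realized = any(
            all(model(theta, x) == label for x, label in zip(points, labels))
            for theta in thetas
        )
        if not realized:
            return False   # this labelling cannot be realized
    return True            # all 2**n labellings realized
</syntaxhighlight>

Note that searching a finite candidate set can only confirm shattering; failing on that finite set does not by itself prove that no suitable parameter vector exists.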
 
For example, consider a [[linear classifier|straight line]] as the classification model: the model used by a [[perceptron]]. The line should separate positive data points from negative data points. There exist sets of 3 points that can indeed be shattered using this model (any 3 points that are not collinear can be shattered). However, no set of 4 points can be shattered: by [[Radon's theorem]], any four points can be partitioned into two subsets with intersecting convex hulls, so it is not possible to separate one of these two subsets from the other. Thus, the VC dimension of this particular classifier is&nbsp;3. It is important to remember that while the arrangement of the points may be chosen freely, that arrangement must then remain fixed while every possible label assignment is tested. Note that only 3 of the 2<sup>3</sup>&nbsp;=&nbsp;8 possible label assignments are shown below for the three points.
 
{| border="0" cellpadding="4" cellspacing="0"
|- style="text-align:center;"
|  style="background:#dfd;"| [[File:VC1.svg]]
| style="background:#dfd;"| [[File:VC2.svg]]
|  style="background:#dfd;"| [[File:VC3.svg]]
| style="background:#fdd;"| [[File:VC4.svg]]
|- style="text-align:center;"
| colspan="3"  style="background:#dfd;"| '''3 points shattered'''
|  style="background:#fdd;"| '''4 points impossible'''
|}
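
Continuing in the same brute-force spirit, the three-point and four-point cases above can be checked numerically for a linear classifier <math>\operatorname{sign}(w_1 x_1 + w_2 x_2 + b)</math>. The particular points and the coarse parameter grid below are illustrative choices; the four-point result only shows that no parameters on that grid work (impossibility for arbitrary parameters follows from Radon's theorem, as stated above).

<syntaxhighlight lang="python">
from itertools import product

def separable(points, labels, thetas):
    """True if some (w1, w2, b) in `thetas` assigns sign(w1*x + w2*y + b)
    matching `labels` on every point (boundary points count as -1)."""
    return any(
        all((w1 * px + w2 * py + b > 0) == (lab == 1)
            for (px, py), lab in zip(points, labels))
        for (w1, w2, b) in thetas
    )

# Coarse, illustrative grid of candidate weights and biases.
thetas = list(product([-2, -1, -0.5, 0.5, 1, 2], repeat=3))

three = [(0, 0), (1, 0), (0, 1)]            # non-collinear
four = [(0, 0), (1, 0), (0, 1), (1, 1)]     # admits the XOR labelling

print(all(separable(three, labels, thetas)          # True: all 8 labellings work
          for labels in product([-1, 1], repeat=3)))
print(all(separable(four, labels, thetas)           # False: the XOR labelling fails
          for labels in product([-1, 1], repeat=4)))
</syntaxhighlight>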
 
== Uses ==
 
The VC dimension has utility in statistical learning theory, because it can predict a [[probabilistic]] [[upper bound]] on the test error of a classification model.
 
Vapnik<ref>Vapnik, Vladimir. ''The Nature of Statistical Learning Theory''. Springer, 2000.</ref> proved the following probabilistic bound relating the test error to the training error (for test data drawn [[Independent identically-distributed random variables|i.i.d.]] from the same distribution as the training set):
 
<math>
  P \left(\text{test error} \leq \text{training error} + \sqrt{h(\log(2N/h)+1)-\log(\eta/4)\over N} \right) = 1 - \eta
</math>
 
where <math>h</math> is the VC dimension of the classification model, <math>N</math> is the size of the training set, and <math>\eta</math> is a confidence parameter with <math>0 < \eta \leq 1</math> (restriction: this formula is valid only when <math>h \ll N</math>). Similar complexity bounds can be derived using [[Rademacher complexity]], which can sometimes provide more insight than VC dimension calculations into statistical methods such as those using [[kernel methods|kernels]].
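
As an illustrative computation (the specific values of <math>h</math>, <math>N</math>, and <math>\eta</math> below are made up for the example), the square-root complexity term of the bound can be evaluated directly:

<syntaxhighlight lang="python">
import math

def vc_bound_term(h, N, eta):
    """Square-root term of Vapnik's bound:
    sqrt((h*(log(2N/h) + 1) - log(eta/4)) / N)."""
    return math.sqrt((h * (math.log(2 * N / h) + 1) - math.log(eta / 4)) / N)

# Example: VC dimension 3 (the linear classifier above), 10,000 training
# samples, and confidence level 1 - eta = 0.95.
print(vc_bound_term(h=3, N=10_000, eta=0.05))   # ~0.058
</syntaxhighlight>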
 
In [[computational geometry]], VC dimension is one of the critical parameters in the size of [[E-net (computational geometry)|ε-nets]], which determines the complexity of approximation algorithms based on them; range sets without finite VC dimension may not have finite ε-nets at all.
 
==See also==
*[[Sauer–Shelah lemma]], a bound on the number of sets in a set system in terms of the VC dimension
 
== References ==
<references/>
* Andrew Moore's [http://www-2.cs.cmu.edu/~awm/tutorials/vcdim.html VC dimension tutorial]
* Vapnik, Vladimir. ''The Nature of Statistical Learning Theory''. Springer, 2000.
* V. Vapnik and A. Chervonenkis. "On the uniform convergence of relative frequencies of events to their probabilities." ''Theory of Probability and its Applications'', 16(2):264–280, 1971.
* A. Blumer, A. Ehrenfeucht, D. Haussler, and [[Manfred K. Warmuth|M. K. Warmuth]]. "Learnability and the Vapnik–Chervonenkis dimension." ''Journal of the ACM'', 36(4):929–965, 1989.
* Christopher Burges. "A Tutorial on Support Vector Machines for Pattern Recognition" (also covers the VC dimension) [http://citeseer.ist.psu.edu/burges98tutorial.html]
* [[Bernard Chazelle]]. "The Discrepancy Method." [http://www.cs.princeton.edu/~chazelle/book.html]
 
[[Category:Dimension]]
[[Category:Statistical classification]]
[[Category:Computational learning theory]]
[[Category:Measures of complexity]]
