'''Bootstrap aggregating''' ('''bagging''') is a [[Ensemble learning|machine learning ensemble]] [[meta-algorithm]] designed to improve the stability and accuracy of [[machine learning]] algorithms used in [[statistical classification]] and [[Regression analysis|regression]]. It also reduces [[variance]] and helps to avoid [[overfitting]]. Although it is usually applied to [[Decision tree learning|decision tree]] methods, it can be used with any type of method. Bagging is a special case of the [[Ensemble learning|model averaging]] approach.
 
==Description of the technique==
Given a standard [[training set]] ''D'' of size ''n'', bagging generates ''m'' new training sets <math>D_i</math>, each of size ''n′'', by [[Sampling (statistics)|sampling]] from ''D'' [[Probability distribution#With finite support|uniformly]] and [[Sampling (statistics)#Replacement of selected units|with replacement]]. By sampling with replacement, some observations may be repeated in each <math>D_i</math>. If ''n[[prime (symbol)|′]]''=''n'', then for large ''n'' the set <math>D_i</math> is expected to have the fraction (1 - 1/''[[e (mathematical constant)|e]]'') (≈63.2%) of the unique examples of ''D'', the rest being duplicates.<ref>Aslam, Javed A.; Popa, Raluca A.; and Rivest, Ronald L. (2007); [http://people.csail.mit.edu/rivest/pubs/APR07.pdf ''On Estimating the Size and Confidence of a Statistical Audit''], Proceedings of the Electronic Voting Technology Workshop (EVT '07), Boston, MA, August 6, 2007. More generally, when drawing with replacement ''n′'' values out of a set of ''n'' (different and equally likely), the expected number of unique draws is <math>n(1 - e^{-n'/n})</math>.</ref> This kind of sample is known as a [[bootstrap (statistics)|bootstrap]] sample. The ''m'' models are fitted using the above ''m'' bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
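As a quick check of the sampling step, the following R snippet (a minimal sketch; the 63.2% figure is only the large-''n'' expectation) draws one bootstrap sample of size ''n′'' = ''n'' and measures the fraction of unique examples:

<syntaxhighlight lang="r">
n <- 10000
idx <- sample(n, n, replace = TRUE)  # one bootstrap sample of size n' = n
length(unique(idx)) / n              # close to 1 - 1/e, i.e. about 0.632
</syntaxhighlight>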
 
Bagging leads to "improvements for unstable procedures" (Breiman, 1996), which include, for example, [[neural nets]], [[classification and regression tree]]s, and subset selection in [[linear regression]] (Breiman, 1994). An application of bagging that improves preimage learning in image denoising is given by Sahu et al. (2011).<ref>Sahu, A.; Runger, G.; Apley, D.; Image denoising with a multi-phase kernel principal component approach and an ensemble version, IEEE Applied Imagery Pattern Recognition Workshop, pp. 1–7, 2011.</ref> On the other hand, bagging can mildly degrade the performance of stable methods such as ''k''-nearest neighbours (Breiman, 1996).
 
== Example: Ozone data ==
To illustrate the basic principles of bagging, below is an analysis of the relationship between [[ozone]] and temperature (data from [[Peter Rousseeuw|Rousseeuw]] and Leroy (1986), available at [[classic data sets]]; analysis done in [[R (programming language)|R]]).
 
The relationship between temperature and ozone in this data set is apparently non-linear, based on the scatter plot. To describe this relationship mathematically, [[local regression|LOESS]] smoothers (with span 0.5) are used.
Instead of building a single smoother from the complete data set, 100 [[bootstrap (statistics)|bootstrap]] samples of the data were drawn. Each sample differs from the original data set, yet resembles it in distribution and variability. For each bootstrap sample, a LOESS smoother was fitted, and predictions from these 100 smoothers were then made across the range of the data. The first 10 predicted smooth fits appear as grey lines in the figure below. The lines are clearly very ''wiggly'' and they overfit the data, a consequence of the span being too small.
 
By taking the average of the 100 smoothers, each fitted to a bootstrap sample of the original data set, we arrive at a single bagged predictor (red line). Clearly, the mean is more stable and there is less [[overfitting]].
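A minimal R sketch of this analysis, assuming the data sit in a data frame <code>oz</code> with columns <code>temp</code> and <code>ozone</code> (the variable names and plotting details are illustrative, not taken from the original analysis):

<syntaxhighlight lang="r">
grid <- data.frame(temp = seq(min(oz$temp), max(oz$temp), length.out = 100))
preds <- replicate(100, {
  boot <- oz[sample(nrow(oz), replace = TRUE), ]        # bootstrap sample
  fit  <- loess(ozone ~ temp, data = boot, span = 0.5)  # wiggly smoother
  predict(fit, newdata = grid)
})
bagged <- rowMeans(preds, na.rm = TRUE)                 # bagged predictor
plot(oz$temp, oz$ozone, xlab = "temperature", ylab = "ozone")
matlines(grid$temp, preds[, 1:10], col = "grey", lty = 1)  # first 10 fits
lines(grid$temp, bagged, col = "red", lwd = 2)             # their average
</syntaxhighlight>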
 
[[image:ozone.png]]
 
==Bagging for nearest neighbour classifiers==
It is well known that the [[Bayes classifier|risk]] of a 1-nearest-neighbour (1NN) classifier is at most twice the risk of the [[Bayes classifier]], but there is no guarantee that this classifier will be [[Bayes classifier|consistent]]. By careful choice of the size of the resamples, bagging can lead to substantial improvements in the performance of the 1NN classifier. By taking a large number of resamples of the data of size <math>n'</math>, the bagged nearest neighbour classifier will be consistent provided that <math>n' \to \infty</math> and <math>n'/n \to 0</math> as the sample size <math>n \to \infty</math>.
 
With an infinite number of resamples, the bagged nearest neighbour classifier can be viewed as a [[weighted nearest neighbour classifier]]. Suppose that the feature space is <math>d</math>-dimensional and denote by <math>C^{bnn}_{n,n'}</math> the bagged nearest neighbour classifier based on a training set of size <math>n</math>, with resamples of size <math>n'</math>. In this infinite-sampling case, under certain regularity conditions on the class distributions, the [[Bayes classifier|excess risk]] has the following asymptotic expansion<ref name="Samworth12">{{Cite journal | author = Samworth, R. J.
| title = Optimal weighted nearest neighbour classifiers
| journal = [[Annals of Statistics]]
| volume = 40
| issue = 5
| pages = 2733–2763
| year = 2012
| doi = 10.1214/12-AOS1049
}}
</ref>
:<math>\mathcal{R}_{\mathcal{R}}(C^{bnn}_{n,n'}) - \mathcal{R}_{\mathcal{R}}(C^{Bayes}) = \left(B_1 \frac{n'}{n} + B_2 \frac{1}{(n')^{4/d}}\right) \{1+o(1)\},</math>
for some constants <math>B_1</math> and <math>B_2</math>. The optimal choice of <math>n'</math>, that balances the two terms in the asymptotic expansion, is given by <math>n' =  B n^{d/(d+4)}</math> for some constant <math>B</math>.
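A minimal R sketch of a bagged 1NN classifier built on the <code>class</code> package, with the resample size set by the rate above (the constant <math>B</math> is taken to be 1 purely for illustration; the function name and defaults are hypothetical):

<syntaxhighlight lang="r">
library(class)  # provides knn()

bagged_1nn <- function(train, cl, test, m = 100) {
  n <- nrow(train); d <- ncol(train)
  nprime <- max(1, floor(n^(d / (d + 4))))    # n' = B n^{d/(d+4)} with B = 1
  votes <- replicate(m, {
    idx <- sample(n, nprime, replace = TRUE)  # one bootstrap resample
    as.character(knn(train[idx, , drop = FALSE], test, cl[idx], k = 1))
  })
  # Majority vote over the m resampled 1NN classifiers
  apply(votes, 1, function(v) names(which.max(table(v))))
}
</syntaxhighlight>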
 
== History ==
 
Bagging ('''B'''ootstrap '''agg'''regat'''ing''') was proposed by [[Leo Breiman]] in 1994 to improve classification by combining the classifications of randomly generated training sets (Breiman, 1994, Technical Report No. 421).
 
== See also ==
*[[Boosting (meta-algorithm)]]
*[[Bootstrapping (statistics)]]
*[[Cross-validation (statistics)]]
*[[Random forest]]
 
== References ==
{{Reflist}}
* {{Cite journal
| last = Breiman
| first = Leo
| authorlink = Leo Breiman
| title = Bagging predictors
| journal = [[Machine Learning (journal)|Machine Learning]]
| volume = 24
| issue = 2
| pages = 123–140
| year = 1996
| id = {{citeseerx|10.1.1.121.7654}}
| doi = 10.1007/BF00058655
}}
 
* {{Cite journal
| last1 = Alfaro | first1 = E.
| last2 = Gámez | first2 = M.
| last3 = García | first3 = N.
| title = adabag: An R package for classification with AdaBoost.M1, AdaBoost-SAMME and Bagging
| year = 2012
| url = http://CRAN.R-project.org/package=adabag
}}
 
[[Category:Ensemble learning]]
[[Category:Machine learning algorithms]]
[[Category:Computational statistics]]
