In '''computer vision''', the '''bag-of-words model''' (BoW model) can be applied to [[image classification]] by treating image features as words. In document classification, a [[bag of words]] is a sparse vector of occurrence counts of words; that is, a sparse [[histogram]] over the vocabulary. In [[computer vision]], a ''bag of visual words'' is a vector of occurrence counts of a vocabulary of local image features.

==Representation based on the BoW model==

===Image representation based on the BoW model===
To represent an image using the BoW model, an image can be treated as a document. Similarly, "words" in images need to be defined too. Achieving this usually involves three steps: [[Feature detection (computer vision)|feature detection]], feature description, and codebook generation.<ref name = "feifeicvpr2005">{{cite journal|doi=10.1109/CVPR.2005.16|chapter=A Bayesian Hierarchical Model for Learning Natural Scene Categories|title=2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)|year=2005|last1=Fei-Fei Li|last2=Perona|first2=P.|isbn=0-7695-2372-2|volume=2|pages=524}}</ref> A definition of the BoW model can be the "histogram representation based on independent features".<ref name="cvprcourse">
{{cite web
 | author = L. Fei-Fei, R. Fergus, and A. Torralba
 | title = Recognizing and Learning Object Categories, CVPR 2007 short course
 | url = http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html
}}
</ref> Content-based image indexing and retrieval (CBIR) appears to be an early adopter of this image representation technique.<ref>{{cite journal|doi=10.1016/S0031-3203(01)00162-5|url=http://www.cs.nott.ac.uk/~qiu/webpages/Papers/ColorPatternRecognition.pdf|pages=1675–1686|title=Indexing chromatic and achromatic patterns for content-based colour image retrieval|year=2002|last1=Qiu|first1=G.|journal=Pattern Recognition|volume=35|issue=8}}</ref>

====Feature representation====
After feature detection, each image is abstracted by several local patches. Feature representation methods deal with how to represent the patches as numerical vectors. These vectors are called feature descriptors. A good descriptor should have the ability to handle intensity, rotation, scale and affine variations to some extent. One of the most famous descriptors is the [[Scale-invariant feature transform]] (SIFT).<ref name="Loweiccv1999">{{Cite book
 | url = http://www.cs.ubc.ca/~lowe/papers/iccv99.pdf
 | pages = 1150–1157
 | year = 1999
 | doi = 10.1109/ICCV.1999.790410
 | chapter = Object recognition from local scale-invariant features
 | title = Proceedings of the Seventh IEEE International Conference on Computer Vision
 | last1 = Lowe
 | first1 = D.G.
}}</ref> SIFT converts each patch to a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), where the order of different vectors is of no importance.
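
For illustration, the following is a minimal sketch of this step using the OpenCV implementation of SIFT (an assumed environment: <code>opencv-python</code> 4.4 or later, where <code>SIFT_create</code> lives in the main module, and a hypothetical input file):

<syntaxhighlight lang="python">
# Minimal sketch: extracting SIFT descriptors from one image with OpenCV.
# Assumes opencv-python >= 4.4; "example.jpg" is a hypothetical input file.
import cv2

image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# `descriptors` is an (N, 128) array: one 128-dimensional vector per patch.
# In the BoW model, the order of these rows carries no information.
</syntaxhighlight>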

====Codebook generation====
<!-- Unsourced image removed: [[Image:histogram_representation.JPG|right|thumb|Figure 4: histogram representation, a visual illustration, not real codewords.]] -->
<!-- Unsourced image removed: [[Image:example_codewords.JPG|right|thumb|Figure 5: some examples of codewords mapped back to image patches.]] -->
The final step for the BoW model is to convert vector-represented patches to "codewords" (analogous to words in text documents), which also produces a "codebook" (analogous to a word dictionary). A codeword can be considered as a representative of several similar patches. One simple method is performing [[k-means clustering]] over all the vectors.<ref>{{cite journal
 | author = T. Leung and [[Jitendra Malik|J. Malik]]
 | title = Representing and recognizing the visual appearance of materials using three-dimensional textons
 | url = http://www.cs.berkeley.edu/~malik/papers/LM-3dtexton.pdf
 | journal = International Journal of Computer Vision
 | volume = 43
 | issue = 1
 | pages = 29–44
 | year = 2001
 | doi = 10.1023/A:1011126920638 }}</ref> Codewords are then defined as the centers of the learned clusters. The number of clusters is the codebook size (analogous to the size of the word dictionary).

Thus, each patch in an image is mapped to a certain codeword through the clustering process and the image can be represented by the [[histogram]] of the codewords.
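
A minimal sketch of codebook generation, assuming descriptors have already been extracted and using the k-means implementation of scikit-learn (the cited papers describe the clustering abstractly; the library choice and the variable names <code>all_descriptors</code> and <code>image_descriptors</code> are illustrative assumptions):

<syntaxhighlight lang="python">
# Minimal sketch: learn a codebook by k-means, then represent one image as a
# normalized histogram of codewords. `all_descriptors` stacks the descriptors
# of every training image; `image_descriptors` holds one image's descriptors
# (both hypothetical inputs).
import numpy as np
from sklearn.cluster import KMeans

codebook_size = 200  # V, the size of the visual vocabulary
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
kmeans.fit(all_descriptors)  # the learned cluster centers are the codewords

def bow_histogram(image_descriptors):
    """Map each patch to its nearest codeword and count occurrences."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook_size).astype(float)
    return hist / hist.sum()  # normalize so images of different sizes compare
</syntaxhighlight>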

==Learning and recognition based on the BoW model==
Computer vision researchers have developed several learning methods to leverage the BoW model for image-related tasks, such as [[object categorization]]. These methods can roughly be divided into two categories, generative models and discriminative models. For the multiple-label categorization problem, the [[confusion matrix]] can be used as an evaluation metric.

===Generative models===

Here are some notations for this section. Suppose the size of the codebook is <math>V</math>.
* <math>w</math>: each patch <math>w</math> is a V-dimensional vector that has a single component equal to one and all other components equal to zero (in the k-means clustering setting, the single component equal to one indicates the cluster that <math>w</math> belongs to). The <math>v</math>th codeword in the codebook can be represented as <math>w^v=1</math> and <math>w^u = 0</math> for <math>u\neq v</math>.
* <math>\mathbf{w}</math>: each image is represented by <math>\mathbf{w}=[w_1, w_2, \cdots, w_N]</math>, all the patches in an image
* <math>d_j</math>: the <math>j</math>th image in an image collection
* <math>c</math>: category of the image
* <math>z</math>: theme or topic of the patch
* <math>\pi</math>: mixture proportion

Since the BoW model in computer vision is an analogy to the BoW model in NLP, generative models developed in text domains can also be adapted to computer vision. The simple Naïve Bayes model and hierarchical Bayesian models are discussed below.

====Naïve Bayes====

The simplest one is the [[Naïve Bayes]] classifier.<ref name="danceeccv2004">{{cite conference
 | author = G. Csurka, C. Dance, L.X. Fan, J. Willamowski, and C. Bray
 | title = Visual categorization with bags of keypoints
 | booktitle = Proc. of ECCV International Workshop on Statistical Learning in Computer Vision
 | year = 2004
 | url = http://www.xrce.xerox.com/Research-Development/Publications/2004-0104/%28language%29/eng-GB
}}</ref> Using the language of [[graphical models]], the Naïve Bayes classifier is described by the equation below. The basic idea (or assumption) of this model is that each category has its own distribution over the codebooks, and that the distributions of each category are observably different. Take a face category and a car category as an example. The face category may emphasize the codewords which represent "nose", "eye" and "mouth", while the car category may emphasize the codewords which represent "wheel" and "window". Given a collection of training examples, the classifier learns different distributions for different categories. The categorization decision is made by
* <math>c^*=\arg \max_c p(c|\mathbf{w}) = \arg \max_c p(c)p(\mathbf{w}|c)=\arg \max_c p(c)\prod_{n=1}^Np(w_n|c)</math>

Since the Naïve Bayes classifier is simple yet effective, it is usually used as a baseline method for comparison.
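
A minimal sketch of this decision rule, computed in log space for numerical stability (the arrays <code>class_priors</code> and <code>word_probs</code> are assumed to have been estimated from training codeword counts, e.g. with Laplace smoothing):

<syntaxhighlight lang="python">
# Minimal sketch of c* = argmax_c p(c) * prod_n p(w_n | c), in log space.
# `class_priors` has shape (C,); `word_probs` has shape (C, V) with rows
# summing to one; both are hypothetical inputs estimated from training data.
import numpy as np

def classify(histogram, class_priors, word_probs):
    """`histogram` (length V) counts each codeword in the test image."""
    # sum_n log p(w_n|c) equals the histogram dotted with the log-probabilities
    log_posteriors = np.log(class_priors) + histogram @ np.log(word_probs).T
    return int(np.argmax(log_posteriors))
</syntaxhighlight>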

====Hierarchical Bayesian models====

The basic assumption of the Naïve Bayes model sometimes does not hold. For example, a natural scene image may contain several different themes.
[[Probabilistic latent semantic analysis]] (pLSA)<ref>{{cite conference
 | author = T. Hofmann
 | title = Probabilistic Latent Semantic Analysis
 | url = http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf
 | booktitle = Proc. of the Fifteenth Conference on Uncertainty in Artificial Intelligence
 | year = 1999 }}</ref><ref>{{Cite book
 | doi = 10.1109/ICCV.2005.77
 | title = Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1
 | url = http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic05b.pdf
 | chapter = Discovering objects and their location in images
 | year = 2005
 | last1 = Sivic | first1 = J.
 | last2 = Russell | first2 = B.C.
 | last3 = Efros | first3 = A.A.
 | last4 = Zisserman | first4 = A.
 | last5 = Freeman | first5 = W.T.
 | isbn = 0-7695-2334-X
 | pages = 370
}}</ref> and [[latent Dirichlet allocation]] (LDA)<ref name="bleijmlr2003">{{cite journal
 | author = D. Blei, A. Ng, and M. Jordan
 | title = Latent Dirichlet allocation
 | url = http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
 | journal = Journal of Machine Learning Research
 | volume = 3
 | pages = 993–1022
 | year = 2003
 | doi = 10.1162/jmlr.2003.3.4-5.993
 | editor1-last = Lafferty
 | editor1-first = John
 | issue = 4–5 }}</ref> are two popular topic models from text domains that tackle this multiple-"theme" problem. Take LDA as an example. To model natural scene images using LDA, an analogy is made as follows:
* the image category is mapped to the document category;
* the mixture proportion of themes is mapped to the mixture proportion of topics;
* the theme index is mapped to the topic index;
* the codeword is mapped to the word.
This method shows very promising results in natural scene categorization on [http://vision.stanford.edu/resources_links.html 13 Natural Scene Categories].<ref name = "feifeicvpr2005"/>
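
A minimal sketch of this idea, using the variational LDA implementation in scikit-learn as a stand-in for the models in the cited papers (<code>histograms</code>, a hypothetical array of per-image codeword counts, plays the role of the document-term matrix):

<syntaxhighlight lang="python">
# Minimal sketch: discover latent "themes" in BoW image histograms with LDA.
# `histograms` is a hypothetical (n_images, V) array of codeword counts.
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=10, random_state=0)  # 10 themes
theme_mixtures = lda.fit_transform(histograms)
# theme_mixtures[j] is the mixture proportion of themes for image d_j,
# playing the role of pi in the notation above.
</syntaxhighlight>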

===Discriminative models===

Since images are represented based on the BoW model, any discriminative model suitable for text document categorization can be tried, such as the [[support vector machine]] (SVM)<ref name="danceeccv2004"/> and [[AdaBoost]].<ref>{{Cite book
 | doi = 10.1109/CVPR.2005.254
 | url = http://cbcl.mit.edu/projects/cbcl/publications/ps/serre-PID73457-05.pdf
 | chapter = Object Recognition with Features Inspired by Visual Cortex
 | title = 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
 | year = 2005
 | last1 = Serre | first1 = T.
 | last2 = Wolf | first2 = L.
 | last3 = Poggio | first3 = T.
 | isbn = 0-7695-2372-2
 | volume = 2
 | pages = 994
}}</ref> The [[kernel trick]] is also applicable when a kernel-based classifier is used, such as the SVM. The pyramid match kernel is a newly developed kernel based on the BoW model. The local feature approach of using the BoW model representation learnt by machine learning classifiers with different kernels (e.g., the EMD kernel and the <math>\chi^2</math> kernel) has been extensively tested in the area of texture and object recognition.<ref name="bogkernelijcv2007">{{cite journal
 | author = Jianguo Zhang, Marcin Marszałek, Svetlana Lazebnik, Cordelia Schmid
 | title = Local Features and Kernels for Classification of Texture and Object Categories: a Comprehensive Study
 | journal = International Journal of Computer Vision
 | year = 2007
 | volume = 73
 | issue = 2
 | pages = 213–238
 | url = http://lear.inrialpes.fr/pubs/2007/ZMLS07/ZhangMarszalekLazebnikSchmid-IJCV07-ClassificationStudy.pdf
 | doi = 10.1007/s11263-006-9794-4
}}</ref> Very promising results on a number of datasets have been reported. This approach<ref name="bogkernelijcv2007"/> has achieved very impressive results in [http://www.pascal-network.org/challenges/VOC/ the PASCAL Visual Object Classes Challenge].
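
A minimal sketch of a kernel SVM over BoW histograms, using the chi-squared kernel available in scikit-learn with a precomputed-kernel SVC (a library choice assumed for illustration; <code>X_train</code>, <code>y_train</code> and <code>X_test</code> are hypothetical non-negative histogram arrays and labels):

<syntaxhighlight lang="python">
# Minimal sketch: SVM classification of BoW histograms with a chi-squared
# kernel, one of the kernel types studied in the comprehensive study above.
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

K_train = chi2_kernel(X_train, X_train)  # Gram matrix between training histograms
svm = SVC(kernel="precomputed").fit(K_train, y_train)

K_test = chi2_kernel(X_test, X_train)    # kernel values of test vs. training data
predictions = svm.predict(K_test)
</syntaxhighlight>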

====Pyramid match kernel====

The pyramid match kernel<ref name="pyramidiccv2005">{{Cite book
 | doi = 10.1109/ICCV.2005.239
 | url = http://www.cs.utexas.edu/~grauman/papers/grauman_darrell_iccv2005.pdf
 | chapter = The pyramid match kernel: discriminative classification with sets of image features
 | title = Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1
 | year = 2005
 | last1 = Grauman | first1 = K.
 | last2 = Darrell | first2 = T.
 | isbn = 0-7695-2334-X
 | pages = 1458
}}</ref> is a fast kernel function (satisfying [[Mercer's condition]]) which maps the BoW features, or sets of features in high dimension, to multi-dimensional multi-resolution histograms. An advantage of these multi-resolution histograms is their ability to capture co-occurring features. The pyramid match kernel builds multi-resolution histograms by binning data points into discrete regions of increasing size. Thus, points that do not match at high resolutions have the chance to match at low resolutions. The pyramid match kernel performs an approximate similarity match, without explicit search or computation of distances. Instead, it intersects the histograms to approximate the optimal match. Accordingly, the computation time is only linear in the number of features, rather than quadratic as in classic correspondence-based matching. Compared with other kernel approaches, the pyramid match kernel is much faster, yet provides equivalent accuracy. The pyramid match kernel was applied to the [http://www.mis.informatik.tu-darmstadt.de/Research/Projects/categorization/eth80-db.html ETH-80 database] and the [http://vision.cs.princeton.edu/resources_links.html Caltech 101 database] with promising results.<ref name="pyramidiccv2005"/><ref>{{Cite book|url=http://www.ifp.illinois.edu/~jyang29/ScSPM.htm|doi=10.1109/CVPR.2009.5206757|chapter=Linear spatial pyramid matching using sparse coding for image classification|title=2009 IEEE Conference on Computer Vision and Pattern Recognition|year=2009|last1=Jianchao Yang|last2=Kai Yu|last3=Yihong Gong|last4=Huang|first4=T.|isbn=978-1-4244-3992-8|pages=1794}}</ref>
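
A minimal sketch of the pyramid match score for one-dimensional feature sets (a simplification for illustration; the cited paper handles d-dimensional features and normalizes the score, both omitted here):

<syntaxhighlight lang="python">
# Minimal sketch of the pyramid match kernel for 1-D feature sets: histograms
# with doubling bin sizes are intersected, and matches that first appear at a
# coarser level i receive the smaller weight 1 / 2**i. `x` and `y` are
# hypothetical 1-D arrays of feature values in [0, 2**levels).
import numpy as np

def pyramid_match(x, y, levels=4):
    score, prev_intersection = 0.0, 0.0
    for i in range(levels + 1):
        bin_size = 2 ** i
        edges = np.arange(0, 2 ** levels + bin_size, bin_size)
        hx, _ = np.histogram(x, bins=edges)
        hy, _ = np.histogram(y, bins=edges)
        intersection = np.minimum(hx, hy).sum()  # matches visible at this level
        score += (intersection - prev_intersection) / 2 ** i  # weight only new matches
        prev_intersection = intersection
    return score
</syntaxhighlight>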

==Limitations and recent developments==

One of the notorious disadvantages of BoW is that it ignores the spatial relationships among the patches, which are very important in image representation. Researchers have proposed several methods to incorporate the spatial information. For feature-level improvements, correlogram features can capture spatial co-occurrences of features.<ref>{{Cite book
 | doi = 10.1109/CVPR.2006.102
 | url = http://johnwinn.org/Publications/papers/Savarese_Winn_Criminisi_Correlatons_CVPR2006.pdf
 | year = 2006
 | chapter = Discriminative Object Class Models of Appearance and Shape by Correlatons
 | title = 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06)
 | last1 = Savarese | first1 = S.
 | last2 = Winn | first2 = J.
 | last3 = Criminisi | first3 = A.
 | isbn = 0-7695-2597-0
 | volume = 2
 | pages = 2033
}}</ref> For generative models, relative positions<ref>{{Cite book|doi=10.1109/ICCV.2005.137
 | url = http://ssg.mit.edu/~esuddert/papers/iccv05.pdf|chapter=Learning hierarchical models of scenes, objects, and parts|title=Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1|year=2005|last1=Sudderth|first1=E.B.|last2=Torralba|first2=A.|last3=Freeman|first3=W.T.|last4=Willsky|first4=A.S.|isbn=0-7695-2334-X|pages=1331
}}</ref><ref>{{cite conference
 | author = E. Sudderth, A. Torralba, W. Freeman, and A. Willsky
 | title = Describing Visual Scenes using Transformed Dirichlet Processes
 | url = http://ssg.mit.edu/~esuddert/papers/nips05.pdf
 | booktitle = Proc. of Neural Information Processing Systems
 | year = 2005 }}</ref> of codewords are also taken into account. The hierarchical shape and appearance model for human action<ref>{{Cite book|doi=10.1109/CVPR.2007.383132|url=http://vision.stanford.edu/posters/NieblesFeiFei_CVPR07_poster.pdf
 | year = 2007|chapter=A Hierarchical Model of Shape and Appearance for Human Action Classification|title=2007 IEEE Conference on Computer Vision and Pattern Recognition|last1=Niebles|first1=Juan Carlos|last2=Li Fei-Fei|isbn=1-4244-1179-3|pages=1 }}</ref> introduces a new part layer ([[Constellation model]]) between the mixture proportion and the BoW features, which captures the spatial relationships among parts in the layer. For discriminative models, spatial pyramid match<ref>{{Cite book|doi=10.1109/CVPR.2006.68
 | url = http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf
 | year = 2006|chapter=Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories|title=2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06)|last1=Lazebnik|first1=S.|last2=Schmid|first2=C.|last3=Ponce|first3=J.|isbn=0-7695-2597-0|volume=2|pages=2169 }}</ref> performs pyramid matching by partitioning the image into increasingly fine sub-regions and computing histograms of local features inside each sub-region.
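
A minimal sketch of the spatial pyramid representation (the inputs <code>positions</code> and <code>words</code> are hypothetical NumPy arrays; the per-level weighting used in the cited paper is omitted for brevity):

<syntaxhighlight lang="python">
# Minimal sketch: split the image into 1x1, 2x2, 4x4, ... grids, compute a BoW
# histogram inside each cell, and concatenate all cells. `positions` holds each
# patch's normalized (x, y) in [0, 1); `words` holds its codeword index.
import numpy as np

def spatial_pyramid(positions, words, codebook_size, levels=2):
    cells = []
    for level in range(levels + 1):
        grid = 2 ** level  # grid x grid cells at this level
        cell_xy = np.floor(positions * grid).astype(int).clip(0, grid - 1)
        flat = cell_xy[:, 1] * grid + cell_xy[:, 0]  # flatten (x, y) cell index
        for c in range(grid * grid):
            in_cell = words[flat == c]
            cells.append(np.bincount(in_cell, minlength=codebook_size))
    return np.concatenate(cells)
</syntaxhighlight>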

Furthermore, the BoW model has not yet been extensively tested for viewpoint invariance and scale invariance, and its performance there is unclear. Also, the BoW model for object segmentation and localization is not well understood.<ref name="cvprcourse"/>

==See also==

* [[Part-based models]]
* [[Segmentation-based object categorization]]
* [[Vector space model]]
* [[Bag-of-words model]]
* [[Feature extraction]]

==References==

{{reflist}}

==External links==

* [http://people.csail.mit.edu/fergus/iccv2005/bagwords.html A demo for two bag-of-words classifiers] by L. Fei-Fei, R. Fergus, and A. Torralba.
* [http://www.vision.caltech.edu/malaa/software/research/image-search/ Caltech Large Scale Image Search Toolbox]: a Matlab/C++ toolbox implementing Inverted File search for the Bag of Words model. It also contains implementations for fast approximate nearest neighbor search using randomized [[k-d tree]], [[locality-sensitive hashing]], and [[hierarchical k-means]].

{{DEFAULTSORT:Bag Of Words Model In Computer Vision}}
[[Category:Object recognition and categorization]]

[[it:Modello della borsa di parole]]