{{context|date=January 2013}}
{{technical|date=June 2012}}
{{machine learning bar}}

'''Conditional random fields (CRFs)''' are a class of [[statistical model|statistical modeling methods]] often applied in [[pattern recognition]] and [[machine learning]], where they are used for [[structured prediction]]. Whereas an ordinary [[statistical classification|classifier]] predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account; e.g., the linear-chain CRF popular in [[natural language processing]] predicts sequences of labels for sequences of input samples.

CRFs are a type of [[discriminative model|discriminative]] [[Markov random field|undirected]] [[statistical model|probabilistic]] [[graphical model]]. They are used to encode known relationships between observations and to construct consistent interpretations, and are often applied to the [[sequence labeling|labeling]] or [[parsing]] of sequential data, such as natural language text or [[bioinformatics|biological sequences]]<ref name="Laf:McC:Per01">{{cite conference |authors=Lafferty, J., McCallum, A., Pereira, F. |title=Conditional random fields: Probabilistic models for segmenting and labeling sequence data |booktitle=Proc. 18th International Conf. on Machine Learning |publisher=Morgan Kaufmann |date=2001 |pages=282–289 |url=http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers}}</ref>
and in [[computer vision]].<ref>{{cite conference |title=Multiscale conditional random fields for image labeling |last1=He |first1=X. |last2=Zemel |first2=R.S. |last3=Carreira-Perpiñán |first3=M.A. |date=2004 |publisher=IEEE Computer Society |id={{citeseerx|10.1.1.3.7826}}}}</ref>
Specifically, CRFs find applications in [[shallow parsing]],<ref>{{cite conference |authors=Sha, F., Pereira, F. |title=Shallow parsing with conditional random fields |date=2003 |url=http://portal.acm.org/ft_gateway.cfm?id=1073473&type=pdf&CFID=4684435&CFTOKEN=39459323}}</ref> [[named entity recognition]]<ref>{{cite conference |author=Settles, B. |title=Biomedical named entity recognition using conditional random fields and rich feature sets |date=2004 |booktitle=Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications |pages=104–107 |url=http://acl.ldc.upenn.edu/coling2004/W1/pdf/21.pdf}}</ref> and [[Gene prediction|gene finding]], among other tasks, as an alternative to the related [[hidden Markov model]]s (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation.
==Description==
Lafferty, [[Andrew McCallum|McCallum]] and Pereira<ref name="Laf:McC:Per01"/> define a CRF on observations <math>\boldsymbol{X}</math> and [[random variable]]s <math>\boldsymbol{Y}</math> as follows:

<blockquote>Let <math>G = (V, E)</math> be a graph such that <math>\boldsymbol{Y} = (\boldsymbol{Y}_v)_{v\in V}</math>, so that <math>\boldsymbol{Y}</math> is indexed by the vertices of <math>G</math>. Then <math>(\boldsymbol{X}, \boldsymbol{Y})</math> is a conditional random field when the random variables <math>\boldsymbol{Y}_v</math>, conditioned on <math>\boldsymbol{X}</math>, obey the [[Markov property]] with respect to the graph: <math>p(\boldsymbol{Y}_v |\boldsymbol{X}, \boldsymbol{Y}_w, w \neq v) = p(\boldsymbol{Y}_v |\boldsymbol{X}, \boldsymbol{Y}_w, w \sim v)</math>, where <math>w \sim v</math> means that <math>w</math> and <math>v</math> are neighbors in <math>G</math>.</blockquote>

This means that a CRF is an [[graphical model|undirected graphical model]] whose nodes can be divided into exactly two disjoint sets <math>\boldsymbol{X}</math> and <math>\boldsymbol{Y}</math>, the observed and output variables, respectively; the conditional distribution <math>p(\boldsymbol{Y}|\boldsymbol{X})</math> is then modeled.
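By the [[Hammersley–Clifford theorem]], a strictly positive distribution of this kind factorizes over the cliques of <math>G</math>; a sketch of the resulting form, with <math>\mathcal{C}</math> the set of cliques of <math>G</math>, <math>\psi_c</math> non-negative potential functions, and <math>Z(\boldsymbol{X})</math> the normalizing partition function:
:<math>p(\boldsymbol{Y} \mid \boldsymbol{X}) = \frac{1}{Z(\boldsymbol{X})} \prod_{c \in \mathcal{C}} \psi_c(\boldsymbol{Y}_c, \boldsymbol{X}), \qquad Z(\boldsymbol{X}) = \sum_{\boldsymbol{Y}} \prod_{c \in \mathcal{C}} \psi_c(\boldsymbol{Y}_c, \boldsymbol{X})</math>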
===Inference===
For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an [[Markov random field#Inference|MRF]], and the same arguments hold.<ref name=SuttonIntroduction>{{cite arxiv |last1=Sutton |first1=Charles |last2=McCallum |first2=Andrew |class=stat.ML |year=2010 |eprint=1011.4088 |title=An Introduction to Conditional Random Fields |version=v1}}</ref>
However, there exist special cases for which exact inference is feasible:

* If the graph is a chain or a tree, message-passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the [[forward-backward algorithm|forward-backward]] and [[Viterbi algorithm]]s for the case of HMMs (see the sketch below).
* If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions.
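For the chain case, the Viterbi recursion can be sketched as follows (an illustrative sketch, not taken from any particular library; the score arrays are assumed to have been precomputed from the feature functions and weights):
<syntaxhighlight lang="python">
import numpy as np

def viterbi_decode(unary, pairwise):
    """Exact MAP decoding for a linear-chain CRF.

    unary:    (n, k) array; unary[i, y] is the total weighted feature
              score of assigning label y at position i.
    pairwise: (k, k) array; pairwise[a, b] is the transition score
              from label a to label b.
    Returns the highest-scoring label sequence as a list of ints.
    """
    n, k = unary.shape
    score = np.empty((n, k))
    backptr = np.zeros((n, k), dtype=int)
    score[0] = unary[0]
    for i in range(1, n):
        # cand[a, b]: best score ending in label a at i-1, then label b at i
        cand = score[i - 1][:, None] + pairwise + unary[i][None, :]
        backptr[i] = cand.argmax(axis=0)
        score[i] = cand.max(axis=0)
    # trace back-pointers from the best final label
    labels = [int(score[-1].argmax())]
    for i in range(n - 1, 0, -1):
        labels.append(int(backptr[i, labels[-1]]))
    return labels[::-1]
</syntaxhighlight>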
If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include:
* Loopy belief propagation
* Alpha expansion
* Mean field inference
* Linear programming relaxations
===Parameter learning===
Learning the parameters <math>\theta</math> is usually done by [[maximum likelihood]] learning for <math>p(Y_i|X_i; \theta)</math>.
If all nodes have exponential family distributions and all nodes are observed during training, this [[Optimization (mathematics)|optimization]] is convex.<ref name="SuttonIntroduction" /> It can be solved, for example, using [[gradient descent]] algorithms or [[quasi-Newton method]]s such as the [[L-BFGS]] algorithm.
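For a linear-chain CRF (see the examples below), the gradient of the conditional log-likelihood with respect to a weight <math>\lambda_k</math> takes the standard "observed minus expected" form of exponential-family models:
:<math>\frac{\partial \log p(Y \mid X; \theta)}{\partial \lambda_k} = \sum_{i} f_k(i, Y_{i-1}, Y_i, X) - \sum_{i} \sum_{y, y'} p(Y_{i-1}=y, Y_i=y' \mid X; \theta)\, f_k(i, y, y', X),</math>
where the pairwise marginals are computed with the forward-backward algorithm.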
On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used.

===Examples===
In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables <math>X</math> represents a sequence of observations, and <math>Y</math> represents a hidden (or unknown) state variable that needs to be inferred given the observations.
The <math>Y_{i}</math> are structured to form a chain, with an edge between each <math>Y_{i-1}</math> and <math>Y_{i}</math>. As well as having a simple interpretation of the <math>Y_{i}</math> as "labels" for each element in the input sequence, this layout admits efficient algorithms for:
* model ''training'', learning the conditional distributions between the <math>Y_{i}</math> and feature functions from some corpus of training data.
* ''inference'', determining the probability of a given label sequence <math>Y</math> given <math>X</math>.
* ''decoding'', determining the ''most likely'' label sequence <math>Y</math> given <math>X</math>.
The conditional dependency of each <math>Y_{i}</math> on <math>X</math> is defined through a fixed set of ''feature functions'' of the form <math>f(i, Y_{i-1}, Y_{i}, X)</math>, which can informally be thought of as measurements on the input sequence that partially determine the [[Likelihood function|likelihood]] of each possible value for <math>Y_{i}</math>. The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for <math>Y_{i}</math>.
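With weights <math>\lambda_k</math> for the feature functions <math>f_k</math>, the linear-chain model thus has the log-linear form<ref name="Laf:McC:Per01"/>
:<math>p(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(i, Y_{i-1}, Y_i, X) \right),</math>
where the partition function <math>Z(X)</math> sums the same exponential over all possible label sequences, so that the probabilities normalize.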
Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.
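This correspondence can be made explicit with indicator feature functions: per-position features <math>\mathbf{1}[Y_{i-1}=a]\,\mathbf{1}[Y_i=b]</math> with fixed weight <math>\log p(Y_i=b \mid Y_{i-1}=a)</math> recover the HMM transition probabilities, and features <math>\mathbf{1}[Y_i=b]\,\mathbf{1}[X_i=o]</math> with weight <math>\log p(X_i=o \mid Y_i=b)</math> recover the emissions.<ref name="SuttonIntroduction" />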
Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence <math>X</math> at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.
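For illustration, feature functions of this kind might be written as follows (a minimal sketch; the particular features and label names are invented for this example):
<syntaxhighlight lang="python">
# Illustrative linear-chain CRF feature functions f(i, y_prev, y, x).
# x is the full input sequence (a list of words); each feature may
# inspect any part of it, not just position i.

def f_capitalized(i, y_prev, y, x):
    # fires when the current word is capitalized and labeled NAME
    return 1 if x[i][:1].isupper() and y == "NAME" else 0

def f_after_title(i, y_prev, y, x):
    # looks backward in the input: the previous word is an honorific
    return 1 if i > 0 and x[i - 1] in ("Mr.", "Dr.") and y == "NAME" else 0

def f_transition(i, y_prev, y, x):
    # a pure transition feature between adjacent labels
    return 1 if y_prev == "NAME" and y == "NAME" else 0

def local_score(i, y_prev, y, x, features, weights):
    """Unnormalized local score: sum_k lambda_k * f_k(i, y_prev, y, x)."""
    return sum(w * f(i, y_prev, y, x) for f, w in zip(features, weights))
</syntaxhighlight>
Summing this local score over all positions and exponentiating gives the unnormalized probability in the formula above; note that the features return arbitrary real values and need no probabilistic interpretation.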
==Variants==

===Higher-order CRFs and semi-Markov CRFs===
CRFs can be extended into higher-order models by making each <math>Y_{i}</math> dependent on a fixed number <math>o</math> of previous variables <math>Y_{i-o}, \ldots, Y_{i-1}</math>. Training and inference are only practical for small values of <math>o</math> (such as <math>o \leq 5</math>),{{Citation needed|date=December 2008}} since their computational cost increases exponentially with <math>o</math>: exact chain inference must consider on the order of <math>|Y|^{o+1}</math> label configurations per position. Large-margin models for [[structured prediction]], such as the [[Structured SVM|structured support vector machine]], can be seen as an alternative training procedure to CRFs.
There exists another generalization of CRFs, the '''semi-Markov conditional random field (semi-CRF)''', which models variable-length ''segmentations'' of the label sequence <math>Y</math>.<ref>{{cite conference |last1=Sarawagi |first1=Sunita |last2=Cohen |first2=William W. |title=Semi-Markov conditional random fields for information extraction |url=http://books.nips.cc/papers/files/nips17/NIPS2004_0427.pdf |booktitle=Advances in Neural Information Processing Systems 17 |editors=Lawrence K. Saul, Yair Weiss, Léon Bottou |publisher=MIT Press |location=Cambridge, MA |year=2005 |pages=1185–1192}}</ref> This provides much of the power of higher-order CRFs to model long-range dependencies of the <math>Y_{i}</math> at a reasonable computational cost.
===Latent-dynamic conditional random field===
'''Latent-dynamic conditional random fields''' ('''LDCRF''') or '''discriminative probabilistic latent variable models''' ('''DPLVM''') are a type of CRF for sequence tagging tasks. They are [[latent variable model]]s that are trained discriminatively.

In an LDCRF, like in any sequence tagging task, given a sequence of observations <math>\mathbf{x} = x_1, \dots, x_n</math>, the main problem the model must solve is how to assign a sequence of labels <math>\mathbf{y} = y_1, \dots, y_n</math> from one finite set of labels <math>Y</math>. Instead of directly modeling <math>P(\mathbf{y}|\mathbf{x})</math> as an ordinary linear-chain CRF would do, a set of latent variables <math>\mathbf{h}</math> is "inserted" between <math>\mathbf{x}</math> and <math>\mathbf{y}</math> using the [[chain rule of probability]]:<ref name="lvperceptron">{{cite conference |author1=Xu Sun |author2=Takuya Matsuzaki |author3=Daisuke Okanohara |author4=Jun'ichi Tsujii |title=Latent Variable Perceptron Algorithm for Structured Classification |conference=IJCAI |year=2009 |pages=1236–1242}}</ref>
:<math>P(\mathbf{y} | \mathbf{x}) = \sum_\mathbf{h} P(\mathbf{y}|\mathbf{h}, \mathbf{x}) P(\mathbf{h} | \mathbf{x})</math>
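In a common formulation,<ref name="morency"/> each label <math>y_j</math> is associated with a set of possible latent values <math>H_{y_j}</math>, and <math>P(\mathbf{y}|\mathbf{h}, \mathbf{x})</math> is taken to be 1 exactly when every <math>h_j \in H_{y_j}</math> (and 0 otherwise), so the sum reduces to
:<math>P(\mathbf{y} \mid \mathbf{x}) = \sum_{\mathbf{h}:\, \forall j,\ h_j \in H_{y_j}} P(\mathbf{h} \mid \mathbf{x}),</math>
with <math>P(\mathbf{h} \mid \mathbf{x})</math> itself modeled as an ordinary linear-chain CRF over the latent variables.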
This allows the capturing of latent structure between the observations and labels.<ref name="morency">{{cite doi|10.1109/CVPR.2007.383299}}</ref> While LDCRFs can be trained using quasi-Newton methods, a specialized version of the [[perceptron]] algorithm called the '''latent-variable perceptron''' has been developed for them, based on Collins' [[structured perceptron]] algorithm.<ref name="lvperceptron"/> These models find applications in [[computer vision]], specifically [[gesture recognition]] from video streams,<ref name="morency"/> and in [[shallow parsing]].<ref name="lvperceptron"/>
== Software ==
This is a partial list of software that implements generic CRF tools.
* [http://klcl.pku.edu.cn/member/sunxu/code.htm CRF-ADF] Linear-chain CRFs with fast online ADF training ([[C Sharp (programming language)|C#]], [[.NET Framework|.NET]])
* [http://crfsharp.codeplex.com/ CRFSharp] Linear-chain CRFs ([[C Sharp (programming language)|C#]], [[.NET Framework|.NET]])
* [http://vision.csd.uwo.ca/code/ GCO] CRFs with submodular energy functions ([[C++]], [[MATLAB|Matlab]])
* [http://mallet.cs.umass.edu/grmm/index.php GRMM] General CRFs ([[Java (programming language)|Java]])
* [http://www.cs.ubc.ca/~murphyk/Software/CRFall.zip CRFall] General CRFs ([[MATLAB|Matlab]])
* [http://crf.sourceforge.net/ Sarawagi's CRF] Linear-chain CRFs ([[Java (programming language)|Java]])
* [http://sourceforge.net/projects/hcrf/ HCRF library] Hidden-state CRFs ([[C++]], [[MATLAB|Matlab]])
* [http://wapiti.limsi.fr/ Wapiti] Fast linear-chain CRFs ([[C (programming language)|C]])<ref>T. Lavergne, O. Cappé and F. Yvon (2010). [http://acl.eldoc.ub.rug.nl/mirror/P/P10/P10-1052.pdf Practical very large scale CRFs]. Proc. 48th Annual Meeting of the [[Association for Computational Linguistics|ACL]], pp. 504–513.</ref>
* [http://www.chokkan.org/software/crfsuite/ CRFSuite] Fast restricted linear-chain CRFs ([[C (programming language)|C]])
* [http://crfpp.sourceforge.net/ CRF++] Linear-chain CRFs ([[C++]])
* [http://flexcrfs.sourceforge.net/ FlexCRFs] First-order and second-order Markov CRFs ([[C++]])
* [http://hackage.haskell.org/package/crf-chain1 crf-chain1] First-order, linear-chain CRFs ([[Haskell (programming language)|Haskell]])

This is a partial list of software that implements CRF-related tools.
* [http://www.broadinstitute.org/annotation/conrad Conrad] CRF-based gene predictor ([[Java (programming language)|Java]])
* [http://nlp.stanford.edu/software/CRF-NER.shtml Stanford NER] Named Entity Recognizer ([[Java (programming language)|Java]])
* [http://cbioc.eas.asu.edu/banner/ BANNER] Named Entity Recognizer ([[Java (programming language)|Java]])
== See also ==
* [[Graphical model]]
* [[Markov random field]]
* [[Maximum entropy Markov model]] (MEMM)

== References ==
{{reflist|30em}}

== Further reading ==
* McCallum, A.: Efficiently inducing features of conditional random fields. In: ''Proc. 19th Conference on Uncertainty in Artificial Intelligence''. (2003)
* Wallach, H.M.: [http://www.cs.umass.edu/~wallach/technical_reports/wallach04conditional.pdf Conditional random fields: An introduction]. Technical report MS-CIS-04-21, University of Pennsylvania (2004)
* Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. In: ''Introduction to Statistical Relational Learning''. Edited by [[Lise Getoor]] and Ben Taskar. MIT Press (2006). [http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf Online PDF]
* Klinger, R., Tomanek, K.: Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR07-2-013, Department of Computer Science, Dortmund University of Technology, December 2007. ISSN 1864-4503. [http://www.scai.fraunhofer.de/fileadmin/images/bio/data_mining/paper/crf_klinger_tomanek.pdf Online PDF]

[[Category:Graphical models]]
[[Category:Log-linear models]]
[[Category:Machine learning]]