Subindependence: Difference between revisions
en>Tsirel the same but a bit shorter |
{{No footnotes|date=June 2010}}
[[Graphical model]]s have become powerful frameworks for [[protein structure prediction]], [[protein–protein interaction]] prediction and [[Thermodynamic free energy|free energy]] calculations. Representing a protein structure as a graphical model makes it possible to address problems such as secondary structure prediction, protein–protein interactions, protein–drug interactions and free energy estimation.
There are two main approaches to using graphical models in protein structure modeling. The first uses [[Discrete mathematics|discrete]] variables to represent the coordinates or [[dihedral angle]]s of the protein structure. Because these quantities are naturally continuous, a discretization step is applied first. The second approach uses continuous variables for the coordinates or dihedral angles directly.
==Discrete graphical models for protein structure==
[[Markov random field]]s, also known as undirected graphical models, are a common representation for this problem. Given an [[undirected graph]] ''G'' = (''V'', ''E''), a set of [[random variable]]s ''X'' = (''X''<sub>''v''</sub>)<sub>''v'' ∈ ''V''</sub> indexed by ''V'' forms a Markov random field with respect to ''G'' if it satisfies the pairwise Markov property:
*any two non-adjacent variables are [[conditional independence|conditionally independent]] given all other variables:
:<math>X_u \perp\!\!\!\perp X_v \mid X_{V \setminus \{u,v\}} \quad \text{if } \{u,v\} \notin E.</math>
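The pairwise Markov property can be checked numerically on a toy model. The following Python sketch uses a hypothetical three-variable chain in which ''X''<sub>1</sub> is linked to ''X''<sub>2</sub> and ''X''<sub>2</sub> to ''X''<sub>3</sub>. The states are binary and the edge potentials (<code>PHI_12</code>, <code>PHI_23</code>) are made up. The sketch builds the joint distribution by enumeration and checks that the non-adjacent pair (''X''<sub>1</sub>, ''X''<sub>3</sub>) is conditionally independent given ''X''<sub>2</sub>. The helper names are illustrative only.

```python
import itertools
import math

# Hypothetical edge potentials for the chain X1 - X2 - X3 (binary states).
PHI_12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 3.0}
PHI_23 = {(0, 0): 1.5, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 2.5}


def joint_distribution():
    """p(x1, x2, x3) proportional to PHI_12(x1, x2) * PHI_23(x2, x3)."""
    weights = {
        x: PHI_12[(x[0], x[1])] * PHI_23[(x[1], x[2])]
        for x in itertools.product((0, 1), repeat=3)
    }
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}


def conditionally_independent(p, tol=1e-12):
    """Check p(x1, x3 | x2) == p(x1 | x2) * p(x3 | x2) for all states."""
    for x2 in (0, 1):
        p_x2 = sum(v for x, v in p.items() if x[1] == x2)
        for x1, x3 in itertools.product((0, 1), repeat=2):
            p13 = p[(x1, x2, x3)] / p_x2
            p1 = sum(p[(x1, x2, b)] for b in (0, 1)) / p_x2
            p3 = sum(p[(a, x2, x3)] for a in (0, 1)) / p_x2
            if not math.isclose(p13, p1 * p3, rel_tol=0.0, abs_tol=tol):
                return False
    return True


print(conditionally_independent(joint_distribution()))  # True
```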
In the discrete model, each continuous variable is discretized into a small set of favorable values. If the variables of choice are [[dihedral angle]]s, discretization is typically done by mapping each angle to the nearest [[rotamer]] conformation in a rotamer library.
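Nearest-rotamer discretization can be sketched as follows. The sketch assumes a hypothetical library with only three χ<sub>1</sub> centres: +60°, 180° and −60°, labelled <code>p</code>, <code>t</code> and <code>m</code>. Real rotamer libraries are residue-specific and cover several side-chain angles. Distances are measured on the circle, so 300° is treated as −60°.

```python
# Hypothetical chi1 rotamer centres in degrees: p(lus), t(rans), m(inus).
# Real rotamer libraries are residue-specific and cover more angles.
ROTAMER_CENTRES = {"p": 60.0, "t": 180.0, "m": -60.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def discretize(chi1):
    """Map a continuous chi1 angle to the label of the nearest rotamer centre."""
    return min(ROTAMER_CENTRES,
               key=lambda label: angular_distance(chi1, ROTAMER_CENTRES[label]))


print([discretize(a) for a in (55.0, -170.0, 300.0)])  # ['p', 't', 'm']
```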
===Model===
Let ''X'' = {''X''<sub>''b''</sub>, ''X''<sub>''s''</sub>} be the random variables representing the entire protein structure. Here ''X''<sub>''b''</sub> describes the [[Backbone chain|backbone]] and ''X''<sub>''s''</sub> describes the side chains. ''X''<sub>''b''</sub> can be represented by the 3-D coordinates of the backbone atoms or, equivalently, by a sequence of [[bond length]]s, bond angles and [[dihedral angle]]s. The probability of a particular [[Protein structure|conformation]] ''x'' can then be written as:
:<math>p(X = x \mid \Theta) = p(X_b = x_b \mid \Theta)\, p(X_s = x_s \mid X_b = x_b, \Theta), \,</math>
where <math>\Theta</math> represents any parameters used to describe the model, such as sequence information and temperature. The backbone is often assumed to be rigid with a known conformation, which turns the problem into a side-chain placement problem. The structure of the graph is also encoded in <math>\Theta</math>. It specifies which pairs of variables are conditionally independent. For example, the side-chain angles of two residues that are far apart can be conditionally independent given all other angles in the protein. To extract this structure, a distance threshold is commonly used: only pairs of residues within that threshold are considered connected (i.e. have an edge between them).
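The distance-threshold rule can be sketched as follows. The choice of one representative point per residue (for example the Cα atom), the coordinates and the 8 Å cutoff are illustrative assumptions, not values from the source.

```python
import itertools
import math


def contact_graph(coords, threshold):
    """Return edges (i, j), i < j, between residues within `threshold`.

    coords: one (x, y, z) point per residue, e.g. a C-alpha position
    (an assumption of this sketch).
    """
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= threshold
    ]


# Hypothetical residue positions in angstroms and an illustrative 8 A cutoff.
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_graph(residues, threshold=8.0))  # [(0, 1), (0, 2), (1, 2)]
```

Residue 3 is far from the others and stays unconnected. Each edge in the resulting graph would carry a pairwise potential in the MRF.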
Given this representation, the probability of a particular side-chain conformation ''x''<sub>''s''</sub> given the backbone conformation ''x''<sub>''b''</sub> can be expressed as
:<math>p(X_s = x_s \mid X_b = x_b) = \frac{1}{Z} \prod_{c\in C(G)}\Phi_c (x_s^c,x_b^c)</math>
where ''C''(''G'') is the set of all cliques in ''G'' and <math>\Phi_c</math> is a non-negative [[function (mathematics)|potential function]] defined over the variables in clique ''c''. ''Z'' is the [[partition function (mathematics)|partition function]]: the sum of the product of potentials over all side-chain conformations, which makes the probabilities add up to one.
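For very small models the factorization can be evaluated by brute force. The sketch below uses a hypothetical function <code>clique_distribution</code> and made-up potentials for three residues with two rotamers each. It multiplies the clique potentials for every joint state and divides by ''Z''. Enumeration grows exponentially with the number of residues, so real proteins need approximate inference.

```python
import itertools


def clique_distribution(n_vars, n_states, cliques):
    """Return (p, Z) with p(x) = (1/Z) * prod_c Phi_c(x_c), by full enumeration.

    cliques: list of (variable_indices, potential) pairs; each potential
    takes the states of its variables and returns a non-negative number.
    """
    weights = {}
    for x in itertools.product(range(n_states), repeat=n_vars):
        w = 1.0
        for idx, phi in cliques:
            w *= phi(*(x[i] for i in idx))
        weights[x] = w
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}, z


# Hypothetical model: a singleton potential on residue 0 and pairwise
# potentials that favour equal rotamer indices on neighbouring residues.
cliques = [
    ((0,), lambda a: 2.0 if a == 0 else 1.0),
    ((0, 1), lambda a, b: 3.0 if a == b else 1.0),
    ((1, 2), lambda a, b: 3.0 if a == b else 1.0),
]
p, z = clique_distribution(3, 2, cliques)
print(z, p[(0, 0, 0)])  # 48.0 0.375
```

Here ''Z'' = 48 can be seen by hand. Summing out residue 2 gives a factor 4, summing out residue 1 another factor 4, and residue 0's singleton potential a factor 3.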
To fully characterize the MRF, the potential functions <math>\Phi</math> must be defined. For simplicity, the cliques are usually restricted to those of size 2, so potential functions are defined only over pairs of variables. In the [[Goblin System]], these pairwise functions are defined as
:<math>\Phi(x_s^{i_p},x_s^{j_q}) = \exp\left( -E(x_s^{i_p},x_s^{j_q})/k_BT\right)</math>
where <math>E(x_s^{i_p},x_s^{j_q})</math> is the interaction energy between rotamer state ''p'' of residue <math>X_s^i</math> and rotamer state ''q'' of residue <math>X_s^j</math>, <math>k_B</math> is the [[Boltzmann constant]] and ''T'' is the temperature.
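The next sketch turns pairwise energies into potentials of this form. It assumes energies in kcal/mol. In that case the molar gas constant ''R'' ≈ 1.987 × 10<sup>−3</sup> kcal/(mol·K) takes the place of ''k''<sub>''B''</sub>. The 2 × 2 energy table is made up.

```python
import math

# Molar gas constant in kcal/(mol K); it replaces k_B when energies are
# given per mole (an assumption of this sketch).
K_B = 1.987204e-3


def pairwise_potentials(energy, temperature):
    """Phi[p][q] = exp(-E[p][q] / (k_B T)) for an energy table E in kcal/mol."""
    kt = K_B * temperature
    return [[math.exp(-e / kt) for e in row] for row in energy]


# Hypothetical energies: rows are rotamers p of residue i,
# columns are rotamers q of residue j.
energy = [[-1.0, 0.5],
          [0.0, 2.0]]
phi = pairwise_potentials(energy, temperature=300.0)
print(phi)
```

Lower energies give larger potentials, so favourable rotamer pairs dominate the product over cliques.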
Given a structure in a PDB file, this model can be built over the protein, and its free energy can then be estimated from the model.
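The first step is reading the structure. The sketch below uses hypothetical helper names and extracts Cα coordinates from the fixed-column ATOM records of a PDB file. The columns it relies on are: atom name 13–16, chain 22, residue number 23–26 and coordinates 31–54. It ignores HETATM records and keeps only the first alternate location of each atom. The resulting coordinates could feed the distance-threshold contact graph described in the Model section.

```python
def read_ca_coordinates(pdb_text):
    """Map (chain, residue number) -> C-alpha (x, y, z) from PDB ATOM records.

    Fixed PDB columns (1-based): atom name 13-16, chain 22, residue number
    23-26, x 31-38, y 39-46, z 47-54. Only the first alternate location of
    each C-alpha is kept.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(key, xyz)
    return coords


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record with the standard column layout (demo only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


pdb = "\n".join([
    atom_line(1, " N  ", "ALA", "A", 1, 11.104, 6.134, -6.504),
    atom_line(2, " CA ", "ALA", "A", 1, 11.639, 6.071, -5.147),
    atom_line(3, " CA ", "GLY", "A", 2, 13.0, 7.5, -4.0),
])
print(read_ca_coordinates(pdb))
# {('A', 1): (11.639, 6.071, -5.147), ('A', 2): (13.0, 7.5, -4.0)}
```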
===Free energy calculation: belief propagation===
The free energy of a system is given by
:<math>G=E-TS</math>
where ''E'' is the enthalpy of the system, ''T'' the temperature and ''S'' the entropy. Suppose a probability ''p''(''x'') is associated with each state (conformation) ''x'' of the system. Then the entropy is <math>S = -k_B\sum_x p(x)\ln p(x)</math>, and ''G'' can be rewritten as
:<math>G=\sum_{x}p(x)E(x)+k_BT\sum_x p(x)\ln p(x) \,</math>
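The formula can be checked on a toy system. This sketch uses hypothetical energies in kcal/mol and takes ''k''<sub>''B''</sub> as the molar gas constant. It evaluates ''G'' for a given distribution and confirms two standard facts. First, the Boltzmann distribution gives ''G'' = −''k''<sub>''B''</sub>''T'' ln ''Z''. Second, any other distribution, such as the uniform one, gives a larger ''G''.

```python
import math

K_B = 1.987204e-3  # kcal/(mol K); molar units are an assumption of this sketch


def free_energy(probs, energies, temperature):
    """G = sum_x p(x) E(x) + k_B T sum_x p(x) ln p(x); terms with p(x) = 0 vanish."""
    mean_energy = sum(p * e for p, e in zip(probs, energies))
    neg_entropy = sum(p * math.log(p) for p in probs if p > 0)  # equals -S / k_B
    return mean_energy + K_B * temperature * neg_entropy


def boltzmann(energies, temperature):
    """Boltzmann distribution p(x) = exp(-E(x) / (k_B T)) / Z, together with Z."""
    kt = K_B * temperature
    weights = [math.exp(-e / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z


# Hypothetical energies (kcal/mol) of three conformations.
energies = [0.0, 0.5, 1.2]
T = 300.0
p, z = boltzmann(energies, T)
print(free_energy(p, energies, T), -K_B * T * math.log(z))  # the two values agree
```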
In discrete graphical models, ''p''(''x'') is computed with the [[generalized belief propagation]] algorithm. This algorithm computes an [[approximation]] to the probabilities and is not guaranteed to converge. In practice, however, it has been observed to converge in many cases.
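Generalized belief propagation (GBP) passes messages between clusters of variables, which is too involved for a short example. The sketch below shows a simpler member of the same family instead: ordinary sum-product belief propagation on a hypothetical chain-structured MRF, where it is exact. It compares the result with brute-force enumeration. On graphs with cycles, such as residue contact graphs, the same updates give only approximate marginals and may not converge. GBP extends them to clusters of variables to improve the approximation.

```python
import itertools


def chain_marginals(unary, pairwise):
    """Node marginals of a chain MRF by sum-product belief propagation (exact on chains).

    unary[i][s]: potential of node i in state s.
    pairwise[i][s][t]: potential coupling node i (state s) and node i+1 (state t).
    """
    n = len(unary)
    # forward[i][t]: message into node i from its left neighbour.
    forward = [[1.0] * len(unary[i]) for i in range(n)]
    for i in range(1, n):
        forward[i] = [
            sum(forward[i - 1][s] * unary[i - 1][s] * pairwise[i - 1][s][t]
                for s in range(len(unary[i - 1])))
            for t in range(len(unary[i]))
        ]
    # backward[i][s]: message into node i from its right neighbour.
    backward = [[1.0] * len(unary[i]) for i in range(n)]
    for i in range(n - 2, -1, -1):
        backward[i] = [
            sum(pairwise[i][s][t] * unary[i + 1][t] * backward[i + 1][t]
                for t in range(len(unary[i + 1])))
            for s in range(len(unary[i]))
        ]
    marginals = []
    for i in range(n):
        belief = [forward[i][s] * unary[i][s] * backward[i][s] for s in range(len(unary[i]))]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Same marginals by enumerating every joint state (for checking only)."""
    n = len(unary)
    marg = [[0.0] * len(u) for u in unary]
    for x in itertools.product(*(range(len(u)) for u in unary)):
        w = 1.0
        for i in range(n):
            w *= unary[i][x[i]]
        for i in range(n - 1):
            w *= pairwise[i][x[i]][x[i + 1]]
        for i in range(n):
            marg[i][x[i]] += w
    return [[m / sum(row) for m in row] for row in marg]


# Hypothetical 3-node chain with two states per node.
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
pairwise = [[[2.0, 1.0], [1.0, 2.0]], [[1.0, 0.5], [0.5, 1.0]]]
print(chain_marginals(unary, pairwise))
print(brute_force_marginals(unary, pairwise))  # matches the BP result
```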
==Continuous graphical models for protein structures==
Graphical models can also be used when the variables of choice are continuous. In this case, the model is a [[multivariate probability distribution]] over continuous variables, and each family of distributions imposes certain properties on the graphical model. The [[Multivariate Gaussian distribution|multivariate Gaussian distribution]] is one of the most convenient choices for this problem. Its simple form and its direct relation to the corresponding graphical model make it popular among researchers.
===Gaussian graphical models of protein structures===
Gaussian graphical models are multivariate probability distributions that encode a network of dependencies among variables. Let <math>\Theta=[\theta_1, \theta_2, \dots, \theta_n]</math> be a set of <math>n</math> variables, such as <math>n</math> [[dihedral angle]]s. In this section <math>\Theta</math> denotes the variables themselves rather than the model parameters. Let <math>f(\Theta=D)</math> be the value of the [[probability density function]] at a particular value ''D''. A multivariate Gaussian graphical model defines this density as follows:
:<math>f(\Theta=D) = \frac{1}{Z} \exp\left\{-\frac{1}{2}(D-\mu)^T\Sigma^{-1}(D-\mu)\right\}</math>
where <math>Z = (2\pi)^{n/2}|\Sigma|^{1/2}</math> is the closed form of the [[Partition function (mathematics)|partition function]]. The parameters of this distribution are <math>\mu</math> and <math>\Sigma</math>. <math>\mu</math> is the vector of [[mean values]] of the variables, and <math>\Sigma</math> is the [[covariance matrix]]. Its inverse <math>\Sigma^{-1}</math> is known as the [[precision matrix]]. The precision matrix encodes the pairwise dependencies between the variables: a zero off-diagonal entry in <math>\Sigma^{-1}</math> means that the two corresponding variables are conditionally independent given the values of all other variables.
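The link between zeros of the precision matrix and conditional independence can be checked numerically. The sketch assumes a hypothetical 3 × 3 tridiagonal precision matrix, i.e. a chain ''θ''<sub>1</sub>, ''θ''<sub>2</sub>, ''θ''<sub>3</sub>. It inverts the matrix to obtain Σ and shows that ''θ''<sub>1</sub> and ''θ''<sub>3</sub> are marginally correlated. It then computes their conditional covariance given ''θ''<sub>2</sub> through the Schur complement, which has a zero off-diagonal entry. Plain lists are used instead of a numerical library so that the example is self-contained.

```python
def inverse(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def conditional_covariance(sigma, keep, given):
    """Covariance of variables `keep` given variables `given` (Schur complement)."""
    s_kk = [[sigma[i][j] for j in keep] for i in keep]
    s_kg = [[sigma[i][j] for j in given] for i in keep]
    s_gg_inv = inverse([[sigma[i][j] for j in given] for i in given])
    g = range(len(given))
    # s_kk - s_kg * s_gg_inv * s_kg^T
    return [[s_kk[a][b] - sum(s_kg[a][i] * s_gg_inv[i][j] * s_kg[b][j] for i in g for j in g)
             for b in range(len(keep))]
            for a in range(len(keep))]


# Hypothetical precision matrix: zero in position (1, 3) means theta1 and
# theta3 should be conditionally independent given theta2.
precision = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
sigma = inverse(precision)
print(round(sigma[0][2], 6))  # 0.25: marginally correlated
cond = conditional_covariance(sigma, keep=[0, 2], given=[1])
print(abs(cond[0][1]) < 1e-12)  # True: uncorrelated given theta2
```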
To learn the graph structure of a multivariate Gaussian graphical model, either [[L-1 regularization]] or [[neighborhood selection]] algorithms can be used. These algorithms learn the graph structure and the edge strengths between connected nodes simultaneously. An edge strength corresponds to the potential function defined on the corresponding two-node [[clique]]. The parameters <math>\mu</math> and <math>\Sigma^{-1}</math> are learned from a training set of PDB structures.
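Here is a minimal sketch of neighborhood selection in the style of Meinshausen and Bühlmann. Each variable is regressed on all the others with an L1 (lasso) penalty, and an edge is kept when both regressions select it (the AND rule). The sketch assumes standardized variables, so the lasso can be solved by coordinate descent directly on a correlation matrix. The correlation matrix, the penalty <code>lam</code> = 0.2 and the averaging of the two coefficients into an edge strength are illustrative choices. The L1-regularized alternative, an L1-penalized estimate of the precision matrix such as the graphical lasso, is not shown.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator used in lasso coordinate descent."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_on_correlation(corr, target, lam, n_sweeps=200):
    """Lasso coefficients for regressing variable `target` on all others.

    Assumes standardized variables, so the problem
        min_b 1/2 E[(x_target - sum_k b_k x_k)^2] + lam * sum_k |b_k|
    depends on the data only through the correlation matrix `corr`.
    """
    others = [k for k in range(len(corr)) if k != target]
    beta = {k: 0.0 for k in others}
    for _ in range(n_sweeps):
        for j in others:
            # Correlation of x_j with the partial residual (corr[j][j] == 1).
            rho = corr[j][target] - sum(corr[j][k] * beta[k] for k in others if k != j)
            beta[j] = soft_threshold(rho, lam)
    return beta


def neighborhood_selection(corr, lam):
    """Edges kept by both node-wise lasso regressions (AND rule).

    The edge strength is the average of the two regression coefficients,
    a choice made for this sketch.
    """
    n = len(corr)
    coef = {i: lasso_on_correlation(corr, i, lam) for i in range(n)}
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if coef[i][j] != 0.0 and coef[j][i] != 0.0:
                edges[(i, j)] = 0.5 * (coef[i][j] + coef[j][i])
    return edges


# Hypothetical correlation matrix of a Gaussian chain theta1 - theta2 - theta3
# (theta1 and theta3 are conditionally independent given theta2).
corr = [[1.0, 0.5, 0.25],
        [0.5, 1.0, 0.5],
        [0.25, 0.5, 1.0]]
edges = neighborhood_selection(corr, lam=0.2)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

For this chain the lasso keeps the edges 1–2 and 2–3 and drops 1–3. This matches the zero pattern of the true (tridiagonal) precision matrix.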
Once the model is learned, the same steps as in the discrete case can be repeated: obtain the density function at each node and use its analytical form to calculate the free energy. Here the [[Partition function (mathematics)|partition function]] already has a [[Closed-form expression|closed form]], so [[inference]] is straightforward, at least for Gaussian graphical models. When no analytical form of the partition function is available, [[particle filtering]] or [[expectation propagation]] can be used to approximate ''Z''. Inference and the free energy calculation can then be carried out.
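In the Gaussian case, the closed-form partition function gives the free energy directly. The sketch assumes the energy is ''E''(''x'') = ''k''<sub>''B''</sub>''T'' · ½ (''x'' − ''μ'')<sup>T</sup>Σ<sup>−1</sup>(''x'' − ''μ''), so that the Gaussian is its Boltzmann distribution. Then ''G'' = −''k''<sub>''B''</sub>''T'' ln ''Z'' with ln ''Z'' = (''n''/2) ln 2π + ½ ln |Σ|, and the code checks this against ''G'' = ⟨''E''⟩ − ''TS''. The covariance matrix is made up. The differential entropy depends on the units of the angles, so only free-energy differences between models that use the same units are meaningful.

```python
import math


def log_det(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(m)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return 2.0 * sum(math.log(l[i][i]) for i in range(n))


def gaussian_log_partition(sigma):
    """ln Z = (n/2) ln(2 pi) + (1/2) ln |Sigma| for a multivariate Gaussian."""
    n = len(sigma)
    return 0.5 * n * math.log(2.0 * math.pi) + 0.5 * log_det(sigma)


def gaussian_free_energy(sigma, kt=1.0):
    """G = -kT ln Z, assuming E(x) = kT (x - mu)^T Sigma^-1 (x - mu) / 2."""
    return -kt * gaussian_log_partition(sigma)


def gaussian_entropy(sigma):
    """Differential entropy h = (1/2) ln((2 pi e)^n |Sigma|), in units of k_B."""
    n = len(sigma)
    return 0.5 * (n * (1.0 + math.log(2.0 * math.pi)) + log_det(sigma))


# Hypothetical covariance matrix of two dihedral angles (radians squared).
sigma = [[1.0, 0.5], [0.5, 2.0]]
kt = 1.0
g = gaussian_free_energy(sigma, kt)
# Consistency with G = <E> - T S, where <E> = kT * n / 2 and T S = kT * h.
g_check = kt * len(sigma) / 2 - kt * gaussian_entropy(sigma)
print(g, g_check)  # equal up to rounding
```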
==References==
<!--- See http://en.wikipedia.org/wiki/Wikipedia:Footnotes on how to create references using <ref></ref> tags which will then appear here automatically -->
* Shuheng Zhou, John D. Lafferty and Larry A. Wasserman. "Time Varying Undirected Graphs". COLT 2008.
* Hetunandan Kamisetty, Eric P. Xing and Christopher J. Langmead. "Free Energy Estimates of All-atom Protein Structures Using Generalized Belief Propagation". RECOMB 2008.
==External links==
* http://www.liebertonline.com/doi/pdf/10.1089/cmb.2007.0131
* http://www.learningtheory.org/colt2008/81-Zhou.pdf
* {{cite journal|author1= Liu Y |author2= Carbonell J |author3= Gopalakrishnan V |year=2009|title= Conditional graphical models for protein structural motif recognition
|journal= J Comput Biol. | volume=16|pages= 639–657 |url= http://www.ncbi.nlm.nih.gov/pubmed/19432536}}
* [http://www.cs.cmu.edu/~jgc/publication/Predicting_Protein_Folds_ICML_2005.pdf Predicting Protein Folds with Structural Repeats Using a Chain Graph Model]
{{DEFAULTSORT:Graphical Models For Protein Structure}}
[[Category:Graphical models]]
[[Category:Protein methods]]
[[Category:Computational chemistry]]
Revision as of 22:06, 10 December 2013