Chronon: Difference between revisions

Added links to Sucahrd's papers in Notes
en>Penbat
{{expert-subject|date=January 2013}}
[[File:Boltzmannexamplev1.png|thumb|right|alt=A graphical representation of an example Boltzmann machine.| A graphical representation of an example Boltzmann machine. Each undirected edge represents dependency. In this example there are 3 hidden units and 4 visible units. This is not a restricted Boltzmann machine.]]
A '''Boltzmann machine''' is a type of [[stochastic neural network|stochastic]] [[recurrent neural network]] invented by [[Geoffrey Hinton]] and [[Terry Sejnowski]] in 1985. Boltzmann machines can be seen as the [[stochastic process|stochastic]], [[generative model|generative]] counterpart of [[Hopfield net]]s. They were one of the first examples of a neural network capable of learning internal representations, and are able to represent and (given sufficient time) solve difficult combinatorial problems. Because of a number of issues discussed below, Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference. They remain theoretically intriguing because of the locality and [[Hebbian]] nature of their training algorithm, their parallelism, and the resemblance of their dynamics to simple physical processes. If the connectivity is constrained, learning can be made efficient enough to be useful for practical problems.
 
They are named after the [[Boltzmann distribution]] in statistical mechanics, which is used in their sampling function.
 
==Structure==
[[File:Boltzmannexamplev2.png|thumb|right|alt=A graphical representation of an example Boltzmann machine with weight labels.| A graphical representation of a Boltzmann machine with a few weights labeled. Each undirected edge represents dependency and is weighted with weight <math>w_{ij}</math>. In this example there are 3 hidden units (blue) and 4 visible units (white). This is not a restricted Boltzmann machine.]]
 
A Boltzmann machine, like a [[Hopfield net]]work, is a network of units with an "energy" defined for the network. It also has [[Wiktionary:binary|binary]] units, but unlike Hopfield nets, Boltzmann machine units are [[stochastic]]. The global energy, <math>E</math>, in a Boltzmann machine is identical in form to that of a Hopfield network:
 
:<math>E = -(\sum_{i<j} w_{ij} \, s_i \, s_j + \sum_i \theta_i \, s_i)</math>
 
Where:
* <math>w_{ij}</math> is the connection strength between unit <math>j</math> and unit <math>i</math>.
* <math>s_i</math> is the state, <math>s_i \in \{0,1\}</math>, of unit <math>i</math>.
* <math>\theta_i</math> is the bias of unit <math>i</math> in the global energy function. (<math>-\theta_i</math> is the activation threshold for the unit.)
 
The connections in a Boltzmann machine have two restrictions:
* <math>w_{ii}=0\qquad \forall i</math>. (No unit has a connection with itself.)
* <math>w_{ij}=w_{ji}\qquad \forall i,j</math>. (All connections are [[symmetric]].)
 
Often the weights are represented in matrix form with a symmetric matrix <math>W</math>, with zeros along the diagonal.
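
As an illustration, the energy function above can be evaluated directly from a weight matrix and a bias vector. The following is a minimal Python sketch; the function name <code>energy</code> and the example weights and biases are assumptions chosen for illustration and do not come from the original paper.

```python
def energy(W, theta, s):
    """Global energy E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i)."""
    n = len(s)
    # Each unordered pair (i, j) with i < j is counted once.
    pair_term = sum(W[i][j] * s[i] * s[j]
                    for i in range(n) for j in range(i + 1, n))
    bias_term = sum(theta[i] * s[i] for i in range(n))
    return -(pair_term + bias_term)


# Hypothetical 3-unit machine: W is symmetric with zeros on the diagonal.
W = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
theta = [0.1, -0.3, 0.2]

print(energy(W, theta, [1, 1, 0]))  # roughly -0.8: units 0 and 1 are on
```

Only the upper triangle of <code>W</code> is read, which matches the sum over <math>i<j</math> in the formula.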
 
== Probability of a unit's state ==
 
The difference in the global energy that results from a single unit <math>i</math> being 0 (off) versus 1 (on), written <math>\Delta E_i</math>, assuming a symmetric matrix of weights, is given by:
 
:<math>\Delta E_i = \sum_j w_{ij} \, s_j + \theta_i</math>
 
This can be expressed as the difference of energies of two states:
 
:<math>\Delta E_i = E_\text{i=off} - E_\text{i=on}</math>
 
We then substitute the energy of each state with its relative probability according to the [[Boltzmann Factor]] (the property of a [[Boltzmann distribution]] that the energy of a state is proportional to the negative log probability of that state):
 
:<math>\Delta E_i = -k_B\,T\ln(p_\text{i=off}) - (-k_B\,T\ln(p_\text{i=on}))</math>
 
where <math>k_B</math> is Boltzmann's constant and is absorbed into the artificial notion of temperature <math>T</math>. We then rearrange terms and consider that the probabilities of the unit being on and off must sum to one:
 
:<math>\frac{\Delta E_i}{T} = \ln(p_\text{i=on}) - \ln(p_\text{i=off})</math>
:<math>\frac{\Delta E_i}{T} = \ln(p_\text{i=on}) - \ln(1 - p_\text{i=on})</math>
:<math>\frac{\Delta E_i}{T} = \ln\left(\frac{p_\text{i=on}}{1 - p_\text{i=on}}\right)</math>
:<math>-\frac{\Delta E_i}{T} = \ln\left(\frac{1 - p_\text{i=on}}{p_\text{i=on}}\right)</math>
:<math>-\frac{\Delta E_i}{T} = \ln\left(\frac{1}{p_\text{i=on}} - 1\right)</math>
:<math>\exp\left(-\frac{\Delta E_i}{T}\right) = \frac{1}{p_\text{i=on}} - 1</math>
 
We can now finally solve for <math>p_\text{i=on}</math>, the probability that the <math>i</math>-th unit is on.
 
:<math>p_\text{i=on} = \frac{1}{1+\exp(-\frac{\Delta E_i}{T})}</math>
 
where the [[scalar (physics)|scalar]] <math>T</math> is referred to as the [[temperature]] of the system. This relation is the source of the [[logistic function]] found in probability expressions in variants of the Boltzmann machine.
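
The relation between the energy gap and the activation probability can be checked numerically. The sketch below assumes a small hypothetical machine (the weights, biases, and function names are illustrative only) and confirms that <math>\Delta E_i</math> computed from the weights equals the difference of the two global energies.

```python
import math


def energy(W, theta, s):
    """Global energy of state s for symmetric W with zero diagonal."""
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j]
                for i in range(n) for j in range(i + 1, n))
    return -(pairs + sum(theta[i] * s[i] for i in range(n)))


def energy_gap(W, theta, s, i):
    """Delta E_i = sum_j w_ij s_j + theta_i (w_ii = 0, so s_i drops out)."""
    return sum(W[i][j] * s[j] for j in range(len(s))) + theta[i]


def p_on(delta_e, T=1.0):
    """Logistic probability that a unit turns on at temperature T."""
    return 1.0 / (1.0 + math.exp(-delta_e / T))


# Hypothetical example values.
W = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
theta = [0.1, -0.3, 0.2]

gap = energy_gap(W, theta, [1, 0, 1], 1)
e_off = energy(W, theta, [1, 0, 1])   # unit 1 off
e_on = energy(W, theta, [1, 1, 1])    # unit 1 on
print(gap, e_off - e_on, p_on(gap))
```

At very high temperature the probability approaches 1/2 regardless of the gap, and at low temperature the unit almost always takes the lower-energy state.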
 
== Equilibrium state ==
 
The network is run by repeatedly choosing a unit and setting its state according to the above formula.  After running for long enough at a certain temperature, the probability of a global state of the network will depend only upon that global state's energy, according to a [[Boltzmann distribution]].  This means that log-probabilities of global states become linear in their energies.  This relationship is true when the machine is "at [[thermal equilibrium]]", meaning that the probability distribution of global states has converged.  If we start running the network from a high temperature, and gradually decrease it until we reach a [[thermal equilibrium]] at a low temperature, we may converge to a distribution where the energy level fluctuates around the global minimum.{{Citation needed|date=June 2012}}  This process is called [[simulated annealing]].
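
The update procedure and the annealing schedule described above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names, the temperature schedule, the number of steps per temperature, and the example weights are all assumptions.

```python
import math
import random


def gibbs_step(W, theta, s, T, rng):
    """Pick one unit at random and resample its state at temperature T."""
    i = rng.randrange(len(s))
    gap = sum(W[i][j] * s[j] for j in range(len(s))) + theta[i]
    # Assumes -gap / T stays below about 700, otherwise math.exp overflows.
    prob_on = 1.0 / (1.0 + math.exp(-gap / T))
    s[i] = 1 if rng.random() < prob_on else 0


def anneal(W, theta, schedule, steps_per_T, seed=0):
    """Run the network through a decreasing list of temperatures."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in theta]  # random initial global state
    for T in schedule:
        for _ in range(steps_per_T):
            gibbs_step(W, theta, s, T, rng)
    return s


# Hypothetical 3-unit machine and cooling schedule.
W = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
theta = [0.1, -0.3, 0.2]
final_state = anneal(W, theta, schedule=[10.0, 5.0, 2.0, 1.0, 0.5],
                     steps_per_T=100)
print(final_state)
```

The final state is a sample, not a guaranteed minimum; lower final temperatures and slower schedules make low-energy states more likely.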
 
If we want to train the network so that the probability of converging to a given global state follows an external distribution over these states, we need to set the weights so that the global states with the highest probabilities get the lowest energies. This is done by the following training procedure.
 
==Training==
 
The units in the Boltzmann machine are divided into "visible" units, V, and "hidden" units, H. The visible units are those that receive information from the "environment"; that is, the training set is a set of binary vectors over the set V. The distribution over the training set is denoted <math>P^{+}(V)</math>.
 
As is discussed above, the distribution over global states converges as the Boltzmann machine reaches [[thermal equilibrium]].  We denote this distribution, after we [[Marginal distribution|marginalize]] it over the hidden units, as <math>P^{-}(V)</math>. 
 
Our goal is to approximate the "real" distribution <math>P^{+}(V)</math> using the <math>P^{-}(V)</math> which will be produced (eventually) by the machine.  To measure how similar the two distributions are, we use the [[Kullback-Leibler divergence]], <math>G</math>:
 
:<math>G = \sum_{v}{P^{+}(v)\ln\left({\frac{P^{+}(v)}{P^{-}(v)}}\right)}</math>
 
where the sum is over all the possible states of <math>V</math>. <math>G</math> is a function of the weights, since they determine the energy of a state, and the energy determines <math>P^{-}(v)</math>, as given by the [[Boltzmann distribution]]. Hence, we can use a [[gradient descent]] algorithm over <math>G</math>: a given weight <math>w_{ij}</math> is changed by subtracting the [[partial derivative]] of <math>G</math> with respect to that weight.
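
For a machine small enough to enumerate every global state, <math>P^{-}(V)</math> and <math>G</math> can be computed exactly. The sketch below does this by brute force; it assumes that the first <code>n_visible</code> units are the visible ones and that <math>T = 1</math> by default, and all names and example values are illustrative.

```python
import itertools
import math


def energy(W, theta, s):
    """Global energy of state s for symmetric W with zero diagonal."""
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j]
                for i in range(n) for j in range(i + 1, n))
    return -(pairs + sum(theta[i] * s[i] for i in range(n)))


def model_visible_dist(W, theta, n_visible, T=1.0):
    """P^-(v): Boltzmann distribution marginalized over the hidden units."""
    unnormalized = {}
    for s in itertools.product([0, 1], repeat=len(theta)):
        v = s[:n_visible]
        boltzmann_factor = math.exp(-energy(W, theta, s) / T)
        unnormalized[v] = unnormalized.get(v, 0.0) + boltzmann_factor
    Z = sum(unnormalized.values())  # partition function
    return {v: w / Z for v, w in unnormalized.items()}


def kl_divergence(p_plus, p_minus):
    """G = sum_v P^+(v) ln(P^+(v) / P^-(v)); zero-probability terms vanish."""
    return sum(p * math.log(p / p_minus[v])
               for v, p in p_plus.items() if p > 0)


# Hypothetical setup: 2 visible units, 1 hidden unit, all parameters zero.
W = [[0.0] * 3 for _ in range(3)]
theta = [0.0, 0.0, 0.0]
p_plus = {(0, 0): 0.5, (1, 1): 0.5}   # training distribution over V

p_minus = model_visible_dist(W, theta, n_visible=2)
G = kl_divergence(p_plus, p_minus)
print(G)  # equals ln 2 here, because P^- is uniform over four visible states
```

The enumeration costs <math>2^n</math> energy evaluations for <math>n</math> units, so this approach is only practical for very small machines.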
 
There are two phases to Boltzmann machine training, and we switch iteratively between them. One is the "positive" phase where the visible units' states are clamped to a particular binary state vector sampled from the training set (according to <math>P^{+}</math>).  The other is the "negative" phase where the network is allowed to run freely, i.e. no units have their state determined by external data.  Surprisingly enough, the gradient with respect to a given weight, <math>w_{ij}</math>, is given by the very simple equation (proved in Ackley et al.<ref>{{cite journal|last=Ackley|first=David H.|coauthors=Hinton, Geoffrey E.; Sejnowski, Terrence J.|title=A Learning Algorithm for Boltzmann Machines|journal=[[Cognitive Science (journal)|Cognitive Science]]|year=1985|volume=9|issue=1|pages=147–169|doi=10.1207/s15516709cog0901_7|url=http://learning.cs.toronto.edu/~hinton/absps/cogscibm.pdf}}</ref>):
 
:<math>\frac{\partial{G}}{\partial{w_{ij}}} = -\frac{1}{R}[p_{ij}^{+}-p_{ij}^{-}]</math>
 
where:
* <math>p_{ij}^{+}</math> is the probability of units ''i'' and ''j'' both being on when the machine is at equilibrium in the positive phase.
* <math>p_{ij}^{-}</math> is the probability of units ''i'' and ''j'' both being on when the machine is at equilibrium in the negative phase.
* <math>R</math> denotes the learning rate.
 
This result follows from the fact that at [[thermal equilibrium]] the probability <math>P^{-}(s)</math> of any global state <math>s</math> when the network is free-running is given by the [[Boltzmann distribution]] (hence the name "Boltzmann machine"). 
 
Remarkably, this learning rule is fairly biologically plausible because the only information needed to change the weights is "local". That is, the connection (or [[synapse]], biologically speaking) does not need information about anything other than the two neurons it connects. This is far more biologically realistic than the information needed by a connection in many other neural network training algorithms, such as [[backpropagation]].
 
The training of a Boltzmann machine does not use the [[EM algorithm]], which is heavily used in [[machine learning]]. Minimizing the KL-divergence is equivalent to maximizing the log-likelihood of the data, so the training procedure performs gradient ascent on the log-likelihood of the observed data. This is in contrast to the EM algorithm, where the posterior distribution of the hidden nodes must be calculated before the maximization of the expected value of the complete data likelihood during the M-step.
 
Training the biases is similar, but uses only single node activity:
 
:<math>\frac{\partial{G}}{\partial{\theta_{i}}} = -\frac{1}{R}[p_{i}^{+}-p_{i}^{-}]</math>
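
The two-phase rule can be illustrated on a tiny machine in which the equilibrium statistics are computed exactly by enumeration instead of by sampling. This is a sketch under stated assumptions: <math>T = 1</math>, the first <code>n_visible</code> units are visible, and the step size <code>lr</code> and all function names are hypothetical. Each weight moves in the direction of <math>p_{ij}^{+}-p_{ij}^{-}</math>, that is, down the gradient of <math>G</math>.

```python
import itertools
import math


def energy(W, theta, s):
    """Global energy of state s for symmetric W with zero diagonal."""
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j]
                for i in range(n) for j in range(i + 1, n))
    return -(pairs + sum(theta[i] * s[i] for i in range(n)))


def equilibrium_stats(W, theta, n_visible, p_plus=None):
    """Return (p_ij, p_i) at equilibrium with T = 1.

    With p_plus given, visible units are clamped to data (positive phase);
    otherwise the network runs freely (negative phase).
    """
    n = len(theta)
    boltz = {s: math.exp(-energy(W, theta, s))
             for s in itertools.product([0, 1], repeat=n)}
    if p_plus is None:
        Z = sum(boltz.values())
        prob = {s: b / Z for s, b in boltz.items()}
    else:
        # P(s) = P^+(v) * P(h | v), with P(h | v) from the Boltzmann factors.
        z_v = {}
        for s, b in boltz.items():
            z_v[s[:n_visible]] = z_v.get(s[:n_visible], 0.0) + b
        prob = {s: p_plus.get(s[:n_visible], 0.0) * b / z_v[s[:n_visible]]
                for s, b in boltz.items()}
    p_ij = [[sum(p * s[i] * s[j] for s, p in prob.items())
             for j in range(n)] for i in range(n)]
    p_i = [p_ij[i][i] for i in range(n)]  # s_i * s_i = s_i for binary units
    return p_ij, p_i


def train_step(W, theta, n_visible, p_plus, lr=0.1):
    """One update of all weights and biases, in place."""
    pos_ij, pos_i = equilibrium_stats(W, theta, n_visible, p_plus)
    neg_ij, neg_i = equilibrium_stats(W, theta, n_visible)
    n = len(theta)
    for i in range(n):
        theta[i] += lr * (pos_i[i] - neg_i[i])
        for j in range(n):
            if i != j:  # keep w_ii = 0; symmetry of p_ij keeps W symmetric
                W[i][j] += lr * (pos_ij[i][j] - neg_ij[i][j])


# Hypothetical setup: 2 visible units, 1 hidden unit, parameters start at 0.
W = [[0.0] * 3 for _ in range(3)]
theta = [0.0, 0.0, 0.0]
p_plus = {(0, 0): 0.5, (1, 1): 0.5}
train_step(W, theta, n_visible=2, p_plus=p_plus)
print(W[0][1])  # positive: the two visible units co-occur more in the data
```

In a realistic machine the statistics would be estimated by running the network to equilibrium in each phase, which is the source of the cost discussed in the next section.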
 
==Problems==
 
The Boltzmann machine would theoretically be a rather general computational medium.  For instance, if trained on photographs, the machine would theoretically model the distribution of photographs, and could use that model to, for example, complete a partial photograph.
 
Unfortunately, there is a serious practical problem with the Boltzmann machine: learning seems to stop working correctly when the machine is scaled up to anything larger than a trivial size.{{Cn|date=January 2013}}  This is due to a number of effects, the most important of which are:
 
* the time the machine must be run in order to collect equilibrium statistics grows exponentially with the machine's size, and with the magnitude of the connection strengths
* connection strengths are more plastic when the units being connected have activation probabilities intermediate between zero and one, leading to a so-called [[variance trap]].  The net effect is that noise causes the connection strengths to random walk until the activities saturate.
 
==Restricted Boltzmann machine==
[[File:Restricted Boltzmann machine.svg|thumb|right|alt=Graphical representation of an example restricted Boltzmann machine |Graphical representation of a restricted Boltzmann machine. The four blue units represent hidden units, and the three red units represent visible states. In restricted Boltzmann machines there are only connections (dependencies) between hidden and visible units, and none between units of the same type (no hidden-hidden, nor visible-visible connections).]]
{{main|Restricted Boltzmann machine}}
Although learning is impractical in general Boltzmann machines, it can be made quite efficient in an architecture called the "restricted Boltzmann machine" or "RBM", which does not allow intralayer connections (neither hidden-hidden nor visible-visible). After training one RBM, the activities of its hidden units can be treated as data for training a higher-level RBM. This method of stacking RBMs makes it possible to train many layers of hidden units efficiently and is one of the most common [[deep learning]] strategies. As each new layer is added, the overall generative model gets better.
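
Because an RBM has no connections within a layer, the units of one layer are conditionally independent given the other layer, so a whole layer can be sampled in one step. The sketch below illustrates this with <math>T = 1</math>; the layout <code>W[i][j]</code> (visible unit ''i'' to hidden unit ''j''), the bias names, and the example values are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(W, b_hidden, v, rng):
    """Sample every hidden unit given the visible vector v."""
    probs = [sigmoid(b_hidden[j] + sum(W[i][j] * v[i] for i in range(len(v))))
             for j in range(len(b_hidden))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def sample_visible(W, b_visible, h, rng):
    """Sample every visible unit given the hidden vector h."""
    probs = [sigmoid(b_visible[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b_visible))]
    return [1 if rng.random() < p else 0 for p in probs], probs


# Hypothetical RBM with 3 visible and 2 hidden units.
rng = random.Random(0)
W = [[0.5, -1.0],
     [1.5, 0.0],
     [-0.5, 2.0]]
b_visible = [0.0, 0.0, 0.0]
b_hidden = [0.0, 0.0]

v = [1, 0, 1]
h, h_probs = sample_hidden(W, b_hidden, v, rng)       # one upward pass
v_new, v_probs = sample_visible(W, b_visible, h, rng)  # one downward pass
print(h_probs, h, v_new)
```

Alternating the two functions gives a block Gibbs sampler, which avoids the unit-by-unit updates needed in a general Boltzmann machine.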
 
An extension of the restricted Boltzmann machine allows real-valued data to be used rather than binary data. It is outlined, along with higher-order Boltzmann machines, in this video [http://www.youtube.com/watch?v=VdIURAu1-aU].
 
One example of a practical application of restricted Boltzmann machines is the performance improvement of speech recognition software.<ref>{{cite web |url=http://research.microsoft.com/pubs/144412/DBN4LVCSR-TransASLP.pdf |title=Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition |year=2011}}</ref>
 
==History==
{{Unreferenced|section|date=November 2009}}
 
The Boltzmann machine is a [[Monte Carlo method|Monte Carlo]] version of the [[Hopfield net]]work.
 
The idea of using annealed [[Ising model]]s for inference is often thought to have been first described by: 
 
* Geoffrey E. Hinton and Terrence J. Sejnowski, Analyzing Cooperative Computation. In Proceedings of the 5th Annual Congress of the Cognitive Science Society, Rochester, NY, May 1983.
 
* Geoffrey E. Hinton and Terrence J. Sejnowski, Optimal Perceptual Inference. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), pages 448–453, IEEE Computer Society, Washington DC, June 1983.
 
However, it should be noted that these articles appeared after the seminal publication by John Hopfield, where the connection to physics and statistical mechanics was made in the first place, mentioning spin glasses:
 
* John J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA, vol. 79 no. 8, pp. 2554-2558, April 1982.
 
The idea of applying the Ising model with annealed [[Gibbs sampling]] is also present in [[Douglas Hofstadter]]'s [[Copycat (software)|Copycat]] project:
 
* Hofstadter, Douglas R., The Copycat Project: An Experiment in Nondeterminism and Creative Analogies. MIT Artificial Intelligence Laboratory Memo No. 755, January 1984.
 
* Hofstadter, Douglas R., A Non-Deterministic Approach to Analogy, Involving the Ising Model of Ferromagnetism. In E. Caianiello, ed. The Physics of Cognitive Processes. Teaneck, NJ: World Scientific, 1987.
 
Similar ideas (with a change of sign in the energy function) are also found in [[Paul Smolensky]]'s "Harmony Theory".
 
The explicit analogy drawn with statistical mechanics in the Boltzmann Machine formulation led to the use of terminology borrowed from physics (e.g., "energy" rather than "harmony"), which has become standard in the field.  The widespread adoption of this terminology may have been encouraged by the fact that its use led to the importation of a variety of concepts and methods from statistical mechanics.
However, there is no reason to think that the various proposals to use simulated annealing for inference described above were not independent.
([[Hermann von Helmholtz|Helmholtz]] made a similar analogy during the dawn of psychophysics.)
 
Ising models are now considered to be a special case of [[Markov random field]]s, which find widespread application in various fields, including [[linguistics]], [[robotics]], [[computer vision]], and [[artificial intelligence]].
 
==See also==
*[[Restricted Boltzmann machine]]
*[[Markov Random Field]]
*[[Ising Model]]
*[[Hopfield network]]
* Learning rule<ref>{{cite conference
| first = C.-Y.
| last = Liou
| authorlink =
| coauthors = Lin, S.-L.
| title = The other variant Boltzmann machine
| booktitle = International Joint Conference on Neural Networks
| pages = 449-454
| publisher = IEEE
| date = 1989
| location = Washington, DC, USA
| url = http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=118618&isnumber=3401&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel2%2F842%2F3401%2F00118618.pdf%3Ftp%3D%26arnumber%3D118618%26isnumber%3D3401
| doi = 10.1109/IJCNN.1989.118618}}</ref> that uses conditional "local" information can be derived from the reversed form of <math>G</math>,
 
:<math>G' = \sum_{v}{P^{-}(v)\ln\left({\frac{P^{-}(v)}{P^{+}(v)}}\right)}</math>.
 
==References==
{{reflist}}
 
==Further reading==
* {{cite journal
|last1=Hinton |first1=G. E. |authorlink1=Geoffrey Hinton
|last2=Sejnowski|first2=T. J. |authorlink2=Terry Sejnowski
|year=1986
|title=Learning and Relearning in Boltzmann Machines
|editors=D. E. Rumelhart, J. L. McClelland, and the PDP Research Group
|journal=Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations
|pages=282–317 |location=Cambridge |publisher=MIT Press
|url=http://learning.cs.toronto.edu/~hinton/absps/pdp7.pdf
}}
* {{cite journal
|doi=10.1162/089976602760128018
|last1=Hinton |first1=G. E. |authorlink1=Geoffrey Hinton
|year=2002
|title=Training Products of Experts by Minimizing Contrastive Divergence
|journal=[[Neural Computation]]
|volume=14
|issue=8 |pages=1771–1800
|url=http://www.cs.toronto.edu/~hinton/absps/nccd.pdf
|pmid=12180402
}}
* {{cite journal
|doi=10.1162/neco.2006.18.7.1527
|last1=Hinton |first1=G. E. |authorlink1=Geoffrey Hinton
|last2=Osindero |first2=S.
|last3=Teh |first3=Y.
|year=2006
|title=A fast learning algorithm for deep belief nets
|journal=[[Neural Computation]]
|volume=18
|issue=7 |pages=1527–1554
|url=http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf
|pmid=16764513
}}
 
== External links ==
*[http://www.scholarpedia.org/article/Boltzmann_Machine Scholarpedia article by Hinton about Boltzmann machines]
*[http://youtube.com/watch?v=AyzOUbkUf3M Talk at Google by Geoffrey Hinton]
 
[[Category:Neural networks]]

Revision as of 15:37, 22 February 2014
