{{For|the video game|Perplexity (video game)}}{{For|the card/alternative reality game|Perplex City}}
{{Wiktionarypar|perplexity}}

In [[information theory]], '''perplexity''' is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models.

== Perplexity of a probability distribution ==
The perplexity of a discrete [[probability distribution]] ''p'' is defined as

:<math>2^{H(p)}=2^{-\sum_x p(x)\log_2 p(x)}</math>

where ''H''(''p'') is the [[entropy (information theory)|entropy]] of the distribution and ''x'' ranges over events.

The perplexity of a [[random variable]] ''X'' may be defined as the perplexity of the distribution over its possible values ''x''.

In the special case where ''p'' models a fair ''k''-sided die (a uniform distribution over ''k'' discrete events), its perplexity is ''k''. A random variable with perplexity ''k'' has the same uncertainty as a fair ''k''-sided die, and one is said to be "''k''-ways perplexed" about the value of the random variable. (Unless the distribution is uniform, more than ''k'' values will have nonzero probability, but the overall uncertainty is no greater, because some of those values have probability greater than 1/''k''.)

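For illustration, the following short sketch (in Python; the function and example distributions are chosen here for the example and are not from any standard library) computes the perplexity of a discrete distribution directly from the definition above:

<syntaxhighlight lang="python">
import math

def perplexity(probs):
    """Perplexity 2**H(p) of a discrete distribution, given its probabilities."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)  # H(p) in bits
    return 2 ** entropy

print(perplexity([1/6] * 6))        # fair 6-sided die: exactly 6 (up to rounding)
print(perplexity([0.7, 0.2, 0.1]))  # biased 3-event distribution: about 2.23
</syntaxhighlight>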
== Perplexity of a probability model ==

A model of an unknown probability distribution ''p'' may be proposed based on a training sample that was drawn from ''p''. Given a proposed probability model ''q'', one may evaluate ''q'' by asking how well it predicts a separate test sample ''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x<sub>N</sub>'' also drawn from ''p''. The perplexity of the model ''q'' is defined as

:<math>2^{-\frac{1}{N}\sum_{i=1}^N \log_2 q(x_i)}</math>

Better models ''q'' of the unknown distribution ''p'' will tend to assign higher probabilities ''q''(''x<sub>i</sub>'') to the test events. Thus, they have lower perplexity: they are less surprised by the test sample.

The exponent above may be regarded as the average number of bits needed to represent a test event ''x<sub>i</sub>'' if one uses an optimal code based on ''q''. Low-perplexity models do a better job of compressing the test sample, requiring few bits per test element on average because ''q''(''x<sub>i</sub>'') tends to be high.

The exponent may also be regarded as a [[cross-entropy]],

:<math>H(\tilde{p},q) = -\sum_x \tilde{p}(x) \log_2 q(x)</math>

where <math>\tilde{p}</math> denotes the empirical distribution of the test sample (i.e., <math>\tilde{p}(x) = n/N</math> if ''x'' appeared ''n'' times in the test sample of size ''N'').

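As a minimal illustration of this definition (the model ''q'' and the test sample below are invented for the example), the perplexity of a model can be computed by averaging the log-probabilities it assigns to the test events:

<syntaxhighlight lang="python">
import math

def model_perplexity(q, test_sample):
    """Perplexity of model q (a dict mapping events to probabilities) on a test sample."""
    # Cross-entropy: average number of bits per event under an optimal code based on q
    cross_entropy = -sum(math.log2(q[x]) for x in test_sample) / len(test_sample)
    return 2 ** cross_entropy

q = {"heads": 0.7, "tails": 0.3}               # proposed model of a coin
sample = ["heads", "heads", "tails", "heads"]  # test sample drawn from the unknown p
print(model_perplexity(q, sample))             # about 1.77
</syntaxhighlight>

A model that assigned probability 1/2 to both outcomes would have perplexity exactly 2 on any test sample; here the skewed model is less surprised by the test sample and therefore scores lower.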
== Perplexity per word ==

In [[natural language processing]], perplexity is a way of evaluating [[language model]]s. A language model is a probability distribution over entire sentences or texts.

Using the definition of perplexity for a probability model, one might find, for example, that the average sentence ''x<sub>i</sub>'' in the test sample could be coded in 190 bits (i.e., the test sentences had an average log-probability of −190). This would give an enormous model perplexity of 2<sup>190</sup> per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample's sentences comprised a total of 1,000 words and could be coded using a total of 7,950 bits, one could report a model perplexity of 2<sup>7.95</sup> = 247 ''per word''. In other words, the model is as confused on the test data as if it had to choose uniformly and independently among 247 possibilities for each word.

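Concretely, per-word perplexity is obtained by averaging the number of bits per word over the test text, as in this sketch (illustrative Python; the per-word log-probabilities are made up to match the figures above):

<syntaxhighlight lang="python">
def per_word_perplexity(log2_probs):
    """Per-word perplexity from the log2-probability of each word of the test text."""
    bits_per_word = -sum(log2_probs) / len(log2_probs)  # average bits per word
    return 2 ** bits_per_word

# 1,000 words coded in 7,950 bits total, i.e. 7.95 bits per word on average:
log2_probs = [-7.95] * 1000  # hypothetical per-word log-probabilities
print(per_word_perplexity(log2_probs))  # about 247
</syntaxhighlight>
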
The lowest perplexity that has been published on the [[Brown Corpus]] (1 million words of American [[English language|English]] of varying topics and genres) as of 1992 is indeed about 247 per word, corresponding to a cross-entropy of log<sub>2</sub> 247 = 7.95 bits per word, or 1.75 bits per letter,<ref>{{cite journal |last=Brown |first=Peter F. |coauthors=et al. |date=March 1992 |title=An Estimate of an Upper Bound for the Entropy of English |journal=Computational Linguistics |volume=18 |issue=1 |url=http://acl.ldc.upenn.edu/J/J92/J92-1002.pdf |accessdate=2007-02-07}}</ref> using a [[N-gram|trigram]] model. It is often possible to achieve lower perplexity on more specialized [[text corpus|corpora]], as they are more predictable.
== References ==
<references />

[[Category:Entropy and information]]