{{For|the video game|Perplexity (video game)}}{{For|the card/alternative reality game|Perplex City}}
 
{{Wiktionarypar|perplexity}}
 
In [[information theory]], '''perplexity''' is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models.
 
== Perplexity of a probability distribution ==
 
The perplexity of a discrete [[probability distribution]] ''p'' is defined as
 
:<math>2^{H(p)}=2^{-\sum_x p(x)\log_2 p(x)}</math>
 
where ''H''(''p'') is the entropy of the distribution, measured in bits, and ''x'' ranges over events.
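A minimal Python sketch of this definition follows. It assumes the distribution is given as a dict mapping each event to its probability; the names <code>entropy_bits</code> and <code>perplexity</code> are hypothetical, not from any library. Zero-probability events are skipped, following the usual convention that 0 log 0 = 0.

```python
import math

def entropy_bits(p):
    """Shannon entropy H(p) in bits; p maps each event x to p(x)."""
    # Terms with p(x) == 0 are skipped (0 * log 0 is taken as 0).
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def perplexity(p):
    """Perplexity 2**H(p) of a discrete distribution."""
    return 2 ** entropy_bits(p)

# A biased coin: H = -(0.75*log2 0.75 + 0.25*log2 0.25) ≈ 0.811 bits.
coin = {"heads": 0.75, "tails": 0.25}
print(round(perplexity(coin), 3))  # prints 1.755
```

A perplexity of about 1.75 says the biased coin is less uncertain than a fair coin, whose perplexity is 2.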
 
Perplexity of a [[random variable]] ''X'' may be defined as the perplexity of the distribution over its possible values ''x''.
 
In the special case where ''p'' models a fair ''k''-sided die (a uniform distribution over ''k'' discrete events), its perplexity is ''k''. A random variable with perplexity ''k'' has the same uncertainty as a fair ''k''-sided die, and one is said to be "''k''-ways perplexed" about the value of the random variable. (Unless the random variable is itself distributed like a fair ''k''-sided die, more than ''k'' of its values will have nonzero probability, but the overall uncertainty is no greater, because some of these values must then have probability greater than 1/''k'', which lowers the entropy.)
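Both claims in this paragraph can be checked numerically. The sketch below restates the perplexity definition so it stands alone, here taking a plain list of probabilities. The second distribution is an illustrative choice, not from the source: one value with probability 1/2 plus sixteen values with probability 1/32 each. It has 17 possible values, yet its entropy is 0.5 + 16 × 5/32 = 3 bits, so its perplexity is exactly 8, and its most likely value has probability 1/2 > 1/8.

```python
import math

def perplexity(probs):
    """Perplexity 2**H of a discrete distribution given as a list of probabilities."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** h

# A fair k-sided die has perplexity k.
k = 6
fair_die = [1 / k] * k
print(perplexity(fair_die))  # ≈ 6, up to floating-point rounding

# 17 possible values, but entropy 0.5 + 16 * (5/32) = 3 bits, so perplexity 8.
skewed = [1 / 2] + [1 / 32] * 16
print(len(skewed), perplexity(skewed), max(skewed))  # 17 8.0 0.5
```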
 
== Perplexity of a probability model ==
 
A model of an unknown probability distribution ''p'' may be proposed based on a training sample that was drawn from ''p''. Given a proposed probability model ''q'', one may evaluate ''q'' by asking how well it predicts a separate test sample ''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x''<sub>''N''</sub> also drawn from ''p''. The perplexity of the model ''q'' is defined as
 
:<math>2^{-\sum_{i=1}^N \frac{1}{N} \log_2 q(x_i)}</math>
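A minimal sketch of this formula, assuming the model ''q'' is supplied as a Python callable that returns ''q''(''x'') > 0 for every test event. The names <code>model_perplexity</code> and <code>toy_q</code>, and the three-symbol toy model, are illustrative only.

```python
import math

def model_perplexity(q, test_sample):
    """2 ** (-(1/N) * sum_i log2 q(x_i)) over the N events in test_sample."""
    n = len(test_sample)
    avg_log2 = sum(math.log2(q(x)) for x in test_sample) / n
    return 2 ** (-avg_log2)

# Hypothetical toy model over three symbols.
PROBS = {"a": 0.5, "b": 0.25, "c": 0.25}

def toy_q(x):
    return PROBS[x]

test = ["a", "a", "b", "c"]
print(model_perplexity(toy_q, test))  # ≈ 2.828, i.e. 2**1.5 (1.5 bits per event)
```

If ''q'' assigns probability zero to some test event, <code>math.log2</code> raises <code>ValueError</code>, reflecting an infinite perplexity; practical models are usually smoothed so that this cannot happen.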
 
Better models ''q'' of the unknown distribution ''p'' will tend to assign higher probabilities ''q''(''x<sub>i</sub>'') to the test events.  Thus, they have lower perplexity: they are less surprised by the test sample.
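To illustrate this, the sketch below scores two hypothetical models on the same test sample, in which "a" is the most frequent event. One model is uniform, and the other happens to put more mass on "a". The helper restates the perplexity formula defined above, with the model given as a dict. The sample and both models are made up for illustration.

```python
import math

def model_perplexity(q, test_sample):
    """Perplexity of model q (a dict of probabilities) on a test sample."""
    avg_bits = -sum(math.log2(q[x]) for x in test_sample) / len(test_sample)
    return 2 ** avg_bits

test = ["a", "a", "a", "b", "a", "c", "a", "b"]  # mostly "a"

uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
skewed = {"a": 0.6, "b": 0.3, "c": 0.1}

# The skewed model gives the frequent event "a" a higher probability,
# so it is less surprised by the sample and scores a lower perplexity.
print(model_perplexity(uniform, test))  # ≈ 3
print(model_perplexity(skewed, test))   # ≈ 2.48
```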
 
The exponent above may be regarded as the average number of bits needed to represent a test event ''x<sub>i</sub>'' if one uses an optimal code based on ''q''.  Low-perplexity models do a better job of compressing the test sample, requiring few bits per test element on average because ''q''(''x<sub>i</sub>'') tends to be high.
 
The exponent may also be regarded as a [[cross-entropy]],
 
:<math>H(\tilde{p},q) = -\sum_x \tilde{p}(x) \log_2 q(x)</math>
 
where <math>\tilde{p}</math> denotes the empirical distribution of the test sample (i.e., <math>\tilde{p}(x) = n/N</math> if ''x'' appeared ''n'' times in the test sample of size ''N'').
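The equality between the exponent and the cross-entropy can be checked with a short sketch. It builds the empirical distribution from counts using <code>collections.Counter</code>. The model ''q'', the sample, and both function names are illustrative assumptions.

```python
import math
from collections import Counter

def avg_bits_per_event(q, sample):
    """The exponent: -(1/N) * sum_i log2 q(x_i), summed over test events."""
    return -sum(math.log2(q[x]) for x in sample) / len(sample)

def cross_entropy(q, sample):
    """H(p~, q) = -sum_x p~(x) log2 q(x), with p~(x) = n/N from the counts."""
    n_total = len(sample)
    counts = Counter(sample)
    return -sum((n / n_total) * math.log2(q[x]) for x, n in counts.items())

q = {"a": 0.5, "b": 0.3, "c": 0.2}
sample = ["a", "b", "a", "c", "a", "b"]
print(avg_bits_per_event(q, sample), cross_entropy(q, sample))  # both ≈ 1.466 bits
```

Grouping repeated events by their count is the only difference between the two sums, so they agree up to floating-point rounding.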
 
== Perplexity per word ==
 
In [[natural language processing]], perplexity is a way of evaluating [[language model]]s.  A language model is a probability distribution over entire sentences or texts. 
 
Using the definition of perplexity for a probability model, one might find, for example, that the average sentence ''x<sub>i</sub>'' in the test sample could be coded in 190 bits (i.e., the test sentences had an average base-2 log-probability of -190). This would give an enormous model perplexity of 2<sup>190</sup> per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample's sentences comprised a total of 1,000 words and could be coded using a total of 7,950 bits, one could report a model perplexity of 2<sup>7.95</sup> ≈ 247 ''per word''. In other words, the model is as confused on test data as if it had to choose uniformly and independently among 247 possibilities for each word.
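The arithmetic in this paragraph can be reproduced directly. The function <code>per_word_perplexity</code> is a hypothetical helper that turns a total code length in bits (equivalently, a total negative base-2 log-probability) into a per-word perplexity.

```python
def per_word_perplexity(total_bits, num_words):
    """Perplexity per word: 2 ** (total bits / number of words)."""
    return 2 ** (total_bits / num_words)

# Figures from the text: 7,950 bits for 1,000 words of test data.
bits_per_word = 7950 / 1000  # 7.95
print(per_word_perplexity(7950, 1000))  # ≈ 247.3

# Without length normalization: 190 bits per sentence gives 2**190 per sentence.
print(f"{2.0 ** 190:.3e}")  # ≈ 1.57e+57
```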
 
As of 1992, the lowest perplexity published on the [[Brown Corpus]] (1 million words of American [[English language|English]] of varying topics and genres) was indeed about 247 per word, corresponding to a cross-entropy of log<sub>2</sub>247 ≈ 7.95 bits per word or 1.75 bits per letter, obtained with a [[N-gram|trigram]] model.<ref>{{cite journal |last=Brown |first=Peter F. |coauthors=et al. |date=March 1992 |title=An Estimate of an Upper Bound for the Entropy of English |journal=Computational Linguistics |volume=18 |issue=1 |url=http://acl.ldc.upenn.edu/J/J92/J92-1002.pdf |accessdate=2007-02-07}}</ref> It is often possible to achieve lower perplexity on more specialized [[text corpus|corpora]], because they are more predictable.
 
==References==
<references />
 
[[Category:Entropy and information]]
