Energy drift: Difference between revisions

Latest revision as of 22:55, 15 April 2014

@@ Line 1: / Line 1: @@
-The '''Sørensen–Dice index''', also known by other names (see Names, below), is a [[statistic]] used for comparing the similarity of two [[Sample (statistics)|samples]]. It was independently developed by the [[botanist]]s [[Thorvald Sørensen]]<ref>{{cite journal |last=Sørensen |first=T. |year=1948 |title=A method of establishing groups of equal amplitude in [[plant sociology]] based on similarity of species and its application to analyses of the vegetation on Danish commons |journal=[[Kongelige Danske Videnskabernes Selskab]] |volume=5 |issue=4 |pages=1–34 |doi= }}</ref> and [[Lee Raymond Dice]],<ref>{{cite journal |last=Dice |first=Lee R. |title=Measures of the Amount of Ecologic Association Between Species |jstor=1932409 |journal=Ecology |volume=26 |issue=3 |year=1945 |pages=297–302 |doi=10.2307/1932409 }}</ref> who published in 1948 and 1945 respectively.
+Hi there, I am Alyson Pomerleau and I believe it seems quite good when you say it. My wife and I live in Mississippi but now I'm contemplating other choices. My working day job is an invoicing officer but I've already utilized for another 1. What I adore performing is soccer but I don't have the time recently.<br><br>Also visit my web site ... clairvoyants ([http://ustanford.com/index.php?do=/profile-38218/info/ ustanford.com])
-==Name==
-The index is known by several other names, usually '''Sørensen index''' or '''Dice's coefficient'''. Both names are also seen with "similarity coefficient", "index", and other such variations. Common alternate spellings for Sørensen are Sorenson, Soerenson and Sörenson, and all three can also be seen with the –sen ending.
-==Formula==
-Sørensen's original formula was intended to be applied to presence/absence data, and is
-:<math> QS = \frac{2C}{A + B} = \frac{2 |A \cap B|}{|A| + |B|}</math>
-where ''A'' and ''B'' are the number of species in samples A and B, respectively, and ''C'' is the number of species shared by the two samples; QS is the quotient of similarity and ranges from 0 to 1. This expression is easily extended to [[Abundance (ecology)|abundance]] instead of presence/absence of species. This quantitative version of the Sørensen index is also known as the ''[[Jan Czekanowski|Czekanowski]] index''. The Sørensen index is identical to [[Dice's coefficient]],<ref>http://www.sekj.org/PDF/anbf40/anbf40-415.pdf</ref> which always lies in the range [0,&nbsp;1]. The Sørensen index used as a distance measure, 1&nbsp;−&nbsp;''QS'', is identical to [[Bray Curtis dissimilarity]]<ref>{{cite journal |first=J. Roger |last=Bray |first2=J. T. |last2=Curtis |year=1957 |title=An Ordination of the Upland Forest Communities of Southern Wisconsin |journal=Ecological Monographs |volume=27 |issue=4 |pages=326–349 |doi=10.2307/1942268 }}</ref> when applied to quantitative data.
-It can be viewed as a similarity measure over sets:
-:<math>s = \frac{2 | X \cap Y |}{| X | + | Y |} </math>
-It is not very different in form from the [[Jaccard index]] but has some different properties.
-The function ranges between zero and one, like Jaccard. Unlike Jaccard, the corresponding difference function
-:<math>d = 1 -  \frac{2 | X \cap Y |}{| X | + | Y |} </math>
-is not a proper distance metric because it does not satisfy the triangle inequality. The simplest counterexample is given by the three sets {a}, {b}, and {a,b}: the distance between the first two is 1, while the distance between the third and each of the others is one-third, so the direct distance (1) exceeds the sum of the two indirect distances (two-thirds).
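The counterexample can be checked numerically. The following is a minimal Python sketch, not part of the article; the function name <code>dice_distance</code> is chosen here for illustration, and exact fractions are used so that one-third is not rounded.

```python
from fractions import Fraction


def dice_distance(x, y):
    """Return 1 - 2|X ∩ Y| / (|X| + |Y|) as an exact fraction.

    Assumes at least one of the two sets is non-empty.
    """
    return 1 - Fraction(2 * len(x & y), len(x) + len(y))


a, b, ab = {"a"}, {"b"}, {"a", "b"}

d_ab = dice_distance(a, b)      # distance between {a} and {b}
d_a_ab = dice_distance(a, ab)   # distance between {a} and {a,b}
d_b_ab = dice_distance(b, ab)   # distance between {b} and {a,b}

# The triangle inequality would require d_ab <= d_a_ab + d_b_ab.
triangle_holds = d_ab <= d_a_ab + d_b_ab
print(d_ab, d_a_ab, d_b_ab, triangle_holds)
```

Going from {a} to {b} through {a,b} costs two-thirds in total, less than the direct distance of 1, which is exactly what a metric forbids.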
-Similarly to Jaccard, the set operations can be expressed in terms of vector operations over binary vectors ''A'' and ''B'':
-:<math>s_v = \frac{2 | A \cdot B |}{| A |^2 + | B |^2} </math>
-which gives the same outcome over binary vectors and also gives a more general similarity metric over vectors in general terms.
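As a rough illustration of the vector form, the sketch below (plain Python, with the hypothetical helper names <code>dice_vector</code> and <code>dice_sets</code>) encodes two sets as binary indicator vectors over an assumed four-element universe and compares the result with the set formula.

```python
def dice_vector(a, b):
    """2|A·B| / (|A|^2 + |B|^2) for equal-length numeric vectors.

    Assumes the vectors are not both all-zero (the denominator would be 0).
    """
    dot = sum(p * q for p, q in zip(a, b))
    norm_a_sq = sum(p * p for p in a)
    norm_b_sq = sum(q * q for q in b)
    return 2 * abs(dot) / (norm_a_sq + norm_b_sq)


def dice_sets(x, y):
    """Set form of the same coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    return 2 * len(x & y) / (len(x) + len(y))


# Illustrative universe (an assumption of this sketch): a, b, c, d
universe = ["a", "b", "c", "d"]
x, y = {"a", "b", "c"}, {"b", "c", "d"}

# Binary indicator vectors: 1 if the element is in the set, else 0
vec_x = [1 if e in x else 0 for e in universe]
vec_y = [1 if e in y else 0 for e in universe]

s_vec = dice_vector(vec_x, vec_y)
s_set = dice_sets(x, y)
print(s_vec, s_set)
```

For binary vectors the dot product counts shared elements and each squared norm counts the elements of one set, which is why the two forms agree.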
-For sets ''X'' and ''Y'' of keywords used in [[information retrieval]], the coefficient may be defined as twice the shared information (intersection) over the sum of cardinalities.<ref>{{cite book |last=van Rijsbergen |first=Cornelis Joost |year=1979
-|title=Information Retrieval
-|url=http://www.dcs.gla.ac.uk/Keith/Preface.html |publisher=Butterworths |location=London |isbn=3-642-12274-4 }}</ref>
-When taken as a string similarity measure, the coefficient may be calculated for two strings, ''x'' and ''y'' using [[bigram]]s as follows:<ref>{{cite conference |last=Kondrak |first=Grzegorz |coauthors=Marcu, Daniel; and Knight, Kevin |year=2003
-|title=Cognates Can Improve Statistical Translation Models
-|booktitle=Proceedings of HLT-NAACL 2003: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
-|pages=46–48 |url=http://aclweb.org/anthology/N/N03/N03-2016.pdf}}</ref>
-:<math>s = \frac{2 n_t}{n_x + n_y}</math>
-where ''n''<sub>''t''</sub> is the number of character bigrams found in both strings, ''n''<sub>''x''</sub> is the number of bigrams in string ''x'' and ''n''<sub>''y''</sub> is the number of bigrams in string ''y''. For example, to calculate the similarity between:
-:<code>night</code>
-:<code>nacht</code>
-We would find the set of bigrams in each word:
-:{<code>ni</code>,<code>ig</code>,<code>gh</code>,<code>ht</code>}
-:{<code>na</code>,<code>ac</code>,<code>ch</code>,<code>ht</code>}
-Each set has four elements, and the intersection of these two sets has only one element: <code>ht</code>.
-Inserting these numbers into the formula, we calculate, ''s''&nbsp;=&nbsp;(2&nbsp;·&nbsp;1)&nbsp;/&nbsp;(4&nbsp;+&nbsp;4)&nbsp;=&nbsp;0.25.
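The bigram calculation above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names are hypothetical, bigrams are treated as a set (as in the worked example, so repeated bigrams count once), and the handling of strings shorter than two characters is an assumption made here.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_coefficient(x, y):
    """Sørensen–Dice coefficient of two strings over their bigram sets."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        # Assumption of this sketch: two strings without any bigrams
        # are treated as identical to avoid dividing by zero.
        return 1.0
    return 2 * len(bx & by) / (len(bx) + len(by))


shared = bigrams("night") & bigrams("nacht")
score = dice_coefficient("night", "nacht")
print(shared, score)  # {'ht'} 0.25
```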
-==Applications==
-The Sørensen–Dice coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960<ref>[http://links.jstor.org/sici?sici=0012-9658%28196007%2941%3A3%3C409%3AAOSK%28F%3E2.0.CO%3B2-1 Looman, J. and Campbell, J.B. (1960) Adaptation of Sorensen's K (1948) for estimating unit affinities in prairie vegetation. Ecology 41 (3): 409–416.]</ref>). Justification for its use is primarily empirical rather than theoretical (although it can be justified theoretically as the intersection of two [[fuzzy set]]s<ref>[http://dx.doi.org/10.1007/BF00039905 Roberts, D.W. (1986) Ordination on the basis of fuzzy set theory. Vegetatio 66 (3): 123–131.]</ref>). Compared with [[Euclidean distance]], Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers.<ref>McCune, Bruce & Grace, James (2002) Analysis of Ecological Communities. Mjm Software Design; ISBN 0-9721290-0-6.</ref>
-==See also==
-* [[Correlation]]
-* [[Czekanowski similarity index]]
-* [[Jaccard index]]
-* [[Hamming distance]]
-* [[Horn’s index]]
-* [[Hurlbert’s index]]
-* [[Kulczyński similarity index]]
-* [[Pianka's index]]
-* [[MacArthur and Levin's index]]
-* [[Morisita's overlap index]]
-* [[Overlap coefficient]]
-* [[Renkonen similarity index]] (due to [[Olavi Renkonen]])
-* [[Simplified Morisita’s index]]
-* [[Tversky index]]
-* [[Universal adaptive strategy theory (UAST)]]
-==References==
-{{reflist}}
-==External links==
-{{Wikibooks|Algorithm implementation|Strings/Dice's coefficient|Dice's coefficient}}
-* Open Source [https://github.com/rockymadden/stringmetric/blob/master/core/src/main/scala/com/rockymadden/stringmetric/similarity/DiceSorensenMetric.scala Dice / Sorensen] [[Scala programming language|Scala]] implementation as part of the larger [http://rockymadden.com/stringmetric/ stringmetric project]
-{{DEFAULTSORT:Sorensen-Dice coefficient}}
-[[Category:Information retrieval]]
-[[Category:String similarity measures]]
-[[Category:Measure theory]]
