| In [[statistics]], a '''rank correlation''' is any of several statistics that measure the relationship between [[ranking]]s of different [[ordinal data|ordinal]] variables or different rankings of the same variable, where a "ranking" is the assignment of the labels "first", "second", "third", etc. to different observations of a particular variable. A '''rank correlation coefficient''' measures the degree of similarity between two rankings, and can be used to assess the [[Statistical significance|significance]] of the relation between them.
| | |
| ==Context==
| |
| | |
| If, for example, one variable is the identity of a college basketball program and another variable is the identity of a college football program, one could test for a relationship between the poll rankings of the two types of program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank correlation coefficient can measure that relationship, and the measure of significance of the rank correlation coefficient can show whether the measured relationship is small enough to likely be a coincidence.
| |
| | |
| If there is only one variable, the identity of a college football program, but it is subject to two different poll rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings can be measured with a rank correlation coefficient.
| |
| | |
| ==Correlation coefficients==
| |
| | |
| Some of the more popular rank [[correlation]] statistics include
| |
| # [[Spearman's rank correlation coefficient|Spearman's ρ]]
| |
| # [[Kendall's tau rank correlation coefficient|Kendall's τ]]
| |
| # [[Goodman and Kruskal's gamma|Goodman and Kruskal's γ]]
| |
| | |
| An increasing rank correlation [[coefficient]] implies increasing agreement between rankings. The coefficient is inside the interval [−1, 1] and assumes the value:
| |
| | |
| * 1 if the agreement between the two rankings is perfect; the two rankings are the same.
| |
| * 0 if the rankings are completely independent.
| |
| * −1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.
| |
| | |
| Following {{harvtxt|Diaconis|1988}}, a ranking can be seen as a [[permutation]] of a [[set (mathematics)|set]] of objects. Thus we can look at observed rankings as data obtained when the sample space is (identified with) a [[symmetric group]]. We can then introduce a [[Metric (mathematics)|metric]], making the symmetric group into a [[metric space]]. Different metrics will correspond to different rank correlations.
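The link between metrics on permutations and rank correlations can be sketched in Python. The example below is only an illustration under stated assumptions: rankings are tuples of untied ranks 1..''n'', and the helper names (<code>kendall_distance</code>, <code>squared_rank_distance</code>, and the two conversion functions) are hypothetical, not taken from any library. The Kendall distance counts pairs ordered differently by the two rankings, and the Euclidean distance is the square root of the sum of squared rank differences; rescaling each distance gives Kendall's τ and Spearman's ρ respectively.

```python
import math
from fractions import Fraction
from itertools import combinations


def kendall_distance(p, q):
    """Number of pairs of objects that rankings p and q order differently."""
    return sum(
        1
        for i, j in combinations(range(len(p)), 2)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )


def squared_rank_distance(p, q):
    """Sum of squared rank differences (square of the Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def euclidean_distance(p, q):
    """Euclidean distance between the two rank vectors."""
    return math.sqrt(squared_rank_distance(p, q))


def tau_from_kendall_distance(d, n):
    """Kendall's tau for untied rankings: 1 - 4d / (n(n-1))."""
    return 1 - Fraction(4 * d, n * (n - 1))


def rho_from_squared_distance(d2, n):
    """Spearman's rho for untied rankings: 1 - 6 d^2 / (n^3 - n)."""
    return 1 - Fraction(6 * d2, n**3 - n)


p = (1, 2, 3, 4)
q = (2, 1, 4, 3)
n = len(p)

d_kendall = kendall_distance(p, q)
d_squared = squared_rank_distance(p, q)
tau = tau_from_kendall_distance(d_kendall, n)
rho = rho_from_squared_distance(d_squared, n)
print(d_kendall, tau, d_squared, rho)
```

Both distances are zero exactly when the two rankings coincide and are largest when one ranking is the reverse of the other, which is why the rescaled coefficients run from 1 down to −1.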
| |
| | |
| ==General correlation coefficient==
| |
| {{harvtxt|Kendall|1970}} showed that his tau and Spearman's rho are particular cases of a general correlation coefficient.
| |
| | |
| Suppose we have a set of <math>n</math> objects, which are being considered in relation to two properties, represented by <math>x</math> and <math>y</math>, forming the sets of values <math>\{x_i\}_{i\le n}</math> and <math>\{y_i\}_{i\le n}</math>. To any pair of individuals, say the <math>i</math>-th and the <math>j</math>-th, we assign an <math>x</math>-score, denoted by <math>a_{ij}</math>, and a <math>y</math>-score, denoted by <math>b_{ij}</math>. The only requirement on these functions is antisymmetry, so <math>a_{ij}=-a_{ji}</math> and <math>b_{ij}=-b_{ji}</math> (and in particular <math>a_{ii}=b_{ii}=0</math>). Then the generalised correlation coefficient <math>\Gamma</math> is defined by
| |
| | |
| : <math>\Gamma = \frac{\sum_{i,j = 1}^n a_{ij}b_{ij}}{\sqrt{\sum_{i,j = 1}^n a_{ij}^2 \sum_{i,j = 1}^n b_{ij}^2}} </math>
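The definition above translates directly into code. The following is a minimal Python sketch, assuming the scores are supplied as ''n'' × ''n'' nested lists; the names <code>general_correlation</code> and <code>pairwise_scores</code> are hypothetical and chosen only for this illustration.

```python
import math


def pairwise_scores(values, score):
    """Build the matrix of scores score(values[i], values[j]) for all i, j."""
    n = len(values)
    return [[score(values[i], values[j]) for j in range(n)] for i in range(n)]


def is_antisymmetric(m):
    """Check the only requirement on the scores: m[i][j] == -m[j][i]."""
    n = len(m)
    return all(m[i][j] == -m[j][i] for i in range(n) for j in range(n))


def general_correlation(a, b):
    """Generalised correlation coefficient Gamma for score matrices a and b."""
    if not (is_antisymmetric(a) and is_antisymmetric(b)):
        raise ValueError("score matrices must be antisymmetric")
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    numerator = sum(a[i][j] * b[i][j] for i, j in pairs)
    sum_a2 = sum(a[i][j] ** 2 for i, j in pairs)
    sum_b2 = sum(b[i][j] ** 2 for i, j in pairs)
    denominator = math.sqrt(sum_a2 * sum_b2)
    if denominator == 0:
        raise ValueError("all scores are zero; Gamma is undefined")
    return numerator / denominator


# Illustrative data: difference scores built from arbitrary values.
x = [3, 1, 4, 2]
a = pairwise_scores(x, lambda u, v: v - u)
a_reversed = [[-value for value in row] for row in a]

gamma_same = general_correlation(a, a)               # identical scores
gamma_opposite = general_correlation(a, a_reversed)  # opposite scores
print(gamma_same, gamma_opposite)
```

Passing the same score matrix twice gives Γ = 1, and negating one matrix gives Γ = −1, matching the extreme values of a rank correlation coefficient.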
| |
| | |
| ===Kendall's <math>\tau</math> as a particular case===
| |
| If <math>r_i</math> is the rank of the <math>i</math>-th member according to the <math>x</math>-quality, we can define
| |
| | |
| : <math>a_{ij} = \sgn(r_j-r_i) </math>
| | |
| and similarly for <math>b</math>. The sum <math>\sum a_{ij}b_{ij} </math> is twice the difference between the number of concordant pairs and the number of discordant pairs (see [[Kendall tau rank correlation coefficient]]), because each unordered pair is counted once as <math>(i,j)</math> and once as <math>(j,i)</math>. In the absence of ties, the sum <math>\sum a_{ij}^2</math> is just the number of terms <math>a_{ij}</math> with <math>i \ne j</math>, equal to <math>n(n-1)</math>, and likewise for <math>\sum b_{ij}^2</math>. It follows that <math>\Gamma</math> is equal to Kendall's <math>\tau</math> coefficient.
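The argument above can be checked numerically. The sketch below assumes untied rankings given as Python lists and uses exact fractions; the function names <code>gamma_sign_scores</code> and <code>kendall_tau</code> are hypothetical, and the sample rankings are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def sgn(value):
    """Sign function returning -1, 0 or 1."""
    return (value > 0) - (value < 0)


def gamma_sign_scores(r, s):
    """Gamma with a_ij = sgn(r_j - r_i) and b_ij = sgn(s_j - s_i)."""
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    numerator = sum(sgn(r[j] - r[i]) * sgn(s[j] - s[i]) for i, j in pairs)
    sum_a2 = sum(sgn(r[j] - r[i]) ** 2 for i, j in pairs)
    sum_b2 = sum(sgn(s[j] - s[i]) ** 2 for i, j in pairs)
    # Without ties both sums equal n(n-1), so the square root is exact.
    if not (sum_a2 == sum_b2 == n * (n - 1)):
        raise ValueError("this sketch assumes rankings without ties")
    return Fraction(numerator, n * (n - 1))


def kendall_tau(r, s):
    """Kendall's tau as (concordant - discordant) / (n(n-1)/2)."""
    n = len(r)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (r[j] - r[i]) * (s[j] - s[i])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return Fraction(concordant - discordant, n * (n - 1) // 2)


r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]
print(gamma_sign_scores(r, s), kendall_tau(r, s))
```

For these rankings there are 8 concordant and 2 discordant pairs, so both functions return 3/5.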
| |
| | |
| ===Spearman's <math>\rho</math> as a particular case===
| |
| If <math>r_i</math>, <math>s_i</math> are the ranks of the <math>i</math>-th member according to the <math>x</math>-quality and the <math>y</math>-quality respectively, we can simply define
| |
| : <math>a_{ij} = r_j-r_i </math>
| |
| : <math>b_{ij} = s_j-s_i </math>
| |
| | |
| The sums <math>\sum a_{ij}^2</math> and <math>\sum b_{ij}^2</math> are equal, since both <math>r_i</math> and <math>s_i</math> range from <math>1</math> to <math>n</math>. Then we have:
| |
| : <math>\Gamma = \frac{\sum (r_j-r_i)(s_j-s_i)}{\sum(r_j-r_i)^2} </math>
| |
| | |
| Now
| |
| : <math>\sum_{i,j = 1}^n (r_j-r_i)(s_j-s_i)= \sum_{i=1}^n \sum_{j=1}^n r_is_i + \sum_{i=1}^n \sum_{j=1}^n r_js_j - \sum_{i=1}^n \sum_{j=1}^n (r_is_j+r_js_i) </math>
| |
| : <math>=2n\sum_{i=1}^n r_is_i - 2 \sum_{i=1}^n r_i \sum_{j=1}^n s_j </math>
| |
| : <math>=2n\sum_{i=1}^n r_is_i - \frac12 n^2(n+1)^2 </math>
| |
| since <math>\sum r_i</math> and <math>\sum s_j</math> are both equal to the sum of the first <math>n</math> natural numbers, namely <math>\frac12n(n+1)</math>.
| |
| | |
| We also have
| |
| : <math>S = \sum_{i=1}^n (r_i-s_i)^2 = 2 \sum r_i^2 - 2\sum r_is_i </math>
| |
| and hence
| |
| : <math>\sum(r_j-r_i)(s_j-s_i) = 2n\sum r_i^2 - \frac12n^2(n+1)^2 - nS </math>
| |
| | |
| Since <math>\sum r_i^2</math> is the sum of the squares of the first <math>n</math> natural numbers, it equals <math>\frac16n(n+1)(2n+1)</math>. Thus, the last equation reduces to
| |
| : <math>\sum(r_j-r_i)(s_j-s_i) = \frac16n^2(n^2-1) - nS </math>
| |
| | |
| Further
| |
| : <math>\sum(r_j-r_i)^2 = 2n\sum r_i^2-2\sum r_ir_j </math>
| |
| : <math>= 2n\sum r_i^2-2(\sum r_i)^2 = \frac16n^2(n^2-1)</math>
| |
| | |
| and thus, substituting these results into the original formula, we get
| |
| : <math>\Gamma = 1-\frac{6S}{n^3-n}</math>
| |
| | |
| which is exactly the [[Spearman's rank correlation coefficient]] <math>\rho</math>.
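The derivation can be verified numerically. The following Python sketch assumes untied rankings that are permutations of 1..''n'' and uses exact fractions; the function names <code>gamma_difference_scores</code> and <code>spearman_rho_formula</code> are hypothetical, and the sample rankings are illustrative only.

```python
from fractions import Fraction


def check_untied_ranks(r):
    """Raise if r is not a permutation of 1..n."""
    if sorted(r) != list(range(1, len(r) + 1)):
        raise ValueError("expected a permutation of the ranks 1..n")


def gamma_difference_scores(r, s):
    """Gamma with a_ij = r_j - r_i and b_ij = s_j - s_i."""
    check_untied_ranks(r)
    check_untied_ranks(s)
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    numerator = sum((r[j] - r[i]) * (s[j] - s[i]) for i, j in pairs)
    # For untied ranks sum(a^2) == sum(b^2), so sqrt(sum_a2 * sum_b2) == sum_a2.
    sum_a2 = sum((r[j] - r[i]) ** 2 for i, j in pairs)
    return Fraction(numerator, sum_a2)


def spearman_rho_formula(r, s):
    """Spearman's rho as 1 - 6S / (n^3 - n), S = sum of squared rank differences."""
    check_untied_ranks(r)
    check_untied_ranks(s)
    n = len(r)
    S = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - Fraction(6 * S, n**3 - n)


r = [1, 2, 3, 4, 5]
s = [2, 1, 4, 3, 5]
n = len(r)

# Intermediate identity from the derivation: sum (r_j - r_i)^2 = n^2 (n^2 - 1) / 6.
sum_a2 = sum((r[j] - r[i]) ** 2 for i in range(n) for j in range(n))

print(gamma_difference_scores(r, s), spearman_rho_formula(r, s))
```

Here <math>S = 4</math>, so both expressions evaluate to 4/5, and the intermediate sum equals <math>\frac16 n^2(n^2-1) = 100</math>.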
| |
| | |
| ==References==
| |
| *{{citation |last=Everitt |first=B. S. |year=2002 |title=The Cambridge Dictionary of Statistics |location=Cambridge |publisher=Cambridge University Press |isbn=0-521-81099-X}}
| |
| *{{citation |last=Diaconis |first=P. |year=1988 |title=Group Representations in Probability and Statistics |series=Lecture Notes-Monograph Series |location=Hayward, CA |publisher=Institute of Mathematical Statistics |isbn=0-940600-14-5}}
| |
| *{{citation |last=Kendall |first=M. G. | year=1970 | title=Rank Correlation Methods | location=London | publisher=Griffin | isbn=0-85264-199-0}}
| |
| | |
| {{Statistics|descriptive}}
| |
| | |
| [[Category:Covariance and correlation]]
| |
| [[Category:Non-parametric statistics]]
| |
| [[Category:Statistical dependence]]
| |