Sholl analysis: Difference between revisions

The '''iterative proportional fitting procedure''' ('''IPFP''', also known as '''biproportional fitting''' in statistics, '''RAS algorithm'''<ref>{{cite journal
|last=Bacharach|first=M.
|year=1965
|title=Estimating Nonnegative Matrices from Marginal Data
|journal=International Economic Review
|volume=6|pages=294–310
|doi=10.2307/2525582
|jstor=2525582
|issue=3
|publisher=Blackwell Publishing
}}</ref> in economics and '''matrix raking''' or '''matrix scaling''' in computer science) is an [[iterative algorithm]] for estimating cell values of a [[contingency table]] such that the marginal totals remain fixed and the estimated table decomposes into an [[outer product]].
 
First introduced by [[W. Edwards Deming|Deming]] and Stephan in 1940<ref>{{cite journal
|last=Deming |first=W. E.|authorlink=W. Edwards Deming
|last2=Stephan |first2=F. F.
|year=1940
|title=On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known
|journal=[[Annals of Mathematical Statistics]]
|volume=11 |issue=4 |pages=427–444
|mr=3527 |doi=10.1214/aoms/1177731829
}}</ref> (they proposed IPFP as an algorithm leading to a minimizer of the [[Pearson X-squared statistic]], which it ''does not'',<ref>{{cite journal
|last=Stephan |first=F. F.|year=1942
|title=Iterative method of adjusting frequency tables when expected margins are known
|journal=[[Annals of Mathematical Statistics]]
|volume=13 |issue=2 |pages=166–178
|mr=6674 | zbl = 0060.31505 |doi=10.1214/aoms/1177731604
}}</ref> and even failed to prove convergence), it has seen various extensions and related research. A rigorous proof of convergence by means of [[differential geometry]] is due to [[Stephen Fienberg|Fienberg]] (1970).<ref>{{cite journal
|last=Fienberg |first=S. E.|authorlink=Stephen Fienberg
|year=1970
|title=An Iterative Procedure for Estimation in Contingency Tables
|journal=[[Annals of Mathematical Statistics]]
|volume=41 |issue=3 |pages=907–917
|mr=266394 | zbl = 0198.23401 | jstor = 2239244 |doi=10.1214/aoms/1177696968
}}</ref> He interpreted the family of contingency tables of constant crossproduct ratios as a particular (''IJ''&nbsp;&minus;&nbsp;1)-dimensional manifold of constant interaction and showed that the IPFP is a fixed-point iteration on that manifold. Nevertheless, he assumed strictly positive observations. Generalization to tables with zero entries is still considered a hard and only partly solved problem.
 
An exhaustive treatment of the algorithm and its mathematical foundations can be found in the book of Bishop et al. (1975).<ref>{{cite book
|title=Discrete Multivariate Analysis: Theory and Practice
|last=Bishop |first=Y. M. M.
|first2=S. E. |last2=Fienberg |authorlink2=Stephen Fienberg
|first3=P. W. |last3=Holland
|year=1975
|publisher=MIT Press|isbn=978-0-262-02113-5 |mr=381130
}}</ref> The first general proof of convergence, built on non-trivial measure theoretic theorems and entropy minimization, is due to Csiszár (1975).<ref>{{cite journal
|last=Csiszár |first=I.|authorlink=Imre Csiszár
|year=1975
|title=''I''-Divergence of Probability Distributions and Minimization Problems
|journal=Annals of Probability
|volume=3 |issue=1 |pages=146–158
|mr=365798 | zbl = 0318.60013 | jstor = 2959270 |doi=10.1214/aop/1176996454
}}</ref>
More recent results on convergence and error behavior have been published by Pukelsheim and Simeone (2009).<ref>{{cite web |title=On the Iterative Proportional Fitting Procedure: Structure of Accumulation Points and L1-Error Analysis |url=http://opus.bibliothek.uni-augsburg.de/volltexte/2009/1368/ |date= |work= |publisher= Pukelsheim, F. and Simeone, B. |accessdate=2009-06-28}}</ref> They proved simple necessary and sufficient conditions for the convergence of the IPFP for arbitrary two-way tables (that is, including tables with zero entries) by analysing an <math>L_1</math>-error function.
 
Other general algorithms can be modified to yield the same limit as the IPFP, for instance the [[Newton–Raphson method]] and the [[EM algorithm]]. In most cases, IPFP is preferred due to its computational speed, numerical stability and algebraic simplicity.
 
== Algorithm 1 (classical IPFP) ==
 
Given a two-way (''I'' &times; ''J'')-table of counts <math>(x_{ij})</math>, where the cell values are assumed to be Poisson or multinomially distributed, we wish to estimate a decomposition <math>\hat{m}_{ij} = a_i b_j</math> for all ''i'' and ''j'' such that <math>(\hat{m}_{ij})</math> is the [[maximum likelihood]] estimate (MLE) of the expected values <math>(m_{ij})</math> leaving the marginals <math>\textstyle x_{i+} = \sum_j x_{ij}\,</math> and <math>\textstyle x_{+j} = \sum_i x_{ij}\,</math> fixed. The assumption that the table factorizes in such a manner is known as the ''model of independence'' (I-model). In terms of a [[log-linear model]], this assumption reads <math>\log\ m_{ij} = u + v_i + w_j + z_{ij}</math>, where <math>m_{ij} := \mathbb{E}(x_{ij})</math>, <math>\sum_i v_i = \sum_j w_j = 0</math> and the interaction term vanishes, that is <math>z_{ij} = 0</math> for all ''i'' and ''j''.
 
Choose initial values <math>\hat{m}_{ij}^{(0)} := 1</math> (different choices of initial values may lead to changes in convergence behavior), and for <math>\eta \geq 1</math> set
 
: <math>\hat{m}_{ij}^{(2\eta - 1)} = \frac{\hat{m}_{ij}^{(2\eta-2)}x_{i+}}{\sum_{k=1}^J \hat{m}_{ik}^{(2\eta-2)}}</math>
 
: <math>\hat{m}_{ij}^{(2\eta)} = \frac{\hat{m}_{ij}^{(2\eta-1)}x_{+j}}{\sum_{k=1}^I \hat{m}_{kj}^{(2\eta-1)}}.</math>
 
Notes:
* Convergence does not depend on the actual distribution. Distributional assumptions are only needed to infer that the limit <math>(\hat{m}_{ij}) := \lim_{\eta\rightarrow\infty} (\hat{m}^{(\eta)}_{ij})</math> is indeed an MLE.

* IPFP can be manipulated to generate any positive marginals by replacing <math>x_{i+}</math> by the desired row marginal <math>u_i</math> (analogously for the column marginals).
 
* IPFP can be extended to fit the ''model of quasi-independence'' (Q-model), where <math>m_{ij} = 0</math> is known a priori for <math>(i,j)\in S</math>. Only the initial values have to be changed: Set <math>\hat{m}_{ij}^{(0)} = 0</math> if <math>(i,j)\in S</math> and 1 otherwise.
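
The two update formulas above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function name <code>ipfp</code>, its parameters and the fixed number of sweeps are assumptions made for the example, and a real implementation would normally stop on a convergence criterion instead.

```python
def ipfp(row_targets, col_targets, init=None, sweeps=50):
    """Classical IPFP sketch: alternate row and column fitting steps.

    `init` is the starting table (all ones by default). Cells that start
    at 0 stay 0, which is how a quasi-independence pattern can be encoded.
    All names here are hypothetical and chosen for this illustration.
    """
    n_rows, n_cols = len(row_targets), len(col_targets)
    if init is None:
        m = [[1.0] * n_cols for _ in range(n_rows)]
    else:
        m = [[float(v) for v in row] for row in init]

    for _ in range(sweeps):
        # Odd step: rescale each row so that it sums to its target.
        for i in range(n_rows):
            row_sum = sum(m[i])
            if row_sum > 0:
                m[i] = [v * row_targets[i] / row_sum for v in m[i]]
        # Even step: rescale each column so that it sums to its target.
        for j in range(n_cols):
            col_sum = sum(m[i][j] for i in range(n_rows))
            if col_sum > 0:
                for i in range(n_rows):
                    m[i][j] *= col_targets[j] / col_sum
    return m


# One sweep (one row step plus one column step) on the marginals
# (52, 48) and (87, 13) used in the Example section of this article.
fitted = ipfp([52, 48], [87, 13], sweeps=1)
```

Starting from all ones, a single sweep already reproduces the independence fit for these marginals, because a closed-form estimator exists in that case.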
 
== Algorithm 2 (factor estimation) ==
 
Assume the same setting as in the classical IPFP.
Alternatively, we can estimate the row and column factors separately: Choose initial values <math>\hat{b}_j^{(0)} := 1</math>, and for <math>\eta \geq 1</math> set
 
: <math>\hat{a}_i^{(\eta)} = \frac{x_{i+}}{\sum_j \hat{b}_j^{(\eta-1)}},</math>
 
: <math>\hat{b}_j^{(\eta)} = \frac{x_{+j}}{\sum_i \hat{a}_i^{(\eta)}}</math>
 
Setting <math>\hat{m}_{ij}^{(2\eta)} = \hat{a}_i^{(\eta)}\hat{b}_j^{(\eta)}</math>, the two variants of the algorithm are mathematically equivalent (as can be shown by induction on <math>\eta</math>).
 
Notes:
 
* In matrix notation, we can write <math>(\hat{m}_{ij}) = \hat{a}\hat{b}^T</math>, where <math>\hat{a} = (\hat{a}_1,\ldots,\hat{a}_I)^T = \lim_{\eta\rightarrow\infty} \hat{a}^{(\eta)}</math> and <math>\hat{b} = (\hat{b}_1,\ldots,\hat{b}_J)^T = \lim_{\eta\rightarrow\infty} \hat{b}^{(\eta)}</math>.
* The factorization is not unique, since <math>m_{ij} = a_i b_j = (\gamma a_i)(\frac{1}{\gamma}b_j)</math> for all <math>\gamma > 0</math>.
* The factor totals remain constant, i.e. <math>\sum_i \hat{a}_i^{(\eta)} = \sum_i \hat{a}_i^{(1)}</math> for all <math>\eta \geq 1</math> and <math>\sum_j \hat{b}_j^{(\eta)} = \sum_j \hat{b}_j^{(0)}</math> for all <math>\eta \geq 0</math>.
* To fit the Q-model, where <math>m_{ij} = 0</math> a priori for <math>(i,j)\in S</math>, set <math>\delta_{ij} = 0</math> if <math>(i,j)\in S</math> and <math>\delta_{ij} = 1</math> otherwise. Then
 
:: <math>\hat{a}_i^{(\eta)} = \frac{x_{i+}}{\sum_j \delta_{ij}\hat{b}_j^{(\eta-1)}},</math>
 
:: <math>\hat{b}_j^{(\eta)} = \frac{x_{+j}}{\sum_i \delta_{ij}\hat{a}_i^{(\eta)}}</math>
 
:: <math>\hat{m}_{ij}^{(2\eta)} = \delta_{ij}\hat{a}_i^{(\eta)}\hat{b}_j^{(\eta)}</math>
 
The I-model is the particular case of the Q-model in which <math>S</math> is empty, so that <math>\delta_{ij} = 1</math> for all ''i'' and ''j''.
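
The factor updates, including the <math>\delta_{ij}</math> mask of the Q-model, can be sketched as follows. The function name <code>fit_factors</code>, its parameters and the small 3 &times; 3 marginals in the second call are hypothetical and chosen only for illustration; the sketch also assumes that no row or column consists entirely of structural zeros, since the denominators would vanish otherwise.

```python
def fit_factors(row_totals, col_totals, structural_zeros=(), iterations=200):
    """Factor-estimation sketch for the I-model and the Q-model.

    `structural_zeros` is a collection of (i, j) index pairs forming the
    set S; leaving it empty gives the plain independence model.
    All names are assumptions made for this example.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    zeros = set(structural_zeros)
    # delta[i][j] is 0 on structural zeros and 1 elsewhere.
    delta = [[0.0 if (i, j) in zeros else 1.0 for j in range(n_cols)]
             for i in range(n_rows)]

    a = [0.0] * n_rows
    b = [1.0] * n_cols  # initial column factors b_j^(0) = 1
    for _ in range(iterations):
        # Row factors from the current column factors.
        a = [row_totals[i] / sum(delta[i][j] * b[j] for j in range(n_cols))
             for i in range(n_rows)]
        # Column factors from the freshly updated row factors.
        b = [col_totals[j] / sum(delta[i][j] * a[i] for i in range(n_rows))
             for j in range(n_cols)]

    m = [[delta[i][j] * a[i] * b[j] for j in range(n_cols)]
         for i in range(n_rows)]
    return a, b, m


# I-model: one iteration on the marginals (52, 48) and (87, 13).
a, b, m = fit_factors([52, 48], [87, 13], iterations=1)

# Q-model: hypothetical 3 x 3 marginals with a structurally zero diagonal.
_, _, q = fit_factors([8, 10, 9], [6, 12, 9],
                      structural_zeros={(0, 0), (1, 1), (2, 2)})
```

Only the two factor vectors are updated inside the loop; the fitted table is assembled once at the end, which is where the saving over the classical variant comes from.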
 
== Algorithm 3 (RAS) ==
 
The Problem: Let <math>M := (m^{(0)}_{ij}) \in \mathbb{R}^{I\times J}</math> be the initial matrix with nonnegative entries, <math>u \in \mathbb{R}^I</math> a vector of specified row marginals (i.e. row sums) and <math>v \in \mathbb{R}^J</math> a vector of column marginals. We wish to compute a matrix <math>\hat{M} = (\hat{m}_{ij}) \in \mathbb{R}^{I\times J}</math> similar to ''M'' with predefined marginals, meaning
 
: <math>\hat{m}_{i+} = \sum_{j=1}^J \hat{m}_{ij} = u_i</math>

and

: <math>\hat{m}_{+j} = \sum_{i=1}^I \hat{m}_{ij} = v_j</math>
 
Define the diagonalization operator <math>\text{diag}: \mathbb{R}^k \longrightarrow \mathbb{R}^{k\times k}</math>, which produces a (diagonal) matrix with its input vector on the main diagonal and zero elsewhere. Then, for <math>\eta \geq 0</math>, set
 
: <math>M^{(2\eta + 1)} = \text{diag}(r^{(\eta+1)})M^{(2\eta)}</math>
 
: <math>M^{(2\eta + 2)} = M^{(2\eta+1)}\text{diag}(s^{(\eta+1)})</math>
 
where
 
: <math>r_i^{(\eta + 1)} = \frac{u_i}{\sum_j m_{ij}^{(2\eta)}}</math>

: <math>s_j^{(\eta + 1)} = \frac{v_j}{\sum_i m_{ij}^{(2\eta+1)}}</math>
 
Finally, we obtain <math>\hat{M} = \lim_{\eta\rightarrow\infty} M^{(\eta)}.</math>
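
A minimal Python sketch of the RAS iteration is given here for illustration. The function name <code>ras</code>, the fixed number of sweeps and the 2 &times; 2 starting matrix with its target marginals are assumptions made for the example. Multiplication by a diagonal matrix is carried out as a plain row or column rescaling, and the sketch assumes that no row or column sum becomes zero.

```python
def ras(matrix, u, v, sweeps=100):
    """RAS sketch: scale rows to u and columns to v, alternately.

    `matrix` is the nonnegative starting matrix M, `u` the target row
    sums and `v` the target column sums (names chosen for this example).
    """
    m = [[float(x) for x in row] for row in matrix]
    n_rows, n_cols = len(u), len(v)
    for _ in range(sweeps):
        # r_i = u_i / (current sum of row i); left-multiply by diag(r).
        r = [u[i] / sum(m[i]) for i in range(n_rows)]
        m = [[r[i] * x for x in m[i]] for i in range(n_rows)]
        # s_j = v_j / (current sum of column j); right-multiply by diag(s).
        s = [v[j] / sum(row[j] for row in m) for j in range(n_cols)]
        m = [[row[j] * s[j] for j in range(n_cols)] for row in m]
    return m


# Hypothetical input: a positive 2 x 2 matrix with new target marginals.
start = [[1, 2], [3, 4]]
scaled = ras(start, u=[4, 6], v=[5, 5])
```

In this example the result has the requested row and column sums (up to rounding) while keeping the crossproduct ratio of the starting matrix.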
 
== Discussion and comparison of the algorithms ==
 
Although RAS seems to be the solution of an entirely different problem, it is indeed identical to the classical IPFP. In practice, one would not implement actual matrix multiplication, since diagonal matrices are involved. Reducing the operations to the necessary ones, it can easily be seen that RAS does the same as IPFP. The vaguely demanded 'similarity' can be explained as follows: IPFP (and thus RAS) maintains the crossproduct ratios, i.e.
 
: <math>\frac{m^{(0)}_{ij}m^{(0)}_{hk}}{m^{(0)}_{ik}m^{(0)}_{hj}} = \frac{m^{(\eta)}_{ij}m^{(\eta)}_{hk}}{m^{(\eta)}_{ik}m^{(\eta)}_{hj}}\ \forall\ \eta \geq 0\text{ and }i\neq h,\quad  j\neq k</math>
 
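since <math>m^{(\eta)}_{ij} = a_i^{(\eta)}m^{(0)}_{ij}b_j^{(\eta)}</math>, where <math>a_i^{(\eta)}</math> and <math>b_j^{(\eta)}</math> are the accumulated row and column scaling factors (for an initial matrix of ones this reduces to <math>m^{(\eta)}_{ij} = a_i^{(\eta)}b_j^{(\eta)}</math>).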
 
This property is sometimes called '''structure conservation''' and directly leads to the geometrical interpretation of contingency tables and the proof of convergence in the seminal paper of Fienberg (1970).
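
The conservation of the crossproduct ratios can be checked with a short derivation. The symbols <math>a_i</math> and <math>b_j</math> are used here, as an assumption for this illustration, for the accumulated row and column multipliers after <math>\eta</math> steps, and all four cells involved are assumed to be nonzero. Every step multiplies whole rows or whole columns, so <math>m^{(\eta)}_{ij} = a_i\, m^{(0)}_{ij}\, b_j</math>, and therefore

: <math>\frac{m^{(\eta)}_{ij}m^{(\eta)}_{hk}}{m^{(\eta)}_{ik}m^{(\eta)}_{hj}} = \frac{a_i b_j\, a_h b_k}{a_i b_k\, a_h b_j}\cdot\frac{m^{(0)}_{ij}m^{(0)}_{hk}}{m^{(0)}_{ik}m^{(0)}_{hj}} = \frac{m^{(0)}_{ij}m^{(0)}_{hk}}{m^{(0)}_{ik}m^{(0)}_{hj}}.</math>

Each multiplier appears once in the numerator and once in the denominator, so all of them cancel.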
 
Nevertheless, direct factor estimation (algorithm 2) is the computationally cheaper way to carry out IPF whenever it applies: Whereas classical IPFP needs
 
: <math>IJ(2+J) + IJ(2+I) = I^2J + IJ^2 + 4IJ \, </math>
 
elementary operations in each iteration step (including a row and a column fitting step), factor estimation needs only
 
: <math>I(1+J) + J(1+I) = 2IJ + I + J \, </math>
 
operations, so its cost per iteration grows like <math>IJ</math> rather than like <math>IJ(I+J)</math>.
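
To get a feeling for the size of the gap, the two operation counts can be evaluated for concrete table sizes. The helper names below are hypothetical; the formulas are exactly the two counts stated above.

```python
def classical_ops(i, j):
    """Operations per iteration of classical IPFP: IJ(2+J) + IJ(2+I)."""
    return i * j * (2 + j) + i * j * (2 + i)


def factor_ops(i, j):
    """Operations per iteration of factor estimation: I(1+J) + J(1+I)."""
    return i * (1 + j) + j * (1 + i)


for size in (10, 100):
    c, f = classical_ops(size, size), factor_ops(size, size)
    print(size, c, f, round(c / f, 1))
# prints:
# 10 2400 220 10.9
# 100 2040000 20200 101.0
```

For square tables the ratio between the two counts grows roughly in proportion to the table dimension.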
 
== Existence and uniqueness of MLEs ==
 
Necessary and sufficient conditions for the existence and uniqueness of MLEs are complicated in the general case (see<ref>{{cite book |title=The Analysis of Frequency Data |last=Haberman |first=S. J.|year=1974 |publisher=Univ. Chicago Press|isbn=978-0-226-31184-5}}</ref>), but sufficient conditions for 2-dimensional tables are simple:
 
* the marginals of the observed table do not vanish (that is, <math>x_{i+} > 0,\ x_{+j} > 0</math>) and
* the observed table is inseparable (i.e. the table does not permute to a block-diagonal shape).
 
If unique MLEs exist, IPFP exhibits linear convergence in the worst case (Fienberg 1970), but exponential convergence has also been observed (Pukelsheim and Simeone 2009). If a direct estimator (i.e. a closed form of <math>(\hat{m}_{ij})</math>) exists, IPFP converges after 2 iterations. If unique MLEs do not exist, IPFP converges toward the so-called ''extended MLEs'' by design (Haberman 1974), but convergence may be arbitrarily slow and often computationally infeasible.
 
If all observed values are strictly positive, existence and uniqueness of MLEs and therefore convergence is ensured.
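
The two sufficient conditions listed above can be tested mechanically. The sketch below rests on an interpretation that is an assumption of this example: rows and columns are treated as the nodes of a bipartite graph with an edge wherever <math>x_{ij} > 0</math>, and the table is taken to be inseparable exactly when that graph is connected. The function names are hypothetical.

```python
from collections import deque


def has_positive_marginals(table):
    """True if every row sum and every column sum is strictly positive."""
    rows_ok = all(sum(row) > 0 for row in table)
    cols_ok = all(sum(row[j] for row in table) > 0
                  for j in range(len(table[0])))
    return rows_ok and cols_ok


def is_inseparable(table):
    """True if rows and columns cannot be split into independent blocks.

    Breadth-first search over the bipartite graph whose nodes are the
    rows (0 .. I-1) and the columns (I .. I+J-1) and whose edges are
    the strictly positive cells.
    """
    n_rows, n_cols = len(table), len(table[0])
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        if node < n_rows:
            # Row node: neighbours are the columns with a positive cell.
            neighbours = [n_rows + j for j in range(n_cols)
                          if table[node][j] > 0]
        else:
            # Column node: neighbours are the rows with a positive cell.
            j = node - n_rows
            neighbours = [i for i in range(n_rows) if table[i][j] > 0]
        for nb in neighbours:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n_rows + n_cols


block_diagonal = [[1, 0], [0, 1]]   # splits into two 1 x 1 blocks
triangular = [[1, 1], [0, 1]]       # cannot be split
```

Here <code>is_inseparable(block_diagonal)</code> is false and <code>is_inseparable(triangular)</code> is true, while both tables have positive marginals.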
 
== Goodness of fit ==
 
To check whether the assumption of independence is adequate, one uses the [[Pearson X-squared statistic]]

: <math>X^2 = \sum_{i,j}\frac{(x_{ij}-\hat{m}_{ij})^2}{\hat{m}_{ij}}</math>
 
or alternatively the [[likelihood-ratio test]] ([[G-test]]) statistic
 
: <math>G = 2\sum_{i,j} x_{ij}\log\ \frac{x_{ij}}{\hat{m}_{ij}}</math>.
 
Both statistics are asymptotically <math>\chi^2_r</math>-distributed, where <math>r = (I-1)(J-1)</math> is the number of degrees of freedom.
That is, if the [[p-value]]s <math>1 - \chi^2_r(X^2)</math> and <math>1 - \chi^2_r(G)</math> (with <math>\chi^2_r(\cdot)</math> denoting the cumulative distribution function) are not too small (> 0.05 for instance), there is no indication to discard the hypothesis of independence.
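
Both statistics are straightforward to compute once a fitted table is available. The sketch below uses hypothetical function names and, as data, the observed and fitted 2 &times; 2 tables from the Example section of this article. The p-value helper is deliberately restricted to one degree of freedom, where the chi-squared tail probability can be written with the complementary error function; other degrees of freedom would need a general chi-squared distribution function, which is not included here. Cells with <math>x_{ij} = 0</math> are assumed to contribute nothing to <math>G</math>.

```python
import math


def pearson_x2(observed, fitted):
    """Pearson statistic: sum of (x - m)^2 / m over all cells."""
    return sum((x - m) ** 2 / m
               for x_row, m_row in zip(observed, fitted)
               for x, m in zip(x_row, m_row))


def g_statistic(observed, fitted):
    """Likelihood-ratio statistic: 2 * sum of x * log(x / m).

    Cells with x == 0 are skipped (treated as contributing 0).
    """
    return 2.0 * sum(x * math.log(x / m)
                     for x_row, m_row in zip(observed, fitted)
                     for x, m in zip(x_row, m_row) if x > 0)


def p_value_df1(statistic):
    """Upper tail probability of a chi-squared variable with 1 degree
    of freedom, using P(Z^2 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))


observed = [[43, 9], [44, 4]]
fitted = [[45.24, 6.76], [41.76, 6.24]]

x2 = pearson_x2(observed, fitted)
g = g_statistic(observed, fitted)
p_x2 = p_value_df1(x2)   # valid here because r = (2-1)(2-1) = 1
p_g = p_value_df1(g)
```

For a 2 &times; 2 table the number of degrees of freedom is 1, which is why the restricted helper is sufficient in this example.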
 
== Interpretation ==
 
If the rows correspond to different values of property A, and the columns correspond to different values of property B, and the hypothesis of independence is not discarded, the data are consistent with the properties A and B being independent.
 
== Example ==
 
Consider a table of observations (taken from the entry on [[contingency table]]s):
 
<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 43 || 9 || 52
|-----
| female || 44 || 4 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>
 
To run the classical IPFP, we first initialize the matrix with ones, leaving the marginals untouched:
 
<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 1 || 1 || 52
|-----
| female || 1 || 1 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>
 
Of course, the marginal sums do not correspond to the matrix anymore, but this is fixed in the next two iterations of IPFP. The first iteration deals with the row sums:
 
<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 26 || 26 || 52
|-----
| female || 24 || 24 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>
 
Note that, by definition, the row sums always constitute a perfect match after odd iterations, as do the column sums for even ones. The subsequent iteration updates the matrix column-wise:
 
<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 45.24 || 6.76 || 52
|-----
| female ||  41.76 || 6.24 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>
 
Now, both row and column sums of the matrix match the given marginals again.
 
The [[p-value]] of the Pearson statistic for this fitted matrix is approximately <math>p(X^2) \approx  0.1824671</math>, which is well above 0.05, so the hypothesis that gender and handedness are independent is not discarded.
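
As a check, the arithmetic behind this p-value can be written out by hand. Every observed cell differs from its fitted value by 2.24 in absolute value, so each squared difference is 5.0176 and

: <math>X^2 = \frac{5.0176}{45.24} + \frac{5.0176}{6.76} + \frac{5.0176}{41.76} + \frac{5.0176}{6.24} \approx 0.1109 + 0.7422 + 0.1202 + 0.8041 \approx 1.7774.</math>

With <math>r = (2-1)(2-1) = 1</math> degree of freedom, the tail probability can be expressed through the complementary error function:

: <math>p(X^2) = \operatorname{erfc}\left(\sqrt{X^2/2}\right) \approx \operatorname{erfc}(0.9427) \approx 0.182.</math>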
 
== Notes ==
{{reflist}}
 
{{DEFAULTSORT:Iterative Proportional Fitting}}
[[Category:Categorical data]]
[[Category:Statistical algorithms]]

Latest revision as of 16:42, 22 November 2014
