Chandrasekhar number: Difference between revisions

Latest revision as of 00:10, 19 July 2013

Optimal matching is a sequence analysis method used in social science to assess the dissimilarity of ordered arrays of tokens, which usually represent the time-ordered sequences of socio-economic states that two individuals have experienced. Once such distances have been calculated for a set of observations (e.g. individuals in a cohort), classical tools such as cluster analysis can be applied to them. The method was adapted for the social sciences^[1] from a technique originally introduced to study molecular biology (protein or genetic) sequences (see sequence alignment). Optimal matching uses the Needleman-Wunsch algorithm.

Algorithm

Let $S = (s_{1}, s_{2}, s_{3}, \dots, s_{T})$ be a sequence of states $s_{i}$ belonging to a finite set of possible states. Let $\mathbf{S}$ denote the sequence space, i.e. the set of all possible sequences of states.

Optimal matching algorithms work by defining simple operator algebras that manipulate sequences, i.e. a set of operators $a_{i} : \mathbf{S} \to \mathbf{S}$. In the simplest approach, only three basic operations are used to transform sequences:

one state $s'$ is inserted into the sequence: $a^{\mathrm{Ins}}_{s'} (s_{1}, s_{2}, s_{3}, \dots, s_{T}) = (s_{1}, s_{2}, s_{3}, \dots, s', \dots, s_{T})$,
one state is deleted from the sequence: $a^{\mathrm{Del}}_{s_{2}} (s_{1}, s_{2}, s_{3}, \dots, s_{T}) = (s_{1}, s_{3}, \dots, s_{T})$, and
a state $s_{1}$ is replaced (substituted) by a state $s'_{1}$: $a^{\mathrm{Sub}}_{s_{1}, s'_{1}} (s_{1}, s_{2}, s_{3}, \dots, s_{T}) = (s'_{1}, s_{2}, s_{3}, \dots, s_{T})$.
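
The three operations can be sketched in a few lines of Python. This is only an illustration: representing sequences as tuples, addressing states by a 0-based position `pos`, and the state codes `E` (employed), `U` (unemployed) and `T` (in training) are all assumptions made for the example, not part of the method's definition.

```python
from typing import Tuple

Seq = Tuple[str, ...]  # assumption: a sequence is a tuple of state codes


def insert(seq: Seq, pos: int, state: str) -> Seq:
    """Insertion: put `state` before index `pos` (pos == len(seq) appends)."""
    if not 0 <= pos <= len(seq):
        raise IndexError("insert position out of range")
    return seq[:pos] + (state,) + seq[pos:]


def delete(seq: Seq, pos: int) -> Seq:
    """Deletion: drop the state at index `pos`."""
    if not 0 <= pos < len(seq):
        raise IndexError("delete position out of range")
    return seq[:pos] + seq[pos + 1:]


def substitute(seq: Seq, pos: int, state: str) -> Seq:
    """Substitution: replace the state at index `pos` by `state`."""
    if not 0 <= pos < len(seq):
        raise IndexError("substitute position out of range")
    return seq[:pos] + (state,) + seq[pos + 1:]


s = ("E", "U", "E", "E")        # hypothetical employment history
print(insert(s, 2, "T"))        # ('E', 'U', 'T', 'E', 'E')
print(delete(s, 1))             # ('E', 'E', 'E')
print(substitute(s, 0, "U"))    # ('U', 'U', 'E', 'E')
```

Each function returns a new sequence and leaves its input untouched, so the operators can be composed freely.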

Imagine now that a cost $c (a_{i}) \in \mathbf{R}_{0}^{+}$ is associated with each operator. Given two sequences $S_{1}$ and $S_{2}$, the idea is to measure the cost of obtaining $S_{2}$ from $S_{1}$ using operators from the algebra. Let $A = (a_{1}, a_{2}, \dots, a_{n})$ be a sequence of operators such that applying all the operators of $A$ to the first sequence $S_{1}$ gives the second sequence $S_{2}$: $S_{2} = a_{1} \circ a_{2} \circ \dots \circ a_{n} (S_{1})$, where $a_{1} \circ a_{2}$ denotes the compound operator. To this sequence of operators we associate the cost $c (A) = \sum_{i = 1}^{n} c (a_{i})$, which represents the total cost of the transformation. Several different sequences $A$ may transform $S_{1}$ into $S_{2}$; a reasonable choice is to select the cheapest of them. We thus define the distance
$d (S_{1}, S_{2}) = \min_{A} \left\{ c (A) \ \text{such that} \ S_{2} = A (S_{1}) \right\}$
that is, the cost of the least expensive sequence of transformations that turns $S_{1}$ into $S_{2}$. Note that $d (S_{1}, S_{2})$ is nonnegative by definition, since it is a sum of nonnegative costs. Provided every operation that changes the sequence has a strictly positive cost, $d (S_{1}, S_{2}) = 0$ if and only if $S_{1} = S_{2}$, in which case no transformation is needed. The distance function is symmetric if insertion and deletion costs are equal, $c (a^{\mathrm{Ins}}) = c (a^{\mathrm{Del}})$, and substitution costs are themselves symmetric; the term indel cost usually refers to the common cost of insertion and deletion.
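
With only the three basic operations, the minimum over all operator sequences does not have to be found by enumeration: it can be computed by dynamic programming over prefixes of the two sequences, in the spirit of the Needleman-Wunsch algorithm. The following is a minimal sketch under stated assumptions, not a reference implementation. The function name `om_distance`, the single indel cost of 1 and the default substitution cost of 2 between different states are illustrative choices.

```python
from typing import Callable, Optional, Sequence


def om_distance(
    s1: Sequence[str],
    s2: Sequence[str],
    indel: float = 1.0,
    sub_cost: Optional[Callable[[str, str], float]] = None,
) -> float:
    """Cheapest total cost of turning s1 into s2 by insert/delete/substitute."""
    if sub_cost is None:
        # Assumed default: identical states cost nothing, different states cost 2.
        sub_cost = lambda a, b: 0.0 if a == b else 2.0

    n, m = len(s1), len(s2)
    # d[i][j] = distance between the first i states of s1 and the first j of s2.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete the first i states of s1
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert the first j states of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                               # deletion
                d[i][j - 1] + indel,                               # insertion
                d[i - 1][j - 1] + sub_cost(s1[i - 1], s2[j - 1]),  # substitution or match
            )
    return d[n][m]


# Hypothetical states: E = employed, U = unemployed.
print(om_distance("EEUU", "EUUE"))  # 2.0 (delete one E, insert one E at the end)
print(om_distance("EEUU", "EEUU"))  # 0.0
```

The table has $(n+1)(m+1)$ cells, so the running time grows with the product of the two sequence lengths.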

Considering a set composed of only the three basic operations described above, this proximity measure satisfies the triangle inequality. Transitivity, however, depends on the definition of the set of elementary operations.

Criticism

Although optimal matching techniques are widely used in sociology and demography, they have also drawn criticism. As several authors have pointed out (for example L. L. Wu^[2]), the main problem in applying optimal matching is how to define the costs $c (a_{i})$ appropriately.
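
A small numerical sketch shows why the choice of costs matters. The sequences and cost values below are hypothetical, chosen only to make the effect visible, and the compact distance function assumes one constant indel cost and one constant substitution cost. Swapping the two costs reverses which sequence counts as closer to the reference.

```python
def om_distance(s1, s2, indel, sub):
    """Edit distance with a constant indel cost and a constant substitution cost."""
    prev = [j * indel for j in range(len(s2) + 1)]   # row for the empty prefix of s1
    for i, a in enumerate(s1, start=1):
        cur = [i * indel]
        for j, b in enumerate(s2, start=1):
            cur.append(min(
                prev[j] + indel,                        # delete a
                cur[j - 1] + indel,                     # insert b
                prev[j - 1] + (0 if a == b else sub),   # match or substitute
            ))
        prev = cur
    return prev[-1]


# Hypothetical states: E = employed, U = unemployed.
ref = "EUEUEU"
shifted = "UEUEUE"   # same pattern, shifted by one period
edited = "EEEEEU"    # same timing, two states changed

for indel, sub in [(1, 2), (2, 1)]:
    print(indel, sub,
          om_distance(ref, shifted, indel, sub),
          om_distance(ref, edited, indel, sub))
# 1 2 2 4   -> cheap indels: the shifted sequence is closer
# 2 1 4 2   -> cheap substitutions: the edited sequence is closer
```

Any clustering built on these distances would therefore group the same individuals differently depending on a cost choice that the data alone do not determine.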

Optimal matching in causal modelling

Optimal matching is also a term used in statistical modelling of causal effects. In this context it refers to matching "cases" with "controls", and is completely separate from the sequence-analytic sense.

Software

TDA is a program offering access to some of the latest developments in transition data analysis.
A package for running optimal matching analysis is available for Stata.
TraMineR is an open-source R package for analysing and visualizing state and event sequences, including optimal matching analysis.

References and notes

[1] A. Abbott and A. Tsay (2000). Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect. Sociological Methods & Research, Vol. 29, 3-33. doi:10.1177/0049124100029001001
[2] L. L. Wu (2000). Some Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect". Sociological Methods & Research, Vol. 29, 41-64. doi:10.1177/0049124100029001003

