Overdetermined system: Difference between revisions

en>D.Lazard
top: fixing wiki link + linking "degree of freedom" + adding the property of inconsistence of general over determined systems
 
In [[computer science]], '''Hirschberg's algorithm''', named after its inventor, [[Dan Hirschberg]], is a [[dynamic programming]] [[algorithm]] that finds the optimal [[sequence alignment]] between two [[string (computer science)|string]]s. Optimality is measured with the [[Levenshtein distance]], defined to be the sum of the costs of insertions, replacements, deletions, and null actions needed to change one string into the other.  Hirschberg's algorithm is simply described as a [[divide and conquer algorithm|divide and conquer]] version of the [[Needleman&ndash;Wunsch algorithm]].<ref>[http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Hirsch/ Hirschberg's algorithm<!-- Bot generated title -->]</ref>  Hirschberg's algorithm is commonly used in [[computational biology]] to find maximal global alignments of [[DNA]] and [[protein]] sequences.
 
==Algorithm information==
Hirschberg's algorithm is a generally applicable algorithm for optimal sequence alignment. [[BLAST]] and [[FASTA]], by contrast, are suboptimal [[Heuristic (computer science)|heuristics]].  If ''x'' and ''y'' are strings, where length(''x'') = ''n'' and length(''y'') = ''m'', the [[Needleman-Wunsch algorithm]] finds an optimal alignment in [[Big O Notation|O]](''nm'') time, using O(''nm'') space.  Hirschberg's algorithm is a modification of the Needleman-Wunsch algorithm that still takes O(''nm'') time, but needs only O(min{''n'',''m''}) space.<ref>http://www.cs.tau.ac.il/~rshamir/algmb/98/scribe/html/lec02/node10.html</ref>
One application of the algorithm is finding sequence alignments of DNA or protein sequences.  It is also a space-efficient way to calculate the [[longest common subsequence problem|longest common subsequence]] between two sets of data such as with the common [[diff]] tool.
 
The Hirschberg algorithm can be derived from the Needleman-Wunsch algorithm by observing that:<ref>{{cite journal|author=Hirschberg, D. S.|title=A linear space algorithm for computing maximal common subsequences|journal=Communications of the ACM|volume=18|issue=6|year=1975|pages=341–343|doi=10.1145/360825.360861}}</ref>
# one can compute the optimal alignment score by only storing the current and previous row of the Needleman-Wunsch score matrix;
# if <math>(Z,W) = \operatorname{NW}(X,Y)</math> is the optimal alignment of <math>(X,Y)</math>, and <math>X = X^l + X^r</math> is an arbitrary partition of <math>X</math>, there exists a partition <math>Y^l + Y^r</math> of <math>Y</math> such that <math>\operatorname{NW}(X,Y) = \operatorname{NW}(X^l,Y^l) + \operatorname{NW}(X^r,Y^r)</math>.
 
== Algorithm description ==
 
<math>X_i</math> denotes the i-th character of <math>X</math>, where <math>1 \leqslant i \leqslant \operatorname{length}(X)</math>. <math>X_{i:j}</math> denotes a substring of size <math>j-i+1</math>, ranging from the i-th to the j-th character of <math>X</math>. <math>\operatorname{rev}(X)</math> is the reversed version of <math>X</math>.
 
<math>X</math> and <math>Y</math> are sequences to be aligned. Let <math>x</math> be a character from <math>X</math>, and <math>y</math> be a character from <math>Y</math>. We assume that <math>\operatorname{Del}(x)</math>, <math>\operatorname{Ins}(y)</math> and <math>\operatorname{Sub}(x,y)</math> are well defined integer-valued functions. These functions represent the cost of deleting <math>x</math>, inserting <math>y</math>, and replacing <math>x</math> with <math>y</math>, respectively.
 
We define <math>\operatorname{NWScore}(X,Y)</math>, which returns the last line of the Needleman-Wunsch score matrix <math>\mathrm{Score}(i,j)</math>:
 
  '''function''' NWScore(X,Y)
    Score(0,0) = 0
    '''for''' j=1 '''to''' length(Y)
      Score(0,j) = Score(0,j-1) + Ins(Y<sub>j</sub>)
    '''for''' i=1 '''to''' length(X)
      Score(i,0) = Score(i-1,0) + Del(X<sub>i</sub>)
      '''for''' j=1 '''to''' length(Y)
        scoreSub = Score(i-1,j-1) + Sub(X<sub>i</sub>, Y<sub>j</sub>)
        scoreDel = Score(i-1,j) + Del(X<sub>i</sub>)
        scoreIns = Score(i,j-1) + Ins(Y<sub>j</sub>)
        Score(i,j) = max(scoreSub, scoreDel, scoreIns)
      '''end'''
    '''end'''
    '''for''' j=0 '''to''' length(Y)
      LastLine(j) = Score(length(X),j)
    '''return''' LastLine
 
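Note that at any point, <math>\operatorname{NWScore}</math> only requires the two most recent rows of the score matrix. Each row has <math>\operatorname{length}(Y)+1</math> entries, so as written two rows take <math>O(\operatorname{length}(Y))</math> space. Letting the shorter sequence play the role of <math>Y</math> gives an implementation in <math>O(\operatorname{min}\{\operatorname{length}(X),\operatorname{length}(Y)\})</math> space.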
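The two-row idea can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names <tt>nw_score</tt>, <tt>ins_cost</tt>, <tt>del_cost</tt> and <tt>sub_cost</tt> are hypothetical, and the cost values are assumed to be the ones used in the Example section of this article.

```python
# Hypothetical cost functions, assumed equal to those of the article's example.
def ins_cost(y):
    return -2

def del_cost(x):
    return -2

def sub_cost(x, y):
    return 2 if x == y else -1

def nw_score(X, Y):
    """Return the last row of the Needleman-Wunsch score matrix for X and Y.

    Only the previous row and the row being built are kept in memory.
    """
    # Row 0: the empty prefix of X aligned with every prefix of Y.
    prev = [0]
    for y in Y:
        prev.append(prev[-1] + ins_cost(y))

    for x in X:
        # Column 0: a prefix of X aligned with the empty prefix of Y.
        curr = [prev[0] + del_cost(x)]
        for j, y in enumerate(Y, start=1):
            score_sub = prev[j - 1] + sub_cost(x, y)
            score_del = prev[j] + del_cost(x)
            score_ins = curr[j - 1] + ins_cost(y)
            curr.append(max(score_sub, score_del, score_ins))
        prev = curr  # the older row is no longer needed
    return prev

print(nw_score("AGTA", "TATGC"))
```

Under these assumed costs, the call prints <tt>[-8, -4, 0, -2, -1, -3]</tt>, which is the last row of the matrix for <tt>NWScore(AGTA, TATGC)</tt> shown in the Example section.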
 
The Hirschberg algorithm follows:
 
  '''function''' Hirschberg(X,Y)
    Z = ""
    W = ""
    '''if''' length(X) == 0 '''or''' length(Y) == 0
      '''if''' length(X) == 0
        '''for''' i=1 '''to''' length(Y)
          Z = Z + '-'
          W = W + Y<sub>i</sub>
        '''end'''
      '''else if''' length(Y) == 0
        '''for''' i=1 '''to''' length(X)
          Z = Z + X<sub>i</sub>
          W = W + '-'
        '''end'''
      '''end'''
    '''else if''' length(X) == 1 '''or''' length(Y) == 1
      (Z,W) = NeedlemanWunsch(X,Y)
    '''else'''
      xlen = length(X)
      xmid = length(X)/2
      ylen = length(Y)
 
      ScoreL = NWScore(X<sub>1:xmid</sub>, Y)
      ScoreR = NWScore(rev(X<sub>xmid+1:xlen</sub>), rev(Y))
      ymid = PartitionY(ScoreL, ScoreR)
 
      (Z,W) = Hirschberg(X<sub>1:xmid</sub>, Y<sub>1:ymid</sub>) + Hirschberg(X<sub>xmid+1:xlen</sub>, Y<sub>ymid+1:ylen</sub>)
    '''end'''
    '''return''' (Z,W)
 
In the context of Observation (2), assume that <math>X^l + X^r</math> is a partition of <math>X</math>. Function <math>\mathrm{PartitionY}</math> returns index <math>\mathrm{ymid}</math> such that <math>Y^l = Y_{1:\mathrm{ymid}}</math> and <math>Y^r = Y_{\mathrm{ymid}+1:\operatorname{length}(Y)}</math>. <math>\mathrm{PartitionY}</math> is given by
 
  '''function''' PartitionY(ScoreL, ScoreR)
    '''return''' [[arg max]] ScoreL + rev(ScoreR)
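For readers who want to experiment, the pseudocode above can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: all function names are hypothetical, the cost functions are assumed to be those of the Example section, strings are indexed from 0 rather than 1, and the base case uses a plain full-matrix Needleman-Wunsch with traceback, which is only ever called when one of the strings has length 1.

```python
GAP = "-"

# Hypothetical cost functions, assumed equal to those of the article's example.
def ins_cost(y): return -2
def del_cost(x): return -2
def sub_cost(x, y): return 2 if x == y else -1

def nw_score(X, Y):
    """Last row of the Needleman-Wunsch score matrix, keeping two rows."""
    prev = [0]
    for y in Y:
        prev.append(prev[-1] + ins_cost(y))
    for x in X:
        curr = [prev[0] + del_cost(x)]
        for j, y in enumerate(Y, start=1):
            curr.append(max(prev[j - 1] + sub_cost(x, y),
                            prev[j] + del_cost(x),
                            curr[j - 1] + ins_cost(y)))
        prev = curr
    return prev

def needleman_wunsch(X, Y):
    """Full-matrix alignment with traceback; used only for the base case."""
    n, m = len(X), len(Y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + ins_cost(Y[j - 1])
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + del_cost(X[i - 1])
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub_cost(X[i - 1], Y[j - 1]),
                          S[i - 1][j] + del_cost(X[i - 1]),
                          S[i][j - 1] + ins_cost(Y[j - 1]))
    Z, W = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub_cost(X[i - 1], Y[j - 1]):
            Z.append(X[i - 1]); W.append(Y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + del_cost(X[i - 1]):
            Z.append(X[i - 1]); W.append(GAP); i -= 1
        else:
            Z.append(GAP); W.append(Y[j - 1]); j -= 1
    return "".join(reversed(Z)), "".join(reversed(W))

def partition_y(score_l, score_r):
    """Index maximising ScoreL + rev(ScoreR); ties go to the smallest index."""
    totals = [l + r for l, r in zip(score_l, reversed(score_r))]
    return totals.index(max(totals))

def hirschberg(X, Y):
    if len(X) == 0:
        return GAP * len(Y), Y
    if len(Y) == 0:
        return X, GAP * len(X)
    if len(X) == 1 or len(Y) == 1:
        return needleman_wunsch(X, Y)
    xmid = len(X) // 2
    score_l = nw_score(X[:xmid], Y)
    score_r = nw_score(X[xmid:][::-1], Y[::-1])
    ymid = partition_y(score_l, score_r)
    zl, wl = hirschberg(X[:xmid], Y[:ymid])
    zr, wr = hirschberg(X[xmid:], Y[ymid:])
    return zl + zr, wl + wr

print(hirschberg("AGTACGCA", "TATGC"))
```

With the assumed costs, the call prints <tt>('AGTACGCA', '--TATGC-')</tt>. Other optimal alignments may exist for other inputs; which one is returned depends on how ties are broken in <tt>partition_y</tt> and in the traceback.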
 
== Example ==
 
Let
 
<math>
  \begin{align}
    X &= \mathrm{AGTACGCA},\\
    Y &= \mathrm{TATGC},\\
    \operatorname{Del}(x) &= -2,\\
    \operatorname{Ins}(y) &= -2,\\
    \operatorname{Sub}(x,y) &= \begin{cases} +2, & \mbox{if } x = y \\ -1, & \mbox{if } x \neq y.\end{cases}
  \end{align}
</math>.
 
The optimal alignment is given by
 
  Z = AGTACGCA
  W = --TATGC-
 
Indeed, this can be verified by backtracking its corresponding Needleman-Wunsch matrix:
 
          '''T   A   T   G   C'''
      '''0'''  -2  -4  -6  -8 -10
  '''A'''  '''-2'''  -1   0  -2  -4  -6
  '''G'''  '''-4'''  -3  -2  -1   0  -2
  '''T'''  -6  '''-2'''  -4   0  -2  -1
  '''A'''  -8  -4   '''0'''  -2  -1  -3
  '''C''' -10  -6  -2  '''-1'''  -3   1
  '''G''' -12  -8  -4  -3   '''1'''  -1
  '''C''' -14 -10  -6  -5  -1   '''3'''
  '''A''' -16 -12  -8  -7  -3   '''1'''
 
One starts with the top level call to <math>\operatorname{Hirschberg}(\mathrm{AGTACGCA}, \mathrm{TATGC})</math>. The call to <math>\operatorname{NWScore}(\mathrm{AGTA},Y)</math> produces the following matrix:
 
          '''T   A   T   G   C'''
      0  -2  -4  -6  -8 -10
  '''A'''  -2  -1   0  -2  -4  -6
  '''G'''  -4  -3  -2  -1   0  -2
  '''T'''  -6  -2  -4   0  -2  -1
  '''A'''  -8  -4   0  -2  -1  -3
 
Likewise, <math>\operatorname{NWScore}(\operatorname{rev}(\mathrm{CGCA}), \operatorname{rev}(Y))</math> generates the following matrix:
 
          '''C   G   T   A   T'''
      0  -2  -4  -6  -8 -10
  '''A'''  -2  -1  -3  -5  -4  -6
  '''C'''  -4   0  -2  -4  -6  -5
  '''G'''  -6  -2   2   0  -2  -4
  '''C'''  -8  -4   0   1  -1  -3
 
Their last lines are respectively
 
  ScoreL = [ -8 -4  0 -2 -1 -3 ]
  ScoreR = [ -8 -4  0  1 -1 -3 ]
 
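The element-wise sum of ScoreL and rev(ScoreR) is <tt>[ -11 -5  1 -2 -5 -11 ]</tt>, whose maximum, 1, occurs at index 2 (counting from 0). Hence <tt>PartitionY(ScoreL, ScoreR) = 2</tt>, such that <math>X = \mathrm{AGTA} + \mathrm{CGCA}</math> and <math>Y = \mathrm{TA} + \mathrm{TGC}</math>.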
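The arithmetic of this partition step can be checked with a few lines of Python. The snippet only reuses the two score rows listed above; the variable names are illustrative.

```python
# Last rows of the two score matrices, copied from the example above.
score_l = [-8, -4, 0, -2, -1, -3]
score_r = [-8, -4, 0, 1, -1, -3]

# Element-wise sum of ScoreL and the reversed ScoreR.
totals = [l + r for l, r in zip(score_l, reversed(score_r))]

# arg max over the possible split positions of Y (0 .. length(Y)).
ymid = max(range(len(totals)), key=totals.__getitem__)

Y = "TATGC"
print(totals, ymid, Y[:ymid], Y[ymid:])
```

The largest total, 1, equals the bottom-right entry of the full Needleman-Wunsch matrix, as expected: the best split of <math>Y</math> must reproduce the optimal alignment score.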
 
The entire Hirschberg recursion (which we omit for brevity) produces the following tree:
 
                (AGTACGCA,TATGC)
                /              \
        (AGTA,TA)            (CGCA,TGC)
          /    \              /      \
      (AG,)  (TA,TA)      (CG,TG)  (CA,C)
              /  \        /  \
            (T,T) (A,A)  (C,T) (G,G)
 
The leaves of the tree contain the optimal alignment.
 
==See also==
* [[Needleman-Wunsch algorithm]]
* [[Smith–Waterman algorithm]]
* [[Levenshtein distance]]
* [[Longest common subsequence problem|Longest Common Subsequence]]
 
==References==
{{reflist}}
 
{{DEFAULTSORT:Hirschberg's Algorithm}}
[[Category:Sequence alignment algorithms]]
[[Category:Bioinformatics algorithms]]
[[Category:Articles with example pseudocode]]
[[Category:Dynamic programming]]

Latest revision as of 13:30, 7 January 2015
