Saturday, November 18, 2023

Longest Common Subsequence

Longest Common Subsequence

In the longest-common-subsequence problem, given two sequences X = < x1, x2, . . , xm > and Y = < y1, y2, . . . , yn > and wish to find a maximum length common subsequence of X and Y.

Given two sequences X and Y , we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y . 

For example, if X = < A, B,C, B,D,A,B > and Y = < B,D,C,A, B,A >, the sequence < B,C,A> is a common subsequence of both X and Y . The sequence < B,C,A > is not a longest common subsequence (LCS) of X and Y , however, since it has length 3 and the sequence < B,C, B,A >, which is also common to both X and Y , has length 4. The sequence < B, C, B, A> is an LCS of X and Y, as is the sequence < B, D, A, B>,since X and Y have no common subsequence of length 5 or greater.


Solving the LCS problem using dynamic programming

Step 1: Characterizing a longest common subsequence
The LCS problem has an optimal-substructure property.
Theorem (Optimal substructure of an LCS)
Let X = < x1, x2, . . . , xm > and Y = <y1, y2, . . . , yn > be sequences, and let Z = < z1,z2, . . . ,zk > be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1.
The way that Theorem characterizes longest common subsequences tells us that an LCS of two sequences contains within it an LCS of prefixes of the two sequences. Thus, the LCS problem has an optimal-substructure property.
Step 2: A recursive solution
Finding an LCS of X = < x1, x2, . . . , xm > and Y = < y1, y2, . . . , yn >. 
If xm = yn, we must find an LCS of Xm-1 and Yn-1. Appending xm = yn to this LCS yields an LCS of X and Y . 
If xm ≠ yn, then we must solve two subproblems: finding an LCS of Xm-1 and Y and finding an LCS of X and   Yn-1. Whichever of these two LCSs is longer is an LCS of X and Y.

To find an LCS of X and Y, we may need to find the LCSs of X and Yn-1 and of Xm-1 and Y. But each of these subproblems has the subsubproblem of finding an LCS of Xm-1 and Yn-1.
The optimal substructure of the LCS problem gives the recursive formula




C [i, j ] to be the length of an LCS of the sequences Xi and Yj
Step 3: Computing the length of an LCS
Based on equation, an exponential-time recursive algorithm can be written to compute the length of an LCS of two sequences. Since the LCS problem has only θ(mn) distinct subproblems, however,dynamic programming can be used to compute the solutions bottom up.


















Procedure LCS-LENGTH takes two sequences X = < x1, x2, . . . , xm > and Y = < y1,y2, . . . ,yn > as inputs. It stores the c[ i, j ] values in a table c [ 0 . . m, 0 . . n]. The procedure also maintains the table b [1 . . m, 1 . . n ] to help us construct an optimal solution. The procedure returns the b and c tables; c [m, n] contains the length of an LCS of X and Y.

Step 4: Constructing an LCS
The b table returned by LCS-LENGTH enables us to quickly construct an LCS of X = < x1, x2, . . . , xm > and Y = < y1, y2, . . . , yn >. We simply begin at b [m, n] and trace through the table by following the arrows. Whenever we encounter a 
 in entry b[i, j ], it implies that xi = yj is an element of the LCS that LCS-LENGTH found.

With this method, we encounter the elements of this LCS in reverse order. The following recursive procedure prints out an LCS of X and Y in the proper, forward order. The initial call is PRINT-LCS(b,X,X.length, Y.length).


Example:











0 comments:

Post a Comment

Data Structures with C++



NET/SET/CS PG



Operating Systems



Computer Networks



JAVA



Design and Analysis of Algorithms



Programming in C++

Top