Given a random sequence, find the longest subsequence such that all subsequence elements are in monotonically increasing order.
Example, given sequence {7,8,9,1,2,3,6}, longest increasing sequence (LIS) is {1,2,3,6}.
I come across this classical problem while reading Dynamic Programming chapter in CLRS algorithm book. Given the dynamic programming context, trying dynamic programming becomes my first thought.
The most important part of DP is to find the underlying independent optimal substructure of one problem. Since the purpose is to build an increasing subsequence, its natural to use prefix analyze method just as longest common subsequence (LCS) problem. (LCS states that if A[i]=B[j], LCS[i,j] = LCS[i-1, j-1] + 1, or else investigating LCS[i-1, j] and LCS[i, j-1] would be necessary, causing overlapping subproblems that can be optimized by dynamic programming.)
Following LCS's idea, this is what first came to my mind (and sadly fails to work):
Problem: For series [A0, A1, ... , An] find LIS.
My way of defining optimal subproblems:
Suppose for series [A0, ... , Ai], LIS is [As0, As1, ... , Asj]. Here I did not realize the relationship between Ai and Asj. I thought they can be different.
If A(i+1) > Asj, definitely we have (A0, ... Ai, A(i+1)) as the new LIS [As0, As1, ... Asj, A(i+1)].
If A(i+1) = Asj, now we have a problem.
Suppose there is another LIS [At0, At1,... Atk], and length of LIS[As0->Asj] is the same as LIS[At0->Atk]. We pick LIS[As0->Asj] as LIS of A(i) instead of LIS[At0->Atk] just by random.
If Atk is less than Asj, then adding the new A(i+1) will change our new LIS to [At0, At1, ... Atk, A(i+1)];
If Atk is larger than Asj, then adding the new A(i+1) will not change our LIS.
A simple example is, if we have sequence [7,8,9,1,2,3]. [7,8,9] or [1,2,3] can be picked as the LIS so far. If [7,8,9] is picked as LIS, then adding a 9 to the current sequence will not change the LIS based on the above analysis, which is wrong because the new LIS is [1,2,3,9]. Now we have violated the rule that for the same problem, subproblems must be independent.
A solution to this problem is to always pick the LIS that has the smallest ending number. Based on the above subproblem structure it still causes subproblem overlapping under one problem.
The reason why correct LIS is missed by using the above strategy is, as in LCS, I tried to rule out subproblems when I progress from one subproblem LIS[A0->Ai] to another LIS[A0->Ai+1] by only considering one possible LIS from LIS[A0->Ai]. This turns out to be a bad strategy. How to fix it then?
A natural way to fix it is to consider all possible LIS from LIS[A0->Ai]. Get all LIS?
One right way to do it:
To get all LIS, one way to do this is to mark LIS by its ending char. So for LIS[A0->Ai], to get all its LIS, we find LIS of [A0->A1] with A1 being the ending char of LIS[A0->A1], do so for [A0->A2] all the way to [A0->Ai].
Now the recursive rule is clear, consider all subproblems, combining with current char A[i+1], find the next best LIS ending with A[i+1]. If we want to find LIS(A[n]) without requiring A[n] being LIS's ending char, we can traverse the whole A sequence and find the LIS of A[i] that has longest LIS.
The tricky part in this problem is to find the optimal substructure. Now we can see one subproblem is solely dependent on its subsubproblems rather than its peer subproblems. Subproblem overlapping is thus eliminated.
The recursive rule is:
Mark LIS with A[i] as its ending char as LIS'(A[i]). For LIS'(A[i+1]), find LIS'(A[0]) all the way to LIS'(A[i]).
If A[i+1] is larger than A[k], k = 0,...,i, combine these A[k] as a set S(Ak), Find A[k'] in S(Ak) that has longest length, then LIS'(A[i+1]) = LIS'(A[k']) + A[i+1].
If A[i+1] is less than all A[k], k = 0,...,i, then LIS'(A[i+1]) = A[i+1].
Just for fun, there is another tricky way to do LIS by using longest common subsequence here.
Following LCS's idea, this is what first came to my mind (and sadly fails to work):
Problem: For series [A0, A1, ... , An] find LIS.
My way of defining optimal subproblems:
Suppose for series [A0, ... , Ai], LIS is [As0, As1, ... , Asj]. Here I did not realize the relationship between Ai and Asj. I thought they can be different.
If A(i+1) > Asj, definitely we have (A0, ... Ai, A(i+1)) as the new LIS [As0, As1, ... Asj, A(i+1)].
If A(i+1) = Asj, now we have a problem.
Suppose there is another LIS [At0, At1,... Atk], and length of LIS[As0->Asj] is the same as LIS[At0->Atk]. We pick LIS[As0->Asj] as LIS of A(i) instead of LIS[At0->Atk] just by random.
If Atk is less than Asj, then adding the new A(i+1) will change our new LIS to [At0, At1, ... Atk, A(i+1)];
If Atk is larger than Asj, then adding the new A(i+1) will not change our LIS.
A simple example is, if we have sequence [7,8,9,1,2,3]. [7,8,9] or [1,2,3] can be picked as the LIS so far. If [7,8,9] is picked as LIS, then adding a 9 to the current sequence will not change the LIS based on the above analysis, which is wrong because the new LIS is [1,2,3,9]. Now we have violated the rule that for the same problem, subproblems must be independent.
A solution to this problem is to always pick the LIS that has the smallest ending number. Based on the above subproblem structure it still causes subproblem overlapping under one problem.
The reason why correct LIS is missed by using the above strategy is, as in LCS, I tried to rule out subproblems when I progress from one subproblem LIS[A0->Ai] to another LIS[A0->Ai+1] by only considering one possible LIS from LIS[A0->Ai]. This turns out to be a bad strategy. How to fix it then?
A natural way to fix it is to consider all possible LIS from LIS[A0->Ai]. Get all LIS?
One right way to do it:
To get all LIS, one way to do this is to mark LIS by its ending char. So for LIS[A0->Ai], to get all its LIS, we find LIS of [A0->A1] with A1 being the ending char of LIS[A0->A1], do so for [A0->A2] all the way to [A0->Ai].
Now the recursive rule is clear, consider all subproblems, combining with current char A[i+1], find the next best LIS ending with A[i+1]. If we want to find LIS(A[n]) without requiring A[n] being LIS's ending char, we can traverse the whole A sequence and find the LIS of A[i] that has longest LIS.
The tricky part in this problem is to find the optimal substructure. Now we can see one subproblem is solely dependent on its subsubproblems rather than its peer subproblems. Subproblem overlapping is thus eliminated.
The recursive rule is:
Mark LIS with A[i] as its ending char as LIS'(A[i]). For LIS'(A[i+1]), find LIS'(A[0]) all the way to LIS'(A[i]).
If A[i+1] is larger than A[k], k = 0,...,i, combine these A[k] as a set S(Ak), Find A[k'] in S(Ak) that has longest length, then LIS'(A[i+1]) = LIS'(A[k']) + A[i+1].
If A[i+1] is less than all A[k], k = 0,...,i, then LIS'(A[i+1]) = A[i+1].
Just for fun, there is another tricky way to do LIS by using longest common subsequence here.
No comments:
Post a Comment