Monday, September 15, 2014

Word break II

Reference link: https://oj.leetcode.com/problems/word-break-ii/

Question:
Given a string s and a dictionary of words dict, add spaces in s to construct a sentence where each word is a valid dictionary word.
Return all such possible sentences.
For example, given
s = "catsanddogs",
dict = ["cat", "cats", "and", "sand", "dog", "dogs"].

A solution is ["cats and dogs", "cat sand dogs"].

My thoughts:
A tough question for me. Dynamic programming is definitely necessary. But i will start from my brute force method.
To brute force this question is easy. DFS is needed. For the target array, construct a stack (to simulate DFS) to store last valid partition. Use a index to check stk.top() to index substring.

  • If the substring is valid in the dictionary:
    • If index at string end: A successful partition has been found. push current index to stack, output current partition combination. Pop top, set index to current top+1.
    • Else: Push index to stack, index++.
  • Else:
    • If index at string end: Set current index to stk.top()+1. Pop top.
    • Else: Increment index.
A illustration of this brute force process is attached (only key steps):
Here st is the stack, gray rows are successful rows, orange blocks are current index block.
As one can easily see, substring "DOGS" has been visited at least twice in this process because I did not memoize the results of "CATSAND". For a long input string, brute force method takes longer than a minute to finish. But, if I already have the result of "CATSAND", I can simply add "DOGS" string to all possible solutions of "CATSAND" and voila.

Now the recursive structure is clear. For a string S with length L, if I want to get all combinations of S(-1, L-1], I can check S(-1, sep] since this can be memoized just like what we see in the above figure. If S(-1, sep] can be represented by a series of dict elements, then we can forward to check S(sep, L-1] to see if it is in the dictionary. If both are true, we get a hit. To speed this process up, we can check S(sep, L-1] first since this part does not involve recursive steps.

An illustration of the above process with speed up is:

Notice here I do not have to memoize S(sep, L-1]. This is because if (sep, L-1] can be divided to (sep, sep2] + (sep2, L-1], then if I set sep to sep2, this case can collapse to case (-1, sep2] + (sep2, L-1] case. 

Code is as follow:
unordered_map<int, vector<string>> um;

vector<string> helper(string& s, int i, unordered_set<string> &dict) {
    if(um[i].size()) return um[i];
    vector<string> ret;
    if(i == 0) {
        if(dict.count(s.substr(0, 1))) ret.push_back(s.substr(0, 1) + " ");
        return ret;
    }
    if(dict.count(s.substr(0, i+1))) ret.push_back(s.substr(0, i+1) + " ");
    for(int sep = 1; sep <= i; sep++) {
        if(!dict.count(s.substr(sep, i-sep+1))) continue;
        auto tmp = helper(s, sep-1, dict);
        if(!tmp.size()) continue;
        for(auto &e:tmp) e += s.substr(sep, i-sep+1) + " ";
        ret.insert(ret.end(), tmp.begin(), tmp.end());
    }
    um[i] = ret;
    return ret;
}

vector<string> wordBreak(string s, unordered_set<string> &dict) {
    if(!s.size()) return *(new vector<string>);
    auto ret = helper(s, s.size()-1, dict);
    for(auto &e:ret) {
        if(e.size())e.pop_back();
    }
    return ret;
}

No comments:

Post a Comment