Wednesday, November 19, 2014

Convert sorted list to binary search tree

Leetcode link.

My approach to this question: first count how many nodes are in the list, construct an empty balanced binary tree of that size, then use a Morris traversal (or a plain recursive inorder traversal) to fill the whole tree with the list values. Time complexity is O(n), with O(n) space for the tree.
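For reference, a minimal sketch of this two-pass idea (the ListNode/TreeNode definitions mirror Leetcode's; buildShape, fill and sortedListToBST2 are hypothetical helper names of mine):

```cpp
#include <cstddef>

struct ListNode { int val; ListNode* next; ListNode(int v) : val(v), next(NULL) {} };
struct TreeNode { int val; TreeNode *left, *right;
                  TreeNode(int v) : val(v), left(NULL), right(NULL) {} };

// Pass 1: build an empty, height-balanced tree shape holding n nodes.
TreeNode* buildShape(int n) {
    if (n == 0) return NULL;
    TreeNode* root = new TreeNode(0);
    root->left  = buildShape((n - 1) / 2);
    root->right = buildShape(n - 1 - (n - 1) / 2);
    return root;
}

// Pass 2: an inorder traversal visits tree nodes in sorted order, so copying
// list values in list order during the traversal yields a valid BST.
void fill(TreeNode* root, ListNode*& head) {
    if (!root) return;
    fill(root->left, head);
    root->val = head->val;
    head = head->next;
    fill(root->right, head);
}

TreeNode* sortedListToBST2(ListNode* head) {
    int n = 0;
    for (ListNode* p = head; p; p = p->next) ++n;   // count nodes
    TreeNode* root = buildShape(n);
    fill(root, head);
    return root;
}
```

For the list 1→2→3→4→5 this produces a BST rooted at 3.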

But GeeksforGeeks provides another brilliant solution. I am still pondering why it works; my bet is on the similarity between DFS and inorder traversal. Here is the code with some comments:
TreeNode* helper(ListNode* &head, int low, int high) {
    if(low == high) {
        TreeNode* ret = new TreeNode(head->val);
        head = head->next;
        return ret;
    }
    else if(low > high) return NULL;
    int mid = low + ((high-low)>>1);
    TreeNode* left = helper(head, low, mid-1);
    TreeNode* ret = new TreeNode(head->val);
    ret->left = left;
    head = head->next;
    ret->right = helper(head, mid+1, high);
    return ret;
}

TreeNode *sortedListToBST(ListNode *head) {
    if(!head) return NULL;
    ListNode* walker = head;
    int count = 0;
    while(walker) {
        ++count;
        walker = walker->next;
    }
    return helper(head, 0, count-1);
} 

The low and high indices are a bit confusing: they really just delimit how many nodes belong to the current section. For each valid call (\( low <= high \)) we must make sure the list pointer moves to the next section exactly when the root node for this section has been handled, so that each recursive call starts with the list pointer at the beginning of its own section.
Anyway, this is the first time I see code that build a tree from leaf nodes. Very brilliant idea.

PS: GeeksforGeeks' code is actually even better than mine, since the \( low == high \) case is already covered by the general code path.

Tuesday, November 18, 2014

Divide Two Integers

A medium-difficulty question with all kinds of corner cases, which bump the difficulty up to hard.
Leetcode link.

Q: Divide two integers without using multiplication, division and mod operator.

A:
I managed to crack it by myself the first time I fought through leetcode, but not this time. I had to go back and check what I had done. :( shame

The idea is simple. Shifting operation is your friend. Test case \( \frac{700}{51} \): \[ 700 = 2^3 \cdot 51 + 292 \\ 292 = 2^2 \cdot 51 + 88 \\ 88 = 2^0 \cdot 51 + 37 \] We stop at 37 because \( 37 < 51 \).
All right that's it. Answer is \( 2^3+2^2+2^0 = 13 \).
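The decomposition above can be sketched with shifts alone; a minimal version for positive inputs (divideByShifting is an illustrative name of mine; signs and overflow are handled separately, as the corner cases below discuss):

```cpp
// Strip off the largest power-of-two multiple of the divisor each round,
// exactly like the 700/51 decomposition above.
long long divideByShifting(long long dividend, long long divisor) {
    long long quotient = 0;
    while (dividend >= divisor) {
        long long chunk = divisor, multiple = 1;
        while ((chunk << 1) <= dividend) {  // largest divisor * 2^k <= dividend
            chunk <<= 1;
            multiple <<= 1;
        }
        dividend -= chunk;                  // 700 -> 292 -> 88 -> 37
        quotient += multiple;               // 8 + 4 + 1 = 13
    }
    return quotient;                        // what is left in dividend is the remainder
}
```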

Corner cases' time!
  1. Dividend or divisor equals 0?
  2. Dividend or divisor is negative? Or both?
  3. This is about integers, so what else should you consider? Overflow! \( INT\_MIN / INT\_MAX \)? \( -INT\_MIN / -INT\_MAX \)?
The first one is easy to handle. When the divisor is 0, throwing an exception is what you should do, but the OJ says you can simply return 0. All right then.
The second one is trickier. Judge the sign, then use \( abs \) to reduce everything to the positive case! You'll see the loose end in this plan soon...
The third one? I cheated... I used long long to handle everything. Otherwise the loose end from the second point reveals itself: the devil \( abs(INT\_MIN) \)! (Negating \( INT\_MIN \) overflows a 32-bit int.) ideone says \( abs(INT\_MIN) == INT\_MIN \), and someone on the Leetcode forum reported values like \( abs(-2147483647) = -2147483644 \). Anyway, I cheated...

My code is as follow. Not the best code though. Feel free to play it here.
pair<long long, long long> decompose(long long a, long long b) {    // requires a >= b > 0
    long long k = 0;                            // decompose a = (2^k)*b + d, d < (2^k)*b
    while (b < a) {
        b <<= 1;
        k++;
    }
    if (b == a) return make_pair(k, 0);
    return make_pair(--k, a - (b>>1));
}

int divide(int dvd, int dvs) {
    if (dvs == 0) throw overflow_error("Divide by zero exception");
    if (dvd == 0) return 0;

    long long ldvd, ldvs;
    ldvd = abs((long long)dvd); ldvs = abs((long long)dvs);
    if (ldvd < ldvs) return 0;

    long long ret = 0;
    pair<long long, long long> kb;
    do{
        kb = decompose(ldvd, ldvs);
        ret += 1LL << kb.first;                 // 1LL: k can reach 31, avoid int overflow
        ldvd = kb.second;
    } while (kb.second >= ldvs);
    return ((dvd>>31) == (dvs>>31)) ? ret : (~ret + 1);
}

Monday, November 17, 2014

Container with most water


Leetcode link:
Q:
Given n non-negative integers a1, a2, ..., an, where each represents a point at coordinate (i, ai). n vertical lines are drawn such that the two endpoints of line i is at (i, ai) and (i, 0). Find two lines, which together with x-axis forms a container, such that the container contains the most water.
Note: You may not slant the container. 


A:
Given line i and line j, one can easily come up with: \[ Area[i][j] = min(height[i], height[j]) * (j-i) \]
So an \( O(n^2) \) solution is fairly easy to get.


To decrease the time complexity, tricks and pruning are all you need. Suppose we have a candidate from i to j \( (i<j) \), and assume \( height[i] < height[j] \). Then \( height[i] \) is the bottleneck: if we only decrease j, no pair can produce an area larger than \( Area[i][j] \). This is easy to verify.


With this pruning, in order to cover all promising i and j combinations, we start with \( i = 0 \) and \( j = height.size()-1 \) and shrink the two indices inward, resulting in the following simple code:


int maxArea(vector<int> &height) {
    if(height.size() <= 1) return 0;
    // idea of this program: if a candidate [A[i], A[j]] has A[i] < A[j],
    // then A[i] is the bottleneck: no [A[i], A[j-k]] (k>0) can generate an
    // area larger than [A[i], A[j]]. The only promising candidates are
    // reached by moving the pointer at the shorter line.
    int low = 0;
    int high = height.size()-1;
    int maxarea = 0;
    while(low != high) {
        maxarea = max(maxarea, min(height[high], height[low]) * (high-low));
        if(height[low] < height[high]) low++;
        else high--;
    }
    return maxarea;
}

Friday, November 7, 2014

[Learning note]: Consistent hashing

This is a fun note. Original link here (Chinese); see also here, here and here. (Update: it seems Amazon also uses this technique.)

In the last post I scratched the surface of amortized analysis. Amortized analysis shows that insertion into a dynamic array is O(1), and so are insertion and deletion in a hash map. But that averaging argument runs into trouble in real-world distributed systems. Consider the case below:

Problem statement


Mapping an object to one of \(N\) cache servers can be done with a simple hash function, \(hashkey(object)\%N\). For uniformly distributed hash keys this algorithm works well.

Suppose we have 10 servers labeled as server 0 to server 9. Objects have hash key 11, 12 and 13. Then they will be mapped to server 1, server 2 and server 3.

What if:

  1. One server (server 9) goes down. Now the hash function changes to \(hashkey(object)\%(N-1)\).

    In this case, the objects must move to server 2, server 3 and server 4. Every object has to be moved.
  2. A new server is added. Now the hash function changes to \(hashkey(object)\%(N+1)\).

    In this case, the objects must move to server 0, server 1 and server 2.

For large cache clusters the cost of the two cases above is intolerable. Consistent hashing is designed to minimize the impact of cache server changes by minimizing the number of keys that move when a server is added to or removed from the system. In terms of our earlier dynamic array example: when growing or shrinking happens, consistent hashing minimizes the resulting impact by reusing as many old key placements as possible. As with other posts, this one focuses on the application side. More details can be found in this paper.

Algorithm Design


As stated in Wikipedia and the reference links, a circular hash space replaces the common modulo-based hash function. I will not redraw all the graphs this time; all figures credit to sparkliang.

If an unsigned 32-bit integer is used as the hash key, the whole hash space can be treated as a circle thanks to the modulo property. We represent it as the left figure below.


Given 4 objects with 4 different keys, the figure above shows their placement in the hash space. Up until here we are still using the classical hash design method.

The innovative part of consistent hashing is that not only objects but also cache servers are placed into the hash space. Suppose we have 3 servers. Using some function, we map them to three different hash keys: Key A, Key B and Key C. Now place them into the hash space:

Now we have object keys and server keys all in the hash space. The next question comes naturally: how to map objects to servers? You might have guessed it right already:
for each object, search clockwise (or counter-clockwise, as long as the direction stays consistent throughout your design) and assign the object to the first cache server found.
That's it. After grouping, the above figure is now:

Why is this consistent hashing method superior to the naive hashing method?


Let's revisit the two questions raised above and first discuss what happens when one server is down:

Suppose server B is down. Now based on the algorithm above, we have:
It is easy to see that only the segment between cache A and cache B needs to be processed. Comparing with the naive hashing method, removing a cache server causes relatively smaller impact.

Adding a cache server is similar. Suppose a new cache server Cache D is added with Key D, we now have:
It is easy to see that only the segment between cache A and the new cache D needs to be processed, which again performs better than the naive hashing method.

Virtual cache server in hash space

Consider the following case (derived from cache deletion case):

In this case three objects are grouped to cache C while only one is grouped to cache A. This balancing issue usually appears when there are not many cache servers. To avoid it, virtual servers are added to the hash space, so that we can mimic the behavior of many servers even though only a few exist.

The implementation is quite clever: for each server we make several replicas. For the case above, we make one shadow copy of each server and put the shadow copies back into the hash space. Now we have:

And we can see the hashing space is balanced.

I wrote a simple simulation program to verify that when servers have different weights (for example, a new server can take more work than an old one), the replica count can be made proportional to the server weight. The technique also applies to hash table resizing. Test link here.
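To make the ring concrete, here is a toy sketch (my own, not from the linked posts) using std::map as the circle; the struct name, key naming scheme and replica count are all made up for illustration:

```cpp
#include <map>
#include <string>
#include <functional>

// A consistent-hash ring: servers (as several virtual replicas each) and
// object keys hash into the same space; an object belongs to the first
// server found walking clockwise from its hash position.
struct HashRing {
    std::map<std::size_t, std::string> ring;    // position -> server name
    std::hash<std::string> h;
    int replicas;                               // virtual nodes per server

    explicit HashRing(int r = 100) : replicas(r) {}

    void addServer(const std::string& name) {
        for (int i = 0; i < replicas; ++i)
            ring[h(name + "#" + std::to_string(i))] = name;
    }
    void removeServer(const std::string& name) {
        for (int i = 0; i < replicas; ++i)
            ring.erase(h(name + "#" + std::to_string(i)));
    }
    std::string lookup(const std::string& key) const {
        auto it = ring.lower_bound(h(key));     // first position >= hash(key)
        if (it == ring.end()) it = ring.begin();// wrap around the circle
        return it->second;
    }
};
```

Removing a server then only reassigns the keys that were on it; every other key keeps its old server, which is exactly the property the naive modulo scheme lacks.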

[Learning note] Amortized analysis case: Dynamic array insertion complexity

Amortized analysis is used in algorithm analysis when the average cost over a sequence of operations matters more than the cost of individual operations. I stumbled upon this today while reviewing hash map implementations and found it fun.

Case 1: Dynamic array insertion


Consider a dynamic array of size N. If the array still has vacant slots, an insertion takes O(1) time. But if the array is full (or almost full, depending on the implementation), we reallocate an array of size 2N and move the original elements into it along with the newly added element. This single operation costs O(N) time: allocate the new array of size 2N, move the N existing elements, free the original array, and append the new element. So we will have a timeline similar to:

When amortized, the resizing cost is distributed over all N insertions, leaving an average complexity of O(1) per insertion.
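The claim is easy to check empirically. This instrumented sketch (names are mine) counts every element copy caused by doubling reallocations:

```cpp
#include <vector>
#include <cstddef>

// A doubling dynamic array that counts element copies spent on reallocation.
struct CountingArray {
    std::vector<int> buf;
    std::size_t sz, cap;
    long long copies;           // total elements moved by all reallocations

    CountingArray() : buf(1), sz(0), cap(1), copies(0) {}

    void push(int v) {
        if (sz == cap) {                        // full: double the capacity
            std::vector<int> bigger(cap * 2);
            for (std::size_t i = 0; i < sz; ++i) bigger[i] = buf[i];
            copies += sz;                       // this one push cost O(sz)
            buf.swap(bigger);
            cap *= 2;
        }
        buf[sz++] = v;
    }
};
```

After n pushes the reallocation copies total 1 + 2 + 4 + ... < 2n, so each insertion pays O(1) on average.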

Case 2: Hash table resizing


With dynamic array insertion in mind, hash table resizing is quite similar. Consider a hash table with N buckets holding M elements; its load factor is \( \frac{M}{N} \). The resizing scheme is as follows: if the load factor exceeds 1, double the table size; if it drops below \( \frac{1}{4} \), halve the table size.

Using the same analysis, the amortized cost for insertion is calculated as follows (figure redrawn from here):

When amortized, the average cost is still O(1).

As for shrinking the hash table, we start with \( N = \frac{M}{2} \), since this is what remains right after the last shrinking operation. We need at least \( \frac{M}{4} \) deletions before another shrink is needed, and that shrink costs O(\( \frac{M}{2} \)). Distributing this cost over the \( \frac{M}{4} \) deletions, the amortized time complexity is still O(1).

Wednesday, October 22, 2014

Wildcard matching

Leetcode link. If you are only interested in Yu's code's explanation, jump to here:

Q:
'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

The tags of this problem are DP + Greedy. Nice tags; they explain everything.

The following explanation and code are mostly inspired by this link; it seems everyone uses this solution. To make things easier, I will call string s the "string" and string p the "pattern" from now on. It would be fun to allow wildcards in both the string and the pattern, but for now we restrict wildcards to the pattern only.

Basic idea:

DFS:


So we have three categories of pattern chars: normal ones (anything other than '?' and '*'), the single-char wildcard ('?'), and the most interesting one, the multi-char wildcard ('*'). One can easily come up with:
bool match(string, strptr, pattern, patptr) {
    if (pattern[patptr] == '\0')                // pattern exhausted:
        return string[strptr] == '\0';          //   string must be exhausted too
    if (pattern[patptr] != '*' &&  pattern[patptr] != '?')
        return ( string[strptr] == pattern[patptr] && 
                 match(string, strptr+1, pattern, patptr+1) );
    else if (pattern[patptr] == '?')
        return ( string[strptr] != '\0' &&
                 match(string, strptr+1, pattern, patptr+1) );
    else    // '*': match nothing, or consume one string char and keep the star
        return ( match(string, strptr, pattern, patptr+1) || \
                 (string[strptr] != '\0' &&
                  match(string, strptr+1, pattern, patptr)) );
}
Don't even bother trying this on the online judge. The time complexity is exponential.

Dynamic programming I:


With that recursive formula in mind, let's transform it into bottom-up DP. In the DFS version we look forward at the characters ahead; in bottom-up DP we reuse information already computed, so we look backward. When a '*' appears, we match backward instead of forward: either the star cancels the current string char, pushing the problem back by one string char, or the star cancels nothing. Hard to explain without figures:

So the left branch uses the star to cancel the current string char and leaves the star in play; the problem reduces to whether the star should cancel the previous string char as well.
The right branch uses the star to cancel nothing. The DP formula is now:

bool match(string, strptr, pattern, patptr) {
    if(memoizationTable[strptr][patptr] has any valid info) 
        return memoizationTable[strptr][patptr];
    bool ret;

    if (pattern[patptr] != '*' &&  pattern[patptr] != '?')
        ret = ( string[strptr] == pattern[patptr] && 
                match(string, strptr-1, pattern, patptr-1) );
    else if (pattern[patptr] == '?')
        ret = match(string, strptr-1, pattern, patptr-1);
    else
        ret = ( match(string, strptr-1, pattern, patptr) || \
                match(string, strptr, pattern, patptr-1)); 

    memoizationTable[strptr][patptr] = ret;
    return ret;
}
Now the complexity is somewhat acceptable; with some memory optimization one can get an O(MN) time, O(N) space solution.
Try this with the online judge: still TLE.
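For reference, the memory-optimized bottom-up table mentioned above might look like this (my sketch, not the post's original code); it is the same O(MN) time and O(N) space, so by the account above it would still have been too slow for the judge back then:

```cpp
#include <string>
#include <vector>
#include <utility>

// dp over pattern prefixes: prev[j] answers "does s[0..i-1) match p[0..j)?",
// cur[j] the same for s[0..i). Rolling the two rows keeps memory at O(N).
bool isMatchDP(const std::string& s, const std::string& p) {
    int m = s.size(), n = p.size();
    std::vector<char> prev(n + 1, 0), cur(n + 1, 0);
    prev[0] = 1;                                    // "" matches ""
    for (int j = 1; j <= n; ++j)                    // only '*'s can match ""
        prev[j] = prev[j - 1] && p[j - 1] == '*';
    for (int i = 1; i <= m; ++i) {
        cur[0] = 0;                                 // non-empty s vs empty p
        for (int j = 1; j <= n; ++j) {
            if (p[j - 1] == '*')
                cur[j] = cur[j - 1] || prev[j];     // '*' eats nothing / eats s[i-1]
            else
                cur[j] = prev[j - 1] &&
                         (p[j - 1] == '?' || p[j - 1] == s[i - 1]);
        }
        std::swap(prev, cur);
    }
    return prev[n];
}
```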

Dynamic programming II:


The TLE convinced me that this problem is special and the conventional DP technique needs to be altered. After some careful thought, state machines and DFAs came back to mind. Check the KMP algorithm and this post for more state machine ideas.

We have three categories: regular chars, '?', and the annoying '*'. When the DFA is constructed, regular chars and '?' are easy to handle since each is used exactly once; the critical piece is '*', which can be used many times or not at all. Say the pattern is "*a*b*c". If the first star is used k1 times, the second k2 times, and so on, then an array [k1, k2, k3, ...] uniquely determines one way to align the pattern against the string. Wait a sec, this is not DP, this is brute force. Where is the overlapping subproblem?

Consider this case
string:  aaaaa cX
pattern: *a*a* c

While searching, suppose the first star currently represents 1 '?' and the second represents 2; when we reach the third star, we learn the match fails. The next time, with the first star representing 2 '?' and the second representing 1, we can give up as soon as we reach the third star, because the pattern is aligned exactly as last time. So I thought of recording an offset: mark each star as a state, and for each state record the accumulated offsets that fail, so they are not recalculated. Code looks like this. Still fails the online judge.

Dynamic programming III:


OK, more pruning is needed. First pruning: when the string pointer reaches the end of the string but the pattern pointer is not at the end yet, what should we do?

Case 1: If the remaining part of pattern is full of '*', we are good.
Case 2: Otherwise, we have a mismatch. Check this figure (the upper bar is the string, the lower one is the pattern; the offset means the star represents that whole section of the string):

So now we have a mismatch, and section b contains characters other than '*'. Can we be 100% sure the pattern will not match the target string? Notice the only tunable parameter here is the offset, which can only be increased, never decreased, since the program tries offset 0 first, then offset 1, then offset 2... So the question is: can matching be achieved only by increasing the offset?

No. Suppose we increase the offset and get this:
Since b is still there, the d+b tail still cannot give us a pass.

So we have:
When the target string is matched to its end:
1. if the pattern also reaches its end, the pattern matches the target string.
2. if the pattern is not at its end but only '*'s remain, the pattern matches the string.
3. if the pattern is not at its end and chars other than '*' remain, the pattern does not match the string.
With this pruning, along with the dynamic programming trick above, this code piece barely passes the online judge. AC time 1392 ms. Yes, more than a second.

Greedy:


More pruning! (See, I changed the section title.) This idea is borrowed from Yu's code.
There is a very important backtracking procedure in the previous dynamic programming. Check this graph out.
In this case, since segment b does not match segment c, we fail. Moreover, I claim that no matter how I tune offset 2, it will always fail. Now I'm faced with this question:
Do I need to backtrack and increase offset 1, such that the following situation might occur?
Answer is no.

Why? If such a match exists, then setting offset 2 to offset1' - offset1 + offset2' (which is definitely positive) achieves the same match while keeping the original offset1.

So we have:
When we have a mismatch, we only tune the nearest offset; if tuning the nearest offset cannot fix it, the pattern cannot match the target string.
This means we only need to record the nearest offset. Memory requirement: O(1).

Conclusion:


Combining dynamic programming III and greedy, wildcard can be solved by using this code piece:
    bool isMatch(const char *s, const char *p) {
        const char* strptr = s;
        const char* patptr = p;
        
        const char* lastStrPtr = NULL;
        const char* lastPatPtr = NULL;
        while (*strptr) {
            if (*strptr == *patptr || *patptr == '?') {
                strptr++; patptr++;
            }
            else if (*patptr == '*') {
                lastStrPtr = strptr;
                lastPatPtr = patptr++;
            }
            else if (*strptr != *patptr) {
                if (!lastStrPtr) return false;
                strptr = ++lastStrPtr;
                patptr = lastPatPtr + 1;
            }
        }
        while(*patptr == '*') patptr++;
        return !(*patptr);
    }
AC time around 100ms. Very nice greedy approach.

Tuesday, October 21, 2014

Grammar candy: Lambda expression (closures) in C++11

Useful links: Alex Allain's blog. Cheatsheet (Chinese).

Basic structure of lambda expression:
[capture specification](argument list)->return type {function body}

Example:
int main() {
    auto f = [](){cout<<"hello world";};
    f();
    return 0;
}
The argument list holds the parameters of the lambda, just like a normal function:
int main() {
    auto f = [](int v){cout<<v<<endl;};
    f(1024);
    return 0;
}
The capture specification is the fun part: it lets us use external variables (with respect to the lambda itself) inside the lambda. Based on how the external data is passed, there are several cases:
  1. [], Empty: No external data/variable is needed in the current lambda function. Just like all the cases above;
  2. [=]: Capture every variable visible to the lambda, by value.
    int main() {
        int a = 1;
        int b = 2;
        auto f = [=](int c){cout<<(c+a)<<' '<<(c+b);};
        f(10);
        return 0;
    }
    
    Output 11 and 12.
  3. [&]: Same as above, only difference is pass by reference.
    int main() {
        int a = 1;
        auto f = [&]{++a;};
        f();
        cout<<a;
        return 0;
    }
    Output 2.
  4. [this]: capture the enclosing object's this pointer, giving the lambda access to its members.
    struct s{
        int v;
        s(int val):v(val){};
        void f(){ ( [this]{cout<<++v;} )(); }
    };
    
    int main() {
        s(3).f();
        return 0;
    }
    Output 4.
  5. [var_name]: Only pass variable var_name to lambda, by value.
  6. [&var_name]: Same as above, only difference is pass by reference.
  7. [a, &b]: Pass variable a by value, b by reference.
  8. [=, &a, &b]: Pass variable a and b by reference, all other visible variables pass by value.
  9. [&, a, b]: Inverse of case 8.
    int main() {
        int a = 1;
        int b = 2;
        int c = 3;
        ([&, a]{cout<<a<<b++<<c<<endl;}) ();
        cout<<a<<b<<c;
        return 0;
    }
    Output 123, then 133. Don't try ++a inside the lambda: a is captured by value, so the compiler will complain that a is read-only. (Note that with a capture default, the standard only allows individual captures that differ from the default, e.g. [&, a]; something like [=, a] is ill-formed.)
Return type:
When a lambda contains more than one return statement, the C++ standard says you had better give it an explicit return type, just like:
int main() {
    auto abs = [](int a)->int{if(a>=0)return a;
                           else return -a;};
    cout<<abs(-3)<<abs(-2)<<abs(-1)<<abs(0);
    return 0;
}
If you try this in ideone, it will not complain if you omit the ->int part, but that is a GCC extension; the C++ standard says you should denote the return type explicitly.

Alex also presents a very fun concept: delegates. I am still a newbie, so I'll follow Alex's case and build a similar test case:

#include <iostream>
#include <algorithm>
#include <functional>
#include <vector>
#include <string>
using namespace std;

struct c1{
    vector<int> v;
    c1(vector<int> val):v(val){}
    
    void modifyV(function<string (int)> functionHandle) {
        if(functionHandle) {
            for_each(v.begin(), v.end(), \
            [&](int e){cout<<functionHandle(e);});
        }
    }
};

struct c2{
    int record;
    c2():record(0){};
    string c2func(int val) {
        record = max(record, val);
        return val%2?"odd ":"even ";
    }
    void disp() {
        cout<<"\nmax number is "<<record;
    }
};

int main() {
    vector<int> v{1,2,3,4,5};
    c1 t(v);
    c2 t2;
    t.modifyV([&](int val){return t2.c2func(val);});
    t2.disp();
    return 0;
}
Output is:
odd even odd even odd 
max number is 5
So what I did: class c2 processes c1's data (deciding whether each number is odd or even) and records the maximum of c1's data. The connection is built by the lambda passed to t.modifyV: it captures t2 by reference and forwards each element to it.

This is quite fun. A more interesting example can be found in Alex's post, where he also shows how to receive a lambda as input using std::function<type>, i.e., a function handle as a parameter.

Friday, October 17, 2014

Minimum window substring

Leetcode link.

This was supposed to be an easy question, since I had already read through stormrage's post and thought I had mastered every detail of the code. But it still took me almost 3 hours straight to figure out a nice and clean version.

There are several things that one should notice in designing a robust algorithm for this problem:

  1. Loop invariant: count <= total required chars (excess occurrences are not counted). If count < total required chars, we are still building the first matching window, so shrinking the left edge is unnecessary. If count == total required chars, the current window is either a just-finished first window or a window with a perfect left edge plus one extra required char on the right edge; in both cases we can start shrinking the left edge.
  2. We do not need a separate code block for the first window match, because both the first match and every later match require left shrinking. Think of S = "aab" and T = "b".
Eventually, with some modifications, I came up with the following code. Alas, always think carefully before you start coding.

string minWindow(string S, string T) {
    if(S.size() < T.size() || !T.size()) return "";
    vector<int> lmt(256);
    for(auto c:T) lmt[c]++;
    vector<int> ct(256);
    
    int minLen = INT_MAX;
    string ret;
    
    int a0, a1, count;                              // left and right edge pointer.
    count = 0;
    for(a0 = 0, a1 = 0; a1 < S.size(); a1++) {      // do right edge expand at loop start
        if(!lmt[S[a1]]) continue;                   // invalid char, keep expanding
        if(ct[S[a1]] < lmt[S[a1]]) count++;         // char is needed now, increase count
        ct[S[a1]]++;                                // hasFound array+1
        
        if(count == T.size()) {                     // loop invariant met, now do shrinking
            for(;a0 < a1; a0++) {                   // shrinking....
                if(!lmt[S[a0]]) continue;           // invalid char, keep shrinking
                ct[S[a0]] --;                       // good char, try hasFound--
                                                    // before shrinking
                if(ct[S[a0]] < lmt[S[a0]]) {        // oops, should not delete this char.
                    ct[S[a0]]++;                    //   roll hasFound back.
                    break;                          //   don't move left edge pointer a0
                }                                   // it's ok to remove this char, keep
                                                    // shrinking
            }
            
            if(a1 - a0 + 1 < minLen) {              // one min window found. update
                minLen = a1-a0+1;
                ret = S.substr(a0, a1-a0+1);
            }
        }
    }
    
    if(minLen == INT_MAX) return "";                // INT_MAX is acting as safeguard.
    return ret;
}

Thursday, October 9, 2014

Distinct Subsequence

Leetcode link.

Q: Given a string S and a string T, count the number of distinct subsequences of T in S.
Example: S = "rabbbit", T = "rabbit". Return 3.

This question itself is hard to understand. I failed to create a DP formula for it (fired). But with KMP in mind, I came up with another method that uses a finite state machine. Complexity (O(m*n) time, O(n) space), where m and n are the lengths of S and T.

Other guy's method (DP based):


Use num[i][j] to save the number of distinct subsequences of T(0, j) in S(0, i).
Then

  1. if T[j] != S[i], Formula num[i][j] = num[i-1][j].
  2. else we have two options for the new char S[i], either we take it (num[i-1][j-1]) or we don't take it (num[i-1][j]). Formula num[i][j] = num[i-1][j-1] + num[i-1][j].
The space optimization technique from this post applies here too. Final complexity (O(m*n) time, O(n) space). The following code is borrowed from this post (Chinese).

int numDistinct(string S, string T) {
    if(S.size() < T.size()) return 0;
    vector<int> match(T.size() + 1, 0);         // match[j]: count for prefix T(0, j)
    match[0] = 1;
    for(int i = 1; i <= S.size(); i++)
        for(int j = T.size(); j >= 1; j--)      // backward so each S[i-1] is used once
            if(S[i-1] == T[j-1])
                match[j] += match[j-1];
    return match[T.size()];
}


My state based method:


So each char in the T string is treated as a state. At the current state, if I receive the char I am waiting for from S, I can either consume it and progress to the next state, or skip it. Example:

For target T = "rabbit" I have states 0 to 5 standing for the current matching position. Suppose I have matched up to "rabbi" and am waiting for a 't'; when a 't' arrives from S, I can either consume it and progress to the final state, or disregard it and wait for a later 't'. This is basically the DP formula num[i][j] = num[i-1][j-1] + num[i-1][j].

I construct a state count array ct of size T.size(), initialized to all 0 except ct[0] = 1. ct[i] records how many partial matches are currently waiting for character T[i] to appear in S. This seems hard to explain; let me draw some figures.



So now, 1 state is waiting for character "r" from string S. No one is waiting for "a" to "t".

Send a "r" from string S.

This "r" can either be consumed or not. If consumed, we will wait for an "a" then, if not, we will still wait for a "r":

Send an "a" from string S.

First we consider the one waiting for an "a", as last time, it can choose to consume this "a" or not consume this "a". As to the one waiting for an "r", it will not progress. (Similar to DP formula num[i][j] = num[i-1][j]).


Send a "b" from string S.

This is the same as last time. I will just skip it and only paste the drawing.


Send another "b" from string S.

Now this is the fun part: states 2 and 3 are both waiting for this incoming "b". For state 2, consuming the "b" converts it to state 3; but this newly converted state 3 cannot consume the same "b" again, since only one "b" arrived from S. On the other hand, those already sitting at state 3 can consume this "b" and convert to state 4. Thus I process state 3 first, then state 2.
For state 3 waiting for "b":
 (Green arrow means consume this "b", red means do not consume this "b"):

Then state 2 waiting for "b":
This is just like the memory-optimization trick: iterate from the last state to the first instead of from the first to the last.
Final ct after this step:

Send the third "b" from string S:
Those at state 4 waiting for "i", at state 1 waiting for "a", at state 0 waiting for "r" will not be affected. Use similar analysis method, one can come up with this final ct array:

Send an "i" from string S:
Same as send "a" or send "r". Final ct array:

Send a "t" from string S:
Now for the states waiting for 't': if they consume it, we obtain 3 subsequences matching the requirement; if not, they stay in the ct array.
The final ct array is exactly the same as the one above, while all the completed states are captured in the return variable.

Now we are at the end of string S. Loop ends.

The code looks very similar to the DP version. (In fact, you can say I am using DP, since this ct array is basically the memoization table.)

int numDistinct(string S, string T) {
    //corner cases
    if (S.empty() || T.empty() || S.size() < T.size()) return 0;

    //state counter array
    int* ct = new int[T.size()]();
    ct[0] = 1;
    int ret = 0;
    for (int i = 0; i < (int)S.size(); i++)
        for (int j = (int)T.size() - 1; j >= 0; j--)
            if (S[i] == T[j]) {
                if (j == (int)T.size() - 1) ret += ct[j];
                else ct[j + 1] += ct[j];
            }

    delete[] ct;
    return ret;
}

Playable version can be found here.
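To sanity-check the walkthrough, here is the same ct-array idea restated with std::vector (so no manual memory management is needed); the const-reference signature and the test values below are my own additions, using the "rabbbit"/"rabbit" example from the figures above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same state-counting DP as above, restated with std::vector.
// ct[j] counts how many states are currently waiting for T[j].
int numDistinct(const std::string& S, const std::string& T) {
    if (S.empty() || T.empty() || S.size() < T.size()) return 0;
    std::vector<int> ct(T.size(), 0);
    ct[0] = 1;
    int ret = 0;
    for (std::size_t i = 0; i < S.size(); i++)
        for (int j = (int)T.size() - 1; j >= 0; j--)  // last state first
            if (S[i] == T[j]) {
                if (j == (int)T.size() - 1) ret += ct[j];  // full match
                else ct[j + 1] += ct[j];  // state j advances to j + 1
            }
    return ret;
}
```

Running it on S = "rabbbit", T = "rabbit" gives 3, matching the three subsequences found in the walkthrough.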


Wednesday, October 8, 2014

Leetcode: Best Time to Buy and Sell Stock (Extension, DP)

Leetcode link to original problems:
BS1: https://oj.leetcode.com/problems/best-time-to-buy-and-sell-stock/ (O(n), O(1))
BS2: https://oj.leetcode.com/problems/best-time-to-buy-and-sell-stock-ii/ (O(n), O(1))
BS3: https://oj.leetcode.com/problems/best-time-to-buy-and-sell-stock-iii/ (O(n), O(n)) or (O(n), O(1))

Extension to buy and sell stock problem:

Q: Say you have an array of size n, for which the ith element is the price of a given stock on day i. Design an algorithm to find the maximum profit. You may complete at most k transactions.
Assume this array is named as prices[].


First step: Basic DP formula


Idea comes from lrd15's awesome post (link). Another idea that I found is this post (Chinese), which is quite hard to comprehend for me though.

We are now trying to handle a generalized problem that can be represented as finding profit[i][k], where the first index i represents day i, and second index k represent maximum number of transactions so far. As usual, for problems that aim to find maximum/minimum result, we break it to one subproblem that contains the current index, and one subproblem that does not contain the current index. This problem is no different.

Our problem:

For profit[i][k], we seek:
  1. Subproblem 1: No transaction ends at day i. => profit[i-1][k] = S1
  2. Subproblem 2: The last transaction ends at day i. => S2
What we need to do is find out which one is bigger, S1 or S2. Then profit[i][k] = S1 > S2 ? S1 : S2.
This S2 can be further decomposed:

Suppose the last transaction starts at day j. That transaction uses up one of the k allowed transactions, so
S2 = maximum{ profit[j][k-1] + (prices[i] - prices[j]) }, where j is in [0, i).
Now we can write the basic DP formula as:
profit[i][k] = max{ profit[i-1][k], maximum{ profit[j][k-1] + (prices[i] - prices[j]) | j in [0, i) } }

A bottom-up implementation using (O(kn^2), O(kn)) time-space complexity.
int maxProfit(vector<int> &prices, int times = 2) {
 // corner cases
 if (prices.size() <= 1) return 0;

 // define profit[i][k] at i day, at most k transactions
 // 
 // the recursive equation is:
 // profit[i][k] = max(profit[i-1][k], ->last trans@today
 //       max(prices[i]-prices[j]+profit[j][k-1]) ->last trans@day j)
 //       here j is at range [0, i-1]
 // can also be rewritten as 
 // profit[i][k] = max(profit[i-1][k], 
 //                    prices[i]+max(profit[j][k-1]-prices[j]))
 int numOfDays = prices.size();
 int numOfTrans = times;

 int** profit = new int*[numOfDays];
 for (int i = 0; i < numOfDays; i++)
  profit[i] = new int[numOfTrans+1]();

 for (int i = 1; i < numOfDays; i++) {
  for (int k = 1; k <= numOfTrans; k++) {
   // find max(profit[j][k-1] - prices[j]) for j [0,i)
   int S = INT_MIN;
   for (int j = 0; j < i; j++) {
    int temp = profit[j][k - 1] - prices[j];
    if (temp>S) S = temp;
   }
   int term1 = profit[i - 1][k];
   int term2 = prices[i] + S;
   profit[i][k] = term1 > term2 ? term1 : term2;
  }
 }
 return profit[numOfDays - 1][numOfTrans];
}

Playable version here. Leetcode TLE, since this is O(kn^2) complexity.


Second step: Optimization


Now this is the tricky part since it's all about math. Let's take a look at the DP formula again:
profit[i][k] = max{ profit[i-1][k], maximum{ profit[j][k-1] + (prices[i] - prices[j]) | j in [0, i) } }

When we calculate the maximum{ profit[j][k-1] + (prices[i] - prices[j]) | j in [0, i) } part, we notice the prices[i] term can be extracted, changing the original DP formula to:
profit[i][k] = max{ profit[i-1][k], prices[i] + max{ profit[j][k-1] - prices[j] } }
and the term max{ profit[j][k-1] - prices[j] } can itself be computed by DP, since it is a running-maximum problem.

If we write:
S[m][k] = max{ profit[j][k-1] - prices[j] | j in [0, m)}
then based on running maximum:
S[m][k] = max{ profit[m-1][k-1] - prices[m-1], S[m-1][k]}
Now we can decompose the modified DP formula to a DP equation set:
S[i][k] = max{ profit[i-1][k-1] - prices[i-1], S[i-1][k] }
profit[i][k] = max{ profit[i-1][k], prices[i] + S[i][k] }

This is the end of time optimization.

Space optimization is easy to see. profit[i][k] only depends on the last row. So does S[i][k]. Thus using bottom-up space optimization rule, we come up with the following code: (O(kn), O(k))
int maxProfit(vector<int> &prices, int times=2) {
    // corner cases
    if (prices.size() <= 1) return 0;

    // define profit[i][k] at i day, at most k transactions
    // 
    // the recursive equation is:
    // profit[i][k] = max(profit[i-1][k], ->last trans@today
    //                      max(prices[i]-prices[j]+profit[j][k-1]) ->last trans@day j)
    //                      here j is at range [0, i-1]
    // can also be rewritten as 
    // profit[i][k] = max(profit[i-1][k], 
    //                    prices[i]+max(profit[j][k-1]-prices[j]))
    // if we write:
    // S[l][k] = max(profit[j][k-1]-prices[j]), (j in [0, l)), then its equation is
    // S[l][k] = max(profit[l-1][k-1]-prices[l-1], S[l-1][k]).
    // Rewrite line 18 equation as equation set:
    //     S[i][k] = max(profit[i-1][k-1]-prices[i-1], S[i-1][k])
    //     profit[i][k] = max(profit[i-1][k], prices[i] + S[i][k])
    //
    int numOfDays = prices.size();
    int totalTransTime = times;
    // only need to construct one row
    // at day 0, no matter how many trans its all 0 profit.
    int* profit_i = new int[totalTransTime+1]();
    int* S = new int[totalTransTime+1]();
    for (int i = 0; i < totalTransTime + 1; i++)
        S[i] = INT_MIN;
    for (int i = 1; i < numOfDays; i++) {
        for (int k = totalTransTime; k > 0; k--) {
            int cand = profit_i[k - 1] - prices[i - 1];
            if (cand > S[k])
                S[k] = cand;
                
            int term1 = profit_i[k];
            int term2 = prices[i] + S[k];

            profit_i[k] = term1 > term2 ? term1 : term2;
        }
    }
    return profit_i[totalTransTime];
}

Playable version here. Leetcode AC, 52ms.
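For reference, the optimized DP can be restated more compactly with std::vector (avoiding the leaked new[] arrays); the function name and the test values below are my own additions:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Compact restatement of the optimized DP above.
// S[t]      = max(profit[j][t-1] - prices[j]) seen so far
// profit[t] = best profit with at most t transactions up to today
int maxProfitK(const std::vector<int>& prices, int k) {
    if (prices.size() <= 1 || k <= 0) return 0;
    std::vector<int> profit(k + 1, 0);
    std::vector<int> S(k + 1, INT_MIN);
    for (std::size_t i = 1; i < prices.size(); i++)
        for (int t = k; t > 0; t--) {
            // descending t: profit[t-1] still holds yesterday's value
            S[t] = std::max(S[t], profit[t - 1] - prices[i - 1]);
            profit[t] = std::max(profit[t], prices[i] + S[t]);
        }
    return profit[k];
}
```

For prices {3, 2, 6, 5, 0, 3} with k = 2, this returns 7 (buy at 2, sell at 6; buy at 0, sell at 3).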




Update (2014-11-19)
Why does this work? Especially the decomposition step, where a double maximization problem is decomposed into a DP equation set. (Sidenote: we use more space to store intermediate values so that time can be saved.)
The problem in this post contains the subproblem max{profit[j][k-1] - prices[j]}, which has nothing to do with i except through the range of j.
Can I apply this strategy to other DP problems such as "container with most water problem"?

The DP equation in the container problem is:
bestContainer[i] = max{ min(height[i], height[j]) * (j - i) | j in [0, i) }
This problem, on the contrary, does not have an intermediate result that I can store. I cannot separate j from i because of the term min(height[i], height[j]).

So when to apply this space vs time strategy? I guess observation is the key. Identify the characteristic of the DP formula and see what you can find.



Update (2015-10-4)
Leetcode created an OJ for this general case. Link. One more pruning is needed to pass the large data set: if the transaction count k is greater than or equal to half the number of days, no two transactions are ever forced to overlap, so the problem degenerates to "Best Time to Buy and Sell Stock II", where unlimited buys and sells may be performed. That case can be solved in O(n) time.
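The degenerate case can be sketched as follows: when transactions are unlimited, the optimum is simply the sum of every positive day-to-day price rise (this helper and its name are my own):

```cpp
#include <cassert>
#include <vector>

// "Best Time to Buy and Sell Stock II": with unlimited transactions,
// the optimum collects every positive day-to-day price difference.
int maxProfitUnlimited(const std::vector<int>& prices) {
    int profit = 0;
    for (std::size_t i = 1; i < prices.size(); i++)
        if (prices[i] > prices[i - 1])
            profit += prices[i] - prices[i - 1];
    return profit;
}
```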

Tuesday, September 30, 2014

Find duplicates in a array using only O(n) time and O(1) space

This question comes from a stackoverflow question. The original question is stated as:

Q> Given an integer array of size n with all elements in [0, n-1]. Any number can appear any number of times. Design an algorithm that detects all duplicate elements in this array. Try to do this in O(n) time and O(1) space. Assume the original array can be modified.

A>


Algorithm 1:


It is trivial to come up with an (O(n), O(n)) or an (O(n^2), O(1)) algorithm. For me, figuring out an (O(n), O(1)) algorithm is no easy task. I once saw an algorithm that uses the element's sign bit as a flag to indicate whether a certain number has been visited. C++ code as follows:
vector<int> findDup(int a[], int len) {
 vector<int> ret;
 for (int i = 0; i<len; i++) {
  if(a[abs(a[i])] > 0) a[abs(a[i])] *= -1;
  else ret.push_back(abs(a[i]));
 }
 return ret;
}
with playable version here.

This method relies on all elements in the array being non-negative, so the sign bit is free to serve as a flag denoting whether we have visited a value or not.

This method is not perfect. There are several drawbacks:

  1. It uses the sign bit, so in a sense this is not O(1) extra space at all. What if the original array is defined as an unsigned int array?
  2. If we have a 0 in the array we cannot detect it, and by the requirement we need to be able to accommodate 0.
  3. If a number is repeated more than twice, the output vector will contain multiple copies of it.

Algorithm 2:


Stackoverflow user caf came up with this awesome answer here, while an improved version here is proposed by user j_random_hacker.

Caf's answer is also based on modifying the original array. But instead of modifying the sign bit, caf came up with another idea. Caf uses the index of the element as a flag.

The idea is as follow:
Suppose we are given the following array:
We scan from left to right and treat a[i]'s value as a "pointer" to the a[i]th element of the array. We will try to make a[a[i]] = a[i]. It works this way:
In the top graph, we have a[a[i]] = j, and a[j] = a[a[i]] = k. To make a[j] = j, we only need to swap a[i]'s value, which is j, with a[j]'s value, which I denote as k for now. After swap we get the bottom figure.

Now what? Now we have a[a[j]] = j. But check a[i] again: we have a new "pointer" a[i] = k. We just go ahead and check a[a[i]] = a[k] = l, and see whether k and l match. This process repeats until a[a[i]] = a[i].

Let's use the first figure to demonstrate this process. 
First swap:

Now a[0] = 3, check a[3] and repeat this process again.

Now a[0] = 1, check a[1].
a[1] = 1, thus this process for a[0] stops.

We aren't done yet. Now we check a[1] and make sure its pointer also satisfies a[a[i]] = a[i]. Well, that has already been done, so we move on.

At index 2, well that also looks good. No swap needed.

We move on to index 3, 4 and 5, everything looks good. Eventually we will have:

Now what? Well, it is quite apparent now: if a[i] <> i, then a[i] is a duplicate.

Code:
vector<int> findDup2(int a[], int len) {
 vector<int> ret;
 
 for(int i = 0; i < len; i++) {
  while(a[a[i]] != a[i]) {
   swap(a[a[i]], a[i]);
  }
 }
 
 for (int i = 0; i<len; i++) {
  if (a[i] != i) ret.push_back(a[i]);
 }
 return ret;
}
Playable version here.

As Caf's comment suggests, this may look like O(n^2), but it is actually not. Every swap fixes one position j, making a[j] equal to j; if that relation already holds, no swap is needed. Thus at most n-1 swaps are needed to fix all elements. Complexity O(n).


Algorithm 2plus:


From the output we can see an apparent problem. We did find all the duplicate numbers, but the output is not quite what we want: if a number appears n (n > 2) times, it shows up n-1 times in the output instead of just once.

j_random_hacker solves this problem perfectly. He still uses the property a[a[i]] = a[i] (suddenly it reminds me of two-star programming...), but for any duplicate value found, this property is used only once. If a[i] is a duplicate, meaning a[i] <> i, we check whether a[a[i]] == a[i] still holds. If it holds, we report the duplicate and then break the property, so that the remaining copies will no longer pass this test and be reported again.

But how do we break it? What do we set a[a[i]] to? Suppose we set a[a[i]] = k. We need to be sure that position a[i] itself will not later be recognized as a duplicate, which means a[a[a[i]]] = a[k] (this is confusing... three-star programming) cannot equal k. Simply put, we need to find a k that satisfies a[k] <> k. We already know a[i] <> i, thus k = i.

I know the above paragraph is confusing. Let's work through the example above; hopefully it will clarify j_random_hacker's brilliant idea.

OK, so again we start with 
since we still want that nice a[a[i]] = a[i] property. But this time, we change the output part to:
vector<int> findDup2(int a[], int len) {
 vector<int> ret;
 
 for(int i = 0; i < len; i++) {
  while(a[a[i]] != a[i]) {
   swap(a[a[i]], a[i]);
  }
 }
 
 for (int i = 0; i<len; i++) {
  if (a[i] != i && a[a[i]] == a[i]) {
   ret.push_back(a[i]);
   a[a[i]] = i;
  }
 }
 return ret;
}

The changes are on line 11 and line 13. Let's apply this output code to the above graphical result.

For index 0, it is a duplicate since a[0] <> 0. By Caf's invariant, we immediately have a[a[0]] == a[0] = 1. Let's break this property and change a[a[0]] to 0. We get:

For index 1, it is the original copy, but it now also passes the a[i] <> i test, since a[1] = 0 <> 1. However, based on j_random_hacker's idea, since a[a[1]] <> a[1], the breakable flag has already been destroyed (a[k] <> k), so we will not output it.

For index 2, it used to be a duplicate, but property destroyed just like a[1]. No output.

For index 3 and 4, they are not duplicates at all (a[i] == i), No output.

For index 5, we repeat what we have done for index 0, and get the following result:
Index 5 is recognized as a duplicate, original copy a[4] has been corrupted. Output a[5] = 4.

Playable final version can be found here.
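To see the whole thing end to end, here is a self-contained restatement of the final version (Caf's swap pass plus j_random_hacker's flag-breaking output pass) using std::vector; the wrapper name and test arrays are my own:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Pass 1: enforce the invariant a[a[i]] == a[i] via swaps.
// Pass 2: report each duplicate value exactly once, breaking the
// a[a[i]] == a[i] flag right after the first report.
std::vector<int> findDupUnique(std::vector<int> a) {
    for (std::size_t i = 0; i < a.size(); i++)
        while (a[a[i]] != a[i])
            std::swap(a[a[i]], a[i]);

    std::vector<int> ret;
    for (int i = 0; i < (int)a.size(); i++)
        if (a[i] != i && a[a[i]] == a[i]) {
            ret.push_back(a[i]);
            a[a[i]] = i;  // destroy the flag for the remaining copies
        }
    return ret;
}
```

For example, findDupUnique({1, 2, 3, 1, 3, 5}) reports each duplicated value, 1 and 3, exactly once.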


Conclusion:

It all starts with the idea of using the index as a flag. This is much harder than using the sign bit as a flag. To use the index as a flag, we use a[i]'s value as a new index and maintain the loop invariant a[a[i]] = a[i]. The followup algorithm goes even further: it uses a[j] <> j to indicate a duplicate, and uses a[a[i]] == a[i] as a breakable flag that tells whether this value has already been reported. (Shivering with awe....)


Followups:


  1. Find missing elements. Total array size n. Elements are 0, 1, ... n-1 except that 2 numbers are missing. These two numbers have been replaced by 2 other numbers in range [0, n-1]. Find these two missing numbers and replacement numbers. (Original array modifiable) Solution.
  2. Find the first non repeated character in a string. Suppose we are not handling unicode characters, just 8-bit ascii code. (Expand it to 256 length string, use the above algorithm2plus to break the original string, third scan to pick out the first a[i]=i character. (O(n), O(256)) Solution)
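For followup 2, if modifying the input is allowed the linked solution applies; but since the alphabet is only 8-bit ASCII, a plain 256-entry count table also achieves the (O(n), O(256)) bounds without destroying the input. This sketch (name and all) is my own simpler alternative, not the linked solution:

```cpp
#include <cassert>
#include <string>

// First non-repeated 8-bit character, or '\0' if every character
// repeats. Two passes over the string plus a 256-entry count table.
char firstNonRepeated(const std::string& s) {
    int count[256] = {0};
    for (unsigned char c : s) count[c]++;
    for (unsigned char c : s)
        if (count[c] == 1) return (char)c;
    return '\0';
}
```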


Monday, September 22, 2014

Longest palindrome substring (Manacher's linear algorithm)

Intro


Before we start, I strongly recommend you read this post from Professor Johan Jeuring. It details a very important palindrome property: mirror property, aka "palindrome in palindrome". Once you read it you will understand the following algorithm better. Be sure to comment if you find my explanation of Manacher's algorithm does not make sense at all.

References: Dr. Jeuring used Haskell to describe his idea, which makes it hard for me to understand; if you are a Haskell guru, check out his post. If you prefer Python, here is another awesome tutorial. The idea in this post comes from felix's posts (Chinese, English), which use C to describe Manacher's algorithm. Here I'm demonstrating felix's idea in my own way, using C++, with a step-by-step walkthrough as detailed as possible.

Question: Given a string s, find its longest palindrome substring (not subsequence)


Step1: Preprocess


In Manacher's algorithm, a palindrome is not defined by its starting and ending indices; instead, a center location and a "radius" are used to represent it.

For example, given string "aba", longest palindrome can be represented by center location at 'b' with a radius 2 (counting 'b' itself). Given string "abba", longest palindrome can be represented by center location between two 'b's with radius 2.

You can easily see that when the longest palindrome has even length, it is hard to represent its center with a character index. Thus we add placeholders between each pair of characters. We call the resulting string the padded string of the original.

Sidenote: If you have read felix's post, because felix was handling standard C string with a '\0' ending char, felix added another placeholder at the beginning of original string. We are using C++ std::string, thus adding placeholders between character pairs is good enough.

Use felix's example string "12212321", we construct the padded string as:
where '#' represents a placeholder.
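The padding step itself is a short loop; a minimal helper (my own sketch, since felix's C version differs slightly) might look like:

```cpp
#include <cassert>
#include <string>

// Build the padded string "#s[0]#s[1]#...#s[n-1]#".
std::string pad(const std::string& s) {
    std::string out = "#";
    for (char c : s) {
        out += c;
        out += '#';
    }
    return out;
}
```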

Now we can redefine the representation of a palindrome substring. I can immediately see one quite long palindrome from index 0 to index 8. This palindrome substring "#1#2#2#1#" is centered at index 4 with a radius of 5 (#1#2 # 2#1#). We can define an array holding the radius of the longest palindrome substring centered at each location; I will call it the radius array. To be consistent with felix's post, I will use P[i] to represent it. Notice each character by itself is a palindrome substring, so the radius is at least 1.

If we manually construct this radius array, we will have: 
I'll verify one of these P[i] values. Take i = 7. The longest palindrome substring with s[7] = '1' as its center is s[4..10] = "#2#1#2#". Its radius (1#2#) has length 4.

Now it is easy to see that when P[i] reaches its maximum value, we have found the longest palindrome substring in the padded string.

You might see that constructing P[i] is not hard at all: we simply visit each index and expand P[i] from 1 to the largest it can get by comparing s[i+P[i]] and s[i-P[i]]. This is a feasible way to solve the problem, but it has a time complexity of O(n^2). Manacher's algorithm does better, though the core idea is the same: it provides a cheaper way to construct P[i] and reduces the time complexity to O(n).
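For comparison, the O(n^2) brute-force construction of P[i] described above can be sketched like this (operating on an already padded string; the function name is my own):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive radius-array construction: expand every center until the
// characters on both sides stop matching. O(n^2) worst case.
std::vector<int> naiveRadius(const std::string& s) {
    std::vector<int> p(s.size(), 0);
    for (int i = 0; i < (int)s.size(); i++) {
        p[i] = 1;  // each character alone is a palindrome
        while (i - p[i] >= 0 && i + p[i] < (int)s.size()
               && s[i - p[i]] == s[i + p[i]])
            p[i]++;
    }
    return p;
}
```

On the padded string "#1#2#2#1#2#3#2#1#" from above, this reproduces P[4] = 5 and P[7] = 4, matching the manually constructed radius array.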


Step2: Iteration


Now let's show how Manacher constructs P[i].

Two variables are defined:
ind: the center of the palindrome found so far whose right edge reaches furthest.
mx: the index of the character immediately after that palindrome, i.e. its exclusive right edge.

Take string "ababa" for example. Scan from left to right. When we arrive at the first 'b', ind is 0 and mx is 1 (the only palindrome found so far is 'a' itself at index 0). After processing the first 'b', the furthest-reaching palindrome is "aba" with center index 1, so ind = 1, and mx becomes 3: the index of the character immediately after "aba", which is the second 'b'.

At each index, three operations will be performed.

2.1 Precalculate P[i]. 

Here is when we will use palindrome substring's mirror property. I will assume you have already read Professor Jeuring's post. But just to refresh your memory, I'll repeat it based on my interpretation:

For a palindrome string residing in a longer palindrome string, its length must be the same as its mirror palindrome string's length. This mirror palindrome string is defined with respect to longer palindrome string's center.

Formularize this statement, we have multiple cases:

2.1.1 The current character does not reside in the furthest-reaching palindrome found so far.
Case: "abacde" 

Scan from left to right. When we are at index 3, the furthest-reaching palindrome substring found so far is s[0..2] = "aba". Index 3 is not part of it, thus index 3 starts with P[3] = 1. But wait! "aca" is also a palindrome! Calm down: the title of 2.1 says precalculate P[i]. We will update P[3] to the correct value soon.

Codify case 2.1.1, using our predefined variables ind and mx, we can have:
if (i >= mx) P[i] = 1;
Just FYI, I'll repeat the definition of mx: the index of the character immediately after the current furthest-reaching palindrome.

2.1.2 The current character resides in the furthest-reaching palindrome found so far.
Case: "xabaxabayz"
Scan from left to right, when we are at index 6, we find our latest longest palindrome substring is s[1-7] = "abaxaba" with center at 4. Now ind = 4, mx = 8.

Index 6's mirror index about axis 4 is 2*4-6 = 2, and P[2] = 3. So we would set P[6] to 3 based on the mirror property. Wait, that is not correct! P[6] should be 2 just by observation! Why? Because the longest palindrome centered at 2, which is "xabax", does not lie entirely inside the green area. But, here comes the fun part: its "sub-palindrome" "aba" does lie entirely inside the green area. Thus in this case, instead of using P[2*ind - i], we use the shorter yet safer version "aba", which has radius (mx - i).

The above case is an extreme one. Sometimes we are not that unlucky, and the mirror index's longest palindrome lies entirely inside the green area. So, considering both safety and efficiency, we take the smaller of the two:
if (i < mx) {
	int mirrorPi = p[2 * ind - i];
	int safePi = mx - i;
	p[i] = mirrorPi < safePi ? mirrorPi : safePi;
}
(Here, when mirrorPi is less than (mx - i), mirrorPi itself is safe to use. I just chose safePi as a variable name that is easy to understand.)

2.2 Expand P[i]. 


Once we get through 2.1, the remaining steps are easier to understand. In the last section we were cautious, finding the safest P[i] to start from. Next we expand P[i] as far as possible to make sure we don't miss anything.
while (i + p[i] < s.size() && i - p[i] >= 0 \
       && s[i + p[i]] == s[i - p[i]])
	p[i]++;
We can still use 2.1.1's case as an example: we expand index 3's P[i] to 2, since "aca" is the longest palindrome centered at index 3.


2.3 Update mx and ind. 


Once we finish the expansion at index i, we have found the longest possible palindrome substring centered at i. Its right edge may reach further than the edge currently recorded in mx. Since we keep using mx and ind in step 2.1, proper maintenance is needed so that they always describe the furthest-reaching palindrome: whenever i + p[i] exceeds mx, update both.
if (i + p[i] > mx) {
	mx = i + p[i];
	ind = i;
}

Step3: Cleanup


Since padding characters have been added to the original string, cleanup work is needed to restore the final result without the placeholders. The following code demonstrates the core concept. Corner cases such as the empty string are not handled, and the cleanup is not implemented either (lazy OP).

pair<int, int> manacher(string s) {
	// assuming string is in the format of #s[0]#s[1]#...#s[n]# 
	// format with length larger than 3 (original unpadded string
	// has more than one char).

	// preparation phase
	int* p = new int[s.size()]();
	p[0] = 1; p[1] = 2;
	int mx = 3;	// mx: store the index immediate after current
	            //longest palindrome
	int ind = 1; // ind: store current longest palindrome
	             //center index.

	for (int i = 2; i < s.size(); i++) {
		// step1: precalculate p[i]
		if (i < mx) {
			int mirrorPi = p[2 * ind - i];
			int safePi = mx - i;
			p[i] = mirrorPi < safePi ? mirrorPi : safePi;
		}
		else {
			p[i] = 1;
		}

		// step2: expand
		while (i + p[i] < s.size() \
			   && i - p[i] >= 0 \
			   && s[i + p[i]] == s[i - p[i]])
			p[i]++;

		// step3: update mx and ind
		if (i + p[i] > mx) {
			mx = i + p[i];
			ind = i;
		}
	}

	// ind tracks the palindrome reaching furthest right, which is not
	// necessarily the longest one, so scan p[] for the true maximum.
	int best = 0;
	for (int i = 1; i < (int)s.size(); i++)
		if (p[i] > p[best]) best = i;
	pair<int, int> ret = make_pair(p[best], best);
	delete[] p;
	return ret;
}

You can test this code here.
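Since the cleanup step was left out above, here is a sketch of the full pipeline (my own wrapper, not felix's code): pad the input, run the same Manacher loop, then pick the true maximum radius and map it back to the original string. If the best center in the padded string is i with radius p[i], the original substring has length p[i] - 1 and starts at (i - p[i] + 1) / 2.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Pad, run Manacher, then recover the longest palindromic substring
// of the original (un-padded) string.
std::string longestPalindrome(const std::string& orig) {
    if (orig.empty()) return "";
    std::string s = "#";
    for (char c : orig) { s += c; s += '#'; }  // padded string

    std::vector<int> p(s.size(), 0);
    int mx = 0, ind = 0;  // exclusive right edge / center of the
                          // furthest-reaching palindrome so far
    for (int i = 0; i < (int)s.size(); i++) {
        p[i] = (i < mx) ? std::min(p[2 * ind - i], mx - i) : 1;
        while (i - p[i] >= 0 && i + p[i] < (int)s.size()
               && s[i - p[i]] == s[i + p[i]])
            p[i]++;
        if (i + p[i] > mx) { mx = i + p[i]; ind = i; }
    }

    // Cleanup: take the largest radius and map back to the original.
    int best = 0;
    for (int i = 1; i < (int)s.size(); i++)
        if (p[i] > p[best]) best = i;
    return orig.substr((best - p[best] + 1) / 2, p[best] - 1);
}
```

On "12212321" this returns "12321", the longest palindromic substring of felix's example string.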