Friday, November 7, 2014

[Learning note]: Consistent hashing

This is a fun note. Original link here (Chinese), and here, here, here. (Update: it seems Amazon also uses this technique.)

So in the last post, I scratched the surface of amortized analysis. Amortized analysis shows that insertion into a dynamic array takes O(1) time, and so do insertion and deletion in a hash map. But that kind of average-case analysis runs into trouble in real-world implementations. Consider the case below:

Problem statement


Mapping an object to one of \(N\) cache servers can be done with a simple hash function, \(hashkey(object) \% N\). For a uniformly distributed hash key this algorithm works well.

Suppose we have 10 servers labeled as server 0 to server 9. Objects have hash key 11, 12 and 13. Then they will be mapped to server 1, server 2 and server 3.

What if:

  1. One server (server 9) is down. Now the hash function changes to \(hashkey(object) \% (N-1)\).

    In this case, the objects have to be moved to server 2, server 3 and server 4. All objects have to be moved.
  2. A new server is added. Now the hash function changes to \(hashkey(object) \% (N+1)\).

    In this case, the objects have to be moved to server 0, server 1 and server 2. Again, all objects have to be moved (see the sketch after this list).
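To make the cost concrete, here is a minimal Python sketch of the naive modulo mapping from the example above. The server counts and object keys are the ones used in the example; the function name `naive_map` is my own.

```python
def naive_map(key, n_servers):
    """Map an object key to a server index with the naive modulo hash."""
    return key % n_servers

keys = [11, 12, 13]

before = {k: naive_map(k, 10) for k in keys}          # 10 servers
after_removal = {k: naive_map(k, 9) for k in keys}    # server 9 is down
after_addition = {k: naive_map(k, 11) for k in keys}  # one server added

print(before)          # {11: 1, 12: 2, 13: 3}
print(after_removal)   # {11: 2, 12: 3, 13: 4} -> every object moved
print(after_addition)  # {11: 0, 12: 1, 13: 2} -> every object moved
```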

For large cache clusters the cost of the above two cases is not tolerable. Consistent hashing is designed to minimize the impact of cache server changes by minimizing key moves when a new server is introduced or an old server is removed from the system. In terms of our earlier dynamic array example: when the structure grows or shrinks, consistent hashing minimizes the resulting impact by keeping as many existing key assignments valid as possible. As in other posts, this post will mainly focus on the application side. More details can be found in this paper.

Algorithm Design


As described in Wikipedia and the reference links, a circular hash space is used to represent a common modulo-based hash function. I will not redraw all the figures this time; all figures are credited to sparkliang.

If an unsigned 32-bit integer is used as the hash key, then the whole hash space can be viewed as a circle, since values wrap around modulo \(2^{32}\). It is drawn as the left figure below.


Given 4 objects with 4 different keys, the figure above shows their placement in the hash space. Up to this point we are still using the classical hashing approach.

The innovative part of consistent hashing is that not only the objects but also the cache servers are placed into the hash space. Suppose we have 3 servers. Using some hash function, we map them to three different hash keys, Key A, Key B and Key C. Now place them into the hash space:

Now we have object keys and server keys all in the hash space. The next question comes naturally: how do we map objects to servers? You might have guessed it already:
for each object we search clockwise (or counter-clockwise, as long as you keep the direction consistent in your program) and assign it to the first cache server we find.
That's it. After grouping, the above figure becomes:
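Below is a minimal Python sketch of this lookup. The names (`hash32`, `HashRing`) and the MD5-based hash are my own assumptions, not from the original post; the server positions are kept in a sorted list, and a binary search stands in for the clockwise walk around the circle.

```python
import bisect
import hashlib

def hash32(value):
    """Hash an arbitrary string into the 32-bit circular hash space."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (1 << 32)

class HashRing:
    """Bare-bones consistent hash ring: servers and objects share one hash space."""

    def __init__(self, servers):
        # Sorted list of (server hash position, server name) pairs.
        self.ring = sorted((hash32(s), s) for s in servers)

    def lookup(self, object_key):
        """Walk clockwise from the object's position to the first server."""
        position = hash32(object_key)
        index = bisect.bisect(self.ring, (position,))
        # Wrap around the circle if we passed the last server.
        return self.ring[index % len(self.ring)][1]

ring = HashRing(["cache-A", "cache-B", "cache-C"])
print(ring.lookup("object-1"))
```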

Why is this consistent hashing method superior to the naive hashing method?


Let's revisit the two questions raised above and first discuss what happens when one server is down:

Suppose Cache B is down. Based on the algorithm above, we now have:
It is easy to see that only the segment between Cache A and Cache B needs to be processed. Compared with the naive hashing method, removing a cache server has a much smaller impact.

Adding a cache server is similar. Suppose a new cache server, Cache D, is added at Key D; we now have:
It is easy to see that only the objects falling between Cache D and its predecessor on the circle need to be remapped (to Cache D), which again performs much better than the naive hashing method.
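A quick way to see this numerically is to reuse the `HashRing` sketch above and count, over a sample of objects, how many actually change servers. The object names and sample size here are arbitrary.

```python
def count_moves(before, after, objects):
    """Count how many objects map to a different server after the change."""
    return sum(1 for o in objects if before.lookup(o) != after.lookup(o))

objects = [f"object-{i}" for i in range(10000)]

old = HashRing(["cache-A", "cache-B", "cache-C"])
without_b = HashRing(["cache-A", "cache-C"])                      # Cache B goes down
with_d = HashRing(["cache-A", "cache-B", "cache-C", "cache-D"])   # Cache D is added

print(count_moves(old, without_b, objects))  # only the keys that lived on B move
print(count_moves(old, with_d, objects))     # only the keys in D's new segment move
```

With the naive modulo scheme, both counts would be close to the full 10000; here they stay roughly proportional to one server's share of the circle.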

Virtual cache servers in the hash space

Consider the following case (derived from cache deletion case):

In this case three objects are grouped to Cache C while only one object is grouped to Cache A. This balancing issue usually happens when we do not have many cache servers. To avoid it, virtual servers are added to the hash space so that we can mimic the behavior of many servers even though we have only a few.

The implementation is quite clever. For each server we make several replicas. For example, for the case above, we make one shadow copy of each server and place the copy back into the hash space. Now we have:

And we can see the hashing space is balanced.
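Building on the `hash32` helper and `bisect` import from the earlier sketch, here is one way replicas could be layered on top of the ring. The replica-naming scheme (`server#i`) and the default replica count are assumptions for illustration, not from the original post.

```python
class VirtualHashRing:
    """Consistent hash ring with virtual nodes (replicas) per physical server."""

    def __init__(self, servers, replicas=100):
        self.ring = []
        for server in servers:
            for i in range(replicas):
                # Each replica gets its own position, e.g. "cache-A#0", "cache-A#1", ...
                self.ring.append((hash32(f"{server}#{i}"), server))
        self.ring.sort()

    def lookup(self, object_key):
        position = hash32(object_key)
        index = bisect.bisect(self.ring, (position,))
        return self.ring[index % len(self.ring)][1]
```

With enough replicas per server, each physical server owns many small arcs scattered around the circle, which is what evens out the load.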

I wrote a simple simulation program to verify that when servers have different weights, for example when a new server can take more work than an old one, the replica count can be made directly proportional to the server weight. The same technique also applies to hash table resizing. Test link here.
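One possible way to encode server weight, again an assumption rather than a description of the test program, is to give each server a replica count proportional to its weight:

```python
# Cache C is assumed to be a beefier machine, so it gets twice the replicas.
weights = {"cache-A": 1, "cache-B": 1, "cache-C": 2}
base_replicas = 100

ring = []
for server, weight in weights.items():
    for i in range(base_replicas * weight):
        ring.append((hash32(f"{server}#{i}"), server))
ring.sort()
# Cache C now owns roughly twice as many arc segments as A or B,
# so it receives roughly twice as many objects.
```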
