Distributing traffic evenly and consistently across all the servers is not difficult if the number of servers in the cluster is constant. But in the real world, you always need to take servers out of service for maintenance. The challenge for a good sharding algorithm is to avoid a complete redistribution of requests when that happens.
The table below uses a simple modular algorithm: divide each key by the number of servers in service, and the remainder identifies the server that takes the request.
| Key     | 396562 | 673665 | 115181 | 650428 | 804339 | 394035 | 280572 | 108093 | 938266 | 125314 |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 5 nodes | 2      | 0      | 1      | 3      | 4      | 0      | 2      | 3      | 1      | 4      |
| 4 nodes | 2      | 1      | 1      | 0      | 3      | 3      | 0      | 1      | 2      | 2      |
You can see that if we have a 5-server (0-4) cluster and take server 4 out of service, most of the requests (8 of the 10 test keys) are redistributed to different servers. We are aware of two different algorithms that provide consistency upon node change.
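The modular approach is easy to sketch. The following minimal Python snippet (the function name is illustrative, not from the original text) reproduces the table above and counts how many keys move when node 4 leaves:

```python
def pick_node_modulo(key: int, nodes_in_service: int) -> int:
    """Simple modular sharding: the remainder of key / node count picks the node."""
    return key % nodes_in_service

keys = [396562, 673665, 115181, 650428, 804339, 394035, 280572, 108093, 938266, 125314]

before = [pick_node_modulo(k, 5) for k in keys]  # 5 nodes in service
after = [pick_node_modulo(k, 4) for k in keys]   # node 4 taken out of service
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved} of {len(keys)} keys moved to a different node")  # 8 of 10
```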
Look-up Ring Algorithm
Form a ring using an array that has significantly more elements than the number of server nodes. For illustration purposes, we use 25 slots for 5 nodes, but the real-world ratio should be much higher; the exact number can be determined by running simulations. Then randomly place the server node numbers in this array. To distribute the load evenly in normal operation, the algorithm that populates the ring needs to make sure every node gets the same share of the slots.
| Slot | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|------|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| Node | 3 | 1 | 4 | 2 | 0 | 3 | 0 | 2 | 4 | 1 | 2  | 3  | 4  | 0  | 1  | 3  | 2  | 1  | 0  | 4  | 3  | 1  | 4  | 0  | 2  |
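As a rough sketch of how such a ring might be populated (the `build_ring` helper and its fixed seed are illustrative assumptions, not the author's code), one simple approach is to give every node the same number of slots and then shuffle their positions:

```python
import random

def build_ring(num_nodes: int, slots_per_node: int, seed: int = 0) -> list[int]:
    """Give every node an equal share of slots, then randomize their positions."""
    ring = [node for node in range(num_nodes) for _ in range(slots_per_node)]
    random.Random(seed).shuffle(ring)  # fixed seed so every client builds the same ring
    return ring

print(build_ring(5, 5))  # a 25-slot ring; each of the 5 nodes occupies exactly 5 slots
```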
To determine which node gets which request, we divide the key (650428) by the number of slots (25) and take the remainder (3). Use the remainder as an index into the array above to get the server node number (2). That server (2) is designated to serve the request. If the designated server node is out of service (OOS), use the server node (0) designated by the next slot (4) in the array. The process continues until a server that is in service is found. The table below illustrates the process by showing which server node is selected to serve each of a set of test keys.
You can see that in the last row, when node 2 is out of service, its load is distributed among nodes 0, 1 and 3. Meanwhile, the other requests continue to be served by the same server nodes as in the normal situation. That eliminates the need to completely redistribute the cache.
| Key                          | 396562 | 673665 | 115181 | 650428 | 804339 | 394035 | 280572 | 108093 | 938266 | 125314 |
|------------------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| MOD 25                       | 12     | 15     | 6      | 3      | 14     | 10     | 22     | 18     | 16     | 14     |
| Node selection (normal)      | 4      | 3      | 0      | 2      | 1      | 2      | 4      | 0      | 2      | 1      |
| Node selection (node 2 OOS)  | 4      | 3      | 0      | 0      | 1      | 3      | 4      | 0      | 1      | 1      |
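A minimal sketch of the look-up itself, assuming the 25-slot ring from the table above and an arbitrary set of in-service nodes (function and variable names are illustrative):

```python
# 25-slot ring from the table above; each value is a server node number.
RING = [3, 1, 4, 2, 0, 3, 0, 2, 4, 1, 2, 3, 4, 0, 1, 3, 2, 1, 0, 4, 3, 1, 4, 0, 2]

def pick_node_ring(key: int, in_service: set[int]) -> int:
    """Start at slot (key mod ring size) and walk the ring until an in-service node is found."""
    slot = key % len(RING)
    for i in range(len(RING)):
        node = RING[(slot + i) % len(RING)]
        if node in in_service:
            return node
    raise RuntimeError("no node in service")

print(pick_node_ring(650428, {0, 1, 2, 3, 4}))  # 2: slot 3 points at node 2
print(pick_node_ring(650428, {0, 1, 3, 4}))     # 0: node 2 is OOS, next slot (4) points at node 0
```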
The advantage of this algorithm is that the look-up speed is fast and consistent regardless of the number of server nodes we have. The disadvantage is the need to maintain the look-up ring, especially when new nodes are added to the cluster.
Key+Node Hash Algorithm
This algorithm uses a good hash algorithm, commonly MD5 or SHA-1. For each request, compute a value for each active node: the hash of a string consisting of the key and the node (node number, node name, or anything that uniquely identifies the node). The server that yields the largest hash value takes the request. The table below demonstrates the node selection process for a set of test keys. The hash algorithm used here is for illustration purposes only; it is neither MD5 nor SHA-1.
In the last row, you can see that when node 2 is out of service, its load is distributed among nodes 0, 1 and 4. Meanwhile, the other requests continue to be served by the same server nodes as in the normal situation. That eliminates the need to completely redistribute the cache.
Each Node 0-4 cell below is the hash of the key concatenated with that node.

| Node \ Key                   | 396562 | 673665 | 115181 | 650428 | 804339 | 394035 | 280572 | 108093 | 938266 | 125314 |
|------------------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| Node 0                       | 81526  | 40031  | 29723  | 53735  | 23911  | 34931  | 96088  | 43852  | 56076  | 38777  |
| Node 1                       | 5425   | 19393  | 93416  | 53022  | 51364  | 84920  | 51352  | 70016  | 26255  | 30336  |
| Node 2                       | 93129  | 26422  | 83633  | 65930  | 81901  | 87666  | 50754  | 32221  | 29866  | 7363   |
| Node 3                       | 40372  | 44005  | 22422  | 32105  | 80448  | 39727  | 33887  | 31331  | 82034  | 93235  |
| Node 4                       | 4337   | 89463  | 87164  | 64973  | 90511  | 14499  | 88153  | 11442  | 63305  | 29493  |
| Node selection (normal)      | 2      | 4      | 1      | 2      | 4      | 2      | 0      | 1      | 3      | 3      |
| Node selection (node 2 OOS)  | 0      | 4      | 1      | 4      | 4      | 1      | 0      | 1      | 3      | 3      |
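A minimal sketch of this selection, here using MD5 from Python's hashlib and an assumed `key:node` concatenation format (so the resulting hash values differ from the illustrative numbers in the table above):

```python
import hashlib

def pick_node_key_hash(key: int, in_service: set[int]) -> int:
    """Pick the in-service node whose hash of the key+node string is largest."""
    def score(node: int) -> int:
        digest = hashlib.md5(f"{key}:{node}".encode()).hexdigest()
        return int(digest, 16)
    return max(in_service, key=score)

keys = [396562, 673665, 115181, 650428]
print([pick_node_key_hash(k, {0, 1, 2, 3, 4}) for k in keys])  # normal
print([pick_node_key_hash(k, {0, 1, 3, 4}) for k in keys])     # node 2 out of service
```

Only the keys whose winning node goes out of service change hands; every other key keeps the same winner, since removing a node does not change the scores of the remaining nodes.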
The advantage of this algorithm is that it is simple and low maintenance: nodes can be easily added to and removed from the cluster without any issue. The disadvantage is the overhead of calculating a hash value per node for each request, and that overhead increases as the number of nodes in the cluster increases.