動機

先假設

我們有

srv2key = default_dict(list)
key2srv = {}
srvs = ['server1', 'server2', 'server3', 'server4']
  • srv2key: 把server對到負責的key們
  • key2srv: key是哪一個srv負責
  • srvs: 所有server

consistent hash

先想像有一個hash可以把string轉成0~2**32-1之間的值

把servers想像成環上的點,之後分配資料就是利用bsearch找靠近的server

def get(k):
  hashed = hash(k)
  i = bisect(map(hash, srvs), hash(k)) % len(srvs)
  srv2key[srvs[i]] += [k]
  key2srv[k] = srvs[i]
  return ret

刪server時,只要往把原本的server資料往前丟就好

def del_server(srv):
  keys = srv2key[srv]
  oldSrv = srvs.index(srv)
  del(srvs, oldSrv)
  del(srv2key, oldSrv)

  newSrv = srvs[oldSrv+1 % len(srvs)]
  srv2key[newSrv] += keys
  for k in keys:
    key2srv[k] = newSrv
  # 其實也可以 [get(k) for k in keys]

rendezvous hash

可以把consistent hash擴展成把所有server加進來比,選一個分數最高的

def get(k):
  ret = max(srvs, key=lambda srv: score_aka_hash(srv, k))
  srv2key[ret] += [k]
  key2srv[k] = ret
  return ret

刪server時,就變成每個key都要重新map

def del_server(srv):
  keys = srv2key[srv]
  oldSrv = srvs.index(srv)
  del(srvs, oldSrv)
  del(srv2key, oldSrv)

  [get(k) for k in keys]

rendezvous hash & consistent hash

rendezvous hash根據hash function,可以讓key平均分配在server上 但是如果要add/del server是個大工程。

consistent hash在處理add/del server簡單的多,同時查找是log n不是n。 但是如果有hotspot,就要自己調整virtual node去調。

rendezvous hash的實際例子

從wiki上改來的,from New Hashing Algorithms for Data Storage

import mmh3
import math

def int_to_float(value: int) -> float:
    """Converts a uniformly random 64-bit integer to uniformly random floating point number on interval [0, 1)."""
    fifty_three_ones = 0xFFFFFFFFFFFFFFFF >> (64 - 53)
    fifty_three_zeros = float(1 << 53)
    return (value & fifty_three_ones) / fifty_three_zeros

class Node:
    """Class representing a node that is assigned keys as part of a weighted rendezvous hash."""
    def __init__(self, name: str, seed, weight) -> None:
        self.name, self.seed, self.weight = name, seed, weight

    def __str__(self):
        return "[" + self.name + " (" + str(self.seed) + ", " + str(self.weight) + ")]"

    def compute_weighted_score(self, key):
        _, hash_2 = mmh3.hash64(str(key), 0xFFFFFFFF & self.seed)
        hash_f = int_to_float(hash_2)
        score = 1.0 / -math.log(hash_f)
        return self.weight * score

def determine_responsible_node(nodes, k):
    """Determines which node, of a set of nodes of various weights, is responsible for the provided key."""
    return max(nodes, key=lambda n: n.compute_weighted_score(k))

node1 = Node("node1", 123, 100)
node2 = Node("node2", 567, 200)
node3 = Node("node3", 789, 300)

print(str(determine_responsible_node([node1, node2, node3], 'foo')))
print(str(determine_responsible_node([node1, node2, node3], 'bar')))
print(str(determine_responsible_node([node1, node2, node3], 'hello')))

Ref

Rendezvous Hashing: an alternative to Consistent Hashing Consistent Hashing Algorithm: 應用情境、原理與實作範例 Rendezvous hashing consistent hashing vs. rendezvous (HRW) hashing - what are the tradeoffs?