動機
先假設
我們有
srv2key = default_dict(list)
key2srv = {}
srvs = ['server1', 'server2', 'server3', 'server4']
- srv2key: 把server對到負責的key們
- key2srv: key是哪一個srv負責
- srvs: 所有server
consistent hash
先想像有一個hash可以把string轉成0~2**32-1之間的值
把servers想像成環上的點,之後分配資料就是利用bsearch找靠近的server
def get(k):
hashed = hash(k)
i = bisect(map(hash, srvs), hash(k)) % len(srvs)
srv2key[srvs[i]] += [k]
key2srv[k] = srvs[i]
return ret
刪server時,只要往把原本的server資料往前丟就好
def del_server(srv):
keys = srv2key[srv]
oldSrv = srvs.index(srv)
del(srvs, oldSrv)
del(srv2key, oldSrv)
newSrv = srvs[oldSrv+1 % len(srvs)]
srv2key[newSrv] += keys
for k in keys:
key2srv[k] = newSrv
# 其實也可以 [get(k) for k in keys]
rendezvous hash
可以把consistent hash擴展成把所有server加進來比,選一個分數最高的
def get(k):
ret = max(srvs, key=lambda srv: score_aka_hash(srv, k))
srv2key[ret] += [k]
key2srv[k] = ret
return ret
刪server時,就變成每個key都要重新map
def del_server(srv):
keys = srv2key[srv]
oldSrv = srvs.index(srv)
del(srvs, oldSrv)
del(srv2key, oldSrv)
[get(k) for k in keys]
rendezvous hash & consistent hash
rendezvous hash根據hash function,可以讓key平均分配在server上 但是如果要add/del server是個大工程。
consistent hash在處理add/del server簡單的多,同時查找是log n不是n。 但是如果有hotspot,就要自己調整virtual node去調。
rendezvous hash的實際例子
從wiki上改來的,from New Hashing Algorithms for Data Storage
import mmh3
import math
def int_to_float(value: int) -> float:
"""Converts a uniformly random 64-bit integer to uniformly random floating point number on interval [0, 1)."""
fifty_three_ones = 0xFFFFFFFFFFFFFFFF >> (64 - 53)
fifty_three_zeros = float(1 << 53)
return (value & fifty_three_ones) / fifty_three_zeros
class Node:
"""Class representing a node that is assigned keys as part of a weighted rendezvous hash."""
def __init__(self, name: str, seed, weight) -> None:
self.name, self.seed, self.weight = name, seed, weight
def __str__(self):
return "[" + self.name + " (" + str(self.seed) + ", " + str(self.weight) + ")]"
def compute_weighted_score(self, key):
_, hash_2 = mmh3.hash64(str(key), 0xFFFFFFFF & self.seed)
hash_f = int_to_float(hash_2)
score = 1.0 / -math.log(hash_f)
return self.weight * score
def determine_responsible_node(nodes, k):
"""Determines which node, of a set of nodes of various weights, is responsible for the provided key."""
return max(nodes, key=lambda n: n.compute_weighted_score(k))
node1 = Node("node1", 123, 100)
node2 = Node("node2", 567, 200)
node3 = Node("node3", 789, 300)
print(str(determine_responsible_node([node1, node2, node3], 'foo')))
print(str(determine_responsible_node([node1, node2, node3], 'bar')))
print(str(determine_responsible_node([node1, node2, node3], 'hello')))
Ref
Rendezvous Hashing: an alternative to Consistent Hashing Consistent Hashing Algorithm: 應用情境、原理與實作範例 Rendezvous hashing consistent hashing vs. rendezvous (HRW) hashing - what are the tradeoffs?