動機

記錄用

TODO: 這能幹嘛

定義

就是一個array是map一個排名到以該數字開始的suffix

排名就是把suffix sort過的順序

像是[2,3,0,4,1]就是

  • 以2開始的suffix是第一名
  • 以3開始的suffix是第二名

流程

concept(用到分塊與倍增):

  1. counting sort
  2. 分class

init:

  1. 做counting sort (對每個字做)
  2. 分class (sort過,所以只要前後字不一樣就是不同classes)

2, 4, 8 …:

  1. 先取前半string的位置
  2. 以前半的class做counting sort
  3. 分class (前半要在同一個class,後半也要在同一個class)
def count_sort(arr, kinds, f):
  ret = [0]*len(arr)
  cnts = [0]*kinds
  for n in arr:
    cnts[n] += 1
  for i in range(1,kinds):
    cnts[i] += cnts[i-1]
  for i in reversed(range(len(arr))): # reversed!!
    cnts[arr[i]] -= 1
    ret[cnts[arr[i]]] = f[i]
  return ret

def classify(arr, p, isDiffLevel):
  l = [0]*len(arr)
  level = 0
  for i in range(1,len(arr)):
    if isDiffLevel(i,i-1):
      level += 1
    l[p[i]] = level
  return l, level+1

s = [0,0,1,0]
p = count_sort(s, len(s), list(range(len(s))))
l, levels = classify(s, p, lambda a,b: s[p[a]] != s[p[b]])

for h in filter(lambda h: (1 << h) < len(s), range(len(s))):
  def isDiff(a,b):
    rhs = (l[p[a]], l[(l[a]+(1<<h))% len(s)] )
    lhs = (l[p[b]], l[(l[b]+(1<<h))% len(s)] )
    return rhs != lhs
  p_lhs = [p[i] - (1 << h) + (len(s) if p[i] < (1 << h) else 0) for i in range(len(s))]
  p = count_sort([l[x] for x in p_lhs], levels, p_lhs)
  l, levels = classify(s, p, isDiff)
print(p)

Ref

Suffix Array