String Algorithms
Efficient string searching powers text editors, search engines, and bioinformatics. KMP improves naive O(n*m) to O(n+m).
def build_lps(pattern):
m = len(pattern)
lps = [0] * m
prev = i = 0
while i < m:
if i == 0: prev = 0
if pattern[i] == pattern[prev]:
prev += 1; lps[i] = prev; i += 1
elif prev:
prev = lps[prev-1]
else:
lps[i] = 0; i += 1
return lps
def kmp_search(text, pattern):
n, m = len(text), len(pattern)
lps = build_lps(pattern)
matches = []
i = j = 0
while i < n:
if text[i] == pattern[j]:
i += 1; j += 1
if j == m:
matches.append(i - j)
j = lps[j-1]
elif i < n and text[i] != pattern[j]:
if j: j = lps[j-1]
else: i += 1
return matches
print(kmp_search("AABAACAADAABAABA", "AABA")) # [0, 9, 12]
Rabin-Karp Rolling Hash
def rabin_karp(text, pattern, q=101):
n, m = len(text), len(pattern)
d = 256
h = pow(d, m-1, q)
p_hash = t_hash = 0
matches = []
for i in range(m):
p_hash = (d*p_hash + ord(pattern[i])) % q
t_hash = (d*t_hash + ord(text[i])) % q
for i in range(n-m+1):
if p_hash == t_hash:
if text[i:i+m] == pattern: matches.append(i)
if i < n-m:
t_hash = (d*(t_hash - ord(text[i])*h) + ord(text[i+m])) % q
if t_hash < 0: t_hash += q
return matches
Q: What makes KMP faster than naive?
Naive restarts from position 0 on each mismatch. KMP precomputes LPS (failure function) to know exactly how far back to go — the text pointer i never moves backward, guaranteeing O(n) traversal.
Comments (0)
No comments yet. Be the first!
Leave a Comment