String Subsequence

Catalogue
  1. 1. 392. Is Subsequence
    1. 1.1. Solution1. Recursion
    2. 1.2. Solution2. Iteration
    3. 1.3. Follow up : multiple strings
  2. 2. 792. Number of Matching Subsequences
    1. 2.1. Solution1. Binary Search + Hash Table
    2. 2.2. Solution2. Next Letter Pointer
  3. 3. 727. Minimum Window Subsequence
  • For a string with length n, it has $2^n$ Subsequences
  • Check a string is or not a subsequence of another string (a litte greedy)

    392. Is Subsequence

  • Time : O(|t|)

Solution1. Recursion

1
2
3
4
5
6
7
8
9
class Solution:
def isSubsequence(self, s, t):
if not s:
return True

for i in range(len(t)):
if t[i] == s[0]:
return self.isSubsequence(s[1:], t[i+1:])
return False

Solution2. Iteration

  • Two Pointer and Greedy
    1
    2
    3
    4
    5
    6
    7
    8
    9
    class Solution:
    def isSubsequence(self, s, t):
    p = 0
    for i in range(len(t)):
    if p >= len(s):
    break
    if s[p] == t[i]:
    p += 1
    return p == len(s)

Follow up : multiple strings

If there are lots of incoming S, say S1, S2, … , Sk where k >= 1B, and you want to check one by one to see if T has its subsequence.

  • intuitive way is to precalculate something in T to speed up the query
  • Precalculate : O(n), Query : (mlogn)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
class Solution:
def isSubsequence(self, s, t):
indices = collections.defaultdict(list)
for i, ch in enumerate(t):
indices[ch].append(i)

prev = -1
for i in range(len(s)):
if s[i] not in indices:
return False
idx = self.bSearch(prev, indices[s[i]])
if idx == -1:
return False
prev = idx
return True

def bSearch(self, prev, nums):
if prev >= nums[-1]:
return -1

start, end = 0, len(nums) - 1
while start < end:
mid = start + (end - start) // 2
if nums[mid] > prev: # Mistake2 : we should find greater than previous index CANNOT equal
end = mid
else:
start = mid + 1

return nums[start] if nums[start] > prev else -1
# Mistake1 : forget to consider the index may not in the nums!!!

792. Number of Matching Subsequences

Solution1. Binary Search + Hash Table

  • Online Algorithms, Precalculate String
  • Data Structure : Hash Table <charater, indices>
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    class Solution:
    def numMatchingSubseq(self, S, words):
    # Time O(n)
    indices = [[] for _ in range(26)]
    for i, ch in enumerate(S):
    indices[ord(ch)-ord('a')].append(i)

    cnt = 0
    mdic = {}
    for word in words:
    if word in mdic:
    cnt += mdic[word]
    continue

    if self.isMatch(indices, word):
    mdic[word] = 1
    cnt += 1
    else:
    mdic[word] = 0

    return cnt

    # Time : O(mlog n) n is the length of S, m is the length of word
    def isMatch(self, indices, word):
    prev = -1
    for ch in word:
    idx = self.bSearch(prev, indices[ord(ch)-ord('a')])
    if idx == -1:
    return False
    prev = idx
    return True

    def bSearch(self, prev, nums):
    if not nums or prev >= nums[-1]:
    return -1

    start, end = 0, len(nums) - 1
    while start < end:
    mid = start + (end - start) // 2
    if nums[mid] > prev:
    end = mid
    else:
    start = mid + 1
    return nums[start] if nums[start] > prev else -1

Solution2. Next Letter Pointer

  • Offline Algorithms, Precalculate words
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class Solution:
def numMatchingSubseq(self, S, words):
indices = [[] for _ in range(26)]
for word in words:
indices[ord(word[0]) - ord('a')].append(word[1:])

cnt = 0
for ch in S:
cans = indices[ord(ch)-ord('a')]
newCans = []
for can in cans:
if len(can) == 0:
cnt += 1
continue
if can[0] == ch:
newCans.append(can[1:])
else:
indices[ord(can[0])-ord('a')].append(can[1:])
indices[ord(ch)-ord('a')] = newCans
return cnt

727. Minimum Window Subsequence

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class Solution(object):
def minWindow(self, S, T):
n = len(S)
# Must be the start index of the results!!!
cur = [i if S[i] == T[0] else -1 for i in range(n)]

for j in xrange(1, len(T)):
last = -1
new = [-1] * n
for i, ch in enumerate(S):
if last >= 0 and ch == T[j]:
new[i] = last
if cur[i] >= 0:
last = cur[i]
cur = new

start, end = 0, n
for e, s in enumerate(cur):
if s >= 0 and e - s < end - start:
start, end = s, e
return S[start:end+1] if end != n else ""
Share