Sampling & Randomization

Catalogue
  1. 1. Reservoir Sampling
    1. 1.1. Shuffling Algorithm
    2. 1.2. Sampling for an unlimited data flow
    3. 1.3. Return k out of n log
    4. 1.4. 398. Random Pick Index
    5. 1.5. Return a random largest number’s index
  2. 2. Randomization
    1. 2.1. 470. Implement Rand10() Using Rand7()
    2. 2.2. Design Random(1,000,000) with Random(5)
    3. 2.3. 528. Random Pick with Weight
  3. 3. Random
    1. 3.1. 380. Insert Delete GetRandom O(1)

Reservoir Sampling

Shuffling Algorithm

pokers(spades, hearts, diamonds, clubs)

  1. Iteration 1
    Call random(0-51), lets say, random_numbers1 = 5
    P(every card can showup in position 51) = 1/52

  2. Iteration 2
    Call random(0-50), lets say, random_numbers2 = 8
    Every card was NOT selected during previous iteration = 1 - 1/52
    P(every card showup in position 50) = (1-1/52) * 1/51 = 1/52

  3. Iteration 3
    Every card was NOT selected during previous iteration = 1 - 1/52
    Every card was NOT selected during previous iteration = 1 - 1/51
    P(every card showup in position 50) = (1-1/52) (1-1/51) 1/50 = 1/52

…..

  • Select random k elements in an array of size n

Sampling for an unlimited data flow

How to do sampling for an unlimited data flow and when reading the n-th element we are required to return one random number among all numbers read so far, such that the probability of returning any element read so far is 1/n.

  • O(1) sapce complexity
  1. t = 0 call random[0-0] = random_num1, if random_num1 == 0 -> solu = input[0]
    P(input[0] will be returned as the solu) = 100%
  2. t = 1 call random[0-1] = random_num2, if random_num2 == 0 -> solu = input[1]
    P(input[0] will be returned as the solu) = 1 - 50%
    P(input[1] will be returned as the solu) = 50%
  3. t = 2 call random[0-2] = random_num3, if random_num3 == 0 -> solu = input[2]
    P(input[0] will be returned as the solu) = 1/2 (1 - 1/3) = 1/3
    P(input[1] will be returned as the solu) = 1/2
    (1 - 1/3) = 1/3
    P(input[2] will be returned as the solu) = 1/3
    ….

Return k out of n log

  • same thing as above!
  • first k element put into the reservoir
  • when new element comes in, generate a random number from 0 to its index
  • if index < k, replace the new element with the k postion element

398. Random Pick Index

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Solution(object):

def __init__(self, nums):
self.nums = nums

def pick(self, target):
cnt = 0
res = 0
for i in xrange(len(self.nums)):
if self.nums[i] != target:
continue
if random.randint(0, cnt) == 0:
res = i
cnt += 1
return res

Return a random largest number’s index

Randomization

Random(5) -> Random(25) -> Random(7)
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24

  1. call Random(5) for the 1st time = random_num1[0—4] = row number
  2. call Random(5) for the 2nd time = random_num2[0—4] = col number
    generated_number = r1 * 5 + r2

470. Implement Rand10() Using Rand7()

1
2
3
4
5
6
7
8
9
class Solution(object):
def rand10(self):
r1 = rand7()
r2 = rand7()
num = (r1-1) * 7 + (r2-1)
if num <= 39:
return (num % 10) + 1
else:
return self.rand10()

Design Random(1,000,000) with Random(5)

  • $log_5(1000000)$

528. Random Pick with Weight

  • PreSum + Binary Search
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    class Solution(object):

    def __init__(self, w):
    if not w: return

    s = [w[0]]
    for weight in w[1:]:
    s.append(s[-1] + weight)
    self.sum = s
    print(self.sum)

    def pickIndex(self):
    num = random.randint(1, self.sum[-1])
    start, end = 0, len(self.sum)-1
    while start < end:
    mid = (start + end) / 2
    if self.sum[mid] < num:
    start = mid + 1
    else:
    end = mid
    return start

Random

380. Insert Delete GetRandom O(1)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class RandomizedSet(object):

def __init__(self):
self.nums = []
self.dic = {}

def insert(self, val):
if val in self.dic:
return False
self.nums.append(val)
idx = len(self.nums) - 1
self.dic[val] = idx
return True

def remove(self, val):
if val not in self.dic:
return False
last = len(self.nums) - 1
idx = self.dic[val]
if last != idx:
self.nums[last], self.nums[idx] = self.nums[idx], self.nums[last]
self.dic[self.nums[idx]] = idx
del self.dic[val]
self.nums.pop()
return True

def getRandom(self):
return random.choice(self.nums)
Share