Data Stream

Catalogue
  1. 1. Data Stream
    1. 1.1. 703. Kth Largest Element in a Stream
    2. 1.2. 346. Moving Average from Data Stream
    3. 1.3. 295. Find Median from Data Stream
    4. 1.4. Find the first non-repeating character from a Stream
  2. 2. Sliding Window
    1. 2.1. Slinding Window Average
    2. 2.2. 239. Sliding Window Maximum
      1. 2.2.1. Solution1. Naive
      2. 2.2.2. Solution2. Heap
      3. 2.2.3. Solution3. Deque
    3. 2.3. 480. Sliding Window Median

Data Stream

703. Kth Largest Element in a Stream

  • heap
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class KthLargest(object):

def __init__(self, k, nums):
self.size = k
heapq.heapify(nums)
self.heap = nums
while len(self.heap) > k:
heapq.heappop(self.heap)

# Time : O(log k), Space : O(k)
def add(self, val):
heapq.heappush(self.heap, val)
if len(self.heap) > self.size:
heapq.heappop(self.heap)
return self.heap[0]

346. Moving Average from Data Stream

  • sliding window : deque - double queue
1
2
3
4
5
6
7
8
9
10
11
12
13
class MovingAverage(object):

def __init__(self, size):
self.sum = 0
self.size = size
self.queue = collections.deque()

def next(self, val):
self.queue.append(val)
self.sum += val
if len(self.queue) > self.size:
self.sum -= self.queue.popleft()
return self.sum / (1.0 * len(self.queue))

295. Find Median from Data Stream

  • min Heap, max Heap
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class MedianFinder(object):

def __init__(self):
self.maxHeap = []
self.minHeap = []

def addNum(self, num):
heapq.heappush(self.maxHeap, -num)
heapq.heappush(self.minHeap, -heapq.heappop(self.maxHeap))
m1, m2 = len(self.maxHeap), len(self.minHeap)
if m1 < m2:
heapq.heappush(self.maxHeap, -heapq.heappop(self.minHeap))

def findMedian(self):
m1, m2 = len(self.maxHeap), len(self.minHeap)
if m1 == m2:
return (self.minHeap[0] - self.maxHeap[0]) / 2.0
else:
return -self.maxHeap[0] * 1.0
  • follow up : what if the number of element is toooo large to be stored into the memory?

Find the first non-repeating character from a Stream

Use Case:

  1. We need to somehow record which one is the First (order)
    When a new element comes in
  • we found that previous solution is now not the solution any more, we need to update the solution to the next one.(Use a doubl linked list)
  1. We also need to record what kind of letters that have appeared (non-repeating)
  • Hash Map <key:character, val:Node>

(Same with LRU!!!)

Sliding Window

Slinding Window Average

  • Time : O(1), Space :(window size)
  • Queue to maintain the window
  • variable sum to record sum , update O(1)

239. Sliding Window Maximum

total data - n, window size - k

Solution1. Naive

Time : O((n-k)k), (O(k) for online data stream)

Solution2. Heap

Time : O((n-k)logk)

  1. Initialization:insert all first k elements into the max-heap.
  2. Then: when the sliding window moves to the right side step by step
    1 new element comes in.
    1 left-most element should be removed from the sliding window. bu we can temoporarily keep it in the heap, until it becomes the top element in the heap. lazy deletion
    (value, index)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Solution(object):
def maxSlidingWindow(self, nums, k):
if not nums or k == 0:
return []

maxHeap = []
for i in xrange(k-1):
heapq.heappush(maxHeap, (-nums[i], i))

res = []
for i in xrange(k-1, len(nums)):
heapq.heappush(maxHeap, (-nums[i], i))
while maxHeap[0][1] <= i-k:
heapq.heappop(maxHeap)
res.append(-maxHeap[0][0])
return res
  • When we want to call max_heap.top(), the only thing we should be careful about is to check, whether this top elemtns’s index is < left border of the sliding window. If so, keep poping it out.

Solution3. Deque

Time : O(n)

  • We must maintain all the values in the dequeue to keep them in a descending order.
  • because when a new element X comes in, if it is bigger and newer than the right most element r, then r cannot be the solution. whatsoever, we can just delete r
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Solution(object):
def maxSlidingWindow(self, nums, k):
queue = collections.deque()
res = []
for i, num in enumerate(nums):
if queue and queue[0] <= i-k:
queue.popleft()

while queue and num > nums[queue[-1]]:
queue.pop()

queue.append(i)
if i >= k - 1:
res.append(nums[queue[0]])
return res
  • So the dequeue[left-most] is the final result to return whenever the window slides one step to the right.

480. Sliding Window Median

  • double heap + lazy clear
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class Solution(object):
def medianSlidingWindow(self, nums, k):
if k == 1:
return map(float, nums)

minHeap, maxHeap = [], []
for i in xrange(k):
heapq.heappush(maxHeap, (-nums[i], i))

for _ in xrange(k/2):
val, idx = heapq.heappop(maxHeap)
heapq.heappush(minHeap, (-val, idx))

res = []
for i in xrange(k, len(nums)):
res.append(-maxHeap[0][0]*1.0 if k % 2 else (minHeap[0][0]-maxHeap[0][0])/2.0)

if nums[i] >= minHeap[0][0]:
heapq.heappush(minHeap, (nums[i], i))
if nums[i-k] <= -maxHeap[0][0]:
val, idx = heapq.heappop(minHeap)
heapq.heappush(maxHeap, (-val, idx))
else:
heapq.heappush(maxHeap, (-nums[i], i))
if nums[i-k] >= minHeap[0][0]:
val, idx = heapq.heappop(maxHeap)
heapq.heappush(minHeap, (-val, idx))

while maxHeap[0][1] <= i-k:
heapq.heappop(maxHeap)
while minHeap[0][1] <= i-k:
heapq.heappop(minHeap)
res.append(-maxHeap[0][0]*1.0 if k % 2 else (minHeap[0][0]-maxHeap[0][0])/2.0)
return res
Share