Implementing Quicksort in Python

Just like Mergesort, Quicksort is another favorite sorting algorithm with best case O(n logn) but worst case O(n^2). However, compare to merge sort, this algorithm doesn’t use extra space at all. Find the article on wikipedia if you haven’t seen this algorithm yet: https://en.wikipedia.org/wiki/Quicksort

The idea is as below:

  • Pick pivot number (any number, but someone said pick middle or random to minimize the worst case)
  • Reorder the array so that elements less than pivot come before the pivot, and elements greater than pivot come after. After this, pivot is in its final position. This process is called partitioning.
  • Recursively apply the above to sublists elements with smaller values and separately with greater values.
def quicksort(nums):
  sort(nums, 0, len(nums)-1)

def sort(nums, first, last):
  if first < last:
    pivot = partition(nums, first, last)
    sort(nums, first, pivot-1)
    sort(nums, pivot+1, last)

def partition(nums, first, last):
  pivot = first + (last - first + 1)/2
  swap(nums, pivot, last)
  for i in range(first, last):
    if nums[i] <= nums[last]:
      swap(nums, i, first)
      first += 1
  swap(nums, first, last)
  return first

def swap(nums, left, right):
  nums[left], nums[right] = nums[right], nums[left]

Implementing Merge Sort in Python

Merge Sort

Merge sort is a popular example for divide and conquer algorithm. Divide the lists and merge it back in sorted order. Check out the wikipedia if you haven’t seen it before, it’s cool: https://en.wikipedia.org/wiki/Merge_sort

The basic idea of the algorithm is as follow:

  • Given list of numbers, divide into sublist until we have n number of list that contain 1 number each
  • Repeatedly merge the two sublists together until we obtain sorted list

The solution in python is as below:

def merge_sort(nums):
  if len(nums) <= 1:
    return nums
  mid = len(nums)/2
  left = merge_sort(nums[:mid])
  right = merge_sort(nums[mid:])
  return merge(left, right)

def merge(left, right):
  if not left:
    return right
  if not right:
    return left
  if left[0] < right[0]:
    return [left[0]] + merge(left[1:], right)
  return [right[0]] + merge(left, right[1:])

Remove Element and Move Zeros in Array: using two pointers

These are simple questions asked in Facebook interview.

Part I

Given an array and a value, remove all instances of that value in place and return the new length.

The order of elements can be changed. It doesn’t matter what you leave beyond the new length.

My Solution

The idea is to keep pointer to position index and store here if we found element that is not zero. The time complexity is O(n).

def remove(nums):
  """
  :type nums: List[int]
  :type val: int
  :rtype: int
  """
  if not nums:
    return 0
  i = 0
  for n in nums:
    if n != val:
      nums[i] = n
      i += 1
  return i
  

Part II

Given an array nums, write a function to move all 0’s to the end of it while maintaining the relative order of the non-zero elements.

For example, given nums = [0, 1, 0, 3, 12], after calling your function, nums should be [1, 3, 12, 0, 0].

Note:
You must do this in-place without making a copy of the array.
Minimize the total number of operations.

My Solution

The idea is similar to the first one. Keep a pointer to indicate the position of array to store to. The time complexity is O(n).

def move_zero(nums):
  i = 0
  for n in nums:
    if n != 0:
      nums[i] = n
      i += 1
  while i<len(nums):
    nums[i] = 0
    i += 1

Remove Duplicates from Sorted Array: using two pointers

These two questions were asked in Facebook interview.

Part I

Given a sorted array, remove the duplicates in place such that each element appear only once and return the new length.

Do not allocate extra space for another array, you must do this in place with constant memory.

For example,
Given input array nums = [1,1,2],

Your function should return length = 2, with the first two elements of nums being 1 and 2 respectively. It doesn’t matter what you leave beyond the new length.

My Solution

The idea is very simple to get time complexity of O(n). Use the first pointer to point the position which element is being stored to, and the other pointer to iterate the elements. If the current element and the previous element are different, store the current element in the first pointer and move one. Make exception for the first element and in case element is only 0 or 1 size.

def remove_duplicates(nums):
  if not nums:
    return 0
  idx = 0
  for i in range(len(nums)):
    if i==0 or nums[i] != nums[i-1]:
      nums[idx] = nums[i]
      idx += 1
  return idx

Part II

Follow up for “Remove Duplicates”:
What if duplicates are allowed at most twice?

For example,
Given sorted array nums = [1,1,1,2,2,3],

Your function should return length = 5, with the first five elements of nums being 1122 and 3. It doesn’t matter what you leave beyond the new length.

My Solution

The idea is to skip the middle element and check the previous and next elements. If these are different, store the current element to the position index. The time complexity for this would be O(n).

def max_two_duplicates(nums):
  if len(nums) < 3:
    return len(nums)
  idx = 1
  for i in range(1, len(nums)-1):
    if nums[i-1] != nums[i+1]:
      nums[idx] = nums[i]
      idx += 1
  # add the last element
  nums[idx] = nums[-1]
  return idx + 1

Summary Ranges: using two pointers to return summary of an array

This was asked in Google interview.

Given a sorted integer array without duplicates, return the summary of its ranges.

For example, given [0,1,2,4,5,7], return ["0->2","4->5","7"].

My Solution

The idea is using two pointers to maintaining the start index and end index, then format it correctly. If the start and end index are the same, format it without the arrow, otherwise add arrow in between. The time complexity would be O(n) and the space is O(n) since I use extra space array to store the result.

def summary_range(nums):
  if not nums:
    return []
  start, i, res = 0, 0, []
  while i<len(nums)-1:
    if nums[i+1]-nums[i] != 1:
      format_range(res, nums, start, i)
      start = i+1
    i += 1
  # for the last element or in case of 1 element
  format_range(res, nums, start, i)
  return res

def format_range(res, nums, start, end):
  if start == end:
    res.append(str(nums[start]))
  else:
    res.append(str(nums[start]) + "->" + str(nums[end]))

Spiral Matrix

This question was asked during my second round interview with Yahoo search team.

Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in spiral order.

For example,
Given the following matrix:

[
 [ 1, 2, 3 ],
 [ 4, 5, 6 ],
 [ 7, 8, 9 ]
]

You should return [1,2,3,6,9,8,7,4,5].

My Solution

This is pretty straight forward O(mn) solution by traversing right, down, left, and up until it reaches the center. But careful with the duplication by restricting left and up traversal with the start/end indexes.

def spiral_order(matrix):
  if not matrix:
    return []
  res = []
  rowStart, rowEnd = 0, len(matrix)
  colStart, colEnd = 0, len(matrix[0])

  # traverse until the center matrix
  while rowStart < rowEnd and colStart < colEnd:
    # traverse right
    for i in range(colStart, colEnd):
      res.append(matrix[rowStart][i])
    rowStart += 1

    # traverse down
    for j in range(rowStart, rowEnd):
      res.append(matrix[j][colEnd-1])
    colEnd -= 1

    # traverse left, careful on the duplication
    if rowStart < rowEnd:
      for k in range(colEnd-1, colStart-1, -1):
        res.append(matrix[rowEnd-1][k])
    rowEnd -= 1

    # traverse up, careful on the duplication
    if colStart < colEnd:
      for l in range(rowEnd-1, rowStart-1, -1):
        res.append(matrix[l][colStart])
    colStart += 1

  return res

Word Search II

This question is a follow up from my previous post (Word Search) that was asked in Airbnb.

Given a 2D board and a list of words from the dictionary, find all words in the board.

Each word must be constructed from letters of sequentially adjacent cell, where “adjacent” cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

For example,
Given words = ["oath","pea","eat","rain"] and board =

[
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]

Return ["eat","oath"].

Note:
You may assume that all inputs are consist of lowercase letters a-z.

My Solution

Unlike previous question, this has to be optimized to accommodate large test cases as we currently have list of words instead of just one word (see my previous post). This can be done by immediately stop the backtrack if current candidate doesn’t exist in all words prefix. The idea is to build up the prefix of all words using trie, and limit our scope of search in a trie to speedup the computation. And again do DFS to search all possible paths/strings and append to the result. During the traversal, if we reach the leaf of a node, it means we have found a path, add it to the result but keep going as other words might also contain this word as their prefixes. Return if a letter doesn’t exist in the trie at all.

def word_search(board, words):
  if not board or not words:
    return None
  # build up the prefix trie
  trie = {}
  for w in words:
    t = trie
    for c in w:
      if c not in trie:
        t[c] = {}
      t = t[c]
    t["#"] = "#"  # mark the leaf node
  
  res = []
  for i in range(len(board)):
    for j in range(len(board[0])):
      dfs_helper(board, i, j, trie, '', res)
  return list(set(res))  # a word might have more than one path, use set to remove duplicates

def dfs_helper(board, i, j, trie, path, res):
  # we found leaf node, do not return here but keep going
  if "#" in trie:
    res.append(path)  # add to res array, remember in python array is referred by reference, not value like Java
  
  if i<0 or j<0 or i>=len(board) or j>=len(board[0]) or board[i][j] not in trie:
    return
  tmp = board[i][j]  # first letter found
  board[i][j] = "@"  # mark visited

  res = dfs_helper(board, i-1, j, trie[tmp], path+tmp, res) or \
        dfs_helper(board, i+1, j, trie[tmp], path+tmp, res) or \
        dfs_helper(board, i, j-1, trie[tmp], path+tmp, res) or \
        dfs_helper(board, i, j+1, trie[tmp], path_tmp, res)

Word Search

This question was asked in Facebook:

Given a 2D board and a word, find if the word exists in the grid.

The word can be constructed from letters of sequentially adjacent cell, where “adjacent” cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

For example,
Given board =

[
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]

word = "ABCCED", -> returns true,
word = "SEE", -> returns true,
word = "ABCB", -> returns false.

My Solution

We can see clearly this is a backtracking problem (depth-first search). By trying the letter from top left, and using DFS check if there are ways to construct the word vertically and horizontally. If we can’t find the solution, keep going to try different letter until we have tried all possibilities. This approach will cost O(n^2) to check all elements, plus O(b^m) where b is branching factor (in this case: 2) and m is maximum depth of the state space.

def word_exist(board, word):
  if not board or not word:
    return False
  for i in range(len(board)):
    for j in range(len(board[0])):
      if self.dfs(board, word, i, j):
        return True
  return False

# traverse the grid
def dfs(board, word, i, j):
  if not word:  # we have reached end of the word
    return True
  if i<0 or j<0 or i>=len(board) or j>=len(board[0]) or word[0] != board[i][j]:
    return False

  # we found the first letter
  tmp = board[i][j]  # store the letter in tmp var to do backtracking later
  board[i][j] = "#"  # mark as visited

  res = dfs(board, word[1:], i-1, j) or \
        dfs(board, word[1:], i+1, j) or \
        dfs(board, word[1:], i, j-1) or \
        dfs(board, word[1:], i, j+1)
  board[i][j] = tmp  # restore back the original letter
  return res
  

Triangle

Given a triangle, find the minimum path sum from top to bottom. Each step you may move to adjacent numbers on the row below.

For example, given the following triangle

[[2],
[3,4],
[6,5,7],
[4,1,8,3]]

The minimum path sum from top to bottom is 11 (i.e., 2 + 3 + 5 + 1 = 11).

Note:
Bonus point if you are able to do this using only O(n) extra space, where n is the total number of rows in the triangle.

It is a good practice to approach this problem using dynamic programming technique. However, at first the array looks like a tree from its form, where quickly people think to traverse the numbers using DFS. If we look carefully, every node here always shares a child which is an “overlapping subproblems”. For example from the above example, in the 2nd row: 3 has 6 and 5 as children; 4 has 5 and 7 as children. Here, 5 is both child for 3 and 4. Going down to the bottom we will find more shared children. Clearly this is a dynamic programming problem.

This can be solved using both top-down or bottom-up techniques. I would pick bottom-up as this is pretty straight forward. We will go from the second last row. Path sum of a node will be the minimum value of its two children plus its own value. From the above example, in case of row 3 (2nd last row), path sum of 6 will be ‘min(4,1) + 6’. By generalizing this, we will get the bottom-up approach as below with space complexity O(1) and time complexity O(n^2):

def min_sum(triangle):
  if not triangle:
    return None
  for i in range(len(triangle)-2, -1, -1):
    for j in range(len(triangle[0])):
      triangle[i][j] += min(triangle[i+1][j], triangle[i+1][j+1])
  return triangle[0][0]

A slightly different bottom-up approach if we don’t want to change the original input due to thread safe issue or whatever reason. This is going to be O(n) space complexity. Note that when we compute a row, rows after this would be useless for the result, thus we can do this by using 1D array and iteratively update itself.

def min_sum(triangle):
  if not triangle:
    return None
  res = triangle[-1]
  for i in range(len(triangle)-2, -1, -1):
    for j in range(len(triangle[0])):
      res[j] = min(res[j], res[j+1]) + triangle[i][j]
  return res[0]

Contains Duplicate

Part I

This is a 3 sets of array problems from Airbnb. Part I would be very easy and straight forward:

Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears at least twice in the array, and it should return false if every element is distinct.


My Solution

The easiest way would be sort the array first and check if the current element is different with the previous element. The time complexity would be O(n log n).

def contain_duplicate(nums):
  """
  :type nums: List[int]
  :rtype: bool
  """
  if not nums or len(nums) == 1:
    return False
  nums = sorted(nums)
  for i in range(1, len(nums)):
    if nums[i] == nums[i-1]:
      return True
  return False

This can be optimized to O(n) using hash table.

def contain_duplicate(nums):
  table = {}
  for n in nums:
    if n in table:
      return True
    table[n] = 1
  return False

Part II

Given an array of integers and an integer k, find out whether there are two distinct indices i and j in the array such that nums[i] = nums[j] and the difference between i and j is at most k.


My Solution

The approach is the same, store the nums as key and index as value, if the nums exist in the table compare both of indexes.

def contain_duplicate(nums, k):
  table = {}
  for i, n in enumerate(nums):
    if n in table and i - table[n] &amp;lt;= k:
      return True
    table[n] = i
  return False

Part III

Given an array of integers, find out whether there are two distinct indices i and j in the array such that the difference between nums[i] and nums[j] is at most t and the difference between i and j is at most k.


My Solution

This problem is very tricky, compare to the first two problems. In order to get O(n), we can use the idea from bucket sort. We keep a bucket with size k that will dynamically change. If we use BST to hold the nums, since BST would be O(log n) to access the element, the whole solution would cost O(n log n). But if we use bucket (or sliding window) by applying hash table, since lookup and deleting in hashtable only cost O(1), the whole solution would cost O(n).

For example, suppose we have t=2 and we found 3 in the array. The possible nums would be 1,2,3,4,5, by dividing with t, 3 would be in the bucket #1 (3 divided by 2) and the possible numbers can be found in bucket #0 or #2 or #1 itself. We then maintain the size of bucket to be k and delete the bucket that is far away from the index. It is guaranteed that at any time there will be at most min(k, len(nums)). The solution is as below:

def contain_duplicate(nums, k, t):
  buckets = {}
  offset = 1
  for i, n in enumerate(nums):
    # beware of t=0
    b_num = n/t if t else n  
    for b in range(b_num - offset, b_num + offset + 1):
      if b in buckets and abs(buckets[b] - n) <= t:
        return True

    buckets[b_num] = n

    # remove index that is too far
    if len(buckets) > k:
      b_del = nums[i-k]/t if t else nums[i-k]
      del buckets[b_del]
  return False