Given an unbalanced binary tree, write code to select k sample node at random

straightforward solution would be to list all the nodes in any traversal order (BFS, DFS, etc) and find random k index from the n nodes in the list. But how this approach would behave if n is very large and k is very small? or n is unknown? When n is very large the probability of choosing an element from n , k/n is very very small number when n>>k. Usual method for generating (e.g. random()*(i+1) or rand()%(i+1)). is very skewed. The distribution of selection probability trends to have long negative tail. So, obviously you can’t guarantee an uniform sampling.

Choosing uniformly distributed random sample from an unknown (or very large) polulatiom is an interesting problem. There is a very famous algorithm to solve this problem in O(n) using randomized algorithms. This is called Reservoir sampling.

The idea is to keep a reservoir of k elements and select an element from the population of size n with equal probability of k/n. How to do that?

- Select first k (index i = 0 to k-1) elements and add to the reservoir
- For rest of the elements, iterate for each element at index i = k to n, find a uniform random index 0<=j<=i
- If j is less than k then replace the reservoir element at index j by current population element at index i.

Question is how the uniform selection probability is guaranteed? Let’s do some math. An element can be in the final reservoir if either it was selected from index 0 to k-1 and was not replaced in subsequent iterations or it was selected from index k to n-1 and took place in reservoir by replacing an element at index 0 to k-1.

- Case 1 (elements from index 0 to k-1) : probability that an element at index i=0 to k-1 will be in the reservoir = probability for choosing each of the random index selected for replacement during iterations k to n-1 without using index i element = (k/k+1)*(k+1/k+2)*…*((n-1)/n) = k/n
- Case 2 (elements from index k to n-1) : probability that an element at index i=k to n-1 will be in the reservoir = (probability of choosing an element at index i=k to n-1) * (probability of choosing an element at index j =0 to k-1 to be replaced by element at i). Let’s start from last element. probability of choosing last element at index i=n-1 = probability of choosing an element at index j=0 to k-1 = k/n. Let’s consider second last term. Probability of choosing element at index i=n-2 = (probability that element at index j = 0 to k-1 was selected for replacement in previous iteration, i=n-2) * (probability that index j=0 to k-1 selected for replacement, in current iteration is not equal to j’) = (k/(n-1))*((n-1)/n) = k/n

Now, we understand that we can use reservoir sampling to get uniformly random sample of size k from a population of size n. Let’s get back to original problem. We need to find the samples for a binary tree. Without knowing the total number of elements we can find k random sample by keeping an index starting at index=0 and increment by one whenever we come across a tree node. Using such an index we can apply reservoir sampling while traversing the tree in any traversal order as follows in O(n) time –

public static TreeNode[] randomKSampleTreeNode(TreeNode root, int k){ TreeNode[] reservoir = new TreeNode[k]; Queue<TreeNode> queue = new LinkedList<TreeNode>(); queue.offer(root); int index = 0; //copy first k elements into reservoir while(!queue.isEmpty() && index < k){ TreeNode node = queue.poll(); reservoir[index++] = node; if(node.left != null){ queue.offer(node.left); } if(node.right != null){ queue.offer(node.right); } } //for index k+1 to the last node of the tree select random index from (0 to index) //if random index is less than k than replace reservoir node at this index by //current node while(!queue.isEmpty()){ TreeNode node = queue.poll(); int j = (int) Math.floor(Math.random()*(index+1)); index++; if(j < k){ reservoir[j] = node; } if(node.left != null){ queue.offer(node.left); } if(node.right != null){ queue.offer(node.right); } } return reservoir; }