Kth Smallest Element and median of Two Sorted Arrays

Given two sorted arrays find the element which would be kth in their merged and sorted combination.

For example, A=[1, 1, 2, 3, 10, 15] and B=[-1, 2, 3, 4, 6, 7] then k=8th smallest element would be 4 as it appears in 8th position of the merged sorted array=[-1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 10, 15].

A trivial solution would be to do a merge operation as we do in merge phase of the merge sort while increasing a counter whenever we chose an element in the merge phase (i.e. when we increase one of the marker index). This guarantees O(k) time solution. Can we do better?

As both the arrays are sorted so in best case scenario when both arrays are identical then kth element in the merged array would be k/2 the element in one of the arrays. If they are not identical then kth element will reside in one of the 2 partitions of one of the sorted arrays. This signals us that we could discard searching in some partitions. How? Off-course we can try binary search over two sorted arrays.

We can try to find two mid point index i and j in the two arrays respectively such that A[i] falls in between B[j-1] and B[j] i.e. B[j-1] < A[i] < B[j] . Notice that this guarantees that A[i] is that (i+j+1) th smallest element because there is exactly j elements in B less than A[i] (because A[i] > B[j-1] ) and there are exactly i elements in A less than A[i]. So, if we want to find kth smallest element then we need to find middle element A[i] and B[j]
maintaining the invariant that i+j+1 = k.

i + j = k – 1

Also note that if we have A[i] > B[j] then we know that kth smallest can't be in A[i+1..] or in B[..j]. So we can discard these two halves. This guarantees logarithm search time. Also, note that if A[i] > B[j] then we are already discarding elements in B[..j] which are smaller than kth smallest. So, we arrive the following conclusions -

  • A[i] > B[j] left partition of B are smaller and right partition of A are greater than kth smallest.
    • Discard A[i+1..r1] and B[p2..j]
    • Update k = k-(j-p2+1) as we are discarding smaller elements in B[p2..j]
  • Otherwise left partition of A are smaller and right partition of B are greater than kth smallest.
    • Discard A[p1..i] and B[j+1..r2]
    • Update k = k-(i-p1+1) as we are discarding smaller elements in A[p1..i]

Below is the implementation of the above binary search based algorithm that takes O(lgn1+lgn2) = O(lgn) time to find the kth element from two sorted arrays. We can also use this utility to find median of two sorted arrays. If n1+n2=n is even then median is the average of k=(n/2-1) th and k=(n/2)th smallest elements. If n is odd then median is k=(n/2)th smallest element.

public static double findMedianSortedArrays1(int A[], int B[]) {
	int m = A.length;
	int n = B.length;
 
	if ((m + n) % 2 != 0) // odd
		return (double) findKth(A,0, m - 1, B, 0, n - 1, (m + n) / 2);
	else { // even
		return (findKth(A, 0, m - 1, B, 0, n - 1, (m + n) / 2) 
			+ findKth(A, 0, m - 1, B, 0, n - 1, (m + n) / 2 - 1)) * 0.5;
	}
}

public static int findKth(int A[], int p1, int r1, int B[], int p2, int r2, int k) {
	int n1 = r1-p1+1;
	int n2 = r2-p2+1;
	
	//base cases
	if(n1 == 0){
		return B[p2+k];
	}
	if(n2 == 0){
		return A[p1+k];
	}
	//
	if(k == 0){
		return Math.min(A[p1], B[p2]);
	}
	
	//select two index i,j from A and B respectively such that If A[i] is between B[j] and B[j-1] 
	//Then A[i] would be the i+j+1 smallest element because.
	//Therefore, if we choose i and j such that i+j = k-1, we are able to find the k-th smallest element.
	int i = n1/(n1+n2)*k;//let's try tp chose a middle element close to kth element in A 
	int j = k-1 -i;
	
	//add the offset
	int mid1 = Math.min(p1+i, r1);
	int mid2 = Math.min(p2+j, r2);
	
	//mid1 is greater than mid2. So, median is either in A[p1...mid1] or in B[mid2+1...r2].
	//we have already see B[p2..mid2] elements smaller than kth smallest
	if(A[mid1] > B[mid2]){
		k = k - (mid2-p2+1);
		r1 = mid1;
		p2 = mid2+1;
	}
	//mid2 is greater than or equal mid1. So, median is either in A[mid1+1...r1] or in B[p2...mid2].
	//we have already see A[p1..mid1] elements smaller than kth smallest
	else{
		k = k - (mid1-p1+1);
		p1 = mid1+1;
		r2 = mid2;
	}
	
	return findKth(A, p1, r1, B, p2, r2, k);
}