Given a unsorted array with n elements. How can we find the largest gap between consecutive numbers of sorted version in O(n)?

For example, we have a unsorted array, a=[5, 3, 1, 8, 9, 2, 4] of size n=7 then the sorted version is
[1, 2, 3, 4, 5, 8, 9]. The output of the algorithm should be 8-5 = 3.

Similarly, for a=[5, 1, 8, 9, 999999, 99999] then answer should be 999999-99999 = 900000.

One way to do it is to using counting sort or radix sort if the integer range is known and small compared to size of array. But it is not feasible solution in case we don’t know the range of the values or the values can be very large.

If we think carefully we will observe that if the array contains n numbers within a contagious sequence of numbers with values between a min and max value (inclusive) then the maximum gap is one. Now, in real problem many of these numbers will be missing. It may happen that some range of values are missing in the sequence and thus creating gaps. If we can identify such missing ranges in a cheap way then we can solve this problem in O(n).

The idea is to bucketize the values between min and max value (exclusive) of the given array into a set of equal sized buckets so that at least one empty bucket is formed. Each of such empty buckets will create a gap of size equals to the difference between max value in the previous non-empty bucket and the minimum value in the next non-empty bucket. The motivation behind this is the Pigeonhole Principle which tells us that if we divide n-1 numbers in n buckets than at least one one bucket is empty

Below is a  linear‐time  algorithm  for  computing  the  maximum  gap  allowing  the  constant  time  computation.

Given  a  set  S  of  n > 2  real  numbers  x1, x2, …, xn.  

1. Find  the  maximum,  xmax  and  the  minimum,  xmin  in  S.  
2. Divide  the  interval  [xmin, xmax]  into  (n−1)  "buckets"  of  equal  size  δ = (xmax – xmin)/(n‐1). 
3. For  each  of  the  remaining  n‐2  numbers  determine  in  which  bucket  it  falls  using  the  floor  function.  The number  xi  belongs  to  the  kth bucket  Bk  if,  and  only  if,  (xi ‐ xmin)/δ = k‐1. 
4. For  each  bucket  Bk  compute  xkmin  and xkmax  among  the  numbers  that  fall  in  Bk.  If  the  bucket  is  empty return  nothing.  If the bucket  contains  only  one  number  return  that  number  as  both  xkmin  and xkmax.
5. Construct  a  list  L  of  all  the  ordered  minima  and  maxima:  L: (x1min, x1max), (x2min, x2max), … , (x(n‐1)min, x(n‐1)max), 
        • Note: Since  there  are  n‐1  buckets  and  only  n‐2  numbers,  by  the  Pigeonhole  Principle,  at least  one bucket  must  be  empty.  Therefore  the  maximum  distance  between  a  pair  of consecutive  points  must  be  at  least the  length  of  the  bucket.  Therefore  the  solution  is  not  found  among  a  pair  of  points  that  are  contained  in the  same  bucket. 
6. In  L  find  the  maximum  distance  between  a  pair  of  consecutive  minimum  and  maximum (ximax, xjmin),  where j > i. 
7.  Exit  with  this  number  as  the  maximum  gap.

For example, with a=[5, 3, 1, 8, 9, 2, 4],

a=[5, 3, 1, 8, 9, 2, 4], n=7
1. Min = 1, Max = 9
2. Create 7-1 = 6 buckets with equal size (or width), δ = (9-1)/(7-1) = 8/6 = 1.33
3. Populate bucket with rest of the 5 elements by putting a[i] to bucket number (a[i]-min)/δ. 
Compute max and min in each bucket: 
          
          b0    b1    b2    b3    b4    b5
         ____________________________________
        |_2___|__3__|__4__|__5__|______|__8__|
        ^     ^     ^     ^     ^      ^     ^
        1    2.33   3.66  5    6.33   7.66   9

4. Identify all the empty buckets i.e. gaps and compute gap value = max of previous nonempty bucket - min of next non-empty bucket. 
   For example, in this case b4 is an empty bucket, previous non empty bucket is b3 and next non-empty bucket is b5. 
   So, gap value at b4 = min(b5) - max(b3) = 8-5 = 3.
5. Update a global max and iterate over all empty buckets to perform step 3. 
   As there is only one bucket the answer is 3.

I have implemented the above algorithm using two simple auxiliary arrays to keep track of maximum and minimum in the n-1 buckets with n-2 values (excluding min and max value). The algorithm runs in O(n) time and O(n) space.

public static int maxGap(int[] a){
	int n = a.length;
	if(n < 2){
		return 0;
	}
	
	int max = Integer.MIN_VALUE;
	int min = Integer.MAX_VALUE;
	
	for(int i = 0; i < n; i++){
		max = Math.max(max, a[i]);
		min = Math.min(min, a[i]);
	}
	
	//n-1 buckets -  we only care about max and min in each buckets
	int[] bucketMaxima = new int[n-1];
	Arrays.fill(bucketMaxima, Integer.MIN_VALUE);
	int[] bucketMinima = new int[n-1];
	Arrays.fill(bucketMinima, Integer.MAX_VALUE);
	//bucket width
	float delta = (float)(max-min)/((float)n-1);
	
	//populate the bucket maxima and minima
	for(int i = 0; i < n; i++){
		if(a[i] == max || a[i] == min){
			continue;
		}
		
		int bucketIndex = (int) Math.floor((a[i]-min)/delta);
		bucketMaxima[bucketIndex] = bucketMaxima[bucketIndex] == Integer.MIN_VALUE ? a[i] : Math.max(bucketMaxima[bucketIndex], a[i]);
		bucketMinima[bucketIndex] = bucketMinima[bucketIndex] == Integer.MAX_VALUE ? a[i] : Math.min(bucketMinima[bucketIndex], a[i]);
	}
	
	//find the maxgap - maxgaps
	int prev =  min;
	int maxGap = 0;
	for(int i = 0; i< n-1; i++){
		//empty bucket according to Pigeonhole principle
		if(bucketMinima[i] == Integer.MAX_VALUE){
			continue;
		}
		
		maxGap = Math.max(maxGap, bucketMinima[i]-prev);
		prev = bucketMaxima[i];
	}
	
	maxGap = Math.max(maxGap, max-prev);
	
	return maxGap;
}

Algorithms

The  Maximum  Gap  Problem :  Pigeonhole  Principle

Leave a Reply Cancel reply

Leave a Reply Cancel reply

Login