1. Given a set of positive numbers less than or equal to N, where one number is missing, find the missing number efficiently.
2. Given a set of positive numbers less than or equal to N, where two numbers are missing, find the missing numbers efficiently.
3. Given a sequence of positive numbers less than or equal to N, where one number is repeated and another is missing, find the repeated and the missing numbers efficiently.
4. Given a sequence of integers (positive and negative), find the first missing positive number in the sequence.
Solutions should use no more than O(n) time and constant space.
1. A=[2,1,5,8,6,7,3,10,9] and N=10 then, 4 is missing.
2. A=[2,1,5,8,6,7,3,9] and N=10 then, 4 and 10 are missing.
3. A=[2,1,5,8,3,6,7,3,10,4] and N=10 then, 3 is repeating and 9 is missing.
4. A=[1,2,0] then the first missing positive is 3; for A=[3,4,-1,1], the first missing positive is 2.
Single Number Missing
A trivial approach would be to sort the array and loop from index 0 to N-1, checking whether index i contains the number i+1. This takes constant space but O(n lg n) time. We could use a counting sort instead, but that still takes O(n+k) time and O(k) space. We need O(n) time and constant space, so how do we do it?
It's rather simple if we apply some elementary mathematics. We know that the input set contains positive numbers less than or equal to N. If no number were missing, the sum of all the numbers would be N*(N+1)/2, which is the sum of the numbers 1 to N. But if one number is missing, then the sum S of the given numbers will be less than the expected sum N*(N+1)/2, and the difference (N*(N+1)/2 - S) is the missing number. This is an O(n) time and O(1) space algorithm.
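The summation idea can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `find_missing` is made up for this example, and it assumes the input really does contain every number from 1 to N except one.

```python
def find_missing(nums, n):
    """Return the single value in 1..n that is absent from nums.

    Assumes nums holds n - 1 distinct values drawn from 1..n.
    """
    expected = n * (n + 1) // 2   # sum of 1..n
    return expected - sum(nums)   # the shortfall is the missing number


print(find_missing([2, 1, 5, 8, 6, 7, 3, 10, 9], 10))  # 4
```

In a language with fixed-width integers the expected sum can overflow for large N, so a wider type (or the xor trick discussed later) may be preferable there.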
Two Missing Numbers
We can solve this with the same kind of math as above. Let's say p and q are the missing numbers among 1 to N. Then the sum S of the given input numbers satisfies
S = N*(N+1)/2 - p - q  =>  p + q = N*(N+1)/2 - S
Also, we know that the product of the numbers 1 to N is N!, so the product P of the given input numbers satisfies
P = N!/(p*q)  =>  p*q = N!/P
Then we can solve these two equations to find the missing numbers p and q. However, this approach has a serious limitation: the product of a large set of numbers quickly overflows a fixed-width integer. We could use a long, but that only postpones the overflow, and multiplication is not a cheap operation either.
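To make the two equations concrete, here is a small Python sketch that solves them as a quadratic: p and q are the roots of x^2 - (p+q)x + pq = 0. The function name is hypothetical. Note that Python integers have arbitrary precision, so the overflow described above does not occur here, but the product still grows as large as N!, so this is not truly constant space.

```python
import math


def two_missing_by_sum_and_product(nums, n):
    """Return (p, q), the two values of 1..n absent from nums, with p < q.

    Assumes nums holds n - 2 distinct values drawn from 1..n.
    """
    total = n * (n + 1) // 2 - sum(nums)             # p + q
    product = math.factorial(n) // math.prod(nums)   # p * q
    # p and q are the roots of x^2 - total*x + product = 0.
    root = math.isqrt(total * total - 4 * product)   # equals q - p
    return (total - root) // 2, (total + root) // 2


print(two_missing_by_sum_and_product([2, 1, 5, 8, 6, 7, 3, 9], 10))  # (4, 10)
```

For the example array, p + q = 55 - 41 = 14 and p*q = 3628800 / 90720 = 40, which gives the roots 4 and 10.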
Can we avoid multiplication? As the numbers are positive and in the range [1,N], we could use each element of the array as an index into the array to mark that value as present. The positions for the missing elements will then stay unmarked. But this changes the array itself. How do we make sure that, by marking one position, we are not losing the information stored at that position? For example, take A=[2,1,5,8,6,7,3,9]. If we mark A[A[0]-1], i.e. A[1], with a special value, let's say 0, to record that 2 is not missing, then A becomes A'=[2,0,5,8,6,7,3,9]. We have now lost the value 1 that was stored there, so A[A[1]-1], i.e. A[0], will never get marked to tell us that 1 is not missing.
We can overcome the overwriting issue by negating the number at index abs(A[i])-1 for each i instead of overwriting it. This way we do not lose the value; we only change its sign, and we always index based on the absolute value. After marking for all the numbers, we make a second pass over the array and look for unmarked, i.e. positive, elements. At the same time we can flip the negated elements back to positive, which restores the original array. Below is the implementation of this idea.
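A minimal Python sketch of the negation-marking idea follows. It is an illustration under stated assumptions, not the original listing: the function name is hypothetical, and because the array has only N-2 slots, the values N-1 and N have no slot to mark, so this sketch tracks them with two extra flags (still constant space).

```python
def two_missing_by_marking(nums):
    """Return the two values of 1..n absent from nums, where n = len(nums) + 2.

    Assumes nums holds distinct values from 1..n. The list is modified
    during the scan and restored before returning.
    """
    size = len(nums)
    # Values size+1 and size+2 (i.e. n-1 and n) cannot be used as indices,
    # so remember them separately (an assumption of this sketch).
    seen_beyond = [False, False]

    for i in range(size):
        value = abs(nums[i])          # the slot may already be negated
        if value <= size:
            nums[value - 1] = -nums[value - 1]   # mark value as present
        else:
            seen_beyond[value - size - 1] = True

    missing = []
    for i in range(size):
        if nums[i] > 0:
            missing.append(i + 1)     # slot never marked, so i+1 is missing
        else:
            nums[i] = -nums[i]        # restore the original value
    for offset, seen in enumerate(seen_beyond):
        if not seen:
            missing.append(size + 1 + offset)
    return missing


data = [2, 1, 5, 8, 6, 7, 3, 9]
print(two_missing_by_marking(data))  # [4, 10]
print(data)                          # [2, 1, 5, 8, 6, 7, 3, 9]
```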
The above solution assumes that the input array is mutable. What if we can't update the input array (i.e. it is immutable) and we still need to find the missing values in O(n) time and constant space? We need a little trick here. If no numbers were missing, then the xor of all numbers from 1 to n, i.e. xor1=1^2^..^n, and the xor of all numbers in the array, i.e. xor2=A[0]^A[1]^…^A[n-1], would give the same result. Hence the xor of these two results would be 0 (equal values cancel under xor).
Now, if two of the numbers are missing, then in xor=xor1^xor2 all elements cancel each other except the missing p and q, so xor=p^q. Every bit that is set in xor is set in exactly one of p and q. So we take any set bit of xor and divide the numbers 1 to N, as well as the elements of the array, into two sets: one with that bit set and the other with that bit not set. By doing so, p falls in one set and q in the other. Now if we xor all the elements of the first set we get p, and doing the same in the other set gives q. Which set bit should we choose? We can choose any, but the rightmost set bit is the easiest because we can get the mask directly as xor&~(xor-1). Below is the implementation of this idea.
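The xor partitioning can be sketched in Python as follows. This is a hedged illustration: the function name is an assumption of this example, and the input is taken to hold N-2 distinct values from 1 to N. The array is only read, never written.

```python
def two_missing_by_xor(nums, n):
    """Return (p, q), the two values of 1..n absent from nums, with p < q."""
    xor_all = 0
    for value in range(1, n + 1):
        xor_all ^= value
    for value in nums:
        xor_all ^= value
    # Everything present cancels out, so xor_all == p ^ q.

    mask = xor_all & ~(xor_all - 1)   # rightmost set bit of p ^ q

    with_bit = without_bit = 0
    for value in range(1, n + 1):
        if value & mask:
            with_bit ^= value
        else:
            without_bit ^= value
    for value in nums:
        if value & mask:
            with_bit ^= value
        else:
            without_bit ^= value
    # Each group now holds exactly one of the missing numbers.
    return min(with_bit, without_bit), max(with_bit, without_bit)


print(two_missing_by_xor([2, 1, 5, 8, 6, 7, 3, 9], 10))  # (4, 10)
```

For the example, p^q = 4^10 = 14 and the mask is 2; 10 has that bit set and 4 does not, so the two groups isolate them.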
One Missing, One Repeated
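What if one number appears twice and another is missing? With respect to xor arithmetic, a missing element and a repeated element behave the same way. When we xor the numbers 1 to N together with all the array elements, every number that appears exactly once in the array cancels out. The missing number survives because it appears only among 1 to N, and the repeated number survives because it appears three times in total (once among 1 to N and twice in the array). So the result is again the xor of two unknowns, and we can use the same procedure described above to find them. One extra pass over the array then tells us which of the two is the repeated one.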
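Here is a short Python sketch of the same xor procedure applied to this variant. The function name is hypothetical, and the sketch assumes the sequence has exactly N elements drawn from 1 to N, with one value appearing twice and one absent.

```python
def find_repeated_and_missing(nums):
    """Return (repeated, missing) for a sequence of len(nums) values in 1..N."""
    xor_all = 0
    for i, value in enumerate(nums, start=1):
        xor_all ^= i ^ value          # i runs over 1..N alongside the array
    # xor_all == repeated ^ missing

    mask = xor_all & ~(xor_all - 1)   # rightmost set bit

    with_bit = without_bit = 0
    for i, value in enumerate(nums, start=1):
        if i & mask:
            with_bit ^= i
        else:
            without_bit ^= i
        if value & mask:
            with_bit ^= value
        else:
            without_bit ^= value

    # One candidate is the repeated value, the other the missing one.
    # A linear scan decides: only the repeated value occurs in the array.
    if with_bit in nums:
        return with_bit, without_bit
    return without_bit, with_bit


print(find_repeated_and_missing([2, 1, 5, 8, 3, 6, 7, 3, 10, 4]))  # (3, 9)
```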
First Missing Positive
For example, for A=[1,2,0] the first missing positive is 3, and for A=[3,6,4,-1,1] the first missing positive is 2. Can we use some of the techniques discussed above? Note that there might be more than one missing number, as well as negative numbers and zeros. If all numbers were positive, we could use the second method for finding two missing numbers, where we used each element as an index and negated the value there to mark it as non-missing. However, in this problem we may have non-positive numbers, i.e. zeros and negatives, so we can't simply apply that algorithm. But if we think carefully, we notice that we don't actually have to care about zeros and negative numbers, because we only care about the smallest positive number. That is, if we can put aside the non-positive numbers and consider only the positives, then we can simply apply the "element as index to mark non-missing by negating the value" method to find the missing positives.
How do we put aside the non-positive elements? We can do a partition, as we do in quicksort, so that all positive elements are placed on the left and all zeros and negatives on the right. If q is the number of positive elements, then A[0..q-1] contains all the positives. Now we just have to scan the positive partition of the array, i.e. from 0 to q-1, and mark A[abs(A[i])-1] by negating its value, skipping values greater than q and positions that are already negative. After the marking phase we sweep through the partition again to find the first index i that holds a positive element. Then i+1 is the smallest, i.e. first, missing positive. If we do not find such an index, then there are no missing numbers between 1 and q (why?). In that case we return the next positive number, q+1 (why?). Below is the implementation of this algorithm, which assumes we can update the original array. It runs in O(n) time and constant space.
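The partition-then-mark steps can be sketched in Python as below. This is a minimal sketch, not necessarily the original listing: the function name is an assumption, and for brevity it leaves the input array reordered and partly negated rather than restoring it.

```python
def first_missing_positive(nums):
    """Return the smallest positive integer not present in nums.

    The list is modified in place (reordered and partly negated).
    """
    # Partition: move all positive values to the front, as in quicksort.
    q = 0
    for i in range(len(nums)):
        if nums[i] > 0:
            nums[i], nums[q] = nums[q], nums[i]
            q += 1
    # nums[0..q-1] now holds exactly the positive values.

    # Mark: for each value v in 1..q, negate the slot at index v - 1.
    for i in range(q):
        value = abs(nums[i])
        # Skip values with no slot, and slots already marked (duplicates).
        if value <= q and nums[value - 1] > 0:
            nums[value - 1] = -nums[value - 1]

    # Sweep: the first slot still positive marks the first missing value.
    for i in range(q):
        if nums[i] > 0:
            return i + 1
    return q + 1   # 1..q are all present


print(first_missing_positive([1, 2, 0]))         # 3
print(first_missing_positive([3, 6, 4, -1, 1]))  # 2
```

The check `nums[value - 1] > 0` matters when the input has duplicates: negating the same slot twice would flip it back to positive and wrongly report that value as missing.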