Minimum Substring of S Containing Elements in T

Given an arbitrary string S and another string T with unique characters, find the shortest contiguous substring of S that contains every character of T.

For example:
S=adobecodebanc
T=abc
answer=banc

The problem is similar to finding all anagrams of a string that occur as substrings of another given string. Please read my previous article here to understand how to solve the anagram problem. The problem in this article is a slight modification of the anagram problem: instead of finding anagram substrings of a fixed length, we need to find the minimum-length substring that contains all the characters of the anagram.

The optimal minimum-length solution would be an anagram of T occurring as a substring of S. For example, if S = adbacfab and T = abc, then the answer is bac, which is an anagram of T and a substring of S. Now consider S = adobecodebanc and T = abc; here no anagram of T is a substring of S. But there are non-anagram substrings that contain all the characters. For example, there are 3 non-anagram substrings that contain all characters of T: adobec, codeba, and banc. The minimum-length solution is banc.
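To make the two examples above concrete, here is a minimal brute-force sketch that simply tries every substring of S and keeps the shortest one covering T. It is only a baseline for checking answers, not the approach this article builds toward; the class name BruteForceMinWindow and its method names are hypothetical, chosen for illustration.

```java
class BruteForceMinWindow {
    // Returns true if window contains every character of t at least once.
    static boolean containsAll(String window, String t) {
        for (char c : t.toCharArray()) {
            if (window.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    // Tries every non-empty substring of s and returns the first shortest one
    // that covers t, or "" if no substring does. Far slower than O(n).
    static String minWindow(String s, String t) {
        String best = "";
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                String window = s.substring(start, end);
                if (containsAll(window, t)
                        && (best.isEmpty() || window.length() < best.length())) {
                    best = window;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minWindow("adbacfab", "abc"));      // bac
        System.out.println(minWindow("adobecodebanc", "abc")); // banc
    }
}
```

A baseline like this is handy for cross-checking a faster solution on small inputs.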

If you have already read my previous article, you might have already figured out that the problem in question can be solved using a sliding window approach. If we assume the characters in the strings come from a constant-size alphabet, then we can solve this problem in O(n) time and constant space.

We start by computing the character histogram of an initial window made of the first m characters of S, where m is the length of T. If this window contains all the characters of T, then it is the solution. If not, we extend the window to the right until it contains all the characters in T. Once we find a solution, we can slide the window.

But note that once we find a window with all the characters, there is no point in sliding it (removing the first character of the window and adding another character on the right) as long as the total number of matching characters remains the same. A slide keeps the window length unchanged, so it can never produce a shorter answer. So it makes more sense to shrink the window from the left to remove unwanted characters until the shrinking removes a required character. At that point we extend the window again until it matches all the required characters. We continue this process until the window start reaches the end of the string.

Below is the implementation of the above idea. Note that matching two histograms takes constant time because the alphabet size is fixed. So the overall time complexity is O(n) and the space complexity is O(1).

//O(1) match between two histograms due to constant size alphabet i.e. 256
private static int countMatches(int[] textHist, int[] patHist){
	int match = 0;
	for(int i = 0; i< 256; i++){
		if(patHist[i] > 0 && textHist[i] > 0){
			match++;
		}
	}
	
	return match;
}

public static String minLenSubStringWithAllChars(String str, String t){
	int[] textHist = new int[256];
	int[] patHist = new int[256];
	int start = 0;
	int end = 0;
	int minLen = Integer.MAX_VALUE;
	int bestStart = 0;
	int bestEnd = 0;
	
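	//a string shorter than t cannot contain all of t's unique characters;
	//without this guard str.charAt(end) below would go out of bounds
	if(str.length() < t.length()){
		return "";
	}
	
	//prepare the initial window of size of the char set
	for(end = 0; end < t.length(); end++){
		textHist[str.charAt(end)]++;
		patHist[t.charAt(end)]++;
	}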
	
	while(start < str.length()){
		int matches = countMatches(textHist, patHist);
		//if current window doesn't contain all the characters
		//then stretch the window to the right, up to the end of the string
		if(matches < t.length() && end < str.length()){
			//stretch window
			textHist[str.charAt(end)]++;
			end++;
		}
		//if current window contains all the characters with frequency 
		//at least one then we have the freedom to shrink the window
		//from front. 
		else if(matches >= t.length()){
			//as current window contains all character so update minLen				
			if(end-start < minLen){
				minLen = end-start;
				bestStart = start;
				bestEnd = end;
			}
			//shrink window
			textHist[str.charAt(start)]--;
			start++;
		}
		//if current window doesn't contain all chars
		//but we can't stretch the window anymore then break
		else{
			break;
		}
	}
	
	return str.substring(bestStart, bestEnd);
}
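As a variation on the listing above, the window can also keep a running count of how many characters of T are still missing, so that no 256-entry histogram scan is needed on each step. This is a sketch of that alternative rather than the implementation above; the class name MinWindowCounter and the variable names are hypothetical, and it assumes, as the article does, that T has unique characters, here all with character codes below 256.

```java
class MinWindowCounter {
    // 'missing' is the number of characters of t not yet in the window.
    // Assumes t has unique characters, each with a char code below 256.
    static String minWindow(String s, String t) {
        if (t.isEmpty()) {
            return "";
        }
        boolean[] required = new boolean[256];
        for (char c : t.toCharArray()) {
            required[c] = true;
        }
        int[] windowHist = new int[256]; // counts of required characters only
        int missing = t.length();
        int bestStart = 0;
        int bestLen = Integer.MAX_VALUE;
        int start = 0;

        for (int end = 0; end < s.length(); end++) {
            char in = s.charAt(end);
            // stretch: s[end] joins the window
            if (in < 256 && required[in] && windowHist[in]++ == 0) {
                missing--;
            }
            // shrink from the left while the window still covers t
            while (missing == 0) {
                if (end - start + 1 < bestLen) {
                    bestLen = end - start + 1;
                    bestStart = start;
                }
                char out = s.charAt(start++);
                if (out < 256 && required[out] && --windowHist[out] == 0) {
                    missing++;
                }
            }
        }
        return bestLen == Integer.MAX_VALUE
                ? ""
                : s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(minWindow("adobecodebanc", "abc")); // banc
    }
}
```

Each character enters the window once and leaves at most once, so the running time stays linear in the length of S.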

Another version of this algorithm can be formulated as a different problem: finding the minimum-length subarray whose sum is at least a given positive number.

Given an array of n positive integers and a positive integer s, find the minimal length of a subarray whose sum is ≥ s. If there isn’t one, return -1 instead.

For example, given the array [2,3,1,2,4,3] and s = 7, the subarray [4,3] has the minimal length of 2.

We can use a similar growing-shrinking sliding window to solve this problem. We stretch the window as long as the sum is less than the given number. Otherwise, we shrink the window. Below is the O(n) implementation of this solution.

public static int minlengthSubarraySum(int[] nums, int sum){
	int minlen = Integer.MAX_VALUE;
	int curSum = 0;
	int start = 0;
	int end = 0;
	
	while(start < nums.length){
		//if current window doesn't add up to the given sum then 
		//stretch the window to the right
		if(curSum < sum && end < nums.length){
			curSum += nums[end];
			end++;
		}
		//if current window adds up to at least given sum then
		//we can shrink the window 
		else if(curSum >= sum){
			minlen = Math.min(minlen, end-start);
			curSum -= nums[start];
			start++;
		}
		//cur sum less than required sum but we reach the end 
		else{
			break;
		}
	}
	
	return (minlen == Integer.MAX_VALUE) ? -1 : minlen;
}
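The same stretch-and-shrink idea can also be written with the right end driven by a for loop, which makes the linear running time easier to see: every element enters the window once and leaves at most once. This is a hedged sketch of that formulation, assuming positive inputs as in the problem statement; the class name MinSubarraySketch and method name minLength are hypothetical.

```java
class MinSubarraySketch {
    // Assumes all entries of nums and target are positive.
    static int minLength(int[] nums, int target) {
        int best = Integer.MAX_VALUE;
        int sum = 0;
        int start = 0;
        for (int end = 0; end < nums.length; end++) {
            // stretch: nums[end] joins the window
            sum += nums[end];
            // shrink from the left while the window still reaches the target
            while (sum >= target) {
                best = Math.min(best, end - start + 1);
                sum -= nums[start++];
            }
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }

    public static void main(String[] args) {
        System.out.println(minLength(new int[]{2, 3, 1, 2, 4, 3}, 7)); // 2
    }
}
```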