Given a string S and another string T with unique characters, find the shortest contiguous substring of S that contains all the characters of T.
The problem is similar to finding all anagrams of a string as substrings of another given string. Please read my previous article here to understand how to solve the anagram problem. The problem in this article is a slight modification of the anagram problem: instead of finding an anagram substring of fixed length, we need to find the minimum-length substring that contains all the characters of the anagram.
The best possible minimum-length solution would be an anagram of T occurring as a substring of S. For example, if S = adbacfab and T = abc, then the answer is bac, which is an anagram of T and a substring of S. Now let’s consider S = adobecodebanc and T = abc. Here there is no anagram of T as a substring of S, but there are non-anagram substrings that contain all the characters. For example, there are 3 non-anagram substrings that contain all the characters of T: adobec, codeba, and banc. The minimum-length solution is banc.
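The two examples above can be confirmed with a brute-force check. This is only a rough sketch for verifying small inputs, not the efficient solution discussed in this article; the function name brute_force_min_window is a hypothetical helper, and it relies on the assumption that T has unique characters.

```python
def brute_force_min_window(s, t):
    """Return the shortest substring of s containing every character of t, or ''."""
    need = set(t)  # valid because t is assumed to have unique characters
    best = ""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            window = s[i:j]
            if need <= set(window):
                if not best or len(window) < len(best):
                    best = window
                break  # extending further from the same start only gets longer
    return best


print(brute_force_min_window("adbacfab", "abc"))
print(brute_force_min_window("adobecodebanc", "abc"))
```

This tries every start position and stops at the first end position that covers T, so it is far slower than the sliding window approach but easy to reason about.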
If you have already read my previous article, you might have already figured out that the problem in question can be solved using a sliding window approach. If we assume the characters in the strings come from a constant-size alphabet, then we can solve this problem in O(n) time and constant space, where n is the length of S.
We start by computing the character histogram of an initial window covering the first m characters of S, where m is the length of T. If this window contains all the characters of T, then it is the solution. If not all characters are present, we extend the window to the right until it contains all the characters of T. Once we find a solution, we can slide the window.
But note that once we find a window with all the characters, there is no point in sliding it (removing the first character of the window and adding another character on the right) as long as the total number of matching characters remains the same (why?). So it makes more sense to shrink the window from the left, removing unwanted characters, until the shrinking removes a required character. At that point we are free to extend the window to the right again until it matches all the required characters. We continue this process until the window start reaches the end of the string.
Below is the implementation of the above idea. Note that matching two histograms takes constant time because the alphabet size is fixed. So the overall time complexity is O(n) and the space complexity is O(1).
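The following is a minimal sketch of the grow-and-shrink window in Python, not necessarily the original listing. The names min_window, need, and missing are assumptions made for illustration. Instead of comparing two full histograms at each step, this variant keeps a running count of how many required characters the window still lacks, which gives the same O(n) bound.

```python
from collections import Counter


def min_window(s, t):
    """Return the shortest substring of s containing all characters of t, or ''."""
    if not s or not t:
        return ""
    need = Counter(t)   # how many more of each character the window needs
    missing = len(t)    # total required characters still absent from the window
    best_start, best_len = 0, len(s) + 1
    left = 0
    for right, ch in enumerate(s):
        # Grow the window to the right.
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1   # negative values mean the window has a surplus of ch
        # Shrink from the left while the window still covers t.
        while missing == 0:
            if right - left + 1 < best_len:
                best_start, best_len = left, right - left + 1
            need[s[left]] += 1
            if need[s[left]] > 0:
                missing += 1   # a required character was just dropped
            left += 1
    return s[best_start:best_start + best_len] if best_len <= len(s) else ""


print(min_window("adobecodebanc", "abc"))
```

Each index enters the window once and leaves it at most once, so the two pointers together do O(n) work; the counter holds at most one entry per alphabet symbol.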
Another version of this algorithm can be formulated as a different problem – finding the minimum-length subarray whose sum is at least a given positive number.
Given an array of n positive integers and a positive integer s, find the minimal length of a subarray of which the sum ≥ s. If there isn’t one, return -1 instead.
For example, given the array [2,3,1,2,4,3] and s = 7, the subarray [4,3] has the minimal length of 2.
We can use a similar growing-shrinking sliding window to solve this problem. We stretch the window to the right as long as the sum is less than the given number. Once the sum reaches or exceeds it, we shrink the window from the left. Below is the O(n) implementation of this solution.
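Here is a short sketch of that window in Python, under the stated assumption that all array elements and s are positive. The function name min_subarray_len is hypothetical.

```python
def min_subarray_len(nums, s):
    """Return the minimal length of a contiguous subarray with sum >= s, or -1."""
    best = len(nums) + 1   # sentinel: longer than any real subarray
    window_sum = 0
    left = 0
    for right, value in enumerate(nums):
        window_sum += value            # stretch the window to the right
        while window_sum >= s:         # shrink while the sum is still large enough
            best = min(best, right - left + 1)
            window_sum -= nums[left]
            left += 1
    return best if best <= len(nums) else -1


print(min_subarray_len([2, 3, 1, 2, 4, 3], 7))
```

Because the elements are positive, removing an element from the left always decreases the sum, which is why shrinking is safe and each element is added and removed at most once.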