Wildcard or Regex Matching

Implement wildcard pattern matching with support for ‘.’ and ‘*’.

‘.’ Matches any single character.
‘*’ Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

Some examples:
isMatch(“aa”,”a”) ? false
isMatch(“aa”,”a.”) ? true
isMatch(“aaa”,”ab*”) ? false
isMatch(“aa”, “.*”) ? true
isMatch(“aa”, “a*”) ? true
isMatch(“ab”, “.*”) ? true
isMatch(“aabaa”, “a*b.*”) ? false

Matching ‘.’ is straightforward : it will match any character at the current position. So, if the pattern starts with ‘.’ then it has to match any character in string. Other wise if pattern starts with a character other than ‘*’ then it has to match the same character in the string. We can continue doing this if until we see a ‘*’.

If pattern contains a character followed by ‘*’ then the pattern matches the starting character zero or more times. First, we can match zero length which is equivalent to not matching the first two character.
If zero match doesn’t go through then we need to match all possible length suffix of string (why?) to match the pattern recursively.

public static boolean matchesFirst(String str, String pat){
	if(str == null){
		return pat == null;
	}
	if(pat == null){
		return str == null;
	}
	return (str.length() > 0 && str.charAt(0) == pat.charAt(0)) || (pat.charAt(0) == '.' && !str.isEmpty());  
}

public static boolean isMatch(String str, String pat){
	//base cases
	if(str == null){
		return pat == null;
	}
	else if(pat == null){
		return str == null;
	}
	if(str.isEmpty()){
		return pat.isEmpty();
	}
	
	//pattern without *
	if((pat.length() == 1 && pat.charAt(0) != '*' ) || pat.length() > 1 && pat.charAt(1) != '*'){
		//must match the first character
		if(!matchesFirst(str, pat)){
			return false;
		}
		//match rest
		String restStr = str.length() > 1 ? str.substring(1) : null;
		String restPat = pat.length() > 1 ? pat.substring(1) : null;
		return isMatch(restStr, restPat);
	}
	//pattern with * (0 or more matches)
	else{
		//zero match of first character of the pattern
		String rigtpat = pat.length() > 2 ? pat.substring(2) : null;
		if(isMatch(str, rigtpat)){
			return true;
		}
		//Otherwise match all possible length prefix of str to match and return true if any match found
		while(matchesFirst(str, pat)){
			str= str.length() > 1 ? str.substring(1) : null;
			if(isMatch(str, rigtpat)){
				return true;
			}
		}
	}
	
	return false;
}

 

Bonus Question: Given a Log file containing huge number of unstructured logs. Find all unique IP addresses appeared in the log file.

This is a classic Pattern Matching problem. We can write a simple regex to match IP address to identify all IP address in the file. The challenges would be handle very large log file. I will leave the scaling part for the user. Below is a simple implementation considering all the lines of the file small enough to fit in memory.

 public static void printIPAddressedd(String file){
		    //read file line by line
		  	String[] lines = {"Oct 16 03:18:05 app1002.corp httpd: 172.18.159.102 - - [16/Oct/2013:03:18:01 +0000] \"GET / HTTP/1.1\" 200 22 \"-\" \"Python-CRT-Client/0.0.8\" 3378887", "Oct 16 03:18:05 web1004.corp httpd: 202.16.73.36 - - [16/Oct/2013:03:18:05 +0000] \"GET /icon.gif HTTP/1.1\" 404 310 \"-\" \"Python-urllib/2.6\" 200", "Dec 16 05:04:45 mail3.corp postfix/smtpd[26986]: disconnect from 172.16.73.2", "Dec 16 05:04:45 app1003.corp postfix/smtpd[26965]: client=172.32.72.5"};
		    String validIp = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
		    String ipAddressPatternRegex = validIp+"\\."+validIp+"\\."+validIp+"\\."+validIp;
		    Matcher m;
		    HashSet<String> ips = new HashSet<String>();
		    for(String line : lines){
		        Pattern p = Pattern.compile(ipAddressPatternRegex);
		        m = p.matcher(line);
		        while(m.find()){
		            String ip = m.group().trim();
		            if(!ips.contains(ip)){
		                ips.add(ip);
		                System.out.println(ip);
		            }
		        }
		    }        
		}