2.17.2: 2011-06-17

net.sf.basedb.util.fuzzy
Class StringMatcher

java.lang.Object
  extended by net.sf.basedb.util.fuzzy.StringMatcher

public class StringMatcher
extends Object

A wrapper class for fuzzy string matching using the SecondString package. This class uses a given StringDistance object to calculate string similarities. Use getScore(String, String) to get the similarity score of two strings.

Use getBestMatch(String, Collection) to find the best match of a string among strings in a given list.

Use getBestPairs(Collection, Collection) to pair up strings in two lists.

Version:
2.8
Author:
nicklas
See Also:
SecondString project page
Last modified:
2008-09-11 22:08:14 +0200 (Thu, 11 Sep 2008)

Nested Class Summary
static class StringMatcher.FuzzyMatch
          Wrapper that holds information about a fuzzy match.
 
Field Summary
private  com.wcohen.ss.api.StringDistance matcher
           
 
Constructor Summary
StringMatcher()
          Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.
StringMatcher(com.wcohen.ss.api.StringDistance matcher)
          Create a new matcher using a specific fuzzy matching algorithm.
 
Method Summary
 StringMatcher.FuzzyMatch getBestMatch(String key, Collection<? extends String> values)
          Find the string that is most similar to a given string in a list of strings.
 List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys, Collection<? extends String> values)
          Match strings in two lists.
private  int[] getHighestScoreIdx(double[][] score)
           
 double getScore(String s1, String s2)
          Get the similarity score of two strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

matcher

private final com.wcohen.ss.api.StringDistance matcher
Constructor Detail

StringMatcher

public StringMatcher()
Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.


StringMatcher

public StringMatcher(com.wcohen.ss.api.StringDistance matcher)
Create a new matcher using a specific fuzzy matching algorithm.

Parameters:
matcher - The algorithm to use
Method Detail

getScore

public double getScore(String s1,
                       String s2)
Get the similarity score of two strings. The score is a value between 0 and 1, where 0 is a poor match and 1 is a good match. Note that a score of 1 does not require the two strings to be identical.

Parameters:
s1 - The first string
s2 - The second string
Returns:
The similarity score, or 0 if either string is null
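
The contract described above can be illustrated without the SecondString package. The following is a minimal sketch, not the actual implementation: the class ScoreSketch and its similarity method are hypothetical names, and the metric (a case-insensitive, normalized edit distance) is only a stand-in for whichever StringDistance the matcher is configured with. It shows the null guard and how a metric can return 1 for two strings that are not identical.

```java
import java.util.Locale;

class ScoreSketch {
    // Hypothetical stand-in metric: 1 - (edit distance / longer length),
    // computed on lower-cased input. This is NOT Level2JaroWinkler.
    static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int max = Math.max(x.length(), y.length());
        if (max == 0) return 1.0; // two empty strings are treated as equal
        int[] prev = new int[y.length() + 1];
        for (int j = 0; j <= y.length(); j++) prev[j] = j;
        for (int i = 1; i <= x.length(); i++) {
            int[] cur = new int[y.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= y.length(); j++) {
                int cost = x.charAt(i - 1) == y.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            prev = cur;
        }
        return 1.0 - (double) prev[y.length()] / max;
    }

    // Mirrors the documented behaviour: 0 if either string is null,
    // otherwise the score from the underlying metric.
    static double getScore(String s1, String s2) {
        if (s1 == null || s2 == null) return 0;
        return similarity(s1, s2);
    }
}
```

With this stand-in metric, "Sample A" and "sample a" score 1 although they differ in case, and a null argument yields 0.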

getBestMatch

public StringMatcher.FuzzyMatch getBestMatch(String key,
                                             Collection<? extends String> values)
Find the string that is most similar to a given string in a list of strings.

Parameters:
key - The string to look for
values - The list of strings to compare with the key
Returns:
A StringMatcher.FuzzyMatch result for the string in the list that got the highest score; null if there are no values in the list
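
A best-match lookup of this kind can be sketched as a linear scan that keeps the highest-scoring candidate. The code below is an illustration under stated assumptions, not the BASE implementation: BestMatchSketch and prefixSimilarity are hypothetical names, the shared-prefix metric is a placeholder for the configured StringDistance, a plain Map.Entry stands in for StringMatcher.FuzzyMatch (whose accessors are not listed on this page), and ties are resolved in favour of the first candidate seen.

```java
import java.util.Collection;
import java.util.Map;

class BestMatchSketch {
    // Hypothetical placeholder metric: length of the common prefix
    // divided by the length of the longer string.
    static double prefixSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 1.0;
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return (double) i / max;
    }

    // Returns the best value with its score, or null if 'values' is empty.
    static Map.Entry<String, Double> bestMatch(String key,
            Collection<? extends String> values) {
        String best = null;
        double bestScore = -1;
        for (String value : values) {
            double score = prefixSimilarity(key, value);
            if (score > bestScore) { // strict '>' keeps the first of equal scores
                bestScore = score;
                best = value;
            }
        }
        return best == null ? null : Map.entry(best, bestScore);
    }
}
```

For the key "sample1" and the candidates "control", "sample10" and "sam", this sketch selects "sample10"; for an empty collection it returns null, as documented for getBestMatch.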

getBestPairs

public List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys,
                                                   Collection<? extends String> values)
Match strings in two lists. The result is a list of pairs, each containing one value from each of the lists. The matching is done so that no string from either list is paired more than once.

Parameters:
keys - The list with the keys
values - The list with the values
Returns:
A list of StringMatcher.FuzzyMatch results. The returned list is the same size as the 'keys' collection, with the elements in the same order as returned by its iterator. The list may contain null elements if no match could be found for a given key.
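
One way to obtain a pairing with these properties is a greedy pass over a score matrix: repeatedly take the highest remaining score, record the pair, and retire that key and value. The sketch below assumes this strategy (suggested by the private getHighestScoreIdx(double[][]) helper, but not confirmed by this page). BestPairsSketch, prefixSimilarity and highestScoreIdx are hypothetical names, the metric is a placeholder, the result holds plain strings instead of StringMatcher.FuzzyMatch objects, and treating a score of 0 as "no match" is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BestPairsSketch {
    // Hypothetical placeholder metric: common prefix length / longer length.
    static double prefixSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 1.0;
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return (double) i / max;
    }

    // Returns {row, column} of the highest positive score, or null if none is left.
    static int[] highestScoreIdx(double[][] score) {
        int[] best = null;
        double bestScore = 0;
        for (int i = 0; i < score.length; i++) {
            for (int j = 0; j < score[i].length; j++) {
                if (score[i][j] > bestScore) {
                    bestScore = score[i][j];
                    best = new int[] { i, j };
                }
            }
        }
        return best;
    }

    // Result has one slot per key, in key order; null means "no match".
    static List<String> bestPairs(List<String> keys, List<String> values) {
        double[][] score = new double[keys.size()][values.size()];
        for (int i = 0; i < keys.size(); i++)
            for (int j = 0; j < values.size(); j++)
                score[i][j] = prefixSimilarity(keys.get(i), values.get(j));

        List<String> result =
            new ArrayList<>(Collections.nCopies(keys.size(), (String) null));
        int[] idx;
        while ((idx = highestScoreIdx(score)) != null) {
            result.set(idx[0], values.get(idx[1]));
            Arrays.fill(score[idx[0]], -1);              // retire this key
            for (double[] row : score) row[idx[1]] = -1; // retire this value
        }
        return result;
    }
}
```

With keys "alpha", "beta", "gamma" and values "gam", "alphabet", the sketch pairs "alpha" with "alphabet" and "gamma" with "gam", and leaves a null in the slot for "beta".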

getHighestScoreIdx

private int[] getHighestScoreIdx(double[][] score)
