Class StringMatcher

java.lang.Object
net.sf.basedb.util.fuzzy.StringMatcher

public class StringMatcher
extends Object
A wrapper class for fuzzy string matching using the SecondString package. This class uses a given StringDistance object for calculating string similarities. Use getScore(String, String) to get the similarity score of two strings.

Use getBestMatch(String, Collection) to find the best match of a string among strings in a given list.

Use getBestPairs(Collection, Collection) to pair up strings in two lists.

Version:
2.8
Author:
nicklas
See Also:
SecondString project page
Last modified:
$Date: 2023-12-20 11:11:43 +0100 (Wed, 20 Dec 2023) $
  • Field Details

    • matcher

      private final com.wcohen.ss.api.StringDistance matcher
  • Constructor Details

    • StringMatcher

      public StringMatcher()
      Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.
    • StringMatcher

      public StringMatcher(com.wcohen.ss.api.StringDistance matcher)
      Create a new matcher using a specific fuzzy matching algorithm.
      Parameters:
      matcher - The algorithm to use
  • Method Details

    • getScore

      public double getScore(String s1, String s2)
      Get the similarity score of two strings. The score is a value between 0 and 1, where 0 is a poor match and 1 is a good match. Note! It doesn't have to be a perfect match to get a score of 1.
      Parameters:
      s1 - The first string
      s2 - The second string
      Returns:
      The similarity score, or 0 if either string is null
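      A minimal sketch of the contract described above, not the BASE implementation. The class name ScoreSketch and the prefixSimilarity metric are hypothetical stand-ins for StringMatcher and a SecondString StringDistance. The sketch shows the null handling and how a metric that ignores case can return 1 for two strings that are not identical.

```java
import java.util.Locale;

// Hypothetical stand-in for illustration; not the BASE StringMatcher.
class ScoreSketch {

    // Assumed metric: length of the shared prefix (ignoring case)
    // divided by the length of the longer string, so always in [0, 1].
    static double prefixSimilarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longer = Math.max(x.length(), y.length());
        if (longer == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int shorter = Math.min(x.length(), y.length());
        int common = 0;
        while (common < shorter && x.charAt(common) == y.charAt(common)) {
            common++;
        }
        return (double) common / longer;
    }

    // Follows the documented contract: 0 if either string is null.
    static double getScore(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return 0.0;
        }
        return prefixSimilarity(s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(getScore("Sample", "sample")); // differs in case only
        System.out.println(getScore("abcd", "abxy"));     // shares a two-character prefix
        System.out.println(getScore(null, "abcd"));       // null input
    }
}
```

      With a real StringDistance the scores would differ; only the null rule and the 0 to 1 range are taken from the documentation above.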
    • getBestMatch

      public StringMatcher.FuzzyMatch getBestMatch(String key, Collection<? extends String> values)
      Find the string that is most similar to a given string in a list of strings.
      Parameters:
      key - The string to look for
      values - The list of strings to compare with the key
      Returns:
      A StringMatcher.FuzzyMatch result for the string in the list that got the highest score; null if there are no values in the list
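      The lookup described above can be sketched as a single pass that keeps the highest-scoring candidate. This is an illustration under assumptions, not the actual implementation: BestMatchSketch, its Match holder, and the similarity metric are hypothetical, and the real StringMatcher.FuzzyMatch may expose different members.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Locale;

// Hypothetical sketch; not the BASE StringMatcher.
class BestMatchSketch {

    // Hypothetical result holder standing in for StringMatcher.FuzzyMatch.
    static final class Match {
        final String value;
        final double score;

        Match(String value, double score) {
            this.value = value;
            this.score = score;
        }
    }

    // Assumed metric: shared prefix length (ignoring case) over the longer length.
    static double similarity(String a, String b) {
        if (a == null || b == null) {
            return 0.0;
        }
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longer = Math.max(x.length(), y.length());
        if (longer == 0) {
            return 1.0;
        }
        int shorter = Math.min(x.length(), y.length());
        int common = 0;
        while (common < shorter && x.charAt(common) == y.charAt(common)) {
            common++;
        }
        return (double) common / longer;
    }

    // Returns the highest-scoring candidate, or null if 'values' is empty.
    // On ties, the first candidate seen is kept.
    static Match bestMatch(String key, Collection<? extends String> values) {
        Match best = null;
        for (String candidate : values) {
            double score = similarity(key, candidate);
            if (best == null || score > best.score) {
                best = new Match(candidate, score);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Match m = bestMatch("sample", Arrays.asList("simple", "sam", "sample_1"));
        System.out.println(m.value + " " + m.score);
        System.out.println(bestMatch("sample", Collections.<String>emptyList()));
    }
}
```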
    • getBestPairs

      public List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys, Collection<? extends String> values)
      Match strings in two lists. The result is a list of pairs, each with one value from each of the lists. The matching is done so that no string from either list is paired more than once.
      Parameters:
      keys - The list with the keys
      values - The list with the values
      Returns:
      A list of StringMatcher.FuzzyMatch results. The returned list is of the same size as the 'keys' collection, with the elements in the same order as returned by its iterator. The list may contain null elements if no match could be found for a given key
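      One way to satisfy this contract is a greedy pass over a score matrix: repeatedly take the highest remaining score, pair that key with that value, and exclude both from further pairing. The sketch below assumes this strategy; the documentation does not state which algorithm the class uses. BestPairsSketch and its similarity metric are hypothetical, and the sketch returns the matched value strings rather than FuzzyMatch objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of one-to-one pairing; not the BASE StringMatcher.
class BestPairsSketch {

    // Assumed metric: shared prefix length (ignoring case) over the longer length.
    static double similarity(String a, String b) {
        if (a == null || b == null) {
            return 0.0;
        }
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longer = Math.max(x.length(), y.length());
        if (longer == 0) {
            return 1.0;
        }
        int shorter = Math.min(x.length(), y.length());
        int common = 0;
        while (common < shorter && x.charAt(common) == y.charAt(common)) {
            common++;
        }
        return (double) common / longer;
    }

    // Returns one entry per key, in key iteration order; null where no value is left.
    static List<String> bestPairs(Collection<? extends String> keys,
                                  Collection<? extends String> values) {
        List<String> k = new ArrayList<>(keys);
        List<String> v = new ArrayList<>(values);
        double[][] score = new double[k.size()][v.size()];
        for (int i = 0; i < k.size(); i++) {
            for (int j = 0; j < v.size(); j++) {
                score[i][j] = similarity(k.get(i), v.get(j));
            }
        }
        List<String> result = new ArrayList<>(Collections.nCopies(k.size(), (String) null));
        int pairs = Math.min(k.size(), v.size());
        for (int n = 0; n < pairs; n++) {
            // Find the highest score among cells not yet excluded (marked -1).
            int bestRow = -1;
            int bestCol = -1;
            double best = -1.0;
            for (int i = 0; i < k.size(); i++) {
                for (int j = 0; j < v.size(); j++) {
                    if (score[i][j] > best) {
                        best = score[i][j];
                        bestRow = i;
                        bestCol = j;
                    }
                }
            }
            result.set(bestRow, v.get(bestCol));
            // Exclude the paired key (row) and value (column) from later rounds.
            Arrays.fill(score[bestRow], -1.0);
            for (int i = 0; i < k.size(); i++) {
                score[i][bestCol] = -1.0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bestPairs(
            Arrays.asList("alpha", "beta", "gamma"),
            Arrays.asList("gamma_2", "alp")));
    }
}
```

      In this sketch a key is left as null only when the values run out; an implementation could also leave a key unmatched when its best score is too low, which the documentation above does not specify.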
    • getHighestScoreIdx

      private int[] getHighestScoreIdx(double[][] score)