Package net.sf.basedb.util.fuzzy
Class StringMatcher
java.lang.Object
net.sf.basedb.util.fuzzy.StringMatcher
A wrapper class for fuzzy string matching using the SecondString package. This
class uses a given StringDistance object for calculating string similarities.
Use getScore(String, String) to get the similarity score of two strings.
Use getBestMatch(String, Collection) to find the best match of a string among strings in a given list.
Use getBestPairs(Collection, Collection) to pair up strings in two lists.
- Version:
- 2.8
- Author:
- nicklas
- See Also:
- Last modified
- $Date: 2023-12-20 11:11:43 +0100 (Wed, 20 Dec 2023) $
-
Nested Class Summary
Modifier and Type | Class | Description
static class | StringMatcher.FuzzyMatch | Wrapper that holds information about a fuzzy match.
Field Summary
-
Constructor Summary
Constructor | Description
StringMatcher() | Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.
StringMatcher(com.wcohen.ss.api.StringDistance matcher) | Create a new matcher using a specific fuzzy matching algorithm.
Method Summary
Modifier and Type | Method | Description
StringMatcher.FuzzyMatch | getBestMatch(String key, Collection<? extends String> values) | Find the string that is most similar to a given string in a list of strings.
List<StringMatcher.FuzzyMatch> | getBestPairs(Collection<? extends String> keys, Collection<? extends String> values) | Match strings in two lists.
private int[] | getHighestScoreIdx(double[][] score) |
double | getScore(String s1, String s2) | Get the similarity score of two strings.
-
Field Details
-
matcher
private final com.wcohen.ss.api.StringDistance matcher
-
-
Constructor Details
-
StringMatcher
public StringMatcher()
Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.
StringMatcher
public StringMatcher(com.wcohen.ss.api.StringDistance matcher)
Create a new matcher using a specific fuzzy matching algorithm.
- Parameters:
matcher - The algorithm to use
-
-
Method Details
-
getScore
public double getScore(String s1, String s2)
Get the similarity score of two strings. The score is a value between 0 and 1, where 0 is a poor match and 1 is a good match. Note that a match does not have to be perfect to get a score of 1.
- Parameters:
s1 - The first string
s2 - The second string
- Returns:
- The similarity score, or 0 if either string is null
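To make the score contract concrete, here is a minimal sketch. It is not the SecondString algorithm: it assumes a normalized Levenshtein similarity over trimmed, lower-cased input as a stand-in for the configured StringDistance. The class name ScoreSketch and the methods score and levenshtein are hypothetical. The sketch shows the null handling, the 0 to 1 range, and how two strings that are not character-for-character identical can still score 1.

```java
import java.util.Locale;

// Hypothetical stand-in for the getScore contract; NOT the SecondString implementation.
final class ScoreSketch {

    // Returns a similarity in [0, 1]; 0 if either argument is null.
    static double score(String s1, String s2) {
        if (s1 == null || s2 == null) return 0.0;
        // Assumption: normalize case and surrounding whitespace before comparing,
        // so that "Sample A" and " sample a" count as a full match.
        String a = s1.trim().toLowerCase(Locale.ROOT);
        String b = s2.trim().toLowerCase(Locale.ROOT);
        int max = Math.max(a.length(), b.length());
        if (max == 0) return 1.0; // two empty strings are treated as identical
        return 1.0 - (double) levenshtein(a, b) / max;
    }

    // Classic two-row dynamic-programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(score("Sample A", " sample a"));
        System.out.println(score("kitten", "sitting"));
        System.out.println(score(null, "anything"));
    }
}
```

With the real class, the same calls would go through getScore on a StringMatcher instance, and the scores would depend on the StringDistance passed to the constructor.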
-
getBestMatch
public StringMatcher.FuzzyMatch getBestMatch(String key, Collection<? extends String> values)
Find the string that is most similar to a given string in a list of strings.
- Parameters:
key - The string to look for
values - The list of strings to compare with the key
- Returns:
- A StringMatcher.FuzzyMatch result for the string in the list that got the highest score; null if there are no values in the list
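The selection described above can be sketched as a single pass that keeps the highest-scoring candidate. This is an illustration under assumptions, not the library's code: BestMatchSketch, its nested Match holder, and the scorer parameter are hypothetical, and the real StringMatcher.FuzzyMatch may expose different members. Ties are resolved in favor of the first candidate seen, which is also an assumption.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of a best-match search; not the BASE implementation.
final class BestMatchSketch {

    // Hypothetical result holder standing in for StringMatcher.FuzzyMatch.
    static final class Match {
        final String value;
        final double score;

        Match(String value, double score) {
            this.value = value;
            this.score = score;
        }
    }

    // Returns the candidate with the highest score, or null if values is empty.
    static Match bestMatch(String key, Collection<? extends String> values,
                           ToDoubleBiFunction<String, String> scorer) {
        Match best = null;
        for (String candidate : values) {
            double s = scorer.applyAsDouble(key, candidate);
            // Strict comparison: on a tie the earlier candidate is kept (assumption).
            if (best == null || s > best.score) {
                best = new Match(candidate, s);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy scorer for the demo only: 1 for a case-insensitive match, otherwise 0.
        ToDoubleBiFunction<String, String> scorer =
            (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        Match m = bestMatch("sample1", Arrays.asList("control", "Sample1", "sample2"), scorer);
        System.out.println(m.value + " " + m.score);
    }
}
```

Passing the scorer as a parameter mirrors the way StringMatcher delegates scoring to its StringDistance object, so the search logic stays independent of the algorithm.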
-
getBestPairs
public List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys, Collection<? extends String> values)
Match strings in two lists. The result is a paired list of strings with one value from each of the lists. The matching is done so that no string from either list is paired more than once.
- Parameters:
keys - The list with the keys
values - The list with the values
- Returns:
- A list of StringMatcher.FuzzyMatch results. The returned list is the same size as the 'keys' collection, with the elements in the same order as returned by its iterator. The list may contain null elements if no match could be found for a given key
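One way to satisfy this contract is a greedy pairing over a score matrix: score every key against every value, repeatedly take the highest remaining score, and then retire that key and value. The sketch below assumes that strategy; the source does not confirm it, although the private getHighestScoreIdx(double[][]) helper is consistent with it. BestPairsSketch, Pair, highestScoreIdx and the scorer parameter are hypothetical names, and scores are assumed to lie in the range 0 to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical greedy pairing sketch; not the BASE implementation.
final class BestPairsSketch {

    // Hypothetical result holder standing in for StringMatcher.FuzzyMatch.
    static final class Pair {
        final String key;
        final String value;
        final double score;

        Pair(String key, String value, double score) {
            this.key = key;
            this.value = value;
            this.score = score;
        }
    }

    // Returns {row, column} of the largest non-negative entry, or null if none remains.
    static int[] highestScoreIdx(double[][] score) {
        int[] best = null;
        double max = -1.0;
        for (int r = 0; r < score.length; r++) {
            for (int c = 0; c < score[r].length; c++) {
                if (score[r][c] > max) {
                    max = score[r][c];
                    best = new int[] { r, c };
                }
            }
        }
        return best;
    }

    static List<Pair> bestPairs(Collection<? extends String> keys,
                                Collection<? extends String> values,
                                ToDoubleBiFunction<String, String> scorer) {
        List<String> k = new ArrayList<>(keys);
        List<String> v = new ArrayList<>(values);
        double[][] score = new double[k.size()][v.size()];
        for (int r = 0; r < k.size(); r++) {
            for (int c = 0; c < v.size(); c++) {
                score[r][c] = scorer.applyAsDouble(k.get(r), v.get(c));
            }
        }
        // One slot per key, in key order; unmatched keys stay null.
        List<Pair> result = new ArrayList<>(Collections.nCopies(k.size(), (Pair) null));
        int[] idx;
        while ((idx = highestScoreIdx(score)) != null) {
            int r = idx[0];
            int c = idx[1];
            result.set(r, new Pair(k.get(r), v.get(c), score[r][c]));
            // Mark the used key row and value column with -1 so neither is paired again.
            Arrays.fill(score[r], -1.0);
            for (double[] row : score) row[c] = -1.0;
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy scorer for the demo only: 1 for a case-insensitive match, otherwise 0.
        ToDoubleBiFunction<String, String> scorer =
            (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        List<Pair> pairs = bestPairs(Arrays.asList("a", "b", "c"), Arrays.asList("B", "A"), scorer);
        for (Pair p : pairs) {
            System.out.println(p == null ? "null" : p.key + " -> " + p.value);
        }
    }
}
```

Greedy selection is simple but not guaranteed to maximize the total score across all pairs. In this sketch a key is left as null only when the values run out; whether the real method also rejects low-scoring pairs is not stated in the source.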
-
getHighestScoreIdx
private int[] getHighestScoreIdx(double[][] score)
-