public class StringMatcher extends Object
Use getScore(String, String) to get the similarity score of two strings.
Use getBestMatch(String, Collection) to find the best match of a string among strings in a given list.
Use getBestPairs(Collection, Collection) to pair up strings in two lists.
Modifier and Type | Class and Description |
---|---|
static class | StringMatcher.FuzzyMatch: Wrapper that holds information about a fuzzy match. |
Modifier and Type | Field and Description |
---|---|
private com.wcohen.ss.api.StringDistance | matcher |
Constructor and Description |
---|
StringMatcher(): Create a new matcher using the default fuzzy matching algorithm, Level2JaroWinkler. |
StringMatcher(com.wcohen.ss.api.StringDistance matcher): Create a new matcher using a specific fuzzy matching algorithm. |
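
To give a feel for the kind of score a Jaro-Winkler style algorithm produces, here is a minimal, self-contained sketch of plain character-level Jaro-Winkler. It is a stand-in for illustration only: it is not the SecondString Level2JaroWinkler implementation (which, as far as its name suggests, works on tokens rather than whole strings), and the class and method names below are hypothetical.

```java
// Illustrative stand-in only; NOT the com.wcohen.ss Level2JaroWinkler class.
class JaroWinklerSketch {

    // Jaro similarity: 0.0 when no characters match, 1.0 for identical strings.
    static double jaro(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        if (a.isEmpty() || b.isEmpty()) return 0.0;

        // Characters only count as matching if they are within this distance.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;

        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Walk both strings' matched characters in order and count mismatches.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[j]) j++;
            if (a.charAt(i) != b.charAt(j)) outOfOrder++;
            j++;
        }
        int transpositions = outOfOrder / 2;

        double m = matches;
        return (m / a.length() + m / b.length() + (m - transpositions) / m) / 3.0;
    }

    // Winkler adjustment: boost the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String a, String b) {
        double jaro = jaro(a, b);
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("MARTHA", "MARHTA")); // about 0.961
        System.out.println(jaroWinkler("abc", "xyz"));       // 0.0
    }
}
```

Higher values mean more similar strings. Whether getScore(String, String) returns values on exactly this scale depends on the StringDistance implementation passed to the constructor.
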
Modifier and Type | Method and Description |
---|---|
StringMatcher.FuzzyMatch | getBestMatch(String key, Collection<? extends String> values): Find the string that is most similar to a given string in a list of strings. |
List<StringMatcher.FuzzyMatch> | getBestPairs(Collection<? extends String> keys, Collection<? extends String> values): Match strings in two lists. |
private int[] | getHighestScoreIdx(double[][] score) |
double | getScore(String s1, String s2): Get the similarity score of two strings. |
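
The best-match lookup summarized above can be sketched as a simple scan that keeps the highest-scoring candidate. This is a minimal illustration under stated assumptions, not the class's actual code: the Match type is a hypothetical stand-in for StringMatcher.FuzzyMatch (whose fields are not documented here), and positionalOverlap is a toy scorer used in place of a real StringDistance.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.function.ToDoubleBiFunction;

class BestMatchSketch {

    // Hypothetical stand-in for StringMatcher.FuzzyMatch.
    static final class Match {
        final String value;
        final double score;

        Match(String value, double score) {
            this.value = value;
            this.score = score;
        }
    }

    // Returns the highest-scoring candidate, or null when 'values' is empty.
    // On ties the first candidate in iteration order wins (an assumption).
    static Match bestMatch(String key,
                           Collection<? extends String> values,
                           ToDoubleBiFunction<String, String> scorer) {
        Match best = null;
        for (String candidate : values) {
            double score = scorer.applyAsDouble(key, candidate);
            if (best == null || score > best.score) {
                best = new Match(candidate, score);
            }
        }
        return best;
    }

    // Toy scorer: fraction of positions holding the same character.
    static double positionalOverlap(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 1.0;
        int same = 0;
        for (int i = 0; i < Math.min(a.length(), b.length()); i++) {
            if (a.charAt(i) == b.charAt(i)) same++;
        }
        return (double) same / longest;
    }

    public static void main(String[] args) {
        Match m = bestMatch("color",
                Arrays.asList("cooler", "colour", "collar"),
                BestMatchSketch::positionalOverlap);
        System.out.println(m.value + " " + m.score);

        // An empty candidate list yields null, mirroring the documented contract.
        System.out.println(bestMatch("color", Collections.<String>emptyList(),
                BestMatchSketch::positionalOverlap));
    }
}
```

Callers of the real getBestMatch(String, Collection) should likewise check for null before using the result, since an empty list produces no match.
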
public StringMatcher()
Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.

public StringMatcher(com.wcohen.ss.api.StringDistance matcher)
Create a new matcher using a specific fuzzy matching algorithm.
Parameters:
matcher - The algorithm to use

public double getScore(String s1, String s2)
Get the similarity score of two strings.
Parameters:
s1 - The first string
s2 - The second string

public StringMatcher.FuzzyMatch getBestMatch(String key, Collection<? extends String> values)
Find the string that is most similar to a given string in a list of strings.
Parameters:
key - The string to look for
values - The list of strings to compare with the key
Returns:
The StringMatcher.FuzzyMatch result for the string in the list that got the highest score; null if there are no values in the list

public List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys, Collection<? extends String> values)
Match strings in two lists.
Parameters:
keys - The list with the keys
values - The list with the values
Returns:
A list of StringMatcher.FuzzyMatch results. The returned list is of the same size as the 'keys' collection, with the elements in the same order as returned by the iterator. The list may contain null elements if no match could be found for a given key.

private int[] getHighestScoreIdx(double[][] score)
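
One way to satisfy the getBestPairs contract (a result the same size as 'keys', in key order, with null where no match was found) is a greedy pairing over a score matrix. The sketch below is an assumption suggested by the getHighestScoreIdx(double[][]) helper's signature, not a confirmed description of the implementation. It also assumes each value is used at most once and returns plain strings instead of StringMatcher.FuzzyMatch objects; commonPrefix is a toy scorer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class BestPairsSketch {

    // Greedy pairing (assumed strategy): repeatedly take the highest remaining
    // score and pair that key with that value. Keys left without a value get null.
    static List<String> bestPairs(Collection<? extends String> keys,
                                  Collection<? extends String> values,
                                  ToDoubleBiFunction<String, String> scorer) {
        List<String> k = new ArrayList<>(keys);
        List<String> v = new ArrayList<>(values);

        // score[i][j] is the similarity of key i and value j.
        double[][] score = new double[k.size()][v.size()];
        for (int i = 0; i < k.size(); i++) {
            for (int j = 0; j < v.size(); j++) {
                score[i][j] = scorer.applyAsDouble(k.get(i), v.get(j));
            }
        }

        List<String> result = new ArrayList<>(Collections.nCopies(k.size(), (String) null));
        boolean[] keyDone = new boolean[k.size()];
        boolean[] valueUsed = new boolean[v.size()];

        int pairs = Math.min(k.size(), v.size());
        for (int n = 0; n < pairs; n++) {
            int bestI = -1;
            int bestJ = -1;
            for (int i = 0; i < k.size(); i++) {
                if (keyDone[i]) continue;
                for (int j = 0; j < v.size(); j++) {
                    if (valueUsed[j]) continue;
                    if (bestI < 0 || score[i][j] > score[bestI][bestJ]) {
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            keyDone[bestI] = true;
            valueUsed[bestJ] = true;
            result.set(bestI, v.get(bestJ));
        }
        return result;
    }

    // Toy scorer: length of the shared prefix.
    static double commonPrefix(String a, String b) {
        int n = 0;
        while (n < a.length() && n < b.length() && a.charAt(n) == b.charAt(n)) n++;
        return n;
    }

    public static void main(String[] args) {
        List<String> pairs = bestPairs(
                Arrays.asList("apple", "banana", "cherry"),
                Arrays.asList("banan", "cherri"),
                BestPairsSketch::commonPrefix);
        System.out.println(pairs); // [null, banan, cherri]
    }
}
```

Greedy pairing is simple but not guaranteed to maximize the total score across all pairs; an optimal assignment would need something like the Hungarian algorithm.
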