Class StringMatcher

    • Field Detail

      • matcher

        private final com.wcohen.ss.api.StringDistance matcher
    • Constructor Detail

      • StringMatcher

        public StringMatcher()
        Create a new matcher using the default fuzzy matching algorithm: Level2JaroWinkler.
      • StringMatcher

        public StringMatcher(com.wcohen.ss.api.StringDistance matcher)
        Create a new matcher using a specific fuzzy matching algorithm.
        Parameters:
        matcher - The algorithm to use
    • Method Detail

      • getScore

        public double getScore(String s1,
                               String s2)
        Get the similarity score of two strings. The score is a value between 0 and 1, where 0 indicates a poor match and 1 a good match. Note that the strings do not have to be identical to get a score of 1.
        Parameters:
        s1 - The first string
        s2 - The second string
        Returns:
        The similarity score, or 0 if either string is null
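        A minimal sketch of the documented null-handling contract, assuming a plain java.util.function.ToDoubleBiFunction as a stand-in for com.wcohen.ss.api.StringDistance. NullSafeScorer is hypothetical and is not part of this class or of the SecondString library.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical wrapper; not part of StringMatcher or SecondString.
final class NullSafeScorer {
    private final ToDoubleBiFunction<String, String> distance;

    NullSafeScorer(ToDoubleBiFunction<String, String> distance) {
        this.distance = distance;
    }

    // Mirrors the documented contract: 0 when either argument is null,
    // otherwise whatever the underlying algorithm returns.
    double getScore(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return 0.0;
        }
        return distance.applyAsDouble(s1, s2);
    }
}
```

        With a toy case-insensitive scorer such as (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0, getScore("Apple", "APPLE") returns 1.0 although the strings are not identical, illustrating how a score of 1 need not mean an exact match. Whether Level2JaroWinkler treats case differences this way is not stated here.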
      • getBestMatch

        public StringMatcher.FuzzyMatch getBestMatch(String key,
                                                     Collection<? extends String> values)
        Find the string in a list of strings that is most similar to a given string.
        Parameters:
        key - The string to look for
        values - The list of strings to compare with the key
        Returns:
        A StringMatcher.FuzzyMatch result for the string in the list that received the highest score, or null if the list is empty
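        The lookup can be sketched as a linear scan that keeps the highest-scoring value. This is a hypothetical illustration, not the class's actual implementation: Match stands in for StringMatcher.FuzzyMatch (whose fields are not documented here), the scoring function is passed in as a ToDoubleBiFunction, and the tie-breaking rule (first value wins) is an assumption.

```java
import java.util.Collection;
import java.util.function.ToDoubleBiFunction;

// Hypothetical result type; the real StringMatcher.FuzzyMatch may differ.
final class Match {
    final String key;
    final String value;
    final double score;

    Match(String key, String value, double score) {
        this.key = key;
        this.value = value;
        this.score = score;
    }
}

final class BestMatchSketch {
    private BestMatchSketch() {}

    // Scores the key against every value and keeps the highest.
    // Returns null when there are no values, as documented above.
    static Match bestMatch(String key, Collection<? extends String> values,
                           ToDoubleBiFunction<String, String> score) {
        Match best = null;
        for (String value : values) {
            double s = score.applyAsDouble(key, value);
            // Strict '>' keeps the first value on ties (an assumption).
            if (best == null || s > best.score) {
                best = new Match(key, value, s);
            }
        }
        return best;
    }
}
```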
      • getBestPairs

        public List<StringMatcher.FuzzyMatch> getBestPairs(Collection<? extends String> keys,
                                                           Collection<? extends String> values)
        Match strings in two lists. The result is a list of pairs, each containing one string from each list. The matching is done so that no string from either list is paired more than once.
        Parameters:
        keys - The list with the keys
        values - The list with the values
        Returns:
        A list of StringMatcher.FuzzyMatch results. The returned list has the same size as the 'keys' collection, with its elements in the order returned by that collection's iterator. The list may contain null elements if no match could be found for a given key
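        One plausible way to satisfy the one-to-one constraint is a greedy strategy: score every key against every value, then repeatedly take the highest remaining score and retire both strings. The private getHighestScoreIdx(double[][]) helper hints at something like this, but the actual algorithm is an assumption. For brevity, the hypothetical GreedyPairing sketch returns the paired value (or null) for each key instead of a FuzzyMatch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical greedy pairing; not the documented implementation.
final class GreedyPairing {
    private GreedyPairing() {}

    // For each key (in iteration order), returns the value it was paired
    // with, or null if every value was already taken by a better pair.
    static List<String> bestPairs(Collection<? extends String> keys,
                                  Collection<? extends String> values,
                                  ToDoubleBiFunction<String, String> score) {
        List<String> k = new ArrayList<>(keys);
        List<String> v = new ArrayList<>(values);
        // Score matrix: rows are keys, columns are values.
        double[][] m = new double[k.size()][v.size()];
        for (int i = 0; i < k.size(); i++) {
            for (int j = 0; j < v.size(); j++) {
                m[i][j] = score.applyAsDouble(k.get(i), v.get(j));
            }
        }
        List<String> result = new ArrayList<>(Collections.nCopies(k.size(), (String) null));
        boolean[] rowUsed = new boolean[k.size()];
        boolean[] colUsed = new boolean[v.size()];
        int pairs = Math.min(k.size(), v.size());
        for (int n = 0; n < pairs; n++) {
            // Highest remaining score among unpaired keys and values;
            // row-major scan with strict '>' keeps the first cell on ties.
            int bestRow = -1;
            int bestCol = -1;
            for (int i = 0; i < m.length; i++) {
                if (rowUsed[i]) continue;
                for (int j = 0; j < m[i].length; j++) {
                    if (colUsed[j]) continue;
                    if (bestRow < 0 || m[i][j] > m[bestRow][bestCol]) {
                        bestRow = i;
                        bestCol = j;
                    }
                }
            }
            rowUsed[bestRow] = true;
            colUsed[bestCol] = true;
            result.set(bestRow, v.get(bestCol));
        }
        return result;
    }
}
```

        Greedy selection is simple and takes roughly O(min(k, v) · k · v) time for k keys and v values, but it does not always maximize the total score across all pairs; an optimal assignment would need something like the Hungarian algorithm.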
      • getHighestScoreIdx

        private int[] getHighestScoreIdx(double[][] score)