diff --git a/README.md b/README.md index 5bd4e45..79828f1 100644 --- a/README.md +++ b/README.md @@ -29,8 +29,8 @@ where `s1` and `s2` can be any iterator with a `length` method (e.g. `AbstractSt - The package includes distance "modifiers", that can be applied to any distance. - - [Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) boosts the similary score of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score but this package defines it for any string distance. - - [Partial](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) returns the maximal similarity score between the shorter string and substrings of the longer string. + - [Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score but this package defines it for any string distance. + - [Partial](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) returns the minimum distance between the shorter string and substrings of the longer string. - [TokenSort](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) adjusts for differences in word orders by reording words alphabetically. - [TokenSet](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) adjusts for differences in word orders and word numbers by comparing the intersection of two strings with each string. - [TokenMax](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) combines scores using the base distance, the `Partial`, `TokenSort` and `TokenSet` modifiers, with penalty terms depending on string lengths. @@ -48,14 +48,15 @@ evaluate(TokenSet(Jaro()), "martha", "marhta") evaluate(TokenMax(RatcliffObershelp()), "martha", "marhta") ``` -A good distance to match strings composed of multiple words (like addresses) is `TokenMax(Levenshtein())` (see [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)). - -Alternatively, each `dist` struct can be used as a callable to call the evaluate function of each metric or modified metric, for example: +Alternatively, each distance can be used as a callable to call the evaluate function of each metric or modified metric, for example: ```julia Jaro()("martha", "marhta") Winkler(Jaro())("martha", "marhta") QGram(2)("martha", "marhta") ``` + +A good distance to match strings composed of multiple words (like addresses) is `TokenMax(Levenshtein())` (see [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)). + ## Compare The function `compare` is defined as 1 minus the normalized distance between two strings. It always returns a number between 0 and 1: a value of 0 means completely different and a value of 1 means completely similar. ```julia