diff --git a/README.md b/README.md
index 7267986..79d9e89 100644
--- a/README.md
+++ b/README.md
@@ -4,15 +4,8 @@
 ## Installation
 The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with `] add StringDistances`.
-## Evaluate
-To compute the distance between two strings (or between two iterators), you can use one of these two syntaxes:
-
-```julia
-evaluate(dist, s1, s2)
-dist()(s1, s2)
-```
-
-where `dist` is one of the following distances::
+## Supported Distances
+The available distances are:
 - Edit Distances
 - [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) `Jaro()`
@@ -25,9 +18,7 @@ where `dist` is one of the following distances::
 - [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index) `Jaccard(q::Int)`
 - [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient) `Overlap(q::Int)`
 - [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) `SorensenDice(q::Int)`
-
-- The package includes distance "modifiers", that can be applied to any distance.
-
+- Distance "modifiers" that can be applied to any distance:
 - [Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score but this package defines it for any string distance.
 - [Partial](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) returns the minimum distance between the shorter string and substrings of the longer string.
 - [TokenSort](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) adjusts for differences in word orders by reording words alphabetically.
@@ -37,29 +28,36 @@ where `dist` is one of the following distances::
 A good distance to match strings composed of multiple words (like addresses) is `TokenMax(Levenshtein())` (see [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)
+
+## Basic Use
+
+### evaluate
+You can always compute a certain distance between two strings (or iterators) using the following syntax:
+
 ```julia
-evaluate(Levenshtein(), "martha", "marhta")
-evaluate(QGram(2), "martha", "marhta")
-evaluate(Winkler(Jaro()), "martha", "marhta")
-
-
-Levenshtein()("martha", "marhta")
-QGram(2)("martha", "marhta")
-Winkler(Jaro())("martha", "marhta")
+evaluate(dist, s1, s2)
+dist(s1, s2)
 ```
-## Compare
+For instance, with the `Levenshtein` distance,
+
+```julia
+evaluate(Levenshtein(), "martha", "marhta")
+Levenshtein()("martha", "marhta")
+```
+
+### compare
 The function `compare` is defined as 1 minus the normalized distance between two strings. It always returns a number between 0 and 1: a value of 0 means completely different and a value of 1 means completely similar.
+
 ```julia
 evaluate(Levenshtein(), "New York", "New York")
 #> 0
 compare("New York", "New York", Levenshtein())
 #> 1.0
-
 ```
-## Find
+## find
 - `findmax` returns the value and index of the element in `itr` with the highest similarity score with `s`. Its syntax is:
 ```julia
 findmax(s, itr, dist::StringDistance; min_score = 0.0)