Update README.md

compathelper/new_version/2020-05-20-12-03-08-092-188304956
matthieugomez 2020-03-03 06:48:00 -05:00
parent 6c9058a03a
commit 38d87ccd34
1 changed files with 21 additions and 23 deletions

@@ -4,15 +4,8 @@
## Installation
The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with `] add StringDistances`.
## Evaluate
To compute the distance between two strings (or between two iterators), you can use one of these two syntaxes:
```julia
evaluate(dist, s1, s2)
dist()(s1, s2)
```
where `dist` is one of the following distances::
## Supported Distances
The available distances are:
- Edit Distances
- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) `Jaro()`
@@ -25,9 +18,7 @@ where `dist` is one of the following distances::
- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index) `Jaccard(q::Int)`
- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient) `Overlap(q::Int)`
- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) `SorensenDice(q::Int)`
- The package includes distance "modifiers", that can be applied to any distance.
- Distance "modifiers" that can be applied to any distance:
- [Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score but this package defines it for any string distance.
- [Partial](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) returns the minimum distance between the shorter string and substrings of the longer string.
- [TokenSort](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) adjusts for differences in word order by reordering words alphabetically.
@@ -37,29 +28,36 @@ where `dist` is one of the following distances::
A good distance to match strings composed of multiple words (like addresses) is `TokenMax(Levenshtein())` (see [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)).
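
To make the word-reordering idea behind `TokenSort` concrete, here is a minimal sketch in plain Julia. The helper `token_sort` is hypothetical and is not the package's implementation; it only shows why two strings with the same words in a different order become identical before a distance is applied.

```julia
# Hypothetical helper (not part of StringDistances): split on whitespace,
# sort the words alphabetically, and join them back with single spaces.
token_sort(s::AbstractString) = join(sort(split(s)), " ")

# Both inputs normalize to the same string, so any distance between the
# normalized forms is zero.
token_sort("York New") == token_sort("New York")
```
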
## Basic Use
### evaluate
You can always compute a certain distance between two strings (or iterators) using the following syntax:
```julia
evaluate(Levenshtein(), "martha", "marhta")
evaluate(QGram(2), "martha", "marhta")
evaluate(Winkler(Jaro()), "martha", "marhta")
Levenshtein()("martha", "marhta")
QGram(2)("martha", "marhta")
Winkler(Jaro())("martha", "marhta")
evaluate(dist, s1, s2)
dist(s1, s2)
```
## Compare
For instance, with the `Levenshtein` distance,
```julia
evaluate(Levenshtein(), "martha", "marhta")
Levenshtein()("martha", "marhta")
```
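
The two syntaxes are equivalent because a distance object can be made callable. The sketch below illustrates the pattern with a toy type; `ToyHamming` and the local `evaluate` are assumptions for illustration, defined without loading the package, and do not reflect how StringDistances is written internally.

```julia
# Hypothetical toy distance type, for illustration only.
struct ToyHamming end

# Count the positions at which two equal-length strings differ.
function evaluate(::ToyHamming, s1, s2)
    length(s1) == length(s2) || throw(ArgumentError("strings must have equal length"))
    return count(p -> p[1] != p[2], zip(s1, s2))
end

# Making instances callable simply forwards to `evaluate`.
(d::ToyHamming)(s1, s2) = evaluate(d, s1, s2)

# Both calls go through the same method and return the same value.
evaluate(ToyHamming(), "martha", "marhta")
ToyHamming()("martha", "marhta")
```
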
### compare
The function `compare` is defined as 1 minus the normalized distance between two strings. It always returns a number between 0 and 1: a value of 0 means the strings are completely different and a value of 1 means they are identical.
```julia
evaluate(Levenshtein(), "New York", "New York")
#> 0
compare("New York", "New York", Levenshtein())
#> 1.0
```
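
As a rough sketch of "1 minus the normalized distance", the following self-contained Julia code computes a Levenshtein distance and divides it by the length of the longer string. Both `lev` and `similarity` are hypothetical helpers, and normalizing by the longer length is an assumption; the package may normalize differently.

```julia
# Hypothetical helper (not the package's code): Levenshtein distance
# computed with the standard row-by-row dynamic program.
function lev(s1::AbstractString, s2::AbstractString)
    a, b = collect(s1), collect(s2)
    prev = collect(0:length(b))      # distances from the empty prefix of `a`
    for i in 1:length(a)
        cur = similar(prev)
        cur[1] = i                   # deleting the first i characters of `a`
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1,   # deletion
                             cur[j] + 1,        # insertion
                             prev[j] + cost)    # substitution or match
        end
        prev = cur
    end
    return prev[end]
end

# Hypothetical: similarity as 1 minus the distance divided by the longer length.
function similarity(s1, s2)
    n = max(length(s1), length(s2))
    return n == 0 ? 1.0 : 1.0 - lev(s1, s2) / n
end

similarity("New York", "New York")   # identical strings give 1.0
similarity("martha", "marhta")       # distance 2 over length 6, i.e. 1 - 2/6
```
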
## Find
## find
- `findmax` returns the value and index of the element in `itr` with the highest similarity score with `s`. Its syntax is:
```julia
findmax(s, itr, dist::StringDistance; min_score = 0.0)