Update README.md
parent 6c9058a03a
commit 38d87ccd34
44 README.md
@@ -4,15 +4,8 @@
## Installation
The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with `] add StringDistances`.

## Evaluate
To compute the distance between two strings (or between two iterators), you can use one of these two syntaxes:

```julia
evaluate(dist, s1, s2)
dist()(s1, s2)
```

where `dist` is one of the following distances:
## Supported Distances
The available distances are:

- Edit Distances
- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) `Jaro()`

@@ -25,9 +18,7 @@ where `dist` is one of the following distances:
- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index) `Jaccard(q::Int)`
- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient) `Overlap(q::Int)`
- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) `SorensenDice(q::Int)`

- The package includes distance "modifiers", that can be applied to any distance.

- Distance "modifiers" that can be applied to any distance:
- [Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) diminishes the distance of strings with common prefixes. The Winkler adjustment was originally defined for the Jaro similarity score but this package defines it for any string distance.
- [Partial](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) returns the minimum distance between the shorter string and substrings of the longer string.
- [TokenSort](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) adjusts for differences in word orders by reordering words alphabetically.

@@ -37,29 +28,36 @@ where `dist` is one of the following distances:

A good distance to match strings composed of multiple words (like addresses) is `TokenMax(Levenshtein())` (see [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)).
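
To make the `Partial` and `TokenSort` modifiers described above more concrete, here is a minimal sketch in plain Julia. It does not use the package: every function name is hypothetical, the mismatch count is only a stand-in for a real base distance, and restricting `Partial` to windows of the shorter string's length is an assumption borrowed from the fuzzywuzzy description.

```julia
# Illustrative sketch only; these helper names are hypothetical and are not
# part of StringDistances.

# Stand-in base distance: number of mismatched positions in two sequences of
# equal length (a Hamming-style count).
mismatch_count(a, b) = count(x != y for (x, y) in zip(a, b))

# Partial idea: slide a window the size of the shorter string over the longer
# one and keep the smallest distance found (assumed window-based definition).
function partial_sketch(dist, s1::AbstractString, s2::AbstractString)
    a, b = collect(s1), collect(s2)
    short, long = length(a) <= length(b) ? (a, b) : (b, a)
    n = length(short)
    return minimum(dist(short, long[i:i+n-1]) for i in 1:(length(long) - n + 1))
end

# TokenSort idea: split on whitespace, sort the words, and rejoin them so that
# word order no longer affects the comparison.
token_sort_sketch(s::AbstractString) = join(sort(split(s)), " ")

partial_sketch(mismatch_count, "York", "New York City")          # 0: "York" occurs in the longer string
token_sort_sketch("York New") == token_sort_sketch("New York")   # true
```

In the package itself the modifiers wrap a distance object, as in `TokenMax(Levenshtein())`; the functions above only mimic the preprocessing step.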


## Basic Use

### evaluate
You can always compute a certain distance between two strings (or iterators) using the following syntax:

```julia
evaluate(Levenshtein(), "martha", "marhta")
evaluate(QGram(2), "martha", "marhta")
evaluate(Winkler(Jaro()), "martha", "marhta")


Levenshtein()("martha", "marhta")
QGram(2)("martha", "marhta")
Winkler(Jaro())("martha", "marhta")
evaluate(dist, s1, s2)
dist(s1, s2)
```

## Compare
For instance, with the `Levenshtein` distance,

```julia
evaluate(Levenshtein(), "martha", "marhta")
Levenshtein()("martha", "marhta")
```
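
For readers who want to see what a Levenshtein evaluation computes, the following is a minimal dynamic-programming sketch in plain Julia. It is not the package's implementation, and `levenshtein_sketch` is a hypothetical name used only for illustration.

```julia
# Illustrative sketch only; not the implementation used by StringDistances.
function levenshtein_sketch(s1, s2)
    a, b = collect(s1), collect(s2)
    # prev[j + 1] is the distance between the prefix of `a` processed so far
    # and the first j elements of `b`.
    prev = collect(0:length(b))
    for i in 1:length(a)
        curr = similar(prev)
        curr[1] = i                      # deleting the first i elements of `a`
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1  # substitution cost
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # match or substitution
        end
        prev = curr
    end
    return prev[end]
end

levenshtein_sketch("martha", "marhta")   # 2: the swapped "th"/"ht" costs two substitutions
```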

### compare
The function `compare` is defined as 1 minus the normalized distance between two strings. It always returns a number between 0 and 1: a value of 0 means completely different and a value of 1 means identical.

```julia
evaluate(Levenshtein(), "New York", "New York")
#> 0
compare("New York", "New York", Levenshtein())
#> 1.0

```
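
The relation between a raw distance and the `compare` score can be sketched as follows. This assumes the distance is normalized by the length of the longer string, which the text above does not spell out, and `similarity_sketch` is a hypothetical helper rather than part of the package.

```julia
# Illustrative sketch only; not the package's API. `d` is a raw edit distance
# that has already been computed for the pair (s1, s2).
function similarity_sketch(d::Integer, s1::AbstractString, s2::AbstractString)
    maxlen = max(length(s1), length(s2))
    maxlen == 0 && return 1.0      # two empty strings are treated as identical
    return 1.0 - d / maxlen        # assumed normalization: longer string's length
end

similarity_sketch(0, "New York", "New York")   # 1.0, matching the example above
similarity_sketch(2, "martha", "marhta")       # 1 - 2/6, roughly 0.67
```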


## Find
## find
- `findmax` returns the value and index of the element in `itr` with the highest similarity score with `s`. Its syntax is:
```julia
findmax(s, itr, dist::StringDistance; min_score = 0.0)