readme
parent 2ebe788f8c
commit 4a743452b3
22 README.md
@@ -9,21 +9,21 @@ This Julia package computes various distances between strings.
 ## Distances
 
 #### Edit Distances
-- [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance)
-- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
-- [Damerau-Levenshtein Distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)
+- [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) `Hamming()`
+- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) `Levenshtein()`
+- [Damerau-Levenshtein Distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) `DamerauLevenshtein()`
 
 #### Q-Grams Distances
 Q-gram distances compare the set of all substrings of length `q` in each string.
-- QGram Distance
-- [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity)
-- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index)
-- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient)
-- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)
+- QGram Distance `Qgram(q)`
+- [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity) `Cosine(q)`
+- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index) `Jaccard(q)`
+- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient) `Overlap(q)`
+- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) `SorensenDice(q)`
 
 #### Others
-- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
-- [RatcliffObershelp Distance](https://xlinux.nist.gov/dads/HTML/ratcliffObershelp.html)
+- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) `Jaro()`
+- [RatcliffObershelp Distance](https://xlinux.nist.gov/dads/HTML/ratcliffObershelp.html) `RatcliffObershelp()`
 
 ## Syntax
 The function `evaluate` return the *litteral distance* between two strings.
@@ -101,7 +101,7 @@ The package includes distance "modifiers", that can be applied to any distance.
 As a rule of thumb,
 - Standardize strings before comparing them (correct for uppercases, punctuations, whitespaces, accents, abbreviations...)
 - Don't use Edit Distances if word order do not matter.
-- The distance `Tokenmax(RatcliffObershelp())' is a good default choice.
+- The distance `Tokenmax(RatcliffObershelp())` is a good default choice.
 
 ## References
 - [The stringdist Package for Approximate String Matching](https://journal.r-project.org/archive/2014-1/loo.pdf) Mark P.J. van der Loo
@@ -12,6 +12,13 @@ using StringDistances, Base.Test
 
 @test compare(Jaccard(2), "", "abc") ≈ 0.0 atol = 1e-4
 
+@test compare(Jaccard(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Cosine(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Jaccard(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Overlap(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(SorensenDice(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+
+
 # Winkler
 @test compare(Winkler(Jaro(), 0.1, 0.0), "martha", "marhta") ≈ 0.9611 atol = 1e-4
 @test compare(Winkler(Jaro(), 0.1, 0.0), "dwayne", "duane") ≈ 0.84 atol = 1e-4
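
The q-gram tests added in this commit expect a similarity of 1.0 for identical strings and 0.0 when one string is empty. The sketch below illustrates why, using plain Julia sets of `q`-grams as described in the README. It is not the package's implementation: `qgram_set` and `jaccard_similarity` are hypothetical helper names, and treating two strings with no q-grams as identical is an assumption made here.

```julia
# Hypothetical helpers for illustration; not part of StringDistances.

# Collect the set of all substrings of length q (q-grams) of s.
# Strings shorter than q yield an empty set.
function qgram_set(s::AbstractString, q::Integer)
    chars = collect(s)
    n = length(chars)
    return Set{String}(String(chars[i:i+q-1]) for i in 1:(n - q + 1))
end

# Jaccard similarity between the q-gram sets of two strings:
# size of the intersection divided by size of the union.
function jaccard_similarity(s1::AbstractString, s2::AbstractString, q::Integer)
    a = qgram_set(s1, q)
    b = qgram_set(s2, q)
    u = length(union(a, b))
    # Assumption: if neither string has any q-gram, call them identical.
    u == 0 && return 1.0
    return length(intersect(a, b)) / u
end

jaccard_similarity("martha", "martha", 2)  # identical q-gram sets
jaccard_similarity("", "abc", 2)           # empty intersection
```

Identical strings share every q-gram, so the ratio is 1.0; an empty string contributes no q-grams, so the intersection is empty and the ratio is 0.0.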