From 4a743452b3123ff26262f04c25a5d00f80089d16 Mon Sep 17 00:00:00 2001
From: matthieugomez
Date: Tue, 15 May 2018 18:47:55 -0400
Subject: [PATCH] readme

---
 README.md         | 22 +++++++++++-----------
 test/modifiers.jl |  7 +++++++
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 6b840d5..0a4b275 100644
--- a/README.md
+++ b/README.md
@@ -9,21 +9,21 @@ This Julia package computes various distances between strings.
 
 ## Distances
 #### Edit Distances
-- [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance)
-- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
-- [Damerau-Levenshtein Distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)
+- [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) `Hamming()`
+- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) `Levenshtein()`
+- [Damerau-Levenshtein Distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) `DamerauLevenshtein()`
 
 #### Q-Grams Distances
 Q-gram distances compare the set of all substrings of length `q` in each string.
-- QGram Distance
-- [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity)
-- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index)
-- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient)
-- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)
+- QGram Distance `Qgram(q)`
+- [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity) `Cosine(q)`
+- [Jaccard Distance](https://en.wikipedia.org/wiki/Jaccard_index) `Jaccard(q)`
+- [Overlap Distance](https://en.wikipedia.org/wiki/Overlap_coefficient) `Overlap(q)`
+- [Sorensen-Dice Distance](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) `SorensenDice(q)`
 
 #### Others
-- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
-- [RatcliffObershelp Distance](https://xlinux.nist.gov/dads/HTML/ratcliffObershelp.html)
+- [Jaro Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) `Jaro()`
+- [RatcliffObershelp Distance](https://xlinux.nist.gov/dads/HTML/ratcliffObershelp.html) `RatcliffObershelp()`
 
 ## Syntax
 The function `evaluate` return the *litteral distance* between two strings.
@@ -101,7 +101,7 @@ The package includes distance "modifiers", that can be applied to any distance.
 As a rule of thumb,
 - Standardize strings before comparing them (correct for uppercases, punctuations, whitespaces, accents, abbreviations...)
 - Don't use Edit Distances if word order do not matter.
-- The distance `Tokenmax(RatcliffObershelp())' is a good default choice.
+- The distance `Tokenmax(RatcliffObershelp())` is a good default choice.
 
 ## References
 - [The stringdist Package for Approximate String Matching](https://journal.r-project.org/archive/2014-1/loo.pdf) Mark P.J. van der Loo
diff --git a/test/modifiers.jl b/test/modifiers.jl
index e6a1dc2..69250a6 100644
--- a/test/modifiers.jl
+++ b/test/modifiers.jl
@@ -12,6 +12,13 @@ using StringDistances, Base.Test
 @test compare(Jaccard(2), "", "abc") ≈ 0.0 atol = 1e-4
 
 
+@test compare(Jaccard(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Cosine(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Jaccard(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(Overlap(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+@test compare(SorensenDice(2), "martha", "martha") ≈ 1.0 atol = 1e-4
+
+
 # Winkler
 @test compare(Winkler(Jaro(), 0.1, 0.0), "martha", "marhta") ≈ 0.9611 atol = 1e-4
 @test compare(Winkler(Jaro(), 0.1, 0.0), "dwayne", "duane") ≈ 0.84 atol = 1e-4
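The README text in this patch says that Q-gram distances compare the set of all substrings of length `q` in each string. As a rough illustration only, here is a minimal sketch in plain Julia (no StringDistances.jl dependency) of a Jaccard similarity over 2-gram sets. The helpers `qgram_set` and `jaccard_similarity` are hypothetical and are not the package's implementation; treating q-grams as a set (duplicates ignored) and treating two q-gram-free strings as identical are assumptions. Identical strings score 1.0 and `""` vs `"abc"` scores 0.0, the same values the `compare(Jaccard(2), ...)` tests in this patch check for.

```julia
# Minimal sketch, NOT the StringDistances.jl implementation.
# `qgram_set` and `jaccard_similarity` are hypothetical helpers for illustration.

# Set of all length-q substrings (q-grams) of s; empty if s is shorter than q.
function qgram_set(s::AbstractString, q::Integer)
    chars = collect(s)  # index by character, not by byte (safe for non-ASCII)
    return Set(String(chars[i:i+q-1]) for i in 1:length(chars)-q+1)
end

# Jaccard similarity |A ∩ B| / |A ∪ B| of the two q-gram sets.
function jaccard_similarity(s1::AbstractString, s2::AbstractString, q::Integer)
    a, b = qgram_set(s1, q), qgram_set(s2, q)
    # Assumption: two strings with no q-grams at all count as identical.
    isempty(a) && isempty(b) && return 1.0
    return length(intersect(a, b)) / length(union(a, b))
end

println(jaccard_similarity("martha", "martha", 2))  # 1.0
println(jaccard_similarity("martha", "marhta", 2))  # 0.25
println(jaccard_similarity("", "abc", 2))           # 0.0
```

For `"martha"` vs `"marhta"`, only the 2-grams `ma` and `ar` are shared out of 8 distinct 2-grams, so this sketch returns 0.25.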