StringDistances

StringDistances computes various distances between strings. It works with any string of type AbstractString (in particular ASCII and UTF-8 strings).
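Supporting UTF-8 means working with characters rather than bytes. The plain-Julia snippet below (no StringDistances calls; it assumes Julia 1.x, where String is UTF-8 encoded and indexed by byte offset) shows why a character-level distance cannot simply loop over byte indices.

```julia
# A Julia String stores UTF-8 bytes and is indexed by byte offset, so a
# multi-byte character such as 'é' spans more than one index.
s = "café"
println(length(s))      # prints 4: number of characters (Unicode code points)
println(sizeof(s))      # prints 5: number of bytes, since 'é' takes two bytes
println(isvalid(s, 5))  # prints false: byte 5 is the second byte of 'é'

# Collecting into a Vector{Char} gives one element per character,
# which can then be indexed safely by position 1:length(s).
chars = collect(s)
println(chars[4])       # prints é
```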

Distances

  • Hamming Distance
  • Jaro Distance
  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • QGram Distance
  • Cosine Distance
  • Jaccard Distance
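For intuition, here are dependency-free sketches of three of the distances above in plain Julia. The names hamming_sketch, levenshtein_sketch, qgram_counts and qgram_sketch are hypothetical and are not part of the package's API (the package is used through evaluate, as in the Syntax section). The q-gram version uses the usual definition (sum of absolute differences of q-gram counts); that StringDistances computes exactly the same quantity is an assumption. All three operate on characters, so UTF-8 input is handled correctly.

```julia
# Illustrative reimplementations only; not StringDistances' own code.

# Hamming distance: number of positions at which two equal-length strings differ.
# This sketch rejects strings of different lengths.
function hamming_sketch(s1::AbstractString, s2::AbstractString)
    c1, c2 = collect(s1), collect(s2)   # compare characters, not bytes
    length(c1) == length(c2) || throw(ArgumentError("strings must have equal length"))
    return count(i -> c1[i] != c2[i], eachindex(c1))
end

# Levenshtein distance: minimum number of single-character insertions,
# deletions and substitutions, computed with one rolling DP row.
function levenshtein_sketch(s1::AbstractString, s2::AbstractString)
    a, b = collect(s1), collect(s2)
    prev = collect(0:length(b))          # distances from "" to each prefix of b
    for i in 1:length(a)
        cur = similar(prev)
        cur[1] = i                       # distance from a[1:i] to ""
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1,   # deletion
                             cur[j] + 1,        # insertion
                             prev[j] + cost)    # substitution or match
        end
        prev = cur
    end
    return prev[end]
end

# Count every contiguous q-character substring (q-gram) of s.
function qgram_counts(s::AbstractString, q::Int)
    c = collect(s)
    counts = Dict{String,Int}()
    for i in 1:(length(c) - q + 1)
        g = String(c[i:i+q-1])
        counts[g] = get(counts, g, 0) + 1
    end
    return counts
end

# QGram distance: sum over all q-grams of |count in s1 - count in s2|.
function qgram_sketch(s1::AbstractString, s2::AbstractString, q::Int)
    d1, d2 = qgram_counts(s1, q), qgram_counts(s2, q)
    return sum((abs(get(d1, g, 0) - get(d2, g, 0)) for g in union(keys(d1), keys(d2))); init = 0)
end
```

With these sketches, "martha" and "marhta" are at Hamming distance 2, Levenshtein distance 2 (two substitutions, since plain Levenshtein has no transposition operation) and 2-gram distance 6.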

Syntax

  • The basic syntax follows that of the Distances package:

    using StringDistances
    evaluate(Hamming(), "martha", "marhta")
    evaluate(QGram(2), "martha", "marhta")
    
  • Normalize a distance to the range [0, 1] with Normalized:

    evaluate(Normalized(Hamming()), "martha", "marhta")
    evaluate(Normalized(QGram(2)), "martha", "marhta")
    
  • Add a Winkler adjustment with Winkler:

    evaluate(Winkler(Jaro()), "martha", "marhta")
    evaluate(Winkler(QGram(2)), "martha", "marhta")
    

    The Winkler adjustment was originally defined for the Jaro distance, but it can be useful with other distances too. Note: a distance is automatically normalized to the range [0, 1] when used with a Winkler adjustment.
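To make the Jaro case concrete, here is a sketch of the textbook Jaro-Winkler construction in plain Julia. It assumes the standard constants (scaling factor p = 0.1, common prefix capped at 4 characters); jaro_similarity and winkler_distance are hypothetical names, not the package's API, and whether Winkler in StringDistances uses the same constants or an additional boost threshold is not stated here.

```julia
# Jaro similarity: characters match if they are equal and lie within a
# window of max(len)÷2 - 1 positions; t is half the number of matched
# characters that differ when both matched sequences are read in order.
function jaro_similarity(s1::AbstractString, s2::AbstractString)
    a, b = collect(s1), collect(s2)
    la, lb = length(a), length(b)
    (la == 0 && lb == 0) && return 1.0
    (la == 0 || lb == 0) && return 0.0
    window = max(max(la, lb) ÷ 2 - 1, 0)
    amatch, bmatch = falses(la), falses(lb)
    m = 0
    for i in 1:la
        for j in max(1, i - window):min(lb, i + window)
            if !bmatch[j] && a[i] == b[j]
                amatch[i] = bmatch[j] = true
                m += 1
                break
            end
        end
    end
    m == 0 && return 0.0
    # Walk both matched sequences in order and count position-wise mismatches.
    k, mismatches = 1, 0
    for i in 1:la
        amatch[i] || continue
        while !bmatch[k]
            k += 1
        end
        a[i] != b[k] && (mismatches += 1)
        k += 1
    end
    t = mismatches / 2
    return (m / la + m / lb + (m - t) / m) / 3
end

# Winkler adjustment: boost the Jaro similarity for a shared prefix of
# length l (at most maxprefix), then convert it to a distance in [0, 1].
function winkler_distance(s1::AbstractString, s2::AbstractString; p = 0.1, maxprefix = 4)
    sim = jaro_similarity(s1, s2)
    a, b = collect(s1), collect(s2)
    l = 0
    while l < min(maxprefix, length(a), length(b)) && a[l + 1] == b[l + 1]
        l += 1
    end
    return 1 - (sim + l * p * (1 - sim))
end
```

For "martha" and "marhta" this gives a Jaro similarity of 17/18 (six matches, one transposition) and, with the shared prefix "mar", a Winkler-adjusted distance of 0.7/18 (about 0.039). Because the adjusted similarity stays in [0, 1], the resulting distance does too.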

References

A good reference for these string distances is the article written for the R package stringdist: Mark P.J. van der Loo, "The stringdist Package for Approximate String Matching".