StringDistances

StringDistances computes various distances between strings. The package should work with any AbstractString (in particular, ASCII and UTF-8 strings).

Distances

  • Hamming Distance
  • Jaro Distance
  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • QGram Distance
  • Cosine Distance
  • Jaccard Distance

A good reference on string distances is the article written for the R package stringdist: "The stringdist Package for Approximate String Matching" by Mark P.J. van der Loo.
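
To make the q-gram based entries in the list concrete, here is a minimal, self-contained sketch of the QGram and Jaccard distances using their textbook definitions. The helper names (`qgram_counts`, `qgram_distance`, `jaccard_distance`) are hypothetical and not part of the package; the package's own implementation may differ in details such as how it handles strings shorter than q.

```julia
# Count every run of q consecutive characters in s.
# (Hypothetical helper for illustration, not a package function.)
function qgram_counts(s::AbstractString, q::Integer)
    chars = collect(s)                     # index by character, not by byte
    counts = Dict{String,Int}()
    for i in 1:(length(chars) - q + 1)
        gram = String(chars[i:i+q-1])
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# QGram distance: sum of absolute differences between the two q-gram count profiles.
function qgram_distance(s1::AbstractString, s2::AbstractString, q::Integer)
    c1, c2 = qgram_counts(s1, q), qgram_counts(s2, q)
    total = 0
    for g in union(keys(c1), keys(c2))
        total += abs(get(c1, g, 0) - get(c2, g, 0))
    end
    return total
end

# Jaccard distance: 1 - |A ∩ B| / |A ∪ B| on the sets of distinct q-grams.
# Assumption: two strings with no q-grams at all are treated as distance 0.
function jaccard_distance(s1::AbstractString, s2::AbstractString, q::Integer)
    g1, g2 = keys(qgram_counts(s1, q)), keys(qgram_counts(s2, q))
    n_union = length(union(g1, g2))
    n_union == 0 && return 0.0
    return 1 - length(intersect(g1, g2)) / n_union
end

qgram_distance("martha", "marhta", 2)    # 6
jaccard_distance("martha", "marhta", 2)  # 0.75
```

With q = 2, "martha" and "marhta" share only the bigrams "ma" and "ar" out of eight distinct bigrams, which is where both values come from.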

Syntax

  • The basic syntax follows the Distances package:

    using StringDistances
    evaluate(Hamming(), "martha", "marhta")
    evaluate(QGram(2), "martha", "marhta")
    
  • Normalize a distance to lie between 0 and 1 with Normalized:

    evaluate(Normalized(Hamming()), "martha", "marhta")
    evaluate(Normalized(QGram(2)), "martha", "marhta")
    
  • Add a Winkler adjustment with Winkler:

    evaluate(Winkler(Jaro()), "martha", "marhta")
    evaluate(Winkler(QGram(2)), "martha", "marhta")
    

    While the Winkler adjustment was originally defined in the context of the Jaro distance, it can be helpful with other distances too. Note: a distance is automatically normalized between 0 and 1 when used with a Winkler adjustment.
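
As a rough illustration of what normalization and the Winkler adjustment do, the sketch below reimplements them from scratch for the Hamming distance. Everything here is an assumption made for illustration: the function names are hypothetical, normalization is taken to mean dividing by the string length, and the Winkler adjustment uses the usual textbook form (scaling factor p = 0.1, common prefix capped at 4 characters). The package's actual behavior may differ.

```julia
# Hamming distance: number of positions at which two equal-length strings differ.
# Assumption: unequal lengths are rejected.
function hamming(s1::AbstractString, s2::AbstractString)
    length(s1) == length(s2) || throw(ArgumentError("strings must have equal length"))
    return count(a != b for (a, b) in zip(s1, s2))
end

# Normalized Hamming distance: divide by the length so the result lies in [0, 1].
function normalized_hamming(s1::AbstractString, s2::AbstractString)
    n = length(s1)
    return n == 0 ? 0.0 : hamming(s1, s2) / n
end

# Textbook Winkler adjustment applied to a distance d that is already in [0, 1]:
# shrink d in proportion to the length l of the common prefix (at most max_prefix).
function winkler(d::Real, s1::AbstractString, s2::AbstractString;
                 p::Real = 0.1, max_prefix::Integer = 4)
    l = 0
    for (a, b) in zip(s1, s2)
        (a == b && l < max_prefix) || break
        l += 1
    end
    return d * (1 - l * p)
end

hamming("martha", "marhta")                  # 2
d = normalized_hamming("martha", "marhta")   # 2/6
winkler(d, "martha", "marhta")               # d * (1 - 3 * 0.1), common prefix "mar"
```

The adjustment only makes sense on a normalized distance, since it rescales a value in [0, 1] toward 0 for strings that share a prefix. That is consistent with the note above that a distance is normalized automatically when wrapped in Winkler.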