[![Build Status](https://travis-ci.org/matthieugomez/StringDistances.jl.svg?branch=master)](https://travis-ci.org/matthieugomez/StringDistances.jl)
[![Coverage Status](https://coveralls.io/repos/matthieugomez/StringDistances.jl/badge.svg?branch=master)](https://coveralls.io/r/matthieugomez/StringDistances.jl?branch=master)
[![StringDistances](http://pkg.julialang.org/badges/StringDistances_0.4.svg)](http://pkg.julialang.org/?pkg=StringDistances)

StringDistances.jl computes various distances between strings. The package should work with any `AbstractString`, in particular ASCII and UTF-8 strings.

## Distances
- Hamming Distance
- Jaro Distance
- Levenshtein Distance
- Damerau-Levenshtein Distance
- QGram Distance
- Cosine Distance
- Jaccard Distance

A good reference on string distances is the article written for the R package `stringdist`:
*The stringdist Package for Approximate String Matching*, Mark P.J. van der Loo.

## Syntax
- The basic syntax follows the [Distances](https://github.com/JuliaStats/Distances.jl) package:
```julia
using StringDistances
evaluate(Hamming(), "martha", "marhta")
evaluate(QGram(2), "martha", "marhta")
```
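
To make the `QGram(2)` call above more concrete, here is a minimal sketch of the textbook q-gram distance in plain Julia: the sum of absolute differences between the q-gram counts of the two strings. The names `qgram_counts` and `qgram_distance` are hypothetical helpers written for this illustration; they are not part of the package, and the package's own implementation may differ.

```julia
# Illustrative only: `qgram_counts` and `qgram_distance` are hypothetical
# helpers, not functions exported by StringDistances.

# Count how often each substring of q consecutive characters occurs in s.
function qgram_counts(s::AbstractString, q::Integer)
    chars = collect(s)                    # index by character, not by byte
    counts = Dict{String, Int}()
    for i in 1:(length(chars) - q + 1)
        gram = String(chars[i:(i + q - 1)])
        counts[gram] = get(counts, gram, 0) + 1
    end
    return counts
end

# Sum of absolute differences between the q-gram counts of s1 and s2.
function qgram_distance(s1::AbstractString, s2::AbstractString, q::Integer)
    c1 = qgram_counts(s1, q)
    c2 = qgram_counts(s2, q)
    total = 0
    for gram in union(keys(c1), keys(c2))
        total += abs(get(c1, gram, 0) - get(c2, gram, 0))
    end
    return total
end

qgram_distance("martha", "marhta", 2)  # 6: only "ma" and "ar" are shared
```
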
- Normalize a distance to the range 0 to 1 with `Normalized`:
```julia
evaluate(Normalized(Hamming()), "martha", "marhta")
evaluate(Normalized(QGram(2)), "martha", "marhta")
```
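
As a rough illustration of what normalizing means, the sketch below computes a Hamming distance by hand and divides it by its largest possible value, the string length. This is one plausible normalization and is an assumption, not necessarily the rule the package applies; `hamming_distance` and `normalized_hamming` are hypothetical names used only here.

```julia
# Illustrative only: hypothetical helpers, not the package's implementation.

# Number of positions at which two equal-length strings differ.
function hamming_distance(s1::AbstractString, s2::AbstractString)
    length(s1) == length(s2) ||
        throw(ArgumentError("strings must have the same length"))
    return count(pair -> pair[1] != pair[2], zip(s1, s2))
end

# Assumed normalization: divide by the maximum possible distance,
# which for Hamming is the number of characters.
function normalized_hamming(s1::AbstractString, s2::AbstractString)
    isempty(s1) && return 0.0
    return hamming_distance(s1, s2) / length(s1)
end

hamming_distance("martha", "marhta")    # 2 (positions 4 and 5 differ)
normalized_hamming("martha", "marhta")  # 2 / 6
```
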
- Add a [Winkler adjustment](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) with `Winkler`
```julia
evaluate(Winkler(Jaro()), "martha", "marhta")
evaluate(Winkler(QGram(2)), "martha", "marhta")
```
While the Winkler adjustment was originally defined in the context of the Jaro distance, it can be helpful with other distances too. Note: a distance is automatically normalized between 0 and 1 when used with a Winkler adjustment.
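
The adjustment itself can be sketched in a few lines. In the usual Jaro-Winkler definition, a normalized distance `d` is shrunk in proportion to the length of the common prefix of the two strings, capped at four characters, with a scaling factor of 0.1. The helper names below and the defaults `p = 0.1` and `maxlength = 4` are assumptions taken from that standard definition; the package's own parameters may differ.

```julia
# Illustrative only: hypothetical helpers following the standard
# Jaro-Winkler formula, not the package's implementation.

# Length of the common prefix of s1 and s2, capped at maxlength characters.
function common_prefix_length(s1::AbstractString, s2::AbstractString,
                              maxlength::Integer = 4)
    n = 0
    for (c1, c2) in zip(s1, s2)
        (c1 == c2 && n < maxlength) || break
        n += 1
    end
    return n
end

# Apply the Winkler adjustment to a distance d that is already in [0, 1]:
# the longer the shared prefix, the more the distance is reduced.
function winkler_adjust(d::Real, s1::AbstractString, s2::AbstractString;
                        p::Real = 0.1)
    return d * (1 - p * common_prefix_length(s1, s2))
end

common_prefix_length("martha", "marhta")  # 3 ("mar")
winkler_adjust(0.5, "martha", "marhta")   # approximately 0.35
```

Because the formula only makes sense for a distance bounded by 1, this also shows why a distance has to be normalized before the adjustment is applied.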