[![Build Status](https://travis-ci.org/matthieugomez/StringDistances.jl.svg?branch=master)](https://travis-ci.org/matthieugomez/StringDistances.jl)
[![Coverage Status](https://coveralls.io/repos/matthieugomez/StringDistances.jl/badge.svg?branch=master)](https://coveralls.io/r/matthieugomez/StringDistances.jl?branch=master)
[![StringDistances](http://pkg.julialang.org/badges/StringDistances_0.4.svg)](http://pkg.julialang.org/?pkg=StringDistances)

StringDistances lets you compute various distances between strings. The package should work with any `AbstractString` (in particular, ASCII and UTF-8 strings).

## Distances
- Hamming Distance
- Jaro Distance
- Levenshtein Distance
- Damerau-Levenshtein Distance
- QGram Distance
- Cosine Distance
- Jaccard Distance

A good reference on string distances is the article written for the R package `stringdist`: *The stringdist Package for Approximate String Matching*, by Mark P.J. van der Loo.

## Syntax
- The basic syntax follows the [Distances](https://github.com/JuliaStats/Distances.jl) package:
```julia
using StringDistances
evaluate(Hamming(), "martha", "marhta")
evaluate(QGram(2), "martha", "marhta")
```
- Normalize a distance to the range 0 to 1 with `Normalized`:
```julia
evaluate(Normalized(Hamming()), "martha", "marhta")
evaluate(Normalized(QGram(2)), "martha", "marhta")
```
- Add a [Winkler adjustment](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) with `Winkler`
```julia
evaluate(Winkler(Jaro()), "martha", "marhta")
evaluate(Winkler(QGram(2)), "martha", "marhta")
```
While the Winkler adjustment was originally defined in the context of the Jaro distance, it can be helpful with other distances too. Note: a distance is automatically normalized between 0 and 1 when used with a Winkler adjustment.
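
To make the normalization and the Winkler adjustment concrete, here is a minimal sketch in plain Julia that does not depend on the package. All function names are hypothetical, and both the normalization (dividing a Hamming count by the string length) and the adjustment (the textbook Winkler formula with a scaling factor of 0.1 and a prefix capped at 4 characters) are assumptions; the package's own implementation may differ.

```julia
# Hypothetical helpers for illustration; none of these names belong to StringDistances.

# Hamming count: number of positions at which two equal-length strings differ.
function hamming_count(s1::AbstractString, s2::AbstractString)
    length(s1) == length(s2) || throw(ArgumentError("strings must have the same length"))
    return count(c1 != c2 for (c1, c2) in zip(s1, s2))
end

# Assumed normalization: divide by the largest possible count (the string length).
function normalized_hamming(s1::AbstractString, s2::AbstractString)
    d = hamming_count(s1, s2)
    n = length(s1)
    return n == 0 ? 0.0 : d / n
end

# Length of the shared prefix, capped at `maxlength` characters.
function common_prefix_length(s1::AbstractString, s2::AbstractString, maxlength::Integer = 4)
    n = 0
    for (c1, c2) in zip(s1, s2)
        (c1 == c2 && n < maxlength) || break
        n += 1
    end
    return n
end

# Textbook Winkler adjustment written for a distance d in [0, 1]:
# 1 - (sim + l * p * (1 - sim)) simplifies to d * (1 - l * p).
function winkler_adjust(d::Real, s1::AbstractString, s2::AbstractString; scaling::Real = 0.1)
    l = common_prefix_length(s1, s2)
    return d * (1 - l * scaling)
end

d = normalized_hamming("martha", "marhta")   # 2 mismatches out of 6 positions
winkler_adjust(d, "martha", "marhta")        # shared prefix "mar" scales d by 0.7
```

Because the adjustment multiplies the distance by a factor between 0.6 and 1, it only makes sense for a distance already scaled to the range 0 to 1, which is consistent with the automatic normalization noted above.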