stringdist

tree
matthieugomez 2015-11-03 10:55:37 -05:00
parent b0e8f28c40
commit 5b221e1682
2 changed files with 10 additions and 15 deletions

@ -7,15 +7,17 @@ StringDistances allow to compute various distances between strings. The package
## Distances
- [x] Hamming Distance
- [x] Jaro Distance
- [x] Levenshtein Distance
- [x] Damerau-Levenshtein Distance
- [x] QGram Distance
- [x] Cosine Distance
- [x] Jaccard Distance
- Hamming Distance
- Jaro Distance
- Levenshtein Distance
- Damerau-Levenshtein Distance
- QGram Distance
- Cosine Distance
- Jaccard Distance
A good reference about string distances is the article written for the R package `stringdist`:
*The stringdist Package for Approximate String Matching* Mark P.J. van der Loo
## Syntax
- The basic syntax follows the [Distances](https://github.com/JuliaStats/Distances.jl) package:
@ -26,8 +28,6 @@ StringDistances allow to compute various distances between strings. The package
evaluate(QGram(2), "martha", "marhta")
```
- Normalize a distance between 0-1 with `Normalized`
```julia
@ -42,8 +42,3 @@ StringDistances allow to compute various distances between strings. The package
evaluate(Winkler(QGram(2)), "martha", "marhta")
```
While the Winkler adjustment was originally defined in the context of the Jaro distance, it can be helpful with other distances too. Note: a distance is automatically normalized between 0 and 1 when used with a Winkler adjustment.
## References
A good reference for these string distances is an article written for the R package `stringdist`:
*The stringdist Package for Approximate String Matching* Mark P.J. van der Loo
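
To make the Winkler note in the README concrete, here is a minimal sketch of the standard adjustment applied to a distance that is already normalized to [0, 1]. The function name `winkler_adjust` and the keywords `p` and `maxlength` are hypothetical and chosen for illustration only. The package's own `Winkler` type may use different parameters or apply an additional threshold.

```julia
# Hypothetical helper, not the package's API: apply the standard Winkler
# adjustment to a distance `d` that is already normalized to [0, 1].
# `p` is the prefix scaling factor and `maxlength` caps the prefix length;
# 0.1 and 4 are the conventional Jaro-Winkler values, assumed here.
function winkler_adjust(d::Real, s1::AbstractString, s2::AbstractString;
                        p = 0.1, maxlength = 4)
    # Length of the common prefix, capped at `maxlength`.
    l = 0
    for (c1, c2) in zip(s1, s2)
        (c1 == c2 && l < maxlength) || break
        l += 1
    end
    # In similarity terms: sim_w = sim + l * p * (1 - sim).
    # With d = 1 - sim this is equivalent to d_w = d * (1 - l * p).
    return d * (1 - l * p)
end
```

Because the correction only shrinks the distance by a factor between 0 and 1, it is meaningful only when the input distance is itself bounded by 1, which is consistent with the README's remark that distances are normalized before the adjustment is applied.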

@ -45,7 +45,7 @@ Base.sort(qiter::QGramIterator) = sort!(collect(qiter), alg = QuickSort)
##############################################################################
##
## Define a type that iterates through a pair of sorted vectors
## At each iteration, output number of times it appears in v1, number of times it appears in v2
## For each element in either v1 or v2, output number of times it appears in v1 and the number of times it appears in v2
##
##############################################################################
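
The revised comment describes a merge-style walk over two sorted vectors. The following sketch shows one way to realize it under stated assumptions: `count_pairs` is a hypothetical name, and it collects the counts eagerly into a vector, whereas the source defines an iterator type whose definition is not shown in this diff.

```julia
# Hypothetical helper, not the package's implementation: walk two sorted
# vectors in lockstep and, for each distinct element found in either one,
# record how many times it occurs in v1 and how many times in v2.
function count_pairs(v1::AbstractVector, v2::AbstractVector)
    counts = Tuple{Int, Int}[]
    i, j = 1, 1
    n1, n2 = length(v1), length(v2)
    while i <= n1 || j <= n2
        # Pick the smallest element not yet consumed from either vector.
        x = if j > n2
            v1[i]
        elseif i > n1
            v2[j]
        else
            min(v1[i], v2[j])
        end
        # Consume every copy of x from v1.
        c1 = 0
        while i <= n1 && v1[i] == x
            c1 += 1
            i += 1
        end
        # Consume every copy of x from v2.
        c2 = 0
        while j <= n2 && v2[j] == x
            c2 += 1
            j += 1
        end
        push!(counts, (c1, c2))
    end
    return counts
end

count_pairs([1, 1, 2], [1, 3])  # returns [(2, 1), (1, 0), (0, 1)]
```

Given such pairs for the sorted q-grams of two strings, a q-gram distance can be computed as the sum of `abs(c1 - c2)` over all pairs. For the sorted 2-grams of "martha" and "marhta" that sum is 6.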