test on 1.0

pull/8/head
matthieugomez 2018-08-19 00:44:10 +01:00
parent 571738cb5c
commit 0d505a18d9
4 changed files with 10 additions and 13 deletions

@ -1,15 +1,13 @@
language: julia
julia:
- 0.7
- 1.0
- nightly
matrix:
allow_failures:
- julia: nightly
script:
- if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
- julia --check-bounds=yes -e 'Pkg.clone(pwd()); Pkg.build("StringDistances"); Pkg.test("StringDistances"; coverage=true)'
after_success:
- julia -e 'cd(Pkg.dir("StringDistances")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())'
- julia -e 'using Pkg; cd(Pkg.dir("StringDistances")); Pkg.add("Coverage"); using Coverage; Coveralls.submit(Coveralls.process_folder())'
notifications:
email: false
on_success: never

@ -87,7 +87,7 @@ The package includes distance "modifiers", that can be applied to any distance.
```
- [TokenMax](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) combines scores using the base distance, the `Partial`, `TokenSort` and `TokenSet` modifiers, with penalty terms depending on string lengths.
- [TokenMax](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/) combines scores using the base distance, the `Partial`, `TokenSort` and `TokenSet` modifiers, with penalty terms depending on string lengths. This is the default distance in [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/).
```julia
compare(TokenMax(RatcliffObershelp()),"mariners vs angels", "los angeles angels at seattle mariners")
@ -95,8 +95,7 @@ The package includes distance "modifiers", that can be applied to any distance.
```
## Compare vs Evaluate
The function `compare` returns a similarity score: a value of 0 means completely different and a value of 1 means completely similar.
In contrast, the function `evaluate` returns the literal distance between two strings, with a value of 0 being completely similar.
In contrast, the function `evaluate` returns the literal distance between two strings, with a value of 0 being completely similar. Some distances are between 0 and 1. Others are unbounded.
```julia
compare(Levenshtein(), "New York", "New York")
@ -108,12 +107,12 @@ evaluate(Levenshtein(), "New York", "New York")
## Which distance should I use?
As a rule of thumb,
- Standardize strings before comparing them (correct for uppercases, punctuations, whitespaces, accents, abbreviations...)
- Don't use Edit Distances if word order does not matter.
- The distance `TokenMax(RatcliffObershelp())` is a good default choice.
- Standardize strings before comparing them (cases, whitespaces, accents, abbreviations...)
- Don't use one of the Edit distances if word order does not matter.
- The distance `TokenMax(RatcliffObershelp())` is a good choice to link names or addresses across datasets.
## References
- [The stringdist Package for Approximate String Matching](https://journal.r-project.org/archive/2014-1/loo.pdf) Mark P.J. van der Loo
- [fuzzywuzzy blog post](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)
- [fuzzywuzzy](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)

@ -1,3 +1,3 @@
julia 0.7-
julia 0.7
Distances
IterTools

@ -7,7 +7,7 @@ module StringDistances
## Export
##
##############################################################################
import Base: eltype, length, iterate, ==, hash, isless, convert, show, endof
import Base: eltype, length, iterate, ==, hash, isless, convert, show
import Distances: evaluate, Hamming, hamming, PreMetric, SemiMetric
import IterTools: chain
export