What is fuzzy-string-match
- fuzzy-string-match is a fuzzy string matching library for ruby.
- It is fast. ( written in C with RubyInline )
- It suports only Jaro-Winkler distance algorithm.
- This program was ported by hand from lucene-3.0.2. (lucene is Java product)
- If you want to add another string distance algorithm, please port by yourself and contact me [email protected].
Installing
- gem install fuzzy-string-match
Features
- Caluclate Jaro-Winkler distance of two strings.
- Pure ruby version can handle both ascii and UTF8 strings. (and slow)
- Native version can only ascii strings. (and fast)
Sample code
- Native version
require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
p jarow.getDistance( "jones", "johnson" )
- Pure ruby version
require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :pure )
p jarow.getDistance( "ああ", "あい" )
Sample on irb
irb(main):001:0> require 'fuzzystringmatch'
require 'fuzzystringmatch'
=> true
irb(main):002:0> jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
=> #<FuzzyStringMatch::JaroWinklerNative:0x000001011b0010>
irb(main):003:0> jarow.getDistance( "al", "al" )
jarow.getDistance( "al", "al" )
=> 1.0
irb(main):004:0> jarow.getDistance( "dixon", "dicksonx" )
jarow.getDistance( "dixon", "dicksonx" )
=> 0.8133333333333332
Requires
- RubyInline
- Ruby 1.9.1 or higher
Author
- Copyright (C) Kiyoka Nishiyama [email protected]
- I ported from java source code of lucene-3.0.2.
See also
- http://en.wikipedia.org/wiki/Jaro–Winkler_distance
- http://lucene.apache.org/
- http://github.com/naoya/perl-text-jarowinkler
License
- Apache 2.0 LICENSE