Class: FuzzyStrings
- Inherits:
- Object
  - FuzzyStrings
- Defined in:
- lib/fuzzy_strings.rb
Overview
Match words based on the operations needed to turn one word into a similar one: insertion, deletion, substitution or transposition.
cot => coat (a must be inserted to get the same word)
coat => cot (a must be deleted to get the same word)
cost => coat (s must be substituted with a to get the same word)
foo => floor (l and r must be inserted)
floor => foo (l and r must be deleted)
cost => cots (s and t must be substituted (cost = 2) or transposed (cost = 1))
The cost is the number of operations needed to make the 2 words the same.
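The cost model described above can be illustrated with a textbook edit-distance routine. This is a minimal sketch using the optimal-string-alignment variant of the Damerau-Levenshtein distance, not the algorithm FuzzyStrings itself uses. The helper name `edit_cost` and its `transpositions:` keyword are assumptions made for this example only.

```ruby
# Illustrative only: optimal-string-alignment edit distance.
# `edit_cost` is a hypothetical helper, not part of FuzzyStrings.
def edit_cost(from, to, transpositions: true)
  a = from.unpack("U*") # compare by Unicode codepoint
  b = to.unpack("U*")

  # d[i][j] = cost of turning the first i items of a into the first j items of b
  d = Array.new(a.length + 1) do |i|
    Array.new(b.length + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) }
  end

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      substitution = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,               # deletion
        d[i][j - 1] + 1,               # insertion
        d[i - 1][j - 1] + substitution # substitution (or exact match)
      ].min

      # Swapping two adjacent characters counts as a single operation.
      if transpositions && i > 1 && j > 1 &&
         a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[a.length][b.length]
end

puts edit_cost("cot", "coat")                         # 1
puts edit_cost("foo", "floor")                        # 2
puts edit_cost("cost", "cots")                        # 1
puts edit_cost("cost", "cots", transpositions: false) # 2
```

With transpositions disabled, the last pair falls back to two substitutions, which matches the cost = 2 versus cost = 1 distinction in the examples.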
Usage
fs = FuzzyStrings.new("pattern")
match = fs.compare("pattren")
puts match.match?
# true
puts match.score
# 2
puts match
Unicode?
It is assumed that all strings are UTF-8.
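Because the class assumes UTF-8 input, callers holding text in another encoding may want to transcode it first. The sketch below shows one way to do that with Ruby's standard `String#encode`. The helper `to_utf8` is hypothetical and not part of FuzzyStrings.

```ruby
# Hypothetical helper (not part of FuzzyStrings): return a UTF-8 copy of str,
# transcoding from its declared encoding when necessary.
def to_utf8(str)
  str.encoding == Encoding::UTF_8 ? str : str.encode(Encoding::UTF_8)
end

# "café" stored as ISO-8859-1: the é is the single byte 0xE9.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
utf8   = to_utf8(latin1)

p utf8.valid_encoding? # true
p utf8.unpack("U*")    # [99, 97, 102, 233]
```

The `unpack("U*")` call at the end is the same codepoint decoding that `#compare` applies to its input.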
Defined Under Namespace
Classes: Match
Instance Method Summary
- #compare(string2, no_transpositions = false) ⇒ Object
  Compares a given string to the base pattern. The compared string is the one operated upon (so with cot as the pattern and coat passed to compare, the result is a deletion).
- #initialize(string1) ⇒ FuzzyStrings (constructor)
  A new instance of FuzzyStrings.
Constructor Details
#initialize(string1) ⇒ FuzzyStrings
Returns a new instance of FuzzyStrings.
# File 'lib/fuzzy_strings.rb', line 26
def initialize(string1)
  @string1 = string1.to_s rescue ""
end
Instance Method Details
#compare(string2, no_transpositions = false) ⇒ Object
Compares a given string to the base pattern. The compared string is the one operated upon (so with cot as the pattern and coat passed to compare, the result is a deletion).
Returns a FuzzyStrings::Match object.
# File 'lib/fuzzy_strings.rb', line 35
def compare(string2, no_transpositions = false)
  @string2 = string2.to_s rescue ""
  @match = Match.new
  return @match if @string1 == @string2

  rule = 'U*'
  sequence1 = @string1.unpack rule
  sequence2 = @string2.unpack rule

  if (sequence1 + sequence2).include?(0)
    raise ArgumentError.new(
      "Strings cannot contain NULL-bytes due to internal semantics"
    )
  end

  @short, @long = if sequence1.length < sequence2.length
    [sequence1, sequence2]
  else
    [sequence2, sequence1]
  end

  find_insertions
  find_substitutions
  find_transpositions unless no_transpositions == true

  return @match
end
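The preparation steps in the listing (decoding to codepoints, rejecting NUL, and ordering the two sequences by length) can be tried in isolation. The following is a rough stand-alone sketch of those steps only. The function name `prepare_sequences` is an assumption for illustration, and the private `find_*` methods are not reproduced because their source is not shown here.

```ruby
# Hypothetical stand-alone version of the preparation steps in #compare.
# Not part of FuzzyStrings; the matching logic itself is omitted.
def prepare_sequences(string1, string2)
  # Decode both strings into arrays of Unicode codepoints.
  sequence1 = string1.to_s.unpack("U*")
  sequence2 = string2.to_s.unpack("U*")

  # Codepoint 0 is rejected, as in the listing above.
  if (sequence1 + sequence2).include?(0)
    raise ArgumentError, "Strings cannot contain NULL-bytes due to internal semantics"
  end

  # Shorter sequence first; on a tie the second argument comes first,
  # mirroring the `<` comparison in the listing above.
  sequence1.length < sequence2.length ? [sequence1, sequence2] : [sequence2, sequence1]
end

short, long = prepare_sequences("coat", "cot")
p short # [99, 111, 116]
p long  # [99, 111, 97, 116]
```

Passing a string that contains "\0" raises ArgumentError before any comparison takes place.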