Class: FuzzyStrings

Inherits:
Object
Defined in:
lib/fuzzy_strings.rb

Overview

Matches words by the operations needed to turn one word into a similar one: insertion, deletion, substitution, or transposition.

cot => coat (a must be inserted to get the same word)
coat => cot (a must be deleted to get the same word)
cost => coat (s must be substituted with a to get the same word)
foo => floor (l and r must be inserted)
floor => foo (l and r must be deleted)
cost => cots (s and t must be substituted (cost=2) or transposed (cost=1))

The cost is the number of operations needed to make the two words the same.
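To make the cost idea concrete, here is a minimal sketch of a textbook edit distance that counts the same four operations (the "optimal string alignment" variant of Damerau-Levenshtein). It is an illustration only: the name `osa_distance` and its `transpositions:` keyword are hypothetical, and this is not necessarily how FuzzyStrings computes its score, so the library's numbers may differ.

```ruby
# Illustrative sketch only: a textbook optimal-string-alignment distance.
# `osa_distance` is a hypothetical helper, not part of FuzzyStrings.
def osa_distance(a, b, transpositions: true)
  s = a.unpack('U*') # UTF-8 string -> array of code points
  t = b.unpack('U*')

  # d[i][j] = cost of turning the first i code points of s
  # into the first j code points of t
  d = Array.new(s.length + 1) { |i| [i] + [0] * t.length }
  (0..t.length).each { |j| d[0][j] = j }

  (1..s.length).each do |i|
    (1..t.length).each do |j|
      sub = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [
        d[i - 1][j] + 1,       # deletion
        d[i][j - 1] + 1,       # insertion
        d[i - 1][j - 1] + sub  # substitution (or match)
      ].min

      # Swapping two adjacent characters counts as one operation
      if transpositions && i > 1 && j > 1 &&
         s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end

  d[s.length][t.length]
end

puts osa_distance("cot", "coat")                           # 1
puts osa_distance("foo", "floor")                          # 2
puts osa_distance("cost", "cots")                          # 1
puts osa_distance("cost", "cots", transpositions: false)   # 2
```

The last two calls mirror the cost => cots example: one transposition, or two substitutions when transpositions are not allowed.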

Usage

fs = FuzzyStrings.new("pattern")
match = fs.compare("pattren")
puts match.match?
# true
puts match.score
# 2
puts match

Unicode?

All strings are assumed to be UTF-8 encoded.
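The compare method listed on this page unpacks both strings with the 'U*' directive, which is why UTF-8 matters. The short sketch below shows what that directive does using only core Ruby; the variable names are illustrative.

```ruby
# 'U*' decodes a UTF-8 string into an array of Unicode code points,
# so a multi-byte character counts as one element, not several bytes.
word        = "caf\u00e9"        # "café"; the last character takes 2 bytes in UTF-8
code_points = word.unpack('U*')

puts code_points.inspect         # [99, 97, 102, 233]
puts word.bytesize               # 5
puts code_points.length          # 4

# A NUL character unpacks to code point 0, which compare rejects
has_nul = "a\0b".unpack('U*').include?(0)
puts has_nul                     # true
```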

Defined Under Namespace

Classes: Match

Instance Method Summary

Constructor Details

#initialize(string1) ⇒ FuzzyStrings

Returns a new instance of FuzzyStrings.



# File 'lib/fuzzy_strings.rb', line 26

def initialize(string1)
  @string1 = string1.to_s rescue ""
end

Instance Method Details

#compare(string2, no_transpositions = false) ⇒ Object

Compares a given string to the base pattern. The compared string is the one operated upon, so with cot as the pattern and coat passed to compare, the result is a deletion.

Returns a FuzzyStrings::Match object.



# File 'lib/fuzzy_strings.rb', line 35

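def compare(string2, no_transpositions = false)
  # Coerce the argument to a string; fall back to "" if to_s raises
  @string2 = string2.to_s rescue ""
  @match   = Match.new

  # Identical strings: return the fresh Match untouched
  return @match if @string1 == @string2

  # 'U*' unpacks a UTF-8 string into an array of code points
  rule = 'U*'

  sequence1 = @string1.unpack rule
  sequence2 = @string2.unpack rule

  # Code point 0 is rejected in either string
  if (sequence1 + sequence2).include?(0)
    raise ArgumentError.new(
      "Strings cannot contain NULL-bytes due to internal semantics"
    )
  end

  # @short is the shorter sequence (sequence2 on a tie)
  @short, @long = if sequence1.length < sequence2.length
    [sequence1, sequence2]
  else
    [sequence2, sequence1]
  end

  # Run the operation finders (defined elsewhere in the class);
  # the transposition pass is skipped when no_transpositions is true
  find_insertions
  find_substitutions
  find_transpositions unless no_transpositions == true

  return @match
end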