Class: BBLib::FuzzyMatcher

Inherits:
Object
Includes:
Effortless
Defined in:
lib/bblib/classes/fuzzy_matcher.rb

Overview

Applies multiple string comparison algorithms to a pair of strings and combines their weighted scores into a single percentage that indicates how similar two words or phrases are.

Instance Method Summary

Methods included from Effortless

#_attrs, included

Instance Method Details

#best_match(string_a, *string_b) ⇒ Object

Returns the string in string_b that has the highest similarity percentage to string_a, as computed by #similarities.



# File 'lib/bblib/classes/fuzzy_matcher.rb', line 30

def best_match(string_a, *string_b)
  similarities(string_a, *string_b).max_by { |_k, v| v }[0]
end
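The selection logic can be sketched in plain Ruby without BBLib. This is a minimal sketch under stated assumptions: toy_similarity is a hypothetical stand-in for the matcher's weighted score (it measures the share of positions at which both strings hold the same character), and the empty-list guard is an addition. In the method above, max_by returns nil when string_b is empty, so calling [0] on the result raises NoMethodError.

```ruby
# Hypothetical stand-in for FuzzyMatcher#similarity: the percentage of
# positions at which both strings hold the same character.
def toy_similarity(a, b)
  return 100.0 if a == b
  longest = [a.length, b.length].max
  same = a.chars.zip(b.chars).count { |x, y| x == y }
  same * 100.0 / longest
end

# Sketch of best_match: score every candidate and keep the highest.
def best_match(target, *candidates)
  return nil if candidates.empty? # assumption: this guard is not in BBLib
  candidates
    .map { |c| [c, toy_similarity(target, c)] }
    .max_by { |_c, score| score }
    .first
end

best_match("kitten", "sitting", "mitten", "kitchen") # => "mitten"
```

Here "mitten" agrees with "kitten" in 5 of 6 positions (about 83.3), ahead of "sitting" (4 of 7) and "kitchen" (3 of 7).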

#match?(string_a, string_b) ⇒ Boolean

Returns true if the similarity percentage between string_a and string_b is greater than or equal to threshold.

Returns:

  • (Boolean)


# File 'lib/bblib/classes/fuzzy_matcher.rb', line 25

def match?(string_a, string_b)
  similarity(string_a, string_b) >= threshold.to_f
end
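A sketch of the threshold test, using a hypothetical ToyMatcher class (not the real BBLib class) whose similarity is a simple positional character match, with a default threshold of 75 chosen only for illustration. Because the comparison is >=, a score exactly equal to the threshold counts as a match.

```ruby
# Hypothetical, minimal stand-in for BBLib::FuzzyMatcher (not the real class).
class ToyMatcher
  attr_reader :threshold

  def initialize(threshold: 75)
    @threshold = threshold
  end

  # Stand-in for the weighted similarity: percentage of positions
  # at which both strings hold the same character.
  def similarity(a, b)
    return 100.0 if a == b
    longest = [a.length, b.length].max
    same = a.chars.zip(b.chars).count { |x, y| x == y }
    same * 100.0 / longest
  end

  # Same comparison as FuzzyMatcher#match?.
  def match?(a, b)
    similarity(a, b) >= threshold.to_f
  end
end

m = ToyMatcher.new(threshold: 75)
m.match?("abcd", "abcx")      # 3 of 4 positions agree: 75.0 >= 75.0, true
m.match?("kitten", "sitting") # 4 of 7 positions agree: about 57.1, false
```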

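#set_weight(algorithm, weight) ⇒ Object

Sets the weight of algorithm, using BBLib.keep_between to keep it at or above 0 (the nil maximum leaves it unbounded above). Returns the stored weight, or nil without changing anything if algorithm is not already one of the matcher's algorithms.
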
# File 'lib/bblib/classes/fuzzy_matcher.rb', line 40

def set_weight(algorithm, weight)
  return nil unless algorithms.include? algorithm
  algorithms[algorithm] = BBLib.keep_between(weight, 0, nil)
end
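The clamping behaviour can be mirrored in plain Ruby, with Array#max assumed to stand in for BBLib.keep_between(weight, 0, nil). The algorithm names and starting weights below are placeholders chosen for illustration. Like the method above, the sketch only updates existing keys; it never registers a new algorithm.

```ruby
# Placeholder algorithm names and weights, for illustration only.
algorithms = { levenshtein: 10, composition: 5 }

# Sketch of set_weight: ignore unknown algorithms, floor weights at 0.
def set_weight(algorithms, algorithm, weight)
  return nil unless algorithms.include?(algorithm)
  # Assumed equivalent of keep_between(weight, 0, nil): no upper bound.
  algorithms[algorithm] = [weight, 0].max
end

set_weight(algorithms, :levenshtein, -3) # => 0
set_weight(algorithms, :composition, 25) # => 25
set_weight(algorithms, :qwerty, 1)       # => nil, algorithms unchanged
```

A weight of 0 is meaningful: #similarity skips any algorithm whose weight is not positive, so clamping a negative weight to 0 effectively disables that algorithm.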

#similarities(string_a, *string_b) ⇒ Object

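Returns an array of [string, percentage] pairs, one for each string in string_b and in the order given, where percentage is that string's similarity to string_a. Each percentage is also stored in the matcher's matches hash, keyed by the compared string. The result is not sorted.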



# File 'lib/bblib/classes/fuzzy_matcher.rb', line 36

def similarities(string_a, *string_b)
  [*string_b].map { |word| [word, matches[word] = similarity(string_a, word)] }
end
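Because the pairs come back in input order, a caller who wants a ranking has to sort them. A sketch of one way to do that, again using a hypothetical positional toy_similarity in place of the real weighted score: sort_by on the negated score gives descending order, and to_h turns the pairs into a Hash, which preserves insertion order in Ruby.

```ruby
# Hypothetical stand-in for FuzzyMatcher#similarity.
def toy_similarity(a, b)
  return 100.0 if a == b
  longest = [a.length, b.length].max
  same = a.chars.zip(b.chars).count { |x, y| x == y }
  same * 100.0 / longest
end

# Sketch of similarities: one [candidate, score] pair per candidate, input order kept.
def similarities(target, *candidates)
  candidates.map { |c| [c, toy_similarity(target, c)] }
end

pairs  = similarities("kitten", "kitchen", "mitten", "sitting")
ranked = pairs.sort_by { |_c, score| -score }.to_h
ranked.keys # => ["mitten", "sitting", "kitchen"]
```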

#similarity(string_a, string_b) ⇒ Object

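Calculates a similarity percentage between string_a and string_b. Both strings are first passed through prep_strings; if they are then identical, 100.0 is returned immediately. Otherwise each algorithm with a positive weight contributes its <algorithm>_similarity score multiplied by that weight, and the sum is divided by the total of all weights, giving a weighted average.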



# File 'lib/bblib/classes/fuzzy_matcher.rb', line 12

def similarity(string_a, string_b)
  string_a, string_b = prep_strings(string_a, string_b)
  return 100.0 if string_a == string_b
  score = 0
  total_weight = algorithms.values.inject { |sum, weight| sum + weight }
  algorithms.each do |algorithm, weight|
    next unless weight.positive?
    score += string_a.send("#{algorithm}_similarity", string_b) * weight
  end
  score / total_weight
end
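The arithmetic is a weighted mean: the sum of each positive weight times its algorithm's percentage, divided by the sum of all weights. The sketch below reproduces it in plain Ruby. The two scorers are hypothetical lambdas standing in for the String#<algorithm>_similarity methods the real code reaches through send, and the zero-total guard is an assumption added here, since the method above divides by the total weight unconditionally.

```ruby
# Hypothetical scorers standing in for String#<name>_similarity (each returns 0..100).
SCORERS = {
  # Share of positions where both strings hold the same character.
  positional: ->(a, b) {
    longest = [a.length, b.length].max
    same = a.chars.zip(b.chars).count { |x, y| x == y }
    longest.zero? ? 100.0 : same * 100.0 / longest
  },
  # Ratio of the shorter length to the longer one.
  length: ->(a, b) {
    longest = [a.length, b.length].max
    longest.zero? ? 100.0 : [a.length, b.length].min * 100.0 / longest
  }
}.freeze

# Weighted mean of the scorer results, mirroring FuzzyMatcher#similarity.
def similarity(a, b, weights)
  return 100.0 if a == b
  total = weights.values.sum
  return 0.0 if total.zero? # assumption: the real method has no such guard
  score = weights.sum { |name, weight| weight.positive? ? SCORERS.fetch(name).call(a, b) * weight : 0 }
  score / total.to_f
end

# (83.33 * 3 + 100.0 * 1) / 4 = 87.5
similarity("kitten", "mitten", { positional: 3, length: 1 })
```

Zero-weight algorithms add nothing to the numerator but still count toward the denominator, which is harmless because their weight is 0.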