Class: Appydave::Tools::Dam::FuzzyMatcher

Inherits:
Object
Defined in:
lib/appydave/tools/dam/fuzzy_matcher.rb

Overview

Fuzzy matching for brand names using Levenshtein distance

Class Method Summary

Class Method Details

.find_matches(input, candidates, threshold: 3) ⇒ Array<String>

Find the candidates closest to the input string, compared case-insensitively

Parameters:

  • input (String)

    Input string to match

  • candidates (Array<String>)

    List of valid options

  • threshold (Integer) (defaults to: 3)

    Maximum edit distance (inclusive) for a candidate to count as a match

Returns:

  • (Array<String>)

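    Candidates within the threshold, sorted by distance (closest first), in their original casing; empty if the input is nil or empty, or if there are no candidates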



# File 'lib/appydave/tools/dam/fuzzy_matcher.rb', line 14

def find_matches(input, candidates, threshold: 3)
  return [] if input.nil? || input.empty? || candidates.empty?

  # Calculate the case-insensitive distance from the input to each candidate
  matches = candidates.map do |candidate|
    distance = levenshtein_distance(input.downcase, candidate.downcase)
    { candidate: candidate, distance: distance }
  end

  # Filter by threshold
  matches = matches.select { |m| m[:distance] <= threshold }

  # Sort by distance (closest first)
  matches.sort_by { |m| m[:distance] }.map { |m| m[:candidate] }
end
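
To make the effect of the threshold and the case-insensitive comparison concrete, here is a minimal standalone sketch. It restates the documented behavior in a hypothetical FuzzyDemo module so it runs without loading the gem; the brand list is invented for illustration, and the distance helper is a two-row variant swapped in for brevity rather than the full-matrix implementation used by levenshtein_distance on this page.

```ruby
# Hypothetical stand-in for the documented class, for illustration only.
module FuzzyDemo
  module_function

  # Two-row Levenshtein distance (an assumed simplification, same results
  # as a full-matrix calculation but keeping only the previous row).
  def levenshtein_distance(str1, str2)
    return str2.length if str1.empty?
    return str1.length if str2.empty?

    previous = (0..str2.length).to_a
    str1.each_char.with_index(1) do |char1, i|
      current = [i]
      str2.each_char.with_index(1) do |char2, j|
        cost = char1 == char2 ? 0 : 1
        current << [
          previous[j] + 1,       # deletion
          current[j - 1] + 1,    # insertion
          previous[j - 1] + cost # substitution
        ].min
      end
      previous = current
    end
    previous.last
  end

  # Same contract as the documented find_matches: case-insensitive,
  # inclusive threshold, closest candidates first.
  def find_matches(input, candidates, threshold: 3)
    return [] if input.nil? || input.empty? || candidates.empty?

    needle = input.downcase
    candidates
      .map { |candidate| [candidate, levenshtein_distance(needle, candidate.downcase)] }
      .select { |_candidate, distance| distance <= threshold }
      .sort_by { |_candidate, distance| distance }
      .map(&:first)
  end
end

brands = %w[appydave voz aitldr kiros] # hypothetical brand list

p FuzzyDemo.find_matches('apydave', brands)               # => ["appydave"]
p FuzzyDemo.find_matches('VOX', brands)                   # => ["voz"]
p FuzzyDemo.find_matches('apydave', brands, threshold: 0) # => []
p FuzzyDemo.find_matches('zzzzzzzz', brands)              # => []
```

With the default threshold of 3, very short candidates can match many short inputs, so a caller working with short names may want a lower threshold. Note also that Ruby's sort_by is not guaranteed to be stable, so the relative order of candidates at the same distance should not be relied on.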

.levenshtein_distance(str1, str2) ⇒ Integer

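Calculate the Levenshtein (edit) distance between two strings. The comparison is case-sensitive; find_matches downcases both strings before calling it.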

Parameters:

  • str1 (String)

    First string

  • str2 (String)

    Second string

Returns:

  • (Integer)

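    Edit distance: the minimum number of single-character insertions, deletions, and substitutions that turn str1 into str2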



# File 'lib/appydave/tools/dam/fuzzy_matcher.rb', line 34

def levenshtein_distance(str1, str2)
  return str2.length if str1.empty?
  return str1.length if str2.empty?

  # Create distance matrix
  matrix = Array.new(str1.length + 1) { Array.new(str2.length + 1) }

  # Initialize first row and column
  (0..str1.length).each { |i| matrix[i][0] = i }
  (0..str2.length).each { |j| matrix[0][j] = j }

  # Calculate distances
  (1..str1.length).each do |i|
    (1..str2.length).each do |j|
      cost = str1[i - 1] == str2[j - 1] ? 0 : 1
      matrix[i][j] = [
        matrix[i - 1][j] + 1,      # deletion
        matrix[i][j - 1] + 1,      # insertion
        matrix[i - 1][j - 1] + cost # substitution
      ].min
    end
  end

  matrix[str1.length][str2.length]
end
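
To see how the matrix is filled, the sketch below uses a hypothetical helper, levenshtein_matrix, which is not part of the documented class. It follows the same recurrence as the listing above but returns every cell instead of only the bottom-right one, so the intermediate values can be inspected for a small input pair.

```ruby
# Hypothetical inspection helper (an assumption for illustration):
# same recurrence as levenshtein_distance, but returns the whole matrix.
def levenshtein_matrix(str1, str2)
  matrix = Array.new(str1.length + 1) { Array.new(str2.length + 1, 0) }

  # Distance from a prefix to the empty string is the prefix length
  (0..str1.length).each { |i| matrix[i][0] = i }
  (0..str2.length).each { |j| matrix[0][j] = j }

  (1..str1.length).each do |i|
    (1..str2.length).each do |j|
      cost = str1[i - 1] == str2[j - 1] ? 0 : 1
      matrix[i][j] = [
        matrix[i - 1][j] + 1,       # deletion
        matrix[i][j - 1] + 1,       # insertion
        matrix[i - 1][j - 1] + cost # substitution
      ].min
    end
  end

  matrix
end

matrix = levenshtein_matrix('cat', 'cart')
matrix.each { |row| puts row.join(' ') }
# 0 1 2 3 4
# 1 0 1 2 3
# 2 1 0 1 2
# 3 2 1 1 1

puts matrix.last.last # prints 1 (insert "r")
```

Each cell holds the distance between the first i characters of str1 and the first j characters of str2, which is why the bottom-right cell is the distance between the full strings.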