Hintable Levenshtein

Levenshtein distance with extra hints: per-operation costs that make some edits cheaper than others. For example, adding or deleting a space may count as a smaller change than other edits, and substituting a ‘c’ for a ‘k’ may be cheaper than an arbitrary substitution.
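
To make the idea concrete, here is a minimal sketch of an edit distance with pluggable costs. It is not this library's implementation: `weighted_distance`, `CHEAP_SPACE` and `CHEAP_KC` are hypothetical names invented for illustration, and the sketch only handles single-character hints.

```ruby
# Hypothetical sketch, NOT the library's code: the classic dynamic-programming
# edit distance, with the cost of each operation supplied by the caller.
def weighted_distance(a, b, insert_cost:, delete_cost:, substitute_cost:)
  # prev[j] is the cheapest cost of turning the already-processed prefix of a
  # into the first j characters of b. It starts with the empty prefix of a.
  prev = [0.0]
  b.each_char { |cb| prev << prev.last + insert_cost.call(cb) }

  a.each_char do |ca|
    curr = [prev[0] + delete_cost.call(ca)]
    b.each_char.with_index(1) do |cb, j|
      sub = ca == cb ? 0.0 : substitute_cost.call(ca, cb)
      curr << [
        prev[j] + delete_cost.call(ca),     # drop ca
        curr[j - 1] + insert_cost.call(cb), # add cb
        prev[j - 1] + sub                   # keep or replace ca
      ].min
    end
    prev = curr
  end
  prev.last
end

# Assumed example costs: spaces are cheap to add or remove,
# and swapping 'k' and 'c' is cheaper than any other substitution.
CHEAP_SPACE = ->(ch) { ch == ' ' ? 0.5 : 1.0 }
CHEAP_KC    = ->(x, y) { [x, y].sort == %w[c k] ? 0.7 : 1.0 }

puts weighted_distance("kitten", "citten",
                       insert_cost: CHEAP_SPACE,
                       delete_cost: CHEAP_SPACE,
                       substitute_cost: CHEAP_KC)
```

With every cost fixed at 1 this reduces to the ordinary Levenshtein distance; the hints only lower the price of the edits you consider minor.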

Just an example

# Each RuleSet pairs a cost with one or more rules.
english_rules = [
  # Punctuation is cheap to add, drop, or swap.
  HintableLevenshtein::RuleSet.new(0.3, HintableLevenshtein::Rule.insert(/[\.,!]/)),
  HintableLevenshtein::RuleSet.new(0.3, HintableLevenshtein::Rule.delete(/[\.,!]/)),
  HintableLevenshtein::RuleSet.new(0.4, HintableLevenshtein::Rule.substitute('!' => '.')),
  HintableLevenshtein::RuleSet.new(0.4, HintableLevenshtein::Rule.substitute('!' => ',')),
  # Two inserted spaces in a row cost 0.75 together; a single space costs 0.5.
  HintableLevenshtein::RuleSet.new(0.75, HintableLevenshtein::Rule.insert(' '), HintableLevenshtein::Rule.insert(' ')),
  HintableLevenshtein::RuleSet.new(0.5, HintableLevenshtein::Rule.insert(' ')),
  HintableLevenshtein::RuleSet.new(0.5, HintableLevenshtein::Rule.delete(' ')),
  # Similar-sounding letters are cheaper to substitute.
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('z' => 's')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('k' => 'c')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('u' => 'o')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('e' => 'a')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('i' => 'y')),
  # Any other delete, insert, or substitution costs 1.
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.delete),
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.insert),
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.substitute)
]

a = "hello kitten pizza!!"
b = "hello    cittin pissssa.."

puts "normal levenshtein: #{HintableLevenshtein.new.distance(a, b)}"
puts "hinted levenshtein: #{HintableLevenshtein.new(english_rules).distance(a, b)}"

Would output:

normal levenshtein: 11.0
hinted levenshtein: 7.15
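
One way to account for the 7.15 is to align the two strings by hand and price each edit with the rules above. The alignment below is an assumption about which rules apply, not a trace produced by the library, and `EDITS` and `total_cost` are hypothetical names. The same 11 single-character edits priced at 1 each give the normal distance of 11.

```ruby
# Hand-derived alignment of "hello kitten pizza!!" and
# "hello    cittin pissssa.."; an assumption, not library output.
# Each entry: description, cost per use, number of uses.
EDITS = [
  ["insert two spaces (pair rule)", "0.75", 1],
  ["insert one more space",         "0.5",  1],
  ["substitute k -> c",             "0.7",  1],
  ["substitute e -> i (no hint)",   "1",    1],
  ["substitute z -> s",             "0.7",  2],
  ["insert s (no hint)",            "1",    2],
  ["substitute ! -> .",             "0.4",  2]
].freeze

# Rational arithmetic avoids floating-point rounding in the sum.
def total_cost(edits)
  edits.sum { |_label, cost, count| Rational(cost) * count }
end

puts total_cost(EDITS).to_f # prints 7.15
```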