Hintable Levenshtein
Levenshtein distance, but with extra hints. Some edits deserve a lower cost than others: adding or deleting a space may not be as big a change as other edits, and substituting a "c" for a "k" may be cheaper than an arbitrary substitution.
Just an example
english_rules = [
  HintableLevenshtein::RuleSet.new(0.3, HintableLevenshtein::Rule.insert(/[\.,!]/)),
  HintableLevenshtein::RuleSet.new(0.3, HintableLevenshtein::Rule.delete(/[\.,!]/)),
  HintableLevenshtein::RuleSet.new(0.4, HintableLevenshtein::Rule.substitute('!' => '.')),
  HintableLevenshtein::RuleSet.new(0.4, HintableLevenshtein::Rule.substitute('!' => ',')),
  HintableLevenshtein::RuleSet.new(0.75, HintableLevenshtein::Rule.insert(' '), HintableLevenshtein::Rule.insert(' ')),
  HintableLevenshtein::RuleSet.new(0.5, HintableLevenshtein::Rule.insert(' ')),
  HintableLevenshtein::RuleSet.new(0.5, HintableLevenshtein::Rule.delete(' ')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('z' => 's')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('k' => 'c')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('u' => 'o')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('e' => 'a')),
  HintableLevenshtein::RuleSet.new(0.7, HintableLevenshtein::Rule.substitute('i' => 'y')),
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.delete),
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.insert),
  HintableLevenshtein::RuleSet.new(1, HintableLevenshtein::Rule.substitute)
]
a = "hello kitten pizza!!"
b = "hello cittin pissssa.."
puts "normal levenshtein: #{HintableLevenshtein.new.distance(a, b)}"
puts "hinted levenshtein: #{HintableLevenshtein.new(english_rules).distance(a, b)}"
This would output:
normal levenshtein: 11.0
hinted levenshtein: 7.15
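To illustrate the general idea of per-operation costs, here is a minimal, self-contained sketch of a weighted edit distance. It is not the HintableLevenshtein implementation: the method name `weighted_distance` and its keyword arguments are hypothetical, and it only handles single-character hints (no regular expressions and no multi-rule sets such as the double-space insert above).

```ruby
# Hypothetical sketch, NOT the HintableLevenshtein implementation.
# Costs are looked up per character; any edit not listed costs 1.0.
def weighted_distance(a, b, insert_cost: {}, delete_cost: {}, substitute_cost: {})
  # prev[j] is the cheapest cost of turning the processed prefix of a into b[0, j].
  prev = [0.0]
  b.each_char { |cb| prev << prev.last + insert_cost.fetch(cb, 1.0) }

  a.each_char do |ca|
    del = delete_cost.fetch(ca, 1.0)
    curr = [prev[0] + del]
    b.each_char.with_index(1) do |cb, j|
      sub = ca == cb ? 0.0 : substitute_cost.fetch([ca, cb], 1.0)
      curr << [
        prev[j] + del,                             # delete ca
        curr[j - 1] + insert_cost.fetch(cb, 1.0),  # insert cb
        prev[j - 1] + sub                          # keep or substitute
      ].min
    end
    prev = curr
  end
  prev.last
end

# Assumed hint for illustration: substituting 'k' with 'c' costs 0.7.
hints = { substitute_cost: { ['k', 'c'] => 0.7 } }

puts weighted_distance("kitten", "sitting")   # no hints: every edit costs 1.0
puts weighted_distance("kat", "cat", **hints) # the k -> c substitution is cheaper
```

With no hints this reduces to the ordinary Levenshtein distance, so the hints only ever make selected edits cheaper or more expensive relative to that baseline.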