Class: Levenshtein
- Inherits:
-
Object
- Object
- Levenshtein
- Defined in:
- lib/levenshtein.rb
Overview
Levenshtein distance algorithm implementation for Ruby, with UTF-8 support
- Author
-
Paul BATTLEY (pbattley @ gmail.com)
- Version
-
1.3
- Date
-
2005-04-19
About
The Levenshtein distance is a measure of how similar two strings s and t are, calculated as the number of deletions/insertions/substitutions needed to transform s into t. The greater the distance, the more the strings differ.
The Levenshtein distance is also sometimes referred to as the easier-to-pronounce-and-spell ‘edit distance’.
Revision history
-
2005-05-19 1.3 Repairing an oversight, distance can now be called via Levenshtein.distance(s, t)
-
2005-05-04 1.2 Now uses just one 1-dimensional array. I think this is as far as optimisation can go.
-
2005-05-04 1.1 Now storing only the current and previous rows of the matrix instead of the whole lot.
Licence
Copyright © 2005 Paul Battley
Usage of the works is permitted provided that this instrument is retained with the works, so that any entity that uses the works is notified of this instrument.
DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
Class Method Summary collapse
Class Method Details
.distance(str1, str2) ⇒ Object
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
# File 'lib/levenshtein.rb', line 45 def distance(str1, str2) s, t = str1.unpack('U*'), str2.unpack('U*') n, m = s.length, t.length return m if (0 == n) return n if (0 == m) d = (0..m).to_a x = nil (0...n).each do |i| e = i+1 (0...m).each do |j| cost = (s[i] == t[j]) ? 0 : 1 x = [ d[j+1] + 1, # insertion e + 1, # deletion d[j] + cost # substitution ].min d[j] = e e = x end d[m] = x end return x end |