Module: Edits::RestrictedEdit
- Defined in:
- lib/edits/restricted_edit.rb
Overview
Implements Restricted Damerau-Levenshtein distance (Optimal Alignment) algorithm.
Determines distance between two strings by counting edits, identifying:
- Insertion
- Deletion
- Substitution
- Swapped items
Class Method Summary collapse
-
.distance(seq1, seq2) ⇒ Integer
Calculate the Restricted Damerau-Levenshtein distance (Optimal Alignment) of two sequences.
Class Method Details
.distance(seq1, seq2) ⇒ Integer
Note:
Not a true distance metric, fails to satisfy triangle inequality.
Calculate the Restricted Damerau-Levenshtein distance (Optimal Alignment) of two sequences.
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
# File 'lib/edits/restricted_edit.rb', line 23 def self.distance(seq1, seq2) if seq1.length > seq2.length temp = seq1 seq1 = seq2 seq2 = temp end # array of Integer codepoints outperforms String seq1 = seq1.codepoints if seq1.is_a? String seq2 = seq2.codepoints if seq2.is_a? String rows = seq1.length cols = seq2.length return cols if rows.zero? return rows if cols.zero? # previous two rows of cost matrix are retained lastlast_row = [] last_row = [] # Initialize first row of cost matrix. # The full initial state where cols=3, rows=2 would be: # [[0, 1, 2, 3], # [1, 0, 0, 0], # [2, 0, 0, 0]] curr_row = 0.upto(cols).to_a rows.times do |row| lastlast_row = last_row last_row = curr_row # generate next row of cost matrix curr_row = Array.new(cols + 1, 0) curr_row[0] = row + 1 curr_item = seq1[row] cols.times do |col| sub_cost = curr_item == seq2[col] ? 0 : 1 is_swap = sub_cost == 1 && row.positive? && col.positive? && curr_item == seq2[col - 1] && seq1[row - 1] == seq2[col] deletion = last_row[col + 1] + 1 insertion = curr_row[col] + 1 substitution = last_row[col] + sub_cost # step cost is min of operation costs cost = deletion < insertion ? deletion : insertion cost = substitution if substitution < cost if is_swap swap = lastlast_row[col - 1] + 1 cost = swap if swap < cost end curr_row[col + 1] = cost end end curr_row[cols] end |