Class: LevensteinWithPath::Path
- Inherits:
-
Object
- Object
- LevensteinWithPath::Path
- Defined in:
- lib/levenstein_with_path.rb
Overview
Computes the levenstein distance between two sequences.
Unlike other Ruby libraries, this one:
- Can tell you not just the optimal score, but the sequence of operations to get it. This is useful if you want to present a nice diff.
- Lets you pass either strings or array of some other atomic tokens (like words, lines, nucleobases, etc.).
It is based on the Wagner-Fischer algorithm,
which is O(n^2)
but lets you retain the history of edits.
See also that algorithm applied to Levenstein distance.
Instance Attribute Summary collapse
-
#score ⇒ Object
readonly
Returns the value of attribute score.
-
#scores ⇒ Object
readonly
Returns the value of attribute scores.
-
#tokens1 ⇒ Object
readonly
Returns the value of attribute tokens1.
-
#tokens2 ⇒ Object
readonly
Returns the value of attribute tokens2.
Instance Method Summary collapse
- #edits ⇒ Object
-
#initialize(tokens1, tokens2) ⇒ Path
constructor
A new instance of Path.
Constructor Details
#initialize(tokens1, tokens2) ⇒ Path
Returns a new instance of Path.
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
# File 'lib/levenstein_with_path.rb', line 19 def initialize(tokens1, tokens2) @tokens1 = if tokens1.is_a?(String) tokens1.chars else tokens1 end.to_a @tokens2 = if tokens2.is_a?(String) tokens2.chars else tokens2 end.to_a s = @tokens1 t = @tokens2 m = s.size n = t.size d = 0.upto(m).map { [0] * (n + 1)} 1.upto(m) {|i| d[i][0] = i} 1.upto(n) {|j| d[0][j] = j} # rubocop:disable Layout/SpaceInsideReferenceBrackets 1.upto(n) do |j| 1.upto(m) do |i| cost = s[i - 1] == t[j - 1] ? 0 : 1 d[i][j] = [ d[i - 1][j ] + 1, # delete d[i ][j - 1] + 1, # insert d[i - 1][j - 1] + cost # keep/swap ].min end end # rubocop:enable Layout/SpaceInsideReferenceBrackets @scores = d @score = d[m][n] end |
Instance Attribute Details
#score ⇒ Object (readonly)
Returns the value of attribute score.
17 18 19 |
# File 'lib/levenstein_with_path.rb', line 17 def score @score end |
#scores ⇒ Object (readonly)
Returns the value of attribute scores.
17 18 19 |
# File 'lib/levenstein_with_path.rb', line 17 def scores @scores end |
#tokens1 ⇒ Object (readonly)
Returns the value of attribute tokens1.
16 17 18 |
# File 'lib/levenstein_with_path.rb', line 16 def tokens1 @tokens1 end |
#tokens2 ⇒ Object (readonly)
Returns the value of attribute tokens2.
16 17 18 |
# File 'lib/levenstein_with_path.rb', line 16 def tokens2 @tokens2 end |
Instance Method Details
#edits ⇒ Object
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 |
# File 'lib/levenstein_with_path.rb', line 56 def edits return @edits if @edits @edits = [] s = @tokens1 t = @tokens2 m = s.size n = t.size i = m j = n d = @scores while i > 0 or j > 0 cur_score = d[i][j] insert_score = j > 0 ? d[i][j - 1] : cur_score + 1 delete_score = i > 0 ? d[i - 1][j] : cur_score + 1 keep_or_swap_score = (i > 0 and j > 0) ? d[i - 1][j - 1] : cur_score + 1 best_score = [insert_score, delete_score, keep_or_swap_score].min if best_score == keep_or_swap_score if cur_score == keep_or_swap_score @edits << Keep.new(s[i - 1]) else @edits << Swap.new(s[i - 1], t[j - 1]) end i -= 1 j -= 1 elsif best_score == insert_score @edits << Insert.new(t[j - 1]) j -= 1 else @edits << Delete.new(s[i - 1]) i -= 1 end end @edits.reverse! end |