Class: Gjp::VersionMatcher
- Inherits: Object
  - Object
    - Gjp::VersionMatcher
- Includes: Logger
- Defined in: lib/gjp/version_matcher.rb
Overview
Heuristically matches version strings.
Instance Method Summary
- #best_match(my_version, their_versions) ⇒ Object
  Returns the element of their_versions that best matches my_version, using a heuristic criterion.
- #chunk_distance(my_chunk, their_chunk) ⇒ Object
  Returns a score representing the distance between two version chunks: for integers, the score is the absolute difference between their values; for strings, it is the case-insensitive Levenshtein distance. In either case the score lies between 0 (identical) and 99 (very different or incomparable), with larger values capped at 99.
- #split_version(full_name) ⇒ Object
  Assumes that the version part of a string begins with a numeric character and is separated from the name by ., -, _, ~ or a space (the pattern also accepts a comma); returns a [name, version] pair.
Methods included from Logger
Instance Method Details
#best_match(my_version, their_versions) ⇒ Object
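Returns the element of their_versions that best matches my_version, using a heuristic criterion. Idea:
- Split each version number into chunks delimited by ., -, _, ~, comma or space.
- Compare chunks with the same index; the differences add up to a score.
- A "comparison" is a subtraction if both chunks are integers, and a case-insensitive string distance otherwise.
- Each chunk's distance is weighted by its index, so earlier chunks matter most: the distance at index i is multiplied by 100^(n - i - 1), where n is the largest chunk count among all versions. Because each distance is capped at 99, a difference in an earlier chunk always outweighs any combination of differences in later chunks.
- The lowest score wins.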
```ruby
# File 'lib/gjp/version_matcher.rb', line 30

def best_match(my_version, their_versions)
  log.debug("version comparison: #{my_version} vs #{their_versions.join(', ')}")

  my_chunks = my_version.split(/[\.\-\_ ~,]/)

  # map each candidate version to its chunks, padded with nils up to my_chunks.length
  their_chunks_hash = Hash[
    their_versions.map do |their_version|
      their_chunks_for_version = if their_version != nil
        their_version.split(/[\.\-\_ ~,]/)
      else
        []
      end
      their_chunks_for_version += [nil] * [my_chunks.length - their_chunks_for_version.length, 0].max
      [their_version, their_chunks_for_version]
    end
  ]

  max_chunks_length = ([my_chunks.length] + their_chunks_hash.values.map { |chunk| chunk.length }).max

  scoreboard = []
  their_versions.each do |their_version|
    their_chunks = their_chunks_hash[their_version]
    score = 0
    their_chunks.each_with_index do |their_chunk, i|
      # earlier chunks get exponentially larger weights
      score_multiplier = 100**(max_chunks_length - i - 1)
      my_chunk = my_chunks[i]
      score += chunk_distance(my_chunk, their_chunk) * score_multiplier
    end
    scoreboard << { :version => their_version, :score => score }
  end

  scoreboard = scoreboard.sort_by { |element| element[:score] }

  log.debug("scoreboard: ")
  scoreboard.each_with_index do |element, i|
    log.debug(" #{i + 1}. #{element[:version]} (score: #{element[:score]})")
  end

  # lowest score wins; nil if their_versions is empty
  winner = scoreboard.first

  if winner != nil
    return winner[:version]
  end
end
```
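For illustration, here is a self-contained sketch of the positional weighting alone. It assumes every chunk is numeric, so only the integer branch of chunk_distance is reproduced; `numeric_distance`, `weighted_score` and `best_numeric_match` are hypothetical names, not part of gjp, and logging is omitted.

```ruby
# Hypothetical sketch of best_match's scoring, assuming purely numeric chunks.
SEPARATORS = /[.\-_ ~,]/

# Integer branch of chunk_distance: missing chunks count as "0", capped at 99.
def numeric_distance(a, b)
  [((a || "0").to_i - (b || "0").to_i).abs, 99].min
end

# Sum of per-chunk distances, chunk i weighted by 100**(width - i - 1).
def weighted_score(my_chunks, their_chunks, width)
  (0...width).sum do |i|
    numeric_distance(my_chunks[i], their_chunks[i]) * 100**(width - i - 1)
  end
end

# Returns the candidate with the lowest score, or nil if there are none.
def best_numeric_match(my_version, their_versions)
  my_chunks = my_version.split(SEPARATORS)
  their_chunks = their_versions.map { |v| v.split(SEPARATORS) }
  width = ([my_chunks] + their_chunks).map(&:length).max
  scores = their_versions.zip(their_chunks).map do |version, chunks|
    [version, weighted_score(my_chunks, chunks, width)]
  end
  scores.min_by { |_, score| score }&.first
end

p weighted_score(%w[1 2 3], %w[1 2 4], 3)                   # => 1
p weighted_score(%w[1 2 3], %w[1 3 0], 3)                   # => 103
p best_numeric_match("1.2.3", ["1.3.0", "1.2.9", "2.0"])    # => "1.2.9"
```

With "1.2.3" as the reference, "1.2.9" scores 6, "1.3.0" scores 103 and "2.0" scores 10203, so a one-step difference in the first chunk outweighs any difference in the later ones.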
#chunk_distance(my_chunk, their_chunk) ⇒ Object
Returns a score representing the distance between two version chunks: for integers, the score is the absolute difference between their values; for strings, it is the case-insensitive Levenshtein distance. In either case the score lies between 0 (identical) and 99 (very different or incomparable), with larger values capped at 99.
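```ruby
# File 'lib/gjp/version_matcher.rb', line 78

def chunk_distance(my_chunk, their_chunk)
  # missing chunks compare as "0"
  if my_chunk == nil
    my_chunk = "0"
  end
  if their_chunk == nil
    their_chunk = "0"
  end

  # is_i? is not core Ruby; it is expected to be a String helper defined elsewhere in gjp
  if my_chunk.is_i? && their_chunk.is_i?
    return [(my_chunk.to_i - their_chunk.to_i).abs, 99].min
  else
    # Text::Levenshtein comes from the "text" gem; upcase makes the comparison case-insensitive
    return [Text::Levenshtein.distance(my_chunk.upcase, their_chunk.upcase), 99].min
  end
end
```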
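chunk_distance depends on two things outside core Ruby: Text::Levenshtein from the text gem and String#is_i?. The dependency-free sketch below swaps in a hand-written Levenshtein distance (`levenshtein`) and an assumed integer check (`integer_string?`, which treats is_i? as "optional sign followed by digits"). Both helpers are hypothetical stand-ins, so edge cases may differ from gjp's.

```ruby
# Hypothetical, dependency-free sketch of chunk_distance.

# Classic dynamic-programming edit distance (insertion, deletion and
# substitution each cost 1), keeping only two rows of the table.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    current = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [previous[j] + 1,            # deletion
                  current[j - 1] + 1,         # insertion
                  previous[j - 1] + cost].min # substitution
    end
    previous = current
  end
  previous.last
end

# Assumed stand-in for gjp's String#is_i?.
def integer_string?(s)
  s.match?(/\A[+-]?\d+\z/)
end

def chunk_distance(my_chunk, their_chunk)
  my_chunk ||= "0"    # missing chunks compare as "0"
  their_chunk ||= "0"
  distance =
    if integer_string?(my_chunk) && integer_string?(their_chunk)
      (my_chunk.to_i - their_chunk.to_i).abs
    else
      levenshtein(my_chunk.upcase, their_chunk.upcase) # case-insensitive
    end
  [distance, 99].min # cap at 99
end

p chunk_distance("3", "7")      # => 4
p chunk_distance("1000", "1")   # => 99
p chunk_distance("beta", "BETA") # => 0
p chunk_distance("RC1", "rc2")  # => 1
p chunk_distance("2", "rc2")    # => 2
```

The last call mixes an integer with a string, so the string branch applies: turning "2" into "RC2" takes two insertions.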
#split_version(full_name) ⇒ Object
Assumes that the version part of a string begins with a numeric character and is separated from the name by ., -, _, ~ or a space (the pattern also accepts a comma); returns a [name, version] pair.
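```ruby
# File 'lib/gjp/version_matcher.rb', line 14

def split_version(full_name)
  # lazily match the name, then an optional separator followed by a digit-initial version
  matches = full_name.match(/(.*?)(?:[\.\-\_ ~,]?([0-9].*))?$/)
  if matches != nil && matches.length > 1
    [matches[1], matches[2]]
  else
    # no match: treat the whole string as the name
    [full_name, nil]
  end
end
```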
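To see what the pattern extracts, here is a standalone equivalent of split_version (the regex is written with fewer escapes but matches the same characters) applied to a few example names, which are illustrative inputs only.

```ruby
# Standalone equivalent of split_version for experimentation.
def split_version(full_name)
  matches = full_name.match(/(.*?)(?:[.\-_ ~,]?([0-9].*))?$/)
  if matches != nil && matches.length > 1
    [matches[1], matches[2]]
  else
    [full_name, nil]
  end
end

p split_version("commons-collections-3.2.1") # => ["commons-collections", "3.2.1"]
p split_version("junit_4.11")                # => ["junit", "4.11"]
p split_version("ant")                       # => ["ant", nil]
p split_version("log4j-1.2.17")              # => ["log", "4j-1.2.17"]
```

The last case shows the limit of the heuristic: the first digit in the string, here the 4 in log4j, is taken as the start of the version.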