Class: Gjp::VersionMatcher

Inherits:
Object
Includes:
Logger
Defined in:
lib/gjp/version_matcher.rb

Overview

heuristically matches version strings

Instance Method Summary

Methods included from Logger

log, #log

Instance Method Details

#best_match(my_version, their_versions) ⇒ Object

returns the entry of their_versions that best matches my_version, using a heuristic criterion (nil if their_versions is empty). Idea:

- split each version string into chunks at ., -, _, ~, commas and spaces
- chunks with the same index are "compared"; the weighted differences add up to a score
- "comparison" is the absolute difference if both chunks are integers, a string distance measure (Levenshtein) otherwise
- the weight depends on the chunk index (first chunks are most important)
- lowest score wins


# File 'lib/gjp/version_matcher.rb', line 30

def best_match(my_version, their_versions)
  log.debug("version comparison: #{my_version} vs #{their_versions.join(', ')}")

  # split every version into chunks at the usual separators
  my_chunks = my_version.split(/[\.\-\_ ~,]/)
  their_chunks_hash = Hash[
    their_versions.map do |their_version|
      their_chunks_for_version = if their_version != nil
        their_version.split(/[\.\-\_ ~,]/)
      else
        []
      end
      # pad with nil so that every candidate has at least as many chunks as my_version
      their_chunks_for_version += [nil]*[my_chunks.length - their_chunks_for_version.length, 0].max
      [their_version, their_chunks_for_version]
    end
  ]

  max_chunks_length = ([my_chunks.length] + their_chunks_hash.values.map {|chunk| chunk.length}).max

  # score every candidate: each chunk weighs 100 times more than the one after it
  scoreboard = []
  their_versions.each do |their_version|
    their_chunks = their_chunks_hash[their_version]
    score = 0
    their_chunks.each_with_index do |their_chunk, i|
      score_multiplier = 100**(max_chunks_length - i - 1)
      my_chunk = my_chunks[i]
      score += chunk_distance(my_chunk, their_chunk) * score_multiplier
    end
    scoreboard << {:version => their_version, :score => score}
  end

  scoreboard = scoreboard.sort_by {|element| element[:score]}

  log.debug("scoreboard: ")
  scoreboard.each_with_index do |element, i|
    log.debug("  #{i+1}. #{element[:version]} (score: #{element[:score]})")
  end

  # lowest score wins; nil is returned if there are no candidates
  winner = scoreboard.first

  if winner != nil
    return winner[:version]
  end
end
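
To make the weighting concrete, here is a minimal, self-contained sketch of the same idea. It is not the library's code: the names SEPARATORS, sketch_score and sketch_best_match are hypothetical, logging is omitted, and only integer chunks are handled (non-numeric chunks would need the string-distance fallback).

```ruby
# Hypothetical, simplified illustration of the scoring heuristic.
# Assumption: every chunk is an integer; missing chunks count as 0.
SEPARATORS = /[.\-_ ~,]/

def sketch_score(my_version, their_version, max_chunks)
  mine = my_version.split(SEPARATORS)
  theirs = their_version.split(SEPARATORS)
  (0...max_chunks).sum do |i|
    # nil.to_i is 0, so absent chunks compare as 0; each difference is capped at 99
    diff = [(mine[i].to_i - theirs[i].to_i).abs, 99].min
    # the first chunk gets the largest weight, the last chunk a weight of 1
    diff * 100**(max_chunks - i - 1)
  end
end

def sketch_best_match(my_version, their_versions)
  max_chunks = ([my_version] + their_versions).map { |v| v.split(SEPARATORS).length }.max
  # lowest score wins; returns nil for an empty candidate list
  their_versions.min_by { |v| sketch_score(my_version, v, max_chunks) }
end

candidates = ["1.2.4", "1.3.0", "2.0.0"]
puts sketch_score("1.2.3", "1.2.4", 3)   # 0*10000 + 0*100 + 1*1 = 1
puts sketch_score("1.2.3", "1.3.0", 3)   # 0*10000 + 1*100 + 3*1 = 103
puts sketch_score("1.2.3", "2.0.0", 3)   # 1*10000 + 2*100 + 3*1 = 10203
puts sketch_best_match("1.2.3", candidates)   # prints 1.2.4
```

Because each chunk distance is capped at 99 and the weights are powers of 100, a difference in an earlier chunk always outweighs any combination of differences in later chunks.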

#chunk_distance(my_chunk, their_chunk) ⇒ Object

returns a score representing the distance between two version chunks:

- for integers, the score is the absolute difference between their values
- for strings, the score is the case-insensitive Levenshtein distance
- a missing (nil) chunk is treated as "0"

in any case the score is capped to the range from 0 (identical) to 99 (very different/uncomparable)



# File 'lib/gjp/version_matcher.rb', line 78

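def chunk_distance(my_chunk, their_chunk)
  # a missing chunk is compared as if it were "0"
  if my_chunk == nil
    my_chunk = "0"
  end
  if their_chunk == nil
    their_chunk = "0"
  end
  # is_i? is not a core String method; it is assumed to be an extension defined elsewhere in the project
  if my_chunk.is_i? && their_chunk.is_i?
    return [(my_chunk.to_i - their_chunk.to_i).abs, 99].min
  else
    # Text::Levenshtein is provided by the text gem
    return [Text::Levenshtein.distance(my_chunk.upcase, their_chunk.upcase), 99].min
  end
end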
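
The following sketch shows the same distance rule without external dependencies, so it can be tried in isolation. It is an illustration under assumptions, not the library's code: integer_chunk? is a hypothetical stand-in for is_i? (assumed here to mean "digits only"), and levenshtein is a plain dynamic-programming stand-in for Text::Levenshtein.distance.

```ruby
# Hypothetical stand-in for the is_i? check; assumes "integer" means digits only.
def integer_chunk?(chunk)
  chunk.match?(/\A\d+\z/)
end

# Plain dynamic-programming Levenshtein distance, keeping one row at a time.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # deletion, insertion, substitution
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

def sketch_chunk_distance(my_chunk, their_chunk)
  my_chunk ||= "0"      # a missing chunk is compared as "0"
  their_chunk ||= "0"
  distance = if integer_chunk?(my_chunk) && integer_chunk?(their_chunk)
    (my_chunk.to_i - their_chunk.to_i).abs
  else
    levenshtein(my_chunk.upcase, their_chunk.upcase)
  end
  [distance, 99].min    # cap at 99
end

puts sketch_chunk_distance("3", "10")     # 7
puts sketch_chunk_distance("1", "500")    # 99 (capped)
puts sketch_chunk_distance(nil, "2")      # 2
puts sketch_chunk_distance("rc1", "RC2")  # 1 (case-insensitive)
```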

#split_version(full_name) ⇒ Object

assumes that the version begins with a numeric character and is optionally separated from the name by a ., -, _, ~, comma or space. returns a [name, version] pair; version is nil if no numeric part is found



# File 'lib/gjp/version_matcher.rb', line 14

def split_version(full_name)
  matches = full_name.match(/(.*?)(?:[\.\-\_ ~,]?([0-9].*))?$/)
  if matches != nil && matches.length > 1
    [matches[1], matches[2]]
  else
    # fall back to the whole name with no version
    [full_name, nil]
  end
end
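
As a quick illustration of what the pattern does, the sketch below applies the same regular expression outside the class. VERSION_PATTERN and sketch_split_version are hypothetical names used only for this example; the fallback branch is left out because the pattern matches any single-line string.

```ruby
# Same pattern as above: a lazy name, then an optional separator and a
# version that starts with a digit.
VERSION_PATTERN = /(.*?)(?:[\.\-\_ ~,]?([0-9].*))?$/

def sketch_split_version(full_name)
  matches = full_name.match(VERSION_PATTERN)
  [matches[1], matches[2]]
end

p sketch_split_version("commons-lang-2.6")  # ["commons-lang", "2.6"]
p sketch_split_version("ant 1.8")           # ["ant", "1.8"]
p sketch_split_version("junit")             # ["junit", nil]
p sketch_split_version("log4j-1.2.17")      # ["log", "4j-1.2.17"]
```

The last call shows a limitation of the heuristic: because the version is taken to start at the first digit, a name that itself contains a digit is split too early.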