Class: IndexedSearch::Match::AmericanSoundex
- Defined in:
- lib/indexed_search/match/american_soundex.rb
Overview
Finds words that sound similar by comparing keys produced by the “American Soundex” variation of the Soundex algorithm. Only works well for English.
Also supports keys longer than 4 characters, and is more tolerant of Unicode characters, in a way that’s somewhat similar to how MySQL’s SOUNDEX() function works…
Uses an american_soundex column to store a Soundex value with each entry in the IndexedSearch::Word model. TODO: ideally non-ASCII letters should be normalized to similar ASCII ones if they can…
Constant Summary
- MAP =
  {
    'a' => '0', 'e' => '0', 'i' => '0', 'o' => '0', 'u' => '0',
    'b' => '1', 'f' => '1', 'p' => '1', 'v' => '1',
    'c' => '2', 'g' => '2', 'j' => '2', 'k' => '2', 'q' => '2', 's' => '2', 'x' => '2', 'z' => '2',
    'd' => '3', 't' => '3',
    'l' => '4',
    'm' => '5', 'n' => '5',
    'r' => '6'
  }
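As a rough illustration of how this table is read, the sketch below looks up each letter of a lowercase word. `SOUNDEX_DIGITS` and `letter_codes` are hypothetical names used only for this example and are not part of the gem. Letters missing from the table (such as h, w and y) come back as nil.

```ruby
# Hypothetical copy of the MAP constant, so the example runs on its own.
SOUNDEX_DIGITS = {
  'a' => '0', 'e' => '0', 'i' => '0', 'o' => '0', 'u' => '0',
  'b' => '1', 'f' => '1', 'p' => '1', 'v' => '1',
  'c' => '2', 'g' => '2', 'j' => '2', 'k' => '2',
  'q' => '2', 's' => '2', 'x' => '2', 'z' => '2',
  'd' => '3', 't' => '3',
  'l' => '4',
  'm' => '5', 'n' => '5',
  'r' => '6'
}.freeze

# Illustrative helper (not part of the gem): the raw digit for each letter.
def letter_codes(word)
  word.each_char.map { |ch| SOUNDEX_DIGITS[ch] }
end

letter_codes('robert')   # => ["6", "0", "1", "0", "6", "3"]
letter_codes('ashcraft') # => ["0", "2", nil, "2", "6", "0", "1", "3"]
```

Vowels map to '0' so that they can separate repeated consonant codes without contributing a digit to the final key themselves.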
Class Method Summary
- .make_index_value(term) ⇒ Object
See en.wikipedia.org/wiki/Soundex#Rules. Our exceptions are the key length and some limited Unicode tolerance.
Instance Method Summary
Methods inherited from Base
#find, find_attributes, #initialize, match_against_term?, #results, #term_matches, #term_non_matches
Constructor Details
This class inherits a constructor from IndexedSearch::Match::Base
Class Method Details
.make_index_value(term) ⇒ Object
See en.wikipedia.org/wiki/Soundex#Rules. Our exceptions are the key length and some limited Unicode tolerance.
# File 'lib/indexed_search/match/american_soundex.rb', line 56
def self.make_index_value(term)
  idx = 0
  idx += 1 until term[idx] =~ /\A\p{Alpha}\z/ || idx >= term.size
  return nil if idx >= term.size
  value = UnicodeUtils.simple_upcase(term[idx])
  return value if max_length == 1
  last_code = MAP[term[idx]]
  while idx < term.size do
    idx += 1
    code = MAP[term[idx]]
    if ! code.nil? && code != last_code
      value += code if code != '0'
      return value if value.size >= max_length
      last_code = code
    end
  end
  value.ljust(4, '0')
end
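To try the algorithm outside the gem, here is a minimal standalone sketch of the same logic. It rests on several assumptions: `soundex_sketch` and `SKETCH_MAP` are hypothetical names, the maximum key length is passed as a keyword argument instead of being read from the class, Ruby's built-in `String#upcase` stands in for `UnicodeUtils.simple_upcase`, and the term is expected to be lowercase already.

```ruby
# Hypothetical copy of the MAP constant, so the sketch runs on its own.
SKETCH_MAP = {
  'a' => '0', 'e' => '0', 'i' => '0', 'o' => '0', 'u' => '0',
  'b' => '1', 'f' => '1', 'p' => '1', 'v' => '1',
  'c' => '2', 'g' => '2', 'j' => '2', 'k' => '2',
  'q' => '2', 's' => '2', 'x' => '2', 'z' => '2',
  'd' => '3', 't' => '3',
  'l' => '4',
  'm' => '5', 'n' => '5',
  'r' => '6'
}.freeze

# Illustrative re-statement of make_index_value (not the gem's own code).
def soundex_sketch(term, max_length: 4)
  idx = term.index(/\p{Alpha}/)        # first alphabetic character
  return nil if idx.nil?               # nothing to encode

  value = term[idx].upcase             # assumption: String#upcase instead of UnicodeUtils
  return value if max_length == 1

  last_code = SKETCH_MAP[term[idx]]
  term[(idx + 1)..-1].each_char do |ch|
    code = SKETCH_MAP[ch]
    # Unmapped characters (h, w, y, digits, ...) and repeats of the previous code are skipped.
    next if code.nil? || code == last_code

    value += code unless code == '0'   # vowels reset last_code but add no digit
    return value if value.size >= max_length
    last_code = code
  end
  value.ljust(4, '0')                  # padding width 4 copied from the listing above
end

soundex_sketch('robert')                    # => "R163"
soundex_sketch('rupert')                    # => "R163"
soundex_sketch('ashcraft')                  # => "A261"
soundex_sketch('lee')                       # => "L000"
soundex_sketch('123')                       # => nil
soundex_sketch('washington', max_length: 6) # => "W25235"
```

Because h and w are absent from the table, they do not reset the previous code, which is why the s and c in "ashcraft" collapse into a single 2. Vowels are in the table with code '0', so they do separate repeated consonant codes.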
Instance Method Details
#scope ⇒ Object
# File 'lib/indexed_search/match/american_soundex.rb', line 34
def scope
  @scope.where(self.class.matcher_attribute => term_map.keys)
end
#term_map ⇒ Object
# File 'lib/indexed_search/match/american_soundex.rb', line 38
def term_map
  @term_map ||= Hash.new { |hash,key| hash[key] = [] }.tap do |map|
    term_matches.each { |term| map[self.class.make_index_value(term)] << term }
  end
end
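The grouping idiom used by term_map can be tried in isolation. In the sketch below, `build_term_map` and `PRECOMPUTED_KEYS` are hypothetical names; the fixed lookup table stands in for make_index_value so that the example needs nothing from the gem.

```ruby
# Hypothetical stand-in for make_index_value: precomputed Soundex keys.
PRECOMPUTED_KEYS = {
  'robert' => 'R163',
  'rupert' => 'R163',
  'lee'    => 'L000'
}.freeze

# Illustrative helper: group terms under their key, as term_map does.
def build_term_map(terms)
  # The default block creates an empty array the first time a key is seen.
  Hash.new { |hash, key| hash[key] = [] }.tap do |map|
    terms.each { |term| map[PRECOMPUTED_KEYS.fetch(term)] << term }
  end
end

grouped = build_term_map(%w[robert rupert lee])
grouped       # => {"R163"=>["robert", "rupert"], "L000"=>["lee"]}
grouped.keys  # => ["R163", "L000"]
```

The keys of such a hash are what scope passes to `where`, so a single query covers every search term. Note that the returned hash keeps its default block, so reading a missing key with `[]` inserts an empty array; use `key?` or `fetch` to probe without modifying it.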