Module: ActsAsTokenizable::StringUtils

Defined in:
lib/acts_as_tokenizable/string_utils.rb

Class Method Details

.alphanumerics(str) ⇒ Object

Returns an array of the parts of the string, in their original order:

* the numeric parts, converted to integers
* the non-numeric parts, kept as strings

This is useful for sorting alphanumerically (natural order). For example:

["A1", "A12", "A2"].sort_by { |x| x.alphanumerics } => ["A1", "A2", "A12"]

Inspired by: blog.labnotes.org/2007/12/13/rounded-corners-173-beautiful-code/



# File 'lib/acts_as_tokenizable/string_utils.rb', line 45

def self.alphanumerics(str)
  str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
end
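The splitting behavior can be tried in isolation. The following is a minimal sketch that copies the one-line body into a top-level method (the standalone `alphanumerics` method here is only for illustration and is not how the gem exposes it):

```ruby
# Standalone copy of the method body, for experimentation only.
def alphanumerics(str)
  # The capture group in the pattern keeps the digit runs in the result.
  str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
end

alphanumerics("A12")        # => ["A", 12]
alphanumerics("file10.txt") # => ["file", 10, ".txt"]

# Arrays compare element by element, so 2 sorts before 12.
%w[A1 A12 A2].sort_by { |x| alphanumerics(x) } # => ["A1", "A2", "A12"]
```

A plain `sort` would compare the strings character by character and place "A12" before "A2".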

.numeric?(str) ⇒ Boolean

Returns true if str can be parsed as a number, false otherwise.

Returns:

  • (Boolean)


# File 'lib/acts_as_tokenizable/string_utils.rb', line 6

def self.numeric?(str)
  # Float() raises ArgumentError for a non-numeric string and TypeError for nil
  Float(str)
  true
rescue StandardError
  false
end
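A few inputs show where the line between numeric and non-numeric falls. This is a minimal sketch using a standalone method with the same behavior as the one above (the top-level `numeric?` is for illustration only):

```ruby
# Standalone equivalent of numeric?, for experimentation only.
def numeric?(str)
  Float(str) # raises if str cannot be parsed as a number
  true
rescue StandardError
  false
end

numeric?("42")    # => true
numeric?("3.14")  # => true
numeric?("-1e3")  # => true  (scientific notation is accepted)
numeric?("12abc") # => false
numeric?("")      # => false
numeric?(nil)     # => false
```

Because the check relies on `Float()`, the whole string must parse as a number; a numeric prefix such as "12abc" is not enough.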

.remove_words(str, words_array, separator = ' ') ⇒ Object

Removes the given words from a string. As a side effect, every word separator in the result is replaced by the separator string.



# File 'lib/acts_as_tokenizable/string_utils.rb', line 20

def self.remove_words(str, words_array, separator = ' ')
  (words(str) - words_array).join separator
end
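The side effect on separators is easiest to see with an example. The sketch below copies `words` and `remove_words` into top-level methods for illustration (the input strings are made up):

```ruby
# Standalone copies of the two methods, for experimentation only.
def words(str)
  str.split(/[\s|\.|,]+/)
end

def remove_words(str, words_array, separator = ' ')
  (words(str) - words_array).join separator
end

# The ", " between "quick" and "brown" becomes a single space.
remove_words("the quick, brown fox", %w[the brown]) # => "quick fox"

# A custom separator replaces the original "." and ",".
remove_words("a.b,c", ["b"], "-")                   # => "a-c"

# Matching is case-sensitive, so "The" is not removed here.
remove_words("The end", ["the"])                    # => "The end"
```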

.replace_words(str, replacements, separator = ' ') ⇒ Object

Replaces certain words in a string. replacements maps each list of candidate words to the word that replaces them. As a side effect, every word separator in the result is replaced by the separator string.



# File 'lib/acts_as_tokenizable/string_utils.rb', line 26

def self.replace_words(str, replacements, separator = ' ')
  replaced_words = words(str)
  replacements.each do |candidates, replacement|
    candidates.each do |candidate|
      replaced_words = replaced_words.collect do |w|
        w == candidate ? replacement : w
      end
    end
  end
  replaced_words.join separator
end
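The shape expected for `replacements` follows from the nested `each` calls: each key is a list of candidate words and each value is the replacement. A minimal sketch, assuming a Hash is passed (the address data is invented for illustration):

```ruby
# Standalone copies of the two methods, for experimentation only.
def words(str)
  str.split(/[\s|\.|,]+/)
end

def replace_words(str, replacements, separator = ' ')
  replaced_words = words(str)
  replacements.each do |candidates, replacement|
    candidates.each do |candidate|
      replaced_words = replaced_words.collect do |w|
        w == candidate ? replacement : w
      end
    end
  end
  replaced_words.join separator
end

# Keys are arrays of candidates; values are the replacement word.
replacements = { %w[st str] => "street", %w[ave] => "avenue" }

replace_words("12 main st, 5th ave", replacements)
# => "12 main street 5th avenue"
```

Any enumerable that yields `[candidates, replacement]` pairs, such as an array of two-element arrays, would work the same way.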

.to_token(str, max_length = 255) ⇒ Object

Converts the string into something that can be used as an index key, truncated to max_length characters.



# File 'lib/acts_as_tokenizable/string_utils.rb', line 50

def self.to_token(str, max_length = 255)
  # to_slug and normalize are provided by the 'babosa' gem
  # remove all non-alphanumeric characters except the hyphen (-)
  str = str.to_slug.normalize.strip.downcase.gsub(/[\s|\.|,]+/, '')
  # collapse runs of repeated characters, except in pure numbers
  str = str.squeeze unless numeric?(str)
  str[0..(max_length - 1)]
end
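The steps that follow the babosa normalization can be tried without the gem. The sketch below uses a hypothetical helper, `finish_token`, that takes an already-normalized string and applies only the remaining steps; it is not part of the module and does not reproduce what `to_slug.normalize` does:

```ruby
# Standalone equivalent of numeric?, for experimentation only.
def numeric?(str)
  true if Float(str)
rescue StandardError
  false
end

# Hypothetical helper: the post-normalization steps of to_token only.
# Assumes `slug` has already been through babosa's to_slug.normalize.
def finish_token(slug, max_length = 255)
  slug = slug.strip.downcase.gsub(/[\s|\.|,]+/, '')
  # String#squeeze collapses each run of a repeated character to one
  slug = slug.squeeze unless numeric?(slug)
  slug[0..(max_length - 1)]
end

finish_token("hello-world")    # => "helo-world"
finish_token("1100")           # => "1100"  (pure numbers are not squeezed)
finish_token("mississippi", 5) # => "misis" (squeezed, then truncated)
```

Skipping the squeeze for pure numbers keeps values such as "1100" and "10" distinct.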

.words(str) ⇒ Object

Returns an array of the words in the string, splitting on runs of whitespace, periods, commas, and pipe characters.



# File 'lib/acts_as_tokenizable/string_utils.rb', line 14

def self.words(str)
  str.split(/[\s|\.|,]+/)
end
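A short sketch of how the pattern splits typical input, using a standalone copy of the method (the sample strings are made up):

```ruby
# Standalone copy of the method, for experimentation only.
def words(str)
  str.split(/[\s|\.|,]+/)
end

# A run of mixed separators counts as a single split point.
words("foo, bar.baz  qux") # => ["foo", "bar", "baz", "qux"]

# Inside a character class "|" is a literal, so pipes also split.
words("a|b")               # => ["a", "b"]

# A leading separator produces an empty first element.
words(" foo")              # => ["", "foo"]
```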

.words_to_token(str, max_length = 255, separator = ' ') ⇒ Object

Tokenizes each word individually, drops duplicate tokens, joins the remaining tokens with the separator, and truncates the result to max_length characters.



# File 'lib/acts_as_tokenizable/string_utils.rb', line 60

def self.words_to_token(str, max_length = 255, separator = ' ')
  words(str)
    .collect { |w| to_token(w) }
    .uniq
    .join(separator)
    .slice(0, max_length)
end