Module: ActsAsTokenizable::StringUtils
- Defined in:
- lib/acts_as_tokenizable/string_utils.rb
Class Method Summary
-
.alphanumerics(str) ⇒ Object
returns the parts of the string in their original order, with the numeric parts converted to numbers and the non-numeric parts kept as text. This is useful for sorting alphanumerically.
-
.numeric?(str) ⇒ Boolean
returns true if numeric, false otherwise.
-
.remove_words(str, words_array, separator = ' ') ⇒ Object
removes certain words from a string.
-
.replace_words(str, replacements, separator = ' ') ⇒ Object
replaces certain words in a string.
-
.to_token(str, max_length = 255) ⇒ Object
converts a string into something that can be used as an indexing key.
-
.words(str) ⇒ Object
returns an array of strings containing the words in this string.
-
.words_to_token(str, max_length = 255, separator = ' ') ⇒ Object
tokenizes each word individually and joins the words with the separator.
Class Method Details
.alphanumerics(str) ⇒ Object
returns an array that contains the parts of the string in their original order:
* the numeric parts, converted to numbers
* the non-numeric parts, as text
This is useful for sorting alphanumerically. For example:
["A1", "A12", "A2"].sort_by { |x| ActsAsTokenizable::StringUtils.alphanumerics(x) } => ["A1", "A2", "A12"]
inspired by: blog.labnotes.org/2007/12/13/rounded-corners-173-beautiful-code/
# File 'lib/acts_as_tokenizable/string_utils.rb', line 45
def self.alphanumerics(str)
  str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
end
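The effect on sorting is easiest to see next to a plain string sort. The sketch below copies the method body from the listing into a local module (StringUtilsSketch is a hypothetical name, used only so the example runs without the gem installed).

```ruby
# Local copy of the method shown above; StringUtilsSketch is a
# hypothetical stand-in for ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.alphanumerics(str)
    # The capture group keeps the digit runs in the split result.
    str.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v }
  end
end

parts = StringUtilsSketch.alphanumerics('A12')
puts parts.inspect # ["A", 12]

names = %w[A1 A12 A2]
plain = names.sort
natural = names.sort_by { |x| StringUtilsSketch.alphanumerics(x) }

puts plain.inspect   # ["A1", "A12", "A2"]
puts natural.inspect # ["A1", "A2", "A12"]
```

Because the split alternates between text and digit runs, arrays from different strings have the same type at each position, so Ruby's array comparison can order them.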
.numeric?(str) ⇒ Boolean
returns true if the string can be parsed as a number by Float, false otherwise
# File 'lib/acts_as_tokenizable/string_utils.rb', line 6
def self.numeric?(str)
  true if Float(str)
rescue
  false
end
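A few sample inputs show what counts as numeric here. This is a sketch using a local copy of the method (StringUtilsSketch is a hypothetical name); it relies only on Ruby's Kernel#Float.

```ruby
# Local copy of the method shown above; StringUtilsSketch is a
# hypothetical stand-in for ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.numeric?(str)
    true if Float(str)
  rescue
    false
  end
end

puts StringUtilsSketch.numeric?('42')    # true
puts StringUtilsSketch.numeric?('3.14')  # true
puts StringUtilsSketch.numeric?('1e3')   # true
puts StringUtilsSketch.numeric?('0')     # true (0.0 is truthy in Ruby)
puts StringUtilsSketch.numeric?('abc')   # false
puts StringUtilsSketch.numeric?('12abc') # false
puts StringUtilsSketch.numeric?('')      # false
puts StringUtilsSketch.numeric?(nil).inspect # false
```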
.remove_words(str, words_array, separator = ' ') ⇒ Object
removes certain words from a string. As a side effect, all word separators are converted to the separator string
# File 'lib/acts_as_tokenizable/string_utils.rb', line 20
def self.remove_words(str, words_array, separator = ' ')
  (words(str) - words_array).join separator
end
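The separator side effect is visible in a short run. The sketch below uses local copies of words and remove_words from this page (StringUtilsSketch is a hypothetical module name).

```ruby
# Local copies of the methods shown on this page; StringUtilsSketch is a
# hypothetical stand-in for ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.words(str)
    str.split(/[\s|\.|,]+/)
  end

  def self.remove_words(str, words_array, separator = ' ')
    (words(str) - words_array).join separator
  end
end

input = 'the quick, brown fox'
puts StringUtilsSketch.remove_words(input, %w[the brown])      # quick fox
puts StringUtilsSketch.remove_words(input, %w[the brown], '-') # quick-fox

# Matching is exact, so a capitalized word is not removed.
puts StringUtilsSketch.remove_words('The fox', %w[the])        # The fox
```

Note that the comma after "quick" disappears even though it was not in the list of words to remove.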
.replace_words(str, replacements, separator = ' ') ⇒ Object
replaces certain words in a string. As a side effect, all word separators are converted to the separator string. Each entry of replacements pairs a list of candidate words with the word that replaces them
# File 'lib/acts_as_tokenizable/string_utils.rb', line 26
def self.replace_words(str, replacements, separator = ' ')
  replaced_words = words(str)
  replacements.each do |candidates, replacement|
    candidates.each do |candidate|
      replaced_words = replaced_words.collect do |w|
        w == candidate ? replacement : w
      end
    end
  end
  replaced_words.join separator
end
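The shape of the replacements argument follows from the nested each calls: every entry yields a list of candidates and one replacement. A Hash with array keys is one structure that fits, though that choice is an assumption of this sketch, as is the module name StringUtilsSketch.

```ruby
# Local copies of the methods shown on this page; StringUtilsSketch is a
# hypothetical stand-in for ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.words(str)
    str.split(/[\s|\.|,]+/)
  end

  def self.replace_words(str, replacements, separator = ' ')
    replaced_words = words(str)
    replacements.each do |candidates, replacement|
      candidates.each do |candidate|
        replaced_words = replaced_words.collect do |w|
          w == candidate ? replacement : w
        end
      end
    end
    replaced_words.join separator
  end
end

# Assumed input shape: array of candidate words => replacement word.
abbreviations = {
  %w[st str] => 'street',
  %w[ave]    => 'avenue'
}

result = StringUtilsSketch.replace_words('12 Main st., near 5th ave', abbreviations)
puts result # 12 Main street near 5th avenue
```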
.to_token(str, max_length = 255) ⇒ Object
converts a string into something that can be used as an indexing key. The result is truncated to max_length characters
# File 'lib/acts_as_tokenizable/string_utils.rb', line 50
def self.to_token(str, max_length = 255)
  # to_slug and normalize are provided by the 'babosa' gem
  # remove all non-alphanumeric but hyphen (-)
  str = str.to_slug.normalize.strip.downcase.gsub(/[\s|\.|,]+/, '')
  # remove duplicates, except on pure numbers
  str = str.squeeze unless numeric?(str)
  str[0..(max_length - 1)]
end
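The last two steps of to_token (squeezing repeated characters, then truncating) can be tried in isolation. The sketch below is not the library method: squeeze_and_truncate is a hypothetical helper that skips the babosa normalization entirely and assumes its input is already lowercase and free of separators.

```ruby
# Hypothetical helper covering only the squeeze and truncate steps of
# to_token. The babosa-based to_slug.normalize step is left out.
def squeeze_and_truncate(str, max_length = 255)
  numeric = begin
    Float(str)
    true
  rescue ArgumentError, TypeError
    false
  end
  # Collapse runs of the same character, except in pure numbers.
  str = str.squeeze unless numeric
  str[0..(max_length - 1)]
end

puts squeeze_and_truncate('hello')          # helo
puts squeeze_and_truncate('1100')           # 1100 (pure number, left alone)
puts squeeze_and_truncate('a1100')          # a10  (not a pure number)
puts squeeze_and_truncate('mississippi', 5) # misis
```

Leaving pure numbers unsqueezed keeps values such as 1100 and 10 from collapsing into the same key.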
.words(str) ⇒ Object
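returns an array of strings containing the words in this string. The string is split on runs of whitespace, periods and commas; because the | character appears inside the character class, pipes also act as separators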
# File 'lib/acts_as_tokenizable/string_utils.rb', line 14
def self.words(str)
  str.split(/[\s|\.|,]+/)
end
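A short run shows which characters the pattern treats as separators. The sketch uses a local copy of the method (StringUtilsSketch is a hypothetical module name).

```ruby
# Local copy of the method shown above; StringUtilsSketch is a
# hypothetical stand-in for ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.words(str)
    str.split(/[\s|\.|,]+/)
  end
end

mixed = StringUtilsSketch.words('one, two.three|four  five')
puts mixed.inspect # ["one", "two", "three", "four", "five"]

# A leading separator produces an empty first element.
leading = StringUtilsSketch.words(' a b')
puts leading.inspect # ["", "a", "b"]

# Hyphens are not separators.
hyphen = StringUtilsSketch.words('well-known fact')
puts hyphen.inspect # ["well-known", "fact"]
```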
.words_to_token(str, max_length = 255, separator = ' ') ⇒ Object
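tokenizes each word individually, drops duplicate tokens and joins the remaining tokens with the separator. max_length limits the joined result, not the individual tokens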
# File 'lib/acts_as_tokenizable/string_utils.rb', line 60
def self.words_to_token(str, max_length = 255, separator = ' ')
  words(str)
    .collect { |w| to_token(w) }
    .uniq
    .join(separator)
    .slice(0, max_length)
end
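The pipeline (split, tokenize, deduplicate, join, truncate) can be followed with a simplified tokenizer. In the sketch below, to_token is a stand-in that only lowercases and squeezes; it is an assumption made so the example runs without the babosa gem, and the real method normalizes more aggressively. StringUtilsSketch is a hypothetical module name.

```ruby
# StringUtilsSketch is a hypothetical stand-in for
# ActsAsTokenizable::StringUtils.
module StringUtilsSketch
  def self.words(str)
    str.split(/[\s|\.|,]+/)
  end

  # Simplified stand-in for to_token (assumption): lowercase and squeeze
  # only, with no babosa normalization and no numeric exception.
  def self.to_token(str, max_length = 255)
    str.downcase.squeeze[0..(max_length - 1)]
  end

  # Copied from the listing above.
  def self.words_to_token(str, max_length = 255, separator = ' ')
    words(str)
      .collect { |w| to_token(w) }
      .uniq
      .join(separator)
      .slice(0, max_length)
  end
end

full = StringUtilsSketch.words_to_token('Hello, hello World')
puts full # helo world

short = StringUtilsSketch.words_to_token('Hello, hello World', 6)
puts short # helo w

dashed = StringUtilsSketch.words_to_token('Hello, hello World', 255, '-')
puts dashed # helo-world
```

"Hello" and "hello" produce the same token, so uniq keeps only one of them before the join.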