Class: Babosa::Identifier

Inherits:
Object
Defined in:
lib/babosa/identifier.rb

Overview

This class provides some string-manipulation methods specific to slugs.

Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean for Identifier#clean!), which does not appear in the documentation because it is generated dynamically.

All of the bang methods return an instance of String, while the bangless versions return an instance of Babosa::Identifier, so that calls to methods specific to this class can be chained:

string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators  # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
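
This page does not list the code that generates the bangless methods. The following is a minimal sketch of the pattern described above. It uses a hypothetical `MiniIdentifier` class (not part of Babosa) whose bangless twins copy the wrapped string, apply the bang method to the copy, and return the copy:

```ruby
# Hypothetical class (not Babosa's): one wrapped String, two bang methods,
# and bangless twins generated with define_method.
class MiniIdentifier
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang methods replace the wrapped string and return the new String.
  def clean!
    @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
  end

  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/u, char)
  end

  # For each bang method, define a bangless twin that runs it on a copy
  # and returns the copy (a MiniIdentifier), so calls can be chained.
  %i[clean! with_separators!].each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args|
      copy = self.class.new(@wrapped_string)
      copy.public_send(bang, *args)
      copy
    end
  end
end
```

Because the bangless form returns a wrapper, `MiniIdentifier.new(" Hello World ").clean.with_separators` chains. The bang form returns a plain String, so it ends the chain.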

Constant Summary

Error =
Class.new(StandardError)
@@utf8_proxy =
if Babosa.jruby15?
  UTF8::JavaProxy
elsif defined? Unicode::VERSION
  UTF8::UnicodeProxy
elsif defined? ActiveSupport
  UTF8::ActiveSupportProxy
else
  UTF8::DumbProxy
end

Constructor Details

#initialize(string) ⇒ Identifier

Returns a new instance of Identifier.

Parameters:

  • string (#to_s)

    The string to use as the basis of the Identifier.


# File 'lib/babosa/identifier.rb', line 65

def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method, which forwards any method the class does not define to the wrapped String.

#method_missing(symbol, *args, &block) ⇒ Object


# File 'lib/babosa/identifier.rb', line 60

def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end

Instance Attribute Details

#wrapped_string ⇒ Object (readonly) Also known as: to_s

Returns the value of attribute wrapped_string.


# File 'lib/babosa/identifier.rb', line 35

def wrapped_string
  @wrapped_string
end

Class Method Details

.utf8_proxy ⇒ Object

Return the proxy used for UTF-8 support.

# File 'lib/babosa/identifier.rb', line 50

def self.utf8_proxy
  @@utf8_proxy
end

.utf8_proxy=(obj) ⇒ Object

Set a proxy object used for UTF-8 support.

# File 'lib/babosa/identifier.rb', line 56

def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
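
Judging by the calls shown on this page, a replacement proxy needs four methods: downcase, upcase, normalize_utf8 and tidy_bytes. Each takes a String and returns a String. The sketch below is one such proxy, built only on Ruby's standard library (Ruby 2.4+). The name `StdlibProxy` is hypothetical, and its tidy_bytes is a simplified stand-in, not a port of any of Babosa's proxies:

```ruby
# Hypothetical UTF-8 proxy using only the Ruby standard library (Ruby >= 2.4).
# It answers the four calls this page shows Identifier making on @@utf8_proxy.
module StdlibProxy
  module_function

  # String#downcase / #upcase apply Unicode case mapping since Ruby 2.4.
  def downcase(str)
    str.downcase
  end

  def upcase(str)
    str.upcase
  end

  # Canonical composition (NFC).
  def normalize_utf8(str)
    str.unicode_normalize(:nfc)
  end

  # Simplified stand-in: if the bytes are not valid UTF-8, reinterpret the
  # whole string as Windows-1252 (which agrees with ISO-8859-1 for bytes
  # 0xA0 to 0xFF). Strings mixing valid UTF-8 with stray CP1252 bytes are
  # not repaired byte by byte.
  def tidy_bytes(str)
    return str if str.valid_encoding?

    latin = str.dup.force_encoding(Encoding::Windows_1252)
    latin.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
  end
end
```

It could then be installed with `Babosa::Identifier.utf8_proxy = StdlibProxy`. Check the simplified tidy_bytes against your input before relying on it, because it is the main way this proxy may differ from the bundled ones.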

Instance Method Details

#==(value) ⇒ Object


# File 'lib/babosa/identifier.rb', line 71

def ==(value)
  @wrapped_string.to_s == value.to_s
end

#clean! ⇒ Object

Converts dashes to spaces, collapses runs of consecutive spaces into a single space, and strips leading and trailing whitespace.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 131

def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
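
The sketch below restates the body of clean! as a free-standing method on plain Strings, to make its behavior concrete. The name `clean` is chosen for this sketch only. Note that `squeeze(" ")` collapses only runs of the space character, so tabs and newlines inside the string survive. `strip` still removes whitespace of any kind at both ends:

```ruby
# Free-standing restatement of clean! (name chosen for this sketch).
def clean(str)
  s = str.gsub("-", " ") # dashes become spaces
  s = s.squeeze(" ")     # runs of " " (space only) collapse to one
  s.strip                # leading/trailing whitespace of any kind is removed
end

tidy   = clean("  foo--bar  ") # => "foo bar"
tabbed = clean("a\t\tb")       # => "a\t\tb" (tabs are not squeezed)
```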

#default_normalize_options ⇒ Object

The default options for #normalize!. Override to set your own defaults.


# File 'lib/babosa/identifier.rb', line 273

def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
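
Because #normalize! merges the caller's options over default_normalize_options, a subclass can change the defaults by overriding this method. The sketch below models only that option handling. `BaseSlugger`, `UnderscoreSlugger` and `effective_options` are hypothetical names, not part of Babosa:

```ruby
# Hypothetical stand-in for Identifier; only the option handling is modeled.
class BaseSlugger
  def default_normalize_options
    { transliterate: true, max_length: 255, separator: "-" }
  end

  # Mirrors the first line of #normalize!: caller options win over defaults.
  def effective_options(options = nil)
    default_normalize_options.merge(options || {})
  end
end

# Overriding the defaults in a subclass, as the docstring suggests.
class UnderscoreSlugger < BaseSlugger
  def default_normalize_options
    super.merge(separator: "_", max_length: 64)
  end
end
```

With this override, `UnderscoreSlugger.new.effective_options` yields the `"_"` separator and a 64-byte limit. An explicit `separator:` passed by the caller still takes precedence.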

#downcase! ⇒ Object

Perform UTF-8 sensitive downcasing.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 241

def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
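
UTF-8-sensitive case mapping differs from ASCII-only mapping for letters outside A to Z. The standard library alone shows the difference on Ruby 2.4+. This illustrates the concept only; it does not show which proxy Babosa selects:

```ruby
# Ruby >= 2.4: full Unicode case mapping by default; :ascii limits it to A-Z.
word          = "ÀÉÎ Straße"
unicode_lower = word.downcase         # => "àéî straße"
ascii_lower   = word.downcase(:ascii) # => "ÀÉÎ straße"
upper         = "straße".upcase       # => "STRASSE" (ß expands to SS)
```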

#empty? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/babosa/identifier.rb', line 79

def empty?
  # Defined explicitly (not left to method_missing) so that this class
  # responds to :empty?, for compatibility with Active Support's #blank?.
  @wrapped_string.empty?
end
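
The comment above concerns respond_to?. A method reached only through method_missing works when called, but respond_to? reports false for it unless respond_to_missing? is also defined. The sketch below uses hypothetical classes `ForwardOnly` and `ForwardWithEmpty`. It also includes a helper, `blank_like?`, modeled on the respond_to?(:empty?) check in Active Support's Object#blank?:

```ruby
# Forwarding through method_missing alone: calls work, respond_to? does not know.
class ForwardOnly
  def initialize(str)
    @str = str
  end

  def method_missing(symbol, *args, &block)
    @str.__send__(symbol, *args, &block)
  end
end

# Defining empty? explicitly, as Identifier does, makes respond_to?(:empty?) true.
class ForwardWithEmpty < ForwardOnly
  def empty?
    @str.empty?
  end
end

# Simplified model of the check in Active Support's Object#blank?.
def blank_like?(obj)
  obj.respond_to?(:empty?) ? !!obj.empty? : !obj
end
```

With only method_missing, an empty wrapper would not be considered blank. The explicit empty? fixes that.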

#eql?(value) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/babosa/identifier.rb', line 75

def eql?(value)
  @wrapped_string == value
end

#normalize!(options = nil) ⇒ Object

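Normalize the string for use as a URL slug. In this context, normalizing means the following steps:

  • transliterating (unless :transliterate is false);
  • optionally deleting non-ASCII characters (:to_ascii);
  • removing non-word characters;
  • collapsing and stripping spaces;
  • downcasing;
  • truncating to :max_length bytes (255 by default);
  • converting whitespace to :separator (“-” by default).

Parameters:

  • options (Hash, nil)

    Merged over #default_normalize_options. Keys read by this method: :transliterate (true, false, or one or more transliterator kinds), :transliterations (kinds used when :transliterate is true), :to_ascii, :max_length and :separator.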

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 147

def normalize!(options = nil)
  options = default_normalize_options.merge(options || {})

  if translit_option = options[:transliterate]
    if translit_option != true
      transliterate!(*translit_option)
    else
      transliterate!(*options[:transliterations])
    end
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
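
The order of operations in #normalize! can be restated on plain Strings. This is a sketch under stated assumptions. Transliteration and :to_ascii are omitted. word_chars! is approximated with a regex that keeps letters, numbers and whitespace, because Babosa::STRIPPABLE is not listed on this page. All method names are hypothetical:

```ruby
# Hypothetical restatement of the #normalize! step order for plain Strings.
def clean(str)
  str.gsub("-", " ").squeeze(" ").strip
end

# Keep whole characters while their total byte size fits in max.
def truncate_bytes(str, max)
  out = +""
  str.each_char do |ch|
    break if out.bytesize + ch.bytesize > max
    out << ch
  end
  out
end

def sketch_normalize(str, max_length: 255, separator: "-")
  s = clean(str)
  s = s.gsub(/[^\p{L}\p{N}\s]/, "") # approximation of word_chars!
  s = clean(s)
  s = s.downcase
  s = truncate_bytes(s, max_length)
  s.gsub(/\s/, separator)           # with_separators! runs last
end
```

For example, `sketch_normalize("  Hello,  World! -- 2024 ")` gives "hello-world-2024". The listed order has one consequence: truncate_bytes! runs after the final clean!. So when the byte limit lands just after a space, the slug ends in a separator, and `sketch_normalize("Hello World", max_length: 6)` gives "hello-".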

#normalize_utf8! ⇒ Object

Perform Unicode composition on the wrapped string.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 247

def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
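
Composition matters for slugs because visually identical strings can differ in code points and in byte length. That affects equality and the byte-based #truncate_bytes!. The illustration below uses Ruby's own String#unicode_normalize (Ruby 2.2+), independent of whichever proxy Babosa selects:

```ruby
# The same visible "é" in decomposed and composed form.
decomposed = "e\u0301"                          # "e" + U+0301 COMBINING ACUTE ACCENT
composed   = decomposed.unicode_normalize(:nfc) # U+00E9 as a single code point

equal_before = (decomposed == composed)                 # => false
sizes        = [decomposed.bytesize, composed.bytesize] # => [3, 2]
```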

#tidy_bytes! ⇒ Object

Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 254

def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end

#to_ascii! ⇒ Object

Delete any non-ASCII characters (code points above U+007F).

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 193

def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
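
Because to_ascii! deletes characters rather than approximating them, running it before transliteration loses letters. The sketch below restates its regex on plain Strings. The name `strip_non_ascii` is chosen for this sketch only:

```ruby
# Same regex as to_ascii! above, applied to plain Strings.
def strip_non_ascii(str)
  str.gsub(/[^\x00-\x7f]/u, "")
end

lossy = strip_non_ascii("¡Feliz año!")  # => "Feliz ao!"   (the ñ is gone)
clean = strip_non_ascii("¡Feliz anio!") # => "Feliz anio!" (transliterated first)
```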

#to_identifier ⇒ Object Also known as: to_slug


# File 'lib/babosa/identifier.rb', line 268

def to_identifier
  self
end

#to_ruby_method!(allow_bangs = true) ⇒ Object

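Normalize a string so that it can safely be used as a Ruby method name. Everything but the last character is transliterated, reduced to ASCII word characters, and cleaned. The last character is downcased and filtered, keeping ASCII letters, digits and, when allow_bangs is true, “!”, “=” and “?”. Whitespace is then replaced with underscores. Raises Error if the result would be empty.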


# File 'lib/babosa/identifier.rb', line 167

def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  leader          = leader.to_s
  trailer         = trailer.to_s
  if allow_bangs
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9!=\\?]/, '')
  else
    trailer.downcase!
    trailer.gsub!(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  if @wrapped_string == ""
    raise Error, "Input generates impossible Ruby method name"
  end
  with_separators!("_")
end
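
The first half of to_ruby_method! splits off the final character and filters it separately from the rest. The hypothetical helper `split_method_name` below restates just that split. The remaining steps reuse the Identifier bang methods. One deliberate difference from the listing is noted in a comment:

```ruby
# Hypothetical helper mirroring the leader/trailer split in to_ruby_method!.
def split_method_name(str, allow_bangs = true)
  leader, trailer = str.strip.scan(/\A(.+)(.)\z/).flatten
  leader  = leader.to_s
  trailer = trailer.to_s.downcase
  # The listed source uses /[^a-z0-9!=\\?]/, whose class also admits a
  # backslash; this sketch keeps only a-z, 0-9, "!", "=" and "?".
  if allow_bangs
    trailer = trailer.gsub(/[^a-z0-9!=?]/, "")
  else
    trailer = trailer.gsub(/[^a-z0-9]/, "")
  end
  [leader, trailer]
end
```

For example, `split_method_name("Is Valid?")` returns `["Is Valid", "?"]`. The regex needs at least two characters, so for a one-character input such as "x" both parts come back empty. The listed to_ruby_method! would then take its `raise Error` branch.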

#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!

Approximate an ASCII string. This works only for strings in the Latin (Roman) alphabet, including letters with diacritics. Non-letter characters are left unmodified.

string = Identifier.new "Łódź, Poland"
string.transliterate                 # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate                 # => "日本"

You can pass one or more transliterator kinds as symbols (each is looked up with Transliterator.get), which allows language-specific approximations. Various languages are supported; to see which ones, look at the source of Transliterator::Base.

string = Identifier.new "Jürgen Müller"
string.transliterate                 # => "Jurgen Muller"
string.transliterate :german         # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate                 # => "¡Feliz ano!"
string.transliterate :spanish        # => "¡Feliz anio!"

The approximations are a hash, which you can modify if you choose:

# Make Spanish use "nh" rather than "ni"
Babosa::Transliterator::Spanish::APPROXIMATIONS["ñ"] = "nh"

Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:

string.transliterate!(:spanish)       # => "¡Feliz anio!"
string.to_ascii!                      # => "Feliz anio!"

Parameters:

  • *kinds (Symbol)

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 118

def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  @wrapped_string
end
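
The dispatch logic above can be sketched with plain hashes: nil kinds are dropped, the default is :latin, and each kind is applied in order. `LATIN`, `GERMAN`, `SPANISH`, `TABLES` and `transliterate` are hypothetical names, and the tables are deliberately tiny. Babosa's real approximations live in its Transliterator classes:

```ruby
# Hypothetical miniature of the transliterate! dispatch with toy tables.
LATIN   = { "ü" => "u", "Ü" => "U", "ñ" => "n", "ó" => "o", "ź" => "z", "Ł" => "L" }.freeze
GERMAN  = LATIN.merge({ "ü" => "ue", "Ü" => "Ue" }).freeze
SPANISH = LATIN.merge({ "ñ" => "ni" }).freeze
TABLES  = { latin: LATIN, german: GERMAN, spanish: SPANISH }.freeze

def transliterate(str, *kinds)
  kinds.compact!                   # nil entries are ignored...
  kinds = [:latin] if kinds.empty? # ...and no kinds means :latin
  kinds.reduce(str) do |acc, kind| # each kind works on the previous output
    table = TABLES.fetch(kind)
    acc.each_char.map { |ch| table.fetch(ch, ch) }.join
  end
end
```

As in the listed code, each kind is applied to the output of the previous one, so argument order can change the result. With these toy tables, `transliterate("Jürgen", :latin, :german)` gives "Jurgen", while `:german, :latin` gives "Juergen".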

#truncate!(max) ⇒ Object

Truncate the string to max characters (counted as Unicode code points).

Examples:

"üéøá".to_identifier.truncate(3) #=> "üéø"

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 201

def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
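
truncate! counts Unicode code points, not user-perceived characters, so a decomposed letter can be split. The constructor's normalize_utf8! reduces this risk, but not every combining sequence has a precomposed form. The helper names below are hypothetical. The second helper shows a grapheme-aware alternative (Ruby 2.5+):

```ruby
# Same expression as truncate! above: counts code points.
def truncate_codepoints(str, max)
  str.unpack("U*")[0...max].pack("U*")
end

# Alternative that never splits a user-perceived character (Ruby >= 2.5).
def truncate_graphemes(str, max)
  str.grapheme_clusters.first(max).join
end
```

For "e\u0301x" truncated to 1, the code-point version keeps a bare "e" and drops the accent. The grapheme version keeps "e\u0301".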

#truncate_bytes!(max) ⇒ Object

Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain maximum byte length. The result may be shorter than max bytes, because the string is cut only at a character boundary, never inside a multibyte character.

Examples:

"üéøá".to_identifier.truncate_bytes(3) #=> "ü"

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 212

def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
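
The loop above walks code points and stops before the first one that would overflow the limit. String#byteslice (Ruby 1.9.3+) and String#scrub (Ruby 2.1+) give the same result more directly for valid UTF-8 input. Both helper names below are hypothetical:

```ruby
# The character-walking approach of truncate_bytes!, restated.
def truncate_bytes_walk(str, max)
  return str if str.bytesize <= max
  out = +""
  str.each_char do |ch|
    break if out.bytesize + ch.bytesize > max
    out << ch
  end
  out
end

# Alternative: cut at the byte limit, then drop a partial trailing character.
# Assumes valid UTF-8 input (scrub would also drop any other invalid bytes).
def truncate_bytes_slice(str, max)
  str.byteslice(0, max).scrub("")
end
```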

#upcase! ⇒ Object

Perform UTF-8 sensitive upcasing.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 235

def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end

#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!

Replaces each whitespace character with char, which defaults to a dash (“-”).

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 229

def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
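
The substitution is one-for-one, so runs of whitespace become runs of separators unless clean! has collapsed them first, as #normalize! does. The sketch below restates it on plain Strings. The free-standing `with_separators` is a name chosen for this sketch only:

```ruby
# Same substitution as with_separators! above: every whitespace character
# (space, tab, newline, ...) is replaced individually.
def with_separators(str, char = "-")
  str.gsub(/\s/u, char)
end

doubled = with_separators("a  b")         # => "a--b"
mixed   = with_separators("a\tb\nc", "_") # => "a_b_c"
```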

#word_chars! ⇒ Object

Remove any non-word characters (the code points listed in Babosa::STRIPPABLE). For this library's purposes, this means anything other than letters, numbers, spaces, and line breaks.

Returns:

  • String


# File 'lib/babosa/identifier.rb', line 138

def word_chars!
  @wrapped_string = (unpack("U*") - Babosa::STRIPPABLE).pack("U*")
end
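
word_chars! works on code points. It unpacks the string, subtracts every code point listed in Babosa::STRIPPABLE, and packs the rest. This page does not show the constant's contents, so the sketch below uses a small hypothetical list, `STRIPPABLE_SKETCH`:

```ruby
# Hypothetical stand-in for Babosa::STRIPPABLE: code points to remove.
# The real list is defined in Babosa's source and is much larger.
STRIPPABLE_SKETCH = "!\"#,.:;?¡¿".unpack("U*").freeze

def word_chars(str)
  # Array#- removes every occurrence of each listed code point.
  (str.unpack("U*") - STRIPPABLE_SKETCH).pack("U*")
end

result = word_chars("¡Hola, mundo!") # => "Hola mundo"
```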