Class: Babosa::Identifier
- Inherits: Object
- Defined in: lib/babosa/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean for Identifier#clean!), which does not appear in the documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Babosa::Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
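The pairing of bang and bangless methods can be sketched as follows. This is only an illustration of the general technique, assuming the bangless variants are generated with `define_method`; `MiniIdentifier` is a hypothetical stand-in class, not Babosa's actual implementation.

```ruby
# Hypothetical stand-in for Babosa::Identifier; not the library's actual code.
class MiniIdentifier
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang methods replace the wrapped string and return the new String.
  def with_separators!(char = "-")
    @wrapped_string = @wrapped_string.gsub(/\s/, char)
  end

  def downcase!
    @wrapped_string = @wrapped_string.downcase
  end

  # Generate a bangless twin for each bang method:
  # build a copy, run the bang method on the copy, return the copy.
  %i[with_separators downcase].each do |name|
    define_method(name) do |*args|
      copy = self.class.new(@wrapped_string)
      copy.public_send("#{name}!", *args)
      copy
    end
  end
end

id = MiniIdentifier.new("Hello World")
id.with_separators.downcase.to_s # => "hello-world"
id.to_s                          # => "Hello World" (the original is untouched)
```

Because the bangless variants return a wrapper object rather than a String, further wrapper-specific calls can be chained onto the result.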
Constant Summary
- @@utf8_proxy =
if Babosa.jruby15?
  UTF8::JavaProxy
elsif defined? Unicode
  UTF8::UnicodeProxy
elsif defined? ActiveSupport
  UTF8::ActiveSupportProxy
else
  UTF8::DumbProxy
end
Instance Attribute Summary
-
#wrapped_string ⇒ Object
(also: #to_s)
readonly
Returns the value of attribute wrapped_string.
Class Method Summary
-
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
-
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
Instance Method Summary
- #==(value) ⇒ Object
-
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
-
#default_normalize_options ⇒ Object
The default options for #normalize!.
-
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
- #empty? ⇒ Boolean
- #eql?(value) ⇒ Boolean
-
#initialize(string) ⇒ Identifier
constructor
A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
-
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug.
-
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
-
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
-
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
-
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
-
#transliterate!(kind = nil) ⇒ Object
(also: #approximate_ascii!)
Approximate an ASCII string.
-
#truncate!(max) ⇒ Object
Truncate the string to max characters.
-
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes.
-
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
-
#with_separators!(char = "-") ⇒ Object
(also: #with_dashes!)
Replaces each whitespace character with the separator char (“-” by default).
-
#word_chars! ⇒ Object
Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/babosa/identifier.rb', line 63
def initialize(string)
  @wrapped_string = string.to_s
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method, which forwards any unknown call to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/babosa/identifier.rb', line 58
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
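The delegation pattern in the listing above can be tried in isolation with a small sketch. `StringWrapper` is a hypothetical class; the `respond_to?` guard and the `respond_to_missing?` hook are additions that are commonly paired with `method_missing` and are not part of the listing.

```ruby
# Hypothetical wrapper illustrating delegation; not Babosa's actual class.
class StringWrapper
  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Forward any message the wrapper does not define to the wrapped String.
  def method_missing(symbol, *args, &block)
    if @wrapped_string.respond_to?(symbol)
      @wrapped_string.__send__(symbol, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(symbol, include_private = false)
    @wrapped_string.respond_to?(symbol, include_private) || super
  end
end

wrapper = StringWrapper.new("hello world")
wrapper.length               # => 11
wrapper.start_with?("hello") # => true
wrapper.respond_to?(:upcase) # => true
```

This is why calls such as `unpack("U*")` in the listings further down work without an explicit receiver: they reach the wrapped string through `method_missing`.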
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/babosa/identifier.rb', line 33
def wrapped_string
  @wrapped_string
end
Class Method Details
.utf8_proxy ⇒ Object
Return the proxy used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 48
def self.utf8_proxy
  @@utf8_proxy
end
.utf8_proxy=(obj) ⇒ Object
Set a proxy object used for UTF-8 support.
# File 'lib/babosa/identifier.rb', line 54
def self.utf8_proxy=(obj)
  @@utf8_proxy = obj
end
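Judging from the listings on this page, the proxy is called with four methods: downcase, upcase, normalize_utf8 and tidy_bytes. A minimal sketch of an object with that shape, built only on Ruby's standard String methods, might look like the following. `PlainRubyProxy` is a hypothetical name, the `tidy_bytes` here is a deliberate simplification that assumes UTF-8-tagged input, and whether the real proxies must respond to further methods is not confirmed by this page.

```ruby
# Hypothetical proxy built only on Ruby's standard String methods.
module PlainRubyProxy
  module_function

  def downcase(string)
    string.downcase
  end

  def upcase(string)
    string.upcase
  end

  # Unicode composition (NFC), e.g. "e" + combining acute accent -> "é".
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end

  # Simplification: if the bytes are not valid UTF-8, reinterpret the
  # whole string as Windows-1252 and transcode it to UTF-8.
  def tidy_bytes(string)
    return string if string.valid_encoding?

    string.dup.force_encoding("Windows-1252")
          .encode("UTF-8", invalid: :replace, undef: :replace)
  end
end

PlainRubyProxy.normalize_utf8("e\u0301") # => "é" (single code point U+00E9)
PlainRubyProxy.tidy_bytes("caf\xE9")     # => "café"
```

An object of this shape could then be installed through the documented setter, `Babosa::Identifier.utf8_proxy = PlainRubyProxy`.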
Instance Method Details
#==(value) ⇒ Object
# File 'lib/babosa/identifier.rb', line 69
def ==(value)
  @wrapped_string.to_s == value.to_s
end
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
# File 'lib/babosa/identifier.rb', line 123
def clean!
  @wrapped_string = @wrapped_string.gsub("-", " ").squeeze(" ").strip
end
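The same chain of standard String calls can be run on a plain string to see what each step contributes. The helper name `clean` below is hypothetical.

```ruby
# Same chain as in the listing, applied to a plain String.
def clean(string)
  string.gsub("-", " ") # dashes become spaces
        .squeeze(" ")   # runs of spaces collapse to one space
        .strip          # leading/trailing whitespace is removed
end

clean("  hello--world   foo ") # => "hello world foo"
clean("a \t b")                # => "a \t b"
```

Note that `squeeze(" ")` collapses only runs of the space character, so in this sketch a tab between two spaces is left as it is.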
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/babosa/identifier.rb', line 259
def default_normalize_options
  {:transliterate => true, :max_length => 255, :separator => "-"}
end
#downcase! ⇒ Object
Perform UTF-8 sensitive downcasing.
# File 'lib/babosa/identifier.rb', line 227
def downcase!
  @wrapped_string = @@utf8_proxy.downcase(@wrapped_string)
end
#empty? ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 77
def empty?
  # included to make this class :respond_to? :empty for compatibility with Active Support's
  # #blank?
  @wrapped_string.empty?
end
#eql?(value) ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 73
def eql?(value)
  @wrapped_string == value
end
#normalize!(options = nil) ⇒ Object
Normalize the string for use as a URL slug. Note that in this context, “normalize” means: strip the string, remove non-letters/numbers, downcase, truncate to 255 bytes, and convert whitespace to dashes.
# File 'lib/babosa/identifier.rb', line 139
def normalize!(options = nil)
  # Handle deprecated usage
  if options == true
    warn "#normalize! now takes a hash of options rather than a boolean"
    options = default_normalize_options.merge(:to_ascii => true)
  else
    options = default_normalize_options.merge(options || {})
  end
  if options[:transliterate]
    transliterate!(*options[:transliterations])
  end
  to_ascii! if options[:to_ascii]
  clean!
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
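The order of the steps can be sketched on a plain string as follows. This is an approximation under stated assumptions, not Babosa's code: `slugify` and `DEFAULTS` are hypothetical names, the transliteration step is omitted, “word characters” are approximated with a Unicode regex instead of `Babosa::STRIPPABLE`, and byte truncation uses `byteslice` plus `scrub`.

```ruby
# Hypothetical defaults modelled on the documented ones.
DEFAULTS = { max_length: 255, separator: "-" }.freeze

def slugify(string, options = nil)
  options = DEFAULTS.merge(options || {})
  clean = ->(s) { s.gsub("-", " ").squeeze(" ").strip }

  slug = clean.call(string)
  # Assumption: keep only letters, digits and whitespace.
  slug = slug.gsub(/[^\p{L}\p{N}\s]/, "")
  slug = clean.call(slug).downcase
  # Cut at the byte limit, then drop any partial trailing character.
  slug = slug.byteslice(0, options[:max_length]).scrub("")
  slug.gsub(/\s/, options[:separator])
end

slugify("  Hello, World -- again! ")   # => "hello-world-again"
slugify("Hello World", separator: "_") # => "hello_world"
slugify("Hello World", max_length: 7)  # => "hello-w"
```

Merging the caller's hash over the defaults, as in the first line of the method, is what lets a caller override a single option while keeping the rest.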
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/babosa/identifier.rb', line 233
def normalize_utf8!
  @wrapped_string = @@utf8_proxy.normalize_utf8(@wrapped_string)
end
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/babosa/identifier.rb', line 240
def tidy_bytes!
  @wrapped_string = @@utf8_proxy.tidy_bytes(@wrapped_string)
end
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
# File 'lib/babosa/identifier.rb', line 179
def to_ascii!
  @wrapped_string = @wrapped_string.gsub(/[^\x00-\x7f]/u, '')
end
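The regular expression from the listing can be applied to a plain string to see its effect. The helper name `strip_non_ascii` is hypothetical.

```ruby
# Remove every character outside the 7-bit ASCII range.
def strip_non_ascii(string)
  string.gsub(/[^\x00-\x7f]/u, "")
end

strip_non_ascii("¡Feliz anio!") # => "Feliz anio!"
strip_non_ascii("日本 2024")     # => " 2024"
```

Characters are deleted rather than approximated, so accented letters disappear entirely unless they are transliterated first.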
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/babosa/identifier.rb', line 254
def to_identifier
  self
end
#to_ruby_method!(allow_bangs = true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/babosa/identifier.rb', line 160
def to_ruby_method!(allow_bangs = true)
  leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
  if allow_bangs
    trailer.downcase.gsub!(/[^a-z0-9!=\\\\?]/, '')
  else
    trailer.downcase.gsub!(/[^a-z0-9]/, '')
  end
  id = leader.to_identifier
  id.transliterate!
  id.to_ascii!
  id.clean!
  id.word_chars!
  id.clean!
  @wrapped_string = id.to_s + trailer
  with_separators!("_")
end
#transliterate!(kind = nil) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for strings written in the Latin alphabet, with or without diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź, Poland"
string.transliterate # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate # => "日本"
You can pass any key(s) from Characters.approximations as arguments. This allows for contextual approximations. Various languages are supported; you can see which ones by looking at the source of Transliterator::Base.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
You can modify the built-in approximations, or add your own:
# Make Spanish use "nh" rather than "nn"
Babosa::Characters.add_approximations(:spanish, "ñ" => "nh")
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.transliterate! # => "¡Feliz anio!"
# File 'lib/babosa/identifier.rb', line 115
def transliterate!(kind = nil)
  transliterator = Transliterator.get(kind || :latin).instance
  @wrapped_string = transliterator.transliterate(@wrapped_string)
end
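The idea of a base table with language-specific overrides can be sketched with plain hashes. The tables and the `approximate` helper below are hypothetical and cover only the characters used in the examples above; they are not Babosa's transliterators or its actual data.

```ruby
# Hypothetical approximation tables; Babosa's real data is far larger.
LATIN = {
  "ü" => "u", "ñ" => "n", "Ł" => "L", "ó" => "o", "ź" => "z"
}.freeze

OVERRIDES = {
  german: { "ü" => "ue" },
  spanish: { "ñ" => "ni" }
}.freeze

def approximate(string, kind = nil)
  # Language-specific entries take precedence over the base table.
  table = LATIN.merge(OVERRIDES.fetch(kind, {}))
  # Characters without an entry are left unchanged.
  string.gsub(/[^\x00-\x7f]/) { |char| table.fetch(char, char) }
end

approximate("Jürgen Müller")          # => "Jurgen Muller"
approximate("Jürgen Müller", :german) # => "Juergen Mueller"
approximate("¡Feliz año!", :spanish)  # => "¡Feliz anio!"
approximate("日本")                    # => "日本"
```

As in the documented behaviour, characters with no known approximation, such as “¡” or Japanese text, pass through untouched.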
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/babosa/identifier.rb', line 187
def truncate!(max)
  @wrapped_string = unpack("U*")[0...max].pack("U*")
end
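The unpack/slice/pack idiom from the listing works on any UTF-8 string, as this small sketch shows. The helper name `truncate_chars` is hypothetical.

```ruby
# Truncate by Unicode code points rather than bytes.
def truncate_chars(string, max)
  string.unpack("U*")[0...max].pack("U*")
end

truncate_chars("日本語テキスト", 2) # => "日本"
truncate_chars("abc", 10)         # => "abc"
```

Here “characters” means code points, so a letter followed by a separate combining mark counts as two.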
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain maximum byte length. The resulting string may be shorter than max bytes if the cut would otherwise fall in the middle of a multibyte character.
# File 'lib/babosa/identifier.rb', line 198
def truncate_bytes!(max)
  return @wrapped_string if @wrapped_string.bytesize <= max
  curr = 0
  new = []
  unpack("U*").each do |char|
    break if curr > max
    char = [char].pack("U")
    curr += char.bytesize
    if curr <= max
      new << char
    end
  end
  @wrapped_string = new.join
end
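A standalone sketch of the same idea, accumulating whole characters until the byte budget is used up, is shown below. The helper name `truncate_bytes` is hypothetical, and it iterates with `each_char` instead of unpacking code points.

```ruby
# Keep whole characters while the running byte count stays within max.
def truncate_bytes(string, max)
  return string if string.bytesize <= max

  total = 0
  kept = []
  string.each_char do |char|
    total += char.bytesize
    break if total > max

    kept << char
  end
  kept.join
end

truncate_bytes("日本語", 4) # => "日"   (3 bytes; the next character would exceed 4)
truncate_bytes("aé", 2)    # => "a"
truncate_bytes("abc", 10)  # => "abc"
```

The result is always valid UTF-8 because a character is either kept whole or dropped.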
#upcase! ⇒ Object
Perform UTF-8 sensitive upcasing.
# File 'lib/babosa/identifier.rb', line 221
def upcase!
  @wrapped_string = @@utf8_proxy.upcase(@wrapped_string)
end
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with the separator char (“-” by default).
# File 'lib/babosa/identifier.rb', line 215
def with_separators!(char = "-")
  @wrapped_string = @wrapped_string.gsub(/\s/u, char)
end
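The substitution from the listing can be tried on plain strings. The helper name `with_separators` is hypothetical.

```ruby
# Replace every single whitespace character with the separator.
def with_separators(string, char = "-")
  string.gsub(/\s/u, char)
end

with_separators("hello world")       # => "hello-world"
with_separators("hello  world", "_") # => "hello__world"
```

Each whitespace character is replaced individually, so runs of whitespace produce runs of separators unless the string has been cleaned first.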
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, newlines and linefeeds.
# File 'lib/babosa/identifier.rb', line 130
def word_chars!
  @wrapped_string = (unpack("U*") - Babosa::STRIPPABLE).pack("U*")
end
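The array-subtraction technique in the listing can be illustrated with a much smaller list of code points. `STRIPPABLE_SAMPLE` and `word_chars` are hypothetical; the contents of `Babosa::STRIPPABLE` are not shown on this page.

```ruby
# Hypothetical list of code points to strip; far shorter than the real one.
STRIPPABLE_SAMPLE = "!,.?".unpack("U*").freeze

def word_chars(string)
  # Array#- removes every code point that appears in the strippable list.
  (string.unpack("U*") - STRIPPABLE_SAMPLE).pack("U*")
end

word_chars("Hello, world!") # => "Hello world"
word_chars("日本語?")        # => "日本語"
```

Working on code points rather than bytes keeps multibyte characters intact while the unwanted ones are removed.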