Class: Babosa::Identifier
- Inherits: Object
  - Object
    - Babosa::Identifier
- Defined in: lib/babosa/identifier.rb
Overview
This class provides some string-manipulation methods specific to slugs.
Note that this class includes many “bang methods”, such as #clean! and #normalize!, that modify the string in place. Each of these methods has a corresponding “bangless” method (for example, Identifier#clean for Identifier#clean!), which does not appear in the documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of Identifier, so that calls to methods specific to this class can be chained:
string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">
Constant Summary
- Error = Class.new(StandardError)
Instance Attribute Summary
- #wrapped_string ⇒ Object (also: #to_s) readonly
  Returns the value of attribute wrapped_string.
Instance Method Summary
- #==(other) ⇒ Object
- #clean! ⇒ Object
  Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
- #default_normalize_options ⇒ Object
  The default options for #normalize!.
- #eql?(other) ⇒ Boolean
- #initialize(string) ⇒ Identifier (constructor)
  A new instance of Identifier.
- #method_missing(symbol, *args, &block) ⇒ Object
- #normalize!(options = {}) ⇒ Object
  Normalize the string for use as a URL slug.
- #normalize_utf8! ⇒ Object
  Perform Unicode composition on the wrapped string.
- #respond_to_missing?(name, include_all) ⇒ Boolean
- #strip_leading_digits! ⇒ Object
  Strip any leading digits.
- #tidy_bytes! ⇒ Object
  Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
- #to_ascii! ⇒ Object
  Delete any non-ASCII characters.
- #to_identifier ⇒ Object (also: #to_slug)
- #to_ruby_method(allow_bangs: true) ⇒ Object
- #to_ruby_method!(allow_bangs: true) ⇒ Object
  Normalize a string so that it can safely be used as a Ruby method name.
- #transliterate!(*kinds) ⇒ Object (also: #approximate_ascii!)
  Approximate an ASCII string.
- #truncate!(max) ⇒ Object
  Truncate the string to max characters.
- #truncate_bytes!(max) ⇒ Object
  Truncate the string to max bytes.
- #with_separators!(char = "-") ⇒ Object (also: #with_dashes!)
  Replaces whitespace with dashes (“-”).
- #word_chars! ⇒ Object
  Remove any non-word characters.
Constructor Details
#initialize(string) ⇒ Identifier
Returns a new instance of Identifier.
# File 'lib/babosa/identifier.rb', line 36
def initialize(string)
  @wrapped_string = string.to_s.dup
  tidy_bytes!
  normalize_utf8!
end
Dynamic Method Handling
This class handles dynamic methods through method_missing, which forwards unrecognized calls to the wrapped string.
#method_missing(symbol, *args, &block) ⇒ Object
# File 'lib/babosa/identifier.rb', line 27
def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
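The delegation pattern above can be sketched in isolation. The class below is a hypothetical stand-in (Wrapper is not part of Babosa) that forwards unknown messages to a wrapped String in the same way. It illustrates the pattern only, not the library's full behavior.

```ruby
# Hypothetical stand-in for Babosa::Identifier, showing only the delegation pattern.
class Wrapper
  def initialize(string)
    @wrapped_string = string.to_s.dup
  end

  # Forward any message this class does not define to the wrapped String.
  def method_missing(symbol, *args, &block)
    @wrapped_string.__send__(symbol, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing can handle.
  def respond_to_missing?(name, include_all)
    @wrapped_string.respond_to?(name, include_all)
  end
end

wrapper = Wrapper.new("hello world")
upcased = wrapper.upcase                 # handled by String#upcase => "HELLO WORLD"
knows_size = wrapper.respond_to?(:size)  # => true, because String responds to size
```

Defining respond_to_missing? alongside method_missing keeps respond_to? truthful about the forwarded methods.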
Instance Attribute Details
#wrapped_string ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute wrapped_string.
# File 'lib/babosa/identifier.rb', line 24
def wrapped_string
  @wrapped_string
end
Instance Method Details
#==(other) ⇒ Object
# File 'lib/babosa/identifier.rb', line 42
def ==(other)
  to_s == other.to_s
end
#clean! ⇒ Object
Converts dashes to spaces, removes leading and trailing spaces, and replaces multiple whitespace characters with a single space.
# File 'lib/babosa/identifier.rb', line 98
def clean!
  gsub!(/[- ]+/, " ")
  strip!
  to_s
end
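A minimal sketch of the same two steps on a plain String, without any Babosa classes, may make the effect easier to see. The sample input is made up for illustration.

```ruby
# Plain-String sketch of the two steps in clean!.
text = "  foo--bar   baz - ".dup
text.gsub!(/[- ]+/, " ") # collapse each run of dashes and/or spaces into one space
text.strip!              # drop the remaining leading and trailing space
cleaned = text           # => "foo bar baz"

# The pattern matches only the dash and the space character,
# so tabs pass through unchanged.
with_tabs = "a\t\tb".gsub(/[- ]+/, " ") # => "a\t\tb"
```

As the second case suggests, whitespace other than the space character is not collapsed by this pattern.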
#default_normalize_options ⇒ Object
The default options for #normalize!. Override to set your own defaults.
# File 'lib/babosa/identifier.rb', line 247
def default_normalize_options
  {transliterate: :latin, max_length: 255, separator: "-"}
end
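As a rough illustration of how caller-supplied options can take precedence over defaults like these, the sketch below uses Hash#merge on plain hashes. The override values are made up, and this is an assumption about how the defaults are combined, not a statement of the library's exact code path.

```ruby
# The defaults as listed above.
defaults = {transliterate: :latin, max_length: 255, separator: "-"}

# Hash#merge returns a new hash in which the argument's keys win.
options = defaults.merge({separator: "_", max_length: 100})
# options  => {transliterate: :latin, max_length: 100, separator: "_"}
# defaults is left unchanged.
```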
#eql?(other) ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 46
def eql?(other)
  self == other
end
#normalize!(options = {}) ⇒ Object
Normalize the string for use as a URL slug. In this context, normalizing means stripping leading and trailing whitespace, removing characters other than letters and numbers, downcasing, truncating to 255 bytes, and converting whitespace to dashes.
# File 'lib/babosa/identifier.rb', line 123
def normalize!(options = {})
  options = default_normalize_options.merge(options)

  if options[:transliterate]
    option = options[:transliterate]
    if option == true
      transliterate!(*options[:transliterations])
    else
      transliterate!(*option)
    end
  end
  to_ascii! if options[:to_ascii]
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
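The steps that follow transliteration can be sketched on a plain String. The helper below, slugify_sketch, is hypothetical and not part of Babosa. It reuses the patterns shown in the individual method bodies on this page and leaves out transliteration and the :to_ascii option.

```ruby
# Hypothetical helper: the post-transliteration steps of normalize!, on a plain String.
def slugify_sketch(input, max_length: 255, separator: "-")
  text = input.dup
  # word_chars!: drop everything except letters, digits, space, "_", "-", newlines, emoji
  text.gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "")
  # clean!: collapse dashes/spaces, then trim the ends
  text.gsub!(/[- ]+/, " ")
  text.strip!
  text.downcase!
  # truncate_bytes!: cut to max_length characters, then chop until it fits in bytes
  text = text.slice(0, max_length)
  text.chop! until text.bytesize <= max_length
  # with_separators!: replace each whitespace character with the separator
  text.gsub(/\s/u, separator)
end

slug = slugify_sketch("  Hello, World! 2024  ")            # => "hello-world-2024"
underscored = slugify_sketch("foo bar", separator: "_")    # => "foo_bar"
```

Running the cleaning step before the separator step matters, because the separator step replaces every whitespace character individually.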
#normalize_utf8! ⇒ Object
Perform Unicode composition on the wrapped string.
# File 'lib/babosa/identifier.rb', line 205
def normalize_utf8!
  unicode_normalize!(:nfc)
  to_s
end
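To illustrate what NFC composition does, the sketch below applies Ruby's String#unicode_normalize to a decomposed "é" on a plain String. The sample input is chosen for illustration.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"
composed = decomposed.unicode_normalize(:nfc)

decomposed.length      # => 2
composed.length        # => 1
composed == "\u00E9"   # => true (the single precomposed code point)
```

Composing first means that visually identical inputs produce the same slug regardless of how they were typed or encoded.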
#respond_to_missing?(name, include_all) ⇒ Boolean
# File 'lib/babosa/identifier.rb', line 31
def respond_to_missing?(name, include_all)
  @wrapped_string.respond_to?(name, include_all)
end
#strip_leading_digits! ⇒ Object
Strip any leading digits.
# File 'lib/babosa/identifier.rb', line 213
def strip_leading_digits!
  gsub!(/^\d+/, "")
  to_s
end
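A short sketch of the same pattern on plain Strings. One detail to keep in mind is that in Ruby regular expressions, ^ anchors at the start of every line, not only the start of the string, so digits at the start of each line are removed.

```ruby
single_line = "123abc456".gsub(/^\d+/, "") # => "abc456" (only the leading run is removed)
multi_line = "1a\n2b".gsub(/^\d+/, "")     # => "a\nb"   (each line's leading digits go)
```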
#tidy_bytes! ⇒ Object
Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.
# File 'lib/babosa/identifier.rb', line 221
def tidy_bytes!
  scrub! do |bad|
    bad.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
  end
  to_s
end
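The repair strategy can be tried on a plain String. In the sketch below, the input is an illustrative byte sequence: "café" as it would be written in Windows-1252, mislabeled as UTF-8.

```ruby
# 0xE9 is "é" in Windows-1252 / ISO-8859-1, but a lone 0xE9 byte is invalid UTF-8.
raw = "caf\xE9".dup.force_encoding(Encoding::UTF_8)
was_valid = raw.valid_encoding? # => false

# String#scrub yields each invalid byte sequence; re-read it as Windows-1252.
fixed = raw.scrub do |bad|
  bad.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

fixed                 # => "café"
fixed.valid_encoding? # => true
```

Only the invalid bytes are reinterpreted, so text that is already valid UTF-8 is left alone.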
#to_ascii! ⇒ Object
Delete any non-ASCII characters.
# File 'lib/babosa/identifier.rb', line 162
def to_ascii!
  gsub!(/[^\x00-\x7f]/u, "")
  to_s
end
#to_identifier ⇒ Object Also known as: to_slug
# File 'lib/babosa/identifier.rb', line 242
def to_identifier
  self
end
#to_ruby_method(allow_bangs: true) ⇒ Object
# File 'lib/babosa/identifier.rb', line 238
def to_ruby_method(allow_bangs: true)
  with_new_instance { |id| id.to_ruby_method!(allow_bangs: allow_bangs) }
end
#to_ruby_method!(allow_bangs: true) ⇒ Object
Normalize a string so that it can safely be used as a Ruby method name.
# File 'lib/babosa/identifier.rb', line 146
def to_ruby_method!(allow_bangs: true)
  last_char = self[-1]
  transliterate!
  to_ascii!
  word_chars!
  strip_leading_digits!
  clean!
  @wrapped_string += last_char if allow_bangs && ["!", "?"].include?(last_char)
  raise Error, "Input generates impossible Ruby method name" if self == ""
  with_separators!("_")
end
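The sequence above can be approximated on a plain String. The helper below, ruby_method_name_sketch, is hypothetical and not part of Babosa. It skips the transliteration step and raises ArgumentError where the library raises its own Error class.

```ruby
# Hypothetical helper mirroring the steps of to_ruby_method!, minus transliteration.
def ruby_method_name_sketch(input, allow_bangs: true)
  last_char = input[-1]                      # remember a trailing "!" or "?"
  text = input.gsub(/[^\x00-\x7f]/u, "")     # to_ascii!
  text.gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "") # word_chars!
  text.gsub!(/^\d+/, "")                     # strip_leading_digits!
  text.gsub!(/[- ]+/, " ")                   # clean!
  text.strip!
  text += last_char if allow_bangs && ["!", "?"].include?(last_char)
  raise ArgumentError, "Input generates impossible Ruby method name" if text == ""
  text.gsub(/\s/u, "_")                      # with_separators!("_")
end

with_bang = ruby_method_name_sketch("hello world!")                        # => "hello_world!"
without_bang = ruby_method_name_sketch("hello world!", allow_bangs: false) # => "hello_world"
```

The trailing "!" or "?" has to be captured first, because the word-character step would otherwise discard it.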
#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!
Approximate an ASCII string. This works only for strings using characters that are Roman-alphabet characters + diacritics. Non-letter characters are left unmodified.
string = Identifier.new "Łódź, Poland"
string.transliterate # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate # => "日本"
You can pass the name of any transliterator class as an argument, which allows for contextual approximations. Various languages are supported; you can see which ones by looking at the source of Transliterator::Base.
string = Identifier.new "Jürgen Müller"
string.transliterate # => "Jurgen Muller"
string.transliterate :german # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate # => "¡Feliz ano!"
string.transliterate :spanish # => "¡Feliz anio!"
The approximations are stored in a hash, which you can modify if you choose:
# Make Spanish use "nh" rather than "ni"
Babosa::Transliterator::Spanish::APPROXIMATIONS["ñ"] = "nh"
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:
string.transliterate!(:spanish) # => "¡Feliz anio!"
string.to_ascii! # => "Feliz anio!"
# File 'lib/babosa/identifier.rb', line 84
def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  to_s
end
#truncate!(max) ⇒ Object
Truncate the string to max characters.
# File 'lib/babosa/identifier.rb', line 174
def truncate!(max)
  @wrapped_string = slice(0, max)
end
#truncate_bytes!(max) ⇒ Object
Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain maximum byte length. The resulting string may be shorter than max bytes if the string must be truncated at a multibyte character boundary.
# File 'lib/babosa/identifier.rb', line 188
def truncate_bytes!(max)
  truncate!(max)
  chop! until bytesize <= max
end
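A plain-String sketch of the same two steps shows why the result can come out shorter than the limit. The input and limit are made up for illustration.

```ruby
text = "\u00E9" * 3 # "ééé": 3 characters, 6 bytes in UTF-8
max = 5

truncated = text.slice(0, max)                   # character-based cut: still "ééé"
truncated.chop! until truncated.bytesize <= max  # drop whole characters until it fits

truncated          # => "éé"
truncated.bytesize # => 4, because a 2-byte character cannot be split to reach 5
```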
#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!
Replaces each whitespace character with the separator char (a dash, “-”, by default).
# File 'lib/babosa/identifier.rb', line 197
def with_separators!(char = "-")
  gsub!(/\s/u, char)
  to_s
end
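The same substitution on plain Strings, as a quick sketch with made-up inputs:

```ruby
dashed = "hello world".gsub(/\s/u, "-") # => "hello-world"
doubled = "a  b".gsub(/\s/u, "_")       # => "a__b" (one separator per whitespace character)
```

Because every whitespace character is replaced individually, runs of whitespace are not collapsed at this step.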
#word_chars! ⇒ Object
Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, underscores, dashes, newlines, carriage returns, and emoji.
# File 'lib/babosa/identifier.rb', line 109
def word_chars!
  # `^\p{letter}` = Any non-Unicode letter
  # `&&` = add the following character class
  # `[^ _\n\r\p{Extended_Pictographic}]` = Anything other than space, underscore, newline, linefeed or emojis
  gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "")
  to_s
end
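To see which characters survive, the sketch below applies the same pattern to plain Strings. The sample inputs are made up for illustration.

```ruby
# Matches characters that are not letters AND not in the allowed extras.
pattern = /[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/

kept = "Hello, w\u00F6rld! 42_x-y".gsub(pattern, "") # => "Hello wörld 42_x-y"
emoji = "ok \u{1F44D}.".gsub(pattern, "")            # => "ok 👍" (the period is removed)
```

The `&&` operator intersects the two negated classes, so a character is removed only if it is excluded by both.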