Class: Babosa::Identifier

Inherits:
Object
Defined in:
lib/babosa/identifier.rb

Overview

This class provides some string-manipulation methods specific to slugs.

Note that this class includes many “bang methods” such as #clean! and #normalize! that perform actions on the string in-place. Each of these methods has a corresponding “bangless” method (e.g., Identifier#clean for Identifier#clean!) which does not appear in the documentation because it is generated dynamically.

All of the bang methods return an instance of String, while the bangless versions return an instance of Identifier, so that calls to methods specific to this class can be chained:

string = Identifier.new("hello world")
string.with_separators! # => "hello-world"
string.with_separators  # => <Babosa::Identifier:0x000001013e1590 @wrapped_string="hello-world">

Constant Summary

Error = Class.new(StandardError)

Constructor Details

#initialize(string) ⇒ Identifier

Returns a new instance of Identifier.

Parameters:

  • string (#to_s)

    The string to use as the basis of the Identifier.



# File 'lib/babosa/identifier.rb', line 36

def initialize(string)
  @wrapped_string = string.to_s.dup
  tidy_bytes!
  normalize_utf8!
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method, which forwards any method it does not define itself to the wrapped string.

#method_missing(symbol, *args, &block) ⇒ Object



# File 'lib/babosa/identifier.rb', line 27

def method_missing(symbol, *args, &block)
  @wrapped_string.__send__(symbol, *args, &block)
end
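The delegation above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical stand-in class named SlugSketch (not part of Babosa) that wraps a string and forwards unknown methods to it in the same way:

```ruby
# SlugSketch is a hypothetical stand-in used only for illustration.
class SlugSketch
  attr_reader :wrapped_string

  def initialize(string)
    @wrapped_string = string.to_s.dup
  end

  # Forward any method this class does not define to the wrapped string.
  def method_missing(symbol, *args, &block)
    @wrapped_string.__send__(symbol, *args, &block)
  end

  # Keep respond_to? consistent with what method_missing can handle.
  def respond_to_missing?(name, include_all)
    @wrapped_string.respond_to?(name, include_all)
  end
end

sketch = SlugSketch.new("Hello World")
upcased = sketch.upcase        # delegated to String#upcase
sketch.gsub!("World", "Ruby")  # delegated, mutates the wrapped string in place
```

Because in-place String methods such as gsub! act on the wrapped string directly, the bang methods in this class can call them without an explicit receiver.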

Instance Attribute Details

#wrapped_string ⇒ Object (readonly) Also known as: to_s

Returns the value of attribute wrapped_string.



# File 'lib/babosa/identifier.rb', line 24

def wrapped_string
  @wrapped_string
end

Instance Method Details

#==(other) ⇒ Object



# File 'lib/babosa/identifier.rb', line 42

def ==(other)
  to_s == other.to_s
end

#clean! ⇒ Object

Converts dashes to spaces, collapses runs of spaces and dashes into a single space, and removes leading and trailing whitespace.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 98

def clean!
  gsub!(/[- ]+/, " ")
  strip!
  to_s
end
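As a rough illustration, the same two steps can be applied to a plain String. This sketch uses no Babosa classes, and the variable names are made up for the example:

```ruby
text = "  hello -- big   world - ".dup
text.gsub!(/[- ]+/, " ")  # each run of dashes and spaces becomes one space
text.strip!               # drop leading and trailing whitespace
cleaned = text

# The pattern only matches dashes and literal spaces, so a tab is left alone.
tabbed = "a\tb".gsub(/[- ]+/, " ")
```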

#default_normalize_options ⇒ Object

The default options for #normalize!. Override to set your own defaults.



# File 'lib/babosa/identifier.rb', line 247

def default_normalize_options
  {transliterate: :latin, max_length: 255, separator: "-"}
end
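#normalize! merges the caller's options over these defaults with Hash#merge. A small sketch of that merge using plain hashes (the override values here are arbitrary examples):

```ruby
defaults = {transliterate: :latin, max_length: 255, separator: "-"}

# Keys supplied by the caller win; everything else keeps its default.
options = defaults.merge(separator: "_", max_length: 50)
```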

#eql?(other) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/babosa/identifier.rb', line 46

def eql?(other)
  self == other
end

#normalize!(options = {}) ⇒ Object

Normalize the string for use as a URL slug. In this context, normalizing means stripping surrounding whitespace, removing characters other than letters and numbers, downcasing, truncating to 255 bytes (the default max_length), and converting whitespace to dashes.

Parameters:

  • options (Hash) (defaults to: {})

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 123

def normalize!(options = {})
  options = default_normalize_options.merge(options)

  if options[:transliterate]
    option = options[:transliterate]
    if option == true
      transliterate!(*options[:transliterations])
    else
      transliterate!(*option)
    end
  end
  to_ascii! if options[:to_ascii]
  word_chars!
  clean!
  downcase!
  truncate_bytes!(options[:max_length])
  with_separators!(options[:separator])
end
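The order of the steps matters, so it may help to see the pipeline on a plain String. The following is a sketch only: normalize_sketch is a hypothetical helper, it reuses the regular expressions from the listings on this page, and it leaves out the transliteration and to_ascii steps entirely.

```ruby
# Hypothetical helper mirroring the non-transliterating steps of normalize!.
def normalize_sketch(input, max_length: 255, separator: "-")
  s = input.to_s.dup
  # word_chars!: drop everything except letters, digits, space, _, -, \n, \r, emoji
  s.gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "")
  # clean!: collapse dashes and spaces, trim the ends
  s.gsub!(/[- ]+/, " ")
  s.strip!
  s.downcase!
  # truncate_bytes!: cut to max_length characters, then chop until it fits in bytes
  s = s.slice(0, max_length)
  s.chop! until s.bytesize <= max_length
  # with_separators!: replace each whitespace character with the separator
  s.gsub(/\s/u, separator)
end

slug = normalize_sketch("Hello, World!")
underscored = normalize_sketch("  Ruby 3 -- rocks  ", separator: "_")
```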

#normalize_utf8! ⇒ Object

Perform Unicode composition on the wrapped string.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 205

def normalize_utf8!
  unicode_normalize!(:nfc)
  to_s
end
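For readers unfamiliar with NFC, this short sketch shows what composition does to a plain String, using Ruby's built-in String#unicode_normalize:

```ruby
decomposed = "e\u0301"                          # "e" followed by a combining acute accent
composed = decomposed.unicode_normalize(:nfc)   # a single precomposed "é" (U+00E9)
```

Both strings render the same way, but only the composed form is one character long, which is why composing first keeps later character-based steps predictable.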

#respond_to_missing?(name, include_all) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/babosa/identifier.rb', line 31

def respond_to_missing?(name, include_all)
  @wrapped_string.respond_to?(name, include_all)
end

#strip_leading_digits! ⇒ Object

Strip any leading digits.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 213

def strip_leading_digits!
  gsub!(/^\d+/, "")
  to_s
end
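One detail worth knowing: in Ruby regular expressions, `^` matches at the start of every line, not just the start of the string. The sketch below applies the same pattern to plain Strings; the `\A` variant is shown only for comparison and is not what the library uses.

```ruby
single = "123abc".gsub(/^\d+/, "")

# With a multi-line string, digits at the start of each line are removed.
multi = "1a\n2b".gsub(/^\d+/, "")

# For comparison: \A anchors to the start of the whole string only.
anchored = "1a\n2b".sub(/\A\d+/, "")
```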

#tidy_bytes! ⇒ Object

Attempt to convert characters encoded using CP1252 and ISO-8859-1 to UTF-8.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 221

def tidy_bytes!
  scrub! do |bad|
    bad.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
  end
  to_s
end
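The effect of the scrub block can be seen on a plain String that carries a stray CP1252 byte. This is a sketch using the non-mutating String#scrub with the same block as above:

```ruby
# "\xE9" is "é" in CP1252 but is not a valid byte sequence in UTF-8.
raw = "caf\xE9".dup.force_encoding(Encoding::UTF_8)

# scrub yields each run of invalid bytes; the block reinterprets it as CP1252.
tidy = raw.scrub do |bad|
  bad.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end
```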

#to_ascii! ⇒ Object

Delete any non-ascii characters.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 162

def to_ascii!
  gsub!(/[^\x00-\x7f]/u, "")
  to_s
end

#to_identifier ⇒ Object Also known as: to_slug



# File 'lib/babosa/identifier.rb', line 242

def to_identifier
  self
end

#to_ruby_method(allow_bangs: true) ⇒ Object



# File 'lib/babosa/identifier.rb', line 238

def to_ruby_method(allow_bangs: true)
  with_new_instance { |id| id.to_ruby_method!(allow_bangs: allow_bangs) }
end

#to_ruby_method!(allow_bangs: true) ⇒ Object

Normalize a string so that it can safely be used as a Ruby method name.

Parameters:

  • allow_bangs (Boolean) (defaults to: true)

Returns:

  • String

Raises:

  • (Error)

    If the input reduces to an empty string, which cannot be used as a Ruby method name.



# File 'lib/babosa/identifier.rb', line 146

def to_ruby_method!(allow_bangs: true)
  last_char = self[-1]
  transliterate!
  to_ascii!
  word_chars!
  strip_leading_digits!
  clean!
  @wrapped_string += last_char if allow_bangs && ["!", "?"].include?(last_char)
  raise Error, "Input generates impossible Ruby method name" if self == ""

  with_separators!("_")
end
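A rough way to follow these steps without the library is sketched below. ruby_method_sketch is a hypothetical helper: it skips transliteration, reuses the regular expressions shown elsewhere on this page, and raises ArgumentError as a stand-in for the class's own Error.

```ruby
# Hypothetical helper; transliterate! is deliberately left out.
def ruby_method_sketch(input, allow_bangs: true)
  s = input.to_s.dup
  last_char = s[-1]                     # remember a trailing "!" or "?"
  s.gsub!(/[^\x00-\x7f]/u, "")          # to_ascii!
  s.gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "")  # word_chars!
  s.gsub!(/^\d+/, "")                   # strip_leading_digits!
  s.gsub!(/[- ]+/, " ")                 # clean!
  s.strip!
  s += last_char if allow_bangs && ["!", "?"].include?(last_char)
  raise ArgumentError, "Input generates impossible Ruby method name" if s == ""

  s.gsub(/\s/u, "_")                    # with_separators!("_")
end

with_bang = ruby_method_sketch("Hello World!")
without_bang = ruby_method_sketch("Hello World!", allow_bangs: false)
```

Note that nothing in this sequence downcases the string, so capital letters in the input survive.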

#transliterate!(*kinds) ⇒ Object Also known as: approximate_ascii!

Approximate an ASCII string. This works only for Roman-alphabet characters with diacritics. Non-letter characters are left unmodified.

string = Identifier.new "Łódź, Poland"
string.transliterate                 # => "Lodz, Poland"
string = Identifier.new "日本"
string.transliterate                 # => "日本"

You can pass the name of any transliterator class as an argument. This allows for contextual approximations. Various languages are supported; you can see which ones by looking at the source of Transliterator::Base.

string = Identifier.new "Jürgen Müller"
string.transliterate                 # => "Jurgen Muller"
string.transliterate :german         # => "Juergen Mueller"
string = Identifier.new "¡Feliz año!"
string.transliterate                 # => "¡Feliz ano!"
string.transliterate :spanish        # => "¡Feliz anio!"

The approximations are a hash, which you can modify if you choose:

# Make Spanish use "nh" rather than "ni"
Babosa::Transliterator::Spanish::APPROXIMATIONS["ñ"] = "nh"

Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use #to_ascii!:

string.transliterate!(:spanish)       # => "¡Feliz anio!"
string.to_ascii!                      # => "Feliz anio!"

Parameters:

  • *kinds (Symbol)

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 84

def transliterate!(*kinds)
  kinds.compact!
  kinds = [:latin] if kinds.empty?
  kinds.each do |kind|
    transliterator = Transliterator.get(kind).instance
    @wrapped_string = transliterator.transliterate(@wrapped_string)
  end
  to_s
end
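The idea of a per-language approximation table can be illustrated with a plain hash lookup. This is a simplified sketch and not Babosa's actual transliterator: transliterate_sketch and both tables are hypothetical names made up for the example.

```ruby
# Hypothetical helper: replace each non-ASCII character with its entry in
# the table, leaving characters without an entry unchanged.
def transliterate_sketch(string, table)
  string.gsub(/[^\x00-\x7f]/u) { |char| table.fetch(char, char) }
end

spanish_table = {"\u00F1" => "ni"}   # "ñ" => "ni"
custom_table = {"\u00F1" => "nh"}    # a modified approximation

greeting = "\u00A1Feliz a\u00F1o!"   # "¡Feliz año!"
default_result = transliterate_sketch(greeting, spanish_table)
custom_result = transliterate_sketch(greeting, custom_table)
```

As in the library, characters with no approximation (here the inverted exclamation mark) pass through untouched.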

#truncate!(max) ⇒ Object

Truncate the string to max characters.

Examples:

"üéøá".to_identifier.truncate(3) #=> "üéø"

Parameters:

  • max (Integer)

    The maximum number of characters.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 174

def truncate!(max)
  @wrapped_string = slice(0, max)
end

#truncate_bytes!(max) ⇒ Object

Truncate the string to max bytes. This can be useful for ensuring that a UTF-8 string will always fit into a database column with a certain max byte length. The resulting string may be shorter than max bytes if the string must be truncated at a multibyte character boundary.

Examples:

"üéøá".to_identifier.truncate_bytes(3) #=> "ü"

Parameters:

  • max (Integer)

    The maximum number of bytes.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 188

def truncate_bytes!(max)
  truncate!(max)
  chop! until bytesize <= max
end
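The difference between the two truncation methods comes down to characters versus bytes. A sketch on a plain String, following the same slice-then-chop approach:

```ruby
text = "\u00FC\u00E9\u00F8\u00E1"   # "üéøá": four characters, two bytes each in UTF-8

# Character-based truncation keeps three whole characters (six bytes).
by_chars = text.slice(0, 3)

# Byte-based truncation starts from the same slice, then drops whole
# characters from the end until the result fits in three bytes.
by_bytes = text.slice(0, 3)
by_bytes.chop! until by_bytes.bytesize <= 3
```

Chopping whole characters is what keeps the result valid UTF-8, at the cost of sometimes ending up below the byte limit.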

#with_separators!(char = "-") ⇒ Object Also known as: with_dashes!

Replaces each whitespace character with the given separator (a dash, “-”, by default).

Parameters:

  • char (String) (defaults to: "-")

    the separator character to use.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 197

def with_separators!(char = "-")
  gsub!(/\s/u, char)
  to_s
end
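Since the pattern matches one whitespace character at a time, consecutive whitespace yields consecutive separators. A sketch on plain Strings:

```ruby
dashed = "hello big world".gsub(/\s/u, "-")

# Two adjacent spaces become two separators; collapse whitespace first
# (as #clean! does) if that is not wanted.
doubled = "a  b".gsub(/\s/u, "_")
```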

#word_chars! ⇒ Object

Remove any non-word characters. For this library’s purposes, this means anything other than letters, numbers, spaces, underscores, dashes, newlines, carriage returns, and emoji.

Returns:

  • String



# File 'lib/babosa/identifier.rb', line 109

def word_chars!
  # `[^\p{letter}]` = anything that is not a Unicode letter
  # `&&` = intersect with the following character class
  # `[^ \d_\-\n\r\p{Extended_Pictographic}]` = anything other than space, digit, underscore, dash, newline, carriage return, or emoji
  gsub!(/[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/, "")
  to_s
end
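To see which characters survive, the same pattern can be applied to a plain String. This is a sketch; the sample input is arbitrary.

```ruby
word_chars_pattern = /[[^\p{letter}]&&[^ \d_\-\n\r\p{Extended_Pictographic}]]/

# The comma, exclamation mark, and percent sign are removed; letters
# (including "ö"), digits, spaces, the underscore, and the dash are kept.
kept = "Hello, W\u00F6rld! 100%_ok-then".gsub(word_chars_pattern, "")
```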