Method: ActiveSupport::Inflector#transliterate

Defined in:
activesupport/lib/active_support/inflector/transliterate.rb

#transliterate(string, replacement = "?", locale: nil) ⇒ Object

Replaces non-ASCII characters with an ASCII approximation or, if none exists, with a replacement character that defaults to “?”.

transliterate('Ærøskøbing')
# => "AEroskobing"

Default approximations are provided for Western/Latin characters such as “ø”, “ñ”, “é”, and “ß”.

This method is I18n aware, so you can set up custom approximations for a locale. This can be useful, for example, to transliterate German’s “ü” and “ö” to “ue” and “oe”, or to add support for transliterating Russian to ASCII.

To make your custom transliterations available, set them under the i18n.transliterate.rule I18n key:

# Store the transliterations in locales/de.yml, nested under the locale key
de:
  i18n:
    transliterate:
      rule:
        ü: "ue"
        ö: "oe"

# Or set them using Ruby
I18n.backend.store_translations(:de, i18n: {
  transliterate: {
    rule: {
      'ü' => 'ue',
      'ö' => 'oe'
    }
  }
})

The value for i18n.transliterate.rule can be a simple Hash that maps characters to ASCII approximations as shown above, or, for more complex requirements, a Proc:

I18n.backend.store_translations(:de, i18n: {
  transliterate: {
    rule: ->(string) { MyTransliterator.transliterate(string) }
  }
})
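
The two rule shapes can be illustrated without the I18n gem. What follows is a minimal stand-alone sketch, not the gem's implementation: apply_rule is a hypothetical helper, and it is assumed here that no default approximations are merged in, so any character missing from the Hash falls back to the replacement.

```ruby
# Hypothetical helper, for illustration only: applies a Hash or a Proc rule.
def apply_rule(rule, string, replacement = "?")
  # A Proc (or any other callable) receives the whole string.
  return rule.call(string) if rule.respond_to?(:call)

  # A Hash is consulted per non-ASCII character; unmapped characters
  # are swapped for the replacement.
  string.gsub(/[^\x00-\x7f]/u) { |char| rule.fetch(char, replacement) }
end

hash_rule = { "ü" => "ue", "ö" => "oe" }
proc_rule = ->(string) { string.tr("üö", "uo") }

puts apply_rule(hash_rule, "Jürgen")  # Juergen
puts apply_rule(proc_rule, "Jürgen")  # Jurgen
puts apply_rule(hash_rule, "Señor")   # Se?or
```

A Hash suits one-to-one character mappings, while a Proc is the better fit when the mapping depends on context or is delegated to another library.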

Now you can have different transliterations for each locale:

transliterate('Jürgen', locale: :en)
# => "Jurgen"

transliterate('Jürgen', locale: :de)
# => "Juergen"

Transliteration is restricted to UTF-8, US-ASCII, and GB18030 strings. A string in any other encoding, or an argument that is not a String, raises an ArgumentError.
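
One way to handle input in another encoding is to transcode it to UTF-8 before calling transliterate. The sketch below uses only core Ruby; ACCEPTED_ENCODINGS and to_accepted_encoding are hypothetical names introduced for illustration, and the list simply mirrors the three encodings named above.

```ruby
# Mirrors the encodings listed in the documentation (assumed, not read
# from ActiveSupport itself).
ACCEPTED_ENCODINGS = [Encoding::UTF_8, Encoding::US_ASCII, Encoding::GB18030].freeze

# Hypothetical helper: returns a string in an accepted encoding,
# transcoding to UTF-8 when necessary.
def to_accepted_encoding(string)
  return string if ACCEPTED_ENCODINGS.include?(string.encoding)

  string.encode(Encoding::UTF_8)
end

# "J\xFCrgen" is "Jürgen" as ISO-8859-1 bytes.
latin1 = "J\xFCrgen".b.force_encoding(Encoding::ISO_8859_1)
utf8 = to_accepted_encoding(latin1)

puts utf8           # Jürgen
puts utf8.encoding  # UTF-8
```

The returned string can then be passed to transliterate without triggering the encoding check.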

Raises:

  • (ArgumentError)

# File 'activesupport/lib/active_support/inflector/transliterate.rb', line 64

def transliterate(string, replacement = "?", locale: nil)
  raise ArgumentError, "Can only transliterate strings. Received #{string.class.name}" unless string.is_a?(String)
  raise ArgumentError, "Cannot transliterate strings with #{string.encoding} encoding" unless ALLOWED_ENCODINGS_FOR_TRANSLITERATE.include?(string.encoding)

  return string.dup if string.ascii_only?
  string = string.dup if string.frozen?

  input_encoding = string.encoding

  # US-ASCII is a subset of UTF-8 so we'll force encoding as UTF-8 if
  # US-ASCII is given. This way we can let tidy_bytes handle the string
  # in the same way as we do for UTF-8
  string.force_encoding(Encoding::UTF_8) if string.encoding == Encoding::US_ASCII

  # GB18030 is Unicode compatible but is not a direct mapping so needs to be
  # transcoded. Using invalid/undef :replace will result in loss of data in
  # the event of invalid characters, but since tidy_bytes will replace
  # invalid/undef with a "?" we're safe to do the same beforehand
  string.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace) if string.encoding == Encoding::GB18030

  transliterated = I18n.transliterate(
    ActiveSupport::Multibyte::Unicode.tidy_bytes(string).unicode_normalize(:nfc),
    replacement: replacement,
    locale: locale
  )

  # Restore the string encoding of the input if it was not UTF-8.
  # Apply invalid/undef :replace as tidy_bytes does
  transliterated.encode!(input_encoding, invalid: :replace, undef: :replace) if input_encoding != transliterated.encoding

  transliterated
end
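
The normalization and transcoding steps in the source above can be tried in isolation with core Ruby. This is a sketch of those two steps only, not of the full method: the literal "Juergen" stands in for the transliterated value, and the variable names are illustrative.

```ruby
# A decomposed "ü" (u followed by a combining diaeresis) only matches a
# "ü" rule key once NFC normalization composes it into one code point.
decomposed = "Ju\u0308rgen"
composed = decomposed.unicode_normalize(:nfc)
puts composed == "J\u00FCrgen"  # true

# GB18030 input is transcoded to UTF-8 before transliteration.
gb_input = "J\u00FCrgen".encode(Encoding::GB18030)
utf8 = gb_input.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
puts utf8 == "J\u00FCrgen"      # true

# Afterwards the result is transcoded back to the input encoding.
# "Juergen" stands in for the transliterated string here.
result = "Juergen".encode(gb_input.encoding, invalid: :replace, undef: :replace)
puts result.encoding            # GB18030
```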