Module: UTF8Encoding

Includes:
ControlCharacters, ForceBinary, ObjectSupport
Included in:
NdrSupport::YAML::SerializationMigration
Defined in:
lib/ndr_support/utf8_encoding.rb,
lib/ndr_support/utf8_encoding/force_binary.rb,
lib/ndr_support/utf8_encoding/object_support.rb,
lib/ndr_support/utf8_encoding/control_characters.rb

Overview

Provides methods for encoding strings to UTF-8 in place, including every string held inside a supported object (such as an array or hash).

Defined Under Namespace

Modules: ControlCharacters, ForceBinary, ObjectSupport
Classes: UTF8CoercionError

Constant Summary

AUTO_ENCODINGS = %w( UTF-8 UTF-16 Windows-1252 )
Our known source encodings, in order of preference.

REPLACEMENT_SCHEME = lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }
How unmappable characters are escaped when forcing encoding.

UTF8 = 'UTF-8'.freeze

BINARY = 'BINARY'.freeze
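
As a rough illustration of the escape format, the REPLACEMENT_SCHEME lambda can be called directly on single bytes. The sketch below defines its own copy of the lambda so that it runs without the gem loaded; the sample bytes are arbitrary choices for illustration.

```ruby
# Local copy of the lambda from the constant summary, so the sketch runs standalone.
REPLACEMENT_SCHEME = lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }

# Each byte becomes '0x' followed by at least two lowercase hex digits.
high_byte = REPLACEMENT_SCHEME.call("\xFF".b) # "0xff"
low_byte  = REPLACEMENT_SCHEME.call("\x07".b) # "0x07" (left-padded with a zero)

puts high_byte
puts low_byte
```

Because the escape is plain ASCII text, the result is always representable in UTF-8, at the cost of no longer being distinguishable from a literal "0xff" that was already in the input.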

Constants included from ControlCharacters

ControlCharacters::CONTROL_CHARACTERS

Instance Method Summary

Methods included from ObjectSupport

#ensure_utf8_array!, #ensure_utf8_hash!, #ensure_utf8_object!

Methods included from ForceBinary

#binary_encode_any_high_ascii

Methods included from ControlCharacters

#escape_control_chars, #escape_control_chars!, #escape_control_chars_in_array!, #escape_control_chars_in_hash!, #escape_control_chars_in_object!

Instance Method Details

#coerce_utf8(string, source_encoding = nil) ⇒ Object

Returns a UTF-8 copy of `string`, escaping any unmappable characters. The original `string` is not modified.

# File 'lib/ndr_support/utf8_encoding.rb', line 47

def coerce_utf8(string, source_encoding = nil)
  coerce_utf8!(string.dup, source_encoding)
end

#coerce_utf8!(string, source_encoding = nil) ⇒ Object

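Coerces `string` to UTF-8, in place, escaping any unmappable characters. It first tries #ensure_utf8!; if that raises UTF8CoercionError, the string is re-encoded from BINARY, with each unmappable byte replaced according to REPLACEMENT_SCHEME.

# File 'lib/ndr_support/utf8_encoding.rb', line 52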

def coerce_utf8!(string, source_encoding = nil)
  # Try normally first...
  ensure_utf8!(string, source_encoding)
rescue UTF8CoercionError
  # ...before going back-to-basics, and replacing things that don't map:
  string.encode!(UTF8, BINARY, :fallback => REPLACEMENT_SCHEME)
end
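
The rescue branch above can be approximated with core Ruby alone. The following is a minimal sketch, not the gem's code: `ESCAPE` and `escape_to_utf8` are hypothetical names, and it uses the non-mutating `String#encode` so the input is left alone.

```ruby
# Hypothetical stand-in for the fallback step of coerce_utf8!, using only core Ruby.
ESCAPE = lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }

def escape_to_utf8(bytes)
  # Treat the input as raw bytes: ASCII passes through unchanged, while
  # any high byte has no UTF-8 mapping and is handed to the fallback.
  bytes.encode('UTF-8', 'BINARY', fallback: ESCAPE)
end

escaped = escape_to_utf8("caf\xE9 \x81".b)
puts escaped          # caf0xe9 0x81
puts escaped.encoding # UTF-8
```

Note that this path escapes every high byte, including ones that a Windows-1252 reading would have mapped cleanly, which is presumably why it is only used after the normal attempt has failed.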

#ensure_utf8(string, source_encoding = nil) ⇒ Object

Returns a new string with valid UTF-8 encoding, or raises UTF8CoercionError if encoding fails. The original `string` is not modified.

# File 'lib/ndr_support/utf8_encoding.rb', line 24

def ensure_utf8(string, source_encoding = nil)
  ensure_utf8!(string.dup, source_encoding)
end

#ensure_utf8!(string, source_encoding = nil) ⇒ Object

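Attempts to encode `string` to UTF-8, in place. Returns `string`, or raises UTF8CoercionError if none of the candidate source encodings produces a valid string. When `source_encoding` is given, only that encoding is tried; otherwise AUTO_ENCODINGS is used.

# File 'lib/ndr_support/utf8_encoding.rb', line 30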

def ensure_utf8!(string, source_encoding = nil)
  # A list of encodings we should try from:
  candidates = source_encoding ? Array.wrap(source_encoding) : AUTO_ENCODINGS

  # Attempt to coerce the string to UTF-8, from one of the source
  # candidates (in order of preference):
  apply_candidates!(string, candidates)

  unless string.valid_encoding?
    # None of our candidate source encodings worked, so fail:
    fail(UTF8CoercionError, "Attempted to use: #{candidates}")
  end

  string
end
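
The body of `apply_candidates!` is not shown on this page. The following is a minimal sketch of the general "try each source encoding in order" idea, under the assumption that a candidate is accepted when the bytes are valid in it and transcode to UTF-8 without error. `first_utf8_match` is a hypothetical helper, it returns a copy rather than mutating, and the demo list leaves out UTF-16 for brevity.

```ruby
# Hypothetical helper, not the gem's apply_candidates!: returns a UTF-8 copy
# using the first candidate encoding that fits, or nil if none does.
def first_utf8_match(bytes, candidates)
  candidates.each do |name|
    attempt = bytes.dup.force_encoding(name)
    next unless attempt.valid_encoding?

    begin
      return attempt.encode('UTF-8')
    rescue EncodingError
      next # bytes are valid in this candidate, but have no UTF-8 mapping
    end
  end
  nil
end

candidates   = %w(UTF-8 Windows-1252)
already_utf8 = first_utf8_match("caf\u00E9".b, candidates) # accepted as UTF-8
from_cp1252  = first_utf8_match("caf\xE9".b, candidates)   # accepted as Windows-1252
unmappable   = first_utf8_match("\x81".b, candidates)      # nil: no candidate fits
```

Ordering matters here: UTF-8 is tried first because almost any byte sequence is acceptable as Windows-1252, so placing it earlier would mis-decode genuine multi-byte UTF-8 input.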