Module: UTF8Encoding
- Includes:
- ControlCharacters, ForceBinary, ObjectSupport
- Included in:
- NdrSupport::YAML::SerializationMigration
- Defined in:
- lib/ndr_support/utf8_encoding.rb,
lib/ndr_support/utf8_encoding/force_binary.rb,
lib/ndr_support/utf8_encoding/object_support.rb,
lib/ndr_support/utf8_encoding/control_characters.rb
Overview
Allows any supported object to have all of its related strings encoded to UTF-8 in place.
Defined Under Namespace
Modules: ControlCharacters, ForceBinary, ObjectSupport
Classes: UTF8CoercionError
Constant Summary
- AUTO_ENCODINGS =
Our known source encodings, in order of preference:
%w( UTF-8 UTF-16 Windows-1252 )
- REPLACEMENT_SCHEME =
How should unmappable characters be escaped when forcing encoding?
lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }
- UTF8 =
'UTF-8'.freeze
- BINARY =
'BINARY'.freeze
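The order of AUTO_ENCODINGS matters: Windows-1252 maps almost every byte to some character, so trying it before UTF-8 would "succeed" on genuine UTF-8 input and silently produce mojibake. The standalone snippet below (plain Ruby, not part of ndr_support) shows what happens when UTF-8 bytes are misread as Windows-1252.

```ruby
# UTF-8 bytes for "café" (the é is the two-byte sequence C3 A9),
# tagged as BINARY so no encoding is assumed yet.
utf8_bytes = "caf\xC3\xA9".b

# Interpreted as UTF-8, the bytes are valid and decode as intended.
as_utf8 = utf8_bytes.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?   # prints "true"
puts as_utf8                   # prints "café"

# Interpreted as Windows-1252, the same bytes also convert without error,
# but each byte becomes its own character: C3 -> "Ã" and A9 -> "©".
as_cp1252 = utf8_bytes.dup.force_encoding('Windows-1252').encode('UTF-8')
puts as_cp1252                 # prints "cafÃ©"
```

This is why the more restrictive encoding (UTF-8, which rejects most non-UTF-8 byte sequences) is tried first and the permissive Windows-1252 last.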
Constants included from ControlCharacters
ControlCharacters::CONTROL_CHARACTERS
Instance Method Summary
- #coerce_utf8(string, source_encoding = nil) ⇒ Object
Returns a UTF-8 version of `string`, escaping any unmappable characters.
- #coerce_utf8!(string, source_encoding = nil) ⇒ Object
Coerces `string` to UTF-8, in place, escaping any unmappable characters.
- #ensure_utf8(string, source_encoding = nil) ⇒ Object
Returns a new string with valid UTF-8 encoding, or raises an exception if encoding fails.
- #ensure_utf8!(string, source_encoding = nil) ⇒ Object
Attempts to encode `string` to UTF-8, in place.
Methods included from ObjectSupport
#ensure_utf8_array!, #ensure_utf8_hash!, #ensure_utf8_object!
Methods included from ForceBinary
Methods included from ControlCharacters
#escape_control_chars, #escape_control_chars!, #escape_control_chars_in_array!, #escape_control_chars_in_hash!, #escape_control_chars_in_object!
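Going by the Overview and the method names, ObjectSupport's #ensure_utf8_object!, #ensure_utf8_array! and #ensure_utf8_hash! walk containers and encode the strings inside them. Their source is not on this page, so the following is only a sketch of the general idea in plain Ruby. `utf8_string_sketch!` and `utf8_object_sketch!` are hypothetical names, and the per-string step is deliberately simplified to "keep valid UTF-8, otherwise read as Windows-1252" rather than the module's full candidate logic.

```ruby
# Hypothetical per-string step: keep valid UTF-8, otherwise reinterpret
# the bytes as Windows-1252 and convert them to UTF-8 in place.
def utf8_string_sketch!(string)
  string.force_encoding('UTF-8')
  return string if string.valid_encoding?
  string.force_encoding('Windows-1252').encode!('UTF-8')
end

# Hypothetical recursive walker: converts every String found inside
# nested Arrays and Hashes, mutating the strings themselves.
# Only Hash values are visited: Ruby freezes String hash keys, so they
# cannot be re-encoded in place.
def utf8_object_sketch!(object)
  case object
  when String then utf8_string_sketch!(object)
  when Array  then object.each { |item| utf8_object_sketch!(item) }
  when Hash   then object.each_value { |value| utf8_object_sketch!(value) }
  end
  object
end

data = { 'name' => "Caf\xE9".b, 'tags' => ["ok".b, "na\xEFve".b] }
utf8_object_sketch!(data)
puts data['name']            # prints "Café"
puts data['tags'].join(', ') # prints "ok, naïve"
```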
Instance Method Details
#coerce_utf8(string, source_encoding = nil) ⇒ Object
Returns a UTF-8 version of `string`, escaping any unmappable characters.
# File 'lib/ndr_support/utf8_encoding.rb', line 47
def coerce_utf8(string, source_encoding = nil)
  coerce_utf8!(string.dup, source_encoding)
end
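The non-bang coerce_utf8 hands `string.dup` to coerce_utf8!, so the caller's string is left untouched. The standalone snippet below (plain Ruby, no ndr_support) shows why the copy matters: `force_encoding` and `encode!` both mutate their receiver.

```ruby
original = "caf\xE9".b   # raw bytes, tagged as BINARY (ASCII-8BIT)

# Work on a copy, as coerce_utf8 does before delegating to the bang method.
copy = original.dup
copy.force_encoding('Windows-1252').encode!('UTF-8')

puts copy                                 # prints "café"
puts copy.encoding                        # prints "UTF-8"
puts original.encoding == Encoding::BINARY # prints "true" (unchanged)
puts original.bytesize                    # prints "4" (bytes untouched)
```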
#coerce_utf8!(string, source_encoding = nil) ⇒ Object
Coerces `string` to UTF-8, in place, escaping any unmappable characters.
# File 'lib/ndr_support/utf8_encoding.rb', line 52
def coerce_utf8!(string, source_encoding = nil)
  # Try normally first...
  ensure_utf8!(string, source_encoding)
rescue UTF8CoercionError
  # ...before going back-to-basics, and replacing things that don't map:
  string.encode!(UTF8, BINARY, :fallback => REPLACEMENT_SCHEME)
end
#ensure_utf8(string, source_encoding = nil) ⇒ Object
Returns a new string with valid UTF-8 encoding, or raises an exception if encoding fails.
# File 'lib/ndr_support/utf8_encoding.rb', line 24
def ensure_utf8(string, source_encoding = nil)
  ensure_utf8!(string.dup, source_encoding)
end
#ensure_utf8!(string, source_encoding = nil) ⇒ Object
Attempts to encode `string` to UTF-8, in place. Returns `string`, or raises UTF8CoercionError if none of the candidate source encodings produces valid UTF-8.
# File 'lib/ndr_support/utf8_encoding.rb', line 30
def ensure_utf8!(string, source_encoding = nil)
  # A list of encodings we should try from:
  candidates = source_encoding ? Array.wrap(source_encoding) : AUTO_ENCODINGS

  # Attempt to coerce the string to UTF-8, from one of the source
  # candidates (in order of preference):
  apply_candidates!(string, candidates)

  unless string.valid_encoding?
    # None of our candidate source encodings worked, so fail:
    fail(UTF8CoercionError, "Attempted to use: #{candidates}")
  end

  string
end
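ensure_utf8! delegates the actual conversion to `apply_candidates!`, which is not shown on this page. The following is a minimal plain-Ruby sketch of the same candidate loop, under stated assumptions: `apply_candidates_sketch!`, `ensure_sketch!`, `SketchUTF8Error` and `SKETCH_CANDIDATES` are hypothetical; UTF-16 is left out of the candidate list to keep the behaviour predictable (Ruby treats `UTF-16` as a dummy encoding); and core `Array()` replaces ActiveSupport's `Array.wrap`, which behaves the same for a single String.

```ruby
class SketchUTF8Error < StandardError; end

SKETCH_CANDIDATES = %w(UTF-8 Windows-1252).freeze

# Hypothetical stand-in for apply_candidates!: tag the bytes with each
# candidate in turn and stop at the first that yields valid UTF-8.
def apply_candidates_sketch!(string, candidates)
  candidates.each do |encoding|
    string.force_encoding(encoding)
    begin
      string.encode!('UTF-8') unless encoding == 'UTF-8'
    rescue EncodingError
      next # this candidate cannot represent the bytes; try the next one
    end
    return string if string.valid_encoding?
  end
  # Nothing worked (failed encode! calls leave the bytes untouched), so
  # re-tag as UTF-8; valid_encoding? will then report the failure.
  string.force_encoding('UTF-8')
end

# Mirrors ensure_utf8!: choose candidates, try them, raise if none worked.
def ensure_sketch!(string, source_encoding = nil)
  candidates = source_encoding ? Array(source_encoding) : SKETCH_CANDIDATES
  apply_candidates_sketch!(string, candidates)
  unless string.valid_encoding?
    raise SketchUTF8Error, "Attempted to use: #{candidates}"
  end
  string
end

puts ensure_sketch!("café".b)     # prints "café" (UTF-8 candidate)
puts ensure_sketch!("caf\xE9".b)  # prints "café" (Windows-1252 candidate)
```

Restricting the candidates (here `ensure_sketch!("caf\xE9".b, 'UTF-8')`) makes the same bytes raise `SketchUTF8Error`, which is the case coerce_utf8! rescues.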