Module: Puppet::Util::CharacterEncoding

Defined in: lib/puppet/util/character_encoding.rb

Overview

A module to centralize heuristics and practices for managing character encoding in Puppet.

Constant Summary

REPLACEMENT_CHAR_MAP =

{
  Encoding::UTF_8 => "\uFFFD",
  Encoding::UTF_16LE => "\xFD\xFF".force_encoding(Encoding::UTF_16LE),
}
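
Both entries in the map denote the same code point, U+FFFD, expressed in each encoding's own bytes. A minimal sketch that checks this with plain Ruby; `REPLACEMENT_CHAR_MAP_SKETCH` and `replacement_for` are hypothetical names used only so the snippet runs without Puppet loaded:

```ruby
# Hypothetical stand-in for REPLACEMENT_CHAR_MAP (not part of Puppet).
REPLACEMENT_CHAR_MAP_SKETCH = {
  Encoding::UTF_8    => "\uFFFD",
  Encoding::UTF_16LE => "\xFD\xFF".dup.force_encoding(Encoding::UTF_16LE),
}.freeze

# Hypothetical helper: look up the replacement, falling back to "?".
def replacement_for(string)
  REPLACEMENT_CHAR_MAP_SKETCH[string.encoding] || '?'
end

utf16 = REPLACEMENT_CHAR_MAP_SKETCH[Encoding::UTF_16LE]
puts utf16.valid_encoding?                      # true: FD FF is one UTF-16LE character
puts utf16.encode(Encoding::UTF_8) == "\uFFFD"  # true: same code point as the UTF-8 entry
puts replacement_for("abc".encode(Encoding::ISO_8859_1))  # ?
```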

Class Method Summary

.convert_to_utf_8(string) ⇒ String

Given a string, attempts to convert a copy of the string to UTF-8.
.override_encoding_to_utf_8(string) ⇒ String

Given a string, tests whether that string’s bytes represent valid UTF-8 and, if so, returns a copy of the string with its external encoding set to UTF-8.
.scrub(string) ⇒ Object

Given a string, returns a copy of that string with any invalid byte sequences in its current encoding replaced with the Unicode replacement character “\uFFFD” if the string is UTF-8 or UTF-16LE, or “?” otherwise.

Class Method Details

.convert_to_utf_8(string) ⇒ `String`

Given a string, attempts to convert a copy of the string to UTF-8. Conversion uses encode, so the copy’s byte representation is transcoded to UTF-8.

This method is intended for situations where we generally trust that the string’s bytes are a faithful representation of the current encoding associated with it, and can use it as a starting point for transcoding (conversion) to UTF-8.

Parameters:

string (String) —

a string to transcode

Returns:

(String) —

copy of the original string, in UTF-8 if transcodable

# File 'lib/puppet/util/character_encoding.rb', line 16

def convert_to_utf_8(string)
  original_encoding = string.encoding
  string_copy = string.dup
  begin
    if original_encoding == Encoding::UTF_8
      if !string_copy.valid_encoding?
        Puppet.debug(_("%{value} is already labeled as UTF-8 but this encoding is invalid. It cannot be transcoded by Puppet.") %
          { value: string.dump })
      end
      # String is already labeled as UTF-8 - noop
      return string_copy
    else
      # If the string comes to us as BINARY encoded, we don't know what it
      # started as. However, to encode we need a starting place, and our
      # best guess is whatever the system currently is (default_external).
      # So set external_encoding to default_external before we try to
      # transcode to UTF-8.
      string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
      return string_copy.encode(Encoding::UTF_8)
    end
  rescue EncodingError => detail
    # Set the encoding on our copy back to its original if we modified it
    string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY

    # Catch both our own self-determined failure to transcode as well as any
    # error on Ruby's part, e.g. Encoding::UndefinedConversionError on a
    # failure to encode.
    Puppet.debug(_("%{error}: %{value} cannot be transcoded by Puppet.") %
      { error: detail.inspect, value: string.dump })
    return string_copy
  end
end
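
The flow above can be tried without Puppet using only core String methods. The following is a rough sketch, not Puppet's implementation: `to_utf_8_sketch` is a hypothetical helper that omits the debug logging, and its `assumed_encoding` parameter is an assumption standing in for Encoding.default_external so that the result does not depend on the machine's locale.

```ruby
# Hypothetical helper mirroring the control flow of convert_to_utf_8.
def to_utf_8_sketch(string, assumed_encoding = Encoding.default_external)
  copy = string.dup
  return copy if copy.encoding == Encoding::UTF_8
  # BINARY carries no character information, so relabel with a best guess first.
  copy.force_encoding(assumed_encoding) if string.encoding == Encoding::BINARY
  copy.encode(Encoding::UTF_8)
rescue EncodingError
  # Transcoding failed: return an untouched copy with the original label.
  string.dup
end

# Trusted label: the bytes really are ISO-8859-1, so transcoding succeeds.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
converted = to_utf_8_sketch(latin1)
puts converted.encoding       # UTF-8
puts converted.bytes.length   # 5: the single byte 0xE9 became 0xC3 0xA9

# BINARY input: transcoded from the assumed starting encoding.
puts to_utf_8_sketch("caf\xE9".b, Encoding::ISO_8859_1) == converted  # true

# Untrustworthy label: 0xE9 is not US-ASCII, so the copy comes back unchanged.
bad = "caf\xE9".dup.force_encoding(Encoding::US_ASCII)
puts to_utf_8_sketch(bad).encoding  # US-ASCII
```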

.override_encoding_to_utf_8(string) ⇒ `String`

Given a string, tests whether that string’s bytes represent valid UTF-8 and, if so, returns a copy of the string with its external encoding set to UTF-8. Does not modify the byte representation of the string. If the bytes are not valid UTF-8, the external encoding is left unchanged.

This method is intended for situations where we do not believe that the encoding associated with a string is an accurate reflection of its actual bytes, i.e., effectively when we believe Ruby is incorrect in its assertion of the encoding of the string.

Parameters:

string (String) —

to set external encoding (re-label) to utf-8

Returns:

(String) —

a copy of string with external encoding set to utf-8, or a copy of the original string if override would result in invalid encoding.

# File 'lib/puppet/util/character_encoding.rb', line 63

def override_encoding_to_utf_8(string)
  string_copy = string.dup
  original_encoding = string_copy.encoding
  return string_copy if original_encoding == Encoding::UTF_8
  if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
    return string_copy
  else
    Puppet.debug(_("%{value} is not valid UTF-8 and result of overriding encoding would be invalid.") % { value: string.dump })
    # Set copy back to its original encoding before returning
    return string_copy.force_encoding(original_encoding)
  end
end
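
Re-labeling differs from transcoding in that the bytes never change, only the encoding attached to them. A minimal sketch of that behavior in plain Ruby; `relabel_as_utf_8_sketch` is a hypothetical helper that follows the listing above minus the debug logging:

```ruby
# Hypothetical helper mirroring override_encoding_to_utf_8.
def relabel_as_utf_8_sketch(string)
  copy = string.dup
  original_encoding = copy.encoding
  return copy if original_encoding == Encoding::UTF_8
  return copy if copy.force_encoding(Encoding::UTF_8).valid_encoding?
  # The bytes are not valid UTF-8: restore the original label.
  copy.force_encoding(original_encoding)
end

# UTF-8 bytes for "café" that arrived labeled as BINARY.
mislabeled = "caf\xC3\xA9".b
relabeled = relabel_as_utf_8_sketch(mislabeled)
puts relabeled.encoding                   # UTF-8
puts relabeled.bytes == mislabeled.bytes  # true: only the label changed

# A lone 0xE9 is not valid UTF-8, so the original label is kept.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
puts relabel_as_utf_8_sketch(latin1).encoding  # ISO-8859-1
```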

.scrub(string) ⇒ `Object`

Note:

Does not modify the encoding, but the new string will have different bytes from the original. Only needed for Ruby 1.9.3 support.

Given a string, returns a copy of that string with any invalid byte sequences in its current encoding replaced with the Unicode replacement character “\uFFFD” if the string is UTF-8 or UTF-16LE, or “?” otherwise.

Parameters:

string —

a string to remove invalid byte sequences from

Returns:

a copy of string with invalid byte sequences replaced by the Unicode replacement character or the “?” character

# File 'lib/puppet/util/character_encoding.rb', line 89

def scrub(string)
  if string.respond_to?(:scrub)
    string.scrub
  else
    replacement_character = REPLACEMENT_CHAR_MAP[string.encoding] || '?'
    string.chars.map { |c| c.valid_encoding? ? c : replacement_character }.join
  end
end
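
To see both branches side by side, the sketch below compares the built-in String#scrub with the character-by-character fallback on a string containing a single invalid byte. `scrub_fallback_sketch` is a hypothetical helper that copies the else-branch above, with the replacement passed in explicitly as an assumption for the demo.

```ruby
# Hypothetical helper mirroring the fallback branch of scrub.
def scrub_fallback_sketch(string, replacement)
  string.chars.map { |c| c.valid_encoding? ? c : replacement }.join
end

# 0xFF can never appear in valid UTF-8.
broken = "a\xFFb".dup.force_encoding(Encoding::UTF_8)
puts broken.valid_encoding?                                # false

scrubbed = broken.scrub
puts scrubbed == "a\uFFFDb"                                # true
puts scrubbed.encoding == broken.encoding                  # true: the label is unchanged
puts scrub_fallback_sketch(broken, "\uFFFD") == scrubbed   # true for this input

# Outside the Unicode encodings the replacement is "?".
ascii = "a\xFFb".dup.force_encoding(Encoding::US_ASCII)
puts ascii.scrub                                           # a?b
```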
