Module: Puppet::Util::CharacterEncoding
- Defined in:
- lib/puppet/util/character_encoding.rb
Overview
A module to centralize heuristics and practices for managing character encoding in Puppet.
Class Method Summary
- .convert_to_utf_8(string) ⇒ String
  Given a string, attempts to convert a copy of the string to UTF-8.
- .override_encoding_to_utf_8(string) ⇒ String
  Given a string, tests if that string’s bytes represent valid UTF-8, and if so returns a copy of the string with external encoding set to UTF-8.
Class Method Details
.convert_to_utf_8(string) ⇒ String
Given a string, attempts to convert a copy of the string to UTF-8. Conversion uses encode: the string’s internal byte representation is modified to UTF-8.
This method is intended for situations where we generally trust that the string’s bytes are a faithful representation of the current encoding associated with it, and can use it as a starting point for transcoding (conversion) to UTF-8.
# File 'lib/puppet/util/character_encoding.rb', line 18

def convert_to_utf_8(string)
  original_encoding = string.encoding
  string_copy = string.dup
  begin
    if original_encoding == Encoding::UTF_8
      unless string_copy.valid_encoding?
        Puppet.debug {
          _("%{value} is already labeled as UTF-8 but this encoding is invalid. It cannot be transcoded by Puppet.") %
            { value: string.dump }
        }
      end
      # String is already valid UTF-8 - noop
      string_copy
    else
      # If the string comes to us as BINARY encoded, we don't know what it
      # started as. However, to encode! we need a starting place, and our
      # best guess is whatever the system currently is (default_external).
      # So set external_encoding to default_external before we try to
      # transcode to UTF-8.
      string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
      string_copy.encode(Encoding::UTF_8)
    end
  rescue EncodingError => detail
    # Set the encoding on our copy back to its original if we modified it
    string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY

    # Catch both our own self-determined failure to transcode as well as any
    # error on ruby's part, ie Encoding::UndefinedConversionError on a
    # failure to encode!.
    Puppet.debug {
      _("%{error}: %{value} cannot be transcoded by Puppet.") %
        { error: detail.inspect, value: string.dump }
    }
    string_copy
  end
end
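The transcoding behavior described above can be tried without loading Puppet. The following is a minimal sketch, not the Puppet implementation: the module name EncodingSketch is hypothetical, and the Puppet.debug logging calls are omitted so the example runs with plain Ruby. It assumes an ISO-8859-1 input whose label matches its bytes.

```ruby
# Hypothetical stand-in that mirrors convert_to_utf_8 without Puppet's logging.
module EncodingSketch
  def self.convert_to_utf_8(string)
    original_encoding = string.encoding
    string_copy = string.dup
    begin
      if original_encoding == Encoding::UTF_8
        string_copy
      else
        # BINARY carries no character semantics, so guess default_external first.
        string_copy.force_encoding(Encoding.default_external) if original_encoding == Encoding::BINARY
        string_copy.encode(Encoding::UTF_8)
      end
    rescue EncodingError
      # Restore the original label on the copy and return it untranscoded.
      string_copy.force_encoding(original_encoding) if original_encoding == Encoding::BINARY
      string_copy
    end
  end
end

# "caf" followed by byte 0xE9, which is e-acute in ISO-8859-1.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
converted = EncodingSketch.convert_to_utf_8(latin1)

# The copy is transcoded: 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9.
# converted.bytes => [99, 97, 102, 195, 169]
# The argument itself keeps its encoding and bytes.
# latin1.bytes    => [99, 97, 102, 233]
```

Because the label on the input is trusted, the bytes of the result differ from the bytes of the argument; only the characters are preserved.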
.override_encoding_to_utf_8(string) ⇒ String
Given a string, tests if that string’s bytes represent valid UTF-8, and if so returns a copy of the string with external encoding set to UTF-8. Does not modify the byte representation of the string. If the string does not represent valid UTF-8, does not set the external encoding.
This method is intended for situations where we do not believe that the encoding associated with a string is an accurate reflection of its actual bytes, i.e., effectively when we believe Ruby is incorrect in its assertion of the encoding of the string.
Returns a copy of the string with external encoding set to UTF-8, or a copy of the original string if the override would result in an invalid encoding.
# File 'lib/puppet/util/character_encoding.rb', line 67

def override_encoding_to_utf_8(string)
  string_copy = string.dup
  original_encoding = string_copy.encoding
  return string_copy if original_encoding == Encoding::UTF_8

  if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
    string_copy
  else
    Puppet.debug {
      _("%{value} is not valid UTF-8 and result of overriding encoding would be invalid.") %
        { value: string.dump }
    }
    # Set copy back to its original encoding before returning
    string_copy.force_encoding(original_encoding)
  end
end
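The relabeling behavior can be illustrated in the same way. This is a stand-alone sketch under the same assumptions as the method above: the module name EncodingSketch is hypothetical and the Puppet.debug call is left out. It contrasts bytes that are valid UTF-8 but labeled BINARY with bytes that are not valid UTF-8 at all.

```ruby
# Hypothetical stand-in that mirrors override_encoding_to_utf_8 without Puppet's logging.
module EncodingSketch
  def self.override_encoding_to_utf_8(string)
    string_copy = string.dup
    original_encoding = string_copy.encoding
    return string_copy if original_encoding == Encoding::UTF_8

    if string_copy.force_encoding(Encoding::UTF_8).valid_encoding?
      string_copy
    else
      # Relabeling would produce an invalid string, so restore the original label.
      string_copy.force_encoding(original_encoding)
    end
  end
end

# Valid UTF-8 bytes for "cafe" with e-acute, but labeled BINARY.
mislabeled = "caf\xC3\xA9".b
relabeled = EncodingSketch.override_encoding_to_utf_8(mislabeled)
# relabeled.encoding is UTF-8 and relabeled.bytes equals mislabeled.bytes.

# A lone 0xE9 byte is not valid UTF-8, so the label is left alone.
not_utf8 = "caf\xE9".b
unchanged = EncodingSketch.override_encoding_to_utf_8(not_utf8)
# unchanged.encoding is still BINARY (ASCII-8BIT).
```

In both cases the bytes of the result are identical to the bytes of the argument; only the encoding label can change, which is the key difference from convert_to_utf_8.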