Module: Mail::Multibyte

Defined in:
lib/mail/multibyte/chars.rb,
lib/mail/multibyte.rb,
lib/mail/multibyte/utils.rb,
lib/mail/multibyte/unicode.rb

Overview

:nodoc:

Defined Under Namespace

Modules: Unicode

Classes: Chars, EncodingError

Constant Summary

VALID_CHARACTER =

Regular expressions that describe valid byte sequences for a character

{
  # Borrowed from the Kconv library by Shinji KONO - (also as seen on the W3C site)
  'UTF-8' => /\A(?:
              [\x00-\x7f]                                         |
              [\xc2-\xdf] [\x80-\xbf]                             |
              \xe0        [\xa0-\xbf] [\x80-\xbf]                 |
              [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]                 |
              \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z /xn,
  # Quick check for valid Shift-JIS characters, disregards the odd-even pairing
  'Shift_JIS' => /\A(?:
              [\x00-\x7e\xa1-\xdf]                                     |
              [\x81-\x9f\xe0-\xef] [\x40-\x7e\x80-\x9e\x9f-\xfc])\z /xn
}
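
As a rough illustration of how a pattern like the 'UTF-8' entry can be applied, the sketch below copies that pattern into a standalone constant and checks the bytes of a single character against it. `UTF8_CHAR` and `valid_utf8_char?` are hypothetical names used only for this example; they are not part of Mail::Multibyte. The sketch assumes the input is converted to a binary string first, since the `n` flag makes the pattern byte-oriented.

```ruby
# Sketch only: UTF8_CHAR copies the 'UTF-8' pattern from VALID_CHARACTER;
# valid_utf8_char? is a hypothetical helper, not a Mail::Multibyte method.
UTF8_CHAR = /\A(?:
              [\x00-\x7f]                                         |
              [\xc2-\xdf] [\x80-\xbf]                             |
              \xe0        [\xa0-\xbf] [\x80-\xbf]                 |
              [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]                 |
              \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z /xn

def valid_utf8_char?(bytes)
  # String#b returns a binary (ASCII-8BIT) copy, which matches the
  # byte-oriented pattern without encoding compatibility problems.
  UTF8_CHAR.match?(bytes.b)
end

puts valid_utf8_char?("a")        # one ASCII byte
puts valid_utf8_char?("\u00FC")   # two-byte sequence for U+00FC
puts valid_utf8_char?("\xC3")     # lead byte with no continuation byte
```

The pattern is anchored with `\A` and `\z`, so it describes exactly one character; a string holding two characters does not match.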

Class Attribute Summary

Class Method Summary

Class Attribute Details

.proxy_class ⇒ Object

The proxy class returned when calling mb_chars. Use this accessor to configure your own proxy class in order to support other encodings. See the Mail::Multibyte::Chars implementation for an example of how to do this.

Example:

Mail::Multibyte.proxy_class = CharsForUTF32


# File 'lib/mail/multibyte.rb', line 17

def proxy_class
  @proxy_class
end
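
The following is a minimal sketch of how a class-level accessor of this kind can be used to swap in a custom proxy. `MultibyteSketch` stands in for Mail::Multibyte and `UpcasingChars` is a made-up proxy class; both are assumptions for illustration, and the real module's writer may be defined differently.

```ruby
# Sketch only: MultibyteSketch stands in for Mail::Multibyte, and
# UpcasingChars is a hypothetical proxy class invented for this example.
module MultibyteSketch
  class << self
    # Class-level reader and writer, analogous to the proxy_class accessor.
    attr_accessor :proxy_class
  end
end

class UpcasingChars
  def initialize(string)
    @wrapped = string
  end

  # A proxy decides how the wrapped string is presented.
  def to_s
    @wrapped.upcase
  end
end

MultibyteSketch.proxy_class = UpcasingChars
proxy = MultibyteSketch.proxy_class.new("mail")
puts proxy.to_s
```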

Class Method Details

.clean(string) ⇒ Object

Removes all invalid characters from the string.

Note: this method is a no-op in Ruby 1.9 and newer; it returns the string unchanged.



# File 'lib/mail/multibyte/utils.rb', line 36

def self.clean(string)
  string
end

.is_utf8?(string) ⇒ Boolean

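Returns true if the string has a valid UTF-8 encoding. Strings tagged as ASCII-8BIT or US-ASCII are reinterpreted as UTF-8 before being checked; strings in any other encoding return false.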

Returns:

  • (Boolean)


# File 'lib/mail/multibyte/utils.rb', line 12

def self.is_utf8?(string)
  case string.encoding
  when Encoding::UTF_8
    verify(string)
  when Encoding::ASCII_8BIT, Encoding::US_ASCII
    verify(to_utf8(string))
  else
    false
  end
end
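
To make the three branches concrete, here is a standalone sketch that mirrors the logic above using only core String methods. `Utf8CheckSketch` is a hypothetical module name, and the sample strings are invented for the example.

```ruby
# Sketch only: Utf8CheckSketch mirrors the branching of is_utf8? using core
# String methods; it is not the Mail::Multibyte module itself.
module Utf8CheckSketch
  def self.is_utf8?(string)
    case string.encoding
    when Encoding::UTF_8
      string.valid_encoding?
    when Encoding::ASCII_8BIT, Encoding::US_ASCII
      # Reinterpret a copy of the bytes as UTF-8, then validate.
      string.dup.force_encoding(Encoding::UTF_8).valid_encoding?
    else
      false
    end
  end
end

utf8   = "M\u00FCller"                       # tagged UTF-8, valid bytes
binary = utf8.b                              # same bytes, tagged ASCII-8BIT
latin1 = utf8.encode(Encoding::ISO_8859_1)   # transcoded to another encoding
broken = "\xFF".b                            # a byte that is never valid in UTF-8

[utf8, binary, latin1, broken].each do |s|
  puts "#{s.encoding}: #{Utf8CheckSketch.is_utf8?(s)}"
end
```

Note that the ISO-8859-1 string is rejected because of its encoding tag alone; its bytes are never inspected.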

.mb_chars(str) ⇒ Object

Multibyte proxy

mb_chars is a multibyte safe proxy for string methods.

In Ruby 1.8 and older it creates and returns an instance of the Mail::Multibyte::Chars class, which encapsulates the original string. Unicode-safe versions of all the String methods are defined on this proxy class. If the proxy class doesn’t respond to a certain method, the call is forwarded to the encapsulated string.

name = 'Claus Müller'
name.reverse # => "rell??M sualC"
name.length  # => 13

name.mb_chars.reverse.to_s # => "rellüM sualC"
name.mb_chars.length       # => 12

In Ruby 1.9 and newer mb_chars returns self because String is (mostly) encoding aware. This means that it becomes easy to run one version of your code on multiple Ruby versions.

Method chaining

All the methods on the Chars proxy which normally return a string will return a Chars object. This allows method chaining on the result of any of these methods.

name.mb_chars.reverse.length # => 12

Interoperability and configuration

The Chars object tries to be as interchangeable with String objects as possible: sorting and comparing between String and Chars work as expected. The bang! methods change the internal string representation in the Chars object. Interoperability problems can be resolved easily with a to_s call.

For more information about the methods defined on the Chars proxy see Mail::Multibyte::Chars. For information about how to change the default Multibyte behaviour see Mail::Multibyte.



# File 'lib/mail/multibyte.rb', line 55

def self.mb_chars(str)
  if is_utf8?(str)
    proxy_class.new(str)
  else
    str
  end
end
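
The forwarding and chaining behaviour described above can be sketched with a small proxy built on `method_missing`. `CharsSketch` and `mb_chars_sketch` are hypothetical names; this is not the Mail::Multibyte::Chars implementation, and the UTF-8 check is simplified to the UTF-8 branch only.

```ruby
# Sketch only: CharsSketch is a hypothetical stand-in for a Chars-style proxy.
class CharsSketch
  def initialize(string)
    @wrapped_string = string
  end

  def to_s
    @wrapped_string
  end

  # Forward unknown methods to the wrapped string. String results are
  # wrapped again so that further calls can be chained on the proxy.
  def method_missing(name, *args, &block)
    return super unless @wrapped_string.respond_to?(name)

    result = @wrapped_string.public_send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped_string.respond_to?(name, include_private) || super
  end
end

# Simplified assumption: only strings tagged UTF-8 with valid bytes are wrapped.
def mb_chars_sketch(str)
  utf8 = str.encoding == Encoding::UTF_8 && str.valid_encoding?
  utf8 ? CharsSketch.new(str) : str
end

name  = "Claus M\u00FCller"
chars = mb_chars_sketch(name)
puts chars.reverse.to_s
puts chars.reverse.length
```

Calling `to_s` at the end of a chain hands back a plain String, which is the usual way out when another API insists on one.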

.to_utf8(string) ⇒ Object

Returns a copy of the string tagged as UTF-8. The bytes are not transcoded, and the original string is left unchanged.



# File 'lib/mail/multibyte/utils.rb', line 40

def self.to_utf8(string)
  string.dup.force_encoding(Encoding::UTF_8)
end
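
A short sketch of what the `dup.force_encoding` combination does in practice: the bytes stay the same, only the encoding tag on the copy changes. `to_utf8_sketch` is a hypothetical standalone copy of the method body above, and the sample string is invented.

```ruby
# Sketch only: to_utf8_sketch repeats the method body shown above so the
# example runs without the mail gem.
def to_utf8_sketch(string)
  string.dup.force_encoding(Encoding::UTF_8)
end

raw  = "M\xC3\xBCller".b        # seven bytes, tagged as binary
utf8 = to_utf8_sketch(raw)

puts raw.length                 # counts bytes
puts utf8.length                # counts characters
puts raw.bytes == utf8.bytes    # the underlying bytes are identical

# Because the method works on a copy, frozen input is acceptable too.
frozen = "abc".freeze
puts to_utf8_sketch(frozen).encoding
```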

.valid_character ⇒ Object

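Returns the regular expression from VALID_CHARACTER that matches valid characters in the current default external encoding (Encoding.default_external). Returns nil if VALID_CHARACTER has no entry for that encoding.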



# File 'lib/mail/multibyte/utils.rb', line 7

def self.valid_character
  VALID_CHARACTER[Encoding.default_external.to_s]
end

.verify(string) ⇒ Object

Verifies the encoding of a string



# File 'lib/mail/multibyte/utils.rb', line 24

def self.verify(string)
  string.valid_encoding?
end

.verify!(string) ⇒ Object

Verifies the encoding of the string and raises an exception when it’s not valid

Raises:

  • (EncodingError)


# File 'lib/mail/multibyte/utils.rb', line 29

def self.verify!(string)
  raise EncodingError.new("Found characters with invalid encoding") unless verify(string)
end
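
A caller would typically wrap verify! in a rescue. The sketch below shows that pattern with a standalone module; `VerifySketch` and its nested `EncodingError` are stand-ins defined only for this example, on the assumption that the real error class is a StandardError subclass.

```ruby
# Sketch only: VerifySketch and its EncodingError are stand-ins so the
# example runs without the mail gem.
module VerifySketch
  class EncodingError < StandardError; end

  def self.verify(string)
    string.valid_encoding?
  end

  def self.verify!(string)
    raise EncodingError, "Found characters with invalid encoding" unless verify(string)
  end
end

begin
  # "\xFF" can never appear in well-formed UTF-8, so this string is invalid.
  VerifySketch.verify!("abc\xFF")
rescue VerifySketch::EncodingError => e
  puts "rejected: #{e.message}"
end
```

When the string is valid, this sketch returns nil rather than the string, so the return value should not be relied on.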