Module: Mail::Multibyte
- Defined in:
- lib/mail/multibyte/chars.rb,
lib/mail/multibyte.rb,
lib/mail/multibyte/utils.rb,
lib/mail/multibyte/unicode.rb
Overview
:nodoc:
Defined Under Namespace
Modules: Unicode
Classes: Chars, EncodingError
Constant Summary
- VALID_CHARACTER =
Regular expressions that describe valid byte sequences for a character
{
  # Borrowed from the Kconv library by Shinji KONO - (also as seen on the W3C site)
  'UTF-8' => /\A(?:
    [\x00-\x7f]                                     |
    [\xc2-\xdf] [\x80-\xbf]                         |
    \xe0        [\xa0-\xbf] [\x80-\xbf]             |
    [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]             |
    \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf] |
    [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf] |
    \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z /xn,
  # Quick check for valid Shift-JIS characters, disregards the odd-even pairing
  'Shift_JIS' => /\A(?:
    [\x00-\x7e\xa1-\xdf] |
    [\x81-\x9f\xe0-\xef] [\x40-\x7e\x80-\x9e\x9f-\xfc])\z /xn
}
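Each expression is anchored with \A and \z, so it describes exactly one character's byte sequence rather than a whole string. The sketch below copies the UTF-8 pattern into a hypothetical module (`CharacterCheck` is not part of the mail gem) and matches against binary copies of the input, on the assumption that a pattern compiled with the `n` flag should be compared with byte strings.

```ruby
# Hypothetical helper module; only the pattern itself comes from the listing above.
module CharacterCheck
  UTF8_CHARACTER = /\A(?:
    [\x00-\x7f]                                     |
    [\xc2-\xdf] [\x80-\xbf]                         |
    \xe0        [\xa0-\xbf] [\x80-\xbf]             |
    [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]             |
    \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf] |
    [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf] |
    \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z/xn

  # True when the given bytes form exactly one well-formed UTF-8 character.
  def self.valid?(bytes)
    # String#b returns a binary (ASCII-8BIT) copy, compatible with the /n pattern.
    !(UTF8_CHARACTER =~ bytes.b).nil?
  end
end

puts CharacterCheck.valid?("a")          # single ASCII byte
puts CharacterCheck.valid?("\xC3\xA9")   # two-byte sequence for "é"
puts CharacterCheck.valid?("\xC0\x80")   # overlong encoding, rejected
puts CharacterCheck.valid?("ab")         # two characters, rejected
```

Because the pattern covers a single character, validating a longer string this way would mean checking it character by character.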
Class Attribute Summary
-
.proxy_class ⇒ Object
The proxy class returned when calling mb_chars.
Class Method Summary
-
.clean(string) ⇒ Object
Removes all invalid characters from the string.
-
.is_utf8?(string) ⇒ Boolean
Returns true if string has valid UTF-8 encoding.
-
.mb_chars(str) ⇒ Object
Multibyte proxy.
- .to_utf8(string) ⇒ Object
-
.valid_character ⇒ Object
Returns a regular expression that matches valid characters in the current encoding.
-
.verify(string) ⇒ Object
Verifies the encoding of a string.
-
.verify!(string) ⇒ Object
Verifies the encoding of the string and raises an exception when it’s not valid.
Class Attribute Details
.proxy_class ⇒ Object
The proxy class returned when calling mb_chars. You can use this accessor to configure your own proxy class so you can support other encodings. See the Mail::Multibyte::Chars implementation for an example of how to do this.
Example:
Mail::Multibyte.proxy_class = CharsForUTF32
# File 'lib/mail/multibyte.rb', line 17
def proxy_class
  @proxy_class
end
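The accessor can be exercised with a small stand-in. The sketch below is illustrative only: `MultibyteSketch` and `ShoutingChars` are hypothetical names, not part of the mail gem, and the real module may differ in detail.

```ruby
# Stand-in for Mail::Multibyte; every name in this block is hypothetical.
module MultibyteSketch
  # Default proxy: wraps a string and exposes it via to_s.
  class Chars
    def initialize(string)
      @wrapped_string = string
    end

    def to_s
      @wrapped_string
    end
  end

  class << self
    # Reader and writer for the configurable proxy class.
    attr_accessor :proxy_class
  end
  self.proxy_class = Chars

  # Wraps the string in whichever proxy class is currently configured.
  def self.mb_chars(str)
    proxy_class.new(str)
  end
end

# Custom proxy that adds one extra method on top of the default.
class ShoutingChars < MultibyteSketch::Chars
  def shout
    to_s.upcase + "!"
  end
end

MultibyteSketch.proxy_class = ShoutingChars
puts MultibyteSketch.mb_chars("hello").shout # => HELLO!
```

Subclassing the default proxy keeps the existing behaviour and only adds what the new encoding or use case needs.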
Class Method Details
.clean(string) ⇒ Object
Removes all invalid characters from the string.
Note: this method is a no-op in Ruby 1.9
# File 'lib/mail/multibyte/utils.rb', line 36
def self.clean(string)
  string
end
.is_utf8?(string) ⇒ Boolean
Returns true if string has valid UTF-8 encoding.
# File 'lib/mail/multibyte/utils.rb', line 12
def self.is_utf8?(string)
  case string.encoding
  when Encoding::UTF_8
    verify(string)
  when Encoding::ASCII_8BIT, Encoding::US_ASCII
    verify(to_utf8(string))
  else
    false
  end
end
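The three branches behave differently depending on how the string is tagged, not only on its bytes. The sketch below repeats the logic of is_utf8?, verify and to_utf8 inside a hypothetical `Utf8Check` module so that it runs without the mail gem; the sample strings are made up for illustration.

```ruby
# Stand-in module (hypothetical name) mirroring the listed source.
module Utf8Check
  def self.verify(string)
    string.valid_encoding?
  end

  # Relabels a copy of the bytes as UTF-8 without transcoding them.
  def self.to_utf8(string)
    string.dup.force_encoding(Encoding::UTF_8)
  end

  def self.is_utf8?(string)
    case string.encoding
    when Encoding::UTF_8
      verify(string)
    when Encoding::ASCII_8BIT, Encoding::US_ASCII
      verify(to_utf8(string))
    else
      false
    end
  end
end

valid  = "Müller"                                         # tagged UTF-8, valid bytes
binary = "M\xC3\xBCller".b                                # same bytes, tagged binary
broken = "M\xFCller".dup.force_encoding(Encoding::UTF_8)  # Latin-1 byte tagged as UTF-8
latin1 = "Müller".encode(Encoding::ISO_8859_1)            # valid, but a different encoding

puts Utf8Check.is_utf8?(valid)   # => true
puts Utf8Check.is_utf8?(binary)  # => true
puts Utf8Check.is_utf8?(broken)  # => false
puts Utf8Check.is_utf8?(latin1)  # => false
```

The last case shows that a string in any other encoding is reported as not UTF-8 even when its own encoding is valid.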
.mb_chars(str) ⇒ Object
Multibyte proxy.
mb_chars is a multibyte-safe proxy for string methods.
In Ruby 1.8 and older it creates and returns an instance of the Mail::Multibyte::Chars class, which encapsulates the original string. Unicode-safe versions of all the String methods are defined on this proxy class. If the proxy class doesn’t respond to a certain method, the call is forwarded to the encapsulated string.
name = 'Claus Müller'
name.reverse # => "rell??M sualC"
name.length # => 13
name.mb_chars.reverse.to_s # => "rellüM sualC"
name.mb_chars.length # => 12
In Ruby 1.9 and newer, mb_chars returns self because String is (mostly) encoding aware. This means that it becomes easy to run one version of your code on multiple Ruby versions.
Method chaining
All the methods on the Chars proxy which normally return a string will return a Chars object. This allows method chaining on the result of any of these methods.
name.mb_chars.reverse.length # => 12
Interoperability and configuration
The Chars object tries to be as interchangeable with String objects as possible: sorting and comparing between String and Chars objects work as expected. The bang! methods change the internal string representation in the Chars object. Interoperability problems can be resolved easily with a to_s call.
For more information about the methods defined on the Chars proxy see Mail::Multibyte::Chars. For information about how to change the default Multibyte behaviour see Mail::Multibyte.
# File 'lib/mail/multibyte.rb', line 55
def self.mb_chars(str)
  if is_utf8?(str)
    proxy_class.new(str)
  else
    str
  end
end
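The forwarding and chaining behaviour described above can be approximated with method_missing. This is a minimal sketch under stated assumptions: `CharsSketch` is a hypothetical class, not the Mail::Multibyte::Chars implementation, and it relies on the encoding-aware String of Ruby 1.9 and newer instead of defining its own Unicode-safe methods.

```ruby
# Hypothetical stand-in for a Chars-style proxy.
class CharsSketch
  def initialize(string)
    @wrapped_string = string
  end

  # Forward unknown methods to the wrapped string. String results are
  # wrapped again so calls can be chained; other results pass through.
  def method_missing(name, *args, &block)
    return super unless @wrapped_string.respond_to?(name)

    result = @wrapped_string.public_send(name, *args, &block)
    result.is_a?(String) ? self.class.new(result) : result
  end

  def respond_to_missing?(name, include_private = false)
    @wrapped_string.respond_to?(name, include_private) || super
  end

  # Unwraps the proxy for interoperability with plain String code.
  def to_s
    @wrapped_string
  end
end

name = CharsSketch.new("Claus Müller")
puts name.reverse.to_s    # => rellüM sualC
puts name.reverse.length  # => 12
puts name.reverse.class   # => CharsSketch
```

Wrapping only String results is what makes chaining work while still letting values such as lengths come back as plain integers.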
.to_utf8(string) ⇒ Object
# File 'lib/mail/multibyte/utils.rb', line 40
def self.to_utf8(string)
  string.dup.force_encoding(Encoding::UTF_8)
end
.valid_character ⇒ Object
Returns a regular expression that matches valid characters in the current encoding
# File 'lib/mail/multibyte/utils.rb', line 7
def self.valid_character
  VALID_CHARACTER[Encoding.default_external.to_s]
end
.verify(string) ⇒ Object
Verifies the encoding of a string
# File 'lib/mail/multibyte/utils.rb', line 24
def self.verify(string)
  string.valid_encoding?
end
.verify!(string) ⇒ Object
Verifies the encoding of the string and raises an exception when it’s not valid
# File 'lib/mail/multibyte/utils.rb', line 29
def self.verify!(string)
  raise EncodingError.new("Found characters with invalid encoding") unless verify(string)
end
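A caller would typically rescue the exception and decide how to recover. The sketch below assumes a stand-in `VerifySketch` module with its own EncodingError class, and a hypothetical `verified_or_scrubbed` helper that falls back to String#scrub; none of these names belong to the mail gem.

```ruby
# Stand-in module (hypothetical name) mirroring verify and verify!.
module VerifySketch
  # Stand-in for Mail::Multibyte::EncodingError.
  class EncodingError < StandardError; end

  def self.verify(string)
    string.valid_encoding?
  end

  def self.verify!(string)
    raise EncodingError, "Found characters with invalid encoding" unless verify(string)
  end

  # Hypothetical helper: returns the string unchanged when it is valid,
  # otherwise a copy with each invalid byte replaced by "?".
  def self.verified_or_scrubbed(string)
    verify!(string)
    string
  rescue EncodingError
    string.scrub("?")
  end
end

broken = "M\xFCller".dup.force_encoding(Encoding::UTF_8)

puts VerifySketch.verified_or_scrubbed("Müller")  # => Müller
puts VerifySketch.verified_or_scrubbed(broken)    # => M?ller
```

Scrubbing is only one possible recovery; rejecting the input or transcoding from a known source encoding may suit other callers better.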