Module: Net::IMAP::ResponseParser::Patterns::RFC3629

Included in:
Net::IMAP::ResponseParser::Patterns
Defined in:
lib/net/imap/response_parser.rb

Overview

UTF-8, a transformation format of ISO 10646

UTF8-1      = %x00-7F
UTF8-tail   = %x80-BF
UTF8-2      = %xC2-DF UTF8-tail
UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
              %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
              %xF4 %x80-8F 2( UTF8-tail )
UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
UTF8-octets = *( UTF8-char )
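The less obvious range boundaries in this grammar (UTF8-2 starting at %xC2, %xE0 requiring %xA0-BF next, %xF4 stopping at %x8F) follow from where each encoding length begins and ends. A minimal sketch that encodes the boundary code points with Ruby's Integer#chr and prints their bytes; the helper utf8_hex and the BOUNDARIES table are illustrative names, not part of net-imap.

```ruby
# Hypothetical helper (not part of net-imap): the UTF-8 bytes of one
# code point, as space-separated uppercase hex.
def utf8_hex(codepoint)
  codepoint.chr(Encoding::UTF_8).bytes.map { |b| format("%02X", b) }.join(" ")
end

# Code points at the edges of each ABNF rule. Surrogates (U+D800-DFFF)
# are not Unicode scalar values, so they are left out of the table.
BOUNDARIES = {
  0x7F     => "UTF8-1: last 1-byte code point",
  0x80     => "UTF8-2: first 2-byte code point (lead byte C2, not C0)",
  0x7FF    => "UTF8-2: last 2-byte code point",
  0x800    => "UTF8-3: first 3-byte code point (E0 needs A0-BF next)",
  0xD7FF   => "UTF8-3: last code point before the surrogates (ED 80-9F)",
  0xE000   => "UTF8-3: first code point after the surrogates",
  0x10000  => "UTF8-4: first 4-byte code point (F0 needs 90-BF next)",
  0x10FFFF => "UTF8-4: last Unicode code point (F4 stops at 8F)",
}

BOUNDARIES.each do |cp, note|
  puts format("U+%04X  %-11s  %s", cp, utf8_hex(cp), note)
end
```

Each excluded prefix in the ABNF corresponds to something outside these boundaries: %xC0-C1, %xE0 %x80-9F and %xF0 %x80-8F would only produce overlong encodings, %xED %xA0-BF would encode the surrogates U+D800-DFFF, and %xF4 %x90-BF or any lead byte above %xF4 would go past U+10FFFF.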

n.b. String * Integer is used for repetition, rather than /x{3}/, because Ruby 3.2’s linear-time cache-based optimization doesn’t work with “bounded or fixed times repetition nesting in another repetition (e.g. /(a{2,3})*/). It is an implementation issue entirely, but we believe it is hard to support this case correctly.” See bugs.ruby-lang.org/issues/19104
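To make the note concrete, here is a minimal sketch of the two spellings of one pattern: the tail repeated with String#* (the approach used in this module) and a {3} quantifier nested inside an outer *. Both accept exactly the same byte strings; the note is only about whether Ruby 3.2's match cache can optimize the second form. TAIL, expanded, bounded and samples are illustrative names. Regexp.linear_time? is only available from Ruby 3.3, and its answers may differ between Ruby versions, so the sketch prints them instead of asserting them.

```ruby
# Illustrative names only: TAIL mirrors UTF8_TAIL; expanded and bounded are
# two spellings of "zero or more %xF1-F3 3( UTF8-tail ) sequences".
TAIL = /[\x80-\xBF]/n

# Tail repeated with String#*, so the regexp contains no bounded quantifier.
expanded = /\A(?:[\xF1-\xF3]#{TAIL.source * 3})*\z/n

# Same grammar with {3} nested inside an outer (...)*: the shape the quoted
# Ruby issue describes as unsupported by the 3.2 match cache.
bounded = /\A(?:[\xF1-\xF3](?:#{TAIL.source}){3})*\z/n

samples = [
  "",                                  # zero sequences
  "\xF1\x80\x80\x80",                  # one complete sequence
  "\xF1\x80\x80",                      # truncated: only two tail bytes
  "\xF1\x80\x80\x80\xF3\xBF\xBF\xBF",  # two complete sequences
  "\xF1\x80\x80\x80\x80",              # one stray extra tail byte
].map(&:b)

# Both spellings accept exactly the same inputs.
samples.each do |s|
  unless expanded.match?(s) == bounded.match?(s)
    raise "patterns disagree on #{s.bytes.inspect}"
  end
end

# Regexp.linear_time? exists from Ruby 3.3 on; print its verdicts rather
# than assume them, since they can change between Ruby versions.
if Regexp.respond_to?(:linear_time?)
  puts "expanded: linear_time? #{Regexp.linear_time?(expanded)}"
  puts "bounded:  linear_time? #{Regexp.linear_time?(bounded)}"
end
```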

Constant Summary

UTF8_1 =
aka 7-bit ASCII
/[\x00-\x7f]/n
UTF8_TAIL =
/[\x80-\xBF]/n
UTF8_2 =
/[\xC2-\xDF]#{UTF8_TAIL}/n
UTF8_3 =
Regexp.union(/\xE0[\xA0-\xBF]#{UTF8_TAIL}/n,
/\xED[\x80-\x9F]#{UTF8_TAIL}/n,
/[\xE1-\xEC]#{    UTF8_TAIL.source * 2}/n,
/[\xEE-\xEF]#{    UTF8_TAIL.source * 2}/n)
UTF8_4 =
Regexp.union(/[\xF1-\xF3]#{    UTF8_TAIL.source * 3}/n,
/\xF0[\x90-\xBF]#{UTF8_TAIL.source * 2}/n,
/\xF4[\x80-\x8F]#{UTF8_TAIL.source * 2}/n)
UTF8_CHAR =
Regexp.union(UTF8_1, UTF8_2, UTF8_3, UTF8_4)
UTF8_OCTETS =
/#{UTF8_CHAR}*/n
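The constants can be exercised outside Net::IMAP by copying them into a standalone module. The sketch below does that and adds the two anchorings a caller typically needs: \A…\z to check that a whole string is well-formed UTF-8, and \A alone to find its longest well-formed prefix. The module name RFC3629Sketch and its helpers (WELL_FORMED, PREFIX, well_formed?, chars, valid_prefix) are hypothetical, not net-imap API. Inputs go through String#b so the binary (/n) patterns are matched against raw bytes.

```ruby
# Hypothetical standalone copy of the constants above (not net-imap API).
module RFC3629Sketch
  UTF8_1    = /[\x00-\x7f]/n
  UTF8_TAIL = /[\x80-\xBF]/n
  UTF8_2    = /[\xC2-\xDF]#{UTF8_TAIL}/n
  UTF8_3    = Regexp.union(/\xE0[\xA0-\xBF]#{UTF8_TAIL}/n,
                           /\xED[\x80-\x9F]#{UTF8_TAIL}/n,
                           /[\xE1-\xEC]#{UTF8_TAIL.source * 2}/n,
                           /[\xEE-\xEF]#{UTF8_TAIL.source * 2}/n)
  UTF8_4    = Regexp.union(/[\xF1-\xF3]#{UTF8_TAIL.source * 3}/n,
                           /\xF0[\x90-\xBF]#{UTF8_TAIL.source * 2}/n,
                           /\xF4[\x80-\x8F]#{UTF8_TAIL.source * 2}/n)
  UTF8_CHAR   = Regexp.union(UTF8_1, UTF8_2, UTF8_3, UTF8_4)
  UTF8_OCTETS = /#{UTF8_CHAR}*/n

  # UTF8_OCTETS also matches the empty string, so an unanchored match
  # succeeds on any input; anchor both ends to require every byte to be
  # part of some UTF8-char.
  WELL_FORMED = /\A#{UTF8_OCTETS}\z/n
  # Anchored only at the start: the longest well-formed prefix.
  PREFIX      = /\A#{UTF8_OCTETS}/n

  module_function

  def well_formed?(str)
    WELL_FORMED.match?(str.b)
  end

  # Split into one binary string per UTF8-char. String#scan skips bytes
  # that match no alternative, so only use this after well_formed?.
  def chars(str)
    str.b.scan(UTF8_CHAR)
  end

  def valid_prefix(str)
    str.b[PREFIX]
  end
end

p RFC3629Sketch.well_formed?("caf\u00e9")          # => true  (C3 A9 is a UTF8-2)
p RFC3629Sketch.well_formed?("\xC0\x80")           # => false (overlong NUL)
p RFC3629Sketch.well_formed?("\xED\xA0\x80")       # => false (surrogate U+D800)
p RFC3629Sketch.well_formed?("\xF4\x90\x80\x80")   # => false (above U+10FFFF)
p RFC3629Sketch.chars("A\u00e9\u20ac\u{1F600}").map(&:bytesize) # => [1, 2, 3, 4]
p RFC3629Sketch.valid_prefix("ok\xFFrest")         # => "ok"
```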