Module: Net::IMAP::ResponseParser::Patterns::RFC3629
- Included in:
- Net::IMAP::ResponseParser::Patterns
- Defined in:
- lib/net/imap/response_parser.rb
Overview
UTF-8, a transformation format of ISO 10646
  UTF8-1      = %x00-7F
  UTF8-tail   = %x80-BF
  UTF8-2      = %xC2-DF UTF8-tail
  UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
  UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                %xF4 %x80-8F 2( UTF8-tail )
  UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
  UTF8-octets = *( UTF8-char )
n.b. String * Integer is used for repetition, rather than /x{3}/, because Ruby 3.2’s linear-time cache-based optimization doesn’t work with “bounded or fixed times repetition nesting in another repetition (e.g. /(a{2,3})*/). It is an implementation issue entirely, but we believe it is hard to support this case correctly.” See https://bugs.ruby-lang.org/issues/19104
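The note above can be illustrated with a minimal sketch. The variable names (`tail`, `quantified`, `expanded`) are hypothetical and chosen only for this example. It builds the same nested pattern twice, once with a bounded quantifier and once by repeating the source string, and checks that both accept the same inputs. The `Regexp.linear_time?` call is guarded because it is only available on newer Ruby versions, and its result is printed rather than assumed.

```ruby
# Hypothetical names for illustration; not part of net-imap.
tail = /[\x80-\xBF]/n

# Fixed repetition written as a bounded quantifier, nested in an outer "*".
quantified = /\A(?:[\xE1-\xEC]#{tail.source}{2})*\z/n

# The same language, with the repetition expanded by String * Integer.
expanded = /\A(?:[\xE1-\xEC]#{tail.source * 2})*\z/n

# Binary strings, so the /n (ASCII-8BIT) patterns match raw bytes.
samples = [
  "",                         # zero characters
  "\xE1\x80\x80",             # one three-byte sequence
  "\xE1\x80\x80\xEC\xBF\xBF", # two three-byte sequences
  "\xE1\x80",                 # truncated sequence
  "\xE1\x80\x80\x80",         # stray trailing byte
].map(&:b)

results = samples.map { |s| [quantified.match?(s), expanded.match?(s)] }
results.each_with_index do |(q, e), i|
  puts "sample #{i}: quantified=#{q} expanded=#{e}"
end

# Report whether each form qualifies for the linear-time optimization.
if Regexp.respond_to?(:linear_time?)
  puts "quantified linear_time? #{Regexp.linear_time?(quantified)}"
  puts "expanded   linear_time? #{Regexp.linear_time?(expanded)}"
end
```

Both forms describe the same set of byte strings; the difference discussed in the note is only whether the regexp engine can apply its optimization.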
Constant Summary
- UTF8_1 =
aka ASCII 7bit
/[\x00-\x7f]/n
- UTF8_TAIL =
/[\x80-\xBF]/n
- UTF8_2 =
/[\xC2-\xDF]#{UTF8_TAIL}/n
- UTF8_3 =
Regexp.union(/\xE0[\xA0-\xBF]#{UTF8_TAIL}/n,
             /\xED[\x80-\x9F]#{UTF8_TAIL}/n,
             /[\xE1-\xEC]#{UTF8_TAIL.source * 2}/n,
             /[\xEE-\xEF]#{UTF8_TAIL.source * 2}/n)
- UTF8_4 =
Regexp.union(/[\xF1-\xF3]#{UTF8_TAIL.source * 3}/n,
             /\xF0[\x90-\xBF]#{UTF8_TAIL.source * 2}/n,
             /\xF4[\x80-\x8F]#{UTF8_TAIL.source * 2}/n)
- UTF8_CHAR =
Regexp.union(UTF8_1, UTF8_2, UTF8_3, UTF8_4)
- UTF8_OCTETS =
/#{UTF8_CHAR}*/n
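As a rough usage sketch, the constants listed above can be combined to check whether a byte string is well-formed UTF-8. The module name `RFC3629Sketch`, the `WHOLE_STRING` constant, and the `well_formed?` helper are assumptions made for this example; the pattern definitions are copied from the summary so the snippet runs without loading net-imap. This is not how the response parser itself uses these patterns.

```ruby
# Hypothetical module: constants copied from the summary above so the sketch
# is self-contained. It is not part of net-imap.
module RFC3629Sketch
  UTF8_1    = /[\x00-\x7f]/n
  UTF8_TAIL = /[\x80-\xBF]/n
  UTF8_2    = /[\xC2-\xDF]#{UTF8_TAIL}/n
  UTF8_3    = Regexp.union(/\xE0[\xA0-\xBF]#{UTF8_TAIL}/n,
                           /\xED[\x80-\x9F]#{UTF8_TAIL}/n,
                           /[\xE1-\xEC]#{UTF8_TAIL.source * 2}/n,
                           /[\xEE-\xEF]#{UTF8_TAIL.source * 2}/n)
  UTF8_4    = Regexp.union(/[\xF1-\xF3]#{UTF8_TAIL.source * 3}/n,
                           /\xF0[\x90-\xBF]#{UTF8_TAIL.source * 2}/n,
                           /\xF4[\x80-\x8F]#{UTF8_TAIL.source * 2}/n)
  UTF8_CHAR   = Regexp.union(UTF8_1, UTF8_2, UTF8_3, UTF8_4)
  UTF8_OCTETS = /#{UTF8_CHAR}*/n

  # Anchored (an assumption of this sketch) so the whole input must match,
  # not just a prefix.
  WHOLE_STRING = /\A#{UTF8_OCTETS}\z/n

  module_function

  # Hypothetical helper: true when every byte belongs to a UTF8-char.
  def well_formed?(string)
    WHOLE_STRING.match?(string.b) # .b: match raw bytes against /n patterns
  end
end

samples = {
  "ASCII"               => "abc",
  "two-byte char"       => "h\u00E9llo",
  "four-byte char"      => "\xF0\x9F\x98\x80",
  "overlong encoding"   => "\xC0\x80",
  "UTF-16 surrogate"    => "\xED\xA0\x80",
  "beyond U+10FFFF"     => "\xF4\x90\x80\x80",
  "truncated sequence"  => "\xE2\x82",
}

samples.each do |label, bytes|
  puts "#{label}: #{RFC3629Sketch.well_formed?(bytes)}"
end
```

The last four samples are rejected because the grammar excludes them by construction: no rule starts with %xC0, %xED only allows %x80-9F as its second octet, %xF4 only allows %x80-8F, and a lead octet without its full tail matches no rule.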