Method: String#unicode_normalize
- Defined in:
- string.c
#unicode_normalize(form = :nfc) ⇒ Object
Unicode Normalization—Returns a normalized form of str
, using Unicode normalizations NFC, NFD, NFKC, or NFKD. The normalization form used is determined by form
, which can be any of the four values :nfc
, :nfd
, :nfkc
, or :nfkd
. The default is :nfc
.
If the string is not in a Unicode Encoding, then an Exception is raised. In this context, ‘Unicode Encoding’ means any of UTF-8, UTF-16BE/LE, and UTF-32BE/LE, as well as GB18030, UCS_2BE, and UCS_4BE. Anything other than UTF-8 is implemented by converting to UTF-8, which makes it slower than UTF-8.
"a\u0300".unicode_normalize #=> "\u00E0"
"a\u0300".unicode_normalize(:nfc) #=> "\u00E0"
"\u00E0".unicode_normalize(:nfd) #=> "a\u0300"
"\xE0".force_encoding('ISO-8859-1').unicode_normalize(:nfd)
#=> Encoding::CompatibilityError raised
10870 10871 10872 10873 10874 |
# File 'string.c', line 10870
static VALUE
rb_str_unicode_normalize(int argc, VALUE *argv, VALUE str)
{
return unicode_normalize_common(argc, argv, str, id_normalize);
}
|