Class: UnicodeUtils::Codepoint

Inherits:
Object
Defined in:
lib/unicode_utils/codepoint.rb

Overview

A Codepoint instance represents a single Unicode code point.

UnicodeUtils::Codepoint.new(0x20ac) => #<U+20AC "€" EURO SIGN utf8:e2,82,ac>

Constant Summary

RANGE = 0..0x10FFFF

The Unicode codespace. Any integer in this range is a Unicode code point.


Constructor Details

#initialize(int) ⇒ Codepoint

Create a Codepoint instance that wraps the given Integer. int must be in Codepoint::RANGE; otherwise an ArgumentError is raised.



# File 'lib/unicode_utils/codepoint.rb', line 18

def initialize(int)
  unless RANGE.include?(int)
    raise ArgumentError, "#{int} not in codespace"
  end
  @int = int
end
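
The range check can be tried without installing the gem. The following is a minimal sketch that copies the constructor body shown above into a hypothetical stand-in class, `CodepointSketch`, together with a hypothetical helper, `valid_codepoint?`. Neither name is part of UnicodeUtils; they exist only to show which integers the constructor accepts.

```ruby
# Hypothetical stand-in mirroring the constructor shown above.
# It is not the UnicodeUtils::Codepoint class itself.
class CodepointSketch
  RANGE = 0..0x10FFFF

  def initialize(int)
    unless RANGE.include?(int)
      raise ArgumentError, "#{int} not in codespace"
    end
    @int = int
  end

  def ord
    @int
  end
end

# Hypothetical helper: true if the constructor accepts the integer.
def valid_codepoint?(int)
  CodepointSketch.new(int)
  true
rescue ArgumentError
  false
end

valid_codepoint?(0)        # => true  (lower bound of the codespace)
valid_codepoint?(0x10FFFF) # => true  (upper bound of the codespace)
valid_codepoint?(0x110000) # => false (one past the upper bound)
valid_codepoint?(-1)       # => false
```

Both ends of RANGE are inclusive, so 0 and 0x10FFFF are accepted.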

Instance Method Details

#hexbytes ⇒ Object

Get the bytes used to encode this code point in UTF-8, hex-formatted.

Codepoint.new(0xe4).hexbytes => "c3,a4"


# File 'lib/unicode_utils/codepoint.rb', line 55

def hexbytes
  to_s.bytes.map { |b| sprintf("%02x", b) }.join(",")
end
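
To see how the output grows with the length of the UTF-8 encoding, here is a small sketch that applies the same expression to a plain Integer. The function name `utf8_hexbytes` is an assumption made for illustration and is not part of the library.

```ruby
# Hypothetical free-standing version of the method body shown above.
def utf8_hexbytes(int)
  # Encode the code point as UTF-8, then format each byte as two hex digits.
  int.chr(Encoding::UTF_8).bytes.map { |b| sprintf("%02x", b) }.join(",")
end

utf8_hexbytes(0x41)    # => "41"           (1 byte)
utf8_hexbytes(0xe4)    # => "c3,a4"        (2 bytes)
utf8_hexbytes(0x20ac)  # => "e2,82,ac"     (3 bytes)
utf8_hexbytes(0x1F600) # => "f0,9f,98,80"  (4 bytes)
```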

#inspect ⇒ Object

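Return a string of the form #<U+… char name utf8:hexbytes>, for example #<U+20AC "€" EURO SIGN utf8:e2,82,ac>. If the code point has no name, nil appears in place of the name.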



# File 'lib/unicode_utils/codepoint.rb', line 60

def inspect
  "#<#{uplus} #{to_s.inspect} #{name || "nil"} utf8:#{hexbytes}>"
end

#name ⇒ Object

Get the normative Unicode name of this code point.

See also: UnicodeUtils.char_name



# File 'lib/unicode_utils/codepoint.rb', line 40

def name
  UnicodeUtils.char_name(@int)
end

#ord ⇒ Object

Convert to Integer.



# File 'lib/unicode_utils/codepoint.rb', line 26

def ord
  @int
end

#to_s ⇒ Object

Convert this code point to a UTF-8 encoded string. A new string is returned on each call, so the caller may safely mutate the return value.



# File 'lib/unicode_utils/codepoint.rb', line 47

def to_s
  @int.chr(Encoding::UTF_8)
end
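
The following sketch illustrates why mutating the result is safe. It uses a hypothetical free-standing function, `codepoint_to_s`, with the same body as the method above; the name is an assumption and not part of the library.

```ruby
# Hypothetical free-standing version of the method body shown above.
def codepoint_to_s(int)
  int.chr(Encoding::UTF_8)
end

a = codepoint_to_s(0x20ac)
b = codepoint_to_s(0x20ac)

a << "!"       # mutate the first result in place

a              # => "€!"
b              # => "€"   (unaffected, it is a separate String object)
a.equal?(b)    # => false
b.encoding     # => #<Encoding:UTF-8>
```

Because Integer#chr builds a fresh String each time, no state is shared between calls.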

#uplus ⇒ Object

Format in U+ notation.

Codepoint.new(0xc5).uplus => "U+00C5"


# File 'lib/unicode_utils/codepoint.rb', line 33

def uplus
  sprintf('U+%04X', @int)
end
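
The `%04X` directive pads to at least four uppercase hex digits and does not truncate longer values. A short sketch with a free-standing copy of the method body (the top-level function here is for illustration only) shows the effect across the codespace:

```ruby
# Free-standing copy of the method body shown above, taking the Integer directly.
def uplus(int)
  sprintf('U+%04X', int)
end

uplus(0x41)     # => "U+0041"
uplus(0xc5)     # => "U+00C5"
uplus(0x20ac)   # => "U+20AC"
uplus(0x1F600)  # => "U+1F600"   (five digits, no padding needed)
uplus(0x10FFFF) # => "U+10FFFF"  (six digits, the top of the codespace)
```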