Class: UnicodeUtils::Codepoint

Inherits:

Object


Defined in: lib/unicode_utils/codepoint.rb

Overview

A Codepoint instance represents a single Unicode code point.

UnicodeUtils::Codepoint.new(0x20ac) => #<U+20AC "€" EURO SIGN utf8:e2,82,ac>

Constant Summary

RANGE = 0..0x10FFFF

The Unicode codespace. Any integer in this range is a Unicode code point.
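A small core-Ruby illustration of the codespace check, assuming nothing beyond the constant's value shown above. CODESPACE is a hypothetical local copy of RANGE so the snippet runs without the unicode_utils gem. Note that the surrogate code points U+D800..U+DFFF fall inside the range.

```ruby
# Hypothetical local copy of Codepoint::RANGE, so this runs without the gem.
CODESPACE = 0..0x10FFFF

CODESPACE.include?(0x20AC)   # => true  (EURO SIGN)
CODESPACE.include?(0x10FFFF) # => true  (last code point)
CODESPACE.include?(0x110000) # => false (just past the end)
CODESPACE.include?(-1)       # => false
CODESPACE.include?(0xD800)   # => true  (surrogates are part of the codespace)
CODESPACE.size               # => 1114112
```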

Instance Method Summary

#hexbytes ⇒ Object

Get the bytes used to encode this code point in UTF-8, hex-formatted.
#initialize(int) ⇒ Codepoint constructor

Create a Codepoint instance that wraps the given Integer.
#inspect ⇒ Object

Return a debugging representation of the form #<U+… "char" name utf8:hexbytes>.
#name ⇒ Object

Get the normative Unicode name of this code point.
#ord ⇒ Object

Convert to Integer.
#to_s ⇒ Object

Convert this code point to a UTF-8 encoded string.
#uplus ⇒ Object

Format in U+ notation.

Constructor Details

#initialize(int) ⇒ `Codepoint`

Create a Codepoint instance that wraps the given Integer. int must be in Codepoint::RANGE; otherwise ArgumentError is raised.

# File 'lib/unicode_utils/codepoint.rb', line 18

def initialize(int)
  unless RANGE.include?(int)
    raise ArgumentError, "#{int} not in codespace"
  end
  @int = int
end
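To see the guard in action without installing the gem, the sketch below defines a hypothetical WrappedCodepoint class that copies the constructor and #ord shown in this page's source. It is a stand-in for illustration, not the library class.

```ruby
# Hypothetical stand-in that mirrors Codepoint#initialize and #ord as shown above.
class WrappedCodepoint
  RANGE = 0..0x10FFFF

  def initialize(int)
    # Reject anything outside the Unicode codespace.
    unless RANGE.include?(int)
      raise ArgumentError, "#{int} not in codespace"
    end
    @int = int
  end

  def ord
    @int
  end
end

WrappedCodepoint.new(0x20AC).ord # => 8364

begin
  WrappedCodepoint.new(0x110000)
rescue ArgumentError => e
  e.message # => "1114112 not in codespace"
end
```

One consequence worth noting: Range#include? compares numerically when the endpoints are numbers, so a Float such as 65.5 passes the guard in this sketch. If the real class behaves the same way, such a value would only fail later, when #to_s calls #chr on it.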

Instance Method Details

#hexbytes ⇒ `Object`

Get the bytes used to encode this code point in UTF-8, hex-formatted.

Codepoint.new(0xe4).hexbytes => "c3,a4"



# File 'lib/unicode_utils/codepoint.rb', line 55

def hexbytes
  to_s.bytes.map { |b| sprintf("%02x", b) }.join(",")
end
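The hex formatting in #hexbytes is plain core Ruby, so it can be tried on any string. hex_utf8_bytes is a hypothetical helper name that applies the same expression to strings built with Integer#chr.

```ruby
# Same expression as #hexbytes, applied to an arbitrary UTF-8 string:
# each byte becomes two lowercase hex digits, joined with commas.
def hex_utf8_bytes(str)
  str.bytes.map { |b| sprintf("%02x", b) }.join(",")
end

hex_utf8_bytes(0x41.chr(Encoding::UTF_8))    # => "41"
hex_utf8_bytes(0xE4.chr(Encoding::UTF_8))    # => "c3,a4"
hex_utf8_bytes(0x20AC.chr(Encoding::UTF_8))  # => "e2,82,ac"
hex_utf8_bytes(0x1F600.chr(Encoding::UTF_8)) # => "f0,9f,98,80"
```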

#inspect ⇒ `Object`

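Return a debugging representation of the form #<U+… "char" name utf8:hexbytes>, where name is printed as nil if the code point has no name.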



# File 'lib/unicode_utils/codepoint.rb', line 60

def inspect
  "#<#{uplus} #{to_s.inspect} #{name || "nil"} utf8:#{hexbytes}>"
end
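A sketch of how #inspect assembles its pieces, written as a free function for illustration. NAMES is a hypothetical one-entry lookup table standing in for #name, whose implementation is not shown on this page; code points missing from it print as nil, matching the `name || "nil"` fallback in the source. How String#inspect renders a non-ASCII character depends on Ruby's default encodings, so only the ASCII case is given with an expected result.

```ruby
# Hypothetical name table; the real #name uses data not shown here.
NAMES = { 0x20AC => "EURO SIGN" }.freeze

# Hypothetical helper composing the same fields as Codepoint#inspect.
def inspect_codepoint(int)
  char  = int.chr(Encoding::UTF_8)
  uplus = sprintf("U+%04X", int)
  hex   = char.bytes.map { |b| sprintf("%02x", b) }.join(",")
  name  = NAMES[int]
  "#<#{uplus} #{char.inspect} #{name || "nil"} utf8:#{hex}>"
end

inspect_codepoint(0x41)   # => "#<U+0041 \"A\" nil utf8:41>"
inspect_codepoint(0x20AC) # ends with " EURO SIGN utf8:e2,82,ac>"
```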

#name ⇒ `Object`

Get the normative Unicode name of this code point. May return nil; #inspect prints "nil" in that case.

#ord ⇒ `Object`

Convert to Integer.



# File 'lib/unicode_utils/codepoint.rb', line 26

def ord
  @int
end
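Because #ord returns the wrapped Integer, a Codepoint can be passed wherever code calls #ord on its argument, alongside one-character Strings (String#ord) and Integers (Integer#ord returns self). In this sketch, uplus_of and FakeCodepoint are hypothetical names; FakeCodepoint only imitates Codepoint's #ord.

```ruby
# Hypothetical helper that accepts anything responding to #ord.
def uplus_of(obj)
  sprintf("U+%04X", obj.ord)
end

# Minimal stand-in whose #ord returns the wrapped Integer, as Codepoint#ord does.
FakeCodepoint = Struct.new(:int) do
  def ord
    int
  end
end

uplus_of("A")                     # => "U+0041" (String#ord)
uplus_of(0x20AC)                  # => "U+20AC" (Integer#ord returns self)
uplus_of(FakeCodepoint.new(0xC5)) # => "U+00C5"
```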

#to_s ⇒ `Object`

Convert this code point to a UTF-8 encoded string. Returns a new string on each call, so callers may safely mutate the return value.



# File 'lib/unicode_utils/codepoint.rb', line 47

def to_s
  @int.chr(Encoding::UTF_8)
end
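The conversion in #to_s is Integer#chr with an explicit encoding. This sketch checks the two properties the description above relies on: the result is UTF-8, and each call returns a fresh, mutable String. Because RANGE also includes the surrogates U+D800..U+DFFF, it additionally shows that Integer#chr raises RangeError for them; the source shown here does not say whether Codepoint is meant to handle that case.

```ruby
# Core of #to_s: Integer#chr with an explicit target encoding.
s = 0xE4.chr(Encoding::UTF_8)
s.encoding # => #<Encoding:UTF-8>
s.bytesize # => 2

# Every call builds a new String, so mutating one result leaves later results untouched.
a = 0x41.chr(Encoding::UTF_8)
a << "!"
b = 0x41.chr(Encoding::UTF_8)
a # => "A!"
b # => "A"

# Surrogates are inside RANGE, but Integer#chr refuses to encode them as UTF-8.
begin
  0xD800.chr(Encoding::UTF_8)
rescue RangeError => e
  warn e.message
end
```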

#uplus ⇒ `Object`

Format in U+ notation.

Codepoint.new(0xc5).uplus => "U+00C5"



# File 'lib/unicode_utils/codepoint.rb', line 33

def uplus
  sprintf('U+%04X', @int)
end
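The U+ notation comes from the format string 'U+%04X': uppercase hexadecimal, zero-padded to at least four digits, with wider values printed in full. uplus_format is a hypothetical helper that applies the same format to plain Integers.

```ruby
# Same format string as #uplus.
def uplus_format(int)
  sprintf('U+%04X', int)
end

uplus_format(0xC5)     # => "U+00C5"
uplus_format(0x20AC)   # => "U+20AC"
uplus_format(0x1F600)  # => "U+1F600" (five digits, not truncated)
uplus_format(0x10FFFF) # => "U+10FFFF"
```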
