Module: Stringex::Unidecoder

Defined in: lib/stringex/unidecoder.rb

Constant Summary

CODEPOINTS =

Contains Unicode codepoints, loading as needed from YAML files.

Hash.new{|h, k|
  h[k] = ::YAML.load_file(File.join(File.expand_path(File.dirname(__FILE__)), "unidecoder_data", "#{k}.yml"))
}
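The lazy loading here relies on the block form of Hash.new: the block runs only when a key is missing, and assigning the result to h[k] caches it for later lookups. A minimal, self-contained sketch of that pattern follows. The temporary directory, the file contents, and the names DATA_DIR, LOADED_KEYS, and LAZY_TABLES are assumptions made for illustration and are not part of Stringex.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for the gem's unidecoder_data directory.
DATA_DIR = Dir.mktmpdir("unidecoder_sketch")
at_exit { FileUtils.remove_entry(DATA_DIR) }
File.write(File.join(DATA_DIR, "x00.yml"), ["a", "b", "c"].to_yaml)

# Records which keys triggered a file load, so the caching is observable.
LOADED_KEYS = []

# The default block runs only on a missing key; storing the parsed data
# in h[k] means each file is read at most once.
LAZY_TABLES = Hash.new do |h, k|
  LOADED_KEYS << k
  h[k] = YAML.load_file(File.join(DATA_DIR, "#{k}.yml"))
end

first  = LAZY_TABLES["x00"]  # parses x00.yml
second = LAZY_TABLES["x00"]  # served from the hash, no second parse
```

After both lookups, LOADED_KEYS holds a single entry, which shows that the second access never touched the file.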

Class Method Summary

.decode(string) ⇒ Object

Returns string with its UTF-8 characters transliterated to ASCII ones.

.encode(codepoint) ⇒ Object

Returns character for the given Unicode codepoint.

.get_codepoint(character) ⇒ Object

Returns Unicode codepoint for the given character.

.in_yaml_file(character) ⇒ Object

Returns string indicating which file (and line) contains the transliteration value for the character.

Class Method Details

.decode(string) ⇒ `Object`

Returns string with its UTF-8 characters transliterated to ASCII ones.

In most cases it is simpler to call String#to_ascii, which Stringex adds to String.




# File 'lib/stringex/unidecoder.rb', line 17

def decode(string)
  string.chars.map{|char| decoded(char)}.join
end
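The private decoded helper is not shown on this page, so the following is only a sketch of the same map-and-join shape. It uses a tiny hypothetical table in place of the gem's YAML data. The names SKETCH_TABLE and sketch_decode, and the "?" fallback for unknown characters, are assumptions of this example.

```ruby
# Hypothetical miniature table; the real module reads its mappings
# from the YAML files behind CODEPOINTS.
SKETCH_TABLE = { "é" => "e", "ß" => "ss", "→" => "->" }.freeze

# ASCII characters pass through unchanged; everything else is looked up,
# falling back to "?" (a choice made for this sketch only).
def sketch_decode(string)
  string.chars.map { |char|
    char.ascii_only? ? char : SKETCH_TABLE.fetch(char, "?")
  }.join
end

sketch_decode("café → Straße")  # => "cafe -> Strasse"
```

Mapping each character separately and joining afterwards lets one input character expand to several output characters, as "ß" does here.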

.encode(codepoint) ⇒ `Object`

Returns character for the given Unicode codepoint




# File 'lib/stringex/unidecoder.rb', line 22

def encode(codepoint)
  ["0x#{codepoint}".to_i(16)].pack("U")
end
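As the method body shows, the argument is read as a hexadecimal string. An Integer would be interpolated as decimal digits and then parsed as hex, giving a different character. The sketch below repeats the same pack("U") technique in a standalone method; the name hex_to_char is hypothetical.

```ruby
# Standalone version of the technique used by encode: parse a hex string
# and pack the resulting integer as one UTF-8 character.
def hex_to_char(hex)
  [hex.to_i(16)].pack("U")
end

hex_to_char("41")    # => "A"
hex_to_char("00e9")  # => "é"
hex_to_char("20ac")  # => "€"
```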

.get_codepoint(character) ⇒ `Object`

Returns Unicode codepoint for the given character




# File 'lib/stringex/unidecoder.rb', line 27

def get_codepoint(character)
  "%04x" % character.unpack("U")[0]
end
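The return value is a lowercase hexadecimal string padded to at least four digits, not an Integer. Only the first character of the argument is examined, because unpack("U") consumes a single character. A standalone sketch of the same technique, with the hypothetical name char_to_hex:

```ruby
# Standalone version of the technique used by get_codepoint: unpack the
# first UTF-8 character to an integer and format it as padded hex.
def char_to_hex(character)
  "%04x" % character.unpack("U")[0]
end

char_to_hex("A")   # => "0041"
char_to_hex("é")   # => "00e9"
char_to_hex("€")   # => "20ac"
char_to_hex("éa")  # => "00e9" (characters after the first are ignored)

# Parsing the hex string and packing it again recovers the character.
[char_to_hex("ß").to_i(16)].pack("U")  # => "ß"
```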

.in_yaml_file(character) ⇒ `Object`

Returns string indicating which file (and line) contains the transliteration value for the character

# File 'lib/stringex/unidecoder.rb', line 33

def in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  "#{code_group(unpacked)}.yml (line #{grouped_point(unpacked) + 2})"
end
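The helpers code_group and grouped_point are not shown on this page. The sketch below assumes that code points are grouped into files of 256 entries named after the high byte (for example x00.yml), and that grouped_point is the low byte. Under that assumption the + 2 would account for 1-based line numbers plus one header line at the top of each YAML file, which is also an inference. All method names here are hypothetical.

```ruby
# ASSUMPTION: files are named "x" plus the code point's high byte in hex.
def sketch_code_group(codepoint)
  "x%02x" % (codepoint >> 8)
end

# ASSUMPTION: the position inside a file is the code point's low byte.
def sketch_grouped_point(codepoint)
  codepoint & 255
end

# Same string format as in_yaml_file, built on the assumed helpers.
def sketch_in_yaml_file(character)
  unpacked = character.unpack("U")[0]
  "#{sketch_code_group(unpacked)}.yml (line #{sketch_grouped_point(unpacked) + 2})"
end

sketch_in_yaml_file("é")  # => "x00.yml (line 235)"
sketch_in_yaml_file("€")  # => "x20.yml (line 174)"
```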
