Module: Babosa::UTF8::DumbProxy

Extended by:
DumbProxy, UTF8Proxy
Included in:
DumbProxy
Defined in:
lib/babosa/utf8/dumb_proxy.rb

Overview

This module provides fallback UTF-8 support when nothing else is available. It does case folding for the Roman alphabet-based characters commonly used by Western European languages and little else, so it is useless for languages such as Russian, Bulgarian, or Greek. If at all possible, use Unicode or ActiveSupport instead, because they support the full range of characters that UTF-8 can encode.

Constant Summary

Constants included from UTF8Proxy

UTF8Proxy::CP1252

Instance Method Summary

Methods included from UTF8Proxy

tidy_bytes

Instance Method Details

#downcase(string) ⇒ Object

# File 'lib/babosa/utf8/dumb_proxy.rb', line 16

def downcase(string)
  # Map each codepoint through the lookup table; unmapped codepoints pass through unchanged.
  string.unpack("U*").map { |char| Mappings::DOWNCASE[char] || char }.flatten.pack("U*")
end
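
The table-lookup approach above can be tried in isolation. The following is a minimal sketch, not the library's code: `SAMPLE_DOWNCASE` and `sample_downcase` are hypothetical stand-ins for `Mappings::DOWNCASE` and `#downcase`, and the table holds only three illustrative entries.

```ruby
# Hypothetical stand-in for Mappings::DOWNCASE (assumption: codepoint => codepoint).
SAMPLE_DOWNCASE = {
  0x41 => 0x61, # A -> a
  0xC9 => 0xE9, # É -> é
  0xD1 => 0xF1  # Ñ -> ñ
}.freeze

# Same shape as #downcase: unpack to codepoints, look each one up, repack as UTF-8.
def sample_downcase(string)
  string.unpack("U*").map { |cp| SAMPLE_DOWNCASE[cp] || cp }.pack("U*")
end

sample_downcase("ÉA Ñ") # => "éa ñ"
sample_downcase("Я")    # => "Я" (Cyrillic is not in the table, so it is left as-is)
```

The second call illustrates the limitation described in the overview: any character missing from the table is returned unchanged.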

#normalize_utf8(string) ⇒ Object

This does a very naive Unicode normalization, which should work for this library’s purposes (i.e., Roman-based codepoints, up to U+017E). Do not reuse this as a general solution! Use a real library like Unicode or ActiveSupport instead.

# File 'lib/babosa/utf8/dumb_proxy.rb', line 28

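def normalize_utf8(string)
  codepoints = string.unpack("U*")
  normalized = []
  until codepoints.empty?
    # If the next two codepoints form a known pair, replace them with the composed value.
    if Mappings::COMPOSITION[codepoints[0..1]]
      normalized << Mappings::COMPOSITION[codepoints.slice!(0, 2)]
    else
      # Otherwise copy the next codepoint through unchanged.
      normalized << codepoints.shift
    end
  end
  normalized.compact.flatten.pack("U*")
end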
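
The pairwise composition loop above can be illustrated with a small self-contained sketch. `SAMPLE_COMPOSITION` and `sample_normalize` are hypothetical names, and the table assumes that keys are `[base, combining mark]` codepoint pairs and values are single composed codepoints; the real `Mappings::COMPOSITION` may be shaped differently.

```ruby
# Hypothetical stand-in for Mappings::COMPOSITION
# (assumption: [base, combining mark] => composed codepoint).
SAMPLE_COMPOSITION = {
  [0x65, 0x301] => 0xE9, # e + combining acute accent -> é
  [0x6E, 0x303] => 0xF1  # n + combining tilde        -> ñ
}.freeze

# Walk the codepoints left to right, composing known pairs and copying the rest.
def sample_normalize(string)
  codepoints = string.unpack("U*")
  result = []
  until codepoints.empty?
    composed = SAMPLE_COMPOSITION[codepoints[0, 2]]
    if composed
      codepoints.shift(2) # consume both halves of the pair
      result << composed
    else
      result << codepoints.shift
    end
  end
  result.pack("U*")
end

decomposed = "cafe\u0301"    # 5 codepoints: c, a, f, e, combining acute
sample_normalize(decomposed) # => "caf\u00E9", 4 codepoints
```

Because only adjacent pairs are considered, a base character followed by two combining marks would not be fully composed, which is one reason this approach is unsuitable as a general normalizer.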

#upcase(string) ⇒ Object

# File 'lib/babosa/utf8/dumb_proxy.rb', line 20

def upcase(string)
  # Map each codepoint through the lookup table; unmapped codepoints pass through unchanged.
  string.unpack("U*").map { |char| Mappings::UPCASE[char] || char }.flatten.pack("U*")
end
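
The `flatten` call before `pack` suggests that a mapping entry may expand to more than one codepoint. The sketch below shows how that would work under that assumption; `SAMPLE_UPCASE` and `sample_upcase` are hypothetical, and the `ß` entry is illustrative rather than taken from `Mappings::UPCASE`.

```ruby
# Hypothetical stand-in for Mappings::UPCASE. The array value is an assumption
# used to show why flatten is needed; the real table may hold only single codepoints.
SAMPLE_UPCASE = {
  0x61 => 0x41,        # a -> A
  0xE9 => 0xC9,        # é -> É
  0xDF => [0x53, 0x53] # ß -> SS (illustrative multi-codepoint entry)
}.freeze

# Same shape as #upcase: look up each codepoint, flatten any arrays, repack as UTF-8.
def sample_upcase(string)
  string.unpack("U*").map { |cp| SAMPLE_UPCASE[cp] || cp }.flatten.pack("U*")
end

sample_upcase("éaß") # => "ÉASS"
```

Without `flatten`, the nested array would reach `pack("U*")` and raise a `TypeError`, so the call is a cheap safeguard even if every current entry is a single codepoint.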