Module: Babosa::UTF8::DumbProxy

Extended by:
DumbProxy, UTF8Proxy
Included in:
DumbProxy
Defined in:
lib/babosa/utf8/dumb_proxy.rb

Overview

This module provides fallback UTF-8 support when nothing else is available. It performs case folding only for the Latin-alphabet characters commonly used by Western European languages, so it is useless for Russian, Bulgarian, Greek, and other non-Latin scripts. Whenever possible, use Unicode or ActiveSupport instead, because they support the full Unicode character range.

Constant Summary

Constants included from UTF8Proxy

UTF8Proxy::CP1252

Instance Method Summary

Methods included from UTF8Proxy

tidy_bytes

Instance Method Details

#downcase(string) ⇒ Object



# File 'lib/babosa/utf8/dumb_proxy.rb', line 15

def downcase(string)
  # Map each codepoint through the lowercase table; unmapped codepoints pass through unchanged.
  string.unpack("U*").map { |char| Mappings::DOWNCASE[char] || char }.flatten.pack("U*")
end
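To illustrate the pass-through behavior, here is a minimal sketch that assumes a tiny lookup table. `DOWNCASE_SKETCH` and `downcase_sketch` are hypothetical names standing in for `Mappings::DOWNCASE` and `#downcase`; the real table is assumed to cover more codepoints than the three shown here.

```ruby
# Hypothetical stand-in for Mappings::DOWNCASE (codepoint => lowercase codepoint).
# Only three entries are listed; this is an assumption for illustration.
DOWNCASE_SKETCH = {
  0x41  => 0x61,  # A => a
  0xC9  => 0xE9,  # É => é
  0x17D => 0x17E  # Ž => ž
}.freeze

# Same shape as #downcase: look up each codepoint, fall back to the original.
def downcase_sketch(string)
  string.unpack("U*").map { |cp| DOWNCASE_SKETCH[cp] || cp }.pack("U*")
end

latin    = downcase_sketch("\u00C9A") # both codepoints are in the table
cyrillic = downcase_sketch("\u042F")  # Cyrillic Я is not in the table
```

In this sketch `latin` comes back lowercased, while `cyrillic` is returned unchanged because its codepoint has no table entry. That is the limitation the overview warns about.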

#normalize_utf8(string) ⇒ Object

This performs a very naive Unicode normalization, which should work for this library’s purposes (i.e., Roman-based codepoints, up to U+017E). Do not reuse this as a general solution! Use a real library like Unicode or ActiveSupport instead.



# File 'lib/babosa/utf8/dumb_proxy.rb', line 27

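def normalize_utf8(string)
  codepoints = string.unpack("U*")
  composed = []
  until codepoints.empty?
    # If the next two codepoints form a known pair, consume both and emit the mapped value.
    if Mappings::COMPOSITION[codepoints[0..1]]
      composed << Mappings::COMPOSITION[codepoints.slice!(0, 2)]
    else
      # Otherwise pass the next codepoint through unchanged.
      composed << codepoints.shift
    end
  end
  composed.compact.flatten.pack("U*")
end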
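The pairwise composition loop can be sketched as follows. `COMPOSITION_SKETCH` and `normalize_sketch` are hypothetical names; the single table entry (`e` followed by a combining acute accent) is an assumption chosen for illustration, not a copy of `Mappings::COMPOSITION`.

```ruby
# Hypothetical stand-in for Mappings::COMPOSITION:
# [base codepoint, combining codepoint] => composed codepoint.
COMPOSITION_SKETCH = {
  [0x65, 0x301] => 0xE9 # e + combining acute accent => é
}.freeze

# Walk the codepoints left to right, composing known pairs.
def normalize_sketch(string)
  codepoints = string.unpack("U*")
  composed = []
  until codepoints.empty?
    pair = codepoints[0..1] # one or two codepoints, depending on what is left
    if COMPOSITION_SKETCH[pair]
      codepoints.shift(2)   # consume both halves of the pair
      composed << COMPOSITION_SKETCH[pair]
    else
      composed << codepoints.shift
    end
  end
  composed.pack("U*")
end

decomposed = "cafe\u0301"              # 5 codepoints
normalized = normalize_sketch(decomposed) # 4 codepoints after composition
```

Because the loop only ever looks at adjacent pairs, sequences with more than one combining mark, or pairs missing from the table, are left as they are. This is why the method is unsuitable as a general normalizer.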

#upcase(string) ⇒ Object



# File 'lib/babosa/utf8/dumb_proxy.rb', line 19

def upcase(string)
  # Map each codepoint through the uppercase table; unmapped codepoints pass through unchanged.
  string.unpack("U*").map { |char| Mappings::UPCASE[char] || char }.flatten.pack("U*")
end
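The `flatten` call suggests that a table entry may expand to more than one codepoint. The sketch below works under that assumption: `UPCASE_SKETCH` and `upcase_sketch` are hypothetical names, and the `ß` to `SS` entry is an illustrative guess, not a confirmed entry of `Mappings::UPCASE`.

```ruby
# Hypothetical stand-in for Mappings::UPCASE. ASCII a-z map to A-Z, and one
# entry maps a single codepoint to an array of two (an assumption).
UPCASE_SKETCH = Hash[(0x61..0x7A).map { |cp| [cp, cp - 0x20] }]
                  .merge(0xDF => [0x53, 0x53]) # ß => "SS"
                  .freeze

# Same shape as #upcase: flatten turns nested array values back into a flat
# list of codepoints before packing.
def upcase_sketch(string)
  string.unpack("U*").map { |cp| UPCASE_SKETCH[cp] || cp }.flatten.pack("U*")
end

shouted = upcase_sketch("stra\u00DFe")
```

Without `flatten`, the nested array produced for `ß` would reach `pack("U*")` as a non-integer element and raise a `TypeError`.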