Module: Babosa::UTF8::DumbProxy

Extended by: DumbProxy, UTF8Proxy

Included in: DumbProxy

Defined in: lib/babosa/utf8/dumb_proxy.rb

Overview

This module provides fallback UTF-8 support when nothing else is available. It does case folding for Roman alphabet-based characters commonly used by Western European languages and little else, making it useless for Russian, Bulgarian, Greek, etc. If at all possible, Unicode or ActiveSupport should be used instead because they support the full UTF-8 character range.

Constant Summary

Constants included from UTF8Proxy

UTF8Proxy::CP1252

Instance Method Summary

#downcase(string) ⇒ Object
#normalize_utf8(string) ⇒ Object

This does a very naive Unicode normalization, which should work for this library’s purposes (i.e., Roman-based codepoints, up to U+017E).
#upcase(string) ⇒ Object

Methods included from UTF8Proxy

tidy_bytes

Instance Method Details

#downcase(string) ⇒ `Object`

# File 'lib/babosa/utf8/dumb_proxy.rb', line 16

def downcase(string)
  string.unpack("U*").map {|char| Mappings::DOWNCASE[char] or char}.flatten.pack("U*")
end
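
The table-lookup approach used by `downcase` can be illustrated with a minimal sketch. `SKETCH_DOWNCASE` and `sketch_downcase` are hypothetical names invented for this example, and the three-entry table is only a stand-in for `Mappings::DOWNCASE`, whose actual contents are not shown on this page.

```ruby
# Hypothetical stand-in for Mappings::DOWNCASE (assumption: codepoint => codepoint).
SKETCH_DOWNCASE = {
  0x41 => 0x61, # A => a
  0xC9 => 0xE9, # É => é
  0xD1 => 0xF1  # Ñ => ñ
}.freeze

# Split the string into codepoints, replace each one that has a table entry,
# and pack the result back into a UTF-8 string.
def sketch_downcase(string)
  string.unpack("U*").map { |cp| SKETCH_DOWNCASE[cp] || cp }.pack("U*")
end

latin    = sketch_downcase("\u00C9A") # both codepoints are in the table
cyrillic = sketch_downcase("\u0414")  # Cyrillic Д has no entry, so it passes through unchanged
```

Any codepoint missing from the table is returned as-is, which is consistent with the overview's warning that this proxy does nothing useful for scripts such as Cyrillic or Greek.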

#normalize_utf8(string) ⇒ `Object`

This does a very naive Unicode normalization, which should work for this library’s purposes (i.e., Roman-based codepoints, up to U+017E). Do not reuse this as a general solution! Use a real library like Unicode or ActiveSupport instead.

# File 'lib/babosa/utf8/dumb_proxy.rb', line 28

def normalize_utf8(string)
  codepoints = string.unpack("U*")
  new = []
  until codepoints.empty? do
    if Mappings::COMPOSITION[codepoints[0..1]]
      new << Mappings::COMPOSITION[codepoints.slice!(0,2)]
    else
      new << codepoints.shift
    end
  end
  new.compact.flatten.pack("U*")
end
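
The pairwise composition loop above can be tried in isolation with a small sketch. `SKETCH_COMPOSITION` and `sketch_compose` are hypothetical names, and the two-entry table is an assumed stand-in for `Mappings::COMPOSITION`, keyed here by a `[base, combining mark]` pair of codepoints.

```ruby
# Hypothetical stand-in for Mappings::COMPOSITION
# (assumption: [base, combining mark] => composed codepoint).
SKETCH_COMPOSITION = {
  [0x65, 0x301] => 0xE9, # e + combining acute => é
  [0x6E, 0x303] => 0xF1  # n + combining tilde => ñ
}.freeze

# Walk the codepoints left to right; whenever the next two form a known
# pair, emit the composed codepoint, otherwise emit the first one as-is.
def sketch_compose(string)
  codepoints = string.unpack("U*")
  composed = []
  until codepoints.empty?
    pair = codepoints[0, 2]
    if SKETCH_COMPOSITION.key?(pair)
      composed << SKETCH_COMPOSITION[pair]
      codepoints.shift(2)
    else
      composed << codepoints.shift
    end
  end
  composed.pack("U*")
end

decomposed = "cafe\u0301"              # "e" followed by U+0301 (5 codepoints)
result     = sketch_compose(decomposed) # "e" + U+0301 collapse into U+00E9 (4 codepoints)
```

Because the loop only ever looks at two adjacent codepoints, sequences with more than one combining mark, or pairs missing from the table, are left untouched. That is one reason this approach is not a substitute for full Unicode normalization.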

#upcase(string) ⇒ `Object`

# File 'lib/babosa/utf8/dumb_proxy.rb', line 20

def upcase(string)
  string.unpack("U*").map {|char| Mappings::UPCASE[char] or char}.flatten.pack("U*")
end
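
This page does not say why `upcase` calls `.flatten` before packing. One possible reading, offered here only as an assumption, is that a table value may be an array of codepoints when a single character maps to several. The sketch below uses hypothetical names (`SKETCH_UPCASE`, `sketch_upcase`) and a made-up table; whether `Mappings::UPCASE` contains any such entry is not confirmed by the source.

```ruby
# Hypothetical table; the array-valued ß entry is an assumption for illustration.
SKETCH_UPCASE = {
  0x66 => 0x46,         # f => F
  0x75 => 0x55,         # u => U
  0xDF => [0x53, 0x53]  # ß => SS (one codepoint expanding to two)
}.freeze

# Map each codepoint through the table, then flatten so that array-valued
# entries become individual codepoints before packing.
def sketch_upcase(string)
  string.unpack("U*").map { |cp| SKETCH_UPCASE[cp] || cp }.flatten.pack("U*")
end

shouted = sketch_upcase("fu\u00DF") # three codepoints in, four out
```

Without the `flatten` step, `pack("U*")` would receive a nested array and raise a `TypeError`, so flattening keeps one-to-many entries safe under this assumption.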