Class: Tilia::VObject::StringUtil
- Inherits: Object
- Defined in: lib/tilia/v_object/string_util.rb
Overview
Useful utilities for working with various strings.
Class Method Summary
- .convert_to_utf8(str) ⇒ String
  This method tries its best to convert the input string to UTF-8.
- .guess_encoding(str) ⇒ String
  Detects the encoding of a string.
- .mb_strcut(string, length) ⇒ String
  Cuts the string after a certain byte length.
- .utf8?(str) ⇒ Boolean
  Returns true if the string is valid UTF-8, false otherwise.
Class Method Details
.convert_to_utf8(str) ⇒ String
This method tries its best to convert the input string to UTF-8.
Currently only ISO-8859-1 and UTF-8 input are supported, but this may be expanded if we receive other examples.
    # File 'lib/tilia/v_object/string_util.rb', line 26
    def self.convert_to_utf8(str)
      str = str.encode('UTF-8', guess_encoding(str))

      # Removing any control characters
      str.gsub(/(?:[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F])/, '')
    end
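The two steps in this method (transcode, then strip control characters) can be tried without the library. The following is only a minimal sketch: `to_clean_utf8` is a hypothetical helper, and it takes the source encoding as an explicit argument instead of guessing it.

```ruby
# Hypothetical stand-in for the library method: the source encoding is
# passed in explicitly (an assumption) rather than detected.
def to_clean_utf8(bytes, source_encoding = 'ISO-8859-1')
  # Interpret the bytes as source_encoding and transcode them to UTF-8.
  text = bytes.encode('UTF-8', source_encoding)

  # Same character class as the listing: C0 controls except tab, LF and CR,
  # plus DEL.
  text.gsub(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]/, '')
end

# "café" in ISO-8859-1 (0xE9 is "é"), followed by a BEL control byte.
latin1 = "caf\xE9\x07".b

result = to_clean_utf8(latin1)
puts result.encoding          # UTF-8
puts result.valid_encoding?   # true
puts result.length            # 4 (the BEL byte was removed)
```

Tab, line feed and carriage return are left in place, since the pattern skips `\x09`, `\x0A` and `\x0D`.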
.guess_encoding(str) ⇒ String
Detects the encoding of a string.
Currently only supports ‘UTF-8’, ‘ISO-8859-1’ and ‘Windows-1252’.
    # File 'lib/tilia/v_object/string_util.rb', line 39
    def self.guess_encoding(str)
      cd = CharDet.detect(str)

      # Best solution I could find ...
      if cd['confidence'] > 0.4 && cd['encoding'] =~ /(?:windows|iso)/i
        cd['encoding']
      else
        'UTF-8'
      end
    end
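The decision rule in this method can be looked at in isolation from the detector. The sketch below assumes a detection result shaped like the hash the listing reads (string keys `'encoding'` and `'confidence'`). `pick_encoding` is a hypothetical helper, and the sample hashes are hand-written, not real detector output.

```ruby
# Hypothetical helper: applies the listing's rule to an already-computed
# detection result instead of calling the detector itself.
def pick_encoding(detection)
  trusted = detection['confidence'] > 0.4
  legacy  = detection['encoding'] =~ /(?:windows|iso)/i

  # Only a confident Windows-* or ISO-* guess is returned as-is;
  # everything else falls back to UTF-8.
  trusted && legacy ? detection['encoding'] : 'UTF-8'
end

# Hand-written sample results (assumptions for illustration only).
puts pick_encoding({ 'encoding' => 'windows-1252', 'confidence' => 0.73 }) # windows-1252
puts pick_encoding({ 'encoding' => 'ISO-8859-1',   'confidence' => 0.20 }) # UTF-8
puts pick_encoding({ 'encoding' => 'ascii',        'confidence' => 1.0 })  # UTF-8
```

A low-confidence guess and a guess outside the Windows/ISO families both end up as 'UTF-8', so the fallback covers uncertain input as well as genuinely UTF-8 input.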
.mb_strcut(string, length) ⇒ String
Cuts the string after a certain byte length, without splitting a multibyte character.
    # File 'lib/tilia/v_object/string_util.rb', line 55
    def self.mb_strcut(string, length)
      return '' if string == ''

      string = string.clone
      tmp = ''
      while tmp.bytesize <= length
        tmp += string[0]
        string[0] = ''
      end

      # Last char was utf-8 multibyte
      if tmp.bytesize > length
        string[0] = tmp[-1] + string[0].to_s
        tmp[-1] = ''
      end
      tmp
    end
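For comparison, here is a minimal sketch of the same idea written as a forward scan. `cut_to_bytes` is a hypothetical helper, not the library method. It stops before the first character that would push the result over the byte limit, and it simply returns the whole string when the input is shorter than the limit, a case where the loop in the listing appears to read past the end of the string.

```ruby
# Hypothetical helper: keeps whole characters while the running byte count
# stays within max_bytes.
def cut_to_bytes(text, max_bytes)
  result = +''
  text.each_char do |char|
    # Stop before a character that would exceed the byte budget.
    break if result.bytesize + char.bytesize > max_bytes

    result << char
  end
  result
end

word = "h\u00E9llo" # "héllo"; "é" takes two bytes in UTF-8

puts cut_to_bytes(word, 2).bytesize   # 1 ("é" does not fit after "h")
puts cut_to_bytes(word, 3).bytesize   # 3 ("h" plus the two bytes of "é")
puts cut_to_bytes('abc', 10)          # abc
```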
.utf8?(str) ⇒ Boolean
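Returns true if the string is tagged as UTF-8, contains only valid UTF-8 byte sequences, and has no control characters in the checked range; returns false otherwise. Raises ArgumentError if the argument is not a String.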
    # File 'lib/tilia/v_object/string_util.rb', line 10
    def self.utf8?(str)
      fail ArgumentError, 'str needs to be a String' unless str.is_a?(String)
      # Control characters
      return false if str =~ /[\x00-\x08\x0B-\x0C\x0E\x0F]/

      str.encoding.to_s == 'UTF-8' && str.valid_encoding?
    end
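The three conditions can be exercised with a small stand-alone sketch. `utf8_text?` is a hypothetical variant, not the library method. It checks the encoding tag and byte validity before running the pattern, because pattern matching on a string that contains invalid bytes can raise in Ruby.

```ruby
# Hypothetical variant of the check: encoding tag and byte validity are
# tested first, so the pattern only ever runs on well-formed UTF-8.
def utf8_text?(str)
  raise ArgumentError, 'str needs to be a String' unless str.is_a?(String)
  return false unless str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Same control-character class as the listing.
  !str.match?(/[\x00-\x08\x0B-\x0C\x0E\x0F]/)
end

puts utf8_text?("caf\u00E9")                      # true
puts utf8_text?("caf\xE9")                        # false: lone 0xE9 is not valid UTF-8
puts utf8_text?("caf\u00E9".encode('ISO-8859-1')) # false: tagged ISO-8859-1
puts utf8_text?("bell\x07")                       # false: contains a control character
```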