Class: Tilia::VObject::StringUtil

Inherits:
Object
Defined in:
lib/tilia/v_object/string_util.rb

Overview

Useful utilities for working with various strings.

Class Method Details

.convert_to_utf8(str) ⇒ String

This method tries its best to convert the input string to UTF-8.

Currently only ISO-8859-1 and UTF-8 input are supported, but this may be expanded if we receive other examples.

Parameters:

  • str (String)

Returns:

  • (String)


# File 'lib/tilia/v_object/string_util.rb', line 26

def self.convert_to_utf8(str)
  str = str.encode('UTF-8', guess_encoding(str))

  # Removing any control characters
  str.gsub(/(?:[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F])/, '')
end
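
The two steps above (transcode, then strip control characters) can be tried without the library. This is a minimal sketch, not the library's code: `to_clean_utf8` is a hypothetical helper, and the source encoding is passed in explicitly instead of being guessed.

```ruby
# Hypothetical helper (not part of Tilia): the caller supplies the source
# encoding, whereas the method above obtains it from guess_encoding.
def to_clean_utf8(str, source_encoding)
  # Transcode from the stated source encoding to UTF-8
  utf8 = str.encode('UTF-8', source_encoding)

  # Drop the same control characters as the method above
  utf8.gsub(/[\x00-\x08\x0B-\x0C\x0E-\x1F\x7F]/, '')
end

# "caf\xE9" is "café" in ISO-8859-1; \x07 (bell) is a control character
latin1 = "caf\xE9\x07".b.force_encoding('ISO-8859-1')
puts to_clean_utf8(latin1, 'ISO-8859-1')
```

Tab (\x09), line feed (\x0A) and carriage return (\x0D) fall outside the character class, so they survive the cleanup.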

.guess_encoding(str) ⇒ String

Detects the encoding of a string.

Currently only supports ‘UTF-8’, ‘ISO-8859-1’ and ‘Windows-1252’.

Parameters:

  • str (String)

Returns:

  • (String)

    encoding



# File 'lib/tilia/v_object/string_util.rb', line 39

def self.guess_encoding(str)
  cd = CharDet.detect(str)

  # Best solution I could find ...
  if cd['confidence'] > 0.4 && cd['encoding'] =~ /(?:windows|iso)/i
    cd['encoding']
  else
    'UTF-8'
  end
end
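
The fallback rule can be looked at in isolation from the detector. In this sketch `pick_encoding` is a hypothetical name, and the hashes are hand-written stand-ins for a detector result; only the `'confidence'` and `'encoding'` keys used in the source above are assumed.

```ruby
# Hypothetical stand-in for the decision in guess_encoding. The detection
# hash is supplied by hand here; the method above gets it from CharDet.detect.
def pick_encoding(detection)
  if detection['confidence'] > 0.4 && detection['encoding'] =~ /(?:windows|iso)/i
    detection['encoding']
  else
    # Low confidence, or an encoding outside the windows/iso families
    'UTF-8'
  end
end

puts pick_encoding('confidence' => 0.9, 'encoding' => 'ISO-8859-1')
puts pick_encoding('confidence' => 0.2, 'encoding' => 'windows-1252')
```

Under this rule, any guess that is weak or that names another family of encodings is treated as UTF-8.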

.mb_strcut(string, length) ⇒ String

Cuts the string so that the result is at most the given number of bytes long, without splitting a multibyte character.

Parameters:

  • string (String)
  • length (Integer)

Returns:

  • (String)

    cut string



# File 'lib/tilia/v_object/string_util.rb', line 55

def self.mb_strcut(string, length)
  return '' if string == ''

  string = string.clone
  tmp = ''
  # Stop when the input is exhausted; string[0] would otherwise be nil
  while tmp.bytesize <= length && !string.empty?
    tmp += string[0]
    string[0] = ''
  end

  # Last char pushed tmp past the byte limit, so put it back
  if tmp.bytesize > length
    string[0] = tmp[-1] + string[0].to_s
    tmp[-1] = ''
  end
  tmp
end
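
A byte-bounded cut can also be written with `String#byteslice`, trimming trailing bytes until the result is valid again. This is a different technique from the loop above, not the library's implementation; `byte_cut` is a hypothetical name, and the sketch assumes valid UTF-8 input and a non-negative length.

```ruby
# Hypothetical alternative (not part of Tilia): slice by bytes, then drop
# trailing bytes until no multibyte character is left half-cut.
# Assumes the input is valid UTF-8 and length >= 0.
def byte_cut(string, length)
  cut = string.byteslice(0, length)
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

word = "h\u00E9llo"      # "é" takes two bytes in UTF-8
puts byte_cut(word, 2)   # the second byte would split "é", so only "h" remains
puts byte_cut(word, 3)   # "hé" fits exactly in three bytes
```

Because a UTF-8 character is at most four bytes long, the trimming step removes at most three bytes.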

.utf8?(str) ⇒ Boolean

Returns true if the string is UTF-8 encoded, contains only valid byte sequences, and has no disallowed control characters; otherwise returns false.

Parameters:

  • str (String)

Returns:

  • (Boolean)


# File 'lib/tilia/v_object/string_util.rb', line 10

def self.utf8?(str)
  fail ArgumentError, 'str needs to be a String' unless str.is_a?(String)
  # Control characters
  return false if str =~ /[\x00-\x08\x0B-\x0C\x0E\x0F]/

  str.encoding.to_s == 'UTF-8' && str.valid_encoding?
end
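
The same three conditions can be exercised with core Ruby alone. In this sketch `clean_utf8?` is a hypothetical name, and it deliberately checks the encoding before running the regular expression, which is a different order from the method above.

```ruby
# Hypothetical check (not part of Tilia) using the same criteria as utf8?.
# Validity is tested first so the regexp only ever sees well-formed input.
def clean_utf8?(str)
  return false unless str.encoding == Encoding::UTF_8 && str.valid_encoding?

  # Same control-character class as the method above
  !str.match?(/[\x00-\x08\x0B-\x0C\x0E\x0F]/)
end

broken = "\xFF".b.force_encoding(Encoding::UTF_8) # not a valid UTF-8 sequence

puts clean_utf8?("caf\u00E9")              # valid UTF-8, no control characters
puts clean_utf8?("a\x00b")                 # contains a NUL byte
puts clean_utf8?(broken)                   # invalid byte sequence
puts clean_utf8?('abc'.encode('ISO-8859-1')) # wrong encoding tag
```

Only the first call prints `true`; the other three each fail one of the conditions.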