Class: Gitlab::Unicode

Inherits:
Object
Defined in:
lib/gitlab/unicode.rb

Constant Summary

BIDI_REGEXP =

Regular expression for identifying bidirectional control characters in UTF-8 strings

Documentation on how this works: idiosyncratic-ruby.com/41-proper-unicoding.html

/\p{Bidi Control}/
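
A minimal sketch of how a pattern like this might be used to detect bidirectional control characters. The local variable `bidi_regexp` is a stand-in for `Gitlab::Unicode::BIDI_REGEXP` so the snippet runs outside GitLab; the sample strings are made up for illustration.

```ruby
# Stand-in for Gitlab::Unicode::BIDI_REGEXP (same pattern as above).
bidi_regexp = /\p{Bidi Control}/

plain   = "access = \"user\""
# U+202E (right-to-left override) plus a U+2066/U+2069 isolate pair.
spoofed = "access = \"user\u202E \u2066// admins only\u2069\""

plain_has_bidi   = plain.match?(bidi_regexp)
spoofed_has_bidi = spoofed.match?(bidi_regexp)

# List every bidi control character found, as code points.
found = spoofed.scan(bidi_regexp).map { |char| format("U+%04X", char.ord) }
```

Here `plain_has_bidi` is `false`, `spoofed_has_bidi` is `true`, and `found` holds the three code points in order of appearance.
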
SPACE_REGEXP =

Regular expression for identifying Unicode space separator characters (general category Zs)

In web browsers, these characters can be indistinguishable from an ordinary space, which may be misleading

/\p{Space_Separator}/
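
A short sketch of what this pattern matches. Note that the ordinary ASCII space (U+0020) is itself a space separator, so a caller looking only for look-alikes would need to exclude it. The variable `space_regexp` is a local stand-in for `Gitlab::Unicode::SPACE_REGEXP`, and the filtering step is an assumption about how one might use it, not something shown in the source.

```ruby
# Stand-in for Gitlab::Unicode::SPACE_REGEXP (same pattern as above).
space_regexp = /\p{Space_Separator}/

# Contains a no-break space (U+00A0), an em space (U+2003) and one ASCII space.
text = "git\u00A0push\u2003origin main"

# Keep only the separators that are not the plain ASCII space.
lookalikes = text.scan(space_regexp).reject { |char| char == " " }
codepoints = lookalikes.map { |char| format("U+%04X", char.ord) }

# Tabs are control characters (Cc), not space separators.
tab_matches = "\t".match?(space_regexp)
```
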
DANGEROUS_CHARS =
Regexp.union(
  /[\p{Cc}&&[^\t\n\r]]/, # All control chars except tab, LF, CR
  /\u00AD/,              # Soft hyphen
  /\u200B/,              # ZWSP
  /[\u202A-\u202E]/,     # Bidi embeddings and overrides
  /\u2060/,              # Word joiner
  /[\u2066-\u2069]/,     # Bidi isolates
  /\uFEFF/,              # BOM
  /[\uFFF9-\uFFFB]/,     # Annotations
  /\uFFFC/,              # Object replacement
  /[\u2062-\u2064]/,     # Invisible math operators
  /[\u{E0000}-\u{E01EF}]/, # Tag characters + Variation Selectors Supplement
  /[\u2028-\u2029]/ # Line/paragraph separators
).freeze
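
As an illustration, the union can be applied with `String#gsub` to remove every matching character while leaving tab, LF and CR intact. This is only a sketch: `dangerous_chars` is a local copy of the constant, and whether GitLab strips these characters or merely flags them is not shown in this page.

```ruby
# Local copy of Gitlab::Unicode::DANGEROUS_CHARS so the sketch is self-contained.
dangerous_chars = Regexp.union(
  /[\p{Cc}&&[^\t\n\r]]/,   # All control chars except tab, LF, CR
  /\u00AD/,                # Soft hyphen
  /\u200B/,                # ZWSP
  /[\u202A-\u202E]/,       # Bidi embeddings and overrides
  /\u2060/,                # Word joiner
  /[\u2066-\u2069]/,       # Bidi isolates
  /\uFEFF/,                # BOM
  /[\uFFF9-\uFFFB]/,       # Annotations
  /\uFFFC/,                # Object replacement
  /[\u2062-\u2064]/,       # Invisible math operators
  /[\u{E0000}-\u{E01EF}]/, # Tag characters + Variation Selectors Supplement
  /[\u2028-\u2029]/        # Line/paragraph separators
).freeze

# Made-up input: ZWSP, NUL and a right-to-left override hidden in the text.
raw   = "pay\u200Bload\u0000\twith\u202Etricks\n"
clean = raw.gsub(dangerous_chars, "")
```

After the substitution, `clean` is `"payload\twithtricks\n"`: the tab and the newline survive, the three invisible characters do not.
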

Class Method Summary

.bidi_warning ⇒ Object
Warning message used to highlight bidi characters in the GUI

Class Method Details

.bidi_warning ⇒ Object

Warning message used to highlight bidi characters in the GUI



# File 'lib/gitlab/unicode.rb', line 35

def bidi_warning
  _("Potentially unwanted character detected: Unicode BiDi Control")
end
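
A plain-text sketch of how the warning might be paired with the bidi pattern. GitLab's actual GUI rendering is not shown in this page, so this stands in for it with a list of per-line messages. `bidi_regexp` and `bidi_warning` are local copies (the real method passes the string through the `_()` translation helper), and `source` is a made-up input.

```ruby
# Local stand-ins for Gitlab::Unicode::BIDI_REGEXP and Gitlab::Unicode.bidi_warning.
bidi_regexp  = /\p{Bidi Control}/
bidi_warning = "Potentially unwanted character detected: Unicode BiDi Control"

# Line 2 hides a U+2067/U+2069 isolate pair.
source = "if admin\nreturn \u2067true\u2069\nend\n"

# Collect one warning per line that contains a bidi control character.
warnings = source.each_line.with_index(1).filter_map do |line, number|
  "line #{number}: #{bidi_warning}" if line.match?(bidi_regexp)
end
```

`filter_map` requires Ruby 2.7 or later; on older versions `map` followed by `compact` gives the same result.
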