Class: Owasp::Esapi::Codec::CssCodec

Inherits:

BaseCodec

Object
BaseCodec
Owasp::Esapi::Codec::CssCodec

show all

Defined in:: lib/codec/css_codec.rb

Constant Summary

Constants inherited from BaseCodec

BaseCodec::END_CODE_POINT, BaseCodec::START_CODE_POINT

Instance Method Summary collapse

#decode_char(input) ⇒ Object
decode a character from the PushableString We follow the rules defined for CSS by w3 http://www.w3.org/TR/CSS21/syndata.html#characters All CSS syntax is case-insensitive within the ASCII range (i.e., [a-z] and [A-Z] are equivalent), except for parts that are not under the control of CSS.
#encode_char(immune, input) ⇒ Object
Returns backslash encoded character.

Methods inherited from BaseCodec

#decode, #encode, #hex, #min

Instance Method Details

#decode_char(input) ⇒ `Object`

decode a character from the PushableString We follow the rules defined for CSS by w3 http://www.w3.org/TR/CSS21/syndata.html#characters All CSS syntax is case-insensitive within the ASCII range (i.e., [a-z] and [A-Z] are equivalent), except for parts that are not under the control of CSS. For example, the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification. Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML. In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B&W?" or "B\26 W\3F". Note that Unicode is code-by-code equivalent to ISO 10646 (see [UNICODE] and [ISO10646]). In CSS 2.1, a backslash () character can indicate one of three types of character escape. Inside a CSS comment, a backslash stands for itself, and if a backslash is immediately followed by the end of the style sheet, it also stands for itself (i.e., a DELIM token).

First, inside a string, a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline). Outside a string, a backslash followed by a newline stands for itself (i.e., a DELIM followed by a newline).

Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit, linefeed, carriage return, or form feed) can be escaped with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes from a style sheet since that would change the style sheet's meaning.

Third, backslash escapes allow authors to refer to characters they cannot easily put in a document. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number, which must not be zero. (It is undefined in CSS 2.1 what happens if a style sheet does contain a character with Unicode codepoint zero.) If a character in the range [0-9a-fA-F] follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that: 1. with a space (or other white space character): "\26 B" ("&B"). In this case, user agents should treat a "CR/LF" pair (U+000D/U+000A) as a single white space character. 2. by providing exactly 6 hexadecimal digits: "\000026B" ("&B") In fact, these two methods may be combined. Only one white space character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must be doubled.If the number is outside the range allowed by Unicode (e.g., "\110000" is above the maximum 10FFFF allowed in current Unicode), the UA may replace the escape with the "replacement character" (U+FFFD). If the character is to be displayed, the UA should show a visible symbol, such as a "missing character" glyph (cf. 15.2, point 5). Note: Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not). The identifier "te\st" is exactly the same identifier as "test".

# File 'lib/codec/css_codec.rb', line 50

def decode_char(input)

  input.mark
  first = input.next
  if first.nil? or !first.eql?('\\')
    input.reset
    return nil
  end
  second = input.next
  if second.nil?
    input.reset
    return nil
  end
  # rule execution
  fallthrough = false
  if second == "\r"
    # speical whitespace cases
    if input.peek?("\n")
      input.next
      fallthrough = true
    end
  end
  # handle the skip ahead. Ruby case doesnt allow for fall through so we inlined the small setup
  return decode_char(input) if second == "\n" || second == "\f" || second == "\u0000" || fallthrough
  # non hex test
  return second if !input.hex?(second)
  # check for 6 hex digits for rule 3
  tmp = second
  for i in 1..5 do
    c = input.next
    if c.nil? or c =~ /\s/
      break
    end
    if input.hex?(c)
      tmp << c
    else
      input.push(c)
    end
  end
  # check the codepoint and if outside of range, return teh replacement
  begin
    i = tmp.hex
    return i.chr(Encoding::UTF_8) if i >= START_CODE_POINT and i <= END_CODE_POINT
    return "\ufffd"
  rescue Exception => e
    raise EncodingError.new("Received an exception while parsing a string verified to be hex")
  end
end

#encode_char(immune, input) ⇒ `Object`

Returns backslash encoded character.

# File 'lib/codec/css_codec.rb', line 10

def encode_char(immune, input)
  # check immune
  return input if immune.include?(input)
  # check for alpha numeric
  hex = hex(input)
  # add a space at end to terminate under css
  return "\\#{hex} " unless hex.nil? or hex.empty?
  input
end