Module: WSDL::XML::Parser

Extended by:
Log::ClassMethods
Defined in:
lib/wsdl/xml/parser.rb

Overview

Secure XML parsing with protection against common XML attacks.

This module provides a centralized, secure way to parse XML throughout the WSDL library. It protects against:

  • XXE (XML External Entity) attacks: External entities are not loaded because we omit the NOENT flag (which would enable substitution) and include NONET (which blocks network access).

  • SSRF (Server-Side Request Forgery): Network access during parsing is disabled via NONET, preventing the parser from making outbound requests.

  • DTD-based attacks: We deliberately omit DTDLOAD and DTDATTR flags, so external DTDs are not loaded and DTD attributes are not defaulted. Additionally, DOCTYPE declarations are rejected by default as defense-in-depth.

  • Billion Laughs / XML Bomb: Internal entity expansion is limited by Nokogiri/libxml2's default entity expansion limits. For defense in depth, use Parser.detect_threats to identify suspicious patterns before parsing.

Security Design

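Nokogiri's ParseOptions flags are additive: each flag takes effect only when it is set. Some flags restrict the parser (NONET), while others turn on risky features (NOENT, DTDLOAD). For security, we set the former and leave the latter out: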

Flags we INCLUDE (enabled):

  • NONET: Disable network access (prevents SSRF, external entity fetching)
  • NOCDATA: Merge CDATA as text (simplifies processing)
  • STRICT: Require well-formed XML (for parse(), not parse_relaxed())
  • NOBLANKS: Remove blank nodes (optional, for canonicalization)

Flags we deliberately OMIT (disabled):

  • NOENT: Would enable entity substitution — we leave it OFF
  • DTDLOAD: Would load external DTDs — we leave it OFF
  • DTDATTR: Would default attributes from DTD — we leave it OFF
  • DTDVALID: Would validate against DTD — we leave it OFF
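
Because the options are combined into a plain integer bitmask, the include/omit split above can be checked mechanically. The following is a minimal sketch, not part of the library: the FLAGS table is a stand-in whose values are assumed to mirror libxml2's xmlParserOption bits (real code should read them from Nokogiri::XML::ParseOptions), and forbidden_flags is a hypothetical helper.

```ruby
# Stand-in flag values, assumed to mirror libxml2's xmlParserOption bits.
# In real code, use the Nokogiri::XML::ParseOptions constants instead.
FLAGS = {
  NOENT: 1 << 1,
  DTDLOAD: 1 << 2,
  DTDATTR: 1 << 3,
  DTDVALID: 1 << 4,
  NOBLANKS: 1 << 8,
  NONET: 1 << 11,
  NOCDATA: 1 << 14
}.freeze

# Flags that should never appear in a secure option set.
FORBIDDEN = %i[NOENT DTDLOAD DTDATTR DTDVALID].freeze

# Hypothetical helper: returns the names of forbidden flags set in a bitmask.
def forbidden_flags(options)
  FORBIDDEN.select { |name| options.anybits?(FLAGS.fetch(name)) }
end

secure = FLAGS[:NONET] | FLAGS[:NOCDATA]
risky = secure | FLAGS[:NOENT] | FLAGS[:DTDLOAD]

p forbidden_flags(secure) # no forbidden flags set
p forbidden_flags(risky)  # NOENT and DTDLOAD are reported
```

A check like this could run in a test suite to guard against a forbidden flag being OR-ed into one of the option constants by mistake.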

DOCTYPE Rejection

By default, all parse methods reject XML documents containing DOCTYPE declarations. This is a defense-in-depth measure because:

  • Legitimate SOAP/WSDL documents never require DOCTYPE declarations
  • DOCTYPE is the attack vector for XXE, entity expansion, and DTD attacks
  • Rejecting DOCTYPE before parsing means DTD-related parser vulnerabilities are never reached

Examples:

Parse untrusted XML securely

doc = WSDL::XML::Parser.parse(untrusted_xml)

Parse with blank node removal (for canonicalization)

doc = WSDL::XML::Parser.parse(xml, noblanks: true)

Parse with threat logging

doc = WSDL::XML::Parser.parse_with_logging(xml)

Constant Summary

SECURE_PARSE_OPTIONS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Secure parse options for strict XML parsing.

These options provide secure defaults:

  • STRICT: Require well-formed XML
  • NONET: Block all network access during parsing
  • NOCDATA: Merge CDATA sections as text nodes

Notably ABSENT (for security):

  • NOENT: Not included, so entities are NOT substituted
  • DTDLOAD: Not included, so external DTDs are NOT loaded
  • DTDATTR: Not included, so DTD attributes are NOT defaulted

Nokogiri::XML::ParseOptions::STRICT |
Nokogiri::XML::ParseOptions::NONET |
Nokogiri::XML::ParseOptions::NOCDATA

SECURE_PARSE_OPTIONS_NOBLANKS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Secure parse options with blank node removal.

Used for signature operations where whitespace must be normalized for consistent canonicalization.

SECURE_PARSE_OPTIONS |
Nokogiri::XML::ParseOptions::NOBLANKS

RELAXED_PARSE_OPTIONS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Relaxed parse options that tolerate malformed XML.

Used for third-party WSDL documents that may not be strictly valid. Security protections still apply: NONET is included, and NOENT, DTDLOAD, and DTDATTR remain absent.

Nokogiri::XML::ParseOptions::NONET |
Nokogiri::XML::ParseOptions::RECOVER |
Nokogiri::XML::ParseOptions::NOCDATA

RELAXED_PARSE_OPTIONS_NOBLANKS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Relaxed parse options with blank node removal.

RELAXED_PARSE_OPTIONS |
Nokogiri::XML::ParseOptions::NOBLANKS

DOCTYPE_PATTERN =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Pattern to detect DOCTYPE declarations (case-insensitive).

/<!DOCTYPE/i

Class Method Summary

Methods included from Log::ClassMethods

logger

Class Method Details

.contains_doctype?(xml_string) ⇒ Boolean

Checks if an XML string contains a DOCTYPE declaration.

Parameters:

  • xml_string (String)

    the XML string to check

Returns:

  • (Boolean)

    true if DOCTYPE is present



# File 'lib/wsdl/xml/parser.rb', line 274

def contains_doctype?(xml_string)
  xml_string.b.match?(DOCTYPE_PATTERN)
end
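
A minimal standalone sketch of the same check follows. It copies the pattern and the `.b` call from the method above into a hypothetical local helper (doctype?) and a local constant (DOCTYPE_CHECK) so that it runs without the library. Matching against the binary copy keeps the check byte-oriented, so it does not depend on the input having a valid encoding.

```ruby
# Pattern and .b call copied from the method above. DOCTYPE_CHECK and doctype?
# are hypothetical local stand-ins so the sketch runs without the library.
DOCTYPE_CHECK = /<!DOCTYPE/i

def doctype?(xml_string)
  # String#b returns a binary (ASCII-8BIT) copy, so the match is byte-oriented.
  xml_string.b.match?(DOCTYPE_CHECK)
end

plain = '<root><child>text</child></root>'
lowercase = '<!doctype foo [<!ENTITY x "y">]><foo>&x;</foo>'
invalid_bytes = "\xFF\xFE<!DOCTYPE foo><foo/>"
in_comment = '<root><!-- <!DOCTYPE foo> --></root>'

p doctype?(plain)         # false
p doctype?(lowercase)     # true (the pattern is case-insensitive)
p doctype?(invalid_bytes) # true (invalid UTF-8 bytes do not prevent the match)
p doctype?(in_comment)    # true (plain substring match, see note below)
```

Because this is a plain substring match, the text `<!DOCTYPE` inside a comment or CDATA section is flagged as well. For a defense-in-depth check, that errs on the side of rejection.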

.detect_threats(xml_string) ⇒ Array<Symbol>

Checks if an XML document contains potentially dangerous constructs.

This provides defense-in-depth by detecting attack patterns before parsing. Even though our parser options block most attacks, this helps identify and log malicious input.

Detected threats:

  • :doctype — DOCTYPE declaration (often used in XXE)
  • :entity_declaration — ENTITY definitions
  • :external_reference — SYSTEM or PUBLIC identifiers
  • :parameter_entity — Parameter entity references (%entity;)
  • :deep_nesting — Excessive tag nesting (potential DoS, >1,000 open tags)
  • :large_attribute — Single attribute value >10,000 characters (potential DoS)
  • :large_attributes_total — Cumulative attribute size >1,000,000 bytes (potential DoS)

Examples:

Check for threats

threats = Parser.detect_threats(xml)
if threats.any?
  logger.warn("Suspicious XML: #{threats.join(', ')}")
end

Reject dangerous XML

threats = Parser.detect_threats(xml)
raise SecurityError, "Rejected: #{threats}" if threats.include?(:external_reference)

Parameters:

  • xml_string (String)

    the XML string to check

Returns:

  • (Array<Symbol>)

    list of detected threat indicators



# File 'lib/wsdl/xml/parser.rb', line 229

def detect_threats(xml_string)
  detect_threat_patterns(xml_string.b).uniq
end
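
The private helper detect_threat_patterns is not shown on this page. The sketch below is one way such a scan could look for four of the listed indicators. The regular expressions, the THREAT_PATTERNS table, and the scan_threats name are assumptions for illustration, not the library's actual rules.

```ruby
# Hypothetical patterns for illustration; the library's real rules may differ.
THREAT_PATTERNS = {
  doctype: /<!DOCTYPE/i,
  entity_declaration: /<!ENTITY/i,
  external_reference: /\b(?:SYSTEM|PUBLIC)\s+["']/,
  parameter_entity: /%[A-Za-z_][\w.-]*;/
}.freeze

# Returns the names of all patterns that match the binary form of the input.
def scan_threats(xml_string)
  bytes = xml_string.b
  THREAT_PATTERNS.select { |_name, pattern| bytes.match?(pattern) }.keys
end

xxe = <<~XML
  <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
  <foo>&xxe;</foo>
XML

p scan_threats(xxe)       # [:doctype, :entity_declaration, :external_reference]
p scan_threats('<root/>') # []
```

A scan of this kind reports indicators only; whether to log or reject is left to the caller, which matches how detect_threats is used in the examples above.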

.parse(xml, noblanks: false) ⇒ Nokogiri::XML::Document

Parses an XML string or returns an existing document.

This method applies secure parsing options to protect against XXE, SSRF, and other XML-based attacks. It requires well-formed XML.

Examples:

Basic usage

doc = Parser.parse('<root><child>text</child></root>')

With blank removal for signatures

doc = Parser.parse(xml, noblanks: true)

Parameters:

  • xml (String, Nokogiri::XML::Document)

    the XML to parse

  • noblanks (Boolean) (defaults to: false)

    remove blank nodes (default: false). Set to true when parsing for signature operations to ensure consistent canonicalization.

Returns:

  • (Nokogiri::XML::Document)

    the parsed document

Raises:

  • (ArgumentError)

    if xml is not a String or Document

  • (WSDL::XMLSecurityError)

    if XML contains a DOCTYPE declaration

  • (Nokogiri::XML::SyntaxError)

    if XML is malformed (strict mode)



# File 'lib/wsdl/xml/parser.rb', line 137

def parse(xml, noblanks: false)
  case xml
  when Nokogiri::XML::Document
    xml
  when String
    reject_doctype!(xml)
    options = noblanks ? SECURE_PARSE_OPTIONS_NOBLANKS : SECURE_PARSE_OPTIONS
    Nokogiri::XML(xml, nil, nil, options)
  else
    raise ArgumentError, "Expected String or Nokogiri::XML::Document, got #{xml.class}"
  end
rescue Nokogiri::XML::SyntaxError => e
  raise_if_security_error(e)
end

.parse_relaxed(xml, noblanks: false) ⇒ Nokogiri::XML::Document

Parses XML with relaxed error handling.

This is useful for parsing potentially malformed WSDL documents from third parties that may not be strictly valid XML but are still processable.

Security Note: This still applies XXE and SSRF protections (NONET is enabled, NOENT/DTDLOAD are not). It only relaxes the strict well-formedness requirements via RECOVER. DOCTYPE declarations are still rejected by default.

Parameters:

  • xml (String)

    the XML string to parse

  • noblanks (Boolean) (defaults to: false)

    remove blank nodes

Returns:

  • (Nokogiri::XML::Document)

    the parsed document

Raises:

  • (WSDL::XMLSecurityError)

    if XML contains a DOCTYPE declaration



# File 'lib/wsdl/xml/parser.rb', line 168

def parse_relaxed(xml, noblanks: false)
  reject_doctype!(xml) if xml.is_a?(String)
  options = noblanks ? RELAXED_PARSE_OPTIONS_NOBLANKS : RELAXED_PARSE_OPTIONS
  Nokogiri::XML(xml, nil, nil, options)
rescue Nokogiri::XML::SyntaxError => e
  raise_if_security_error(e)
end

.parse_untrusted(xml, noblanks: false, strict: true) {|threats| ... } ⇒ Nokogiri::XML::Document

Parses XML with a threat callback.

This method scans for potential threats before parsing and invokes the callback if any are found. The callback can log, raise, or take other action.

Examples:

Log threats but continue parsing

Parser.parse_untrusted(xml) do |threats|
  logger.warn("XML threats detected: #{threats}")
end

Reject XML with external references

Parser.parse_untrusted(xml) do |threats|
  if threats.include?(:external_reference)
    raise SecurityError, "External references not allowed"
  end
end

Parameters:

  • xml (String)

    the XML string to parse

  • noblanks (Boolean) (defaults to: false)

    remove blank nodes

  • strict (Boolean) (defaults to: true)

    use strict parsing (default: true)

Yields:

  • (threats)

    called when threats are detected

Yield Parameters:

  • threats (Array<Symbol>)

    the detected threat types

Returns:

  • (Nokogiri::XML::Document)

    the parsed document

Raises:

  • (WSDL::XMLSecurityError)

    if XML contains a DOCTYPE declaration



# File 'lib/wsdl/xml/parser.rb', line 260

def parse_untrusted(xml, noblanks: false, strict: true)
  if xml.is_a?(String)
    threats = detect_threats(xml)
    yield threats if threats.any? && block_given?
  end

  strict ? parse(xml, noblanks:) : parse_relaxed(xml, noblanks:)
end
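
Since the block receives the full threat list, a rejection policy can live in a reusable lambda and be passed with `&`. The sketch below is an assumption about how a caller might organize this. BLOCKING_THREATS and REJECT_BLOCKING are hypothetical names, and the parse_untrusted call is left as a comment so the snippet runs without the library.

```ruby
# Hypothetical policy: which threat indicators should abort parsing.
BLOCKING_THREATS = %i[external_reference entity_declaration parameter_entity].freeze

# Raises when any blocking indicator is present; otherwise returns nil so
# parsing can continue.
REJECT_BLOCKING = lambda do |threats|
  blocked = threats & BLOCKING_THREATS
  raise SecurityError, "Rejected XML: #{blocked.join(', ')}" unless blocked.empty?
end

# Intended use (requires the WSDL library):
#   doc = WSDL::XML::Parser.parse_untrusted(xml, &REJECT_BLOCKING)

REJECT_BLOCKING.call([:deep_nesting]) # not a blocking indicator, nothing raised

begin
  REJECT_BLOCKING.call(%i[doctype external_reference])
rescue SecurityError => e
  puts e.message # prints "Rejected XML: external_reference"
end
```

Note that the method yields only when at least one threat is detected, so the policy never runs for clean input.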

.parse_with_logging(xml, noblanks: false, strict: true) ⇒ Nokogiri::XML::Document

Parses XML with threat detection and logging.

Scans the XML for suspicious patterns before parsing and logs any detected threats. Useful for monitoring attack attempts against your SOAP endpoints.

Examples:

With logging

doc = Parser.parse_with_logging(response_xml)

Parameters:

  • xml (String)

    the XML string to parse

  • noblanks (Boolean) (defaults to: false)

    remove blank nodes

  • strict (Boolean) (defaults to: true)

    use strict parsing (default: true)

Returns:

  • (Nokogiri::XML::Document)

    the parsed document

Raises:

  • (WSDL::XMLSecurityError)

    if XML contains a DOCTYPE declaration



# File 'lib/wsdl/xml/parser.rb', line 192

def parse_with_logging(xml, noblanks: false, strict: true)
  if xml.is_a?(String)
    threats = detect_threats(xml)
    logger.warn("Potential XML attack detected: #{threats.join(', ')}") if threats.any?
  end

  strict ? parse(xml, noblanks:) : parse_relaxed(xml, noblanks:)
end