Module: WSDL::XML::Parser
- Extended by: Log::ClassMethods
- Defined in: lib/wsdl/xml/parser.rb
Overview
Secure XML parsing with protection against common XML attacks.
This module provides a centralized, secure way to parse XML throughout the WSDL library. It protects against:
- XXE (XML External Entity) attacks: External entities are not loaded because we omit the NOENT flag (which would enable substitution) and include NONET (which blocks network access).
- SSRF (Server-Side Request Forgery): Network access during parsing is disabled via NONET, preventing the parser from making outbound requests.
- DTD-based attacks: We deliberately omit DTDLOAD and DTDATTR flags, so external DTDs are not loaded and DTD attributes are not defaulted. Additionally, DOCTYPE declarations are rejected by default as defense-in-depth.
- Billion Laughs / XML Bomb: Internal entity expansion is limited by Nokogiri/libxml2's default entity expansion limits. For defense in depth, use Parser.detect_threats to identify suspicious patterns before parsing.
Security Design
Nokogiri's ParseOptions flags are additive: a behavior applies only when its flag is set. For security, we carefully choose which flags to include:
Flags we INCLUDE (enabled):
- NONET: Disable network access (prevents SSRF, external entity fetching)
- NOCDATA: Merge CDATA as text (simplifies processing)
- STRICT: Require well-formed XML (for parse(), not parse_relaxed())
- NOBLANKS: Remove blank nodes (optional, for canonicalization)
Flags we deliberately OMIT (disabled):
- NOENT: Would enable entity substitution — we leave it OFF
- DTDLOAD: Would load external DTDs — we leave it OFF
- DTDATTR: Would default attributes from DTD — we leave it OFF
- DTDVALID: Would validate against DTD — we leave it OFF
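Because the flags are plain integer bits, the include/omit policy above can be checked mechanically. The following is a minimal sketch in plain Ruby with no Nokogiri dependency. The numeric values are stand-ins chosen for illustration, and `forbidden_flags` and `network_blocked?` are hypothetical helpers, not part of this module; real code would read the constants from Nokogiri::XML::ParseOptions.

```ruby
# Stand-in flag bits for illustration only; real code would read these
# from Nokogiri::XML::ParseOptions rather than hard-coding them.
FLAGS = {
  RECOVER:  1 << 0,
  NOENT:    1 << 1,
  DTDLOAD:  1 << 2,
  DTDATTR:  1 << 3,
  DTDVALID: 1 << 4,
  NOBLANKS: 1 << 8,
  NONET:    1 << 11,
  NOCDATA:  1 << 14
}.freeze

# Flags the security design deliberately leaves unset.
FORBIDDEN = %i[NOENT DTDLOAD DTDATTR DTDVALID].freeze

# Hypothetical helper: names of forbidden flags present in a bitmask.
def forbidden_flags(options)
  FORBIDDEN.select { |name| (options & FLAGS.fetch(name)) != 0 }
end

# Hypothetical helper: true when the NONET bit is set in the bitmask.
def network_blocked?(options)
  (options & FLAGS.fetch(:NONET)) != 0
end

secure   = FLAGS[:NONET] | FLAGS[:NOCDATA]
careless = secure | FLAGS[:NOENT] | FLAGS[:DTDLOAD]

p forbidden_flags(secure)    # => []
p forbidden_flags(careless)  # => [:NOENT, :DTDLOAD]
p network_blocked?(secure)   # => true
```

A check along these lines could run in a test suite to catch an accidental `| NOENT` added to one of the option constants.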
DOCTYPE Rejection
By default, all parse methods reject XML documents containing DOCTYPE declarations. This is a defense-in-depth measure because:
- Legitimate SOAP/WSDL documents never require DOCTYPE declarations
- DOCTYPE is the attack vector for XXE, entity expansion, and DTD attacks
- Rejecting DOCTYPE before parsing means the parser's DTD-handling code, and any vulnerabilities in it, is never reached
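The reject-before-parse idea can be sketched in a few lines of plain Ruby. This is an illustration under assumptions, not the library's implementation: `DoctypeRejected` and `reject_doctype_sketch!` are hypothetical names, and the pattern is copied from DOCTYPE_PATTERN as documented in the constant summary.

```ruby
# Hypothetical error type for this sketch; the library's own error class
# is not shown in this document.
class DoctypeRejected < StandardError; end

# Same expression as the documented DOCTYPE_PATTERN constant.
DOCTYPE_SKETCH_PATTERN = /<!DOCTYPE/i

# Hypothetical guard: raises before any parser sees the input.
def reject_doctype_sketch!(xml_string)
  # String#b returns a binary (ASCII-8BIT) copy, so the match is byte-oriented.
  return unless xml_string.b.match?(DOCTYPE_SKETCH_PATTERN)

  raise DoctypeRejected, "DOCTYPE declarations are not allowed"
end

plain = '<definitions name="Example"/>'
xxe   = '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'

reject_doctype_sketch!(plain) # returns nil, nothing raised

begin
  reject_doctype_sketch!(xxe)
rescue DoctypeRejected => e
  puts "rejected: #{e.message}"
end
```

The guard is a plain substring scan, so it also fires on a DOCTYPE that appears inside a comment or CDATA section. For SOAP/WSDL input that trade-off seems acceptable, given the first bullet above.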
Constant Summary
- SECURE_PARSE_OPTIONS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Secure parse options for strict XML parsing.
These options provide secure defaults:
- STRICT: Require well-formed XML
- NONET: Block all network access during parsing
- NOCDATA: Merge CDATA sections as text nodes
Notably ABSENT (for security):
- NOENT: Not included, so entities are NOT substituted
- DTDLOAD: Not included, so external DTDs are NOT loaded
- DTDATTR: Not included, so DTD attributes are NOT defaulted
Nokogiri::XML::ParseOptions::STRICT | Nokogiri::XML::ParseOptions::NONET | Nokogiri::XML::ParseOptions::NOCDATA
- SECURE_PARSE_OPTIONS_NOBLANKS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Secure parse options with blank node removal.
Used for signature operations where whitespace must be normalized for consistent canonicalization.
SECURE_PARSE_OPTIONS | Nokogiri::XML::ParseOptions::NOBLANKS
- RELAXED_PARSE_OPTIONS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Relaxed parse options that tolerate malformed XML.
Used for third-party WSDL documents that may not be strictly valid. Still includes security protections (NONET).
Nokogiri::XML::ParseOptions::NONET | Nokogiri::XML::ParseOptions::RECOVER | Nokogiri::XML::ParseOptions::NOCDATA
- RELAXED_PARSE_OPTIONS_NOBLANKS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Relaxed parse options with blank node removal.
RELAXED_PARSE_OPTIONS | Nokogiri::XML::ParseOptions::NOBLANKS
- DOCTYPE_PATTERN =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Pattern to detect DOCTYPE declarations (case-insensitive).
/<!DOCTYPE/i
Class Method Summary
- .contains_doctype?(xml_string) ⇒ Boolean
Checks if an XML string contains a DOCTYPE declaration.
- .detect_threats(xml_string) ⇒ Array<Symbol>
Checks if an XML document contains potentially dangerous constructs.
- .parse(xml, noblanks: false) ⇒ Nokogiri::XML::Document
Parses an XML string or returns an existing document.
- .parse_relaxed(xml, noblanks: false) ⇒ Nokogiri::XML::Document
Parses XML with relaxed error handling.
- .parse_untrusted(xml, noblanks: false, strict: true) {|threats| ... } ⇒ Nokogiri::XML::Document
Parses XML with threat callback.
- .parse_with_logging(xml, noblanks: false, strict: true) ⇒ Nokogiri::XML::Document
Parses XML with threat detection and logging.
Methods included from Log::ClassMethods
Class Method Details
.contains_doctype?(xml_string) ⇒ Boolean
Checks if an XML string contains a DOCTYPE declaration.
# File 'lib/wsdl/xml/parser.rb', line 274
def contains_doctype?(xml_string)
  xml_string.b.match?(DOCTYPE_PATTERN)
end
.detect_threats(xml_string) ⇒ Array<Symbol>
Checks if an XML document contains potentially dangerous constructs.
This provides defense-in-depth by detecting attack patterns before parsing. Even though our parser options block most attacks, this helps identify and log malicious input.
Detected threats:
- :doctype - DOCTYPE declaration (often used in XXE)
- :entity_declaration - ENTITY definitions
- :external_reference - SYSTEM or PUBLIC identifiers
- :parameter_entity - Parameter entity references (%entity;)
- :deep_nesting - Excessive tag nesting (potential DoS, >1,000 open tags)
- :large_attribute - Single attribute value >10,000 characters (potential DoS)
- :large_attributes_total - Cumulative attribute size >1,000,000 bytes (potential DoS)
# File 'lib/wsdl/xml/parser.rb', line 229
def detect_threats(xml_string)
  detect_threat_patterns(xml_string.b).uniq
end
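The private `detect_threat_patterns` helper is not shown on this page. As a rough illustration of the approach, the sketch below scans a binary copy of the input against a small table of regular expressions and returns the matching symbols. The pattern table and the name `detect_threats_sketch` are assumptions made for this example, and the size and nesting checks are left out.

```ruby
# Hypothetical pattern table; the library's real rules may differ.
THREAT_PATTERNS_SKETCH = {
  doctype:            /<!DOCTYPE/i,
  entity_declaration: /<!ENTITY/i,
  external_reference: /\b(?:SYSTEM|PUBLIC)\s+["']/,
  parameter_entity:   /%[A-Za-z_][\w.-]*;/
}.freeze

# Returns the names of all patterns found in the input, in table order.
def detect_threats_sketch(xml_string)
  bytes = xml_string.b # binary copy, so the scan is byte-oriented
  THREAT_PATTERNS_SKETCH.select { |_name, pattern| bytes.match?(pattern) }.keys
end

sample = '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'

p detect_threats_sketch(sample)  # => [:doctype, :entity_declaration, :external_reference]
p detect_threats_sketch('<a/>')  # => []
```

Pattern scans like this can produce false positives (for example, the word SYSTEM followed by a quote in ordinary text), which fits the role described above: a signal for logging and policy decisions, with the parser options remaining the actual protection.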
.parse(xml, noblanks: false) ⇒ Nokogiri::XML::Document
Parses an XML string or returns an existing document.
This method applies secure parsing options to protect against XXE, SSRF, and other XML-based attacks. It requires well-formed XML.
# File 'lib/wsdl/xml/parser.rb', line 137
def parse(xml, noblanks: false)
  case xml
  when Nokogiri::XML::Document
    xml
  when String
    reject_doctype!(xml)
    options = noblanks ? SECURE_PARSE_OPTIONS_NOBLANKS : SECURE_PARSE_OPTIONS
    Nokogiri::XML(xml, nil, nil, options)
  else
    raise ArgumentError, "Expected String or Nokogiri::XML::Document, got #{xml.class}"
  end
rescue Nokogiri::XML::SyntaxError => e
  raise_if_security_error(e)
end
.parse_relaxed(xml, noblanks: false) ⇒ Nokogiri::XML::Document
Parses XML with relaxed error handling.
This is useful for parsing potentially malformed WSDL documents from third parties that may not be strictly valid XML but are still processable.
Security Note: This still applies XXE and SSRF protections (NONET is enabled, NOENT/DTDLOAD are not). It only relaxes the strict well-formedness requirements via RECOVER. DOCTYPE declarations are still rejected by default.
# File 'lib/wsdl/xml/parser.rb', line 168
def parse_relaxed(xml, noblanks: false)
  reject_doctype!(xml) if xml.is_a?(String)
  options = noblanks ? RELAXED_PARSE_OPTIONS_NOBLANKS : RELAXED_PARSE_OPTIONS
  Nokogiri::XML(xml, nil, nil, options)
rescue Nokogiri::XML::SyntaxError => e
  raise_if_security_error(e)
end
.parse_untrusted(xml, noblanks: false, strict: true) {|threats| ... } ⇒ Nokogiri::XML::Document
Parses XML with threat callback.
This method scans for potential threats before parsing and invokes the callback if any are found. The callback can log, raise, or take other action.
# File 'lib/wsdl/xml/parser.rb', line 260
def parse_untrusted(xml, noblanks: false, strict: true)
  if xml.is_a?(String)
    threats = detect_threats(xml)
    yield threats if threats.any? && block_given?
  end

  strict ? parse(xml, noblanks:) : parse_relaxed(xml, noblanks:)
end
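Since the block runs before parsing starts, a caller can use it to enforce a policy such as refusing flagged input outright. The sketch below mirrors the control flow of the method above with injected stand-ins, so it runs without the library. `parse_untrusted_sketch`, `ThreatDetected`, and the two lambdas are hypothetical names introduced for illustration.

```ruby
# Stand-in with the same control flow as parse_untrusted. The detector and
# parser are injected so the sketch runs without Nokogiri or the library.
def parse_untrusted_sketch(xml, detector:, parser:)
  if xml.is_a?(String)
    threats = detector.call(xml)
    yield threats if threats.any? && block_given?
  end

  parser.call(xml)
end

# Hypothetical error raised by the caller's policy block.
class ThreatDetected < StandardError; end

# Toy detector and parser, for demonstration only.
detector = ->(xml) { xml.match?(/<!ENTITY/i) ? [:entity_declaration] : [] }
parser   = ->(xml) { "parsed #{xml.bytesize} bytes" }

# Clean input: the block is never called.
puts parse_untrusted_sketch("<a/>", detector: detector, parser: parser) { |threats|
  raise ThreatDetected, threats.join(", ")
}

# Flagged input: the block raises, so the parser is never invoked.
begin
  parse_untrusted_sketch('<!ENTITY x "y">', detector: detector, parser: parser) do |threats|
    raise ThreatDetected, threats.join(", ")
  end
rescue ThreatDetected => e
  puts "refused: #{e.message}"
end
```

A block that only records the threats (for metrics, say) and returns normally lets parsing continue, in which case the DOCTYPE rejection in parse and parse_relaxed still applies.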
.parse_with_logging(xml, noblanks: false, strict: true) ⇒ Nokogiri::XML::Document
Parses XML with threat detection and logging.
Scans the XML for suspicious patterns before parsing and logs any detected threats. Useful for monitoring attack attempts against your SOAP endpoints.
# File 'lib/wsdl/xml/parser.rb', line 192
def parse_with_logging(xml, noblanks: false, strict: true)
  if xml.is_a?(String)
    threats = detect_threats(xml)
    logger.warn("Potential XML attack detected: #{threats.join(', ')}") if threats.any?
  end

  strict ? parse(xml, noblanks:) : parse_relaxed(xml, noblanks:)
end
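To see what the warning looks like in a log, the sketch below builds the same message string and sends it to a standard-library Logger that writes to an in-memory buffer. How Log::ClassMethods supplies `logger` is not shown on this page, so the Logger setup, the formatter, and the `threat_warning` helper are assumptions for illustration.

```ruby
require "logger"
require "stringio"

# Hypothetical helper that builds the same text parse_with_logging
# passes to logger.warn.
def threat_warning(threats)
  "Potential XML attack detected: #{threats.join(', ')}"
end

log_output = StringIO.new
logger = Logger.new(log_output)
# Compact formatter so the output is easy to read in this example.
logger.formatter = ->(severity, _time, _progname, msg) { "#{severity}: #{msg}\n" }

threats = %i[doctype entity_declaration] # e.g. a detect_threats result
logger.warn(threat_warning(threats)) if threats.any?

puts log_output.string
# WARN: Potential XML attack detected: doctype, entity_declaration
```

Because the threat names are joined into a single line, a log search for "Potential XML attack detected" should surface every flagged document along with the reasons it was flagged.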