Module: MultiXml

Extended by:
Helpers
Defined in:
lib/multi_xml.rb,
lib/multi_xml/errors.rb,
lib/multi_xml/helpers.rb,
lib/multi_xml/version.rb,
lib/multi_xml/constants.rb,
lib/multi_xml/file_like.rb,
lib/multi_xml/parsers/ox.rb,
lib/multi_xml/parsers/oga.rb,
lib/multi_xml/parsers/rexml.rb,
lib/multi_xml/parsers/libxml.rb,
lib/multi_xml/parsers/nokogiri.rb,
lib/multi_xml/parsers/dom_parser.rb,
lib/multi_xml/parsers/libxml_sax.rb,
lib/multi_xml/parsers/sax_handler.rb,
lib/multi_xml/parsers/nokogiri_sax.rb

Overview

A generic swappable back-end for parsing XML

MultiXml provides a unified interface for XML parsing across different parser libraries. It automatically selects the best available parser (Ox, LibXML, Nokogiri, Oga, or REXML) and converts XML to Ruby hashes.

Examples:

Parse XML

MultiXml.parse('<root><name>John</name></root>')
#=> {"root"=>{"name"=>"John"}}

Set the parser

MultiXml.parser = :nokogiri

Defined Under Namespace

Modules: FileLike, Helpers, Parsers

Classes: DisallowedTypeError, NoParserError, ParseError

Constant Summary

VERSION =

The current version of MultiXml

Returns:

  • (Gem::Version)

    the gem version

Gem::Version.create("0.8.1")
TEXT_CONTENT_KEY =

Hash key for storing text content within element hashes

Examples:

Accessing text content

result = MultiXml.parse('<name>John</name>')
result["name"] #=> "John" (simplified, but internally uses __content__)

Returns:

  • (String)

    the key "__content__" used for text content

"__content__".freeze
RUBY_TYPE_TO_XML =

Maps Ruby class names to XML type attribute values

Examples:

Check XML type for a Ruby class

RUBY_TYPE_TO_XML["Integer"] #=> "integer"

Returns:

  • (Hash{String => String})

    mapping of Ruby class names to XML types

{
  "Symbol" => "symbol",
  "Integer" => "integer",
  "BigDecimal" => "decimal",
  "Float" => "float",
  "TrueClass" => "boolean",
  "FalseClass" => "boolean",
  "Date" => "date",
  "DateTime" => "datetime",
  "Time" => "datetime",
  "Array" => "array",
  "Hash" => "hash"
}.freeze
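
As a rough illustration of how a table like this can be consulted, the sketch below looks a value up by its class name. A few entries are copied into a local constant so the snippet runs without the gem; `SKETCH_TYPE_MAP` and `xml_type_for` are hypothetical names used only for this example and are not part of MultiXml.

```ruby
require "date"

# Local copy of some RUBY_TYPE_TO_XML entries so the sketch is standalone.
SKETCH_TYPE_MAP = {
  "Integer" => "integer",
  "Float" => "float",
  "TrueClass" => "boolean",
  "FalseClass" => "boolean",
  "Date" => "date",
  "Time" => "datetime",
  "Array" => "array",
  "Hash" => "hash"
}.freeze

# Hypothetical helper: returns the XML type name for a Ruby value, or nil
# when the value's class has no entry in the table.
def xml_type_for(value)
  SKETCH_TYPE_MAP[value.class.name]
end

xml_type_for(42)        # "integer"
xml_type_for(Time.now)  # "datetime"
xml_type_for("plain")   # nil, String has no entry
```

Because the keys are class names, a lookup of this kind matches only the exact class and not its subclasses.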
DISALLOWED_TYPES =

XML type attributes disallowed by default for security

These types are blocked to prevent code execution vulnerabilities.

Examples:

Check default disallowed types

DISALLOWED_TYPES #=> ["symbol", "yaml"]

Returns:

  • (Array<String>)

    list of disallowed type names

%w[symbol yaml].freeze
FALSE_BOOLEAN_VALUES =

Values that represent false in XML boolean attributes

Examples:

Check false values

FALSE_BOOLEAN_VALUES.include?("0") #=> true

Returns:

  • (Set<String>)

    values considered false

Set.new(%w[0 false]).freeze
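
The set is small, so it may help to see which inputs fall outside it. The sketch below rebuilds the set locally and pairs it with a lambda of the same shape as the "boolean" entry in TYPE_CONVERTERS; the `SKETCH_` names are assumptions made for this example.

```ruby
require "set"

# Local copy of the documented set so the sketch runs without the gem.
SKETCH_FALSE_VALUES = Set.new(%w[0 false]).freeze

# Same shape as the documented "boolean" converter: strip, then test membership.
SKETCH_TO_BOOLEAN = ->(s) { !SKETCH_FALSE_VALUES.include?(s.strip) }

SKETCH_TO_BOOLEAN.call("0")        # false
SKETCH_TO_BOOLEAN.call(" false ")  # false, whitespace is stripped first
SKETCH_TO_BOOLEAN.call("FALSE")    # true, the membership test is case-sensitive
SKETCH_TO_BOOLEAN.call("no")       # true, only "0" and "false" count as false
```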
DEFAULT_OPTIONS =

Default parsing options

Examples:

View defaults

DEFAULT_OPTIONS[:symbolize_keys] #=> false

Returns:

  • (Hash)

    default options for parse method

{
  typecast_xml_value: true,
  disallowed_types: DISALLOWED_TYPES,
  symbolize_keys: false
}.freeze
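
The parse method combines these defaults with per-call options through `Hash#merge`. The sketch below shows the effect on a local copy of the hash; `SKETCH_DEFAULTS` and `effective_options` are hypothetical names introduced for illustration.

```ruby
# Local copy of the documented defaults so the sketch runs without the gem.
SKETCH_DEFAULTS = {
  typecast_xml_value: true,
  disallowed_types: %w[symbol yaml].freeze,
  symbolize_keys: false
}.freeze

# Hash#merge returns a new hash; keys in the argument win over the defaults.
def effective_options(overrides)
  SKETCH_DEFAULTS.merge(overrides)
end

effective_options(symbolize_keys: true)
# symbolize_keys is now true, the other two keys keep their defaults

effective_options(disallowed_types: [])
# an explicit list replaces the default list, it does not extend it
```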
PARSER_PREFERENCE =

Parser libraries in preference order (fastest first)

Examples:

View parser order

PARSER_PREFERENCE.first #=> ["ox", :ox]

Returns:

  • (Array<Array>)

    pairs of [require_path, parser_symbol]

[
  ["ox", :ox],
  ["libxml", :libxml],
  ["nokogiri", :nokogiri],
  ["rexml/document", :rexml],
  ["oga", :oga]
].freeze
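
One way a list of [require_path, parser_symbol] pairs can be used is to try each `require` in turn and keep the first that loads. The sketch below does that with stand-in standard libraries, so it makes no claim about how MultiXml's own detection is implemented; `SKETCH_PREFERENCE` and `first_loadable` are hypothetical names.

```ruby
# Stand-in list in the same [require_path, parser_symbol] shape. The first
# entry is deliberately a library that does not exist.
SKETCH_PREFERENCE = [
  ["no_such_xml_library_for_sketch", :missing],
  ["json", :json],
  ["set", :set]
].freeze

# Hypothetical helper: returns the symbol of the first library that loads,
# or nil when none of them can be required.
def first_loadable(preference)
  preference.each do |require_path, symbol|
    begin
      require require_path
      return symbol
    rescue LoadError
      next
    end
  end
  nil
end

first_loadable(SKETCH_PREFERENCE) # :json
```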
PARSE_DATETIME =

This constant is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Parses datetime strings, trying Time first then DateTime

Returns:

  • (Proc)

    lambda that parses datetime strings

lambda do |string|
  Time.parse(string).utc
rescue ArgumentError
  DateTime.parse(string).to_time.utc
end
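
Since the constant is private, the sketch below copies the lambda into a local constant (`SKETCH_PARSE_DATETIME`, a name assumed for this example) to show the UTC normalization on an input with an offset. It is meant for understanding the behavior, not as a recommended call site.

```ruby
require "time" # provides Time.parse
require "date" # provides DateTime.parse

# Local copy of the documented lambda: Time.parse first, DateTime.parse
# only when Time.parse raises ArgumentError.
SKETCH_PARSE_DATETIME = lambda do |string|
  Time.parse(string).utc
rescue ArgumentError
  DateTime.parse(string).to_time.utc
end

parsed = SKETCH_PARSE_DATETIME.call("2024-01-15T12:30:00+02:00")
parsed.utc? # true
parsed.hour # 10, the +02:00 offset has been converted to UTC
```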
FILE_CONVERTER =

This constant is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Creates a file-like StringIO from base64-encoded content

Returns:

  • (Proc)

    lambda that creates file objects

lambda do |content, entity|
  StringIO.new(content.unpack1("m")).tap do |io|
    io.extend(FileLike)
    file_io = io # : FileIO
    file_io.original_filename = entity["name"]
    file_io.content_type = entity["content_type"]
  end
end
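
To make the data flow concrete, the sketch below reproduces the same steps with a stand-in module. `SketchFileLike` is a hypothetical substitute for FileLike that only provides the two accessors the lambda assigns; the real module may define more.

```ruby
require "stringio"

# Hypothetical stand-in for MultiXml::FileLike with just the two attributes
# that the converter sets.
module SketchFileLike
  attr_accessor :original_filename, :content_type
end

# Same steps as the documented lambda: base64-decode the content, wrap it in
# a StringIO, then attach the file name and content type from the entity.
SKETCH_FILE_CONVERTER = lambda do |content, entity|
  StringIO.new(content.unpack1("m")).tap do |io|
    io.extend(SketchFileLike)
    io.original_filename = entity["name"]
    io.content_type = entity["content_type"]
  end
end

encoded = ["hello"].pack("m") # base64 for "hello"
file = SKETCH_FILE_CONVERTER.call(encoded, {"name" => "greeting.txt", "content_type" => "text/plain"})
file.read              # "hello"
file.original_filename # "greeting.txt"
```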
TYPE_CONVERTERS =

Type converters for XML type attributes

Maps type attribute values to lambdas that convert string content. Converters with arity 2 receive the content and the full entity hash.

Examples:

Using a converter

TYPE_CONVERTERS["integer"].call("42") #=> 42

Returns:

  • (Hash{String => Proc})

    mapping of type names to converter procs

{
  # Primitive types
  "symbol" => :to_sym.to_proc,
  "string" => :to_s.to_proc,
  "integer" => :to_i.to_proc,
  "float" => :to_f.to_proc,
  "double" => :to_f.to_proc,
  "decimal" => ->(s) { BigDecimal(s) },
  "boolean" => ->(s) { !FALSE_BOOLEAN_VALUES.include?(s.strip) },

  # Date and time types
  "date" => Date.method(:parse),
  "datetime" => PARSE_DATETIME,
  "dateTime" => PARSE_DATETIME,

  # Binary types
  "base64Binary" => ->(s) { s.unpack1("m") },
  "binary" => ->(s, entity) { (entity["encoding"] == "base64") ? s.unpack1("m") : s },
  "file" => FILE_CONVERTER,

  # Structured types
  "yaml" => lambda do |string|
    YAML.safe_load(string, permitted_classes: [Symbol, Date, Time])
  rescue ArgumentError, Psych::SyntaxError
    string
  end
}.freeze
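
The arity rule described above can be sketched as a small dispatcher. The table below copies two entries from the listing; `SKETCH_CONVERTERS` and `call_converter` are hypothetical names, and the library's own dispatch (for example in Helpers) may differ in detail.

```ruby
# Local copy of two documented converters: one takes only the content, the
# other also takes the entity hash.
SKETCH_CONVERTERS = {
  "integer" => :to_i.to_proc,
  "binary" => ->(s, entity) { (entity["encoding"] == "base64") ? s.unpack1("m") : s }
}.freeze

# Hypothetical dispatcher: pass the entity only to converters with arity 2.
def call_converter(type, content, entity)
  converter = SKETCH_CONVERTERS.fetch(type)
  converter.arity == 2 ? converter.call(content, entity) : converter.call(content)
end

call_converter("integer", "42", {})                            # 42
call_converter("binary", "aGVsbG8=", {"encoding" => "base64"}) # "hello"
call_converter("binary", "raw", {})                            # "raw"
```

Checking for an arity of exactly 2 matters here because a proc built from a symbol reports a negative arity and must still be called with the content alone.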

Class Method Summary

Methods included from Helpers

apply_converter, convert_hash, convert_text_content, disallowed_type?, empty_value?, extract_array_entries, find_array_entries, symbolize_keys, transform_keys, typecast_array, typecast_children, typecast_hash, typecast_xml_value, undasherize_keys, unwrap_file_if_present, unwrap_if_simple, wrap_and_typecast

Class Method Details

.parse(xml, options = {}) ⇒ Hash

Parse XML into a Ruby Hash

Examples:

Parse simple XML

MultiXml.parse('<root><name>John</name></root>')
#=> {"root"=>{"name"=>"John"}}

Parse with symbolized keys

MultiXml.parse('<root><name>John</name></root>', symbolize_keys: true)
#=> {root: {name: "John"}}

Parameters:

  • xml (String, IO)

    XML content as a string or IO-like object

  • options (Hash) (defaults to: {})

    Parsing options

Options Hash (options):

  • :parser (Symbol, String, Module)

    Parser to use for this call

  • :symbolize_keys (Boolean)

    Convert keys to symbols (default: false)

  • :disallowed_types (Array<String>)

    Types to reject (default: ['symbol', 'yaml'])

  • :typecast_xml_value (Boolean)

    Apply type conversions (default: true)

Returns:

  • (Hash)

    Parsed XML as nested hash

Raises:



# File 'lib/multi_xml.rb', line 74

def parse(xml, options = {})
  options = DEFAULT_OPTIONS.merge(options)
  xml_parser = options[:parser] ? resolve_parser(options.fetch(:parser)) : parser

  io = normalize_input(xml)
  return {} if io.eof?

  result = parse_with_error_handling(io, xml, xml_parser)
  result = typecast_xml_value(result, options.fetch(:disallowed_types)) if options.fetch(:typecast_xml_value)
  result = symbolize_keys(result) if options.fetch(:symbolize_keys)
  result
end
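
The last step of the method converts keys at every nesting level, as the `symbolize_keys: true` example shows. A minimal sketch of that kind of recursive conversion follows; `deep_symbolize` is a hypothetical function written for this example and is not the Helpers implementation.

```ruby
# Hypothetical helper: returns a copy of the structure with all hash keys
# converted to symbols, descending into nested hashes and arrays.
def deep_symbolize(value)
  case value
  when Hash
    value.each_with_object({}) { |(key, child), out| out[key.to_sym] = deep_symbolize(child) }
  when Array
    value.map { |item| deep_symbolize(item) }
  else
    value
  end
end

deep_symbolize({"root" => {"name" => "John"}})
# {root: {name: "John"}}
```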

.parser ⇒ Module

Get the current XML parser module

Returns the currently configured parser, auto-detecting one if not set. Parsers are checked in order of performance: Ox, LibXML, Nokogiri, Oga, REXML.

Examples:

Get current parser

MultiXml.parser #=> MultiXml::Parsers::Ox

Returns:

  • (Module)

    the current parser module



# File 'lib/multi_xml.rb', line 37

def parser
  @parser ||= resolve_parser(detect_parser)
end

.parser=(new_parser) ⇒ Module

Set the XML parser to use

Examples:

Set parser by symbol

MultiXml.parser = :nokogiri

Set parser by module

MultiXml.parser = MyCustomParser

Parameters:

  • new_parser (Symbol, String, Module)

    Parser specification

    • Symbol/String: :libxml, :nokogiri, :ox, :rexml, :oga
    • Module: Custom parser implementing parse(io) and parse_error

Returns:

  • (Module)

    the newly configured parser module



# File 'lib/multi_xml.rb', line 52

def parser=(new_parser)
  @parser = resolve_parser(new_parser)
end
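
The parameter description says a custom parser module implements parse(io) and parse_error. The sketch below is a toy module with that interface. It assumes that parse_error returns the exception class the backend raises on malformed input and that text is stored under the "__content__" key; both are assumptions about the expected contract, and the regex handles only a single `<tag>text</tag>` element.

```ruby
require "stringio"

# Toy parser for illustration only. It understands one element with text
# content and nothing else.
module MyCustomParser
  # Error raised by this toy backend on input it cannot handle.
  class Error < StandardError; end

  module_function

  # Assumed contract: the exception class this backend raises on bad input.
  def parse_error
    Error
  end

  # Reads the IO and returns a nested hash, with text under "__content__".
  def parse(io)
    match = io.read.match(%r{\A\s*<(\w+)>([^<]*)</\1>\s*\z})
    raise Error, "unsupported or malformed XML" unless match

    {match[1] => {"__content__" => match[2]}}
  end
end

MyCustomParser.parse(StringIO.new("<name>John</name>"))
# {"name" => {"__content__" => "John"}}

# Register it only when the gem has been loaded.
MultiXml.parser = MyCustomParser if defined?(MultiXml)
```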