saxon-rb – An idiomatic Ruby wrapper for Saxon

saxon-rb aims to be a fully-featured idiomatic wrapper for the Saxon XML processing and transformation library. Built after several years' experience with writing, using, and maintaining the saxon-xslt XSLT-focussed wrapper, saxon-rb aims to keep that ease of use for the most common cases, while making the less common cases easy as well.

Saxon provides a massive amount of valuable functionality beyond simple XSLT compilation and invocation, and a lot of that is very hard to use from Ruby. The facilities provided by it, and by the XSLT 2 and 3 specs, rely heavily on the XDM values and types system, which saxon-rb makes easy to work with.

Parameter creation and passing are richer and more expressive, and results from XSLT or XPath that aren't just result trees can be worked with directly in Ruby.

You can find Saxon HE at http://saxon.sourceforge.net/ and Saxonica at http://www.saxonica.com/

Saxon HE is (c) Michael H. Kay and released under the Mozilla MPL 1.0 (http://www.mozilla.org/MPL/1.0/)

Installation

Add this line to your application's Gemfile:

gem 'saxon-rb'

And then execute:

$ bundle

Or install it yourself as:

$ gem install saxon-rb

Simple usage

Parse an XML document

Using a default document builder from the default processor:

document_node = Saxon::Processor.create.document_builder.build(Saxon::Source.from_path('/path/to/your.xml'))
# Or
document_node = Saxon.XML('/path/to/your.xml')

Transform an XML document with XSLT

transformer = Saxon::Processor.create.xslt_compiler.compile(Saxon::Source.from_path('/path/to/your.xsl'))
# Or
transformer = Saxon.XSLT('/path/to/your.xsl')

# Apply templates against a document
result_1 = transformer.apply_templates(document_node)

# Call a template without a context item to process
result_2 = transformer.call_template('main-template')

Run XPath queries against an XML document

processor = Saxon::Processor.create
xpath = processor.xpath_compiler.compile('//element[@attr = "value"]')

matches = xpath.evaluate(document_node)

Migrating from `saxon-xslt` (or Nokogiri)

saxon-xslt wrapped Saxon and provided a Nokogiri-esque API. Nokogiri is built on XSLT 1 processors, and the APIs support XSLT 1 features, but won't allow XSLT 2/3 features (like setting initial tunnel parameters, starting processing by calling a named template, or a function). The main API for invoking XSLT in saxon-rb needs to be different from Nokogiri's so that full use of XSLT 2/3 features is possible.

By default, the original saxon-xslt API (on Saxon::XSLT::Stylesheet) is not available. If you need those methods, then you can load the legacy API by requiring saxon/nokogiri.

That gives you back the #transform, #apply_to, and #serialize methods on the object you get back after compiling an XSLT: Saxon::XSLT::Executable in saxon-rb. They work the same way, and you should be able to drop in saxon-rb as a replacement for XSLT processing.

require 'saxon-rb'
require 'saxon/nokogiri'

xslt = Saxon.XSLT('/path/to/my.xsl')
xslt.apply_to(Saxon.XML('/path/to/my.xml')) #=> "<result-xml/>"

Usage

XSLT

Using XSLT involves creating a compiler, compiling an XSLT document, and then using that compiled document to transform something.

Constructing a Compiler

The simplest way is to call #xslt_compiler on a Saxon::Processor instance.

processor = Saxon::Processor.create

# Simplest, default options
compiler = processor.xslt_compiler

To set compile-time options or declare static compile-time parameters, pass a block to the method using the DSL syntax (see the DSL RDoc for complete details):

compiler = processor.xslt_compiler {
  static_parameters 'param' => 'value'
  default_collation 'https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/'
}

The static context for a Compiler cannot be changed; you must create a new Compiler with the context you want. We make it very simple to create a new Compiler based on an existing one. Declaring a parameter again overwrites its value.

new_compiler = compiler.create {
  static_parameters 'param' => 'new value'
}

new_compiler.default_collation #=> "https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/"

If you want to remove a value, you need to start from scratch. You can, of course, extract any data you want from a compiler instance separately and use that to create a new one.

params = compiler.static_parameters
new_compiler = processor.xslt_compiler {
  static_parameters params
}
new_compiler.default_collation #=> nil
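
The derive-or-rebuild behaviour described above can be modelled in plain Ruby. This is a toy sketch, not saxon-rb's implementation: ToyCompiler is a hypothetical class whose instances are frozen, so #create has to return a new object. Overrides replace inherited values, and the only way to drop a value is to build a fresh instance.

```ruby
# Toy model only: ToyCompiler is hypothetical and is not part of saxon-rb.
class ToyCompiler
  attr_reader :static_parameters, :default_collation

  def initialize(static_parameters: {}, default_collation: nil)
    @static_parameters = static_parameters.dup.freeze
    @default_collation = default_collation
    freeze # the static context can no longer be changed
  end

  # Returns a new compiler that inherits every setting and overrides
  # only the ones that are passed in.
  def create(static_parameters: {}, default_collation: nil)
    self.class.new(
      static_parameters: @static_parameters.merge(static_parameters),
      default_collation: default_collation || @default_collation
    )
  end
end

base = ToyCompiler.new(
  static_parameters: { 'param' => 'value' },
  default_collation: 'http://example.org/collation'
)

# Derived compiler: the parameter is overwritten, the collation is inherited.
derived = base.create(static_parameters: { 'param' => 'new value' })

# Fresh compiler: only the extracted parameters are carried over.
fresh = ToyCompiler.new(static_parameters: base.static_parameters)

derived.static_parameters #=> {"param"=>"new value"}
derived.default_collation #=> "http://example.org/collation"
fresh.default_collation   #=> nil
```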

Compiling an XSLT stylesheet

Once you have a compiler, call #compile and pass in a Saxon::Source or an existing Saxon::XDM::Node. Parameters and other run-time configuration options can be set using a block in the same way as creating a compiler. You'll be returned a Saxon::XSLT::Executable.

source = Saxon::Source.create('my.xsl')
xslt = compiler.compile(source) {
  initial_template_parameters 'param' => 'other value'
}

You can also pass in (or override) parameters at stylesheet execution time, but if you'll be executing the same stylesheet against many documents with the same initial parameters then setting them at compile time is simpler.

Executing an XSLT stylesheet

Once you have a compiled stylesheet, then it can be executed against a source document in a variety of ways.

First, you can use the traditional apply templates method, which was the only way in XSLT 1.

input = Saxon::Source.create('input.xml')
result = xslt.apply_templates(input)

Next, you can call a specific named template (new in XSLT 2).

result = xslt.call_template('template-name')

Note that there's no input document here. If your XSLT needs a global context item set when you invoke it via a named template, then you can do that, too:

input = processor.XML('input.xml')
result = xslt.call_template('template-name', {
  global_context_item: input
})

Global and initial template parameters can be set at compiler creation time, compile time, or execution time. See Setting parameters for details.

To serialize the document you can, of course, just call #to_s on the result:

result = xslt.apply_templates(input)
puts result.to_s #=> '<?xml version="1.0"...'

You can also serialize directly to a file path or to any IO instance.

result = xslt.apply_templates(input)
result.serialize('/path/to/output.xml')

result_2 = xslt.apply_templates(input)
result_2.serialize($stderr)

You can override serialization options that were set by <xsl:output/> in your XSLT:

result = xslt.apply_templates(input)
result.serialize('/path/to/output.xml') {
  output_property[:indent] = 'yes'
}

You can also obtain the result of the transform as an XDM Value:

result = xslt.apply_templates(input)
result.xdm_value #=> #<Saxon::XDM::Node...>

You can also send the result to an instance of any class implementing Saxon's net.sf.saxon.s9api.Destination interface:

dom_document = javax.xml.parsers.DocumentBuilderFactory.newInstance.newDocumentBuilder.newDocument
destination = Saxon::S9API::DOMDestination.new(dom_document)
result = xslt.apply_templates(input).to_destination(destination)

Setting parameters

There are four kinds of parameters you can set. Static parameters are set at stylesheet compile time and cannot be changed after compilation. Global parameters, defined by top-level <xsl:param/> elements, are available throughout an XSLT and can be set when the compiled XSLT is run. The other two kinds are passed to the first template run (either the first template matched when called with #apply_templates, or the named template called with #call_template). Initial template parameters are essentially implied <xsl:with-param tunnel="no"> elements. Initial template tunnel parameters are implied <xsl:with-param tunnel="yes"> elements.

# At compile time
xslt = compiler.compile(source) {
  static_parameters 'static-param' => 'static value'
  global_parameters 'param' => 'global value'
  initial_template_parameters 'param' => 'other value'
  initial_template_tunnel_parameters 'param' => 'tunnel value'
}

# At execution time
xslt.apply_templates(input, {
  global_parameters: {'param' => 'global value'},
  initial_template_parameters: {'param' => 'other value'},
  initial_template_tunnel_parameters: {'param' => 'tunnel value'}
})

Multiple parameters can be set:

# At compile time
xslt = compiler.compile(source) {
  global_parameters 'param-1' => 'a', 'param-2' => 'b'
}

# At execution time
xslt.apply_templates(input, {
  global_parameters: {'param-1' => 'a', 'param-2' => 'b'}
})
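
Because parameters passed at execution time can override those set at compile time, the effective set for a run can be thought of as a per-kind merge. The following is a plain-Ruby sketch of that idea under that assumption. It does not call saxon-rb, and effective_parameters is a hypothetical helper name.

```ruby
# Toy model only: plain hashes stand in for parameters; nothing here calls saxon-rb.
compile_time = {
  global_parameters: { 'param-1' => 'a', 'param-2' => 'b' },
  initial_template_parameters: { 'param' => 'other value' }
}

# Merge each kind of parameter separately, so an execution-time value only
# replaces the compile-time value with the same name and the same kind.
def effective_parameters(compile_time, execution_time)
  compile_time.merge(execution_time) do |_kind, defaults, overrides|
    defaults.merge(overrides)
  end
end

effective = effective_parameters(
  compile_time,
  { global_parameters: { 'param-2' => 'z' } }
)

effective[:global_parameters]           #=> {"param-1"=>"a", "param-2"=>"z"}
effective[:initial_template_parameters] #=> {"param"=>"other value"}
```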

Parameter names in XSLT are QNames, and parameter values are XDM Values. saxon-rb will convert Ruby values for you (see Saxon::QName.resolve and Saxon::XDM.Value). You can also use explicit Saxon::QName or XDM values:

compiler.compile(source) {
  global_parameters Saxon::QName.clark('{http://example.org/#ns}name') => Saxon::XDM.Value(1)
}

If you need to use parameter names which use a namespace prefix, you must use an explicit Saxon::QName to refer to it.
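
To see why a prefixed name needs more information than a Clark-notation name, here is a small plain-Ruby sketch. ToyQName, parse_clark, and resolve_prefixed are hypothetical names for illustration and are not saxon-rb's Saxon::QName API. A Clark name of the form {uri}local carries its namespace URI with it, while a prefixed name can only be resolved if something supplies the prefix binding.

```ruby
# Toy model only: ToyQName is hypothetical and is not saxon-rb's Saxon::QName.
ToyQName = Struct.new(:uri, :local_name)

# Clark notation is self-contained: "{namespace-uri}local-name".
def parse_clark(string)
  match = /\A\{([^}]*)\}([^{}]+)\z/.match(string)
  return ToyQName.new('', string) unless match # no namespace part

  ToyQName.new(match[1], match[2])
end

# A prefixed name ("ns:name") is ambiguous without prefix => URI bindings.
def resolve_prefixed(string, bindings)
  prefix, local_name = string.split(':', 2)
  return ToyQName.new('', string) if local_name.nil? # unprefixed name

  ToyQName.new(bindings.fetch(prefix), local_name) # KeyError if unbound
end

clark = parse_clark('{http://example.org/#ns}name')
prefixed = resolve_prefixed('ns:name', { 'ns' => 'http://example.org/#ns' })

clark.uri         #=> "http://example.org/#ns"
clark.local_name  #=> "name"
clark == prefixed #=> true
```

Calling resolve_prefixed('ns:name', {}) raises a KeyError in this sketch, which mirrors the problem: without a binding, the prefix alone does not identify a namespace.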

XPath

Using an XPath involves creating a compiler, compiling an XPath into an executable, and then running that XPath executable against an XDM node.

To use prefixed QNames in your XPaths, like `/ns:name/`, you need to declare prefix/namespace URI bindings when you create a compiler.

It's also possible to make use of variables in your XPaths by declaring them at the compiler creation stage, and then passing in values for them at XPath run time.

processor = Saxon::Processor.create
xpath = processor.xpath_compiler {
  namespace a: 'http://example.org/a'
  variable 'a:var', 'xs:string'
}.compile('//a:element[@attr = $a:var]')

matches = xpath.evaluate(document_node, {
  'a:var' => 'the value'
}) #=> Saxon::XDM::Value

The XPath::Executable#evaluate method returns the result sequence as an XDM value. A result sequence with multiple items is returned as a Saxon::XDM::Value. A single-item sequence is returned as the appropriate item instance: a Saxon::XDM::Node or a Saxon::XDM::AtomicValue.

You can also use XPath::Executable#as_enum to get a lazy enumerator over the result.
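
Lazy enumeration means that items are only produced as they are consumed. The sketch below shows that idea with Ruby's built-in Enumerator and Enumerator::Lazy over a plain array of strings. It does not call saxon-rb, and it is an assumption (not something stated above) that #as_enum behaves like a standard Ruby lazy enumerator.

```ruby
# Toy model only: an ordinary Ruby enumerator stands in for an XPath result.
pulled = []

items = Enumerator.new do |yielder|
  %w[a b c d].each do |item|
    pulled << item # record which items were actually produced
    yielder << item
  end
end

# Only as many items as are needed get pulled from the source.
first_two = items.lazy.map(&:upcase).first(2)

first_two #=> ["A", "B"]
pulled    #=> ["a", "b"]
```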

Using your Saxon PE license and `.jar`s instead of the bundled Saxon HE

Saxon 9.9 HE is bundled with the gem. To use Saxon PE or EE (the commercial versions) you need to make the .jars available, and then create a licensed Saxon::Configuration object. Making the .jars available is simply a matter of adding them to the CLASSPATH. The version of Saxon downloaded directly from Saxonica provides several .jar files, and Saxon::Loader provides a method for adding all the .jars within that directory correctly. Saxon is also distributed through Maven as a single .jar, which you can just add to the LOAD_PATH or CLASSPATH. If you're adding to the CLASSPATH directly, or calling Saxon::Loader.load!, you need to do it before you try to use the library.
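
If you choose to add .jars to the CLASSPATH yourself, a minimal sketch might look like the following. The helper names jars_in and add_jars_to_classpath are hypothetical and are not part of saxon-rb. The sketch relies on the JRuby behaviour that requiring a .jar file adds it to the classpath, and it makes no attempt to reproduce whatever ordering or checks Saxon::Loader performs.

```ruby
# Sketch only: these helpers are hypothetical and are not part of saxon-rb.

# List the .jar files directly inside a directory, in a stable order.
def jars_in(directory)
  Dir.glob(File.join(directory, '*.jar')).sort
end

# Under JRuby, requiring a .jar adds it to the classpath. On other Ruby
# implementations this only returns the list of what would be loaded.
def add_jars_to_classpath(directory)
  jars = jars_in(directory)
  jars.each { |jar| require jar } if RUBY_ENGINE == 'jruby'
  jars
end
```

As with Saxon::Loader.load!, this would need to run before anything tries to use the library.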

Loading a Saxon PE you downloaded directly from Saxonica

require 'saxon-rb'

Saxon::Loader.load!('/path/to/SaxonPE9-9-1-2J') # The folder that contains the .jars, like $SAXON_HOME
config = Saxon::Configuration.create_licensed('/path/to/saxon.lic')
processor = Saxon::Processor.create(config)

processor.xslt_compiler...

Loading a Saxon PE installed via Maven (e.g. with JBundler)

require 'jbundler'
require 'saxon-rb'

config = Saxon::Configuration.create_licensed('/path/to/saxon.lic')
processor = Saxon::Processor.create(config)

...

See https://github.com/mkristian/jbundler for more on loading Java deps from Maven.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/fidothe/saxon-rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Saxon-rb project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.