Class: Saxon::Source

Inherits:
Object
Defined in:
lib/saxon/source.rb

Overview

Provides a wrapper around the JAXP StreamSource class that Saxon uses to read in the XML byte stream, plus some extra methods that make closing the source and its input stream after consumption more idiomatic.

Defined Under Namespace

Modules: Helpers

Constant Summary

PathChecker =

Lambda that checks if the given path exists and is a file

->(path) {
  File.file?(path)
}
URIChecker =

Lambda that checks if the given string is a valid URI

->(uri) {
  begin
    URI.parse(uri)
    true
  rescue URI::InvalidURIError
    false
  end
}
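Both constants are plain lambdas. Ruby's `Proc#===` calls the lambda, which is why `.create` can use them directly as `when` clauses. The sketch below copies them verbatim so it runs without saxon-rb, and probes how they behave (the `classify` method is hypothetical). Note how permissive `URI.parse` is: a bare relative name such as `missing.xml` counts as a URI, while a string containing markup characters such as `<` does not.

```ruby
require "uri"
require "tmpdir"

# Verbatim copies of the two constants above, so the sketch runs standalone.
PathChecker = ->(path) {
  File.file?(path)
}

URIChecker = ->(uri) {
  begin
    URI.parse(uri)
    true
  rescue URI::InvalidURIError
    false
  end
}

# Hypothetical helper: Proc#=== invokes the lambda, so each works as a matcher.
def classify(value)
  case value
  when PathChecker then :existing_file
  when URIChecker  then :uri_like
  else :other
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "doc.xml")
  File.write(path, "<doc/>")
  p classify(path)                          # => :existing_file
end
p classify("https://example.com/doc.xml")   # => :uri_like
p classify("missing.xml")                   # => :uri_like (a relative reference)
p classify("<doc/>")                        # => :other
```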

Constructor Details

#initialize(stream_source, inputstream = nil) ⇒ Source

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of Source.

Parameters:

  • stream_source (javax.xml.transform.stream.StreamSource)

    The Java JAXP StreamSource

  • inputstream (java.io.InputStream, java.io.StringReader) (defaults to: nil)

    The Java InputStream or StringReader



# File 'lib/saxon/source.rb', line 272

def initialize(stream_source, inputstream = nil)
  @stream_source = stream_source
  @inputstream = inputstream
  @closed = false
end

Class Method Details

.create(input, opts = {}) ⇒ Saxon::Source

Generate a Saxon::Source from one of the several inputs allowed.

Where possible, the character encoding of the input is left for the XML parser to discover from the XML declaration (<?xml version="..." encoding="..."?>).

The Base URI for the source (its absolute path or URI) can be set with the :base_uri option. This is the same thing as an XML document’s ‘System ID’; Base URI is the term most widely used in Ruby libraries, so it is the one used here.

If the parser can’t correctly discover the source’s character encoding from the XML declaration at the top of the document, the encoding can be passed as the :encoding option.

If an existing Saxon::Source is passed in, simply return it.

Parameters:

  • input (Saxon::Source, IO, File, String, Pathname, URI)

    The XML to be parsed

  • opts (Hash) (defaults to: {})

Options Hash (opts):

  • :base_uri (String)

    The Base URI for the Source: an absolute URI or relative path used to resolve relative URLs in the XML. Setting this overrides any path or URI derived from an IO, URI, or path.

  • :encoding (String, Encoding)

    The encoding of the source. Specifying this forces the parser to ignore any encoding named in the source’s XML declaration, so it is only useful when the source’s declared and actual encodings differ. Defaults to the encoding named in the XML declaration (<?xml version="..." encoding="..."?>).

Returns:

  • (Saxon::Source)

    A new Source, or input itself if it was already a Saxon::Source

# File 'lib/saxon/source.rb', line 242

def create(input, opts = {})
  case input
  when Saxon::Source
    input
  when IO, File, java.io.InputStream, StringIO
    from_io(input, opts)
  when Pathname, PathChecker
    from_path(input, opts)
  when URIChecker
    from_uri(input, opts)
  else
    from_string(input, opts)
  end
end
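The order of the `when` clauses matters: IO-like objects are checked first, then `Pathname`s and anything naming an existing file, then anything `URI.parse` accepts, and only then does the input fall through to `from_string`. The sketch below reproduces that order in plain Ruby, returning the name of the constructor `.create` would call instead of calling it. `route_for` is a hypothetical name, and the `Saxon::Source` and `java.io.InputStream` branches are left out because those classes only exist under JRuby with saxon-rb loaded.

```ruby
require "pathname"
require "stringio"
require "uri"

# Local copies of the PathChecker and URIChecker lambdas shown above.
PATH_CHECK = ->(path) { File.file?(path) }
URI_CHECK = ->(uri) {
  begin
    URI.parse(uri)
    true
  rescue URI::InvalidURIError
    false
  end
}

# Hypothetical helper: returns the constructor Saxon::Source.create would
# dispatch to, using the same when-clause order (minus the JRuby-only branches).
def route_for(input)
  case input
  when IO, File, StringIO then :from_io
  when Pathname, PATH_CHECK then :from_path
  when URI_CHECK then :from_uri
  else :from_string
  end
end

p route_for(StringIO.new("<doc/>"))          # => :from_io
p route_for(Pathname.new("not/there.xml"))   # => :from_path
p route_for("https://example.com/doc.xml")   # => :from_uri
p route_for("<doc/>")                        # => :from_string
```

By the same rule, a short plain string with no URI-illegal characters, such as `"hello"`, is routed to `from_uri` rather than `from_string`; calling `Saxon::Source.from_string` directly avoids that ambiguity.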

.from_io(io, opts = {}) ⇒ Saxon::Source

Generate a Saxon::Source given an IO-like

Parameters:

  • io (IO, File)

    The IO-like containing XML to be parsed

  • opts (Hash) (defaults to: {})

Options Hash (opts):

  • :base_uri (String)

    The Base URI for the Source: an absolute URI or relative path used to resolve relative URLs in the XML. Setting this overrides any path or URI derived from the IO-like.

  • :encoding (String, Encoding)

    The encoding of the source. Specifying this forces the parser to ignore any encoding named in the source’s XML declaration, so it is only useful when the source’s declared and actual encodings differ. Defaults to the encoding named in the XML declaration (<?xml version="..." encoding="..."?>).

Returns:

  • (Saxon::Source)

# File 'lib/saxon/source.rb', line 141

def from_io(io, opts = {})
  base_uri = opts.fetch(:base_uri) { Helpers.base_uri(io) }
  encoding = opts.fetch(:encoding, nil)
  inputstream = Helpers.inputstream(io, encoding)
  from_inputstream_or_reader(inputstream, base_uri)
end
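`opts.fetch(:base_uri) { ... }` only evaluates its block when the `:base_uri` key is absent, so an explicit `base_uri:` always wins, even an explicit `nil`. The sketch below shows that behaviour in plain Ruby. `Helpers.base_uri` is not shown on this page, so `derive_base_uri` is a hypothetical stand-in that assumes the default comes from the IO's file path when it has one.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in for Helpers.base_uri (not shown on this page):
# assume the default is the absolute path of a File, and nil otherwise.
def derive_base_uri(io)
  io.respond_to?(:path) && io.path ? File.expand_path(io.path) : nil
end

# Same fetch pattern as the first line of from_io: the block only runs
# when the :base_uri key is missing from opts.
def base_uri_for(io, opts = {})
  opts.fetch(:base_uri) { derive_base_uri(io) }
end

Tempfile.create(["doc", ".xml"]) do |file|
  p base_uri_for(file) == File.expand_path(file.path)            # => true
  p base_uri_for(file, base_uri: "https://example.com/doc.xml")  # => "https://example.com/doc.xml"
  p base_uri_for(file, base_uri: nil)                            # => nil
end
p base_uri_for(StringIO.new("<doc/>"))                           # => nil
```

A caller that passes `base_uri: nil` therefore gets no base URI at all, rather than the derived one.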

.from_path(path, opts = {}) ⇒ Saxon::Source

Generate a Saxon::Source given a path to a file

Parameters:

  • path (String, Pathname)

    The path to the XML file to be parsed

  • opts (Hash) (defaults to: {})

Options Hash (opts):

  • :base_uri (String)

    The Base URI for the Source: an absolute URI or relative path used to resolve relative URLs in the XML. Setting this overrides the file path.

  • :encoding (String, Encoding)

    The encoding of the source. Specifying this forces the parser to ignore any encoding named in the source’s XML declaration, so it is only useful when the source’s declared and actual encodings differ. Defaults to the encoding named in the XML declaration (<?xml version="..." encoding="..."?>).

Returns:

  • (Saxon::Source)

# File 'lib/saxon/source.rb', line 162

def from_path(path, opts = {})
  encoding = opts.fetch(:encoding, nil)
  return from_inputstream_or_reader(Helpers.file(path), opts[:base_uri]) if encoding.nil?
  reader = Helpers.file_reader(path, encoding)
  base_uri = opts.fetch(:base_uri) { File.expand_path(path) }
  from_inputstream_or_reader(reader, base_uri)
end
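The `:encoding` option exists for files whose bytes disagree with their XML declaration. `Helpers.file_reader` is not shown on this page, so the plain-Ruby sketch below is only an analogy: `File.read` with an explicit `encoding:` stands in for a reader built with a forced charset. The file contents and the `decode_xml_file` helper are hypothetical.

```ruby
require "tmpdir"

# The declaration claims UTF-8, but the bytes are ISO-8859-1: "é" is the
# single byte 0xE9, which is not valid UTF-8 on its own.
MISLABELLED = %(<?xml version="1.0" encoding="UTF-8"?><name>Jos\u00e9</name>).encode("ISO-8859-1")

# Hypothetical helper: decode the file as `encoding` (default: trust the
# declared UTF-8) and return UTF-8 text, or nil if the bytes don't fit.
def decode_xml_file(path, encoding = "UTF-8")
  text = File.read(path, encoding: encoding)
  text.valid_encoding? ? text.encode("UTF-8") : nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "name.xml")
  File.binwrite(path, MISLABELLED)
  p decode_xml_file(path)                                    # => nil
  p decode_xml_file(path, "ISO-8859-1").include?("Jos\u00e9") # => true
end
```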

.from_string(string, opts = {}) ⇒ Saxon::Source

Generate a Saxon::Source given a string containing XML

Parameters:

  • string (String)

    The string containing XML to be parsed

  • opts (Hash) (defaults to: {})

Options Hash (opts):

  • :base_uri (String)

    The Base URI for the Source: an absolute URI or relative path used to resolve relative URLs in the XML. This is nil unless set.

  • :encoding (String, Encoding)

    The encoding of the source. Specifying this forces the parser to ignore any encoding named in the source’s XML declaration, so it is only useful when the string’s encoding differs from the encoding of the XML it contains. Defaults to the string’s own encoding unless that is ASCII-8BIT, in which case the parser uses the encoding named in the XML declaration (<?xml version="..." encoding="..."?>).

Returns:

  • (Saxon::Source)

# File 'lib/saxon/source.rb', line 205

def from_string(string, opts = {})
  encoding = opts.fetch(:encoding) { string.encoding }
  reader = Helpers.string_reader(string, encoding)
  from_inputstream_or_reader(reader, opts[:base_uri])
end
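`from_string` takes its default encoding from `String#encoding`, and the description above says an ASCII-8BIT (binary) string is treated as carrying no usable encoding, leaving the XML declaration to decide. That special case presumably lives in `Helpers.string_reader`, which is not shown here; `effective_encoding` below is a hypothetical helper that simply restates the documented rule in plain Ruby.

```ruby
# Hypothetical helper restating the documented :encoding default.
# Returns the Encoding to decode with, or nil to let the parser use the
# encoding named in the XML declaration.
def effective_encoding(string, opts = {})
  explicit = opts[:encoding]
  # An explicit :encoding (String or Encoding) always wins.
  return Encoding.find(explicit.to_s) if explicit
  # Otherwise use the string's own encoding, except for binary strings.
  string.encoding == Encoding::ASCII_8BIT ? nil : string.encoding
end

xml = %(<?xml version="1.0" encoding="ISO-8859-1"?><doc/>)
p effective_encoding(xml)                           # => #<Encoding:UTF-8>
p effective_encoding(xml.b)                         # => nil
p effective_encoding(xml, encoding: "ISO-8859-1")   # => #<Encoding:ISO-8859-1>
```

The first call shows the pitfall the option guards against: a UTF-8-tagged Ruby string wins over a conflicting declaration in the XML itself.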

.from_uri(uri, opts = {}) ⇒ Saxon::Source

Generate a Saxon::Source given a URI

Parameters:

  • uri (String, URI)

    The URI to the XML file to be parsed

  • opts (Hash) (defaults to: {})

Options Hash (opts):

  • :base_uri (String)

    The Base URI for the Source: an absolute URI or relative path used to resolve relative URLs in the XML. Setting this overrides the given URI. As listed below, it is only applied when :encoding is not given.

  • :encoding (String, Encoding)

    The encoding of the source. Specifying this forces the parser to ignore any encoding named in the source’s XML declaration, so it is only useful when the source’s declared and actual encodings differ. Defaults to the encoding named in the XML declaration (<?xml version="..." encoding="..."?>).

Returns:

  • (Saxon::Source)

# File 'lib/saxon/source.rb', line 184

def from_uri(uri, opts = {})
  encoding = opts.fetch(:encoding, nil)
  return from_io(open(uri), encoding: encoding) if encoding
  from_inputstream_or_reader(uri.to_s, opts[:base_uri])
end
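`from_uri` chooses between two paths. Without `:encoding`, the listing passes `uri.to_s` straight to `from_inputstream_or_reader` (not shown on this page), presumably so that the parser fetches the document itself; with `:encoding`, it opens the URI in Ruby and delegates to `from_io`. The sketch below, built around a hypothetical `uri_plan` helper, only reports which path would be taken. It shows that `String` and `URI` inputs produce the same argument, and that `:base_uri` is not forwarded on the `:encoding` path.

```ruby
require "uri"

# Hypothetical helper: reports which branch of from_uri would run, and with
# what arguments, without opening anything.
def uri_plan(uri, opts = {})
  encoding = opts.fetch(:encoding, nil)
  if encoding
    # from_uri opens the URI in Ruby and hands the IO to from_io;
    # opts[:base_uri] is not passed along on this branch.
    { via: :from_io, encoding: encoding }
  else
    # The URI's string form goes straight to from_inputstream_or_reader.
    { via: :from_inputstream_or_reader, argument: uri.to_s, base_uri: opts[:base_uri] }
  end
end

p uri_plan(URI("https://example.com/doc.xml"))[:argument]           # => "https://example.com/doc.xml"
p uri_plan("https://example.com/doc.xml", encoding: "UTF-8")[:via]  # => :from_io
```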

Instance Method Details

#base_uri ⇒ String

Returns The base URI of the Source.

Returns:

  • (String)

    The base URI of the Source



# File 'lib/saxon/source.rb', line 279

def base_uri
  stream_source.getSystemId
end

#base_uri=(uri) ⇒ String

Returns The new base URI of the Source.

Parameters:

  • uri (String, URI)

    The URI to use as the Source’s Base URI

Returns:

  • (String)

    The new base URI of the Source



# File 'lib/saxon/source.rb', line 285

def base_uri=(uri)
  stream_source.setSystemId(uri.to_s)
  base_uri
end
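`base_uri` and `base_uri=` are thin wrappers over the StreamSource's system ID, with `uri.to_s` letting callers pass either a `String` or a `URI`. To show this without Java, the sketch below uses a hypothetical `FakeStreamSource` exposing just `getSystemId`/`setSystemId`, and a hypothetical `BaseUriHolder` that copies the two methods above.

```ruby
require "uri"

# Hypothetical stand-in for the JAXP StreamSource, exposing only the two
# system-ID accessors that base_uri and base_uri= call.
class FakeStreamSource
  def getSystemId
    @system_id
  end

  def setSystemId(id)
    @system_id = id
  end
end

# Hypothetical holder copying Saxon::Source#base_uri and #base_uri=.
class BaseUriHolder
  def initialize(stream_source)
    @stream_source = stream_source
  end

  def base_uri
    @stream_source.getSystemId
  end

  def base_uri=(uri)
    @stream_source.setSystemId(uri.to_s)   # URI objects are stored as Strings
    base_uri
  end
end

holder = BaseUriHolder.new(FakeStreamSource.new)
holder.base_uri = URI("https://example.com/docs/doc.xml")
p holder.base_uri         # => "https://example.com/docs/doc.xml"
p holder.base_uri.class   # => String
```

One Ruby detail applies to the real method too: an assignment expression such as `holder.base_uri = uri` always evaluates to its right-hand side, so the String return value documented above is only observable when the method is called explicitly, e.g. `holder.public_send(:base_uri=, uri)`.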

#close ⇒ TrueClass

Close the Source and its associated InputStream or Reader, allowing those resources to be freed.

Returns:

  • (TrueClass)

    Returns true



# File 'lib/saxon/source.rb', line 293

def close
  inputstream.close
  @closed = true
end

#closed? ⇒ Boolean

Returns true if the source is closed, false otherwise

Returns:

  • (Boolean)

    Returns true if the source is closed, false otherwise



# File 'lib/saxon/source.rb', line 299

def closed?
  @closed
end

#consume {|source| ... } ⇒ Object

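Yields itself to the block and then closes itself. Intended for DocumentBuilders and other consumers, so that the source is closed once it has been consumed. Note that close is only called if the block returns normally: an exception raised inside the block leaves the source open.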

Yields:

  • (source)

    Yields self to the block

Raises:

  • (SourceClosedError)

    If the source has already been closed

# File 'lib/saxon/source.rb', line 309

def consume(&block)
  raise SourceClosedError if closed?
  block.call(self)
  close
end
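The lifecycle methods (`close`, `closed?`, `consume`) can be modelled in plain Ruby. `LifecycleSketch` below is a hypothetical class that wraps a `StringIO` instead of a Java `InputStream` and defines its own `SourceClosedError`. It follows the listed `consume` line for line, so it shows what happens on a normal run, on a second `consume`, and when the block raises.

```ruby
require "stringio"

# Hypothetical error class standing in for saxon-rb's SourceClosedError.
class SourceClosedError < StandardError; end

# Hypothetical analog of Saxon::Source's close/closed?/consume.
class LifecycleSketch
  def initialize(io)
    @io = io
    @closed = false
  end

  def close
    @io.close
    @closed = true
  end

  def closed?
    @closed
  end

  # Same shape as the listing above: close runs only if the block returns.
  def consume
    raise SourceClosedError if closed?
    yield self
    close
  end
end

source = LifecycleSketch.new(StringIO.new("<doc/>"))
source.consume { |s| puts "consuming" }
p source.closed?    # => true

begin
  source.consume { |s| }
rescue SourceClosedError
  puts "second consume raised SourceClosedError"
end

failing = LifecycleSketch.new(StringIO.new("<doc/>"))
begin
  failing.consume { |s| raise "parse failed" }
rescue RuntimeError
end
p failing.closed?   # => false
```

If a block may raise and the source must be released regardless, call `source.close unless source.closed?` in your own `ensure` clause.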

#to_java ⇒ javax.xml.transform.stream.StreamSource

Returns The underlying JAXP StreamSource.

Returns:

  • (javax.xml.transform.stream.StreamSource)

    The underlying JAXP StreamSource



# File 'lib/saxon/source.rb', line 316

def to_java
  @stream_source
end