Class: RDF::RDFa::Reader

Inherits:
RDF::Reader
  • Object
Includes:
Expansion
Defined in:
lib/rdf/rdfa/reader.rb,
lib/rdf/rdfa/reader/rexml.rb,
lib/rdf/rdfa/reader/nokogiri.rb

Overview

An RDFa parser in Ruby

This class supports [Nokogiri][] for HTML processing, and automatically selects the most performant implementation available (Nokogiri or REXML). To override the choice, pass a `:library` option to `Reader.new` or `Reader.open`.

[Nokogiri]: nokogiri.org/
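
The selection rule can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper `pick_library` (not part of the gem) that mirrors the rule in the constructor source: an explicit `:library` wins, otherwise Nokogiri is preferred when it is loaded and the platform is not `java`.

```ruby
# Hypothetical helper, not part of rdf-rdfa: mirrors the constructor's
# library selection so the rule can be exercised without a reader.
def pick_library(requested, nokogiri_loaded:, platform: RUBY_PLATFORM)
  case requested
  when nil
    # No preference given: prefer Nokogiri when loaded and not on JRuby.
    (nokogiri_loaded && platform != 'java') ? :nokogiri : :rexml
  when :nokogiri, :rexml
    requested
  else
    raise ArgumentError, "expected :rexml or :nokogiri, but got #{requested.inspect}"
  end
end

pick_library(nil, nokogiri_loaded: true, platform: 'x86_64-linux')  # => :nokogiri
pick_library(nil, nokogiri_loaded: true, platform: 'java')          # => :rexml
pick_library(:rexml, nokogiri_loaded: true)                         # => :rexml
```

Any other value for the requested library raises `ArgumentError`, as in the constructor.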

Based on processing rules described here:

Defined Under Namespace

Modules: Nokogiri, REXML

Classes: EvaluationContext

Constant Summary

XHTML =
"http://www.w3.org/1999/xhtml"
SafeCURIEorCURIEorIRI =

Content model for @about and @resource. In RDFa 1.0, this was URIorSafeCURIE.

{
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}
TERMorCURIEorAbsIRI =

Content model for @datatype. In RDFa 1.0, this was CURIE. The plural form, TERMorCURIEorAbsIRIs, is the content model for @rel, @rev, @property and @typeof.

{
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}
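
Both tables are keyed by the RDFa version symbol, so a lookup by version yields the value forms an attribute accepts. A minimal sketch, assuming copies of the two tables under hypothetical names and a hypothetical helper `about_forms`:

```ruby
# Copies of the two content-model tables, keyed by RDFa version symbol.
# The constant and method names here are illustrative, not the library's.
SAFE_CURIE_OR_CURIE_OR_IRI = {
  :"rdfa1.0" => [:safe_curie, :uri, :bnode],
  :"rdfa1.1" => [:safe_curie, :curie, :uri, :bnode],
}.freeze

TERM_OR_CURIE_OR_ABS_IRI = {
  :"rdfa1.0" => [:term, :curie],
  :"rdfa1.1" => [:term, :curie, :absuri],
}.freeze

# Which value forms does @about / @resource accept for a given version?
def about_forms(version)
  SAFE_CURIE_OR_CURIE_OR_IRI.fetch(version)
end

about_forms(:"rdfa1.1").include?(:curie)  # => true
about_forms(:"rdfa1.0").include?(:curie)  # => false
```

Note the quoted symbol syntax: `:"rdfa1.1"` is needed because the name contains a dot.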
NC_REGEXP =

This expression matches an NCName as defined in [XML-NAMES](www.w3.org/TR/2009/REC-xml-names-20091208/#NT-NCName)

Regexp.new(
%{^
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [0-9a-zA-Z_\.-/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
TERM_REGEXP =

This expression matches a term as defined in [RDFA-CORE](www.w3.org/TR/2012/CR-rdfa-core-20120313/#s_terms)

For the avoidance of doubt, this definition means a 'term' in RDFa is an XML NCName that also permits slash as a non-leading character.

Regexp.new(
%{^
  (?!\\\\u0301)             # ́ is a non-spacing acute accent.
                            # It is legal within an XML Name, but not as the first character.
  (  [a-zA-Z_]
   | \\\\u[0-9a-fA-F]{4}
  )
  (  [0-9a-zA-Z_\.-\/]
   | \\\\u([0-9a-fA-F]{4})
  )*
$},
Regexp::EXTENDED)
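
The shape of a term (an NCName-like token that also permits a slash after the first character) can be illustrated with a much smaller pattern. This is a simplified, ASCII-only sketch under that assumption; `TERM_SKETCH` and `term_like?` are hypothetical names and this is not the library's `TERM_REGEXP`.

```ruby
# Simplified, ASCII-only illustration of the "term" shape.
# First character: letter or underscore.
# Remaining characters: letters, digits, underscore, dot, slash, hyphen.
TERM_SKETCH = %r{\A[a-zA-Z_][0-9a-zA-Z_./-]*\z}

def term_like?(value)
  TERM_SKETCH.match?(value)
end

term_like?("license")        # => true
term_like?("schema/Person")  # => true  (slash allowed after the first character)
term_like?("/Person")        # => false (slash may not lead)
term_like?("9lives")         # => false (a digit may not lead)
```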

Constants included from Expansion

Expansion::COOKED_VOCAB_STATEMENTS

Instance Attribute Summary

Instance Method Summary

Methods included from Expansion

#expand, #rule

Constructor Details

- (reader) initialize(input = $stdin, options = {}) {|reader| ... }

Initializes the RDFa reader instance.

Parameters:

  • input (IO, File, String) (defaults to: $stdin)

    the input stream to read

  • options (Hash{Symbol => Object}) (defaults to: {})

    any additional options (see `RDF::Reader#initialize`)

Options Hash (options):

  • :library (Symbol)

    One of :nokogiri or :rexml. If nil/unspecified uses :nokogiri if available, :rexml otherwise.

  • :vocab_expansion (Boolean) — default: false

    whether to perform RDFS expansion on the resulting graph

  • :host_language (:xml, :xhtml1, :xhtml5, :html4, :html5, :svg) — default: :html5

    Host Language

  • :version (:"rdfa1.0", :"rdfa1.1") — default: :"rdfa1.1"

    Parser version information

  • :processor_callback (Proc) — default: nil

    Callback used to provide processor graph triples.

  • :rdfagraph (Array<Symbol>) — default: [:output]

    Selects which graphs are emitted. The value is an array containing one or both of :output and :processor.

  • :vocab_repository (Repository) — default: nil

    Repository to save loaded vocabularies.

  • :debug (Array)

    Array to place debug messages
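
The constructor accepts `:rdfagraph` in several forms and normalizes it. The following is a minimal sketch of that normalization, assuming a hypothetical standalone helper `normalize_rdfagraph` that mirrors the logic in the constructor source: strings and symbols are split on commas, unknown names are dropped, and an empty result falls back to `[:output]`.

```ruby
# Hypothetical standalone helper mirroring how the constructor
# normalizes the :rdfagraph option.
def normalize_rdfagraph(value)
  graphs = case value
           when String, Symbol then value.to_s.split(',').map(&:strip).map(&:to_sym)
           when Array          then value.map { |o| o.to_s.to_sym }
           else []
           end
  # Keep only the two recognized graph names.
  graphs = graphs.select { |g| [:output, :processor].include?(g) }
  # Fall back to the output graph when nothing valid remains.
  graphs.empty? ? [:output] : graphs
end

normalize_rdfagraph("output, processor")  # => [:output, :processor]
normalize_rdfagraph([:processor])         # => [:processor]
normalize_rdfagraph(nil)                  # => [:output]
normalize_rdfagraph(:bogus)               # => [:output]
```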

Yields:

  • (reader)

    `self`

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored

Raises:

  • (Error)

    Raises `RDF::ReaderError` if the `:validate` option is set



# File 'lib/rdf/rdfa/reader.rb', line 257

def initialize(input = $stdin, options = {}, &block)
  super do
    @debug = options[:debug]
    
    @options[:rdfagraph] = case @options[:rdfagraph]
    when String, Symbol then @options[:rdfagraph].to_s.split(',').map(&:strip).map(&:to_sym)
    when Array then @options[:rdfagraph].map {|o| o.to_s.to_sym}
    else  []
    end.select {|o| [:output, :processor].include?(o)}
    @options[:rdfagraph] << :output if @options[:rdfagraph].empty?

    @library = case options[:library]
      when nil
        # Use Nokogiri when available, and REXML otherwise:
        (defined?(::Nokogiri) && RUBY_PLATFORM != 'java') ? :nokogiri : :rexml
      when :nokogiri, :rexml
        options[:library]
      else
        raise ArgumentError.new("expected :rexml or :nokogiri, but got #{options[:library].inspect}")
    end

    require "rdf/rdfa/reader/#{@library}"
    @implementation = case @library
      when :nokogiri then Nokogiri
      when :rexml    then REXML
    end
    self.extend(@implementation)

    detect_host_language_version(input, options)

    add_info(@doc, "version = #{@version},  host_language = #{@host_language}, library = #{@library}, rdfagraph = #{@options[:rdfagraph].inspect}, expand = #{@options[:vocab_expansion]}")

    begin
      initialize_xml(input, options)
    rescue
      add_error(nil, "Malformed document: #{$!.message}")
    end
    add_error(nil, "Empty document") if root.nil?
    add_error(nil, "Syntax errors:\n#{doc_errors}") if !doc_errors.empty?

    # Section 4.2 RDFa Host Language Conformance
    #
    # The Host Language may require the automatic inclusion of one or more Initial Contexts
    @host_defaults = {
      :vocabulary       => nil,
      :uri_mappings     => {},
      :initial_contexts => [],
    }

    if @version == :"rdfa1.0"
      # Add default term mappings
      @host_defaults[:term_mappings] = %w(
        alternate appendix bookmark cite chapter contents copyright first glossary help icon index
        last license meta next p3pv1 prev role section stylesheet subsection start top up
        ).inject({}) { |hash, term| hash[term] = RDF::XHV[term]; hash }
    end

    case @host_language
    when :xml, :svg
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT]
    when :xhtml1
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, XHTML_RDFA_CONTEXT]
    when :xhtml5, :html4, :html5
      @host_defaults[:initial_contexts] = [XML_RDFA_CONTEXT, HTML_RDFA_CONTEXT]
    end

    block.call(self) if block_given?
  end
end
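
The final `case` in the constructor maps the detected host language to a list of initial contexts. A minimal sketch of that mapping follows; the helper name `initial_contexts_for` is hypothetical, and placeholder symbols stand in for the library's `XML_RDFA_CONTEXT`, `XHTML_RDFA_CONTEXT` and `HTML_RDFA_CONTEXT` constants, whose values are not shown on this page.

```ruby
# Placeholder symbols stand in for the library's context constants.
def initial_contexts_for(host_language)
  case host_language
  when :xml, :svg              then [:xml_rdfa_context]
  when :xhtml1                 then [:xml_rdfa_context, :xhtml_rdfa_context]
  when :xhtml5, :html4, :html5 then [:xml_rdfa_context, :html_rdfa_context]
  else [] # matches the empty default in @host_defaults
  end
end

initial_contexts_for(:svg)     # => [:xml_rdfa_context]
initial_contexts_for(:xhtml1)  # => [:xml_rdfa_context, :xhtml_rdfa_context]
initial_contexts_for(:html5)   # => [:xml_rdfa_context, :html_rdfa_context]
```

Every host language receives the XML context first; only the HTML-family languages add a second one.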

Instance Attribute Details

- (Object) host_language (readonly)

Host language



# File 'lib/rdf/rdfa/reader.rb', line 84

def host_language
  @host_language
end

- (Object) implementation (readonly)

Returns the XML implementation module for this reader instance.



# File 'lib/rdf/rdfa/reader.rb', line 226

def implementation
  @implementation
end

- (Object) version (readonly)

Version



# File 'lib/rdf/rdfa/reader.rb', line 88

def version
  @version
end

Instance Method Details

- (void) each_statement {|statement| ... }

This method returns an undefined value.

Iterates the given block for each RDF statement in the input.

Reads the input into a graph and performs vocabulary expansion when the `:vocab_expansion` option is set.

Yields:

  • (statement)

Yield Parameters:

  • statement (RDF::Statement)


# File 'lib/rdf/rdfa/reader.rb', line 335

def each_statement(&block)
  if @options[:vocab_expansion]
    @options[:vocab_expansion] = false
    expand.each_statement(&block)
    @options[:vocab_expansion] = true
  else
    @callback = block

    # Process any saved callbacks (processor graph issues)
    @saved_callbacks.each {|s| @callback.call(s) } if @saved_callbacks

    # Add prefix definitions from host defaults
    @host_defaults[:uri_mappings].each_pair do |prefix, value|
      prefix(prefix, value)
    end

    # parse
    parse_whole_document(@doc, RDF::URI(base_uri))
  end
end
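
The first branch above clears `:vocab_expansion` before calling `expand` and sets it again afterwards. A plausible reason, assumed here rather than confirmed by this page, is that `expand` reads the statements back through `each_statement`, so the flag must be off to avoid recursing into the same branch. The sketch below models only that flag handling with a hypothetical `SketchReader`; unlike the method above, it restores the flag in an `ensure` clause.

```ruby
# Hypothetical stand-in for the reader; only the flag handling is modelled.
class SketchReader
  def initialize(statements, vocab_expansion: false)
    @statements = statements
    @options = { vocab_expansion: vocab_expansion }
  end

  # Stand-in for Expansion#expand (an assumption): re-reads the statements
  # through each_statement and appends one made-up "inferred" entry.
  def expand
    collected = []
    each_statement { |s| collected << s }
    collected + [:inferred]
  end

  def each_statement(&block)
    if @options[:vocab_expansion]
      @options[:vocab_expansion] = false   # nested call takes the else branch
      begin
        expand.each(&block)
      ensure
        @options[:vocab_expansion] = true  # restore even if the block raises
      end
    else
      @statements.each(&block)
    end
  end
end

seen = []
SketchReader.new([:a, :b], vocab_expansion: true).each_statement { |s| seen << s }
seen  # => [:a, :b, :inferred]
```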

- (void) each_triple {|subject, predicate, object| ... }

This method returns an undefined value.

Iterates the given block for each RDF triple in the input.

Yields:

  • (subject, predicate, object)

Yield Parameters:

  • subject (RDF::Resource)
  • predicate (RDF::URI)
  • object (RDF::Value)


# File 'lib/rdf/rdfa/reader.rb', line 364

def each_triple(&block)
  each_statement do |statement|
    block.call(*statement.to_triple)
  end
end
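
`each_triple` is a thin wrapper: it iterates statements and splats each one into three block arguments. A minimal sketch of the same delegation, assuming hypothetical stand-ins `SketchStatement` and `SketchSource` in place of `RDF::Statement` and the reader:

```ruby
# Hypothetical minimal statement type; in the library,
# RDF::Statement#to_triple plays this role.
SketchStatement = Struct.new(:subject, :predicate, :object) do
  def to_triple
    [subject, predicate, object]
  end
end

class SketchSource
  def initialize(statements)
    @statements = statements
  end

  def each_statement(&block)
    @statements.each(&block)
  end

  # Same shape as each_triple above: unpack each statement
  # into three separate block arguments.
  def each_triple
    each_statement { |statement| yield(*statement.to_triple) }
  end
end

source = SketchSource.new([SketchStatement.new("s", "p", "o")])
triples = []
source.each_triple { |subj, pred, obj| triples << [subj, pred, obj] }
triples  # => [["s", "p", "o"]]
```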