Class: RDF::Turtle::Writer

Inherits:
Writer
  • Object
Includes:
StreamingWriter, Util::Logger
Defined in:
lib/rdf/turtle/writer.rb

Overview

A Turtle serialiser

Note that the natural interface is to write a whole graph at a time. Unless the :stream option is set, writing individual statements or triples adds them to an internal graph, which is serialized as a whole when the writer finishes.

The writer collects prefix definitions and uses them both to emit @prefix declarations and to mint QNames.

Examples:

Obtaining a Turtle writer class

RDF::Writer.for(:ttl)         #=> RDF::Turtle::Writer
RDF::Writer.for("etc/test.ttl")
RDF::Writer.for(file_name:       "etc/test.ttl")
RDF::Writer.for(file_extension:  "ttl")
RDF::Writer.for(content_type:    "text/turtle")

Serializing an RDF graph into a Turtle file

RDF::Turtle::Writer.open("etc/test.ttl") do |writer|
  writer << graph
end

Serializing RDF statements into a Turtle file

RDF::Turtle::Writer.open("etc/test.ttl") do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

Serializing RDF statements into a Turtle string

RDF::Turtle::Writer.buffer do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

Serializing RDF statements to a string in streaming mode

RDF::Turtle::Writer.buffer(stream:  true) do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

Creating @base and @prefix definitions in output

RDF::Turtle::Writer.buffer(base_uri:  "http://example.com/", prefixes:  {
    nil => "http://example.com/ns#",
    foaf:  "http://xmlns.com/foaf/0.1/"}
) do |writer|
  graph.each_statement do |statement|
    writer << statement
  end
end

Methods included from StreamingWriter

#stream_epilogue, #stream_prologue, #stream_statement

Constructor Details

#initialize(output = $stdout, **options) {|writer| ... } ⇒ Writer

Initializes the Turtle writer instance.

Parameters:

  • output (IO, File) (defaults to: $stdout)

    the output stream

  • options (Hash{Symbol => Object})

    any additional options

Options Hash (**options):

  • :encoding (Encoding) — default: Encoding::UTF_8

    the encoding to use on the output stream

  • :canonicalize (Boolean) — default: false

    whether to canonicalize literals when serializing

  • :prefixes (Hash) — default: Hash.new

    the prefix mappings to use (not supported by all writers)

  • :base_uri (#to_s) — default: nil

    the base URI to use when constructing relative URIs

  • :max_depth (Integer) — default: 3

    Maximum depth for recursively defining resources, defaults to 3

  • :standard_prefixes (Boolean) — default: false

    Add standard prefixes to @prefixes, if necessary.

  • :stream (Boolean) — default: false

    Do not attempt to optimize graph presentation, suitable for streaming large graphs.

  • :default_namespace (String) — default: nil

    URI to use as default namespace, same as prefixes[nil]

  • :unique_bnodes (Boolean) — default: false

    Use unique node identifiers; defaults to using the identifier with which the node was originally initialized (if any).

  • :literal_shorthand (Boolean) — default: true

    Attempt to use Literal shorthands for numbers and boolean values

Yields:

  • (writer)

    self

Yield Parameters:

  • writer (RDF::Writer)

Yield Returns:

  • (void)

# File 'lib/rdf/turtle/writer.rb', line 126

def initialize(output = $stdout, **options, &block)
  @graph = RDF::Graph.new
  @uri_to_pname = {}
  @uri_to_prefix = {}
  options = {literal_shorthand: true}.merge(options)
  super do
    reset
    if block_given?
      case block.arity
        when 0 then instance_eval(&block)
        else block.call(self)
      end
    end
  end
end
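
The constructor above chooses how to run the supplied block based on its arity. The following is a minimal standalone sketch of that dispatch only; SketchWriter is a hypothetical stand-in and is not part of RDF::Turtle.

```ruby
# Hypothetical stand-in for a writer; illustrates block-arity dispatch only.
class SketchWriter
  attr_reader :lines

  def initialize(&block)
    @lines = []
    return unless block_given?

    case block.arity
    when 0 then instance_eval(&block) # block runs with self set to the writer
    else block.call(self)             # block receives the writer as an argument
    end
  end

  # Collect a line and return self so calls can be chained.
  def <<(line)
    @lines << line
    self
  end
end

implicit = SketchWriter.new { self << "a" }   # zero-arity block
explicit = SketchWriter.new { |w| w << "b" }  # one-argument block
```

Both styles end up calling the same #<< method; the zero-arity form simply avoids naming the writer.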

Instance Attribute Details

#graph ⇒ Graph

Returns Graph of statements serialized.

Returns:

  • (Graph)

    Graph of statements serialized


# File 'lib/rdf/turtle/writer.rb', line 64

def graph
  @graph
end

Class Method Details

.options ⇒ Object

Writer options


# File 'lib/rdf/turtle/writer.rb', line 69

def self.options
  super + [
    RDF::CLI::Option.new(
      symbol: :max_depth,
      datatype: Integer,
      on: ["--max-depth DEPTH"],
      description: "Maximum depth for recursively defining resources, defaults to 3.") {true},
    RDF::CLI::Option.new(
      symbol: :stream,
      datatype: TrueClass,
      on: ["--stream"],
      description: "Do not attempt to optimize graph presentation, suitable for streaming large graphs.") {true},
    RDF::CLI::Option.new(
      symbol: :default_namespace,
      datatype: RDF::URI,
      on: ["--default-namespace URI", :REQUIRED],
      description: "URI to use as default namespace, same as prefixes.") {|arg| RDF::URI(arg)},
    RDF::CLI::Option.new(
      symbol: :literal_shorthand,
      datatype: FalseClass,
      on: ["--no-literal-shorthand"],
      description: "Do not attempt to use Literal shorthands for numbers and boolean values.") {false},
  ]
end

Instance Method Details

#blankNodePropertyList?(resource, position) ⇒ Boolean (protected)

Can subject be represented as a blankNodePropertyList?

Returns:

  • (Boolean)

# File 'lib/rdf/turtle/writer.rb', line 483

def blankNodePropertyList?(resource, position)
  !resource.statement? && resource.node? &&
    !collection?(resource) &&
    (!is_done?(resource) || position == :subject) &&
    ref_count(resource) == (position == :object ? 1 : 0)
end

#bump_reference(resource) ⇒ Integer (protected)

Increase the reference count of this resource

Parameters:

  • resource (RDF::Resource)

Returns:

  • (Integer)

    resulting reference count


# File 'lib/rdf/turtle/writer.rb', line 507

def bump_reference(resource)
  @references[resource] = ref_count(resource) + 1
end

#format_literal(literal, **options) ⇒ String

Returns the Turtle representation of a literal, using shorthand forms for booleans and numbers when :literal_shorthand is enabled.

Parameters:

  • literal (RDF::Literal, String, #to_s)
  • options (Hash{Symbol => Object})

Returns:

  • (String)

# File 'lib/rdf/turtle/writer.rb', line 275

def format_literal(literal, **options)
  case literal
  when RDF::Literal
    case @options[:literal_shorthand] && literal.valid? ? literal.datatype : false
    when RDF::XSD.boolean
      %w(true false).include?(literal.value) ? literal.value : literal.canonicalize.to_s
    when RDF::XSD.integer
      literal.value.match?(/^[\+\-]?\d+$/) && !canonicalize? ? literal.value : literal.canonicalize.to_s
    when RDF::XSD.decimal
      literal.value.match?(/^[\+\-]?\d+\.\d+?$/) && !canonicalize? ?
        literal.value :
        literal.canonicalize.to_s
    when RDF::XSD.double
      in_form = case literal.value
      when /[\+\-]?\d+\.\d*E[\+\-]?\d+$/i then true
      when /[\+\-]?\.\d+E[\+\-]?\d+$/i    then true
      when /[\+\-]?\d+E[\+\-]?\d+$/i      then true
      else false
      end && !canonicalize?

      in_form ? literal.value : literal.canonicalize.to_s.sub('E', 'e').to_s
    else
      text = quoted(literal.value)
      text << "@#{literal.language}" if literal.has_language?
      text << "^^#{format_uri(literal.datatype)}" if literal.has_datatype?
      text
    end
  else
    quoted(literal.to_s)
  end
end
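
The shorthand decision in the method above comes down to whether a literal's lexical form matches a pattern for its datatype. The following standalone sketch reuses the numeric patterns from the listing; the datatype symbols and the shorthand_form? helper are hypothetical stand-ins for the RDF::XSD terms, and the validity and canonicalization checks are left out.

```ruby
# Lexical forms that may be written bare in Turtle, keyed by a datatype label.
# The symbols are hypothetical stand-ins for RDF::XSD.boolean and friends.
SHORTHAND_FORMS = {
  boolean: [/\A(?:true|false)\z/],
  integer: [/^[\+\-]?\d+$/],
  decimal: [/^[\+\-]?\d+\.\d+?$/],
  double:  [
    /[\+\-]?\d+\.\d*E[\+\-]?\d+$/i,
    /[\+\-]?\.\d+E[\+\-]?\d+$/i,
    /[\+\-]?\d+E[\+\-]?\d+$/i
  ]
}.freeze

# True when the lexical form can be emitted without quotes or a datatype.
def shorthand_form?(datatype, lexical)
  SHORTHAND_FORMS.fetch(datatype, []).any? { |pattern| lexical.match?(pattern) }
end

shorthand_form?(:integer, "42")    # bare integer
shorthand_form?(:decimal, "1.")    # needs digits after the point
shorthand_form?(:double,  "1e3")   # exponent form, case-insensitive
```

Anything that fails the check falls back to the quoted form with an explicit language tag or datatype.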

#format_node(node, **options) ⇒ String

Returns the Turtle representation of a blank node.

Parameters:

  • node (RDF::Node)
  • options (Hash{Symbol => Object})

Returns:

  • (String)

# File 'lib/rdf/turtle/writer.rb', line 325

def format_node(node, **options)
  options[:unique_bnodes] ? node.to_unique_base : node.to_base
end

#format_quotedTriple(statement, **options) ⇒ String

Returns an embedded triple.

Parameters:

  • statement (RDF::Statement)
  • options (Hash{Symbol => Object})

Returns:

  • (String)

# File 'lib/rdf/turtle/writer.rb', line 335

def format_quotedTriple(statement, **options)
  log_debug("rdfstar") {"#{statement.to_ntriples}"}
  "<<%s %s %s>>" % statement.to_a.map { |value| format_term(value, **options) }
end

#format_uri(uri, **options) ⇒ String

Returns the Turtle representation of a URI reference.

Parameters:

  • uri (RDF::URI)
  • options (Hash{Symbol => Object})

Returns:

  • (String)

# File 'lib/rdf/turtle/writer.rb', line 313

def format_uri(uri, **options)
  md = uri.relativize(base_uri)
  log_debug("relativize") {"#{uri.to_ntriples} => #{md.inspect}"} if md != uri.to_s
  md != uri.to_s ? "<#{md}>" : (get_pname(uri) || "<#{uri}>")
end

#get_pname(resource) ⇒ String?

Returns a QName for the URI, or nil if none applies. Adds the namespace of the QName to the defined prefixes.

Parameters:

  • resource (RDF::Resource)

Returns:

  • (String, nil)

    value to use to identify URI


# File 'lib/rdf/turtle/writer.rb', line 207

def get_pname(resource)
  case resource
  when RDF::Node
    return options[:unique_bnodes] ? resource.to_unique_base : resource.to_base
  when RDF::URI
    uri = resource.to_s
  else
    return nil
  end

  pname = case
  when @uri_to_pname.has_key?(uri)
    return @uri_to_pname[uri]
  when u = @uri_to_prefix.keys.sort_by {|uu| uu.length}.reverse.detect {|uu| uri.index(uu.to_s) == 0}
    # Use a defined prefix
    prefix = @uri_to_prefix[u]
    unless u.to_s.empty?
      prefix(prefix, u) unless u.to_s.empty?
      log_debug("get_pname") {"add prefix #{prefix.inspect} => #{u}"}
      uri.sub(u.to_s, "#{prefix}:")
    end
  when @options[:standard_prefixes] && vocab = RDF::Vocabulary.each.to_a.detect {|v| uri.index(v.to_uri.to_s) == 0}
    prefix = vocab.__name__.to_s.split('::').last.downcase
    @uri_to_prefix[vocab.to_uri.to_s] = prefix
    prefix(prefix, vocab.to_uri) # Define for output
    log_debug("get_pname") {"add standard prefix #{prefix.inspect} => #{vocab.to_uri}"}
    uri.sub(vocab.to_uri.to_s, "#{prefix}:")
  else
    nil
  end

  # Make sure pname is a valid pname
  if pname
    md = Terminals::PNAME_LN.match(pname) || Terminals::PNAME_NS.match(pname)
    pname = nil unless md.to_s.length == pname.length
  end

  @uri_to_pname[uri] = pname
end
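
The core of the lookup above is a longest-match search over the known namespaces. The following is a simplified standalone sketch of that step; pname_for is a hypothetical helper, it works on plain strings, and it skips both the caching and the check against the Turtle PNAME grammar that the real method performs.

```ruby
# Map a URI string to "prefix:local" using the longest matching namespace.
# uri_to_prefix maps namespace URI strings to prefix names.
def pname_for(uri, uri_to_prefix)
  namespace = uri_to_prefix.keys
                           .sort_by(&:length)
                           .reverse
                           .detect { |ns| uri.start_with?(ns) }
  return nil unless namespace

  "#{uri_to_prefix[namespace]}:#{uri[namespace.length..-1]}"
end

namespaces = {
  "http://example.com/"    => "ex",
  "http://example.com/ns#" => "ns"
}

pname_for("http://example.com/ns#thing", namespaces) # matches the longer namespace
pname_for("http://other.org/x", namespaces)          # no namespace matches
```

Sorting by length before searching ensures that a more specific namespace wins over a shorter one that is also a prefix of the URI.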

#indent(modifier = 0) ⇒ String (protected)

Returns indent string multiplied by the depth

Parameters:

  • modifier (Integer) (defaults to: 0)

    Increase depth by specified amount

Returns:

  • (String)

    A number of spaces, depending on current depth


# File 'lib/rdf/turtle/writer.rb', line 454

def indent(modifier = 0)
  " " * (@options.fetch(:log_depth, log_depth) * 2 + modifier)
end

#is_done?(subject) ⇒ Boolean (protected)

Returns:

  • (Boolean)

# File 'lib/rdf/turtle/writer.rb', line 511

def is_done?(subject)
  @serialized.include?(subject)
end

#order_subjects ⇒ Array<Resource> (protected)

Order subjects for output. Override this to output subjects in another order.

Uses #top_classes and #base_uri.

Returns:

  • (Array<Resource>)

    Ordered list of subjects


# File 'lib/rdf/turtle/writer.rb', line 364

def order_subjects
  seen = {}
  subjects = []

  # Start with base_uri
  if base_uri && @subjects.keys.include?(base_uri)
    subjects << RDF::URI(base_uri)
    seen[RDF::URI(base_uri)] = true
  end

  # Add distinguished classes
  top_classes.each do |class_uri|
    graph.query({predicate:  RDF.type, object:  class_uri}).
      map {|st| st.subject}.
      sort.
      uniq.
      each do |subject|
      log_debug("order_subjects") {subject.to_ntriples}
      subjects << subject
      seen[subject] = true
    end
  end

  # Mark as seen lists that are part of another list
  @lists.values.map(&:statements).
    flatten.each do |st|
      seen[st.object] = true if @lists.key?(st.object)
    end

  # List elements which are bnodes should not be targets for top-level serialization
  list_elements = @lists.values.map(&:to_a).flatten.select(&:node?).compact

  # Sort subjects by resources and statements over bnodes, ref_counts and the subject URI itself
  recursable = (@subjects.keys - list_elements).
    select {|s| !seen.include?(s)}.
    map {|r| [r.node? ? 2 : (r.statement? ? 1 : 0), ref_count(r), r]}.
    sort

  subjects + recursable.map{|r| r.last}
end
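
The final step of the method above sorts the remaining subjects by a composite key: the kind of term, then the reference count, then the term itself. The following standalone sketch shows only that sort; the Subject struct, KIND_RANK table, and order_by_key helper are hypothetical and use plain strings instead of RDF terms.

```ruby
# Hypothetical record describing a subject: its name, kind, and reference count.
Subject = Struct.new(:name, :kind, :refs)

# URIs sort first, embedded statements next, blank nodes last.
KIND_RANK = { uri: 0, statement: 1, node: 2 }.freeze

# Sort by [kind rank, reference count, name], mirroring the array key above.
def order_by_key(subjects)
  subjects.sort_by { |s| [KIND_RANK.fetch(s.kind), s.refs, s.name] }
end

subjects = [
  Subject.new("_:b1", :node, 0),
  Subject.new("http://example.com/b", :uri, 1),
  Subject.new("http://example.com/a", :uri, 1),
  Subject.new("http://example.com/c", :uri, 0)
]

ordered = order_by_key(subjects).map(&:name)
```

Because Ruby compares arrays element by element, the name only breaks ties between subjects of the same kind and reference count.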

#predicate_order ⇒ Array<URI> (protected)

Defines the order of predicates to emit at the beginning of a resource description. Defaults to [rdf:type, rdfs:label, dc:title].

Returns:

  • (Array<URI>)

# File 'lib/rdf/turtle/writer.rb', line 358

def predicate_order; [RDF.type, RDF::RDFS.label, RDF::URI("http://purl.org/dc/terms/title")]; end

#preprocess ⇒ Object (protected)

Perform any preprocessing of statements required


# File 'lib/rdf/turtle/writer.rb', line 406

def preprocess
  # Load defined prefixes
  (@options[:prefixes] || {}).each_pair do |k, v|
    @uri_to_prefix[v.to_s] = k
  end

  prefix(nil, @options[:default_namespace]) if @options[:default_namespace]

  case
  when @options[:stream]
  else
    @options[:prefixes] = {}  # Will define actual used when matched

    @graph.each {|statement| preprocess_statement(statement)}
  end
end

#preprocess_statement(statement) ⇒ Object (protected)

Perform any statement preprocessing required. This is used to perform reference counts and determine required prefixes.

Parameters:

  • statement (Statement)

# File 'lib/rdf/turtle/writer.rb', line 426

def preprocess_statement(statement)
  #log_debug("preprocess") {statement.to_ntriples}
  bump_reference(statement.object)
  # Count properties of this subject
  (@subjects[statement.subject] ||= {})[statement.predicate] ||= 0
  @subjects[statement.subject][statement.predicate] += 1

  # Collect lists
  if statement.predicate == RDF.first
    l = RDF::List.new(subject: statement.subject, graph: graph)
    @lists[statement.subject] = l if l.valid?
  end

  if statement.object == RDF.nil || statement.subject == RDF.nil
    # Add an entry for the list tail
    @lists[RDF.nil] ||= RDF::List[]
  end

  # Pre-fetch pnames, to fill prefixes
  get_pname(statement.subject)
  get_pname(statement.predicate)
  get_pname(statement.object)
  get_pname(statement.object.datatype) if statement.object.literal? && statement.object.datatype
end

#prop_count(subject) ⇒ Integer (protected)

Return the number of statements having this resource as a subject other than for list properties

Returns:

  • (Integer)

# File 'lib/rdf/turtle/writer.rb', line 492

def prop_count(subject)
  @subjects.fetch(subject, {}).
    reject {|k, v| [RDF.type, RDF.first, RDF.rest].include?(k)}.
    values.reduce(:+) || 0
end
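
The count above is taken over a nested hash of subject to predicate to occurrence count, with rdf:type and the list predicates left out. The following standalone sketch shows the same reduction with plain strings as keys; count_properties and the IGNORED_PREDICATES constant are hypothetical names.

```ruby
# Predicates excluded from the count, written here as plain strings.
IGNORED_PREDICATES = %w[rdf:type rdf:first rdf:rest].freeze

# Sum the per-predicate counts for a subject, skipping the ignored predicates.
def count_properties(subjects, subject)
  subjects.fetch(subject, {})
          .reject { |predicate, _count| IGNORED_PREDICATES.include?(predicate) }
          .values
          .reduce(:+) || 0
end

subjects = {
  "_:list" => { "rdf:first" => 1, "rdf:rest" => 1 },
  "_:item" => { "rdf:type" => 1, "rdfs:label" => 2, "dc:title" => 1 }
}

count_properties(subjects, "_:list")    # only list predicates
count_properties(subjects, "_:item")    # label and title are counted
count_properties(subjects, "_:missing") # unknown subject
```

The `|| 0` fallback matters because reduce returns nil for an empty collection.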

#quoted(string) ⇒ String (protected)

Use single- or multi-line quotes. If the literal contains \t, \n, or \r, use a multi-line quote; otherwise, use a single-line quote.

Parameters:

  • string (String)

Returns:

  • (String)

# File 'lib/rdf/turtle/writer.rb', line 472

def quoted(string)
  if string.to_s.match(/[\t\n\r]/)
    string = string.gsub('\\', '\\\\\\\\').gsub('"', '\\"')

    %("""#{string}""")
  else
    "\"#{escaped(string)}\""
  end
end
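
The choice between the two quote styles can be shown in isolation. The following is a simplified standalone sketch, not the library method: quoted_sketch is a hypothetical name, and its single-line branch only escapes backslashes and double quotes, whereas the real method delegates to escaped.

```ruby
# Pick a Turtle quote style based on the presence of tab or line-break characters.
def quoted_sketch(string)
  # Block form of gsub avoids backslash handling in replacement strings.
  body = string.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }

  if string.match?(/[\t\n\r]/)
    %("""#{body}""") # long string: raw line breaks are allowed inside
  else
    %("#{body}")     # short string on a single line
  end
end

quoted_sketch("hi")
quoted_sketch("line one\nline two")
quoted_sketch('say "x"')
```

Using the long form for multi-line text keeps the output readable, since the line breaks appear literally instead of as escape sequences.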

#ref_count(resource) ⇒ Integer (protected)

Return the number of times this node has been referenced in the object position

Returns:

  • (Integer)

# File 'lib/rdf/turtle/writer.rb', line 500

def ref_count(resource)
  @references.fetch(resource, 0)
end

#reset ⇒ Object (protected)

Reset internal helper instance variables


# File 'lib/rdf/turtle/writer.rb', line 459

def reset
  @lists = {}

  @references = {}
  @serialized = {}
  @subjects = {}
end

#sort_properties(properties) ⇒ Array<String>

Take a hash from predicate uris to lists of values. Sort the lists of values. Return a sorted list of properties.

Parameters:

  • properties (Hash{String => Array<Resource>})

    A hash of Property to Resource mappings

Returns:

  • (Array<String>)

    Ordered list of properties. Uses predicate_order.


# File 'lib/rdf/turtle/writer.rb', line 251

def sort_properties(properties)
  # Make sorted list of properties
  prop_list = []

  predicate_order.each do |prop|
    next unless properties[prop.to_s]
    prop_list << prop.to_s
  end

  properties.keys.sort.each do |prop|
    next if prop_list.include?(prop.to_s)
    prop_list << prop.to_s
  end

  log_debug("sort_properties") {prop_list.join(', ')}
  prop_list
end
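
The ordering rule above is: preferred predicates first, in their fixed order, then everything else alphabetically. The following standalone sketch expresses the same rule with plain strings; PREFERRED_ORDER and sort_properties_sketch are hypothetical names, and the abbreviated predicate strings stand in for full URIs.

```ruby
# Predicates that should lead a resource description, in this order.
PREFERRED_ORDER = %w[rdf:type rdfs:label dc:title].freeze

# Return property keys with preferred predicates first, the rest sorted.
def sort_properties_sketch(properties)
  preferred = PREFERRED_ORDER.select { |predicate| properties.key?(predicate) }
  preferred + (properties.keys.sort - preferred)
end

properties = {
  "foaf:name"  => ["Alice"],
  "rdfs:label" => ["A label"],
  "rdf:type"   => ["foaf:Person"],
  "dc:creator" => ["Bob"]
}

sorted = sort_properties_sketch(properties)
```

Preferred predicates that the resource does not have are skipped rather than emitted empty.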

#start_document ⇒ Object (protected)

Output @base and @prefix definitions


# File 'lib/rdf/turtle/writer.rb', line 342

def start_document
  @output.write("#{indent}@base <#{base_uri}> .\n") unless base_uri.to_s.empty?

  log_debug("start_document") {prefixes.inspect}
  prefixes.keys.sort_by(&:to_s).each do |prefix|
    @output.write("#{indent}@prefix #{prefix}: <#{prefixes[prefix]}> .\n")
  end
end
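
The shape of the emitted header can be reproduced without a writer instance. The following standalone sketch builds the same kind of @base and @prefix lines from plain values; prologue_for is a hypothetical helper that returns a string instead of writing to an output stream, and it omits indentation.

```ruby
# Build the @base and @prefix header lines for a Turtle document.
# prefixes maps prefix names (symbols, or nil for the default) to namespace URIs.
def prologue_for(base_uri, prefixes)
  lines = []
  lines << "@base <#{base_uri}> ." unless base_uri.to_s.empty?

  # nil sorts first because nil.to_s is the empty string.
  prefixes.keys.sort_by(&:to_s).each do |prefix|
    lines << "@prefix #{prefix}: <#{prefixes[prefix]}> ."
  end

  lines.map { |line| "#{line}\n" }.join
end

header = prologue_for(
  "http://example.com/",
  foaf: "http://xmlns.com/foaf/0.1/",
  nil => "http://example.com/ns#"
)
```

The default namespace is rendered as a bare colon because interpolating nil produces an empty prefix name.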

#subject_done(subject) ⇒ Object (protected)

Mark a subject as done.


# File 'lib/rdf/turtle/writer.rb', line 516

def subject_done(subject)
  @serialized[subject] = true
end

#top_classes ⇒ Array<URI> (protected)

Defines rdf:type of subjects to be emitted at the beginning of the graph. Defaults to rdfs:Class

Returns:

  • (Array<URI>)

# File 'lib/rdf/turtle/writer.rb', line 353

def top_classes; [RDF::RDFS.Class]; end

#write_epilogue

This method returns an undefined value.

Outputs the Turtle representation of all stored triples.


# File 'lib/rdf/turtle/writer.rb', line 174

def write_epilogue
  case
  when @options[:stream]
    stream_epilogue
  else
    @max_depth = @options[:max_depth] || 3

    self.reset

    log_debug("\nserialize") {"graph: #{@graph.size}"}

    preprocess

    start_document

    # Remove lists that are referenced and have non-list properties;
    # these are legal, but can't be serialized as lists
    @lists.reject! do |node, list|
      ref_count(node) > 0 && prop_count(node) > 0
    end

    order_subjects.each do |subject|
      unless is_done?(subject)
        statement(subject)
      end
    end
  end
  super
end

#write_prologue

This method returns an undefined value.

Write out declarations


# File 'lib/rdf/turtle/writer.rb', line 160

def write_prologue
  case
  when @options[:stream]
    stream_prologue
  else
  end
  super
end

#write_triple(subject, predicate, object)

This method returns an undefined value.

Adds a triple to be serialized

Parameters:

  • subject (RDF::Resource)
  • predicate (RDF::URI)
  • object (RDF::Value)

# File 'lib/rdf/turtle/writer.rb', line 148

def write_triple(subject, predicate, object)
  statement = RDF::Statement.new(subject, predicate, object)
  if @options[:stream]
    stream_statement(statement)
  else
    @graph.insert(statement)
  end
end
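
The method above either emits a statement immediately or stores it for later, depending on the :stream option. The following standalone sketch illustrates that trade-off with plain strings; BufferingSketch is a hypothetical class, and sorting the buffered lines stands in for the much richer reordering and grouping the real writer performs in #write_epilogue.

```ruby
# Hypothetical writer that either streams lines or buffers them until the end.
class BufferingSketch
  attr_reader :output

  def initialize(stream: false)
    @stream = stream
    @pending = []
    @output = []
  end

  # In streaming mode the line is emitted at once; otherwise it is held back.
  def write_triple(subject, predicate, object)
    line = "#{subject} #{predicate} #{object} ."
    if @stream
      @output << line
    else
      @pending << line
    end
  end

  # Buffered lines are only emitted here, after they can be seen as a whole.
  def write_epilogue
    @output.concat(@pending.sort) unless @stream
    @pending.clear
  end
end

buffered = BufferingSketch.new
buffered.write_triple("<b>", "<p>", "<o>")
buffered.write_triple("<a>", "<p>", "<o>")
before_epilogue = buffered.output.dup
buffered.write_epilogue
```

Buffering costs memory proportional to the graph but lets the writer choose an output order; streaming keeps memory flat and preserves input order.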