Class: HexaPDF::Document::Metadata

Inherits:
Object
Defined in:
lib/hexapdf/document/metadata.rb

Overview

This class provides methods for reading and writing the document-level metadata.

When an instance is created (usually through HexaPDF::Document#metadata), the metadata is read from the document’s information dictionary (see HexaPDF::Type::Info) and made available through the various methods.

By default, the metadata is written to the information dictionary as well as to the document’s metadata stream (see HexaPDF::Type::Metadata) once the document is written. This can be controlled via the #write_info_dict and #write_metadata_stream methods.

While HexaPDF can write an XMP packet (in a limited form) to the document’s metadata stream, it provides no way of reading XMP metadata. If reading functionality or extended writing functionality is needed, make sure this class does not write the metadata, and read or create the metadata stream yourself.

Caveats

  • Disabling writing to the information dictionary only prevents some of its entries from being written. The #producer is always written to the information dictionary as per the AGPL license terms. The #modification_date may be written depending on the arguments to HexaPDF::Document#write.

  • If writing the metadata stream is enabled, any existing metadata stream is completely overwritten. The existing stream is replaced as a whole, not updated with the changed information.

Adding custom metadata properties

All the properties specified for the information dictionary are supported.

Furthermore, HexaPDF supports writing custom properties to the metadata stream. For this to work, the XMP namespaces that are used need to be registered using #register_namespace. Additionally, the types of all used XMP properties need to be registered using #register_property_type.

The following types for XMP properties are supported:

String

Maps to the XMP simple string value. Values need to be of type String.

Integer

Maps to the XMP integer core value type and gets formatted as string. Values need to be of type Integer.

Date

Maps to the XMP simple string value, correctly formatted. Values need to be of type Time, Date, or DateTime.

URI

Maps to the XMP simple value variant of URI. Values need to be of type String or URI.

Boolean

Maps to the XMP simple string value, correctly formatted. Values need to be either true or false.

OrderedArray

Maps to the XMP ordered array. Values need to be of type Array and items must be XMP simple values.

UnorderedArray

Maps to the XMP unordered array. Values need to be of type Array and items must be XMP simple values.

LanguageArray

Maps to the XMP language alternatives array. Values need to be of type Array and items
must either be strings (they are associated with the set default language) or
LocalizedString instances.
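
As a rough illustration of what "correctly formatted" can mean for the simple types, the sketch below turns Ruby values into XMP-style strings. It assumes ISO 8601 timestamps for dates and the literals True and False for booleans, as described in the XMP specification. The helper `format_xmp_simple` is hypothetical and not part of HexaPDF, whose own serializer may differ in detail.

```ruby
require 'time' # adds Time#iso8601
require 'date'
require 'uri'

# Hypothetical helper (not part of HexaPDF): converts a Ruby value into the
# string form of the given XMP simple type.
def format_xmp_simple(type, value)
  case type
  when 'String'  then value.to_s
  when 'Integer' then Integer(value).to_s
  when 'Boolean' then value ? 'True' : 'False'
  when 'URI'     then value.to_s
  when 'Date'
    # Date and DateTime respond to #to_time; a Time is used as-is.
    time = value.is_a?(Time) ? value : value.to_time
    time.iso8601
  else
    raise ArgumentError, "Unsupported simple type: #{type}"
  end
end

puts format_xmp_simple('Integer', 3)
puts format_xmp_simple('Boolean', true)
puts format_xmp_simple('Date', Time.utc(2024, 1, 2, 3, 4, 5))
puts format_xmp_simple('URI', URI('https://example.com/doc'))
```

The array types would then be built from items formatted this way.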

See: PDF2.0 s14.3, www.adobe.com/products/xmp.html

Defined Under Namespace

Classes: LocalizedString

Constant Summary

PREDEFINED_NAMESPACES =

Contains a mapping of predefined prefixes for XMP namespaces for metadata.

{
  "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "xmp" => "http://ns.adobe.com/xap/1.0/",
  "pdf" => "http://ns.adobe.com/pdf/1.3/",
  "dc" => "http://purl.org/dc/elements/1.1/",
  "x" => "adobe:ns:meta/",
  "pdfaid" => "http://www.aiim.org/pdfa/ns/id/",
}.freeze
PREDEFINED_PROPERTIES =

Contains a mapping of predefined XMP properties to their types, i.e. from namespace to property and then type.

{
  "http://ns.adobe.com/xap/1.0/" => {
    'CreatorTool' => 'String',
    'CreateDate' => 'Date',
    'ModifyDate' => 'Date',
  }.freeze,
  "http://ns.adobe.com/pdf/1.3/" => {
    'Keywords' => 'String',
    'Producer' => 'String',
    'Trapped' => 'Boolean',
  }.freeze,
  "http://purl.org/dc/elements/1.1/" => {
    'creator' => 'OrderedArray',
    'description' => 'LanguageArray',
    'title' => 'LanguageArray',
  }.freeze,
  "http://www.aiim.org/pdfa/ns/id/" => {
    'part' => 'Integer',
    'conformance' => 'String',
  }.freeze,
}.freeze
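
The two mappings work together: a prefix resolves to a namespace URI, and the URI plus the property name resolves to a type. The following sketch shows that two-step lookup on a reduced copy of the mappings. The constants `NAMESPACES` and `PROPERTIES` and the helper `type_for` are made up for this illustration and are not HexaPDF API.

```ruby
# Reduced excerpt of the two mappings shown above.
NAMESPACES = {
  'xmp' => 'http://ns.adobe.com/xap/1.0/',
  'dc' => 'http://purl.org/dc/elements/1.1/',
}.freeze
PROPERTIES = {
  'http://ns.adobe.com/xap/1.0/' => {'CreateDate' => 'Date'}.freeze,
  'http://purl.org/dc/elements/1.1/' => {'title' => 'LanguageArray'}.freeze,
}.freeze

# Hypothetical helper: resolves "prefix:name" to the registered type, or nil
# if either the prefix or the property is unknown.
def type_for(qualified_name)
  prefix, name = qualified_name.split(':', 2)
  PROPERTIES.dig(NAMESPACES[prefix], name)
end

puts type_for('dc:title')
puts type_for('xmp:CreateDate')
```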

Constructor Details

#initialize(document) ⇒ Metadata

Creates a new Metadata object for the given PDF document.



# File 'lib/hexapdf/document/metadata.rb', line 158

def initialize(document)
  @document = document
  @namespaces = PREDEFINED_NAMESPACES.dup
  @properties = PREDEFINED_PROPERTIES.transform_values(&:dup)
  @default_language = document.catalog[:Lang] || 'x-default'
  @metadata = Hash.new {|h, k| h[k] = {} }
  @custom_metadata = []
  write_info_dict(true)
  write_metadata_stream(true)
  @document.register_listener(:complete_objects, &method(:write_metadata))
  
end
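
The constructor copies the predefined mappings so that later registrations never touch the frozen constants. A minimal sketch of why `transform_values(&:dup)` is used for the nested mapping, using a made-up constant `DEFAULTS` in place of PREDEFINED_PROPERTIES:

```ruby
# Made-up nested mapping with the same shape as PREDEFINED_PROPERTIES.
DEFAULTS = {
  'ns-a' => {'title' => 'LanguageArray'}.freeze,
}.freeze

shallow = DEFAULTS.dup                   # outer hash unfrozen, inner hash still frozen
deep = DEFAULTS.transform_values(&:dup)  # outer and inner hashes unfrozen

deep['ns-a']['subject'] = 'String'       # only the copy changes

puts shallow['ns-a'].frozen?
puts deep['ns-a'].frozen?
puts DEFAULTS['ns-a'].key?('subject')
```

A plain `dup` is enough for the flat namespace mapping, while the nested property mapping needs its inner hashes duplicated as well.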

Instance Method Details

#author(value = :UNSET) ⇒ Object

:call-seq:

metadata.author           -> author or nil
metadata.author(value)    -> value

Returns the name of the person who created the document (author) if no argument is given. Otherwise sets the author to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:creator.



# File 'lib/hexapdf/document/metadata.rb', line 308

def author(value = :UNSET)
  property('dc', 'creator', value)
end

#creation_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.creation_date           -> creation_date or nil
metadata.creation_date(value)    -> value

Returns the date and time (a Time object) the document was created if no argument is given. Otherwise sets the creation date to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreateDate.



# File 'lib/hexapdf/document/metadata.rb', line 387

def creation_date(value = :UNSET)
  property('xmp', 'CreateDate', value)
end

#creator(value = :UNSET) ⇒ Object

:call-seq:

metadata.creator           -> creator or nil
metadata.creator(value)    -> value

Returns the name of the PDF processor that created the original document from which this PDF was converted if no argument is given. Otherwise sets the name of the creator tool to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreatorTool.



# File 'lib/hexapdf/document/metadata.rb', line 357

def creator(value = :UNSET)
  property('xmp', 'CreatorTool', value)
end

#custom_metadata(data) ⇒ Object

Adds the given data string as custom metadata to the XMP document.

The data string must contain a fully valid ‘rdf:Description’ element.

Using this method allows adding metadata like PDF/A schema definitions for which there is no direct support by HexaPDF.



# File 'lib/hexapdf/document/metadata.rb', line 258

def custom_metadata(data)
  @custom_metadata << data
end

#default_language(value = :UNSET) ⇒ Object

:call-seq:

metadata.default_language          -> language
metadata.default_language(value)   -> value

Returns the default language in RFC3066 format used for unlocalized strings if no argument is given. Otherwise sets the default language to the given language.

The initial default language is taken from the document catalog’s /Lang entry. If that is not set, ‘x-default’ is used as the default language.



# File 'lib/hexapdf/document/metadata.rb', line 180

def default_language(value = :UNSET)
  if value == :UNSET
    @default_language
  else
    @default_language = value
  end
end
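
To make the role of the default language concrete, the sketch below picks the language under which a value would be stored: a localized value brings its own language tag, a plain string falls back to the default. `LocalizedText` is a hypothetical stand-in for LocalizedString, whose actual interface is not shown here, and `language_for` is a made-up helper.

```ruby
# Hypothetical stand-in for LocalizedString: a String that additionally
# carries a language tag.
class LocalizedText < String
  attr_accessor :language

  def initialize(value, language)
    super(value)
    @language = language
  end
end

# Made-up helper: returns the language a value would be stored under.
def language_for(value, default_language)
  value.respond_to?(:language) ? value.language : default_language
end

default_language = 'x-default'
puts language_for('Annual report', default_language)
puts language_for(LocalizedText.new('Jahresbericht', 'de'), default_language)
```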

#delete(ns = nil, property = nil) ⇒ Object

:call-seq:

metadata.delete
metadata.delete(ns_prefix)
metadata.delete(ns_prefix, name)

Deletes either all metadata properties, only the ones from a specific namespace, or a specific one.



# File 'lib/hexapdf/document/metadata.rb', line 269

def delete(ns = nil, property = nil)
  if ns.nil? && property.nil?
    @metadata.clear
  elsif property.nil?
    @metadata.delete(namespace(ns))
  else
    @metadata[namespace(ns)].delete(property)
  end
end
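
The three forms map directly onto operations on the nested hash that holds the metadata. A minimal sketch on a plain hash with the same shape, keyed by namespace URI and then by property name (the local variable names are chosen for this illustration only):

```ruby
# Same storage shape as @metadata: namespace URI => {property => value}.
store = Hash.new {|hash, key| hash[key] = {} }
store['http://purl.org/dc/elements/1.1/']['title'] = 'Report'
store['http://purl.org/dc/elements/1.1/']['creator'] = ['Jane Doe']
store['http://ns.adobe.com/pdf/1.3/']['Keywords'] = 'pdf, xmp'

store['http://purl.org/dc/elements/1.1/'].delete('title') # a single property
store.delete('http://ns.adobe.com/pdf/1.3/')              # a whole namespace
remaining = store.keys
store.clear                                               # everything

puts remaining.inspect
puts store.empty?
```

Because of the default block, looking up a namespace that has no entries yet creates an empty hash for it, so deleting a property from such a namespace does not raise an error.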

#keywords(value = :UNSET) ⇒ Object

:call-seq:

metadata.keywords           -> keywords or nil
metadata.keywords(value)    -> value

Returns the keywords associated with the document if no argument is given. Otherwise sets keywords to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Keywords.



# File 'lib/hexapdf/document/metadata.rb', line 341

def keywords(value = :UNSET)
  property('pdf', 'Keywords', value)
end

#modification_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.modification_date           -> modification_date or nil
metadata.modification_date(value)    -> value

Returns the date and time (a Time object) the document was most recently modified if no argument is given. Otherwise sets the modification date to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:ModifyDate.



# File 'lib/hexapdf/document/metadata.rb', line 402

def modification_date(value = :UNSET)
  property('xmp', 'ModifyDate', value)
end

#namespace(ns) ⇒ Object

Returns the namespace URI associated with the given prefix.



# File 'lib/hexapdf/document/metadata.rb', line 218

def namespace(ns)
  @namespaces.fetch(ns) do
    raise HexaPDF::Error, "Namespace prefix '#{ns}' not registered"
  end
end

#producer(value = :UNSET) ⇒ Object

:call-seq:

metadata.producer           -> producer or nil
metadata.producer(value)    -> value

Returns the name of the PDF processor that converted the original document to PDF if no argument is given. Otherwise sets the name of the producer to the given value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Producer.



# File 'lib/hexapdf/document/metadata.rb', line 372

def producer(value = :UNSET)
  property('pdf', 'Producer', value)
end

#property(ns, property, value = :UNSET) ⇒ Object

:call-seq:

metadata.property(ns_prefix, name)           -> property_value
metadata.property(ns_prefix, name, value)    -> value

Returns the value for the property specified via the namespace prefix ns_prefix and name if the value argument is not provided. Otherwise sets the property to value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.



# File 'lib/hexapdf/document/metadata.rb', line 241

def property(ns, property, value = :UNSET)
  ns = @metadata[namespace(ns)]
  if value == :UNSET
    ns[property]
  elsif value.nil?
    ns.delete(property)
  else
    ns[property] = value
  end
end
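
The `:UNSET` default is what lets one method act as getter, setter and deleter: it marks "no argument given", which leaves nil free to mean "delete". The sketch below reproduces only that pattern in a stand-alone class. `PropertyStore` is a made-up name and leaves out the namespace handling.

```ruby
# Made-up class that mirrors only the :UNSET pattern of #property.
class PropertyStore
  def initialize
    @values = {}
  end

  # :UNSET marks "no argument given", so nil stays free to mean "delete".
  def property(name, value = :UNSET)
    if value == :UNSET
      @values[name]
    elsif value.nil?
      @values.delete(name)
    else
      @values[name] = value
    end
  end
end

store = PropertyStore.new
store.property('title', 'Report') # set
puts store.property('title')      # get
store.property('title', nil)      # delete
puts store.property('title').inspect
```

Since Hash#delete returns the removed value, the deleting call returns the previous value of the property, or nil if there was none.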

#register_namespace(prefix, uri) ⇒ Object

Registers the prefix for the given namespace uri.



# File 'lib/hexapdf/document/metadata.rb', line 213

def register_namespace(prefix, uri)
  @namespaces[prefix] = uri
end

#register_property_type(prefix, property, type) ⇒ Object

Registers the property for the namespace specified via prefix as the given type.

The argument type has to be one of the following: ‘String’, ‘Integer’, ‘Date’, ‘URI’, ‘Boolean’, ‘OrderedArray’, ‘UnorderedArray’, or ‘LanguageArray’.



# File 'lib/hexapdf/document/metadata.rb', line 228

def register_property_type(prefix, property, type)
  (@properties[namespace(prefix)] ||= {})[property] = type
end
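
Registering a custom property is therefore a two-step process: first the prefix, then the property type under that prefix. The following stand-alone sketch mirrors the registration logic shown on this page. `NamespaceRegistry`, `type_of`, the `ex` prefix and its URI are made up for illustration, and the type check is an addition of this sketch rather than behavior taken from HexaPDF.

```ruby
# Made-up class that mirrors the namespace and property type registration.
class NamespaceRegistry
  TYPES = %w[String Integer Date URI Boolean
             OrderedArray UnorderedArray LanguageArray].freeze

  def initialize
    @namespaces = {}
    @properties = {}
  end

  def register_namespace(prefix, uri)
    @namespaces[prefix] = uri
  end

  # Raises KeyError for an unregistered prefix.
  def namespace(prefix)
    @namespaces.fetch(prefix) do
      raise KeyError, "Namespace prefix '#{prefix}' not registered"
    end
  end

  # The type check is an addition of this sketch, not taken from HexaPDF.
  def register_property_type(prefix, property, type)
    raise ArgumentError, "Unknown type '#{type}'" unless TYPES.include?(type)
    (@properties[namespace(prefix)] ||= {})[property] = type
  end

  def type_of(prefix, property)
    @properties.dig(namespace(prefix), property)
  end
end

registry = NamespaceRegistry.new
registry.register_namespace('ex', 'http://example.com/ns/1.0/')
registry.register_property_type('ex', 'reviewCount', 'Integer')
puts registry.type_of('ex', 'reviewCount')
```

Registering a property type for a prefix that has not been registered fails in the namespace lookup, which is why the namespace has to come first.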

#subject(value = :UNSET) ⇒ Object

:call-seq:

metadata.subject           -> subject or nil
metadata.subject(value)    -> value

Returns the subject of the document if no argument is given. Otherwise sets the subject to the given value.

If the value is a LocalizedString, the language for the subject is taken from it. Otherwise the language specified via #default_language is used.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:description.



# File 'lib/hexapdf/document/metadata.rb', line 326

def subject(value = :UNSET)
  property('dc', 'description', value)
end

#title(value = :UNSET) ⇒ Object

:call-seq:

metadata.title           -> title or nil
metadata.title(value)    -> value

Returns the document’s title if no argument is given. Otherwise sets the document’s title to the given value.

If the value is a LocalizedString, the language for the title is taken from it. Otherwise the language specified via #default_language is used.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:title.



# File 'lib/hexapdf/document/metadata.rb', line 293

def title(value = :UNSET)
  property('dc', 'title', value)
end

#trapped(value = :UNSET) ⇒ Object

:call-seq:

metadata.trapped           -> trapped or nil
metadata.trapped(value)    -> value

Returns true if the document has been modified to include trapping information if no argument is given. Otherwise sets the trapped status to the given boolean value.

The value nil is returned if the property is not set. Using nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Trapped.



# File 'lib/hexapdf/document/metadata.rb', line 417

def trapped(value = :UNSET)
  property('pdf', 'Trapped', value)
end

#write_info_dict(value) ⇒ Object

Makes HexaPDF write the information dictionary if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 196

def write_info_dict(value)
  @write_info_dict = value
end

#write_info_dict? ⇒ Boolean

Returns true if the information dictionary should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 189

def write_info_dict?
  @write_info_dict
end

#write_metadata_stream(value) ⇒ Object

Makes HexaPDF write the metadata stream if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 208

def write_metadata_stream(value)
  @write_metadata_stream = value
end

#write_metadata_stream? ⇒ Boolean

Returns true if the metadata stream should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 201

def write_metadata_stream?
  @write_metadata_stream
end