datacite-mapping


A library for mapping DataCite XML to Ruby objects, based on xml-mapping and xml-mapping_extensions. Full API documentation is available on RubyDoc.info.

Supports Datacite 4.3; backward-compatible with Datacite 3.1.

Note that although this gem maintains compatibility with multiple versions of DataCite XML, changes to DataCite XML sometimes force changes to the internal object model of the gem. As a result, different versions of this gem may require minor updates to how a part of the model is accessed.

Usage

The core of the Datacite::Mapping library is the Resource class, corresponding to the root <resource/> element in a Datacite document.

Reading

To create a Resource object from XML, use Resource.parse_xml or Resource.load_from_file, depending on the data source:

XML source         Method to use
file path          Resource.load_from_file
String             Resource.parse_xml
IO                 Resource.parse_xml
REXML::Document    Resource.parse_xml
REXML::Element     Resource.parse_xml
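The table above amounts to a dispatch on the type of the source. The sketch below is a hypothetical helper, `datacite_reader_for`, that is not part of the gem. Because a file path is also a String, it assumes a heuristic of its own: a String naming an existing file is treated as a path, any other String as XML text.

```ruby
require 'pathname'
require 'rexml/document'
require 'stringio'

# Hypothetical helper (not part of datacite-mapping): returns the name of the
# Resource reader that the table above recommends for a given XML source.
def datacite_reader_for(source)
  case source
  when Pathname
    :load_from_file
  when String
    # Assumed heuristic: an existing file path is loaded, anything else is parsed.
    File.file?(source) ? :load_from_file : :parse_xml
  when IO, StringIO, REXML::Document, REXML::Element
    # StringIO is not an IO subclass, so it is listed explicitly.
    :parse_xml
  else
    raise ArgumentError, "unsupported XML source: #{source.class}"
  end
end
```

With the gem loaded, the result could then be used as `Resource.public_send(datacite_reader_for(src), src)`.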

Example:

require 'datacite/mapping'
include Datacite::Mapping

resource = Resource.load_from_file('datacite-example-full-v4.3.xml')
# => #<Datacite::Mapping::Resource:0x007f97689e87a0 …

abstract = resource.descriptions.find { |d| d.type == DescriptionType::ABSTRACT }
# => #<Datacite::Mapping::Description:0x007f976aafa330 …
abstract.value
# => "XML example of all DataCite Metadata Schema v4.3 properties."
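The block passed to `find` must compare with `==`. A single `=` assigns instead, so the block is always truthy and `find` returns the first description regardless of its type, overwriting that description's type as a side effect. The sketch below shows the difference using a hypothetical `DescriptionStub` struct and plain symbols in place of `Datacite::Mapping::Description` and `DescriptionType::ABSTRACT`, so that it runs without the gem.

```ruby
# Hypothetical stand-in for Datacite::Mapping::Description, for illustration only.
DescriptionStub = Struct.new(:type, :value)

descriptions = [
  DescriptionStub.new(:methods, 'How the example was produced.'),
  DescriptionStub.new(:abstract, 'XML example of all DataCite Metadata Schema v4.3 properties.')
]

# Comparison: returns the first description whose type is :abstract.
abstract = descriptions.find { |d| d.type == :abstract }
puts abstract.value # prints "XML example of all DataCite Metadata Schema v4.3 properties."

# Assignment: always truthy, so this returns descriptions[0] and
# silently changes its type from :methods to :abstract.
wrong = descriptions.find { |d| d.type = :abstract }
puts wrong.value # prints "How the example was produced."
```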

Note that Datacite::Mapping uses the TypesafeEnum gem to represent controlled vocabularies such as ResourceTypeGeneral and DescriptionType.
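A controlled vocabulary modeled this way is a closed set of singleton instances, so a value such as a description type can be compared by identity and looked up by its XML value. The following is a hypothetical, stdlib-only imitation of that pattern: `DescriptionTypeSketch` is not a class in either gem, it lists only a few of the DataCite description types, and it illustrates the idea rather than the TypesafeEnum implementation.

```ruby
# Hypothetical, stdlib-only imitation of a TypesafeEnum-style class.
# Only a subset of DataCite description types is listed.
class DescriptionTypeSketch
  attr_reader :key, :value

  def initialize(key, value)
    @key = key
    @value = value
    freeze
  end
  private_class_method :new

  # Each constant is the one and only instance for its term.
  ABSTRACT = new(:ABSTRACT, 'Abstract')
  METHODS = new(:METHODS, 'Methods')
  OTHER = new(:OTHER, 'Other')

  def self.to_a
    [ABSTRACT, METHODS, OTHER]
  end

  # Look up the instance for a value as it appears in the XML.
  def self.find_by_value(value)
    to_a.find { |t| t.value == value }
  end
end
```

Because `new` is private, no further instances can be created, so a check like `type == DescriptionTypeSketch::ABSTRACT` is effectively an identity check.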

Writing

In general, a Resource object must be provided with all required attributes on initialization.

resource = Resource.new(
  identifier: Identifier.new(value: '10.5555/12345678'),
  creators: [
    Creator.new(
      name: 'Josiah Carberry',
      identifier: NameIdentifier.new(
        scheme: 'ORCID', 
        scheme_uri: URI('http://orcid.org/'), 
        value: '0000-0002-1825-0097'
      ),
      affiliations: [
        'Department of Psychoceramics, Brown University'
      ]
    )
  ],
  titles: [
    Title.new(value: 'Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory')
  ],
  publisher: 'Journal of Psychoceramics',
  publication_year: 2008
)
#  => #<Datacite::Mapping::Resource:0x007f9768958fb0 …
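Required attributes of this kind map naturally onto Ruby required keyword arguments, which make `new` raise `ArgumentError` when one is omitted. The sketch below uses a hypothetical `ResourceSketch` class with the five attributes from the example above plus one optional attribute; it is not the gem's actual constructor, and whether Resource enforces its required attributes in exactly this way is an assumption.

```ruby
# Hypothetical stand-in for Datacite::Mapping::Resource: required attributes
# become required keyword arguments, optional ones get defaults.
class ResourceSketch
  attr_reader :identifier, :creators, :titles, :publisher, :publication_year, :subjects

  def initialize(identifier:, creators:, titles:, publisher:, publication_year:, subjects: [])
    @identifier = identifier
    @creators = creators
    @titles = titles
    @publisher = publisher
    @publication_year = publication_year
    @subjects = subjects
  end
end

begin
  ResourceSketch.new(
    identifier: '10.5555/12345678',
    creators: ['Josiah Carberry'],
    titles: ['Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory'],
    publication_year: 2008
  )
rescue ArgumentError => e
  puts e.message # reports the missing publisher keyword
end
```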

To create XML from a Resource object, use Resource.write_xml, Resource.save_to_file, or Resource.save_to_xml, depending on the destination:

XML destination    Method to use
XML string         Resource.write_xml
file path          Resource.save_to_file
REXML::Element     Resource.save_to_xml
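The three destinations are related: an REXML::Element can be serialized to a string, and a string can be written to a file. The stdlib-only sketch below walks through those steps by hand on a small hand-built element that stands in for the REXML::Element row of the table; it illustrates general REXML usage and is an assumption, not the gem's own `write_xml` or `save_to_file` code.

```ruby
require 'rexml/document'
require 'tmpdir'

# Hand-built element standing in for the REXML::Element destination.
element = REXML::Element.new('resource')
element.add_namespace('http://datacite.org/schema/kernel-4') # default xmlns
element.add_element('publisher').text = 'Journal of Psychoceramics'

# Element -> string, indented, with short text content kept on one line.
xml = +''
formatter = REXML::Formatters::Pretty.new(2)
formatter.compact = true
formatter.write(element, xml)

# String -> file.
path = File.join(Dir.tmpdir, 'datacite-sketch.xml')
File.write(path, xml)
puts File.read(path)
```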

Example:

resource.write_xml
#  => "<resource xsi:schemaLocation='http://datacite.org/schema/kernel-4 …

Namespace prefix

To set a prefix for the Datacite namespace, use Resource.namespace_prefix=:

resource.namespace_prefix = 'dcs'
resource.write_xml
#  => "<dcs:resource xmlns:dcs='http://datacite.org/schema/kernel-4' …
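In XML terms, setting a prefix moves the DataCite namespace from the default `xmlns` declaration onto a named prefix, which then qualifies every element. The REXML-only sketch below builds both forms by hand for comparison; it illustrates the shape of the output and is not how the gem itself applies `namespace_prefix=`.

```ruby
require 'rexml/document'

DATACITE_4_NS = 'http://datacite.org/schema/kernel-4'.freeze

# Default namespace: element names are unprefixed.
plain = REXML::Element.new('resource')
plain.add_namespace(DATACITE_4_NS)
plain.add_element('publisher').text = 'Journal of Psychoceramics'

# Prefixed namespace: the same element names, qualified with "dcs:".
prefixed = REXML::Element.new('dcs:resource')
prefixed.add_namespace('dcs', DATACITE_4_NS)
prefixed.add_element('dcs:publisher').text = 'Journal of Psychoceramics'

puts plain.to_s
puts prefixed.to_s
```

Both documents put `publisher` in the same namespace; only the serialization differs.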

Datacite 3 compatibility

In general, Datacite::Mapping is lax on read: it accepts Datacite 3, Datacite 4, or a mix of the two, and (mostly for historical reasons involving bad data its authors needed to parse) it tolerates some deviations from the schema. By default it writes Datacite 4, but it can write Datacite 3 if you pass an optional argument to any of the writer methods:

resource.write_xml(mapping: :datacite_3) # note schema URL below
# => "<resource xsi:schemaLocation='http://datacite.org/schema/kernel-3
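Because reading accepts either version, it can be useful to check which kernel a document declares before deciding which mapping to write it back with. A hypothetical helper, `datacite_kernel_version`, is sketched below with REXML: it reads the namespace of the root element and extracts the major version from the `kernel-N` suffix seen in the schema URLs above. It is not part of the gem, and a document with no DataCite namespace yields `nil`.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of datacite-mapping): returns the DataCite
# kernel major version declared by the root element's namespace, or nil.
def datacite_kernel_version(xml)
  root = REXML::Document.new(xml).root
  return nil if root.nil?

  # Works for both default and prefixed namespace declarations.
  match = %r{\Ahttps?://datacite\.org/schema/kernel-(\d+)\z}.match(root.namespace)
  match && match[1].to_i
end
```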

When using the :datacite_3 mapping, the Datacite 4 <geoLocationPolygon/> and <fundingReference/> elements, which are not supported in Datacite 3, will be dropped, with a warning. Any <relatedIdentifier/> elements of type IGSN will be converted to Handle identifiers with prefix 10273 (the prefix of the IGSN resolver).
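The two downgrade rules just described can be sketched directly on an REXML tree. The helper below, `downgrade_to_datacite_3`, is hypothetical and is not the gem's implementation; it only illustrates dropping the Datacite 4-only elements with a warning and rewriting IGSN related identifiers as Handles under prefix 10273. It leaves the namespace declaration untouched.

```ruby
require 'rexml/document'

# Hypothetical sketch of the Datacite 3 downgrade rules described above;
# not datacite-mapping's implementation.
UNSUPPORTED_IN_DATACITE_3 = %w[geoLocationPolygon fundingReference].freeze
IGSN_HANDLE_PREFIX = '10273'.freeze

# Collect all descendant elements of root with the given local name.
def elements_named(root, name)
  found = []
  root.each_recursive { |el| found << el if el.name == name }
  found
end

def downgrade_to_datacite_3(doc)
  root = doc.root
  # Drop Datacite 4-only elements, warning about each one.
  UNSUPPORTED_IN_DATACITE_3.each do |name|
    elements_named(root, name).each do |el|
      warn "Dropping <#{name}/>: not supported in Datacite 3"
      el.remove
    end
  end
  # Rewrite IGSN related identifiers as Handles under the IGSN resolver prefix.
  elements_named(root, 'relatedIdentifier').each do |el|
    next unless el.attributes['relatedIdentifierType'] == 'IGSN'

    el.attributes['relatedIdentifierType'] = 'Handle'
    el.text = "#{IGSN_HANDLE_PREFIX}/#{el.text.to_s.strip}"
  end
  doc
end
```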

Contributing

Datacite::Mapping is released under an MIT license. When submitting a pull request, please make sure the Rubocop style checks pass and that the unit tests pass with 100% coverage. You can check these individually with bundle exec rubocop and bundle exec rake coverage, or run both with the default rake task, bundle exec rake.