datacite-mapping
A library for mapping DataCite XML to Ruby objects, based on xml-mapping and xml-mapping_extensions. Full API documentation on RubyDoc.info.
Supports Datacite 4.3; backward-compatible with Datacite 3.1.
Note that although this gem maintains compatibility with multiple versions of DataCite XML, changes to DataCite XML sometimes force changes to the internal object model of the gem, so different versions of this gem may require minor updates to how parts of the model are accessed.
Usage
The core of the Datacite::Mapping library is the Resource class, corresponding to the root <resource/> element in a Datacite document.
Reading
To create a Resource object from XML, use Resource.parse_xml or Resource.load_from_file, depending on the data source:
XML source | Method to use |
---|---|
file path | Resource.load_from_file |
String | Resource.parse_xml |
IO | Resource.parse_xml |
REXML::Document | Resource.parse_xml |
REXML::Element | Resource.parse_xml |
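The table can also be read as a dispatch on the type of the source. The sketch below is a hypothetical helper, not part of datacite-mapping: reader_method_for is an invented name, and it assumes file paths are passed as Pathname objects so they can be told apart from Strings that contain XML. It only returns the name of the method to call and uses nothing beyond the standard library.

```ruby
require 'pathname'
require 'rexml/document'

# Hypothetical helper (not provided by datacite-mapping): returns the name of
# the reader method that the table pairs with each kind of XML source.
# Assumption: a file path arrives as a Pathname, so a String is always XML.
def reader_method_for(source)
  case source
  when Pathname
    :load_from_file
  when String, IO, REXML::Document, REXML::Element
    :parse_xml
  else
    raise ArgumentError, "unsupported XML source: #{source.class}"
  end
end
```

Note that StringIO is not a subclass of IO, so this sketch rejects it; whether the gem itself accepts a StringIO is not covered here.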
Example:
require 'datacite/mapping'
include Datacite::Mapping
resource = Resource.load_from_file('datacite-example-full-v4.3.xml')
# => #<Datacite::Mapping::Resource:0x007f97689e87a0 …
abstract = resource.descriptions.find { |d| d.type == DescriptionType::ABSTRACT }
# => #<Datacite::Mapping::Description:0x007f976aafa330 …
abstract.value
# => "XML example of all DataCite Metadata Schema v4.3 properties."
Note that Datacite::Mapping uses the TypesafeEnum gem to represent controlled vocabularies such as ResourceTypeGeneral and DescriptionType.
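Because controlled-vocabulary values are constants, finding an element by type comes down to an equality test in the block. The sketch below uses stand-in types so that it runs without the gem: DescriptionSketch, DescriptionTypeSketch and find_description are invented for illustration and are not the gem's classes. It shows why the comparison must use == (a single = would assign the type and make the block truthy for the first element, whatever its type).

```ruby
# Stand-in types for illustration only; the real Description and
# DescriptionType classes come from datacite-mapping.
DescriptionSketch = Struct.new(:type, :value)

module DescriptionTypeSketch
  ABSTRACT = :abstract
  METHODS  = :methods
end

# Returns the first description whose type equals the requested one,
# or nil when there is no match.
def find_description(descriptions, type)
  descriptions.find { |d| d.type == type }
end
```

For example, given a list holding a METHODS description followed by an ABSTRACT one, find_description(list, DescriptionTypeSketch::ABSTRACT) returns the second element rather than the first.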
Writing
In general, a Resource object must be provided with all required attributes on initialization.
resource = Resource.new(
  identifier: Identifier.new(value: '10.5555/12345678'),
  creators: [
    Creator.new(
      name: 'Josiah Carberry',
      identifier: NameIdentifier.new(
        scheme: 'ORCID',
        scheme_uri: URI('http://orcid.org/'),
        value: '0000-0002-1825-0097'
      ),
      affiliations: [
        'Department of Psychoceramics, Brown University'
      ]
    )
  ],
  titles: [
    Title.new(value: 'Toward a Unified Theory of High-Energy Metaphysics: Silly String Theory')
  ],
  publisher: 'Journal of Psychoceramics',
  publication_year: 2008
)
# => #<Datacite::Mapping::Resource:0x007f9768958fb0 …
To create XML from a Resource object, use Resource.write_xml, Resource.save_to_file, or Resource.save_to_xml, depending on the destination:
XML destination | Method to use |
---|---|
XML string | Resource.write_xml |
file path | Resource.save_to_file |
REXML::Element | Resource.save_to_xml |
Example:
resource.write_xml
# => "<resource xsi:schemaLocation='http://datacite.org/schema/kernel-4 …
Namespace prefix
To set a prefix for the Datacite namespace, use Resource.namespace_prefix=:
resource.namespace_prefix = 'dcs'
resource.write_xml
# => "<dcs:resource xmlns:dcs='http://datacite.org/schema/kernel-4' …
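One way to confirm that a prefix took effect is to re-parse the output with REXML and inspect the root element. The sketch below is a minimal check under stated assumptions: root_namespace_info is a hypothetical helper, and SAMPLE_XML is a hand-written stand-in for the truncated output shown above, not real output from the gem. With the gem loaded, the string would come from resource.write_xml instead.

```ruby
require 'rexml/document'

# Hand-written stand-in for the truncated write_xml output shown above.
SAMPLE_XML = "<dcs:resource xmlns:dcs='http://datacite.org/schema/kernel-4'/>"

# Hypothetical helper: parses an XML string and reports how the root
# element is named and which namespace its prefix resolves to.
def root_namespace_info(xml)
  root = REXML::Document.new(xml).root
  {
    prefix: root.prefix,      # namespace prefix on the root element
    uri: root.namespace,      # URI that the prefix is bound to
    local_name: root.name     # element name without the prefix
  }
end
```

Calling root_namespace_info(SAMPLE_XML) gives the prefix, the namespace URI and the local element name as a Hash, which is easy to assert on in a spec.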
Datacite 3 compatibility
In general, Datacite::Mapping is lax on read, accepting either Datacite 3 or Datacite 4 or a mix, and (mostly for historical reasons involving bad data its authors needed to parse) allowing some deviations from the schema. By default, it writes Datacite 4, but can write Datacite 3 by passing an optional argument to any of the writer methods:
resource.write_xml(mapping: :datacite_3) # note schema URL below
# => "<resource xsi:schemaLocation='http://datacite.org/schema/kernel-3
When using the :datacite_3 mapping, the Datacite 4 <geoLocationPolygon/> and <fundingReference/> elements, which are not supported in Datacite 3, will be dropped, with a warning. Any <relatedIdentifier/> elements of type IGSN will be converted to Handle identifiers with prefix 10273 (the prefix of the IGSN resolver).
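To make the IGSN conversion concrete, the sketch below builds a Handle from an IGSN using the usual prefix/suffix Handle syntax. It is an illustration only, not the gem's code: igsn_to_handle and IGSN_HANDLE_PREFIX are hypothetical names, the sample IGSN is invented, and the exact value the gem writes is an assumption.

```ruby
# Prefix of the IGSN resolver, as stated above.
IGSN_HANDLE_PREFIX = '10273'

# Hypothetical helper (not the gem's implementation): combines the IGSN
# resolver prefix and an IGSN into a Handle of the form "prefix/suffix".
def igsn_to_handle(igsn)
  suffix = igsn.to_s.strip
  raise ArgumentError, 'IGSN must not be empty' if suffix.empty?

  "#{IGSN_HANDLE_PREFIX}/#{suffix}"
end
```

Under these assumptions, an IGSN such as 'ABC000XYZ' would be written as the Handle '10273/ABC000XYZ'.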
Contributing
Datacite::Mapping is released under an MIT license. When submitting a pull request, please make sure the RuboCop style checks pass and the unit tests pass with 100% coverage. You can check these individually with bundle exec rubocop and bundle exec rake coverage, or run the default rake task, which includes both, with bundle exec rake.