A Ruby gem for working with the ResourceSync web synchronization framework.
It consists of the following:
- Classes corresponding to the major document types defined in the ResourceSync specification, such as Resource Lists, Change Lists, Source Descriptions, and so on. Each of these classes has a `load_from_xml` method that can parse the corresponding XML document (as an `REXML::Element`), and a `save_to_xml` method that can serialize an instance of that class to XML (as an `REXML::Element`).
- Classes for the major sub-structures of those documents, such as the `<url>` and `<sitemap>` tags (subsumed under the `Resource` class) defined by the Sitemap specification, as well as the ResourceSync-specific `<rs:ln>` and `<rs:md>` tags (the `Link` and `Metadata` classes, respectively).
- An `XMLParser` class that can take a ResourceSync-augmented Sitemap document (in the form of an `REXML::Document`, a string, an `IO`, or something sufficiently `IO`-like that `REXML::Document` can parse it) and produce an instance of the appropriate class based on the `capability` attribute in the root element’s metadata.
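The capability-based dispatch described in the last item can be sketched with nothing but REXML. This is a minimal illustration, not `resync`’s actual implementation. The helper names `root_capability` and `class_name_for`, the partial `CAPABILITY_CLASSES` table, and the choice to return class *names* rather than parsed objects are all hypothetical.

```ruby
require 'rexml/document'

# Namespace of the ResourceSync-specific <rs:md> and <rs:ln> elements.
RS_NAMESPACE = 'http://www.openarchives.org/rs/terms/'

# Hypothetical, partial mapping from capability values to class names.
CAPABILITY_CLASSES = {
  'resourcelist' => 'ResourceList',
  'changelist' => 'ChangeList',
  'capabilitylist' => 'CapabilityList',
  'description' => 'SourceDescription'
}.freeze

# Returns the capability declared by the root element's <rs:md> child, or nil.
# Accepts an REXML::Document or anything REXML::Document.new can parse.
def root_capability(xml)
  doc = xml.is_a?(REXML::Document) ? xml : REXML::Document.new(xml)
  md = doc.root.elements.to_a.find do |el|
    # Match on local name plus namespace URI, not on the literal "rs" prefix.
    el.name == 'md' && el.namespace == RS_NAMESPACE
  end
  md && md.attributes['capability']
end

# Picks a (hypothetical) class name for the document based on its capability.
def class_name_for(xml)
  capability = root_capability(xml)
  CAPABILITY_CLASSES.fetch(capability) do
    raise ArgumentError, "unknown capability: #{capability.inspect}"
  end
end
```

Matching on the namespace URI rather than the `rs:` prefix keeps the lookup working if a document binds the ResourceSync namespace to a different prefix.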
Parsing a ResourceSync document
```ruby
require 'resync'

data = File.read('my-capability-list.xml')
capability_list = Resync::XMLParser.parse(data)
```
Writing a ResourceSync document
```ruby
require 'resync'

change_list = Resync::ChangeList.new(
  links: [
    Resync::Link.new(rel: 'up', href: 'http://example.com/my-dataset/my-capability-list.xml')
  ],
  metadata: Resync::Metadata.new(
    capability: 'changelist',
    from_time: Time.utc(2013, 1, 3)
  ),
  resources: [
    # generate list of changes here
  ]
)

xml = change_list.save_to_xml
formatter = REXML::Formatters::Pretty.new
formatter.write(xml, $stdout)
```
resync-client, a Ruby client library for ResourceSync.
This is a work in progress. Bug reports and feature requests are welcome (particularly on the document creation side, which our use cases haven’t really explored).
Structural inconvenience and unnecessary repetition
There are certain well-specified relationships between elements: most document types should always have a link with an `up` relationship, many resources should have metadata with a defined `capability` attribute, and so on. In some cases there are convenience getters for these attributes on the ‘parent’ object (e.g., you can ask a document for its `capability` directly without violating the Law of Demeter), but there generally aren’t corresponding convenience setters or convenience initializer parameters.
Document types (`ResourceList`, etc.) will create a `Metadata` with the appropriate capability for themselves if none is specified, but if they’re initialized with one that doesn’t declare a capability, they’ll raise an exception rather than fill it in (just as they’ll raise an exception if the wrong capability is specified).
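The initializer behavior and convenience getter described above can be illustrated with a self-contained sketch. It uses hypothetical stand-ins (a `Document` base class and a `Struct`-based `Metadata`), not the gem’s own classes. A document fills in default metadata with its own capability, rejects metadata that declares no capability or the wrong one, and exposes `capability` so callers need not reach through `metadata`.

```ruby
# Hypothetical stand-in for the gem's Metadata class; only capability is modeled.
Metadata = Struct.new(:capability, keyword_init: true)

# Hypothetical base class; each subclass declares its own CAPABILITY constant.
class Document
  attr_reader :metadata

  def initialize(metadata: nil)
    # No metadata supplied: create one carrying this document type's capability.
    @metadata = metadata || Metadata.new(capability: self.class::CAPABILITY)
    declared = @metadata.capability
    # Metadata without a capability is rejected rather than filled in...
    raise ArgumentError, 'metadata does not declare a capability' if declared.nil?
    # ...and so is metadata declaring some other document type's capability.
    return if declared == self.class::CAPABILITY

    raise ArgumentError,
          "expected capability #{self.class::CAPABILITY.inspect}, got #{declared.inspect}"
  end

  # Convenience getter: ask the document, not its metadata (Law of Demeter).
  def capability
    metadata.capability
  end
end

class ResourceList < Document
  CAPABILITY = 'resourcelist'
end

class ChangeList < Document
  CAPABILITY = 'changelist'
end
```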
Logical relationships between elements
A `ChangeList` should contain only resources with `Metadata` declaring a `change` type. The resources in a `ResourceDumpManifest` should each declare a `path` indicating their location in the ZIP file.
`resync` doesn’t currently do anything to enforce, validate, or assist in compliance with these and similar restrictions.
(An exception: document types will complain if initialized with `Metadata` having the wrong capability.)
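Since `resync` leaves these rules unchecked, a caller who needs them can check the two mentioned above separately. In the sketch below, the `Metadata`/`Resource` structs and the `logical_problems` helper are hypothetical. The capability strings and the change types (`created`, `updated`, `deleted`) come from the ResourceSync specification, not from the gem.

```ruby
# Hypothetical, minimal stand-ins for resources and their metadata.
Metadata = Struct.new(:capability, :change, :path, keyword_init: true)
Resource = Struct.new(:uri, :metadata, keyword_init: true)

# Change types defined by the ResourceSync specification.
CHANGE_TYPES = %w[created updated deleted].freeze

# Returns human-readable problems; an empty array means neither of the two
# rules (change type in change lists, path in dump manifests) was violated.
def logical_problems(capability, resources)
  resources.each_with_object([]) do |res, problems|
    md = res.metadata
    case capability
    when 'changelist'
      unless md && CHANGE_TYPES.include?(md.change)
        problems << "#{res.uri}: change list entries need a change type"
      end
    when 'resourcedump-manifest'
      problems << "#{res.uri}: dump manifest entries need a path" unless md && md.path
    end
  end
end
```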
Time attribute requirements
The required/forbidden time attributes defined in Appendix A, “Time Attribute Requirements”, of the ResourceSync specification are not enforced; it’s possible to create, e.g., a `ResourceList` with a `from_time` on its metadata, or a `ChangeList` with members whose metadata does not declare a `modified_time`, even though both scenarios are forbidden by the specification.
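A caller can check the two time rules named above outside the gem. In this sketch the structs and `time_problems` are hypothetical, and the field names only mirror the prose above (`from_time`, `modified_time`). It covers just these two cases, not the full Appendix A table.

```ruby
# Hypothetical stand-ins; field names mirror the prose, not necessarily the gem.
Metadata = Struct.new(:from_time, :modified_time, keyword_init: true)
Member = Struct.new(:uri, :metadata, keyword_init: true)

# Returns a list of violations of the two time rules described above.
def time_problems(capability, doc_metadata, members)
  problems = []
  # A Resource List's own metadata must not carry a from_time.
  if capability == 'resourcelist' && doc_metadata&.from_time
    problems << 'resource list metadata must not declare from_time'
  end
  # Every Change List member must declare a modified_time.
  if capability == 'changelist'
    members.each do |member|
      next if member.metadata&.modified_time

      problems << "#{member.uri}: change list member must declare modified_time"
    end
  end
  problems
end
```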
Value restrictions from XML schemata
The ResourceSync and Sitemap schemata define restrictions on the values of several attributes:
- `path` values must start with a slash and must not end with a slash
- Link priorities (the `pri` attribute on `<rs:ln>`) must be positive and < 1,000,000
- Link relation types must conform with RFC 5988
- URIs must be between 12 and 2048 characters long
- Sitemap priorities (`<priority>`) must be in the range 0.0-1.0 (inclusive)
None of these restrictions are currently enforced by `resync`.
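The restrictions listed above could be checked with small predicates like the ones below. This is a sketch, not part of `resync`. The `ValueChecks` module and its method names are hypothetical, and the relation-type check is a simplified reading of RFC 5988’s grammar: a registered name (lowercase letter followed by lowercase letters, digits, `.` or `-`) or an absolute URI.

```ruby
require 'uri'

# Hypothetical validators for the schema restrictions listed above;
# each returns true when the value is acceptable.
module ValueChecks
  module_function

  # Path: must start with '/' and must not end with '/'.
  def valid_path?(path)
    path.start_with?('/') && !path.end_with?('/')
  end

  # Link priority: a positive integer below 1,000,000.
  def valid_link_priority?(pri)
    pri.is_a?(Integer) && pri.positive? && pri < 1_000_000
  end

  # Link relation: a registered RFC 5988 name or an absolute URI.
  def valid_rel?(rel)
    return true if rel.match?(/\A[a-z][a-z0-9.\-]*\z/)

    URI.parse(rel).absolute?
  rescue URI::InvalidURIError
    false
  end

  # URI length: between 12 and 2048 characters, inclusive.
  def valid_uri_length?(uri)
    uri.length.between?(12, 2048)
  end

  # Sitemap priority: 0.0 to 1.0, inclusive.
  def valid_priority?(priority)
    priority.between?(0.0, 1.0)
  end
end
```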
When reading a ResourceSync document from XML and writing it back out, `<rs:ln>` elements will always appear before `<rs:md>` elements, regardless of their order in the original source.
The `XML::Mapping` library that `resync` uses doesn’t support namespaces, so namespace handling in `resync` is a bit hacky. In particular, you may see strange behavior when using `<sitemap>` tags outside the context of a