Preservation
Extraction from the Pure Research Information System and transformation for loading by Archivematica.
Includes transfer preparation, reporting and disk space management.
Installation
Add this line to your application's Gemfile:
gem 'preservation'
And then execute:
$ bundle
Or install it yourself as:
$ gem install preservation
Usage
Configuration
Configure Preservation. If log_path is omitted, the standard library logger writes to STDOUT.
Preservation.configure do |config|
  config.db_path     = ENV['ARCHIVEMATICA_DB_PATH']
  config.ingest_path = ENV['ARCHIVEMATICA_INGEST_PATH']
  config.log_path    = ENV['PRESERVATION_LOG_PATH']
end
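Because the configuration above reads its paths from environment variables, it can help to check that they are set before configuring. The following is a minimal sketch using only core Ruby. The helper `missing_env`, the constant `REQUIRED_KEYS` and the sample path are hypothetical and not part of the gem. The log path is left out of the check because it is optional.

```ruby
# Hypothetical helper, not part of the preservation gem: return the keys
# that are unset or blank in the given environment-like hash.
def missing_env(keys, env = ENV)
  keys.select { |key| env[key].nil? || env[key].strip.empty? }
end

# Only the two paths without a documented fallback are treated as required.
REQUIRED_KEYS = %w[ARCHIVEMATICA_DB_PATH ARCHIVEMATICA_INGEST_PATH].freeze

# A plain hash stands in for ENV here; the path is a placeholder.
sample_env = { 'ARCHIVEMATICA_DB_PATH' => '/tmp/example.db' }
missing = missing_env(REQUIRED_KEYS, sample_env)
puts "Missing: #{missing.join(', ')}" unless missing.empty?
```

In a real script, calling `missing_env(REQUIRED_KEYS)` with no second argument checks the actual environment.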
Create a hash for passing to a transfer.
# Pure host with authentication.
config = {
  url:      ENV['PURE_URL'],
  username: ENV['PURE_USERNAME'],
  password: ENV['PURE_PASSWORD']
}
# Pure host without authentication.
config = {
  url: ENV['PURE_URL']
}
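The two variants above can be folded into one helper that adds credentials only when both are available. This is a sketch only: `pure_config` is a hypothetical name, and the URL in the example call is a placeholder.

```ruby
# Hypothetical helper, not part of the preservation gem: build the hash
# for a transfer, including credentials only when both are present.
def pure_config(env = ENV)
  config = { url: env['PURE_URL'] }
  username = env['PURE_USERNAME']
  password = env['PURE_PASSWORD']
  if username && password
    config[:username] = username
    config[:password] = password
  end
  config
end

# A plain hash stands in for ENV; the URL is a placeholder.
p pure_config({ 'PURE_URL' => 'https://pure.example.org/ws/rest' })
```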
Transfer
Configure a transfer to retrieve data from a Pure host.
transfer = Preservation::Transfer::Dataset.new config
Single
If necessary, fetch the metadata, prepare a directory in the ingest path and populate it with the files and JSON description file.
transfer.prepare uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
Batch
For multiple Pure datasets, if necessary, fetch the metadata, prepare a directory in the ingest path and populate it with the files and JSON description file.
In this example, a maximum of 10 datasets will be prepared using the doi_short directory naming scheme. Each dataset is prepared only if 20 days have elapsed since its metadata record was last modified.
transfer.prepare_batch max: 10,
                       dir_scheme: :doi_short,
                       delay: 20
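To make the delay rule concrete, here is a small sketch of the comparison it describes. It is an illustration under stated assumptions, not the gem's implementation: `delay_elapsed?` and `SECONDS_PER_DAY` are hypothetical names, and a day is assumed to mean 24 hours.

```ruby
require 'time'

SECONDS_PER_DAY = 86_400

# Hypothetical check: true when at least `delay` days (24-hour periods)
# have passed between `modified` and `now`.
def delay_elapsed?(modified, delay, now = Time.now)
  now - modified >= delay * SECONDS_PER_DAY
end

modified = Time.parse('2016-09-01 12:00:00 UTC')
puts delay_elapsed?(modified, 20, Time.parse('2016-09-15 12:00:00 UTC')) # 14 days later
puts delay_elapsed?(modified, 20, Time.parse('2016-09-21 12:00:00 UTC')) # 20 days later
```

The first call prints false and the second prints true, since exactly 20 days counts as elapsed in this sketch.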
Directory name
The following are permitted values for the dir_scheme parameter:
:uuid_title
:title_uuid
:date_uuid_title
:date_title_uuid
:date_time_uuid
:date_time_title
:date_time_uuid_title
:date_time_title_uuid
:uuid
:doi
:doi_short
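A calling script may want to reject an unknown scheme before starting a batch. The sketch below copies the values from the list above into a constant and checks membership. `DIR_SCHEMES` and `check_dir_scheme` are hypothetical names, and how the gem itself handles an invalid value is not covered here.

```ruby
# Values copied from the list of permitted dir_scheme values.
DIR_SCHEMES = %i[
  uuid_title title_uuid date_uuid_title date_title_uuid
  date_time_uuid date_time_title date_time_uuid_title date_time_title_uuid
  uuid doi doi_short
].freeze

# Hypothetical guard, not part of the preservation gem: return the scheme
# if it is permitted, otherwise raise ArgumentError.
def check_dir_scheme(scheme)
  return scheme if DIR_SCHEMES.include?(scheme)

  raise ArgumentError, "Unknown dir_scheme: #{scheme.inspect}"
end

puts check_dir_scheme(:doi_short)
```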
Load directory
A transfer-ready directory is named according to the directory scheme specified, in this case doi_short. The dataset in this example contains only one file, Ebola_data_Jun15.zip.
.
├── 10.17635-lancaster-researchdata-6
│   ├── Ebola_data_Jun15.zip
│   └── metadata
│       └── metadata.json
metadata.json:
[
{
"filename": "objects/Ebola_data_Jun15.zip",
"dc.title": "Ebolavirus evolution 2013-2015",
"dc.description": "Data used for analysis of selection and evolutionary rate in Zaire Ebolavirus variant Makona",
"dcterms.created": "2015-06-04",
"dcterms.available": "2015-06-04",
"dc.publisher": "Lancaster University",
"dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6",
"dcterms.spatial": [
"Guinea, Sierra Leone, Liberia"
],
"dc.creator": [
"Gatherer, Derek"
],
"dc.contributor": [
"Robertson, David",
"Lovell, Simon"
],
"dc.subject": [
"Ebolavirus",
"evolution",
"phylogenetics",
"virulence",
"Filoviridae",
"positive selection"
],
"dcterms.license": "CC BY",
"dc.relation": [
"http://dx.doi.org/10.1136/ebmed-2014-110127",
"http://dx.doi.org/10.1099/vir.0.067199-0"
]
}
]
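The description file is plain JSON, so it can be inspected with the standard library. The sketch below parses a shortened copy of the example and derives a directory name from dc.identifier that matches the one shown in the example directory listing. The helper `doi_dir_name` is hypothetical and its rule (strip the resolver prefix, replace slashes with hyphens) is inferred from that single example, so it may differ from what the gem actually does.

```ruby
require 'json'

# Hypothetical helper, inferred from the example directory name only:
# drop the DOI resolver prefix and replace slashes with hyphens.
def doi_dir_name(identifier)
  identifier.sub(%r{\Ahttps?://(dx\.)?doi\.org/}, '').tr('/', '-')
end

# Shortened copy of the example metadata.json.
metadata_json = <<~JSON
  [
    {
      "filename": "objects/Ebola_data_Jun15.zip",
      "dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6"
    }
  ]
JSON

record = JSON.parse(metadata_json).first
puts record['filename']
puts doi_dir_name(record['dc.identifier'])
```

The second line printed is 10.17635-lancaster-researchdata-6, the directory name used in the example.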
Storage
Free up disk space used by completed transfers. This can be done at any time.
Preservation::Storage.cleanup
Report
The exception report can be used for scheduled monitoring of transfers.
Preservation::Report::Transfer.exception
Formatted as JSON:
{
"pending": {
"count": 3,
"data": [
{
"path": "10.17635-lancaster-researchdata-72",
"path_timestamp": "2016-09-29 12:08:58 +0100"
},
{
"path": "10.17635-lancaster-researchdata-74",
"path_timestamp": "2016-09-29 12:08:59 +0100"
},
{
"path": "10.17635-lancaster-researchdata-75",
"path_timestamp": "2016-09-29 12:09:00 +0100"
}
]
},
"current": {
"path": "10.17635-lancaster-researchdata-90",
"unit_type": "ingest",
"status": "PROCESSING",
"current": 1,
"id": 91,
"uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
"path_timestamp": "2016-09-28 17:09:33 +0100"
},
"failed": {
"count": 0
},
"incomplete": {
"count": 1,
"data": [
{
"path": "10.17635-lancaster-researchdata-90",
"unit_type": "ingest",
"status": "PROCESSING",
"current": 1,
"id": 91,
"uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
"path_timestamp": "2016-09-28 17:09:33 +0100"
}
]
},
"complete": {
"count": 78
}
}
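For scheduled monitoring, a script might reduce a report like the one above to a short list of messages. The sketch below works on a hash parsed from JSON text with the same keys as the example. `report_warnings` is a hypothetical helper, and whether the gem returns a string or a hash is not assumed here: the sample is built by hand.

```ruby
require 'json'

# Hypothetical helper, not part of the preservation gem: collect messages
# from a report hash shaped like the example (string keys).
def report_warnings(report)
  warnings = []
  failed = report.dig('failed', 'count').to_i
  pending = report.dig('pending', 'count').to_i
  warnings << "#{failed} failed transfer(s)" if failed > 0
  warnings << "#{pending} pending transfer(s)" if pending > 0
  warnings
end

# Hand-written sample using counts from the example report.
sample = '{"pending": {"count": 3}, "failed": {"count": 0}, "complete": {"count": 78}}'
puts report_warnings(JSON.parse(sample))
```

Missing sections are treated as a count of zero, so a partial report does not raise an error.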