Class: ROCrate::Reader

Inherits:
Object
Defined in:
lib/ro_crate/reader.rb

Overview

A class to handle reading of RO-Crates from Zip files or directories.

Class Method Details

.build_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate

Create and populate a crate from the given set of entities.

Parameters:

  • entity_hash (Hash{String => Hash})

    A Hash containing all the entities in the @graph, mapped by their @id.

  • source (String, ::File, Pathname)

    The location of the RO-Crate being read.

  • crate_class (Class) (defaults to: ROCrate::Crate)

    The class to use to instantiate the crate; useful if you have created a subclass of ROCrate::Crate that you want to use.

  • context (nil, String, Array, Hash)

    A custom JSON-LD @context (parsed), or nil to use the default.

Returns:

  • (Crate)

    The RO-Crate.



# File 'lib/ro_crate/reader.rb', line 160

def self.build_crate(entity_hash, source, crate_class: ROCrate::Crate, context:)
  crate = initialize_crate(entity_hash, source, crate_class: crate_class, context: context)

  extract_data_entities(crate, source, entity_hash).each do |entity|
    crate.add_data_entity(entity)
  end

  # The remaining entities in the hash must be contextual.
  extract_contextual_entities(crate, entity_hash).each do |entity|
    crate.add_contextual_entity(entity)
  end

  crate
end

.create_data_entity(crate, entity_class, source, entity_props) ⇒ ROCrate::File, ...

Create a DataEntity of the given class.

Parameters:

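  • crate (Crate)

    The RO-Crate being read.

  • entity_class (Class)

    The DataEntity class to instantiate, e.g. ROCrate::File or ROCrate::Directory.

  • source (String, ::File, Pathname)

    The location of the RO-Crate being read.

  • entity_props (Hash)

    A Hash containing the entity’s properties, including its @id. The @id key is removed from this Hash.

Returns:

  • (ROCrate::File, ...)

    The new data entity.

Raises:

  • (ROCrate::ReadException)

    If the entity has no @id, or if a local @id (neither an absolute URI nor a #fragment) does not exist in the crate, either as written or percent-decoded.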



261
# File 'lib/ro_crate/reader.rb', line 241

def self.create_data_entity(crate, entity_class, source, entity_props)
  id = entity_props.delete('@id')
  raise ROCrate::ReadException, "Data Entity missing '@id': #{entity_props.inspect}" unless id
  decoded_id = URI.decode_www_form_component(id)
  path = nil
  uri = URI(id) rescue nil
  if uri&.absolute?
    path = uri
    decoded_id = nil
  elsif !id.start_with?('#')
    [id, decoded_id].each do |i|
      fullpath = ::File.join(source, i)
      path = Pathname.new(fullpath) if ::File.exist?(fullpath)
    end
    if path.nil?
      raise ROCrate::ReadException, "Local Data Entity not found in crate: #{id}"
    end
  end

  entity_class.new(crate, path, decoded_id, entity_props)
end
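The `@id` handling above distinguishes three cases: absolute URIs become remote entities, `#fragment` ids get no path, and anything else must exist under `source`, either as written or percent-decoded. The stdlib-only sketch below reproduces that decision outside the gem. `resolve_entity_id` is a hypothetical name, and it raises `ArgumentError` where the reader raises `ROCrate::ReadException`.

```ruby
require 'uri'
require 'pathname'
require 'tmpdir'

# Hypothetical helper mirroring the @id checks in create_data_entity.
# Returns [:remote, URI], [:fragment, nil] or [:local, Pathname].
def resolve_entity_id(id, source)
  uri = URI(id) rescue nil # ids that are not valid URIs are treated as paths
  return [:remote, uri] if uri&.absolute?
  return [:fragment, nil] if id.start_with?('#')

  candidates = [id, URI.decode_www_form_component(id)].map { |i| File.join(source, i) }
  # As in the reader, the decoded form wins if both variants exist on disk.
  match = candidates.select { |p| File.exist?(p) }.last
  raise ArgumentError, "Local Data Entity not found in crate: #{id}" unless match

  [:local, Pathname.new(match)]
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'my file.txt'), 'hello')
  p resolve_entity_id('https://example.org/data.csv', dir).first # :remote
  p resolve_entity_id('#alice', dir)                             # [:fragment, nil]
  p resolve_entity_id('my%20file.txt', dir).first                # :local
end
```

Trying the decoded form as well lets an `@id` such as `my%20file.txt` match a file whose name contains a literal space.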

.detect_root_directory(source) ⇒ Pathname?

Finds an RO-Crate’s root directory (where `ro-crate-metadata.json`, or the legacy ROCrate::Metadata::IDENTIFIER_1_0 file, is located) by searching a given directory recursively.

Parameters:

  • source (String, ::File, Pathname)

    The location of the directory.

Returns:

  • (Pathname, nil)

    The path to the root, or nil if not found.



# File 'lib/ro_crate/reader.rb', line 308

def self.detect_root_directory(source)
  Pathname(source).find do |entry|
    if entry.file?
      name = entry.basename.to_s
      if name == ROCrate::Metadata::IDENTIFIER || name == ROCrate::Metadata::IDENTIFIER_1_0
        return entry.parent
      end
    end
  end

  nil
end
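Because the search uses `Pathname#find`, a crate can be nested at any depth inside the given directory (for example, a zip whose contents sit in a single top-level folder). Below is a minimal stand-alone version using only the standard library. It assumes the two metadata file names are `ro-crate-metadata.json` and `ro-crate-metadata.jsonld`, and `find_crate_root` is a hypothetical name.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Assumed file names: the current metadata file and the RO-Crate 1.0 legacy one.
CRATE_METADATA_NAMES = ['ro-crate-metadata.json', 'ro-crate-metadata.jsonld'].freeze

# Hypothetical stand-alone version of detect_root_directory: walk the tree
# (Pathname#find) and return the directory holding the first metadata file.
def find_crate_root(source)
  Pathname(source).find do |entry|
    return entry.parent if entry.file? && CRATE_METADATA_NAMES.include?(entry.basename.to_s)
  end
  nil
end

Dir.mktmpdir do |dir|
  nested = File.join(dir, 'my-crate')
  FileUtils.mkdir_p(nested)
  File.write(File.join(nested, 'ro-crate-metadata.json'), '{}')
  puts find_crate_root(dir) == Pathname(nested) # true
end
```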

.entities_from_metadata(metadata) ⇒ Hash{String => Hash}

Extracts all the entities from the @graph of the RO-Crate Metadata.

Parameters:

  • metadata (Hash)

    A Hash containing the parsed metadata JSON.

Returns:

  • (Hash{String => Hash})

    A Hash of all the entities, mapped by their @id.



# File 'lib/ro_crate/reader.rb', line 128

def self.entities_from_metadata(metadata)
  graph = metadata['@graph']

  if graph
    # Collect all the things in the graph, mapped by their @id
    entities = {}
    graph.each do |entity|
      entities[entity['@id']] = entity
    end

    # Do some normalization...
    entities[ROCrate::Metadata::IDENTIFIER] = extract_metadata_entity(entities)
    raise ROCrate::ReadException, "No metadata entity found in @graph!" unless entities[ROCrate::Metadata::IDENTIFIER]
    entities[ROCrate::Preview::IDENTIFIER] = extract_preview_entity(entities)
    entities[ROCrate::Crate::IDENTIFIER] = extract_root_entity(entities)
    raise ROCrate::ReadException, "No root entity (with @id: #{entities[ROCrate::Metadata::IDENTIFIER].dig('about', '@id')}) found in @graph!" unless entities[ROCrate::Crate::IDENTIFIER]

    entities
  else
    raise ROCrate::ReadException, "No @graph found in metadata!"
  end
end
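The first step, indexing the `@graph` array by `@id`, can be tried on its own with plain JSON. The sketch below is hypothetical (`index_graph` is not part of the gem) and uses a minimal example document. It also shows how the metadata entity's `about` property names the root entity that the normalization step then looks up.

```ruby
require 'json'

# Hypothetical sketch of the first step of entities_from_metadata: index every
# node of @graph by its @id (if an @id repeats, the later node wins).
def index_graph(metadata)
  graph = metadata['@graph']
  raise ArgumentError, 'No @graph found in metadata!' unless graph

  graph.to_h { |entity| [entity['@id'], entity] }
end

metadata = JSON.parse(<<~DOC)
  {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
      { "@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": { "@id": "./" } },
      { "@id": "./", "@type": "Dataset", "name": "Example crate" }
    ]
  }
DOC

entities = index_graph(metadata)
p entities.keys                                          # ["ro-crate-metadata.json", "./"]
p entities['ro-crate-metadata.json'].dig('about', '@id') # "./"
```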

.extract_contextual_entities(crate, entity_hash) ⇒ Array<ContextualEntity>

Create appropriately specialized ContextualEntity objects from the given hash of entities and their properties.

Parameters:

  • crate (Crate)

    The RO-Crate being read.

  • entity_hash (Hash)

    A Hash containing all the entities in the @graph, mapped by their @id.

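Returns:

  • (Array<ContextualEntity>)

    One ContextualEntity (or specialized subclass) per entry in entity_hash.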



# File 'lib/ro_crate/reader.rb', line 222

def self.extract_contextual_entities(crate, entity_hash)
  entities = []

  entity_hash.each do |id, entity_props|
    entity_class = ROCrate::ContextualEntity.specialize(entity_props)
    entity = entity_class.new(crate, id, entity_props)
    entities << entity
  end

  entities
end
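Each remaining entry becomes an object whose class is chosen by `ROCrate::ContextualEntity.specialize`. The sketch below only illustrates the general idea of type-based dispatch, under the assumption that specialization is driven by `@type`. The table, the returned symbols and `contextual_kind` are all hypothetical, not the gem's actual mapping.

```ruby
# Hypothetical dispatch table standing in for ROCrate::ContextualEntity.specialize;
# the gem's real list of specialized types and classes may differ.
SPECIALIZED_CONTEXTUAL_TYPES = { 'Person' => :person, 'Organization' => :organization }.freeze

# Pick a specialization from an entity's @type, which may be a String or an Array.
def contextual_kind(props)
  Array(props['@type']).each do |type|
    return SPECIALIZED_CONTEXTUAL_TYPES[type] if SPECIALIZED_CONTEXTUAL_TYPES.key?(type)
  end
  :generic # fall back to a plain contextual entity
end

p contextual_kind({ '@id' => '#alice', '@type' => 'Person' })                   # :person
p contextual_kind({ '@id' => '#uni', '@type' => ['Thing', 'Organization'] })    # :organization
p contextual_kind({ '@id' => 'https://example.org/place', '@type' => 'Place' }) # :generic
```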

.extract_data_entities(crate, source, entity_hash) ⇒ Array<ROCrate::File, ROCrate::Directory>

Discover data entities from the `hasPart` property of a crate, and create DataEntity objects for them. Entities are looked up in the given `entity_hash` (and then removed from it); references with no matching entry are skipped.

Parameters:

  • crate (Crate)

    The RO-Crate being read.

  • source (String, ::File, Pathname)

    The location of the RO-Crate being read.

  • entity_hash (Hash)

    A Hash containing all the entities in the @graph, mapped by their @id.

Returns:

  • (Array<ROCrate::File, ROCrate::Directory>)

    The data entities that were created.



# File 'lib/ro_crate/reader.rb', line 206

def self.extract_data_entities(crate, source, entity_hash)
  (crate.raw_properties['hasPart'] || []).map do |ref|
    entity_props = entity_hash.delete(ref['@id'])
    next unless entity_props
    entity_class = ROCrate::DataEntity.specialize(entity_props)
    entity = create_data_entity(crate, entity_class, source, entity_props)
    next if entity.nil?
    entity
  end.compact
end
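Because each referenced entry is deleted from `entity_hash`, a data entity can never also be read as a contextual entity. `hasPart` references without a matching entity, including repeated references to the same `@id`, are silently dropped by `compact`. A plain-Hash sketch of that loop follows. `take_data_entities` is a hypothetical name, and it returns the raw property Hashes instead of DataEntity objects.

```ruby
# Hypothetical sketch of extract_data_entities' lookup loop using plain Hashes.
def take_data_entities(has_part, entity_hash)
  (has_part || []).map do |ref|
    entity_hash.delete(ref['@id']) # removing it means it cannot become contextual
  end.compact                      # refs with no (remaining) match are skipped
end

entity_hash = {
  'a.txt' => { '@id' => 'a.txt', '@type' => 'File' },
  '#bob'  => { '@id' => '#bob', '@type' => 'Person' }
}
refs = [{ '@id' => 'a.txt' }, { '@id' => 'missing.txt' }, { '@id' => 'a.txt' }]
taken = take_data_entities(refs, entity_hash)
p taken.map { |e| e['@id'] } # ["a.txt"]
p entity_hash.keys           # ["#bob"]
```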

.extract_metadata_entity(entities) ⇒ nil, Hash{String => Hash}

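Extract the metadata entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/1.1/root-data-entity.html#finding-the-root-data-entity. The first entity whose conformsTo @id starts with ROCrate::Metadata::RO_CRATE_BASE is chosen; failing that, the legacy ROCrate::Metadata::IDENTIFIER and IDENTIFIER_1_0 @ids are tried, with and without a leading ./. The matching entity is removed from the hash.

Returns:

  • (nil, Hash{String => Hash})

    The metadata entity’s properties, or nil if nothing is found.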



# File 'lib/ro_crate/reader.rb', line 269

def self.extract_metadata_entity(entities)
  key = entities.detect do |_, props|
    conforms = props['conformsTo']
    conforms = [conforms] unless conforms.is_a?(Array)
    conforms.compact.any? { |c| c['@id']&.start_with?(ROCrate::Metadata::RO_CRATE_BASE) }
  end&.first

  return entities.delete(key) if key

  # Legacy support
  (entities.delete("./#{ROCrate::Metadata::IDENTIFIER}") ||
      entities.delete(ROCrate::Metadata::IDENTIFIER) ||
      entities.delete("./#{ROCrate::Metadata::IDENTIFIER_1_0}") ||
      entities.delete(ROCrate::Metadata::IDENTIFIER_1_0))
end
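The lookup prefers a spec-conformant `conformsTo` declaration and only then falls back to well-known legacy `@id`s. The sketch below reproduces that order without the gem. It assumes `RO_CRATE_BASE` is `https://w3id.org/ro/crate` and that the identifiers are `ro-crate-metadata.json` and `ro-crate-metadata.jsonld`; `find_metadata_entity` is hypothetical, and its `is_a?(Hash)` guard is an addition not present in the reader.

```ruby
# Assumed value of ROCrate::Metadata::RO_CRATE_BASE.
RO_CRATE_BASE = 'https://w3id.org/ro/crate'
# Assumed legacy @ids, tried in the same order as the reader's fallback.
LEGACY_METADATA_IDS = ['./ro-crate-metadata.json', 'ro-crate-metadata.json',
                       './ro-crate-metadata.jsonld', 'ro-crate-metadata.jsonld'].freeze

# Hypothetical stand-alone version of extract_metadata_entity.
def find_metadata_entity(entities)
  key = entities.detect do |_, props|
    conforms = props['conformsTo']
    conforms = [conforms] unless conforms.is_a?(Array) # a single {"@id": ...} or nil
    conforms.compact.any? { |c| c.is_a?(Hash) && c['@id']&.start_with?(RO_CRATE_BASE) }
  end&.first
  return entities.delete(key) if key

  LEGACY_METADATA_IDS.each do |id|
    found = entities.delete(id)
    return found if found
  end
  nil
end

entities = {
  'ro-crate-metadata.json' => { '@id' => 'ro-crate-metadata.json',
                                'conformsTo' => { '@id' => 'https://w3id.org/ro/crate/1.1' },
                                'about' => { '@id' => './' } },
  './' => { '@id' => './', '@type' => 'Dataset' }
}
p find_metadata_entity(entities).dig('about', '@id') # "./"
p entities.keys                                       # ["./"]
```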

.extract_preview_entity(entities) ⇒ Hash{String => Hash}

Extract the ro-crate-preview entity from the entity hash.

Returns:

  • (Hash{String => Hash})

    The preview entity’s properties, or nil if nothing is found.



# File 'lib/ro_crate/reader.rb', line 288

def self.extract_preview_entity(entities)
  entities.delete("./#{ROCrate::Preview::IDENTIFIER}") || entities.delete(ROCrate::Preview::IDENTIFIER)
end

.extract_root_entity(entities) ⇒ Hash{String => Hash}

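Extract the root entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/1.1/root-data-entity.html#finding-the-root-data-entity. The root is the entity whose @id is referenced by the metadata entity’s about property; it is removed from the hash.

Returns:

  • (Hash{String => Hash})

    The root entity’s properties, or nil if no entity has the referenced @id.

Raises:

  • (ROCrate::ReadException)

    If the metadata entity does not reference any root entity.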



# File 'lib/ro_crate/reader.rb', line 297

def self.extract_root_entity(entities)
  root_id = entities[ROCrate::Metadata::IDENTIFIER].dig('about', '@id')
  raise ROCrate::ReadException, "Metadata entity does not reference any root entity" unless root_id
  entities.delete(root_id)
end
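The root is not required to be `./`; it is whatever the metadata entity's `about` points at. Below is a plain-Hash sketch with the hypothetical name `take_root_entity`. It assumes the metadata entity is stored under `ro-crate-metadata.json`, which is where `entities_from_metadata` re-keys it, and raises `ArgumentError` where the reader raises `ROCrate::ReadException`.

```ruby
# Hypothetical sketch of extract_root_entity on plain Hashes. The metadata entity
# is assumed to be stored under 'ro-crate-metadata.json'.
def take_root_entity(entities, metadata_key: 'ro-crate-metadata.json')
  root_id = entities.dig(metadata_key, 'about', '@id')
  raise ArgumentError, 'Metadata entity does not reference any root entity' unless root_id

  entities.delete(root_id) # nil if no entity has that @id
end

entities = {
  'ro-crate-metadata.json' => { 'about' => { '@id' => 'https://example.org/crate/' } },
  'https://example.org/crate/' => { '@id' => 'https://example.org/crate/', '@type' => 'Dataset' }
}
p take_root_entity(entities)['@type'] # "Dataset"
```

Returning `nil` for a dangling reference, instead of raising, leaves the error message to the caller, which in the reader is `entities_from_metadata`.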

.initialize_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate

Initialize a crate from the given set of entities.

Parameters:

  • entity_hash (Hash{String => Hash})

    A Hash containing all the entities in the @graph, mapped by their @id.

  • source (String, ::File, Pathname)

    The location of the RO-Crate being read.

  • crate_class (Class) (defaults to: ROCrate::Crate)

    The class to use to instantiate the crate; useful if you have created a subclass of ROCrate::Crate that you want to use.

  • context (nil, String, Array, Hash)

    A custom JSON-LD @context (parsed), or nil to use the default.

Returns:

  • (Crate)

    The RO-Crate.



# File 'lib/ro_crate/reader.rb', line 184

def self.initialize_crate(entity_hash, source, crate_class: ROCrate::Crate, context:)
  crate_class.new.tap do |crate|
    crate.properties = entity_hash.delete(ROCrate::Crate::IDENTIFIER)
    crate.metadata.properties = entity_hash.delete(ROCrate::Metadata::IDENTIFIER)
    crate.metadata.context = context
    preview_properties = entity_hash.delete(ROCrate::Preview::IDENTIFIER)
    preview_path = ::File.join(source, ROCrate::Preview::IDENTIFIER)
    preview_path = ::File.exist?(preview_path) ? Pathname.new(preview_path) : nil
    if preview_properties || preview_path
      crate.preview = ROCrate::Preview.new(crate, preview_path, preview_properties || {})
    end
    crate.add_all(source, false)
  end
end
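A preview is attached when either the graph describes one or the preview file exists on disk. When only the file exists, the preview gets empty properties. The hypothetical helper `preview_info` below mirrors just that decision, assuming `ROCrate::Preview::IDENTIFIER` is `ro-crate-preview.html`.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper mirroring initialize_crate's preview check. The file name
# 'ro-crate-preview.html' is assumed to match ROCrate::Preview::IDENTIFIER.
def preview_info(source, preview_properties, name: 'ro-crate-preview.html')
  path = File.join(source, name)
  path = File.exist?(path) ? Pathname.new(path) : nil
  # A preview is attached if the graph describes one OR the file is on disk.
  return nil unless preview_properties || path

  { path: path, properties: preview_properties || {} }
end

Dir.mktmpdir do |dir|
  p preview_info(dir, nil) # nil: no entity and no file
  File.write(File.join(dir, 'ro-crate-preview.html'), '<html></html>')
  p preview_info(dir, nil)[:properties] # {}: file only, so empty properties
end
```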

.read(source, target_dir: Dir.mktmpdir) ⇒ Crate

Reads an RO-Crate from a directory or zip file.

Parameters:

  • source (String, ::File, Pathname, #read)

    The location of the zip or directory, or an IO-like object containing a zip.

  • target_dir (String, ::File, Pathname) (defaults to: Dir.mktmpdir)

    The target directory where the crate should be unzipped (if it's a Zip file).

Returns:

  • (Crate)

    The RO-Crate.



# File 'lib/ro_crate/reader.rb', line 11

def self.read(source, target_dir: Dir.mktmpdir)
  begin
    is_dir = ::File.directory?(source)
  rescue TypeError
    is_dir = false
  end

  if is_dir
    read_directory(source)
  else
    read_zip(source, target_dir: target_dir)
  end
end
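The `rescue TypeError` exists because `::File.directory?` cannot convert an in-memory IO such as a `StringIO` into a path. Such sources are therefore sent to `read_zip`, as are zip file paths. The sketch below isolates that dispatch and returns the name of the method that would be called. `read_strategy` is a hypothetical name.

```ruby
require 'stringio'

# Hypothetical helper reproducing read's dispatch: directories go to
# read_directory; anything else (zip paths, IO objects) goes to read_zip.
def read_strategy(source)
  is_dir = begin
    File.directory?(source)
  rescue TypeError # e.g. a StringIO cannot be converted to a path
    false
  end
  is_dir ? :read_directory : :read_zip
end

p read_strategy(Dir.pwd)              # :read_directory
p read_strategy('crate.zip')          # :read_zip
p read_strategy(StringIO.new('PK'))   # :read_zip
```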

.read_directory(source) ⇒ Crate

Reads an RO-Crate from a directory.

Parameters:

  • source (String, ::File, Pathname)

    The location of the directory.

Returns:

  • (Crate)

    The RO-Crate.

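Raises:

  • (ROCrate::ReadException)

    If source is not a directory, no metadata file is found in it, or the metadata cannot be parsed or lacks a @graph, metadata entity or root entity.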



# File 'lib/ro_crate/reader.rb', line 99

def self.read_directory(source)
  raise ROCrate::ReadException, "Source is not a directory!" unless ::File.directory?(source)

  source = ::File.expand_path(source)
  metadata_file = Dir.entries(source).detect { |entry| entry == ROCrate::Metadata::IDENTIFIER ||
      entry == ROCrate::Metadata::IDENTIFIER_1_0 }

  if metadata_file
    metadata_json = ::File.read(::File.join(source, metadata_file))
    begin
      metadata = JSON.parse(metadata_json)
    rescue JSON::ParserError => e
      raise ROCrate::ReadException.new("Error parsing metadata", e)
    end

    entities = entities_from_metadata(metadata)
    context = metadata['@context']

    build_crate(entities, source, context: context)
  else
    raise ROCrate::ReadException, "No metadata found!"
  end
end
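Before any entities are built, `read_directory` only locates the metadata file at the top level of `source` and parses it. This step does not search subdirectories; that is `detect_root_directory`'s job for zips. A minimal stdlib sketch of the step follows. `MetadataError` and `load_metadata` are hypothetical stand-ins, and the two metadata file names are assumed.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical stand-in for ROCrate::ReadException.
class MetadataError < StandardError; end

# Assumed names for ROCrate::Metadata::IDENTIFIER and IDENTIFIER_1_0.
METADATA_FILE_NAMES = ['ro-crate-metadata.json', 'ro-crate-metadata.jsonld'].freeze

# Hypothetical sketch of read_directory's first half: find and parse the metadata.
def load_metadata(dir)
  raise MetadataError, 'Source is not a directory!' unless File.directory?(dir)

  dir = File.expand_path(dir)
  name = Dir.entries(dir).detect { |entry| METADATA_FILE_NAMES.include?(entry) }
  raise MetadataError, 'No metadata found!' unless name

  JSON.parse(File.read(File.join(dir, name)))
rescue JSON::ParserError => e
  raise MetadataError, "Error parsing metadata: #{e.message}"
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'ro-crate-metadata.json'), '{"@graph": []}')
  p load_metadata(dir)['@graph'] # []
end
```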

.read_zip(source, target_dir: Dir.mktmpdir) ⇒ Crate

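Reads an RO-Crate from a zip file. It first extracts the Zip file into target_dir (a new temporary directory by default), searches the extracted files for the crate’s root directory (see .detect_root_directory), and then calls .read_directory on that root.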

Parameters:

  • source (String, ::File, Pathname, #read)

    The location of the zip file, or an IO-like object.

  • target_dir (String, ::File, Pathname) (defaults to: Dir.mktmpdir)

    The target directory where the crate should be unzipped.

Returns:

  • (Crate)

    The RO-Crate.

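Raises:

  • (ROCrate::ReadException)

    If target_dir is not a directory, or if reading the extracted crate fails.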



# File 'lib/ro_crate/reader.rb', line 83

def self.read_zip(source, target_dir: Dir.mktmpdir)
  raise ROCrate::ReadException, "Target is not a directory!" unless ::File.directory?(target_dir)

  unzip_to(source, target_dir)

  # Traverse the unzipped directory to try and find the crate's root
  root_dir = detect_root_directory(target_dir)

  read_directory(root_dir)
end

.unzip_file_to(file_or_path, target) ⇒ Object

Extract the contents of the given Zip file to the given directory. Entries whose paths already exist in the target are skipped rather than overwritten.

Parameters:

  • file_or_path (String, ::File, Pathname)

    The location of the zip file.

  • target (String, ::File, Pathname)

    The target directory where the file should be unzipped.



# File 'lib/ro_crate/reader.rb', line 63

def self.unzip_file_to(file_or_path, target)
  Dir.chdir(target) do
    Zip::File.open(file_or_path) do |zipfile|
      zipfile.each do |entry|
        unless ::File.exist?(entry.name)
          FileUtils::mkdir_p(::File.dirname(entry.name))
          zipfile.extract(entry, entry.name)
        end
      end
    end
  end
end

.unzip_io_to(io, target) ⇒ Object

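Extract the Zip data read from the given IO-like object to the given directory. Directory entries, and entries whose paths already exist in the target, are skipped.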

Parameters:

  • io (#read)

    An IO-like object containing a Zip file.

  • target (String, ::File, Pathname)

    The target directory where the file should be unzipped.



# File 'lib/ro_crate/reader.rb', line 45

def self.unzip_io_to(io, target)
  Dir.chdir(target) do
    Zip::InputStream.open(io) do |input|
      while (entry = input.get_next_entry)
        unless ::File.exist?(entry.name) || entry.name_is_directory?
          FileUtils::mkdir_p(::File.dirname(entry.name))
          ::File.binwrite(entry.name, input.read)
        end
      end
    end
  end
end

.unzip_to(source, target) ⇒ Object

Extract the contents of the given Zip file/data to the given directory.

Parameters:

  • source (String, ::File, Pathname, #read)

    The location of the zip file, or an IO-like object.

  • target (String, ::File, Pathname)

    The target directory where the file should be unzipped.



# File 'lib/ro_crate/reader.rb', line 30

def self.unzip_to(source, target)
  source = Pathname.new(::File.expand_path(source)) if source.is_a?(String)

  if source.is_a?(Pathname) || source.respond_to?(:path)
    unzip_file_to(source, target)
  else
    unzip_io_to(source, target)
  end
end
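Anything that is a path (a String, a Pathname, or an object such as a `File` that responds to `#path`) is opened as a file with `Zip::File`. Everything else is streamed through `Zip::InputStream`. The sketch below reproduces only that choice with the standard library and returns the name of the method that would be called. `unzip_strategy` is hypothetical, and no extraction takes place.

```ruby
require 'pathname'
require 'stringio'

# Hypothetical helper mirroring unzip_to's choice between unzip_file_to and unzip_io_to.
def unzip_strategy(source)
  source = Pathname.new(File.expand_path(source)) if source.is_a?(String)
  if source.is_a?(Pathname) || source.respond_to?(:path)
    :unzip_file_to # a path, or a File-like object that knows its own path
  else
    :unzip_io_to   # a bare IO-like object that only supports #read
  end
end

p unzip_strategy('crate.zip')        # :unzip_file_to
p unzip_strategy(StringIO.new('PK')) # :unzip_io_to
```

Checking `respond_to?(:path)` instead of `is_a?(::File)` lets objects like `Tempfile`, which expose a path, use the random-access `Zip::File` reader.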