Module: DS::Extractor::DsMetsXmlExtractor::ClassMethods

Included in:: DS::Extractor::DsMetsXmlExtractor

Defined in:: lib/ds/extractor/ds_mets_xml_extractor.rb

Constant Summary

NS =

{
  mods: 'http://www.loc.gov/mods/v3',
  mets: 'http://www.loc.gov/METS/',
}

DATE_START_XPATH =

'mods:mods/mods:originInfo/mods:dateCreated[@point="start"]'

DATE_END_XPATH =

'mods:mods/mods:originInfo/mods:dateCreated[@point="end"]'

Instance Method Summary

#dated_by_scribe?(xml) ⇒ Boolean

Determines if the XML document is dated by a scribe.
#extract_acknowledgments(xml) ⇒ Array<String>

Extracts acknowledgments from the given XML document.
#extract_all_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts all subjects from the given record.
#extract_all_subjects_as_recorded(xml) ⇒ Array<String>

Extract all subjects as recorded from the given XML.
#extract_artists(record) ⇒ Array<DS::Extractor::Name>

Extracts artists from the given record; names with the role artist, [artist], or illuminator are returned.
#extract_artists_as_recorded(record) ⇒ Object

Extracts artists as recorded from the given record.
#extract_assigned_date(part) ⇒ String

Return dates found in the mods:dateOther element, reformatting them as needed.
#extract_associated_agents(record) ⇒ Array<DS::Extractor::Name>

Extract other names from the given record.
#extract_authors(record) ⇒ Array<DS::Extractor::Name>

Extracts authors from the given record.
#extract_authors_as_recorded(record) ⇒ Array<String>

Extracts authors as recorded from the given record.
#extract_cataloging_convention(xml) ⇒ Object
#extract_date_created(part) ⇒ String

Return any date not found in mods:dateOther or in a dateCreated date range (see #extract_date_range).
#extract_date_range(xml, range_sep:) ⇒ Array<String>

Extract ranges from mods:dateCreated elements where a @point is defined.
#extract_date_range_for_part(part) ⇒ Array<Integer>

Extract ranges from mods:dateCreated elements where a @point is start and end.
#extract_docket(xml) ⇒ Array<String>

DS METS can have mods:abstract elements with @displayLabel="docket".
#extract_explicit(node, tag:) ⇒ Array<String>

Extracts explicit information from the given node based on the provided tag.
#extract_extent(node) ⇒ String

Extracts the extent from the given node.
#extract_filenames(page) ⇒ Array<String>

Extract the filename for page.
#extract_folio_num(page) ⇒ String

Extracts the folio number from the given page node.
#extract_former_owners(record) ⇒ Array<DS::Extractor::Name>

Extracts former owners from the given record.
#extract_former_owners_as_recorded(xml, lookup_split: true) ⇒ Array<String>

Extracts former owners as recorded from the given XML.
#extract_genres(xml) ⇒ Object
#extract_incipit_explicit(xml) ⇒ Object

If the mods:mods element has a <mods:titleInfo type="alternative"> element and a <mods:abstract[not(@displayLabel)]>, then the content of the <mods:abstract[not(@displayLabel)]> is an incipit.
#extract_institution_name(xml) ⇒ String

Extracts the institution name from the given XML document.
#extract_languages(record) ⇒ Array<DS::Extractor::Language>

Extract languages from the given record.
#extract_languages_as_recorded(record) ⇒ String

Return a list of unique languages from the text-level <mods:note>s that start with “lang:” (case-insensitive), joined with separator; so, “Latin”, rather than “Latin|Latin|Latin”, etc.
#extract_link_to_inst_record(xml) ⇒ String

Extract link to institution record from the given XML.
#extract_master_mets_file(page) ⇒ Array<String>

In some METS files each page has a list of mets:fptr elements. We need the @FILEID for the master image, but we don’t know which one is for the master.
#extract_material_as_recorded(record) ⇒ String

Extracts the material as recorded from the given record.
#extract_materials(record) ⇒ Array<DS::Extractor::Material>

Extracts materials from the given record.
#extract_mets_creator(xml) ⇒ Array<String>

Extracts the creator information from the METS XML document.
#extract_ms_note(xml) ⇒ Array<String>

Extracts the manuscript note from the given XML.
#extract_ms_phys_desc(xml) ⇒ Object
#extract_name(node, *roles) ⇒ Array<DS::Extractor::Name>

Extract name from the given node based on the provided roles.
#extract_notes(xml) ⇒ Array<String>

Extract the notes at all levels from the xml, and return an array of strings.
#extract_other_names_as_recorded(record) ⇒ Array<String>

Extract other names as recorded from the given record.
#extract_page_note(xml) ⇒ Array<String>

Extracts notes for each page in the given XML.
#extract_part_note(xml) ⇒ Array<String>

Extracts notes for each part in the given XML.
#extract_part_phys_desc(xml) ⇒ Array<String>

Extracts physical description notes for each part in the XML.
#extract_pd_note(part) ⇒ Array<String>

Extracts physical description notes from the given part object.
#extract_physical_description(xml) ⇒ Array

Extract and format all the physical description values for the manuscript and each part.
#extract_places(record) ⇒ Array<DS::Extractor::Place>

Extracts places from the given record.
#extract_production_date_as_recorded(xml) ⇒ Array<String>

Return as a single string all the date values for the manuscript.
#extract_production_places_as_recorded(xml) ⇒ Array<String>

Extract production places as recorded from the given XML.
#extract_recon_names(xml) ⇒ Array<Array>

Extract reconciliation names from the given XML.
#extract_recon_places(xml) ⇒ Array<Array>

Extract the places of production for reconciliation CSV output.
#extract_recon_splits(xml) ⇒ Object

Extract acknowledgments, notes, physical descriptions, and former owners; return all strings that start with SPLIT:, remove ‘SPLIT: ’ and return an array of arrays that can be treated as rows by Recon::Type::Splits.
#extract_recon_subjects(xml) ⇒ Array<String,String>

See the note for [Recon::Type::Subjects]: Each source subject extraction method should return a two dimensional array.
#extract_recon_titles(xml) ⇒ Array<String>

Extract reconciliation titles from the given XML.
#extract_scribes(record) ⇒ Array<String>

Extract scribes from the given record.
#extract_scribes_as_recorded(record) ⇒ Array<String>

Extracts scribes as recorded from the given record.
#extract_shelfmark(xml) ⇒ String

For legacy DS METS, the value of mods:identifier is the shelf mark.
#extract_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts subjects from the given record.
#extract_subjects_as_recorded(xml) ⇒ Array<String>

Extract subjects, the mods:originInfo/mods:edition values for each text.
#extract_text_note(xml) ⇒ Array<String>

Extracts text notes from the given XML document.
#extract_titles(record) ⇒ Array<DS::Extractor::Title>

Extract titles from the given record.
#extract_titles_as_recorded(record) ⇒ Array<String>

Extract titles as recorded from the given record.
#find_ms(xml) ⇒ Object

METS structMap extraction.
#find_pages(xml) ⇒ Array<Nokogiri::XML::Node>

Array of the page-level mets:dmdSec nodes.
#find_parts(xml) ⇒ Array<Nokogiri::XML::Node>

Find the manuscript parts in the XML document.
#find_texts(xml) ⇒ Array<Nokogiri::XML::Node>

Find the texts in the XML document.
#note_by_type(node, note_type, tag: nil) ⇒ Object

DS 1.0 METS note types.
#physdesc_note(node, note_type, tag: nil) ⇒ Array<String>

Extracts the physical description notes from the given node based on the note type and optional tag.
#source_modified ⇒ String

A method to return the date when the source was last modified.

Instance Method Details

#dated_by_scribe?(xml) ⇒ `Boolean`

Determines if the XML document is dated by a scribe.

Parameters:

xml (Nokogiri::XML::Node) —

the XML document to check

Returns:

(Boolean) —

true if the document is dated by a scribe, false otherwise

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 529

def dated_by_scribe? xml
  parts = find_parts xml
  # mods:mods/mods:note
  xpath = 'mods:mods/mods:note[@type="date"]'
  parts.any? { |part|
    part.xpath(xpath).text.upcase == 'Y'
  }
end

#extract_acknowledgments(xml) ⇒ `Array<String>`

Extracts acknowledgments from the given XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the XML document to extract acknowledgments from

Returns:

(Array<String>) —

the extracted acknowledgments

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 645

def extract_acknowledgments xml
  notes = []
  notes += find_ms(xml).flat_map { |ms| note_by_type ms, 'admin' }

  notes += find_parts(xml).flat_map { |part|
    extent = extract_extent part
    note_by_type part, 'admin', tag: extent
  }

  notes += find_texts(xml).flat_map { |text|
    extent = extract_extent text
    note_by_type text, 'admin', tag: extent
  }

  notes += find_pages(xml).flat_map { |page|
    extent = extract_extent page
    note_by_type page, 'admin', tag: extent
  }

  clean_notes notes
end

#extract_all_subjects(record) ⇒ `Array<DS::Extractor::Subject>`

Note:

this method delegates to #extract_subjects to fulfill the DS::Extractor contract

Extracts all subjects from the given record.

Parameters:

record (Object) —

the record to extract subjects from

Returns:

(Array<DS::Extractor::Subject>) —

the extracted subjects



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 921

def extract_all_subjects record
  extract_subjects record
end

#extract_all_subjects_as_recorded(xml) ⇒ `Array<String>`

Extract all subjects as recorded from the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML to extract subjects from

Returns:

(Array<String>) —

the extracted subjects as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 510

def extract_all_subjects_as_recorded xml
  extract_subjects_as_recorded xml
end

#extract_artists(record) ⇒ `Array<DS::Extractor::Name>`

Extracts artists from the given record; names with the role artist, [artist], or illuminator are returned.

Parameters:

record (Object) —

the record to extract artists from

Returns:

(Array<DS::Extractor::Name>) —

an array of extracted artists



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 292

def extract_artists record
  DS::Extractor::DsMetsXmlExtractor.extract_name record, *%w{ artist [artist] illuminator }
end

#extract_artists_as_recorded(record) ⇒ `Object`

Extracts artists as recorded from the given record.

Parameters:

record (Object) —

the record to extract artists from



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 284

def extract_artists_as_recorded record
  extract_artists(record).map &:as_recorded
end

#extract_assigned_date(part) ⇒ `String`

Return dates found in the mods:dateOther element, reformatting them as needed. These examples are taken from several METS files.

<mods:dateOther>[ca. 1410]</mods:dateOther>
<mods:dateOther>[between 1100 and 1200]</mods:dateOther>
<mods:dateOther>[between 1450 and 1460]</mods:dateOther>
<mods:dateOther>[between 1450 and 1500]</mods:dateOther>
<mods:dateOther>s. XV#^3/4#</mods:dateOther>
<mods:dateOther>s. XV</mods:dateOther>
<mods:dateOther>s. XVI#^4/4#</mods:dateOther>
<mods:dateOther>s. XVIII#^2/4#</mods:dateOther>
<mods:dateOther>s. XV#^in#</mods:dateOther>

Most dateOther values have the format:

s. XVII#^2#

The notation #^<VAL># encodes a portion of the string that was presented as superscript on the Berkeley DS site. DS 2.0 doesn’t use the superscripts; thus, when it occurs, this portion of the string is reformatted as (<VAL>):

s. XVII#^2#   =>    s. XVII(2)
s. XV#^ex#    =>    s. XV(ex)
s. XVI#^in#   =>    s. XVI(in)
s. X#^med#    =>    s. X(med)
s. XII#^med#  =>    s. XII(med)

Parameters:

part (Nokogiri::XML::Node) —

a part-level node

Returns:

(String) —

the date string reformatted as described above

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 635

def extract_assigned_date part
  xpath = 'mods:mods/mods:originInfo/mods:dateOther'
  part.xpath(xpath).text.gsub %r{#\^?([\w/]+)(\^|#)}, '(\1)'
end
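
The superscript rewrite can be tried on plain strings, without any XML. This is a minimal sketch that assumes the same regular expression as the method above; `reformat_date_other` is a hypothetical helper name, not part of the module.

```ruby
# Hypothetical helper (not part of DsMetsXmlExtractor): applies the same
# substitution as extract_assigned_date to a bare string instead of a node.
def reformat_date_other(text)
  # "#^VAL#" becomes "(VAL)"; VAL may contain word characters and slashes
  text.gsub(%r{#\^?([\w/]+)(\^|#)}, '(\1)')
end

puts reformat_date_other('s. XVII#^2#')
puts reformat_date_other('s. XV#^3/4#')
puts reformat_date_other('[ca. 1410]')
```

The three lines printed are `s. XVII(2)`, `s. XV(3/4)`, and `[ca. 1410]`; a value without the `#^...#` notation passes through unchanged.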

#extract_associated_agents(record) ⇒ `Array<DS::Extractor::Name>`

Extract other names from the given record.

Parameters:

record (Object) —

the record to extract other names from

Returns:

(Array<DS::Extractor::Name>) —

the extracted other names



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 324

def extract_associated_agents record
  DS::Extractor::DsMetsXmlExtractor.extract_name record, 'other'
end

#extract_authors(record) ⇒ `Array<DS::Extractor::Name>`

Extracts authors from the given record.

Parameters:

record (Object) —

the record to extract authors from

Returns:

(Array<DS::Extractor::Name>) —

an array of extracted authors



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 269

def extract_authors record
  DS::Extractor::DsMetsXmlExtractor.extract_name record, *%w{ author [author] }
end

#extract_authors_as_recorded(record) ⇒ `Array<String>`

Extracts authors as recorded from the given record.

Parameters:

record (Object) —

the record to extract authors from

Returns:

(Array<String>) —

the extracted authors as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 277

def extract_authors_as_recorded record
  extract_authors(record).map &:as_recorded
end

#extract_cataloging_convention(xml) ⇒ `Object`



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 17

def extract_cataloging_convention xml
  'ds-mets'
end

#extract_date_created(part) ⇒ `String`

Return any date not found in mods:dateOther or in a dateCreated date range (see #extract_date_range); thus:

<mods:dateCreated>1537</mods:dateCreated>
<mods:dateCreated>1531</mods:dateCreated>
<mods:dateCreated>14??, October 21</mods:dateCreated>
<mods:dateCreated>1462, July 23</mods:dateCreated>
<mods:dateCreated>1549, November</mods:dateCreated>

These values commonly give the date for “dated” manuscripts.

Parameters:

part (Nokogiri::XML::Node) —

a part-level node

Returns:

(String) —

the content of every dateCreated without @point defined, joined with ', '

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 599

def extract_date_created part
  xpath = 'mods:mods/mods:originInfo/mods:dateCreated[not(@point)]'
  part.xpath(xpath).map(&:text).join ', '
end

#extract_date_range(xml, range_sep:) ⇒ `Array<String>`

Extract ranges from mods:dateCreated elements where a @point is defined, thus:

<mods:dateCreated point="start" encoding="iso8601">1300</mods:dateCreated>
<mods:dateCreated point="end" encoding="iso8601">1399</mods:dateCreated>

Parameters:

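xml (Nokogiri::XML::Node) —

the document xml
range_sep (String) —

the separator placed between the start and end dates of each part

Returns:

(Array<String>) —

one string per part, with that part's start and end dates joined by range_sep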

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 563

def extract_date_range xml, range_sep:
  find_parts(xml).map { |part|
    extract_date_range_for_part(part).join range_sep
  }
end

#extract_date_range_for_part(part) ⇒ `Array<Integer>`

Extract ranges from mods:dateCreated elements where @point is start or end.

Parameters:

part (Nokogiri::XML::Node) —

a part-level node

Returns:

(Array<Integer>) —

the start and end dates as an array of integers

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 578

def extract_date_range_for_part part
  start_date = part.xpath(DATE_START_XPATH).text
  end_date   = part.xpath(DATE_END_XPATH).text
  [start_date, end_date].reject(&:empty?).map(&:to_i)
end
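
How the range methods treat missing endpoints is easier to see on plain strings. This is a minimal sketch, assuming the texts of the start and end elements have already been read; `range_for` is a hypothetical stand-in for #extract_date_range_for_part, and the `'^'` separator is an assumed value, since callers supply their own `range_sep`.

```ruby
# Hypothetical stand-in for extract_date_range_for_part: takes the text of the
# start and end dateCreated elements rather than a part-level node.
def range_for(start_text, end_text)
  # empty strings (missing elements) are dropped before integer conversion
  [start_text, end_text].reject(&:empty?).map(&:to_i)
end

# Assumed separator, for illustration only.
range_sep = '^'

puts range_for('1300', '1399').join(range_sep)
puts range_for('1300', '').inspect
puts range_for('', '').inspect
```

This prints `1300^1399`, `[1300]`, and `[]`: a part with only one endpoint yields a one-element array, and a part with no endpoints yields an empty array, which joins to an empty string.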

#extract_docket(xml) ⇒ `Array<String>`

DS METS can have mods:abstract elements with @displayLabel="docket". Extract these values and return as an array.

Parameters:

xml (Nokogiri::XML::Node) —

the document xml

Returns:

(Array<String>) —

the note values

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 889

def extract_docket xml
  xpath = %q{//mods:abstract[@displayLabel = 'docket']/text()}
  xml.xpath(xpath, NS).map { |docket|
    "Docket: #{docket.text}"
  }
end

#extract_explicit(node, tag:) ⇒ `Array<String>`

Extracts explicit information from the given node based on the provided tag.

Parameters:

node (Nokogiri::XML::Node) —

the XML node to extract information from
tag (String) —

the tag to prepend to each extracted information

Returns:

(Array<String>) —

an array of extracted information

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 790

def extract_explicit node, tag:
  node.xpath('mods:mods/mods:abstract/text()').map { |n|
    "#{tag}: #{n.text}"
  }
end

#extract_extent(node) ⇒ `String`

Extracts the extent from the given node.

Parameters:

node (Nokogiri::XML::Node) —

the XML node to extract extent from

Returns:

(String) —

the extracted extent

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 211

def extract_extent node
  xpath = 'mods:mods/mods:physicalDescription/mods:extent'
  node.xpath(xpath).flat_map { |extent|
    extent.text.split(%r{;;}).first
  }.join ', '
end
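
The `;;` handling in #extract_extent can be sketched on plain strings. `first_extent_values` is a hypothetical helper, and the sample values are invented for illustration; what follows `;;` in real DS METS extents is not stated here.

```ruby
# Hypothetical helper mirroring extract_extent: for each extent text, keep
# only what precedes the first ";;", then join the results with ", ".
def first_extent_values(extent_texts)
  extent_texts.flat_map { |text| text.split(%r{;;}).first }.join(', ')
end

# Invented sample values
puts first_extent_values(['ff. 1-24;;trailing data', 'ff. 25-40'])
puts first_extent_values([]).inspect
```

The first call prints `ff. 1-24, ff. 25-40`; with no extent elements the result is an empty string.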

#extract_filenames(page) ⇒ `Array<String>`

Extract the filenames for page. This will be one of:

* the values for +mods:identifier+ with +@type='filename'+; or

* the filenames pointed to by the linked +mets:fptr+ in the
     +mets:fileGrp+ with +@USE='image/master'+; or

* an array containing +['NO_FILE']+, if no files are associated with
     the page

There will almost always be one file, but at least one manuscript has a page with two associated images. Thus, we return an array.

Parameters:

page (Nokogiri::XML::Node) —

the mets:dmdSec node for the page

Returns:

(Array<String>) —

array of all the filenames for page

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 683

def extract_filenames page
  # mods:mods/mods:identifier[@type='filename']
  xpath     = 'mods:mods/mods:identifier[@type="filename"]'
  filenames = page.xpath(xpath).map(&:text)
  return filenames unless filenames.empty?

  # no filename; find the ARK URL for the master image for this page
  extract_master_mets_file page
end

#extract_folio_num(page) ⇒ `String`

Extracts the folio number from the given page node.

Parameters:

page (Nokogiri::XML::Node) —

the XML node representing the page

Returns:

(String) —

the extracted folio number

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 697

def extract_folio_num page
  # mods:mods/mods:physicalDescription/mods:extent
  xpath = 'mods:mods/mods:physicalDescription/mods:extent'
  page.xpath(xpath).map(&:text).join '|'
end

#extract_former_owners(record) ⇒ `Array<DS::Extractor::Name>`

Extracts former owners from the given record.

Parameters:

record (Nokogiri::XML::Node) —

the XML node representing the record

Returns:

(Array<DS::Extractor::Name>) —

an array of extracted former owners

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 253

def extract_former_owners record
  xpath = "./descendant::mods:note[@type='ownership']/text()"
  notes = clean_notes(record.xpath(xpath).flat_map(&:text))

  notes.flat_map { |n|
    splits = Recon::Type::Splits._lookup_single(n, from_column: 'authorized_label')
    splits.present? ? splits.split('|') : n
  }.map { |n|
    DS::Extractor::Name.new as_recorded: DS.mark_long(n), role: 'former owner'
  }
end
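
The split lookup in #extract_former_owners replaces one ownership note with several names when the splits data has an entry for it. This is a minimal sketch in which a plain Hash stands in for `Recon::Type::Splits._lookup_single`; `split_owners` and the sample table are hypothetical.

```ruby
# Hypothetical lookup table standing in for the splits reconciliation data:
# the key is the note as recorded, the value is a pipe-separated list.
SAMPLE_SPLITS = { 'Owner A and Owner B' => 'Owner A|Owner B' }.freeze

# Hypothetical helper: expand notes that have a split entry, keep the rest.
def split_owners(notes, table)
  notes.flat_map { |note|
    splits = table[note]
    splits.nil? || splits.empty? ? note : splits.split('|')
  }
end

p split_owners(['Owner A and Owner B', 'Owner C'], SAMPLE_SPLITS)
```

This prints `["Owner A", "Owner B", "Owner C"]`. In the method above, each resulting string is then wrapped in a DS::Extractor::Name with the role 'former owner'.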

#extract_former_owners_as_recorded(xml, lookup_split: true) ⇒ `Array<String>`

Extracts former owners as recorded from the given XML.

Parameters:

xml (Nokogiri::XML::NodeSet) —

the parsed XML to extract former owners from
lookup_split (Boolean) (defaults to: true) —

whether to lookup split information or not

Returns:

(Array<String>) —

the extracted former owners as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 245

def extract_former_owners_as_recorded xml, lookup_split: true
  extract_former_owners(xml).map &:as_recorded
end

#extract_genres(xml) ⇒ `Object`



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 462

def extract_genres xml
  []
end

#extract_incipit_explicit(xml) ⇒ `Object`

If the mods:mods element has a <mods:titleInfo type="alternative"> element and a <mods:abstract[not(@displayLabel)]>, then the content of the <mods:abstract[not(@displayLabel)]> is an incipit; XPath:

//mods:mods[./mods:titleInfo[@type="alternative"] and ./mods:abstract[not(@displayLabel)]]

//mods:mods[./mods:titleInfo[@type="alternative"]]/mods:abstract[not(@displayLabel)]/text()

If the mods:mods element has a <mods:titleInfo type="alternative"> element and a <mods:note type="content">, then the content of the <mods:note type="content"> is an explicit; XPath:

//mods:mods[./mods:titleInfo[@type="alternative"] and ./mods:note[@type="content"]]

//mods:mods[./mods:titleInfo[@type="alternative"]]/mods:note[@type="content"]/text()

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 864

def extract_incipit_explicit xml
  # ./descendant::mods:physicalDescription
  # mods:mods/mods:originInfo/mods:place/mods:placeTerm
  # find any mod:mods containing an incipit or explicit
  xpath = %q{//mods:mods[./mods:titleInfo[@type="alternative"] and
        (./mods:abstract[not(@displayLabel)] or
        ./mods:note[@type="content"])]}

  find_texts(xml).flat_map { |node|
    # return an array for formatted incipits and explicits for this manuscript
    extent = node.xpath('./descendant::mods:physicalDescription/mods:extent/text()', NS).text
    node.xpath('./descendant::mods:abstract[not(@displayLabel)]/text()').map { |inc|
      "Incipit, #{extent}: #{inc}"
    } + node.xpath('./descendant::mods:note[@type="content"]/text()').map { |exp|
      "Explicit, #{extent}: #{exp}"
    }
  }
end

#extract_institution_name(xml) ⇒ `String`

Extracts the institution name from the given XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the XML document to extract the institution name from

Returns:

(String) —

the extracted institution name



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 25

def extract_institution_name xml
  extract_mets_creator(xml).first
end

#extract_languages(record) ⇒ `Array<DS::Extractor::Language>`

Extract languages from the given record.

Parameters:

record (Object) —

the record to extract languages from

Returns:

(Array<DS::Extractor::Language>) —

the extracted languages

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 342

def extract_languages record
  # /mets:mets/mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods/mods:note
  # Can be Lang: or lang: or ???, so down case the text with translate()
  xpath = './descendant::mods:note[starts-with(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "lang:")]'
  find_texts(record).flat_map { |text|
    text.xpath(xpath).map { |note| note.text.sub(%r{^lang:\s*}i, '') }
  }.uniq.map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
end
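
Once the matching notes are selected, the prefix stripping and de-duplication are ordinary string operations. This is a minimal sketch on plain strings; `languages_from_notes` is a hypothetical helper and the note values are invented.

```ruby
# Hypothetical helper: strip a leading "lang:" (any case) and any following
# whitespace from each note, then drop duplicate language names.
def languages_from_notes(notes)
  notes.map { |note| note.sub(%r{^lang:\s*}i, '') }.uniq
end

p languages_from_notes(['lang: Latin', 'Lang: Latin', 'LANG:French'])
```

This prints `["Latin", "French"]`, one entry per distinct language rather than one per note.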

#extract_languages_as_recorded(record) ⇒ `String`

Return a list of unique languages from the text-level <mods:note>s that start with “lang:” (case-insensitive), joined with separator; so, “Latin”, rather than “Latin|Latin|Latin”, etc.

Returns:

(String)



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 334

def extract_languages_as_recorded record
  extract_languages(record).map &:as_recorded
end

#extract_link_to_inst_record(xml) ⇒ `String`

Extract link to institution record from the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML to extract the link from

Returns:

(String) —

the extracted link to the institution record

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 518

def extract_link_to_inst_record xml
  ms = find_ms xml
  # xpath mods:mods/mods:relatedItem/mods:location/mods:url
  xpath = "mods:mods/mods:relatedItem/mods:location/mods:url"
  ms.xpath(xpath).map(&:text).join '|'
end

#extract_master_mets_file(page) ⇒ `Array<String>`

In some METS files each page has a list of mets:fptr elements. We need the @FILEID for the master image, but we don’t know which one is for the master. Thus we get all the @FILEIDs.

<mets:structMap>
  <mets:div TYPE="text" LABEL="[No Title for Display]" ADMID="RMD1" DMDID="DM1">
    <mets:div TYPE="item" LABEL="[No Title for Display]" DMDID="DM2">
      <mets:div TYPE="item" LABEL="[No Title for Display]" DMDID="DM3">
        <mets:div TYPE="item" LABEL="Music extending into right margin, upper right column." DMDID="DM4">
          <mets:fptr FILEID="FID1"/>
          <mets:fptr FILEID="FID3"/>
          <mets:fptr FILEID="FID5"/>
          <mets:fptr FILEID="FID7"/>
          <mets:fptr FILEID="FID9"/>
        </mets:div>
        <!-- snip -->
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>

Using the FILEIDs, find the corresponding mets:file in the mets:fileGrp with @USE=‘image/master’.

<mets:fileGrp USE="image/master">
  <mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" CREATED="2010-11-08T10:26:20.3" ADMID="ADM1 ADM4" GROUPID="GID1">
    <mets:FLocat xlink:href="http://nma.berkeley.edu/ark:/28722/bk0008v1k7q" LOCTYPE="URL"/>
  </mets:file>
  <mets:file ID="FID2" MIMETYPE="image/tiff" SEQ="2" CREATED="2010-11-08T10:26:20.393" ADMID="ADM1 ADM5" GROUPID="GID2">
    <mets:FLocat xlink:href="http://nma.berkeley.edu/ark:/28722/bk0008v1k88" LOCTYPE="URL"/>
  </mets:file>
</mets:fileGrp>

We then follow the xlink:href to get the filename from the ‘location’ HTTP header.

Parameters:

page (Nokogiri::XML::Node) —

the mets:dmdSec node for the page

Returns:

(Array<String>) —

array of all the filenames for page

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 742

def extract_master_mets_file page
  dmdid = page['ID']
  # all the mets:fptr @FILEIDs for this page
  xpath = %Q{//mets:structMap/descendant::mets:div[@DMDID='#{dmdid}']/mets:fptr/@FILEID}

  # create an OR query because we don't know which FILEID is for the
  # master mets:file:
  #     "@ID = 'FID1' or @ID = 'FID3' or @ID = 'FID5' ... etc."
  id_query = page.xpath(xpath).map(&:text).map { |id| "@ID='#{id}'" }.join ' or '
  return ['NO_FILE'] if id_query.strip.empty? # there is no associated mets:fptr

  # the @xlink:href is the Berkeley ARK address; e.g., http://nma.berkeley.edu/ark:/28722/bk0008v1k88
  xpath          = "//mets:fileGrp[@USE='image/master']/mets:file[#{id_query}]/mets:FLocat/@xlink:href"
  fptr_addresses = page.xpath(xpath).map &:text
  return ['NO_FILE'] if fptr_addresses.empty? # I don't know if this happens, but just in case...

  # for each ARK address, find the TIFF filename
  fptr_addresses.map { |address| locate_filename address }
end
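
The OR query built from the FILEIDs is plain string assembly and can be inspected without a METS file. This is a minimal sketch; `id_query` and `master_file_xpath` are hypothetical helper names, and returning nil for an empty list is a choice made for this sketch (the method above returns ['NO_FILE'] instead).

```ruby
# Hypothetical helper: one "@ID='...'" test per FILEID, joined with "or".
def id_query(file_ids)
  file_ids.map { |id| "@ID='#{id}'" }.join(' or ')
end

# Hypothetical helper: the XPath that selects the xlink:href of every
# mets:file in the master file group whose @ID matches one of the FILEIDs.
def master_file_xpath(file_ids)
  query = id_query(file_ids)
  return nil if query.strip.empty? # no mets:fptr for this page

  "//mets:fileGrp[@USE='image/master']/mets:file[#{query}]/mets:FLocat/@xlink:href"
end

puts id_query(%w[FID1 FID3 FID5])
puts master_file_xpath(%w[FID1 FID3])
```

The first line printed is `@ID='FID1' or @ID='FID3' or @ID='FID5'`. Only the mets:file elements inside the image/master group can match, which is how the master image is picked out of the full FILEID list.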

#extract_material_as_recorded(record) ⇒ `String`

Extracts the material as recorded from the given record.

Parameters:

record (CSV::Row) —

the record to extract material from

Returns:

(String) —

the extracted material as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 222

def extract_material_as_recorded record
  extract_materials(record).map(&:as_recorded).join '|'
end

#extract_materials(record) ⇒ `Array<DS::Extractor::Material>`

Extracts materials from the given record.

Parameters:

record (Object) —

the record to extract materials from

Returns:

(Array<DS::Extractor::Material>) —

an array of Material objects

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 230

def extract_materials record
  find_parts(record).flat_map { |part|
    physdesc_note part, 'support'
  }.map { |s|
    s.downcase.chomp('.').strip
  }.uniq.map { |as_recorded|
    DS::Extractor::Material.new as_recorded: as_recorded
  }
end
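
The normalization applied to the support notes can be checked on plain strings. This is a minimal sketch; `normalize_materials` is a hypothetical helper and the sample values are invented.

```ruby
# Hypothetical helper: the same chain as extract_materials, applied to
# strings: lower-case, drop one trailing period, trim whitespace, dedupe.
def normalize_materials(support_notes)
  support_notes.map { |s| s.downcase.chomp('.').strip }.uniq
end

p normalize_materials(['Parchment.', 'parchment', ' Paper'])
p normalize_materials(['Paper. '])
```

The first call prints `["parchment", "paper"]`. The second prints `["paper."]`: because `chomp('.')` runs before `strip`, a period followed by trailing whitespace is not removed.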

#extract_mets_creator(xml) ⇒ `Array<String>`

Extracts the creator information from the METS XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the XML document containing METS data

Returns:

(Array<String>) —

an array of creator information

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 33

def extract_mets_creator xml
  creator = xml.xpath('/mets:mets/mets:metsHdr/mets:agent[@ROLE="CREATOR" and @TYPE="ORGANIZATION"]/mets:name', NS).text
  creator.split %r{;;}
end

#extract_ms_note(xml) ⇒ `Array<String>`

Extracts the manuscript note from the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML node to extract manuscript note from

Returns:

(Array<String>) —

an array of manuscript notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 766

def extract_ms_note xml
  notes = []
  ms    = find_ms xml
  notes += note_by_type ms, :none, tag: 'Manuscript note'
  notes += note_by_type ms, 'bibliography', tag: 'Bibliography'
  notes
end

#extract_ms_phys_desc(xml) ⇒ `Object`

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 88

def extract_ms_phys_desc xml
  ms = find_ms xml
  physdesc_note ms, 'presentation', tag: 'Binding'
end

#extract_name(node, *roles) ⇒ `Array<DS::Extractor::Name>`

Extract name from the given node based on the provided roles.

Parameters:

node (Object) —

the node to extract name from
roles (Array<String>) —

the roles to search for

Returns:

(Array<DS::Extractor::Name>) —

the extracted names

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 358

def extract_name node, *roles
  # Roles have different cases: Author, author, etc.
  # Xpath 1.0 has no lower-case function, so use translate()
  translate = "translate(./mods:role/mods:roleTerm/text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"
  props     = roles.map { |r| "#{translate} = '#{r}'" }.join ' or '
  xpath     = "./descendant::mods:name[#{props}]"
  node.xpath(xpath).flat_map { |name|
    name.xpath('mods:namePart').text.split %r{\s*;\s*}
  }.uniq.map { |as_recorded|
    DS::Extractor::Name.new as_recorded: as_recorded, role: roles.first
  }
end
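
Two pieces of #extract_name are plain string work: the case-insensitive role predicate and the splitting of mods:namePart on semicolons. This is a minimal sketch; `role_predicate` and `split_name_part` are hypothetical helpers and the names are invented.

```ruby
UPPER = ('A'..'Z').to_a.join
LOWER = ('a'..'z').to_a.join

# Hypothetical helper: XPath 1.0 has no lower-case function, so translate()
# maps A-Z to a-z before the role term is compared with each wanted role.
def role_predicate(*roles)
  translate = "translate(./mods:role/mods:roleTerm/text(), '#{UPPER}', '#{LOWER}')"
  roles.map { |r| "#{translate} = '#{r}'" }.join(' or ')
end

# Hypothetical helper: one namePart may hold several names separated by ";".
def split_name_part(text)
  text.split(%r{\s*;\s*})
end

puts "./descendant::mods:name[#{role_predicate('author', '[author]')}]"
p split_name_part('Name One ; Name Two;Name Three')
```

The second line printed is `["Name One", "Name Two", "Name Three"]`. Note that the method above assigns `roles.first` as the role of every name it returns, whichever of the listed roles matched.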

#extract_notes(xml) ⇒ `Array<String>`

Extract the notes at all levels from the xml, and return an array of strings.

Parameters:

xml (Nokogiri::XML::Node) —

the document’s xml

Returns:

(Array<String>) —

the note values

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 833

def extract_notes xml
  notes = []
  # get all notes that don't have @type
  xpath = %q{//mods:note[not(@type)]/text()}
  notes += extract_ms_note xml
  notes += extract_part_note xml
  notes += extract_text_note xml
  notes += extract_docket xml
  notes += extract_page_note xml

  clean_notes notes
end

#extract_other_names_as_recorded(record) ⇒ `Array<String>`

Extract other names as recorded from the given record.

Parameters:

record (Object) —

the record to extract other names from

Returns:

(Array<String>) —

the extracted other names as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 316

def extract_other_names_as_recorded record
  extract_associated_agents(record).map &:as_recorded
end

#extract_page_note(xml) ⇒ `Array<String>`

Extracts notes for each page in the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML node to extract notes from

Returns:

(Array<String>) —

an array of extracted notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 816

def extract_page_note xml
  find_pages(xml).flat_map { |page|
    extent = extract_extent page
    notes  = []
    notes  += note_by_type page, :none, tag: extent
    notes  += note_by_type page, 'content', tag: "Incipit, #{extent}"
    notes  += extract_explicit page, tag: "Explicit, #{extent}"
    notes
  }
end

#extract_part_note(xml) ⇒ `Array<String>`

Extracts notes for each part in the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML node to extract notes from

Returns:

(Array<String>) —

an array of extracted notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 778

def extract_part_note xml
  find_parts(xml).flat_map { |part|
    extent = extract_extent part
    note_by_type part, :none, tag: extent
  }
end

#extract_part_phys_desc(xml) ⇒ `Array<String>`

Extracts physical description notes for each part in the XML.

Parameters:

xml (Nokogiri::XML::Node) —

the XML node to extract parts from

Returns:

(Array<String>) —

an array of extracted physical description notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 119

def extract_part_phys_desc xml
  parts = find_parts xml
  parts.flat_map { |part|
    extent = extract_extent part
    notes  = []

    tag   = "Figurative details, #{extent}"
    notes += physdesc_note part, 'physical details', tag: tag
    notes += extract_pd_note part
    tag   = "Script, #{extent}"
    notes += physdesc_note part, 'script', tag: tag
    tag   = "Music, #{extent}"
    notes += physdesc_note part, 'medium', tag: tag
    tag   = "Layout, #{extent}"
    notes += physdesc_note part, 'technique', tag: tag
    tag   = "Watermarks, #{extent}"
    notes += physdesc_note part, 'marks', tag: tag
    notes
  }
end
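
The pairing of note types and labels in the method above can also be read as a lookup table. The following is a minimal sketch under stated assumptions, not the library's code: `PHYSDESC_LABELS` and `physdesc_tags` are hypothetical names, the extent value is invented, and the `extract_pd_note` call that the method makes between 'physical details' and 'script' is left out.

```ruby
# Note @type values and the labels extract_part_phys_desc pairs them with,
# in the order the method applies them. The constant name is hypothetical.
PHYSDESC_LABELS = {
  'physical details' => 'Figurative details',
  'script'           => 'Script',
  'medium'           => 'Music',
  'technique'        => 'Layout',
  'marks'            => 'Watermarks'
}.freeze

# Hypothetical helper: build the tag string for each note type of one part.
def physdesc_tags(extent)
  PHYSDESC_LABELS.transform_values { |label| "#{label}, #{extent}" }
end

# 'ff. 1-48' is an invented extent value.
physdesc_tags('ff. 1-48').each { |type, tag| puts "#{type} => #{tag}" }
```

Each resulting tag is what `physdesc_note` would receive as its `tag:` argument for the corresponding note type.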

#extract_pd_note(part) ⇒ `Array<String>`

Extracts physical description notes from the given part object.

Parameters:

part (Nokogiri::XML::Node) —

the XML node representing the part

Returns:

(Array<String>) —

an array of extracted physical description notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 97

def extract_pd_note part
  extent = extract_extent part

  xpath = %q{mods:mods/mods:physicalDescription/mods:note[@type = 'physical description']/text()}
  part.xpath(xpath).flat_map { |node|
    text  = node.text
    notes = []
    if text =~ %r{;;}
      other_deco, num_scribes = text.split %r{;;+}
      notes << "Other decoration, #{extent}: #{other_deco}" unless other_deco.blank?
      notes << "Number of scribes, #{extent}: #{num_scribes}" unless num_scribes.blank?
    else
      notes << "Other decoration, #{extent}: #{text}" unless text.empty?
    end
    notes
  }
end
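
The `;;` handling above can be tried in isolation. This is a minimal sketch rather than the library's code: `split_pd_note` is a hypothetical helper that takes a plain string instead of a Nokogiri text node, the sample values are invented, and ActiveSupport's `blank?` is replaced with a plain-Ruby check.

```ruby
# Hypothetical helper mirroring the branch logic of extract_pd_note.
# Takes the note text and the extent string; returns the labelled notes.
def split_pd_note(text, extent)
  # Stand-in for ActiveSupport's blank?: nil, empty, or whitespace only.
  blank = ->(s) { s.nil? || s.strip.empty? }
  notes = []
  if text =~ /;;/
    # The text before ';;' is other decoration; the text after it is the
    # number of scribes. Either side may be missing.
    other_deco, num_scribes = text.split(/;;+/)
    notes << "Other decoration, #{extent}: #{other_deco}" unless blank.(other_deco)
    notes << "Number of scribes, #{extent}: #{num_scribes}" unless blank.(num_scribes)
  else
    notes << "Other decoration, #{extent}: #{text}" unless text.empty?
  end
  notes
end

p split_pd_note('Pen-flourished initials;;2', 'ff. 1-10')
p split_pd_note(';;3', 'ff. 1-10')
```

A value with nothing before the separator yields only the "Number of scribes" note, because the empty first half is skipped.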

#extract_physical_description(xml) ⇒ `Array`

Extract and format all the physical description values for the manuscript and each part.

# MS Note Phys desc

presentation -> Binding

# MS Part phys description

- support -- accounted for as support

- marks -> 'Watermarks'
- medium -> 'Music'
- physical description -> 'Other decoration'
- physical details -> 'Figurative details'
- script -> 'Script'
- technique -> 'Layout'

Parameters:

xml (Nokogiri::XML::Node) —

the document’s XML

Returns:

(Array) —

the physical description values

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 59

def extract_physical_description xml
  physdesc = []
  physdesc += extract_ms_phys_desc xml
  physdesc += extract_part_phys_desc xml
  physdesc.flatten!

  clean_notes physdesc
end

#extract_places(record) ⇒ `Array<DS::Extractor::Place>`

Extracts places from the given record.

Parameters:

record (Object) —

the record to extract places from

Returns:

(Array<DS::Extractor::Place>) —

the extracted places

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 904

def extract_places record
  parts = find_parts record
  xpath = 'mods:mods/mods:originInfo/mods:place/mods:placeTerm'
  parts.flat_map { |node|
    node.xpath(xpath).map { |place|
      DS::Extractor::Place.new as_recorded: place.text.split(%r{;;}).join(', ')
    }
  }
end
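
The string handling inside the inner block can be sketched without Nokogiri. In the sketch below, `SketchPlace` is a hypothetical stand-in for `DS::Extractor::Place` that holds only `as_recorded`, `place_from_term` is a hypothetical helper, and the sample values are invented.

```ruby
# Hypothetical stand-in for DS::Extractor::Place; only as_recorded is modelled.
SketchPlace = Struct.new(:as_recorded, keyword_init: true)

# Mirrors the string handling in extract_places: ';;' separators inside a
# placeTerm value are rewritten as ', '.
def place_from_term(term_text)
  SketchPlace.new(as_recorded: term_text.split(/;;/).join(', '))
end

puts place_from_term('Italy;;Florence').as_recorded
puts place_from_term('France (?)').as_recorded
```

A value without `;;` passes through unchanged, since `split` then returns a single element.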

#extract_production_date_as_recorded(xml) ⇒ `Array<String>`

Return, for each manuscript part, a single string containing all of that part's date values. Each string is a concatenation of the values returned by #extract_date_created, #extract_assigned_date, and #extract_date_range_for_part, joined with '; '. Parts with no date values are omitted.

Parameters:

xml (Nokogiri::XML::Node) —

the parsed METS xml document

Returns:

(Array<String>) —

the concatenated date values

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 545

def extract_production_date_as_recorded xml
  find_parts(xml).map { |part|
    date_created = extract_date_created part
    assigned     = extract_assigned_date part
    range        = extract_date_range_for_part(part).join '-'
    [date_created, assigned, range].uniq.reject(&:empty?).join '; '
  }.reject { |date| date.to_s.strip.empty? }
end
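
How the three values for one part are combined can be shown with plain arrays. This is a sketch under stated assumptions: `combine_part_dates` is a hypothetical helper, the date values are invented, and string values are assumed for the two array arguments.

```ruby
# Hypothetical helper: combines the date values for one part in the same way
# as the block in extract_production_date_as_recorded.
def combine_part_dates(date_created, assigned, range_parts)
  range = range_parts.join('-')
  # uniq drops repeated values, reject drops empty arrays and empty strings,
  # and join flattens the remaining arrays using the same '; ' separator.
  [date_created, assigned, range].uniq.reject(&:empty?).join('; ')
end

puts combine_part_dates(['ca. 1450'], [], %w[1440 1460])
puts combine_part_dates([], [], []).inspect
```

A part with no dates produces an empty string, which is why the method ends by rejecting blank results.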

#extract_production_places_as_recorded(xml) ⇒ `Array<String>`

Extract production places as recorded from the given XML.

Parameters:

xml (Object) —

the XML to extract production places from

Returns:

(Array<String>) —

the extracted production places as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 398

def extract_production_places_as_recorded xml
  extract_places(xml).map &:as_recorded
end

#extract_recon_names(xml) ⇒ `Array<Array>`

Extract reconciliation names from the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(Array<Array>) —

an array of arrays of names for reconciliation

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 430

def extract_recon_names xml
  data = extract_authors(xml).map &:to_a
  data += extract_artists(xml).map &:to_a
  data += extract_scribes(xml).map &:to_a
  data += extract_former_owners(xml).map &:to_a
  data += extract_associated_agents(xml).map &:to_a
  data
end

#extract_recon_places(xml) ⇒ `Array<Array>`

Extract the places of production for reconciliation CSV output.

Returns a two-dimensional array in which each row is a place and has a single column, the place name. For example:

[["Austria"],
 ["Germany"],
 ["France (?)"]]

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(Array<Array>) —

an array of arrays of values



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 414

def extract_recon_places xml
  extract_places(xml).map &:to_a
end

#extract_recon_splits(xml) ⇒ `Object`

Extract former owners as recorded (with lookup_split: false), keep the values that are 400 or more characters long, strip each one, and return an array of arrays that can be treated as rows by Recon::Type::Splits.

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 444

def extract_recon_splits xml
  data = []
  data += DS::Extractor::DsMetsXmlExtractor.extract_former_owners_as_recorded xml, lookup_split: false
  data.flatten.select { |d| d.to_s.size >= 400 }.map { |d| [d.strip] }
end
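
The last line of the method body, which selects long values and wraps each one as a row, can be sketched on plain strings. `long_value_rows` is a hypothetical helper, and the `min_length` keyword is an assumption added for illustration; the method above hard-codes 400.

```ruby
# Hypothetical helper mirroring the final line of extract_recon_splits:
# keep values of min_length or more characters, strip them, and wrap each
# one in a single-column row.
def long_value_rows(values, min_length: 400)
  values.flatten.select { |v| v.to_s.size >= min_length }.map { |v| [v.strip] }
end

# Invented sample data: one short value and one nested, padded long value.
rows = long_value_rows(['Short owner note', ['x' * 400 + '  ']])
puts rows.length
puts rows.first.first.length
```

The length test runs before `strip`, so a value that reaches the threshold only because of surrounding whitespace is still kept.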

#extract_recon_subjects(xml) ⇒ `Array<String,String>`

See the note for [Recon::Type::Subjects]: each source subject extraction method should return a two-dimensional array:

[["Islamic law--Early works to 1800", ""],
 ["Malikites--Early works to 1800", ""],
 ["Islamic law", ""],
 ["Malikites", ""],
 ["Arabic language--Grammar--Early works to 1800", ""],
 ["Arabic language--Grammar", ""],
 ...
]

The second value is for those cases where the source provides an authority URI. The METS records don’t give a URI so this method always returns the empty string for the second value.

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(Array<String,String>) —

a two-dimensional array of subject and URI



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 485

def extract_recon_subjects xml
  extract_subjects(xml).map &:to_a
end

#extract_recon_titles(xml) ⇒ `Array<String>`

Extract reconciliation titles from the given XML.

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(Array<String>) —

an array of titles for reconciliation



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 422

def extract_recon_titles xml
  extract_titles(xml).to_a
end

#extract_scribes(record) ⇒ `Array<DS::Extractor::Name>`

Extract scribes from the given record.

Parameters:

record (Object) —

the record to extract scribes from

Returns:

(Array<DS::Extractor::Name>) —

the extracted scribes



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 308

def extract_scribes record
  DS::Extractor::DsMetsXmlExtractor.extract_name record, *%w{ scribe [scribe] }
end

#extract_scribes_as_recorded(record) ⇒ `Array<String>`

Extracts scribes as recorded from the given record.

Parameters:

record (Object) —

the record to extract scribes from

Returns:

(Array<String>) —

the extracted scribes as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 300

def extract_scribes_as_recorded record
  extract_scribes(record).map &:as_recorded
end

#extract_shelfmark(xml) ⇒ `String`

For the legacy DS METS, the value of mods:identifier is the shelfmark. If there are other ID types, we can’t distinguish them from shelfmarks.

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(String) —

the shelfmark

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 457

def extract_shelfmark xml
  ms = find_ms xml
  ms.xpath('mods:mods/mods:identifier[@type="local"]/text()').text
end

#extract_subjects(record) ⇒ `Array<DS::Extractor::Subject>`

Extracts subjects from the given record.

Parameters:

record (Object) —

the record to extract subjects from

Returns:

(Array<DS::Extractor::Subject>) —

the extracted subjects

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 929

def extract_subjects record
  xpath = '//mods:originInfo/mods:edition'
  find_texts(record).flat_map { |text|
    text.xpath(xpath).map { |subj|
      as_recorded = subj.text.strip.gsub(/\s+/, ' ')
      DS::Extractor::Subject.new as_recorded: as_recorded, vocab: 'ds-subject'
    }
  }
end
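
The whitespace clean-up applied to each subject can be checked on its own. In this sketch `normalize_subject` is a hypothetical helper name, and the input imitates a value whose text was wrapped and indented inside the XML.

```ruby
# Hypothetical helper: the same strip-and-collapse step that extract_subjects
# applies to the text of each mods:edition element.
def normalize_subject(raw)
  raw.strip.gsub(/\s+/, ' ')
end

raw = "  Manuscripts, Medieval--Connecticut--New\n      Haven.  "
puts normalize_subject(raw)
```

Runs of spaces, tabs, and newlines all collapse to a single space, so line wrapping in the source XML does not leak into the recorded subject.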

#extract_subjects_as_recorded(xml) ⇒ `Array<String>`

Extract subjects, the mods:originInfo/mods:edition values for each text. For example,

<mods:originInfo>
  <mods:edition>Alexander, de Villa Dei.</mods:edition>
  <mods:edition>Latin language--Grammar.</mods:edition>
  <mods:edition>Latin poetry, Medieval and modern.</mods:edition>
  <mods:edition>Manuscripts, Medieval--Connecticut--New Haven.</mods:edition>
</mods:originInfo>

Parameters:

xml (Nokogiri::XML::Node) —

a <METS_XML> node

Returns:

(Array<String>) —

an array of subjects



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 502

def extract_subjects_as_recorded xml
  extract_subjects(xml).map(&:as_recorded)
end

#extract_text_note(xml) ⇒ `Array<String>`

Extracts text notes from the given XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the XML document to extract text notes from

Returns:

(Array<String>) —

the extracted text notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 800

def extract_text_note xml
  find_texts(xml).flat_map { |text|
    extent = extract_extent text
    notes  = []
    notes  += note_by_type text, :none, tag: extent
    notes  += note_by_type text, 'condition', tag: "Status of text, #{extent}"
    notes  += note_by_type text, 'content', tag: "Incipit, #{extent}"
    notes  += extract_explicit text, tag: "Explicit, #{extent}"
    notes
  }
end

#extract_titles(record) ⇒ `Array<DS::Extractor::Title>`

Extract titles from the given record.

Parameters:

record (Object) —

the record to extract titles from

Returns:

(Array<DS::Extractor::Title>) —

the extracted titles

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 383

def extract_titles record
  xpath = 'mods:mods/mods:titleInfo/mods:title'
  find_texts(record).flat_map { |text|
    text.xpath(xpath).map(&:text)
  }.reject {
    |t| t == '[Title not supplied]'
  }.map { |as_recorded|
    DS::Extractor::Title.new as_recorded: as_recorded
  }
end
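
The placeholder filtering can be sketched on an array of strings. `SketchTitle` is a hypothetical stand-in for `DS::Extractor::Title`, `titles_from` and `TITLE_PLACEHOLDER` are hypothetical names, and the sample title is invented.

```ruby
# Hypothetical stand-in for DS::Extractor::Title; only as_recorded is modelled.
SketchTitle = Struct.new(:as_recorded, keyword_init: true)

# The placeholder string that extract_titles drops.
TITLE_PLACEHOLDER = '[Title not supplied]'

# Hypothetical helper: drop the placeholder, then wrap each remaining string.
def titles_from(strings)
  strings
    .reject { |t| t == TITLE_PLACEHOLDER }
    .map { |as_recorded| SketchTitle.new(as_recorded: as_recorded) }
end

titles = titles_from(['Doctrinale', '[Title not supplied]'])
puts titles.map(&:as_recorded).inspect
```

The comparison is an exact string match, so a placeholder with different spacing or casing would not be filtered out.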

#extract_titles_as_recorded(record) ⇒ `Array<String>`

Extract titles as recorded from the given record.

Parameters:

record (Object) —

the record to extract titles from

Returns:

(Array<String>) —

the extracted titles as recorded



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 375

def extract_titles_as_recorded record
  extract_titles(record).map &:as_recorded
end

#find_ms(xml) ⇒ `Object`

METS structMap extraction

Extract mods:mods elements by catalog description level: manuscript, manuscript part, text, page, image

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 946

def find_ms xml
  # the manuscript is one div deep in the structMap
  # /mets:mets/mets:structMap/mets:div/@DMDID
  xpath = '/mets:mets/mets:structMap/mets:div/@DMDID'
  id    = xml.xpath(xpath).first.text
  xml.xpath "/mets:mets/mets:dmdSec[@ID='#{id}']/mets:mdWrap/mets:xmlData"
end

#find_pages(xml) ⇒ `Array<Nokogiri::XML::Node>`

Returns array of the page-level mets:dmdSec nodes.

Parameters:

xml (Nokogiri::XML::Node) —

parsed XML of the METS document

Returns:

(Array<Nokogiri::XML::Node>) —

array of the page-level mets:dmdSec nodes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 992

def find_pages xml
  # /mets:mets/mets:structMap/mets:div/mets:div/mets:div/mets:div/@DMDID
  # the pages are four divs deep in the structMap
  # We need the IDs in order
  xpath = '/mets:mets/mets:structMap/mets:div/mets:div/mets:div/mets:div/@DMDID'
  ids   = xml.xpath(xpath).map &:text
  # collect dmdSec's for all the page IDs
  ids.flat_map { |id|
    xml.xpath "/mets:mets/mets:dmdSec[@ID='#{id}']/mets:mdWrap/mets:xmlData"
  }
end

#find_parts(xml) ⇒ `Array<Nokogiri::XML::Node>`

Find the manuscript parts in the XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the parsed XML document

Returns:

(Array<Nokogiri::XML::Node>) —

an array of manuscript parts in the correct order

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 958

def find_parts xml
  # /mets:mets/mets:structMap/mets:div/mets:div/@DMDID
  # manuscripts parts are two divs deep in the structMap
  # We need to get the IDs in order
  xpath = '/mets:mets/mets:structMap/mets:div/mets:div/@DMDID'
  ids   = xml.xpath(xpath).map &:text
  # We can't count on the order or the numbering of the mets:dmdSec
  # elements outside of the structMap. Thus, we have to return an
  # array with the parts mets:dmdSec in the correct order.
  ids.map { |id|
    xml.xpath "/mets:mets/mets:dmdSec[@ID='#{id}']/mets:mdWrap/mets:xmlData"
  }
end
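
The ordering concern described in the comments can be modelled with plain Ruby data in place of Nokogiri nodes. This is only a sketch: `order_by_struct_map` is a hypothetical helper, and the IDs and contents are invented.

```ruby
# Hypothetical helper: return the dmdSec contents in structMap order.
# ids      - DMDID values in the order they appear in the structMap
# dmd_secs - dmdSec contents keyed by @ID, in arbitrary document order
def order_by_struct_map(ids, dmd_secs)
  ids.map { |id| dmd_secs[id] }
end

# Invented sample: the dmdSec elements appear in the document as DM3 then
# DM2, but the structMap lists DM2 first.
dmd_secs = { 'DM3' => 'second part', 'DM2' => 'first part' }
puts order_by_struct_map(%w[DM2 DM3], dmd_secs).inspect
```

Looking up each ID in turn, as `find_parts` does with one XPath query per ID, is what keeps the result in structMap order whatever the document order is.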

#find_texts(xml) ⇒ `Array<Nokogiri::XML::Node>`

Find the texts in the XML document.

Parameters:

xml (Nokogiri::XML::Node) —

the parsed XML document

Returns:

(Array<Nokogiri::XML::Node>) —

an array of text nodes in the correct order

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 977

def find_texts xml
  # /mets:mets/mets:structMap/mets:div/mets:div/mets:div/@DMDID
  # texts are three divs deep in the structMap
  # We need to get the IDs in order
  xpath = '/mets:mets/mets:structMap/mets:div/mets:div/mets:div/@DMDID'
  ids   = xml.xpath(xpath).map &:text
  ids.map { |id|
    xml.xpath "/mets:mets/mets:dmdSec[@ID='#{id}']/mets:mdWrap/mets:xmlData"
  }
end
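
The four `find_*` methods differ only in how many `mets:div` levels their structMap XPath descends. The sketch below generates those paths from a depth; `LEVEL_DEPTH` and `struct_map_xpath` are hypothetical names that do not appear in the library.

```ruby
# Depth of each description level in the structMap, as stated in the
# comments of find_ms, find_parts, find_texts, and find_pages.
LEVEL_DEPTH = { manuscript: 1, part: 2, text: 3, page: 4 }.freeze

# Hypothetical helper: build the @DMDID XPath for a description level.
def struct_map_xpath(level)
  '/mets:mets/mets:structMap' + '/mets:div' * LEVEL_DEPTH.fetch(level) + '/@DMDID'
end

LEVEL_DEPTH.each_key { |level| puts "#{level}: #{struct_map_xpath(level)}" }
```

`fetch` raises a `KeyError` for an unknown level, so a mistyped level name fails loudly instead of producing a path with no `mets:div` steps.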

#note_by_type(node, note_type, tag: nil) ⇒ `Object`

DS 1.0 METS note types:

# MS Note types:

Accounted for
- ownership -- accounted for, former owner
- action -- skip; administrative note: "Inputter ...."
- admin -- acknowledgments
- untyped -- 'Manuscript Note'
- bibliography -- 'Bibliography'
- source note -- skip; not present on DS legacy pages

# MS Note Phys desc

presentation -> Binding

# Part note types:

- date - already accounted for
- content - skip
- admin - Acknowledgments

- untyped

# MS Part phys description

 - support -- accounted for as support

 - marks -> 'Watermarks'
 - medium -> 'Music'
 - physical description -> 'Other decoration'
 - physical details -> 'Figurative details'
 - script -> 'Script'
 - technique -> 'Layout'

# Text note types

 Accounted for
 - admin - acknowledgments

 - condition -> 'Status of text'
 - content -> handled as Text Incipit
 - untyped -> 'Text note'

# Page note types

 Accounted for
   None

 - content -> Folio Incipit
 - date -- skip
 - untyped -> 'Folio note'

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 195

def note_by_type node, note_type, tag: nil
  if note_type == :none
    xpath = %q{mods:mods/mods:note[not(@type)]/text()}
  else
    xpath = %Q{mods:mods/mods:note[@type = '#{note_type}']/text()}
  end

  node.xpath(xpath).map { |x|
    tag.nil? ? x.text : "#{tag}: #{x.text}"
  }
end
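
The two decisions made in this method, which XPath to use and whether to prefix the tag, can be separated into small functions and exercised without an XML document. `note_xpath` and `tag_note` are hypothetical helper names, and the sample note text is invented.

```ruby
# Hypothetical helper: the XPath that note_by_type builds for a note type.
# :none selects mods:note elements that have no @type attribute.
def note_xpath(note_type)
  if note_type == :none
    'mods:mods/mods:note[not(@type)]/text()'
  else
    "mods:mods/mods:note[@type = '#{note_type}']/text()"
  end
end

# Hypothetical helper: prefix a note with its tag when a tag is given.
def tag_note(text, tag = nil)
  tag.nil? ? text : "#{tag}: #{text}"
end

puts note_xpath(:none)
puts note_xpath('condition')
puts tag_note('Text is incomplete at end.', 'Status of text, ff. 1-10')
```

Because the note type is interpolated directly into the XPath string, a type that contains a single quote would produce an invalid expression; the types used in this module are fixed literals.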

#physdesc_note(node, note_type, tag: nil) ⇒ `Array<String>`

Extracts the physical description notes from the given node based on the note type and optional tag.

Parameters:

node (Nokogiri::XML::Node) —

the XML node to extract notes from
note_type (String, Symbol) —

the @type value of the notes to extract, or :none for notes without a @type attribute
tag (String) (defaults to: nil) —

an optional tag to prepend to each extracted note

Returns:

(Array<String>) —

an array of extracted notes

# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 74

def physdesc_note node, note_type, tag: nil
  if note_type == :none
    xpath = %q{mods:mods/mods:physicalDescription/mods:note[not(@type)]}
  else
    xpath = %Q{mods:mods/mods:physicalDescription/mods:note[@type = '#{note_type}']}
  end

  node.xpath(xpath).map { |x|
    tag.nil? ? x.text : "#{tag}: #{x.text}"
  }
end

#source_modified ⇒ `String`

Returns the date when the source was last modified. For DS METS, the chosen date is 2021-10-01.

Returns:

(String) —

“2021-10-01”



# File 'lib/ds/extractor/ds_mets_xml_extractor.rb', line 1007

def source_modified
  "2021-10-01"
end
