Module: DS::Extractor::MarcXmlExtractor::ClassMethods

Included in: DS::Extractor::MarcXmlExtractor

Defined in: lib/ds/extractor/marc_xml_extractor.rb

Instance Method Summary

#build_name_query(tags: [], relators: []) ⇒ String

Build a names query from tags and relators.
#collect_datafields(record, tags: [], codes: [], field_sep: '|', sub_sep: ' ') ⇒ Array<Array>

Extract subfield values specified by tags.
#collect_recon_datafields(record, tags: [], codes: [], sub_sep: ' ') ⇒ Array<Array>

Extract datafield values with authority numbers (URLs), when present, for reconciliation CSV output.
#collect_subfields(datafield, codes: [], sub_sep: ' ') ⇒ String

A method to collect subfields from a given datafield based on specified codes.
#compile_dates(record, code, part1, part2) ⇒ Object

Compiles dates based on the provided code and parts.
#extract_001_control_number(record, holdings_file = nil) ⇒ String

Extracts the 001 control number from the given MARC XML record and joins non-empty values with ‘|’.
#extract_acknowledgments(record) ⇒ Array

Extracts acknowledgments from the given record.
#extract_all_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts all subjects from the given record, including named subjects and subjects.
#extract_all_subjects_as_recorded(record) ⇒ Array<String>

Extracts all subjects as recorded from the given record.
#extract_artists(record) ⇒ Array<DS::Extractor::Name>

Extracts artists from the given record using the specified type and role.
#extract_artists_as_recorded(record) ⇒ Array<String>

Extracts artists as recorded from the given record.
#extract_artists_as_recorded_agr(record) ⇒ Array<String>

Extracts artists as recorded with vernacular form from the given record.
#extract_associated_agents(record) ⇒ Object
#extract_authority_number(datafield) ⇒ String

Extract the authority number, subfield $0 from the given datafield.
#extract_authors(record) ⇒ Array<String>

Extracts authors from the given record.
#extract_authors_as_recorded(record) ⇒ Array<String>

Extract names from record using tags and relators.
#extract_authors_as_recorded_agr(record) ⇒ Array<String>

Extract the alternate graphical representation of the name or return [].
#extract_cataloging_convention(record) ⇒ Object
#extract_date_part(datestring, ndx1, ndx2) ⇒ String

Extracts a part of the date string from a MARC 008 controlfield, using the indices ndx1 and ndx2.
#extract_date_range(record, range_sep:) ⇒ Array

Extract the encoded date from controlfield 008.
#extract_extent(record) ⇒ Array<String>

Extracts the extent from the given MARC XML record.
#extract_former_owners(record) ⇒ Array<DS::Extractor::Name>

Extract former owners from the given record.
#extract_former_owners_as_recorded(record) ⇒ Array<String>

Extracts former owners as recorded from the given record.
#extract_former_owners_as_recorded_agr(record) ⇒ Array<String>

Extracts former owners as recorded with vernacular form from the given record.
#extract_genre_vocabulary(record) ⇒ Array<Symbol>

Extracts the genre vocabulary from the given MARC XML record.
#extract_genres(record, sub_sep: '--', vocab: :all) ⇒ Array<DS::Extractor::Genre>

Extracts genres from the given MARC XML record.
#extract_genres_as_recorded(record, uniq: true) ⇒ Array<String>

Genres and subjects.
#extract_langs(record) ⇒ String

Extract the language codes from controlfield 008 and datafield 041$a.
#extract_languages(record) ⇒ Object
#extract_languages_as_recorded(record) ⇒ String

Extract the language as recorded: use the 546$a field when present; otherwise return the code values from controlfield 008 and datafield 041$a.
#extract_material_as_recorded(record) ⇒ String

Extracts the material as recorded from the given MARC XML record.
#extract_materials(record) ⇒ Array<DS::Extractor::Material>

Extracts materials from the given MARC XML record.
#extract_mmsid(record) ⇒ String

Extracts the MMS ID from the given MARC XML record.
#extract_name_portion(datafield) ⇒ String

Extract the PN from datafield, pulling subfields $a, $b, $c, $d.
#extract_named_500(record, name:, strip_name: false) ⇒ Array<String>

Return an array of 500$a values that begin with name: (name followed by a colon :).
#extract_named_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extract named subjects from the MARC XML record based on specified tags.
#extract_named_subjects_as_recorded(record) ⇒ Array<String>

Extracts named subjects as recorded from the given record.
#extract_names(record, tags: [], relators: []) ⇒ Array<DS::Extractor::Name>

Extract names from the MARC XML record based on specified tags and relators.
#extract_names_as_recorded(record, tags: [], relators: []) ⇒ String

Extract names from record using tags and relators.
#extract_names_as_recorded_agr(record, tags: [], relators: []) ⇒ Object

Extract the alternate graphical representation of the name or return ''.
#extract_notes(record) ⇒ Array<String>

Notes.
#extract_physical_description(record) ⇒ String

Extracts the physical description from the given MARC XML record.
#extract_places(record) ⇒ Object
#extract_pn_agr(datafield) ⇒ String

Extract the alternate graphical representation of the name or return ''.
#extract_production_date_as_recorded(record) ⇒ Object

Look for a date as recorded.
#extract_production_places_as_recorded(record) ⇒ Array<String>

Look for a place as recorded.
#extract_recon_genres(record, sub_sep: '--') ⇒ Array<Array>

Extract genre terms for reconciliation CSV output.
#extract_recon_names(record, tags: [], relators: []) ⇒ Array<Array<String>>

For the given record, extract the names as an array of arrays, including the concatenated name string (subfields, a, b, c, d) and, if present, the alternate graphical representation (AGR) and authority number (or URI).
#extract_recon_places(record) ⇒ Array<Array>

Extract the places of production MARC 260$a for reconciliation CSV output.
#extract_recon_subjects(record) ⇒ Array

Extracts subjects for reconciliation CSV output from the given record.
#extract_recon_titles(record) ⇒ Array<String>

Extracts titles for reconciliation CSV output from the given record.
#extract_role(datafield, relators:) ⇒ String

Extract the role value, subfield $e, from the given datafield.
#extract_scribes(record) ⇒ Array<DS::Extractor::Name>

Extract scribes from the given record.
#extract_scribes_as_recorded(record) ⇒ Array<String>

Extract scribes as recorded from the given record.
#extract_scribes_as_recorded_agr(record) ⇒ Array<String>

Extracts scribes as recorded with vernacular form from the given record.
#extract_subject_by_tags(record, tags: []) ⇒ Array<DS::Extractor::Subject>

Return an array of strings of formatted subjects (600, 610, 611, 630, 647, 648, 650, and 651).
#extract_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts subjects from the given record based on specified tags.
#extract_subjects_as_recorded(record) ⇒ Array<String>

Extracts subjects as recorded from the given record.
#extract_titles(record) ⇒ Array<DS::Extractor::Title>

Extracts titles from the given record.
#extract_titles_as_recorded(record) ⇒ Array<String>

Extracts titles as recorded from the given record.
#extract_titles_as_recorded_agr(record) ⇒ Array<String>

Extracts titles as recorded with vernacular form from the given record.
#extract_uniform_titles_as_recorded(record) ⇒ Array<String>

Extracts uniform titles as recorded from the given record.
#extract_uniform_titles_as_recorded_agr(record) ⇒ Array<String>

Extracts uniform titles as recorded with vernacular form from the given MARC XML record.
#extract_vocabulary(datafield) ⇒ String
#handle_bce_date(record) ⇒ Array<String>

Compiles BCE dates based on the provided record.
#title_as_recorded(record) ⇒ String

Extracts the title as recorded from the given record.
#title_as_recorded_agr(record, tag) ⇒ String

Extracts the title as recorded with vernacular form from the given record.
#uniform_title_as_recorded_agr(record) ⇒ String

Extracts uniform titles as recorded and aggregates them from the given MARC XML record.
#uniform_titles_as_recorded(record) ⇒ String

Extracts uniform titles as recorded from the given record.

Instance Method Details

#build_name_query(tags: [], relators: []) ⇒ `String`

Build a names query from tags and relators. Tags understood are 100, 700, and 710. The relators restrict the match to datafields whose subfield code e contains the specified value, like ‘scribe’:

contains(./subfield[@code ='e'], 'scribe')

For relators, see the section "$e - Relator term" here:

https://www.loc.gov/marc/bibliographic/bdx00.html

To require the subfield not have a relator, pass :none as the relator value.

build_name_query tags: ['100'], relators: :none

This will add the following to the query.

not(./subfield[@code = 'e'])

Note: In U. Penn manuscript catalog records, 700 and 710 fields that do not have a subfield code e are associated authors.

Parameters:

tags (defaults to: []): the MARC field code
relators (defaults to: []): for 700$e, 710$e, a value like ‘former owner’

Returns:

the data field query string

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 264

def build_name_query tags: [], relators: []
  return '' if tags.empty? # don't process nonsensical requests
  # make sure the tags are all strings
  _tags        = [tags].flatten.map &:to_s
  tag_query    = _tags.map { |t| "@tag = #{t}" }.join " or "
  query_string = "(#{tag_query})"

  _relators = [relators].flatten.map { |r| r.to_s.strip.downcase == 'none' ? :none : r }
  return "datafield[#{query_string}]" if _relators.empty?

  if _relators.include? :none
    query_string += " and not(./subfield[@code = 'e'])"
    return "datafield[#{query_string}]"
  end

  # use the normalized _relators so a single (non-array) relator also works
  relator_string = _relators.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }.join " or "
  query_string   += (relator_string.empty? ? '' : " and (#{relator_string})")
  "datafield[#{query_string}]"
end
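
To make the resulting XPath concrete, the following is a minimal standalone sketch of the same string-building logic. The method name `name_query_sketch` is hypothetical and used only for illustration; it is not part of the library, and the sketch assumes nothing beyond plain Ruby.

```ruby
# Standalone sketch of the query-building logic described above.
# `name_query_sketch` is a hypothetical name used only for this illustration.
def name_query_sketch(tags: [], relators: [])
  tags = [tags].flatten.map(&:to_s)
  return '' if tags.empty?

  query = "(#{tags.map { |t| "@tag = #{t}" }.join(' or ')})"

  # Normalize relators; the string 'none' and the symbol :none are equivalent.
  relators = [relators].flatten.map { |r| r.to_s.strip.downcase == 'none' ? :none : r }
  return "datafield[#{query}]" if relators.empty?
  return "datafield[#{query} and not(./subfield[@code = 'e'])]" if relators.include?(:none)

  relator_query = relators.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }.join(' or ')
  "datafield[#{query} and (#{relator_query})]"
end

puts name_query_sketch(tags: ['100'], relators: :none)
# datafield[(@tag = 100) and not(./subfield[@code = 'e'])]
puts name_query_sketch(tags: [700, 710], relators: ['scribe'])
# datafield[(@tag = 700 or @tag = 710) and (contains(./subfield[@code ='e'], 'scribe'))]
```

Because the relator test uses `contains`, a relator such as 'scribe' also matches longer subfield values that include that text.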

#collect_datafields(record, tags: [], codes: [], field_sep: '|', sub_sep: ' ') ⇒ `Array<Array>`

Extract subfield values specified by tags

Parameters:

record: a <marc:record> node
tags (defaults to: []): the MARC datafield tag(s)
codes (defaults to: []): the MARC subfield code(s)
field_sep (defaults to: '|'): separator for joining multiple datafield values
sub_sep (defaults to: ' '): separator for joining subfield values

Returns:

an array of arrays of values

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1083

def collect_datafields record, tags: [], codes: [], field_sep: '|', sub_sep: ' '
  _tags     = [tags].flatten.map &:to_s
  tag_query = _tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    DS::Util.clean_string value, terminator: ''
  }
end

#collect_recon_datafields(record, tags: [], codes: [], sub_sep: ' ') ⇒ `Array<Array>`

Extract datafield values with authority numbers (URLs), when present, for reconciliation CSV output.

Parameters:

record: a <marc:record> node
tags (defaults to: []): the MARC datafield tag(s)
codes (defaults to: []): the MARC subfield code(s)
sub_sep (defaults to: ' '): separator for joining subfield values

Returns:

an array of arrays of values

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1063

def collect_recon_datafields record, tags: [], codes: [], sub_sep: ' '
  _tags     = [tags].flatten.map &:to_s
  tag_query = _tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value  = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    value  = DS::Util.clean_string value, terminator: ''
    # MARC XML subfields carry a @code attribute (not @tag); $0 holds the authority number
    number = datafield.xpath('subfield[@code="0"]').text
    [value, number]
  }
end

#collect_subfields(datafield, codes: [], sub_sep: ' ') ⇒ `String`

A method to collect subfields from a given datafield based on specified codes.

Parameters:

datafield: the datafield to collect subfields from
codes (defaults to: []): the MARC subfield code(s) to collect
sub_sep (defaults to: ' '): the separator for joining subfield values

Returns:

the concatenated subfield values

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1108

def collect_subfields datafield, codes: [], sub_sep: ' '
  # ensure that +codes+ is an array of strings
  _codes = [codes].flatten.map &:to_s
  # Code query example: ['a', 'b', 'd', 'c'] => @code = 'a' or @code = 'b' or @code = 'c' or @code = 'd'
  code_query = _codes.map { |code| "@code = '#{code}'" }.join ' or '
  xpath      = %Q{subfield[#{code_query}]}
  DS::Util.clean_string datafield.xpath(xpath).map(&:text).reject(&:empty?).join sub_sep
end
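
As a small illustration of the predicate this method assembles, the sketch below builds the same XPath string on its own. The helper name `subfield_xpath` is hypothetical and exists only for this example.

```ruby
# Hypothetical helper mirroring how the subfield predicate is assembled.
def subfield_xpath(codes)
  code_query = [codes].flatten.map { |code| "@code = '#{code}'" }.join(' or ')
  "subfield[#{code_query}]"
end

puts subfield_xpath(%w[a b c d])
# subfield[@code = 'a' or @code = 'b' or @code = 'c' or @code = 'd']
```

Since the predicate is a single `or` expression, the order in which codes are listed should not affect the order of the matched subfields; they are expected to come back in document order.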

#compile_dates(record, code, part1, part2) ⇒ `Object`

Compiles dates based on the provided code and parts. This method determines the date based on the date code from the MARC 008 field; the code in position 6 of the MARC 008 field.

See www.loc.gov/marc/bibliographic/bd008.html

Parameters:

record: the marc:record node
code: the marc 008 date code

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 723

def compile_dates record, code, part1, part2
  case code
  when 'i', 'k', 'm', 'p', 'q', '|'
    [part1, part2]
  when 'n'
    []
  when 'b'
    handle_bce_date record
  else
    [part1]
  end
end
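
The dispatch can be exercised without a parsed record using the sketch below. It is an approximation under stated assumptions: `compile_dates_sketch` is a hypothetical name, and the BCE branch is replaced by a caller-supplied value because the real branch needs a MARC record.

```ruby
# Hypothetical standalone version of the dispatch. The 'b' (BCE) branch
# returns a caller-supplied value instead of reading a record.
def compile_dates_sketch(code, part1, part2, bce_dates: [])
  case code
  when 'i', 'k', 'm', 'p', 'q', '|' then [part1, part2]
  when 'n' then []
  when 'b' then bce_dates
  else [part1]
  end
end

p compile_dates_sketch('q', '1400', '1500') # ["1400", "1500"]
p compile_dates_sketch('s', '1171', nil)    # ["1171"]
p compile_dates_sketch('n', nil, nil)       # []
p compile_dates_sketch('r', '1175', '1199') # ["1175"]
```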

#extract_001_control_number(record, holdings_file = nil) ⇒ `String`

Extracts the 001 control number from the given MARC XML record and joins non-empty values with ‘|’.

Parameters:

record: the MARC XML record to extract the control number from
holdings_file (defaults to: nil): (optional) the holdings file

Returns:

the extracted 001 control number joined with ‘|’

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1122

def extract_001_control_number record, holdings_file = nil
  ids = []
  # add the MMS ID
  ids << extract_mmsid(record)

  ids.reject(&:empty?).join '|'
end

#extract_acknowledgments(record) ⇒ `Array`

Extracts acknowledgments from the given record.

Parameters:

the record to extract acknowledgments from

Returns:

the extracted acknowledgments



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1142

def extract_acknowledgments record
  []
end

#extract_all_subjects(record) ⇒ `Array<DS::Extractor::Subject>`

Extracts all subjects from the given record, including named subjects and subjects.

Parameters:

the record to extract all subjects from

Returns:

the extracted all subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 477

def extract_all_subjects record
  extract_named_subjects(record) + extract_subjects(record)
end

#extract_all_subjects_as_recorded(record) ⇒ `Array<String>`

Extracts all subjects as recorded from the given record.

Parameters:

the record to extract all subjects from

Returns:

the extracted all subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 485

def extract_all_subjects_as_recorded record
  extract_all_subjects(record).map &:as_recorded
end

#extract_artists(record) ⇒ `Array<DS::Extractor::Name>`

Extracts artists from the given record using the specified type and role.

Parameters:

the record to extract artists from

Returns:

an array of extracted artists

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 96

def extract_artists record
  extract_names(
    record, tags: [700, 710, 711],
    relators:     ['artist', 'illuminator']
  )
end

#extract_artists_as_recorded(record) ⇒ `Array<String>`

Extracts artists as recorded from the given record.

Parameters:

the record to extract artists from

Returns:

the extracted artists as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 107

def extract_artists_as_recorded record
  extract_artists(record).map &:as_recorded
end

#extract_artists_as_recorded_agr(record) ⇒ `Array<String>`

Extracts artists as recorded with vernacular form from the given record.

Parameters:

the record to extract artists from

Returns:

the extracted artists as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 115

def extract_artists_as_recorded_agr record
  extract_artists(record).map &:vernacular
end

#extract_associated_agents(record) ⇒ `Object`



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 173

def extract_associated_agents record
  []
end

#extract_authority_number(datafield) ⇒ `String`

Extract the authority number, subfield $0 from the given datafield.

Parameters:

the marc:datafield node with the name

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1013

def extract_authority_number datafield
  xpath = "./subfield[@code='0']"
  datafield.xpath(xpath).text
end

#extract_authors(record) ⇒ `Array<String>`

Extracts authors from the given record.

Parameters:

the record to extract authors from

Returns:

an array of extracted authors

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 168

def extract_authors record
  extract_names(record, tags: [100, 110, 111]) +
    extract_names(record, tags: [700, 710, 711], relators: %w{author})
end

#extract_authors_as_recorded(record) ⇒ `Array<String>`

Extract names from record using tags and relators. Authors are extracted from datafields 100, 110, 111, 700, 710, and 711.

All 1xx fields are extracted; no relator is required, and all 1xx names are assumed to be authors.

700, 710, and 711 are extracted when subfield 7xx$e contains ‘author’.

Parameters:

a <marc:record> node

Returns:

list of names

#extract_authors_as_recorded_agr(record) ⇒ `Array<String>`

Extract the alternate graphical representation of the name or return [].

See MARC specification for 880 fields:

www.loc.gov/marc/bibliographic/bd880.html

Parameters:

a <marc:record> node

Returns:

list of names or []

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 59

def extract_authors_as_recorded_agr record
  authors = []
  authors += extract_names_as_recorded_agr record, tags: [100, 110, 111]
  authors += extract_names_as_recorded_agr record, tags: [700, 710, 711], relators: %w{author}
  authors
end

#extract_cataloging_convention(record) ⇒ `Object`



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1050

def extract_cataloging_convention record
  record.xpath('datafield[@tag=040]/subfield[@code="e"]/text()').text
end

#extract_date_part(datestring, ndx1, ndx2) ⇒ `String`

Extracts a part of the date string from a MARC 008 controlfield, using the indices ndx1 and ndx2.

Ensures that the extracted part starts with a digit and matches a sequence of digits and/or ‘u’.

Parameters:

datestring: the input datestring
ndx1: the starting index for extraction
ndx2: the length of the substring to extract

Returns:

the extracted part of the datestring

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 777

def extract_date_part datestring, ndx1, ndx2
  part = datestring[ndx1, ndx2]

  # part must start with a digit and match a seq of digits and/or u
  return unless part =~ /^\d[\du]+/

  part.sub! /^0+/, '' if part =~ /^0+[1-9]/
  part
end
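
The behaviour on typical 008 date strings can be checked with the standalone copy below. The method body follows the listing above; the sample strings are taken from the date types documented for `#extract_date_range`.

```ruby
# Standalone copy of the method above so it can run outside the module.
def extract_date_part(datestring, ndx1, ndx2)
  part = datestring[ndx1, ndx2]

  # part must start with a digit and match a sequence of digits and/or u
  return unless part =~ /^\d[\du]+/

  part.sub!(/^0+/, '') if part =~ /^0+[1-9]/
  part
end

p extract_date_part('m0618193u', 1, 4) # "618"  (leading zero stripped)
p extract_date_part('m0618193u', 5, 4) # "193u" (u's are kept at this stage)
p extract_date_part('s1171    ', 5, 4) # nil    (blank second date)
p extract_date_part('quuuu1597', 1, 4) # nil    (does not start with a digit)
```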

#extract_date_range(record, range_sep:) ⇒ `Array`

Extract the encoded date from controlfield 008.

Follows

Returns an array containing a pair of dates or a single date, or an empty array.

The following date types have appeared in MARC records contributed to DS as of 2024-02-27 and are handled here:

b - No dates given; B.C. date involved

- 'b        '
- date is taken from 046$b, and if present $d or $e
- See: https://www.loc.gov/marc/bibliographic/bd046.html

e - Detailed date

- 'e11200520', 'e139403 x', 'e164509 t', 'e167707 y',
  'e187505 s'
- the first date part is returned as a single year

i - Inclusive dates of collection

- 'i07500800', 'i08000830', 'i1000    '
- the first and -- if present -- second date part are
  returned as two years

k - Range of years of bulk of collection

- 'k15121716'
- the first and second date parts are returned as two years

m - Multiple dates

- 'm0618193u', 'm07390741', 'm10751200', 'm16uu1637',
  'm17uu1900'
- the first and second date parts are returned as two years
- see note below on replacement of u's

n - Dates unknown

- 'nuuuuuuuu'
- no date returned

p - Date of distribution/release/issue and production/recording session when different

- 'p1400    '
- the first and -- if present -- second date part are
  returned as two years

q - Questionable date

- 'q01000299', 'q0979    ', 'q09910992', 'q10001099',
  'q1300    ', 'q13uu14uu', 'q13uu1693', 'q14011425',
  'q1425uuuu', 'q1450    ', 'q1460    ', 'q14uu14uu',
  'quuuu1597'
- the first and -- if present -- second date part are
  returned as two years
- if the second date part is 'uuuu', the first date part is
  returned as year; 'q1425uuuu' => 1425
- if the first date part is 'uuuu', the second date part is
  returned as year; 'quuuu1597' => 1597
- for partial date parts with u's, see the note below

r - Reprint/reissue date and original date

- 'r11751199'
- the first date part is returned as a single year

s - Single known date/probable date

- 's1171    ', 's1171 xx ', 's1192 ua ', 's1250||||',
  's1286 iq ', 's1315 sy ', 's1366 is ', 's1436 gw ',
  's1450 it ', 's1470 ly ', 's1470 tu ', 's1470 uuu',
  's1497 enk', 's1595 sp ', 's19uu    '
- the first date part is returned as a single year
- see note below on replacement of u's

| - No attempt to code

- '|12501300'
- this appears to be miscoding
- nevertheless, '|' coded records will follow the default
  rule: date part one is returned as a single year

The following cases, so far unrepresented in contributor data, will follow the default rule: date part one will be returned as a single year.

c - Continuing resource currently published
d - Continuing resource ceased publication
t - Publication date and copyright date
u - Continuing resource status unknown

Note on the replacement of u’s in partial year dates

- Where u's appear in the first date they are replaced by 0;
  thus, 'q13uu1693'  => '1300, 1693'
- Where u's appear in the second date they are replaced by 9;
  thus, 'q14uu14uu'  => '1400, 1499'

For MARC partial dates see Date 1 and Date 2 documentation here

https://www.loc.gov/marc/bibliographic/bd008a.html

Parameters:

the marc:record node

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 692

def extract_date_range record, range_sep:
  # 008 controlfield; e.g.,
  #
  #     "220518q14001500xx            000 0     d"
  ctrl_008 = record.at_xpath("controlfield[@tag='008']")
  return [] unless ctrl_008 # return if no 008
  # get positions 7-15: q14001500
  date_str = ctrl_008.text[6, 9]
  code     = date_str[0] # 'm'
  part1    = extract_date_part date_str, 1, 4 # '0618'
  part1.gsub! /u/, '0' if part1.present?
  part2 = extract_date_part date_str, 5, 8 # '193u'
  part2.gsub! /u/, '9' if part2.present?

  range = compile_dates(record, code, part1, part2).filter_map { |y|
    # filter out blank dates and '9999'
    y if y.present? && y != '9999'
  }

  return [] if range.blank?
  [range.join(range_sep)]
end
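
The slicing and u-replacement steps can be traced on a raw 008 string with the simplified sketch below. It assumes a two-date type code such as q or m, omits the leading-zero stripping done by `extract_date_part`, and uses '^' as an arbitrary separator; `date_range_sketch` is a hypothetical name, not a library method.

```ruby
# Hypothetical walk-through on a raw 008 string; no XML parsing involved.
# Assumes a two-date type code (for example 'q' or 'm').
def date_range_sketch(ctrl_008, range_sep: '^')
  date_str = ctrl_008[6, 9]         # type code plus Date 1 and Date 2
  fills    = { 1 => '0', 5 => '9' } # offset of each date => replacement for 'u'

  years = fills.filter_map do |offset, fill|
    part = date_str[offset, 4]
    next unless part.match?(/\A\d[\du]{3}\z/) # skip blank or all-'u' parts

    part.tr('u', fill)
  end
  years.reject { |y| y == '9999' }.join(range_sep)
end

puts date_range_sketch('220518q13uu14uuxx            000 0     d') # 1300^1499
puts date_range_sketch('220518q1425uuuuxx            000 0     d') # 1425
puts date_range_sketch('220518quuuu1597xx            000 0     d') # 1597
```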

#extract_extent(record) ⇒ `Array<String>`

Extracts the extent from the given MARC XML record.

Parameters:

the MARC XML record to extract extent from

Returns:

an array of extracted extents

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 975

def extract_extent record
  subfield_xpath = "subfield[@code = 'a' or @code = 'b' or @code = 'c']"
  record.xpath("datafield[@tag=300]").map { |datafield|
    datafield.xpath(subfield_xpath).filter_map { |s|
      s.text unless s.text.empty?
    }.join ' '
  }.filter_map { |ext|
    "Extent: #{DS::Util.clean_string ext}" unless ext.strip.empty?
  }
end

#extract_former_owners(record) ⇒ `Array<DS::Extractor::Name>`

Extract former owners from the given record.

Parameters:

the record to extract former owners from

Returns:

the extracted former owners

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 123

def extract_former_owners record
  extract_names(
    record, tags: [700, 710, 711], relators: ['former owner']
  )
end

#extract_former_owners_as_recorded(record) ⇒ `Array<String>`

Extracts former owners as recorded from the given record.

Parameters:

the record to extract former owners from

Returns:

the extracted former owners as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 133

def extract_former_owners_as_recorded record
  extract_former_owners(record).map &:as_recorded
end

#extract_former_owners_as_recorded_agr(record) ⇒ `Array<String>`

Extracts former owners as recorded with vernacular form from the given record.

Parameters:

the record to extract former owners from

Returns:

the extracted former owners as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 141

def extract_former_owners_as_recorded_agr record
  extract_former_owners(record).map &:vernacular
end

#extract_genre_vocabulary(record) ⇒ `Array<Symbol>`

Extracts the genre vocabulary from the given MARC XML record.

Parameters:

the MARC XML record to extract genre vocabulary from

Returns:

an array of extracted genre vocabularies



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 530

def extract_genre_vocabulary record
  extract_genres(record).map(&:vocab)
end

#extract_genres(record, sub_sep: '--', vocab: :all) ⇒ `Array<DS::Extractor::Genre>`

Extracts genres from the given MARC XML record.

Parameters:

record: the MARC XML record to extract genres from
sub_sep (defaults to: '--'): the separator for joining subfields
vocab (defaults to: :all): the vocab type to extract

Returns:

an array of extracted genres

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 508

def extract_genres record, sub_sep: '--', vocab: :all
  xpath = %q{datafield[@tag = 655]}
  record.xpath(xpath).filter_map { |datafield|
    as_recorded          = collect_subfields datafield, codes: 'abcvzyx'.split(//), sub_sep: sub_sep
    as_recorded          = DS::Util.clean_string as_recorded, terminator: ''
    term_vocab           = extract_vocabulary datafield
    source_authority_uri = extract_authority_number datafield

    next unless as_recorded.present?
    next unless vocab == :all || vocab == term_vocab

    DS::Extractor::Genre.new(
      as_recorded: as_recorded, vocab: term_vocab,
      source_authority_uri: source_authority_uri
    )
  }
end

#extract_genres_as_recorded(record, uniq: true) ⇒ `Array<String>`

Genres and subjects

Extract genre and form terms from MARC datafield 655 values, where the 655$2 value can be specified; e.g., rbprov, aat, lcgft.

Set sub2 to :all to extract all 655 terms

Parameters:

record: the MARC record
uniq (defaults to: true): whether to return only unique terms

Returns:

array of genre terms

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 366

def extract_genres_as_recorded record, uniq: true
  terms = extract_genres(record, sub_sep: '--', vocab: :all).map(&:as_recorded)

  uniq ? terms.uniq : terms
end

#extract_langs(record) ⇒ `String`

Extract the language codes from controlfield 008 and datafield 041$a.

Parameters:

the marc:record node

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 321

def extract_langs record
  # Language is in 008 at characters 35-37 (0-based indexing)
  (langs ||= []) << record.xpath("substring(controlfield[@tag='008']/text(), 36, 3)")
  # 041 is present if there's more than one language
  langs += record.xpath("datafield[@tag=041]/subfield[@code='a']").map(&:text)
  # if there are 041 values, the lang from 008 is repeated; remove the duplicate
  langs.select(&:present?).uniq
end
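
The offsets involved are easy to get wrong, because XPath's `substring` is 1-based while Ruby string indexing is 0-based. The sketch below shows the equivalent Ruby slice on a sample 008 value; `SAMPLE_008`, `language_codes`, and the sample 041$a values are illustrative assumptions, not library names or real data.

```ruby
# A 40-character 008 value assembled piece by piece so the offsets are visible.
SAMPLE_008 = [
  '220518',    # 00-05 date entered on file
  'q14001500', # 06-14 date type and dates
  'xx ',       # 15-17 place of publication
  ' ' * 17,    # 18-34 material-specific positions
  'lat',       # 35-37 language code
  ' d'         # 38-39 modified record, cataloging source
].join.freeze

# Hypothetical helper: combine the 008 code with 041$a values, dropping
# blanks and duplicates.
def language_codes(ctrl_008, langs_041 = [])
  ([ctrl_008[35, 3].to_s] + langs_041).reject { |l| l.strip.empty? }.uniq
end

p language_codes(SAMPLE_008, %w[lat grc]) # ["lat", "grc"]
```

Ruby's `[35, 3]` here corresponds to `substring(..., 36, 3)` in the XPath used by the method.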

#extract_languages(record) ⇒ `Object`

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 340

def extract_languages record
  xpath = "datafield[@tag=546]/subfield[@code='a']"
  langs = record.xpath(xpath).map { |val|
    DS::Util.clean_string val.text, terminator: ''
  }.select(&:present?).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
  return langs if langs.present?

  extract_langs(record).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
end

#extract_languages_as_recorded(record) ⇒ `String`

Extract the language as recorded: use the 546$a field when present; otherwise return the code values from controlfield 008 and datafield 041$a.

Parameters:

the marc:record node

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 336

def extract_languages_as_recorded record
  extract_languages(record).map &:as_recorded
end

#extract_material_as_recorded(record) ⇒ `String`

Extracts the material as recorded from the given MARC XML record.

Parameters:

the MARC XML record to extract material from

Returns:

the extracted material as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 954

def extract_material_as_recorded record
  extract_materials(record).map(&:as_recorded).first.to_s
end

#extract_materials(record) ⇒ `Array<DS::Extractor::Material>`

Extracts materials from the given MARC XML record.

Parameters:

the MARC XML record to extract materials from

Returns:

an array of extracted materials

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 962

def extract_materials record
  DS::Extractor::MarcXmlExtractor.collect_datafields(
    record, tags: 300, codes: 'b'
  ).filter_map { |material|
    next unless material.present?
    DS::Extractor::Material.new as_recorded: material
  }
end

#extract_mmsid(record) ⇒ `String`

Extracts the MMS ID from the given MARC XML record.

Parameters:

the MARC XML record to extract the MMS ID from

Returns:

the extracted MMS ID



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1134

def extract_mmsid record
  record.xpath("controlfield[@tag=001]").text
end

#extract_name_portion(datafield) ⇒ `String`

Extract the PN from datafield, pulling subfields $a, $b, $c, $d.

Parameters:

the marc:datafield node with the name

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 289

def extract_name_portion datafield
  codes = %w{ a b c d }
  value = collect_subfields datafield, codes: codes
  DS::Util.clean_string value, terminator: ''
end

#extract_named_500(record, name:, strip_name: false) ⇒ `Array<String>`

Return an array of 500$a values that begin with name: (name followed by a colon :). The name prefix is removed if strip_name is true; it’s false by default.

Parameters:

record: the MARC XML record
name: the named prefix, like ‘Binding’, without trailing colon
strip_name (defaults to: false): whether to remove the name prefix from returned comments

Returns:

the matching

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1156

def extract_named_500 record, name:, strip_name: false
  return [] if name.to_s.strip.empty?

  # format the prefix; make sure there's not an extra ':'
  prefix = "#{name.strip.chomp ':'}:"
  xpath  = %Q{datafield[@tag=500]/subfield[@code='a' and starts-with(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), '#{prefix.downcase}')]/text()}
  record.xpath(xpath).map { |d|
    note = d.text.strip
    strip_name ? note.sub(%r{^#{prefix}\s*}i, '') : note
  }
end
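
The prefix handling can be tried on plain strings with the sketch below. The helper name `strip_note_prefix` is hypothetical, and the use of `Regexp.escape` is an addition in this sketch (the method above interpolates the prefix directly), so treat it as a possible hardening rather than the library's behaviour.

```ruby
# Hypothetical helper showing the prefix handling on plain strings.
def strip_note_prefix(note, name)
  # Normalize the name so exactly one trailing colon is present.
  prefix = "#{name.strip.chomp(':')}:"
  # Regexp.escape is an addition in this sketch; it guards against names
  # that contain regular-expression metacharacters.
  note.strip.sub(/\A#{Regexp.escape(prefix)}\s*/i, '')
end

p strip_note_prefix('Binding: Contemporary calf.', 'Binding') # "Contemporary calf."
p strip_note_prefix('binding:  Later vellum.', 'Binding:')    # "Later vellum."
p strip_note_prefix('Layout: 2 columns.', 'Binding')          # "Layout: 2 columns."
```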

#extract_named_subjects(record) ⇒ `Array<DS::Extractor::Subject>`

Extract named subjects from the MARC XML record based on specified tags.

Parameters:

the record to extract named subjects from

Returns:

an array of extracted named subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 453

def extract_named_subjects record
  extract_subject_by_tags record, tags: [600, 610, 611, 630, 647]
end

#extract_named_subjects_as_recorded(record) ⇒ `Array<String>`

Extracts named subjects as recorded from the given record.

Parameters:

the record to extract named subjects from

Returns:

the extracted named subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 444

def extract_named_subjects_as_recorded record
  extract_named_subjects(record).map &:as_recorded
end

#extract_names(record, tags: [], relators: []) ⇒ `Array<DS::Extractor::Name>`

Extract names from the MARC XML record based on specified tags and relators.

Parameters:

record: the record to extract names from
tags (defaults to: []): the MARC field tag
relators (defaults to: []): for 700$e, 710$e, values like ‘former owner’

Returns:

an array of extracted names

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 199

def extract_names record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return [] if xpath.empty? # don't process nonsensical requests

  record.xpath(xpath).map { |datafield|

    as_recorded = extract_name_portion datafield
    role        = extract_role datafield, relators: relators
    role        = 'author' if role.blank?
    vernacular  = extract_pn_agr datafield
    ref         = extract_authority_number datafield

    DS::Extractor::Name.new(
      as_recorded: as_recorded, role: role,
      vernacular:  vernacular, ref: ref
    )
  }
end

#extract_names_as_recorded(record, tags: [], relators: []) ⇒ `String`

Extract names from record using tags and relators. Tags understood are 100, 700, and 710. The relators are used to require datafields based on the contents of a subfield code e containing the specified value, like ‘scribe’:

contains(./subfield[@code ='e'], 'scribe')

Parameters:

a <marc:record> node
(defaults to: [])

the MARC field tag
(defaults to: [])

for 700$e, 710$e, a value like ‘former owner’

Returns:

pipe-separated list of names
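
To make the tag and relator filtering concrete, here is a rough sketch of how such an XPath string could be assembled. `sketch_name_query` is a hypothetical function written for illustration; it is not the library's `build_name_query`, whose actual output may differ. The only behavior taken from this page is the `contains(...)` predicate shown above and the fact that `extract_names` treats an empty query string as a nonsensical request.

```ruby
# Hypothetical query builder; the real build_name_query may differ.
def sketch_name_query(tags: [], relators: [])
  tag_list = Array(tags)
  return '' if tag_list.empty? # nothing sensible to query

  tag_part = tag_list.map { |tag| "@tag = #{tag}" }.join(' or ')
  rel_part = Array(relators).map { |rel|
    "contains(./subfield[@code ='e'], '#{rel}')"
  }.join(' or ')

  return "datafield[#{tag_part}]" if rel_part.empty?

  "datafield[(#{tag_part}) and (#{rel_part})]"
end

scribe_query = sketch_name_query(tags: [700, 710], relators: ['scribe'])
```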

#extract_names_as_recorded_agr(record, tags: [], relators: []) ⇒ `Object`

Extract the alternate graphical representation of the name or return ''.

See MARC specification for 880 fields:

www.loc.gov/marc/bibliographic/bd880.html

Parameters:

a <marc:record> node
(defaults to: [])

the MARC field code
(defaults to: [])

for 700$e, 710$e, a value like ‘former owner’

#extract_notes(record) ⇒ `Array<String>`

Notes

Extract notes from record.

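Extract values from 500$a fields that do not begin with AMREMM tags for specific values like 'Binding:'. Note that the implementation listed for this method selects every 500$a and 561$a value and applies no prefix filter itself. According to this description, the ignored fields are those beginning with: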

Pagination|Foliation|Layout|Colophon|Collation|Script|Decoration|\
     Binding|Origin|Watermarks|Watermark|Signatures|Shelfmark

Parameters:

a <MARC_RECORD> node

Returns:

an array of note strings

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1001

def extract_notes record
  xpath = "datafield[@tag=500 or @tag=561]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |note|
    DS::Util.clean_string note.text.strip.gsub(%r{\s+}, ' ')
  }
end
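
The prefix exclusion described above can be expressed as a small filter over note strings. This is a sketch under assumptions: `general_notes` and `AMREMM_PREFIXES` are hypothetical names, and the sketch assumes each prefix is followed directly by a colon, as in 'Binding:'.

```ruby
# Prefixes taken from the list in the description above.
AMREMM_PREFIXES = %w[
  Pagination Foliation Layout Colophon Collation Script Decoration
  Binding Origin Watermarks Watermark Signatures Shelfmark
].freeze
AMREMM_PATTERN = /\A(#{AMREMM_PREFIXES.join('|')}):/i

# Hypothetical helper: normalize whitespace, then drop prefixed notes.
def general_notes(notes)
  notes.map { |note| note.strip.gsub(/\s+/, ' ') }
       .reject { |note| note.match?(AMREMM_PATTERN) }
end

kept = general_notes(["Binding: calf", "Purchased  in\n 1920.", "watermark: hand"])
```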

#extract_physical_description(record) ⇒ `String`

Extracts the physical description from the given MARC XML record.

Parameters:

the record to extract the physical description from

Returns:

the extracted physical description



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 946

def extract_physical_description record
  extract_extent(record)
end

#extract_places(record) ⇒ `Object`

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 575

def extract_places record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |pn|
    next if pn.to_s.blank?
    as_recorded = DS::Util.clean_string(pn.text, terminator: '')
    DS::Extractor::Place.new as_recorded: as_recorded
  }
end

#extract_pn_agr(datafield) ⇒ `String`

Extract the alternate graphical representation of the name or return ''.

See MARC specification for 880 fields:

www.loc.gov/marc/bibliographic/bd880.html

Input will look like this:

<marc:datafield ind1="1" ind2=" " tag="100">
  <marc:subfield code="6">880-01</marc:subfield>
  <marc:subfield code="a">Urmawī, ʻAbd al-Muʼmin ibn Yūsuf,</marc:subfield>
  <marc:subfield code="d">approximately 1216-1294.</marc:subfield>
</marc:datafield>
<!-- ... -->
<marc:datafield ind1="1" ind2=" " tag="880">
  <marc:subfield code="6">100-01//r</marc:subfield>
  <marc:subfield code="a">ارموي، عبد المؤمن بن يوسف،</marc:subfield>
  <marc:subfield code="d">اپرxمتلي 12161294.</marc:subfield>
</marc:datafield>

Parameters:

the main data field @tag = ‘100’, ‘700’, etc.

Returns:

the text representation of the value

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1041

def extract_pn_agr datafield
  linkage = datafield.xpath("subfield[@code='6']").text
  return '' if linkage.empty?
  tag   = datafield.xpath('./@tag').text
  index = linkage.split('-').last
  xpath = "./parent::record/datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]"
  extract_name_portion datafield.xpath(xpath)
end
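
The linkage lookup can be followed step by step on plain hashes. In this sketch the field hashes, their keys (`:tag`, `:sub6`, `:a`), and `find_agr` are all hypothetical stand-ins for the XML nodes and XPath calls; only the linkage logic (take the occurrence number after the hyphen, then look for an 880 whose $6 contains `tag-index`) follows the listing above.

```ruby
# Hypothetical helper: fields are hashes instead of XML nodes.
def find_agr(datafield, all_fields)
  linkage = datafield[:sub6].to_s
  return '' if linkage.empty?

  index = linkage.split('-').last      # '880-01' => '01'
  key   = "#{datafield[:tag]}-#{index}" # => '100-01'
  match = all_fields.find { |f| f[:tag] == '880' && f[:sub6].to_s.include?(key) }
  match ? match[:a] : ''
end

main   = { tag: '100', sub6: '880-01', a: 'romanized name' }
fields = [
  main,
  { tag: '880', sub6: '245-02//r', a: 'vernacular title' },
  { tag: '880', sub6: '100-01//r', a: 'vernacular name' }
]
agr = find_agr(main, fields)
```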

#extract_production_date_as_recorded(record) ⇒ `Object`

Look for a date as recorded. Look first at 260$c and 260$d, then 264$c, then 245$f; finally use the encoded date from 008.

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 790

def extract_production_date_as_recorded record
  # Note that MARC does not specify a subfield '260$d':
  #
  # https://www.loc.gov/marc/bibliographic/bd260.html
  #
  # However Cornell use $d to continue 260$c
  dar = record.xpath("datafield[@tag=260]/subfield[@code='c' or @code='d']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  dar = record.xpath("datafield[@tag=264]/subfield[@code='c']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  # 245 is the title field but can have a date in $f
  #
  # see: https://www.loc.gov/marc/bibliographic/bd245.html
  #
  # Cornell uses 245$f in records that also lack 260 or 264; see
  # '4600 Bd. Ms. 176':
  #
  # https://catalog.library.cornell.edu/catalog/6382455/librarian_view
  #
  #   <datafield ind1="0" ind2="0" tag="245">
  #     <subfield code="a">Shah-nameh,</subfield>
  #     <subfield code="f">1600s.</subfield>
  #   </datafield>
  #
  dar = record.xpath("datafield[@tag=245]/subfield[@code='f']").text
  return [DS::Util.clean_string(dar)] unless dar.strip.empty?

  encoded_date = extract_date_range record, range_sep: '-'
  [encoded_date.join('_').strip]
end
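
The method is a fallback chain: the first source that yields a non-blank value wins. A minimal sketch of that pattern, assuming the candidate strings have already been extracted in priority order (`first_date_as_recorded` is a hypothetical name):

```ruby
# Hypothetical helper: return the first non-blank candidate, wrapped in
# an array, or [] when every candidate is blank.
def first_date_as_recorded(*candidates)
  candidates.each do |value|
    text = value.to_s.strip
    return [text] unless text.empty?
  end
  []
end

# Order mirrors the listing: 260$c/$d, 264$c, 245$f, encoded date.
from_245f    = first_date_as_recorded('', nil, '1600s.', '1600-1699')
from_encoded = first_date_as_recorded('', '', '', '1600-1699')
```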

#extract_production_places_as_recorded(record) ⇒ `Array<String>`

Look for places as recorded in 260$a and 264$a; return [] when no value is found.

Parameters:

the MARC record

Returns:

the place name or []

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 551

def extract_production_places_as_recorded record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |pn|
    DS::Util.clean_string pn.text, terminator: '' unless pn.to_s.strip.empty?
  }
end
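
One Ruby detail in the block above is worth illustrating: an expression with a trailing `unless` evaluates to `nil` when the condition holds, so `map` keeps a `nil` slot for each blank value. The sketch below uses plain strings in place of XML text nodes; whether callers of the library method need to call `compact` is an assumption, not something this page states.

```ruby
values = ['Paris,', '   ', 'Lyon']

# The block returns nil for the blank entry, so the array keeps a nil.
mapped = values.map { |v| v.strip unless v.strip.empty? }

# compact drops the nil entries if they are unwanted.
cleaned = mapped.compact
```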

#extract_recon_genres(record, sub_sep: '--') ⇒ `Array<Array>`

Extract genre terms for reconciliation CSV output.

Returns a two-dimensional array; each row is a genre term, and each row has three columns: term, vocab, and authority number.

Parameters:

a <MARC_RECORD> node

Returns:

an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 498

def extract_recon_genres record, sub_sep: '--'
  extract_genres(record, sub_sep: sub_sep).map(&:to_a)
end

#extract_recon_names(record, tags: [], relators: []) ⇒ `Array<Array<String>>`

For the given record, extract the names as an array of arrays, including the concatenated name string (subfields a, b, c, d) and, if present, the alternate graphical representation (AGR) and authority number (or URI).

Each returned sub array will have three values: name, name AGR, URI.

Parameters:

a <marc:record> node
(defaults to: [])

the MARC field tag
(defaults to: [])

for 700$e, 710$e, a value like ‘former owner’

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 189

def extract_recon_names record, tags: [], relators: []
  extract_names(record, tags: tags, relators: relators).map &:to_a
end
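
The `map &:to_a` idiom turns each name object into one row of values. The sketch below uses a hypothetical `NameRow` struct with the three columns described above; the real `DS::Extractor::Name#to_a` is not shown on this page and may return different columns. The URI is a placeholder.

```ruby
# Hypothetical stand-in for DS::Extractor::Name.
NameRow = Struct.new(:as_recorded, :vernacular, :ref)

names = [
  NameRow.new('Name as recorded', 'vernacular form', 'https://example.org/authority/1'),
  NameRow.new('Another name', '', '')
]

# Each struct becomes one row: [name, name AGR, URI].
rows = names.map(&:to_a)
```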

#extract_recon_places(record) ⇒ `Array<Array>`

Extract the places of production (MARC 260$a and 264$a) for reconciliation CSV output.

Returns a two-dimensional array, each row is a place; and each row has one column: place name; for example:

[["Austria"],
 ["Germany"],
 ["France (?)"]]

Parameters:

a <marc:record> node

Returns:

an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 571

def extract_recon_places record
  extract_places(record).map &:to_a
end

#extract_recon_subjects(record) ⇒ `Array`

Extracts subjects from the given record for reconciliation CSV output.

Parameters:

the record to extract subjects from

Returns:

the extracted subjects as an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 538

def extract_recon_subjects record
  extract_all_subjects(record).map &:to_a
end

#extract_recon_titles(record) ⇒ `Array<String>`

Extracts titles from the given record for reconciliation CSV output.

Parameters:

the record to extract titles from

Returns:

the extracted titles for reconciliation



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 835

def extract_recon_titles record
  extract_titles(record).to_a
end

#extract_role(datafield, relators:) ⇒ `String`

Extract the role value, subfield $e, from the given datafield.

Parameters:

the marc:datafield node with the name

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 300

def extract_role datafield, relators:
  relators_list = *relators
  return '' if relators_list.empty? or relators_list.include? :none
  # if there's no $e, stop processing
  return '' if datafield.xpath('subfield[@code = "e"]/text()').text.empty?

  df_roles    = datafield.xpath('subfield[@code = "e"]/text()').map(&:text)
  rel_pattern = /(#{relators_list.join('|')})/
  role        = df_roles.find { |role| role =~ rel_pattern }
  DS::Util.clean_string role, terminator: ''
end
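
The role selection can be sketched on plain strings. `pick_role` is a hypothetical helper; it assumes the $e values are already extracted, and it approximates `DS::Util.clean_string` (whose exact behavior is not documented here) by trimming a trailing comma or period. It also builds the pattern with `Regexp.union`, which escapes special characters, whereas the listing interpolates the relators directly.

```ruby
# Hypothetical helper: choose the first $e value matching a relator.
def pick_role(df_roles, relators)
  list = Array(relators)
  return '' if list.empty? || list.include?(:none)
  return '' if df_roles.empty? # no $e, nothing to match

  pattern = Regexp.union(list.map(&:to_s))
  role    = df_roles.find { |candidate| candidate =~ pattern }
  # Assumption: cleaning amounts to trimming trailing punctuation.
  role.to_s.strip.sub(/[,.]\z/, '')
end

scribe = pick_role(['former owner.', 'scribe,'], ['scribe'])
```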

#extract_scribes(record) ⇒ `Array<DS::Extractor::Name>`

Extract scribes from the given record.

Parameters:

the record to extract scribes from

Returns:

the extracted scribes

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 70

def extract_scribes record
  extract_names(
    record, tags: [700, 710, 711], relators: ['scribe']
  )
end

#extract_scribes_as_recorded(record) ⇒ `Array<String>`

Extract scribes as recorded from the given record.

Parameters:

the record to extract scribes from

Returns:

the extracted scribes as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 80

def extract_scribes_as_recorded record
  extract_scribes(record).map &:as_recorded
end

#extract_scribes_as_recorded_agr(record) ⇒ `Array<String>`

Extracts scribes as recorded with vernacular form from the given record.

Parameters:

the record to extract scribes from

Returns:

the extracted scribes as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 88

def extract_scribes_as_recorded_agr record
  extract_scribes(record).map &:vernacular
end

#extract_subject_by_tags(record, tags: []) ⇒ `Array<DS::Extractor::Subject>`

Return an array of subjects formatted from the given tags (for example 600, 610, 611, 630, 647, 648, 650, and 651). Subject values are separated by '--':

<datafield ind1="1" ind2="0" tag="600">
  <subfield code="a">Cicero, Marcus Tullius</subfield>
  <subfield code="x">Spurious and doubtful works.</subfield>
</datafield>

# => "Cicero, Marcus Tullius--Spurious and doubtful works"

Subfields with codes ‘b’, ‘c’, ‘d’, ‘p’, ‘q’, and ‘t’ are appended to the preceding subfield:

  <datafield ind1=" " ind2="7" tag="647">
    <subfield code="a">Conspiracy of Catiline</subfield>
    <subfield code="c">(Rome :</subfield>
    <subfield code="d">65-62 B.C.)</subfield>
    <subfield code="2">fast</subfield>
    <subfield code="0">(OCoLC)fst01352536</subfield>
  </datafield>

  # => "Conspiracy of Catiline (Rome : 65-62 B.C.)"

Parameters:

the MARC record (a Nokogiri::XML::Node)

Returns:

an array of formatted subjects strings

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 398

def extract_subject_by_tags record, tags: []
  tag_list = *tags
  raise "No tags given for subject extraction: #{tags.inspect}" if tag_list.empty?
  sep       = '--'
  tag_query = tag_list.map { |tag| "@tag=#{tag}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    values = Hash.new { |hash, k| hash[k] = [] }
    vocab  = datafield.xpath('./@ind2').text
    datafield.xpath("subfield").map { |subfield|
      subfield_text = DS::Util.clean_string subfield.text
      subfield_code = subfield.xpath('./@code').text
      case subfield_code
      when 'e', 'w'
        # don't include these formatted in subject
      when 'b', 'c', 'd', 'p', 'q', 't'
        # append these to the preceding value
        # we assume that there is a preceding value
        values[:terms][-1] += " #{subfield_text}"
        values[:codes][-1] += ";#{subfield_code}"
      when %r{\A[[:alpha:]]\z}
        # any other codes: a, g, v, x, y, z
        values[:terms] << subfield_text
        values[:codes] << subfield_code
      when '2'
        vocab = subfield.text
      when '0'
        values[:urls] << subfield_text
      end
    }
    terms = DS::Util.clean_string values[:terms].join(sep), terminator: ''
    urls  = DS::Util.clean_string values[:urls].join(sep), terminator: ''
    codes = DS::Util.clean_string values[:codes].join(sep), terminator: ''
    DS::Extractor::Subject.new(
      as_recorded:          terms,
      subfield_codes:       codes,
      source_authority_uri: urls,
      vocab:                vocab
    )
  }

end
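
The term assembly rules (skip $e and $w, append $b, $c, $d, $p, $q, $t to the preceding term, start a new term for any other letter code) can be exercised without XML. In this sketch `format_subject` is a hypothetical helper that takes `[code, text]` pairs; unlike the listing, it guards against an appended subfield that has no preceding term.

```ruby
APPEND_CODES = %w[b c d p q t].freeze
SKIP_CODES   = %w[e w].freeze

# Hypothetical helper: subfields is an array of [code, text] pairs.
def format_subject(subfields, sep: '--')
  terms = []
  subfields.each do |code, text|
    next if SKIP_CODES.include?(code)

    if APPEND_CODES.include?(code)
      # Append to the preceding term; start a term if there is none.
      terms.empty? ? terms << text : terms[-1] += " #{text}"
    elsif code.match?(/\A[[:alpha:]]\z/)
      terms << text # a, g, v, x, y, z and other letter codes
    end
    # Numeric codes such as '2' and '0' are not part of the term string.
  end
  terms.join(sep)
end

catiline = format_subject([
  ['a', 'Conspiracy of Catiline'], ['c', '(Rome :'], ['d', '65-62 B.C.)'],
  ['2', 'fast'], ['0', '(OCoLC)fst01352536']
])
cicero = format_subject([['a', 'Cicero, Marcus Tullius'], ['x', 'Spurious and doubtful works']])
```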

#extract_subjects(record) ⇒ `Array<DS::Extractor::Subject>`

Extracts subjects from the given record based on specified tags.

Parameters:

the record to extract subjects from

Returns:

an array of extracted subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 469

def extract_subjects record
  extract_subject_by_tags record, tags: [648, 650, 651]
end

#extract_subjects_as_recorded(record) ⇒ `Array<String>`

Extracts subjects as recorded from the given record.

Parameters:

the record to extract subjects from

Returns:

the extracted subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 461

def extract_subjects_as_recorded record
  extract_subjects(record).map &:as_recorded
end

#extract_titles(record) ⇒ `Array<DS::Extractor::Title>`

Extracts titles from the given record.

Parameters:

the record to extract titles from

Returns:

an array of extracted titles

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 843

def extract_titles record
  tar      = title_as_recorded record
  tar_agr  = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.title_as_recorded_agr(record, 245), terminator: ''
  utar     = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_titles_as_recorded(record), terminator: ''
  utar_agr = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_title_as_recorded_agr(record), terminator: ''

  [DS::Extractor::Title.new(
    as_recorded:              tar,
    vernacular:               tar_agr,
    uniform_title:            utar,
    uniform_title_vernacular: utar_agr
  )]
end
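
Several methods on this page (`extract_titles_as_recorded`, `extract_titles_as_recorded_agr`, and the uniform title variants) are thin projections over the array returned here. A sketch of that pattern, using a hypothetical `TitleRow` struct in place of `DS::Extractor::Title` and placeholder values:

```ruby
# Hypothetical stand-in for DS::Extractor::Title.
TitleRow = Struct.new(
  :as_recorded, :vernacular, :uniform_title, :uniform_title_vernacular,
  keyword_init: true
)

titles = [
  TitleRow.new(
    as_recorded: 'Shah-nameh', vernacular: 'vernacular title',
    uniform_title: '', uniform_title_vernacular: ''
  )
]

# Each derived method maps the same array to a single attribute.
as_recorded = titles.map(&:as_recorded)
vernacular  = titles.map(&:vernacular)
```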

#extract_titles_as_recorded(record) ⇒ `Array<String>`

Extracts titles as recorded from the given record.

Parameters:

the record to extract titles from

Returns:

the extracted titles as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 893

def extract_titles_as_recorded record
  extract_titles(record).map &:as_recorded
end

#extract_titles_as_recorded_agr(record) ⇒ `Array<String>`

Extracts titles as recorded with vernacular form from the given record.

Parameters:

the record to extract titles from

Returns:

the extracted titles as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 861

def extract_titles_as_recorded_agr record
  extract_titles(record).map &:vernacular
end

#extract_uniform_titles_as_recorded(record) ⇒ `Array<String>`

Extracts uniform titles as recorded from the given record.

Parameters:

the record to extract uniform titles from

Returns:

the extracted uniform titles as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 913

def extract_uniform_titles_as_recorded record
  extract_titles(record).map &:uniform_title
end

#extract_uniform_titles_as_recorded_agr(record) ⇒ `Array<String>`

Extracts uniform titles as recorded with vernacular form from the given MARC XML record.

Parameters:

the record to extract uniform titles from

Returns:

the extracted uniform titles as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 922

def extract_uniform_titles_as_recorded_agr record
  extract_titles(record).map &:uniform_title_vernacular
end

#extract_vocabulary(datafield) ⇒ `String`

Parameters:

the term datafield

Returns:

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1095

def extract_vocabulary datafield
  return 'lcsh' if datafield['ind2'] == '0'

  vocab = datafield.xpath("subfield[@code=2]").text
  vocab.chomp '.' if vocab.present?
end
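
The decision in this method is small enough to restate on plain values. `vocabulary_for` is a hypothetical helper taking the second indicator and the $2 text as strings. Like the listing, it returns `nil` when the indicator is not '0' and no $2 value is present, which callers may want to account for.

```ruby
# Hypothetical helper mirroring the ind2 / $2 decision.
def vocabulary_for(ind2:, sub2:)
  return 'lcsh' if ind2 == '0'

  text = sub2.to_s
  text.empty? ? nil : text.chomp('.')
end

lcsh = vocabulary_for(ind2: '0', sub2: '')
fast = vocabulary_for(ind2: '7', sub2: 'fast.')
none = vocabulary_for(ind2: '7', sub2: '')
```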

#handle_bce_date(record) ⇒ `Array<String>`

Compiles BCE dates based on the provided record. It extracts BCE dates from specific subfields in the MARC XML record.

The method stops and returns an empty array [] if the record lacks a 046$b (BCE date 1). It then looks for a 046$d (BCE date 2) or 046$e (CE date 2). It returns an array containing either the single 046$b value as a negative value or a range of two dates.

See: www.loc.gov/marc/bibliographic/bd046.html

Parameters:

the MARC XML record

Returns:

an array of BCE dates in string format

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 749

def handle_bce_date record
  # "datafield[@tag=260]/subfield[@code='c' or @code='d']/text()")
  bce_date1 = record.at_xpath('datafield[@tag=046]/subfield[@code="b"]/text()').to_s
  # stop if there's no BCE date 1
  return [] if bce_date1.blank?

  xpath     = 'datafield[@tag=046]/subfield[@code="d"]/text()'
  bce_date2 = record.at_xpath(xpath).to_s

  return ["-#{bce_date1}", "-#{bce_date2}"] if bce_date2.present?

  xpath    = 'datafield[@tag=046]/subfield[@code="e"]/text()'
  ce_date2 = bce_date2 = record.at_xpath(xpath).to_s
  return ["-#{bce_date1}", ce_date2] if ce_date2.present?

  ["-#{bce_date1}"]
end
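
The three possible outcomes can be checked with a sketch that takes the 046 subfield values as plain strings. `bce_dates` and its keyword names are hypothetical; the branching follows the listing above.

```ruby
# Hypothetical helper: b, d, e are the 046$b, 046$d, 046$e values.
def bce_dates(b:, d: nil, e: nil)
  return [] if b.to_s.empty?                       # no BCE date 1
  return ["-#{b}", "-#{d}"] unless d.to_s.empty?   # BCE to BCE range
  return ["-#{b}", e] unless e.to_s.empty?         # BCE to CE range

  ["-#{b}"]                                        # single BCE date
end

bce_range = bce_dates(b: '300', d: '250')
mixed     = bce_dates(b: '50', e: '20')
single    = bce_dates(b: '100')
```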

#title_as_recorded(record) ⇒ `String`

Extracts the title as recorded from the given record.

Parameters:

the record to extract the title from

Returns:

the extracted title as recorded

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 869

def title_as_recorded record
  xpath = "datafield[@tag=245]/subfield[@code='a' or @code='b']"
  record.xpath(xpath).map { |title|
    DS::Util.clean_string(title.text, terminator: '')
  }.join '; '
end

#title_as_recorded_agr(record, tag) ⇒ `String`

Extracts the title as recorded with vernacular form from the given record.

Parameters:

the record to extract the title from
the tag to use for extraction

Returns:

the extracted title as recorded with vernacular form

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 881

def title_as_recorded_agr record, tag
  linkage = record.xpath("datafield[@tag=#{tag}]/subfield[@code='6']").text
  return '' if linkage.empty?
  index = linkage.split('-').last
  xpath = "datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]/subfield[@code='a']"
  DS::Util.clean_string record.xpath(xpath).text.delete '[]'
end

#uniform_title_as_recorded_agr(record) ⇒ `String`

Extracts the alternate graphical representation (AGR) of the uniform titles in 240 and 130 from the given MARC XML record.

Parameters:

the MARC XML record to extract uniform titles from

Returns:

the vernacular forms of the uniform titles joined by '|'

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 930

def uniform_title_as_recorded_agr record
  tag240 = title_as_recorded_agr record, 240
  tag130 = title_as_recorded_agr record, 130
  [tag240, tag130].reject(&:empty?).map { |title|
    DS::Util.clean_string title
  }.join '|'
end

#uniform_titles_as_recorded(record) ⇒ `String`

Extracts uniform titles as recorded from the given record.

Parameters:

the record to extract uniform titles from

Returns:

the extracted uniform titles as recorded joined by ‘|’

# File 'lib/ds/extractor/marc_xml_extractor.rb', line 901

def uniform_titles_as_recorded record
  title_240 = record.xpath("datafield[@tag=240]/subfield[@code='a']").text
  title_130 = record.xpath("datafield[@tag=130]/subfield[@code='a']").text
  [title_240, title_130].reject(&:empty?).map { |title|
    DS::Util.clean_string title, terminator: ''
  }.join '|'
end
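
Both uniform title methods share the same reject-then-join pattern. A minimal sketch, assuming the 240$a and 130$a text has already been extracted; `join_uniform_titles` is a hypothetical name, and the sketch only strips whitespace instead of calling `DS::Util.clean_string`.

```ruby
# Hypothetical helper: drop empty titles, join the rest with '|'.
def join_uniform_titles(title_240, title_130, sep: '|')
  [title_240, title_130].map(&:strip).reject(&:empty?).join(sep)
end

both     = join_uniform_titles('Title from 240', 'Title from 130')
only_240 = join_uniform_titles('Title from 240', '')
neither  = join_uniform_titles('', '')
```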
