Module: DS::Extractor::MarcXmlExtractor::ClassMethods

Included in:
DS::Extractor::MarcXmlExtractor
Defined in:
lib/ds/extractor/marc_xml_extractor.rb

Instance Method Details

#build_name_query(tags: [], relators: []) ⇒ String

Build a name query from tags and relators. Tags understood are 100, 700, and 710. Relators restrict the match to datafields whose subfield code e contains the specified value, like ‘scribe’:

contains(./subfield[@code ='e'], 'scribe')

For relators, see the section “$e - Relator term” here:

https://www.loc.gov/marc/bibliographic/bdx00.html

To require that the datafield have no relator subfield, pass :none as the relator value.

build_name_query tags: ['100'], relators: :none

This will add the following to the query.

not(./subfield[@code = 'e'])

Note: In U. Penn manuscript catalog records, 700 and 710 fields that do not have a subfield code e are associated authors.

Parameters:

  • (defaults to: [])

    the MARC field code

  • (defaults to: [])

    for 700$e, 710$e, a value like ‘former owner’

Returns:

  • the data field query string



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 264

def build_name_query tags: [], relators: []
  return '' if tags.empty? # don't process nonsensical requests
  # make sure the tags are all strings
  _tags        = [tags].flatten.map &:to_s
  tag_query    = _tags.map { |t| "@tag = #{t}" }.join " or "
  query_string = "(#{tag_query})"

  _relators = [relators].flatten.map { |r| r.to_s.strip.downcase == 'none' ? :none : r }
  return "datafield[#{query_string}]" if _relators.empty?

  if _relators.include? :none
    query_string += " and not(./subfield[@code = 'e'])"
    return "datafield[#{query_string}]"
  end

  # use the normalized _relators so a single (non-array) relator also works
  relator_string = _relators.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }.join " or "
  query_string   += (relator_string.empty? ? '' : " and (#{relator_string})")
  "datafield[#{query_string}]"
end
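
The query strings described above can be illustrated with a standalone sketch. The helper name_query_sketch below is a hypothetical re-implementation written only for illustration; it follows the listing above but is not the library method, and it needs neither Nokogiri nor a MARC record.

```ruby
# Hypothetical stand-in for build_name_query; illustration only.
def name_query_sketch(tags: [], relators: [])
  tags = Array(tags).map(&:to_s)
  return '' if tags.empty?

  conditions = ["(#{tags.map { |t| "@tag = #{t}" }.join(' or ')})"]
  relators   = Array(relators).map { |r| r.to_s.strip }

  if relators.any? { |r| r.downcase == 'none' }
    # :none (or 'none') requires that subfield $e be absent
    conditions << "not(./subfield[@code = 'e'])"
  elsif relators.any?
    tests = relators.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }
    conditions << "(#{tests.join(' or ')})"
  end

  "datafield[#{conditions.join(' and ')}]"
end

puts name_query_sketch(tags: [700, 710], relators: ['scribe'])
# datafield[(@tag = 700 or @tag = 710) and (contains(./subfield[@code ='e'], 'scribe'))]
puts name_query_sketch(tags: ['100'], relators: :none)
# datafield[(@tag = 100) and not(./subfield[@code = 'e'])]
```

Because the relator test uses contains, a relator such as ‘owner’ would also match a subfield containing ‘former owner’.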

#collect_datafields(record, tags: [], codes: [], field_sep: '|', sub_sep: ' ') ⇒ Array<Array>

Extract the values of the subfields given by codes from each datafield matching tags.

Parameters:

  • a <marc:record> node

  • (defaults to: [])

    the MARC datafield tag(s)

  • (defaults to: [])

    the MARC subfield code(s)

  • (defaults to: '|')

    separator for joining multiple datafield values

  • (defaults to: ' ')

    separator for joining subfield values

Returns:

  • an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1083

def collect_datafields record, tags: [], codes: [], field_sep: '|', sub_sep: ' '
  _tags     = [tags].flatten.map &:to_s
  tag_query = _tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    DS::Util.clean_string value, terminator: ''
  }
end

#collect_recon_datafields(record, tags: [], codes: [], sub_sep: ' ') ⇒ Array<Array>

Extract datafield values, with authority numbers (URLs) when present, for reconciliation CSV output.

Parameters:

  • a <marc:record> node

  • (defaults to: [])

    the MARC datafield tag(s)

  • (defaults to: [])

    the MARC subfield code(s)

  • (defaults to: ' ')

    separator for joining subfield values

Returns:

  • an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1063

def collect_recon_datafields record, tags: [], codes: [], sub_sep: ' '
  _tags     = [tags].flatten.map &:to_s
  tag_query = _tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value  = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    value  = DS::Util.clean_string value, terminator: ''
    # subfields are identified by @code (not @tag); $0 holds the authority number
    number = datafield.xpath('subfield[@code="0"]').text
    [value, number]
  }
end

#collect_subfields(datafield, codes: [], sub_sep: ' ') ⇒ String

Collect the values of the subfields of the given datafield that match the specified codes, joined with sub_sep.

Parameters:

  • the datafield to collect subfields from

  • (defaults to: [])

    the MARC subfield code(s) to collect

  • (defaults to: ' ')

    the separator for joining subfield values

Returns:

  • the concatenated subfield values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1108

def collect_subfields datafield, codes: [], sub_sep: ' '
  # ensure that +codes+ is an array of strings
  _codes = [codes].flatten.map &:to_s
  # Code query example: ['a', 'b', 'd', 'c'] => @code = 'a' or @code = 'b' or @code = 'd' or @code = 'c'
  code_query = _codes.map { |code| "@code = '#{code}'" }.join ' or '
  xpath      = %Q{subfield[#{code_query}]}
  DS::Util.clean_string datafield.xpath(xpath).map(&:text).reject(&:empty?).join sub_sep
end
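
One consequence of selecting subfields with a single XPath expression is that values come back in document order, not in the order of the codes argument. The sketch below models this in plain Ruby. The [code, text] pair representation and the name join_subfields_sketch are assumptions made for illustration, and the final DS::Util.clean_string step is omitted because its behavior is not shown here.

```ruby
# Hypothetical model: a datafield as [code, text] pairs in document order.
def join_subfields_sketch(subfields, codes: [], sub_sep: ' ')
  wanted = Array(codes).map(&:to_s)
  subfields
    .select { |code, _text| wanted.include?(code) } # keep requested codes only
    .map { |_code, text| text }
    .reject(&:empty?)                               # drop empty values
    .join(sub_sep)
end

title_field = [['a', 'Shah-nameh,'], ['c', ''], ['f', '1600s.']]

puts join_subfields_sketch(title_field, codes: %w[f a])
# Shah-nameh, 1600s.
puts join_subfields_sketch(title_field, codes: %w[a c f], sub_sep: '--')
# Shah-nameh,--1600s.
```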

#compile_dates(record, code, part1, part2) ⇒ Object

Compiles dates based on the provided code and parts. This method determines the date based on the date type code from the MARC 008 field, that is, the character at position 6 of that field.

See https://www.loc.gov/marc/bibliographic/bd008.html

Parameters:

  • the marc:record node

  • the marc 008 date code



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 723

def compile_dates record, code, part1, part2
  case code
  when 'i', 'k', 'm', 'p', 'q', '|'
    [part1, part2]
  when 'n'
    []
  when 'b'
    handle_bce_date record
  else
    [part1]
  end
end

#extract_001_control_number(record, holdings_file = nil) ⇒ String

Extracts the 001 control number from the given MARC XML record and joins non-empty values with ‘|’.

Parameters:

  • the MARC XML record to extract the control number from

  • (defaults to: nil)

    (optional) the holdings file

Returns:

  • the extracted 001 control number joined with ‘|’



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1122

def extract_001_control_number record, holdings_file = nil
  ids = []
  # add the MMS ID
  ids << extract_mmsid(record)

  ids.reject(&:empty?).join '|'
end

#extract_acknowledgments(record) ⇒ Array

Extracts acknowledgments from the given record.

Parameters:

  • the record to extract acknowledgments from

Returns:

  • the extracted acknowledgments



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1142

def extract_acknowledgments record
  []
end

#extract_all_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts all subjects from the given record, including named subjects and subjects.

Parameters:

  • the record to extract all subjects from

Returns:

  • all extracted subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 477

def extract_all_subjects record
  extract_named_subjects(record) + extract_subjects(record)
end

#extract_all_subjects_as_recorded(record) ⇒ Array<String>

Extracts all subjects as recorded from the given record.

Parameters:

  • the record to extract all subjects from

Returns:

  • all extracted subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 485

def extract_all_subjects_as_recorded record
  extract_all_subjects(record).map &:as_recorded
end

#extract_artists(record) ⇒ Array<DS::Extractor::Name>

Extracts artists from the given record: names in 700, 710, and 711 fields whose relator ($e) contains ‘artist’ or ‘illuminator’.

Parameters:

  • the record to extract artists from

Returns:

  • an array of extracted artists



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 96

def extract_artists record
  extract_names(
    record, tags: [700, 710, 711],
    relators:     ['artist', 'illuminator']
  )
end

#extract_artists_as_recorded(record) ⇒ Array<String>

Extracts artists as recorded from the given record.

Parameters:

  • the record to extract artists from

Returns:

  • the extracted artists as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 107

def extract_artists_as_recorded record
  extract_artists(record).map &:as_recorded
end

#extract_artists_as_recorded_agr(record) ⇒ Array<String>

Extracts artists as recorded with vernacular form from the given record.

Parameters:

  • the record to extract artists from

Returns:

  • the extracted artists as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 115

def extract_artists_as_recorded_agr record
  extract_artists(record).map &:vernacular
end

#extract_associated_agents(record) ⇒ Object



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 173

def extract_associated_agents record
  []
end

#extract_authority_number(datafield) ⇒ String

Extract the authority number, subfield $0 from the given datafield.

Parameters:

  • the marc:datafield node with the name

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1013

def extract_authority_number datafield
  xpath = "./subfield[@code='0']"
  datafield.xpath(xpath).text
end

#extract_authors(record) ⇒ Array<DS::Extractor::Name>

Extracts authors from the given record.

Parameters:

  • the record to extract authors from

Returns:

  • an array of extracted authors



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 168

def extract_authors record
  extract_names(record, tags: [100, 110, 111]) +
    extract_names(record, tags: [700, 710, 711], relators: %w{author})
end

#extract_authors_as_recorded(record) ⇒ Array<String>

Extract names from record using tags and relators. Authors are extracted from datafields 100, 110, 111, 700, 710, and 711.

All 1xx fields are extracted; no relator is required, and all 1xx names are assumed to be authors.

700, 710, and 711 are extracted when subfield 7xx$e contains ‘author’.

Parameters:

  • a <marc:record> node

Returns:

  • list of names




# File 'lib/ds/extractor/marc_xml_extractor.rb', line 44

def extract_authors_as_recorded record
  authors = []
  authors += extract_names_as_recorded record, tags: [100, 110, 111]
  authors += extract_names_as_recorded record, tags: [700, 710, 711], relators: %w{author}
  authors
end

#extract_authors_as_recorded_agr(record) ⇒ Array<String>

Extract the alternate graphical representation of the name or return [].

See MARC specification for 880 fields:

Parameters:

  • a <marc:record> node

Returns:

  • list of names or []



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 59

def extract_authors_as_recorded_agr record
  authors = []
  authors += extract_names_as_recorded_agr record, tags: [100, 110, 111]
  authors += extract_names_as_recorded_agr record, tags: [700, 710, 711], relators: %w{author}
  authors
end

#extract_cataloging_convention(record) ⇒ Object



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1050

def extract_cataloging_convention record
  record.xpath('datafield[@tag=040]/subfield[@code="e"]/text()').text
end

#extract_date_part(datestring, ndx1, ndx2) ⇒ String

Extracts a part of the date string from a MARC 008 controlfield, starting at index ndx1 and taking up to ndx2 characters.

Ensures that the extracted part starts with a digit and matches a sequence of digits and/or ‘u’.

Parameters:

  • the input datestring

  • the starting index for extraction

  • the length of the substring to extract

Returns:

  • the extracted part of the datestring



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 777

def extract_date_part datestring, ndx1, ndx2
  part = datestring[ndx1, ndx2]

  # part must start with a digit and match a seq of digits and/or u
  return unless part =~ /^\d[\du]+/

  part.sub! /^0+/, '' if part =~ /^0+[1-9]/
  part
end
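
The start-and-length slicing and the leading-zero handling can be tried in isolation. The following sketch assumes nothing beyond plain Ruby; date_part_sketch is a hypothetical stand-in that follows the listing above but returns a new string instead of modifying the slice in place.

```ruby
# Hypothetical stand-in for extract_date_part; illustration only.
def date_part_sketch(datestring, start, length)
  part = datestring[start, length]
  # part must start with a digit and continue with digits and/or u
  return nil unless part =~ /^\d[\du]+/

  # strip leading zeros only when a non-zero digit follows them
  part =~ /^0+[1-9]/ ? part.sub(/^0+/, '') : part
end

date_str = 'm0618193u' # date type code plus Date 1 and Date 2

p date_part_sketch(date_str, 1, 4)    # "618"
p date_part_sketch(date_str, 5, 8)    # "193u"; the length is clamped to the 4 remaining characters
p date_part_sketch('quuuu1597', 1, 4) # nil; 'uuuu' does not start with a digit
```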

#extract_date_range(record, range_sep:) ⇒ Array

Extract the encoded date from controlfield 008.

Follows

Returns an array containing a pair of dates or a single date, or an empty array.

The following date types have appeared in MARC records contributed to DS as of 2024-02-27 and are handled here:

b - No dates given; B.C. date involved

- 'b        '
- date is taken from 046$b, and if present $d or $e
- See: https://www.loc.gov/marc/bibliographic/bd046.html

e - Detailed date

- 'e11200520', 'e139403 x', 'e164509 t', 'e167707 y',
  'e187505 s'
- the first date part is returned as a single year

i - Inclusive dates of collection

- 'i07500800', 'i08000830', 'i1000    '
- the first and -- if present -- second date part are
  returned as two years

k - Range of years of bulk of collection

- 'k15121716'
- the first and second date parts are returned as two years

m - Multiple dates

- 'm0618193u', 'm07390741', 'm10751200', 'm16uu1637',
  'm17uu1900'
- the first and second date parts are returned as two years
- see note below on replacement of u's

n - Dates unknown

- 'nuuuuuuuu'
- no date returned

p - Date of distribution/release/issue and production/recording session when different

- 'p1400    '
- the first and -- if present -- second date part are
  returned as two years

q - Questionable date

- 'q01000299', 'q0979    ', 'q09910992', 'q10001099',
  'q1300    ', 'q13uu14uu', 'q13uu1693', 'q14011425',
  'q1425uuuu', 'q1450    ', 'q1460    ', 'q14uu14uu',
  'quuuu1597'
- the first and -- if present -- second date part are
  returned as two years
- if the second date part is 'uuuu', the first date part is
  returned as year; 'q1425uuuu' => 1425
- if the first date part is 'uuuu', the second date part is
  returned as year; 'quuuu1597' => 1597
- for partial date parts with u's, see the note below

r - Reprint/reissue date and original date

- 'r11751199'
- the first date part is returned as a single year

s - Single known date/probable date

- 's1171    ', 's1171 xx ', 's1192 ua ', 's1250||||',
  's1286 iq ', 's1315 sy ', 's1366 is ', 's1436 gw ',
  's1450 it ', 's1470 ly ', 's1470 tu ', 's1470 uuu',
  's1497 enk', 's1595 sp ', 's19uu    '
- the first date part is returned as a single year
- see note below on replacement of u's

| - No attempt to code

- '|12501300'
- this appears to be miscoding
- nevertheless, compile_dates groups '|' with the two-date
  codes (i, k, m, p, q), so the first and -- if present --
  second date part are returned as two years

The following cases, so far unrepresented in contributor data, will follow the default rule: date part one will be returned as a single year.

c - Continuing resource currently published
d - Continuing resource ceased publication
t - Publication date and copyright date
u - Continuing resource status unknown

Note on the replacement of u’s in partial year dates

- Where u's appear in the first date they are replaced by 0;
  thus, 'q13uu1693'  => '1300, 1693'
- Where u's appear in the second date they are replaced by 9;
  thus, 'q14uu14uu'  => '1400, 1499'

For MARC partial dates see Date 1 and Date 2 documentation here

https://www.loc.gov/marc/bibliographic/bd008a.html

Parameters:

  • the marc:record node

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 692

def extract_date_range record, range_sep:
  # 008 controlfield; e.g.,
  #
  #     "220518q14001500xx            000 0     d"
  ctrl_008 = record.at_xpath("controlfield[@tag='008']")
  return [] unless ctrl_008 # return if no 008
  # get positions 7-15: q14001500
  date_str = ctrl_008.text[6, 9]
  code     = date_str[0] # 'm'
  part1    = extract_date_part date_str, 1, 4 # '0618'
  part1.gsub! /u/, '0' if part1.present?
  part2 = extract_date_part date_str, 5, 8 # '193u'
  part2.gsub! /u/, '9' if part2.present?

  range = compile_dates(record, code, part1, part2).filter_map { |y|
    # filter out blank dates and '9999'
    y if y.present? && y != '9999'
  }

  return [] if range.blank?
  [range.join(range_sep)]
end
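
The handling of the date type codes can be summarized in a self-contained sketch. This is an illustration under stated assumptions, not the library code: date_range_sketch and sketch_part are hypothetical names, the input is the raw 008 string and not a record node, plain Ruby replaces the ActiveSupport present? and blank? calls, and code 'b' (which needs field 046) simply yields no date here.

```ruby
# Hypothetical, simplified model of the 008 date handling described above.
TWO_PART_CODES = %w[i k m p q |].freeze

def sketch_part(date_str, start, length)
  part = date_str[start, length]
  return nil unless part =~ /\A\d[\du]+/

  part.sub(/\A0+(?=[1-9])/, '') # drop leading zeros before a non-zero digit
end

def date_range_sketch(field_008, range_sep: '-')
  date_str = field_008[6, 9] # e.g. 'q14001500'
  code     = date_str[0]
  part1    = sketch_part(date_str, 1, 4)&.tr('u', '0') # u => 0 in Date 1
  part2    = sketch_part(date_str, 5, 4)&.tr('u', '9') # u => 9 in Date 2

  years = case code
          when *TWO_PART_CODES then [part1, part2]
          when 'n', 'b' then [] # 'b' would consult field 046; omitted in this sketch
          else [part1]
          end
  years = years.compact.reject { |y| y.empty? || y == '9999' }
  years.empty? ? [] : [years.join(range_sep)]
end

p date_range_sketch('220518q14001500xx            000 0     d') # ["1400-1500"]
p date_range_sketch('220518q13uu1693')                          # ["1300-1693"]
p date_range_sketch('220518quuuu1597')                          # ["1597"]
p date_range_sketch('220518s1171    ')                          # ["1171"]
p date_range_sketch('220518nuuuuuuuu')                          # []
```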

#extract_extent(record) ⇒ Array<String>

Extracts the extent from the given MARC XML record.

Parameters:

  • the MARC XML record to extract extent from

Returns:

  • an array of extracted extents



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 975

def extract_extent record
  subfield_xpath = "subfield[@code = 'a' or @code = 'b' or @code = 'c']"
  record.xpath("datafield[@tag=300]").map { |datafield|
    datafield.xpath(subfield_xpath).filter_map { |s|
      s.text unless s.text.empty?
    }.join ' '
  }.filter_map { |ext|
    "Extent: #{DS::Util.clean_string ext}" unless ext.strip.empty?
  }
end

#extract_former_owners(record) ⇒ Array<DS::Extractor::Name>

Extract former owners from the given record.

Parameters:

  • the record to extract former owners from

Returns:

  • the extracted former owners



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 123

def extract_former_owners record
  extract_names(
    record, tags: [700, 710, 711], relators: ['former owner']
  )
end

#extract_former_owners_as_recorded(record) ⇒ Array<String>

Extracts former owners as recorded from the given record.

Parameters:

  • the record to extract former owners from

Returns:

  • the extracted former owners as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 133

def extract_former_owners_as_recorded record
  extract_former_owners(record).map &:as_recorded
end

#extract_former_owners_as_recorded_agr(record) ⇒ Array<String>

Extracts former owners as recorded with vernacular form from the given record.

Parameters:

  • the record to extract former owners from

Returns:

  • the extracted former owners as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 141

def extract_former_owners_as_recorded_agr record
  extract_former_owners(record).map &:vernacular
end

#extract_genre_vocabulary(record) ⇒ Array<Symbol>

Extracts the genre vocabulary from the given MARC XML record.

Parameters:

  • the MARC XML record to extract genre vocabulary from

Returns:

  • an array of extracted genre vocabularies



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 530

def extract_genre_vocabulary record
  extract_genres(record).map(&:vocab)
end

#extract_genres(record, sub_sep: '--', vocab: :all) ⇒ Array<DS::Extractor::Genre>

Extracts genres from the given MARC XML record.

Parameters:

  • the MARC XML record to extract genres from

  • (defaults to: '--')

    the separator for joining subfields

  • (defaults to: :all)

    the vocab type to extract

Returns:

  • an array of extracted genres



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 508

def extract_genres record, sub_sep: '--', vocab: :all
  xpath = %q{datafield[@tag = 655]}
  record.xpath(xpath).filter_map { |datafield|
    as_recorded          = collect_subfields datafield, codes: 'abcvzyx'.split(//), sub_sep: sub_sep
    as_recorded          = DS::Util.clean_string as_recorded, terminator: ''
    term_vocab           = extract_vocabulary datafield
    source_authority_uri = extract_authority_number datafield

    next unless as_recorded.present?
    next unless vocab == :all || vocab == term_vocab

    DS::Extractor::Genre.new(
      as_recorded: as_recorded, vocab: term_vocab,
      source_authority_uri: source_authority_uri
    )
  }
end

#extract_genres_as_recorded(record, uniq: true) ⇒ Array<String>

Genres and subjects

Extract genre and form terms from MARC datafield 655 values. Terms from every vocabulary named in 655$2 (e.g., rbprov, aat, lcgft) are returned, because this method calls extract_genres with vocab: :all.

Parameters:

  • the MARC record

  • (defaults to: true)

    whether to return only unique terms; default: true

Returns:

  • array of genre terms



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 366

def extract_genres_as_recorded record, uniq: true
  terms = extract_genres(record, sub_sep: '--', vocab: :all).map(&:as_recorded)

  uniq ? terms.uniq : terms
end

#extract_langs(record) ⇒ Array<String>

Extract the language codes from controlfield 008 and datafield 041$a.

Parameters:

  • the marc:record node

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 321

def extract_langs record
  # Language is in 008 at characters 35-37 (0-based indexing)
  (langs ||= []) << record.xpath("substring(controlfield[@tag='008']/text(), 36, 3)")
  # 041 is present if there's more than one language
  langs += record.xpath("datafield[@tag=041]/subfield[@code='a']").map(&:text)
  # if there are 041 values, the lang from 008 is repeated; remove the duplicate
  langs.select(&:present?).uniq
end
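
The two indexing conventions in the listing are easy to confuse: Ruby string slicing is 0-based, while the XPath substring function is 1-based, so Ruby's [35, 3] and XPath's substring(..., 36, 3) select the same three characters. The sketch below shows the Ruby side together with the de-duplication step. The function langs_sketch and its plain-string inputs are assumptions made for illustration.

```ruby
# Hypothetical model: the 008 text and the 041$a values as plain strings.
def langs_sketch(field_008, field_041_a = [])
  langs = [field_008[35, 3].to_s] + field_041_a
  # drop blank codes, then remove the 008 code if 041$a repeats it
  langs.reject { |lang| lang.strip.empty? }.uniq
end

# Build a 40-character 008 value with 'lat' at positions 35-37 (0-based).
field_008 = '220518q14001500xx ' + ' ' * 17 + 'lat d'

p langs_sketch(field_008)                # ["lat"]
p langs_sketch(field_008, %w[lat fre])   # ["lat", "fre"]
```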

#extract_languages(record) ⇒ Object



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 340

def extract_languages record
  xpath = "datafield[@tag=546]/subfield[@code='a']"
  langs = record.xpath(xpath).map { |val|
    DS::Util.clean_string val.text, terminator: ''
  }.select(&:present?).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
  return langs if langs.present?

  extract_langs(record).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
end

#extract_languages_as_recorded(record) ⇒ Array<String>

Extract the language as recorded; default to the 546$a field; otherwise return the code values from controlfield 008 and 041$a.

Parameters:

  • the marc:record node

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 336

def extract_languages_as_recorded record
  extract_languages(record).map &:as_recorded
end

#extract_material_as_recorded(record) ⇒ String

Extracts the material as recorded from the given MARC XML record.

Parameters:

  • the MARC XML record to extract material from

Returns:

  • the extracted material as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 954

def extract_material_as_recorded record
  extract_materials(record).map(&:as_recorded).first.to_s
end

#extract_materials(record) ⇒ Array<DS::Extractor::Material>

Extracts materials from the given MARC XML record.

Parameters:

  • the MARC XML record to extract materials from

Returns:

  • an array of extracted materials



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 962

def extract_materials record
  DS::Extractor::MarcXmlExtractor.collect_datafields(
    record, tags: 300, codes: 'b'
  ).filter_map { |material|
    next unless material.present?
    DS::Extractor::Material.new as_recorded: material
  }
end

#extract_mmsid(record) ⇒ String

Extracts the MMS ID from the given MARC XML record.

Parameters:

  • the MARC XML record to extract the MMS ID from

Returns:

  • the extracted MMS ID



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1134

def extract_mmsid record
  record.xpath("controlfield[@tag=001]").text
end

#extract_name_portion(datafield) ⇒ String

Extract the name portion (PN) from the datafield, pulling subfields $a, $b, $c, and $d.

Parameters:

  • the marc:datafield node with the name

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 289

def extract_name_portion datafield
  codes = %w{ a b c d }
  value = collect_subfields datafield, codes: codes
  DS::Util.clean_string value, terminator: ''
end

#extract_named_500(record, name:, strip_name: false) ⇒ Array<String>

Return an array of 500$a values that begin with name: (name followed by a colon :). The name prefix is removed if strip_name is true; it’s false by default.

Parameters:

  • the MARC XML record

  • the named prefix, like ‘Binding’, without trailing colon

  • (defaults to: false)

    whether to remove the name prefix from returned comments; default is false

Returns:

  • the matching 500$a values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1156

def extract_named_500 record, name:, strip_name: false
  return [] if name.to_s.strip.empty?

  # format the prefix; make sure there's not an extra ':'
  prefix = "#{name.strip.chomp ':'}:"
  xpath  = %Q{datafield[@tag=500]/subfield[@code='a' and starts-with(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), '#{prefix.downcase}')]/text()}
  record.xpath(xpath).map { |d|
    note = d.text.strip
    # escape the prefix so regex metacharacters in +name+ are matched literally
    strip_name ? note.sub(%r{^#{Regexp.escape prefix}\s*}i, '') : note
  }
end
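
The prefix matching can be sketched without XML. In the sketch below, named_notes_sketch is a hypothetical helper that takes the 500$a values as an array of strings; that input shape is an assumption for illustration. It mirrors the case-insensitive starts-with test and the optional prefix removal.

```ruby
# Hypothetical model of extract_named_500 over plain strings.
def named_notes_sketch(notes, name:, strip_name: false)
  return [] if name.to_s.strip.empty?

  # normalize the prefix so 'Binding' and 'Binding:' behave the same
  prefix  = "#{name.strip.chomp(':')}:"
  pattern = /\A#{Regexp.escape(prefix)}\s*/i

  notes
    .select { |note| note.downcase.start_with?(prefix.downcase) }
    .map { |note| strip_name ? note.strip.sub(pattern, '') : note.strip }
end

notes = ['Binding: Contemporary calf.', 'binding: Later vellum.', 'Script: Gothic.']

p named_notes_sketch(notes, name: 'Binding')
# ["Binding: Contemporary calf.", "binding: Later vellum."]
p named_notes_sketch(notes, name: 'Binding:', strip_name: true)
# ["Contemporary calf.", "Later vellum."]
```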

#extract_named_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extract named subjects from the MARC XML record based on specified tags.

Parameters:

  • the record to extract named subjects from

Returns:

  • an array of extracted named subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 453

def extract_named_subjects record
  extract_subject_by_tags record, tags: [600, 610, 611, 630, 647]
end

#extract_named_subjects_as_recorded(record) ⇒ Array<String>

Extracts named subjects as recorded from the given record.

Parameters:

  • the record to extract named subjects from

Returns:

  • the extracted named subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 444

def extract_named_subjects_as_recorded record
  extract_named_subjects(record).map &:as_recorded
end

#extract_names(record, tags: [], relators: []) ⇒ Array<DS::Extractor::Name>

Extract names from the MARC XML record based on specified tags and relators.

Parameters:

  • the record to extract names from

  • (defaults to: [])

    the MARC field tag

  • (defaults to: [])

    for 700$e, 710$e, values like ‘former owner’

Returns:

  • an array of extracted names



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 199

def extract_names record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return [] if xpath.empty? # don't process nonsensical requests

  record.xpath(xpath).map { |datafield|

    as_recorded = extract_name_portion datafield
    role        = extract_role datafield, relators: relators
    role        = 'author' if role.blank?
    vernacular  = extract_pn_agr datafield
    ref         = extract_authority_number datafield

    DS::Extractor::Name.new(
      as_recorded: as_recorded, role: role,
      vernacular:  vernacular, ref: ref
    )
  }
end

#extract_names_as_recorded(record, tags: [], relators: []) ⇒ Array<String>

Extract names from record using tags and relators. Tags understood are 100, 700, and 710. The relators are used to require datafields based on the contents of a subfield code e containing the specified value, like ‘scribe’:

contains(./subfield[@code ='e'], 'scribe')

Parameters:

  • a <marc:record> node

  • (defaults to: [])

    the MARC field tag

  • (defaults to: [])

    for 700$e, 710$e, a value like ‘former owner’

Returns:

  • an array of names




# File 'lib/ds/extractor/marc_xml_extractor.rb', line 25

def extract_names_as_recorded record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return '' if xpath.empty? # don't process nonsensical requests
  record.xpath(xpath).map { |datafield| DS::Util.clean_string extract_name_portion datafield }
end

#extract_names_as_recorded_agr(record, tags: [], relators: []) ⇒ Object

Extract the alternate graphical representation of each matching name. A name without a linkage subfield ($6) yields ''.

See MARC specification for 880 fields:

Parameters:

  • a <marc:record> node

  • (defaults to: [])

    the MARC field code

  • (defaults to: [])

    for 700$e, 710$e, a value like ‘former owner’




# File 'lib/ds/extractor/marc_xml_extractor.rb', line 230

def extract_names_as_recorded_agr record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return '' if xpath.empty? # don't process nonsensical requests

  record.xpath(xpath).map { |datafield|
    extract_pn_agr datafield
  }
end

#extract_notes(record) ⇒ Array<String>

Notes

Extract notes from record.

Extract values from ‘500$a’ fields that do not begin with AMREMM tags for specific values like ‘Binding:’. Specifically, this method ignores fields beginning with:

Pagination|Foliation|Layout|Colophon|Collation|Script|Decoration|\
     Binding|Origin|Watermarks|Watermark|Signatures|Shelfmark

Parameters:

  • a <MARC_RECORD> node

Returns:

  • an array of note strings



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1001

def extract_notes record
  xpath = "datafield[@tag=500 or @tag=561]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |note|
    DS::Util.clean_string note.text.strip.gsub(%r{\s+}, ' ')
  }
end

#extract_physical_description(record) ⇒ String

Extracts the physical description from the given MARC XML record.

Parameters:

  • the record to extract the physical description from

Returns:

  • the extracted physical description



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 946

def extract_physical_description record
  extract_extent(record)
end

#extract_places(record) ⇒ Object



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 575

def extract_places record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  # filter_map drops the nil produced by +next+ for blank values
  record.xpath(xpath).filter_map { |pn|
    next if pn.to_s.blank?
    as_recorded = DS::Util.clean_string(pn.text, terminator: '')
    DS::Extractor::Place.new as_recorded: as_recorded
  }
end

#extract_pn_agr(datafield) ⇒ String

Extract the alternate graphical representation of the name, or return '' when the datafield has no linkage subfield ($6).

See MARC specification for 880 fields:

Input will look like this:

<marc:datafield ind1="1" ind2=" " tag="100">
  <marc:subfield code="6">880-01</marc:subfield>
  <marc:subfield code="a">Urmawī, ʻAbd al-Muʼmin ibn Yūsuf,</marc:subfield>
  <marc:subfield code="d">approximately 1216-1294.</marc:subfield>
</marc:datafield>
<!-- ... -->
<marc:datafield ind1="1" ind2=" " tag="880">
  <marc:subfield code="6">100-01//r</marc:subfield>
  <marc:subfield code="a">ارموي، عبد المؤمن بن يوسف،</marc:subfield>
  <marc:subfield code="d">اپرxمتلي 12161294.</marc:subfield>
</marc:datafield>

Parameters:

  • the main data field @tag = ‘100’, ‘700’, etc.

Returns:

  • the text representation of the value



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1041

def extract_pn_agr datafield
  linkage = datafield.xpath("subfield[@code='6']").text
  return '' if linkage.empty?
  tag   = datafield.xpath('./@tag').text
  index = linkage.split('-').last
  xpath = "./parent::record/datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]"
  extract_name_portion datafield.xpath(xpath)
end
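
The linkage lookup reduces to simple string handling: the occurrence number is taken from the main field's $6 value and combined with the main field's tag to form the text searched for in the $6 subfield of each 880 field. The sketch below works on plain strings; linked_880_key is a hypothetical helper used only for illustration.

```ruby
# Hypothetical helper: build the text that the 880 field's $6 must contain.
def linked_880_key(tag, linkage)
  return nil if linkage.empty? # no $6 on the main field, so no linked 880

  index = linkage.split('-').last # '880-01' => '01'
  "#{tag}-#{index}"
end

key = linked_880_key('100', '880-01')
p key                             # "100-01"
p '100-01//r'.include?(key)       # true; this 880 field is the match
p '700-02//r'.include?(key)       # false
p linked_880_key('100', '')       # nil
```

As in the XPath contains test, the comparison is a substring match, so the trailing script code (//r) in the 880 field's $6 does not prevent the match.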

#extract_production_date_as_recorded(record) ⇒ Object

Look for a date as recorded. Look first at 260$c and 260$d, then 264$c, then 245$f; finally, fall back to the encoded date from 008.



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 790

def extract_production_date_as_recorded record
  # Note that MARC does not specify a subfield '260$d':
  #
  # https://www.loc.gov/marc/bibliographic/bd260.html
  #
  # However, Cornell uses $d to continue 260$c
  dar = record.xpath("datafield[@tag=260]/subfield[@code='c' or @code='d']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  dar = record.xpath("datafield[@tag=264]/subfield[@code='c']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  # 245 is the title field but can have a date in $f
  #
  # see: https://www.loc.gov/marc/bibliographic/bd245.html
  #
  # Cornell uses 245$f in records that also lack 260 or 264; see
  # '4600 Bd. Ms. 176':
  #
  # https://catalog.library.cornell.edu/catalog/6382455/librarian_view
  #
  #   <datafield ind1="0" ind2="0" tag="245">
  #     <subfield code="a">Shah-nameh,</subfield>
  #     <subfield code="f">1600s.</subfield>
  #   </datafield>
  #
  dar = record.xpath("datafield[@tag=245]/subfield[@code='f']").text
  return [DS::Util.clean_string(dar)] unless dar.strip.empty?

  encoded_date = extract_date_range record, range_sep: '-'
  [encoded_date.join('_').strip]
end
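
The fallback order used by the method can be sketched without any XML at all. The candidate strings below are made-up sample data, and the `[source, value]` pairs are only an illustrative stand-in for the XPath lookups; the sketch does not apply `DS::Util.clean_string`.

```ruby
# Candidate values in the order the method consults them. The strings are
# invented sample data, not taken from a real record.
candidates = [
  ['260$c/$d', ''],          # no 260 date
  ['264$c',    '  '],        # whitespace only, treated as missing
  ['245$f',    '1600s.'],    # first usable value
  ['008',      '1600-1699']  # encoded date, used only as a last resort
]

# Pick the first [source, value] pair whose value is not blank.
source, value = candidates.find { |_src, val| !val.strip.empty? }
puts "#{source}: #{value}"
```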

#extract_production_places_as_recorded(record) ⇒ Array<String>

Look for places as recorded. Collects the values of every 260$a and 264$a in the record; returns [] when no value is found.

Parameters:

  • the MARC record

Returns:

  • the place names, or [] when none are found



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 551

def extract_production_places_as_recorded record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |pn|
    DS::Util.clean_string pn.text, terminator: '' unless pn.to_s.strip.empty?
  }
end
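
One detail of the block above is worth spelling out: a `map` whose body ends in `... unless condition` yields `nil` for every element where the condition holds. The sketch below uses plain strings (invented sample data) in place of Nokogiri text nodes and `String#strip` in place of `DS::Util.clean_string`; whether callers need to compact the result depends on the input and is an assumption here.

```ruby
# Plain strings stand in for the text nodes returned by the XPath query.
raw_places = ['Austria', '   ', 'Germany']

# Same shape as the method body: blank entries map to nil.
mapped = raw_places.map { |pn| pn.strip unless pn.strip.empty? }

# A caller that wants only real values can drop the nils.
compacted = mapped.compact
p mapped
p compacted
```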

#extract_recon_genres(record, sub_sep: '--') ⇒ Array<Array>

Extract genre terms for reconciliation CSV output.

Returns a two-dimensional array; each row is a genre, and each row has three columns: term, vocab, and authority number.

Parameters:

  • a <MARC_RECORD> node

Returns:

  • an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 498

def extract_recon_genres record, sub_sep: '--'
  extract_genres(record, sub_sep: sub_sep).map(&:to_a)
end

#extract_recon_names(record, tags: [], relators: []) ⇒ Array<Array<String>>

For the given record, extract the names as an array of arrays, including the concatenated name string (subfields a, b, c, d) and, if present, the alternate graphical representation (AGR) and authority number (or URI).

Each returned sub-array will have three values: name, name AGR, URI.

Parameters:

  • a <marc:record> node

  • (defaults to: [])

    the MARC field tag

  • (defaults to: [])

    for 700$e, 710$e, a value like ‘former owner’

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 189

def extract_recon_names record, tags: [], relators: []
  extract_names(record, tags: tags, relators: relators).map &:to_a
end

#extract_recon_places(record) ⇒ Array<Array>

Extract the places of production (MARC 260$a) for reconciliation CSV output.

Returns a two-dimensional array, each row is a place; and each row has one column: place name; for example:

[["Austria"],
 ["Germany"],
 ["France (?)"]]

Parameters:

  • a <marc:record> node

Returns:

  • an array of arrays of values



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 571

def extract_recon_places record
  extract_places(record).map &:to_a
end

#extract_recon_subjects(record) ⇒ Array

Extracts subjects for reconciliation CSV output from the given record.

Parameters:

  • the record to extract subjects from

Returns:

  • the extracted subjects, one array of values per subject



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 538

def extract_recon_subjects record
  extract_all_subjects(record).map &:to_a
end

#extract_recon_titles(record) ⇒ Array<String>

Extracts titles for reconciliation CSV output from the given record.

Parameters:

  • the record to extract titles from

Returns:

  • the extracted titles



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 835

def extract_recon_titles record
  extract_titles(record).to_a
end

#extract_role(datafield, relators:) ⇒ String

Extract the role value, subfield $e, from the given datafield.

Parameters:

  • the marc:datafield node with the name

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 300

def extract_role datafield, relators:
  relators_list = *relators
  return '' if relators_list.empty? or relators_list.include? :none
  # if there's no $e, stop processing
  return '' if datafield.xpath('subfield[@code = "e"]/text()').text.empty?

  df_roles    = datafield.xpath('subfield[@code = "e"]/text()').map(&:text)
  rel_pattern = /(#{relators_list.join('|')})/
  role        = df_roles.find { |role| role =~ rel_pattern }
  DS::Util.clean_string role, terminator: ''
end
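
The matching step can be tried on plain strings. The sample `$e` values below are invented, and the sketch deliberately swaps in `Regexp.union`, which escapes each relator, instead of joining the relators with `|` as the method does; it also skips `DS::Util.clean_string`.

```ruby
# Sample $e values from one datafield (made-up data).
df_roles = ['former owner.', 'scribe,']
relators = ['scribe']

# Regexp.union escapes each relator string; the method above builds the
# pattern with join('|') instead, so this is a deliberate difference.
rel_pattern = Regexp.union(relators)
role        = df_roles.find { |r| r =~ rel_pattern }

# When nothing matches, find returns nil.
no_role = df_roles.find { |r| r =~ Regexp.union(['illuminator']) }
p role
p no_role
```

Note that `find` returns `nil` when no `$e` value matches any relator, so code built along these lines may want an explicit fallback.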

#extract_scribes(record) ⇒ Array<DS::Extractor::Name>

Extract scribes from the given record.

Parameters:

  • the record to extract scribes from

Returns:

  • the extracted scribes



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 70

def extract_scribes record
  extract_names(
    record, tags: [700, 710, 711], relators: ['scribe']
  )
end

#extract_scribes_as_recorded(record) ⇒ Array<String>

Extract scribes as recorded from the given record.

Parameters:

  • the record to extract scribes from

Returns:

  • the extracted scribes as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 80

def extract_scribes_as_recorded record
  extract_scribes(record).map &:as_recorded
end

#extract_scribes_as_recorded_agr(record) ⇒ Array<String>

Extracts scribes as recorded with vernacular form from the given record.

Parameters:

  • the record to extract scribes from

Returns:

  • the extracted scribes as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 88

def extract_scribes_as_recorded_agr record
  extract_scribes(record).map &:vernacular
end

#extract_subject_by_tags(record, tags: []) ⇒ Array<DS::Extractor::Subject>

Return an array of formatted subjects for the given tags (600, 610, 611, 630, 647, 648, 650, and 651). Subject values are separated by ‘--’:

<datafield ind1="1" ind2="0" tag="600">
  <subfield code="a">Cicero, Marcus Tullius</subfield>
  <subfield code="x">Spurious and doubtful works.</subfield>
</datafield>

# => "Cicero, Marcus Tullius--Spurious and doubtful works"

Subfields with codes ‘b’, ‘c’, ‘d’, ‘p’, ‘q’, and ‘t’ are appended to the preceding subfield:

  <datafield ind1=" " ind2="7" tag="647">
    <subfield code="a">Conspiracy of Catiline</subfield>
    <subfield code="c">(Rome :</subfield>
    <subfield code="d">65-62 B.C.)</subfield>
    <subfield code="2">fast</subfield>
    <subfield code="0">(OCoLC)fst01352536</subfield>
  </datafield>

  # => "Conspiracy of Catiline (Rome : 65-62 B.C.)"

Parameters:

  • the MARC record

Returns:

  • an array of formatted subjects strings



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 398

def extract_subject_by_tags record, tags: []
  tag_list = *tags
  raise "No tags given for subject extraction: #{tags.inspect}" if tag_list.empty?
  sep       = '--'
  tag_query = tag_list.map { |tag| "@tag=#{tag}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    values = Hash.new { |hash, k| hash[k] = [] }
    vocab  = datafield.xpath('./@ind2').text
    datafield.xpath("subfield").map { |subfield|
      subfield_text = DS::Util.clean_string subfield.text
      subfield_code = subfield.xpath('./@code').text
      case subfield_code
      when 'e', 'w'
        # don't include these formatted in subject
      when 'b', 'c', 'd', 'p', 'q', 't'
        # append these to the preceding value
        # we assume that there is a preceding value
        values[:terms][-1] += " #{subfield_text}"
        values[:codes][-1] += ";#{subfield_code}"
      when %r{\A[[:alpha:]]\z}
        # any other codes: a, g, v, x, y, z
        values[:terms] << subfield_text
        values[:codes] << subfield_code
      when '2'
        vocab = subfield.text
      when '0'
        values[:urls] << subfield_text
      end
    }
    terms = DS::Util.clean_string values[:terms].join(sep), terminator: ''
    urls  = DS::Util.clean_string values[:urls].join(sep), terminator: ''
    codes = DS::Util.clean_string values[:codes].join(sep), terminator: ''
    DS::Extractor::Subject.new(
      as_recorded:          terms,
      subfield_codes:       codes,
      source_authority_uri: urls,
      vocab:                vocab
    )
  }

end
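
The subfield folding rule can be sketched on plain `[code, text]` pairs, which stand in for the Nokogiri subfield nodes. `fold_subject` is a hypothetical name, not part of the module, and unlike the method above the sketch starts a new term when an appended subfield has no preceding value; it also omits `DS::Util.clean_string`, vocabulary, and URI handling.

```ruby
# Fold [code, text] pairs into a single subject string.
def fold_subject subfields, sep: '--'
  terms = []
  subfields.each do |code, text|
    case code
    when 'e', 'w'
      next # not part of the formatted subject
    when 'b', 'c', 'd', 'p', 'q', 't'
      # append to the preceding term; start a new one if none exists
      if terms.empty?
        terms << text
      else
        terms[-1] += " #{text}"
      end
    when /\A[[:alpha:]]\z/
      terms << text # any other letter (a, g, v, x, y, z) starts a new term
    end
    # numeric codes such as '2' and '0' are ignored in this sketch
  end
  terms.join sep
end

puts fold_subject([['a', 'Cicero, Marcus Tullius'], ['x', 'Spurious and doubtful works']])
puts fold_subject([['a', 'Conspiracy of Catiline'], ['c', '(Rome :'], ['d', '65-62 B.C.)'],
                   ['2', 'fast'], ['0', '(OCoLC)fst01352536']])
```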

#extract_subjects(record) ⇒ Array<DS::Extractor::Subject>

Extracts subjects from the given record based on specified tags.

Parameters:

  • the record to extract subjects from

Returns:

  • an array of extracted subjects



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 469

def extract_subjects record
  extract_subject_by_tags record, tags: [648, 650, 651]
end

#extract_subjects_as_recorded(record) ⇒ Array<String>

Extracts subjects as recorded from the given record.

Parameters:

  • the record to extract subjects from

Returns:

  • the extracted subjects as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 461

def extract_subjects_as_recorded record
  extract_subjects(record).map &:as_recorded
end

#extract_titles(record) ⇒ Array<DS::Extractor::Title>

Extracts titles from the given record.

Parameters:

  • the record to extract titles from

Returns:

  • an array of extracted titles



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 843

def extract_titles record
  tar      = title_as_recorded record
  tar_agr  = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.title_as_recorded_agr(record, 245), terminator: ''
  utar     = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_titles_as_recorded(record), terminator: ''
  utar_agr = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_title_as_recorded_agr(record), terminator: ''

  [DS::Extractor::Title.new(
    as_recorded:              tar,
    vernacular:               tar_agr,
    uniform_title:            utar,
    uniform_title_vernacular: utar_agr
  )]
end

#extract_titles_as_recorded(record) ⇒ Array<String>

Extracts titles as recorded from the given record.

Parameters:

  • the record to extract titles from

Returns:

  • the extracted titles as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 893

def extract_titles_as_recorded record
  extract_titles(record).map &:as_recorded
end

#extract_titles_as_recorded_agr(record) ⇒ Array<String>

Extracts titles as recorded with vernacular form from the given record.

Parameters:

  • the record to extract titles from

Returns:

  • the extracted titles as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 861

def extract_titles_as_recorded_agr record
  extract_titles(record).map &:vernacular
end

#extract_uniform_titles_as_recorded(record) ⇒ Array<String>

Extracts uniform titles as recorded from the given record.

Parameters:

  • the record to extract uniform titles from

Returns:

  • the extracted uniform titles as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 913

def extract_uniform_titles_as_recorded record
  extract_titles(record).map &:uniform_title
end

#extract_uniform_titles_as_recorded_agr(record) ⇒ Array<String>

Extracts uniform titles as recorded with vernacular form from the given MARC XML record.

Parameters:

  • the record to extract uniform titles from

Returns:

  • the extracted uniform titles as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 922

def extract_uniform_titles_as_recorded_agr record
  extract_titles(record).map &:uniform_title_vernacular
end

#extract_vocabulary(datafield) ⇒ String

Parameters:

  • the term datafield

Returns:



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1095

def extract_vocabulary datafield
  return 'lcsh' if datafield['ind2'] == '0'

  vocab = datafield.xpath("subfield[@code=2]").text
  vocab.chomp '.' if vocab.present?
end
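
The decision made here is small enough to restate on plain values. In the sketch below, `vocabulary_for` is a hypothetical helper that takes the second indicator and the text of subfield $2 as strings; it avoids `present?`, which comes from ActiveSupport, so that it runs in plain Ruby.

```ruby
# ind2:        the datafield's second indicator, e.g. '0' or '7'
# source_code: the text of subfield $2, e.g. 'fast.' (may be empty)
def vocabulary_for ind2, source_code
  return 'lcsh' if ind2 == '0' # second indicator 0 means LCSH
  return nil if source_code.nil? || source_code.strip.empty?
  source_code.chomp '.'        # drop a trailing period only
end

p vocabulary_for('0', '')
p vocabulary_for('7', 'fast.')
p vocabulary_for('7', '')
```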

#handle_bce_date(record) ⇒ Array<String>

Compiles BCE dates based on the provided record. It extracts BCE dates from subfields of the 046 field in the MARC XML record.

The method stops and returns an empty array [] if the record lacks a 046$b (BCE date 1). It then looks for a 046$d (BCE date 2) or, failing that, a 046$e (CE date 2). It returns an array containing either the single 046$b value as a negative value or a range of two dates.

See: www.loc.gov/marc/bibliographic/bd046.html

Parameters:

  • the MARC XML record

Returns:

  • an array of BCE dates in string format



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 749

def handle_bce_date record
  # "datafield[@tag=260]/subfield[@code='c' or @code='d']/text()")
  bce_date1 = record.at_xpath('datafield[@tag=046]/subfield[@code="b"]/text()').to_s
  # stop if there's no BCE date 1
  return [] if bce_date1.blank?

  xpath     = 'datafield[@tag=046]/subfield[@code="d"]/text()'
  bce_date2 = record.at_xpath(xpath).to_s

  return ["-#{bce_date1}", "-#{bce_date2}"] if bce_date2.present?

  xpath    = 'datafield[@tag=046]/subfield[@code="e"]/text()'
  ce_date2 = bce_date2 = record.at_xpath(xpath).to_s
  return ["-#{bce_date1}", ce_date2] if ce_date2.present?

  ["-#{bce_date1}"]
end
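
The branching above can be exercised with a plain hash standing in for the 046 subfields. `bce_dates` is a hypothetical name and the date strings are invented sample values; this is a sketch of the same decision order, not the module's code.

```ruby
# subfields: a hash of 046 subfield code => text, e.g. { 'b' => '0300' }
def bce_dates subfields
  date1 = subfields['b'].to_s
  return [] if date1.empty? # no BCE date 1, nothing to report

  bce_date2 = subfields['d'].to_s
  return ["-#{date1}", "-#{bce_date2}"] unless bce_date2.empty?

  ce_date2 = subfields['e'].to_s
  return ["-#{date1}", ce_date2] unless ce_date2.empty?

  ["-#{date1}"]
end

p bce_dates({ 'b' => '0300', 'd' => '0200' })
p bce_dates({ 'b' => '0050', 'e' => '0025' })
p bce_dates({ 'b' => '0300' })
p bce_dates({})
```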

#title_as_recorded(record) ⇒ String

Extracts the title as recorded from the given record.

Parameters:

  • the record to extract the title from

Returns:

  • the extracted title as recorded



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 869

def title_as_recorded record
  xpath = "datafield[@tag=245]/subfield[@code='a' or @code='b']"
  record.xpath(xpath).map { |title|
    DS::Util.clean_string(title.text, terminator: '')
  }.join '; '
end

#title_as_recorded_agr(record, tag) ⇒ String

Extracts the title as recorded with vernacular form from the given record.

Parameters:

  • the record to extract the title from

  • the tag to use for extraction

Returns:

  • the extracted title as recorded with vernacular form



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 881

def title_as_recorded_agr record, tag
  linkage = record.xpath("datafield[@tag=#{tag}]/subfield[@code='6']").text
  return '' if linkage.empty?
  index = linkage.split('-').last
  xpath = "datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]/subfield[@code='a']"
  DS::Util.clean_string record.xpath(xpath).text.delete '[]'
end

#uniform_title_as_recorded_agr(record) ⇒ String

Extracts the alternate graphical representation (AGR) of the uniform titles (240 and 130) from the given MARC XML record.

Parameters:

  • the MARC XML record to extract uniform titles from

Returns:

  • the AGR forms of the uniform titles joined by ‘|’



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 930

def uniform_title_as_recorded_agr record
  tag240 = title_as_recorded_agr record, 240
  tag130 = title_as_recorded_agr record, 130
  [tag240, tag130].reject(&:empty?).map { |title|
    DS::Util.clean_string title
  }.join '|'
end

#uniform_titles_as_recorded(record) ⇒ String

Extracts uniform titles as recorded from the given record.

Parameters:

  • the record to extract uniform titles from

Returns:

  • the extracted uniform titles as recorded joined by ‘|’



# File 'lib/ds/extractor/marc_xml_extractor.rb', line 901

def uniform_titles_as_recorded record
  title_240 = record.xpath("datafield[@tag=240]/subfield[@code='a']").text
  title_130 = record.xpath("datafield[@tag=130]/subfield[@code='a']").text
  [title_240, title_130].reject(&:empty?).map { |title|
    DS::Util.clean_string title, terminator: ''
  }.join '|'
end
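
The joining behavior is easy to check on plain strings. The titles below are invented sample values, `join_titles` is a hypothetical helper, and `DS::Util.clean_string` is left out of this sketch.

```ruby
# Join the 240$a and 130$a values, skipping whichever is empty.
def join_titles title_240, title_130
  [title_240, title_130].reject(&:empty?).join '|'
end

p join_titles('Shahnamah', '')
p join_titles('Bible', 'Psalms')
p join_titles('', '')
```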