Module: DS::Extractor::MarcXmlExtractor::ClassMethods
- Included in:
- DS::Extractor::MarcXmlExtractor
- Defined in:
- lib/ds/extractor/marc_xml_extractor.rb
Instance Method Summary
-
#build_name_query(tags: [], relators: []) ⇒ String
Build a names query from tags and relators.
-
#collect_datafields(record, tags: [], codes: [], field_sep: '|', sub_sep: ' ') ⇒ Array<Array>
Extract subfield values specified by tags.
-
#collect_recon_datafields(record, tags: [], codes: [], sub_sep: ' ') ⇒ Array<Array>
Extract datafields values with authority numbers (URL) when present for reconciliation CSV output.
-
#collect_subfields(datafield, codes: [], sub_sep: ' ') ⇒ String
A method to collect subfields from a given datafield based on specified codes.
-
#compile_dates(record, code, part1, part2) ⇒ Object
Compiles dates based on the provided code and parts.
-
#extract_001_control_number(record, holdings_file = nil) ⇒ String
Extracts the 001 control number from the given MARC XML record and joins non-empty values with ‘|’.
-
#extract_acknowledgments(record) ⇒ Array
Extracts acknowledgments from the given record.
-
#extract_all_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extracts all subjects from the given record, including named subjects and subjects.
-
#extract_all_subjects_as_recorded(record) ⇒ Array<String>
Extracts all subjects as recorded from the given record.
-
#extract_artists(record) ⇒ Array<DS::Extractor::Name>
Extracts artists from the given record: names in 700, 710, and 711 datafields whose relator is 'artist' or 'illuminator'.
-
#extract_artists_as_recorded(record) ⇒ Array<String>
Extracts artists as recorded from the given record.
-
#extract_artists_as_recorded_agr(record) ⇒ Array<String>
Extracts artists as recorded with vernacular form from the given record.
- #extract_associated_agents(record) ⇒ Object
-
#extract_authority_number(datafield) ⇒ String
Extract the authority number, subfield $0 from the given datafield.
-
#extract_authors(record) ⇒ Array<String>
Extracts authors from the given record.
-
#extract_authors_as_recorded(record) ⇒ Array<String>
Extract names from record using tags and relators.
-
#extract_authors_as_recorded_agr(record) ⇒ Array<String>
Extract the alternate graphical representation of the name or return [].
- #extract_cataloging_convention(record) ⇒ Object
-
#extract_date_part(datestring, ndx1, ndx2) ⇒ String
Extracts a part of the date string from a MARC 008 controlfield, using the indices ndx1 and ndx2.
-
#extract_date_range(record, range_sep:) ⇒ Array
Extract the encoded date from controlfield 008.
-
#extract_extent(record) ⇒ Array<String>
Extracts the extent from the given MARC XML record.
-
#extract_former_owners(record) ⇒ Array<DS::Extractor::Name>
Extract former owners from the given record.
-
#extract_former_owners_as_recorded(record) ⇒ Array<String>
Extracts former owners as recorded from the given record.
-
#extract_former_owners_as_recorded_agr(record) ⇒ Array<String>
Extracts former owners as recorded with vernacular form from the given record.
-
#extract_genre_vocabulary(record) ⇒ Array<Symbol>
Extracts the genre vocabulary from the given MARC XML record.
-
#extract_genres(record, sub_sep: '--', vocab: :all) ⇒ Array<DS::Extractor::Genre>
Extracts genres from the given MARC XML record.
-
#extract_genres_as_recorded(record, uniq: true) ⇒ Array<String>
Genres and subjects.
-
#extract_langs(record) ⇒ String
Extract the language codes from controlfield 008 and datafield 041$a.
- #extract_languages(record) ⇒ Object
-
#extract_languages_as_recorded(record) ⇒ String
Extract the language as recorded; default to the 546$a field; otherwise return the code values from controlfield 008 and 041$a.
-
#extract_material_as_recorded(record) ⇒ String
Extracts the material as recorded from the given MARC XML record.
-
#extract_materials(record) ⇒ Array<DS::Extractor::Material>
Extracts materials from the given MARC XML record.
-
#extract_mmsid(record) ⇒ String
Extracts the MMS ID from the given MARC XML record.
-
#extract_name_portion(datafield) ⇒ String
Extract the PN from datafield, pulling subfields $a, $b, $c, $d.
-
#extract_named_500(record, name:, strip_name: false) ⇒ Array<String>
Return an array of 500$a values that begin with name: (name followed by a colon :).
-
#extract_named_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extract named subjects from the MARC XML record based on specified tags.
-
#extract_named_subjects_as_recorded(record) ⇒ Array<String>
Extracts named subjects as recorded from the given record.
-
#extract_names(record, tags: [], relators: []) ⇒ Array<DS::Extractor::Name>
Extract names from the MARC XML record based on specified tags and relators.
-
#extract_names_as_recorded(record, tags: [], relators: []) ⇒ String
Extract names from record using tags and relators.
-
#extract_names_as_recorded_agr(record, tags: [], relators: []) ⇒ Object
Extract the alternate graphical representation of the name or return ''.
-
#extract_notes(record) ⇒ Array<String>
Notes.
-
#extract_physical_description(record) ⇒ String
Extracts the physical description from the given MARC XML record.
- #extract_places(record) ⇒ Object
-
#extract_pn_agr(datafield) ⇒ String
Extract the alternate graphical representation of the name or return ''.
-
#extract_production_date_as_recorded(record) ⇒ Object
Look for a date as recorded.
-
#extract_production_places_as_recorded(record) ⇒ Array<String>
Look for a place as recorded.
-
#extract_recon_genres(record, sub_sep: '--') ⇒ Array<Array>
Extract genre terms for reconciliation CSV output.
-
#extract_recon_names(record, tags: [], relators: []) ⇒ Array<Array<String>>
For the given record, extract the names as an array of arrays, including the concatenated name string (subfields, a, b, c, d) and, if present, the alternate graphical representation (AGR) and authority number (or URI).
-
#extract_recon_places(record) ⇒ Array<Array>
Extract the places of production MARC 260$a for reconciliation CSV output.
-
#extract_recon_subjects(record) ⇒ Array
Extracts subjects from the given record for reconciliation CSV output.
-
#extract_recon_titles(record) ⇒ Array<String>
Extracts titles from the given record for reconciliation CSV output.
-
#extract_role(datafield, relators:) ⇒ String
Extract the role value, subfield $e, from the given datafield.
-
#extract_scribes(record) ⇒ Array<DS::Extractor::Name>
Extract scribes from the given record.
-
#extract_scribes_as_recorded(record) ⇒ Array<String>
Extract scribes as recorded from the given record.
-
#extract_scribes_as_recorded_agr(record) ⇒ Array<String>
Extracts scribes as recorded with vernacular form from the given record.
-
#extract_subject_by_tags(record, tags: []) ⇒ Array<DS::Extractor::Subject>
Return an array of strings of formatted subjects (600, 610, 611, 630, 647, 648, 650, and 651).
-
#extract_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extracts subjects from the given record based on specified tags.
-
#extract_subjects_as_recorded(record) ⇒ Array<String>
Extracts subjects as recorded from the given record.
-
#extract_titles(record) ⇒ Array<DS::Extractor::Title>
Extracts titles from the given record.
-
#extract_titles_as_recorded(record) ⇒ Array<String>
Extracts titles as recorded from the given record.
-
#extract_titles_as_recorded_agr(record) ⇒ Array<String>
Extracts titles as recorded with vernacular form from the given record.
-
#extract_uniform_titles_as_recorded(record) ⇒ Array<String>
Extracts uniform titles as recorded from the given record.
-
#extract_uniform_titles_as_recorded_agr(record) ⇒ Array<String>
Extracts uniform titles as recorded with vernacular form from the given MARC XML record.
- #extract_vocabulary(datafield) ⇒ String
-
#handle_bce_date(record) ⇒ Array<String>
Compiles BCE dates based on the provided record.
-
#title_as_recorded(record) ⇒ String
Extracts the title as recorded from the given record.
-
#title_as_recorded_agr(record, tag) ⇒ String
Extracts the title as recorded with vernacular form from the given record.
-
#uniform_title_as_recorded_agr(record) ⇒ String
Extracts uniform titles as recorded and aggregates them from the given MARC XML record.
-
#uniform_titles_as_recorded(record) ⇒ String
Extracts uniform titles as recorded from the given record.
Instance Method Details
#build_name_query(tags: [], relators: []) ⇒ String
Build a names query from tags and relators. Tags understood are 100, 700, and 710. The relators are used to require datafields based on the contents of a subfield code e containing the specified value, like ‘scribe’:
contains(./subfield[@code ='e'], 'scribe')
For relators see section "$e - Relator term", here:
https://www.loc.gov/marc/bibliographic/bdx00.html
To require the subfield not have a relator, pass :none as the relator value.
build_name_query tags: ['100'], relators: :none
This will add the following to the query.
not(./subfield[@code = 'e'])
Note: In U. Penn manuscript catalog records, 700 and 710 fields that do not have a subfield code e are associated authors.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 264
def build_name_query tags: [], relators: []
  return '' if tags.empty? # don't process nonsensical requests
  # make sure the tags are all strings
  tags = [tags].flatten.map &:to_s
  tag_query = tags.map { |t| "@tag = #{t}" }.join " or "
  query_string = "(#{tag_query})"
  _relators = [relators].flatten.map { |r| r.to_s.strip.downcase == 'none' ? :none : r }
  return "datafield[#{query_string}]" if _relators.empty?
  if _relators.include? :none
    query_string += " and not(./subfield[@code = 'e'])"
    return "datafield[#{query_string}]"
  end
  relator_string = relators.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }.join " or "
  query_string += (relator_string.empty? ? '' : " and (#{relator_string})")
  "datafield[#{query_string}]"
end
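The query strings described above can be illustrated with a small standalone sketch. The helper name sketch_name_query is hypothetical and only mirrors the documented shape of the XPath predicate; it is not the library method and needs no XML parser.

```ruby
# Hypothetical helper (not part of DS): builds the same kind of XPath
# string that the documentation for build_name_query describes.
def sketch_name_query(tags: [], relators: [])
  tags = Array(tags).map(&:to_s)
  return '' if tags.empty? # nothing to query

  query = "(#{tags.map { |t| "@tag = #{t}" }.join(' or ')})"
  # normalize 'none'/:none to the symbol :none, everything else to a string
  rels = Array(relators).map { |r| r.to_s.strip.downcase == 'none' ? :none : r.to_s }
  return "datafield[#{query}]" if rels.empty?
  return "datafield[#{query} and not(./subfield[@code = 'e'])]" if rels.include?(:none)

  clauses = rels.map { |r| "contains(./subfield[@code ='e'], '#{r}')" }.join(' or ')
  "datafield[#{query} and (#{clauses})]"
end

puts sketch_name_query(tags: [700, 710], relators: ['scribe'])
puts sketch_name_query(tags: ['100'], relators: :none)
```

The first call requires a subfield e containing 'scribe'; the second requires that no subfield e be present.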
#collect_datafields(record, tags: [], codes: [], field_sep: '|', sub_sep: ' ') ⇒ Array<Array>
Extract subfield values specified by tags
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1083
def collect_datafields record, tags: [], codes: [], field_sep: '|', sub_sep: ' '
  tags = [tags].flatten.map &:to_s
  tag_query = tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    DS::Util.clean_string value, terminator: ''
  }
end
#collect_recon_datafields(record, tags: [], codes: [], sub_sep: ' ') ⇒ Array<Array>
Extract datafields values with authority numbers (URL) when present for reconciliation CSV output.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1063
def collect_recon_datafields record, tags: [], codes: [], sub_sep: ' '
  tags = [tags].flatten.map &:to_s
  tag_query = tags.map { |t| "@tag = #{t}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    value = collect_subfields datafield, codes: codes, sub_sep: sub_sep
    value = DS::Util.clean_string value, terminator: ''
    number = datafield.xpath('subfield[@tag="0"]').text
    [value, number]
  }
end
#collect_subfields(datafield, codes: [], sub_sep: ' ') ⇒ String
A method to collect subfields from a given datafield based on specified codes.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1108
def collect_subfields datafield, codes: [], sub_sep: ' '
  # ensure that +codes+ is an array of strings
  _codes = [codes].flatten.map &:to_s
  # Code query example: ['a', 'b', 'd', 'c'] => @code = 'a' or @code = 'b' or @code = 'c' or @code = 'd'
  code_query = _codes.map { |code| "@code = '#{code}'" }.join ' or '
  xpath = %Q{subfield[#{code_query}]}
  DS::Util.clean_string datafield.xpath(xpath).map(&:text).reject(&:empty?).join sub_sep
end
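As a rough illustration of the subfield collection, the following sketch replaces the XML datafield with an array of [code, text] pairs in document order. The names SAMPLE_SUBFIELDS, sketch_code_query, and sketch_collect are hypothetical stand-ins, and the final DS::Util.clean_string step is left out.

```ruby
# Stand-in for a MARC datafield: [code, text] pairs in document order.
SAMPLE_SUBFIELDS = [
  ['a', 'Smith, John,'],
  ['d', '1216-1294.'],
  ['e', 'author'],
  ['c', '']
].freeze

# Builds the predicate that goes inside subfield[...].
def sketch_code_query(codes)
  Array(codes).map { |code| "@code = '#{code}'" }.join(' or ')
end

# Keeps the wanted subfields, drops empty values, joins with sub_sep.
def sketch_collect(subfields, codes:, sub_sep: ' ')
  wanted = Array(codes).map(&:to_s)
  subfields.select { |code, _text| wanted.include?(code) }
           .map { |_code, text| text }
           .reject(&:empty?)
           .join(sub_sep)
end

puts sketch_code_query(%w[a d])
puts sketch_collect(SAMPLE_SUBFIELDS, codes: %w[a b c d])
```

Subfield e is skipped because it is not in the requested codes, and the empty subfield c contributes nothing to the joined string.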
#compile_dates(record, code, part1, part2) ⇒ Object
Compiles dates based on the provided code and parts. This method determines the date from the date-type code in position 6 of the MARC 008 field.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 723
def compile_dates record, code, part1, part2
  case code
  when 'i', 'k', 'm', 'p', 'q', '|'
    [part1, part2]
  when 'n'
    []
  when 'b'
    handle_bce_date record
  else
    [part1]
  end
end
#extract_001_control_number(record, holdings_file = nil) ⇒ String
Extracts the 001 control number from the given MARC XML record and joins non-empty values with ‘|’.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1122
def extract_001_control_number record, holdings_file = nil
  ids = []
  # add the MMS ID
  ids << extract_mmsid(record)
  ids.reject(&:empty?).join '|'
end
#extract_acknowledgments(record) ⇒ Array
Extracts acknowledgments from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1142
def extract_acknowledgments record
  []
end
#extract_all_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extracts all subjects from the given record, including named subjects and subjects.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 477
def extract_all_subjects record
  extract_named_subjects(record) + extract_subjects(record)
end
#extract_all_subjects_as_recorded(record) ⇒ Array<String>
Extracts all subjects as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 485
def extract_all_subjects_as_recorded record
  extract_all_subjects(record).map &:as_recorded
end
#extract_artists(record) ⇒ Array<DS::Extractor::Name>
Extracts artists from the given record: names in 700, 710, and 711 datafields whose relator is 'artist' or 'illuminator'.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 96
def extract_artists record
  extract_names(
    record,
    tags: [700, 710, 711],
    relators: ['artist', 'illuminator']
  )
end
#extract_artists_as_recorded(record) ⇒ Array<String>
Extracts artists as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 107
def extract_artists_as_recorded record
  extract_artists(record).map &:as_recorded
end
#extract_artists_as_recorded_agr(record) ⇒ Array<String>
Extracts artists as recorded with vernacular form from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 115
def extract_artists_as_recorded_agr record
  extract_artists(record).map &:vernacular
end
#extract_associated_agents(record) ⇒ Object
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 173
def extract_associated_agents record
  []
end
#extract_authority_number(datafield) ⇒ String
Extract the authority number, subfield $0 from the given datafield.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1013
def extract_authority_number datafield
  xpath = "./subfield[@code='0']"
  datafield.xpath(xpath).text
end
#extract_authors(record) ⇒ Array<String>
Extracts authors from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 168
def extract_authors record
  extract_names(record, tags: [100, 110, 111]) +
    extract_names(record, tags: [700, 710, 711], relators: %w{author})
end
#extract_authors_as_recorded(record) ⇒ Array<String>
Extract names from record using tags and relators. Authors are extracted from datafields 100, 110, 111, 700, 710, and 711.
All 1xx are extracted, no relator is assumed and all 1xx are assumed to be authors.
700, 710, and 711 are extracted when subfield 7xx$e contains ‘author’.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 44
def extract_authors_as_recorded record
  authors = []
  authors += extract_names_as_recorded record, tags: [100, 110, 111]
  authors += extract_names_as_recorded record, tags: [700, 710, 711], relators: %w{author}
end
#extract_authors_as_recorded_agr(record) ⇒ Array<String>
Extract the alternate graphical representation of the name or return [].
See MARC specification for 880 fields:
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 59
def extract_authors_as_recorded_agr record
  authors = []
  authors += extract_names_as_recorded_agr record, tags: [100, 110, 111]
  authors += extract_names_as_recorded_agr record, tags: [700, 710, 711], relators: %w{author}
end
#extract_cataloging_convention(record) ⇒ Object
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1050
def extract_cataloging_convention record
  record.xpath('datafield[@tag=040]/subfield[@code="e"]/text()').text
end
#extract_date_part(datestring, ndx1, ndx2) ⇒ String
Extracts a part of the date string from a MARC 008 controlfield, using the indices ndx1 and ndx2.
Ensures that the extracted part starts with a digit and matches a sequence of digits and/or ‘u’.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 777
def extract_date_part datestring, ndx1, ndx2
  part = datestring[ndx1, ndx2]
  # part must start with a digit and match a seq of digits and/or u
  return unless part =~ /^\d[\du]+/
  part.sub! /^0+/, '' if part =~ /^0+[1-9]/
  part
end
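A short sketch may help show what the two indices do. In Ruby, String#[] with two integers takes a start position and a length, so the second argument is a length rather than an end index. The helper sketch_date_part is a hypothetical, non-mutating stand-in for the method shown above.

```ruby
# Hypothetical stand-in for extract_date_part; uses sub instead of sub!
# so that it also works on frozen strings.
def sketch_date_part(datestring, start, length)
  part = datestring[start, length]
  # part must start with a digit, followed by digits and/or 'u'
  return nil unless part =~ /^\d[\du]+/

  # drop leading zeros when they are followed by a non-zero digit
  part = part.sub(/^0+/, '') if part =~ /^0+[1-9]/
  part
end

date_str = 'i07500800'
p sketch_date_part(date_str, 1, 4)      # first date part
p sketch_date_part(date_str, 5, 8)      # second date part; only 4 characters remain
p sketch_date_part('quuuu1597', 1, 4)   # all-'u' part is rejected
```

Asking for 8 characters starting at index 5 of a 9-character string simply returns the 4 characters that remain, which is why the second call still yields a four-character date part before the leading zero is removed.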
#extract_date_range(record, range_sep:) ⇒ Array
Extract the encoded date from controlfield 008.
Follows
Returns an array containing a pair of dates or a single date, or an empty array.
The following date types have appeared in MARC records contributed to DS as of 2024-02-27 and are handled here:
b - No dates given; B.C. date involved
- 'b '
- date is taken from 046$b, and if present $d or $e
- See: https://www.loc.gov/marc/bibliographic/bd046.html
e - Detailed date
- 'e11200520', 'e139403 x', 'e164509 t', 'e167707 y',
'e187505 s'
- the first date part is returned as a single year
i - Inclusive dates of collection
- 'i07500800', 'i08000830', 'i1000 '
- the first and -- if present -- second date part are
returned as two years
k - Range of years of bulk of collection
- 'k15121716'
- the first and second date parts are returned as two years
m - Multiple dates
- 'm0618193u', 'm07390741', 'm10751200', 'm16uu1637',
'm17uu1900'
- the first and second date parts are returned as two years
- see note below on replacement of u's
n - Dates unknown
- 'nuuuuuuuu'
- no date returned
p - Date of distribution/release/issue and
production/recording session when different
- 'p1400 '
- the first and -- if present -- second date part are
returned as two years
q - Questionable date
- 'q01000299', 'q0979 ', 'q09910992', 'q10001099',
'q1300 ', 'q13uu14uu', 'q13uu1693', 'q14011425',
'q1425uuuu', 'q1450 ', 'q1460 ', 'q14uu14uu',
'quuuu1597'
- the first and -- if present -- second date part are
returned as two years
- if the second date part is 'uuuu', the first date part is
returned as a single year; 'q1425uuuu' => 1425
- if the first date part is 'uuuu', the second date part is
returned as a single year; 'quuuu1597' => 1597
- for partial date parts with u's, see the note below
r - Reprint/reissue date and original date
- 'r11751199'
- the first date part is returned as a single year
s - Single known date/probable date
- 's1171 ', 's1171 xx ', 's1192 ua ', 's1250||||',
's1286 iq ', 's1315 sy ', 's1366 is ', 's1436 gw ',
's1450 it ', 's1470 ly ', 's1470 tu ', 's1470 uuu',
's1497 enk', 's1595 sp ', 's19uu '
- the first date part is returned as a single year
- see note below on replacement of u's
| - No attempt to code
- '|12501300'
- this appears to be miscoding
- note that the compile_dates source groups '|' with the two-date
codes, so both date parts are returned when present
The following cases, so far unrepresented in contributor data, will follow the default rule: date part one will be returned as a single year.
c - Continuing resource currently published
d - Continuing resource ceased publication
t - Publication date and copyright date
u - Continuing resource status unknown
Note on the replacement of u’s in partial year dates
- Where u's appear in the first date they are replaced by 0;
thus, 'q13uu1693' => '1300, 1693'
- Where u's appear in the second date they are replaced by 9;
thus, 'q14uu14uu' => '1400, 1499'
For MARC partial dates see Date 1 and Date 2 documentation here
https://www.loc.gov/marc/bibliographic/bd008a.html
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 692
def extract_date_range record, range_sep:
  # 008 controlfield; e.g.,
  #
  #     "220518q14001500xx 000 0 d"
  ctrl_008 = record.at_xpath("controlfield[@tag='008']")
  return [] unless ctrl_008 # return if no 008
  # get positions 7-15: q14001500
  date_str = ctrl_008.text[6, 9]
  code = date_str[0] # 'm'
  part1 = extract_date_part date_str, 1, 4 # '0618'
  part1.gsub! /u/, '0' if part1.present?
  part2 = extract_date_part date_str, 5, 8 # '193u'
  part2.gsub! /u/, '9' if part2.present?
  range = compile_dates(record, code, part1, part2).filter_map { |y|
    # filter out blank dates and '9999'
    y if y.present? && y != '9999'
  }
  return [] if range.blank?
  [range.join(range_sep)]
end
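The rules in the date-type table can be traced with a standalone sketch that works on the nine-character date string alone (008 positions 6 to 14). The names TWO_DATE_CODES, sketch_clean_part, and sketch_encoded_range are hypothetical. The 'b' (B.C.) case is deliberately left out because it depends on field 046, and ActiveSupport helpers such as present? are replaced with plain Ruby.

```ruby
# Codes for which both date parts are used, per the compile_dates source.
TWO_DATE_CODES = %w[i k m p q |].freeze

# Validates one date part and replaces 'u' with the given fill digit.
def sketch_clean_part(part, fill)
  return nil unless part =~ /\A\d[\du]+/

  part = part.sub(/\A0+/, '') if part =~ /\A0+[1-9]/
  part.tr('u', fill)
end

# Hypothetical stand-in for extract_date_range, minus XML and the 'b' case.
def sketch_encoded_range(date_str, range_sep: '-')
  code  = date_str[0]
  part1 = sketch_clean_part(date_str[1, 4].to_s, '0') # u => 0 in date 1
  part2 = sketch_clean_part(date_str[5, 4].to_s, '9') # u => 9 in date 2
  years = case code
          when *TWO_DATE_CODES then [part1, part2]
          when 'n' then []
          else [part1]
          end
  years = years.compact.reject { |y| y == '9999' }
  years.empty? ? [] : [years.join(range_sep)]
end

%w[q13uu1693 q14uu14uu q1425uuuu quuuu1597 nuuuuuuuu r11751199].each do |s|
  puts "#{s} => #{sketch_encoded_range(s).inspect}"
end
```

Note that the result is a one-element array holding the joined range, which matches the last line of the source shown above.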
#extract_extent(record) ⇒ Array<String>
Extracts the extent from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 975
def extract_extent record
  subfield_xpath = "subfield[@code = 'a' or @code = 'b' or @code = 'c']"
  record.xpath("datafield[@tag=300]").map { |datafield|
    datafield.xpath(subfield_xpath).filter_map { |s|
      s.text unless s.text.empty?
    }.join ' '
  }.filter_map { |ext|
    "Extent: #{DS::Util.clean_string ext}" unless ext.strip.empty?
  }
end
#extract_former_owners(record) ⇒ Array<DS::Extractor::Name>
Extract former owners from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 123
def extract_former_owners record
  extract_names(
    record, tags: [700, 710, 711], relators: ['former owner']
  )
end
#extract_former_owners_as_recorded(record) ⇒ Array<String>
Extracts former owners as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 133
def extract_former_owners_as_recorded record
  extract_former_owners(record).map &:as_recorded
end
#extract_former_owners_as_recorded_agr(record) ⇒ Array<String>
Extracts former owners as recorded with vernacular form from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 141
def extract_former_owners_as_recorded_agr record
  extract_former_owners(record).map &:vernacular
end
#extract_genre_vocabulary(record) ⇒ Array<Symbol>
Extracts the genre vocabulary from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 530
def extract_genre_vocabulary record
  extract_genres(record).map(&:vocab)
end
#extract_genres(record, sub_sep: '--', vocab: :all) ⇒ Array<DS::Extractor::Genre>
Extracts genres from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 508
def extract_genres record, sub_sep: '--', vocab: :all
  xpath = %q{datafield[@tag = 655]}
  record.xpath(xpath).filter_map { |datafield|
    as_recorded = collect_subfields datafield, codes: 'abcvzyx'.split(//), sub_sep: sub_sep
    as_recorded = DS::Util.clean_string as_recorded, terminator: ''
    term_vocab = extract_vocabulary datafield
    source_authority_uri = extract_authority_number datafield
    next unless as_recorded.present?
    next unless vocab == :all || vocab == term_vocab
    DS::Extractor::Genre.new(
      as_recorded: as_recorded,
      vocab: term_vocab,
      source_authority_uri: source_authority_uri
    )
  }
end
#extract_genres_as_recorded(record, uniq: true) ⇒ Array<String>
Genres and subjects
Extract genre and form terms from MARC datafield 655 values, where the 655$2 value can be specified; e.g., rbprov, aat, lcgft.
Set sub2 to :all to extract all 655 terms
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 366
def extract_genres_as_recorded record, uniq: true
  terms = extract_genres(record, sub_sep: '--', vocab: :all).map(&:as_recorded)
  uniq ? terms.uniq : terms
end
#extract_langs(record) ⇒ String
Extract the language codes from controlfield 008 and datafield 041$a.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 321
def extract_langs record
  # Language is in 008 at characters 35-37 (0-based indexing)
  (langs ||= []) << record.xpath("substring(controlfield[@tag='008']/text(), 36, 3)")
  # 041 is present if there's more than one language
  langs += record.xpath("datafield[@tag=041]/subfield[@code='a']").map(&:text)
  # if there are 041 values, the lang from 008 is repeated; remove the duplicate
  langs.select(&:present?).uniq
end
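The offset in the XPath call can look off by one: the language code sits at 0-based positions 35 to 37, while XPath's substring() counts from 1, hence the 36. A plain-Ruby sketch of the same selection follows; sketch_langs and the sample 008 value are hypothetical, and the 041$a values are passed in as an array rather than read from XML.

```ruby
# Build a 40-character sample 008 value with 'lat' at positions 35-37.
SAMPLE_008 = ('220518q14001500xx '.ljust(35) + 'lat' + ' d').freeze

# Hypothetical stand-in for extract_langs working on plain strings.
def sketch_langs(ctrl_008, field_041_a = [])
  langs = [ctrl_008[35, 3].to_s] # 0-based start 35 == XPath position 36
  langs += field_041_a
  # drop blank codes and the duplicate of the 008 language
  langs.reject { |l| l.strip.empty? }.uniq
end

p SAMPLE_008.length
p sketch_langs(SAMPLE_008)
p sketch_langs(SAMPLE_008, %w[lat grc])
```

When 041$a repeats the 008 language, uniq keeps a single copy and preserves the order in which codes were first seen.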
#extract_languages(record) ⇒ Object
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 340
def extract_languages record
  xpath = "datafield[@tag=546]/subfield[@code='a']"
  langs = record.xpath(xpath).map { |val|
    DS::Util.clean_string val.text, terminator: ''
  }.select(&:present?).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
  return langs if langs.present?
  extract_langs(record).map { |as_recorded|
    DS::Extractor::Language.new as_recorded: as_recorded
  }
end
#extract_languages_as_recorded(record) ⇒ String
Extract the language as recorded; default to the 546$a field; otherwise return the code values from controlfield 008 and 041$a.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 336
def extract_languages_as_recorded record
  extract_languages(record).map &:as_recorded
end
#extract_material_as_recorded(record) ⇒ String
Extracts the material as recorded from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 954
def extract_material_as_recorded record
  extract_materials(record).map(&:as_recorded).first.to_s
end
#extract_materials(record) ⇒ Array<DS::Extractor::Material>
Extracts materials from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 962
def extract_materials record
  DS::Extractor::MarcXmlExtractor.collect_datafields(
    record, tags: 300, codes: 'b'
  ).filter_map { |material|
    next unless material.present?
    DS::Extractor::Material.new as_recorded: material
  }
end
#extract_mmsid(record) ⇒ String
Extracts the MMS ID from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1134
def extract_mmsid record
  record.xpath("controlfield[@tag=001]").text
end
#extract_name_portion(datafield) ⇒ String
Extract the PN from datafield, pulling subfields $a, $b, $c, $d.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 289
def extract_name_portion datafield
  codes = %w{ a b c d }
  value = collect_subfields datafield, codes: codes
  DS::Util.clean_string value, terminator: ''
end
#extract_named_500(record, name:, strip_name: false) ⇒ Array<String>
Return an array of 500$a values that begin with name: (name followed by a colon :). The name prefix is removed if strip_name is true; it’s false by default.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1156
def extract_named_500 record, name:, strip_name: false
  return [] if name.to_s.strip.empty?
  # format the prefix; make sure there's not an extra ':'
  prefix = "#{name.strip.chomp ':'}:"
  xpath = %Q{datafield[@tag=500]/subfield[@code='a' and starts-with(translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), '#{prefix.downcase}')]/text()}
  record.xpath(xpath).map { |d|
    note = d.text.strip
    strip_name ? note.sub(%r{^#{prefix}\s*}i, '') : note
  }
end
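The prefix handling can be tried without XML by running the same logic over an array of note strings. This is a sketch under stated assumptions: sketch_named_notes and the sample notes are hypothetical, the case-insensitive starts-with test is done in Ruby rather than XPath, and Regexp.escape is an addition that the source above does not use.

```ruby
# Hypothetical stand-in for extract_named_500 operating on plain strings.
def sketch_named_notes(notes, name:, strip_name: false)
  return [] if name.to_s.strip.empty?

  # format the prefix; tolerate a trailing ':' in the given name
  prefix = "#{name.strip.chomp(':')}:"
  notes.map(&:strip)
       .select { |note| note.downcase.start_with?(prefix.downcase) }
       .map { |note| strip_name ? note.sub(/\A#{Regexp.escape(prefix)}\s*/i, '') : note }
end

notes = ['Binding: Contemporary calf.', 'binding: rebacked', 'Layout: 2 columns.']
p sketch_named_notes(notes, name: 'Binding')
p sketch_named_notes(notes, name: 'Binding:', strip_name: true)
```

Passing 'Binding' or 'Binding:' gives the same prefix, and matching ignores case in both the selection and the stripping step.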
#extract_named_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extract named subjects from the MARC XML record based on specified tags.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 453
def extract_named_subjects record
  extract_subject_by_tags record, tags: [600, 610, 611, 630, 647]
end
#extract_named_subjects_as_recorded(record) ⇒ Array<String>
Extracts named subjects as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 444
def extract_named_subjects_as_recorded record
  extract_named_subjects(record).map &:as_recorded
end
#extract_names(record, tags: [], relators: []) ⇒ Array<DS::Extractor::Name>
Extract names from the MARC XML record based on specified tags and relators.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 199
def extract_names record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return [] if xpath.empty? # don't process nonsensical requests
  record.xpath(xpath).map { |datafield|
    as_recorded = extract_name_portion datafield
    role = extract_role datafield, relators: relators
    role = 'author' if role.blank?
    vernacular = extract_pn_agr datafield
    ref = extract_authority_number datafield
    DS::Extractor::Name.new(
      as_recorded: as_recorded,
      role: role,
      vernacular: vernacular,
      ref: ref
    )
  }
end
#extract_names_as_recorded(record, tags: [], relators: []) ⇒ String
Extract names from record using tags and relators. Tags understood are 100, 700, and 710. The relators are used to require datafields based on the contents of a subfield code e containing the specified value, like ‘scribe’:
contains(./subfield[@code ='e'], 'scribe')
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 25
def extract_names_as_recorded record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return '' if xpath.empty? # don't process nonsensical requests
  record.xpath(xpath).map { |datafield| DS::Util.clean_string extract_name_portion datafield }
end
#extract_names_as_recorded_agr(record, tags: [], relators: []) ⇒ Object
Extract the alternate graphical representation of the name or return ''.
See MARC specification for 880 fields:
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 230
def extract_names_as_recorded_agr record, tags: [], relators: []
  xpath = build_name_query tags: tags, relators: relators
  return '' if xpath.empty? # don't process nonsensical requests
  record.xpath(xpath).map { |datafield|
    extract_pn_agr datafield
  }
end
#extract_notes(record) ⇒ Array<String>
Notes
Extract notes from record.
Extract values from 500$a fields that do not begin with AMREMM tags for specific values like 'Binding:'. Specifically, this method ignores fields beginning with:
Pagination|Foliation|Layout|Colophon|Collation|Script|Decoration|\
Binding|Origin|Watermarks|Watermark|Signatures|Shelfmark
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1001
def extract_notes record
  xpath = "datafield[@tag=500 or @tag=561]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |note|
    DS::Util.clean_string note.text.strip.gsub(%r{\s+}, ' ')
  }
end
#extract_physical_description(record) ⇒ String
Extracts the physical description from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 946
def extract_physical_description record
  extract_extent(record)
end
#extract_places(record) ⇒ Object
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 575
def extract_places record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |pn|
    next if pn.to_s.blank?
    as_recorded = DS::Util.clean_string(pn.text, terminator: '')
    DS::Extractor::Place.new as_recorded: as_recorded
  }
end
#extract_pn_agr(datafield) ⇒ String
Extract the alternate graphical representation of the name or return ''.
See MARC specification for 880 fields:
Input will look like this:
<marc:datafield ind1="1" ind2=" " tag="100">
<marc:subfield code="6">880-01</marc:subfield>
<marc:subfield code="a">Urmawī, ʻAbd al-Muʼmin ibn Yūsuf,</marc:subfield>
<marc:subfield code="d">approximately 1216-1294.</marc:subfield>
</marc:datafield>
<!-- ... -->
<marc:datafield ind1="1" ind2=" " tag="880">
<marc:subfield code="6">100-01//r</marc:subfield>
<marc:subfield code="a">ارموي، عبد المؤمن بن يوسف،</marc:subfield>
<marc:subfield code="d">اپرxمتلي 12161294.</marc:subfield>
</marc:datafield>
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1041
def extract_pn_agr datafield
  linkage = datafield.xpath("subfield[@code='6']").text
  return '' if linkage.empty?
  tag = datafield.xpath('./@tag').text
  index = linkage.split('-').last
  xpath = "./parent::record/datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]"
  extract_name_portion datafield.xpath(xpath)
end
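The $6 linkage lookup can be followed with plain data. In the sample input above, the 100 field carries 880-01 and the matching 880 field carries 100-01//r, so the search key is the source tag plus the occurrence number. The names sketch_agr_key, sketch_find_agr, and SAMPLE_880_FIELDS are hypothetical, and hashes stand in for the 880 datafields.

```ruby
# Stand-ins for 880 datafields: the $6 linkage and the name portion.
SAMPLE_880_FIELDS = [
  { linkage: '245-02//r', name: 'title in original script' },
  { linkage: '100-01//r', name: 'name in original script' }
].freeze

# '880-01' on a 100 field becomes the key '100-01'.
def sketch_agr_key(tag, linkage)
  return nil if linkage.to_s.empty?

  index = linkage.split('-').last
  "#{tag}-#{index}"
end

# Returns the name from the 880 field whose $6 contains the key, or ''.
def sketch_find_agr(tag, linkage, fields_880)
  key = sketch_agr_key(tag, linkage)
  return '' unless key

  match = fields_880.find { |field| field[:linkage].include?(key) }
  match ? match[:name] : ''
end

p sketch_agr_key('100', '880-01')
p sketch_find_agr('100', '880-01', SAMPLE_880_FIELDS)
p sketch_find_agr('700', '', SAMPLE_880_FIELDS)
```

A substring test is used because the 880 linkage may carry a trailing script or orientation suffix such as //r.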
#extract_production_date_as_recorded(record) ⇒ Object
Look for a date as recorded. Look first at 260$c and 260$d, then 264$c, then 245$f; finally fall back to the encoded date from 008.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 790
def extract_production_date_as_recorded record
  # Note that MARC does not specify a subfield '260$d':
  #
  #   https://www.loc.gov/marc/bibliographic/bd260.html
  #
  # However Cornell use $d to continue 260$c
  dar = record.xpath("datafield[@tag=260]/subfield[@code='c' or @code='d']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  dar = record.xpath("datafield[@tag=264]/subfield[@code='c']/text()").map do |t|
    DS::Util.clean_string t.text.strip
  end.join ' '
  return [dar.strip] unless dar.strip.empty?

  # 245 is the title field but can have a date in $f
  #
  # see: https://www.loc.gov/marc/bibliographic/bd245.html
  #
  # Cornell uses 245$f in records that also lack 260 or 264; see
  # '4600 Bd. Ms. 176':
  #
  #   https://catalog.library.cornell.edu/catalog/6382455/librarian_view
  #
  #   <datafield ind1="0" ind2="0" tag="245">
  #     <subfield code="a">Shah-nameh,</subfield>
  #     <subfield code="f">1600s.</subfield>
  #   </datafield>
  #
  dar = record.xpath("datafield[@tag=245]/subfield[@code='f']").text
  return [DS::Util.clean_string(dar)] unless dar.strip.empty?

  encoded_date = extract_date_range record, range_sep: '-'
  [encoded_date.join('_').strip]
end
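The fallback order used by this method can be sketched with plain Ruby data. This is an illustration under stated assumptions, not the library's implementation: `date_as_recorded` is a hypothetical function, the record is modelled as a hash from "tag$code" strings to arrays of values, and the string cleaning done by `DS::Util.clean_string` is omitted.

```ruby
# Hypothetical stand-in for a parsed record: "tag$code" => array of values.
# Returns a one-element array, like the documented method.
def date_as_recorded(fields, encoded_date: '')
  candidates = [
    # 260$c followed by 260$d (a simplification: the original keeps
    # the subfields in document order)
    (fields.fetch('260$c', []) + fields.fetch('260$d', [])).join(' '),
    fields.fetch('264$c', []).join(' '),
    fields.fetch('245$f', []).join(' ')
  ]
  # First non-blank candidate wins; otherwise use the encoded date.
  first = candidates.map(&:strip).find { |value| !value.empty? }
  [first || encoded_date]
end

p date_as_recorded({ '260$c' => ['1450'], '260$d' => ['or later'] })
# prints ["1450 or later"]
p date_as_recorded({ '245$f' => ['1600s.'] })
# prints ["1600s."]
p date_as_recorded({}, encoded_date: '1400-1499')
# prints ["1400-1499"]
```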
#extract_production_places_as_recorded(record) ⇒ Array<String>
Look for places as recorded. Collects 260$a and 264$a in a single query; returns an empty array when neither subfield is present.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 551
def extract_production_places_as_recorded record
  xpath = "datafield[@tag=260 or @tag=264]/subfield[@code='a']/text()"
  record.xpath(xpath).map { |pn|
    DS::Util.clean_string pn.text, terminator: '' unless pn.to_s.strip.empty?
  }
end
#extract_recon_genres(record, sub_sep: '--') ⇒ Array<Array>
Extract genre terms for reconciliation CSV output.
Returns a two-dimensional array; each row is a genre term, and each row has three columns: term, vocab, and authority number.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 498
def extract_recon_genres record, sub_sep: '--'
  extract_genres(record, sub_sep: sub_sep).map(&:to_a)
end
#extract_recon_names(record, tags: [], relators: []) ⇒ Array<Array<String>>
For the given record, extract the names as an array of arrays, including the concatenated name string (subfields a, b, c, d) and, if present, the alternate graphical representation (AGR) and authority number (or URI).
Each returned sub array will have three values: name, name AGR, URI.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 189
def extract_recon_names record, tags: [], relators: []
  extract_names(record, tags: tags, relators: relators).map &:to_a
end
#extract_recon_places(record) ⇒ Array<Array>
Extract the places of production (MARC 260$a and 264$a) for reconciliation CSV output.
Returns a two-dimensional array, each row is a place; and each row has one column: place name; for example:
[["Austria"],
["Germany"],
["France (?)"]]
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 571
def extract_recon_places record
  extract_places(record).map &:to_a
end
#extract_recon_subjects(record) ⇒ Array
Extracts subjects from the given record for reconciliation CSV output.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 538
def extract_recon_subjects record
  extract_all_subjects(record).map &:to_a
end
#extract_recon_titles(record) ⇒ Array<String>
Extracts titles from the given record for reconciliation CSV output.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 835
def extract_recon_titles record
  extract_titles(record).to_a
end
#extract_role(datafield, relators:) ⇒ String
Extract the role value, subfield $e, from the given datafield.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 300
def extract_role datafield, relators:
  relators_list = *relators
  return '' if relators_list.empty? or relators_list.include? :none
  # if there's no $e, stop processing
  return '' if datafield.xpath('subfield[@code = "e"]/text()').text.empty?

  df_roles = datafield.xpath('subfield[@code = "e"]/text()').map(&:text)
  rel_pattern = /(#{relators_list.join('|')})/
  role = df_roles.find { |role| role =~ rel_pattern }
  DS::Util.clean_string role, terminator: ''
end
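The relator matching in this method can be tried out on plain strings. The sketch below is a simplified, hypothetical `pick_role` that takes the `$e` values as an array instead of a datafield and skips the final `DS::Util.clean_string` call, so trailing punctuation is left in place.

```ruby
# Hypothetical, simplified version of the role lookup: roles is an array
# of $e values; relators is a string, symbol, or array of either.
def pick_role(roles, relators)
  relators_list = Array(relators)
  return '' if relators_list.empty? || relators_list.include?(:none)
  return '' if roles.empty?

  # Same pattern construction as the documented method: an unanchored
  # alternation of the requested relators.
  pattern = /(#{relators_list.join('|')})/
  roles.find { |role| role =~ pattern }.to_s
end

p pick_role(['former owner.', 'scribe.'], ['scribe']) # prints "scribe."
p pick_role(['artist'], :none)                        # prints ""
p pick_role(['subscriber'], ['scribe'])               # prints "subscriber"
```

The last call shows a consequence of the unanchored pattern: any `$e` value that merely contains a relator term as a substring is treated as a match.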
#extract_scribes(record) ⇒ Array<DS::Extractor::Name>
Extract scribes from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 70
def extract_scribes record
  extract_names(
    record, tags: [700, 710, 711], relators: ['scribe']
  )
end
#extract_scribes_as_recorded(record) ⇒ Array<String>
Extract scribes as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 80
def extract_scribes_as_recorded record
  extract_scribes(record).map &:as_recorded
end
#extract_scribes_as_recorded_agr(record) ⇒ Array<String>
Extracts scribes as recorded with vernacular form from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 88
def extract_scribes_as_recorded_agr record
  extract_scribes(record).map &:vernacular
end
#extract_subject_by_tags(record, tags: []) ⇒ Array<DS::Extractor::Subject>
Return an array of DS::Extractor::Subject objects with formatted subjects (600, 610, 611, 630, 647, 648, 650, and 651). Subject values are separated by '--':
<datafield ind1="1" ind2="0" tag="600">
<subfield code="a">Cicero, Marcus Tullius</subfield>
<subfield code="x">Spurious and doubtful works.</subfield>
</datafield>
# => "Cicero, Marcus Tullius--Spurious and doubtful works"
Subfields with codes ‘b’, ‘c’, ‘d’, ‘p’, ‘q’, and ‘t’ are appended to the preceding subfield:
<datafield ind1=" " ind2="7" tag="647">
<subfield code="a">Conspiracy of Catiline</subfield>
<subfield code="c">(Rome :</subfield>
<subfield code="d">65-62 B.C.)</subfield>
<subfield code="2">fast</subfield>
<subfield code="0">(OCoLC)fst01352536</subfield>
</datafield>
# => "Conspiracy of Catiline (Rome : 65-62 B.C.)"
@param [Nokogiri::XML::Node] record the MARC record
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 398
def extract_subject_by_tags record, tags: []
  tag_list = *tags
  raise "No tags given for subject extraction: #{tags.inspect}" if tag_list.empty?
  sep = '--'
  tag_query = tag_list.map { |tag| "@tag=#{tag}" }.join " or "
  record.xpath("datafield[#{tag_query}]").map { |datafield|
    values = Hash.new { |hash, k| hash[k] = [] }
    vocab = datafield.xpath('./@ind2').text
    datafield.xpath("subfield").map { |subfield|
      subfield_text = DS::Util.clean_string subfield.text
      subfield_code = subfield.xpath('./@code').text
      case subfield_code
      when 'e', 'w'
        # don't include these formatted in subject
      when 'b', 'c', 'd', 'p', 'q', 't'
        # append these to the preceding value
        # we assume that there is a preceding value
        values[:terms][-1] += " #{subfield_text}"
        values[:codes][-1] += ";#{subfield_code}"
      when %r{\A[[:alpha:]]\z}
        # any other codes: a, g, v, x, y, z
        values[:terms] << subfield_text
        values[:codes] << subfield_code
      when '2'
        vocab = subfield.text
      when '0'
        values[:urls] << subfield_text
      end
    }
    terms = DS::Util.clean_string values[:terms].join(sep), terminator: ''
    urls = DS::Util.clean_string values[:urls].join(sep), terminator: ''
    codes = DS::Util.clean_string values[:codes].join(sep), terminator: ''
    DS::Extractor::Subject.new(
      as_recorded: terms,
      subfield_codes: codes,
      source_authority_uri: urls,
      vocab: vocab
    )
  }
end
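The subfield handling described for this method can be sketched on plain data. The example below is an approximation, not the library code: `format_subject` and the two constants are hypothetical, subfields are given as `[code, text]` pairs, no terminator stripping is applied, and (unlike the original, which assumes a preceding value) an appended subfield with nothing before it simply starts a new term.

```ruby
# Hypothetical names; the code lists mirror the documented behaviour.
APPEND_CODES = %w[b c d p q t].freeze # appended to the preceding term
SKIP_CODES   = %w[e w].freeze         # left out of the formatted subject

# subfields: array of [code, text] pairs in document order.
def format_subject(subfields, sep: '--')
  terms = []
  urls = []
  subfields.each do |code, text|
    if SKIP_CODES.include?(code)
      next
    elsif APPEND_CODES.include?(code)
      # Deviation from the original: guard against a missing preceding term.
      terms.empty? ? terms << text : terms[-1] += " #{text}"
    elsif code.match?(/\A[[:alpha:]]\z/)
      terms << text # any other alphabetic code starts a new term
    elsif code == '0'
      urls << text  # authority number or URI
    end
  end
  { as_recorded: terms.join(sep), authority: urls.join(sep) }
end

catiline = [
  ['a', 'Conspiracy of Catiline'], ['c', '(Rome :'], ['d', '65-62 B.C.)'],
  ['2', 'fast'], ['0', '(OCoLC)fst01352536']
]
p format_subject(catiline)[:as_recorded]
# prints "Conspiracy of Catiline (Rome : 65-62 B.C.)"
```

With the 600 example above, this sketch yields "Cicero, Marcus Tullius--Spurious and doubtful works." with the trailing period retained, because the terminator stripping done by `DS::Util.clean_string` is not reproduced here.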
#extract_subjects(record) ⇒ Array<DS::Extractor::Subject>
Extracts subjects from the given record based on specified tags.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 469
def extract_subjects record
  extract_subject_by_tags record, tags: [648, 650, 651]
end
#extract_subjects_as_recorded(record) ⇒ Array<String>
Extracts subjects as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 461
def extract_subjects_as_recorded record
  extract_subjects(record).map &:as_recorded
end
#extract_titles(record) ⇒ Array<DS::Extractor::Title>
Extracts titles from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 843
def extract_titles record
  tar = title_as_recorded record
  tar_agr = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.title_as_recorded_agr(record, 245), terminator: ''
  utar = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_titles_as_recorded(record), terminator: ''
  utar_agr = DS::Util.clean_string DS::Extractor::MarcXmlExtractor.uniform_title_as_recorded_agr(record), terminator: ''

  [DS::Extractor::Title.new(
    as_recorded: tar,
    vernacular: tar_agr,
    uniform_title: utar,
    uniform_title_vernacular: utar_agr
  )]
end
#extract_titles_as_recorded(record) ⇒ Array<String>
Extracts titles as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 893
def extract_titles_as_recorded record
  extract_titles(record).map &:as_recorded
end
#extract_titles_as_recorded_agr(record) ⇒ Array<String>
Extracts titles as recorded with vernacular form from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 861
def extract_titles_as_recorded_agr record
  extract_titles(record).map &:vernacular
end
#extract_uniform_titles_as_recorded(record) ⇒ Array<String>
Extracts uniform titles as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 913
def extract_uniform_titles_as_recorded record
  extract_titles(record).map &:uniform_title
end
#extract_uniform_titles_as_recorded_agr(record) ⇒ Array<String>
Extracts uniform titles as recorded with vernacular form from the given MARC XML record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 922
def extract_uniform_titles_as_recorded_agr record
  extract_titles(record).map &:uniform_title_vernacular
end
#extract_vocabulary(datafield) ⇒ String
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 1095
def extract_vocabulary datafield
  return 'lcsh' if datafield['ind2'] == '0'

  vocab = datafield.xpath("subfield[@code=2]").text
  vocab.chomp '.' if vocab.present?
end
#handle_bce_date(record) ⇒ Array<String>
Compiles BCE dates based on the provided record. It extracts BCE dates from specific subfields in the MARC XML record.
The method stops and returns an empty array [] if the record lacks a 046$b (BCE date 1). It then looks for a 046$d (BCE date 2) or 046$e (CE date 2). Returns an array containing either the single 046$b value as a negative value or a range of two dates.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 749
def handle_bce_date record
  # "datafield[@tag=260]/subfield[@code='c' or @code='d']/text()")
  bce_date1 = record.at_xpath('datafield[@tag=046]/subfield[@code="b"]/text()').to_s
  # stop if there's no BCE date 1
  return [] if bce_date1.blank?

  xpath = 'datafield[@tag=046]/subfield[@code="d"]/text()'
  bce_date2 = record.at_xpath(xpath).to_s
  return ["-#{bce_date1}", "-#{bce_date2}"] if bce_date2.present?

  xpath = 'datafield[@tag=046]/subfield[@code="e"]/text()'
  ce_date2 = bce_date2 = record.at_xpath(xpath).to_s
  return ["-#{bce_date1}", ce_date2] if ce_date2.present?

  ["-#{bce_date1}"]
end
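The three possible return shapes can be shown with a small sketch that takes the subfield values directly. `bce_dates` is a hypothetical function whose arguments stand in for 046$b, 046$d, and 046$e; it uses `empty?` rather than the ActiveSupport `blank?` and `present?` helpers seen in the source.

```ruby
# Hypothetical sketch: arguments stand in for 046$b, 046$d and 046$e.
def bce_dates(date1_bce, date2_bce: '', date2_ce: '')
  # Without a first BCE date there is nothing to compile.
  return [] if date1_bce.to_s.empty?
  # A second BCE date gives a range of two negative values.
  return ["-#{date1_bce}", "-#{date2_bce}"] unless date2_bce.to_s.empty?
  # A CE end date gives a range that crosses into the common era.
  return ["-#{date1_bce}", date2_ce] unless date2_ce.to_s.empty?

  ["-#{date1_bce}"]
end

p bce_dates('0300', date2_bce: '0200') # prints ["-0300", "-0200"]
p bce_dates('0050', date2_ce: '0100')  # prints ["-0050", "0100"]
p bce_dates('0300')                    # prints ["-0300"]
p bce_dates('')                        # prints []
```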
#title_as_recorded(record) ⇒ String
Extracts the title as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 869
def title_as_recorded record
  xpath = "datafield[@tag=245]/subfield[@code='a' or @code='b']"
  record.xpath(xpath).map { |title|
    DS::Util.clean_string(title.text, terminator: '')
  }.join '; '
end
#title_as_recorded_agr(record, tag) ⇒ String
Extracts the title as recorded with vernacular form from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 881
def title_as_recorded_agr record, tag
  linkage = record.xpath("datafield[@tag=#{tag}]/subfield[@code='6']").text
  return '' if linkage.empty?
  index = linkage.split('-').last
  xpath = "datafield[@tag='880' and contains(./subfield[@code='6'], '#{tag}-#{index}')]/subfield[@code='a']"
  DS::Util.clean_string record.xpath(xpath).text.delete '[]'
end
#uniform_title_as_recorded_agr(record) ⇒ String
Extracts the alternate graphical representation (AGR) of the uniform titles (240 and 130) from the given MARC XML record and joins non-empty values with '|'.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 930
def uniform_title_as_recorded_agr record
  tag240 = title_as_recorded_agr record, 240
  tag130 = title_as_recorded_agr record, 130
  [tag240, tag130].reject(&:empty?).map { |title|
    DS::Util.clean_string title
  }.join '|'
end
#uniform_titles_as_recorded(record) ⇒ String
Extracts uniform titles as recorded from the given record.
# File 'lib/ds/extractor/marc_xml_extractor.rb', line 901
def uniform_titles_as_recorded record
  title_240 = record.xpath("datafield[@tag=240]/subfield[@code='a']").text
  title_130 = record.xpath("datafield[@tag=130]/subfield[@code='a']").text
  [title_240, title_130].reject(&:empty?).map { |title|
    DS::Util.clean_string title, terminator: ''
  }.join '|'
end
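The joining rule used for the 240 and 130 values is easy to check in isolation. A minimal sketch, assuming a hypothetical `join_uniform_titles` that receives the two `$a` strings directly and omits the `DS::Util.clean_string` step:

```ruby
# Hypothetical helper: drop empty titles, join the rest with '|'.
def join_uniform_titles(title_240, title_130)
  [title_240, title_130].reject(&:empty?).join('|')
end

p join_uniform_titles('Shahnamah', '')       # prints "Shahnamah"
p join_uniform_titles('Shahnamah', 'Qurʼan') # prints "Shahnamah|Qurʼan"
p join_uniform_titles('', '')                # prints ""
```

A record with neither field therefore produces an empty string rather than nil, which matches the String return type documented for the method.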