Module: PennMARC::Util
- Included in:
- Helper
- Defined in:
- lib/pennmarc/util.rb
Overview
Module holding “utility” methods used by the MARC parsing helpers.
Constant Summary
- TRAILING_PUNCTUATIONS_PATTERNS =
{ semicolon: /\s*;\s*$/, colon: /\s*:\s*$/, equal: /=$/, slash: %r{\s*/\s*$}, comma: /\s*,\s*$/, period: /\.\s*$/ }.freeze
Instance Method Summary
-
#append_relator(field:, joined_subfields:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String
Appends a relator value to the given string.
-
#append_trailing(trailer, string) ⇒ String
Append the given punctuation to the end of a string, unless the string already ends with a period or hyphen.
-
#datafield_and_linked_alternate(record, tag) ⇒ Array<String>
Returns the non-6,8 subfields from a datafield and its 880 link.
-
#field_defined?(record, marc_field) ⇒ Boolean
Check if a given record has a field present by tag (e.g., ‘041’).
-
#field_or_its_linked_alternate?(field, tags) ⇒ Boolean
Does a field or its linked alternate match any of the specified tags?.
-
#join_and_squish(array) ⇒ String
Join an array and normalize extraneous spaces.
-
#join_subfields(field, &selector) ⇒ String
Join subfields from a field selected based on a provided proc.
-
#linked_alternate(record, subfield6_value, &selector) ⇒ Array
MARC 880 field “Alternate Graphic Representation” contains text “linked” to another field (e.g., 245 [Title]) used as an alternate representation.
-
#linked_alternate_not_6_or_8(record, subfield6_value) ⇒ Array
Common case of wanting to extract all the subfields besides 6 or 8, from 880 datafield that has a particular subfield 6 value.
-
#no_subfield_value_matches?(field, subfield, regex) ⇒ Boolean?
Returns true if the field has no value in the given subfield that matches the given regex.
-
#prefixed_subject_and_alternate(record, prefix) ⇒ Array
Get 650 & 880 for Provenance and Chronology: prefix should be ‘PRO’ or ‘CHR’ and may be preceded by a ‘%’.
-
#relator(field:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String
Returns a relator value of the given field.
-
#relator_join_separator(str) ⇒ String (frozen)
Match any open dates ending a given string to determine join separator for relator term in 1xx/7xx fields.
- #relator_term_subfield(field) ⇒ String (frozen)
-
#remove_paren_value_from_subfield_i(field) ⇒ String
If there’s a subfield i, extract its value, and if that value contains something in parentheses, remove the parenthetical.
-
#subfield_defined?(field, subfield) ⇒ Boolean
Check if a field has a given subfield defined.
-
#subfield_in?(array) ⇒ Proc
returns a lambda checking if passed-in subfield’s code is a member of array.
-
#subfield_not_in?(array) ⇒ Proc
returns a lambda checking if passed-in subfield’s code is NOT a member of array.
-
#subfield_undefined?(field, subfield) ⇒ Boolean
Check if a field does not have a given subfield defined.
-
#subfield_value?(field, subfield, regex) ⇒ Boolean?
Returns true if the field has a value in the given subfield that matches the given regex.
-
#subfield_value_in?(field, subfield, array) ⇒ Boolean
returns true if a given field has a given subfield value in a given array.
-
#subfield_value_not_in?(field, subfield, array) ⇒ Boolean
returns true if a given field does not have a given subfield value in a given array.
-
#subfield_values(field, subfield) ⇒ Array
Gets all subfield values for a subfield in a given field.
-
#subfield_values_for(tag:, subfield:, record:) ⇒ Array
Get all subfield values for a provided subfield from any occurrence of a provided tag/tags.
-
#substring_after(string, target) ⇒ String (frozen)?
Get the substring of a string after the first occurrence of a target character.
-
#substring_before(string, target) ⇒ String (frozen)?
Get the substring of a string up to a given target character.
-
#translate_relator(relator_code, mapping) ⇒ String?
Translate a relator code using mapping.
-
#trim_punctuation(string) ⇒ String
Trim punctuation method extracted from Traject macro, to ensure consistent output.
- #trim_trailing(trailer, string) ⇒ String
-
#trim_trailing!(trailer, string) ⇒ String, Nil
trim trailing punctuation, manipulating string in place.
-
#valid_subject_genre_source_code?(field) ⇒ Boolean
Does the given field specify an allowed source code?.
Instance Method Details
#append_relator(field:, joined_subfields:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String
Appends a relator value to the given string. It prioritizes relator codes found in subfield $4 and falls back to the relator term subfield given in relator_term_sf (typically ‘e’, or ‘j’ for x11 fields; see #relator_term_subfield) if no valid codes are found in $4. Use with 1xx/7xx fields.
# File 'lib/pennmarc/util.rb', line 336

def append_relator(field:, joined_subfields:, relator_term_sf:, relator_map: Mappers.relator)
  joined_subfields = trim_trailing(:comma, joined_subfields)
  join_separator = relator_join_separator(joined_subfields)
  relator = subfield_values(field, '4').filter_map { |code| translate_relator(code, relator_map) }
  relator = subfield_values(field, relator_term_sf).map { |term| trim_trailing(:comma, term) } if relator.blank?
  relator = append_trailing(:period, relator.join(', ')) if relator.present?
  [joined_subfields, relator].compact_blank.join(join_separator).squish
end
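The priority order described above ($4 codes first, then the relator term subfield) can be sketched in plain Ruby. This is a simplified illustration, not the library's implementation: `Subfield`, `RELATOR_MAP`, `values_for`, and `append_relator_sketch` are hypothetical stand-ins, and the ActiveSupport helpers used by the real method (`blank?`, `compact_blank`, `squish`) are replaced with core Ruby.

```ruby
# Hypothetical stand-ins for a MARC subfield and the relator mapping.
Subfield = Struct.new(:code, :value)
RELATOR_MAP = { aut: 'Author', ill: 'Illustrator' }.freeze

# Values of all non-empty subfields with the given code.
def values_for(subfields, code)
  subfields.select { |sf| sf.code == code && !sf.value.to_s.strip.empty? }.map(&:value)
end

# Simplified restatement of the documented logic: translated $4 codes win,
# and the relator term subfield is only consulted when none translate.
def append_relator_sketch(subfields, joined, term_sf)
  joined = joined.sub(/\s*,\s*$/, '')                      # drop a trailing comma
  separator = /\b\d+-\z/.match?(joined) ? ' ' : ', '       # open date => plain space
  relators = values_for(subfields, '4').filter_map { |code| RELATOR_MAP[code.to_sym] }
  relators = values_for(subfields, term_sf).map { |t| t.sub(/\s*,\s*$/, '') } if relators.empty?
  return joined if relators.empty?

  text = relators.join(', ')
  text += '.' unless text.end_with?('.', '-')              # same guard as append_trailing
  [joined, text].join(separator)
end

coded  = [Subfield.new('a', 'Austen, Jane,'), Subfield.new('4', 'aut')]
termed = [Subfield.new('a', 'Doe, John,'), Subfield.new('e', 'editor.')]
puts append_relator_sketch(coded, 'Austen, Jane, 1775-1817,', 'e')
puts append_relator_sketch(termed, 'Doe, John, 1950-', 'e')
```

In the second call the name ends in an open date, so the relator is joined with a space rather than a comma.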
#append_trailing(trailer, string) ⇒ String
Append the given punctuation to the end of a string, unless the string already ends with a period or hyphen
# File 'lib/pennmarc/util.rb', line 169

def append_trailing(trailer, string)
  return string if string.end_with?('.', '-')

  map = { semicolon: ';',
          colon: ':',
          slash: '/',
          comma: ',',
          period: '.' }
  string + map[trailer.to_sym]
end
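A short usage sketch may help show when the trailer is and is not appended. The method body below is copied from the listing above so that the example runs without the gem; the sample strings are made up for illustration.

```ruby
# Body copied from the listing above so the example is self-contained.
def append_trailing(trailer, string)
  return string if string.end_with?('.', '-')

  map = { semicolon: ';', colon: ':', slash: '/', comma: ',', period: '.' }
  string + map[trailer.to_sym]
end

puts append_trailing(:period, 'Author')     # the period is appended
puts append_trailing(:period, 'Author.')    # already ends with '.', returned unchanged
puts append_trailing('comma', 'Doe, John')  # a String trailer is converted with to_sym
puts append_trailing(:colon, '1950-')       # an open date ending in '-' is left alone
```

Note that the lookup map in this listing has no `equal` entry, so a trailer such as `:equal` would raise a `TypeError` here (a `nil` cannot be concatenated to a String), even though `TRAILING_PUNCTUATIONS_PATTERNS` defines one for trimming.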
#datafield_and_linked_alternate(record, tag) ⇒ Array<String>
Returns the non-6,8 subfields from a datafield and its 880 link.
# File 'lib/pennmarc/util.rb', line 214

def datafield_and_linked_alternate(record, tag)
  record.fields(tag).filter_map { |field|
    join_subfields(field, &subfield_not_in?(%w[6 8]))
  } + linked_alternate_not_6_or_8(record, tag)
end
#field_defined?(record, marc_field) ⇒ Boolean
Check if a given record has a field present by tag (e.g., ‘041’)
# File 'lib/pennmarc/util.rb', line 19

def field_defined?(record, marc_field)
  record.select { |field| field.tag == marc_field }.any?
end
#field_or_its_linked_alternate?(field, tags) ⇒ Boolean
Does a field or its linked alternate match any of the specified tags?
# File 'lib/pennmarc/util.rb', line 303

def field_or_its_linked_alternate?(field, tags)
  return true if field.tag.in? tags
  return true if field.tag == '880' && subfield_value?(field, '6', /^(#{tags.join('|')})/)

  false
end
#join_and_squish(array) ⇒ String
Join an array and normalize extraneous spaces
# File 'lib/pennmarc/util.rb', line 239

def join_and_squish(array)
  array.join(' ').squish
end
#join_subfields(field, &selector) ⇒ String
Join subfields from a field selected based on a provided proc
# File 'lib/pennmarc/util.rb', line 27

def join_subfields(field, &selector)
  return '' unless field

  field.select(&selector).filter_map { |sf|
    value = sf.value&.strip
    next if value.blank?

    value
  }.join(' ').squish
end
#linked_alternate(record, subfield6_value, &selector) ⇒ Array
MARC 880 field “Alternate Graphic Representation” contains text “linked” to another field (e.g., 245 [Title]) used as an alternate representation. Often used to hold translations of title values. A common need is to extract the subfields selected by a passed-in block from an 880 datafield that has a particular subfield 6 value. See: www.loc.gov/marc/bibliographic/bd880.html
# File 'lib/pennmarc/util.rb', line 189

def linked_alternate(record, subfield6_value, &selector)
  record.fields('880').filter_map do |field|
    next unless subfield_value?(field, '6', /^(#{Array.wrap(subfield6_value).join('|')})/)

    field.select(&selector).map(&:value).join(' ')
  end
end
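The linkage described above, where an 880 field's subfield 6 begins with the tag of the field it mirrors, can be illustrated without the MARC gem. In this sketch `Subfield`, `Field`, and `linked_alternate_sketch` are hypothetical stand-ins, core `Array()` replaces ActiveSupport's `Array.wrap`, and the sample record data is invented.

```ruby
# Hypothetical stand-ins for MARC subfields and datafields.
Subfield = Struct.new(:code, :value)
Field = Struct.new(:tag, :subfields)

# Collect selected subfield values from every 880 whose $6 starts with one of
# the given tags, mirroring the filter in the listing above.
def linked_alternate_sketch(fields, linked_tags, &selector)
  pattern = /^(#{Array(linked_tags).join('|')})/
  fields.select { |f| f.tag == '880' }.filter_map do |field|
    next unless field.subfields.any? { |sf| sf.code == '6' && sf.value =~ pattern }

    field.subfields.select(&selector).map(&:value).join(' ')
  end
end

fields = [
  Field.new('245', [Subfield.new('6', '880-01'), Subfield.new('a', 'Voina i mir')]),
  Field.new('880', [Subfield.new('6', '245-01'), Subfield.new('a', 'Война и мир')]),
  Field.new('880', [Subfield.new('6', '260-02'), Subfield.new('a', 'Москва')])
]

# Only the 880 linked to 245 is returned; $6 and $8 are filtered out by the block.
p linked_alternate_sketch(fields, '245') { |sf| !%w[6 8].include?(sf.code) }
```

Passing an array of tags (for example `%w[245 260]`) widens the pattern to match any of them.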
#linked_alternate_not_6_or_8(record, subfield6_value) ⇒ Array
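Common case of wanting to extract all the subfields besides 6 or 8 from an 880 datafield that has a particular subfield 6 value. We exclude 6 because that value is the linkage ID itself, and 8 (the MARC field link and sequence number) presumably because it is likewise linkage metadata rather than display text.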
# File 'lib/pennmarc/util.rb', line 203

def linked_alternate_not_6_or_8(record, subfield6_value)
  excluded_subfields = %w[6 8]
  linked_alternate(record, subfield6_value) do |sf|
    excluded_subfields.exclude?(sf.code)
  end
end
#no_subfield_value_matches?(field, subfield, regex) ⇒ Boolean?
Returns true if the field has no value in the given subfield that matches the given regex
# File 'lib/pennmarc/util.rb', line 54

def no_subfield_value_matches?(field, subfield, regex)
  field&.none? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end
#prefixed_subject_and_alternate(record, prefix) ⇒ Array
Note (11/2018): $5 is not displayed in PRO or CHR subjects.
Get 650 & 880 for Provenance and Chronology: prefix should be ‘PRO’ or ‘CHR’ and may be preceded by a ‘%’
# File 'lib/pennmarc/util.rb', line 277

def prefixed_subject_and_alternate(record, prefix)
  record.fields(%w[650 880]).filter_map { |field|
    next unless field.indicator2 == '4'
    next if field.tag == '880' && no_subfield_value_matches?(field, '6', /^650/)
    next unless field.any? { |sf| sf.code == 'a' && sf.value =~ /^(#{prefix}|%#{prefix})/ }

    elements = field.select(&subfield_in?(%w[a])).map { |sf| sf.value.gsub(/^%?#{prefix}/, '') }
    elements << join_subfields(field, &subfield_not_in?(%w[a 6 8 e w 5]))
    join_and_squish elements
  }.uniq
end
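The prefix handling in this method (match ‘PRO’ or ‘%PRO’ at the start of $a, strip it, then append the remaining display subfields) can be sketched for a single field as follows. This is a reduced illustration only: it omits the indicator and 880 checks, `Subfield`, `squish`, and `prefixed_heading_sketch` are hypothetical stand-ins, and the heading data is invented.

```ruby
# Hypothetical stand-in for a MARC subfield.
Subfield = Struct.new(:code, :value)

# Core-Ruby approximation of ActiveSupport's String#squish.
def squish(string)
  string.gsub(/\s+/, ' ').strip
end

# Returns the heading with its prefix removed, or nil when $a lacks the prefix.
def prefixed_heading_sketch(subfields, prefix)
  return nil unless subfields.any? { |sf| sf.code == 'a' && sf.value =~ /^(#{prefix}|%#{prefix})/ }

  head = subfields.select { |sf| sf.code == 'a' }.map { |sf| sf.value.gsub(/^%?#{prefix}/, '') }
  rest = subfields.reject { |sf| %w[a 6 8 e w 5].include?(sf.code) }.map(&:value)
  squish((head + rest).join(' '))
end

provenance = [Subfield.new('a', '%PRO Penn, William'),
              Subfield.new('x', 'Bookplate'),
              Subfield.new('5', 'PU')]
puts prefixed_heading_sketch(provenance, 'PRO')
p prefixed_heading_sketch([Subfield.new('a', 'Quakers')], 'PRO')
```

The $5 value is dropped from the output, matching the exclusion list in the listing above.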
#relator(field:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String
Returns a relator value of the given field. Like append_relator, it prioritizes relator codes found in subfield $4 and falls back to the specified relator term subfield relator_term_sf if no valid codes are found in $4
# File 'lib/pennmarc/util.rb', line 356

def relator(field:, relator_term_sf:, relator_map: Mappers.relator)
  relator = subfield_values(field, '4').filter_map { |code| translate_relator(code, relator_map) }
  relator = subfield_values(field, relator_term_sf) if relator.blank?
  relator.join
end
#relator_join_separator(str) ⇒ String (frozen)
Match any open dates ending a given string to determine join separator for relator term in 1xx/7xx fields.
# File 'lib/pennmarc/util.rb', line 313

def relator_join_separator(str)
  /\b\d+-\z/.match?(str) ? ' ' : ', '
end
#relator_term_subfield(field) ⇒ String (frozen)
# File 'lib/pennmarc/util.rb', line 324

def relator_term_subfield(field)
  field_or_its_linked_alternate?(field, %w[111 411 611 711 811]) ? 'j' : 'e'
end
#remove_paren_value_from_subfield_i(field) ⇒ String
If there’s a subfield i, extract its value, and if that value contains something in parentheses, remove the parenthetical. A trailing period or colon is then trimmed from the result.
# File 'lib/pennmarc/util.rb', line 247

def remove_paren_value_from_subfield_i(field)
  val = field.filter_map { |sf|
    next unless sf.code == 'i'

    match = /\((.+?)\)/.match(sf.value)
    if match
      sf.value.sub("(#{match[1]})", '')
    else
      sf.value
    end
  }.first || ''
  trim_trailing(:colon, trim_trailing(:period, val))
end
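To make the behavior concrete, here is a minimal sketch of the same steps on a plain array of subfields. `Subfield` and `remove_paren_sketch` are hypothetical stand-ins, the two trailing-punctuation patterns are copied from `TRAILING_PUNCTUATIONS_PATTERNS`, and the relationship phrases are invented examples.

```ruby
# Hypothetical stand-in for a MARC subfield.
Subfield = Struct.new(:code, :value)

# Take the first $i, drop the first parenthetical, then trim a trailing
# period and a trailing colon, in that order.
def remove_paren_sketch(subfields)
  sf = subfields.find { |s| s.code == 'i' }
  return '' unless sf

  match = /\((.+?)\)/.match(sf.value)
  val = match ? sf.value.sub("(#{match[1]})", '') : sf.value
  val.sub(/\.\s*$/, '').sub(/\s*:\s*$/, '')
end

puts remove_paren_sketch([Subfield.new('i', 'Translation of (work):')])
puts remove_paren_sketch([Subfield.new('i', 'Sequel to:')])
p remove_paren_sketch([Subfield.new('a', 'No relationship note')])
```

When the field has no subfield i at all, the result is an empty string rather than nil.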
#subfield_defined?(field, subfield) ⇒ Boolean
Check if a field has a given subfield defined
# File 'lib/pennmarc/util.rb', line 94

def subfield_defined?(field, subfield)
  field.any? { |sf| sf.code == subfield.to_s }
end
#subfield_in?(array) ⇒ Proc
returns a lambda checking if passed-in subfield’s code is a member of array
# File 'lib/pennmarc/util.rb', line 79

def subfield_in?(array)
  ->(subfield) { array.member?(subfield.code) }
end
#subfield_not_in?(array) ⇒ Proc
returns a lambda checking if passed-in subfield’s code is NOT a member of array
# File 'lib/pennmarc/util.rb', line 86

def subfield_not_in?(array)
  ->(subfield) { !array.member?(subfield.code) }
end
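Because both methods return lambdas, they are meant to be passed as blocks with `&`, as `join_subfields` does elsewhere in this module. The sketch below copies the two method bodies from the listings above and applies them to a plain array; `Subfield` is a hypothetical stand-in for a MARC subfield.

```ruby
# Hypothetical stand-in for a MARC subfield.
Subfield = Struct.new(:code, :value)

# Bodies copied from the listings above.
def subfield_in?(array)
  ->(subfield) { array.member?(subfield.code) }
end

def subfield_not_in?(array)
  ->(subfield) { !array.member?(subfield.code) }
end

subfields = [Subfield.new('a', 'Title'),
             Subfield.new('b', 'subtitle'),
             Subfield.new('6', '880-01')]

# The & operator turns the returned lambda into the block given to select.
p subfields.select(&subfield_in?(%w[a])).map(&:value)
p subfields.select(&subfield_not_in?(%w[6 8])).map(&:value)
```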
#subfield_undefined?(field, subfield) ⇒ Boolean
Check if a field does not have a given subfield defined
# File 'lib/pennmarc/util.rb', line 102

def subfield_undefined?(field, subfield)
  field.none? { |sf| sf.code == subfield.to_s }
end
#subfield_value?(field, subfield, regex) ⇒ Boolean?
Returns true if the field has a value in the given subfield that matches the given regex
# File 'lib/pennmarc/util.rb', line 44

def subfield_value?(field, subfield, regex)
  field&.any? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end
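The `Boolean?` return type follows from the safe-navigation operator: when the field itself is nil, both this method and `no_subfield_value_matches?` return nil instead of raising. A small sketch, with the method bodies copied from the listings on this page and a plain array of hypothetical `Subfield` structs standing in for a MARC datafield:

```ruby
# Hypothetical stand-in for a MARC subfield.
Subfield = Struct.new(:code, :value)

# Bodies copied from the listings on this page.
def subfield_value?(field, subfield, regex)
  field&.any? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end

def no_subfield_value_matches?(field, subfield, regex)
  field&.none? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end

field = [Subfield.new('6', '245-01'), Subfield.new('a', 'Title')]

p subfield_value?(field, '6', /^245/)            # a $6 value starts with 245
p subfield_value?(field, 6, /^245/)              # Integer codes work via to_s
p no_subfield_value_matches?(field, '6', /^650/) # no $6 value starts with 650
p subfield_value?(nil, '6', /^245/)              # nil field gives nil, not false
```

Callers that need a strict true or false for a possibly missing field would have to handle the nil case themselves.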
#subfield_value_in?(field, subfield, array) ⇒ Boolean
returns true if a given field has a given subfield value in a given array
# File 'lib/pennmarc/util.rb', line 63

def subfield_value_in?(field, subfield, array)
  field.any? { |sf| sf.code == subfield.to_s && sf.value.in?(array) }
end
#subfield_value_not_in?(field, subfield, array) ⇒ Boolean
returns true if a given field does not have a given subfield value in a given array
# File 'lib/pennmarc/util.rb', line 72

def subfield_value_not_in?(field, subfield, array)
  field.none? { |sf| sf.code == subfield.to_s && sf.value.in?(array) }
end
#subfield_values(field, subfield) ⇒ Array
Gets all subfield values for a subfield in a given field
# File 'lib/pennmarc/util.rb', line 110

def subfield_values(field, subfield)
  field.filter_map do |sf|
    next unless sf.code == subfield.to_s

    next if sf.value.blank?

    sf.value
  end
end
#subfield_values_for(tag:, subfield:, record:) ⇒ Array
Get all subfield values for a provided subfield from any occurrence of a provided tag/tags
# File 'lib/pennmarc/util.rb', line 125

def subfield_values_for(tag:, subfield:, record:)
  record.fields(tag).flat_map do |field|
    subfield_values field, subfield
  end
end
#substring_after(string, target) ⇒ String (frozen)?
Get the substring of a string after the first occurrence of a target character
232 233 234 |
# File 'lib/pennmarc/util.rb', line 232 def substring_after(string, target) string.scan(target).present? ? string.split(target, 2).second : '' end |
#substring_before(string, target) ⇒ String (frozen)?
Get the substring of a string up to a given target character
224 225 226 |
# File 'lib/pennmarc/util.rb', line 224 def substring_before(string, target) string.scan(target).present? ? string.split(target, 2).first : '' end |
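The two substring helpers split on the first occurrence of the target only. The sketch below restates them in core Ruby so it runs without ActiveSupport: `include?` and `[1]` are assumptions standing in for `scan(target).present?` and `second`, so this version only covers String targets, and the `_sketch` method names are hypothetical.

```ruby
# Core-Ruby restatement; include? replaces scan(target).present?.
def substring_before_sketch(string, target)
  string.include?(target) ? string.split(target, 2).first : ''
end

# Core-Ruby restatement; [1] replaces ActiveSupport's Array#second.
def substring_after_sketch(string, target)
  string.include?(target) ? string.split(target, 2)[1] : ''
end

puts substring_before_sketch('245-01', '-')  # text before the first hyphen
puts substring_after_sketch('245-01', '-')   # text after the first hyphen
puts substring_after_sketch('a-b-c', '-')    # later hyphens stay in the result
p substring_before_sketch('nodash', '-')     # target absent: empty string
```

The limit of 2 passed to `split` is what keeps everything after the first target together in one piece.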
#translate_relator(relator_code, mapping) ⇒ String?
To do: handle the case of receiving a URI, e.g., loc.gov/relator/aut
Translate a relator code using mapping
# File 'lib/pennmarc/util.rb', line 266

def translate_relator(relator_code, mapping)
  return if relator_code.blank?

  mapping[relator_code&.to_sym]
end
#trim_punctuation(string) ⇒ String
Trim punctuation method extracted from Traject macro, to ensure consistent output
# File 'lib/pennmarc/util.rb', line 134

def trim_punctuation(string)
  return string unless string

  string = string.sub(%r{ *[ ,/;:] *\Z}, '')

  # trailing period if it is preceded by at least three letters (possibly preceded and followed by whitespace)
  string = string.sub(/( *[[:word:]]{3,})\. *\Z/, '\1')

  # single square bracket characters if they are the start and/or end chars and there are no internal square
  # brackets.
  string = string.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')

  # trim any leading or trailing whitespace
  string.strip
end
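Each of the three substitutions targets a different cataloging convention, which is easiest to see on sample input. The method below is copied from the listing above (it uses only core Ruby); the sample strings are invented for illustration.

```ruby
# Body copied from the listing above.
def trim_punctuation(string)
  return string unless string

  # trailing space, comma, slash, semicolon, or colon
  string = string.sub(%r{ *[ ,/;:] *\Z}, '')
  # trailing period, only when preceded by at least three word characters
  string = string.sub(/( *[[:word:]]{3,})\. *\Z/, '\1')
  # enclosing square brackets, when there are no internal brackets
  string = string.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')
  string.strip
end

puts trim_punctuation('Philadelphia :')  # ISBD colon removed
puts trim_punctuation('Penn Press.')     # period after a full word removed
puts trim_punctuation('Smith, J.')       # period after an initial is kept
puts trim_punctuation('[1990]')          # enclosing brackets removed
p trim_punctuation(nil)                  # nil is passed through
```

The three-character minimum on the period rule is what protects initials and short abbreviations from being truncated.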
#trim_trailing(trailer, string) ⇒ String
# File 'lib/pennmarc/util.rb', line 153

def trim_trailing(trailer, string)
  string.sub TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end
#trim_trailing!(trailer, string) ⇒ String, Nil
trim trailing punctuation, manipulating string in place
# File 'lib/pennmarc/util.rb', line 161

def trim_trailing!(trailer, string)
  string.sub! TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end
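The difference between the two trimming methods, and the reason the bang version is documented as returning nil, comes from `String#sub` versus `String#sub!`. The sketch below copies the constant and both method bodies from this page; the sample title is invented.

```ruby
# Constant and method bodies copied from this page.
TRAILING_PUNCTUATIONS_PATTERNS = {
  semicolon: /\s*;\s*$/, colon: /\s*:\s*$/, equal: /=$/,
  slash: %r{\s*/\s*$}, comma: /\s*,\s*$/, period: /\.\s*$/
}.freeze

def trim_trailing(trailer, string)
  string.sub TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end

def trim_trailing!(trailer, string)
  string.sub! TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end

title = 'Hamlet /'.dup            # dup guarantees an unfrozen string

p trim_trailing(:slash, title)    # returns a trimmed copy
p title                           # the original is untouched
p trim_trailing!(:slash, title)   # trims in place and returns the string
p title                           # the original is now trimmed
p trim_trailing!(:slash, title)   # nothing left to trim: sub! returns nil
```

Code that chains on the result should therefore prefer the non-bang version, since the bang version can return nil.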
#valid_subject_genre_source_code?(field) ⇒ Boolean
Does the given field specify an allowed source code?
# File 'lib/pennmarc/util.rb', line 295

def valid_subject_genre_source_code?(field)
  subfield_value_in?(field, '2', PennMARC::HeadingControl::ALLOWED_SOURCE_CODES)
end