Module: PennMARC::Util

Included in:
Helper
Defined in:
lib/pennmarc/util.rb

Overview

Module holding “utility” methods used by the MARC parsing methods.

Constant Summary

TRAILING_PUNCTUATIONS_PATTERNS =
{ semicolon: /\s*;\s*$/,
colon: /\s*:\s*$/,
equal: /=$/,
slash: %r{\s*/\s*$},
comma: /\s*,\s*$/,
period: /\.\s*$/ }.freeze

Instance Method Summary

Instance Method Details

#append_relator(field:, joined_subfields:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String

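Appends a relator value to the given string. It prioritizes relator codes found in subfield $4 and falls back to the relator term subfield given by relator_term_sf (typically ‘e’, or ‘j’ for 111/411/611/711/811 fields; see relator_term_subfield) if no valid codes are found in $4. A trailing comma is removed from the string first, and a period is appended to the relator unless it already ends with a period or hyphen. Use with 1xx/7xx fields.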

Parameters:

  • field (MARC::Field)

    where relator values are stored

  • joined_subfields (String)

    the string to which the relator is appended

  • relator_term_sf (String)

    MARC subfield that stores relator term

  • relator_map (Hash) (defaults to: Mappers.relator)

    mapping of relator codes (as symbols) to relator terms

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 336

def append_relator(field:, joined_subfields:, relator_term_sf:, relator_map: Mappers.relator)
  joined_subfields = trim_trailing(:comma, joined_subfields)

  join_separator = relator_join_separator(joined_subfields)

  relator = subfield_values(field, '4').filter_map { |code| translate_relator(code, relator_map) }

  relator = subfield_values(field, relator_term_sf).map { |term| trim_trailing(:comma, term) } if relator.blank?

  relator = append_trailing(:period, relator.join(', ')) if relator.present?

  [joined_subfields, relator].compact_blank.join(join_separator).squish
end
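The fallback order above can be tried outside the gem with a minimal plain-Ruby sketch. It assumes hypothetical `Subfield`/`Field` stand-ins for the ruby-marc classes and a hypothetical two-entry `RELATORS` hash in place of `Mappers.relator`; `append_relator_sketch`, `values_of`, and `without_trailing_comma` are illustrative names that re-express the method's logic without ActiveSupport, not part of the gem's API.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  # enumerate subfields, as MARC::DataField does
  def each(&block)
    @subfields.each(&block)
  end
end

# Hypothetical two-entry map standing in for Mappers.relator
RELATORS = { aut: 'Author', edt: 'Editor' }.freeze

# rough equivalent of ActiveSupport's String#squish
def squish(str)
  str.strip.gsub(/\s+/, ' ')
end

# non-blank values of one subfield code
def values_of(field, code)
  field.filter_map { |sf| sf.value if sf.code == code && !sf.value.to_s.strip.empty? }
end

def without_trailing_comma(str)
  str.sub(/\s*,\s*$/, '')
end

def append_relator_sketch(field, joined, term_sf, relator_map = RELATORS)
  joined = without_trailing_comma(joined)
  # a space after an open date ("1950-"), otherwise ", "
  separator = /\b\d+-\z/.match?(joined) ? ' ' : ', '
  # 1. codes in $4 that the map can translate
  terms = values_of(field, '4').filter_map { |code| relator_map[code.to_sym] }
  # 2. otherwise the relator term subfield ($e or $j)
  terms = values_of(field, term_sf).map { |t| without_trailing_comma(t) } if terms.empty?
  parts = [joined]
  unless terms.empty?
    term = terms.join(', ')
    # same rule as append_trailing(:period, ...)
    term += '.' unless term.end_with?('.', '-')
    parts << term
  end
  squish(parts.reject { |p| p.strip.empty? }.join(separator))
end

coded = Field.new('100', ['a', 'Smith, John,'], ['d', '1900-1980,'], ['4', 'aut'])
puts append_relator_sketch(coded, 'Smith, John, 1900-1980,', 'e') # => Smith, John, 1900-1980, Author.

open_date = Field.new('100', ['a', 'Doe, Jane,'], ['d', '1950-'], ['e', 'editor,'])
puts append_relator_sketch(open_date, 'Doe, Jane, 1950-', 'e')     # => Doe, Jane, 1950- editor.
```

An unmapped code such as `xyz` with no `$e` leaves the name unchanged, because both lookups come back empty.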

#append_trailing(trailer, string) ⇒ String

Append the given punctuation to the end of a string, unless the string already ends with a period or a hyphen.

Parameters:

  • trailer (Symbol)
  • string (String)

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 169

def append_trailing(trailer, string)
  return string if string.end_with?('.', '-')

  map = { semicolon: ';',
          colon: ':',
          slash: '/',
          comma: ',',
          period: '.' }
  string + map[trailer.to_sym]
end
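Because append_trailing relies only on core String methods, it can be copied verbatim into plain Ruby to see how the early return behaves. The calls below are illustrative inputs, not values taken from real records.

```ruby
# Plain-Ruby copy of append_trailing (no ActiveSupport needed)
def append_trailing(trailer, string)
  # a final period or hyphen already ends the string, so leave it alone
  return string if string.end_with?('.', '-')

  map = { semicolon: ';', colon: ':', slash: '/', comma: ',', period: '.' }
  string + map[trailer.to_sym]
end

puts append_trailing(:period, 'Author')     # => Author.
puts append_trailing('colon', 'Title')      # => Title:
puts append_trailing(:period, 'Doe, 1950-') # => Doe, 1950-
puts append_trailing(:comma, 'Smith, J.')   # => Smith, J.
```

Note that, as written, the map has no `:equal` entry even though TRAILING_PUNCTUATIONS_PATTERNS does, so `append_trailing(:equal, 'x')` looks up nil and the concatenation raises a TypeError.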

#datafield_and_linked_alternate(record, tag) ⇒ Array<String>

Returns the joined values of all subfields except $6 and $8 for each occurrence of the given datafield tag, followed by the same for its linked 880 alternates.

Parameters:

  • record (MARC::Record)
  • tag (String)

Returns:

  • (Array<String>)

    values



# File 'lib/pennmarc/util.rb', line 214

def datafield_and_linked_alternate(record, tag)
  record.fields(tag).filter_map { |field|
    join_subfields(field, &subfield_not_in?(%w[6 8]))
  } + linked_alternate_not_6_or_8(record, tag)
end

#field_defined?(record, marc_field) ⇒ Boolean

Check whether a given record has a field with the given tag (e.g., ‘041’).

Parameters:

  • record (MARC::Record)
  • marc_field (String)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 19

def field_defined?(record, marc_field)
  record.select { |field| field.tag == marc_field }.any?
end

#field_or_its_linked_alternate?(field, tags) ⇒ Boolean

Does a field or its linked alternate match any of the specified tags? An 880 field matches when its subfield $6 linkage begins with one of the tags.

Parameters:

  • field (MARC::Field)
  • tags (Array<String>)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 303

def field_or_its_linked_alternate?(field, tags)
  return true if field.tag.in? tags
  return true if field.tag == '880' && subfield_value?(field, '6', /^(#{tags.join('|')})/)

  false
end
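The tag check above is what relator_term_subfield builds on. This plain-Ruby sketch reproduces both methods using hypothetical `Subfield`/`Field` stand-ins for the ruby-marc classes (assumed to enumerate subfields the way MARC::DataField does) and Array#include? in place of ActiveSupport's `in?`.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def field_or_its_linked_alternate?(field, tags)
  return true if tags.include?(field.tag)
  return false unless field.tag == '880'

  # an 880's $6 starts with the tag it is linked to, e.g. "711-02"
  pattern = /^(#{tags.join('|')})/
  field.any? { |sf| sf.code == '6' && pattern.match?(sf.value) }
end

def relator_term_subfield(field)
  field_or_its_linked_alternate?(field, %w[111 411 611 711 811]) ? 'j' : 'e'
end

puts relator_term_subfield(Field.new('711', ['a', 'Congress'], ['j', 'host institution'])) # => j
puts relator_term_subfield(Field.new('880', ['6', '711-02'], ['a', 'Congress']))          # => j
puts relator_term_subfield(Field.new('880', ['6', '100-01'], ['a', 'Name']))              # => e
puts relator_term_subfield(Field.new('700', ['a', 'Name'], ['e', 'editor']))              # => e
```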

#join_and_squish(array) ⇒ String

Join array elements with spaces and normalize extraneous whitespace.

Parameters:

  • array (Array)

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 239

def join_and_squish(array)
  array.join(' ').squish
end

#join_subfields(field, &selector) ⇒ String

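Join the values of the subfields selected by the provided proc, stripping each value, skipping blank ones, and normalizing whitespace. Returns an empty string if field is nil.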

Parameters:

  • field (MARC::DataField, nil)
  • selector (Proc)

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 27

def join_subfields(field, &selector)
  return '' unless field

  field.select(&selector).filter_map { |sf|
    value = sf.value&.strip
    next if value.blank?

    value
  }.join(' ').squish
end
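join_subfields is usually paired with the subfield_in? and subfield_not_in? lambdas as its selector. The sketch below combines the three in plain Ruby, assuming hypothetical `Subfield`/`Field` stand-ins for the ruby-marc classes and a strip/gsub chain in place of ActiveSupport's `blank?` and `squish`; the sample 245 field is invented for illustration.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def subfield_in?(codes)
  ->(subfield) { codes.member?(subfield.code) }
end

def subfield_not_in?(codes)
  ->(subfield) { !codes.member?(subfield.code) }
end

def join_subfields(field, &selector)
  return '' unless field

  # keep selected subfields, strip values, drop blanks, join with spaces, squish
  field.select(&selector)
       .filter_map { |sf| sf.value&.strip }
       .reject(&:empty?)
       .join(' ')
       .strip
       .gsub(/\s+/, ' ')
end

title = Field.new('245', ['6', '880-01'], ['a', ' Moby Dick  '], ['b', 'or, the whale /'], ['c', ' '])
puts join_subfields(title, &subfield_not_in?(%w[6 8])) # => Moby Dick or, the whale /
puts join_subfields(title, &subfield_in?(%w[a]))       # => Moby Dick
p join_subfields(nil, &subfield_in?(%w[a]))            # => ""
```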

#linked_alternate(record, subfield6_value, &selector) ⇒ Array

The MARC 880 field, “Alternate Graphic Representation”, contains a representation, in a different script, of another field in the same record (e.g., 245 [Title]), to which it is linked through subfield $6. It is often used to hold title values in their original, non-Latin script. This method extracts the subfields selected by the passed-in block from each 880 datafield whose subfield $6 begins with a particular value, joining each field's selected values with a space. See: www.loc.gov/marc/bibliographic/bd880.html

Parameters:

  • record (MARC::Record)
  • subfield6_value (String|Array)

    a string (or array of strings) to match against the start of subfield $6

  • selector (Proc)

    takes a subfield as argument, returns a boolean

Returns:

  • (Array)

    array of linked alternates



# File 'lib/pennmarc/util.rb', line 189

def linked_alternate(record, subfield6_value, &selector)
  record.fields('880').filter_map do |field|
    next unless subfield_value?(field, '6', /^(#{Array.wrap(subfield6_value).join('|')})/)

    field.select(&selector).map(&:value).join(' ')
  end
end
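To see the $6 matching in action, here is a plain-Ruby sketch of linked_alternate that takes an array of fields as a hypothetical stand-in for a MARC::Record (the real method calls `record.fields('880')`). `Subfield`/`Field` are likewise hypothetical stand-ins, and `Array()` replaces ActiveSupport's `Array.wrap`, which behaves the same for strings and arrays.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def linked_alternate(fields, subfield6_value, &selector)
  pattern = /^(#{Array(subfield6_value).join('|')})/
  fields.select { |f| f.tag == '880' }.filter_map do |field|
    # keep only 880s whose $6 links back to one of the requested tags
    next unless field.any? { |sf| sf.code == '6' && pattern.match?(sf.value) }

    field.select(&selector).map(&:value).join(' ')
  end
end

record = [
  Field.new('245', ['6', '880-01'], ['a', 'Voina i mir']),
  Field.new('880', ['6', '245-01'], ['a', 'Война и мир'], ['8', '1']),
  Field.new('880', ['6', '490-02'], ['a', 'Серия'])
]

# the selector linked_alternate_not_6_or_8 passes: everything except $6 and $8
titles = linked_alternate(record, '245') { |sf| !%w[6 8].include?(sf.code) }
both = linked_alternate(record, %w[245 490]) { |sf| sf.code == 'a' }
p titles
p both
```

The first call returns only the Cyrillic title from the 880 linked to 245; the second also picks up the 880 linked to 490. The romanized 245 itself is never returned, since only 880 fields are scanned.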

#linked_alternate_not_6_or_8(record, subfield6_value) ⇒ Array

Common case of extracting all subfields except $6 and $8 from each 880 datafield whose subfield $6 begins with a particular value. $6 is excluded because it holds the linkage itself, and $8 because it holds field link and sequence number data rather than displayable content.

Parameters:

  • record (MARC::Record)
  • subfield6_value (String|Array)

    a string (or array of strings) to match against the start of subfield $6

Returns:

  • (Array)

    array of linked alternates without $6 or $8 values



# File 'lib/pennmarc/util.rb', line 203

def linked_alternate_not_6_or_8(record, subfield6_value)
  excluded_subfields = %w[6 8]
  linked_alternate(record, subfield6_value) do |sf|
    excluded_subfields.exclude?(sf.code)
  end
end

#no_subfield_value_matches?(field, subfield, regex) ⇒ Boolean?

Returns true if the field has no occurrence of the given subfield whose value matches the passed-in regex; returns nil if field is nil.

Parameters:

  • field (MARC::DataField)
  • subfield (String|Integer|Symbol)
  • regex (Regexp)

Returns:

  • (Boolean, nil)


# File 'lib/pennmarc/util.rb', line 54

def no_subfield_value_matches?(field, subfield, regex)
  field&.none? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end

#prefixed_subject_and_alternate(record, prefix) ⇒ Array

Note:

11/2018: do not display $5 in PRO or CHR subjs

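Get values for Provenance and Chronology subjects from 650 fields, and from 880 fields linked to a 650, whose second indicator is 4. The prefix should be ‘PRO’ or ‘CHR’; in subfield $a it may be preceded by a ‘%’, and it is removed from the returned values. Duplicate values are dropped.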

Parameters:

  • record (MARC::Record)
  • prefix (String)

    prefix to select from subject field

Returns:

  • (Array)

    array of values



# File 'lib/pennmarc/util.rb', line 277

def prefixed_subject_and_alternate(record, prefix)
  record.fields(%w[650 880]).filter_map { |field|
    next unless field.indicator2 == '4'

    next if field.tag == '880' && no_subfield_value_matches?(field, '6', /^650/)

    next unless field.any? { |sf| sf.code == 'a' && sf.value =~ /^(#{prefix}|%#{prefix})/ }

    elements = field.select(&subfield_in?(%w[a])).map { |sf| sf.value.gsub(/^%?#{prefix}/, '') }
    elements << join_subfields(field, &subfield_not_in?(%w[a 6 8 e w 5]))
    join_and_squish elements
  }.uniq
end
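The filtering steps can be followed in a plain-Ruby sketch. It takes an array of fields instead of a record and uses hypothetical `Subfield`/`Field` stand-ins (this `Field` also carries `indicator2`); `prefixed_subjects` and `EXCLUDED` are illustrative names, and the sample 650/880 fields are invented. The excluded codes mirror the method's `%w[a 6 8 e w 5]`.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag, :indicator2

  def initialize(tag, indicator2, *pairs)
    @tag = tag
    @indicator2 = indicator2
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

EXCLUDED = %w[a 6 8 e w 5].freeze

def prefixed_subjects(fields, prefix)
  fields.filter_map { |field|
    # local (second indicator 4) 650s, or 880s linked to a 650
    next unless %w[650 880].include?(field.tag) && field.indicator2 == '4'
    next if field.tag == '880' && field.none? { |sf| sf.code == '6' && sf.value =~ /^650/ }
    next unless field.any? { |sf| sf.code == 'a' && sf.value =~ /^(#{prefix}|%#{prefix})/ }

    # $a without its (optionally %-marked) prefix, then the remaining displayable subfields
    elements = field.select { |sf| sf.code == 'a' }.map { |sf| sf.value.gsub(/^%?#{prefix}/, '') }
    elements += field.reject { |sf| EXCLUDED.include?(sf.code) }.map(&:value)
    elements.join(' ').strip.gsub(/\s+/, ' ')
  }.uniq
end

fields = [
  Field.new('650', '4', ['a', '%PRO Smith, John,'], ['e', 'former owner.'], ['5', 'PU']),
  Field.new('650', '4', ['a', 'CHR 1800-1899'], ['5', 'PU']),
  Field.new('650', '0', ['a', 'PRO Not local']),
  Field.new('880', '4', ['6', '650-01'], ['a', 'PRO Smith, John,'], ['e', 'former owner.'])
]
p prefixed_subjects(fields, 'PRO') # => ["Smith, John,"]
p prefixed_subjects(fields, 'CHR') # => ["1800-1899"]
```

The 650 and its linked 880 produce the same string, so `uniq` keeps one; the second-indicator-0 field is skipped, and $e and $5 never reach the output.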

#relator(field:, relator_term_sf:, relator_map: Mappers.relator) ⇒ String

Returns the relator value of the given field. Like append_relator, it prioritizes relator codes found in subfield $4 and falls back to the relator term subfield relator_term_sf if no valid codes are found in $4.

Parameters:

  • field (MARC::Field)

    where relator values are stored

  • relator_term_sf (String)

    MARC subfield that stores relator term

  • relator_map (Hash) (defaults to: Mappers.relator)

    mapping of relator codes (as symbols) to relator terms

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 356

def relator(field:, relator_term_sf:, relator_map: Mappers.relator)
  relator = subfield_values(field, '4').filter_map { |code| translate_relator(code, relator_map) }
  relator = subfield_values(field, relator_term_sf) if relator.blank?
  relator.join
end

#relator_join_separator(str) ⇒ String (frozen)

Determine the separator used to join a relator term to a 1xx/7xx heading: a space if the string ends with an open date (e.g., “1950-”), otherwise a comma and a space.

Parameters:

  • str (String)

Returns:

  • (String (frozen))


# File 'lib/pennmarc/util.rb', line 313

def relator_join_separator(str)
  /\b\d+-\z/.match?(str) ? ' ' : ', '
end
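The regex is easiest to read against a few headings. This plain-Ruby copy joins an illustrative relator term (`'editor.'`) to three invented name strings.

```ruby
# Plain-Ruby copy of relator_join_separator
def relator_join_separator(str)
  # \b\d+-\z : digits followed by a hyphen at the very end, i.e. an open date
  /\b\d+-\z/.match?(str) ? ' ' : ', '
end

['Doe, Jane, 1950-', 'Smith, John, 1900-1980', 'Smith, John'].each do |name|
  puts [name, 'editor.'].join(relator_join_separator(name))
end
# Doe, Jane, 1950- editor.
# Smith, John, 1900-1980, editor.
# Smith, John, editor.
```

The space avoids output like “1950-, editor.”, where the hyphen of the open date would be followed by a stray comma.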

#relator_term_subfield(field) ⇒ String (frozen)

For a given field, determine which subfield holds the relator term. The following fields and their linked alternates use $j for relator terms: 111, 411, 611, 711, 811. All other fields use $e.

Parameters:

  • field (MARC::Field)

Returns:

  • (String (frozen))


# File 'lib/pennmarc/util.rb', line 324

def relator_term_subfield(field)
  field_or_its_linked_alternate?(field, %w[111 411 611 711 811]) ? 'j' : 'e'
end

#remove_paren_value_from_subfield_i(field) ⇒ String

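Take the value of the first subfield $i and, if it contains text in parentheses, remove the first parenthesized portion. A trailing period and then a trailing colon are trimmed from the result; an empty string is returned if there is no $i.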

Parameters:

  • field (MARC::Field)

Returns:

  • (String)

    subfield $i value without the parenthesized portion



# File 'lib/pennmarc/util.rb', line 247

def remove_paren_value_from_subfield_i(field)
  val = field.filter_map { |sf|
    next unless sf.code == 'i'

    match = /\((.+?)\)/.match(sf.value)
    if match
      sf.value.sub("(#{match[1]})", '')
    else
      sf.value
    end
  }.first || ''
  trim_trailing(:colon, trim_trailing(:period, val))
end
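The order of operations (strip the parenthesized text first, then trim the period and colon) can be checked with a plain-Ruby sketch. It uses hypothetical `Subfield`/`Field` stand-ins for the ruby-marc classes and inlines the `:period` and `:colon` patterns from TRAILING_PUNCTUATIONS_PATTERNS in place of trim_trailing; the 700 fields are invented examples.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def remove_paren_value_from_subfield_i(field)
  # first $i, with its first parenthesized portion removed
  val = field.filter_map { |sf|
    next unless sf.code == 'i'

    match = /\((.+?)\)/.match(sf.value)
    match ? sf.value.sub("(#{match[1]})", '') : sf.value
  }.first || ''
  # then trim a trailing period, then a trailing colon (the :period and :colon patterns)
  val.sub(/\.\s*$/, '').sub(/\s*:\s*$/, '')
end

puts remove_paren_value_from_subfield_i(Field.new('700', ['i', 'Translation of (work):'], ['a', 'Tolstoy, Leo'])) # => Translation of
puts remove_paren_value_from_subfield_i(Field.new('700', ['i', 'Sequel to.'], ['a', 'Author, An']))               # => Sequel to
p remove_paren_value_from_subfield_i(Field.new('700', ['a', 'No relationship']))                                   # => ""
```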

#subfield_defined?(field, subfield) ⇒ Boolean

Check if a field has a given subfield defined

Parameters:

  • field (MARC::DataField)
  • subfield (String|Symbol|Integer)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 94

def subfield_defined?(field, subfield)
  field.any? { |sf| sf.code == subfield.to_s }
end

#subfield_in?(array) ⇒ Proc

Returns a lambda that checks whether a passed-in subfield’s code is a member of array, for use as a selector block (e.g., with join_subfields).

Parameters:

  • array (Array)

Returns:

  • (Proc)


# File 'lib/pennmarc/util.rb', line 79

def subfield_in?(array)
  ->(subfield) { array.member?(subfield.code) }
end

#subfield_not_in?(array) ⇒ Proc

Returns a lambda that checks whether a passed-in subfield’s code is NOT a member of array, for use as a selector block (e.g., with join_subfields).

Parameters:

  • array (Array)

Returns:

  • (Proc)


# File 'lib/pennmarc/util.rb', line 86

def subfield_not_in?(array)
  ->(subfield) { !array.member?(subfield.code) }
end

#subfield_undefined?(field, subfield) ⇒ Boolean

Check if a field does not have a given subfield defined

Parameters:

  • field (MARC::DataField)
  • subfield (String|Symbol|Integer)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 102

def subfield_undefined?(field, subfield)
  field.none? { |sf| sf.code == subfield.to_s }
end

#subfield_value?(field, subfield, regex) ⇒ Boolean?

Returns true if the field has an occurrence of the given subfield whose value matches the passed-in regex; returns nil if field is nil.

Parameters:

  • field (MARC::DataField)
  • subfield (String|Integer|Symbol)
  • regex (Regexp)

Returns:

  • (Boolean, nil)


# File 'lib/pennmarc/util.rb', line 44

def subfield_value?(field, subfield, regex)
  field&.any? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end
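The safe-navigation operator is why both regex predicates are documented as returning `Boolean, nil`. This plain-Ruby copy of subfield_value? and no_subfield_value_matches? uses hypothetical `Subfield`/`Field` stand-ins for the ruby-marc classes and shows an Integer subfield code being converted with `to_s`.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def subfield_value?(field, subfield, regex)
  field&.any? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end

def no_subfield_value_matches?(field, subfield, regex)
  field&.none? { |sf| sf.code == subfield.to_s && sf.value =~ regex }
end

alt = Field.new('880', ['6', '650-01'], ['a', 'Provenance'])
p subfield_value?(alt, 6, /^650/)              # => true (Integer code converted with to_s)
p subfield_value?(alt, :a, /^Chron/)           # => false
p no_subfield_value_matches?(alt, '6', /^245/) # => true
p subfield_value?(nil, '6', /^650/)            # => nil (missing field)
```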

#subfield_value_in?(field, subfield, array) ⇒ Boolean

Returns true if the field has an occurrence of the given subfield whose value is a member of array.

Parameters:

  • field (MARC::DataField)
  • subfield (String|Integer|Symbol)
  • array (Array)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 63

def subfield_value_in?(field, subfield, array)
  field.any? { |sf| sf.code == subfield.to_s && sf.value.in?(array) }
end

#subfield_value_not_in?(field, subfield, array) ⇒ Boolean

Returns true if the field has no occurrence of the given subfield whose value is a member of array.

Parameters:

  • field (MARC::DataField)
  • subfield (String|Integer|Symbol)
  • array (Array)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 72

def subfield_value_not_in?(field, subfield, array)
  field.none? { |sf| sf.code == subfield.to_s && sf.value.in?(array) }
end
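These two membership checks are what valid_subject_genre_source_code? applies to subfield $2. The sketch below uses hypothetical `Subfield`/`Field` stand-ins, Array#include? in place of ActiveSupport's `in?`, and a hypothetical two-code `allowed` list; the real list is PennMARC::HeadingControl::ALLOWED_SOURCE_CODES, whose contents are not shown here.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def subfield_value_in?(field, subfield, array)
  field.any? { |sf| sf.code == subfield.to_s && array.include?(sf.value) }
end

def subfield_value_not_in?(field, subfield, array)
  field.none? { |sf| sf.code == subfield.to_s && array.include?(sf.value) }
end

# hypothetical allow-list for illustration only
allowed = %w[aat fast]
fast = Field.new('650', ['a', 'Whaling'], ['2', 'fast'])
other = Field.new('650', ['a', 'Whaling'], ['2', 'xyz'])
p subfield_value_in?(fast, '2', allowed)      # => true
p subfield_value_in?(other, '2', allowed)     # => false
p subfield_value_not_in?(other, '2', allowed) # => true
```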

#subfield_values(field, subfield) ⇒ Array

Gets all non-blank values of the given subfield in a field

Parameters:

  • field (MARC::DataField)
  • subfield (String|Symbol)

    as a string or symbol

Returns:

  • (Array)

    subfield values for given subfield code



# File 'lib/pennmarc/util.rb', line 110

def subfield_values(field, subfield)
  field.filter_map do |sf|
    next unless sf.code == subfield.to_s

    next if sf.value.blank?

    sf.value
  end
end
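subfield_values and subfield_values_for differ only in scope: one field versus every occurrence of one or more tags. In this plain-Ruby sketch, an array of fields stands in for the record (the real method calls `record.fields(tag)`), so the `fields:` keyword is a hypothetical replacement for `record:`; `Subfield`/`Field` are hypothetical stand-ins, the blank check roughly approximates ActiveSupport's `blank?`, and the 020/022 values are placeholders.

```ruby
# Hypothetical stand-ins for ruby-marc's MARC::Subfield / MARC::DataField
Subfield = Struct.new(:code, :value)

class Field
  include Enumerable
  attr_reader :tag

  def initialize(tag, *pairs)
    @tag = tag
    @subfields = pairs.map { |code, value| Subfield.new(code, value) }
  end

  def each(&block)
    @subfields.each(&block)
  end
end

def subfield_values(field, subfield)
  field.filter_map do |sf|
    next unless sf.code == subfield.to_s
    # skip nil, empty, and whitespace-only values
    next if sf.value.nil? || sf.value.strip.empty?

    sf.value
  end
end

def subfield_values_for(tag:, subfield:, fields:)
  fields.select { |f| Array(tag).include?(f.tag) }.flat_map { |f| subfield_values(f, subfield) }
end

fields = [
  Field.new('020', ['a', '1111111111'], ['q', 'paperback'], ['a', ' ']),
  Field.new('020', ['a', '2222222222']),
  Field.new('022', ['a', '3333-3333'])
]
p subfield_values(fields.first, :a)                                  # => ["1111111111"]
p subfield_values_for(tag: '020', subfield: 'a', fields: fields)      # => ["1111111111", "2222222222"]
p subfield_values_for(tag: %w[020 022], subfield: :a, fields: fields) # => ["1111111111", "2222222222", "3333-3333"]
```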

#subfield_values_for(tag:, subfield:, record:) ⇒ Array

Get all non-blank values of a subfield from every occurrence of the given tag or tags in a record

Parameters:

  • tag (String, Array)

    tags to consider

  • subfield (String, Symbol)

    subfield to take the values from

  • record (MARC::Record)

    source

Returns:

  • (Array)

    array of subfield values



# File 'lib/pennmarc/util.rb', line 125

def subfield_values_for(tag:, subfield:, record:)
  record.fields(tag).flat_map do |field|
    subfield_values field, subfield
  end
end

#substring_after(string, target) ⇒ String (frozen)?

Get the substring of a string after the first occurrence of a target character; returns an empty string if the target does not occur.

Parameters:

  • string (String)

    string to split

  • target (String)

    character to split upon

Returns:

  • (String (frozen), nil)


# File 'lib/pennmarc/util.rb', line 232

def substring_after(string, target)
  string.scan(target).present? ? string.split(target, 2).second : ''
end

#substring_before(string, target) ⇒ String (frozen)?

Get the substring of a string before the first occurrence of a target character; returns an empty string if the target does not occur.

Parameters:

  • string (String)

    string to split

  • target (String)

    character to split upon

Returns:

  • (String (frozen), nil)


# File 'lib/pennmarc/util.rb', line 224

def substring_before(string, target)
  string.scan(target).present? ? string.split(target, 2).first : ''
end
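Both helpers split at most once, so only the first occurrence of the target matters. This plain-Ruby copy replaces ActiveSupport's `present?` with `any?` and `second` with `[1]`, which behave the same here; the inputs are illustrative.

```ruby
# Plain-Ruby copies of substring_before / substring_after
def substring_before(string, target)
  string.scan(target).any? ? string.split(target, 2).first : ''
end

def substring_after(string, target)
  # split at most once, so later occurrences stay in the remainder
  string.scan(target).any? ? string.split(target, 2)[1] : ''
end

p substring_before('880-01', '-')    # => "880"
p substring_after('880-01', '-')     # => "01"
p substring_after('a/b/c', '/')      # => "b/c"
p substring_before('no target', '/') # => ""
```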

#translate_relator(relator_code, mapping) ⇒ String?

TODO:

handle case of receiving a URI? E.g., loc.gov/relator/aut

Translate a relator code to its full relator term using mapping. Returns nil if the code is blank or not found in mapping.

Parameters:

  • relator_code (String, NilClass)
  • mapping (Hash)

Returns:

  • (String, nil)

    full relator string



# File 'lib/pennmarc/util.rb', line 266

def translate_relator(relator_code, mapping)
  return if relator_code.blank?

  mapping[relator_code&.to_sym]
end

#trim_punctuation(string) ⇒ String

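Trim punctuation, using logic extracted from the Traject macro to ensure consistent output: removes a trailing space, comma, slash, semicolon, or colon; a trailing period preceded by at least three word characters; and single enclosing square brackets when there are no internal brackets.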

Parameters:

  • string (String)

Returns:

  • (String)

    string with relevant punctuation removed



# File 'lib/pennmarc/util.rb', line 134

def trim_punctuation(string)
  return string unless string

  string = string.sub(%r{ *[ ,/;:] *\Z}, '')

  # trailing period if it is preceded by at least three letters (possibly preceded and followed by whitespace)
  string = string.sub(/( *[[:word:]]{3,})\. *\Z/, '\1')

  # single square bracket characters if they are the start and/or end chars and there are no internal square
  # brackets.
  string = string.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')

  # trim any leading or trailing whitespace
  string.strip
end
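The three substitutions are easiest to judge against sample headings. This plain-Ruby copy is identical to the method apart from its comments; note how the three-word-character rule keeps the period on a short abbreviation such as “ed.”.

```ruby
# Plain-Ruby copy of trim_punctuation
def trim_punctuation(string)
  return string unless string

  # trailing space, comma, slash, semicolon, or colon (with surrounding spaces)
  string = string.sub(%r{ *[ ,/;:] *\Z}, '')
  # trailing period preceded by at least three word characters
  string = string.sub(/( *[[:word:]]{3,})\. *\Z/, '\1')
  # enclosing square brackets when there are no internal ones
  string = string.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')
  string.strip
end

p trim_punctuation('Moby Dick /')    # => "Moby Dick"
p trim_punctuation('Smith, John.')   # => "Smith, John"
p trim_punctuation('Smith, J., ed.') # => "Smith, J., ed."
p trim_punctuation('[1851]')         # => "1851"
p trim_punctuation(nil)              # => nil
```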

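#trim_trailing(trailer, string) ⇒ String

Remove the given trailing punctuation (a key of TRAILING_PUNCTUATIONS_PATTERNS) from the end of a string, returning a new string.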

Parameters:

  • trailer (Symbol|String)

    to target for removal

  • string (String)

    to modify

Returns:

  • (String)


# File 'lib/pennmarc/util.rb', line 153

def trim_trailing(trailer, string)
  string.sub TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end

#trim_trailing!(trailer, string) ⇒ String, Nil

Trim trailing punctuation, modifying the string in place.

Parameters:

  • trailer (Symbol, String)

    trailer to target for removal

  • string (String)

    string to modify

Returns:

  • (String, Nil)

    the modified string, or nil if nothing was removed



# File 'lib/pennmarc/util.rb', line 161

def trim_trailing!(trailer, string)
  string.sub! TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end
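The difference between the two variants follows from String#sub versus String#sub!: the bang version mutates its argument and returns nil when nothing matched. This plain-Ruby copy reuses the constant from the top of this page; unary `+` is used to get mutable string literals.

```ruby
TRAILING_PUNCTUATIONS_PATTERNS = {
  semicolon: /\s*;\s*$/, colon: /\s*:\s*$/, equal: /=$/,
  slash: %r{\s*/\s*$}, comma: /\s*,\s*$/, period: /\.\s*$/
}.freeze

def trim_trailing(trailer, string)
  string.sub TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end

def trim_trailing!(trailer, string)
  string.sub! TRAILING_PUNCTUATIONS_PATTERNS[trailer.to_sym], ''
end

p trim_trailing(:slash, 'Moby Dick /')   # => "Moby Dick"
p trim_trailing('comma', 'Smith, John,') # => "Smith, John"

# unary plus gives a mutable copy of the literal
title = +'Title :'
p trim_trailing!(:colon, title)          # => "Title"
p title                                  # => "Title"
p trim_trailing!(:colon, +'Title')       # => nil (nothing removed)
```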

#valid_subject_genre_source_code?(field) ⇒ Boolean

Does the given field's subfield $2 specify an allowed source code (one of PennMARC::HeadingControl::ALLOWED_SOURCE_CODES)?

Parameters:

  • field (MARC::DataField)

Returns:

  • (Boolean)


# File 'lib/pennmarc/util.rb', line 295

def valid_subject_genre_source_code?(field)
  subfield_value_in?(field, '2', PennMARC::HeadingControl::ALLOWED_SOURCE_CODES)
end