Class: Traject::Macros::MarcFormatClassifier

Inherits:
Object
  • Object
show all
Defined in:
lib/traject/macros/marc_format_classifier.rb

Overview

A tool for classifiying MARC records according to format/form/genre/type, just using our own custom vocabulary for those things.

used by the marc_formats macro, but you can also use it directly for a bit more control.

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(marc_record) ⇒ MarcFormatClassifier

Returns a new instance of MarcFormatClassifier.



33
34
35
# File 'lib/traject/macros/marc_format_classifier.rb', line 33

def initialize(marc_record)
  @record = marc_record
end

Instance Attribute Details

#recordObject (readonly)

Returns the value of attribute record.



31
32
33
# File 'lib/traject/macros/marc_format_classifier.rb', line 31

def record
  @record
end

Instance Method Details

#formats(options = {}) ⇒ Object

A very opinionated method that just kind of jams together all the possible format/genre/types into one array of 1 to N elements.

If no other values are present, the default value "Other" will be used.

See also individual methods which you can use you seperate into different facets or do other custom things.



44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
# File 'lib/traject/macros/marc_format_classifier.rb', line 44

def formats(options = {})
  options = {:default => "Other"}.merge(options)

  formats = []

  formats.concat genre

  formats << "Manuscript/Archive" if manuscript_archive?
  formats << "Microform" if microform?
  formats << "Online"    if online?

  # In our own data, if it's an audio recording, it might show up
  # as print, but it's probably not.
  formats << "Print"     if print? && ! (formats.include?("Non-musical Recording") || formats.include?("Musical Recording"))

  # If it's a Dissertation, we decide it's NOT a book
  if thesis?
    formats.delete("Book")
    formats << "Dissertation/Thesis"
  end

  if proceeding?
    formats <<  "Conference"
  end

  if formats.empty?
    formats << options[:default]
  end

  return formats
end

#genreObject

Returns 1 or more values in an array from: Book; Journal/Newspaper; Musical Score; Map/Globe; Non-musical Recording; Musical Recording Image; Software/Data; Video/Film

Uses leader byte 6, leader byte 7, and 007 byte 0.

Gets actual labels from marc_genre_leader and marc_genre_007 translation maps, so you can customize labels if you want.



86
87
88
89
90
91
92
93
94
95
# File 'lib/traject/macros/marc_format_classifier.rb', line 86

def genre
  marc_genre_leader = Traject::TranslationMap.new("marc_genre_leader")
  marc_genre_007    = Traject::TranslationMap.new("marc_genre_007")

  results = marc_genre_leader[ record.leader.slice(6,2) ] ||
    marc_genre_leader[ record.leader.slice(6)] ||
    record.find_all {|f| f.tag == "007"}.collect {|f| marc_genre_007[f.value.slice(0)]}

  [results].flatten
end

#manuscript_archive?Boolean

Marked as manuscript OR archive.

Returns:

  • (Boolean)


175
176
177
178
179
180
181
182
183
184
185
186
# File 'lib/traject/macros/marc_format_classifier.rb', line 175

def manuscript_archive?
  leader06 = record.leader.slice(6)
  leader08 = record.leader.slice(8)

  # leader 6 t=Manuscript Language Material, d=Manuscript Music,
  # f=Manuscript Cartographic
  #
  # leader 06 = 'b' is obsolete, but if it exists it means archival countrl
  #
  # leader 08 'a'='archival control'
  %w{t d f b}.include?(leader06) || leader08 == "a"
end

#microform?Boolean

if field 007 byte 0 is 'h', that's microform. But many of our microform don't have that. If leader byte 6 is 'h', that's an obsolete way of saying microform. And finally, if GMD is

Returns:

  • (Boolean)


168
169
170
171
172
# File 'lib/traject/macros/marc_format_classifier.rb', line 168

def microform?
  normalized_gmd.start_with?("[microform]") ||
  record.leader[6] == "h" ||
  record.find {|f| (f.tag == "007") && (f.value[0] == "h")}
end

#normalized_gmdObject

downcased version of the gmd, or else empty string



189
190
191
192
193
# File 'lib/traject/macros/marc_format_classifier.rb', line 189

def normalized_gmd
  @gmd ||= begin
    ((a245 = record['245']) && a245['h'] && a245['h'].downcase) || ""
  end
end

#online?Boolean

We use marc 007 to determine if this represents an online resource. But sometimes resort to 245$h GMD too.

Returns:

  • (Boolean)


150
151
152
153
154
155
156
157
158
159
160
161
162
163
# File 'lib/traject/macros/marc_format_classifier.rb', line 150

def online?
  # field 007, byte 0 c="electronic" byte 1 r="remote" ==> sure Online
  found_007 = record.fields('007').find do |field|
    field.value.slice(0) == "c" && field.value.slice(1) == "r"
  end

  return true if found_007

  # Otherwise, if it has a GMD ["electronic resource"], we count it
  # as online only if NO 007[0] == 'c' exists, cause if it does we already
  # know it's electronic but not remote, otherwise first try would
  # have found it.
  return (normalized_gmd.start_with? "[electronic resource]") && ! record.find {|f| f.tag == '007' && f.value.slice(0) == "c"}
end

#print?Boolean

Algorithm with help from Chris Case.

  • If it has any RDA 338, then it's print if it has a value of volume, sheet, or card.
  • If it does not have an RDA 338, it's print if and only if it has no 245$h GMD.

  • Here at JH, for legacy reasons we also choose to not call it print if it's already been marked audio, but we do that in a different method.

Note that any record that has neither a 245 nor a 338rda is going to be marked print

This algorithm is definitely going to get some things wrong in both directions, with real world data. But seems to be good enough.

Returns:

  • (Boolean)


129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
# File 'lib/traject/macros/marc_format_classifier.rb', line 129

def print?


  rda338 = record.find_all do |field|
    field.tag == "338" && field['2'] == "rdacarrier"
  end

  if rda338.length > 0
    rda338.find do |field|
      field.subfields.find do |sf|
        (sf.code == "a" && %w{volume card sheet}.include?(sf.value)) ||
        (sf.code == "b" && %w{nc no nb}.include?(sf.value))
      end
    end
  else
    normalized_gmd.length == 0 
  end
end

#proceeding?Boolean

Just checks all $6xx for a $v "Congresses"

Returns:

  • (Boolean)


105
106
107
108
109
110
111
112
# File 'lib/traject/macros/marc_format_classifier.rb', line 105

def proceeding?
  @proceeding_q ||= begin
    ! record.find do |field|
      field.tag.slice(0) == '6' &&
          field.subfields.find {|sf| sf.code == "v" && /^\s*(C|c)ongresses\.?\s*$/.match(sf.value) }
    end.nil?
  end
end

#thesis?Boolean

Just checks if it has a 502, if it does it's considered a thesis

Returns:

  • (Boolean)


98
99
100
101
102
# File 'lib/traject/macros/marc_format_classifier.rb', line 98

def thesis?
  @thesis_q ||= begin
    ! record.find {|a| a.tag == "502"}.nil?
  end
end