Class: Traject::MarcExtractor::Spec
- Inherits: Object
- Defined in: lib/traject/marc_extractor_spec.rb
Constant Summary
- DATAFIELD_PATTERN =
This description documents the conversion that .hash_from_string performs with this pattern: it converts a string MARC spec like "008[35]:245abc:700a" to a hash used internally to represent the specification. See the comments at the head of the class for documentation of the string specification format.
Return value
The returned hash is keyed by tag, and its values are arrays of 0 or more MarcExtractor::Spec objects representing the extraction operations specified for that tag.
Each value is an array because you can specify multiple extractions on the same tag: for instance "245a:245abc".
See the tests for more examples.
/\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*])([a-z0-9\ \*])\|)?([a-z0-9]*)?\Z/
- CONTROLFIELD_PATTERN =
/\A([a-zA-Z0-9]{3})(\[(\d+)(-(\d+))?\])\Z/
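To illustrate what the two patterns accept, the following standalone sketch copies both regular expressions from this page and reports which capture groups a single spec part fills. The helper `classify_spec_part` is hypothetical and exists only for this example; it is not part of traject.

```ruby
# Patterns copied verbatim from the constants documented above.
DATAFIELD_PATTERN    = /\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*])([a-z0-9\ \*])\|)?([a-z0-9]*)?\Z/
CONTROLFIELD_PATTERN = /\A([a-zA-Z0-9]{3})(\[(\d+)(-(\d+))?\])\Z/

# Hypothetical helper (an assumption for illustration, not a traject method).
# Returns the capture groups that .hash_from_string reads for each kind of
# spec part, or nil when neither pattern matches.
def classify_spec_part(part)
  if (m = DATAFIELD_PATTERN.match(part))
    # m[1] = tag, m[3] = indicator 1, m[4] = indicator 2, m[5] = subfield codes
    [:datafield, m[1], m[3], m[4], m[5]]
  elsif (m = CONTROLFIELD_PATTERN.match(part))
    # m[1] = tag, m[3] = first byte, m[5] = optional last byte
    [:controlfield, m[1], m[3], m[5]]
  end
end

p classify_spec_part("245abc")     # [:datafield, "245", nil, nil, "abc"]
p classify_spec_part("245|*1|a")   # [:datafield, "245", "*", "1", "a"]
p classify_spec_part("008[35]")    # [:controlfield, "008", "35", nil]
p classify_spec_part("008[35-37]") # [:controlfield, "008", "35", "37"]
```

Indicators are written between pipes directly after the tag, and a byte position or byte range is written in square brackets. The sketch returns the raw captures only; any further interpretation of characters such as `*` happens elsewhere and is not shown here.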
Instance Attribute Summary
- #byte1 ⇒ Object
  Returns the value of attribute byte1.
- #byte2 ⇒ Object
  Returns the value of attribute byte2.
- #bytes ⇒ Object (readonly)
  Returns the value of attribute bytes.
- #indicator1 ⇒ Object
  Returns the value of attribute indicator1.
- #indicator2 ⇒ Object
  Returns the value of attribute indicator2.
- #subfields ⇒ Object
  Returns the value of attribute subfields.
- #tag ⇒ Object
  Returns the value of attribute tag.
Class Method Summary
- .create_controlfield_spec(tag, byte1, byte2) ⇒ Object
  Create a new controlfield spec.
- .create_datafield_spec(tag, ind1, ind2, subfields) ⇒ Object
  Create a new datafield spec.
- .hash_from_string(spec_string) ⇒ Object
Instance Method Summary
- #==(spec) ⇒ Object
  Simple equality definition.
- #includes_subfield_code?(code) ⇒ Boolean
  Pass in a string subfield code like 'a'; does this spec include it?
- #initialize(hash = nil) ⇒ Spec (constructor)
  Allow use of a hash to initialize.
- #joinable? ⇒ Boolean
  Should extracted subfields be joined, if we have a separator? '630' (no subfields specified) and '630abc' (multiple subfields specified) join all subfields; '633a' (one subfield) does not join and returns one value for each $a in the field; '633aa' (one subfield, doubled) does join after all and returns a single string joining the values of all the $a's.
- #matches_indicators?(field) ⇒ Boolean
  Pass in a MARC field; do its indicators match the indicators in this spec? nil indicators in the spec mean we don't care, so everything matches.
- #set_bytes(byte1, byte2) ⇒ Object
Constructor Details
#initialize(hash = nil) ⇒ Spec
Allow use of a hash to initialize. Should ditch this and use optional keyword args once folks move to Ruby 2.x syntax.
# File 'lib/traject/marc_extractor_spec.rb', line 77
def initialize(hash = nil)
  if hash
    hash.each_pair do |key, value|
      self.send("#{key}=", value)
    end
  end
end
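The hash-based initializer above can be tried in isolation. The sketch below is a minimal stand-in, assuming a hypothetical class `DemoSpec` with only two writable attributes; it is not the real Spec class, and only the body of `initialize` mirrors the source shown above.

```ruby
# Hypothetical stand-in class for illustration; not part of traject.
class DemoSpec
  attr_accessor :tag, :subfields

  # Same approach as the initializer above: each hash key is turned
  # into a call to the matching "key=" setter.
  def initialize(hash = nil)
    if hash
      hash.each_pair do |key, value|
        send("#{key}=", value)
      end
    end
  end
end

spec = DemoSpec.new(:tag => "245", :subfields => %w[a b c])
puts spec.tag              # prints 245
puts spec.subfields.join   # prints abc

empty = DemoSpec.new       # no hash given, so every attribute stays nil
puts empty.tag.inspect     # prints nil
```

One consequence of this design is that a key without a matching setter raises NoMethodError, since the initializer dispatches to setters rather than assigning instance variables directly.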
Instance Attribute Details
#byte1 ⇒ Object
Returns the value of attribute byte1.
# File 'lib/traject/marc_extractor_spec.rb', line 73
def byte1
  @byte1
end
#byte2 ⇒ Object
Returns the value of attribute byte2.
# File 'lib/traject/marc_extractor_spec.rb', line 73
def byte2
  @byte2
end
#bytes ⇒ Object (readonly)
Returns the value of attribute bytes.
# File 'lib/traject/marc_extractor_spec.rb', line 73
def bytes
  @bytes
end
#indicator1 ⇒ Object
Returns the value of attribute indicator1.
# File 'lib/traject/marc_extractor_spec.rb', line 73
def indicator1
  @indicator1
end
#indicator2 ⇒ Object
Returns the value of attribute indicator2.
# File 'lib/traject/marc_extractor_spec.rb', line 73
def indicator2
  @indicator2
end
#subfields ⇒ Object
Returns the value of attribute subfields.
# File 'lib/traject/marc_extractor_spec.rb', line 72
def subfields
  @subfields
end
#tag ⇒ Object
Returns the value of attribute tag.
# File 'lib/traject/marc_extractor_spec.rb', line 72
def tag
  @tag
end
Class Method Details
.create_controlfield_spec(tag, byte1, byte2) ⇒ Object
Create a new controlfield spec.
# File 'lib/traject/marc_extractor_spec.rb', line 218
def self.create_controlfield_spec(tag, byte1, byte2)
  spec = Spec.new(:tag => tag.freeze)
  spec.set_bytes(byte1.freeze, byte2.freeze)
  spec
end
.create_datafield_spec(tag, ind1, ind2, subfields) ⇒ Object
Create a new datafield spec. Most of the logic about how to deal with special characters is built into the Spec class.
# File 'lib/traject/marc_extractor_spec.rb', line 204
def self.create_datafield_spec(tag, ind1, ind2, subfields)
  spec = Spec.new(:tag => tag)
  spec.indicator1 = ind1.freeze
  spec.indicator2 = ind2.freeze
  if subfields and !subfields.empty?
    spec.subfields = subfields.split('')
  end
  spec
end
.hash_from_string(spec_string) ⇒ Object
# File 'lib/traject/marc_extractor_spec.rb', line 168
def self.hash_from_string(spec_string)
  # hash defaults to []
  hash = Hash.new

  # Split the string(s) given on colon
  spec_strings = spec_string.is_a?(Array) ? spec_string.map { |s| s.split(/\s*:\s*/) }.flatten : spec_string.split(/\s*:\s*/)
  spec_strings.each do |part|
    if m = DATAFIELD_PATTERN.match(part)
      tag, ind1, ind2, subfields = m[1], m[3], m[4], m[5]
      spec = create_datafield_spec(tag, ind1, ind2, subfields)
      hash[spec.tag] ||= []
      hash[spec.tag] << spec
    elsif m = CONTROLFIELD_PATTERN.match(part)
      tag, byte1, byte2 = m[1], m[3], m[5]
      spec = create_controlfield_spec(tag, byte1, byte2)
      hash[spec.tag] ||= []
      hash[spec.tag] << spec
    else
      raise ArgumentError.new("Unrecognized marc extract specification: #{part}")
    end
  end

  return hash
end
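The splitting and per-tag grouping performed by the method above can be sketched without traject installed. Both helpers below are hypothetical names introduced for this example. The grouping is deliberately simplified: it stores the raw spec parts as strings rather than Spec objects, takes the tag to be the first three characters, and does no validation, so it only illustrates the shape of the result.

```ruby
# Hypothetical helper mirroring the splitting expression in the source above:
# accepts either one colon-separated string or an array of such strings.
def split_spec_parts(spec_string)
  if spec_string.is_a?(Array)
    spec_string.map { |s| s.split(/\s*:\s*/) }.flatten
  else
    spec_string.split(/\s*:\s*/)
  end
end

# Simplified stand-in for the returned hash (an assumption for illustration):
# plain strings instead of Spec objects, keyed by the three-character tag.
def group_parts_by_tag(spec_string)
  split_spec_parts(spec_string).group_by { |part| part[0, 3] }
end

p split_spec_parts("008[35]:245abc:700a")
# ["008[35]", "245abc", "700a"]

p split_spec_parts(["245a : 245abc", "700a"])
# ["245a", "245abc", "700a"]

p group_parts_by_tag("245a:245abc:700a")
# {"245"=>["245a", "245abc"], "700"=>["700a"]}
```

Whitespace around the colons is discarded by the split pattern, and a tag that appears more than once ends up with several entries under the same key.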
Instance Method Details
#==(spec) ⇒ Object
Simple equality definition.
# File 'lib/traject/marc_extractor_spec.rb', line 138
def ==(spec)
  return false unless spec.kind_of?(Spec)

  return (self.tag == spec.tag) &&
    (self.subfields == spec.subfields) &&
    (self.indicator1 == spec.indicator1) &&
    (self.indicator2 == spec.indicator2) &&
    (self.bytes == spec.bytes)
end
#includes_subfield_code?(code) ⇒ Boolean
Pass in a string subfield code like 'a'; does this spec include it?
# File 'lib/traject/marc_extractor_spec.rb', line 132
def includes_subfield_code?(code)
  # subfields nil means include them all
  self.subfields.nil? || self.subfields.include?(code)
end
#joinable? ⇒ Boolean
Should extracted subfields be joined, if we have a separator?
- '630' no subfields specified => join all subfields
- '630abc' multiple subfields specified => join all subfields
- '633a' one subfield => do not join, return one value for each $a in the field
- '633aa' one subfield, doubled => do join after all, will return a single string joining all the values of all the $a's.
The last case is handled implicitly at the moment: subfields == ['a', 'a'] has two elements, so the size check treats it as joinable.
# File 'lib/traject/marc_extractor_spec.rb', line 92
def joinable?
  (self.subfields.nil? || self.subfields.size != 1)
end
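The four cases listed above follow directly from the size check. The standalone sketch below applies the same expression to a plain subfields array; `joinable_subfields?` is a hypothetical free function used only for this example, where the real method reads the spec's own subfields attribute.

```ruby
# Hypothetical free-function version of the rule above (not a traject method).
# subfields is nil when the spec names no subfields, otherwise an array of
# single-character codes.
def joinable_subfields?(subfields)
  subfields.nil? || subfields.size != 1
end

p joinable_subfields?(nil)        # '630'    => true
p joinable_subfields?(%w[a b c])  # '630abc' => true
p joinable_subfields?(%w[a])      # '633a'   => false
p joinable_subfields?(%w[a a])    # '633aa'  => true
```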
#matches_indicators?(field) ⇒ Boolean
Pass in a MARC field; do its indicators match the indicators in this spec? nil indicators in the spec mean we don't care, so everything matches.
# File 'lib/traject/marc_extractor_spec.rb', line 125
def matches_indicators?(field)
  return (indicator1.nil? || indicator1 == field.indicator1) &&
    (indicator2.nil? || indicator2 == field.indicator2)
end
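The wildcard behaviour of nil indicators can be checked with a small stand-in for a MARC field. In the sketch below, `FieldStub` and `indicators_match?` are hypothetical names introduced for illustration; the only assumption about a real field object is that it responds to `indicator1` and `indicator2`, as the method above requires.

```ruby
# Hypothetical stand-in for a MARC data field: only the two indicator
# readers used by the method above are modelled.
FieldStub = Struct.new(:indicator1, :indicator2)

# Hypothetical free-function version of the check above (not a traject method).
# A nil spec indicator acts as a wildcard.
def indicators_match?(spec_ind1, spec_ind2, field)
  (spec_ind1.nil? || spec_ind1 == field.indicator1) &&
    (spec_ind2.nil? || spec_ind2 == field.indicator2)
end

field = FieldStub.new("1", "4")

p indicators_match?(nil, nil, field)  # true: no constraint on either indicator
p indicators_match?("1", nil, field)  # true: first matches, second is a wildcard
p indicators_match?("1", "0", field)  # false: second indicator differs
```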
#set_bytes(byte1, byte2) ⇒ Object
# File 'lib/traject/marc_extractor_spec.rb', line 114
def set_bytes(byte1, byte2)
  if byte1 && byte2
    @bytes = ((byte1.to_i)..(byte2.to_i))
  elsif byte1
    @bytes = byte1.to_i
  end
end
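The method above stores either an integer or an inclusive range, depending on whether a second byte was given. The sketch below reproduces that conversion as a hypothetical function `bytes_for` that returns the value instead of assigning it. Using the result to index a string is an assumption about how such a value could be applied to a control field; that step is not shown on this page.

```ruby
# Hypothetical function mirroring the conversion above (not a traject method).
# Arguments are the digit strings captured from a spec such as "008[35-37]".
def bytes_for(byte1, byte2)
  if byte1 && byte2
    (byte1.to_i)..(byte2.to_i)  # inclusive range of byte positions
  elsif byte1
    byte1.to_i                  # single byte position
  end                           # nil when no bytes were given
end

p bytes_for("35", "37")  # 35..37
p bytes_for("35", nil)   # 35
p bytes_for(nil, nil)    # nil

# Assumed usage: indexing a string value with the result.
value = "0123456789"
p value[bytes_for("3", "5")]  # "345"
p value[bytes_for("3", nil)]  # "3"
```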