Class: Traject::MarcReader
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/traject/marc_reader.rb
Overview
Traject::MarcReader reads MARC records from an input stream using ruby-marc. It recognizes these settings:
- "marc_source.type": serialization type. Default 'binary'.
  - "binary": standard ISO 2709 "binary" MARC format; uses ruby-marc's MARC::Reader. If you use type 'binary', you probably also want to set 'marc_source.encoding'.
  - "xml": MarcXML; uses ruby-marc's MARC::XMLReader.
  - "json": the "marc-in-json" format, encoded as newline-separated JSON (documented synonym 'ndj', although the #internal_reader source only matches 'json'). This is a simplistic newline-separated JSON: no comments and no unescaped internal newlines are allowed in the JSON objects, because the reader simply reads line by line and assumes each line is one marc-in-json record (format proposal: http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/). Uses Traject::NDJReader, which builds each record with MARC::Record.new_from_hash.
- "marc_source.encoding": only used when marc_source.type is 'binary'. The character encoding of the source MARC records: any encoding Ruby recognizes, or 'MARC-8'. For 'MARC-8', ruby-marc transcodes the content to UTF-8 in the internal MARC::Record strings. Default nil, meaning MARC::Reader uses its own default, which will probably be Encoding.default_internal, which will probably be UTF-8. Currently Traject::MarcReader is hard-coded to transcode to UTF-8 as the internal encoding.
- "marc_reader.xml_parser": for type 'xml', which XML parser MARC::XMLReader should use. Accepts anything recognized by MARC::XMLReader's :parser argument. By default, MARC::XMLReader is asked for its best guess at the highest-performance parser installed. Probably best left at the default.
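The "json" type boils down to "read one line, parse one JSON object". A minimal stdlib-only sketch of that loop, assuming each non-blank line holds one complete marc-in-json object. The class name `NdjHashReader` is hypothetical, skipping blank lines is an added assumption, and unlike Traject::NDJReader it yields plain hashes rather than calling MARC::Record.new_from_hash.

```ruby
require "json"
require "stringio"

# Hypothetical sketch of line-by-line marc-in-json reading. Each non-blank
# line is assumed to be one complete JSON object (no comments, no unescaped
# newlines inside an object), as the "json" type description requires.
class NdjHashReader
  include Enumerable

  def initialize(input_stream)
    @input_stream = input_stream
  end

  # Yields one Hash per line. Traject::NDJReader would instead build a
  # MARC::Record from each hash via MARC::Record.new_from_hash.
  def each
    return enum_for(:each) unless block_given?

    @input_stream.each_line do |line|
      next if line.strip.empty? # tolerate blank lines (an assumption)
      yield JSON.parse(line)
    end
  end
end

input = StringIO.new(<<~NDJ)
  {"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec1"}]}
  {"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec2"}]}
NDJ

NdjHashReader.new(input).each do |hash|
  puts hash["fields"].first["001"]
end
```

Reading strictly one object per line is what makes the format streamable: no JSON array has to be held in memory, but it is also why pretty-printed (multi-line) JSON is not accepted.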
Example
In a configuration file:

```ruby
require 'traject/marc_reader'

settings do
  provide "reader_class_name", "Traject::MarcReader"
  provide "marc_source.type", "xml"
end
```
Constant Summary
- @@best_xml_parser = MARC::XMLReader.best_available
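@@best_xml_parser asks ruby-marc once, when the class loads, which XML parser is best among those installed. The general idea ("try candidates in preference order, keep the first whose library loads") can be sketched as below. This is not ruby-marc's implementation of MARC::XMLReader.best_available; the candidate list and the method name `best_available_parser` are hypothetical.

```ruby
# Hypothetical preference-ordered candidates: [parser name, library to require].
CANDIDATE_PARSERS = [
  [:nokogiri, "nokogiri"],
  [:rexml, "rexml/document"]
].freeze

# Returns the name of the first candidate whose library can be required,
# or nil if none of them can.
def best_available_parser(candidates = CANDIDATE_PARSERS)
  candidates.each do |name, lib|
    begin
      require lib
      return name
    rescue LoadError
      # Library not installed; fall through to the next candidate.
    end
  end
  nil
end

# Computed once and reused, in the spirit of a class-level @@best_xml_parser.
BEST_PARSER = best_available_parser
puts "best available: #{BEST_PARSER.inspect}"
```

Probing once and caching the answer avoids repeating `require` checks for every reader; an explicit "marc_reader.xml_parser" setting still takes precedence in #internal_reader.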
Instance Attribute Summary
- #input_stream ⇒ Object (readonly): Returns the value of attribute input_stream.
- #settings ⇒ Object (readonly): Returns the value of attribute settings.
Instance Method Summary
- #each(*args, &block) ⇒ Object: Iterates over the records by delegating to #internal_reader.
- #initialize(input_stream, settings) ⇒ MarcReader (constructor): A new instance of MarcReader.
- #internal_reader ⇒ Object: Creates the appropriate kind of ruby-marc reader, depending on the settings (and, for XML, the best available parser).
Constructor Details
#initialize(input_stream, settings) ⇒ MarcReader
Returns a new instance of MarcReader.
```ruby
# File 'lib/traject/marc_reader.rb', line 59
def initialize(input_stream, settings)
  @settings = Traject::Indexer::Settings.new settings
  @input_stream = input_stream
end
```
Instance Attribute Details
#input_stream ⇒ Object (readonly)
Returns the value of attribute input_stream.
```ruby
# File 'lib/traject/marc_reader.rb', line 55
def input_stream
  @input_stream
end
```
#settings ⇒ Object (readonly)
Returns the value of attribute settings.
```ruby
# File 'lib/traject/marc_reader.rb', line 55
def settings
  @settings
end
```
Instance Method Details
#each(*args, &block) ⇒ Object
Yields each record by delegating to #internal_reader.each.

```ruby
# File 'lib/traject/marc_reader.rb', line 84
def each(*args, &block)
  self.internal_reader.each(*args, &block)
end
```
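Because MarcReader includes Enumerable and #each only forwards to a lazily built internal reader, every Enumerable method (first, map, select, count, ...) works on a MarcReader directly. A minimal sketch of that delegation pattern; `DelegatingReader` is hypothetical, and an Array stands in for the ruby-marc reader.

```ruby
# Hypothetical stand-in for MarcReader: includes Enumerable and forwards
# #each to an inner reader that is built only on first use.
class DelegatingReader
  include Enumerable

  attr_reader :builds # how many times the inner reader was built

  def initialize(records)
    @records = records
    @builds = 0
  end

  # Memoized like MarcReader#internal_reader; here the "reader" is an Array
  # copy rather than a MARC::Reader.
  def internal_reader
    @internal_reader ||= begin
      @builds += 1
      @records.dup
    end
  end

  # Enumerable only requires #each; everything else is derived from it.
  def each(*args, &block)
    internal_reader.each(*args, &block)
  end
end

reader = DelegatingReader.new(%w[rec1 rec2 rec3])
p reader.first          # => "rec1"
p reader.map(&:upcase)  # => ["REC1", "REC2", "REC3"]
p reader.count          # => 3
p reader.builds         # => 1
```

The sketch uses `||=`, which would rebuild if the memoized value were nil or false; the original's `unless defined? @internal_reader` guard memoizes whatever value was assigned.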
#internal_reader ⇒ Object
Creates the appropriate kind of ruby-marc reader, depending on the settings (and, for XML, the best available parser).

```ruby
# File 'lib/traject/marc_reader.rb', line 66
def internal_reader
  # Build the reader once, then reuse it on later calls.
  unless defined? @internal_reader
    @internal_reader = case settings["marc_source.type"]
    when "xml"
      # An explicit setting overrides the best-guess parser.
      parser = settings["marc_reader.xml_parser"] || @@best_xml_parser
      MARC::XMLReader.new(self.input_stream, :parser=> parser)
    when 'json'
      Traject::NDJReader.new(self.input_stream, settings)
    else
      # Default: ISO 2709 binary, asking ruby-marc to replace invalid bytes.
      args = { :invalid => :replace }
      args[:external_encoding] = settings["marc_source.encoding"]
      MARC::Reader.new(self.input_stream, args)
    end
  end
  return @internal_reader
end
```
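For type 'binary', #internal_reader hands MARC::Reader an :external_encoding (from "marc_source.encoding") plus :invalid => :replace. Those option names match Ruby's own transcoding vocabulary, so a stdlib-only sketch of the underlying idea may help: tag raw bytes with a declared source encoding, then transcode to UTF-8, replacing byte sequences that are invalid in that encoding instead of raising. The helper `to_utf8` is hypothetical and only illustrates Ruby's String#encode behavior, not ruby-marc's code; note that MARC-8 is not a Ruby encoding and is handled separately by ruby-marc.

```ruby
# Hypothetical helper (not part of Traject or ruby-marc): treat raw bytes as
# `source_encoding`, then transcode to UTF-8. With invalid: :replace, byte
# sequences that are invalid in the source encoding become U+FFFD instead of
# raising Encoding::InvalidByteSequenceError.
def to_utf8(raw_bytes, source_encoding)
  raw_bytes.dup
           .force_encoding(source_encoding)
           .encode(Encoding::UTF_8, invalid: :replace)
end

latin1_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
puts to_utf8(latin1_bytes, "ISO-8859-1")

# 0xE9 is not a valid US-ASCII byte, so it is replaced rather than raising.
ascii_result = to_utf8(latin1_bytes, "US-ASCII")
puts ascii_result.valid_encoding?

begin
  latin1_bytes.dup.force_encoding("US-ASCII").encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  puts "without invalid: :replace -> #{e.class}"
end
```

Declaring the wrong source encoding does not raise under :replace; it silently produces replacement characters, which is why setting "marc_source.encoding" correctly matters for binary input.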