Class: Traject::DebugWriter
- Inherits:
-
LineWriter
- Object
- LineWriter
- Traject::DebugWriter
- Defined in:
- lib/traject/debug_writer.rb
Overview
The Traject::DebugWriter produces a simple, human-readable output format that's also amenable to simple computer processing (e.g., with a simple grep). It's the output format used when you pass the --debug-mode switch to traject on the command line.
Output format is three columns: id, output field, values (multiple values seperated by '|'), and looks something like:
000001580 edition [1st ed.]
000001580 format Book | Online | Print
000001580 geo Great Britain
000001580 id 000001580
000001580 isbn 0631126902
Settings
- 'output_file' -- the name of the file to output to (command line -o shortcut).
- 'output_stream' -- alternately, the IO stream
- 'debug_writer.idfield' -- the solr field from which to pull the record ID (default: 'id')
- 'debug_writer.format' -- How to format the id/solr field/values (default: '%-12s %-25s %s')
By default, with neither output_file nor output_stream provided, writes to stdout, which can be useful for debugging diagnosis.
Example configuration file
require 'traject/debug_writer'
settings do
provide "writer_class_name", "Traject::DebugWriter"
provide "output_file", "out.txt"
end
Constant Summary collapse
- DEFAULT_IDFIELD =
'id'
- DEFAULT_FORMAT =
'%-12s %-25s %s'
Instance Attribute Summary
Attributes inherited from LineWriter
#output_file, #settings, #write_mutex
Instance Method Summary collapse
-
#initialize ⇒ DebugWriter
constructor
A new instance of DebugWriter.
- #record_number(context) ⇒ Object
- #serialize(context) ⇒ Object
Methods inherited from LineWriter
#_write, #close, #open_output_file, #put, #should_close_stream?
Constructor Details
#initialize ⇒ DebugWriter
Returns a new instance of DebugWriter.
38 39 40 41 42 43 44 45 46 |
# File 'lib/traject/debug_writer.rb', line 38 def initialize(*) super @idfield = settings["debug_writer.idfield"] || DEFAULT_IDFIELD @format = settings['debug_writer.format'] || DEFAULT_FORMAT @use_position = (@idfield == 'record_position') @already_threw_warning_about_missing_id = false end |
Instance Method Details
#record_number(context) ⇒ Object
48 49 50 51 52 53 54 55 56 57 58 59 60 |
# File 'lib/traject/debug_writer.rb', line 48 def record_number(context) return context.position if @use_position if context.output_hash.has_key?(@idfield) context.output_hash[@idfield].first else unless @already_threw_warning_about_missing_id context.logger.warn "At least one record (#{context.record_inspect}) doesn't define field '#{@idfield}'. All records are assumed to have a unique id. You can set which field to look in via the setting 'debug_writer.idfield'" @already_threw_warning_about_missing_id = true end "record_num_#{context.position}" end end |
#serialize(context) ⇒ Object
62 63 64 65 66 67 68 |
# File 'lib/traject/debug_writer.rb', line 62 def serialize(context) h = context.output_hash rec_key = record_number(context) lines = h.keys.sort.map { |k| @format % [rec_key, k, h[k].join(' | ')] } lines.push "\n" lines.join("\n") end |