Class: ETL::Processor::EncodeProcessor
- Defined in: lib/etl/processor/encode_processor.rb
Overview
The encode processor uses Iconv to convert a file from one encoding (e.g. UTF-8) to another (e.g. Latin-1), line by line.
Instance Attribute Summary
- #source_encoding ⇒ Object (readonly)
  The source file encoding.
- #source_file ⇒ Object (readonly)
  The file to load from.
- #target_encoding ⇒ Object (readonly)
  The target file encoding.
- #target_file ⇒ Object (readonly)
  The file to write to.
Instance Method Summary
- #initialize(control, configuration) ⇒ EncodeProcessor (constructor)
  Initialize the processor.
- #process ⇒ Object
  Execute the processor.
Constructor Details
#initialize(control, configuration) ⇒ EncodeProcessor
Initialize the processor.
Configuration options:
- :source_file: The file to load data from
- :source_encoding: The source file encoding (e.g. ‘latin1’, ‘utf-8’), as supported by Iconv
- :target_file: The file to write data to
- :target_encoding: The target file encoding
    # File 'lib/etl/processor/encode_processor.rb', line 24
    def initialize(control, configuration)
      super
      raise ControlError, "Source file must be specified" if configuration[:source_file].nil?
      raise ControlError, "Target file must be specified" if configuration[:target_file].nil?
      @source_file = File.join(File.dirname(control.file), configuration[:source_file])
      @source_encoding = configuration[:source_encoding]
      @target_file = File.join(File.dirname(control.file), configuration[:target_file])
      @target_encoding = configuration[:target_encoding]
      raise ControlError, "Source and target file cannot currently point to the same file" if source_file == target_file
      begin
        @iconv = Iconv.new(target_encoding,source_encoding)
      rescue Iconv::InvalidEncoding
        raise ControlError, "Either the source encoding '#{source_encoding}' or the target encoding '#{target_encoding}' is not supported"
      end
    end
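The constructor's checks can be tried outside the ETL framework. The following is a minimal sketch, not the library's code: `resolve_encode_config` is a hypothetical helper, `ControlError` is redefined locally as a stand-in, and `Encoding.find` is swapped in for `Iconv.new` as the encoding check (Iconv is no longer part of the Ruby standard library as of Ruby 2.0). The error message for an unsupported encoding is also an assumption and differs from the original.

```ruby
# Local stand-in for the framework's ControlError so the sketch runs on its own.
class ControlError < StandardError; end

# Hypothetical helper mirroring the constructor's validation steps.
# control_path plays the role of control.file in the original.
def resolve_encode_config(control_path, configuration)
  raise ControlError, "Source file must be specified" if configuration[:source_file].nil?
  raise ControlError, "Target file must be specified" if configuration[:target_file].nil?

  # Both paths are resolved relative to the directory of the control file.
  base = File.dirname(control_path)
  source = File.join(base, configuration[:source_file])
  target = File.join(base, configuration[:target_file])
  if source == target
    raise ControlError, "Source and target file cannot currently point to the same file"
  end

  # Assumption: Encoding.find replaces Iconv.new as the "is this encoding known?" check.
  [:source_encoding, :target_encoding].each do |key|
    begin
      Encoding.find(configuration[key].to_s)
    rescue ArgumentError
      raise ControlError, "Encoding '#{configuration[key]}' is not supported"
    end
  end

  { source_file: source, target_file: target,
    source_encoding: configuration[:source_encoding],
    target_encoding: configuration[:target_encoding] }
end

config = resolve_encode_config(
  "/etl/jobs/people.ctl",
  { source_file: "people_utf8.txt", source_encoding: "UTF-8",
    target_file: "people_latin1.txt", target_encoding: "ISO-8859-1" }
)
puts config[:source_file]
```

Note that Ruby's encoding names are not identical to Iconv's, so a name accepted by one may be rejected by the other.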
Instance Attribute Details
#source_encoding ⇒ Object (readonly)
The source file encoding
    # File 'lib/etl/processor/encode_processor.rb', line 13
    def source_encoding
      @source_encoding
    end
#source_file ⇒ Object (readonly)
The file to load from
    # File 'lib/etl/processor/encode_processor.rb', line 9
    def source_file
      @source_file
    end
#target_encoding ⇒ Object (readonly)
The target file encoding
    # File 'lib/etl/processor/encode_processor.rb', line 15
    def target_encoding
      @target_encoding
    end
#target_file ⇒ Object (readonly)
The file to write to
    # File 'lib/etl/processor/encode_processor.rb', line 11
    def target_file
      @target_file
    end
Instance Method Details
#process ⇒ Object
Execute the processor
    # File 'lib/etl/processor/encode_processor.rb', line 41
    def process
      # operate line by line to handle large files without loading them in-memory
      # could be replaced by a system iconv call when available, for greater performance
      File.open(source_file) do |source|
        #puts "Opening #{target_file}"
        File.open(target_file,'w') do |target|
          source.each_line do |line|
            target << @iconv.iconv(line)
          end
        end
      end
    end
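The same line-by-line approach can be sketched on a Ruby version without Iconv in the standard library. This is an illustration under stated assumptions, not the processor's implementation: `transcode_file` is a hypothetical name, and `String#encode` is swapped in for `@iconv.iconv`. The target is opened in binary mode so that the already-encoded bytes are written unchanged.

```ruby
# Hypothetical helper: transcode source_file into target_file one line at a
# time, so large files are never loaded into memory in full.
def transcode_file(source_file, source_encoding, target_file, target_encoding)
  # Tag each line read from the source with its declared encoding.
  File.open(source_file, "r", encoding: source_encoding) do |source|
    # Binary mode: write the converted bytes exactly as produced.
    File.open(target_file, "wb") do |target|
      source.each_line do |line|
        # String#encode stands in for Iconv#iconv here (an assumption).
        target << line.encode(target_encoding)
      end
    end
  end
end
```

With its default options, `String#encode` raises `Encoding::UndefinedConversionError` when a character has no representation in the target encoding, so a caller may want to rescue that error or pass replacement options.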