Class: Jinx::Csv::Joiner
- Inherits: Object (Object > Jinx::Csv::Joiner)
- Includes: Enumerable
- Defined in: lib/jinx/csv/joiner.rb
Overview
Merges two CSV files on common fields.
Instance Method Summary
- #compare(source, target) ⇒ -1, ... (private)
  Compares the given source and target buffers with result as follows: nil if both are nil; 1 if only the source is nil; -1 if only the target is nil; otherwise, the pair-wise comparison of the source and target keys.
- #initialize(source, target = nil, output = nil) ⇒ Joiner (constructor)
  Returns a new instance of Joiner.
- #join(*fields) {|rec| ... } ⇒ Object
  Joins the source to the target and writes the output.
- #look_ahead(csvio, buf = nil) ⇒ Buffer? (private)
  Returns the modified look-ahead, or nil if end of file.
- #merge(source, target, output) {|rec| ... } ⇒ Object (private)
  Merges the given source into the target as the output.
- #shift(csvio, buf = nil) ⇒ Buffer? (private)
  Reads the next record into the given buffer, or returns nil at end of file.
- #shift?(buf, other, order) ⇒ Boolean (private)
  Returns whether to shift the given buffer: true if it precedes the other buffer, false if it succeeds it, and otherwise decided by the look-ahead keys.
Constructor Details
#initialize(source, target = nil, output = nil) ⇒ Joiner
Returns a new instance of Joiner.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 12
def initialize(source, target=nil, output=nil)
  @source = source
  @target = target || STDIN
  @output = output || STDOUT
end
```
Instance Method Details
#compare(source, target) ⇒ -1, ... (private)
Compares the given source and target buffers with result as follows:
- If source and target are nil, then nil.
- If source is nil and target is not nil, then 1, i.e. an exhausted source sorts after the target.
- If target is nil and source is not nil, then -1.
- Otherwise, the pair-wise comparison of the source and target keys, where a nil key value sorts before a non-nil value.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 181
def compare(source, target)
  # An exhausted (nil) buffer sorts after any non-nil buffer.
  return target.nil? ? nil : 1 if source.nil?
  return -1 if target.nil?
  source.key.each_with_index do |v1, i|
    v2 = target.key[i]
    next if v1.nil? and v2.nil?
    # A nil key value sorts before a non-nil key value.
    return -1 if v1.nil?
    return 1 if v2.nil?
    cmp = v1 <=> v2
    return cmp unless cmp == 0
  end
  0
end
```
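To see these ordering rules in isolation, here is a standalone sketch that applies the same comparison to a hypothetical `Buf` struct carrying only a key. `Buf` and `compare_bufs` are illustrative names, not part of Jinx; the real Buffer also holds a record and a look-ahead.

```ruby
# Hypothetical stand-in for Jinx's Buffer; only the key is needed here.
Buf = Struct.new(:key)

# Same rules as Joiner#compare: an exhausted (nil) buffer sorts after a
# live one, and a nil key value sorts before a non-nil one.
def compare_bufs(source, target)
  return target.nil? ? nil : 1 if source.nil?
  return -1 if target.nil?
  source.key.each_with_index do |v1, i|
    v2 = target.key[i]
    next if v1.nil? && v2.nil?
    return -1 if v1.nil?
    return 1 if v2.nil?
    cmp = v1 <=> v2
    return cmp unless cmp == 0
  end
  0
end

p compare_bufs(nil, nil)                               # => nil
p compare_bufs(nil, Buf.new(%w[a]))                    # => 1
p compare_bufs(Buf.new(%w[a]), nil)                    # => -1
p compare_bufs(Buf.new(%w[a x]), Buf.new(%w[a y]))     # => -1
p compare_bufs(Buf.new([nil, "x"]), Buf.new(%w[a x]))  # => -1
```

Sorting an exhausted side last is what lets the merge loop keep emitting the remaining records of the other side until both inputs are used up.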
#join(*fields) {|rec| ... } ⇒ Object
Joins the source to the target and writes the output. The source fields used are given by the fields argument, if given; by default, all source fields are used.
The output fields consist of the qualified source fields and all target fields, in the following order:
- The common fields, in order of occurrence in the source file.
- The qualified source-specific fields, in order of occurrence in the source file.
- The target-specific fields, in order of occurrence in the target file.
The match is on the common qualified source and target fields. Both files must be sorted on the common fields, compared in the order in which they occur in the source header.
The joined records are written to the output, which defaults to STDOUT. If a block is given, then the block is called on each record before it is written, and the block's return value is written in its place; if the block returns nil, then the record is not written. The same array is reused for every record, so a block that retains records should copy them.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 40
def join(*fields, &block)
  CsvIO.open(@target) do |tgt|
    CsvIO.open(@source) do |src|
      # all source fields (unordered)
      usflds = src.field_names.to_set
      fields.each do |fld|
        unless usflds.include?(fld) then
          raise ArgumentError.new("CSV join field #{fld} not found in the source file #{@source}.")
        end
      end
      # the qualified source fields (ordered)
      qsflds = fields.empty? ? src.field_names : fields
      tflds = tgt.field_names
      @common = qsflds & tflds
      # The headers consist of the common fields followed by the qualified
      # source-specific fields followed by the target-specific fields.
      hdrs = @common | qsflds | tflds
      CsvIO.open(@output, :mode => 'w', :headers => hdrs) do |out|
        merge(src, tgt, out, &block)
      end
    end
  end
end

# Alias at class level, not inside #join, so that #each (which Enumerable
# relies on) exists before #join is first called.
alias :each :join
```
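Ruby's `Array#&` keeps the order of its receiver and `Array#|` appends unseen elements in order, which is what produces the common, source-specific, target-specific header layout. The sketch below mirrors that logic on plain field-name arrays; `output_headers` is a hypothetical helper, not a Jinx method, and CsvIO is left out entirely.

```ruby
# Hypothetical helper mirroring the header logic in Joiner#join.
# source_fields/target_fields are header rows; fields are optional join fields.
def output_headers(source_fields, target_fields, fields = [])
  unknown = fields - source_fields
  unless unknown.empty?
    raise ArgumentError, "CSV join field #{unknown.first} not found in the source"
  end
  # The qualified source fields: the given fields, else all source fields.
  qualified = fields.empty? ? source_fields : fields
  # Array#& keeps the order of its receiver, so common fields follow `qualified`.
  common = qualified & target_fields
  # Array#| appends unseen elements in order: common, source-only, target-only.
  common | qualified | target_fields
end

p output_headers(%w[id name dept], %w[dept id salary])
# => ["id", "dept", "name", "salary"]
p output_headers(%w[id name dept], %w[dept id salary], %w[dept name])
# => ["dept", "name", "id", "salary"]
```

In the second call, `id` ends up among the target-specific fields because it is not one of the given join fields.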
#look_ahead(csvio, buf = nil) ⇒ Buffer? (private)
Returns the modified look-ahead, or nil if end of file.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 165
def look_ahead(csvio, buf=nil)
  rec = csvio.next || return
  buf ||= Buffer.new
  buf.record = rec
  buf.key = @common.map { |k| rec[k] }
  buf
end
```
#merge(source, target, output) {|rec| ... } ⇒ Object (private)
Merges the given source into the target as the output. The output headers must be in the order specified by #join.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 79
def merge(source, target, output)
  # the qualified source field accessors
  sflds = source.accessors & output.accessors
  # the target field accessors
  tflds = target.accessors
  # the common fields
  @common = sflds & tflds
  # The target-specific accessors
  trest = tflds - @common
  # The source-specific accessors
  srest = output.accessors - trest - @common
  # The output record
  obuf = Array.new(output.accessors.size)
  # The source/target current/next (key, record) buffers
  # Read the first and second records into the buffers
  sbuf = shift(source)
  tbuf = shift(target)
  # Compare the source and target.
  while cmp = compare(sbuf, tbuf) do
    # Fill the output record in three sections: the common, source and target fields.
    obuf.fill do |i|
      if i < @common.size then
        cmp <= 0 ? sbuf.key[i] : tbuf.key[i]
      elsif i < sflds.size then
        # Only fill the output record with source values if there is a current source
        # record and the target does not precede the source.
        sbuf.record[srest[i - @common.size]] if sbuf and cmp <= 0
      elsif tbuf and cmp >= 0
        # Only fill the output record with target values if there is a current target
        # record and the source does not precede the target.
        tbuf.record[trest[i - sflds.size]]
      end
    end
    orec = block_given? ? yield(obuf) : obuf
    # Emit the output record.
    output << orec if orec
    # Shift the buffers as necessary.
    ss, ts = shift?(sbuf, tbuf, cmp), shift?(tbuf, sbuf, -cmp)
    sbuf = shift(source, sbuf) if ss
    tbuf = shift(target, tbuf) if ts
  end
end
```
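The merge loop is a sorted merge join that behaves like a full outer join: unmatched source and target records are emitted with the other side's fields left nil. The following in-memory sketch reproduces the same fill and shift rules with index look-ahead in place of Buffer and CsvIO. It assumes arrays of hashes already sorted on the common fields, with non-nil, mutually comparable key values; `merge_join` and its parameters are hypothetical.

```ruby
# Hypothetical in-memory analogue of Joiner#merge: a full outer merge join of
# two arrays of hashes sorted on the common fields (non-nil keys assumed).
def merge_join(source, target, source_fields, target_fields)
  common = source_fields & target_fields
  s_rest = source_fields - common
  t_rest = target_fields - common
  key = ->(rec) { rec && common.map { |f| rec[f] } }
  rows = []
  i = j = 0
  while i < source.size || j < target.size
    s, t = source[i], target[j]
    sk, tk = key.(s), key.(t)
    # An exhausted side sorts after the other, as in Joiner#compare.
    cmp = s.nil? ? 1 : t.nil? ? -1 : (sk <=> tk)
    row = {}
    common.each_with_index { |f, n| row[f] = (cmp <= 0 ? sk : tk)[n] }
    s_rest.each { |f| row[f] = cmp <= 0 ? s[f] : nil }
    t_rest.each { |f| row[f] = cmp >= 0 ? t[f] : nil }
    rows << row
    if cmp < 0
      i += 1
    elsif cmp > 0
      j += 1
    else
      # Equal keys: same rule as Joiner#shift?, using the next records as look-ahead.
      s_dup = key.(source[i + 1]) == sk
      t_dup = key.(target[j + 1]) == tk
      i += 1 if s_dup || !t_dup
      j += 1 if t_dup || !s_dup
    end
  end
  rows
end

source = [{ "id" => 1, "name" => "a" }, { "id" => 2, "name" => "b" }]
target = [{ "id" => 2, "pay" => 20 }, { "id" => 3, "pay" => 30 }]
merge_join(source, target, %w[id name], %w[id pay]).each { |row| p row.values }
# [1, "a", nil]
# [2, "b", 20]
# [3, nil, 30]
```

Each iteration advances at least one side, so the loop always terminates; with duplicate keys on one side only, that side advances while the other holds, pairing the single record with every duplicate.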
#shift(csvio, buf = nil) ⇒ Buffer? (private)
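Reads the next record into the given buffer, creating the buffer and priming its look-ahead on the first call. Returns the buffer, or nil at end of file.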
```ruby
# File 'lib/jinx/csv/joiner.rb', line 148
def shift(csvio, buf=nil)
  if buf then
    return if buf.lookahead.nil?
  else
    # prime the look-ahead
    buf = Buffer.new(nil, nil, look_ahead(csvio))
    return shift(csvio, buf)
  end
  buf.record = buf.lookahead.record
  buf.key = buf.lookahead.key
  buf.lookahead = look_ahead(csvio, buf.lookahead)
  buf
end
```
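The current-record/look-ahead pattern used by #shift and #look_ahead can be sketched without Jinx by substituting a Ruby Enumerator for CsvIO, since `Enumerator#next` raises `StopIteration` at the end of input. `LookBuf`, `look_ahead`, and `shift_buffer` are hypothetical names, and the sketch assumes records are hashes keyed by field name.

```ruby
# Hypothetical buffer: the current record, its key, and a one-record look-ahead.
LookBuf = Struct.new(:record, :key, :lookahead)

# Read the next record from io (an Enumerator standing in for CsvIO) into buf,
# or return nil at end of input.
def look_ahead(io, key_fields, buf = nil)
  rec = io.next
  buf ||= LookBuf.new
  buf.record = rec
  buf.key = key_fields.map { |k| rec[k] }
  buf
rescue StopIteration
  nil
end

# Advance buf to its look-ahead record; prime the look-ahead on the first call.
def shift_buffer(io, key_fields, buf = nil)
  buf ||= LookBuf.new(nil, nil, look_ahead(io, key_fields))
  return nil if buf.lookahead.nil?
  buf.record = buf.lookahead.record
  buf.key = buf.lookahead.key
  # Reuse the look-ahead struct for the following record.
  buf.lookahead = look_ahead(io, key_fields, buf.lookahead)
  buf
end

io = [{ "id" => 1 }, { "id" => 2 }].each
buf = shift_buffer(io, ["id"])
p [buf.key, buf.lookahead.key]   # => [[1], [2]]
buf = shift_buffer(io, ["id"], buf)
p [buf.key, buf.lookahead]       # => [[2], nil]
p shift_buffer(io, ["id"], buf)  # => nil
```

Reusing the look-ahead struct is safe here because the current record and key are copied out by reference before the struct's fields are reassigned.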
#shift?(buf, other, order) ⇒ Boolean (private)
Returns whether to shift the given buffer as follows:
- If the buffer precedes the other buffer, then true.
- If the buffer succeeds the other buffer, then false.
- Otherwise, if the look-ahead record has the same key as the buffer record, then true.
- Otherwise, if the other look-ahead record has a different key than the other record, then true.
- Otherwise, false.
```ruby
# File 'lib/jinx/csv/joiner.rb', line 132
def shift?(buf, other, order)
  case order
  when -1 then true
  when 1 then false
  when 0 then
    compare(buf, buf.lookahead) == 0 or
      compare(other, other.lookahead) != 0
  end
end
```
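Because keys may repeat, the equal-key case decides which side to advance from the look-ahead keys. The sketch below restates #shift? over bare key arrays, with a nil next key standing for a missing look-ahead; `shift_side?` and its parameters are hypothetical. With source keys [1], [1] and a single target key [1], the source advances and the target holds, so the target record is paired with both source records.

```ruby
# Hypothetical simplification of Joiner#shift? operating on keys directly.
# A nil next key means that side has no look-ahead record.
def shift_side?(order, key, next_key, other_key, other_next)
  case order
  when -1 then true
  when 1 then false
  when 0 then next_key == key || other_next != other_key
  end
end

# Source keys [1], [1] against a single target key [1]:
p shift_side?(0, [1], [1], [1], nil)   # source side => true
p shift_side?(0, [1], nil, [1], [1])   # target side => false
```

On the next iteration neither side has a matching look-ahead, so both sides advance.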