Class: Jinx::CsvIO

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/jinx/csv/csvio.rb

Overview

CsvIO reads or writes CSV records. This class wraps a FasterCSV with the following modifications:

  • relax the date parser to allow dd/mm/yyyy dates

  • don’t convert integer text with a leading zero to an octal number

  • allow one custom converter with different semantics: if the converter block call returns nil, then continue conversion; otherwise, return the converter result. This differs from the FasterCSV converter semantics, which call converters as long as the result equals the input field value. The CsvIO converter semantics support converters that intend a String result to be the converted result.

CsvIO is Enumerable, but does not implement the complete Ruby IO interface.
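
The custom converter rule can be illustrated with a small stand-alone sketch. It uses neither FasterCSV nor Jinx: `convert_field` and `yes_no` are hypothetical names, and the sketch mimics only the precedence rule, where a non-nil block result is final and a nil result falls through to the default conversion.

```ruby
# Hypothetical stand-in for the CsvIO conversion rule; not the Jinx implementation.
def convert_field(field)
  return if field.nil?
  # The custom converter block has precedence: a non-nil result is final.
  value = yield(field) if block_given?
  # A nil block result means "not handled", so fall through to the default conversion.
  value ||= Integer(field) if field =~ /\A[1-9]\d*\z/
  # Return the converted value, or the input field if nothing converted it.
  value || field
end

# A converter that intends a String result for one particular input.
yes_no = lambda { |field| 'true' if field == 'Y' }

handled   = convert_field('Y', &yes_no)   # the block result is kept as-is
unhandled = convert_field('42', &yes_no)  # the block returned nil, so the integer conversion applies
```

Here the String result 'true' is accepted as the converted value even though it is still a String, which is the case the CsvIO semantics are meant to support.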

Constant Summary

MMM_MM_MAP =

3-letter months => month sequence hash.

['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'].to_compact_hash_with_index do |mmm, index|
  index < 9 ? ('0' + index.succ.to_s) : index.succ.to_s
end
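
`to_compact_hash_with_index` is not a core Ruby method; it is presumably a Jinx collection extension. Assuming it builds a hash from each item to the block result, a rough plain-Ruby equivalent of the map might look like the sketch below. The local names `months` and `mmm_mm_map` are chosen for the illustration only.

```ruby
months = %w[jan feb mar apr may jun jul aug sep oct nov dec]

# Map each 3-letter month to its zero-padded, 1-based month number.
mmm_mm_map = months.each_with_index.map do |mmm, index|
  [mmm, format('%02d', index + 1)]
end.to_h
```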
DateMatcher =

DateMatcher relaxes the FasterCSV DateMatcher to allow dd/mm/yyyy dates.

/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} | \d{1,2}-\w{3}-\d{2,4} | \d{4}[-\/]\d{1,2}[-\/]\d{1,2} | \d{1,2}[-\/]\d{1,2}[-\/]\d{2,4} )\z /x
DD_MMM_YYYY_RE =
/^(\d{1,2})-([[:alpha:]]{3})-(\d{2,4})$/
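
To see what the two patterns accept, the following sketch copies them into local variables and tries a few sample strings. The sample values are made up for the illustration.

```ruby
# Copies of the two patterns documented above, held in local variables for the demo.
date_matcher = /
  \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
        \d{1,2}-\w{3}-\d{2,4} |
        \d{4}[-\/]\d{1,2}[-\/]\d{1,2} |
        \d{1,2}[-\/]\d{1,2}[-\/]\d{2,4} )\z
/x
dd_mmm_yyyy_re = /^(\d{1,2})-([[:alpha:]]{3})-(\d{2,4})$/

samples = ['25/12/2012', '2012-12-25', '25-Dec-2012', 'Dec 25, 2012', '1.5', 'hello']
# Keep only the samples that look like dates.
matches = samples.select { |s| s =~ date_matcher }
# Split a dd-mmm-yyyy value into its three parts.
day, month, year = dd_mmm_yyyy_re.match('25-Dec-2012').captures
```

The first four samples match `date_matcher`; the last two do not, so they would be left to the other conversions.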

Constructor Details

#initialize(dev, opts = nil) {|value, info| ... } ⇒ CsvIO

Creates a new CsvIO for the specified source file. If a converter block is given, then it is added to the CSV converters list.

Parameters:

  • dev (String, IO)

    the CSV file or stream to open

  • opts (Hash) (defaults to: nil)

    the open options

Options Hash (opts):

  • :mode (String)

    the input mode (default r)

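  • :headers (Boolean, <String>)

    the input field headers: true to read them from the header record (the default for an input file), or an array of header names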

Yields:

  • (value, info)

    converts the input value

Yield Parameters:

  • value (String)

    the input value

  • info

    the current field’s FasterCSV FieldInfo metadata

Raises:

  • (ArgumentError)

    if the input is nil



# File 'lib/jinx/csv/csvio.rb', line 84

def initialize(dev, opts=nil, &converter)
  raise ArgumentError.new("CSV input argument is missing") if dev.nil?
  # the CSV file open mode
  mode = Options.get(:mode, opts, 'r')
  # the CSV headers option; can be boolean or array
  hdr_opt = Options.get(:headers, opts)
  # there is a header record by default for an input CSV file
  hdr_opt ||= true if mode =~ /^r/
  # if headers aren't given, then convert the input CSV header record names to underscore symbols
  hdr_cvtr = :symbol unless Enumerable === hdr_opt
  # make a custom converter
  custom = Proc.new { |value, info| convert(value, info, &converter) }
  # collect the options
  csv_opts = {:headers => hdr_opt, :header_converters => hdr_cvtr, :return_headers => true, :write_headers => true, :converters => custom}
  # Make the parent directory if necessary for an output CSV file.
  FileUtils.mkdir_p(File.dirname(dev)) if String === dev and mode !~ /^r/
  # open the CSV file
  @csv = String === dev ? FasterCSV.open(dev, mode, csv_opts) : FasterCSV.new(dev, csv_opts)
  # the header => field name hash:
  # if the header option is set to true, then read the input header line.
  # otherwise, parse an empty string which mimics an input header line.
  hdr_row = case hdr_opt
  when true then
    @csv.shift
  when Enumerable then
    ''.parse_csv(:headers => hdr_opt, :header_converters => :symbol, :return_headers => true)
  else
    raise ArgumentError.new("CSV headers option value not supported: #{hdr_opt}")
  end
  # The field value accessors consist of the header row headers converted to a symbol.
  @accessors = hdr_row.headers
  # The field names consist of the header row values.
  @field_names = @accessors.map { |sym| hdr_row[sym] }
  # the header name => symbol map
  @hdr_sym_hash = hdr_row.to_hash.invert
end
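
The relationship between the field names, the accessors and the header lookup built at the end of the constructor can be sketched without FasterCSV. The `to_accessor` lambda is an assumed approximation of the FasterCSV `:symbol` header converter (downcase, spaces to underscores, other non-word characters dropped), and the header names are invented for the example.

```ruby
# Assumed approximation of the FasterCSV :symbol header converter.
to_accessor = lambda { |header| header.downcase.tr(' ', '_').delete('^a-z0-9_').to_sym }

# The header record values, as they would appear in the CSV file.
field_names = ['First Name', 'Last Name', 'DOB']
# The accessors are the header names converted to symbols.
accessors = field_names.map(&to_accessor)
# The header name => accessor symbol lookup, analogous to what #accessor consults.
hdr_sym_hash = field_names.zip(accessors).to_h
```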

Instance Attribute Details

#accessors ⇒ <Symbol> (readonly) Also known as: headers

Returns the CSV field value accessors.

Returns:

  • (<Symbol>)

    the CSV field value accessors



# File 'lib/jinx/csv/csvio.rb', line 26

def accessors
  @accessors
end

#field_names ⇒ <String> (readonly)

Returns the CSV field names.

Returns:

  • (<String>)

    the CSV field names



# File 'lib/jinx/csv/csvio.rb', line 23

def field_names
  @field_names
end

Class Method Details

.foreach(file, opts = nil) {|row| ... } ⇒ Object

Opens the given CSV file and calls #each with the given block.

Parameters:

  • file (String, IO)

    the CSV file or stream to open

  • opts (Hash) (defaults to: nil)

    the open options

Options Hash (opts):

  • :mode (String)

    the input mode (default r)

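  • :headers (Boolean, <String>)

    the input field headers: true to read them from the header record (the default for an input file), or an array of header names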

Yields:

  • (row)

    the block to execute on the row

Yield Parameters:

  • row ({Symbol => Object})

    the field symbol => value hash



# File 'lib/jinx/csv/csvio.rb', line 52

def self.foreach(file, opts=nil, &block)
  open(file, opts) { |csvio| csvio.each(&block) }
end

.join(source, opts) {|rec| ... } ⇒ Object

Joins the source to the target and writes the output. The match is on all fields held in common. If there is more than one match, then all but the first match have empty values for the merged fields. Both files must be sorted in order of the common fields, sequenced by their occurrence in the source header.

Parameters:

  • source (String, IO)

    the join source file

  • opts ({Symbol => String, IO, <String>})

    the join options

Options Hash (opts):

  • :to (String, IO)

    the join target file name or device (default stdin)

  • :for (<String>)

    the target field names (default all target fields)

  • :as (String, IO)

    the output file name or device (default stdout)

Yields:

  • (rec)

    process the output record and return the record to write

Yield Parameters:

  • rec (FasterCSV::Record)

    the output record



# File 'lib/jinx/csv/csvio.rb', line 68

def self.join(source, opts, &block)
  flds = opts[:for] || Array::EMPTY_ARRAY
  Csv::Joiner.new(source, opts[:to], opts[:as]).join(*flds, &block)
end
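
The join rule described above can be read as a merge join over two sorted inputs. The sketch below is one interpretation of that description, not the `Csv::Joiner` implementation: `merge_join` is a hypothetical helper, records are plain hashes rather than CSV rows, and the handling of repeated matches (only the first source record with a given key receives the target values) is an assumption about what the description means.

```ruby
# Hypothetical merge join over two arrays of hashes sorted on their common keys.
def merge_join(source, target)
  return source.map(&:dup) if source.empty? || target.empty?
  # The match is on all fields held in common, in source header order.
  common = source.first.keys & target.first.keys
  extra = target.first.keys - common
  blank = extra.map { |k| [k, nil] }.to_h
  key = lambda { |rec| common.map { |k| rec[k] } }
  ti = 0
  prev = nil
  source.map do |src|
    k = key.call(src)
    # Advance the target past records that sort before the source key.
    ti += 1 while ti < target.size && (key.call(target[ti]) <=> k) == -1
    matched = ti < target.size && key.call(target[ti]) == k
    # Only the first source record with this key gets the target values.
    first = matched && prev != k
    prev = k
    src.merge(first ? target[ti].slice(*extra) : blank)
  end
end

source = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }, { id: 2, name: 'c' }, { id: 3, name: 'd' }]
target = [{ id: 2, score: 10 }, { id: 3, score: 20 }]
joined = merge_join(source, target)
```

Because both inputs are sorted on the common fields, each list is scanned only once.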

.open(dev, opts = nil) {|csvio| ... } ⇒ Object

Opens the CSV file and calls the given block with this CsvIO as the argument.

Parameters:

  • dev (String, IO)

    the CSV file or stream to open

  • opts (Hash) (defaults to: nil)

    the open options

Options Hash (opts):

  • :mode (String)

    the input mode (default r)

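  • :headers (Boolean, <String>)

    the input field headers: true to read them from the header record (the default for an input file), or an array of header names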

Yields:

  • (csvio)

    the optional block to execute

Yield Parameters:

  • csvio (CsvIO)

    the open CsvIO instance



# File 'lib/jinx/csv/csvio.rb', line 35

def self.open(dev, opts=nil)
  csvio = new(dev, opts)
  if block_given? then
    begin
      yield csvio
    ensure
      csvio.close
    end
  end
end
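
The yield-then-ensure shape of the listing guarantees that the file is closed even when the block raises. A minimal stand-alone sketch of the same pattern, using a StringIO in place of a CSV file, is shown below; `with_open` is a hypothetical helper and not part of Jinx.

```ruby
require 'stringio'

# Hypothetical helper with the same shape as the listing: yield, then always close.
def with_open(io)
  begin
    yield io
  ensure
    io.close
  end
end

io = StringIO.new("a,b\n1,2\n")
first_line = with_open(io) { |stream| stream.gets }

# The stream is closed even if the block raises.
failing = StringIO.new("x\n")
begin
  with_open(failing) { |_stream| raise 'boom' }
rescue RuntimeError
  # The error still propagates to the caller; it is swallowed here only for the demo.
end
```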

Instance Method Details

#accessor(name) ⇒ Object

Parameters:

  • name (String)

    the CSV field header name

Returns:

  • (Symbol)

    the header accessor method



# File 'lib/jinx/csv/csvio.rb', line 130

def accessor(name)
  @hdr_sym_hash[name]
end

#close ⇒ Object

Closes the CSV file.



# File 'lib/jinx/csv/csvio.rb', line 124

def close
  @csv.close
end

#convert(f, info) ⇒ Object (private)

Returns the converted value.

Parameters:

  • f

    the input field value to convert

  • info

    the CSV field info

Returns:

  • the converted value



# File 'lib/jinx/csv/csvio.rb', line 179

def convert(f, info)
  return if f.nil?
  # the block has precedence
  value = yield(f, info) if block_given?
  # integer conversion
  value ||= Integer(f) if f =~ /^[1-9]\d*$/
  # date conversion
  value ||= convert_date(f) if f =~ CsvIO::DateMatcher
  # float conversion
  value ||= (Float(f) rescue f) if f =~ /^\d+\.\d*$/ or f =~ /^\d*\.\d+$/
  # return converted value or the input field if there was no conversion
  value || f
end
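
The integer guard in the listing is what keeps values with a leading zero as text. The short sketch below shows why the guard matters, using only core Ruby; the sample values are invented.

```ruby
# Integer() treats a leading zero as an octal prefix.
octal = Integer('010')

# The guard used in the listing only admits digit strings without a leading zero.
int_guard = /^[1-9]\d*$/
zip_code = '02134'
kept_as_text = (zip_code =~ int_guard).nil?

# A value without a leading zero passes the guard and is converted.
converted = Integer('2134') if '2134' =~ int_guard
```

Without the guard, '010' would silently become 8, and a value such as '02134' would lose its leading zero.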

#convert_date(f) ⇒ Date (private)

Returns the converted date.

Parameters:

  • f (String)

    the input field value

Returns:

  • (Date)

    the converted date



# File 'lib/jinx/csv/csvio.rb', line 195

def convert_date(f)
  # If input value is in dd-mmm-yy format, then reformat.
  # Otherwise, parse as a Date if possible.
  if f =~ DD_MMM_YYYY_RE then
    ddmmyy = reformat_dd_mmm_yy_date(f) || return
    convert_date(ddmmyy)
  else
    Date.parse(f, true) rescue nil
  end
end
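
The fallback branch relies on `Date.parse` raising an error for unparseable text, which the `rescue nil` modifier turns into nil. A stand-alone sketch of that behavior follows; `parse_date_or_nil` is a hypothetical name, and it rescues ArgumentError explicitly rather than using the modifier form.

```ruby
require 'date'

# Return a Date, or nil when the text cannot be parsed.
def parse_date_or_nil(text)
  Date.parse(text, true)
rescue ArgumentError
  nil
end

iso = parse_date_or_nil('2012-12-25')
bad = parse_date_or_nil('xyz')
```

Returning nil lets the caller fall through to the remaining conversions instead of aborting the read.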

#each {|row| ... } ⇒ Object

Iterates over each CSV row, yielding a row for each iteration.

Yields:

  • (row)

    processes the CSV row

Yield Parameters:

  • row (FasterCSV::Row)

    the CSV row



# File 'lib/jinx/csv/csvio.rb', line 138

def each(&block)
  @csv.each(&block)
end
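
Because the class includes Enumerable and defines #each, the usual collection methods become available on a reader. The sketch below shows the mechanism with a stand-in object rather than a real CsvIO; the rows and the `reader` object are invented for the illustration.

```ruby
rows = [{ name: 'Ann', age: 31 }, { name: 'Bob', age: 27 }]

# Hypothetical stand-in reader: any object with #each that mixes in Enumerable.
reader = Object.new
reader.extend(Enumerable)
reader.define_singleton_method(:each) { |&block| rows.each(&block) }

# Enumerable methods are built on top of #each.
names = reader.map { |row| row[:name] }
over_thirty = reader.select { |row| row[:age] > 30 }
```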

#readline ⇒ Object Also known as: shift, next

Reads the next CSV row.

Returns:

  • the next CSV row




# File 'lib/jinx/csv/csvio.rb', line 146

def readline
  @csv.shift
end

#reformat_dd_mmm_yy_date(f) ⇒ String (private)

Returns the reformatted date String in mm/dd/yy format.

Parameters:

  • f (String)

    the input field value in dd-mmm-yy format

Returns:

  • (String)

    the reformatted date String in mm/dd/yy format



# File 'lib/jinx/csv/csvio.rb', line 208

def reformat_dd_mmm_yy_date(f)
  dd, mmm, yy = DD_MMM_YYYY_RE.match(f).captures
  mm = MMM_MM_MAP[mmm.downcase] || return
  "#{mm}/#{dd}/#{yy}"
end
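
A stand-alone version of this rewrite, without the class constants, might look like the following sketch. `reformat_date` is a hypothetical name, and the month lookup uses a plain array instead of MMM_MM_MAP.

```ruby
# Hypothetical stand-alone version of the dd-mmm-yy => mm/dd/yy rewrite.
def reformat_date(text)
  months = %w[jan feb mar apr may jun jul aug sep oct nov dec]
  # Split the input into day, month abbreviation and year; give up if it does not match.
  match = /^(\d{1,2})-([[:alpha:]]{3})-(\d{2,4})$/.match(text) or return
  dd, mmm, yy = match.captures
  # Look up the month number; an unknown abbreviation yields nil.
  index = months.index(mmm.downcase) or return
  format('%02d/%s/%s', index + 1, dd, yy)
end

christmas = reformat_date('25-Dec-2012')
unknown   = reformat_date('25-Xyz-2012')
```

As in the listing, only the month is normalized; the day and year are passed through unchanged.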

#write(row) ⇒ Object Also known as: <<

Writes the given row to the CSV file.

Parameters:

  • row ({Symbol => Object})

    the input row



# File 'lib/jinx/csv/csvio.rb', line 157

def write(row)
  @csv << row
  @csv.flush
end