Module: DarwinCore::Ingester
Overview
This module encapsulates the logic for reading a CSV file so that the classes that need this functionality can share it.
Instance Attribute Summary
-
#data ⇒ Object
readonly
Returns the value of attribute data.
-
#encoding ⇒ Object
readonly
Returns the value of attribute encoding.
-
#fields ⇒ Object
readonly
Returns the value of attribute fields.
-
#fields_separator ⇒ Object
readonly
Returns the value of attribute fields_separator.
-
#file_path ⇒ Object
readonly
Returns the value of attribute file_path.
-
#ignore_headers ⇒ Object
readonly
Returns the value of attribute ignore_headers.
-
#line_separator ⇒ Object
readonly
Returns the value of attribute line_separator.
-
#properties ⇒ Object
readonly
Returns the value of attribute properties.
-
#quote_character ⇒ Object
readonly
Returns the value of attribute quote_character.
-
#size ⇒ Object
readonly
Returns the value of attribute size.
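The separator and quote attributes above describe how the CSV file is delimited. As a rough illustration only, the sketch below shows how such values could be passed to Ruby's standard CSV library. The helper `csv_options` is hypothetical, and the mapping onto `col_sep`, `quote_char` and `row_sep` is an assumption; the module's own `define_csv_args` is not shown on this page and may work differently.

```ruby
require "csv"

# Hypothetical helper (not part of the gem): translate ingester-style
# attributes into options understood by Ruby's CSV library.
def csv_options(fields_separator:, quote_character:, line_separator:)
  {
    col_sep: fields_separator,   # character between fields
    quote_char: quote_character, # character used to quote fields
    row_sep: line_separator      # character(s) ending each row
  }
end

opts = csv_options(fields_separator: "\t",
                   quote_character: "\"",
                   line_separator: "\n")

# Parse a small tab-separated sample with those options.
rows = CSV.parse("id\tname\n1\tPuma concolor\n", **opts)
rows.each { |row| puts row.inspect }
```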
Instance Method Summary
-
#read(batch_size = 10_000) {|[res, errors]| ... } ⇒ Object
Instance Attribute Details
#data ⇒ Object (readonly)
Returns the value of attribute data.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def data
      @data
    end
#encoding ⇒ Object (readonly)
Returns the value of attribute encoding.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def encoding
      @encoding
    end
#fields ⇒ Object (readonly)
Returns the value of attribute fields.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def fields
      @fields
    end
#fields_separator ⇒ Object (readonly)
Returns the value of attribute fields_separator.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def fields_separator
      @fields_separator
    end
#file_path ⇒ Object (readonly)
Returns the value of attribute file_path.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def file_path
      @file_path
    end
#ignore_headers ⇒ Object (readonly)
Returns the value of attribute ignore_headers.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def ignore_headers
      @ignore_headers
    end
#line_separator ⇒ Object (readonly)
Returns the value of attribute line_separator.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def line_separator
      @line_separator
    end
#properties ⇒ Object (readonly)
Returns the value of attribute properties.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def properties
      @properties
    end
#quote_character ⇒ Object (readonly)
Returns the value of attribute quote_character.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def quote_character
      @quote_character
    end
#size ⇒ Object (readonly)
Returns the value of attribute size.
    # File 'lib/dwc_archive/ingester.rb', line 5

    def size
      @size
    end
Instance Method Details
#read(batch_size = 10_000) {|[res, errors]| ... } ⇒ Object
    # File 'lib/dwc_archive/ingester.rb', line 12

    def read(batch_size = 10_000)
      DarwinCore.logger_write(@dwc.object_id, "Reading #{name} data")
      res = []
      errors = []
      args = define_csv_args
      min_size = @fields.map { |f| f[:index].to_i || 0 }.max + 1
      csv = CSV.new(open(@file_path), **args)
      csv.each_with_index do |r, i|
        next if @ignore_headers && i == 0

        min_size > r.size ? errors << r : process_csv_row(res, errors, r)
        next if i == 0 || i % batch_size != 0

        DarwinCore.logger_write(@dwc.object_id,
                                format("Ingested %s records from %s", i, name))
        next unless block_given?

        yield [res, errors]
        res = []
        errors = []
      end
      yield [res, errors] if block_given?
      [res, errors]
    end
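The batching behaviour of #read can be illustrated with a small standalone sketch. It is not the gem's code: `read_in_batches` is a hypothetical function that reads from any IO object, takes `min_size` as a parameter instead of deriving it from `@fields`, and appends rows directly instead of calling `process_csv_row`. Logging is left out. It only aims to show how rows are split into results and errors and yielded every `batch_size` rows.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for Ingester#read. Yields [rows, errors] each time
# the row index reaches a multiple of batch_size, then once more at the end.
def read_in_batches(io, min_size:, batch_size: 10_000, ignore_headers: true)
  res = []
  errors = []
  CSV.new(io).each_with_index do |row, i|
    next if ignore_headers && i.zero?

    # Rows with fewer columns than min_size are collected as errors.
    row.size < min_size ? errors << row : res << row
    next if i.zero? || i % batch_size != 0
    next unless block_given?

    yield [res, errors]
    res = []
    errors = []
  end
  # Hand over whatever is left after the last full batch (possibly nothing).
  yield [res, errors] if block_given?
  [res, errors]
end

sample = "id,name\n1,Puma concolor\n2\n3,Lynx rufus\n4,Felis catus\n"
batches = []
read_in_batches(StringIO.new(sample), min_size: 2, batch_size: 2) do |rows, errs|
  batches << [rows, errs]
end
batches.each { |rows, errs| puts "rows=#{rows.size} errors=#{errs.size}" }
```

In this sketch the final yield carries whatever remains after the last full batch, so it can be empty. When a block is given, the return value likewise holds only that remainder; without a block, all rows and errors are accumulated and returned together.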