Class: InCSV::Database

Inherits:
Object
Defined in:
lib/incsv/database.rb

Overview

Represents a SQLite database file. Handles creating the database and its table, and importing data from a CSV file into that table.

Constructor Details

#initialize(csv) ⇒ Database

Returns a new instance of Database.

# File 'lib/incsv/database.rb', line 10

def initialize(csv)
  @csv = csv

  @db = Sequel.sqlite(db_path)
  # require "logger"
  # @db.loggers << Logger.new($stdout)
end

Instance Attribute Details

#db ⇒ Object (readonly)

Returns the value of attribute db.

# File 'lib/incsv/database.rb', line 18

def db
  @db
end

Instance Method Details

#create_table ⇒ Object

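Creates a table in the database with a primary key, `_incsv_id`, and one column for each column in the CSV. Each column's type is the best guess for the data found in that CSV column. Because the table is created with `create_table!`, any existing table of the same name is dropped first.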

# File 'lib/incsv/database.rb', line 61

def create_table
  @db.create_table!(table_name) do
    primary_key :_incsv_id
  end

  schema.columns.each do |c|
    @db.alter_table(table_name) do
      add_column c.name, c.type.for_database
    end
  end
end

#db_path ⇒ Object

Returns the path to the database file, generated based on the filename of the CSV passed to the class. For example, a CSV called `products.csv` will be stored in a database called `products.db` in the same directory.

# File 'lib/incsv/database.rb', line 44

def db_path
  path = Pathname(csv)
  (path.dirname + (path.basename(".csv").to_s + ".db")).to_s
end
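
The path derivation can be tried in isolation. The following is a minimal sketch, assuming a hypothetical standalone helper `db_path_for` that takes the CSV path as an argument instead of reading it from the instance; it is not part of the class.

```ruby
require "pathname"

# Hypothetical helper (not part of InCSV) mirroring the logic of #db_path.
def db_path_for(csv)
  path = Pathname(csv)
  # Strip the ".csv" suffix from the filename, append ".db",
  # and keep the CSV's own directory.
  (path.dirname + (path.basename(".csv").to_s + ".db")).to_s
end

puts db_path_for("data/products.csv") # prints data/products.db
```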

#exists? ⇒ Boolean

Returns true if the database file exists; makes no effort to check whether it is in fact a valid SQLite database.

Returns:

  • (Boolean)

# File 'lib/incsv/database.rb', line 36

def exists?
  File.exist?(db_path)
end

#import ⇒ Object

Imports data from the CSV file into the database, applying any preprocessing specified by the column type (e.g. stripping currency prefixes).

Data is imported in transactions, in chunks of 200 rows at a time.

# File 'lib/incsv/database.rb', line 78

def import
  return if imported?

  create_table unless table_created?

  columns      = schema.columns
  column_names = columns.map(&:name)

  chunks(200) do |chunk|
    rows = chunk.map do |row|
      row.to_hash.values.each_with_index.map do |column, n|
        columns[n].type.clean_value(column)
      end
    end

    @db[table_name].import(column_names, rows)
  end
end
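
The chunk-and-clean step can be illustrated without a database. The following is a rough sketch only: `cleaned_batches` and the lambdas passed as `cleaners` are hypothetical stand-ins for the private `chunks` helper and for each column type's `clean_value`, neither of which is shown on this page, and plain hashes stand in for CSV rows.

```ruby
# Hypothetical helper (not part of InCSV): cleans rows in batches of
# chunk_size and returns one array of cleaned rows per batch.
# cleaners holds one callable per column, in column order.
def cleaned_batches(rows, cleaners, chunk_size)
  rows.each_slice(chunk_size).map do |chunk|
    chunk.map do |row|
      # Same shape as the loop in #import: walk the row's values in
      # order and clean each one with the cleaner for its column.
      row.to_hash.values.each_with_index.map do |value, n|
        cleaners[n].call(value)
      end
    end
  end
end

cleaners = [
  ->(value) { value },                     # name: left as-is
  ->(value) { value.to_s.sub(/\A\$/, "") } # price: strip a leading "$"
]

rows = [
  { "name" => "Widget", "price" => "$4.50" },
  { "name" => "Gadget", "price" => "$12.00" },
  { "name" => "Gizmo",  "price" => "7.25" }
]

# With a chunk size of 2, the three rows form two batches.
cleaned_batches(rows, cleaners, 2).each { |batch| p batch }
```

In `#import`, each such batch is handed to `@db[table_name].import(column_names, rows)`, so one multi-row insert is issued per chunk rather than one insert per row.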

#imported? ⇒ Boolean

Returns true if the primary table exists and contains at least one row. A more accurate check would compare samples from the CSV against the table; this check is faster and will in practice be accurate.

Returns:

  • (Boolean)

# File 'lib/incsv/database.rb', line 30

def imported?
  table_created? && @db[table_name].count > 0
end

#table_created? ⇒ Boolean

Returns true if the primary table has been created in the database.

Returns:

  • (Boolean)

# File 'lib/incsv/database.rb', line 22

def table_created?
  @db.table_exists?(table_name)
end

#table_name ⇒ Object

Returns the table name, by default generated based on the filename of the CSV. For example, a CSV called `products.csv` will produce a table called `products`.

# File 'lib/incsv/database.rb', line 52

def table_name
  @table_name ||= begin
    File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
  end
end
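
The sanitisation keeps only lowercase letters and underscores, which is easiest to see on a few inputs. The following is a minimal sketch, assuming a hypothetical standalone helper `table_name_for` that takes the CSV path as an argument and omits the memoisation; it is not part of the class.

```ruby
# Hypothetical helper (not part of InCSV) mirroring the logic of #table_name.
def table_name_for(csv)
  # Drop the directory and ".csv" suffix, lowercase the rest, then
  # remove every character that is not a-z or an underscore.
  File.basename(csv, ".csv").downcase.gsub(/[^a-z_]/, "").to_sym
end

p table_name_for("products.csv")           # prints :products
p table_name_for("data/Products-2016.csv") # prints :products
p table_name_for("order_items.csv")        # prints :order_items
```

Digits and hyphens are removed along with other punctuation, so differently named files such as `products.csv` and `Products-2016.csv` map to the same table name.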