Class: ETL::Control::DatabaseSource

Inherits:
Source
  • Object
Defined in:
lib/etl/control/source/database_source.rb

Overview

Source object which extracts data from a database using ActiveRecord.

Instance Attribute Summary

Attributes inherited from Source

#configuration, #control, #definition, #local_base, #store_locally

Instance Method Summary

Methods inherited from Source

class_for_name, #errors, #last_local_file, #last_local_file_trigger, #local_file, #local_file_trigger, #read_locally, #timestamp

Constructor Details

#initialize(control, configuration, definition) ⇒ DatabaseSource

Initialize the source.

Arguments:

  • control: The ETL::Control::Control instance

  • configuration: The configuration Hash

  • definition: The source definition

Required configuration options:

  • :table: The source table name

  • :database: The database name

Other options:

  • :adapter: The adapter to use (defaults to :mysql)

  • :username: The database username (defaults to ‘root’)

  • :password: The password to the database (defaults to nothing)

  • :host: The host for the database (defaults to ‘localhost’)

  • :join: Optional join part for the query (ignored unless specified)

  • :select: Optional select part for the query (defaults to ‘*’)

  • :order: Optional order part for the query (ignored unless specified)

  • :store_locally: Set to false to not store a copy of the source data locally in a flat file (defaults to true)



# File 'lib/etl/control/source/database_source.rb', line 37

def initialize(control, configuration, definition)
  super
  connect
end
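To make the documented options concrete, here is a minimal sketch that applies the defaults listed above to a configuration Hash. `DEFAULTS` and `with_defaults` are hypothetical names that exist only in this example, the check for required options is this sketch's own, and the database and table names are made up. The class itself may resolve defaults in a different place or manner.

```ruby
# Hypothetical constant: mirrors the defaults documented above.
DEFAULTS = {
  adapter: :mysql,
  username: 'root',
  host: 'localhost',
  select: '*',
  store_locally: true
}.freeze

# Hypothetical helper: checks the two required options, then fills in defaults.
def with_defaults(configuration)
  [:table, :database].each do |key|
    raise ArgumentError, "Missing required option #{key.inspect}" unless configuration[key]
  end
  DEFAULTS.merge(configuration) # explicit options win over defaults
end

# Made-up table and database names, for illustration only.
config = with_defaults(table: 'people', database: 'crm_production')
```

Because explicit options are merged last, passing `store_locally: false` overrides the default of `true`.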

Instance Method Details

#columns ⇒ Object

Get the list of columns to read. The columns come from the source definition, which must be either an Array or a Hash; for a Hash, the keys are used.



# File 'lib/etl/control/source/database_source.rb', line 91

def columns
  case definition
  when Array
    definition.collect(&:to_sym)
  when Hash
    definition.keys.collect(&:to_sym)
  else
    raise "Definition must be either an Array or a Hash"
  end
end
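The two accepted definition shapes can be tried outside the class with a standalone copy of the same case expression. `columns_for` is a hypothetical name for this sketch, and the column names and types are made up.

```ruby
# Standalone restatement of the case expression shown above.
def columns_for(definition)
  case definition
  when Array
    definition.collect(&:to_sym)
  when Hash
    definition.keys.collect(&:to_sym) # only the keys become columns
  else
    raise "Definition must be either an Array or a Hash"
  end
end

from_array = columns_for(['first_name', 'last_name'])
# => [:first_name, :last_name]

from_hash = columns_for('first_name' => :string, 'age' => :integer)
# => [:first_name, :age]
```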

#count(use_cache = true) ⇒ Object

Get the number of rows in the source. The result is cached in @count and reused on later calls unless use_cache is false.



# File 'lib/etl/control/source/database_source.rb', line 80

def count(use_cache=true)
  return @count if @count && use_cache
  if store_locally || read_locally
    @count = count_locally
  else
    @count = connection.select_value(query.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM'))
  end
end
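When the source is neither stored nor read locally, the count query is derived from the row query by a regular-expression substitution. The sketch below applies the same pattern to two made-up queries to show what the rewrite produces. Because `.*` is greedy, a query that contains a second `FROM` on the same line (a subquery, for example) is cut at the last one, so the result should be checked for such queries.

```ruby
# Same pattern and replacement as in #count above.
COUNT_PATTERN = /SELECT .* FROM/

# Made-up queries, for illustration only.
simple = "SELECT id, name FROM people WHERE active = 1"
simple_count = simple.gsub(COUNT_PATTERN, 'SELECT count(1) FROM')
# => "SELECT count(1) FROM people WHERE active = 1"

nested = "SELECT a FROM (SELECT b FROM t) x"
nested_count = nested.gsub(COUNT_PATTERN, 'SELECT count(1) FROM')
# => "SELECT count(1) FROM t) x"   (greedy match ran to the last FROM)
```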

#each(&block) ⇒ Object

Yields each row from the source. If read_locally is set, this method reads from the last stored local file; if no locally stored file exists, or if the trigger file for the last locally stored file does not exist, this method raises an error. Otherwise, if store_locally is set, the rows are first written to a local file and then read back from it. If neither is set, rows are read directly from the database.



# File 'lib/etl/control/source/database_source.rb', line 107

def each(&block)
  if read_locally # Read from the last stored source
    ETL::Engine.logger.debug "Reading from local cache"
    read_rows(last_local_file, &block)
  else # Read from the original source
    if store_locally
      file = local_file
      write_local(file)
      read_rows(file, &block)
    else
      connection.select_all(query).each do |row|
        row = ETL::Row.new(row.symbolize_keys)
        row.source = self
        yield row
      end
    end
  end
end
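The branching in `#each` can be summarized as a small decision function. This is only a sketch of the control flow: `read_path` and the symbols it returns are hypothetical names for this example and are not part of the class.

```ruby
# Hypothetical helper naming the three paths through #each.
def read_path(read_locally:, store_locally:)
  return :last_local_file if read_locally        # reuse the previous extract
  return :write_then_read_local if store_locally # extract to a file, then read it
  :direct_from_database                          # yield rows straight from the query
end

read_path(read_locally: true,  store_locally: true)  # => :last_local_file
read_path(read_locally: false, store_locally: true)  # => :write_then_read_local
read_path(read_locally: false, store_locally: false) # => :direct_from_database
```

Note that `read_locally` takes precedence: when it is set, the database is not queried, whatever `store_locally` says.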

#group ⇒ Object

Get the GROUP BY part of the query (defaults to nil).



# File 'lib/etl/control/source/database_source.rb', line 64

def group
  configuration[:group]
end

#join ⇒ Object

Get the join part of the query (defaults to nil).



# File 'lib/etl/control/source/database_source.rb', line 54

def join
  configuration[:join]
end

#local_directory ⇒ Object

Get the local directory to use, which is a combination of local_base, the database host name, the database name, and the table name.



# File 'lib/etl/control/source/database_source.rb', line 49

def local_directory
  File.join(local_base, host, configuration[:database], configuration[:table])
end
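As a quick illustration of the resulting path, the sketch below joins four made-up values in the same order as the method above. The values of `local_base`, `host`, `database`, and `table` are assumptions chosen for the example.

```ruby
# Made-up values, for illustration only.
local_base = 'source_data'
host       = 'localhost'
database   = 'crm_production'
table      = 'people'

directory = File.join(local_base, host, database, table)
# => "source_data/localhost/crm_production/people"
```

Each host, database, and table combination therefore gets its own directory under `local_base`.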

#new_records_only ⇒ Object

Return the column used in the WHERE clause to identify new rows.



# File 'lib/etl/control/source/database_source.rb', line 75

def new_records_only
  configuration[:new_records_only]
end

#order ⇒ Object

Get the order part of the query (defaults to nil).



# File 'lib/etl/control/source/database_source.rb', line 69

def order
  configuration[:order]
end

#select ⇒ Object

Get the select part of the query (defaults to ‘*’).



# File 'lib/etl/control/source/database_source.rb', line 59

def select
  configuration[:select] || '*'
end

#to_s ⇒ Object

Get a String identifier for the source.



# File 'lib/etl/control/source/database_source.rb', line 43

def to_s
  "#{host}/#{configuration[:database]}/#{configuration[:table]}"
end