Class: ETL::Control::DatabaseSource

Inherits:
Source
  • Object
Defined in:
lib/etl/control/source/database_source.rb

Overview

Source object which extracts data from a database using ActiveRecord.

Instance Attribute Summary

Attributes inherited from Source

#configuration, #control, #definition, #local_base, #store_locally

Instance Method Summary

Methods inherited from Source

class_for_name, #errors, #last_local_file, #last_local_file_trigger, #local_file, #local_file_trigger, #read_locally, #timestamp

Constructor Details

#initialize(control, configuration, definition) ⇒ DatabaseSource

Initialize the source.

Arguments:

  • control: The ETL::Control::Control instance

  • configuration: The configuration Hash

  • definition: The source definition

Required configuration options:

  • :target: The target connection

  • :table: The source table name

  • :database: The database name

Other options:

  • :join: Optional join part for the query (ignored unless specified)

  • :select: Optional select part for the query (defaults to ‘*’)

  • :group: Optional group by part for the query (ignored unless specified)

  • :order: Optional order part for the query (ignored unless specified)

  • :new_records_only: Specify the column to use when comparing timestamps against the last successful ETL job execution for the current control file.

  • :store_locally: Set to false to not store a copy of the source data locally in a flat file (defaults to true)
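
The options above could be gathered into a configuration Hash such as the one below. This is a minimal sketch: every value (`:operational_database`, `"people"`, `"crm"`, `:updated_at`) is made up, and `missing_required_options` is a hypothetical helper, not a library method.

```ruby
# Hypothetical configuration Hash for a database source.
# All values are invented for illustration.
configuration = {
  :target => :operational_database,        # required: the target connection
  :table => "people",                      # required: the source table name
  :database => "crm",                      # required: the database name
  :select => "id, first_name, updated_at", # optional, '*' when omitted
  :order => "id",                          # optional
  :new_records_only => :updated_at,        # optional timestamp column
  :store_locally => false                  # optional, true when omitted
}

REQUIRED_OPTIONS = [:target, :table, :database].freeze

# Hypothetical helper (not part of the library): lists the required
# keys that are absent from a configuration Hash.
def missing_required_options(configuration)
  REQUIRED_OPTIONS.reject { |key| configuration.key?(key) }
end

puts missing_required_options(configuration).inspect
puts missing_required_options({ :table => "people" }).inspect
```

A check like this may help catch an incomplete source declaration before any query is attempted.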



# File 'lib/etl/control/source/database_source.rb', line 40

def initialize(control, configuration, definition)
  super
  @target = configuration[:target]
  @table = configuration[:table]
end

Instance Attribute Details

#table ⇒ Object

Returns the value of attribute table.



# File 'lib/etl/control/source/database_source.rb', line 12

def table
  @table
end

#target ⇒ Object

Returns the value of attribute target.



# File 'lib/etl/control/source/database_source.rb', line 11

def target
  @target
end

Instance Method Details

#columns ⇒ Object

Get the list of columns to read. The source definition declares these as either an Array or a Hash; the implementation below takes them from the keys of the first query row.



# File 'lib/etl/control/source/database_source.rb', line 95

def columns
  # weird default is required for writing to cache correctly
  @columns ||= query_rows.any? ? query_rows.first.keys : ['']
end
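
The fallback in #columns can be tried in isolation. The sketch below assumes rows arrive as an Array of Hashes; `columns_for` is a hypothetical stand-in that takes the rows as an argument instead of running a query.

```ruby
# Hypothetical stand-in for #columns: the rows are passed in rather
# than fetched from a database.
def columns_for(rows)
  # Same shape as the original expression: keys of the first row,
  # or a single empty column name when there are no rows.
  rows.any? ? rows.first.keys : ['']
end

rows = [
  { "id" => 1, "name" => "Ada" },
  { "id" => 2, "name" => "Grace" }
]

puts columns_for(rows).inspect
puts columns_for([]).inspect
```

Note that an empty result set yields one empty column name rather than an empty Array, which matches the "weird default" mentioned in the source comment.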

#count(use_cache = true) ⇒ Object

Get the number of rows in the source



# File 'lib/etl/control/source/database_source.rb', line 84

def count(use_cache=true)
  return @count if @count && use_cache
  if store_locally || read_locally
    @count = count_locally
  else
    @count = connection.select_value(query.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM'))
  end
end
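
When the source is not cached locally, #count rewrites the query text with a regular expression. The sketch below applies the same substitution to a few sample strings; `count_query` is a hypothetical wrapper and the queries are invented.

```ruby
COUNT_PATTERN = /SELECT .* FROM/

# Hypothetical wrapper around the substitution used in #count.
def count_query(query)
  query.gsub(COUNT_PATTERN, 'SELECT count(1) FROM')
end

# A simple query is rewritten as expected.
puts count_query("SELECT id, name FROM people ORDER BY id")

# The pattern is greedy, so a query containing a second FROM is cut
# at the last one and the result is no longer valid SQL.
puts count_query("SELECT a FROM (SELECT b FROM t) x")

# The pattern is case-sensitive, so a lowercase query is left unchanged.
puts count_query("select * from people")
```

The second and third cases suggest that the substitution is only dependable for single-level, uppercase SELECT statements written on one line.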

#each(&block) ⇒ Object

Returns each row from the source. If read_locally is specified then this method will attempt to read from the last stored local file. If no locally stored file exists or if the trigger file for the last locally stored file does not exist then this method will raise an error.



# File 'lib/etl/control/source/database_source.rb', line 105

def each(&block)
  if read_locally # Read from the last stored source
    ETL::Engine.logger.debug "Reading from local cache"
    read_rows(last_local_file, &block)
  else # Read from the original source
    if store_locally
      file = local_file
      write_local(file)
      read_rows(file, &block)
    else
      query_rows.each do |row|
        row = ETL::Row.new(row.symbolize_keys)
        row.source = self
        yield row
      end
    end
  end
end
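
The branching in #each can be summarized with a small sketch. Everything here is hypothetical: `read_strategy` only names the branch that would run, and `SketchSource` uses a plain Hash with Symbol keys in place of ETL::Row, with Ruby's `transform_keys` standing in for ActiveSupport's `symbolize_keys`.

```ruby
# Hypothetical helper: names the branch #each would take for a given
# combination of flags. read_locally wins over store_locally.
def read_strategy(read_locally:, store_locally:)
  if read_locally
    :last_local_file
  elsif store_locally
    :write_then_read_local_file
  else
    :query_database
  end
end

# Hypothetical stand-in for the direct-read branch: yields each row
# with its String keys converted to Symbols.
class SketchSource
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  def each
    return enum_for(:each) unless block_given?
    @rows.each { |row| yield row.transform_keys(&:to_sym) }
  end
end

puts read_strategy(read_locally: false, store_locally: true)

source = SketchSource.new([{ "id" => 1, "name" => "Ada" }])
source.each { |row| puts row[:name] }
```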

#group ⇒ Object

Get the group by part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 68

def group
  configuration[:group]
end

#join ⇒ Object

Get the join part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 58

def join
  configuration[:join]
end

#local_directory ⇒ Object

Get the local directory to use, which is a combination of the local_base, the database hostname, the database name, and the table name.



# File 'lib/etl/control/source/database_source.rb', line 53

def local_directory
  File.join(local_base, host, database, configuration[:table])
end
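
As an illustration of the resulting path, the sketch below calls File.join with invented values for the four components; none of these values come from the library.

```ruby
# Hypothetical component values, chosen only for illustration.
local_base = "source_data"
host = "localhost"
database = "crm"
table = "people"

# File.join inserts a single "/" between the components.
path = File.join(local_base, host, database, table)
puts path
```

Each host, database, and table combination therefore gets its own directory under local_base.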

#new_records_only ⇒ Object

Return the column used in the WHERE clause to identify new rows.



# File 'lib/etl/control/source/database_source.rb', line 79

def new_records_only
  configuration[:new_records_only]
end
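
How the column ends up in a WHERE clause is not shown in this section. The sketch below is one possible shape, under the assumption that the comparison is a simple "newer than the last successful run" test; `new_records_condition`, the timestamp format, and the sample time are all hypothetical.

```ruby
# Hypothetical helper: builds a condition selecting rows newer than the
# last successful run. Returns nil when either piece is missing, so the
# caller can skip the WHERE clause entirely.
def new_records_condition(column, last_completed)
  return nil unless column && last_completed
  timestamp = last_completed.strftime('%Y-%m-%d %H:%M:%S')
  "#{column} > '#{timestamp}'"
end

last_completed = Time.utc(2024, 1, 15, 8, 30, 0) # invented timestamp

puts new_records_condition(:updated_at, last_completed)
puts new_records_condition(nil, last_completed).inspect
```

A real implementation would likely quote the value through the database connection rather than interpolate it into the string.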

#order ⇒ Object

Get the order for the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 73

def order
  configuration[:order]
end

#select ⇒ Object

Get the select part of the query, defaults to ‘*’



# File 'lib/etl/control/source/database_source.rb', line 63

def select
  configuration[:select] || '*'
end
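
The select, join, group, and order accessors each return one fragment of the query. The library's own query construction is not shown in this section, so the sketch below is only an assumption about how such fragments might be combined; `build_query` and the sample values are hypothetical.

```ruby
# Hypothetical assembly of the query fragments. Optional parts are
# skipped when their configuration key is absent.
def build_query(configuration)
  select = configuration[:select] || '*' # same default as #select
  parts = ["SELECT #{select} FROM #{configuration[:table]}"]
  parts << configuration[:join] if configuration[:join]
  parts << "GROUP BY #{configuration[:group]}" if configuration[:group]
  parts << "ORDER BY #{configuration[:order]}" if configuration[:order]
  parts.join(' ')
end

puts build_query({ :table => "people" })
puts build_query({
  :table => "people",
  :select => "city, count(*)",
  :group => "city",
  :order => "city"
})
```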

#to_s ⇒ Object

Get a String identifier for the source



# File 'lib/etl/control/source/database_source.rb', line 47

def to_s
  "#{host}/#{database}/#{table}"
end