Class: ETL::Control::DatabaseSource

Inherits:
Source
Defined in:
lib/etl/control/source/database_source.rb

Overview

Source object which extracts data from a database using ActiveRecord.

Instance Attribute Summary

Attributes inherited from Source

#configuration, #control, #definition, #local_base, #store_locally

Instance Method Summary

Methods inherited from Source

class_for_name, #errors, #last_local_file, #last_local_file_trigger, #local_file, #local_file_trigger, #read_locally, #timestamp

Constructor Details

#initialize(control, configuration, definition) ⇒ DatabaseSource

Initialize the source.

Arguments:

  • control: The ETL::Control::Control instance

  • configuration: The configuration Hash

  • definition: The source definition

Required configuration options:

  • :target: The target connection

  • :table: The source table name

  • :database: The database name

Other options:

  • :join: Optional join part for the query (ignored unless specified)

  • :select: Optional select part for the query (defaults to '*')

  • :group: Optional group by part for the query (ignored unless specified)

  • :order: Optional order part for the query (ignored unless specified)

  • :new_records_only: Specify the column to use when comparing timestamps against the last successful ETL job execution for the current control file.

  • :store_locally: Set to false to not store a copy of the source data locally in a flat file (defaults to true)
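As a rough illustration of the options listed above, a configuration Hash might look like the following. The connection name, table, database, and column names are hypothetical values chosen for the example, and the Hash is built in plain Ruby so the snippet runs without the ETL library loaded.

```ruby
# Hypothetical configuration Hash for a DatabaseSource.
# Every value here (:operational, 'orders', 'sales', the column names)
# is an illustrative assumption, not taken from a real control file.
configuration = {
  target: :operational,             # required: the target connection
  table: 'orders',                  # required: the source table name
  database: 'sales',                # required: the database name
  select: 'id, customer_id, total', # optional: defaults to '*'
  order: 'id',                      # optional: ignored unless specified
  new_records_only: 'updated_at',   # optional: timestamp column to compare
  store_locally: false              # optional: defaults to true
}

# Simple sanity check that the documented required options are present.
required = [:target, :table, :database]
missing = required.reject { |key| configuration.key?(key) }
puts "missing required options: #{missing.inspect}"
```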



# File 'lib/etl/control/source/database_source.rb', line 42

def initialize(control, configuration, definition)
  super
  @target = configuration[:target]
  @table = configuration[:table]
  @query = configuration[:query]
end

Instance Attribute Details

#table ⇒ Object

Returns the value of attribute table.



# File 'lib/etl/control/source/database_source.rb', line 14

def table
  @table
end

#target ⇒ Object

Returns the value of attribute target.



# File 'lib/etl/control/source/database_source.rb', line 13

def target
  @target
end

Instance Method Details

#columns ⇒ Object

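Get the list of columns to read. This is defined in the source definition as either an Array or Hash. In the listing below, the columns are taken from the keys of the first query row, with [''] as the fallback when the query returns no rows.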



# File 'lib/etl/control/source/database_source.rb', line 98

def columns
  # weird default is required for writing to cache correctly
  @columns ||= query_rows.any? ? query_rows.first.keys : ['']
end
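The fallback in the listing above can be tried in isolation. This is a minimal sketch, assuming the query rows are a plain Array of Hashes; `columns_from` is a hypothetical helper written for the example and is not part of the library.

```ruby
# Stand-in for the lookup in #columns: take the keys of the first row,
# or fall back to [''] when there are no rows at all.
# `rows` is assumed to be a plain Array of Hashes for this sketch.
def columns_from(rows)
  rows.any? ? rows.first.keys : ['']
end

rows = [{ 'id' => 1, 'name' => 'Ada' }, { 'id' => 2, 'name' => 'Grace' }]
p columns_from(rows) # => ["id", "name"]
p columns_from([])   # => [""]
```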

#count(use_cache = true) ⇒ Object

Get the number of rows in the source



# File 'lib/etl/control/source/database_source.rb', line 87

def count(use_cache=true)
  return @count if @count && use_cache
  if @store_locally || read_locally
    @count = count_locally
  else
    @count = connection.select_value(query.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM'))
  end
end
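To see what the substitution in the non-local branch does to a query string, the same regular expression can be applied outside the class. This is only a sketch: `count_query` is a hypothetical helper, and the sample queries are made up. It also shows two properties of the pattern that follow from Ruby's regular expression semantics: the `.*` is greedy, and the match is case-sensitive.

```ruby
# Same substitution as in #count, wrapped in a hypothetical helper.
def count_query(sql)
  sql.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM')
end

puts count_query('SELECT id, name FROM people')
# => SELECT count(1) FROM people

# The `.*` is greedy, so the match runs up to the LAST "FROM" in the string.
puts count_query('SELECT id FROM people WHERE id IN (SELECT person_id FROM orders)')
# => SELECT count(1) FROM orders)

# The pattern is case-sensitive, so a lowercase query is left unchanged.
puts count_query('select id from people')
# => select id from people
```

Queries that contain a subquery, span several lines, or use lowercase keywords may therefore need a count query written by hand.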

#each(&block) ⇒ Object

Yields each row from the source. If read_locally is specified, this method attempts to read from the last stored local file. If no locally stored file exists, or if the trigger file for the last locally stored file does not exist, this method raises an error.



# File 'lib/etl/control/source/database_source.rb', line 108

def each(&block)
  if read_locally # Read from the last stored source
    ETL::Engine.logger.debug "Reading from local cache"
    read_rows(last_local_file, &block)
  else # Read from the original source
    if @store_locally
      file = local_file
      write_local(file)
      read_rows(file, &block)
    else
      query_rows.each do |r|
        row = ETL::Row.new()
        r.symbolize_keys.each_pair { |key, value|
          row[key] = value
        }
        row.source = self
        yield row
      end
    end
  end
end
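The innermost branch of the listing above copies each result Hash into a row object with Symbol keys. The following sketch reproduces that step in plain Ruby under some assumptions: `SketchRow` is a hypothetical stand-in for ETL::Row, `each_row` is a hypothetical helper, and `transform_keys(&:to_sym)` is used in place of ActiveSupport's `symbolize_keys` so the snippet has no dependencies.

```ruby
# Minimal stand-in for ETL::Row: a Hash that also remembers its source.
# (Hypothetical class for this sketch only.)
class SketchRow < Hash
  attr_accessor :source
end

# Yield one SketchRow per result Hash, with String keys turned into Symbols.
def each_row(query_rows, source)
  query_rows.each do |r|
    row = SketchRow.new
    # Plain-Ruby counterpart to ActiveSupport's symbolize_keys.
    r.transform_keys(&:to_sym).each_pair { |key, value| row[key] = value }
    row.source = source
    yield row
  end
end

results = [{ 'id' => 1, 'name' => 'Ada' }]
collected = []
each_row(results, 'localhost/sales/people') { |row| collected << row }

puts collected.first[:name]  # => Ada
puts collected.first.source  # => localhost/sales/people
```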

#group ⇒ Object

Get the group by part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 71

def group
  configuration[:group]
end

#join ⇒ Object

Get the join part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 61

def join
  configuration[:join]
end

#local_directory ⇒ Object

Get the local directory to use, which is a combination of the local_base, the database hostname, the database name, and the table name.



# File 'lib/etl/control/source/database_source.rb', line 56

def local_directory
  File.join(local_base, to_s)
end
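As a small worked example of how the path is composed, the snippet below joins a base directory with an identifier of the form host/database/table. All four values are hypothetical placeholders. In the class itself they come from local_base and from #to_s.

```ruby
# Hypothetical values standing in for local_base, host, database and table.
local_base = '/var/etl/source_data'
host = 'localhost'
database = 'sales'
table = 'orders'

# Same shape as the String built by #to_s.
identifier = "#{host}/#{database}/#{table}"

directory = File.join(local_base, identifier)
puts directory
# => /var/etl/source_data/localhost/sales/orders
```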

#new_records_only ⇒ Object

Return the column used in the WHERE clause to identify new rows.



# File 'lib/etl/control/source/database_source.rb', line 82

def new_records_only
  configuration[:new_records_only]
end

#order ⇒ Object

Get the order for the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 76

def order
  configuration[:order]
end

#select ⇒ Object

Get the select part of the query, defaults to '*'



# File 'lib/etl/control/source/database_source.rb', line 66

def select
  configuration[:select] || '*'
end
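The `||` default used above can be checked with a plain Hash. This is a sketch only: `select_part` is a hypothetical helper that mirrors the one-line method body, and the configuration Hashes are made up.

```ruby
# Mirrors the body of #select for a configuration Hash passed in directly.
def select_part(configuration)
  configuration[:select] || '*'
end

puts select_part({})                       # => *
puts select_part({ select: 'id, name' })   # => id, name
# Any falsy value (nil or false) falls back to the default as well.
puts select_part({ select: nil })          # => *
```

By contrast, #join, #group and #order return the configured value unchanged, so they yield nil when the option is absent.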

#to_s ⇒ Object

Get a String identifier for the source



# File 'lib/etl/control/source/database_source.rb', line 50

def to_s
  "#{host}/#{database}/#{@table}"
end