Class: ModelIterator

Inherits:
Object
Defined in:
lib/model_iterator.rb

Overview

Iterates over large models, storing state in Redis.

Defined Under Namespace

Classes: MaxIterations

Constant Summary

VERSION =
"1.0.2"

Constructor Details

#initialize(klass, *args) ⇒ ModelIterator

Initializes a ModelIterator instance.

klass   - ActiveRecord::Base class to iterate.
clause  - String SQL WHERE clause, with '?' placeholders for values.
*values - Optional array of values to be added to a custom SQL WHERE
          clause.

options - Optional Hash options.

        :redis     - A Redis object for storing the state.
        :order     - Symbol specifying the order to iterate.  :asc or
                     :desc.  Default: :asc
        :id_field  - String name of the ID column.  Default: the model's
                     primary key (usually "id")
        :id_clause - String name of the fully qualified ID column.
                     Prepends the model's table name to the front of
                     the ID field.  Default: "table_name.id"
        :start_id  - Fixnum to start iterating from.  Default: 1
        :prefix    - Custom String prefix for redis keys.
        :select    - Optional String of the columns to retrieve.
        :joins     - Optional Symbol or Hash :joins option for 
                     ActiveRecord::Base.find.
        :max       - Optional Fixnum of the maximum number of iterations.
                     Use max * limit to process a known number of records
                     at a time.
        :limit     - Fixnum limit of objects to fetch from the db.
                     Default: 100
        :conditions - Array of String SQL WHERE clause and optional values
                      (Will override clause/values given in arguments.)

ModelIterator.new(Repository, :start_id => 5000)
ModelIterator.new(Repository, 'public=?', true, :start_id => 1000)


# File 'lib/model_iterator.rb', line 92

def initialize(klass, *args)
  @klass = klass
  @options = if args.last.respond_to?(:fetch)
    args.pop
  else
    {}
  end
  @redis = @options[:redis] || self.class.redis
  @id_field = @options[:id_field] || klass.primary_key
  @id_clause = @options[:id_clause] || "#{klass.table_name}.#{@id_field}"
  @order = @options[:order] == :desc ? :desc : :asc
  op = @order == :asc ? '>' : '<'
  @max = @options[:max].to_i
  @joins = @options[:joins]
  @clause =  "#{@id_clause} #{op} ?"
  if @options[:conditions]
    conditions = Array(@options[:conditions])
    @clause += " AND (#{conditions.first})"
    @clause_args = conditions[1..-1]
  elsif !args.empty?
    @clause += " AND (#{args.shift})"
    @clause_args = args
  end
  @current_id = @options[:start_id]
  @limit = @options[:limit] || 100
  @job = @prefix = @key = nil
end
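
The argument handling above can be sketched in isolation. This is a minimal illustration, not part of the library: `split_iterator_args` is a hypothetical helper that mirrors how the constructor treats a trailing Hash as options, the first remaining argument as the clause, and the rest as values, with `:conditions` taking precedence. Unlike the constructor, it returns an empty Array of values when no clause is given.

```ruby
# Hypothetical helper (not part of ModelIterator) that mirrors how
# #initialize splits its variable arguments.
def split_iterator_args(*args)
  # A trailing argument that responds to #fetch is treated as the options Hash.
  options = args.last.respond_to?(:fetch) ? args.pop : {}

  if options[:conditions]
    # :conditions overrides any positional clause and values.
    clause, *values = Array(options[:conditions])
  else
    clause, *values = args
  end

  { :options => options, :clause => clause, :values => values }
end

parsed = split_iterator_args('public=?', true, :start_id => 1000)
p parsed[:clause]  # "public=?"
p parsed[:values]  # [true]
```

Because the check is `respond_to?(:fetch)`, an Array passed as the last positional value would also be taken for the options, so custom values are best passed as separate arguments.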

Class Attribute Details

.redis ⇒ Object

Gets or sets a default Redis client object for iterators.



# File 'lib/model_iterator.rb', line 15

def redis
  @redis
end

Instance Attribute Details

#clause ⇒ Object (readonly)

Gets a String SQL WHERE clause fragment. Use '?' for variable substitution.

Returns a String.



# File 'lib/model_iterator.rb', line 33

def clause
  @clause
end

#clause_args ⇒ Object (readonly)

Gets an Array of values to be sql-escaped and joined with the clause.

Returns an Array of unescaped sql values.



# File 'lib/model_iterator.rb', line 38

def clause_args
  @clause_args
end

#current_id(refresh = false) ⇒ Object

Public: Points to the latest record that was yielded, by database ID.

refresh - Boolean that determines if the instance variable cache should
          be reset first.  Default: false.

Returns a Fixnum.



# File 'lib/model_iterator.rb', line 126

def current_id(refresh = false)
  @current_id = nil if refresh
  @current_id ||= @redis.get(key).to_i
end
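
The caching behaviour of #current_id can be tried without a Redis server. The sketch below is an assumption-laden stand-in: `FakeRedis` and `Cursor` are hypothetical classes written for this example, and only the memoize-then-refresh logic is taken from the method above.

```ruby
# In-memory stand-in for a Redis client; only #get and #set are sketched.
class FakeRedis
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s # Redis returns stored values as Strings
  end
end

# Hypothetical class that reproduces only the #current_id logic.
class Cursor
  def initialize(store, key, start_id = nil)
    @store = store
    @key = key
    @current_id = start_id
  end

  def current_id(refresh = false)
    @current_id = nil if refresh
    # nil.to_i is 0, so a missing key yields 0.
    @current_id ||= @store.get(@key).to_i
  end
end

store = FakeRedis.new
cursor = Cursor.new(store, 'demo:current', 5000)
first = cursor.current_id           # 5000, taken from the start ID
store.set('demo:current', 7200)
cached = cursor.current_id          # still 5000, the cached value
refreshed = cursor.current_id(true) # 7200, re-read from the store
```

Passing `true` is only needed when another process may have advanced the stored ID since the value was cached.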

#id_clause ⇒ Object (readonly)

Gets the String fully qualified ID field (with the table name).



# File 'lib/model_iterator.rb', line 50

def id_clause
  @id_clause
end

#id_field ⇒ Object (readonly)

Gets the String name of the ID field.



# File 'lib/model_iterator.rb', line 47

def id_field
  @id_field
end

#job ⇒ Object

Gets or sets a Proc that is called with each model instance while iterating. This is set automatically by #each.



# File 'lib/model_iterator.rb', line 57

def job
  @job
end

#joins ⇒ Object (readonly)

Gets the :joins value for ActiveRecord::Base.find.



# File 'lib/model_iterator.rb', line 53

def joins
  @joins
end

#klass ⇒ Object (readonly)

Gets a reference to the ActiveRecord::Base class that is iterated.

Returns a Class.



# File 'lib/model_iterator.rb', line 21

def klass
  @klass
end

#limit ⇒ Object

Gets or sets the number of records that are returned in each database query.

Returns a Fixnum.



# File 'lib/model_iterator.rb', line 27

def limit
  @limit
end

#max ⇒ Object (readonly)

Gets a Fixnum value of the maximum iterations to run, or 0 for no maximum.



# File 'lib/model_iterator.rb', line 44

def max
  @max
end

#prefix ⇒ Object (readonly)

Gets a String used to prefix the redis keys used by this object.



# File 'lib/model_iterator.rb', line 41

def prefix
  @prefix
end

#redis ⇒ Object

Gets or sets the Redis client object.



# File 'lib/model_iterator.rb', line 60

def redis
  @redis
end

Instance Method Details

#cleanup ⇒ Object

Public: Cleans up any redis keys.

Returns nothing.



# File 'lib/model_iterator.rb', line 181

def cleanup
  @redis.del(key)
  @current_id = nil
end

#conditions ⇒ Object

Public: Gets an ActiveRecord :conditions value, ready for ActiveRecord::Base.all.

Returns an Array with a String query clause, and unescaped db values.



# File 'lib/model_iterator.rb', line 199

def conditions
  [@clause, current_id, *@clause_args]
end
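
As an illustration of the Array this returns, the sketch below rebuilds it outside the class. `build_conditions` is a hypothetical helper, and the table name and values are made up for the example; the clause layout follows the constructor source shown earlier on this page.

```ruby
# Hypothetical helper (not part of ModelIterator) that assembles the same
# [clause, current_id, *values] Array as #conditions.
def build_conditions(id_clause, order, current_id, extra_clause = nil, extra_args = [])
  # Ascending iteration pages forward with '>', descending with '<'.
  op = order == :desc ? '<' : '>'
  clause = "#{id_clause} #{op} ?"
  clause += " AND (#{extra_clause})" if extra_clause
  # The current ID binds to the first '?', custom values to the rest.
  [clause, current_id, *extra_args]
end

p build_conditions('repositories.id', :asc, 1000, 'public=?', [true])
# ["repositories.id > ? AND (public=?)", 1000, true]
```

The current ID always comes first in the Array because its placeholder is the first '?' in the clause.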

#each ⇒ Object Also known as: run

Public: Iterates through the whole dataset, yielding individual records as they are received. This calls #records multiple times, setting the #current_id after each run. If an exception is raised, the ModelIterator instance can safely be restarted, since all state is stored in Redis.

&block - Block that gets called with each ActiveRecord::Base instance.

Returns nothing.



# File 'lib/model_iterator.rb', line 142

def each
  @job = block = (block_given? ? Proc.new : @job)
  each_set do |records|
    records.each do |record|
      block.call(record)
      @current_id = record.send(@id_field)
    end
  end
  cleanup
end

#each_set(&block) ⇒ Object

Public: Iterates through the whole dataset. This calls #records multiple times, but does not set the #current_id after each record.

&block - Block that gets called with each Array of ActiveRecord::Base instances (one Array per query).

Returns nothing.



# File 'lib/model_iterator.rb', line 159

def each_set(&block)
  loops = 0
  while records = self.records
    begin
      block.call(records)
      loops += 1
      if @max > 0 && loops >= @max
        raise MaxIterations, self
      end
    ensure
      @redis.set(key, @current_id) if @current_id
    end
  end
end
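
The combination of #each and #each_set amounts to ID-based paging with a checkpoint written in an `ensure` block, which is what makes a failed run resumable. The following is a minimal sketch of that idea under stated assumptions: an Array of Hashes stands in for the database table, a plain Hash stands in for Redis, and `each_record` and `MaxReached` are hypothetical names. It omits the final #cleanup call that #each performs.

```ruby
class MaxReached < StandardError; end # stand-in for ModelIterator::MaxIterations

# rows       - Array of Hashes with an :id key (stand-in for the table).
# checkpoint - Hash standing in for Redis.
def each_record(rows, checkpoint, limit: 2, max: 0)
  loops = 0
  current_id = checkpoint[:current].to_i
  loop do
    # Fetch the next page: IDs above the checkpoint, in ascending order.
    batch = rows.select { |row| row[:id] > current_id }
                .sort_by { |row| row[:id] }
                .first(limit)
    break if batch.empty?
    begin
      batch.each do |row|
        yield row
        current_id = row[:id] # advance only after the block succeeded
      end
      loops += 1
      raise MaxReached if max > 0 && loops >= max
    ensure
      checkpoint[:current] = current_id # persisted even when the block raises
    end
  end
end

rows = (1..5).map { |id| { :id => id } }
checkpoint = {}
seen = []
fail_once = true

begin
  each_record(rows, checkpoint, limit: 2) do |row|
    if row[:id] == 4 && fail_once
      fail_once = false
      raise 'transient failure'
    end
    seen << row[:id]
  end
rescue RuntimeError
  # The checkpoint now holds 3, the last ID that was fully processed.
end

# A second run resumes after ID 3 instead of starting over.
each_record(rows, checkpoint, limit: 2) { |row| seen << row[:id] }
p seen # [1, 2, 3, 4, 5]
```

Since the ID only advances after the block returns, the record that raised is retried on the next run, so the block should tolerate being called twice for the same record.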

#find_options ⇒ Object

Public: Builds the ActiveRecord::Base.find options for a single query.

Returns a Hash.



# File 'lib/model_iterator.rb', line 215

def find_options
  opt = {:conditions => conditions, :limit => @limit, :order => "#{@id_clause} #{@order}"}
  if columns = @options[:select]
    opt[:select] = columns
  end
  opt[:joins] = @joins if @joins
  opt
end
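
To make the shape of the returned Hash concrete, here is a small standalone sketch. `build_find_options` is a hypothetical helper and the sample values are invented; only the keys and the way the order String is built come from the method above.

```ruby
# Hypothetical helper (not part of ModelIterator) that builds the same
# option keys as #find_options from explicit arguments.
def build_find_options(conditions, id_clause, order, limit, select: nil, joins: nil)
  opt = { :conditions => conditions, :limit => limit, :order => "#{id_clause} #{order}" }
  # :select and :joins are only present when they were configured.
  opt[:select] = select if select
  opt[:joins] = joins if joins
  opt
end

opts = build_find_options(['repositories.id > ?', 1000], 'repositories.id', :asc, 100)
p opts[:order]        # "repositories.id asc"
p opts.key?(:select)  # false
```

Ordering by the same qualified ID column used in the WHERE clause is what lets each query continue from the last ID of the previous one.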

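#key ⇒ Object

Gets the String Redis key under which the current ID is stored (the prefix followed by ":current").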



# File 'lib/model_iterator.rb', line 191

def key
  @key ||= "#{prefix}:current"
end

#records ⇒ Object

Public: Queries the database for the next page of records.

Returns an Array of ActiveRecord::Base instances if any results are returned, or nil.



# File 'lib/model_iterator.rb', line 207

def records
  arr = @klass.all(find_options)
  arr.empty? ? nil : arr
end