Class: Furnish::Scheduler

Inherits:
Object
Includes:
Logger::Mixins
Defined in:
lib/furnish/scheduler.rb

Overview

This is a scheduler for provisioners. It can run in parallel or serial mode and is dependency-based: it only schedules an item for execution once all of that item’s dependencies are satisfied. Items with unsatisfied dependencies wait until that happens.
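To make the dependency rule concrete, here is a minimal, serial-only sketch in plain Ruby. It is not Furnish's implementation: the `DEPENDENCIES` table and the `resolve_order` helper are hypothetical names invented for this illustration, and no provisioners are actually run.

```ruby
require 'set'

# Hypothetical dependency table: group name => names it depends on.
DEPENDENCIES = {
  "network"  => [],
  "database" => ["network"],
  "web"      => ["network", "database"],
}

# Return group names in an order where each group appears only after all of
# its dependencies have been solved. Raises if no group can make progress.
def resolve_order(dependencies)
  solved  = Set.new
  waiting = dependencies.keys
  order   = []

  until waiting.empty?
    # A group is ready once every one of its dependencies is solved.
    ready = waiting.select do |name|
      dependencies[name].all? { |dep| solved.include?(dep) }
    end
    raise "unsatisfiable dependencies: #{waiting.inspect}" if ready.empty?

    ready.each do |name|
      order << name   # a real scheduler would run the provisioners here
      solved << name
    end
    waiting -= ready
  end

  order
end

ORDER = resolve_order(DEPENDENCIES)
```

In parallel mode the groups that become ready in the same pass could be worked on concurrently; the sketch simply processes them one after another.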

Constructor Details

#initialize ⇒ Scheduler

Instantiate the Scheduler.



# File 'lib/furnish/scheduler.rb', line 48

def initialize

  raise "Cannot start; Furnish has not been initialized" unless Furnish.initialized?

  @force_deprovision  = false
  @solved_mutex       = Mutex.new
  @serial             = false
  @solver_thread      = nil
  @working_threads    = { }
  @queue              = Queue.new
  @vm                 = Furnish::VM.new
  @recovering         = false
  @signal_handler     = true
end

Instance Attribute Details

#force_deprovision ⇒ Object

Ignore exceptions while deprovisioning. Default is false.



# File 'lib/furnish/scheduler.rb', line 33

def force_deprovision
  @force_deprovision
end

#serial ⇒ Object

Turn serial mode on (off by default). This forces the scheduler to execute every provision in order, even if it could handle multiple provisions at the same time.



# File 'lib/furnish/scheduler.rb', line 27

def serial
  @serial
end

#signal_handler ⇒ Object

When true, calling #run or #recover also installs handlers for SIGINFO (Ctrl+T in a macOS terminal) and SIGUSR2, which can be used to get information on the status of what’s solved and what’s working.

Default is true.



# File 'lib/furnish/scheduler.rb', line 43

def signal_handler
  @signal_handler
end

#vm ⇒ Object (readonly)

Access the VM object.



# File 'lib/furnish/scheduler.rb', line 20

def vm
  @vm
end

Instance Method Details

#deprovision_group(group_name, clean_state = true) ⇒ Object

Deprovisions a group by replaying its provision strategy backwards, applying the #shutdown method instead of the #startup method. If the second argument is true (the default), the group is also removed from the various state tables.

While this is a part of the public API, you should probably use #teardown or #teardown_group instead of this method, as they have better error handling and semantics. This “just does it”.



# File 'lib/furnish/scheduler.rb', line 375

def deprovision_group(group_name, clean_state=true)
  shutdown(group_name)
  delete_group(group_name) if clean_state
end

#group(name) ⇒ Object Also known as: g

Get the Furnish::ProvisionerGroup by name as it currently exists in the scheduler. Useful for querying properties of a given provisioner after they’ve been set.



# File 'lib/furnish/scheduler.rb', line 108

def group(name)
  vm.groups[name]
end

#needs_recovery ⇒ Object

A map of group name to Furnish::ProvisionerGroup for groups that failed their #startup or #shutdown. See #recover for more information.



# File 'lib/furnish/scheduler.rb', line 99

def needs_recovery
  vm.need_recovery
end

#needs_recovery? ⇒ Boolean

Is recovery necessary? See #recover.

Returns:

  • (Boolean)


# File 'lib/furnish/scheduler.rb', line 91

def needs_recovery?
  needs_recovery.count > 0
end

#recover ⇒ Object

Initiate recovery. While running, #recovering? will be true.

Recovery will step through all the items in #needs_recovery and attempt to recover them according to Furnish::ProvisionerGroup#recover. If recovery succeeds, the items will be in the solved formula and effectively provisioned. They will also be removed from the needs_recovery information.

If recovery fails, #needs_recovery will not be touched (though the state at which the next recovery attempt starts may be different for those groups). The return value of this method is a hash keyed by group name; each value is either the exception raised during recovery, or false if recovery returned without succeeding. It is strongly recommended that you check #needs_recovery? or the return value after calling this to locate flapping groups.

Recovery is a serial process and blocks the main thread. It also installs a signal handler if #signal_handler is set. It does not interrupt or stop the scheduler, but note that in serial mode, the scheduler will likely already be stopped by the time you are able to call recovery. In threaded mode, this means any dependencies that are able to be provisioned after a successful recovery of a group will automatically start provisioning.



# File 'lib/furnish/scheduler.rb', line 241

def recover
  install_handler if signal_handler

  @recovering = true

  failures = { }

  needs_recovery.keys.each do |k|
    begin
      group = vm.groups[k]
      result = group.recover(force_deprovision)
      vm.groups[k] = group

      if result
        needs_recovery.delete(k)
        @queue << k
      else
        failures[k] = false
      end
    rescue => e
      failures[k] = e
    end
  end

  if @serial
    begin
      queue_loop
    rescue => e
      if_debug do
        puts "During recovery, serial mode, encountered: #{e}: #{e.message}"
      end
    end
  end

  @recovering = false

  return failures
end
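The shape of the value returned by #recover can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `StubGroup` stands in for a provisioner group, `recover_all` mirrors only the bookkeeping of the loop above, and the queueing and signal handling are left out.

```ruby
# Hypothetical stand-in for a provisioner group. #recover returns the stored
# outcome, or raises it when the outcome is an exception.
StubGroup = Struct.new(:outcome) do
  def recover
    raise outcome if outcome.is_a?(Exception)
    outcome
  end
end

# Attempt recovery of every pending group. Recovered groups are removed from
# the pending table; the rest are reported in the returned hash.
def recover_all(needs_recovery)
  failures = {}

  # Iterate over a copy of the keys so deleting entries is safe.
  needs_recovery.keys.each do |name|
    begin
      if needs_recovery[name].recover
        needs_recovery.delete(name)
      else
        failures[name] = false   # recovery ran but did not succeed
      end
    rescue => e
      failures[name] = e         # recovery raised
    end
  end

  failures
end

PENDING = {
  "db"    => StubGroup.new(true),
  "web"   => StubGroup.new(false),
  "cache" => StubGroup.new(RuntimeError.new("boom")),
}

FAILURES = recover_all(PENDING)
```

After the call, `FAILURES` holds `false` for "web" and the `RuntimeError` for "cache", while "db" has been removed from `PENDING`. A caller can use an empty return value as the signal that nothing is flapping.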

#recovering? ⇒ Boolean

Is recovery running? See #recover.

Returns:

  • (Boolean)


# File 'lib/furnish/scheduler.rb', line 84

def recovering?
  @recovering
end

#run ⇒ Object

Start the scheduler. In serial mode this call will block until the whole dependency graph is satisfied, or one of the provisions fails, at which point an exception will be raised. In parallel mode, this call completes immediately, and you should use #wait_for to control main thread flow, and #running? and #stop to control and monitor the threads this class manages.



# File 'lib/furnish/scheduler.rb', line 200

def run
  # short circuit if we're not serial and already running
  return if running?

  install_handler if signal_handler

  if @serial
    service_resolved_waiters
    queue_loop
  else
    @solver_thread = Thread.new do
      with_timeout(false) { service_resolved_waiters }
      queue_loop
    end
  end
end

#running? ⇒ Boolean

Ask the scheduler if it’s running. Returns true while the solver thread is alive and nil otherwise; in serial mode it always returns nil.

If there’s an exception waiting and the scheduler has stopped, it will be raised here.

Returns:

  • (Boolean)


# File 'lib/furnish/scheduler.rb', line 69

def running?
  return nil if @serial
  return nil unless @solver_thread
  if @solver_thread.alive?
    return true
  else
    # XXX if there's an exception to be raised, it'll happen here.
    @solver_thread.join
    return nil
  end
end
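The "raised here" behaviour relies on a standard Ruby property: calling `Thread#join` on a thread that died from an unhandled exception re-raises that exception in the caller. A minimal demonstration, independent of Furnish (the `worker` thread and its error message are made up for the example):

```ruby
# Silence the default "terminated with exception" report for this demo.
Thread.report_on_exception = false

# A thread that dies immediately with an unhandled exception.
worker = Thread.new { raise ArgumentError, "provision failed" }

# Wait until the thread is dead, as #running? would observe it.
sleep 0.01 while worker.alive?

# Joining the dead thread re-raises its exception in the calling thread.
RAISED = begin
  worker.join
  nil
rescue => e
  e
end
```

Here `RAISED` ends up holding the `ArgumentError` from the worker, which is why polling #running? doubles as a check for a crashed solver thread.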

#schedule_provision(group_name, provisioners, dependencies = []) ⇒ Object Also known as: s, sched

Schedule a group of VMs for provision. This takes a group name, which is a string, an array of provisioner objects, and a list of string dependencies. If anything in the dependencies list hasn’t been pre-declared, it raises and refuses to continue.

This method will return nil if a group with that name is already known to the scheduler, and true once the group has been scheduled.



# File 'lib/furnish/scheduler.rb', line 122

def schedule_provision(group_name, provisioners, dependencies=[])
  group = Furnish::ProvisionerGroup.new(provisioners, group_name, dependencies)
  schedule_provisioner_group(group)
end

#schedule_provisioner_group(group) ⇒ Object Also known as: <<

Schedule a provision with a Furnish::ProvisionerGroup. Works exactly like Furnish::Scheduler#schedule_provision otherwise.



# File 'lib/furnish/scheduler.rb', line 134

def schedule_provisioner_group(group)
  return nil if vm.groups[group.name]

  vm.groups[group.name] = group

  unless group.dependencies.all? { |x| vm.groups.has_key?(x) }
    raise "One of your dependencies for #{group.name} has not been pre-declared. Cannot continue"
  end

  vm.dependencies[group.name] = group.dependencies

  vm.sync_waiters do |waiters|
    waiters.add(group.name)
  end

  return true
end
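The three possible outcomes (true, nil, or an exception) can be seen in a toy registry. This is a simplified sketch, not the Furnish code: `ToyRegistry` is a hypothetical class, it stores only dependency lists, and unlike the method above it registers a group only after the dependency check passes.

```ruby
# Hypothetical, simplified registry that mimics the scheduling contract.
class ToyRegistry
  def initialize
    @groups = {}
  end

  # Returns true when scheduled, nil when the name is already known, and
  # raises when a dependency has not been declared beforehand.
  def schedule(name, dependencies = [])
    return nil if @groups.key?(name)

    unless dependencies.all? { |dep| @groups.key?(dep) }
      raise ArgumentError, "a dependency of #{name} has not been pre-declared"
    end

    @groups[name] = dependencies
    true
  end
end

registry = ToyRegistry.new

FIRST = registry.schedule("network")   # new group, no dependencies
AGAIN = registry.schedule("network")   # same name a second time

MISSING_ERROR = begin
  registry.schedule("web", ["database"])   # "database" not declared yet
  nil
rescue ArgumentError => e
  e
end

LATER = registry.schedule("database", ["network"])   # dependency is known
```

The practical consequence is that groups must be scheduled in dependency order: declare "network" before anything that lists it as a dependency.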

#stop ⇒ Object

Instructs the scheduler to stop. Note that this is not an interrupt, and the queue will still be exhausted before terminating.

It is a good idea to check #running? before calling this to ensure the scheduler did not halt with an exception.



# File 'lib/furnish/scheduler.rb', line 287

def stop
  if @serial
    @queue << nil
  else
    @working_threads.values.map { |v| v.join rescue nil }
    if @solver_thread and @solver_thread.alive?
      @queue << nil
      sleep 0.1 until @queue.empty?
      @solver_thread.kill
    end

    @solver_thread = nil
  end
end
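Pushing nil onto the queue is a common way to ask a consumer to finish after draining earlier work. The loop that reads the queue is not shown on this page, so the following is only a sketch of the general pattern, assuming the consumer treats nil as an end marker; the `consumer` thread and the item names are invented for the example.

```ruby
require 'thread'

queue     = Queue.new
processed = []

# Consumer: handle items in order until the nil end marker arrives.
# Queue#pop blocks while the queue is empty.
consumer = Thread.new do
  while (item = queue.pop)
    processed << item
  end
end

queue << "network" << "database"   # work queued before the stop request
queue << nil                       # stop request; earlier items still run

consumer.join
PROCESSED = processed
```

Because the marker sits behind the queued items, both of them are processed before the consumer exits, which matches the "not an interrupt" wording above.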

#teardown(exceptions = []) ⇒ Object

Instruct all provisioners except ones in the exception list to tear down. Calls #stop as its first action.

This is always done serially. For sanity.

If #force_deprovision is true, failed shutdowns from provisioners will not halt the deprovisioning process.



# File 'lib/furnish/scheduler.rb', line 357

def teardown(exceptions=[])
  stop

  (vm.groups.keys.to_set - exceptions.to_set).each do |group_name|
    deprovision_group(group_name) # clean this after everything finishes
  end
end

#teardown_group(group_name, wait = true) ⇒ Object Also known as: down, d

Tear down a single group. This modifies the solved formula, so be careful to resupply the group if other groups depend on it: nothing that depends on it will resolve until you do.

This takes an optional argument to wait for the group to be solved before attempting to tear it down. Setting this to false effectively says, “I know what I’m doing”, and you should feel bad if you file an issue because you supplied it.

If #force_deprovision is true, failed shutdowns from provisioners will not halt the deprovisioning process.



# File 'lib/furnish/scheduler.rb', line 315

def teardown_group(group_name, wait=true)
  begin
    wait_for(group_name) if wait
  rescue => e
    raise e unless force_deprovision
  end

  dependent_items = vm.dependencies.partition { |k,v| v.include?(group_name) }.first.map(&:first)

  if_debug do
    if dependent_items.length > 0
      puts "Trying to terminate #{group_name}, found #{dependent_items.inspect} depending on it"
    end
  end

  @solved_mutex.synchronize do
    dependent_and_working = @working_threads.keys & dependent_items

    if dependent_and_working.count > 0
      if_debug do
        puts "#{dependent_and_working.inspect} are depending on #{group_name}, which you are trying to deprovision."
        puts "We can't resolve this problem for you, and future converges may fail during this run that would otherwise work."
        puts "Consider using wait_for to better control the dependencies, or turning serial provisioning on."
      end
    end

    deprovision_group(group_name)
  end
end
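The `partition ... first.map(&:first)` expression that computes `dependent_items` is dense, so here it is in isolation on a plain Hash. The `DEPS` table and both helper names are hypothetical, and the real `vm.dependencies` object may not be a plain Hash, so treat this as a sketch of the logic only.

```ruby
# Hypothetical dependency table: group name => names it depends on.
DEPS = {
  "network"  => [],
  "database" => ["network"],
  "web"      => ["network", "database"],
  "cache"    => [],
}

# Same shape as the expression in teardown_group: split the pairs into
# [matching, rest], keep the matching pairs, then take each pair's key.
def dependents_of(dependencies, group_name)
  dependencies.partition { |_name, deps| deps.include?(group_name) }.first.map(&:first)
end

# An equivalent formulation for a plain Hash, shown for comparison.
def dependents_of_select(dependencies, group_name)
  dependencies.select { |_name, deps| deps.include?(group_name) }.keys
end

ON_NETWORK  = dependents_of(DEPS, "network")
ON_DATABASE = dependents_of(DEPS, "database")
ON_CACHE    = dependents_of(DEPS, "cache")
```

With this table, "database" and "web" depend on "network", only "web" depends on "database", and nothing depends on "cache". These are the groups that would stop resolving after the named group is torn down.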

#wait_for(*dependencies) ⇒ Object Also known as: w

Sleep until every dependency in this list is resolved. In parallel mode, this will raise if an exception occurred while waiting for these groups, if any of the groups entered recovery state, or if the scheduler is not currently running. In serial mode, wait_for just returns nil.



# File 'lib/furnish/scheduler.rb', line 160

def wait_for(*dependencies)
  return nil if @serial
  return nil if dependencies.empty?

  unless running?
    raise "The scheduler doesn't appear to be running or started. Can't wait_for anything!"
  end

  dep_set = Set[*dependencies]

  until dep_set & vm.solved == dep_set
    sleep 0.1
    @solver_thread.join unless @solver_thread.alive?

    dependencies_in_recovery = needs_recovery.keys.to_set & dep_set

    if needs_recovery? and !dependencies_in_recovery.empty?
      # we really can't get them all, but we can at least raise the first one.
      group_name = dependencies_in_recovery.first

      group_exception = needs_recovery[group_name]
      if group_exception
        raise group_exception
      else
        raise "group #{group_name} is in recovery during wait_for"
      end
    end
  end
end
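The loop condition `dep_set & vm.solved == dep_set` is a subset test: Ruby evaluates `&` before `==`, so it asks whether intersecting the requested names with the solved set gives back all of the requested names. A small stand-alone check, assuming the solved collection behaves like a `Set` (the `all_solved?` helper and `SOLVED` data are hypothetical):

```ruby
require 'set'

# True when every requested dependency is present in the solved set.
def all_solved?(dependencies, solved)
  dep_set = Set[*dependencies]
  # Parsed as (dep_set & solved) == dep_set.
  dep_set & solved == dep_set
end

SOLVED = Set["network", "database"]

ONE_READY   = all_solved?(["network"], SOLVED)
NOT_READY   = all_solved?(["network", "web"], SOLVED)
EMPTY_READY = all_solved?([], SOLVED)

# For two Sets this matches Set#subset?.
SAME_AS_SUBSET = Set["network"].subset?(SOLVED) == ONE_READY
```

An empty dependency list is trivially satisfied, which is consistent with wait_for returning immediately when called with no arguments.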