Class: SpreadsheetAgent::Runner
Defined in: lib/spreadsheet_agent/runner.rb
Overview
SpreadsheetAgent::Runner is a class designed to automate the traversal of all, or a defined subset, of the pages, entries, and goals in a SpreadsheetAgent-compatible Google Spreadsheet, and to run agents or other processes on them. By placing a SpreadsheetAgent::Runner script into a scheduling system (cron, etc.) on one or more compute nodes, the desired pages, entries, and goals can be processed efficiently over time, and new pages, entries, or goals are picked up automatically as they are introduced. Runners can be designed to automate the submission of agent scripts, check the status of jobs, aggregate information about job status, or automate cleanup tasks.
Instance Attribute Summary
- #agent_bin ⇒ Object: String Path, Optional.
- #debug ⇒ Object: Boolean, Optional (default false).
- #dry_run ⇒ Object: Boolean, Optional (default false).
- #only_pages ⇒ Object (readonly): Readonly access to the array of pages to be processed.
- #query_fields ⇒ Object (readonly): Readonly access to the Hash of key_fields, as defined in :config.
- #run_in_serial ⇒ Object: Boolean, Optional (default false).
- #sleep_between ⇒ Object: Integer, Optional (default 5).
Attributes inherited from Db
#config, #config_file, #db, #session
Instance Method Summary
- #initialize(attributes = { }) ⇒ Runner (constructor): Create a new SpreadsheetAgent::Runner.
- #only_pages_if(&include_code) ⇒ Object: Provide a PROC designed to intelligently determine pages to process.
- #process!(&runner_code) ⇒ Object: Processes configured pages, entries, and goals with a PROC.
- #skip_entry(&skip_code) ⇒ Object: Provide a PROC designed to intelligently determine entries on any page to skip.
- #skip_goal(&skip_code) ⇒ Object: Provide a PROC designed to skip a specific goal in any entry on all pages processed.
- #skip_pages_if(&skip_code) ⇒ Object: Provide a PROC designed to intelligently filter out pages that are not to be processed.
Methods inherited from Db
Constructor Details
#initialize(attributes = { }) ⇒ Runner
Create a new SpreadsheetAgent::Runner. Can be created with any of the following optional attributes:
- :skip_pages - raises SpreadsheetAgentError if passed along with :only_pages
- :only_pages - raises SpreadsheetAgentError if passed along with :skip_pages
- :dry_run
- :run_in_serial
- :debug
- :config_file (see SpreadsheetAgent::Db)
- :sleep_between
- :agent_bin
# File 'lib/spreadsheet_agent/runner.rb', line 67
def initialize(attributes = { })
  if (!attributes[:skip_pages].nil? && !attributes[:only_pages].nil?)
    raise SpreadsheetAgentError, "You cannot construct a runner with both only_pages and skip_pages"
  end
  @dry_run = attributes[:dry_run]
  @run_in_serial = attributes[:run_in_serial]
  @debug = attributes[:debug]
  @config_file = attributes[:config_file]
  @sleep_between = 5
  unless attributes[:sleep_between].nil?
    @sleep_between = attributes[:sleep_between]
  end
  @agent_bin = find_bin() + '../agent_bin'
  unless attributes[:agent_bin].nil?
    @agent_bin = attributes[:agent_bin]
  end
  if attributes[:skip_pages]
    @skip_pages = attributes[:skip_pages].clone
  end
  if attributes[:only_pages]
    @only_pages = attributes[:only_pages].clone
  end
  build_db()
  @query_fields = build_query_fields()
  if @skip_pages
    skip_pages_if do |page|
      @skip_pages.include? page
    end
  end
  if @dry_run
    @debug = true
  end
end
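The option handling in the constructor can be modeled without a spreadsheet connection. This is a minimal sketch, not the library's code: `RunnerOptions` is a hypothetical class, the `SpreadsheetAgentError` defined here is a local stand-in, and the Google Drive steps (`build_db`, `build_query_fields`, `find_bin`) are omitted because they need a live session.

```ruby
# Hypothetical stand-in for the library's error class.
class SpreadsheetAgentError < StandardError; end

# Hypothetical model of how Runner#initialize resolves its options;
# the Google Drive connection (build_db, build_query_fields) is omitted.
class RunnerOptions
  attr_reader :dry_run, :run_in_serial, :debug, :sleep_between,
              :skip_pages, :only_pages

  def initialize(attributes = {})
    # :skip_pages and :only_pages are mutually exclusive.
    if !attributes[:skip_pages].nil? && !attributes[:only_pages].nil?
      raise SpreadsheetAgentError,
            'You cannot construct a runner with both only_pages and skip_pages'
    end

    @dry_run       = attributes[:dry_run]
    @run_in_serial = attributes[:run_in_serial]
    @debug         = attributes[:debug]
    # Fall back to 5 seconds when :sleep_between is absent or nil.
    @sleep_between = attributes[:sleep_between].nil? ? 5 : attributes[:sleep_between]
    # Clone so later changes to the caller's arrays do not leak in.
    @skip_pages = attributes[:skip_pages]&.clone
    @only_pages = attributes[:only_pages]&.clone
    # A dry run always turns on debug output.
    @debug = true if @dry_run
  end
end
```

With this model, `RunnerOptions.new(dry_run: true).debug` is `true`, and passing both :skip_pages and :only_pages raises before anything else is set up.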
Instance Attribute Details
#agent_bin ⇒ Object
String Path, Optional. The path to the directory containing agent executable programs that the default process PROC executes. The default is the ../agent_bin directory relative to the directory containing the calling script, $0.
# File 'lib/spreadsheet_agent/runner.rb', line 41
def agent_bin
  @agent_bin
end
#debug ⇒ Object
Boolean, Optional (default false). If true, information about pages, entries, and goals that are checked and filtered is printed to STDERR.
# File 'lib/spreadsheet_agent/runner.rb', line 32
def debug
  @debug
end
#dry_run ⇒ Object
Boolean, Optional (default false). If true, the runner generates the commands it would run for all runnable entry-goals and prints them to STDERR, but does not actually run them. Automatically sets debug to true. If a PROC is passed to process!, dry_run still applies: the page title and an inspection of each runnable entry are printed to STDERR instead of calling the PROC.
# File 'lib/spreadsheet_agent/runner.rb', line 22
def dry_run
  @dry_run
end
#only_pages ⇒ Object (readonly)
Readonly access to the array of pages to be processed. only_pages is defined only when :only_pages or :skip_pages is passed to the constructor, or when skip_pages_if or only_pages_if is called.
# File 'lib/spreadsheet_agent/runner.rb', line 52
def only_pages
  @only_pages
end
#query_fields ⇒ Object (readonly)
Readonly access to the Hash of key_fields, as defined in :config. The runner uses this to construct the command line for each agent run on each entry of a page, passing the value of the GoogleDrive::List entry for each given ‘key’ as an argument, in the order specified by the ‘rank’ field for each key in the config.
# File 'lib/spreadsheet_agent/runner.rb', line 47
def query_fields
  @query_fields
end
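As a rough illustration of how query_fields could drive an agent command line, the sketch below orders keys by rank and pulls each value from an entry. Everything here is an assumption: `ordered_args` is a hypothetical helper, the key_fields shape `{ field => { 'rank' => n } }` is guessed from the description above, and a plain Hash of strings stands in for a GoogleDrive::List entry.

```ruby
# Hypothetical: return the entry's values for each query field,
# ordered by the 'rank' recorded for that field in the config.
def ordered_args(query_fields, entry)
  query_fields
    .sort_by { |_field, spec| spec['rank'] }
    .map { |field, _spec| entry[field] }
end
```

Fields that are not in query_fields (such as a notes column) never reach the command line under this model.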
#run_in_serial ⇒ Object
Boolean, Optional (default false). If true, the default process! PROC runs each agent script in the foreground rather than in the background, so the agents run serially. If false, all agent scripts are run in parallel, in the background. This is ignored when a PROC is passed to process!.
# File 'lib/spreadsheet_agent/runner.rb', line 28
def run_in_serial
  @run_in_serial
end
#sleep_between ⇒ Object
Integer, Optional (default 5). The number of seconds that the runner sleeps after processing each runnable entry, i.e. after each call to the default or supplied process! PROC.
# File 'lib/spreadsheet_agent/runner.rb', line 36
def sleep_between
  @sleep_between
end
Instance Method Details
#only_pages_if(&include_code) ⇒ Object
Provide a PROC designed to intelligently determine pages to process. If not called, all pages not excluded by the :skip_pages or :only_pages constructor params, or by a previous call to skip_pages_if, will be processed. This overrides only_pages or skip_pages passed as arguments to the constructor, and any previous call to skip_pages_if or only_pages_if. The PROC should take the title of a page as a string, and return true if it decides to include the page, false otherwise. Must be called before the process! method to affect the pages it processes. Returns the runner self to facilitate chained processing with skip_goal, skip_entry, and/or process! if desired.
include only pages whose title begins with 'foo'
runner.only_pages_if {|title| title.match(/^foo/)}.process!
Same, but without calling process so that skip_entry or skip_goal can be called on the runner
runner.only_pages_if do |title|
title.match(/^foo/)
end
# ... can call skip_entry, skip_goal
runner.process!
# File 'lib/spreadsheet_agent/runner.rb', line 152
def only_pages_if(&include_code)
  @only_pages = @db.worksheets.collect{ |p| p.title }.select { |ptitle| include_code.call(ptitle) }
  self
end
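The selection logic in only_pages_if can be exercised on its own by swapping the worksheet list for an array of titles. The sketch below is hypothetical: `PageFilter` is not part of the library, and its `@titles` array stands in for `@db.worksheets.collect { |p| p.title }`.

```ruby
# Hypothetical model of the page-selection state kept by a runner.
class PageFilter
  attr_reader :only_pages

  def initialize(titles)
    @titles = titles # stands in for @db.worksheets.collect { |p| p.title }
  end

  # Keep titles for which the block returns a truthy value;
  # return self so further calls can be chained.
  def only_pages_if(&include_code)
    @only_pages = @titles.select { |title| include_code.call(title) }
    self
  end
end
```

Because `select` only needs a truthy result, a block such as `title.match(/^foo/)`, which returns MatchData or nil, works the same as one returning true or false.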
#process!(&runner_code) ⇒ Object
Processes configured pages, entries, and goals with a PROC. The default PROC takes the entry, iterates over each goal not skipped by skip_goal, and:
- determines whether an executable #{ @agent_bin }/#{ goal }_agent.rb script exists;
- if so, executes the goal agent script with command-line arguments constructed from the entry's values for each field in the query_fields defined in config.
If run_in_serial is false, the default PROC runs each agent in the background, in parallel; otherwise, it runs each one serially in the foreground. If dry_run is true, the command is printed to STDERR but not run.
A PROC supplied to override the default should take a GoogleDrive::List and a GoogleDrive::Worksheet as arguments. This allows it to query the entry for information through its hash access and/or update the entry on the spreadsheet. For changes to the GoogleDrive::List to take effect, the PROC must save the GoogleDrive::Worksheet. The runner sleeps @sleep_between seconds after each call to the PROC (default or otherwise). If dry_run is true when a PROC is supplied, the page title and an inspection of the runnable entry hash are printed to STDERR, but the PROC is not called.
# call each goal agent script in agent_bin on each entry in each page
runner = SpreadsheetAgent::Runner.new
runner.process!
# find entries with a threshold > 5 and update the 'threshold_exceeded' field
# (spreadsheet values are strings, so convert before comparing)
runner.skip_entry { |entry| entry['threshold'].to_f <= 5 }.process! do |entry, page|
  entry['threshold_exceeded'] = '1'
  page.save
end
# only process entries on the 'main' page where the threshold has not been exceeded
# (only_pages is readonly, so select the page with only_pages_if)
runner.only_pages_if { |title| title == 'main' }
runner.skip_entry { |entry| entry['threshold_exceeded'] == '1' }.process!
# File 'lib/spreadsheet_agent/runner.rb', line 227
def process!(&runner_code)
  get_runnable_entries().each do |entry_info|
    entry_page, runnable_entry = entry_info
    if runner_code.nil?
      default_process(runnable_entry)
    elsif @dry_run
      $stderr.print "Would run #{ entry_page.title } #{ runnable_entry.inspect }"
    else
      runner_code.call(runnable_entry, entry_page)
    end
    sleep @sleep_between
  end
end
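The three-way dispatch in process! (default PROC, dry-run report, or custom PROC, followed by a sleep) can be tested in isolation. In this hypothetical sketch, `dispatch` is not a library method, runnable entries are assumed to arrive as `[page_title, entry]` pairs, and a page title string replaces the GoogleDrive::Worksheet passed by the real method.

```ruby
# Hypothetical model of the dispatch in process!.
# runnable_entries: array of [page_title, entry] pairs (assumed shape)
# default_process:  callable used when no block is given
def dispatch(runnable_entries, default_process:, dry_run: false,
             sleep_between: 5, &runner_code)
  runnable_entries.each do |page_title, entry|
    if runner_code.nil?
      # The default PROC handles dry_run itself.
      default_process.call(entry)
    elsif dry_run
      # With a custom block, a dry run only reports what it would do.
      $stderr.puts "Would run #{page_title} #{entry.inspect}"
    else
      runner_code.call(entry, page_title)
    end
    sleep sleep_between
  end
end
```

Note that the sleep runs once per entry regardless of which branch was taken, which matches the listing above.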
#skip_entry(&skip_code) ⇒ Object
Provide a PROC designed to intelligently determine entries on any page to skip. If not called, all entries on processed pages will be processed. The PROC should take a GoogleDrive::List representing the record in the spreadsheet, which can be accessed as a Hash with each field as a key and that field's value as the value. It should return true if it decides to skip processing the entry, false otherwise. Must be called before the process! method to affect the entries on each page that it processes. Returns the runner self to facilitate chained processing with skip_pages_if, only_pages_if, skip_goal, and/or process! if desired.
skip entries which have run foo or bar
runner.skip_entry { |entry| entry['foo'] == '1' || entry['bar'] == '1' }.process!
skip entries that a human reading the spreadsheet has annotated with less than 3.5 in the 'threshold' field
runner.skip_entry do |entry|
  entry['threshold'].to_f < 3.5
end
# ... can call skip_pages_if, only_pages_if, skip_goal
runner.process!
# File 'lib/spreadsheet_agent/runner.rb', line 175
def skip_entry(&skip_code)
  @skip_entry_code = skip_code
  self
end
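A skip_entry block is easiest to check against plain data. This sketch assumes, as is typical for spreadsheet cell values, that each field comes back as a String; plain Hashes with a hypothetical 'id' field stand in for GoogleDrive::List rows. Under that assumption a comparison like `entry['threshold'] < 3.5` raises ArgumentError in Ruby, so the value is converted with `to_f` first.

```ruby
# Hypothetical: plain Hashes of strings stand in for GoogleDrive::List rows.
entries = [
  { 'id' => 'a', 'threshold' => '2.0' },
  { 'id' => 'b', 'threshold' => '4.5' }
]

# Same shape as a skip_entry block: true means "skip this entry".
skip_code = proc { |entry| entry['threshold'].to_f < 3.5 }

# The entries a runner would still process.
kept = entries.reject { |entry| skip_code.call(entry) }
```

Only the entry with a threshold of 4.5 remains in `kept`.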
#skip_goal(&skip_code) ⇒ Object
Provide a PROC designed to skip a specific goal in any entry on all pages processed. If not called, all goals of each entry and page processed by the runner will be processed.
[note!] Ignored when a PROC is passed to the process! method; i.e., it is only used when process! executes agent scripts for the goal.
The PROC should take a string, which will be one of the header fields in the spreadsheet. It should return true if that goal is to be skipped, false otherwise. Returns the runner self to facilitate chained processing with skip_pages_if, only_pages_if, skip_entry, and/or process! if desired.
skip the 'post_process' goal on each entry of each page processed
runner.skip_goal{|goal| goal == 'post_process' }.process!
This is best used in conjunction with skip_entry to skip goals for particular entries:
runner.skip_entry { |entry| entry['threshold'].to_f < 2.5 }.skip_goal { |goal| goal == 'post_process' }.process!
# File 'lib/spreadsheet_agent/runner.rb', line 195
def skip_goal(&skip_code)
  @skip_goal_code = skip_code
  self
end
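To see how skip_entry and skip_goal combine, the sketch below lists the entry/goal pairs the default PROC would consider. It is a hypothetical model: `planned_pairs` is not a library method, the entries are plain Hashes with an assumed 'id' field, and the goal names are illustrative.

```ruby
# Hypothetical: which entry/goal pairs the default PROC would consider,
# given optional skip_entry and skip_goal callables.
def planned_pairs(entries, goals, skip_entry: nil, skip_goal: nil)
  kept_entries = entries.reject { |e| skip_entry&.call(e) }
  kept_goals   = goals.reject { |g| skip_goal&.call(g) }
  kept_entries.flat_map { |entry| kept_goals.map { |goal| [entry['id'], goal] } }
end
```

In this model skip_entry removes whole rows, while skip_goal removes the same goal from every remaining row.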
#skip_pages_if(&skip_code) ⇒ Object
Provide a PROC designed to intelligently filter out pages that are not to be processed. If not called, all pages not excluded by the :only_pages or :skip_pages constructor parameters, or by a previous call to only_pages_if, will be processed. This overrides only_pages or skip_pages passed as arguments to the constructor, and any previous call to skip_pages_if or only_pages_if. The PROC should take the title of a page as a string, and return true if it decides to skip the page, false otherwise. Must be called before the process! method to affect the pages it processes. Returns the runner self to facilitate chained processing with skip_goal, skip_entry, and/or process! if desired.
skip pages whose title contains 'skip'
runner.skip_pages_if {|title| title.match(/skip/) }.process!
Same, but without calling process so that skip_entry or skip_goal can be called on the runner
runner.skip_pages_if do |title|
title.match(/skip/)
end
# ... can call skip_entry, skip_goal, etc
runner.process!
# File 'lib/spreadsheet_agent/runner.rb', line 128
def skip_pages_if(&skip_code)
  @only_pages = @db.worksheets.collect{ |p| p.title }.reject{ |ptitle| skip_code.call(ptitle) }
  self
end
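Because skip_pages_if and only_pages_if both assign @only_pages from the full list of worksheet titles, the last call wins rather than the filters stacking. The hypothetical `PageSelection` class below models that behavior, with a title array standing in for the worksheets in `@db`.

```ruby
# Hypothetical model: both filters assign the same @only_pages from the
# full title list, so whichever method is called last decides the pages.
class PageSelection
  attr_reader :only_pages

  def initialize(titles)
    @titles = titles # stands in for the worksheet titles from @db
  end

  def only_pages_if(&include_code)
    @only_pages = @titles.select { |t| include_code.call(t) }
    self
  end

  def skip_pages_if(&skip_code)
    @only_pages = @titles.reject { |t| skip_code.call(t) }
    self
  end
end
```

Chaining `only_pages_if { |t| t.start_with?('foo') }.skip_pages_if { |t| t.match(/skip/) }` therefore processes every page except the one matching /skip/, not just the 'foo' pages.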