Jongleur

Jongleur is a process scheduler and manager. It allows its users to declare a number of executable tasks as Ruby classes, define precedence between those tasks and run each task as a separate process.

Jongleur is particularly useful for implementing workflows modeled as a DAG (Directed Acyclic Graph), but can also be used to run multiple tasks in parallel, or even sequential workflows where each task needs to run as a separate OS process.

Environment

This gem has been built using the POSIX/UNIX process model. It will work on Linux and Mac OS but not on Windows.

Jongleur has been tested with MRI Ruby 2.4.3, 2.4.4, 2.5.0 and 2.5.1. I would expect it to work with other Ruby implementations too, such as JRuby or Rubinius, though it hasn't yet been tested on those.

Installation

Add this line to your application's Gemfile:

gem 'jongleur'

And then execute:

$ bundle

Or install it yourself as:

$ gem install jongleur

What does it do?

In a nutshell, Jongleur keeps track of a number of tasks and executes them as separate OS processes according to their precedence criteria. For instance, if there are 3 tasks A, B and C, and task C depends on A and B, Jongleur will start executing A and B in separate processes (i.e. in parallel) and will wait until they are both finished before it executes C in a separate process.
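To make the A, B, C example concrete, here is a minimal plain-Ruby sketch of the same idea using `fork` and `Process.wait`. It does not call Jongleur and is not how the gem is implemented; `run_in_child` and `run_a_b_then_c` are hypothetical helpers used only for illustration.

```ruby
# Plain-Ruby illustration only; none of these helpers belong to Jongleur.

# Start a child process that does some stand-in work and return its pid.
def run_in_child
  fork do
    sleep 0.1 # stand-in for the task's real work
    exit!(0)  # leave the child immediately with a success code
  end
end

# Run A and B as parallel processes, wait for both, then run C.
def run_a_b_then_c
  order = []

  # A and B are both started before we wait on either of them.
  first_wave = { A: run_in_child, B: run_in_child }
  first_wave.each do |name, pid|
    Process.wait(pid)
    order << name
  end

  # C starts only once A and B have both exited.
  Process.wait(run_in_child)
  order << :C
  order
end
```

Calling `run_a_b_then_c` returns the task names in the order they were waited on, with `:C` always last.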

Jongleur is ideal for running workflows represented as DAGs, but is also useful for simply running tasks in parallel or for whenever you need some multi-processing capability.

Concepts

Task Graph

To run Jongleur, you will need to define the tasks to run and their precedence. A Task Graph is a representation of the tasks to be run by Jongleur and it usually (but not exclusively) represents a DAG, as in the examples below:

DAG examples

A Task Graph is defined as a Hash in the following format:

{task-name => list[names-of-dependent-tasks]}

So the first graph would be defined as:

my_graph = {
  s: [:q, :r, :t],
  q: [:r],
  r: [],
  t: []
}

where the Hash key is the class name of a Task and the Hash value is an Array of other Tasks that can be run only after this Task has finished. So in the above example:

Tasks Q, R and T can only start after task S has finished.
Task R can only start after Q has finished.
Tasks R and T have no dependents. No other task needs to wait for them.
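Because the Hash maps each task to its dependents, it can help to look at the graph the other way round, that is, which tasks must finish before a given task may start. The sketch below inverts the example graph in plain Ruby; `prerequisites` is a hypothetical helper, not part of the Jongleur API.

```ruby
EXAMPLE_GRAPH = {
  s: [:q, :r, :t],
  q: [:r],
  r: [],
  t: []
}.freeze

# Hypothetical helper: turn "task => dependents" into "task => prerequisites".
def prerequisites(graph)
  # Start every task with an empty list so root tasks appear too.
  result = graph.each_with_object({}) { |(task, _), acc| acc[task] = [] }
  graph.each do |task, dependents|
    dependents.each { |dependent| (result[dependent] ||= []) << task }
  end
  result
end

prerequisites(EXAMPLE_GRAPH)
# s has no prerequisites, q and t wait for s, r waits for both s and q
```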

N.B: Since the Task Graph is a Hash, any duplicate key entries will be overridden. For instance, if this Task Graph

my_task_graph = { A: [:B, :C], B: [:D] }

is re-defined as

my_task_graph = { A: [:B], A: [:C], B: [:D] }

The 2nd assignment of A will override the first one so your graph will be:

{:A=>[:C], :B=>[:D]}

Always assign all dependent tasks together in a single list.
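If your dependencies arrive as separate pairs (for example from a config file), one way to avoid the duplicate-key problem is to accumulate them into a single list per task before handing the Hash to Jongleur. This is a sketch only; `graph_from_edges` is a hypothetical helper, and giving leaf tasks an empty list simply follows the style of the examples in this README.

```ruby
# Hypothetical helper: build a Task Graph from [task, dependent] pairs so that
# repeated tasks accumulate dependents instead of overriding each other.
def graph_from_edges(edges)
  edges.each_with_object({}) do |(task, dependent), graph|
    graph[task] ||= []
    graph[dependent] ||= [] # leaf tasks get an empty dependents list
    graph[task] << dependent unless graph[task].include?(dependent)
  end
end

graph_from_edges([[:A, :B], [:A, :C], [:B, :D]])
# A keeps both :B and :C as dependents; C and D end up with empty lists
```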

Task Matrix

It's a tabular real-time representation of the state of task execution. It can be invoked at any time with

Jongleur::API.task_matrix

After defining your Task Graph and before running Jongleur, your Task Matrix should look like this:

#<Jongleur::Task name=:A, pid=-1, running=false, exit_status=nil, success_status=nil>,
#<Jongleur::Task name=:B, pid=-1, running=false, exit_status=nil, success_status=nil>,
#<Jongleur::Task name=:C, pid=-1, running=false, exit_status=nil, success_status=nil>,
#<Jongleur::Task name=:D, pid=-1, running=false, exit_status=nil, success_status=nil>,
#<Jongleur::Task name=:E, pid=-1, running=false, exit_status=nil, success_status=nil>

After Jongleur finishes, your Task Matrix will look something like this:

#<Jongleur::Task name=:A, pid=95117, running=false, exit_status=0, success_status=true>
#<Jongleur::Task name=:B, pid=95118, running=false, exit_status=0, success_status=true>
#<Jongleur::Task name=:C, pid=95120, running=false, exit_status=0, success_status=true>
#<Jongleur::Task name=:D, pid=95122, running=false, exit_status=0, success_status=true>
#<Jongleur::Task name=:E, pid=95123, running=false, exit_status=0, success_status=true>

The Jongleur::Task attribute values are as follows

name : the Task name
pid : the Task process id (-1 if the task hasn't yet run)
running : true if task is currently running
exit_status : usually 0 if process finished without errors, <>0 or nil otherwise
success_status : true if process finished successfully, false if it didn't or nil if process didn't exit at all

WorkerTask

This is the implementation template for a Task. For each Task in your Task Graph you must provide a class that derives from WorkerTask and implements the execute method. This method is what will be called by Jongleur when the Task is ready to run.

Usage

Using Jongleur is easy:

(Optional) You may want to head your code with include Jongleur so that you won't have to namespace every API call.

Define your Task Graph

test_graph = {
  A: [:B, :C],
  B: [:D],
  C: [:D],
  D: [:E],
  E: []
}

Add your Task Graph to Jongleur

API.add_task_graph test_graph

=> [#<struct Jongleur::Task name=:A, pid=-1, running=false, exit_status=nil, success_status=nil>,
    #<struct Jongleur::Task name=:B, pid=-1, running=false, exit_status=nil, success_status=nil>,
    #<struct Jongleur::Task name=:C, pid=-1, running=false, exit_status=nil, success_status=nil>,
    #<struct Jongleur::Task name=:D, pid=-1, running=false, exit_status=nil, success_status=nil>,
    #<struct Jongleur::Task name=:E, pid=-1, running=false, exit_status=nil, success_status=nil>]

Jongleur will show you the Task Matrix for your Task Graph with all attributes set to their initial values, since the Tasks haven't run yet.

(Optional) You may want to see a graphical representation of your Task Graph
```
API.print_graph('/tmp')

=> "/tmp/jongleur_graph_08252018_194828.pdf"
```
Opening the PDF file will display a diagram of your Task Graph.

Implement your tasks. To do that you have to (i) create a new class based on WorkerTask and (ii) define an #execute method in your class. This is the method that Jongleur will call to run the Task. For instance, Task A from your Task Graph may look something like this:
```
class A < Jongleur::WorkerTask
  @desc = 'this is task A'

  def execute
    sleep 1
    'A is running... '
  end
end
```
You'll have to do the same for Tasks B, C, D and E, as these are the tasks declared in the Task Graph.

Run the tasks. Ok, pay attention now because this is the complex bit. Nah, only joking - it's simply:

   $> API.run

   => Starting workflow...
   => starting task A
   => finished task: A, process: 2501, exit_status: 0, success: true
   => starting task B
   => starting task C
   => finished task: C, process: 2503, exit_status: 0, success: true
   => finished task: B, process: 2502, exit_status: 0, success: true
   => starting task D
   => finished task: D, process: 2505, exit_status: 0, success: true
   => starting task E
   => finished task: E, process: 2506, exit_status: 0, success: true
   => Workflow finished

A simple example of a client app for Jongleur can be found on GitLab

Use-Cases

Extract-Transform-Load

The ETL workflow is ideally suited to Jongleur. You can define many Extraction tasks (perhaps separate Tasks for different data sources) and have them run in parallel with each other. Meanwhile, the Transformation and Loading Tasks each wait for the previous task to finish before they start, as in this DAG illustration:

ETL DAG
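As a rough illustration, an ETL workflow of this shape could be written as a Task Graph like the one below. The task names are hypothetical, and each would need its own WorkerTask subclass; `initial_tasks` is also a hypothetical helper, included only to show which tasks can start straight away.

```ruby
# Hypothetical ETL Task Graph: two extractions feed one transformation,
# which in turn feeds the load step.
ETL_GRAPH = {
  ExtractOrders:    [:Transform],
  ExtractCustomers: [:Transform],
  Transform:        [:Load],
  Load:             []
}.freeze

# Hypothetical helper: tasks that are nobody's dependent can start immediately.
def initial_tasks(graph)
  graph.keys - graph.values.flatten
end

initial_tasks(ETL_GRAPH)
# => [:ExtractOrders, :ExtractCustomers]
```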

Transactions

Transactional workflows can be greatly sped up with Jongleur by parallelising parts of the transaction that are usually performed sequentially, for example:

Transaction DAG

Development

After checking out the repo, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

F.A.Q

Does Jongleur allow me to pass messages between Tasks?

No, it doesn't. Each Task is run completely independently of the other Tasks. There is no Inter-Process Communication, no common data context and no shared memory.

This is something that I would like to build into Jongleur. For now, you can save a Task's data in a database or KV store, using the Task's process id as part of the key. Subsequent Tasks can retrieve their predecessors' process ids with

API.get_predecessor_pids

and therefore retrieve the data created by those Tasks.
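A minimal sketch of this workaround, using JSON files in a temporary directory as a stand-in for a database or KV store. The helpers `result_path`, `save_result` and `load_results` are hypothetical and not part of Jongleur, and the sketch assumes that API.get_predecessor_pids returns an Array of process ids.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helpers (not part of Jongleur): one JSON file per process id.
def result_path(pid, dir)
  File.join(dir, "task_result_#{pid}.json")
end

# Called by a Task to store its data under its own process id.
def save_result(data, pid: Process.pid, dir: Dir.tmpdir)
  File.write(result_path(pid, dir), JSON.generate(data))
end

# Called by a later Task with its predecessors' process ids.
# Pids with no stored file are skipped.
def load_results(pids, dir: Dir.tmpdir)
  pids.each_with_object({}) do |pid, results|
    path = result_path(pid, dir)
    results[pid] = JSON.parse(File.read(path)) if File.exist?(path)
  end
end

# Inside a Task's execute method this might look like:
#   save_result('rows' => 42)                  # in the predecessor
#   load_results(API.get_predecessor_pids)     # in the successor (assumed Array of pids)
```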

What's the difference between Jongleur::Task's success_status and exit_status attributes?

These attributes follow the wording of the official Ruby docs for Process::Status: exit_status returns the least significant eight bits of the process's return code, while success_status returns true if the process exited successfully.
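The distinction can be seen with Ruby's own Process::Status, whose `exitstatus` and `success?` methods these descriptions appear to be taken from. The sketch below is plain Ruby and does not use Jongleur; `child_status` and `killed_child_status` are hypothetical helpers.

```ruby
# Hypothetical helper: fork a child that exits with `code` and report
# [exitstatus, success?] from its Process::Status.
def child_status(code)
  pid = fork { exit!(code) }
  _, status = Process.wait2(pid)
  [status.exitstatus, status.success?]
end

# Hypothetical helper: a child that is killed never exits normally,
# so both values are nil.
def killed_child_status
  pid = fork { sleep }
  Process.kill('KILL', pid)
  _, status = Process.wait2(pid)
  [status.exitstatus, status.success?]
end

child_status(0)      # => [0, true]
child_status(3)      # => [3, false]
killed_child_status  # => [nil, nil]
```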

What happens when Jongleur finishes running?

When Jongleur finishes running all tasks in its Task Graph, regardless of whether the Tasks themselves have failed or not, it will exit the parent process with an exit code of 0.

What happens if a Task fails?

If a Task fails to run or to finish its run, Jongleur will simply go on running any other tasks it can. It will not run any Tasks which depend on the failed Task. The status of the failed Task will be indicated via an appropriate output message and also on the Task Matrix.
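To see which Tasks a failure would hold back, you can walk the Task Graph from the failed Task and collect everything downstream of it. This is a plain-Ruby model of the behaviour described above, not Jongleur's own code; `blocked_by` is a hypothetical helper.

```ruby
FAILURE_EXAMPLE_GRAPH = {
  A: [:B, :C],
  B: [:D],
  C: [:D],
  D: [:E],
  E: []
}.freeze

# Hypothetical helper: every Task that directly or indirectly depends on
# `failed`, and so would not be run once `failed` has failed.
def blocked_by(graph, failed)
  blocked = []
  queue = graph.fetch(failed, []).dup
  until queue.empty?
    task = queue.shift
    next if blocked.include?(task)

    blocked << task
    queue.concat(graph.fetch(task, []))
  end
  blocked
end

blocked_by(FAILURE_EXAMPLE_GRAPH, :B)
# => [:D, :E]   (C does not depend on B, so it can still run)
```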

How can I examine the Task Matrix after Jongleur has finished?

Jongleur serializes each run's Task Matrix as a JSON file in the /tmp directory. You can either view this in an editor or load it and manipulate it in Ruby with

JSON.parse( File.read('/tmp/jongleur_task_matrix_08272018_103406.json') )

Roadmap

These are the things I'd like Jongleur to support in future releases:

Task storage mechanism, i.e. the ability for each Task to save data in a uniquely identifiable and safe way so that data can be shared between sequential tasks in a transparent and easy manner.
Rails integration. Pretty self-explanatory really.

Contributing

Any suggestions for new features or improvements are very welcome. Please raise bug reports and pull requests on GitLab.

License

The gem is available as open source under the terms of the MIT License