EXEL
EXEL is the Elastic eXEcution Language, a simple Ruby DSL for creating processing jobs that can be run on a single machine or scaled up to run on dozens of machines with no changes to the job itself. To run a job on more than one machine, simply install the EXEL async and remote provider gems that integrate with your preferred platforms. The currently implemented providers are:
- Async Providers (e.g. exel-sidekiq)
- Remote Providers (e.g. exel-s3)
Installation
Add this line to your application's Gemfile:
gem 'exel'
And then execute:
$ bundle
Or install it yourself as:
$ gem install exel
Usage
Processors
A processor can be any class that provides the following interface:
class MyProcessor
  def initialize(context)
    # typically the context is assigned to @context here
    @context = context
  end

  def process(block)
    # do your work here
  end
end
Processors are initialized immediately before #process is called, allowing them to set up any state that they need from the context. The #process method is where your processing logic is implemented. Processors should be focused on performing one particular aspect of the processing you want to accomplish, allowing your job to be composed of a sequence of small processing steps. If a block was given in the call to process in the job DSL, it will be passed as the argument to #process and can be run with block.run(@context).
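As a concrete sketch, here is a processor that reads a file path placed in the context, counts its lines, and stores the result for later steps (RowCounter and the :row_count key are illustrative, not part of EXEL):

class RowCounter
  def initialize(context)
    @context = context
  end

  def process(_block)
    # Read the input placed in the context by the initial setup or an earlier processor
    line_count = File.foreach(@context[:file_path]).count
    # Store the result for subsequent processors to pick up
    @context[:row_count] = line_count
  end
end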
The Context
The Context class has a Hash-like interface and acts as shared storage for the various processors that make up a job. Processors take their expected inputs from the context, and place any resulting outputs there for subsequent processors to access. Values are typically placed in the context through the following means:
- Initial context set up before the job is run
- Arguments passed to processors in the job DSL
- Outputs assigned by processors during processing
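For illustration, a minimal sketch of how values flow through the context, assuming the Hash-style reader and writer that the interface implies (the keys shown are hypothetical):

# Initial context set up before the job is run
context = EXEL::Context.new(file_path: '/tmp/data.csv', user: 'username')

# Output assigned by a processor during processing
context[:row_count] = 1_000

# Read by a subsequent processor
context[:row_count] # => 1000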
If you use EXEL with an async provider, such as exel-sidekiq, and a remote provider, such as exel-s3, a context shift will occur when the async command is executed. A context shift involves serializing the context and uploading it via the remote provider, then downloading and deserializing it when the async block is eventually run. This allows processors to pass their results through the sequence of processors in the job without having to be concerned with when, where, or how those processors will be run.
Supported Commands
process
Executes the given processor class (specified by the :with option), passing it the current context and any additional arguments provided.

split
Splits the input data into 1000-line chunks and runs the given block for each chunk. Assumes that the input data is a CSV-formatted file referenced by context[:resource]. When each block is run, context[:resource] will reference the chunk file.

async
Asynchronously runs the given block, using the configured async provider to execute it.
Example job
EXEL::Job.define :example_job do
  # Download a large CSV data file
  process with: FTPDownloader, host: 'ftp.example.com', path: context[:file_path]

  # Split it into smaller 1000-line files
  split do
    # For each file, asynchronously run the following sequence of processors
    async do
      process with: RecordLoader             # convert each row of data into your domain model
      process with: SomeProcessor            # apply some additional processing to each record
      process with: RecordSaver              # write this batch of records to your database
      process with: ExternalServiceProcessor # interact with some service, e.g. updating a search index
    end
  end
end
Elsewhere in your application, you could run this job as follows:
def run_example_job(file_path)
  context = EXEL::Context.new(file_path: file_path, user: 'username')
  EXEL::Job.run(:example_job, context)
end
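RecordLoader and the other processors in the example are application classes rather than part of EXEL. A minimal sketch of how one of them might be written against the processor interface above, assuming context[:resource] holds the current chunk (as a path or a file object) and :records is an arbitrary key chosen by the application:

require 'csv'

class RecordLoader
  def initialize(context)
    @context = context
  end

  def process(_block)
    # Inside the split block, context[:resource] references the current 1000-line chunk
    resource = @context[:resource]
    csv_text = resource.respond_to?(:read) ? resource.read : File.read(resource)
    # Place the parsed rows in the context for SomeProcessor and RecordSaver to consume
    @context[:records] = CSV.parse(csv_text, headers: true).map(&:to_h)
  end
end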
Contributing
- Fork it ( https://github.com/[my-github-username]/exel/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request