RedStorm - JRuby on Storm

build status

RedStorm provides a Ruby DSL using JRuby integration for the Storm distributed realtime computation system.

Like RedStorm? visit us on IRC at #redstorm on freenode

Check also these related projects:

Documentation

Chances are new versions of RedStorm will introduce changes that will break compatibility or change the developement workflow. To prevent out-of-sync documentation, per version specific documentation are kept in the wiki when necessary.

Dependencies

Tested on OSX 10.8.3 and Ubuntu Linux 12.10 using Storm 0.9.0-wip16 and JRuby 1.7.4 and OpenJDK 7

Installation

  • RubyGems
  $ gem install redstorm
  • Bundler
  source "https://rubygems.org"
  gem "redstorm", "~> 0.6.6"

Usage

Overview

  • create a project directory
  • install the RedStorm gem
  • create a subdirectory for your topology code
  • perform the initial setup as to build and install dependencies
  $ redstorm install
  • run your topology in local mode
  $ redstorm local <path/to/topology_class_file_name.rb>

Initial setup

$ redstorm install

This will install default Java jar dependencies in target/dependency, generate & compile the Java bindings in target/classes.

By default, the Java compilation will use the current JVM version as source/target compilation compatibility. If you want to force a specific source/target version, use the --[JVM VERSION] option. For example, to force compiling for 1.6 use:

$ redstorm install --1.6

Create a topology

Create a subdirectory for your topology code and create your topology class using this naming convention: underscore topology_class_file_name.rb MUST correspond to its CamelCase class name.

Here's an example hello_world_topology.rb

require 'red_storm'

class HelloWorldSpout < RedStorm::DSL::Spout
  on_init {@words = ["hello", "world"]}
  on_send {@words.shift unless @words.empty?}
end

class HelloWorldBolt < RedStorm::DSL::Bolt
  on_receive :emit => false do |tuple|
    log.info(tuple[0])
  end
end

class HelloWorldTopology < RedStorm::DSL::Topology
  spout HelloWorldSpout do
    output_fields :word
  end

  bolt HelloWorldBolt do
    source HelloWorldSpout, :global
  end
end

Gems in your topology

RedStorm requires Bundler if gems are needed in your topology. Supply a Gemfile in the root of your project directory with the gems required in your topology. If you are using Bundler also for other gems than those required in the topology you should group the topology gems in a Bunder group of your choice.

Note that bundler is only used to help package the gems prior to running a topology. Your topology code should not use Bundler. With require "red_storm" in your topology class file, RedStorm will take care of setting the gems path. Do not require 'bundler/setup' in the topology.

  1. have Bundler install the gems locally
  $ bundle install
  1. copy the topology gems into the target/gems directory
  $ redstorm bundle [BUNDLER_GROUP]
  1. make sure your topology class has require "red_storm"
  require 'red_storm'

The redstorm bundle command copy the gems specified in the Gemfile (in a specific group if specified) into the target/gems directory. In order for the topology to run in a Storm cluster, the fully installed gems must be packaged and self-contained into a single jar file. This has an important consequence: the gems will not be installed on the cluster target machines, they are already installed in the jar file. This could lead to problems if the machine used to install the gems is of a different architecture than the cluster target machines and some of these gems have C or FFI extensions.

Custom Jar dependencies in your topology (XML Warning! :P)

By defaut, RedStorm installs Storm and JRuby jars dependencies into target/dependency. RedStorm uses Ivy 2.3 to manage dependencies. You can fully control and customize these dependencies.

There are two distinct sets of dependencies: the storm dependencies manages the requirements (Storm jars) for the Storm local mode runtime. The topology dependencies manages the requirements (JRuby jars) for the topology runtime.

You can supply custom storm and topology dependencies by creating ivy/storm_dependencies.xml and ivy/topology_dependencies.xml files. Below are the current default content for these files:

  • ivy/storm_dependencies.xml
  <?xml version="1.0"?>
  <ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven">
    <info organisation="redstorm" module="storm-deps"/>
    <dependencies>
      <dependency org="storm" name="storm" rev="0.9.0-wip16" conf="default" transitive="true" />
      <override org="org.slf4j" module="slf4j-log4j12" rev="1.6.3"/>
    </dependencies>
  </ivy-module>
  • ivy/topology_dependencies.xml
  <?xml version="1.0"?>
  <ivy-module version="2.0" xmlns:m="http://ant.apache.org/ivy/maven">
    <info organisation="redstorm" module="topology-deps"/>
    <dependencies>
      <dependency org="org.jruby" name="jruby-core" rev="1.7.4" conf="default" transitive="true"/>

      <!-- explicitely specify jffi to also fetch the native jar. make sure to update jffi version matching jruby-core version -->
      <!-- this is the only way I found using Ivy to fetch the native jar -->
      <dependency org="com.github.jnr" name="jffi" rev="1.2.5" conf="default" transitive="true">
        <artifact name="jffi" type="jar" />
        <artifact name="jffi" type="jar" m:classifier="native"/>
      </dependency>

    </dependencies>
  </ivy-module>

The jars repositories can be configured by adding the ivy/settings.xml file in the root of your project. For information on the Ivy settings format, see the Ivy Settings Documentation. Below is the current default:

  • ivy/settings.xml
  <?xml version="1.0"?>
  <ivysettings>
    <settings defaultResolver="repositories"/>
    <resolvers>
      <chain name="repositories">
        <ibiblio name="ibiblio" m2compatible="true"/>
        <ibiblio name="sonatype" root="http://repo.maven.apache.org/maven2/" m2compatible="true"/>
        <ibiblio name="clojars" root="http://clojars.org/repo/" m2compatible="true"/>
        <ibiblio name="conjars" root="http://conjars.org/repo/" m2compatible="true"/>
      </chain>
    </resolvers>
  </ivysettings>

Run in local mode

$ redstorm local <sources_directory_path/topology_class_file_name.rb>

note that the topology can also be launched with the following command:

$ java -cp "target/classes:target/dependency/storm/default/*:target/dependency/topology/default/*:<sources_directory_path>" redstorm.TopologyLauncher local <sources_directory_path/topology_class_file_name.rb>

See examples below to run examples in local mode or on a production cluster.

Run on production cluster

Locally installing the Storm distribution is not required. Note that RedStorm does not provide the Storm command line utilities and you will need to install the Storm distribution to use the Storm command line utilities.

  1. create the Storm config file ~/.storm/storm.yaml and add the following
  nimbus.host: "host_name_or_ip"

you can also use an alternate path and use the --config <other/path/to/config.yaml>

  1. generate target/cluster-topology.jar. This jar file will include your sources directory plus the required dependencies
  $ redstorm jar <sources_directory1> <sources_directory2> ...
  1. submit the cluster topology jar file to the cluster
  $ redstorm cluster <sources_directory/topology_class_file_name.rb>

or if you have an alternate Storm config path:

  $ redstorm cluster --config <some/other/path/to/config.yaml> <sources_directory/topology_class_file_name.rb>

note that the cluster topology jar can also be submitted using the storm command with:

  $ storm jar target/cluster-topology.jar -Djruby.compat.version=RUBY1_9 redstorm.TopologyLauncher cluster <sources_directory/topology_class_file_name.rb>

The Storm wiki has instructions on setting up a production cluster. You can also manually submit your topology.

Examples

Install the example files in your project. The examples/ dir will be created in your project root dir.

$ redstorm examples

All examples using the DSL are located in examples/dsl. Examples using the standard Java interface are in examples/native.

Local mode

Example topologies without gems

$ redstorm local examples/dsl/exclamation_topology.rb
$ redstorm local examples/dsl/exclamation_topology2.rb
$ redstorm local examples/dsl/word_count_topology.rb

Example topologies with gems

For examples/dsl/redis_word_count_topology.rb the redis gem is required and you need a Redis server running on localhost:6379

  1. create a Gemfile
  source "https://rubygems.org"

  group :word_count do
      gem "redis"
  end
  1. install the topology gems
  $ bundle install
  $ redstorm bundle word_count
  1. run the topology in local mode
  $ redstorm local examples/dsl/redis_word_count_topology.rb

Using redis-cli push words into the test list and watch Storm pick them up

Remote cluster

All examples using the DSL can run in both local or on a remote cluster. The only native example compatible with a remote cluster is examples/native/cluster_word_count_topology.rb.

Topologies without gems

  1. genererate the target/cluster-topology.jar and include the examples/ directory
  $ redstorm jar examples
  1. submit the cluster topology jar file to the cluster
  $ redstorm cluster examples/dsl/exclamation_topology.rb
  $ redstorm cluster examples/dsl/exclamation_topology2.rb
  $ redstorm cluster examples/dsl/word_count_topology.rb

Topologies with gems

For examples/dsl/redis_word_count_topology.rb the redis gem is required and you need a Redis server running on localhost:6379

  1. create a Gemfile
  source "https://rubygems.org"

  group :word_count do
      gem "redis"
  end
  1. install the topology gems
  $ bundle install
  $ redstorm bundle word_count
  1. genererate the target/cluster-topology.jar and include the examples/ directory
  $ redstorm jar examples
  1. submit the cluster topology jar file to the cluster
  $ redstorm cluster examples/dsl/redis_word_count_topology.rb

Using redis-cli push words into the test list and watch Storm pick them up

The Storm wiki has instructions on setting up a production cluster. You can also manually submit your topology.

Ruby DSL

Ruby DSL Documentation

Multilang ShellSpout & ShellBolt support

Please refer to Using non JVM languages with Storm for the complete information on Multilang & shelling in Storm.

In RedStorm ShellSpout and ShellBolt are supported using the following construct in the topology definition:

bolt JRubyShellBolt, ["python", "splitsentence.py"] do
  output_fields "word"
  source SimpleSpout, :shuffle
end
  • JRubyShellBolt must be used for a ShellBolt and the array argument ["python", "splitsentence.py"] are the arguments to the class constructor and are the commands to the ShellBolt.

  • The directory containing the topology class must contain a resources/ directory with all the shell files.

See the shell topology example

Transactional and LinearDRPC topologies

Despite the fact that both transactional and linear DRPC topologies are now deprecated as of Storm 0.8.1 work on these has been merged in RedStorm 0.6.5. Lots of the work done on this is required toward Storm Trident topologies. Documentation and examples for transactional and linear DRPC topologies will be added shorty.

Known issues

  • SnakeYAML conflict between Storm and JRuby

See issue. This is a classic Java world jar conflict. Storm 0.9.0 uses snakeyaml 1.9 and JRuby 1.7.x uses snakeyaml 1.11. If you try to use YAML serialization in your topology it will crash with an exception. This problem is easy to solve when running topologies in local mode, simply override in the storm dependencies with the correct jar version. You can do this be creating a custom storm dependencies:

  • ivy/storm_dependencies.xml

    <?xml version="1.0"?>
    <ivy-module version="2.0">
      <info organisation="redstorm" module="storm-deps"/>
      <dependencies>
        <dependency org="storm" name="storm" rev="0.9.0-wip16" conf="default" transitive="true" />
        <override org="org.slf4j" module="slf4j-log4j12" rev="1.6.3"/>
        <override org="org.yaml" module="snakeyaml" rev="1.11"/>
      </dependencies>
    </ivy-module>
    

In remote cluster mode you will have to update snakeyaml manually or with your favorite deployment/provisioning tool.

RedStorm Development

It is possible to fork the RedStorm project and run local and remote/cluster topologies directly from the project sources without installing the gem. This is a useful setup when contributing to the project.

Requirements

  • JRuby 1.7.4

Workflow

  • fork project and create branch

  • install RedStorm required gems

  $ bundle install
  • install dependencies in target/dependencies
  $ bundle exec redstorm deps
  • generate and build Java source into target/classes
  $ bundle exec redstorm build

By default, the Java compilation will use the current JVM version as source/target compilation compatibility. If you want to force a specific source/target version, use the --[JVM VERSION] option. For example, to force compiling for 1.6 use:

  $ bundle exec redstorm build --1.6

if you modify any of the RedStorm Ruby code or Java binding code, you need to run this to refresh code and rebuild the bindings

  • follow the normal usage patterns to run the topology in local or remote cluster.
  $ bundle exec redstorm bundle ...
  $ bundle exec redstorm local ...
  $ bundle exec redstorm jar ...
  $ bundle exec redstorm cluster ...

Remote cluster testing

Vagrant & Chef configuration to create a single node test Storm cluster is available in https://github.com/colinsurprenant/redstorm/tree/master/vagrant/

Notes about 1.8/1.9 JRuby compatibility

Ruby 1.9 is the default runtime mode in JRuby 1.7.x

If you require Ruby 1.8 support, there are two ways to have JRuby run in 1.8 runtime mode:

  • by setting the JRUBY_OPTS env variable
  $ export JRUBY_OPTS=--1.8
  • by using the --1.8 option
  $ jruby --1.8 -S redstorm ...

By defaut, a topology will be executed in the same mode as the interpreter running the $ redstorm command. You can force RedStorm to choose a specific JRuby compatibility mode using the [--1.8|--1.9] parameter for the topology execution in local or remote cluster.

$ redstorm local|cluster [--1.8|--1.9] ...

If you are not using the DSL and only using the proxy classes (like in examples/native) you will need to manually set the JRuby version in the Storm Backtype::Config object like this:

class SomeTopology
  RedStorm::Configuration.topology_class = self

  def start(base_class_path, env)
    builder = TopologyBuilder.new
    builder.setSpout ...
    builder.setBolt ...

    conf = Backtype::Config.new
    conf.put("topology.worker.childopts", "-Djruby.compat.version=RUBY1_8")

    StormSubmitter.submitTopology("some_topology", conf, builder.createTopology);
  end
end

How to contribute

Fork the project, create a branch and submit a pull request.

Some ways you can contribute:

  • by reporting bugs using the issue tracker
  • by suggesting new features using the issue tracker
  • by writing or editing documentation
  • by writing specs
  • by writing code
  • by refactoring code
  • ...

Projects using RedStorm

If you want to list your RedStorm project here, contact me.

  • Tweigeist - realtime computation of the top trending hashtags on Twitter. See Live Demo.

Author

Colin Surprenant, http://github.com/colinsurprenant/, @colinsurprenant, [email protected], http://colinsurprenant.com/

Contributors

License

Apache License, Version 2.0. See the LICENSE.md file.