NAME

forkoff

SYNOPSIS

brain-dead simple parallel processing for ruby

URI

http://rubyforge.org/projects/codeforpeople
http://github.com/ahoward/forkoff

INSTALL

gem install forkoff

DESCRIPTION

forkoff works for any enumerable object: it runs a code block in a child
process for each element and collects the results in the parent.  forkoff
limits the number of concurrent child processes, which is 2 by default.
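
the core idea can be sketched with nothing but the standard library.  the
following is not forkoff's implementation, only a minimal illustration that
assumes a platform with fork(2).  the helper name run_in_child is hypothetical,
and marshaling the result over a pipe is an assumption made for the sketch.

```ruby
# hypothetical helper, not part of forkoff: run a block in a child process
# and hand its return value back to the parent through a pipe
def run_in_child(arg)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close                          # the child only writes
    result = yield(arg)
    writer.write(Marshal.dump(result))    # serialize the block's return value
    writer.close
    exit!(0)                              # skip at_exit handlers inherited from the parent
  end

  writer.close                            # the parent only reads
  data = reader.read                      # returns once the child closes its end
  reader.close
  Process.wait(pid)                       # reap the child
  Marshal.load(data)
end

squares = [1, 2, 3].map { |i| run_in_child(i) { |n| n ** 2 } }
p squares
```

this sketch runs one child at a time; forkoff's value is in running several
at once while still collecting every result.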

SAMPLES

<========< samples/a.rb >========>

~ > cat samples/a.rb

  # forkoff makes it trivial to do parallel processing with ruby.  the
  # following prints out each word in a separate process
  #

    require 'forkoff'

    %w( hey you ).forkoff!{|word| puts "#{ word } from #{ Process.pid }"}

~ > ruby samples/a.rb

  hey from 7907
  you from 7908

<========< samples/b.rb >========>

~ > cat samples/b.rb

  # for example, this takes only 4 seconds or so to complete (8 iterations
  # running in two processes = twice as fast)
  #

    require 'forkoff'

    a = Time.now.to_f

    results =
      (0..7).forkoff do |i|
        sleep 1
        i ** 2
      end

    b = Time.now.to_f

    elapsed = b - a

    puts "elapsed: #{ elapsed }"
    puts "results: #{ results.inspect }"

~ > ruby samples/b.rb

  elapsed: 4.19184589385986
  results: [0, 1, 4, 9, 16, 25, 36, 49]

<========< samples/c.rb >========>

~ > cat samples/c.rb

  # forkoff does *NOT* spawn processes in batches, waiting for each batch to
  # complete.  rather, it keeps a certain number of processes busy until all
  # results have been gathered.  in other words the following will ensure that 3
  # processes are running at all times, until the list is complete. note that
  # the following will take about 3 seconds to run (3 sets of 3 @ 1 second).
  #

  require 'forkoff'

  pid = Process.pid

  a = Time.now.to_f

  pstrees =
    %w( a b c d e f g h i ).forkoff! :processes => 3 do |letter|
      sleep 1
      { letter => ` pstree -l 2 #{ pid } ` }
    end

  b = Time.now.to_f

  puts
  puts "pid: #{ pid }"
  puts "elapsed: #{ b - a }"
  puts

  require 'yaml'

  pstrees.each do |pstree|
    puts pstree.to_yaml
  end

~ > ruby samples/c.rb

  pid: 7922
  elapsed: 3.37899208068848

  --- 
  a: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07923 ahoward ruby -Ilib samples/c.rb
     |-+- 07924 ahoward (ruby)
     \-+- 07925 ahoward ruby -Ilib samples/c.rb

  --- 
  b: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07923 ahoward ruby -Ilib samples/c.rb
     |-+- 07924 ahoward ruby -Ilib samples/c.rb
     \-+- 07925 ahoward ruby -Ilib samples/c.rb

  --- 
  c: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07923 ahoward ruby -Ilib samples/c.rb
     |-+- 07924 ahoward (ruby)
     \-+- 07925 ahoward ruby -Ilib samples/c.rb

  --- 
  d: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07932 ahoward ruby -Ilib samples/c.rb
     |--- 07933 ahoward ruby -Ilib samples/c.rb
     \--- 07934 ahoward ruby -Ilib samples/c.rb

  --- 
  e: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |--- 07932 ahoward (ruby)
     |-+- 07933 ahoward ruby -Ilib samples/c.rb
     \-+- 07934 ahoward (ruby)

  --- 
  f: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |--- 07932 ahoward (ruby)
     |-+- 07933 ahoward ruby -Ilib samples/c.rb
     \-+- 07934 ahoward ruby -Ilib samples/c.rb

  --- 
  g: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07941 ahoward ruby -Ilib samples/c.rb
     |--- 07942 ahoward ruby -Ilib samples/c.rb
     \--- 07943 ahoward ruby -Ilib samples/c.rb

  --- 
  h: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |-+- 07941 ahoward (ruby)
     |-+- 07942 ahoward ruby -Ilib samples/c.rb
     \--- 07943 ahoward ruby -Ilib samples/c.rb

  --- 
  i: |
    -+- 07922 ahoward ruby -Ilib samples/c.rb
     |--- 07942 ahoward (ruby)
     \-+- 07943 ahoward ruby -Ilib samples/c.rb
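
the difference between batching and keeping workers busy can be sketched with
a shared queue.  this is an illustration only, not forkoff's code: pooled_map
is a hypothetical helper, and the :done marker and the marshal-over-pipe
transport are assumptions made for the sketch.  each worker thread takes the
next job as soon as its previous child has finished, so no worker waits for a
whole batch to complete.

```ruby
require 'thread'

# hypothetical helper, not part of forkoff: map a block over items using a
# fixed number of worker threads, each forking one child per job
def pooled_map(items, processes = 2, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [index, item] }
  processes.times { jobs << :done }       # one stop marker per worker

  results = Array.new(items.size)

  workers = Array.new(processes) do
    Thread.new do
      while (job = jobs.pop) != :done
        index, item = job
        reader, writer = IO.pipe
        reader.binmode
        writer.binmode

        pid = fork do
          reader.close
          writer.write(Marshal.dump(block.call(item)))
          writer.close
          exit!(0)                        # skip inherited at_exit handlers
        end

        writer.close
        # read exactly one marshaled object instead of waiting for EOF, since
        # children forked by other threads may hold a copy of this write end
        results[index] = Marshal.load(reader)
        reader.close
        Process.wait(pid)
      end
    end
  end

  workers.each(&:join)
  results                                 # same order as the input
end

p pooled_map((0..7).to_a, 3) { |i| i ** 2 }
```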

<========< samples/d.rb >========>

~ > cat samples/d.rb

  # forkoff supports two strategies for reading the result from the child: via
  # pipe (the default) or via file.  select one with the :strategy option.
  #

    require 'forkoff'

    %w( hey you guys ).forkoff :strategy => :file do |word|
      puts "#{ word } from #{ Process.pid }"
    end

~ > ruby samples/d.rb

  hey from 7953
  you from 7954
  guys from 7955
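
what a file-based hand-off might look like can be sketched as follows.  this
is a guess at the general technique, not forkoff's actual :file strategy: the
helper name run_in_child_via_file, the temp file naming, and the use of
Marshal are all assumptions.  the child writes its result to a file and the
parent reads it back once the child has exited.

```ruby
require 'tmpdir'

# hypothetical helper, not part of forkoff: return a child's result to the
# parent through a temporary file rather than a pipe
def run_in_child_via_file(arg)
  path = File.join(Dir.tmpdir, "forkoff-sketch-#{ Process.pid }-#{ rand(1_000_000) }.bin")

  pid = fork do
    result = yield(arg)
    File.open(path, 'wb') { |f| f.write(Marshal.dump(result)) }
    exit!(0)                              # skip inherited at_exit handlers
  end

  Process.wait(pid)                       # the file is complete once the child exits
  File.open(path, 'rb') { |f| Marshal.load(f.read) }
ensure
  File.delete(path) if path && File.exist?(path)
end

p %w( hey you guys ).map { |word| run_in_child_via_file(word) { |w| w.upcase } }
```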

HISTORY

1.1.0 
  - move to a model with one work queue and signals sent from consumers to
  producer to notify ready state.  this lets smaller jobs race through a
  single process even while a larger job may have one sub-process bound up.
  incorporates a fix from http://github.com/fredrikj/forkoff for a problem
  where some processes would lag behind when jobs didn't have similar
  execution times.

1.0.0
  - move to github

0.0.4
  - code re-org
  - add :strategy option
  - default number of processes is 2, not 8

0.0.1

  - updated to use producer threads pushing onto a SizedQueue for each consumer
    channel.  in this way the producers do not build up a massive parallel data
    structure but provide data to the consumers only as fast as they can fork
    and process it.  basically for a 4 process run you'll end up with 4
    channels of size 1 between 4 producers and 4 consumers; each consumer is a
    thread popping off jobs, forking, and yielding results.

  - removed use of Queue for capturing the output.  now it's simply an array
    of arrays which removed some sync overhead.

  - you can configure the number of processes globally with

      Forkoff.default['processes'] = 4

  - you can now pass either an options hash

      forkoff( :processes => 2 ) ...

    or plain vanilla number

      forkoff( 2 ) ...

    to the forkoff call

  - default number of processes is 8, not 2

0.0.0

  initial version