JRuby on Hadoop
JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper and Reducer interfaces. We recommend using it together with hadoop-rubydsl, which is available on GitHub and GemCutter.
Description
Install
All required gems are available on GemCutter.
-
Upgrade RubyGems to 1.3.5.
-
Install the gem:
$ gem install jruby-on-hadoop
Usage
-
Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
-
Put your input files into HDFS, e.g. test/inputs/file1.
-
Now you can run 'joh' as follows:
$ joh examples/wordcount.rb test/inputs test/outputs
The Hadoop job results are written to test/outputs/part-* in HDFS.
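As a rough illustration of the first two steps, the sketch below checks HADOOP_HOME and builds a `hadoop fs -put` command for uploading an input file. The helper names `hadoop_command` and `put_command` are hypothetical and not part of jruby-on-hadoop; the sketch also assumes the Hadoop launcher lives at bin/hadoop under HADOOP_HOME.

```ruby
# Hypothetical pre-flight helpers; not part of jruby-on-hadoop.

# Returns the path to the Hadoop launcher script.
# Assumption: the launcher is located at $HADOOP_HOME/bin/hadoop.
def hadoop_command(env = ENV)
  home = env['HADOOP_HOME']
  raise ArgumentError, 'HADOOP_HOME is not set' if home.nil? || home.empty?
  File.join(home, 'bin', 'hadoop')
end

# Builds the argument list for copying a local file into an HDFS directory.
def put_command(local_path, hdfs_dir, env = ENV)
  [hadoop_command(env), 'fs', '-put', local_path, hdfs_dir]
end

# Usage (runs the command only if you uncomment it):
# system(*put_command('test/inputs/file1', 'test/inputs'))
```

Returning the command as an array keeps each argument separate, so paths containing spaces do not need shell quoting when passed to `system`.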
Example
See also examples/wordcount.rb.
def setup(conf)
  # setup jobconf
end

def map(key, value, output, reporter)
  # mapper process
  # (wordcount example)
  value.split.each do |word|
    output.collect(word, 1)
  end
end

def reduce(key, values, output, reporter)
  # reducer process
  # (wordcount example)
  sum = 0
  values.each {|v| sum += v }
  output.collect(key, sum)
end
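To get a feel for how the map and reduce functions above fit together, they can be exercised in plain Ruby without a cluster. The following is only a sketch: `ArrayCollector` and `run_locally` are hypothetical names, the collector models nothing but `collect`, the reporter is passed as nil, and any type conversion done by Hadoop or the wrapper is not modelled.

```ruby
# Hypothetical stand-in for the output collector; only #collect is modelled.
class ArrayCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same logic as the wordcount example.
def map(key, value, output, reporter)
  value.split.each { |word| output.collect(word, 1) }
end

def reduce(key, values, output, reporter)
  sum = 0
  values.each { |v| sum += v }
  output.collect(key, sum)
end

# Simplified local run: map every line, group values by key, reduce each group.
def run_locally(lines)
  mapped = ArrayCollector.new
  lines.each_with_index { |line, i| map(i, line, mapped, nil) }

  # Stand-in for the shuffle phase: collect all values emitted for each key.
  grouped = Hash.new { |hash, key| hash[key] = [] }
  mapped.pairs.each { |word, count| grouped[word] << count }

  reduced = ArrayCollector.new
  grouped.keys.sort.each { |word| reduce(word, grouped[word], reduced, nil) }
  reduced.pairs
end

p run_locally(['a b a', 'b c'])
```

A local run like this is handy for checking the mapper and reducer logic before submitting the script with 'joh'.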
Build
You can build hadoop-ruby.jar with “ant”:
$ ant
The HADOOP_HOME environment variable must be set on your system. The build assumes Hadoop version 0.19.2.
Author
Koichi Fujikawa <[email protected]>
Copyright
License: Apache License