ganapati – Hadoop HDFS Thrift interface for Ruby
Ganapati is a Ruby Thrift library for interfacing with Hadoop’s distributed file system, HDFS. It also includes a few command line client utilities.
To install:
gem install ganapati
Starting the Thrift server
Hadoop's own documentation for the Thrift interface to HDFS is poor. It can be found here.
A much simpler and safer way to compile and start the Thrift interface is to use the provided script:
bin/hdfs_thrift_server <port>
This will start a thrift server on the given port (after compiling the server code provided in the Hadoop distribution).
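Once the script is running, it can help to confirm that something is listening on the chosen port before pointing a client at it. The following is a minimal sketch using only Ruby's standard socket library. The helper name port_open? and the two second timeout are assumptions made for this illustration and are not part of ganapati. It only checks that the TCP port accepts connections, not that the listener speaks Thrift.

```ruby
require 'socket'

# Hypothetical helper (not part of ganapati): returns true if a TCP
# connection to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

# 1234 matches the port used in the usage examples in this README.
if port_open?('localhost', 1234)
  puts 'thrift server port is reachable'
else
  puts 'nothing is listening on that port yet'
end
```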
Basic Usage
require 'rubygems'
require 'ganapati'
# args are host, port, and optional timeout
client = Ganapati::Client.new 'localhost', 1234
# copy a file to hdfs
client.put("/some/file", "/some/hadoop/path")
# get a file from hadoop
client.get("/some/hadoop/path", "/local/path")
# Create a file
f = client.create("/home/someuser/afile.txt")
f.write("this is some text")
# Always, always close the file
f.close
# Create a file with code block
client.create("/home/someuser/afile.txt") { |f|
f.write("this is some text")
}
# Open a file for reading and read it
client.open('/home/someuser/afile.txt') { |f|
puts f.read
# or read for specific length from start
puts f.read(0, 4)
}
# read a file line by line
client.readlines('/home/someuser/afile.txt') { |line|
puts line
}
# Open a file for appending and append to it
client.append('/home/someuser/afile.txt') { |f|
f.write "new data"
}
# Common file methods are available (chown, chmod, mkdir, stat, etc.). Examples:
# move a file
client.mv "/home/someuser/afile.txt", "/home/someuser/test.txt"
# remove a file
client.rm "/home/someuser/test.txt"
# test for file existence
client.exists? "/home/someuser/test.txt"
# get a list of all files
client.ls "/home"
client.close
# Quick and dirty way to print a remote file. The run class method takes care of closing the client.
puts Ganapati::Client.run('localhost', 1234) { |c| c.open('/home/someuser/afile.txt') { |f| f.read } }
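The block forms and the run class method shown above are worth preferring because cleanup then no longer depends on remembering to call close. The general Ruby pattern behind that style of API can be sketched as follows. FakeHandle and with_handle are hypothetical stand-ins written for this illustration. This is not ganapati's actual implementation, only the usual ensure idiom that such methods tend to rely on.

```ruby
# Hypothetical stand-in for a remote file or client handle.
class FakeHandle
  attr_reader :closed

  def initialize
    @closed = false
  end

  # Always fails, to show that cleanup still happens on error.
  def write(_data)
    raise IOError, 'simulated write failure'
  end

  def close
    @closed = true
  end
end

# Yields the handle to the block and closes it afterwards,
# whether the block returns normally or raises.
def with_handle(handle)
  yield handle
ensure
  handle.close
end

handle = FakeHandle.new
begin
  with_handle(handle) { |h| h.write('this is some text') }
rescue IOError => e
  puts "write failed: #{e.message}"
end
puts "closed? #{handle.closed}"
```

Because the close call sits in an ensure clause, the handle is closed even though the write raised, and the block's value is still returned to the caller when nothing goes wrong.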
Command Line Utilities
There are a few utility programs included in the bin directory. hls lists the contents of HDFS (recursively and verbosely, given the appropriate command line options):
./bin/hls hdfs://host:port/tmp
hcp provides a way to copy to/from/between HDFS servers:
./bin/hcp hdfs://host:port/some/path/to/file ./file
./bin/hcp ./file hdfs://host:port/some/path/to/file
./bin/hcp hdfs://anotherhost:port/some/path/to/file hdfs://host:port/some/path/to/file
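In the examples above, remote locations are written as hdfs:// URLs and local ones as plain paths. How such an argument breaks down into host, port, and path can be sketched with Ruby's standard uri library. This is an illustration only, not the parsing code that hls or hcp actually use. The helper parse_location, the host name namenode, and port 9000 are all hypothetical.

```ruby
require 'uri'

# Hypothetical helper: splits a command line argument into its parts.
# Returns a hash with :remote set to true for hdfs:// URLs.
def parse_location(arg)
  uri = URI.parse(arg)
  if uri.scheme == 'hdfs'
    { remote: true, host: uri.host, port: uri.port, path: uri.path }
  else
    # Anything without the hdfs scheme is treated as a local path.
    { remote: false, path: arg }
  end
end

p parse_location('hdfs://namenode:9000/some/path/to/file')
p parse_location('./file')
```

Note that the port must be numeric for URI.parse to accept the URL, so the literal placeholder hdfs://host:port/... from the examples has to be filled in with real values first.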