Typhoeus

http://github.com/pauldix/typhoeus/tree/master

the mailing list

Thanks to my employer kgbweb for allowing me to release this as open source. Btw, we’re hiring and we work on cool stuff like this every day. Get a hold of me if you rock at rails/js/html/css or if you have experience in search, information retrieval, and machine learning.

I also wanted to thank Todd A. Fisher. I ripped a good chunk of the C libcurl-multi code from his update to Curb. Awesome stuff, Todd!

Summary

Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic. To be a little more specific, it’s a library for accessing web services in Ruby. It’s specifically designed for building RESTful service oriented architectures in Ruby that need to be fast enough to process calls to multiple services within the client’s HTTP request/response life cycle.

Some of the awesome features are parallel request execution, memoization of request responses (so you don’t make the same request multiple times in a single group), built-in support for caching responses to memcached (or whatever), a nifty DSL for creating classes that make HTTP calls and process responses, and baked-in mocking. It uses libcurl and libcurl-multi to work this speedy magic. I wrote the C bindings myself, so it’s yet another Ruby libcurl library, but with some extra awesomeness added in.
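
As a rough illustration of the memoization idea (identical requests in one group hit the network only once), here is a minimal sketch in plain Ruby. RequestGroup, fetch_count, and the fetcher block are made-up names for this example, and the block stands in for the real HTTP call. This is not how Typhoeus implements it internally.

```ruby
# Hypothetical sketch, not Typhoeus API: memoize identical requests in a group.
class RequestGroup
  attr_reader :fetch_count

  # The block stands in for the real HTTP call; it receives the url and params.
  def initialize(&fetcher)
    @fetcher = fetcher
    @responses = {}
    @fetch_count = 0
  end

  # Returns the stored response when the same url and params were already requested.
  def get(url, params = {})
    key = [:get, url, params.sort_by { |name, _| name.to_s }]
    @responses.fetch(key) do
      @fetch_count += 1
      @responses[key] = @fetcher.call(url, params)
    end
  end
end

group = RequestGroup.new { |url, params| "response for #{url} #{params.inspect}" }
results = Array.new(10) do
  group.get("http://search.twitter.com/search.json", {:q => "railsconf"})
end
puts group.fetch_count # prints 1: ten identical requests, one underlying fetch
```

Keying on the method, URL, and sorted params means two requests count as "the same" only when all three match; a different query string triggers a new fetch.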

Installation

For now Typhoeus exists only on GitHub. It requires a current version of libcurl. I’ve tested it with 7.19.4.


gem sources -a http://gems.github.com # if you haven’t already
gem install pauldix-typhoeus
If you’re on Debian or Ubuntu and getting errors while trying to install, it could be because you don’t have the latest version of libcurl installed. Do this to fix:

sudo apt-get install libcurl4-gnutls-dev

Another problem can come up if you’re running MacPorts and have libcurl installed through it. You need to uninstall that copy for Typhoeus to work! The version in MacPorts is old and doesn’t play nice. Download curl, build it from source, and then install the gem again.

If you’re still having issues, please let me know on the mailing list.

There’s one other thing you should know. The Easy object (which is just a libcurl thing) allows you to set timeout values in milliseconds. However, for this to work you need to build libcurl with c-ares support built in. Unfortunately, I still haven’t been able to get this to work on either my Mac or Linux. I’ll have to figure that out.

Usage


require 'rubygems'
require 'typhoeus'
require 'json'  

# here's an example for twitter search
# Including Typhoeus adds http methods like get, put, post, and delete.
# What's more interesting though is the stuff to build up what I call
# remote_methods.
class Twitter
  include Typhoeus
  remote_defaults :on_success => lambda {|response| JSON.parse(response.body)},
                  :on_failure => lambda {|response| puts "error code: #{response.code}"},
                  :base_uri   => "http://search.twitter.com"

  define_remote_method :search, :path => '/search.json'
  define_remote_method :trends, :path => '/trends/:time_frame.json'
end

tweets = Twitter.search(:params => {:q => "railsconf"})

# if you look at the path argument for the :trends method, it has :time_frame.
# this tells it to add in a parameter called :time_frame that gets interpolated
# and inserted.
trends = Twitter.trends(:time_frame => :current)

# and then the calls don't actually happen until the first time you
# call a method on one of the objects returned from the remote_method
puts tweets.keys # it's a hash from parsed JSON

# you can also do things like override any of the default parameters
Twitter.search(:params => {:q => "hi"}, :on_success => lambda {|response| puts response.body})

# on_success and on_failure lambdas take a response object. 
# It has four accessors: code, body, headers, and time

# here's an example of memoization
twitter_searches = []
10.times do
  twitter_searches << Twitter.search(:params => {:q => "railsconf"})
end

# this next part will actually make the call. However, it only makes one
# http request and parses the response once. The rest are memoized.
twitter_searches.each {|s| puts s.keys}

# you can also have it cache responses and do gets automatically
# here we define a remote method that caches the responses for 60 seconds
klass = Class.new do
  include Typhoeus
  
  define_remote_method :foo, :base_uri => "http://localhost:3001", :cache_responses => 60
end

klass.cache = some_memcached_instance_or_whatever
response = klass.foo 
puts response.body # makes the request

second_response = klass.foo
puts second_response.body # pulls from the cache without making a request

# you can also pass timeouts on the define_remote_method or as a parameter
# Note that timeouts are in milliseconds.
Twitter.trends(:time_frame => :current, :timeout => 2000)

# you also get the normal get, put, post, and delete methods
class Remote
  include Typhoeus
end

Remote.get("http://www.pauldix.net")
Remote.put("http://", :body => "this is a request body")
Remote.post("http://localhost:3001/posts.xml", 
  {:params => {:post => {:author => "paul", :title => "a title", :body => "a body"}}})
Remote.delete("http://localhost:3001/posts/1")

# you also have the ability to set request headers, so you can set your user agent or pass an etag manually
Remote.get("http://www.pauldix.net", :headers => {"User-Agent" => "typhoeus", "If-None-Match" => "some etag"})

# and do things like basic HTTP authentication
# (encode64 appends a newline, which has to be removed for a header value)
require 'base64'
Remote.get("http://twitter.com/statuses/followers.json", 
           :headers => {"Authorization" => "Basic #{Base64.encode64("login:password").delete("\n")}"})

# body and headers arguments also get passed through on defined remote methods.
class TwitterRestAPI
  include Typhoeus
  remote_defaults :on_success => lambda {|response| JSON.parse(response.body)},
                  :on_failure => lambda {|response| puts "error code: #{response.code}"},
                  :base_uri   => "http://twitter.com"

  define_remote_method :followers, 
                       :path => '/statuses/followers.json', 
                       :headers => {"Authorization" => "Basic #{Base64.encode64("twitter_id:password").delete("\n")}"}
end

# The response object returned by get, put, post, and delete is passed to the on_success 
# or on_failure lambda block if declared.
# The return value of the lambda block is then what is returned by the remote method invocation.
# The response object can do the following:
response.code    # the http return code
response.body    # the body of the response
response.headers # the response headers
response.time    # the response time in seconds

# Typhoeus also has a nifty mocking framework built in
# mock all calls to get
Remote.mock(:get, :code => 200, :body => "whatever")

# here we mock calls to get for the url
Remote.mock(:get, :url => "http://pauldix.net", :code => 200, :body => "hi", :headers => "there", :time => 2)

# note that url, code, body, headers, and time are all optional parameters to mock.
# the first parameter can be either :get, :put, :post, or :delete

# you can also provide headers and body that are expected on the call. An exception will be raised if they don't match
Remote.mock(:get, :url => "http://pauldix.net", :expected_headers => {"If-None-Match" => "sldfkj234"})
Remote.mock(:put, :url => "http://pauldix.net", :expected_body => "this is a body!")

# using that mocking you could mock out the Twitter client like so:
Twitter.mock(:get, :body => '{"hi": "there"}')
# now any calls to trends or search will get the mock and call the on_success handler. the response object will have that body.
# we could also mock out a failure like so
Twitter.mock(:get, :body => '{"fail": "oh noes!"}', :code => 500)
# now calls to a remote method will result in the on_failure handler being called
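
The :time_frame interpolation described in the usage comments above can be sketched in a few lines of plain Ruby. interpolate_path is a hypothetical helper written for this example, not a Typhoeus method, and the choice to raise on a missing value is an assumption of the sketch.

```ruby
# Hypothetical helper, not part of Typhoeus: fill :named segments of a path
# from an options hash and return the filled path plus the unused options.
def interpolate_path(path, options)
  remaining = options.dup
  filled = path.gsub(/:([a-z_]\w*)/i) do
    name = Regexp.last_match(1).to_sym
    raise ArgumentError, "missing value for :#{name}" unless remaining.key?(name)
    remaining.delete(name).to_s
  end
  [filled, remaining]
end

path, rest = interpolate_path('/trends/:time_frame.json',
                              {:time_frame => :current, :timeout => 2000})
puts path # prints /trends/current.json
puts rest.inspect # the leftover options, here only :timeout
```

Returning the leftover options separately keeps path parameters apart from things like :timeout or :params that belong to the request itself.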

The best place to see what including Typhoeus in a class gives you is “remote_spec.rb”.
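
If you don’t have memcached handy, an in-memory stand-in is enough to experiment with response caching. The sketch below assumes the cache object only needs memcached-style get and set methods with a time-to-live in seconds. That interface is an assumption of this example, not something confirmed here, and MemoryCache and its injectable clock are made-up names.

```ruby
# Hypothetical in-memory stand-in for a memcached client.
# Assumption: the cache only needs get(key) and set(key, value, ttl_seconds).
class MemoryCache
  # The clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock = lambda { Time.now })
    @clock = clock
    @store = {}
  end

  # Stores the value together with the time at which it expires.
  def set(key, value, ttl_seconds)
    @store[key] = [value, @clock.call + ttl_seconds]
    value
  end

  # Returns the value, or nil when the key is missing or has expired.
  def get(key)
    value, expires_at = @store[key]
    return nil if expires_at.nil?
    if @clock.call >= expires_at
      @store.delete(key)
      return nil
    end
    value
  end
end

now = Time.at(0)
cache = MemoryCache.new(lambda { now })
cache.set("GET http://localhost:3001/foo", "cached body", 60)
puts cache.get("GET http://localhost:3001/foo").inspect # prints "cached body"
now += 61
puts cache.get("GET http://localhost:3001/foo").inspect # prints nil
```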

Benchmarks

I set up a benchmark to test how the parallel performance works vs Ruby’s built-in Net::HTTP. The setup was a local evented HTTP server that would take a request, sleep for 500 milliseconds, and then issue a blank response. I set up the client to call this 20 times. Here are the results:


  net::http  0.030000   0.010000   0.040000 ( 10.054327)
  typhoeus   0.020000   0.070000   0.090000 (  0.508817)

We can see from this that Net::HTTP performs as expected, taking 10 seconds to run 20 500ms requests. Typhoeus takes only about 500ms (the time of the response that took the longest). One other thing to note is that Typhoeus keeps a pool of libcurl Easy handles to use. For this benchmark I warmed the pool first, so if you test this out it may be a bit slower until the pool has enough Easy handles to run all the simultaneous requests. For some reason the Easy handles can take quite some time to allocate.
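
The shape of that result (serial time is the sum of the delays, parallel time is roughly the longest single delay) can be reproduced without a server. The sketch below is not the benchmark script used above: sleep stands in for the slow server, Ruby threads stand in for libcurl-multi, and the delay is shortened to 20 ms so it runs quickly. fake_request is a made-up name.

```ruby
require 'benchmark'

# Stand-in for a request to a server that sleeps before responding.
def fake_request(delay)
  sleep delay
  :ok
end

delay = 0.02    # seconds; the real benchmark used 500 ms
requests = 20

# One request after another: total time is about requests * delay.
serial = Benchmark.realtime { requests.times { fake_request(delay) } }

# All requests in flight at once: total time is about one delay.
parallel = Benchmark.realtime do
  Array.new(requests) { Thread.new { fake_request(delay) } }.each(&:join)
end

puts format("serial   %.3f s", serial)
puts format("parallel %.3f s", parallel)
```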

Next Steps

  • Write up some more examples.
  • Create a SimpleDB client library using Typhoeus.
  • Create or get someone to create a CouchDB client library using Typhoeus.
  • Add support for automatic retry, exponential back-off, and queuing for later.
  • Add in the support for custom get and set methods on the cache.
  • Add in support for integrated HTTP caching with Memcached.

LICENSE

(The MIT License)

Copyright © 2009:

Paul Dix

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED ‘AS IS’, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.