Traverse

gem install traverse

Introduction

Traverse is a simple tool that makes it easy to traverse XML and JSON.

Let's say you're messing around with Twitter's XML public timeline. Traverse let's you do things like this:

require 'open-uri'

timeline = Traverse::Document.new(open "http://api.twitter.com/statuses/public_timeline.xml")

timeline.statuses.each do |status|
  puts "#{status.user.name} says: #{status.text}"
end

Or, let's say you're foolin' with Spotify's JSON Search API.

search = Traverse::Document.new(open "http://ws.spotify.com/search/1/track.json?q=like+a+virgin")

search.tracks.each do |track|
  puts "Track: #{track.name}"
  puts "Album: #{track.album.name}"
end

For a slightly more complicated example, take a look at a boxscore pulled from Major League Baseball's public API.

url = "http://gd2.mlb.com/components/game/mlb/year_2011/month_03/day_31/gid_2011_03_31_detmlb_nyamlb_1/boxscore.xml"
boxscore = Traverse::Document.new(open url)

boxscore.game_id # => '2011/03/31/detmlb-nyamlb-1'

boxscore.battings[0].batters[1].name_display_first_last # => 'Derek Jeter'

boxscore.battings[0].batters.select do |batter|
  batter.rbi.to_i > 0
end.count # => 4

boxscore.pitchings.find do |pitching|
  pitching.team_flag == 'away'
end.out # => '24'

Traverse's XML API

When in doubt, check the spec file.

Children

Let's say you're working with the following node of XML:

<foo>
  <bar ...>
    ...
  </bar>
</foo>

Assuming you've wrapped the XML in a Traverse::Document named foo, you can traverse to the bar node with a simple method call:

foo.bar # => returns a representation of the bar node

If there are in fact many bars inside foo, Traverse will transparently collect them in an array. You can access that array by pluralizing the name of the individual nodes:

foo.bars # => an array containing all of the bars
foo.bars.first # => grab the first bar

Traverse will also do its best to transparently collect singularly-named nodes inside of pluralized parents. Example:

<foo>
  <bars>
    <bar ...>
      ...
    </bar>
    ...
    <bar ...>
      ...
    </bar>
  </bars>
</foo>
foo.bars # => an array of all of the bars
foo.bars.first # => the first bar

This won't work if the pluralized parent node has attributes or if its children aren't all singularized versions of itself! Twitter's timeline is an example; the parent node's name is statuses, and its children are all named status, but the statuses node has an attribute.

Attributes

If your XML node has an attribute named foo, you can access that attribute with a simple method call:

my_node.foo # => return the attribute's value

Notice that there might be a conflict if your node has both an attribute named foo and a child foo. Traverse puts children first, so the attribute foo will get clobbered.

To get around this, Traverse provides a backdoor for when you really want the attribute:

my_node.foo # => grabs the child named foo
my_node['foo'] # => grabs the attribute named foo

Text

If you traverse to a node that has no attributes and contains only text, you can grab that text with a simple method call. Let's suppose you have the following XML:

<foo>
  <bar>
    This bar rocks!
  </bar>
</foo>

You can grab the text like this:

foo.bar # => "This bar rocks!"

What if bar does have attributes? Then you'll need to use the text method:

<foo>
  <bar baz="quux">
    This bar rocks!
  </bar>
</foo>
foo.bar # => grabs a representation of the node, not the text!
foo.bar.text # => "This bar rocks!"

I know what you're thinking: what the hell do I do if my node has text inside it but I want to grab its attribute named text???

foo.bar['text'] # => grab bar's attribute named 'text'
foo.bar.text # => grab bar's text contents

Traverse's JSON API

Again, when in doubt, check the spec file.

Traverse doesn't really do anything magical with JSON data; under the hood, it uses YAJL to parse JSON into a Hash, and hashes are alredy pretty traversable in Ruby. But, to maintain consistency with the XML API, you can happily traverse JSON using method calls instead of querying a hash.

Helper methods

Traverse provides a few helper methods to help with traversal.

If you're traversing some JSON, the _keys_ method will be available. It returns an array containing the names of the JSON object's keys.

search._keys_.count
# =>  2

search.first._keys_
# =>  ["user","favorited","source","id","text","created_at"]

If you're looking at XML, the _children_ and _attributes_ methods will be available. The _children_ method gives you an array of traversable child nodes, and _attributes_ gives you a hash of attribute-name/attribute-value pairs.

Contributors!

Traverse wouldn't be possible without help from friends. Thanks!