Graboid

Simply awesome web scraping. Better docs are coming later; until then, see the specs.

0.3.4 Update

http://twoism.posterous.com/new-graboid-dsl

Installation

gem install nokogiri graboid

Usage

%w{rubygems graboid}.each { |f| require f }

class RedditEntry
  include Graboid::Scraper

  selector '.entry'

  set :title
  set :domain, :selector => '.domain a'

  set :link,   :selector => '.title' do |entry| 
    entry.css('a').first['href'] 
  end

  page_with do |doc|
    doc.css('p.nextprev a').find { |a| a.text =~ /next/i }['href']
  end

  before_paginate do
    puts "opening page: #{self.source}"
    puts "collection size: #{self.collection.length}"
    puts '*' * 100
  end

end

@posts = RedditEntry.new( :source => 'http://reddit.com' ).all( :max_pages => 2 )

@posts.each do |post|
  puts "title: #{post.title}"
  puts "domain: #{post.domain}"
  puts "link: #{post.link}"
  puts '*' * 100
end
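
The page_with block in the example assumes every page contains a "next" link. If a page has no matching anchor, the lookup yields nil and calling ['href'] on it raises NoMethodError. Below is a minimal, stdlib-only sketch of a nil-safe version of the same selection logic. Anchor and next_page_href are hypothetical names used only for illustration (Anchor stands in for a Nokogiri node), and how Graboid treats a nil return value from page_with is an assumption to verify against the specs.

```ruby
# Hypothetical stand-in for a Nokogiri anchor node: responds to #text and #[].
class Anchor
  attr_reader :text

  def initialize(text, href)
    @text  = text
    @attrs = { 'href' => href }
  end

  def [](name)
    @attrs[name]
  end
end

# Hypothetical helper: returns the href of the first anchor whose text
# matches /next/i, or nil when no such anchor exists.
def next_page_href(anchors)
  link = anchors.find { |a| a.text =~ /next/i }
  link && link['href']
end

anchors = [
  Anchor.new('prev', '/?before=t3_a'),
  Anchor.new('next', '/?after=t3_b')
]

puts next_page_href(anchors)     # prints /?after=t3_b
puts next_page_href([]).inspect  # prints nil
```

Inside a scraper class, the same guard could be applied to the result of doc.css('p.nextprev a') in the page_with block.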

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, but do not modify the Rakefile, version, or history. (If you want your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
  • Send me a pull request. Bonus points for topic branches.

Copyright (c) 2010 Christopher Burnett. See LICENSE for details.