Module: Unbreakable

Defined in:
lib/unbreakable.rb,
lib/unbreakable/scraper.rb,
lib/unbreakable/version.rb,
lib/unbreakable/observers/log.rb,
lib/unbreakable/decorators/timeout.rb,
lib/unbreakable/observers/observer.rb,
lib/unbreakable/processors/transform.rb,
lib/unbreakable/data_storage/file_data_store.rb

Overview

When using this gem, you’ll start by defining a Scraper, with methods for retrieving and processing data. The data will be stored in DataStorage; this gem currently provides only a FileDataStore. You may enhance a datastore with Decorators and Observers: for example, a Timeout decorator to retry on timeout with exponential backoff and a Log observer which logs retrieval progress. Of course, you must also define a Processor to turn your raw data into machine-readable data.
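The decorator and observer roles described above can be illustrated without the gem. The following is a minimal, self-contained sketch that assumes a store exposing a single `store(name) { ... }` method. `SketchStore`, `RetryOnTimeout`, and `ProgressLog` are hypothetical names. They are not the gem's `FileDataStore`, `Timeout`, or `Log` classes, whose actual interfaces may differ.

```ruby
require 'fileutils'
require 'timeout'
require 'tmpdir'

# Hypothetical file-backed store (a stand-in, NOT the gem's FileDataStore).
class SketchStore
  def initialize(dir)
    @dir = dir
    @observers = []
    FileUtils.mkdir_p(dir)
  end

  # Observers are any objects that respond to #update(event, name).
  def add_observer(observer)
    @observers << observer
  end

  # Yields to fetch the content, writes it to dir/name, and returns the path.
  def store(name)
    content = yield
    path = File.join(@dir, name)
    File.write(path, content)
    @observers.each { |observer| observer.update(:stored, name) }
    path
  end
end

# Hypothetical decorator: wraps any object with #store and retries on
# Timeout::Error, doubling the delay after each failure (exponential backoff).
class RetryOnTimeout
  def initialize(store, max_attempts: 4, base_delay: 0.01)
    @store = store
    @max_attempts = max_attempts
    @base_delay = base_delay
  end

  def store(name, &block)
    attempt = 0
    begin
      attempt += 1
      @store.store(name, &block)
    rescue Timeout::Error
      raise if attempt >= @max_attempts
      sleep(@base_delay * 2**(attempt - 1)) # waits 0.01s, 0.02s, 0.04s, ...
      retry
    end
  end
end

# Hypothetical observer: prints one line per stored document.
class ProgressLog
  def initialize(io = $stdout)
    @io = io
  end

  def update(event, name)
    @io.puts("#{event}: #{name}")
  end
end

# Usage: a simulated fetch that times out twice before succeeding.
base = SketchStore.new(Dir.mktmpdir)
base.add_observer(ProgressLog.new)
store = RetryOnTimeout.new(base)

calls = 0
path = store.store('page1.html') do
  calls += 1
  raise Timeout::Error, 'simulated slow server' if calls < 3
  '<html>ok</html>'
end
```

Because the decorator depends only on the wrapped object responding to `store`, decorators of this shape can be stacked. How the gem itself wires its decorators and observers to a datastore is not shown here.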

A skeleton scraper:

require 'unbreakable'

class MyScraper < Unbreakable::Scraper
  def retrieve(*args)
    # download all the documents
  end
  def processable
    # return a list of documents to process
  end
end

class MyProcessor < Unbreakable::Processors::Transform
  def perform
    # return the transformed record as a hash, array, etc.
  end
  def persist(arg)
    # store the hash/array/etc. in Mongo, MySQL, YAML, etc.
  end
end

scraper = MyScraper.new
scraper.processor.register MyProcessor
scraper.configure do |c|
  # configure the scraper
end
scraper.run(ARGV)
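The skeleton splits processing into `perform` (transform a raw document) and `persist` (store the result). That split can be illustrated with a plain Ruby class. The sketch below does not subclass `Unbreakable::Processors::Transform`, because this page does not document how that base class supplies each raw document to `perform`. `PriceListProcessor`, its constructor arguments, and the `name, price` input format are all hypothetical.

```ruby
require 'tmpdir'
require 'yaml'

# Hypothetical processor mirroring the skeleton's perform/persist split;
# it does NOT subclass Unbreakable::Processors::Transform.
class PriceListProcessor
  def initialize(raw_text, output_path)
    @raw_text = raw_text
    @output_path = output_path
  end

  # Transform "name, price" lines into an array of hashes, skipping blank lines.
  def perform
    @raw_text.each_line(chomp: true).reject { |line| line.strip.empty? }.map do |line|
      name, price = line.split(',', 2)
      { 'name' => name.strip, 'price' => Float(price.strip) }
    end
  end

  # Store the transformed records as YAML (standing in for Mongo, MySQL, etc.).
  def persist(records)
    File.write(@output_path, records.to_yaml)
  end
end

output_path = File.join(Dir.mktmpdir, 'prices.yml')
processor = PriceListProcessor.new("apple, 1.25\npear, 0.80\n", output_path)
records = processor.perform
processor.persist(records)
```

Keeping `perform` free of side effects makes the transformation easy to test on its own, while `persist` can be swapped for a different backend.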

Every scraper script can run as a command-line script. Try it!

ruby myscraper.rb

Defined Under Namespace

Modules: DataStorage, Decorators, Observers, Processors
Classes: InvalidRemoteFile, Scraper, UnbreakableError

Constant Summary

VERSION = "0.0.2"