Module: Unbreakable
Defined in:
- lib/unbreakable.rb
- lib/unbreakable/scraper.rb
- lib/unbreakable/version.rb
- lib/unbreakable/observers/log.rb
- lib/unbreakable/decorators/timeout.rb
- lib/unbreakable/observers/observer.rb
- lib/unbreakable/processors/transform.rb
- lib/unbreakable/data_storage/file_data_store.rb
Overview
To use this gem, start by defining a Scraper with methods for retrieving and processing data. Retrieved data is stored in a DataStorage backend; the gem currently provides only a FileDataStore. You may enhance a data store with Decorators and Observers: for example, a Timeout decorator that retries on timeout with exponential backoff, and a Log observer that logs retrieval progress. You must also define a Processor to turn the raw data into machine-readable data.
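To make "retry on timeout with exponential backoff" concrete, here is a minimal standalone sketch of the general technique. It is not the gem's Timeout decorator: the helper name with_backoff and its max_attempts, base_delay, and sleeper parameters are assumptions chosen for illustration.

```ruby
require 'timeout'

# Hypothetical helper, NOT part of Unbreakable's API: it only illustrates
# the retry-with-exponential-backoff idea described in the overview.
#
# max_attempts - total number of tries before giving up
# base_delay   - seconds to wait after the first failure; doubles each retry
# sleeper      - callable used to wait (injectable so tests need not sleep)
def with_backoff(max_attempts: 4, base_delay: 1.0, sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue Timeout::Error
    # Give up and re-raise once the attempts are exhausted.
    raise if attempt >= max_attempts
    # Wait base_delay, then 2x, then 4x, ... before trying again.
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

A retrieval step could then be wrapped as with_backoff { Timeout.timeout(10) { download_page } }, where download_page stands in for whatever code fetches the document.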
A skeleton scraper:
    require 'unbreakable'

    class MyScraper < Unbreakable::Scraper
      def retrieve(*args)
        # download all the documents
      end

      def processable
        # return a list of documents to process
      end
    end

    class MyProcessor < Unbreakable::Processors::Transform
      def perform
        # return the transformed record as a hash, array, etc.
      end

      def persist(arg)
        # store the hash/array/etc. in Mongo, MySQL, YAML, etc.
      end
    end

    scraper = MyScraper.new
    scraper.processor.register MyProcessor
    scraper.configure do |c|
      # configure the scraper
    end
    scraper.run(ARGV)
Every scraper script can run as a command-line script. Try it!
    ruby myscraper.rb
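The perform/persist split in the skeleton processor can be illustrated with a standalone sketch. This is a plain Ruby class, not a subclass of Unbreakable::Processors::Transform, so it runs without the gem. The class name CsvLineProcessor, its constructor arguments, and the "name,population" input format are all assumptions; how the gem actually hands raw data to a processor is not shown in this page.

```ruby
require 'yaml'

# Hypothetical stand-in for a processor. A real one would subclass
# Unbreakable::Processors::Transform; this class only mirrors the
# perform/persist division of labour from the skeleton above.
class CsvLineProcessor
  # raw         - one raw "name,population" line (assumed input format)
  # output_path - where persist writes the YAML record (assumed parameter)
  def initialize(raw, output_path)
    @raw = raw
    @output_path = output_path
  end

  # Transform the raw line into a machine-readable hash.
  def perform
    name, population = @raw.strip.split(',', 2)
    { 'name' => name, 'population' => Integer(population) }
  end

  # Store the transformed record, here as a YAML file.
  def persist(record)
    File.write(@output_path, record.to_yaml)
  end
end
```

Keeping the transformation in perform and the storage in persist means the same transformed record could be sent to a different store (Mongo, MySQL, and so on) by changing only persist.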
Defined Under Namespace
Modules: DataStorage, Decorators, Observers, Processors
Classes: InvalidRemoteFile, Scraper, UnbreakableError
Constant Summary
- VERSION = "0.0.2"