Class: Wgit::Base

Inherits:
Object
Extended by:
DSL
Defined in:
lib/wgit/base.rb

Overview

Base class to inherit from, as an alternative to using the Wgit::DSL directly. Every subclass must define a #parse(doc, &block) method.
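As a rough sketch of the subclass contract described above, the following uses a hypothetical stand-in (`SketchBase`, with a fake `crawl` that yields plain strings rather than crawled Wgit::Document objects) so that it runs without the wgit gem or network access. Only the shape of #parse(doc, &block) is taken from this page; every other name is an assumption made for illustration.

```ruby
# Hypothetical stand-in for Wgit::Base; NOT the real class.
class SketchBase
  # Fake crawl: yields canned strings instead of crawled documents.
  def self.crawl
    %w[first-page second-page].each { |doc| yield doc }
  end

  # Mirrors the shape of Base.run (setup/teardown omitted for brevity).
  def self.run(&block)
    obj = new
    crawl { |doc| obj.parse(doc, &block) }
    obj
  end
end

# A subclass only needs to define #parse(doc, &block).
class PageCollector < SketchBase
  attr_reader :seen

  def initialize
    @seen = []
  end

  def parse(doc)
    @seen << doc
    yield doc.upcase if block_given? # Forward a result to the caller's block.
  end
end

yielded = []
collector = PageCollector.run { |value| yielded << value }
```

In this sketch the block given to `run` is forwarded unchanged to every #parse call, so the subclass decides what, if anything, to yield back to the caller.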

Constant Summary

Constants included from DSL

DSL::DSL_ERROR__NO_START_URL

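Class Method Summary

  • .mode(method) ⇒ Object

    Sets the crawl/index method to call when Base.run is called.

  • .run(&block) ⇒ Object

    Runs the crawl/index passing each crawled Wgit::Document and the given block to the subclass's #parse method.

Instance Method Summary

  • #setup ⇒ Object

    Runs once before the crawl/index is run.

  • #teardown ⇒ Object

    Runs once after the crawl/index is complete.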

Methods included from DSL

crawl, crawl_site, empty_db!, extract, follow, index, index_site, index_www, last_response, reset, search, start, use_crawler, use_database

Class Method Details

.mode(method) ⇒ Object

Sets the crawl/index method that Base.run calls. The mode must match a method defined in the Wgit::Crawler or Wgit::Indexer class. If no mode is set, Base.run falls back to :crawl.

Parameters:

  • method (Symbol)

    The crawl/index method to call.



# File 'lib/wgit/base.rb', line 35

def self.mode(method)
  @method = method
end
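
Because .mode stores its argument in a class-level instance variable, each subclass keeps its own setting. A minimal sketch of that behaviour follows, using a hypothetical stand-in class (`ModeSketch`) and a hypothetical reader (`selected_mode`) that is not part of Wgit. The `:crawl` fallback mirrors the first line of .run as shown on this page.

```ruby
# Hypothetical stand-in; only .mode mirrors the source on this page.
class ModeSketch
  def self.mode(method)
    @method = method
  end

  # Hypothetical helper (not in Wgit): the value .run would dispatch to.
  def self.selected_mode
    @method || :crawl
  end
end

class SiteCrawler < ModeSketch
  mode :crawl_site # Stored on SiteCrawler itself, not on ModeSketch.
end

class DefaultCrawler < ModeSketch
end

SiteCrawler.selected_mode    # => :crawl_site
DefaultCrawler.selected_mode # => :crawl
ModeSketch.selected_mode     # => :crawl
```

One consequence of this design, at least in the sketch, is that a mode set on a parent class is not picked up by its subclasses; each subclass would need to call `mode` itself.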

.run(&block) ⇒ Object

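Runs the crawl/index, passing each crawled Wgit::Document and the given block to the subclass's #parse method. Calls #setup before the crawl and #teardown after it, raises an error if the subclass does not respond to #parse, and returns the subclass instance.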



# File 'lib/wgit/base.rb', line 15

def self.run(&block)
  crawl_method = @method || :crawl
  obj = new

  unless obj.respond_to?(:parse)
    raise "#{obj.class} must respond_to? #parse(doc, &block)"
  end

  obj.setup
  send(crawl_method) { |doc| obj.parse(doc, &block) }
  obj.teardown

  obj
end
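
The control flow of .run can be traced with a stand-in that avoids real crawling. In this sketch, `RunSketch`, its fake `crawl` and `crawl_site` methods, and the string documents are all hypothetical. Only the dispatch via `send`, the #parse check, and the return value follow the source above.

```ruby
# Hypothetical stand-in that mimics the body of Base.run.
class RunSketch
  def self.mode(method)
    @method = method
  end

  # Fake crawl methods standing in for the DSL; each yields canned strings.
  def self.crawl
    yield 'page-from-crawl'
  end

  def self.crawl_site
    %w[home about].each { |doc| yield doc }
  end

  def self.run(&block)
    crawl_method = @method || :crawl
    obj = new

    unless obj.respond_to?(:parse)
      raise "#{obj.class} must respond_to? #parse(doc, &block)"
    end

    send(crawl_method) { |doc| obj.parse(doc, &block) }
    obj
  end
end

class SiteSketch < RunSketch
  mode :crawl_site
  attr_reader :docs

  def parse(doc)
    (@docs ||= []) << doc
  end
end

class NoParse < RunSketch
end

site = SiteSketch.run # Dispatches to crawl_site and returns the instance.

error_message =
  begin
    NoParse.run
    nil
  rescue RuntimeError => e
    e.message
  end
```

Here `SiteSketch.run` collects both fake documents, while `NoParse.run` raises before any crawling starts because the #parse check happens first.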

Instance Method Details

#setup ⇒ Object

Runs once before the crawl/index is run. Override as needed.



# File 'lib/wgit/base.rb', line 8

def setup; end

#teardown ⇒ Object

Runs once after the crawl/index is complete. Override as needed.



# File 'lib/wgit/base.rb', line 11

def teardown; end
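
One way to picture the two hooks is as a bracket around the crawl. The sketch below uses a hypothetical stand-in (`HookSketch`) with a fake `crawl` and records the call order. Apart from #setup, #parse and #teardown, none of these names come from Wgit.

```ruby
# Hypothetical stand-in reproducing the hook order used by Base.run.
class HookSketch
  # Fake crawl: yields canned strings instead of crawled documents.
  def self.crawl
    %w[a b].each { |doc| yield doc }
  end

  def self.run(&block)
    obj = new
    obj.setup
    crawl { |doc| obj.parse(doc, &block) }
    obj.teardown
    obj
  end

  # No-op defaults, as on this page; subclasses override as needed.
  def setup; end
  def teardown; end
end

class Tally < HookSketch
  attr_reader :events

  def setup
    @events = [:setup] # Runs once, before any document is parsed.
  end

  def parse(doc)
    @events << "parse #{doc}"
  end

  def teardown
    @events << :teardown # Runs once, after the last document.
  end
end

tally = Tally.run
tally.events # => [:setup, "parse a", "parse b", :teardown]
```

Note that in the .run source shown on this page, the #teardown call is not wrapped in an `ensure`, so it would be skipped if the crawl or #parse raises.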