Class: ExtraLoop::ScraperBase

Constructor Details

#initialize(urls, options = {}, arguments = {}) ⇒ `ScraperBase`

options - Hash of scraper options (optional).
  async        : Whether the scraper should issue HTTP requests in parallel (true) or in series (false).
  log          : Logging options (defaults to standard error); set to false to suppress logging completely.
    appenders    : Where log messages should be appended (defaults to standard error).
    log_level    : The log level (defaults to info).

arguments - Hash of arguments to be passed to the Typhoeus HTTP client (optional).

Returns itself.

# File 'lib/extraloop/loggable.rb', line 59

def initialize(*args)
  base_initialize(*args)
  init_log!
  self
end
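
The constructor above keeps the original `initialize` under the alias `base_initialize` and then runs `init_log!` after it. The following is a minimal sketch of that aliasing pattern, not ExtraLoop's own code: `TinyScraper`, `log_ready`, and the body of `init_log!` are hypothetical stand-ins, and the way `:log => false` is handled here is an assumption made for illustration.

```ruby
# Hypothetical stand-in class; not part of ExtraLoop.
class TinyScraper
  attr_reader :urls, :options, :log_ready

  def initialize(urls, options = {})
    @urls = Array(urls)
    @options = options
  end
end

# Reopen the class, keep the original constructor under an alias,
# and redefine initialize so an extra setup step runs afterwards.
class TinyScraper
  alias_method :base_initialize, :initialize

  def initialize(*args)
    base_initialize(*args)
    init_log!
  end

  private

  # Stand-in for the real logging setup (assumption): it only records
  # whether logging would be enabled for the given options.
  def init_log!
    @log_ready = @options.fetch(:log, true) != false
  end
end

scraper = TinyScraper.new("http://example.com", :log => false)
```

Because the alias is created before `initialize` is redefined, `base_initialize` still points at the original constructor, so the wrapper can add behaviour without the original class knowing about it.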

Instance Attribute Details

#options ⇒ `Object` (readonly)

Returns the value of attribute options.




# File 'lib/extraloop/scraper_base.rb', line 6

def options
  @options
end

#results ⇒ `Object` (readonly)

Returns the value of attribute results.




# File 'lib/extraloop/scraper_base.rb', line 6

def results
  @results
end

Instance Method Details

#base_initialize ⇒ `Object`

# File 'lib/extraloop/loggable.rb', line 50

alias_method :base_initialize, :initialize

#extract(*args, &block) ⇒ `Object`

Public: Registers a new extractor to be added to the loop.

Delegates to Extractor, which raises an exception if none of a selector, a block, or an attribute name is provided.

selector - The CSS3 selector identifying the node list over which to iterate (optional).
callback - A block of code (optional).
attribute - An attribute name (optional).

Returns itself.

# File 'lib/extraloop/scraper_base.rb', line 81

def extract(*args, &block)
  args << block if block
  @extractor_args << args
  self
end
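
Since `extract` only stores its arguments and returns `self`, calls can be chained and are resolved later. A minimal sketch of that registration pattern follows. `ExtractorRegistry` is a hypothetical class, and the argument order used in the calls (field name, selector, attribute) is an assumption for illustration rather than something this page specifies.

```ruby
# Hypothetical miniature of the registration pattern; not ExtraLoop's class.
class ExtractorRegistry
  attr_reader :extractor_args

  def initialize
    @extractor_args = []
  end

  # Same body as the extract method shown above: the block, if given,
  # is appended to the positional arguments, and the call returns self.
  def extract(*args, &block)
    args << block if block
    @extractor_args << args
    self
  end
end

registry = ExtractorRegistry.new
registry.
  extract(:title, "h1").
  extract(:link, "a", :href).
  extract(:price) { |node| node.to_s }
```

Each call adds one argument list to `extractor_args`; a block ends up as the last element of its list, as a `Proc`.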

#loop_on(*args, &block) ⇒ `Object`

Public: Sets the scraper extraction loop.

Delegates to Extractor, which raises an exception if none of a selector, a block, or an attribute name is provided.

selector - The CSS3 selector identifying the node list over which to iterate (optional).
attribute - An attribute name (optional).
callback - A block of code (optional).

Returns itself.

# File 'lib/extraloop/scraper_base.rb', line 62

def loop_on(*args, &block)
  args << block if block
  # prepend placeholder values for loop name and extraction environment
  @loop_extractor_args = args.insert(0, nil, nil)
  self
end
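
The `args.insert(0, nil, nil)` call in `loop_on` can be easier to follow in isolation. The sketch below repeats the same two steps outside of any class; the selector string and the `callback` proc are hypothetical values chosen for illustration.

```ruby
# Mirrors the argument handling in loop_on, outside of any class.
callback = proc { |nodes| nodes }
args = ["ul li.result"]   # hypothetical selector
args << callback          # what `args << block if block` does when a block is given

# Array#insert places the two nil placeholders before index 0
# and returns the same (mutated) array.
loop_extractor_args = args.insert(0, nil, nil)
```

After these steps the array holds two `nil` placeholders, then the selector, then the callback. Note that `insert` mutates `args` in place rather than building a copy.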

#run ⇒ `Object`

Public: Runs the main scraping loop.

Returns itself.

# File 'lib/extraloop/scraper_base.rb', line 92

def run
  @urls.each do |url|
    issue_request(url)

    # If the scraper is asynchronous, start processing the Hydra HTTP queue
    # only after the last URL has been appended to the queue (see #issue_request).
    if @options[:async]
      if url == @urls.last
        @hydra.run
      end
    else
      @hydra.run
    end
  end
  self
end
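
The difference between the two modes in `run` is when the queue is drained: after every URL in serial mode, or once after the last URL in asynchronous mode. The sketch below reproduces that control flow with a hypothetical `FakeQueue` standing in for the Hydra queue; `enqueue`, `drain`, and the example URLs are illustrative names, not part of ExtraLoop or Typhoeus.

```ruby
# Hypothetical stand-in for a request queue; it only records what it drains.
class FakeQueue
  attr_reader :pending, :run_calls, :batches

  def initialize
    @pending = []
    @run_calls = 0
    @batches = []
  end

  def enqueue(url)
    @pending << url
  end

  # Drains everything queued so far as one batch.
  def run
    @run_calls += 1
    @batches << @pending.dup
    @pending.clear
  end
end

# Same control flow as the run method shown above.
def drain(urls, queue, async)
  urls.each do |url|
    queue.enqueue(url)
    if async
      queue.run if url == urls.last
    else
      queue.run
    end
  end
  queue
end

urls = ["http://a.example", "http://b.example", "http://c.example"]
serial   = drain(urls, FakeQueue.new, false)
parallel = drain(urls, FakeQueue.new, true)
```

In this sketch the serial run drains three single-URL batches, while the asynchronous run drains one batch containing all three URLs. The `url == urls.last` check compares by value, so a list in which the last URL also appears earlier would trigger an early drain in this sketch.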
