Class: Retriever::PageIterator

Inherits:
Fetch
  • Object
Defined in:
lib/retriever/page_iterator.rb

Constant Summary

Constants inherited from Fetch

Fetch::HR

Instance Attribute Summary

Attributes inherited from Fetch

#max_pages, #result, #t

Instance Method Summary

Methods inherited from Fetch

#dump, #errlog, #filter_out_querystrings, #good_response?, #lg, #start, #write

Constructor Details

#initialize(url, options, &block) ⇒ PageIterator

Receives the target URL, the RR options, and a block. Runs the block on every page visited during the crawl and pushes the block's return value onto a result stack.

The complete data returned from the crawl is accessible through self.result.


# File 'lib/retriever/page_iterator.rb', line 8

def initialize(url, options, &block)
  super
  start
  fail 'block required for PageIterator' unless block_given?
  @iterator = true
  @result.push(block.call @page_one)
  lg("-- PageIterator crawled- #{url}")
  async_crawl_and_collect(&block)
  # done, make sure progress bar says we are done
  @progressbar.finish if @progress
  @result.sort_by! { |x| x.length } if @result.size > 1
end
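
The collection pattern in the constructor above (call the block on each page, push its return value onto the result stack, then sort by length) can be tried offline with a small stand-in. This is only a sketch: SketchPageIterator and the in-memory pages array are hypothetical names introduced for illustration and are not part of Retriever. The sketch does no crawling, logging, or progress reporting, and unlike the constructor above it checks for the block before doing any work.

```ruby
# Hypothetical stand-in for Retriever::PageIterator (not part of the gem).
# It walks an in-memory list of pages instead of crawling a site.
class SketchPageIterator
  attr_reader :result

  # pages: Array of page-like objects; plain HTML strings here (an assumption).
  def initialize(pages, &block)
    # Sketch-only choice: validate the block first, before any work is done.
    fail ArgumentError, 'block required for SketchPageIterator' unless block_given?
    @result = []
    # Run the block on every page and push its return value onto the stack.
    pages.each { |page| @result.push(block.call(page)) }
    # Same final step as the constructor above: order results by length.
    @result.sort_by!(&:length) if @result.size > 1
  end
end

pages = [
  '<title>Home page</title>',
  '<title>About</title>',
  '<title>FAQ</title>'
]

# The block extracts each page title; its return value is what gets collected.
iterator = SketchPageIterator.new(pages) { |html| html[/<title>(.*?)<\/title>/, 1] }
p iterator.result
```

Because of the final sort, results come back ordered by length rather than in crawl order, so a block that needs to preserve ordering would have to return values that carry their own ordering key.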