Class: Retriever::PageIterator
Defined in: lib/retriever/page_iterator.rb
Constant Summary
Constants inherited from Fetch
Instance Attribute Summary
Attributes inherited from Fetch
Instance Method Summary
- #initialize(url, options, &block) ⇒ PageIterator (constructor)
Receives the target URL and RR options, plus a block. Runs the block on every page during the crawl, pushing the value returned by the block onto a result stack. The complete data returned from the crawl is accessible through self.result.
Methods inherited from Fetch
#dump, #errlog, #good_response?, #lg, #start, #write
Constructor Details
#initialize(url, options, &block) ⇒ PageIterator
Receives the target URL and RR options, plus a block. Runs the block on every page during the crawl, pushing
the value returned by the block onto a result stack.
The complete data returned from the crawl is accessible through self.result.
# File 'lib/retriever/page_iterator.rb', line 8

def initialize(url, options, &block)
  super
  start
  fail 'block required for PageIterator' unless block_given?
  @iterator = true
  @result.push(block.call @page_one)
  lg("-- PageIterator crawled- #{url}")
  async_crawl_and_collect(&block)
  # done, make sure progress bar says we are done
  @progressbar.finish if @progress
  @result.sort_by! { |x| x.length } if @result.size > 1
end
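The block-and-collect pattern used by the constructor can be sketched without a network crawl. The example below is a minimal stand-in, not the gem's implementation: the class name SketchIterator and the array of page strings are assumptions made for illustration, and the real PageIterator obtains its pages from the crawl started by Fetch rather than from an array.

```ruby
# Hypothetical stand-in for illustration only; not part of Retriever.
class SketchIterator
  attr_reader :result

  # pages: an array of strings standing in for crawled pages (assumption).
  def initialize(pages, &block)
    fail 'block required for SketchIterator' unless block_given?
    @result = []
    # Run the block on every page and push its return value onto the stack.
    pages.each { |page| @result.push(block.call(page)) }
    # Mirror the constructor's final step: order the results by length.
    @result.sort_by! { |x| x.length } if @result.size > 1
  end
end

pages = ['<h1>Home</h1>', '<p>About us</p>', '<b>Hi</b>']

# The block strips tags from each page; its return value is what gets collected.
iterator = SketchIterator.new(pages) { |page| page.gsub(/<[^>]+>/, '') }

p iterator.result
```

Because the results are sorted by length at the end, the order of self.result reflects the size of each block return value, not the order in which pages were visited.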