Class: ExtraLoop::ScraperBase
- Inherits: Object (hierarchy: Object > ExtraLoop::ScraperBase)
- Includes: Hookable, Loggable, Utils::Support
- Defined in: lib/extraloop/loggable.rb, lib/extraloop/scraper_base.rb
Overview
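Monkey-patches ScraperBase to add logging: lib/extraloop/loggable.rb aliases the original constructor as #base_initialize and redefines #initialize to call it and then initialize the logger.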
Direct Known Subclasses
Instance Attribute Summary
- #options ⇒ Object (readonly)
  Returns the value of attribute options.
- #results ⇒ Object (readonly)
  Returns the value of attribute results.
Instance Method Summary
- #base_initialize ⇒ Object
- #extract(*args, &block) ⇒ Object
  Public: Registers a new extractor to be added to the loop.
- #initialize(urls, options = {}, arguments = {}) ⇒ ScraperBase (constructor)
  Public: Initializes a web scraper.
- #loop_on(*args, &block) ⇒ Object
  Public: Sets the scraper extraction loop.
- #run ⇒ Object
  Public: Runs the main scraping loop.
Methods included from Utils::Support
Methods included from Hookable
Constructor Details
#initialize(urls, options = {}, arguments = {}) ⇒ ScraperBase
Public: Initializes a web scraper.
urls - One or several URLs.
options - Hash of scraper options:
  async : Whether the scraper should issue HTTP requests in series or in parallel.
  log : Logging options (defaults to standard error); set to false to suppress logging completely.
    appenders : Where the log messages should be appended to (defaults to standard error).
    log_level : The log level (defaults to info).
arguments - Hash of arguments to be passed to the Typhoeus HTTP client (optional).
Returns itself.
```ruby
# File 'lib/extraloop/loggable.rb', line 59
def initialize(*args)
  base_initialize(*args)
  init_log!
  self
end
```
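To make the constructor's documented parameters concrete, here is a minimal sketch using a hypothetical `SketchScraper` class (not ExtraLoop's code). It assumes that "one or several URLs" is normalized to an array and that caller options are merged over defaults; the `async: false` default is an illustrative assumption, not ExtraLoop's documented behaviour.

```ruby
# Hypothetical stand-in for ScraperBase's constructor (not ExtraLoop's code).
# It only illustrates the documented parameters.
class SketchScraper
  # Assumed default for illustration; ExtraLoop's real defaults may differ.
  DEFAULT_OPTIONS = { async: false }.freeze

  attr_reader :urls, :options, :arguments

  def initialize(urls, options = {}, arguments = {})
    @urls = Array(urls)                       # accept one URL or several
    @options = DEFAULT_OPTIONS.merge(options) # caller-supplied options win
    @arguments = arguments                    # would go to the HTTP client
  end
end

single = SketchScraper.new("http://example.com")
many   = SketchScraper.new(%w[http://a.example http://b.example], async: true)
quiet  = SketchScraper.new("http://example.com", log: false)
```

`Array(urls)` wraps a single string in an array and leaves an array untouched, which is one simple way to accept either form.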
Instance Attribute Details
#options ⇒ Object (readonly)
Returns the value of attribute options.
```ruby
# File 'lib/extraloop/scraper_base.rb', line 6
def options
  @options
end
```
#results ⇒ Object (readonly)
Returns the value of attribute results.
```ruby
# File 'lib/extraloop/scraper_base.rb', line 6
def results
  @results
end
```
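The "Returns the value of attribute ..." wording is YARD's default docstring for `attr_reader`, which suggests (but does not confirm) that both attributes are declared that way. The self-contained sketch below, using hypothetical classes, shows that the macro and the explicit reader are equivalent, and that "readonly" means no writer method, not an immutable value.

```ruby
# Two equivalent ways to expose a read-only attribute (hypothetical classes).
class WithMacro
  attr_reader :results

  def initialize(results)
    @results = results
  end
end

class WithDef
  def initialize(results)
    @results = results
  end

  # Hand-written equivalent of `attr_reader :results`.
  def results
    @results
  end
end

a = WithMacro.new([1, 2])
b = WithDef.new([1, 2])

# No `results=` writer exists, but the returned array itself is still mutable.
a.results << 3
```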
Instance Method Details
#base_initialize ⇒ Object
```ruby
# File 'lib/extraloop/loggable.rb', line 50
alias_method :base_initialize, :initialize
```
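The alias plus the redefined constructor in loggable.rb form the classic "alias, then redefine" monkey patch. A minimal sketch of the same pattern on a hypothetical `Widget` class (the `init_log!` body here is an illustrative stand-in, not ExtraLoop's logger setup):

```ruby
# Original class, as it might be defined in its own file.
class Widget
  attr_reader :name, :log

  def initialize(name)
    @name = name
  end
end

# Reopen the class, as a monkey patch in a separate file would.
class Widget
  # Keep the original constructor reachable under a new name.
  alias_method :base_initialize, :initialize

  def initialize(*args)
    base_initialize(*args) # run the original constructor first
    init_log!              # then add the extra behaviour
    self
  end

  private

  # Hypothetical stand-in for logger initialization.
  def init_log!
    @log = []
  end
end

w = Widget.new("demo")
```

The alias must run before `initialize` is redefined; otherwise `base_initialize` would point at the new method and recurse.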
#extract(*args, &block) ⇒ Object
Public: Registers a new extractor to be added to the loop.
Delegates to Extractor, which will raise an exception if none of a selector, a block, or an attribute name is provided.
selector - The CSS3 selector identifying the node list over which to iterate (optional).
callback - A block of code (optional).
attribute - An attribute name (optional).
Returns itself.
```ruby
# File 'lib/extraloop/scraper_base.rb', line 81
def extract(*args, &block)
  args << block if block
  @extractor_args << args
  self
end
```
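As the source shows, #extract does no extraction itself: it records its arguments (with the block appended) and returns `self`, so calls can be chained. The hypothetical `ExtractorRegistry` below reproduces just that recording step. The leading symbols (`:title`, `:url`, `:source`) assume, as the "loop name" placeholder in #loop_on hints, that the first argument names the extracted field; that is an assumption, not something this page documents.

```ruby
# Hypothetical registry mirroring #extract's bookkeeping (not ExtraLoop's code).
class ExtractorRegistry
  attr_reader :extractor_args

  def initialize
    @extractor_args = []
  end

  def extract(*args, &block)
    args << block if block    # the block, if given, becomes the last argument
    @extractor_args << args   # store for later use; nothing runs yet
    self                      # enable method chaining
  end
end

reg = ExtractorRegistry.new
reg.extract(:title, "h3 a")
   .extract(:url, "h3 a", :href)
   .extract(:source, "span.src") { |node| node.to_s.strip }
```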
#loop_on(*args, &block) ⇒ Object
Public: Sets the scraper extraction loop.
Delegates to Extractor, which will raise an exception if none of a selector, a block, or an attribute name is provided.
selector - The CSS3 selector identifying the node list over which to iterate (optional).
attribute - An attribute name (optional).
callback - A block of code (optional).
Returns itself.
```ruby
# File 'lib/extraloop/scraper_base.rb', line 62
def loop_on(*args, &block)
  args << block if block
  # prepend placeholder values for loop name and extraction environment
  @loop_extractor_args = args.insert(0, nil, nil)
  self
end
```
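The shape of the stored loop arguments follows directly from the source: the block (if any) is appended, then `Array#insert(0, nil, nil)` prepends two `nil` placeholders. A standalone sketch with a hypothetical helper name, `loop_on_args`:

```ruby
# Hypothetical helper reproducing the argument layout #loop_on stores.
def loop_on_args(*args, &block)
  args << block if block      # block goes last
  args.insert(0, nil, nil)    # placeholders: loop name, extraction environment
end

plain      = loop_on_args("div.result")
with_block = loop_on_args("h3") { |nodes| nodes.map(&:to_s) }
```

`Array#insert` mutates and returns the receiver, which is why its result can be assigned directly to `@loop_extractor_args` in the original.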
#run ⇒ Object
Public: Runs the main scraping loop.
Returns itself.
```ruby
# File 'lib/extraloop/scraper_base.rb', line 92
def run
  @urls.each do |url|
    issue_request(url)

    # If the scraper is asynchronous, start processing the Hydra HTTP queue
    # only after the last URL has been appended to the queue (see #issue_request).
    if @options[:async]
      if url == @urls.last
        @hydra.run
      end
    else
      @hydra.run
    end
  end
  self
end
```
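The control flow of #run can be simulated without any HTTP traffic. In this sketch, `FakeHydra` and `simulate_run` are hypothetical stand-ins for `Typhoeus::Hydra` and the scraper; queuing a URL stands in for `issue_request(url)`. It counts how often the queue is drained in each mode.

```ruby
# Hypothetical stand-in for Typhoeus::Hydra that only counts queue drains.
class FakeHydra
  attr_reader :runs

  def initialize
    @queued = []
    @runs = 0
  end

  def queue(url)
    @queued << url
  end

  def run
    @runs += 1
    @queued.clear
  end
end

# Mirrors the branching in #run and returns how many times the queue ran.
def simulate_run(urls, async:)
  hydra = FakeHydra.new
  urls.each do |url|
    hydra.queue(url) # stands in for issue_request(url)
    if async
      hydra.run if url == urls.last # drain once, after the last URL is queued
    else
      hydra.run                     # drain after every request (serial)
    end
  end
  hydra.runs
end

urls = %w[http://a.example http://b.example http://c.example]
```

Because the check compares URLs by value, a list whose last URL also appears earlier (for example `%w[x y x]`) drains the queue more than once in asynchronous mode.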