Module: ScraperUtils::MiscUtils
Defined in: lib/scraper_utils/misc_utils.rb
Overview
Miscellaneous standalone utilities.
Constant Summary
- THROTTLE_HOSTNAME = "block"
Class Attribute Summary
- .default_crawl_delay ⇒ Object
  Returns the value of attribute default_crawl_delay.
- .default_max_load ⇒ Object
  Returns the value of attribute default_max_load.
Class Method Summary
- .reset_defaults! ⇒ Object
- .reset_throttler! ⇒ Object
  Reset the internal throttler (useful in tests).
- .throttle_block ⇒ Object
  Throttle block to be nice to servers we are scraping.
- .will_pause_till ⇒ Object
Class Attribute Details
.default_crawl_delay ⇒ Object
Returns the value of attribute default_crawl_delay.
# File 'lib/scraper_utils/misc_utils.rb', line 12
def default_crawl_delay
  @default_crawl_delay
end
.default_max_load ⇒ Object
Returns the value of attribute default_max_load.
# File 'lib/scraper_utils/misc_utils.rb', line 12
def default_max_load
  @default_max_load
end
Class Method Details
.reset_defaults! ⇒ Object
# File 'lib/scraper_utils/misc_utils.rb', line 14
def reset_defaults!
  @default_crawl_delay = MechanizeUtils::AgentConfig.default_crawl_delay
  @default_max_load = MechanizeUtils::AgentConfig.default_max_load
  reset_throttler!
end
.reset_throttler! ⇒ Object
Reset the internal throttler (useful in tests).
# File 'lib/scraper_utils/misc_utils.rb', line 35
def reset_throttler!
  @throttler = nil
end
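Why clearing `@throttler` is enough to reset it can be illustrated with a small, self-contained sketch. It assumes the throttler is built lazily and memoized by an accessor; that accessor is not shown on this page, so `SketchUtils`, its `throttler` method, and the hash standing in for the throttler object are all hypothetical.

```ruby
# Hypothetical stand-in module; NOT the ScraperUtils::MiscUtils source.
module SketchUtils
  class << self
    attr_accessor :default_crawl_delay

    # Drop the memoized object so the next access rebuilds it.
    def reset_throttler!
      @throttler = nil
    end

    # Assumed accessor: builds the throttler on first use and memoizes it.
    # A plain hash stands in for the real throttler object.
    def throttler
      @throttler ||= { crawl_delay: default_crawl_delay }
    end
  end
end

SketchUtils.default_crawl_delay = 1.0
first = SketchUtils.throttler            # built with crawl_delay 1.0

SketchUtils.default_crawl_delay = 5.0    # not seen by the memoized object
SketchUtils.reset_throttler!
second = SketchUtils.throttler           # rebuilt with crawl_delay 5.0
```

Under this assumption, a test that changes a default would call the reset afterwards so that state from an earlier example does not leak into the next one.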
.throttle_block ⇒ Object
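Throttles a block so that scraping stays polite to the servers being scraped. Time spent inside the block (parsing, saving) counts toward the delay. The block's return value is passed through, and any StandardError raised inside the block is re-raised after the throttler has been notified.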
# File 'lib/scraper_utils/misc_utils.rb', line 22
def throttle_block
  throttler.before_request(THROTTLE_HOSTNAME)
  begin
    result = yield
    throttler.after_request(THROTTLE_HOSTNAME)
    result
  rescue StandardError => e
    throttler.after_request(THROTTLE_HOSTNAME, overloaded: HostThrottler.overload_error?(e))
    raise
  end
end
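The before/after pattern above can be tried without the gem using the following minimal sketch. It shows one plausible reading of "time spent inside the block counts toward the delay": the next allowed start time is measured from when the block started, not from when it finished. `SketchThrottler`, `sketch_throttle_block`, and the `clock:` and `sleeper:` parameters are hypothetical names introduced for illustration; the real HostThrottler may compute its delay differently, for example from `default_max_load` or the `overloaded:` flag, neither of which is modelled here.

```ruby
# Hypothetical stand-in for the throttler; NOT the ScraperUtils implementation.
class SketchThrottler
  # clock and sleeper are injectable so the timing can be checked without waiting.
  def initialize(crawl_delay:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @crawl_delay = crawl_delay
    @clock = clock
    @sleeper = sleeper
    @pause_until = nil
    @started_at = nil
  end

  # Time before which the next block will not start (nil before the first block).
  def will_pause_till
    @pause_until
  end

  # Sleep until the pause has expired, then record when the block started.
  def before_request
    wait = @pause_until ? @pause_until - @clock.call : 0
    @sleeper.call(wait) if wait.positive?
    @started_at = @clock.call
  end

  # The delay runs from the block's START, so work done inside the block
  # (parsing, saving) shortens the sleep before the next block.
  def after_request
    @pause_until = @started_at + @crawl_delay
  end
end

# Same shape as throttle_block above, with the throttler passed in explicitly.
def sketch_throttle_block(throttler)
  throttler.before_request
  begin
    result = yield
    throttler.after_request
    result
  rescue StandardError
    throttler.after_request
    raise
  end
end

throttler = SketchThrottler.new(crawl_delay: 0.05)
pages = %w[a b].map do |name|
  sketch_throttle_block(throttler) { "parsed #{name}" }
end
```

In this sketch the second block starts at least 0.05 seconds after the first one started. If the first block had itself taken 0.05 seconds or longer, the second would start without any extra sleep.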
.will_pause_till ⇒ Object
# File 'lib/scraper_utils/misc_utils.rb', line 39
def will_pause_till
  throttler.will_pause_till(THROTTLE_HOSTNAME)
end