Module: ScraperUtils::MiscUtils

Defined in:
lib/scraper_utils/misc_utils.rb

Overview

Misc Standalone Utilities

Constant Summary

THROTTLE_HOSTNAME =
"block"

Class Attribute Summary

Class Method Summary

Class Attribute Details

.default_crawl_delay ⇒ Object

Returns the value of attribute default_crawl_delay.



# File 'lib/scraper_utils/misc_utils.rb', line 12

def default_crawl_delay
  @default_crawl_delay
end

.default_max_load ⇒ Object

Returns the value of attribute default_max_load.



# File 'lib/scraper_utils/misc_utils.rb', line 12

def default_max_load
  @default_max_load
end

Class Method Details

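.reset_defaults! ⇒ Object

Resets default_crawl_delay and default_max_load to the values from MechanizeUtils::AgentConfig, then resets the internal throttler.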



# File 'lib/scraper_utils/misc_utils.rb', line 14

def reset_defaults!
  @default_crawl_delay = MechanizeUtils::AgentConfig.default_crawl_delay
  @default_max_load = MechanizeUtils::AgentConfig.default_max_load
  reset_throttler!
end

.reset_throttler! ⇒ Object

Reset the internal throttler (useful in tests)



# File 'lib/scraper_utils/misc_utils.rb', line 35

def reset_throttler!
  @throttler = nil
end
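
The following is a minimal sketch of why a reset like this is handy in tests. It assumes the throttler is built lazily and memoised, and that it captures the defaults when it is built; neither detail is shown in the source above. `SketchUtils` and its hash-based "throttler" are hypothetical stand-ins, not the library's code.

```ruby
# Hypothetical module that mirrors the reset pattern; not ScraperUtils itself.
module SketchUtils
  class << self
    attr_accessor :default_crawl_delay

    # Assumption: built on first use and memoised, capturing the current default.
    def throttler
      @throttler ||= { crawl_delay: default_crawl_delay }
    end

    # Same shape as reset_throttler! above: drop the memoised instance.
    def reset_throttler!
      @throttler = nil
    end
  end
end

SketchUtils.default_crawl_delay = 0.5
stale = SketchUtils.throttler          # built with crawl_delay 0.5

SketchUtils.default_crawl_delay = 2.0  # the memoised instance does not see this
SketchUtils.reset_throttler!
fresh = SketchUtils.throttler          # rebuilt with crawl_delay 2.0
```

Under these assumptions, a test that changes a default must reset the throttler before the new value takes effect.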

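.throttle_block ⇒ Object

Throttles a block to be polite to the servers being scraped. Time spent inside the block (parsing, saving) counts toward the delay. Returns the block's result; if the block raises a StandardError, the throttler is told whether it looks like an overload error and the exception is re-raised.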



# File 'lib/scraper_utils/misc_utils.rb', line 22

def throttle_block
  throttler.before_request(THROTTLE_HOSTNAME)
  begin
    result = yield
    throttler.after_request(THROTTLE_HOSTNAME)
    result
  rescue StandardError => e
    throttler.after_request(THROTTLE_HOSTNAME, overloaded: HostThrottler.overload_error?(e))
    raise
  end
end
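
To illustrate "time spent inside the block counts toward the delay", here is a self-contained sketch with a simulated clock. `SketchThrottler`, its `advance` helper, and the policy of measuring the delay from the start of the previous block are all assumptions made for illustration; the real HostThrottler may compute delays differently (for example from response times and max load).

```ruby
# Hypothetical throttler with a fake clock; not the library's HostThrottler.
class SketchThrottler
  attr_reader :waits

  def initialize(crawl_delay)
    @crawl_delay = crawl_delay
    @now = 0.0              # simulated clock, in seconds
    @started_at = nil
    @next_allowed_at = 0.0
    @waits = []             # how long each call had to wait before starting
  end

  # Simulates time passing (stands in for real work or a real sleep).
  def advance(seconds)
    @now += seconds
  end

  def before_request
    wait = [@next_allowed_at - @now, 0.0].max
    @waits << wait
    advance(wait)           # real code would call sleep(wait)
    @started_at = @now
  end

  def after_request
    # Assumption: the delay runs from the start of the block, so work done
    # inside the block uses up part of it.
    @next_allowed_at = @started_at + @crawl_delay
  end

  # Same control flow as throttle_block above, minus the overload flag.
  def throttle_block
    before_request
    begin
      result = yield
      after_request
      result
    rescue StandardError
      after_request
      raise
    end
  end
end

throttler = SketchThrottler.new(1.0)
first = throttler.throttle_block { throttler.advance(0.25); :first }
second = throttler.throttle_block { throttler.advance(0.25); :second }
# throttler.waits is [0.0, 0.75]: the second call waits only the remainder
# of the 1.0s delay, because 0.25s was already spent inside the first block.
```

With the real module, the equivalent call would wrap the fetch and parse step in `ScraperUtils::MiscUtils.throttle_block { ... }`.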

.will_pause_till ⇒ Object

Returns the throttler's will_pause_till value for THROTTLE_HOSTNAME.



# File 'lib/scraper_utils/misc_utils.rb', line 39

def will_pause_till
  throttler.will_pause_till(THROTTLE_HOSTNAME)
end