Misc Utilities

Throttling Requests

Use ScraperUtils::MiscUtils.throttle_block to automatically pace requests based on server response time:

response = ScraperUtils::MiscUtils.throttle_block do
  HTTParty.get(url)
end
# process response

The throttle automatically:

  • Measures the block's execution time
  • Adds a 0.5s delay (configurable via extra_delay:)
  • Pauses before the next request, based on the previous block's timing
  • Caps the pause at a maximum of 120s
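
To make the behaviour above concrete, here is a minimal sketch of how such a throttle could work. It is not the ScraperUtils source: the ThrottleSketch module and MAX_PAUSE constant are hypothetical names used only for illustration, and the exact arithmetic (block time plus extra delay) is an assumption drawn from the list above.

```ruby
# Hypothetical stand-in for illustration only; NOT the ScraperUtils implementation.
module ThrottleSketch
  MAX_PAUSE = 120.0 # seconds, matching the cap described above

  class << self
    # Pause (in seconds) to apply before the next block runs.
    attr_accessor :pause_duration

    def throttle_block(extra_delay: 0.5)
      # Wait out the pause computed from the previous call, if there is one.
      sleep(pause_duration) if pause_duration&.positive?

      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = yield
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

      # Assumed rule: next pause = block time + extra delay, capped at MAX_PAUSE.
      self.pause_duration = (elapsed + extra_delay).clamp(0.0, MAX_PAUSE)
      result
    end
  end
end

# The block's return value is passed straight through to the caller.
value = ThrottleSketch.throttle_block(extra_delay: 0.01) { :fetched }
```

Under these assumptions a slow server earns itself a proportionally longer pause before the next request, while a fast one is delayed by little more than the extra delay.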

Override the next pause duration (in seconds) manually if needed:

ScraperUtils::MiscUtils.pause_duration = 2.0

Note: the agent returned by ScraperUtils::MechanizeUtils.mechanize_agent applies throttling automatically on every request, so its calls do not need to be wrapped in throttle_block.