Class: FeldtRuby::HtmlDocGetter
Defined in: lib/feldtruby/net/html_doc_getter.rb
Overview
Fetches HTML pages from a site while enforcing a delay between successive get operations, to minimize the risk of “annoying” the site operators.
Instance Method Summary
- #get(url) ⇒ Object
- #get_html_doc(url) ⇒ Object
- #initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0) ⇒ HtmlDocGetter (constructor): a new instance of HtmlDocGetter.
- #set_new_delay ⇒ Object
- #wait_until_delay_passed ⇒ Object
Constructor Details
#initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0) ⇒ HtmlDocGetter
Returns a new instance of HtmlDocGetter.
    # File 'lib/feldtruby/net/html_doc_getter.rb', line 8
    def initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0)
      @min_delay = minTimeBetweenGets
      @delta_delay = maxRandomDelayBetweenGets - @min_delay
      @delay_until = Time.now - 1.0 # Ensure no wait the first time
    end
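The constructor stores a minimum delay and the width of a random window on top of it, and backdates `@delay_until` so the first request is never held up. The sketch below mirrors that bookkeeping in a standalone class. `DelayWindow`, `next_delay`, and `ready?` are hypothetical names introduced here for illustration, and the uniform random draw in `next_delay` is an assumption, since the body of `#set_new_delay` is not shown on this page.

```ruby
# Hypothetical stand-in for the delay bookkeeping; not the FeldtRuby source.
class DelayWindow
  attr_reader :min_delay, :delta_delay, :delay_until

  def initialize(min_time_between_gets = 1.0, max_random_delay_between_gets = 3.0)
    @min_delay = min_time_between_gets
    # Width of the random part that is added on top of the minimum delay.
    @delta_delay = max_random_delay_between_gets - @min_delay
    # Backdate the deadline so the very first request does not wait.
    @delay_until = Time.now - 1.0
  end

  # Assumption: the next delay is drawn uniformly from [min, max).
  def next_delay(rng = Random.new)
    @min_delay + rng.rand * @delta_delay
  end

  # True once the current deadline has passed.
  def ready?(now = Time.now)
    now >= @delay_until
  end
end

window = DelayWindow.new
puts window.delta_delay # 2.0 with the default arguments
puts window.ready?      # true, because the deadline starts in the past
```

With the defaults, this would put successive requests somewhere between 1 and 3 seconds apart, assuming the random part is applied as sketched.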
Instance Method Details
#get(url) ⇒ Object
    # File 'lib/feldtruby/net/html_doc_getter.rb', line 13
    def get(url)
      wait_until_delay_passed()
      begin
        open(url).read
      ensure
        set_new_delay
      end
    end
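Because `set_new_delay` sits in an `ensure` clause, the next deadline is scheduled whether the fetch succeeds or raises. The following sketch illustrates that pattern without touching the network: the fetch step is passed in as a block. `ThrottledCaller` and `call` are hypothetical names, and the bodies of `wait_until_delay_passed` and `set_new_delay` are assumptions, since this page lists those methods without showing their source.

```ruby
# Hypothetical sketch of the wait / fetch / reschedule pattern.
# The fetch step is injected as a block so no network access is needed.
class ThrottledCaller
  attr_reader :delay_until

  def initialize(min_delay = 0.01, max_delay = 0.03)
    @min_delay = min_delay
    @delta_delay = max_delay - min_delay
    @delay_until = Time.now - 1.0 # no wait on the first call
  end

  def call
    wait_until_delay_passed
    begin
      yield
    ensure
      set_new_delay # runs on success and on error alike
    end
  end

  private

  # Assumed behavior: sleep for whatever time remains until the deadline.
  def wait_until_delay_passed
    remaining = @delay_until - Time.now
    sleep(remaining) if remaining > 0
  end

  # Assumed behavior: minimum delay plus a uniformly random extra.
  def set_new_delay
    @delay_until = Time.now + @min_delay + rand * @delta_delay
  end
end

throttled = ThrottledCaller.new
before = throttled.delay_until
begin
  throttled.call { raise IOError, "simulated fetch failure" }
rescue IOError
  # The error still propagates to the caller; only the deadline was updated.
end
puts throttled.delay_until > before # true
```

Note that `open(url)` in the listing above only accepts URLs when open-uri is loaded, and on Ruby 3.0 and later the equivalent call is `URI.open(url)`.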
#get_html_doc(url) ⇒ Object
    # File 'lib/feldtruby/net/html_doc_getter.rb', line 21
    def get_html_doc(url)
      Nokogiri::HTML(get(url))
    end