Class: FeldtRuby::HtmlDocGetter

Inherits:
Object
Defined in:
lib/feldtruby/net/html_doc_getter.rb

Overview

Fetches HTML pages from a site while enforcing a delay between subsequent get operations, to minimize the risk of “annoying” the site operators.

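Instance Method Summary

  • #get(url) ⇒ Object
  • #get_html_doc(url) ⇒ Object
  • #initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0) ⇒ HtmlDocGetter (constructor): Returns a new instance of HtmlDocGetter.
  • #set_new_delay ⇒ Object
  • #wait_until_delay_passed ⇒ Object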

Constructor Details

#initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0) ⇒ HtmlDocGetter

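Returns a new instance of HtmlDocGetter. After each get, the next get may only start once a random delay has passed. That delay is drawn uniformly between minTimeBetweenGets and maxRandomDelayBetweenGets seconds. Despite its name, maxRandomDelayBetweenGets is the upper bound on the total delay, not on the random part alone.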



# File 'lib/feldtruby/net/html_doc_getter.rb', line 8

def initialize(minTimeBetweenGets = 1.0, maxRandomDelayBetweenGets = 3.0)
  @min_delay = minTimeBetweenGets
  # Width of the random window; the total delay lies in [min, max).
  @delta_delay = maxRandomDelayBetweenGets - @min_delay
  @delay_until = Time.now - 1.0 # in the past, so the first get does not wait
end

Instance Method Details

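#get(url) ⇒ Object

Waits until the delay scheduled by the previous get has passed, then fetches url and returns the response body as a String. A new random delay is scheduled afterwards, even if the request raised an exception.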



# File 'lib/feldtruby/net/html_doc_getter.rb', line 13

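def get(url)
  wait_until_delay_passed
  begin
    # URI.open is provided by open-uri (require 'open-uri'); since Ruby 3.0,
    # plain Kernel#open no longer fetches URLs. The block form closes the stream.
    URI.open(url, &:read)
  ensure
    set_new_delay # reschedule even if the request raised
  end
end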
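
The wait / fetch / reschedule flow of #get can be exercised offline with the sketch below. ThrottledFetcher is a hypothetical stand-in, not part of FeldtRuby: the network call is replaced by an injected lambda (an assumption made so the behaviour can be checked without a server), and the delays are shrunk to fractions of a second. It shows two properties of the original method: consecutive gets are spaced by at least the minimum delay, and the ensure clause reschedules the next get even when the fetch raises.

```ruby
# Hypothetical stand-in for HtmlDocGetter#get: the fetch step is injected as a
# lambda so the throttling can be exercised without network access.
class ThrottledFetcher
  attr_reader :delay_until

  def initialize(min_delay, max_delay, fetch)
    @min_delay = min_delay
    @delta_delay = max_delay - min_delay
    @fetch = fetch
    @delay_until = Time.now - 1.0 # in the past: the first call does not wait
  end

  def get(url)
    wait_until_delay_passed
    begin
      @fetch.call(url) # value of the begin block is the return value of get
    ensure
      # Runs on success and on exception, so a failing request still
      # pushes the next request back.
      @delay_until = Time.now + @min_delay + rand * @delta_delay
    end
  end

  private

  def wait_until_delay_passed
    now = Time.now
    sleep(@delay_until - now) if now < @delay_until
  end
end

stamps = []
fetcher = ThrottledFetcher.new(0.05, 0.1, ->(url) { stamps << Time.now; "body of #{url}" })
first = fetcher.get("a")
fetcher.get("b")
gap = stamps[1] - stamps[0] # seconds between the two fetches

failing = ThrottledFetcher.new(0.05, 0.1, ->(_url) { raise IOError, "boom" })
begin
  failing.get("x")
rescue IOError
  # The ensure clause has already run, so the next call would still wait.
end
```

Injecting the fetch step is only a testing convenience here; the original class calls the network directly.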

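#get_html_doc(url) ⇒ Object

Fetches url with #get (so the same throttling applies) and parses the body with Nokogiri::HTML, returning the parsed document.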



# File 'lib/feldtruby/net/html_doc_getter.rb', line 21

def get_html_doc(url)
  # Requires the nokogiri gem (require 'nokogiri').
  Nokogiri::HTML(get(url))
end

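#set_new_delay ⇒ Object

Schedules the earliest start time for the next get: the current time plus a random delay drawn uniformly from [minTimeBetweenGets, maxRandomDelayBetweenGets).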



# File 'lib/feldtruby/net/html_doc_getter.rb', line 28

def set_new_delay
  # rand is uniform on [0.0, 1.0), so the delay lies in [min, min + delta).
  @delay_until = Time.now + (@min_delay + rand * @delta_delay)
end
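
The delay formula in #set_new_delay can be checked by sampling it. The sketch below pulls the arithmetic into a hypothetical pure function, next_delay (not part of the library), and draws many delays with the default constructor arguments 1.0 and 3.0 using a seeded Random so the run is reproducible.

```ruby
# Hypothetical helper: the same arithmetic as set_new_delay, minus the
# Time.now offset, so the delay window can be sampled without sleeping.
def next_delay(min_delay, max_delay, rng = Random.new)
  delta = max_delay - min_delay   # corresponds to @delta_delay
  min_delay + rng.rand * delta    # rng.rand is uniform on [0.0, 1.0)
end

rng = Random.new(42) # fixed seed so the sample is reproducible
samples = Array.new(10_000) { next_delay(1.0, 3.0, rng) }
p samples.minmax # smallest and largest sampled delay, in seconds
```

One consequence of the formula: if maxRandomDelayBetweenGets is smaller than minTimeBetweenGets, the delta becomes negative and the delays fall between the two values, below the intended minimum. For example, next_delay(2.0, 1.0) yields values in (1.0, 2.0].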

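#wait_until_delay_passed ⇒ Object

Sleeps until the scheduled start time if it is still in the future; returns immediately otherwise.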



# File 'lib/feldtruby/net/html_doc_getter.rb', line 24

def wait_until_delay_passed
  now = Time.now
  # Sleep only for whatever remains of the delay window, if anything.
  sleep(@delay_until - now) if now < @delay_until
end
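
Because the delay is measured from the end of the previous get, time the caller spends between calls (for example parsing the previous page) counts toward it, and #wait_until_delay_passed only sleeps for the remainder. The sketch below makes this visible with a hypothetical variant, wait_until, that takes the clock and the sleeper as injected lambdas (an assumption for testability, not the library's API), so no real time passes.

```ruby
# Hypothetical, testable variant of wait_until_delay_passed: the clock and the
# sleeper are injected, so no real time passes while checking the arithmetic.
def remaining_wait(delay_until, now)
  now < delay_until ? delay_until - now : 0.0
end

def wait_until(delay_until, clock: -> { Time.now }, sleeper: ->(s) { sleep(s) })
  wait = remaining_wait(delay_until, clock.call)
  sleeper.call(wait) if wait > 0
  wait
end

t0 = Time.at(1_000)
slept = []
fake_sleep = ->(s) { slept << s }

# Next get allowed at t0 + 2.5 s and the caller is already at t0 + 1.0 s,
# so only 1.5 s are left to wait.
w1 = wait_until(t0 + 2.5, clock: -> { t0 + 1.0 }, sleeper: fake_sleep)

# Deadline already passed: no sleep at all.
w2 = wait_until(t0 + 2.5, clock: -> { t0 + 4.0 }, sleeper: fake_sleep)
```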