Module: RateBeer::Scraping

Included in:
Beer::Beer, Brewery::BeerList, Brewery::Brewery, Location, Search, Style
Defined in:
lib/ratebeer/scraping.rb

Overview

The Scraping module provides helper methods for scraping pages from RateBeer.com and for processing the results.

Defined Under Namespace

Classes: PageNotFoundError

Instance Attribute Details

#id ⇒ Object (readonly)

Returns the value of attribute id.



# File 'lib/ratebeer/scraping.rb', line 13

def id
  @id
end

Class Method Details

.included(base) ⇒ Object

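Hook run when the module is included in a class. If the class responds to data_keys, a lazy reader is defined for each key; the reader calls the matching scrape_<key> method on first access.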



# File 'lib/ratebeer/scraping.rb', line 16

def self.included(base)
  if base.respond_to?(:data_keys)
    base.data_keys.each do |attr|
      define_method(attr) do
        send("scrape_#{attr}") unless instance_variable_defined?("@#{attr}")
        instance_variable_get("@#{attr}")
      end
    end
  end
end
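
A minimal sketch of how this hook might behave, assuming an including class that defines data_keys before its include line and supplies one scrape_<attr> method per key. LazyAccessors, DemoBeer, scrape_name, and scrape_calls are hypothetical names used only for illustration; they are not part of the library.

```ruby
# Hypothetical stand-in for RateBeer::Scraping, reduced to the inclusion hook.
module LazyAccessors
  def self.included(base)
    return unless base.respond_to?(:data_keys)

    base.data_keys.each do |attr|
      # Define a reader that triggers a scrape on first access only.
      define_method(attr) do
        send("scrape_#{attr}") unless instance_variable_defined?("@#{attr}")
        instance_variable_get("@#{attr}")
      end
    end
  end
end

# Hypothetical entity; data_keys must exist before the include line runs.
class DemoBeer
  def self.data_keys
    [:name]
  end

  include LazyAccessors

  attr_reader :scrape_calls

  def initialize
    @scrape_calls = 0
  end

  private

  # Stands in for a real page fetch.
  def scrape_name
    @scrape_calls += 1
    @name = "Demo Ale"
  end
end

beer = DemoBeer.new
beer.name
beer.name
puts beer.scrape_calls
```

In this sketch the second call to name reuses the stored instance variable, so the scrape method runs once. Note that define_method is called on the module itself, so the generated readers live on the module rather than on each including class.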

.nbsp ⇒ Object

Emulate the &nbsp; character for stripping, substitution, etc.



# File 'lib/ratebeer/scraping.rb', line 115

def nbsp
  Nokogiri::HTML("&nbsp;").text
end

.noko_doc(url) ⇒ Object

Create a Nokogiri document from a URL. Raises PageNotFoundError if the request fails with an HTTP error.



# File 'lib/ratebeer/scraping.rb', line 103

def noko_doc(url)
  # URI.open comes from open-uri; Kernel#open no longer accepts URLs in Ruby 3.0+.
  Nokogiri::HTML(URI.open(url).read)
rescue OpenURI::HTTPError
  raise PageNotFoundError.new("Page not found - #{url}")
end

Instance Method Details

#==(other_entity) ⇒ Object



# File 'lib/ratebeer/scraping.rb', line 53

def ==(other_entity)
  other_entity.is_a?(self.class) && id == other_entity.id
end
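
The comparison above can be illustrated with a small sketch. DemoEntity and OtherEntity are hypothetical classes; DemoEntity copies the #== definition shown here, and the ID values are made up.

```ruby
# Hypothetical class that copies the #== definition shown above.
class DemoEntity
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def ==(other_entity)
    other_entity.is_a?(self.class) && id == other_entity.id
  end
end

# Unrelated hypothetical type that also exposes an id.
OtherEntity = Struct.new(:id)

same      = DemoEntity.new(1) == DemoEntity.new(1)    # same class, same id
different = DemoEntity.new(1) == DemoEntity.new(2)    # same class, different id
mixed_id  = DemoEntity.new(1) == DemoEntity.new("1")  # Integer id vs String id
foreign   = DemoEntity.new(1) == OtherEntity.new(1)   # unrelated class

p [same, different, mixed_id, foreign]
```

Only the first comparison is true. Since the constructor accepts either an Integer or a String ID, an entity built with 1 does not equal one built with "1". As far as this page shows, only #== is overridden, so Hash lookups and Array#uniq (which rely on #eql? and #hash) would still treat two entities with the same ID as distinct.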

#fix_characters(string) ⇒ Object

Fix characters in string scraped from website.

This method substitutes problematic characters found in strings scraped from RateBeer.com, collapses runs of spaces, and strips leading and trailing whitespace.



# File 'lib/ratebeer/scraping.rb', line 132

def fix_characters(string)
  string = string.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
  characters = { nbsp     => " ",
                 "\u0093" => "ž",
                 "\u0092" => "'",
                 "\u0096" => "",
                 / {2,}/ => " " }
  characters.each { |c, r| string.gsub!(c, r) }
  string.strip
end
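
A reduced sketch of the same clean-up steps, assuming that .nbsp evaluates to the non-breaking space character U+00A0. NBSP and tidy are hypothetical names, and only two of the substitutions are reproduced.

```ruby
# Assumption: .nbsp evaluates to the non-breaking space character U+00A0.
NBSP = "\u00A0"

# Hypothetical reduced version of fix_characters (two substitutions only).
def tidy(string)
  # Drop any bytes that cannot be represented in UTF-8.
  string = string.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
  replacements = { NBSP => " ", / {2,}/ => " " }
  # Hash order matters: non-breaking spaces become spaces before runs are collapsed.
  replacements.each { |c, r| string.gsub!(c, r) }
  string.strip
end

raw     = "\u00A0Pale\u00A0\u00A0Ale  "
cleaned = tidy(raw)
puts cleaned.inspect
```

The non-breaking space is converted first because String#strip does not treat U+00A0 as whitespace; stripping alone would leave it in place.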

#full_details ⇒ Object

Return full details of the scraped entity in a Hash.



# File 'lib/ratebeer/scraping.rb', line 70

def full_details
  data = self.class
             .data_keys
             .map { |k| [k, send(k)] }
             .to_h
  { id:   id,
    url:  url }.merge(data)
end
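
A sketch of the shape of the returned Hash, assuming a class with a class-level data_keys list and one reader per key. DemoBrewery, its keys, and its values are hypothetical and the path is made up.

```ruby
# Hypothetical entity exposing the pieces full_details relies on:
# a class-level data_keys list plus id, url, and one reader per key.
class DemoBrewery
  attr_reader :id, :url, :name, :established

  def self.data_keys
    [:name, :established]
  end

  def initialize(id)
    @id          = id
    @url         = "/brewers/demo/#{id}/" # made-up path
    @name        = "Demo Brewing"
    @established = 1999
  end

  def full_details
    # Pair each key with the value of its reader, then prepend id and url.
    data = self.class.data_keys.map { |k| [k, send(k)] }.to_h
    { id: id, url: url }.merge(data)
  end
end

details = DemoBrewery.new(12).full_details
p details
```

In the real module each reader is lazy, so calling full_details would presumably trigger a scrape for every key that has not been loaded yet.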

#id_from_link(node) ⇒ Object

Extracts an ID# from an <a> element containing a link to an entity.



# File 'lib/ratebeer/scraping.rb', line 64

def id_from_link(node)
  node.attribute('href').value.split('/').last.to_i
end
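
The extraction step can be sketched on a plain href string, without a Nokogiri node. id_from_href is a hypothetical helper, and the paths are made up; the real URL layout on RateBeer.com is not confirmed by this page.

```ruby
# Hypothetical helper: the same split/last/to_i chain, applied to a plain href string.
def id_from_href(href)
  href.split('/').last.to_i
end

# The paths below are made up for illustration.
with_trailing_slash = id_from_href("/beer/demo-ale/1234/")
without_slash       = id_from_href("/brewers/demo-brewing/99")
non_numeric         = id_from_href("/about")

p [with_trailing_slash, without_slash, non_numeric]
```

String#split drops trailing empty fields, so a trailing slash does not affect the result. A link whose last segment is not numeric yields 0 rather than raising, because String#to_i returns 0 for non-numeric text.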

#initialize(id, name: nil, **options) ⇒ Object

Create an instance of a class that includes RateBeer::Scraping.

Requires an ID#, and optionally accepts a name and additional options.

Parameters:

  • id (Integer, String)

    ID# of the entity which is to be retrieved

  • name (String) (defaults to: nil)

    Name of the entity to which ID# relates if known

  • options (Hash)

    Additional attributes, each stored as an instance variable on the new entity



# File 'lib/ratebeer/scraping.rb', line 35

def initialize(id, name: nil, **options)
  @id   = id
  @name = name unless name.nil?
  options.each do |k, v|
    instance_variable_set("@#{k}", v)
  end
end

#inspect ⇒ Object



# File 'lib/ratebeer/scraping.rb', line 43

def inspect
  val = "#<#{self.class} ##{@id}"
  val << " - #{@name}" if instance_variable_defined?("@name")
  val << ">"
end
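
A short sketch of how the constructor and #inspect fit together. DemoRecord is a hypothetical class that copies both definitions; the style option is a made-up extra attribute.

```ruby
# Hypothetical class copying the constructor and #inspect shown on this page.
class DemoRecord
  def initialize(id, name: nil, **options)
    @id   = id
    @name = name unless name.nil?
    # Every extra keyword becomes an instance variable of the same name.
    options.each { |k, v| instance_variable_set("@#{k}", v) }
  end

  def inspect
    val = "#<#{self.class} ##{@id}"
    val << " - #{@name}" if instance_variable_defined?("@name")
    val << ">"
  end
end

named   = DemoRecord.new(42, name: "Demo Ale", style: "IPA")
unnamed = DemoRecord.new(7)

puts named.inspect
puts unnamed.inspect
p named.instance_variable_get(:@style)
```

Because @name is only assigned when a name is given, #inspect omits the name part for entities created from an ID alone.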

#page_count(doc) ⇒ Integer

Determine the number of pages in a document.

Parameters:

  • doc (Nokogiri::HTML::Document)

    Nokogiri document to test for pagination

Returns:

  • (Integer, nil)

    Number of pages in the document, or nil if the document has no pagination element



# File 'lib/ratebeer/scraping.rb', line 93

def page_count(doc)
  doc.at_css('.pagination') && doc.at_css('.pagination')
                                  .css('b')
                                  .map(&:text)
                                  .map(&:to_i)
                                  .max
end

#pagination?(doc) ⇒ Boolean

Determine if data is paginated, or not.

Parameters:

  • doc (Nokogiri::HTML::Document)

    Nokogiri document to test for pagination

Returns:

  • (Boolean)

    true, if paginated, else false



# File 'lib/ratebeer/scraping.rb', line 84

def pagination?(doc)
  !page_count(doc).nil?
end

#post_request(url, params) ⇒ Object

Make POST request to RateBeer form. Return a Nokogiri doc.



# File 'lib/ratebeer/scraping.rb', line 145

def post_request(url, params)
  res = Net::HTTP.post_form(url, params)
  Nokogiri::HTML(res.body)
end

#symbolize_text(text) ⇒ Object

Convert a text key to a symbol: the text is downcased, spaces become underscores, and periods are removed.



# File 'lib/ratebeer/scraping.rb', line 123

def symbolize_text(text)
  text.downcase.gsub(' ', '_').gsub('.', '').to_sym
end
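
A quick illustration of the conversion, using the method body shown above as a standalone function. The labels are made-up examples of the kind of headings a scraped page might contain.

```ruby
def symbolize_text(text)
  text.downcase.gsub(' ', '_').gsub('.', '').to_sym
end

# Labels below are made-up examples.
calories = symbolize_text("Est. Calories")
abv      = symbolize_text("ABV")
rating   = symbolize_text("Mean Rating")

p [calories, abv, rating]
```

Only spaces and periods are handled, so other punctuation in a label would be carried into the symbol unchanged.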

#to_s ⇒ Object



# File 'lib/ratebeer/scraping.rb', line 49

def to_s
  inspect
end

#url ⇒ Object



# File 'lib/ratebeer/scraping.rb', line 57

def url
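  # The second argument to respond_to? is include_all; pass true explicitly so private helpers are found.
  @url ||= if respond_to?("#{demodularized_class_name.downcase}_url", true)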
             send("#{demodularized_class_name.downcase}_url", id)
           end
end
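
A sketch of the name-based dispatch that #url appears to rely on. Neither demodularized_class_name nor the per-class *_url helpers are shown on this page, so both are assumptions here: the former is assumed to return the class name without its namespace, and beer_url is a hypothetical helper returning a made-up address.

```ruby
module Demo
  # Hypothetical entity; the two private helpers below are assumptions.
  class Beer
    attr_reader :id

    def initialize(id)
      @id = id
    end

    def url
      helper = "#{demodularized_class_name.downcase}_url"
      # Memoize the result; stays nil when no matching helper exists.
      @url ||= if respond_to?(helper, true)
                 send(helper, id)
               end
    end

    private

    # Assumed behavior: "Demo::Beer" becomes "Beer".
    def demodularized_class_name
      self.class.name.split('::').last
    end

    # Hypothetical URL builder with a placeholder host.
    def beer_url(id)
      "https://example.invalid/beer/#{id}"
    end
  end
end

beer_address = Demo::Beer.new(5).url
puts beer_address
```

Under these assumptions, a class named Beer looks for beer_url, a class named Brewery would look for brewery_url, and a class without a matching helper gets nil.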