Class: Grubby::Scraper

Inherits:
Object
Defined in:
lib/grubby/scraper.rb

Direct Known Subclasses

JsonScraper, PageScraper

Defined Under Namespace

Classes: Error

Constructor Details

#initialize(source) ⇒ Scraper

Returns a new instance of Scraper.

Parameters:

  • source

Raises:

  • (Error)

    if one or more fields fail to scrape

# File 'lib/grubby/scraper.rb', line 60

def initialize(source)
  @source = source
  @scraped = {}
  @errors = {}

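  # Invoke every field reader so that all values are scraped eagerly. A reader
  # records its own failure in @errors before raising, so the error is
  # deliberately swallowed here and reported in aggregate below.
  self.class.fields.each do |field|
    begin
      self.send(field)
    rescue RuntimeError
    end
  end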

  unless @errors.empty?
    listing = @errors.map do |field, error|
      error_class = " (#{error.class})" unless error.class == RuntimeError
      error_trace = error.backtrace.join("\n").indent(2)
      "* #{field} -- #{error.message}#{error_class}\n#{error_trace}"
    end
    raise Error.new("Failed to scrape the following fields:\n#{listing.join("\n")}")
  end
end
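
The constructor collects every failing field before raising, so one exception reports all problems at once. The following is a minimal sketch of that behavior, not the library itself: `SketchScraper` is a hypothetical stand-in condensed from the listings on this page so that the example runs without the gem, `Error` is assumed to inherit from `RuntimeError`, and logging, backtraces, and the error-class suffix are omitted.

```ruby
# Hypothetical stand-in for Grubby::Scraper, condensed from the listings on
# this page. Assumption: Error inherits from RuntimeError.
class SketchScraper
  class Error < RuntimeError; end

  def self.fields
    @fields ||= []
  end

  def self.scrapes(field, optional: false, &block)
    field = field.to_sym
    fields << field
    define_method(field) do
      return @scraped[field] if @scraped.key?(field)
      unless @errors.key?(field)
        begin
          value = instance_eval(&block)
          raise "`#{field}` cannot be nil" if value.nil? && !optional
          @scraped[field] = value
        rescue RuntimeError => e
          @errors[field] = e
        end
      end
      raise "`#{field}` raised a #{@errors[field].class}" if @errors.key?(field)
      @scraped[field]
    end
  end

  attr_reader :source

  def initialize(source)
    @source = source
    @scraped = {}
    @errors = {}
    self.class.fields.each do |field|
      begin
        send(field)
      rescue RuntimeError
        # Already recorded in @errors by the reader.
      end
    end
    return if @errors.empty?
    # Backtrace and error-class suffix are left out of this sketch.
    listing = @errors.map { |field, error| "* #{field} -- #{error.message}" }
    raise Error, "Failed to scrape the following fields:\n#{listing.join("\n")}"
  end
end

# Hypothetical subclass; a plain Hash plays the role of the source.
class ProductScraper < SketchScraper
  scrapes(:name) { source[:name] }
  scrapes(:price) { source[:price] }
  scrapes(:label) { "#{name}: #{price}" } # depends on two other fields
end

ok = ProductScraper.new({ name: "Widget", price: 5 })
puts ok.label

begin
  ProductScraper.new({ name: "Widget" }) # :price is missing
rescue SketchScraper::Error => e
  message = e.message
end
puts message
```

In the failing case both `price` and `label` are listed: `price` because its block returned nil, and `label` because it called the `price` reader, which raised.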

Instance Attribute Details

#source ⇒ Object (readonly)

Returns the source being scraped. Typically a Mechanize pluggable parser such as Mechanize::Page.

Returns:

  • (Object)

    The source being scraped. Typically a Mechanize pluggable parser such as Mechanize::Page.



# File 'lib/grubby/scraper.rb', line 55

def source
  @source
end

Class Method Details

.fields ⇒ Array<Symbol>

Returns the names of all scraped values, as defined by scrapes.

Returns:

  • (Array<Symbol>)

    The names of all scraped values, as defined by scrapes.



# File 'lib/grubby/scraper.rb', line 48

def self.fields
  @fields ||= []
end

.scrapes(field, optional: false) { ... } ⇒ Object

Defines an attribute reader method named by field. During initialize, the given block is evaluated in the context of the scraper instance, and the attribute is set to the block’s return value. By default, if the block’s return value is nil, an exception will be raised. To permit a nil value, set optional to true.

Parameters:

  • field (Symbol, String)

    name of the scraped value

  • optional (Boolean) (defaults to: false)

    whether to permit a nil scraped value

Yields:

  • scrapes the value

Yield Returns:

  • (Object)

    scraped value



# File 'lib/grubby/scraper.rb', line 20

def self.scrapes(field, optional: false, &block)
  field = field.to_sym
  self.fields << field

  define_method(field) do
    return @scraped[field] if @scraped.key?(field)

    unless @errors.key?(field)
      begin
        value = instance_eval(&block)
        if value.nil?
          raise "`#{field}` cannot be nil" unless optional
          $log.debug("Scraped nil value for #{self.class}##{field}")
        end
        @scraped[field] = value
      rescue RuntimeError => e
        @errors[field] = e
      end
    end

    raise "`#{field}` raised a #{@errors[field].class}" if @errors.key?(field)

    @scraped[field]
  end
end
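
The reader defined by scrapes caches its result, so a field's block runs at most once per instance even when other fields call it. The sketch below illustrates that together with `optional: true`. `ReaderSketch` is a hypothetical stand-in that keeps only the reader logic from the listing (no error collection or logging), and the `calls` counter exists purely for the demonstration.

```ruby
# Hypothetical stand-in: only the reader logic of Grubby::Scraper.scrapes,
# without error collection or logging.
class ReaderSketch
  def self.fields
    @fields ||= []
  end

  def self.scrapes(field, optional: false, &block)
    field = field.to_sym
    fields << field
    define_method(field) do
      # Cached values are returned without re-running the block.
      return @scraped[field] if @scraped.key?(field)
      value = instance_eval(&block)
      raise "`#{field}` cannot be nil" if value.nil? && !optional
      @scraped[field] = value
    end
  end

  attr_reader :source, :calls

  def initialize(source)
    @source = source
    @scraped = {}
    @calls = 0 # demonstration-only counter
    self.class.fields.each { |field| send(field) }
  end
end

class ArticleScraper < ReaderSketch
  # A String name is converted to the Symbol :title.
  scrapes("title") { @calls += 1; source[:title] }
  # nil is permitted here because of optional: true.
  scrapes(:subtitle, optional: true) { source[:subtitle] }
  # Blocks can call other readers; the cached title is reused.
  scrapes(:slug) { title.downcase.tr(" ", "-") }
end

article = ArticleScraper.new({ title: "Hello World" })
puts article.title
puts article.subtitle.inspect
puts article.slug
puts article.calls
```

Because the block runs through `instance_eval`, it can reach `source`, instance variables, and the other field readers directly.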

Instance Method Details

#[](field) ⇒ Object

Returns the scraped value named by field.

Parameters:

  • field (Symbol, String)

Returns:

  • (Object)

Raises:

  • (KeyError)

    if field is not a valid name



# File 'lib/grubby/scraper.rb', line 88

def [](field)
  @scraped.fetch(field.to_sym)
end
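
The listing delegates to `Hash#fetch`, so the lookup behavior can be sketched with a plain Hash. The `scraped` hash below is a hypothetical stand-in for the internal `@scraped` store. Note that `fetch` raises `KeyError` for an unknown name, and `KeyError` descends from `IndexError` rather than `RuntimeError`, so a `rescue RuntimeError` clause would not catch it.

```ruby
# Hypothetical stand-in for the scraper's internal @scraped hash.
scraped = { title: "Hello", subtitle: nil }

# String names are converted to Symbols before the lookup.
title = scraped.fetch("title".to_sym)

# A stored nil (an optional field) is returned as-is; the key exists.
subtitle = scraped.fetch(:subtitle)

begin
  scraped.fetch(:missing)
rescue KeyError => e
  error = e
end

puts title
puts subtitle.inspect
puts error.class
puts error.is_a?(RuntimeError)
```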

#to_h ⇒ Hash<Symbol, Object>

Returns all scraped values as a Hash.

Returns:

  • (Hash<Symbol, Object>)


# File 'lib/grubby/scraper.rb', line 95

def to_h
  @scraped.dup
end
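
Since the listing returns `@scraped.dup`, the result is a shallow copy: changing the keys of the returned Hash does not affect the scraper, but the values themselves are shared. A small sketch with a plain Hash standing in for `@scraped` (the field names are hypothetical):

```ruby
# Hypothetical stand-in for the scraper's internal @scraped hash.
scraped = { title: "Hello", tags: ["ruby"] }
copy = scraped.dup

copy[:title] = "Changed"   # affects only the copy
copy[:tags] << "scraping"  # the Array is shared, so both hashes see it

puts scraped[:title]
puts scraped[:tags].inspect
```

Callers that intend to mutate the scraped values, rather than just the Hash, may want to copy those values as well.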