Class: Nibbler

Inherits:
Object
Defined in:
lib/nibbler.rb

Overview

A minimalistic, declarative HTML scraper

Direct Known Subclasses

Article, BlogScraper, NibblerJSON

Defined Under Namespace

Classes: JsonDocument

Constructor Details

#initialize(data) ⇒ Nibbler

Initializes the parser with raw data or an existing document, and sets the attribute of every plural rule to an empty array.

# File 'lib/nibbler.rb', line 23

def initialize(data)
  @doc = self.class.convert_document(data)
  # initialize plural properties
  self.class.rules.each { |name, (s, k, plural)| send("#{name}=", []) if plural }
end
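
The block parameters `|name, (s, k, plural)|` destructure each rule array, and singular rules (which have only two elements) leave `plural` as `nil`. A minimal standalone sketch of that seeding step, assuming a rules hash shaped like the one `.element` and `.elements` build; the `rules` and `seeded` names are hypothetical, and values are collected in a hash instead of being passed to generated setters:

```ruby
# Hypothetical rules table shaped like the one Nibbler.element/.elements build:
# each value is [selector, delegate], with a trailing `true` for plural rules.
rules = {
  title: ["h1", nil],
  tags:  ["a.tag", nil, true]
}

# Mirror of the loop in #initialize: nested block parameters destructure each
# rule array, and a missing third element simply leaves `plural` as nil.
seeded = {}
rules.each do |name, (_selector, _delegate, plural)|
  seeded[name] = [] if plural
end

p seeded
```

Only `:tags` is seeded, so a later `#parse` can append matches with `concat` without first checking for `nil`.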

Instance Attribute Details

#doc ⇒ Object (readonly)

Returns the document (converted from the data passed to the constructor) that the rule selectors are evaluated against.

# File 'lib/nibbler.rb', line 3

def doc
  @doc
end

Class Method Details

.element(*args, &block) ⇒ Object

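Declares a singular scraping rule, stores it as `[selector, delegate]` in the class's rules, defines a reader and writer for it, and returns the rule name.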

# File 'lib/nibbler.rb', line 6

def self.element(*args, &block)
  selector, name, delegate = parse_rule_declaration(*args, &block)
  rules[name] = [selector, delegate]
  attr_accessor name
  name
end

.elements(*args, &block) ⇒ Object

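Declares a plural scraping rule: it registers the rule through `.element`, then appends `true` to the stored rule to mark it as plural.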

# File 'lib/nibbler.rb', line 14

def self.elements(*args, &block)
  name = element(*args, &block)
  rules[name] << true
end
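
The two declaration methods build a per-class registry of rules. A minimal sketch of that pattern, assuming a simplified positional signature `(selector, name, delegate)` because `parse_rule_declaration` is not shown here, and assuming rules are not inherited between classes; `RuleSketch` and `PageSketch` are hypothetical names:

```ruby
# Minimal sketch of a per-class rule registry in the spirit of
# Nibbler.element / Nibbler.elements.
# Assumptions: rules take only (selector, name, delegate) positionally, since
# the real parse_rule_declaration is not shown; rules are not inherited.
class RuleSketch
  # Each class keeps its own hash of name => [selector, delegate(, true)]
  def self.rules
    @rules ||= {}
  end

  # Singular rule: register it and define #name / #name=
  def self.element(selector, name, delegate = nil)
    rules[name] = [selector, delegate]
    attr_accessor name
    name
  end

  # Plural rule: reuse .element, then append the plural flag
  def self.elements(selector, name, delegate = nil)
    name = element(selector, name, delegate)
    rules[name] << true
  end
end

# Hypothetical scraper declared with the sketch
class PageSketch < RuleSketch
  element "h1", :title
  elements "a.tag", :tags
end

p PageSketch.rules
```

Because `@rules` is an instance variable of each class object, `PageSketch` gets its own registry and `RuleSketch.rules` stays empty.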

.parse(data) ⇒ Object

Processes data by creating a new scraper for it and calling `#parse`, which returns the scraper itself.

# File 'lib/nibbler.rb', line 20

def self.parse(data)
  new(data).parse
end

Instance Method Details

#parse ⇒ Object

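Parses the document and saves the values matched by each rule's selector: a plural rule appends every match (`search`), a singular rule stores only the first match (`at`). Returns `self`.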

# File 'lib/nibbler.rb', line 30

def parse
  self.class.rules.each do |target, (selector, delegate, plural)|
    if plural
      send(target).concat @doc.search(selector).map { |i| parse_result(i, delegate) }
    else
      send("#{target}=", parse_result(@doc.at(selector), delegate))
    end
  end
  self
end
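
To see the singular/plural split in isolation, here is a sketch that runs without an HTML library. It assumes a hypothetical `StubDocument` whose `search` and `at` only mimic the shape of Nokogiri's methods, skips `convert_document`, and uses a hypothetical `parse_result` that keeps nodes as-is unless a delegate is given; the real `parse_result` is not shown on this page and may differ:

```ruby
# Hypothetical stand-in for a parsed document: maps a selector string to the
# list of matching "nodes" (plain strings here).
class StubDocument
  def initialize(matches)
    @matches = matches
  end

  # All matches for a selector (same shape as Nokogiri's #search)
  def search(selector)
    @matches.fetch(selector, [])
  end

  # First match or nil (same shape as Nokogiri's #at)
  def at(selector)
    search(selector).first
  end
end

class ParseSketch
  RULES = {
    title: ["h1", nil],          # singular rule
    tags:  ["a.tag", nil, true]  # plural rule (trailing true)
  }.freeze

  attr_accessor :title, :tags

  # Same convenience shape as Nibbler.parse
  def self.parse(doc)
    new(doc).parse
  end

  def initialize(doc)
    @doc = doc
    RULES.each { |name, (_sel, _del, plural)| send("#{name}=", []) if plural }
  end

  def parse
    RULES.each do |target, (selector, delegate, plural)|
      if plural
        # Append every match to the array seeded in #initialize
        send(target).concat(@doc.search(selector).map { |node| parse_result(node, delegate) })
      else
        # Store only the first match, or nil when nothing matches
        send("#{target}=", parse_result(@doc.at(selector), delegate))
      end
    end
    self
  end

  private

  # Assumption: without a delegate the node is kept as-is; with one, the
  # node is handed to the delegate's .parse.
  def parse_result(node, delegate)
    delegate && node ? delegate.parse(node) : node
  end
end

doc = StubDocument.new({ "h1" => ["Hello"], "a.tag" => ["ruby", "scraping"] })
page = ParseSketch.parse(doc)
p page.title
p page.tags
```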

#to_hash ⇒ Object

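Dumps the extracted data into a hash with symbolized keys. Any value that responds to `to_hash` is converted as well, including each element of a plural rule's array.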

# File 'lib/nibbler.rb', line 42

def to_hash
  converter = lambda { |obj| obj.respond_to?(:to_hash) ? obj.to_hash : obj }
  self.class.rules.keys.inject({}) do |hash, name|
    value = send(name)
    hash[name.to_sym] = Array === value ? value.map(&converter) : converter[value]
    hash
  end
end
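
The conversion can be tried on its own. The sketch below applies the same `converter` logic to a plain hash of rule name to extracted value instead of calling the generated readers, and uses `each_with_object` in place of `inject` (equivalent here). `Item` is a hypothetical nested result that defines `to_hash`, standing in for a delegated scraper:

```ruby
# Hypothetical nested result; defining #to_hash lets the converter expand it.
Item = Struct.new(:text) do
  def to_hash
    { text: text }
  end
end

# Same conversion rules as Nibbler#to_hash, applied to a plain
# hash of rule name => extracted value.
def dump(values)
  converter = ->(obj) { obj.respond_to?(:to_hash) ? obj.to_hash : obj }
  values.each_with_object({}) do |(name, value), hash|
    hash[name.to_sym] = value.is_a?(Array) ? value.map(&converter) : converter[value]
  end
end

result = dump({ "title" => "Timeline", "items" => [Item.new("hi"), Item.new("bye")] })
p result
```

Strings and `nil` do not respond to `to_hash`, so they pass through unchanged; only the nested `Item` values are expanded.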