Class: Nibbler
- Inherits: Object
  - Object
    - Nibbler
- Defined in: lib/nibbler.rb
Overview
A minimalistic, declarative HTML scraper
Direct Known Subclasses
Defined Under Namespace
Classes: JsonDocument
Instance Attribute Summary
- #doc ⇒ Object (readonly)
  Returns the value of attribute doc.
Class Method Summary
- .element(*args, &block) ⇒ Object
  Declare a singular scraping rule.
- .elements(*args, &block) ⇒ Object
  Declare a plural scraping rule.
- .parse(data) ⇒ Object
  Process data by creating a new scraper.
Instance Method Summary
- #initialize(data) ⇒ Nibbler (constructor)
  Initialize the parser with raw data or a document.
- #parse ⇒ Object
  Parse the document and save values returned by selectors.
- #to_hash ⇒ Object
  Dump the extracted data into a hash with symbolized keys.
Constructor Details
#initialize(data) ⇒ Nibbler
Initialize the parser with raw data or a document.
# File 'lib/nibbler.rb', line 23

def initialize(data)
  @doc = self.class.convert_document(data)
  # initialize plural properties
  self.class.rules.each { |name, (s, k, plural)| send("#{name}=", []) if plural }
end
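The effect of the constructor's loop can be sketched without the library: attributes backed by a plural rule start out as empty arrays, while singular ones stay `nil` until parsing. In this minimal sketch, `MiniScraper` and its `RULES` constant are hypothetical stand-ins, and the rule layout `[selector, delegate, plural]` is assumed from the block parameters shown above.

```ruby
# Hypothetical stand-in class; not part of Nibbler itself.
class MiniScraper
  # Assumed rule layout: name => [selector, delegate, plural]
  RULES = { title: ['h1', nil], links: ['a', nil, true] }.freeze

  attr_accessor :title, :links

  def initialize
    # Same pattern as the constructor above: only plural rules get an empty array.
    RULES.each { |name, (_s, _k, plural)| send("#{name}=", []) if plural }
  end
end

scraper = MiniScraper.new
p scraper.links # plural attribute, pre-initialized to an array
p scraper.title # singular attribute, still unset
```

Pre-initializing the plural attributes is what lets a later step append results to them instead of assigning a new array.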
Instance Attribute Details
#doc ⇒ Object (readonly)
Returns the value of attribute doc.
# File 'lib/nibbler.rb', line 3

def doc
  @doc
end
Class Method Details
.element(*args, &block) ⇒ Object
Declare a singular scraping rule.
# File 'lib/nibbler.rb', line 6

def self.element(*args, &block)
  selector, name, delegate = parse_rule_declaration(*args, &block)
  rules[name] = [selector, delegate]
  attr_accessor name
  name
end
.elements(*args, &block) ⇒ Object
Declare a plural scraping rule.
# File 'lib/nibbler.rb', line 14

def self.elements(*args, &block)
  name = element(*args, &block)
  rules[name] << true
end
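How the two declarations share one registry may be easier to see in isolation. The sketch below is not Nibbler's code: `RuleBook` is a hypothetical class, its `rules` method is assumed to be a plain per-class Hash, and the explicit `(selector, name)` signature replaces the `parse_rule_declaration` step, which is not shown on this page.

```ruby
# Hypothetical stand-in illustrating the registry pattern; not Nibbler itself.
class RuleBook
  # Assumption: the registry is a Hash stored on the class.
  def self.rules
    @rules ||= {}
  end

  # Simplified signature (assumption): selector and name passed explicitly.
  def self.element(selector, name)
    rules[name] = [selector, nil] # [selector, delegate]
    attr_accessor name            # reader and writer for the extracted value
    name
  end

  # A plural rule is a singular rule with a trailing `true` flag.
  def self.elements(selector, name)
    rules[element(selector, name)] << true
  end

  element 'h1', :title
  elements 'a', :links
end

p RuleBook.rules
```

Because `element` returns the rule name, `elements` can reuse it and then mark the stored entry as plural by appending a third array element.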
.parse(data) ⇒ Object
Process data by creating a new scraper.
# File 'lib/nibbler.rb', line 20

def self.parse(data) new(data).parse end
Instance Method Details
#parse ⇒ Object
Parse the document and save values returned by selectors.
# File 'lib/nibbler.rb', line 30

def parse
  self.class.rules.each do |target, (selector, delegate, plural)|
    if plural
      send(target).concat @doc.search(selector).map { |i| parse_result(i, delegate) }
    else
      send("#{target}=", parse_result(@doc.at(selector), delegate))
    end
  end
  self
end
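The plural/singular branch above relies only on the document responding to `search` (all matches) and `at` (first match). The following sketch shows that dispatch with a hypothetical `FakeDoc` that looks selectors up in a Hash; it skips the `parse_result` step, whose behavior is not shown on this page, and the sample selectors and values are made up for illustration.

```ruby
# Hypothetical document stand-in: maps a selector string to its matching nodes.
class FakeDoc
  def initialize(nodes)
    @nodes = nodes
  end

  # All matches for a selector (empty array when nothing matches).
  def search(selector)
    @nodes.fetch(selector, [])
  end

  # First match for a selector, or nil.
  def at(selector)
    search(selector).first
  end
end

doc = FakeDoc.new('h1' => ['Title'], 'a' => %w[one two])
rules = { title: ['h1', nil], links: ['a', nil, true] } # assumed rule layout

extracted = {}
rules.each do |target, (selector, _delegate, plural)|
  # Plural rules collect every match; singular rules keep only the first.
  extracted[target] = plural ? doc.search(selector) : doc.at(selector)
end

p extracted
```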
#to_hash ⇒ Object
Dump the extracted data into a hash with symbolized keys.
# File 'lib/nibbler.rb', line 42

def to_hash
  converter = lambda { |obj| obj.respond_to?(:to_hash) ? obj.to_hash : obj }
  self.class.rules.keys.inject({}) do |hash, name|
    value = send(name)
    hash[name.to_sym] = Array === value ? value.map(&converter) : converter[value]
    hash
  end
end
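The conversion step can be tried on its own. In this sketch the `converter` lambda and the `inject` body are copied from the method above, while the `Author` struct and the `values` Hash are hypothetical sample data standing in for a scraper's attributes.

```ruby
# Hypothetical nested object that knows how to dump itself.
Author = Struct.new(:name) do
  def to_hash
    { name: name }
  end
end

# Same lambda as in #to_hash: convert when possible, pass through otherwise.
converter = lambda { |obj| obj.respond_to?(:to_hash) ? obj.to_hash : obj }

# Sample attribute values keyed by string names (assumption for illustration).
values = { 'title' => 'Hello', 'authors' => [Author.new('Ann'), Author.new('Bob')] }

result = values.inject({}) do |hash, (name, value)|
  # Arrays are converted element by element; scalars are converted directly.
  hash[name.to_sym] = Array === value ? value.map(&converter) : converter[value]
  hash
end

p result
```

Objects that respond to `to_hash` are dumped recursively by their own method, so nested scrapers would flatten into nested hashes, and plain values such as strings pass through unchanged.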