Class: Ninja2k::Scraper
- Inherits:
- Object
  - Object
    - Ninja2k::Scraper
- Defined in:
- lib/ninja2k/scraper.rb
Overview
Scraper loads a specified resource and searches the page using a combination of your selector and any clues given. It provides a hooking mechanism so you can override the default parsing action (split on
, one row for each item found)
Instance Attribute Summary
-
#selector ⇒ String
The xpath selector to use when searching for clues.
-
#url ⇒ String
The url we will scrape from.
Instance Method Summary
-
#add_hook(clue, p_roc) ⇒ Object
Adds a hook to the hook hash.
-
#clues ⇒ Array
The clues we are going to look for with the selector in the document returned by url.
-
#clues=(value) ⇒ Object
Sets the clues for the scraper.
-
#hooks ⇒ Hash
A hash of Proc objects to call when parsing each item found by the selector and clue combination.
- #hooks=(hash) ⇒ Object
-
#initialize(url, selector, options = {}) ⇒ Scraper
constructor
Creates a new Scraper.
-
#output ⇒ Array
The output from scraping, as an array. This is populated by the scrape or to_xlsx methods.
-
#package ⇒ Axlsx::Package
The axlsx package used for xlsx serialization.
-
#scrape ⇒ Array
Scrapes the resource using the clues and hooks provided.
-
#to_xlsx(filename = false) ⇒ Axlsx::Package
Serializes the output to xlsx.
Constructor Details
#initialize(url, selector, options = {}) ⇒ Scraper
Creates a new Scraper
# File 'lib/ninja2k/scraper.rb', line 35
def initialize(url, selector, options = {})
  self.url = url
  self.selector = selector
  # Apply each option only if a matching writer method exists.
  options.each do |o|
    self.send("#{o[0]}=", o[1]) if self.respond_to? "#{o[0]}="
  end
end
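The constructor routes each entry of the options hash to a writer method of the same name and silently skips entries that have no writer. A minimal sketch of that pattern follows. MiniScraper, its attributes, and the :bogus option are hypothetical stand-ins for illustration and are not part of ninja2k.

```ruby
# Hypothetical stand-in class; it only mirrors the option-routing pattern.
class MiniScraper
  attr_accessor :url, :selector, :clues

  def initialize(url, selector, options = {})
    self.url = url
    self.selector = selector
    # Call "<name>=" for every option that has a matching public writer.
    options.each do |name, value|
      writer = "#{name}="
      send(writer, value) if respond_to?(writer)
    end
  end
end

# :clues has a writer and is applied; :bogus has none and is ignored.
scraper = MiniScraper.new('http://example.com', '//td', clues: ['Name'], bogus: 1)
p scraper.clues
```

One consequence of this design is that a misspelled option name raises no error, so it is worth checking the resulting attribute values when an option seems to have no effect.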
Instance Attribute Details
#selector ⇒ String
The xpath selector to use when searching for clues
# File 'lib/ninja2k/scraper.rb', line 49
def selector
  @selector
end
#url ⇒ String
The url we will scrape from
# File 'lib/ninja2k/scraper.rb', line 45
def url
  @url
end
Instance Method Details
#add_hook(clue, p_roc) ⇒ Object
Adds a hook to the hook hash
# File 'lib/ninja2k/scraper.rb', line 80
def add_hook(clue, p_roc)
  hooks[clue] = p_roc
end
#clues ⇒ Array
The clues we are going to look for with the selector in the document returned by url
# File 'lib/ninja2k/scraper.rb', line 114
def clues
  @clues ||= []
end
#clues=(value) ⇒ Object
Sets the clues for the scraper
# File 'lib/ninja2k/scraper.rb', line 121
def clues=(value)
  # ArgumentError was misspelled as ArugmentError, which would raise NameError instead.
  raise ArgumentError, 'clues must be an array of strings to search for with your selector' unless value.is_a?(Array)
  @clues = value
end
#hooks ⇒ Hash
A hash of Proc objects to call when parsing each item found by the selector and clue combination. The element found is passed to the member of this hash that uses the clue as its key.
# File 'lib/ninja2k/scraper.rb', line 65
def hooks
  @hooks ||= {}
end
#hooks=(hash) ⇒ Object
# File 'lib/ninja2k/scraper.rb', line 70
def hooks=(hash)
  raise ArgumentError, 'Hooks must be a hash of procs to call when scraping each clue' unless hash.is_a?(Hash)
  @hooks = hash
end
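The hooks hash maps a clue to a Proc that receives the element found for that clue. The code that invokes the hooks is not shown on this page, so the sketch below assumes a simple dispatch: call the Proc registered for the clue, otherwise fall back to a default. HookRegistry, its parse method, and the default action are hypothetical, and plain strings stand in for the parsed elements.

```ruby
# Hypothetical registry mirroring hooks / add_hook; not the ninja2k code.
class HookRegistry
  def hooks
    @hooks ||= {}
  end

  def add_hook(clue, p_roc)
    hooks[clue] = p_roc
  end

  # Assumed dispatch: use the hook keyed by the clue, else a default action
  # that wraps the element's string form in a one-item array.
  def parse(clue, element)
    hook = hooks.fetch(clue) { ->(el) { [el.to_s] } }
    hook.call(element)
  end
end

registry = HookRegistry.new
# Custom parsing for one clue: split a comma-separated value into items.
registry.add_hook('Tags', ->(el) { el.split(',').map(&:strip) })

tags  = registry.parse('Tags', 'ruby, xlsx , scraping') # uses the hook
plain = registry.parse('Name', 'ninja2k')               # uses the default
```

Keying the hooks by clue keeps custom parsing local to the clues that need it, while every other clue keeps the default behavior.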
#output ⇒ Array
The output from scraping, as an array. This is populated by the scrape or to_xlsx methods.
# File 'lib/ninja2k/scraper.rb', line 55
def output
  @output ||= []
end
#package ⇒ Axlsx::Package
The axlsx package used for xlsx serialization
# File 'lib/ninja2k/scraper.rb', line 129
def package
  @package ||= Axlsx::Package.new
end
#scrape ⇒ Array
Scrapes the resource using the clues and hooks provided.
# File 'lib/ninja2k/scraper.rb', line 87
def scrape
  # Reset state so repeated calls start from a clean package and output.
  @package = nil
  @output = []
  clues.each do |clue|
    if detail = parse_clue(clue)
      # The labelled row takes the last item; the rest get a blank label.
      output << [clue, detail.pop]
      detail.each { |datum| output << ['', datum] }
    end
  end
  output
end
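The row layout produced by scrape can be traced with a stand-in for parse_clue, which is not shown on this page. In the sketch below, scrape_rows and the details hash are hypothetical. Only the loop body follows the listing above.

```ruby
# Builds rows the same way the scrape listing does, using a plain hash
# (clue => array of items, or nil) in place of the unseen parse_clue.
def scrape_rows(clues, details)
  output = []
  clues.each do |clue|
    found = details[clue]
    next unless found

    detail = found.dup                              # copy so pop leaves the input intact
    output << [clue, detail.pop]                    # labelled row takes the LAST item
    detail.each { |datum| output << ['', datum] }   # remaining items, blank label
  end
  output
end

details = { 'Name' => %w[a b c], 'Missing' => nil }
rows = scrape_rows(%w[Name Missing], details)
rows.each { |row| p row }
```

Because the labelled row is built with pop, it carries the last item returned for the clue and the earlier items follow it in their original order. A clue with no result contributes no rows.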
#to_xlsx(filename = false) ⇒ Axlsx::Package
Serializes the output to xlsx. If you do not specify the filename parameter, the package is created but not serialized to disk. This means you can use the return value to stream the data using to_xlsx(false).to_stream.read.
# File 'lib/ninja2k/scraper.rb', line 106
def to_xlsx(filename=false)
  scrape
  serialize(filename)
end