Class: Faceoff::Pagelet
- Inherits:
- Object
- Defined in:
- lib/faceoff/pagelet.rb
Overview
The Pagelet class extracts the dynamic content that Facebook embeds in JavaScript on a given HTML page.
Class Method Summary
- .parse(html, type = nil) ⇒ Object
  Parses an HTML string and returns a hash of Nokogiri::HTML::Document objects, indexed by page area: Pagelet.parse html #=> {… => <#OBJ>, :top_bar => <#OBJ>, …}. When type is given, returns only the first matching document instead of a hash. Returns nil if nothing matches.
- .regex_for(name) ⇒ Object
  Returns a regex to retrieve the given pagelet.
Class Method Details
.parse(html, type = nil) ⇒ Object
    # File 'lib/faceoff/pagelet.rb', line 18

    def self.parse html, type=nil
      pagelet = nil

      matches = html.scan regex_for(type)

      matches.each do |name, html|
        html = JSON.parse("[\"#{html}\"]").first
        html_doc = Nokogiri::HTML.parse html

        return html_doc if type

        pagelet ||= {}
        pagelet[name.to_sym] = html_doc
      end

      pagelet
    end
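The JSON.parse call in .parse can look odd at first glance. The captured text is the body of a JavaScript string literal, so it still contains escapes such as \u003c and \/. Wrapping it in ["..."] appears to be a way of letting the JSON parser decode those escapes. The following is a minimal sketch of that one step, using only the standard library; the payload is a made-up sample, not real Facebook output.

```ruby
require "json"

# Hypothetical payload, written as it would appear between the quotes of a
# JavaScript string literal: "<" is \u003c, "/" is \/ and quotes are \".
raw = '\u003cdiv class=\"bar\">Top\u003c\/div>'

# Wrapping the text in ["..."] turns it into a valid JSON document (an array
# holding one string), so the parser resolves the escape sequences.
decoded = JSON.parse("[\"#{raw}\"]").first

puts decoded
```

The decoded string is plain HTML, which is what .parse then hands to Nokogiri::HTML.parse.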
.regex_for(name) ⇒ Object
Returns a regex to retrieve the given pagelet.
    # File 'lib/faceoff/pagelet.rb', line 39

    def self.regex_for name
      name ||= "\\w+"
      %r{<script>.*"pagelet_(#{name})":"(.*)"\},"page_cache":.*\}\);</script>}
    end
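To see what the regex captures, here is a small sketch that runs it against a hand-written script tag. The pagelet_regex method is a hypothetical standalone copy of .regex_for, and the sample markup (including the render call) is invented for illustration; real pages may be shaped differently.

```ruby
# Hypothetical standalone copy of Faceoff::Pagelet.regex_for, so the sketch
# runs without the gem. A nil name falls back to matching any pagelet name.
def pagelet_regex(name = nil)
  name ||= "\\w+"
  %r{<script>.*"pagelet_(#{name})":"(.*)"\},"page_cache":.*\}\);</script>}
end

# Invented sample line; single quotes keep the backslash in <\/div> literal.
sample = '<script>render({"content":{"pagelet_top_bar":"<div>Top<\/div>"},' \
         '"page_cache":true});</script>'

# String#scan returns one [name, escaped_html] pair per match.
all_matches = sample.scan(pagelet_regex)
named       = sample.scan(pagelet_regex("top_bar"))
missing     = sample.scan(pagelet_regex("footer"))

p all_matches
p named
p missing
```

Two things are worth noting under these assumptions. The second capture is still escaped, which is why .parse decodes it before building a document. And because . does not match newlines by default, the pattern only matches when the whole script tag sits on a single line.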