Module: Html2rss::AutoSource::Scraper
Defined in:
- lib/html2rss/auto_source/scraper.rb
- lib/html2rss/auto_source/scraper/html.rb
- lib/html2rss/auto_source/scraper/schema.rb
- lib/html2rss/auto_source/scraper/schema/thing.rb
- lib/html2rss/auto_source/scraper/semantic_html.rb
- lib/html2rss/auto_source/scraper/schema/item_list.rb
- lib/html2rss/auto_source/scraper/schema/list_item.rb
- lib/html2rss/auto_source/scraper/semantic_html/image.rb
- lib/html2rss/auto_source/scraper/semantic_html/extractor.rb
Overview
The Scraper module contains all scrapers that can be used to extract articles. Each scraper should implement a call method that returns an array of article hashes. Each scraper should also implement an articles? method that returns true if the scraper can potentially be used to extract articles from the given HTML.
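The interface described above can be sketched as follows. This is a minimal illustration, not one of the scrapers shipped with the library: HeadlineScraper, its selector, the FakeNode/FakeBody stand-ins, and the keys of the article hash are all hypothetical, and the sketch assumes that the parsed body exposes a Nokogiri-like css method and that call is an instance method.

```ruby
# Stand-ins for a parsed HTML document (assumption: the real parsed body
# responds to `css` the way a Nokogiri document does).
FakeNode = Struct.new(:text, :href)

class FakeBody
  def initialize(nodes)
    @nodes = nodes
  end

  # Returns the stored nodes; the selector is ignored in this stand-in.
  def css(_selector)
    @nodes
  end
end

# Hypothetical scraper that follows the two-method interface.
class HeadlineScraper
  SELECTOR = 'article h2 a' # hypothetical selector

  # Cheap check: might this scraper find articles in the document?
  def self.articles?(parsed_body)
    parsed_body.css(SELECTOR).any?
  end

  def initialize(parsed_body)
    @parsed_body = parsed_body
  end

  # Returns an array of article hashes (key names are illustrative).
  def call
    @parsed_body.css(SELECTOR).map do |node|
      { title: node.text.strip, url: node.href }
    end
  end
end

body = FakeBody.new([FakeNode.new(' Hello ', '/hello')])
articles = HeadlineScraper.articles?(body) ? HeadlineScraper.new(body).call : []
```

Keeping articles? as a class-level check lets a caller filter candidate scrapers before instantiating any of them.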
Defined Under Namespace
Classes: Html, NoScraperFound, Schema, SemanticHtml
Constant Summary
- SCRAPERS = [Html, Schema, SemanticHtml].freeze
Class Method Summary
- .from(parsed_body) ⇒ Array<Class>
Returns an array of scrapers that claim to find articles in the parsed body.
Class Method Details
.from(parsed_body) ⇒ Array<Class>
Returns an array of scrapers that claim to find articles in the parsed body.
    # File 'lib/html2rss/auto_source/scraper.rb', line 26
    def self.from(parsed_body)
      scrapers = SCRAPERS.select { |scraper| scraper.articles?(parsed_body) }
      raise NoScraperFound, 'No suitable scraper found for URL.' if scrapers.empty?

      scrapers
    end
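A caller of .from has to handle NoScraperFound, because the method raises instead of returning an empty array. The sketch below shows one way to do that. To stay runnable without the gem, it uses a stand-in module (DemoScraper) that mirrors the selection logic above; the two stub scrapers, the scraper_names helper, and the use of a plain string as the parsed body are assumptions made for illustration only.

```ruby
# Stand-in module mirroring the `.from` logic; not the real library module.
module DemoScraper
  class NoScraperFound < StandardError; end

  # Hypothetical scraper: matches when an <article> tag is present.
  class ArticleTagScraper
    def self.articles?(parsed_body)
      parsed_body.include?('<article')
    end
  end

  # Hypothetical scraper: matches when JSON-LD markup is present.
  class JsonLdScraper
    def self.articles?(parsed_body)
      parsed_body.include?('application/ld+json')
    end
  end

  SCRAPERS = [ArticleTagScraper, JsonLdScraper].freeze

  def self.from(parsed_body)
    scrapers = SCRAPERS.select { |scraper| scraper.articles?(parsed_body) }
    raise NoScraperFound, 'No suitable scraper found for URL.' if scrapers.empty?

    scrapers
  end
end

# Returns the names of matching scrapers, or [] when none match.
def scraper_names(parsed_body)
  DemoScraper.from(parsed_body).map(&:name)
rescue DemoScraper::NoScraperFound => e
  warn e.message # report the failure on stderr, then fall back
  []
end

found = scraper_names('<article><h2>Title</h2></article>')
none  = scraper_names('<p>nothing here</p>')
```

Because .from returns every scraper that claims a match, a caller can run all of them and merge the results rather than picking only the first.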