Module: Html2rss::AutoSource::Scraper
- Defined in:
  - lib/html2rss/auto_source/scraper.rb
  - lib/html2rss/auto_source/scraper/html.rb
  - lib/html2rss/auto_source/scraper/schema.rb
  - lib/html2rss/auto_source/scraper/schema/base.rb
  - lib/html2rss/auto_source/scraper/semantic_html.rb
  - lib/html2rss/auto_source/scraper/semantic_html/image.rb
  - lib/html2rss/auto_source/scraper/semantic_html/extractor.rb
Overview
The Scraper module contains all scrapers that can be used to extract articles. Each scraper should implement a `call` method that returns an array of article hashes. Each scraper should also implement an `articles?` method, called on the scraper class with the parsed body, that returns true if the scraper can potentially extract articles from the given HTML.
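A minimal sketch of a scraper that follows this contract is shown below. It is illustrative only: `LinkListScraper`, its constructor signature, and the `:url`/`:title` hash keys are hypothetical, and `parsed_body` is a plain String here rather than the parsed HTML document the real scrapers receive. The regex-based extraction stands in for whatever parsing a real scraper does.

```ruby
# Hypothetical scraper illustrating the articles?/call contract.
# Assumption: parsed_body is a plain String, not a parsed HTML document.
class LinkListScraper
  # Class-level predicate: could this scraper find articles in the body?
  def self.articles?(parsed_body)
    parsed_body.include?('<a ')
  end

  def initialize(parsed_body)
    @parsed_body = parsed_body
  end

  # Returns an array of article hashes (keys are illustrative only).
  def call
    @parsed_body.scan(%r{<a href="([^"]+)">([^<]*)</a>}).map do |url, title|
      { url: url, title: title }
    end
  end
end

html = '<a href="/one">First</a> <a href="/two">Second</a>'
LinkListScraper.articles?(html) # => true
articles = LinkListScraper.new(html).call
# articles holds one hash for "/one" and one for "/two"
```

Keeping the predicate on the class lets a caller decide whether a scraper applies before constructing it.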
Defined Under Namespace
Classes: Html, NoScraperFound, Schema, SemanticHtml
Constant Summary
- SCRAPERS = [Html, Schema, SemanticHtml].freeze
Class Method Summary
- .from(parsed_body) ⇒ Array<Class>
  Returns an array of scrapers that claim to find articles in the parsed body.
Class Method Details
.from(parsed_body) ⇒ Array<Class>
Returns an array of the scraper classes whose `articles?` check succeeds for the parsed body. Raises NoScraperFound if none of them do.
    # File 'lib/html2rss/auto_source/scraper.rb', line 26
    def self.from(parsed_body)
      scrapers = SCRAPERS.select { |scraper| scraper.articles?(parsed_body) }
      raise NoScraperFound, 'No suitable scraper found for URL.' if scrapers.empty?

      scrapers
    end
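The selection logic of `.from` can be exercised in isolation with the standalone re-creation below. It mirrors the method body above, but `DemoScraper`, `LinkScraper`, and `JsonLdScraper` are hypothetical stand-ins, `parsed_body` is a plain String, and `NoScraperFound` is assumed to subclass `StandardError` (the page does not state its superclass).

```ruby
# Standalone re-creation of the selection logic in Scraper.from.
# All class names here are hypothetical stand-ins for Html, Schema, etc.
module DemoScraper
  # Assumption: the error class subclasses StandardError.
  class NoScraperFound < StandardError; end

  class LinkScraper
    def self.articles?(parsed_body)
      parsed_body.include?('<a ')
    end
  end

  class JsonLdScraper
    def self.articles?(parsed_body)
      parsed_body.include?('application/ld+json')
    end
  end

  # select keeps the order of this array in its result.
  SCRAPERS = [LinkScraper, JsonLdScraper].freeze

  def self.from(parsed_body)
    scrapers = SCRAPERS.select { |scraper| scraper.articles?(parsed_body) }
    raise NoScraperFound, 'No suitable scraper found for URL.' if scrapers.empty?

    scrapers
  end
end

DemoScraper.from('<a href="/x">X</a>') # => [DemoScraper::LinkScraper]

begin
  DemoScraper.from('<p>plain text</p>')
rescue DemoScraper::NoScraperFound => e
  puts e.message # prints "No suitable scraper found for URL."
end
```

Because `select` asks every scraper rather than stopping at the first match, `.from` can return several candidate classes at once.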