Module: Html2rss::AutoSource::Scraper

Defined in:
lib/html2rss/auto_source/scraper.rb,
lib/html2rss/auto_source/scraper/html.rb,
lib/html2rss/auto_source/scraper/schema.rb,
lib/html2rss/auto_source/scraper/microdata.rb,
lib/html2rss/auto_source/scraper/json_state.rb,
lib/html2rss/auto_source/scraper/schema/thing.rb,
lib/html2rss/auto_source/scraper/semantic_html.rb,
lib/html2rss/auto_source/scraper/wordpress_api.rb,
lib/html2rss/auto_source/scraper/schema/item_list.rb,
lib/html2rss/auto_source/scraper/schema/list_item.rb,
lib/html2rss/auto_source/scraper/wordpress_api/page_scope.rb,
lib/html2rss/auto_source/scraper/schema/category_extractor.rb,
lib/html2rss/auto_source/scraper/wordpress_api/posts_endpoint.rb,
lib/html2rss/auto_source/scraper/semantic_html/anchor_selector.rb

Overview

The Scraper module contains all scrapers that can be used to extract articles. Each scraper must implement an each method that yields article hashes, and an articles? method that returns true if the scraper can potentially extract articles from the given HTML.

Detection is intentionally shallow for most scrapers, but instance-based matching is available for scrapers that need to carry expensive selection state forward into extraction. Scrapers run in parallel threads, so implementations must avoid shared mutable state and degrade by returning no articles when a follow-up would be unsafe or unsupported.
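The contract described above can be sketched with a hypothetical scraper. The class name, the string-based parsing, and the regex are illustrative only; real scrapers in this module operate on a Nokogiri::HTML::Document and use proper selectors.

```ruby
# Hypothetical minimal scraper illustrating the each/articles? contract.
# For brevity, parsed_body is treated as a plain string here.
class TitleScraper
  TITLE_PATTERN = %r{<h2>(.*?)</h2>}m

  # Cheap, shallow detection: can this scraper potentially extract articles?
  def self.articles?(parsed_body)
    parsed_body.match?(TITLE_PATTERN)
  end

  def initialize(parsed_body, url:)
    @parsed_body = parsed_body
    @url = url
  end

  # Yields one article hash per match; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    @parsed_body.scan(TITLE_PATTERN) do |(title)|
      yield({ title: title, url: @url })
    end
  end
end
```

Returning an Enumerator when no block is given keeps the scraper composable with the rest of Enumerable without forcing eager extraction.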

Defined Under Namespace

Classes: Html, JsonState, Microdata, NoScraperFound, Schema, SemanticHtml, WordpressApi

Constant Summary

APP_SHELL_ROOT_SELECTORS =
'#app, #root, #__next, [data-reactroot], [ng-app], [id*="app-shell"]'
APP_SHELL_MAX_ANCHORS =
2
APP_SHELL_MAX_VISIBLE_TEXT_LENGTH =
220
SCRAPERS =
[
  WordpressApi,
  Schema,
  Microdata,
  JsonState,
  SemanticHtml,
  Html
].freeze
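The APP_SHELL_* constants suggest a heuristic for recognizing client-rendered app shells: a root node matching the selectors that contains almost no anchors and almost no visible text. The helper below is a hypothetical, string-based sketch of such a check, not the library's actual implementation, which operates on a parsed Nokogiri document.

```ruby
# Hypothetical app-shell heuristic using the documented thresholds:
# at most 2 anchors and at most 220 characters of visible text.
APP_SHELL_MAX_ANCHORS = 2
APP_SHELL_MAX_VISIBLE_TEXT_LENGTH = 220

def app_shell?(root_html)
  anchor_count = root_html.scan(/<a\b/i).size
  visible_text = root_html.gsub(/<[^>]+>/, '').strip

  anchor_count <= APP_SHELL_MAX_ANCHORS &&
    visible_text.length <= APP_SHELL_MAX_VISIBLE_TEXT_LENGTH
end
```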

Class Method Summary

Class Method Details

.from(parsed_body, opts = Html2rss::AutoSource::DEFAULT_CONFIG[:scraper]) ⇒ Array<Class>

Returns an array of scraper classes that claim to find articles in the parsed body.

Parameters:

  • parsed_body (Nokogiri::HTML::Document)

    The parsed HTML body.

  • opts (Hash) (defaults to: Html2rss::AutoSource::DEFAULT_CONFIG[:scraper])

    The options hash.

Returns:

  • (Array<Class>)

    An array of scraper classes that can handle the parsed body.



# File 'lib/html2rss/auto_source/scraper.rb', line 70

def self.from(parsed_body, opts = Html2rss::AutoSource::DEFAULT_CONFIG[:scraper])
  scrapers = SCRAPERS.select { |scraper| opts.dig(scraper.options_key, :enabled) }
  scrapers.select! { |scraper| scraper.articles?(parsed_body) }

  raise no_scraper_found_for(parsed_body) if scrapers.empty?

  scrapers
end
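The two-stage selection in .from (first the enabled check via opts.dig, then the articles? probe) can be reproduced with stand-in scrapers. The stub class and method names below are illustrative and not part of the library.

```ruby
# Stand-in scraper: options_key drives the enabled check,
# matches drives the articles? probe.
StubScraper = Struct.new(:options_key, :matches) do
  def articles?(_parsed_body)
    matches
  end
end

STUB_SCRAPERS = [
  StubScraper.new(:schema, true),
  StubScraper.new(:html, false),
  StubScraper.new(:json_state, true)
].freeze

# Mirrors the selection logic of Scraper.from against the stubs.
def select_scrapers(parsed_body, opts)
  scrapers = STUB_SCRAPERS.select { |s| opts.dig(s.options_key, :enabled) }
  scrapers.select! { |s| s.articles?(parsed_body) }

  raise 'no scraper found' if scrapers.empty?

  scrapers
end
```

Note that a scraper survives only if it is both enabled in the options hash and claims to find articles; either filter alone is not enough.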

.instances_for(parsed_body, url:, request_session: nil, opts: Html2rss::AutoSource::DEFAULT_CONFIG[:scraper]) ⇒ Array<Object>

Returns scraper instances ready for extraction. instances_for is the main entry point for extraction: it lets each scraper decide whether it matches using the same instance that will later yield article hashes, which keeps precomputed state close to the scraper that owns it.

Parameters:

  • parsed_body (Nokogiri::HTML::Document)

    The parsed HTML body.

  • url (String, Html2rss::Url)

    The page url.

  • request_session (Html2rss::RequestSession, nil) (defaults to: nil)

    Shared follow-up session.

  • opts (Hash) (defaults to: Html2rss::AutoSource::DEFAULT_CONFIG[:scraper])

    The options hash.

Returns:

  • (Array<Object>)

    An array of scraper instances that can handle the parsed body.



# File 'lib/html2rss/auto_source/scraper.rb', line 90

def self.instances_for(parsed_body, url:, request_session: nil,
                       opts: Html2rss::AutoSource::DEFAULT_CONFIG[:scraper])
  instances = SCRAPERS.filter_map do |scraper|
    next unless opts.dig(scraper.options_key, :enabled)

    instance = scraper.new(parsed_body, url:, request_session:, **opts.fetch(scraper.options_key, {}))
    next unless extractable_instance?(instance, parsed_body)

    instance
  end

  raise no_scraper_found_for(parsed_body) if instances.empty?

  instances
end
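The instance-based matching described above can be sketched as follows: the expensive selection runs once, is memoized on the instance, and the same result backs both the match decision and extraction. The class, its regex, and the articles? signature are hypothetical; they only illustrate the pattern, not the library's API.

```ruby
# Hypothetical scraper whose match decision and extraction share state.
class ListScraper
  def initialize(parsed_body, url:)
    @parsed_body = parsed_body
    @url = url
  end

  # Matching reuses the memoized selection rather than re-scanning later.
  def articles?
    !items.empty?
  end

  def each
    return enum_for(:each) unless block_given?

    items.each { |title| yield({ title: title, url: @url }) }
  end

  private

  # Expensive selection, computed once per instance.
  def items
    @items ||= @parsed_body.scan(%r{<li>(.*?)</li>}m).flatten
  end
end
```

Because each instance owns its state and never shares it, this pattern also stays safe when scrapers run in parallel threads, as the overview requires.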