Class: Html2rss::AutoSource
- Inherits: Object
  - Object
    - Html2rss::AutoSource
- Defined in:
  lib/html2rss/auto_source.rb,
  lib/html2rss/auto_source/article.rb,
  lib/html2rss/auto_source/channel.rb,
  lib/html2rss/auto_source/cleanup.rb,
  lib/html2rss/auto_source/reducer.rb,
  lib/html2rss/auto_source/scraper.rb,
  lib/html2rss/auto_source/rss_builder.rb,
  lib/html2rss/auto_source/scraper/html.rb,
  lib/html2rss/auto_source/scraper/schema.rb,
  lib/html2rss/auto_source/scraper/schema/thing.rb,
  lib/html2rss/auto_source/scraper/semantic_html.rb,
  lib/html2rss/auto_source/scraper/schema/item_list.rb,
  lib/html2rss/auto_source/scraper/schema/list_item.rb,
  lib/html2rss/auto_source/scraper/semantic_html/image.rb,
  lib/html2rss/auto_source/scraper/semantic_html/extractor.rb, …
Overview
The AutoSource class extracts a feed channel and its articles from a page, given the page's URL and HTML body. It runs a set of scrapers (the Scraper module) over the parsed body, relying on common ways sites mark up articles, e.g. schema.org, microdata, Open Graph, and semantic HTML.
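Scraper selection can be pictured as a small registry: each scraper class reports whether it recognizes a document, and only the matching classes are used. The sketch below is hypothetical: `ScraperRegistry`, `articles?`, `SchemaLike`, and `SemanticLike` are illustrative names, not the html2rss API, and documents are plain strings rather than parsed HTML.

```ruby
# Hypothetical sketch of a scraper registry; not the html2rss implementation.
module ScraperRegistry
  # Toy scraper that "recognizes" documents mentioning schema.org markup.
  class SchemaLike
    def self.articles?(doc)
      doc.include?('schema.org')
    end
  end

  # Toy scraper that "recognizes" documents using <article> elements.
  class SemanticLike
    def self.articles?(doc)
      doc.include?('<article')
    end
  end

  ALL = [SchemaLike, SemanticLike].freeze

  # Returns every scraper class that claims it can extract articles from doc.
  def self.from(doc)
    ALL.select { |scraper| scraper.articles?(doc) }
  end
end

doc = '<main><article><h2>Hello</h2></article></main>'
p ScraperRegistry.from(doc) # => [ScraperRegistry::SemanticLike]
```

Returning a list rather than a single winner lets several scrapers contribute articles for the same page, which matches the `flat_map` over scrapers in #articles.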
Defined Under Namespace
Modules: Scraper
Classes: Article, Channel, Cleanup, NoArticlesFound, Reducer, RssBuilder
Instance Method Summary
- #articles ⇒ Object
- #build ⇒ Object
- #channel ⇒ Object
- #initialize(url, body:, headers: {}) ⇒ AutoSource (constructor): A new instance of AutoSource.
Constructor Details
#initialize(url, body:, headers: {}) ⇒ AutoSource
Returns a new instance of AutoSource. The constructor only stores the URL, the HTML body, and the response headers; no parsing happens here.
```ruby
# File 'lib/html2rss/auto_source.rb', line 20
def initialize(url, body:, headers: {})
  @url = url
  @body = body
  @headers = headers
end
```
Instance Method Details
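#articles ⇒ Object
Runs every scraper returned by Scraper.from against the parsed body, wraps each extracted article hash in an Article tagged with its scraper, passes each scraper's batch through Reducer, and memoizes the combined list.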
```ruby
# File 'lib/html2rss/auto_source.rb', line 40
def articles
  @articles ||= Scraper.from(parsed_body).flat_map do |scraper|
    instance = scraper.new(parsed_body, url:)

    articles_in_thread = Parallel.map(instance.each) do |article_hash|
      Log.debug "Scraper: #{scraper} in worker: #{Parallel.worker_number} [#{article_hash[:url]}]"

      Article.new(**article_hash, scraper:)
    end

    Reducer.call(articles_in_thread, url:)

    articles_in_thread
  end
end
```
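#articles follows a memoize-then-flat_map shape. The sketch below mirrors it in plain Ruby under stated assumptions: `ArticleCollector`, `HeadingScraper`, and the `Article` struct are hypothetical, documents are strings, and `Thread` stands in for the Parallel gem's `Parallel.map` (which uses worker processes unless told otherwise) so the example needs only the standard library.

```ruby
# Hypothetical sketch of the #articles pipeline; names are illustrative.
Article = Struct.new(:title, :url, :scraper, keyword_init: true)

# Toy scraper: yields one hash per <h2> heading found in a string "document".
class HeadingScraper
  def initialize(doc, url:)
    @doc = doc
    @url = url
  end

  def each
    return enum_for(:each) unless block_given?

    @doc.scan(%r{<h2>(.*?)</h2>}).each_with_index do |(title), i|
      yield({ title:, url: "#{@url}#item-#{i}" })
    end
  end
end

class ArticleCollector
  def initialize(doc, url:, scrapers:)
    @doc = doc
    @url = url
    @scrapers = scrapers
  end

  # Memoized: the scrapers run only on the first call.
  def articles
    @articles ||= @scrapers.flat_map do |scraper|
      instance = scraper.new(@doc, url: @url)
      # One thread per article hash; mapping Thread#value keeps input order.
      threads = instance.each.map do |hash|
        Thread.new { Article.new(**hash, scraper:) }
      end
      threads.map(&:value)
    end
  end
end

collector = ArticleCollector.new('<h2>One</h2><h2>Two</h2>',
                                 url: 'https://example.com', scrapers: [HeadingScraper])
p collector.articles.map(&:title) # => ["One", "Two"]
```

Because `flat_map` flattens one level, each scraper's array of articles is concatenated into a single list, and `||=` keeps a second call from rerunning the scrapers.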
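#build ⇒ Object
Raises NoArticlesFound if no articles were extracted. Otherwise it passes the articles through Reducer and then Cleanup (with keep_different_domain: true), attaches them to the channel, and returns the result of RssBuilder#call.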
```ruby
# File 'lib/html2rss/auto_source.rb', line 26
def build
  raise NoArticlesFound if articles.empty?

  Reducer.call(articles, url:)
  Cleanup.call(articles, url:, keep_different_domain: true)

  channel.articles = articles

  Html2rss::AutoSource::RssBuilder.new(
    channel:,
    articles:
  ).call
end
```
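In #build the return values of Reducer.call and Cleanup.call are discarded, which suggests both mutate the articles array in place. The sketch below assumes exactly that. The hypothetical `reduce!` keeps the first article per URL and the hypothetical `cleanup!` drops articles from other hosts unless `keep_different_domain` is true; the real Reducer and Cleanup rules are not shown on this page. A plain Hash stands in for the channel and for RssBuilder's output.

```ruby
require 'uri'

# Hypothetical sketch of the #build flow; not the html2rss implementation.
class NoArticlesFound < StandardError; end

Item = Struct.new(:title, :url, keyword_init: true)

# Assumed reducer rule: keep the first article per URL (mutates in place).
def reduce!(articles)
  articles.uniq!(&:url) # returns nil when nothing changed, so return the array
  articles
end

# Assumed cleanup rule: drop other-host articles unless explicitly kept.
def cleanup!(articles, url:, keep_different_domain:)
  return articles if keep_different_domain

  host = URI(url).host
  articles.select! { |a| URI(a.url).host == host }
  articles
end

def build(articles, url:, keep_different_domain: true)
  raise NoArticlesFound if articles.empty?

  reduce!(articles)
  cleanup!(articles, url:, keep_different_domain:)
  { channel_url: url, items: articles.map(&:to_h) } # stand-in for RssBuilder
end

items = [Item.new(title: 'A', url: 'https://example.com/a'),
         Item.new(title: 'A again', url: 'https://example.com/a'),
         Item.new(title: 'B', url: 'https://other.test/b')]
feed = build(items, url: 'https://example.com', keep_different_domain: false)
# feed[:items] now holds only the first https://example.com/a entry
```

The example passes `keep_different_domain: false` to show the filter; with the default of `true`, as in #build, the other.test article would be kept.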
#channel ⇒ Object
Lazily builds and memoizes a Channel from the parsed body, the response headers, and the URL.
```ruby
# File 'lib/html2rss/auto_source.rb', line 56
def channel
  @channel ||= Channel.new(parsed_body, headers: @headers, url:)
end
```
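Because #channel is memoized with `||=`, the Channel is constructed on first access and the same object is returned afterwards. That is why #build can set `channel.articles` and then pass `channel:` to RssBuilder with the articles still attached. A minimal sketch with hypothetical `FakeChannel` and `Source` classes that count constructions:

```ruby
# Hypothetical sketch of ||= memoization as used by #channel.
class FakeChannel
  @built = 0
  class << self
    attr_accessor :built
  end

  attr_accessor :articles

  def initialize(url:)
    @url = url
    self.class.built += 1
  end
end

class Source
  def initialize(url)
    @url = url
  end

  # First call builds the channel; later calls return the cached object.
  def channel
    @channel ||= FakeChannel.new(url: @url)
  end
end

source = Source.new('https://example.com')
source.channel.articles = %w[a b]
p source.channel.articles # => ["a", "b"]
p FakeChannel.built       # => 1
```

One caveat of `||=`: it re-evaluates the right-hand side whenever the cached value is `nil` or `false`, so it only memoizes values that are truthy, as a constructed object always is.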