Class: Html2rss::AutoSource::Cleanup

Inherits:
Object
Defined in:
lib/html2rss/auto_source/cleanup.rb

Overview

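Cleanup filters and refines the list of extracted articles. It keeps only valid articles, removes articles with short titles, deduplicates by URL and by title, keeps only HTTP URLs, and, unless keep_different_domain is true, rejects articles from a different domain than the given url.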

Class Method Summary

Class Method Details

.call(articles, url:, keep_different_domain: false) ⇒ Object



# File 'lib/html2rss/auto_source/cleanup.rb', line 11

def call(articles, url:, keep_different_domain: false)
  Log.debug "Cleanup: start with #{articles.size} articles"

  articles.select!(&:valid?)

  remove_short!(articles, :title)

  deduplicate_by!(articles, :url)
  deduplicate_by!(articles, :title)

  keep_only_http_urls!(articles)
  reject_different_domain!(articles, url) unless keep_different_domain

  Log.debug "Cleanup: end with #{articles.size} articles"
  articles
end
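
The private helpers that .call invokes are not shown on this page. The following is a minimal, self-contained sketch of how such a pipeline might behave, not the library's implementation. SketchArticle, CleanupSketch, the MIN_WORDS threshold, and every helper body are assumptions made for illustration; only the order of the steps is taken from the listing above.

```ruby
require 'uri'

# Hypothetical stand-in for the article objects; the real class is not shown here.
SketchArticle = Struct.new(:title, :url, keyword_init: true) do
  def valid?
    !title.to_s.strip.empty? && !url.to_s.strip.empty?
  end
end

# Illustrative re-creation of the step order in .call. Helper bodies are guesses.
module CleanupSketch
  MIN_WORDS = 2 # assumed threshold, not taken from the library

  module_function

  def call(articles, url:, keep_different_domain: false)
    articles.select!(&:valid?)
    remove_short!(articles, :title)
    deduplicate_by!(articles, :url)
    deduplicate_by!(articles, :title)
    keep_only_http_urls!(articles)
    reject_different_domain!(articles, url) unless keep_different_domain
    articles
  end

  # Drops articles whose attribute has fewer than MIN_WORDS words.
  def remove_short!(articles, key)
    articles.reject! { |a| a.public_send(key).to_s.split.size < MIN_WORDS }
  end

  # Keeps the first article for each distinct attribute value.
  def deduplicate_by!(articles, key)
    articles.uniq! { |a| a.public_send(key) }
  end

  # Keeps articles whose URL scheme is http or https.
  # URI.parse raises on malformed input; this sketch assumes well-formed URLs.
  def keep_only_http_urls!(articles)
    articles.select! { |a| %w[http https].include?(URI.parse(a.url.to_s).scheme) }
  end

  # Keeps articles whose host equals the host of the source URL.
  def reject_different_domain!(articles, base_url)
    base_host = URI.parse(base_url.to_s).host
    articles.select! { |a| URI.parse(a.url.to_s).host == base_host }
  end
end

articles = [
  SketchArticle.new(title: 'First real headline', url: 'https://example.com/a'),
  SketchArticle.new(title: 'First real headline', url: 'https://example.com/b'), # duplicate title
  SketchArticle.new(title: 'Short', url: 'https://example.com/c'),               # one-word title
  SketchArticle.new(title: 'Another site story', url: 'https://other.example/d'), # other domain
  SketchArticle.new(title: 'Mail link here', url: 'mailto:someone@example.com'),  # non-HTTP scheme
  SketchArticle.new(title: nil, url: 'https://example.com/e')                     # invalid
]

result = CleanupSketch.call(articles, url: 'https://example.com/')
puts result.map(&:url).inspect # prints ["https://example.com/a"]
```

As in the listing above, the sketch mutates the array it is given and returns that same array, so callers that need the original list should pass a copy.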