Class: Html2rss::HtmlExtractor::Extractors::Archive

Inherits:
Object
Defined in:
lib/html2rss/html_extractor/enclosure_extractor.rb

Overview

Extracts archive enclosures (zip, tar.gz, tgz) from HTML tags.

Class Method Summary

Class Method Details

.call(article_tag, base_url:) ⇒ Object



# File 'lib/html2rss/html_extractor/enclosure_extractor.rb', line 86

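def self.call(article_tag, base_url:)
  # Match links whose href ends in a known archive suffix.
  article_tag.css('a[href$=".zip"], a[href$=".tar.gz"], a[href$=".tgz"]').filter_map do |link|
    href = link['href'].to_s
    next if href.empty?

    # Resolve relative hrefs against the page's base URL.
    abs_url = Url.from_relative(href, base_url)
    {
      url: abs_url,
      # Every match is reported as application/zip, including .tar.gz and .tgz.
      type: 'application/zip'
    }
  end
end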
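
The following is a minimal, stdlib-only sketch of the same filtering idea, not the library's implementation. It assumes the hrefs have already been pulled out of the markup, and it swaps in `String#end_with?` for the CSS suffix selectors and `URI.join` for `Url.from_relative`. The names `ARCHIVE_SUFFIXES` and `archive_enclosures` and the example.com URLs are hypothetical and exist only for illustration.

```ruby
require 'uri'

# Hypothetical constant: the suffixes the CSS selectors above match on.
ARCHIVE_SUFFIXES = %w[.zip .tar.gz .tgz].freeze

# Hypothetical helper approximating Archive.call on a plain list of hrefs.
# URI.join stands in for Url.from_relative (an assumption, not the real helper).
def archive_enclosures(hrefs, base_url:)
  hrefs.filter_map do |href|
    href = href.to_s
    next if href.empty?
    # Case-sensitive suffix check, like the a[href$="..."] selectors.
    next unless ARCHIVE_SUFFIXES.any? { |suffix| href.end_with?(suffix) }

    {
      url: URI.join(base_url, href).to_s,
      # Mirrors the original: one MIME type for every archive suffix.
      type: 'application/zip'
    }
  end
end

hrefs = ['/files/site.zip', 'backup.tar.gz', '/about.html', nil,
         'https://cdn.example.com/data.tgz']
result = archive_enclosures(hrefs, base_url: 'https://example.com/posts/')
result.each { |enclosure| puts "#{enclosure[:type]} #{enclosure[:url]}" }
```

Non-archive links and nil or empty hrefs are skipped by `filter_map`, so the result holds only hashes with `:url` and `:type` keys, matching the shape returned by `.call`.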