Class: Html2rss::AutoSource::Scraper::SemanticHtml::Image

Inherits:
Object
Defined in:
lib/html2rss/auto_source/scraper/semantic_html/image.rb

Overview

Image is responsible for extracting image URLs from the article_tag.


Class Method Details

.call(article_tag, url:) ⇒ Object



# File 'lib/html2rss/auto_source/scraper/semantic_html/image.rb', line 10

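def self.call(article_tag, url:)
  # Try each strategy in order; the first non-nil result wins.
  img_src = from_source(article_tag) ||
            from_img(article_tag) ||
            from_style(article_tag)

  Utils.build_absolute_url_from_relative(img_src, url) if img_src
end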
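
The fallback order used by .call can be illustrated without Nokogiri. The sketch below is only an approximation: `absolute_url` and `pick_image` are hypothetical names, and `URI.join` is assumed as a stand-in for `Utils.build_absolute_url_from_relative`, whose actual behavior is not shown on this page.

```ruby
require 'uri'

# Hypothetical stand-in for Utils.build_absolute_url_from_relative.
# Assumption: the real helper resolves img_src against the page URL.
def absolute_url(src, base_url)
  URI.join(base_url, src).to_s
end

# Mirrors the `a || b || c` chain in .call: srcset first, then <img src>,
# then inline style. Returns nil when no candidate is found.
def pick_image(from_source:, from_img:, from_style:, url:)
  img_src = from_source || from_img || from_style
  absolute_url(img_src, url) if img_src
end

pick_image(from_source: nil, from_img: 'pic.jpg', from_style: 'bg.png',
           url: 'https://example.com/a/post')
```

Because the chain short-circuits, a srcset candidate always takes precedence over a plain src attribute, and an inline style URL is used only when both are missing.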

.from_img(article_tag) ⇒ Object



# File 'lib/html2rss/auto_source/scraper/semantic_html/image.rb', line 18

def self.from_img(article_tag)
  # First <img> with a src attribute that is not a data URI.
  article_tag.at_css('img[src]:not([src^="data"])')&.[]('src')
end

.from_source(article_tag) ⇒ Object

Extracts the largest image source from the srcset attribute of an img tag or a source tag inside a picture tag.




# File 'lib/html2rss/auto_source/scraper/semantic_html/image.rb', line 29

def self.from_source(article_tag) # rubocop:disable Metrics/AbcSize
  hash = article_tag.css('img[srcset], picture > source[srcset]')
                    .flat_map { |source| source['srcset'].to_s.split(',') }
                    .filter_map do |line|
    # Each candidate looks like "<url> <descriptor>", e.g. "a.jpg 800w".
    width, url = line.split.reverse
    next if url.nil? || url.start_with?('data:')

    width_value = width.to_i.zero? ? 0 : width.scan(/\d+/).first.to_i

    [width_value, url.strip]
  end.to_h

  # Return the URL with the largest numeric descriptor (nil if none).
  hash[hash.keys.max]
end
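
The srcset selection above can be tried on a plain string. This is a minimal sketch that reuses the same parsing steps; `largest_from_srcset` is a hypothetical helper name, and the sample file names are made up for illustration.

```ruby
# Pick the candidate with the largest numeric descriptor from a raw
# srcset string, using the same steps as .from_source.
def largest_from_srcset(srcset)
  candidates = srcset.to_s.split(',').filter_map do |entry|
    # "large.jpg 1200w" -> width = "1200w", url = "large.jpg"
    width, url = entry.split.reverse
    next if url.nil? || url.start_with?('data:')

    width_value = width.to_i.zero? ? 0 : width.scan(/\d+/).first.to_i
    [width_value, url.strip]
  end.to_h

  candidates[candidates.keys.max]
end

largest_from_srcset('small.jpg 480w, large.jpg 1200w, medium.jpg 800w')
# => "large.jpg"
```

An entry without a descriptor (for example a bare `only.jpg`) is skipped by this logic, because after `split.reverse` the single token lands in `width` and `url` is nil.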

.from_style(article_tag) ⇒ Object



# File 'lib/html2rss/auto_source/scraper/semantic_html/image.rb', line 44

def self.from_style(article_tag)
  # Collect url(...) values from inline styles and keep the longest one.
  article_tag.css('[style*="url"]')
             .map { |tag| tag['style'][/url\(['"]?(.*?)['"]?\)/, 1] }
             .reject { |src| !src || src.start_with?('data:') }
             .max_by(&:size)
end
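
The regular expression in .from_style can be exercised on plain style strings. This is a sketch under the assumption that the strings are the values of `style` attributes; `STYLE_URL` and `longest_style_url` are hypothetical names introduced for illustration.

```ruby
# Same pattern as in .from_style: captures the argument of url(...),
# with or without surrounding quotes.
STYLE_URL = /url\(['"]?(.*?)['"]?\)/

# Returns the longest non-data URL found in the given style strings,
# or nil when none of them contains a usable url(...).
def longest_style_url(styles)
  styles.map { |style| style[STYLE_URL, 1] }
        .reject { |src| !src || src.start_with?('data:') }
        .max_by(&:size)
end

longest_style_url([
  "background-image: url('/img/a.jpg')",
  'background: url("/img/hero-large.jpg") no-repeat',
  'color: red',
  'background: url(data:image/png;base64,AAAA)'
])
# => "/img/hero-large.jpg"
```

Note that `max_by(&:size)` compares the length of the URL string, not the dimensions of the image, so the result is the longest URL rather than necessarily the largest picture.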