Class: Html2rss::Item

Inherits:
Object
Defined in:
lib/html2rss/item.rb

Overview

Takes the selected Nokogiri::HTML and responds to accessor names defined in the feed config.

Instances can only be created via `.from_url`; each represents an internally used “RSS item”. Such an item exposes dynamically defined attributes as methods.

Defined Under Namespace

Classes: Context, Enclosure

Constructor Details

#initialize(xml, config) ⇒ Item

Returns a new instance of Item.

Parameters:

  • xml

    the Nokogiri node selected for this item.

  • config (Html2rss::Config)

    Configuration object.

# File 'lib/html2rss/item.rb', line 40

def initialize(xml, config)
  @xml = xml
  @config = config
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method.

#method_missing(method_name, *_args) ⇒ String

Dynamically extracts data based on the method name.

Parameters:

  • method_name (Symbol)
  • _args (Array)

Returns:

  • (String)

    extracted value for the selector.

# File 'lib/html2rss/item.rb', line 64

def method_missing(method_name, *_args)
  return super unless respond_to_missing?(method_name)

  extract(method_name)
end
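
The dynamic dispatch above can be tried in isolation. The following is a minimal sketch, not the library's implementation: `SketchConfig`, `SketchItem`, and `value_for` are hypothetical stand-ins, and the values come from a plain Hash instead of being extracted from HTML.

```ruby
# Hypothetical stand-in for Html2rss::Config: it only knows which
# selector names exist and what value each one yields.
class SketchConfig
  def initialize(selectors)
    @selectors = selectors
  end

  def selector?(name)
    @selectors.key?(name.to_sym)
  end

  # Assumption: the real class extracts from HTML; here we just look up.
  def value_for(name)
    @selectors.fetch(name.to_sym)
  end
end

# Hypothetical stand-in for Html2rss::Item showing the same pairing of
# respond_to_missing? and method_missing.
class SketchItem
  def initialize(config)
    @config = config
  end

  def respond_to_missing?(method_name, _include_private = false)
    @config.selector?(method_name) || super
  end

  def method_missing(method_name, *_args)
    return super unless respond_to_missing?(method_name)

    @config.value_for(method_name)
  end
end

item = SketchItem.new(SketchConfig.new(title: 'Hello', link: 'https://example.com/1'))
item.title                 # resolved through method_missing
item.respond_to?(:title)   # true, the config defines :title
item.respond_to?(:author)  # false, no such selector
```

Calling a name that is not configured, such as `item.author`, falls through to `super` and raises NoMethodError, which keeps the object honest about what it supports.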

Class Method Details

.from_url(url, config) ⇒ Array<Html2rss::Item>

Fetches items from a given URL using configuration settings.

Parameters:

  • url (Addressable::URI)

    URL to fetch items from.

  • config (Html2rss::Config)

    Configuration object.

Returns:

  • (Array<Html2rss::Item>)

# File 'lib/html2rss/item.rb', line 25

def self.from_url(url, config)
  ctx = RequestService::Context.new(url:, headers: config.headers)

  body = RequestService.execute(ctx, strategy: config.strategy).body
  body = ObjectToXmlConverter.new(JSON.parse(body)).call if config.json?

  Nokogiri.HTML(body)
          .css(config.selector_string(Config::Selectors::ITEMS_SELECTOR_NAME))
          .map { |xml| new(xml, config) }
          .select(&:valid?)
end
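
The last two steps of the method (build one item per selected node, then keep only the valid ones) can be sketched without any network or parsing. This is an illustration under assumptions: `PipelineItem` and `sketch_items` are hypothetical names, and plain Hashes stand in for the Nokogiri nodes.

```ruby
# Hypothetical stand-in for Html2rss::Item with a simplified validity rule.
PipelineItem = Struct.new(:title, :description, keyword_init: true) do
  def valid?
    (title || description).to_s != ''
  end
end

# Mirrors the `.map { ... }.select(&:valid?)` tail of from_url.
def sketch_items(nodes)
  nodes.map { |node| PipelineItem.new(**node) }
       .select(&:valid?)
end

nodes = [
  { title: 'First post', description: 'Body' },
  { title: nil, description: nil },                # dropped: nothing to show
  { title: nil, description: 'Only a description' }
]

items = sketch_items(nodes)
items.map(&:title) # the second node is filtered out
```

Filtering after construction means the caller never sees items that could not yield a title or description.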

Instance Method Details

#categories ⇒ Array<String>

Retrieves categories for the item based on configured category selectors.

Returns:

  • (Array<String>)

    list of categories.

# File 'lib/html2rss/item.rb', line 116

def categories
  config.category_selector_names
        .filter_map do |method_name|
    category = public_send(method_name)
    category.strip unless category.to_s.empty?
  end.uniq
end
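
The cleanup applied to the category values can be reproduced on a plain Array. In this sketch `clean_categories` is a hypothetical helper; the real method obtains each value by calling the configured selector names on the item.

```ruby
# Same filter_map / strip / uniq chain as above, applied to raw values.
def clean_categories(raw_values)
  raw_values.filter_map do |category|
    category.strip unless category.to_s.empty?
  end.uniq
end

clean_categories([' Ruby ', nil, '', 'RSS', 'Ruby'])
# nil and '' are dropped, ' Ruby ' is stripped, the duplicate is removed
```

Note that the emptiness check runs before `strip`, so a whitespace-only value passes the check and ends up as an empty string in the result.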

#enclosure ⇒ Enclosure

Retrieves enclosure details for the item.

Returns:

  • (Enclosure)

# File 'lib/html2rss/item.rb', line 136

def enclosure
  url = enclosure_url

  raise 'An item.enclosure requires an absolute URL' unless url&.absolute?

  type = config.selector_attributes_with_channel(:enclosure)[:content_type] ||
         Html2rss::Utils.guess_content_type_from_url(url)

  Enclosure.new(
    type:,
    bits_length: 0,
    url: url.to_s
  )
end
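
The guard and the type fallback can be sketched with the standard library. Several things here are assumptions: the stdlib `URI` replaces whatever URL class the library uses, `SketchEnclosure` stands in for `Enclosure`, and `guess_content_type` with its small lookup table is a hypothetical replacement for `Html2rss::Utils.guess_content_type_from_url`, whose actual behavior is not shown in this page.

```ruby
require 'uri'

# Hypothetical stand-in for Html2rss::Item::Enclosure.
SketchEnclosure = Struct.new(:type, :bits_length, :url, keyword_init: true)

# Assumption: a tiny extension table; the real guesser may work differently.
SKETCH_CONTENT_TYPES = { '.mp3' => 'audio/mpeg', '.jpg' => 'image/jpeg' }.freeze

def guess_content_type(uri)
  SKETCH_CONTENT_TYPES.fetch(File.extname(uri.path), 'application/octet-stream')
end

def build_enclosure(url, configured_type: nil)
  # nil and relative URLs are both rejected by the safe-navigation check.
  raise 'An item.enclosure requires an absolute URL' unless url&.absolute?

  SketchEnclosure.new(
    type: configured_type || guess_content_type(url),
    bits_length: 0,
    url: url.to_s
  )
end

build_enclosure(URI.parse('https://example.com/ep1.mp3'))
```

A configured content type always wins over the guess, and `bits_length` is fixed at 0 as in the method above.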

#enclosure? ⇒ true, false

Checks if the item has an enclosure based on configuration.

Returns:

  • (true, false)

# File 'lib/html2rss/item.rb', line 128

def enclosure?
  config.selector?(:enclosure)
end

#extract(tag) ⇒ String

Selects and processes data according to the selector name.

Parameters:

  • tag (Symbol)

Returns:

  • (String)

    the extracted value for the selector.

# File 'lib/html2rss/item.rb', line 75

def extract(tag)
  attribute_options = config.selector_attributes_with_channel(tag.to_sym)

  post_process(
    ItemExtractors.item_extractor_factory(attribute_options, xml).get,
    attribute_options.fetch(:post_process, false)
  )
end
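
The shape of this method (look up the options for a tag, extract a raw value, then optionally post-process it) can be sketched as follows. Everything in the sketch is hypothetical: `SKETCH_SELECTORS`, `SKETCH_RAW_VALUES`, and `sketch_extract` are illustration names, a Hash lookup replaces the real extractor, and a lambda replaces the library's post-processing, whose actual form is not shown in this page.

```ruby
# Assumption: options are plain Hashes; :post_process is optional.
SKETCH_SELECTORS = {
  title: { selector: 'h2', post_process: ->(value) { value.strip } },
  link: { selector: 'a' }
}.freeze

# Assumption: pretend these values were already pulled from the HTML.
SKETCH_RAW_VALUES = {
  'h2' => "  Hello world \n",
  'a' => 'https://example.com/1'
}.freeze

def sketch_extract(tag)
  options = SKETCH_SELECTORS.fetch(tag.to_sym)
  raw = SKETCH_RAW_VALUES.fetch(options[:selector])

  # fetch with a false default mirrors attribute_options.fetch(:post_process, false).
  processor = options.fetch(:post_process, false)
  processor ? processor.call(raw) : raw
end

sketch_extract(:title) # post-processed value
sketch_extract('link') # String tags work because of to_sym
```

Using `fetch` with an explicit `false` default makes the "no post-processing configured" case visible at the call site.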

#guid ⇒ String

Returns the SHA1 hex digest of the concatenated values of the configured GUID selectors.

Returns:

  • (String)

    SHA1 hashed GUID.

# File 'lib/html2rss/item.rb', line 106

def guid
  content = config.guid_selector_names.flat_map { |method_name| public_send(method_name) }.join

  Digest::SHA1.hexdigest(content)
end
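
The hashing step can be reproduced with the standard library. In this sketch `sketch_guid` is a hypothetical helper that receives the already extracted values; the real method collects them by calling the configured GUID selector names.

```ruby
require 'digest/sha1'

# Join the values without a separator, then hash the result.
def sketch_guid(values)
  Digest::SHA1.hexdigest(values.flatten.join)
end

guid = sketch_guid(['First post', 'https://example.com/1'])
guid.length # a SHA1 hex digest is 40 characters long
```

Because the values are joined without a separator, two different value lists that concatenate to the same string (for example `['ab', 'c']` and `['a', 'bc']`) produce the same GUID.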

#respond_to_missing?(method_name, _include_private = false) ⇒ true, false

Checks if the object responds to a method dynamically based on the configuration.

Parameters:

  • method_name (Symbol)
  • _include_private (true, false) (defaults to: false)

Returns:

  • (true, false)

# File 'lib/html2rss/item.rb', line 54

def respond_to_missing?(method_name, _include_private = false)
  config.selector?(method_name) || super
end

#title_or_description ⇒ String?

Returns either the title or the description, preferring title if available.

Returns:

  • (String, nil)

# File 'lib/html2rss/item.rb', line 97

def title_or_description
  return title if config.selector?(:title)

  description if config.selector?(:description)
end
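
The preference order can be sketched with a Hash, where the presence of a key plays the role of `config.selector?`. `sketch_title_or_description` is a hypothetical helper for illustration only.

```ruby
# A key being present stands in for "this selector is configured".
def sketch_title_or_description(values)
  return values[:title] if values.key?(:title)

  values[:description] if values.key?(:description)
end

sketch_title_or_description(title: 'T', description: 'D') # title wins
sketch_title_or_description(description: 'D')             # falls back to description
sketch_title_or_description({})                           # nil, nothing configured
sketch_title_or_description(title: '', description: 'D')  # empty title is still returned
```

The choice depends on which selector is configured, not on whether the extracted value is empty, so a configured title that extracts to an empty string does not fall back to the description.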

#valid? ⇒ true, false

Checks if the item is valid according to the RSS 2.0 spec by ensuring it has at least a title or a description.

Returns:

  • (true, false)

# File 'lib/html2rss/item.rb', line 89

def valid?
  title_or_description.to_s != ''
end