Class: Html2rss::ItemExtractors::Href

Inherits:
Object
Defined in:
lib/html2rss/item_extractors/href.rb

Overview

Returns the value of the href attribute. It always returns absolute URLs. If the extracted href value is a relative URL, it prepends the channel’s URL.

Imagine this HTML element with an href attribute:

<a href="/posts/latest-findings">...</a>

YAML usage example:

channel:
  url: http://blog-without-a-feed.example.com
  ...
selectors:
  link:
    selector: a
    extractor: href

Would return:

'http://blog-without-a-feed.example.com/posts/latest-findings'
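The resolution step can be illustrated with Ruby's standard library alone. The following is a minimal sketch, not the library's implementation: `absolutize` is a hypothetical helper name, and it assumes that resolving the href against the channel URL with `URI.join` is an acceptable stand-in for what the extractor does internally.

```ruby
require 'uri'

# Hypothetical helper (not part of html2rss): resolves an href against
# the channel URL. An href that is already absolute is returned as-is,
# because URI.join only uses the base for relative references.
def absolutize(href, channel_url)
  URI.join(channel_url, href).to_s
end

channel_url = 'http://blog-without-a-feed.example.com'

# Relative href, as in the YAML example above.
puts absolutize('/posts/latest-findings', channel_url)

# An absolute href keeps its own scheme and host.
puts absolutize('https://other.example.org/about', channel_url)
```

With the channel URL from the YAML example, the first call yields the same absolute URL shown above; the second shows that an absolute href passes through unchanged.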

Defined Under Namespace

Classes: Options

Instance Method Summary

Constructor Details

#initialize(xml, options) ⇒ Href

Initializes the Href extractor.

Parameters:

  • xml (Nokogiri::XML::Element)
  • options (Options)


# File 'lib/html2rss/item_extractors/href.rb', line 34

def initialize(xml, options)
  @options = options
  @element = ItemExtractors.element(xml, options.selector)
  @href = @element.attr('href').to_s
end

Instance Method Details

#get ⇒ String

Retrieves and returns the normalized absolute URL.

Returns:

  • (String)

    The absolute URL.



# File 'lib/html2rss/item_extractors/href.rb', line 44

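def get
  # Note: #initialize stores @href via #to_s, so it is always a String
  # (possibly empty) and this guard does not trigger in practice.
  return nil unless @href

  # Clean up the raw attribute value, then resolve it against the channel URL.
  sanitized_href = Html2rss::Utils.sanitize_url(@href)
  Html2rss::Utils.build_absolute_url_from_relative(sanitized_href, @options.channel.url)
end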