Class: FeedAutodiscovery

Inherits:
Object
Defined in:
lib/feed_autodiscovery.rb

Overview

Class that performs feed autodiscovery on an HTML document.
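Autodiscovery here means looking in the head of the document for link elements that advertise a feed. The comments in the source of discover state a preference order: Atom first, then RSS, then generic rel="feed" links. The following is a minimal sketch of that ordering only. It works on plain hashes standing in for the attributes of link elements rather than on a parsed document, and the names PREFERENCE and preferred_feed_link are hypothetical, not part of this class.

```ruby
# Matchers in order of preference. Each hash passed to a matcher is an
# illustrative stand-in for the attributes of one <link> element.
PREFERENCE = [
  ->(link) { link[:rel] == 'alternate' && link[:type] == 'application/atom+xml' },
  ->(link) { link[:rel] == 'alternate' && link[:type] == 'application/rss+xml' },
  ->(link) { link[:rel] == 'feed' }
].freeze

# Returns the first link matching the highest-priority matcher, or nil
# if no link advertises a feed.
def preferred_feed_link(links)
  PREFERENCE.each do |matcher|
    found = links.find(&matcher)
    return found if found
  end
  nil
end
```

Trying the matchers one at a time means the position of a link in the document only matters among links of the same kind.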

Class Method Details

.discover(feed, feed_response) ⇒ Object

Try to perform feed autodiscovery on an HTTP response, with the assumption that it's an HTML document.

If successful, save the discovered fetch_url in the database and return the updated feed.

This method only updates the fetch_url of the feed with the one autodiscovered from the HTML; it does not retrieve entries or make any other changes to the feed. It is the responsibility of the invoking code to fetch the feed afterwards and populate its entries, title, URL, etc.

Receives as arguments the feed object to be associated with the discovered fetch_url, and the response object with the HTML document.

Any errors raised bubble up to be handled higher up the call chain. In particular, if the response on which autodiscovery is being performed is not an HTML document, an error will be raised.

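Returns the feed object to use from then on if autodiscovery is successful, or nil if the HTML has no associated feed. The returned object is normally the updated feed; if another feed with the discovered URL already exists in the database, that existing feed is returned instead, users are subscribed to it, and the feed passed as argument is destroyed.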
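The contract described above (nil when no feed is found, errors bubbling up, fetching left to the caller) can be sketched from the point of view of the invoking code. This is an illustration under assumptions, not code from the application: the method name autodiscover_and_fetch, the fetcher callable and the discoverer parameter are all hypothetical, the last one existing only so that the sketch can be exercised without a Rails environment.

```ruby
# Hypothetical caller of FeedAutodiscovery.discover.
# +fetcher+ is any callable that fetches a feed and populates its entries.
# +discoverer+ defaults to the documented class; the default is only
# evaluated when the argument is omitted.
def autodiscover_and_fetch(feed, feed_response, fetcher:, discoverer: FeedAutodiscovery)
  # Errors (for instance, the response not being HTML) are deliberately
  # not rescued here, so they keep bubbling up the call chain.
  discovered_feed = discoverer.discover(feed, feed_response)

  # nil means the HTML had no associated feed, so there is nothing to fetch.
  return nil if discovered_feed.nil?

  # Use the returned object rather than the argument: according to the
  # source, it may be a different feed that already existed.
  fetcher.call(discovered_feed)
  discovered_feed
end
```

The caller continues with the return value instead of the feed it passed in, since the source shows that the passed feed may have been destroyed.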


# File 'lib/feed_autodiscovery.rb', line 23

def self.discover(feed, feed_response)
  Rails.logger.info "Trying to perform feed autodiscovery on url #{feed.fetch_url}"
  doc = Nokogiri::HTML feed_response

  # Give preference to Atom, then to RSS, then to generic "feed" links.
  # Each expression is queried separately because an XPath union (a|b|c)
  # yields nodes in document order, which would ignore this preference.
  xpath_atom = '//head//link[@rel="alternate"][@type="application/atom+xml"]'
  xpath_rss = '//head//link[@rel="alternate"][@type="application/rss+xml"]'
  xpath_feed = '//head//link[@rel="feed"]'
  feed_link = doc.at_xpath(xpath_atom) || doc.at_xpath(xpath_rss) || doc.at_xpath(xpath_feed)

  feed_href = feed_link&.attr('href')&.to_s
  if feed_href.present?
    # If the href is a path without FQDN, e.g. "/feeds.php", prepend the scheme and FQDN of the webpage
    feed_href = relative_to_absolute_url feed_href, feed

    # If the href is a protocol-relative URL, e.g. "//website.com/feeds.php", prepend the scheme of the webpage
    feed_href = relative_to_absolute_protocol feed_href, feed

    # Check if the autodiscovered feed is already in the database
    existing_feed = Feed.url_variants_feed feed_href
    if existing_feed.present? && existing_feed == feed
      # The discovered URL is the one the passed feed already has. No changes in the db are necessary.
      Rails.logger.info "Autodiscovered feed with URL #{feed_href}. Feed #{feed.id} already has this fetch_url, no changes necessary."
      discovered_feed = feed
    elsif existing_feed.present? && existing_feed != feed
      # There is already a feed in the db with the discovered url. Discard the passed feed and subscribe users to the already existing one.
      Rails.logger.info "Autodiscovered already known feed with url #{feed_href}. Using it and destroying feed with url #{feed.url} passed as argument"
      feed.users.find_each do |user|
        Rails.logger.info "User #{user.id} - #{user.email} is subscribed to feed #{feed.url} to be destroyed, subscribing to existing feed #{existing_feed.id} - #{feed_href} instead"
        user.subscribe existing_feed.fetch_url unless user.feeds.include? existing_feed
      end

      feed.destroy
      discovered_feed = existing_feed
    else
      Rails.logger.info "Autodiscovered new feed with url #{feed_href}. Updating fetch url in the database."
      feed.fetch_url = feed_href
      feed.save!
      discovered_feed = feed
    end

    return discovered_feed
  else
    Rails.logger.warn "Feed autodiscovery failed for #{feed.fetch_url}"
    return nil
  end
end
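
The source above normalizes the discovered href through two helpers, relative_to_absolute_url and relative_to_absolute_protocol, whose implementations are not shown on this page. As a rough illustration of the two cases their comments describe, the following sketch resolves an href against the URL of the web page with URI.join from the Ruby standard library. The name absolutize_href is hypothetical, and this is a stand-in for the idea, not the helpers' actual code.

```ruby
require 'uri'

# Hypothetical stand-in for the two helpers used by discover.
# Resolves +href+ against +page_url+ following the usual URI reference rules.
def absolutize_href(href, page_url)
  URI.join(page_url, href).to_s
end

# Path without host: scheme and host are taken from the page URL.
absolutize_href('/feeds.php', 'https://website.com/blog/')
# => "https://website.com/feeds.php"

# Protocol-relative URL: only the scheme is taken from the page URL.
absolutize_href('//website.com/feeds.php', 'https://website.com/blog/')
# => "https://website.com/feeds.php"

# Already absolute URL: returned unchanged.
absolutize_href('http://other.com/atom.xml', 'https://website.com/blog/')
# => "http://other.com/atom.xml"
```

URI.join raises URI::InvalidURIError for malformed input, which in this sketch would bubble up like any other error.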