Class: FeedAbstract::Feed

Inherits:
Object
Defined in:
lib/feed-abstract/feed.rb

Overview

FeedAbstract::Feed is the main class. It invokes RSS::Parser and negotiates which of the FeedAbstract::Channel and FeedAbstract::Item classes are used to normalize the object graph of the feed you’re parsing.

Constructor Details

#initialize(xml = nil, opts = {}) ⇒ Feed

Parameters

  • xml - a string or object instance that responds to read

  • :do_validate - whether or not the feed should be validated. Passed through to RSS::Parser. Defaults to false.

  • :ignore_unknown_element - passed through to RSS::Parser. Defaults to true.

  • :input_encoding - Defaults to “UTF-8”. This is the encoding of the feed as passed to FeedAbstract::Feed.new

  • :output_encoding - Defaults to “UTF-8”. This is the encoding of the feed as it’s passed to the underlying parser - generally you should keep this as “UTF-8”.

  • :force_encoding - Force input text to be UTF-8 (or whatever you set :output_encoding to), removing invalid byte sequences before passing it to RSS::Parser. Defaults to true because RSS::Parser will raise an error if it sees invalid byte sequences.

  • :transliterate_characters - Ask Iconv to transliterate unknown characters when forcing encoding conversion. This only works reliably when you set the :input_encoding properly. Defaults to false and should probably be left there because it’s quirky and doesn’t appear to work reliably.
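
The way the defaults listed above combine with caller-supplied options, and the way the xml argument is accepted either as a string or as anything that responds to read, can be sketched in isolation. This is a minimal illustration only: DEFAULT_OPTIONS and prepare_input are hypothetical names that are not part of feed-abstract, and nothing is parsed here.

```ruby
require 'stringio'

# Hypothetical constant mirroring the documented defaults.
DEFAULT_OPTIONS = {
  :do_validate => false,
  :ignore_unknown_element => true,
  :input_encoding => 'UTF-8',
  :output_encoding => 'UTF-8',
  :force_encoding => true,
  :transliterate_characters => false
}.freeze

# Hypothetical helper: merges caller options over the defaults and
# reads the input if it is IO-like, otherwise uses it as-is.
def prepare_input(xml, opts = {})
  options = DEFAULT_OPTIONS.merge(opts)
  input = xml.respond_to?(:read) ? xml.read : xml
  [input, options]
end

# An IO-like object and a plain string yield the same input text.
input, options = prepare_input(StringIO.new('<rss/>'), :do_validate => true)
puts input
puts options[:do_validate]
puts options[:force_encoding]
```

Only the keys you pass are overridden; every other option keeps its documented default.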

Returns

An object with three attributes:

  • channel - an instance of FeedAbstract::Channel matching the type of feed we recognized

  • items - an array of items matching the type of feed we recognized.

  • raw_feed - the raw feed object returned by RSS::Parser, which might include RSS::Atom::Feed, RSS::RDF, or RSS::Rss

You will most likely be using the channel and items attributes.

Notes

  • If a feed can’t be parsed, a FeedAbstract::ParserError is raised.

  • All dates are returned as Time objects. If you’re using ActiveRecord, which expects DateTime objects, you will need to convert them. Fortunately this is very easy via the to_datetime extension that ActiveRecord provides.

feed = FeedAbstract::Feed.new(feed_xml_string)
feed.items.each do |item|
  fi = FeedItem.new # Your feed item model.
  # item.updated is a Time when a date is present, otherwise an empty string.
  if item.updated.respond_to?(:to_datetime)
    fi.last_updated = item.updated.to_datetime # <<--- here!
  end
  # More happens. . .
  fi.save
end
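
The respond_to?(:to_datetime) guard used above can be tried on its own, without a feed or a model. The sketch below assumes plain Ruby with the standard date library loaded and no ActiveSupport, in which case a Time responds to to_datetime and an empty string does not. The helper name to_datetime_or_nil is hypothetical.

```ruby
require 'date'

# Hypothetical helper: converts values that can become a DateTime
# and returns nil for anything else (for example an empty string).
def to_datetime_or_nil(value)
  value.respond_to?(:to_datetime) ? value.to_datetime : nil
end

updated = Time.utc(2011, 5, 17, 12, 30, 0)
converted = to_datetime_or_nil(updated)

puts converted.class
puts to_datetime_or_nil('').inspect
```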

Examples

f = FeedAbstract::Feed.new(File.open('/home/foo/xml/feed.rss2'))
puts f.channel.title
puts f.channel.description

f.items.each do |item|
  puts item.title
  puts item.link
end

f = FeedAbstract::Feed.new(File.open('/home/foo/xml/feed.atom'))
puts f.channel.generator

puts "All tags / categories / subjects in this feed: " + f.items.collect{|i| i.categories}.flatten.uniq.sort.join(', ')

require 'net/http'
require 'uri'

f = FeedAbstract::Feed.new(Net::HTTP.get(URI.parse('http://rss.slashdot.org/Slashdot/slashdot')))
puts f.items.collect{|i| i.link}

# File 'lib/feed-abstract/feed.rb', line 60

def initialize(xml = nil, opts = {})
  options = {
    :do_validate => false, 
    :ignore_unknown_element => true, 
    :input_encoding => 'UTF-8', 
    :output_encoding => 'UTF-8', 
    :force_encoding => true, 
    :transliterate_characters => false
  }.merge(opts)

  input = (xml.respond_to?(:read)) ? xml.read : xml

  if options[:force_encoding]
    ic = Iconv.new(options[:output_encoding].upcase + ((options[:transliterate_characters]) ? '//TRANSLIT' : '') + '//IGNORE',options[:input_encoding].upcase)
    if input.respond_to?(:encoding)
      # ruby 1.9
      # Only transcode if the encoding isn't valid.
      # See: http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/ for why we're appending the extra space.
      unless (input.encoding.to_s.upcase == options[:output_encoding].upcase && input.valid_encoding?)
        input = ic.iconv(input << ' ')[0..-2]
      end
    else
      # ruby 1.8
      input = ic.iconv(input << ' ')[0..-2]
    end
  end

  @raw_feed = RSS::Parser.parse(input,options[:do_validate], options[:ignore_unknown_element])
  if @raw_feed == nil
    raise FeedAbstract::ParserError
  end
  negotiate_channel_class
end
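
The cleanup step above depends on Iconv, which was removed from Ruby's standard library in Ruby 2.0. As a rough sketch of the same idea on a newer Ruby (2.1 or later), invalid bytes could be dropped with String#scrub and String#encode instead. This is a substitute technique, not the library's implementation: clean_feed_text is a hypothetical helper, and it does not reproduce the :transliterate_characters behaviour.

```ruby
# Hypothetical helper, not part of feed-abstract.
# Treats the input bytes as input_encoding and returns text in
# output_encoding with invalid or unconvertible bytes removed.
def clean_feed_text(input, input_encoding: 'UTF-8', output_encoding: 'UTF-8')
  # Work on a copy so the caller's string is never modified.
  text = input.dup.force_encoding(input_encoding)

  if text.encoding == Encoding.find(output_encoding)
    # Same encoding on both sides: just strip invalid byte sequences.
    text.scrub('')
  else
    # Different encodings: transcode, dropping what cannot be converted.
    text.encode(output_encoding, :invalid => :replace, :undef => :replace, :replace => '')
  end
end

puts clean_feed_text("abc\xFFdef")
puts clean_feed_text("caf\xE9".b, input_encoding: 'ISO-8859-1')
```

Unlike the constructor above, which appends a space to its input with <<, this sketch works on a copy, so the string passed in is left untouched.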

Instance Attribute Details

#channel ⇒ Object (readonly)

Returns the value of attribute channel.

# File 'lib/feed-abstract/feed.rb', line 9

def channel
  @channel
end

#items ⇒ Object (readonly)

Returns the value of attribute items.

# File 'lib/feed-abstract/feed.rb', line 9

def items
  @items
end

#raw_feed ⇒ Object (readonly)

Returns the value of attribute raw_feed.

# File 'lib/feed-abstract/feed.rb', line 9

def raw_feed
  @raw_feed
end