Class: Rfeedfinder
- Inherits:
-
Object
- Object
- Rfeedfinder
- Defined in:
- lib/rfeedfinder/version.rb,
lib/rfeedfinder.rb
Overview
:nodoc:
Defined Under Namespace
Modules: VERSION
Class Method Summary
- .feed(uri, options = {}) ⇒ Object
  Takes uri (string), the URI to check, and options (hash), including :proxy (string), proxy information to use.
- .feeds(uri, options = {}) ⇒ Object
  Takes uri (string), the URI to check, and options (hash), including :proxy (string), proxy information to use.
- .isFeed?(uri, options) ⇒ Boolean
  Takes uri (string).
- .isFeedData?(data) ⇒ Boolean
  Takes data (string).
Instance Method Summary
- #feed(uri) ⇒ Object
  Takes uri (string).
- #feeds(uri) ⇒ Object
  Takes uri (string).
- #initialize(init_values = {}) ⇒ Rfeedfinder (constructor)
  Takes init_values (hash), including :proxy (string), proxy information to use.
Constructor Details
#initialize(init_values = {}) ⇒ Rfeedfinder
Takes:
- init_values (hash)
  - :proxy (string): proxy information to use. Defaults to a blank string
  - :user_agent (string): user agent to identify as. Defaults to Ruby/#RUBY_VERSION - Rfeedfinder VERSION
  - :from (string): contact info to the responsible person. FIXME: Is this correct? Defaults to [email protected]
  - :keep_data (boolean): whether the data downloaded for the feeds should be returned along with the URLs. Defaults to false
  - :use_google (boolean): tries to find a URL using a Google "I'm feeling lucky" search. Defaults to false
Example:
  Rfeedfinder.new(:proxy => "127.0.0.1:1234",
    :user_agent => "MyApp", :from => "[email protected]", :referer => "http://domain.com")
Returns:
- a new instance of Rfeedfinder
# File 'lib/rfeedfinder.rb', line 31

def initialize(init_values = {})
  @options = init_values
end
Class Method Details
.feed(uri, options = {}) ⇒ Object
Takes:
- uri (string): The URI to check
- options (hash)
  - :proxy (string): proxy information to use. Defaults to a blank string
  - :user_agent (string): user agent to identify as. Defaults to Ruby/#RUBY_VERSION - Rfeedfinder VERSION
  - :from (string): contact info to the responsible person. FIXME: Is this correct? Defaults to [email protected]
  - :keep_data (boolean): whether the data downloaded for the feed should be returned along with the URL. Defaults to false
  - :use_google (boolean): tries to find a URL using a Google "I'm feeling lucky" search. Defaults to false
Example:
  Rfeedfinder.feed("www.google.com", :proxy => "127.0.0.1:1234",
    :user_agent => "MyApp", :from => "[email protected]", :referer => "http://domain.com")
Returns:
- one URL as a string, or nil
- one hash if the :keep_data option is true. Example: {:url => "url1", :data => "some data"}
Raises:
- ArgumentError if uri is not a valid URL and :use_google => false
- ArgumentError if :use_google => true but it's not your lucky day
# File 'lib/rfeedfinder.rb', line 254

def self.feed(uri, options = {})
  options[:only_first] = true
  feedlist = Rfeedfinder.feeds(uri, options)
  unless feedlist.empty?
    return feedlist[0]
  else
    return nil
  end
end
.feeds(uri, options = {}) ⇒ Object
Takes:
- uri (string): The URI to check
- options (hash)
  - :proxy (string): proxy information to use. Defaults to a blank string
  - :user_agent (string): user agent to identify as. Defaults to Ruby/#RUBY_VERSION - Rfeedfinder VERSION
  - :from (string): contact info to the responsible person. FIXME: Is this correct? Defaults to [email protected]
  - :keep_data (boolean): if the data downloaded for the feeds should be returned along with the URLs. Defaults to false
  - :use_google (boolean): tries to find a URL using a Google "I'm feeling lucky" search. Defaults to false
Example:
  Rfeedfinder.feeds("www.google.com", :proxy => "127.0.0.1:1234",
    :user_agent => "MyApp", :from => "[email protected]", :referer => "http://domain.com")
Returns:
- array of urls
- array of hashes if the :keep_data option is true. Example:
  [{:url => "url1", :data => "some data"}, {:url => "url2", :data => "feed data"}]
Raises:
- ArgumentError if uri is not a valid URL and :use_google => false
- ArgumentError if :use_google => true but it's not your lucky day
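The two documented return shapes can be illustrated without touching the network. The following is a minimal sketch, not library code: `shape_feeds_result` is a hypothetical helper, and the URLs and data strings are the placeholder values from the example above.

```ruby
# Hypothetical helper (not part of Rfeedfinder): mirrors the documented
# return shapes of .feeds, given a list of URLs and a url => data lookup.
def shape_feeds_result(urls, data_by_url, keep_data)
  # Without :keep_data the caller gets the plain array of URLs.
  return urls unless keep_data

  # With :keep_data each URL is paired with the data downloaded for it.
  urls.map { |url| { :url => url, :data => data_by_url[url] } }
end

urls = ["url1", "url2"]
data = { "url1" => "some data", "url2" => "feed data" }

p shape_feeds_result(urls, data, false)
p shape_feeds_result(urls, data, true)
```

Code that consumes the result therefore has to know which shape it asked for, since the elements are strings in one case and hashes in the other.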
# File 'lib/rfeedfinder.rb', line 86

def self.feeds(uri, options = {})

  # We have to create a hash for the data
  # if the user has asked us to keep the data
  options[:data] = {} if options[:keep_data]

  options[:original_uri] = uri if !Rfeedfinder.isAValidURL?(uri) and options[:use_google]

  uri = URI.decode(uri)
  options[:recurs] = [uri] if options[:recurs].nil?
  fulluri = Rfeedfinder.makeFullURI(uri)

  raise ArgumentError, "#{fulluri} is not a valid URI." \
    if !Rfeedfinder.isAValidURL?(fulluri) and !options[:use_google]

  # Add youtube support
  if fulluri =~ /youtube\.com\/user\/(.*[^\/])/
    fulluri = "http://www.youtube.com/rss/user/#{$1}/videos.rss"
  end
  if fulluri =~ /youtube\.com\/tag\/(.*[^\/])/
    fulluri = "http://www.youtube.com/rss/tag/#{$1}/videos.rss"
  end

  data = Rfeedfinder.open_doc(fulluri, options)
  return [] if data.nil?

  # If we used the google link finder, then we should set the new URL
  fulluri = options[:google_link] if options[:google_link]

  # is this already a feed?
  if Rfeedfinder.isFeedData?(data)
    feedlist = [fulluri]
    Rfeedfinder.verifyRedirect(feedlist)
    return feedlist
  end

  #verify redirection
  newuri = Rfeedfinder.tryBrokenRedirect(data)
  if !newuri.nil? and !newuri.empty?
    options[:recurs] = [] unless options[:recurs]
    unless options[:recurs].include?(newuri)
      options[:recurs] << newuri
      return feeds(newuri, options)
    end
  end

  #verify frameset
  frames = Rfeedfinder.getFrameLinks(data, fulluri)
  frames.each {|newuri|
    if !newuri.nil? and !newuri.empty?
      options[:recurs] = [] unless options[:recurs]
      unless options[:recurs].include?(newuri)
        options[:recurs] << newuri
        return feeds(newuri, options)
      end
    end
  }

  # nope, it's a page, try LINK tags first
  outfeeds = Rfeedfinder.getLinks(data, fulluri).select {|link| Rfeedfinder.isFeed?(link, options)}

  #_debuglog('found %s feeds through LINK tags' % len(outfeeds))
  if outfeeds.empty?
    # no LINK tags, look for regular <A> links that point to feeds
    begin
      links = Rfeedfinder.getALinks(data, fulluri)
    rescue
      links = []
    end

    # Get local links
    links, locallinks = Rfeedfinder.getLocalLinks(links, fulluri)

    # TODO:
    # implement support for :only_first down her

    # look for obvious feed links on the same server
    selected_feeds = locallinks.select{|link| Rfeedfinder.isFeedLink?(link) and Rfeedfinder.isFeed?(link, options)}
    outfeeds << selected_feeds unless selected_feeds.empty?
    # outfeeds.each{|link| puts "1 #{link}"}

    # look harder for feed links on the same server
    selected_feeds = locallinks.select{|link| Rfeedfinder.(link) and Rfeedfinder.isFeed?(link, options)} if outfeeds.empty?
    outfeeds << selected_feeds unless selected_feeds.empty?
    # outfeeds.each{|link| puts "2 #{link}"}

    # look for obvious feed links on another server
    selected_feeds = links.select {|link| Rfeedfinder.isFeedLink?(link) and Rfeedfinder.isFeed?(link, options)} if outfeeds.empty?
    outfeeds << selected_feeds unless selected_feeds.empty?
    # outfeeds.each{|link| puts "3 #{link}"}

    # look harder for feed links on another server
    selected_feeds = links.select {|link| Rfeedfinder.(link) and Rfeedfinder.isFeed?(link, options)} if outfeeds.empty?
    outfeeds << selected_feeds unless selected_feeds.empty?
    # outfeeds.each{|link| puts "4 #{link}"}
  end

  if outfeeds.empty?
    # no A tags, guessing
    # filenames used by popular software:
    guesses = ['atom.xml', # blogger, TypePad
               'feed/', # wordpress
               'feeds/posts/default', # blogspot
               'feed/main/rss20', # fotolog
               'index.atom', # MT, apparently
               'index.rdf', # MT
               'rss.xml', # Dave Winer/Manila
               'index.xml', # MT
               'index.rss'] # Slash
    guesses.each { |guess|
      uri = URI.join(fulluri, guess).to_s
      outfeeds << uri if Rfeedfinder.isFeed?(uri, options)
    }
  end

  # try with adding ending slash
  if outfeeds.empty? and fulluri !~ /\/$/
    outfeeds = Rfeedfinder.feeds(fulluri + "/", options)
  end

  # Verify redirection
  Rfeedfinder.verifyRedirect(outfeeds)

  # This has to be used until proper :only_first support has been built in
  outfeeds = outfeeds.first if options[:only_first] and outfeeds.size > 1

  if options[:keep_data]
    output = []
    outfeeds.each do |feed|
      output << {:url => feed, :data => options[:data][feed]}
    end
    return output
  else
    return outfeeds
  end
end
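Two steps of the listing can be tried in isolation with only the standard library: the YouTube rewrite and the file-name guessing. The sketch below is an illustration under assumptions, not the library's code. `youtube_feed_uri`, `guess_feed_uris` and `FEED_GUESSES` are hypothetical names, and the example.com URLs are placeholders; the regexes and the guess list are copied from the listing.

```ruby
require "uri"

# File names tried by the guessing step, copied from the listing.
FEED_GUESSES = ['atom.xml', 'feed/', 'feeds/posts/default', 'feed/main/rss20',
                'index.atom', 'index.rdf', 'rss.xml', 'index.xml',
                'index.rss'].freeze

# Hypothetical helper: applies the two YouTube rewrites from the listing.
def youtube_feed_uri(fulluri)
  if fulluri =~ /youtube\.com\/user\/(.*[^\/])/
    fulluri = "http://www.youtube.com/rss/user/#{$1}/videos.rss"
  end
  if fulluri =~ /youtube\.com\/tag\/(.*[^\/])/
    fulluri = "http://www.youtube.com/rss/tag/#{$1}/videos.rss"
  end
  fulluri
end

# Hypothetical helper: the candidate URLs the guessing step would probe.
def guess_feed_uris(fulluri)
  FEED_GUESSES.map { |guess| URI.join(fulluri, guess).to_s }
end

puts youtube_feed_uri("http://www.youtube.com/user/example/")
# With a trailing slash the guess is resolved inside the directory.
puts guess_feed_uris("http://example.com/blog/").first
# Without it, URI.join replaces the last path segment instead.
puts guess_feed_uris("http://example.com/blog").first
```

The last two calls suggest why the listing retries with `fulluri + "/"` when nothing was found: relative resolution drops the final path segment unless the base ends in a slash, so the first pass over `http://example.com/blog` never probes anything under `/blog/`.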
.isFeed?(uri, options) ⇒ Boolean
Takes:
- uri (string)
Downloads the URI and checks the content with the isFeedData? class method.
Returns:
- true if the uri points to a feed
- false if not
# File 'lib/rfeedfinder.rb', line 289

def self.isFeed?(uri, options)
  # We return false if the user only wants one result
  # and we already have found it so there aren't made
  # any additional external calls
  return false if options[:only_first] and options[:already_found_one]

  uri.gsub!(/\/\/www\d\./, "//www.")
  begin
    protocol = URI.split(uri)
    return false if !protocol[0].index(/^[http|https]/)
  rescue
    # URI error
    return false
  end

  data = Rfeedfinder.open_doc(uri, options)
  return false if data.nil?

  if Rfeedfinder.isFeedData?(data)
    options[:already_found_one] = true if options[:only_first]
    return true
  else
    return false
  end
end
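The scheme test in the listing uses `/^[http|https]/`, which Ruby reads as a character class (any one of `h`, `t`, `p`, `|`, `s`) rather than as the alternation `http|https`. The sketch below compares that behaviour with a stricter check; `listing_scheme_check` and `http_or_https?` are hypothetical names introduced for illustration, and the stricter variant is an assumption about what the check is meant to do, not the library's implementation.

```ruby
require "uri"

# The test as written in the listing, applied to a scheme string:
# passes whenever the first character is one of h, t, p, | or s.
def listing_scheme_check(scheme)
  !scheme.index(/^[http|https]/).nil?
end

# Hypothetical stricter variant (not part of Rfeedfinder): accepts only
# the exact schemes "http" and "https", and treats unparsable input as a miss.
def http_or_https?(uri)
  %w[http https].include?(URI.parse(uri).scheme)
rescue URI::InvalidURIError
  false
end

p listing_scheme_check("https")   # starts with "h"
p listing_scheme_check("telnet")  # starts with "t", so it also passes
p listing_scheme_check("ftp")     # starts with "f", rejected

p http_or_https?("http://example.com/feed")
p http_or_https?("telnet://example.com")
```

Note also that `uri.gsub!` in the listing rewrites the caller's string in place, so a URL such as `//www2.` is changed for the caller as well.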
.isFeedData?(data) ⇒ Boolean
Takes:
- data (string)
Returns:
- true if the data has an rss, rdf or feed tag
- false if the data has an html tag
# File 'lib/rfeedfinder.rb', line 272

def self.isFeedData?(data)
  # if no html tag and rss, rdf or feed tag, it's a feed
  # puts data
  return ((data/"html|HTML").empty? and (!(data/:rss).nil? or !(data/:rdf).nil? or !(data/:feed).nil?))
end
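The listing queries `data` with the `/` search operator, which implies a parsed document rather than the raw string the parameter description mentions. As a rough stand-in for the same rule ("no html tag, and an rss, rdf or feed tag"), the sketch below works on plain text with regular expressions. `looks_like_feed?` is a hypothetical name and the regexes are an approximation chosen for this example, not the library's logic.

```ruby
# Hypothetical stand-in (not part of Rfeedfinder): applies the documented
# rule to raw text instead of a parsed document.
def looks_like_feed?(text)
  # Any opening html tag means the document is treated as a page.
  return false if text =~ /<html[\s>]/i

  # Otherwise look for an opening rss, rdf:RDF or feed tag.
  !(text =~ /<(rss|rdf:RDF|feed)[\s>]/i).nil?
end

rss  = '<?xml version="1.0"?><rss version="2.0"><channel/></rss>'
atom = '<feed xmlns="http://www.w3.org/2005/Atom"></feed>'
page = '<html><head><title>Blog</title></head></html>'

p looks_like_feed?(rss)
p looks_like_feed?(atom)
p looks_like_feed?(page)
```

A text-based check like this can be fooled by markup inside comments or CDATA sections, which is one reason to inspect a parsed document instead.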
Instance Method Details
#feed(uri) ⇒ Object
Takes:
- uri (string)
Returns:
- url (string)
# File 'lib/rfeedfinder.rb', line 53

def feed(uri)
  result = Rfeedfinder.feed(uri, @options.dup)
end
#feeds(uri) ⇒ Object
Takes:
- uri (string)
Returns:
- array of urls
# File 'lib/rfeedfinder.rb', line 42

def feeds(uri)
  Rfeedfinder.feeds(uri, @options.dup)
end
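Both instance methods pass `@options.dup` rather than `@options`. A plausible reason, judging from the listings, is that the class methods write into the hash they receive (for example `options[:only_first] = true` in `.feed`). The sketch below shows the effect with stand-ins: `mutating_lookup` and `FinderSketch` are hypothetical and only imitate that behaviour.

```ruby
# Stand-in for a class method that, like Rfeedfinder.feed in the listing,
# writes into the options hash it is given.
def mutating_lookup(options = {})
  options[:only_first] = true
  options
end

# Hypothetical class (not the real Rfeedfinder) with the same constructor shape.
class FinderSketch
  attr_reader :options

  def initialize(init_values = {})
    @options = init_values
  end

  # Passes a shallow copy, as the instance methods in the listing do.
  def lookup
    mutating_lookup(@options.dup)
  end

  # Passes the stored hash itself, for comparison.
  def lookup_without_dup
    mutating_lookup(@options)
  end
end

finder = FinderSketch.new(:proxy => "127.0.0.1:1234")

finder.lookup
p finder.options  # stored options are untouched

finder.lookup_without_dup
p finder.options  # :only_first has leaked into the stored options
```

`dup` makes a shallow copy, so it protects against keys being added or reassigned but not against in-place changes to values the two hashes share.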