Class: RDig::ETagFilter

Inherits:
Object
Includes:
MonitorMixin
Defined in:
lib/rdig/crawler.rb

Overview

Checks the ETag header of each fetched document against the set of ETags of documents already indexed. This is meant to prevent double-indexing of documents that can be reached via different URLs (for example, host.com/ and host.com/index.html). Documents without an ETag are allowed to pass through.
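The behaviour described above can be tried out with a small, self-contained sketch. The class below is a stand-in copied from the listing on this page so that it runs without RDig installed; `Doc` is a hypothetical document type invented for illustration (real RDig documents carry more state), and the URLs and ETag values are made up.

```ruby
require 'set'
require 'monitor'

# Stand-in for RDig::ETagFilter, mirroring the source shown on this page.
class ETagFilter
  include MonitorMixin

  def initialize
    @etags = Set.new
    super
  end

  def apply(document)
    return document unless document.respond_to?(:etag) && document.etag && !document.etag.empty?
    synchronize do
      @etags.add?(document.etag) ? document : nil
    end
  end
end

# Hypothetical document type, assumed only for this example.
Doc = Struct.new(:url, :etag)

filter = ETagFilter.new
root  = Doc.new('http://host.com/', '"abc123"')
index = Doc.new('http://host.com/index.html', '"abc123"')
plain = Doc.new('http://host.com/about.html', nil)

first  = filter.apply(root)   # ETag not seen before: the document is returned
second = filter.apply(index)  # same ETag under a different URL: nil
third  = filter.apply(plain)  # no ETag: the document passes through
```

A caller would typically treat a `nil` result as "skip this document" and continue with anything else.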

Instance Method Summary

Constructor Details

#initialize ⇒ ETagFilter

Returns a new instance of ETagFilter.



# File 'lib/rdig/crawler.rb', line 118

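def initialize
  # ETags of all documents that have passed the filter so far.
  @etags = Set.new
  # Runs MonitorMixin#initialize, which sets up the monitor used by #synchronize.
  super
end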

Instance Method Details

#apply(document) ⇒ Object



# File 'lib/rdig/crawler.rb', line 123

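def apply(document)
  # Documents without a usable ETag cannot be deduplicated, so let them through.
  return document unless document.respond_to?(:etag) && document.etag && !document.etag.empty?
  synchronize do
    # Set#add? returns nil if the ETag was already recorded, so duplicates yield nil.
    @etags.add?(document.etag) ? document : nil
  end
end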
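The two building blocks of `#apply` can be looked at in isolation: `Set#add?`, which both tests for and records membership in one call, and the monitor, which keeps that test-and-record step atomic across threads. The following is only a sketch under assumptions: a plain `Monitor` stands in for the mixed-in monitor, and the ETag values and thread count are arbitrary.

```ruby
require 'set'
require 'monitor'

seen = Set.new
lock = Monitor.new

# Set#add? returns the set itself for a new element and nil for a known one.
first_try  = seen.add?('"abc123"')
second_try = seen.add?('"abc123"')

# Eight threads race to register the same ETag. Because the check-and-insert
# happens inside the monitor, exactly one of them sees a non-nil result.
winners = Queue.new
threads = 8.times.map do |i|
  Thread.new do
    lock.synchronize { winners << i if seen.add?('"def456"') }
  end
end
threads.each(&:join)
```

This is why a crawler running several fetcher threads can share one filter instance without indexing the same ETag twice.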