Class: RDig::ETagFilter
- Inherits:
- Object
- Object
- RDig::ETagFilter
- Includes:
- MonitorMixin
- Defined in:
- lib/rdig/crawler.rb
Overview
Checks the ETag header of each fetched document against the set of ETags of the documents already indexed. This is intended to help prevent double-indexing of documents that can be reached via different URLs (think host.com/ and host.com/index.html). Documents without an ETag are allowed to pass through.
Instance Method Summary
- #apply(document) ⇒ Object
- #initialize ⇒ ETagFilter (constructor): a new instance of ETagFilter.
Constructor Details
#initialize ⇒ ETagFilter
Returns a new instance of ETagFilter.
    # File 'lib/rdig/crawler.rb', line 118
    def initialize
      @etags = Set.new
      super
    end
Instance Method Details
#apply(document) ⇒ Object
    # File 'lib/rdig/crawler.rb', line 123
    def apply(document)
      return document unless (document.respond_to?(:etag) && document.etag && !document.etag.empty?)
      synchronize do
        @etags.add?(document.etag) ? document : nil
      end
    end
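The behaviour of #apply can be tried out in isolation with a minimal sketch. It assumes a hypothetical `SketchDocument` struct standing in for a fetched document, and `ETagFilterSketch` is a standalone copy of the logic for illustration, not RDig's own class. The URLs and ETag values are made up.

```ruby
require 'set'
require 'monitor'

# Hypothetical stand-in for a fetched document; RDig's real document
# classes are not part of this sketch.
SketchDocument = Struct.new(:url, :etag)

# Standalone copy of the filter logic, for illustration only.
class ETagFilterSketch
  include MonitorMixin

  def initialize
    @etags = Set.new
    super() # lets MonitorMixin set up its internal lock
  end

  def apply(document)
    # Documents without a usable ETag always pass through.
    return document unless document.respond_to?(:etag) && document.etag && !document.etag.empty?

    synchronize do
      # Set#add? returns nil when the value is already in the set.
      @etags.add?(document.etag) ? document : nil
    end
  end
end

filter   = ETagFilterSketch.new
root     = SketchDocument.new('http://host.com/', '"abc123"')
index    = SketchDocument.new('http://host.com/index.html', '"abc123"')
untagged = SketchDocument.new('http://host.com/other', nil)

first_result    = filter.apply(root)     # first sighting of this ETag
second_result   = filter.apply(index)    # same ETag seen again
untagged_result = filter.apply(untagged) # no ETag at all
```

In this sketch the first document is returned unchanged, the second call returns nil because its ETag is already recorded, and the document without an ETag is returned unchanged. The `synchronize` block makes the check-and-record step atomic when several threads share one filter instance.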