Method: Grubby#fulfill

Defined in:
lib/grubby.rb

#fulfill(uri, purpose = "") {|resource| ... } ⇒ Object?

Ensures only-once processing of the resource indicated by uri for the specified purpose. The given block is executed and its result is returned if and only if the Grubby instance has not recorded a previous call to fulfill for the same resource and purpose. Otherwise, the block is skipped and nil is returned.

Note that the resource is identified by both its URI and its content hash. The latter prevents superfluous and rearranged URI query string parameters from interfering with only-once processing.
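
The effect of identifying a resource by both URI and content hash can be sketched without Grubby itself. The following is a minimal illustration, not the library's implementation: SeenResources and first_time? are hypothetical names, the response body is passed in by hand instead of being fetched, and SHA-1 is an assumed hash function.

```ruby
require "digest"
require "set"

# Hypothetical helper (not part of Grubby): remembers which URIs and
# content hashes have been seen for each purpose.
class SeenResources
  def initialize
    @seen = Set.new
  end

  # Returns true only if neither the URI nor the content hash has been
  # recorded for this purpose. Both keys are always recorded.
  def first_time?(uri, body, purpose = "")
    new_uri  = @seen.add?([purpose, uri])
    new_hash = @seen.add?([purpose, "content hash: #{Digest::SHA1.hexdigest(body)}"])
    !!(new_uri && new_hash)
  end
end

seen = SeenResources.new
body = "<html>posts</html>" # assumed: both URIs below serve this same body

seen.first_time?("https://example.com/posts", body)           # => true
seen.first_time?("https://example.com/posts?page=1", body)    # => false (same content hash)
seen.first_time?("https://example.com/posts", body, "again!") # => true (new purpose)
```

In this sketch the second call is rejected even though its URI is new, because the content hash was already recorded for the same purpose.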

If #journal is set, and if the block does not raise an exception, the resource and purpose are logged to the journal file. This enables only-once processing across multiple program runs. It also provides a means to resume batch processing after an unexpected termination.
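
The journaling idea can be modeled in a few lines of plain Ruby. This is a simplified sketch under stated assumptions, not Grubby's code: JournaledOnce and once are hypothetical names, resources are identified by a plain string key, and the two-column CSV layout is invented for illustration. Unlike the real method, the sketch records an entry only after the block succeeds.

```ruby
require "csv"
require "set"

# Hypothetical, simplified model of a journal-backed only-once guard.
class JournaledOnce
  def initialize(journal_path)
    @journal_path = journal_path
    @seen = Set.new
    # Replay entries written by earlier runs, if the journal exists.
    if File.exist?(@journal_path)
      CSV.foreach(@journal_path) { |purpose, key| @seen << [purpose.to_s, key] }
    end
  end

  # Runs the block once per (purpose, key); returns nil on repeats.
  def once(key, purpose = "")
    return nil if @seen.include?([purpose, key])

    result = yield key # an exception here skips the journal write below
    @seen << [purpose, key]
    CSV.open(@journal_path, "a") { |csv| csv << [purpose, key] }
    result
  end
end
```

A new JournaledOnce created with the same journal path in a later run skips every key that completed earlier, while a key whose block raised is retried. That is the resume-after-termination behavior the paragraph above describes.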

Examples:

grubby = Grubby.new

grubby.fulfill("https://example.com/posts") do |page|
  "first time"
end
# == "first time"

grubby.fulfill("https://example.com/posts") do |page|
  "already seen" # not evaluated
end
# == nil

grubby.fulfill("https://example.com/posts?page=1") do |page|
  "already seen content hash" # not evaluated
end
# == nil

grubby.fulfill("https://example.com/posts", "again!") do |page|
  "already seen, but new purpose"
end
# == "already seen, but new purpose"

Parameters:

  • uri

  • purpose (defaults to: "")

Yield Parameters:

  • resource

Yield Returns:

  • (Object)

Returns:

  • (Object, nil)

Raises:

  • (Mechanize::ResponseCodeError)

    if fetching the resource results in error (see Mechanize#get)



# File 'lib/grubby.rb', line 193

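def fulfill(uri, purpose = "")
  # Entries recorded during this call; appended to the journal at the end.
  series = []

  # Return early (nil) if the URI as given was already fulfilled.
  uri = uri.to_absolute_uri
  return unless add_fulfilled(uri, purpose, series)

  # Same check for the normalized form of the URI.
  normalized_uri = normalize_uri(uri)
  return unless add_fulfilled(normalized_uri, purpose, series)

  $log.info("Fetch #{normalized_uri}")
  resource = get(normalized_uri)
  # Non-short-circuit `&` so that both the final URI and the content
  # hash are recorded, even when the first check already failed.
  unprocessed = add_fulfilled(resource.uri, purpose, series) &
    add_fulfilled("content hash: #{resource.content_hash}", purpose, series)

  result = yield resource if unprocessed

  # Reached only if the block did not raise.
  CSV.open(journal, "a") do |csv|
    series.each{|entry| csv << entry }
  end if journal

  result
end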