Module: Spidr::Events

Included in:
Agent
Defined in:
lib/spidr/events.rb

Overview

The Events module adds methods to Agent for registering callbacks that receive URLs, links, headers and pages as they are visited.


Instance Method Details

#all_headers {|headers| ... } ⇒ Object

Pass the headers from every response the agent receives to a given block.

Yields:

  • (headers)

    The block will be passed the headers of every response.

Yield Parameters:

  • headers (Hash)

    The headers from a response.



# File 'lib/spidr/events.rb', line 73

def all_headers
  every_page { |page| yield page.headers }
end
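all_headers registers nothing of its own: it wraps the caller's block in an every_page callback that yields page.headers. The sketch below reproduces that layering under stated assumptions: HeaderAgent and HeaderPage are hypothetical stand-ins (not Spidr classes), and the dispatch method is an assumed substitute for the agent visiting a page.

```ruby
# Hypothetical stand-ins: HeaderAgent and HeaderPage are NOT Spidr classes.
HeaderPage = Struct.new(:url, :headers)

class HeaderAgent
  def initialize
    @every_page_blocks = []
  end

  # Same shape as Spidr's every_page: store the block, return self.
  def every_page(&block)
    @every_page_blocks << block
    self
  end

  # Same body as Spidr's all_headers: layered on top of every_page.
  def all_headers
    every_page { |page| yield page.headers }
  end

  # Assumed dispatcher: simulates the agent visiting a page.
  def dispatch(page)
    @every_page_blocks.each { |blk| blk.call(page) }
  end
end

servers = []
agent = HeaderAgent.new
agent.all_headers { |headers| servers << headers["server"] }

agent.dispatch(HeaderPage.new("http://example.com/", { "server" => "nginx" }))
agent.dispatch(HeaderPage.new("http://example.com/a", { "server" => "Apache" }))

p servers # => ["nginx", "Apache"]
```

The yield inside the stored block still reaches all_headers' block when the callback runs later, which is what lets these one-line wrappers work.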

#every_atom_doc {|doc| ... } ⇒ Object

Pass every Atom document that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every Atom document parsed.

Yield Parameters:

  • doc (Nokogiri::XML::Document)

    A parsed XML document.

See Also:



# File 'lib/spidr/events.rb', line 392

def every_atom_doc
  every_page do |page|
    if (block_given? && page.atom?)
      if (doc = page.doc)
        yield doc
      end
    end
  end
end

#every_atom_page {|feed| ... } ⇒ Object

Pass every Atom feed that the agent visits to a given block.

Yields:

  • (feed)

    The block will be passed every Atom feed visited.

Yield Parameters:

  • feed (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 456

def every_atom_page
  every_page do |page|
    yield page if (block_given? && page.atom?)
  end
end

#every_bad_request_page {|page| ... } ⇒ Object

Pass every Bad Request page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Bad Request page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 145

def every_bad_request_page
  every_page do |page|
    yield page if (block_given? && page.bad_request?)
  end
end

#every_css_page {|page| ... } ⇒ Object

Pass every CSS page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every CSS page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 426

def every_css_page
  every_page do |page|
    yield page if (block_given? && page.css?)
  end
end

#every_doc {|doc| ... } ⇒ Object

Pass every HTML or XML document that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every HTML or XML document parsed.

Yield Parameters:

  • doc (Nokogiri::HTML::Document, Nokogiri::XML::Document)

    A parsed HTML or XML document.

See Also:



# File 'lib/spidr/events.rb', line 286

def every_doc
  every_page do |page|
    if block_given?
      if (doc = page.doc)
        yield doc
      end
    end
  end
end
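The *_doc methods only yield when page.doc returns a document; the assignment inside `if (doc = page.doc)` both captures the value and skips pages with no parsed document. A minimal sketch, assuming hypothetical DocAgent/DocPage stand-ins, a plain String in place of a Nokogiri document, and nil for a body that could not be parsed:

```ruby
# Hypothetical stand-ins: DocAgent and DocPage are NOT Spidr classes.
# A String stands in for the parsed Nokogiri document; nil means "not parsed".
DocPage = Struct.new(:url, :doc)

class DocAgent
  def initialize
    @every_page_blocks = []
  end

  def every_page(&block)
    @every_page_blocks << block
    self
  end

  # Same body as Spidr's every_doc: skip pages without a parsed document.
  def every_doc
    every_page do |page|
      if block_given?
        if (doc = page.doc)
          yield doc
        end
      end
    end
  end

  # Assumed dispatcher: simulates the agent visiting a page.
  def dispatch(page)
    @every_page_blocks.each { |blk| blk.call(page) }
  end
end

docs = []
agent = DocAgent.new
agent.every_doc { |doc| docs << doc }

agent.dispatch(DocPage.new("http://example.com/", "<html>parsed</html>"))
agent.dispatch(DocPage.new("http://example.com/logo.png", nil)) # no document

p docs # => ["<html>parsed</html>"]
```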

#every_failed_url {|url| ... } ⇒ Object

Pass each URL that could not be requested to the given block.

Yields:

  • (url)

    The block will be passed every URL that could not be requested.

Yield Parameters:

  • url (URI::HTTP)

    A failed URL.



# File 'lib/spidr/events.rb', line 31

def every_failed_url(&block)
  @every_failed_url_blocks << block
  return self
end
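every_failed_url only stores the block; the agent calls it when a URL cannot be requested. The sketch below pairs the same registration method with a hypothetical FailureAgent whose visit method (an assumption, not Spidr's request code) treats any raised StandardError as a failed request and notifies the callbacks.

```ruby
require "uri"

# Hypothetical FailureAgent: every_failed_url mirrors Spidr's registration;
# `visit` and its error handling are assumptions for illustration only.
class FailureAgent
  def initialize(fetcher)
    @fetcher = fetcher # any callable that raises when a request fails
    @every_failed_url_blocks = []
  end

  def every_failed_url(&block)
    @every_failed_url_blocks << block
    self
  end

  def visit(url)
    @fetcher.call(url)
  rescue StandardError
    # Tell every registered callback that this URL could not be requested.
    @every_failed_url_blocks.each { |blk| blk.call(url) }
    nil
  end
end

fetcher = lambda do |url|
  raise IOError, "connection refused" if url.host == "down.example"
  "ok"
end

failed = []
agent = FailureAgent.new(fetcher)
agent.every_failed_url { |url| failed << url.to_s }

agent.visit(URI("http://example.com/"))
agent.visit(URI("http://down.example/page"))

p failed # => ["http://down.example/page"]
```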

#every_forbidden_page {|page| ... } ⇒ Object

Pass every Forbidden page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Forbidden page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 175

def every_forbidden_page
  every_page do |page|
    yield page if (block_given? && page.forbidden?)
  end
end

#every_html_doc {|doc| ... } ⇒ Object

Pass every HTML document that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every HTML document parsed.

Yield Parameters:

  • doc (Nokogiri::HTML::Document)

    A parsed HTML document.

See Also:



# File 'lib/spidr/events.rb', line 307

def every_html_doc
  every_page do |page|
    if (block_given? && page.html?)
      if (doc = page.doc)
        yield doc
      end
    end
  end
end

#every_html_page {|page| ... } ⇒ Object

Pass every HTML page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every HTML page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 236

def every_html_page
  every_page do |page|
    yield page if (block_given? && page.html?)
  end
end
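Filters such as every_html_page keep no callback list of their own: they add one more every_page callback that checks a predicate. So a single page can fire both a plain every_page block and a filtered one. A minimal sketch; HtmlAgent, HtmlPage and the content-type check in html? are hypothetical stand-ins, not Spidr's implementations.

```ruby
# Hypothetical stand-ins: HtmlAgent and HtmlPage are NOT Spidr classes.
HtmlPage = Struct.new(:url, :content_type) do
  # Assumed predicate for illustration: a simple content-type prefix check.
  def html?
    content_type.to_s.start_with?("text/html")
  end
end

class HtmlAgent
  def initialize
    @every_page_blocks = []
  end

  def every_page(&block)
    @every_page_blocks << block
    self
  end

  # Same body as Spidr's every_html_page.
  def every_html_page
    every_page do |page|
      yield page if (block_given? && page.html?)
    end
  end

  # Assumed dispatcher: simulates the agent visiting a page.
  def dispatch(page)
    @every_page_blocks.each { |blk| blk.call(page) }
  end
end

seen = 0
html_urls = []
agent = HtmlAgent.new
agent.every_page { |_page| seen += 1 }
agent.every_html_page { |page| html_urls << page.url }

agent.dispatch(HtmlPage.new("/", "text/html; charset=utf-8"))
agent.dispatch(HtmlPage.new("/site.css", "text/css"))

p seen      # => 2
p html_urls # => ["/"]
```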

#every_internal_server_error_page {|page| ... } ⇒ Object

Pass every Internal Server Error page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Internal Server Error page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 206

def every_internal_server_error_page
  every_page do |page|
    yield page if (block_given? && page.had_internal_server_error?)
  end
end

#every_javascript_page {|page| ... } ⇒ Object

Pass every JavaScript page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every JavaScript page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 411

def every_javascript_page
  every_page do |page|
    yield page if (block_given? && page.javascript?)
  end
end

#every_link {|origin, dest| ... } ⇒ Object

Pass every origin and destination URI of each link to a given block.

Yields:

  • (origin, dest)

    The block will be passed every origin and destination URI of each link.

Yield Parameters:

  • origin (URI::HTTP)

    The URI that a link originated from.

  • dest (URI::HTTP)

    The destination URI of a link.



# File 'lib/spidr/events.rb', line 521

def every_link(&block)
  @every_link_blocks << block
  return self
end
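Since every_link blocks receive both ends of a link, they can be used to build a link graph. In the sketch below, LinkAgent and its follow_link hook are hypothetical stand-ins for the crawler; only the every_link registration mirrors the method above.

```ruby
require "uri"

# Hypothetical LinkAgent: every_link mirrors Spidr's registration method,
# while `follow_link` is an assumed hook standing in for the crawler.
class LinkAgent
  def initialize
    @every_link_blocks = []
  end

  def every_link(&block)
    @every_link_blocks << block
    self
  end

  def follow_link(origin, dest)
    @every_link_blocks.each { |blk| blk.call(origin, dest) }
  end
end

# Adjacency list: origin URL string => destination URL strings.
graph = Hash.new { |hash, key| hash[key] = [] }

agent = LinkAgent.new
agent.every_link { |origin, dest| graph[origin.to_s] << dest.to_s }

home = URI("http://example.com/")
agent.follow_link(home, URI("http://example.com/about"))
agent.follow_link(home, URI("http://example.com/contact"))

p graph["http://example.com/"]
# => ["http://example.com/about", "http://example.com/contact"]
```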

#every_missing_page {|page| ... } ⇒ Object

Pass every Missing page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Missing page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 190

def every_missing_page
  every_page do |page|
    yield page if (block_given? && page.missing?)
  end
end
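Combining every_link with every_missing_page gives a simple broken-link report: record who links to each URL, then look up the referrers of every Missing page. A sketch under assumptions: ReportAgent, ReportPage and the crawl method are hypothetical, and "Missing" is modeled as HTTP 404.

```ruby
require "uri"

# Hypothetical stand-ins: ReportAgent and ReportPage are NOT Spidr classes.
ReportPage = Struct.new(:url, :code) do
  def missing?
    code == 404 # assumed: a Missing page is an HTTP 404 Not Found response
  end
end

class ReportAgent
  def initialize
    @every_page_blocks = []
    @every_link_blocks = []
  end

  def every_page(&block)
    @every_page_blocks << block
    self
  end

  def every_link(&block)
    @every_link_blocks << block
    self
  end

  # Same body as Spidr's every_missing_page.
  def every_missing_page
    every_page { |page| yield page if (block_given? && page.missing?) }
  end

  # Assumed crawl step: report every link, then every visited page.
  def crawl(links, pages)
    links.each { |origin, dest| @every_link_blocks.each { |blk| blk.call(origin, dest) } }
    pages.each { |page| @every_page_blocks.each { |blk| blk.call(page) } }
  end
end

referrers = Hash.new { |hash, key| hash[key] = [] }
broken = []

agent = ReportAgent.new
agent.every_link { |origin, dest| referrers[dest] << origin }
agent.every_missing_page { |page| broken << page.url }

home = URI("http://example.com/")
old  = URI("http://example.com/old")
agent.crawl([[home, old]], [ReportPage.new(home, 200), ReportPage.new(old, 404)])

broken.each { |url| puts "#{url} is linked from #{referrers[url].join(', ')}" }
# prints: http://example.com/old is linked from http://example.com/
```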

#every_ms_word_page {|page| ... } ⇒ Object

Pass every MS Word page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every MS Word page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 471

def every_ms_word_page
  every_page do |page|
    yield page if (block_given? && page.ms_word?)
  end
end

#every_ok_page {|page| ... } ⇒ Object

Pass every OK page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every OK page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 100

def every_ok_page
  every_page do |page|
    yield page if (block_given? && page.ok?)
  end
end
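Every status filter (every_ok_page, every_missing_page, and so on) has the same shape: every_page plus one predicate. The hypothetical sketch below expresses that shape with define_method. This is not how Spidr defines these methods; StatusAgent, StatusPage and the 200/404 predicates are illustrative assumptions.

```ruby
# Hypothetical stand-ins: StatusAgent and StatusPage are NOT Spidr classes.
StatusPage = Struct.new(:url, :code) do
  def ok?
    code == 200
  end

  def missing?
    code == 404
  end
end

class StatusAgent
  def initialize
    @every_page_blocks = []
  end

  def every_page(&block)
    @every_page_blocks << block
    self
  end

  # Generate one filter per predicate. Inside define_method, `yield` would
  # not reach the caller's block, so it is captured explicitly as &handler.
  { every_ok_page: :ok?, every_missing_page: :missing? }.each do |name, predicate|
    define_method(name) do |&handler|
      every_page { |page| handler.call(page) if handler && page.public_send(predicate) }
    end
  end

  # Assumed dispatcher: simulates the agent visiting a page.
  def dispatch(page)
    @every_page_blocks.each { |blk| blk.call(page) }
  end
end

ok_urls = []
missing_urls = []
agent = StatusAgent.new
agent.every_ok_page      { |page| ok_urls << page.url }
agent.every_missing_page { |page| missing_urls << page.url }

agent.dispatch(StatusPage.new("/", 200))
agent.dispatch(StatusPage.new("/gone", 404))
agent.dispatch(StatusPage.new("/moved", 301))

p ok_urls      # => ["/"]
p missing_urls # => ["/gone"]
```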

#every_page {|page| ... } ⇒ Object

Pass every page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 86

def every_page(&block)
  @every_page_blocks << block
  return self
end
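every_page, like every_url, every_failed_url, every_url_like and every_link, appends the block and returns self, so registrations can be chained. A minimal sketch with a hypothetical ChainAgent. Its visit dispatcher, including the order of URL callbacks before page callbacks, is an assumption and not Spidr's crawl loop.

```ruby
# Hypothetical ChainAgent: every_url and every_page mirror Spidr's
# registration methods (append the block, return self).
class ChainAgent
  def initialize
    @every_url_blocks  = []
    @every_page_blocks = []
  end

  def every_url(&block)
    @every_url_blocks << block
    self
  end

  def every_page(&block)
    @every_page_blocks << block
    self
  end

  # Assumed dispatcher for illustration: URL callbacks, then page callbacks.
  def visit(url, page)
    @every_url_blocks.each  { |blk| blk.call(url) }
    @every_page_blocks.each { |blk| blk.call(page) }
  end
end

log = []
agent = ChainAgent.new
  .every_url  { |url|  log << "url:#{url}" }
  .every_page { |page| log << "page:#{page}" }

agent.visit("http://example.com/", "<home>")
p log # => ["url:http://example.com/", "page:<home>"]
```

With the real gem, the same chained calls would be made on the agent yielded by, for example, Spidr.site.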

#every_pdf_page {|page| ... } ⇒ Object

Pass every PDF page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every PDF page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 486

def every_pdf_page
  every_page do |page|
    yield page if (block_given? && page.pdf?)
  end
end

#every_redirect_page {|page| ... } ⇒ Object

Pass every Redirect page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Redirect page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 115

def every_redirect_page
  every_page do |page|
    yield page if (block_given? && page.redirect?)
  end
end

#every_rss_doc {|doc| ... } ⇒ Object

Pass every RSS document that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every RSS document parsed.

Yield Parameters:

  • doc (Nokogiri::XML::Document)

    A parsed XML document.

See Also:



# File 'lib/spidr/events.rb', line 371

def every_rss_doc
  every_page do |page|
    if (block_given? && page.rss?)
      if (doc = page.doc)
        yield doc
      end
    end
  end
end

#every_rss_page {|feed| ... } ⇒ Object

Pass every RSS feed that the agent visits to a given block.

Yields:

  • (feed)

    The block will be passed every RSS feed visited.

Yield Parameters:

  • feed (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 441

def every_rss_page
  every_page do |page|
    yield page if (block_given? && page.rss?)
  end
end

#every_timedout_page {|page| ... } ⇒ Object

Pass every Timeout page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Timeout page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 130

def every_timedout_page
  every_page do |page|
    yield page if (block_given? && page.timedout?)
  end
end

#every_txt_page {|page| ... } ⇒ Object

Pass every Plain Text page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Plain Text page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 221

def every_txt_page
  every_page do |page|
    yield page if (block_given? && page.txt?)
  end
end

#every_unauthorized_page {|page| ... } ⇒ Object

Pass every Unauthorized page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every Unauthorized page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 160

def every_unauthorized_page
  every_page do |page|
    yield page if (block_given? && page.unauthorized?)
  end
end

#every_url {|url| ... } ⇒ Object

Pass each URL from each page visited to the given block.

Yields:

  • (url)

    The block will be passed every URL from every page visited.

Yield Parameters:

  • url (URI::HTTP)

    Each URL from each page visited.



# File 'lib/spidr/events.rb', line 17

def every_url(&block)
  @every_url_blocks << block
  return self
end

#every_url_like(pattern) {|url| ... } ⇒ Object

Pass every URL that the agent visits and that matches a given pattern to a given block.

Parameters:

  • pattern (Regexp, String)

    The pattern to match URLs with.

Yields:

  • (url)

    The block will be passed every URL that matches the given pattern.

Yield Parameters:

  • url (URI::HTTP)

    A matching URL.

Since:

  • 0.3.2



# File 'lib/spidr/events.rb', line 51

def every_url_like(pattern,&block)
  @every_url_like_blocks[pattern] << block
  return self
end
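every_url_like groups blocks per pattern, and the pattern may be a Regexp or a String. The sketch below shows one plausible matching rule: a Regexp is tested against the URL string, anything else must equal it. That rule, the enqueue method and PatternAgent itself are assumptions for illustration, not Spidr's code.

```ruby
require "uri"

# Hypothetical PatternAgent: every_url_like mirrors Spidr's registration
# (blocks grouped per pattern). The matching rule in `enqueue` is assumed.
class PatternAgent
  def initialize
    @every_url_like_blocks = Hash.new { |hash, key| hash[key] = [] }
  end

  def every_url_like(pattern, &block)
    @every_url_like_blocks[pattern] << block
    self
  end

  def enqueue(url)
    link = url.to_s
    @every_url_like_blocks.each do |pattern, blocks|
      matched = pattern.is_a?(Regexp) ? pattern.match?(link) : pattern == link
      blocks.each { |blk| blk.call(url) } if matched
    end
  end
end

products = []
home_hits = 0

agent = PatternAgent.new
agent.every_url_like(%r{/products/\d+\z}) { |url| products << url.path }
agent.every_url_like("http://example.com/") { |_url| home_hits += 1 }

%w[http://example.com/ http://example.com/products/42 http://example.com/about]
  .each { |u| agent.enqueue(URI(u)) }

p products  # => ["/products/42"]
p home_hits # => 1
```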

#every_xml_doc {|doc| ... } ⇒ Object

Pass every XML document that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every XML document parsed.

Yield Parameters:

  • doc (Nokogiri::XML::Document)

    A parsed XML document.

See Also:



# File 'lib/spidr/events.rb', line 328

def every_xml_doc
  every_page do |page|
    if (block_given? && page.xml?)
      if (doc = page.doc)
        yield doc
      end
    end
  end
end

#every_xml_page {|page| ... } ⇒ Object

Pass every XML page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every XML page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 251

def every_xml_page
  every_page do |page|
    yield page if (block_given? && page.xml?)
  end
end

#every_xsl_doc {|doc| ... } ⇒ Object

Pass every XML Stylesheet (XSL) that the agent parses to a given block.

Yields:

  • (doc)

    The block will be passed every XML Stylesheet (XSL) parsed.

Yield Parameters:

  • doc (Nokogiri::XML::Document)

    A parsed XML document.

See Also:



# File 'lib/spidr/events.rb', line 350

def every_xsl_doc
  every_page do |page|
    if (block_given? && page.xsl?)
      if (doc = page.doc)
        yield doc
      end
    end
  end
end

#every_xsl_page {|page| ... } ⇒ Object

Pass every XML Stylesheet (XSL) page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every XML Stylesheet (XSL) page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 267

def every_xsl_page
  every_page do |page|
    yield page if (block_given? && page.xsl?)
  end
end

#every_zip_page {|page| ... } ⇒ Object

Pass every ZIP page that the agent visits to a given block.

Yields:

  • (page)

    The block will be passed every ZIP page visited.

Yield Parameters:

  • page (Page)

    A visited page.



# File 'lib/spidr/events.rb', line 501

def every_zip_page
  every_page do |page|
    yield page if (block_given? && page.zip?)
  end
end

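#initialize_events(options = {}) ⇒ Object (protected)

Initializes the empty callback lists that the event methods append to. The options argument is not used by this method.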



# File 'lib/spidr/events.rb', line 528

def initialize_events(options={})
  @every_url_blocks        = []
  @every_failed_url_blocks = []
  @every_url_like_blocks   = Hash.new { |hash,key| hash[key] = [] }

  @every_page_blocks = []
  @every_link_blocks = []
end
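initialize_events builds @every_url_like_blocks with Hash.new and a block, not Hash.new([]). The plain-Ruby sketch below (no Spidr classes involved) shows why: the block form stores a fresh Array per pattern, while a shared default Array never records any keys.

```ruby
# Block form (as in initialize_events): each missing key gets its own
# fresh Array, which is stored in the hash.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key[/a/] << :first
per_key[/b/] << :second
p per_key.size  # => 2
p per_key[/a/]  # => [:first]

# Shared-default form: one default Array, and keys are never stored,
# so `<<` mutates the shared default instead of registering a pattern.
shared = Hash.new([])
shared[/a/] << :first
shared[/b/] << :second
p shared.size   # => 0
p shared[/zzz/] # => [:first, :second]
```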

#urls_like(pattern, &block) ⇒ Object

Delegates to #every_url_like, registering the block for URLs that match pattern.

See Also:

  • #every_url_like



# File 'lib/spidr/events.rb', line 59

def urls_like(pattern,&block)
  every_url_like(pattern,&block)
end
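urls_like is a wrapper method rather than an alias_method. One practical consequence: the wrapper looks up every_url_like at call time, so a subclass override is honored, while an alias stays bound to the original body. Base, Custom and urls_like_alias below are hypothetical names used only to illustrate this Ruby behavior.

```ruby
# Hypothetical classes; only the wrapper style mirrors Spidr's urls_like.
class Base
  attr_reader :registered

  def every_url_like(pattern, &block)
    (@registered ||= []) << [:base, pattern]
    self
  end

  # Wrapper (Spidr's style): resolves every_url_like when called.
  def urls_like(pattern, &block)
    every_url_like(pattern, &block)
  end

  # Alias: copies Base#every_url_like at the moment the alias is created.
  alias_method :urls_like_alias, :every_url_like
end

class Custom < Base
  def every_url_like(pattern, &block)
    (@registered ||= []) << [:custom, pattern]
    self
  end
end

c = Custom.new
c.urls_like(/x/) {}
c.urls_like_alias(/y/) {}
p c.registered # => [[:custom, /x/], [:base, /y/]]
```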