Module: Spidr::Sanitizers

Included in:
Agent
Defined in:
lib/spidr/sanitizers.rb

Overview

The Sanitizers module adds methods to Agent that control the sanitization of incoming links.

Instance Attribute Details

#strip_fragments ⇒ Object

Specifies whether the Agent will strip URI fragments.



# File 'lib/spidr/sanitizers.rb', line 10

def strip_fragments
  @strip_fragments
end

#strip_query ⇒ Object

Specifies whether the Agent will strip URI queries.



# File 'lib/spidr/sanitizers.rb', line 13

def strip_query
  @strip_query
end

Instance Method Details

#initialize_sanitizers(options = {}) ⇒ Object (protected)

Initializes the Sanitizer rules.

Parameters:

  • options (Hash) (defaults to: {})

    Additional options.

Options Hash (options):

  • :strip_fragments (Boolean) — default: true

    Specifies whether or not to strip the fragment component from URLs.

  • :strip_query (Boolean) — default: false

    Specifies whether or not to strip the query component from URLs.

Since:

  • 0.2.2



# File 'lib/spidr/sanitizers.rb', line 51

def initialize_sanitizers(options={})
  @strip_fragments = options.fetch(:strip_fragments,true)
  @strip_query     = options.fetch(:strip_query,false)
end
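
The defaults in the listing above come from Hash#fetch with a fallback value, so an explicitly passed false is kept rather than overwritten by the default. The following is a minimal sketch of that behavior; DemoSanitizers is a hypothetical stand-in that copies the two assignments from the listing and is not Spidr's Agent.

```ruby
# Hypothetical stand-in that mirrors the assignments in the listing above;
# it is not Spidr::Agent.
class DemoSanitizers
  attr_reader :strip_fragments, :strip_query

  def initialize(options = {})
    # Hash#fetch returns the stored value when the key exists (even false),
    # and the second argument only when the key is absent.
    @strip_fragments = options.fetch(:strip_fragments, true)
    @strip_query     = options.fetch(:strip_query, false)
  end
end

# No options: fragments are stripped, queries are kept.
defaults = DemoSanitizers.new

# Explicit values override both defaults.
explicit = DemoSanitizers.new(strip_fragments: false, strip_query: true)
```

Using fetch here, instead of something like `options[:strip_fragments] || true`, is what lets a caller turn fragment stripping off.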

#sanitize_url(url) ⇒ URI::HTTP, URI::HTTPS

Sanitizes a URL according to the strip_fragments and strip_query settings.

Parameters:

  • url (URI::HTTP, URI::HTTPS, String)

    The URL to be sanitized.

Returns:

  • (URI::HTTP, URI::HTTPS)

    The sanitized URL. When a URI object is passed in, that same object is modified and returned.

Since:

  • 0.2.2



# File 'lib/spidr/sanitizers.rb', line 26

def sanitize_url(url)
  url = URI(url.to_s) unless url.kind_of?(URI)

  url.fragment = nil if @strip_fragments
  url.query    = nil if @strip_query

  return url
end
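
To illustrate the effect of the two settings, here is a small self-contained sketch. The helper demo_sanitize_url is hypothetical: it repeats the logic of the listing above but takes the two flags as keyword arguments rather than reading the Agent's instance variables, and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper mirroring the listing above, with the two flags passed
# as keyword arguments instead of being read from instance variables.
def demo_sanitize_url(url, strip_fragments: true, strip_query: false)
  # Strings (and anything else responding to to_s) are parsed into a URI.
  url = URI(url.to_s) unless url.kind_of?(URI)

  url.fragment = nil if strip_fragments
  url.query    = nil if strip_query

  url
end

# Default settings: the fragment is removed, the query is kept.
default_result = demo_sanitize_url('http://example.com/page?id=1#top')

# With strip_query enabled, the query is removed as well.
no_query = demo_sanitize_url('http://example.com/page?id=1#top', strip_query: true)

# A URI object passed in is modified in place and returned.
original = URI('http://example.com/page#top')
returned = demo_sanitize_url(original)
```

Because a URI argument is modified in place, callers that still need the unsanitized URL may want to pass a copy (for example, `url.dup`).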