Module: Msf::Auxiliary::HttpCrawler

Includes:: Report

Defined in:: lib/msf/core/auxiliary/http_crawler.rb

Overview

This module provides methods for implementing a web crawler

Defined Under Namespace

Classes: MaximumPageCount, WebTarget

Instance Attribute Summary collapse

#form_count ⇒ Object

Returns the value of attribute form_count.
#request_count ⇒ Object

Returns the value of attribute request_count.
#targets ⇒ Object

Some accessors for stat tracking.
#url_count ⇒ Object

Returns the value of attribute url_count.
#url_total ⇒ Object

Returns the value of attribute url_total.

Instance Method Summary collapse

#cleanup ⇒ Object
#crawl_target(t) ⇒ Object
#crawler_options(t) ⇒ Object
#crawler_process_page(t, page, cnt) ⇒ Object

Specific module implementations should redefine this method with whatever is meaningful to them.
#dirbust? ⇒ Boolean
#focus_crawl(page) ⇒ Object
#get_connection_timeout ⇒ Object
#get_link_filter ⇒ Object

Scrub links that end in these extensions.
#initialize(info = {}) ⇒ Object
#max_crawl_threads ⇒ Object
#max_crawl_time ⇒ Object
#max_page_count ⇒ Object
#proxies ⇒ Object

Returns the configured proxy list.
#rhost ⇒ Object

Returns the target host.
#rport ⇒ Object

Returns the remote port.
#run ⇒ Object

Entry point for the crawler code.
#setup ⇒ Object
#ssl ⇒ Object

Returns the boolean indicating SSL.
#ssl_version ⇒ Object

Returns the string indicating SSL version.
#vhost ⇒ Object

Returns the VHOST of the HTTP server.

Methods included from Report

#active_db?, #create_cracked_credential, #create_credential, #create_credential_and_login, #create_credential_login, #db, #db_warning_given?, #get_client, #get_host, #inside_workspace_boundary?, #invalidate_login, #mytask, #myworkspace, #myworkspace_id, #report_auth_info, #report_client, #report_exploit, #report_host, #report_loot, #report_note, #report_service, #report_vuln, #report_web_form, #report_web_page, #report_web_site, #report_web_vuln, #store_cred, #store_local, #store_loot

Methods included from Metasploit::Framework::Require

optionally, optionally_active_record_railtie, optionally_include_metasploit_credential_creation, #optionally_include_metasploit_credential_creation, optionally_require_metasploit_db_gem_engines

Instance Attribute Details

permalink #form_count ⇒ `Object`

Returns the value of attribute form_count.

[View on GitHub]


100
101
102

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 100

def form_count
  @form_count
end

permalink #request_count ⇒ `Object`

Returns the value of attribute request_count.

[View on GitHub]


100
101
102

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 100

def request_count
  @request_count
end

permalink #targets ⇒ `Object`

Some accessors for stat tracking

[View on GitHub]


99
100
101

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 99

def targets
  @targets
end

permalink #url_count ⇒ `Object`

Returns the value of attribute url_count.

[View on GitHub]


100
101
102

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 100

def url_count
  @url_count
end

permalink #url_total ⇒ `Object`

Returns the value of attribute url_total.

[View on GitHub]


100
101
102

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 100

def url_total
  @url_total
end

Instance Method Details

permalink #cleanup ⇒ `Object`

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 68

def cleanup
  if @crawler
    @crawler.shutdown rescue nil
    @crawler = nil
  end
  super
end

permalink #crawl_target(t) ⇒ `Object`

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 191

def crawl_target(t)
  cnt  = 0
  opts = crawler_options(t)
  url  = t.to_url

  @crawler = ::Anemone::Core.new([url], opts)
  @crawler.on_every_page do |page|
    cnt += 1

    self.request_count += 1

    # Extract any interesting data from the page
    crawler_process_page(t, page, cnt)

    # Blow up if we hit our maximum page count
    if cnt >= max_page_count
      print_error("Maximum page count reached for #{url}")
      raise MaximumPageCount, "Maximum page count reached"
    end
  end

  # Skip link processing based on a regular expression
  @crawler.skip_links_like(
    get_link_filter
  )

  # Focus our crawling on interesting, but not over-crawled links
  @crawler.focus_crawl do |page|
    focus_crawl(page)
  end

  begin
    @crawler.run
  rescue MaximumPageCount
    # No need to print anything else
  rescue ::Timeout::Error
    # Bubble this up to the top-level handler
    raise $!
  rescue ::Exception => e
    # Ridiculous f'ing anonymous timeout exception which I've no idea
    # how it comes into existence.
    if e.to_s =~ /execution expired/
      raise ::Timeout::Error
    else
      print_error("Crawler Exception: #{url} #{e} #{e.backtrace}")
    end
  ensure
    @crawler.shutdown rescue nil
    @crawler = nil
  end
end

permalink #crawler_options(t) ⇒ `Object`

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 269

def crawler_options(t)
  opts = {}
  opts[:user_agent]      = datastore['UserAgent']
  opts[:verbose]         = false
  opts[:threads]         = max_crawl_threads
  opts[:obey_robots_txt] = false
  opts[:redirect_limit]  = datastore['RedirectLimit']
  opts[:retry_limit]     = datastore['RetryLimit']
  opts[:accept_cookies]  = true
  opts[:depth_limit]     = false
  opts[:skip_query_strings]  = false
  opts[:discard_page_bodies] = true
  opts[:framework]           = framework
  opts[:module]              = self
  opts[:timeout]             = get_connection_timeout
  opts[:dirbust]             = dirbust?

  if (t[:headers] and t[:headers].length > 0)
    opts[:inject_headers] = t[:headers]
  end

  if t[:cookies]
    opts[:cookies] = t[:cookies]
  end

  opts[:username] = t[:username] || ''
  opts[:password] = t[:password] || ''
  opts[:domain]   = t[:domain]   || 'WORKSTATION'

  if ssl
    opts[:ssl_version] = ssl_version
  end
  opts
end

permalink #crawler_process_page(t, page, cnt) ⇒ `Object`

Specific module implementations should redefine this method with whatever is meaningful to them.

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 245

def crawler_process_page(t, page, cnt)
  return if page.nil? # Skip over pages that don't contain any info aka page is nil. We can't process these types of pages since there is no data to process.
  msg = "[#{"%.5d" % cnt}/#{"%.5d" % max_page_count}]    #{page ? page.code || "ERR" : "ERR"} - #{@current_site.vhost} - #{page.url}"
  case page.code
    when 301,302
      if page.headers and page.headers["location"]
        print_status(msg + " -> " + page.headers["location"].to_s)
      else
        print_status(msg)
      end
    when 500...599
      # XXX: Log the fact that we hit an error page
      print_good(msg)
    when 401,403
      print_good(msg)
    when 200
      print_status(msg)
    when 404
      print_error(msg)
    else
      print_error(msg)
  end
end

permalink #dirbust? ⇒ `Boolean`

Returns:

(Boolean)

[View source] [View on GitHub]


177
178
179

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 177

def dirbust?
  datastore['DirBust']
end

permalink #focus_crawl(page) ⇒ `Object`

[View source] [View on GitHub]


187
188
189

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 187

def focus_crawl(page)
  page.links
end

permalink #get_connection_timeout ⇒ `Object`

[View source] [View on GitHub]


161
162
163

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 161

def get_connection_timeout
  datastore['RequestTimeout']
end

permalink #get_link_filter ⇒ `Object`

Scrub links that end in these extensions. If more or less is desired by a particular module, this should get redefined.

[View source] [View on GitHub]


183
184
185

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 183

def get_link_filter
  /\.(js|png|jpe?g|bmp|gif|swf|jar|zip|gz|bz2|rar|pdf|docx?|pptx?)$/i
end

permalink #initialize(info = {}) ⇒ `Object`

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 13

def initialize(info = {})
  super

  register_options(
    [
      Opt::RHOST,
      Opt::RPORT(80),
      OptString.new('VHOST', [ false, "HTTP server virtual host" ]),
      OptString.new('URI',   [ true, "The starting page to crawl", "/"]),
      Opt::Proxies,
      OptInt.new('MAX_PAGES', [ true, 'The maximum number of pages to crawl per URL', 500]),
      OptInt.new('MAX_MINUTES', [ true, 'The maximum number of minutes to spend on each URL', 5]),
      OptInt.new('MAX_THREADS', [ true, 'The maximum number of concurrent requests', 4]),
      OptString.new('HttpUsername', [false, 'The HTTP username to specify for authentication']),
      OptString.new('HttpPassword', [false, 'The HTTP password to specify for authentication']),
      OptString.new('DOMAIN', [ true, 'The domain to use for windows authentication', 'WORKSTATION']),
      OptBool.new('SSL', [ false, 'Negotiate SSL/TLS for outgoing connections', false])

    ], self.class
  )

  register_advanced_options(
    [
      OptBool.new('DirBust', [ false, 'Bruteforce common URL paths', true]),
      OptInt.new('RequestTimeout', [false, 'The maximum number of seconds to wait for a reply', 15]),
      OptInt.new('RedirectLimit', [false, 'The maximum number of redirects for a single request', 5]),
      OptInt.new('RetryLimit', [false, 'The maximum number of attempts for a single request', 5]),
      OptString.new('UserAgent', [true, 'The User-Agent header to use for all requests',
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
      ]),
      OptString.new('BasicAuthUser', [false, 'The HTTP username to specify for basic authentication']),
      OptString.new('BasicAuthPass', [false, 'The HTTP password to specify for basic authentication']),
      OptString.new('HTTPAdditionalHeaders', [false, "A list of additional headers to send (separated by \\x01)"]),
      OptString.new('HTTPCookie', [false, "A HTTP cookie header to send with each request"]),
      Opt::SSLVersion
    ], self.class
  )

  register_autofilter_ports([ 80, 8080, 443, 8000, 8888, 8880, 8008, 3000, 8443 ])
  register_autofilter_services(%W{ http https })

  begin
    require 'anemone'
    @anemone_loaded = true
  rescue ::Exception => e
    @anemone_loaded = false
    @anemone_error  = e
  end
end

permalink #max_crawl_threads ⇒ `Object`

[View source] [View on GitHub]


173
174
175

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 173

def max_crawl_threads
  datastore['MAX_THREADS']
end

permalink #max_crawl_time ⇒ `Object`

[View source] [View on GitHub]


169
170
171

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 169

def max_crawl_time
  datastore['MAX_MINUTES'] * 60.0
end

permalink #max_page_count ⇒ `Object`

[View source] [View on GitHub]


165
166
167

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 165

def max_page_count
  datastore['MAX_PAGES']
end

permalink #proxies ⇒ `Object`

Returns the configured proxy list

[View source] [View on GitHub]


349
350
351

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 349

def proxies
  datastore['Proxies']
end

permalink #rhost ⇒ `Object`

Returns the target host

[View source] [View on GitHub]


314
315
316

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 314

def rhost
  datastore['RHOST']
end

permalink #rport ⇒ `Object`

Returns the remote port

[View source] [View on GitHub]


321
322
323

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 321

def rport
  datastore['RPORT']
end

permalink #run ⇒ `Object`

Entry point for the crawler code

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 104

def run

  self.request_count = 0
  self.form_count  = 0
  self.url_count   = 0
  self.url_total   = 1

  path,query = datastore['URI'].split('?', 2)
  query ||= ""

  t = WebTarget.new

  t.merge!({
    :vhost    => vhost,
    :host     => rhost,
    :port     => rport,
    :ssl      => ssl,
    :path     => path,
    :query    => query,
    :info     => ""
  })

  if datastore['HttpUsername'] and datastore['HttpUsername'] != ''
    t[:username] = datastore['HttpUsername'].to_s
    t[:password] = datastore['HttpPassword'].to_s
    t[:domain]   = datastore['DOMAIN'].to_s
  end

  if datastore['HTTPCookie']
    t[:cookies] = {}
    datastore['HTTPCookie'].to_s.split(';').each do |pair|
      k,v = pair.strip.split('=', 2)
      next if not v
      t[:cookies][k] = v
    end
  end

  if datastore['HTTPAdditionalHeaders']
    t[:headers] = datastore['HTTPAdditionalHeaders'].to_s.split("\x01").select{|x| x.to_s.length > 0}
  end

  t[:site] = report_web_site(:wait => true, :host => t[:host], :port => t[:port], :vhost => t[:vhost], :ssl => t[:ssl])

  print_status("Crawling #{t.to_url}...")

  begin
    @current_vhost = t[:vhost]
    @current_site  = t[:site]
    ::Timeout.timeout(max_crawl_time) { crawl_target(t) }
  rescue ::Timeout::Error
    print_error("Crawl of #{t.to_url} has reached the configured timeout")
  ensure
    @current_vhost = nil
  end
  print_status("Crawl of #{t.to_url} complete")
end

permalink #setup ⇒ `Object`

Raises:

(RuntimeError)

[View source] [View on GitHub]

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 63

def setup
  raise RuntimeError, "Could not load Anemone/Nokogiri: #{@anemone_error}" if not @anemone_loaded
  super
end

permalink #ssl ⇒ `Object`

Returns the boolean indicating SSL

[View source] [View on GitHub]


335
336
337

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 335

def ssl
  ((datastore.default?('SSL') and rport.to_i == 443) or datastore['SSL'])
end

permalink #ssl_version ⇒ `Object`

Returns the string indicating SSL version

[View source] [View on GitHub]


342
343
344

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 342

def ssl_version
  datastore['SSLVersion']
end

permalink #vhost ⇒ `Object`

Returns the VHOST of the HTTP server.

[View source] [View on GitHub]


328
329
330

# File 'lib/msf/core/auxiliary/http_crawler.rb', line 328

def vhost
  datastore['VHOST'] || datastore['RHOST']
end

Module: Msf::Auxiliary::HttpCrawler

Overview

Defined Under Namespace

Instance Attribute Summary collapse

Instance Method Summary collapse

Methods included from Report

Methods included from Metasploit::Framework::Require

Instance Attribute Details

permalink #form_count ⇒ Object

permalink #request_count ⇒ Object

permalink #targets ⇒ Object

permalink #url_count ⇒ Object

permalink #url_total ⇒ Object

Instance Method Details

permalink #cleanup ⇒ Object

permalink #crawl_target(t) ⇒ Object

permalink #crawler_options(t) ⇒ Object

permalink #crawler_process_page(t, page, cnt) ⇒ Object

permalink #dirbust? ⇒ Boolean

permalink #focus_crawl(page) ⇒ Object

permalink #get_connection_timeout ⇒ Object

permalink #get_link_filter ⇒ Object

permalink #initialize(info = {}) ⇒ Object

permalink #max_crawl_threads ⇒ Object

permalink #max_crawl_time ⇒ Object

permalink #max_page_count ⇒ Object

permalink #proxies ⇒ Object

permalink #rhost ⇒ Object

permalink #rport ⇒ Object

permalink #run ⇒ Object

permalink #setup ⇒ Object

permalink #ssl ⇒ Object

permalink #ssl_version ⇒ Object

permalink #vhost ⇒ Object

permalink #form_count ⇒ `Object`

permalink #request_count ⇒ `Object`

permalink #targets ⇒ `Object`

permalink #url_count ⇒ `Object`

permalink #url_total ⇒ `Object`

permalink #cleanup ⇒ `Object`

permalink #crawl_target(t) ⇒ `Object`

permalink #crawler_options(t) ⇒ `Object`

permalink #crawler_process_page(t, page, cnt) ⇒ `Object`

permalink #dirbust? ⇒ `Boolean`

permalink #focus_crawl(page) ⇒ `Object`

permalink #get_connection_timeout ⇒ `Object`

permalink #get_link_filter ⇒ `Object`

permalink #initialize(info = {}) ⇒ `Object`

permalink #max_crawl_threads ⇒ `Object`

permalink #max_crawl_time ⇒ `Object`

permalink #max_page_count ⇒ `Object`

permalink #proxies ⇒ `Object`

permalink #rhost ⇒ `Object`

permalink #rport ⇒ `Object`

permalink #run ⇒ `Object`

permalink #setup ⇒ `Object`

permalink #ssl ⇒ `Object`

permalink #ssl_version ⇒ `Object`

permalink #vhost ⇒ `Object`