Class: Arachni::Parser

Inherits: Object

Includes: Module::Utilities, UI::Output

Defined in: lib/arachni/parser/parser.rb, lib/arachni/parser/page.rb, lib/arachni/parser/elements.rb

Overview

Analyzer class

Analyzes HTML code, extracting forms, links and cookies depending on user options.

It grabs all element attributes, not just URLs and variables. All URLs are converted to absolute URLs, and URLs outside the domain are ignored.

Forms

Form analysis uses both regular expressions and the Nokogiri parser so that it can handle badly written HTML, such as unclosed or overlapping tags.

To ease audits, forms are parsed into data structures like “select” and “option”, and all auditable inputs are additionally put under the “auditable” key.

Links

Links are extracted using the Nokogiri parser.

Cookies

Cookies are extracted from the HTTP headers and parsed by WEBrick::Cookie.

@author: Tasos “Zapotek” Laskos

<[email protected]>

@version: 0.2.2

Defined Under Namespace

Modules: Element, Extractors

Classes: Page

Instance Attribute Summary

#opts ⇒ Options readonly

Options instance.
#url ⇒ String

The URL of the page.

Instance Method Summary

#base ⇒ Object
#cookies ⇒ Array<Element::Cookie>

Extracts cookies from the HTTP headers.
#dir(url) ⇒ Object
#doc ⇒ Object
#exclude?(url) ⇒ Boolean
#extract_domain(url) ⇒ String

Extracts the domain from a URI object.
#forms(html = nil) ⇒ Array<Element::Form>

TODO: Add support for radio buttons.
#headers ⇒ Hash

Returns a list of valid auditable HTTP header fields.
#in_domain?(uri) ⇒ Boolean

Returns true if uri is in the same domain as the page, returns false otherwise.
#include?(url) ⇒ Boolean
#initialize(opts, res) ⇒ Parser constructor

Instantiates the Analyzer class with user options.
#link_vars(link) ⇒ Hash

Extracts variables and their values from a link.
#links ⇒ Array<Element::Link>

Extracts links from HTML document.
#merge_with_cookiejar(cookies) ⇒ Array<Element::Cookie>

Merges ‘cookies’ with the cookiejar and returns it as an array.
#merge_with_cookiestore(cookies) ⇒ Object
#paths ⇒ Array<URI>

Array of distinct links to follow.
#run ⇒ Page

Runs the Analyzer and extracts forms, links and cookies.
#skip?(path) ⇒ Boolean
#text? ⇒ Boolean
#to_absolute(link) ⇒ String

Converts relative URL link into an absolute URL based on the location of the page.
#too_deep?(url) ⇒ Boolean

Methods included from Module::Utilities

#exception_jail, #get_path, #hash_keys_to_str, #normalize_url, #read_file, #seed, #uri_decode, #uri_encode, #uri_parse, #uri_parser, #url_sanitize

Methods included from UI::Output

#buffer, #debug!, #debug?, #flush_buffer, #mute!, #muted?, #only_positives!, #only_positives?, #print_bad, #print_debug, #print_debug_backtrace, #print_debug_pp, #print_error, #print_error_backtrace, #print_info, #print_line, #print_ok, #print_status, #print_verbose, #reroute_to_file, #reroute_to_file?, #uncap_buffer!, #unmute!, #verbose!, #verbose?

Constructor Details

#initialize(opts, res) ⇒ `Parser`

Instantiates the Analyzer class with user options.

Parameters:

opts (Options)
res (Object)

the HTTP response; the constructor reads its code, effective_url, body and headers_hash

# File 'lib/arachni/parser/parser.rb', line 99

def initialize( opts, res )
    @opts = opts

    @code = res.code
    @url  = url_sanitize( res.effective_url )
    @html = res.body
    @response_headers = res.headers_hash

    @doc   = nil
    @paths = nil
end

Instance Attribute Details

#opts ⇒ `Options` (readonly)

Options instance

Returns:

(Options)



# File 'lib/arachni/parser/parser.rb', line 91

def opts
  @opts
end

#url ⇒ `String`

Returns the URL of the page.

Returns:

(String) —

the URL of the page



# File 'lib/arachni/parser/parser.rb', line 84

def url
  @url
end

Instance Method Details

#base ⇒ `Object`

# File 'lib/arachni/parser/parser.rb', line 547

def base
    begin
        tmp = doc.search( '//base[@href]' )
        return tmp[0]['href'].dup
    rescue
        return
    end
end

#cookies ⇒ `Array<Element::Cookie>`

Extracts cookies from the HTTP response headers and from meta tags whose http-equiv attribute is set-cookie.

Parameters:

none; as the listing below shows, the method takes no arguments and works on the response headers and HTML stored by the constructor

Returns:

(Array<Element::Cookie>) —

of cookies

# File 'lib/arachni/parser/parser.rb', line 401

def cookies

    cookies_arr = []
    cookies     = []

    begin
        doc.search( "//meta[@http-equiv]" ).each {
            |elem|

            next if elem['http-equiv'].downcase != 'set-cookie'
            k, v = elem['content'].split( ';' )[0].split( '=', 2 )
            cookies_arr << Element::Cookie.new( @url, { 'name' => k, 'value' => v } )
        }
    rescue Exception => e
        # ap e
        # ap e.backtrace
    end


    # don't ask me why....
    if @response_headers.to_s.downcase.substring?( 'set-cookie' )
        begin
            cookies << ::WEBrick::Cookie.parse_set_cookies( @response_headers['Set-Cookie'].to_s )
            cookies << ::WEBrick::Cookie.parse_set_cookies( @response_headers['set-cookie'].to_s )
        rescue Exception => e
            # ap e
            # ap e.backtrace
            return cookies_arr
        end
    end

    cookies.flatten.uniq.each_with_index {
        |cookie, i|
        cookies_arr[i] = Hash.new

        cookie.instance_variables.each {
            |var|
            value = cookie.instance_variable_get( var ).to_s
            value.strip!

            key = normalize_name( var )
            val = value.gsub( /[\"\\\[\]]/, '' )

            next if val == seed
            cookies_arr[i][key] = val
        }

        # cookies.reject!{ |cookie| cookie['name'] == cookies_arr[i]['name'] }

        cookies_arr[i] = Element::Cookie.new( @url, cookies_arr[i] )
    }
    cookies_arr.flatten!
    return cookies_arr
end
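
The meta tag branch above keeps only the first name=value pair of the content attribute. A minimal sketch of that string handling, in isolation; the helper name `name_and_value` is hypothetical and not part of Arachni:

```ruby
# Hypothetical helper: mirrors the split logic used for
# <meta http-equiv="Set-Cookie" content="..."> tags above.
def name_and_value( content )
    # keep everything before the first ';', then split on the first '=' only,
    # so that '=' characters inside the value survive
    content.split( ';' )[0].split( '=', 2 )
end

name, value = name_and_value( 'session=abc=123; path=/; HttpOnly' )
```

Attributes after the first semicolon (path, expiry, flags) are discarded by this branch; only the name and value reach Element::Cookie.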

#dir(url) ⇒ `Object`



# File 'lib/arachni/parser/parser.rb', line 456

def dir( url )
    URI( File.dirname( URI( url.to_s ).path ) + '/' )
end
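
As a quick illustration of what this returns, here is the same expression under a hypothetical name, `parent_dir`, so it can be run on its own with only the standard library:

```ruby
require 'uri'

# Hypothetical standalone copy of the expression used by #dir above.
def parent_dir( url )
    URI( File.dirname( URI( url.to_s ).path ) + '/' )
end

result = parent_dir( 'http://example.com/a/b/page.html' )
```

Because only the path component is passed to File.dirname, the returned URI carries the directory path but no scheme or host.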

#doc ⇒ `Object`

# File 'lib/arachni/parser/parser.rb', line 179

def doc
  return @doc if @doc
  @doc = Nokogiri::HTML( @html ) if @html rescue nil
end

#exclude?(url) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 598

def exclude?( url )
    @opts.exclude.each {
        |pattern|
        return true if url.to_s =~ pattern
    }

    return false
end

#extract_domain(url) ⇒ `String`

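Extracts the domain from a URI object.

Parameters:

url (URI)

Returns:

(String)

the last two dot-separated labels of the host; as the listing below shows, the method returns false when the URI has no host and true when the host consists of a single label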

# File 'lib/arachni/parser/parser.rb', line 587

def extract_domain( url )

    if !url.host then return false end

    splits = url.host.split( /\./ )

    if splits.length == 1 then return true end

    splits[-2] + "." + splits[-1]
end
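
To make the return values concrete, the following sketch reproduces the same logic as a standalone method. The name `last_two_labels` is hypothetical, and the method accepts a string for convenience rather than a URI object:

```ruby
require 'uri'

# Hypothetical standalone version of the logic above.
def last_two_labels( url )
    host = URI( url.to_s ).host
    return false if !host

    splits = host.split( '.' )
    return true if splits.length == 1

    splits[-2] + '.' + splits[-1]
end

plain    = last_two_labels( 'http://www.example.com/' )    # 'example.com'
local    = last_two_labels( 'http://localhost/' )          # true
relative = last_two_labels( '/relative/path' )             # false
suffixed = last_two_labels( 'http://shop.example.co.uk/' ) # 'co.uk'
```

Two consequences follow from the code as written: hosts under multi-part suffixes such as co.uk reduce to the suffix alone, and any two single-label hosts both yield true and therefore compare as equal.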

#forms(html = nil) ⇒ `Array<Element::Form>`

TODO: Add support for radio buttons.

Extracts forms from HTML document

Parameters:

html (String) (defaults to: nil)

Returns:

(Array<Element::Form>) —

array of forms

#headers ⇒ `Hash`

Returns a list of valid auditable HTTP header fields.

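This is more of a placeholder method; it doesn’t actually analyze anything. It’s a long shot that any of these will be vulnerable, but better safe than sorry. Note that the listing below builds and returns an Array of Element::Header objects, one per field.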

Returns:

(Hash) —

HTTP header fields

# File 'lib/arachni/parser/parser.rb', line 247

def headers
    headers_arr  = []
    {
        'accept'          => 'text/html,application/xhtml+xml,application' +
            '/xml;q=0.9,*/*;q=0.8',
        'accept-charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'accept-language' => 'en-gb,en;q=0.5',
        'accept-encoding' => 'gzip;q=1.0,deflate;q=0.6,identity;q=0.3',
        'from'       => @opts.authed_by || '',
        'user-agent' => @opts.user_agent || '',
        'referer'    => @url,
        'pragma'     => 'no-cache'
    }.each {
        |k,v|
        headers_arr << Element::Header.new( @url, { k => v } )
    }

    return headers_arr
end

#in_domain?(uri) ⇒ `Boolean`

Returns true if uri is in the same domain as the page, false otherwise.

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 569

def in_domain?( uri )

    curi = URI.parse( normalize_url( uri.to_s ) )

    if( @opts.follow_subdomains )
        return extract_domain( curi ) ==  extract_domain( URI( @url.to_s ) )
    end

    return curi.host == URI.parse( normalize_url( @url.to_s ) ).host
end
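
The two comparison modes can be sketched without the rest of the class. This is an approximation under stated assumptions: `same_site?` is a hypothetical name, the follow_subdomains option is passed in as a plain argument, and URL normalization is skipped:

```ruby
require 'uri'

# Hypothetical sketch of the exact-host vs. subdomain-tolerant comparison.
def same_site?( page_url, link, follow_subdomains )
    page_host = URI( page_url ).host
    link_host = URI( link ).host

    # strict mode: hosts must match exactly
    return page_host == link_host if !follow_subdomains

    # relaxed mode: compare only the last two labels of each host
    tail = lambda { |host| host.to_s.split( '.' ).last( 2 ).join( '.' ) }
    tail.call( page_host ) == tail.call( link_host )
end

strict  = same_site?( 'http://www.example.com/', 'http://blog.example.com/post', false )
relaxed = same_site?( 'http://www.example.com/', 'http://blog.example.com/post', true )
```

With subdomain following disabled the blog host is treated as foreign; with it enabled both hosts reduce to example.com and match.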

#include?(url) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 607

def include?( url )
    return true if @opts.include.empty?

    @opts.include.each {
        |pattern|
        pattern = Regexp.new( pattern ) if pattern.is_a?( String )
        return true if url.to_s =~ pattern
    }
    return false
end
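
Taken together, #include? and #exclude? act as an allow list followed by a deny list. A minimal sketch of that combination, assuming the pattern lists are passed in directly rather than read from @opts; `followable?` is a hypothetical name:

```ruby
# Hypothetical helper combining the include and exclude checks.
def followable?( url, include_patterns, exclude_patterns )
    # an empty include list allows everything, as in #include? above
    included = include_patterns.empty? || include_patterns.any? { |pattern|
        pattern = Regexp.new( pattern ) if pattern.is_a?( String )
        url.to_s =~ pattern
    }
    excluded = exclude_patterns.any? { |pattern| url.to_s =~ pattern }

    included && !excluded
end

allowed = followable?( 'http://example.com/blog/1', [ 'blog' ], [ /logout/ ] )
blocked = followable?( 'http://example.com/logout', [], [ /logout/ ] )
```

Note that only the include list converts String patterns to Regexp objects in the original code; the exclude list is matched as given.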

#link_vars(link) ⇒ `Hash`

Extracts variables and their values from a link

Parameters:

link (String)

Returns:

(Hash) —

name=>value pairs
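
The source of #link_vars is not listed on this page. As a rough idea of what extracting name=>value pairs from a link involves, here is a sketch built on the standard library; `query_vars` is a hypothetical name and the real implementation may differ:

```ruby
require 'uri'

# Hypothetical sketch, not Arachni's implementation: returns the query
# string parameters of a link as a Hash of name => value pairs.
def query_vars( link )
    query = URI( link.to_s ).query
    return {} if !query

    Hash[ URI.decode_www_form( query ) ]
end

vars = query_vars( 'http://example.com/search.php?q=ruby&page=2' )
```

A link without a query string yields an empty Hash, which is the case #run checks for before adding the page's own URL to the list of links.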

#links ⇒ `Array<Element::Link>`

Extracts links from HTML document

Parameters:

html (String)

Returns:

(Array<Element::Link>) —

of links

#merge_with_cookiejar(cookies) ⇒ `Array<Element::Cookie>`

Merges ‘cookies’ with the cookiejar and returns it as an array

Parameters:

cookies (Array<Hash>)

Returns:

(Array<Element::Cookie>) —

the merged cookies

# File 'lib/arachni/parser/parser.rb', line 222

def merge_with_cookiejar( cookies )
    return cookies if !@opts.cookies

    @opts.cookies.each_pair {
        |name, value|
        cookies << Element::Cookie.new( @url,
            {
                'name'    => name,
                'value'   => value
            } )
    }

    return cookies
end

#merge_with_cookiestore(cookies) ⇒ `Object`

# File 'lib/arachni/parser/parser.rb', line 184

def merge_with_cookiestore( cookies )

    @cookiestore ||= []

    if @cookiestore.empty?
        @cookiestore = cookies
    else
        tmp = {}
        @cookiestore.each {
            |cookie|
            tmp.merge!( cookie.simple )
        }

        cookies.each {
            |cookie|
            tmp.merge!( cookie.simple )
        }

        @cookiestore = tmp.map {
            |name, value|
            Element::Cookie.new( @url, {
                'name'    => name,
                'value'   => value
            } )
        }
    end

    return @cookiestore

end
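
The merge above relies on Hash#merge! so that a newly seen cookie replaces a stored cookie of the same name. A sketch of that behaviour with plain Hashes standing in for the result of cookie.simple, which is assumed here to be a single name => value pair; `merge_simple_cookies` is a hypothetical name:

```ruby
# Hypothetical helper: each element is a one-pair Hash, standing in for
# what Element::Cookie#simple is assumed to return.
def merge_simple_cookies( stored, incoming )
    merged = {}
    # stored cookies go in first, so incoming ones overwrite on name clashes
    ( stored + incoming ).each { |cookie| merged.merge!( cookie ) }
    merged
end

merged = merge_simple_cookies(
    [ { 'session' => 'abc' }, { 'lang' => 'en' } ],
    [ { 'session' => 'xyz' } ]
)
```

The stored session value is replaced by the incoming one, while the untouched lang cookie is carried over.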

#paths ⇒ `Array<URI>`

Array of distinct links to follow

Returns:

(Array<URI>)

# File 'lib/arachni/parser/parser.rb', line 465

def paths
  return @paths unless @paths.nil?
  @paths = []
  return @paths if !doc

  @paths = run_extractors
  return @paths
end

#run ⇒ `Page`

Runs the Analyzer and extracts forms, links and cookies

Returns:

(Page)

# File 'lib/arachni/parser/parser.rb', line 116

def run

    # non text files won't contain any auditable elements
    if !text?
        return Page.new( {
            :code        => @code,
            :url         => @url,
            :query_vars  => link_vars( @url ),
            :html        => @html,
            :headers     => [],
            :response_headers     => @response_headers,
            :paths       => [],
            :forms       => [],
            :links       => [],
            :cookies     => [],
            :cookiejar   => []
        } )
    end


    cookies_arr = cookies
    cookies_arr = merge_with_cookiejar( cookies_arr.flatten.uniq )

    jar = {}
    jar = @opts.cookies = Arachni::HTTP.parse_cookiejar( @opts.cookie_jar ) if @opts.cookie_jar

    preped = {}
    cookies_arr.each{ |cookie| preped.merge!( cookie.simple ) }

    jar = preped.merge( jar )

    c_links = links

    if !( vars = link_vars( @url ) ).empty?
        url = to_absolute( @url )
        c_links << Arachni::Parser::Element::Link.new( url, {
            'href' => url,
            'vars' => vars
        } )
    end

    return Page.new( {
        :code        => @code,
        :url         => @url,
        :query_vars  => link_vars( @url ),
        :html        => @html,
        :headers     => headers(),
        :response_headers     => @response_headers,
        :paths       => paths(),
        :forms       => @opts.audit_forms ? forms() : [],
        :links       => @opts.audit_links ? c_links : [],
        :cookies     => merge_with_cookiestore( merge_with_cookiejar( cookies_arr ) ),
        :cookiejar   => jar
    } )

end

#skip?(path) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 618

def skip?( path )
    return true if !path

    begin
        return true if !include?( path )
        return true if exclude?( path )
        return true if too_deep?( path )
        return true if !in_domain?( path )
    rescue
        true
    end
end

#text? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 173

def text?
    type = Arachni::HTTP.content_type( @response_headers )
    return false if !type
    return type.to_s.substring?( 'text' )
end
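
Arachni::HTTP.content_type and String#substring? are Arachni's own helpers and are not listed here. The check can be approximated with core Ruby as follows, assuming the headers are a plain Hash; `text_content_type?` is a hypothetical name:

```ruby
# Hypothetical approximation of #text? using only core Ruby.
def text_content_type?( headers )
    # look the header up case-insensitively
    pair = headers.find { |name, _| name.to_s.downcase == 'content-type' }
    return false if !pair

    pair[1].to_s.include?( 'text' )
end

html  = text_content_type?( { 'Content-Type' => 'text/html; charset=utf-8' } )
image = text_content_type?( { 'content-type' => 'image/png' } )
```

A substring test like this accepts any type containing "text", so text/css and text/plain pass as well as text/html.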

#to_absolute(link) ⇒ `String`

Converts relative URL link into an absolute URL based on the location of the page

Parameters:

link (String)

Returns:

(String)

# File 'lib/arachni/parser/parser.rb', line 510

def to_absolute( link )

    begin
        link = normalize_url( link )
        if uri_parser.parse( link ).host
            return link
        end
    rescue Exception => e
        # ap e
        # ap e.backtrace
        return nil
    end

    begin
        # remove anchor
        link = uri_encode( link.to_s.gsub( /#[a-zA-Z0-9_-]*$/,'' ) )

        if url = base
            base_url = uri_parser.parse( url )
        else
            base_url = uri_parser.parse( @url )
        end

        relative = uri_parser.parse( link )
        absolute = base_url.merge( relative )

        absolute.path = '/' if absolute.path && absolute.path.empty?

        return absolute.to_s
    rescue Exception => e
        # ap e
        # ap e.backtrace
        return nil
    end
end
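
The resolution step can be tried in isolation with the standard URI library. The sketch below makes several assumptions: `resolve_link` is a hypothetical name, the base href is passed in as an argument instead of being read from the document, and Arachni's normalize_url and uri_encode helpers are left out:

```ruby
require 'uri'

# Hypothetical sketch: resolves link against page_url, preferring
# base_href (the value of a <base href> tag) when one is given.
def resolve_link( page_url, link, base_href = nil )
    # remove a trailing anchor, as the method above does
    link = link.to_s.sub( /#[a-zA-Z0-9_-]*$/, '' )

    base_url = URI.parse( base_href || page_url )
    absolute = base_url.merge( URI.parse( link ) )

    absolute.path = '/' if absolute.path && absolute.path.empty?
    absolute.to_s
rescue URI::Error
    # unparsable input yields nil, like the original
    nil
end

from_page = resolve_link( 'http://example.com/dir/page.html', '../img/logo.png#top' )
from_base = resolve_link( 'http://example.com/dir/page.html', 'form.php?id=1',
                          'http://example.com/app/' )
```

When a base href is present it takes precedence over the page URL, which is why the second link resolves under /app/ rather than /dir/.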

#too_deep?(url) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/arachni/parser/parser.rb', line 557

def too_deep?( url )
    if @opts.depth_limit > 0 && (@opts.depth_limit + 1) <= URI(url.to_s).path.count( '/' )
        return true
    else
        return false
    end
end
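
Depth here is simply the number of slashes in the URL path. The following sketch applies the same comparison with the limit passed in as an argument; `deeper_than_limit?` is a hypothetical name:

```ruby
require 'uri'

# Hypothetical standalone version of the comparison above.
def deeper_than_limit?( url, depth_limit )
    depth_limit > 0 && ( depth_limit + 1 ) <= URI( url.to_s ).path.count( '/' )
end

deep      = deeper_than_limit?( 'http://example.com/a/b/c/page.html', 3 ) # 4 slashes
shallow   = deeper_than_limit?( 'http://example.com/a/b/page.html', 3 )   # 3 slashes
unlimited = deeper_than_limit?( 'http://example.com/a/b/c/page.html', 0 )
```

A limit of zero or less disables the check, so every URL is accepted regardless of depth.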
