Class: Arachni::Support::Signature

Inherits:
Object
  • Object
Defined in:
lib/arachni/support/signature.rb

Overview

Represents a signature, used to maintain a lightweight representation of a String and refine it using similar Strings to remove noise.

Author:

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(data, options = {}) ⇒ Signature

Note:

The string will be tokenized based on whitespace.

Returns a new instance of Signature.

Parameters:

  • data (String, Signature)

    Seed data to use to initialize the signature.

  • options (Hash) (defaults to: {})

Options Hash (options):

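  • :threshold (Integer)

    Threshold (in tokens) used by #similar? when none is passed explicitly. Two signatures count as similar when they are equal or differ by fewer than this many tokens. Must be a Numeric, otherwise an ArgumentError is raised.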



# File 'lib/arachni/support/signature.rb', line 34

def initialize( data, options = {} )
    @tokens  = tokenize( data )
    @options = options

    if @options[:threshold] && !@options[:threshold].is_a?( Numeric )
        fail ArgumentError, 'Option :threshold must be a number.'
    end
end
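
The constructor's behavior can be illustrated with a minimal stand-in. This is a sketch, not the Arachni class: `SketchSignature` is a hypothetical name, and its tokenization is assumed to be a plain whitespace split, which may differ from what the library's private `tokenize` actually does. Only the :threshold validation is taken from the listing above.

```ruby
# Hypothetical stand-in for illustration; not Arachni::Support::Signature.
class SketchSignature
  attr_reader :tokens

  def initialize(data, options = {})
    # Assumption: plain whitespace tokenization (the real tokenize is not shown).
    @tokens  = data.is_a?(SketchSignature) ? data.tokens.dup : data.split
    @options = options

    # Same validation as in the documented constructor.
    if @options[:threshold] && !@options[:threshold].is_a?(Numeric)
      raise ArgumentError, 'Option :threshold must be a number.'
    end
  end
end

sig = SketchSignature.new("HTTP/1.1 200 OK", threshold: 2)
p sig.tokens

begin
  # A String threshold is rejected, even if it looks numeric.
  SketchSignature.new("HTTP/1.1 200 OK", threshold: "2")
rescue ArgumentError => e
  puts e.message
end
```

Omitting :threshold is allowed at construction time; it only becomes an error later if #similar? is called without an explicit threshold.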

Instance Attribute Details

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/arachni/support/signature.rb', line 25

def tokens
  @tokens
end

Instance Method Details

#==(other) ⇒ Object

Compares two signatures by the hash of their tokens.

Parameters:

  • other (Signature)



# File 'lib/arachni/support/signature.rb', line 129

def ==( other )
    hash == other.hash
end

#differences(other) ⇒ Integer

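Returns the number of tokens that appear in only one of the two signatures.

Parameters:

  • other (Signature)

Returns:

  • (Integer)

    Number of tokens present in only one of the two signatures (the size of their symmetric difference); nil if other is nil.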



# File 'lib/arachni/support/signature.rb', line 102

def differences( other )
    return nil if other.nil?
    return 0   if self == other

    ((tokens - other.tokens) | (other.tokens - tokens)).size
end
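
The last line of #differences is a symmetric difference over the token collections. A small sketch with plain Ruby arrays standing in for `tokens` (the sample strings are made up for illustration):

```ruby
# Plain arrays stand in for Signature#tokens; the data is illustrative only.
a = %w[Welcome back alice token 1234]
b = %w[Welcome back bob token 5678]

only_in_a = a - b   # tokens of a that are missing from b
only_in_b = b - a   # tokens of b that are missing from a

# Union of both one-sided differences, i.e. the symmetric difference.
differences = (only_in_a | only_in_b).size

p only_in_a
p only_in_b
puts differences
```

Because `Array#-` and `Array#|` behave like set operations, token order and repeated tokens do not affect the count.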

#distance(other, ins = 2, del = 2, sub = 1) ⇒ Integer

Note:

Very expensive; use #differences when possible.

Returns the Levenshtein distance, computed over tokens rather than characters.

Parameters:

  • other (Signature)
  • ins (Integer) (defaults to: 2)

    Cost of an `insert` operation.

  • del (Integer) (defaults to: 2)

    Cost of a `delete` operation.

  • sub (Integer) (defaults to: 1)

    Cost of a `substitute` operation.

Returns:

  • (Integer)

    Levenshtein distance; nil if other is nil.

See Also:



# File 'lib/arachni/support/signature.rb', line 68

def distance( other, ins = 2, del = 2, sub = 1 )
    return nil if other.nil?
    return 0   if hash == other.hash

    # Distance matrix.
    dm = []

    # Initialize first row values.
    dm[0] = (0..tokens.size).collect { |i| i * ins }
    fill  = [0] * (tokens.size - 1)

    # Initialize first column values.
    (1..other.tokens.size).each do |i|
        dm[i] = [i * del, fill.flatten]
    end

    # Populate matrix.
    (1..other.tokens.size).each do |i|
        (1..tokens.size).each do |j|
            # Critical comparison.
            dm[i][j] = [
                dm[i-1][j-1] + (tokens[j-1] == other.tokens[i-1] ? 0 : sub),
                dm[i][j-1] + ins,
                dm[i-1][j] + del
            ].min
        end
    end

    # The last value in matrix is the Levenshtein distance.
    dm.last.last
end
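
To make the recurrence easier to follow, here is a standalone sketch of a weighted, token-level Levenshtein distance. It is not the library's method: `token_distance` is a hypothetical helper that takes two plain arrays, and it preallocates the full matrix instead of growing rows as the listing above does. The costs and the three-way minimum mirror the documented code.

```ruby
# Hypothetical helper; a and b are arrays of tokens.
def token_distance(a, b, ins = 2, del = 2, sub = 1)
  # dm[i][j]: cost of aligning the first i tokens of b with the first j tokens of a.
  dm = Array.new(b.size + 1) { Array.new(a.size + 1, 0) }

  (0..a.size).each { |j| dm[0][j] = j * ins }  # first row
  (0..b.size).each { |i| dm[i][0] = i * del }  # first column

  (1..b.size).each do |i|
    (1..a.size).each do |j|
      dm[i][j] = [
        dm[i - 1][j - 1] + (a[j - 1] == b[i - 1] ? 0 : sub),  # match or substitute
        dm[i][j - 1] + ins,                                   # insert
        dm[i - 1][j] + del                                    # delete
      ].min
    end
  end

  dm[b.size][a.size]
end

a = %w[Welcome back alice]
puts token_distance(a, %w[Welcome back bob])  # one substitution
puts token_distance(a, %w[Welcome back])      # one token missing
```

With the default costs a substitution (1) is cheaper than a delete followed by an insert (4), so replaced tokens are penalized less than added or removed ones. The work grows with the product of the two token counts, which is why the note above recommends #differences where it suffices.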

#dup ⇒ Signature

Returns a copy of `self`.

Returns:

  • (Signature)

    Copy of `self`.



# File 'lib/arachni/support/signature.rb', line 120

def dup
    self.class.new( '' ).tap { |s| s.copy( tokens, @options ) }
end

#hash ⇒ Object



# File 'lib/arachni/support/signature.rb', line 124

def hash
    tokens.hash
end

#refine(data) ⇒ Signature

Note:

The string will be tokenized based on whitespace.

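Returns a new, refined signature; the receiver is left unchanged.

Parameters:

  • data (String, Signature)

    Data to use to refine the signature.

Returns:

  • (Signature)

    New, refined signature.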



# File 'lib/arachni/support/signature.rb', line 54

def refine( data )
    dup.refine!( data )
end

#refine!(data) ⇒ Signature

Note:

The string will be tokenized based on whitespace.

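Refines the signature in place, keeping only the tokens that also appear in data. Returns `self`.

Parameters:

  • data (String, Signature)

    Data to use to refine the signature.

Returns:

  • (Signature)

    `self`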



# File 'lib/arachni/support/signature.rb', line 46

def refine!( data )
    @tokens &= tokenize( data )
    self
end
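
Refinement is an intersection: each call keeps only the tokens that also occur in the new data, so volatile tokens drop out over successive samples. A sketch using plain arrays and an assumed whitespace split in place of the private `tokenize` (the sample responses are invented):

```ruby
# Assumption: whitespace split stands in for the library's tokenize.
tokens = "Welcome back alice session 1234".split

# First refinement removes the user name and the session id.
tokens &= "Welcome back bob session 5678".split
p tokens

# Tokens that only appear in the new sample ("extra") are never added.
tokens &= "Welcome back carol session 9012 extra".split
p tokens
```

Since the token list can only shrink, refining against an unrelated string may leave the signature empty.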

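#similar?(other, threshold = @options[:threshold]) ⇒ Bool

Returns true if the signatures are equal or differ by fewer than threshold tokens. Raises an error if no threshold is passed and none was set at initialization.

Parameters:

  • other (Signature)
  • threshold (Integer) (defaults to: @options[:threshold])

    Number of differing tokens (see #differences) at which signatures stop counting as similar.

Returns:

  • (Bool)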


# File 'lib/arachni/support/signature.rb', line 114

def similar?( other, threshold = @options[:threshold] )
    fail 'No threshold given.' if !threshold
    self == other || differences( other ) < threshold
end
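
Note that the comparison is strict: a difference count equal to the threshold is not similar. The following sketch reproduces that logic on plain arrays; `similar_tokens?` is a hypothetical helper, not part of the library.

```ruby
# Hypothetical helper mirroring the logic of #similar? on token arrays.
def similar_tokens?(a, b, threshold)
  raise 'No threshold given.' unless threshold
  a == b || ((a - b) | (b - a)).size < threshold
end

a = %w[Welcome back alice token 1234]
b = %w[Welcome back bob token 5678]   # differs from a by 4 tokens

puts similar_tokens?(a, b, 5)   # 4 < 5
puts similar_tokens?(a, b, 4)   # 4 < 4 fails: the bound is exclusive
puts similar_tokens?(a, a, 0)   # equal signatures are always similar
```

To treat "at most N differing tokens" as similar, pass N + 1 as the threshold.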