Class: Classifier::ContentNode

Inherits:
Object
Defined in:
lib/classifier/lsi/content_node.rb

Overview

ContentNode is an internal data structure used by the LSI classifier. It holds a node's word frequencies, its categories, and its raw and LSI vectors. Apart from raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.


Constructor Details

#initialize(word_frequencies, *categories) ⇒ ContentNode

Stores word_frequencies as word_hash and collects any remaining arguments into categories. If no categories are given, categories is an empty array.



# File 'lib/classifier/lsi/content_node.rb', line 18

def initialize(word_frequencies, *categories)
  @categories = categories || []
  @word_hash = word_frequencies
end
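The splat in the constructor signature can be illustrated with a small stand-in class. This is only a sketch: SketchNode is a hypothetical name that copies the constructor shown above, and the word frequencies and category names are made-up sample values.

```ruby
# Hypothetical stand-in that mirrors only the constructor shown above.
class SketchNode
  attr_accessor :categories
  attr_reader :word_hash

  def initialize(word_frequencies, *categories)
    @categories = categories || []
    @word_hash = word_frequencies
  end
end

# Every argument after the first is gathered into the categories array.
node = SketchNode.new({ dog: 2, cat: 1 }, 'Pets', 'Animals')

# With no extra arguments the splat yields an empty array, never nil.
uncategorized = SketchNode.new({ dog: 1 })

puts node.categories.inspect
puts uncategorized.categories.inspect
```

Because a splat parameter is always an array, the `|| []` fallback only matters if categories is later reassigned to nil through the writer.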

Instance Attribute Details

#categories ⇒ Object

Returns the value of attribute categories.



# File 'lib/classifier/lsi/content_node.rb', line 10

def categories
  @categories
end

#lsi_norm ⇒ Object

Returns the value of attribute lsi_norm.



# File 'lib/classifier/lsi/content_node.rb', line 10

def lsi_norm
  @lsi_norm
end

#lsi_vector ⇒ Object

Returns the value of attribute lsi_vector.



# File 'lib/classifier/lsi/content_node.rb', line 10

def lsi_vector
  @lsi_vector
end

#raw_norm ⇒ Object

Returns the value of attribute raw_norm.



# File 'lib/classifier/lsi/content_node.rb', line 10

def raw_norm
  @raw_norm
end

#raw_vector ⇒ Object

Returns the value of attribute raw_vector.



# File 'lib/classifier/lsi/content_node.rb', line 10

def raw_vector
  @raw_vector
end

#word_hash ⇒ Object (readonly)

Returns the value of attribute word_hash.



# File 'lib/classifier/lsi/content_node.rb', line 14

def word_hash
  @word_hash
end

Instance Method Details

#raw_vector_with(word_list) ⇒ Object

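Creates the raw vector from word_hash. word_list maps each word to its index in the vector space; words in word_hash that are missing from word_list are skipped. When the vector holds more than one distinct word, each count is rescaled by a log transform divided by the entropy of the word distribution. The result is stored in raw_vector, and its normalized form in raw_norm.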



# File 'lib/classifier/lsi/content_node.rb', line 35

def raw_vector_with(word_list)
  vec = if $GSL
          GSL::Vector.alloc(word_list.size)
        else
          Array.new(word_list.size, 0)
        end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform
  total_words = $GSL ? vec.sum : vec.sum_with_identity
  total_unique_words = vec.count { |word| word != 0 }

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0

    vec.each do |term|
      next unless term.positive?
      next if total_words.zero?

      term_over_total = term / total_words
      val = term_over_total * Math.log(term_over_total)
      weighted_total += val unless val.nan?
    end
    vec = vec.collect { |val| Math.log(val + 1) / -weighted_total }
  end

  if $GSL
    @raw_norm   = vec.normalize
    @raw_vector = vec
  else
    @raw_norm   = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
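The weighting step above can be followed on a small input with plain arrays. The following is a minimal sketch of the non-GSL branch only, not the library's code: sketch_raw_vector is a hypothetical helper, it uses Array#sum with to_f in place of sum_with_identity (assumed here to return a float total), and it returns the weighted array instead of assigning raw_vector and raw_norm.

```ruby
# Sketch of the non-GSL branch using plain arrays (hypothetical helper).
def sketch_raw_vector(word_hash, word_list)
  # One slot per known word; words missing from word_list are skipped.
  vec = Array.new(word_list.size, 0)
  word_hash.each do |word, count|
    vec[word_list[word]] = count if word_list.key?(word)
  end

  total_words = vec.sum.to_f
  total_unique_words = vec.count { |count| count != 0 }

  # Only rescale when more than one distinct word is present.
  if total_words > 1.0 && total_unique_words > 1
    # Sum of p * ln(p) over the non-zero terms (the negative entropy).
    weighted_total = 0.0
    vec.each do |term|
      next unless term.positive?

      ratio = term / total_words
      weighted_total += ratio * Math.log(ratio)
    end
    vec = vec.map { |val| Math.log(val + 1) / -weighted_total }
  end

  vec
end

# Sample data: :bird is not in word_list, so it is ignored.
word_list = { dog: 0, cat: 1, fish: 2 }
weights = sketch_raw_vector({ dog: 2, cat: 2, bird: 5 }, word_list)
single = sketch_raw_vector({ dog: 3 }, word_list)

puts weights.inspect
puts single.inspect
```

With two words that each occur twice, both ratios are 0.5, so the divisor is ln 2 and each non-zero entry becomes ln 3 / ln 2. The single-word input is returned as raw counts because the transform is skipped.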

#search_norm ⇒ Object

Returns the normalized vector to use for searching: lsi_norm if it has been set, otherwise raw_norm.



# File 'lib/classifier/lsi/content_node.rb', line 29

def search_norm
  @lsi_norm || @raw_norm
end

#search_vector ⇒ Object

Returns the vector to use for searching: lsi_vector if it has been set, otherwise raw_vector.



# File 'lib/classifier/lsi/content_node.rb', line 24

def search_vector
  @lsi_vector || @raw_vector
end
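The fallback in search_vector and search_norm can be seen with a stand-in object. This is a sketch under assumptions: SketchSearchNode is a hypothetical class that copies only the two readers and the accessors they depend on, and the vectors are made-up sample arrays.

```ruby
# Hypothetical stand-in mirroring only the two search_* readers.
class SketchSearchNode
  attr_accessor :raw_vector, :raw_norm, :lsi_vector, :lsi_norm

  def search_vector
    @lsi_vector || @raw_vector
  end

  def search_norm
    @lsi_norm || @raw_norm
  end
end

node = SketchSearchNode.new
node.raw_vector = [1.0, 0.0]

# lsi_vector is still nil here, so the raw vector is returned.
before_lsi = node.search_vector

# Once an LSI vector is assigned, it takes precedence.
node.lsi_vector = [0.6, 0.8]
after_lsi = node.search_vector

puts before_lsi.inspect
puts after_lsi.inspect
```

The same rule applies to search_norm, which stays nil until either raw_norm or lsi_norm has been assigned.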