Class: Classifier::ContentNode
Defined in: lib/classifier/lsi/content_node.rb
Overview
This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.
Instance Attribute Summary
- #categories ⇒ Object
  Returns the value of attribute categories.
- #lsi_norm ⇒ Object
  Returns the value of attribute lsi_norm.
- #lsi_vector ⇒ Object
  Returns the value of attribute lsi_vector.
- #raw_norm ⇒ Object
  Returns the value of attribute raw_norm.
- #raw_vector ⇒ Object
  Returns the value of attribute raw_vector.
- #word_hash ⇒ Object (readonly)
  Returns the value of attribute word_hash.
Instance Method Summary
- #initialize(word_frequencies, *categories) ⇒ ContentNode (constructor)
  Stores word_frequencies as word_hash and any remaining arguments as categories.
- #raw_vector_with(word_list) ⇒ Object
  Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.
- #search_norm ⇒ Object
  Use this to fetch the appropriate search vector in normalized form.
- #search_vector ⇒ Object
  Use this to fetch the appropriate search vector.
Constructor Details
#initialize(word_frequencies, *categories) ⇒ ContentNode
Stores word_frequencies as word_hash and any remaining arguments as categories.
# File 'lib/classifier/lsi/content_node.rb', line 18
def initialize(word_frequencies, *categories)
  @categories = categories || []
  @word_hash = word_frequencies
end
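As a rough illustration of how the constructor's arguments end up in the two instance variables, the sketch below copies the method body into a hypothetical stand-in class, `SketchNode`. The class name, the readers, and the sample data are assumptions made for this example and are not part of the library.

```ruby
# Hypothetical stand-in that mirrors only the constructor shown above;
# it is not the library class itself.
class SketchNode
  attr_reader :word_hash, :categories

  def initialize(word_frequencies, *categories)
    @categories = categories || []
    @word_hash = word_frequencies
  end
end

# Every argument after the first is collected into the categories array.
node = SketchNode.new({ dog: 2, cat: 1 }, 'Pets', 'Animals')
node.word_hash  # the hash that was passed in, stored as-is
node.categories # ["Pets", "Animals"]

# With no extra arguments the splat already yields an empty array.
bare = SketchNode.new({ dog: 1 })
bare.categories # []
```

Because a splat parameter is always an array, the `|| []` fallback never actually triggers; it only acts as a defensive default.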
Instance Attribute Details
#categories ⇒ Object
Returns the value of attribute categories.
# File 'lib/classifier/lsi/content_node.rb', line 10
def categories
  @categories
end
#lsi_norm ⇒ Object
Returns the value of attribute lsi_norm.
# File 'lib/classifier/lsi/content_node.rb', line 10
def lsi_norm
  @lsi_norm
end
#lsi_vector ⇒ Object
Returns the value of attribute lsi_vector.
# File 'lib/classifier/lsi/content_node.rb', line 10
def lsi_vector
  @lsi_vector
end
#raw_norm ⇒ Object
Returns the value of attribute raw_norm.
# File 'lib/classifier/lsi/content_node.rb', line 10
def raw_norm
  @raw_norm
end
#raw_vector ⇒ Object
Returns the value of attribute raw_vector.
# File 'lib/classifier/lsi/content_node.rb', line 10
def raw_vector
  @raw_vector
end
#word_hash ⇒ Object (readonly)
Returns the value of attribute word_hash.
# File 'lib/classifier/lsi/content_node.rb', line 14
def word_hash
  @word_hash
end
Instance Method Details
#raw_vector_with(word_list) ⇒ Object
Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.
# File 'lib/classifier/lsi/content_node.rb', line 35
def raw_vector_with(word_list)
  vec = if $GSL
          GSL::Vector.alloc(word_list.size)
        else
          Array.new(word_list.size, 0)
        end

  @word_hash.each_key do |word|
    vec[word_list[word]] = @word_hash[word] if word_list[word]
  end

  # Perform the scaling transform
  total_words = $GSL ? vec.sum : vec.sum_with_identity
  total_unique_words = vec.count { |word| word != 0 }

  # Perform first-order association transform if this vector has more
  # than one word in it.
  if total_words > 1.0 && total_unique_words > 1
    weighted_total = 0.0
    vec.each do |term|
      next unless term.positive?
      next if total_words.zero?

      term_over_total = term / total_words
      val = term_over_total * Math.log(term_over_total)
      weighted_total += val unless val.nan?
    end
    vec = vec.collect { |val| Math.log(val + 1) / -weighted_total }
  end

  if $GSL
    @raw_norm = vec.normalize
    @raw_vector = vec
  else
    @raw_norm = Vector[*vec].normalize
    @raw_vector = Vector[*vec]
  end
end
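Since the overview singles out raw_vector_with as the one non-obvious method, a small worked example may help. The sketch below follows only the non-GSL branch with plain arrays and leaves out the final normalization step. `weighted_raw_vector` is a hypothetical helper name, the sample words are invented, and `Array#sum` followed by `to_f` stands in for the library's `sum_with_identity`, which is assumed here to return a floating-point total.

```ruby
# Sketch of the non-GSL weighting step. `weighted_raw_vector` is a
# hypothetical helper, not a method of the library.
def weighted_raw_vector(word_hash, word_list)
  # One slot per known word; word_list maps each word to its slot index.
  vec = Array.new(word_list.size, 0)
  word_hash.each do |word, count|
    vec[word_list[word]] = count if word_list[word]
  end

  total_words = vec.sum.to_f # assumption: stands in for sum_with_identity
  total_unique_words = vec.count { |count| count != 0 }

  if total_words > 1.0 && total_unique_words > 1
    # Sum of share * ln(share) over the non-zero terms (a negative number).
    weighted_total = vec.select(&:positive?).sum do |count|
      share = count / total_words
      share * Math.log(share)
    end
    # Log-scale each count and divide by the negated sum.
    vec = vec.map { |count| Math.log(count + 1) / -weighted_total }
  end

  vec
end

word_list = { 'dog' => 0, 'cat' => 1, 'fish' => 2 }

# 'bird' is not in word_list, so it is ignored. The two known words each
# hold half of the total, so both weights are ln(3) / ln(2), about 1.585.
weighted = weighted_raw_vector({ 'dog' => 2, 'cat' => 2, 'bird' => 5 }, word_list)

# Only one distinct word: the transform is skipped and raw counts remain.
single = weighted_raw_vector({ 'dog' => 3 }, word_list)
```

In the method itself the resulting array is then wrapped in a vector and normalized to produce raw_vector and raw_norm.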
#search_norm ⇒ Object
Use this to fetch the appropriate search vector in normalized form: lsi_norm when it has been set, otherwise raw_norm.
# File 'lib/classifier/lsi/content_node.rb', line 29
def search_norm
  @lsi_norm || @raw_norm
end
#search_vector ⇒ Object
Use this to fetch the appropriate search vector: lsi_vector when it has been set, otherwise raw_vector.
# File 'lib/classifier/lsi/content_node.rb', line 24
def search_vector
  @lsi_vector || @raw_vector
end
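The fallback in search_vector and search_norm can be seen in isolation with a small stand-in. `SketchVectors` below is a hypothetical Struct that carries only the four vector attributes, with plain arrays in place of real vector objects. It is a sketch of the `||` fallback and not the library class.

```ruby
# Hypothetical stand-in exposing just the vector attributes and the two
# fallback readers; arrays stand in for the real vector objects.
SketchVectors = Struct.new(:raw_vector, :raw_norm, :lsi_vector, :lsi_norm) do
  def search_vector
    lsi_vector || raw_vector
  end

  def search_norm
    lsi_norm || raw_norm
  end
end

node = SketchVectors.new([3, 4], [0.6, 0.8])

# No LSI vector has been assigned yet, so the raw vector is returned.
before_lsi = node.search_vector

# Once an LSI vector is assigned, it takes precedence.
node.lsi_vector = [1.0, 0.0]
after_lsi = node.search_vector

# lsi_norm is still unset, so the normalized form falls back to raw_norm.
norm = node.search_norm
```

The two readers fall back independently, so a caller that sets one LSI attribute should set the other as well.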