Class: ClassifierReborn::ContentNode
- Inherits: Object
  - Object
  - ClassifierReborn::ContentNode
- Defined in: lib/classifier-reborn/lsi/content_node.rb
Overview
This is an internal data structure class for the LSI node. Save for raw_vector_with, it should be fairly straightforward to understand. You should never have to use it directly.
Direct Known Subclasses
Instance Attribute Summary
- #categories ⇒ Object
  Returns the value of attribute categories.
- #lsi_norm ⇒ Object
  Returns the value of attribute lsi_norm.
- #lsi_vector ⇒ Object
  Returns the value of attribute lsi_vector.
- #raw_norm ⇒ Object
  Returns the value of attribute raw_norm.
- #raw_vector ⇒ Object
  Returns the value of attribute raw_vector.
- #word_hash ⇒ Object (readonly)
  Returns the value of attribute word_hash.
Instance Method Summary
- #initialize(word_hash, *categories) ⇒ ContentNode (constructor)
  If text_proc is not specified, the source will be duck-typed via source.to_s.
- #raw_vector_with(word_list) ⇒ Object
  Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.
- #search_norm ⇒ Object
  Use this to fetch the appropriate search vector in normalized form.
- #search_vector ⇒ Object
  Use this to fetch the appropriate search vector.
- #transposed_search_vector ⇒ Object
  Method to access the transposed search vector.
Constructor Details
#initialize(word_hash, *categories) ⇒ ContentNode
If text_proc is not specified, the source will be duck-typed via source.to_s.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 19

    def initialize(word_hash, *categories)
      @categories = categories || []
      @word_hash = word_hash
      @lsi_norm, @lsi_vector = nil
    end
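To illustrate what the constructor stores, here is a minimal sketch. `SketchNode` is a hypothetical stand-in that copies the constructor body shown above so the example runs without the gem; the symbol keys in the word hash and the category names are assumptions chosen for illustration.

```ruby
# Hypothetical stand-in mirroring the constructor above; not the library class.
class SketchNode
  attr_accessor :raw_vector, :raw_norm, :lsi_vector, :lsi_norm, :categories
  attr_reader :word_hash

  def initialize(word_hash, *categories)
    @categories = categories || []
    @word_hash = word_hash
    @lsi_norm, @lsi_vector = nil
  end
end

# Word counts keyed by symbol (the key type is an assumption in this sketch).
node = SketchNode.new({ dog: 2, bark: 1 }, 'Animals', 'Pets')

# With no categories given, the splat yields an empty array.
bare = SketchNode.new({ dog: 1 })

puts node.categories.inspect
puts bare.categories.inspect
puts node.lsi_vector.inspect
```

The splat parameter collects any number of category labels into one array, and both LSI fields start out as nil until an index build assigns them.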
Instance Attribute Details
#categories ⇒ Object
Returns the value of attribute categories.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 12

    def categories
      @categories
    end
#lsi_norm ⇒ Object
Returns the value of attribute lsi_norm.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 12

    def lsi_norm
      @lsi_norm
    end
#lsi_vector ⇒ Object
Returns the value of attribute lsi_vector.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 12

    def lsi_vector
      @lsi_vector
    end
#raw_norm ⇒ Object
Returns the value of attribute raw_norm.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 12

    def raw_norm
      @raw_norm
    end
#raw_vector ⇒ Object
Returns the value of attribute raw_vector.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 12

    def raw_vector
      @raw_vector
    end
#word_hash ⇒ Object (readonly)
Returns the value of attribute word_hash.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 16

    def word_hash
      @word_hash
    end
Instance Method Details
#raw_vector_with(word_list) ⇒ Object
Creates the raw vector out of word_hash using word_list as the key for mapping the vector space.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 46

    def raw_vector_with(word_list)
      vec = if $SVD == :numo
              Numo::DFloat.zeros(word_list.size)
            elsif $SVD == :gsl
              GSL::Vector.alloc(word_list.size)
            else
              Array.new(word_list.size, 0)
            end

      @word_hash.each_key do |word|
        vec[word_list[word]] = @word_hash[word] if word_list[word]
      end

      # Perform the scaling transform and force floating point arithmetic
      if $SVD == :numo
        total_words = vec.sum.to_f
      elsif $SVD == :gsl
        sum = 0.0
        vec.each { |v| sum += v }
        total_words = sum
      else
        total_words = vec.reduce(0, :+).to_f
      end

      total_unique_words = 0

      if [:numo, :gsl].include?($SVD)
        vec.each { |word| total_unique_words += 1 if word != 0.0 }
      else
        total_unique_words = vec.count { |word| word != 0 }
      end

      # Perform first-order association transform if this vector has more
      # than one word in it.
      if total_words > 1.0 && total_unique_words > 1
        weighted_total = 0.0
        # Cache calculations, this takes too long on large indexes
        cached_calcs = Hash.new do |hash, term|
          hash[term] = ((term / total_words) * Math.log(term / total_words))
        end

        vec.each do |term|
          weighted_total += cached_calcs[term] if term > 0.0
        end

        # Cache calculations, this takes too long on large indexes
        cached_calcs = Hash.new do |hash, val|
          hash[val] = Math.log(val + 1) / -weighted_total
        end

        vec = vec.map do |val|
          cached_calcs[val]
        end
      end

      if $SVD == :numo
        @raw_norm = vec / Numo::Linalg.norm(vec)
        @raw_vector = vec
      elsif $SVD == :gsl
        @raw_norm = vec.normalize
        @raw_vector = vec
      else
        @raw_norm = Vector[*vec].normalize
        @raw_vector = Vector[*vec]
      end
    end
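Because raw_vector_with is the one method the overview flags as non-obvious, the following sketch walks through its default branch (neither Numo nor GSL) in plain Ruby. `sketch_raw_vector` is a hypothetical helper, not part of the library. It assumes the word list can be modelled as a Hash from word to dimension index, it omits the memoizing Hash caches, it returns plain arrays instead of Vector objects, and its zero-vector guard is an addition of the sketch.

```ruby
# Hypothetical helper reproducing the plain-Array path of raw_vector_with.
# word_hash: word => count; word_list: word => dimension index (assumed shape).
def sketch_raw_vector(word_hash, word_list)
  # Place each known word's count at its dimension; unknown words are skipped.
  vec = Array.new(word_list.size, 0)
  word_hash.each do |word, count|
    index = word_list[word]
    vec[index] = count if index
  end

  total_words = vec.sum.to_f
  total_unique_words = vec.count { |count| count != 0 }

  # The weighting only applies when the vector holds more than one word.
  if total_words > 1.0 && total_unique_words > 1
    # Sum of share * log(share) over non-zero counts; this value is negative.
    weighted_total = 0.0
    vec.each do |count|
      next unless count > 0
      share = count / total_words
      weighted_total += share * Math.log(share)
    end

    # Rescale each count to log(count + 1) divided by the negated sum.
    vec = vec.map { |count| Math.log(count + 1) / -weighted_total }
  end

  # Unit-length copy; the zero-vector guard is an assumption of this sketch.
  magnitude = Math.sqrt(vec.sum(0.0) { |v| v * v })
  norm = magnitude.zero? ? vec : vec.map { |v| v / magnitude }
  [vec, norm]
end

word_list = { dog: 0, bark: 1, cat: 2 }
raw, norm = sketch_raw_vector({ dog: 2, bark: 1, fish: 4 }, word_list)
puts raw.inspect
puts norm.inspect
```

In this example `fish` is dropped because it has no index in the word list, `cat` stays at zero because it does not occur in the document, and `dog` ends up with a larger weight than `bark` because its count is higher.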
#search_norm ⇒ Object
Use this to fetch the appropriate search vector in normalized form.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 40

    def search_norm
      @lsi_norm || @raw_norm
    end
#search_vector ⇒ Object
Use this to fetch the appropriate search vector.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 26

    def search_vector
      @lsi_vector || @raw_vector
    end
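Both search_norm and search_vector follow the same fallback pattern: prefer the LSI value when it has been set, otherwise return the raw one. The sketch below demonstrates that behavior with `SketchVectors`, a hypothetical Struct standing in for the node; the array values are placeholders rather than real index output.

```ruby
# Hypothetical stand-in exposing only the four vector attributes.
SketchVectors = Struct.new(:raw_vector, :raw_norm, :lsi_vector, :lsi_norm) do
  def search_vector
    lsi_vector || raw_vector
  end

  def search_norm
    lsi_norm || raw_norm
  end
end

# Only the raw values are set, so the LSI fields default to nil.
node = SketchVectors.new([3.0, 4.0], [0.6, 0.8])
before = node.search_vector

# Placeholder values standing in for reduced-space vectors.
node.lsi_vector = [1.0, 0.0]
node.lsi_norm = [1.0, 0.0]
after = node.search_vector

puts before.inspect
puts after.inspect
```

Callers can therefore use the search accessors without checking whether the LSI vectors have been computed yet.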
#transposed_search_vector ⇒ Object
Method to access the transposed search vector.
    # File 'lib/classifier-reborn/lsi/content_node.rb', line 31

    def transposed_search_vector
      if $SVD == :numo
        search_vector
      else
        search_vector.col
      end
    end