Class: RSemantic::VectorSpace::Builder
- Inherits:
- Object
- Object
- RSemantic::VectorSpace::Builder
- Defined in:
- lib/rsemantic/vector_space/builder.rb
Overview
An algebraic model for representing text documents as vectors of identifiers. Each document is represented as a vector in which every dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
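To make the overview concrete, here is a minimal plain-Ruby sketch of the idea. The `term_vector` helper, the whitespace/word tokenization, and the use of raw term counts as values are all assumptions made for illustration; they are not taken from RSemantic, which may tokenize and weight terms differently.

```ruby
# Hypothetical helper, not part of RSemantic: map a document onto a fixed
# vocabulary, producing one dimension per term.
def term_vector(document, vocabulary)
  counts = Hash.new(0)
  # Assumed tokenization: lowercase the text and split on word characters.
  document.downcase.scan(/\w+/) { |term| counts[term] += 1 }
  # One entry per vocabulary term; terms absent from the document stay 0.
  vocabulary.map { |term| counts[term] }
end

vocabulary = %w[cat dog fish]
vector = term_vector("The cat chased the other cat", vocabulary)
# "cat" occurs in the document, so its dimension is non-zero;
# "dog" and "fish" do not occur, so their dimensions are 0.
```

Words outside the vocabulary ("the", "chased", "other") are simply dropped in this sketch, since they have no dimension to land in.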
Instance Attribute Summary
- #parsed_document_cache ⇒ Object (readonly)
  Returns the value of attribute parsed_document_cache.
Instance Method Summary
- #build_document_matrix(documents) ⇒ Object
- #build_query_vector(term_list) ⇒ Object
- #initialize(options = {}) ⇒ Builder (constructor)
  A new instance of Builder.
Constructor Details
Instance Attribute Details
#parsed_document_cache ⇒ Object (readonly)
Returns the value of attribute parsed_document_cache.
# File 'lib/rsemantic/vector_space/builder.rb', line 7
def parsed_document_cache
  @parsed_document_cache
end
Instance Method Details
#build_document_matrix(documents) ⇒ Object
# File 'lib/rsemantic/vector_space/builder.rb', line 17
def build_document_matrix(documents)
  @vector_keyword_index = build_vector_keyword_index(documents)

  document_vectors = documents.enum_for(:each_with_index).map{|document,document_id| build_vector(document, document_id)}

  n = document_vectors.size
  m = document_vectors.first.size

  # TODO check where else we use document_vectors and if we can directly use column based ones
  document_matrix = GSL::Matrix.alloc(*document_vectors.map {|v| v.transpose})

  Model.new(document_matrix, @vector_keyword_index)
end
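The listing above relies on private helpers and GSL that are not shown on this page. The following plain-Ruby sketch illustrates the same overall flow (build a keyword index, build one vector per document, then reorient the vectors) using nested arrays instead of `GSL::Matrix`. The `keyword_index` and `document_vector` helpers are hypothetical stand-ins, and the assumption that each document ends up as one column is a guess based on the `transpose` call, not something the source confirms.

```ruby
# Hypothetical stand-in for build_vector_keyword_index: assign each distinct
# term a dimension. Tokenization and sorting are assumptions.
def keyword_index(documents)
  terms = documents.flat_map { |doc| doc.downcase.scan(/\w+/) }.uniq.sort
  terms.each_with_index.to_h # term => dimension
end

# Hypothetical stand-in for build_vector: count term occurrences per dimension.
def document_vector(document, index)
  vector = Array.new(index.size, 0)
  document.downcase.scan(/\w+/) do |term|
    vector[index[term]] += 1 if index.key?(term)
  end
  vector
end

documents = ["cat dog", "dog dog fish"]
index = keyword_index(documents)

# One row per document, one entry per term.
rows = documents.map { |doc| document_vector(doc, index) }

# Assumed orientation: terms as rows, documents as columns.
term_document = rows.transpose
```

With this orientation, `term_document[i][j]` holds the count of term `i` in document `j`, and each document can be read off as a column.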
#build_query_vector(term_list) ⇒ Object
# File 'lib/rsemantic/vector_space/builder.rb', line 31
def build_query_vector(term_list)
  build_vector(term_list.join(" "))
end
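As the listing shows, a query is handled by joining its terms into a single string and vectorizing that string like a document. A rough plain-Ruby illustration of that step follows; the `query_vector` helper and the `index` hash are hypothetical, and skipping terms that are missing from the index is an assumption about how unknown terms are treated.

```ruby
# Hypothetical helper, not part of RSemantic: vectorize a list of query terms
# against an existing term => dimension index.
def query_vector(term_list, index)
  vector = Array.new(index.size, 0)
  # Join the terms into one string, mirroring term_list.join(" ") above.
  term_list.join(" ").downcase.scan(/\w+/) do |term|
    position = index[term]
    # Assumption: terms never seen in the indexed documents are skipped.
    vector[position] += 1 if position
  end
  vector
end

index = { "cat" => 0, "dog" => 1, "fish" => 2 }
query = query_vector(["dog", "Fish", "bird"], index)
```

Because the query shares the index of the documents, the resulting vector has the same dimensions as the document vectors and can be compared against them directly.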