Class: TfIdfSimilarity::Model
- Inherits:
- Object
- Object
- TfIdfSimilarity::Model
- Extended by:
- Forwardable
- Includes:
- MatrixMethods
- Defined in:
- lib/tf-idf-similarity/model.rb
Direct Known Subclasses
Instance Method Summary
- #document_index(document) ⇒ Integer?
  Return the index of the document in the corpus.
- #initialize(documents, opts = {}) ⇒ Model (constructor)
  A new instance of Model.
- #similarity_matrix ⇒ GSL::Matrix, ...
  Returns a similarity matrix for the documents in the corpus.
- #term_frequency_inverse_document_frequency(document, term) ⇒ Float (also: #tfidf)
  Return the term frequency–inverse document frequency.
- #text_index(text) ⇒ Integer?
  Return the index of the document with matching text.
Constructor Details
#initialize(documents, opts = {}) ⇒ Model
Returns a new instance of Model.
# File 'lib/tf-idf-similarity/model.rb', line 11
def initialize(documents, opts = {})
  @model = TermCountModel.new(documents, opts)
  @library = (opts[:library] || :matrix).to_sym

  array = Array.new(terms.size) do |i|
    idf = inverse_document_frequency(terms[i])
    Array.new(documents.size) do |j|
      (term_frequency(documents[j], terms[i]) * idf).to_f
    end
  end

  @matrix = initialize_matrix(array)
end
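The constructor builds a term-by-document matrix: one row per term, one column per document, with each cell holding that term's frequency in that document multiplied by the term's inverse document frequency. The following is a minimal plain-Ruby sketch of that nested construction. The token arrays and both weighting lambdas are assumptions made for illustration (a raw count and a plain logarithmic IDF); the library's own `term_frequency` and `inverse_document_frequency` are defined elsewhere and may use different formulas.

```ruby
# Hypothetical pre-tokenized corpus; the real constructor receives document objects.
documents = [
  %w[the cat sat],
  %w[the dog sat],
  %w[the cat ran away]
]
terms = documents.flatten.uniq

# Placeholder weighting functions (assumptions, not the library's formulas).
term_frequency = ->(document, term) { document.count(term) }
inverse_document_frequency = lambda do |term|
  containing = documents.count { |document| document.include?(term) }
  Math.log(documents.size.to_f / containing)
end

# One row per term, one column per document, mirroring the nested Array.new calls.
rows = Array.new(terms.size) do |i|
  idf = inverse_document_frequency.call(terms[i])
  Array.new(documents.size) do |j|
    (term_frequency.call(documents[j], terms[i]) * idf).to_f
  end
end

rows.each_with_index do |row, i|
  puts format('%-5s %s', terms[i], row.map { |w| w.round(3) }.inspect)
end
```

With this placeholder IDF, a term that occurs in every document (here "the") gets a weight of zero in every column, since log(3/3) = 0.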
Instance Method Details
#document_index(document) ⇒ Integer?
Returns the index of the document in the corpus, or nil if the document is not in the corpus.
# File 'lib/tf-idf-similarity/model.rb', line 52
def document_index(document)
  @model.documents.index(document)
end
#similarity_matrix ⇒ GSL::Matrix, ...
Note:
Columns are normalized to unit vectors, so we can calculate the cosine similarity of all document vectors.
Returns a similarity matrix for the documents in the corpus.
# File 'lib/tf-idf-similarity/model.rb', line 40
def similarity_matrix
  if documents.empty?
    []
  else
    multiply_self(normalize)
  end
end
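The note above relies on the fact that the dot product of two unit vectors equals their cosine similarity, so multiplying the normalized matrix by itself yields every pairwise similarity at once. The sketch below works through that idea with plain Ruby arrays. The weights are made up, and the handling of all-zero columns is a choice made for this sketch; it does not reproduce the library's `normalize` or `multiply_self`, which are defined in MatrixMethods.

```ruby
# Hypothetical term-by-document weights: rows are terms, columns are documents.
weights = [
  [1.0, 0.0, 1.0],
  [0.0, 2.0, 0.0],
  [1.0, 0.0, 1.0]
]

# Work with one array per document, i.e. the columns of the matrix.
columns = weights.transpose

# Scale each document vector to unit length; all-zero vectors are left as they are.
unit_columns = columns.map do |column|
  norm = Math.sqrt(column.sum { |x| x * x })
  norm.zero? ? column : column.map { |x| x / norm }
end

# Entry [i][j] is the dot product of unit vectors i and j: their cosine similarity.
similarity = unit_columns.map do |a|
  unit_columns.map do |b|
    a.zip(b).sum { |x, y| x * y }
  end
end

similarity.each { |row| puts row.map { |s| s.round(3) }.inspect }
```

The result is a square, symmetric matrix with one row and one column per document. Documents 0 and 2 have identical weights, so their similarity is 1 (up to floating-point rounding), while document 1 shares no terms with either and scores 0.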
#term_frequency_inverse_document_frequency(document, term) ⇒ Float
Also known as: tfidf
Return the term frequency–inverse document frequency.
# File 'lib/tf-idf-similarity/model.rb', line 30
def term_frequency_inverse_document_frequency(document, term)
  inverse_document_frequency(term) * term_frequency(document, term)
end
#text_index(text) ⇒ Integer?
Returns the index of the first document whose text equals the given text, or nil if no document matches.
# File 'lib/tf-idf-similarity/model.rb', line 60
def text_index(text)
  @model.documents.index do |document|
    document.text == text
  end
end
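Both index methods delegate to Ruby's `Array#index`: `document_index` passes the object itself, while `text_index` passes a block that compares text. The sketch below contrasts the two forms. `SketchDocument` is a hypothetical stand-in for the library's document class and is assumed not to override `==`, so object lookup here matches by identity.

```ruby
# Hypothetical stand-in for the library's document class (default == is identity).
class SketchDocument
  attr_reader :text

  def initialize(text)
    @text = text
  end
end

first = SketchDocument.new('first text')
second = SketchDocument.new('second text')
documents = [first, second]

# Lookup by object, as document_index does.
by_object = documents.index(second)                            # => 1

# A different object with the same text is not found by object lookup here.
by_copy = documents.index(SketchDocument.new('second text'))   # => nil

# Lookup by text with a block, as text_index does.
by_text = documents.index { |document| document.text == 'second text' } # => 1

# No match: Array#index returns nil, hence the Integer? return type.
missing = documents.index { |document| document.text == 'absent' }      # => nil
```

Under that assumption, `text_index` is the form to use when only the text is at hand, or when the document object was re-created rather than reused.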