Class: TfIdfSimilarity::BM25Model
Defined in: lib/tf-idf-similarity/bm25_model.rb
Instance Method Summary
- #inverse_document_frequency(term) ⇒ Float (also: #idf)
  Returns the term's inverse document frequency.
- #term_frequency(document, term) ⇒ Float (also: #tf)
  Returns the term's frequency in the document.
Constructor Details
This class inherits a constructor from TfIdfSimilarity::Model.
Instance Method Details
#inverse_document_frequency(term) ⇒ Float Also known as: idf
Returns the term's inverse document frequency.
```ruby
# File 'lib/tf-idf-similarity/bm25_model.rb', line 11
def inverse_document_frequency(term)
  df = @model.document_count(term)
  log((documents.size - df + 0.5) / (df + 0.5))
end
```
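The IDF expression can be checked by hand. Below is a minimal standalone sketch, assuming plain integer counts in place of `documents.size` (N) and `@model.document_count(term)` (df), and assuming the class's `log` helper is a natural logarithm (Ruby's `Math.log` is used here). The method name `bm25_idf` is hypothetical and not part of the gem. The sketch also shows a property of this form: once a term appears in more than half of the documents, the ratio falls below 1 and the IDF turns negative.

```ruby
# Hypothetical standalone version of the BM25 IDF shown above.
# document_count: number of documents in the corpus (N)
# document_frequency: number of documents containing the term (df)
def bm25_idf(document_count, document_frequency)
  n = document_count
  df = document_frequency
  # Adding 0.5 promotes the integer counts to Float before dividing.
  Math.log((n - df + 0.5) / (df + 0.5))
end

# A term in 1 of 10 documents gets a large positive weight.
puts bm25_idf(10, 1)  # ln(9.5 / 1.5), about 1.846
# A term in exactly half the documents gets zero weight.
puts bm25_idf(10, 5)  # ln(5.5 / 5.5) = 0.0
# A term in more than half the documents gets a negative weight.
puts bm25_idf(10, 8)  # ln(2.5 / 8.5), about -1.224
```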
#term_frequency(document, term) ⇒ Float Also known as: tf
Note:
Like Lucene, we use b = 0.75 and k1 = 1.2.
Returns the term's frequency in the document.
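The hard-coded constants in the source follow from the standard Okapi BM25 term-frequency component with k_1 = 1.2 and b = 0.75, where f is `document.term_count(term)`, |D| is `document.size`, and avgdl is `@model.average_document_size`. Expanding the length normalization gives k_1 + 1 = 2.2, k_1(1 - b) = 1.2 × 0.25 = 0.3, and k_1 b = 1.2 × 0.75 = 0.9:

```latex
\begin{aligned}
\mathrm{tf}(f) &= \frac{f\,(k_1 + 1)}{f + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)} \\
&= \frac{f\,(k_1 + 1)}{f + k_1(1 - b) + k_1 b\,\frac{|D|}{\mathrm{avgdl}}} \\
&= \frac{2.2\,f}{f + 0.3 + 0.9\,\frac{|D|}{\mathrm{avgdl}}}
\qquad (k_1 = 1.2,\ b = 0.75)
\end{aligned}
```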
```ruby
# File 'lib/tf-idf-similarity/bm25_model.rb', line 24
def term_frequency(document, term)
  if @model.average_document_size.zero?
    Float::NAN
  else
    tf = document.term_count(term)
    (tf * 2.2) / (tf + 0.3 + 0.9 * document.size / @model.average_document_size)
  end
end
```
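A minimal standalone sketch of the same computation with k1 and b exposed as keyword arguments, assuming plain numbers in place of `document.term_count(term)`, `document.size`, and `@model.average_document_size`. The method name `bm25_tf` and its parameters are hypothetical, not part of the gem. With the default arguments it reproduces the hard-coded expression above, and like the source it returns `Float::NAN` when the average document length is zero.

```ruby
# Hypothetical parameterized version of the BM25 term frequency shown above.
# With k1 = 1.2 and b = 0.75 this equals:
#   (tf * 2.2) / (tf + 0.3 + 0.9 * document_size / average_document_size)
def bm25_tf(term_count, document_size, average_document_size, k1: 1.2, b: 0.75)
  # Mirror the source: no meaningful average length, so no meaningful score.
  return Float::NAN if average_document_size.zero?

  # Length normalization: 1.0 for an average-length document,
  # larger for longer documents, smaller for shorter ones.
  length_norm = 1 - b + b * document_size.to_f / average_document_size
  (term_count * (k1 + 1)) / (term_count + k1 * length_norm)
end

# Same term count, different document lengths (average length 100):
puts bm25_tf(3, 50, 100)   # shorter document, higher score
puts bm25_tf(3, 100, 100)
puts bm25_tf(3, 200, 100)  # longer document, lower score
```

Holding the count fixed, a longer-than-average document scores lower and a shorter one scores higher. Because the count appears in both the numerator and the denominator, the value saturates toward k1 + 1 (2.2 with these defaults) as the count grows, so repeated occurrences of a term add less and less weight.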