Class: RetrievalLite::Document
- Inherits: Object
  - Object
  - RetrievalLite::Document
- Defined in: lib/retrieval_lite/document.rb
Instance Attribute Summary
-
#content ⇒ Object
readonly
the text of the document.
-
#id ⇒ Object
readonly
the id of the document.
-
#term_frequencies ⇒ Object
readonly
a Hash<String, Integer> mapping each term of the document to its frequency.
Instance Method Summary
-
#frequency_of(term) ⇒ Integer
The number of times a term appears in the document.
-
#initialize(content, opts = {}) ⇒ Document
constructor
tokenizes the text of the document and stores the frequency of each term.
-
#print_tokens ⇒ Object
prints each term with its frequency; intended for debugging.
-
#term_count ⇒ Integer
The total number of unique terms in the document.
-
#terms ⇒ Array<String>
The unique terms of the document.
-
#total_terms ⇒ Integer
The total number of terms (not unique) in the document.
Constructor Details
#initialize(content, opts = {}) ⇒ Document
tokenizes the text of the document and stores the frequency of each term. If opts[:id] is not given, the id defaults to the object's object_id.

# File 'lib/retrieval_lite/document.rb', line 14
def initialize(content, opts = {})
  @content = content
  @id = opts[:id] || object_id
  @term_frequencies = RetrievalLite::Tokenizer.parse_content(content)
end
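To illustrate what the constructor sets up, here is a minimal, self-contained sketch. `SketchTokenizer` and `SketchDocument` are hypothetical stand-ins and are not part of RetrievalLite. The real `RetrievalLite::Tokenizer.parse_content` is not shown on this page and may normalize text differently; the sketch simply assumes lowercasing and splitting on non-word characters.

```ruby
# Hypothetical stand-in for RetrievalLite::Tokenizer (an assumption, not the
# real tokenizer): lowercases the text, extracts runs of word characters,
# and counts how often each one occurs.
module SketchTokenizer
  def self.parse_content(content)
    content.downcase.scan(/\w+/).each_with_object(Hash.new(0)) do |token, counts|
      counts[token] += 1
    end
  end
end

# Hypothetical stand-in mirroring the documented constructor and attributes.
class SketchDocument
  attr_reader :content, :id, :term_frequencies

  def initialize(content, opts = {})
    @content = content
    @id = opts[:id] || object_id   # fall back to Ruby's object_id
    @term_frequencies = SketchTokenizer.parse_content(content)
  end
end

doc = SketchDocument.new("the cat and the hat", id: "doc-1")
anonymous = SketchDocument.new("no explicit id")
```

In this sketch, `doc.id` is the supplied string, while `anonymous.id` falls back to the object's own `object_id`, which is unique only among live objects in the current process.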
Instance Attribute Details
#content ⇒ Object (readonly)
the text of the document
# File 'lib/retrieval_lite/document.rb', line 3
def content
  @content
end
#id ⇒ Object (readonly)
the id of the document
# File 'lib/retrieval_lite/document.rb', line 7
def id
  @id
end
#term_frequencies ⇒ Object (readonly)
a Hash<String, Integer> mapping each term of the document to its frequency

# File 'lib/retrieval_lite/document.rb', line 5
def term_frequencies
  @term_frequencies
end
Instance Method Details
#frequency_of(term) ⇒ Integer
Returns the number of times a term appears in the document.
# File 'lib/retrieval_lite/document.rb', line 39
def frequency_of(term)
  if @term_frequencies.has_key?(term)
    return @term_frequencies[term]
  else
    return 0
  end
end
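The lookup behaviour can be sketched on a plain hash. The `term_frequencies` table and the `frequency_of` lambda below are hypothetical stand-ins for the document's internal state and method, used only so the example runs without the library.

```ruby
# Hypothetical term-frequency table standing in for a document's
# @term_frequencies (an assumption for illustration).
term_frequencies = { "cat" => 2, "hat" => 1 }

# Same lookup logic as frequency_of, written as a standalone lambda.
frequency_of = lambda do |term|
  term_frequencies.key?(term) ? term_frequencies[term] : 0
end

present        = frequency_of.call("cat")  # term is in the table
absent         = frequency_of.call("dog")  # missing term yields 0, not nil
case_sensitive = frequency_of.call("Cat")  # keys are matched exactly
```

Because the lookup is an exact key match, a query term should be passed in the same normalized form the tokenizer produces; what that form is depends on `RetrievalLite::Tokenizer`, which this page does not describe.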
#print_tokens ⇒ Object
prints each term with its frequency, one per line; intended for debugging

# File 'lib/retrieval_lite/document.rb', line 21
def print_tokens
  @term_frequencies.each do |key, value|
    puts "#{key}: #{value}"
  end
end
#term_count ⇒ Integer
Returns the total number of unique terms in the document.
# File 'lib/retrieval_lite/document.rb', line 28
def term_count
  @term_frequencies.size
end
#terms ⇒ Array<String>
Returns the unique terms of the document.
# File 'lib/retrieval_lite/document.rb', line 33
def terms
  @term_frequencies.keys
end
#total_terms ⇒ Integer
Returns the total number of terms (not unique) in the document.
# File 'lib/retrieval_lite/document.rb', line 48
def total_terms
  count = 0
  @term_frequencies.each do |key, value|
    count += value
  end
  return count
end
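The difference between `terms`, `term_count`, and `total_terms` is easiest to see on a small table. The hash below is a hypothetical stand-in for a document's `@term_frequencies`; the three expressions mirror what the documented methods compute, with `values.sum` used as an equivalent of the explicit loop.

```ruby
# Hypothetical term-frequency table (an assumption for illustration),
# e.g. what a tokenizer might produce for "the cat the hat".
term_frequencies = { "the" => 2, "cat" => 1, "hat" => 1 }

terms       = term_frequencies.keys         # the unique terms
term_count  = term_frequencies.size         # how many unique terms
total_terms = term_frequencies.values.sum   # all occurrences, repeats included
```

Here there are three unique terms but four occurrences in total, because "the" appears twice.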