Class: RetrievalLite::Document

Inherits:
Object
Defined in:
lib/retrieval_lite/document.rb


Constructor Details

#initialize(content, opts = {}) ⇒ Document

Creates a new document, tokenizing its text and counting how often each term occurs.

Parameters:

  • content (String)

    the text of the document

  • opts (Hash) (defaults to: {})

    optional arguments to the initializer

Options Hash (opts):

  • :id (String)

    the id of the document; defaults to the object_id assigned by Ruby



# File 'lib/retrieval_lite/document.rb', line 14

def initialize(content, opts = {})
  @content = content
  @id = opts[:id] || object_id
  @term_frequencies = RetrievalLite::Tokenizer.parse_content(content)
end
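The constructor's behaviour can be sketched without the gem. RetrievalLite::Tokenizer.parse_content is not documented on this page, so the SketchTokenizer below is a hypothetical stand-in (it lowercases, splits on runs of non-word characters, and counts tokens with Array#tally, which needs Ruby 2.7+); the real tokenizer may normalize text differently. The :id fallback mirrors the source above.

```ruby
# Hypothetical stand-in for RetrievalLite::Tokenizer.parse_content.
# ASSUMPTION: the real tokenizer may handle case, punctuation, or
# stop words differently.
module SketchTokenizer
  def self.parse_content(content)
    # Lowercase, split on non-word characters, drop empty strings,
    # then count the occurrences of each token.
    content.downcase.split(/\W+/).reject(&:empty?).tally
  end
end

# Minimal sketch of the Document constructor shown above.
class SketchDocument
  attr_reader :content, :id, :term_frequencies

  def initialize(content, opts = {})
    @content = content
    # Fall back to Ruby's object_id when no :id option is given.
    @id = opts[:id] || object_id
    @term_frequencies = SketchTokenizer.parse_content(content)
  end
end

doc = SketchDocument.new("The cat saw the dog.", id: "doc-1")
doc.id               # => "doc-1"
doc.term_frequencies # "the" counted twice, every other term once

anonymous = SketchDocument.new("hello")
anonymous.id == anonymous.object_id # => true
```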

Instance Attribute Details

#content ⇒ Object (readonly)

The text of the document.



# File 'lib/retrieval_lite/document.rb', line 3

def content
  @content
end

#id ⇒ Object (readonly)

The ID of the document.



# File 'lib/retrieval_lite/document.rb', line 7

def id
  @id
end

#term_frequencies ⇒ Object (readonly)

A Hash<String, Integer> mapping each term in the document to the number of times it occurs.



# File 'lib/retrieval_lite/document.rb', line 5

def term_frequencies
  @term_frequencies
end

Instance Method Details

#frequency_of(term) ⇒ Integer

Returns the number of times a term appears in the document.

Parameters:

  • term (String)

Returns:

  • (Integer)

    the number of times a term appears in the document



# File 'lib/retrieval_lite/document.rb', line 39

def frequency_of(term)
  if @term_frequencies.has_key?(term)
    return @term_frequencies[term]
  else
    return 0
  end
end
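The has_key? branch in #frequency_of behaves like Hash#fetch with a default value. The sketch below runs against a plain Hash standing in for term_frequencies; the helper name frequency_in is hypothetical and not part of RetrievalLite.

```ruby
# Hypothetical helper mirroring Document#frequency_of on a plain
# Hash<String, Integer> of term frequencies.
def frequency_in(term_frequencies, term)
  # fetch returns the stored count, or 0 when the term is absent.
  term_frequencies.fetch(term, 0)
end

freqs = { "the" => 2, "cat" => 1 }
frequency_in(freqs, "the") # => 2
frequency_in(freqs, "dog") # => 0
```

Because the lookup is an exact key match, a query term presumably needs the same normalization (for example, lowercasing) that Tokenizer.parse_content applies to the document text; this page does not say what that normalization is.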

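#print_tokens ⇒ Object

Prints each term and its frequency to standard output, one "term: frequency" line per term. Intended for debugging.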



# File 'lib/retrieval_lite/document.rb', line 21

def print_tokens
  @term_frequencies.each do |key, value|
    puts "#{key}: #{value}"
  end
end
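When the token listing is needed as a string (for a log or a test) rather than printed, the same "term: frequency" lines can be built directly. The helper format_tokens below is hypothetical, and a literal Hash stands in for term_frequencies.

```ruby
# Hypothetical helper producing the same "term: frequency" lines that
# Document#print_tokens writes to standard output.
def format_tokens(term_frequencies)
  term_frequencies.map { |key, value| "#{key}: #{value}" }.join("\n")
end

freqs = { "the" => 2, "cat" => 1 }
puts format_tokens(freqs)
# the: 2
# cat: 1
```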

#term_count ⇒ Integer

Returns the total number of unique terms in the document.

Returns:

  • (Integer)

    the total number of unique terms in the document



# File 'lib/retrieval_lite/document.rb', line 28

def term_count
  @term_frequencies.size
end

#terms ⇒ Array<String>

Returns the unique terms of the document.

Returns:

  • (Array<String>)

    the unique terms of the document



# File 'lib/retrieval_lite/document.rb', line 33

def terms
  @term_frequencies.keys
end
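Because term_frequencies is a Ruby Hash, #terms and #term_count are two views of the same keys. Ruby hashes iterate in insertion order, so the order of #terms reflects how Tokenizer.parse_content builds the hash, which this page does not document. The sketch uses a literal Hash as a stand-in.

```ruby
# Term frequencies standing in for Document#term_frequencies.
freqs = { "the" => 2, "cat" => 1, "saw" => 1 }

# Document#terms is freqs.keys: the unique terms, in the order the
# tokenizer inserted them (Ruby hashes preserve insertion order).
terms = freqs.keys      # => ["the", "cat", "saw"]

# Document#term_count is freqs.size, so it always equals terms.size.
term_count = freqs.size # => 3
```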

#total_terms ⇒ Integer

Returns the total number of terms in the document, counting repeated occurrences.

Returns:

  • (Integer)

    the total number of terms in the document, including repeats



# File 'lib/retrieval_lite/document.rb', line 48

def total_terms
  count = 0
  @term_frequencies.each do |key, value|
    count += value
  end
  return count
end
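#total_terms sums every frequency, so repeated terms count each time, while #term_count counts only distinct keys. As a worked example, assuming a tokenizer turns "the cat saw the dog" into five tokens, the hash has four keys but its frequencies sum to 5. The loop in the source is equivalent to values.sum (Array#sum, Ruby 2.4+); a literal Hash stands in for term_frequencies.

```ruby
# Term frequencies for "the cat saw the dog", as a hypothetical
# tokenizer might produce them.
freqs = { "the" => 2, "cat" => 1, "saw" => 1, "dog" => 1 }

# Equivalent to Document#total_terms: the sum of all frequencies.
total_terms = freqs.values.sum # => 5

# Equivalent to Document#term_count: the number of distinct terms.
term_count = freqs.size        # => 4
```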