Class: Document

Inherits:
Object
Defined in:
lib/picolena/templates/app/models/document.rb,
lib/picolena/templates/spec/spec_helper.rb

Overview

The Document class retrieves information about any given document from the filesystem and from the index.

Constructor Details

#initialize(path) ⇒ Document

Instantiates a new Document and ensures that the given path exists and lies within an indexed directory. Raises an error otherwise.



# File 'lib/picolena/templates/app/models/document.rb', line 9

def initialize(path)
  # Ensures @complete_path is an absolute path.
  @complete_path=File.expand_path(path)
  validate_existence_of_file
  validate_in_indexed_directory
end
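The two validation steps can be sketched in plain Ruby. In this sketch the class name `SketchDocument`, the explicit `indexed_dirs` argument and the use of `ArgumentError` are all hypothetical: the real `validate_existence_of_file` and `validate_in_indexed_directory` are not shown on this page, and the indexed directories normally come from config/indexed_directories.yml.

```ruby
require "tmpdir"

# Hypothetical stand-in for Document#initialize and its two validations.
class SketchDocument
  attr_reader :complete_path

  def initialize(path, indexed_dirs)
    # Expand first so that relative paths are compared as absolute ones.
    @complete_path = File.expand_path(path)
    unless File.exist?(@complete_path)
      raise ArgumentError, "#{@complete_path} does not exist"
    end
    # The file must live below one of the (hypothetical) indexed directories.
    inside = indexed_dirs.any? do |dir|
      @complete_path.start_with?(File.expand_path(dir) + File::SEPARATOR)
    end
    raise ArgumentError, "#{@complete_path} is not in an indexed directory" unless inside
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")
  File.write(path, "hello")
  puts SketchDocument.new(path, [dir]).complete_path
end
```

Appending the separator before calling `start_with?` keeps /data/docs2 from being accepted when only /data/docs is indexed.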

Instance Attribute Details

#complete_path ⇒ Object (readonly)

Also known as: to_s

Returns the value of attribute complete_path.



# File 'lib/picolena/templates/app/models/document.rb', line 3

def complete_path
  @complete_path
end

#matching_content ⇒ Object

Returns the value of attribute matching_content.



# File 'lib/picolena/templates/app/models/document.rb', line 4

def matching_content
  @matching_content
end

#score ⇒ Object

Returns the value of attribute score.



# File 'lib/picolena/templates/app/models/document.rb', line 4

def score
  @score
end

Class Method Details

.default_fields_for(complete_path) ⇒ Object

Indexing fields that are shared between every document.



# File 'lib/picolena/templates/app/models/document.rb', line 130

def self.default_fields_for(complete_path)
  doc=Document.new(complete_path)
  {
    :complete_path      => complete_path,
    :probably_unique_id => complete_path.base26_hash,
    :alias_path         => doc.alias_path,
    :filename           => File.basename(complete_path),
    :basename           => File.basename(complete_path, File.extname(complete_path)).gsub(/_/,' '),
    :filetype           => File.extname(complete_path),
    :modified           => File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end
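The filesystem-derived fields can be reproduced with core Ruby alone. This sketch (the method name `default_fields_sketch` is hypothetical) leaves out `:probably_unique_id` and `:alias_path`, which depend on Picolena's own `base26_hash` and `alias_path`, and it does not instantiate a Document.

```ruby
require "tmpdir"

# Filesystem-derived fields only; the two Picolena-specific fields are omitted.
def default_fields_sketch(complete_path)
  extname = File.extname(complete_path)
  {
    complete_path: complete_path,
    filename:      File.basename(complete_path),
    # Underscores are replaced by spaces, as in the original gsub(/_/, ' ').
    basename:      File.basename(complete_path, extname).gsub(/_/, " "),
    filetype:      extname,
    # YYYYMMDDHHMMSS string, the layout later parsed by pretty_date and pretty_mtime.
    modified:      File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "annual_report_2008.odt")
  File.write(path, "")
  p default_fields_sketch(path)
end
```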

.find_by_extension(ext) ⇒ Object



# File 'lib/picolena/templates/spec/spec_helper.rb', line 16

def self.find_by_extension(ext)
  Finder.new("ext:#{ext}").matching_documents.first
end

Instance Method Details

#alias_path ⇒ Object

End users should not always need to know where documents are stored internally, so an alias path can be specified in config/indexed_directories.yml.

For example, with:

"/media/wiki_dump/" : "http://www.mycompany.com/wiki/"

the document

"/media/wiki_dump/organigram.odp"

will be displayed as being:

"http://www.mycompany.com/wiki/organigram.odp"


# File 'lib/picolena/templates/app/models/document.rb', line 48

def alias_path
  original_dir=indexed_directory
  alias_dir=Picolena::IndexedDirectories[original_dir]
  dirname.sub(original_dir,alias_dir)
end
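The aliasing boils down to replacing the indexed-directory prefix with its alias. The sketch below works on the full path so that it reproduces the example above, whereas `alias_path` itself substitutes within `dirname`; the `ALIASES` hash and `displayed_path` are hypothetical, with `ALIASES` mirroring one entry of config/indexed_directories.yml.

```ruby
# Hypothetical mapping: indexed directory => alias shown to end users.
ALIASES = { "/media/wiki_dump/" => "http://www.mycompany.com/wiki/" }

# Replace the indexed-directory prefix of a path by its alias, if any.
def displayed_path(complete_path, aliases = ALIASES)
  original_dir = aliases.keys.find { |dir| complete_path.start_with?(dir) }
  return complete_path unless original_dir
  aliases[original_dir] + complete_path[original_dir.length..-1]
end

p displayed_path("/media/wiki_dump/organigram.odp")
# prints "http://www.mycompany.com/wiki/organigram.odp"
```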

#basename ⇒ Object

Returns the filename without its extension:

"buildings.odt" => "buildings"


# File 'lib/picolena/templates/app/models/document.rb', line 34

def basename
  filename.chomp(extname)
end
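`String#chomp` with an argument removes that suffix only when it is present, so only the last extension is stripped. A sketch, assuming that `extname` behaves like `File.extname` (the helper name `basename_without_extension` is hypothetical):

```ruby
# Strip the last extension, like filename.chomp(extname).
def basename_without_extension(filename)
  filename.chomp(File.extname(filename))
end

p basename_without_extension("buildings.odt")   # prints "buildings"
p basename_without_extension("archive.tar.gz")  # prints "archive.tar"
p basename_without_extension("README")          # prints "README"
```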

#cached ⇒ Object

Cache à la Google: returns the content as it was when the document was indexed.



# File 'lib/picolena/templates/app/models/document.rb', line 84

def cached
  from_index[:content]
end

#content ⇒ Object

Retrieves the content as it is now, by extracting it again from the file.



# File 'lib/picolena/templates/app/models/document.rb', line 78

def content
  PlainTextExtractor.extract_content_from(complete_path)
end
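The difference between `cached` and `content` can be illustrated with an in-memory stand-in for the index. `TinyIndex` is hypothetical, and `File.read` replaces `PlainTextExtractor.extract_content_from`, which only makes sense for plain-text files:

```ruby
require "tmpdir"

# Hypothetical in-memory index: path => fields stored at indexing time.
class TinyIndex
  def initialize
    @docs = {}
  end

  def add(path)
    @docs[path] = { content: File.read(path) }
  end

  def [](path)
    @docs.fetch(path)
  end
end

# Like Document#cached: whatever the index stored.
def cached(index, path)
  index[path][:content]
end

# Like Document#content: read the file as it is right now.
def content(path)
  File.read(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "memo.txt")
  File.write(path, "version 1")
  index = TinyIndex.new
  index.add(path)
  File.write(path, "version 2")  # modified after indexing
  p cached(index, path)          # prints "version 1"
  p content(path)                # prints "version 2"
end
```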

#extractor ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 69

def extractor
  PlainTextExtractor.find_by_extension(self.ext_as_sym) rescue nil
end

#filename ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 20

alias_method :filename, :basename

#has_content? ⇒ Boolean

Was at least one word character extracted from the document? Views use this value to decide whether a link to the content should be displayed.

Returns:

  • (Boolean)


# File 'lib/picolena/templates/app/models/document.rb', line 156

def has_content?
  cached =~ /\w/
end
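Since `String#=~` returns the position of the first match or nil, `has_content?` actually returns an Integer or nil, which views treat as truthy or falsy. A sketch with an explicit `!!` conversion, in case a strict Boolean is wanted (the argument replaces the `cached` method here):

```ruby
# String#=~ returns the index of the first match, or nil when there is none.
def has_content?(cached)
  !!(cached =~ /\w/)  # force a true/false result
end

p("  \n\t " =~ /\w/)           # prints nil
p("-- page 1 --" =~ /\w/)      # prints 3
p has_content?("-- page 1 --") # prints true
```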

#highlighted_cache(raw_query) ⇒ Object

Returns the cached content with matching terms surrounded by ‘<<’ and ‘>>’.



# File 'lib/picolena/templates/app/models/document.rb', line 89

def highlighted_cache(raw_query)
  excerpts=Indexer.index.highlight(Query.extract_from(raw_query), doc_id,
                          :field => :content, :excerpt_length => :all,
                          :pre_tag => "<<", :post_tag => ">>"
           )
  excerpts.is_an?(Array) ? excerpts.first : ""
end
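The method delegates highlighting to the index's own `highlight` call, which works from the parsed query and the indexed terms. As a rough illustration of the output format only, here is a swapped-in technique: a plain regular-expression highlighter (the name `highlight` and its keyword arguments are hypothetical) that wraps whole-word, case-insensitive matches of literal terms. Unlike the index, it knows nothing about analyzers or stemming.

```ruby
# Wrap each whole-word, case-insensitive occurrence of the terms in << >>.
def highlight(text, terms, pre_tag: "<<", post_tag: ">>")
  return text if terms.empty?
  pattern = /\b(?:#{terms.map { |t| Regexp.escape(t) }.join("|")})\b/i
  text.gsub(pattern) { |match| "#{pre_tag}#{match}#{post_tag}" }
end

p highlight("Picolena indexes documents. Indexes are rebuilt nightly.", ["indexes"])
# prints "Picolena <<indexes>> documents. <<Indexes>> are rebuilt nightly."
```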

#icon_path ⇒ Object

Returns the thumbnail path if a thumbnail is available, and the MIME-type icon path otherwise (nil if no icon matches the file extension).



# File 'lib/picolena/templates/app/models/document.rb', line 144

def icon_path
  if File.exist?(thumbnail_path)
    thumbnail_path(:public_dir)
  else
    icon_symbol=Picolena::FiletypeToIconSymbol[ext_as_sym]
    "icons/#{icon_symbol}.png" if icon_symbol
  end
end
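The fallback logic can be sketched with a hypothetical extension-to-icon table standing in for `Picolena::FiletypeToIconSymbol`; the two path arguments stand in for `thumbnail_path` and `thumbnail_path(:public_dir)`. The sketch uses `File.exist?`, since the `File.exists?` alias was removed in Ruby 3.2.

```ruby
require "tmpdir"

# Hypothetical extension => icon symbol table.
ICON_SYMBOLS = { pdf: :pdf, odt: :text_document }

# thumbnail_file: where the thumbnail would be on disk;
# public_thumbnail: the path handed to the view when it exists.
def icon_path_sketch(thumbnail_file, public_thumbnail, ext_as_sym)
  return public_thumbnail if File.exist?(thumbnail_file)
  icon_symbol = ICON_SYMBOLS[ext_as_sym]
  "icons/#{icon_symbol}.png" if icon_symbol  # nil when no icon is known
end

p icon_path_sketch("/no/such/thumbnail.png", "thumbnails/abc.png", :pdf)  # prints "icons/pdf.png"
p icon_path_sketch("/no/such/thumbnail.png", "thumbnails/abc.png", :xyz)  # prints nil
Dir.mktmpdir do |dir|
  thumb = File.join(dir, "abc.png")
  File.write(thumb, "")
  p icon_path_sketch(thumb, "thumbnails/abc.png", :pdf)  # prints "thumbnails/abc.png"
end
```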

#inspect ⇒ Object

Returns the complete path, followed by the matching score and the language when they are available:

../spec/test_dirs/indexed/just_one_doc/for_test.txt (56.3%) (language:en)

This is used, for example, by:

rake index:search query="some query"


# File 'lib/picolena/templates/app/models/document.rb', line 28

def inspect
  [self,("(#{pretty_score})" if @score),("(language:#{language})" if language)].compact.join(" ")
end
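The `[...].compact.join(" ")` idiom drops the optional parts that evaluate to nil. A sketch with hypothetical free functions in place of the instance methods; `pretty_score` here writes `%%` to get a literal percent sign in the format string:

```ruby
# Score in [0, 1] formatted as a percentage with one decimal.
def pretty_score(score)
  format("%3.1f%%", score * 100)
end

# Path first, then score and language only when they are known.
def inspect_line(path, score = nil, language = nil)
  [path,
   ("(#{pretty_score(score)})" if score),
   ("(language:#{language})" if language)].compact.join(" ")
end

p inspect_line("../spec/test_dirs/indexed/just_one_doc/for_test.txt", 0.563, "en")
# prints "../spec/test_dirs/indexed/just_one_doc/for_test.txt (56.3%) (language:en)"
p inspect_line("notes.txt")  # prints "notes.txt"
```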

#language ⇒ Object

Returns the language detected at indexing time, if any.



# File 'lib/picolena/templates/app/models/document.rb', line 120

def language
  from_index[:language]
end

#mime ⇒ Object



# File 'lib/picolena/templates/app/models/document.rb', line 73

def mime
  extractor.mime_name rescue 'application/octet-stream'
end
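`mime` relies on a modifier `rescue`: when `extractor` is nil (it rescues to nil itself), calling `mime_name` on nil raises `NoMethodError`, which the rescue turns into the generic 'application/octet-stream'. A sketch with a hypothetical extractor table, plus an explicit variant using safe navigation (Ruby 2.3+) that does not hide unrelated errors:

```ruby
# Hypothetical extractors that know their MIME type.
Extractor = Struct.new(:mime_name)
EXTRACTORS = { pdf: Extractor.new("application/pdf") }

def mime_for(ext_as_sym)
  extractor = EXTRACTORS[ext_as_sym]  # nil when no extractor is registered
  # nil.mime_name raises NoMethodError; the modifier rescue returns the fallback.
  extractor.mime_name rescue "application/octet-stream"
end

# Same result without rescue, so other exceptions still propagate.
def mime_for_explicit(ext_as_sym)
  EXTRACTORS[ext_as_sym]&.mime_name || "application/octet-stream"
end

p mime_for(:pdf)               # prints "application/pdf"
p mime_for(:unknown)           # prints "application/octet-stream"
p mime_for_explicit(:unknown)  # prints "application/octet-stream"
```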

#mtime ⇒ Object

Returns the document's last modification time, as recorded when it was indexed, as a YYYYMMDDHHMMSS integer:

>> doc.mtime
=> 20080509093951


# File 'lib/picolena/templates/app/models/document.rb', line 115

def mtime
  from_index[:modified].to_i
end

#pretty_date ⇒ Object

Returns the document's last modification date, as recorded when it was indexed. This is useful to know how old a document is and which version the cache corresponds to:

>> doc.pretty_date
=> "2008-05-09"


# File 'lib/picolena/templates/app/models/document.rb', line 101

def pretty_date
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})\d{6}/,'\1-\2-\3')
end

#pretty_mtime ⇒ Object

Returns the document's last modification time, as recorded when it was indexed:

>> doc.pretty_mtime
=> "2008-05-09 09:39:51"


# File 'lib/picolena/templates/app/models/document.rb', line 108

def pretty_mtime
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/,'\1-\2-\3 \4:\5:\6')
end
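`mtime`, `pretty_date` and `pretty_mtime` are plain string manipulations of the stored YYYYMMDDHHMMSS value. The sketch below rebuilds that value from a hypothetical modification time and applies the same substitutions as free functions; `pretty_mtime_via_time` is an alternative based on `Time.strptime`, not what the methods do.

```ruby
require "time"

# The :modified field as stored at indexing time (YYYYMMDDHHMMSS string).
def modified_field(time)
  time.strftime("%Y%m%d%H%M%S")
end

def mtime(modified)
  modified.to_i
end

def pretty_date(modified)
  modified.sub(/(\d{4})(\d{2})(\d{2})\d{6}/, '\1-\2-\3')
end

def pretty_mtime(modified)
  modified.sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/, '\1-\2-\3 \4:\5:\6')
end

# Alternative: parse the string back into a Time and format that.
def pretty_mtime_via_time(modified)
  Time.strptime(modified, "%Y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
end

modified = modified_field(Time.new(2008, 5, 9, 9, 39, 51))
p mtime(modified)         # prints 20080509093951
p pretty_date(modified)   # prints "2008-05-09"
p pretty_mtime(modified)  # prints "2008-05-09 09:39:51"
```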

#pretty_score ⇒ Object

Returns the matching score as a percentage, e.g. 56.3%.



# File 'lib/picolena/templates/app/models/document.rb', line 125

def pretty_score
  # "%%" yields a literal percent sign; a lone trailing "%" is an incomplete format specifier.
  "%3.1f%%" % (@score*100)
end

#probably_unique_id ⇒ Object

Returns an id for this document. This id is used in controllers to build short URLs. Since it is a base26 hash of the absolute filename, it can only be “probably unique”. For a large number of indexed documents, it would be wise to increase HashLength in config/custom/picolena.rb.



# File 'lib/picolena/templates/app/models/document.rb', line 58

def probably_unique_id
  @probably_unique_id||=complete_path.base26_hash
end
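Picolena's `String#base26_hash` is not shown on this page, so the following is only a hypothetical stand-in illustrating the idea: digest the absolute path, write the digest in base 26 with the letters a to z, and keep `HASH_LENGTH` letters. With 10 letters there are 26^10 (about 1.4 × 10^14) possible ids; by the birthday bound, collisions become plausible once the number of documents approaches the square root of that figure (roughly 10^7), and each extra letter multiplies the id space by 26.

```ruby
require "digest/md5"

HASH_LENGTH = 10  # hypothetical counterpart of HashLength

# Hypothetical base26 hash, not Picolena's actual implementation.
def base26_hash(string, length = HASH_LENGTH)
  n = Digest::MD5.hexdigest(string).to_i(16)
  letters = []
  length.times do
    n, remainder = n.divmod(26)
    letters << ("a".ord + remainder).chr  # least significant digit first
  end
  letters.join
end

p base26_hash("/media/wiki_dump/organigram.odp")  # ten lowercase letters
```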

#supported? ⇒ Boolean

Returns true iff some PlainTextExtractor has been defined to convert it to plain text.

Document.new("presentation.pdf").supported? => true
Document.new("presentation.some_weird_extension").supported? => false

Returns:

  • (Boolean)


# File 'lib/picolena/templates/app/models/document.rb', line 65

def supported?
  PlainTextExtractor.supported_extensions.include?(self.ext_as_sym) unless ext_as_sym==:no_extension and !plain_text?
end
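A sketch of the extension check, using a hypothetical `SUPPORTED_EXTENSIONS` list in place of `PlainTextExtractor.supported_extensions`. The `ext_as_sym` helper below (lower-cased extension without the dot, or `:no_extension`) is an assumption about how the real method behaves, and the sketch ignores the original's special case for extension-less plain-text files.

```ruby
# Hypothetical list of extensions for which an extractor exists.
SUPPORTED_EXTENSIONS = [:pdf, :odt, :doc, :txt]

# Assumed behaviour: lower-cased extension without the dot, or :no_extension.
def ext_as_sym(path)
  ext = File.extname(path).delete_prefix(".").downcase
  ext.empty? ? :no_extension : ext.to_sym
end

def supported?(path)
  SUPPORTED_EXTENSIONS.include?(ext_as_sym(path))
end

p supported?("presentation.pdf")                    # prints true
p supported?("presentation.some_weird_extension")   # prints false
```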