Class: Document
Defined in:
- lib/picolena/templates/app/models/document.rb
- lib/picolena/templates/spec/spec_helper.rb
Overview
The Document class retrieves information from the filesystem and the index for any given document.
Instance Attribute Summary
-
#complete_path ⇒ Object
(also: #to_s)
readonly
Returns the value of attribute complete_path.
-
#matching_content ⇒ Object
Returns the value of attribute matching_content.
-
#score ⇒ Object
Returns the value of attribute score.
Class Method Summary
-
.default_fields_for(complete_path) ⇒ Object
Indexing fields that are shared between every document.
- .find_by_extension(ext) ⇒ Object
Instance Method Summary
-
#alias_path ⇒ Object
End users should not always know where documents are stored internally.
-
#basename ⇒ Object
Returns filename without extension, e.g. “buildings.odt” => “buildings”.
-
#cached ⇒ Object
Cache à la Google.
-
#content ⇒ Object
Retrieves content as it is now.
- #extractor ⇒ Object
- #filename ⇒ Object
-
#has_content? ⇒ Boolean
Did at least one letter get extracted from the document? This boolean is used in views to decide whether a link to the content should be displayed.
-
#highlighted_cache(raw_query) ⇒ Object
Returns cached content with matching terms between ‘<<’ ‘>>’.
-
#icon_path ⇒ Object
Returns thumbnail if available, mime icon otherwise.
-
#initialize(path) ⇒ Document
constructor
Instantiates a new Document, and ensures that the given path exists and is included in an indexed directory.
-
#inspect ⇒ Object
Returns complete path as well as matching score and language if available.
-
#language ⇒ Object
Returns found language, if any.
- #mime ⇒ Object
-
#mtime ⇒ Object
Returns the last modification time before the document got indexed, as a YYYYMMDDHHMMSS integer.
-
#pretty_date ⇒ Object
Returns the last modification date before the document got indexed.
-
#pretty_mtime ⇒ Object
Returns the last modification time before the document got indexed.
-
#pretty_score ⇒ Object
Returns matching score as a percentage, e.g. 56.3%.
-
#probably_unique_id ⇒ Object
Returns an id for this document.
-
#supported? ⇒ Boolean
Returns true iff some PlainTextExtractor has been defined to convert it to plain text.
Constructor Details
#initialize(path) ⇒ Document
Instantiates a new Document, and ensures that the given path exists and is included in an indexed directory. Raises otherwise.
# File 'lib/picolena/templates/app/models/document.rb', line 9
def initialize(path)
  # To ensure @complete_path is an absolute path.
  @complete_path=File.expand_path(path)
  validate_existence_of_file
  validate_in_indexed_directory
end
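The two validation helpers are not listed on this page. A minimal sketch of a constructor with the same contract follows, assuming a hypothetical INDEXED_DIRS list in place of Picolena's configuration and a hypothetical SketchDocument class; the exceptions raised here are illustrative choices, not necessarily the ones Picolena raises.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the configured indexed directories.
INDEXED_DIRS = [Dir.mktmpdir]

class SketchDocument
  attr_reader :complete_path

  def initialize(path)
    # File.expand_path turns a relative path into an absolute one.
    @complete_path = File.expand_path(path)

    # Refuse paths that do not point to an existing regular file.
    raise Errno::ENOENT, @complete_path unless File.file?(@complete_path)

    # Refuse files that live outside every indexed directory.
    # File.join(dir, "") appends a trailing separator, so "/data" does
    # not accidentally match "/database/report.txt".
    indexed = INDEXED_DIRS.any? do |dir|
      @complete_path.start_with?(File.join(File.expand_path(dir), ""))
    end
    raise ArgumentError, "#{@complete_path} is not in an indexed directory" unless indexed
  end
end
```

Validating in the constructor means every Document instance that exists can be assumed to be readable and indexed, so later methods do not need to re-check.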
Instance Attribute Details
#complete_path ⇒ Object (readonly) Also known as: to_s
Returns the value of attribute complete_path.
# File 'lib/picolena/templates/app/models/document.rb', line 3
def complete_path
  @complete_path
end
#matching_content ⇒ Object
Returns the value of attribute matching_content.
# File 'lib/picolena/templates/app/models/document.rb', line 4
def matching_content
  @matching_content
end
#score ⇒ Object
Returns the value of attribute score.
# File 'lib/picolena/templates/app/models/document.rb', line 4
def score
  @score
end
Class Method Details
.default_fields_for(complete_path) ⇒ Object
Indexing fields that are shared between every document.
# File 'lib/picolena/templates/app/models/document.rb', line 130
def self.default_fields_for(complete_path)
  doc=Document.new(complete_path)
  {
    :complete_path => complete_path,
    :probably_unique_id => complete_path.base26_hash,
    :alias_path => doc.alias_path,
    :filename => File.basename(complete_path),
    :basename => File.basename(complete_path, File.extname(complete_path)).gsub(/_/,' '),
    :filetype => File.extname(complete_path),
    :modified => File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end
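To see what the file-derived fields look like, here is a small sketch that uses only Ruby's File methods. The helper name file_fields_for is hypothetical, and :probably_unique_id and :alias_path are left out because they depend on Picolena internals.

```ruby
require 'tempfile'

# Builds the fields that can be derived from the file itself.
def file_fields_for(complete_path)
  ext = File.extname(complete_path)
  {
    :complete_path => complete_path,
    :filename      => File.basename(complete_path),
    # Underscores become spaces, so that each word can be matched.
    :basename      => File.basename(complete_path, ext).gsub(/_/, ' '),
    :filetype      => ext,
    # Same YYYYMMDDHHMMSS stamp as the :modified field above.
    :modified      => File.mtime(complete_path).strftime("%Y%m%d%H%M%S")
  }
end

# Try it on a throwaway file.
Tempfile.create(['annual_report', '.odt']) do |file|
  p file_fields_for(file.path)
end
```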
Instance Method Details
#alias_path ⇒ Object
End users should not always know where documents are stored internally. An alias path can be specified in config/indexed_directories.yml.
For example, with:
"/media/wiki_dump/" : "http://www.mycompany.com/wiki/"
The document
"/media/wiki_dump/organigram.odp"
will be displayed as being:
"http://www.mycompany.com/wiki/organigram.odp"
# File 'lib/picolena/templates/app/models/document.rb', line 48
def alias_path
  original_dir=indexed_directory
  alias_dir=Picolena::IndexedDirectories[original_dir]
  dirname.sub(original_dir,alias_dir)
end
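The prefix substitution can be tried in isolation. This sketch assumes a hypothetical INDEXED_DIRECTORIES hash standing in for the contents of config/indexed_directories.yml, and a hypothetical helper named aliased that works on a full file path.

```ruby
# Hypothetical mapping from internal directory to public alias.
INDEXED_DIRECTORIES = {
  "/media/wiki_dump/" => "http://www.mycompany.com/wiki/"
}

# Replaces the indexed-directory prefix of +path+ with its alias.
# Paths outside every indexed directory are returned unchanged.
def aliased(path)
  original_dir, alias_dir = INDEXED_DIRECTORIES.find { |dir, _| path.start_with?(dir) }
  original_dir ? path.sub(original_dir, alias_dir) : path
end

puts aliased("/media/wiki_dump/organigram.odp")
```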
#basename ⇒ Object
Returns filename without extension
"buildings.odt" => "buildings"
# File 'lib/picolena/templates/app/models/document.rb', line 34
def basename
  filename.chomp(extname)
end
#cached ⇒ Object
Cache à la Google. Returns content as it was at the time it was indexed.
# File 'lib/picolena/templates/app/models/document.rb', line 84
def cached
  from_index[:content]
end
#content ⇒ Object
Retrieves content as it is now.
# File 'lib/picolena/templates/app/models/document.rb', line 78
def content
  PlainTextExtractor.extract_content_from(complete_path)
end
#extractor ⇒ Object
# File 'lib/picolena/templates/app/models/document.rb', line 69
def extractor
  PlainTextExtractor.find_by_extension(self.ext_as_sym) rescue nil
end
#filename ⇒ Object
# File 'lib/picolena/templates/app/models/document.rb', line 20
alias_method :filename, :basename
#has_content? ⇒ Boolean
Did at least one letter get extracted from the document? This boolean is used in views to decide whether a link to the content should be displayed.
# File 'lib/picolena/templates/app/models/document.rb', line 156
def has_content?
  cached =~ /\w/
end
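Note that `=~` returns the position of the first match or nil, so the method above yields a truthy or falsy value rather than a strict true or false. That is enough for a view condition. Where a real Boolean is wanted, a double negation does the job, as in this sketch with a hypothetical helper name:

```ruby
# Returns true if +text+ contains at least one word character,
# false otherwise (including when +text+ is nil).
def content_extracted?(text)
  !!(text =~ /\w/)
end

p "report" =~ /\w/             # position of the first word character
p content_extracted?("report")
p content_extracted?(" \n\t")
```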
#highlighted_cache(raw_query) ⇒ Object
Returns cached content with matching terms between ‘<<’ ‘>>’.
# File 'lib/picolena/templates/app/models/document.rb', line 89
def highlighted_cache(raw_query)
  excerpts=Indexer.index.highlight(Query.extract_from(raw_query), doc_id,
                                   :field => :content,
                                   :excerpt_length => :all,
                                   :pre_tag => "<<",
                                   :post_tag => ">>"
  )
  excerpts.is_an?(Array) ? excerpts.first : ""
end
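The effect of the tags can be illustrated without an index. The sketch below is plain string matching with a hypothetical highlight helper; it does not reproduce how the index analyzes the query, it only shows the shape of the output.

```ruby
# Wraps every whole-word, case-insensitive occurrence of the given
# terms in << >>.
def highlight(text, terms)
  return text if terms.empty?
  # Escape each term so that punctuation is matched literally.
  pattern = /\b(?:#{terms.map { |t| Regexp.escape(t) }.join('|')})\b/i
  text.gsub(pattern) { |match| "<<#{match}>>" }
end

puts highlight("Picolena indexes documents", ["documents"])
```

The markers are plain text rather than HTML, which leaves it to the view to decide how matches are rendered.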
#icon_path ⇒ Object
Returns thumbnail if available, mime icon otherwise.
# File 'lib/picolena/templates/app/models/document.rb', line 144
def icon_path
  if File.exists?(thumbnail_path) then
    thumbnail_path(:public_dir)
  else
    icon_symbol=Picolena::FiletypeToIconSymbol[ext_as_sym]
    "icons/#{icon_symbol}.png" if icon_symbol
  end
end
#inspect ⇒ Object
Returns complete path as well as matching score and language if available.
../spec/test_dirs/indexed/just_one_doc/for_test.txt (56.3%) (language:en)
Used for example by
rake index:search query="some query"
# File 'lib/picolena/templates/app/models/document.rb', line 28
def inspect
  [self,("(#{pretty_score})" if @score),("(language:#{language})" if language)].compact.join(" ")
end
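The method relies on a common Ruby idiom: an expression with a trailing `if` evaluates to nil when the condition fails, and `compact` then drops those nils before joining. A standalone sketch, with a hypothetical describe helper that takes the values as arguments:

```ruby
# Joins the path with whichever optional parts are present.
def describe(path, pretty_score = nil, language = nil)
  [
    path,
    ("(#{pretty_score})" if pretty_score),   # nil when no score is known
    ("(language:#{language})" if language)   # nil when no language was found
  ].compact.join(" ")
end

puts describe("for_test.txt", "56.3%", "en")
puts describe("for_test.txt")
```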
#language ⇒ Object
Returns found language, if any.
# File 'lib/picolena/templates/app/models/document.rb', line 120
def language
  from_index[:language]
end
#mime ⇒ Object
# File 'lib/picolena/templates/app/models/document.rb', line 73
def mime
  extractor.mime_name rescue 'application/octet-stream'
end
#mtime ⇒ Object
Returns the last modification time before the document got indexed, as a YYYYMMDDHHMMSS integer.
>> doc.mtime
=> 20080509093951
# File 'lib/picolena/templates/app/models/document.rb', line 115
def mtime
  from_index[:modified].to_i
end
#pretty_date ⇒ Object
Returns the last modification date before the document got indexed. Useful to know how old a document is, and to which version the cache corresponds.
>> doc.pretty_date
=> "2008-05-09"
# File 'lib/picolena/templates/app/models/document.rb', line 101
def pretty_date
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})\d{6}/,'\1-\2-\3')
end
#pretty_mtime ⇒ Object
Returns the last modification time before the document got indexed.
>> doc.pretty_mtime
=> "2008-05-09 09:39:51"
# File 'lib/picolena/templates/app/models/document.rb', line 108
def pretty_mtime
  from_index[:modified].sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/,'\1-\2-\3 \4:\5:\6')
end
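The three time-related methods all read the same 14-digit :modified string and differ only in how they present it. The sketch below walks one stamp through each form; the variable names are illustrative, and the timestamp is the one used in the examples above.

```ruby
# The stamp is written at indexing time with strftime.
stamp = Time.utc(2008, 5, 9, 9, 39, 51).strftime("%Y%m%d%H%M%S")

# mtime: the same digits as an integer, which sorts chronologically.
as_integer = stamp.to_i

# pretty_date: keep year, month and day; drop the six time digits.
date = stamp.sub(/(\d{4})(\d{2})(\d{2})\d{6}/, '\1-\2-\3')

# pretty_mtime: keep all six groups and add separators.
time = stamp.sub(/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/, '\1-\2-\3 \4:\5:\6')

puts stamp, as_integer, date, time
```

Single-quoted replacement strings keep the backslashes in `\1` intact, so the back-references reach `sub` unmodified.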
#pretty_score ⇒ Object
Returns matching score as a percentage, e.g. 56.3%
# File 'lib/picolena/templates/app/models/document.rb', line 125
def pretty_score
  "%3.1f%" % (@score*100)
end
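A standalone version of the same formatting, taking the score as an argument. This sketch writes the literal percent sign as `%%`, which is the documented way to emit one in a Ruby format string; the listing above uses a single trailing `%` instead.

```ruby
# Formats a score between 0 and 1 as a percentage with one decimal.
def pretty_score(score)
  "%3.1f%%" % (score * 100)
end

puts pretty_score(0.563)
puts pretty_score(1.0)
```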
#probably_unique_id ⇒ Object
Returns an id for this document. This id is used in Controllers in order to get tiny URLs. Since it’s a base26 hash of the absolute filename, it can only be “probably unique”. For a very large number of indexed documents, it would be wise to increase HashLength in config/custom/picolena.rb.
# File 'lib/picolena/templates/app/models/document.rb', line 58
def probably_unique_id
  @probably_unique_id||=complete_path.base26_hash
end
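The base26_hash extension itself is not listed on this page. The sketch below is one possible way to derive a fixed-length, letters-only id from a path. The MD5 digest, the HASH_LENGTH constant and the standalone function are all assumptions made for illustration, not Picolena's implementation.

```ruby
require 'digest/md5'

# Hypothetical default, playing the role of HashLength.
HASH_LENGTH = 10

# Maps +string+ to +length+ lowercase letters. Different strings can
# collide, which is why such an id is only "probably unique".
def base26_hash(string, length = HASH_LENGTH)
  # Reduce the 128-bit digest to the range of a length-letter id.
  number = Digest::MD5.hexdigest(string).to_i(16) % (26**length)
  # Peel off base-26 digits, least significant first.
  letters = Array.new(length) do
    number, digit = number.divmod(26)
    (97 + digit).chr # 97 is the character code of "a"
  end
  letters.reverse.join
end

puts base26_hash("/media/wiki_dump/organigram.odp")
```

With this scheme there are 26**length possible ids, so each extra letter multiplies the id space by 26 and makes collisions correspondingly less likely.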
#supported? ⇒ Boolean
# File 'lib/picolena/templates/app/models/document.rb', line 65
def supported?
  PlainTextExtractor.supported_extensions.include?(self.ext_as_sym) unless ext_as_sym==:no_extension and !plain_text?
end