Class: Cloudlib::Entry
- Inherits:
- Object
- Defined in:
- lib/cloudlib.rb
Overview
A library entry, including content and metadata. An entry has a name (which is also the key of the associated S3 object) and an attributes hash. The name has the form “sha1.ext”, where sha1 is the SHA1 hash of the file’s contents and ext is the file extension. Because the name is derived from the contents, two files with identical contents and the same extension map to the same entry, so the library cannot hold duplicate copies of a file. The attributes hash contains the following fields:
- extension - file extension, including the leading dot (.)
- size - size of contents (bytes)
- date-added - date entry was added to library
- entry_type - article, book, chapter, incollection, unpublished
- authors - list of authors
- editors - list of editors
- title - title of entry
- booktitle - title of book containing entry
- year - publication year of entry
- publisher - publisher of book
- address - publication address
- journal - journal containing entry
- volume - volume number of journal
- pages - page range of entry in book or journal
- keywords - keywords
- doi - DOI for entry
- url - URL for entry
- comments - miscellaneous comments
- *_lowercase - lowercase version of *
- *_words - lowercase version of *, split into a list of words
- all_words - list of words in title, authors, editors, booktitle, keywords
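The naming scheme described in the overview can be sketched with the Ruby standard library alone. This is a minimal illustration, not Cloudlib code: `entry_name_for` is a hypothetical helper, and the temporary files stand in for real documents.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper (not part of Cloudlib): derive an entry name of the
# form "sha1.ext" from a file's bytes and the extension of a filename.
def entry_name_for(path, filename = path)
  sha1 = Digest::SHA1.file(path).hexdigest
  "#{sha1}#{File.extname(filename)}"
end

# Two temporary files with identical contents and the same extension.
first = Tempfile.new(['paper', '.pdf'])
second = Tempfile.new(['copy', '.pdf'])
[first, second].each do |file|
  file.write('same bytes')
  file.flush
end

# Both files map to the same name, so they would be stored as one entry.
puts entry_name_for(first.path)
puts entry_name_for(first.path) == entry_name_for(second.path)
```

Note that the extension is part of the name, so the same bytes saved under a different extension would produce a different entry name.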
Instance Attribute Summary
- #attributes ⇒ Object
  Returns the value of attribute attributes.
- #name ⇒ Object
  Returns the value of attribute name.
Class Method Summary
- .connect(library_name = ENV['CLOUDLIB_LIBRARY_NAME'], aws_access_key_id = ENV['AWS_ACCESS_KEY_ID'], aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY'], debug = false) ⇒ Object
  Establish connections to the S3 file store and the SimpleDB database.
- .create_library ⇒ Object
  Create the S3 bucket and SimpleDB domain that will store the library entries.
- .delete_library ⇒ Object
  Delete the S3 bucket and SimpleDB domain that store the library entries.
- .fields(entry_type = '*') ⇒ Object
  Returns an array of the field keywords appropriate for a type of entry.
- .find_by_name(name) ⇒ Object
  Return an entry with the specified name.
- .from_file(path, filename = path, attributes = {'all_words' => []}) ⇒ Object
  Creates and saves an entry from a file, using attributes supplied.
- .query(query_string, numitems = 10, token = nil) ⇒ Object
  Queries the database and returns a list [token, entries].
Instance Method Summary
- #delete ⇒ Object
  Deletes the entry.
- #download(path) ⇒ Object
  Downloads the entry and saves it to path.
- #fields ⇒ Object
  Returns the fields appropriate for an entry.
- #friendly_filename ⇒ Object
  Returns a human-friendly filename for the entry, constructed from authors and title.
- #initialize(name, attributes = {'all_words' => []}) ⇒ Entry (constructor)
  Creates a new entry object.
- #save ⇒ Object
  Saves the entry (metadata only; contents are saved by the from_file method).
- #set_attribute(attribute, ans) ⇒ Object
  Sets the specified metadata attribute to ans.
- #show_attribute(attribute) ⇒ Object
  Returns a string representation of an attribute.
- #to_bibtex ⇒ Object
  Returns a bibtex entry for the entry.
- #to_s ⇒ Object
  Returns a string representation of the entry’s metadata.
- #url ⇒ Object
  Returns a temporary URL for the entry’s contents (expires after 10 minutes).
Constructor Details
#initialize(name, attributes = {'all_words' => []}) ⇒ Entry
Creates a new entry object. To create an entry with contents, use Entry.from_file.
    # File 'lib/cloudlib.rb', line 92
    def initialize(name, attributes={'all_words' => []})
      @name = name
      @attributes = attributes
    end
Instance Attribute Details
#attributes ⇒ Object
Returns the value of attribute attributes.
    # File 'lib/cloudlib.rb', line 69
    def attributes
      @attributes
    end
#name ⇒ Object
Returns the value of attribute name.
    # File 'lib/cloudlib.rb', line 69
    def name
      @name
    end
Class Method Details
.connect(library_name = ENV['CLOUDLIB_LIBRARY_NAME'], aws_access_key_id = ENV['AWS_ACCESS_KEY_ID'], aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY'], debug = false) ⇒ Object
Establish connections to the S3 file store and the SimpleDB database. If values are not supplied for the parameters, they will default to the values of the environment variables CLOUDLIB_LIBRARY_NAME, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. Note that library_name is the name of both the S3 bucket that will hold the contents of the entries and the SimpleDB domain that will hold the metadata.
    # File 'lib/cloudlib.rb', line 77
    def self.connect(library_name=ENV['CLOUDLIB_LIBRARY_NAME'],
                     aws_access_key_id=ENV['AWS_ACCESS_KEY_ID'],
                     aws_secret_access_key=ENV['AWS_SECRET_ACCESS_KEY'],
                     debug = false)
      @@aws_access_key_id = aws_access_key_id
      @@aws_secret_access_key = aws_secret_access_key
      AWS::S3::Base.establish_connection!(:access_key_id => @@aws_access_key_id,
                                          :secret_access_key => @@aws_secret_access_key,
                                          :use_ssl => true)
      @@bucket = library_name
      logger = Logger.new(STDERR)
      logger.level = if debug then Logger::DEBUG else Logger::WARN end
      @@db = AwsSdb::Service.new(:access_key_id => @@aws_access_key_id,
                                 :secret_access_key => @@aws_secret_access_key,
                                 :use_ssl => true,
                                 :logger => logger)
    end
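The environment-variable defaults used by connect can be illustrated in isolation. The sketch below is an assumption-laden example rather than Cloudlib code: `resolve_settings` is a hypothetical helper that uses the same three default expressions and additionally reports which settings are missing, a check the listing above does not perform.

```ruby
# Hypothetical helper (not part of Cloudlib): resolve the same defaults that
# Entry.connect uses and fail early if any of them is missing or empty.
def resolve_settings(library_name = ENV['CLOUDLIB_LIBRARY_NAME'],
                     aws_access_key_id = ENV['AWS_ACCESS_KEY_ID'],
                     aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY'])
  settings = {
    'CLOUDLIB_LIBRARY_NAME' => library_name,
    'AWS_ACCESS_KEY_ID' => aws_access_key_id,
    'AWS_SECRET_ACCESS_KEY' => aws_secret_access_key
  }
  # Collect the names of settings that were neither passed nor set in ENV.
  missing = settings.select { |_, value| value.nil? || value.empty? }.keys
  raise ArgumentError, "missing settings: #{missing.join(', ')}" unless missing.empty?
  settings
end
```

Ruby evaluates default arguments at call time, so a value exported in the environment before the call is picked up, while an explicit argument always takes precedence.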
.create_library ⇒ Object
Create the S3 bucket and SimpleDB domain that will store the library entries. This method should be run once to create the library.
    # File 'lib/cloudlib.rb', line 99
    def self.create_library
      AWS::S3::Bucket.create(@@bucket)
      @@db.create_domain(@@bucket)
    end
.delete_library ⇒ Object
Delete the S3 bucket and SimpleDB domain that store the library entries. All data will be lost.
    # File 'lib/cloudlib.rb', line 106
    def self.delete_library
      AWS::S3::Bucket.delete(@@bucket, :force => true)
      @@db.delete_domain(@@bucket)
    end
.fields(entry_type = '*') ⇒ Object
Returns an array of the field keywords appropriate for a type of entry.
    # File 'lib/cloudlib.rb', line 344
    def self.fields(entry_type='*')
      fields = [:title, :authors, :year]
      case entry_type
      when 'article'
        fields += [:journal, :volume, :pages]
      when 'book'
        fields += [:publisher, :address]
      when 'chapter'
        fields += [:booktitle, :chapter, :publisher, :address, :pages]
      when 'incollection'
        fields += [:booktitle, :chapter, :publisher, :address, :editors, :pages]
      when '*'
        fields += [:journal, :volume, :booktitle, :editors, :chapter, :publisher, :address, :pages]
      end
      fields += [:keywords, :url, :doi, :comments]
      return fields
    end
.find_by_name(name) ⇒ Object
Return an entry with the specified name. Raises an error if not found.
    # File 'lib/cloudlib.rb', line 126
    def self.find_by_name(name)
      attributes = @@db.get_attributes(@@bucket, name)
      if attributes == {} then raise "Item not found." end
      Entry.new(name, attributes)
    end
.from_file(path, filename = path, attributes = {'all_words' => []}) ⇒ Object
Creates and saves an entry from a file, using attributes supplied. Returns the entry.
    # File 'lib/cloudlib.rb', line 113
    def self.from_file(path, filename=path, attributes={'all_words' => []})
      sha1 = Digest::SHA1.file(path).hexdigest
      ext = File.extname(filename)
      name = "#{sha1}#{ext}"
      attributes['size'] = File.size(path).to_s
      attributes['date-added'] = Date.today.to_s
      entry = Entry.new(name, attributes)
      AWS::S3::S3Object.store(name, open(path), @@bucket)
      @@db.put_attributes(@@bucket, name, attributes, replace=true)
      return entry
    end
.query(query_string, numitems = 10, token = nil) ⇒ Object
Queries the database and returns a list [token, entries]. entries is a list of up to numitems Entry objects that match the query. If more than numitems entries match, token will be nonempty and can be passed to a subsequent call to retrieve the remaining entries.
The query string can contain one or more words. If a word is preceded by ti=, only entries that match it in the title will be returned. Similarly, au= searches authors, jo= journals, pu= publishers, ad= addresses, ed= editors, bo= booktitle (for collections), and ye= years. ye> and ye< may also be used. The form ti='word1 word2' may also be used; entries will only match if their titles contain both word1 and word2.
    # File 'lib/cloudlib.rb', line 144
    def self.query(query_string, numitems=10, token=nil)
      # each part is [whole, key, comparison, val]; bare words only set whole
      query_parts = query_string.downcase.scan(/((ti(?:tle)?|au(?:thors?)?|jo(?:urnal)?|bo(?:oktitle)?|pu(?:blisher)?|ad(?:dress)?|ed(?:itors?)?|ye(?:ar)?)\s*([<=>])\s*('[^']*'|"[^"]*"|\S*)|\S+)\s*/)
      query = query_parts.reject {|part| part[0] == '*'}.map do |part|
        whole, key, comparison, val = part
        if val then val = val.gsub(/^['"](.*)['"]$/, "\\1") end
        if not val then val = whole end
        key_full = if key
                     case key[0..1]
                     when 'ti'
                       'title'
                     when 'au'
                       'authors'
                     when 'jo'
                       'journal'
                     when 'bo'
                       'booktitle'
                     when 'pu'
                       'publisher'
                     when 'ad'
                       'address'
                     when 'ed'
                       'editors'
                     when 'ye'
                       'year'
                     else
                       'all'
                     end
                   else
                     'all'
                   end
        # split hyphenated names into components, since a query might just have one
        vals = val.split(/[-[:space:]]+/)
        vals.map do |v|
          if key_full == 'year' # there is no year_words field
            "['year' #{comparison} '#{v}']"
          else
            v_escaped = v.gsub(/\\/,"\\\\\\\\").gsub(/'/,"\\\\'")
            "['#{key_full}_words' = '#{v_escaped}']"
          end
        end.join(" intersection ")
      end.join(" intersection ")
      # note: query has to include year in order to sort by year
      # hence this dummy search
      if query.empty?
        query = "['year' starts-with ''] sort 'year'"
      else
        query += " intersection ['year' starts-with ''] sort 'year'"
      end
      names, token = if token
                       @@db.query(@@bucket, query, numitems, token)
                     else
                       @@db.query(@@bucket, query, numitems)
                     end
      entries = names.map do |name|
        attributes = @@db.get_attributes(@@bucket, name)
        Entry.new(name, attributes)
      end
      return token, entries
    end
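To make the query syntax concrete, here is a simplified, standalone sketch of how a query string might be broken into terms. It is not the library's parser: `FIELD_NAMES`, `QUERY_TERM` and `parse_query` are names introduced for illustration, only the two-letter prefixes are recognised, and no SimpleDB query is built.

```ruby
# Two-letter prefixes from the description above, mapped to attribute names.
FIELD_NAMES = {
  'ti' => 'title', 'au' => 'authors', 'jo' => 'journal', 'pu' => 'publisher',
  'ad' => 'address', 'ed' => 'editors', 'bo' => 'booktitle', 'ye' => 'year'
}.freeze

# Either prefix + comparison + (quoted or bare) value, or a bare word.
QUERY_TERM = /(ti|au|jo|pu|ad|ed|bo|ye)\s*([<=>])\s*('[^']*'|"[^"]*"|\S+)|(\S+)/

# Hypothetical helper: returns one hash per term in the query string.
def parse_query(query_string)
  query_string.downcase.scan(QUERY_TERM).map do |key, comparison, value, bare|
    field = key ? FIELD_NAMES.fetch(key) : 'all'
    # Strip quotes, then split on whitespace and hyphens so that every
    # word becomes its own search term.
    words = (key ? value.delete("'\"") : bare).split(/[-\s]+/)
    { field: field, op: comparison || '=', words: words }
  end
end

parse_query("ti='word1 word2' ye>1990 smith-jones").each { |term| p term }
```

Each resulting word would correspond to one bracketed predicate in the generated query, and all predicates are combined with intersection, which is why a multi-word title search only matches entries containing every word.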
Instance Method Details
#delete ⇒ Object
Deletes the entry.
    # File 'lib/cloudlib.rb', line 211
    def delete
      AWS::S3::S3Object.delete(self.name, @@bucket)
      @@db.delete_attributes(@@bucket, self.name)
    end
#download(path) ⇒ Object
Downloads the entry and saves it to path. If a file already exists at path, it is first backed up as path~. Returns path.
    # File 'lib/cloudlib.rb', line 223
    def download(path)
      if File.exist?(path)
        STDERR.puts "Backing up existing #{path} as #{path}~"
        FileUtils.copy_file(path, "#{path}~", preserve=true)
      end
      open(path, 'w') do |outfile|
        open(self.url, 'r') do |source|
          FileUtils.copy_stream(source, outfile)
        end
      end
      return path
    end
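The back-up-then-overwrite behaviour of download can be tried without any network access. The following is a minimal sketch under that assumption: `write_with_backup` is a hypothetical helper that takes the new contents as a string instead of streaming them from S3.

```ruby
require 'fileutils'

# Hypothetical helper (not part of Cloudlib): write +contents+ to +path+,
# first copying any existing file to "path~" in the way download does.
def write_with_backup(path, contents)
  if File.exist?(path)
    # preserve: true keeps the original file's modification time and mode.
    FileUtils.cp(path, "#{path}~", preserve: true)
  end
  File.open(path, 'wb') { |outfile| outfile.write(contents) }
  path
end
```

Only one backup generation is kept: a second overwrite replaces the earlier path~ file.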
#fields ⇒ Object
Returns the fields appropriate for an entry.
    # File 'lib/cloudlib.rb', line 364
    def fields
      entry_type = self.show_attribute('entry_type')
      Entry.fields(entry_type)
    end
#friendly_filename ⇒ Object
Returns a human-friendly filename for the entry, constructed from authors and title.
    # File 'lib/cloudlib.rb', line 203
    def friendly_filename
      authors = self.attributes['authors'].map {|a| last_name(a)}.join('_')
      title = self.show_attribute('title').gsub(/[,.\/[:space:]]+/,'_')
      ext = File.extname(self.name)
      return "#{authors}_#{title}#{ext}"
    end
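As an illustration of the resulting filenames, the sketch below rebuilds the same string outside the class. The `last_name` helper is not shown on this page, so the version here (the last whitespace-separated token of the name) is purely an assumption, as is the standalone `friendly_filename_for` function.

```ruby
# Assumed behaviour only: the real last_name helper is not documented here.
def last_name(author)
  author.split.last
end

# Hypothetical standalone version of friendly_filename.
def friendly_filename_for(authors, title, entry_name)
  author_part = authors.map { |a| last_name(a) }.join('_')
  # Runs of commas, periods, slashes and whitespace collapse to one underscore.
  title_part = title.gsub(/[,.\/[:space:]]+/, '_')
  "#{author_part}_#{title_part}#{File.extname(entry_name)}"
end

puts friendly_filename_for(['Alan Turing'], 'On Computable Numbers', '0a1b2c.pdf')
# prints "Turing_On_Computable_Numbers.pdf"
```

Other punctuation in the title, such as colons, passes through unchanged, so the result is not guaranteed to be valid on every filesystem.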
#save ⇒ Object
Saves the entry (metadata only; contents are saved by the from_file method).
    # File 'lib/cloudlib.rb', line 218
    def save
      @@db.put_attributes(@@bucket, self.name, self.attributes, replace=true)
    end
#set_attribute(attribute, ans) ⇒ Object
Sets the specified metadata attribute to ans. ans is assumed to be a regular string. It will be split by “ and ” for authors and editors, or by spaces for keywords.
    # File 'lib/cloudlib.rb', line 305
    def set_attribute(attribute, ans)
      index = ['title', 'authors', 'editors', 'booktitle'].member?(attribute)
      if ans.nil? || ans.empty?
        self.attributes[attribute] = nil
      else
        newval = if attribute == 'editors' || attribute == 'authors'
                   ans.split(" and ").map {|a| a.strip}
                 elsif attribute == 'keywords'
                   ans.split
                 else
                   [ans.strip]
                 end
        self.attributes[attribute] = newval
        unless ['url', 'doi', 'keywords'].member?(attribute)
          self.attributes[attribute + "_lowercase"] = newval.map {|a| a.downcase}
          self.attributes[attribute + "_words"] = self.attributes[attribute + "_lowercase"].map {|a| a.split(/[[:punct:]]*[[:space:]]+|-+/)}.flatten.reject {|a| a.empty?}
        end
        # recalculate all_words
        tit_auth_words = ['title', 'authors', 'editors', 'booktitle'].map {|att| self.attributes[att + "_words"] || []}.flatten
        keywords = self.attributes['keywords'] || []
        self.attributes['all_words'] = keywords + tit_auth_words
      end
    end
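The derived *_lowercase and *_words values are easiest to see on a concrete input. This sketch reuses the split pattern from the listing above in a standalone function; `derived_fields` is a name introduced here for illustration and is not part of the library.

```ruby
# Hypothetical helper: compute the lowercase and word-list forms of a list
# of attribute values, using the same split pattern as set_attribute.
def derived_fields(values)
  lowercase = values.map(&:downcase)
  words = lowercase.flat_map { |v| v.split(/[[:punct:]]*[[:space:]]+|-+/) }
                   .reject(&:empty?)
  { 'lowercase' => lowercase, 'words' => words }
end

# An authors string is first split on " and ", as set_attribute does.
authors = 'Jean-Paul Sartre and Simone de Beauvoir'.split(' and ').map(&:strip)
p derived_fields(authors)
```

Hyphenated names are broken into their components, which is what allows a query for a single component to match the whole name.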
#show_attribute(attribute) ⇒ Object
Returns a string representation of an attribute.
    # File 'lib/cloudlib.rb', line 330
    def show_attribute(attribute)
      value = self.attributes[attribute]
      if value.nil?
        ""
      elsif attribute == 'keywords'
        value.join(' ')
      elsif attribute == 'editors' || attribute == 'authors'
        value.join(' and ')
      else
        value[0]
      end
    end
#to_bibtex ⇒ Object
Returns a bibtex entry for the entry.
    # File 'lib/cloudlib.rb', line 237
    def to_bibtex
      pairs = self.fields.map do |field|
        if self.attributes[field.to_s]
          sprintf(" %-15s: {%s}", field.to_s, self.show_attribute(field.to_s))
        else
          nil
        end
      end
      pairs += [sprintf(" %-15s: {%s}", "file", self.name)]
      authors = self.attributes['authors'].map {|a| last_name(a)}.join('.')
      year = self.attributes['year']
      entry_type = self.show_attribute('entry_type') || 'unknown'
      if entry_type == 'chapter' then entry_type = 'inbook' end
      entry_key = "#{authors}:#{year}"
      "@#{entry_type.upcase}{#{entry_key},\n#{pairs.join(",\n")}\n}"
    end
#to_s ⇒ Object
Returns a string representation of the entry’s metadata.
    # File 'lib/cloudlib.rb', line 255
    def to_s
      authors = self.show_attribute('authors')
      unless authors.empty?
        authors = "#{authors}, "
      end
      title = "#{self.show_attribute('title')}"
      year = self.show_attribute('year')
      titleyear = if year.empty?
                    title + ". "
                  else
                    title + " (#{year}). "
                  end
      pubaddr = [self.show_attribute('address'), self.show_attribute('publisher')].reject {|x| x.empty?}.join(": ")
      chapter = self.show_attribute('chapter')
      pages = self.show_attribute('pages')
      booktitle = self.show_attribute('booktitle')
      editors = self.show_attribute('editors')
      journal = self.show_attribute('journal')
      volume = self.show_attribute('volume')
      rest = case self.show_attribute('entry_type')
             when 'article'
               if journal.empty?
                 ""
               else
                 "#{journal} #{volume}" + if pages.empty? then "." else ", #{pages}." end
               end
             when 'book'
               if pubaddr.empty? then "" else "#{pubaddr}." end
             when 'chapter'
               if pubaddr.empty? then "" else "#{pubaddr}." end +
                 if chapter.empty? then "" else " Chapter #{chapter}." end +
                 if pages.empty? then "" else " #{pages}." end
             when 'incollection'
               "In " +
                 if editors.empty? then "" else editors + " (eds.), " end +
                 booktitle +
                 if pubaddr.empty? then "" else " (#{pubaddr})." end +
                 if chapter.empty? then "" else " Chapter #{chapter}." end +
                 if pages.empty? then "" else " #{pages}." end
             when 'unpublished'
               " (unpublished)."
             else
               ""
             end
      return authors + titleyear + rest
    end
#url ⇒ Object
Returns a temporary URL for the entry’s contents. The URL expires after 10 minutes.
    # File 'lib/cloudlib.rb', line 369
    def url
      AWS::S3::S3Object.find(self.name, @@bucket).url(:expires_in => 60 * 10) # expires in 10 min
    end