Class: Cloudlib::Entry

Inherits:
Object
Defined in:
lib/cloudlib.rb

Overview

A library entry, including content and metadata. An entry has a name (which is also the key of the associated S3 object) and an attributes hash. The name has the form “sha1.ext”, where sha1 is the SHA1 hash of the file's contents and ext is the file extension. Because the name is derived from the contents, two files with identical contents and the same extension always map to the same entry, so the library cannot hold duplicate copies of them. The attributes hash contains the following fields:

  • extension - file extension, including the leading dot

  • size - size of contents (bytes)

  • date-added - date entry was added to library

  • entry_type - article, book, chapter, incollection, unpublished

  • authors - list of authors

  • editors - list of editors

  • title - title of entry

  • booktitle - title of book containing entry

  • year - publication year of entry

  • publisher - publisher of book

  • address - publication address

  • journal - journal containing entry

  • volume - volume number of journal

  • pages - page range of entry in book or journal

  • keywords - keywords

  • doi - DOI for entry

  • url - URL for entry

  • comments - miscellaneous comments

  • *_lowercase - lowercase version of *

  • *_words - lowercase version of *, split into a list of words

  • all_words - list of words in title, authors, editors, booktitle, keywords
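
The naming rule described above can be sketched as follows. This is a minimal illustration rather than the library's own code path: entry_name_for and demo_entry_name are hypothetical helpers, and the temporary file merely stands in for a real document.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper: builds an entry name as "<sha1 of contents><extension>".
# The extension is taken from filename, which defaults to path.
def entry_name_for(path, filename = path)
  sha1 = Digest::SHA1.file(path).hexdigest
  "#{sha1}#{File.extname(filename)}"
end

# Hypothetical demo: writes a small temporary .pdf file and returns its entry name.
def demo_entry_name
  Tempfile.create(['paper', '.pdf']) do |f|
    f.write('hello')
    f.flush
    entry_name_for(f.path)
  end
end

puts demo_entry_name
```

Two files with the same bytes and the same extension yield the same name, which is why storing one of them again simply overwrites the existing S3 object.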

Constructor Details

#initialize(name, attributes = {'all_words' => []}) ⇒ Entry

Creates a new entry object. To create an entry with contents, use Entry.from_file.



# File 'lib/cloudlib.rb', line 92

def initialize(name, attributes={'all_words' => []})
  @name = name
  @attributes = attributes
end

Instance Attribute Details

#attributes ⇒ Object

Returns the value of attribute attributes.



# File 'lib/cloudlib.rb', line 69

def attributes
  @attributes
end

#name ⇒ Object

Returns the value of attribute name.



# File 'lib/cloudlib.rb', line 69

def name
  @name
end

Class Method Details

.connect(library_name = ENV['CLOUDLIB_LIBRARY_NAME'], aws_access_key_id = ENV['AWS_ACCESS_KEY_ID'], aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY'], debug = false) ⇒ Object

Establish connections to the S3 file store and the SimpleDB database. If values are not supplied for the parameters, they will default to the values of the environment variables CLOUDLIB_LIBRARY_NAME, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY. Note that library_name is the name of both the S3 bucket that will hold the contents of the entries and the SimpleDB domain that will hold the metadata.



# File 'lib/cloudlib.rb', line 77

def self.connect(library_name=ENV['CLOUDLIB_LIBRARY_NAME'],
                 aws_access_key_id=ENV['AWS_ACCESS_KEY_ID'],
                 aws_secret_access_key=ENV['AWS_SECRET_ACCESS_KEY'],
                 debug = false)
  @@aws_access_key_id = aws_access_key_id
  @@aws_secret_access_key = aws_secret_access_key
  AWS::S3::Base.establish_connection!(:access_key_id => @@aws_access_key_id, :secret_access_key => @@aws_secret_access_key, :use_ssl => true)
  @@bucket = library_name
  logger = Logger.new(STDERR)
  logger.level = if debug then Logger::DEBUG else Logger::WARN end
  @@db = AwsSdb::Service.new(:access_key_id => @@aws_access_key_id, :secret_access_key => @@aws_secret_access_key, :use_ssl => true, :logger => logger)
end

.create_library ⇒ Object

Create the S3 bucket and SimpleDB domain that will store the library entries. This method should be run once to create the library.



# File 'lib/cloudlib.rb', line 99

def self.create_library
  AWS::S3::Bucket.create(@@bucket)
  @@db.create_domain(@@bucket)
end

.delete_library ⇒ Object

Delete the S3 bucket and SimpleDB domain that store the library entries. All data will be lost.



# File 'lib/cloudlib.rb', line 106

def self.delete_library
  AWS::S3::Bucket.delete(@@bucket, :force => true)
  @@db.delete_domain(@@bucket)
end

.fields(entry_type = '*') ⇒ Object

Returns an array of the field names (as symbols) appropriate for the given entry type. With the default '*', the fields for all entry types are returned.



# File 'lib/cloudlib.rb', line 344

def self.fields(entry_type='*')
  fields = [:title, :authors, :year]
  case entry_type
  when 'article'
    fields += [:journal, :volume, :pages]
  when 'book'
    fields += [:publisher, :address]
  when 'chapter'
    fields += [:booktitle, :chapter, :publisher, :address, :pages]
  when 'incollection'
    fields += [:booktitle, :chapter, :publisher, :address, :editors, :pages]
  when '*'
    fields += [:journal, :volume, :booktitle, :editors, :chapter,
               :publisher, :address, :pages]
  end
  fields += [:keywords, :url, :doi, :comments]
  return fields
end

.find_by_name(name) ⇒ Object

Return an entry with the specified name. Raises an error if not found.



# File 'lib/cloudlib.rb', line 126

def self.find_by_name(name)
  attributes = @@db.get_attributes(@@bucket, name)
  if attributes == {} then raise "Item not found." end
  Entry.new(name, attributes)
end

.from_file(path, filename = path, attributes = {'all_words' => []}) ⇒ Object

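Creates and saves an entry from the file at path, using the attributes supplied. The entry's extension is taken from filename, which defaults to path. The size and date-added attributes are set automatically. Returns the entry.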



# File 'lib/cloudlib.rb', line 113

def self.from_file(path, filename=path, attributes={'all_words' => []})
  sha1 = Digest::SHA1.file(path).hexdigest
  ext  = File.extname(filename)
  name = "#{sha1}#{ext}"
  attributes['size'] = File.size(path).to_s
  attributes['date-added'] = Date.today.to_s
  entry = Entry.new(name, attributes)
  AWS::S3::S3Object.store(name, open(path), @@bucket)
  @@db.put_attributes(@@bucket, name, attributes, true)  # true: replace existing values
  return entry
end

.query(query_string, numitems = 10, token = nil) ⇒ Object

Queries the database and returns a list [token, entries]. entries is a list of up to numitems Entry objects that match the query. If more than numitems entries match, token will be nonempty and can be passed in on a subsequent call to retrieve the remaining entries.

The query string can contain one or more words. A word with no prefix is matched against all_words. If a word is preceded by ti=, only entries that match it in the title will be returned. Similarly, au= searches authors, jo= journals, pu= publishers, ad= addresses, ed= editors, bo= booktitle (for collections), and ye= years. ye> and ye< may also be used. The form ti='word1 word2' may also be used; entries will only match if their titles contain both word1 and word2.
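
The query syntax can be illustrated with a small tokenizer. This is a simplified sketch, not the method's actual implementation: QUERY_PART and parse_query are hypothetical names, only the two-letter prefixes are recognized, and no SimpleDB query is built.

```ruby
# Hypothetical, simplified pattern: either "<prefix><op><value>" or a bare word.
QUERY_PART = /(?:(ti|au|jo|bo|pu|ad|ed|ye)\s*([<=>])\s*('[^']*'|"[^"]*"|\S+)|(\S+))\s*/

# Splits a query string into hashes of key, comparison operator, and words.
# Bare words get the key 'all', mirroring a search over all_words.
def parse_query(query_string)
  query_string.downcase.scan(QUERY_PART).map do |key, comparison, value, bare|
    if key
      # Strip surrounding quotes, then split on spaces and hyphens.
      words = value.gsub(/\A['"]|['"]\z/, '').split(/[-[:space:]]+/)
      { key: key, comparison: comparison, words: words }
    else
      { key: 'all', comparison: '=', words: bare.split(/[-[:space:]]+/) }
    end
  end
end

parse_query("ti='cloud storage' ye>2005 Smith").each { |part| p part }
```

Each word in a part must match, which corresponds to the requirement that a quoted multi-word value such as ti='word1 word2' matches only titles containing every word.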



# File 'lib/cloudlib.rb', line 144

def self.query(query_string, numitems=10, token=nil)
  query_parts = query_string.downcase.scan(/((ti(?:tle)?|au(?:thors?)?|jo(?:urnal)?|bo(?:oktitle)?|pu(?:blisher)?|ad(?:dress)?|ed(?:itors?)?|ye(?:ar)?)\s*([<=>])\s*('[^']*'|"[^"]*"|\S*)|\S+)\s*/)
  query = query_parts.reject {|part| part[0] == '*'}.map do |part|
    whole, key, comparison, val = part
    if val then val = val.gsub(/^['"](.*)['"]$/, "\\1") end
    if not val then val = whole end
    key_full = if key
                 case key[0..1]
                 when 'ti'
                  'title'
                 when 'au'
                  'authors'
                 when 'jo'
                  'journal'
                 when 'bo'
                  'booktitle'
                 when 'pu'
                  'publisher'
                 when 'ad'
                  'address'
                 when 'ed'
                  'editors'
                 when 'ye'
                  'year'
                 else 'all'
                 end
               else
                 'all'
               end
    # split hyphenated names into components, since a query might just have one
    vals = val.split(/[-[:space:]]+/)
    vals.map do |v|
      if key_full == 'year'    # there is no year_words field
         "['year' #{comparison} '#{v}']"
      else
         v_escaped = v.gsub(/\\/,"\\\\\\\\").gsub(/'/,"\\\\'")
         "['#{key_full}_words' = '#{v_escaped}']"
      end
    end.join(" intersection ")
  end.join(" intersection ")
  # note: query has to include year in order to sort by year
  # hence this dummy search
  if query.empty?
     query = "['year' starts-with ''] sort 'year'"
  else
     query += " intersection ['year' starts-with ''] sort 'year'"
  end
  names, token = if token
                   @@db.query(@@bucket, query, numitems, token)
                 else
                   @@db.query(@@bucket, query, numitems)
                 end
  entries = names.map do |name|
    attributes = @@db.get_attributes(@@bucket, name)
    Entry.new(name, attributes)
  end
  return token, entries
end
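
The token returned alongside the entries supports paging through a large result set. The loop below is a sketch under stated assumptions: fake_query is a hypothetical in-memory stand-in with the same [token, entries] return shape, its token is just an offset, and the end of the results is assumed to be signalled by a nil or empty token.

```ruby
# Hypothetical stand-in for a paged query over an in-memory list.
# Returns [next_token, page]; next_token is nil when nothing remains.
def fake_query(items, numitems = 10, token = nil)
  start = token ? token.to_i : 0
  page = items[start, numitems] || []
  next_token = start + numitems < items.length ? (start + numitems).to_s : nil
  [next_token, page]
end

# Collects every page by feeding each returned token into the next call.
def fetch_all(items, numitems)
  results = []
  token = nil
  loop do
    token, page = fake_query(items, numitems, token)
    results.concat(page)
    break if token.nil? || token.empty?
  end
  results
end

p fetch_all((1..7).to_a, 3)
```

With the real method, the call inside the loop would be the query itself, passing the same query string and numitems each time.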

Instance Method Details

#delete ⇒ Object

Deletes the entry: both its contents in S3 and its metadata in SimpleDB.



# File 'lib/cloudlib.rb', line 211

def delete
  AWS::S3::S3Object.delete(self.name, @@bucket)
  @@db.delete_attributes(@@bucket, self.name)
end

#download(path) ⇒ Object

Downloads the entry's contents and saves them to path. If a file already exists at path, it is first backed up as path~. Returns path.



# File 'lib/cloudlib.rb', line 223

def download(path)
  if File.exist?(path)
    STDERR.puts "Backing up existing #{path} as #{path}~"
    FileUtils.copy_file(path, "#{path}~", preserve=true)
  end
  open(path, 'w') do |outfile|
    open(self.url, 'r') do |source|
      FileUtils.copy_stream(source, outfile)
    end
  end
  return path
end

#fields ⇒ Object

Returns the fields appropriate for an entry.



# File 'lib/cloudlib.rb', line 364

def fields
  entry_type = self.show_attribute('entry_type')
  Entry.fields(entry_type)
end

#friendly_filename ⇒ Object

Returns a human-friendly filename for the entry, constructed from authors and title.



# File 'lib/cloudlib.rb', line 203

def friendly_filename
  # Guard against entries that have no authors attribute
  authornames = (self.attributes['authors'] || []).map {|a| last_name(a)}.join('_')
  title = self.show_attribute('title').gsub(/[,.\/[:space:]]+/,'_')
  ext = File.extname(self.name)
  return "#{authornames}_#{title}#{ext}"
end
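
The filename construction can be tried in isolation with the sketch below. The last_name helper used by the method is not shown on this page, so last_name_guess is a hypothetical substitute that assumes names are written either "First Last" or "Last, First"; friendly_filename_for is likewise an illustrative name.

```ruby
# Hypothetical substitute for the library's last_name helper.
def last_name_guess(author)
  author.include?(',') ? author.split(',').first.strip : author.split.last
end

# Joins author last names and a sanitized title, keeping the entry's extension.
def friendly_filename_for(authors, title, name)
  authornames = authors.map { |a| last_name_guess(a) }.join('_')
  # Collapse runs of commas, periods, slashes, and whitespace into one underscore.
  safe_title = title.gsub(/[,.\/[:space:]]+/, '_')
  "#{authornames}_#{safe_title}#{File.extname(name)}"
end

puts friendly_filename_for(['Jane Doe', 'Smith, John'], 'Cloud Storage, Revisited', 'abc123.pdf')
```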

#save ⇒ Object

Saves the entry (metadata only; contents are saved by the from_file method).



# File 'lib/cloudlib.rb', line 218

def save
  @@db.put_attributes(@@bucket, self.name, self.attributes, true)  # true: replace existing values
end

#set_attribute(attribute, ans) ⇒ Object

Sets the specified metadata attribute to ans. ans is assumed to be a regular string. It will be split by “ and ” for authors and editors, or by spaces for keywords.



# File 'lib/cloudlib.rb', line 305

def set_attribute(attribute, ans)
  if ans.nil? || ans.empty?
    self.attributes[attribute] = nil
  else
    newval = if attribute == 'editors' || attribute == 'authors'
                ans.split(" and ").map {|a| a.strip}
             elsif attribute == 'keywords'
                ans.split
             else
                [ans.strip]
             end
    self.attributes[attribute] = newval
    unless ['url', 'doi', 'keywords'].member?(attribute)
      self.attributes[attribute + "_lowercase"] = newval.map {|a| a.downcase}
      self.attributes[attribute + "_words"] = self.attributes[attribute + "_lowercase"].map {|a| a.split(/[[:punct:]]*[[:space:]]+|-+/)}.flatten.reject {|a| a.empty?}
    end
    # recalculate all_words
    tit_auth_words = ['title', 'authors', 'editors', 'booktitle'].map {|att| self.attributes[att + "_words"] || []}.flatten
    keywords = self.attributes['keywords'] || []
    self.attributes['all_words'] = keywords + tit_auth_words
  end
end
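
The splitting and indexing rules can be sketched independently of the class. split_attribute and index_words are hypothetical helpers that mirror the rules described above; they do not touch an attributes hash or the database.

```ruby
# Splits a raw string into a list of values, depending on the attribute.
def split_attribute(attribute, ans)
  case attribute
  when 'authors', 'editors' then ans.split(' and ').map(&:strip)
  when 'keywords' then ans.split
  else [ans.strip]
  end
end

# Lowercases each value and breaks it into words on whitespace
# (dropping punctuation just before it) and on hyphens.
def index_words(values)
  values.map(&:downcase)
        .flat_map { |v| v.split(/[[:punct:]]*[[:space:]]+|-+/) }
        .reject(&:empty?)
end

authors = split_attribute('authors', 'Jane Doe and John Smith-Jones')
p authors
p index_words(authors)
```

Splitting hyphenated names into their parts lets a query for either half of a double-barrelled surname match the entry.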

#show_attribute(attribute) ⇒ Object

Returns a string representation of an attribute.



# File 'lib/cloudlib.rb', line 330

def show_attribute(attribute)
  value = self.attributes[attribute]
  if value.nil?
     ""
  elsif attribute == 'keywords'
     value.join(' ')
  elsif attribute == 'editors' || attribute == 'authors'
     value.join(' and ')
  else
     value[0]
  end
end

#to_bibtex ⇒ Object

Returns a BibTeX entry for the entry.



# File 'lib/cloudlib.rb', line 237

def to_bibtex
  # Format each populated field as a BibTeX "name = {value}" pair; skip empty fields
  pairs = self.fields.map do |field|
    if self.attributes[field.to_s]
       sprintf("  %-15s = {%s}", field.to_s, self.show_attribute(field.to_s))
    else
       nil
    end
  end.compact
  pairs += [sprintf("  %-15s = {%s}", "file", self.name)]
  authornames = (self.attributes['authors'] || []).map {|a| last_name(a)}.join('.')
  year = self.show_attribute('year')
  # show_attribute returns "" (not nil) for a missing attribute
  entry_type = self.show_attribute('entry_type')
  if entry_type.empty? then entry_type = 'unknown' end
  if entry_type == 'chapter' then entry_type = 'inbook' end
  entry_key = "#{authornames}:#{year}"
  "@#{entry_type.upcase}{#{entry_key},\n#{pairs.join(",\n")}\n}"
end
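
For reference, BibTeX fields are written as name = {value} pairs separated by commas. The sketch below shows that layout in isolation; bibtex_entry is a hypothetical helper that takes plain strings rather than an Entry, and the sample key and values are made up for illustration.

```ruby
# Hypothetical helper: renders a BibTeX record from an entry type, a citation
# key, and a hash of field name => string value. Empty fields are skipped.
def bibtex_entry(entry_type, key, fields)
  pairs = fields.reject { |_, v| v.nil? || v.empty? }
                .map { |k, v| format('  %-15s = {%s}', k, v) }
  "@#{entry_type.upcase}{#{key},\n#{pairs.join(",\n")}\n}"
end

puts bibtex_entry('article', 'Doe:2009',
                  'title' => 'Cloud Storage', 'year' => '2009', 'doi' => nil)
```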

#to_s ⇒ Object

Returns a string representation of the entry’s metadata.



# File 'lib/cloudlib.rb', line 255

def to_s
  authors = self.show_attribute('authors')
  unless authors.empty?
    authors = "#{authors}, "
  end
  title = "#{self.show_attribute('title')}"
  year = self.show_attribute('year')
  titleyear = if year.empty?
                 title + ". "
              else
                 title + " (#{year}). "
              end
  pubaddr = [self.show_attribute('address'),
             self.show_attribute('publisher')].reject {|x| x.empty?}.join(": ")
  chapter = self.show_attribute('chapter')
  pages = self.show_attribute('pages')
  booktitle = self.show_attribute('booktitle')
  editors = self.show_attribute('editors')
  journal = self.show_attribute('journal')
  volume = self.show_attribute('volume')
  rest = case self.show_attribute('entry_type')
         when 'article'
           if journal.empty?
              ""
           else
              "#{journal} #{volume}" +
              if pages.empty? then "." else ", #{pages}." end
           end
         when 'book'
           if pubaddr.empty? then "" else "#{pubaddr}." end
         when 'chapter'
           if pubaddr.empty? then "" else "#{pubaddr}." end +
           if chapter.empty? then "" else " Chapter #{chapter}." end +
           if pages.empty? then "" else " #{pages}." end
         when 'incollection'
           "In " +
           if editors.empty? then "" else editors + " (eds.), " end +
           booktitle +
           if pubaddr.empty? then "" else " (#{pubaddr})." end +
           if chapter.empty? then "" else " Chapter #{chapter}." end +
           if pages.empty? then "" else " #{pages}." end
         when 'unpublished'
           " (unpublished)."
         else ""
         end
  return authors + titleyear + rest
end

#url ⇒ Object

Returns a temporary URL for the entry's contents in S3. The URL expires after 10 minutes.



# File 'lib/cloudlib.rb', line 369

def url
  AWS::S3::S3Object.find(self.name, @@bucket).url(:expires_in => 60 * 10)  # expires in 10 min
end