Class: WebpageArchivist::WebpageArchivist

Inherits:
Object
Defined in:
lib/webpage-archivist/webpage-archivist.rb

Overview

Entry point for the Web Archivist features. Database configuration relies on the DATABASE_uri environment variable; see sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html for details of the connection string syntax.
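The connection string is handed to Sequel. As a hedged illustration, assuming a server-style URL of the form adapter://user:password@host/database (file-based SQLite URLs are laid out differently and are not handled here), the sketch below reads DATABASE_uri and fails early when it is missing. The database_settings helper is hypothetical and is not part of the library.

```ruby
require 'uri'

# Hypothetical helper, not part of webpage-archivist: reads the connection
# string the archivist relies on and raises a clear error if it is missing.
# Assumes a server-style URL such as "postgres://user:secret@localhost/archive".
def database_settings(env = ENV)
  raw = env['DATABASE_uri']
  raise ArgumentError, 'DATABASE_uri is not set' if raw.nil? || raw.empty?

  uri = URI.parse(raw) # raises URI::InvalidURIError on malformed input
  {
    adapter: uri.scheme,                  # e.g. "postgres"
    host: uri.host,                       # e.g. "localhost"
    database: uri.path.sub(%r{\A/}, '')   # path without its leading slash
  }
end

p database_settings('DATABASE_uri' => 'postgres://user:secret@localhost/archive')
```

Passing the environment as a parameter keeps the helper testable without touching the real ENV.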

Instance Method Details

#add_webpage(uri, name) ⇒ Object

Adds a webpage for future fetching and returns the corresponding Webpage.

uri

page uri

name

page name



# File 'lib/webpage-archivist/webpage-archivist.rb', line 13

def add_webpage uri, name
  Webpage.create(:name => name, :uri => uri)
end

#extract_instance_content(id, file) ⇒ Object

Writes the full content of a webpage instance into a zip file.

id

the instance id

file

the file to write to



# File 'lib/webpage-archivist/webpage-archivist.rb', line 35

def extract_instance_content id, file
  Extracter.instance_content id, file
end

#fetch_all ⇒ Object

Fetches all webpages; equivalent to fetch_webpages(list_webpages).



# File 'lib/webpage-archivist/webpage-archivist.rb', line 18

def fetch_all
  Fetcher.fetch_webpages list_webpages
end

#fetch_webpages(webpages) ⇒ Object

Fetches several webpages and returns a hash keyed by webpage, whose values are the corresponding instances or HTTP result codes.



# File 'lib/webpage-archivist/webpage-archivist.rb', line 23

def fetch_webpages webpages
  Fetcher.fetch_webpages webpages
end
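Both fetch_webpages and fetch_all return a hash mixing fetched instances and HTTP result codes, so callers typically need to separate the two. The sketch below assumes that a failed fetch is stored as an Integer status code and that any other value is a fetched instance; Page, Instance and split_fetch_results are hypothetical stand-ins, not library classes.

```ruby
# Splits a fetch_webpages-style result into successes and failures.
# Assumption: failures are Integer HTTP status codes; anything else is an instance.
def split_fetch_results(results)
  failures, successes = results.partition { |_page, value| value.is_a?(Integer) }
  [successes.to_h, failures.to_h]
end

# Hypothetical stand-ins for the library's Webpage and instance objects.
Page = Struct.new(:name)
Instance = Struct.new(:page)

home = Page.new('home')
missing = Page.new('missing')
results = { home => Instance.new(home), missing => 404 }

ok, failed = split_fetch_results(results)
ok.each_key { |page| puts "fetched #{page.name}" }
failed.each { |page, code| puts "#{page.name} failed with HTTP #{code}" }
```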

#list_webpages ⇒ Object

Lists all webpages stored in the database.



# File 'lib/webpage-archivist/webpage-archivist.rb', line 28

def list_webpages
  Webpage.all
end

#purge_cache(retention_period) ⇒ Object

Purges cached stylesheets, scripts and images from the database; the corresponding files are not deleted from disk.

retention_period

number of days to keep cached elements; elements last fetched earlier than that are purged



# File 'lib/webpage-archivist/webpage-archivist.rb', line 41

def purge_cache retention_period
  purge_starting_date = DateTime.now - retention_period
  Stylesheet.filter('last_fetched < ?', purge_starting_date).delete
  Script.filter('last_fetched < ?', purge_starting_date).delete
  Image.filter('last_fetched < ?', purge_starting_date).delete
end
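The cutoff relies on DateTime arithmetic: subtracting an Integer from a DateTime moves it back that many days. The sketch below reproduces the same cutoff and the 'last_fetched < ?' condition on an in-memory list instead of Sequel datasets; CachedElement, purge_cutoff and purge are hypothetical names used only for illustration.

```ruby
require 'date'

# Stand-in for a cached row; the real Stylesheet, Script and Image models
# are database-backed and expose a last_fetched column.
CachedElement = Struct.new(:name, :last_fetched)

# Same computation as purge_cache: DateTime minus an Integer subtracts days.
def purge_cutoff(retention_period, now = DateTime.now)
  now - retention_period
end

# Drops elements matching 'last_fetched < cutoff'; an element fetched
# exactly at the cutoff is kept.
def purge(elements, retention_period, now = DateTime.now)
  cutoff = purge_cutoff(retention_period, now)
  elements.reject { |e| e.last_fetched < cutoff }
end

now = DateTime.new(2012, 3, 31)
cache = [
  CachedElement.new('old.css', DateTime.new(2012, 3, 1)),
  CachedElement.new('recent.js', DateTime.new(2012, 3, 29))
]
puts purge(cache, 7, now).map(&:name).inspect
```

With a 7-day retention period and "now" set to 2012-03-31, the cutoff is 2012-03-24, so only recent.js survives.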

#snapshot(instance, thumbnail) ⇒ Object

Creates a snapshot of a web page. See the Snapshoter class for configuration.

uri

the uri to snapshot

snapshot_path

path to the snapshot file

thumbnail_path

path to the thumbnail (can be nil for no thumbnail)



# File 'lib/webpage-archivist/webpage-archivist.rb', line 53

def snapshot instance, thumbnail
  Snapshoter.snapshot instance, thumbnail
end