Class: WebpageArchivist::WebpageArchivist

Inherits: Object

Defined in: lib/webpage-archivist/webpage-archivist.rb

Overview

Entry point for the Web Archivist features. Database configuration relies on the DATABASE_uri environment variable; see sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html for details of the connection-string syntax.
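Sequel connection strings follow the pattern adapter://user:password@host:port/database, and Sequel.connect accepts such a string directly. The sketch below only illustrates the anatomy of a value you might put in DATABASE_uri, using Ruby's standard URI parser. The helper name database_settings and the example credentials are hypothetical, and the gem presumably hands the raw string to Sequel rather than parsing it like this.

```ruby
require 'uri'

# Hypothetical helper: split a Sequel-style connection string
# (adapter://user:password@host:port/database) into its components.
# Raises KeyError if DATABASE_uri is not set, instead of guessing a default.
def database_settings(env = ENV)
  uri = URI.parse(env.fetch('DATABASE_uri'))
  {
    adapter:  uri.scheme,
    user:     uri.user,
    password: uri.password,
    host:     uri.host,
    port:     uri.port,
    # The database name is the URI path without its leading slash.
    database: uri.path.sub(%r{\A/}, '')
  }
end

p database_settings('DATABASE_uri' => 'postgres://archivist:secret@localhost:5432/archive')
```

For example, setting DATABASE_uri to postgres://archivist:secret@localhost:5432/archive yields the postgres adapter, host localhost, port 5432 and database archive.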

Instance Method Summary

#add_webpage(uri, name) ⇒ Object
Add a webpage for future fetching and return the corresponding Webpage.

#extract_instance_content(id, file) ⇒ Object
Write the full content of a webpage instance into a zip file.

#fetch_all ⇒ Object
Fetch all webpages.

#fetch_webpages(webpages) ⇒ Object
Fetch several webpages and return a hash keyed by webpage, holding the corresponding instances or HTTP result codes.

#list_webpages ⇒ Object
List the webpages.

#purge_cache(retention_period) ⇒ Object
Purge cached elements from the database; they are not deleted from disk.

#snapshot(instance, thumbnail) ⇒ Object
Create a snapshot of a web page. See the Snapshoter class for configuration.

Instance Method Details

#add_webpage(uri, name) ⇒ `Object`

Add a webpage for future fetching and return the corresponding Webpage

uri: page uri
name: page name

# File 'lib/webpage-archivist/webpage-archivist.rb', line 13

def add_webpage uri, name
  Webpage.create(:name => name, :uri => uri)
end
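add_webpage is a thin wrapper around Webpage.create, so its return value is the newly created Webpage record, and list_webpages simply returns Webpage.all. As a runnable sketch that needs no database, the hypothetical FakeWebpage class below stands in for the gem's Sequel-backed model and mimics only create and all.

```ruby
# Hypothetical in-memory stand-in for the gem's Webpage model, used only to
# illustrate the contract of add_webpage and list_webpages.
class FakeWebpage
  attr_reader :id, :name, :uri

  # Class-level store shared by all records (the real model uses a database table).
  @records = []

  class << self
    # Mirrors the Model.create(attributes) call made by add_webpage.
    def create(attributes)
      record = new(@records.size + 1, attributes.fetch(:name), attributes.fetch(:uri))
      @records << record
      record
    end

    # Mirrors Model.all, which list_webpages returns; returns a copy of the store.
    def all
      @records.dup
    end
  end

  def initialize(id, name, uri)
    @id = id
    @name = name
    @uri = uri
  end
end

# Same bodies as the documented methods, with the stand-in model swapped in.
def add_webpage(uri, name)
  FakeWebpage.create(:name => name, :uri => uri)
end

def list_webpages
  FakeWebpage.all
end

page = add_webpage('http://example.com/', 'Example')
puts "#{page.id}: #{page.name} -> #{page.uri}"
```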

#extract_instance_content(id, file) ⇒ `Object`

Write the full content of a webpage instance into a zip file

id: the instance id
file: the file to write to

# File 'lib/webpage-archivist/webpage-archivist.rb', line 35

def extract_instance_content id, file
  Extracter.instance_content id, file
end

#fetch_all ⇒ `Object`

Fetch all webpages, i.e. every webpage returned by #list_webpages

# File 'lib/webpage-archivist/webpage-archivist.rb', line 18

def fetch_all
  Fetcher.fetch_webpages list_webpages
end

#fetch_webpages(webpages) ⇒ `Object`

Fetch several webpages and return a hash keyed by webpage, whose values are the corresponding instances or HTTP result codes

# File 'lib/webpage-archivist/webpage-archivist.rb', line 23

def fetch_webpages webpages
  Fetcher.fetch_webpages webpages
end
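Because fetch_webpages (and therefore fetch_all) mixes two kinds of values in its result hash, callers usually need to separate successful fetches from failures. A minimal sketch, assuming failures are reported as Integer HTTP status codes and any other value is an instance; FakeInstance, partition_fetch_results and the string keys are hypothetical stand-ins for the gem's Webpage and instance objects.

```ruby
# Hypothetical stand-in for a fetched instance (not the gem's class).
FakeInstance = Struct.new(:webpage, :fetched_at)

# Split a fetch_webpages-style result hash into two hashes:
# successes (webpage => instance) and failures (webpage => HTTP status code),
# assuming failures are the Integer values.
def partition_fetch_results(results)
  failures, successes = results.partition { |_webpage, value| value.is_a?(Integer) }
  [successes.to_h, failures.to_h]
end

results = {
  'http://example.com/' => FakeInstance.new('http://example.com/', Time.now),
  'http://example.org/missing' => 404
}
ok, failed = partition_fetch_results(results)
failed.each { |webpage, code| warn "#{webpage} failed with HTTP #{code}" }
```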

#list_webpages ⇒ `Object`

List all webpages stored in the database

# File 'lib/webpage-archivist/webpage-archivist.rb', line 28

def list_webpages
  Webpage.all
end

#purge_cache(retention_period) ⇒ `Object`

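Purge cached stylesheets, scripts and images from the database; they are not deleted from disk

retention_period: number of days to keep cached elements; elements whose last_fetched date is more than retention_period days in the past are purged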

# File 'lib/webpage-archivist/webpage-archivist.rb', line 41

def purge_cache retention_period
  purge_starting_date = DateTime.now - retention_period
  Stylesheet.filter('last_fetched < ?', purge_starting_date).delete
  Script.filter('last_fetched < ?', purge_starting_date).delete
  Image.filter('last_fetched < ?', purge_starting_date).delete
end
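In purge_cache, DateTime.now - retention_period moves the current time back by retention_period days, and every cached Stylesheet, Script or Image whose last_fetched is strictly earlier than that cutoff is deleted. The sketch reproduces the same arithmetic on an in-memory list; CachedElement, purge_cutoff and purge are hypothetical names, and now is injectable so the result is deterministic.

```ruby
require 'date'

# Hypothetical cached-element record; the gem uses Sequel models
# (Stylesheet, Script, Image) with a last_fetched column instead.
CachedElement = Struct.new(:name, :last_fetched)

# Same cutoff computation as purge_cache: subtracting an Integer from a
# DateTime moves it back that many days.
def purge_cutoff(retention_period, now = DateTime.now)
  now - retention_period
end

# In-memory equivalent of filter('last_fetched < ?', cutoff).delete:
# returns the elements that survive the purge.
def purge(elements, retention_period, now = DateTime.now)
  cutoff = purge_cutoff(retention_period, now)
  elements.reject { |element| element.last_fetched < cutoff }
end

now = DateTime.new(2012, 3, 31)
elements = [
  CachedElement.new('old.css', DateTime.new(2012, 3, 1)),
  CachedElement.new('recent.js', DateTime.new(2012, 3, 29))
]
kept = purge(elements, 7, now)
puts kept.map(&:name).inspect
```

With now fixed at 2012-03-31 and a 7-day retention, the cutoff is 2012-03-24, so old.css is purged and recent.js survives; an element fetched exactly at the cutoff is kept because the comparison is strict.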

#snapshot(instance, thumbnail) ⇒ `Object`

Create a snapshot of a web page. See the Snapshoter class for configuration

uri: the uri to snapshot
snapshot_path: path to the snapshot file
thumbnail_path: path to the thumbnail (can be nil for no thumbnail)

# File 'lib/webpage-archivist/webpage-archivist.rb', line 53

def snapshot instance, thumbnail
  Snapshoter.snapshot instance, thumbnail
end
