Class: WebpageArchivist::WebpageArchivist
- Inherits: Object
  - Object
  - WebpageArchivist::WebpageArchivist
- Defined in: lib/webpage-archivist/webpage-archivist.rb
Overview
Entry point for the Web Archivist features. Database configuration relies on the DATABASE_uri environment variable; see sequel.rubyforge.org/rdoc/files/doc/opening_databases_rdoc.html for the syntax details.
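As a rough illustration of the connection string syntax, the sketch below splits a Sequel-style URI (scheme://user:password@host:port/database) into its parts using only the standard library. The variable name is copied as spelled on this page; the helper `describe_database_uri` and the sample credentials, host and database name are hypothetical and are not part of the gem.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): break a connection string
# of the form scheme://user:password@host:port/database into its parts.
def describe_database_uri(value)
  uri = URI.parse(value)
  {
    adapter: uri.scheme,
    user: uri.user,
    host: uri.host,
    port: uri.port,
    database: uri.path.sub(%r{\A/}, '')
  }
end

# Placeholder value used only when the variable is not set.
value = ENV.fetch('DATABASE_uri', 'postgres://archivist:secret@localhost:5432/webpages')
puts describe_database_uri(value).inspect
```

Other adapters use other schemes, so the linked Sequel page remains the reference for the exact syntax.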
Instance Method Summary
- #add_webpage(uri, name) ⇒ Object
  Add a webpage for future fetching and return the corresponding Webpage.
  - uri: page uri
  - name: page name
- #extract_instance_content(id, file) ⇒ Object
  Write the full content of a webpage instance into a zip file.
  - id: the instance id
  - file: the file to write to
- #fetch_all ⇒ Object
  Fetch all webpages.
- #fetch_webpages(webpages) ⇒ Object
  Fetch several webpages and return a hash indexed by the webpages, holding the corresponding instances or HTTP result codes.
- #list_webpages ⇒ Object
  List the webpages.
- #purge_cache(retention_period) ⇒ Object
  Purge cached elements from the database; they are not deleted from the disk.
  - retention_period: number of days after which the purge should start
- #snapshot(instance, thumbnail) ⇒ Object
  Create a snapshot of a web page. See the Snapshoter class for configuration.
  - uri: the uri to snapshot
  - snapshot_path: path to the snapshot file
  - thumbnail_path: path to the thumbnail (can be nil for no thumbnail)
Instance Method Details
#add_webpage(uri, name) ⇒ Object
Add a webpage for future fetching and return the corresponding Webpage.
- uri: page uri
- name: page name

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 13
    def add_webpage uri, name
      Webpage.create(:name => name, :uri => uri)
    end
#extract_instance_content(id, file) ⇒ Object
Write the full content of a webpage instance into a zip file.
- id: the instance id
- file: the file to write to

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 35
    def extract_instance_content id, file
      Extracter.instance_content id, file
    end
#fetch_all ⇒ Object
Fetch all webpages.

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 18
    def fetch_all
      Fetcher.fetch_webpages list_webpages
    end
#fetch_webpages(webpages) ⇒ Object
Fetch several webpages and return a hash indexed by the webpages, holding the corresponding instances or HTTP result codes.

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 23
    def fetch_webpages webpages
      Fetcher.fetch_webpages webpages
    end
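Because each value in the returned hash is either an instance or an HTTP result code, callers will usually want to separate the two. The sketch below shows one way to do that, assuming failures are reported as Integer status codes. `FakeInstance`, `split_fetch_results` and the string keys are hypothetical stand-ins; the real keys are Webpage objects and the real instance class is not shown on this page.

```ruby
# Stand-in for a fetched instance (hypothetical, for illustration only).
FakeInstance = Struct.new(:id)

# Split a result hash into fetched instances and failed HTTP codes,
# assuming a failure is represented by an Integer status code.
def split_fetch_results(results)
  fetched, failed = results.partition { |_webpage, outcome| !outcome.is_a?(Integer) }
  [fetched.to_h, failed.to_h]
end

# Sample data shaped like the documented return value.
results = {
  'http://example.com/' => FakeInstance.new(1),
  'http://example.com/missing' => 404
}

fetched, failed = split_fetch_results(results)
failed.each { |webpage, code| puts "#{webpage} failed with HTTP #{code}" }
puts "#{fetched.size} webpage(s) fetched"
```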
#list_webpages ⇒ Object
List the webpages.

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 28
    def list_webpages
      Webpage.all
    end
#purge_cache(retention_period) ⇒ Object
Purge cached elements from the database; they are not deleted from the disk.
- retention_period: number of days after which the purge should start

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 41
    def purge_cache retention_period
      purge_starting_date = DateTime.now - retention_period
      Stylesheet.filter('last_fetched < ?', purge_starting_date).delete
      Script.filter('last_fetched < ?', purge_starting_date).delete
      Image.filter('last_fetched < ?', purge_starting_date).delete
    end
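The cutoff computed above relies on DateTime arithmetic, where subtracting a number moves the date back by that many days. The sketch below isolates that step so the effect of retention_period can be checked without a database. `purge_cutoff` and `stale?` are hypothetical helper names, and the fixed dates are sample values.

```ruby
require 'date'

# Same arithmetic as purge_cache: subtracting a Numeric from a DateTime
# moves it back by that many days.
def purge_cutoff(retention_period, now = DateTime.now)
  now - retention_period
end

# An element would be purged when it was last fetched before the cutoff.
def stale?(last_fetched, cutoff)
  last_fetched < cutoff
end

now = DateTime.new(2012, 3, 31, 12, 0, 0)
cutoff = purge_cutoff(30, now)

puts cutoff.iso8601
puts stale?(DateTime.new(2012, 2, 15), cutoff)
puts stale?(DateTime.new(2012, 3, 20), cutoff)
```

With a retention period of 30 days and the sample date of 31 March, the cutoff falls on 1 March, so an element fetched in mid-February is purged while one fetched on 20 March is kept.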
#snapshot(instance, thumbnail) ⇒ Object
Create a snapshot of a web page. See the Snapshoter class for configuration.
- uri: the uri to snapshot
- snapshot_path: path to the snapshot file
- thumbnail_path: path to the thumbnail (can be nil for no thumbnail)

    # File 'lib/webpage-archivist/webpage-archivist.rb', line 53
    def snapshot instance, thumbnail
      Snapshoter.snapshot instance, thumbnail
    end