Class: AIPP::Downloader
- Inherits: Object
  - Object
    - AIPP::Downloader
- Includes: Debugger
- Defined in: lib/aipp/downloader.rb, lib/aipp/downloader/file.rb, lib/aipp/downloader/http.rb, lib/aipp/downloader/graphql.rb
Overview
AIP downloader infrastructure
The downloader operates in the storage directory, where it creates two subdirectories, “sources” and “work”. The initializer looks for the source archive in “sources” and, if found, unpacks its contents into “work”. When reading a document, the downloader looks for the document in “work” and, if it is not found or the clean option is set, downloads it from the origin. Finally, the contents of “work” are packed back into the source archive.
Origins are defined as instances of downloader origin objects:
- AIPP::Downloader::File – local file or archive
- AIPP::Downloader::HTTP – remote file or archive via HTTP
- AIPP::Downloader::GraphQL – GraphQL query
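Judging from the calls made in #read, an origin has to respond to fetched_file and fetch_to. The following is a minimal sketch of such an object, not the actual implementation of any of the origin classes listed above; the class name LocalCopyOrigin and its file: argument are hypothetical.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Hypothetical origin: copies a local file into a target directory.
# Only the two methods which #read calls are implemented.
class LocalCopyOrigin
  def initialize(file:)
    @file = Pathname(file)
  end

  # Name of the file as it appears in the target directory
  def fetched_file
    @file.basename.to_s
  end

  # Copy the file into +path+ and return the Pathname of the copy
  def fetch_to(path)
    target = Pathname(path).join(fetched_file)
    FileUtils.cp(@file, target)
    target
  end
end

# Usage with a temporary directory standing in for "work"
Dir.mktmpdir do |dir|
  source = Pathname(dir).join('sample.txt')
  source.write('sample')
  work = Pathname(dir).join('work')
  work.mkpath
  origin = LocalCopyOrigin.new(file: source)
  puts origin.fetch_to(work).read
end
```

Keeping the interface down to these two methods means the downloader does not need to know whether a document comes from disk, HTTP, or a GraphQL query.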
The following archives are recognized:
- .zip: ZIP archive
The following file types are recognized:
- .ofmx: Parsed by Nokogiri, returning an instance of Nokogiri::XML::Document
- .xml: Parsed by Nokogiri, returning an instance of Nokogiri::XML::Document
- .html: Parsed by Nokogiri, returning an instance of Nokogiri::HTML5::Document
- .pdf: Converted to text – see PDF
- .json: Deserialized JSON, e.g. as response to a GraphQL query
- .xlsx: Parsed by Roo, returning an instance of Roo::Excelx
- .ods: Parsed by Roo, returning an instance of Roo::OpenOffice
- .csv: Parsed by Roo, returning an instance of Roo::CSV including the first header line
- .txt: Instance of String
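The mapping from file extension to return type can be pictured as a dispatch on the extension. The sketch below is hypothetical and covers only the two types which the Ruby standard library can handle (.json and .txt); the method name convert_sketch and the ArgumentError for unknown extensions are assumptions, and the Nokogiri and Roo branches are left out.

```ruby
require 'json'
require 'pathname'
require 'tmpdir'

# Hypothetical converter: picks a parser based on the file extension.
# Only .json and .txt are covered; every other extension is rejected.
def convert_sketch(file)
  file = Pathname(file)
  case file.extname
  when '.json' then JSON.parse(file.read)   # deserialized JSON
  when '.txt'  then file.read               # instance of String
  else fail(ArgumentError, "unrecognized file type: #{file.extname}")
  end
end

# Usage with temporary files
Dir.mktmpdir do |dir|
  json = Pathname(dir).join('response.json')
  json.write('{"data": {"count": 2}}')
  text = Pathname(dir).join('notes.txt')
  text.write('plain text')
  p convert_sketch(json)
  p convert_sketch(text)
end
```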
Defined Under Namespace
Classes: File, GraphQL, HTTP, NotFoundError
Instance Attribute Summary
- #source ⇒ String (readonly): Name of the source archive (without extension “.zip”).
- #source_file ⇒ Pathname (readonly): Full path to the source archive.
- #storage ⇒ Pathname (readonly): Directory to operate within.
Instance Method Summary
- #initialize(storage:, source:) ⇒ Downloader (constructor): A new instance of Downloader.
- #inspect ⇒ String
- #read(document:, origin:) ⇒ Object: Download and read document.
Methods included from Debugger
#info, #original_warn, #verbose_info, #warn, #with_debugger
Constructor Details
#initialize(storage:, source:) ⇒ Downloader
Returns a new instance of Downloader.
# File 'lib/aipp/downloader.rb', line 67
def initialize(storage:, source:)
  @storage, @source = storage, source
  fail(ArgumentError, 'bad storage directory') unless Dir.exist? storage
  @source_file = sources_path.join("#{@source}.zip")
  prepare
  if @source_file.exist?
    if AIPP.options.clean
      @source_file.delete
    else
      unpack
    end
  end
  yield self
  pack
ensure
  teardown
end
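Because the constructor yields and then packs, all reading happens inside the block passed to new, and the ensure clause runs teardown even when the block raises. The following stand-in is a hypothetical sketch which only records the order of the steps; the class name LifecycleSketch and its keyword arguments are assumptions and none of the real directory or archive handling is reproduced.

```ruby
# Hypothetical stand-in for the constructor: instead of touching the
# file system, every step is appended to the log passed by the caller.
class LifecycleSketch
  def initialize(log:, archive_exists:, clean:)
    @log = log
    @log << :prepare
    if archive_exists
      # With the clean option the archive is discarded, otherwise unpacked
      @log << (clean ? :delete_archive : :unpack)
    end
    yield self
    @log << :pack          # skipped if the block raises
  ensure
    @log << :teardown      # always runs
  end
end

# Normal run: the archive exists and is unpacked before the block runs
log = []
LifecycleSketch.new(log: log, archive_exists: true, clean: false) { log << :block }
p log

# Failing block: nothing is packed, but teardown still happens
failed = []
begin
  LifecycleSketch.new(log: failed, archive_exists: false, clean: false) { fail 'boom' }
rescue RuntimeError
  p failed
end
```

The log is passed in from outside because an exception raised in the block propagates out of new, so the caller never receives the instance in that case.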
Instance Attribute Details
#source ⇒ String (readonly)
Returns name of the source archive (without extension “.zip”).
# File 'lib/aipp/downloader.rb', line 60
def source
  @source
end
#source_file ⇒ Pathname (readonly)
Returns full path to the source archive.
# File 'lib/aipp/downloader.rb', line 63
def source_file
  @source_file
end
#storage ⇒ Pathname (readonly)
Returns directory to operate within.
# File 'lib/aipp/downloader.rb', line 57
def storage
  @storage
end
Instance Method Details
#inspect ⇒ String
# File 'lib/aipp/downloader.rb', line 86
def inspect
  "#<AIPP::Downloader>"
end
#read(document:, origin:) ⇒ Object
Download and read document
# File 'lib/aipp/downloader.rb', line 96
def read(document:, origin:)
  file = work_path.join(origin.fetched_file)
  unless file.exist?
    verbose_info "downloading #{document}"
    origin.fetch_to(work_path)
  end
  convert file
end
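The effect of the file.exist? check is that a document already present in “work” is not fetched again. This can be illustrated with a self-contained sketch, assuming a hypothetical origin which counts its fetches; CountingOrigin and read_sketch are invented for this example, and the sketch returns the raw file content instead of calling convert.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical origin which writes fixed content and counts fetches.
class CountingOrigin
  attr_reader :fetches

  def initialize(name:, content:)
    @name, @content, @fetches = name, content, 0
  end

  def fetched_file
    @name
  end

  def fetch_to(path)
    @fetches += 1
    Pathname(path).join(@name).write(@content)
  end
end

# Mirrors the fetch-if-missing logic of #read, without the conversion.
def read_sketch(work_path:, origin:)
  file = Pathname(work_path).join(origin.fetched_file)
  origin.fetch_to(work_path) unless file.exist?
  file.read
end

# Reading the same document twice triggers only one fetch
Dir.mktmpdir do |work|
  origin = CountingOrigin.new(name: 'doc.txt', content: 'hello')
  2.times { read_sketch(work_path: work, origin: origin) }
  puts origin.fetches
end
```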