Class: TaliaUtil::ImportJobHelper
- Includes:
- IoHelper
- Defined in:
- lib/talia_util/import_job_helper.rb
Overview
Helper methods that will be used during import job runs. The import jobs may use the following environment parameters:
- base_url
-
The base URL or directory. It is prefixed to all URLs; if it is a local directory, it becomes the current directory during the import.
- index
-
If given, the importer will try to read this document. While this will still support the old-style “hyper” format with sigla, it should usually contain a main element called “index” followed by “url” entries.
- xml
-
URL of an XML file to import. This is incompatible with the “index” option. If neither “xml” nor “index” is given, the class will try to read the XML data from STDIN.
- importer
-
Name of the importer class to be used for the data. If not given, TaliaCore::ActiveSourceParts::Xml::SourceReader is used.
- reset_store
-
If this is set, the data store will be cleared before the import
- user
-
Username for HTTP authentication, if required
- pass
-
Password for HTTP authentication, if required
- callback
-
Name of a class. If given, the import will call the #before_import and #after_import methods on an object of that class. The call will receive a block which may be yielded to for each progress step and which can receive the overall number of steps
- extension
-
Only used with index files: the file extension appended to each entry read from the index.
- duplicates
-
How to deal with elements that already exist in the datastore. This may be set to one of the following options (default: :skip):
-
:add - Database fields will be updated and the system will add semantic properties as additional values, without removing any of the existing semantic relations. Example: If the data store already contains a title for an element, and the import file contains another for that element, the element will have two titles after the import. The system will not check for duplicates. Files will always be imported in addition to the existing ones.
-
:update - Database fields will be updated, and semantic properties will be overwritten with the new value(s). Semantic properties that are not included in the import data will be left untouched. In the example above, the element would only contain the new title. If the element also contained author information, and no author information was in the import file, the existing author information will be untouched. Existing files are replaced if the import contains new files
-
:overwrite - Database fields will be updated. All semantic data will be deleted before the import. Files are always removed.
-
:skip - If an element already exists, the import will be skipped.
- trace
-
Enable tracing output for errors. (By default, this takes the rake task’s setting if possible)
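The four “duplicates” modes can be easier to compare side by side. The following is a minimal sketch, not Talia's implementation: it assumes an element's semantic properties can be modelled as a Hash mapping a property name to an Array of values, and the method name apply_duplicates_mode is hypothetical. Database fields and files are left out.

```ruby
# Hypothetical model of the "duplicates" option (not part of Talia).
# Properties are modelled as { 'name' => [values] }; this only applies
# to elements that already exist in the data store.
def apply_duplicates_mode(existing, incoming, mode = :skip)
  case mode
  when :skip
    # The existing element is left as it is.
    existing
  when :add
    # New values are appended; no check for duplicate values.
    existing.merge(incoming) { |_name, old_values, new_values| old_values + new_values }
  when :update
    # Imported properties replace the old ones; others stay untouched.
    existing.merge(incoming)
  when :overwrite
    # All existing semantic data is dropped first.
    incoming.dup
  else
    raise ArgumentError, "Unknown duplicates mode: #{mode.inspect}"
  end
end

existing = { 'title' => ['Old title'], 'author' => ['A. Writer'] }
incoming = { 'title' => ['New title'] }

[:skip, :add, :update, :overwrite].each do |mode|
  puts "#{mode}: #{apply_duplicates_mode(existing, incoming, mode).inspect}"
end
```

With :add the element ends up with two titles, with :update it keeps only the new title plus the old author, and with :overwrite the author is lost.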
The import itself consists of calling #initialize and then #do_import.
Instance Attribute Summary
-
#base_url ⇒ Object
readonly
Returns the value of attribute base_url.
-
#callback ⇒ Object
readonly
Returns the value of attribute callback.
-
#credentials ⇒ Object
readonly
Returns the value of attribute credentials.
-
#duplicates ⇒ Object
readonly
Returns the value of attribute duplicates.
-
#importer ⇒ Object
readonly
Returns the value of attribute importer.
-
#index_data ⇒ Object
readonly
Returns the value of attribute index_data.
-
#message_stream ⇒ Object
readonly
Returns the value of attribute message_stream.
-
#progressor ⇒ Object
readonly
Returns the value of attribute progressor.
-
#reset ⇒ Object
readonly
Returns the value of attribute reset.
-
#trace ⇒ Object
readonly
Returns the value of attribute trace.
-
#xml_data ⇒ Object
readonly
Returns the value of attribute xml_data.
Instance Method Summary
-
#do_import ⇒ Object
Does the actual importing.
-
#import_from_index(errors) ⇒ Object
This is only used if an index file is given for the import.
-
#init_data ⇒ Object
Reads the data for the coming import.
-
#initialize(message_stream = STDOUT, progressor = TaliaUtil::BarProgressor) ⇒ ImportJobHelper
constructor
The message_stream will be used for printing progress messages.
- #make_url_from(url) ⇒ Object
-
#print_error(e) ⇒ Object
Prints the message and, if the “trace” option is set, also the stack trace of the Exception e.
- #run_callback(name) ⇒ Object
Methods included from IoHelper
#base_for, #file_url, #open_from_url, #open_generic
Constructor Details
#initialize(message_stream = STDOUT, progressor = TaliaUtil::BarProgressor) ⇒ ImportJobHelper
The message_stream will be used for printing progress messages.
The procedure of the import is the following:
-
Set up all the attributes of this class from the respective environment variables (from the rake task)
-
Initialize the data: If an index file is given, read the index file. Otherwise read the file given by the ‘xml’ environment variable, or from STDIN if ‘xml’ isn’t set. See init_data
-
Create the callback class, if given
-
Set up the progressor for the import, if any
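The first two steps above can be sketched without Talia. This is an illustration under stated assumptions, not the class's code: import_settings and blank_value? are hypothetical names, the environment is passed in as a plain Hash, and the :skip default follows the parameter documentation rather than the constructor.

```ruby
# Hypothetical helpers (not part of Talia) that derive import settings
# from an ENV-like Hash, following the documented parameters.
def blank_value?(value)
  value.nil? || value.strip.empty?
end

def import_settings(env)
  has_xml = !blank_value?(env['xml'])
  has_index = !blank_value?(env['index'])
  raise ArgumentError, 'Not both xml and index parameters allowed' if has_xml && has_index

  # Index file first, then a single XML file, otherwise STDIN.
  source = if has_index
             :index
           elsif has_xml
             :xml
           else
             :stdin
           end

  {
    :source => source,
    :base_url => blank_value?(env['base_url']) ? '' : env['base_url'],
    :duplicates => blank_value?(env['duplicates']) ? :skip : env['duplicates'].to_sym,
    :credentials => blank_value?(env['user']) ? nil : [env['user'], env['pass']]
  }
end

settings = import_settings('index' => 'list.xml', 'base_url' => 'http://example.org/')
puts settings.inspect
```

Passing a Hash instead of reading ENV directly keeps the sketch easy to try out; a real rake task would pass ENV itself.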
# File 'lib/talia_util/import_job_helper.rb', line 65
def initialize(message_stream = STDOUT, progressor = TaliaUtil::BarProgressor)
  @trace = (defined?(Rake) ? Rake.application.options.trace : false) || ENV['trace']
  @progressor = progressor
  @message_stream = message_stream
  @duplicates = ENV['duplicates'].to_sym if(ENV['duplicates'])
  @importer = ENV['importer'] || 'TaliaCore::ActiveSourceParts::Xml::SourceReader'
  @credentials = { :http_basic_authentication => [ENV['user'], ENV['pass']] } unless(ENV['user'].blank?)
  assit(!(ENV['xml'] && ENV['index']), 'Not both xml and index parameters allowed')
  @reset = ENV['reset_store'].yes?

  @base_url = ENV['base_url'].blank? ? '' : ENV['base_url']
  if(base_url && File.directory?(base_url))
    message_stream.puts "Setting directory to #{base_url}"
    FileUtils.cd(base_url)
  end

  init_data

  @callback = ENV['callback'].classify.constantize.new unless(ENV['callback'].blank?)

  message_stream.puts "Registered callback (#{callback.class.name}) - (#{callback.respond_to?(:before_import)}|#{callback.respond_to?(:after_import)})" if(callback)

  callback.progressor = progressor if(callback && callback.respond_to?(:'progressor='))
end
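The constructor accepts any message stream and any progressor. Judging only from how import_from_index uses the progressor (it calls run_with_progress with a message and a size, and calls inc on the yielded object), a replacement might look like the sketch below. StreamProgressor is a hypothetical class, and the real TaliaUtil::BarProgressor may expect more than these two methods.

```ruby
require 'stringio'

# Hypothetical progressor (not part of Talia) that reports to any
# IO-like stream instead of drawing a progress bar.
class StreamProgressor
  attr_reader :count

  def initialize(stream)
    @stream = stream
    @count = 0
  end

  # Announces the task, yields itself so the caller can report
  # steps with #inc, then prints a summary line.
  def run_with_progress(message, size)
    @stream.puts "#{message} (#{size} steps)"
    @count = 0
    yield self
    @stream.puts "Done: #{@count}/#{size}"
  end

  def inc
    @count += 1
  end
end

out = StringIO.new
progressor = StreamProgressor.new(out)
progressor.run_with_progress('Reading w/ index', 2) do |prog|
  2.times { prog.inc }
end
puts out.string
```

Writing to a StringIO, as here, is also a convenient way to capture the helper's messages in a test instead of printing them to STDOUT.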
Instance Attribute Details
#base_url ⇒ Object (readonly)
Returns the value of attribute base_url.
# File 'lib/talia_util/import_job_helper.rb', line 53
def base_url
  @base_url
end
#callback ⇒ Object (readonly)
Returns the value of attribute callback.
# File 'lib/talia_util/import_job_helper.rb', line 53
def callback
  @callback
end
#credentials ⇒ Object (readonly)
Returns the value of attribute credentials.
# File 'lib/talia_util/import_job_helper.rb', line 53
def credentials
  @credentials
end
#duplicates ⇒ Object (readonly)
Returns the value of attribute duplicates.
# File 'lib/talia_util/import_job_helper.rb', line 53
def duplicates
  @duplicates
end
#importer ⇒ Object (readonly)
Returns the value of attribute importer.
# File 'lib/talia_util/import_job_helper.rb', line 53
def importer
  @importer
end
#index_data ⇒ Object (readonly)
Returns the value of attribute index_data.
# File 'lib/talia_util/import_job_helper.rb', line 53
def index_data
  @index_data
end
#message_stream ⇒ Object (readonly)
Returns the value of attribute message_stream.
# File 'lib/talia_util/import_job_helper.rb', line 53
def message_stream
  @message_stream
end
#progressor ⇒ Object (readonly)
Returns the value of attribute progressor.
# File 'lib/talia_util/import_job_helper.rb', line 53
def progressor
  @progressor
end
#reset ⇒ Object (readonly)
Returns the value of attribute reset.
# File 'lib/talia_util/import_job_helper.rb', line 53
def reset
  @reset
end
#trace ⇒ Object (readonly)
Returns the value of attribute trace.
# File 'lib/talia_util/import_job_helper.rb', line 53
def trace
  @trace
end
#xml_data ⇒ Object (readonly)
Returns the value of attribute xml_data.
# File 'lib/talia_util/import_job_helper.rb', line 53
def xml_data
  @xml_data
end
Instance Method Details
#do_import ⇒ Object
Does the actual importing:
-
If required, reset the data store
-
Run the “before_import” callback
-
In case there is plain xml data, TaliaCore::ActiveSource.create_from_xml will handle all the import
-
If an index is given, the import will be done by import_from_index
-
Run the “after_import” callback
# File 'lib/talia_util/import_job_helper.rb', line 118
def do_import
  if(reset)
    TaliaUtil::Util.full_reset
    puts "Data Store has been completely reset"
  end
  errors = []
  run_callback(:before_import)
  if(index_data)
    import_from_index(errors)
  else
    puts "Importing from single data file."
    TaliaCore::ActiveSource.create_from_xml(xml_data, :progressor => progressor, :reader => importer, :base_file_uri => @true_root, :errors => errors, :duplicates => duplicates)
  end
  if(errors.size > 0)
    puts "WARNING: #{errors.size} errors during import:"
    errors.each { |e| print_error e }
  end
  run_callback(:after_import)
end
#import_from_index(errors) ⇒ Object
This is only used if an index file is given for the import. All “plain” imports go directly to #create_from_xml in the ActiveSource class
-
The index file is parsed as XML
-
If the root element is “sigla”, the old hyper format is used
-
In case the hyper format is used, sigla (local names for URIs) are expected as “siglum” elements. Otherwise, the import URIs are expected in “url” tags.
-
For each import URL, #sources_from_url is called on the selected importer, and the resulting attributes are added to the import data
-
The result is passed to TaliaCore::ActiveSource.create_multi_from to create the sources
# File 'lib/talia_util/import_job_helper.rb', line 157
def import_from_index(errors)
  doc = Hpricot.XML(index_data)
  hyper_format = (doc.root.name == 'sigla')
  elements = hyper_format ? (doc/:siglum) : (doc/:url)
  puts "Import from Index file, #{elements.size} elements"
  # Read the Attributes from the urls
  source_attributes = []
  my_importer = importer.classify.constantize
  progressor.run_with_progress('Reading w/ index', elements.size) do |prog|
    elements.each do |element|
      url = make_url_from("#{element.inner_text}#{ENV['extension']}")
      begin
        this_attribs = my_importer.sources_from_url(url, credentials)
        source_attributes = source_attributes + this_attribs
      rescue Exception => e
        message_stream.puts "Problem importing #{url} (#{e.message})"
        message_stream.puts e.backtrace
      end
      prog.inc
    end
  end
  # Write the data
  TaliaCore::ActiveSource.progressor = progressor
  TaliaCore::ActiveSource.create_multi_from(source_attributes, :errors => errors, :duplicates => duplicates)
end
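To make the two index formats concrete, here is a minimal sketch of the URL-collection step only. It swaps in REXML for Hpricot, which the original method uses, and urls_from_index is a hypothetical name. Unlike make_url_from, it does not check whether an entry is an existing local file before prefixing the base URL.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of Talia): collects the import URLs
# from an index document. A "sigla" root means the old hyper format
# with "siglum" entries; otherwise "url" entries are expected.
def urls_from_index(index_xml, base_url = '', extension = '')
  doc = REXML::Document.new(index_xml)
  tag = doc.root.name == 'sigla' ? 'siglum' : 'url'
  REXML::XPath.match(doc, "//#{tag}").map do |element|
    "#{base_url}#{element.text}#{extension}"
  end
end

new_style = '<index><url>doc_one</url><url>doc_two</url></index>'
old_style = '<sigla><siglum>N-IV-1</siglum></sigla>'

puts urls_from_index(new_style, 'http://example.org/', '.xml').inspect
puts urls_from_index(old_style, 'http://example.org/', '.xml').inspect
```

The element names in the sample documents follow the description above; the base URL and entry values are made up for illustration.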
#init_data ⇒ Object
Reads the data for the coming import. If the ‘index’ parameter is found in the environment, it is used as the file name of the index file, which is read into the object. Otherwise, if the ‘xml’ environment variable is set, that file is read and used as the XML data for the import; if neither is set, the XML data is read from STDIN.
# File 'lib/talia_util/import_job_helper.rb', line 94
def init_data
  if(ENV['index'].blank?)
    @xml_data = if(ENV['xml'].blank?)
      STDIN.read
    else
      xml_url = ENV['xml']
      xml_url = base_url + xml_url unless(File.exists?(xml_url))
      @true_root = base_for(xml_url)
      open_generic(xml_url, credentials) { |io| io.read }
    end
  else
    index = make_url_from(ENV['index'])
    @index_data = open_generic(index, credentials) { |io| io.read }
  end
end
#make_url_from(url) ⇒ Object
# File 'lib/talia_util/import_job_helper.rb', line 183
def make_url_from(url)
  return url if(File.exist?(url))
  "#{base_url}#{url}"
end
#print_error(e) ⇒ Object
Prints the message and, if the “trace” option is set, also the stack trace of the Exception e
# File 'lib/talia_util/import_job_helper.rb', line 140
def print_error(e)
  puts e.message
  puts e.backtrace if(trace)
end
#run_callback(name) ⇒ Object
# File 'lib/talia_util/import_job_helper.rb', line 188
def run_callback(name)
  if(callback && callback.respond_to?(name))
    puts "Running callback #{name}"
    callback.send(name)
  end
end
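Because run_callback checks respond_to? before sending, a callback class may implement only one of the two hooks. The sketch below illustrates that contract with a stand-alone dispatcher; RecordingCallback and run_optional_callback are hypothetical names, and the dispatcher only mirrors the check shown above rather than calling into Talia.

```ruby
# Hypothetical callback class (not part of Talia). It implements
# before_import only, and exposes progressor= so that a helper could
# hand over its progressor.
class RecordingCallback
  attr_accessor :progressor
  attr_reader :events

  def initialize
    @events = []
  end

  def before_import
    @events << :before_import
  end
end

# Stand-alone dispatcher mirroring the respond_to? check: a hook that
# is not implemented is skipped without raising an error.
def run_optional_callback(callback, name)
  return false unless callback && callback.respond_to?(name)
  callback.send(name)
  true
end

callback = RecordingCallback.new
puts run_optional_callback(callback, :before_import)
puts run_optional_callback(callback, :after_import)
puts callback.events.inspect
```

A missing hook is simply skipped, so a callback that only needs to clean up after the import can leave out before_import entirely.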