Class: ONIX::Normaliser
- Inherits:
- Object
- Defined in:
- lib/onix/normaliser.rb
Overview
A standalone class that can be used to normalise ONIX files into a standardised form. If you’re accepting ONIX files from a wide range of suppliers, you’re guaranteed to receive all sorts of dialects.
This will create a new file that:
- is UTF-8 encoded
- uses reference tags, not short tags
- has no named entities (ndash, etc.) other than &amp;, &lt; and &gt;
Usage:
ONIX::Normaliser.process("oldfile.xml", "newfile.xml")
Dependencies:
At this stage the class depends on several external apps, all commonly available on *nix systems: xsltproc, isutf8, iconv and sed
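As a rough illustration of the first guarantee (UTF-8 output), the kind of check-and-convert step that isutf8 and iconv cover can be sketched in plain Ruby. This is not the class's implementation: `ensure_utf8` is a hypothetical helper, and the ISO-8859-1 fallback is only an assumption about what a non-UTF-8 supplier file might use.

```ruby
# Hypothetical helper, not part of ONIX::Normaliser.
# Returns the given bytes as a UTF-8 string. If the bytes are not already
# valid UTF-8, they are assumed to be in fallback_encoding and transcoded.
def ensure_utf8(bytes, fallback_encoding = "ISO-8859-1")
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Assumption: anything that is not valid UTF-8 uses the fallback encoding.
  bytes.dup.force_encoding(fallback_encoding).encode(Encoding::UTF_8)
end

latin1_bytes = "caf\xE9".b            # "café" encoded as ISO-8859-1
puts ensure_utf8(latin1_bytes)
```

In practice the source encoding would have to be detected or declared by the supplier; guessing a single fallback is only safe when the set of suppliers is known.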
Class Method Summary
- .process(oldfile, newfile) ⇒ Object
  normalise oldfile and save it as newfile.
Instance Method Summary
- #app_available?(app) ⇒ Boolean
  check the specified app is available on the system.
- #initialize(oldfile, newfile) ⇒ Normaliser (constructor)
  A new instance of Normaliser.
- #next_tempfile ⇒ Object
  generate a temp filename.
- #remove_control_chars(src, dest) ⇒ Object
  XML files shouldn’t contain low ASCII control chars.
- #run ⇒ Object
- #to_reference_tags(src, dest) ⇒ Object
  uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to reference tags.
Constructor Details
#initialize(oldfile, newfile) ⇒ Normaliser
Returns a new instance of Normaliser.
    # File 'lib/onix/normaliser.rb', line 39
    def initialize(oldfile, newfile)
      raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
      raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
      raise "xsltproc app not found" unless app_available?("xsltproc")
      raise "tr app not found" unless app_available?("tr")

      @oldfile = oldfile
      @newfile = newfile
      @curfile = next_tempfile
      FileUtils.cp(@oldfile, @curfile)
      @head = File.open(@oldfile, "r") { |f| f.read(1024) }
    end
Class Method Details
.process(oldfile, newfile) ⇒ Object
normalise oldfile and save it as newfile. oldfile will be left untouched
    # File 'lib/onix/normaliser.rb', line 34
    def process(oldfile, newfile)
      self.new(oldfile, newfile).run
    end
Instance Method Details
#app_available?(app) ⇒ Boolean
check the specified app is available on the system
    # File 'lib/onix/normaliser.rb', line 72
    def app_available?(app)
      `which #{app}`.strip == "" ? false : true
    end
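The same availability check can be approximated without shelling out to `which`. The sketch below is an alternative, not the method above: `tool_on_path?` is a hypothetical name, and it assumes that a tool counts as available when an executable file with that name exists in one of the PATH directories.

```ruby
# Hypothetical alternative to `which`: look for an executable file with the
# given name in each directory listed in path (defaults to ENV["PATH"]).
def tool_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Report the tools the constructor checks for that are not installed.
missing = %w[xsltproc tr].reject { |tool| tool_on_path?(tool) }
warn "missing tools: #{missing.join(', ')}" unless missing.empty?
```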
#next_tempfile ⇒ Object
generate a temp filename
    # File 'lib/onix/normaliser.rb', line 78
    def next_tempfile
      p = nil
      Tempfile.open("onix") do |tf|
        p = tf.path
        tf.close!
      end
      p
    end
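Because `Tempfile#close!` unlinks the file, the path returned above names a file that no longer exists, and nothing cleans up the intermediate files written to it later. One possible alternative, shown here only as a sketch, is to keep all intermediate files inside a private temporary directory that is removed when the work is done. `with_stage_paths` and the `stage-N.xml` naming are hypothetical, not part of the class.

```ruby
require "tmpdir"

# Hypothetical helper: yields `count` unused file paths inside a fresh
# temporary directory, then deletes the directory and everything in it.
def with_stage_paths(count)
  Dir.mktmpdir("onix") do |dir|
    paths = Array.new(count) { |i| File.join(dir, "stage-#{i}.xml") }
    yield paths
  end
end

with_stage_paths(2) do |first, second|
  File.write(first, "<ONIXMessage/>")
  File.write(second, File.read(first))
  puts File.basename(second)
end
```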
#remove_control_chars(src, dest) ⇒ Object
XML files shouldn’t contain low ASCII control chars. Strip them.
    # File 'lib/onix/normaliser.rb', line 102
    def remove_control_chars(src, dest)
      inpath = File.expand_path(src)
      outpath = File.expand_path(dest)
      `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
    end
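The octal ranges passed to `tr` cover bytes 0 to 8, 11, 12 and 14 to 31, which leaves tab (9), line feed (10) and carriage return (13) in place. A minimal in-process equivalent might look like the following; `strip_control_chars` and `CONTROL_CHARS` are hypothetical names, and the sketch assumes the text is valid in its own encoding.

```ruby
# Same byte ranges as the tr call: 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F.
# Tab (0x09), LF (0x0A) and CR (0x0D) are deliberately left alone.
CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: returns a copy of text without low ASCII control chars.
def strip_control_chars(text)
  text.gsub(CONTROL_CHARS, "")
end

dirty = "<Title>Moby\x00 Dick\x1F</Title>\n"
puts strip_control_chars(dirty)
```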
#run ⇒ Object
    # File 'lib/onix/normaliser.rb', line 52
    def run
      # remove short tags
      if @head.include?("ONIXmessage")
        dest = next_tempfile
        to_reference_tags(@curfile, dest)
        @curfile = dest
      end

      # remove control chars
      dest = next_tempfile
      remove_control_chars(@curfile, dest)
      @curfile = dest

      FileUtils.cp(@curfile, @newfile)
    end
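The short-tag test relies on case: ONIX 2.1 short-tag files use the root element `ONIXmessage`, while reference-tag files use `ONIXMessage`, so a case-sensitive substring check on the first 1024 bytes tells them apart. The sketch below restates that decision as data; `short_tags?` and `stages_for` are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: true when the start of the file uses the short-tag
# root element. String#include? is case-sensitive, so "ONIXMessage"
# (reference tags) does not match.
def short_tags?(head)
  head.include?("ONIXmessage")
end

# Hypothetical helper: lists the stages a file would pass through, in order.
def stages_for(head)
  stages = []
  stages << :to_reference_tags if short_tags?(head)
  stages << :remove_control_chars
  stages
end

p stages_for('<?xml version="1.0"?><ONIXmessage>')
p stages_for('<?xml version="1.0"?><ONIXMessage release="2.1">')
```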
#to_reference_tags(src, dest) ⇒ Object
uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to reference tags.
more detail here:
http://www.editeur.org/files/ONIX%203/ONIX%20tagname%20converter%20v2.htm
    # File 'lib/onix/normaliser.rb', line 93
    def to_reference_tags(src, dest)
      inpath = File.expand_path(src)
      outpath = File.expand_path(dest)
      xsltpath = File.dirname(__FILE__) + "/../../support/switch-onix-2.1-short-to-reference.xsl"
      `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
    end
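Interpolating paths into a backtick string means a filename containing spaces or shell metacharacters would be split or interpreted by the shell. A possible hardening, sketched here under the assumption that only the `-o` option shown above is needed, is to build an argument list and hand it to `system`, which skips the shell when given multiple arguments. `xsltproc_argv` is a hypothetical helper.

```ruby
# Hypothetical helper: builds the xsltproc argument list for a conversion.
# Each path is a separate element, so no shell quoting is required.
def xsltproc_argv(src, dest, xslt)
  [
    "xsltproc",
    "-o", File.expand_path(dest),
    File.expand_path(xslt),
    File.expand_path(src)
  ]
end

argv = xsltproc_argv("/tmp/supplier feed.xml", "/tmp/out.xml", "/tmp/switch.xsl")
p argv
# To actually run the conversion (requires xsltproc to be installed):
#   system(*argv) or raise "xsltproc failed"
```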