Class: Cacofonix::Normaliser
- Inherits: Object
- Defined in: lib/cacofonix/utils/normaliser.rb
Overview
A standalone class that can be used to normalise ONIX files into a standardised form. If you’re accepting ONIX files from a wide range of suppliers, you’re guaranteed to get all sorts of dialects.
This will create a new file that:
- is UTF-8 encoded
- uses reference tags, not short tags
- has no named entities (ndash, etc) other than &amp;, &lt; and &gt;
Usage:
Cacofonix::Normaliser.process("oldfile.xml", "newfile.xml")
Dependencies:
At this stage the class depends on several external apps, all commonly available on *nix systems: xsltproc, isutf8, iconv and sed
Class Method Summary
- .process(oldfile, newfile) ⇒ Object
  Normalise oldfile and save it as newfile.
Instance Method Summary
- #app_available?(app) ⇒ Boolean
  Check that the specified app is available on the system.
- #initialize(oldfile, newfile = nil) ⇒ Normaliser
  Constructor. NB: the newfile argument is deprecated.
- #next_tempfile ⇒ Object
  Generate a temp filename.
- #normalise_to_path(newfile) ⇒ Object
- #normalise_to_tempfile ⇒ Object
  Processes oldfile and puts the normalised result in a tempfile, returning the path to that tempfile.
- #remove_control_chars(src, dest) ⇒ Object
  XML files shouldn’t contain low ASCII control chars.
- #run ⇒ Object
  Deprecated: use normalise_to_path with a path.
- #to_reference_tags(src, dest) ⇒ Object
  Uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to long tags.
Constructor Details
#initialize(oldfile, newfile = nil) ⇒ Normaliser
NB: Newfile argument is deprecated.
# File 'lib/cacofonix/utils/normaliser.rb', line 41
def initialize(oldfile, newfile = nil)
  raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
  raise "xsltproc app not found" unless app_available?("xsltproc")
  raise "tr app not found" unless app_available?("tr")
  @oldfile = oldfile
  @newfile = newfile
  @curfile = next_tempfile
  FileUtils.cp(@oldfile, @curfile)
  @head = File.open(@oldfile, "r") { |f| f.read(1024) }
end
Class Method Details
.process(oldfile, newfile) ⇒ Object
Normalise oldfile and save it as newfile. oldfile is left untouched.
# File 'lib/cacofonix/utils/normaliser.rb', line 34
def process(oldfile, newfile)
  self.new(oldfile).normalise_to_path(newfile)
end
Instance Method Details
#app_available?(app) ⇒ Boolean
Check that the specified app is available on the system.
# File 'lib/cacofonix/utils/normaliser.rb', line 87
def app_available?(app)
  `which #{app}`.strip == "" ? false : true
end
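The check above shells out to `which`. As an illustration only, the same question can be answered in pure Ruby by walking the directories of a PATH-style string. The helper name `app_on_path?` and its `path` parameter are hypothetical and not part of Cacofonix.

```ruby
# Hypothetical helper (not part of Cacofonix): look for an executable file
# named `app` in each directory of a PATH-style string, without spawning
# an external `which` process.
def app_on_path?(app, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, app)
    File.file?(candidate) && File.executable?(candidate)
  end
end
```

Passing the search path as an argument keeps the helper easy to exercise against a temporary directory.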
#next_tempfile ⇒ Object
Generate a temp filename.
# File 'lib/cacofonix/utils/normaliser.rb', line 93
def next_tempfile
  p = nil
  Tempfile.open("onix") do |tf|
    p = tf.path
    tf.close!
  end
  p
end
#normalise_to_path(newfile) ⇒ Object
# File 'lib/cacofonix/utils/normaliser.rb', line 58
def normalise_to_path(newfile)
  raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
  @curfile = normalise_to_tempfile
  FileUtils.cp(@curfile, newfile)
end
#normalise_to_tempfile ⇒ Object
Processes oldfile and puts the normalised result in a tempfile, returning the path to that tempfile.
# File 'lib/cacofonix/utils/normaliser.rb', line 67
def normalise_to_tempfile
  src = @curfile

  # remove short tags
  if @head.include?("ONIXmessage")
    dest = next_tempfile
    to_reference_tags(src, dest)
    src = dest
  end

  # remove control chars
  dest = next_tempfile
  remove_control_chars(src, dest)
  dest
end
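The short-tag test in this method looks for the string "ONIXmessage" in the first 1024 bytes of the file. As far as the check implies, a short-tag file opens with `<ONIXmessage>` while a reference-tag file opens with `<ONIXMessage>`, so the decision rests on a case-sensitive match. A minimal sketch of that behaviour, where `short_tags?` and the two sample headers are hypothetical and made up for illustration:

```ruby
# Hypothetical helper mirroring the check in normalise_to_tempfile.
# String#include? is case-sensitive, so "ONIXMessage" does not match.
def short_tags?(head)
  head.include?("ONIXmessage")
end

# Invented sample headers, not taken from a real supplier file.
short_head     = %(<?xml version="1.0"?>\n<ONIXmessage release="2.1">)
reference_head = %(<?xml version="1.0"?>\n<ONIXMessage release="2.1">)

puts short_tags?(short_head)      # prints true
puts short_tags?(reference_head)  # prints false
```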
#remove_control_chars(src, dest) ⇒ Object
XML files shouldn’t contain low ASCII control chars. Strip them.
# File 'lib/cacofonix/utils/normaliser.rb', line 117
def remove_control_chars(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
end
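The octal ranges passed to `tr -d` cover bytes 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F, which leaves tab, line feed and carriage return in place. For comparison, a rough pure-Ruby sketch of the same filter follows. The names `CONTROL_CHARS`, `strip_control_chars` and `strip_control_chars_in_file` are hypothetical, and unlike the streaming `tr` pipeline this version reads the whole file into memory.

```ruby
# Same byte set as the tr -d call: 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F.
# Tab (0x09), LF (0x0A) and CR (0x0D) are kept.
CONTROL_CHARS = "\x00-\x08\x0B\x0C\x0E-\x1F".freeze

# Hypothetical pure-Ruby variant; not part of Cacofonix.
# String#delete accepts tr-style ranges such as "a-z".
def strip_control_chars(text)
  text.delete(CONTROL_CHARS)
end

# Reads and writes raw bytes so no encoding conversion takes place.
def strip_control_chars_in_file(src, dest)
  File.binwrite(dest, strip_control_chars(File.binread(src)))
end
```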
#run ⇒ Object
This is deprecated - use normalise_to_path with a path.
# File 'lib/cacofonix/utils/normaliser.rb', line 54
def run
  normalise_to_path(@newfile)
end
#to_reference_tags(src, dest) ⇒ Object
Uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to long tags.
More detail here:
http://www.editeur.org/files/ONIX%203/ONIX%20tagname%20converter%20v2.htm
# File 'lib/cacofonix/utils/normaliser.rb', line 108
def to_reference_tags(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  xsltpath = File.dirname(__FILE__) + "/../../../support/switch-onix-2.1-short-to-reference.xsl"
  `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
end
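The method above interpolates the paths into a backtick shell command, so a path containing spaces or shell metacharacters would be split or interpreted by the shell. One possible way around that, sketched here with the hypothetical helpers `xsltproc_command` and `run_xsltproc` (neither is part of Cacofonix), is to pass the arguments to `Kernel#system` as a list:

```ruby
# Hypothetical helper: build the xsltproc invocation as an argument list,
# using the same -o <output> <stylesheet> <input> order as the original.
def xsltproc_command(src, dest, xsltpath)
  ["xsltproc", "-o", File.expand_path(dest), xsltpath, File.expand_path(src)]
end

# Kernel#system with several arguments does not go through a shell,
# so a path such as "my file.xml" stays a single argument.
# Returns true on success, false on a non-zero exit, nil if xsltproc
# could not be started.
def run_xsltproc(src, dest, xsltpath)
  system(*xsltproc_command(src, dest, xsltpath))
end
```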