Class: MARC::XMLReader
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/marc/xmlreader.rb
Overview
The constructor accepts either a filename:
reader = MARC::XMLReader.new('/Users/edsu/marc.xml')
or a File object,
reader = MARC::XMLReader.new(File.new('/Users/edsu/marc.xml'))
or really any object that responds to read(n)
reader = MARC::XMLReader.new(StringIO.new(xml))
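The three input forms above come down to a duck-typing check. A minimal sketch of that check using only the standard library; `open_handle` is a hypothetical helper written for illustration, not part of MARC::XMLReader:

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper mirroring the constructor's input check:
# a String is treated as a path; anything that responds to #read is used as-is.
def open_handle(source)
  if source.is_a?(String)
    File.new(source)
  elsif source.respond_to?(:read)
    source
  else
    raise ArgumentError, "must pass in path or File"
  end
end

xml = "<collection/>"

# 1. An IO-like object (StringIO responds to read(n)).
from_io = open_handle(StringIO.new(xml)).read

# 2. A path on disk, written here to a temporary file.
from_path = Tempfile.create("marc") do |tmp|
  tmp.write(xml)
  tmp.flush
  handle = open_handle(tmp.path)
  begin
    handle.read
  ensure
    handle.close
  end
end

# 3. Anything else is rejected with ArgumentError.
rejected = begin
  open_handle(42)
  false
rescue ArgumentError
  true
end
```

Because a String is always treated as a path, raw XML held in a String has to be wrapped in a StringIO first.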
By default, XMLReader uses REXML’s pull parser, but you can swap that out with Nokogiri or jrexml (or let the system choose the ‘best’ one). The :parser can either be one of the defined constants or the constant’s value.
reader = MARC::XMLReader.new(fh, :parser=>'magic')
It is also possible to set the default parser at the class level so that all subsequent instances use it. Calling best_available only reports which parser would be chosen:
MARC::XMLReader.best_available
"nokogiri" # returns parser name, but doesn't set it.
To set the default, use:
MARC::XMLReader.best_available!
or
MARC::XMLReader.nokogiri!
By default, all XML parsers except REXML require the MARC namespace (www.loc.gov/MARC21/slim) to be included. Adding the option `ignore_namespace` to the call to `new` with a true value will allow parsing to proceed, e.g.,
reader = MARC::XMLReader.new(filename, parser: :nokogiri, ignore_namespace: true)
You can also pass in an error_handler option that will be called if there are any validation errors found when parsing a record.
reader = MARC::XMLReader.new(fh, error_handler: ->(reader, record, block) { ... })
By default, a MARC::RecordException is raised, halting all further parsing.
Constant Summary
- USE_BEST_AVAILABLE = "magic"
- USE_REXML = "rexml"
- USE_NOKOGIRI = "nokogiri"
- USE_JREXML = "jrexml"
- USE_JSTAX = "jstax"
- USE_LIBXML = "libxml"
- @@parser = USE_REXML
Instance Attribute Summary
- #error_handler ⇒ Object (readonly)
  Returns the value of attribute error_handler.
- #parser ⇒ Object (readonly)
  Returns the value of attribute parser.
Class Method Summary
- .best_available ⇒ Object
  Returns the value of the best available parser.
- .best_available! ⇒ Object
  Sets the best available parser as the default.
- .choose_parser(p) ⇒ Object
- .jrexml! ⇒ Object
  Sets jrexml as the default parser.
- .nokogiri! ⇒ Object
  Sets Nokogiri as the default parser.
- .parser ⇒ Object
  Returns the currently set parser type.
- .parser=(p) ⇒ Object
  Sets the class parser.
- .parsers ⇒ Object
  Returns an array of all the parsers available.
- .rexml! ⇒ Object
  Sets REXML as the default parser.
Instance Method Summary
- #initialize(file, options = {}) ⇒ XMLReader (constructor)
  A new instance of XMLReader.
Constructor Details
#initialize(file, options = {}) ⇒ XMLReader
Returns a new instance of XMLReader.
# File 'lib/marc/xmlreader.rb', line 58
def initialize(file, options = {})
  if file.is_a?(String)
    handle = File.new(file)
  elsif file.respond_to?(:read, 5)
    handle = file
  else
    raise ArgumentError, "must pass in path or File"
  end
  @handle = handle

  if options[:ignore_namespace]
    @ignore_namespace = options[:ignore_namespace]
  end

  parser = if options[:parser]
    self.class.choose_parser(options[:parser].to_s)
  else
    @@parser
  end

  case parser
  when "magic" then extend MagicReader
  when "rexml" then extend REXMLReader
  when "jrexml"
    raise ArgumentError, "jrexml only available under jruby" unless defined? JRUBY_VERSION
    extend JREXMLReader
  when "nokogiri" then extend NokogiriReader
  when "jstax"
    raise ArgumentError, "jstax only available under jruby" unless defined? JRUBY_VERSION
    extend JRubySTAXReader
  when "libxml" then extend LibXMLReader
    raise ArgumentError, "libxml not available under jruby" if defined? JRUBY_VERSION
  end

  @error_handler = options[:error_handler]
end
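The constructor picks its parsing behaviour by calling extend on the new instance, so each reader gets the methods of one parser module without affecting other instances. A minimal sketch of that pattern, assuming nothing about the real reader modules; `FastReader`, `PlainReader`, `SketchReader` and `each_record` are hypothetical names used only for illustration:

```ruby
# Hypothetical stand-ins for the parser-specific reader modules.
module FastReader
  def each_record
    yield "record parsed by FastReader"
  end
end

module PlainReader
  def each_record
    yield "record parsed by PlainReader"
  end
end

class SketchReader
  include Enumerable

  BACKENDS = {"fast" => FastReader, "plain" => PlainReader}.freeze

  def initialize(parser = "plain")
    backend = BACKENDS.fetch(parser) do
      raise ArgumentError, "Parser '#{parser}' not defined"
    end
    # extend mixes the module into this one instance only, not the class.
    extend backend
  end

  # Enumerable needs #each; delegate to whatever the backend supplies.
  def each(&block)
    each_record(&block)
  end
end

fast = SketchReader.new("fast")
plain = SketchReader.new

fast_records = fast.to_a
plain_records = plain.to_a
```

Two readers built from the same class can therefore use different backends side by side, which is what lets a per-instance :parser option coexist with a class-level default.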
Instance Attribute Details
#error_handler ⇒ Object (readonly)
Returns the value of attribute error_handler.
# File 'lib/marc/xmlreader.rb', line 56
def error_handler
  @error_handler
end
#parser ⇒ Object (readonly)
Returns the value of attribute parser.
# File 'lib/marc/xmlreader.rb', line 56
def parser
  @parser
end
Class Method Details
.best_available ⇒ Object
Returns the value of the best available parser
# File 'lib/marc/xmlreader.rb', line 117
def best_available
  parser = nil
  if defined? JRUBY_VERSION
    unless parser
      begin
        require "nokogiri"
        parser = USE_NOKOGIRI
      rescue LoadError
      end
    end
    unless parser
      begin
        # try to find the class, so we throw an error if not found
        java.lang.Class.forName("javax.xml.stream.XMLInputFactory")
        parser = USE_JSTAX
      rescue java.lang.ClassNotFoundException
      end
    end
    unless parser
      begin
        require "jrexml"
        parser = USE_JREXML
      rescue LoadError
      end
    end
  else
    begin
      require "nokogiri"
      parser = USE_NOKOGIRI
    rescue LoadError
    end
    unless defined? JRUBY_VERSION
      unless parser
        begin
          require "xml"
          parser = USE_LIBXML
        rescue LoadError
        end
      end
    end
  end
  parser ||= USE_REXML
  parser
end
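The method above tries each optional library in turn and falls back to REXML when none can be loaded. The same require-and-rescue chain can be sketched generically; `first_loadable` is a hypothetical helper, and "no_such_library_for_sketch" is a made-up feature name chosen so that the require fails:

```ruby
# Hypothetical helper: return the label of the first library that loads,
# or the fallback label when none of them do.
def first_loadable(candidates, fallback)
  candidates.each do |label, feature|
    begin
      require feature
      return label
    rescue LoadError
      # Library not installed; try the next candidate.
      next
    end
  end
  fallback
end

# "set" ships with Ruby, so it stands in for an installed optional parser.
choice = first_loadable(
  {"missing" => "no_such_library_for_sketch", "set" => "set"},
  "builtin"
)

# With no loadable candidate, the fallback label is returned.
none = first_loadable({"missing" => "no_such_library_for_sketch"}, "builtin")
```

Order matters here: candidates earlier in the hash win, just as Nokogiri is tried before the other parsers above.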
.best_available! ⇒ Object
Sets the best available parser as the default
# File 'lib/marc/xmlreader.rb', line 163
def best_available!
  @@parser = best_available
end
.choose_parser(p) ⇒ Object
# File 'lib/marc/xmlreader.rb', line 182
def choose_parser(p)
  match = false
  constants.each do |const|
    next unless const.to_s.match?("^USE_")
    if const_get(const) == p
      match = true
      return p
    end
  end
  raise ArgumentError.new("Parser '#{p}' not defined") unless match
end
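choose_parser validates a name by comparing it with the value of every constant whose name starts with USE_. A compact sketch of the same lookup; `ParserRegistry` and its methods are hypothetical and only illustrate the technique:

```ruby
# Hypothetical class mirroring the USE_* constant convention.
class ParserRegistry
  USE_REXML = "rexml"
  USE_NOKOGIRI = "nokogiri"
  OTHER = "ignored" # not a parser: the name lacks the USE_ prefix

  # Values of every constant whose name starts with USE_.
  def self.parser_values
    constants.grep(/\AUSE_/).map { |const| const_get(const) }
  end

  # Return the name if it is a known parser value, otherwise raise.
  def self.choose(name)
    return name if parser_values.include?(name)
    raise ArgumentError, "Parser '#{name}' not defined"
  end
end

# A symbol works once converted with to_s, as the constructor does.
chosen = ParserRegistry.choose(:nokogiri.to_s)
```

Adding a new parser under this convention only requires defining another USE_ constant; the lookup picks it up without further changes.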
.jrexml! ⇒ Object
Sets jrexml as the default parser
# File 'lib/marc/xmlreader.rb', line 173
def jrexml!
  @@parser = USE_JREXML
end
.nokogiri! ⇒ Object
Sets Nokogiri as the default parser
# File 'lib/marc/xmlreader.rb', line 168
def nokogiri!
  @@parser = USE_NOKOGIRI
end
.parser ⇒ Object
Returns the currently set parser type
# File 'lib/marc/xmlreader.rb', line 97
def parser
  @@parser
end
.parser=(p) ⇒ Object
Sets the class parser
# File 'lib/marc/xmlreader.rb', line 112
def parser=(p)
  @@parser = choose_parser(p)
end
.parsers ⇒ Object
Returns an array of the names of all parser constants (those beginning with USE_)
# File 'lib/marc/xmlreader.rb', line 102
def parsers
  p = []
  constants.each do |const|
    next unless const.match?("^USE_")
    p << const
  end
  p
end
.rexml! ⇒ Object
Sets REXML as the default parser
# File 'lib/marc/xmlreader.rb', line 178
def rexml!
  @@parser = USE_REXML
end