Class: SAXish
- Inherits: Object
  - Object
  - SAXish
- Defined in: lib/xamplr-pp/saxish.rb
Instance Attribute Summary
- #handler ⇒ Object
  The Ruby implementation of the xampl-pp parser is called Xampl_PP, and SAXish will be the name of our SAX-like parser.
- #processNamespace ⇒ Object
  Sax parsers need an event handler.
- #reportNamespaceAttributes ⇒ Object
  Returns the value of attribute reportNamespaceAttributes.
Instance Method Summary
- #attributeCount ⇒ Object
- #attributeName(i) ⇒ Object
- #attributeNamespace(i) ⇒ Object
- #attributePrefix(i) ⇒ Object
- #attributeQName(i) ⇒ Object
- #attributeValue(i) ⇒ Object
- #column ⇒ Object
- #depth ⇒ Object
- #line ⇒ Object
- #parse(filename) ⇒ Object
  This block of comments can be ignored, certainly for the first reading.
- #parseString(string) ⇒ Object
- #work ⇒ Object
  Constructing an instance of xampl-pp is pretty straightforward: Xampl_PP.new.
Instance Attribute Details
#handler ⇒ Object
The Ruby implementation of the xampl-pp parser is called Xampl_PP, and SAXish will be the name of our SAX-like parser.
# File 'lib/xamplr-pp/saxish.rb', line 54
def handler
  @handler
end
#processNamespace ⇒ Object
Sax parsers need an event handler. ‘handler’ is it. The handler is expected to implement the methods defined in the module ‘SaxishHandler’. SaxishHandler is intended to be an adapter (so you can include it in any handler you write), so only the event handlers for the events you are interested in need to be redefined. SAXdemo is an implementation of SaxishHandler that gathers some statistics.
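The adapter idea can be sketched as follows. This is a self-contained illustration rather than the library's own code: `DemoHandlerDefaults` and `ElementCounter` are hypothetical names, and only a few of the event methods are shown. The `startElement` and `endElement` argument lists follow the calls made in `work`.

```ruby
# Hypothetical stand-in for the adapter module: every event is a no-op by default.
module DemoHandlerDefaults
  def startDocument; end
  def endDocument; end
  def startElement(name, namespace, qname, prefix, attributeCount, isEmpty, parser); end
  def endElement(name, namespace, qname, prefix); end
  def text(text, isWhitespace); end
end

# A handler that only cares about element starts; everything else stays a no-op.
class ElementCounter
  include DemoHandlerDefaults

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # The only event this handler redefines.
  def startElement(name, namespace, qname, prefix, attributeCount, isEmpty, parser)
    @counts[qname] += 1
  end
end

# Drive the handler by hand, the way a parser would.
counter = ElementCounter.new
counter.startDocument
counter.startElement('a', nil, 'a', nil, 0, false, nil)
counter.text('ignored', false)
counter.startElement('a', nil, 'a', nil, 0, true, nil)
counter.endDocument
```

Because the defaults come from an included module, a handler stays short no matter how many event types the parser reports.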
Xampl-pp requires something it calls a resolver. This is a class that implements a method called resolve. There are a number of predefined entities in xampl-pp: &amp;, &apos;, &gt;, &lt;, and &quot;. It is possible to add more entities by adding entries to the entityMap hashtable. If an entity is encountered that is not in entityMap then the resolve method on the resolver is called. The default resolver returns nil, which causes an exception to be thrown. If you specify your own resolver you can do anything you like to obtain a value for the entity, or you can return nil (and an exception will be thrown). Xampl-pp, by default, is its own resolver and simply returns nil.
We are going to require that our saxish handler also be the entity resolver. This is reflected in the SaxishHandler module, which implements a resolve method that always returns nil.
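The lookup order described above (entityMap first, then the resolver, then an exception if the resolver returns nil) might look roughly like this. It is a sketch under assumptions: `DEMO_ENTITY_MAP`, `NilResolver`, `CopyrightResolver`, and `expand_entity` are hypothetical names, and the exception class xampl-pp actually raises is not stated here, so `RuntimeError` is used.

```ruby
# Hypothetical table of predefined entities, standing in for entityMap.
DEMO_ENTITY_MAP = {
  'amp' => '&', 'apos' => "'", 'gt' => '>', 'lt' => '<', 'quot' => '"'
}.freeze

# Mirrors the default behaviour: resolve always returns nil.
class NilResolver
  def resolve(name)
    nil
  end
end

# A custom resolver that knows one extra entity.
class CopyrightResolver
  def resolve(name)
    name == 'copy' ? "\u00A9" : nil
  end
end

# Look in the map first, then ask the resolver; nil from the resolver is an error.
# (RuntimeError is an assumption, not necessarily what xampl-pp raises.)
def expand_entity(name, resolver, entity_map = DEMO_ENTITY_MAP)
  value = entity_map[name] || resolver.resolve(name)
  raise "unresolved entity: #{name}" if value.nil?
  value
end

ampersand = expand_entity('amp', NilResolver.new)
copyright = expand_entity('copy', CopyrightResolver.new)
```

With `NilResolver`, any entity outside the table raises, which matches the default behaviour described above.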
# File 'lib/xamplr-pp/saxish.rb', line 80
def processNamespace
  @processNamespace
end
#reportNamespaceAttributes ⇒ Object
Returns the value of attribute reportNamespaceAttributes.
# File 'lib/xamplr-pp/saxish.rb', line 81
def reportNamespaceAttributes
  @reportNamespaceAttributes
end
Instance Method Details
#attributeCount ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 193
def attributeCount
  return @xpp.attributeName.length
end
#attributeName(i) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 197
def attributeName(i)
  return @xpp.attributeName[i]
end
#attributeNamespace(i) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 201
def attributeNamespace(i)
  return @xpp.attributeNamespace[i]
end
#attributePrefix(i) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 209
def attributePrefix(i)
  return @xpp.attributePrefix[i]
end
#attributeQName(i) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 205
def attributeQName(i)
  return @xpp.attributeQName[i]
end
#attributeValue(i) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 213
def attributeValue(i)
  return @xpp.attributeValue[i]
end
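The attribute readers are index-based, and `work` passes both `attributeCount` and the parser itself to `startElement`, so a handler can walk the attributes of the current element by index. A minimal sketch, using a hypothetical `DemoAttributeSource` in place of a real SAXish instance and a hypothetical helper `attributes_to_hash`:

```ruby
# Hypothetical stand-in for the object passed to startElement; it exposes the
# same index-based readers as SAXish, backed here by plain parallel arrays.
class DemoAttributeSource
  def initialize(names, values)
    @names = names
    @values = values
  end

  def attributeCount
    @names.length
  end

  def attributeName(i)
    @names[i]
  end

  def attributeValue(i)
    @values[i]
  end
end

# Collect the attributes of the current element into a Hash, index by index.
def attributes_to_hash(source)
  (0...source.attributeCount).each_with_object({}) do |i, hash|
    hash[source.attributeName(i)] = source.attributeValue(i)
  end
end

source = DemoAttributeSource.new(%w[id class], %w[42 note])
attributes = attributes_to_hash(source)
```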
#column ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 225
def column
  return @xpp.column
end
#depth ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 217
def depth
  return @xpp.depth
end
#line ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 221
def line
  return @xpp.line
end
#parse(filename) ⇒ Object
This block of comments can be ignored, certainly for the first reading. It talks about some control you have over how xampl-pp works. The default behaviour is the most commonly used.
There are two main controls used here: processNamespace and reportNamespaceAttributes. If processNamespace is true, then namespaces in the XML file being parsed will be processed. Processing means that if an element <prefix:name/> is encountered, then four variables will be set up in the parser instance: name is ‘name’, prefix is ‘prefix’, qname is ‘prefix:name’, and namespace is defined. If the namespace cannot be defined an exception is thrown. In addition the xmlns attributes are processed. If processNamespace is false then name and qname will both be ‘prefix:name’, and both prefix and namespace will be undefined. If reportNamespaceAttributes is true then the xmlns attributes will be reported along with all the other attributes; if false, they will be hidden. The default behaviour is to process namespaces but not to report the namespace attributes.
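The effect of the namespace control on the four element variables can be illustrated with a small stand-alone sketch. `describe_element` and `bindings` are hypothetical and do not reflect xampl-pp's internals. Treating an unprefixed element with no default namespace as having a nil namespace, and raising `RuntimeError` for an unbound prefix, are both assumptions.

```ruby
# Hypothetical helper: returns [name, prefix, qname, namespace] for an element name.
# `bindings` maps prefixes to namespace URIs, as collected from xmlns attributes.
def describe_element(raw_name, bindings, process_namespace: true)
  # Without namespace processing, name and qname are both the raw name.
  return [raw_name, nil, raw_name, nil] unless process_namespace

  prefix, local = raw_name.include?(':') ? raw_name.split(':', 2) : [nil, raw_name]
  # A nil key stands for the default namespace (a plain xmlns attribute).
  namespace = bindings[prefix]
  # An unbound prefix is an error (the exception class here is an assumption).
  raise "undefined namespace prefix: #{prefix}" if prefix && namespace.nil?

  [local, prefix, raw_name, namespace]
end

bindings = { 'prefix' => 'http://example.com/ns' }
processed   = describe_element('prefix:name', bindings)
unprocessed = describe_element('prefix:name', bindings, process_namespace: false)
```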
There are two other controls that should be mentioned. They are not used here.
Pull parsers are pretty low-level tools. They are meant to be fast. While many well-formedness constraints are enforced, not all are. If the control checkWellFormed is true then additional checks are made. Even so, xampl-pp does not guarantee that it will parse only well-formed XML documents: it will parse some XML files that are not well formed without objecting. In future releases, it will be possible to have xampl-pp accept only well-formed documents. If checkWellFormed is false, then the parser doesn’t go out of its way to notice ill-formed documents. The default is true.
The fourth control is ‘utf8encode’. If this is true, and it defaults to true, then when a character reference like &#1234; is encountered it will be encoded using UTF-8 rules. Given the current state of the parser, it would be best to leave it set to true. If you want to change this then you must either never use &#N; references with N greater than 255 (Ruby will throw an exception), or you must redefine xampl-pp’s encode method to do the right thing.
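The difference between the two encodings can be seen in plain Ruby, without the parser. For a reference such as &#1234; (code point 1234, the character Ӓ), UTF-8 encoding produces a two-byte string, while a single-byte conversion fails. This is only an illustration of the underlying Ruby behaviour; whether xampl-pp's encode method uses these exact calls is an assumption.

```ruby
# The character reference &#1234; carries the code point 1234.
code_point = 1234

# Encoding by UTF-8 rules yields a two-byte string for this code point.
utf8 = [code_point].pack('U')
byte_count = utf8.bytes.length

# Integer#chr with no encoding argument only handles small values; for larger
# ones it raises RangeError (assuming Encoding.default_internal is unset).
begin
  code_point.chr
  fits_single_byte = true
rescue RangeError
  fits_single_byte = false
end
```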
# File 'lib/xamplr-pp/saxish.rb', line 124
def parse(filename)
  @xpp = Xampl_PP.new
  @xpp.input = File.new(filename)
  @xpp.processNamespace = @processNamespace
  @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
  @xpp.resolver = @handler
  work
end
#parseString(string) ⇒ Object
# File 'lib/xamplr-pp/saxish.rb', line 134
def parseString(string)
  @xpp = Xampl_PP.new
  @xpp.input = string
  @xpp.processNamespace = @processNamespace
  @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
  @xpp.resolver = @handler
  work
end
#work ⇒ Object
Constructing an instance of xampl-pp is pretty straightforward: Xampl_PP.new
Xampl_PP accepts two kinds of input: IO and String. The same method, ‘input’, is used to specify the input. It is possible to set the input at any time, but if you do, the current input will be closed if it is of type IO, and parsing will begin at the current location of the new input.
The methods parse and parseString illustrate.
# File 'lib/xamplr-pp/saxish.rb', line 155
def work
  while not @xpp.endDocument? do
    case @xpp.nextEvent
      when Xampl_PP::START_DOCUMENT
        @handler.startDocument
      when Xampl_PP::END_DOCUMENT
        @handler.endDocument
      when Xampl_PP::START_ELEMENT
        @handler.startElement(@xpp.name,
                              @xpp.namespace,
                              @xpp.qname,
                              @xpp.prefix,
                              attributeCount,
                              @xpp.emptyElement,
                              self)
      when Xampl_PP::END_ELEMENT
        @handler.endElement(@xpp.name,
                            @xpp.namespace,
                            @xpp.qname,
                            @xpp.prefix)
      when Xampl_PP::TEXT
        @handler.text(@xpp.text, @xpp.whitespace?)
      when Xampl_PP::CDATA_SECTION
        @handler.cdataSection(@xpp.text)
      when Xampl_PP::ENTITY_REF
        @handler.entityRef(@xpp.name, @xpp.text)
      when Xampl_PP::IGNORABLE_WHITESPACE
        @handler.ignoreableWhitespace(@xpp.text)
      when Xampl_PP::PROCESSING_INSTRUCTION
        @handler.processingInstruction(@xpp.text)
      when Xampl_PP::COMMENT
        @handler.comment(@xpp.text)
      when Xampl_PP::DOCTYPE
        @handler.doctype(@xpp.text)
    end
  end
end
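The loop in `work` turns a pull parser into a push (SAX-style) one: it keeps asking for the next event and forwards each to the matching handler method. The same pattern can be shown in isolation with a stand-in pull source. Everything here is hypothetical (`DemoPullSource`, `RecordingHandler`, `pump`), and the handler signatures are simplified to one argument, unlike the real ones above.

```ruby
# Hypothetical pull source: hands out pre-recorded events one at a time.
class DemoPullSource
  START_ELEMENT = :start_element
  END_ELEMENT   = :end_element
  TEXT          = :text
  END_DOCUMENT  = :end_document

  attr_reader :name, :text

  def initialize(events)
    @events = events.dup
    @done = false
  end

  def endDocument?
    @done
  end

  # Advance to the next event and expose its data through name/text.
  def nextEvent
    type, payload = @events.shift || [END_DOCUMENT, nil]
    @done = (type == END_DOCUMENT)
    @name = @text = nil
    @name = payload if [START_ELEMENT, END_ELEMENT].include?(type)
    @text = payload if type == TEXT
    type
  end
end

# Records each callback so the order of events is visible.
class RecordingHandler
  attr_reader :log

  def initialize
    @log = []
  end

  def startElement(name); @log << [:start, name]; end
  def endElement(name);   @log << [:end, name];   end
  def text(text);         @log << [:text, text];  end
  def endDocument;        @log << [:end_document]; end
end

# Pull events until the document ends, pushing each one to the handler.
def pump(source, handler)
  until source.endDocument?
    case source.nextEvent
    when DemoPullSource::START_ELEMENT then handler.startElement(source.name)
    when DemoPullSource::END_ELEMENT   then handler.endElement(source.name)
    when DemoPullSource::TEXT          then handler.text(source.text)
    when DemoPullSource::END_DOCUMENT  then handler.endDocument
    end
  end
end

events = [[:start_element, 'doc'], [:text, 'hi'], [:end_element, 'doc'], [:end_document, nil]]
handler = RecordingHandler.new
pump(DemoPullSource.new(events), handler)
```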