Class: REXML::Parsers::BaseParser
- Defined in:
- lib/rexml/parsers/baseparser.rb
Overview
Using the Pull Parser
This API is experimental, and subject to change.
require 'rexml/parsers/pullparser'

parser = REXML::Parsers::PullParser.new( "<a>text<b att='val'/>txet</a>" )
while parser.has_next?
  res = parser.pull
  puts res[1]['att'] if res.start_element? and res[0] == 'b'
end
See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
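A minimal sketch of reading PullEvent data, assuming the rexml gem is available. `event_type` returns the event's type symbol, and indexing starts at the first piece of data: for a start tag, `event[0]` is the name and `event[1]` the attribute hash, the same values StreamListener#tag_start receives. The variable names are illustrative only.

```ruby
require 'rexml/parsers/pullparser'

parser = REXML::Parsers::PullParser.new("<a>text<b att='val'/>txet</a>")
seen = []

while parser.has_next?
  event = parser.pull
  case event.event_type
  when :start_element
    # Same data as StreamListener#tag_start(name, attrs)
    seen << [:start, event[0], event[1]]
  when :end_element
    # Same data as StreamListener#tag_end(name)
    seen << [:end, event[0]]
  when :text
    # event[0] is the text as it appeared in the document
    seen << [:text, event[0]]
  end
end

seen.each { |entry| p entry }
```

The self-closing `<b att='val'/>` produces both a start and an end event, so consumers do not need to special-case empty elements.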
Notice that:
parser = REXML::Parsers::PullParser.new( "<a>BAD DOCUMENT" )
while parser.has_next?
  res = parser.pull
  raise res[1] if res.error?
end
Nat Price gave me some good ideas for the API.
Constant Summary
- LETTER =
'[:alpha:]'
- DIGIT =
'[:digit:]'
- COMBININGCHAR =
''  (TODO: currently an empty placeholder)
- EXTENDER =
''  (TODO: currently an empty placeholder)
- NCNAME_STR =
"[#{LETTER}_][-[:alnum:]._#{COMBININGCHAR}#{EXTENDER}]*"
- QNAME_STR =
"(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
- QNAME =
/(#{QNAME_STR})/
- NAMECHAR =
'[\-\w\.:]'
- NAME =
"([\\w:]#{NAMECHAR}*)"
- NMTOKEN =
"(?:#{NAMECHAR})+"
- NMTOKENS =
"#{NMTOKEN}(\\s+#{NMTOKEN})*"
- REFERENCE =
"&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
- REFERENCE_RE =
/#{REFERENCE}/
- DOCTYPE_START =
/\A\s*<!DOCTYPE\s/um
- DOCTYPE_END =
/\A\s*\]\s*>/um
- DOCTYPE_PATTERN =
/\s*<!DOCTYPE\s+(.*?)(\[|>)/um
- ATTRIBUTE_PATTERN =
/\s*(#{QNAME_STR})\s*=\s*(["'])(.*?)\4/um
- COMMENT_START =
/\A<!--/u
- COMMENT_PATTERN =
/<!--(.*?)-->/um
- CDATA_START =
/\A<!\[CDATA\[/u
- CDATA_END =
/\A\s*\]\s*>/um
- CDATA_PATTERN =
/<!\[CDATA\[(.*?)\]\]>/um
- XMLDECL_START =
/\A<\?xml\s/u
- XMLDECL_PATTERN =
/<\?xml\s+(.*?)\?>/um
- INSTRUCTION_START =
/\A<\?/u
- INSTRUCTION_PATTERN =
/<\?#{NAME}(\s+.*?)?\?>/um
- TAG_MATCH =
/^<((?>#{QNAME_STR}))/um
- CLOSE_MATCH =
/^\s*<\/(#{QNAME_STR})\s*>/um
- VERSION =
/\bversion\s*=\s*["'](.*?)['"]/um
- ENCODING =
/\bencoding\s*=\s*["'](.*?)['"]/um
- STANDALONE =
/\bstandalone\s*=\s*["'](.*?)['"]/um
- ENTITY_START =
/\A\s*<!ENTITY/
- IDENTITY =
/^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
- ELEMENTDECL_START =
/\A\s*<!ELEMENT/um
- ELEMENTDECL_PATTERN =
/\A\s*(<!ELEMENT.*?)>/um
- SYSTEMENTITY =
/\A\s*(%.*?;)\s*$/um
- ENUMERATION =
"\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
- NOTATIONTYPE =
"NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
- ENUMERATEDTYPE =
"(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
- ATTTYPE =
"(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
- ATTVALUE =
"(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
- DEFAULTDECL =
"(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
- ATTDEF =
"\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
- ATTDEF_RE =
/#{ATTDEF}/
- ATTLISTDECL_START =
/\A\s*<!ATTLIST/um
- ATTLISTDECL_PATTERN =
/\A\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
- NOTATIONDECL_START =
/\A\s*<!NOTATION/um
- PUBLIC =
/\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
- SYSTEM =
/\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
- TEXT_PATTERN =
/\A([^<]*)/um
- PUBIDCHAR =
"\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"  (first of the entity constants)
- SYSTEMLITERAL =
%Q{((?:"[^"]*")|(?:'[^']*'))}
- PUBIDLITERAL =
%Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
- EXTERNALID =
"(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
- NDATADECL =
"\\s+NDATA\\s+#{NAME}"
- PEREFERENCE =
"%#{NAME};"
- ENTITYVALUE =
%Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
- PEDEF =
"(?:#{ENTITYVALUE}|#{EXTERNALID})"
- ENTITYDEF =
"(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
- PEDECL =
"<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
- GEDECL =
"<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
- ENTITYDECL =
/\s*(?:#{GEDECL})|(?:#{PEDECL})/um
- EREFERENCE =
/&(?!#{NAME};)/
- DEFAULT_ENTITIES =
{ 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }
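A standalone sketch that copies a few of the constants above verbatim, so it runs without loading REXML and does not depend on the installed version. It shows the capture groups of ATTRIBUTE_PATTERN (qualified name, prefix, local name, quote, value) and the layout of each DEFAULT_ENTITIES entry: [pattern for the reference, reference text, literal character, pattern for the character]. The `escape` and `unescape` helpers are hypothetical, written only for illustration.

```ruby
# Constants copied from the listing above.
LETTER        = '[:alpha:]'
COMBININGCHAR = ''
EXTENDER      = ''
NCNAME_STR    = "[#{LETTER}_][-[:alnum:]._#{COMBININGCHAR}#{EXTENDER}]*"
QNAME_STR     = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
ATTRIBUTE_PATTERN = /\s*(#{QNAME_STR})\s*=\s*(["'])(.*?)\4/um
DEFAULT_ENTITIES = {
  'gt'   => [/&gt;/, '&gt;', '>', />/],
  'lt'   => [/&lt;/, '&lt;', '<', /</],
  'quot' => [/&quot;/, '&quot;', '"', /"/],
  "apos" => [/&apos;/, "&apos;", "'", /'/]
}

# Each scan result is [qname, prefix, local_name, quote, value].
attrs = %q{ xmlns:x='urn:x' id="a1"}.scan(ATTRIBUTE_PATTERN)

# Hypothetical helper: replace each literal character (index 3) with its reference (index 1).
def escape(text)
  DEFAULT_ENTITIES.each_value.reduce(text) do |s, (_ref_re, ref, _char, char_re)|
    s.gsub(char_re, ref)
  end
end

# Hypothetical helper: replace each reference (index 0) with its literal character (index 2).
def unescape(text)
  DEFAULT_ENTITIES.each_value.reduce(text) do |s, (ref_re, _ref, char, _char_re)|
    s.gsub(ref_re, char)
  end
end

escaped = escape(%q{<"it's">})
p attrs
p escaped # => "&lt;&quot;it&apos;s&quot;&gt;"
```

Note that DEFAULT_ENTITIES has no entry for `&`; the parser handles bare ampersands separately through EREFERENCE.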
Instance Attribute Summary
- #source ⇒ Object (readonly)
  Returns the value of attribute source.
Instance Method Summary
- #add_listener(listener) ⇒ Object
- #empty? ⇒ Boolean
  Returns true if there are no more events.
- #entity(reference, entities) ⇒ Object
- #has_next? ⇒ Boolean
  Returns true if there are more events.
- #initialize(source) ⇒ BaseParser (constructor)
  A new instance of BaseParser.
- #normalize(input, entities = nil, entity_filter = nil) ⇒ Object
  Escapes all possible entities.
- #peek(depth = 0) ⇒ Object
  Peek at the depth event in the stack.
- #position ⇒ Object
- #pull ⇒ Object
  Returns the next event.
- #stream=(source) ⇒ Object
- #unnormalize(string, entities = nil, filter = nil) ⇒ Object
  Unescapes all possible entities.
- #unshift(token) ⇒ Object
  Push an event back on the head of the stream.
Constructor Details
#initialize(source) ⇒ BaseParser
Returns a new instance of BaseParser.
# File 'lib/rexml/parsers/baseparser.rb', line 111
def initialize( source )
  self.stream = source
  @listeners = []
end
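A minimal sketch, assuming the rexml gem is available. Because the constructor delegates to #stream=, which calls SourceFactory.create_from, the same document can be supplied as a String or as an IO-like object such as a StringIO. `drain` is a hypothetical helper.

```ruby
require 'rexml/parsers/baseparser'
require 'stringio'

# Hypothetical helper: pull every event until the parser reports that none remain.
def drain(parser)
  events = []
  events << parser.pull while parser.has_next?
  events
end

xml = "<a>text<b att='val'/>txet</a>"

# A String and an IO wrapping the same text should yield the same events.
from_string = drain(REXML::Parsers::BaseParser.new(xml))
from_io     = drain(REXML::Parsers::BaseParser.new(StringIO.new(xml)))

p from_string.map(&:first)
```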
Instance Attribute Details
#source ⇒ Object (readonly)
Returns the value of attribute source.
# File 'lib/rexml/parsers/baseparser.rb', line 120
def source
  @source
end
Instance Method Details
#add_listener(listener) ⇒ Object
Registers listener; every event returned by #pull is also passed to listener.receive.
# File 'lib/rexml/parsers/baseparser.rb', line 116
def add_listener( listener )
  @listeners << listener
end
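A minimal sketch, assuming the rexml gem is available. Per the #pull listing, a listener only needs a `receive` method; `EventRecorder` is a hypothetical class that records the type of each event it is given.

```ruby
require 'rexml/parsers/baseparser'

# Hypothetical listener: BaseParser#pull calls #receive with each event Array.
class EventRecorder
  attr_reader :types

  def initialize
    @types = []
  end

  def receive(event)
    @types << event.first
  end
end

parser = REXML::Parsers::BaseParser.new("<a>text</a>")
recorder = EventRecorder.new
parser.add_listener(recorder)

parser.pull while parser.has_next?
p recorder.types # => [:start_element, :text, :end_element]
```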
#empty? ⇒ Boolean
Returns true if there are no more events.
# File 'lib/rexml/parsers/baseparser.rb', line 142
def empty?
  return (@source.empty? and @stack.empty?)
end
#entity(reference, entities) ⇒ Object
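Returns the replacement text for the entity named reference. The name is looked up in entities first, then in DEFAULT_ENTITIES, and the value found is unnormalized; nil is returned if the entity is not defined.
# File 'lib/rexml/parsers/baseparser.rb', line 409
def entity( reference, entities )
  value = nil
  value = entities[ reference ] if entities
  if not value
    value = DEFAULT_ENTITIES[ reference ]
    value = value[2] if value
  end
  unnormalize( value, entities ) if value
end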
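A minimal sketch, assuming the rexml gem is available. The `entities` table is hypothetical; in a real document such values would come from `<!ENTITY>` declarations. Because #entity passes the value through #unnormalize, references inside a value are expanded as well.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<root/>")

# Hypothetical entity table: name => replacement text.
entities = { "co" => "ACME", "full" => "&co; Corp." }

company = parser.entity("co", entities)       # plain replacement text
full    = parser.entity("full", entities)     # "&co;" inside the value is expanded too
missing = parser.entity("missing", entities)  # undefined names yield nil
p [company, full, missing] # => ["ACME", "ACME Corp.", nil]
```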
#has_next? ⇒ Boolean
Returns true if there are more events. Synonymous with !empty?
# File 'lib/rexml/parsers/baseparser.rb', line 147
def has_next?
  return !(@source.empty? and @stack.empty?)
end
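A minimal sketch of the usual drain loop, assuming the rexml gem is available. Since #has_next? is the negation of #empty?, the two always report opposite values.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")
before = [parser.has_next?, parser.empty?]

pulled = 0
while parser.has_next?   # equivalent to: until parser.empty?
  parser.pull
  pulled += 1
end

after = [parser.has_next?, parser.empty?]
p [before, pulled, after] # => [[true, false], 3, [false, true]]
```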
#normalize(input, entities = nil, entity_filter = nil) ⇒ Object
Escapes all possible entities: every & that does not begin a named entity reference becomes &amp;, each value in entities is replaced by its &name; reference, and the characters >, <, " and ' are replaced using DEFAULT_ENTITIES.
# File 'lib/rexml/parsers/baseparser.rb', line 420
def normalize( input, entities=nil, entity_filter=nil )
  copy = input.clone
  # Doing it like this rather than in a loop improves the speed
  copy.gsub!( EREFERENCE, '&amp;' )
  entities.each do |key, value|
    copy.gsub!( value, "&#{key};" ) unless entity_filter and entity_filter.include?(entity)
  end if entities
  copy.gsub!( EREFERENCE, '&amp;' )
  DEFAULT_ENTITIES.each do |key, value|
    copy.gsub!( value[3], value[1] )
  end
  copy
end
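A minimal sketch, assuming the rexml gem is available; the `{ "co" => "ACME" }` table is hypothetical. It leaves entity_filter unset: in the source listed here, the filter check passes `entity` (which resolves to the two-argument #entity method) rather than `key`, so supplying a filter together with entities appears to raise ArgumentError.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<root/>")

plain  = parser.normalize("a < b & c")                         # markup characters escaped
kept   = parser.normalize("&lt; stays")                        # existing named references left alone
custom = parser.normalize("made by ACME", { "co" => "ACME" })  # entity value replaced by &co;
p [plain, kept, custom] # => ["a &lt; b &amp; c", "&lt; stays", "made by &co;"]
```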
#peek(depth = 0) ⇒ Object
Peek at the depth event in the stack. The first element on the stack is at depth 0. If depth is -1, the parser parses to the end of the input stream and returns the last event, which is always :end_document. Be aware that this causes the stream to be parsed up to the depth event, so you can effectively pre-parse the entire document (pull the entire thing into memory) using this method.
# File 'lib/rexml/parsers/baseparser.rb', line 163
def peek depth=0
  raise %Q[Illegal argument "#{depth}"] if depth < -1
  temp = []
  if depth == -1
    temp.push(pull()) until empty?
  else
    while @stack.size+temp.size < depth+1
      temp.push(pull())
    end
  end
  @stack += temp if temp.size > 0
  @stack[depth]
end
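A minimal sketch, assuming the rexml gem is available. Peeking parses ahead but keeps the events on the stack, so #pull still returns them in document order afterwards.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a><b>x</b></a>")

first  = parser.peek(0)  # parses one event and keeps it on the stack
second = parser.peek(1)  # parses one more; nothing has been consumed yet

# The peeked events are still delivered by #pull, in order.
pulled = [parser.pull, parser.pull]
p first, second
```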
#position ⇒ Object
Returns the current position reported by the source, or 0 if the source does not track a position.
# File 'lib/rexml/parsers/baseparser.rb', line 132
def position
  if @source.respond_to? :position
    @source.position
  else
    # FIXME
    0
  end
end
#pull ⇒ Object
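Returns the next event as an Array whose first element is the event type, for example [:start_element, name, attributes], after passing it to each registered listener's receive method. REXML::Parsers::PullParser wraps these arrays in PullEvent objects.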
# File 'lib/rexml/parsers/baseparser.rb', line 178
def pull
  pull_event.tap do |event|
    @listeners.each do |listener|
      listener.receive event
    end
  end
end
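A minimal sketch, assuming the rexml gem is available. Each event is a plain Array, so it can be destructured into its type and data; `log` and `attributes` are illustrative names.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text<b att='val'/>txet</a>")
log = []
attributes = {}

while parser.has_next?
  type, *data = parser.pull          # each event is an Array: [type, *data]
  log << "#{type} #{data.first}"     # data.first is the tag name or the text
  attributes[data[0]] = data[1] if type == :start_element
end

puts log
p attributes["b"]
```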
#stream=(source) ⇒ Object
Sets the input to parse (anything SourceFactory.create_from accepts, such as a String or an IO) and resets the parser state.
# File 'lib/rexml/parsers/baseparser.rb', line 122
def stream=( source )
  @source = SourceFactory.create_from( source )
  @closed = nil
  @document_status = nil
  @tags = []
  @stack = []
  @entities = []
  @nsstack = []
end
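A minimal sketch, assuming the rexml gem is available and that reassigning the stream is enough to reuse a parser for a second document, as the reset of the stack, open tags and document status in the listing suggests.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>one</a>")
parser.pull while parser.has_next?   # finish the first document

# Reassigning the stream resets the parser state for a new document.
parser.stream = "<b>two</b>"
second = []
second << parser.pull while parser.has_next?
p second.map(&:first)
```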
#unnormalize(string, entities = nil, filter = nil) ⇒ Object
Unescapes all possible entities: line endings are normalized to \n, numeric character references are expanded, named references are replaced using entities and DEFAULT_ENTITIES (except names listed in filter), and finally &amp; becomes &.
# File 'lib/rexml/parsers/baseparser.rb', line 436
def unnormalize( string, entities=nil, filter=nil )
  rv = string.clone
  rv.gsub!( /\r\n?/, "\n" )
  matches = rv.scan( REFERENCE_RE )
  return rv if matches.size == 0
  rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
    m=$1
    m = "0#{m}" if m[0] == ?x
    [Integer(m)].pack('U*')
  }
  matches.collect!{|x|x[0]}.compact!
  if matches.size > 0
    matches.each do |entity_reference|
      unless filter and filter.include?(entity_reference)
        entity_value = entity( entity_reference, entities )
        if entity_value
          re = /&#{entity_reference};/
          rv.gsub!( re, entity_value )
        else
          er = DEFAULT_ENTITIES[entity_reference]
          rv.gsub!( er[0], er[2] ) if er
        end
      end
    end
    rv.gsub!( /&amp;/, '&' )
  end
  rv
end
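A minimal sketch, assuming the rexml gem is available; the `{ "co" => "ACME" }` table is hypothetical. It covers predefined, numeric and custom references, line-ending normalization, and the filter argument, which leaves the listed names escaped.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<root/>")

builtin  = parser.unnormalize("a &lt; b &amp; c &#65;&#x42;")   # predefined and numeric references
custom   = parser.unnormalize("&co; rocks", { "co" => "ACME" })  # hypothetical entity table
lines    = parser.unnormalize("one\r\ntwo")                      # CRLF becomes LF
filtered = parser.unnormalize("&lt;&gt;", nil, ["lt"])           # "lt" stays escaped
p [builtin, custom, lines, filtered] # => ["a < b & c AB", "ACME rocks", "one\ntwo", "&lt;>"]
```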
#unshift(token) ⇒ Object
Push an event back on the head of the stream. This method has (theoretically) infinite depth.
# File 'lib/rexml/parsers/baseparser.rb', line 153
def unshift token
  @stack.unshift(token)
end
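A minimal sketch, assuming the rexml gem is available. A pulled event can be pushed back and pulled again; the synthetic comment event is purely illustrative and shows that #pull returns whatever is at the head of the stack before parsing further input.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text</a>")

event = parser.pull          # [:start_element, "a", {}]
parser.unshift(event)        # push it back onto the head of the stream
again = parser.pull          # the same event is returned again

# Illustrative only: any event Array can be pushed back.
parser.unshift([:comment, " injected "])
injected  = parser.pull
following = parser.pull      # parsing then resumes with the document's own text
p again, injected, following
```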