Class: REXML::Parsers::BaseParser
- Inherits: Object
  - Object
    - REXML::Parsers::BaseParser
- Defined in: lib/rexml/parsers/baseparser.rb
Overview
Using the Pull Parser
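This API is experimental and subject to change.
  require 'rexml/parsers/pullparser'

  parser = REXML::Parsers::PullParser.new( "<a>text<b att='val'/>txet</a>" )
  while parser.has_next?
    res = parser.pull
    # PullEvent#[] skips the event type, so res[0] is the tag name and res[1] its attributes
    puts res[1]['att'] if res.start_element? and res[0] == 'b'
  end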
See the PullEvent class for information on the content of the results. The data is identical to the arguments passed to the corresponding StreamListener callbacks.
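Notice that error events can be detected in the same loop:
  require 'rexml/parsers/pullparser'

  parser = REXML::Parsers::PullParser.new( "<a>BAD DOCUMENT" )
  while parser.has_next?
    res = parser.pull
    raise res[1] if res.error?
  end
Depending on the REXML version, malformed input may instead be reported by raising REXML::ParseException rather than as an error event.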
Nat Price gave me some good ideas for the API.
Constant Summary
- LETTER = '[:alpha:]'
- DIGIT = '[:digit:]'
- COMBININGCHAR = '' (TODO)
- EXTENDER = '' (TODO)
- NCNAME_STR = "[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"
- NAME_STR = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
- UNAME_STR = "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
- NAMECHAR = '[\-\w\.:]'
- NAME = "([\\w:]#{NAMECHAR}*)"
- NMTOKEN = "(?:#{NAMECHAR})+"
- NMTOKENS = "#{NMTOKEN}(\\s+#{NMTOKEN})*"
- REFERENCE = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
- REFERENCE_RE = /#{REFERENCE}/
- DOCTYPE_START = /\A\s*<!DOCTYPE\s/um
- DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um
- ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um
- COMMENT_START = /\A<!--/u
- COMMENT_PATTERN = /<!--(.*?)-->/um
- CDATA_START = /\A<!\[CDATA\[/u
- CDATA_END = /^\s*\]\s*>/um
- CDATA_PATTERN = /<!\[CDATA\[(.*?)\]\]>/um
- XMLDECL_START = /\A<\?xml\s/u
- XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um
- INSTRUCTION_START = /\A<\?/u
- INSTRUCTION_PATTERN = /<\?(.*?)(\s+.*?)?\?>/um
- TAG_MATCH = /^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um
- CLOSE_MATCH = /^\s*<\/(#{NAME_STR})\s*>/um
- VERSION = /\bversion\s*=\s*["'](.*?)['"]/um
- ENCODING = /\bencoding\s*=\s*["'](.*?)['"]/um
- STANDALONE = /\bstandalone\s*=\s*["'](.*?)['"]/um
- ENTITY_START = /^\s*<!ENTITY/
- IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
- ELEMENTDECL_START = /^\s*<!ELEMENT/um
- ELEMENTDECL_PATTERN = /^\s*(<!ELEMENT.*?)>/um
- SYSTEMENTITY = /^\s*(%.*?;)\s*$/um
- ENUMERATION = "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
- NOTATIONTYPE = "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
- ENUMERATEDTYPE = "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
- ATTTYPE = "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
- ATTVALUE = "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
- DEFAULTDECL = "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
- ATTDEF = "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
- ATTDEF_RE = /#{ATTDEF}/
- ATTLISTDECL_START = /^\s*<!ATTLIST/um
- ATTLISTDECL_PATTERN = /^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
- NOTATIONDECL_START = /^\s*<!NOTATION/um
- PUBLIC = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
- SYSTEM = /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
- TEXT_PATTERN = /\A([^<]*)/um
Entity constants:
- PUBIDCHAR = "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
- SYSTEMLITERAL = %Q{((?:"[^"]*")|(?:'[^']*'))}
- PUBIDLITERAL = %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
- EXTERNALID = "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
- NDATADECL = "\\s+NDATA\\s+#{NAME}"
- PEREFERENCE = "%#{NAME};"
- ENTITYVALUE = %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
- PEDEF = "(?:#{ENTITYVALUE}|#{EXTERNALID})"
- ENTITYDEF = "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
- PEDECL = "<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
- GEDECL = "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
- ENTITYDECL = /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
- EREFERENCE = /&(?!#{NAME};)/
- DEFAULT_ENTITIES = { 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }
These are patterns to identify common markup errors, to make the error messages more informative.
- MISSING_ATTRIBUTE_QUOTES = /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um
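Each DEFAULT_ENTITIES entry is a four-element array: a regexp for the escaped form, the escaped string, the literal character, and a regexp for that character. The short sketch below is an illustration only (not part of REXML); it takes the 'lt' entry as listed above and uses element 3 with element 1 to escape, the way normalize does, and element 0 with element 2 to unescape, the way unnormalize does.

```ruby
require 'rexml/parsers/baseparser'

# Each entry: [escaped-form regexp, escaped string, literal char, literal-char regexp]
pattern_escaped, escaped, literal, pattern_literal =
  REXML::Parsers::BaseParser::DEFAULT_ENTITIES['lt']

# Escaping direction (as in normalize): literal-char regexp -> escaped string
escaped_text = "a<b".gsub(pattern_literal, escaped)

# Unescaping direction (as in unnormalize): escaped-form regexp -> literal char
plain_text = escaped_text.gsub(pattern_escaped, literal)

puts escaped_text # prints a&lt;b
puts plain_text   # prints a<b
```

As listed, the table has no amp entry; bare ampersands are handled separately through EREFERENCE in normalize and the final &amp; substitution in unnormalize.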
Instance Attribute Summary
- (Object) source (readonly)
  Returns the value of attribute source.
Instance Method Summary
- (Object) add_listener(listener)
- (Boolean) empty?
  Returns true if there are no more events.
- (Object) entity(reference, entities)
- (Boolean) has_next?
  Returns true if there are more events.
- (BaseParser) initialize(source) (constructor)
  A new instance of BaseParser.
- (Object) normalize(input, entities = nil, entity_filter = nil)
  Escapes all possible entities.
- (Object) peek(depth = 0)
  Peeks at the event at position depth in the stack.
- (Object) position
- (Object) pull
  Returns the next event.
- (Object) stream=(source)
- (Object) unnormalize(string, entities = nil, filter = nil)
  Unescapes all possible entities.
- (Object) unshift(token)
  Pushes an event back onto the head of the stream.
Constructor Details
- (BaseParser) initialize(source)
A new instance of BaseParser
  # File 'lib/rexml/parsers/baseparser.rb', line 115
  def initialize( source )
    self.stream = source
    @listeners = []
  end
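The constructor hands source to stream=, which passes it to SourceFactory.create_from. The sketch below assumes that both a String and an IO-like object such as StringIO are accepted as sources; the element name is arbitrary.

```ruby
require 'rexml/parsers/baseparser'
require 'stringio'

xml = "<greeting>hi</greeting>"

# A String source
from_string = REXML::Parsers::BaseParser.new(xml)

# An IO-like source wrapping the same text
from_io = REXML::Parsers::BaseParser.new(StringIO.new(xml))

# Both parsers should report the same first event
first_from_string = from_string.pull
first_from_io = from_io.pull
p first_from_string
p first_from_io
```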
Instance Attribute Details
- (Object) source (readonly)
Returns the value of attribute source
  # File 'lib/rexml/parsers/baseparser.rb', line 124
  def source
    @source
  end
Instance Method Details
- (Object) add_listener(listener)
Registers listener; pull calls listener.receive(event) for every event it returns.
  # File 'lib/rexml/parsers/baseparser.rb', line 120
  def add_listener( listener )
    @listeners << listener
  end
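Because pull calls listener.receive(event) for each event, a listener can be any object with a receive method. EventRecorder below is a hypothetical class written only for this sketch; it records what it receives so the result can be compared with the events pull returned.

```ruby
require 'rexml/parsers/baseparser'

# Hypothetical listener: collects every event passed to receive
class EventRecorder
  attr_reader :events

  def initialize
    @events = []
  end

  def receive(event)
    @events << event
  end
end

recorder = EventRecorder.new
parser = REXML::Parsers::BaseParser.new("<a><b/></a>")
parser.add_listener(recorder)

# Drain the parser; each pull also notifies the recorder
returned = []
returned << parser.pull while parser.has_next?

p recorder.events
```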
- (Boolean) empty?
Returns true if there are no more events
  # File 'lib/rexml/parsers/baseparser.rb', line 146
  def empty?
    return (@source.empty? and @stack.empty?)
  end
- (Object) entity(reference, entities)
Returns the replacement text for the entity named reference: it is looked up in entities (if given), then in DEFAULT_ENTITIES, and the value found is itself unnormalized. Returns nil if the name is not found.
  # File 'lib/rexml/parsers/baseparser.rb', line 446
  def entity( reference, entities )
    value = nil
    value = entities[ reference ] if entities
    if not value
      value = DEFAULT_ENTITIES[ reference ]
      value = value[2] if value
    end
    unnormalize( value, entities ) if value
  end
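A minimal sketch of calling entity directly, assuming entities is a plain Hash from entity names to replacement text (REXML's own callers may pass other objects). The company entry is a hypothetical example table.

```ruby
require 'rexml/parsers/baseparser'

# The source text is irrelevant here; entity does not read it
parser = REXML::Parsers::BaseParser.new("")

# Hypothetical entity table: name => replacement text
custom = { "company" => "ACME" }

expanded = parser.entity("company", custom) # found in the custom table
unknown  = parser.entity("nosuch", custom)  # found nowhere, so nil

p expanded
p unknown
```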
- (Boolean) has_next?
Returns true if there are more events. Synonymous with !empty?
  # File 'lib/rexml/parsers/baseparser.rb', line 151
  def has_next?
    return !(@source.empty? and @stack.empty?)
  end
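Since has_next? is defined as the negation of empty?, either can guard a pull loop. This sketch drives the loop with empty?, checks the two predicates against each other on every step, and collects the element names seen; the document is made up for the example.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<list><item>1</item></list>")
names = []

until parser.empty?
  # has_next? and empty? must always disagree
  raise "inconsistent state" unless parser.has_next? == !parser.empty?
  event = parser.pull
  # event[0] is the event type, event[1] the element name for :start_element
  names << event[1] if event[0] == :start_element
end

p names
```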
- (Object) normalize(input, entities = nil, entity_filter = nil)
Escapes all possible entities: bare ampersands become &amp;, text matching a value in entities becomes a reference to that entity's key (unless the key is listed in entity_filter), and the characters >, <, " and ' become their default entity references.
  # File 'lib/rexml/parsers/baseparser.rb', line 457
  def normalize( input, entities=nil, entity_filter=nil )
    copy = input.clone
    # Doing it like this rather than in a loop improves the speed
    copy.gsub!( EREFERENCE, '&amp;' )
    entities.each do |key, value|
      # entity_filter holds entity names, so compare it against key
      copy.gsub!( value, "&#{key};" ) unless entity_filter and entity_filter.include?(key)
    end if entities
    copy.gsub!( EREFERENCE, '&amp;' )
    DEFAULT_ENTITIES.each do |key, value|
      copy.gsub!( value[3], value[1] )
    end
    copy
  end
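A short sketch of normalize on plain strings. The second call assumes entities is a Hash from entity name to the text it stands for, which matches how the loop above uses key and value; the company entry is hypothetical.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("")

# The bare & and the default entity characters are escaped
escaped = parser.normalize(%q{a < b & "c"})

# Text equal to an entity's value is replaced by a reference to that entity
with_entity = parser.normalize("ACME rocks", { "company" => "ACME" })

puts escaped
puts with_entity
```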
- (Object) peek(depth = 0)
Peeks at the event at position depth in the stack without consuming it. The first element on the stack is at depth 0. If depth is -1, parses to the end of the input stream and returns the last event parsed. Be aware that this parses the stream up to the requested event, so you can effectively pre-parse the entire document (pull the whole thing into memory) using this method.
  # File 'lib/rexml/parsers/baseparser.rb', line 167
  def peek depth=0
    raise %Q[Illegal argument "#{depth}"] if depth < -1
    temp = []
    if depth == -1
      temp.push(pull()) until empty?
    else
      while @stack.size+temp.size < depth+1
        temp.push(pull())
      end
    end
    @stack += temp if temp.size > 0
    @stack[depth]
  end
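A sketch of peek's buffering: peeking at depth 1 parses two events onto the internal stack, a later peek(0) is served from that buffer, and subsequent pull calls return the buffered events in order.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text</a>")

# Parse ahead without consuming anything
second = parser.peek(1)
first  = parser.peek(0)

# pull returns the buffered events first
pulled_first  = parser.pull
pulled_second = parser.pull

p first, second
```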
- (Object) position
Returns the current position in the source if the source supports position, otherwise 0.
  # File 'lib/rexml/parsers/baseparser.rb', line 136
  def position
    if @source.respond_to? :position
      @source.position
    else
      # FIXME
      0
    end
  end
- (Object) pull
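Returns the next event as an Array whose first element is the event type (for example :start_element), followed by that event's data; PullParser wraps these arrays in PullEvent objects. Before returning, pull passes the event to the receive method of every registered listener.
  # File 'lib/rexml/parsers/baseparser.rb', line 182
  def pull
    pull_event.tap do |event|
      @listeners.each do |listener|
        listener.receive event
      end
    end
  end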
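BaseParser#pull hands back plain arrays rather than PullEvent objects. This sketch pulls every event from the Overview's sample document and prints the raw arrays; the self-closing b element produces a :start_element event followed by an :end_element event.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text<b att='val'/>txet</a>")

events = []
events << parser.pull while parser.has_next?

# Each event is an Array: event type symbol first, then its data
events.each { |event| p event }

# For :start_element the data is the tag name and an attribute Hash
b_start = events.find { |event| event[0] == :start_element && event[1] == "b" }
```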
- (Object) stream=(source)
Sets the input, wrapping source with SourceFactory.create_from, and resets the parser's internal state.
  # File 'lib/rexml/parsers/baseparser.rb', line 126
  def stream=( source )
    @source = SourceFactory.create_from( source )
    @closed = nil
    @document_status = nil
    @tags = []
    @stack = []
    @entities = []
    @nsstack = []
  end
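A sketch of reusing one parser for a second document. After pulling the start of the self-closing first element, the parser has a pending end tag recorded in @closed; assigning a new stream clears it, so the next pull starts the new document.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<first/>")
before = parser.pull

# Replace the input; the pending end_element for "first" is discarded
parser.stream = "<second/>"
after = parser.pull

p before, after
```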
- (Object) unnormalize(string, entities = nil, filter = nil)
Unescapes all possible entities: converts \r\n and \r line endings to \n, expands decimal and hexadecimal character references, replaces each named reference not listed in filter with its value from entity (or from DEFAULT_ENTITIES), and, if any named references were found, finally turns &amp; into &.
  # File 'lib/rexml/parsers/baseparser.rb', line 473
  def unnormalize( string, entities=nil, filter=nil )
    rv = string.clone
    rv.gsub!( /\r\n?/, "\n" )
    matches = rv.scan( REFERENCE_RE )
    return rv if matches.size == 0
    rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
      m=$1
      m = "0#{m}" if m[0] == ?x
      [Integer(m)].pack('U*')
    }
    matches.collect!{|x|x[0]}.compact!
    if matches.size > 0
      matches.each do |entity_reference|
        unless filter and filter.include?(entity_reference)
          entity_value = entity( entity_reference, entities )
          if entity_value
            re = /&#{entity_reference};/
            rv.gsub!( re, entity_value )
          else
            er = DEFAULT_ENTITIES[entity_reference]
            rv.gsub!( er[0], er[2] ) if er
          end
        end
      end
      rv.gsub!( /&amp;/, '&' )
    end
    rv
  end
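A sketch of unnormalize covering character references, default named entities, the filter argument, and line-ending normalization. The inputs are made up for the example; the filter is assumed to be any object responding to include? with entity names, here an Array.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("")

# Decimal and hex character references plus default named entities
plain = parser.unnormalize("a &lt; b &amp; c &#65;&#x42;")

# Names listed in the filter are left as references
kept = parser.unnormalize("&lt;tag&gt;", nil, ["lt"])

# CRLF line endings become LF even when no references are present
lines = parser.unnormalize("x\r\ny")

p plain, kept, lines
```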
- (Object) unshift(token)
Push an event back on the head of the stream. This method has (theoretically) infinite depth.
  # File 'lib/rexml/parsers/baseparser.rb', line 157
  def unshift token
    @stack.unshift(token)
  end
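A sketch of unshift: an event that was pulled can be pushed back and pulled again, and because pull serves the stack before parsing more input, an arbitrary event can be injected too. The [:comment, " injected "] array is a hypothetical event built only for this example.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text</a>")

first = parser.pull
# Push the event back; the next pull returns it again
parser.unshift(first)
again = parser.pull

# Inject a hypothetical event ahead of the remaining input
parser.unshift([:comment, " injected "])
injected = parser.pull

# Parsing then resumes where it left off
following = parser.pull
```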