Class: REXML::Parsers::BaseParser

Inherits: Object

Defined in: lib/rexml/parsers/baseparser.rb

Overview

Using the Pull Parser

This API is experimental, and subject to change.

    parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
    while parser.has_next?
      res = parser.next
      puts res[1]['att'] if res.start_tag? and res[0] == 'b'
    end

See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.

Notice that:

    parser = PullParser.new( "<a>BAD DOCUMENT" )
    while parser.has_next?
      res = parser.next
      raise res[1] if res.error?
    end

Nat Price gave me some good ideas for the API.

Constant Summary

LETTER =

'[:alpha:]'

DIGIT =

'[:digit:]'

COMBININGCHAR =

TODO

''

EXTENDER =

TODO

''

NCNAME_STR =

"[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"

NAME_STR =

"(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"

UNAME_STR =

"(?:#{NCNAME_STR}:)?#{NCNAME_STR}"

NAMECHAR =

'[\-\w\.:]'

NAME =

"([\\w:]#{NAMECHAR}*)"

NMTOKEN =

"(?:#{NAMECHAR})+"

NMTOKENS =

"#{NMTOKEN}(\\s+#{NMTOKEN})*"

REFERENCE =

"&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"

REFERENCE_RE =

/#{REFERENCE}/
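As a rough illustration of what REFERENCE_RE accepts, the sketch below rebuilds the pattern from the NAMECHAR, NAME and REFERENCE strings listed above, using local variables rather than the library constants (the lowercase names and the sample text are assumptions made for the example). It collects every entity or character reference found in a string.

```ruby
# Local copies of the pattern strings shown in the constant listing.
namechar  = '[\-\w\.:]'
name      = "([\\w:]#{namechar}*)"
reference = "&(?:#{name};|#\\d+;|#x[0-9a-fA-F]+;)"
reference_re = /#{reference}/

text = "AT&amp;T &#169; &#xA9; R&D"

# NAME contains a capture group, so use the block form of scan and read
# the whole match from Regexp.last_match instead of the captures.
found = []
text.scan(reference_re) { found << Regexp.last_match[0] }

p found
```

The bare ampersand in "R&D" is not followed by a name and a semicolon, so it is not reported as a reference.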

DOCTYPE_START =

/\A\s*<!DOCTYPE\s/um

DOCTYPE_END =

/\A\s*\]\s*>/um

DOCTYPE_PATTERN =

/\s*<!DOCTYPE\s+(.*?)(\[|>)/um

ATTRIBUTE_PATTERN =

/\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um

COMMENT_START =

/\A<!--/u

COMMENT_PATTERN =

/<!--(.*?)-->/um

CDATA_START =

/\A<!\[CDATA\[/u

CDATA_END =

/\A\s*\]\s*>/um

CDATA_PATTERN =

/<!\[CDATA\[(.*?)\]\]>/um

XMLDECL_START =

/\A<\?xml\s/u

XMLDECL_PATTERN =

/<\?xml\s+(.*?)\?>/um

INSTRUCTION_START =

/\A<\?/u

INSTRUCTION_PATTERN =

/<\?(.*?)(\s+.*?)?\?>/um

TAG_MATCH =

/^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um

CLOSE_MATCH =

/^\s*<\/(#{NAME_STR})\s*>/um
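Because NAME_STR carries two capture groups of its own (prefix and local name), any pattern that wraps it in a further group, such as CLOSE_MATCH, ends up with three name-related captures. The sketch below rebuilds CLOSE_MATCH from the strings listed above with local variables; COMBININGCHAR and EXTENDER are left out because the listing shows them as empty strings, and the sample tags are assumptions.

```ruby
# Local copies of the pattern strings from the constant listing.
letter     = '[:alpha:]'
ncname_str = "[#{letter}_:][-[:alnum:]._:]*"
name_str   = "(?:(#{ncname_str}):)?(#{ncname_str})"
close_match = /^\s*<\/(#{name_str})\s*>/um

prefixed = close_match.match("</svg:rect>")
puts prefixed[1]   # whole qualified name
puts prefixed[2]   # namespace prefix
puts prefixed[3]   # local name

plain = close_match.match("</rect>")
p plain[2]         # no prefix present
puts plain[3]
```

This numbering is also why ATTRIBUTE_PATTERN refers to its quote character as \4: groups 1 to 3 are taken by the name.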

VERSION =

/\bversion\s*=\s*["'](.*?)['"]/um

ENCODING =

/\bencoding\s*=\s*["'](.*?)['"]/um

STANDALONE =

/\bstandalone\s*=\s*["'](.*?)['"]/um
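The VERSION, ENCODING and STANDALONE patterns each capture the quoted value of one pseudo-attribute of an XML declaration. A minimal sketch, using local copies of the three patterns listed above and a made-up declaration string:

```ruby
# Local copies of the patterns from the constant listing.
version_re    = /\bversion\s*=\s*["'](.*?)['"]/um
encoding_re   = /\bencoding\s*=\s*["'](.*?)['"]/um
standalone_re = /\bstandalone\s*=\s*["'](.*?)['"]/um

decl = %q{<?xml version="1.0" encoding="UTF-8"?>}

# String#[] with a regexp and a group index returns that capture, or nil.
version    = decl[version_re, 1]
encoding   = decl[encoding_re, 1]
standalone = decl[standalone_re, 1]

p version
p encoding
p standalone
```

A pseudo-attribute that is absent from the declaration simply yields nil.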

ENTITY_START =

/\A\s*<!ENTITY/

IDENTITY =

/^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u

ELEMENTDECL_START =

/\A\s*<!ELEMENT/um

ELEMENTDECL_PATTERN =

/\A\s*(<!ELEMENT.*?)>/um

SYSTEMENTITY =

/\A\s*(%.*?;)\s*$/um

ENUMERATION =

"\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"

NOTATIONTYPE =

"NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"

ENUMERATEDTYPE =

"(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"

ATTTYPE =

"(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"

ATTVALUE =

"(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"

DEFAULTDECL =

"(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"

ATTDEF =

"\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"

ATTDEF_RE =

/#{ATTDEF}/

ATTLISTDECL_START =

/\A\s*<!ATTLIST/um

ATTLISTDECL_PATTERN =

/\A\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um

NOTATIONDECL_START =

/\A\s*<!NOTATION/um

PUBLIC =

/\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um

SYSTEM =

/\A\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um

TEXT_PATTERN =

/\A([^<]*)/um

PUBIDCHAR =

Entity constants

"\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"

SYSTEMLITERAL =

%Q{((?:"[^"]*")|(?:'[^']*'))}

PUBIDLITERAL =

%Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}

EXTERNALID =

"(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"

NDATADECL =

"\\s+NDATA\\s+#{NAME}"

PEREFERENCE =

"%#{NAME};"

ENTITYVALUE =

%Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}

PEDEF =

"(?:#{ENTITYVALUE}|#{EXTERNALID})"

ENTITYDEF =

"(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"

PEDECL =

"<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"

GEDECL =

"<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"

ENTITYDECL =

/\s*(?:#{GEDECL})|(?:#{PEDECL})/um

EREFERENCE =

/&(?!#{NAME};)/

DEFAULT_ENTITIES =

{ 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }

MISSING_ATTRIBUTE_QUOTES =

These are patterns to identify common markup errors, to make the error messages more informative.

/^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um

Instance Attribute Summary

#source ⇒ Object readonly
Returns the value of attribute source.

Instance Method Summary

#add_listener(listener) ⇒ Object

#empty? ⇒ Boolean
Returns true if there are no more events.

#entity(reference, entities) ⇒ Object

#has_next? ⇒ Boolean
Returns true if there are more events.

#initialize(source) ⇒ BaseParser constructor
A new instance of BaseParser.

#normalize(input, entities = nil, entity_filter = nil) ⇒ Object
Escapes all possible entities.

#peek(depth = 0) ⇒ Object
Peek at the event at position depth in the stack.

#position ⇒ Object

#pull ⇒ Object
Returns the next event.

#stream=(source) ⇒ Object

#unnormalize(string, entities = nil, filter = nil) ⇒ Object
Unescapes all possible entities.

#unshift(token) ⇒ Object
Push an event back on the head of the stream.

Constructor Details

#initialize(source) ⇒ BaseParser

Returns a new instance of BaseParser.

    # File 'lib/rexml/parsers/baseparser.rb', line 116

    def initialize( source )
      self.stream = source
      @listeners = []
    end

Instance Attribute Details

#source ⇒ Object (readonly)

Returns the value of attribute source.

    # File 'lib/rexml/parsers/baseparser.rb', line 125

    def source
      @source
    end

Instance Method Details

#add_listener(listener) ⇒ Object

    # File 'lib/rexml/parsers/baseparser.rb', line 121

    def add_listener( listener )
      @listeners << listener
    end
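Going by the source of #pull shown on this page, a listener is any object that responds to receive and is handed each event as it is pulled. The following is a minimal sketch; the EventLog class and the sample document are hypothetical, and it assumes events are arrays whose first element is the event type.

```ruby
require 'rexml/parsers/baseparser'

# Hypothetical listener: records the type of every event it is given.
class EventLog
  attr_reader :types

  def initialize
    @types = []
  end

  # Called by BaseParser#pull for each event.
  def receive(event)
    @types << event[0]
  end
end

log = EventLog.new
parser = REXML::Parsers::BaseParser.new("<a>hi</a>")
parser.add_listener(log)

# Listeners are only notified as a side effect of pulling.
parser.pull while parser.has_next?

p log.types
```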

#empty? ⇒ Boolean

Returns true if there are no more events

Returns:

(Boolean)

    # File 'lib/rexml/parsers/baseparser.rb', line 147

    def empty?
      return (@source.empty? and @stack.empty?)
    end

#entity(reference, entities) ⇒ Object

    # File 'lib/rexml/parsers/baseparser.rb', line 448

    def entity( reference, entities )
      value = nil
      value = entities[ reference ] if entities
      if not value
        value = DEFAULT_ENTITIES[ reference ]
        value = value[2] if value
      end
      unnormalize( value, entities ) if value
    end
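A small sketch of looking up a declared entity follows. The entities hash, its "greeting" key and the sample document are made up for illustration; the hash maps an entity name to its replacement text, which is itself unescaped before being returned. Behaviour for the predefined names (lt, gt, quot, apos) when called directly may differ between rexml releases, so the example sticks to a custom entity.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

# Hypothetical entity table: name => replacement text.
entities = { "greeting" => "a &lt; b" }

# The stored value is passed through unnormalize, so &lt; becomes <.
expanded = parser.entity("greeting", entities)
p expanded

# A name that is not declared anywhere yields nil.
missing = parser.entity("nope", entities)
p missing
```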

#has_next? ⇒ Boolean

Returns true if there are more events. Synonymous with !empty?

Returns:

(Boolean)

    # File 'lib/rexml/parsers/baseparser.rb', line 152

    def has_next?
      return !(@source.empty? and @stack.empty?)
    end
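The relationship between has_next? and empty? can be checked with a short sketch like the one below (the sample document is an assumption). Both methods only look at the source and the pushed-back stack, so they are always each other's negation.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

before_has_next = parser.has_next?
before_empty    = parser.empty?

# Drain the parser.
parser.pull while parser.has_next?

after_has_next = parser.has_next?
after_empty    = parser.empty?

p [before_has_next, before_empty]
p [after_has_next, after_empty]
```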

#normalize(input, entities = nil, entity_filter = nil) ⇒ Object

Escapes all possible entities

    # File 'lib/rexml/parsers/baseparser.rb', line 459

    def normalize( input, entities=nil, entity_filter=nil )
      copy = input.clone
      # Doing it like this rather than in a loop improves the speed
      copy.gsub!( EREFERENCE, '&amp;' )
      entities.each do |key, value|
        copy.gsub!( value, "&#{key};" ) unless entity_filter and
                                    entity_filter.include?(entity)
      end if entities
      copy.gsub!( EREFERENCE, '&amp;' )
      DEFAULT_ENTITIES.each do |key, value|
        copy.gsub!( value[3], value[1] )
      end
      copy
    end
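A minimal sketch of escaping with the default entity table only (no custom entities are passed; the input strings are assumptions). Bare ampersands and the four predefined characters are escaped, while text that already looks like an entity reference is left alone.

```ruby
require 'rexml/parsers/baseparser'

# normalize is an instance method, so a parser is needed even though
# the document it was created with plays no part in the escaping.
parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

escaped = parser.normalize("a < b & c")
p escaped

# An existing reference is not escaped a second time.
untouched = parser.normalize("x &lt; y")
p untouched
```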

#peek(depth = 0) ⇒ Object

Peek at the event at position depth in the stack; the first element on the stack is at depth 0. If depth is -1, the parser reads to the end of the input stream and returns the last event, which is always :end_document. Be aware that peeking forces the stream to be parsed up to the requested depth, so this method can effectively pre-parse the entire document (pulling all of it into memory).

    # File 'lib/rexml/parsers/baseparser.rb', line 168

    def peek depth=0
      raise %Q[Illegal argument "#{depth}"] if depth < -1
      temp = []
      if depth == -1
        temp.push(pull()) until empty?
      else
        while @stack.size+temp.size < depth+1
          temp.push(pull())
        end
      end
      @stack += temp if temp.size > 0
      @stack[depth]
    end
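The look-ahead behaviour can be sketched as follows, assuming events are arrays of the form [type, ...] and using a made-up document. Peeked events are buffered on the internal stack, so a later pull returns them in the same order.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

upcoming = parser.peek      # depth 0: the very next event
second   = parser.peek(1)   # depth 1: the event after that

# Nothing has been consumed yet; pull hands back the peeked events.
first_pulled  = parser.pull
second_pulled = parser.pull

p upcoming
p second
```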

#position ⇒ Object

    # File 'lib/rexml/parsers/baseparser.rb', line 137

    def position
      if @source.respond_to? :position
        @source.position
      else
        # FIXME
        0
      end
    end

#pull ⇒ Object

Returns the next event. This is a PullEvent object.

    # File 'lib/rexml/parsers/baseparser.rb', line 183

    def pull
      pull_event.tap do |event|
        @listeners.each do |listener|
          listener.receive event
        end
      end
    end
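A minimal pull loop over a small document might look like the sketch below. It assumes, based on the rexml releases I am familiar with, that BaseParser#pull hands back a plain Array (event type first, then name and attributes for a start tag), with the PullEvent wrapper mentioned in the description being applied by PullParser; treat that as an assumption rather than a guarantee.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>text<b att='val'/>txet</a>")

# Collect every event until the parser reports nothing left.
events = []
events << parser.pull while parser.has_next?

events.each { |event| p event }

# Find the start tag of <b> and read its attribute hash.
b_start = events.find { |type, name| type == :start_element && name == "b" }
puts b_start[2]["att"]
```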

#stream=(source) ⇒ Object

    # File 'lib/rexml/parsers/baseparser.rb', line 127

    def stream=( source )
      @source = SourceFactory.create_from( source )
      @closed = nil
      @document_status = nil
      @tags = []
      @stack = []
      @entities = []
      @nsstack = []
    end
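Since stream= replaces the source and clears the parser's internal state, one parser object can be pointed at a new document part-way through another. A small sketch with made-up documents:

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>one</a>")
parser.pull                      # consume the start tag of <a>

# Swap in a new document; open tags and buffered events are discarded.
parser.stream = "<b>two</b>"

event = parser.pull
p event
```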

#unnormalize(string, entities = nil, filter = nil) ⇒ Object

Unescapes all possible entities

    # File 'lib/rexml/parsers/baseparser.rb', line 475

    def unnormalize( string, entities=nil, filter=nil )
      rv = string.clone
      rv.gsub!( /\r\n?/, "\n" )
      matches = rv.scan( REFERENCE_RE )
      return rv if matches.size == 0
      rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
        m=$1
        m = "0#{m}" if m[0] == ?x
        [Integer(m)].pack('U*')
      }
      matches.collect!{|x|x[0]}.compact!
      if matches.size > 0
        matches.each do |entity_reference|
          unless filter and filter.include?(entity_reference)
            entity_value = entity( entity_reference, entities )
            if entity_value
              re = /&#{entity_reference};/
              rv.gsub!( re, entity_value )
            else
              er = DEFAULT_ENTITIES[entity_reference]
              rv.gsub!( er[0], er[2] ) if er
            end
          end
        end
        rv.gsub!( /&amp;/, '&' )
      end
      rv
    end
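The reverse direction can be sketched the same way (input strings are assumptions; no custom entities or filter are passed). Named default entities, decimal and hexadecimal character references, and carriage-return line endings are all handled in one call.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

# Named entities plus decimal (&#65;) and hex (&#x42;) character references.
plain = parser.unnormalize("a &lt; b &amp; c &#65;&#x42;")
p plain

# Line endings are normalised to "\n" even when no references are present.
lines = parser.unnormalize("one\r\ntwo")
p lines
```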

#unshift(token) ⇒ Object

Push an event back on the head of the stream. This method has (theoretically) infinite depth.

    # File 'lib/rexml/parsers/baseparser.rb', line 158

    def unshift token
      @stack.unshift(token)
    end
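A short sketch of pushing an event back, using a made-up document: the event returned by pull is put back at the head of the stream, so the next pull returns it again before parsing continues.

```ruby
require 'rexml/parsers/baseparser'

parser = REXML::Parsers::BaseParser.new("<a>hi</a>")

event = parser.pull      # take the first event off the stream
parser.unshift(event)    # and put it back at the head

again = parser.pull      # the same event comes out again
following = parser.pull  # parsing then continues with the next event

p again
p following
```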

Generated on Tue Apr 14 11:10:36 2026 by yard 0.9.40 (ruby-4.0.2).
