Class: REXML::Parsers::BaseParser
- Inherits: Object
  - Object
  - REXML::Parsers::BaseParser
- Defined in: lib/rexml/parsers/baseparser.rb
Overview
Using the Pull Parser
This API is experimental, and subject to change.
parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
while parser.has_next?
  res = parser.pull
  puts res[1]['att'] if res.start_tag? and res[0] == 'b'
end
See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
Notice that:
parser = PullParser.new( "<a>BAD DOCUMENT" )
while parser.has_next?
  res = parser.pull
  raise res[1] if res.error?
end
Nat Price gave me some good ideas for the API.
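For comparison, the same loop can be driven through BaseParser directly. This is a minimal sketch, assuming the rexml gem is installed and that BaseParser#pull yields array-like events whose first element is the event type; the variable names are illustrative only.

```ruby
require "rexml/parsers/baseparser"

# Walk the event stream and collect the 'att' attribute of every <b> start tag.
parser = REXML::Parsers::BaseParser.new("<a>text<b att='val'/>txet</a>")
att_values = []
while parser.has_next?
  event = parser.pull
  # Assumed layout of a start-tag event: [:start_element, name, attributes].
  if event[0] == :start_element && event[1] == "b"
    att_values << event[2]["att"]
  end
end

puts att_values.inspect
```

Note the index shift relative to the PullParser loop: there the event type is queried with predicates such as start_tag?, so the name sits at index 0, whereas here the type itself occupies index 0.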
Constant Summary
- LETTER =
'[:alpha:]'
- DIGIT =
'[:digit:]'
- COMBININGCHAR =
TODO
''
- EXTENDER =
TODO
''
- NCNAME_STR =
"[#{LETTER}_][-[:alnum:]._#{COMBININGCHAR}#{EXTENDER}]*"
- QNAME_STR =
"(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
- QNAME =
/(#{QNAME_STR})/
- UNAME_STR =
Just for backward compatibility. For example, kramdown uses this. It’s not used in REXML.
"(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
- NAMECHAR =
'[\-\w\.:]'
- NAME =
"([\\w:]#{NAMECHAR}*)"
- NMTOKEN =
"(?:#{NAMECHAR})+"
- NMTOKENS =
"#{NMTOKEN}(\\s+#{NMTOKEN})*"
- REFERENCE =
"&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
- REFERENCE_RE =
/#{REFERENCE}/
- DOCTYPE_START =
/\A\s*<!DOCTYPE\s/um
- DOCTYPE_END =
/\A\s*\]\s*>/um
- ATTRIBUTE_PATTERN =
/\s*(#{QNAME_STR})\s*=\s*(["'])(.*?)\4/um
- COMMENT_START =
/\A<!--/u
- COMMENT_PATTERN =
/<!--(.*?)-->/um
- CDATA_START =
/\A<!\[CDATA\[/u
- CDATA_END =
/\A\s*\]\s*>/um
- CDATA_PATTERN =
/<!\[CDATA\[(.*?)\]\]>/um
- XMLDECL_START =
/\A<\?xml\s/u
- XMLDECL_PATTERN =
/<\?xml\s+(.*?)\?>/um
- INSTRUCTION_START =
/\A<\?/u
- INSTRUCTION_PATTERN =
/<\?#{NAME}(\s+.*?)?\?>/um
- TAG_MATCH =
/\A<((?>#{QNAME_STR}))/um
- CLOSE_MATCH =
/\A\s*<\/(#{QNAME_STR})\s*>/um
- VERSION =
/\bversion\s*=\s*["'](.*?)['"]/um
- ENCODING =
/\bencoding\s*=\s*["'](.*?)['"]/um
- STANDALONE =
/\bstandalone\s*=\s*["'](.*?)['"]/um
- ENTITY_START =
/\A\s*<!ENTITY/
- ELEMENTDECL_START =
/\A\s*<!ELEMENT/um
- ELEMENTDECL_PATTERN =
/\A\s*(<!ELEMENT.*?)>/um
- SYSTEMENTITY =
/\A\s*(%.*?;)\s*$/um
- ENUMERATION =
"\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
- NOTATIONTYPE =
"NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
- ENUMERATEDTYPE =
"(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
- ATTTYPE =
"(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
- ATTVALUE =
"(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
- DEFAULTDECL =
"(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
- ATTDEF =
"\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
- ATTDEF_RE =
/#{ATTDEF}/
- ATTLISTDECL_START =
/\A\s*<!ATTLIST/um
- ATTLISTDECL_PATTERN =
/\A\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
- TEXT_PATTERN =
/\A([^<]*)/um
- PUBIDCHAR =
Entity constants
"\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
- SYSTEMLITERAL =
%Q{((?:"[^"]*")|(?:'[^']*'))}
- PUBIDLITERAL =
%Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
- EXTERNALID =
"(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
- NDATADECL =
"\\s+NDATA\\s+#{NAME}"
- PEREFERENCE =
"%#{NAME};"
- ENTITYVALUE =
%Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
- PEDEF =
"(?:#{ENTITYVALUE}|#{EXTERNALID})"
- ENTITYDEF =
"(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
- PEDECL =
"<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
- GEDECL =
"<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
- ENTITYDECL =
/\s*(?:#{GEDECL})|\s*(?:#{PEDECL})/um
- NOTATIONDECL_START =
/\A\s*<!NOTATION/um
- EXTERNAL_ID_PUBLIC =
/\A\s*PUBLIC\s+#{PUBIDLITERAL}\s+#{SYSTEMLITERAL}\s*/um
- EXTERNAL_ID_SYSTEM =
/\A\s*SYSTEM\s+#{SYSTEMLITERAL}\s*/um
- PUBLIC_ID =
/\A\s*PUBLIC\s+#{PUBIDLITERAL}\s*/um
- EREFERENCE =
/&(?!#{NAME};)/
- DEFAULT_ENTITIES =
{ 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }
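As a rough illustration of how two of these constants can be read, the sketch below assumes the rexml gem is installed and that each DEFAULT_ENTITIES value is laid out as escaped pattern, escaped text, raw text, raw pattern. The local variable names are hypothetical.

```ruby
require "rexml/parsers/baseparser"

base = REXML::Parsers::BaseParser

# Assumed layout of each value: [escaped regexp, escaped text, raw text, raw regexp].
escaped_re, escaped, raw, raw_re = base::DEFAULT_ENTITIES["lt"]
unescaped = "1 &lt; 2".gsub(escaped_re, raw)
reescaped = unescaped.gsub(raw_re, escaped)

# QNAME captures the whole qualified name, then the optional prefix, then the local part.
qname_match = base::QNAME.match("xsl:template")

puts unescaped
puts reescaped
puts qname_match.captures.inspect
```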
Constants included from Private
Private::ATTLISTDECL_END, Private::CLOSE_PATTERN, Private::ENTITYDECL_PATTERN, Private::GEDECL_PATTERN, Private::INSTRUCTION_END, Private::NAME_PATTERN, Private::PEDECL_PATTERN, Private::TAG_PATTERN
Instance Attribute Summary
- #source ⇒ Object (readonly)
  Returns the value of attribute source.
Instance Method Summary
- #add_listener(listener) ⇒ Object
- #empty? ⇒ Boolean
  Returns true if there are no more events.
- #entity(reference, entities) ⇒ Object
- #has_next? ⇒ Boolean
  Returns true if there are more events.
- #initialize(source) ⇒ BaseParser (constructor)
  A new instance of BaseParser.
- #normalize(input, entities = nil, entity_filter = nil) ⇒ Object
  Escapes all possible entities.
- #peek(depth = 0) ⇒ Object
  Peek at the depth event in the stack.
- #position ⇒ Object
- #pull ⇒ Object
  Returns the next event.
- #stream=(source) ⇒ Object
- #unnormalize(string, entities = nil, filter = nil) ⇒ Object
  Unescapes all possible entities.
- #unshift(token) ⇒ Object
  Push an event back on the head of the stream.
Constructor Details
#initialize(source) ⇒ BaseParser
Returns a new instance of BaseParser.
# File 'lib/rexml/parsers/baseparser.rb', line 139
def initialize( source )
  self.stream = source
  @listeners = []
end
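The constructor hands its argument to stream=, so anything the source factory accepts should work. The sketch below assumes the rexml gem is installed and that both a String and an IO-like object (StringIO here) are accepted; variable names are illustrative.

```ruby
require "rexml/parsers/baseparser"
require "stringio"

# The source may be given as a String...
from_string = REXML::Parsers::BaseParser.new("<a>hi</a>")
# ...or as an IO-like object such as StringIO.
from_io = REXML::Parsers::BaseParser.new(StringIO.new("<a>hi</a>"))

first_from_string = from_string.pull
first_from_io = from_io.pull

puts first_from_string.inspect
puts first_from_io.inspect
```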
Instance Attribute Details
#source ⇒ Object (readonly)
Returns the value of attribute source.
# File 'lib/rexml/parsers/baseparser.rb', line 148
def source
  @source
end
Instance Method Details
#add_listener(listener) ⇒ Object
# File 'lib/rexml/parsers/baseparser.rb', line 144
def add_listener( listener )
  @listeners << listener
end
#empty? ⇒ Boolean
Returns true if there are no more events.
# File 'lib/rexml/parsers/baseparser.rb', line 170
def empty?
  return (@source.empty? and @stack.empty?)
end
#entity(reference, entities) ⇒ Object
# File 'lib/rexml/parsers/baseparser.rb', line 463
def entity( reference, entities )
  value = nil
  value = entities[ reference ] if entities
  if not value
    value = DEFAULT_ENTITIES[ reference ]
    value = value[2] if value
  end
  unnormalize( value, entities ) if value
end
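A small sketch of looking up a declared entity, assuming the rexml gem is installed. The entities hash is a hypothetical table of the kind a DOCTYPE might declare. Only the lookup against a supplied table is shown, since the fallback to DEFAULT_ENTITIES visible in the listing may not behave the same way in every rexml release.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a/>")

# Hypothetical entity table; not taken from any real document.
entities = { "copy" => "(c)" }

known = parser.entity("copy", entities)    # declared, so its value is returned
unknown = parser.entity("nope", entities)  # neither declared nor predefined

puts known.inspect
puts unknown.inspect
```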
#has_next? ⇒ Boolean
Returns true if there are more events. Synonymous with !empty?
# File 'lib/rexml/parsers/baseparser.rb', line 175
def has_next?
  return !(@source.empty? and @stack.empty?)
end
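The relationship between has_next? and empty? can be checked with a short sketch, assuming the rexml gem is installed. The before and after variables are illustrative.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a/>")
before = [parser.has_next?, parser.empty?]

# Pull until the parser reports that the source and the event stack are both empty.
parser.pull while parser.has_next?
after = [parser.has_next?, parser.empty?]

puts before.inspect
puts after.inspect
```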
#normalize(input, entities = nil, entity_filter = nil) ⇒ Object
Escapes bare ampersands, the values of any supplied entities, and the characters covered by DEFAULT_ENTITIES.
# File 'lib/rexml/parsers/baseparser.rb', line 474
def normalize( input, entities=nil, entity_filter=nil )
  copy = input.clone
  # Doing it like this rather than in a loop improves the speed
  copy.gsub!( EREFERENCE, '&amp;' )
  entities.each do |key, value|
    copy.gsub!( value, "&#{key};" ) unless entity_filter and entity_filter.include?(entity)
  end if entities
  copy.gsub!( EREFERENCE, '&amp;' )
  DEFAULT_ENTITIES.each do |key, value|
    copy.gsub!( value[3], value[1] )
  end
  copy
end
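A minimal sketch of the escaping described above, assuming the rexml gem is installed and calling normalize with its default arguments only. The sample strings are made up for illustration.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a/>")

# Bare ampersands and the characters covered by DEFAULT_ENTITIES are escaped.
escaped = parser.normalize(%q{a < b & "c"})
# An ampersand that already starts a well-formed reference is left alone.
already_escaped = parser.normalize("x &amp; y")

puts escaped
puts already_escaped
```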
#peek(depth = 0) ⇒ Object
Peek at the depth event in the stack. The first element on the stack is at depth 0. If depth is -1, will parse to the end of the input stream and return the last event, which is always :end_document. Be aware that this causes the stream to be parsed up to the depth event, so you can effectively pre-parse the entire document (pull the entire thing into memory) using this method.
# File 'lib/rexml/parsers/baseparser.rb', line 191
def peek depth=0
  raise %Q[Illegal argument "#{depth}"] if depth < -1
  temp = []
  if depth == -1
    temp.push(pull()) until empty?
  else
    while @stack.size+temp.size < depth+1
      temp.push(pull())
    end
  end
  @stack += temp if temp.size > 0
  @stack[depth]
end
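Lookahead with peek can be sketched as follows, assuming the rexml gem is installed. The document deliberately avoids self-closing tags to keep the event order simple, and the variable names are illustrative.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a><b>text</b></a>")

# Look ahead without consuming: depth 0 is the next event, depth 1 the one after it.
second = parser.peek(1)
first = parser.peek(0)

# The peeked events are still delivered, in order, by subsequent pulls.
pulled = [parser.pull, parser.pull]

puts first.inspect
puts second.inspect
```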
#position ⇒ Object
# File 'lib/rexml/parsers/baseparser.rb', line 160
def position
  if @source.respond_to? :position
    @source.position
  else
    # FIXME
    0
  end
end
#pull ⇒ Object
Returns the next event. This is a PullEvent object.
# File 'lib/rexml/parsers/baseparser.rb', line 206
def pull
  pull_event.tap do |event|
    @listeners.each do |listener|
      listener.receive event
    end
  end
end
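Because pull forwards every event to the registered listeners, a listener only needs a receive method. The sketch below assumes the rexml gem is installed; EventRecorder is a hypothetical class written for this example, not part of REXML.

```ruby
require "rexml/parsers/baseparser"

# Hypothetical listener: anything that responds to #receive can be registered.
class EventRecorder
  attr_reader :events

  def initialize
    @events = []
  end

  # Called by the parser once for each pulled event.
  def receive(event)
    @events << event
  end
end

recorder = EventRecorder.new
parser = REXML::Parsers::BaseParser.new("<a>hi</a>")
parser.add_listener(recorder)

pulled = []
pulled << parser.pull while parser.has_next?

puts recorder.events.inspect
```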
#stream=(source) ⇒ Object
# File 'lib/rexml/parsers/baseparser.rb', line 150
def stream=( source )
  @source = SourceFactory.create_from( source )
  @closed = nil
  @document_status = nil
  @tags = []
  @stack = []
  @entities = []
  @nsstack = []
end
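Since stream= replaces the source and clears the parser's bookkeeping, one parser instance can be pointed at a second document. A minimal sketch, assuming the rexml gem is installed:

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a></a>")
parser.pull while parser.has_next?

# Point the same parser at a fresh document; tag and event stacks start over.
parser.stream = "<b></b>"
first_event = parser.pull

puts first_event.inspect
```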
#unnormalize(string, entities = nil, filter = nil) ⇒ Object
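Unescapes entity references: numeric character references, entries in the supplied entities table, and the predefined entities. Carriage returns are normalized to newlines first.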
# File 'lib/rexml/parsers/baseparser.rb', line 490
def unnormalize( string, entities=nil, filter=nil )
  rv = string.gsub( /\r\n?/, "\n" )
  matches = rv.scan( REFERENCE_RE )
  return rv if matches.size == 0
  rv.gsub!( /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ ) {
    m=$1
    m = "0#{m}" if m[0] == ?x
    [Integer(m)].pack('U*')
  }
  matches.collect!{|x|x[0]}.compact!
  if matches.size > 0
    matches.each do |entity_reference|
      unless filter and filter.include?(entity_reference)
        entity_value = entity( entity_reference, entities )
        if entity_value
          re = /&#{entity_reference};/
          rv.gsub!( re, entity_value )
        else
          er = DEFAULT_ENTITIES[entity_reference]
          rv.gsub!( er[0], er[2] ) if er
        end
      end
    end
    rv.gsub!( /&amp;/, '&' )
  end
  rv
end
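The three kinds of replacement performed here can be sketched as follows, assuming the rexml gem is installed. The custom entity table is hypothetical and the sample strings are made up for illustration.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a/>")

# Predefined entities plus decimal and hexadecimal character references.
plain = parser.unnormalize("&lt;a&gt; &amp; &#65;&#x42;")
# Hypothetical custom entity table passed as the second argument.
custom = parser.unnormalize("&copy; 2024", { "copy" => "(c)" })
# Carriage returns are folded into newlines before anything else happens.
newlines = parser.unnormalize("one\r\ntwo\rthree")

puts plain
puts custom
puts newlines.inspect
```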
#unshift(token) ⇒ Object
Push an event back on the head of the stream. This method has (theoretically) infinite depth.
# File 'lib/rexml/parsers/baseparser.rb', line 181
def unshift token
  @stack.unshift(token)
end
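Pushing an event back can be sketched like this, assuming the rexml gem is installed. The variable names are illustrative.

```ruby
require "rexml/parsers/baseparser"

parser = REXML::Parsers::BaseParser.new("<a>x</a>")

event = parser.pull      # consume the first event...
parser.unshift(event)    # ...then push it back onto the head of the stream
replayed = parser.pull   # the same event comes out again

puts replayed.inspect
```

This is handy when a consumer reads one event too far and wants to hand the stream to another routine unchanged.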