Class: XML::SAX::FragmentBuilder
Overview
Build Nokogiri::XML::Document fragments that match an XPath.
Stream large (or small) record-based XML documents, building each element that matches an XPath into a document fragment so that each record is easier to manipulate further.
Notes
- To save memory, well-balanced elements that do not match any XPath are unlinked once they close. This means you cannot match records by their position relative to siblings.
- Because the input is parsed as a SAX stream, there is no read-ahead. You cannot match records by children the element may have, since those arrive only in later events.
- You can match on an element's attributes.
Example
require 'nokogiri'
require 'xml-sax-machines' # assumed entry point, after the gem's lib/xml-sax-machines/ directory

builder = XML::SAX::FragmentBuilder.new(nil, {
  '//record' => lambda{|record| puts record.to_s} # Process each matched record element.
})
parser = Nokogiri::XML::SAX::PushParser.new(builder)
parser << %q{
<root>
<record id="1">record one</record>
<record id="2">record two</record>
</root>
}
#=> <record id="1">record one</record>
#=> <record id="2">record two</record>
parser.finish
See
- XML::SAX::Builder
- XML::SAX::Filter
TODO:
- Namespaces.
Instance Attribute Summary
Attributes inherited from Builder
Attributes inherited from Filter
Instance Method Summary
- #cdata_block(string) ⇒ Object
  :nodoc:
- #characters(string) ⇒ Object
  :nodoc:
- #comment(string) ⇒ Object
  :nodoc:
- #end_element_namespace(name, prefix = nil, uri = nil) ⇒ Object
  :nodoc:
- #initialize(options = {}) ⇒ FragmentBuilder
  constructor. Parameters: handler<Nokogiri::XML::SAX::Document>: Optional next XML::SAX::Filter or Nokogiri::XML::SAX::Document (final) in the chain.
- #start_element_namespace(name, attributes = [], prefix = nil, uri = nil, ns = []) ⇒ Object
  :nodoc:
Methods inherited from Builder
Methods inherited from Filter
#end_document, #error, #start_document, #warning
Constructor Details
#initialize(options = {}) ⇒ FragmentBuilder
Parameters
- handler<Nokogiri::XML::SAX::Document>: Optional next XML::SAX::Filter or Nokogiri::XML::SAX::Document (final) in the chain. By default a Nokogiri::XML::SAX::Document is used, making the chain final.
- options<Hash>: XPath<String> => &block<Proc> pairs. The block receives the matching Nokogiri::XML::Node as its first argument. Keep in mind the node will be unlinked after your block returns, unless it is nested inside another match that is still open.
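The options hash maps each XPath string to a handler that the builder later invokes as `block.call(@context)` (see end_element_namespace below). The following is a minimal sketch, assuming a hypothetical `check_handlers` helper and a hypothetical `log_meta` method that are not part of the gem, of building and sanity-checking such a hash. Because the source only ever calls `#call` on the handler, a Method object should work as well as a lambda; that is an inference from the listing, not documented behaviour.

```ruby
# Hypothetical helper (not part of xml-sax-machines): checks that a hash has the
# shape FragmentBuilder expects, XPath String => object responding to #call.
def check_handlers(options)
  options.each_pair do |xpath, handler|
    raise ArgumentError, "XPath must be a String: #{xpath.inspect}" unless xpath.is_a?(String)
    raise ArgumentError, "handler for #{xpath} must respond to #call" unless handler.respond_to?(:call)
  end
  options
end

# Hypothetical handler defined as a plain method; method(:log_meta) turns it
# into a callable Method object.
def log_meta(node)
  "meta: #{node}"
end

handlers = check_handlers({
  '//record' => ->(node) { "record: #{node}" }, # lambda handler
  '//meta'   => method(:log_meta)                # Method object handler
})
```

The validated hash would then be passed as the options argument, e.g. `XML::SAX::FragmentBuilder.new(nil, handlers)`, following the constructor call used in the example above.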
# File 'lib/xml-sax-machines/fragment_builder.rb', line 50
def initialize( = {})
  super()
  @find = @found = {}
  @buffer = 0
end
Instance Method Details
#cdata_block(string) ⇒ Object
:nodoc:
# File 'lib/xml-sax-machines/fragment_builder.rb', line 95
def cdata_block(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.cdata_block(string))
end
#characters(string) ⇒ Object
:nodoc:
# File 'lib/xml-sax-machines/fragment_builder.rb', line 87
def characters(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.characters(string))
end
#comment(string) ⇒ Object
:nodoc:
# File 'lib/xml-sax-machines/fragment_builder.rb', line 91
def comment(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.comment(string))
end
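The three text-level handlers above (cdata_block, characters, comment) share one dispatch rule: while at least one matched element is open (`@buffer > 0`) the event goes to the Builder via `super`, otherwise it is forwarded to the next filter in the chain, if there is one. The following is a minimal pure-Ruby sketch of that rule, shown for `characters` only and using hypothetical stand-in classes (`RecordingFilter`, `BufferingStage`) rather than the gem's Filter and Builder.

```ruby
# Hypothetical downstream filter that just records what reaches it.
class RecordingFilter
  attr_reader :received

  def initialize
    @received = []
  end

  def characters(string)
    @received << string
  end
end

# Hypothetical stage mirroring FragmentBuilder's dispatch: capture text while a
# match is open, otherwise forward it (or drop it when there is no filter).
class BufferingStage
  attr_reader :captured
  attr_accessor :buffer

  def initialize(filter = nil)
    @filter = filter
    @buffer = 0       # number of matched elements currently open
    @captured = []    # stands in for text added to the fragment document
  end

  def characters(string)
    @buffer > 0 ? capture(string) : (@filter && @filter.characters(string))
  end

  private

  # Plays the role of `super` (Builder adding a text node).
  def capture(string)
    @captured << string
  end
end

downstream = RecordingFilter.new
stage = BufferingStage.new(downstream)
stage.characters('outside')   # no match open: forwarded downstream
stage.buffer = 1
stage.characters('inside')    # inside a match: captured locally
```

Text outside any match therefore never enters the fragment document, which is one reason memory stays bounded for record-based input.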
#end_element_namespace(name, prefix = nil, uri = nil) ⇒ Object
:nodoc:
# File 'lib/xml-sax-machines/fragment_builder.rb', line 69
def end_element_namespace(name, prefix = nil, uri = nil) #:nodoc:
  path = @context.path
  if @buffer > 0 && block = @found.delete(path)
    @buffer -= 1
    block.call(@context)
  end
  super
  if @buffer == 0 && !(path == '/')
    @document.at(path).unlink
    # Unlinked children are not garbage collected till the document they were created in is (I think).
    # This hack job halves memory usage but it still grows too fast for my liking :(
    @document = @document.dup
    @context = @document.at(@context.path) rescue nil
  end
end
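The bookkeeping in end_element_namespace is easier to follow without a DOM. The following is a minimal sketch under stated assumptions: paths are plain strings, each "XPath" is a Ruby predicate on the path instead of `@document.at(xpath)`, and "unlinking" is just a log entry. The class `MatchTracker` is hypothetical. It keeps the same counters as the listing: `@found` maps an open match's path to its callback, `@buffer` counts open matches, and a closed element is only dropped when no match is open.

```ruby
# Hypothetical, DOM-free model of FragmentBuilder's start/end bookkeeping.
class MatchTracker
  def initialize(find, events)
    @find = find        # predicate => callback, like the options hash
    @found = {}         # path of an open matched element => its callback
    @buffer = 0         # number of matched elements still open
    @stack = []         # names of currently open elements
    @events = events    # shared log of callbacks and unlinks
  end

  def start_element(name)
    @stack.push(name)
    path = current_path
    @find.each_pair do |matches, block|
      next unless matches.call(path) && !@found[path]
      @buffer += 1
      @found[path] = block
    end
  end

  def end_element(_name)
    path = current_path
    if @buffer > 0 && (block = @found.delete(path))
      @buffer -= 1
      block.call(path)          # the callback runs before the element is dropped
    end
    @stack.pop
    # Drop the closed element unless a match is still open (or it was the root).
    @events << "unlink #{path}" if @buffer == 0 && !@stack.empty?
  end

  private

  def current_path
    '/' + @stack.join('/')
  end
end

events = []
find = {
  ->(path) { path.end_with?('/record') } => ->(path) { events << "record done: #{path}" },
  ->(path) { path.end_with?('/item') }   => ->(path) { events << "item done: #{path}" }
}
tracker = MatchTracker.new(find, events)
%w[root record item].each { |name| tracker.start_element(name) }
%w[item record].each { |name| tracker.end_element(name) }
tracker.start_element('note')
tracker.end_element('note')
tracker.end_element('root')
```

In this model the nested `item` callback fires before the enclosing `record` callback, and nothing is dropped while `record` is still open, mirroring the `@buffer == 0` guard around `unlink` in the listing.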
#start_element_namespace(name, attributes = [], prefix = nil, uri = nil, ns = []) ⇒ Object
:nodoc:
# File 'lib/xml-sax-machines/fragment_builder.rb', line 57
def start_element_namespace(name, attributes = [], prefix = nil, uri = nil, ns = []) #:nodoc:
  super
  @find.each_pair do |xpath, block|
    if match = @document.at(xpath)
      unless @found[match.path]
        @buffer += 1
        @found[match.path] = block
      end
    end
  end
end
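start_element_namespace re-evaluates every XPath against the whole partial document on each start tag, and `@document.at` returns only the first match in document order. While a matched record is still open, its child start tags therefore find the same record again. The following is a minimal sketch, assuming a hypothetical `StartMatcher` class in which the "document" is an array of element paths and each "XPath" is a Ruby predicate, of why `@found` is keyed by the match's path.

```ruby
# Hypothetical model of the matching step in start_element_namespace.
class StartMatcher
  attr_reader :buffer, :found

  def initialize(find)
    @find = find      # predicate => callback
    @found = {}       # matched path => callback
    @buffer = 0       # matched elements still open
    @document = []    # paths currently held in the partial document
    @stack = []       # names of open elements
  end

  def start_element(name)
    @stack.push(name)
    @document << '/' + @stack.join('/')   # plays the role of `super` adding the node
    @find.each_pair do |matches, block|
      # First match in document order, standing in for @document.at(xpath).
      match = @document.find { |path| matches.call(path) }
      next if match.nil? || @found[match]
      @buffer += 1
      @found[match] = block
    end
  end
end

matcher = StartMatcher.new({ ->(path) { path.end_with?('/record') } => ->(node) { node } })
%w[root record title].each { |name| matcher.start_element(name) }
```

Without the `@found` check, the `title` start tag would count the same `record` a second time; since the record's end tag decrements `@buffer` only once, the count would never return to zero and nothing would be unlinked afterwards.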