AXML
AXML - Provides a simple, minimalistic DOM for working with data stored in an XML document. The API is very similar to LibXML, differing slightly in the handling of text nodes. It is designed with very large documents in mind: under XMLParser, nodes are represented as memory-efficient Struct objects, and it works with either XMLParser or LibXML!
‘AXML’ literally translates into ‘ax XML’ which succinctly describes the occasional feeling of a programmer towards XML or its myriad parsers. AXML won’t solve all your XML woes, but it does make working with XML much less painful.
Overview
- fast: runs on either XMLParser or LibXML
- lean: as in ‘lines of code’ and as in ‘memory consumption’
Examples
require 'axml'
# a little example xml string to use
string = "
<n1>
<n2 size='big'>
<n3>words here</n3>
<n3></n3>
</n2>
<n2 size='small' >
<n3 id='3' thinks='out loud'></n3>
</n2>
</n1>
"
Read a string, io, or file
n1_node = AXML.parse(string) # <- can read xml as string
n1_node = AXML.parse(io) # <- can read an io object
n1_node = AXML.parse('path/to/file') # <- can read a file
Access children
n1_node.children # -> [array]
n1_node.each do |child|
  # do something with each child
end
Traverse the whole tree structure
n1_node.traverse do |node|
# pre traversal
end
n1_node.traverse(:post) do |node|
  # post traversal
end
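To make the difference between the two traversal orders concrete, here is a minimal sketch in plain Ruby. It does not use AXML itself: `SketchNode` is a hypothetical stand-in, and the assumption is only that pre-order visits a node before its children and `:post` visits it after them.

```ruby
# Hypothetical stand-in for an AXML node; not the library's actual class.
SketchNode = Struct.new(:name, :children) do
  # Yield every node in the subtree: parents first (:pre) or children first (:post).
  def traverse(order = :pre, &block)
    block.call(self) if order == :pre
    children.each { |child| child.traverse(order, &block) }
    block.call(self) if order == :post
    self
  end
end

tree = SketchNode.new('n1', [
  SketchNode.new('n2', [SketchNode.new('n3', [])]),
  SketchNode.new('n2', [SketchNode.new('n3', [])])
])

pre_order = []
tree.traverse { |node| pre_order << node.name }
# pre_order  -> ["n1", "n2", "n3", "n2", "n3"]

post_order = []
tree.traverse(:post) { |node| post_order << node.name }
# post_order -> ["n3", "n2", "n3", "n2", "n1"]
```

Post-order is the one to reach for when a parent's result depends on its children, for example when summing sizes bottom-up.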
Get attributes and text
n2_node['size'] == 'big'
n3_node = n2_node.child
n3_node.text # -> 'words here'
n3_node.content # -> 'words here'
Navigate nodes
n2_node = n1_node.child
the_other_n2_node = n2_node.next
the_other_n2_node.next # -> nil
Does a little xpath
# find_first (returns the first node)
n3_node = n1_node.find_first('descendant::n3')
other_n3_node = n3_node.find_first('following-sibling::n3')
n1_node.find_first('child::n3') # -> nil
# also callable as find_first_child and find_first_descendant
# find (returns an array)
n1_node.find('child::n2') # -> [array of 2 <n2> nodes]
n1_node.find('descendant::n3') # -> [array of all 3 <n3> nodes]
# also callable as find_child and find_descendant
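For readers curious how a 'little xpath' restricted to the child and descendant axes might work, here is a rough sketch. It is not AXML's implementation: `SketchEl` and its methods are hypothetical, and only the `child::NAME` and `descendant::NAME` forms are handled.

```ruby
# Hypothetical node type for illustration only; AXML's real classes may differ.
SketchEl = Struct.new(:name, :children) do
  # All nodes below this one, in document order.
  def descendants
    children.flat_map { |child| [child] + child.descendants }
  end

  # Supports only 'child::NAME' and 'descendant::NAME'.
  # (This definition shadows Enumerable#find, which Struct mixes in.)
  def find(path)
    axis, name = path.split('::', 2)
    candidates =
      case axis
      when 'child' then children
      when 'descendant' then descendants
      else raise ArgumentError, "unsupported axis: #{axis}"
      end
    candidates.select { |node| node.name == name }
  end

  # First match, or nil when nothing matches.
  def find_first(path)
    find(path).first
  end
end

# Same shape as the example document: two <n2> nodes holding three <n3> nodes.
n1 = SketchEl.new('n1', [
  SketchEl.new('n2', [SketchEl.new('n3', []), SketchEl.new('n3', [])]),
  SketchEl.new('n2', [SketchEl.new('n3', [])])
])

n1.find('child::n2').size       # -> 2
n1.find('descendant::n3').size  # -> 3
n1.find_first('child::n3')      # -> nil
```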
Manipulate tree structure
node.drop # drop the node from its parent
## (insert?)
Output
XML Output is currently tested only with XMLParser.
node.to_s # -> formatted xml
node.to_doc # -> with xml header line
node.to_doc(filename) # -> written to filename
See the ‘spec/’ dir for more examples and functionality.
Details
If using XMLParser, AXML builds nodes out of Struct objects (AXML::El). It currently parses only elements, attributes, and text (content); there is no CDATA support right now.
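As an illustration of the Struct-based approach, the sketch below shows an element that carries its own attributes and text, so no separate text-node objects are needed. The field names and helper methods are assumptions made for this example and are not taken from AXML::El.

```ruby
# Hypothetical sketch; field names are assumptions, not AXML::El's actual layout.
SketchStructEl = Struct.new(:name, :attrs, :text, :children) do
  # Attribute lookup by name; replaces Struct#[] member access in this sketch.
  def [](attr_name)
    attrs[attr_name]
  end

  # Append a child element and return it.
  def add(child)
    children << child
    child
  end

  # First child element, or nil if there are none.
  def child
    children.first
  end
end

n2 = SketchStructEl.new('n2', { 'size' => 'big' }, nil, [])
n3 = n2.add(SketchStructEl.new('n3', {}, 'words here', []))

n2['size']     # -> 'big'
n2.child.text  # -> 'words here' (text is stored on the element itself)
```

A Struct keeps each node down to a fixed set of slots, which is one plausible reason for choosing it when documents are very large.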
If using LibXML, it uses the underlying LibXML nodes already available. It overrides some methods to treat the text in a text node as the parent node’s text attribute.
Warnings
Output of XML (i.e., node#to_s) under LibXML is untested (and probably buggy) since the node text behavior has been modified. This will be worked out in a future release.
Doesn’t parse CDATA using XMLParser right now.
Installation
gem install axml
You can get instructions on installing XMLParser and LibXML by issuing this command:
ruby -rubygems -e 'require "axml"; puts AXML::Autoload.install_instructions(:all)'
See Also
If you are parsing HTML or complex word processing documents, this is not the parser for you. Try something like hpricot or LibXML.