AXML

AXML provides a simple, minimalistic DOM for working with data stored in an XML document. Its API is very similar to LibXML's, differing slightly in the handling of text nodes. It is designed with very large documents in mind: nodes are represented as memory-efficient Struct objects, and it runs on either XMLParser or LibXML!

‘AXML’ literally translates to ‘ax XML’, which succinctly describes a programmer's occasional feelings toward XML and its myriad parsers. AXML won’t solve all your XML woes, but it does make working with XML much less painful.

Features


  • fast: runs on either XMLParser or LibXML

  • lean: as in ‘lines of code’ and as in ‘memory consumption’ (nodes implemented as Struct, children in Array)

  • easy to extend: code your grandmother could read and understand (if she reads Ruby)

  • POLS (principle of least surprise): implements a useful subset of libxml methods, making AXML a near drop-in replacement.
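To make the "nodes implemented as Struct, children in Array" idea concrete, here is a minimal sketch of what such a node type could look like. The class name `SketchNode` and its members (`name`, `attrs`, `text`, `parent`, `children`) are hypothetical and chosen for illustration; they are not AXML's actual internals. The sketch only mirrors the calls shown in the examples (`node['size']`, `child`, `next`, `text`/`content`).

```ruby
# Hypothetical node layout (not AXML's actual class): each element is a
# Struct, and its child elements are kept in a plain Array.
SketchNode = Struct.new(:name, :attrs, :text, :parent, :children) do
  # Attribute lookup in the style of node['size'].
  # Note: this overrides Struct#[] member access for this class.
  def [](key)
    attrs[key]
  end

  # First child element, or nil if there are none.
  def child
    children.first
  end

  # Next sibling element, found via the parent's children Array;
  # nil for the root or for the last child.
  def next
    return nil unless parent
    siblings = parent.children
    siblings[siblings.index { |c| c.equal?(self) } + 1]
  end

  # `content` as another name for `text`, as in the examples.
  alias_method :content, :text
end

# Hand-build part of the <n1> example tree.
n1 = SketchNode.new('n1', {}, '', nil, [])
big = SketchNode.new('n2', { 'size' => 'big' }, '', n1, [])
small = SketchNode.new('n2', { 'size' => 'small' }, '', n1, [])
n1.children.push(big, small)
big.children << SketchNode.new('n3', {}, 'words here', big, [])

p big['size']                  # "big"
p big.child.content            # "words here"
p n1.child.next.equal?(small)  # true
p small.next                   # nil
```

Keeping `parent` on each node is what lets `next` be computed from the parent's Array instead of storing a separate sibling pointer; whether AXML makes the same trade-off is not stated here.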

Examples


require 'axml'  

# a little example xml string to use
string_or_io = "
<n1>
  <n2 size='big'>
    <n3>words here</n3>
    <n3></n3>
  </n2>
  <n2 size='small'>
    <n3 id='3'></n3>
  </n2>
</n1>
"

### Read a string, io, or file

n1_node = AXML.parse(string_or_io)   
# --or--
n1_node = AXML.parse('path/to/file')

### Access children

n1_node.children # -> [array]
n1_node.each do |child|
  # do something with each child
end

### Traverse the whole tree structure

n1_node.traverse do |node|
  # pre traversal
end

n1_node.traverse(:post) do |node|
  # post traversal
end

### Get attributes and text

n2_node = n1_node.child   # the first <n2>
n2_node['size'] == 'big'
n3_node = n2_node.child
n3_node.text    # -> 'words here'
n3_node.content # -> [same]

### Navigate nodes

n2_node = n1_node.child
the_other_n2_node = n2_node.next
the_other_n2_node.next   # -> nil

### Does a little xpath

# find_first (returns the first node)
n3_node = n1_node.find_first('descendant::n3')
other_n3_node = n3_node.find_first('following-sibling::n3')
n1_node.find_first('child::n3')    # -> nil
# also callable as find_first_child and find_first_descendant

# find (returns an array)
n1_node.find('child::n2')          # -> [array of 2 <n2> nodes]
n1_node.find('descendant::n3')     # -> [array of all 3 <n3> nodes]
# also callable as find_child and find_descendant

See `specs/axml_spec.rb` for more examples and functionality.
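The `traverse` and `find`/`find_first` calls above can be understood as ordinary depth-first tree walks. The sketch below uses a hypothetical `TreeNode` Struct and a hypothetical `build` helper (neither is part of AXML) to show pre-order versus post-order traversal, and how the `child`, `descendant`, and `following-sibling` axes used in the examples select nodes. It handles only `axis::name` expressions on those three axes, not general XPath, and is not the gem's implementation.

```ruby
# Hypothetical element node: a name, an Array of child elements, a parent.
TreeNode = Struct.new(:name, :children, :parent) do
  # Depth-first walk. :pre yields a node before its children;
  # :post yields it only after all of its children were visited.
  def traverse(order = :pre, &block)
    block.call(self) if order == :pre
    children.each { |c| c.traverse(order, &block) }
    block.call(self) if order == :post
  end

  # Minimal "axis::name" evaluator for three axes; returns an Array
  # in document order.
  def find(path)
    axis, name = path.split('::', 2)
    candidates =
      case axis
      when 'child'
        children
      when 'descendant'
        found = []
        children.each { |c| c.traverse(:pre) { |n| found << n } }
        found
      when 'following-sibling'
        siblings = parent ? parent.children : []
        idx = siblings.index { |c| c.equal?(self) }
        idx ? siblings.drop(idx + 1) : []
      else
        raise ArgumentError, "unsupported axis: #{axis}"
      end
    candidates.select { |n| n.name == name }
  end

  def find_first(path)
    find(path).first
  end
end

# Hypothetical helper: create a node and attach it to its parent.
def build(name, parent = nil)
  node = TreeNode.new(name, [], parent)
  parent.children << node if parent
  node
end

# The <n1> example document, built by hand.
n1 = build('n1')
n2a = build('n2', n1)
n2b = build('n2', n1)
build('n3', n2a)
build('n3', n2a)
build('n3', n2b)

pre = []
n1.traverse { |n| pre << n.name }
post = []
n1.traverse(:post) { |n| post << n.name }

p pre                               # ["n1", "n2", "n3", "n3", "n2", "n3"]
p post                              # ["n3", "n3", "n2", "n3", "n2", "n1"]
p n1.find('descendant::n3').size    # 3
p n1.find_first('child::n3')        # nil
```

Post-order is the natural choice when a node's handling depends on its children having been processed already (for example, computing subtree sizes); pre-order matches document order.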

Detailed Description


AXML parses elements, attributes, and text (content), and nothing more. It should be very easy to extend and modify for special cases. It is roughly as fast as libxml, although it currently reads in the entire document first (this is still memory-efficient, since nodes are implemented as Structs).
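AXML runs on top of either XMLParser or LibXML and reads in the whole document before handing back a tree. A common way to build such a tree from a streaming parser is a stack of open elements driven by start/characters/end callbacks. The sketch below shows that general pattern with a hypothetical `Elem` Struct and `TreeBuilder` class; it is an assumption about the technique, not a description of AXML's code, and the callback names do not correspond to any particular parser's API. Text is stored on the element itself rather than as separate text nodes.

```ruby
# Hypothetical element record: name, attribute Hash, accumulated text,
# and an Array of child elements (text lives on the element itself).
Elem = Struct.new(:name, :attrs, :text, :children)

# Builds a whole tree from SAX-style events using a stack of open elements.
# The callback names are illustrative, not AXML's real handler interface.
class TreeBuilder
  attr_reader :root

  def initialize
    @stack = [] # elements opened but not yet closed
    @root = nil
  end

  def start_element(name, attrs = {})
    node = Elem.new(name, attrs, +'', [])
    if @stack.empty?
      @root = node
    else
      @stack.last.children << node
    end
    @stack.push(node)
  end

  # Append character data to the innermost open element.
  # (Whitespace handling is deliberately ignored in this sketch.)
  def characters(str)
    @stack.last.text << str unless @stack.empty?
  end

  def end_element(_name)
    @stack.pop
  end
end

# Replay the events a streaming parser might emit for the first <n2> element.
b = TreeBuilder.new
b.start_element('n2', { 'size' => 'big' })
b.start_element('n3')
b.characters('words here')
b.end_element('n3')
b.start_element('n3')
b.end_element('n3')
b.end_element('n2')

p b.root.attrs['size']          # "big"
p b.root.children.map(&:text)   # ["words here", ""]
```

The tree is only complete once the final end event arrives, which illustrates why a DOM-style builder has the entire document in hand before returning.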

Installation


gem install axml

See Also


If you are parsing HTML or complex word-processing documents, this is not the parser for you; try something like Hpricot or LibXML.