XmlSplit

Split XML files on an element, yielding each matching node in turn. Processing is streamed, so memory usage stays constant however large the file is.

Uses sgrep internally.
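The gem hands the matching to sgrep. To illustrate the general streaming idea in pure Ruby, here is a hypothetical sketch (the class name `NaiveSplitter` is made up, and this is not xml_split's implementation). It swaps sgrep for plain regex scanning over fixed-size reads, so memory is bounded by one chunk plus the largest node. It assumes UTF-8 input, that the split element never nests inside itself and is never self-closing, and that no matching tags appear inside comments or CDATA.

```ruby
# Hypothetical sketch, NOT xml_split's implementation (the gem delegates to sgrep).
# Assumptions: UTF-8 input; the split element never nests inside itself,
# is never self-closing, and its tags never appear inside comments/CDATA.
class NaiveSplitter
  include Enumerable

  def initialize(io, element, chunk_size: 64 * 1024)
    @io = io
    @element = element
    @chunk_size = chunk_size
  end

  # Yields each complete <element ...>...</element> region as a UTF-8 String.
  # Reads from the IO, so it can only be iterated once.
  def each
    return enum_for(:each) unless block_given?

    name = Regexp.escape(@element)
    start_re = /<#{name}[\s>]/
    node_re = /<#{name}[\s>].*?<\/#{name}\s*>/m
    keep = @element.bytesize + 1 # length of "<Element" without the closing char
    buffer = String.new # binary buffer; IO#read(n) also returns binary chunks

    while (chunk = @io.read(@chunk_size))
      buffer << chunk
      # Emit every complete node currently held in the buffer.
      while (m = node_re.match(buffer))
        yield m[0].force_encoding(Encoding::UTF_8)
        buffer = m.post_match
      end
      # Drop text that cannot belong to a node, keeping a possibly truncated start tag.
      if (start = start_re =~ buffer)
        buffer = buffer[start..]
      elsif buffer.bytesize > keep
        buffer = buffer[-keep..]
      end
    end
  end
end
```

For example, `File.open('15MinLP_15Days.xml', 'rb') { |f| NaiveSplitter.new(f, 'IntervalReading').each { |node| puts node } }` would print each reading in turn. A real tool like sgrep avoids re-scanning the buffer on every chunk and handles more edge cases than this sketch does.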

As seen on "Split XML files with sgrep, a classic UNIX utility from 1995"

Similar, but not identical, to XML-Twig's xml_split.

Dependencies

Currently requires sgrep or sgrep2 to be installed and on your PATH.
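If you want to check this dependency yourself before running anything, a small PATH lookup is enough. The helper below is hypothetical (it is not part of the gem), and the preference for sgrep2 over sgrep is an assumption made only for this sketch. It also ignores Windows executable extensions such as `.exe`.

```ruby
# Hypothetical helper (not part of xml_split): return the full path of the
# first candidate executable found on PATH, or nil if none is present.
# Candidates are tried in order, so earlier names are preferred.
def find_executable_on_path(candidates, path = ENV.fetch('PATH', ''))
  dirs = path.split(File::PATH_SEPARATOR)
  candidates.each do |name|
    dirs.each do |dir|
      full = File.join(dir, name)
      return full if File.file?(full) && File.executable?(full)
    end
  end
  nil
end
```

Calling `find_executable_on_path(%w[sgrep2 sgrep])` returns something like a path string when either tool is installed, and nil otherwise.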

Usage

>> require 'xml_split'
=> true
>> x = XmlSplit.new('15MinLP_15Days.xml', 'IntervalReading')
=> #<XmlSplit:0x0000010395ce60 @nodes=[], @cache_full=false, @path="/tmp/scratch/15MinLP_15Days.xml", @element="IntervalReading", @caching=false>
>> x.each { |node| puts node }
<IntervalReading>
    <cost>907</cost>
    <timePeriod>
        <duration>900</duration>
        <start>1330578000</start>
         <!-- 3/1/2012 5:00:00 AM  -->
    </timePeriod>
    <value>302</value>
</IntervalReading>
[...]
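Each node printed above is a self-contained XML fragment, so it can be handed to an ordinary parser. The sketch below assumes `XmlSplit#each` yields each node as a String (the irb output suggests this but does not confirm it) and uses REXML from Ruby's standard distribution. The helper name `parse_interval_reading` is hypothetical, and the fields it reads are taken from the sample node shown above.

```ruby
require 'rexml/document'

# Hypothetical helper: extract the fields of one <IntervalReading> node,
# assuming the node arrives as a String like the sample output above.
def parse_interval_reading(node_xml)
  reading = REXML::Document.new(node_xml).root
  {
    cost: reading.elements['cost'].text.to_i,
    value: reading.elements['value'].text.to_i,
    duration: reading.elements['timePeriod/duration'].text.to_i,
    # <start> holds a Unix timestamp; 1330578000 is 2012-03-01 05:00:00 UTC
    start: Time.at(reading.elements['timePeriod/start'].text.to_i).utc
  }
end
```

With the splitter from the session above, `x.each { |node| p parse_interval_reading(node) }` would print one hash per reading.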

Command-line

$ gem install xml_split

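This installs an executable called xml_split, which splits an XML file into many smaller files. In the example below, its arguments are the input file, the element to split on, and an output prefix; each output file is named with the prefix, an underscore, and a zero-padded sequence number: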

$ xml_split ~/samples/CLF8762E_20120709.xml MyElement CLF8762E/20120709
[...]
$ ls CLF8762E
20120709_0000000000
20120709_0000000001
20120709_0000000002
20120709_0000000003
20120709_0000000004
20120709_0000000005
20120709_0000000006
20120709_0000000007
20120709_0000000008
[...]
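The naming scheme in the listing (prefix, underscore, ten-digit counter starting at zero) is easy to reproduce if you want to write nodes out from Ruby instead of the command line. This is a hypothetical sketch, not the gem's code; whether xml_split itself creates the output directory is not stated here, so the `mkdir_p` call is an assumption. It only calls `each`, so it accepts an Array of node strings or, presumably, an XmlSplit instance.

```ruby
require 'fileutils'

# Hypothetical sketch: write each node to PREFIX_0000000000, PREFIX_0000000001, ...
# Returns the list of paths written.
def write_numbered_files(nodes, prefix)
  FileUtils.mkdir_p(File.dirname(prefix)) # assumption: create the directory if missing
  paths = []
  nodes.each do |node|
    path = format('%s_%010d', prefix, paths.size)
    File.write(path, node)
    paths << path
  end
  paths
end
```

For example, `write_numbered_files(x, 'CLF8762E/20120709')` would produce files named like those in the listing above.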

Copyright 2012 Brighter Planet, Inc.