Class: XML

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/magic_xml.rb

Overview

Instance methods (other than those of Enumerable)

Constructor Details

#initialize(*args, &blk) ⇒ XML

initialize can be called in many ways:

  • XML.new

  • XML.new(:tag_symbol)

  • XML.new(:tag_symbol, attributes)

  • XML.new(:tag_symbol, "children", "more", XML.new(…))

  • XML.new(:tag_symbol, attributes, "and", "children")

  • XML.new(:tag_symbol) { monadic code }

  • XML.new(:tag_symbol, attributes) { monadic code }

Or even:

  • XML.new(:tag_symbol, "children") { and some monadic code }

  • XML.new(:tag_symbol, attributes, "children") { and some monadic code }

But typically you won't be mixing these two styles.

Attribute values will be converted to strings.



# File 'lib/magic_xml.rb', line 795

def initialize(*args, &blk)
    @name     = nil
    @attrs    = {}
    @contents = []
    @name = args.shift if args.size != 0
    if args.size != 0 and args[0].is_a? Hash
        args.shift.each{|k,v|
            # Do automatic conversion here
            # This also assures that the hashes are *not* shared
            self[k] = v
        }
    end
    # Expand Arrays passed as arguments
    self << args
    # FIXME: We'd rather not have people say @name = :foo there :-)
    if blk
        instance_eval(&blk)
    end
end
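
The argument handling above can be tried out in isolation. The following is a minimal standalone sketch, not the library's code: MiniNode is a hypothetical stand-in for XML, and it flattens child arrays with Array#flatten rather than going through #<<.

```ruby
# Hypothetical stand-in for XML, used only to illustrate argument handling.
class MiniNode
  attr_reader :name, :attrs, :contents

  def initialize(*args, &blk)
    # The first argument, if any, is the tag name.
    @name     = args.empty? ? nil : args.shift
    @attrs    = {}
    @contents = []
    # An optional Hash right after the name holds the attributes;
    # values are converted to strings, so the caller's Hash is not shared.
    args.shift.each { |k, v| @attrs[k] = v.to_s } if args.first.is_a?(Hash)
    # Everything else is a child; nested arrays are expanded, nils dropped.
    @contents.concat(args.flatten.compact)
    # The optional block runs with the new node as self.
    instance_eval(&blk) if blk
  end
end

node = MiniNode.new(:img, { width: 200 }, "alt text", ["more", nil]) { @contents << "from block" }
```

Here node.attrs holds the width as the string "200", and the child added by the block comes after the children passed as arguments.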

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(meth, *args, &blk) ⇒ Object

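Defines all foo! methods for the monadic interface: calling node.foo!(*args) appends XML.new(:foo, *args) to node. Any other unknown method falls through to the original method_missing.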



# File 'lib/magic_xml.rb', line 975

def method_missing(meth, *args, &blk) 
    if meth.to_s =~ /^(.*)!$/
        self << XML.new($1.to_sym, *args, &blk)
    else
        real_method_missing(meth, *args, &blk)
    end
end
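
The bang-method dispatch can be illustrated without the library. This is a sketch under assumptions: BangBuilder is a hypothetical class, it ignores extra arguments, and it adds respond_to_missing?, which the listing above does not define.

```ruby
# Hypothetical builder showing how foo! calls can become child nodes.
class BangBuilder
  attr_reader :name, :children

  def initialize(name, &blk)
    @name = name
    @children = []
    instance_eval(&blk) if blk
  end

  # foo! appends a child named :foo; anything else is a normal NoMethodError.
  # Extra arguments are accepted but ignored in this sketch.
  def method_missing(meth, *args, &blk)
    if meth.to_s =~ /\A(.+)!\z/
      @children << BangBuilder.new($1.to_sym, &blk)
      self
    else
      super
    end
  end

  def respond_to_missing?(meth, include_private = false)
    meth.to_s.end_with?("!") || super
  end
end

doc = BangBuilder.new(:html) { head! { title! }; body! }
```

Because each block is instance_eval'ed on the node being built, nested bang calls attach to the innermost node.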

Instance Attribute Details

#attrs ⇒ Object

Returns the value of attribute attrs.



# File 'lib/magic_xml.rb', line 778

def attrs
  @attrs
end

#contents ⇒ Object

Returns the value of attribute contents.



# File 'lib/magic_xml.rb', line 778

def contents
  @contents
end

#name ⇒ Object

Returns the value of attribute name.



# File 'lib/magic_xml.rb', line 778

def name
  @name
end

Class Method Details

.from_file(file) ⇒ Object

Read a file and parse it. The argument may be a file name (String) or an already opened file object.



# File 'lib/magic_xml.rb', line 464

def self.from_file(file)
    file = File.open(file) if file.is_a? String
    parse(file)
end

.from_url(url) ⇒ Object

Fetch a URL and parse it. Supported forms:

  • http://…/

  • https://…/

  • file:foo.xml

  • string:<foo/>



# File 'lib/magic_xml.rb', line 475

def self.from_url(url)
    if url =~ /^string:(.*)$/m
        parse($1)
    elsif url =~ /^file:(.*)$/m
        from_file($1)
    elsif url =~ /^http(s?):/
        ssl = ($1 == "s")
        # No, seriously - Ruby needs something better than net/http
        # Something that groks basic auth and queries and redirects automatically:
        # HTTP_LIBRARY.get_content("http://username:passwd/u.r.l/?query")
        # URI parsing must go inside the library, client programs
        # should have nothing to do with it

        # net/http is really inconvenient to use here
        u = URI.parse(url)
        # You're not seeing this:
        if u.query then
            path = u.path + "?" + u.query
        else
            path = u.path
        end
        req = Net::HTTP::Get.new(path)
        if u.userinfo
            username, passwd = u.userinfo.split(/:/,2)
            req.basic_auth username, passwd
        end
        if ssl
            # NOTE: You need libopenssl-ruby installed
            # if you want to use HTTPS. Ubuntu is broken
            # as it doesn't provide it in the default packages.
            require 'net/https'
            http = Net::HTTP.new(u.host, u.port)
            http.use_ssl = true
            http.verify_mode = OpenSSL::SSL::VERIFY_NONE
        else
            http = Net::HTTP.new(u.host, u.port)
        end
        
        res = http.start {|http| http.request(req) }
        # TODO: Throw a more meaningful exception
        parse(res.body)
    else
        raise "URL protocol #{url} not supported (http, https, file, string are supported)"
    end
end
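
The URL handling in the HTTP branch can be examined on its own with the standard uri library. This is only a sketch: the URL and credentials are made up, and it uses URI::HTTP#request_uri, which the listing above does not use, to get the path plus query in one call.

```ruby
require "uri"

# Made-up URL, used only to show which parts the HTTP branch needs.
u = URI.parse("https://user:secret@example.com/feed.xml?page=2")

path = u.request_uri                          # path and query together
username, passwd = u.userinfo.split(":", 2)   # credentials for basic auth
ssl = (u.scheme == "https")                   # whether to enable use_ssl
```

Splitting userinfo with a limit of 2 keeps any further colons inside the password.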

.load(obj) ⇒ Object

Like CDuce's load_xml. The path can be:

  • file handle

  • URL (a string with :)

  • file name (a string without :)



# File 'lib/magic_xml.rb', line 526

def self.load(obj)
    if obj.is_a? String
        if obj.include? ":"
            from_url(obj)
        else
            from_file(obj)
        end
    else
        parse(obj)
    end
end
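
The dispatch rule is simple enough to check without doing any I/O. In this sketch, loader_for is a hypothetical helper that only reports which class method XML.load would call for a given argument.

```ruby
# Hypothetical helper: names the loader XML.load would pick, without any I/O.
def loader_for(obj)
  if obj.is_a?(String)
    # Any colon makes the string count as a URL.
    obj.include?(":") ? :from_url : :from_file
  else
    # Anything that is not a String goes straight to the parser.
    :parse
  end
end

loader_for("string:<foo/>")   # a URL-style argument
loader_for("data.xml")        # a plain file name
```

One consequence of the colon test is that a file name containing a colon, such as a Windows path with a drive letter, is routed to from_url, which raises for protocols it does not support.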

.method_missing(meth, *args, &blk) ⇒ Object

XML.foo! == xml!(:foo)
XML.foo == xml(:foo)



# File 'lib/magic_xml.rb', line 455

def self.method_missing(meth, *args, &blk) 
    if meth.to_s =~ /^(.*)!$/
        xml!($1.to_sym, *args, &blk)
    else
        XML.new(meth, *args, &blk)
    end
end

.parse(stream, options = {}) ⇒ Object

Parse XML using REXML. Available options:

  • :extra_entities => Proc or Hash (default = nil)

  • :remove_pretty_printing => true/false (default = false)

  • :comments => true/false (default = false)

  • :pi => true/false (default = false)

  • :normalize => true/false (default = false) - normalize the parsed tree (see #normalize!)

  • :multiple_roots => true/false (default = false) - the document can have any number of roots (instead of one). Returns all of them in an array instead of root/nil. Non-elements (String/PI/Comment) are also included in the returned array.

FIXME: :comments/:pi will break everything if there are comments/PIs outside the document root. PIs are outside the document root more often than not, so we're pretty much screwed here.

FIXME: Integrate all kinds of parse, and make them support extra options

FIXME: Benchmark normalize!

FIXME: Benchmark dup-based Enumerable methods

FIXME: Make it possible to include a bogus XML_Document superparent, and to make it support out-of-root PIs/Comments


# File 'lib/magic_xml.rb', line 677

def self.parse(stream, options={})
    extra_entities = options[:extra_entities]

    parser = REXML::Parsers::BaseParser.new stream
    stack = [[]]
    
    while true
        event = parser.pull
        case event[0]
        when :start_element
            attrs = {}
            event[2].each{|k,v| attrs[k.to_sym] = v.xml_unescape(extra_entities) }
            stack << XML.new(event[1].to_sym, attrs, event[3..-1])
            stack[-2] << stack[-1]
        when :end_element
            stack.pop
        # Needs unescaping
        when :text
             e = event[1].xml_unescape(extra_entities)
             # Either inside root or in multi-root mode
             if stack.size > 1 or options[:multiple_roots]
                 stack[-1] << e
             elsif event[1] !~ /\S/
                 # Ignore out-of-root whitespace in single-root mode
             else
                 raise "Non-whitespace text out of document root (and not in multiroot mode): #{event[1]}"
             end
        # CDATA is already unescaped
        when :cdata
            e = event[1]
            if stack.size > 1 or options[:multiple_roots]
                stack[-1] << e
            else
                raise "CDATA out of the document root"
            end
        when :comment
            next unless options[:comments]
            e = XML_Comment.new(event[1])
            if stack.size > 1 or options[:multiple_roots]
                stack[-1] << e
            else
                # FIXME: Ugly !
                raise "Comments out of the document root"
            end
        when :processing_instruction
            # FIXME: Real PI node
            next unless options[:pi]
            e = XML_PI.new(event[1], event[2])
            if stack.size > 1 or options[:multiple_roots]
                stack[-1] << e
            else
                # FIXME: Ugly !
                raise "Processing instruction out of the document root"
            end
        when :end_document
            break
        when :xmldecl,:start_doctype,:end_doctype,:elementdecl
            # Positively ignore
        when :externalentity,:entity,:attlistdecl,:notationdecl
            # Ignore ???
            #print "Ignored XML event #{event[0]} when parsing\n"
        else
            # Huh ? What's that ?
            #print "Unknown XML event #{event[0]} when parsing\n"
        end
    end
    roots = stack[0]
    
    roots.each{|root| root.remove_pretty_printing!} if options[:remove_pretty_printing]
    # :remove_pretty_printing does :normalize anyway
    roots.each{|root| root.normalize!} if options[:normalize]
    if options[:multiple_roots]
        roots
    else
        roots[0]
    end
end
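
The stack-based tree construction used by the parser can be shown with a hand-written event list, so that the sketch has no parser dependency. The event shapes and the build_tree helper below are assumptions made for illustration; nodes are plain Hashes rather than XML objects, and only single-root behaviour for text is modelled.

```ruby
# Build a tree from pull-parser-style events using a stack.
# stack[0] is the list of roots; every later entry is an open element.
def build_tree(events)
  stack = [[]]
  events.each do |type, payload|
    case type
    when :start_element
      node = { name: payload.to_sym, contents: [] }
      parent = stack.last
      # Attach to the enclosing element, or to the root list at top level.
      (parent.is_a?(Hash) ? parent[:contents] : parent) << node
      stack << node
    when :end_element
      stack.pop
    when :text
      raise "text outside the document root" if stack.size == 1 && payload =~ /\S/
      # Whitespace outside the root is silently dropped.
      stack.last[:contents] << payload if stack.size > 1
    when :end_document
      break
    end
  end
  stack[0]
end

# Hand-written events standing in for what a pull parser would emit.
events = [
  [:start_element, "foo"], [:text, "Hello, "],
  [:start_element, "bar"], [:text, "world"], [:end_element, "bar"],
  [:end_element, "foo"], [:end_document]
]
roots = build_tree(events)
```

The top of the stack is always the element currently being filled, so attaching a new node to its parent never requires searching the tree.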

.parse_as_twigs(stream) ⇒ Object

Parse XML in mixed stream/tree mode. The idea is that every time we get a start element, we yield it to the block and let the block decide what to do about it. If the block wants the tree below the element, it should call complete! on the yielded node. If a tree was requested, elements below the current one are not processed separately. If it wasn't, they are.

For example:

<foo><bar/></foo><foo2/>
yield <foo> ... </foo>
.complete! called
process <foo2> next

But:

<foo><bar/></foo><foo2/>
yield <foo> ... </foo>
.complete! not called
process <bar> next

FIXME: yielded values are not reusable for now

FIXME: make more object-oriented



# File 'lib/magic_xml.rb', line 559

def self.parse_as_twigs(stream)
    parser = REXML::Parsers::BaseParser.new stream
    # We don't really need to keep the stack ;-)
    stack = []
    while true
        event = parser.pull
        case event[0]
        when :start_element
            # Now the evil part evil
            attrs = {}
            event[2].each{|k,v| attrs[k.to_sym] = v.xml_unescape}
            node = XML.new(event[1].to_sym, attrs, *event[3..-1])
            
            # I can't say it's superelegant
            class <<node
                attr_accessor :do_complete
                def complete!
                    if @do_complete
                        @do_complete.call
                        @do_complete = nil
                    end
                end
            end
            node.do_complete = proc{
                parse_subtree(node, parser)
            }

            yield(node)
            if node.do_complete
                stack.push node
                node.do_complete = nil # It's too late, complete! shouldn't do anything now
            end
        when :end_element
            stack.pop
        when :end_document
            return
        else
            # FIXME: Do the right thing.
            # For now, ignore *everything* else
            # This is totally incorrect, user might want to 
            # see text, comments and stuff like that anyway
        end
    end
end

.parse_sequence(stream, options = {}) ⇒ Object

Parse a sequence. Equivalent to XML.parse(stream, :multiple_roots => true).



# File 'lib/magic_xml.rb', line 756

def self.parse_sequence(stream, options={})
    o = options.dup
    o[:multiple_roots] = true
    parse(stream, o)
end

.parse_subtree(start_node, parser) ⇒ Object

Basically it’s a copy of self.parse, ugly …



# File 'lib/magic_xml.rb', line 605

def self.parse_subtree(start_node, parser)
    stack = [start_node]
    res = nil
    while true
        event = parser.pull
        case event[0]
        when :start_element
            attrs = {}
            event[2].each{|k,v| attrs[k.to_sym] = v.xml_unescape}
            stack << XML.new(event[1].to_sym, attrs, *event[3..-1])
            if stack.size == 1
                res = stack[0] 
            else
                stack[-2] << stack[-1]
            end
        when :end_element
            stack.pop
            return if stack == []
        # Needs unescaping
        when :text
             # Ignore whitespace
             if stack.size == 0
                 next if event[1] !~ /\S/
                 raise "Non-whitespace text out of document root"
             end
             stack[-1] << event[1].xml_unescape
        # CDATA is already unescaped
        when :cdata
             if stack.size == 0
                 raise "CDATA out of the document root"
             end
             stack[-1] << event[1]
        when :end_document
            raise "Parse error: end_document inside a subtree, tags are not balanced"
        when :xmldecl,:start_doctype,:end_doctype,:elementdecl,:processing_instruction
            # Positively ignore
        when :comment,:externalentity,:entity,:attlistdecl,:notationdecl
            # Ignore ???
            #print "Ignored XML event #{event[0]} when parsing\n"
        else
            # Huh ? What's that ?
            #print "Unknown XML event #{event[0]} when parsing\n"
        end
    end
    res

end

.renormalize(stream) ⇒ Object

Renormalize a string containing an XML document.



# File 'lib/magic_xml.rb', line 763

def self.renormalize(stream)
    parse(stream).to_s
end

.renormalize_sequence(stream) ⇒ Object

Renormalize a string containing a sequence of XML documents and strings:

XML.renormalize_sequence("<hello />, <world></world>!") => "<hello/>, <world/>!"



# File 'lib/magic_xml.rb', line 771

def self.renormalize_sequence(stream)
    parse_sequence(stream).to_s
end

Instance Method Details

#<<(cnt) ⇒ Object Also known as: add!

Add children. Possible uses:

Add a single element:

self << xml(...)
self << "foo"

Add nothing:

self << nil

Add multiple elements (also works recursively):

self << [a, b, c] 
self << [a, [b, c], d]


# File 'lib/magic_xml.rb', line 885

def <<(cnt)
    if cnt.nil?
        # skip
    elsif cnt.is_a? Array
        cnt.each{|elem| self << elem}
    else
        @contents << cnt
    end
    self
end

#==(x) ⇒ Object

Equality test, works as if XMLs were normalized, so:

XML.new(:foo, "Hello, ", "world") == XML.new(:foo, "Hello, world")


# File 'lib/magic_xml.rb', line 898

def ==(x)
    return false unless x.is_a? XML
    return false unless name == x.name and attrs == x.attrs
    # Now the hard part, strings can be split in different ways
    # empty string children are possible etc.
    self_i = 0
    othr_i = 0
    while self_i != contents.size or othr_i != x.contents.size
        # Ignore ""s
        if contents[self_i].is_a? String and contents[self_i] == ""
            self_i += 1
            next
        end
        if x.contents[othr_i].is_a? String and x.contents[othr_i] == ""
            othr_i += 1
            next
        end

        # If one is finished and the other contains non-empty elements,
        # they are not equal
        return false if self_i == contents.size or othr_i == x.contents.size

        # Are they both Strings ?
        # Strings can be divided in different ways, and calling normalize!
        # here would be rather expensive, so let's use this complicated
        # algorithm
        if contents[self_i].is_a? String and x.contents[othr_i].is_a? String
            a = contents[self_i]
            b = x.contents[othr_i]
            self_i += 1
            othr_i += 1
            while a != "" or b != ""
                if a == b
                    a = ""
                    b = ""
                elsif a.size > b.size and a[0, b.size] == b
                    a = a[b.size..-1]
                    if x.contents[othr_i].is_a? String
                        b = x.contents[othr_i]
                        othr_i += 1
                        next
                    end
                elsif b.size > a.size and b[0, a.size] == a
                    b = b[a.size..-1]
                    if contents[self_i].is_a? String
                        a = contents[self_i]
                        self_i += 1
                        next
                    end
                else
                    return false
                end
            end
            next
        end

        # OK, so at least one of them is not a String.
        # Hopefully they're either both XMLs or one is an XML and the
        # other is a String. It is also possible that contents contains
        # something illegal, but we aren't catching that,
        # so xml(:foo, Garbage.new) is going to at least equal itself.
        # And we aren't, because xml(:foo, Garbage.new) == xml(:bar, Garbage.new)
        # is going to return an honest false, and incoherent sanity
        # check is worse than no sanity check.
        #
        # Oh yeah, they can be XML_PI or XML_Comment. In such case, this
        # is ok.
        return false unless contents[self_i] == x.contents[othr_i]
        self_i += 1
        othr_i += 1
    end
    return true
end
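
The intended semantics of this comparison can be stated more simply as "normalize both content lists, then compare". The sketch below implements that reference behaviour on plain arrays; it is not the streaming algorithm used above, which avoids building normalized copies, and the helper names normalized and same_contents? are hypothetical.

```ruby
# Merge adjacent strings and drop empty ones; non-strings are kept as-is.
def normalized(contents)
  contents.each_with_object([]) do |c, out|
    if c.is_a?(String)
      next if c.empty?
      if out.last.is_a?(String)
        out[-1] = out.last + c   # build a new string, inputs stay untouched
      else
        out << c
      end
    else
      out << c
    end
  end
end

# Two content lists are equal if their normalized forms are equal.
def same_contents?(a, b)
  normalized(a) == normalized(b)
end

same_contents?(["Hello, ", "world"], ["Hello, world"])
```

A non-string child acts as a barrier: strings on either side of it are never merged, so ["a", :x, "b"] and ["ab", :x] compare as different.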

#=~(pattern) ⇒ Object

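=~ for a few reasonable patterns: a Symbol is compared with the tag name, a Regexp is matched against the node's text, and anything else (Hash, Pattern_any, Pattern_all) is matched with ===.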



# File 'lib/magic_xml.rb', line 1107

def =~(pattern)
    if pattern.is_a? Symbol
        @name == pattern
    elsif pattern.is_a? Regexp
        rv = text =~ pattern
    else # Hash, Pattern_any, Pattern_all
        pattern === self
    end
end

#[](key) ⇒ Object

Read attributes. Also works with pseudoattributes:

img[:@x] == img.child(:x).text # or nil if there isn't any.


# File 'lib/magic_xml.rb', line 842

def [](key)
    if key.to_s[0] == ?@
        tag = key.to_s[1..-1].to_sym
        c = child(tag)
        if c
            c.text
        else
            nil
        end
    else
        @attrs[key]
    end
end

#[]=(key, value) ⇒ Object

Set attributes. Value is automatically converted to String, so you can say:

img[:x] = 200

Also works with pseudoattributes:

foo[:@bar] = "x"


# File 'lib/magic_xml.rb', line 861

def []=(key, value)
    if key.to_s[0] == ?@
        tag = key.to_s[1..-1].to_sym
        c = child(tag)
        if c
            c.contents = [value.to_s]
        else
            self << XML.new(tag, value.to_s)
        end
    else
        @attrs[key] = value.to_s
    end
end

#add_pretty_printing! ⇒ Object

Add pretty-printing whitespace. Also normalizes the XML.



# File 'lib/magic_xml.rb', line 1143

def add_pretty_printing!
    normalize!
    real_add_pretty_printing!
    normalize!
end

#child(pat = nil, *rest) ⇒ Object

Returns the first child matching the patterns, that is, the first element of node.children(pat, *rest). Returns nil if there aren't any matching children.



# File 'lib/magic_xml.rb', line 1227

def child(pat=nil, *rest)
    children(pat, *rest) {|c|
        return c
    }
    return nil
end

#children(pat = nil, *rest, &blk) ⇒ Object

XML#children(pattern, more_patterns)

Return all children of a node that match the pattern. If a block is given, it is also called with each of them. Also:

  • children(:a, :b) == children(:a).children(:b)

  • children(:a, :*, :c) == children(:a).descendants(:c)



# File 'lib/magic_xml.rb', line 1248

def children(pat=nil, *rest, &blk)
    return descendants(*rest, &blk) if pat == :*
    res = []
    @contents.each{|c|
        if pat.nil? or pat === c
            if rest == []
                res << c
                yield c if block_given?
            else
                res += c.children(*rest, &blk)
            end
        end
    }
    res
end
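
The way a chain of patterns walks down one level per pattern can be sketched on a toy tree. This is an illustration under assumptions, not the library's behaviour in full: Elem and children_of are hypothetical, patterns are compared with the tag name directly instead of through ===, the :* wildcard is not handled, and non-element children are skipped even when the pattern is nil.

```ruby
# Toy element type: a tag name plus a list of children.
Elem = Struct.new(:name, :kids)

# Select children matching pat, then apply the remaining patterns
# to each match, one tree level per pattern.
def children_of(node, pat = nil, *rest)
  node.kids
      .select { |c| c.is_a?(Elem) && (pat.nil? || c.name == pat) }
      .flat_map { |c| rest.empty? ? [c] : children_of(c, *rest) }
end

tree = Elem.new(:root, [
  Elem.new(:a, [Elem.new(:b, []), Elem.new(:c, [])]),
  Elem.new(:a, [Elem.new(:b, [])]),
  "text"
])
matches = children_of(tree, :a, :b)
```

With this tree, the two-pattern call collects the b elements from under both a elements, which mirrors the children(:a, :b) equivalence listed above.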

#children_sort_by(*args, &blk) ⇒ Object

Sort all children of an XML element, including non-XML children such as Strings. Returns a sorted copy.



# File 'lib/magic_xml.rb', line 413

def children_sort_by(*args, &blk)
    self.dup{ @contents = @contents.sort_by(*args, &blk) }
end

#deep_map(pat, &blk) ⇒ Object

Change elements based on pattern



# File 'lib/magic_xml.rb', line 1291

def deep_map(pat, &blk)
    if self =~ pat
        yield self
    else
        r = XML.new(self.name, self.attrs)
        each{|c|
            if c.is_a? XML
                r << c.deep_map(pat, &blk)
            else
                r << c
            end
        }
        r
    end
end

#descendant(pat = nil, *rest) ⇒ Object

Returns the first descendant matching the patterns, that is, the first element of node.descendants(pat, *rest). Returns nil if there aren't any matching descendants.



# File 'lib/magic_xml.rb', line 1236

def descendant(pat=nil, *rest)
    descendants(pat, *rest) {|c|
        return c
    }
    return nil
end

#descendants(pat = nil, *rest, &blk) ⇒ Object

  • XML#descendants

  • XML#descendants(pattern)

  • XML#descendants(pattern, more_patterns)

Return all descendants of a node matching the pattern. If pattern == nil, simply return all descendants, including Strings. If a block is given, it is run on each of them.



# File 'lib/magic_xml.rb', line 1272

def descendants(pat=nil, *rest, &blk)
    res = []
    @contents.each{|c|
        if pat.nil? or pat === c
            if rest == []
                res << c
                yield c if block_given?
            else
                res += c.children(*rest, &blk)
            end
        end
        if c.is_a? XML
            res += c.descendants(pat, *rest, &blk)
        end
    }
    res
end

#dup(&blk) ⇒ Object

This is not a trivial method. First, it does a deep copy. Second, it takes a block which is instance_eval'ed, so you can do things like:

  • node.dup{ @name = :foo }

  • node.dup{ self[:color] = "blue" }



# File 'lib/magic_xml.rb', line 1169

def dup(&blk)
    new_obj = self.raw_dup
    # Attr values stay shared - ugly
    new_obj.attrs = new_obj.attrs.dup
    new_obj.contents = new_obj.contents.map{|c| c.dup}
    
    new_obj.instance_eval(&blk) if blk
    return new_obj
end

#each(*selector, &blk) ⇒ Object

Iterate over children, possibly with a selector



# File 'lib/magic_xml.rb', line 402

def each(*selector, &blk)
    children(*selector, &blk)
    self
end

#exec!(&blk) ⇒ Object

Make monadic interface more “official”

  • node.exec! { foo!; bar! }

is equivalent to

  • node << xml(:foo) << xml(:bar)



# File 'lib/magic_xml.rb', line 987

def exec!(&blk)
    instance_eval(&blk)
end

#inspect(include_children = 0) ⇒ Object

Convert to a well-formatted XML, but without children information. This is a reasonable format for irb and debugging. If you want to see a few levels of children, call inspect(2) and so on



# File 'lib/magic_xml.rb', line 828

def inspect(include_children=0)
    "<#{@name}" + @attrs.sort.map{|k,v| " #{k}='#{v.xml_attr_escape}'"}.join +
    if @contents.size == 0
        "/>"
    elsif include_children == 0
        ">...</#{name}>"
    else
        ">" + @contents.map{|x| if x.is_a? String then x.xml_escape else x.inspect(include_children-1) end}.join + "</#{name}>"
    end
end

#map(pat = nil) ⇒ Object

Map children, but leave the name/attributes.

FIXME: do we want a shallow or a deep copy here ?



# File 'lib/magic_xml.rb', line 1309

def map(pat=nil)
    r = XML.new(self.name, self.attrs)
    each{|c|
        if !pat || c =~ pat
            r << yield(c)
        else
            r << c
        end
    }
    r
end

#normalize! ⇒ Object

Normalization means joining strings and getting rid of “”s, recursively



# File 'lib/magic_xml.rb', line 1193

def normalize!
    new_contents = []
    @contents.each{|c|
        if c.is_a? String
            next if c == ""
            if new_contents[-1].is_a? String
                new_contents[-1] += c
                next
            end
        else
            c.normalize!
        end
        new_contents.push c
    }
    @contents = new_contents
end

#range(range_start, range_end, end_reached_cb = nil) ⇒ Object

Select a subtree.

NOTE: Uses object_id of the start/end tags ! They have to be the very same objects, not just equal ones !

<foo>0<a>1</a><b/><c/><d>2</d><e/>3</foo>.range(<a>1</a>, <d>2</d>) returns <foo><b/><c/></foo>

start and end and their descendants are not included in the result tree. Either start or end can be nil.

  • If both start and end are nil, return whole tree.

  • If start is nil, return subtree up to range_end.

  • If start is not inside the tree, return nil.

  • If end is nil, return subtree from start

  • If end is not inside the tree, return subtree from start.

  • If end is before or below start, or they’re the same node, the result is unspecified.

  • If end comes directly after start, or as first node when start==nil, return path reaching there.



# File 'lib/magic_xml.rb', line 1007

def range(range_start, range_end, end_reached_cb=nil)
    if range_start == nil
        result = XML.new(name, attrs)
    else
        result = nil
    end
    @contents.each {|c|
        # end reached !
        if range_end and c.object_id == range_end.object_id
            end_reached_cb.call if end_reached_cb
            break
        end
        # start reached !
        if range_start and c.object_id == range_start.object_id
            result = XML.new(name, attrs)
            next
        end
        if result # We already started
            if c.is_a? XML
                break_me = false
                result.add! c.range(nil, range_end, lambda{ break_me = true })
                if break_me
                    end_reached_cb.call if end_reached_cb
                    break
                end
            else # String/XML_PI/XML_Comment
                result.add! c
            end
        else
            # Strings/XML_PI/XML_Comment obviously cannot start a range
            if c.is_a? XML
                break_me = false
                r = c.range(range_start, range_end, lambda{ break_me = true })
                if r
                    # start reached !
                    result = XML.new(name, attrs, r)
                end
                if break_me
                    # end reached !
                    end_reached_cb.call if end_reached_cb
                    break
                end
            end
        end
    }
    return result
end

#raw_dup ⇒ Object



# File 'lib/magic_xml.rb', line 1163

alias_method :raw_dup, :dup

#real_method_missing ⇒ Object



# File 'lib/magic_xml.rb', line 972

alias_method :real_method_missing, :method_missing

#remove_pretty_printing!(exceptions = nil) ⇒ Object

Get rid of pretty-printing whitespace. Also normalizes the XML.



# File 'lib/magic_xml.rb', line 1118

def remove_pretty_printing!(exceptions=nil)
    normalize!
    real_remove_pretty_printing!(exceptions)
    normalize!
end

#sort(*args, &blk) ⇒ Object

Sort children of XML element.

Using sort is highly wrong, as XML (and XML-extras) is not even Comparable. Use sort_by instead.

Unless you define your own XML#<=> operator, or do something equally weird.



# File 'lib/magic_xml.rb', line 423

def sort(*args, &blk)
    self.dup{ @contents = @contents.sort(*args, &blk) }
end

#sort_by(*args, &blk) ⇒ Object

Sort XML children of an XML element. Non-XML children (such as Strings) are dropped from the returned copy.



# File 'lib/magic_xml.rb', line 408

def sort_by(*args, &blk)
    self.dup{ @contents = @contents.select{|c| c.is_a? XML}.sort_by(*args, &blk) }
end

#subsequence(range_start, range_end, start_seen_cb = nil, end_seen_cb = nil) ⇒ Object

XML#subsequence is similar to XML#range, but instead of a trimmed subtree it returns a list of elements. The same elements are included in both cases, but here we do not include any parents !

<foo><a/><b/><c/></foo>.range(a,c) => <foo><b/></foo>
<foo><a/><b/><c/></foo>.subsequence(a,c) => <b/>

<foo><a><a1/></a><b/><c/></foo>.range(a1,c) => <foo><a/><b/></foo> # Does <a/> make sense ?
<foo><a><a1/></a><b/><c/></foo>.subsequence(a1,c) => <b/>

<foo><a><a1/><a2/></a><b/><c/></foo>.range(a1,c) => <foo><a><a2/></a><b/></foo>
<foo><a><a1/><a2/></a><b/><c/></foo>.subsequence(a1,c) => <a2/><b/>

And we return [], not nil, if nothing matches.



# File 'lib/magic_xml.rb', line 1070

def subsequence(range_start, range_end, start_seen_cb=nil, end_seen_cb=nil)
    result = []
    start_seen = range_start.nil?
    @contents.each{|c|
        if range_end and range_end.object_id == c.object_id
            end_seen_cb.call if end_seen_cb
            break 
        end
        if range_start and range_start.object_id == c.object_id
            start_seen = true
            start_seen_cb.call if start_seen_cb
            next
        end
        if start_seen
            if c.is_a? XML
                break_me = false
                result += c.subsequence(nil, range_end, nil, lambda{break_me=true})
                break if break_me
            else # String/XML_PI/XML_Comment
                result << c
            end
        else
            # String/XML_PI/XML_Comment cannot start a subsequence
            if c.is_a? XML
                break_me = false
                result += c.subsequence(range_start, range_end, lambda{start_seen=true}, lambda{break_me=true})
                break if break_me
            end
        end
    }
    # Include starting tag if it was right from the range_start
    # Otherwise, return just the raw sequence
    result = [XML.new(@name, @attrs, result)] if range_start == nil
    return result
end

#text ⇒ Object

Return text below the node, stripping all XML tags:

"<foo>Hello, <bar>world</bar>!</foo>".xml_parse.text returns "Hello, world!"



# File 'lib/magic_xml.rb', line 1213

def text
    res = ""
    @contents.each{|c|
        if c.is_a? XML
            res << c.text
        elsif c.is_a? String
            res << c
        end # Ignore XML_PI/XML_Comment
    }
    res
end
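
The recursion behind text extraction can be reproduced on a toy tree. In this sketch Node and text_of are hypothetical names, and the tree is built by hand instead of being parsed.

```ruby
# Toy element type: a tag name plus a list of children.
Node = Struct.new(:name, :contents)

# Concatenate all strings below the node, in document order.
def text_of(node)
  node.contents.map { |c|
    case c
    when Node   then text_of(c)   # recurse into child elements
    when String then c            # keep text as-is
    else ""                       # anything else contributes nothing
    end
  }.join
end

tree = Node.new(:foo, ["Hello, ", Node.new(:bar, ["world"]), "!"])
result = text_of(tree)
```

For this tree, result is the same "Hello, world!" string as in the description above.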

#text!(*args) ⇒ Object

Add some String children (all arguments get to_s'ed)



# File 'lib/magic_xml.rb', line 1181

def text!(*args)
    args.each{|s| self << s.to_s}
end

#to_s ⇒ Object

Convert to a well-formatted XML



# File 'lib/magic_xml.rb', line 816

def to_s
    "<#{@name}" + @attrs.sort.map{|k,v| " #{k}='#{v.xml_attr_escape}'"}.join +
    if @contents.size == 0
        "/>"
    else
        ">" + @contents.map{|x| if x.is_a? String then x.xml_escape else x.to_s end}.join + "</#{name}>"
    end
end

#xml!(*args, &blk) ⇒ Object

Add XML child



# File 'lib/magic_xml.rb', line 1185

def xml!(*args, &blk)
    @contents << XML.new(*args, &blk)
end