Module: XML
Defined in: lib/webget_ramp/xml.rb
Class Method Summary
- .load_attributes(dirpath, xpath) ⇒ Object
  Sugar to load attributes from a file.
- .load_attributes_hash(dirpath, xpath) ⇒ Object
  Sugar to load an attributes hash from a file.
- .load_dir(*dirpaths) ⇒ Object
  Specify one or more glob patterns and pass each matching XML file, parsed as a REXML::Document, to a block.
- .load_elements(dirpath, xpath) ⇒ Object
  Sugar to load elements from a file.
- .strip_all(xml_text) ⇒ Object
  Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that the XML parser can handle a dirty document.
- .strip_attributes(xml_text) ⇒ Object
  Strip out all attributes from the XML text's tags.
- .strip_comments(xml_text) ⇒ Object
  Strip out all comments from the XML text.
- .strip_microsoft(xml_text) ⇒ Object
  Strip out Microsoft conditional-comment blocks.
- .strip_unprintables(xml_text) ⇒ Object
  Strip out all unprintable characters from the input string.
Class Method Details
.load_attributes(dirpath, xpath) ⇒ Object
Sugar to load attributes from a file: yields the attributes of each element matching xpath in every XML file matching dirpath.
Example
XML.load_attributes('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }
# File 'lib/webget_ramp/xml.rb', line 52
def XML.load_attributes(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|e|
    yield e.attributes
  }
end
.load_attributes_hash(dirpath, xpath) ⇒ Object
Sugar to load an attributes hash from a file: like XML.load_attributes, but yields each matching element's attributes converted with to_hash.
Example
XML.load_attributes_hash('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }
# File 'lib/webget_ramp/xml.rb', line 63
def XML.load_attributes_hash(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|e|
    yield e.attributes.to_hash
  }
end
.load_dir(*dirpaths) ⇒ Object
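Specify one or more glob patterns and pass each matching file, parsed as a REXML::Document, to a block. Within each pattern, matching files are processed in sorted order.
See [Dir.glob](www.ruby-doc.org/core/classes/Dir.html#M002347) for pattern details; the method uses Dir[], which accepts the same patterns.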
Example
XML.load_dir('/tmp/*.xml'){|xml_document|
#...whatever you want to do with each xml document
}
Example to load XML documents in files beginning with “foo” or “bar”
XML.load_dir('/tmp/foo*.xml','/tmp/bar*.xml'){|xml_document|
#...whatever you want to do with the xml document
}
# File 'lib/webget_ramp/xml.rb', line 20
def XML.load_dir(*dirpaths)
  dirpaths=[*dirpaths.flatten]
  dirpaths.each do |dirpath|
    Dir[dirpath].sort.each do |filename|
      File.open(filename) do |file|
        doc = REXML::Document.new file
        yield doc
      end #file
    end #dir
  end #each
end
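The loop above can be exercised in isolation. This is a minimal sketch, assuming Ruby's bundled rexml gem is installed; load_dir_sketch is a hypothetical stand-in that repeats the method's glob, sort, parse, and yield steps against throwaway files in a temporary directory.

```ruby
require 'rexml/document'
require 'tmpdir'

# Hypothetical stand-in for XML.load_dir (not the library's code):
# glob each pattern, sort the matches, parse each file with REXML, yield it.
def load_dir_sketch(*patterns)
  patterns.flatten.each do |pattern|
    Dir[pattern].sort.each do |filename|
      File.open(filename) { |file| yield REXML::Document.new(file) }
    end
  end
end

roots = []
Dir.mktmpdir do |dir|
  # Written out of alphabetical order on purpose; the sort restores a, b.
  File.write(File.join(dir, 'b.xml'), '<beta/>')
  File.write(File.join(dir, 'a.xml'), '<alpha/>')
  File.write(File.join(dir, 'notes.txt'), 'not matched by *.xml')
  load_dir_sketch(File.join(dir, '*.xml')) { |doc| roots << doc.root.name }
end

p roots # => ["alpha", "beta"]
```

notes.txt is skipped only because the pattern ends in *.xml; the method itself does not filter by extension, so any file a pattern matches is handed to the REXML parser.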
.load_elements(dirpath, xpath) ⇒ Object
Sugar to load elements from a file: yields each element matching xpath in every XML file matching dirpath.
Example
XML.load_elements('config.xml','userlist/user'){|element| pp element.attributes['first_name'] }
# File 'lib/webget_ramp/xml.rb', line 38
def XML.load_elements(dirpath,xpath)
  XML.load_dir(dirpath){|doc|
    doc.elements.each(xpath){|e|
      yield e
    }
  }
end
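A minimal sketch of what XML.load_elements and XML.load_attributes do with each parsed document, assuming the rexml gem is available. The sample userlist document is hypothetical, and it is parsed from a string instead of a file so the sketch stays self-contained.

```ruby
require 'rexml/document'

# Hypothetical sample matching the 'userlist/user' examples above.
sample = <<~SAMPLE
  <userlist>
    <user first_name="Ada" last_name="Lovelace"/>
    <user first_name="Alan" last_name="Turing"/>
  </userlist>
SAMPLE

doc = REXML::Document.new(sample)

# Same call XML.load_elements makes on each document: iterate the
# elements matching the XPath, then read an attribute value by name.
names = []
doc.elements.each('userlist/user') { |e| names << e.attributes['first_name'] }

p names # => ["Ada", "Alan"]
```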
.strip_all(xml_text) ⇒ Object
Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that the XML parser can handle a dirty document.
Example
s="<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>"
XML.strip_all(s) => "<foo>HelloWorld</foo>"
This method calls these in order:
- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes
# File 'lib/webget_ramp/xml.rb', line 85
def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
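The order of the four calls matters. As an illustration, here is a hypothetical StripSketch module (not the library's module) that copies the regular expressions from the strip_* listings in this file and chains them in the same order.

```ruby
# Hypothetical module for illustration; the patterns are copied from the
# XML.strip_* source listings in this document.
module StripSketch
  module_function

  def strip_unprintables(text)
    text.gsub(/[^[:print:]]/, '')
  end

  def strip_microsoft(text)
    text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end

  def strip_comments(text)
    text.gsub(/<!.*?>/im, '')
  end

  def strip_attributes(text)
    text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end

  # Same order as XML.strip_all: unprintables, Microsoft blocks, comments, attributes.
  def strip_all(text)
    [:strip_unprintables, :strip_microsoft, :strip_comments, :strip_attributes]
      .reduce(text) { |acc, step| send(step, acc) }
  end
end

dirty = "<foo a=b c=d>\n<!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>"
p StripSketch.strip_all(dirty) # => "<foo>HelloWorld</foo>"

# Running strip_comments on its own removes only the two markers:
msft = "Hello<!-[if bar]>Microsoft<![endif]>World"
p StripSketch.strip_comments(msft) # => "HelloMicrosoftWorld"
```

This is why strip_microsoft runs before strip_comments: the general <!.*?> pattern would delete the conditional markers and leave the Microsoft-only content behind.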
.strip_attributes(xml_text) ⇒ Object
Strip out all attributes from the XML text's tags. Self-closing tags lose their slash, so <br/> becomes <br>.
Example
s="<foo a=b c=d e=f>Hello</foo>"
XML.strip_attributes(s) => "<foo>Hello</foo>"
# File 'lib/webget_ramp/xml.rb', line 96
def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"} # delete attributes
end
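A quick check of the attribute-stripping pattern on a few edge cases. This sketch copies the regular expression from the listing above into a standalone method; the inputs are illustrative. Because .*? stops at the first >, a > inside a quoted attribute value ends the match early.

```ruby
# Pattern copied from XML.strip_attributes: keep "<" + optional "/" + tag name,
# drop everything up to the first ">".
def strip_attributes(text)
  text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
end

p strip_attributes('<foo a=b c=d e=f>Hello</foo>') # => "<foo>Hello</foo>"
p strip_attributes('<br/>')                        # => "<br>"
p strip_attributes('<a title="x>y">z</a>')         # => "<a>y\">z</a>"
```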
.strip_comments(xml_text) ⇒ Object
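Strip out all comments from the XML text. The pattern removes every <!...> construct up to the first >, so DOCTYPE declarations are removed too, and a comment that contains > is only partly removed.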
Example
s="Hello<!--comment-->World"
XML.strip_comments(s) => "HelloWorld"
# File 'lib/webget_ramp/xml.rb', line 108
def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')
end
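The pattern <!.*?> matches any markup beginning with <!, not only comments. The sketch below copies it and contrasts it with a hypothetical comment-only variant, strip_comments_only, which is not part of webget_ramp.

```ruby
# Pattern copied from XML.strip_comments: anything from "<!" to the first ">".
def strip_comments(text)
  text.gsub(/<!.*?>/im, '')
end

# Hypothetical stricter variant: only <!-- ... --> comments, and a ">"
# inside the comment body does not end the match.
def strip_comments_only(text)
  text.gsub(/<!--.*?-->/m, '')
end

doc = '<!DOCTYPE note><note>a<!-- 1 > 0 -->b</note>'
p strip_comments(doc)      # => "<note>a 0 -->b</note>"
p strip_comments_only(doc) # => "<!DOCTYPE note><note>ab</note>"
```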
.strip_microsoft(xml_text) ⇒ Object
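Strip out Microsoft conditional-comment blocks, such as <!--[if IE]> ... <![endif]-->, including everything between the opening and closing markers.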
Example
s="Hello<!-[if foo]>Microsoft<![endif]->World"
XML.strip_microsoft(s) => "HelloWorld"
# File 'lib/webget_ramp/xml.rb', line 119
def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
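A standalone check of the conditional-comment pattern, copied from the listing above. The /m flag lets .*? span line breaks, so a multi-line block is removed in one match; the sample HTML is illustrative.

```ruby
# Pattern copied from XML.strip_microsoft: "<!" + any dashes + "[if", then
# lazily everything up to "<![endif]" + any dashes + ">".
def strip_microsoft(text)
  text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
end

html = "Hello<!--[if IE]>\n<p>IE only</p>\n<![endif]-->World"
p strip_microsoft(html)                                         # => "HelloWorld"
p strip_microsoft("Hello<!-[if foo]>Microsoft<![endif]->World") # => "HelloWorld"
```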
.strip_unprintables(xml_text) ⇒ Object
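Strip out all unprintable characters from the input string. Because [:print:] excludes control characters, tabs, newlines, and carriage returns are removed as well.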
Example
s="Hello\x07World" # \x07 (BEL) is unprintable
XML.strip_unprintables(s) => "HelloWorld"
# File 'lib/webget_ramp/xml.rb', line 130
def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
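Since [:print:] covers visible characters plus the space, this method also flattens line structure. The sketch copies the pattern and adds a hypothetical variant, strip_unprintables_keep_whitespace (not part of webget_ramp), that keeps tab, newline, and carriage return.

```ruby
# Pattern copied from XML.strip_unprintables.
def strip_unprintables(text)
  text.gsub(/[^[:print:]]/, '')
end

# Hypothetical variant: also exempt \t, \n and \r from removal.
def strip_unprintables_keep_whitespace(text)
  text.gsub(/[^[:print:]\t\n\r]/, '')
end

s = "Hello\x07World\n\tindented"
p strip_unprintables(s)                 # => "HelloWorldindented"
p strip_unprintables_keep_whitespace(s) # => "HelloWorld\n\tindented"
p strip_unprintables("a b")             # => "a b"
```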