Module: XML

Defined in:
lib/webget_ramp/xml.rb

Class Method Details

.load_attributes(dirpath, xpath) ⇒ Object

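Sugar to load attributes from a file: for every element matching xpath in the files matching dirpath, the element's attributes are yielded to the block.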

Example

XML.load_attributes('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }


# File 'lib/webget_ramp/xml.rb', line 52

def XML.load_attributes(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|e|
    yield e.attributes
  }
end

.load_attributes_hash(dirpath, xpath) ⇒ Object

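Sugar to load an attributes hash from a file: for every element matching xpath in the files matching dirpath, the result of calling to_hash on the element's attributes is yielded to the block.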

Example

XML.load_attributes_hash('config.xml','userlist/user'){|attributes| pp attributes['first_name'] }


# File 'lib/webget_ramp/xml.rb', line 63

def XML.load_attributes_hash(dirpath,xpath)
  XML.load_elements(dirpath,xpath){|e|
    yield e.attributes.to_hash
  }
end

.load_dir(*dirpaths) ⇒ Object

Specify one or more glob patterns. Each matching file is opened in sorted filename order, parsed as a REXML::Document, and passed to the block.

See [Dir#glob](www.ruby-doc.org/core/classes/Dir.html#M002347) for pattern details.

Example

XML.load_dir('/tmp/*.xml'){|xml_document|
  #...whatever you want to do with each xml document
}

Example to load XML documents from files beginning with “foo” or “bar”

XML.load_dir('/tmp/foo*.xml','/tmp/bar*.xml'){|xml_document|
  #...whatever you want to do with the xml document
}


# File 'lib/webget_ramp/xml.rb', line 20

def XML.load_dir(*dirpaths)
  dirpaths=[*dirpaths.flatten]
  dirpaths.each do |dirpath|
    Dir[dirpath].sort.each do |filename|
      File.open(filename) do |file|
        doc = REXML::Document.new file
        yield doc
      end #file
    end #dir
  end #each
end

.load_elements(dirpath, xpath) ⇒ Object

Sugar to load elements from a file: every element matching xpath in the files matching dirpath is yielded to the block.

Example

XML.load_elements('config.xml','userlist/user'){|element| pp element.attributes['first_name'] }


# File 'lib/webget_ramp/xml.rb', line 38

def XML.load_elements(dirpath,xpath)
  XML.load_dir(dirpath){|doc|
    doc.elements.each(xpath){|e|
      yield e
    }
  }
end
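
The chain from load_dir to load_elements can be tried end to end with a throwaway directory. This is a minimal sketch, not the library itself: `XMLSketch` is a hypothetical stand-in whose two methods follow the listings above, and the file names and user data are made up for illustration. It assumes the rexml gem is available.

```ruby
require 'rexml/document'
require 'tmpdir'

# Hypothetical stand-in for the documented module, so the snippet runs
# without the library installed. The logic follows the listings above.
module XMLSketch
  # Parse every file matching the glob patterns, in sorted filename order.
  def self.load_dir(*dirpaths)
    dirpaths.flatten.each do |dirpath|
      Dir[dirpath].sort.each do |filename|
        File.open(filename) { |file| yield REXML::Document.new(file) }
      end
    end
  end

  # Yield each element matching xpath in each parsed document.
  def self.load_elements(dirpath, xpath)
    load_dir(dirpath) { |doc| doc.elements.each(xpath) { |e| yield e } }
  end
end

first_names = []
Dir.mktmpdir do |dir|
  # Written out of order on purpose: load_dir sorts the filenames.
  File.write(File.join(dir, 'b.xml'), '<userlist><user first_name="Bob"/></userlist>')
  File.write(File.join(dir, 'a.xml'),
             '<userlist><user first_name="Alice"/><user first_name="Amy"/></userlist>')

  XMLSketch.load_elements(File.join(dir, '*.xml'), 'userlist/user') do |element|
    first_names << element.attributes['first_name']
  end
end

p first_names
```

Because the filenames are sorted before parsing, the elements of a.xml are yielded before those of b.xml regardless of the order in which the files were created.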

.strip_all(xml_text) ⇒ Object

Sanitize dirty XML by removing unprintable characters, Microsoft conditional blocks, comments, and tag attributes, so that the XML parser has a better chance of handling a dirty document.

Example

s="<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>"
XML.strip_all(s) => "<foo>HelloWorld</foo>"

This method calls these in order:

- XML.strip_unprintables
- XML.strip_microsoft
- XML.strip_comments
- XML.strip_attributes


# File 'lib/webget_ramp/xml.rb', line 85

def XML.strip_all(xml_text)
  return XML.strip_attributes(XML.strip_comments(XML.strip_microsoft(XML.strip_unprintables(xml_text))))
end
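
The call order listed above matters: if comments were stripped before the Microsoft blocks, the comment pattern would consume the opening and closing markers of a conditional block and leave its content behind. A minimal sketch of that effect, assuming a hypothetical `StripSketch` module whose method bodies are copied from the listings on this page so the snippet runs on its own:

```ruby
# Hypothetical stand-in module; the regexes are copied from this page.
module StripSketch
  def self.strip_unprintables(text)
    text.gsub(/[^[:print:]]/, "")
  end

  def self.strip_microsoft(text)
    text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
  end

  def self.strip_comments(text)
    text.gsub(/<!.*?>/im, '')
  end

  def self.strip_attributes(text)
    text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
  end
end

dirty = '<foo a=b c=d><!--comment-->Hello<!-[if bar]>Microsoft<![endif]>World</foo>'

# Documented order: unprintables, microsoft, comments, attributes.
documented_order = %i[strip_unprintables strip_microsoft strip_comments strip_attributes]
clean = documented_order.reduce(dirty) { |text, step| StripSketch.public_send(step, text) }

# Swapped order: comments run before microsoft.
swapped_order = %i[strip_unprintables strip_comments strip_microsoft strip_attributes]
leaky = swapped_order.reduce(dirty) { |text, step| StripSketch.public_send(step, text) }

puts clean  # <foo>HelloWorld</foo>
puts leaky  # <foo>HelloMicrosoftWorld</foo>
```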

.strip_attributes(xml_text) ⇒ Object

Strip out all attributes from the xml text’s tags.

Example

s="<foo a=b c=d e=f>Hello</foo>"
XML.strip_attributes(s) => "<foo>Hello</foo>"


# File 'lib/webget_ramp/xml.rb', line 96

def XML.strip_attributes(xml_text)
  return xml_text.gsub(/<(\/?\w+).*?>/im){"<#{$1}>"}  # delete attributes
end
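
The pattern keeps only the tag name (with a leading slash for closing tags) and drops everything else up to the next `>`. One consequence worth knowing is that the trailing slash of a self-closing tag is dropped as well. A minimal sketch, using a hypothetical helper `strip_attributes_sketch` that applies the same regex as the listing above:

```ruby
# Hypothetical helper; the regex is copied from the listing above.
def strip_attributes_sketch(xml_text)
  xml_text.gsub(/<(\/?\w+).*?>/im) { "<#{$1}>" }
end

# Opening tags lose their attributes, closing tags pass through unchanged.
plain = strip_attributes_sketch('<p class="x">Hi</p>')

# A self-closing tag loses its trailing slash along with any attributes.
self_closing = strip_attributes_sketch('<p>Hi<br/></p>')

puts plain         # <p>Hi</p>
puts self_closing  # <p>Hi<br></p>
```

If the result is fed to a strict XML parser, a document containing self-closing tags may therefore stop being well-formed after this step.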

.strip_comments(xml_text) ⇒ Object

Strip out all comments from the xml text.

Example

s="Hello<!--comment-->World"
XML.strip_comments(s) => "HelloWorld"


# File 'lib/webget_ramp/xml.rb', line 108

def XML.strip_comments(xml_text)
  return xml_text.gsub(/<!.*?>/im,'')
end
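
The pattern matches from `<!` to the next `>`, which has two side effects: other `<!...>` constructs such as a DOCTYPE declaration are removed too, and a comment whose text contains `>` is only removed up to that character. A minimal sketch, using a hypothetical helper `strip_comments_sketch` that applies the same regex as the listing above:

```ruby
# Hypothetical helper; the regex is copied from the listing above.
def strip_comments_sketch(xml_text)
  xml_text.gsub(/<!.*?>/im, '')
end

# An ordinary comment is removed completely.
simple = strip_comments_sketch('Hello<!--comment-->World')

# A DOCTYPE declaration also starts with "<!", so it is removed as well.
doctype = strip_comments_sketch('<!DOCTYPE html><p>Hi</p>')

# The lazy match stops at the first ">", leaving the rest of the comment.
partial = strip_comments_sketch('A<!-- x > y -->B')

puts simple   # HelloWorld
puts doctype  # <p>Hi</p>
puts partial  # A y -->B
```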

.strip_microsoft(xml_text) ⇒ Object

Strip out all Microsoft proprietary conditional blocks, including the content between the [if ...] and [endif] markers.

Example

s="Hello<!-[if foo]>Microsoft<![endif]->World"
XML.strip_microsoft(s) => "HelloWorld"


# File 'lib/webget_ramp/xml.rb', line 119

def XML.strip_microsoft(xml_text)
  return xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im,'')
end
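
Because the pattern allows any number of dashes after `<!` and before the final `>`, and the `m` flag lets `.` match newlines, it also covers the double-dash form and blocks that span several lines. A minimal sketch, using a hypothetical helper `strip_microsoft_sketch` that applies the same regex as the listing above (the sample markup is made up for illustration):

```ruby
# Hypothetical helper; the regex is copied from the listing above.
def strip_microsoft_sketch(xml_text)
  xml_text.gsub(/<!-*\[if\b.*?<!\[endif\]-*>/im, '')
end

# Double-dash form spanning several lines.
multi_line = strip_microsoft_sketch(
  "Hello<!--[if gte mso 9]>\n<xml>Office</xml>\n<![endif]-->World"
)

# Two separate blocks: the lazy match keeps the text between them.
two_blocks = strip_microsoft_sketch(
  'A<!--[if a]>x<![endif]-->B<!--[if b]>y<![endif]-->C'
)

puts multi_line  # HelloWorld
puts two_blocks  # ABC
```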

.strip_unprintables(xml_text) ⇒ Object

Strip out all unprintable characters from the input string.

Example

s="Hello\XXXWorld" # where XXX is unprintable
XML.strip_unprintables(s) => "HelloWorld"


# File 'lib/webget_ramp/xml.rb', line 130

def XML.strip_unprintables(xml_text)
  return xml_text.gsub(/[^[:print:]]/, "")
end
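
Note that the `[:print:]` class does not include tabs or newlines, so this method also joins lines together, while ordinary spaces are kept. A minimal sketch, using a hypothetical helper `strip_unprintables_sketch` that applies the same regex as the listing above:

```ruby
# Hypothetical helper; the regex is copied from the listing above.
def strip_unprintables_sketch(xml_text)
  xml_text.gsub(/[^[:print:]]/, "")
end

# A NUL byte is unprintable and is removed.
no_nul = strip_unprintables_sketch("Hello\x00World")

# Newlines and tabs are removed too; the spaces inside each line survive.
joined = strip_unprintables_sketch("line one\n\tline two")

puts no_nul  # HelloWorld
puts joined  # line oneline two
```

If line structure matters for later processing, it may be worth splitting the text into lines before stripping and rejoining afterwards.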