Class: Scrubyt::XPathUtils
Inherits: Object
Inheritance: Object > Scrubyt::XPathUtils
Defined in: lib/scrubyt/utils/xpathutils.rb
Overview
Various XPath utility functions
Class Method Summary
-
.find_image(doc, example, index = 0) ⇒ Object
Find an image based on the src of the example.
-
.find_nearest_node_with_attribute(node, attribute) ⇒ Object
Used when automatically looking up href attributes (for detail or next links). If the detail pattern did not extract a link, we first look at its children and, if no link is found there, traverse up.
-
.generate_generalized_relative_XPath(elem, relative_root) ⇒ Object
Generate a generalized XPath (i.e. without indices) of the node, relative to the given relative_root.
-
.generate_relative_XPath(elem, relative_root) ⇒ Object
Generate an XPath of the node with indices, relative to the given relative_root.
-
.generate_relative_XPath_from_XPaths(parent_xpath, child_xpath) ⇒ Object
Generate a relative XPath from two XPaths: a parent one (which points higher in the tree) and a child one.
-
.generate_XPath(node, stopnode = nil, write_indices = false) ⇒ Object
Generate XPath for the given node.
-
.lowest_common_ancestor(node1, node2) ⇒ Object
Find the LCA (Lowest Common Ancestor) of two nodes.
- .to_full_XPath(doc, xpath, generalize) ⇒ Object
-
.traverse_up_until_name(node, name) ⇒ Object
Used to find the nearest ancestor of a node with the given name, for example the <form> node that encloses an <input> node.
Class Method Details
.find_image(doc, example, index = 0) ⇒ Object
Find an image based on the src of the example
parameters
doc - The containing document
example - The value of the src attribute of the img tag. This is convenient: if the user right-clicks an image and copies the image location, this string is copied to the clipboard and can easily be pasted as an example.
index - There might be several images with the same src on the page. Most typically the user will need the 0th, but if this is not the case, the index can be overridden, either through this parameter or by appending [n] to the example (e.g. logo.png[2]).
    # File 'lib/scrubyt/utils/xpathutils.rb', line 96
    def self.find_image(doc, example, index=0)
      if example =~ /\.(jpg|png|gif|jpeg)(\[\d+\])$/
        res = example.scan(/(.+)\[(\d+)\]$/)
        example = res[0][0]
        index = res[0][1].to_i
      end
      (doc/"//img[@src='#{example}']")[index]
    end
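The index handling in find_image can be tried without Hpricot. The following is a minimal sketch that isolates only the string parsing; split_image_example is a hypothetical helper name and is not part of the library.

```ruby
# Hypothetical helper (not part of scRUBYt!): split an example such as
# "logo.png[2]" into the src string and the requested index, using the
# same regular expressions as find_image.
def split_image_example(example, index = 0)
  if example =~ /\.(jpg|png|gif|jpeg)(\[\d+\])$/
    src, idx = example.scan(/(.+)\[(\d+)\]$/).first
    return [src, idx.to_i]
  end
  # No [n] suffix: keep the example and the index that was passed in.
  [example, index]
end

p split_image_example("http://example.com/logo.png[2]")
p split_image_example("http://example.com/logo.png")
```

An explicit [n] suffix takes precedence over the index argument, which matches the order of operations in the method above.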
.find_nearest_node_with_attribute(node, attribute) ⇒ Object
Used when automatically looking up href attributes (for detail or next links). If the detail pattern did not extract a link, we first look at its children and, if no link is found there, traverse up.
    # File 'lib/scrubyt/utils/xpathutils.rb', line 122
    def self.find_nearest_node_with_attribute(node, attribute)
      @node = nil
      return node if node.is_a? Hpricot::Elem and node[attribute]
      first_child_node_with_attribute(node, attribute)
      first_parent_node_with_attribute(node, attribute) if !@node
      @node
    end
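The search order described above (the node itself, then its descendants, then its ancestors) can be sketched on a toy tree. The helpers first_child_node_with_attribute and first_parent_node_with_attribute are not shown on this page, so the depth-first descent and the upward walk below are assumptions about their behavior; AttrNode and both function names are hypothetical.

```ruby
# Toy node type standing in for Hpricot::Elem (an assumption for this sketch).
class AttrNode
  attr_reader :name, :attrs, :children
  attr_accessor :parent

  def initialize(name, attrs = {}, children = [])
    @name, @attrs, @children = name, attrs, children
    children.each { |child| child.parent = self }
  end

  # Attribute lookup, mirroring node[attribute] in the library code.
  def [](key)
    @attrs[key]
  end
end

# Depth-first search for the first descendant carrying the attribute.
def first_descendant_with(node, attribute)
  node.children.each do |child|
    return child if child[attribute]
    hit = first_descendant_with(child, attribute)
    return hit if hit
  end
  nil
end

# The node itself, then its descendants, then its ancestors.
def nearest_with_attribute(node, attribute)
  return node if node[attribute]
  found = first_descendant_with(node, attribute)
  return found if found
  ancestor = node.parent
  ancestor = ancestor.parent while ancestor && !ancestor[attribute]
  ancestor
end

link = AttrNode.new("a", { "href" => "/detail/1" })
cell = AttrNode.new("td", {}, [link])
span = AttrNode.new("span")
AttrNode.new("a", { "href" => "/next" }, [span])

puts nearest_with_attribute(cell, "href")["href"]  # found among the children
puts nearest_with_attribute(span, "href")["href"]  # found by traversing up
```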
.generate_generalized_relative_XPath(elem, relative_root) ⇒ Object
    # File 'lib/scrubyt/utils/xpathutils.rb', line 77
    def self.generate_generalized_relative_XPath( elem,relative_root )
      return nil if (elem == relative_root)
      generate_XPath(elem, relative_root, false)
    end
.generate_relative_XPath(elem, relative_root) ⇒ Object
Generate an XPath of the node with indices, relative to the given relative_root.
For example, if the elem's absolute XPath is /a/b/c and the relative root's XPath is /a/b, the result of the function will be /c, with an index predicate attached to each step (indices are omitted here for brevity).
    # File 'lib/scrubyt/utils/xpathutils.rb', line 66
    def self.generate_relative_XPath( elem,relative_root )
      return nil if (elem == relative_root)
      generate_XPath(elem, relative_root, true)
    end
.generate_relative_XPath_from_XPaths(parent_xpath, child_xpath) ⇒ Object
Generate a relative XPath from two XPaths: a parent one (which points higher in the tree) and a child one. The result of the method is the XPath of the node pointed to by the second XPath, relative to the node pointed to by the first XPath.
    # File 'lib/scrubyt/utils/xpathutils.rb', line 134
    def self.generate_relative_XPath_from_XPaths(parent_xpath, child_xpath)
      original_child_xpath_parts = child_xpath.split('/').reject{|s|s==""}
      pairs = to_general_XPath(child_xpath).split('/').reject{|s|s==""}.zip to_general_XPath(parent_xpath).split('/').reject{|s|s==""}
      i = 0
      pairs.each_with_index do |pair,index|
        i = index
        break if pair[0] != pair[1]
      end
      "/" + original_child_xpath_parts[i..-1].join('/')
    end
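Because this method works on strings only, its logic can be illustrated without a parsed document. The sketch below follows the same steps; to_general_XPath is not shown on this page, so strip_indices (a hypothetical name) assumes it simply removes index predicates such as [3].

```ruby
# Assumed stand-in for to_general_XPath: drop index predicates such as [3].
def strip_indices(xpath)
  xpath.gsub(/\[\d+\]/, "")
end

# Hypothetical re-statement of the method above, for illustration only.
def relative_xpath_from_xpaths(parent_xpath, child_xpath)
  child_parts    = child_xpath.split("/").reject(&:empty?)
  general_child  = strip_indices(child_xpath).split("/").reject(&:empty?)
  general_parent = strip_indices(parent_xpath).split("/").reject(&:empty?)

  # Walk both generalized paths in parallel and remember the first
  # position at which they differ (or at which the parent path runs out).
  i = 0
  general_child.zip(general_parent).each_with_index do |(child_step, parent_step), index|
    i = index
    break if child_step != parent_step
  end

  # Keep the original child steps, indices included, from that position on.
  "/" + child_parts[i..-1].join("/")
end

puts relative_xpath_from_xpaths("/html/body/div", "/html/body/div[2]/a[1]")
```

The comparison is done on the generalized paths so that differing indices on shared steps do not count as a mismatch, while the returned tail keeps the child's own indices.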
.generate_XPath(node, stopnode = nil, write_indices = false) ⇒ Object
Generate XPath for the given node
parameters
node - The node we are looking up the XPath for
stopnode - XPath generation stops when this node is reached, and the XPath generated so far is returned. If a stopnode is given but never reached, nil is returned.
write_indices - whether each step should carry its index inside the parent, i.e. name[index] rather than the bare names in /html/body/table/tr/td
    # File 'lib/scrubyt/utils/xpathutils.rb', line 35
    def self.generate_XPath(node, stopnode=nil, write_indices=false)
      path = []
      indices = []
      found = false
      while !node.nil? && node.class != Hpricot::Doc do
        if node == stopnode
          found = true
          break
        end
        path.push node.name
        indices.push find_index(node) if write_indices
        node = node.parent
      end
      #This condition ensures that if there is a stopnode, and we did not found it along the way,
      #we return nil (since the stopnode is not contained in the path at all)
      return nil if stopnode != nil && !found
      result = ""
      if write_indices
        path.reverse.zip(indices.reverse).each { |node,index| result += "#{node}[#{index}]/" }
      else
        path.reverse.each{ |node| result += "#{node}/" }
      end
      "/" + result.chop
    end
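The effect of stopnode and write_indices can be seen on a small hand-built tree. This is a sketch under stated assumptions, not the library's code: PathNode stands in for Hpricot elements, and sibling_index assumes that find_index (not shown on this page) returns the 1-based position among same-named siblings.

```ruby
# Toy node type standing in for Hpricot elements (an assumption).
class PathNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name, @children = name, children
    children.each { |child| child.parent = self }
  end
end

# Assumed stand-in for find_index: 1-based position among same-named siblings.
def sibling_index(node)
  return 1 unless node.parent
  node.parent.children.select { |c| c.name == node.name }.index(node) + 1
end

# Walk from the node towards the root, collecting one step per ancestor.
def xpath_for(node, stopnode = nil, write_indices = false)
  steps = []
  found = false
  while node
    if node.equal?(stopnode)
      found = true
      break
    end
    steps.unshift(write_indices ? "#{node.name}[#{sibling_index(node)}]" : node.name)
    node = node.parent
  end
  # A stopnode that is not an ancestor of the node yields nil.
  return nil if stopnode && !found
  "/" + steps.join("/")
end

td1   = PathNode.new("td")
td2   = PathNode.new("td")
tr    = PathNode.new("tr", [td1, td2])
table = PathNode.new("table", [tr])
PathNode.new("html", [PathNode.new("body", [table])])

puts xpath_for(td2)               # generalized absolute path
puts xpath_for(td2, nil, true)    # absolute path with indices
puts xpath_for(td2, table, true)  # path relative to the table node
```

Passing a stopnode is what generate_relative_XPath and generate_generalized_relative_XPath do; they differ only in the write_indices flag.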
.lowest_common_ancestor(node1, node2) ⇒ Object
Find the LCA (Lowest Common Ancestor) of two nodes
    # File 'lib/scrubyt/utils/xpathutils.rb', line 10
    def self.lowest_common_ancestor(node1, node2)
      path1 = traverse_up(node1)
      path2 = traverse_up(node2)
      return node1.parent if path1 == path2

      closure = nil
      while (!path1.empty? && !path2.empty?)
        closure = path1.pop
        return closure.parent if (closure != path2.pop)
      end
      path1.size > path2.size ? path1.last.parent : path2.last.parent
    end
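The idea behind the LCA lookup is to compare the two ancestor chains starting at the root and stop at the first point where they diverge. The sketch below is a simplified variant of that idea on a toy tree, not the library's implementation: TreeNode, path_from_root and lca are hypothetical names, and traverse_up is not shown on this page.

```ruby
# Toy node type with only a name and a parent link (an assumption).
class TreeNode
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name, @parent = name, parent
  end
end

# Ancestor chain ordered from the root down to the node itself.
def path_from_root(node)
  path = []
  while node
    path.unshift(node)
    node = node.parent
  end
  path
end

# Walk both chains in parallel; the last node they share is the LCA.
def lca(node1, node2)
  common = nil
  path_from_root(node1).zip(path_from_root(node2)).each do |a, b|
    break unless a.equal?(b)
    common = a
  end
  common
end

body = TreeNode.new("body")
div  = TreeNode.new("div", body)
p1   = TreeNode.new("p", div)
p2   = TreeNode.new("p", div)
link = TreeNode.new("a", p1)
span = TreeNode.new("span", body)

puts lca(link, p2).name
puts lca(link, span).name
```

One difference to note: when both arguments are the same node, this variant returns that node, whereas the method above returns its parent.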
.to_full_XPath(doc, xpath, generalize) ⇒ Object
    # File 'lib/scrubyt/utils/xpathutils.rb', line 145
    def self.to_full_XPath(doc, xpath, generalize)
      elem = doc/xpath
      elem = elem.map[0] if elem.is_a? Hpricot::Elements
      XPathUtils.generate_XPath(elem, nil, generalize)
    end
.traverse_up_until_name(node, name) ⇒ Object
Used to find the nearest ancestor of a node with the given name, for example the <form> node that encloses an <input> node. If the node itself has that name, it is returned.
    # File 'lib/scrubyt/utils/xpathutils.rb', line 108
    def self.traverse_up_until_name(node, name)
      while node.class != Hpricot::Doc do
        #raise "The element is nil! This probably means the widget with the specified name ('#{name}') does not exist"
        return nil unless node
        break if node.name == name
        node = node.parent
      end
      node
    end
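The <form>/<input> case mentioned above can be sketched on a toy tree. NamedNode and ancestor_named are hypothetical names; unlike the method above, this sketch has no document node, so it returns nil when no matching ancestor exists.

```ruby
# Toy node type with only a name and a parent link (an assumption).
class NamedNode
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name, @parent = name, parent
  end
end

# Climb the parent links until a node with the requested name is found.
def ancestor_named(node, name)
  node = node.parent while node && node.name != name
  node
end

form  = NamedNode.new("form")
div   = NamedNode.new("div", form)
input = NamedNode.new("input", div)

puts ancestor_named(input, "form").name
p ancestor_named(input, "table")
```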