Module: HTree::Container::Trav

Includes:
Traverse
Included in:
Doc::Trav, Elem::Trav
Defined in:
lib/htree/traverse.rb,
lib/htree/modules.rb

Instance Method Summary

Methods included from Traverse

#bogusetag?, #comment?, #doc?, #doctype?, #elem?, #get_subnode, #procins?, #text?, #traverse_text, #xmldecl?

Instance Method Details

#each_child(&block) ⇒ Object

each_child iterates over each child.



# File 'lib/htree/traverse.rb', line 29

def each_child(&block) # :yields: child_node
  children.each(&block)
  nil
end

#each_child_with_index(&block) ⇒ Object

each_child_with_index iterates over each child, yielding the child node and its index.



# File 'lib/htree/traverse.rb', line 35

def each_child_with_index(&block) # :yields: child_node, index
  children.each_with_index(&block)
  nil
end

#each_hyperlink ⇒ Object

each_hyperlink traverses hyperlinks, such as the HTML href attribute of an A element.

It yields HTree::Text or HTree::Loc.

Note that each_hyperlink also yields the href attribute of the HTML BASE element.



# File 'lib/htree/traverse.rb', line 161

def each_hyperlink # :yields: text
  links = []
  each_hyperlink_attribute {|elem, attr, hyperlink|
    yield hyperlink
  }
end

#each_hyperlink_uri(base_uri = nil) ⇒ Object

each_hyperlink_uri traverses hyperlinks, such as the HTML href attribute of an A element.

It yields HTree::Text (or HTree::Loc) and a URI for each hyperlink.

The URI objects are created relative to a base URI, which is given by the HTML BASE element or by the base_uri argument. each_hyperlink_uri does not yield the href of the BASE element.



# File 'lib/htree/traverse.rb', line 138

def each_hyperlink_uri(base_uri=nil) # :yields: hyperlink, uri
  base_uri = URI.parse(base_uri) if String === base_uri
  links = []
  each_hyperlink_attribute {|elem, attr, hyperlink|
    if %r{\{http://www.w3.org/1999/xhtml\}(?:base)\z}i =~ elem.name
      base_uri = URI.parse(hyperlink.to_s)
    else
      links << hyperlink
    end
  }
  if base_uri
    links.each {|hyperlink| yield hyperlink, base_uri + hyperlink.to_s }
  else
    links.each {|hyperlink| yield hyperlink, URI.parse(hyperlink.to_s) }
  end
end
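
The base-URI resolution step can be sketched with Ruby's standard uri library alone. The helper name resolve_links and the example URLs are hypothetical and not part of HTree; the sketch mirrors only the final branch of the method above, where each collected link is either merged into the base or parsed on its own.

```ruby
require "uri"

# Hypothetical helper (not part of HTree): resolve each href against an
# optional base URI, as the last branch of each_hyperlink_uri does.
def resolve_links(hrefs, base_uri = nil)
  base_uri = URI.parse(base_uri) if String === base_uri
  hrefs.map do |href|
    # URI#+ merges a (possibly relative) reference into the base URI.
    base_uri ? base_uri + href : URI.parse(href)
  end
end

resolved = resolve_links(["c.html", "/top.html", "http://other.example/x"],
                         "http://example.com/a/b.html")
resolved.each { |uri| puts uri }
# http://example.com/a/c.html
# http://example.com/top.html
# http://other.example/x
```

Without a base URI, relative hrefs stay relative: resolve_links(["c.html"]) returns a URI whose string form is still "c.html".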

#each_uri(base_uri = nil) ⇒ Object

each_uri traverses hyperlinks, such as the HTML href attribute of an A element.

It yields a URI for each hyperlink.

The URI objects are created relative to a base URI, which is given by the HTML BASE element or by the base_uri argument.



# File 'lib/htree/traverse.rb', line 175

def each_uri(base_uri=nil) # :yields: URI
  each_hyperlink_uri(base_uri) {|hyperlink, uri| yield uri }
end

#filter(&block) ⇒ Object

filter rebuilds the tree without some components.

node.filter {|descendant_node| predicate } -> node
loc.filter {|descendant_loc| predicate } -> node

filter yields each node except the top node. If the given block returns false, the corresponding node is dropped. If the given block returns true, the corresponding node is retained and its inner nodes are examined.

filter returns a node. It does not return a location object even if self is a location object.



# File 'lib/htree/traverse.rb', line 259

def filter(&block)
  subst = {}
  each_child_with_index {|descendant, i|
    if yield descendant
      if descendant.elem?
        subst[i] = descendant.filter(&block)
      else
        subst[i] = descendant
      end
    else
      subst[i] = nil
    end
  }
  to_node.subst_subnode(subst)
end
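
The drop-or-recurse behaviour can be illustrated without HTree. SketchNode below is a hypothetical stand-in class, not an HTree type, and it rebuilds a child list directly instead of calling subst_subnode; it is meant only to show which nodes the block sees and which survive.

```ruby
# Stand-in node for illustration only; not an HTree class.
# A node with children == nil plays the role of a non-element (e.g. text).
class SketchNode
  attr_reader :name, :children

  def initialize(name, children = nil)
    @name = name
    @children = children
  end

  def elem?
    !children.nil?
  end

  # Rebuild this node, keeping only children for which the block is true.
  def filter(&block)
    kept = []
    children.each do |child|
      next unless block.call(child)   # false: drop child and its whole subtree
      kept << (child.elem? ? child.filter(&block) : child)
    end
    SketchNode.new(name, kept)
  end

  # Nested-array view of the tree, for easy inspection.
  def outline
    elem? ? [name, *children.map(&:outline)] : name
  end
end

# Hypothetical sample tree: root > (a > (x, b > y), c)
def sketch_tree
  SketchNode.new("root", [
    SketchNode.new("a", [
      SketchNode.new("x"),
      SketchNode.new("b", [SketchNode.new("y")])
    ]),
    SketchNode.new("c", [])
  ])
end

seen = []
pruned = sketch_tree.filter { |n| seen << n.name; n.name != "b" }
p seen            # => ["a", "x", "b", "c"]
p pruned.outline  # => ["root", ["a", "x"], ["c"]]
```

The top node is never passed to the block, and "y" is never examined because its parent "b" was already dropped.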

#find_element(*names) ⇒ Object

find_element searches for an element whose universal name is specified by the arguments. It returns the first matching element, or nil if none is found.



# File 'lib/htree/traverse.rb', line 43

def find_element(*names)
  traverse_element(*names) {|e| return e }
  nil
end
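
The early exit relies on a plain Ruby idiom: a return inside a block returns from the enclosing method, so the traversal stops at the first match. The sketch below uses hypothetical stand-ins (each_match, first_match, and [name, id] pairs in place of elements) rather than HTree itself.

```ruby
# Hypothetical stand-in for traverse_element: yields every [name, id] pair
# whose name is in wanted, then returns nil.
def each_match(items, wanted)
  items.each { |item| yield item if wanted.include?(item.first) }
  nil
end

# Mirrors the shape of find_element: `return` inside the block exits
# first_match itself, so later items are never visited.
def first_match(items, *wanted)
  each_match(items, wanted) { |item| return item }
  nil
end

items = [["b", 1], ["a", 2], ["a", 3]]
p first_match(items, "a")   # => ["a", 2]
p first_match(items, "z")   # => nil
```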

#traverse_element(*names, &block) ⇒ Object

traverse_element traverses elements in the tree. It yields elements in depth-first order.

If names is empty, it yields all elements. If names is non-empty, it should be a list of universal names, and only elements with those names are yielded.

Nested elements are yielded in depth-first order, as follows.

t = HTree('<a id=0><b><a id=1 /></b><c id=2 /></a>') 
t.traverse_element("a", "c") {|e| p e}
# =>
{elem <a id="0"> {elem <b> {emptyelem <a id="1">} </b>} {emptyelem <c id="2">} </a>}
{emptyelem <a id="1">}
{emptyelem <c id="2">}

Universal names are specified as follows.

t = HTree(<<'End')
<html>
<meta name="robots" content="index,nofollow">
<meta name="author" content="Who am I?">    
</html>
End
t.traverse_element("{http://www.w3.org/1999/xhtml}meta") {|e| p e}
# =>
{emptyelem <{http://www.w3.org/1999/xhtml}meta name="robots" content="index,nofollow">}
{emptyelem <{http://www.w3.org/1999/xhtml}meta name="author" content="Who am I?">}


# File 'lib/htree/traverse.rb', line 76

def traverse_element(*names, &block) # :yields: element
  if names.empty?
    traverse_all_element(&block)
  else
    name_set = {}
    names.each {|n| name_set[n] = true }
    traverse_some_element(name_set, &block)
  end
  nil
end
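
The name-set lookup and the visiting order can be sketched on a stand-in tree. SketchElem, sketch_doc and traverse_sketch are hypothetical names, and the walk below only assumes what the example output above shows (a parent is yielded before its descendants); the real traverse_all_element and traverse_some_element are not reproduced here.

```ruby
# Stand-in element for illustration only; not an HTree class.
SketchElem = Struct.new(:name, :id, :children)

# Same shape as the first example above: <a id=0><b><a id=1/></b><c id=2/></a>
def sketch_doc
  SketchElem.new("a", 0, [
    SketchElem.new("b", nil, [SketchElem.new("a", 1, [])]),
    SketchElem.new("c", 2, [])
  ])
end

# Yield elements in depth-first (pre-order) order. With no names, every
# element is yielded; otherwise only those whose name is in the set.
def traverse_sketch(elem, *names, &block)
  name_set = {}
  names.each { |n| name_set[n] = true }
  walk = lambda do |e|
    block.call(e) if names.empty? || name_set[e.name]
    e.children.each { |child| walk.call(child) }
  end
  walk.call(elem)
  nil
end

traverse_sketch(sketch_doc, "a", "c") { |e| puts "#{e.name} id=#{e.id}" }
# a id=0
# a id=1
# c id=2
```

Building a hash of names once keeps each membership check constant-time, however many names are passed.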

#traverse_text_internal(&block) ⇒ Object



# File 'lib/htree/traverse.rb', line 228

def traverse_text_internal(&block)
  each_child {|c| c.traverse_text_internal(&block) }
end