Class: Webgen::SourceHandler::Fragment

Inherits:
Object
Includes:
Base, WebsiteAccess
Defined in:
lib/webgen/sourcehandler/fragment.rb

Overview

Handles page fragment nodes and provides utility methods for parsing HTML headers and generating fragment nodes from them.

Constant Summary

HTML_HEADER_REGEXP =
/<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
HTML_ATTR_REGEXP =
/\s*(\w+)\s*=\s*('|")(.+?)\2\s*/
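The following minimal sketch shows what the two patterns capture. It copies the constants into the top-level scope so that it runs without webgen; the sample HTML string is made up for illustration.

```ruby
# Local copies of the two constants documented above.
HTML_HEADER_REGEXP = /<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
HTML_ATTR_REGEXP = /\s*(\w+)\s*=\s*('|")(.+?)\2\s*/

# Illustrative input: one header with attributes, one without.
html = %(<h1 id="intro" class="top">Intro</h1><h2>No attributes</h2>)

# Each match yields [level, attribute string or nil, inner content].
headers = html.scan(HTML_HEADER_REGEXP)
# headers => [["1", "id=\"intro\" class=\"top\"", "Intro"],
#             ["2", nil, "No attributes"]]

# Each attribute match yields [name, quote character, value].
attrs = headers.first[1].scan(HTML_ATTR_REGEXP)
# attrs => [["id", "\"", "intro"], ["class", "\"", "top"]]
```

Because HTML_HEADER_REGEXP is not compiled with the m flag, the `.` in the content group does not match a newline, so a header whose content spans several lines is not matched.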

Instance Method Summary

Methods included from WebsiteAccess

included, website

Methods included from Base

#content, #create_node, #node_exists?, #output_path, #page_from_path, #parent_node

Methods included from Base::OutputPathHelpers

#standard_output_path

Methods included from Loggable

#log, #puts

Instance Method Details

#create_fragment_nodes(sections, parent, path, in_menu, si = 1000) ⇒ Object

Creates nested fragment nodes under parent from sections (as returned by parse_html_headers). path is the source path that defines the fragments, which is not the same as the creation path for parent. The in_menu meta information of each fragment node is set to the in_menu parameter, and the sort_info meta information is derived from the base value si.



# File 'lib/webgen/sourcehandler/fragment.rb', line 53

def create_fragment_nodes(sections, parent, path, in_menu, si = 1000)
  sections.each do |level, id, title, sub_sections|
    fragment_path = parent.alcn.sub(/#.*$/, '') + '#' + id
    node = website.blackboard.invoke(:create_nodes,
                                     Webgen::Path.new(fragment_path, path.source_path),
                                     self) do |cn_path|
      cn_path.meta_info['title'] = title
      cn_path.meta_info['in_menu'] = in_menu
      cn_path.meta_info['sort_info'] = si = si.succ
      create_node(cn_path, :parent => parent)
    end.first
    create_fragment_nodes(sub_sections, node, path, in_menu, si.succ)
  end
end
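The path and sort_info arithmetic of this method can be traced without a webgen website. The sketch below is not part of the library: plan_fragments is a hypothetical helper that only records what would be created, and it assumes that the alcn of each created node equals the fragment path passed to Webgen::Path.new.

```ruby
# Hypothetical stand-in for create_fragment_nodes: instead of creating nodes,
# it collects [fragment_path, title, sort_info] triples in creation order.
def plan_fragments(sections, parent_alcn, si = 1000, out = [])
  sections.each do |_level, id, title, sub_sections|
    # Strip any existing fragment part from the parent, then append the new id.
    fragment_path = parent_alcn.sub(/#.*$/, '') + '#' + id
    si = si.succ
    out << [fragment_path, title, si]
    # Assumption: the created node's alcn is fragment_path.
    plan_fragments(sub_sections, fragment_path, si.succ, out)
  end
  out
end

sections = [
  [1, 'intro', 'Intro', [[2, 'setup', 'Setup', [[3, 'deep', 'Deep', []]]]]],
  [1, 'usage', 'Usage', []]
]

plan = plan_fragments(sections, '/index.en.html')
# plan => [["/index.en.html#intro", "Intro", 1001],
#          ["/index.en.html#setup", "Setup", 1003],
#          ["/index.en.html#deep",  "Deep",  1005],
#          ["/index.en.html#usage", "Usage", 1002]]
```

Two things are visible in the trace: nested fragments are attached to the page path rather than to the parent's fragment (the existing #... part is stripped first), and the counter advanced inside a recursive call does not carry back to the siblings of the parent section.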

#parse_html_headers(content) ⇒ Object

Parses the string content for the headers h1 through h6 and returns the sections found, nested by header level.

Only headers that have an id attribute set are used. The method returns a list of arrays of the form [level, id, title, sub_sections], where sub_sections is again such a list.



# File 'lib/webgen/sourcehandler/fragment.rb', line 20

def parse_html_headers(content)
  sections = []
  stack = []
  content.scan(HTML_HEADER_REGEXP).each do |level,attrs,title|
    next if attrs.nil?
    id_attr = attrs.scan(HTML_ATTR_REGEXP).find {|name,sep,value| name == 'id'}
    next if id_attr.nil?
    id = id_attr[2]

    section = [level.to_i, id, title, []]
    success = false
    while !success
      if stack.empty?
        sections << section
        stack << section
        success = true
      elsif stack.last.first < section.first
        stack.last.last << section
        stack << section
        success = true
      else
        stack.pop
      end
    end
  end
  sections
end
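To see the returned structure, the sketch below restates the method in a standalone module so that it runs without webgen. FragmentSketch is a hypothetical name, the while loop is condensed into an equivalent pop-until form, and the sample HTML is made up for illustration.

```ruby
# Hypothetical standalone module; not part of webgen.
module FragmentSketch
  HTML_HEADER_REGEXP = /<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
  HTML_ATTR_REGEXP = /\s*(\w+)\s*=\s*('|")(.+?)\2\s*/

  def self.parse_html_headers(content)
    sections = []
    stack = []
    content.scan(HTML_HEADER_REGEXP).each do |level, attrs, title|
      next if attrs.nil?                      # header without any attributes
      id_attr = attrs.scan(HTML_ATTR_REGEXP).find { |name, _sep, _value| name == 'id' }
      next if id_attr.nil?                    # header without an id attribute

      section = [level.to_i, id_attr[2], title, []]
      # Drop stack entries that are not shallower than the new section.
      stack.pop until stack.empty? || stack.last.first < section.first
      # Attach to the nearest shallower section, or to the top level.
      (stack.empty? ? sections : stack.last.last) << section
      stack << section
    end
    sections
  end
end

html = <<~HTML
  <h1 id="intro">Intro</h1>
  <h2 id="setup">Setup</h2>
  <h2>Skipped, no id</h2>
  <h3 id="deep">Deep</h3>
  <h1 id="usage">Usage</h1>
HTML

sections = FragmentSketch.parse_html_headers(html)
# sections => [[1, "intro", "Intro", [[2, "setup", "Setup", [[3, "deep", "Deep", []]]]]],
#              [1, "usage", "Usage", []]]
```

The h2 without an id is skipped entirely, so the following h3 is nested under the last accepted section with a lower level number, which here is "setup".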