Class: Webgen::SourceHandler::Fragment
- Inherits:
- Object
- Inheritance chain:
- Object, Webgen::SourceHandler::Fragment
- Includes:
- Base, WebsiteAccess
- Defined in:
- lib/webgen/sourcehandler/fragment.rb
Overview
Handles page fragment nodes and provides utility methods for parsing HTML headers and generating fragment nodes from them.
Constant Summary
- HTML_HEADER_REGEXP =
/<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
- HTML_ATTR_REGEXP =
/\s*(\w+)\s*=\s*('|")(.+?)\2\s*/
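As a minimal sketch of what the two constants capture, the snippet below copies them verbatim and applies them to a made-up HTML string. The sample markup and the variable names are assumptions for illustration only. Note that the header pattern has no multiline flag, so a header whose content spans several lines would not match.

```ruby
# Constants copied from the summary above.
HTML_HEADER_REGEXP = /<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
HTML_ATTR_REGEXP = /\s*(\w+)\s*=\s*('|")(.+?)\2\s*/

# Hypothetical sample input, not taken from webgen.
html = '<h1 id="intro" class="top">Intro</h1><h2>No attributes</h2>'

# String#scan with capture groups yields one array per match:
# [level, attribute string (nil if the tag has none), inner content].
headers = html.scan(HTML_HEADER_REGEXP)
p headers
# => [["1", "id=\"intro\" class=\"top\"", "Intro"], ["2", nil, "No attributes"]]

# Each attribute match is [name, quote character, value].
attrs = headers.first[1].scan(HTML_ATTR_REGEXP)
p attrs
# => [["id", "\"", "intro"], ["class", "\"", "top"]]
```

The second header has no attribute string at all, which is why the parser has to handle a nil second capture before looking for an id.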
Instance Method Summary
- #create_fragment_nodes(sections, parent, path, in_menu, si = 1000) ⇒ Object
Create nested fragment nodes under parent from sections (which can be created using parse_html_headers).
- #parse_html_headers(content) ⇒ Object
Parse the string content for headers h1, …, h6 and return the found, nested sections.
Methods included from WebsiteAccess
Methods included from Base
#content, #create_node, #node_exists?, #output_path, #page_from_path, #parent_node
Methods included from Base::OutputPathHelpers
Methods included from Loggable
Instance Method Details
#create_fragment_nodes(sections, parent, path, in_menu, si = 1000) ⇒ Object
Create nested fragment nodes under parent from sections (which can be created using parse_html_headers). path is the source path that defines the fragments (which is not the same as the creation path for parent). The meta information in_menu of the fragment nodes is set to the parameter in_menu and the meta info sort_info is calculated from the base si value.
# File 'lib/webgen/sourcehandler/fragment.rb', line 53
def create_fragment_nodes(sections, parent, path, in_menu, si = 1000)
  sections.each do |level, id, title, sub_sections|
    fragment_path = parent.alcn.sub(/#.*$/, '') + '#' + id
    node = website.blackboard.invoke(:create_nodes,
                                     Webgen::Path.new(fragment_path, path.source_path),
                                     self) do |cn_path|
      cn_path.meta_info['title'] = title
      cn_path.meta_info['in_menu'] = in_menu
      cn_path.meta_info['sort_info'] = si = si.succ
      create_node(cn_path, :parent => parent)
    end.first
    create_fragment_nodes(sub_sections, node, path, in_menu, si.succ)
  end
end
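The path and sort_info arithmetic of the method above can be tried without a webgen website. The following is only a sketch: plan_fragments is a hypothetical helper that is not part of webgen, it creates no nodes, and it assumes that a created node's alcn equals the fragment path it was created from. The sample sections and the '/index.html' parent are made up.

```ruby
# Hypothetical helper (not part of webgen): reproduces only the fragment path
# and sort_info bookkeeping, collecting [fragment_path, sort_info] pairs.
def plan_fragments(sections, parent_alcn, si = 1000, out = [])
  sections.each do |_level, id, _title, sub_sections|
    # Drop any existing fragment part of the parent, then append the id.
    fragment_path = parent_alcn.sub(/#.*$/, '') + '#' + id
    # Each sibling receives the next sort_info value.
    si = si.succ
    out << [fragment_path, si]
    # Children start after this node's value; their increments are local
    # to the recursive call and do not flow back to this level.
    plan_fragments(sub_sections, fragment_path, si.succ, out)
  end
  out
end

# Made-up nested sections in the [level, id, title, sub sections] shape.
sections = [
  [1, 'intro', 'Intro', [[2, 'setup', 'Setup', []]]],
  [1, 'usage', 'Usage', []]
]

plan = plan_fragments(sections, '/index.html')
plan.each { |fragment_path, sort_info| puts "#{fragment_path} #{sort_info}" }
# /index.html#intro 1001
# /index.html#setup 1003
# /index.html#usage 1002
```

In this sketch the values increase strictly among siblings, while a nested fragment can receive a higher value than a later fragment one level up.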
#parse_html_headers(content) ⇒ Object
Parse the string content for headers h1, …, h6 and return the found, nested sections.
Only headers that have an id attribute set are used. The method returns a list of arrays of the form [level, id, title, sub sections], where sub sections is again such a list.
# File 'lib/webgen/sourcehandler/fragment.rb', line 20
def parse_html_headers(content)
  sections = []
  stack = []
  content.scan(HTML_HEADER_REGEXP).each do |level,attrs,title|
    next if attrs.nil?
    id_attr = attrs.scan(HTML_ATTR_REGEXP).find {|name,sep,value| name == 'id'}
    next if id_attr.nil?
    id = id_attr[2]
    section = [level.to_i, id, title, []]
    success = false
    while !success
      if stack.empty?
        sections << section
        stack << section
        success = true
      elsif stack.last.first < section.first
        stack.last.last << section
        stack << section
        success = true
      else
        stack.pop
      end
    end
  end
  sections
end
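To see the nesting the method produces, here is a standalone sketch that runs outside webgen. The constants are copied from this page. The method body is a condensed restatement of the stack logic (pop until the top of the stack is a shallower header), not the library source itself, and the sample HTML is made up.

```ruby
HTML_HEADER_REGEXP = /<h([123456])(?:>|\s([^>]*)>)(.*?)<\/h\1\s*>/i
HTML_ATTR_REGEXP = /\s*(\w+)\s*=\s*('|")(.+?)\2\s*/

# Condensed restatement for illustration; intended to behave like the
# listing above for the purposes of this example.
def parse_html_headers(content)
  sections = []
  stack = []  # chain of currently open headers, outermost first
  content.scan(HTML_HEADER_REGEXP).each do |level, attrs, title|
    next if attrs.nil?  # header tag without any attributes
    id_attr = attrs.scan(HTML_ATTR_REGEXP).find { |name, _sep, _value| name == 'id' }
    next if id_attr.nil?  # attributes present, but no id
    section = [level.to_i, id_attr[2], title, []]
    # Close headers that are at the same or a deeper level.
    stack.pop until stack.empty? || stack.last.first < section.first
    # Attach to the enclosing header, or to the top level if none is open.
    (stack.empty? ? sections : stack.last.last) << section
    stack << section
  end
  sections
end

# Hypothetical sample input.
html = <<~HTML
  <h1 id="intro">Intro</h1>
  <h2 id="setup">Setup</h2>
  <h2>Skipped, no id</h2>
  <h1 id="usage">Usage</h1>
HTML

sections = parse_html_headers(html)
p sections
# => [[1, "intro", "Intro", [[2, "setup", "Setup", []]]], [1, "usage", "Usage", []]]
```

The h2 without an id is ignored entirely, and the second h1 closes both open headers before being added at the top level.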