Class: MdnQuery::TraverseDom
- Inherits: Object
  - Object
  - MdnQuery::TraverseDom
- Defined in: lib/mdn_query/traverse_dom.rb
Overview
A DOM traverser that extracts relevant elements.
Constant Summary
- BLACKLIST = %w[Specifications Browser_compatibility].freeze
  Sections that are blacklisted and excluded from the document.
Instance Attribute Summary
- #current_section ⇒ MdnQuery::Section (readonly)
  The current section.
- #document ⇒ MdnQuery::Document (readonly)
  The document that contains the extracted text.
- #dom ⇒ Nokogiri::HTML::Document (readonly)
  The DOM that is traversed.
Class Method Summary
- .create_document(dom, title, url) ⇒ MdnQuery::Document
  Creates a new document with the extracted text.
- .fill_document(dom, document) ⇒ MdnQuery::Document
  Fills a document with the extracted text.
Instance Method Summary
- #blacklisted?(id) ⇒ Boolean
  Returns whether the id is blacklisted.
- #create_child(desired_level, name) ⇒ MdnQuery::Section
  Creates a new child section on the appropriate parent section.
- #initialize(dom, document: nil, url: nil) ⇒ MdnQuery::TraverseDom (constructor)
  Creates a new DOM traverser.
- #traverse ⇒ void
  Traverses the DOM and extracts relevant information into the document.
Constructor Details
#initialize(dom, document: nil, url: nil) ⇒ MdnQuery::TraverseDom
Creates a new DOM traverser.
The given document is used to save the extracted text. If no document is given, a new one is created with the generic title ‘root’ and the given url.
The DOM is not automatically traversed (use #traverse).
# File 'lib/mdn_query/traverse_dom.rb', line 50
def initialize(dom, document: nil, url: nil)
  @dom = dom
  @document = document || MdnQuery::Document.new('root', url)
  @current_section = @document.section
end
Instance Attribute Details
#current_section ⇒ MdnQuery::Section (readonly)
Returns the current section.
# File 'lib/mdn_query/traverse_dom.rb', line 5
def current_section
  @current_section
end
#document ⇒ MdnQuery::Document (readonly)
Returns the document that contains the extracted text.
# File 'lib/mdn_query/traverse_dom.rb', line 11
def document
  @document
end
#dom ⇒ Nokogiri::HTML::Document (readonly)
Returns the DOM that is traversed.
# File 'lib/mdn_query/traverse_dom.rb', line 8
def dom
  @dom
end
Class Method Details
.create_document(dom, title, url) ⇒ MdnQuery::Document
Creates a new document with the extracted text.
# File 'lib/mdn_query/traverse_dom.rb', line 22
def self.create_document(dom, title, url)
  document = MdnQuery::Document.new(title, url)
  fill_document(dom, document)
end
.fill_document(dom, document) ⇒ MdnQuery::Document
Fills a document with the extracted text.
# File 'lib/mdn_query/traverse_dom.rb', line 32
def self.fill_document(dom, document)
  traverser = new(dom, document: document)
  traverser.traverse
  traverser.document
end
Instance Method Details
#blacklisted?(id) ⇒ Boolean
Returns whether the id is blacklisted.
# File 'lib/mdn_query/traverse_dom.rb', line 123
def blacklisted?(id)
  BLACKLIST.include?(id)
end
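The check is an exact string comparison against the heading id, so the underscore form of the id matters. A minimal standalone sketch, assuming the constant and method are copied out of the class (the top-level definitions here are for illustration only):

```ruby
# Copy of the documented constant, defined at top level for illustration.
BLACKLIST = %w[Specifications Browser_compatibility].freeze

# Standalone stand-in for MdnQuery::TraverseDom#blacklisted?.
def blacklisted?(id)
  BLACKLIST.include?(id)
end

blacklisted?('Specifications')        # matches an entry exactly
blacklisted?('Browser_compatibility') # matches an entry exactly
blacklisted?('Browser compatibility') # no match: the list stores ids with underscores
blacklisted?('Syntax')                # no match: not in the list
```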
#create_child(desired_level, name) ⇒ MdnQuery::Section
Creates a new child section on the appropriate parent section.
# File 'lib/mdn_query/traverse_dom.rb', line 61
def create_child(desired_level, name)
  until @current_section.level < desired_level || @current_section.parent.nil?
    @current_section = @current_section.parent
  end
  @current_section = @current_section.create_child(name)
end
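The loop climbs from the current section towards the root until it finds a section shallower than the desired level, then attaches the new child there. The following is a rough sketch of that walk using a hypothetical `SketchSection` class in place of MdnQuery::Section. It assumes that the root section has level 1 and that a child sits one level below its parent; neither detail is stated on this page.

```ruby
# Hypothetical stand-in for MdnQuery::Section with only what the walk needs.
class SketchSection
  attr_reader :name, :level, :parent, :children

  # Assumption: the root section starts at level 1.
  def initialize(name, level: 1, parent: nil)
    @name = name
    @level = level
    @parent = parent
    @children = []
  end

  # Assumption: a child sits exactly one level below its parent.
  def create_child(name)
    child = SketchSection.new(name, level: level + 1, parent: self)
    @children << child
    child
  end
end

# Mirrors the documented loop: climb until the current section is shallower
# than the desired level (or the root is reached), then attach the child.
def attach(current, desired_level, name)
  current = current.parent until current.level < desired_level || current.parent.nil?
  current.create_child(name)
end

root     = SketchSection.new('root')
syntax   = attach(root, 2, 'Syntax')       # h2 attaches under root
params   = attach(syntax, 3, 'Parameters') # h3 attaches under Syntax
examples = attach(params, 2, 'Examples')   # h2 climbs back up to root
```

In this sketch the second h2 ends up as a sibling of the first one rather than nested under the h3, which is the point of the upward walk.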
#traverse ⇒ void
This method returns an undefined value.
Traverses the DOM and extracts relevant information into the document.
# File 'lib/mdn_query/traverse_dom.rb', line 72
def traverse
  unless @dom.css('div.nonStandard').empty?
    @current_section.append_text("\n> ***Non-standard***\n")
  end
  blacklist_level = nil
  @dom.children.each do |child|
    if child_blacklisted?(child, blacklist_level)
      if blacklist_level.nil?
        blacklist_level = child.name.match(/\Ah(?<level>\d)\z/)[:level]
      end
      next
    end
    blacklist_level = nil
    case child.name
    when 'p'
      @current_section.append_text(child.text)
    when 'ul'
      @current_section.append_text(convert_list(child))
    when 'dl'
      @current_section.append_text(convert_description(child))
    when 'pre'
      if child[:class].nil?
        @current_section.append_code(child.text)
        next
      end
      match = child[:class].match(/brush:\s*(?<lang>\w+)/)
      if match.nil?
        @current_section.append_code(child.text)
      else
        lang = match[:lang]
        lang = 'javascript' if lang == 'js'
        @current_section.append_code(child.text, language: lang)
      end
    when /\Ah(?<level>\d)\z/
      level = $LAST_MATCH_INFO[:level].to_i
      create_child(level, child[:id].tr('_', ' '))
    when 'table'
      @current_section.append_text(convert_table(child))
    when 'div'
      next if child[:class].nil?
      if child[:class].include?('note') || child[:class].include?('warning')
        @current_section.append_text("\n> #{child.text.strip}\n")
      end
    end
  end
end
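Two of the branches in #traverse rely on small string conventions: the `brush:` marker in a `pre` element's class attribute, and the underscore-separated heading id. The sketch below isolates both using the same regular expressions as the method. The helper names `code_language` and `heading_info` are hypothetical and are not part of MdnQuery.

```ruby
# Hypothetical helper: extracts the language from a <pre> class attribute,
# using the same pattern as #traverse. Returns nil when no language is found.
def code_language(css_class)
  return nil if css_class.nil?

  match = css_class.match(/brush:\s*(?<lang>\w+)/)
  return nil if match.nil?

  # 'js' is expanded to 'javascript', as in #traverse.
  match[:lang] == 'js' ? 'javascript' : match[:lang]
end

# Hypothetical helper: turns a heading tag name and its id into
# [level, section name]. Returns nil for tags that are not h0..h9.
def heading_info(tag_name, id)
  match = tag_name.match(/\Ah(?<level>\d)\z/)
  return nil if match.nil?

  [match[:level].to_i, id.tr('_', ' ')]
end

code_language('brush: js')    # language is normalised to 'javascript'
code_language('brush:html')   # whitespace after the colon is optional
code_language('syntaxbox')    # nil: no brush marker present
heading_info('h2', 'Browser_compatibility') # level 2, underscores become spaces
```

Using an explicit `match` call here avoids the `$LAST_MATCH_INFO` global, which in plain Ruby is only available after `require 'English'`.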