Class: MdnQuery::TraverseDom

Inherits:
Object
Defined in:
lib/mdn_query/traverse_dom.rb

Overview

A DOM traverser that extracts relevant elements.

Constant Summary

BLACKLIST =

Sections that are blacklisted and excluded from the document.

%w[Specifications Browser_compatibility].freeze

Constructor Details

#initialize(dom, document: nil, url: nil) ⇒ MdnQuery::TraverseDom

Creates a new DOM traverser.

The given document is used to save the extracted text. If no document is given, a new one is created with the generic title 'root' and the given URL.

The DOM is not automatically traversed (use #traverse).

Parameters:

  • dom (Nokogiri::HTML::Document)

    the DOM that is traversed

  • document (MdnQuery::Document) (defaults to: nil)

    the document to be filled

  • url (String) (defaults to: nil)

    the URL for the new document if none was provided



# File 'lib/mdn_query/traverse_dom.rb', line 50

def initialize(dom, document: nil, url: nil)
  @dom = dom
  @document = document || MdnQuery::Document.new('root', url)
  @current_section = @document.section
end
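
The fallback behaviour described above can be illustrated without the gem installed. This is a minimal sketch: StandInSection, StandInDocument and StandInTraverser are hypothetical stand-ins written for this example, not the real MdnQuery classes, and the symbol passed as dom is only a placeholder for a parsed Nokogiri document.

```ruby
# Hypothetical stand-ins for MdnQuery::Section and MdnQuery::Document.
StandInSection = Struct.new(:name)

StandInDocument = Struct.new(:title, :url) do
  # Assumption: a document exposes one root section named after its title.
  def section
    @section ||= StandInSection.new(title)
  end
end

# Mirrors the constructor logic shown in the listing above.
class StandInTraverser
  attr_reader :dom, :document, :current_section

  def initialize(dom, document: nil, url: nil)
    @dom = dom
    # Only build a new 'root' document when the caller supplied none.
    @document = document || StandInDocument.new('root', url)
    @current_section = @document.section
  end
end

# No document given: a new one titled 'root' carries the URL.
fresh = StandInTraverser.new(:dom_placeholder, url: 'https://example.com/page')

# Document given: it is reused as-is and the url keyword is not consulted.
existing = StandInDocument.new('Array', 'https://example.com/array')
reused = StandInTraverser.new(:dom_placeholder, document: existing, url: 'unused')
```

Note that in both cases nothing has been extracted yet; as stated above, traversal only happens once #traverse is called.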

Instance Attribute Details

#current_section ⇒ MdnQuery::Section (readonly)

Returns the current section.

Returns:

  • (MdnQuery::Section)

    the current section


# File 'lib/mdn_query/traverse_dom.rb', line 5

def current_section
  @current_section
end

#document ⇒ MdnQuery::Document (readonly)

Returns the document that contains the extracted text.

Returns:

  • (MdnQuery::Document)

    the document that contains the extracted text


# File 'lib/mdn_query/traverse_dom.rb', line 11

def document
  @document
end

#dom ⇒ Nokogiri::HTML::Document (readonly)

Returns the DOM that is traversed.

Returns:

  • (Nokogiri::HTML::Document)

    the DOM that is traversed



# File 'lib/mdn_query/traverse_dom.rb', line 8

def dom
  @dom
end

Class Method Details

.create_document(dom, title, url) ⇒ MdnQuery::Document

Creates a new document with the extracted text.

Parameters:

  • dom (Nokogiri::HTML::Document)

    the DOM that is traversed

  • title (String)

    the title of the document

  • url (String)

    the URL to the document on the web

Returns:

  • (MdnQuery::Document)


# File 'lib/mdn_query/traverse_dom.rb', line 22

def self.create_document(dom, title, url)
  document = MdnQuery::Document.new(title, url)
  fill_document(dom, document)
end

.fill_document(dom, document) ⇒ MdnQuery::Document

Fills a document with the extracted text.

Parameters:

  • dom (Nokogiri::HTML::Document)

    the DOM that is traversed

  • document (MdnQuery::Document)

    the document to be filled

Returns:

  • (MdnQuery::Document)


# File 'lib/mdn_query/traverse_dom.rb', line 32

def self.fill_document(dom, document)
  traverser = new(dom, document: document)
  traverser.traverse
  traverser.document
end

Instance Method Details

#blacklisted?(id) ⇒ Boolean

Returns whether the id is blacklisted.

Parameters:

  • id (String)

    the id to be tested

Returns:

  • (Boolean)


# File 'lib/mdn_query/traverse_dom.rb', line 123

def blacklisted?(id)
  BLACKLIST.include?(id)
end
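
Because the check is a plain Array#include?, it is an exact, case-sensitive match against the ids listed in BLACKLIST. The sketch below copies the constant's literal into a hypothetical SKETCH_BLACKLIST and wraps it in a hypothetical sketch_blacklisted? helper so that it runs on its own; neither name is part of the library.

```ruby
# Same literal as the BLACKLIST constant documented above (copied for the sketch).
SKETCH_BLACKLIST = %w[Specifications Browser_compatibility].freeze

# Mirrors #blacklisted?: true only when the id is exactly one of the entries.
def sketch_blacklisted?(id)
  SKETCH_BLACKLIST.include?(id)
end

exact_match   = sketch_blacklisted?('Browser_compatibility') # underscore form, as stored
spaced_title  = sketch_blacklisted?('Browser compatibility') # display form with a space
lowercased_id = sketch_blacklisted?('specifications')        # differs only in case
```

Only the first call matches; the entries are stored in the underscore form of the heading id, so a display title with spaces or different casing is not treated as blacklisted.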

#create_child(desired_level, name) ⇒ MdnQuery::Section

Creates a new child section on the appropriate parent section.

Parameters:

  • desired_level (Fixnum)

    the desired level for the child section

  • name (String)

    the name and title of the child section

Returns:

  • (MdnQuery::Section)


# File 'lib/mdn_query/traverse_dom.rb', line 61

def create_child(desired_level, name)
  until @current_section.level < desired_level ||
        @current_section.parent.nil?
    @current_section = @current_section.parent
  end
  @current_section = @current_section.create_child(name)
end
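
The loop above climbs towards the root until it reaches a section whose level is lower than the desired one, then attaches the new child there. The following sketch replays that logic on a small tree. TreeSection and TreeTraverser are hypothetical stand-ins, and the level numbering (root at level 1, each child one level deeper than its parent) is an assumption made for the example rather than something this page states about MdnQuery::Section.

```ruby
# Hypothetical stand-in for MdnQuery::Section.
class TreeSection
  attr_reader :name, :level, :parent, :children

  def initialize(name, level: 1, parent: nil)
    @name = name
    @level = level
    @parent = parent
    @children = []
  end

  # Assumption: a child always sits exactly one level below its parent.
  def create_child(name)
    child = TreeSection.new(name, level: level + 1, parent: self)
    @children << child
    child
  end
end

# Carries only the state that #create_child needs.
class TreeTraverser
  attr_reader :current_section

  def initialize(root)
    @current_section = root
  end

  # Same loop as in the listing above.
  def create_child(desired_level, name)
    until @current_section.level < desired_level ||
          @current_section.parent.nil?
      @current_section = @current_section.parent
    end
    @current_section = @current_section.create_child(name)
  end
end

root = TreeSection.new('root')
traverser = TreeTraverser.new(root)

syntax   = traverser.create_child(2, 'Syntax')     # h2: attached to root
params   = traverser.create_child(3, 'Parameters') # h3: nested under Syntax
examples = traverser.create_child(2, 'Examples')   # h2: climbs back up to root
```

After the third call the traverser has walked from Parameters up through Syntax to the root, so Examples becomes a sibling of Syntax rather than a child of Parameters.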

#traverse ⇒ void

This method returns an undefined value.

Traverses the DOM and extracts relevant information into the document.



# File 'lib/mdn_query/traverse_dom.rb', line 72

def traverse
  unless @dom.css('div.nonStandard').empty?
    @current_section.append_text("\n> ***Non-standard***\n")
  end
  blacklist_level = nil
  @dom.children.each do |child|
    if child_blacklisted?(child, blacklist_level)
      if blacklist_level.nil?
        blacklist_level = child.name.match(/\Ah(?<level>\d)\z/)[:level]
      end
      next
    end
    blacklist_level = nil
    case child.name
    when 'p'
      @current_section.append_text(child.text)
    when 'ul'
      @current_section.append_text(convert_list(child))
    when 'dl'
      @current_section.append_text(convert_description(child))
    when 'pre'
      if child[:class].nil?
        @current_section.append_code(child.text)
        next
      end
      match = child[:class].match(/brush:\s*(?<lang>\w+)/)
      if match.nil?
        @current_section.append_code(child.text)
      else
        lang = match[:lang]
        lang = 'javascript' if lang == 'js'
        @current_section.append_code(child.text, language: lang)
      end
    when /\Ah(?<level>\d)\z/
      level = $LAST_MATCH_INFO[:level].to_i
      create_child(level, child[:id].tr('_', ' '))
    when 'table'
      @current_section.append_text(convert_table(child))
    when 'div'
      next if child[:class].nil?
      if child[:class].include?('note') || child[:class].include?('warning')
        @current_section.append_text("\n> #{child.text.strip}\n")
      end
    end
  end
end
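
Two of the branches above depend only on strings: the 'pre' branch reads the code language from the element's class attribute, and the heading branch derives the section level and title from the tag name and id. The sketch below isolates those two decisions so they can be tried without a DOM. The helper names sketch_code_language and sketch_heading are hypothetical and exist only in this example; the regular expressions are the ones used in the listing.

```ruby
# Returns the language named in a class such as "brush: js", or nil when the
# class is missing or carries no "brush:" marker (plain code block).
def sketch_code_language(css_class)
  return nil if css_class.nil?

  match = css_class.match(/brush:\s*(?<lang>\w+)/)
  return nil if match.nil?

  # The listing normalises the short form 'js' to 'javascript'.
  match[:lang] == 'js' ? 'javascript' : match[:lang]
end

# Returns [level, title] for a heading tag name and its id, or nil when the
# tag name is not h followed by a single digit.
def sketch_heading(tag_name, id)
  match = tag_name.match(/\Ah(?<level>\d)\z/)
  return nil if match.nil?

  # Underscores in the id become spaces in the section title.
  [match[:level].to_i, id.tr('_', ' ')]
end

js_lang    = sketch_code_language('brush: js')     # short form is expanded
html_lang  = sketch_code_language('brush:html')    # whitespace after the colon is optional
plain_code = sketch_code_language('syntaxbox')     # no brush marker
heading    = sketch_heading('h2', 'Return_value')  # level and readable title
not_header = sketch_heading('header', 'Return_value')
```

In the listing itself the heading branch uses $LAST_MATCH_INFO, which is only available after require 'English'; the sketch uses the MatchData returned by String#match instead so that it has no such dependency.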