Class: BxBuilderChain::Processors::Html

Inherits:
Base
  • Object
Defined in:
lib/bx_builder_chain/processors/html.rb

Constant Summary

EXTENSIONS =
[".html", ".htm"]
CONTENT_TYPES =
["text/html"]
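This page does not show how EXTENSIONS and CONTENT_TYPES are consumed. A minimal sketch of how a loader might use them to decide whether a file belongs to this processor, assuming case-insensitive extension matching and content types that may carry parameters such as "; charset=utf-8". The helpers html_path? and html_content_type? are hypothetical and not part of the library.

```ruby
# Copies of the class constants so the snippet runs standalone.
EXTENSIONS = [".html", ".htm"]
CONTENT_TYPES = ["text/html"]

# Hypothetical helper: compare the extension case-insensitively,
# so "INDEX.HTM" is treated the same as "index.htm".
def html_path?(path)
  EXTENSIONS.include?(File.extname(path).downcase)
end

# Hypothetical helper: drop media-type parameters (e.g. "; charset=utf-8")
# before comparing against the known content types.
def html_content_type?(content_type)
  CONTENT_TYPES.include?(content_type.to_s.split(";").first.to_s.strip.downcase)
end

puts html_path?("docs/index.html")
puts html_content_type?("text/html; charset=utf-8")
```

Normalizing before the lookup keeps the constants themselves short; whether the real loader does this is an assumption.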
TEXT_CONTENT_TAGS =

We only look for headings and paragraphs

%w[h1 h2 h3 h4 h5 h6 p]
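#parse turns this list into a single CSS selector group by joining the tag names with commas; an element matches if it matches any one of them. The snippet reproduces only that string-building step in plain Ruby, without Nokogiri. One consequence of the list: text that sits directly inside other elements, such as li or td, is not extracted unless it is wrapped in one of these tags.

```ruby
# Copy of the class constant so the snippet runs standalone.
TEXT_CONTENT_TAGS = %w[h1 h2 h3 h4 h5 h6 p]

# Same expression #parse passes to Nokogiri's css method.
selector = TEXT_CONTENT_TAGS.join(",")
puts selector
```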

Instance Method Summary

Methods included from DependencyHelper

#depends_on

Constructor Details

#initialize ⇒ Html

Returns a new instance of Html.



# File 'lib/bx_builder_chain/processors/html.rb', line 12

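def initialize(*)
  # Check that the optional nokogiri gem is available (DependencyHelper#depends_on),
  # then load it; nokogiri is only required once a processor is instantiated.
  depends_on "nokogiri"
  require "nokogiri"
end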
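DependencyHelper#depends_on is not documented on this page. A minimal sketch of the check-then-require pattern the constructor follows, assuming depends_on raises a LoadError with a descriptive message when the gem is not installed, presumably so a missing optional dependency fails clearly before require runs. DependencyCheckSketch and HtmlSketch are hypothetical names; the real helper's checks and message may differ.

```ruby
require "rubygems"

# Hypothetical stand-in for DependencyHelper#depends_on.
module DependencyCheckSketch
  def depends_on(gem_name)
    # find_all_by_name returns every installed spec with this name
    # (an empty array when the gem is not installed).
    return true if Gem::Specification.find_all_by_name(gem_name).any?

    raise LoadError, "Could not load #{gem_name}. Please ensure the #{gem_name} gem is installed."
  end
end

# Hypothetical class mirroring the constructor above: verify first, then load.
class HtmlSketch
  include DependencyCheckSketch

  def initialize(*)
    depends_on "nokogiri"
    require "nokogiri"
  end
end
```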

Instance Method Details

#parse(data) ⇒ String

Parse the HTML document and return the text of its headings (h1 to h6) and paragraphs.

Parameters:

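  • data (File): the HTML input. Only data.read is called, so any object that responds to #read, such as a StringIO, also works.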

Returns:

  • (String)


# File 'lib/bx_builder_chain/processors/html.rb', line 20

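def parse(data)
  # Parse the raw HTML, select every heading (h1-h6) and paragraph,
  # take each element's text content, separate the blocks with a
  # blank line, and trim whitespace from both ends of the result.
  Nokogiri::HTML(data.read)
    .css(TEXT_CONTENT_TAGS.join(","))
    .map(&:inner_text)
    .join("\n\n")
    .strip
end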
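The last two steps of #parse can be checked without Nokogiri. A minimal sketch, using a hypothetical join_text_blocks helper and hand-written strings in place of real inner_text values: strip runs once on the joined string, so whitespace is trimmed only at the very start and end of the result, not around each block.

```ruby
# Hypothetical helper: reproduces only .join("\n\n").strip from #parse,
# applied to texts that have already been extracted from the document.
def join_text_blocks(texts)
  texts.join("\n\n").strip
end

# Stand-in for the output of .css(...).map(&:inner_text) on a small page.
texts = ["  Title", "First paragraph. ", " Second paragraph.\n"]
puts join_text_blocks(texts).inspect
```

A side effect of this design: an empty heading or paragraph contributes an empty string, so ["A", "", "B"] becomes "A" followed by two blank-line separators and then "B".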