Class: TextRank::CharFilter::StripHtml

Inherits:
Nokogiri::XML::SAX::Document
  • Object
Defined in:
lib/text_rank/char_filter/strip_html.rb

Overview

Character filter to remove HTML tags and convert HTML entities to text.

= Examples

StripHtml.new.filter!("&quot;Optimism&quot;, said Cacambo, &quot;What is that?&quot;") => "\"Optimism\", said Cacambo, \"What is that?\""

StripHtml.new.filter!("Alas! It is the obstinacy of maintaining that everything is best when it is worst.") => "Alas! It is the obstinacy of maintaining that everything is best when it is worst."
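StripHtml relies on event-driven (SAX) parsing: the parser walks the markup and calls methods on a handler object, and the handler keeps only the text it is handed. Tags therefore never reach the output, and entities arrive already decoded. The code below is a hedged, dependency-free sketch of that pattern. It swaps in REXML's stream parser for Nokogiri's HTML SAX parser. The class name XmlTextExtractor is hypothetical, and this is not StripHtml's implementation. Because REXML parses XML, the input must be well-formed. Only XML's predefined entities, such as &quot; and &amp;, are decoded. HTML named entities like &nbsp; are not.

```ruby
require 'stringio'
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical stdlib stand-in for StripHtml. REXML's stream (SAX-style)
# parser replaces Nokogiri's HTML SAX parser, so input must be well-formed
# XML and only XML's predefined entities are decoded.
class XmlTextExtractor
  # Supplies no-op defaults for every callback the stream parser may fire.
  include REXML::StreamListener

  # Called for each run of character data, with entities already expanded;
  # tag names and attributes never arrive here.
  def text(string)
    @text << string
  end

  def filter!(markup)
    @text = StringIO.new # fresh buffer on every call
    # Wrap in a synthetic root element so text-only fragments are well-formed.
    REXML::Parsers::StreamParser.new("<root>#{markup}</root>", self).parse
    @text.string
  end
end

XmlTextExtractor.new.filter!('&quot;Optimism&quot;, said <b>Cacambo</b>')
# => "\"Optimism\", said Cacambo"
```

The synthetic root element is only needed because REXML is a strict XML parser. An HTML parser such as Nokogiri's is more forgiving about fragments and malformed markup.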

Instance Method Summary

Constructor Details

#initialize ⇒ StripHtml

Returns a new instance of StripHtml.

# File 'lib/text_rank/char_filter/strip_html.rb', line 19

def initialize
  super
  # Buffer that accumulates the extracted text during parsing
  @text = StringIO.new
end

Instance Method Details

#filter!(text) ⇒ String

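Perform the filter: parse text as HTML, drop the tags, and return its text content with HTML entities decoded.

Parameters:

  • text (String): the HTML to filter

Returns:

  • (String): the extracted plain text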

# File 'lib/text_rank/char_filter/strip_html.rb', line 27

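def filter!(text)
  # Start from a fresh buffer: rewinding alone would leave trailing text from
  # a longer previous input and overwrite the String returned last time
  @text = StringIO.new
  # Parsing fires SAX events on self; this class's handlers (not shown here)
  # append the document's text to @text
  Nokogiri::HTML::SAX::Parser.new(self).parse(text)
  @text.string
end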
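Resetting a StringIO buffer between calls is easy to get wrong. This stdlib-only sketch does not use StripHtml. It shows what happens when a buffer is only rewound before reuse. New text overwrites the old bytes starting at position 0, so a longer previous result leaves its tail behind. Because StringIO#string returns the buffer's own String object rather than a copy, the value handed out earlier is mutated as well. The variable names are illustrative only.

```ruby
require 'stringio'

# Illustrative stand-in for a buffer after a first call has returned
# buffer.string to its caller.
buffer = StringIO.new
buffer << 'Optimism, said Cacambo'
first_result = buffer.string # the buffer's own String, not a copy

# Reuse by rewinding only: the write position returns to 0 and the new
# text overwrites the old bytes in place.
buffer.rewind
buffer << 'Alas!'
buffer.string # => "Alas!ism, said Cacambo" (stale tail survives)
first_result  # => "Alas!ism, said Cacambo" (earlier result mutated too)

# Allocating a fresh StringIO per call avoids both problems.
buffer = StringIO.new
buffer << 'Alas!'
buffer.string # => "Alas!"
first_result  # => "Alas!ism, said Cacambo" (no longer touched)
```

An alternative is to keep one buffer and call truncate(0) followed by rewind, returning a copy (string.dup). A fresh StringIO is simpler and costs one small allocation per call.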