Module: Loofah::TextBehavior

Included in:
HTML::Document, HTML::DocumentFragment
Defined in:
lib/loofah/instance_methods.rb

Overview

Overrides #text in HTML::Document and HTML::DocumentFragment,
and mixes in #to_text.

Instance Method Details

#text(options = {}) ⇒ Object
Also known as: inner_text, to_str

Returns a plain-text version of the markup contained by the document,
with HTML entities encoded.

This method is significantly faster than #to_text, but isn't
clever about whitespace around block elements.

  Loofah.document("<h1>Title</h1><div>Content</div>").text
  # => "TitleContent"

By default, the returned text will have HTML entities
escaped. If you want unescaped entities, and you understand
that the result is unsafe to render in a browser, then you
can pass an argument as shown:

  frag = Loofah.fragment("&lt;script&gt;alert('EVIL');&lt;/script&gt;")
  # ok for browser:
  frag.text                                 # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
  # decidedly not ok for browser:
  frag.text(:encode_special_chars => false) # => "<script>alert('EVIL');</script>"


# File 'lib/loofah/instance_methods.rb', line 95

def text(options={})
  result = serialize_root.children.inner_text rescue ""
  if options[:encode_special_chars] == false
    result # possibly dangerous if rendered in a browser
  else
    encode_special_chars result
  end
end
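
The option handling above can be sketched without Loofah or Nokogiri. This is a minimal illustration, not the library's implementation: `escape_markup` and `plain_text` are hypothetical names, and the escaper assumes that only `&`, `<` and `>` need encoding, which matches the example output shown earlier but may differ from what Loofah's own `encode_special_chars` does in other cases.

```ruby
# Hypothetical stand-in for Loofah's encode_special_chars.
# Assumption: only &, < and > are escaped.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_markup(str)
  str.gsub(/[&<>]/, ESCAPES)
end

# Mirrors the branch in #text: escaping is skipped only when the caller
# passes encode_special_chars: false explicitly.
def plain_text(raw, options = {})
  if options[:encode_special_chars] == false
    raw # possibly dangerous if rendered in a browser
  else
    escape_markup(raw)
  end
end

raw    = "<script>alert('EVIL');</script>"
safe   = plain_text(raw)
unsafe = plain_text(raw, encode_special_chars: false)

puts safe   # => &lt;script&gt;alert('EVIL');&lt;/script&gt;
puts unsafe # => <script>alert('EVIL');</script>
```

Because the check is `== false` rather than a truthiness test, omitting the option or passing `nil` still yields escaped output, so the safe behavior is the default.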

#to_text(options = {}) ⇒ Object

Returns a plain-text version of the markup contained by the
fragment, with HTML entities encoded.

This method is slower than #text, but is clever about
whitespace around block elements.

  Loofah.document("<h1>Title</h1><div>Content</div>").to_text
  # => "\nTitle\n\nContent\n"


# File 'lib/loofah/instance_methods.rb', line 116

def to_text(options={})
  Loofah.remove_extraneous_whitespace self.dup.scrub!(:newline_block_elements).text(options)
end
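
To make the three steps in #to_text concrete (newline around block elements, text extraction, whitespace cleanup), here is a rough standalone sketch. It is not how Loofah works internally: Loofah scrubs a parsed Nokogiri tree, whereas this uses regular expressions on the raw string purely for illustration. `rough_to_text`, the `BLOCK_TAGS` list, and the exact collapsing rule are all assumptions, and entity encoding is left out.

```ruby
# Assumed, abbreviated list of block-level elements (hypothetical).
BLOCK_TAGS = %w[h1 h2 h3 h4 h5 h6 p div ul ol li blockquote pre].freeze

def rough_to_text(html)
  block = BLOCK_TAGS.join("|")
  # Step 1: put a newline wherever a block element opens or closes.
  with_breaks = html.gsub(%r{</?(?:#{block})\b[^>]*>}i, "\n")
  # Step 2: drop the remaining (inline) tags, keeping their text.
  stripped = with_breaks.gsub(/<[^>]+>/, "")
  # Step 3: collapse three or more consecutive newlines (possibly with
  # whitespace between them) into a single blank line.
  stripped.gsub(/\n\s*\n\s*\n/, "\n\n")
end

p rough_to_text("<h1>Title</h1><div>Content</div>")
# => "\nTitle\n\nContent\n"
p rough_to_text("<p>Hello <b>world</b></p>")
# => "\nHello world\n"
```

Regular expressions are not a safe way to process untrusted HTML; the sketch only shows why block elements end up separated by newlines while inline elements do not.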