Module: Loofah::TextBehavior

Included in:
HTML4::Document, HTML4::DocumentFragment, HTML5::Document, HTML5::DocumentFragment
Defined in:
lib/loofah/concerns.rb

Overview

Overrides text in Document and DocumentFragment classes, and mixes in to_text.

Instance Method Summary

Instance Method Details

#text(options = {}) ⇒ Object
Also known as: inner_text, to_str

Returns a plain-text version of the markup contained by the document, with HTML entities
encoded.

This method is significantly faster than #to_text, but isn't clever about whitespace around
block elements.

  Loofah.html5_document("<h1>Title</h1><div>Content</div>").text
  # => "TitleContent"

By default, the returned text will have HTML entities escaped. If you want unescaped
entities, and you understand that the result is unsafe to render in a browser, then you can
pass an argument as shown:

  frag = Loofah.html5_fragment("&lt;script&gt;alert('EVIL');&lt;/script&gt;")
  # ok for browser:
  frag.text                                 # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
  # decidedly not ok for browser:
  frag.text(:encode_special_chars => false) # => "<script>alert('EVIL');</script>"


# File 'lib/loofah/concerns.rb', line 94

def text(options = {})
  result = if serialize_root
    serialize_root.children.reject(&:comment?).map(&:inner_text).join("")
  else
    ""
  end
  if options[:encode_special_chars] == false
    result # possibly dangerous if rendered in a browser
  else
    encode_special_chars(result)
  end
end
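
The option check in the method above compares against a literal false, so escaping stays on whenever the key is omitted or set to anything else. The following is a minimal, stdlib-only sketch of that pattern. The names ESCAPES, escape_markup, and render_text are hypothetical, and the three-character escape table is an assumption for illustration, not Loofah's own encoder.

```ruby
# Hypothetical stand-in for the escaping step (assumption: only &, <, > are escaped).
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_markup(str)
  # String#gsub with a Hash looks up each matched character in the table.
  str.gsub(/[&<>]/, ESCAPES)
end

# Hypothetical analogue of the option handling in #text:
# escaping is skipped only when the option is exactly false.
def render_text(raw, options = {})
  if options[:encode_special_chars] == false
    raw # possibly dangerous if rendered in a browser
  else
    escape_markup(raw)
  end
end

raw = "<script>alert('EVIL');</script>"
render_text(raw)                              # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
render_text(raw, encode_special_chars: nil)   # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
render_text(raw, encode_special_chars: false) # => "<script>alert('EVIL');</script>"
```

Because a nil or missing value leaves escaping enabled, a caller has to opt out explicitly to get the unsafe output.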

#to_text(options = {}) ⇒ Object

Returns a plain-text version of the markup contained by the fragment, with HTML entities
encoded.

This method is slower than #text, but is clever about whitespace around block elements and
line break elements.

  Loofah.html5_document("<h1>Title</h1><div>Content<br>Next line</div>").to_text
  # => "\nTitle\n\nContent\nNext line\n"


# File 'lib/loofah/concerns.rb', line 120

def to_text(options = {})
  Loofah.remove_extraneous_whitespace(dup.scrub!(:newline_block_elements).text(options))
end
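
The method above runs in two stages: a scrub pass that pads block elements with newlines, then a cleanup pass over the resulting text. The cleanup idea can be sketched with the standard library alone. The helper name collapse_blank_lines and its regular expression are hypothetical and chosen for illustration. They are not taken from Loofah.remove_extraneous_whitespace.

```ruby
# Hypothetical helper: reduce any run of three or more newlines to exactly two,
# so adjacent padded blocks are separated by a single blank line.
def collapse_blank_lines(str)
  str.gsub(/\n{3,}/, "\n\n")
end

# Simulated text after a padding pass left extra newlines between two blocks.
padded = "\nTitle\n\n\n\nContent\nNext line\n"
collapse_blank_lines(padded) # => "\nTitle\n\nContent\nNext line\n"
```

Single line breaks, such as the one between "Content" and "Next line", are left untouched.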