Class: Html2rss::AttributePostProcessors::SanitizeHtml
Defined in: lib/html2rss/attribute_post_processors/sanitize_html.rb
Overview
Returns sanitized HTML code as String.
It sanitizes by using the [sanitize gem](github.com/rgrove/sanitize) with [Sanitize::Config::RELAXED](github.com/rgrove/sanitize#sanitizeconfigrelaxed).
Furthermore, it:
- adds `rel="nofollow noopener noreferrer"` to <a> tags
- adds `referrer-policy='no-referrer'` to <img> tags
- wraps every <img> tag whose direct parent is not an <a> in an <a> linking to the <img>'s `src`
Imagine this HTML structure:

    <section>
      Lorem <b>ipsum</b> dolor...
      <iframe src="https://evil.corp/miner"></iframe>
      <script>alert();</script>
    </section>

YAML usage example:

    selectors:
      description:
        selector: '.section'
        extractor: html
        post_process:
          name: sanitize_html

Would return:

    '<p>Lorem <b>ipsum</b> dolor ...</p>'
Instance Attribute Summary
Attributes inherited from Base
Class Method Summary
- .get(html, url) ⇒ Object
  Shorthand method to get the sanitized HTML.
- .validate_args!(value, context) ⇒ Object
Instance Method Summary
Methods inherited from Base
assert_type, expect_options, #initialize
Constructor Details
This class inherits a constructor from Html2rss::AttributePostProcessors::Base
Class Method Details
.get(html, url) ⇒ Object
Shorthand method to get the sanitized HTML.
    # File 'lib/html2rss/attribute_post_processors/sanitize_html.rb', line 50
    def self.get(html, url)
      raise ArgumentError, 'url must be a String or Addressable::URI' if url.to_s.empty?
      return nil if html.to_s.empty?

      new(html, { config: Config::Channel.new({ url: }) }).get
    end
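The two guard clauses in `.get` decide what happens before any sanitizing takes place: an empty or nil `url` raises, and empty or nil `html` returns `nil`. The following is a minimal sketch of that control flow only, using the Ruby standard library. `guarded_get` is a hypothetical stand-in, not part of html2rss, and the block it yields to takes the place of the real sanitizer call.

```ruby
# Hypothetical stand-in for SanitizeHtml.get that keeps only its guard clauses.
# The block passed in replaces the real sanitizer call (an assumption made
# so the sketch runs without the html2rss gem).
def guarded_get(html, url)
  raise ArgumentError, 'url must be a String or Addressable::URI' if url.to_s.empty?
  return nil if html.to_s.empty?

  yield html, url
end

# Empty (or nil) HTML short-circuits to nil; the block is never called.
p guarded_get('', 'https://example.com') { |html, _url| html.upcase }

# A missing URL raises before the HTML is inspected at all.
begin
  guarded_get('<b>hi</b>', nil) { |html, _url| html }
rescue ArgumentError => e
  puts e.message
end
```

Because the URL check comes first, a call with both arguments empty raises rather than returning `nil`.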
.validate_args!(value, context) ⇒ Object
    # File 'lib/html2rss/attribute_post_processors/sanitize_html.rb', line 42
    def self.validate_args!(value, context)
      assert_type value, String, :value, context:
    end
Instance Method Details
#get ⇒ String
    # File 'lib/html2rss/attribute_post_processors/sanitize_html.rb', line 59
    def get
      sanitized_html = Sanitize.fragment(value, sanitize_config)
      sanitized_html.to_s.gsub(/\s+/, ' ').strip
    end
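The last line of `#get` normalizes whitespace after sanitizing: every run of whitespace becomes a single space, and the ends are trimmed. Below is a small sketch of that step in isolation, applied to a string that is assumed to be sanitized already. The helper name `squish_whitespace` is hypothetical and does not exist in html2rss.

```ruby
# Hypothetical helper mirroring the final line of #get; not part of html2rss.
def squish_whitespace(html)
  # Collapse each run of whitespace (spaces, tabs, newlines) into one space,
  # then strip leading and trailing whitespace.
  html.to_s.gsub(/\s+/, ' ').strip
end

# Input assumed to be the sanitizer's output, still containing line breaks.
sanitized = "<p>Lorem <b>ipsum</b>\n   dolor ...</p>\n"
puts squish_whitespace(sanitized)
# prints: <p>Lorem <b>ipsum</b> dolor ...</p>
```

Note that this collapses whitespace everywhere in the string, including inside elements such as <pre>, where line breaks would otherwise be significant.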