Class: Sanitize
- Inherits:
- Object (Sanitize < Object)
- Defined in:
- lib/sanitize.rb,
lib/sanitize/css.rb,
lib/sanitize/config.rb,
lib/sanitize/version.rb,
lib/sanitize/config/basic.rb,
lib/sanitize/config/default.rb,
lib/sanitize/config/relaxed.rb,
lib/sanitize/config/restricted.rb,
lib/sanitize/transformers/clean_css.rb,
lib/sanitize/transformers/clean_cdata.rb,
lib/sanitize/transformers/clean_comment.rb,
lib/sanitize/transformers/clean_doctype.rb,
lib/sanitize/transformers/clean_element.rb
Defined Under Namespace
Modules: Config, Transformers
Classes: CSS, Error
Constant Summary
- REGEX_HTML_CONTROL_CHARACTERS =
Matches one or more control characters that should be removed from HTML before parsing, as defined by the HTML living standard.
/[\u0001-\u0008\u000b\u000e-\u001f\u007f-\u009f]+/u
- REGEX_HTML_NON_CHARACTERS =
Matches one or more non-characters that should be removed from HTML before parsing, as defined by the HTML living standard.
/[\ufdd0-\ufdef\ufffe\uffff\u{1fffe}\u{1ffff}\u{2fffe}\u{2ffff}\u{3fffe}\u{3ffff}\u{4fffe}\u{4ffff}\u{5fffe}\u{5ffff}\u{6fffe}\u{6ffff}\u{7fffe}\u{7ffff}\u{8fffe}\u{8ffff}\u{9fffe}\u{9ffff}\u{afffe}\u{affff}\u{bfffe}\u{bffff}\u{cfffe}\u{cffff}\u{dfffe}\u{dffff}\u{efffe}\u{effff}\u{ffffe}\u{fffff}\u{10fffe}\u{10ffff}]+/u
- REGEX_PROTOCOL =
Matches an attribute value that could be treated by a browser as a URL with a protocol prefix, such as “http:” or “javascript:”. Any string of zero or more characters followed by a colon is considered a match, even if the colon is encoded as an entity and even if it’s an incomplete entity (which IE6 and Opera will still parse).
/\A\s*([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i
- REGEX_UNSUITABLE_CHARS =
Matches one or more characters that should be stripped from HTML before parsing. This is a combination of `REGEX_HTML_CONTROL_CHARACTERS` and `REGEX_HTML_NON_CHARACTERS`.
html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
/(?:#{REGEX_HTML_CONTROL_CHARACTERS}|#{REGEX_HTML_NON_CHARACTERS})/u
- VERSION =
'6.1.3'
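The patterns above can be tried out in isolation. The following is a minimal sketch, not the library's own preprocessing code: the pattern literals are transcribed from the constants listed above (with the entity alternatives of the protocol pattern written out as `&#0*58` and `&#x0*3a`), the non-character pattern is shortened to its BMP code points for brevity, and the helper names `strip_unsuitable` and `protocol_of` are hypothetical.

```ruby
# Patterns transcribed from the constant summary. NON_CHARS_BMP is a
# shortened stand-in (BMP code points only), an assumption of this sketch.
CONTROL_CHARS = /[\u0001-\u0008\u000b\u000e-\u001f\u007f-\u009f]+/u
NON_CHARS_BMP = /[\ufdd0-\ufdef\ufffe\uffff]+/u
UNSUITABLE = /(?:#{CONTROL_CHARS}|#{NON_CHARS_BMP})/u
PROTOCOL = /\A\s*([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper: remove characters the patterns flag as unsuitable.
def strip_unsuitable(text)
  text.gsub(UNSUITABLE, '')
end

# Hypothetical helper: return the lowercased protocol prefix, or nil when
# the value has no protocol-like prefix (for example, a relative path).
def protocol_of(value)
  match = PROTOCOL.match(value)
  match && match[1].downcase
end

# A bell character and a non-character are dropped; the newline is kept.
cleaned = strip_unsuitable("Hello\u0007 World\ufffe!\n")

# The colon is detected even when written as a numeric entity.
javascript = protocol_of('  JavaScript&#58;alert(1)')

# A value that starts with a slash has no protocol prefix.
relative = protocol_of('/docs/page:1')
```

Because the protocol pattern excludes `/` and `#` from the prefix, relative paths and fragment links are not mistaken for protocol-bearing URLs even when they contain a colon later on.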
Instance Attribute Summary
-
#config ⇒ Object
readonly
Returns the value of attribute config.
Class Method Summary
-
.clean ⇒ Object
deprecated
Deprecated.
Use Sanitize.fragment instead.
-
.clean_document ⇒ Object
deprecated
Deprecated.
Use Sanitize.document instead.
-
.clean_node! ⇒ Object
deprecated
Deprecated.
Use Sanitize.node! instead.
-
.document(html, config = {}) ⇒ Object
Returns a sanitized copy of the given full html document, using the settings in config if specified.
-
.fragment(html, config = {}) ⇒ Object
Returns a sanitized copy of the given html fragment, using the settings in config if specified.
-
.node!(node, config = {}) ⇒ Object
Sanitizes the given `Nokogiri::XML::Node` instance and all its children.
Instance Method Summary
-
#document(html) ⇒ Object
(also: #clean_document)
Returns a sanitized copy of the given html document.
-
#fragment(html) ⇒ Object
(also: #clean)
Returns a sanitized copy of the given html fragment.
-
#initialize(config = {}) ⇒ Sanitize
constructor
Returns a new Sanitize object initialized with the settings in config.
-
#node!(node) ⇒ Object
(also: #clean_node!)
Sanitizes the given `Nokogiri::XML::Node` and all its children, modifying it in place.
Constructor Details
#initialize(config = {}) ⇒ Sanitize
Returns a new Sanitize object initialized with the settings in config.
# File 'lib/sanitize.rb', line 92
def initialize(config = {})
  @config = Config.merge(Config::DEFAULT, config)

  @transformers = Array(@config[:transformers]).dup

  # Default transformers always run at the end of the chain, after any custom
  # transformers.
  @transformers << Transformers::CleanElement.new(@config)
  @transformers << Transformers::CleanComment unless @config[:allow_comments]

  if @config[:elements].include?('style')
    scss = Sanitize::CSS.new(config)
    @transformers << Transformers::CSS::CleanElement.new(scss)
  end

  if @config[:attributes].values.any? {|attr| attr.include?('style') }
    scss ||= Sanitize::CSS.new(config)
    @transformers << Transformers::CSS::CleanAttribute.new(scss)
  end

  @transformers << Transformers::CleanDoctype
  @transformers << Transformers::CleanCDATA

  @transformer_config = { config: @config }
end
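The ordering that the constructor establishes can be illustrated without the library itself. The sketch below is a simplified stand-in, not Sanitize's implementation: `build_chain` is a hypothetical helper, and plain symbols take the place of the real transformer objects. It shows only that custom transformers come first, that the default ones are appended conditionally, and that the caller's array is left untouched because of the `dup`.

```ruby
# Hypothetical helper that mimics only the ordering logic of the constructor.
# Symbols stand in for the real transformer objects.
def build_chain(config)
  # Copy the custom transformers so the caller's array is never mutated.
  chain = Array(config[:transformers]).dup

  # Defaults are appended after any custom transformers.
  chain << :clean_element
  chain << :clean_comment unless config[:allow_comments]

  # CSS transformers are added only when 'style' is allowed somewhere.
  chain << :clean_css_element if config[:elements].include?('style')
  if config[:attributes].values.any? { |attrs| attrs.include?('style') }
    chain << :clean_css_attribute
  end

  chain << :clean_doctype
  chain << :clean_cdata
  chain
end

custom = [:my_transformer]
config = {
  transformers: custom,
  allow_comments: false,
  elements: %w[p style],
  attributes: { 'p' => %w[style] }
}

chain = build_chain(config)
```

With this configuration the custom entry stays at the front of `chain` and `custom` still holds a single element afterwards.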
Instance Attribute Details
#config ⇒ Object (readonly)
Returns the value of attribute config.
# File 'lib/sanitize.rb', line 20
def config
  @config
end
Class Method Details
.clean ⇒ Object
Deprecated. Use fragment instead.
Returns a sanitized copy of the given html fragment, using the settings in config if specified.
# File 'lib/sanitize.rb', line 81
def self.fragment(html, config = {})
  Sanitize.new(config).fragment(html)
end
.clean_document ⇒ Object
Deprecated. Use document instead.
# File 'lib/sanitize.rb', line 78
def self.document(html, config = {})
  Sanitize.new(config).document(html)
end
.clean_node! ⇒ Object
Deprecated. Use node! instead.
Sanitizes the given `Nokogiri::XML::Node` instance and all its children.
# File 'lib/sanitize.rb', line 84
def self.node!(node, config = {})
  Sanitize.new(config).node!(node)
end
.document(html, config = {}) ⇒ Object
Returns a sanitized copy of the given full html document, using the settings in config if specified.
When sanitizing a document, the `<html>` element must be allowlisted or an error will be raised. If this is undesirable, you should probably use #fragment instead.
# File 'lib/sanitize.rb', line 60
def self.document(html, config = {})
  Sanitize.new(config).document(html)
end
Instance Method Details
#document(html) ⇒ Object Also known as: clean_document
Returns a sanitized copy of the given html document.
When sanitizing a document, the `<html>` element must be allowlisted or an error will be raised. If this is undesirable, you should probably use #fragment instead.
# File 'lib/sanitize.rb', line 123
def document(html)
  return '' unless html

  doc = Nokogiri::HTML5.parse(preprocess(html), **@config[:parser_options])
  node!(doc)
  to_html(doc)
end
#fragment(html) ⇒ Object Also known as: clean
Returns a sanitized copy of the given html fragment.
# File 'lib/sanitize.rb', line 135
def fragment(html)
  return '' unless html

  frag = Nokogiri::HTML5.fragment(preprocess(html), **@config[:parser_options])
  node!(frag)
  to_html(frag)
end
#node!(node) ⇒ Object Also known as: clean_node!
Sanitizes the given `Nokogiri::XML::Node` and all its children, modifying it in place.
If node is a `Nokogiri::XML::Document`, the `<html>` element must be allowlisted or an error will be raised.
# File 'lib/sanitize.rb', line 151
def node!(node)
  raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)

  if node.is_a?(Nokogiri::XML::Document)
    unless @config[:elements].include?('html')
      raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
    end
  end

  node_allowlist = Set.new

  traverse(node) do |n|
    transform_node!(n, node_allowlist)
  end

  node
end
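The two guards at the top of node! can be sketched with plain Ruby objects. Everything named here is hypothetical: `SketchNode`, `SketchDocument`, and `SketchError` stand in for `Nokogiri::XML::Node`, `Nokogiri::XML::Document`, and `Sanitize::Error`, and the recursive walk is a simplification assumed for illustration, since the library's own traverse and transform_node! are not shown on this page. Unlike node!, the sketch returns the visit order rather than the node.

```ruby
require 'set'

# Hypothetical stand-in for Nokogiri::XML::Node.
class SketchNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

# Hypothetical stand-in for Nokogiri::XML::Document.
class SketchDocument < SketchNode; end

# Hypothetical stand-in for Sanitize::Error.
class SketchError < StandardError; end

# Applies the same two guards as node!, then visits each node depth-first
# and records the names in visit order.
def check_and_walk!(node, allowed_elements, visited = [])
  raise ArgumentError unless node.is_a?(SketchNode)

  if node.is_a?(SketchDocument) && !allowed_elements.include?('html')
    raise SketchError, 'When sanitizing a document, "<html>" must be allowlisted.'
  end

  visited << node.name
  node.children.each { |child| check_and_walk!(child, allowed_elements, visited) }
  visited
end

body = SketchNode.new('body')
html = SketchNode.new('html', [body])
doc = SketchDocument.new('document', [html])

order = check_and_walk!(doc, Set['html', 'body'])
```

Passing a string instead of a node raises `ArgumentError`, and passing `doc` with an allowlist that lacks `'html'` raises `SketchError`, which mirrors the documented behavior for documents.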