Class: Dandruff::Sanitizer

Inherits:
Object
Defined in:
lib/dandruff.rb

Overview

Main sanitizer class handling HTML sanitization logic

This class manages the core sanitization process, configuration, and hooks. It parses HTML, removes dangerous elements and attributes, and serializes the result.

Constant Summary

MATH_SVG_TAGS = %w[math svg].freeze

Constructor Details

#initialize(config = nil) {|config| ... } ⇒ Sanitizer

Initializes a new sanitizer instance

Parameters:

  • config (Config) (defaults to: nil)

    optional configuration object

Yields:

  • (config)

    optional block to configure instance config



# File 'lib/dandruff.rb', line 99

def initialize(config = nil)
  @removed = []
  @config = build_config(config)
  @hooks = create_hooks_map
  @is_supported = check_support
  yield(@config) if block_given?
end
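
The constructor takes a config object, a block, or both. The block runs last, so it can adjust whatever build_config produced. The sketch below mirrors that pattern with hypothetical stand-ins (DemoConfig and DemoSanitizer are not part of Dandruff) so that it runs without the gem. The nil fallback to defaults is an assumption, since build_config is not shown on this page; the option names are borrowed from the sanitize source.

```ruby
# Hypothetical stand-in for Dandruff's Config; only two options are modelled.
DemoConfig = Struct.new(:return_dom, :sanitize_until_stable)

# Hypothetical stand-in that copies the shape of Sanitizer#initialize.
class DemoSanitizer
  attr_reader :config, :removed

  def initialize(config = nil)
    @removed = []
    # Assumption: a nil config falls back to default values.
    @config = config || DemoConfig.new(false, false)
    # The block runs last, so it sees (and may change) the built config.
    yield(@config) if block_given?
  end
end

plain = DemoSanitizer.new
tuned = DemoSanitizer.new { |c| c.return_dom = true }

puts plain.config.return_dom
puts tuned.config.return_dom
```

With the real class, the equivalent call would presumably look like Dandruff::Sanitizer.new { |config| ... }, with the option setters depending on what Config exposes.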

Instance Attribute Details

#config ⇒ Object (readonly)

Returns the value of attribute config.



# File 'lib/dandruff.rb', line 93

def config
  @config
end

#hooks ⇒ Object (readonly)

Returns the value of attribute hooks.



# File 'lib/dandruff.rb', line 93

def hooks
  @hooks
end

#removed ⇒ Object (readonly)

Returns the value of attribute removed.



# File 'lib/dandruff.rb', line 93

def removed
  @removed
end

Instance Method Details

#add_hook(entry_point, &hook_function) ⇒ Object

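Registers a hook block for an entry point

The block is appended to the hooks already registered for entry_point. Calling add_hook without a block does nothing.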



# File 'lib/dandruff.rb', line 108

def add_hook(entry_point, &hook_function)
  return unless hook_function.is_a?(Proc)

  @hooks[entry_point] ||= []
  @hooks[entry_point] << hook_function
end
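
Because the parameter is declared as &hook_function, Ruby converts the block passed by the caller into a Proc, and passes nil when no block is given; that is what the is_a?(Proc) guard relies on. The following sketch copies that logic into a hypothetical DemoHooks class so it can run on its own. The entry point names are illustrative only; the real ones come from create_hooks_map, which is not shown here.

```ruby
# Hypothetical stand-in that copies add_hook's logic from the source above.
class DemoHooks
  attr_reader :hooks

  def initialize
    @hooks = {}
  end

  def add_hook(entry_point, &hook_function)
    # Without a block, hook_function is nil and nothing is registered.
    return unless hook_function.is_a?(Proc)

    @hooks[entry_point] ||= []
    @hooks[entry_point] << hook_function
  end
end

registry = DemoHooks.new
# :demo_entry is an illustrative name, not a documented Dandruff entry point.
registry.add_hook(:demo_entry) { |node| node.to_s.upcase }
registry.add_hook(:demo_entry) { |node| node.to_s.reverse }
registry.add_hook(:ignored) # no block given, so this call is a no-op

# Hooks are stored in registration order.
results = registry.hooks[:demo_entry].map { |hook| hook.call('ab') }
puts results.inspect
puts registry.hooks.key?(:ignored)
```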

#clear_config ⇒ Object

Clears current configuration, resetting to defaults



# File 'lib/dandruff.rb', line 161

def clear_config
  @config = parse_config({})
end

#configure {|config| ... } ⇒ Sanitizer

Configures the sanitizer with a block

Yields:

  • (config)

    the configuration object to modify

Returns:

  • (Sanitizer)

    self, so that calls can be chained



# File 'lib/dandruff.rb', line 155

def configure
  yield(@config) if block_given?
  self
end
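
Since configure returns self, calls can be chained. A minimal sketch of that behaviour, using hypothetical stand-ins (DemoSettings and ChainableSanitizer) in place of the real Config and Sanitizer classes:

```ruby
# Hypothetical stand-in for the config object; option names are taken from
# the sanitize source, but the real Config class is not shown on this page.
DemoSettings = Struct.new(:return_dom, :sanitize_until_stable)

# Hypothetical stand-in that copies Sanitizer#configure.
class ChainableSanitizer
  attr_reader :config

  def initialize
    @config = DemoSettings.new(false, false)
  end

  def configure
    yield(@config) if block_given?
    self # returning self is what makes chaining possible
  end
end

sanitizer = ChainableSanitizer.new
same = sanitizer
  .configure { |c| c.return_dom = true }
  .configure { |c| c.sanitize_until_stable = true }

puts same.equal?(sanitizer)
puts sanitizer.config.to_a.inspect
```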

#remove_all_hooks ⇒ Object

Removes every registered hook by resetting the hook map



# File 'lib/dandruff.rb', line 133

def remove_all_hooks
  @hooks = create_hooks_map
end

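#remove_hook(entry_point, hook_function = nil) ⇒ Object

Removes a single hook from an entry point

When hook_function is given, its most recently added occurrence is removed; otherwise the last hook registered for the entry point is removed. Returns the removed hook, or nil if there was nothing to remove.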



# File 'lib/dandruff.rb', line 115

def remove_hook(entry_point, hook_function = nil)
  arr = @hooks[entry_point]
  return nil unless arr

  if hook_function
    idx = arr.rindex(hook_function)
    return nil unless idx

    arr.delete_at(idx)
  else
    arr.pop
  end
end
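
The removal rules come down to three Array methods: rindex, delete_at and pop. The sketch below applies them to a plain array of lambdas rather than to a real Sanitizer, so the hook names are purely illustrative.

```ruby
log_hook   = ->(node) { node }
strip_hook = ->(node) { node.to_s.strip }

# The same Proc registered twice, with another hook in between.
hooks = [log_hook, strip_hook, log_hook]

# With an explicit hook: rindex finds the LAST matching entry.
idx = hooks.rindex(log_hook)
removed = hooks.delete_at(idx)
puts idx
puts removed.equal?(log_hook)

# Without one: pop drops whichever hook was added most recently.
popped = hooks.pop
puts popped.equal?(strip_hook)

# A Proc that was never registered is not found, so remove_hook returns nil.
missing = hooks.rindex(->(node) { node })
puts missing.inspect
```

Note that the lookup compares Proc objects, so a caller has to keep a reference to the original Proc in order to remove that specific hook later; a new block with identical source does not match.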

#remove_hooks(entry_point) ⇒ Object

Removes all hooks registered for an entry point



# File 'lib/dandruff.rb', line 129

def remove_hooks(entry_point)
  @hooks[entry_point] = []
end

#sanitize(dirty, cfg = {}) ⇒ String, Nokogiri::XML::Document

Also known as: scrub

Main sanitization method

Parses the input HTML, sanitizes elements and attributes, and returns clean HTML.

Parameters:

  • dirty (String, Nokogiri::XML::Node)

    the input to sanitize

  • cfg (Hash) (defaults to: {})

    optional configuration override

Returns:

  • (String, Nokogiri::XML::Document)

sanitized HTML as a string, or a parsed document or fragment when return_dom or return_dom_fragment is enabled



# File 'lib/dandruff.rb', line 172

def sanitize(dirty, cfg = {})
  return dirty unless supported?

  cfg.empty? ? ensure_config : set_config(cfg)
  @removed = []
  return '' if dirty.nil?

  # Handle empty strings - still process through DOM if return_dom is enabled
  if dirty.to_s.strip.empty?
    if @config.return_dom
      return parse_html('')
    elsif @config.return_dom_fragment
      return Nokogiri::HTML5::DocumentFragment.parse('')
    else
      return dirty.to_s
    end
  end

  dirty = dirty.to_s unless dirty.is_a?(String)
  doc = parse_html(dirty)
  sanitize_document(doc)
  output = serialize_html(doc)

  output = resanitize_until_stable(output) if @config.sanitize_until_stable

  if @config.return_dom
    return parse_html(output)
  elsif @config.return_dom_fragment
    return Nokogiri::HTML5::DocumentFragment.parse(output)
  end

  output
end
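
The return value depends on several early exits and two config flags. The helper below is a hypothetical summary of that branching, written from the source above; it does no parsing or sanitizing and only names the kind of value each path would return.

```ruby
# Hypothetical helper: names which branch of #sanitize produces the result.
# It mirrors the order of the checks in the source; it does not sanitize.
def sanitize_result_kind(dirty, supported: true, return_dom: false,
                         return_dom_fragment: false)
  return :input_unchanged unless supported # unsupported environment
  return :empty_string if dirty.nil?       # nil wins even over return_dom

  blank = dirty.to_s.strip.empty?
  if return_dom
    :document # return_dom takes precedence over return_dom_fragment
  elsif return_dom_fragment
    :fragment
  else
    blank ? :input_as_string : :sanitized_string
  end
end

puts sanitize_result_kind(nil, return_dom: true)
puts sanitize_result_kind('   ')
puts sanitize_result_kind('<b>x</b>')
puts sanitize_result_kind('<b>x</b>', return_dom: true, return_dom_fragment: true)
```

Two details are easy to miss: a nil input yields an empty string even when a DOM return is requested, and a non-empty cfg is passed to set_config, which replaces the instance's configuration rather than applying to that one call only. Whether a later call with an empty cfg restores earlier settings depends on ensure_config, which is not shown here.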

#set_config(cfg = {}) ⇒ Object

Replaces the sanitizer's configuration with one parsed from the given options

Parameters:

  • cfg (Hash) (defaults to: {})

    configuration options



# File 'lib/dandruff.rb', line 147

def set_config(cfg = {})
  @config = parse_config(cfg)
end

#supported? ⇒ Boolean

Checks if the current environment supports Dandruff functionality

Returns:

  • (Boolean)

    true if Nokogiri is available, false otherwise



# File 'lib/dandruff.rb', line 140

def supported?
  @is_supported
end