Class: HTMLFilter

Inherits:
Object
Defined in:
lib/htmlfilter.rb

Overview

HTML Filter

The HTML Filter library sanitizes and sterilizes HTML. This is a good idea if you let users submit HTML, for instance in comments.

HtmlFilter is a port of lib_filter.php, v1.15 by Cal Henderson <[email protected]> licensed under a Creative Commons Attribution-ShareAlike 2.5 License creativecommons.org/licenses/by-sa/3.0/.

Usage

hf = HTMLFilter.new
hf.filter("<b>Bold Action")  #=> "<b>Bold Action</b>"

Reference

Issues

  • The built-in option constants could use a fair bit of refinement.

  • Eventually the old HtmlFilter name needs to be deprecated.

Constant Summary

VERSION =

Library version.

"1.2.0"
DEFAULT =

Default settings

{
  'allowed' => {
    'a'   => ['href', 'target'],
    'img' => ['src', 'width', 'height', 'alt'],
    'b'   => [],
    'i'   => [],
    'em'  => [],
    'tt'  => [],
  },
  'no_close' => ['img', 'br', 'hr'],
  'always_close' => ['a', 'b'],
  'protocol_attributes' => ['src', 'href'],
  'allowed_protocols' => ['http', 'ftp', 'mailto'],
  'remove_blanks' => ['a', 'b'],
  'strip_comments' => true,
  'always_make_tags' => true,
  'allow_numbered_entities' => true,
  'allowed_entities' => ['amp', 'gt', 'lt', 'quot']
}
BASIC =

Basic settings are similar to DEFAULT but do not allow any type of link, neither a href nor img.

{
  'allowed' => {
    'b'   => [],
    'i'   => [],
    'em'  => [],
    'tt'  => [],
  },
  'no_close' => ['img', 'br', 'hr'],
  'always_close' => ['a', 'b'],
  'protocol_attributes' => ['src', 'href'],
  'allowed_protocols' => ['http', 'ftp', 'mailto'],
  'remove_blanks' => ['a', 'b'],
  'strip_comments' => true,
  'always_make_tags' => true,
  'allow_numbered_entities' => true,
  'allowed_entities' => ['amp', 'gt', 'lt', 'quot']
}
STRICT =

Strict settings do not allow any tags.

{
  'allowed' => {},
  'no_close' => ['img', 'br', 'hr'],
  'always_close' => ['a', 'b'],
  'protocol_attributes' => ['src', 'href'],
  'allowed_protocols' => ['http', 'ftp', 'mailto'],
  'remove_blanks' => ['a', 'b'],
  'strip_comments' => true,
  'always_make_tags' => true,
  'allow_numbered_entities' => true,
  'allowed_entities' => ['amp', 'gt', 'lt', 'quot']
}
RELAXED =

Relaxed settings allow a great deal of the HTML spec.

TODO: Need to expand upon RELAXED options.

{
  'allowed' => {
    'a'    => ['class', 'href', 'target'],
    'b'    => ['class'],
    'i'    => ['class'],
    'img'  => ['class', 'src', 'width', 'height', 'alt'],
    'div'  => ['class'],
    'pre'  => ['class'],
    'code' => ['class'],
    'ul'   => ['class'], 'ol' => ['class'], 'li' => ['class']
  },
  'no_close' => ['img', 'br', 'hr'],
  'always_close' => ['a', 'b'],
  'protocol_attributes' => ['src', 'href'],
  'allowed_protocols' => ['http', 'ftp', 'mailto'],
  'remove_blanks' => ['a', 'b'],
  'strip_comments' => true,
  'always_make_tags' => true,
  'allow_numbered_entities' => true,
  'allowed_entities' => ['amp', 'gt', 'lt', 'quot']
}
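
The option constants are nested hashes, so a custom configuration can be built by copying one and adjusting the copy. The following is a minimal sketch in plain Ruby. BASE_OPTIONS is a stand-in copied from part of the documented DEFAULT value, and deep_copy is a hypothetical helper, not part of the library. The copy is made deep because Hash#dup is shallow: the nested 'allowed' hash would otherwise still be shared with the original.

```ruby
# Stand-in for part of the documented HTMLFilter::DEFAULT value (assumption:
# copied here so the sketch runs without the library installed).
BASE_OPTIONS = {
  'allowed' => {
    'a'   => ['href', 'target'],
    'img' => ['src', 'width', 'height', 'alt'],
    'b'   => [],
    'i'   => []
  },
  'allowed_protocols' => ['http', 'ftp', 'mailto'],
  'strip_comments'    => true
}

# Hypothetical helper: copy nested hashes and arrays so that edits to the
# copy never reach the original constant.
def deep_copy(value)
  case value
  when Hash  then value.each_with_object({}) { |(k, v), h| h[k] = deep_copy(v) }
  when Array then value.map { |v| deep_copy(v) }
  else value
  end
end

custom = deep_copy(BASE_OPTIONS)
custom['allowed']['blockquote'] = ['cite']   # allow one more tag
custom['allowed_protocols'] << 'https'       # allow one more protocol
```

A hash built this way could then be passed to HTMLFilter.new in place of one of the built-in constants.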

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(options = nil) ⇒ HTMLFilter

New html filter.

Provide custom options, or use one of the built-in options constants.

hf = HTMLFilter.new(HTMLFilter::RELAXED)
hf.filter(htmlstr)


# File 'lib/htmlfilter.rb', line 174

def initialize(options=nil)
  if options
    h = DEFAULT.dup
    options.each do |k,v|
      h[k.to_s] = v
    end
    options = h
  else
    options = DEFAULT.dup
  end
  options.each{ |k,v| send("#{k}=",v) }
end
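
The constructor's merging logic can be tried in isolation. The sketch below uses a hypothetical stand-in class, OptionDemo, with only two of the documented options. It mirrors the key normalization and writer dispatch shown above and does no filtering.

```ruby
# Hypothetical stand-in class: it mirrors only the option-merging logic of
# the constructor shown above, not the filtering itself.
class OptionDemo
  DEFAULT = {
    'strip_comments'    => true,
    'allowed_protocols' => ['http', 'ftp', 'mailto']
  }

  attr_accessor :strip_comments, :allowed_protocols

  def initialize(options = nil)
    merged = DEFAULT.dup
    # Keys are normalized with to_s, so symbols and strings both work.
    (options || {}).each { |k, v| merged[k.to_s] = v }
    # Each key is dispatched to the matching attribute writer.
    merged.each { |k, v| send("#{k}=", v) }
  end
end

demo = OptionDemo.new(:strip_comments => false)
demo.strip_comments      # false, taken from the argument
demo.allowed_protocols   # the DEFAULT value, since no override was given
```

In this sketch, a key with no matching writer raises NoMethodError, because send is called with a method name that does not exist.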

Instance Attribute Details

#allow_numbered_entities ⇒ Object

entity control option (true, false)



# File 'lib/htmlfilter.rb', line 77

def allow_numbered_entities
  @allow_numbered_entities
end

#allowed ⇒ Object

tags and attributes that are allowed

E.g.

{
  'a' => ['href', 'target'],
  'b' => [],
  'img' => ['src', 'width', 'height', 'alt']
}


# File 'lib/htmlfilter.rb', line 50

def allowed
  @allowed
end

#allowed_entities ⇒ Object

entity control option (amp, gt, lt, quot, etc.)



# File 'lib/htmlfilter.rb', line 80

def allowed_entities
  @allowed_entities
end
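
To illustrate how the two entity options could work together, here is a minimal sketch. The helper escape_stray_ampersands is hypothetical and is not the library's own validate_entities step. It escapes every "&" that does not begin an allowed named entity or, when permitted, a numbered entity.

```ruby
# Hypothetical helper, not the library's validate_entities: escape every "&"
# that does not begin an allowed entity.
def escape_stray_ampersands(text, allowed_entities, allow_numbered)
  patterns = allowed_entities.map { |name| Regexp.escape(name) }
  patterns << '#\d+' if allow_numbered
  # With nothing allowed, every ampersand is escaped.
  return text.gsub('&', '&amp;') if patterns.empty?
  # Negative lookahead: leave "&" alone only when an allowed entity follows.
  text.gsub(/&(?!(?:#{patterns.join('|')});)/, '&amp;')
end

escape_stray_ampersands('a & b &lt; c &copy; &#169;', %w[amp gt lt quot], true)
#=> "a &amp; b &lt; c &amp;copy; &#169;"
```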

#allowed_protocols ⇒ Object

protocols which are allowed (http, ftp, mailto)



# File 'lib/htmlfilter.rb', line 64

def allowed_protocols
  @allowed_protocols
end

#always_close ⇒ Object

tags which must always have separate opening and closing tags (e.g. “<b></b>”)



# File 'lib/htmlfilter.rb', line 57

def always_close
  @always_close
end

#always_make_tags ⇒ Object

should we try and make a <b> tag out of “b>” (true, false)



# File 'lib/htmlfilter.rb', line 74

def always_make_tags
  @always_make_tags
end

#no_close ⇒ Object

tags which should always be self-closing (e.g. “<img />”)



# File 'lib/htmlfilter.rb', line 53

def no_close
  @no_close
end

#protocol_attributes ⇒ Object

attributes which should be checked for valid protocols (src,href)



# File 'lib/htmlfilter.rb', line 61

def protocol_attributes
  @protocol_attributes
end
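
The allowed_protocols and protocol_attributes options describe a check of roughly the following shape. This is a minimal sketch only. The helper protocol_allowed? is hypothetical, and accepting scheme-less (relative) values is an assumption of the sketch, not documented library behavior.

```ruby
# Hypothetical helper, not the library's own check: decide whether an
# attribute value uses one of the allowed protocols.
def protocol_allowed?(value, allowed_protocols)
  # Capture a leading URL scheme such as "http" or "mailto", if present.
  scheme = value.strip[/\A([a-z][a-z0-9+.\-]*):/i, 1]
  # Assumption for this sketch: values without a scheme (relative URLs)
  # are accepted.
  return true if scheme.nil?
  allowed_protocols.include?(scheme.downcase)
end

allowed = ['http', 'ftp', 'mailto']
protocol_allowed?('http://example.com/', allowed)   #=> true
protocol_allowed?('javascript:alert(1)', allowed)   #=> false
```

The sketch does not handle schemes obfuscated with entities or embedded control characters, which a production sanitizer also has to consider.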

#remove_blanks ⇒ Object

tags which should be removed if they contain no content (e.g. “<b></b>” or “<b />”)



# File 'lib/htmlfilter.rb', line 68

def remove_blanks
  @remove_blanks
end
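
The effect of remove_blanks can be approximated with two regular expressions. This is a minimal sketch. The helper remove_blank_tags is hypothetical and is not the library's process_remove_blanks step. It works in a single pass, so nested empty tags would need repeated calls.

```ruby
# Hypothetical helper, not the library's process_remove_blanks: drop listed
# tags when they have no content.
def remove_blank_tags(html, tags)
  names = tags.map { |t| Regexp.escape(t) }.join('|')
  # An opening tag immediately followed by its own closing tag.
  empty_pair  = /<(#{names})(\s[^>]*)?>\s*<\/\1>/i
  # A self-closed form such as <b /> or <b/>.
  self_closed = /<(#{names})(\s[^>]*)?\/>/i
  html.gsub(empty_pair, '').gsub(self_closed, '')
end

remove_blank_tags("x<b></b>y<b />z<a href='#'></a>", ['a', 'b'])
#=> "xyz"
```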

#strip_comments ⇒ Object

should we remove comments? (true, false)



# File 'lib/htmlfilter.rb', line 71

def strip_comments
  @strip_comments
end

Instance Method Details

#filter(html) ⇒ Object

Filter html string.



# File 'lib/htmlfilter.rb', line 189

def filter(html)
  @tag_counts = {}
  html = escape_comments(html)
  html = balance_html(html)
  html = check_tags(html)
  html = process_remove_blanks(html)
  html = validate_entities(html)
  #html = truncate_html(html)
  html
end
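
The method above threads the string through a fixed sequence of steps, each taking the previous step's output. The same shape can be sketched generically. The three stages below are hypothetical stand-ins for illustration, not the library's private methods.

```ruby
# Hypothetical stages standing in for the library's private steps.
strip_comments = ->(html) { html.gsub(/<!--.*?-->/m, '') }
squeeze_spaces = ->(html) { html.gsub(/[ \t]+/, ' ') }
trim           = ->(html) { html.strip }

# Feed the output of each stage into the next, in the order given.
def run_stages(html, stages)
  stages.reduce(html) { |current, stage| stage.call(current) }
end

run_stages("  Hello <!-- note -->  world ", [strip_comments, squeeze_spaces, trim])
#=> "Hello world"
```

Order matters in such a pipeline: a later stage only ever sees what the earlier stages have left behind.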