Module: FriendlyFormat

Defined in: lib/friendly_format.rb,
lib/friendly_format/version.rb,
lib/friendly_format/set_common.rb,
lib/friendly_format/set_strict.rb,
lib/friendly_format/adapter/hpricot_adapter.rb,
lib/friendly_format/adapter/nokogiri_adapter.rb

Overview

2008-05-09 godfat

Defined Under Namespace

Modules: HpricotAdapter, NokogiriAdapter

Classes: SetCommon, SetStrict

Constant Summary

VERSION = '0.7.3'

Class Attribute Summary

.adapter ⇒ Object

Class Method Summary

.attrs2str(attrs) ⇒ Object
.escape_ltgt(text) ⇒ Object private
.escape_ltgt_inside_pre(html, allowed_tags) ⇒ Object private

perhaps we should escape everything inside code instead of pre?
.force_encoding(output, input) ⇒ Object private

force encoding for ruby 1.9.
.format_article(html, *args) ⇒ Object

formats an entire article; pass the allowed tags as extra arguments.
.format_article_entrance(html, allowed_tags = Set.new) ⇒ Object private

recursion entrance.
.format_article_rec(elem, allowed_tags = Set.new, tag_name = nil) ⇒ Object private

recursion.
.format_autolink(html, attrs = {}) ⇒ Object

automatically wraps text that starts with an http/ftp/mailto/etc. protocol in an “a href” tag.
.format_autolink_rec(elem, attrs = {}) ⇒ Object private
.format_autolink_regexp(text, attrs = {}) ⇒ Object

translated from drupal-6.2/modules/filter/filter.module. same as format_autolink, but uses only regexp instead of Hpricot.
.format_newline(text) ⇒ Object

convert newline character(s) to <br />.
.format_url(text, attrs = {}) ⇒ Object private

same as format_autolink_regexp, but simplified; it cannot process text that mixes html and plain text.
.node_attrs(node) ⇒ Object
.node_attrs_reject_js(node) ⇒ Object
.node_tag_escape(node) ⇒ Object
.node_tag_normal(node) ⇒ Object
.node_tag_single(node) ⇒ Object
.trim(text, length = 75) ⇒ Object private

extract it to public?

Class Attribute Details

.adapter ⇒ `Object`

# File 'lib/friendly_format.rb', line 13

def adapter
  @adapter ||= begin
                 NokogiriAdapter
               rescue LoadError
                 HpricotAdapter
               end
end

Class Method Details

.attrs2str(attrs) ⇒ `Object`

# File 'lib/friendly_format.rb', line 248

def attrs2str attrs
  # TODO: no need to convert to hash for nokogiri
  Hash[attrs].sort.inject(''){ |i, (k, v)| i + " #{k}=\"#{v}\"" }
end
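
A minimal standalone sketch of the fold above, assuming plain string keys and values rather than parser attribute objects. The method is re-declared at top level only so the snippet runs without the gem.

```ruby
# Standalone copy of the listing, for illustration only.
def attrs2str(attrs)
  # Hash[] accepts a Hash or an array of [key, value] pairs;
  # sorting by key makes the attribute order deterministic.
  Hash[attrs].sort.inject('') { |acc, (k, v)| acc + " #{k}=\"#{v}\"" }
end

p attrs2str('href' => 'http://example.com', 'class' => 'ext')
# => " class=\"ext\" href=\"http://example.com\""
```

Each attribute is emitted with a leading space, so the result can be appended directly after a tag name. Values are interpolated as-is; this sketch does no quoting or escaping of its own.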

.escape_ltgt(text) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.



# File 'lib/friendly_format.rb', line 211

def escape_ltgt text
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

.escape_ltgt_inside_pre(html, allowed_tags) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

perhaps we should escape all inside code instead of pre?

# File 'lib/friendly_format.rb', line 132

def escape_ltgt_inside_pre html, allowed_tags
  return html unless allowed_tags.member?('pre')
  # don't bother nested pre, because we escape all tags in pre
  html = html + '</pre>' unless html =~ %r{</pre>}i
  html.gsub(%r{<pre>(.*)</pre>}mi){
    # stop escaping for '>' because drupal's url filter would make &gt; into url...
    # is there any other way to get matched group?
    "<pre>#{escape_ltgt($1)}</pre>"
  }
end
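
The behaviour of the greedy `(.*)` match is easier to see on small inputs. The sketch below re-declares the two methods at top level so it runs without the gem; the sample strings are made up for illustration.

```ruby
require 'set'

# Standalone copies of the listings, for illustration only.
def escape_ltgt(text)
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

def escape_ltgt_inside_pre(html, allowed_tags)
  return html unless allowed_tags.member?('pre')
  # Append a closing tag when none is present, so the pattern can match.
  html += '</pre>' unless html =~ %r{</pre>}i
  html.gsub(%r{<pre>(.*)</pre>}mi) { "<pre>#{escape_ltgt($1)}</pre>" }
end

allowed = Set['pre']

p escape_ltgt_inside_pre('<pre><b>x</b></pre>', allowed)
# => "<pre>&lt;b&gt;x&lt;/b&gt;</pre>"

p escape_ltgt_inside_pre('<pre><b>x', allowed)
# => "<pre>&lt;b&gt;x</pre>"

# Greedy match: everything between the first <pre> and the last </pre>
# is escaped, including markup that sits between two pre blocks.
p escape_ltgt_inside_pre('<pre>a</pre><i>b</i><pre>c</pre>', allowed)
# => "<pre>a&lt;/pre&gt;&lt;i&gt;b&lt;/i&gt;&lt;pre&gt;c</pre>"
```

When 'pre' is not in the allowed set, the input is returned unchanged.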

.force_encoding(output, input) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

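forces output to the encoding of input on ruby 1.9 and later, where String#force_encoding exists; on older rubies output is returned unchanged.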

# File 'lib/friendly_format.rb', line 217

def force_encoding output, input
  if output.respond_to?(:force_encoding)
    output.force_encoding(input.encoding)
  else
    output
  end
end
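
A small sketch of the effect, assuming a parser handed back the right bytes under the wrong encoding tag. The method is re-declared at top level so the snippet runs on its own; the sample strings are illustrative.

```ruby
# Standalone copy of the listing, for illustration only.
def force_encoding(output, input)
  if output.respond_to?(:force_encoding)
    output.force_encoding(input.encoding)
  else
    output
  end
end

input  = "caf\u00E9"              # UTF-8 string
output = "caf\xC3\xA9".b          # same bytes, tagged as binary (ASCII-8BIT)

result = force_encoding(output, input)
p result.encoding == Encoding::UTF_8  # => true
p result == input                     # => true
```

String#force_encoding only relabels the bytes and modifies its receiver in place, so `output` itself carries the new encoding after the call.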

.format_article(html, *args) ⇒ `Object`

formats an entire article; pass the allowed tags as extra arguments. each argument may be a String, a Symbol, or a Set of either; any other type raises TypeError. by default no tags are allowed, so every tag is escaped. the input is parsed with the configured adapter (see .adapter).

# File 'lib/friendly_format.rb', line 27

def format_article html, *args
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_article_entrance(html,
      args.inject(Set.new){ |allowed_tags, arg|
        case arg
          when String; allowed_tags << arg
          when Symbol; allowed_tags << arg.to_s
          when Set;    allowed_tags += Set.new(arg.map{|a|a.to_s})
          else; raise(TypeError.new("expected String|Symbol|Set, got #{arg.class}"))
        end
        allowed_tags
      }),
    html)
end
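
The argument handling can be read in isolation as a fold into a Set of strings. A minimal sketch follows; `normalize_allowed_tags` is a hypothetical helper name used only here, not part of FriendlyFormat.

```ruby
require 'set'

# Hypothetical helper mirroring the inject block in the listing.
def normalize_allowed_tags(*args)
  args.inject(Set.new) { |allowed_tags, arg|
    case arg
    when String then allowed_tags << arg
    when Symbol then allowed_tags << arg.to_s
    when Set    then allowed_tags.merge(arg.map { |a| a.to_s })
    else raise TypeError, "expected String|Symbol|Set, got #{arg.class}"
    end
  }
end

tags = normalize_allowed_tags(:b, 'a', Set[:pre, 'code'])
p tags.member?('pre')  # => true
p tags.size            # => 4
```

Every entry ends up as a String, which matches the `allowed_tags.member?(e.name)` lookups in the recursion, where tag names are strings.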

.format_article_entrance(html, allowed_tags = Set.new) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

recursion entrance

# File 'lib/friendly_format.rb', line 168

def format_article_entrance html, allowed_tags = Set.new
  format_article_rec(
    adapter.parse(escape_ltgt_inside_pre(html, allowed_tags)),
    allowed_tags)
end

.format_article_rec(elem, allowed_tags = Set.new, tag_name = nil) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

recursion

# File 'lib/friendly_format.rb', line 176

def format_article_rec(elem, allowed_tags = Set.new, tag_name = nil)

  elem.children.map{ |e|
    if e.text?
      result = e.to_html
      case tag_name
        when 'pre'; format_url(    result)
        when   'a'; format_newline(result)
        else      ; format_newline(format_url(result))
      end

    elsif e.elem?
      if allowed_tags.member?(e.name)
        if adapter.empty?(e)
          node_tag_single(e)
        else
          node_tag_normal(e) +
          format_article_rec(e, allowed_tags, e.name) +
          "</#{e.name}>"
        end
      else
        node_tag_escape(e) +
        if adapter.empty?(e)
          ''
        else
          format_article_rec(e, allowed_tags) +
          "&lt;/#{e.name}&gt;"
        end
      end

    end
  }.join
end

.format_autolink(html, attrs = {}) ⇒ `Object`

automatically wraps text that starts with an http/ftp/mailto/etc. protocol in an “a href” tag. the input is parsed with the configured adapter, and a simplified regexp translated from drupal locates the targets in text nodes. see format_url.

# File 'lib/friendly_format.rb', line 48

def format_autolink html, attrs = {}
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_autolink_rec(
      FriendlyFormat.adapter.parse(html), attrs),
    html)
end

.format_autolink_rec(elem, attrs = {}) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

# File 'lib/friendly_format.rb', line 144

def format_autolink_rec elem, attrs = {}
  elem.children.map{ |e|
    if e.text?
      format_url(e.content, attrs)

    elsif e.elem?
      if adapter.empty?(e)
        adapter.to_xhtml(e)
      else
        node_tag_normal(e) +
        format_autolink_rec(e, attrs) +
        "</#{e.name}>"
      end

    else
      e

    end

  }.join
end

.format_autolink_regexp(text, attrs = {}) ⇒ `Object`

translated from drupal-6.2/modules/filter/filter.module. same as format_autolink, but uses only regexp instead of Hpricot.

# File 'lib/friendly_format.rb', line 60

def format_autolink_regexp text, attrs = {}
  attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join
  # Match absolute URLs.
  " #{text}".gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)])?)}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="'+match[2]+'" title="'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[5]

  # Match e-mail addresses.
  }.gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i, '\1<a href="mailto:\2">\2</a>\3').

  # Match www domains/addresses.
  gsub(%r{(<p>|<li>|[ \n\r\t\(])(www\.[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+~#\&=/;-])([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="http://'+match[2]+'" title="http://'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[3]
  }[1..-1]
end
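
The listing pads the text with a leading space and strips it again with `[1..-1]`. Since every pattern requires a delimiter before the match, this reads as a way to let a URL at the very start of the text match. A reduced sketch of that idea, with a deliberately simplified pattern (`PATTERN` and `link_with_padding` are hypothetical names, not the library's):

```ruby
# Simplified stand-in for the patterns above: a delimiter, then an http(s) URL.
PATTERN = %r{([ \n\r\t\(])(https?://[a-zA-Z0-9./-]*[a-zA-Z0-9/-])}

def link_with_padding(text)
  # Pad so a URL at position 0 still has a delimiter in front of it,
  # then drop the padding character from the result.
  " #{text}".gsub(PATTERN) { "#{$1}<a href=\"#{$2}\">#{$2}</a>" }[1..-1]
end

p link_with_padding('http://example.com is up')
# => "<a href=\"http://example.com\">http://example.com</a> is up"

# Without the padding, the leading URL has no delimiter before it.
p 'http://example.com is up'.gsub(PATTERN) { "#{$1}<a href=\"#{$2}\">#{$2}</a>" }
# => "http://example.com is up"
```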

.format_newline(text) ⇒ `Object`

convert newline character(s) to <br />

# File 'lib/friendly_format.rb', line 86

def format_newline text
  # windows: \r\n
  # mac os 9: \r
  text.gsub("\r\n", "\n").tr("\r", "\n").gsub("\n", "<br />\n")
end
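
A quick standalone check of the three newline conventions, with the method re-declared at top level so the snippet runs without the gem:

```ruby
# Standalone copy of the listing, for illustration only.
def format_newline(text)
  # Normalize \r\n and bare \r to \n first, then insert the break tag.
  text.gsub("\r\n", "\n").tr("\r", "\n").gsub("\n", "<br />\n")
end

p format_newline("a\r\nb\rc\n")
# => "a<br />\nb<br />\nc<br />\n"
```

Normalizing `\r\n` before bare `\r` matters: in the other order a Windows line ending would turn into two breaks.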

.format_url(text, attrs = {}) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

same as format_autolink_regexp, but simplified; it cannot process text that mixes html and plain text. used in format_autolink.

# File 'lib/friendly_format.rb', line 113

def format_url text, attrs = {}
  # translated from drupal-6.2/modules/filter/filter.module
  # Match absolute URLs.
  text.gsub(
  %r{((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\.)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)}i){ |match|
    url = $1 # is there any other way to get this variable?
    caption = trim(url)
    html_attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join

    # Match www domains/addresses.
    url = "http://#{url}" unless url =~ %r{^http://}
    "<a href=\"#{url}\" title=\"#{url}\"#{html_attrs}>#{caption}</a>"
  # Match e-mail addresses.
  }.gsub( %r{([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)}i,
          '<a href="mailto:\1">\1</a>')
end
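
In the listing, the prefix step adds http:// to every match that does not already start with http://, which as written also covers https://, ftp:// and mailto: matches. A hedged sketch of a narrower check, assuming the intent is to prefix only www. matches; `ensure_scheme` is a hypothetical helper, not part of FriendlyFormat.

```ruby
# Hypothetical helper: add a scheme only when the match starts with "www.".
def ensure_scheme(url)
  url =~ /\Awww\./i ? "http://#{url}" : url
end

p ensure_scheme('www.example.com')      # => "http://www.example.com"
p ensure_scheme('https://example.com')  # => "https://example.com"

# The check used in the listing, applied to the same https URL:
url = 'https://example.com'
url = "http://#{url}" unless url =~ %r{^http://}
p url                                   # => "http://https://example.com"
```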

.node_attrs(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 237

def node_attrs node
  attrs2str(node.attributes)
end

.node_attrs_reject_js(node) ⇒ `Object`

# File 'lib/friendly_format.rb', line 241

def node_attrs_reject_js node
  # TODO: no need to convert to hash for nokogiri
  attrs2str(Hash[node.attributes].reject{ |k, v|
    k      =~ /\Aon/ ||
    v.to_s =~ /\Ajavascript/ })
end
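
The filter itself can be tried on a plain Hash. A minimal sketch, assuming string keys and values; `reject_js` is a hypothetical name for the reject step alone, without the `attrs2str` call.

```ruby
# Hypothetical helper: the reject step from the listing, on a plain Hash.
def reject_js(attrs)
  Hash[attrs].reject { |k, v|
    k      =~ /\Aon/ ||         # event handlers such as onclick, onload
    v.to_s =~ /\Ajavascript/    # javascript: URLs
  }
end

attrs = {
  'href'    => 'javascript:alert(1)',
  'onclick' => 'steal()',
  'class'   => 'note'
}
p reject_js(attrs).keys  # => ["class"]
```

Both patterns are case-sensitive as listed, so a value such as 'JavaScript:alert(1)' would pass this sketch unchanged; whether the adapters normalize case before this point is not verified here.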

.node_tag_escape(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 233

def node_tag_escape node
  "&lt;#{node.name}#{node_attrs(node)}&gt;"
end

.node_tag_normal(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 229

def node_tag_normal node
  "<#{node.name}#{node_attrs_reject_js(node)}>"
end

.node_tag_single(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 225

def node_tag_single node
  "<#{node.name}#{node_attrs_reject_js(node)} />"
end

.trim(text, length = 75) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

extract it to public?

# File 'lib/friendly_format.rb', line 98

def trim text, length = 75
  # Reserve 3 characters for the trailing '...'.
  if text.size <= 3
    '...'
  elsif text.size > length
    "#{text[0...length-3]}..."
  else
    text
  end
end
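
A standalone sketch of the three branches, with the method re-declared at top level so the snippet runs without the gem:

```ruby
# Standalone copy of the listing, for illustration only.
def trim(text, length = 75)
  if text.size <= 3
    '...'
  elsif text.size > length
    # Keep length - 3 characters so the result is exactly `length` long.
    "#{text[0...length - 3]}..."
  else
    text
  end
end

p trim('abc')               # => "..."
p trim('hello')             # => "hello"
p trim('abcdefghij', 8)     # => "abcde..."
p trim('a' * 80).size       # => 75
```

Note the first branch: any text of three characters or fewer is replaced by '...' rather than returned as-is.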
