Module: FriendlyFormat

Defined in: lib/friendly_format.rb,
lib/friendly_format/version.rb,
lib/friendly_format/set_common.rb,
lib/friendly_format/set_strict.rb,
lib/friendly_format/adapter/hpricot_adapter.rb,
lib/friendly_format/adapter/nokogiri_adapter.rb

Overview

2008-05-09 godfat

Defined Under Namespace

Modules: HpricotAdapter, NokogiriAdapter

Classes: SetCommon, SetStrict

Constant Summary

VERSION = '0.7.0'

Class Attribute Summary

.adapter ⇒ Object

Class Method Summary

.attrs2str(attrs) ⇒ Object
.escape_ltgt(text) ⇒ Object private
.escape_ltgt_inside_pre(html, allowed_tags) ⇒ Object private

Escapes < and > inside pre when pre is an allowed tag. Perhaps we should escape everything inside code instead of pre?
.force_encoding(output, input) ⇒ Object private

Forces the output string to the input's encoding (for Ruby 1.9).
.format_article(html, *args) ⇒ Object

Formats an entire article, given the tags to allow.
.format_article_entrance(html, allowed_tags = Set.new) ⇒ Object private

Entry point for the recursion.
.format_article_rec(elem, allowed_tags = Set.new, tag_name = nil) ⇒ Object private

Recursively formats the children of elem.
.format_autolink(html, attrs = {}) ⇒ Object

Automatically wraps text starting with a protocol such as http, ftp, or mailto in an “a href” tag.
.format_autolink_rec(elem, attrs = {}) ⇒ Object private
.format_autolink_regexp(text, attrs = {}) ⇒ Object

Translated from drupal-6.2/modules/filter/filter.module. Same as format_autolink, but uses only regexps instead of Hpricot.
.format_newline(text) ⇒ Object

Converts newline character(s) to <br />.
.format_url(text, attrs = {}) ⇒ Object private

Same as format_autolink_regexp, but simplified; it cannot process text that mixes HTML and plain text.
.node_attrs(node) ⇒ Object
.node_attrs_reject_js(node) ⇒ Object
.node_tag_escape(node) ⇒ Object
.node_tag_normal(node) ⇒ Object
.node_tag_single(node) ⇒ Object
.trim(text, length = 75) ⇒ Object private

Truncates text longer than length, ending it with '...'. Extract it to public?

Class Attribute Details

.adapter ⇒ `Object`

# File 'lib/friendly_format.rb', line 14

def adapter
  @adapter ||= begin
                 HpricotAdapter
               rescue LoadError
                 begin
                   NokogiriAdapter
                 rescue LoadError
                   LibxmlAdapter
                 end
               end
end
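
The rescue LoadError clauses suggest that referencing an adapter constant loads its parser library on demand, falling back to the next adapter when that library is missing. The following is a minimal sketch of that fallback pattern only; it assumes nothing about how FriendlyFormat wires up its adapters, and `first_available` and the library names are hypothetical.

```ruby
# Hypothetical helper: return the first library name that can be required,
# or nil when none of the candidates can be loaded.
def first_available(*candidates)
  candidates.find do |lib|
    begin
      require lib
      true
    rescue LoadError
      false # not installed; try the next candidate
    end
  end
end

# 'no_such_parser_library' is a made-up name that should fail to load;
# 'set' ships with Ruby, so it acts as the fallback here.
puts first_available('no_such_parser_library', 'set')
```

Memoizing the result with `||=`, as the listing above does, means the probing happens only on the first call.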

Class Method Details

.attrs2str(attrs) ⇒ `Object`



# File 'lib/friendly_format.rb', line 250

def attrs2str attrs
  attrs.sort.inject(''){ |i, (k, v)| i + " #{k}=\"#{v}\"" }
end
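
As an illustration, here is a standalone copy of the method applied to a plain Hash. Treating the attributes as a Hash of strings is an assumption made for the example; real attributes come from the parser adapter.

```ruby
# Standalone copy of attrs2str as listed above, for illustration only.
def attrs2str(attrs)
  attrs.sort.inject('') { |acc, (k, v)| acc + " #{k}=\"#{v}\"" }
end

# Pairs are sorted by key, and each pair is prefixed with a space.
puts attrs2str('href' => 'http://example.com', 'class' => 'external')
# prints: class="external" href="http://example.com" (with a leading space)
```

Values are interpolated as they are, so a value containing a double quote would not be escaped by this method.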

.escape_ltgt(text) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.



# File 'lib/friendly_format.rb', line 216

def escape_ltgt text
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

.escape_ltgt_inside_pre(html, allowed_tags) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Escapes < and > inside pre when pre is an allowed tag. Perhaps we should escape everything inside code instead of pre?

# File 'lib/friendly_format.rb', line 137

def escape_ltgt_inside_pre html, allowed_tags
  return html unless allowed_tags.member?('pre')
  # don't bother nested pre, because we escape all tags in pre
  html = html + '</pre>' unless html =~ %r{</pre>}i
  html.gsub(%r{<pre>(.*)</pre>}mi){
    # stop escaping for '>' because drupal's url filter would make &gt; into url...
    # is there any other way to get matched group?
    "<pre>#{escape_ltgt($1)}</pre>"
  }
end
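
A short sketch of the effect, using standalone copies of escape_ltgt and escape_ltgt_inside_pre so that it runs without the gem. The sample HTML is made up for the example.

```ruby
require 'set'

# Standalone copies of the two methods listed above, for illustration only.
def escape_ltgt(text)
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

def escape_ltgt_inside_pre(html, allowed_tags)
  return html unless allowed_tags.member?('pre')
  html += '</pre>' unless html =~ %r{</pre>}i
  html.gsub(%r{<pre>(.*)</pre>}mi) { "<pre>#{escape_ltgt($1)}</pre>" }
end

html = 'before <pre><b>bold</b></pre> after'

# With pre allowed, markup inside the pre block is escaped.
puts escape_ltgt_inside_pre(html, Set['pre'])

# Without pre in the allowed set, the input is returned unchanged.
puts escape_ltgt_inside_pre(html, Set['b'])
```

Because `(.*)` is greedy and the `m` flag lets it span lines, a document with several pre blocks is matched from the first `<pre>` to the last `</pre>`, which is consistent with the comment about not bothering with nested pre.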

.force_encoding(output, input) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Forces the output string to the input's encoding (for Ruby 1.9).

# File 'lib/friendly_format.rb', line 222

def force_encoding output, input
  if output.respond_to?(:force_encoding)
    output.force_encoding(input.encoding)
  else
    output
  end
end
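
To show what this does on a Ruby with encoding-aware strings, here is a standalone copy of the method applied to a string whose encoding tag was lost. The sample strings are made up for the example.

```ruby
# Standalone copy of force_encoding as listed above, for illustration only.
def force_encoding(output, input)
  if output.respond_to?(:force_encoding)
    output.force_encoding(input.encoding)
  else
    output
  end
end

input  = "h\u00e9llo"                           # UTF-8 string
output = input.dup.force_encoding('ASCII-8BIT') # same bytes, binary tag

fixed = force_encoding(output, input)
puts fixed.encoding # UTF-8
```

String#force_encoding only relabels the bytes; it does not transcode them, so the result is meaningful only when the output bytes already are in the input's encoding.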

.format_article(html, *args) ⇒ `Object`

Formats an entire article, given the tags to allow. Allowed tags can be passed as String, Symbol, or Set arguments. By default no tags are allowed, so every tag is escaped. The input is parsed with the current adapter (Hpricot is tried first).

# File 'lib/friendly_format.rb', line 32

def format_article html, *args
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_article_entrance(html,
      args.inject(Set.new){ |allowed_tags, arg|
        case arg
          when String; allowed_tags << arg
          when Symbol; allowed_tags << arg.to_s
          when Set;    allowed_tags += Set.new(arg.map{|a|a.to_s})
          else; raise(TypeError.new("expected String|Symbol|Set, got #{arg.class}"))
        end
        allowed_tags
      }),
    html)
end
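
The argument handling can be tried in isolation. The sketch below mirrors the inject block above in a hypothetical helper, `normalize_allowed_tags`, which is not part of FriendlyFormat.

```ruby
require 'set'

# Hypothetical helper mirroring how format_article collects its *args
# into a Set of tag-name strings.
def normalize_allowed_tags(*args)
  args.inject(Set.new) do |allowed_tags, arg|
    case arg
    when String then allowed_tags << arg
    when Symbol then allowed_tags << arg.to_s
    when Set    then allowed_tags + Set.new(arg.map(&:to_s))
    else raise TypeError, "expected String|Symbol|Set, got #{arg.class}"
    end
  end
end

tags = normalize_allowed_tags(:a, 'pre', Set[:b, 'i'])
puts tags.to_a.sort.join(', ') # a, b, i, pre
```

With the gem loaded, a call along the lines of `FriendlyFormat.format_article(html, :a, 'pre', Set[:b, 'i'])` would therefore allow the same four tags; any other argument type raises TypeError.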

.format_article_entrance(html, allowed_tags = Set.new) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Entry point for the recursion.

# File 'lib/friendly_format.rb', line 173

def format_article_entrance html, allowed_tags = Set.new
  format_article_rec(
    adapter.parse(escape_ltgt_inside_pre(html, allowed_tags)),
    allowed_tags)
end

.format_article_rec(elem, allowed_tags = Set.new, tag_name = nil) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Recursively formats the children of elem.

# File 'lib/friendly_format.rb', line 181

def format_article_rec(elem, allowed_tags = Set.new, tag_name = nil)

  elem.children.map{ |e|
    if e.text?
      result = e.to_html
      case tag_name
        when 'pre'; format_url(    result)
        when   'a'; format_newline(result)
        else      ; format_newline(format_url(result))
      end

    elsif e.elem?
      if allowed_tags.member?(e.name)
        if adapter.empty?(e)
          node_tag_single(e)
        else
          node_tag_normal(e) +
          format_article_rec(e, allowed_tags, e.name) +
          "</#{e.name}>"
        end
      else
        node_tag_escape(e) +
        if adapter.empty?(e)
          ''
        else
          format_article_rec(e, allowed_tags) +
          "&lt;/#{e.name}&gt;"
        end
      end

    end
  }.join
end

.format_autolink(html, attrs = {}) ⇒ `Object`

Automatically wraps text starting with a protocol such as http, ftp, or mailto in an “a href” tag. The HTML is parsed with the current adapter (Hpricot is tried first), and a simplified regexp translated from Drupal finds the link targets. See format_url.

# File 'lib/friendly_format.rb', line 53

def format_autolink html, attrs = {}
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_autolink_rec(
      FriendlyFormat.adapter.parse(html), attrs),
    html)
end

.format_autolink_rec(elem, attrs = {}) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

# File 'lib/friendly_format.rb', line 149

def format_autolink_rec elem, attrs = {}
  elem.children.map{ |e|
    if e.text?
      format_url(e.content, attrs)

    elsif e.elem?
      if adapter.empty?(e)
        adapter.to_xhtml(e)
      else
        node_tag_normal(e) +
        format_autolink_rec(e, attrs) +
        "</#{e.name}>"
      end

    else
      e

    end

  }.join
end

.format_autolink_regexp(text, attrs = {}) ⇒ `Object`

Translated from drupal-6.2/modules/filter/filter.module. Same as format_autolink, but uses only regexps instead of Hpricot.

# File 'lib/friendly_format.rb', line 65

def format_autolink_regexp text, attrs = {}
  attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join
  # Match absolute URLs.
  " #{text}".gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)])?)}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="'+match[2]+'" title="'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[5]

  # Match e-mail addresses.
  }.gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i, '\1<a href="mailto:\2">\2</a>\3').

  # Match www domains/addresses.
  gsub(%r{(<p>|<li>|[ \n\r\t\(])(www\.[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+~#\&=/;-])([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="http://'+match[2]+'" title="http://'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[3]
  }[1..-1]
end

.format_newline(text) ⇒ `Object`

Converts newline character(s) to <br />.

# File 'lib/friendly_format.rb', line 91

def format_newline text
  # windows: \r\n
  # mac os 9: \r
  text.gsub("\r\n", "\n").tr("\r", "\n").gsub("\n", "<br />\n")
end
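
For example, a standalone copy of the method applied to text that mixes all three line-ending styles (the sample text is made up):

```ruby
# Standalone copy of format_newline as listed above, for illustration only.
def format_newline(text)
  # Normalize \r\n and lone \r to \n first, then add the <br /> tags.
  text.gsub("\r\n", "\n").tr("\r", "\n").gsub("\n", "<br />\n")
end

puts format_newline("windows\r\nold mac\runix\nend")
# windows<br />
# old mac<br />
# unix<br />
# end
```

Normalizing `\r\n` before lone `\r` matters: handling `\r` first would turn each Windows line ending into two breaks.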

.format_url(text, attrs = {}) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Same as format_autolink_regexp, but simplified; it cannot process text that mixes HTML and plain text. Used in format_autolink.

# File 'lib/friendly_format.rb', line 118

def format_url text, attrs = {}
  # translated from drupal-6.2/modules/filter/filter.module
  # Match absolute URLs.
  text.gsub(
  %r{((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\.)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)}i){ |match|
    url = $1 # is there any other way to get this variable?
    caption = trim(url)
    html_attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join

    # Match www domains/addresses.
    url = "http://#{url}" unless url =~ %r{^http://}
    "<a href=\"#{url}\" title=\"#{url}\"#{html_attrs}>#{caption}</a>"
  # Match e-mail addresses.
  }.gsub( %r{([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)}i,
          '<a href="mailto:\1">\1</a>')
end
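
A quick way to see the behavior is to run standalone copies of trim and format_url on plain text. The sample text and the `rel` attribute are made up for the example.

```ruby
# Standalone copies of trim and format_url as listed on this page,
# for illustration only.
def trim(text, length = 75)
  if text.size <= 3
    '...'
  elsif text.size > length
    "#{text[0...length - 3]}..."
  else
    text
  end
end

def format_url(text, attrs = {})
  text.gsub(
    %r{((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\.)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)}i
  ) do
    url = $1
    caption = trim(url)
    html_attrs = attrs.map { |k, v| " #{k}=\"#{v}\"" }.join
    url = "http://#{url}" unless url =~ %r{^http://}
    "<a href=\"#{url}\" title=\"#{url}\"#{html_attrs}>#{caption}</a>"
  end.gsub(%r{([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)}i,
           '<a href="mailto:\1">\1</a>')
end

# The trailing period stays outside the link; www. gets an http:// prefix.
puts format_url('Visit www.example.com.')

# Extra attributes are appended to the generated tag.
puts format_url('see http://example.com/page now', { 'rel' => 'nofollow' })
```

The final character class of the URL pattern excludes `.`, `,`, and `?`, which is why sentence punctuation directly after a link is not swallowed into the href.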

.node_attrs(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 242

def node_attrs node
  attrs2str(node.attributes)
end

.node_attrs_reject_js(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 246

def node_attrs_reject_js node
  attrs2str(node.attributes.reject{ |k, v| k =~ /\Aon/ })
end

.node_tag_escape(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 238

def node_tag_escape node
  "&lt;#{node.name}#{node_attrs(node)}&gt;"
end

.node_tag_normal(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 234

def node_tag_normal node
  "<#{node.name}#{node_attrs_reject_js(node)}>"
end

.node_tag_single(node) ⇒ `Object`



# File 'lib/friendly_format.rb', line 230

def node_tag_single node
  "<#{node.name}#{node_attrs_reject_js(node)} />"
end
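
The node_* helpers can be compared side by side with a stand-in node. In this sketch the Struct with `name` and `attributes` members is an assumption made for the example; real nodes come from the parser adapter and may behave differently.

```ruby
# Standalone copies of the helpers listed above, for illustration only.
def attrs2str(attrs)
  attrs.sort.inject('') { |acc, (k, v)| acc + " #{k}=\"#{v}\"" }
end

def node_attrs(node)
  attrs2str(node.attributes)
end

def node_attrs_reject_js(node)
  # Drops attributes whose names start with "on" (onclick, onload, ...).
  attrs2str(node.attributes.reject { |k, _v| k =~ /\Aon/ })
end

def node_tag_escape(node)
  "&lt;#{node.name}#{node_attrs(node)}&gt;"
end

def node_tag_normal(node)
  "<#{node.name}#{node_attrs_reject_js(node)}>"
end

def node_tag_single(node)
  "<#{node.name}#{node_attrs_reject_js(node)} />"
end

# Hypothetical stand-in for a parsed element.
node_type = Struct.new(:name, :attributes)
link = node_type.new('a', { 'href' => '/home', 'onclick' => 'evil()' })
br   = node_type.new('br', {})

puts node_tag_normal(link) # <a href="/home">
puts node_tag_escape(link) # &lt;a href="/home" onclick="evil()"&gt;
puts node_tag_single(br)   # <br />
```

Allowed tags lose their `on...` handlers, while escaped tags keep every attribute because the whole tag is rendered as text. The filter looks only at attribute names, and the match on `on` is case-sensitive.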

.trim(text, length = 75) ⇒ `Object`

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Truncates text longer than length, ending it with '...'. Extract it to public?

# File 'lib/friendly_format.rb', line 103

def trim text, length = 75
  # Use +3 for '...' string length.
  if text.size <= 3
    '...'
  elsif text.size > length
    "#{text[0...length-3]}..."
  else
    text
  end
end
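
The three branches are easiest to see with a standalone copy of the method; the sample strings are made up for the example.

```ruby
# Standalone copy of trim as listed above, for illustration only.
def trim(text, length = 75)
  if text.size <= 3
    '...'
  elsif text.size > length
    "#{text[0...length - 3]}..."
  else
    text
  end
end

puts trim('abc')                  # ... (anything of 3 characters or fewer)
puts trim('hello')                # hello (short enough, returned unchanged)
puts trim('a' * 80).size          # 75 (72 characters plus '...')
puts trim('abcdefghij', 8)        # abcde...
```

The result never exceeds `length` characters, because the ellipsis is counted against the limit.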
