Module: FriendlyFormat

Defined in:
lib/friendly_format.rb,
lib/friendly_format/version.rb,
lib/friendly_format/set_common.rb,
lib/friendly_format/set_strict.rb,
lib/friendly_format/adapter/hpricot_adapter.rb,
lib/friendly_format/adapter/nokogiri_adapter.rb

Overview

2008-05-09 godfat

Defined Under Namespace

Modules: HpricotAdapter, NokogiriAdapter

Classes: SetCommon, SetStrict

Constant Summary

VERSION =
'0.7.3'

Class Attribute Details

.adapter ⇒ Object



# File 'lib/friendly_format.rb', line 13

def adapter
  @adapter ||= begin
                 NokogiriAdapter
               rescue LoadError
                 HpricotAdapter
               end
end

Class Method Details

.attrs2str(attrs) ⇒ Object



# File 'lib/friendly_format.rb', line 248

def attrs2str attrs
  # TODO: no need to convert to hash for nokogiri
  Hash[attrs].sort.inject(''){ |i, (k, v)| i + " #{k}=\"#{v}\"" }
end
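
As a rough illustration of the output format, the standalone sketch below repeats the same expression on a plain Hash. The name attrs2str_sketch is made up for this example and is not part of the gem.

```ruby
# Standalone copy of the expression above; attrs2str_sketch is a hypothetical name.
def attrs2str_sketch(attrs)
  # Sort by attribute name so the output order is stable,
  # then append each pair as  name="value"  with a leading space.
  Hash[attrs].sort.inject('') { |acc, (k, v)| acc + " #{k}=\"#{v}\"" }
end

puts attrs2str_sketch('href' => '/x', 'class' => 'y')
```

Values are inserted as-is; the expression does not escape quotes inside attribute values.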

.escape_ltgt(text) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.



# File 'lib/friendly_format.rb', line 211

def escape_ltgt text
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end
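
A small standalone sketch of the same two substitutions, assuming nothing beyond core String#gsub. The name escape_ltgt_sketch is hypothetical.

```ruby
# Standalone copy of the two substitutions; escape_ltgt_sketch is a hypothetical name.
def escape_ltgt_sketch(text)
  # Only angle brackets are replaced; ampersands and quotes pass through untouched.
  text.gsub('<', '&lt;').gsub('>', '&gt;')
end

puts escape_ltgt_sketch('<b>R&D</b>')
```

Note that the ampersand in the sample input is left alone, so this is not a full HTML escape.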

.escape_ltgt_inside_pre(html, allowed_tags) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

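Escapes < and > inside pre blocks, but only when pre is among the allowed tags. Open question from the author: perhaps everything inside code, rather than pre, should be escaped?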



# File 'lib/friendly_format.rb', line 132

def escape_ltgt_inside_pre html, allowed_tags
  return html unless allowed_tags.member?('pre')
  # don't bother nested pre, because we escape all tags in pre
  html = html + '</pre>' unless html =~ %r{</pre>}i
  html.gsub(%r{<pre>(.*)</pre>}mi){
    # stop escaping for '>' because drupal's url filter would make &gt; into url...
    # is there any other way to get matched group?
    "<pre>#{escape_ltgt($1)}</pre>"
  }
end
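
The sketch below reproduces the logic of the listing in a standalone form, to show what happens to an unclosed pre block. The name escape_pre_sketch is hypothetical, and the escaping is inlined so the example does not depend on the gem.

```ruby
require 'set'

# Standalone sketch of the listing above; escape_pre_sketch is a hypothetical name.
def escape_pre_sketch(html, allowed_tags)
  # Nothing to do when pre is not an allowed tag.
  return html unless allowed_tags.member?('pre')

  # Close the block if no closing tag is present anywhere in the input.
  html += '</pre>' unless html =~ %r{</pre>}i

  # Greedy, multiline match: everything between the first <pre> and the
  # last </pre> is escaped as one span.
  html.gsub(%r{<pre>(.*)</pre>}mi) do
    inner = Regexp.last_match(1)
    "<pre>#{inner.gsub('<', '&lt;').gsub('>', '&gt;')}</pre>"
  end
end

puts escape_pre_sketch('<pre><b>hi</b>', Set['pre'])
```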

.force_encoding(output, input) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

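Forces the output string to the encoding of the input on Ruby 1.9 and later. On older Rubies, where String#force_encoding does not exist, the output is returned unchanged.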



# File 'lib/friendly_format.rb', line 217

def force_encoding output, input
  if output.respond_to?(:force_encoding)
    output.force_encoding(input.encoding)
  else
    output
  end
end
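
A minimal standalone sketch of the same check, using only core String methods. The name force_encoding_sketch and the sample strings are made up for illustration.

```ruby
# Standalone copy of the listing above; force_encoding_sketch is a hypothetical name.
def force_encoding_sketch(output, input)
  if output.respond_to?(:force_encoding)
    # Relabels the bytes of output in place; no transcoding takes place.
    output.force_encoding(input.encoding)
  else
    output
  end
end

binary = String.new('text', encoding: Encoding::BINARY)
utf8   = 'text'.encode('UTF-8')
result = force_encoding_sketch(binary, utf8)
puts result.encoding
```

String#force_encoding returns its receiver, so the result is the same object that was passed in as output.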

.format_article(html, *args) ⇒ Object

Formats an entire article, given the tags to allow. Allowed tags may be passed as Strings, Symbols, or Sets; any other argument type raises a TypeError. By default no tags are allowed, so every tag is escaped. The input is parsed through the adapter (NokogiriAdapter, falling back to HpricotAdapter).



# File 'lib/friendly_format.rb', line 27

def format_article html, *args
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_article_entrance(html,
      args.inject(Set.new){ |allowed_tags, arg|
        case arg
          when String; allowed_tags << arg
          when Symbol; allowed_tags << arg.to_s
          when Set;    allowed_tags += Set.new(arg.map{|a|a.to_s})
          else; raise(TypeError.new("expected String|Symbol|Set, got #{arg.class}"))
        end
        allowed_tags
      }),
    html)
end
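
The argument handling above can be sketched on its own, without the parsing step. The helper name normalize_allowed_tags is hypothetical; it only mirrors how the listing folds its arguments into one Set of tag-name strings.

```ruby
require 'set'

# Hypothetical helper mirroring the inject block above: fold Strings,
# Symbols and Sets into a single Set of tag-name strings.
def normalize_allowed_tags(*args)
  args.inject(Set.new) do |allowed, arg|
    case arg
    when String then allowed << arg
    when Symbol then allowed << arg.to_s
    when Set    then allowed + Set.new(arg.map(&:to_s))
    else raise TypeError, "expected String|Symbol|Set, got #{arg.class}"
    end
  end
end

p normalize_allowed_tags(:a, 'pre', Set[:b, 'i'])
```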

.format_article_entrance(html, allowed_tags = Set.new) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

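Entry point for the recursion: escapes < and > inside pre, parses the result with the adapter, and hands the parsed tree to format_article_rec.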



# File 'lib/friendly_format.rb', line 168

def format_article_entrance html, allowed_tags = Set.new
  format_article_rec(
    adapter.parse(escape_ltgt_inside_pre(html, allowed_tags)),
    allowed_tags)
end

.format_article_rec(elem, allowed_tags = Set.new, tag_name = nil) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

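Recursive step: walks the children of elem. Text nodes are auto-linked and have their newlines converted (inside pre only links are added, inside a only newlines are converted). Allowed elements are emitted as tags; all other elements are escaped.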



# File 'lib/friendly_format.rb', line 176

def format_article_rec(elem, allowed_tags = Set.new, tag_name = nil)

  elem.children.map{ |e|
    if e.text?
      result = e.to_html
      case tag_name
        when 'pre'; format_url(    result)
        when   'a'; format_newline(result)
        else      ; format_newline(format_url(result))
      end

    elsif e.elem?
      if allowed_tags.member?(e.name)
        if adapter.empty?(e)
          node_tag_single(e)
        else
          node_tag_normal(e) +
          format_article_rec(e, allowed_tags, e.name) +
          "</#{e.name}>"
        end
      else
        node_tag_escape(e) +
        if adapter.empty?(e)
          ''
        else
          format_article_rec(e, allowed_tags) +
          "&lt;/#{e.name}&gt;"
        end
      end

    end
  }.join
end
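
To make the allow-or-escape branch easier to follow, here is a heavily simplified sketch on a toy tree. SketchNode and render_sketch are hypothetical stand-ins for the parser's nodes and for format_article_rec; attributes, auto-linking, newline conversion and empty elements are all left out.

```ruby
require 'set'

# Hypothetical stand-in for a parsed node: name is nil for text nodes.
SketchNode = Struct.new(:name, :text, :children)

# Simplified recursion: allowed elements keep their tags,
# all other elements have their tags escaped.
def render_sketch(node, allowed_tags)
  node.children.map do |child|
    if child.name.nil?
      child.text
    elsif allowed_tags.member?(child.name)
      "<#{child.name}>" + render_sketch(child, allowed_tags) + "</#{child.name}>"
    else
      "&lt;#{child.name}&gt;" + render_sketch(child, allowed_tags) + "&lt;/#{child.name}&gt;"
    end
  end.join
end

root = SketchNode.new(nil, nil, [
  SketchNode.new('b', nil, [SketchNode.new(nil, 'hi', [])]),
  SketchNode.new('script', nil, [SketchNode.new(nil, 'x', [])])
])
puts render_sketch(root, Set['b'])
```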

.format_autolink(html, attrs = {}) ⇒ Object

Automatically wraps text that starts with a protocol such as http, ftp, or mailto in an a href tag. The input is parsed through the adapter (the original note names Hpricot), and a simplified regexp translated from Drupal locates the link targets. See format_url.



# File 'lib/friendly_format.rb', line 48

def format_autolink html, attrs = {}
  return html if html.strip == ''

  FriendlyFormat.force_encoding(
    FriendlyFormat.format_autolink_rec(
      FriendlyFormat.adapter.parse(html), attrs),
    html)
end

.format_autolink_rec(elem, attrs = {}) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.



# File 'lib/friendly_format.rb', line 144

def format_autolink_rec elem, attrs = {}
  elem.children.map{ |e|
    if e.text?
      format_url(e.content, attrs)

    elsif e.elem?
      if adapter.empty?(e)
        adapter.to_xhtml(e)
      else
        node_tag_normal(e) +
        format_autolink_rec(e, attrs) +
        "</#{e.name}>"
      end

    else
      e

    end

  }.join
end

.format_autolink_regexp(text, attrs = {}) ⇒ Object

Translated from drupal-6.2/modules/filter/filter.module. Does the same job as format_autolink, but uses only regexps and no HTML parser.



# File 'lib/friendly_format.rb', line 60

def format_autolink_regexp text, attrs = {}
  attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join
  # Match absolute URLs.
  " #{text}".gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)])?)}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="'+match[2]+'" title="'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[5]

  # Match e-mail addresses.
  }.gsub(%r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i, '\1<a href="mailto:\2">\2</a>\3').

  # Match www domains/addresses.
  gsub(%r{(<p>|<li>|[ \n\r\t\(])(www\.[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+~#\&=/;-])([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i){ |match|
    match = [match, $1, $2, $3, $4, $5]
    match[2] = match[2] # escape something here
    caption = FriendlyFormat.trim match[2]
    # match[2] = sanitize match[2]
    match[1]+'<a href="http://'+match[2]+'" title="http://'+match[2]+"\"#{attrs}>"+
      caption+'</a>'+match[3]
  }[1..-1]
end
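
The e-mail rule from the listing can be tried in isolation. The sketch below copies that one regexp and the leading-space padding; EMAIL_RULE_SKETCH and link_emails_sketch are hypothetical names, and the URL and www rules are left out.

```ruby
# The e-mail regexp copied from the listing above; the constant name is hypothetical.
EMAIL_RULE_SKETCH = %r{(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))}i

def link_emails_sketch(text)
  # Pad with a space so an address at the very start has a leading delimiter,
  # then drop the padding again with [1..-1], as the listing does.
  " #{text}".gsub(EMAIL_RULE_SKETCH, '\1<a href="mailto:\2">\2</a>\3')[1..-1]
end

puts link_emails_sketch('write bob@example.com today')
puts link_emails_sketch('bob@example.com')
```

The lookahead in this rule requires a trailing delimiter, so an address at the very end of the input is left as plain text.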

.format_newline(text) ⇒ Object

Converts newline characters (\r\n, \r, or \n) to <br /> followed by \n.



# File 'lib/friendly_format.rb', line 86

def format_newline text
  # windows: \r\n
  # mac os 9: \r
  text.gsub("\r\n", "\n").tr("\r", "\n").gsub("\n", "<br />\n")
end
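
A standalone sketch of the same three-step normalization, using only core String methods. The name format_newline_sketch is hypothetical.

```ruby
# Standalone copy of the listing above; format_newline_sketch is a hypothetical name.
def format_newline_sketch(text)
  text.gsub("\r\n", "\n")    # Windows line endings first, so they count once
      .tr("\r", "\n")        # then lone carriage returns
      .gsub("\n", "<br />\n") # finally insert the break before every newline
end

puts format_newline_sketch("a\r\nb\rc\n").inspect
```

Handling \r\n before the lone \r is what keeps a Windows line ending from producing two breaks.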

.format_url(text, attrs = {}) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Same as format_autolink_regexp, but simplified: it cannot process text that mixes HTML and plain text. Used by format_autolink.



# File 'lib/friendly_format.rb', line 113

def format_url text, attrs = {}
  # translated from drupal-6.2/modules/filter/filter.module
  # Match absolute URLs.
  text.gsub(
  %r{((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\.)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)}i){ |match|
    url = $1 # is there any other way to get this variable?
    caption = trim(url)
    html_attrs = attrs.map{ |k,v| " #{k}=\"#{v}\""}.join

    # Match www domains/addresses.
    url = "http://#{url}" unless url =~ %r{^http://}
    "<a href=\"#{url}\" title=\"#{url}\"#{html_attrs}>#{caption}</a>"
  # Match e-mail addresses.
  }.gsub( %r{([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)}i,
          '<a href="mailto:\1">\1</a>')
end
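
The scheme-prefix line in the listing is worth a closer look. As it reads, any matched URL that does not begin with a literal http:// gets that prefix, which includes https and ftp URLs. The sketch below isolates that line next to one possible alternative, assuming the intent is to prefix only bare www. addresses. Both function names are hypothetical and neither is part of the gem.

```ruby
# The prefix step exactly as it reads in the listing above.
def prefix_as_listed(url)
  url = "http://#{url}" unless url =~ %r{^http://}
  url
end

# Hypothetical alternative: add a scheme only to bare www. addresses.
def prefix_www_only(url)
  url =~ /\Awww\./i ? "http://#{url}" : url
end

puts prefix_as_listed('https://example.com')
puts prefix_www_only('https://example.com')
puts prefix_www_only('www.example.com')
```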

.node_attrs(node) ⇒ Object



# File 'lib/friendly_format.rb', line 237

def node_attrs node
  attrs2str(node.attributes)
end

.node_attrs_reject_js(node) ⇒ Object



# File 'lib/friendly_format.rb', line 241

def node_attrs_reject_js node
  # TODO: no need to convert to hash for nokogiri
  attrs2str(Hash[node.attributes].reject{ |k, v|
    k      =~ /\Aon/ ||
    v.to_s =~ /\Ajavascript/ })
end
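
The filter can be tried on a plain Hash, leaving the parser node out. The name reject_js_attrs_sketch is hypothetical; the two patterns are the ones from the listing.

```ruby
# Standalone sketch of the reject step; reject_js_attrs_sketch is a hypothetical name.
def reject_js_attrs_sketch(attrs)
  attrs.reject do |k, v|
    k      =~ /\Aon/ ||          # event handlers such as onclick
    v.to_s =~ /\Ajavascript/     # javascript: URLs
  end
end

p reject_js_attrs_sketch('href' => 'javascript:alert(1)', 'onclick' => 'go()', 'class' => 'c')
```

As written, both patterns are case-sensitive and anchored at the start of the string, so a value such as JavaScript:alert(1) would pass through this sketch unchanged.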

.node_tag_escape(node) ⇒ Object



# File 'lib/friendly_format.rb', line 233

def node_tag_escape node
  "&lt;#{node.name}#{node_attrs(node)}&gt;"
end

.node_tag_normal(node) ⇒ Object



# File 'lib/friendly_format.rb', line 229

def node_tag_normal node
  "<#{node.name}#{node_attrs_reject_js(node)}>"
end

.node_tag_single(node) ⇒ Object



# File 'lib/friendly_format.rb', line 225

def node_tag_single node
  "<#{node.name}#{node_attrs_reject_js(node)} />"
end

.trim(text, length = 75) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

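Shortens text longer than length characters to exactly length characters, the last three being '...'. Open question from the author: should this method be made public?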



# File 'lib/friendly_format.rb', line 98

def trim text, length = 75
  # Use +3 for '...' string length.
  if text.size <= 3
    '...'
  elsif text.size > length
    "#{text[0...length-3]}..."
  else
    text
  end
end
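
A standalone sketch of the three branches, with small inputs so each one is easy to check. The name trim_sketch is hypothetical.

```ruby
# Standalone copy of the listing above; trim_sketch is a hypothetical name.
def trim_sketch(text, length = 75)
  if text.size <= 3
    # Inputs of three characters or fewer are replaced outright.
    '...'
  elsif text.size > length
    # Keep length - 3 characters and spend the last three on the ellipsis.
    "#{text[0...length - 3]}..."
  else
    text
  end
end

puts trim_sketch('abcdefghij', 8)
puts trim_sketch('ab')
puts trim_sketch('abcd')
```

The first branch means that very short captions, such as a two-character string, come back as '...' rather than unchanged.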