Class: URIChunk
- Inherits:
- Chunk::Abstract (hierarchy: Object, Chunk::Abstract, URIChunk)
- Includes:
- URI::REGEXP::PATTERN
- Defined in:
- app/models/chunks/uri.rb
Overview
This wiki chunk matches arbitrary URIs, using patterns from the Ruby URI modules. It parses out a variety of fields that renderers could use to format links in various ways (shortening domain names, hiding email addresses). It also matches email addresses and host.com.au-style domains that lack a scheme (http://), adding the scheme as required.
The heuristic used to match a URI is designed to err on the side of caution. That is, it is more likely to leave a URI unlinked than to accidentally autolink something that is not a URI. The reasoning is that it is easier to force a URI link by prefixing ‘http://’ to it than it is to escape an incorrectly marked-up non-URI.
I’m using a part of the [ISO 3166-1 Standard][iso3166] for country name suffixes. The generic names are from www.bnoack.com/data/countrycode2.html.
[iso3166]: http://geotags.com/iso3166/
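To make the matching behaviour described above concrete, here is a minimal sketch. It reuses the GENERIC, COUNTRY and TLDS values listed in the constant summary, but SCHEME, USERINFO, HOSTNAME, PORT and ABS_PATH are simplified stand-ins written for this example (assumptions), not the definitions from URI::REGEXP::PATTERN. The module name URISketch is also hypothetical, and the query and fragment groups are left out for brevity.

```ruby
module URISketch
  # Copied from the constant summary of this class.
  GENERIC = '(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
  COUNTRY = '(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
  TLDS    = "\\.(?:#{GENERIC}|#{COUNTRY})\\b"

  # Simplified stand-ins (assumptions), not the Ruby URI module's patterns.
  SCHEME   = '[a-zA-Z][-+.a-zA-Z\d]*'
  USERINFO = '[-_.!~*a-zA-Z\d;:&=+$,]+'
  HOSTNAME = '(?:[a-zA-Z\d-]+\.)*[a-zA-Z\d-]+'
  PORT     = '\d+'
  ABS_PATH = '/[-_.!~*a-zA-Z\d/%:@&=+$,;]*'

  PATTERN = Regexp.new(
    "(?:(#{SCHEME})://)?" +   # optional scheme://   (group 1)
    "(?:(#{USERINFO})@)?" +   # optional userinfo@   (group 2)
    "(#{HOSTNAME}#{TLDS})" +  # mandatory host + TLD (group 3)
    "(?::(#{PORT}))?" +       # optional :port       (group 4)
    "(#{ABS_PATH})?"          # optional path        (group 5)
  )
end

bare  = URISketch::PATTERN.match('see www.example.com.au/docs for details')
email = URISketch::PATTERN.match('mail bob@example.org now')

puts bare[3]                                              # host of the bare domain
puts email[2]                                             # userinfo of the email address
puts URISketch::PATTERN.match('localhost').inspect        # no known TLD
puts URISketch::PATTERN.match('example.community').inspect # \b rejects ".com" inside ".community"
```

In this sketch the mandatory TLD suffix and the trailing `\b` are what keep the match cautious: a bare word or an unknown suffix is left alone, while a scheme-less domain or an email address is still picked up.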
Constant Summary
- GENERIC =
'(?:aero|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org)'
- COUNTRY =
'(?:au|at|be|ca|ch|de|dk|fr|hk|in|ir|it|jp|nl|no|pt|ru|se|sw|tv|tw|uk|us)'
- TLDS =
These are needed otherwise HOST will match almost anything
"\\.(?:#{GENERIC}|#{COUNTRY})\\b"
- USERINFO =
Redefine USERINFO so that it must have non-zero length
"(?:[#{UNRESERVED};:&=+$,]|#{ESCAPED})+"
- URI_ENDING =
Pattern of legal URI endings, used to stop interference with some Textile markup (images: !URI!) and other punctuation, e.g. (wiki.com/).
'[)!]'
- URI_PATTERN =
The basic URI expression as a string
"(?:(#{SCHEME})://)?" +   # Optional scheme://        (\1|\8)
"(?:(#{USERINFO})@)?" +   # Optional userinfo@        (\2|\9)
"(#{HOSTNAME}#{TLDS})" +  # Mandatory host eg, HOST.com.au (\3|\10)
"(?::(#{PORT}))?" +       # Optional :port            (\4|\11)
"(#{ABS_PATH})?" +        # Optional absolute path    (\5|\12)
"(?:\\?(#{QUERY}))?" +    # Optional ?query           (\6|\13)
"(?:\\#(#{FRAGMENT}))?"
Instance Attribute Summary
-
#fragment ⇒ Object
readonly
Returns the value of attribute fragment.
-
#host ⇒ Object
readonly
Returns the value of attribute host.
-
#link_text ⇒ Object
readonly
Returns the value of attribute link_text.
-
#path ⇒ Object
readonly
Returns the value of attribute path.
-
#port ⇒ Object
readonly
Returns the value of attribute port.
-
#query ⇒ Object
readonly
Returns the value of attribute query.
-
#scheme ⇒ Object
readonly
Returns the value of attribute scheme.
-
#uri ⇒ Object
readonly
Returns the value of attribute uri.
-
#user ⇒ Object
readonly
Returns the value of attribute user.
Attributes inherited from Chunk::Abstract
Class Method Summary
Instance Method Summary
-
#escaped_text ⇒ Object
If there is no hostname in the URI, do not render it. It probably only contains the scheme, eg ‘something:’.
-
#initialize(match_data, revision) ⇒ URIChunk
constructor
A new instance of URIChunk.
-
#unmask(content) ⇒ Object
If the text should be escaped then don’t keep this chunk.
Methods inherited from Chunk::Abstract
#mask, #post_mask, #pre_mask, #revert
Constructor Details
#initialize(match_data, revision) ⇒ URIChunk
Returns a new instance of URIChunk.
# File 'app/models/chunks/uri.rb', line 55
def initialize(match_data, revision)
  super(match_data, revision)

  # Since the URI_PATTERN is tried twice, there are two sets of
  # groups, one from \1 to \7 and the second from \8 to \14.
  # The fields are set by whichever group matches.
  @scheme   = match_data[1] || match_data[8]
  @user     = match_data[2] || match_data[9]
  @host     = match_data[3] || match_data[10]
  @port     = match_data[4] || match_data[11]
  @path     = match_data[5] || match_data[12]
  @query    = match_data[6] || match_data[13]
  @fragment = match_data[7] || match_data[14]

  # If there is no scheme, add an appropriate one, otherwise
  # set the URI to the matched text.
  @text_scheme = scheme
  @uri = (scheme ? match_data[0] : nil )
  @scheme = scheme || ( user ? 'mailto' : 'http' )
  @delimiter = ( scheme == 'mailto' ? ':' : '://' )
  @uri ||= scheme + @delimiter + match_data[0]

  # Build up the link text. Schemes are omitted unless explicitly given.
  @link_text = ''
  @link_text << "#{@scheme}#{@delimiter}" if @text_scheme
  @link_text << "#{@user}@" if @user
  @link_text << "#{@host}" if @host
  @link_text << ":#{@port}" if @port
  @link_text << "#{@path}" if @path
  @link_text << "?#{@query}" if @query
end
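The scheme defaulting and link-text assembly in the constructor can be hard to follow inside the instance-variable bookkeeping, so the sketch below isolates it as a plain function. The name `build_link` and its keyword arguments are hypothetical and exist only for this illustration; the logic follows the listing above, including the fact that the fragment is not appended to the link text.

```ruby
# Hypothetical helper (not part of URIChunk) mirroring the constructor's
# scheme defaulting and link-text assembly.
def build_link(matched:, scheme: nil, user: nil, host: nil, port: nil, path: nil, query: nil)
  text_scheme = scheme                                # scheme as written in the source text
  effective   = scheme || (user ? 'mailto' : 'http')  # default: mailto for user@host, else http
  delimiter   = effective == 'mailto' ? ':' : '://'
  uri         = text_scheme ? matched : effective + delimiter + matched

  # The link text shows a scheme only when the author typed one.
  link_text = String.new
  link_text << "#{effective}#{delimiter}" if text_scheme
  link_text << "#{user}@" if user
  link_text << host.to_s if host
  link_text << ":#{port}" if port
  link_text << path.to_s if path
  link_text << "?#{query}" if query

  { uri: uri, link_text: link_text }
end

email = build_link(matched: 'bob@example.org', user: 'bob', host: 'example.org')
puts email[:uri]        # the href gains a mailto: prefix
puts email[:link_text]  # the visible text stays as typed
```

The design keeps two notions of scheme apart: the one used to build a working href, and the one (possibly absent) that the author actually wrote, which alone decides whether a scheme appears in the visible text.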
Instance Attribute Details
#fragment ⇒ Object (readonly)
Returns the value of attribute fragment.
# File 'app/models/chunks/uri.rb', line 53
def fragment
  @fragment
end
#host ⇒ Object (readonly)
Returns the value of attribute host.
# File 'app/models/chunks/uri.rb', line 53
def host
  @host
end
#link_text ⇒ Object (readonly)
Returns the value of attribute link_text.
# File 'app/models/chunks/uri.rb', line 53
def link_text
  @link_text
end
#path ⇒ Object (readonly)
Returns the value of attribute path.
# File 'app/models/chunks/uri.rb', line 53
def path
  @path
end
#port ⇒ Object (readonly)
Returns the value of attribute port.
# File 'app/models/chunks/uri.rb', line 53
def port
  @port
end
#query ⇒ Object (readonly)
Returns the value of attribute query.
# File 'app/models/chunks/uri.rb', line 53
def query
  @query
end
#scheme ⇒ Object (readonly)
Returns the value of attribute scheme.
# File 'app/models/chunks/uri.rb', line 53
def scheme
  @scheme
end
#uri ⇒ Object (readonly)
Returns the value of attribute uri.
# File 'app/models/chunks/uri.rb', line 53
def uri
  @uri
end
#user ⇒ Object (readonly)
Returns the value of attribute user.
# File 'app/models/chunks/uri.rb', line 53
def user
  @user
end
Class Method Details
.pattern ⇒ Object
# File 'app/models/chunks/uri.rb', line 44
def self.pattern()
  # This pattern first tries to match the URI_PATTERN that ends with
  # punctuation that is a valid URI character (eg, ')', '!'). If
  # such a match occurs, there should be no backtracking (hence the ?> ).
  # If the string cannot match a URI ending with URI_ENDING, then a second
  # attempt is tried.
  Regexp.new("(?>#{URI_PATTERN}(?=#{URI_ENDING}))|#{URI_PATTERN}", Regexp::EXTENDED, 'N')
end
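The two-attempt structure is easier to see on a reduced pattern. The sketch below keeps URI_ENDING as documented but replaces URI_PATTERN with MINI_URI, a hypothetical two-group stand-in (host and optional path) written for this example. It also omits the flags passed to Regexp.new in the listing above. Because the pattern text appears twice, the capture groups appear twice as well: here groups 1 and 2 belong to the first attempt and groups 3 and 4 to the second, just as the real pattern has groups 1 to 7 and 8 to 14.

```ruby
module PatternSketch
  URI_ENDING = '[)!]'

  # Hypothetical, heavily reduced stand-in for URI_PATTERN:
  # a host ending in a known TLD, plus an optional path.
  MINI_URI = "([a-z\\d-]+(?:\\.[a-z\\d-]+)*\\.(?:com|org|net)\\b)" \
             "(/[-_.!~*'()a-z\\d/]*)?"

  # First attempt: atomic group that must be followed by ')' or '!'.
  # Second attempt: the plain pattern.
  PATTERN = Regexp.new("(?>#{MINI_URI}(?=#{URI_ENDING}))|#{MINI_URI}")

  # Returns the matched text plus host and path from whichever attempt matched.
  def self.host_and_path(text)
    m = PATTERN.match(text)
    return nil unless m

    { matched: m[0], host: m[1] || m[3], path: m[2] || m[4] }
  end
end

p PatternSketch.host_and_path('(wiki.com/)')         # closing ')' is left out of the match
p PatternSketch.host_and_path('!wiki.com/img.png!')  # Textile image delimiter is left out
p PatternSketch.host_and_path('wiki.com/page')       # falls through to the second attempt
```

Since ')' and '!' are themselves legal path characters, the plain pattern alone would swallow them; the lookahead forces the first attempt to stop just before such a character when one is present.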
Instance Method Details
#escaped_text ⇒ Object
If there is no hostname in the URI, do not render it. It probably only contains the scheme, eg ‘something:’.
# File 'app/models/chunks/uri.rb', line 96
def escaped_text()
  ( host.nil? ? @uri : nil )
end
#unmask(content) ⇒ Object
If the text should be escaped then don’t keep this chunk. Otherwise only keep this chunk if it was substituted back into the content.
# File 'app/models/chunks/uri.rb', line 89
def unmask(content)
  return nil if escaped_text
  return self if content.sub!( Regexp.new(mask(content)), "<a href=\"#{uri}\">#{link_text}</a>" )
end
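The return values of `unmask` matter to the caller: the chunk itself when the placeholder was replaced, `nil` otherwise. The sketch below reproduces that contract on a stand-alone struct. `LinkChunk` and its `mask_token` method are hypothetical; the real placeholder comes from `Chunk::Abstract#mask`, whose format is not shown on this page, so the `chunk<id>chunk` token used here is purely an assumption.

```ruby
# Hypothetical stand-in for a URIChunk; only the fields needed here.
LinkChunk = Struct.new(:id, :uri, :link_text, :host) do
  # Assumed placeholder format; the real one is produced by Chunk::Abstract#mask.
  def mask_token
    "chunk#{id}chunk"
  end

  # Same rule as URIChunk#escaped_text: without a host, fall back to the raw text.
  def escaped_text
    host.nil? ? uri : nil
  end

  # Replace the placeholder in content (in place) with an anchor tag.
  # Returns self on success, nil if the chunk is escaped or the token is absent.
  def unmask(content)
    return nil if escaped_text

    self if content.sub!(Regexp.new(Regexp.escape(mask_token)),
                         %(<a href="#{uri}">#{link_text}</a>))
  end
end

chunk   = LinkChunk.new(1, 'http://example.com', 'example.com', 'example.com')
content = 'Visit chunk1chunk today'.dup
chunk.unmask(content)
puts content
```

`String#sub!` returns `nil` when nothing was substituted, which is what lets the method report in a single expression whether the chunk is still present in the content.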