Module: URI

Extended by:
Escape
Includes:
REGEXP
Included in:
Generic
Defined in:
lib/extensions/uri/uri.rb,
lib/extensions/uri/uri/ftp.rb,
lib/extensions/uri/uri/http.rb,
lib/extensions/uri/uri/ldap.rb,
lib/extensions/uri/uri/https.rb,
lib/extensions/uri/uri/ldaps.rb,
lib/extensions/uri/uri/common.rb,
lib/extensions/uri/uri/mailto.rb,
lib/extensions/uri/uri/generic.rb

Overview

uri/common.rb

Author

Akira Yamada <[email protected]>

Revision

$Id: common.rb 31799 2011-05-29 22:49:36Z yugui $

License

You can redistribute it and/or modify it under the same terms as Ruby.

Defined Under Namespace

Modules: Escape, REGEXP, Util

Classes: BadURIError, Error, FTP, Generic, HTTP, HTTPS, InvalidComponentError, InvalidURIError, LDAP, LDAPS, MailTo, Parser

Constant Summary

VERSION_CODE =
'000911'.freeze
VERSION =
VERSION_CODE.scan(/../).collect{|n| n.to_i}.join('.').freeze
DEFAULT_PARSER =
Parser.new
TBLENCWWWCOMP_ =
{}
TBLDECWWWCOMP_ =
{}
HTML5ASCIIINCOMPAT =

[Encoding::UTF_7, Encoding::UTF_16BE, Encoding::UTF_16LE,

[]
WFKV_ =
'(?:%\h\h|[^%#=;&])'
@@schemes =
{}

Class Method Summary

Methods included from Escape

escape, unescape

Class Method Details

.decode_www_form(str, enc = "UTF-8") ⇒ Object

Decodes URL-encoded form data from the given str.

This decodes application/x-www-form-urlencoded data and returns an array of key-value arrays. It uses URI.decode_www_form_component internally.

The charset hack is not supported yet because the mapping from a given charset to a Ruby encoding is not yet clear. See also www.w3.org/TR/html5/syntax.html#character-encodings-0

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

ary = URI.decode_www_form("a=1&a=2&b=3")
p ary                   #=> [['a', '1'], ['a', '2'], ['b', '3']]
p ary.assoc('a').last   #=> '1'
p ary.assoc('b').last   #=> '3'
p ary.rassoc('2').first #=> 'a'
p Hash[ary]             #=> {"a"=>"2", "b"=>"3"}

See URI.decode_www_form_component, URI.encode_www_form
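Because Hash[ary] keeps only the last value for a repeated key, it can help to group the values instead. The following is a minimal sketch; the variable names and the sample query strings are illustrative assumptions, not part of the library.

```ruby
require 'uri'

# Percent-escapes and "+" are decoded in both keys and values.
decoded = URI.decode_www_form("name=J%C3%BCrgen+M&lang=de")
# => [["name", "Jürgen M"], ["lang", "de"]]

# Collect every value for a repeated key rather than keeping only one.
pairs = URI.decode_www_form("a=1&a=2&b=3")
grouped = Hash.new { |hash, key| hash[key] = [] }
pairs.each { |key, value| grouped[key] << value }
# grouped maps "a" to ["1", "2"] and "b" to ["3"]

# An empty string yields an empty array.
nothing = URI.decode_www_form("")
```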



# File 'lib/extensions/uri/uri/common.rb', line 836

def self.decode_www_form(str, enc="UTF-8") #Encoding::UTF_8)
  return [] if str.empty?
  unless /\A#{WFKV_}*=#{WFKV_}*(?:[;&]#{WFKV_}*=#{WFKV_}*)*\z/o =~ str
    raise ArgumentError, "invalid data of application/x-www-form-urlencoded (#{str})"
  end
  ary = []
  $&.scan(/([^=;&]+)=([^;&]*)/) do
    ary << [decode_www_form_component($1, enc), decode_www_form_component($2, enc)]
  end
  ary
end

.decode_www_form_component(str, enc = "UTF-8") ⇒ Object

Decodes the given str of URL-encoded form data.

This decodes + to SP (a space) and each %XX escape to the corresponding byte.

See URI.encode_www_form_component, URI.decode_www_form
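A short sketch of the decoding rules and of the ArgumentError case follows. The input strings and variable names are made up for illustration.

```ruby
require 'uri'

# "+" becomes a space and %XX escapes become the corresponding bytes.
greeting = URI.decode_www_form_component("Hello%2C+World%21")
# => "Hello, World!"

# Hex digits are accepted in either case.
commas = URI.decode_www_form_component("%2c%2C")
# => ",,"

# The result is tagged with the requested encoding (UTF-8 by default).
cafe = URI.decode_www_form_component("caf%C3%A9")

# A "%" that is not followed by two hex digits raises ArgumentError.
invalid = false
begin
  URI.decode_www_form_component("100%")
rescue ArgumentError
  invalid = true
end
```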

Raises:

  • (ArgumentError)


# File 'lib/extensions/uri/uri/common.rb', line 761

def self.decode_www_form_component(str, enc="UTF-8") #Encoding::UTF_8)
  if TBLDECWWWCOMP_.empty?
    tbl = {}
    256.times do |i|
      h, l = i>>4, i&15
      tbl['%%%X%X' % [h, l]] = i.chr
      tbl['%%%x%X' % [h, l]] = i.chr
      tbl['%%%X%x' % [h, l]] = i.chr
      tbl['%%%x%x' % [h, l]] = i.chr
    end
    tbl['+'] = ' '
    begin
      TBLDECWWWCOMP_.replace(tbl)
      TBLDECWWWCOMP_.freeze
    rescue
    end
  end
  raise ArgumentError, "invalid %-encoding (#{str})" unless /\A(?:%\h\h|[^%]+)*\z/ =~ str
  str.gsub(/\+|%\h\h/, TBLDECWWWCOMP_).force_encoding(enc)
end

.encode_www_form(enum) ⇒ Object

Generates URL-encoded form data from the given enum.

This generates application/x-www-form-urlencoded data, as defined in HTML5, from the given Enumerable object.

It uses URI.encode_www_form_component(str) internally.

This does not convert the encodings of the given items, so convert them before calling this method if you want to send data in an encoding other than the original one, or if the data mixes encodings. (Strings in an HTML5 ASCII-incompatible encoding are converted to UTF-8.)

This does not handle files. To send a file, use multipart/form-data.

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

See URI.encode_www_form_component, URI.decode_www_form
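As an illustrative sketch, the enum can be an array of key-value pairs or a Hash. The sample data and variable names below are assumptions chosen for the example.

```ruby
require 'uri'

# An array of pairs keeps repeated keys and their order.
pairs = [["q", "ruby lang"], ["tag", "a&b"], ["tag", "c=d"]]
query = URI.encode_www_form(pairs)
# => "q=ruby+lang&tag=a%26b&tag=c%3Dd"

# A Hash works too; non-string values are converted with to_s.
paging = URI.encode_www_form({ "page" => 2, "per" => 50 })
# => "page=2&per=50"

# Decoding the generated string gives back the original pairs.
round_trip = URI.decode_www_form(query)
```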



# File 'lib/extensions/uri/uri/common.rb', line 799

def self.encode_www_form(enum)
  str = nil
  enum.each do |k,v|
    if str
      str << '&'
    else
      str = nil.to_s
    end
    str << encode_www_form_component(k)
    str << '='
    str << encode_www_form_component(v)
  end
  str
end

.encode_www_form_component(str) ⇒ Object

Encodes the given str as URL-encoded form data.

The characters *, -, ., 0-9, A-Z, _, and a-z are left unchanged, SP (a space) is converted to +, and every other byte is converted to %XX.

This follows www.w3.org/TR/html5/forms.html#url-encoded-form-data

See URI.decode_www_form_component, URI.encode_www_form
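The character rules above can be checked with a few sample strings. This is a sketch with made-up inputs; note that ~ is escaped because it is not in the unchanged set.

```ruby
require 'uri'

# Space becomes "+", other reserved characters become %XX.
plain = URI.encode_www_form_component("Hello, World!")
# => "Hello%2C+World%21"

# Characters in the unchanged set pass through as-is.
safe = URI.encode_www_form_component("*-._09AZaz")
# => "*-._09AZaz"

# "~" is not in the unchanged set, so it is escaped.
tilde = URI.encode_www_form_component("a~b")
# => "a%7Eb"

# Multibyte characters are escaped byte by byte (UTF-8 here).
accented = URI.encode_www_form_component("caf\u00E9")
# => "caf%C3%A9"
```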



# File 'lib/extensions/uri/uri/common.rb', line 732

def self.encode_www_form_component(str)
  if TBLENCWWWCOMP_.empty?
    tbl = {}
    256.times do |i|
      tbl[i.chr] = '%%%02X' % i
    end
    tbl[' '] = '+'
    begin
      TBLENCWWWCOMP_.replace(tbl)
      TBLENCWWWCOMP_.freeze
    rescue
    end
  end
  str = str.to_s
  if HTML5ASCIIINCOMPAT.include?(str.encoding)
    str = str.encode("UTF-8") #Encoding::UTF_8)
  else
    str = str.dup
  end
  str.force_encoding("ASCII-8BIT") #Encoding::ASCII_8BIT)
  str.gsub!(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_)
  str.force_encoding("US-ASCII") #Encoding::US_ASCII)
end

.extract(str, schemes = nil, &block) ⇒ Object

Synopsis

URI::extract(str[, schemes][,&blk])

Args

str

String to extract URIs from.

schemes

Limit URI matching to specific schemes.

Description

Extracts URIs from a string. If a block is given, yields each matched URI and returns nil; otherwise returns an array of the matches.

Usage

require "uri"

URI.extract("text here http://foo.example.org/bla and here mailto:[email protected] and here also.")
# => ["http://foo.example.org/bla", "mailto:[email protected]"]
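The schemes argument and the block form can be sketched as follows. The sample text and variable names are assumptions for illustration; newer Ruby versions mark URI.extract as obsolete, so treat this as a sketch for the version documented here.

```ruby
require 'uri'

text = "see http://foo.example.org/bla and ftp://ftp.example.org/pub for details"

# Without schemes, every URI-like string is returned.
all_uris = URI.extract(text)
# => ["http://foo.example.org/bla", "ftp://ftp.example.org/pub"]

# With schemes, only URIs using one of the listed schemes are returned.
ftp_only = URI.extract(text, ["ftp"])
# => ["ftp://ftp.example.org/pub"]

# With a block, each match is yielded and the method itself returns nil.
seen = []
result = URI.extract(text) { |uri| seen << uri }
```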


# File 'lib/extensions/uri/uri/common.rb', line 680

def self.extract(str, schemes = nil, &block)
  DEFAULT_PARSER.extract(str, schemes, &block)
end

.join(*str) ⇒ Object

Synopsis

URI::join(str[, str, ...])

Args

str

String(s) to work with

Description

Joins URIs.

Usage

require 'uri'

p URI.join("http://localhost/","main.rbx")
# => #<URI::HTTP:0x2022ac02 URL:http://localhost/main.rbx>
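Joining resolves each later string relative to the result so far, in the way a browser resolves a link against a page URL. A minimal sketch, with an assumed base URI and illustrative variable names:

```ruby
require 'uri'

base = "http://localhost/a/b/"

# A relative name is appended to the base directory.
sibling = URI.join(base, "main.rbx").to_s
# => "http://localhost/a/b/main.rbx"

# ".." climbs one directory level.
parent = URI.join(base, "../c").to_s
# => "http://localhost/a/c"

# A leading "/" replaces the whole path.
rooted = URI.join(base, "/top").to_s
# => "http://localhost/top"

# More than two parts are resolved left to right.
chained = URI.join("http://localhost/", "a/", "b").to_s
# => "http://localhost/a/b"
```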


# File 'lib/extensions/uri/uri/common.rb', line 652

def self.join(*str)
  DEFAULT_PARSER.join(*str)
end

.parse(uri) ⇒ Object

Synopsis

URI::parse(uri_str)

Args

uri_str

String with URI.

Description

Creates an instance of one of URI's subclasses from the string.

Raises

URI::InvalidURIError

Raised if the given URI is not valid.

Usage

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")
p uri
# => #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/>
p uri.scheme 
# => "http" 
p uri.host 
# => "www.ruby-lang.org"
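The sketch below reads a few more components and shows one way to handle URI::InvalidURIError. The helper safe_parse is hypothetical, not part of the library, and the sample URIs are made up.

```ruby
require 'uri'

uri = URI.parse("http://www.ruby-lang.org:8080/en/?q=1#top")
uri.class     # => URI::HTTP
uri.port      # => 8080
uri.path      # => "/en/"
uri.query     # => "q=1"
uri.fragment  # => "top"

# Hypothetical helper: returns nil instead of raising on invalid input.
def safe_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

# A space is not allowed in a URI, so this cannot be parsed.
bad = safe_parse("http://exa mple.org/")
# => nil
```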


# File 'lib/extensions/uri/uri/common.rb', line 627

def self.parse(uri)
  DEFAULT_PARSER.parse(uri)
end

.regexp(schemes = nil) ⇒ Object

Synopsis

URI::regexp([match_schemes])

Args

match_schemes

Array of schemes. If given, the resulting regexp matches only URIs whose scheme is one of the match_schemes.

Description

Returns a Regexp object that matches URI-like strings. The Regexp object returned by this method contains an arbitrary number of capture groups (parentheses). Never rely on their number.

Usage

require 'uri'

# extract first URI from html_string
html_string.slice(URI.regexp)

# remove ftp URIs
html_string.gsub(URI.regexp(['ftp']), '')

# You should not rely on the number of parentheses
html_string.scan(URI.regexp) do |*matches|
  p $&
end
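A runnable variant of the usage above, on an assumed sample string in place of html_string. It reads the whole match through $& so that the number of capture groups never matters. Newer Ruby versions mark URI.regexp as obsolete, so this is a sketch for the version documented here.

```ruby
require 'uri'

text = "get ftp://ftp.example.org/pub or http://foo.example.org/bla today"

# slice returns the first substring matched by the regexp.
first = text.slice(URI.regexp)
# => "ftp://ftp.example.org/pub"

# Restricting the schemes lets gsub drop only the ftp URIs.
without_ftp = text.gsub(URI.regexp(["ftp"]), "")
# => "get  or http://foo.example.org/bla today"

# Use $& (the whole match) instead of numbered capture groups.
found = []
text.scan(URI.regexp) { found << $& }
```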


# File 'lib/extensions/uri/uri/common.rb', line 715

def self.regexp(schemes = nil)
  DEFAULT_PARSER.make_regexp(schemes)
end

.scheme_list ⇒ Object



# File 'lib/extensions/uri/uri/common.rb', line 540

def self.scheme_list
  @@schemes
end
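The method returns the Hash of registered schemes. As a sketch, assuming the keys are upper-case scheme names mapped to their URI classes (which is how the standard scheme files register themselves), it can be used to look up the class for a scheme:

```ruby
require 'uri'

schemes = URI.scheme_list

# Assumption: keys are upper-case scheme names, values are URI classes.
http_class = schemes["HTTP"]
# => URI::HTTP

has_ftp = schemes.key?("FTP")
# => true

# Look up the class that handles the scheme of a parsed URI.
https_class = schemes[URI.parse("https://example.org/").scheme.upcase]
# => URI::HTTPS
```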

.split(uri) ⇒ Object

Synopsis

URI::split(uri)

Args

uri

String with URI.

Description

Splits the string into the following parts and returns them as an array, in this order:

* Scheme
* Userinfo
* Host
* Port
* Registry
* Path
* Opaque
* Query
* Fragment

Usage

require 'uri'

p URI.split("http://www.ruby-lang.org/")
# => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
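Since the result is positional, destructuring it into named variables keeps the nine parts readable. A sketch with an assumed sample URI; note that the port comes back as a String here, not an Integer.

```ruby
require 'uri'

scheme, userinfo, host, port, registry, path, opaque, query, fragment =
  URI.split("http://user@www.ruby-lang.org:8080/en/?q=1#top")

scheme    # => "http"
userinfo  # => "user"
host      # => "www.ruby-lang.org"
port      # => "8080" (a String)
registry  # => nil
path      # => "/en/"
opaque    # => nil
query     # => "q=1"
fragment  # => "top"
```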


# File 'lib/extensions/uri/uri/common.rb', line 592

def self.split(uri)
  DEFAULT_PARSER.split(uri)
end