Module: URI

Extended by:
Escape
Includes:
REGEXP
Included in:
Generic
Defined in:
lib/uri.rb,
lib/uri/ftp.rb,
lib/uri/ldap.rb,
lib/uri/http.rb,
lib/uri/https.rb,
lib/uri/ldaps.rb,
lib/uri/common.rb,
lib/uri/mailto.rb,
lib/uri/generic.rb

Overview

uri/common.rb

Author

Akira Yamada

Revision

$Id: common.rb 44187 2013-12-13 23:22:41Z hsbt $

License

You can redistribute it and/or modify it under the same terms as Ruby.

See URI for general documentation

Defined Under Namespace

Modules: Escape, REGEXP, Util

Classes: BadURIError, Error, FTP, Generic, HTTP, HTTPS, InvalidComponentError, InvalidURIError, LDAP, LDAPS, MailTo, Parser

Constant Summary

VERSION_CODE =
'000911'.freeze
VERSION =
VERSION_CODE.scan(/../).collect{|n| n.to_i}.join('.').freeze
DEFAULT_PARSER =

URI::Parser.new

Parser.new
TBLENCWWWCOMP_ =
{}
TBLDECWWWCOMP_ =
{}
HTML5ASCIIINCOMPAT =
[Encoding::UTF_7, Encoding::UTF_16BE, Encoding::UTF_16LE,
Encoding::UTF_32BE, Encoding::UTF_32LE]
@@schemes =
{}

Class Method Summary

Methods included from Escape

escape, unescape

Class Method Details

.decode_www_form(str, enc = Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false) ⇒ Object

Decode URL-encoded form data from given str.

This decodes application/x-www-form-urlencoded data and returns an array of key-value arrays.

This follows url.spec.whatwg.org/#concept-urlencoded-parser, so it splits on & by default and does not treat ; as a separator.

ary = URI.decode_www_form("a=1&a=2&b=3")
p ary                   #=> [['a', '1'], ['a', '2'], ['b', '3']]
p ary.assoc('a').last   #=> '1'
p ary.assoc('b').last   #=> '3'
p ary.rassoc('2').first #=> 'a'
p Hash[ary]             #=> {"a"=>"2", "b"=>"3"}

See URI.decode_www_form_component, URI.encode_www_form

Raises:

  • (ArgumentError)
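The following is a minimal sketch of two behaviours described above: repeated keys are returned as separate pairs, and non-ASCII input raises ArgumentError. The query string and the variable names (`pairs`, `grouped`, `rejected`) are assumptions made up for illustration.

```ruby
require "uri"

# Hypothetical query string, chosen only for illustration.
pairs = URI.decode_www_form("tag=ruby&tag=uri&q=a+b%21")

# Collect repeated keys into arrays so that no value is lost
# (Hash[pairs] would keep only the last value per key).
grouped = pairs.each_with_object({}) do |(key, value), acc|
  (acc[key] ||= []) << value
end
p grouped

# Input that is not ASCII-only is rejected before any decoding happens.
rejected = nil
begin
  URI.decode_www_form("q=\u3042")
rescue ArgumentError => e
  rejected = e.message
end
p rejected
```

Grouping by hand like this is one option among several; the method itself always returns the flat array of pairs.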


# File 'lib/uri/common.rb', line 968

def self.decode_www_form(str, enc=Encoding::UTF_8, separator: '&', use__charset_: false, isindex: false)
  raise ArgumentError, "the input of #{self.name}.#{__method__} must be ASCII only string" unless str.ascii_only?
  ary = []
  return ary if str.empty?
  enc = Encoding.find(enc)
  str.b.each_line(separator) do |string|
    string.chomp!(separator)
    key, sep, val = string.partition('=')
    if isindex
      if sep.empty?
        val = key
        key = ''
      end
      isindex = false
    end

    if use__charset_ and key == '_charset_' and e = get_encoding(val)
      enc = e
      use__charset_ = false
    end

    key.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    if val
      val.gsub!(/\+|%\h\h/, TBLDECWWWCOMP_)
    else
      val = ''
    end

    ary << [key, val]
  end
  ary.each do |k, v|
    k.force_encoding(enc)
    k.scrub!
    v.force_encoding(enc)
    v.scrub!
  end
  ary
end

.decode_www_form_component(str, enc = Encoding::UTF_8) ⇒ Object

Decode given str of URL-encoded form data.

This decodes + to SP (ASCII space).

See URI.encode_www_form_component, URI.decode_www_form

Raises:

  • (ArgumentError)
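As a small illustration (the input strings are assumptions chosen for the example), the sketch below decodes one component and shows the ArgumentError raised for a stray percent sign.

```ruby
require "uri"

# "+" becomes a space and each %XX sequence becomes the corresponding byte.
decoded = URI.decode_www_form_component("a+b%21%3F")
p decoded  # => "a b!?"

# A "%" that is not followed by two hex digits is rejected.
error_message = nil
begin
  URI.decode_www_form_component("100%")
rescue ArgumentError => e
  error_message = e.message
end
p error_message
```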


# File 'lib/uri/common.rb', line 900

def self.decode_www_form_component(str, enc=Encoding::UTF_8)
  raise ArgumentError, "invalid %-encoding (#{str})" unless /\A[^%]*(?:%\h\h[^%]*)*\z/ =~ str
  str.b.gsub(/\+|%\h\h/, TBLDECWWWCOMP_).force_encoding(enc)
end

.encode_www_form(enum, enc = nil) ⇒ Object

Generate URL-encoded form data from given enum.

This generates application/x-www-form-urlencoded data, as defined in HTML5, from the given Enumerable object.

This internally uses URI.encode_www_form_component(str).

This method doesn’t convert the encoding of the given items, so convert them before calling this method if you want to send data in an encoding other than the original one, or as mixed-encoding data. (Strings encoded in an HTML5 ASCII-incompatible encoding are converted to UTF-8.)

This method doesn’t handle files. When you send a file, use multipart/form-data.

This follows url.spec.whatwg.org/#concept-urlencoded-serializer

URI.encode_www_form([["q", "ruby"], ["lang", "en"]])
#=> "q=ruby&lang=en"
URI.encode_www_form("q" => "ruby", "lang" => "en")
#=> "q=ruby&lang=en"
URI.encode_www_form("q" => ["ruby", "perl"], "lang" => "en")
#=> "q=ruby&q=perl&lang=en"
URI.encode_www_form([["q", "ruby"], ["q", "perl"], ["lang", "en"]])
#=> "q=ruby&q=perl&lang=en"

See URI.encode_www_form_component, URI.decode_www_form
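The sketch below complements the examples above with a nil value and a round trip through URI.decode_www_form. The parameter names are hypothetical. Note that the round trip is lossy for nil: a key encoded without "=" comes back with an empty string as its value.

```ruby
require "uri"

# Hypothetical parameters: a nil value, a value with a space, and an array value.
params = [["flag", nil], ["q", "a b"], ["lang", ["en", "ja"]]]

encoded = URI.encode_www_form(params)
p encoded  # => "flag&q=a+b&lang=en&lang=ja"

# Decoding yields flat pairs; the nil value is now an empty string.
decoded = URI.decode_www_form(encoded)
p decoded  # => [["flag", ""], ["q", "a b"], ["lang", "en"], ["lang", "ja"]]
```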



# File 'lib/uri/common.rb', line 932

def self.encode_www_form(enum, enc=nil)
  enum.map do |k,v|
    if v.nil?
      encode_www_form_component(k, enc)
    elsif v.respond_to?(:to_ary)
      v.to_ary.map do |w|
        str = encode_www_form_component(k, enc)
        unless w.nil?
          str << '='
          str << encode_www_form_component(w, enc)
        end
      end.join('&')
    else
      str = encode_www_form_component(k, enc)
      str << '='
      str << encode_www_form_component(v, enc)
    end
  end.join('&')
end

.encode_www_form_component(str, enc = nil) ⇒ Object

Encode given str to URL-encoded form data.

This method doesn’t convert *, -, ., 0-9, A-Z, _, a-z, but does convert SP (ASCII space) to + and converts others to %XX.

If enc is given, str is converted to that encoding before percent-encoding.

This is an implementation of www.w3.org/TR/html5/forms.html#url-encoded-form-data

See URI.decode_www_form_component, URI.encode_www_form
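To make the character rules and the enc argument concrete, here is a short sketch. The sample strings are assumptions picked for illustration; "\u00E9" is the letter e with an acute accent.

```ruby
require "uri"

# Unreserved characters (* - . _ and alphanumerics) pass through,
# a space becomes "+", and everything else becomes %XX.
plain = URI.encode_www_form_component("a b&c=d*e-f.g_h")
p plain   # => "a+b%26c%3Dd*e-f.g_h"

# Without enc, the bytes of the string's own encoding (UTF-8 here) are escaped.
utf8 = URI.encode_www_form_component("caf\u00E9")
p utf8    # => "caf%C3%A9"

# With enc, the string is transcoded first, so the single Latin-1 byte is escaped.
latin1 = URI.encode_www_form_component("caf\u00E9", Encoding::ISO_8859_1)
p latin1  # => "caf%E9"
```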



# File 'lib/uri/common.rb', line 882

def self.encode_www_form_component(str, enc=nil)
  str = str.to_s.dup
  if str.encoding != Encoding::ASCII_8BIT
    if enc && enc != Encoding::ASCII_8BIT
      str.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
      str.encode!(enc, fallback: ->(x){"&#{x.ord};"})
    end
    str.force_encoding(Encoding::ASCII_8BIT)
  end
  str.gsub!(/[^*\-.0-9A-Z_a-z]/, TBLENCWWWCOMP_)
  str.force_encoding(Encoding::US_ASCII)
end

.extract(str, schemes = nil, &block) ⇒ Object

Synopsis

URI::extract(str[, schemes][,&blk])

Args

str

String to extract URIs from.

schemes

Limit URI matching to specific schemes.

Description

Extracts URIs from a string. If a block is given, yields each matched URI and returns nil; otherwise returns an array of the matches.

Usage

require "uri"

URI.extract("text here http://foo.example.org/bla and here mailto:[email protected] and here also.")
# => ["http://foo.example.org/bla", "mailto:[email protected]"]
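A further sketch, assuming a made-up input string, shows the schemes filter and the block form. Later Ruby releases mark URI.extract as obsolete, so treat this as an illustration of the behaviour documented here rather than a recommendation.

```ruby
require "uri"

# Hypothetical text containing one http URI and one ftp URI.
text = "see http://a.example/x and ftp://b.example/y for details"

# Restrict matching to the http scheme.
http_only = URI.extract(text, ["http"])
p http_only

# With a block, each match is yielded and the method itself returns nil.
seen = []
result = URI.extract(text) { |uri| seen << uri }
p seen
p result
```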


# File 'lib/uri/common.rb', line 812

def self.extract(str, schemes = nil, &block)
  DEFAULT_PARSER.extract(str, schemes, &block)
end

.join(*str) ⇒ Object

Synopsis

URI::join(str[, str, ...])

Args

str

String(s) to work with

Description

Joins URIs.

Usage

require 'uri'

p URI.join("http://example.com/","main.rbx")
# => #<URI::HTTP:0x2022ac02 URL:http://example.com/main.rbx>

p URI.join('http://example.com', 'foo')
# => #<URI::HTTP:0x01ab80a0 URL:http://example.com/foo>

p URI.join('http://example.com', '/foo', '/bar')
# => #<URI::HTTP:0x01aaf0b0 URL:http://example.com/bar>

p URI.join('http://example.com', '/foo', 'bar')
# => #<URI::HTTP:0x801a92af0 URL:http://example.com/bar>

p URI.join('http://example.com', '/foo/', 'bar')
# => #<URI::HTTP:0x80135a3a0 URL:http://example.com/foo/bar>
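The examples above differ mainly in whether the base path ends in a slash. The sketch below, using assumed URLs, isolates that effect: a relative reference replaces the last path segment of the base unless the base ends with "/".

```ruby
require "uri"

# Base without a trailing slash: "b" is replaced by "c".
same_dir = URI.join("http://example.com/a/b", "c").to_s
p same_dir  # => "http://example.com/a/c"

# Base with a trailing slash: "c" is appended under "b/".
sub_dir = URI.join("http://example.com/a/b/", "c").to_s
p sub_dir   # => "http://example.com/a/b/c"

# ".." climbs one directory before appending.
parent = URI.join("http://example.com/a/b/", "../c").to_s
p parent    # => "http://example.com/a/c"

# An absolute URI as a later argument replaces the base entirely.
absolute = URI.join("http://example.com/a/b/", "http://other.example/x").to_s
p absolute  # => "http://other.example/x"
```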


# File 'lib/uri/common.rb', line 784

def self.join(*str)
  DEFAULT_PARSER.join(*str)
end

.parse(uri) ⇒ Object

Synopsis

URI::parse(uri_str)

Args

uri_str

String with URI.

Description

Creates an instance of one of URI’s subclasses from the string.

Raises

URI::InvalidURIError

Raised if the given URI is malformed.

Usage

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")
p uri
# => #<URI::HTTP:0x202281be URL:http://www.ruby-lang.org/>
p uri.scheme
# => "http"
p uri.host
# => "www.ruby-lang.org"
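As a sketch of the subclass selection and the error case described above: the scheme determines which class is instantiated, and malformed input raises URI::InvalidURIError. The sample URIs and the helper `safe_parse` are hypothetical, not part of the library.

```ruby
require "uri"

# The scheme selects the subclass; unknown schemes fall back to URI::Generic.
classes = %w[http://example.org/ ftp://example.org/f.txt foo:bar].map do |str|
  URI.parse(str).class
end
p classes  # => [URI::HTTP, URI::FTP, URI::Generic]

# Hypothetical helper: returns nil instead of raising on malformed input.
def safe_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

bad = safe_parse("http://bad host/")   # a space is not allowed in a URI
good = safe_parse("https://example.org/")
p bad
p good.port  # => 443 (default port for https)
```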


# File 'lib/uri/common.rb', line 746

def self.parse(uri)
  DEFAULT_PARSER.parse(uri)
end

.regexp(schemes = nil) ⇒ Object

Synopsis

URI::regexp([match_schemes])

Args

match_schemes

Array of schemes. If given, the resulting regexp matches only URIs whose scheme is one of match_schemes.

Description

Returns a Regexp object that matches URI-like strings. The returned Regexp contains an arbitrary number of capture groups (parentheses); never rely on their number.

Usage

require 'uri'

# extract first URI from html_string
html_string.slice(URI.regexp)

# remove ftp URIs
html_string.sub(URI.regexp(['ftp']), '')

# You should not rely on the number of parentheses
html_string.scan(URI.regexp) do |*matches|
  p $&
end
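The usage fragments above assume an existing html_string. A self-contained sketch with a made-up string follows; it reads whole matches with $& instead of relying on capture groups. Later Ruby releases mark URI.regexp as obsolete, so this only illustrates the behaviour documented here.

```ruby
require "uri"

# Hypothetical input text for illustration.
html_string = "Docs at http://example.org/a and ftp://example.org/b here"

# First URI of any scheme.
first = html_string.slice(URI.regexp)
p first

# Remove the first ftp URI only.
without_ftp = html_string.sub(URI.regexp(["ftp"]), "")
p without_ftp

# Collect whole matches via $&; the capture groups are ignored on purpose.
found = []
html_string.scan(URI.regexp) { found << $& }
p found
```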


# File 'lib/uri/common.rb', line 847

def self.regexp(schemes = nil)
  DEFAULT_PARSER.make_regexp(schemes)
end

.scheme_list ⇒ Object

Returns a Hash of the defined schemes.
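A brief sketch of how the returned Hash can be inspected. The keys are upper-case scheme names and the values are the corresponding classes; the exact set of registered schemes depends on the Ruby version, so only a few well-known entries are checked here.

```ruby
require "uri"

schemes = URI.scheme_list

# Keys are upper-case scheme names.
p schemes.keys.sort

# Values are the classes used for URIs with that scheme.
http_class = schemes["HTTP"]
p http_class  # => URI::HTTP

# An unregistered scheme has no entry (hypothetical scheme name).
unknown = schemes["NOSUCHSCHEME"]
p unknown     # => nil
```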



# File 'lib/uri/common.rb', line 659

def self.scheme_list
  @@schemes
end

.split(uri) ⇒ Object

Synopsis

URI::split(uri)

Args

uri

String with URI.

Description

Splits the string into the following parts and returns them as an array:

* Scheme
* Userinfo
* Host
* Port
* Registry
* Path
* Opaque
* Query
* Fragment

Usage

require 'uri'

p URI.split("http://www.ruby-lang.org/")
# => ["http", nil, "www.ruby-lang.org", nil, nil, "/", nil, nil, nil]
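Because the result is positional, destructuring it into named variables can make the nine parts easier to follow. The sketch below uses an assumed URI that fills more of the slots; note that the port comes back as a String here, not an Integer.

```ruby
require "uri"

# Hypothetical URI with userinfo, port, query and fragment present.
scheme, userinfo, host, port, registry, path, opaque, query, fragment =
  URI.split("http://user@example.org:8080/p?q=1#f")

p scheme    # => "http"
p userinfo  # => "user"
p host      # => "example.org"
p port      # => "8080"
p registry  # => nil
p path      # => "/p"
p opaque    # => nil
p query     # => "q=1"
p fragment  # => "f"
```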


# File 'lib/uri/common.rb', line 711

def self.split(uri)
  DEFAULT_PARSER.split(uri)
end