Class: Wgit::Url
Overview
Class modeling/serialising a web based HTTP URL. Can be an internal/relative link e.g. "about.html" or an absolute URL e.g. "http://www.google.co.uk". Is a subclass of String and uses URI and addressable/uri internally for parsing.
Most of the methods in this class return new Wgit::Url instances, making the method calls chainable e.g. url.omit_base.omit_fragment etc. The methods also try to be idempotent where possible.
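The chainable, idempotent style described above can be illustrated with a minimal sketch. SketchUrl is a hypothetical stand-in written for this example, not the Wgit::Url class itself; it only borrows the slash-trimming logic shown later on this page.

```ruby
# Hypothetical stand-in for Wgit::Url: a String subclass whose helpers
# return new instances, so calls can be chained.
class SketchUrl < String
  # Drops one leading slash; returns self unchanged if there is none.
  def omit_leading_slash
    start_with?("/") ? SketchUrl.new(self[1..]) : self
  end

  # Drops one trailing slash; returns self unchanged if there is none.
  def omit_trailing_slash
    end_with?("/") ? SketchUrl.new(chop) : self
  end
end

url = SketchUrl.new("/about/")
trimmed = url.omit_leading_slash.omit_trailing_slash
# Calling the chain again changes nothing, which is the idempotent part.
again = trimmed.omit_leading_slash.omit_trailing_slash
```

The receiver is never modified, so url still holds "/about/" after the chain runs.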
Constant Summary
Constants included from Assertable
Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::MIXED_ENUMERABLE_MSG, Assertable::NON_ENUMERABLE_MSG
Instance Attribute Summary
-
#crawl_duration ⇒ Object
The duration of the crawl for this Url (in seconds).
-
#crawled ⇒ Object
(also: #crawled?)
Whether or not the Url has been crawled.
-
#date_crawled ⇒ Object
The Time stamp of when this Url was crawled.
-
#redirects ⇒ Object
Record the redirects from the initial Url to the final Url.
Class Method Summary
-
.parse(obj) ⇒ Wgit::Url
Initialises a new Wgit::Url instance from a String or subclass of String e.g. Wgit::Url.
-
.parse?(obj) ⇒ Wgit::Url?
Returns a Wgit::Url instance from Wgit::Url.parse, or nil if obj cannot be parsed successfully e.g. the String is invalid.
Instance Method Summary
-
#absolute? ⇒ Boolean
(also: #is_absolute?)
Returns true if self is an absolute Url; false if relative.
-
#concat(other) ⇒ String
Overrides String#concat, which would otherwise return a Wgit::Url object, to return a String instead.
-
#fragment? ⇒ Boolean
(also: #is_fragment?)
Returns true if self is a URL fragment e.g. #top.
-
#index? ⇒ Boolean
(also: #is_index?)
Returns true if self equals '/' a.k.a. index.
-
#initialize(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil) ⇒ Url
constructor
Initializes a new instance of Wgit::Url which models a web based HTTP URL.
-
#inspect ⇒ String
Overrides String#inspect to distinguish this Url from a String.
-
#invalid? ⇒ Boolean
Returns true if self is an invalid (e.g. relative) HTTP URL.
-
#join(other) ⇒ Wgit::Url
Joins self and other together before returning a new Url.
-
#make_absolute(doc) ⇒ Wgit::Url
Returns an absolute form of self within the context of doc.
-
#normalize ⇒ Wgit::Url
Normalizes/escapes self and returns a new Wgit::Url.
-
#omit(*components) ⇒ Wgit::Url
Omits the given URL components from self and returns a new Wgit::Url.
-
#omit_base ⇒ Wgit::Url
Returns a new Wgit::Url with the base (scheme and host) removed e.g. given http://google.com/search?q=something#about, search?q=something#about is returned.
-
#omit_fragment ⇒ Wgit::Url
Returns a new Wgit::Url with the fragment portion removed e.g. given http://google.com/search#about, http://google.com/search is returned.
-
#omit_leading_slash ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a leading slash.
-
#omit_origin ⇒ Wgit::Url
Returns a new Wgit::Url with the origin (base + port) removed e.g. given http://google.com:81/search?q=something#about, search?q=something#about is returned.
-
#omit_query ⇒ Wgit::Url
Returns a new Wgit::Url with the query string portion removed e.g. given http://google.com/search?q=hello, http://google.com/search is returned.
-
#omit_slashes ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a leading or trailing slash.
-
#omit_trailing_slash ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a trailing slash.
-
#prefix_scheme(scheme = :http) ⇒ Wgit::Url
Returns self having prefixed a scheme/protocol.
-
#query? ⇒ Boolean
(also: #is_query?)
Returns true if self is a URL query string e.g. ?q=hello.
-
#redirects_journey ⇒ Array<Wgit::Url>
Returns the Wgit::Urls starting with the originally requested Url to be crawled, followed by each redirected-to Url, and finishing with the final crawled Url.
-
#relative?(opts = {}) ⇒ Boolean
(also: #is_relative?)
Returns true if self is a relative Url; false if absolute.
-
#replace(new_url) ⇒ String
Overrides String#replace setting the new_url @uri and String value.
-
#scheme_relative? ⇒ Boolean
(also: #is_scheme_relative?)
Returns true if self starts with '//' a.k.a. a scheme/protocol relative path.
-
#to_addressable_uri ⇒ Addressable::URI
Returns the Addressable::URI object for this URL.
-
#to_base ⇒ Wgit::Url?
(also: #base)
Returns only the base of this URL e.g. the protocol scheme and host combined.
-
#to_brand ⇒ Wgit::Url?
(also: #brand)
Returns a new Wgit::Url containing just the brand of this URL e.g. given http://www.google.co.uk/about.html, google is returned.
-
#to_domain ⇒ Wgit::Url?
(also: #domain)
Returns a new Wgit::Url containing just the domain of this URL e.g. given http://www.google.co.uk/about.html, google.co.uk is returned.
-
#to_endpoint ⇒ Wgit::Url
(also: #endpoint)
Returns the endpoint of this URL e.g. the bit after the host with any slashes included.
-
#to_extension ⇒ Wgit::Url?
(also: #extension)
Returns a new Wgit::Url containing just the file extension of this URL e.g. given http://google.com/about.html, html is returned.
-
#to_fragment ⇒ Wgit::Url?
(also: #fragment)
Returns a new Wgit::Url containing just the fragment string of this URL e.g. given http://google.com#about, about is returned.
-
#to_h ⇒ Hash
Returns a Hash containing this Url's instance vars excluding @uri.
-
#to_host ⇒ Wgit::Url?
(also: #host)
Returns a new Wgit::Url containing just the host of this URL e.g. given http://www.google.co.uk/about.html, www.google.co.uk is returned.
-
#to_origin ⇒ Wgit::Url?
(also: #origin)
Returns only the origin of this URL e.g. the protocol scheme, host and port combined.
-
#to_password ⇒ Wgit::Url?
(also: #password)
Returns a new Wgit::Url containing just the password string of this URL e.g. given a URL such as http://me:pass1@example.com, pass1 is returned.
-
#to_path ⇒ Wgit::Url?
(also: #path)
Returns the path of this URL e.g. the bit after the host without slashes.
-
#to_port ⇒ Wgit::Url?
(also: #port)
Returns a new Wgit::Url containing just the port of this URL e.g. given http://www.google.co.uk:443/about.html, '443' is returned.
-
#to_query ⇒ Wgit::Url?
(also: #query)
Returns a new Wgit::Url containing just the query string of this URL e.g. given http://google.com?q=foo&bar=1, 'q=foo&bar=1' is returned.
-
#to_query_hash(symbolize_keys: false) ⇒ Hash<String | Symbol, String>
(also: #query_hash)
Returns a Hash containing just the query string parameters of this URL e.g. given http://google.com?q=ruby, { 'q' => 'ruby' } is returned.
-
#to_scheme ⇒ Wgit::Url?
(also: #scheme)
Returns a new Wgit::Url containing just the scheme of this URL e.g. given http://www.google.co.uk, http is returned.
-
#to_sub_domain ⇒ Wgit::Url?
(also: #sub_domain)
Returns a new Wgit::Url containing just the sub domain of this URL e.g. given http://scripts.dev.google.com, scripts.dev is returned.
-
#to_uri ⇒ URI::HTTP, URI::HTTPS
(also: #uri)
Returns a normalised URI object for this URL.
-
#to_url ⇒ Wgit::Url
(also: #url)
Returns self.
-
#to_user ⇒ Wgit::Url?
(also: #user)
Returns a new Wgit::Url containing just the username string of this URL e.g. given a URL such as http://me:pass1@example.com, me is returned.
-
#valid? ⇒ Boolean
(also: #is_valid?)
Returns true if self is a valid and absolute HTTP URL.
Methods included from Assertable
#assert_arr_types, #assert_common_arr_types, #assert_required_keys, #assert_respond_to, #assert_types
Constructor Details
#initialize(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil) ⇒ Url
Initializes a new instance of Wgit::Url which models a web based HTTP URL.
# File 'lib/wgit/url.rb', line 48
def initialize(
  url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil
)
  # Init from a URL String.
  if url_or_obj.is_a?(String)
    url = url_or_obj.to_s
  # Else init from a Hash like object e.g. database object.
  else
    obj = url_or_obj
    assert_respond_to(obj, :fetch)

    url = obj.fetch("url") # Should always be present.
    crawled = obj.fetch("crawled", false)
    date_crawled = obj.fetch("date_crawled", nil)
    crawl_duration = obj.fetch("crawl_duration", nil)
    redirects = obj.fetch("redirects", {})
  end

  @uri = Addressable::URI.parse(url)
  @crawled = crawled
  @date_crawled = date_crawled
  @crawl_duration = crawl_duration
  @redirects = redirects || {}

  super(url)
end
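The constructor accepts either a String or a Hash-like object that responds to fetch. The sketch below mirrors only that input handling in plain Ruby; sketch_url_fields is a hypothetical helper, and it returns a Hash rather than building a Wgit::Url.

```ruby
# Hypothetical helper mirroring the constructor's two input styles.
def sketch_url_fields(url_or_obj)
  if url_or_obj.is_a?(String)
    # A plain String carries no crawl state, so defaults apply.
    { "url" => url_or_obj.to_s, "crawled" => false, "redirects" => {} }
  else
    unless url_or_obj.respond_to?(:fetch)
      raise ArgumentError, "expected a String or an object responding to #fetch"
    end

    {
      "url" => url_or_obj.fetch("url"), # Raises KeyError if missing.
      "crawled" => url_or_obj.fetch("crawled", false),
      "redirects" => url_or_obj.fetch("redirects", {})
    }
  end
end

from_string = sketch_url_fields("http://example.com")
from_record = sketch_url_fields({ "url" => "http://example.com", "crawled" => true })
```

Using fetch without a default for "url" makes a missing URL fail loudly, while the optional fields fall back to defaults.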
Instance Attribute Details
#crawl_duration ⇒ Object
The duration of the crawl for this Url (in seconds).
# File 'lib/wgit/url.rb', line 29
def crawl_duration
  @crawl_duration
end
#crawled ⇒ Object Also known as: crawled?
Whether or not the Url has been crawled. A custom crawled= method is provided by this class.
# File 'lib/wgit/url.rb', line 23
def crawled
  @crawled
end
#date_crawled ⇒ Object
The Time stamp of when this Url was crawled.
# File 'lib/wgit/url.rb', line 26
def date_crawled
  @date_crawled
end
#redirects ⇒ Object
Record the redirects from the initial Url to the final Url.
# File 'lib/wgit/url.rb', line 32
def redirects
  @redirects
end
Class Method Details
.parse(obj) ⇒ Wgit::Url
Initialises a new Wgit::Url instance from a String or subclass of String e.g. Wgit::Url. Any other obj type will raise an error.
If obj is already a Wgit::Url then it will be returned as is to maintain its state. Otherwise, a new Wgit::Url is instantiated and returned. This differs from Wgit::Url.new which always instantiates a new Wgit::Url.
Note: Only use this method if you are allowing obj to be either a String or a Wgit::Url whose state you want to preserve e.g. when passing a URL to a crawl method which might redirect (calling Wgit::Url#replace). If you're sure of the type or don't care about preserving the state of the Wgit::Url, use Wgit::Url.new instead.
# File 'lib/wgit/url.rb', line 91
def self.parse(obj)
  raise "Can only parse if obj#is_a?(String)" unless obj.is_a?(String)

  # Return a Wgit::Url as is to avoid losing state e.g. date_crawled etc.
  obj.is_a?(Wgit::Url) ? obj : new(obj)
end
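The difference between parse and new is easiest to see with object identity. The sketch below uses a hypothetical SketchUrl class (a String subclass with one attribute) in place of Wgit::Url, so it runs without the gem.

```ruby
# Hypothetical stand-in for Wgit::Url carrying a little crawl state.
class SketchUrl < String
  attr_accessor :date_crawled

  # Returns obj itself when it is already a SketchUrl, preserving state;
  # otherwise wraps the String in a new instance.
  def self.parse(obj)
    raise ArgumentError, "Can only parse a String" unless obj.is_a?(String)

    obj.is_a?(SketchUrl) ? obj : new(obj)
  end
end

crawled = SketchUrl.new("http://example.com")
crawled.date_crawled = Time.now

same = SketchUrl.parse(crawled)  # Same object, state kept.
copy = SketchUrl.new(crawled)    # New object, state not carried over.
fresh = SketchUrl.parse("http://example.com") # Plain String gets wrapped.
```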
.parse?(obj) ⇒ Wgit::Url?
Returns a Wgit::Url instance from Wgit::Url.parse, or nil if obj cannot be parsed successfully e.g. the String is invalid.
Use this method when you can't guarantee that obj is parsable as a URL. See Wgit::Url.parse for more information.
# File 'lib/wgit/url.rb', line 107
def self.parse?(obj)
  parse(obj)
rescue Addressable::URI::InvalidURIError
  Wgit.logger.debug("Wgit::Url.parse?('#{obj}') exception: \
Addressable::URI::InvalidURIError")
  nil
end
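The rescue-and-return-nil pattern used by parse? can be sketched with Ruby's standard library. This example swaps in URI.parse and URI::InvalidURIError for the Addressable equivalents, and sketch_parse_or_nil is a hypothetical name.

```ruby
require "uri"

# Hypothetical helper: returns a parsed URI, or nil when parsing fails.
def sketch_parse_or_nil(str)
  URI.parse(str)
rescue URI::InvalidURIError
  nil
end

good = sketch_parse_or_nil("http://example.com/about")
bad = sketch_parse_or_nil("http://exa mple.com") # Space makes this invalid.
```

Callers can then branch on nil instead of wrapping every call in their own rescue.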
Instance Method Details
#absolute? ⇒ Boolean Also known as: is_absolute?
Returns true if self is an absolute Url; false if relative.
# File 'lib/wgit/url.rb', line 265
def absolute?
  @uri.absolute?
end
#concat(other) ⇒ String
Overrides String#concat, which would otherwise return a Wgit::Url object, to return a String instead. The result is the same as joining the two values with String#+. If you want to join two Urls, use the Wgit::Url#join method.
# File 'lib/wgit/url.rb', line 139
def concat(other)
  to_s.concat(other.to_s)
end
#fragment? ⇒ Boolean Also known as: is_fragment?
Returns true if self is a URL fragment e.g. #top etc. Note this shouldn't be used to determine if self contains a fragment.
# File 'lib/wgit/url.rb', line 687
def fragment?
  start_with?("#")
end
#index? ⇒ Boolean Also known as: is_index?
Returns true if self equals '/' a.k.a. index.
# File 'lib/wgit/url.rb', line 694
def index?
  self == "/"
end
#inspect ⇒ String
Overrides String#inspect to distinguish this Url from a String.
# File 'lib/wgit/url.rb', line 118
def inspect
  "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
end
#invalid? ⇒ Boolean
Returns true if self is an invalid (e.g. relative) HTTP URL. See Wgit::Url#valid? for the inverse (and more information).
# File 'lib/wgit/url.rb', line 285
def invalid?
  !valid?
end
#join(other) ⇒ Wgit::Url
Joins self and other together before returning a new Url. Self is not modified. other must be relative, otherwise an error is raised. A leading slash on other is dropped, and a '/' separator is inserted unless self already ends with one or other starts with '#', '?' or '.'.
# File 'lib/wgit/url.rb', line 296
def join(other)
  other = Wgit::Url.new(other)
  raise "other must be relative" unless other.relative?

  other = other.omit_leading_slash
  separator = %w[# ? .].include?(other[0]) ? "" : "/"
  separator = "" if end_with?("/")
  joined = self + separator + other

  Wgit::Url.new(joined)
end
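The separator choice in join can be tried out on plain Strings. The sketch below is a simplified approximation: sketch_join is a hypothetical helper, and its relative check is a basic scheme regexp rather than Wgit::Url#relative?.

```ruby
# Hypothetical helper approximating the separator logic of #join.
def sketch_join(base, other)
  # Simplified assumption: anything starting with "scheme://" is absolute.
  if other.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
    raise ArgumentError, "other must be relative"
  end

  other = other.delete_prefix("/")
  # No slash before a fragment, query string or dot-prefixed part.
  separator = %w[# ? .].include?(other[0]) ? "" : "/"
  # Avoid a double slash when base already ends with one.
  separator = "" if base.end_with?("/")

  base + separator + other
end

page = sketch_join("http://example.com", "/about")
search = sketch_join("http://example.com/search", "?q=ruby")
anchor = sketch_join("http://example.com/page", "#top")
slashed = sketch_join("http://example.com/", "about")
```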
#make_absolute(doc) ⇒ Wgit::Url
Returns an absolute form of self within the context of doc. Doesn't modify the receiver.
If self is absolute then it's returned as is, making this method idempotent. The doc's <base> element is used if present, otherwise doc.url is used as the base; which is joined with self.
Typically used to build an absolute link obtained from a document.
# File 'lib/wgit/url.rb', line 336
def make_absolute(doc)
  assert_type(doc, Wgit::Document)
  raise "Cannot make absolute when Document @url is not valid" \
    unless doc.url.valid?

  return prefix_scheme(doc.url.to_scheme&.to_sym) if scheme_relative?

  absolute? ? self : doc.base_url(link: self).join(self)
end
#normalize ⇒ Wgit::Url
Normalizes/escapes self and returns a new Wgit::Url. Self isn't modified. This should be used before GET'ing the url, in case it has IRI chars.
# File 'lib/wgit/url.rb', line 312
def normalize
  Wgit::Url.new(@uri.normalize.to_s)
end
#omit(*components) ⇒ Wgit::Url
Omits the given URL components from self and returns a new Wgit::Url.
Calls Addressable::URI#omit underneath and creates a new Wgit::Url from the output. See the Addressable::URI docs for more information.
# File 'lib/wgit/url.rb', line 583
def omit(*components)
  omitted = @uri.omit(*components)
  Wgit::Url.new(omitted.to_s)
end
#omit_base ⇒ Wgit::Url
Returns a new Wgit::Url with the base (scheme and host) removed e.g. Given http://google.com/search?q=something#about, search?q=something#about is returned. If relative and base isn't present then self is returned. Only a leading slash is stripped from the return value.
# File 'lib/wgit/url.rb', line 622
def omit_base
  base_url = to_base
  omit_base = base_url ? gsub(base_url, "") : self

  return self if ["", "/"].include?(omit_base)

  Wgit::Url.new(omit_base).omit_leading_slash
end
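The base-stripping steps can be followed on plain Strings. This sketch is an approximation under stated assumptions: sketch_omit_base is a hypothetical helper, it uses the standard library's URI in place of Addressable, and it removes only the first occurrence of the base.

```ruby
require "uri"

# Hypothetical helper approximating #omit_base on a plain String.
def sketch_omit_base(url)
  uri = URI.parse(url)
  # No scheme or host means there is no base to remove.
  return url unless uri.scheme && uri.host

  remainder = url.sub("#{uri.scheme}://#{uri.host}", "")
  # Mirror the source: if nothing useful remains, return the input as is.
  return url if ["", "/"].include?(remainder)

  remainder.delete_prefix("/")
end

stripped = sketch_omit_base("http://google.com/search?q=something#about")
relative = sketch_omit_base("about.html")
bare = sketch_omit_base("http://google.com")
```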
#omit_fragment ⇒ Wgit::Url
Returns a new Wgit::Url with the fragment portion removed e.g. Given http://google.com/search#about, http://google.com/search is returned. Self is returned as is if no fragment is present. A URL consisting of only a fragment e.g. '#about' will return an empty URL. This method assumes that the fragment is correctly placed at the very end of the URL.
# File 'lib/wgit/url.rb', line 668
def omit_fragment
  fragment = to_fragment
  omit_fragment = fragment ? gsub("##{fragment}", "") : self

  Wgit::Url.new(omit_fragment)
end
#omit_leading_slash ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a leading slash. Is idempotent: if there is no leading slash, self is returned unchanged.
# File 'lib/wgit/url.rb', line 593
def omit_leading_slash
  start_with?("/") ? Wgit::Url.new(self[1..]) : self
end
#omit_origin ⇒ Wgit::Url
Returns a new Wgit::Url with the origin (base + port) removed e.g. Given http://google.com:81/search?q=something#about, search?q=something#about is returned. If relative and base isn't present then self is returned. Only a leading slash is stripped from the return value.
# File 'lib/wgit/url.rb', line 637
def omit_origin
  origin = to_origin
  omit_origin = origin ? gsub(origin, "") : self

  return self if ["", "/"].include?(omit_origin)

  Wgit::Url.new(omit_origin).omit_leading_slash
end
#omit_query ⇒ Wgit::Url
Returns a new Wgit::Url with the query string portion removed e.g. Given http://google.com/search?q=hello, http://google.com/search is returned. Self is returned as is if no query string is present. A URL consisting of only a query string e.g. '?q=hello' will return an empty URL.
# File 'lib/wgit/url.rb', line 653
def omit_query
  query = to_query
  omit_query_string = query ? gsub("?#{query}", "") : self

  Wgit::Url.new(omit_query_string)
end
#omit_slashes ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a leading or trailing slash. Is idempotent: if no such slashes are present, self is returned unchanged.
# File 'lib/wgit/url.rb', line 611
def omit_slashes
  omit_leading_slash
    .omit_trailing_slash
end
#omit_trailing_slash ⇒ Wgit::Url
Returns a new Wgit::Url containing self without a trailing slash. Is idempotent: if there is no trailing slash, self is returned unchanged.
# File 'lib/wgit/url.rb', line 602
def omit_trailing_slash
  end_with?("/") ? Wgit::Url.new(chop) : self
end
#prefix_scheme(scheme = :http) ⇒ Wgit::Url
Returns self having prefixed a scheme/protocol. Doesn't modify receiver. Returns self even if absolute (with scheme); therefore is idempotent.
# File 'lib/wgit/url.rb', line 351
def prefix_scheme(scheme = :http)
  unless %i[http https].include?(scheme)
    raise "scheme must be :http or :https, not :#{scheme}"
  end

  return self if absolute? && !scheme_relative?

  separator = scheme_relative? ? "" : "//"
  Wgit::Url.new("#{scheme}:#{separator}#{self}")
end
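The three cases handled by prefix_scheme (no scheme, scheme-relative, already absolute) can be sketched on plain Strings. sketch_prefix_scheme is a hypothetical helper, and its "already has a scheme" test is a simple regexp assumed for illustration rather than the library's absolute? check.

```ruby
# Hypothetical helper approximating #prefix_scheme on a plain String.
def sketch_prefix_scheme(url, scheme = :http)
  unless %i[http https].include?(scheme)
    raise ArgumentError, "scheme must be :http or :https, not :#{scheme}"
  end

  # Simplified assumption: a leading "scheme://" means already absolute.
  return url if url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

  # A scheme-relative URL already starts with "//".
  separator = url.start_with?("//") ? "" : "//"
  "#{scheme}:#{separator}#{url}"
end

plain = sketch_prefix_scheme("example.com")
scheme_relative = sketch_prefix_scheme("//example.com", :https)
unchanged = sketch_prefix_scheme("https://example.com")
```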
#query? ⇒ Boolean Also known as: is_query?
Returns true if self is a URL query string e.g. ?q=hello etc. Note this shouldn't be used to determine if self contains a query.
# File 'lib/wgit/url.rb', line 679
def query?
  start_with?("?")
end
#redirects_journey ⇒ Array<Wgit::Url>
Returns the Wgit::Urls starting with the originally requested Url to be crawled, followed by each redirected-to Url, and finishing with the final crawled Url.
Example Url redirects journey (dictated by the webserver):
http://example.com => 301 to https://example.com
https://example.com => 301 to https://example.com/
https://example.com/ => 200 OK (no more redirects, crawl complete)
Would return an Array of Wgit::Urls in the form of:
%w(
http://example.com
https://example.com
https://example.com/
)
# File 'lib/wgit/url.rb', line 193
def redirects_journey
  [redirects.keys, self].flatten
end
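The journey is built from the keys of the redirects Hash plus the final Url. The sketch below reproduces that with plain Strings; it assumes the Hash maps each redirected-from URL to its redirected-to URL, which the source implies but does not spell out.

```ruby
# Assumed shape: each key is a URL that redirected, each value its target.
redirects = {
  "http://example.com" => "https://example.com",
  "https://example.com" => "https://example.com/"
}
final_url = "https://example.com/" # Stands in for self after crawling.

# Same expression as the method body, applied to plain values.
journey = [redirects.keys, final_url].flatten
```

Because Ruby Hashes keep insertion order, the keys come back in the order the redirects were recorded.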
#relative?(opts = {}) ⇒ Boolean Also known as: is_relative?
Returns true if self is a relative Url; false if absolute.
An absolute URL must have a scheme prefix e.g. 'http://', otherwise the URL is regarded as being relative (regardless of whether it's valid or not). The only exception is if an opts arg is provided and self is a page belonging to that arg type e.g. host; then the link is relative.
# File 'lib/wgit/url.rb', line 227
def relative?(opts = {})
  defaults = { origin: nil, host: nil, domain: nil, brand: nil }
  opts = defaults.merge(opts)
  raise "Url (self) cannot be empty" if empty?

  return false if scheme_relative?
  return true if @uri.relative?

  # Self is absolute but may be relative to the opts param e.g. host.
  opts.select! { |_k, v| v }
  raise "Provide only one of: #{defaults.keys}" if opts.length > 1

  return false if opts.empty?

  type, url = opts.first
  url = Wgit::Url.new(url)
  if url.invalid?
    raise "Invalid opts param value, it must be absolute, containing a \
protocol scheme and domain (e.g. http://example.com): #{url}"
  end

  case type
  when :origin # http://www.google.com:81
    to_origin == url.to_origin
  when :host # www.google.com
    to_host == url.to_host
  when :domain # google.com
    to_domain == url.to_domain
  when :brand # google
    to_brand == url.to_brand
  else
    raise "Unknown opts param: :#{type}, use one of: #{defaults.keys}"
  end
end
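The idea of an absolute URL being "relative to" a host can be sketched for the :host case alone. sketch_relative_to_host? is a hypothetical helper built on the standard library's URI rather than Addressable, and it omits the option validation shown in the source.

```ruby
require "uri"

# Hypothetical helper: true if url is relative, or absolute but on the
# same host as base. Covers only the :host comparison.
def sketch_relative_to_host?(url, base)
  # Scheme-relative URLs ("//host/path") are treated as not relative.
  return false if url.start_with?("//")

  uri = URI.parse(url)
  return true if uri.relative?

  uri.host == URI.parse(base).host
end

internal = sketch_relative_to_host?("about.html", "http://www.google.com")
same_host = sketch_relative_to_host?("http://www.google.com/about", "http://www.google.com")
other_host = sketch_relative_to_host?("http://example.com/about", "http://www.google.com")
```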
#replace(new_url) ⇒ String
Overrides String#replace setting the new_url @uri and String value.
# File 'lib/wgit/url.rb', line 126
def replace(new_url)
  @uri = Addressable::URI.parse(new_url)

  super(new_url)
end
#scheme_relative? ⇒ Boolean Also known as: is_scheme_relative?
Returns true if self starts with '//' a.k.a. a scheme/protocol relative path.
# File 'lib/wgit/url.rb', line 702
def scheme_relative?
  start_with?("//")
end
#to_addressable_uri ⇒ Addressable::URI
Returns the Addressable::URI object for this URL.
# File 'lib/wgit/url.rb', line 381
def to_addressable_uri
  @uri
end
#to_base ⇒ Wgit::Url? Also known as: base
Returns only the base of this URL e.g. the protocol scheme and host combined.
# File 'lib/wgit/url.rb', line 461
def to_base
  return nil unless @uri.scheme && @uri.host

  base = "#{@uri.scheme}://#{@uri.host}"
  Wgit::Url.new(base)
end
#to_brand ⇒ Wgit::Url? Also known as: brand
Returns a new Wgit::Url containing just the brand of this URL e.g. Given http://www.google.co.uk/about.html, google is returned.
# File 'lib/wgit/url.rb', line 451
def to_brand
  domain = to_domain
  domain ? Wgit::Url.new(domain.split(".").first) : nil
end
#to_domain ⇒ Wgit::Url? Also known as: domain
Returns a new Wgit::Url containing just the domain of this URL e.g. Given http://www.google.co.uk/about.html, google.co.uk is returned.
# File 'lib/wgit/url.rb', line 428
def to_domain
  domain = @uri.domain
  domain ? Wgit::Url.new(domain) : nil
end
#to_endpoint ⇒ Wgit::Url Also known as: endpoint
Returns the endpoint of this URL e.g. the bit after the host with any slashes included. For example: Wgit::Url.new("http://www.google.co.uk/about.html/").to_endpoint returns "/about.html/". See Wgit::Url#to_path if you don't want the slashes.
# File 'lib/wgit/url.rb', line 501
def to_endpoint
  endpoint = @uri.path
  endpoint = "/#{endpoint}" unless endpoint.start_with?("/")
  Wgit::Url.new(endpoint)
end
#to_extension ⇒ Wgit::Url? Also known as: extension
Returns a new Wgit::Url containing just the file extension of this URL e.g. Given http://google.com/about.html, html is returned.
# File 'lib/wgit/url.rb', line 547
def to_extension
  path = to_path&.omit_trailing_slash
  return nil unless path

  segs = path.split(".")
  segs.length > 1 ? Wgit::Url.new(segs.last) : nil
end
#to_fragment ⇒ Wgit::Url? Also known as: fragment
Returns a new Wgit::Url containing just the fragment string of this URL e.g. Given http://google.com#about, about is returned.
# File 'lib/wgit/url.rb', line 538
def to_fragment
  fragment = @uri.fragment
  fragment ? Wgit::Url.new(fragment) : nil
end
#to_h ⇒ Hash
Returns a Hash containing this Url's instance vars excluding @uri. Used when storing the URL in a Database e.g. MongoDB etc.
# File 'lib/wgit/url.rb', line 366
def to_h
  h = Wgit::Utils.to_h(self, ignore: ["@uri"])
  Hash[h.to_a.insert(0, ["url", to_s])] # Insert url at position 0.
end
#to_host ⇒ Wgit::Url? Also known as: host
Returns a new Wgit::Url containing just the host of this URL e.g. Given http://www.google.co.uk/about.html, www.google.co.uk is returned.
# File 'lib/wgit/url.rb', line 405
def to_host
  host = @uri.host
  host ? Wgit::Url.new(host) : nil
end
#to_origin ⇒ Wgit::Url? Also known as: origin
Returns only the origin of this URL e.g. the protocol scheme, host and port combined. For http://localhost:3000/api, http://localhost:3000 gets returned. If there's no port present, then to_base is returned.
# File 'lib/wgit/url.rb', line 473
def to_origin
  return nil unless to_base
  return to_base unless to_port

  Wgit::Url.new("#{to_base}:#{to_port}")
end
#to_password ⇒ Wgit::Url? Also known as: password
Returns a new Wgit::Url containing just the password string of this URL e.g. Given a URL such as http://me:pass1@example.com, pass1 is returned.
# File 'lib/wgit/url.rb', line 568
def to_password
  password = @uri.password
  password ? Wgit::Url.new(password) : nil
end
#to_path ⇒ Wgit::Url? Also known as: path
Returns the path of this URL e.g. the bit after the host without slashes. For example: Wgit::Url.new("http://www.google.co.uk/about.html/").to_path returns "about.html". See Wgit::Url#to_endpoint if you want the slashes.
# File 'lib/wgit/url.rb', line 486
def to_path
  path = @uri.path
  return nil if path.nil? || path.empty?
  return Wgit::Url.new("/") if path == "/"

  Wgit::Url.new(path).omit_leading_slash
end
#to_port ⇒ Wgit::Url? Also known as: port
Returns a new Wgit::Url containing just the port of this URL e.g. Given http://www.google.co.uk:443/about.html, '443' is returned.
# File 'lib/wgit/url.rb', line 414
def to_port
  port = @uri.port

  # @uri.port defaults port to 80/443 if missing, so we check for :#{port}.
  return nil unless port
  return nil unless include?(":#{port}")

  Wgit::Url.new(port.to_s)
end
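The check for an explicitly written port can be reproduced with the standard library, whose URI#port also falls back to the scheme default. sketch_explicit_port is a hypothetical helper, and URI.parse is swapped in for Addressable here.

```ruby
require "uri"

# Hypothetical helper: returns the port as a String only when it is
# written in the URL, otherwise nil.
def sketch_explicit_port(url)
  port = URI.parse(url).port # Defaults to 80/443 when not written.
  return nil unless port
  # Only report the port if ":<port>" literally appears in the text.
  return nil unless url.include?(":#{port}")

  port.to_s
end

explicit = sketch_explicit_port("http://www.google.co.uk:443/about.html")
implicit = sketch_explicit_port("http://www.google.co.uk/about.html")
```

The textual check is a heuristic: it separates a written port from a default one without needing access to the parser's internals.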
#to_query ⇒ Wgit::Url? Also known as: query
Returns a new Wgit::Url containing just the query string of this URL e.g. Given http://google.com?q=foo&bar=1, 'q=foo&bar=1' is returned.
# File 'lib/wgit/url.rb', line 511
def to_query
  query = @uri.query
  query ? Wgit::Url.new(query) : nil
end
#to_query_hash(symbolize_keys: false) ⇒ Hash<String | Symbol, String> Also known as: query_hash
Returns a Hash containing just the query string parameters of this URL e.g. Given http://google.com?q=ruby, "{ 'q' => 'ruby' }" is returned.
# File 'lib/wgit/url.rb', line 523
def to_query_hash(symbolize_keys: false)
  query_str = to_query
  return {} unless query_str

  query_str.split("&").each_with_object({}) do |param, hash|
    k, v = param.split("=")
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end
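The query parsing loop works the same way on a bare query String, which makes it easy to try in isolation. sketch_query_hash is a hypothetical helper that takes the query text directly instead of calling to_query.

```ruby
# Hypothetical helper applying the same split logic to a plain String.
def sketch_query_hash(query_str, symbolize_keys: false)
  return {} unless query_str

  query_str.split("&").each_with_object({}) do |param, hash|
    # Only the first two "="-separated parts are kept.
    k, v = param.split("=")
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end

strings = sketch_query_hash("q=ruby&page=2")
symbols = sketch_query_hash("q=ruby&page=2", symbolize_keys: true)
empty = sketch_query_hash(nil)
```

Values are not URL-decoded by this logic, and a parameter without "=" maps to nil.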
#to_scheme ⇒ Wgit::Url? Also known as: scheme
Returns a new Wgit::Url containing just the scheme of this URL e.g. Given http://www.google.co.uk, http is returned.
# File 'lib/wgit/url.rb', line 396
def to_scheme
  scheme = @uri.scheme
  scheme ? Wgit::Url.new(scheme) : nil
end
#to_sub_domain ⇒ Wgit::Url? Also known as: sub_domain
Returns a new Wgit::Url containing just the sub domain of this URL e.g. Given http://scripts.dev.google.com, scripts.dev is returned.
# File 'lib/wgit/url.rb', line 437
def to_sub_domain
  return nil unless to_host

  dot_domain = ".#{to_domain}"
  return nil unless include?(dot_domain)

  sub_domain = to_host.sub(dot_domain, "")
  Wgit::Url.new(sub_domain)
end
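The sub domain is whatever precedes ".<domain>" in the host. The sketch below applies that rule to plain Strings; sketch_sub_domain is a hypothetical helper that takes the host and domain as arguments, since working out the domain itself is left to Addressable in the source.

```ruby
# Hypothetical helper: strips ".<domain>" from host, or returns nil when
# the host has no sub domain part.
def sketch_sub_domain(host, domain)
  dot_domain = ".#{domain}"
  return nil unless host.include?(dot_domain)

  host.sub(dot_domain, "")
end

nested = sketch_sub_domain("scripts.dev.google.com", "google.com")
www = sketch_sub_domain("www.google.co.uk", "google.co.uk")
none = sketch_sub_domain("google.com", "google.com")
```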
#to_uri ⇒ URI::HTTP, URI::HTTPS Also known as: uri
Returns a normalised URI object for this URL.
# File 'lib/wgit/url.rb', line 374
def to_uri
  URI(normalize)
end
#to_url ⇒ Wgit::Url Also known as: url
Returns self.
# File 'lib/wgit/url.rb', line 388
def to_url
  self
end
#to_user ⇒ Wgit::Url? Also known as: user
Returns a new Wgit::Url containing just the username string of this URL e.g. Given a URL such as http://me:pass1@example.com, me is returned.
# File 'lib/wgit/url.rb', line 559
def to_user
  user = @uri.user
  user ? Wgit::Url.new(user) : nil
end
#valid? ⇒ Boolean Also known as: is_valid?
Returns true if self is a valid and absolute HTTP URL. Self should always be crawlable if this method returns true.
# File 'lib/wgit/url.rb', line 273
def valid?
  return false if relative?
  return false unless to_origin && to_domain
  return false unless URI::DEFAULT_PARSER.make_regexp.match(normalize)

  true
end