Class: Wgit::Url

Inherits:
String
Includes:
Assertable
Defined in:
lib/wgit/url.rb

Overview

Class modeling/serialising a web based HTTP URL.

Can be an internal/relative link e.g. "about.html" or an absolute URL e.g. "http://www.google.co.uk". Is a subclass of String and uses URI and addressable/uri internally for parsing.

Most of the methods in this class return new Wgit::Url instances making the method calls chainable e.g. url.omit_base.omit_fragment etc. The methods also try to be idempotent where possible.

Constant Summary

Constants included from Assertable

Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::NON_ENUMERABLE_MSG

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Assertable

#assert_arr_types, #assert_required_keys, #assert_respond_to, #assert_types

Constructor Details

#initialize(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil) ⇒ Url

Initializes a new instance of Wgit::Url which models a web based HTTP URL.

Parameters:

  • url_or_obj (String, Wgit::Url, #fetch#[])

    Is either a String based URL or an object representing a Database record e.g. a MongoDB document/object.

  • crawled (Boolean) (defaults to: false)

    Whether or not the HTML of the URL's web page has been crawled. Only used if url_or_obj is a String.

  • date_crawled (Time) (defaults to: nil)

    Should only be provided if crawled is true. A suitable object can be returned from Wgit::Utils.time_stamp. Only used if url_or_obj is a String.

  • crawl_duration (Float) (defaults to: nil)

    Should only be provided if crawled is true. The duration of the crawl for this Url (in seconds). Only used if url_or_obj is a String.

Raises:

  • (StandardError)

    If url_or_obj isn't a String and doesn't respond to #fetch, or if the object has no 'url' key.



# File 'lib/wgit/url.rb', line 48

def initialize(
  url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil
)
  # Init from a URL String.
  if url_or_obj.is_a?(String)
    url = url_or_obj.to_s
  # Else init from a Hash like object e.g. database object.
  else
    obj = url_or_obj
    assert_respond_to(obj, :fetch)

    url            = obj.fetch('url') # Should always be present.
    crawled        = obj.fetch('crawled', false)
    date_crawled   = obj.fetch('date_crawled', nil)
    crawl_duration = obj.fetch('crawl_duration', nil)
    redirects      = obj.fetch('redirects', {})
  end

  @uri            = Addressable::URI.parse(url)
  @crawled        = crawled
  @date_crawled   = date_crawled
  @crawl_duration = crawl_duration
  @redirects      = redirects || {}

  super(url)
end
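The two initialisation paths above can be sketched without Wgit. The helper name url_attrs and the Hash it returns are hypothetical, used only to show which values win in each path; a plain Ruby Hash stands in for a database record such as a MongoDB document.

```ruby
# Hypothetical helper mirroring Wgit::Url#initialize's two input paths.
# It returns the resulting attributes instead of building a Wgit::Url.
def url_attrs(url_or_obj, crawled: false, date_crawled: nil, crawl_duration: nil)
  if url_or_obj.is_a?(String)
    # String input: the keyword args are used as given.
    url       = url_or_obj.to_s
    redirects = nil
  else
    # Record input: the keyword args are ignored, values come from the record.
    unless url_or_obj.respond_to?(:fetch)
      raise ArgumentError, "#{url_or_obj.class} doesn't respond to #fetch"
    end

    url            = url_or_obj.fetch('url') # Raises KeyError if missing.
    crawled        = url_or_obj.fetch('crawled', false)
    date_crawled   = url_or_obj.fetch('date_crawled', nil)
    crawl_duration = url_or_obj.fetch('crawl_duration', nil)
    redirects      = url_or_obj.fetch('redirects', {})
  end

  {
    url: url, crawled: crawled, date_crawled: date_crawled,
    crawl_duration: crawl_duration, redirects: redirects || {}
  }
end

record = { 'url' => 'http://example.com', 'crawled' => true, 'crawl_duration' => 1.5 }
url_attrs(record, crawled: false)[:crawled] # => true, taken from the record
```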

Instance Attribute Details

#crawl_duration ⇒ Object

The duration of the crawl for this Url (in seconds).



# File 'lib/wgit/url.rb', line 29

def crawl_duration
  @crawl_duration
end

#crawled ⇒ Object Also known as: crawled?

Whether or not the Url has been crawled. A custom crawled= method is provided by this class.



# File 'lib/wgit/url.rb', line 23

def crawled
  @crawled
end

#date_crawled ⇒ Object

The Time stamp of when this Url was crawled.



# File 'lib/wgit/url.rb', line 26

def date_crawled
  @date_crawled
end

#redirects ⇒ Object

Record the redirects from the initial Url to the final Url.



# File 'lib/wgit/url.rb', line 32

def redirects
  @redirects
end

Class Method Details

.parse(obj) ⇒ Wgit::Url

Initialises a new Wgit::Url instance from a String or subclass of String e.g. Wgit::Url. Any other obj type will raise an error.

If obj is already a Wgit::Url then it will be returned as is to maintain its state. Otherwise, a new Wgit::Url is instantiated and returned. This differs from Wgit::Url.new which always instantiates a new Wgit::Url.

Note: Only use this method if you are allowing obj to be either a String or a Wgit::Url whose state you want to preserve e.g. when passing a URL to a crawl method which might redirect (calling Wgit::Url#replace). If you're sure of the type or don't care about preserving the state of the Wgit::Url, use Wgit::Url.new instead.

Parameters:

  • obj (Object)

    The object to parse, which #is_a?(String).

Returns:

  • (Wgit::Url)

    obj as is if it's already a Wgit::Url, otherwise a new Wgit::Url.

Raises:

  • (StandardError)

    If obj.is_a?(String) is false.



# File 'lib/wgit/url.rb', line 91

def self.parse(obj)
  raise 'Can only parse if obj#is_a?(String)' unless obj.is_a?(String)

  # Return a Wgit::Url as is to avoid losing state e.g. date_crawled etc.
  obj.is_a?(Wgit::Url) ? obj : new(obj)
end

.parse?(obj) ⇒ Wgit::Url

Returns a Wgit::Url instance from Wgit::Url.parse, or nil if obj cannot be parsed successfully e.g. the String is invalid.

Use this method when you can't guarantee that obj is parsable as a URL. See Wgit::Url.parse for more information.

Parameters:

  • obj (Object)

    The object to parse, which #is_a?(String).

Returns:

  • (Wgit::Url)

    A Wgit::Url instance or nil (if obj is invalid).

Raises:

  • (StandardError)

    If obj.is_a?(String) is false.



# File 'lib/wgit/url.rb', line 107

def self.parse?(obj)
  parse(obj)
rescue Addressable::URI::InvalidURIError
  Wgit.logger.debug("Wgit::Url.parse?('#{obj}') exception: \
Addressable::URI::InvalidURIError")
  nil
end
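A minimal sketch of the .parse/.parse? contract, using a hypothetical String subclass SketchUrl in place of Wgit::Url. Ruby's stdlib URI stands in for Addressable here (so URI::InvalidURIError replaces Addressable::URI::InvalidURIError), and the two parsers may not reject exactly the same inputs.

```ruby
require 'uri'

# Hypothetical stand-in for Wgit::Url: a String subclass carrying state.
class SketchUrl < String
  attr_accessor :crawled

  def initialize(url)
    @uri     = URI.parse(url) # Raises URI::InvalidURIError if malformed.
    @crawled = false
    super(url)
  end

  # Returns obj unchanged if it's already a SketchUrl, keeping its state.
  def self.parse(obj)
    raise 'Can only parse if obj#is_a?(String)' unless obj.is_a?(String)

    obj.is_a?(SketchUrl) ? obj : new(obj)
  end

  # Like .parse, but returns nil instead of raising on a malformed URL.
  def self.parse?(obj)
    parse(obj)
  rescue URI::InvalidURIError
    nil
  end
end

crawled_url = SketchUrl.new('http://example.com')
crawled_url.crawled = true
SketchUrl.parse(crawled_url).crawled # => true (same object returned)
SketchUrl.new(crawled_url).crawled   # => false (fresh instance, state lost)
```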

Instance Method Details

#absolute? ⇒ Boolean Also known as: is_absolute?

Returns true if self is an absolute Url; false if relative.

Returns:

  • (Boolean)

    True if absolute, false if relative.



# File 'lib/wgit/url.rb', line 265

def absolute?
  @uri.absolute?
end

#concat(other) ⇒ String

Overrides String#concat, which would otherwise return the receiver (a Wgit::Url), so that a plain String is returned instead; the receiver isn't modified. This method therefore behaves like String#+. If you want to join two Urls, use the Wgit::Url#join method.

Parameters:

  • other (String)

    The String to concat onto this one.

Returns:

  • (String)

    The new concatenated String, not a Wgit::Url.



# File 'lib/wgit/url.rb', line 139

def concat(other)
  to_s.concat(other.to_s)
end
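The override relies on String#to_s returning a new plain String when called on a subclass. A minimal sketch on a hypothetical String subclass (UrlString is not part of Wgit) shows the effect:

```ruby
# Hypothetical String subclass reproducing the #concat override above.
class UrlString < String
  # String#to_s on a subclass returns a new plain String, so concatenating
  # onto it yields a String and leaves the receiver untouched.
  def concat(other)
    to_s.concat(other.to_s)
  end
end

url    = UrlString.new('http://example.com')
joined = url.concat('/about')
joined.class # => String
url          # => "http://example.com" (unchanged)
```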

#fragment? ⇒ Boolean Also known as: is_fragment?

Returns true if self is a URL fragment e.g. #top etc. Note this shouldn't be used to determine if self contains a fragment.

Returns:

  • (Boolean)

    True if self is a fragment, false otherwise.



# File 'lib/wgit/url.rb', line 687

def fragment?
  start_with?('#')
end

#index? ⇒ Boolean Also known as: is_index?

Returns true if self equals '/' a.k.a. index.

Returns:

  • (Boolean)

    True if self equals '/', false otherwise.



# File 'lib/wgit/url.rb', line 694

def index?
  self == '/'
end

#inspect ⇒ String

Overrides String#inspect to distinguish this Url from a String.

Returns:

  • (String)

    A short textual representation of this Url.



# File 'lib/wgit/url.rb', line 118

def inspect
  "#<Wgit::Url url=\"#{self}\" crawled=#{@crawled}>"
end

#invalid? ⇒ Boolean

Returns whether self is an invalid (e.g. relative) HTTP URL. See Wgit::Url#valid? for the inverse (and more information).

Returns:

  • (Boolean)

    True if invalid, otherwise false.



# File 'lib/wgit/url.rb', line 285

def invalid?
  !valid?
end

#join(other) ⇒ Wgit::Url

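Joins self and other together before returning a new Url. Self is not modified. other must be relative, otherwise an error is raised. A leading slash is removed from other, then the two are joined with a '/' separator, unless other starts with '#', '?' or '.', or self already ends with '/', in which case no separator is added.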

Parameters:

  • other (Wgit::Url, String)

    The other (relative) Url to join to the end of self.

Returns:

  • (Wgit::Url)

    self + separator + other, separator depends on other.



# File 'lib/wgit/url.rb', line 296

def join(other)
  other = Wgit::Url.new(other)
  raise 'other must be relative' unless other.relative?

  other = other.omit_leading_slash
  separator = %w[# ? .].include?(other[0]) ? '' : '/'
  separator = '' if end_with?('/')
  joined = self + separator + other

  Wgit::Url.new(joined)
end
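The separator rules can be sketched on plain Strings. join_url is a hypothetical helper; the relativity check uses stdlib URI#absolute? plus a '//' prefix test, which approximates (but isn't) Wgit::Url#relative?.

```ruby
require 'uri'

# Hypothetical helper reproducing Wgit::Url#join's separator rules.
def join_url(base, other)
  # Approximates "other must be relative": no scheme and not scheme-relative.
  if other.start_with?('//') || URI(other).absolute?
    raise ArgumentError, 'other must be relative'
  end

  other     = other.delete_prefix('/') # Drop a single leading slash.
  separator = %w[# ? .].include?(other[0]) ? '' : '/'
  separator = '' if base.end_with?('/')
  base + separator + other
end

join_url('http://example.com/search', '?q=ruby') # => "http://example.com/search?q=ruby"
join_url('http://example.com/', '/about')        # => "http://example.com/about"
```

Note this appends rather than resolves: joining 'img.png' onto http://example.com/blog/post gives http://example.com/blog/post/img.png, not the RFC 3986 result http://example.com/blog/img.png.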

#make_absolute(doc) ⇒ Wgit::Url

Returns an absolute form of self within the context of doc. Doesn't modify the receiver.

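If self is absolute then it's returned as is, making this method idempotent. If self is scheme-relative (e.g. //example.com/a.js), the scheme of doc.url is prefixed to it. Otherwise, the doc's <base> element is used as the base if present, falling back to doc.url; the base is then joined with self.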

Typically used to build an absolute link obtained from a document.

Examples:

link = Wgit::Url.new('/favicon.png')
doc  = Wgit::Document.new('http://example.com')

link.make_absolute(doc) # => "http://example.com/favicon.png"

Parameters:

  • doc (Wgit::Document)

    The doc whose base Url is joined with self.

Returns:

  • (Wgit::Url)

    An absolute form of self.

Raises:

  • (StandardError)

    If doc isn't a Wgit::Document, if doc.url isn't valid, or if doc.base_url raises an Exception.



# File 'lib/wgit/url.rb', line 336

def make_absolute(doc)
  assert_type(doc, Wgit::Document)
  raise 'Cannot make absolute when Document @url is not valid' \
  unless doc.url.valid?

  return prefix_scheme(doc.url.to_scheme&.to_sym) if scheme_relative?

  absolute? ? self : doc.base_url(link: self).join(self)
end
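For comparison, here is a sketch of making a link absolute with Ruby's stdlib URI.join, which applies RFC 3986 reference resolution. This is a swapped-in technique, not what Wgit does: Wgit uses its own append-style #join and document <base> handling. The helper name absolutize is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: returns absolute links unchanged, otherwise resolves
# the link against base_url using stdlib RFC 3986 resolution.
def absolutize(link, base_url)
  return link if URI(link).absolute?

  URI.join(base_url, link).to_s
end

absolutize('/favicon.png', 'http://example.com')             # => "http://example.com/favicon.png"
absolutize('//cdn.example.com/a.js', 'https://example.com/') # => "https://cdn.example.com/a.js"
```

Unlike Wgit::Url#join, URI.join replaces the last path segment of the base, so 'img.png' resolved against http://example.com/blog/post becomes http://example.com/blog/img.png.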

#normalize ⇒ Wgit::Url

Normalizes/escapes self and returns a new Wgit::Url. Self isn't modified. This should be used before GET'ing the url, in case it has IRI chars.

Returns:

  • (Wgit::Url)

    An escaped version of self.



# File 'lib/wgit/url.rb', line 312

def normalize
  Wgit::Url.new(@uri.normalize.to_s)
end

#omit(*components) ⇒ Wgit::Url

Omits the given URL components from self and returns a new Wgit::Url.

Calls Addressable::URI#omit underneath and creates a new Wgit::Url from the output. See the Addressable::URI docs for more information.

Parameters:

  • components (*Symbol)

    One or more Symbols representing the URL components to omit. The following components are supported: :scheme, :user, :password, :userinfo, :host, :port, :authority, :path, :query, :fragment.

Returns:

  • (Wgit::Url)

    Self's URL value with the given components omitted.



# File 'lib/wgit/url.rb', line 583

def omit(*components)
  omitted = @uri.omit(*components)
  Wgit::Url.new(omitted.to_s)
end

#omit_base ⇒ Wgit::Url

Returns a new Wgit::Url with the base (scheme and host) removed e.g. Given http://google.com/search?q=something#about, search?q=something#about is returned. If self is relative and has no base, self is returned. If nothing but '' or '/' would remain, self is returned as is; otherwise a leading slash is stripped from the return value. The port isn't part of the base, see Wgit::Url#omit_origin to remove it too.

Returns:

  • (Wgit::Url)

    Self containing everything after the base.



# File 'lib/wgit/url.rb', line 622

def omit_base
  base_url = to_base
  omit_base = base_url ? gsub(base_url, '') : self

  return self if ['', '/'].include?(omit_base)

  Wgit::Url.new(omit_base).omit_leading_slash
end
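The base-stripping logic above can be sketched with stdlib URI on plain Strings. strip_base is a hypothetical helper and stdlib URI replaces Addressable for parsing.

```ruby
require 'uri'

# Hypothetical helper sketching Wgit::Url#omit_base with stdlib URI.
def strip_base(url)
  uri  = URI(url)
  base = uri.scheme && uri.host ? "#{uri.scheme}://#{uri.host}" : nil
  rest = base ? url.gsub(base, '') : url

  # As in the source: if only '' or '/' remains, the input is returned.
  return url if ['', '/'].include?(rest)

  rest.delete_prefix('/')
end

strip_base('http://google.com/search?q=something#about') # => "search?q=something#about"
strip_base('http://google.com:81/search')                # => ":81/search" (port kept)
```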

#omit_fragment ⇒ Wgit::Url

Returns a new Wgit::Url with the fragment portion removed e.g. Given http://google.com/search#about, http://google.com/search is returned. Self is returned as is if no fragment is present. A URL consisting of only a fragment e.g. '#about' will return an empty URL. This method assumes that the fragment is correctly placed at the very end of the URL.

Returns:

  • (Wgit::Url)

    Self with the fragment portion removed.



# File 'lib/wgit/url.rb', line 668

def omit_fragment
  fragment = to_fragment
  omit_fragment = fragment ? gsub("##{fragment}", '') : self

  Wgit::Url.new(omit_fragment)
end

#omit_leading_slash ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a leading slash. Safe to call whether or not a leading slash is present; if there isn't one, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without a leading slash.



# File 'lib/wgit/url.rb', line 593

def omit_leading_slash
  start_with?('/') ? Wgit::Url.new(self[1..]) : self
end

#omit_origin ⇒ Wgit::Url

Returns a new Wgit::Url with the origin (base + port) removed e.g. Given http://google.com:81/search?q=something#about, search?q=something#about is returned. If self is relative and has no origin, self is returned. If nothing but '' or '/' would remain, self is returned as is; otherwise a leading slash is stripped from the return value.

Returns:

  • (Wgit::Url)

    Self containing everything after the origin.



# File 'lib/wgit/url.rb', line 637

def omit_origin
  origin = to_origin
  omit_origin = origin ? gsub(origin, '') : self

  return self if ['', '/'].include?(omit_origin)

  Wgit::Url.new(omit_origin).omit_leading_slash
end

#omit_query ⇒ Wgit::Url

Returns a new Wgit::Url with the query string portion removed e.g. Given http://google.com/search?q=hello, http://google.com/search is returned. Self is returned as is if no query string is present. A URL consisting of only a query string e.g. '?q=hello' will return an empty URL.

Returns:

  • (Wgit::Url)

    Self with the query string portion removed.



# File 'lib/wgit/url.rb', line 653

def omit_query
  query = to_query
  omit_query_string = query ? gsub("?#{query}", '') : self

  Wgit::Url.new(omit_query_string)
end
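An alternative sketch of omit_query and omit_fragment that clears the parsed components instead of using String#gsub as the source above does. The helper strip_url_parts is hypothetical and uses stdlib URI; clearing components avoids matching the same text elsewhere in the URL.

```ruby
require 'uri'

# Hypothetical helper: drops the query and/or fragment by clearing the
# parsed URI components, then re-serialises the URL.
def strip_url_parts(url, query: true, fragment: true)
  uri = URI(url)
  uri.query    = nil if query
  uri.fragment = nil if fragment
  uri.to_s
end

strip_url_parts('http://google.com/search?q=hello#about')                  # => "http://google.com/search"
strip_url_parts('http://google.com/search?q=hello#about', fragment: false) # => "http://google.com/search#about"
```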

#omit_slashes ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a leading or trailing slash. Safe to call whether or not slashes are present; if there are none, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without leading or trailing slashes.



# File 'lib/wgit/url.rb', line 611

def omit_slashes
  omit_leading_slash
    .omit_trailing_slash
end

#omit_trailing_slash ⇒ Wgit::Url

Returns a new Wgit::Url containing self without a trailing slash. Safe to call whether or not a trailing slash is present; if there isn't one, self is returned as is.

Returns:

  • (Wgit::Url)

    Self without a trailing slash.



# File 'lib/wgit/url.rb', line 602

def omit_trailing_slash
  end_with?('/') ? Wgit::Url.new(chop) : self
end
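The three slash helpers can be sketched on plain Strings. The drop_* names are hypothetical. As with the source above, each call removes at most one slash from each end.

```ruby
# Hypothetical String-based equivalents of the slash helpers above.
def drop_leading_slash(str)
  str.start_with?('/') ? str[1..] : str
end

def drop_trailing_slash(str)
  str.end_with?('/') ? str.chop : str
end

def drop_slashes(str)
  drop_trailing_slash(drop_leading_slash(str))
end

drop_slashes('/about/')   # => "about"
drop_trailing_slash('a//') # => "a/" (only one slash removed per call)
```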

#prefix_scheme(scheme = :http) ⇒ Wgit::Url

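Returns self prefixed with a scheme/protocol e.g. example.com becomes http://example.com. Doesn't modify the receiver. If self is already absolute (and isn't scheme-relative), self is returned as is, making the method idempotent. For a scheme-relative self e.g. //example.com, only the scheme and colon are prefixed. Raises an error if scheme isn't :http or :https.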

Parameters:

  • scheme (Symbol) (defaults to: :http)

    Either :http or :https.

Returns:

  • (Wgit::Url)

    Self if already absolute, otherwise a new Wgit::Url prefixed with scheme.


# File 'lib/wgit/url.rb', line 351

def prefix_scheme(scheme = :http)
  unless %i[http https].include?(scheme)
    raise "scheme must be :http or :https, not :#{scheme}"
  end

  return self if absolute? && !scheme_relative?

  separator = scheme_relative? ? '' : '//'
  Wgit::Url.new("#{scheme}:#{separator}#{self}")
end
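A minimal sketch of the same rules on plain Strings, assuming stdlib URI#absolute? is an acceptable stand-in for the Addressable-based check. add_scheme is a hypothetical name.

```ruby
require 'uri'

# Hypothetical helper mirroring Wgit::Url#prefix_scheme.
def add_scheme(url, scheme = :http)
  unless %i[http https].include?(scheme)
    raise ArgumentError, "scheme must be :http or :https, not :#{scheme}"
  end

  scheme_relative = url.start_with?('//')
  return url if URI(url).absolute? && !scheme_relative

  # A scheme-relative URL already has the '//', so only 'scheme:' is added.
  separator = scheme_relative ? '' : '//'
  "#{scheme}:#{separator}#{url}"
end

add_scheme('example.com')           # => "http://example.com"
add_scheme('//example.com', :https) # => "https://example.com"
```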

#query? ⇒ Boolean Also known as: is_query?

Returns true if self is a URL query string e.g. ?q=hello etc. Note this shouldn't be used to determine if self contains a query.

Returns:

  • (Boolean)

    True if self is a query string, false otherwise.



# File 'lib/wgit/url.rb', line 679

def query?
  start_with?('?')
end

#redirects_journey ⇒ Array<Wgit::Url>

Returns the Wgit::Urls starting with the originally requested Url to be crawled, followed by each redirected-to Url, finishing with the final crawled Url e.g.

Example Url redirects journey (dictated by the webserver):

http://example.com   => 301 to https://example.com
https://example.com  => 301 to https://example.com/
https://example.com/ => 200 OK (no more redirects, crawl complete)

Would return an Array of Wgit::Urls in the form of:

%w(
  http://example.com
  https://example.com
  https://example.com/
)

Returns:

  • (Array<Wgit::Url>)

    Each redirected-to Url, finishing with the final (successfully) crawled Url. If no redirects took place, then just the originally requested Url is returned inside the Array.



# File 'lib/wgit/url.rb', line 193

def redirects_journey
  [redirects.keys, self].flatten
end
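A sketch of how the journey is assembled, assuming (this is not stated above) that each key of the redirects Hash is a URL that responded with a redirect and each value is the URL it redirected to. Plain Strings stand in for Wgit::Urls.

```ruby
# Assumed shape of the redirects Hash: redirecting URL => redirect target.
redirects = {
  'http://example.com'  => 'https://example.com',
  'https://example.com' => 'https://example.com/'
}
final_url = 'https://example.com/'

# Ruby Hashes keep insertion order, so the keys are already in request order.
journey = [redirects.keys, final_url].flatten
# => ["http://example.com", "https://example.com", "https://example.com/"]
```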

#relative?(opts = {}) ⇒ Boolean Also known as: is_relative?

Returns true if self is a relative Url; false if absolute.

An absolute URL must have a scheme prefix e.g. 'http://', otherwise the URL is regarded as being relative (regardless of whether it's valid or not). Scheme-relative URLs (starting with '//') are never regarded as relative. The only other exception is when an opts arg is provided: an absolute self is then regarded as relative if it belongs to the given origin, host, domain or brand e.g. it's a page on that host.

Examples:

url = Wgit::Url.new('http://example.com/about')

url.relative? # => false
url.relative?(host: 'http://example.com') # => true

Parameters:

  • opts (Hash) (defaults to: {})

    The options with which to check relativity. Only one opts param should be provided. The provided opts param Url must be absolute and be prefixed with a scheme. Consider using the output of Wgit::Url#to_origin which should work (unless it's nil).

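Options Hash (opts):

  • :origin (String, Wgit::Url) (defaults to: nil)

    Compares self's origin with this Url's origin e.g. http://www.google.com:81.

  • :host (String, Wgit::Url) (defaults to: nil)

    Compares self's host with this Url's host e.g. www.google.com.

  • :domain (String, Wgit::Url) (defaults to: nil)

    Compares self's domain with this Url's domain e.g. google.com.

  • :brand (String, Wgit::Url) (defaults to: nil)

    Compares self's brand with this Url's brand e.g. google.
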
Returns:

  • (Boolean)

    True if relative, false if absolute.

Raises:

  • (StandardError)

    If self is invalid (e.g. empty) or an invalid opts param has been provided.



# File 'lib/wgit/url.rb', line 227

def relative?(opts = {})
  defaults = { origin: nil, host: nil, domain: nil, brand: nil }
  opts = defaults.merge(opts)
  raise 'Url (self) cannot be empty' if empty?

  return false if scheme_relative?
  return true  if @uri.relative?

  # Self is absolute but may be relative to the opts param e.g. host.
  opts.select! { |_k, v| v }
  raise "Provide only one of: #{defaults.keys}" if opts.length > 1

  return false if opts.empty?

  type, url = opts.first
  url = Wgit::Url.new(url)
  if url.invalid?
    raise "Invalid opts param value, it must be absolute, containing a \
protocol scheme and domain (e.g. http://example.com): #{url}"
  end

  case type
  when :origin # http://www.google.com:81
    to_origin == url.to_origin
  when :host   # www.google.com
    to_host   == url.to_host
  when :domain # google.com
    to_domain == url.to_domain
  when :brand  # google
    to_brand  == url.to_brand
  else
    raise "Unknown opts param: :#{type}, use one of: #{defaults.keys}"
  end
end
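A reduced sketch of the relativity rules using stdlib URI. relative_url? is hypothetical and only the host: option is sketched; the source's validation of the opts value and the origin, domain and brand comparisons are left out.

```ruby
require 'uri'

# Hypothetical helper sketching Wgit::Url#relative? with only host: support.
def relative_url?(url, host: nil)
  raise ArgumentError, 'url cannot be empty' if url.empty?
  return false if url.start_with?('//') # Scheme-relative: never relative.

  uri = URI(url)
  return true if uri.relative? # No scheme, e.g. 'about.html'.
  return false unless host

  uri.host == URI(host).host
end

relative_url?('http://example.com/about')                             # => false
relative_url?('http://example.com/about', host: 'http://example.com') # => true
```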

#replace(new_url) ⇒ String

Overrides String#replace setting the new_url @uri and String value.

Parameters:

  • new_url (String)

    The new URL value.

Returns:

  • (String)

    The new URL value once set.



# File 'lib/wgit/url.rb', line 126

def replace(new_url)
  @uri = Addressable::URI.parse(new_url)

  super(new_url)
end

#scheme_relative? ⇒ Boolean Also known as: is_scheme_relative?

Returns true if self starts with '//' a.k.a. a scheme/protocol relative path.

Returns:

  • (Boolean)

    True if self starts with '//', false otherwise.



# File 'lib/wgit/url.rb', line 702

def scheme_relative?
  start_with?('//')
end

#to_addressable_uri ⇒ Addressable::URI

Returns the Addressable::URI object for this URL.

Returns:

  • (Addressable::URI)

    The Addressable::URI object of self.



# File 'lib/wgit/url.rb', line 381

def to_addressable_uri
  @uri
end

#to_base ⇒ Wgit::Url? Also known as: base

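Returns only the base of this URL i.e. the protocol scheme and host combined. For http://google.com:81/search, http://google.com is returned; see Wgit::Url#to_origin to keep the port.

Returns:

  • (Wgit::Url, nil)

    The base of self, or nil if self has no scheme or host.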


# File 'lib/wgit/url.rb', line 461

def to_base
  return nil unless @uri.scheme && @uri.host

  base = "#{@uri.scheme}://#{@uri.host}"
  Wgit::Url.new(base)
end

#to_brand ⇒ Wgit::Url? Also known as: brand

Returns a new Wgit::Url containing just the brand of this URL e.g. Given http://www.google.co.uk/about.html, google is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the brand or nil.



# File 'lib/wgit/url.rb', line 451

def to_brand
  domain = to_domain
  domain ? Wgit::Url.new(domain.split('.').first) : nil
end

#to_domain ⇒ Wgit::Url? Also known as: domain

Returns a new Wgit::Url containing just the domain of this URL e.g. Given http://www.google.co.uk/about.html, google.co.uk is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the domain or nil.



# File 'lib/wgit/url.rb', line 428

def to_domain
  domain = @uri.domain
  domain ? Wgit::Url.new(domain) : nil
end

#to_endpoint ⇒ Wgit::Url Also known as: endpoint

Returns the endpoint of this URL e.g. the bit after the host with any slashes included. For example: Wgit::Url.new("http://www.google.co.uk/about.html/").to_endpoint returns "/about.html/". See Wgit::Url#to_path if you don't want the slashes.

Returns:

  • (Wgit::Url)

    Endpoint of self e.g. /about.html/. For a URL without an endpoint, / is returned.



# File 'lib/wgit/url.rb', line 501

def to_endpoint
  endpoint = @uri.path
  endpoint = "/#{endpoint}" unless endpoint.start_with?('/')
  Wgit::Url.new(endpoint)
end

#to_extension ⇒ Wgit::Url? Also known as: extension

Returns a new Wgit::Url containing just the file extension of this URL e.g. Given http://google.com/about.html, html is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the extension string or nil.



# File 'lib/wgit/url.rb', line 547

def to_extension
  path = to_path&.omit_trailing_slash
  return nil unless path

  segs = path.split('.')
  segs.length > 1 ? Wgit::Url.new(segs.last) : nil
end
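A String-based sketch of the extension logic, using stdlib URI#path in place of Wgit::Url#to_path. extension_of is a hypothetical helper. Note that a fragment isn't part of the path, so http://google.com#about.html has no extension.

```ruby
require 'uri'

# Hypothetical helper sketching Wgit::Url#to_extension.
def extension_of(url)
  path = URI(url).path
  return nil if path.nil? || path.empty?

  # Strip one leading and one trailing slash, as to_path and
  # omit_trailing_slash do above.
  path = path.delete_prefix('/').delete_suffix('/')
  segs = path.split('.')
  segs.length > 1 ? segs.last : nil
end

extension_of('http://google.com/about.html')     # => "html"
extension_of('http://google.com/archive.tar.gz') # => "gz"
```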

#to_fragment ⇒ Wgit::Url? Also known as: fragment

Returns a new Wgit::Url containing just the fragment string of this URL e.g. Given http://google.com#about, about is returned (without the leading #).

Returns:

  • (Wgit::Url, nil)

    Containing just the fragment string or nil.



# File 'lib/wgit/url.rb', line 538

def to_fragment
  fragment = @uri.fragment
  fragment ? Wgit::Url.new(fragment) : nil
end

#to_h ⇒ Hash

Returns a Hash containing this Url's instance vars excluding @uri. Used when storing the URL in a Database e.g. MongoDB etc.

Returns:

  • (Hash)

    self's instance vars as a Hash.



# File 'lib/wgit/url.rb', line 366

def to_h
  h = Wgit::Utils.to_h(self, ignore: ['@uri'])
  Hash[h.to_a.insert(0, ['url', self])] # Insert url at position 0.
end
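A sketch of the serialisation idea on a hypothetical String subclass (RecordUrl is not part of Wgit). It assumes Wgit::Utils.to_h maps each instance variable name, minus its '@', to its value; that helper isn't shown here, so the key format is an assumption.

```ruby
require 'uri'

# Hypothetical String subclass illustrating #to_h: instance vars become
# String keys, @uri is excluded, and 'url' is placed first.
class RecordUrl < String
  def initialize(url, crawled: false, date_crawled: nil)
    @uri          = URI(url)
    @crawled      = crawled
    @date_crawled = date_crawled
    super(url)
  end

  def to_h
    attrs = (instance_variables - [:@uri]).to_h do |ivar|
      [ivar.to_s.delete_prefix('@'), instance_variable_get(ivar)]
    end
    { 'url' => to_s }.merge(attrs)
  end
end

RecordUrl.new('http://example.com', crawled: true).to_h
# => {"url"=>"http://example.com", "crawled"=>true, "date_crawled"=>nil}
```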

#to_host ⇒ Wgit::Url? Also known as: host

Returns a new Wgit::Url containing just the host of this URL e.g. Given http://www.google.co.uk/about.html, www.google.co.uk is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the host or nil.



# File 'lib/wgit/url.rb', line 405

def to_host
  host = @uri.host
  host ? Wgit::Url.new(host) : nil
end

#to_origin ⇒ Wgit::Url? Also known as: origin

Returns only the origin of this URL e.g. the protocol scheme, host and port combined. For http://localhost:3000/api, http://localhost:3000 gets returned. If there's no port present, then to_base is returned.

Returns:

  • (Wgit::Url, nil)

    The origin of self or nil.



# File 'lib/wgit/url.rb', line 473

def to_origin
  return nil unless to_base
  return to_base unless to_port

  Wgit::Url.new("#{to_base}:#{to_port}")
end

#to_password ⇒ Wgit::Url? Also known as: password

Returns a new Wgit::Url containing just the password string of this URL e.g. Given http://me:pass1@example.com, pass1 is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the password string or nil.



# File 'lib/wgit/url.rb', line 568

def to_password
  password = @uri.password
  password ? Wgit::Url.new(password) : nil
end

#to_path ⇒ Wgit::Url? Also known as: path

Returns the path of this URL i.e. the bit after the host, without the leading slash. For example: Wgit::Url.new("http://www.google.co.uk/about.html").to_path returns "about.html", and a root path returns "/". See Wgit::Url#to_endpoint if you want the leading slash.

Returns:

  • (Wgit::Url, nil)

    Path of self e.g. about.html or nil.



# File 'lib/wgit/url.rb', line 486

def to_path
  path = @uri.path
  return nil if path.nil? || path.empty?
  return Wgit::Url.new('/') if path == '/'

  Wgit::Url.new(path).omit_leading_slash
end
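The difference between to_endpoint and to_path can be sketched with stdlib URI#path. Both helper names are hypothetical. Per the source above, only the leading slash is dropped by to_path, so a trailing slash survives.

```ruby
require 'uri'

# Hypothetical helper sketching Wgit::Url#to_endpoint.
def endpoint_of(url)
  path = URI(url).path
  path.start_with?('/') ? path : "/#{path}"
end

# Hypothetical helper sketching Wgit::Url#to_path.
def path_of(url)
  path = URI(url).path
  return nil if path.nil? || path.empty?
  return '/' if path == '/'

  path.delete_prefix('/')
end

endpoint_of('http://www.google.co.uk/about.html/') # => "/about.html/"
path_of('http://www.google.co.uk/about.html/')     # => "about.html/"
```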

#to_port ⇒ Wgit::Url? Also known as: port

Returns a new Wgit::Url containing just the port of this URL e.g. Given http://www.google.co.uk:443/about.html, '443' is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the port or nil.



# File 'lib/wgit/url.rb', line 414

def to_port
  port = @uri.port

  # @uri.port defaults port to 80/443 if missing, so we check for :#{port}.
  return nil unless port
  return nil unless include?(":#{port}")

  Wgit::Url.new(port.to_s)
end
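The explicit-port check can be reproduced with stdlib URI, which, like Addressable as the source comment notes, reports the scheme's default port when none is written. port_of is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: returns the port only if it appears in the URL text.
def port_of(url)
  port = URI(url).port # Falls back to 80/443 for http/https.
  return nil unless port
  return nil unless url.include?(":#{port}")

  port.to_s
end

port_of('http://www.google.co.uk:443/about.html') # => "443"
port_of('https://example.com/')                   # => nil (default port only)
```

Because the check is a substring match (as in the source), a path containing the same text, e.g. http://example.com/a:80, would also be reported as having port 80.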

#to_query ⇒ Wgit::Url? Also known as: query

Returns a new Wgit::Url containing just the query string of this URL e.g. Given http://google.com?q=foo&bar=1, 'q=foo&bar=1' is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the query string or nil.



# File 'lib/wgit/url.rb', line 511

def to_query
  query = @uri.query
  query ? Wgit::Url.new(query) : nil
end

#to_query_hash(symbolize_keys: false) ⇒ Hash<String | Symbol, String> Also known as: query_hash

Returns a Hash containing just the query string parameters of this URL e.g. Given http://google.com?q=ruby, { 'q' => 'ruby' } is returned.

Parameters:

  • symbolize_keys (Boolean) (defaults to: false)

    The returned Hash keys will be Symbols if true, Strings otherwise.

Returns:

  • (Hash<String | Symbol, String>)

    Containing the query string params or empty if the URL doesn't contain any query parameters.



# File 'lib/wgit/url.rb', line 523

def to_query_hash(symbolize_keys: false)
  query_str = to_query
  return {} unless query_str

  query_str.split('&').each_with_object({}) do |param, hash|
    k, v = param.split('=')
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end
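A String-based sketch of the same splitting logic (query_params is a hypothetical name), alongside stdlib URI.decode_www_form for comparison. The split-based approach, like the source above, doesn't percent-decode values, whereas decode_www_form does.

```ruby
require 'uri'

# Hypothetical helper mirroring Wgit::Url#to_query_hash.
def query_params(url, symbolize_keys: false)
  query = URI(url).query
  return {} unless query

  query.split('&').each_with_object({}) do |param, hash|
    k, v = param.split('=')
    k = k.to_sym if symbolize_keys
    hash[k] = v
  end
end

query_params('http://google.com?q=ruby')                      # => {"q"=>"ruby"}
query_params('http://google.com?q=hello%20world')['q']        # => "hello%20world"
URI.decode_www_form('q=hello%20world').to_h                   # => {"q"=>"hello world"}
```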

#to_scheme ⇒ Wgit::Url? Also known as: scheme

Returns a new Wgit::Url containing just the scheme of this URL e.g. Given http://www.google.co.uk, http is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the scheme or nil.



# File 'lib/wgit/url.rb', line 396

def to_scheme
  scheme = @uri.scheme
  scheme ? Wgit::Url.new(scheme) : nil
end

#to_sub_domain ⇒ Wgit::Url? Also known as: sub_domain

Returns a new Wgit::Url containing just the sub domain of this URL e.g. Given http://scripts.dev.google.com, scripts.dev is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the sub domain or nil.



# File 'lib/wgit/url.rb', line 437

def to_sub_domain
  return nil unless to_host

  dot_domain = ".#{to_domain}"
  return nil unless include?(dot_domain)

  sub_domain = to_host.sub(dot_domain, '')
  Wgit::Url.new(sub_domain)
end
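A sketch of the sub domain and brand derivations, given the registrable domain. The domain is passed in as an argument because computing it correctly (e.g. google.co.uk rather than co.uk) needs public suffix data, which we understand Addressable's domain lookup relies on. Both helper names are hypothetical, and this sketch checks the host rather than the whole URL string as the source does.

```ruby
require 'uri'

# Hypothetical helper: the sub domain is the host minus ".#{domain}".
def sub_domain_of(url, domain)
  host = URI(url).host
  return nil unless host

  dot_domain = ".#{domain}"
  return nil unless host.include?(dot_domain)

  host.sub(dot_domain, '')
end

# Hypothetical helper: the brand is the first label of the domain.
def brand_of(domain)
  domain.split('.').first
end

sub_domain_of('http://scripts.dev.google.com', 'google.com') # => "scripts.dev"
brand_of('google.co.uk')                                      # => "google"
```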

#to_uri ⇒ URI::HTTP, URI::HTTPS Also known as: uri

Returns a normalised URI object for this URL.

Returns:

  • (URI::HTTP, URI::HTTPS)

    The URI object of self.



# File 'lib/wgit/url.rb', line 374

def to_uri
  URI(normalize)
end

#to_url ⇒ Wgit::Url Also known as: url

Returns self.

Returns:

  • (Wgit::Url)

    Self.


# File 'lib/wgit/url.rb', line 388

def to_url
  self
end

#to_user ⇒ Wgit::Url? Also known as: user

Returns a new Wgit::Url containing just the username string of this URL e.g. Given http://me:pass1@example.com, me is returned.

Returns:

  • (Wgit::Url, nil)

    Containing just the user string or nil.



# File 'lib/wgit/url.rb', line 559

def to_user
  user = @uri.user
  user ? Wgit::Url.new(user) : nil
end

#valid? ⇒ Boolean Also known as: is_valid?

Returns whether self is a valid and absolute HTTP URL. Self should always be crawlable if this method returns true.

Returns:

  • (Boolean)

    True if valid, absolute and crawlable, otherwise false.



# File 'lib/wgit/url.rb', line 273

def valid?
  return false if relative?
  return false unless to_origin && to_domain
  return false unless URI::DEFAULT_PARSER.make_regexp.match(normalize)

  true
end