Class: NetworkUtils::UrlInfo

Inherits:
Object
Defined in:
lib/network_utils/url_info.rb

Overview

Simple class to get URL info (validation/existence, headers, content type). It lets you get all of this without actually downloading huge files such as CSVs, images, or videos.

Constructor Details

#initialize(url, request_timeout = 10) ⇒ UrlInfo

Initialise a UrlInfo for a particular URL

Parameters:

  • url (String)

    the URL you want to get info about

  • request_timeout (Integer) (defaults to: 10)

    Max time to wait for headers from the server (seconds)



# File 'lib/network_utils/url_info.rb', line 29

def initialize(url, request_timeout = 10)
  @url = String.new(url.to_s).force_encoding('UTF-8')
  @request_timeout = request_timeout
end
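
The constructor copies the URL before forcing its encoding. A minimal sketch of why that matters, assuming a hypothetical helper named `normalise_url` (not part of NetworkUtils): `force_encoding` mutates its receiver, so working on a copy leaves the caller's string untouched and also works when the caller passes a frozen string.

```ruby
# Hypothetical helper mirroring the normalisation step in #initialize.
def normalise_url(url)
  # String.new returns an unfrozen copy, so force_encoding cannot
  # affect (or fail on) the string the caller passed in.
  String.new(url.to_s).force_encoding('UTF-8')
end

original = 'http://example.com/'.b.freeze # frozen binary (ASCII-8BIT) string
copy = normalise_url(original)

copy.encoding      # UTF-8
original.encoding  # still ASCII-8BIT
normalise_url(nil) # empty string, because nil.to_s is ""
```

Calling `force_encoding` directly on the frozen original would raise a `FrozenError`, which is presumably what the copy guards against.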

Instance Method Details

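#content_type ⇒ Array<String>?

A shortcut method to get the Content-Type of the remote resource.

Returns:

  • (Array<String>, nil)

    media types from the remote resource Content-Type header with parameters such as charset stripped, or nil if the headers or the Content-Type header are unavailable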



# File 'lib/network_utils/url_info.rb', line 72

def content_type
  headers&.fetch('content-type', nil)
         &.split(/,\s/)
         &.map { |ct| ct.split(/;\s/).first }
end
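
The parsing step in the listing above can be tried in isolation. This is a small sketch that assumes a hypothetical function name, `media_types`, and takes the raw header value as a plain string instead of reading it from `headers`.

```ruby
# Hypothetical standalone version of the parsing done in #content_type.
def media_types(header_value)
  return nil if header_value.nil?

  header_value
    .split(/,\s/)                        # several types may be comma-separated
    .map { |ct| ct.split(/;\s/).first }  # drop parameters such as charset
end

media_types('text/html; charset=utf-8')   # ["text/html"]
media_types('text/csv, application/json') # ["text/csv", "application/json"]
media_types(nil)                          # nil
```

Both patterns require whitespace after the separator, so a value written as `text/html;charset=utf-8` would come back unchanged with this approach.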

#headers ⇒ Hash?

Gets the remote resource HTTP headers. Caches the result and returns the memoised version.

Returns:

  • (Hash, nil)

    remote resource HTTP headers list or nil



# File 'lib/network_utils/url_info.rb', line 82

def headers
  return nil if @url.to_s.empty?
  return nil unless (encoded_url = encode(@url))

  Timeout.timeout(@request_timeout + CODE_TIMEOUT_EXTRA) do
    response = HTTParty.head(encoded_url, timeout: @request_timeout)
    raise response.response if response.response.is_a?(Net::HTTPServerError) ||
                               response.response.is_a?(Net::HTTPClientError)

    @headers ||= response.headers
  end
rescue SocketError, ThreadError, Errno::ENETUNREACH, Errno::ECONNREFUSED,
       Errno::EADDRNOTAVAIL, Timeout::Error, TypeError,
       Net::HTTPServerError, Net::HTTPClientError, Net::OpenTimeout
  nil
end
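
For readers without HTTParty at hand, the same idea (issue a HEAD request, return the headers, return nil on any failure) can be sketched with the standard library. This swaps in `Net::HTTP` for HTTParty and uses a hypothetical function name, `fetch_head_headers`; it is an illustration under those assumptions, not the library's implementation, and it omits the extra outer `Timeout.timeout` guard.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: HEAD request via Net::HTTP instead of HTTParty.
def fetch_head_headers(url, timeout = 10)
  return nil if url.to_s.empty?

  uri = URI.parse(url)
  # Only http and https URLs with a host can be requested.
  return nil unless uri.is_a?(URI::HTTP) && uri.host

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    response = http.head(uri.request_uri)
    # Treat 4xx and 5xx responses as "no headers available".
    next nil if response.is_a?(Net::HTTPClientError) ||
                response.is_a?(Net::HTTPServerError)

    response.each_header.to_h # header names are lower-cased by Net::HTTP
  end
rescue URI::InvalidURIError, SocketError, SystemCallError, Timeout::Error
  nil
end

fetch_head_headers('')          # nil, nothing to request
fetch_head_headers('not a url') # nil, URI.parse rejects it
```

A HEAD request asks the server for the response headers only, which is what makes it possible to inspect a large file without downloading its body.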

#is?(type) ⇒ Boolean

Check the Content-Type of the resource

Parameters:

  • type (String, Symbol, Array)

    the prefix (before “/”) or full Content-Type content

Returns:

  • (Boolean)

    true if Content-Type matches something from the types list



# File 'lib/network_utils/url_info.rb', line 38

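def is?(type)
  return false if type.to_s.empty?

  expected_types = Array.wrap(type).map(&:to_s)
  # Use any? at both levels: select returns an array, which is truthy in
  # Ruby even when empty, so it would let every expected type through.
  content_type && expected_types.any? do |t|
    content_type.any? { |ct| ct.start_with?(t) }
  end
end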
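
The matching rule described above (a prefix such as `image` or a full type such as `text/csv`) can be sketched as a standalone function. The name `type_matches?` is hypothetical, and `Array()` from core Ruby stands in for ActiveSupport's `Array.wrap` so that the snippet has no dependencies.

```ruby
# Hypothetical standalone matcher, independent of NetworkUtils.
# content_types: array of media types (or nil); expected: String, Symbol or Array.
def type_matches?(content_types, expected)
  wanted = Array(expected).map(&:to_s).reject(&:empty?)
  return false if content_types.nil? || wanted.empty?

  # True when at least one media type starts with at least one expected value.
  wanted.any? { |t| content_types.any? { |ct| ct.start_with?(t) } }
end

type_matches?(['image/png'], :image)           # true  (prefix match)
type_matches?(['text/csv'], 'text/csv')        # true  (full match)
type_matches?(['text/html'], %w[image video])  # false
type_matches?(nil, 'image')                    # false (no headers)
```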

#size ⇒ Integer

A shortcut method to get the remote resource size

Returns:

  • (Integer)

    remote resource size (bytes), 0 if the headers or the Content-Length header are unavailable



# File 'lib/network_utils/url_info.rb', line 65

def size
  headers&.fetch('content-length', 0).to_i
end
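
The one-liner above relies on two Ruby details: `&.` yields nil when there are no headers, and `nil.to_i` is 0. A small sketch with a plain Hash standing in for the headers object (the function name `content_length` is hypothetical):

```ruby
# Hypothetical standalone version of #size taking the headers as an argument.
def content_length(headers)
  # headers&.fetch(...) is nil when headers is nil; to_i then turns nil into 0.
  headers&.fetch('content-length', 0).to_i
end

content_length('content-length' => '1024') # 1024
content_length({})                         # 0 (header missing)
content_length(nil)                        # 0 (no headers at all)
```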

#valid? ⇒ Boolean

Check offline URL validity

Returns:

  • (Boolean)

    true if the URL is valid from the point of view of the standard



# File 'lib/network_utils/url_info.rb', line 50

def valid?
  @url.match?(UrlRegex.get(mode: :validation))
end
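
An offline check needs no network access, only a syntactic test of the string. The method above delegates to `UrlRegex`; as a rough illustration of the same idea using only the standard library, the sketch below uses `URI.parse` and a hypothetical name, `plausible_http_url?`. Its notion of validity is an assumption and will not match the regular expression exactly.

```ruby
require 'uri'

# Hypothetical offline check: parses as an http(s) URL and has a host.
def plausible_http_url?(url)
  uri = URI.parse(url.to_s)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

plausible_http_url?('https://example.com/data.csv') # true
plausible_http_url?('not a url')                    # false
plausible_http_url?('ftp://example.com/data.csv')   # false (not http/https)
plausible_http_url?('')                             # false
```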

#valid_online? ⇒ Boolean

Check online URL validity (& format validity as well)

Returns:

  • (Boolean)

    true if the URL is valid from the point of view of the standard & exists (has headers)



# File 'lib/network_utils/url_info.rb', line 58

def valid_online?
  valid? && headers
end