Class: Html2rss::Url

Inherits: Object

Includes: Comparable

Defined in: lib/html2rss/url.rb

Overview

A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.

Examples:

Creating a URL from a relative path

url = Url.from_relative('/path/to/article', 'https://example.com')
url.to_s # => "https://example.com/path/to/article"

Sanitizing a raw URL string

url = Url.sanitize('https://example.com/  ')
url.to_s # => "https://example.com/"

Getting titleized versions

url = Url.from_relative('/foo-bar/baz.txt', 'https://example.com')
url.titleized # => "Foo Bar Baz"
url.channel_titleized # => "example.com: Foo-bar Baz.txt"

Constant Summary

URI_REGEXP =
Regular expression for basic URI format validation

Addressable::URI::URIREGEX

SUPPORTED_SCHEMES =

%w[http https].to_set.freeze

Class Method Summary

.for_channel(url_string) ⇒ Url?
Creates a URL for channel use with validation; returns nil for blank input.
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string.

Instance Method Summary

#<=>(other) ⇒ Integer
Compares this URL with another URL for ordering, based on their string representations.
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
#absolute? ⇒ Boolean
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host.
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match hash.
#fragment ⇒ Object
#hash ⇒ Integer
Returns the hash code for this URL.
#host ⇒ Object
#initialize(uri) ⇒ Url constructor
A new instance of Url.
#inspect ⇒ String
Returns a string representation of the URL for debugging.
#path ⇒ Object
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
#port ⇒ Object
#query ⇒ Object
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
#scheme ⇒ Object
#titleized ⇒ String
Returns a titleized representation of the URL path.
#to_s ⇒ Object
Returns the URL as a string, delegating to the underlying URI.
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.

Constructor Details

#initialize(uri) ⇒ `Url`

Returns a new instance of Url.

Parameters:

uri (Addressable::URI) —
the underlying Addressable::URI object (internal use only)

# File 'lib/html2rss/url.rb', line 120

def initialize(uri)
  @uri = uri.freeze
  freeze
end
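
Because the constructor freezes both the wrapped URI and the instance itself, a Url cannot be modified after creation. The following is a minimal stand-alone sketch of that pattern, not html2rss code: the `FrozenValue` class and its `rename!` method are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for a frozen value object; not part of html2rss.
class FrozenValue
  attr_reader :value

  def initialize(value)
    @value = value.freeze # freeze the wrapped object
    freeze                # then freeze the wrapper itself
  end

  # Hypothetical mutator, present only to show that mutation fails.
  def rename!(new_value)
    @value = new_value
  end
end

v = FrozenValue.new(+'https://example.com/')
v.frozen?       # true
v.value.frozen? # true

begin
  v.rename!('https://other.test/')
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

This is also why methods such as #with_path and #with_query_values return a new Url instead of changing the receiver.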

Class Method Details

.for_channel(url_string) ⇒ `Url`?

Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes). Returns nil without validating when the input is nil, empty, or whitespace-only.

Examples:

Creating a channel URL

Url.for_channel('https://example.com')
# => #<Html2rss::Url:... @uri=#<Addressable::URI:... URI:https://example.com/>>

Invalid channel URL

Url.for_channel('/relative/path')
# => raises ArgumentError: "URL must be absolute"

Parameters:

url_string (String) —
the URL string to validate and parse

Returns:

(Url, nil) —
the validated and parsed URL, or nil if the input is nil, empty, or whitespace-only

Raises:

(ArgumentError) —
if the URL doesn't meet channel requirements

# File 'lib/html2rss/url.rb', line 91

def self.for_channel(url_string)
  return nil if url_string.nil? || url_string.empty?

  stripped = url_string.strip
  return nil if stripped.empty?

  url = from_absolute(stripped)
  validate_channel_url(url)
  url
end
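
The private `validate_channel_url` helper is not shown on this page. As a rough sketch of the three stated requirements (absolute, no @, supported schemes), the following uses Ruby's standard `URI` library instead of Addressable. The `channel_url` method name and every error message except 'URL must be absolute' are assumptions made for illustration.

```ruby
require 'set'
require 'uri'

SUPPORTED_SCHEMES = %w[http https].to_set.freeze

# Hypothetical stand-in for Url.for_channel, built on stdlib URI.
def channel_url(url_string)
  # Blank input is not an error: it yields nil, as in the source above.
  return nil if url_string.nil? || url_string.strip.empty?

  uri = URI.parse(url_string.strip)
  raise ArgumentError, 'URL must be absolute' unless uri.absolute?
  # Assumed messages below; the real helper may word them differently.
  raise ArgumentError, 'URL must not contain an @ character' if uri.to_s.include?('@')
  raise ArgumentError, 'URL scheme is not supported' unless SUPPORTED_SCHEMES.include?(uri.scheme)

  uri.to_s
end

channel_url('   ')                 # nil
channel_url('https://example.com') # "https://example.com"

begin
  channel_url('ftp://example.com/feed')
rescue ArgumentError => e
  puts e.message
end
```

Callers should therefore handle both outcomes: a nil return for missing input and an ArgumentError for input that is present but invalid.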

.from_absolute(url_string) ⇒ `Url`

Creates a URL from an already-absolute URL string.

Parameters:

url_string (String, Html2rss::Url) —
the absolute URL to parse

Returns:

(Url) —
the parsed and normalized URL

Raises:

(ArgumentError) —
if the URL is not absolute or cannot be parsed

# File 'lib/html2rss/url.rb', line 67

def self.from_absolute(url_string)
  return url_string if url_string.is_a?(self)

  url = new(Addressable::URI.parse(url_string.to_s.strip).normalize)
  raise ArgumentError, 'URL must be absolute' unless url.absolute?

  url
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, 'URL must be absolute'
end

.from_relative(relative_url, base_url) ⇒ `Url`

Creates a URL from a relative path and base URL.

Parameters:

relative_url (String, Html2rss::Url) —
the relative URL to resolve
base_url (String, Html2rss::Url) —
the base URL to resolve against

Returns:

(Url) —
the resolved absolute URL

Raises:

(ArgumentError) —
if the URL cannot be parsed

# File 'lib/html2rss/url.rb', line 37

def self.from_relative(relative_url, base_url)
  url = Addressable::URI.parse(relative_url.to_s.strip)
  return new(url) if url.absolute?

  base_uri = Addressable::URI.parse(base_url.to_s)
  base_uri.path = '/' if base_uri.path.empty?

  new(base_uri.join(url).normalize)
end
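
The resolution steps above (pass absolute input through, default an empty base path to '/', then join) can be sketched with Ruby's standard `URI` library. This is an approximation under stated assumptions: the `resolve` method name is hypothetical, it returns a String, and stdlib `URI` does not apply Addressable's normalization.

```ruby
require 'uri'

# Hypothetical stand-in for Url.from_relative, built on stdlib URI.
def resolve(relative_url, base_url)
  uri = URI.parse(relative_url.to_s.strip)
  return uri.to_s if uri.absolute? # already absolute: the base is ignored

  base = URI.parse(base_url.to_s)
  base.path = '/' if base.path.empty? # same empty-path guard as the source
  base.merge(uri).to_s
end

resolve('/path/to/article', 'https://example.com')
# => "https://example.com/path/to/article"
resolve('post-1', 'https://example.com/blog/')
# => "https://example.com/blog/post-1"
resolve('https://other.test/x', 'https://example.com')
# => "https://other.test/x"
```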

.sanitize(raw_url) ⇒ `Url`?

Creates a URL by sanitizing a raw URL string. Extracts the first URL-like substring (http, https, ftp, or mailto) from the string, strips surrounding whitespace, and normalizes it.

Parameters:

raw_url (String) —
the raw URL string to sanitize

Returns:

(Url, nil) —
the sanitized URL, or nil if no valid URL found

# File 'lib/html2rss/url.rb', line 53

def self.sanitize(raw_url)
  matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+})
  url = matched_urls.first.to_s.strip
  return nil if url.empty?

  new(Addressable::URI.parse(url).normalize)
end
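
The extraction step can be tried in isolation with plain Ruby. The sketch below reuses the regular expression from the source; the `URL_PATTERN` constant and `first_url` method are hypothetical names, and the result is a String, not a Url, because normalization through Addressable is left out.

```ruby
# Same pattern as in .sanitize; the constant name is an assumption.
URL_PATTERN = %r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}

# Hypothetical helper: returns the first URL-like substring, or nil.
def first_url(raw)
  candidate = raw.to_s.scan(URL_PATTERN).first.to_s.strip
  candidate.empty? ? nil : candidate
end

first_url('see <https://example.com/a> and http://b.test')
# => "https://example.com/a"
first_url('  https://example.com/  ') # => "https://example.com/"
first_url('mailto:someone@example.com') # => "mailto:someone@example.com"
first_url('no links here')             # => nil
```

Whitespace, angle brackets, and double quotes end a match, which is what lets the method pull a URL out of surrounding markup or prose.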

Instance Method Details

#<=>(other) ⇒ `Integer`

Compares this URL with another URL for ordering. The comparison is performed on the string representations of both objects, so two URLs compare as 0 when their strings are identical.

Parameters:

other (Url) —
the other URL to compare with

Returns:

(Integer) —
-1, 0, or 1 for less than, equal, or greater than




# File 'lib/html2rss/url.rb', line 224

def <=>(other)
  to_s <=> other.to_s
end
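
Since the class includes Comparable, defining `<=>` is enough to get sorting and the relational operators. A minimal sketch with a hypothetical `SortableLink` class (not part of html2rss) that compares by string in the same way:

```ruby
# Hypothetical stand-in that orders itself by string representation.
class SortableLink
  include Comparable

  def initialize(href)
    @href = href
  end

  def to_s = @href

  def <=>(other)
    to_s <=> other.to_s
  end
end

a = SortableLink.new('https://a.test/')
b = SortableLink.new('https://b.test/')

[b, a].sort.map(&:to_s) # => ["https://a.test/", "https://b.test/"]
a < b                   # => true
[b, a].min.to_s         # => "https://a.test/"
```

Ordering is plain string ordering, so it reflects character order in the URL text and carries no URL-specific meaning.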

#==(other) ⇒ `Boolean`

Returns true if this URL is equal to another URL.

Parameters:

other (Object) —
the other object to compare with

Returns:

(Boolean) —
true if the URLs are equal




# File 'lib/html2rss/url.rb', line 233

def ==(other)
  other.is_a?(Url) && to_s == other.to_s
end

#absolute? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/html2rss/url.rb', line 133

def absolute? = @uri.absolute?

#channel_titleized ⇒ `String`

Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.

Examples:

With path

url = Url.from_absolute('https://example.com/foo-bar/baz')
url.channel_titleized # => "example.com: Foo-bar Baz"

Without path (root URL)

url = Url.from_absolute('https://example.com')
url.channel_titleized # => "example.com"

Returns:

(String) —
the titleized channel URL

# File 'lib/html2rss/url.rb', line 211

def channel_titleized
  nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?)
  host = @uri.host

  nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host
end
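
The method body splits only on '/' and capitalizes each segment as a whole, so hyphens and file extensions inside a segment are kept. A stand-alone sketch of the same steps follows. The `channel_title` method name is hypothetical, and `URI.decode_www_form_component` from the standard library is used as a stand-in for `CGI.unescapeURIComponent` (unlike the original, it also turns '+' into a space).

```ruby
require 'uri'

# Hypothetical helper mirroring the steps of #channel_titleized.
def channel_title(host, path)
  segments = URI.decode_www_form_component(path).split('/').reject(&:empty?)
  return host if segments.empty?

  "#{host}: #{segments.map(&:capitalize).join(' ')}"
end

channel_title('example.com', '/foo-bar/baz')   # => "example.com: Foo-bar Baz"
channel_title('example.com', '/hello%20world') # => "example.com: Hello world"
channel_title('example.com', '/')              # => "example.com"
```

Compare this with #titleized, which also splits on non-alphanumeric characters and strips the file extension.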

#eql?(other) ⇒ `Boolean`

Supports hash-based comparisons by ensuring equality semantics match hash.

Parameters:

other (Object) —
the other object to compare with

Returns:

(Boolean) —
true if the URLs are considered equal




# File 'lib/html2rss/url.rb', line 242

def eql?(other)
  other.is_a?(Url) && to_s == other.to_s
end

#fragment ⇒ `Object`

# File 'lib/html2rss/url.rb', line 132

def fragment = @uri.fragment

#hash ⇒ `Integer`

Returns the hash code for this URL.

Returns:

(Integer) —
the hash code




# File 'lib/html2rss/url.rb', line 250

def hash
  to_s.hash
end
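
Defining `==`, `eql?`, and `hash` consistently on the string representation is what allows distinct instances with the same URL to collapse in a Hash, a Set, or `Array#uniq`. A minimal sketch using a hypothetical `KeyLink` class (not part of html2rss) with the same three definitions:

```ruby
require 'set'

# Hypothetical stand-in with string-based equality and hashing.
class KeyLink
  def initialize(href)
    @href = href
  end

  def to_s = @href

  def ==(other)
    other.is_a?(KeyLink) && to_s == other.to_s
  end

  def eql?(other)
    other.is_a?(KeyLink) && to_s == other.to_s
  end

  def hash
    to_s.hash
  end
end

a = KeyLink.new('https://example.com/')
b = KeyLink.new('https://example.com/')

a.equal?(b)               # => false (two separate objects)
a == b                    # => true
{ a => :seen }[b]         # => :seen
Set[a, b].size            # => 1
a == 'https://example.com/' # => false (a String is not a KeyLink)
```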

#host ⇒ `Object`

# File 'lib/html2rss/url.rb', line 128

def host = @uri.host

#inspect ⇒ `String`

Returns a string representation of the URL for debugging.

Returns:

(String) —
the debug representation




# File 'lib/html2rss/url.rb', line 258

def inspect
  "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>"
end

#path ⇒ `Object`

# File 'lib/html2rss/url.rb', line 130

def path = @uri.path

#path_segments ⇒ `Array<String>`

Returns the URL path split into non-empty segments.

Returns:

(Array<String>) —
normalized path segments




# File 'lib/html2rss/url.rb', line 147

def path_segments
  @uri.path.to_s.split('/').reject(&:empty?)
end

#port ⇒ `Object`

# File 'lib/html2rss/url.rb', line 129

def port = @uri.port

#query ⇒ `Object`

# File 'lib/html2rss/url.rb', line 131

def query = @uri.query

#query_values ⇒ `Hash{String => String}`

Returns the URL query string as a hash of string keys and values.

Returns:

(Hash{String => String}) —
normalized query parameters




# File 'lib/html2rss/url.rb', line 139

def query_values
  @uri.query_values(Hash) || {}
end

#scheme ⇒ `Object`

# File 'lib/html2rss/url.rb', line 127

def scheme = @uri.scheme

#titleized ⇒ `String`

Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.

Examples:

Basic titleization

url = Url.from_absolute('https://example.com/foo-bar/baz.txt')
url.titleized # => "Foo Bar Baz"

With URL encoding

url = Url.from_absolute('https://example.com/hello%20world/article.html')
url.titleized # => "Hello World Article"

Returns:

(String) —
the titleized path, or empty string if path is empty

# File 'lib/html2rss/url.rb', line 185

def titleized
  path = @uri.path
  return '' if path.empty?

  nicer_path = CGI.unescapeURIComponent(path)
                  .split('/')
                  .flat_map do |part|
                    part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split
                  end

  nicer_path.map!(&:capitalize)
  File.basename(nicer_path.join(' '), '.*')
end
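
The same pipeline (decode, split on '/', replace non-alphanumeric characters except '.', capitalize, then drop the trailing extension) can be run outside the class. In this sketch the `titleize_path` method name is hypothetical, and `URI.decode_www_form_component` stands in for `CGI.unescapeURIComponent` (it also turns '+' into a space, which the original does not).

```ruby
require 'uri'

# Hypothetical helper mirroring the steps of #titleized.
def titleize_path(path)
  return '' if path.empty?

  words = URI.decode_www_form_component(path)
             .split('/')
             .flat_map { |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').split }
             .map(&:capitalize)

  # File.basename with '.*' strips whatever follows the last dot.
  File.basename(words.join(' '), '.*')
end

titleize_path('/foo-bar/baz.txt')             # => "Foo Bar Baz"
titleize_path('/hello%20world/article.html')  # => "Hello World Article"
titleize_path('')                             # => ""
```

Dots are kept until the final step so that `File.basename` can recognize and remove the extension.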

#to_s ⇒ `Object`

Returns the URL as a string, delegating to the underlying URI.

# File 'lib/html2rss/url.rb', line 126

def to_s = @uri.to_s

#with_path(path) ⇒ `Url`

Returns a copy of the URL with the provided path.

Parameters:

path (String) —
normalized absolute path

Returns:

(Url) —
a new URL with the updated path

# File 'lib/html2rss/url.rb', line 156

def with_path(path)
  uri = @uri.dup
  uri.path = path
  self.class.from_absolute(uri.normalize.to_s)
end

#with_query_values(values) ⇒ `Url`

Returns a copy of the URL with the provided query values.

Parameters:

values (Hash{String, Symbol => #to_s}) —
query parameters to assign

Returns:

(Url) —
a new URL with the updated query string

# File 'lib/html2rss/url.rb', line 167

def with_query_values(values)
  uri = @uri.dup
  uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s)
  self.class.from_absolute(uri.normalize.to_s)
end
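
Reading and replacing query parameters can be approximated with Ruby's standard `URI` library. This is a sketch under assumptions, not the html2rss implementation: `query_hash` and `with_query` are hypothetical names, both work on Strings, and Addressable may order or escape parameters differently from `URI.encode_www_form`.

```ruby
require 'uri'

# Hypothetical counterpart to #query_values: always returns a Hash.
def query_hash(url)
  URI.decode_www_form(URI.parse(url).query.to_s).to_h
end

# Hypothetical counterpart to #with_query_values: returns a new string
# and leaves the input untouched.
def with_query(url, values)
  uri = URI.parse(url)
  uri.query = URI.encode_www_form(values.transform_keys(&:to_s).transform_values(&:to_s))
  uri.to_s
end

original = 'https://example.com/search'
updated  = with_query(original, { q: 'ruby', page: 2 })

updated              # => "https://example.com/search?q=ruby&page=2"
original             # => "https://example.com/search"
query_hash(updated)  # => {"q"=>"ruby", "page"=>"2"}
query_hash(original) # => {}
```

As in the source, keys and values are converted with `to_s` first, so symbols and integers are accepted.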
