Class: Html2rss::Url

Inherits:
Object
Includes:
Comparable
Defined in:
lib/html2rss/url.rb

Overview

A value object representing a resolved, absolute URL. Instances are frozen on creation and wrap an Addressable::URI. Provides URL resolution, sanitization, and titleization.

Examples:

Creating a URL from a relative path

url = Url.from_relative('/path/to/article', 'https://example.com')
url.to_s # => "https://example.com/path/to/article"

Sanitizing a raw URL string

url = Url.sanitize('https://example.com/  ')
url.to_s # => "https://example.com/"

Getting titleized versions

url = Url.from_relative('/foo-bar/baz.txt', 'https://example.com')
url.titleized # => "Foo Bar Baz"
url.channel_titleized # => "example.com: Foo-bar Baz.txt"

Constant Summary

URI_REGEXP =

Regular expression for basic URI format validation

Addressable::URI::URIREGEX

SUPPORTED_SCHEMES =

%w[http https].to_set.freeze

Constructor Details

#initialize(uri) ⇒ Url

Returns a new instance of Url.

Parameters:

  • uri (Addressable::URI)

    the underlying Addressable::URI object (internal use only)



# File 'lib/html2rss/url.rb', line 120

def initialize(uri)
  @uri = uri.freeze
  freeze
end
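
Because the constructor freezes both the wrapped URI and the instance itself, a Url cannot be changed after creation. The following is a minimal sketch of that pattern only; `FrozenValue` and `rename!` are hypothetical names and are not part of html2rss.

```ruby
# Hypothetical stand-in class (not Html2rss::Url) showing the
# freeze-in-constructor pattern used by the initializer above.
class FrozenValue
  def initialize(value)
    @value = value.freeze # freeze the wrapped object
    freeze                # then freeze the wrapper itself
  end

  def rename!(new_value)
    @value = new_value # raises FrozenError on a frozen instance
  end
end

v = FrozenValue.new(+'https://example.com/')
v.frozen? # => true

mutation_rejected =
  begin
    v.rename!('https://other.example/')
    false
  rescue FrozenError
    true
  end
mutation_rejected # => true
```

This is why the `with_path` and `with_query_values` methods return a new Url instead of modifying the receiver.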

Class Method Details

.for_channel(url_string) ⇒ Url?

Creates a URL for channel use. Returns nil if the input is nil, empty, or only whitespace; otherwise validates that the URL meets channel requirements (absolute, no @, supported schemes).

Examples:

Creating a channel URL

Url.for_channel('https://example.com')
# => #<Html2rss::Url:... @uri=#<Addressable::URI:... URI:https://example.com>>

Invalid channel URL

Url.for_channel('/relative/path')
# => raises ArgumentError: "URL must be absolute"

Parameters:

  • url_string (String)

    the URL string to validate and parse

Returns:

  • (Url, nil)

    the validated and parsed URL, or nil if the input is nil or blank

Raises:

  • (ArgumentError)

    if the URL doesn't meet channel requirements



# File 'lib/html2rss/url.rb', line 91

def self.for_channel(url_string)
  return nil if url_string.nil? || url_string.empty?

  stripped = url_string.strip
  return nil if stripped.empty?

  url = from_absolute(stripped)
  validate_channel_url(url)
  url
end

.from_absolute(url_string) ⇒ Url

Creates a URL from an already-absolute URL string. The string is stripped and normalized; an existing Url instance is returned unchanged.

Parameters:

  • url_string (String, Html2rss::Url)

    the absolute URL to parse

Returns:

  • (Url)

    the parsed and normalized URL

Raises:

  • (ArgumentError)

    if the URL is not absolute or cannot be parsed



# File 'lib/html2rss/url.rb', line 67

def self.from_absolute(url_string)
  return url_string if url_string.is_a?(self)

  url = new(Addressable::URI.parse(url_string.to_s.strip).normalize)
  raise ArgumentError, 'URL must be absolute' unless url.absolute?

  url
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, 'URL must be absolute'
end

.from_relative(relative_url, base_url) ⇒ Url

Creates a URL from a relative path and base URL. If relative_url is already absolute, base_url is ignored.

Parameters:

  • relative_url (String, Html2rss::Url)

    the relative URL to resolve

  • base_url (String, Html2rss::Url)

    the base URL to resolve against

Returns:

  • (Url)

    the resolved absolute URL

Raises:

  • (ArgumentError)

    if the URL cannot be parsed



# File 'lib/html2rss/url.rb', line 37

def self.from_relative(relative_url, base_url)
  url = Addressable::URI.parse(relative_url.to_s.strip)
  return new(url) if url.absolute?

  base_uri = Addressable::URI.parse(base_url.to_s)
  base_uri.path = '/' if base_uri.path.empty?

  new(base_uri.join(url).normalize)
end
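
The resolution rules follow standard relative-reference resolution. As a rough illustration, the sketch below uses Ruby's stdlib `URI.join` as a stand-in; Html2rss::Url itself uses Addressable, so treat this as an analogy rather than the library's code path. The URLs are made up.

```ruby
require 'uri'

# A reference starting with "/" replaces the whole path of the base.
absolute_path = URI.join('https://example.com/', '/path/to/article').to_s
# => "https://example.com/path/to/article"

# A reference without a leading slash replaces only the last path segment.
sibling = URI.join('https://example.com/blog/index.html', 'post-1').to_s
# => "https://example.com/blog/post-1"

# An already-absolute reference ignores the base entirely.
already_absolute = URI.join('https://example.com/', 'https://other.example/feed').to_s
# => "https://other.example/feed"
```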

.sanitize(raw_url) ⇒ Url?

Creates a URL by sanitizing a raw URL string. Extracts the first http, https, ftp, or mailto URL found in the string, stopping at whitespace, angle brackets, or double quotes, and normalizes it.

Parameters:

  • raw_url (String)

    the raw URL string to sanitize

Returns:

  • (Url, nil)

    the sanitized URL, or nil if no valid URL found



# File 'lib/html2rss/url.rb', line 53

def self.sanitize(raw_url)
  matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+})
  url = matched_urls.first.to_s.strip
  return nil if url.empty?

  new(Addressable::URI.parse(url).normalize)
end
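
To see what the extraction step matches, the pattern from the method above can be tried on its own. This sketch covers only the `scan` step, not parsing or normalization; `URL_PATTERN` is a name chosen here for illustration and the input strings are made up.

```ruby
# Same regular expression as in Url.sanitize; the constant name is illustrative.
URL_PATTERN = %r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}

raw = 'see <https://example.com/a?b=1> or https://example.org/second'

# scan returns every match; sanitize keeps only the first one.
first = raw.scan(URL_PATTERN).first
# => "https://example.com/a?b=1"

# With no match, first is nil and nil.to_s is empty,
# which is the case where sanitize returns nil.
missing = 'no link here'.scan(URL_PATTERN).first.to_s
missing.empty? # => true
```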

Instance Method Details

#<=>(other) ⇒ Integer

Compares this URL with another URL for ordering, based on their string representations. Returns 0 when the string representations are the same.

Parameters:

  • other (Url)

    the other URL to compare with

Returns:

  • (Integer)

    -1, 0, or 1 for less than, equal, or greater than



# File 'lib/html2rss/url.rb', line 224

def <=>(other)
  to_s <=> other.to_s
end

#==(other) ⇒ Boolean

Returns true if this URL is equal to another URL.

Parameters:

  • other (Object)

    the other object to compare with

Returns:

  • (Boolean)

    true if the URLs are equal



# File 'lib/html2rss/url.rb', line 233

def ==(other)
  other.is_a?(Url) && to_s == other.to_s
end

#absolute? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/html2rss/url.rb', line 133

def absolute? = @uri.absolute?

#channel_titleized ⇒ String

Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.

Examples:

With path

url = Url.from_absolute('https://example.com/foo-bar/baz')
url.channel_titleized # => "example.com: Foo-bar Baz"

Without path (root URL)

url = Url.from_absolute('https://example.com')
url.channel_titleized # => "example.com"

Returns:

  • (String)

    the titleized channel URL



# File 'lib/html2rss/url.rb', line 211

def channel_titleized
  nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?)
  host = @uri.host

  nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host
end

#eql?(other) ⇒ Boolean

Supports hash-based comparisons by ensuring equality semantics match hash.

Parameters:

  • other (Object)

    the other object to compare with

Returns:

  • (Boolean)

    true if the URLs are considered equal



# File 'lib/html2rss/url.rb', line 242

def eql?(other)
  other.is_a?(Url) && to_s == other.to_s
end

#fragment ⇒ Object



# File 'lib/html2rss/url.rb', line 132

def fragment = @uri.fragment

#hash ⇒ Integer

Returns the hash code for this URL.

Returns:

  • (Integer)

    the hash code



# File 'lib/html2rss/url.rb', line 250

def hash
  to_s.hash
end
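
Taken together, `<=>`, `==`, `eql?`, and `hash` let URLs be sorted, de-duplicated, and used as hash keys. The sketch below reproduces the same four methods on a hypothetical `MiniUrl` class (not part of html2rss) so it runs without the gem.

```ruby
# Hypothetical stand-in (not Html2rss::Url) with the same comparison methods.
class MiniUrl
  include Comparable

  def initialize(string)
    @string = string.dup.freeze
  end

  def to_s
    @string
  end

  # Ordering is by string representation.
  def <=>(other)
    to_s <=> other.to_s
  end

  # Equality additionally requires the same class.
  def ==(other)
    other.is_a?(MiniUrl) && to_s == other.to_s
  end

  def eql?(other)
    self == other
  end

  def hash
    to_s.hash
  end
end

a = MiniUrl.new('https://example.com/a')
b = MiniUrl.new('https://example.com/b')
dup_a = MiniUrl.new('https://example.com/a')

[b, a].sort.map(&:to_s)      # => ["https://example.com/a", "https://example.com/b"]
a == dup_a                   # => true
a == 'https://example.com/a' # => false, a plain String is not a MiniUrl
[a, dup_a, b].uniq.size      # => 2, uniq relies on hash and eql?
{ a => :seen }.key?(dup_a)   # => true
```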

#host ⇒ Object



# File 'lib/html2rss/url.rb', line 128

def host = @uri.host

#inspect ⇒ String

Returns a string representation of the URL for debugging.

Returns:

  • (String)

    the debug representation



# File 'lib/html2rss/url.rb', line 258

def inspect
  "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>"
end

#path ⇒ Object



# File 'lib/html2rss/url.rb', line 130

def path = @uri.path

#path_segments ⇒ Array<String>

Returns the URL path split into non-empty segments.

Returns:

  • (Array<String>)

    normalized path segments



# File 'lib/html2rss/url.rb', line 147

def path_segments
  @uri.path.to_s.split('/').reject(&:empty?)
end

#port ⇒ Object



# File 'lib/html2rss/url.rb', line 129

def port = @uri.port

#query ⇒ Object



# File 'lib/html2rss/url.rb', line 131

def query = @uri.query

#query_values ⇒ Hash{String => String}

Returns the URL query string as a hash of string keys and values.

Returns:

  • (Hash{String => String})

    normalized query parameters



# File 'lib/html2rss/url.rb', line 139

def query_values
  @uri.query_values(Hash) || {}
end

#scheme ⇒ Object



# File 'lib/html2rss/url.rb', line 127

def scheme = @uri.scheme

#titleized ⇒ String

Returns a titleized representation of the URL path. Decodes the path, replaces every character other than letters, digits, and dots with a space, capitalizes each word, and strips the trailing file extension.

Examples:

Basic titleization

url = Url.from_absolute('https://example.com/foo-bar/baz.txt')
url.titleized # => "Foo Bar Baz"

With URL encoding

url = Url.from_absolute('https://example.com/hello%20world/article.html')
url.titleized # => "Hello World Article"

Returns:

  • (String)

    the titleized path, or empty string if path is empty



# File 'lib/html2rss/url.rb', line 185

def titleized
  path = @uri.path
  return '' if path.empty?

  nicer_path = CGI.unescapeURIComponent(path)
                  .split('/')
                  .flat_map do |part|
                    part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split
                  end

  nicer_path.map!(&:capitalize)
  File.basename(nicer_path.join(' '), '.*')
end
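
The pipeline above can be followed step by step in a standalone function. This is a sketch under one stated simplification: percent-decoding is left out, so the path is assumed to be decoded already. `titleize_path` is a hypothetical helper name and not part of html2rss.

```ruby
# Standalone sketch of the titleized pipeline (hypothetical helper).
# Percent-decoding is omitted; the path is assumed to be decoded already.
def titleize_path(path)
  return '' if path.empty?

  words = path.split('/').flat_map do |part|
    # Replace everything except letters, digits, and dots with spaces,
    # then split on whitespace into separate words.
    part.gsub(/[^a-zA-Z0-9.]/, ' ').split
  end

  # Capitalize each word, then drop the trailing extension of the joined title.
  File.basename(words.map(&:capitalize).join(' '), '.*')
end

titleize_path('/foo-bar/baz.txt')          # => "Foo Bar Baz"
titleize_path('/hello world/article.html') # => "Hello World Article"
titleize_path('/v1.2/release_notes.md')    # => "V1.2 Release Notes"
```

Only the final extension is removed, so a dot in an earlier segment, as in `v1.2`, is kept.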

#to_s ⇒ Object

Returns the string form of the URL, delegated to the underlying URI.



# File 'lib/html2rss/url.rb', line 126

def to_s = @uri.to_s

#with_path(path) ⇒ Url

Returns a copy of the URL with the provided path.

Parameters:

  • path (String)

    normalized absolute path

Returns:

  • (Url)

    a new URL with the updated path



# File 'lib/html2rss/url.rb', line 156

def with_path(path)
  uri = @uri.dup
  uri.path = path
  self.class.from_absolute(uri.normalize.to_s)
end

#with_query_values(values) ⇒ Url

Returns a copy of the URL with the provided query values.

Parameters:

  • values (Hash{String, Symbol => #to_s})

    query parameters to assign

Returns:

  • (Url)

    a new URL with the updated query string



# File 'lib/html2rss/url.rb', line 167

def with_query_values(values)
  uri = @uri.dup
  uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s)
  self.class.from_absolute(uri.normalize.to_s)
end
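
The key and value normalization in the method above means callers can pass symbols and non-string values. A small sketch of that step follows; stdlib `URI.encode_www_form` is used as a stand-in for Addressable's `query_values=`, and the sample parameters are made up.

```ruby
require 'uri'

values = { page: 2, 'q' => :ruby }

# Same normalization as in with_query_values: keys and values become strings.
normalized = values.transform_keys(&:to_s).transform_values(&:to_s)
# => {"page"=>"2", "q"=>"ruby"}

# Stdlib encoding, shown only to illustrate the resulting query string.
query = URI.encode_www_form(normalized)
# => "page=2&q=ruby"
```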