Class: Html2rss::Url
Overview
A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.
Constant Summary collapse
- URI_REGEXP =
Regular expression for basic URI format validation
Addressable::URI::URIREGEX
- SUPPORTED_SCHEMES =
%w[http https].to_set.freeze
Class Method Summary collapse
-
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation.
-
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
-
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
-
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string.
Instance Method Summary collapse
-
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality.
-
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
- #absolute? ⇒ Boolean
-
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host.
-
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match
hash. - #fragment ⇒ Object
-
#hash ⇒ Integer
Returns the hash code for this URL.
- #host ⇒ Object
-
#initialize(uri) ⇒ Url
constructor
A new instance of Url.
-
#inspect ⇒ String
Returns a string representation of the URL for debugging.
- #path ⇒ Object
-
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
- #port ⇒ Object
- #query ⇒ Object
-
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
- #scheme ⇒ Object
-
#titleized ⇒ String
Returns a titleized representation of the URL path.
-
#to_s ⇒ Object
Delegate common URI operations to the underlying URI.
-
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
-
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
Constructor Details
#initialize(uri) ⇒ Url
Returns a new instance of Url.
120 121 122 123 |
# File 'lib/html2rss/url.rb', line 120 def initialize(uri) @uri = uri.freeze freeze end |
Class Method Details
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes).
91 92 93 94 95 96 97 98 99 100 |
# File 'lib/html2rss/url.rb', line 91 def self.for_channel(url_string) return nil if url_string.nil? || url_string.empty? stripped = url_string.strip return nil if stripped.empty? url = from_absolute(stripped) validate_channel_url(url) url end |
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
67 68 69 70 71 72 73 74 75 76 |
# File 'lib/html2rss/url.rb', line 67 def self.from_absolute(url_string) return url_string if url_string.is_a?(self) url = new(Addressable::URI.parse(url_string.to_s.strip).normalize) raise ArgumentError, 'URL must be absolute' unless url.absolute? url rescue Addressable::URI::InvalidURIError raise ArgumentError, 'URL must be absolute' end |
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
37 38 39 40 41 42 43 44 45 |
# File 'lib/html2rss/url.rb', line 37 def self.from_relative(relative_url, base_url) url = Addressable::URI.parse(relative_url.to_s.strip) return new(url) if url.absolute? base_uri = Addressable::URI.parse(base_url.to_s) base_uri.path = '/' if base_uri.path.empty? new(base_uri.join(url).normalize) end |
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string. Removes spaces and extracts the first valid URL from the string.
53 54 55 56 57 58 59 |
# File 'lib/html2rss/url.rb', line 53 def self.sanitize(raw_url) matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}) url = matched_urls.first.to_s.strip return nil if url.empty? new(Addressable::URI.parse(url).normalize) end |
Instance Method Details
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality. URLs are considered equal if their string representations are the same.
224 225 226 |
# File 'lib/html2rss/url.rb', line 224 def <=>(other) to_s <=> other.to_s end |
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
233 234 235 |
# File 'lib/html2rss/url.rb', line 233 def ==(other) other.is_a?(Url) && to_s == other.to_s end |
#absolute? ⇒ Boolean
133 |
# File 'lib/html2rss/url.rb', line 133 def absolute? = @uri.absolute? |
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.
211 212 213 214 215 216 |
# File 'lib/html2rss/url.rb', line 211 def channel_titleized nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?) host = @uri.host nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host end |
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match hash.
242 243 244 |
# File 'lib/html2rss/url.rb', line 242 def eql?(other) other.is_a?(Url) && to_s == other.to_s end |
#fragment ⇒ Object
132 |
# File 'lib/html2rss/url.rb', line 132 def fragment = @uri.fragment |
#hash ⇒ Integer
Returns the hash code for this URL.
250 251 252 |
# File 'lib/html2rss/url.rb', line 250 def hash to_s.hash end |
#host ⇒ Object
128 |
# File 'lib/html2rss/url.rb', line 128 def host = @uri.host |
#inspect ⇒ String
Returns a string representation of the URL for debugging.
258 259 260 |
# File 'lib/html2rss/url.rb', line 258 def inspect "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>" end |
#path ⇒ Object
130 |
# File 'lib/html2rss/url.rb', line 130 def path = @uri.path |
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
147 148 149 |
# File 'lib/html2rss/url.rb', line 147 def path_segments @uri.path.to_s.split('/').reject(&:empty?) end |
#port ⇒ Object
129 |
# File 'lib/html2rss/url.rb', line 129 def port = @uri.port |
#query ⇒ Object
131 |
# File 'lib/html2rss/url.rb', line 131 def query = @uri.query |
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
139 140 141 |
# File 'lib/html2rss/url.rb', line 139 def query_values @uri.query_values(Hash) || {} end |
#scheme ⇒ Object
127 |
# File 'lib/html2rss/url.rb', line 127 def scheme = @uri.scheme |
#titleized ⇒ String
Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.
185 186 187 188 189 190 191 192 193 194 195 196 197 |
# File 'lib/html2rss/url.rb', line 185 def titleized path = @uri.path return '' if path.empty? nicer_path = CGI.unescapeURIComponent(path) .split('/') .flat_map do |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split end nicer_path.map!(&:capitalize) File.basename(nicer_path.join(' '), '.*') end |
#to_s ⇒ Object
Delegate common URI operations to the underlying URI
126 |
# File 'lib/html2rss/url.rb', line 126 def to_s = @uri.to_s |
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
156 157 158 159 160 |
# File 'lib/html2rss/url.rb', line 156 def with_path(path) uri = @uri.dup uri.path = path self.class.from_absolute(uri.normalize.to_s) end |
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
167 168 169 170 171 |
# File 'lib/html2rss/url.rb', line 167 def with_query_values(values) uri = @uri.dup uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s) self.class.from_absolute(uri.normalize.to_s) end |