Class: RightScraper::Repositories::Base
- Inherits:
  RightScraper::RegisteredBase
  - Object
  - RightScraper::RegisteredBase
  - RightScraper::Repositories::Base
- Defined in:
  lib/right_scraper/repositories/base.rb
Overview
Description of remote repository that needs to be scraped.
Repository definitions inherit from this base class. A repository must register its #repo_type in @@types so that it can be used with Repositories::Base::from_hash, as follows:
class Foo < ::RightScraper::Repositories::Base
  ...
  # self-register
  register_self
  register_url_schemas('foo')
end
Subclasses should override #repo_type, #retriever and #to_url; when sensible, #revision should also be overridden. The most important methods are #to_url, which will return a URI
that completely characterizes the repository, and #retriever which returns the appropriate RightScraper::Retrievers::Base to scan that repository.
Defined Under Namespace
Modules: PATTERN
Classes: RepositoryError
Instance Attribute Summary
- #display_name ⇒ Object
  (String) Human readable repository name used for progress reports.
- #resources_path ⇒ Object
  (Array of String) Subdirectories in the repository to search for resources.
- #url ⇒ Object
  (String) URL to repository (e.g. ‘git://github.com/rightscale/right_scraper.git’).
Class Method Summary
- .from_hash(repo_hash) ⇒ RightScraper::Repositories::Base
  Factory method for a new repository.
- .register_url_schemas(*args) ⇒ TrueClass
  Registers any unknown URL schemas for validation.
- .registered_url_schemas ⇒ Set
  Set of registered repo url schemas.
- .registration_module ⇒ Module
  Module for registered repository types.
Instance Method Summary
- #==(other) ⇒ Object
  Return true if this repository and other represent the same repository including the same checkout tag.
- #checkout_hash ⇒ Object
  Return a unique identifier for this revision in this repository.
- #equal_repo?(other) ⇒ Boolean
  Return true if this repository and other represent the same repository, excluding the checkout tag.
- #repo_type ⇒ Object
  (String) Type of the repository.
- #repository_hash ⇒ Object
  Return a unique identifier for this repository ignoring the tags to check out.
- #retriever(options) ⇒ Object
  (RightScraper::Retrievers::Base class) Appropriate class for retrieving this sort of repository.
- #revision ⇒ Object
  Return the revision this repository is currently looking at.
- #to_s ⇒ Object
  Unique representation for this repo; it should resolve to the same string for repos that should be cloned into the same directory.
- #to_url ⇒ Object
  Convert this repository to a URL in the style of resource URLs.
Methods inherited from RightScraper::RegisteredBase
query_registered_type, register_class, register_self, registered_types
Instance Attribute Details
#display_name ⇒ Object
(String) Human readable repository name used for progress reports
# File 'lib/right_scraper/repositories/base.rb', line 110
def display_name
  @display_name
end
#resources_path ⇒ Object
(Array of String) Subdirectories in the repository to search for resources
# File 'lib/right_scraper/repositories/base.rb', line 113
def resources_path
  @resources_path
end
#url ⇒ Object
(String) URL to repository (e.g. ‘git://github.com/rightscale/right_scraper.git’)
# File 'lib/right_scraper/repositories/base.rb', line 116
def url
  @url
end
Class Method Details
.from_hash(repo_hash) ⇒ RightScraper::Repositories::Base
Factory method for a new repository.
# File 'lib/right_scraper/repositories/base.rb', line 92
def self.from_hash(repo_hash)
  repo_type = repo_hash[:repo_type].to_s
  raise ::ArgumentError, ':repo_type is required' if repo_type.empty?
  repo_class = query_registered_type(repo_type)
  repo = repo_class.new
  validate_uri(repo_hash[:url]) unless ENV['VALIDATE_URI'].to_s == 'false'
  repo_hash.each do |k, v|
    k = k.to_sym
    next if k == :repo_type
    if [:first_credential, :second_credential].include?(k) && is_useful?(v)
      v = useful_part(v)
    end
    repo.__send__("#{k.to_s}=".to_sym, v)
  end
  repo
end
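The factory above looks up a class by :repo_type and then copies every other hash entry onto the new object through its setter. A minimal standalone sketch of that pattern may make the flow easier to follow. TinyRepo, TinyGit, and REGISTRY are hypothetical stand-ins rather than RightScraper classes, and URI validation and credential cleanup are left out.

```ruby
# Hypothetical stand-ins for illustration; not part of RightScraper.
class TinyRepo
  attr_accessor :url, :display_name

  REGISTRY = {}

  # Simplified analogue of self-registration: map a type name to a class.
  def self.register(type)
    REGISTRY[type.to_s] = self
  end

  # Simplified analogue of from_hash: pick the class, then call one setter per key.
  def self.from_hash(repo_hash)
    repo_type = repo_hash[:repo_type].to_s
    raise ArgumentError, ':repo_type is required' if repo_type.empty?
    repo_class = REGISTRY.fetch(repo_type) { raise ArgumentError, "unknown repo type: #{repo_type}" }
    repo = repo_class.new
    repo_hash.each do |key, value|
      key = key.to_sym            # string keys are accepted for everything but :repo_type
      next if key == :repo_type
      repo.__send__("#{key}=", value)
    end
    repo
  end
end

class TinyGit < TinyRepo
  attr_accessor :tag
  register('git')
end

repo = TinyRepo.from_hash(:repo_type => 'git',
                          'url'      => 'git://example.com/demo.git',
                          :tag       => 'v1')
puts repo.class
puts repo.url
```

As in the original, a key with no matching setter raises NoMethodError, so the hash should only contain attributes the target class defines.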
.register_url_schemas(*args) ⇒ TrueClass
Registers any unknown URL schemas for validation.
# File 'lib/right_scraper/repositories/base.rb', line 77
def self.register_url_schemas(*args)
  # note that set += blah seems to be badly implemented as set = set + blah
  # for the Set class, which leaves the original set object unchanged and
  # will return a new set object with the new data. only use the << operator
  # to update an existing set object.
  schemas = registered_url_schemas
  Array(args).flatten.each { |schema| schemas << schema }
  true
end
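The comment in the listing above concerns standard Ruby behaviour: `+=` on a Set builds a new object and rebinds only the local variable, whereas `<<` mutates the shared set. A short sketch using only the standard library; the variable names are illustrative.

```ruby
require 'set'

registry = Set.new(['http', 'https', 'ftp'])

# `+=` expands to `local = local + [...]`, which creates a new Set.
local = registry
local += ['git']
puts registry.include?('git')   # false: the shared set was not updated
puts local.equal?(registry)     # false: local now names a different object

# `<<` mutates the receiver, so every holder of the set sees the change.
local = registry
local << 'svn'
puts registry.include?('svn')   # true
```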
.registered_url_schemas ⇒ Set
Returns set of registered repo url schemas.
# File 'lib/right_scraper/repositories/base.rb', line 64
def self.registered_url_schemas
  unless schemas = registration_module.instance_variable_get(:@registered_url_schemas)
    schemas = ::Set.new(['http', 'https', 'ftp'])
    registration_module.instance_variable_set(:@registered_url_schemas, schemas)
  end
  schemas
end
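The method above lazily creates the set on first use and stores it as an instance variable on the registration module, so later calls return the same object. A rough standalone sketch of that idiom, where DemoRegistry and demo_schemas are hypothetical names:

```ruby
require 'set'

# Hypothetical module standing in for the registration module.
module DemoRegistry; end

# Returns the shared set, creating and storing it on first call.
def demo_schemas
  schemas = DemoRegistry.instance_variable_get(:@schemas)
  unless schemas
    schemas = Set.new(%w[http https ftp])
    DemoRegistry.instance_variable_set(:@schemas, schemas)
  end
  schemas
end

first = demo_schemas
first << 'git'
second = demo_schemas
puts second.equal?(first)      # true: same object on every call
puts second.include?('git')    # true: the earlier mutation is visible
```

Because the set lives on the module rather than on a class, every subclass that shares the module also shares the same set of schemas.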
.registration_module ⇒ Module
Returns module for registered repository types.
# File 'lib/right_scraper/repositories/base.rb', line 59
def self.registration_module
  ::RightScraper::Repositories
end
Instance Method Details
#==(other) ⇒ Object
Return true if this repository and other represent the same repository including the same checkout tag.
Parameters
- other (Repositories::Base): repository to compare with
Returns
- Boolean: true iff this repository and other are the same
# File 'lib/right_scraper/repositories/base.rb', line 190
def ==(other)
  if other.is_a?(RightScraper::Repositories::Base)
    checkout_hash == other.checkout_hash
  else
    false
  end
end
#checkout_hash ⇒ Object
Return a unique identifier for this revision in this repository.
Returns
- String: opaque unique ID for this revision in this repository
# File 'lib/right_scraper/repositories/base.rb', line 161
def checkout_hash
  repository_hash
end
#equal_repo?(other) ⇒ Boolean
Return true if this repository and other represent the same repository, excluding the checkout tag.
Parameters
- other (Repositories::Base): repository to compare with
Returns
- Boolean: true iff this repository and other are the same
# File 'lib/right_scraper/repositories/base.rb', line 206
def equal_repo?(other)
  if other.is_a?(RightScraper::Repositories::Base)
    repository_hash == other.repository_hash
  else
    false
  end
end
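In the base class, #checkout_hash simply returns #repository_hash, so #== and #equal_repo? only differ once a subclass folds a revision into #checkout_hash. The sketch below illustrates that split with hypothetical classes (SketchRepo, TaggedRepo); the use of SHA-1 and the exact inputs to the hash are assumptions, not the library's implementation.

```ruby
require 'digest/sha1'

# Hypothetical classes for illustration; the digest choice is an assumption.
class SketchRepo
  attr_reader :url

  def initialize(url)
    @url = url
  end

  def repository_hash
    Digest::SHA1.hexdigest(url)
  end

  # Default: no notion of a revision, so both hashes coincide.
  def checkout_hash
    repository_hash
  end

  def ==(other)
    other.is_a?(SketchRepo) && checkout_hash == other.checkout_hash
  end

  def equal_repo?(other)
    other.is_a?(SketchRepo) && repository_hash == other.repository_hash
  end
end

# A subclass that tracks a tag includes it in checkout_hash only.
class TaggedRepo < SketchRepo
  attr_reader :tag

  def initialize(url, tag)
    super(url)
    @tag = tag
  end

  def checkout_hash
    Digest::SHA1.hexdigest("#{url}\000#{tag}")
  end
end

a = TaggedRepo.new('git://example.com/demo.git', 'v1')
b = TaggedRepo.new('git://example.com/demo.git', 'v2')
puts a == b             # false: different tags
puts a.equal_repo?(b)   # true: same repository
```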
#repo_type ⇒ Object
(String) Type of the repository. Currently one of ‘git’, ‘svn’ or ‘download’, implemented by the appropriate subclass. Needs to be overridden by subclasses.
# File 'lib/right_scraper/repositories/base.rb', line 121
def repo_type
  raise NotImplementedError
end
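Raising NotImplementedError is how the base class marks a method that subclasses must supply. A small sketch with hypothetical classes shows the override, along with one Ruby detail worth knowing: NotImplementedError descends from ScriptError, so a bare `rescue` does not catch it.

```ruby
# Hypothetical classes illustrating the "must override" convention.
class AbstractRepo
  def repo_type
    raise NotImplementedError
  end
end

class GitLikeRepo < AbstractRepo
  def repo_type
    'git'
  end
end

puts GitLikeRepo.new.repo_type

begin
  AbstractRepo.new.repo_type
rescue NotImplementedError => e
  # The class must be named explicitly: a bare `rescue` only handles
  # StandardError and its descendants.
  puts "not overridden (#{e.class})"
end
```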
#repository_hash ⇒ Object
Return a unique identifier for this repository ignoring the tags to check out.
Returns
- String: opaque unique ID for this repository
# File 'lib/right_scraper/repositories/base.rb', line 153
def repository_hash
  digest("#{::RightScraper::PROTOCOL_VERSION}\000#{repo_type}\000#{url}")
end
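The hash input above joins its fields with NUL bytes (\000), which keeps field boundaries unambiguous. The sketch below shows the effect; the choice of SHA-1, the constant SKETCH_PROTOCOL_VERSION, and the helper name are assumptions made for illustration, since the digest helper and the protocol version value are not shown on this page.

```ruby
require 'digest/sha1'

# Assumptions: SHA-1 as the digest and 1 as the protocol version.
SKETCH_PROTOCOL_VERSION = 1

def sketch_repository_hash(repo_type, url)
  Digest::SHA1.hexdigest("#{SKETCH_PROTOCOL_VERSION}\000#{repo_type}\000#{url}")
end

# Without a separator, different field pairs can collapse to the same input.
plain_a = 'git' + 'hub.com/x'
plain_b = 'githu' + 'b.com/x'
puts plain_a == plain_b         # true: ambiguous

# With a NUL separator the two inputs stay distinct.
joined_a = "git\000hub.com/x"
joined_b = "githu\000b.com/x"
puts joined_a == joined_b       # false

puts sketch_repository_hash('git', 'git://example.com/demo.git')
```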
#retriever(options) ⇒ Object
(RightScraper::Retrievers::Base class) Appropriate class for retrieving this sort of repository. Needs to be overridden appropriately by subclasses.
Options
- :max_bytes: Maximum number of bytes to read
- :max_seconds: Maximum number of seconds to spend reading
- :basedir: Destination directory, use temp dir if not specified
- :logger: Logger to use
Returns
- retriever (Retrievers::Base): Corresponding retriever instance
# File 'lib/right_scraper/repositories/base.rb', line 136
def retriever()
  raise NotImplementedError
end
#revision ⇒ Object
Return the revision this repository is currently looking at.
Returns
- String: opaque revision type
# File 'lib/right_scraper/repositories/base.rb', line 144
def revision
  nil
end
#to_s ⇒ Object
Unique representation for this repo; it should resolve to the same string for repos that should be cloned into the same directory.
Returns
- res (String): Unique representation for this repo
# File 'lib/right_scraper/repositories/base.rb', line 170
def to_s
  res = "#{repo_type} #{url}"
end
#to_url ⇒ Object
Convert this repository to a URL in the style of resource URLs.
Returns
- URI: URL representing this repository
# File 'lib/right_scraper/repositories/base.rb', line 178
def to_url
  URI.parse(url)
end
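The base implementation delegates to URI.parse from Ruby's standard library. A brief sketch of what that call yields for the example URL used elsewhere on this page:

```ruby
require 'uri'

uri = URI.parse('git://github.com/rightscale/right_scraper.git')
puts uri.scheme   # git
puts uri.host     # github.com
puts uri.path     # /rightscale/right_scraper.git

# Malformed input raises URI::InvalidURIError rather than returning nil.
begin
  URI.parse('http://exa mple.com')
rescue URI::InvalidURIError => e
  puts "invalid: #{e.class}"
end
```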