Class: RightScraper::Retrievers::CheckoutBase
Defined in: lib/right_scraper/retrievers/checkout_base.rb
Overview
Base class for retrievers that perform version control operations (CVS, SVN, etc.). Subclasses need only implement Retrievers::Base#available? and #do_checkout, but to support incremental operation they must also implement #exists? and #do_update, in addition to Retrievers::Base#ignorable_paths.
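As a rough illustration of the hooks named above, the sketch below uses a stand-in base class so that it runs without the gem. `SketchCheckoutBase`, `LocalCopyRetriever`, and the `source_dir` argument are hypothetical names introduced for this example; a real subclass would inherit from RightScraper::Retrievers::CheckoutBase and would typically shell out to its version control tool.

```ruby
require 'fileutils'

# Stand-in for CheckoutBase (assumption: only the hook defaults documented
# on this page are reproduced; the real class also provides #retrieve).
class SketchCheckoutBase
  def initialize(repo_dir)
    @repo_dir = repo_dir
  end

  def exists?
    false
  end

  def do_checkout
    raise NotImplementedError
  end

  def do_update
    raise NotImplementedError
  end
end

# Hypothetical retriever that "checks out" by copying a local directory.
class LocalCopyRetriever < SketchCheckoutBase
  def initialize(repo_dir, source_dir)
    super(repo_dir)
    @source_dir = source_dir
  end

  # Minimum required: report availability and perform a full checkout.
  def available?
    ::File.directory?(@source_dir)
  end

  def do_checkout
    # Copies the contents of source_dir (not the directory itself).
    ::FileUtils.cp_r(::File.join(@source_dir, '.'), @repo_dir)
    true
  end

  # Needed only for incremental operation.
  def exists?
    ::File.directory?(@repo_dir)
  end

  def do_update
    do_checkout
  end

  # Paths that later processing should skip (value chosen for illustration).
  def ignorable_paths
    ['.git']
  end
end
```

The first two overrides are the minimum the overview describes; the remaining three are what make an update attempt possible before a clean checkout.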
Instance Attribute Summary
Attributes inherited from Base
#logger, #max_bytes, #max_seconds, #repo_dir, #repository
Instance Method Summary
- #do_checkout ⇒ TrueClass
  Perform a de novo full checkout of the repository.
- #do_update ⇒ TrueClass
  Perform an incremental update of the checkout.
- #do_update_tag ⇒ TrueClass
  Updates the tag of the repository associated with this retriever to refer to the HEAD commit (SHA) on disk after retrieval.
- #exists? ⇒ Boolean
  Return true if a checkout exists.
- #remote_differs? ⇒ TrueClass|FalseClass
  Determines if the remote SHA/tag/branch referenced by the repository differs from what appears on disk, if possible.
- #retrieve ⇒ Object
  Attempts an incremental update, then falls back to a clean checkout of the repository.
- #size_limit_exceeded? ⇒ TrueClass|FalseClass
  Determines whether the total size of files in repo_dir exceeds the size limit.
Methods inherited from Base
#available?, #ignorable_paths, #initialize, repo_dir
Constructor Details
This class inherits a constructor from RightScraper::Retrievers::Base
Instance Method Details
#do_checkout ⇒ TrueClass
Perform a de novo full checkout of the repository. Subclasses must override this to do anything useful.
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 157

    def do_checkout
      raise NotImplementedError
    end
#do_update ⇒ TrueClass
Perform an incremental update of the checkout. Subclasses that want to handle incremental updating need to override this.
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 165

    def do_update
      raise NotImplementedError
    end
#do_update_tag ⇒ TrueClass
Updates the tag of the repository associated with this retriever to refer to the HEAD commit (SHA) on disk after retrieval.
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 173

    def do_update_tag
      raise NotImplementedError
    end
#exists? ⇒ Boolean
Return true if a checkout exists.
Returns
- (Boolean): true if the checkout already exists (and thus incremental updating can occur).
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 117

    def exists?
      false
    end
#remote_differs? ⇒ TrueClass|FalseClass
Determines if the remote SHA/tag/branch referenced by the repository differs from what appears on disk, if possible. Not all retrievers have this capability. A retriever that lacks it should return true by default to indicate that the remote has changed.
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 127

    def remote_differs?
      true
    end
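For retrievers that can read the commit SHA on disk, one possible override compares it with the reference the repository asks for. The sketch below is an assumption about how such a comparison might look, not the library's implementation: `ref_matches_head?` is a hypothetical helper, and the 7-character abbreviation follows the `[0..6]` slice used in #retrieve. A branch or tag name cannot be compared this way, which is one reason the default stays `true`.

```ruby
# Hypothetical helper: true when the requested ref already names the HEAD
# commit on disk, either as the full SHA or as its 7-character abbreviation.
def ref_matches_head?(repo_ref, full_head_sha)
  abbreviated = full_head_sha[0..6]
  repo_ref == full_head_sha || repo_ref == abbreviated
end

full = '3f2a9c1d5e6b7a8091a2b3c4d5e6f708192a3b4c'
puts ref_matches_head?(full, full)       # full SHA matches itself
puts ref_matches_head?('3f2a9c1', full)  # abbreviated SHA also matches
puts ref_matches_head?('master', full)   # a branch name never matches
```

A subclass could then define `remote_differs?` as the negation of this check, given some way (not shown here) to read the HEAD SHA from its checkout.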
#retrieve ⇒ Object
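Attempts an incremental update of an existing checkout and falls back to a clean checkout if the update is not possible or fails. Returns true after a successful update or checkout, and false if retrieval was skipped because the local HEAD already matches the repository reference.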
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 39

    def retrieve
      raise RetrieverError.new("retriever is unavailable") unless available?
      updated = false
      explanation = ''
      if exists?
        @logger.operation(:updating) do
          # a retriever may be able to determine that the repo directory is
          # already pointing to the same commit as the revision. in that case
          # we can return quickly.
          if remote_differs?
            # there is no point in updating and failing the size check when the
            # directory on disk already exceeds size limit; fall back to a clean
            # checkout in hopes that the latest revision corrects the issue.
            if size_limit_exceeded?
              explanation = 'switching to checkout due to existing directory exceeding size limit'
            else
              # attempt update.
              begin
                do_update
                updated = true
              rescue ::RightScraper::Processes::Shell::LimitError
                # update exceeded a limitation; requires user intervention
                raise
              rescue Exception => e
                # retry with clean checkout after discarding repo dir.
                explanation = 'switching to checkout after unsuccessful update'
              end
            end
          else
            # no retrieval needed but warn exactly why we didn't do full
            # checkout to avoid being challenged about it.
            repo_ref = @repository.tag
            do_update_tag
            full_head_ref = @repository.tag
            abbreviated_head_ref = full_head_ref[0..6]
            if repo_ref == full_head_ref || repo_ref == abbreviated_head_ref
              detail = abbreviated_head_ref
            else
              detail = "#{repo_ref} = #{abbreviated_head_ref}"
            end
            message =
              "Skipped updating local directory due to the HEAD commit SHA " +
              "on local matching the remote repository reference (#{detail})."
            @logger.note_warning(message)
            return false
          end
        end
      end

      # Clean checkout only if not updated.
      unless updated
        @logger.operation(:checkout, explanation) do
          # remove any full or partial directory before attempting a clean
          # checkout in case repo_dir is in a bad state.
          if exists?
            ::FileUtils.remove_entry_secure(@repo_dir)
          end
          ::FileUtils.mkdir_p(@repo_dir)
          begin
            do_checkout
          rescue Exception
            # clean checkout failed; repo directory is in an undetermined
            # state and must be deleted to prevent a future update attempt.
            if exists?
              ::FileUtils.remove_entry_secure(@repo_dir) rescue nil
            end
            raise
          end
        end
      end
      true
    end
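The branches in #retrieve can be summarized as a small decision function. This is a simplified model written for this page, not part of right_scraper: `retrieval_outcome` and its keyword arguments are hypothetical, it returns a symbol instead of performing any operation, and it leaves out the case where a `LimitError` raised during the update is re-raised rather than triggering a fallback.

```ruby
# Simplified, hypothetical model of the decision flow in #retrieve.
#   :skipped  corresponds to retrieve returning false
#   :updated  corresponds to a successful do_update
#   :checkout corresponds to the clean-checkout path
def retrieval_outcome(exists:, remote_differs: true,
                      size_limit_exceeded: false, update_succeeds: true)
  return :checkout unless exists              # nothing on disk to update
  return :skipped unless remote_differs       # local HEAD already matches
  return :checkout if size_limit_exceeded     # an update would fail the size check
  update_succeeds ? :updated : :checkout      # a failed update falls back
end

puts retrieval_outcome(exists: false)
puts retrieval_outcome(exists: true, remote_differs: false)
puts retrieval_outcome(exists: true, size_limit_exceeded: true)
puts retrieval_outcome(exists: true, update_succeeds: false)
puts retrieval_outcome(exists: true)
```

Only the `:skipped` outcome maps to a `false` return value; both `:updated` and `:checkout` end with `retrieve` returning `true`.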
#size_limit_exceeded? ⇒ TrueClass|FalseClass
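Determines whether the total size of files in repo_dir exceeds the size limit.
Returns
- (TrueClass|FalseClass): true if the total size of the non-hidden files in repo_dir exceeds max_bytes; false if it does not or if no limit is set.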
    # File 'lib/right_scraper/retrievers/checkout_base.rb', line 135

    def size_limit_exceeded?
      if @max_bytes
        # note that Dir.glob ignores hidden directories (e.g. ".git") so the
        # size total correctly excludes those hidden contents that are not to
        # be uploaded after scrape. this may cause the on-disk directory size
        # to far exceed the upload size.
        globbie = ::File.join(@repo_dir, '**/*')
        size = 0
        ::Dir.glob(globbie) do |f|
          size += ::File.stat(f).size rescue 0 if ::File.file?(f)
          break if size > @max_bytes
        end
        size > @max_bytes
      else
        false
      end
    end
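The comment about hidden directories can be checked with a standalone sketch. `visible_size_exceeds?` is a hypothetical free function that mirrors the loop above without the instance state; the file names and sizes are made up for the demonstration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical standalone version of the size check: sums the sizes of
# files matched by '**/*', which skips dotfiles and hidden directories.
def visible_size_exceeds?(dir, max_bytes)
  size = 0
  Dir.glob(File.join(dir, '**/*')) do |f|
    next unless File.file?(f)
    size += File.size(f)
    break if size > max_bytes   # stop early once the limit is passed
  end
  size > max_bytes
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, '.git'))
  File.write(File.join(dir, '.git', 'pack'), 'x' * 1000)  # hidden, not counted
  File.write(File.join(dir, 'README'), 'x' * 10)          # counted
  puts visible_size_exceeds?(dir, 100)  # false: only the 10 visible bytes count
  puts visible_size_exceeds?(dir, 5)    # true: 10 bytes exceed a 5-byte limit
end
```

Because the glob pattern is used without `File::FNM_DOTMATCH`, the 1000 bytes under `.git` never enter the total, which is the behavior the source comment relies on.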