Class: Gitlab::Git::Blob
- Inherits:
- Object
- Object
- Gitlab::Git::Blob
- Extended by:
- WrapsGitalyErrors
- Includes:
- BlobHelper, EncodingHelper
- Defined in:
- lib/gitlab/git/blob.rb
Constant Summary
- MAX_DATA_DISPLAY_SIZE =
This number is the maximum amount of data that we want to display to the user. We load as much as we can for encoding detection and LFS pointer parsing. All other cases where we need full blob data should use load_all_data!.
10.megabytes
- BATCH_SIZE =
The number of blobs loaded in a single Gitaly call. When a large number of blobs is requested, they are fetched in multiple Gitaly calls.
250
- LFS_POINTER_MIN_SIZE =
These limits are used as a heuristic to ignore files which can’t be LFS pointers. The format of these is described in github.com/git-lfs/git-lfs/blob/master/docs/spec.md#the-pointer
120.bytes
- LFS_POINTER_MAX_SIZE =
200.bytes
Constants included from EncodingHelper
EncodingHelper::BOM_UTF8, EncodingHelper::ENCODING_CONFIDENCE_THRESHOLD, EncodingHelper::ESCAPED_CHARS, EncodingHelper::UNICODE_REPLACEMENT_CHARACTER
Constants included from BlobHelper
Instance Attribute Summary
-
#binary ⇒ Object
Returns the value of attribute binary.
-
#commit_id ⇒ Object
Returns the value of attribute commit_id.
- #data ⇒ Object
-
#id ⇒ Object
Returns the value of attribute id.
-
#loaded_size ⇒ Object
Returns the value of attribute loaded_size.
-
#mode ⇒ Object
Returns the value of attribute mode.
- #name ⇒ Object
- #path ⇒ Object
-
#size ⇒ Object
Returns the value of attribute size.
Class Method Summary
-
.batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
Returns an array of Blob instances, specified in blob_references as [[commit_sha, path], [commit_sha, path], …].
-
.batch_lfs_pointers(repository, blob_ids) ⇒ Object
Finds LFS blobs given an array of SHA IDs. Returns an array of Gitlab::Git::Blob. Does not guarantee that blob data will be set.
-
.batch_metadata(repository, blob_references) ⇒ Object
Returns an array of Blob instances with metadata only; the data attribute has no content.
- .binary?(data, cache_key: nil) ⇒ Boolean
- .find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
- .gitlab_blob_size ⇒ Object
- .gitlab_blob_truncated_false ⇒ Object
- .gitlab_blob_truncated_true ⇒ Object
- .raw(repository, sha, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
- .size_could_be_lfs?(size) ⇒ Boolean
- .tree_entry(repository, sha, path, limit) ⇒ Object
Instance Method Summary
- #binary_in_repo? ⇒ Boolean
- #external_storage ⇒ Object
-
#initialize(options) ⇒ Blob
constructor
A new instance of Blob.
- #lfs_oid ⇒ Object
-
#lfs_pointer? ⇒ Boolean
A valid LFS object pointer is a text file consisting of version, oid, and size. See github.com/github/git-lfs/blob/v1.1.0/docs/spec.md#the-pointer.
- #lfs_size ⇒ Object (also: #external_size)
-
#load_all_data!(repository) ⇒ Object
Load all blob data (not just the first MAX_DATA_DISPLAY_SIZE bytes) into memory as a Ruby string.
- #truncated? ⇒ Boolean
Methods included from WrapsGitalyErrors
Methods included from EncodingHelper
#binary_io, #detect_binary?, #detect_encoding, #detect_libgit2_binary?, #encode!, #encode_binary, #encode_utf8, #encode_utf8_no_detect, #encode_utf8_with_escaping!, #encode_utf8_with_replacement_character, #strip_bom, #unquote_path
Methods included from BlobHelper
#_mime_type, #binary_mime_type?, #content_type, #empty?, #encoded_newlines_re, #encoding, #extname, #image?, #known_extension?, #large?, #lines, #mime_type, #ruby_encoding, #text_in_repo?, #viewable?
Constructor Details
#initialize(options) ⇒ Blob
Returns a new instance of Blob.
# File 'lib/gitlab/git/blob.rb', line 122
def initialize(options)
  %w[id name path size data mode commit_id binary].each do |key|
    self.__send__("#{key}=", options[key.to_sym]) # rubocop:disable GitlabSecurity/PublicSend
  end

  # Retain the actual size before it is encoded
  @loaded_size = @data.bytesize if @data
  @loaded_all_data = @loaded_size == size

  # Recalculate binary status if we loaded all data
  @binary = nil if @loaded_all_data

  record_metric_blob_size
  record_metric_truncated(truncated?)
end
Instance Attribute Details
#binary ⇒ Object
Returns the value of attribute binary.
# File 'lib/gitlab/git/blob.rb', line 27
def binary
  @binary
end
#commit_id ⇒ Object
Returns the value of attribute commit_id.
# File 'lib/gitlab/git/blob.rb', line 27
def commit_id
  @commit_id
end
#data ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 142
def data
  encode! @data
end
#id ⇒ Object
Returns the value of attribute id.
# File 'lib/gitlab/git/blob.rb', line 27
def id
  @id
end
#loaded_size ⇒ Object
Returns the value of attribute loaded_size.
# File 'lib/gitlab/git/blob.rb', line 27
def loaded_size
  @loaded_size
end
#mode ⇒ Object
Returns the value of attribute mode.
# File 'lib/gitlab/git/blob.rb', line 27
def mode
  @mode
end
#name ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 162
def name
  encode! @name
end
#path ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 166
def path
  encode! @path
end
#size ⇒ Object
Returns the value of attribute size.
# File 'lib/gitlab/git/blob.rb', line 27
def size
  @size
end
Class Method Details
.batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
Returns an array of Blob instances, specified in blob_references as [[commit_sha, path], [commit_sha, path], …]. If blob_size_limit < 0, the full blob contents are returned. If blob_size_limit >= 0, each blob contains no more than blob_size_limit bytes in its data attribute.
Keep in mind that this method may allocate a lot of memory. It is up to the caller to limit the number of blobs and blob_size_limit.
# File 'lib/gitlab/git/blob.rb', line 92
def batch(repository, blob_references, blob_size_limit: MAX_DATA_DISPLAY_SIZE)
  blob_references.each_slice(BATCH_SIZE).flat_map do |refs|
    repository.gitaly_blob_client.get_blobs(refs, blob_size_limit).to_a
  end
end
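The slicing behaviour of .batch can be illustrated outside GitLab with a minimal sketch. FakeBlobClient and its string results are hypothetical stand-ins for the Gitaly blob client, not the real API; only the each_slice/flat_map pattern and the BATCH_SIZE value are taken from the listing above.

```ruby
# Hypothetical stand-in for the Gitaly blob client (assumption, not the real API).
class FakeBlobClient
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Pretends to fetch one "blob" per [commit_sha, path] reference.
  def get_blobs(refs, _blob_size_limit)
    @calls += 1
    refs.map { |sha, path| "#{sha}:#{path}" }
  end
end

BATCH_SIZE = 250

# Same slicing pattern as .batch, with the client passed in directly.
def batch(client, blob_references, blob_size_limit)
  blob_references.each_slice(BATCH_SIZE).flat_map do |refs|
    client.get_blobs(refs, blob_size_limit).to_a
  end
end

client = FakeBlobClient.new
references = Array.new(600) { |i| ["sha#{i}", "file#{i}.txt"] }
blobs = batch(client, references, 0)

puts blobs.size   # 600 results in total
puts client.calls # 3 simulated calls (250 + 250 + 100 references)
```

Because the results of every slice are concatenated, the caller still holds all blobs in memory at once; slicing only bounds the size of each individual call.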
.batch_lfs_pointers(repository, blob_ids) ⇒ Object
Finds LFS blobs given an array of SHA IDs. Returns an array of Gitlab::Git::Blob. Does not guarantee that blob data will be set.
# File 'lib/gitlab/git/blob.rb', line 107
def batch_lfs_pointers(repository, blob_ids)
  wrapped_gitaly_errors do
    repository.gitaly_blob_client.batch_lfs_pointers(blob_ids.to_a)
  end
end
.batch_metadata(repository, blob_references) ⇒ Object
Returns an array of Blob instances with metadata only; the data attribute has no content.
# File 'lib/gitlab/git/blob.rb', line 100
def batch_metadata(repository, blob_references)
  batch(repository, blob_references, blob_size_limit: 0)
end
.binary?(data, cache_key: nil) ⇒ Boolean
# File 'lib/gitlab/git/blob.rb', line 113
def binary?(data, cache_key: nil)
  EncodingHelper.detect_libgit2_binary?(data, cache_key: cache_key)
end
.find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 48
def find(repository, sha, path, limit: MAX_DATA_DISPLAY_SIZE)
  tree_entry(repository, sha, path, limit)
end
.gitlab_blob_size ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 38
def self.gitlab_blob_size
  @gitlab_blob_size ||= ::Gitlab::Metrics.histogram(
    :gitlab_blob_size,
    'Gitlab::Git::Blob size',
    {},
    [1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000]
  )
end
.gitlab_blob_truncated_false ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 34
def self.gitlab_blob_truncated_false
  @gitlab_blob_truncated_false ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_false, 'blob.truncated? == false')
end
.gitlab_blob_truncated_true ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 30
def self.gitlab_blob_truncated_true
  @gitlab_blob_truncated_true ||= ::Gitlab::Metrics.counter(:gitlab_blob_truncated_true, 'blob.truncated? == true')
end
.raw(repository, sha, limit: MAX_DATA_DISPLAY_SIZE) ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 80
def raw(repository, sha, limit: MAX_DATA_DISPLAY_SIZE)
  repository.gitaly_blob_client.get_blob(oid: sha, limit: limit)
end
.size_could_be_lfs?(size) ⇒ Boolean
# File 'lib/gitlab/git/blob.rb', line 117
def size_could_be_lfs?(size)
  size.between?(LFS_POINTER_MIN_SIZE, LFS_POINTER_MAX_SIZE)
end
.tree_entry(repository, sha, path, limit) ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 52
def tree_entry(repository, sha, path, limit)
  return unless path

  path = path.sub(%r{\A/*}, '')
  path = '/' if path.empty?
  name = File.basename(path)

  # Gitaly will think that setting the limit to 0 means unlimited, while
  # the client might only need the metadata and thus set the limit to 0.
  # In this method we'll then set the limit to 1, but clear the byte of data
  # that we got back so for the outside world it looks like the limit was
  # actually 0.
  req_limit = limit == 0 ? 1 : limit

  entry = Gitlab::GitalyClient::CommitService.new(repository).tree_entry(sha, path, req_limit)
  return unless entry

  entry.data = "" if limit == 0

  case entry.type
  when :COMMIT
    new(id: entry.oid, name: name, size: 0, data: '', path: path, commit_id: sha)
  when :BLOB
    new(id: entry.oid, name: name, size: entry.size, data: entry.data.dup, mode: entry.mode.to_s(8),
      path: path, commit_id: sha, binary: binary?(entry.data))
  end
end
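The path normalization and the zero-limit workaround in .tree_entry can be tried in isolation. The sketch below copies those two expressions into standalone helpers; the helper names normalize_path, request_limit, and visible_data are hypothetical and exist only for illustration, and no Gitaly call is made.

```ruby
# Strips any leading slashes; an empty result is treated as the root path.
def normalize_path(path)
  path = path.sub(%r{\A/*}, '')
  path = '/' if path.empty?
  path
end

# A limit of 0 would be read as "unlimited", so one byte is requested instead.
def request_limit(limit)
  limit == 0 ? 1 : limit
end

# The extra byte is discarded so the caller sees no data for a limit of 0.
def visible_data(data, limit)
  limit == 0 ? '' : data
end

path = normalize_path('//app/models/user.rb')
puts path                 # app/models/user.rb
puts File.basename(path)  # user.rb
puts normalize_path('/')  # /
puts request_limit(0)     # 1
puts request_limit(-1)    # -1
puts visible_data('x', 0).empty? # true
```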
Instance Method Details
#binary_in_repo? ⇒ Boolean
# File 'lib/gitlab/git/blob.rb', line 138
def binary_in_repo?
  @binary.nil? ? super : @binary == true
end
#external_storage ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 203
def external_storage
  return unless lfs_pointer?

  :lfs
end
#lfs_oid ⇒ Object
# File 'lib/gitlab/git/blob.rb', line 185
def lfs_oid
  if has_lfs_version_key?
    oid = data.match(/(?<=sha256:)([0-9a-f]{64})/)
    return oid[1] if oid
  end

  nil
end
#lfs_pointer? ⇒ Boolean
A valid LFS object pointer is a text file consisting of version, oid, and size. See github.com/github/git-lfs/blob/v1.1.0/docs/spec.md#the-pointer.
# File 'lib/gitlab/git/blob.rb', line 181
def lfs_pointer?
  self.class.size_could_be_lfs?(size) && has_lfs_version_key? && lfs_oid.present? && lfs_size.present?
end
#lfs_size ⇒ Object Also known as: external_size
# File 'lib/gitlab/git/blob.rb', line 194
def lfs_size
  if has_lfs_version_key?
    size = data.match(/(?<=size )([0-9]+)/)
    return size[1].to_i if size
  end

  nil
end
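The LFS pointer checks (#lfs_pointer?, #lfs_oid, #lfs_size, and .size_could_be_lfs?) can be sketched in plain Ruby as free functions that take the blob text directly. The two regular expressions and the size bounds come from the listings above. The body of has_lfs_version_key? is not shown on this page, so lfs_version_key? below is an assumption (a simple prefix check), and the sample pointer uses a made-up oid.

```ruby
LFS_POINTER_MIN_SIZE = 120 # bytes
LFS_POINTER_MAX_SIZE = 200 # bytes

def size_could_be_lfs?(size)
  size.between?(LFS_POINTER_MIN_SIZE, LFS_POINTER_MAX_SIZE)
end

# Assumption: the real has_lfs_version_key? is not shown in this document.
def lfs_version_key?(data)
  data.start_with?('version https://git-lfs.github.com/spec')
end

# Same regular expression as #lfs_oid.
def lfs_oid(data)
  match = data.match(/(?<=sha256:)([0-9a-f]{64})/)
  match && match[1]
end

# Same regular expression as #lfs_size.
def lfs_size(data)
  match = data.match(/(?<=size )([0-9]+)/)
  match && match[1].to_i
end

def lfs_pointer?(data)
  size_could_be_lfs?(data.bytesize) && lfs_version_key?(data) &&
    !lfs_oid(data).nil? && !lfs_size(data).nil?
end

fake_oid = 'ab' * 32 # made-up 64-character hex digest
pointer = "version https://git-lfs.github.com/spec/v1\n" \
          "oid sha256:#{fake_oid}\n" \
          "size 12345\n"

puts lfs_pointer?(pointer)          # true
puts lfs_size(pointer)              # 12345
puts lfs_pointer?("just some text") # false
```

The size check runs first so that blobs far too small or too large to be pointers are rejected without any pattern matching.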
#load_all_data!(repository) ⇒ Object
Load all blob data (not just the first MAX_DATA_DISPLAY_SIZE bytes) into memory as a Ruby string.
# File 'lib/gitlab/git/blob.rb', line 148
def load_all_data!(repository)
  return if @data == '' # don't mess with submodule blobs

  # Even if we return early, recalculate whether this blob is binary in
  # case a blob was initialized as text but the full data isn't
  @binary = nil

  return if @loaded_all_data

  @data = repository.gitaly_blob_client.get_blob(oid: id, limit: -1).data
  @loaded_all_data = true
  @loaded_size = @data.bytesize
end
#truncated? ⇒ Boolean
# File 'lib/gitlab/git/blob.rb', line 170
def truncated?
  return false unless size && loaded_size

  size > loaded_size
end
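The relationship between size, loaded_size, and #truncated? can be shown with a small stand-in object. BlobSketch is a hypothetical Struct used only for illustration; it reuses the predicate above but is not the real Blob class.

```ruby
# Hypothetical stand-in holding only the two attributes the predicate reads.
BlobSketch = Struct.new(:size, :loaded_size) do
  def truncated?
    return false unless size && loaded_size

    size > loaded_size
  end
end

full    = BlobSketch.new(10, 10)  # all bytes loaded
partial = BlobSketch.new(20, 10)  # only part of the blob loaded
unknown = BlobSketch.new(nil, 10) # size not known

puts full.truncated?    # false
puts partial.truncated? # true
puts unknown.truncated? # false
```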