Module: Linguist::BlobHelper
Overview
DEPRECATED: Avoid mixing this module into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`. Functions are much easier to cache and compose.
Avoid adding additional bloat to this module.
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
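As a minimal sketch of what "Blobish" means here, the example below defines a class that responds to name, data and size and mixes in a small helper module. `MiniBlobHelper` and `FakeBlob` are hypothetical names used for illustration only; `MiniBlobHelper` copies three of the methods documented on this page so that the example runs without the linguist gem installed. Note that some helpers on this page (`vendored?`, `documentation?`, `generated?`) also call `path`, which this sketch does not provide.

```ruby
# Hypothetical stand-in that copies a few BlobHelper methods, so the
# example is runnable without the linguist gem.
module MiniBlobHelper
  MEGABYTE = 1024 * 1024

  def extname
    File.extname(name.to_s)
  end

  def large?
    size.to_i > MEGABYTE
  end

  def empty?
    data.nil? || data == ""
  end
end

# A minimal "Blobish" class: it responds to name, data and size.
FakeBlob = Struct.new(:name, :data) do
  include MiniBlobHelper

  def size
    data.to_s.bytesize
  end
end

blob = FakeBlob.new("lib/foo.rb", "puts 'hi'\n")
blob.extname # => ".rb"
blob.large?  # => false
blob.empty?  # => false
```

The mixin only relies on duck typing, so any object with these three readers can include it.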
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
- DocumentationRegexp = Regexp.new(documentation_paths.join('|'))
- DETECTABLE_TYPES = [:programming, :markup].freeze
Instance Method Summary
- #_mime_type ⇒ Object
  Internal: Look up the mime type for the filename.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Internal: Is the blob binary according to its mime type?
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #csv? ⇒ Boolean
  Public: Is this blob a CSV file?
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #documentation? ⇒ Boolean
  Public: Is the blob in a documentation directory?
- #empty? ⇒ Boolean
  Public: Is the blob empty?
- #encoded_newlines_re ⇒ Object
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #first_lines(n) ⇒ Object
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #include_in_language_stats? ⇒ Boolean
  Internal: Should this blob be included in repository language statistics?
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #last_lines(n) ⇒ Object
- #likely_binary? ⇒ Boolean
  Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database?
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get number of lines of code.
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #pdf? ⇒ Boolean
  Public: Is the blob a PDF?
- #ruby_encoding ⇒ Object
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #sloc ⇒ Object
  Public: Get number of source lines of code.
- #solid? ⇒ Boolean
  Public: Is the blob a supported 3D model format?
- #text? ⇒ Boolean
  Public: Is the blob text?
- #tm_scope ⇒ Object
  Internal: Get the TextMate compatible scope for the blob.
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
Instance Method Details
#_mime_type ⇒ Object
Internal: Look up the mime type for the filename.
Returns the result of MiniMime.lookup_by_filename, or nil if the filename has no known type.
# File 'lib/linguist/blob_helper.rb', line 32

    def _mime_type
      if defined? @_mime_type
        @_mime_type
      else
        @_mime_type = MiniMime.lookup_by_filename(name.to_s)
      end
    end
#binary? ⇒ Boolean
Public: Is the blob binary?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 125

    def binary?
      # Large blobs aren't even loaded into memory
      if data.nil?
        true

      # Treat blank files as text
      elsif data == ""
        false

      # Charlock doesn't know what to think
      elsif encoding.nil?
        true

      # If Charlock says its binary
      else
        detect_encoding[:type] == :binary
      end
    end
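The order of the checks in `binary?` matters, so here is a minimal standalone sketch of the same decision chain. `binary_from_detection?` is a hypothetical helper, and the `detection` argument stands in for the Hash (or nil) that `detect_encoding` returns; the hashes passed below are made-up sample values, not real detector output.

```ruby
# Hypothetical standalone version of the decision chain in binary?.
# `detection` stands in for the result of detect_encoding (a Hash or nil).
def binary_from_detection?(data, detection)
  return true  if data.nil?   # blob was never loaded (e.g. too large)
  return false if data == ""  # blank files count as text
  # No usable encoding guess: treat as binary.
  return true  if detection.nil? || detection[:encoding].nil?
  detection[:type] == :binary
end

binary_from_detection?(nil, nil)                                    # => true
binary_from_detection?("", nil)                                     # => false
binary_from_detection?("abc", { type: :text, encoding: "UTF-8" })   # => false
binary_from_detection?("\x00\x01", { type: :binary })               # => true
```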
#binary_mime_type? ⇒ Boolean
Internal: Is the blob binary according to its mime type
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 55

    def binary_mime_type?
      _mime_type ? _mime_type.binary? : false
    end
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.
# File 'lib/linguist/blob_helper.rb', line 78

    def content_type
      @content_type ||= (binary_mime_type? || binary?) ? mime_type :
        (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
    end
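The nested ternary in `content_type` can be read as a three-way choice. The sketch below spells it out as a standalone function; `content_type_for` and its keyword arguments are hypothetical names, with `binary` standing in for `binary_mime_type? || binary?`.

```ruby
# Hypothetical standalone version of the choice made by content_type.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type                                   # serve binary blobs as-is
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"  # text with a known charset
  else
    "text/plain"                                # text, charset unknown
  end
end

content_type_for(binary: false, mime_type: "text/html", encoding: "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
# => "application/octet-stream"
```

Text blobs are always served as text/plain regardless of their detected mime type, which matches the examples above.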
#csv? ⇒ Boolean
Public: Is this blob a CSV file?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 175

    def csv?
      text? && extname.downcase == '.csv'
    end
#detect_encoding ⇒ Object
Try to guess the encoding
Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found.
# File 'lib/linguist/blob_helper.rb', line 118

    def detect_encoding
      @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
    end
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.
# File 'lib/linguist/blob_helper.rb', line 91

    def disposition
      if text? || image?
        'inline'
      elsif name.nil?
        "attachment"
      else
        "attachment; filename=#{CGI.escape(name)}"
      end
    end
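To show how the filename ends up in the header value, here is a small standalone sketch of the same branching. `disposition_for` and its `inline:` keyword are hypothetical; `inline` stands in for `text? || image?`. It assumes the standard `cgi` library is available.

```ruby
require "cgi"

# Hypothetical standalone version of the branching in disposition.
def disposition_for(name, inline:)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    # CGI.escape form-encodes the name, so a space becomes "+".
    "attachment; filename=#{CGI.escape(name)}"
  end
end

disposition_for("README.md", inline: true)     # => "inline"
disposition_for("file.tar", inline: false)     # => "attachment; filename=file.tar"
disposition_for("my file.tar", inline: false)  # => "attachment; filename=my+file.tar"
```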
#documentation? ⇒ Boolean
Public: Is the blob in a documentation directory?
Documentation files are ignored by language statistics.
See “documentation.yml” for a list of documentation conventions that match this pattern.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 245

    def documentation?
      path =~ DocumentationRegexp ? true : false
    end
#empty? ⇒ Boolean
Public: Is the blob empty?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 147

    def empty?
      data.nil? || data == ""
    end
#encoded_newlines_re ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 287

    def encoded_newlines_re
      @encoded_newlines_re ||= Regexp.union(["\r\n", "\r", "\n"].
                                              map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) })
    end
#encoding ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 101

    def encoding
      if hash = detect_encoding
        hash[:encoding]
      end
    end
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
# File 'lib/linguist/blob_helper.rb', line 25

    def extname
      File.extname(name.to_s)
    end
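Because `extname` delegates to `File.extname` after calling `to_s`, a few edge cases are worth seeing directly. The snippet below only exercises Ruby's standard `File.extname`; the file names are arbitrary examples.

```ruby
# extname is File.extname(name.to_s), so these are the values it would yield.
File.extname("foo.rb")         # => ".rb"
File.extname("archive.tar.gz") # => ".gz" (only the last extension)
File.extname("Makefile")       # => ""    (no extension)
File.extname(".bashrc")        # => ""    (a leading dot is not an extension)
File.extname(nil.to_s)         # => ""    (a nil name is tolerated via to_s)

# Callers such as image? and pdf? downcase the result before comparing.
File.extname("PHOTO.JPG").downcase # => ".jpg"
```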
#first_lines(n) ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 293

    def first_lines(n)
      return lines[0...n] if defined? @lines
      return [] unless viewable? && data

      i, c = 0, 0
      while c < n && j = data.index(encoded_newlines_re, i)
        i = j + $&.length
        c += 1
      end
      data[0...i].split(encoded_newlines_re, -1)
    end
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Blob#data
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 358

    def generated?
      @_generated ||= Generated.generated?(path, lambda { data })
    end
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines? The check compares the average number of bytes per line (size divided by loc) against 5000.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 205

    def high_ratio_of_long_lines?
      return false if loc == 0
      size / loc > 5000
    end
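A short worked example makes the threshold concrete. `long_line_ratio?` is a hypothetical standalone copy of the check that takes `size` and `loc` as arguments; the numbers are made up. Since both values are Integers, the division truncates.

```ruby
# Hypothetical standalone copy of the check, with size and loc passed in.
def long_line_ratio?(size, loc)
  return false if loc == 0  # avoid dividing by zero for blobs with no lines
  size / loc > 5000         # Integer division: average bytes per line
end

long_line_ratio?(600_000, 100) # => true  (6000 bytes per line on average)
long_line_ratio?(10_000, 100)  # => false (100 bytes per line on average)
long_line_ratio?(15_002, 3)    # => false (15002 / 3 truncates to 5000)
long_line_ratio?(1_000_000, 0) # => false
```

A minified one-line file of a few hundred kilobytes would trip this check, which is why `safe_to_colorize?` consults it.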
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 161

    def image?
      ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
    end
#include_in_language_stats? ⇒ Boolean
Internal: Should this blob be included in repository language statistics?
# File 'lib/linguist/blob_helper.rb', line 379

    def include_in_language_stats?
      !vendored? &&
      !documentation? &&
      !generated? &&
      language && ( defined?(detectable?) && !detectable?.nil? ?
        detectable? :
        DETECTABLE_TYPES.include?(language.type)
      )
    end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
# File 'lib/linguist/blob_helper.rb', line 367

    def language
      @language ||= Linguist.detect(self)
    end
#large? ⇒ Boolean
Public: Is the blob too big to load?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 191

    def large?
      size.to_i > MEGABYTE
    end
#last_lines(n) ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 305

    def last_lines(n)
      if defined? @lines
        if n >= @lines.length
          @lines
        else
          lines[-n..-1]
        end
      end

      return [] unless viewable? && data

      no_eol = true
      i, c = data.length, 0
      k = i
      while c < n && j = data.rindex(encoded_newlines_re, i - 1)
        if c == 0 && j + $&.length == i
          no_eol = false
          n += 1
        end
        i = j
        k = j + $&.length
        c += 1
      end
      r = data[k..-1].split(encoded_newlines_re, -1)
      r.pop if !no_eol
      r
    end
#likely_binary? ⇒ Boolean
Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 64

    def likely_binary?
      binary_mime_type? && !Language.find_by_filename(name)
    end
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines
# File 'lib/linguist/blob_helper.rb', line 254

    def lines
      @lines ||=
        if viewable? && data
          # `data` is usually encoded as ASCII-8BIT even when the content has
          # been detected as a different encoding. However, we are not allowed
          # to change the encoding of `data` because we've made the implicit
          # guarantee that each entry in `lines` is encoded the same way as
          # `data`.
          #
          # Instead, we re-encode each possible newline sequence as the
          # detected encoding, then force them back to the encoding of `data`
          # (usually a binary encoding like ASCII-8BIT). This means that the
          # byte sequence will match how newlines are likely encoded in the
          # file, but we don't have to change the encoding of `data` as far as
          # Ruby is concerned. This allows us to correctly parse out each line
          # without changing the encoding of `data`, and
          # also--importantly--without having to duplicate many (potentially
          # large) strings.
          begin
            # `data` is split after having its last `\n` removed by
            # chomp (if any). This prevents the creation of an empty
            # element after the final `\n` character on POSIX files.
            data.chomp.split(encoded_newlines_re, -1)
          rescue Encoding::ConverterNotFoundError
            # The data is not splittable in the detected encoding. Assume it's
            # one big line.
            [data]
          end
        else
          []
        end
    end
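The re-encoding trick described in the comments above can be tried in isolation. The sketch below is an illustration under assumptions, not the library's code path: the detected encoding is hard-coded as UTF-16LE instead of coming from `ruby_encoding`, the sample text is made up, and the final decoding step exists only to make the result readable.

```ruby
detected = "UTF-16LE" # assumption: stands in for ruby_encoding

# Raw bytes as a blob would hold them: UTF-16LE content tagged as binary.
data = "a\nb\r\nc".encode(detected).force_encoding("ASCII-8BIT")

# Re-encode each newline sequence into the detected encoding, then tag the
# resulting bytes with data's own encoding so the two can be matched.
newlines = ["\r\n", "\r", "\n"].map do |nl|
  nl.encode(detected, "ASCII-8BIT").force_encoding(data.encoding)
end
newline_re = Regexp.union(newlines)

# Split without ever changing the encoding of data itself.
raw_lines = data.split(newline_re, -1)

# For display only: decode each piece back to UTF-8.
decoded = raw_lines.map { |l| l.dup.force_encoding(detected).encode("UTF-8") }
decoded # => ["a", "b", "c"]
```

Splitting on a plain "\n" byte here would leave stray NUL bytes at line boundaries, because each UTF-16LE newline is two bytes long.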
#loc ⇒ Object
Public: Get number of lines of code
Requires Blob#data
Returns Integer
# File 'lib/linguist/blob_helper.rb', line 337

    def loc
      lines.size
    end
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.
# File 'lib/linguist/blob_helper.rb', line 48

    def mime_type
      _mime_type ? _mime_type.content_type : 'text/plain'
    end
#pdf? ⇒ Boolean
Public: Is the blob a PDF?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 182

    def pdf?
      extname.downcase == '.pdf'
    end
#ruby_encoding ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 107

    def ruby_encoding
      if hash = detect_encoding
        hash[:ruby_encoding]
      end
    end
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 198

    def safe_to_colorize?
      !large? && text? && !high_ratio_of_long_lines?
    end
#sloc ⇒ Object
Public: Get number of source lines of code
Requires Blob#data
Returns Integer
# File 'lib/linguist/blob_helper.rb', line 346

    def sloc
      lines.grep(/\S/).size
    end
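The difference between `loc` and `sloc` comes down to the `/\S/` filter, which keeps only lines containing at least one non-whitespace character. A quick illustration on a made-up array of lines (the variable names are local to this sketch):

```ruby
# Sample lines, as the lines method might return them for a small file.
sample_lines = ["def foo", "  42", "", "   ", "end"]

loc  = sample_lines.size             # every line counts
sloc = sample_lines.grep(/\S/).size  # blank and whitespace-only lines are skipped

loc  # => 5
sloc # => 3
```

Comment-only lines still count toward `sloc`, since the filter looks at whitespace only.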
#solid? ⇒ Boolean
Public: Is the blob a supported 3D model format?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 168

    def solid?
      extname.downcase == '.stl'
    end
#text? ⇒ Boolean
Public: Is the blob text?
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 154

    def text?
      !binary?
    end
#tm_scope ⇒ Object
Internal: Get the TextMate compatible scope for the blob
# File 'lib/linguist/blob_helper.rb', line 372

    def tm_scope
      language && language.tm_scope
    end
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 230

    def vendored?
      path =~ VendoredRegexp ? true : false
    end
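Both `vendored?` and `documentation?` match the blob path against a single regexp built by joining a list of patterns with `|`. The sketch below shows that construction with two made-up patterns; `SAMPLE_PATHS`, `SampleRegexp` and `sample_vendored?` are hypothetical names, and the patterns are illustrative only, not the actual contents of vendor.yml.

```ruby
# Hypothetical pattern list; the real one is loaded from vendor.yml.
SAMPLE_PATHS = ['(^|/)vendor/', '(^|/)node_modules/'].freeze

# Same construction as VendoredRegexp: one alternation over all patterns.
SampleRegexp = Regexp.new(SAMPLE_PATHS.join('|'))

def sample_vendored?(path)
  # =~ returns an index or nil; the ternary normalizes that to true/false.
  path =~ SampleRegexp ? true : false
end

sample_vendored?("vendor/jquery.js")      # => true
sample_vendored?("app/node_modules/x.js") # => true
sample_vendored?("lib/vendorize.rb")      # => false
```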
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 215

    def viewable?
      !large? && text?
    end