Module: Linguist::BlobHelper
- Included in:
- FileBlob
- Defined in:
- lib/linguist/blob_helper.rb
Overview
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
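As a rough illustration of the interface described above, the sketch below defines an in-memory object that responds to name, data and size. MemoryBlob is a hypothetical class used only for illustration and is not part of Linguist; with the gem loaded, such a class would add `include Linguist::BlobHelper`.

```ruby
# Hypothetical in-memory blob; not part of Linguist.
# It exposes the three readers the mixin expects: name, data and size.
class MemoryBlob
  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data
  end

  # Size in bytes, similar to what a stored blob would report.
  def size
    data.bytesize
  end
end

blob = MemoryBlob.new('foo.rb', "puts 'hi'\n")
```

Some helpers documented below (for example #shebang_extname?) also call `mode`, so a complete host class would need that reader as well.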
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
Instance Method Summary
- #average_line_length ⇒ Object
  Internal: Compute average line length.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Public: Is the blob binary according to its mime type?
- #colorize(options = {}) ⇒ Object
  Public: Highlight syntax of blob.
- #colorize_without_wrapper(options = {}) ⇒ Object
  Public: Highlight syntax of blob without the outer highlight div wrapper.
- #compiled_coffeescript? ⇒ Boolean
  Internal: Is the blob of JS generated by CoffeeScript?
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disambiguate_extension_language ⇒ Object
  Internal: Disambiguates between multiple language extensions.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #generated_net_docfile? ⇒ Boolean
  Internal: Is this a generated documentation file for a .NET assembly?
- #generated_parser? ⇒ Boolean
  Internal: Is the blob of JS a parser generated by PEG.js?
- #guess_language ⇒ Object
  Internal: Guess language.
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #indexable? ⇒ Boolean
  Public: Should the blob be indexed for searching?
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #lexer ⇒ Object
  Internal: Get the lexer of the blob.
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get number of lines of code.
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #minified_javascript? ⇒ Boolean
  Internal: Is the blob minified JS?
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #shebang_extname? ⇒ Boolean
  Public: Is the blob likely to have a shebang?
- #shebang_language ⇒ Object
  Internal: Get Language for shebang script.
- #shebang_script ⇒ Object
  Internal: Extract the script name from the shebang line.
- #sloc ⇒ Object
  Public: Get number of source lines of code.
- #text? ⇒ Boolean
  Public: Is the blob text?
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
- #xcode_project_file? ⇒ Boolean
  Internal: Is the blob an XCode project file?
Instance Method Details
#average_line_length ⇒ Object
Internal: Compute average line length.
Returns Integer.
# File 'lib/linguist/blob_helper.rb', line 227
def average_line_length
  if lines.any?
    lines.inject(0) { |n, l| n += l.length } / lines.length
  else
    0
  end
end
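The calculation above can be tried outside the mixin with a small standalone sketch. The helper name average_length is hypothetical; the point it illustrates is that the result uses integer division, so it is truncated rather than rounded.

```ruby
# Standalone sketch of the same calculation; `average_length` is a
# hypothetical helper name, not a Linguist method.
def average_length(lines)
  return 0 if lines.empty?

  # Integer division: the result is truncated, not rounded.
  lines.sum(&:length) / lines.length
end

average = average_length(['abcd', 'ab', ''])
empty_average = average_length([])
```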
#binary? ⇒ Boolean
Public: Is the blob binary?
Return true or false
# File 'lib/linguist/blob_helper.rb', line 99
def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says its binary
  else
    detect_encoding[:type] == :binary
  end
end
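The order of the checks matters here, so the following sketch restates it as a plain function. It assumes a detection Hash of the shape documented for #detect_encoding (:encoding, :confidence, :type); the function name binary_data? and the sample hashes are made up for illustration and no real detector is called.

```ruby
# Sketch of the decision order only. `detection` stands in for the Hash
# (or nil) that an encoding detector would return.
def binary_data?(data, detection)
  return true if data.nil?   # data was never loaded
  return false if data == '' # blank files count as text
  return true if detection.nil? || detection[:encoding].nil?

  detection[:type] == :binary
end

text_result   = binary_data?('abc', { encoding: 'UTF-8', confidence: 100, type: :text })
blank_result  = binary_data?('', nil)
absent_result = binary_data?(nil, nil)
```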
#binary_mime_type? ⇒ Boolean
Public: Is the blob binary according to its mime type
Return true or false
# File 'lib/linguist/blob_helper.rb', line 90
def binary_mime_type?
  if mime_type = Mime.lookup_mime_type_for(extname)
    mime_type.binary?
  end
end
#colorize(options = {}) ⇒ Object
Public: Highlight syntax of blob
options - A Hash of options (defaults to {})
Returns html String
# File 'lib/linguist/blob_helper.rb', line 515
def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end
#colorize_without_wrapper(options = {}) ⇒ Object
Public: Highlight syntax of blob without the outer highlight div wrapper.
options - A Hash of options (defaults to {})
Returns html String
# File 'lib/linguist/blob_helper.rb', line 528
def colorize_without_wrapper(options = {})
  if text = colorize(options)
    text[%r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m, 1]
  else
    ''
  end
end
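The wrapper is stripped with String#[] and a capture group. The sketch below applies the same regular expression to a made-up HTML string, so the markup shown is an assumption about what the highlighter emits rather than real output.

```ruby
# Sample markup in the shape the regular expression expects; the HTML
# here is made up for illustration.
html = %(<div class="highlight"><pre><span class="nb">puts</span> 1\n</pre>\n</div>)

WRAPPER = %r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m

# String#[] with a Regexp and a capture index returns that capture,
# or nil when the pattern does not match.
inner   = html[WRAPPER, 1]
missing = 'plain text'[WRAPPER, 1]
```

Note that a string without the expected wrapper yields nil here, not an empty string.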
#compiled_coffeescript? ⇒ Boolean
Internal: Is the blob of JS generated by CoffeeScript?
Requires Blob#data
CoffeeScript is designed to emit JS that is hard to distinguish from hand-written code, so this check looks for a number of patterns output by the CS compiler.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 310
def compiled_coffeescript?
  return false unless extname == '.js'

  # CoffeeScript generated by > 1.2 include a comment on the first line
  if lines[0] =~ /^\/\/ Generated by /
    return true
  end

  if lines[0] == '(function() {' &&     # First line is module closure opening
      lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
      lines[-1] == ''                   # Last line is blank

    score = 0

    lines.each do |line|
      if line =~ /var /
        # Underscored temp vars are likely to be Coffee
        score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

        # bind and extend functions are very Coffee specific
        score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
      end
    end

    # Require a score of 3. This is fairly arbitrary. Consider
    # tweaking later.
    score >= 3
  else
    false
  end
end
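To make the scoring step easier to follow, here is a minimal sketch of it in isolation. It uses String#scan in place of the gsub enumerator, which counts the same matches; the name coffee_score and the sample lines are hypothetical, and the closure checks on the first and last lines are left out.

```ruby
# Sketch of the scoring step only; `coffee_score` is a hypothetical name.
TEMP_VARS = /_fn|_i|_len|_ref|_results/
HELPERS   = /__bind|__extends|__hasProp|__indexOf|__slice/

def coffee_score(lines)
  lines.sum do |line|
    # Only lines containing a `var ` declaration contribute to the score.
    next 0 unless line =~ /var /

    # Temp vars weigh 1 each, compiler helper functions weigh 3 each.
    line.scan(TEMP_VARS).size + 3 * line.scan(HELPERS).size
  end
end

score = coffee_score([
  '  var _i, _len, _results;',
  '  var __slice = [].slice;',
  '  return x;'
])
```

In this sample the first line contributes 3 (three temp vars) and the second contributes 3 (one helper), so the total clears the threshold of 3 used above.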
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.
# File 'lib/linguist/blob_helper.rb', line 49
def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
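The nested ternary can be read as three cases. The sketch below spells them out with plain arguments in place of the blob's own predicates; content_type_for and its keyword names are hypothetical.

```ruby
# Sketch of the three cases; the keyword arguments stand in for the
# blob's own binary checks, mime type and detected encoding.
def content_type_for(binary:, mime_type:, encoding:)
  # Binary blobs are served with their own mime type.
  return mime_type if binary

  # Text blobs are always served as text/plain, with a charset if known.
  encoding ? "text/plain; charset=#{encoding.downcase}" : 'text/plain'
end

with_charset    = content_type_for(binary: false, mime_type: 'text/html', encoding: 'UTF-8')
without_charset = content_type_for(binary: false, mime_type: 'text/html', encoding: nil)
binary_type     = content_type_for(binary: true, mime_type: 'application/octet-stream', encoding: nil)
```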
#detect_encoding ⇒ Object
Try to guess the encoding
Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found.
# File 'lib/linguist/blob_helper.rb', line 83
def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
#disambiguate_extension_language ⇒ Object
Internal: Disambiguates between multiple language extensions.
Returns a Language or nil.
# File 'lib/linguist/blob_helper.rb', line 433
def disambiguate_extension_language
  if Language.ambiguous?(extname)
    possible_languages = Language.all.select { |l| l.extensions.include?(extname) }.map(&:name)
    if possible_languages.any?
      if result = Classifier.classify(Samples::DATA, data, possible_languages).first
        Language[result[0]]
      end
    end
  end
end
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.
# File 'lib/linguist/blob_helper.rb', line 62
def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end
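A standalone sketch of the same branching follows. It swaps in ERB::Util.url_encode from the standard library as a stand-in for EscapeUtils.escape_url, so the exact escaping of special characters may differ from the real method; disposition_for and its inline flag are hypothetical.

```ruby
require 'erb'

# `inline` stands in for the `text? || image?` check on the blob.
def disposition_for(name, inline:)
  return 'inline' if inline
  return 'attachment' if name.nil?

  # Only the base name is exposed, never the directory part of the path.
  "attachment; filename=#{ERB::Util.url_encode(File.basename(name))}"
end

archive = disposition_for('docs/file.tar', inline: false)
```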
#encoding ⇒ Object
# File 'lib/linguist/blob_helper.rb', line 72
def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
# File 'lib/linguist/blob_helper.rb', line 23
def extname
  File.extname(name)
end
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
Requires Blob#data
Includes:
- XCode project XML files
- Minified JavaScript
- Compiled CoffeeScript
- PEG.js-generated parsers
- Generated .NET docfiles
- Gemfile.lock
Please add additional test coverage to 'test/test_blob.rb#test_generated' if you make any changes.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 252
def generated?
  if name == 'Gemfile.lock' || minified_javascript? || compiled_coffeescript? ||
  xcode_project_file? || generated_net_docfile? || generated_parser?
    true
  else
    false
  end
end
#generated_net_docfile? ⇒ Boolean
Internal: Is this a generated documentation file for a .NET assembly?
Requires Blob#data
.NET developers often check in the XML Intellisense file along with an assembly - however, these don’t have a special extension, so we have to dig into the contents to determine if it’s a docfile. Luckily, these files are extremely structured, so recognizing them is easy.
Returns true or false
# File 'lib/linguist/blob_helper.rb', line 352
def generated_net_docfile?
  return false unless extname.downcase == ".xml"
  return false unless lines.count > 3

  # .NET Docfiles always open with <doc> and their first tag is an
  # <assembly> tag
  return lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end
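The structural check can be exercised on a small sample. The sketch below reimplements it as a free function over a filename and text; net_docfile? and the XML sample are hypothetical. It relies on the file ending with a newline, so that the second-to-last element of the split is the closing tag.

```ruby
# Standalone sketch; takes the filename and raw text instead of a blob.
def net_docfile?(filename, text)
  return false unless File.extname(filename).downcase == '.xml'

  lines = text.split("\n", -1)
  return false unless lines.count > 3

  # Line 2 opens <doc>, line 3 holds <assembly>, and the last
  # non-empty line closes </doc>.
  lines[1].include?('<doc>') &&
    lines[2].include?('<assembly>') &&
    lines[-2].include?('</doc>')
end

xml = <<~XML
  <?xml version="1.0"?>
  <doc>
    <assembly><name>Example</name></assembly>
    <members></members>
  </doc>
XML

docfile = net_docfile?('Example.XML', xml)
```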
#generated_parser? ⇒ Boolean
Internal: Is the blob of JS a parser generated by PEG.js?
Requires Blob#data
PEG.js-generated parsers are not meant to be consumed by humans.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 289
def generated_parser?
  return false unless extname == '.js'

  # PEG.js-generated parsers include a comment near the top of the file
  # that marks them as such.
  if lines[0..4].join('') =~ /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/
    return true
  end

  false
end
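The regular expression only accepts the marker inside a block comment that opens within the first five lines. The sketch below applies the same pattern to two made-up headers; peg_header? is a hypothetical name.

```ruby
PEG_HEADER = /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/

# The first five lines are joined without separators before matching.
def peg_header?(lines)
  !!(lines[0..4].join('') =~ PEG_HEADER)
end

block_comment = peg_header?(['/*', ' * Generated by PEG.js 0.7.0.', ' */', 'module.exports = {};'])
line_comment  = peg_header?(['// Generated by PEG.js', 'module.exports = {};'])
```

A marker in a `//` line comment does not match, because the pattern requires an opening `/*` before the text.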
#guess_language ⇒ Object
Internal: Guess language
Please add additional test coverage to 'test/test_blob.rb#test_language' if you make any changes.
Returns a Language or nil
# File 'lib/linguist/blob_helper.rb', line 410
def guess_language
  return if binary_mime_type?

  # Disambiguate between multiple language extensions
  disambiguate_extension_language ||

    # See if there is a Language for the extension
    Language.find_by_filename(name) ||

    # Try to detect Language from shebang line
    shebang_language
end
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines?
These types of files are usually going to make Pygments.rb angry if we try to colorize them.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 168
def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
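In other words, the check compares the average number of bytes per line against 5000, using integer division. A minimal sketch with plain numbers, where long_lines? is a hypothetical name:

```ruby
# size is the blob size in bytes, loc the number of lines.
def long_lines?(size, loc)
  return false if loc == 0 # avoid dividing by zero for empty blobs

  size / loc > 5000
end

over_limit = long_lines?(60_000, 10) # 6000 bytes per line on average
at_limit   = long_lines?(50_000, 10) # exactly 5000 is not "greater than"
```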
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Return true or false
# File 'lib/linguist/blob_helper.rb', line 128
def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname)
end
#indexable? ⇒ Boolean
Public: Should the blob be indexed for searching?
Excluded:
- Files over 0.1MB
- Non-text files
- Languages marked as not searchable
- Generated source files
Please add additional test coverage to 'test/test_blob.rb#test_indexable' if you make any changes.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 375
def indexable?
  if binary?
    false
  elsif language.nil?
    false
  elsif !language.searchable?
    false
  elsif generated?
    false
  elsif size > 100 * 1024
    false
  else
    true
  end
end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
# File 'lib/linguist/blob_helper.rb', line 396
def language
  if defined? @language
    @language
  else
    @language = guess_language
  end
end
#large? ⇒ Boolean
Public: Is the blob too big to load?
Return true or false
# File 'lib/linguist/blob_helper.rb', line 146
def large?
  size.to_i > MEGABYTE
end
#lexer ⇒ Object
Internal: Get the lexer of the blob.
Returns a Lexer.
# File 'lib/linguist/blob_helper.rb', line 426
def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines
# File 'lib/linguist/blob_helper.rb', line 202
def lines
  @lines ||= (viewable? && data) ? data.split("\n", -1) : []
end
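The negative limit passed to split is what makes a trailing newline show up as a final empty element, which several of the generated-file checks on this page depend on. A short sketch on a made-up string, including how the line counts (#loc and #sloc) follow from it:

```ruby
data = "a = 1\n\nb = 2\n"

with_limit    = data.split("\n", -1) # keeps the trailing empty string
without_limit = data.split("\n")     # drops trailing empty strings

# Counting as #loc and #sloc do: all lines vs. lines with a non-space char.
loc  = with_limit.size
sloc = with_limit.grep(/\S/).size
```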
#loc ⇒ Object
Public: Get number of lines of code
Requires Blob#data
Returns Integer
# File 'lib/linguist/blob_helper.rb', line 211
def loc
  lines.size
end
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.
# File 'lib/linguist/blob_helper.rb', line 35
def mime_type
  @mime_type ||= Mime.mime_for(extname)
end
#minified_javascript? ⇒ Boolean
Internal: Is the blob minified JS?
Consider JS minified if the average line length is greater than 100 characters.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 277
def minified_javascript?
  return unless extname == '.js'
  average_line_length > 100
end
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
We use Pygments.rb for syntax highlighting blobs, which has some quirks and is essentially ‘un-killable’ via a normal timeout. To work around this, we try to avoid handing Pygments.rb anything it can’t handle.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 158
def safe_to_colorize?
  text? && !large? && !high_ratio_of_long_lines?
end
#shebang_extname? ⇒ Boolean
Public: Is the blob likely to have a shebang?
Return true or false
# File 'lib/linguist/blob_helper.rb', line 135
def shebang_extname?
  extname.empty? &&
    mode &&
    (mode.to_i(8) & 05) == 05
end
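The bit test above parses the mode as an octal string and requires both the read and execute bits for "others" (octal 5). The sketch below applies the same test to plain arguments; executable_without_extension? is a hypothetical name, and the mode strings are typical Git file modes used as sample input.

```ruby
# name is a path, mode an octal permission string such as '100755', or nil.
def executable_without_extension?(name, mode)
  File.extname(name).empty? &&
    !mode.nil? &&
    (mode.to_i(8) & 05) == 05 # read + execute bits for "others"
end

executable = executable_without_extension?('script', '100755')
regular    = executable_without_extension?('script', '100644')
```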
#shebang_language ⇒ Object
Internal: Get Language for shebang script
Returns the Language or nil
# File 'lib/linguist/blob_helper.rb', line 501
def shebang_language
  # Skip file extensions unlikely to have shebangs
  return unless shebang_extname?

  if script = shebang_script
    Language[script]
  end
end
#shebang_script ⇒ Object
Internal: Extract the script name from the shebang line
Requires Blob#data
Examples
'#!/usr/bin/ruby'
# => 'ruby'
'#!/usr/bin/env ruby'
# => 'ruby'
'#!/usr/bash/python2.4'
# => 'python'
Please add additional test coverage to 'test/test_blob.rb#test_shebang_script' if you make any changes.
Returns a script name String or nil
# File 'lib/linguist/blob_helper.rb', line 463
def shebang_script
  # Fail fast if blob isn't viewable?
  return unless viewable?

  if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
    bang.sub!(/^#! /, '#!')
    tokens = bang.split(' ')
    pieces = tokens.first.split('/')
    if pieces.size > 1
      script = pieces.last
    else
      script = pieces.first.sub('#!', '')
    end

    script = script == 'env' ? tokens[1] : script

    # python2.4 => python
    if script =~ /((?:\d+\.?)+)/
      script.sub! $1, ''
    end

    # Check for multiline shebang hacks that exec themselves
    #
    #   #!/bin/sh
    #   exec foo "$0" "$@"
    #
    if script == 'sh' &&
        lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
      script = $1
    end

    script
  end
end
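The core of the extraction can be tried on single lines with the simplified sketch below. It is not the method above: script_from_shebang is a hypothetical name, it works on one String rather than a blob, it does not mutate its input, and it leaves out the multiline `exec` check.

```ruby
# Simplified sketch: extract the interpreter name from one shebang line.
def script_from_shebang(first_line)
  return unless first_line =~ /^#!/

  # Tolerate a space after the "#!" marker.
  bang = first_line.sub(/^#! /, '#!')
  tokens = bang.split(' ')

  # Take the last path component and drop a leading "#!" if there was no path.
  script = tokens.first.split('/').last.sub('#!', '')

  # "#!/usr/bin/env ruby" names the interpreter in the next token.
  script = tokens[1] if script == 'env'
  return if script.nil?

  # Strip version numbers: python2.4 => python
  script.sub(/(?:\d+\.?)+/, '')
end

direct    = script_from_shebang('#!/usr/bin/ruby')
via_env   = script_from_shebang('#!/usr/bin/env ruby')
versioned = script_from_shebang('#!/usr/bash/python2.4')
```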
#sloc ⇒ Object
Public: Get number of source lines of code
Requires Blob#data
Returns Integer
# File 'lib/linguist/blob_helper.rb', line 220
def sloc
  lines.grep(/\S/).size
end
#text? ⇒ Boolean
Public: Is the blob text?
Return true or false
# File 'lib/linguist/blob_helper.rb', line 121
def text?
  !binary?
end
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Return true or false
# File 'lib/linguist/blob_helper.rb', line 193
def vendored?
  name =~ VendoredRegexp ? true : false
end
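VendoredRegexp is built by joining a list of path patterns with `|`, so a name is vendored if any one pattern matches. The sketch below shows the same construction with a few hypothetical patterns; the real list comes from vendor.yml and is not reproduced here.

```ruby
# Hypothetical patterns for illustration; the real list lives in vendor.yml.
sample_paths = ['^vendor/', 'node_modules/', '\.min\.js$']

# Joining with "|" yields one alternation that matches any of the patterns.
VENDORED_SKETCH = Regexp.new(sample_paths.join('|'))

def vendored_path?(name)
  name =~ VENDORED_SKETCH ? true : false
end

in_vendor = vendored_path?('vendor/rails/init.rb')
in_lib    = vendored_path?('lib/linguist.rb')
```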
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link
Return true or false
# File 'lib/linguist/blob_helper.rb', line 178
def viewable?
  !large? && text?
end
#xcode_project_file? ⇒ Boolean
Internal: Is the blob an XCode project file?
Generated if the file extension is an XCode project file extension.
Returns true or false.
# File 'lib/linguist/blob_helper.rb', line 267
def xcode_project_file?
  ['.xib', '.nib', '.storyboard', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end