Module: Linguist::BlobHelper

Included in:
FileBlob
Defined in:
lib/linguist/blob_helper.rb

Overview

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
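To illustrate the mixin contract, here is a minimal sketch. `ToyBlob` and `ToyHelper` are hypothetical names, not part of Linguist. `ToyHelper` copies the `extname` and `large?` bodies documented below so the example runs without the library.

```ruby
# Hypothetical stand-in for Linguist::BlobHelper, reusing two of its
# method bodies. Like the real mixin, it only calls name/size on the includer.
module ToyHelper
  MEGABYTE = 1024 * 1024

  def extname
    File.extname(name)
  end

  def large?
    size.to_i > MEGABYTE
  end
end

# Hypothetical "Blobish" class: it only needs to respond to name, data and size.
class ToyBlob
  include ToyHelper

  attr_reader :name, :data, :size

  def initialize(name, data, size)
    @name, @data, @size = name, data, size
  end
end

blob = ToyBlob.new("lib/foo.rb", "puts 1\n", 7)
puts blob.extname # prints ".rb"
puts blob.large?  # prints "false"
```

In Linguist itself, `FileBlob` plays the role of `ToyBlob`.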

Constant Summary

MEGABYTE = 1024 * 1024
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

Instance Method Summary

Instance Method Details

#average_line_length ⇒ Object

Internal: Compute the average line length.

Returns an Integer (0 if the blob has no lines).



# File 'lib/linguist/blob_helper.rb', line 227

def average_line_length
  if lines.any?
    lines.inject(0) { |n, l| n += l.length } / lines.length
  else
    0
  end
end
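Because both operands are Integers, the result is truncated. A standalone sketch over a plain Array of Strings (`average_line_length_of` is a hypothetical helper, not the library method):

```ruby
# Sketch of the average_line_length computation on a plain Array of lines.
# Integer division truncates, e.g. 7 characters over 2 lines gives 3.
def average_line_length_of(lines)
  return 0 if lines.empty?
  lines.sum(&:length) / lines.length
end

average_line_length_of(["abc", "abcd"]) # => 3
average_line_length_of([])              # => 0
```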

#binary? ⇒ Boolean

Public: Is the blob binary?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 99

def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
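The order of the checks matters: missing data and a failed detection both count as binary, while an empty file counts as text. A sketch with the detection result passed in explicitly; `binary_sketch?` is hypothetical, and the Hash only mimics the `:encoding` and `:type` keys that `detect_encoding` is documented to return.

```ruby
# Sketch of binary?'s decision order. `detection` stands in for the Hash
# returned by detect_encoding (or nil when detection failed).
def binary_sketch?(data, detection)
  return true if data.nil?          # large blobs are never loaded
  return false if data == ""        # blank files are treated as text
  encoding = detection && detection[:encoding]
  return true if encoding.nil?      # the detector could not decide
  detection[:type] == :binary
end

binary_sketch?(nil, nil)                                     # => true
binary_sketch?("", nil)                                      # => false
binary_sketch?("hi", { encoding: "UTF-8", type: :text })     # => false
binary_sketch?("\x00\x01", { encoding: nil, type: :binary }) # => true
```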

#binary_mime_type? ⇒ Boolean

Public: Is the blob binary according to its MIME type?

Returns true or false, or nil if no MIME type is registered for the extension.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 90

def binary_mime_type?
  if mime_type = Mime.lookup_mime_type_for(extname)
    mime_type.binary?
  end
end

#colorize(options = {}) ⇒ Object

Public: Highlight the syntax of the blob.

options - A Hash of options (defaults to {})

Returns an HTML String, or nil if the blob is not safe to colorize.



# File 'lib/linguist/blob_helper.rb', line 515

def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end

#colorize_without_wrapper(options = {}) ⇒ Object

Public: Highlight the syntax of the blob without the outer highlight div wrapper.

options - A Hash of options (defaults to {})

Returns an HTML String, or an empty String if the blob is not safe to colorize.



# File 'lib/linguist/blob_helper.rb', line 528

def colorize_without_wrapper(options = {})
  if text = colorize(options)
    text[%r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m, 1]
  else
    ''
  end
end
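The extraction relies on a non-greedy capture with the `m` option, so `.` also matches newlines. A sketch applying the same regexp to a hand-written HTML sample (the sample is illustrative, not actual Pygments output, and `WRAPPER` is a hypothetical constant name):

```ruby
# Same pattern as colorize_without_wrapper; /m lets `.` match newlines.
WRAPPER = %r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m

html  = %(<div class="highlight"><pre><span class="k">def</span> foo\nend\n</pre>\n</div>)
inner = html[WRAPPER, 1]
# inner == %(<span class="k">def</span> foo\nend\n)

"<p>no wrapper</p>"[WRAPPER, 1] # => nil
```

Note that if `colorize` returns HTML without this wrapper, the method returns nil rather than an empty String.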

#compiled_coffeescript? ⇒ Boolean

Internal: Is the blob JavaScript generated by CoffeeScript?

Requires Blob#data

CoffeeScript is meant to output JS that is difficult to tell apart from hand-written code, so look for a number of patterns emitted by the CoffeeScript compiler.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 310

def compiled_coffeescript?
  return false unless extname == '.js'

  # CoffeeScript generated by > 1.2 include a comment on the first line
  if lines[0] =~ /^\/\/ Generated by /
    return true
  end

  if lines[0] == '(function() {' &&     # First line is module closure opening
      lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
      lines[-1] == ''                   # Last line is blank

    score = 0

    lines.each do |line|
      if line =~ /var /
        # Underscored temp vars are likely to be Coffee
        score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

        # bind and extend functions are very Coffee specific
        score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
      end
    end

    # Require a score of 3. This is fairly arbitrary. Consider
    # tweaking later.
    score >= 3
  else
    false
  end
end
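To make the scoring concrete, here is a sketch that runs the same checks over a plain Array of lines. `coffee_score` and `compiled_coffeescript_lines?` are hypothetical helpers; the real method reads `lines` from the blob and first requires a `.js` extension.

```ruby
# Score lines the way compiled_coffeescript? does: only lines containing
# "var " count; 1 point per underscored temp var, 3 per helper function.
def coffee_score(lines)
  lines.sum do |line|
    next 0 unless line =~ /var /
    line.scan(/_fn|_i|_len|_ref|_results/).size +
      3 * line.scan(/__bind|__extends|__hasProp|__indexOf|__slice/).size
  end
end

def compiled_coffeescript_lines?(lines)
  # CoffeeScript > 1.2 marks its output on the first line.
  return true if lines.first.to_s.start_with?("// Generated by ")
  lines[0] == '(function() {' && lines[-2] == '}).call(this);' &&
    lines[-1] == '' && coffee_score(lines) >= 3
end

js = "(function() {\n  var __slice = [].slice;\n  var i, _i, _len, _ref;\n}).call(this);\n"
lines = js.split("\n", -1)
coffee_score(lines)                 # => 6 (3 for __slice, 1 each for _i, _len, _ref)
compiled_coffeescript_lines?(lines) # => true
```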

#content_type ⇒ Object

Public: Get the Content-Type header value.

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.



# File 'lib/linguist/blob_helper.rb', line 49

def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
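Text blobs are always served as `text/plain` (with a charset when one was detected), whatever their real MIME type. A sketch of the decision with its inputs passed in explicitly; `content_type_for` and its keyword arguments are hypothetical, with `binary` standing for `binary_mime_type? || binary?`.

```ruby
# Sketch of content_type without memoization. `binary` corresponds to
# (binary_mime_type? || binary?) in the real method.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

content_type_for(binary: false, mime_type: "text/html", encoding: "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
# => "application/octet-stream"
```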

#detect_encoding ⇒ Object

Try to guess the encoding of the blob's data with CharlockHolmes.

Returns a Hash with :encoding, :confidence and :type keys, or nil if the
blob has no data, an error occurred during detection, or no valid
encoding could be found.


# File 'lib/linguist/blob_helper.rb', line 83

def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end

#disambiguate_extension_language ⇒ Object

Internal: Disambiguate between multiple languages that share the blob's extension, using the Classifier and the bundled samples.

Returns a Language or nil.



# File 'lib/linguist/blob_helper.rb', line 433

def disambiguate_extension_language
  if Language.ambiguous?(extname)
    possible_languages = Language.all.select { |l| l.extensions.include?(extname) }.map(&:name)
    if possible_languages.any?
      if result = Classifier.classify(Samples::DATA, data, possible_languages).first
        Language[result[0]]
      end
    end
  end
end

#disposition ⇒ Object

Public: Get the Content-Disposition header value.

This value is used when serving raw blobs.

Examples

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.



# File 'lib/linguist/blob_helper.rb', line 62

def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
  end
end

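#encoding ⇒ Object

Get the name of the detected character encoding, taken from detect_encoding.

Returns an encoding name String, or nil if detection failed.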

# File 'lib/linguist/blob_helper.rb', line 72

def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/linguist/blob_helper.rb', line 23

def extname
  File.extname(name)
end

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

Requires Blob#data

Includes:

  • Gemfile.lock

  • Xcode project files

  • Minified JavaScript

  • Compiled CoffeeScript

  • .NET XML documentation files

  • PEG.js-generated parsers

Please add additional test coverage to `test/test_blob.rb#test_generated` if you make any changes.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 252

def generated?
  if name == 'Gemfile.lock' || minified_javascript? || compiled_coffeescript? ||
  xcode_project_file? || generated_net_docfile? || generated_parser?
    true
  else
    false
  end
end

#generated_net_docfile? ⇒ Boolean

Internal: Is this a generated documentation file for a .NET assembly?

Requires Blob#data

.NET developers often check in the XML IntelliSense file along with an assembly. These files have no distinctive extension (just .xml), so we have to dig into the contents to determine whether it’s a docfile. Luckily, these files are extremely structured, so recognizing them is easy.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 352

def generated_net_docfile?
  return false unless extname.downcase == ".xml"
  return false unless lines.count > 3

  # .NET Docfiles always open with <doc> and their first tag is an
  # <assembly> tag
  return lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end
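The `lines[-2]` check works because `lines` splits with a limit of -1, so a trailing newline yields a final empty String. A sketch over a hand-written sample docfile; `net_docfile_lines?` is a hypothetical helper that omits the `.xml` extension check.

```ruby
# Sketch of the .NET docfile check on a plain Array of lines.
def net_docfile_lines?(lines)
  return false unless lines.count > 3
  lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end

xml = <<~XML
  <?xml version="1.0"?>
  <doc>
      <assembly>
          <name>MyLib</name>
      </assembly>
  </doc>
XML
lines = xml.split("\n", -1) # last element is "" because of the final newline
net_docfile_lines?(lines)   # => true
```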

#generated_parser? ⇒ Boolean

Internal: Is the blob a JavaScript parser generated by PEG.js?

Requires Blob#data

PEG.js-generated parsers are not meant to be consumed by humans.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 289

def generated_parser?
  return false unless extname == '.js'

  # PEG.js-generated parsers include a comment near the top  of the file
  # that marks them as such.
  if lines[0..4].join('') =~ /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/
    return true
  end

  false
end
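The pattern requires `Generated by PEG.js` to appear inside a `/* ... */` block comment, so a mention in a `//` line comment does not count. A sketch applying the same regexp to a hand-written header (`PEGJS_HEADER` is a hypothetical constant name and the header lines are illustrative):

```ruby
# Same regexp as generated_parser?, applied to the first five lines joined.
PEGJS_HEADER = /^(?:[^\/]|\/[^\*])*\/\*(?:[^\*]|\*[^\/])*Generated by PEG.js/

header = [
  "module.exports = (function(){",
  "  /*",
  "   * Generated by PEG.js.",
  "   */",
  ""
].join('')

PEGJS_HEADER.match?(header)                               # => true
PEGJS_HEADER.match?("var x = 1; // Generated by PEG.js") # => false
```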

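#guess_language ⇒ Object

Internal: Guess the language of the blob. Blobs with a binary MIME type are skipped; otherwise this tries, in order, extension disambiguation, a lookup by filename, and the shebang line.

Please add additional test coverage to `test/test_blob.rb#test_language` if you make any changes.

Returns a Language or nil.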



# File 'lib/linguist/blob_helper.rb', line 410

def guess_language
  return if binary_mime_type?

  # Disambiguate between multiple language extensions
  disambiguate_extension_language ||

    # See if there is a Language for the extension
    Language.find_by_filename(name) ||

    # Try to detect Language from shebang line
    shebang_language
end

#high_ratio_of_long_lines? ⇒ Boolean

Internal: Does the blob have a high ratio of long lines?

That is, is size / loc (the average size per line) greater than 5000? Files like this tend to cause problems for Pygments.rb if we try to colorize them.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 168

def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
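A sketch of the arithmetic with the inputs passed in explicitly, assuming `size` is a byte count (`long_lines?` is a hypothetical helper). Note that `loc` is 0 whenever `lines` is empty, for example when the blob is not viewable.

```ruby
# Sketch of high_ratio_of_long_lines?: integer average size per line > 5000.
def long_lines?(size, loc)
  return false if loc == 0
  size / loc > 5000
end

long_lines?(600_000, 100) # => true  (6000 per line)
long_lines?(600_000, 200) # => false (3000 per line)
long_lines?(10, 0)        # => false (no lines)
```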

#image? ⇒ Boolean

Public: Is the blob a supported image format (.png, .jpg, .jpeg or .gif, judged by extension)?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 128

def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname)
end

#indexable? ⇒ Boolean

Public: Should the blob be indexed for searching?

Excluded:

  • Binary files

  • Files with no detected language

  • Languages marked as not searchable

  • Generated source files

  • Files over 100 KiB (100 * 1024 bytes)

Please add additional test coverage to `test/test_blob.rb#test_indexable` if you make any changes.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 375

def indexable?
  if binary?
    false
  elsif language.nil?
    false
  elsif !language.searchable?
    false
  elsif generated?
    false
  elsif size > 100 * 1024
    false
  else
    true
  end
end
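The size cutoff is 100 * 1024 = 102,400, and the comparison is strict, so a blob of exactly that size is still indexable. A sketch with the individual predicates passed in as plain values (`indexable_sketch?`, its keywords, and `INDEX_SIZE_LIMIT` are hypothetical):

```ruby
# Sketch of indexable?: every exclusion is checked before the size limit.
INDEX_SIZE_LIMIT = 100 * 1024 # 102_400

def indexable_sketch?(binary:, language:, searchable:, generated:, size:)
  return false if binary || language.nil? || !searchable || generated
  size <= INDEX_SIZE_LIMIT
end

indexable_sketch?(binary: false, language: "Ruby", searchable: true,
                  generated: false, size: 102_400) # => true
indexable_sketch?(binary: false, language: "Ruby", searchable: true,
                  generated: false, size: 102_401) # => false
indexable_sketch?(binary: false, language: nil, searchable: true,
                  generated: false, size: 10)      # => false
```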

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/linguist/blob_helper.rb', line 396

def language
  if defined? @language
    @language
  else
    @language = guess_language
  end
end

#large? ⇒ Boolean

Public: Is the blob too big to load, i.e. is its size larger than MEGABYTE (1024 * 1024)?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 146

def large?
  size.to_i > MEGABYTE
end

#lexer ⇒ Object

Internal: Get the lexer of the blob.

Returns the detected Language's lexer, or the Pygments 'Text only' lexer if no language was detected.



# File 'lib/linguist/blob_helper.rb', line 426

def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end

#lines ⇒ Object

Public: Get each line of data.

Requires Blob#data

Returns an Array of lines, or an empty Array if the blob is not viewable or has no data.



# File 'lib/linguist/blob_helper.rb', line 202

def lines
  @lines ||= (viewable? && data) ? data.split("\n", -1) : []
end

#loc ⇒ Object

Public: Get the number of lines of code.

Requires Blob#data

Returns an Integer.



# File 'lib/linguist/blob_helper.rb', line 211

def loc
  lines.size
end

#mime_type ⇒ Object

Public: Get the blob's MIME type, looked up from its extension.

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.



# File 'lib/linguist/blob_helper.rb', line 35

def mime_type
  @mime_type ||= Mime.mime_for(extname)
end

#minified_javascript? ⇒ Boolean

Internal: Is the blob minified JS?

Consider JS minified if the average line length is greater than 100 characters.

Returns true or false (nil if the blob is not a .js file).

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 277

def minified_javascript?
  return unless extname == '.js'
  average_line_length > 100
end

#safe_to_colorize? ⇒ Boolean

Public: Is the blob safe to colorize?

We use Pygments.rb for syntax highlighting blobs. It has some quirks and is essentially ‘un-killable’ via a normal timeout, so to work around this we avoid handing Pygments.rb anything it can’t handle: binary blobs, large blobs, and blobs with a high ratio of long lines.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 158

def safe_to_colorize?
  text? && !large? && !high_ratio_of_long_lines?
end

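#shebang_extname? ⇒ Boolean

Public: Is the blob likely to have a shebang? This is the case when the blob has no file extension and its mode grants read and execute permission to others.

Returns true or false.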

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 135

def shebang_extname?
  extname.empty? &&
    mode &&
    (mode.to_i(8) & 05) == 05
end
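`mode.to_i(8)` parses the mode as an octal String, and `& 05` keeps the read (4) and execute (1) bits of the last octal digit, which holds the permissions for others. A sketch assuming Git-style mode Strings such as "100755" (`shebang_candidate?` is a hypothetical helper):

```ruby
# Sketch of shebang_extname? with name and mode passed in explicitly.
def shebang_candidate?(name, mode)
  File.extname(name).empty? && !mode.nil? && (mode.to_i(8) & 05) == 05
end

shebang_candidate?("bin/rake", "100755")  # => true  (others: r-x)
shebang_candidate?("bin/rake", "100644")  # => false (others: r--)
shebang_candidate?("script.rb", "100755") # => false (has an extension)
```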

#shebang_language ⇒ Object

Internal: Get the Language for a shebang script. Only blobs for which shebang_extname? is true are considered.

Returns the Language or nil.



# File 'lib/linguist/blob_helper.rb', line 501

def shebang_language
  # Skip file extensions unlikely to have shebangs
  return unless shebang_extname?

  if script = shebang_script
    Language[script]
  end
end

#shebang_script ⇒ Object

Internal: Extract the script name from the shebang line

Requires Blob#data

Examples

'#!/usr/bin/ruby'
# => 'ruby'

'#!/usr/bin/env ruby'
# => 'ruby'

'#!/usr/bash/python2.4'
# => 'python'

Please add additional test coverage to `test/test_blob.rb#test_shebang_script` if you make any changes.

Returns a script name String or nil



# File 'lib/linguist/blob_helper.rb', line 463

def shebang_script
  # Fail fast if blob isn't viewable?
  return unless viewable?

  if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
    bang.sub!(/^#! /, '#!')
    tokens = bang.split(' ')
    pieces = tokens.first.split('/')
    if pieces.size > 1
      script = pieces.last
    else
      script = pieces.first.sub('#!', '')
    end

    script = script == 'env' ? tokens[1] : script

    # python2.4 => python
    if script =~ /((?:\d+\.?)+)/
      script.sub! $1, ''
    end

    # Check for multiline shebang hacks that exec themselves
    #
    #   #!/bin/sh
    #   exec foo "$0" "$@"
    #
    if script == 'sh' &&
        lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
      script = $1
    end

    script
  end
end
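A simplified sketch of the single-line parsing steps, useful for checking the examples above. `script_from_shebang` is a hypothetical helper; it omits the `viewable?` guard and the multi-line `exec` hack handled by the real method.

```ruby
# Simplified sketch of shebang_script for a single first line.
def script_from_shebang(line)
  return unless line.start_with?("#!")
  bang   = line.sub(/^#! /, '#!')       # tolerate "#! /usr/bin/foo"
  tokens = bang.split(' ')
  pieces = tokens.first.split('/')
  script = pieces.size > 1 ? pieces.last : pieces.first.sub('#!', '')
  script = tokens[1] if script == 'env' # "#!/usr/bin/env ruby" => "ruby"
  script&.sub(/(?:\d+\.?)+/, '')        # "python2.4" => "python"
end

script_from_shebang('#!/usr/bin/ruby')       # => "ruby"
script_from_shebang('#!/usr/bin/env ruby')   # => "ruby"
script_from_shebang('#!/usr/bash/python2.4') # => "python"
script_from_shebang('puts 1')                # => nil
```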

#sloc ⇒ Object

Public: Get the number of source lines of code, i.e. lines containing at least one non-whitespace character.

Requires Blob#data

Returns an Integer.



# File 'lib/linguist/blob_helper.rb', line 220

def sloc
  lines.grep(/\S/).size
end
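`lines` splits with a limit of -1, which keeps trailing empty Strings, so a final newline adds one to `loc` but not to `sloc`. A sketch on a plain String (the real methods also return an empty Array or 0 when the blob is not viewable):

```ruby
# How lines, loc and sloc relate, computed on a plain String.
data  = "def foo\n\n  42\nend\n"
lines = data.split("\n", -1)  # => ["def foo", "", "  42", "end", ""]
loc   = lines.size            # => 5
sloc  = lines.grep(/\S/).size # => 3 (lines with a non-whitespace character)

data.split("\n").size         # => 4 (without -1, trailing empties are dropped)
```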

#text? ⇒ Boolean

Public: Is the blob text?

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 121

def text?
  !binary?
end

#vendored? ⇒ Boolean

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for the list of vendored path conventions this checks against.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 193

def vendored?
  name =~ VendoredRegexp ? true : false
end
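`VendoredRegexp` joins every pattern with `|`. Because alternation has the lowest precedence, each pattern's own `^` and `$` anchors still apply only to that pattern. A sketch with three placeholder patterns; these are illustrative, not the actual contents of vendor.yml, and `vendored_name?` is hypothetical.

```ruby
# Build one Regexp from a list of path patterns, as VendoredRegexp does.
vendored_paths  = ['^vendor/', '(^|/)node_modules/', 'jquery[^/]*\.js$']
vendored_regexp = Regexp.new(vendored_paths.join('|'))

def vendored_name?(name, regexp)
  name =~ regexp ? true : false
end

vendored_name?('vendor/rails/init.rb', vendored_regexp)       # => true
vendored_name?('app/assets/jquery-1.7.2.js', vendored_regexp) # => true
vendored_name?('lib/linguist.rb', vendored_regexp)            # => false
```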

#viewable? ⇒ Boolean

Public: Is the blob viewable, i.e. text and not large?

Non-viewable blobs will just show a “View Raw” link.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 178

def viewable?
  !large? && text?
end

#xcode_project_file? ⇒ Boolean

Internal: Is the blob an Xcode project file?

Such files are treated as generated. The check is by extension only (.xib, .nib, .storyboard, .pbxproj, .xcworkspacedata or .xcuserstate).

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/linguist/blob_helper.rb', line 267

def xcode_project_file?
  ['.xib', '.nib', '.storyboard', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end