Module: LanguageSniffer::BlobHelper

Included in:
FileBlob
Defined in:
lib/language_sniffer/blob_helper.rb

Overview

BlobHelper is a mixin for blob-like classes that respond to “name”, “data” and “size”, such as Grit::Blob.
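
As a rough illustration of that contract, the sketch below defines a tiny stand-in mixin and a host class that supplies name, data and size. TinyBlobHelper, MemoryBlob and line_count are hypothetical names invented for this example; they are not part of LanguageSniffer.

```ruby
# Hypothetical stand-in for a helper mixin; this is NOT the real BlobHelper.
module TinyBlobHelper
  # Relies only on the host class providing #data.
  def line_count
    data ? data.split("\n", -1).size : 0
  end
end

# Any object that responds to name, data and size can include the mixin.
MemoryBlob = Struct.new(:name, :data) do
  include TinyBlobHelper

  # Size in bytes, derived from the in-memory data.
  def size
    data.to_s.bytesize
  end
end

blob = MemoryBlob.new("hello.rb", "puts 'hi'\n")
blob.line_count # counts the empty string after the trailing newline too
```

The mixin never touches storage directly, so the same helper methods can sit on top of a file on disk, a Git object, or an in-memory string.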

Instance Method Details

#average_line_length ⇒ Object

Internal: Compute average line length.

Returns Integer.



# File 'lib/language_sniffer/blob_helper.rb', line 63

def average_line_length
  if lines.any?
    lines.inject(0) { |n, l| n + l.length } / lines.length
  else
    0
  end
end
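
A minimal standalone sketch of the same calculation, assuming the lines are passed in as an Array of Strings rather than read from a blob. Because both operands are Integers, the division truncates.

```ruby
# Standalone sketch: mean line length using Integer division.
def average_line_length(lines)
  return 0 if lines.empty?

  lines.sum(&:length) / lines.length
end

average_line_length(["abcd", "ab", ""]) # total of 6 characters over 3 lines
```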

#disambiguate_extension_language ⇒ Object

Internal: Disambiguates between multiple language extensions.

Delegates to “guess_EXTENSION_language”.

Please add additional test coverage to ‘test/test_blob.rb#test_language’ if you add another method.

Returns a Language or nil.



# File 'lib/language_sniffer/blob_helper.rb', line 226

def disambiguate_extension_language
  if Language.ambiguous?(extname)
    name = "guess_#{extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end
end
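
The delegation works by building a method name from the extension and calling it only if the object responds to it. The sketch below isolates that pattern; ExtensionGuesser and its AMBIGUOUS list are hypothetical stand-ins for the blob and for Language.ambiguous?.

```ruby
class ExtensionGuesser
  # Hypothetical set of extensions claimed by more than one language.
  AMBIGUOUS = %w[.h .pl].freeze

  def initialize(extname)
    @extname = extname
  end

  # Builds "guess_h_language" from ".h" and calls it only if it exists.
  def disambiguate
    return unless AMBIGUOUS.include?(@extname)

    name = "guess_#{@extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end

  # Only .h has a guesser here, so .pl falls through to nil.
  def guess_h_language
    "C"
  end
end

ExtensionGuesser.new(".h").disambiguate
```

Supporting a new ambiguous extension then only requires defining one more guess_EXTENSION_language method; the dispatcher itself stays unchanged.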

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/language_sniffer/blob_helper.rb', line 25

def extname
  pathname.extname
end
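
The result comes straight from Ruby's Pathname#extname. A small sketch of a few edge cases follows; the ext helper is a hypothetical wrapper that mirrors the nil-safe pathname construction used by this module.

```ruby
require 'pathname'

# Hypothetical helper: extension of a possibly-nil file name.
def ext(name)
  Pathname.new(name || "").extname
end

ext("lib/foo.rb")     # only the last extension is reported
ext("archive.tar.gz") # so this yields the final ".gz" segment
ext("Makefile")       # no extension gives an empty String
ext(nil)              # nil names are treated as an empty path
```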

#first_line_language ⇒ Object

Internal: Guess language from the first line.

Looks for a leading “<?php”.

Returns a Language, or nil if the first line does not match.



# File 'lib/language_sniffer/blob_helper.rb', line 372

def first_line_language
  if lines.first.to_s =~ /^<\?php/
    Language['PHP']
  end
end

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

Requires Blob#data

Includes:

  • Xcode project XML files

  • Visual Studio project XML files

  • Minified JavaScript

  • JavaScript generated by CoffeeScript

  • Generated .NET documentation files

Please add additional test coverage to ‘test/test_blob.rb#test_generated’ if you make any changes.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 87

def generated?
  if xcode_project_file? || visual_studio_project_file?
    true
  elsif generated_coffeescript? || minified_javascript? || generated_net_docfile?
    true
  else
    false
  end
end

#generated_coffeescript? ⇒ Boolean

Internal: Is the blob JS generated by CoffeeScript?

Requires Blob#data

CoffeeScript is meant to output JS that is hard to tell apart from hand-written code, so look for a number of patterns emitted by the CoffeeScript compiler.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 137

def generated_coffeescript?
  return false unless extname == '.js'

  if lines[0] == '(function() {' &&     # First line is module closure opening
      lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
      lines[-1] == ''                   # Last line is blank

    score = 0

    lines.each do |line|
      if line =~ /var /
        # Underscored temp vars are likely to be Coffee
        score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

        # bind and extend functions are very Coffee specific
        score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
      end
    end

    # Require a score of 3. This is fairly arbitrary. Consider
    # tweaking later.
    score >= 3
  else
    false
  end
end
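
The scoring step can be looked at on its own. The sketch below is a standalone approximation that takes an Array of lines and returns the score; coffee_score and the two constant names are hypothetical, and it uses String#scan in place of the gsub enumerator used above.

```ruby
# Compiler-generated temporaries are a weak signal (1 point each).
TEMP_VARS = /(_fn|_i|_len|_ref|_results)/
# Helper functions emitted by the compiler are a strong signal (3 points each).
HELPERS = /(__bind|__extends|__hasProp|__indexOf|__slice)/

# Hypothetical helper: sums the points over every line that declares a var.
def coffee_score(lines)
  lines.sum do |line|
    next 0 unless line =~ /var /

    line.scan(TEMP_VARS).size + 3 * line.scan(HELPERS).size
  end
end

coffee_score(["  var _i, _len, _ref;"]) # three temporaries on one line
```

With a threshold of 3, a single compiler helper or three temporaries is enough to flag the file, provided the module-closure wrapper was also found.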

#generated_net_docfile? ⇒ Boolean

Internal: Is this a generated documentation file for a .NET assembly?

Requires Blob#data

.NET developers often check in the XML IntelliSense file along with an assembly. These files have no special extension, so we have to inspect the contents to determine whether a file is a docfile. Luckily, they are highly structured, so recognizing them is easy.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 174

def generated_net_docfile?
  return false unless extname.downcase == ".xml"
  return false unless lines.count > 3

  # .NET Docfiles always open with <doc> and their first tag is an
  # <assembly> tag
  return lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end
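
A standalone sketch of the same structural check, assuming the file name and contents are passed in directly. net_docfile? and SAMPLE_DOCFILE are hypothetical names, and the sample XML is made up for illustration.

```ruby
# Hypothetical standalone version of the docfile check.
def net_docfile?(name, data)
  return false unless File.extname(name).downcase == ".xml"

  lines = data.split("\n", -1)
  return false unless lines.count > 3

  # Line 0 is the XML declaration, so <doc> is expected on line 1.
  lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end

# Made-up sample; the trailing newline makes "</doc>" the second to last line.
SAMPLE_DOCFILE = <<~XML
  <?xml version="1.0"?>
  <doc>
    <assembly><name>Example</name></assembly>
    <members></members>
  </doc>
XML

net_docfile?("Example.XML", SAMPLE_DOCFILE)
```

Note that the closing tag is looked up at index -2, which only lines up when the file ends with a newline.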

#guess_cls_language ⇒ Object

Internal: Guess language of .cls files

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 236

def guess_cls_language
  if lines.grep(/^(%|\\)/).any?
    Language['TeX']
  elsif lines.grep(/^\s*(CLASS|METHOD|INTERFACE).*:\s*/i).any? || lines.grep(/^\s*(USING|DEFINE)/i).any?
    Language['OpenEdge ABL']
  elsif lines.grep(/\{$/).any? || lines.grep(/\}$/).any?
    Language['Apex']
  elsif lines.grep(/^(\'\*|Attribute|Option|Sub|Private|Protected|Public|Friend)/i).any?
    Language['Visual Basic']
  else
    # The most common language should be the fallback
    Language['TeX']
  end
end

#guess_gsp_language ⇒ Object

Internal: Guess language of .gsp files.

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 359

def guess_gsp_language
  if lines.grep(/<%|\$\{|<g:|<meta name="layout"|<r:/).any?
    Language['Groovy Server Pages']
  else
    Language['Gosu']
  end
end

#guess_h_language ⇒ Object

Internal: Guess language of header files (.h).

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 254

def guess_h_language
  if lines.grep(/^@(interface|property|private|public|end)/).any?
    Language['Objective-C']
  elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
    Language['C++']
  else
    Language['C']
  end
end
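
To try the header heuristics outside a blob, the sketch below applies the same two patterns to a source String and returns plain language names instead of Language objects. The function signature is an assumption made for this example.

```ruby
# Hypothetical standalone version: takes source text, returns a language name.
def guess_h_language(source)
  lines = source.split("\n", -1)
  if lines.grep(/^@(interface|property|private|public|end)/).any?
    "Objective-C"
  elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
    "C++"
  else
    "C" # plain C is the fallback when nothing more specific matches
  end
end

guess_h_language("class Foo {\n  public:\n};\n")
```

The order matters: Objective-C directives are checked first, so a header mixing both styles is reported as Objective-C.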

#guess_language ⇒ Object

Internal: Guess language

Please add additional test coverage to ‘test/test_blob.rb#test_language’ if you make any changes.

Returns a Language or nil



# File 'lib/language_sniffer/blob_helper.rb', line 204

def guess_language
  # Disambiguate between multiple language extensions
  disambiguate_extension_language ||

    # See if there is a Language for the extension
    pathname.language ||

    # Look for idioms in first line
    first_line_language ||

    # Try to detect Language from shebang line
    shebang_language
end
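
The chain relies on || short-circuiting: each strategy runs only if every earlier one returned nil. The sketch below makes the evaluation order visible with hypothetical lambdas that record when they are called.

```ruby
# Hypothetical strategies that log their invocation into the given Array.
def detect(log)
  by_extension  = -> { log << :extension;  nil }    # finds nothing
  by_first_line = -> { log << :first_line; "PHP" }  # first hit wins
  by_shebang    = -> { log << :shebang;    "Ruby" } # never reached

  by_extension.call || by_first_line.call || by_shebang.call
end

log = []
detect(log) # log then shows which strategies actually ran
```

Ordering the cheap checks first means the more expensive ones, which need the blob's data, are skipped whenever the file name alone is enough.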

#guess_m_language ⇒ Object

Internal: Guess language of .m files.

Objective-C heuristics:

  • Keywords

Matlab heuristics:

  • Leading function keyword

  • “%” comments

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 274

def guess_m_language
  # Objective-C keywords
  if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
    Language['Objective-C']

  # File function
  elsif lines.first.to_s =~ /^function /
    Language['Matlab']

  # Matlab comment
  elsif lines.grep(/^%/).any?
    Language['Matlab']

  # Fallback to Objective-C, don't want any Matlab false positives
  else
    Language['Objective-C']
  end
end

#guess_pl_language ⇒ Object

Internal: Guess language of .pl files

The rules for disambiguation are:

  1. Many Perl files begin with a shebang

  2. Most Prolog source files have a rule somewhere (marked by the :- operator)

  3. Default to Perl, because it is more popular

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 302

def guess_pl_language
  if shebang_script == 'perl'
    Language['Perl']
  elsif lines.grep(/:-/).any?
    Language['Prolog']
  else
    Language['Perl']
  end
end
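
The three rules can be sketched on plain source text as shown below. This is a simplified approximation: it replaces shebang_script with a hypothetical regex check of the first line and returns language names as Strings.

```ruby
# Hypothetical standalone version operating on source text.
def guess_pl_language(source)
  lines = source.split("\n", -1)
  if lines.first.to_s =~ /^#!.*\bperl\b/ # simplified shebang check (assumption)
    "Perl"
  elsif lines.grep(/:-/).any?            # a Prolog rule somewhere in the file
    "Prolog"
  else
    "Perl"                               # the more popular language is the default
  end
end

guess_pl_language("parent(X, Y) :- father(X, Y).\n")
```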

#guess_r_language ⇒ Object

Internal: Guess language of .r files.

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 315

def guess_r_language
  if lines.grep(/(rebol|(:\s+func|make\s+object!|^\s*context)\s*\[)/i).any?
    Language['Rebol']
  else
    Language['R']
  end
end

#guess_t_language ⇒ Object

Internal: Guess language of .t files.

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 326

def guess_t_language
  score = 0
  score += 1 if lines.grep(/^% /).any?
  score += data.gsub(/ := /).count
  score += data.gsub(/proc |procedure |fcn |function /).count
  score += data.gsub(/var \w+: \w+/).count

  # Tell-tale signs it's gotta be Perl
  if lines.grep(/^(my )?(sub |\$|@|%)\w+/).any?
    score = 0
  end

  if score >= 3
    Language['Turing']
  else
    Language['Perl']
  end
end
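
The sketch below reproduces the scoring on a plain String so the Perl veto can be observed directly. It is a standalone approximation: the signature is an assumption, String#scan replaces the gsub enumerators, and language names are returned as Strings.

```ruby
# Hypothetical standalone version operating on source text.
def guess_t_language(data)
  lines = data.split("\n", -1)
  score = 0
  score += 1 if lines.grep(/^% /).any?                       # Turing comment
  score += data.scan(/ := /).size                            # assignments
  score += data.scan(/proc |procedure |fcn |function /).size # routines
  score += data.scan(/var \w+: \w+/).size                    # typed variables

  # Any line that looks like Perl resets the score, whatever else matched.
  score = 0 if lines.grep(/^(my )?(sub |\$|@|%)\w+/).any?

  score >= 3 ? "Turing" : "Perl"
end

guess_t_language("% Sum\nvar total: int := 0\nprocedure add\n  total := total + 1\nend add\n")
```

The veto is deliberately one-sided: a single Perl-looking line outweighs any amount of Turing evidence, which keeps Perl test files from being misreported.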

#guess_v_language ⇒ Object

Internal: Guess language of .v files.

Returns a Language



# File 'lib/language_sniffer/blob_helper.rb', line 348

def guess_v_language
  if lines.grep(/^(\/\*|\/\/|module|parameter|input|output|wire|reg|always|initial|begin|\`)/).any?
    Language['Verilog']
  else
    Language['Coq']
  end
end

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/language_sniffer/blob_helper.rb', line 190

def language
  if defined? @language
    @language
  else
    @language = guess_language
  end
end
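
Checking defined? rather than using ||= matters because the guess may legitimately be nil. The sketch below contrasts the two approaches; Detector, guess_calls and language_with_or_equals are hypothetical names, and the guess always returns nil to make the difference visible.

```ruby
class Detector
  attr_reader :guess_calls

  def initialize
    @guess_calls = 0
  end

  # Caches the result even when it is nil.
  def language
    if defined? @language
      @language
    else
      @language = guess_language
    end
  end

  # For comparison: ||= re-runs the guess whenever the cached value is nil.
  def language_with_or_equals
    @other ||= guess_language
  end

  private

  # Pretend detection found nothing, and count how often it ran.
  def guess_language
    @guess_calls += 1
    nil
  end
end

cached = Detector.new
2.times { cached.language } # the guess runs once
```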

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/language_sniffer/blob_helper.rb', line 34

def lines
  @lines ||= begin
    (data ? data.split("\n", -1) : [])
  rescue ArgumentError # invalid byte sequence in UTF-8
    []
  end
end
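
The negative limit passed to split changes what a trailing newline produces, and other methods in this module depend on that when they index from the end of the Array. A small sketch follows; split_lines is a hypothetical standalone helper.

```ruby
# Hypothetical helper mirroring the split used above.
def split_lines(data)
  data ? data.split("\n", -1) : []
end

"a\n\nb\n".split("\n") # default limit drops trailing empty strings
split_lines("a\n\nb\n") # limit of -1 keeps the final empty string
```

With the trailing empty string preserved, lines[-1] == '' identifies a file that ends in a newline, and lines[-2] is its last line of text.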

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns Integer



# File 'lib/language_sniffer/blob_helper.rb', line 47

def loc
  lines.size
end

#minified_javascript? ⇒ Boolean

Internal: Is the blob minified JS?

Consider JS minified if the average line length is greater than 100 characters.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 123

def minified_javascript?
  return false unless extname == '.js'
  average_line_length > 100
end

#pathname ⇒ Object

Internal: Get a Pathname wrapper for Blob#name

Returns a Pathname.



# File 'lib/language_sniffer/blob_helper.rb', line 13

def pathname
  Pathname.new(name || "")
end

#shebang_language ⇒ Object

Internal: Get Language for shebang script

Returns the Language or nil



# File 'lib/language_sniffer/blob_helper.rb', line 432

def shebang_language
  if script = shebang_script
    Language[script]
  end
end

#shebang_script ⇒ Object

Internal: Extract the script name from the shebang line

Requires Blob#data

Examples

'#!/usr/bin/ruby'
# => 'ruby'

'#!/usr/bin/env ruby'
# => 'ruby'

'#!/usr/bash/python2.4'
# => 'python'

Please add additional test coverage to ‘test/test_blob.rb#test_shebang_script’ if you make any changes.

Returns a script name String or nil



# File 'lib/language_sniffer/blob_helper.rb', line 397

def shebang_script
  if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
    bang.sub!(/^#! /, '#!')
    tokens = bang.split(' ')
    pieces = tokens.first.split('/')
    if pieces.size > 1
      script = pieces.last
    else
      script = pieces.first.sub('#!', '')
    end

    script = script == 'env' ? tokens[1] : script

    # python2.4 => python
    if script =~ /((?:\d+\.?)+)/
      script.sub! $1, ''
    end

    # Check for multiline shebang hacks that exec themselves
    #
    #   #!/bin/sh
    #   exec foo "$0" "$@"
    #
    if script == 'sh' &&
        lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
      script = $1
    end

    script
  end
end
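
The core of the parsing can be exercised on a single line. The sketch below is a simplified standalone version that takes the first line as an argument and leaves out the multiline exec handling; the signature is an assumption made for this example.

```ruby
# Hypothetical standalone version: extracts the interpreter name from one line.
def shebang_script(first_line)
  return unless first_line =~ /^#!/

  bang   = first_line.sub(/^#! /, '#!') # tolerate a space after "#!"
  tokens = bang.split(' ')
  pieces = tokens.first.split('/')
  script = pieces.size > 1 ? pieces.last : pieces.first.sub('#!', '')

  # "#!/usr/bin/env ruby" names the interpreter in the next token.
  script = tokens[1] if script == 'env'

  # Strip a version suffix: python2.4 => python
  script && script.sub(/(?:\d+\.?)+/, '')
end

shebang_script('#!/usr/bin/env ruby')
```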

#sloc ⇒ Object

Public: Get number of source lines of code

Requires Blob#data

Returns Integer



# File 'lib/language_sniffer/blob_helper.rb', line 56

def sloc
  lines.grep(/\S/).size
end
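
The difference between loc and sloc is easiest to see side by side. The sketch below uses hypothetical standalone versions that take the data as a String.

```ruby
# Every element produced by the split, including blank and trailing entries.
def loc(data)
  data.split("\n", -1).size
end

# Only lines containing at least one non-whitespace character.
def sloc(data)
  data.split("\n", -1).grep(/\S/).size
end

source = "def hi\n\n  puts 'hi'\nend\n"
[loc(source), sloc(source)] # loc also counts the blank line and the trailing entry
```

Because the split keeps the empty string after a final newline, loc is one higher than the number of newline-terminated lines in such a file.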

#visual_studio_project_file? ⇒ Boolean

Internal: Is the blob a Visual Studio project file?

Generated if the file extension is a Visual Studio project file extension.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 113

def visual_studio_project_file?
  ['.csproj', '.dbproj', '.fsproj', '.pyproj', '.rbproj', '.vbproj', '.vcxproj', '.wixproj', '.resx', '.sln', '.vdproj', '.isproj'].include?(extname)
end

#xcode_project_file? ⇒ Boolean

Internal: Is the blob an Xcode project file?

Generated if the file extension is an Xcode project file extension.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 103

def xcode_project_file?
  ['.xib', '.nib', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end