Module: LanguageSniffer::BlobHelper

Included in:
FileBlob
Defined in:
lib/language_sniffer/blob_helper.rb

Overview

BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
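To illustrate the contract described above, here is a minimal sketch of a blob-like class that responds to name, data and size and picks up helper methods from a mixin. TinyHelper and StringBlob are hypothetical stand-ins invented for this example and are not part of LanguageSniffer; with the library loaded, one would include LanguageSniffer::BlobHelper instead.

```ruby
require 'pathname'

# Hypothetical stand-in for the real mixin: it relies only on the host
# class responding to #name and #data.
module TinyHelper
  def extname
    Pathname.new(name || '').extname
  end

  def lines
    @lines ||= (data ? data.split("\n", -1) : [])
  end
end

# Hypothetical blob-like class exposing the three required readers.
class StringBlob
  include TinyHelper

  attr_reader :name, :data

  def initialize(name, data)
    @name = name
    @data = data
  end

  def size
    data.to_s.bytesize
  end
end

blob = StringBlob.new('foo.rb', "puts 1\nputs 2\n")
blob.extname      # => ".rb"
blob.lines.length # => 3 (the trailing newline yields a final empty string)
blob.size         # => 14
```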

Instance Method Details

#average_line_length ⇒ Object

Internal: Compute average line length.

Returns Integer.



# File 'lib/language_sniffer/blob_helper.rb', line 58

def average_line_length
  if lines.any?
    lines.inject(0) { |n, l| n + l.length } / lines.length
  else
    0
  end
end
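Because both operands are Integers, the division truncates. The following standalone sketch (it takes the lines as an argument, which the real method does not) shows the effect on a few inputs.

```ruby
# Standalone version of the computation for illustration; the real method
# reads the lines from the blob instead of taking an argument.
def average_line_length(lines)
  return 0 if lines.empty?

  lines.sum(&:length) / lines.length
end

average_line_length(['ab', 'abcd', ''])  # => 2  ((2 + 4 + 0) / 3)
average_line_length(['abc', 'abcd'])     # => 3  (7 / 2 truncates)
average_line_length([])                  # => 0
```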

#disambiguate_extension_language ⇒ Object

Internal: Disambiguates between multiple language extensions.

Delegates to “guess_EXTENSION_language”.

Please add additional test coverage to 'test/test_blob.rb#test_language' if you add another method.

Returns a Language or nil.



# File 'lib/language_sniffer/blob_helper.rb', line 221

def disambiguate_extension_language
  if Language.ambiguous?(extname)
    name = "guess_#{extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end
end
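The delegation works by building a method name from the extension and calling it only if the object responds to it. A minimal sketch of that pattern follows; DispatchDemo and its AMBIGUOUS list are hypothetical and exist only for this illustration (the real method asks Language.ambiguous? instead).

```ruby
# Hypothetical class used only to demonstrate the dispatch pattern.
class DispatchDemo
  AMBIGUOUS = ['.h', '.m', '.pl'].freeze # assumed list for this sketch

  def initialize(extname)
    @extname = extname
  end

  def guess_h_language
    'C' # placeholder result
  end

  def disambiguate
    return nil unless AMBIGUOUS.include?(@extname)

    name = "guess_#{@extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end
end

DispatchDemo.new('.h').disambiguate   # => "C"
DispatchDemo.new('.pl').disambiguate  # => nil (no guess_pl_language defined here)
DispatchDemo.new('.rb').disambiguate  # => nil (extension is not ambiguous)
```

Adding support for another ambiguous extension then only requires defining a matching guess_EXTENSION_language method.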

#extname ⇒ Object

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String



# File 'lib/language_sniffer/blob_helper.rb', line 24

def extname
  pathname.extname
end
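Since the method delegates to Pathname#extname from the Ruby standard library, its edge cases follow that method. A few direct calls, shown here without any blob involved:

```ruby
require 'pathname'

Pathname.new('foo.rb').extname         # => ".rb"
Pathname.new('archive.tar.gz').extname # => ".gz"
Pathname.new('Makefile').extname       # => ""
Pathname.new('.bashrc').extname        # => ""
Pathname.new('').extname               # => ""
```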

#first_line_language ⇒ Object

Internal: Guess language from the first line.

Looks for a leading “<?php”.

Returns a Language, or nil if the first line does not match.



# File 'lib/language_sniffer/blob_helper.rb', line 316

def first_line_language
  if lines.first.to_s =~ /^<\?php/
    Language['PHP']
  end
end

#generated? ⇒ Boolean

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

Requires Blob#data

Includes:

  • Xcode project XML files

  • Visual Studio project XML files

  • Minified JavaScript

  • JavaScript generated by CoffeeScript

  • Generated .NET documentation XML files

Please add additional test coverage to 'test/test_blob.rb#test_generated' if you make any changes.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 82

def generated?
  if xcode_project_file? || visual_studio_project_file?
    true
  elsif generated_coffeescript? || minified_javascript? || generated_net_docfile?
    true
  else
    false
  end
end

#generated_coffeescript? ⇒ Boolean

Internal: Is the blob JS generated by CoffeeScript?

Requires Blob#data

CoffeeScript is meant to output JS that is hard to distinguish from hand-written code, so this method looks for a number of patterns emitted by the CoffeeScript compiler.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 132

def generated_coffeescript?
  return false unless extname == '.js'

  if lines[0] == '(function() {' &&     # First line is module closure opening
      lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
      lines[-1] == ''                   # Last line is blank

    score = 0

    lines.each do |line|
      if line =~ /var /
        # Underscored temp vars are likely to be Coffee
        score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

        # bind and extend functions are very Coffee specific
        score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
      end
    end

    # Require a score of 3. This is fairly arbitrary. Consider
    # tweaking later.
    score >= 3
  else
    false
  end
end
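The scoring heuristic can be tried in isolation. The sketch below reuses the patterns from the method above but takes the file contents as a String; coffee_generated? and the sample inputs are made up for this illustration, and String#scan is swapped in for the gsub enumerator to count matches.

```ruby
# Patterns copied from the method above.
TEMP_VARS = /(_fn|_i|_len|_ref|_results)/
HELPERS   = /(__bind|__extends|__hasProp|__indexOf|__slice)/

# Standalone sketch: takes the file contents as a String.
def coffee_generated?(data)
  lines = data.split("\n", -1)
  return false unless lines[0] == '(function() {' &&
                      lines[-2] == '}).call(this);' &&
                      lines[-1] == ''

  score = 0
  lines.grep(/var /).each do |line|
    score += line.scan(TEMP_VARS).size      # 1 point per temp variable
    score += 3 * line.scan(HELPERS).size    # 3 points per helper function
  end
  score >= 3
end

compiled = <<~JS
  (function() {
    var _i, _len, _ref;
    _ref = [1, 2, 3];
  }).call(this);
JS

handwritten = "(function() {\n  var total = 0;\n}).call(this);\n"

coffee_generated?(compiled)     # => true  (three temp vars on the var line)
coffee_generated?(handwritten)  # => false (closure matches, score is 0)
coffee_generated?("var x = 1;") # => false (no module closure)
```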

#generated_net_docfile? ⇒ Boolean

Internal: Is this a generated documentation file for a .NET assembly?

Requires Blob#data

.NET developers often check in the XML Intellisense file along with an assembly. However, these files don’t have a special extension, so we have to inspect the contents to determine whether a file is a docfile. Luckily, these files are highly structured, so recognizing them is easy.

Returns true or false

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 169

def generated_net_docfile?
  return false unless extname.downcase == ".xml"
  return false unless lines.count > 3

  # .NET Docfiles always open with <doc> and their first tag is an
  # <assembly> tag
  return lines[1].include?("<doc>") &&
    lines[2].include?("<assembly>") &&
    lines[-2].include?("</doc>")
end
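The structural check relies on fixed line positions: the second line, the third line and the second-to-last line. A standalone sketch with a made-up sample file shows which inputs pass; net_docfile? and the sample contents are assumptions for this example only.

```ruby
# Standalone sketch: takes a file name and its contents as Strings.
def net_docfile?(name, data)
  return false unless File.extname(name).downcase == '.xml'

  lines = data.split("\n", -1)
  return false unless lines.count > 3

  lines[1].include?('<doc>') &&
    lines[2].include?('<assembly>') &&
    lines[-2].include?('</doc>')
end

docfile = <<~XML
  <?xml version="1.0"?>
  <doc>
      <assembly>
          <name>Example</name>
      </assembly>
      <members>
      </members>
  </doc>
XML

net_docfile?('Example.XML', docfile)                # => true
net_docfile?('Example.txt', docfile)                # => false (wrong extension)
net_docfile?('pom.xml', "<project>\n</project>\n")  # => false (too few lines)
```

Note that the check on the second-to-last line assumes the file ends with a newline, so that the last element of the split is an empty string.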

#guess_gsp_language ⇒ Object

Internal: Guess language of .gsp files.

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 303

def guess_gsp_language
  if lines.grep(/<%|\$\{|<g:|<meta name="layout"|<r:/).any?
    Language['Groovy Server Pages']
  else
    Language['Gosu']
  end
end

#guess_h_language ⇒ Object

Internal: Guess language of header files (.h).

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 231

def guess_h_language
  if lines.grep(/^@(interface|property|private|public|end)/).any?
    Language['Objective-C']
  elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
    Language['C++']
  else
    Language['C']
  end
end
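The header heuristics can be exercised on small inputs. This sketch copies the two patterns into a standalone function that returns plain language names instead of Language objects; guess_h and the sample lines are hypothetical.

```ruby
# Standalone sketch: returns a language name String for the given lines.
def guess_h(lines)
  if lines.grep(/^@(interface|property|private|public|end)/).any?
    'Objective-C'
  elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
    'C++'
  else
    'C'
  end
end

guess_h(['@interface Foo : NSObject', '@end'])          # => "Objective-C"
guess_h(['class Foo {', ' public:', '  int x;', '};'])  # => "C++"
guess_h(['struct foo { int x; };'])                     # => "C"
```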

#guess_language ⇒ Object

Internal: Guess language

Please add additional test coverage to 'test/test_blob.rb#test_language' if you make any changes.

Returns a Language or nil



# File 'lib/language_sniffer/blob_helper.rb', line 199

def guess_language
  # Disambiguate between multiple language extensions
  disambiguate_extension_language ||

    # See if there is a Language for the extension
    pathname.language ||

    # Look for idioms in first line
    first_line_language ||

    # Try to detect Language from shebang line
    shebang_language
end
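The chain of `||` operators means each strategy runs only if every earlier one returned nil, so later (and potentially more expensive) steps are skipped once a Language is found. A small sketch of that short-circuit behavior, using hypothetical lambdas in place of the real detection methods:

```ruby
calls = []

# Each lambda stands in for one detection step; nil means "no result".
by_disambiguation = -> { calls << :disambiguate; nil }
by_extension      = -> { calls << :extension; 'Ruby' }
by_first_line     = -> { calls << :first_line; 'PHP' }

result = by_disambiguation.call || by_extension.call || by_first_line.call

result # => "Ruby"
calls  # => [:disambiguate, :extension]
```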

#guess_m_language ⇒ Object

Internal: Guess language of .m files.

Objective-C heuristics:

  • Keywords

Matlab heuristics:

  • Leading function keyword

  • “%” comments

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 251

def guess_m_language
  # Objective-C keywords
  if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
    Language['Objective-C']

  # File function
  elsif lines.first.to_s =~ /^function /
    Language['Matlab']

  # Matlab comment
  elsif lines.grep(/^%/).any?
    Language['Matlab']

  # Fallback to Objective-C, don't want any Matlab false positives
  else
    Language['Objective-C']
  end
end
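As a rough illustration of how these heuristics interact, the following sketch folds the two Matlab checks into one branch and returns plain names instead of Language objects. guess_m and the sample inputs are invented for this example.

```ruby
# Standalone sketch: returns a language name String for the given lines.
def guess_m(lines)
  if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
    'Objective-C'
  elsif lines.first.to_s =~ /^function / || lines.grep(/^%/).any?
    'Matlab'
  else
    'Objective-C' # fallback, to avoid Matlab false positives
  end
end

guess_m(['#import <Foundation/Foundation.h>'])            # => "Objective-C"
guess_m(['function y = square(x)', 'y = x * x;', 'end'])  # => "Matlab"
guess_m(['% a comment', 'x = 1;'])                        # => "Matlab"
guess_m(['x = 1;'])                                       # => "Objective-C"
```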

#guess_pl_language ⇒ Object

Internal: Guess language of .pl files

The rules for disambiguation are:

  1. Many Perl files begin with a shebang

  2. Most Prolog source files have a rule somewhere (marked by the :- operator)

  3. Default to Perl, because it is more popular

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 279

def guess_pl_language
  if shebang_script == 'perl'
    Language['Perl']
  elsif lines.grep(/:-/).any?
    Language['Prolog']
  else
    Language['Perl']
  end
end
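The rule order matters: a Perl shebang wins even when the file also contains “:-”. The sketch below shows this with a simplified shebang test (a regular expression on the first line, which is an assumption of this example; the real method calls shebang_script).

```ruby
# Standalone sketch: returns a language name String for the given lines.
def guess_pl(lines)
  if lines.first.to_s =~ /\A#!.*\bperl\b/ # simplified shebang check (assumption)
    'Perl'
  elsif lines.grep(/:-/).any?
    'Prolog'
  else
    'Perl'
  end
end

guess_pl(['#!/usr/bin/perl', 'my $smile = ":-)";'])  # => "Perl"
guess_pl(['parent(X, Y) :- father(X, Y).'])          # => "Prolog"
guess_pl(['print 1;'])                               # => "Perl"
```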

#guess_r_language ⇒ Object

Internal: Guess language of .r files.

Returns a Language.



# File 'lib/language_sniffer/blob_helper.rb', line 292

def guess_r_language
  if lines.grep(/(rebol|(:\s+func|make\s+object!|^\s*context)\s*\[)/i).any?
    Language['Rebol']
  else
    Language['R']
  end
end

#language ⇒ Object

Public: Detects the Language of the blob.

May load Blob#data

Returns a Language or nil if none is detected



# File 'lib/language_sniffer/blob_helper.rb', line 185

def language
  if defined? @language
    @language
  else
    @language = guess_language
  end
end
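The `defined?` check is presumably used instead of `||=` so that a nil result (no language detected) is also cached. A minimal sketch of the difference, using a hypothetical MemoDemo class with a call counter:

```ruby
# Hypothetical class comparing the two memoization styles.
class MemoDemo
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Caches the result even when it is nil, like the method above.
  def with_defined
    if defined?(@a)
      @a
    else
      @a = expensive_guess
    end
  end

  # `||=` re-runs the guess whenever the cached value is nil or false.
  def with_or_equals
    @b ||= expensive_guess
  end

  private

  def expensive_guess
    @calls += 1
    nil # simulate "no language detected"
  end
end

a = MemoDemo.new
2.times { a.with_defined }
a.calls # => 1

b = MemoDemo.new
2.times { b.with_or_equals }
b.calls # => 2
```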

#lines ⇒ Object

Public: Get each line of data

Requires Blob#data

Returns an Array of lines



# File 'lib/language_sniffer/blob_helper.rb', line 33

def lines
  @lines ||= (data ? data.split("\n", -1) : [])
end
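The negative limit passed to String#split keeps trailing empty strings, so a file that ends with a newline yields a final empty line. The other heuristics in this module that index from the end of the Array depend on this. A few direct calls:

```ruby
"a\nb\n".split("\n", -1)  # => ["a", "b", ""]
"a\nb\n".split("\n")      # => ["a", "b"]
"a\n\n".split("\n", -1)   # => ["a", "", ""]
"".split("\n", -1)        # => []
```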

#loc ⇒ Object

Public: Get number of lines of code

Requires Blob#data

Returns Integer



# File 'lib/language_sniffer/blob_helper.rb', line 42

def loc
  lines.size
end

#minified_javascript? ⇒ Boolean

Internal: Is the blob minified JS?

Consider JS minified if the average line length is greater than 100 characters.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 118

def minified_javascript?
  return false unless extname == '.js'
  average_line_length > 100
end
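To make the threshold concrete, here is a standalone sketch of the same comparison. The minified? helper and its threshold parameter are hypothetical; the real method also checks the file extension first.

```ruby
# Standalone sketch: true when the integer average line length exceeds the threshold.
def minified?(lines, threshold = 100)
  return false if lines.empty?

  lines.sum(&:length) / lines.length > threshold
end

minified?(['x' * 5000, ''])              # => true  (average 2500)
minified?(['var a = 1;', 'var b = 2;'])  # => false (average 10)
minified?(['x' * 100])                   # => false (exactly 100 is not enough)
```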

#pathname ⇒ Object

Internal: Get a Pathname wrapper for Blob#name

Returns a Pathname.



# File 'lib/language_sniffer/blob_helper.rb', line 12

def pathname
  Pathname.new(name || "")
end

#shebang_language ⇒ Object

Internal: Get Language for shebang script

Returns the Language or nil



# File 'lib/language_sniffer/blob_helper.rb', line 376

def shebang_language
  if script = shebang_script
    Language[script]
  end
end

#shebang_script ⇒ Object

Internal: Extract the script name from the shebang line

Requires Blob#data

Examples

'#!/usr/bin/ruby'
# => 'ruby'

'#!/usr/bin/env ruby'
# => 'ruby'

'#!/usr/bash/python2.4'
# => 'python'

Please add additional test coverage to 'test/test_blob.rb#test_shebang_script' if you make any changes.

Returns a script name String or nil



# File 'lib/language_sniffer/blob_helper.rb', line 341

def shebang_script
  if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
    bang.sub!(/^#! /, '#!')
    tokens = bang.split(' ')
    pieces = tokens.first.split('/')
    if pieces.size > 1
      script = pieces.last
    else
      script = pieces.first.sub('#!', '')
    end

    script = script == 'env' ? tokens[1] : script

    # python2.4 => python
    if script =~ /((?:\d+\.?)+)/
      script.sub! $1, ''
    end

    # Check for multiline shebang hacks that exec themselves
    #
    #   #!/bin/sh
    #   exec foo "$0" "$@"
    #
    if script == 'sh' &&
        lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
      script = $1
    end

    script
  end
end
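The parsing steps can be followed more easily in a standalone, non-mutating form. The sketch below is a simplified rewrite for illustration, not the library's code: it takes the lines as an argument and avoids the `$1` global, but is intended to follow the same steps.

```ruby
# Standalone sketch: takes the blob's lines as an Array of Strings.
def shebang_script(lines)
  first = lines.first.to_s
  return nil unless first.start_with?('#!')

  # "#! /bin/sh" => "#!/bin/sh", then split off interpreter arguments
  tokens = first.sub(/\A#! /, '#!').split(' ')
  pieces = tokens.first.split('/')
  script = pieces.size > 1 ? pieces.last : pieces.first.sub('#!', '')

  # "#!/usr/bin/env ruby" => use the argument passed to env
  script = tokens[1] if script == 'env'
  return nil if script.nil?

  # python2.4 => python
  script = script.sub(/(?:\d+\.?)+/, '')

  # Multiline hack: "#!/bin/sh" followed by a line that execs the real interpreter
  if script == 'sh'
    lines.first(5).each do |line|
      match = line.match(/exec (\w+).+\$0.+\$@/)
      return match[1] if match
    end
  end

  script
end

shebang_script(['#!/usr/bin/ruby'])                    # => "ruby"
shebang_script(['#!/usr/bin/env ruby'])                # => "ruby"
shebang_script(['#!/usr/bin/python2.4'])               # => "python"
shebang_script(['#!/bin/sh', 'exec scala "$0" "$@"'])  # => "scala"
shebang_script(['puts 1'])                             # => nil
```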

#sloc ⇒ Object

Public: Get number of source lines of code, counting only lines that contain at least one non-whitespace character

Requires Blob#data

Returns Integer



# File 'lib/language_sniffer/blob_helper.rb', line 51

def sloc
  lines.grep(/\S/).size
end
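The difference between loc and sloc is easiest to see on a small input. This sketch applies the two expressions directly to a made-up String rather than to a blob:

```ruby
data  = "def foo\n\n  \n  1\nend\n"
lines = data.split("\n", -1)  # => ["def foo", "", "  ", "  1", "end", ""]

lines.size             # => 6 (loc: every line, including the trailing empty one)
lines.grep(/\S/).size  # => 3 (sloc: only lines with non-whitespace content)
```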

#visual_studio_project_file? ⇒ Boolean

Internal: Is the blob a Visual Studio project file?

The blob is considered generated if its extension is a Visual Studio project file extension.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 108

def visual_studio_project_file?
  ['.csproj', '.dbproj', '.fsproj', '.pyproj', '.rbproj', '.vbproj', '.vcxproj', '.wixproj', '.resx', '.sln', '.vdproj', '.isproj'].include?(extname)
end

#xcode_project_file? ⇒ Boolean

Internal: Is the blob an Xcode project file?

The blob is considered generated if its extension is an Xcode project file extension.

Returns true or false.

Returns:

  • (Boolean)


# File 'lib/language_sniffer/blob_helper.rb', line 98

def xcode_project_file?
  ['.xib', '.nib', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
end