Module: LanguageSniffer::BlobHelper
- Included in:
- FileBlob
- Defined in:
- lib/language_sniffer/blob_helper.rb
Overview
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
Instance Method Summary
- #average_line_length ⇒ Object
  Internal: Compute average line length.
- #disambiguate_extension_language ⇒ Object
  Internal: Disambiguates between multiple language extensions.
- #extname ⇒ Object
  Public: Get the extname of the path.
- #first_line_language ⇒ Object
  Internal: Guess language from the first line.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #generated_coffeescript? ⇒ Boolean
  Internal: Is the blob JS generated by CoffeeScript?
- #generated_net_docfile? ⇒ Boolean
  Internal: Is this a generated documentation file for a .NET assembly?
- #guess_cls_language ⇒ Object
  Internal: Guess language of .cls files.
- #guess_gsp_language ⇒ Object
  Internal: Guess language of .gsp files.
- #guess_h_language ⇒ Object
  Internal: Guess language of header files (.h).
- #guess_language ⇒ Object
  Internal: Guess language.
- #guess_m_language ⇒ Object
  Internal: Guess language of .m files.
- #guess_pl_language ⇒ Object
  Internal: Guess language of .pl files.
- #guess_r_language ⇒ Object
  Internal: Guess language of .r files.
- #guess_t_language ⇒ Object
  Internal: Guess language of .t files.
- #guess_v_language ⇒ Object
  Internal: Guess language of .v files.
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get number of lines of code.
- #minified_javascript? ⇒ Boolean
  Internal: Is the blob minified JS?
- #pathname ⇒ Object
  Internal: Get a Pathname wrapper for Blob#name.
- #shebang_language ⇒ Object
  Internal: Get Language for shebang script.
- #shebang_script ⇒ Object
  Internal: Extract the script name from the shebang line.
- #sloc ⇒ Object
  Public: Get number of source lines of code.
- #visual_studio_project_file? ⇒ Boolean
  Internal: Is the blob a Visual Studio project file?
- #xcode_project_file? ⇒ Boolean
  Internal: Is the blob an XCode project file?
Instance Method Details
#average_line_length ⇒ Object
Internal: Compute average line length.
Returns Integer.
    # File 'lib/language_sniffer/blob_helper.rb', line 63
    def average_line_length
      if lines.any?
        lines.inject(0) { |n, l| n += l.length } / lines.length
      else
        0
      end
    end
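The following is a minimal standalone sketch of the same computation, not the library's code. It assumes a plain Array of Strings is passed in instead of coming from a blob, and it mainly illustrates that the result is truncated by Integer division.

```ruby
# Hypothetical standalone helper: `lines` is a plain Array of Strings here,
# whereas the real method reads them from the blob.
def average_line_length(lines)
  return 0 if lines.empty?

  # Integer / Integer truncates, so the average is rounded down.
  lines.sum(&:length) / lines.length
end

average_line_length(["abcd", "ab", "abc"]) # => 3 (9 characters over 3 lines)
average_line_length(["abcd", "abc"])       # => 3 (7 / 2 truncates)
average_line_length([])                    # => 0
```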
#disambiguate_extension_language ⇒ Object
Internal: Disambiguates between multiple language extensions.
Delegates to “guess_EXTENSION_language”.
Please add additional test coverage to “test/test_blob.rb#test_language” if you add another method.
Returns a Language or nil.
    # File 'lib/language_sniffer/blob_helper.rb', line 226
    def disambiguate_extension_language
      if Language.ambiguous?(extname)
        name = "guess_#{extname.sub(/^\./, '')}_language"
        send(name) if respond_to?(name)
      end
    end
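The dispatch by method name can be sketched in isolation. The class below is hypothetical: `SketchBlob` and its `AMBIGUOUS` list are stand-ins invented for illustration (the real method asks `Language.ambiguous?`), and the guesser returns a String rather than a Language.

```ruby
# Hypothetical stand-in for a blob; only the dispatch logic mirrors the
# method documented above.
class SketchBlob
  # Assumption: a fixed list replaces Language.ambiguous?.
  AMBIGUOUS = ['.h', '.pl'].freeze

  def initialize(extname)
    @extname = extname
  end

  # Only a .pl guesser is defined, so '.h' falls through to nil.
  def guess_pl_language
    'Perl'
  end

  def disambiguate
    return unless AMBIGUOUS.include?(@extname)

    # '.pl' => "guess_pl_language"
    name = "guess_#{@extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end
end

SketchBlob.new('.pl').disambiguate # => "Perl"
SketchBlob.new('.h').disambiguate  # => nil (ambiguous, but no guesser defined)
SketchBlob.new('.rb').disambiguate # => nil (not ambiguous)
```

Because of the `respond_to?` guard, an ambiguous extension without a matching guesser yields nil instead of raising NoMethodError.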
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
    # File 'lib/language_sniffer/blob_helper.rb', line 25
    def extname
      pathname.extname
    end
#first_line_language ⇒ Object
Internal: Guess language from the first line.
Looks for a leading “<?php”.
Returns a Language, or nil if the first line does not match.
    # File 'lib/language_sniffer/blob_helper.rb', line 372
    def first_line_language
      if lines.first.to_s =~ /^<\?php/
        Language['PHP']
      end
    end
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
Requires Blob#data
Includes:
- XCode project XML files
- Visual Studio project XML files
- Minified JavaScript
Please add additional test coverage to “test/test_blob.rb#test_generated” if you make any changes.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 87
    def generated?
      if xcode_project_file? || visual_studio_project_file?
        true
      elsif generated_coffeescript? || minified_javascript? || generated_net_docfile?
        true
      else
        false
      end
    end
#generated_coffeescript? ⇒ Boolean
Internal: Is the blob JS generated by CoffeeScript?
Requires Blob#data
CoffeeScript is designed to emit JS that is hard to distinguish from hand-written code, so this method looks for a number of patterns produced by the CoffeeScript compiler.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 137
    def generated_coffeescript?
      return unless extname == '.js'

      if lines[0] == '(function() {' &&     # First line is module closure opening
          lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
          lines[-1] == ''                   # Last line is blank

        score = 0

        lines.each do |line|
          if line =~ /var /
            # Underscored temp vars are likely to be Coffee
            score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

            # bind and extend functions are very Coffee specific
            score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
          end
        end

        # Require a score of 3. This is fairly arbitrary. Consider
        # tweaking later.
        score >= 3
      else
        false
      end
    end
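The per-line scoring can be tried out on its own. The helper below is a hypothetical extraction (the name `coffee_score` is not part of the library) that uses the same two patterns and weights. It relies on `String#gsub` returning an Enumerator over the matches when called with only a pattern, so `count` gives the number of matches.

```ruby
# Hypothetical helper isolating the per-line score used above.
def coffee_score(line)
  # Only lines containing a `var ` declaration contribute.
  return 0 unless line =~ /var /

  # gsub with a pattern and no block returns an Enumerator of matches.
  temp_vars = line.gsub(/(_fn|_i|_len|_ref|_results)/).count
  helpers   = line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count

  temp_vars + 3 * helpers
end

coffee_score("  var _i, _len, _ref;")   # => 3 (three temp vars, weight 1 each)
coffee_score("var __slice = [].slice;") # => 3 (one helper, weight 3)
coffee_score("x = _i;")                 # => 0 (no `var ` on the line)
```

With the threshold of 3, a single helper declaration or three temporary variables is enough, provided the file also has the expected opening and closing closure lines.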
#generated_net_docfile? ⇒ Boolean
Internal: Is this a generated documentation file for a .NET assembly?
Requires Blob#data
.NET developers often check in the XML IntelliSense file alongside an assembly. These files have no special extension, so the contents must be inspected to determine whether the file is a docfile. Luckily, they are highly structured, so recognizing them is easy.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 174
    def generated_net_docfile?
      return false unless extname.downcase == ".xml"
      return false unless lines.count > 3

      # .NET Docfiles always open with <doc> and their first tag is an
      # <assembly> tag
      return lines[1].include?("<doc>") &&
        lines[2].include?("<assembly>") &&
        lines[-2].include?("</doc>")
    end
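As a rough illustration of what this check accepts, the sketch below applies the same three line tests to a small hand-written document. The function name `net_docfile?`, its parameters, and the sample XML are assumptions made for the example.

```ruby
# Hypothetical standalone version taking the extension and lines directly.
def net_docfile?(extname, lines)
  return false unless extname.downcase == '.xml'
  return false unless lines.count > 3

  lines[1].include?('<doc>') &&
    lines[2].include?('<assembly>') &&
    lines[-2].include?('</doc>')
end

# Invented sample; the heredoc ends with a newline.
xml = <<~XML
  <?xml version="1.0"?>
  <doc>
    <assembly>
      <name>Example</name>
    </assembly>
  </doc>
XML

with_newline    = xml.split("\n", -1)       # last element is ""
without_newline = xml.chomp.split("\n", -1) # last element is "</doc>"

net_docfile?('.XML', with_newline)    # => true
net_docfile?('.xml', without_newline) # => false
```

Because the closing tag is looked up at `lines[-2]`, the check expects the file to end with a trailing newline; the same document without one is not recognized.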
#guess_cls_language ⇒ Object
Internal: Guess language of .cls files
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 236
    def guess_cls_language
      if lines.grep(/^(%|\\)/).any?
        Language['TeX']
      elsif lines.grep(/^\s*(CLASS|METHOD|INTERFACE).*:\s*/i).any? || lines.grep(/^\s*(USING|DEFINE)/i).any?
        Language['OpenEdge ABL']
      elsif lines.grep(/\{$/).any? || lines.grep(/\}$/).any?
        Language['Apex']
      elsif lines.grep(/^(\'\*|Attribute|Option|Sub|Private|Protected|Public|Friend)/i).any?
        Language['Visual Basic']
      else
        # The most common language should be the fallback
        Language['TeX']
      end
    end
#guess_gsp_language ⇒ Object
Internal: Guess language of .gsp files.
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 359
    def guess_gsp_language
      if lines.grep(/<%|<%@|\$\{|<%|<g:|<meta name="layout"|<r:/).any?
        Language['Groovy Server Pages']
      else
        Language['Gosu']
      end
    end
#guess_h_language ⇒ Object
Internal: Guess language of header files (.h).
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 254
    def guess_h_language
      if lines.grep(/^@(interface|property|private|public|end)/).any?
        Language['Objective-C']
      elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
        Language['C++']
      else
        Language['C']
      end
    end
#guess_language ⇒ Object
Internal: Guess language
Please add additional test coverage to “test/test_blob.rb#test_language” if you make any changes.
Returns a Language or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 204
    def guess_language
      # Disambiguate between multiple language extensions
      disambiguate_extension_language ||

        # See if there is a Language for the extension
        pathname.language ||

        # Look for idioms in first line
        first_line_language ||

        # Try to detect Language from shebang line
        shebang_language
    end
#guess_m_language ⇒ Object
Internal: Guess language of .m files.
Objective-C heuristics:
- Keywords
Matlab heuristics:
- Leading function keyword
- “%” comments
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 274
    def guess_m_language
      # Objective-C keywords
      if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
        Language['Objective-C']

      # File function
      elsif lines.first.to_s =~ /^function /
        Language['Matlab']

      # Matlab comment
      elsif lines.grep(/^%/).any?
        Language['Matlab']

      # Fallback to Objective-C, don't want any Matlab false positives
      else
        Language['Objective-C']
      end
    end
#guess_pl_language ⇒ Object
Internal: Guess language of .pl files
The rules for disambiguation are:
- Many Perl files begin with a shebang
- Most Prolog source files have a rule somewhere (marked by the :- operator)
- Default to Perl, because it is more popular
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 302
    def guess_pl_language
      if shebang_script == 'perl'
        Language['Perl']
      elsif lines.grep(/:-/).any?
        Language['Prolog']
      else
        Language['Perl']
      end
    end
#guess_r_language ⇒ Object
Internal: Guess language of .r files.
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 315
    def guess_r_language
      if lines.grep(/(rebol|(:\s+func|make\s+object!|^\s*context)\s*\[)/i).any?
        Language['Rebol']
      else
        Language['R']
      end
    end
#guess_t_language ⇒ Object
Internal: Guess language of .t files.
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 326
    def guess_t_language
      score = 0
      score += 1 if lines.grep(/^% /).any?
      score += data.gsub(/ := /).count
      score += data.gsub(/proc |procedure |fcn |function /).count
      score += data.gsub(/var \w+: \w+/).count

      # Tell-tale signs its gotta be Perl
      if lines.grep(/^(my )?(sub |\$|@|%)\w+/).any?
        score = 0
      end

      if score >= 3
        Language['Turing']
      else
        Language['Perl']
      end
    end
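To see how the score accumulates, the sketch below reuses the same patterns in a hypothetical standalone function, `turing_score`, that takes the file contents as a String. Both sample inputs are invented for illustration.

```ruby
# Hypothetical standalone version of the scoring step; returns the score
# instead of a Language.
def turing_score(data)
  lines = data.split("\n", -1)

  score = 0
  score += 1 if lines.grep(/^% /).any?                        # Turing comment
  score += data.gsub(/ := /).count                            # assignments
  score += data.gsub(/proc |procedure |fcn |function /).count # routines
  score += data.gsub(/var \w+: \w+/).count                    # typed declarations

  # Any Perl-looking line resets the score to zero.
  score = 0 if lines.grep(/^(my )?(sub |\$|@|%)\w+/).any?
  score
end

turing = "% factorial\nfunction fact (n: int) : int\n  var r: int := 1\n  result r\nend fact\n"
perl   = 'my $count = 1;'

turing_score(turing) >= 3 # => true, so the method would pick Turing
turing_score(perl)        # => 0, so the method would pick Perl
```

Note that a single Perl-style line anywhere in the file overrides all Turing evidence.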
#guess_v_language ⇒ Object
Internal: Guess language of .v files.
Returns a Language
    # File 'lib/language_sniffer/blob_helper.rb', line 348
    def guess_v_language
      if lines.grep(/^(\/\*|\/\/|module|parameter|input|output|wire|reg|always|initial|begin|\`)/).any?
        Language['Verilog']
      else
        Language['Coq']
      end
    end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
    # File 'lib/language_sniffer/blob_helper.rb', line 190
    def language
      if defined? @language
        @language
      else
        @language = guess_language
      end
    end
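One observable effect of the `defined?` check, compared with the more common `||=` idiom, is that a nil result is cached as well. The source does not state whether this is the motivation. The class below is a hypothetical sketch that counts how often the guess is recomputed under each approach.

```ruby
# Hypothetical class comparing the two memoization styles.
class MemoSketch
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for a guess that finds no language.
  def guess
    @calls += 1
    nil
  end

  # Same shape as #language above: caches nil too.
  def with_defined
    if defined? @language
      @language
    else
      @language = guess
    end
  end

  # `||=` re-evaluates whenever the cached value is nil or false.
  def with_or_equals
    @other ||= guess
  end
end

a = MemoSketch.new
2.times { a.with_defined }
a.calls # => 1

b = MemoSketch.new
2.times { b.with_or_equals }
b.calls # => 2
```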
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines
    # File 'lib/language_sniffer/blob_helper.rb', line 34
    def lines
      @lines ||= begin
        (data ? data.split("\n", -1) : [])
      rescue ArgumentError # invalid byte sequence in UTF-8
        []
      end
    end
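The negative limit passed to `String#split` matters for the line counts and for checks that index from the end, such as `lines[-1] == ''`. A short illustration with an invented input string:

```ruby
data = "a\nb\n\n"

default_split = data.split("\n")     # => ["a", "b"]
full_split    = data.split("\n", -1) # => ["a", "b", "", ""]
```

Without the limit, trailing empty strings are dropped. With `-1` they are kept, so a file ending in a newline has an empty final element.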
#loc ⇒ Object
Public: Get number of lines of code
Requires Blob#data
Returns Integer
    # File 'lib/language_sniffer/blob_helper.rb', line 47
    def loc
      lines.size
    end
#minified_javascript? ⇒ Boolean
Internal: Is the blob minified JS?
Considers JS minified if the average line length is greater than 100 characters.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 123
    def minified_javascript?
      return unless extname == '.js'
      average_line_length > 100
    end
#pathname ⇒ Object
Internal: Get a Pathname wrapper for Blob#name
Returns a Pathname.
    # File 'lib/language_sniffer/blob_helper.rb', line 13
    def pathname
      Pathname.new(name || "")
    end
#shebang_language ⇒ Object
Internal: Get Language for shebang script
Returns the Language or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 432
    def shebang_language
      if script = shebang_script
        Language[script]
      end
    end
#shebang_script ⇒ Object
Internal: Extract the script name from the shebang line
Requires Blob#data
Examples
'#!/usr/bin/ruby'
# => 'ruby'
'#!/usr/bin/env ruby'
# => 'ruby'
'#!/usr/bash/python2.4'
# => 'python'
Please add additional test coverage to “test/test_blob.rb#test_shebang_script” if you make any changes.
Returns a script name String or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 397
    def shebang_script
      if lines.any? &&
          (match = lines[0].match(/(.+)\n?/)) &&
          (bang = match[0]) =~ /^#!/
        bang.sub!(/^#! /, '#!')
        tokens = bang.split(' ')
        pieces = tokens.first.split('/')
        if pieces.size > 1
          script = pieces.last
        else
          script = pieces.first.sub('#!', '')
        end

        script = script == 'env' ? tokens[1] : script

        # python2.4 => python
        if script =~ /((?:\d+\.?)+)/
          script.sub! $1, ''
        end

        # Check for multiline shebang hacks that exec themselves
        #
        #   #!/bin/sh
        #   exec foo "$0" "$@"
        #
        if script == 'sh' &&
            lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
          script = $1
        end

        script
      end
    end
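The parsing steps can be followed more easily in a simplified sketch. The function below is hypothetical: it takes only the first line as a String, and it omits the multiline `exec` check, so it is not a drop-in replacement for the method above.

```ruby
# Hypothetical simplified version operating on the first line only.
def shebang_script(first_line)
  return unless first_line =~ /^#!/

  bang   = first_line.sub(/^#! /, '#!') # tolerate "#! /bin/sh"
  tokens = bang.split(' ')
  pieces = tokens.first.split('/')

  script = pieces.size > 1 ? pieces.last : pieces.first.sub('#!', '')
  script = tokens[1] if script == 'env' # "#!/usr/bin/env ruby" => "ruby"

  # Strip the first version-like run of digits: "python2.4" => "python".
  script && script.sub(/(?:\d+\.?)+/, '')
end

shebang_script('#!/usr/bin/ruby')       # => "ruby"
shebang_script('#!/usr/bin/env ruby')   # => "ruby"
shebang_script('#!/usr/bash/python2.4') # => "python"
shebang_script('#! /bin/sh')            # => "sh"
shebang_script('puts 1')                # => nil
```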
#sloc ⇒ Object
Public: Get number of source lines of code
Requires Blob#data
Returns Integer
    # File 'lib/language_sniffer/blob_helper.rb', line 56
    def sloc
      lines.grep(/\S/).size
    end
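The difference between the two counts can be shown on a small invented input, split the same way as in #lines:

```ruby
lines = "def a\n\n  1\nend\n".split("\n", -1)
# lines is ["def a", "", "  1", "end", ""]

loc  = lines.size             # => 5 (blank line and trailing empty element included)
sloc = lines.grep(/\S/).size  # => 3 (only lines with a non-whitespace character)
```

Since the filter only tests for a non-whitespace character, comment-only lines still count toward sloc.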
#visual_studio_project_file? ⇒ Boolean
Internal: Is the blob a Visual Studio project file?
Generated if the file extension is a Visual Studio project file extension.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 113
    def visual_studio_project_file?
      ['.csproj', '.dbproj', '.fsproj', '.pyproj', '.rbproj', '.vbproj', '.vcxproj', '.wixproj', '.resx', '.sln', '.vdproj', '.isproj'].include?(extname)
    end
#xcode_project_file? ⇒ Boolean
Internal: Is the blob an XCode project file?
Generated if the file extension is an XCode project file extension.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 103
    def xcode_project_file?
      ['.xib', '.nib', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
    end