Module: LanguageSniffer::BlobHelper
- Included in:
- FileBlob
- Defined in:
- lib/language_sniffer/blob_helper.rb
Overview
BlobHelper is a mixin for Blobish classes that respond to “name”, “data” and “size” such as Grit::Blob.
Instance Method Summary
- #average_line_length ⇒ Object
  Internal: Compute average line length.
- #disambiguate_extension_language ⇒ Object
  Internal: Disambiguates between multiple language extensions.
- #extname ⇒ Object
  Public: Get the extname of the path.
- #first_line_language ⇒ Object
  Internal: Guess language from the first line.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #generated_coffeescript? ⇒ Boolean
  Internal: Is the blob JS generated by CoffeeScript?
- #generated_net_docfile? ⇒ Boolean
  Internal: Is this a generated documentation file for a .NET assembly?
- #guess_gsp_language ⇒ Object
  Internal: Guess language of .gsp files.
- #guess_h_language ⇒ Object
  Internal: Guess language of header files (.h).
- #guess_language ⇒ Object
  Internal: Guess language.
- #guess_m_language ⇒ Object
  Internal: Guess language of .m files.
- #guess_pl_language ⇒ Object
  Internal: Guess language of .pl files.
- #guess_r_language ⇒ Object
  Internal: Guess language of .r files.
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get number of lines of code.
- #minified_javascript? ⇒ Boolean
  Internal: Is the blob minified JS?
- #pathname ⇒ Object
  Internal: Get a Pathname wrapper for Blob#name.
- #shebang_language ⇒ Object
  Internal: Get Language for shebang script.
- #shebang_script ⇒ Object
  Internal: Extract the script name from the shebang line.
- #sloc ⇒ Object
  Public: Get number of source lines of code.
- #visual_studio_project_file? ⇒ Boolean
  Internal: Is the blob a Visual Studio project file?
- #xcode_project_file? ⇒ Boolean
  Internal: Is the blob an XCode project file?
Instance Method Details
#average_line_length ⇒ Object
Internal: Compute average line length.
Returns Integer.
    # File 'lib/language_sniffer/blob_helper.rb', line 58

    def average_line_length
      if lines.any?
        lines.inject(0) { |n, l| n += l.length } / lines.length
      else
        0
      end
    end
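The division above is integer division, so the result is truncated rather than rounded. A minimal standalone sketch of the same arithmetic follows; the `average_line_length(lines)` helper that takes an array argument is hypothetical and is not part of BlobHelper.

```ruby
# Hypothetical standalone helper (not the library method): mean line
# length using integer division, with 0 for empty input.
def average_line_length(lines)
  return 0 if lines.empty?

  lines.sum(&:length) / lines.length
end

puts average_line_length(["ab", "abcd", "abc"])  # (2 + 4 + 3) / 3
puts average_line_length(["ab", "abc"])          # 5 / 2, truncated
puts average_line_length([])
```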
#disambiguate_extension_language ⇒ Object
Internal: Disambiguates between multiple language extensions.
Delegates to “guess_EXTENSION_language”.
Please add additional test coverage to `test/test_blob.rb#test_language` if you add another method.
Returns a Language or nil.
    # File 'lib/language_sniffer/blob_helper.rb', line 221

    def disambiguate_extension_language
      if Language.ambiguous?(extname)
        name = "guess_#{extname.sub(/^\./, '')}_language"
        send(name) if respond_to?(name)
      end
    end
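The dispatch builds a method name from the extension and calls it only if the receiver defines it. The sketch below isolates that pattern; `DemoBlob` and its `AMBIGUOUS` list are hypothetical stand-ins for illustration (the real check is `Language.ambiguous?`), and the guess method returns a plain string rather than a Language.

```ruby
# Hypothetical stand-in for a blob; only the dispatch pattern is shown.
class DemoBlob
  AMBIGUOUS = ['.h', '.m'].freeze  # assumed list, for illustration only

  def initialize(extname)
    @extname = extname
  end

  # The only guess_*_language method this demo class defines.
  def guess_h_language
    'C'
  end

  def disambiguate
    return unless AMBIGUOUS.include?(@extname)

    name = "guess_#{@extname.sub(/^\./, '')}_language"
    send(name) if respond_to?(name)
  end
end

p DemoBlob.new('.h').disambiguate   # dispatches to guess_h_language
p DemoBlob.new('.m').disambiguate   # ambiguous, but no guess_m_language here
p DemoBlob.new('.rb').disambiguate  # not in the ambiguous list
```

An ambiguous extension without a matching guess method falls through to nil, which lets the caller continue with its other detection strategies.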
#extname ⇒ Object
Public: Get the extname of the path
Examples
    blob(name='foo.rb').extname
    # => '.rb'
Returns a String
    # File 'lib/language_sniffer/blob_helper.rb', line 24

    def extname
      pathname.extname
    end
#first_line_language ⇒ Object
Internal: Guess language from the first line.
Looks for a leading “<?php”.
Returns a Language, or nil if the first line does not start with “<?php”.
    # File 'lib/language_sniffer/blob_helper.rb', line 316

    def first_line_language
      if lines.first.to_s =~ /^<\?php/
        Language['PHP']
      end
    end
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
Requires Blob#data
Includes:
- XCode project XML files
- Visual Studio project XML files
- Minified JavaScript
- JavaScript generated by CoffeeScript
- Generated .NET documentation (docfile) XML
Please add additional test coverage to `test/test_blob.rb#test_generated` if you make any changes.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 82

    def generated?
      if xcode_project_file? || visual_studio_project_file?
        true
      elsif generated_coffeescript? || minified_javascript? || generated_net_docfile?
        true
      else
        false
      end
    end
#generated_coffeescript? ⇒ Boolean
Internal: Is the blob JS generated by CoffeeScript?
Requires Blob#data
CoffeeScript is meant to output JS that is hard to tell apart from hand-written code, so look for a number of patterns emitted by the CoffeeScript compiler.
Returns true or false, or nil when the extension is not '.js'.
    # File 'lib/language_sniffer/blob_helper.rb', line 132

    def generated_coffeescript?
      return unless extname == '.js'

      if lines[0] == '(function() {' &&     # First line is module closure opening
          lines[-2] == '}).call(this);' &&  # Second to last line closes module closure
          lines[-1] == ''                   # Last line is blank

        score = 0

        lines.each do |line|
          if line =~ /var /
            # Underscored temp vars are likely to be Coffee
            score += 1 * line.gsub(/(_fn|_i|_len|_ref|_results)/).count

            # bind and extend functions are very Coffee specific
            score += 3 * line.gsub(/(__bind|__extends|__hasProp|__indexOf|__slice)/).count
          end
        end

        # Require a score of 3. This is fairly arbitrary. Consider
        # tweaking later.
        score >= 3
      else
        false
      end
    end
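To make the scoring step concrete, here is a rough standalone sketch of it alone, without the closure-wrapper checks. The `coffee_score` helper is hypothetical, and it uses `String#scan` in place of the block-less `gsub(...).count` from the listing, which should count the same matches.

```ruby
# Hypothetical helper covering only the scoring loop: each temp-variable
# name on a `var ` line scores 1, each CoffeeScript runtime helper scores 3.
def coffee_score(lines)
  lines.sum do |line|
    next 0 unless line =~ /var /

    temps   = line.scan(/_fn|_i|_len|_ref|_results/).size
    helpers = line.scan(/__bind|__extends|__hasProp|__indexOf|__slice/).size
    temps + 3 * helpers
  end
end

puts coffee_score(["  var _i, _len, _ref;"])     # three temp vars
puts coffee_score(["var __slice = [].slice;"])   # one runtime helper
puts coffee_score(["x = _i;"])                   # no `var `, so not scored
```

With the threshold of 3 from the listing, either a single runtime helper or three temp variables on `var` lines is enough to flag the file.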
#generated_net_docfile? ⇒ Boolean
Internal: Is this a generated documentation file for a .NET assembly?
Requires Blob#data
.NET developers often check in the XML Intellisense file along with an assembly - however, these don’t have a special extension, so we have to dig into the contents to determine if it’s a docfile. Luckily, these files are extremely structured, so recognizing them is easy.
Returns true or false
    # File 'lib/language_sniffer/blob_helper.rb', line 169

    def generated_net_docfile?
      return false unless extname.downcase == ".xml"
      return false unless lines.count > 3

      # .NET Docfiles always open with <doc> and their first tag is an
      # <assembly> tag
      return lines[1].include?("<doc>") &&
        lines[2].include?("<assembly>") &&
        lines[-2].include?("</doc>")
    end
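The check is purely positional: second line, third line, and second-to-last line. A small standalone sketch, assuming a hypothetical `net_docfile?(extname, data)` helper that splits the data the same way `#lines` does:

```ruby
# Hypothetical standalone check mirroring the three positional tests.
def net_docfile?(extname, data)
  lines = data.split("\n", -1)
  return false unless extname.downcase == '.xml'
  return false unless lines.count > 3

  lines[1].include?('<doc>') &&
    lines[2].include?('<assembly>') &&
    lines[-2].include?('</doc>')
end

xml = <<~XML
  <?xml version="1.0"?>
  <doc>
    <assembly>
      <name>Example</name>
    </assembly>
  </doc>
XML

p net_docfile?('.XML', xml)        # extension compare is case-insensitive
p net_docfile?('.xml', xml.chomp)  # no trailing newline: </doc> is now the last line
```

Because `lines[-2]` is tested, the sketch (like the listing) only matches when the closing `</doc>` is followed by exactly one trailing newline.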
#guess_gsp_language ⇒ Object
Internal: Guess language of .gsp files.
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 303

    def guess_gsp_language
      if lines.grep(/<%|<%@|\$\{|<%|<g:|<meta name="layout"|<r:/).any?
        Language['Groovy Server Pages']
      else
        Language['Gosu']
      end
    end
#guess_h_language ⇒ Object
Internal: Guess language of header files (.h).
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 231

    def guess_h_language
      if lines.grep(/^@(interface|property|private|public|end)/).any?
        Language['Objective-C']
      elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
        Language['C++']
      else
        Language['C']
      end
    end
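As an illustration of how the two patterns separate the three languages, the following sketch applies the same regular expressions to raw strings. The `guess_h_language(data)` function taking a string is a hypothetical standalone variant, and it returns language names as plain strings instead of Language objects.

```ruby
# Hypothetical standalone variant: same regexes as the listing, applied in
# the same order, returning names as strings.
def guess_h_language(data)
  lines = data.split("\n", -1)
  if lines.grep(/^@(interface|property|private|public|end)/).any?
    'Objective-C'
  elsif lines.grep(/^class |^\s+(public|protected|private):/).any?
    'C++'
  else
    'C'
  end
end

puts guess_h_language("@interface Foo : NSObject\n@end\n")
puts guess_h_language("class Foo {\n  public:\n    int x;\n};\n")
puts guess_h_language("int add(int a, int b);\n")
```

Note that the access-specifier pattern requires leading whitespace, so an unindented `public:` alone would not count as C++ evidence.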
#guess_language ⇒ Object
Internal: Guess language
Please add additional test coverage to `test/test_blob.rb#test_language` if you make any changes.
Returns a Language or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 199

    def guess_language
      # Disambiguate between multiple language extensions
      disambiguate_extension_language ||

        # See if there is a Language for the extension
        pathname.language ||

        # Look for idioms in first line
        first_line_language ||

        # Try to detect Language from shebang line
        shebang_language
    end
#guess_m_language ⇒ Object
Internal: Guess language of .m files.
Objective-C heuristics:
- Keywords
Matlab heuristics:
- Leading function keyword
- “%” comments
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 251

    def guess_m_language
      # Objective-C keywords
      if lines.grep(/^#import|@(interface|implementation|property|synthesize|end)/).any?
        Language['Objective-C']

      # File function
      elsif lines.first.to_s =~ /^function /
        Language['Matlab']

      # Matlab comment
      elsif lines.grep(/^%/).any?
        Language['Matlab']

      # Fallback to Objective-C, don't want any Matlab false positives
      else
        Language['Objective-C']
      end
    end
#guess_pl_language ⇒ Object
Internal: Guess language of .pl files
The rules for disambiguation are:
- Many Perl files begin with a shebang
- Most Prolog source files have a rule somewhere (marked by the :- operator)
- Default to Perl, because it is more popular
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 279

    def guess_pl_language
      if shebang_script == 'perl'
        Language['Perl']
      elsif lines.grep(/:-/).any?
        Language['Prolog']
      else
        Language['Perl']
      end
    end
#guess_r_language ⇒ Object
Internal: Guess language of .r files.
Returns a Language.
    # File 'lib/language_sniffer/blob_helper.rb', line 292

    def guess_r_language
      if lines.grep(/(rebol|(:\s+func|make\s+object!|^\s*context)\s*\[)/i).any?
        Language['Rebol']
      else
        Language['R']
      end
    end
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
    # File 'lib/language_sniffer/blob_helper.rb', line 185

    def language
      if defined? @language
        @language
      else
        @language = guess_language
      end
    end
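One plausible reason for `defined?` here rather than `@language ||= guess_language` is that a nil result is cached as well, so a blob with no detectable language is only guessed once. The sketch below demonstrates that behaviour with a hypothetical `MemoDemo` class whose `guess_language` always returns nil and counts its invocations.

```ruby
# Hypothetical class: memoization via `defined?` caches nil results too.
class MemoDemo
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def language
    if defined? @language
      @language
    else
      @language = guess_language
    end
  end

  private

  # Stand-in for the real guess: counts invocations, detects nothing.
  def guess_language
    @calls += 1
    nil
  end
end

demo = MemoDemo.new
demo.language
demo.language
puts demo.calls  # the guess ran once even though it returned nil
```

With `||=`, the second call would have run the guess again because the cached value is falsy.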
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines
    # File 'lib/language_sniffer/blob_helper.rb', line 33

    def lines
      @lines ||= (data ? data.split("\n", -1) : [])
    end
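The negative limit passed to `String#split` matters: without it, trailing empty fields are dropped, and checks elsewhere in this module that test `lines[-1] == ''` could never match. A short sketch, using a hypothetical `split_lines` helper that mirrors the expression above:

```ruby
# Hypothetical helper mirroring the expression in #lines.
def split_lines(data)
  data ? data.split("\n", -1) : []
end

p "a\nb\n".split("\n")    # default limit: trailing empty field dropped
p split_lines("a\nb\n")   # limit -1: trailing empty field kept
p split_lines("")         # empty data yields no lines
p split_lines(nil)        # missing data yields no lines
```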
#loc ⇒ Object
Public: Get number of lines of code
Requires Blob#data
Returns Integer
    # File 'lib/language_sniffer/blob_helper.rb', line 42

    def loc
      lines.size
    end
#minified_javascript? ⇒ Boolean
Internal: Is the blob minified JS?
Consider JS minified if the average line length is greater than 100 characters.
Returns true or false, or nil when the extension is not '.js'.
    # File 'lib/language_sniffer/blob_helper.rb', line 118

    def minified_javascript?
      return unless extname == '.js'
      average_line_length > 100
    end
#pathname ⇒ Object
Internal: Get a Pathname wrapper for Blob#name
Returns a Pathname.
    # File 'lib/language_sniffer/blob_helper.rb', line 12

    def pathname
      Pathname.new(name || "")
    end
#shebang_language ⇒ Object
Internal: Get Language for shebang script
Returns the Language or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 376

    def shebang_language
      if script = shebang_script
        Language[script]
      end
    end
#shebang_script ⇒ Object
Internal: Extract the script name from the shebang line
Requires Blob#data
Examples
    '#!/usr/bin/ruby'
    # => 'ruby'
    '#!/usr/bin/env ruby'
    # => 'ruby'
    '#!/usr/bash/python2.4'
    # => 'python'
Please add additional test coverage to `test/test_blob.rb#test_shebang_script` if you make any changes.
Returns a script name String or nil
    # File 'lib/language_sniffer/blob_helper.rb', line 341

    def shebang_script
      if lines.any? && (match = lines[0].match(/(.+)\n?/)) && (bang = match[0]) =~ /^#!/
        bang.sub!(/^#! /, '#!')
        tokens = bang.split(' ')
        pieces = tokens.first.split('/')
        if pieces.size > 1
          script = pieces.last
        else
          script = pieces.first.sub('#!', '')
        end

        script = script == 'env' ? tokens[1] : script

        # python2.4 => python
        if script =~ /((?:\d+\.?)+)/
          script.sub! $1, ''
        end

        # Check for multiline shebang hacks that exec themselves
        #
        #   #!/bin/sh
        #   exec foo "$0" "$@"
        #
        if script == 'sh' &&
            lines[0...5].any? { |l| l.match(/exec (\w+).+\$0.+\$@/) }
          script = $1
        end

        script
      end
    end
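The single-line part of this logic can be tried in isolation. The sketch below is a simplified port under stated assumptions: `shebang_script(first_line)` is a hypothetical standalone function that takes the first line as a string, and it deliberately omits the multiline `exec` check.

```ruby
# Hypothetical standalone port of the single-line logic only.
def shebang_script(first_line)
  return nil unless first_line.to_s.start_with?('#!')

  bang   = first_line.sub(/^#! /, '#!')   # tolerate one space after "#!"
  tokens = bang.split(' ')
  pieces = tokens.first.split('/')
  script = pieces.size > 1 ? pieces.last : pieces.first.sub('#!', '')
  script = tokens[1] if script == 'env'   # "#!/usr/bin/env ruby" names ruby
  return nil if script.nil?

  script.sub(/(?:\d+\.?)+/, '')           # strip a version: python2.4 => python
end

p shebang_script('#!/usr/bin/ruby')
p shebang_script('#!/usr/bin/env ruby')
p shebang_script('#!/usr/bash/python2.4')
p shebang_script('puts 1')
```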
#sloc ⇒ Object
Public: Get number of source lines of code
Requires Blob#data
Returns Integer
    # File 'lib/language_sniffer/blob_helper.rb', line 51

    def sloc
      lines.grep(/\S/).size
    end
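The difference between `loc` and `sloc` is easiest to see side by side. In this sketch, `count_lines` is a hypothetical helper that returns both counts for a string, splitting it the same way `#lines` does.

```ruby
# Hypothetical helper returning [loc, sloc] for a string of source text.
def count_lines(data)
  lines = data.split("\n", -1)
  loc   = lines.size             # every element, including the trailing ""
  sloc  = lines.grep(/\S/).size  # only lines with a non-whitespace character
  [loc, sloc]
end

loc, sloc = count_lines("def hi\n\n  puts 'hi'\nend\n")
puts "loc=#{loc} sloc=#{sloc}"
```

Note that `loc` counts the empty string after a trailing newline as a line, so a four-line file ending in a newline reports 5.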
#visual_studio_project_file? ⇒ Boolean
Internal: Is the blob a Visual Studio project file?
Generated if the file extension is a Visual Studio project file extension.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 108

    def visual_studio_project_file?
      ['.csproj', '.dbproj', '.fsproj', '.pyproj', '.rbproj', '.vbproj', '.vcxproj', '.wixproj', '.resx', '.sln', '.vdproj', '.isproj'].include?(extname)
    end
#xcode_project_file? ⇒ Boolean
Internal: Is the blob an XCode project file?
Generated if the file extension is an XCode project file extension.
Returns true or false.
    # File 'lib/language_sniffer/blob_helper.rb', line 98

    def xcode_project_file?
      ['.xib', '.nib', '.pbxproj', '.xcworkspacedata', '.xcuserstate'].include?(extname)
    end