Module: Linguist::BlobHelper
- Included in: FileBlob
- Defined in: lib/linguist/blob_helper.rb
Overview
BlobHelper is a mixin for Blobish classes, such as Grit::Blob, that respond to “name”, “data” and “size”.
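A minimal sketch of that duck-typed contract, assuming a hypothetical `BlobSketch` struct and a cut-down, hypothetical `MiniBlobHelper` module that reimplements #extname and #large? as documented in this module. Neither is part of Linguist; the point is that any object answering `name`, `data` and `size` can include such a helper.

```ruby
# Hypothetical "Blobish" class: it only needs to respond to name, data and size.
BlobSketch = Struct.new(:name, :data) do
  def size
    data.nil? ? 0 : data.bytesize
  end
end

# Hypothetical cut-down helper that relies only on name/data/size,
# the same contract BlobHelper expects from its host class.
module MiniBlobHelper
  MEGABYTE = 1024 * 1024

  def extname
    File.extname(name.to_s)
  end

  def large?
    size.to_i > MEGABYTE
  end
end

BlobSketch.include(MiniBlobHelper)

blob = BlobSketch.new("app/models/user.rb", "class User; end\n")
blob.extname # => ".rb"
blob.large?  # => false
```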
Constant Summary
- MEGABYTE = 1024 * 1024
- VendoredRegexp = Regexp.new(vendored_paths.join('|'))
Instance Method Summary
- #_mime_type ⇒ Object
  Internal: Look up the mime type for the extension.
- #binary? ⇒ Boolean
  Public: Is the blob binary?
- #binary_mime_type? ⇒ Boolean
  Internal: Is the blob binary according to its mime type?
- #colorize(options = {}) ⇒ Object
  Public: Highlight the syntax of the blob.
- #colorize_without_wrapper(options = {}) ⇒ Object
  Public: Highlight the syntax of the blob without the outer highlight div wrapper.
- #content_type ⇒ Object
  Public: Get the Content-Type header value.
- #detect_encoding ⇒ Object
  Try to guess the encoding.
- #disposition ⇒ Object
  Public: Get the Content-Disposition header value.
- #encoding ⇒ Object
- #extname ⇒ Object
  Public: Get the extname of the path.
- #generated? ⇒ Boolean
  Public: Is the blob a generated file?
- #high_ratio_of_long_lines? ⇒ Boolean
  Internal: Does the blob have a high ratio of long lines?
- #image? ⇒ Boolean
  Public: Is the blob a supported image format?
- #indexable? ⇒ Boolean
  Public: Should the blob be indexed for searching?
- #language ⇒ Object
  Public: Detects the Language of the blob.
- #large? ⇒ Boolean
  Public: Is the blob too big to load?
- #lexer ⇒ Object
  Internal: Get the lexer of the blob.
- #line_split_character ⇒ Object
  Character used to split lines.
- #lines ⇒ Object
  Public: Get each line of data.
- #loc ⇒ Object
  Public: Get the number of lines of code.
- #mac_format? ⇒ Boolean
  Public: Is the data in Mac Format?
- #mime_type ⇒ Object
  Public: Get the actual blob mime type.
- #safe_to_colorize? ⇒ Boolean
  Public: Is the blob safe to colorize?
- #sloc ⇒ Object
  Public: Get the number of source lines of code.
- #solid? ⇒ Boolean
  Public: Is the blob a supported 3D model format?
- #text? ⇒ Boolean
  Public: Is the blob text?
- #vendored? ⇒ Boolean
  Public: Is the blob in a vendored directory?
- #viewable? ⇒ Boolean
  Public: Is the blob viewable?
Instance Method Details
#_mime_type ⇒ Object
Internal: Look up the mime type for the extension.
Returns a MIME::Type, or nil if the extension matches no known type.
```ruby
# File 'lib/linguist/blob_helper.rb', line 29
def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
```
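The selection rule on its own, as a sketch: the hypothetical `Guess` struct stands in for the MIME::Type objects returned by the mime-types gem and carries only the two attributes the rule looks at; `pick_mime_type` is likewise hypothetical.

```ruby
# Hypothetical stand-in for MIME::Type: just a name and an ASCII flag.
Guess = Struct.new(:content_type, :ascii) do
  def ascii?
    ascii
  end
end

# Same rule as _mime_type: first text (ASCII) guess wins,
# otherwise the first guess, or nil when there are no guesses.
def pick_mime_type(guesses)
  guesses.detect(&:ascii?) || guesses.first
end

guesses = [Guess.new("application/octet-stream", false),
           Guess.new("text/plain", true)]
pick_mime_type(guesses).content_type # => "text/plain"
pick_mime_type([])                   # => nil
```

Because _mime_type memoizes with `defined?` rather than `||=`, a nil result (no guesses for the extension) is cached too.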
#binary? ⇒ Boolean
Public: Is the blob binary?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 113
def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
```
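The branch order can be sketched as a pure function. `binary_blob?` is hypothetical, and its `detection` argument stands in for whatever #detect_encoding returns (a Hash with :encoding and :type, or nil).

```ruby
# Hypothetical helper mirroring the order of checks in binary?.
def binary_blob?(data, detection)
  if data.nil?
    true                                   # not loaded: treat as binary
  elsif data == ""
    false                                  # empty file: treat as text
  elsif detection.nil? || detection[:encoding].nil?
    true                                   # no encoding detected
  else
    detection[:type] == :binary            # trust the detector's verdict
  end
end

binary_blob?(nil, nil)                                # => true
binary_blob?("", nil)                                 # => false
binary_blob?("puts 1", nil)                           # => true
binary_blob?("puts 1", { encoding: "UTF-8", type: :text }) # => false
```

Since #detect_encoding in this version always returns nil, every loaded, non-empty blob takes the third branch and is reported as binary.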
#binary_mime_type? ⇒ Boolean
Internal: Is the blob binary according to its mime type?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 57
def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end
```
#colorize(options = {}) ⇒ Object
Public: Highlight the syntax of the blob.
options - A Hash of options (defaults to {})
Returns an HTML String, or nil when the blob is not safe to colorize.
```ruby
# File 'lib/linguist/blob_helper.rb', line 331
def colorize(options = {})
  return unless safe_to_colorize?
  options[:options] ||= {}
  options[:options][:encoding] ||= encoding
  lexer.highlight(data, options)
end
```
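How #colorize fills in the nested lexer options, as a sketch with a hypothetical `with_lexer_defaults` helper and a plain Hash: a caller-supplied :encoding wins, and the Hash passed in is modified in place.

```ruby
# Hypothetical helper reproducing colorize's option defaulting.
def with_lexer_defaults(options, encoding)
  options[:options] ||= {}                    # create the nested Hash if missing
  options[:options][:encoding] ||= encoding   # only set when the caller did not
  options
end

defaults = with_lexer_defaults({}, "UTF-8")
defaults[:options][:encoding] # => "UTF-8"

custom = with_lexer_defaults({ options: { encoding: "ISO-8859-1" } }, "UTF-8")
custom[:options][:encoding]   # => "ISO-8859-1"
```

Because the default argument `options = {}` creates a fresh Hash on each call, the in-place mutation only affects callers that pass their own Hash.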
#colorize_without_wrapper(options = {}) ⇒ Object
Public: Highlight syntax of blob without the outer highlight div wrapper.
options - A Hash of options (defaults to {})
Returns an HTML String ('' when the blob is not safe to colorize).
```ruby
# File 'lib/linguist/blob_helper.rb', line 344
def colorize_without_wrapper(options = {})
  if text = colorize(options)
    text[%r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m, 1]
  else
    ''
  end
end
```
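The unwrapping step uses `String#[]` with a capture index. A sketch with the same pattern applied to made-up highlighter output (the sample HTML is hypothetical):

```ruby
WRAPPER = %r{<div class="highlight"><pre>(.*?)</pre>\s*</div>}m

html = %Q{<div class="highlight"><pre><span class="k">def</span> foo\nend\n</pre></div>}

# The /m flag lets `.` match newlines, so multi-line code is captured;
# `.*?` stops at the first </pre>.
inner = html[WRAPPER, 1]
puts inner
# <span class="k">def</span> foo
# end
```

If the highlighter's output does not match the pattern, `String#[]` returns nil, so colorize_without_wrapper returns nil in that case rather than ''.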
#content_type ⇒ Object
Public: Get the Content-Type header value
This value is used when serving raw blobs.
Examples
# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'
Returns a content type String.
```ruby
# File 'lib/linguist/blob_helper.rb', line 71
def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
```
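The same decision with the blob's state passed in explicitly, as a sketch; `content_type_for` and its keyword arguments are hypothetical, with `binary:` standing for `binary_mime_type? || binary?`.

```ruby
# Hypothetical helper mirroring content_type's branches.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type                                 # serve binaries with their real type
  elsif encoding
    "text/plain; charset=#{encoding.downcase}"
  else
    "text/plain"
  end
end

content_type_for(binary: false, mime_type: "text/html", encoding: "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(binary: true, mime_type: "image/png", encoding: nil)
# => "image/png"
```

Note that text blobs are always served as text/plain, even when #mime_type would report something like text/html.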
#detect_encoding ⇒ Object
Try to guess the encoding.
Returns a Hash with :encoding, :confidence and :type keys, or nil if an error occurred during detection or no valid encoding could be found.
```ruby
# File 'lib/linguist/blob_helper.rb', line 106
def detect_encoding
  # Detection is disabled in this version: the method always returns nil.
  nil # @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
```
#disposition ⇒ Object
Public: Get the Content-Disposition header value
This value is used when serving raw blobs.
# => "attachment; filename=file.tar"
# => "inline"
Returns a content disposition String.
```ruby
# File 'lib/linguist/blob_helper.rb', line 84
def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    #"attachment; filename=#{EscapeUtils.escape_url(File.basename(name))}"
    "attachment; filename=#{CGI.escape(File.basename(name))}"
  end
end
```
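A sketch of the same branching with the blob's state passed in; `disposition_for` and its `inline:` flag (standing for `text? || image?`) are hypothetical, while `CGI.escape` is the standard-library call the method uses.

```ruby
require "cgi"

# Hypothetical helper mirroring disposition's branches.
def disposition_for(name, inline:)
  if inline
    "inline"
  elsif name.nil?
    "attachment"
  else
    # Only the basename is sent; CGI.escape form-encodes it.
    "attachment; filename=#{CGI.escape(File.basename(name))}"
  end
end

disposition_for("pkg/release 1.0.tar", inline: false)
# => "attachment; filename=release+1.0.tar"
```

`CGI.escape` uses form encoding, so spaces become "+" rather than "%20".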
#encoding ⇒ Object
Returns the :encoding value from #detect_encoding, or nil when no encoding was detected.
```ruby
# File 'lib/linguist/blob_helper.rb', line 95
def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end
```
#extname ⇒ Object
Public: Get the extname of the path
Examples
blob(name='foo.rb').extname
# => '.rb'
Returns a String
```ruby
# File 'lib/linguist/blob_helper.rb', line 22
def extname
  File.extname(name.to_s)
end
```
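Because #extname is a thin wrapper around `File.extname`, it inherits that method's edge cases. A few of them, using only the standard library:

```ruby
File.extname("foo.rb")         # => ".rb"
File.extname("archive.tar.gz") # => ".gz"  (only the last extension)
File.extname("Makefile")       # => ""
File.extname(".bashrc")        # => ""     (a leading dot is not an extension)
File.extname(nil.to_s)         # => ""     (why extname calls name.to_s)
```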
#generated? ⇒ Boolean
Public: Is the blob a generated file?
Generated source code is suppressed in diffs and is ignored by language statistics.
May load Blob#data
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 268
def generated?
  @_generated ||= Generated.generated?(name, lambda { data })
end
```
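#generated? memoizes with `||=`, which only caches truthy values, so for a non-generated file the check runs again on every call; #language and #_mime_type use a `defined?` guard instead, which also caches falsy results. The difference, sketched with a hypothetical `Checker` class that counts calls:

```ruby
# Hypothetical class; expensive_check stands in for Generated.generated?.
class Checker
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def expensive_check
    @calls += 1
    false
  end

  # Like generated?: a false result is never cached.
  def with_or_equals
    @a ||= expensive_check
  end

  # Like language: any result, including false or nil, is cached.
  def with_defined
    return @b if defined?(@b)
    @b = expensive_check
  end
end

c = Checker.new
3.times { c.with_or_equals }
c.calls # => 3

d = Checker.new
3.times { d.with_defined }
d.calls # => 1
```

The `lambda { data }` argument in the original defers loading Blob#data until the check actually needs it.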
#high_ratio_of_long_lines? ⇒ Boolean
Internal: Does the blob have a high ratio of long lines (more than 5000 bytes per line on average)?
These types of files are usually going to make Pygments.rb angry if we try to colorize them.
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 180
def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
```
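The heuristic with size and line count passed in, as a sketch with a hypothetical helper. `size / loc` is integer division, so the average is truncated before the comparison.

```ruby
# Hypothetical helper with the same arithmetic as high_ratio_of_long_lines?.
def long_line_ratio?(size, loc)
  return false if loc == 0
  size / loc > 5000
end

long_line_ratio?(60_000, 10)    # => true  (6000 bytes per line)
long_line_ratio?(50_009, 10)    # => false (50_009 / 10 == 5000, not > 5000)
long_line_ratio?(1_000_000, 0)  # => false (no lines counted)
```

Since this is an average, a single very long minified line among many short lines may not trip the check.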
#image? ⇒ Boolean
Public: Is the blob a supported image format?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 142
def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname)
end
```
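A sketch of the same check on a bare file name (`image_name?` is hypothetical). `File.extname` keeps the original case, so the lookup is case-sensitive.

```ruby
IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif']

# Hypothetical helper mirroring image?.
def image_name?(name)
  IMAGE_EXTENSIONS.include?(File.extname(name.to_s))
end

image_name?("logo.png") # => true
image_name?("logo.PNG") # => false (".PNG" is not in the list)
image_name?("logo.svg") # => false
```

A case-insensitive variant would compare `File.extname(name.to_s).downcase`; that is not what the documented method does.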
#indexable? ⇒ Boolean
Public: Should the blob be indexed for searching?
Excluded:
- Files over 0.1 MB (100 * 1024 bytes)
- Non-text files
- Languages marked as not searchable
- Generated source files
Small, non-binary files with the .txt extension skip the language and generated-file checks and are always indexed.
Please add additional test coverage to ‘test/test_blob.rb#test_indexable’ if you make any changes.
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 284
def indexable?
  if size > 100 * 1024
    false
  elsif binary?
    false
  elsif extname == '.txt'
    true
  elsif language.nil?
    false
  elsif !language.searchable?
    false
  elsif generated?
    false
  else
    true
  end
end
```
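The checks in their original order, as a sketch with the blob's state passed in as keywords. `indexable_blob?` and its keyword names are hypothetical; `language:` stands for the detected Language (or nil) and `searchable:` for `language.searchable?`.

```ruby
# Hypothetical helper; order matters, since '.txt' short-circuits
# before the language and generated-file checks.
def indexable_blob?(size:, binary:, extname:, language:, searchable:, generated:)
  return false if size > 100 * 1024
  return false if binary
  return true  if extname == '.txt'
  return false if language.nil?
  return false unless searchable
  return false if generated
  true
end

indexable_blob?(size: 2_048, binary: false, extname: '.txt',
                language: nil, searchable: false, generated: true)   # => true
indexable_blob?(size: 2_048, binary: false, extname: '.rb',
                language: 'Ruby', searchable: true, generated: true) # => false
```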
#language ⇒ Object
Public: Detects the Language of the blob.
May load Blob#data
Returns a Language or nil if none is detected
```ruby
# File 'lib/linguist/blob_helper.rb', line 307
def language
  return @language if defined? @language

  if defined?(@data) && @data.is_a?(String)
    data = @data
  else
    data = lambda { (binary_mime_type? || binary?) ? "" : self.data }
  end

  @language = Language.detect(name.to_s, data, mode)
end
```
#large? ⇒ Boolean
Public: Is the blob too big to load?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 158
def large?
  size.to_i > MEGABYTE
end
```
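The threshold in isolation, as a sketch with a hypothetical `large_size?` helper: exactly one MiB is still loadable, and an unknown (nil) size counts as small because `nil.to_i` is 0.

```ruby
MEGABYTE = 1024 * 1024

# Hypothetical helper with the same comparison as large?.
def large_size?(size)
  size.to_i > MEGABYTE
end

large_size?(MEGABYTE)     # => false
large_size?(MEGABYTE + 1) # => true
large_size?(nil)          # => false
```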
#lexer ⇒ Object
Internal: Get the lexer of the blob.
Returns a Lexer.
```ruby
# File 'lib/linguist/blob_helper.rb', line 322
def lexer
  language ? language.lexer : Pygments::Lexer.find_by_name('Text only')
end
```
#line_split_character ⇒ Object
Character used to split lines. This is almost always “\n”, except when Mac Format is detected, in which case it’s “\r”.
Returns a split pattern string.
```ruby
# File 'lib/linguist/blob_helper.rb', line 227
def line_split_character
  @line_split_character ||= (mac_format? ? "\r" : "\n")
end
```
#lines ⇒ Object
Public: Get each line of data
Requires Blob#data
Returns an Array of lines (empty if the blob is not viewable).
```ruby
# File 'lib/linguist/blob_helper.rb', line 214
def lines
  @lines ||=
    if viewable? && data
      data.split(line_split_character, -1)
    else
      []
    end
end
```
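The `-1` limit passed to `String#split` keeps trailing empty fields, so a file ending in a newline yields a final empty line. A standard-library illustration:

```ruby
data = "first\nsecond\n"

data.split("\n")     # => ["first", "second"]
data.split("\n", -1) # => ["first", "second", ""]  (trailing empty field kept)
"".split("\n", -1)   # => []
```

That extra empty element is counted by #loc.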
#loc ⇒ Object
Public: Get the number of lines of code.
Requires Blob#data
Returns an Integer.
```ruby
# File 'lib/linguist/blob_helper.rb', line 247
def loc
  lines.size
end
```
#mac_format? ⇒ Boolean
Public: Is the data in Mac Format? This format uses \r (0x0d) characters for line ends and does not include a \n (0x0a).
Returns true when Mac Format is detected; otherwise false, or nil when the blob is not viewable or has no \r in its first 4096 characters.
```ruby
# File 'lib/linguist/blob_helper.rb', line 235
def mac_format?
  return if !viewable?
  if pos = data[0, 4096].index("\r")
    data[pos + 1] != ?\n
  end
end
```
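The same test applied to a plain String, as a sketch with a hypothetical `mac_format_string?` helper (the viewable? guard is left out). Only the first "\r" in the first 4096 characters is inspected.

```ruby
# Hypothetical helper mirroring mac_format?'s check.
def mac_format_string?(data)
  if (pos = data[0, 4096].index("\r"))
    data[pos + 1] != "\n"      # CR not followed by LF means Mac line endings
  end
end

mac_format_string?("a\rb\r") # => true  (bare CR line endings)
mac_format_string?("a\r\nb") # => false (Windows CRLF)
mac_format_string?("a\nb")   # => nil   (no CR found)

# When it returns true, line_split_character is "\r":
"a\rb\r".split("\r", -1)     # => ["a", "b", ""]
```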
#mime_type ⇒ Object
Public: Get the actual blob mime type
Examples
# => 'text/plain'
# => 'text/html'
Returns a mime type String.
```ruby
# File 'lib/linguist/blob_helper.rb', line 50
def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end
```
#safe_to_colorize? ⇒ Boolean
Public: Is the blob safe to colorize?
We use Pygments.rb for syntax highlighting blobs, which has some quirks and is also essentially ‘un-killable’ via a normal timeout. To work around this, we carefully avoid handing Pygments.rb anything it can’t handle: large blobs, non-text blobs and blobs with a high ratio of long lines.
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 170
def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end
```
#sloc ⇒ Object
Public: Get the number of source lines of code, that is, lines containing at least one non-whitespace character.
Requires Blob#data
Returns an Integer.
```ruby
# File 'lib/linguist/blob_helper.rb', line 256
def sloc
  lines.grep(/\S/).size
end
```
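The difference between #loc and #sloc on a small made-up file, using the same split and grep as the methods above:

```ruby
lines = "def foo\n\n  42\n  \nend\n".split("\n", -1)
# => ["def foo", "", "  42", "  ", "end", ""]

loc  = lines.size            # => 6 (blank and whitespace-only lines included)
sloc = lines.grep(/\S/).size # => 3 (lines with a non-whitespace character)
```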
#solid? ⇒ Boolean
Public: Is the blob a supported 3D model format?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 149
def solid?
  ['.stl', '.obj'].include?(extname)
end
```
#text? ⇒ Boolean
Public: Is the blob text?
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 135
def text?
  !binary?
end
```
#vendored? ⇒ Boolean
Public: Is the blob in a vendored directory?
Vendored files are ignored by language statistics.
See “vendor.yml” for a list of vendored conventions that match this pattern.
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 205
def vendored?
  name =~ VendoredRegexp ? true : false
end
```
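How VendoredRegexp is assembled and used, as a sketch. The three patterns below are hypothetical placeholders, not the contents of vendor.yml, and `vendored_name?` is a hypothetical helper.

```ruby
# Hypothetical pattern list standing in for vendor.yml.
vendored_paths = ['(^|/)vendor/', '(^|/)node_modules/', '\.min\.js$']

# Alternation has the lowest precedence, so each pattern keeps its own anchors.
VendoredRegexp = Regexp.new(vendored_paths.join('|'))

def vendored_name?(name)
  name =~ VendoredRegexp ? true : false
end

vendored_name?("vendor/rails/init.rb")    # => true
vendored_name?("app/node_modules/x/y.js") # => true
vendored_name?("public/app.min.js")       # => true
vendored_name?("app/models/user.rb")      # => false
```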
#viewable? ⇒ Boolean
Public: Is the blob viewable?
Non-viewable blobs will just show a “View Raw” link.
Returns true or false.
```ruby
# File 'lib/linguist/blob_helper.rb', line 190
def viewable?
  !large? && text?
end
```