Class: CodeRay::Scanners::Scanner

Inherits:
StringScanner
  • Object
Extended by:
Plugin
Includes:
Enumerable
Defined in:
lib/coderay/scanners/scanner.rb

Overview

Scanner

The base class for all Scanners.

It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.

It is also Enumerable, so you can use it like an Array of Tokens:

require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

for text, kind in c_scanner
  puts text if kind == :operator
end

# prints: (*==)++;

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
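The Enumerable behaviour described above comes from defining each on a StringScanner subclass. The following is a minimal sketch of that pattern using only strscan from the standard library. ToyScanner and its three token kinds are hypothetical and are not part of CodeRay; real scanners are far more elaborate.

```ruby
require 'strscan'

# Hypothetical toy scanner, NOT part of CodeRay. It only illustrates how a
# StringScanner subclass becomes Enumerable once it defines #each.
class ToyScanner < StringScanner
  include Enumerable

  # Yields one [text, kind] pair per token in the input.
  def each
    return enum_for(:each) unless block_given?

    reset # always start from the beginning of the string
    until eos?
      if (text = scan(/\s+/))
        yield [text, :space]
      elsif (text = scan(/\w+/))
        yield [text, :ident]
      else
        yield [getch, :operator] # any other single character
      end
    end
  end
end

toy = ToyScanner.new('a + b')

# Every Enumerable method is now available on the scanner itself.
operators = toy.select { |_text, kind| kind == :operator }.map(&:first)
idents    = toy.map { |text, kind| text if kind == :ident }.compact
# operators is ["+"], idents is ["a", "b"]
```

Because each restarts the scan every time, the sketch can be enumerated repeatedly; the real class instead delegates to its cached tokens.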

Constant Summary

ScanError = Class.new StandardError

Raised if a Scanner fails while scanning.

DEFAULT_OPTIONS = { }

The default options for all scanner classes.

Define @default_options for subclasses.

KINDS_NOT_LOC = [:comment, :doctype, :docstring]

Instance Attribute Summary

Attributes included from Plugin

#plugin_id

Class Method Summary

Instance Method Summary

Methods included from Plugin

aliases, plugin_host, register_for, title

Constructor Details

#initialize(code = '', options = {}) ⇒ Scanner

Create a new Scanner.

  • code is the input String and is handled by the superclass StringScanner.

  • options is a Hash with Symbols as keys. It is merged with the default options of the class, so values given here override the defaults.

If options[:tokens] is given, that object collects the tokens. Otherwise, a new Tokens object is used.



# File 'lib/coderay/scanners/scanner.rb', line 125

def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end
  
  @options = self.class::DEFAULT_OPTIONS.merge options
  
  super self.class.normalize(code)
  
  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  
  setup
end
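Two steps of the constructor above can be tried in isolation: the guard that refuses to instantiate the base class, and the merge of caller options over the class defaults. This is only a sketch; DemoBaseScanner, DemoCScanner and the option names :tab_width and :strict are hypothetical stand-ins and do not exist in CodeRay.

```ruby
# Hypothetical stand-ins, NOT CodeRay classes. They mirror only the guard
# and the options merge from the constructor shown above.
class DemoBaseScanner
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options

  def initialize(options = {})
    # Guard: the base class itself cannot scan anything.
    if self.class == DemoBaseScanner
      raise NotImplementedError, 'Use a subclass.'
    end

    # self.class::DEFAULT_OPTIONS picks up the subclass constant if one is
    # defined; Hash#merge lets the caller's values win over the defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
  end
end

class DemoCScanner < DemoBaseScanner
  DEFAULT_OPTIONS = { tab_width: 8, strict: false }.freeze
end

scanner = DemoCScanner.new({ strict: true })
scanner.options # :strict comes from the caller, :tab_width from the defaults
```

Note that NotImplementedError descends from ScriptError, not StandardError, so a bare rescue will not catch it; it has to be rescued by name.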

Instance Attribute Details

#state ⇒ Object

Returns the value of attribute state.



# File 'lib/coderay/scanners/scanner.rb', line 44

def state
  @state
end

Class Method Details

.encoding(name = 'UTF-8') ⇒ Object

The encoding used internally by this scanner.



# File 'lib/coderay/scanners/scanner.rb', line 71

def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
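One consequence of the `||=` in the method above is that the name argument only matters on the first call; later calls return the cached value. The sketch below copies just that memoization pattern into a hypothetical class (MemoDemo is not part of CodeRay) so the effect can be observed.

```ruby
# Hypothetical class, NOT part of CodeRay. It reproduces only the
# memoization pattern of .encoding to make its effect visible.
class MemoDemo
  def self.encoding(name = 'UTF-8')
    # Evaluated once; afterwards the cached Encoding is returned as-is.
    @encoding ||= Encoding.find(name)
  end
end

first  = MemoDemo.encoding               # looks up and caches UTF-8
second = MemoDemo.encoding('ISO-8859-1') # cached value wins, argument ignored
first.equal?(second)                     # both are the same Encoding object
```

The same pattern appears in .file_extension, so an extension passed after the first call is likewise ignored.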

.file_extension(extension = lang) ⇒ Object

The typical filename suffix for this scanner’s language.



# File 'lib/coderay/scanners/scanner.rb', line 66

def file_extension extension = lang
  @file_extension ||= extension.to_s
end

.lang ⇒ Object

The lang of this Scanner class, which is equal to its Plugin ID.



# File 'lib/coderay/scanners/scanner.rb', line 76

def lang
  @plugin_id
end

.normalize(code) ⇒ Object

Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.



# File 'lib/coderay/scanners/scanner.rb', line 51

def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?
  
  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
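The helpers encode_with_encoding and to_unix used above are not shown on this page. As a rough approximation of what the description promises (UTF-8, placeholders for bad bytes, UNIX newlines), the following sketch relies only on core String methods. normalize_sketch is a hypothetical name, and this is not CodeRay's implementation.

```ruby
# A rough approximation of what .normalize promises, NOT CodeRay's code.
# normalize_sketch is a hypothetical helper built on core String methods.
def normalize_sketch(code)
  code = code.to_s unless code.is_a?(String)
  return code if code.empty?

  code
    .encode('UTF-8', invalid: :replace, undef: :replace) # transcode other encodings
    .scrub                                               # replace invalid bytes in UTF-8 input
    .gsub(/\r\n?/, "\n")                                 # Windows and old Mac newlines to UNIX
end

normalize_sketch("a\r\nb\rc\n") # newlines become "\n" only
normalize_sketch("a\xFFb")      # the stray byte becomes U+FFFD
normalize_sketch(:symbol)       # non-strings go through to_s first
```

The scrub call runs before gsub on purpose: pattern matching on a string that still contains invalid byte sequences raises ArgumentError.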

Instance Method Details

#binary_string ⇒ Object

The string in binary encoding.

To be used with #pos, which is the index of the byte the scanner will scan next.



# File 'lib/coderay/scanners/scanner.rb', line 218

def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
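The reason a binary view is needed is that StringScanner#pos counts bytes, while indexing a UTF-8 string counts characters. A small sketch of the mismatch follows; byte_view is a hypothetical helper that copies the condition used above.

```ruby
require 'strscan'

# Hypothetical helper mirroring the condition in #binary_string: only build
# a binary copy when the string contains multi-byte characters.
def byte_view(string)
  if string.bytesize != string.size
    string.dup.force_encoding('binary')
  else
    string
  end
end

text    = "\u00e9=1"             # "é=1": 3 characters, 4 bytes in UTF-8
scanner = StringScanner.new(text)
scanner.scan(/\u00e9/)           # consume the two-byte character

scanner.pos                      # byte offset 2
text[scanner.pos]                # character index 2, which is "1"
byte_view(text)[scanner.pos]     # byte index 2, which is "="
```

Indexing the original string with a byte offset skips ahead once any multi-byte character has been consumed; the binary view keeps offsets and indices in the same unit.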

#column(pos = self.pos) ⇒ Object

The current column position of the scanner, starting with 1. See also: #line.



# File 'lib/coderay/scanners/scanner.rb', line 209

def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
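The arithmetic above can be checked by hand: the column is the distance from pos back to the nearest preceding newline, and when there is none, the `|| -1` makes the first line start at column 1. A standalone sketch with a hypothetical helper column_at that takes the string explicitly:

```ruby
# Hypothetical standalone version of the #column arithmetic. It takes the
# (byte-oriented) string as a parameter so it can be tried without a scanner.
def column_at(string, pos)
  return 1 if pos <= 0

  # Index of the last newline before pos, or -1 when pos is on the first line.
  last_newline = string.rindex("\n", pos - 1) || -1
  pos - last_newline
end

code = "ab\ncd"
column_at(code, 1) # "b": no newline before it, 1 - (-1)
column_at(code, 3) # "c": newline at index 2, 3 - 2
column_at(code, 4) # "d": newline at index 2, 4 - 2
```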

#each(&block) ⇒ Object

Traverse the tokens.



# File 'lib/coderay/scanners/scanner.rb', line 192

def each &block
  tokens.each(&block)
end

#file_extension ⇒ Object

The default file extension for this scanner.



# File 'lib/coderay/scanners/scanner.rb', line 160

def file_extension
  self.class.file_extension
end

#lang ⇒ Object

The Plugin ID for this scanner.



# File 'lib/coderay/scanners/scanner.rb', line 155

def lang
  self.class.lang
end

#line(pos = self.pos) ⇒ Object

The current line position of the scanner, starting with 1. See also: #column.

Beware, this is implemented inefficiently. It should be used for debugging only.



# File 'lib/coderay/scanners/scanner.rb', line 202

def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
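The inefficiency mentioned above is visible in the implementation: every call copies the prefix up to pos and counts its newlines, so the cost grows with the position. A small sketch with a hypothetical helper line_at and a plain StringScanner from the standard library:

```ruby
require 'strscan'

# Hypothetical standalone version of the #line computation.
def line_at(string, pos)
  return 1 if pos <= 0

  # Copies the first pos bytes and counts newlines: O(pos) work per call.
  string[0...pos].count("\n") + 1
end

scanner = StringScanner.new("a = 1\nb = 2\nc = 3\n")
scanner.skip_until(/c/) # pos is now 13, just past the "c"

line_at(scanner.string, scanner.pos) # two newlines precede pos, so line 3
```

Calling this once per token would rescan the input from the start each time, which is why the documentation recommends it for debugging only.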

#reset ⇒ Object

Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.



# File 'lib/coderay/scanners/scanner.rb', line 142

def reset
  super
  reset_instance
end
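The split between reset and reset_instance is a template-method arrangement: reset rewinds the scan pointer through StringScanner and then calls a hook where per-scan state is cleared. A minimal sketch of how a subclass might use the hook follows. CountingScanner, scan_word and the seen counter are hypothetical and not part of CodeRay.

```ruby
require 'strscan'

# Hypothetical subclass, NOT part of CodeRay. It keeps one piece of
# per-scan state and clears it in the reset_instance hook.
class CountingScanner < StringScanner
  attr_reader :seen

  def initialize(code)
    super
    reset_instance
  end

  # Same shape as the method above: rewind first, then run the hook.
  def reset
    super
    reset_instance
  end

  # Consumes one word (plus trailing whitespace) and counts it.
  def scan_word
    word = scan(/\w+\s*/)
    @seen += 1 if word
    word
  end

  private

  # Hook: everything that must start fresh for a new scan goes here.
  def reset_instance
    @seen = 0
  end
end

scanner = CountingScanner.new('foo bar')
scanner.scan_word
scanner.scan_word
scanner.seen  # two words consumed so far
scanner.reset
scanner.seen  # back to 0, and scanner.pos is 0 again
```

Keeping the state reset in the hook means string= can reuse it as well, without rewinding logic being duplicated.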

#string=(code) ⇒ Object

Set a new string to be scanned.



# File 'lib/coderay/scanners/scanner.rb', line 148

def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end

#tokenize(source = nil, options = {}) ⇒ Object

Scans the code and returns all tokens in a Tokens object.



# File 'lib/coderay/scanners/scanner.rb', line 165

def tokenize source = nil, options = {}
  options = @options.merge(options)
  
  set_tokens_from_options options
  set_string_from_source source
  
  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end
  
  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end

#tokens ⇒ Object

Returns the tokens, caching the result of tokenize.



# File 'lib/coderay/scanners/scanner.rb', line 187

def tokens
  @cached_tokens ||= tokenize
end