Class: CodeRay::Scanners::Scanner
- Inherits: StringScanner
  - Object
  - StringScanner
  - CodeRay::Scanners::Scanner
- Extended by: Plugin
- Includes: Enumerable
- Defined in: lib/coderay/scanners/scanner.rb
Overview
Scanner
The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

for text, kind in c_scanner
  puts text if kind == :operator
end

# prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
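The combination of StringScanner and Enumerable described above can be sketched without CodeRay itself. The class MiniScanner and its three token kinds below are hypothetical, invented only to show why defining each is enough to get select, any? and the other Enumerable methods; CodeRay's real scanners tokenize very differently.

```ruby
require 'strscan'

# Hypothetical stand-in for a scanner class (not part of CodeRay):
# a StringScanner subclass that exposes its tokens through Enumerable.
class MiniScanner < StringScanner
  include Enumerable

  # Yields [text, kind] pairs. The token kinds are invented for this sketch.
  def each
    return enum_for(:each) unless block_given?

    self.pos = 0 # always traverse from the start of the string
    until eos?
      if (text = scan(/\s+/))
        yield text, :space
      elsif (text = scan(/\w+/))
        yield text, :ident
      else
        yield getch, :operator
      end
    end
  end
end

scanner = MiniScanner.new('a += b')

# Enumerable methods come for free once #each is defined.
operators = scanner.select { |_text, kind| kind == :operator }.map(&:first)
has_ident = scanner.any? { |_text, kind| kind == :ident }
```

Here operators collects the text of every :operator token and has_ident reports whether any identifier was seen.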
Direct Known Subclasses
C, CPlusPlus, CSS, Clojure, Debug, Delphi, Diff, ERB, Go, HAML, HTML, JSON, Java, JavaScript, Lua, PHP, Python, Raydebug, Ruby, SQL, Taskpaper, Text, YAML
Constant Summary
- ScanError = Class.new StandardError
  Raised if a Scanner fails while scanning.
- DEFAULT_OPTIONS = { }
  The default options for all scanner classes. Define @default_options for subclasses.
- KINDS_NOT_LOC = [:comment, :doctype, :docstring]
Instance Attribute Summary
- #state ⇒ Object
  Returns the value of attribute state.
Attributes included from Plugin
Class Method Summary
- .encoding(name = 'UTF-8') ⇒ Object
  The encoding used internally by this scanner.
- .file_extension(extension = lang) ⇒ Object
  The typical filename suffix for this scanner’s language.
- .lang ⇒ Object
  The lang of this Scanner class, which is equal to its Plugin ID.
- .normalize(code) ⇒ Object
  Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders.
Instance Method Summary
- #binary_string ⇒ Object
  The string in binary encoding.
- #column(pos = self.pos) ⇒ Object
  The current column position of the scanner, starting with 1.
- #each(&block) ⇒ Object
  Traverses the tokens.
- #file_extension ⇒ Object
  The default file extension for this scanner.
- #initialize(code = '', options = {}) ⇒ Scanner (constructor)
  Creates a new Scanner.
- #lang ⇒ Object
  The Plugin ID for this scanner.
- #line(pos = self.pos) ⇒ Object
  The current line position of the scanner, starting with 1.
- #reset ⇒ Object
  Resets the scanner.
- #string=(code) ⇒ Object
  Sets a new string to be scanned.
- #tokenize(source = nil, options = {}) ⇒ Object
  Scans the code and returns all tokens in a Tokens object.
- #tokens ⇒ Object
  Caches the result of tokenize.
Methods included from Plugin
aliases, plugin_host, register_for, title
Constructor Details
#initialize(code = '', options = {}) ⇒ Scanner
Create a new Scanner.
- code is the input String and is handled by the superclass StringScanner.
- options is a Hash with Symbols as keys. It is merged with the default options of the class (you can overwrite default options here).
If options[:tokens] is given, it is used to collect the tokens; otherwise, a new Tokens object is used.
# File 'lib/coderay/scanners/scanner.rb', line 125

def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
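The constructor combines two patterns: the base class refuses to be instantiated directly, and per-class defaults are merged with the caller's options. A minimal sketch of both follows. BaseScanner, RubyishScanner and the option names are hypothetical and are not part of CodeRay.

```ruby
# Hypothetical classes illustrating the constructor pattern; not CodeRay's code.
class BaseScanner
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options

  def initialize(code = '', options = {})
    if self.class == BaseScanner
      raise NotImplementedError, 'BaseScanner is abstract; use a subclass.'
    end

    # self.class::DEFAULT_OPTIONS looks the constant up on the subclass first,
    # so each subclass can supply its own defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    @code = code
  end
end

class RubyishScanner < BaseScanner
  DEFAULT_OPTIONS = { tab_width: 2, strict: false }.freeze
end

scanner = RubyishScanner.new('x = 1', strict: true)
merged = scanner.options # caller's :strict wins, :tab_width keeps its default
```

Because merge returns a new Hash, the frozen defaults are never modified, and every instance gets its own options object.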
Instance Attribute Details
#state ⇒ Object
Returns the value of attribute state.
# File 'lib/coderay/scanners/scanner.rb', line 44

def state
  @state
end
Class Method Details
.encoding(name = 'UTF-8') ⇒ Object
The encoding used internally by this scanner.
# File 'lib/coderay/scanners/scanner.rb', line 71

def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
.file_extension(extension = lang) ⇒ Object
The typical filename suffix for this scanner’s language.
# File 'lib/coderay/scanners/scanner.rb', line 66

def file_extension extension = lang
  @file_extension ||= extension.to_s
end
.lang ⇒ Object
The lang of this Scanner class, which is equal to its Plugin ID.
# File 'lib/coderay/scanners/scanner.rb', line 76

def lang
  @plugin_id
end
.normalize(code) ⇒ Object
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.

# File 'lib/coderay/scanners/scanner.rb', line 51

def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
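The helpers encode_with_encoding and to_unix are not shown on this page, so the following is only a rough sketch of what this kind of normalization can look like using core String methods. normalize_sketch is a hypothetical name, and the sketch assumes the input bytes are already meant to be in the target encoding and that "?" is an acceptable placeholder; CodeRay's actual behavior may differ.

```ruby
# Hypothetical helper; CodeRay's encode_with_encoding and to_unix may differ.
def normalize_sketch(code, encoding = Encoding::UTF_8)
  code = code.to_s
  return code if code.empty?

  # Assumption: re-tag a copy as the target encoding, then replace any
  # invalid byte sequences with "?" so later regexp matching cannot fail.
  text = code.dup.force_encoding(encoding).scrub('?')

  # Convert Windows (\r\n) and classic Mac (\r) line endings to "\n".
  text.gsub(/\r\n?/, "\n")
end

unix    = normalize_sketch("a\r\nb\rc")
cleaned = normalize_sketch("ab\xFF".b)
```

In this sketch unix contains only "\n" line endings, and cleaned is a valid UTF-8 string in which the stray byte has been replaced.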
Instance Method Details
#binary_string ⇒ Object
The string in binary encoding.
To be used with #pos, which is the index of the byte the scanner will scan next.
# File 'lib/coderay/scanners/scanner.rb', line 218

def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
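The reason a binary copy is needed is that StringScanner#pos counts bytes, while indexing a UTF-8 String counts characters. The short sketch below uses only core Ruby and the strscan library; the sample text is made up for illustration.

```ruby
require 'strscan'

text = "\u00e9 = 1\nx" # "é" takes two bytes in UTF-8
scanner = StringScanner.new(text)
scanner.scan_until(/=/) # pos is a byte offset: 2 (é) + 1 (space) + 1 (=)

# A binary-encoded copy is indexed by byte, so it lines up with pos.
binary = text.dup.force_encoding('binary')

by_bytes = binary[0...scanner.pos] # exactly the bytes consumed so far
by_chars = text[0...scanner.pos]   # four characters, which is one byte too many
```

When the string is pure ASCII, bytesize equals size and the copy is unnecessary, which is the case the else branch above covers.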
#column(pos = self.pos) ⇒ Object
The current column position of the scanner, starting with 1. See also: #line.
# File 'lib/coderay/scanners/scanner.rb', line 209

def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
#each(&block) ⇒ Object
Traverse the tokens.
# File 'lib/coderay/scanners/scanner.rb', line 192

def each &block
  tokens.each(&block)
end
#file_extension ⇒ Object
The default file extension for this scanner.

# File 'lib/coderay/scanners/scanner.rb', line 160

def file_extension
  self.class.file_extension
end
#lang ⇒ Object
The Plugin ID for this scanner.

# File 'lib/coderay/scanners/scanner.rb', line 155

def lang
  self.class.lang
end
#line(pos = self.pos) ⇒ Object
The current line position of the scanner, starting with 1. See also: #column.
Beware, this is implemented inefficiently. It should be used for debugging only.
# File 'lib/coderay/scanners/scanner.rb', line 202

def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
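The arithmetic behind #line and #column can be tried on a plain StringScanner. The helpers line_at and column_at below are hypothetical free-standing copies of the two method bodies; they take the binary string as an explicit argument instead of calling binary_string.

```ruby
require 'strscan'

# Hypothetical helpers mirroring the documented method bodies.
def line_at(binary, pos)
  return 1 if pos <= 0
  # Every newline before pos starts a new line. This copies the prefix,
  # which is why the documentation calls the approach inefficient.
  binary[0...pos].count("\n") + 1
end

def column_at(binary, pos)
  return 1 if pos <= 0
  # Distance from the last newline before pos; -1 stands for an imaginary
  # newline just before the start of the string.
  pos - (binary.rindex("\n", pos - 1) || -1)
end

scanner = StringScanner.new("ab\ncde\nf")
binary = scanner.string.dup.force_encoding('binary')

scanner.scan_until(/d/) # the next byte to scan is now the "e" on line 2
line   = line_at(binary, scanner.pos)
column = column_at(binary, scanner.pos)
```

With the scanner positioned on "e", line is 2 and column is 3, since "e" is the third byte of the second line.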
#reset ⇒ Object
Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.

# File 'lib/coderay/scanners/scanner.rb', line 142

def reset
  super
  reset_instance
end
#string=(code) ⇒ Object
Set a new string to be scanned.
# File 'lib/coderay/scanners/scanner.rb', line 148

def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
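Both #reset and #string= end by calling reset_instance, which is the hook subclasses are meant to override. A small sketch of that hook pattern follows, assuming nothing beyond StringScanner. HookedScanner, CountingScanner, open! and depth are hypothetical names, and the sketch omits the normalize step.

```ruby
require 'strscan'

# Hypothetical base class: public entry points delegate to a private hook.
class HookedScanner < StringScanner
  def reset
    super          # StringScanner#reset moves pos back to 0
    reset_instance # then the subclass clears its own state
  end

  def string=(code)
    super
    reset_instance
  end

  private

  # Hook for subclasses; does nothing by default.
  def reset_instance; end
end

# Hypothetical subclass that tracks some state of its own.
class CountingScanner < HookedScanner
  attr_reader :depth

  def initialize(code)
    super
    reset_instance
  end

  def open!
    @depth += 1
  end

  private

  def reset_instance
    @depth = 0
  end
end

scanner = CountingScanner.new('((')
scanner.scan(/\(/)
scanner.open!
depth_before = scanner.depth
scanner.reset
depth_after = scanner.depth
```

Overriding only the hook means a subclass cannot forget to call super, so the scan position and the subclass state are always reset together.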
#tokenize(source = nil, options = {}) ⇒ Object
Scans the code and returns all tokens in a Tokens object.

# File 'lib/coderay/scanners/scanner.rb', line 165

def tokenize source = nil, options = {}
  options = @options.merge(options)

  set_string_from_source source

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
#tokens ⇒ Object
Caches the result of tokenize.

# File 'lib/coderay/scanners/scanner.rb', line 187

def tokens
  @cached_tokens ||= tokenize
end