Class: CodeRay::Tokens
- Inherits: Array
  - Object
  - Array
  - CodeRay::Tokens
- Defined in:
  - lib/coderay/tokens.rb
  - lib/coderay/token_classes.rb
Overview
Tokens
The Tokens class represents a list of tokens returned from a Scanner.
A token is not a special object, just a two-element Array consisting of
- the token kind (a Symbol representing the type of the token)
- the token text (the original source of the token in a String)
A token looks like this:
[:comment, '# It looks like this']
[:float, '3.1415926']
[:error, '$^']
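Since a token is just an ordinary two-element Array, it can be taken apart with plain destructuring:

```ruby
# A token is an ordinary two-element Array: kind first, then text.
kind, text = [:float, '3.1415926']
kind  # => :float
text  # => '3.1415926'
```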
Some scanners also yield sub-tokens, represented by special token texts, namely :open and :close.
The Ruby scanner, for example, splits “a string” into:
[
[:open, :string],
[:delimiter, '"'],
[:content, 'a string'],
[:delimiter, '"'],
[:close, :string]
]
Tokens is also the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.
Thus, the syntax below becomes clear:
CodeRay.scan('price = 2.59', :ruby).html
# the Tokens object is here -------^
See how small it is? ;)
Tokens gives you the power to handle pre-scanned code very easily: You can convert it to a webpage, a YAML file, or dump it into a gzip’ed string that you put in your DB.
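The gzip'ed round trip can be sketched with nothing but the standard library; here plain Zlib and an ordinary Array stand in for CodeRay's GZip helper and the Tokens class (the token values are made up for the example):

```ruby
require 'zlib'

tokens = [[:ident, 'price'], [:operator, '='], [:float, '2.59']]

# What Tokens#dump does in spirit: Marshal first, then compress.
dump = Zlib::Deflate.deflate(Marshal.dump(tokens))

# What Tokens.load does in spirit: decompress, then Marshal.load.
restored = Marshal.load(Zlib::Inflate.inflate(dump))
restored == tokens  # => true
```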
Tokens’ subclass TokenStream allows streaming to save memory.
Direct Known Subclasses
TokenStream
Defined Under Namespace
Modules: Undumping
Constant Summary
- ClassOfKind =
    Hash.new do |h, k|
      h[k] = k.to_s
    end
Instance Attribute Summary
- #scanner ⇒ Object
  The Scanner instance that created the tokens.
Class Method Summary
- .load(dump) ⇒ Object
  Undump the object using Marshal.load, then unzip it using GZip.gunzip.
Instance Method Summary
- #dump(gzip_level = 7) ⇒ Object
  Dumps the object into a String that can be saved in files or databases.
- #each(kind_filter = nil, &block) ⇒ Object
  Iterates over all tokens.
- #each_text_token ⇒ Object
  Iterates over all text tokens.
- #encode(encoder, options = {}) ⇒ Object
  Encode the tokens using encoder.
- #fix ⇒ Object
  Ensure that all :open tokens have a corresponding :close one.
- #fix! ⇒ Object
- #method_missing(meth, options = {}) ⇒ Object
  Redirects unknown methods to encoder calls.
- #optimize ⇒ Object
  Returns the tokens compressed by joining consecutive tokens of the same kind.
- #optimize! ⇒ Object
  Compact the object itself; see optimize.
- #split_into_lines ⇒ Object
  TODO: Scanner#split_into_lines.
- #split_into_lines! ⇒ Object
- #stream? ⇒ Boolean
  Whether the object is a TokenStream.
- #text ⇒ Object
  The concatenated text of the tokens.
- #text_size ⇒ Object
  The total size of the tokens.
- #to_s(options = {}) ⇒ Object
  Turn into a string using Encoders::Text.
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(meth, options = {}) ⇒ Object
Redirects unknown methods to encoder calls.
For example, if you call tokens.html, the HTML encoder is used to highlight the tokens.
# File 'lib/coderay/tokens.rb', line 116

def method_missing meth, options = {}
  Encoders[meth].new(options).encode_tokens self
end
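The delegation pattern itself can be sketched outside CodeRay; the registry and the encoder lambda below are made-up stand-ins for Encoders[], not part of the library:

```ruby
# Hypothetical stand-in for the CodeRay::Encoders registry.
ENCODERS = { text: ->(tokens) { tokens.map { |kind, text| text }.join } }

class TokenList < Array
  # Unknown method names are looked up as encoders, like Tokens does.
  def method_missing(meth, *args)
    encoder = ENCODERS[meth] or return super
    encoder.call(self)
  end

  def respond_to_missing?(meth, include_private = false)
    ENCODERS.key?(meth) || super
  end
end

TokenList.new([[:ident, 'price'], [:space, ' ']]).text  # => "price "
```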
Instance Attribute Details
#scanner ⇒ Object
The Scanner instance that created the tokens.
# File 'lib/coderay/tokens.rb', line 51

def scanner
  @scanner
end
Class Method Details
.load(dump) ⇒ Object
Undump the object using Marshal.load, then unzip it using GZip.gunzip.
The result is commonly a Tokens object, but this is not guaranteed.
# File 'lib/coderay/tokens.rb', line 268

def Tokens.load dump
  require 'coderay/helpers/gzip_simple'
  dump = dump.gunzip
  @dump = Marshal.load dump
end
Instance Method Details
#dump(gzip_level = 7) ⇒ Object
Dumps the object into a String that can be saved in files or databases.
The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.
The returned String object includes Undumping so it has an #undump method. See Tokens.load.
You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.
See GZip module.
# File 'lib/coderay/tokens.rb', line 227

def dump gzip_level = 7
  require 'coderay/helpers/gzip_simple'
  dump = Marshal.dump self
  dump = dump.gzip gzip_level
  dump.extend Undumping
end
#each(kind_filter = nil, &block) ⇒ Object
Iterates over all tokens.
If a filter is given, only tokens of that kind are yielded.
# File 'lib/coderay/tokens.rb', line 63

def each kind_filter = nil, &block
  unless kind_filter
    super(&block)
  else
    super() do |text, kind|
      next unless kind == kind_filter
      yield text, kind
    end
  end
end
#each_text_token ⇒ Object
Iterates over all text tokens. Range tokens like [:open, :string] are left out.
Example:
tokens.each_text_token { |text, kind| text.replace html_escape(text) }
# File 'lib/coderay/tokens.rb', line 79

def each_text_token
  each do |text, kind|
    next unless text.is_a? ::String
    yield text, kind
  end
end
#encode(encoder, options = {}) ⇒ Object
Encode the tokens using encoder.
encoder can be
-
a symbol like :html or :statistic
-
an Encoder class
-
an Encoder object
options are passed to the encoder.
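The three accepted forms boil down to one normalization step. This is an illustrative sketch with a made-up registry and a trivial encoder, not CodeRay's actual implementation:

```ruby
class TextEncoder
  def encode_tokens(tokens, options = {})
    tokens.map { |kind, text| text }.join
  end
end

# Hypothetical stand-in for CodeRay::Encoders[].
REGISTRY = { text: TextEncoder }

# Accept a symbol, a class, or an instance; always return an instance.
def resolve(encoder)
  klass = encoder.is_a?(Symbol) ? REGISTRY[encoder] : encoder
  klass.is_a?(Class) ? klass.new : klass
end

tokens = [[:ident, 'a'], [:space, ' '], [:ident, 'b']]
[:text, TextEncoder, TextEncoder.new].map do |spec|
  resolve(spec).encode_tokens(tokens)
end
# => ["a b", "a b", "a b"]
```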
# File 'lib/coderay/tokens.rb', line 94

def encode encoder, options = {}
  unless encoder.is_a? Encoders::Encoder
    if encoder.is_a? Class
      encoder_class = encoder
    else
      encoder_class = Encoders[encoder]
    end
    encoder = encoder_class.new options
  end
  encoder.encode_tokens self, options
end
#fix ⇒ Object
Ensure that all :open tokens have a corresponding :close one.
TODO: Test this!
# File 'lib/coderay/tokens.rb', line 164

def fix
  tokens = self.class.new
  # Check token nesting using a stack of kinds.
  opened = []
  for type, kind in self
    case type
    when :open
      opened.push [:close, kind]
    when :begin_line
      opened.push [:end_line, kind]
    when :close, :end_line
      expected = opened.pop
      if [type, kind] != expected
        # Unexpected :close; decide what to do based on the kind:
        # - token was never opened: delete the :close (just skip it)
        next unless opened.rindex expected
        # - token was opened earlier: also close tokens in between
        tokens << token until (token = opened.pop) == expected
      end
    end
    tokens << [type, kind]
  end
  # Close remaining opened tokens
  tokens << token while token = opened.pop
  tokens
end
#fix! ⇒ Object
# File 'lib/coderay/tokens.rb', line 191

def fix!
  replace fix
end
#optimize ⇒ Object
Returns the tokens compressed by joining consecutive tokens of the same kind.
This cannot be undone, but should yield the same output in most Encoders. It basically makes the output smaller.
Combined with dump, it saves space for the cost of time.
If the scanner is written carefully, this is not required - for example, consecutive //-comment lines could already be joined in one comment token by the Scanner.
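The joining idea can be sketched on plain [kind, text] pairs (join_consecutive is a made-up name for this sketch, not part of CodeRay):

```ruby
def join_consecutive(tokens)
  tokens.each_with_object([]) do |(kind, text), out|
    if out.last && out.last[0] == kind
      out.last[1] += text       # same kind as before: append the text
    else
      out << [kind, text.dup]   # new kind: start a fresh token
    end
  end
end

join_consecutive([[:comment, "# a\n"], [:comment, "# b\n"], [:ident, 'x']])
# => [[:comment, "# a\n# b\n"], [:ident, "x"]]
```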
# File 'lib/coderay/tokens.rb', line 131

def optimize
  print ' Tokens#optimize: before: %d - ' % size if $DEBUG
  last_kind = last_text = nil
  new = self.class.new
  for text, kind in self
    if text.is_a? String
      if kind == last_kind
        last_text << text
      else
        new << [last_text, last_kind] if last_kind
        last_text = text
        last_kind = kind
      end
    else
      new << [last_text, last_kind] if last_kind
      last_kind = last_text = nil
      new << [text, kind]
    end
  end
  new << [last_text, last_kind] if last_kind
  print 'after: %d (%d saved = %2.0f%%)' % [new.size, size - new.size, 1.0 - (new.size.to_f / size)] if $DEBUG
  new
end
#optimize! ⇒ Object
Compact the object itself; see optimize.
# File 'lib/coderay/tokens.rb', line 157

def optimize!
  replace optimize
end
#split_into_lines ⇒ Object
TODO: Scanner#split_into_lines
Makes sure that:
- newlines are single tokens (which means all other tokens are single-line)
- there are no open tokens at the end of a line
This makes it simple for encoders that work line-oriented, like HTML with list-style numeration.
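The newline-splitting requirement can be sketched for simple text tokens (split_newlines is a hypothetical helper for this sketch; it does not handle :open/:close nesting, which is what makes the real method harder):

```ruby
def split_newlines(tokens)
  tokens.flat_map do |kind, text|
    # Cut each text into newline and non-newline pieces,
    # so every "\n" becomes a token of its own.
    text.scan(/[^\n]+|\n/).map { |part| [kind, part] }
  end
end

split_newlines([[:comment, "# a\n# b"]])
# => [[:comment, "# a"], [:comment, "\n"], [:comment, "# b"]]
```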
# File 'lib/coderay/tokens.rb', line 204

def split_into_lines
  raise NotImplementedError
end
#split_into_lines! ⇒ Object
# File 'lib/coderay/tokens.rb', line 208

def split_into_lines!
  replace split_into_lines
end
#stream? ⇒ Boolean
Whether the object is a TokenStream.
Returns false.
# File 'lib/coderay/tokens.rb', line 56

def stream?
  false
end
#text ⇒ Object
The concatenated text of the tokens. Should be equal to the input before scanning.
# File 'lib/coderay/tokens.rb', line 248

def text
  map { |t, k| t if t.is_a? ::String }.join
end
#text_size ⇒ Object
The total size of the tokens. Should be equal to the input size before scanning.
# File 'lib/coderay/tokens.rb', line 237

def text_size
  size = 0
  each_text_token do |t, k|
    size += t.size
  end
  size
end
#to_s(options = {}) ⇒ Object
Turn into a string using Encoders::Text.
options are passed to the encoder if given.
# File 'lib/coderay/tokens.rb', line 108

def to_s options = {}
  encode :text, options
end