Class: CodeRay::Tokens
- Inherits: Array
  - Object
  - Array
  - CodeRay::Tokens
- Defined in:
  - lib/coderay/tokens.rb
  - lib/coderay/token_classes.rb
Overview
Tokens
The Tokens class represents a list of tokens returned from a Scanner.
A token is not a special object, just a two-element Array consisting of
- the token kind (a Symbol representing the type of the token)
- the token text (the original source of the token in a String)
A token looks like this:
[:comment, '# It looks like this']
[:float, '3.1415926']
[:error, '$^']
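Since a token is just an ordinary two-element Array, it can be taken apart with plain destructuring:

```ruby
# A token is an ordinary two-element Array: kind first, then text.
kind, text = [:float, '3.1415926']
kind  # => :float
text  # => '3.1415926'
```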
Some scanners also yield sub-tokens, represented by special token texts, namely :open and :close.
The Ruby scanner, for example, splits “a string” into:
[
[:open, :string],
[:delimiter, '"'],
[:content, 'a string'],
[:delimiter, '"'],
[:close, :string]
]
Tokens is also the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.
Thus, the syntax below becomes clear:
CodeRay.scan('price = 2.59', :ruby).html
# the Tokens object is here -------^
See how small it is? ;)
Tokens gives you the power to handle pre-scanned code very easily: You can convert it to a webpage, a YAML file, or dump it into a gzip’ed string that you put in your DB.
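The gzip'ed round trip can be sketched with nothing but the standard library; here plain Zlib and an ordinary Array stand in for CodeRay's GZip helper and the Tokens class (the token values are made up for the example):

```ruby
require 'zlib'

tokens = [[:ident, 'price'], [:operator, '='], [:float, '2.59']]

# What Tokens#dump does in spirit: Marshal first, then compress.
dump = Zlib::Deflate.deflate(Marshal.dump(tokens))

# What Tokens.load does in spirit: decompress, then Marshal.load.
restored = Marshal.load(Zlib::Inflate.inflate(dump))
restored == tokens  # => true
```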
Tokens’ subclass TokenStream allows streaming to save memory.
Direct Known Subclasses
TokenStream
Defined Under Namespace
Modules: Undumping
Constant Summary
- ClassOfKind =
    Hash.new do |h, k|
      h[k] = k.to_s
    end
Instance Attribute Summary
- #scanner ⇒ Object
  The Scanner instance that created the tokens.
Class Method Summary
- .load(dump) ⇒ Object
  Undump the object using Marshal.load, then unzip it using GZip.gunzip.
Instance Method Summary
- #dump(gzip_level = 7) ⇒ Object
  Dumps the object into a String that can be saved in files or databases.
- #each(kind_filter = nil, &block) ⇒ Object
  Iterates over all tokens.
- #each_text_token ⇒ Object
  Iterates over all text tokens.
- #encode(encoder, options = {}) ⇒ Object
  Encode the tokens using encoder.
- #fix ⇒ Object
  Ensure that all :open tokens have a corresponding :close one.
- #fix! ⇒ Object
- #method_missing(meth, options = {}) ⇒ Object
  Redirects unknown methods to encoder calls.
- #optimize ⇒ Object
  Returns the tokens compressed by joining consecutive tokens of the same kind.
- #optimize! ⇒ Object
  Compact the object itself; see optimize.
- #split_into_lines ⇒ Object
  TODO: Scanner#split_into_lines.
- #split_into_lines! ⇒ Object
- #stream? ⇒ Boolean
  Whether the object is a TokenStream.
- #text ⇒ Object
  The concatenated text of the tokens.
- #text_size ⇒ Object
  The total size of the tokens.
- #to_s(options = {}) ⇒ Object
  Turn into a string using Encoders::Text.
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(meth, options = {}) ⇒ Object
Redirects unknown methods to encoder calls.
For example, if you call tokens.html, the HTML encoder is used to highlight the tokens.
# File 'lib/coderay/tokens.rb', line 116

def method_missing meth, options = {}
  Encoders[meth].new(options).encode_tokens self
end
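The delegation pattern itself can be sketched outside CodeRay; the registry and the encoder lambda below are made-up stand-ins for Encoders[], not part of the library:

```ruby
# Hypothetical stand-in for the CodeRay::Encoders registry.
ENCODERS = { text: ->(tokens) { tokens.map { |kind, text| text }.join } }

class TokenList < Array
  # Unknown method names are looked up as encoders, like Tokens does.
  def method_missing(meth, *args)
    encoder = ENCODERS[meth] or return super
    encoder.call(self)
  end

  def respond_to_missing?(meth, include_private = false)
    ENCODERS.key?(meth) || super
  end
end

TokenList.new([[:ident, 'price'], [:space, ' ']]).text  # => "price "
```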
Instance Attribute Details
#scanner ⇒ Object
The Scanner instance that created the tokens.
# File 'lib/coderay/tokens.rb', line 51

def scanner
  @scanner
end
Class Method Details
.load(dump) ⇒ Object
Undump the object using Marshal.load, then unzip it using GZip.gunzip.
The result is commonly a Tokens object, but this is not guaranteed.
# File 'lib/coderay/tokens.rb', line 268

def Tokens.load dump
  require 'coderay/helpers/gzip_simple'
  dump = dump.gunzip
  @dump = Marshal.load dump
end
Instance Method Details
#dump(gzip_level = 7) ⇒ Object
Dumps the object into a String that can be saved in files or databases.
The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.
The returned String object includes Undumping so it has an #undump method. See Tokens.load.
You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.
See GZip module.
# File 'lib/coderay/tokens.rb', line 227

def dump gzip_level = 7
  require 'coderay/helpers/gzip_simple'
  dump = Marshal.dump self
  dump = dump.gzip gzip_level
  dump.extend Undumping
end
#each(kind_filter = nil, &block) ⇒ Object
Iterates over all tokens.
If a filter is given, only tokens of that kind are yielded.
# File 'lib/coderay/tokens.rb', line 63

def each kind_filter = nil, &block
  unless kind_filter
    super(&block)
  else
    super() do |text, kind|
      next unless kind == kind_filter
      yield text, kind
    end
  end
end
#each_text_token ⇒ Object
Iterates over all text tokens. Range tokens like [:open, :string] are left out.
Example:
tokens.each_text_token { |text, kind| text.replace html_escape(text) }
# File 'lib/coderay/tokens.rb', line 79

def each_text_token
  each do |text, kind|
    next unless text.is_a? ::String
    yield text, kind
  end
end
#encode(encoder, options = {}) ⇒ Object
Encode the tokens using encoder.
encoder can be
-
a symbol like :html or :statistic
-
an Encoder class
-
an Encoder object
options are passed to the encoder.
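The three accepted forms boil down to one normalization step. This is an illustrative sketch with a made-up registry and a trivial encoder, not CodeRay's actual implementation:

```ruby
class TextEncoder
  def encode_tokens(tokens, options = {})
    tokens.map { |kind, text| text }.join
  end
end

# Hypothetical stand-in for CodeRay::Encoders[].
REGISTRY = { text: TextEncoder }

# Accept a symbol, a class, or an instance; always return an instance.
def resolve(encoder)
  klass = encoder.is_a?(Symbol) ? REGISTRY[encoder] : encoder
  klass.is_a?(Class) ? klass.new : klass
end

tokens = [[:ident, 'a'], [:space, ' '], [:ident, 'b']]
[:text, TextEncoder, TextEncoder.new].map do |spec|
  resolve(spec).encode_tokens(tokens)
end
# => ["a b", "a b", "a b"]
```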
# File 'lib/coderay/tokens.rb', line 94

def encode encoder, options = {}
  unless encoder.is_a? Encoders::Encoder
    if encoder.is_a? Class
      encoder_class = encoder
    else
      encoder_class = Encoders[encoder]
    end
    encoder = encoder_class.new options
  end
  encoder.encode_tokens self, options
end
#fix ⇒ Object
Ensure that all :open tokens have a corresponding :close one.
TODO: Test this!
# File 'lib/coderay/tokens.rb', line 164

def fix
  tokens = self.class.new
  # Check token nesting using a stack of kinds.
  opened = []
  for type, kind in self
    case type
    when :open
      opened.push [:close, kind]
    when :begin_line
      opened.push [:end_line, kind]
    when :close, :end_line
      expected = opened.pop
      if [type, kind] != expected
        # Unexpected :close; decide what to do based on the kind:
        # - token was never opened: delete the :close (just skip it)
        next unless opened.rindex expected
        # - token was opened earlier: also close tokens in between
        tokens << token until (token = opened.pop) == expected
      end
    end
    tokens << [type, kind]
  end
  # Close remaining opened tokens
  tokens << token while token = opened.pop
  tokens
end
#fix! ⇒ Object
# File 'lib/coderay/tokens.rb', line 191

def fix!
  replace fix
end
#optimize ⇒ Object
Returns the tokens compressed by joining consecutive tokens of the same kind.
This cannot be undone, but should yield the same output in most Encoders. It basically makes the output smaller.
Combined with dump, it saves space for the cost of time.
If the scanner is written carefully, this is not required - for example, consecutive //-comment lines could already be joined in one comment token by the Scanner.
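The joining idea can be sketched on plain [kind, text] pairs (join_consecutive is a made-up name for this sketch, not part of CodeRay):

```ruby
def join_consecutive(tokens)
  tokens.each_with_object([]) do |(kind, text), out|
    if out.last && out.last[0] == kind
      out.last[1] += text       # same kind as before: append the text
    else
      out << [kind, text.dup]   # new kind: start a fresh token
    end
  end
end

join_consecutive([[:comment, "# a\n"], [:comment, "# b\n"], [:ident, 'x']])
# => [[:comment, "# a\n# b\n"], [:ident, "x"]]
```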
# File 'lib/coderay/tokens.rb', line 131

def optimize
  print ' Tokens#optimize: before: %d - ' % size if $DEBUG
  last_kind = last_text = nil
  new = self.class.new
  for text, kind in self
    if text.is_a? String
      if kind == last_kind
        last_text << text
      else
        new << [last_text, last_kind] if last_kind
        last_text = text
        last_kind = kind
      end
    else
      new << [last_text, last_kind] if last_kind
      last_kind = last_text = nil
      new << [text, kind]
    end
  end
  new << [last_text, last_kind] if last_kind
  print 'after: %d (%d saved = %2.0f%%)' % [new.size, size - new.size, 1.0 - (new.size.to_f / size)] if $DEBUG
  new
end
#optimize! ⇒ Object
Compact the object itself; see optimize.
# File 'lib/coderay/tokens.rb', line 157

def optimize!
  replace optimize
end
#split_into_lines ⇒ Object
TODO: Scanner#split_into_lines
Makes sure that:
- newlines are single tokens (which means all other tokens are single-line)
- there are no open tokens at the end of a line
This makes it simple for encoders that work line-oriented, like HTML with list-style numeration.
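The newline-splitting requirement can be sketched for simple text tokens (split_newlines is a hypothetical helper for this sketch; it does not handle :open/:close nesting, which is what makes the real method harder):

```ruby
def split_newlines(tokens)
  tokens.flat_map do |kind, text|
    # Cut each text into newline and non-newline pieces,
    # so every "\n" becomes a token of its own.
    text.scan(/[^\n]+|\n/).map { |part| [kind, part] }
  end
end

split_newlines([[:comment, "# a\n# b"]])
# => [[:comment, "# a"], [:comment, "\n"], [:comment, "# b"]]
```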
# File 'lib/coderay/tokens.rb', line 204

def split_into_lines
  raise NotImplementedError
end
#split_into_lines! ⇒ Object
# File 'lib/coderay/tokens.rb', line 208

def split_into_lines!
  replace split_into_lines
end
#stream? ⇒ Boolean
Whether the object is a TokenStream.
Returns false.
# File 'lib/coderay/tokens.rb', line 56

def stream?
  false
end
#text ⇒ Object
The concatenated text of the tokens. Should be equal to the input before scanning.
# File 'lib/coderay/tokens.rb', line 248

def text
  map { |t, k| t if t.is_a? ::String }.join
end
#text_size ⇒ Object
The total size of the tokens. Should be equal to the input size before scanning.
# File 'lib/coderay/tokens.rb', line 237

def text_size
  size = 0
  each_text_token do |t, k|
    size += t.size
  end
  size
end
#to_s(options = {}) ⇒ Object
Turn into a string using Encoders::Text.
options are passed to the encoder if given.
# File 'lib/coderay/tokens.rb', line 108

def to_s options = {}
  encode :text, options
end