Class: ANTLR3::TokenScheme

Inherits:
Module
  • Object
Includes:
TokenFactory
Defined in:
lib/antlr3/token.rb

Overview

TokenSchemes exist to handle the problem of defining token types as integer values while maintaining meaningful text names for the types. They are dynamically defined modules that map integer values to constants with token-type names.


Fundamentally, tokens exist to take a chunk of text and identify it as belonging to some category, like “VARIABLE” or “INTEGER”. In code, the category is represented by an integer: some arbitrary value that ANTLR chooses while it is creating the recognizer. The purpose of using an integer (instead of, say, a Ruby symbol) is that ANTLR's decision logic often needs to test whether a token's type falls within a range, which is not possible with symbols.

The downside of token types being represented as integers is that a developer needs to be able to reference the unknown type value by name in action code. Furthermore, code that references the type by name and tokens that can be inspected with names in place of type values are more meaningful to a developer.

Since ANTLR requires token type names to follow capital-letter naming conventions, defining types as named constants of the recognizer class resolves the problem of referencing type values by name. Thus, a token type like “VARIABLE” can be represented by a number like 5 and referenced within code by VARIABLE. However, when a recognizer creates tokens, the name of the token's type cannot be seen without using the data defined in the recognizer.
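The limitation described above can be illustrated with a minimal sketch. SketchRecognizer and PlainToken are hypothetical names invented for this example and are not part of antlr3; the point is only that the token carries an integer and the name lives elsewhere.

```ruby
# Hypothetical recognizer: token types live as constants on the class.
class SketchRecognizer
  VARIABLE = 5
  INTEGER  = 6
  # Inverse map kept by the recognizer, not by the tokens themselves.
  TOKEN_NAMES = { VARIABLE => "VARIABLE", INTEGER => "INTEGER" }.freeze
end

# A bare token (hypothetical) stores only the integer type and its text.
PlainToken = Struct.new(:type, :text)

token = PlainToken.new(SketchRecognizer::VARIABLE, "x")
token.type                                 # => 5
# The readable name is only reachable through the recognizer's data.
SketchRecognizer::TOKEN_NAMES[token.type]  # => "VARIABLE"
```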

Of course, tokens could be defined with a name attribute that could be specified when tokens are created. However, doing so would make tokens take up more space than necessary, as well as making it difficult to change the type of a token while maintaining a correct name value.

TokenSchemes exist as a technique to manage token type referencing and name extraction. They:

  1. keep token type references clear and understandable in recognizer code

  2. permit access to a token's type-name independently of recognizer objects

  3. allow multiple classes to share the same token information

Building Token Schemes

TokenScheme is a subclass of Module. Thus, it has the method TokenScheme.new(tk_class = nil) { ... module-level code ... }, which evaluates the block in the context of the scheme (module), similarly to Module#module_eval. Before evaluating the block, .new sets up the module with the following actions:

  1. define a customized token class (more on that below)

  2. add a new constant, TOKEN_NAMES, which is a hash that maps types to names

  3. dynamically populate the new scheme module with a few instance methods (token_scheme, token_names, and token_name)

  4. include ANTLR3::Constants in the new scheme module

Because the TokenScheme class functions as a metaclass, its scoping behavior can be mildly confusing if you are trying to adapt it for your own purposes. Remember that all of the instance methods of TokenScheme act as module-level methods of TokenScheme instances, à la attr_accessor and friends.
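The metaclass arrangement can be sketched in plain Ruby without antlr3. MiniScheme below is a hypothetical, heavily reduced stand-in: it omits the token class, the built-in names, and all conflict checks. It only shows how a Module subclass can prepare each new module before evaluating the caller's block, and how an instance method of the subclass becomes callable at module level.

```ruby
# Hypothetical stand-in for the pattern; not the antlr3 implementation.
class MiniScheme < Module
  def self.new(&body)
    # Module.new evaluates this block with self set to the new module.
    super() do
      const_set(:TOKEN_NAMES, {})
      scheme = self
      define_method(:token_names) { scheme::TOKEN_NAMES }
      define_method(:token_name)  { |type| token_names[type] }
      # Make both methods callable on the module itself as well.
      module_function :token_name, :token_names
      body and module_eval(&body)
    end
  end

  # An instance method of MiniScheme, so it is a module-level
  # method of every MiniScheme instance.
  def define_token(name, value)
    const_set(name.to_s, value)
    self::TOKEN_NAMES[value] = name.to_s
    self
  end
end

MiniData = MiniScheme.new do
  define_token(:ID, 6)   # resolved against the new module
end

MiniData.class          # => MiniScheme
MiniData::ID            # => 6
MiniData.token_name(6)  # => "ID"
```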

TokenScheme#define_token(name_symbol, int_value) adds a constant definition name_symbol with the value int_value. It is essentially like Module#const_set, except it forbids constant overwriting (which would mess up recognizer code fairly badly) and adds an inverse type-to-name map to its own TOKEN_NAMES table. TokenScheme#define_tokens is a convenience method for defining many types with a hash pairing names to values.
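The overwrite protection can be sketched as follows. sketch_define_token and SketchTypes are hypothetical names for this illustration; the helper assumes the target module already has a TOKEN_NAMES hash, and it mirrors only the behavior described above (same value accepted, different value rejected).

```ruby
# Hypothetical helper mirroring the described behavior; not the antlr3 API.
def sketch_define_token(mod, name, value)
  name = name.to_s
  if mod.const_defined?(name, false)
    current = mod.const_get(name, false)
    # Re-defining with the same value is harmless; a new value is an error.
    unless current == value
      raise NameError, "#{name} = #{value} conflicts with existing #{name} = #{current}"
    end
  else
    mod.const_set(name, value)
    mod::TOKEN_NAMES[value] = name   # inverse type-to-name entry
  end
  mod
end

module SketchTypes
  TOKEN_NAMES = {}
end

sketch_define_token(SketchTypes, :ID, 6)
sketch_define_token(SketchTypes, :ID, 6)    # same value: accepted
SketchTypes::ID                             # => 6
SketchTypes::TOKEN_NAMES[6]                 # => "ID"

conflict = nil
begin
  sketch_define_token(SketchTypes, :ID, 7)  # different value: rejected
rescue NameError => e
  conflict = e.message
end
```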

TokenScheme#register_name(value, name_string) specifies a custom type-to-name definition. This is particularly useful for the anonymous tokens that ANTLR generates for literal strings in the grammar specification. For example, if you refer to the literal '=' in some parser rule in your grammar, ANTLR will add a lexer rule for the literal and give the token a name like T__x, where x is the type's integer value. Since this is pretty meaningless to a developer, generated code should add a special name definition for type value x with the string "'='".
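The naming rule for anonymous tokens can be sketched with a plain hash. sketch_register_name is a hypothetical helper written for this illustration; it assumes the rule that an anonymous T__x name may be upgraded to a literal name but never the reverse.

```ruby
# Hypothetical helper; names is a plain type-to-name hash.
def sketch_register_name(names, type_value, name)
  name      = name.to_s
  anonymous = "T__#{type_value}"
  current   = names[type_value]

  if current.nil? || current == anonymous
    # Nothing registered yet, or only the anonymous name: take the new name.
    names[type_value] = name
  elsif name == current || name == anonymous
    # Same name again, or a downgrade to the anonymous name: keep what we have.
    current
  else
    raise NameError, "type #{type_value} is already named #{current}, cannot rename to #{name}"
  end
end

names = { 5 => "T__5" }
sketch_register_name(names, 5, "'='")   # => "'='"
sketch_register_name(names, 5, "T__5")  # => "'='"
names[5]                                # => "'='"
```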

Sample TokenScheme Construction

TokenData = ANTLR3::TokenScheme.new do
  define_tokens(
    :INT  => 4,
    :ID   => 6,
    :T__5 => 5,
    :WS   => 7
  )

  # note the self:: scoping below is due to the fact that
  # ruby lexically-scopes constant names instead of
  # looking up in the current scope
  register_name(self::T__5, "'='")
end

TokenData::ID           # => 6
TokenData::T__5         # => 5
TokenData.token_name(4) # => 'INT'
TokenData.token_name(5) # => "'='"

class ARecognizerOrSuch < ANTLR3::Parser
  include TokenData
  ID   # => 6
end

Custom Token Classes and Relationship with Tokens

When a TokenScheme is created, it will define a subclass of ANTLR3::CommonToken and assign it to the constant name Token. This token class will both include and extend the scheme module. Since token schemes define the private instance method token_name(type), instances of the token class are now able to provide their type names. The Token instance method name uses the token_name method to provide the type name as if it were a simple attribute, without storing the name itself.
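A minimal sketch of the idea, assuming a hand-written module in place of a real scheme: SketchNames and NamedToken are hypothetical, and the token here only includes the module. The name is computed from the type on every call, so it stays correct when the type changes.

```ruby
# Hypothetical stand-in for a scheme module.
module SketchNames
  TOKEN_NAMES = { 4 => "INT", 6 => "ID" }.freeze

  def token_name(type)
    TOKEN_NAMES[type]
  end
  # Private instance method plus a module-level copy.
  module_function :token_name
end

# Hypothetical token class that derives its name instead of storing it.
class NamedToken
  include SketchNames
  attr_accessor :type, :text

  def initialize(type, text)
    @type = type
    @text = text
  end

  def name
    token_name(type)
  end
end

token = NamedToken.new(6, "foo")
token.name                 # => "ID"
token.type = 4
token.name                 # => "INT"
SketchNames.token_name(6)  # => "ID"
```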

When a TokenScheme is included in a recognizer class, the class will now have the token types as named constants, a type-to-name map constant TOKEN_NAMES, and a grammar-specific subclass of ANTLR3::CommonToken assigned to the constant Token. Thus, when recognizers need to manufacture tokens, instead of using the generic CommonToken class, they can create tokens using the customized Token class provided by the token scheme.
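What inclusion gives a recognizer can be sketched with ordinary constant lookup. SketchGrammarTokens and SketchParser are hypothetical, and a Struct stands in for the grammar-specific CommonToken subclass; the sketch shows only that the type constants, TOKEN_NAMES, and Token all resolve inside the including class.

```ruby
# Hypothetical stand-in for a token scheme module.
module SketchGrammarTokens
  ID          = 6
  TOKEN_NAMES = { ID => "ID" }.freeze
  Token       = Struct.new(:type, :text)
end

# Hypothetical recognizer that includes the scheme.
class SketchParser
  include SketchGrammarTokens

  def make_id(text)
    # Token and ID are found through the included module.
    Token.new(ID, text)
  end
end

tok = SketchParser.new.make_id("foo")
tok.type                                      # => 6
SketchParser::TOKEN_NAMES[tok.type]           # => "ID"
tok.class.equal?(SketchGrammarTokens::Token)  # => true
```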

If you need to use a token class other than CommonToken, you can pass the class as a parameter to TokenScheme.new, which will be used in place of the dynamically-created CommonToken subclass.

Constant Summary

FETCH_KEY =
proc { | h, v | h.index( v ) }

Methods included from TokenFactory

#create_token

Methods inherited from Module

#modspace

Instance Attribute Details

#types ⇒ Object (readonly)

Returns the value of attribute types


# File 'lib/antlr3/token.rb', line 563

def types
  @types
end

#unused ⇒ Object (readonly)

Returns the value of attribute unused


# File 'lib/antlr3/token.rb', line 563

def unused
  @unused
end

Class Method Details

.build(*token_names) ⇒ Object


# File 'lib/antlr3/token.rb', line 539

def self.build( *token_names )
  token_names = [ token_names ].flatten!
  token_names.compact!
  token_names.uniq!
  tk_class = Class === token_names.first ? token_names.shift : nil
  value_maps, names = token_names.partition { |i| Hash === i }
  new( tk_class ) do
    for value_map in value_maps
      define_tokens( value_map )
    end
    
    for name in names
      define_token( name )
    end
  end
end

.new(tk_class = nil, &body) ⇒ Object


# File 'lib/antlr3/token.rb', line 511

def self.new( tk_class = nil, &body )
  super() do
    tk_class ||= Class.new( ::ANTLR3::CommonToken )
    self.token_class = tk_class
    
    const_set( :TOKEN_NAMES, ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.clone )
    
    @types  = ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.invert
    @unused = ::ANTLR3::Constants::MIN_TOKEN_TYPE
    
    scheme = self
    define_method( :token_scheme ) { scheme }
    define_method( :token_names )  { scheme::TOKEN_NAMES }
    define_method( :token_name ) do |type|
      begin
        token_names[ type ] or super
      rescue NoMethodError
        ::ANTLR3::CommonToken.token_name( type )
      end
    end
    module_function :token_name, :token_names
    
    include ANTLR3::Constants
    
    body and module_eval( &body )
  end
end

Instance Method Details

#[](name_or_value) ⇒ Object


# File 'lib/antlr3/token.rb', line 649

def []( name_or_value )
  case name_or_value
  when Integer then token_names.fetch( name_or_value, nil )
  else const_get( name_or_value.to_s ) rescue FETCH_KEY.call( token_names, name_or_value )
  end
end

#built_in_type?(type_value) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/antlr3/token.rb', line 638

def built_in_type?( type_value )
  Constants::BUILT_IN_TOKEN_NAMES.fetch( type_value, false ) and true
end

#define_token(name, value = nil) ⇒ Object


# File 'lib/antlr3/token.rb', line 572

def define_token( name, value = nil )
  name = name.to_s
  
  if current_value = @types[ name ]
    # token type has already been defined
    # raise an error unless value is the same as the current value
    value ||= current_value
    unless current_value == value
      raise NameError.new( 
        "new token type definition ``#{ name } = #{ value }'' conflicts " <<
        "with existing type definition ``#{ name } = #{ current_value }''", name
      )
    end
  else
    value ||= @unused
    if name =~ /^[A-Z]\w*$/
      const_set( name, @types[ name ] = value )
    else
      constant = "T__#{ value }"
      const_set( constant, @types[ constant ] = value )
      @types[ name ] = value
    end
    register_name( value, name ) unless built_in_type?( value )
  end
  
  value >= @unused and @unused = value + 1
  return self
end

#define_tokens(token_map = {}) ⇒ Object


# File 'lib/antlr3/token.rb', line 565

def define_tokens( token_map = {} )
  for token_name, token_value in token_map
    define_token( token_name, token_value )
  end
  return self
end

#register_name(type_value, name) ⇒ Object


# File 'lib/antlr3/token.rb', line 614

def register_name( type_value, name )
  name = name.to_s.freeze
  if token_names.has_key?( type_value )
    current_name = token_names[ type_value ]
    current_name == name and return name
    
    if current_name == "T__#{ type_value }"
      # only an anonymous name is registered -- upgrade the name to the full literal name
      token_names[ type_value ] = name
    elsif name == "T__#{ type_value }"
      # ignore name downgrade from literal to anonymous constant
      return current_name
    else
      error = NameError.new( 
        "attempted assignment of token type #{ type_value }" <<
        " to name #{ name } conflicts with existing name #{ current_name }", name
      )
      raise error
    end
  else
    token_names[ type_value ] = name.to_s.freeze
  end
end

#register_names(*names) ⇒ Object


# File 'lib/antlr3/token.rb', line 601

def register_names( *names )
  if names.length == 1 and Hash === names.first
    names.first.each do |value, name|
      register_name( value, name )
    end
  else
    names.each_with_index do |name, i|
      type_value = Constants::MIN_TOKEN_TYPE + i
      register_name( type_value, name )
    end
  end
end

#token_class ⇒ Object


# File 'lib/antlr3/token.rb', line 656

def token_class
  self::Token
end

#token_class=(klass) ⇒ Object


# File 'lib/antlr3/token.rb', line 660

def token_class=( klass )
  Class === klass or raise( TypeError, "token_class must be a Class" )
  Util.silence_warnings do
    klass < self or klass.send( :include, self )
    const_set( :Token, klass )
  end
end

#token_defined?(name_or_value) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/antlr3/token.rb', line 642

def token_defined?( name_or_value )
  case name_or_value
  when Integer then token_names.has_key?( name_or_value )
  else const_defined?( name_or_value.to_s )
  end
end