Class: ANTLR3::TokenScheme
- Inherits: Module (Object > Module > ANTLR3::TokenScheme)
- Includes: TokenFactory
- Defined in: lib/antlr3/token.rb
Overview
TokenSchemes exist to handle the problem of defining token types as integer values while maintaining meaningful text names for the types. They are dynamically defined modules that map integer values to constants with token-type names.
Fundamentally, tokens exist to take a chunk of text and identify it as belonging to some category, like "VARIABLE" or "INTEGER". In code, the category is represented by an integer: some arbitrary value that ANTLR decides to use as it creates the recognizer. The purpose of using an integer (instead of, say, a Ruby symbol) is that ANTLR's decision logic often needs to test whether a token's type falls within a range, which is not practical with symbols.
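As a minimal sketch of the range test described above: the module name SampleTypes and the type values are hypothetical, chosen only for illustration, and are not values ANTLR is guaranteed to assign.

```ruby
# Hypothetical type values; a real recognizer receives these from ANTLR.
module SampleTypes
  INT   = 4
  FLOAT = 5
  ID    = 6
  WS    = 7

  # With integer types, "is this token numeric?" is one range check.
  NUMERIC = INT..FLOAT

  def self.numeric?(type)
    NUMERIC.include?(type)
  end
end

SampleTypes.numeric?(SampleTypes::FLOAT) # => true
SampleTypes.numeric?(SampleTypes::ID)    # => false
```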
The downside of token types being represented as integers is that a developer needs to be able to reference the unknown type value by name in action code. Furthermore, code that references the type by name and tokens that can be inspected with names in place of type values are more meaningful to a developer.
Since ANTLR requires token type names to follow capital-letter naming conventions, defining types as named constants of the recognizer class resolves the problem of referencing type values by name. Thus, a token type like "VARIABLE" can be represented by a number like 5 and referenced within code by VARIABLE. However, when a recognizer creates tokens, the name of the token's type cannot be seen without using the data defined in the recognizer.
Of course, tokens could be defined with a name attribute that could be specified when tokens are created. However, doing so would make tokens take up more space than necessary, as well as making it difficult to change the type of a token while maintaining a correct name value.
TokenSchemes exist as a technique to manage token type referencing and name extraction. They:
- keep token type references clear and understandable in recognizer code
- permit access to a token's type-name independently of recognizer objects
- allow multiple classes to share the same token information
Building Token Schemes
TokenScheme is a subclass of Module. Thus, it has the method TokenScheme.new(tk_class = nil) { ... module-level code ... }, which evaluates the block in the context of the scheme (module), similarly to Module#module_eval. Before evaluating the block, .new sets up the module with the following actions:
- define a customized token class (more on that below)
- add a new constant, TOKEN_NAMES, which is a hash that maps types to names
- dynamically populate the new scheme module with a couple instance methods
- include ANTLR3::Constants in the new scheme module
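The setup steps above can be sketched with a small Module subclass. MiniScheme is a hypothetical stand-in written for this illustration, not the ANTLR3 class, and it omits the token class and the ANTLR3::Constants include.

```ruby
# MiniScheme is a hypothetical stand-in for ANTLR3::TokenScheme.
class MiniScheme < Module
  def self.new(&body)
    # Module.new evaluates its block with self set to the new module.
    super() do
      const_set(:TOKEN_NAMES, {})                 # type => name map
      scheme = self
      define_method(:token_names) { scheme::TOKEN_NAMES }
      define_method(:token_name)  { |type| token_names[type] }
      # Make both methods callable on the module itself as well.
      module_function :token_name, :token_names
      # Finally run the caller's module-level code, if any.
      body and module_eval(&body)
    end
  end
end

demo = MiniScheme.new do
  const_set(:ID, 6)
  self::TOKEN_NAMES[6] = 'ID'
end

demo::ID           # => 6
demo.token_name(6) # => "ID"
```

Because the block is evaluated inside the new module, calls such as const_set need no explicit receiver.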
As TokenScheme the class functions as a metaclass, figuring out some of the scoping behavior can be mildly confusing if you're trying to get a handle on the entity for your own purposes. Remember that all of the instance methods of TokenScheme function as module-level methods of TokenScheme instances, à la attr_accessor and friends.
TokenScheme#define_token(name_symbol, int_value) adds a constant definition name_symbol with the value int_value. It is essentially like Module#const_set, except it forbids constant overwriting (which would mess up recognizer code fairly badly) and adds an inverse type-to-name map to its own TOKEN_NAMES table. TokenScheme#define_tokens is a convenience method for defining many types with a hash pairing names to values.
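A simplified sketch of the two guarantees described for define_token (no conflicting redefinition, plus an inverse map). TinyScheme and its names reader are hypothetical and exist only for this example; the real method handles more cases.

```ruby
# TinyScheme is a hypothetical stand-in, not the ANTLR3 class.
class TinyScheme < Module
  attr_reader :names     # value => name (the inverse map)

  def initialize
    super
    @types = {}          # name => value
    @names = {}
  end

  def define_token(name, value)
    name = name.to_s
    current = @types[name]
    if current && current != value
      raise NameError, "#{name} = #{value} conflicts with #{name} = #{current}"
    end
    unless current
      const_set(name, @types[name] = value)
      @names[value] = name
    end
    self
  end
end

scheme = TinyScheme.new
scheme.define_token(:ID, 6)
scheme::ID                   # => 6
scheme.names[6]              # => "ID"
scheme.define_token(:ID, 6)  # same value again: accepted

conflict = begin
  scheme.define_token(:ID, 9)
rescue NameError => e
  e.message                  # reported instead of silently redefining ID
end
```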
TokenScheme#register_name(value, name_string) specifies a custom type-to-name definition. This is particularly useful for the anonymous tokens that ANTLR generates for literal strings in the grammar specification. For example, if you refer to the literal '=' in some parser rule in your grammar, ANTLR will add a lexer rule for the literal and give the token a name like T__x, where x is the type's integer value. Since this is pretty meaningless to a developer, generated code should add a special name definition for type value x with the string "'='".
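The naming rule for anonymous tokens can be sketched on a plain Hash. The helper below is hypothetical and only mirrors the rule as described here: an anonymous T__x name may be upgraded to a literal name, but never the other way around.

```ruby
# Hypothetical helper; `names` is a plain type => name Hash.
def register_name(names, type_value, name)
  anonymous = "T__#{type_value}"
  current = names[type_value]
  if current.nil? || current == anonymous
    names[type_value] = name      # first name, or upgrade from T__x
  elsif name == anonymous || name == current
    current                       # keep the better name already present
  else
    raise NameError, "type #{type_value} is already named #{current}"
  end
end

names = { 5 => 'T__5' }
register_name(names, 5, "'='")    # upgrades the anonymous name
names[5]                          # => "'='"
register_name(names, 5, 'T__5')   # ignored, no downgrade
names[5]                          # => "'='"
```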
Sample TokenScheme Construction
TokenData = ANTLR3::TokenScheme.new do
  define_tokens(
    :INT => 4,
    :ID => 6,
    :T__5 => 5,
    :WS => 7
  )
  # note the self:: scoping below is due to the fact that
  # ruby lexically-scopes constant names instead of
  # looking up in the current scope
  register_name(self::T__5, "'='")
end
TokenData::ID # => 6
TokenData::T__5 # => 5
TokenData.token_name(4) # => 'INT'
TokenData.token_name(5) # => "'='"
class ARecognizerOrSuch < ANTLR3::Parser
  include TokenData
  ID # => 6
end
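The self:: scoping mentioned in the sample's comment can be demonstrated in isolation. In this sketch a plain Module stands in for a token scheme, and the variable names are illustrative only.

```ruby
# A plain Module stands in for a token scheme here.
scheme = Module.new
scheme.const_set(:T__5, 5)

# Inside a block, a bare constant is resolved where the block was written
# (here, the top level), not in the module the block is evaluated in.
bare = begin
  scheme.module_eval { T__5 }
rescue NameError
  :name_error
end

# Going through self looks the constant up on the module itself.
scoped = scheme.module_eval { self::T__5 }

bare    # => :name_error
scoped  # => 5
```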
Custom Token Classes and Relationship with Tokens
When a TokenScheme is created, it will define a subclass of ANTLR3::CommonToken and assign it to the constant name Token. This token class will both include and extend the scheme module. Since token schemes define the private instance method token_name(type), instances of the token class are now able to provide their type names. The Token method name uses the token_name method to provide the type name as if it were a simple attribute without storing the name itself.
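A minimal sketch of a token deriving its name from its type on demand. SketchTypes, BaseToken and SketchToken are hypothetical stand-ins for the scheme module, ANTLR3::CommonToken and the generated Token class.

```ruby
# All names below are hypothetical stand-ins for the ANTLR3 classes.
module SketchTypes
  TOKEN_NAMES = { 4 => 'INT', 6 => 'ID' }.freeze

  def token_name(type)
    TOKEN_NAMES[type]
  end
  private :token_name
end

BaseToken = Struct.new(:type, :text)

class SketchToken < BaseToken
  include SketchTypes

  # The name is computed from the type, so it is never stored on the
  # token and cannot go stale when the type changes.
  def name
    token_name(type)
  end
end

tok = SketchToken.new(6, 'foo')
tok.name      # => "ID"
tok.type = 4
tok.name      # => "INT"
```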
When a TokenScheme is included in a recognizer class, the class will now have the token types as named constants, a type-to-name map constant TOKEN_NAMES, and a grammar-specific subclass of ANTLR3::CommonToken assigned to the constant Token. Thus, when recognizers need to manufacture tokens, instead of using the generic CommonToken class, they can create tokens using the customized Token class provided by the token scheme.
If you need to use a token class other than CommonToken, you can pass the class as a parameter to TokenScheme.new, which will be used in place of the dynamically-created CommonToken subclass.
Instance Attribute Summary
- - (Object) types (readonly): Returns the value of attribute types.
- - (Object) unused (readonly): Returns the value of attribute unused.
Class Method Summary
- + (Object) build(*token_names)
- + (Object) new(tk_class = nil, &body)
Instance Method Summary
- - (Object) [](name_or_value)
- - (Boolean) built_in_type?(type_value)
- - (Object) define_token(name, value = nil)
- - (Object) define_tokens(token_map = {})
- - (Object) register_name(type_value, name)
- - (Object) register_names(*names)
- - (Object) token_class
- - (Object) token_class=(klass)
- - (Boolean) token_defined?(name_or_value)
Instance Attribute Details
- (Object) types (readonly)
Returns the value of attribute types
# File 'lib/antlr3/token.rb', line 554
def types
  @types
end
- (Object) unused (readonly)
Returns the value of attribute unused
# File 'lib/antlr3/token.rb', line 554
def unused
  @unused
end
Class Method Details
+ (Object) build(*token_names)
# File 'lib/antlr3/token.rb', line 530
def self.build( *token_names )
  token_names = [ token_names ].flatten!
  token_names.compact!
  token_names.uniq!
  tk_class = Class === token_names.first ? token_names.shift : nil
  value_maps, names = token_names.partition { |i| Hash === i }
  new( tk_class ) do
    for value_map in value_maps
      define_tokens( value_map )
    end
    for name in names
      define_token( name )
    end
  end
end
+ (Object) new(tk_class = nil, &body)
# File 'lib/antlr3/token.rb', line 502
def self.new( tk_class = nil, &body )
  super() do
    tk_class ||= Class.new( ::ANTLR3::CommonToken )
    self.token_class = tk_class
    const_set( :TOKEN_NAMES, ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.clone )
    @types = ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.invert
    @unused = ::ANTLR3::Constants::MIN_TOKEN_TYPE
    scheme = self
    define_method( :token_scheme ) { scheme }
    define_method( :token_names ) { scheme::TOKEN_NAMES }
    define_method( :token_name ) do |type|
      begin
        token_names[ type ] or super
      rescue NoMethodError
        ::ANTLR3::CommonToken.token_name( type )
      end
    end
    module_function :token_name, :token_names
    include ANTLR3::Constants
    body and module_eval( &body )
  end
end
Instance Method Details
- (Object) [](name_or_value)
# File 'lib/antlr3/token.rb', line 640
def []( name_or_value )
  case name_or_value
  when Integer then token_names.fetch( name_or_value, nil )
  else const_get( name_or_value.to_s ) rescue token_names.index( name_or_value )
  end
end
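A sketch of the two-way lookup this method performs, using plain Hashes in place of the scheme's constants and TOKEN_NAMES table (the tables and the lambda are hypothetical). Note that Hash#index was removed in Ruby 3.0; the sketch uses Hash#key, which performs the same value-to-key lookup.

```ruby
# Hypothetical tables standing in for the scheme's data.
names = { 4 => 'INT', 5 => "'='" }   # type => name
types = { 'INT' => 4, 'T__5' => 5 }  # constant name => type

lookup = lambda do |name_or_value|
  case name_or_value
  when Integer then names[name_or_value]
  else
    # Try the constant name first, then fall back to a registered name.
    types.fetch(name_or_value.to_s) { names.key(name_or_value) }
  end
end

lookup.call(4)      # => "INT"
lookup.call(:INT)   # => 4
lookup.call("'='")  # => 5
```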
- (Boolean) built_in_type?(type_value)
# File 'lib/antlr3/token.rb', line 629
def built_in_type?( type_value )
  Constants::BUILT_IN_TOKEN_NAMES.fetch( type_value, false ) and true
end
- (Object) define_token(name, value = nil)
# File 'lib/antlr3/token.rb', line 563
def define_token( name, value = nil )
  name = name.to_s

  if current_value = @types[ name ]
    # token type has already been defined
    # raise an error unless value is the same as the current value
    value ||= current_value
    unless current_value == value
      raise NameError.new(
        "new token type definition ``#{ name } = #{ value }'' conflicts " <<
        "with existing type definition ``#{ name } = #{ current_value }''", name
      )
    end
  else
    value ||= @unused
    if name =~ /^[A-Z]\w*$/
      const_set( name, @types[ name ] = value )
    else
      constant = "T__#{ value }"
      const_set( constant, @types[ constant ] = value )
      @types[ name ] = value
    end
    register_name( value, name ) unless built_in_type?( value )
  end

  value >= @unused and @unused = value + 1
  return self
end
- (Object) define_tokens(token_map = {})
# File 'lib/antlr3/token.rb', line 556
def define_tokens( token_map = {} )
  for token_name, token_value in token_map
    define_token( token_name, token_value )
  end
  return self
end
- (Object) register_name(type_value, name)
# File 'lib/antlr3/token.rb', line 605
def register_name( type_value, name )
  name = name.to_s.freeze
  if token_names.has_key?( type_value )
    current_name = token_names[ type_value ]
    current_name == name and return name

    if current_name == "T__#{ type_value }"
      # only an anonymous name is registered -- upgrade the name to the full literal name
      token_names[ type_value ] = name
    elsif name == "T__#{ type_value }"
      # ignore name downgrade from literal to anonymous constant
      return current_name
    else
      error = NameError.new(
        "attempted assignment of token type #{ type_value }" <<
        " to name #{ name } conflicts with existing name #{ current_name }", name
      )
      raise error
    end
  else
    token_names[ type_value ] = name.to_s.freeze
  end
end
- (Object) register_names(*names)
# File 'lib/antlr3/token.rb', line 592
def register_names( *names )
  if names.length == 1 and Hash === names.first
    names.first.each do |value, name|
      register_name( value, name )
    end
  else
    names.each_with_index do |name, i|
      type_value = Constants::MIN_TOKEN_TYPE + i
      register_name( type_value, name )
    end
  end
end
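When register_names receives a plain list rather than a Hash, each name gets a type value by position, counting up from Constants::MIN_TOKEN_TYPE. The sketch below illustrates that numbering with a hypothetical helper; the starting value 4 is an assumption made for the example, not read from the library.

```ruby
# Hypothetical helper; first_type = 4 is an assumed stand-in for
# Constants::MIN_TOKEN_TYPE.
def positional_names(names, first_type = 4)
  table = {}
  names.each_with_index do |name, i|
    table[first_type + i] = name   # the i-th name gets first_type + i
  end
  table
end

table = positional_names(['INT', "'='", 'ID'])
table[4]  # => "INT"
table[5]  # => "'='"
table[6]  # => "ID"
```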
- (Object) token_class
# File 'lib/antlr3/token.rb', line 647
def token_class
  self::Token
end
- (Object) token_class=(klass)
# File 'lib/antlr3/token.rb', line 651
def token_class=( klass )
  Class === klass or raise( TypeError, "token_class must be a Class" )
  Util.silence_warnings do
    klass < self or klass.send( :include, self )
    const_set( :Token, klass )
  end
end
- (Boolean) token_defined?(name_or_value)
# File 'lib/antlr3/token.rb', line 633
def token_defined?( name_or_value )
  # dispatch on the argument itself (there is no local named `value` here)
  case name_or_value
  when Integer then token_names.has_key?( name_or_value )
  else const_defined?( name_or_value.to_s )
  end
end