Class: ANTLR3::CommonTokenStream

Inherits:
Object
Includes:
TokenStream, Enumerable
Defined in:
lib/antlr3/streams.rb

Overview

CommonTokenStream serves as the primary token stream implementation for feeding sequential token input into parsers.

Using some TokenSource (such as a lexer), the stream collects a token sequence, setting each token's index attribute to indicate its position within the stream. The stream may be tuned to some channel value; off-channel tokens will be filtered out by the #peek, #look, and #consume methods.

Sample Usage

source_input = ANTLR3::StringStream.new("35 * 4 - 1")
lexer = Calculator::Lexer.new(source_input)
tokens = ANTLR3::CommonTokenStream.new(lexer)

# assume this grammar defines whitespace as tokens on channel HIDDEN
# and numbers and operations as tokens on channel DEFAULT
tokens.look         # => 0 INT['35'] @ line 1 col 0 (0..1)
tokens.look(2)      # => 2 MULT["*"] @ line 1 col 3 (3..3)
tokens.tokens(0, 2)
  # => [0 INT["35"] @ line 1 col 0 (0..1), 
  #     1 WS[" "] @ line 1 col 2 (2..2), 
  #     2 MULT["*"] @ line 1 col 3 (3..3)]
  # notice the #tokens method does not filter off-channel tokens

lexer.reset
hidden_tokens = 
  ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
hidden_tokens.look # => 1 WS[' '] @ line 1 col 2 (2..2)

Direct Known Subclasses

TokenRewriteStream

Constant Summary

Constants included from Constants

ANTLR3::Constants::BUILT_IN_TOKEN_NAMES, ANTLR3::Constants::DEFAULT, ANTLR3::Constants::DOWN, ANTLR3::Constants::EOF, ANTLR3::Constants::EOF_TOKEN, ANTLR3::Constants::EOR_TOKEN_TYPE, ANTLR3::Constants::HIDDEN, ANTLR3::Constants::INVALID, ANTLR3::Constants::INVALID_TOKEN, ANTLR3::Constants::MEMO_RULE_FAILED, ANTLR3::Constants::MEMO_RULE_UNKNOWN, ANTLR3::Constants::MIN_TOKEN_TYPE, ANTLR3::Constants::SKIP_TOKEN, ANTLR3::Constants::UP

Instance Attribute Summary

Attributes included from TokenStream

#channel, #last_marker, #position, #token_source

Attributes included from Stream

#source_name

Instance Method Summary

Constructor Details

#initialize(token_source, options = {}) ⇒ CommonTokenStream

constructs a new token stream using the token_source provided. token_source is usually a lexer, but can be any object that implements next_token and includes ANTLR3::TokenSource.

If a block is provided, each token harvested will be yielded, and if the block returns a nil or false value, the token will not be added to the stream; it will be discarded.

Options

:channel

The channel value the stream should be tuned to initially

:source_name

The source name (file name) attribute of the stream

Example

# create a new token stream that is tuned to channel :comment, and
# discard all WHITE_SPACE tokens
ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
  token.name != 'WHITE_SPACE'
end

# File 'lib/antlr3/streams.rb', line 780

def initialize( token_source, options = {} )
  case token_source
  when CommonTokenStream
    # this is useful in cases where you want to convert a CommonTokenStream
    # to a RewriteTokenStream or other variation of the standard token stream
    stream = token_source
    @token_source = stream.token_source
    @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
    @source_name = options.fetch( :source_name ) { stream.source_name }
    tokens = stream.tokens.map { | t | t.dup }
  else
    @token_source = token_source
    @channel = options.fetch( :channel, DEFAULT_CHANNEL )
    @source_name = options.fetch( :source_name ) {  @token_source.source_name rescue nil }
    tokens = @token_source.to_a
  end
  @last_marker = nil
  @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
  @tokens.each_with_index { |t, i| t.index = i }
  @position = 
    if first_token = @tokens.find { |t| t.channel == @channel }
      @tokens.index( first_token )
    else @tokens.length
    end
end

Instance Method Details

#<<(k) ⇒ Object


# File 'lib/antlr3/streams.rb', line 938

def << k
  self >> -k
end

#[](i, *args) ⇒ Object

identical to Array#[], as applied to the stream's token buffer


# File 'lib/antlr3/streams.rb', line 1064

def []( i, *args )
  @tokens[ i, *args ]
end

#at(i) ⇒ Object


# File 'lib/antlr3/streams.rb', line 1057

def at( i )
  @tokens.at i
end

#consume ⇒ Object

advance the stream one step to the next on-channel token


# File 'lib/antlr3/streams.rb', line 901

def consume
  token = @tokens[ @position ] || EOF_TOKEN
  if @position < @tokens.length
    @position = future?( 2 ) || @tokens.length
  end
  return( token )
end
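
For illustration, a usage sketch continuing the "35 * 4 - 1" stream from the overview (INT and MULT are assumed token names from that example grammar):

tokens.reset
tokens.look.name   # => "INT"    the stream rests on the `35` token
tokens.consume     # returns the INT token and skips the hidden whitespace
tokens.look.name   # => "MULT"   the stream now rests on `*`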

#each(*args) ⇒ Object

yields each token in the stream (including off-channel tokens). If no block is provided, the method returns an Enumerator object. #each accepts the same arguments as #tokens.


# File 'lib/antlr3/streams.rb', line 996

def each( *args )
  block_given? or return enum_for( :each, *args )
  tokens( *args ).each { |token| yield( token ) }
end
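
A brief sketch of #each (all tokens, hidden whitespace included), again using the overview's stream:

tokens.each { |token| print token.text }       # prints "35 * 4 - 1"
texts = tokens.each.map { |token| token.text } # without a block, an Enumerator is returned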

#each_on_channel(channel = @channel) ⇒ Object

yields each token in the stream with the given channel value. If no channel value is given, the stream's tuned channel value will be used. If no block is given, an enumerator will be returned.


# File 'lib/antlr3/streams.rb', line 1007

def each_on_channel( channel = @channel )
  block_given? or return enum_for( :each_on_channel, channel )
  for token in @tokens
    token.channel == channel and yield( token )
  end
end
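
For example, with whitespace hidden as in the overview:

tokens.each_on_channel { |t| print t.text }                   # prints "35*4-1"
tokens.each_on_channel( ANTLR3::HIDDEN ) { |t| print t.text } # just the whitespace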

#extract_text(start = 0, stop = @tokens.length - 1) ⇒ Object Also known as: to_s

fetches the text content of all tokens between start and stop and joins the chunks into a single string


# File 'lib/antlr3/streams.rb', line 1081

def extract_text( start = 0, stop = @tokens.length - 1 )
  start = start.to_i.at_least( 0 )
  stop = stop.to_i.at_most( @tokens.length )
  @tokens[ start..stop ].map! { |t| t.text }.join( '' )
end
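
A short sketch with the overview's "35 * 4 - 1" buffer:

tokens.extract_text          # => "35 * 4 - 1"  (whole buffer, hidden tokens included)
tokens.extract_text( 0, 2 )  # => "35 *"        (text of tokens 0 through 2)
tokens.to_s                  # same as the argument-free form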

#future?(k = 1) ⇒ Integer?

returns the index of the on-channel token at look-ahead position k, or nil if no more on-channel tokens exist

Returns:

  • (Integer, nil)

# File 'lib/antlr3/streams.rb', line 946

def future?( k = 1 )
  @position == -1 and fill_buffer
  
  case
  when k == 0 then nil
  when k < 0 then past?( -k )
  when k == 1 then @position
  else
    # since the stream only yields on-channel
    # tokens, the stream can't just go to the
    # next position, but rather must skip
    # over off-channel tokens
    ( k - 1 ).times.inject( @position ) do |cursor, |
      begin
        tk = @tokens.at( cursor += 1 ) or return( cursor )
        # ^- if tk is nil (i.e. i is outside array limits)
      end until tk.channel == @channel
      cursor
    end
  end
end
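
An illustrative sketch (overview stream, rewound to the start):

tokens.reset
tokens.future?       # => 0  index of the current on-channel token (`35`)
tokens.future?( 2 )  # => 2  index of the next on-channel token (`*`), skipping the WS at index 1
tokens.consume
tokens.past?         # => 0  index of the most recently consumed on-channel token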

#hold(pos = @position) ⇒ Object

saves the current stream position, yields to the block, and then ensures the stream's position is restored before returning the value of the block


# File 'lib/antlr3/streams.rb', line 887

def hold( pos = @position )
  block_given? or return enum_for( :hold, pos )
  begin
    yield
  ensure
    seek( pos )
  end
end
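
For instance, #hold makes it easy to probe ahead without losing the current position (a sketch):

before = tokens.position
tokens.hold do
  2.times { tokens.consume }  # move around freely inside the block...
end
tokens.position == before     # => true -- the position is restored on exit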

#inspect ⇒ Object

Standard Conversion Methods


# File 'lib/antlr3/streams.rb', line 1069

def inspect
  string = "#<%p: @token_source=%p @ %p/%p" %
    [ self.class, @token_source.class, @position, @tokens.length ]
  tk = look( -1 ) and string << " #{ tk.inspect } <--"
  tk = look( 1 ) and string << " --> #{ tk.inspect }"
  string << '>'
end

#look(k = 1) ⇒ Object Also known as: >>

operates similarly to #peek, but returns the full token object at look-ahead position k


# File 'lib/antlr3/streams.rb', line 932

def look( k = 1 )
  index = future?( k ) or return nil
  @tokens.fetch( index, EOF_TOKEN )
end
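
A small usage sketch (MULT is an assumed token name from the overview's grammar):

tokens.look        # the current on-channel token object
tokens.look( 2 )   # the on-channel token after that (`*` in the overview example)
tokens.look( -1 )  # the previously consumed on-channel token, or nil at the start
tokens >> 2        # `>>` is an alias for #look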

#mark ⇒ Object

bookmark the current position of the input stream


# File 'lib/antlr3/streams.rb', line 869

def mark
  @last_marker = @position
end

#past?(k = 1) ⇒ Integer?

returns the index of the on-channel token at look-behind position k, or nil if no other on-channel tokens exist before the current token

Returns:

  • (Integer, nil)

# File 'lib/antlr3/streams.rb', line 972

def past?( k = 1 )
  @position == -1 and fill_buffer
  
  case
  when k == 0 then nil
  when @position - k < 0 then nil
  else
    
    k.times.inject( @position ) do |cursor, |
      begin
        cursor <= 0 and return( nil )
        tk = @tokens.at( cursor -= 1 ) or return( nil )
      end until tk.channel == @channel
      cursor
    end
    
  end
end

#peek(k = 1) ⇒ Object

returns the type of the on-channel token at look-ahead distance k. k = 1 represents the current token; k greater than 1 represents upcoming on-channel tokens; a negative value of k returns previously consumed on-channel tokens, where k = -1 is the last on-channel token consumed. k = 0 has undefined behavior and returns nil.


# File 'lib/antlr3/streams.rb', line 925

def peek( k = 1 )
  tk = look( k ) and return( tk.type )
end
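
For example, peek supports type-based decisions without fetching the token object (a sketch):

tokens.reset
tokens.peek        # => the type constant of the current on-channel token (`35`)
tokens.peek( 2 )   # => the type of the next on-channel token (`*`)
tokens.peek( -1 )  # => nil -- nothing has been consumed yet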

#rebuild(token_source = nil) ⇒ Object

resets the token stream and rebuilds it with a potentially new token source. If no token_source value is provided, the stream will attempt to reset the current token_source by calling reset on that object. The stream will then clear the token buffer and attempt to harvest new tokens. As with CommonTokenStream.new, if a block is provided, each token is yielded and discarded if the block returns a false or nil value.


# File 'lib/antlr3/streams.rb', line 814

def rebuild( token_source = nil )
  if token_source.nil?
    @token_source.reset rescue nil
  else @token_source = token_source
  end
  @tokens = block_given? ? @token_source.select { |token| yield( token ) } :   
                           @token_source.to_a
  @tokens.each_with_index { |t, i| t.index = i }
  @last_marker = nil
  @position = 
    if first_token = @tokens.find { |t| t.channel == @channel }
      @tokens.index( first_token )
    else @tokens.length
    end
  return self
end
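
A usage sketch (another_lexer is a hypothetical replacement token source):

tokens.rebuild                        # reset the current lexer and re-harvest its tokens
tokens.rebuild( another_lexer )       # or swap in a different token source
tokens.rebuild { |t| t.channel == ANTLR3::DEFAULT }  # discard tokens the block rejects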

#release(marker = nil) ⇒ Object


# File 'lib/antlr3/streams.rb', line 873

def release( marker = nil )
  # do nothing
end

#reset ⇒ Object

rewind the stream to its initial state


# File 'lib/antlr3/streams.rb', line 858

def reset
  @position = 0
  @position += 1 while token = @tokens[ @position ] and
                       token.channel != @channel
  @last_marker = nil
  return self
end

#rewind(marker = @last_marker, release = true) ⇒ Object


# File 'lib/antlr3/streams.rb', line 878

def rewind( marker = @last_marker, release = true )
  seek( marker )
end
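
Together, #mark and #rewind provide simple backtracking; a sketch:

marker = tokens.mark       # bookmark the current position
2.times { tokens.consume }
tokens.rewind( marker )    # jump back to the bookmarked position
tokens.rewind              # with no argument, the last marker is reused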

#seek(index) ⇒ Object

jump to the stream position specified by index. Note: seek does not check whether or not the token at the specified position is on-channel.

# File 'lib/antlr3/streams.rb', line 914

def seek( index )
  @position = index.to_i.bound( 0, @tokens.length )
  return self
end
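
For example (indexes refer to the overview's token buffer, where index 1 is hidden whitespace):

tokens.seek( 1 )   # jump straight to index 1, even though that token is off-channel
tokens.look        # => the WS token at index 1, regardless of its channel
tokens.seek( -5 )  # out-of-range values are clamped by the bound call above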

#size ⇒ Object Also known as: length


# File 'lib/antlr3/streams.rb', line 847

def size
  @tokens.length
end

#token_class ⇒ Object


# File 'lib/antlr3/streams.rb', line 838

def token_class
  @token_source.token_class
rescue NoMethodError
  @position == -1 and fill_buffer
  @tokens.empty? ? CommonToken : @tokens.first.class
end

#tokens(start = nil, stop = nil) ⇒ Object

returns a copy of the token buffer. If start and stop are provided, tokens returns a slice of the token buffer from start..stop. The parameters are converted to integers with their to_i methods, so token objects may also be passed to specify start and stop. If a block is provided, tokens are yielded and filtered out of the return array if the block returns a false or nil value.


# File 'lib/antlr3/streams.rb', line 1044

def tokens( start = nil, stop = nil )
  stop.nil?  || stop >= @tokens.length and stop = @tokens.length - 1
  start.nil? || stop < 0 and start = 0
  tokens = @tokens[ start..stop ]
  
  if block_given?
    tokens.delete_if { |t| not yield( t ) }
  end
  
  return( tokens )
end
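
Usage sketch (overview stream, whitespace on the HIDDEN channel):

tokens.tokens                                       # a copy of the entire token buffer
tokens.tokens( 0, 2 )                               # tokens 0 through 2, hidden WS included
tokens.tokens { |t| t.channel == ANTLR3::HIDDEN }   # keep only the whitespace tokens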

#tune_to(channel) ⇒ Object

tune the stream to a new channel value


# File 'lib/antlr3/streams.rb', line 834

def tune_to( channel )
  @channel = channel
end
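
For example, to flip the stream over to hidden-channel tokens and back (a sketch):

tokens.tune_to( ANTLR3::HIDDEN )   # peek/look/consume now see only hidden tokens
tokens.tune_to( ANTLR3::DEFAULT )  # back to the default channel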

#walk ⇒ Object

iterates through the token stream, yielding each on-channel token along the way. After iteration has completed, the stream's position will be restored to where it was before #walk was called. Unlike #each and #each_on_channel, which do not change the stream's position during iteration, #walk advances through the stream; this makes it possible to look ahead and behind the current token during iteration. If no block is given, an enumerator will be returned.


# File 'lib/antlr3/streams.rb', line 1022

def walk
  block_given? or return enum_for( :walk )
  initial_position = @position
  begin
    while token = look and token.type != EOF
      consume
      yield( token )
    end
    return self
  ensure
    @position = initial_position
  end
end
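
A sketch showing the difference in practice: because the stream advances during the walk, #look inside the block refers to the token that follows the one being visited:

tokens.walk do |token|
  upcoming = tokens.look   # the walk has already consumed `token`, so this is the next on-channel token
  puts "#{ token.text } -> #{ upcoming.text.inspect }"
end
# after the loop, the stream position is back where it started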