Class: RLTK::Lexer

Inherits:
Object
Defined in:
lib/rltk/lexer.rb

Overview

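The Lexer class may be subclassed to produce new lexers. Subclasses declare rules with rule (alias r) and can use lexing states, rule flags, and either longest-match or first-match selection; these features are described in the main documentation.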

Direct Known Subclasses

RLTK::Lexers::Calculator, RLTK::Lexers::EBNF

Defined Under Namespace

Classes: Environment, Rule

Constructor Details

#initialize ⇒ Lexer

Instantiates a new lexer and creates an environment to be used for subsequent calls.



# File 'lib/rltk/lexer.rb', line 212

def initialize
	@env = self.class::Environment.new(self.class.start_state)
end

Class Attribute Details

.start_state ⇒ Symbol (readonly)

Returns State in which the lexer starts.

Returns:

  • (Symbol)

    State in which the lexer starts.



# File 'lib/rltk/lexer.rb', line 56

def start_state
  @start_state
end

Instance Attribute Details

#env ⇒ Environment (readonly)

Returns Environment used by an instantiated lexer.

Returns:

  • (Environment)

    Environment used by an instantiated lexer.



# File 'lib/rltk/lexer.rb', line 48

def env
  @env
end

Class Method Details

.inherited(klass) ⇒ void

This method returns an undefined value.

Called when the Lexer class is subclassed; it installs the necessary instance class variables.



# File 'lib/rltk/lexer.rb', line 71

def inherited(klass)
	klass.install_icvars
end

.install_icvars ⇒ void

This method returns an undefined value.

Installs instance class variables into a class.



# File 'lib/rltk/lexer.rb', line 61

def install_icvars
	@match_type	= :longest
	@rules		= Hash.new {|h,k| h[k] = Array.new}
	@start_state	= :default
end
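
The combination of inherited and install_icvars gives every subclass its own copy of the per-class settings, so rules added to one lexer do not leak into another. A minimal sketch of that pattern follows; BaseLexer, LexerA, and LexerB are hypothetical names used only for illustration and are not part of RLTK.

```ruby
# Hypothetical stand-in for the pattern; not RLTK's actual class.
class BaseLexer
  class << self
    attr_reader :match_type, :rules, :start_state

    # Give the receiving class its own copies of the per-class settings.
    def install_icvars
      @match_type  = :longest
      @rules       = Hash.new { |h, k| h[k] = [] }
      @start_state = :default
    end

    # Ruby calls this hook whenever a subclass is created.
    def inherited(klass)
      super
      klass.install_icvars
    end
  end
end

class LexerA < BaseLexer; end
class LexerB < BaseLexer; end

LexerA.rules[:default] << :some_rule

p LexerA.rules[:default] # => [:some_rule]
p LexerB.rules[:default] # => []
```

Because the variables live on each class object rather than in shared class variables, LexerB is unaffected by the rule registered on LexerA.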

.lex(string, file_name = nil, env = self::Environment.new(@start_state)) ⇒ Array<Token>

Lexes string using env as the environment. Returns the array of tokens generated by the lexer, with a token of type EOS (End of Stream) appended to the end.

Parameters:

  • string (String)

    String to be lexed.

  • file_name (String) (defaults to: nil)

    File name used for recording token positions.

  • env (Environment) (defaults to: self::Environment.new(@start_state))

    Lexing environment.

Returns:

  • (Array<Token>)



# File 'lib/rltk/lexer.rb', line 84

def lex(string, file_name = nil, env = self::Environment.new(@start_state))
	# Offset from start of stream.
	stream_offset = 0
		
	# Offset from the start of the line.
	line_offset = 0
	line_number = 1

	# Empty token list.
	tokens = Array.new

	# The scanner.
	scanner = StringScanner.new(string)

	# Start scanning the input string.
	until scanner.eos?
		match = nil
	
		# If the match_type is set to :longest all of the
		# rules for the current state need to be scanned
		# and the longest match returned.  If the
		# match_type is :first, we only need to scan until
		# we find a match.
		@rules[env.state].each do |rule|
			if (rule.flags - env.flags).empty?
				if txt = scanner.check(rule.pattern)
					if not match or match.first.length < txt.length
						match = [txt, rule]
					
						break if @match_type == :first
					end
				end
			end
		end
	
		if match
			rule = match.last
		
			txt = scanner.scan(rule.pattern)
			type, value = env.rule_exec(rule.pattern.match(txt), txt, &rule.action)
		
			if type
				pos = StreamPosition.new(stream_offset, line_number, line_offset, txt.length, file_name)
				tokens << Token.new(type, value, pos) 
			end
		
			# Advance our stat counters.
			stream_offset += txt.length
		
			if (newlines = txt.count("\n")) > 0
				line_number += newlines
				line_offset  = 0
			else
				line_offset += txt.length()
			end
		else
			error = LexingError.new(stream_offset, line_number, line_offset, scanner.post_match)
			raise(error, 'Unable to match string with any of the given rules')
		end
	end

	return tokens << Token.new(:EOS)
end
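
The rule-selection step in the loop above can be isolated to show how :longest and :first differ. This is a simplified sketch, not RLTK code: pick_match and RULES are hypothetical names, and rules are reduced to name and pattern pairs with no states, flags, or actions.

```ruby
require 'strscan'

# Hypothetical helper: returns [text, name] for the chosen rule, or nil.
def pick_match(scanner, rules, match_type)
  match = nil

  rules.each do |name, pattern|
    # check looks ahead at the current position without consuming input.
    txt = scanner.check(pattern)
    next unless txt

    # Keep the candidate only if it is strictly longer than the current one.
    if match.nil? || match.first.length < txt.length
      match = [txt, name]
      break if match_type == :first
    end
  end

  match
end

RULES = [[:IF, /if/], [:IDENT, /[a-z]+/]]

p pick_match(StringScanner.new('iffy'), RULES, :longest) # => ["iffy", :IDENT]
p pick_match(StringScanner.new('iffy'), RULES, :first)   # => ["if", :IF]
```

Since the comparison is strict, two rules that match text of equal length resolve in favour of the rule defined first.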

.lex_file(file_name, env = self::Environment.new(@start_state)) ⇒ Array<Token>

A wrapper function that calls lex on the contents of a file.

Parameters:

  • file_name (String)

    File to be lexed.

  • env (Environment) (defaults to: self::Environment.new(@start_state))

    Lexing environment.

Returns:

  • (Array<Token>)



# File 'lib/rltk/lexer.rb', line 155

def lex_file(file_name, env = self::Environment.new(@start_state))
	File.open(file_name, 'r') { |f| self.lex(f.read, file_name, env) }
end

.match_first ⇒ void

This method returns an undefined value.

Used to tell a lexer to use the first match found instead of the longest match found.



# File 'lib/rltk/lexer.rb', line 163

def match_first
	@match_type = :first
end

.rule(pattern, state = :default, flags = [], &action) ⇒ void

Also known as: r

This method returns an undefined value.

This method is used to define a new lexing rule. The first argument is the regular expression used to match substrings of the input. The second argument is the state to which the rule belongs. Flags that need to be set for the rule to be considered are specified by the third argument. The last argument is a block that returns a type and value to be used in constructing a Token. If no block is specified the matched substring will be discarded and lexing will continue.

Parameters:

  • pattern (Regexp, String)

    Pattern for matching text.

  • state (Symbol) (defaults to: :default)

    State in which this rule is active.

  • flags (Array<Symbol>) (defaults to: [])

    Flags which must be set for rule to be active.

  • action (Proc)

    Proc object that produces Tokens.
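
The flags parameter is checked with a set difference in the lexing loop: a rule is considered only when every flag it requires is currently set in the environment. The sketch below restates that test on plain arrays; rule_active? is a hypothetical helper name, not an RLTK method.

```ruby
# Same test as `(rule.flags - env.flags).empty?` in the lexing loop:
# the rule is active when none of its required flags is missing.
def rule_active?(rule_flags, env_flags)
  (rule_flags - env_flags).empty?
end

p rule_active?([], [:in_string])                 # => true
p rule_active?([:in_string], [])                 # => false
p rule_active?([:in_string], [:in_string, :raw]) # => true
```

A rule with no flags is therefore always eligible, and extra flags set in the environment never disable a rule.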



# File 'lib/rltk/lexer.rb', line 183

def rule(pattern, state = :default, flags = [], &action)
	# If no action is given we will set it to an empty
	# action.
	action ||= Proc.new() {}
	
	pattern = Regexp.new(pattern) if pattern.is_a?(String)
	
	r = Rule.new(pattern, action, state, flags)
	
	if state == :ALL then @rules.each_key { |k| @rules[k] << r } else @rules[state] << r end
end
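
The registration logic above stores rules in a hash keyed by state, and the :ALL state appends the rule to every state that already has an entry at the time of the call. The sketch below mirrors that behaviour on a standalone table; ToyRule, TABLE, and add_rule are hypothetical names, and actions are left out for brevity.

```ruby
# Hypothetical stand-in for Lexer::Rule, without the action.
ToyRule = Struct.new(:pattern, :state, :flags)

# State => list of rules; missing states start out with an empty list.
TABLE = Hash.new { |h, k| h[k] = [] }

def add_rule(table, pattern, state = :default, flags = [])
  # String patterns are converted to regular expressions.
  pattern = Regexp.new(pattern) if pattern.is_a?(String)
  r = ToyRule.new(pattern, state, flags)

  if state == :ALL
    # Only states that already exist in the table receive the rule.
    table.each_key { |k| table[k] << r }
  else
    table[state] << r
  end
end

add_rule(TABLE, /\d+/)           # goes to :default
add_rule(TABLE, '#', :comment)   # String pattern, :comment state
add_rule(TABLE, /\s/, :ALL)      # added to :default and :comment
add_rule(TABLE, /x/, :later)     # :later is created after the :ALL rule

p TABLE[:default].length # => 2
p TABLE[:comment].length # => 2
p TABLE[:later].length   # => 1
```

Under this logic the order of definitions matters: a state introduced after an :ALL rule does not receive that rule.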

.start(state) ⇒ void

This method returns an undefined value.

Changes the starting state of the lexer.

Parameters:

  • state (Symbol)

    Starting state for this lexer.



# File 'lib/rltk/lexer.rb', line 201

def start(state)
	@start_state = state
end

Instance Method Details

#lex(string, file_name = nil) ⇒ Array<Token>

Lexes a string using the encapsulated environment.

Parameters:

  • string (String)

    String to be lexed.

  • file_name (String) (defaults to: nil)

    File name used for Token positions.

Returns:

  • (Array<Token>)



# File 'lib/rltk/lexer.rb', line 222

def lex(string, file_name = nil)
	self.class.lex(string, file_name, @env)
end
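
The practical difference between the class-level and instance-level lex is the lifetime of the environment: the class method builds a new environment on each call unless one is passed in, while an instance keeps passing the same one. The toy model below illustrates this; ToyEnv and ToyLexer are hypothetical stand-ins, and the comment-state switching is invented purely for the demonstration.

```ruby
# Hypothetical stand-ins; none of these are RLTK classes.
class ToyEnv
  attr_accessor :state

  def initialize(start_state)
    @state = start_state
  end
end

class ToyLexer
  # Class-level entry point: builds a fresh environment unless one is given.
  def self.lex(string, env = ToyEnv.new(:default))
    # Pretend '/*' enters a comment state and '*/' leaves it.
    env.state = :comment if string.include?('/*')
    env.state = :default if string.include?('*/')
    env.state
  end

  def initialize
    @env = ToyEnv.new(:default)
  end

  # Instance-level entry point: always reuses the same environment.
  def lex(string)
    self.class.lex(string, @env)
  end
end

p ToyLexer.lex('/* open')    # => :comment
p ToyLexer.lex('still in?')  # => :default

lexer = ToyLexer.new
p lexer.lex('/* open')       # => :comment
p lexer.lex('still in?')     # => :comment
```

An instantiated lexer is therefore the natural choice when input arrives in pieces and state must carry over between calls.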

#lex_file(file_name) ⇒ Array<Token>

Lexes a file using the encapsulated environment.

Parameters:

  • file_name (String)

    File to be lexed.

Returns:

  • (Array<Token>)



# File 'lib/rltk/lexer.rb', line 231

def lex_file(file_name)
	self.class.lex_file(file_name, @env)
end