Class: RLTK::Parser

Inherits:

Object


Defined in: lib/rltk/parser.rb

Overview

The Parser class may be subclassed to produce new parsers. The features these parsers support are described in the main documentation.

Direct Known Subclasses

RLTK::Parsers::InfixCalc, RLTK::Parsers::PostfixCalc, RLTK::Parsers::PrefixCalc

Defined Under Namespace

Classes: Accept, Action, Environment, GoTo, ParseStack, Reduce, Shift, State

Instance Attribute Summary

#env ⇒ Environment readonly

Environment used by the instantiated parser.

Class Method Summary

.add_state(state) ⇒ Integer

If state (or its equivalent) is not in the state list it is added and its ID is returned.
.array_args ⇒ void

Calling this method will cause the parser to pass right-hand side values as arrays instead of splats.
.build_finalize_opts(opts) ⇒ Hash{Symbol => Object}

Build a hash with the default options for Parser.finalize and then update it with the values from opts.
.build_parse_opts(opts) ⇒ Hash{Symbol => Object}

Build a hash with the default options for Parser.parse and then update it with the values from opts.
.check_reachability(start, dest, symbols) ⇒ Boolean

This method checks to see if the parser would be in parse state dest after starting in state start and reading symbols.
.check_sanity ⇒ void

This method is used to (surprise) check the sanity of the constructed parser.
.clause(expression, precedence = nil, &action) ⇒ void (also: c)

Declares a new clause inside of a production.
.clean ⇒ void

Removes resources that were needed to generate the parser but aren’t needed when actually parsing input.
.empty_list_production(symbol, list_elements, separator) ⇒ Object (also: empty_list)

Adds productions and actions for parsing empty lists.
.explain(io) ⇒ void

This function will print a description of the parser to the provided IO object.
.finalize(opts = {}) ⇒ void

This method will finalize the parser, causing the construction of states and their actions, and the resolution of conflicts using lookahead and precedence information.
.get_io(o, mode = 'w') ⇒ IO, false

Converts an object into an IO object as appropriate.
.grammar ⇒ CFG

The grammar that can be parsed by this Parser.
.grammar_prime ⇒ CFG

This method generates and memoizes the G’ grammar used to calculate the LALR(1) lookahead sets.
.inform_conflict(state_id, type, sym) ⇒ void

Inform the parser core that a conflict has been detected.
.inherited(klass) ⇒ void

Called when the Parser class is subclassed; it installs the necessary instance class variables.
.install_icvars ⇒ void

Installs instance class variables into a class.
.left(*symbols) ⇒ void

This method is used to specify that the symbols in symbols are left-associative.
.nonassoc(*symbols) ⇒ void

This method is used to specify that the symbols in symbols are non-associative.
.nonempty_list_production(symbol, list_elements, separator) ⇒ Object (also: nonempty_list)

Adds productions and actions for parsing nonempty lists.
.parse(tokens, opts = {}) ⇒ Object⁺

This function is where actual parsing takes place.
.production(symbol, expression = nil, precedence = nil, &action) ⇒ void (also: p)

Adds a new production to the parser with a left-hand value of symbol.
.prune(do_lookahead, do_precedence) ⇒ void

This method uses lookahead sets and precedence information to resolve conflicts and remove unnecessary reduce actions.
.right(*symbols) ⇒ void

This method is used to specify that the symbols in symbols are right-associative.
.start(symbol) ⇒ void

Changes the starting symbol of the parser.

Instance Method Summary

#initialize ⇒ Parser constructor

Instantiates a new parser and creates an environment to be used for subsequent calls.
#parse(tokens, opts = {}) ⇒ Object

Parses the given token stream using the encapsulated environment.

Constructor Details

#initialize ⇒ `Parser`

Instantiates a new parser and creates an environment to be used for subsequent calls.




# File 'lib/rltk/parser.rb', line 1183

def initialize
	@env = self.class::Environment.new
end

Instance Attribute Details

#env ⇒ `Environment` (readonly)

Returns Environment used by the instantiated parser.

Returns:

(Environment) —

Environment used by the instantiated parser.




# File 'lib/rltk/parser.rb', line 72

def env
  @env
end

Class Method Details

.add_state(state) ⇒ `Integer`

If state (or its equivalent) is not in the state list it is added and its ID is returned. If there is already a state with the same items as state in the state list, its ID is returned and state is discarded.

Parameters:

state (State) —

State to add to the parser.

Returns:

(Integer) —

The ID of the state.

# File 'lib/rltk/parser.rb', line 159

def add_state(state)
	if (id = @states.index(state))
		id
	else
		state.id = @states.length
		
		@states << state
		
		@states.length - 1
	end
end
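
The de-duplication above can be illustrated with a minimal, self-contained sketch. `SketchState` and `SketchStateList` are hypothetical stand-ins, not RLTK classes; the sketch assumes only that two states compare equal when their items are equal.

```ruby
# Hypothetical stand-in for a parser state. Struct equality compares only
# the declared members, so the separately stored id is ignored by ==.
SketchState = Struct.new(:items) do
  attr_accessor :id
end

# Hypothetical container that mirrors the add_state logic shown above.
class SketchStateList
  def initialize
    @states = []
  end

  # Returns the ID of an equivalent existing state, or appends the new
  # state and returns its freshly assigned ID.
  def add_state(state)
    if (id = @states.index(state))
      id
    else
      state.id = @states.length
      @states << state
      state.id
    end
  end
end

list = SketchStateList.new
list.add_state(SketchState.new([:a, :b])) # new state
list.add_state(SketchState.new([:c]))     # new state
list.add_state(SketchState.new([:a, :b])) # equivalent to the first state
```

Because the ID is simply the position in the list, an equivalent state added later resolves to the ID of the state that was stored first.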

.array_args ⇒ `void`

This method returns an undefined value.

Calling this method will cause the parser to pass right-hand side values as arrays instead of splats. This method must be called before ANY calls to Parser.production.

# File 'lib/rltk/parser.rb', line 176

def array_args
	if @grammar.productions.length == 0
		@args = :array
		
		@grammar.callback do |p, type, num|
			@procs[p.id] =
			[
				case type
				when :*
					case num
					when :first then	Proc.new { |v|           [] }
					else				Proc.new { |v| v[0] << v[1] }
					end
				
				when :+
					case num
					when :first then	Proc.new { |v|       [v[0]] }
					else				Proc.new { |v| v[0] << v[1] }
					end
				
				when :'?'
					case num
					when :first then	Proc.new { |v|  nil }
					else				Proc.new { |v| v[0] }
					end
				
				when :elp
					case num
					when :first then	Proc.new { |v|   [] }
					else				Proc.new { |v| v[0] }
					end
				
				when :nelp
					case num
					when :first	then	Proc.new { |v|                                        v }
					when :second	then	Proc.new { |v|                            v[0] + [v[2]] }
					else				Proc.new { |v| if v.length == 1 then v.first else v end }
					end
				end,
				p.rhs.length
			]
			
			@production_precs[p.id] = p.last_terminal
		end
	end
end
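
The difference between the two argument styles comes down to how the right-hand side values are handed to the action block. The following sketch uses plain Ruby only; `call_splat`, `call_array`, and the sample values are hypothetical and merely imitate the two `instance_exec` call shapes that appear in Parser.parse.

```ruby
# Default (:splat) style: each right-hand side value becomes its own
# block parameter.
def call_splat(env, values, action)
  env.instance_exec(*values, &action)
end

# array_args style: the block receives one array holding every value.
def call_array(env, values, action)
  env.instance_exec(values, &action)
end

env        = Object.new   # stands in for the parser's Environment
rhs_values = [1, :PLS, 2] # hypothetical values for a clause like "NUM PLS NUM"

splat_action = proc { |a, _, b| a + b }
array_action = proc { |v| v[0] + v[2] }

call_splat(env, rhs_values, splat_action)
call_array(env, rhs_values, array_action)
```

Both calls compute the same sum; only the shape of the block parameters differs.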

.build_finalize_opts(opts) ⇒ `Hash{Symbol => Object}`

Build a hash with the default options for Parser.finalize and then update it with the values from opts.

Parameters:

opts (Hash{Symbol => Object}) —

Hash containing options for finalize.

Returns:

(Hash{Symbol => Object})

# File 'lib/rltk/parser.rb', line 229

def build_finalize_opts(opts)
	opts[:explain]	= self.get_io(opts[:explain])
	
	{
		:explain		=> false,
		:lookahead	=> true,
		:precedence	=> true,
		:use			=> false
	}.update(opts)
end
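
The defaults-then-override pattern used here (and in build_parse_opts) can be sketched in isolation. `SKETCH_FINALIZE_DEFAULTS` and `sketch_finalize_opts` are hypothetical names, and the sketch uses a non-mutating `merge` on a frozen constant instead of `update` on a fresh literal; it also leaves out the `get_io` conversion of `:explain`.

```ruby
# Hypothetical copy of the default option values listed above.
SKETCH_FINALIZE_DEFAULTS = {
  explain:    false,
  lookahead:  true,
  precedence: true,
  use:        false
}.freeze

# Caller-supplied values win over the defaults; keys the caller omits
# keep their default value.
def sketch_finalize_opts(opts)
  SKETCH_FINALIZE_DEFAULTS.merge(opts)
end

sketch_finalize_opts({ lookahead: false })
```

Only `:lookahead` changes in the returned hash; the other three keys keep their defaults.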

.build_parse_opts(opts) ⇒ `Hash{Symbol => Object}`

Build a hash with the default options for Parser.parse and then update it with the values from opts.

Parameters:

opts (Hash{Symbol => Object}) —

Hash containing options for parse.

Returns:

(Hash{Symbol => Object})

# File 'lib/rltk/parser.rb', line 247

def build_parse_opts(opts)
	opts[:parse_tree]	= self.get_io(opts[:parse_tree])
	opts[:verbose]		= self.get_io(opts[:verbose])
	
	{
		:accept		=> :first,
		:env			=> self::Environment.new,
		:parse_tree	=> false,
		:verbose		=> false
	}.update(opts)
end

.check_reachability(start, dest, symbols) ⇒ `Boolean`

This method checks to see if the parser would be in parse state dest after starting in state start and reading symbols.

Parameters:

start (State) —

State in which the parser starts.
dest (State) —

State the parser is expected to end up in.
symbols (Array<Symbol>) —

Grammar symbols.

Returns:

(Boolean) —

Whether the parser is in state dest after starting in state start and reading symbols.

# File 'lib/rltk/parser.rb', line 320

def check_reachability(start, dest, symbols)
	path_exists	= true
	cur_state		= start
	
	symbols.each do |sym|
		
		actions = @states[cur_state.id].on?(sym)
		actions = actions.select { |a| a.is_a?(Shift) } if CFG::is_terminal?(sym)
		
		if actions.empty?
			path_exists = false
			break
		end
		
		# There can only be one Shift action for terminals and
		# one GoTo action for non-terminals, so we know the
		# first action is the only one in the list.
		cur_state = @states[actions.first.id]
	end
	
	path_exists and cur_state.id == dest.id
end
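
The walk performed by check_reachability can be sketched with a plain transition table. Everything here is an assumption made for illustration: the table layout (state ID to symbol to next state ID), the sample grammar symbols, and the helper name `sketch_reachable?`. The real method works on State objects and their Shift and GoTo actions.

```ruby
# Hypothetical transition table: state ID => { symbol => next state ID }.
SKETCH_TRANSITIONS = {
  0 => { NUM: 1, e: 2 },
  2 => { PLS: 3 },
  3 => { NUM: 4 }
}.freeze

# Follows one transition per symbol and reports whether the walk ends in
# dest. A missing transition means no path exists.
def sketch_reachable?(transitions, start, dest, symbols)
  current = start

  symbols.each do |sym|
    current = transitions.fetch(current, {})[sym]
    return false if current.nil?
  end

  current == dest
end

sketch_reachable?(SKETCH_TRANSITIONS, 0, 4, [:e, :PLS, :NUM]) # path 0 -> 2 -> 3 -> 4
sketch_reachable?(SKETCH_TRANSITIONS, 0, 4, [:NUM, :PLS])     # state 1 has no PLS transition
```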

.check_sanity ⇒ `void`

This method returns an undefined value.

This method is used to (surprise) check the sanity of the constructed parser. It checks to make sure all non-terminals used in the grammar definition appear on the left-hand side of one or more productions, and that none of the parser’s states have invalid actions. If a problem is encountered a ParserConstructionException is raised.

# File 'lib/rltk/parser.rb', line 268

def check_sanity
	# Check to make sure all non-terminals appear on the
	# left-hand side of some production.
	@grammar.nonterms.each do |sym|
		if not @lh_sides.values.include?(sym)
			raise ParserConstructionException, "Non-terminal #{sym} does not appear on the left-hand side of any production."
		end
	end
	
	# Check the actions in each state.
	@states.each do |state|
		state.actions.each do |sym, actions|
			if CFG::is_terminal?(sym)
				# Here we check actions for terminals.
				actions.each do |action|
					if action.is_a?(Accept)
						if sym != :EOS
							raise ParserConstructionException, "Accept action found for terminal #{sym} in state #{state.id}."
						end
							
					elsif not (action.is_a?(GoTo) or action.is_a?(Reduce) or action.is_a?(Shift))
						raise ParserConstructionException, "Object of type #{action.class} found in actions for terminal " +
							"#{sym} in state #{state.id}."
						
					end
				end
				
				if (conflict = state.conflict_on?(sym))
					self.inform_conflict(state.id, conflict, sym)
				end
			else
				# Here we check actions for non-terminals.
				if actions.length > 1
					raise ParserConstructionException, "State #{state.id} has multiple GoTo actions for non-terminal #{sym}."
					
				elsif actions.length == 1 and not actions.first.is_a?(GoTo)
					raise ParserConstructionException, "State #{state.id} has non-GoTo action for non-terminal #{sym}."
					
				end
			end
		end
	end
end

.clause(expression, precedence = nil, &action) ⇒ `void` Also known as: c

This method returns an undefined value.

Declares a new clause inside of a production. The right-hand side is specified by expression and the precedence of this production can be changed by setting the precedence argument to some terminal symbol.

Parameters:

expression (String) —

Right-hand side of a production.
precedence (Symbol) (defaults to: nil) —

Symbol representing the precedence of this production.
action (Proc) —

Action to be taken when the production is reduced.

# File 'lib/rltk/parser.rb', line 353

def clause(expression, precedence = nil, &action)
	# Use the curr_prec only if it isn't overridden for this
	# clause.
	precedence ||= @curr_prec
	
	production = @grammar.clause(expression)
	
	# Check to make sure the action's arity matches the number
	# of symbols on the right-hand side.
	if @args == :splat and action.arity != production.rhs.length
		raise ParserConstructionException, 'Incorrect number of arguments to action.  Action arity must match the number of ' +
			'terminals and non-terminals in the clause.'
	end
	
	# Add the action to our proc list.
	@procs[production.id] = [action, production.rhs.length]
	
	# If no precedence is specified use the precedence of the
	# last terminal in the production.
	@production_precs[production.id] = precedence || production.last_terminal
end
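
The arity check in the default (splat) mode relies on Ruby's `Proc#arity`. The sketch below shows what that check sees for a few block shapes; `sketch_arity_matches?` is a hypothetical helper that only reproduces the comparison, not the exception handling.

```ruby
# Reproduces the comparison made by the arity check above.
def sketch_arity_matches?(action, rhs_length)
  action.arity == rhs_length
end

two_params = proc { |a, b| a + b } # arity 2
no_params  = proc { || nil }       # arity 0, suits an empty right-hand side
splat_only = proc { |*vals| vals } # arity -1

sketch_arity_matches?(two_params, 2)
sketch_arity_matches?(no_params, 0)
sketch_arity_matches?(splat_only, 3)
```

A block that takes only a splat parameter reports an arity of -1, so under this comparison it does not match any right-hand side length.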

.clean ⇒ `void`

This method returns an undefined value.

Removes resources that were needed to generate the parser but aren’t needed when actually parsing input.

# File 'lib/rltk/parser.rb', line 380

def clean
	# We've told the developer about conflicts by now.
	@conflicts = nil
	
	# Drop the grammar and the grammar'.
	@grammar		= nil
	@grammar_prime	= nil
	
	# Drop precedence and bookkeeping information.
	@curr_lhs	= nil
	@curr_prec	= nil
	
	@prec_counts		= nil
	@production_precs	= nil
	@token_precs		= nil
	
	# Drop the items from each of the states.
	@states.each { |state| state.clean }
end

.empty_list_production(symbol, list_elements, separator) ⇒ `Object` Also known as: empty_list

Adds productions and actions for parsing empty lists.

.explain(io) ⇒ `void`

This method returns an undefined value.

This function will print a description of the parser to the provided IO object.

Parameters:

io (IO) —

Input/Output object used for printing the parser’s explanation.

# File 'lib/rltk/parser.rb', line 414

def explain(io)
	if @grammar and not @states.empty?
		io.puts('###############')
		io.puts('# Productions #')
		io.puts('###############')
		io.puts
		
		# Print the productions.
		@grammar.productions.each do |sym, productions|
			productions.each do |production|
				io.print("\tProduction #{production.id}: #{production.to_s}")
				
				if (prec = @production_precs[production.id])
					io.print(" : (#{prec.first} , #{prec.last})")
				end
				
				io.puts
			end
			
			io.puts
		end
		
		io.puts('##########')
		io.puts('# Tokens #')
		io.puts('##########')
		io.puts
		
		@grammar.terms.sort {|a,b| a.to_s <=> b.to_s }.each do |term|
			io.print("\t#{term}")
			
			if (prec = @token_precs[term])
				io.print(" : (#{prec.first}, #{prec.last})")
			end
			
			io.puts
		end
		
		io.puts
		
		io.puts('#####################')
		io.puts('# Table Information #')
		io.puts('#####################')
		io.puts
		
		io.puts("\tStart symbol: #{@grammar.start_symbol}")
		io.puts
		
		io.puts("\tTotal number of states: #{@states.length}")
		io.puts
		
		io.puts("\tTotal conflicts: #{@conflicts.values.flatten(1).length}")
		io.puts
		
		@conflicts.each do |state_id, conflicts|
			io.puts("\tState #{state_id} has #{conflicts.length} conflict(s)")
		end
		
		io.puts if not @conflicts.empty?
		
		# Print the parse table.
		io.puts('###############')
		io.puts('# Parse Table #')
		io.puts('###############')
		io.puts
		
		@states.each do |state|
			io.puts("State #{state.id}:")
			io.puts
			
			io.puts("\t# ITEMS #")
			max = state.items.inject(0) do |max, item|
				if item.lhs.to_s.length > max then item.lhs.to_s.length else max end
			end
			
			state.each do |item|
				io.puts("\t#{item.to_s(max)}")
			end
			
			io.puts
			io.puts("\t# ACTIONS #")
			
			state.actions.keys.sort {|a,b| a.to_s <=> b.to_s}.each do |sym|
				state.actions[sym].each do |action|
					io.puts("\tOn #{sym} #{action}")
				end
			end
			
			io.puts
			io.puts("\t# CONFLICTS #")
			
			if @conflicts[state.id].length == 0
				io.puts("\tNone\n\n")
			else
				@conflicts[state.id].each do |conflict|
					type, sym = conflict
					
					io.print("\t#{if type == :SR then "Shift/Reduce" else "Reduce/Reduce" end} conflict")
					
					io.puts(" on #{sym}")
				end
				
				io.puts
			end
		end
		
		# Close any IO objects that aren't $stdout.
		io.close if io.is_a?(IO) and io != $stdout
	else
		raise ParserConstructionException, 'Parser.explain called outside of finalize.'
	end
end

.finalize(opts = {}) ⇒ `void`

This method returns an undefined value.

This method will finalize the parser, causing the construction of states and their actions, and the resolution of conflicts using lookahead and precedence information.

The opts hash may contain the following options, which are described in more detail in the main documentation:

:explain - To explain the parser or not.
:lookahead - To use lookahead info for conflict resolution.
:precedence - To use precedence info for conflict resolution.
:use - A file name or object that is used to load/save the parser.

No calls to production may appear after the call to Parser.finalize.

Parameters:

opts (Hash{Symbol => Object}) (defaults to: {}) —

Options describing how to finalize the parser.

# File 'lib/rltk/parser.rb', line 544

def finalize(opts = {})
	
	# Get the full options hash.
	opts = build_finalize_opts(opts)
	
	# Get the name of the file in which the parser is defined.
	#
	# FIXME: See why this is failing for the simple ListParser example.
	def_file = caller()[2].split(':')[0] if opts[:use]
	
	# Check to make sure we can load the necessary information
	# from the specified object.
	if opts[:use] and (
		(opts[:use].is_a?(String) and File.exist?(opts[:use]) and File.mtime(opts[:use]) > File.mtime(def_file)) or
		(opts[:use].is_a?(File) and opts[:use].mtime > File.mtime(def_file))
		)
		
		file = self.get_io(opts[:use], 'r')
		
		# Un-marshal our saved data structures.
		file.flock(File::LOCK_SH)
		@lh_sides, @states, @symbols = Marshal.load(file)
		file.flock(File::LOCK_UN)
		
		# Close the file if we opened it.
		file.close if opts[:use].is_a?(String)
		
		# Remove any un-needed data and return.
		return self.clean
	end
	
	# Grab all of the symbols that comprise the grammar (besides
	# the start symbol).
	@symbols = @grammar.symbols << :ERROR
	
	# Add our starting state to the state list.
	start_production	= @grammar.production(:start, @grammar.start_symbol.to_s).first
	start_state		= State.new(@symbols, [start_production.to_item])
	
	start_state.close(@grammar.productions)
	
	self.add_state(start_state)
	
	# Translate the precedence of productions from tokens to
	# (associativity, precedence) pairs.
	@production_precs.each_with_index do |prec, id|
		@production_precs[id] = @token_precs[prec]
	end
	
	# Build the rest of the transition table.
	@states.each do |state|
		#Transition states.
		tstates = Hash.new { |h,k| h[k] = State.new(@symbols) }
		
		#Bin each item in this set into reachable transition
		#states.
		state.each do |item|
			if (next_symbol = item.next_symbol)
				tstates[next_symbol] << item.copy
			end
		end
		
		# For each transition state:
		#  1) Get transition symbol
		#  2) Advance dot
		#  3) Close it
		#  4) Get state id and add transition
		tstates.each do |symbol, tstate|
			tstate.each { |item| item.advance }
			
			tstate.close(@grammar.productions)
			
			id = self.add_state(tstate)
			
			# Add Goto and Shift actions.
			state.on(symbol, CFG::is_nonterminal?(symbol) ? GoTo.new(id) : Shift.new(id))
		end
		
		# Find the Accept and Reduce actions for this state.
		state.each do |item|
			if item.at_end?
				if item.lhs == :start
					state.on(:EOS, Accept.new)
				else
					state.add_reduction(item.id)
				end
			end
		end
	end
	
	# Build the production.id -> production.lhs map.
	@grammar.productions(:id).to_a.inject(@lh_sides) do |h, pair|
		id, production = pair
		
		h[id] = production.lhs
		
		h
	end
	
	# Prune the parsing table for unnecessary reduce actions.
	self.prune(opts[:lookahead], opts[:precedence])
	
	# Check the parser for inconsistencies.
	self.check_sanity
	
	# Print the table if requested.
	self.explain(opts[:explain]) if opts[:explain]
	
	# Remove any data that is no longer needed.
	self.clean
	
	# Store the parser's final data structures if requested.
	if opts[:use]
		io = self.get_io(opts[:use])
		
		io.flock(File::LOCK_EX) if io.is_a?(File)
		Marshal.dump([@lh_sides, @states, @symbols], io)
		io.flock(File::LOCK_UN) if io.is_a?(File)
		
		# Close the IO object if we opened it.
		io.close if opts[:use].is_a?(String)
	end
end
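
The load/save behavior behind the :use option amounts to a Marshal round trip guarded by a modification-time check. The following is a minimal sketch of that idea using only the Ruby standard library; `sketch_save_tables`, `sketch_load_tables`, `sketch_cache_fresh?`, and the sample data are hypothetical and do not reflect RLTK's internal data structures.

```ruby
require 'tempfile'

# Writes the tables under an exclusive lock.
def sketch_save_tables(path, tables)
  File.open(path, 'wb') do |io|
    io.flock(File::LOCK_EX)
    Marshal.dump(tables, io)
    io.flock(File::LOCK_UN)
  end
end

# Reads the tables back under a shared lock.
def sketch_load_tables(path)
  File.open(path, 'rb') do |io|
    io.flock(File::LOCK_SH)
    tables = Marshal.load(io)
    io.flock(File::LOCK_UN)
    tables
  end
end

# The cache is only usable if it exists and is newer than the file that
# defines the parser.
def sketch_cache_fresh?(cache_path, definition_path)
  File.exist?(cache_path) && File.mtime(cache_path) > File.mtime(definition_path)
end

Tempfile.create('sketch_tables') do |f|
  f.close
  sketch_save_tables(f.path, [{ 0 => :e }, [:state0], [:NUM, :ERROR]])
  lh_sides, states, symbols = sketch_load_tables(f.path)
  puts lh_sides.inspect, states.inspect, symbols.inspect
end
```

Only load Marshal data from files you trust, since `Marshal.load` can instantiate arbitrary objects.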

.get_io(o, mode = 'w') ⇒ `IO`, `false`

Converts an object into an IO object as appropriate.

Parameters:

o (Object) —

Object to be converted into an IO object.
mode (String) (defaults to: 'w') —

String representing the mode to open the IO object in.

Returns:

(IO, false) —

The IO object or false if a conversion wasn’t possible.

# File 'lib/rltk/parser.rb', line 674

def get_io(o, mode = 'w')
	if o.is_a?(TrueClass)
		$stdout
	elsif o.is_a?(String)
		File.open(o, mode)
	elsif o.is_a?(IO)
		o
	else
		false
	end
end
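
The conversion rules can be exercised with a standalone copy of the same branching. `sketch_get_io` is a hypothetical re-implementation for illustration; the `StringIO` case is included because `StringIO` is not a subclass of `IO`, so under these rules it falls through to `false`.

```ruby
require 'stringio'

# Standalone copy of the branching shown above.
def sketch_get_io(o, mode = 'w')
  if o.is_a?(TrueClass)
    $stdout
  elsif o.is_a?(String)
    File.open(o, mode)
  elsif o.is_a?(IO)
    o
  else
    false
  end
end

sketch_get_io(true)         # the current $stdout
sketch_get_io($stderr)      # IO objects are passed through unchanged
sketch_get_io(nil)          # false
sketch_get_io(StringIO.new) # false, because StringIO is not an IO
```

Passing a String opens (and with the default mode 'w' truncates) the named file, so that branch is left out of the usage lines here.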

.grammar ⇒ `CFG`

Returns the grammar that can be parsed by this Parser.

Returns:

(CFG) —

The grammar that can be parsed by this Parser.




# File 'lib/rltk/parser.rb', line 687

def grammar
	@grammar.clone
end

.grammar_prime ⇒ `CFG`

This method generates and memoizes the G’ grammar used to calculate the LALR(1) lookahead sets. Information about this grammar and its use can be found in the following paper:

Simple Computation of LALR(1) Lookahead Sets, Manuel E. Bermudez and George Logothetis, Information Processing Letters 31, 1989

Returns:

(CFG)

# File 'lib/rltk/parser.rb', line 700

def grammar_prime
	if not @grammar_prime
		@grammar_prime = CFG.new
		
		@states.each do |state|
			state.each do |item|
				lhs = "#{state.id}_#{item.next_symbol}".to_sym
				
				next unless CFG::is_nonterminal?(item.next_symbol) and not @grammar_prime.productions.keys.include?(lhs)
				
				@grammar.productions[item.next_symbol].each do |production|
					rhs = ""
					
					cstate = state
					
					production.rhs.each do |symbol|
						rhs += "#{cstate.id}_#{symbol} "
						
						cstate = @states[cstate.on?(symbol).first.id]
					end
					
					@grammar_prime.production(lhs, rhs)
				end
			end
		end
	end
	
	@grammar_prime
end
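
The G' symbols built above combine a state ID with a grammar symbol, and Parser.prune later maps them back with `split('_').last`. The sketch below isolates those two string operations; `sketch_prime_symbol` and `sketch_original_symbol` are hypothetical helper names.

```ruby
# Builds a G' symbol name from a state ID and a grammar symbol, using the
# same interpolation as grammar_prime.
def sketch_prime_symbol(state_id, symbol)
  "#{state_id}_#{symbol}".to_sym
end

# Maps a G' symbol back to a grammar symbol, using the same expression
# that the lookahead pruning step applies.
def sketch_original_symbol(prime)
  prime.to_s.split('_').last.to_sym
end

prime = sketch_prime_symbol(3, :expr) # :"3_expr"
sketch_original_symbol(prime)         # :expr
```

Since the reverse mapping keeps only the text after the last underscore, a symbol name that itself contains an underscore would not round-trip through this particular expression.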

.inform_conflict(state_id, type, sym) ⇒ `void`

This method returns an undefined value.

Inform the parser core that a conflict has been detected.

Parameters:

state_id (Integer) —

ID of the state where the conflict was encountered.
type (:RR, :SR) —

Reduce/Reduce or Shift/Reduce conflict.
sym (Symbol) —

Symbol that caused the conflict.




# File 'lib/rltk/parser.rb', line 737

def inform_conflict(state_id, type, sym)
	@conflicts[state_id] << [type, sym]
end

.inherited(klass) ⇒ `void`

This method returns an undefined value.

Called when the Parser class is subclassed; it installs the necessary instance class variables.




# File 'lib/rltk/parser.rb', line 147

def inherited(klass)
	klass.install_icvars
end

.install_icvars ⇒ `void`

This method returns an undefined value.

Installs instance class variables into a class.

# File 'lib/rltk/parser.rb', line 82

def install_icvars
	@curr_lhs		= nil
	@curr_prec	= nil
	
	@conflicts	= Hash.new {|h, k| h[k] = Array.new}
	@grammar		= CFG.new
	
	@lh_sides		= Hash.new
	@procs		= Array.new
	@states		= Array.new
	
	# Variables for dealing with precedence.
	@prec_counts		= {:left => 0, :right => 0, :non => 0}
	@production_precs	= Array.new
	@token_precs		= Hash.new
	
	# Set the default argument handling policy.
	@args = :splat
	
	@grammar.callback do |p, type, num|
		@procs[p.id] =
		[
			case type
			when :*
				case num
				when :first then	Proc.new { ||           [] }
				else				Proc.new { |os, o| os << o }
				end
				
			when :+
				case num
				when :first then	Proc.new { |o|         [o] }
				else				Proc.new { |os, o| os << o }
				end
				
			when :'?'
				case num
				when :first then	Proc.new { ||  nil }
				else				Proc.new { |o|   o }
				end
				
			when :elp
				case num
				when :first then	Proc.new { ||         [] }
				else				Proc.new { |prime| prime }
				end
				
			when :nelp
				case num
				when :first	then	Proc.new { |el|                                         [el] }
				when :second	then	Proc.new { |els, _, el|                           els + [el] }
				else				Proc.new { |*el| if el.length == 1 then el.first else el end }
				end
			end,
			p.rhs.length
		]
		
		@production_precs[p.id] = p.last_terminal
	end
end
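
The combination of an `inherited` hook and class-level instance variables is what gives every subclass its own grammar, states, and precedence tables. A minimal sketch of the pattern, with hypothetical class and method names (`SketchBase`, `rule`, `rules`) that are not part of RLTK:

```ruby
class SketchBase
  class << self
    attr_reader :rules

    # Gives the receiving class its own, independent storage.
    def install_icvars
      @rules = []
    end

    # Ruby calls this hook when a subclass is created, before the
    # subclass body is evaluated.
    def inherited(klass)
      super
      klass.install_icvars
    end

    # Hypothetical DSL method that records into the per-class storage.
    def rule(name)
      @rules << name
    end
  end
end

class SketchParserA < SketchBase
  rule :a
end

class SketchParserB < SketchBase
  rule :b
end

SketchParserA.rules # holds only :a
SketchParserB.rules # holds only :b
```

Because the variables live on each class object rather than in `@@` class variables, definitions made in one subclass do not leak into its siblings.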

.left(*symbols) ⇒ `void`

This method returns an undefined value.

This method is used to specify that the symbols in symbols are left-associative. Subsequent calls to this method will give their arguments higher precedence.

Parameters:

symbols (Array<Symbol>) —

Symbols that are left associative.

# File 'lib/rltk/parser.rb', line 748

def left(*symbols)
	prec_level = @prec_counts[:left] += 1
	
	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:left, prec_level]
	end
end

.nonassoc(*symbols) ⇒ `void`

This method returns an undefined value.

This method is used to specify that the symbols in symbols are non-associative.

Parameters:

symbols (Array<Symbol>) —

Symbols that are non-associative.

# File 'lib/rltk/parser.rb', line 762

def nonassoc(*symbols)
	prec_level = @prec_counts[:non] += 1
	
	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:non, prec_level]
	end
end
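
The bookkeeping done by left and nonassoc can be sketched as a small table builder. `SketchPrecTable` and the token names are hypothetical; the sketch mirrors only what the two listings above show, namely one counter per associativity and an `[associativity, level]` pair stored per token.

```ruby
class SketchPrecTable
  attr_reader :token_precs

  def initialize
    @prec_counts = { left: 0, non: 0 }
    @token_precs = {}
  end

  def left(*symbols)
    declare(:left, symbols)
  end

  def nonassoc(*symbols)
    declare(:non, symbols)
  end

  private

  # Each call bumps the counter for its associativity, so later calls
  # record a higher level than earlier ones.
  def declare(assoc, symbols)
    level = (@prec_counts[assoc] += 1)
    symbols.each { |s| @token_precs[s.to_sym] = [assoc, level] }
  end
end

table = SketchPrecTable.new
table.left(:PLS, :SUB) # both get [:left, 1]
table.left(:MUL, :DIV) # both get [:left, 2]
table.nonassoc(:EQ)    # gets [:non, 1]
```

Symbols passed in the same call share a level, and the counters for the two associativities advance independently of each other.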

.nonempty_list_production(symbol, list_elements, separator) ⇒ `Object` Also known as: nonempty_list

Adds productions and actions for parsing nonempty lists.

.parse(tokens, opts = {}) ⇒ `Object`⁺

This function is where actual parsing takes place. The tokens argument must be an array of Token objects, the last of which has type EOS. By default this method will return the value computed by the first successful parse tree found. It is possible to adjust this behavior using the opts hash as follows:

:accept - Either :first or :all.
:env - The environment in which to evaluate the production actions.
:parse_tree - To print parse trees in the DOT language or not.
:verbose - To be verbose or not.

Additional information for these options can be found in the main documentation.

Parameters:

tokens (Array<Token>) —

Tokens to be parsed.

Returns:

(Object, Array<Object>) —

Result or results of parsing the given tokens.

# File 'lib/rltk/parser.rb', line 796

def parse(tokens, opts = {})
	# Get the full options hash.
	opts	= build_parse_opts(opts)
	v	= opts[:verbose]
	
	if opts[:verbose]
		v.puts("Input tokens:")
		v.puts(tokens.map { |t| t.type }.inspect)
		v.puts
	end
	
	# Stack IDs to keep track of them during parsing.
	stack_id = 0
	
	# Error mode indicators.
	error_mode		= false
	reduction_guard	= false
	
	# Our various list of stacks.
	accepted		= []
	moving_on		= []
	processing	= [ParseStack.new(stack_id += 1)]
	
	# Iterate over the tokens.  We don't proceed to the
	# next token until every stack is done with the
	# current one.
	tokens.each do |token|
		# Check to make sure this token was seen in the
		# grammar definition.
		raise BadToken if not @symbols.include?(token.type)
		
		v.puts("Current token: #{token.type}#{if token.value then "(#{token.value})" end}") if v
		
		# Iterate over the stacks until each one is done.
		while (stack = processing.shift)
			# Get the available actions for this stack.
			actions = @states[stack.state].on?(token.type)
			
			if actions.empty?
				# If we are already in error mode and there
				# are no actions we skip this token.
				if error_mode
					moving_on << stack
					next
				end
				
				# We would be dropping the last stack so we
				# are going to go into error mode.
				if accepted.empty? and moving_on.empty? and processing.empty?
					# Try and find a valid error state.
					while stack.state
						if (actions = @states[stack.state].on?(:ERROR)).empty?
							# This state doesn't have an
							# error production. Moving on.
							stack.pop
						else
							# Enter the found error state.
							stack.push(actions.first.id, nil, :ERROR, token.position)
							
							break
						end
					end
					
					if stack.state
						# We found a valid error state.
						error_mode = reduction_guard = true
						opts[:env].he = true
						processing << stack
						
						v.puts('Invalid input encountered.  Entering error handling mode.') if v
					else
						# No valid error states could be
						# found.  Time to print a message
						# and leave.
						
						v.puts("No more actions for stack #{stack.id}.  Dropping stack.") if v
					end
				else
					v.puts("No more actions for stack #{stack.id}.  Dropping stack.") if v
				end
				
				next
			end
			
			# Make (stack, action) pairs, duplicating the
			# stack as necessary.
			pairs = [[stack, actions.pop]] + actions.map {|action| [stack.branch(stack_id += 1), action] }
			
			pairs.each do |stack, action|
				if v
					v.puts
					v.puts('Current stack:')
					v.puts("\tID: #{stack.id}")
					v.puts("\tState stack:\t#{stack.state_stack.inspect}")
					v.puts("\tOutput Stack:\t#{stack.output_stack.inspect}")
					v.puts
					v.puts("Action taken: #{action.to_s}")
				end
				
				if action.is_a?(Accept)
					if opts[:accept] == :all
						accepted << stack
					else
						v.puts('Accepting input.') if v
						opts[:parse_tree].puts(stack.tree) if opts[:parse_tree]
						
						if opts[:env].he
							raise HandledError.new(opts[:env].errors, stack.result)
						else
							return stack.result
						end
					end
				
				elsif action.is_a?(Reduce)
					# Get the production associated with this reduction.
					production_proc, pop_size = @procs[action.id]
					
					if not production_proc
						raise InternalParserException, "No production #{action.id} found."
					end
					
					args, positions = stack.pop(pop_size)
					opts[:env].set_positions(positions)
					
					result =
					if @args == :array
						opts[:env].instance_exec(args, &production_proc)
					else
						opts[:env].instance_exec(*args, &production_proc)
					end
					
					if (goto = @states[stack.state].on?(@lh_sides[action.id]).first)
						
						v.puts("Going to state #{goto.id}.\n") if v
						
						pos0 = nil
						
						if args.empty?
							# Empty productions need to be
							# handled specially.
							pos0 = stack.position
							
							pos0.stream_offset	+= pos0.length + 1
							pos0.line_offset	+= pos0.length + 1
							
							pos0.length = 0
						else
							pos0 = opts[:env].pos( 0)
							pos1 = opts[:env].pos(-1)
							
							pos0.length = (pos1.stream_offset + pos1.length) - pos0.stream_offset
						end
						
						stack.push(goto.id, result, @lh_sides[action.id], pos0)
					else
						raise InternalParserException, "No GoTo action found in state #{stack.state} " +
							"after reducing by production #{action.id}"
					end
					
					# This stack is NOT ready for the next
					# token.
					processing << stack
					
					# Exit error mode if necessary.
					error_mode = false if error_mode and not reduction_guard
					
				elsif action.is_a?(Shift)
					stack.push(action.id, token.value, token.type, token.position)
					
					# This stack is ready for the next
					# token.
					moving_on << stack
					
					# Exit error mode.
					error_mode = false
				end
			end
		end
		
		v.puts("\n\n") if v
		
		processing	= moving_on
		moving_on		= []
		
		# If we don't have any active stacks at this point the
		# string isn't in the language.
		if opts[:accept] == :first and processing.length == 0
			v.close if v and v != $stdout
			raise NotInLanguage
		end
		
		reduction_guard = false
	end
	
	# If we have reached this point we are accepting all parse
	# trees.
	if v
		v.puts("Accepting input with #{accepted.length} derivation(s).")
		
		v.close if v != $stdout
	end
	
	accepted.each do |stack|
		opts[:parse_tree].puts(stack.tree)
	end if opts[:parse_tree]
	
	results = accepted.map { |stack| stack.result }
	
	if opts[:env].he
		raise HandledError.new(opts[:env].errors, results)
	else
		return results
	end
end

.production(symbol, expression = nil, precedence = nil, &action) ⇒ `void` Also known as: p

This method returns an undefined value.

Adds a new production to the parser with a left-hand value of symbol. If expression is specified it is taken as the right-hand side of the production and action is associated with the production. If expression is nil then action is evaluated and expected to make one or more calls to Parser.clause. A precedence can be associated with this production by setting precedence to a terminal symbol.

Parameters:

symbol (Symbol) —

Left-hand side of the production.
expression (String, nil) (defaults to: nil) —

Right-hand side of the production.
precedence (Symbol, nil) (defaults to: nil) —

Symbol representing the precedence of this production.
action (Proc) —

Action associated with this production.

# File 'lib/rltk/parser.rb', line 1025

def production(symbol, expression = nil, precedence = nil, &action)
	
	# Check the symbol.
	if not (symbol.is_a?(Symbol) or symbol.is_a?(String)) or not CFG::is_nonterminal?(symbol)
		raise ParserConstructionException, 'Production symbols must be Strings or Symbols and be in all lowercase.'
	end
	
	@grammar.curr_lhs	= symbol.to_sym
	@curr_prec		= precedence
	
	if expression
		self.clause(expression, precedence, &action)
	else
		self.instance_exec(&action)
	end
	
	@grammar.curr_lhs	= nil
	@curr_prec		= nil
end
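
The two calling forms described above (expression given directly, or expression omitted so that the block calls clause) can be sketched with a small stand-in class. This is a minimal illustration only: ToyGrammar and its rules list are hypothetical names invented here and are not part of RLTK, and the real method also validates the symbol and tracks precedence state, which this sketch omits.

```ruby
# Toy stand-in for the class-level DSL; this is NOT RLTK::Parser itself.
class ToyGrammar
  attr_reader :rules

  def initialize
    @rules    = []
    @curr_lhs = nil
  end

  # Mirrors the two calling forms of Parser.production.
  def production(symbol, expression = nil, precedence = nil, &action)
    @curr_lhs = symbol.to_sym

    if expression
      # Single-clause form: the block is the clause's action.
      clause(expression, precedence, &action)
    else
      # Multi-clause form: the block is evaluated and calls clause itself.
      instance_exec(&action)
    end

    @curr_lhs = nil
  end

  # Records one right-hand side for the current left-hand side.
  def clause(expression, precedence = nil, &action)
    @rules << [@curr_lhs, expression, precedence, action]
  end
end

g = ToyGrammar.new

# Form 1: the expression is given directly.
g.production(:e, 'NUM') { |n| n }

# Form 2: the expression is omitted; the block declares the clauses.
g.production(:e) do
  clause('e PLUS e') { |a, _, b| a + b }
  clause('e MUL e')  { |a, _, b| a * b }
end

g.rules.each { |lhs, rhs, _prec, _act| puts "#{lhs} -> #{rhs}" }
```

Because the multi-clause form runs the block with instance_exec, calls to clause inside the block resolve against the object that is building the grammar, which is why no explicit receiver is needed.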

.prune(do_lookahead, do_precedence) ⇒ `void`

This method returns an undefined value.

This method uses lookahead sets and precedence information to resolve conflicts and remove unnecessary reduce actions.

Parameters:

do_lookahead (Boolean) —

Prune based on lookahead sets or not.
do_precedence (Boolean) —

Prune based on precedence or not.

# File 'lib/rltk/parser.rb', line 1053

def prune(do_lookahead, do_precedence)
	terms = @grammar.terms
	
	# If both options are false there is no pruning to do.
	return if not (do_lookahead or do_precedence)
	
	@states.each do |state0|
		
		#####################
		# Lookahead Pruning #
		#####################
		
		if do_lookahead
			# Find all of the reductions in this state.
			reductions = state0.actions.values.flatten.uniq.select { |a| a.is_a?(Reduce) }
			
			reductions.each do |reduction|
				production = @grammar.productions(:id)[reduction.id]
				
				lookahead = Array.new
				
				# Build the lookahead set.
				@states.each do |state1|
					if self.check_reachability(state1, state0, production.rhs)
						lookahead |= self.grammar_prime.follow_set("#{state1.id}_#{production.lhs}".to_sym)
					end
				end
				
				# Translate the G' follow symbols into G lookahead
				# symbols.
				lookahead = lookahead.map { |sym| sym.to_s.split('_').last.to_sym }.uniq
				
				# Here we remove the unnecessary reductions.
				# If there are error productions we need to
				# scale back the amount of pruning done.
				(terms - lookahead).each do |sym|
					if not (terms.include?(:ERROR) and not state0.conflict_on?(sym))
						state0.actions[sym].delete(reduction)
					end
				end
			end
		end
		
		########################################
		# Precedence and Associativity Pruning #
		########################################
		
		if do_precedence
			state0.actions.each do |symbol, actions|
				
				# We are only interested in pruning actions
				# for terminal symbols.
				next unless CFG::is_terminal?(symbol)
				
				# Skip to the next one if there is no 
				# possibility of a Shift/Reduce or
				# Reduce/Reduce conflict.
				next unless actions and actions.length > 1
				
				resolve_ok = actions.inject(true) do |m, a|
					if a.is_a?(Reduce)
						m and @production_precs[a.id]
					else
						m
					end
				end and actions.inject(false) { |m, a| m or a.is_a?(Shift) }
				
				if @token_precs[symbol] and resolve_ok
					max_prec = 0
					selected_action = nil
					
					# Grab the associativity and precedence
					# for the input token.
					tassoc, tprec = @token_precs[symbol]
					
					actions.each do |a|
						assoc, prec = a.is_a?(Shift) ? [tassoc, tprec] : @production_precs[a.id]
						
						# If two actions have the same precedence we
						# will only replace the previous production if:
						#  * The token is left associative and the current action is a Reduce
						#  * The token is right associative and the current action is a Shift
						if prec > max_prec or (prec == max_prec and tassoc == (a.is_a?(Shift) ? :right : :left))
							max_prec			= prec
							selected_action	= a
							
						elsif prec == max_prec and assoc == :nonassoc
							raise ParserConstructionException, 'Non-associative token found during conflict resolution.'
							
						end
					end
					
					state0.actions[symbol] = [selected_action]
				end
			end
		end
	end
end
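
The precedence and associativity rule used in the second half of prune can be isolated into a small function. The sketch below is an approximation under stated assumptions: ToyShift, ToyReduce and select_action are hypothetical names, the precedence tables are passed in as plain arguments rather than read from the parser's instance variables, and the error class is a generic one.

```ruby
# Hypothetical stand-ins for the parser's Shift and Reduce actions.
ToyShift  = Struct.new(:id)
ToyReduce = Struct.new(:id)

# token_prec:       [assoc, prec] of the lookahead token
# production_precs: { production id => [assoc, prec] }
def select_action(actions, token_prec, production_precs)
  tassoc, tprec = token_prec
  max_prec = 0
  selected = nil

  actions.each do |a|
    shift = a.is_a?(ToyShift)
    assoc, prec = shift ? [tassoc, tprec] : production_precs[a.id]

    # On a precedence tie, a left-associative token prefers the Reduce
    # and a right-associative token prefers the Shift.
    if prec > max_prec || (prec == max_prec && tassoc == (shift ? :right : :left))
      max_prec = prec
      selected = a
    elsif prec == max_prec && assoc == :nonassoc
      raise ArgumentError, 'Non-associative token found during conflict resolution.'
    end
  end

  selected
end

conflict = [ToyShift.new(5), ToyReduce.new(2)]

# Same precedence, left-associative token: the reduction wins.
left_pick  = select_action(conflict, [:left, 1],  { 2 => [:left, 1] })

# Same precedence, right-associative token: the shift wins.
right_pick = select_action(conflict, [:right, 1], { 2 => [:right, 1] })

# Lookahead token binds tighter than the production: the shift wins.
tight_pick = select_action(conflict, [:left, 2],  { 2 => [:left, 1] })

puts left_pick.class, right_pick.class, tight_pick.class
```

For input such as `1 - 2 - 3` with a left-associative minus, choosing the reduction groups the expression as `(1 - 2) - 3`, while a right-associative operator keeps shifting and groups to the right.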

.right(*symbols) ⇒ `void`

This method returns an undefined value.

Specifies that the given symbols are right-associative. Subsequent calls to this method will give their arguments higher precedence.

Parameters:

symbols (Array<Symbol>) —

Symbols that are right-associative.

# File 'lib/rltk/parser.rb', line 1159

def right(*symbols)
	prec_level = @prec_counts[:right] += 1
	
	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:right, prec_level]
	end
end
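
The effect of repeated calls can be seen with a small stand-in that keeps the same two tables. This is only a sketch: ToyPrecTable is a hypothetical class, and starting the counter at zero through Hash.new(0) is an assumption, since the initialization of @prec_counts is not shown here.

```ruby
# Toy stand-in for the precedence bookkeeping; NOT RLTK::Parser itself.
class ToyPrecTable
  attr_reader :token_precs

  def initialize
    @prec_counts = Hash.new(0) # assumption: counters start at zero
    @token_precs = {}
  end

  # Same body as the listing above: bump the level, then tag each symbol.
  def right(*symbols)
    prec_level = @prec_counts[:right] += 1

    symbols.map { |s| s.to_sym }.each do |sym|
      @token_precs[sym] = [:right, prec_level]
    end
  end
end

table = ToyPrecTable.new
table.right :ASSIGN      # first call: lowest level
table.right :POW, 'NEG'  # second call: both symbols share the next level

table.token_precs.each { |sym, (assoc, level)| puts "#{sym}: #{assoc}, level #{level}" }
```

Symbols passed in the same call share one level, and strings are converted to symbols before they are stored.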

.start(symbol) ⇒ `void`

This method returns an undefined value.

Changes the starting symbol of the parser.

Parameters:

symbol (Symbol) —

The starting symbol of the grammar.



# File 'lib/rltk/parser.rb', line 1172

def start(symbol)
	@grammar.start symbol
end

Instance Method Details

#parse(tokens, opts = {}) ⇒ `Object`

Parses the given token stream using the encapsulated environment.
