Class: RLTK::Parser

Inherits:
Object
  • Object
show all
Defined in:
lib/rltk/parser.rb

Overview

The Parser class may be sub-classed to produce new parsers. These parsers have a lot of features, and are described in the main documentation.

Defined Under Namespace

Classes: Accept, Action, Environment, GoTo, ParseStack, ProdProc, Reduce, Shift, State

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initializeParser

Instantiates a new parser and creates an environment to be used for subsequent calls.



1255
1256
1257
# File 'lib/rltk/parser.rb', line 1255

def initialize
	@env = self.class::Environment.new
end

Instance Attribute Details

#envEnvironment (readonly)

Returns Environment used by the instantiated parser.

Returns:

  • (Environment)

    Environment used by the instantiated parser.



99
100
101
# File 'lib/rltk/parser.rb', line 99

def env
  @env
end

Class Method Details

.add_state(state) ⇒ Integer

If state (or its equivalent) is not in the state list it is added and it’s ID is returned. If there is already a state with the same items as state in the state list its ID is returned and state is discarded.

Parameters:

  • state (State)

    State to add to the parser.

Returns:

  • (Integer)

    The ID of the state.



193
194
195
196
197
198
199
200
201
202
203
# File 'lib/rltk/parser.rb', line 193

def add_state(state)
	if (id = @states.index(state))
		id
	else
		state.id = @states.length

		@states << state

		@states.length - 1
	end
end

.build_finalize_opts(opts) ⇒ Hash{Symbol => Object}

Build a hash with the default options for Parser.finalize and then update it with the values from opts.

Parameters:

  • opts (Hash{Symbol => Object})

    Hash containing options for finalize.

Returns:

  • (Hash{Symbol => Object})


211
212
213
214
215
216
217
218
219
220
# File 'lib/rltk/parser.rb', line 211

def build_finalize_opts(opts)
	opts[:explain]	= self.get_io(opts[:explain])

	{
		explain:    false,
		lookahead:  true,
		precedence: true,
		use:        false
	}.update(opts)
end

.build_list_production(symbol, list_elements, separator = '') ⇒ Object Also known as: list

Adds productions and actions for parsing empty lists.

See Also:

  • CFG#empty_list_production


401
402
403
# File 'lib/rltk/parser.rb', line 401

def build_list_production(symbol, list_elements, separator = '')
	@grammar.build_list_production(symbol, list_elements, separator)
end

.build_nonempty_list_production(symbol, list_elements, separator = '') ⇒ Object Also known as: nonempty_list

Adds productions and actions for parsing nonempty lists.

See Also:

  • CFG#nonempty_list_production


790
791
792
# File 'lib/rltk/parser.rb', line 790

def build_nonempty_list_production(symbol, list_elements, separator = '')
	@grammar.build_nonempty_list_production(symbol, list_elements, separator)
end

.build_parse_opts(opts) ⇒ Hash{Symbol => Object}

Build a hash with the default options for Parser.parse and then update it with the values from opts.

Parameters:

  • opts (Hash{Symbol => Object})

    Hash containing options for parse.

Returns:

  • (Hash{Symbol => Object})


229
230
231
232
233
234
235
236
237
238
239
# File 'lib/rltk/parser.rb', line 229

def build_parse_opts(opts)
	opts[:parse_tree] = self.get_io(opts[:parse_tree])
	opts[:verbose]    = self.get_io(opts[:verbose])

	{
		accept:     :first,
		env:        self::Environment.new,
		parse_tree: false,
		verbose:    false
	}.update(opts)
end

.check_reachability(start, dest, symbols) ⇒ Boolean

This method checks to see if the parser would be in parse state dest after starting in state start and reading symbols.

Parameters:

  • start (Symbol)

    Symbol representing a CFG production.

  • dest (Symbol)

    Symbol representing a CFG production.

  • symbols (Array<Symbol>)

    Grammar symbols.

Returns:

  • (Boolean)

    If the destination symbol is reachable from the start symbol after reading symbols.



302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
# File 'lib/rltk/parser.rb', line 302

def check_reachability(start, dest, symbols)
	path_exists = true
	cur_state   = start

	symbols.each do |sym|

		actions = @states[cur_state.id].on?(sym)
		actions = actions.select { |a| a.is_a?(Shift) } if CFG::is_terminal?(sym)

		if actions.empty?
			path_exists = false
			break
		end

		# There can only be one Shift action for terminals and
		# one GoTo action for non-terminals, so we know the
		# first action is the only one in the list.
		cur_state = @states[actions.first.id]
	end

	path_exists and cur_state.id == dest.id
end

.check_sanityvoid

This method returns an undefined value.

This method is used to (surprise) check the sanity of the constructed parser. It checks to make sure all non-terminals used in the grammar definition appear on the left-hand side of one or more productions, and that none of the parser’s states have invalid actions. If a problem is encountered a ParserConstructionException is raised.



250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
# File 'lib/rltk/parser.rb', line 250

def check_sanity
	# Check to make sure all non-terminals appear on the
	# left-hand side of some production.
	@grammar.nonterms.each do |sym|
		if not @lh_sides.values.include?(sym)
			raise ParserConstructionException, "Non-terminal #{sym} does not appear on the left-hand side of any production."
		end
	end

	# Check the actions in each state.
	each_state do |state|
		state.actions.each do |sym, actions|
			if CFG::is_terminal?(sym)
				# Here we check actions for terminals.
				actions.each do |action|
					if action.is_a?(Accept)
						if sym != :EOS
							raise ParserConstructionException, "Accept action found for terminal #{sym} in state #{state.id}."
						end

					elsif not (action.is_a?(GoTo) or action.is_a?(Reduce) or action.is_a?(Shift))
						raise ParserConstructionException, "Object of type #{action.class} found in actions for terminal " +
							"#{sym} in state #{state.id}."

					end
				end

				if (conflict = state.conflict_on?(sym))
					self.inform_conflict(state.id, conflict, sym)
				end
			else
				# Here we check actions for non-terminals.
				if actions.length > 1
					raise ParserConstructionException, "State #{state.id} has multiple GoTo actions for non-terminal #{sym}."

				elsif actions.length == 1 and not actions.first.is_a?(GoTo)
					raise ParserConstructionException, "State #{state.id} has non-GoTo action for non-terminal #{sym}."

				end
			end
		end
	end
end

.clause(expression, precedence = nil, arg_type = @default_arg_type, &action) ⇒ void Also known as: c

This method returns an undefined value.

Declares a new clause inside of a production. The right-hand side is specified by expression and the precedence of this production can be changed by setting the precedence argument to some terminal symbol.

Parameters:

  • expression (String, Symbol)

    Right-hand side of a production.

  • precedence (Symbol) (defaults to: nil)

    Symbol representing the precedence of this production.

  • arg_type (:array, :splat) (defaults to: @default_arg_type)

    Method to use when passing arguments to the action.

  • action (Proc)

    Action to be taken when the production is reduced.



336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
# File 'lib/rltk/parser.rb', line 336

def clause(expression, precedence = nil, arg_type = @default_arg_type, &action)
	# Use the curr_prec only if it isn't overridden for this
	# clause.
	precedence ||= @curr_prec

	production, selections = @grammar.clause(expression)

	# Check to make sure the action's arity matches the number
	# of symbols on the right-hand side.
	expected_arity = (selections.empty? ? production.rhs.length : selections.length)
	if arg_type == :splat and action.arity != expected_arity
		raise ParserConstructionException,
		      "Incorrect number of action parameters.  Expected #{expected_arity} but got #{action.arity}." +
			  ' Action arity must match the number of terminals and non-terminals in the clause.'
	end

	# Add the action to our proc list.
	@procs[production.id] = [ProdProc.new(arg_type, selections, &action), production.rhs.length]

	# If no precedence is specified use the precedence of the
	# last terminal in the production.
	@production_precs[production.id] = precedence || production.last_terminal
end

.cleanvoid

This method returns an undefined value.

Removes resources that were needed to generate the parser but aren’t needed when actually parsing input.



365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
# File 'lib/rltk/parser.rb', line 365

def clean
	# We've told the developer about conflicts by now.
	@conflicts = nil

	# Drop the grammar and the grammar'.
	@grammar       = nil
	@grammar_prime = nil

	# Drop precedence and bookkeeping information.
	@cur_lhs  = nil
	@cur_prec = nil

	@prec_counts      = nil
	@production_precs = nil
	@token_precs      = nil

	# Drop the items from each of the states.
	each_state { |state| state.clean }
end

.default_arg_type(type) ⇒ void Also known as: dat

This method returns an undefined value.

Set the default argument type for the actions associated with clauses. All actions defined after this call will be passed arguments in the way specified here, unless overridden in the call to clause.

Parameters:

  • type (:array, :splat)

    The default argument type.



393
394
395
# File 'lib/rltk/parser.rb', line 393

def default_arg_type(type)
	@default_arg_type = type if type == :array or type == :splat
end

.each_state {|state| ... } ⇒ void

This method returns an undefined value.

Iterate over the parser’s states.

Yield Parameters:

  • state (State)

    One of the parser automaton’s state objects



695
696
697
698
699
700
701
# File 'lib/rltk/parser.rb', line 695

def each_state
	current_state = 0
	while current_state < @states.count
		yield @states.at(current_state)
		current_state += 1
	end
end

.explain(io) ⇒ void

This method returns an undefined value.

This function will print a description of the parser to the provided IO object.

Parameters:

  • io (IO)

    Input/Output object used for printing the parser’s explanation.



412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
# File 'lib/rltk/parser.rb', line 412

def explain(io)
	if @grammar and not @states.empty?
		io.puts('###############')
		io.puts('# Productions #')
		io.puts('###############')
		io.puts

		max_id_length = @grammar.productions(:id).length.to_s.length

		# Print the productions.
		@grammar.productions.each do |sym, productions|

			max_rhs_length = productions.inject(0) { |m, p| if (len = p.to_s.length) > m then len else m end }

			productions.each do |production|
				p_string = production.to_s

				io.print("\tProduction #{sprintf("%#{max_id_length}d", production.id)}: #{p_string}")

				if (prec = @production_precs[production.id])
					io.print(' ' * (max_rhs_length - p_string.length))
					io.print(" : (#{sprintf("%-5s", prec.first)}, #{prec.last})")
				end

				io.puts
			end

			io.puts
		end

		io.puts('##########')
		io.puts('# Tokens #')
		io.puts('##########')
		io.puts

		max_token_len = @grammar.terms.inject(0) { |m, t| if t.length > m then t.length else m end }

		@grammar.terms.sort {|a,b| a.to_s <=> b.to_s }.each do |term|
			io.print("\t#{term}")

			if (prec = @token_precs[term])
				io.print(' ' * (max_token_len - term.length))
				io.print(" : (#{sprintf("%-5s", prec.first)}, #{prec.last})")
			end

			io.puts
		end

		io.puts

		io.puts('#####################')
		io.puts('# Table Information #')
		io.puts('#####################')
		io.puts

		io.puts("\tStart symbol: #{@grammar.start_symbol}'")
		io.puts

		io.puts("\tTotal number of states: #{@states.length}")
		io.puts

		io.puts("\tTotal conflicts: #{@conflicts.values.flatten(1).length}")
		io.puts

		@conflicts.each do |state_id, conflicts|
			io.puts("\tState #{state_id} has #{conflicts.length} conflict(s)")
		end

		io.puts if not @conflicts.empty?

		# Print the parse table.
		io.puts('###############')
		io.puts('# Parse Table #')
		io.puts('###############')
		io.puts

		each_state do |state|
			io.puts("State #{state.id}:")
			io.puts

			io.puts("\t# ITEMS #")
			max = state.items.inject(0) do |max, item|
				if item.lhs.to_s.length > max then item.lhs.to_s.length else max end
			end

			state.each do |item|
				io.puts("\t#{item.to_s(max)}")
			end

			io.puts
			io.puts("\t# ACTIONS #")

			state.actions.keys.sort {|a,b| a.to_s <=> b.to_s}.each do |sym|
				state.actions[sym].each do |action|
					io.puts("\tOn #{sym} #{action}")
				end
			end

			io.puts
			io.puts("\t# CONFLICTS #")

			if @conflicts[state.id].length == 0
				io.puts("\tNone\n\n")
			else
				@conflicts[state.id].each do |conflict|
					type, sym = conflict

					io.print("\t#{if type == :SR then "Shift/Reduce" else "Reduce/Reduce" end} conflict")

					io.puts(" on #{sym}")
				end

				io.puts
			end
		end

		# Close any IO objects that aren't $stdout.
		io.close if io.is_a?(IO) and io != $stdout
	else
		raise ParserConstructionException, 'Parser.explain called outside of finalize.'
	end
end

.finalize(opts = {}) ⇒ void

This method returns an undefined value.

This method will finalize the parser causing the construction of states and their actions, and the resolution of conflicts using lookahead and precedence information.

No calls to production may appear after the call to Parser.finalize.

Parameters:

  • opts (Hash) (defaults to: {})

    Options describing how to finalize the parser.

Options Hash (opts):

  • :explain (Boolean, String, IO)

    To explain the parser or not.

  • :lookahead (Boolean)

    To use lookahead info for conflict resolution.

  • :precedence (Boolean)

    To use precedence info for conflict resolution.

  • :use (String, IO)

    A file name or object that is used to load/save the parser.



550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
# File 'lib/rltk/parser.rb', line 550

def finalize(opts = {})

	if @grammar.productions.empty?
		raise ParserConstructionException,
		      "Parser has no productions.  Cowardly refusing to construct an empty parser."
	end

	# Get the full options hash.
	opts = build_finalize_opts(opts)

	# Get the name of the file in which the parser is defined.
	#
	# FIXME: See why this is failing for the simple ListParser example.
	def_file = caller()[2].split(':')[0] if opts[:use]

	# Check to make sure we can load the necessary information
	# from the specified object.
	if opts[:use] and (
		(opts[:use].is_a?(String) and File.exists?(opts[:use]) and File.mtime(opts[:use]) > File.mtime(def_file)) or
		(opts[:use].is_a?(File) and opts[:use].mtime > File.mtime(def_file))
		)

		file = self.get_io(opts[:use], 'r')

		# Un-marshal our saved data structures.
		file.flock(File::LOCK_SH)
		@lh_sides, @states, @symbols = Marshal.load(file)
		file.flock(File::LOCK_UN)

		# Close the file if we opened it.
		file.close if opts[:use].is_a?(String)

		# Remove any un-needed data and return.
		return self.clean
	end

	# Grab all of the symbols that comprise the grammar
	# (besides the start symbol).
	@symbols = @grammar.symbols << :ERROR

	# Add our starting state to the state list.
	@start_symbol       = (@grammar.start_symbol.to_s + '\'').to_sym
	start_production, _ = @grammar.production(@start_symbol, @grammar.start_symbol).first
	start_state         = State.new(@symbols, [start_production.to_item])

	start_state.close(@grammar.productions)

	self.add_state(start_state)

	# Translate the precedence of productions from tokens to
	# (associativity, precedence) pairs.
	@production_precs.map! { |prec| @token_precs[prec] }

	# Build the rest of the transition table.
	each_state do |state|
		#Transition states.
		tstates = Hash.new { |h,k| h[k] = State.new(@symbols) }

		#Bin each item in this set into reachable transition
		#states.
		state.each do |item|
			if (next_symbol = item.next_symbol)
				tstates[next_symbol] << item.copy
			end
		end

		# For each transition state:
		#  1) Get transition symbol
		#  2) Advance dot
		#  3) Close it
		#  4) Get state id and add transition
		tstates.each do |symbol, tstate|
			tstate.each { |item| item.advance }

			tstate.close(@grammar.productions)

			id = self.add_state(tstate)

			# Add Goto and Shift actions.
			state.on(symbol, CFG::is_nonterminal?(symbol) ? GoTo.new(id) : Shift.new(id))
		end

		# Find the Accept and Reduce actions for this state.
		state.each do |item|
			if item.at_end?
				if item.lhs == @start_symbol
					state.on(:EOS, Accept.new)
				else
					state.add_reduction(@grammar.productions(:id)[item.id])
				end
			end
		end
	end

	# Build the production.id -> production.lhs map.
	@grammar.productions(:id).each { |id, production| @lh_sides[id] = production.lhs }

	# Prune the parsing table for unnecessary reduce actions.
	self.prune(opts[:lookahead], opts[:precedence])

	# Check the parser for inconsistencies.
	self.check_sanity

	# Print the table if requested.
	self.explain(opts[:explain]) if opts[:explain]

	# Remove any data that is no longer needed.
	self.clean

	# Store the parser's final data structures if requested.
	if opts[:use]
		io = self.get_io(opts[:use])

		io.flock(File::LOCK_EX) if io.is_a?(File)
		Marshal.dump([@lh_sides, @states, @symbols], io)
		io.flock(File::LOCK_UN) if io.is_a?(File)

		# Close the IO object if we opened it.
		io.close if opts[:use].is_a?(String)
	end
end

.get_io(o, mode = 'w') ⇒ IO, false

Converts an object into an IO object as appropriate.

Parameters:

  • o (Object)

    Object to be converted into an IO object.

  • mode (String) (defaults to: 'w')

    String representing the mode to open the IO object in.

Returns:

  • (IO, false)

    The IO object or false if a conversion wasn’t possible.



678
679
680
681
682
683
684
685
686
687
688
# File 'lib/rltk/parser.rb', line 678

def get_io(o, mode = 'w')
	if o.is_a?(TrueClass)
		$stdout
	elsif o.is_a?(String)
		File.open(o, mode)
	elsif o.is_a?(IO)
		o
	else
		false
	end
end

.grammarCFG

Returns The grammar that can be parsed by this Parser.

Returns:

  • (CFG)

    The grammar that can be parsed by this Parser.



704
705
706
# File 'lib/rltk/parser.rb', line 704

def grammar
	@grammar.clone
end

.grammar_primeCFG

This method generates and memoizes the G’ grammar used to calculate the LALR(1) lookahead sets. Information about this grammar and its use can be found in the following paper:

Simple Computation of LALR(1) Lookahead Sets Manuel E. Bermudez and George Logothetis Information Processing Letters 31 - 1989

Returns:



717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
# File 'lib/rltk/parser.rb', line 717

def grammar_prime
	if not @grammar_prime
		@grammar_prime = CFG.new

		each_state do |state|
			state.each do |item|
				lhs = "#{state.id}_#{item.next_symbol}".to_sym

				next unless CFG::is_nonterminal?(item.next_symbol) and not @grammar_prime.productions.keys.include?(lhs)

				@grammar.productions[item.next_symbol].each do |production|
					rhs = ''

					cstate = state

					production.rhs.each do |symbol|
						rhs += "#{cstate.id}_#{symbol} "

						cstate = @states[cstate.on?(symbol).first.id]
					end

					@grammar_prime.production(lhs, rhs)
				end
			end
		end
	end

	@grammar_prime
end

.inform_conflict(state_id, type, sym) ⇒ void

This method returns an undefined value.

Inform the parser core that a conflict has been detected.

Parameters:

  • state_id (Integer)

    ID of the state where the conflict was encountered.

  • type (:RR, :SR)

    Reduce/Reduce or Shift/Reduce conflict.

  • sym (Symbol)

    Symbol that caused the conflict.



754
755
756
# File 'lib/rltk/parser.rb', line 754

def inform_conflict(state_id, type, sym)
	@conflicts[state_id] << [type, sym]
end

.inherited(klass) ⇒ void

This method returns an undefined value.

Called when the Lexer class is sub-classed, it installes necessary instance class variables.



181
182
183
# File 'lib/rltk/parser.rb', line 181

def inherited(klass)
	klass.install_icvars
end

.install_icvarsvoid

This method returns an undefined value.

Installs instance class varialbes into a class.



119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
# File 'lib/rltk/parser.rb', line 119

def install_icvars
	@curr_lhs  = nil
	@curr_prec = nil

	@conflicts = Hash.new {|h, k| h[k] = Array.new}
	@grammar   = CFG.new

	@lh_sides  = Hash.new
	@procs     = Array.new
	@states    = Array.new

	# Variables for dealing with precedence.
	@prec_counts      = {:left => 0, :right => 0, :non => 0}
	@production_precs = Array.new
	@token_precs      = Hash.new
	@token_hooks      = Hash.new {|h, k| h[k] = []}

	# Set the default argument handling policy.  Valid values
	# are :array and :splat.
	@default_arg_type = :splat

	@grammar.callback do |type, which, p, sels = []|
		@procs[p.id] = [
			case type
			when :optional
				case which
				when :empty then ProdProc.new { ||  nil }
				else             ProdProc.new { |o|   o }
				end

			when :elp
				case which
				when :empty then ProdProc.new { ||         [] }
				else             ProdProc.new { |prime| prime }
				end

			when :nelp
				case which
				when :single
					ProdProc.new { |el| [el] }

				when :multiple
					ProdProc.new(:splat, sels) do |*syms|
						el = syms[1..-1]
						syms.first << (el.length == 1 ? el.first : el)
					end

				else
					ProdProc.new { |*el|   el.length == 1 ? el.first : el }
				end
			end,
			p.rhs.length
		]

		@production_precs[p.id] = p.last_terminal
	end
end

.left(*symbols) ⇒ void

This method returns an undefined value.

This method is used to specify that the symbols in symbols are left-associative. Subsequent calls to this method will give their arguments higher precedence.

Parameters:

  • symbols (Array<Symbol>)

    Symbols that are left associative.



765
766
767
768
769
770
771
# File 'lib/rltk/parser.rb', line 765

def left(*symbols)
	prec_level = @prec_counts[:left] += 1

	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:left, prec_level]
	end
end

.new(*args) ⇒ Object

The overridden new prevents un-finalized parsers from being instantiated.



108
109
110
111
112
113
114
# File 'lib/rltk/parser.rb', line 108

def new(*args)
	if @symbols.nil?
		raise UselessParserException
	else
		super(*args)
	end
end

.nonassoc(*symbols) ⇒ void

This method returns an undefined value.

This method is used to specify that the symbols in symbols are non-associative.

Parameters:

  • symbols (Array<Symbol>)

    Symbols that are non-associative.



779
780
781
782
783
784
785
# File 'lib/rltk/parser.rb', line 779

def nonassoc(*symbols)
	prec_level = @prec_counts[:non] += 1

	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:non, prec_level]
	end
end

.parse(tokens, opts = {}) ⇒ Object+

This function is where actual parsing takes place. The tokens argument must be an array of Token objects, the last of which has type EOS. By default this method will return the value computed by the first successful parse tree found.

Additional information about the parsing options can be found in the main documentation.

Parameters:

  • tokens (Array<Token>)

    Tokens to be parsed.

  • opts (Hash) (defaults to: {})

    Options to use when parsing input.

Options Hash (opts):

  • :accept (:first, :all)

    Either :first or :all.

  • :env (Object)

    The environment in which to evaluate the production action.

  • :parse_tree (Boolean, String, IO)

    To print parse trees in the DOT language or not.

  • :verbose (Boolean, String, IO)

    To be verbose or not.

Returns:

  • (Object, Array<Object>)

    Result or results of parsing the given tokens.



812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
# File 'lib/rltk/parser.rb', line 812

def parse(tokens, opts = {})
	# Get the full options hash.
	opts = build_parse_opts(opts)
	v    = opts[:verbose]

	if opts[:verbose]
		v.puts("Input tokens:")
		v.puts(tokens.map { |t| t.type }.inspect)
		v.puts
	end

	# Stack IDs to keep track of them during parsing.
	stack_id = 0

	# Error mode indicators.
	error_mode      = false
	reduction_guard = false

	# Our various list of stacks.
	accepted   = []
	moving_on  = []
	processing = [ParseStack.new(stack_id += 1)]

	# Iterate over the tokens.  We don't procede to the
	# next token until every stack is done with the
	# current one.
	tokens.each_with_index do |token, index|
		# Check to make sure this token was seen in the
		# grammar definition.
		raise BadToken if not @symbols.include?(token.type)

		v.puts("Current token: #{token.type}#{if token.value then "(#{token.value})" end}") if v

		# Iterate over the stacks until each one is done.
		while (stack = processing.shift)
			# Execute any token hooks in this stack's environment.
			@token_hooks[token.type].each { |hook| opts[:env].instance_exec &hook}

			# Get the available actions for this stack.
			actions = @states[stack.state].on?(token.type)

			if actions.empty?
				# If we are already in error mode and there
				# are no actions we skip this token.
				if error_mode
					v.puts("Discarding token: #{token.type}#{if token.value then "(#{token.value})" end}") if v

					# Add the current token to the array
					# that corresponds to the output value
					# for the ERROR token.
					stack.output_stack.last << token

					moving_on << stack
					next
				end

				# We would be dropping the last stack so we
				# are going to go into error mode.
				if accepted.empty? and moving_on.empty? and processing.empty?

					if v
						v.puts
						v.puts('Current stack:')
						v.puts("\tID: #{stack.id}")
						v.puts("\tState stack:\t#{stack.state_stack.inspect}")
						v.puts("\tOutput Stack:\t#{stack.output_stack.inspect}")
						v.puts
					end

					# Try and find a valid error state.
					while stack.state
						if (actions = @states[stack.state].on?(:ERROR)).empty?
							# This state doesn't have an
							# error production. Moving on.
							stack.pop
						else
							# Enter the found error state.
							stack.push(actions.first.id, [token], :ERROR, token.position)

							break
						end
					end

					if stack.state
						# We found a valid error state.
						error_mode = reduction_guard = true
						opts[:env].he = true
						moving_on << stack

						if v
							v.puts('Invalid input encountered.  Entering error handling mode.')
							v.puts("Discarding token: #{token.type}#{if token.value then "(#{token.value})" end}")
						end
					else
						# No valid error states could be
						# found.  Time to print a message
						# and leave.

						v.puts("No more actions for stack #{stack.id}.  Dropping stack.") if v
					end
				else
					v.puts("No more actions for stack #{stack.id}.  Dropping stack.") if v
				end

				next
			end

			# Make (stack, action) pairs, duplicating the
			# stack as necessary.
			pairs = [[stack, actions.pop]] + actions.map {|action| [stack.branch(stack_id += 1), action] }

			pairs.each do |stack, action|
				if v
					v.puts
					v.puts('Current stack:')
					v.puts("\tID: #{stack.id}")
					v.puts("\tState stack:\t#{stack.state_stack.inspect}")
					v.puts("\tOutput Stack:\t#{stack.output_stack.inspect}")
					v.puts
					v.puts("Action taken: #{action.to_s}")
				end

				if action.is_a?(Accept)
					if opts[:accept] == :all
						accepted << stack
					else
						v.puts('Accepting input.') if v
						opts[:parse_tree].puts(stack.tree) if opts[:parse_tree]

						if opts[:env].he
							raise HandledError.new(opts[:env].errors, stack.result)
						else
							return stack.result
						end
					end

				elsif action.is_a?(Reduce)
					# Get the production associated with this reduction.
					production_proc, pop_size = @procs[action.id]

					if not production_proc
						raise InternalParserException, "No production #{action.id} found."
					end

					args, positions = stack.pop(pop_size)
					opts[:env].set_positions(positions)

					if not production_proc.selections.empty?
						args = args.values_at(*production_proc.selections)
					end

					result =
					if production_proc.arg_type == :array
						opts[:env].instance_exec(args, &production_proc)
					else
						opts[:env].instance_exec(*args, &production_proc)
					end

					if (goto = @states[stack.state].on?(@lh_sides[action.id]).first)

						v.puts("Going to state #{goto.id}.\n") if v

						pos0 = nil

						if args.empty?
							# Empty productions need to be
							# handled specially.
							pos0 = stack.position

							pos0.stream_offset	+= pos0.length + 1
							pos0.line_offset	+= pos0.length + 1

							pos0.length = 0
						else
							pos0 = opts[:env].pos( 0)
							pos1 = opts[:env].pos(-1)

							pos0.length = (pos1.stream_offset + pos1.length) - pos0.stream_offset
						end

						stack.push(goto.id, result, @lh_sides[action.id], pos0)
					else
						raise InternalParserException, "No GoTo action found in state #{stack.state} " +
							"after reducing by production #{action.id}"
					end

					# This stack is NOT ready for the next
					# token.
					processing << stack

					# Exit error mode if necessary.
					error_mode = false if error_mode and not reduction_guard

				elsif action.is_a?(Shift)
					stack.push(action.id, token.value, token.type, token.position)

					# This stack is ready for the next
					# token.
					moving_on << stack

					# Exit error mode.
					error_mode = false
				end
			end
		end

		v.puts("\n\n") if v

		processing = moving_on
		moving_on  = []

		# If we don't have any active stacks at this point the
		# string isn't in the language.
		if opts[:accept] == :first and processing.length == 0
			v.close if v and v != $stdout
			raise NotInLanguage.new(tokens[0...index], tokens[index], tokens[index.next..-1])
		end

		reduction_guard = false
	end

	# If we have reached this point we are accepting all parse
	# trees.
	if v
		v.puts("Accepting input with #{accepted.length} derivation(s).")

		v.close if v != $stdout
	end

	accepted.each do |stack|
		opts[:parse_tree].puts(stack.tree)
	end if opts[:parse_tree]

	results = accepted.map { |stack| stack.result }

	if opts[:env].he
		raise HandledError.new(opts[:env].errors, results)
	else
		return results
	end
end

.production(symbol, expression = nil, precedence = nil, arg_type = @default_arg_type, &action) ⇒ void Also known as: p

This method returns an undefined value.

Adds a new production to the parser with a left-hand value of symbol. If expression is specified it is taken as the right-hand side of the production and action is associated with the production. If expression is nil then action is evaluated and expected to make one or more calls to Parser.clause. A precedence can be associate with this production by setting precedence to a terminal symbol.

Parameters:

  • symbol (Symbol)

    Left-hand side of the production.

  • expression (String, Symbol, nil) (defaults to: nil)

    Right-hand side of the production.

  • precedence (Symbol, nil) (defaults to: nil)

    Symbol representing the precedence of this produciton.

  • arg_type (:array, :splat) (defaults to: @default_arg_type)

    Method to use when passing arguments to the action.

  • action (Proc)

    Action associated with this production.



1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
# File 'lib/rltk/parser.rb', line 1069

def production(symbol, expression = nil, precedence = nil, arg_type = @default_arg_type, &action)

	# Check the symbol.
	if not (symbol.is_a?(Symbol) or symbol.is_a?(String)) or not CFG::is_nonterminal?(symbol)
		raise ParserConstructionException, 'Production symbols must be Strings or Symbols and be in all lowercase.'
	end

	@grammar.curr_lhs = symbol.to_sym
	@curr_prec        = precedence

	orig_dat = nil
	if arg_type != @default_arg_type
		orig_dat = @default_arg_type
		@default_arg_type = arg_type
	end

	if expression
		self.clause(expression, precedence, &action)
	else
		self.instance_exec(&action)
	end

	@default_arg_type = orig_dat if not orig_dat.nil?

	@grammar.curr_lhs = nil
	@curr_prec        = nil
end

.prune(do_lookahead, do_precedence) ⇒ void

This method returns an undefined value.

This method uses lookahead sets and precedence information to resolve conflicts and remove unnecessary reduce actions.

Parameters:

  • do_lookahead (Boolean)

    Prune based on lookahead sets or not.

  • do_precedence (Boolean)

    Prune based on precedence or not.



1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
# File 'lib/rltk/parser.rb', line 1105

def prune(do_lookahead, do_precedence)
	terms = @grammar.terms

	# If both options are false there is no pruning to do.
	return if not (do_lookahead or do_precedence)

	each_state do |state0|

		#####################
		# Lookahead Pruning #
		#####################

		if do_lookahead
			# Find all of the reductions in this state.
			reductions = state0.actions.values.flatten.uniq.select { |a| a.is_a?(Reduce) }

			reductions.each do |reduction|
				production = @grammar.productions(:id)[reduction.id]

				lookahead = Array.new

				# Build the lookahead set.
				each_state do |state1|
					if self.check_reachability(state1, state0, production.rhs)
						lookahead |= self.grammar_prime.follow_set("#{state1.id}_#{production.lhs}".to_sym)
					end
				end

				# Translate the G' follow symbols into G
				# lookahead symbols.
				lookahead = lookahead.map { |sym| sym.to_s.split('_', 2).last.to_sym }.uniq

				# Here we remove the unnecessary reductions.
				# If there are error productions we need to
				# scale back the amount of pruning done.
				pruning_candidates = terms - lookahead

				if terms.include?(:ERROR)
					pruning_candidates.each do |sym|
						state0.actions[sym].delete(reduction) if state0.conflict_on?(sym)
					end
				else
					pruning_candidates.each { |sym| state0.actions[sym].delete(reduction) }
				end
			end
		end

		########################################
		# Precedence and Associativity Pruning #
		########################################

		if do_precedence
			state0.actions.each do |symbol, actions|

				# We are only interested in pruning actions
				# for terminal symbols.
				next unless CFG::is_terminal?(symbol)

				# Skip to the next one if there is no
				# possibility of a Shift/Reduce or
				# Reduce/Reduce conflict.
				next unless actions and actions.length > 1

				resolve_ok = actions.inject(true) do |m, a|
					if a.is_a?(Reduce)
						m and @production_precs[a.id]
					else
						m
					end
				end and actions.inject(false) { |m, a| m or a.is_a?(Shift) }

				if @token_precs[symbol] and resolve_ok
					max_prec = 0
					selected_action = nil

					# Grab the associativity and precedence
					# for the input token.
					tassoc, tprec = @token_precs[symbol]

					actions.each do |a|
						assoc, prec = a.is_a?(Shift) ? [tassoc, tprec] : @production_precs[a.id]

						# If two actions have the same precedence we
						# will only replace the previous production if:
						#  * The token is left associative and the current action is a Reduce
						#  * The token is right associative and the current action is a Shift
						if prec > max_prec or (prec == max_prec and tassoc == (a.is_a?(Shift) ? :right : :left))
							max_prec        = prec
							selected_action = a

						elsif prec == max_prec and assoc == :nonassoc
							raise ParserConstructionException, 'Non-associative token found during conflict resolution.'

						end
					end

					state0.actions[symbol] = [selected_action]
				end
			end
		end
	end
end

.right(*symbols) ⇒ void

This method returns an undefined value.

This method is used to specify that the symbols in symbols are right associative. Subsequent calls to this method will give their arguments higher precedence.

Parameters:

  • symbols (Array<Symbol>)

    Symbols that are right-associative.



1215
1216
1217
1218
1219
1220
1221
# File 'lib/rltk/parser.rb', line 1215

def right(*symbols)
	prec_level = @prec_counts[:right] += 1

	symbols.map { |s| s.to_sym }.each do |sym|
		@token_precs[sym] = [:right, prec_level]
	end
end

.start(symbol) ⇒ void

This method returns an undefined value.

Changes the starting symbol of the parser.

Parameters:

  • symbol (Symbol)

    The starting symbol of the grammar.



1228
1229
1230
# File 'lib/rltk/parser.rb', line 1228

def start(symbol)
	@grammar.start symbol
end

.token_hook(sym, &proc) ⇒ void

This method returns an undefined value.

Add a hook that is executed whenever sym is seen.

The sym must be a terminal symbol.

Parameters:

  • sym (Symbol)

    Symbol to hook into

  • proc (Proc)

    Code to execute when the block is seen



1240
1241
1242
1243
1244
1245
1246
# File 'lib/rltk/parser.rb', line 1240

def token_hook(sym, &proc)
	if CFG::is_terminal?(sym)
		@token_hooks[sym] << proc
	else
		raise 'Method token_hook expects `sym` to be non-terminal.'
	end
end

Instance Method Details

#parse(tokens, opts = {}) ⇒ Object

Parses the given token stream using the encapsulated environment.

See Also:



1262
1263
1264
# File 'lib/rltk/parser.rb', line 1262

def parse(tokens, opts = {})
	self.class.parse(tokens, {:env => @env}.update(opts))
end