Class: RLTK::CFG

Inherits:
Object show all
Defined in:
lib/rltk/cfg.rb

Overview

The CFG class is used to represent context-free grammars. It is used by the RLTK::Parser class to represent the parser’s grammar, but can also be used to manipulate arbitrary CFGs.

Defined Under Namespace

Classes: Item, Production

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(&callback) ⇒ CFG

Instantiates a new CFG object that uses callback to inform the programmer of the generation of new productions due to EBNF operators.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded.



73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
# File 'lib/rltk/cfg.rb', line 73

def initialize(&callback)
	@curr_lhs			= nil
	@callback			= callback || Proc.new {}
	@lexer			= Lexers::EBNF.new
	@production_counter	= -1
	@start_symbol		= nil
	@wrapper_symbol	= nil
	
	@productions_id	= Hash.new
	@productions_sym	= Hash.new { |h, k| h[k] = [] }
	@production_buffer	= Array.new
	
	@terms	= Hash.new(false).update({:EOS => true})
	@nonterms	= Hash.new(false)
	
	@firsts	= Hash.new
	@follows	= Hash.new { |h,k| h[k] = Array.new }
end

Instance Attribute Details

#curr_lhsSymbol

This is used by the #production method to wrap #clause calls.

Returns:

  • (Symbol)

    The current left-hand side symbol.



38
39
40
# File 'lib/rltk/cfg.rb', line 38

def curr_lhs
  @curr_lhs
end

#start_symbolSymbol (readonly)

Returns The grammar’s starting symbol.

Returns:

  • (Symbol)

    The grammar’s starting symbol.



32
33
34
# File 'lib/rltk/cfg.rb', line 32

def start_symbol
  @start_symbol
end

Class Method Details

.is_nonterminal?(sym) ⇒ Boolean

Tests to see if a symbol is a non-terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


60
61
62
# File 'lib/rltk/cfg.rb', line 60

def self.is_nonterminal?(sym)
	sym and (s = sym.to_s) == s.downcase
end

.is_terminal?(sym) ⇒ Boolean

Tests to see if a symbol is a terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


50
51
52
# File 'lib/rltk/cfg.rb', line 50

def self.is_terminal?(sym)
	sym and (s = sym.to_s) == s.upcase
end

Instance Method Details

#add_production(production) ⇒ void

This method returns an undefined value.

Adds production to the appropriate internal data structures.

Parameters:

  • production (Production)

    The production to add to the grammar.



97
98
99
100
101
# File 'lib/rltk/cfg.rb', line 97

def add_production(production)
	@productions_sym[production.lhs] << (@productions_id[production.id] = production)
	
	production
end

#callback(&callback) ⇒ void

This method returns an undefined value.

Sets the EBNF callback to callback.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded and list productions are added.



108
109
110
# File 'lib/rltk/cfg.rb', line 108

def callback(&callback)
	@callback = callback || Proc.new {}
end

#clause(expression) ⇒ Production

This function MUST be called inside a CFG.production block. It will make a new production with the left-hand side specified by the CFG.production call’s argument. This is the function that is responsible for removing EBNF symbols from the grammar.

Parameters:

  • expression (String)

    The right-hand side of a CFG production.

Returns:

Raises:



120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
# File 'lib/rltk/cfg.rb', line 120

def clause(expression)
	raise GrammarError, 'CFG#clause called outside of CFG#production block.' if not @curr_lhs
	
	lhs		= @curr_lhs.to_sym
	rhs		= Array.new
	tokens	= @lexer.lex(expression)
	
	# Set this as the start symbol if there isn't one already
	# defined.
	@start_symbol ||= lhs
	
	# Remove EBNF tokens and replace them with new productions.
	tokens.each_index do |i|
		ttype0	= tokens[i].type
		tvalue0	= tokens[i].value
		
		if ttype0 == :TERM or ttype0 == :NONTERM
			
			# Add this symbol to the correct collection.
			(ttype0 == :TERM ? @terms : @nonterms)[tvalue0] = true
			
			if i + 1 < tokens.length
				ttype1	= tokens[i + 1].type
				tvalue1	= tokens[i + 1].value
				
				rhs <<
				case ttype1
				when :'?'	then self.get_question(tvalue0)						
				when :*	then self.get_star(tvalue0)
				when :+	then self.get_plus(tvalue0)
				else			tvalue0
				end
			else
				rhs << tvalue0
			end
		end
	end
	
	# Make the production.
	@production_buffer << (production = Production.new(self.next_id, lhs, rhs))
	
	# Make sure the production symbol is collected.
	@nonterms[lhs] = true
	
	# Add the new production to our collections.
	self.add_production(production)
	
	return production
end

#empty_list_production(symbol, list_elements, separator) ⇒ void Also known as: empty_list

This method returns an undefined value.

This method adds the necessary productions for empty lists to the grammar. These productions are named ‘symbol`, `symbol + ’_prime’‘ and `symbol + ’_elements’‘

Parameters:

  • symbol (Symbol)

    The name of the production to add.

  • list_elements (Array<String>)

    An array of expressions that may appear in the list.

  • separator (Symbol)

    The list separator symbol.



179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
# File 'lib/rltk/cfg.rb', line 179

def empty_list_production(symbol, list_elements, separator)
	# Add the items for the following productions:
	#
	# symbol: | symbol_prime
	
	prime = symbol.to_s + '_prime'
	
	# 1st Production
	production = self.production(symbol, '').first
	@callback.call(production, :elp, :first)
	
	# 2nd Production
	production = self.production(symbol, prime.to_s).first
	@callback.call(production, :elp, :second)
	
	self.nonempty_list(prime, list_elements, separator)
end

#first_set(sentence) ⇒ Array<Symbol>

Returns The *first set* for the given sentence.

Parameters:

  • sentence (Symbol, Array<Symbol>)

    Sentence to find the *first set* for.

Returns:

  • (Array<Symbol>)

    The *first set* for the given sentence.



201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
# File 'lib/rltk/cfg.rb', line 201

def first_set(sentence)
	if sentence.is_a?(Symbol)
		first_set_prime(sentence)
		
	elsif sentence.inject(true) { |m, sym| m and self.symbols.include?(sym) }
		set0 = []
		all_have_empty = true
		
		sentence.each do |sym|
			set0 |= (set1 = self.first_set(sym)) - [:'ɛ']
			
			break if not (all_have_empty = set1.include?(:'ɛ'))
		end
		
		if all_have_empty then set0 + [:'ɛ'] else set0 end
	end
end

#follow_set(sym0, seen_lh_sides = []) ⇒ Array<Symbol>

Returns the follow set for a given symbol. The second argument is used to avoid infinite recursion when mutually recursive rules are encountered.

Parameters:

  • sym0 (Symbol)

    The symbol to find the *follow set* for.

  • seen_lh_sides (Array<Symbol>) (defaults to: [])

    Previously seen LHS symbols.

Returns:

  • (Array<Symbol>)


284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
# File 'lib/rltk/cfg.rb', line 284

def follow_set(sym0, seen_lh_sides = [])
	
	# Use the memoized set if possible.
	return @follows[sym0] if @follows.has_key?(sym0)
	
	if @nonterms[sym0]
		set0 = []
		
		# Add EOS to the start symbol's follow set.
		set0 << :EOS if sym0 == @start_symbol
		
		@productions_id.values.each do |production|
			production.rhs.each_with_index do |sym1, i|
				if i + 1 < production.rhs.length
					if sym0 == sym1
						set0 |= (set1 = self.first_set(production.rhs[(i + 1)..-1])) - [:'ɛ']
						
						set0 |= self.follow_set(production.lhs) if set1.include?(:'ɛ')
					end
				elsif sym0 != production.lhs and sym0 == sym1 and not seen_lh_sides.include?(production.lhs)
					set0 |= self.follow_set(production.lhs, seen_lh_sides << production.lhs)
				end
			end
		end
		
		if seen_lh_sides.empty? or not set0.empty?
			# Memoize the result for later.
			@follows[sym0] |= set0
		else
			set0
		end
	else
		[]
	end
end

#get_plus(symbol) ⇒ Symbol

Builds productions used to eliminate the + EBNF operator.

Parameters:

  • symbol (Symbol)

    Symbol to expand.

Returns:

  • (Symbol)


325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
# File 'lib/rltk/cfg.rb', line 325

def get_plus(symbol)
	new_symbol = (symbol.to_s.downcase + '_plus').to_sym
	
	if not @productions_sym.has_key?(new_symbol)
		# Add the items for the following productions:
		#
		# token_plus: token | token token_plus
		
		# 1st production
		production = self.add_production(Production.new(self.next_id, new_symbol, [symbol]))
		@callback.call(production, :+, :first)
		
		# 2nd production
		production = self.add_production(Production.new(self.next_id, new_symbol, [new_symbol, symbol]))
		@callback.call(production, :+, :second)
		
		# Add the new symbol to the list of nonterminals.
		@nonterms[new_symbol] = true
	end
	
	return new_symbol
end

#get_question(symbol) ⇒ Symbol

Builds productions used to eliminate the ? EBNF operator.

Parameters:

  • symbol (Symbol)

    Symbol to expand.

Returns:

  • (Symbol)


353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
# File 'lib/rltk/cfg.rb', line 353

def get_question(symbol)
	new_symbol = (symbol.to_s.downcase + '_question').to_sym
	
	if not @productions_sym.has_key?(new_symbol)
		# Add the items for the following productions:
		#
		# nonterm_question: | nonterm
		
		# 1st (empty) production.
		production = self.add_production(Production.new(self.next_id, new_symbol, []))
		@callback.call(production, :'?', :first)
		
		# 2nd production
		production = self.add_production(Production.new(self.next_id, new_symbol, [symbol]))
		@callback.call(production, :'?', :second)
		
		# Add the new symbol to the list of nonterminals.
		@nonterms[new_symbol] = true
	end
	
	return new_symbol
end

#get_star(symbol) ⇒ Symbol

Builds productions used to eliminate the * EBNF operator.

Parameters:

  • symbol (Symbol)

    Symbol to expand.

Returns:

  • (Symbol)


381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
# File 'lib/rltk/cfg.rb', line 381

def get_star(symbol)
	new_symbol = (symbol.to_s.downcase + '_star').to_sym
	
	if not @productions_sym.has_key?(new_symbol)
		# Add the items for the following productions:
		#
		# token_star: | token token_star
		
		# 1st (empty) production
		production = self.add_production(Production.new(self.next_id, new_symbol, []))
		@callback.call(production, :*, :first)
		
		# 2nd production
		production = self.add_production(Production.new(self.next_id, new_symbol, [new_symbol, symbol]))
		@callback.call(production, :*, :second)
		
		# Add the new symbol to the list of nonterminals.
		@nonterms[new_symbol] = true
	end
	
	return new_symbol
end

#next_idInteger

Returns ID for the next production to be defined.

Returns:

  • (Integer)

    ID for the next production to be defined.



405
406
407
# File 'lib/rltk/cfg.rb', line 405

def next_id
	@production_counter += 1
end

#nonempty_list_production(symbol, list_elements, separator) ⇒ void Also known as: nonempty_list

This method returns an undefined value.

This method adds the necessary productions for non-empty lists to the grammar. These productions are named ‘symbol` and `symbol + ’_elements’‘

Parameters:

  • symbol (Symbol)

    The name of the production to add.

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list.

  • separator (Symbol)

    The list separator symbol.



418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
# File 'lib/rltk/cfg.rb', line 418

def nonempty_list_production(symbol, list_elements, separator)
	# Add the items for the following productions:
	#
	# symbol_elements: list_elements.join('|')
	#
	# symbol: symbol_elements | symbol separator symbol_elements
	
	if list_elements.is_a?(String) or list_elements.is_a?(Symbol)
		list_elements = [list_elements.to_s]
		
	elsif list_elements.is_a?(Array)
		if list_elements.empty?
			raise ArgumentError, 'Parameter list_elements must not be empty.'
		else
			list_elements.map! { |el| el.to_s }
		end
		
	else
		raise ArgumentError, 'Parameter list_elements must be a String, Symbol, or array of Strings and Symbols.'
	end
	
	el_symbol = (symbol.to_s + '_elements').to_sym
	
	# 1st Production
	production = self.production(symbol, el_symbol.to_s).first
	@callback.call(production, :nelp, :first)
	
	# 2nd Production
	production = self.production(symbol, "#{symbol} #{separator} #{el_symbol}").first
	@callback.call(production, :nelp, :second)
	
	# 3rd Productions
	list_elements.each do |el|
		production = self.production(el_symbol, el).first
		@callback.call(production, :nelp, :third)
	end
end

#nontermsArray<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Array<Symbol>)

    All terminal symbols used in the grammar’s definition.



458
459
460
# File 'lib/rltk/cfg.rb', line 458

def nonterms
	@nonterms.keys
end

#production(symbol, expression = nil, &block) ⇒ Array<Production>

Builds a new production with the left-hand side value of symbol. If expression is specified it is take as the right-hand side of production. If expression is nil then block is evaluated, and expected to make one or more calls to #clause.

Parameters:

  • symbol (Symbol)

    The right-hand side of a production.

  • expression (String) (defaults to: nil)

    The left-hand side of a production.

  • block (Proc)

    Optional block for defining production clauses.

Returns:



472
473
474
475
476
477
478
479
480
481
482
483
484
# File 'lib/rltk/cfg.rb', line 472

def production(symbol, expression = nil, &block)
	@production_buffer = Array.new
	@curr_lhs = symbol
	
	if expression
		self.clause(expression)
	else
		self.instance_exec(&block)
	end
	
	@curr_lhs = nil
	return @production_buffer.clone
end

#productions(by = :sym) ⇒ Array<Production>, Hash{Symbol => Production}

If by is :sym, returns a hash of the grammar’s productions, using the productions’ left-hand side symbol as the key. If by is :id an array of productions is returned in the order of their definition.

Parameters:

  • by (:sym, :id) (defaults to: :sym)

    The way in which productions should be returned.

Returns:



494
495
496
497
498
499
500
501
502
# File 'lib/rltk/cfg.rb', line 494

def productions(by = :sym)
	if by == :sym
		@productions_sym
	elsif by == :id
		@productions_id
	else
		nil
	end
end

#start(symbol) ⇒ Symbol

Sets the start symbol for this grammar.

Parameters:

  • symbol (Symbol)

    The new start symbol.

Returns:

  • (Symbol)


509
510
511
512
513
514
515
# File 'lib/rltk/cfg.rb', line 509

def start(symbol)
	if not CFG::is_nonterminal?(symbol)
		raise GrammarError, 'Start symbol must be a non-terminal.'
	end
	
	@start_symbol = symbol
end

#symbolsArray<Symbol>

Returns All symbols used in the grammar’s definition.

Returns:

  • (Array<Symbol>)

    All symbols used in the grammar’s definition.



518
519
520
# File 'lib/rltk/cfg.rb', line 518

def symbols
	self.terms + self.nonterms
end

#termsArray<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Array<Symbol>)

    All terminal symbols used in the grammar’s definition.



523
524
525
# File 'lib/rltk/cfg.rb', line 523

def terms
	@terms.keys
end