Class: RLTK::CFG

Inherits:
Object
  • Object
show all
Defined in:
lib/rltk/cfg.rb

Overview

The CFG class is used to represent context-free grammars. It is used by the RLTK::Parser class to represent the parser’s grammar, but can also be used to manipulate arbitrary CFGs.

Defined Under Namespace

Classes: Item, Production

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(&callback) ⇒ CFG

Instantiates a new CFG object that uses callback to inform the programmer of the generation of new productions due to EBNF operators.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded.



75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# File 'lib/rltk/cfg.rb', line 75

def initialize(&callback)
	@curr_lhs           = nil
	@callback           = callback || Proc.new {}
	@lexer              = Lexers::EBNF.new
	@production_counter = -1
	@start_symbol       = nil
	@wrapper_symbol     = nil
	
	@productions_id     = Hash.new
	@productions_sym    = Hash.new { |h, k| h[k] = [] }
	@production_buffer  = Array.new
	
	@terms    = Set.new([:EOS])
	@nonterms = Set.new
	
	@firsts   = Hash.new
	@follows  = Hash.new { |h,k| h[k] = Array.new }
end

Instance Attribute Details

#curr_lhsSymbol

This is used by the #production method to wrap #clause calls.

Returns:

  • (Symbol)

    The current left-hand side symbol.



40
41
42
# File 'lib/rltk/cfg.rb', line 40

def curr_lhs
  @curr_lhs
end

#start_symbolSymbol (readonly)

Returns The grammar’s starting symbol.

Returns:

  • (Symbol)

    The grammar’s starting symbol.



34
35
36
# File 'lib/rltk/cfg.rb', line 34

def start_symbol
  @start_symbol
end

Class Method Details

.is_nonterminal?(sym) ⇒ Boolean

Tests to see if a symbol is a non-terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


62
63
64
# File 'lib/rltk/cfg.rb', line 62

def self.is_nonterminal?(sym)
	sym and (s = sym.to_s) == s.downcase
end

.is_terminal?(sym) ⇒ Boolean

Tests to see if a symbol is a terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


52
53
54
# File 'lib/rltk/cfg.rb', line 52

def self.is_terminal?(sym)
	sym and (s = sym.to_s) == s.upcase
end

Instance Method Details

#add_production(production) ⇒ void

This method returns an undefined value.

Adds production to the appropriate internal data structures.

Parameters:

  • production (Production)

    The production to add to the grammar.



99
100
101
102
103
# File 'lib/rltk/cfg.rb', line 99

def add_production(production)
	@productions_sym[production.lhs] << (@productions_id[production.id] = production)
	
	production
end

#build_list_production(name, list_elements, separator = '') ⇒ void Also known as: list

This method returns an undefined value.

Builds a production representing a (possibly empty) list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF * operator.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
# File 'lib/rltk/cfg.rb', line 114

def build_list_production(name, list_elements, separator = '')
	# Add the items for the following productions:
	#
	# name: | name_prime
	
	name_prime = "#{name}_prime".to_sym
	
	# 1st Production
	production, _ = self.production(name, '')
	@callback.call(:elp, :empty, production)
	
	# 2nd Production
	production, _ = self.production(name, name_prime)
	@callback.call(:elp, :nonempty, production)
	
	# Add remaining productions via nonempty_list helper.
	self.nonempty_list(name_prime, list_elements, separator)
	
	name
end

#build_nonempty_list_production(name, list_elements, separator = '') ⇒ void Also known as: nonempty_list

This method returns an undefined value.

Builds a production representing a non-empty list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF + operator.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String, Symbol>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
# File 'lib/rltk/cfg.rb', line 145

def build_nonempty_list_production(name, list_elements, separator = '')
	# Add the items for the following productions:
	#
	# If there is only one list element:
	#
	#   name: list_element | name separator list_element
	#
	# else
	#
	#   name: name_list_elements | name separator name_list_elements
	#
	#   name_list_elements: #{list_elements.join('|')}
	
	build_elements_productions = false
	
	list_element_string =
	if list_elements.is_a?(Array)
		if list_elements.empty?
			raise ArgumentError, 'Parameter list_elements must not be empty.'
			
		elsif list_elements.length == 1
			list_elements.first
			
		else
			build_elements_productions = true
			"#{name}_list_elements"
		end
	else
		list_elements
	end
	
	list_element_selected_string = list_element_string.to_s.split.map { |s| ".#{s}" }.join(' ')
	
	# Single Element Production
	production, _ = self.production(name, list_element_string)
	@callback.call(:nelp, :single, production)
	
	# Multiple Element Production
	production, selections = self.production(name, ".#{name} #{separator} #{list_element_selected_string}")
	@callback.call(:nelp, :multiple, production, selections)
	
	if build_elements_productions
		# List Element Productions
		list_elements.each do |element|
			production, _ = self.production(list_element_string, element)
			@callback.call(:nelp, :elements, production)
		end
	end
	
	name
end

#build_optional_production(name, opt_symbol) ⇒ Symbol

Build a production for an optional symbol. This is used to eliminate the EBNF ? operator.

Parameters:

  • name (Symbol)

    The name for the new production

  • opt_symbol (Symbol)

    Symbol to expand

Returns:

  • (Symbol)

    The value of the name argument



205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
# File 'lib/rltk/cfg.rb', line 205

def build_optional_production(name, opt_symbol)
	if not @productions_sym.has_key?(name)
		# Add the items for the following productions:
		#
		# name: | opt_symbol
		
		# Empty production.
		production = self.add_production(Production.new(self.next_id, name, []))
		@callback.call(:optional, :empty, production)
		
		# Nonempty production
		production = self.add_production(Production.new(self.next_id, name, [opt_symbol]))
		@callback.call(:optional, :nonempty, production)
		
		# Add the new symbol to the list of nonterminals.
		@nonterms << name
	end
	
	name
end

#callback(&callback) ⇒ void

This method returns an undefined value.

Sets the EBNF callback to callback.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded and list productions are added.



231
232
233
234
235
# File 'lib/rltk/cfg.rb', line 231

def callback(&callback)
	@callback = callback if callback
	
	nil
end

#clause(expression) ⇒ Array(Production, Array<Integer>)

This function MUST be called inside a CFG.production block. It will make a new production with the left-hand side specified by the CFG.production call’s argument. This is the function that is responsible for removing EBNF symbols from the grammar.

Parameters:

  • expression (String, Symbol)

    The right-hand side of a CFG production.

Returns:

Raises:



245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
# File 'lib/rltk/cfg.rb', line 245

def clause(expression)
	raise GrammarError, 'CFG#clause called outside of CFG#production block.' if not @curr_lhs
	
	lhs        = @curr_lhs.to_sym
	rhs        = Array.new
	tokens     = @lexer.lex(expression.to_s)
	selections = Array.new
	
	# Set this as the start symbol if there isn't one already
	# defined.
	@start_symbol ||= lhs
	
	# Remove EBNF tokens and replace them with new productions.
	symbol_count = 0
	tokens.each_index do |i|
		ttype0  = tokens[i].type
		tvalue0 = tokens[i].value
		
		if ttype0 == :TERM or ttype0 == :NONTERM
			
			# Add this symbol to the correct collection.
			(ttype0 == :TERM ? @terms : @nonterms) << tvalue0
			
			if i + 1 < tokens.length
				ttype1  = tokens[i + 1].type
				tvalue1 = tokens[i + 1].value
				
				rhs <<
				case ttype1
				when :QUESTION then self.build_optional_production("#{tvalue0.downcase}_optional".to_sym, tvalue0)
				when :STAR     then self.build_list_production("#{tvalue0.downcase}_list".to_sym, tvalue0)
				when :PLUS     then self.build_nonempty_list_production("#{tvalue0.downcase}_nonempty_list".to_sym, tvalue0)
				else                tvalue0
				end
			else
				rhs << tvalue0
			end
			
			symbol_count += 1
			
		elsif ttype0 == :DOT
			selections << symbol_count
		end
	end
	
	# Make the production.
	@production_buffer << [(production = Production.new(self.next_id, lhs, rhs)), selections]
	
	# Make sure the production symbol is collected.
	@nonterms << lhs
	
	# Add the new production to our collections.
	self.add_production(production)
	
	return [production, selections]
end

#first_set(sentence) ⇒ Array<Symbol>

This function calculates the first set of a series of tokens. It uses the #first_set helper function to find the first set of individual symbols.

Parameters:

  • sentence (Symbol, Array<Symbol>)

    Sentence to find the *first set* for.

Returns:

  • (Array<Symbol>)

    The *first set* for the given sentence.



309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
# File 'lib/rltk/cfg.rb', line 309

def first_set(sentence)
	if sentence.is_a?(Symbol)
		first_set_prime(sentence)
		
	elsif sentence.inject(true) { |m, sym| m and self.symbols.include?(sym) }
		set0 = []
		all_have_empty = true
		
		sentence.each do |sym|
			set0 |= (set1 = self.first_set(sym)) - [:'ɛ']
			
			break if not (all_have_empty = set1.include?(:'ɛ'))
		end
		
		if all_have_empty then set0 + [:'ɛ'] else set0 end
	end
end

#follow_set(sym0, seen_lh_sides = []) ⇒ Array<Symbol>

Returns the follow set for a given symbol. The second argument is used to avoid infinite recursion when mutually recursive rules are encountered.

Parameters:

  • sym0 (Symbol)

    The symbol to find the *follow set* for.

  • seen_lh_sides (Array<Symbol>) (defaults to: [])

    Previously seen LHS symbols.

Returns:

  • (Array<Symbol>)


389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
# File 'lib/rltk/cfg.rb', line 389

def follow_set(sym0, seen_lh_sides = [])
	
	# Use the memoized set if possible.
	return @follows[sym0] if @follows.has_key?(sym0)
	
	if @nonterms.member? sym0
		set0 = []
		
		# Add EOS to the start symbol's follow set.
		set0 << :EOS if sym0 == @start_symbol
		
		@productions_id.values.each do |production|
			production.rhs.each_with_index do |sym1, i|
				if i + 1 < production.rhs.length
					if sym0 == sym1
						set0 |= (set1 = self.first_set(production.rhs[(i + 1)..-1])) - [:'ɛ']
						
						set0 |= self.follow_set(production.lhs) if set1.include?(:'ɛ')
					end
				elsif sym0 != production.lhs and sym0 == sym1 and not seen_lh_sides.include?(production.lhs)
					set0 |= self.follow_set(production.lhs, seen_lh_sides << production.lhs)
				end
			end
		end
		
		if seen_lh_sides.empty? or not set0.empty?
			# Memoize the result for later.
			@follows[sym0] |= set0
		else
			set0
		end
	else
		[]
	end
end

#next_idInteger

Returns ID for the next production to be defined.

Returns:

  • (Integer)

    ID for the next production to be defined.



426
427
428
# File 'lib/rltk/cfg.rb', line 426

def next_id
	@production_counter += 1
end

#nontermsSet<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Set<Symbol>)

    All terminal symbols used in the grammar’s definition.



431
432
433
# File 'lib/rltk/cfg.rb', line 431

def nonterms
	@nonterms.clone
end

#production(symbol, expression = nil, &block) ⇒ Production+

Builds a new production with the left-hand side value of symbol. If expression is specified it is take as the right-hand side of production. If expression is nil then block is evaluated, and expected to make one or more calls to #clause.

Parameters:

  • symbol (Symbol)

    The left-hand side of a production

  • expression (String, Symbol) (defaults to: nil)

    The right-hand side of a production

  • block (Proc)

    Optional block for defining production clauses

Returns:

  • (Production, Array<Production>)

    A single production if called with an expression; an array of productions otherwise



446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
# File 'lib/rltk/cfg.rb', line 446

def production(symbol, expression = nil, &block)
	@production_buffer = Array.new
	
	prev_lhs  = @curr_lhs
	@curr_lhs = symbol
	
	ret_val =
	if expression
		self.clause(expression)
	else
		self.instance_exec(&block)
		
		@production_buffer.clone
	end
	
	@curr_lhs = prev_lhs
	return ret_val
end

#productions(by = :sym) ⇒ Array<Production>, Hash{Symbol => Production}

If by is :sym, returns a hash of the grammar’s productions, using the productions’ left-hand side symbol as the key. If by is :id an array of productions is returned in the order of their definition.

Parameters:

  • by (:sym, :id) (defaults to: :sym)

    The way in which productions should be returned.

Returns:



473
474
475
476
477
478
479
480
481
# File 'lib/rltk/cfg.rb', line 473

def productions(by = :sym)
	if by == :sym
		@productions_sym
	elsif by == :id
		@productions_id
	else
		nil
	end
end

#start(symbol) ⇒ Symbol

Sets the start symbol for this grammar.

Parameters:

  • symbol (Symbol)

    The new start symbol.

Returns:

  • (Symbol)


488
489
490
491
492
493
494
# File 'lib/rltk/cfg.rb', line 488

def start(symbol)
	if not CFG::is_nonterminal?(symbol)
		raise GrammarError, 'Start symbol must be a non-terminal.'
	end
	
	@start_symbol = symbol
end

#symbolsArray<Symbol>

Returns All symbols used in the grammar’s definition.

Returns:

  • (Array<Symbol>)

    All symbols used in the grammar’s definition.



497
498
499
# File 'lib/rltk/cfg.rb', line 497

def symbols
	self.terms + self.nonterms
end

#termsSet<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Set<Symbol>)

    All terminal symbols used in the grammar’s definition.



502
503
504
# File 'lib/rltk/cfg.rb', line 502

def terms
	@terms.clone
end