Class: RLTK::CFG

Inherits:
Object
  • Object
show all
Defined in:
lib/rltk/cfg.rb

Overview

The CFG class is used to represent context-free grammars. It is used by the RLTK::Parser class to represent the parser’s grammar, but can also be used to manipulate arbitrary CFGs.

Defined Under Namespace

Classes: Item, Production

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(&callback) ⇒ CFG

Instantiates a new CFG object that uses callback to inform the programmer of the generation of new productions due to EBNF operators.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded.



75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
# File 'lib/rltk/cfg.rb', line 75

def initialize(&callback)
	@curr_lhs           = nil
	@callback           = callback || Proc.new {}
	@lexer              = Lexers::EBNF.new
	@production_counter = -1
	@start_symbol       = nil
	@wrapper_symbol     = nil

	@productions_id     = Hash.new
	@productions_sym    = Hash.new { |h, k| h[k] = [] }
	@production_buffer  = Array.new

	@terms    = Set.new([:EOS])
	@nonterms = Set.new

	@firsts   = Hash.new
	@follows  = Hash.new { |h,k| h[k] = Array.new }
end

Instance Attribute Details

#curr_lhsSymbol

This is used by the #production method to wrap #clause calls.

Returns:

  • (Symbol)

    The current left-hand side symbol.



40
41
42
# File 'lib/rltk/cfg.rb', line 40

def curr_lhs
  @curr_lhs
end

#start_symbolSymbol (readonly)

Returns The grammar’s starting symbol.

Returns:

  • (Symbol)

    The grammar’s starting symbol.



34
35
36
# File 'lib/rltk/cfg.rb', line 34

def start_symbol
  @start_symbol
end

Class Method Details

.is_nonterminal?(sym) ⇒ Boolean

Tests to see if a symbol is a non-terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


62
63
64
# File 'lib/rltk/cfg.rb', line 62

def self.is_nonterminal?(sym)
	sym and (s = sym.to_s) == s.downcase
end

.is_terminal?(sym) ⇒ Boolean

Tests to see if a symbol is a terminal symbol, as used by the CFG class.

Parameters:

  • sym (Symbol)

    The symbol to test.

Returns:

  • (Boolean)


52
53
54
# File 'lib/rltk/cfg.rb', line 52

def self.is_terminal?(sym)
	sym and (s = sym.to_s) == s.upcase
end

Instance Method Details

#add_production(production) ⇒ void

This method returns an undefined value.

Adds production to the appropriate internal data structures.

Parameters:

  • production (Production)

    The production to add to the grammar.



99
100
101
102
103
# File 'lib/rltk/cfg.rb', line 99

def add_production(production)
	@productions_sym[production.lhs] << (@productions_id[production.id] = production)

	production
end

#build_list_production(name, list_elements, separator = '') ⇒ void Also known as: list

This method returns an undefined value.

Builds a production representing a (possibly empty) list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF * operator.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
# File 'lib/rltk/cfg.rb', line 132

def build_list_production(name, list_elements, separator = '')
	# Add the items for the following productions:
	#
	# name: | name_prime

	name_prime = "#{name}_prime".to_sym

	# 1st Production
	production, _ = self.production(name, '')
	@callback.call(:elp, :empty, production)

	# 2nd Production
	production, _ = self.production(name, name_prime)
	@callback.call(:elp, :nonempty, production)

	# Add remaining productions via nonempty_list helper.
	self.nonempty_list(name_prime, list_elements, separator)

	name
end

#build_nonempty_list_production(name, list_elements, separator = '') ⇒ void Also known as: nonempty_list

This method returns an undefined value.

Builds a production representing a non-empty list of tokens. These tokens may optionally be separated by a provided token. This function is used to eliminate the EBNF + operator.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String, Symbol>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
# File 'lib/rltk/cfg.rb', line 181

def build_nonempty_list_production(name, list_elements, separator = '')
	# Add the items for the following productions:
	#
	# If there is only one list element:
	#
	#   name: list_element | name separator list_element
	#
	# else
	#
	#   name: name_list_elements | name separator name_list_elements
	#
	#   name_list_elements: #{list_elements.join('|')}

	build_elements_productions = false

	list_element_string =
	if list_elements.is_a?(Array)
		if list_elements.empty?
			raise ArgumentError, 'Parameter list_elements must not be empty.'

		elsif list_elements.length == 1
			list_elements.first

		else
			build_elements_productions = true
			"#{name}_list_elements"
		end
	else
		list_elements
	end

	list_element_selected_string = list_element_string.to_s.split.map { |s| ".#{s}" }.join(' ')

	# Single Element Production
	production, _ = self.production(name, list_element_string)
	@callback.call(:nelp, :single, production)

	# Multiple Element Production
	production, selections = self.production(name, ".#{name} #{separator} #{list_element_selected_string}")
	@callback.call(:nelp, :multiple, production, selections)

	if build_elements_productions
		# List Element Productions
		list_elements.each do |element|
			production, _ = self.production(list_element_string, element)
			@callback.call(:nelp, :elements, production)
		end
	end

	name
end

#build_optional_production(name, opt_symbol) ⇒ Symbol Also known as: optional

Build a production for an optional symbol. This is used to eliminate the EBNF ? operator.

Parameters:

  • name (Symbol)

    The name for the new production

  • opt_symbol (Symbol)

    Symbol to expand

Returns:

  • (Symbol)

    The value of the name argument



258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
# File 'lib/rltk/cfg.rb', line 258

def build_optional_production(name, opt_symbol)
	if not @productions_sym.has_key?(name)
		# Add the items for the following productions:
		#
		# name: | opt_symbol

		# Empty production.
		production = self.add_production(Production.new(self.next_id, name, []))
		@callback.call(:optional, :empty, production)

		# Nonempty production
		production = self.add_production(Production.new(self.next_id, name, [opt_symbol]))
		@callback.call(:optional, :nonempty, production)

		# Add the new symbol to the list of nonterminals.
		@nonterms << name
	end

	name
end

#callback(&callback) ⇒ void

This method returns an undefined value.

Sets the EBNF callback to callback.

Parameters:

  • callback (Proc)

    A Proc object to be called when EBNF operators are expanded and list productions are added.



285
286
287
288
289
# File 'lib/rltk/cfg.rb', line 285

def callback(&callback)
	@callback = callback if callback

	nil
end

#clause(expression) ⇒ Array(Production, Array<Integer>)

This function MUST be called inside a CFG.production block. It will make a new production with the left-hand side specified by the CFG.production call’s argument. This is the function that is responsible for removing EBNF symbols from the grammar.

Parameters:

  • expression (String, Symbol)

    The right-hand side of a CFG production.

Returns:

Raises:



299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
# File 'lib/rltk/cfg.rb', line 299

def clause(expression)
	raise GrammarError, 'CFG#clause called outside of CFG#production block.' if not @curr_lhs

	lhs        = @curr_lhs.to_sym
	rhs        = Array.new
	tokens     = @lexer.lex(expression.to_s)
	selections = Array.new

	# Set this as the start symbol if there isn't one already
	# defined.
	@start_symbol ||= lhs

	# Remove EBNF tokens and replace them with new productions.
	symbol_count = 0
	tokens.each_index do |i|
		ttype0  = tokens[i].type
		tvalue0 = tokens[i].value

		if ttype0 == :TERM or ttype0 == :NONTERM

			# Add this symbol to the correct collection.
			(ttype0 == :TERM ? @terms : @nonterms) << tvalue0

			if i + 1 < tokens.length
				ttype1  = tokens[i + 1].type
				tvalue1 = tokens[i + 1].value

				rhs <<
				case ttype1
				when :QUESTION then self.get_optional_production("#{tvalue0.downcase}_optional".to_sym, tvalue0)
				when :STAR     then self.get_list_production("#{tvalue0.downcase}_list".to_sym, tvalue0)
				when :PLUS     then self.get_nonempty_list_production("#{tvalue0.downcase}_nonempty_list".to_sym, tvalue0)
				else                tvalue0
				end
			else
				rhs << tvalue0
			end

			symbol_count += 1

		elsif ttype0 == :DOT
			selections << symbol_count
		end
	end

	# Make the production.
	@production_buffer << [(production = Production.new(self.next_id, lhs, rhs)), selections]

	# Make sure the production symbol is collected.
	@nonterms << lhs

	# Add the new production to our collections.
	self.add_production(production)

	return [production, selections]
end

#first_set(sentence) ⇒ Array<Symbol>

This function calculates the first set of a series of tokens. It uses the #first_set helper function to find the first set of individual symbols.

Parameters:

  • sentence (Symbol, Array<Symbol>)

    Sentence to find the *first set* for.

Returns:

  • (Array<Symbol>)

    The *first set* for the given sentence.



363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
# File 'lib/rltk/cfg.rb', line 363

def first_set(sentence)
	if sentence.is_a?(Symbol)
		first_set_prime(sentence)

	elsif sentence.inject(true) { |m, sym| m and self.symbols.include?(sym) }
		set0 = []
		all_have_empty = true

		sentence.each do |sym|
			set0 |= (set1 = self.first_set(sym)) - [:'ɛ']

			break if not (all_have_empty = set1.include?(:'ɛ'))
		end

		if all_have_empty then set0 + [:'ɛ'] else set0 end
	end
end

#follow_set(sym0, seen_lh_sides = []) ⇒ Array<Symbol>

Returns the follow set for a given symbol. The second argument is used to avoid infinite recursion when mutually recursive rules are encountered.

Parameters:

  • sym0 (Symbol)

    The symbol to find the *follow set* for.

  • seen_lh_sides (Array<Symbol>) (defaults to: [])

    Previously seen LHS symbols.

Returns:

  • (Array<Symbol>)


443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
# File 'lib/rltk/cfg.rb', line 443

def follow_set(sym0, seen_lh_sides = [])

	# Use the memoized set if possible.
	return @follows[sym0] if @follows.has_key?(sym0)

	if @nonterms.member? sym0
		set0 = []

		# Add EOS to the start symbol's follow set.
		set0 << :EOS if sym0 == @start_symbol

		@productions_id.values.each do |production|
			production.rhs.each_with_index do |sym1, i|
				if i + 1 < production.rhs.length
					if sym0 == sym1
						set0 |= (set1 = self.first_set(production.rhs[(i + 1)..-1])) - [:'ɛ']

						set0 |= self.follow_set(production.lhs) if set1.include?(:'ɛ')
					end
				elsif sym0 != production.lhs and sym0 == sym1 and not seen_lh_sides.include?(production.lhs)
					set0 |= self.follow_set(production.lhs, seen_lh_sides << production.lhs)
				end
			end
		end

		if seen_lh_sides.empty? or not set0.empty?
			# Memoize the result for later.
			@follows[sym0] |= set0
		else
			set0
		end
	else
		[]
	end
end

#get_list_production(name, list_elements, separator = '') ⇒ void Also known as: get_list

This method returns an undefined value.

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



113
114
115
116
117
118
119
120
# File 'lib/rltk/cfg.rb', line 113

def get_list_production(name, list_elements, separator = '')
	if @nonterms.include?(name)
		name

	else
		build_list_production(name, list_elements, separator)
	end
end

#get_nonempty_list_production(name, list_elements, separator = '') ⇒ void Also known as: get_nonempty_list

This method returns an undefined value.

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list

  • separator (Symbol, String) (defaults to: '')

    The list separator symbol or symbols



162
163
164
165
166
167
168
169
# File 'lib/rltk/cfg.rb', line 162

def get_nonempty_list_production(name, list_elements, separator = '')
	if @nonterms.include?(name)
		name

	else
		build_nonempty_list_production(name, list_elements, separator)
	end
end

#get_optional_production(name, list_elements) ⇒ void Also known as: get_optional

This method returns an undefined value.

If the production already exists it will be returned. If it does not exist then it will be created and then returned.

Parameters:

  • name (Symbol)

    The name of the production to add

  • list_elements (String, Symbol, Array<String>)

    Expression(s) that may appear in the list



241
242
243
244
245
246
247
248
# File 'lib/rltk/cfg.rb', line 241

def get_optional_production(name, list_elements)
	if @nonterms.include?(name)
		name

	else
		build_optional_production(name, list_elements)
	end
end

#next_idInteger

Returns ID for the next production to be defined.

Returns:

  • (Integer)

    ID for the next production to be defined.



480
481
482
# File 'lib/rltk/cfg.rb', line 480

def next_id
	@production_counter += 1
end

#nontermsSet<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Set<Symbol>)

    All terminal symbols used in the grammar’s definition.



485
486
487
# File 'lib/rltk/cfg.rb', line 485

def nonterms
	@nonterms.clone
end

#production(symbol, expression = nil, &block) ⇒ Production+

Builds a new production with the left-hand side value of symbol. If expression is specified it is take as the right-hand side of production. If expression is nil then block is evaluated, and expected to make one or more calls to #clause.

Parameters:

  • symbol (Symbol)

    The left-hand side of a production

  • expression (String, Symbol) (defaults to: nil)

    The right-hand side of a production

  • block (Proc)

    Optional block for defining production clauses

Returns:

  • (Production, Array<Production>)

    A single production if called with an expression; an array of productions otherwise



500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
# File 'lib/rltk/cfg.rb', line 500

def production(symbol, expression = nil, &block)
	@production_buffer = Array.new

	prev_lhs  = @curr_lhs
	@curr_lhs = symbol

	ret_val =
	if expression
		self.clause(expression)
	else
		self.instance_exec(&block)

		@production_buffer.clone
	end

	@curr_lhs = prev_lhs
	return ret_val
end

#productions(by = :sym) ⇒ Array<Production>, Hash{Symbol => Production}

If by is :sym, returns a hash of the grammar’s productions, using the productions’ left-hand side symbol as the key. If by is :id an array of productions is returned in the order of their definition.

Parameters:

  • by (:sym, :id) (defaults to: :sym)

    The way in which productions should be returned.

Returns:



527
528
529
530
531
532
533
534
535
# File 'lib/rltk/cfg.rb', line 527

def productions(by = :sym)
	if by == :sym
		@productions_sym
	elsif by == :id
		@productions_id
	else
		nil
	end
end

#start(symbol) ⇒ Symbol

Sets the start symbol for this grammar.

Parameters:

  • symbol (Symbol)

    The new start symbol.

Returns:

  • (Symbol)


542
543
544
545
546
547
548
# File 'lib/rltk/cfg.rb', line 542

def start(symbol)
	if not CFG::is_nonterminal?(symbol)
		raise GrammarError, 'Start symbol must be a non-terminal.'
	end

	@start_symbol = symbol
end

#symbolsArray<Symbol>

Returns All symbols used in the grammar’s definition.

Returns:

  • (Array<Symbol>)

    All symbols used in the grammar’s definition.



551
552
553
# File 'lib/rltk/cfg.rb', line 551

def symbols
	self.terms + self.nonterms
end

#termsSet<Symbol>

Returns All terminal symbols used in the grammar’s definition.

Returns:

  • (Set<Symbol>)

    All terminal symbols used in the grammar’s definition.



556
557
558
# File 'lib/rltk/cfg.rb', line 556

def terms
	@terms.clone
end