ShuntingYard

TL;DR Usage

parser = ShuntingYard::Parser.new

parser.add_pattern :space, /\s+/, -> (_) { nil }
parser.add_pattern :argument_separator, /\,/
parser.add_pattern :operator, /[\+\-\*\/\^]/
parser.add_pattern :parenthesis, /[\(\)]/
parser.add_pattern :function, /(?:min|max)/
parser.add_pattern :operand, /d+/, -> (lexeme) { Integer(lexeme) }

parser.add_operator "+", 0, :left, -> (left, right) { left + right }
parser.add_operator "-", 0, :left, -> (left, right) { left - right }
parser.add_operator "*", 1, :left, -> (left, right) { left * right }
parser.add_operator "/", 1, :left, -> (left, right) { left / right }
parser.add_operator "^", 2, :right, -> (left, right) { left ** right }

parser.add_function "min", -> (left, right) { [left, right].min }
parser.add_function "max", -> (left, right) { [left, right].max }

input = "min(max(3, 4 / 2) * 2 ^ 3, 25)"

parser.to_rpn(input).to_s #=> 3 4 2 / max 2 3 ^ * 25 min
parser.evaluate(input) #=> 24

Lexer

ShuntingYard::Lexer is responsible for splitting source string into tokens. To recognize each possible token it needs to know corresponding patterns represented as regular expressions.

If substring is matched to multiple patterns, the first match will be used.

After matching a token, its lexeme is evaluated with provided function. If function is not provided - it returns the lexeme itself.

Tokens values evaluated to nil will not be included into output sequence.

lexer = ShuntingYard::Lexer.new

lexer.add_pattern :operator, /\+/
lexer.add_pattern :space, /\s+/, -> (_) { nil }
lexer.add_pattern :operand, /\d+/, -> (lexeme) { Integer(lexeme) }

puts lexer.tokenize("3 + 5").inspect
[
  #<struct ShuntingYard::Token name=:operand, lexeme="3", value=3>,
  #<struct ShuntingYard::Token name=:operator, lexeme="+", value="+">,
  #<struct ShuntingYard::Token name=:operand, lexeme="5", value=5>
]

First argument in #add_pattern is a token name. It can have any value since Lexer doesn't make assumptions about names.

Interpterer

Once we have token list, it needs to be converted to Revese Polish Notation before evalutation and here Shunting Yard algorithm comes in.

...

interpreter = ShuntingYard::Interpreter.new
interpreter.add_operator "+", 0, :left, -> (left, right) { left + right }

tokens = lexer.tokenize("3 + 5")
puts interpreter.to_rpn(tokens).inspect
[
  #<struct ShuntingYard::Operand value=3>,
  #<struct ShuntingYard::Operand value=5>,
  #<struct ShuntingYard::Operator value="+", precedence=0, associativity=:left, evaluator=#<Proc:...>>
]

Interpreter defines strict names list that are accepted in tokens:

  • :operand - arbitrary value, passed as argument operators and functions
  • :parenthesis - currently interpreter accepts only "(" and ")" parentheses
  • :operator - token value must match to one of registered in interpreter operators
  • :function - token value must match to one of registered in interpreter functions
  • :argument_separator - pattern that defines argument separation in functions (usually comma)

All other token types will not be recognized and interpreter throws ShuntingYard::UnknownTokenError.

Registering functions

#add_function(name, evaluator)

  • name - function name that must match to corresponding token value
  • evaluator - a function that accepts fixed number of arguments and returns single value

Registering operations

#add_operator(name, precedence, associativity, evaluator)

  • name - function name that must match to corresponding token value
  • precedence - operator precedence, can be any integer value
  • associativity - operator associativity, can be either :left or :right
  • evaluator - a function that accepts fixed number of arguments and returns single value

Evaluators

Evaluators for operators and functions can be defined as proc or lambda with at least one argument. Default arguments are not allowed because interpreter uses #arity method to get number of operands required by function / operator.

# Expected format
interpreter.add_operator "%", 0, :left, -> (left, right) { left % right }
interpreter.add_operator "~", 0, :left, proc { |left, right| left % right }

# Will not work
interpreter.add_operator "%", 0, :left, -> (left, right = 5) { left % right }
interpreter.add_function "max", -> (*args) { args.max }

Parser

ShuntingYard::Parser is a proxy class for Lexer and Interpreter.

When you need to recognize and evaluate an expression in one go, Parser object is a single entry point for all actions.

parser = ShuntingYard::Parser.new

# Add patterns to parser's lexer
parser.add_pattern :space, /\s+/, -> (_) { nil }
parser.add_pattern :operator, /[\+\-]/
parser.add_pattern :operand, /d+/, -> (lexeme) { Integer(lexeme) }

# Add operators to parser's interpreter
parser.add_operator "+", 0, :left, -> (left, right) { left + right }
parser.add_operator "-", 0, :left, -> (left, right) { left - right }

input = "5 + 3 - 2"

# Tokenize, convert to RPN and evaluate the expression
parser.evaluate(input) #=> 6