Class: Sneaql::Core::Tokenizer

Inherits:

Object

Defined in: lib/sneaql_lib/tokenizer.rb

Overview

Processes a command string into an array of tokens. The handling is deliberately basic and geared toward supporting string literals. A string literal is enclosed in single quotes, with backslash as the escape character; the only escapable characters are the single quote and the backslash. The tokenizer does not judge whether a token is valid in any way; it only aims to split the input reliably. String literal tokens keep their escape characters and their enclosing single quotes.

Instance Method Summary

#classify(input_char) ⇒ Symbol

Classifies a single character during lexical parsing.

#classify_all(string) ⇒ Array<Symbol>

Returns an array with a classification for each character in the input string.

#tokenize(string) ⇒ Array<String>

Returns an array of tokens.

Instance Method Details

#classify(input_char) ⇒ `Symbol`

Classifies a single character during lexical parsing.

Parameters:

input_char (String) —

single character to classify

Returns:

(Symbol) —

classification for character

# File 'lib/sneaql_lib/tokenizer.rb', line 97

def classify(input_char)
  # whitespace delimits tokens not in string literal
  return :whitespace if input_char.match(/\s/)

  # escape character can escape itself
  return :escape if input_char.match(/\\/)

  # any word character
  # also includes - for use in negative numbers
  return :word if input_char.match(/\w|\-/)

  # colon is used to represent variables
  return :colon if input_char.match(/\:/)

  # indicates start of string literal
  return :singlequote if input_char.match(/\'/)

  # deprecated, old variable reference syntax
  return :openbrace if input_char.match(/\{/)
  return :closebrace if input_char.match(/\}/)

  # comparison operator chars
  return :operator if input_char.match(/\=|\>|\<|\!/)

  # any non-word characters
  return :nonword if input_char.match(/\W/)
end
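
The order of the checks above determines the result: `-` and `_` are reported as `:word` because the word test runs before the catch-all `\W` test, and a backslash is reported as `:escape` for the same reason. The following is a minimal standalone sketch of that ordering. `classify_char` is a hypothetical helper written for this illustration; it mirrors the rules listed above but is not the library method itself.

```ruby
# Hypothetical standalone copy of the classification rules (illustration only).
# Regexp#=== lets `case` test each pattern in order; the first match wins.
def classify_char(input_char)
  case input_char
  when /\s/     then :whitespace
  when /\\/     then :escape
  when /\w|-/   then :word
  when /:/      then :colon
  when /'/      then :singlequote
  when /\{/     then :openbrace
  when /\}/     then :closebrace
  when /[=><!]/ then :operator
  when /\W/     then :nonword
  end
end

samples = ['a', '-', ' ', '\\', ':', "'", '{', '}', '=', '!', ',']
classes = samples.map { |ch| classify_char(ch) }
p classes
# prints [:word, :word, :whitespace, :escape, :colon, :singlequote,
#         :openbrace, :closebrace, :operator, :operator, :nonword]
```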

#classify_all(string) ⇒ `Array<Symbol>`

Returns an array with a classification for each character in the input string.

Parameters:

string (String) —

input string to classify

Returns:

(Array<Symbol>) —

array of classification symbols

# File 'lib/sneaql_lib/tokenizer.rb', line 129

def classify_all(string)
  classified = []
  string.split('').each do |x|
    classified << classify(x)
  end
  classified
end
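
The result has exactly one symbol per input character, in the same order, so index `i` in the result lines up with `string[i]`. A small sketch of that property, and of an equivalent formulation with `each_char.map`, follows. `toy_classify` is a hypothetical, reduced stand-in for `#classify` (three classes only) so the example stays self-contained.

```ruby
# Hypothetical stand-in for #classify, reduced to three classes for brevity.
def toy_classify(ch)
  return :whitespace if ch.match?(/\s/)
  return :word if ch.match?(/\w|-/)
  :nonword
end

# Same shape as the listing above: accumulate one symbol per character.
def classify_all_loop(string)
  classified = []
  string.split('').each { |x| classified << toy_classify(x) }
  classified
end

# Equivalent formulation that avoids the intermediate array from split('').
def classify_all_map(string)
  string.each_char.map { |ch| toy_classify(ch) }
end

input = 'ab -1;'
result = classify_all_loop(input)
p result
# prints [:word, :word, :whitespace, :word, :word, :nonword]
p result.length == input.length       # prints true
p classify_all_map(input) == result   # prints true
```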

#tokenize(string) ⇒ `Array<String>`

Returns an array of tokens.

Parameters:

string (String) —

command string to tokenize

Returns:

(Array<String>) —

tokens in left to right order

# File 'lib/sneaql_lib/tokenizer.rb', line 140

def tokenize(string)
  # perform lexical analysis
  classified = classify_all(string)

  # set initial state
  state = :outside_word

  # array to collect tokens
  tokens = []

  # will be rebuilt for each token
  current_token = ''

  # iterate through each character
  classified.each_with_index do |c, i|
    # perform the actions appropriate to character
    # classification and current state
    Sneaql::Core.tokenizer_state_map[c][state].each do |action|
      case
      when action == :no_action then
        nil
      when action == :new_token then
        # rotate the current token if it is not empty string
        tokens << current_token unless current_token == ''
        current_token = ''
      when action == :concat then
        # concatenate current character to current token
        current_token += string[i]
      when action == :error then
        raise 'tokenization error'
      when Sneaql::Core.valid_tokenizer_states.include?(action)
        # if the action is a state name, set the state
        state = action
      end
    end
  end
  # close current token if not empty
  tokens << current_token unless current_token == ''

  # return array of tokens
  tokens
end
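
The loop above is table-driven: each (classification, state) pair maps to a list of actions, and a state name appearing in that list switches the state for the next character. The contents of `Sneaql::Core.tokenizer_state_map` are not shown on this page, so the sketch below uses an invented, much smaller table to show the mechanism only. `STATES`, `STATE_MAP`, `sketch_classify`, and `sketch_tokenize` are all hypothetical names, and the real map may differ in its states, classes, and actions.

```ruby
# Hypothetical states and a reduced classifier (assumptions, not the library's).
STATES = %i[outside_word in_word in_string_literal escaping].freeze

def sketch_classify(ch)
  return :whitespace if ch.match?(/\s/)
  return :escape if ch == '\\'
  return :singlequote if ch == "'"
  :word # everything else is folded into :word in this sketch
end

# Hypothetical table: STATE_MAP[classification][state] => ordered actions.
STATE_MAP = {
  whitespace: {
    outside_word: [:no_action],
    in_word: %i[new_token outside_word],
    in_string_literal: [:concat],
    escaping: [:error] # only quotes and backslashes may be escaped
  },
  word: {
    outside_word: %i[concat in_word],
    in_word: [:concat],
    in_string_literal: [:concat],
    escaping: [:error]
  },
  singlequote: {
    outside_word: %i[new_token concat in_string_literal],
    in_word: %i[new_token concat in_string_literal],
    in_string_literal: %i[concat new_token outside_word],
    escaping: %i[concat in_string_literal]
  },
  escape: {
    outside_word: [:error],
    in_word: [:error],
    in_string_literal: %i[concat escaping],
    escaping: %i[concat in_string_literal]
  }
}.freeze

def sketch_tokenize(string)
  state = :outside_word
  tokens = []
  current_token = ''
  string.each_char do |ch|
    # The action list is looked up with the state held before this character.
    STATE_MAP[sketch_classify(ch)][state].each do |action|
      case action
      when :no_action then nil
      when :new_token
        tokens << current_token unless current_token.empty?
        current_token = ''
      when :concat then current_token += ch
      when :error then raise 'tokenization error'
      else state = action if STATES.include?(action)
      end
    end
  end
  tokens << current_token unless current_token.empty?
  tokens
end

p sketch_tokenize("insert 'it\\'s ok' now")
# prints ["insert", "'it\\'s ok'", "now"]
# (inspect doubles the single backslash; the literal keeps its quotes and escape)
```

In this sketch, a backslash outside a string literal raises an error, which is one possible reading of "the only escapable characters are single quotes and backslashes"; the library's own table decides the actual behavior.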
