Class: Sneaql::Core::Tokenizer
- Inherits: Object (Object → Sneaql::Core::Tokenizer)
- Defined in: lib/sneaql_lib/tokenizer.rb
Overview
Processes a command string into an array of tokens. The handling is deliberately basic and geared toward supporting string literals. A string literal is enclosed in single quotes, with backslash as the escape character; the only escapable characters are single quotes and backslashes. The tokenizer does not check whether a token is valid in any way; it only seeks to split the input reliably. String literal tokens keep their escape characters and their enclosing single quotes.
Instance Method Summary
- #classify(input_char) ⇒ Symbol
  Classifies a single character during lexical parsing.
- #classify_all(string) ⇒ Array<Symbol>
  Returns an array with a classification for each character in the input string.
- #tokenize(string) ⇒ Array<String>
  Returns an array of tokens.
Instance Method Details
#classify(input_char) ⇒ Symbol
Classifies a single character during lexical parsing.

    # File 'lib/sneaql_lib/tokenizer.rb', line 97
    def classify(input_char)
      # whitespace delimits tokens not in string literal
      return :whitespace if input_char.match(/\s/)

      # escape character can escape itself
      return :escape if input_char.match(/\\/)

      # any word character
      # also includes - for use in negative numbers
      return :word if input_char.match(/\w|\-/)

      # colon is used to represent variables
      return :colon if input_char.match(/\:/)

      # indicates start of string literal
      return :singlequote if input_char.match(/\'/)

      # deprecated, old variable reference syntax
      return :openbrace if input_char.match(/\{/)
      return :closebrace if input_char.match(/\}/)

      # comparison operator chars
      return :operator if input_char.match(/\=|\>|\<|\=|\!/)

      # any non-word characters
      return :nonword if input_char.match(/\W/)
    end
#classify_all(string) ⇒ Array<Symbol>
Returns an array with a classification for each character in the input string.

    # File 'lib/sneaql_lib/tokenizer.rb', line 129
    def classify_all(string)
      classified = []
      string.split('').each do |x|
        classified << classify(x)
      end
      classified
    end
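The two classification methods above check their patterns in a fixed order, so a character that could match several patterns gets the first matching label. The standalone sketch below re-creates that order as a rule table to make the precedence visible. The names `CLASSIFY_RULES`, `classify_sketch`, and `classify_all_sketch` are hypothetical and not part of Sneaql; this is an illustration under those assumptions, not the library's code.

```ruby
# Hypothetical rule table mirroring the order of checks in #classify.
# Earlier entries win: '-' and '_' are labelled :word before the
# catch-all :nonword rule is ever consulted.
CLASSIFY_RULES = [
  [:whitespace,  /\s/],
  [:escape,      /\\/],
  [:word,        /\w|-/],
  [:colon,       /:/],
  [:singlequote, /'/],
  [:openbrace,   /\{/],
  [:closebrace,  /\}/],
  [:operator,    /[=><!]/],
  [:nonword,     /\W/]
].freeze

# Returns the label of the first rule whose pattern matches, or nil
# when nothing matches (for example, an empty string).
def classify_sketch(char)
  rule = CLASSIFY_RULES.find { |_label, pattern| char.match?(pattern) }
  rule && rule.first
end

# Classifies every character of the input, one label per character.
def classify_all_sketch(string)
  string.each_char.map { |char| classify_sketch(char) }
end
```

Under these assumptions, `classify_all_sketch('x >= -1')` yields `[:word, :whitespace, :operator, :operator, :whitespace, :word, :word]`: the minus sign is treated as a word character, while `>` and `=` are each labelled `:operator`.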
#tokenize(string) ⇒ Array<String>
Returns an array of tokens.

    # File 'lib/sneaql_lib/tokenizer.rb', line 140
    def tokenize(string)
      # perform lexical analysis
      classified = classify_all(string)

      # set initial state
      state = :outside_word

      # array to collect tokens
      tokens = []

      # will be rebuilt for each token
      current_token = ''

      # iterate through each character
      classified.each_with_index do |c, i|
        # perform the actions appropriate to character
        # classification and current state
        Sneaql::Core.tokenizer_state_map[c][state].each do |action|
          case
          when action == :no_action then
            nil
          when action == :new_token then
            # rotate the current token if it is not empty string
            tokens << current_token unless current_token == ''
            current_token = ''
          when action == :concat then
            # concatenate current character to current token
            current_token += string[i]
          when action == :error then
            raise 'tokenization error'
          when Sneaql::Core.valid_tokenizer_states.include?(action)
            # if the action is a state name, set the state
            state = action
          end
        end
      end

      # close current token if not empty
      tokens << current_token unless current_token == ''

      # return array of tokens
      tokens
    end
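The contents of `Sneaql::Core.tokenizer_state_map` are not shown on this page, so the sketch below uses a small, invented state map to illustrate how a lookup of the form `map[classification][state]` can drive the loop in `#tokenize`. The map, the state names other than `:outside_word`, the reduced set of character classes, and the names `STATE_MAP_SKETCH`, `classify_for_sketch`, and `tokenize_sketch` are all assumptions made for illustration; the real map may differ.

```ruby
# Hypothetical, simplified stand-in for Sneaql::Core.tokenizer_state_map.
# Layout: STATE_MAP_SKETCH[classification][state] => list of actions.
# An action is :new_token, :concat, :error, or the name of the next state.
STATE_MAP_SKETCH = {
  whitespace: {
    outside_word: [],
    in_word:      %i[new_token outside_word],
    in_literal:   %i[concat],
    in_escape:    %i[error]
  },
  word: {
    outside_word: %i[concat in_word],
    in_word:      %i[concat],
    in_literal:   %i[concat],
    in_escape:    %i[error]
  },
  singlequote: {
    outside_word: %i[concat in_literal],
    in_word:      %i[new_token concat in_literal],
    in_literal:   %i[concat new_token outside_word],
    in_escape:    %i[concat in_literal]
  },
  escape: {
    outside_word: %i[error],
    in_word:      %i[error],
    in_literal:   %i[concat in_escape],
    in_escape:    %i[concat in_literal]
  }
}.freeze

# Reduced classifier for this sketch: anything that is not whitespace,
# a backslash, or a single quote is treated as a word character.
def classify_for_sketch(char)
  case char
  when /\s/ then :whitespace
  when '\\' then :escape
  when "'"  then :singlequote
  else :word
  end
end

# Walks the input one character at a time and applies the actions
# listed for the current (classification, state) pair.
def tokenize_sketch(string)
  state = :outside_word
  tokens = []
  current = ''
  string.each_char do |char|
    STATE_MAP_SKETCH[classify_for_sketch(char)][state].each do |action|
      case action
      when :new_token
        tokens << current unless current.empty?
        current = ''
      when :concat then current += char
      when :error  then raise ArgumentError, "tokenization error at #{char.inspect}"
      else state = action # any other symbol names the next state
      end
    end
  end
  tokens << current unless current.empty?
  tokens
end
```

With this invented map, `tokenize_sketch("set name 'it\\'s ok'")` returns `["set", "name", "'it\\'s ok'"]`: the string literal stays a single token, and both its enclosing quotes and its backslash escape are kept, as the overview describes.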