Class: Tokn::DFA
Overview
A DFA for tokenizing; includes a pointer to its start state and a list of token names
Constant Summary
Constants included from ToknInternal
ToknInternal::CODEMAX, ToknInternal::CODEMIN, ToknInternal::EPSILON, ToknInternal::UNKNOWN_TOKEN
Instance Attribute Summary
-
#startState ⇒ Object
readonly
Returns the value of attribute startState.
-
#tokenNames ⇒ Object
readonly
Returns the value of attribute tokenNames.
Class Method Summary
-
.from_file(path) ⇒ Object
Compile a Tokenizer DFA from a text file (that contains a JSON string).
-
.from_json(jsonStr) ⇒ Object
Compile a Tokenizer DFA from a JSON string.
-
.from_script(script, persistPath = nil) ⇒ Object
Compile a Tokenizer DFA from a token definition script.
-
.from_script_file(scriptPath, persistPath = nil) ⇒ Object
Similar to from_script, but reads the script into memory from the file at scriptPath.
Instance Method Summary
-
#initialize(tokenNameList, startState) ⇒ DFA
constructor
Construct a DFA, given a list of token names and a starting state.
-
#serialize ⇒ Object
Serialize this DFA to a JSON string.
-
#tokenId(tokenName) ⇒ Object
Get id of token given its name.
-
#tokenName(tokenId) ⇒ Object
Determine the name of a token, given its id.
Methods included from ToknInternal
edgeLabelToTokenId, tokenIdToEdgeLabel
Constructor Details
#initialize(tokenNameList, startState) ⇒ DFA
Construct a DFA, given a list of token names and a starting state.
# File 'lib/tokn/dfa.rb', line 101

def initialize(tokenNameList, startState)
  if (startState.id != 0)
    raise ArgumentError, "Start state id must be zero"
  end

  @tokenNames = tokenNameList
  @startState = startState
  @tokenIdMap = {}
  @tokenNames.each_with_index do |name, i|
    @tokenIdMap[name] = i
  end
end
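The constructor's name-to-id mapping can be sketched standalone: each token's id is simply its index in the name list. The token names below are hypothetical, not taken from any real script:

```ruby
# Build a token-name -> id map the same way the constructor does,
# using each name's index in the list as the token's id.
token_names = ["WHITESPACE", "IDENT", "NUMBER"]  # hypothetical names

token_id_map = {}
token_names.each_with_index do |name, i|
  token_id_map[name] = i
end

puts token_id_map["NUMBER"]  # => 2
```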
Instance Attribute Details
#startState ⇒ Object (readonly)
Returns the value of attribute startState.
# File 'lib/tokn/dfa.rb', line 97

def startState
  @startState
end
#tokenNames ⇒ Object (readonly)
Returns the value of attribute tokenNames.
# File 'lib/tokn/dfa.rb', line 97

def tokenNames
  @tokenNames
end
Class Method Details
.from_file(path) ⇒ Object
Compile a Tokenizer DFA from a text file (that contains a JSON string)
# File 'lib/tokn/dfa.rb', line 48

def self.from_file(path)
  from_json(read_text_file(path))
end
.from_json(jsonStr) ⇒ Object
Compile a Tokenizer DFA from a JSON string
# File 'lib/tokn/dfa.rb', line 54

def self.from_json(jsonStr)
  db = false
  !db|| pr("\n\nextractDFA %s...\n",jsonStr)

  h = JSON.parse(jsonStr)

  version = h["version"]
  if !version || version.floor != VERSION.floor
    raise ArgumentError, "Bad or missing version number: "+version.to_s+", expected "+VERSION.to_s
  end

  tNames = h["tokens"]
  stateInfo = h["states"]

  !db|| pr("tokens=%s\n",d(tNames))
  !db|| pr("stateInfo=\n%s\n",d(stateInfo))

  st = []
  stateInfo.each_with_index do |(key,val),i|
    !db|| pr(" creating new state, id=%d\n",i)
    st.push(State.new(i))
  end

  st.each do |s|
    !db|| pr("proc state %s\n",d(s))
    finalState, edgeList = stateInfo[s.id]
    s.finalState = finalState
    edgeList.each do |edge|
      label,destState = edge
      cr = CodeSet.new()
      cr.setArray(label)
      s.addEdge(cr, st[destState])
    end
  end

  DFA.new(tNames, st[0])
end
.from_script(script, persistPath = nil) ⇒ Object
Compile a Tokenizer DFA from a token definition script. If persistPath is not nil, it first checks whether a file exists at that path; if so, it assumes the file contains (in JSON form) a previously compiled DFA matching this script, and reads the DFA from it.
If no such file exists, the compiled DFA is written to persistPath after compilation.
# File 'lib/tokn/dfa.rb', line 20

def self.from_script(script, persistPath = nil)
  if persistPath and File.exist?(persistPath)
    return from_json(read_text_file(persistPath))
  end

  req('token_defn_parser')
  td = TokenDefParser.new(script)
  dfa = td.dfa

  if persistPath
    write_text_file(persistPath, dfa.serialize())
  end

  dfa
end
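The compile-or-load caching that from_script performs can be illustrated with only the standard library. This is a sketch of the pattern, not the real parser: the `compile_or_load` helper and its stand-in "compilation" (a hash built from the script text) are hypothetical:

```ruby
require 'json'
require 'tmpdir'

# Sketch of from_script's persistence pattern: if the persist file already
# exists, load the previously saved result; otherwise build the result and
# write it out for next time. The "compilation" step here is a stand-in
# hash, not the real TokenDefParser.
def compile_or_load(script, persist_path)
  if persist_path && File.exist?(persist_path)
    return JSON.parse(File.read(persist_path))
  end
  compiled = { "version" => 1.0, "tokens" => script.split }
  File.write(persist_path, JSON.generate(compiled)) if persist_path
  compiled
end

dir = Dir.mktmpdir
path = File.join(dir, "dfa.json")
first  = compile_or_load("WORD NUMBER", path)  # compiles and persists
second = compile_or_load("WORD NUMBER", path)  # loads the cached copy
puts first == second  # => true
```

Note that, as with from_script itself, the cache is keyed only by the file's existence: if the script changes, a stale persist file will still be loaded.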
.from_script_file(scriptPath, persistPath = nil) ⇒ Object
Similar to from_script, but reads the script into memory from the file at scriptPath.
# File 'lib/tokn/dfa.rb', line 41

def self.from_script_file(scriptPath, persistPath = nil)
  self.from_script(read_text_file(scriptPath), persistPath)
end
Instance Method Details
#serialize ⇒ Object
Serialize this DFA to a JSON string. The DFA in JSON form has this structure:
{
"version" => version number (float)
"tokens" => array of token names (strings)
"states" => array of states, ordered by id (0,1,..)
}
Each state has this format:
[ finalState (boolean),
[edge0, edge1, ...]
]
Edge:
[label, destination id (integer)]
Labels are arrays of integers, exactly the structure of a CodeSet array.
# File 'lib/tokn/dfa.rb', line 163

def serialize
  h = {"version"=>VERSION, "tokens"=>tokenNames}

  stateSet,_,_ = startState.reachableStates

  idToStateMap = {}
  stateSet.each{ |st| idToStateMap[st.id] = st }

  stateList = []
  nextId = 0
  idToStateMap.each_pair do |id, st|
    if nextId != id
      raise ArgumentError, "unexpected state ids"
    end
    nextId += 1
    stateList.push(st)
  end

  if stateList.size == 0
    raise ArgumentError, "bad states"
  end

  if stateList[0] != startState
    raise ArgumentError, "bad start state"
  end

  stateInfo = []
  stateList.each do |st|
    stateInfo.push(stateToList(st))
  end
  h["states"] = stateInfo

  JSON.generate(h)
end
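A concrete instance of the documented layout may help. The JSON below is a hypothetical two-state DFA (one token, one edge); the token name and code range are invented for illustration, and it is read back with Ruby's standard JSON library:

```ruby
require 'json'

# Hypothetical serialized DFA in the documented structure: state 0 has one
# edge labeled with the CodeSet array [97, 123] (codes 'a' up to but not
# including '{') leading to state 1, a final state with no outgoing edges.
json_str = <<~JSON
  {"version": 1.0,
   "tokens": ["WORD"],
   "states": [
     [false, [[[97, 123], 1]]],
     [true,  []]
   ]}
JSON

h = JSON.parse(json_str)
final_state, edges = h["states"][1]
puts final_state        # => true
label, dest = h["states"][0][1][0]
puts label.inspect      # => [97, 123]
puts dest               # => 1
```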
#tokenId(tokenName) ⇒ Object
Get id of token given its name
# File 'lib/tokn/dfa.rb', line 139

def tokenId(tokenName)
  @tokenIdMap[tokenName]
end
#tokenName(tokenId) ⇒ Object
Determine the name of a token, given its id. Returns <UNKNOWN> if its id is UNKNOWN_TOKEN, or <EOF> if the tokenId is nil. Otherwise, assumes tokenId is 0 … n-1, where n is the number of token names in the DFA.
# File 'lib/tokn/dfa.rb', line 121

def tokenName(tokenId)
  if !tokenId
    nm = "<EOF>"
  elsif tokenId == UNKNOWN_TOKEN
    nm = "<UNKNOWN>"
  else
    if tokenId < 0 || tokenId >= tokenNames.size
      raise IndexError, "No such token id: "+tokenId.to_s
    end
    nm = tokenNames[tokenId]
  end
  nm
end
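The lookup rules can be exercised in a standalone sketch. The constant value and token names below are assumptions for illustration only; the real UNKNOWN_TOKEN value comes from ToknInternal:

```ruby
# Standalone sketch of tokenName's lookup rules. UNKNOWN_TOKEN's actual
# value is defined in ToknInternal; -1 here is an assumption, and the
# token names are hypothetical.
UNKNOWN_TOKEN = -1
TOKEN_NAMES = ["WHITESPACE", "IDENT"].freeze

def token_name(token_id)
  return "<EOF>" if token_id.nil?
  return "<UNKNOWN>" if token_id == UNKNOWN_TOKEN
  if token_id < 0 || token_id >= TOKEN_NAMES.size
    raise IndexError, "No such token id: #{token_id}"
  end
  TOKEN_NAMES[token_id]
end

puts token_name(nil)  # => <EOF>
puts token_name(1)    # => IDENT
```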