Class: Tokn::DFA

Inherits: Object
Includes: ToknInternal
Defined in: lib/tokn/dfa.rb

Overview

A DFA for tokenizing; includes pointer to a start state, and a list of token names

Constant Summary

Constants included from ToknInternal

ToknInternal::CODEMAX, ToknInternal::CODEMIN, ToknInternal::EPSILON, ToknInternal::UNKNOWN_TOKEN

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from ToknInternal

edgeLabelToTokenId, tokenIdToEdgeLabel

Constructor Details

#initialize(tokenNameList, startState) ⇒ DFA

Construct a DFA, given a list of token names and a starting state.



# File 'lib/tokn/dfa.rb', line 101

def initialize(tokenNameList, startState)

  if startState.id != 0
    raise ArgumentError, "Start state id must be zero"
  end

  @tokenNames = tokenNameList
  @startState = startState

  # Build a map from token name to token id (the name's index in the list)
  @tokenIdMap = {}
  @tokenNames.each_with_index do |name, i|
    @tokenIdMap[name] = i
  end

end
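The name-to-id map the constructor builds is just an index map over the token list. A standalone sketch, using hypothetical token names and no tokn dependency:

```ruby
# Hypothetical token names, in the order they were defined
token_names = ["WHITESPACE", "IDENTIFIER", "NUMBER"]

# Map each token name to its id (its index within the list),
# mirroring the constructor's @tokenIdMap
token_id_map = {}
token_names.each_with_index do |name, i|
  token_id_map[name] = i
end

token_id_map["NUMBER"]   # => 2
token_id_map["MISSING"]  # => nil (no token with that name)
```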

Instance Attribute Details

#startStateObject (readonly)

Returns the value of attribute startState.



# File 'lib/tokn/dfa.rb', line 97

def startState
  @startState
end

#tokenNamesObject (readonly)

Returns the value of attribute tokenNames.



# File 'lib/tokn/dfa.rb', line 97

def tokenNames
  @tokenNames
end

Class Method Details

.from_file(path) ⇒ Object

Compile a Tokenizer DFA from a text file containing a JSON string.



# File 'lib/tokn/dfa.rb', line 48

def self.from_file(path)
  from_json(read_text_file(path))
end

.from_json(jsonStr) ⇒ Object

Compile a Tokenizer DFA from a JSON string



# File 'lib/tokn/dfa.rb', line 54

def self.from_json(jsonStr)
  db = false   # set true to enable debug printing

  !db || pr("\n\nextractDFA %s...\n", jsonStr)

  h = JSON.parse(jsonStr)

  version = h["version"]

  # Only the integer part of the version number must match
  if !version || version.floor != VERSION.floor
    raise ArgumentError,
       "Bad or missing version number: #{version}, expected #{VERSION}"
  end

  tNames = h["tokens"]
  stateInfo = h["states"]

  !db || pr("tokens=%s\n", d(tNames))
  !db || pr("stateInfo=\n%s\n", d(stateInfo))

  # First pass: create one State per entry, ids matching array order
  st = []
  stateInfo.each_with_index do |_, i|
    !db || pr(" creating new state, id=%d\n", i)
    st.push(State.new(i))
  end

  # Second pass: mark final states and attach edges
  st.each do |s|
    !db || pr("proc state %s\n", d(s))

    finalState, edgeList = stateInfo[s.id]
    s.finalState = finalState
    edgeList.each do |edge|
      label, destState = edge
      cr = CodeSet.new
      cr.setArray(label)
      s.addEdge(cr, st[destState])
    end
  end

  DFA.new(tNames, st[0])

end

.from_script(script, persistPath = nil) ⇒ Object

Compile a Tokenizer DFA from a token definition script. If persistPath is not nil, it first checks whether a file exists at that path; if so, it assumes the file contains (in JSON form) a previously compiled DFA matching this script, and reads the DFA from it.

If no such file exists, it writes the DFA to that path after compilation.



# File 'lib/tokn/dfa.rb', line 20

def self.from_script(script, persistPath = nil)

  # Reuse a previously persisted DFA, if one exists
  if persistPath and File.exist?(persistPath)
    return from_json(read_text_file(persistPath))
  end

  req('token_defn_parser')

  td = TokenDefParser.new(script)
  dfa = td.dfa

  # Persist the compiled DFA so later calls can skip compilation
  if persistPath
    write_text_file(persistPath, dfa.serialize)
  end

  dfa
end
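The persist-path caching pattern can be sketched without the tokn library. In this minimal stand-in, the "compile" step is a stub hash and the cache file name is hypothetical:

```ruby
require 'json'
require 'tmpdir'

# Standalone sketch of from_script's caching pattern: reuse a
# persisted compilation if the file exists, otherwise compile and
# write the cache. The "compile" step is a stub, not a real DFA.
def compile_with_cache(script, persist_path = nil)
  # Reuse a previously persisted result if one exists
  if persist_path && File.exist?(persist_path)
    return JSON.parse(File.read(persist_path))
  end

  result = { "compiled" => script.upcase }   # stub "compilation"

  # Persist the result so the next call can skip compilation
  File.write(persist_path, JSON.generate(result)) if persist_path
  result
end

path = File.join(Dir.tmpdir, "dfa_cache_demo.json")
File.delete(path) if File.exist?(path)

first  = compile_with_cache("tokens", path)   # compiles and writes the cache
second = compile_with_cache("tokens", path)   # reads the cached copy back
# first == second
```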

.from_script_file(scriptPath, persistPath = nil) ⇒ Object

Similar to from_script, but reads the script into memory from the file at scriptPath.



# File 'lib/tokn/dfa.rb', line 41

def self.from_script_file(scriptPath, persistPath = nil)
  self.from_script(read_text_file(scriptPath), persistPath)  
end

Instance Method Details

#serializeObject

Serialize this DFA to a JSON string. The DFA in JSON form has this structure:

{
  "version" => version number (float)
  "tokens" => array of token names (strings)
  "states" => array of states, ordered by id (0,1,..)
}

Each state has this format:

[ finalState (boolean),
 [edge0, edge1, ...]
]

Edge:

[label, destination id (integer)]

Labels are arrays of integers, exactly the structure of a CodeSet array.
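A hand-written instance of this structure can be parsed with Ruby's stdlib JSON. The token name, the two states, and the label `[48, 58]` (a CodeSet array, assumed here to denote the half-open codepoint range 48..57, i.e. '0'..'9') are hypothetical values for illustration:

```ruby
require 'json'

# Hypothetical serialized DFA: one token "NUM" and two states.
# State 0 is non-final with a single edge to state 1; state 1 is final.
json_str = <<~JSON
  {
    "version": 1.0,
    "tokens": ["NUM"],
    "states": [
      [false, [[[48, 58], 1]]],
      [true, []]
    ]
  }
JSON

h = JSON.parse(json_str)
final_state, edge_list = h["states"][0]   # state format: [finalState, edges]
label, dest_id = edge_list[0]             # edge format: [label, destination id]
```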



# File 'lib/tokn/dfa.rb', line 163

def serialize

  h = {"version" => VERSION, "tokens" => tokenNames}

  stateSet, _, _ = startState.reachableStates

  idToStateMap = {}
  stateSet.each{ |st| idToStateMap[st.id] = st }

  # Build a list of states ordered by id, verifying the ids form
  # a contiguous sequence 0, 1, ...
  stateList = []
  nextId = 0
  idToStateMap.keys.sort.each do |id|
    if nextId != id
      raise ArgumentError, "unexpected state ids"
    end
    nextId += 1
    stateList.push(idToStateMap[id])
  end

  if stateList.empty?
    raise ArgumentError, "bad states"
  end

  if stateList[0] != startState
    raise ArgumentError, "bad start state"
  end

  stateInfo = []
  stateList.each do |st|
    stateInfo.push(stateToList(st))
  end
  h["states"] = stateInfo

  JSON.generate(h)
end

#tokenId(tokenName) ⇒ Object

Get the id of a token, given its name.

Parameters:

  • tokenName

    name of token

Returns:

  • nil if there is no token with that name



# File 'lib/tokn/dfa.rb', line 139

def tokenId(tokenName)
  @tokenIdMap[tokenName]
end

#tokenName(tokenId) ⇒ Object

Determine the name of a token, given its id. Returns <UNKNOWN> if the id is UNKNOWN_TOKEN, or <EOF> if the id is nil. Otherwise, assumes tokenId is in 0 … n-1, where n is the number of token names in the DFA.



# File 'lib/tokn/dfa.rb', line 121

def tokenName(tokenId)
  if !tokenId
    nm = "<EOF>"
  elsif tokenId == UNKNOWN_TOKEN
    nm = "<UNKNOWN>"
  else
    if tokenId < 0 || tokenId >= tokenNames.size
      raise IndexError, "No such token id: #{tokenId}"
    end
    nm = tokenNames[tokenId]
  end
  nm
end
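The id-to-name logic can be exercised standalone. The token list and the sentinel value -1 for UNKNOWN_TOKEN below are assumptions for illustration, not taken from the tokn source:

```ruby
UNKNOWN_SENTINEL = -1             # stand-in for ToknInternal::UNKNOWN_TOKEN (assumed value)
TOKEN_NAMES = ["WS", "ID", "NUM"] # hypothetical token names, ordered by id

# Mirror of tokenName's lookup rules: nil => <EOF>, sentinel => <UNKNOWN>,
# otherwise an in-range index into the token name list
def token_name(token_id)
  return "<EOF>" if token_id.nil?
  return "<UNKNOWN>" if token_id == UNKNOWN_SENTINEL
  if token_id < 0 || token_id >= TOKEN_NAMES.size
    raise IndexError, "No such token id: #{token_id}"
  end
  TOKEN_NAMES[token_id]
end

token_name(nil)   # => "<EOF>"
token_name(2)     # => "NUM"
```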