Class: Stamina::Induction::BlueFringe
- Inherits:
-
Object
- Object
- Stamina::Induction::BlueFringe
- Includes:
- Commons
- Defined in:
- lib/stamina-induction/stamina/induction/blue_fringe.rb
Overview
Implementation of the BlueFringe variant of the RPNI algorithm (with the blue-fringe heuristics).
See Lang, K., B. Pearlmutter, andR. Price. 1998. Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm, In Grammatical Inference, pp. 1–12. Ames, IO: Springer-Verlag.
Example:
# sample typically comes from an ADL file
sample = Stamina::ADL.parse_sample_file('sample.adl')
# let BlueFringe build the smallest dfa
dfa = Stamina::Induction::BlueFringe.execute(sample, {:verbose => true})
Remarks:
-
Constructor and instance methods of this class are public but not intended to be used directly. They are left public for testing purposes only.
-
Having read the Stamina::Induction::BlueFringe base algorithm may help undertanding this variant.
-
This class intensively uses the Stamina::Induction::UnionFind class and methods defined in the Stamina::Induction::Commons module which are worth reading to understand the algorithm implementation.
Constant Summary
Constants included from Commons
Instance Attribute Summary collapse
-
#ufds ⇒ Object
readonly
Union-find data structure used internally.
Attributes included from Commons
Class Method Summary collapse
-
.execute(sample, options = {}) ⇒ Object
Build the smallest DFA compatible with the sample given as input.
Instance Method Summary collapse
-
#execute(sample) ⇒ Object
Build the smallest DFA compatible with the sample given as input.
-
#fringe ⇒ Object
Computes the fringe given the current union find.
-
#initialize(options = {}) ⇒ BlueFringe
constructor
Creates an algorithm instance with given options.
-
#main(ufds) ⇒ Object
Main method of the algorithm.
-
#merge_and_determinize(i, j) ⇒ Object
Merges a state of rank j with a state of lower rank i.
-
#merge_and_determinize_score(i, j) ⇒ Object
Evaluates the score of merging states i and j.
-
#merge_score(d1, d2) ⇒ Object
Computes the score of a single (group) merge.
Methods included from Commons
#info, #merge_user_data, #pta2ufds, #sample2pta, #sample2ufds, #ufds2dfa, #verbose?, #verbose_io
Constructor Details
#initialize(options = {}) ⇒ BlueFringe
Creates an algorithm instance with given options.
35 36 37 38 39 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 35 def initialize(={}) raise ArgumentError, "Invalid options #{.inspect}" unless .is_a?(Hash) @options = DEFAULT_OPTIONS.merge() @score_cache = {} end |
Instance Attribute Details
#ufds ⇒ Object (readonly)
Union-find data structure used internally
32 33 34 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 32 def ufds @ufds end |
Class Method Details
.execute(sample, options = {}) ⇒ Object
Build the smallest DFA compatible with the sample given as input.
Options (the options hash):
-
:verbose can be set to true to trace algorithm execution on standard output.
Preconditions:
-
The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.
Postconditions:
-
The returned DFA is the smallest DFA that correctly labels the learning sample given as input.
258 259 260 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 258 def self.execute(sample, ={}) BlueFringe.new().execute(sample) end |
Instance Method Details
#execute(sample) ⇒ Object
Build the smallest DFA compatible with the sample given as input.
Preconditions:
-
The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.
Postconditions:
-
The returned DFA is the smallest DFA that correctly labels the learning sample given as input.
Remarks:
-
This instance version of BlueFringe.execute is not intended to be used directly and is mainly provided for testing purposes. Please use the class variant of this method if possible.
234 235 236 237 238 239 240 241 242 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 234 def execute(sample) # create union-find info("Creating PTA and UnionFind structure") ufds = sample2ufds(sample) # refine it ufds = main(ufds) # compute and return quotient automaton ufds2dfa(ufds) end |
#fringe ⇒ Object
Computes the fringe given the current union find. The fringe is returned as an array of state indices.
Postconditions:
-
Returned array contains indices of leader states only.
-
Returned array is disjoint with the kernel.
143 144 145 146 147 148 149 150 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 143 def fringe fringe = [] @kernel.each do |k1| delta = @ufds.mergeable_data(k1)[:delta] delta.each_pair{|symbol, target| fringe << @ufds.find(target)} end (fringe - @kernel).sort end |
#main(ufds) ⇒ Object
Main method of the algorithm. Refines the union find passed as first argument by merging well chosen state pairs. Returns the refined union find.
Preconditions:
-
The union find ufds is correctly initialized (contains :initial, :accepting, and :error boolean flags as well as a :delta sub hash)
Postconditions:
-
The union find has been refined. It encodes a quotient automaton (of the PTA it comes from) such that all positive and negative strings of the underlying sample are correctly classified by it.
165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 165 def main(ufds) info("Starting BlueFringe (#{ufds.size} states)") @ufds, @kernel, @score_cache = ufds, [0], {} # we do it until the fringe is empty (compute it only once each step) until (the_fringe=fringe).empty? # state to consolidate (if any) to_consolidate = nil # best candidate [source index, target index, score] best = [nil, nil, -1] # for each state on the fringe as merge candidate the_fringe.each do |candidate| to_consolidate = candidate # evaluate score of merging candidate with each kernel state @kernel.each do |target| score = merge_and_determinize_score(candidate, target) unless score.nil? # if a score has been found, the candidate will not be # consolidated. We keep it as best if its better than the # previous one to_consolidate = nil best = [candidate, target, score] if score > best[2] end end # No possible target, break the loop (will consolidate right now)! break unless to_consolidate.nil? end # If not found, the last candidate must be consolidated. Otherwise, we # do the best merging unless to_consolidate.nil? info("Consolidation of #{to_consolidate}") @kernel << to_consolidate else @score_cache.clear info("Merging #{best[0]} and #{best[1]} [#{best[2]}]") # this one should never fail because its score was positive before raise "Unexpected case" unless merge_and_determinize(best[0], best[1]) end # blue_fringe does not guarantee that it will not merge a state of lower rank # with a kernel state. The kernel should then be update at each step to keep # lowest indices for the whole kernel, and we sort it @kernel = @kernel.collect{|k| @ufds.find(k)}.sort end # return the refined union find now @ufds end |
#merge_and_determinize(i, j) ⇒ Object
Merges a state of rank j with a state of lower rank i. This merge method includes merging for determinization. It returns nil if the merge is incompatible, a merge score otherwise.
Preconditions:
-
States denoted by i and j are expected leader states (non merged ones)
-
States denoted by i and j are expected to be different
Postconditions:
-
Union find is refined, states i and j having been merged, as well as all state pairs that need to be merged to ensure the deterministic property of the quotient automaton.
-
If the resulting quotient automaton is consistent with the negative sample, this method returns the number of accepting pairs + the number of error pairs that have been merged. The refined union-find correctly encodes the quotient automaton. Otherwise, the method returns nil and the union-find information must be considered inaccurate.
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 71 def merge_and_determinize(i, j) # Make the union (keep merging score as well as additional merges to be performed # in score and determinization, respectively). Recompute the user data attached to # the new state group (new_data) determinization, score = [], nil @ufds.union(i, j) do |d1, d2| # states are incompatible if new_data cannot be created because it would # lead to merge and error and an accepting state. We simply return nil in this # case... return nil unless (new_data = merge_user_data(d1, d2, determinization)) # otherwise, we score score = merge_score(d1, d2) # and we let the union find keep the new_data for the group new_data end # Merge for determinization starts here, based on the determinization array # computed as a side effect of merge_user_data determinization.each do |pair| # we take the leader states of the pair to merge pair = pair.collect{|i| @ufds.find(i)} # do nothing if already the same leader state next if pair[0]==pair[1] # otherwise recurse and keep subscore subscore = merge_and_determinize(pair[0], pair[1]) # failure if merging for determinization led to merge error and accepting # states return nil if subscore.nil? # this is the new score score += subscore end score end |
#merge_and_determinize_score(i, j) ⇒ Object
Evaluates the score of merging states i and j. Returns nil if the states are cannot be merged, a positive score otherwise.
Preconditions:
-
States denoted by i and j are expected leader states (non merged ones)
-
States denoted by i and j are expected to be different
Postconditions:
-
Returned value is nil if the quotient automaton would be incompatible with the sample. Otherwise a positive number is returned, encoding the number of interresting pairs that have been merged (interesting = both accepting or both error)
-
The union find is ALWAYS restored to its previous value after merging has been evaluated and is then seen unchanged by the caller.
122 123 124 125 126 127 128 129 130 131 132 133 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 122 def merge_and_determinize_score(i, j) score = @score_cache[[i,j]] ||= begin # score the merging, always rollback the transaction score = nil @ufds.transactional do score = merge_and_determinize(i, j) false end score || -1 end score == -1 ? nil : score end |
#merge_score(d1, d2) ⇒ Object
Computes the score of a single (group) merge. Returned value is 1 if both are accepting states or both are error states and 0 otherwise. Note that d1 and d2 are expected to be merge compatible as this method does not distinguish this case.
47 48 49 50 |
# File 'lib/stamina-induction/stamina/induction/blue_fringe.rb', line 47 def merge_score(d1, d2) # Score of 1 if both accepting or both error ((d1[:accepting] and d2[:accepting]) or (d1[:error] and d2[:error])) ? 1 : 0 end |