Module: Stamina::Induction::Commons
- Included in:
- BlueFringe, RPNI
- Defined in:
- lib/stamina-induction/stamina/induction/commons.rb
Overview
Defines common utilities used by rpni and blue_fringe. About acronyms:
-
pta stands for Prefix Tree Acceptor
-
ufds stands for Union-Find Data Structure
Methods pta2ufds and sample2ufds are simply conversion methods used when the induction algorithm starts (executed on a sample, it first built a pta then convert it to a union find). Method ufds2dfa is used when the algorithm ends, to convert refined union find to a dfa.
The merge_user_data method is probably the most important as it actually computes the merging of two states and build information about merging for determinization.
Constant Summary collapse
- DEFAULT_OPTIONS =
{ :verbose => false, :verbose_io => $stderr }
Instance Attribute Summary collapse
-
#options ⇒ Object
readonly
Additional options of the algorithm.
Instance Method Summary collapse
-
#info(msg) ⇒ Object
Display an information message (when verbose).
-
#merge_user_data(d1, d2, determinization) ⇒ Object
Merges two user data hashes d1 and d2 according to rules defined below.
-
#pta2ufds(pta) ⇒ Object
Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements.
-
#sample2pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor.
-
#sample2ufds(sample) ⇒ Object
Converts a Sample instance to a ‘ready to refine’ union find data structure.
-
#ufds2dfa(ufds) ⇒ Object
Computes the quotient automaton from a refined UnionFind data structure.
-
#verbose? ⇒ Boolean
Is the verbose mode on ?.
- #verbose_io ⇒ Object
Instance Attribute Details
#options ⇒ Object (readonly)
Additional options of the algorithm
25 26 27 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 25 def @options end |
Instance Method Details
#info(msg) ⇒ Object
Display an information message (when verbose)
37 38 39 40 41 42 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 37 def info(msg) if verbose? verbose_io << msg << "\n" verbose_io.flush end end |
#merge_user_data(d1, d2, determinization) ⇒ Object
Merges two user data hashes d1 and d2 according to rules defined below. Also fills a determinization array with pairs of state indices that are reached from d1 and d2 through the same symbol and should be merged for determinization. This method does NOT ensure that those pairs correspond to distinguish states according to the union find. In other words state indices in these pairs do not necessarily corespond to master states (see UnionFind for this term).
Returns the resulting data if the merge is successful (does not lead to merging an error state with an accepting one), nil otherwise.
The merging procedure for the different hash keys is as follows:
132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 132 def merge_user_data(d1, d2, determinization) # we compute flags first new_data = {:initial => d1[:initial] || d2[:initial], :accepting => d1[:accepting] || d2[:accepting], :error => d1[:error] || d2[:error], :master => d1[:master] < d2[:master] ? d1[:master] : d2[:master]} # merge failure if accepting and error states are merged return nil if new_data[:accepting] and new_data[:error] # we recompute the delta function of the resulting state # keeping merging for determinization as pairs in _determinization_ new_data[:delta] = d1[:delta].merge(d2[:delta]) do |symbol, t1, t2| determinization << [t1, t2] t1 < t2 ? t1 : t2 end # returns merged data new_data end |
#pta2ufds(pta) ⇒ Object
Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements. The resulting UnionFind contains a Hash as mergeable user data, presenting the following keys:
-
:initial, :accepting and :error flags of each state
-
:master indicating the index of the state in the PTA
-
:delta a delta function through a Hash => state_index
In this version, other user data attached to PTA states is lost during the conversion.
55 56 57 58 59 60 61 62 63 64 65 66 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 55 def pta2ufds(pta) Stamina::Induction::UnionFind.new(pta.state_count) do |i| state = pta.ith_state(i) data = {:initial => state.initial?, :accepting => state.accepting?, :error => state.error?, :master => i, :delta => {}} state.out_edges.each {|edge| data[:delta][edge.symbol] = edge.target.index} data end end |
#sample2pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor. This method ensures that the states of the PTA are in lexical order, according to the <=>
operator defined on symbols. States reached by negative strings are tagged as non accepting and error.
74 75 76 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 74 def sample2pta(sample) sample.to_pta end |
#sample2ufds(sample) ⇒ Object
Converts a Sample instance to a ‘ready to refine’ union find data structure. This method is simply a shortcut for pta2ufds(sample2pta(sample))
.
82 83 84 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 82 def sample2ufds(sample) pta2ufds(sample2pta(sample)) end |
#ufds2dfa(ufds) ⇒ Object
Computes the quotient automaton from a refined UnionFind data structure.
In this version, only accepting and initial flags are taken into account when creating quotient automaton states. Other user data is lost during the conversion.
93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 93 def ufds2dfa(ufds) Automaton.new(false) do |fa| mergeable_datas = ufds.mergeable_datas mergeable_datas.each do |data| state_data = data.reject {|key,value| [:master, :count, :delta].include?(key)} state_data[:name] = data[:master].to_s state_data[:error] = false fa.add_state(state_data) end mergeable_datas.each do |data| source = fa.get_state(data[:master].to_s) data[:delta].each_pair do |symbol, target| target = fa.get_state(ufds.find(target).to_s) fa.connect(source, target, symbol) end end end end |
#verbose? ⇒ Boolean
Is the verbose mode on ?
28 29 30 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 28 def verbose? @verbose ||= !![:verbose] end |
#verbose_io ⇒ Object
32 33 34 |
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 32 def verbose_io @verbose_io ||= [:verbose_io] || $stderr end |