Class: Stamina::Sample
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/stamina-core/stamina/sample.rb
Overview
A sample is an ordered collection of InputString instances, each labeled as positive or negative.
Tips and tricks
- Loading samples from disk is easy thanks to ADL!
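The ADL grammar itself is not reproduced on this page. As a rough illustration of loading a line-oriented sample file, here is a minimal sketch that assumes each non-blank line starts with `+` (positive), `-` (negative) or `?` (unlabeled), followed by whitespace-separated symbols. `ParsedString`, `LABELS` and `parse_labeled_lines` are hypothetical names, not part of Stamina; with the library itself you would use Sample.parse instead.

```ruby
# Hypothetical stand-in for a labeled input string (not Stamina's InputString).
ParsedString = Struct.new(:symbols, :label)

# ASSUMED line format: "+", "-" or "?" followed by whitespace-separated symbols.
LABELS = { "+" => :positive, "-" => :negative, "?" => nil }.freeze

# Parses one labeled string per non-blank line.
def parse_labeled_lines(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    mark, *symbols = line.split
    raise ArgumentError, "bad label in #{line.inspect}" unless LABELS.key?(mark)
    ParsedString.new(symbols, LABELS[mark])
  end
end

# Usage on a file read from disk (hypothetical file name):
#   strings = parse_labeled_lines(File.read("train.adl"))
strings = parse_labeled_lines("+ a b a\n- b\n? a\n")
strings.map(&:label) # => [:positive, :negative, nil]
```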
Detailed API
Instance Attribute Summary
-
#negative_count ⇒ Object
readonly
Number of negative strings in the sample.
-
#positive_count ⇒ Object
readonly
Number of positive strings in the sample.
-
#size ⇒ Object
readonly
Number of strings in the sample.
Class Method Summary
-
.[](*args) ⇒ Object
Creates an empty sample and appends each of args to it, using Sample#<<.
-
.coerce(arg) ⇒ Object
Coerces `arg` to a Sample instance.
-
.parse(adl) ⇒ Object
Parses an ADL input.
-
.to_pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor.
Instance Method Summary
-
#+(other) ⇒ Object
Returns a new sample as the union of both `self` and `other`.
-
#<<(str) ⇒ Object
Adds a string to the sample.
-
#==(other) ⇒ Object
(also: #eql?)
Compares this sample with another sample, other, which must be a Sample instance.
-
#correctly_classified_by?(classifier) ⇒ Boolean
Checks if the sample is correctly classified by a given classifier (expected to include the Stamina::Classifier module).
-
#each ⇒ Object
Yields the block with each string.
-
#each_negative ⇒ Object
Yields the block with each negative string.
-
#each_positive ⇒ Object
Yields the block with each positive string.
-
#empty? ⇒ Boolean
Returns true if this sample does not contain any string, false otherwise.
-
#hash ⇒ Object
Computes a hash code for this sample.
-
#include?(str) ⇒ Boolean
Returns true if a given string is included in the sample, false otherwise.
-
#initialize(strings = nil) ⇒ Sample
constructor
Creates a sample, optionally filled with the given strings.
-
#negative_enumerator ⇒ Object
Returns an enumerator on negative strings.
-
#positive_enumerator ⇒ Object
Returns an enumerator on positive strings.
-
#signature ⇒ Object
Computes and returns the binary signature of the sample.
-
#take(proportion = 0.5) ⇒ Object
Randomly keeps each string with probability proportion and returns the selection as a new Sample.
-
#to_adl(buffer = "") ⇒ Object
(also: #to_s, #inspect)
Appends an ADL description of this sample to the buffer and returns the buffer.
-
#to_cdfa ⇒ Object
Converts this sample to a canonical DFA.
-
#to_dot ⇒ Object
Converts this sample to a Graphviz dot output.
-
#to_pta ⇒ Object
(also: #to_fa, #to_dfa)
Converts this sample to a prefix tree acceptor (PTA).
Constructor Details
#initialize(strings = nil) ⇒ Sample
Creates a sample, initially empty, then adds each element of strings (when given) using Sample#<<.
# File 'lib/stamina-core/stamina/sample.rb', line 31
def initialize(strings = nil)
  @strings = []
  @size, @positive_count, @negative_count = 0, 0, 0
  strings.each{|s| self << s } unless strings.nil?
end
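The constructor and the three counters can be mirrored outside Stamina in a few lines. This hypothetical `MiniSample` (strings modelled as a `LabeledString` struct with a boolean `positive` flag, an assumption) keeps the same bookkeeping as the listing, including the fact that any string that is not positive is counted as negative.

```ruby
# Hypothetical string representation: symbols plus a positive/negative flag.
LabeledString = Struct.new(:symbols, :positive)

# Minimal sketch of the constructor and counters of Stamina::Sample.
class MiniSample
  attr_reader :size, :positive_count, :negative_count

  def initialize(strings = nil)
    @strings = []
    @size, @positive_count, @negative_count = 0, 0, 0
    strings.each { |s| self << s } unless strings.nil?
  end

  # Only the InputString branch of Sample#<< is modelled here.
  def <<(str)
    @size += 1
    # As in the listing, any string that is not positive counts as negative.
    str.positive ? (@positive_count += 1) : (@negative_count += 1)
    @strings << str
    self
  end

  def empty?
    @size == 0
  end
end

s = MiniSample.new([LabeledString.new(%w[a b], true), LabeledString.new(%w[b], false)])
[s.size, s.positive_count, s.negative_count] # => [2, 1, 1]
```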
Instance Attribute Details
#negative_count ⇒ Object (readonly)
Number of negative strings in the sample.
# File 'lib/stamina-core/stamina/sample.rb', line 20
def negative_count
  @negative_count
end
#positive_count ⇒ Object (readonly)
Number of positive strings in the sample.
# File 'lib/stamina-core/stamina/sample.rb', line 17
def positive_count
  @positive_count
end
#size ⇒ Object (readonly)
Number of strings in the sample.
# File 'lib/stamina-core/stamina/sample.rb', line 14
def size
  @size
end
Class Method Details
.[](*args) ⇒ Object
Creates an empty sample and appends each of args to it. A single Sample#<< call is made with the args array, and #<< recurses on each element.
# File 'lib/stamina-core/stamina/sample.rb', line 26
def self.[](*args) Sample.new << args end
.coerce(arg) ⇒ Object
Coerces `arg` to a Sample instance: a Sample is returned unchanged, a String is parsed as ADL (see Sample.parse), and any other argument raises an ArgumentError.
# File 'lib/stamina-core/stamina/sample.rb', line 40
def self.coerce(arg)
  if arg.is_a?(Sample)
    arg
  elsif arg.is_a?(String)
    parse(arg)
  else
    raise ArgumentError, "Invalid argument #{arg} for `Sample`"
  end
end
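The coercion follows a common Ruby pattern: return the argument if it already has the target type, convert it if it is a supported representation, and raise ArgumentError otherwise. Below is a hypothetical sketch of that pattern written with `case`/`when`, which tests classes with `===` (equivalent to the `is_a?` checks above). `ToySample` and its one-line-per-string parser are stand-ins, not Stamina's ADL.

```ruby
# Hypothetical Sample-like class; strings are [positive?, symbols] pairs.
class ToySample
  attr_reader :strings

  def initialize(strings = [])
    @strings = strings
  end

  # Toy parser standing in for ADL::parse_sample: one "+ a b" / "- a b" line
  # per string. This is NOT the real ADL grammar.
  def self.parse(text)
    pairs = text.lines.map(&:split).reject(&:empty?)
    new(pairs.map { |mark, *symbols| [mark == "+", symbols] })
  end

  # Same three-way decision as Sample.coerce.
  def self.coerce(arg)
    case arg
    when ToySample then arg
    when String then parse(arg)
    else raise ArgumentError, "Invalid argument #{arg.inspect} for `ToySample`"
    end
  end
end

ToySample.coerce("+ a b\n- b\n").strings # => [[true, ["a", "b"]], [false, ["b"]]]
```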
.parse(adl) ⇒ Object
Parses an ADL input.
# File 'lib/stamina-core/stamina/sample.rb', line 53
def self.parse(adl)
  ADL::parse_sample(adl)
end
.to_pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor. The states of the resulting PTA are numbered in length-lexicographic order of the prefixes that reach them: breadth-first from the initial state, visiting outgoing edges sorted with the <=> operator defined on symbols. States reached by negative strings (more precisely, by any string that is not positive) are tagged as non-accepting and error.
# File 'lib/stamina-core/stamina/sample.rb', line 235
def self.to_pta(sample)
  thepta = Automaton.new do |pta|
    initial_state = add_state(:initial => true, :accepting => false)

    # Fill the PTA with each string
    sample.each do |str|
      # split string using the dfa
      parsed, reached, remaining = pta.dfa_split(str, initial_state)

      # remaining symbols are not empty -> build the PTA
      unless remaining.empty?
        remaining.each do |symbol|
          newone = pta.add_state(:initial => false, :accepting => false, :error => false)
          pta.connect(reached, newone, symbol)
          reached = newone
        end
      end

      # flag state
      str.positive? ? reached.accepting! : reached.error!

      # check consistency, should not happen as Sample does not allow
      # inconsistencies. Should appear only if _sample_ is not a Sample
      # instance but some other enumerable.
      raise(InconsistencyError, "Inconsistent sample on #{str}", caller)\
        if (reached.error? and reached.accepting?)
    end

    # Reindex states by applying BFS
    to_index, index = [initial_state], 0
    until to_index.empty?
      state = to_index.shift
      state[:__index__] = index
      state.out_edges.sort{|e,f| e.symbol<=>f.symbol}.each{|e| to_index << e.target}
      index += 1
    end
  end

  # Now we rebuild a fresh one with states in order.
  # This looks more efficient than reordering states of the PTA
  Automaton.new do |ordered|
    ordered.add_n_states(thepta.state_count)
    thepta.each_state do |pta_state|
      source = ordered.ith_state(pta_state[:__index__])
      source.initial! if pta_state.initial?
      source.accepting! if pta_state.accepting?
      source.error! if pta_state.error?
      pta_state.out_edges.each do |e|
        target = ordered.ith_state(e.target[:__index__])
        ordered.connect(source, target, e.symbol)
      end
    end
  end
end
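The reindexing step is what makes the state numbering deterministic. Below is a self-contained sketch of the same two phases (prefix tree construction, then breadth-first numbering with outgoing edges sorted by symbol), using plain hashes for states and [positive?, symbols] pairs for strings. It does not use Stamina's Automaton class; `new_state`, `build_pta` and the returned hash layout are hypothetical.

```ruby
# A state is a plain hash (assumed representation, not Stamina's State).
def new_state
  { edges: {}, accepting: false, error: false }
end

# Builds a PTA from [positive?, symbols] pairs and numbers its states
# breadth-first, visiting outgoing edges in sorted symbol order.
def build_pta(pairs)
  root = new_state
  pairs.each do |positive, symbols|
    # Follow existing edges, creating states for the remaining symbols.
    reached = symbols.inject(root) { |st, sym| st[:edges][sym] ||= new_state }
    if positive
      reached[:accepting] = true
    else
      reached[:error] = true
    end
    if reached[:accepting] && reached[:error]
      raise ArgumentError, "inconsistent sample on #{symbols.inspect}"
    end
  end

  # Breadth-first numbering; sort_by compares symbols with <=>.
  order = []
  queue = [root]
  until queue.empty?
    current = queue.shift
    current[:index] = order.size
    order << current
    current[:edges].sort_by { |sym, _| sym }.each { |_, target| queue << target }
  end

  transitions = order.flat_map do |st|
    st[:edges].sort_by { |sym, _| sym }.map { |sym, t| [st[:index], sym, t[:index]] }
  end
  {
    states: order.size,
    accepting: order.select { |st| st[:accepting] }.map { |st| st[:index] },
    error: order.select { |st| st[:error] }.map { |st| st[:index] },
    transitions: transitions
  }
end

pta = build_pta([[true, %w[b]], [true, %w[a b]], [false, %w[a]]])
pta[:transitions] # => [[0, "a", 1], [0, "b", 2], [1, "b", 3]]
```

Because edges are sorted before numbering, two samples holding the same strings in a different order yield identically numbered trees.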
Instance Method Details
#+(other) ⇒ Object
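Returns a new sample as the union of both `self` and `other`. Strings are copied in order (those of `self` first) without removing duplicates, so a string present in both samples appears twice in the result.
# File 'lib/stamina-core/stamina/sample.rb', line 116
def +(other)
  s = Sample.new
  each{|x| s << x}
  other.each{|x| s << x}
  s
end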
#<<(str) ⇒ Object
Adds a string to the sample and returns the sample. The str argument may be an InputString instance, a String (parsed using ADL), a Sample instance (all of its strings are added) or an Array (each element is added recursively).
Raises an ArgumentError if the str argument is not recognized. The method is also documented to raise an InconsistencyError when the same string already exists with the opposite label, but that check is commented out in the listing below.
# File 'lib/stamina-core/stamina/sample.rb', line 73
def <<(str)
  case str
    when InputString
      #raise(InconsistencyError, "Inconsistent sample on #{str}", caller) if self.include?(str.negate)
      @size += 1
      str.positive? ? (@positive_count += 1) : (@negative_count += 1)
      @strings << str
    when String
      self << ADL::parse_string(str)
    when Sample
      str.each {|s| self << s}
    when Array
      str.each {|s| self << s}
    else
      raise(ArgumentError, "#{str} is not a valid argument.", caller)
  end
  self
end
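The dispatch above reduces every accepted argument to individual InputString instances, recursing through Arrays and Samples. The following hypothetical sketch (`DispatchSample`, a `LabeledString` struct, and a toy one-line parser standing in for ADL::parse_string) reproduces that shape together with the single-call implementation of Sample.[]. It omits the counters and the commented-out consistency check.

```ruby
# Hypothetical string representation: symbols plus a positive/negative flag.
LabeledString = Struct.new(:symbols, :positive)

# Sketch of the argument dispatch in Sample#<< and Sample.[].
class DispatchSample
  include Enumerable

  # Mirrors Sample.[]: a single << call, which recurses into the args Array.
  def self.[](*args)
    new << args
  end

  def initialize
    @strings = []
  end

  def each(&block)
    @strings.each(&block)
    self
  end

  def <<(str)
    case str
    when LabeledString
      @strings << str
    when String
      # Toy parser standing in for ADL::parse_string: "+ a b" or "- a b".
      mark, *symbols = str.split
      self << LabeledString.new(symbols, mark == "+")
    when DispatchSample, Array
      str.each { |s| self << s }
    else
      raise ArgumentError, "#{str.inspect} is not a valid argument."
    end
    self
  end
end

sample = DispatchSample["+ a b", ["- b", ["+"]]]
sample.map(&:positive) # => [true, false, true]
```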
#==(other) ⇒ Object Also known as: eql?
Compares this sample with another sample, other, which is required to be a Sample instance. Returns true if the two samples contain the same strings (including labels), false otherwise.
# File 'lib/stamina-core/stamina/sample.rb', line 128
def ==(other)
  include?(other) and other.include?(self)
end
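Equality is defined as mutual inclusion, so neither the order of the strings nor how many times a string is repeated matters. A minimal sketch of that definition, modelling strings as [positive?, symbols] pairs in plain Arrays (an assumption; `includes_all?` and `same_sample?` are hypothetical helpers):

```ruby
# True if every string of `others` is included in `sample`.
def includes_all?(sample, others)
  others.all? { |s| sample.include?(s) }
end

# Equality as mutual inclusion, like Sample#==.
def same_sample?(a, b)
  includes_all?(a, b) && includes_all?(b, a)
end

x = [[true, %w[a]], [false, %w[b]]]
y = [[false, %w[b]], [true, %w[a]], [true, %w[a]]]
same_sample?(x, y)                              # => true (order and repeats ignored)
same_sample?(x, [[true, %w[a]], [true, %w[b]]]) # => false (label of "b" differs)
```

Since Sample#+ appends without removing duplicates, `x + x` holds twice as many strings as `x` yet still compares equal to it under this definition.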
#correctly_classified_by?(classifier) ⇒ Boolean
Checks if the sample is correctly classified by a given classifier (expected to include the Stamina::Classifier module). Unlabeled strings are simply ignored.
# File 'lib/stamina-core/stamina/sample.rb', line 193
def correctly_classified_by?(classifier)
  classifier.correctly_classify?(self)
end
#each ⇒ Object
Yields the block with each string. This method has no effect if no block is given.
# File 'lib/stamina-core/stamina/sample.rb', line 144
def each
  return unless block_given?
  @strings.each {|str| yield str}
end
#each_negative ⇒ Object
Yields the block with each negative string. Unlike #each and #each_positive, the listing below does not check block_given?, so calling it without a block raises a LocalJumpError as soon as a negative string is reached.
# File 'lib/stamina-core/stamina/sample.rb', line 173
def each_negative
  each {|str| yield str if str.negative?}
end
#each_positive ⇒ Object
Yields the block with each positive string. This method has no effect if no block is given.
# File 'lib/stamina-core/stamina/sample.rb', line 153
def each_positive
  return unless block_given?
  each {|str| yield str if str.positive?}
end
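Both filters are thin wrappers around #each. The hypothetical sketch below (`FilteredSample`, with strings modelled as a `LabeledString` struct) guards each of them with block_given?, as #each and #each_positive do in the listings, so that a block-less call is a no-op returning nil.

```ruby
# Hypothetical string representation: symbols plus a positive/negative flag.
LabeledString = Struct.new(:symbols, :positive)

class FilteredSample
  def initialize(strings)
    @strings = strings
  end

  def each
    return unless block_given?
    @strings.each { |str| yield str }
  end

  # Both filters return nil without iterating when no block is given.
  def each_positive
    return unless block_given?
    each { |str| yield str if str.positive }
  end

  def each_negative
    return unless block_given?
    each { |str| yield str unless str.positive }
  end
end

sample = FilteredSample.new([
  LabeledString.new(%w[a], true),
  LabeledString.new(%w[b], false),
  LabeledString.new([], true)
])
positives = []
sample.each_positive { |str| positives << str.symbols }
positives # => [["a"], []]
```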
#empty? ⇒ Boolean
Returns true if this sample does not contain any string, false otherwise.
# File 'lib/stamina-core/stamina/sample.rb', line 61
def empty?()
  @size==0
end
#hash ⇒ Object
Computes a hash code for this sample.
# File 'lib/stamina-core/stamina/sample.rb', line 136
def hash
  self.inject(37){|memo,str| memo + 17*str.hash}
end
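The hash is a sum of per-string terms, and since addition is commutative the result does not depend on the order of the strings. A minimal sketch of the same combination on [positive?, symbols] pairs (an assumed representation; `sample_hash` is a hypothetical helper):

```ruby
# Start from 37 and add 17 * hash of each string, as in Sample#hash.
# Ruby Integers do not overflow, so the sum is exact and order-independent.
def sample_hash(strings)
  strings.inject(37) { |memo, str| memo + 17 * str.hash }
end

a = [[true, %w[a]], [false, %w[b]]]
b = a.reverse
sample_hash(a) == sample_hash(b) # => true
```

Because every occurrence contributes a term, a sample listing a string twice generally hashes differently from one listing it once, even though #== (mutual inclusion) treats the two as equal.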
#include?(str) ⇒ Boolean
Returns true if a given string is included in the sample, false otherwise. The str argument accepts the same types as for #<<; for an Array or a Sample, true is returned only if every element is included.
# File 'lib/stamina-core/stamina/sample.rb', line 96
def include?(str)
  case str
    when InputString
      @strings.include?(str)
    when String
      include?(ADL::parse_string(str))
    when Array
      str.each {|s| return false unless include?(s)}
      true
    when Sample
      str.each {|s| return false unless include?(s)}
      true
    else
      raise(ArgumentError, "#{str} is not a valid argument.", caller)
  end
end
#negative_enumerator ⇒ Object
Returns an enumerator on negative strings.
# File 'lib/stamina-core/stamina/sample.rb', line 180
def negative_enumerator
  if RUBY_VERSION >= "1.9"
    Enumerator.new(self, :each_negative)
  else
    Enumerable::Enumerator.new(self, :each_negative)
  end
end
#positive_enumerator ⇒ Object
Returns an enumerator on positive strings.
# File 'lib/stamina-core/stamina/sample.rb', line 161
def positive_enumerator
  if RUBY_VERSION >= "1.9"
    Enumerator.new(self, :each_positive)
  else
    Enumerable::Enumerator.new(self, :each_positive)
  end
end
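The two-argument form Enumerator.new(obj, :method) used in both enumerator listings was deprecated in Ruby 2.0 and removed in later versions. A common modern equivalent is Object#enum_for (alias to_enum), which also lets an iterator return an enumerator when called without a block. The class below is a hypothetical sketch of that idiom on [positive?, symbols] pairs, not Stamina's code.

```ruby
# Hypothetical sample holding [positive?, symbols] pairs.
class EnumSample
  def initialize(pairs)
    @pairs = pairs
  end

  def each_positive
    return enum_for(:each_positive) unless block_given?
    @pairs.each { |positive, symbols| yield symbols if positive }
  end

  def each_negative
    return enum_for(:each_negative) unless block_given?
    @pairs.each { |positive, symbols| yield symbols unless positive }
  end

  # Version-independent replacements for the Enumerator construction above.
  def positive_enumerator
    enum_for(:each_positive)
  end

  def negative_enumerator
    enum_for(:each_negative)
  end
end

s = EnumSample.new([[true, %w[a]], [false, %w[b]], [true, %w[a c]]])
s.positive_enumerator.to_a # => [["a"], ["a", "c"]]
```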
#signature ⇒ Object
Computes and returns the binary signature of the sample. The signature is a String with one character per string in the sample, in order: '1' for positive strings, '0' for negative ones and '?' for unlabeled ones.
# File 'lib/stamina-core/stamina/sample.rb', line 202
def signature
  signature = ''
  each do |str|
    signature << (str.unlabeled? ? '?' : str.positive? ? '1' : '0')
  end
  signature
end
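The signature is a compact, order-preserving summary of the labels. Here is a hypothetical sketch that builds it with map and join instead of appending to a mutable String; labels are modelled as :positive, :negative or nil for unlabeled strings (an assumption).

```ruby
# One character per label, as in Sample#signature.
SIGNATURE_CHARS = { positive: "1", negative: "0", nil => "?" }.freeze

def signature(labels)
  labels.map { |label| SIGNATURE_CHARS.fetch(label) }.join
end

signature([:positive, nil, :negative, :positive]) # => "1?01"
```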
#take(proportion = 0.5) ⇒ Object
Randomly selects strings from this sample and returns them as a new Sample. Each positive string, then each negative string, is kept independently with probability proportion, so the result holds about that proportion of the strings on average.
# File 'lib/stamina-core/stamina/sample.rb', line 213
def take(proportion = 0.5)
  taken = Stamina::Sample.new
  each_positive{|s| taken << s if Kernel.rand < proportion}
  each_negative{|s| taken << s if Kernel.rand < proportion}
  taken
end
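Because each string is drawn independently, the size of the result varies from call to call. The sketch below reproduces the selection on [positive?, symbols] pairs (an assumed representation); `sample_take` is a hypothetical name, and its `rng:` keyword is an addition not present in the listing (which calls Kernel.rand directly) so that runs can be reproduced with a seeded Random.

```ruby
# Keeps each positive, then each negative, string with probability `proportion`.
def sample_take(pairs, proportion = 0.5, rng: Random.new)
  positives = pairs.select { |positive, _| positive }
  negatives = pairs.reject { |positive, _| positive }
  (positives + negatives).select { rng.rand < proportion }
end

data = [[true, %w[a]], [false, %w[b]], [true, %w[c]]]
sample_take(data, 1.0) # => [[true, ["a"]], [true, ["c"]], [false, ["b"]]]
```

Note that the result lists positives before negatives, whatever the original order.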
#to_adl(buffer = "") ⇒ Object Also known as: to_s, inspect
Appends an ADL description of this sample to buffer (a new String by default), writing a newline before each string's description, and returns buffer.
# File 'lib/stamina-core/stamina/sample.rb', line 223
def to_adl(buffer="")
  self.inject(buffer) {|memo,str| memo << "\n" << str.to_adl}
end
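Threading a caller-supplied buffer through inject lets the same method write into a String, or into any object supporting <<, such as an IO, without building intermediate strings. A minimal sketch of the pattern on [positive?, symbols] pairs; `to_adl_lines` is hypothetical and its "+ a b" / "- a b" rendering is an assumption that may differ from Stamina's actual ADL output.

```ruby
# Appends "\n" plus one rendered line per string, returning the same buffer.
def to_adl_lines(pairs, buffer = +"")
  pairs.inject(buffer) do |memo, (positive, symbols)|
    memo << "\n" << [positive ? "+" : "-", *symbols].join(" ")
  end
end

out = +"# my sample"
to_adl_lines([[true, %w[a b]], [false, []]], out)
out # => "# my sample\n+ a b\n-"
```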
#to_cdfa ⇒ Object
Converts this sample to a canonical DFA.
# File 'lib/stamina-core/stamina/sample.rb', line 299
def to_cdfa
  to_pta.to_cdfa
end
#to_dot ⇒ Object
Converts this sample to a Graphviz dot output, by rendering its PTA.
# File 'lib/stamina-core/stamina/sample.rb', line 304
def to_dot
  to_pta.to_dot
end