Class: SVMFeature
Overview
An SVMFeature object is initialized with either a string in YAML format or a file object pointing to a file with configuration given in YAML format.
The SVMFeature class is the ‘middleware’ between software that calculates features and the SVMLab class. Its main features are:
- Maintenance of a database of calculated features. This minimizes the CPU time needed when developing an SVM experiment; the motto is that a calculation should be done only once and then never again.
- Meta features can be defined that use other features to calculate a feature. In these cases, a structure similar to the entire configuration should be given for this feature.
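The compute-once idea behind the feature database can be sketched in a few lines. This is not SVMFeature's own storage code (which keeps a filemap.yml per feature home directory and uses file locks); the class name FeatureCache and the one-YAML-file-per-feature layout are assumptions made for illustration.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical compute-once store: one YAML file per feature, mapping
# example name => value string. A value is calculated only on a miss.
class FeatureCache
  attr_reader :calculations

  def initialize(dir)
    @dir = dir
    @calculations = 0 # how often the (expensive) block actually ran
  end

  # Return the stored value, or compute it with the block and store it.
  def fetch(feature, example)
    path = File.join(@dir, "#{feature}.yml")
    table = File.file?(path) ? (YAML.load_file(path) || {}) : {}
    return table[example] if table.key?(example)

    @calculations += 1
    table[example] = yield(example)
    File.write(path, YAML.dump(table))
    table[example]
  end
end

dir = Dir.mktmpdir
cache = FeatureCache.new(dir)
first = cache.fetch('length', 'seq1') { |ex| ex.size.to_s }   # computed
second = cache.fetch('length', 'seq1') { raise 'recomputed' } # read back
cache.calculations # => 1
```

A second FeatureCache opened on the same directory also finds the stored value, which is what lets results survive between runs of an experiment.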
—CONFIGURATION—
SVMFeature is initialized from a configuration in YAML format. The configuration fields are listed below (all paths can be given either as absolute paths or relative to BaseDir):
Features:
  - <targetfeature>
  - <feature1>
  - <feature2>
  …
BaseDir: <base directory>
DataSet: <path of file giving the dataset>
         OR
         <prefix of a set of files giving the dataset>
Example: <name(s) of examples to use if DataSet is not given>
Groups: <range (n1..n2) or (n1...n2) in example name to use for grouping>
        OR
        <file prefix relative to BaseDir for files giving groups>
Methods: <path of .rb file holding all feature calculation methods>
targetfeature:
  HomeDir: <home directory>
    If HomeDir is not given, it is set to BaseDir/featurename.
  Method: <the method calculating this feature>
    If Method is not given, an attempt is made to acquire
    the feature from the database. If that fails, ERROR is reported.
  Dimensions: <the number of dimensions in this feature>
    If Dimensions is not given, it is assumed to be 1
    and only the first value for each example is used.
  <Further specific configuration of this feature>
feature1:
  HomeDir: <home directory>
  Method: <the method calculating this feature>
  Dimensions: <the number of dimensions in this feature>
  <Further specific configuration of this feature>
feature2:
…
featureN:
  If the featureN configuration is not given, then all default settings
  apply to this feature.
…
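A concrete configuration following this skeleton might look like the string below; the feature names, paths and method names are invented for illustration. The helper with_defaults is a hypothetical sketch of the two documented defaults (HomeDir falls back to BaseDir/featurename, Dimensions to 1), not the library's SVMFeaturesConfig class. It uses File.join, whereas the listed getDataFile concatenates HomeDir and 'filemap.yml' directly, which suggests the real class stores HomeDir with a trailing slash.

```ruby
require 'yaml'

config_text = <<~YAML
  Features:
    - activity
    - length
    - charge
  BaseDir: /tmp/experiment
  DataSet: examples.txt
  Methods: methods.rb
  activity:
    Method: calc_activity
  length:
    Method: calc_length
    Dimensions: 2
YAML

# Hypothetical helper: apply the documented per-feature defaults.
# 'charge' has no section of its own, so it gets the defaults only.
def with_defaults(cfg)
  cfg['Features'].each do |name|
    feat = (cfg[name] ||= {})
    feat['HomeDir'] ||= File.join(cfg['BaseDir'], name)
    feat['Dimensions'] ||= 1
  end
  cfg
end

cfg = with_defaults(YAML.load(config_text))
cfg['activity']['HomeDir']  # => "/tmp/experiment/activity"
cfg['length']['Dimensions'] # => 2
cfg['charge']['Dimensions'] # => 1
```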
Instance Attribute Summary
- #cfg ⇒ Object (readonly)
  Returns the value of attribute cfg.
- #dim ⇒ Object (readonly)
  Returns the value of attribute dim.
Instance Method Summary
- #[](key) ⇒ Object
  If indexing with a regular expression, a new SVMPrediction object is created containing all elements with matching keys.
- #featname(index) ⇒ Object
- #getAllFeatures ⇒ Object
  Returns a hash for all examples in the data set (key: example name; value: array of values (float)). getAllFeatures depends on a data set being given in the configuration.
- #getDataFile(feature) ⇒ Object
  Returns a string giving the path to the file holding the feature for the current settings.
- #getDataFileToRead(feature) ⇒ Object
  Returns a string giving the path to the file holding the feature for the current settings.
- #getDataFileToWrite(feature) ⇒ Object
- #getExAllFeatures(example) ⇒ Object
  Returns an array of floats containing all features for the given example.
- #getExFeature(example, feature) ⇒ Object
  Returns an array of floats giving the selected feature of the selected example.
- #getExFeatureInternal(example, feature) ⇒ Object
  Returns a string giving the selected feature of the selected example.
- #getTopRanking(n = 0) ⇒ Object
  Returns a string of the n examples with the highest target feature.
- #initialize(config) ⇒ SVMFeature (constructor)
  config is either a file object or a string object.
- #printFeatures(file = nil) ⇒ Object
  Prints all examples and their features. Prints to a file if given, otherwise to standard output.
- #readDatabase(feature) ⇒ Object
- #to_s ⇒ Object
- #updateFromDatabases! ⇒ Object
  def []=(key,val) STDERR.puts "You can't just PUT a value in SVMFeature..." nil end
Methods inherited from Hash
Constructor Details
#initialize(config) ⇒ SVMFeature
config is either a file object or a string object.
# File 'lib/svmfeature.rb', line 69

def initialize(config)
  @config = SVMFeaturesConfig.new(config)
  #Get examples
  @examples = if dataset = @config['DataSet']
    dir = dataset.split(/\//)[0...-1].join('/')+'/'
    if File::file?(dataset)
      open(dataset) { |f| f.read }.split
    elsif (files = Dir::entries(dir).
                   grep(/^#{dataset.split(/\//).last}/)).
        size>0
      files.inject([]) { |exarray,fname|
        exarray += open(dir+fname){|f| f.read}.split
      }
    end
  elsif example = @config['Example']
    if example.is_a? Array
      example
    else
      [ example ]
    end
  end
  # Set @feature to an empty hash
  @feature = {}
  #tmparr = @examples.forkoff :processes => 1 do |ex|
  #tmparr = @examples.forkmap 1 do |ex|
  tmparr = @examples.map do |ex|
    begin
      getExAllFeatures(ex)
    rescue
      STDERR.puts $!
      $!
    end
  end
  @examples.zip(tmparr).each do |k,v|
    self[k] = v
  end
end
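The DataSet handling at the top of initialize either reads one file or, if DataSet is not a file, reads every file in its directory whose name starts with the last path component. The function below is a standalone sketch of that rule under a hypothetical name; it matches the prefix with String#start_with? instead of the listing's regular expression, sorts the file names for a deterministic order, and returns an empty array when nothing matches (the listing leaves @examples as nil in that case).

```ruby
require 'tmpdir'

# Hypothetical sketch of the DataSet rule in SVMFeature#initialize:
# read the file if DataSet names one; otherwise treat the last path
# component as a prefix and read every matching file in that directory.
def load_examples(dataset)
  return File.read(dataset).split if File.file?(dataset)

  dir = File.dirname(dataset)
  prefix = File.basename(dataset)
  names = Dir.children(dir).select { |name| name.start_with?(prefix) }.sort
  names.flat_map { |name| File.read(File.join(dir, name)).split }
end

data_dir = Dir.mktmpdir
File.write(File.join(data_dir, 'set_a.txt'), "ex1\nex2\n")
File.write(File.join(data_dir, 'set_b.txt'), "ex3\n")
single = load_examples(File.join(data_dir, 'set_a.txt')) # => ["ex1", "ex2"]
by_prefix = load_examples(File.join(data_dir, 'set_'))   # => ["ex1", "ex2", "ex3"]
```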
Instance Attribute Details
#cfg ⇒ Object (readonly)
Returns the value of attribute cfg.
# File 'lib/svmfeature2.rb', line 3

def cfg
  @cfg
end
#dim ⇒ Object (readonly)
Returns the value of attribute dim.
# File 'lib/svmfeature.rb', line 66

def dim
  @dim
end
Instance Method Details
#[](key) ⇒ Object
If indexing with a regular expression, a new SVMPrediction object is created containing all elements with matching keys. Any other key is looked up as in Hash#[].
# File 'lib/svmfeature.rb', line 325

def [](expr)
  if expr.is_a? Regexp
    subs = SVMPrediction.new
    self.find_all { |(k,v)| k =~ expr }.each do |i|
      subs[i[0]] = i[1]
    end
    subs
  else
    super(expr)
  end
end
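The Regexp branch of [] amounts to selecting the entries whose keys match. The sketch below uses a hypothetical Hash subclass and returns a plain Hash instead of an SVMPrediction, whose interface is not documented here.

```ruby
# Hypothetical stand-in for SVMFeature#[]: a Regexp key returns a new
# hash with the matching entries; any other key behaves like Hash#[].
class RegexpHash < Hash
  def [](key)
    return super unless key.is_a?(Regexp)

    select { |k, _v| k.to_s =~ key }
  end
end

h = RegexpHash.new
h['protA_1'] = [1.0]
h['protA_2'] = [0.5]
h['protB_1'] = [0.2]
subset = h[/^protA/] # the two protA entries
h['protB_1']         # => [0.2]
```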
#featname(index) ⇒ Object
# File 'lib/svmfeature.rb', line 293

def featname(index)
  i = 0
  s = ''
  @config['Features'].each do |feature|
    if i + @config[feature]['Dimensions'] > index
      if s==''
        s = if @config[feature]['Dimensions']==1
              feature
            else
              "#{feature}_#{index - i}"
            end
      end
    else
      i += @config[feature]['Dimensions']
    end
  end
  s
end
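featname maps a position in the flat feature vector back to a feature name by walking the features in configuration order and summing their Dimensions. A standalone sketch, with the configuration passed in as a feature list and a hypothetical name => Dimensions hash (defaulting to 1):

```ruby
# Hypothetical standalone version of featname.
def feature_name(features, dims, index)
  offset = 0
  features.each do |name|
    d = dims.fetch(name, 1)
    # The index falls inside this feature's block of columns.
    return(d == 1 ? name : "#{name}_#{index - offset}") if index < offset + d

    offset += d
  end
  '' # index past the last column, as in the listing
end

features = %w[target length composition]
dims = { 'target' => 1, 'length' => 1, 'composition' => 3 }
names = (0...5).map { |i| feature_name(features, dims, i) }
# => ["target", "length", "composition_0", "composition_1", "composition_2"]
```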
#getAllFeatures ⇒ Object
Returns a hash for all examples in the data set
key : example name
value : array of values (float)
Examples whose features cannot be calculated (getExAllFeatures raises an error) are left out.
getAllFeatures depends on a data set being given in the configuration.
# File 'lib/svmfeature.rb', line 257

def getAllFeatures()
  @examples.inject({}) { |output,example|
    begin
      output[example] = getExAllFeatures(example)
    rescue
      #puts "Excluding #{example}"
    end
    output
  }
end
#getDataFile(feature) ⇒ Object
Returns a string giving the path to the file holding the feature for the current settings. Returns nil if the current settings do not match any file.
# File 'lib/svmfeature.rb', line 110

def getDataFile(feature)
  dfile = nil
  filemapfname = @config[feature]['HomeDir'] + 'filemap.yml'
  return nil if !File.file?(filemapfname)
  open(filemapfname,'r') do |f|
    if f.flock(File::LOCK_SH)
      begin
        filemap = YAML.load(f)
        if cfg = filemap.find {|c| c['Config'] == @config[feature]}
          dfile = @config[feature]['HomeDir'] + cfg['FeatureFile']
        else
          dfile = nil
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  dfile
end
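The lookup in getDataFile (and getDataFileToRead) treats filemap.yml as a list of {'FeatureFile' => ..., 'Config' => ...} entries and picks the entry whose Config equals the current feature configuration, so changing any setting of a feature maps it to a different data file. A self-contained sketch of that lookup; the function name lookup_data_file is hypothetical.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical sketch of the filemap lookup: the data file is the one
# whose recorded Config equals the current feature settings.
def lookup_data_file(home_dir, feature_cfg)
  filemap_path = File.join(home_dir, 'filemap.yml')
  return nil unless File.file?(filemap_path)

  File.open(filemap_path, 'r') do |f|
    f.flock(File::LOCK_SH) # shared lock: concurrent readers are fine
    begin
      entry = (YAML.load(f) || []).find { |e| e['Config'] == feature_cfg }
      entry && File.join(home_dir, entry['FeatureFile'])
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

home = Dir.mktmpdir
cfg_a = { 'Method' => 'calc_length', 'Dimensions' => 1 }
File.write(File.join(home, 'filemap.yml'),
           YAML.dump([{ 'FeatureFile' => 'length0.yml', 'Config' => cfg_a }]))
found = lookup_data_file(home, cfg_a)                            # .../length0.yml
missing = lookup_data_file(home, cfg_a.merge('Dimensions' => 2)) # => nil
```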
#getDataFileToRead(feature) ⇒ Object
Returns a string giving the path to the file holding the feature for the current settings. Returns nil if the current settings do not match any file.
# File 'lib/svmfeature2.rb', line 56

def getDataFileToRead(feature)
  dfile = nil
  filemapfname = File.join(@cfg[feature]['HomeDir'], 'filemap.yml')
  return nil if !File.file?(filemapfname)
  open(filemapfname,'r') do |f|
    if f.flock(File::LOCK_SH)
      begin
        filemap = YAML.load(f)
        if cfg = filemap.find {|c| c['Config'] == @cfg[feature]}
          dfile = File.join(@cfg[feature]['HomeDir'], cfg['FeatureFile'])
        else
          dfile = nil
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  dfile
end
#getDataFileToWrite(feature) ⇒ Object
# File 'lib/svmfeature2.rb', line 77

def getDataFileToWrite(feature)
  return file if file=getDataFileToRead(feature)
  file = File.join(@cfg[feature]['HomeDir'], 'filemap.yml')
  open(file,'w') {} if !File.file?(file)
  open(file,'r+') do |f|
    if f.flock(File::LOCK_EX)
      begin
        filemap = (tmp=YAML.load(f)) ? tmp : []
        newdatafile = feature + filemap.size.to_s + '.yml'
        filemap.push({'FeatureFile' => newdatafile,
                      'Config' => @config[feature] })
        f.rewind
        YAML.dump(filemap,f)
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  file
end
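Registering a new data file follows the same pattern under an exclusive lock. The sketch below is a hypothetical variant, not a drop-in replacement: it opens filemap.yml with File::RDWR | File::CREAT instead of creating it in a separate step, checks for an existing entry while holding the lock, truncates before rewriting, and returns the path of the data file, whereas the listing ends by returning the filemap path held in file.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical sketch: register <feature><n>.yml for a new configuration
# (n = number of entries already present) and return the data file path.
def register_data_file(home_dir, feature, feature_cfg)
  filemap_path = File.join(home_dir, 'filemap.yml')
  File.open(filemap_path, File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX) # exclusive lock: one writer at a time
    begin
      filemap = YAML.load(f) || [] # an empty file loads as false
      existing = filemap.find { |e| e['Config'] == feature_cfg }
      return File.join(home_dir, existing['FeatureFile']) if existing

      datafile = "#{feature}#{filemap.size}.yml"
      filemap << { 'FeatureFile' => datafile, 'Config' => feature_cfg }
      f.rewind
      f.truncate(0)
      YAML.dump(filemap, f)
      File.join(home_dir, datafile)
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

home = Dir.mktmpdir
first_path = register_data_file(home, 'length', { 'Dimensions' => 1 })  # length0.yml
second_path = register_data_file(home, 'length', { 'Dimensions' => 2 }) # length1.yml
again = register_data_file(home, 'length', { 'Dimensions' => 1 })       # length0.yml
```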
#getExAllFeatures(example) ⇒ Object
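Returns an array of floats containing all features for the given example, concatenated in the order given by Features. If PosClassFrom is set in the configuration, the first (target) feature is replaced by 1 if its first value is at least PosClassFrom, and by -1 otherwise.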
# File 'lib/svmfeature.rb', line 242

def getExAllFeatures(example)
  @config['Features'].inject([]) do |array,feature|
    f = getExFeature(example, feature)
    if array.empty? and (c=@config['PosClassFrom'])
      f.first >= c ? [1] : [-1]
    else
      array + f
    end
  end
end
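The vector assembly in getExAllFeatures can be illustrated without the feature calculation itself. In this sketch (hypothetical name assemble_vector), the per-feature value arrays are passed in directly, in configuration order with the target first:

```ruby
# Hypothetical sketch: concatenate per-feature values; with a
# PosClassFrom threshold, the target becomes a class label of 1 or -1.
def assemble_vector(values_by_feature, pos_class_from = nil)
  values_by_feature.each_with_index.flat_map do |values, i|
    if i.zero? && pos_class_from
      values.first >= pos_class_from ? [1] : [-1]
    else
      values
    end
  end
end

raw = [[0.8], [12.0], [0.1, 0.2, 0.7]] # target, length, composition
assemble_vector(raw)      # => [0.8, 12.0, 0.1, 0.2, 0.7]
assemble_vector(raw, 0.5) # => [1, 12.0, 0.1, 0.2, 0.7]
assemble_vector(raw, 0.9) # => [-1, 12.0, 0.1, 0.2, 0.7]
```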
#getExFeature(example, feature) ⇒ Object
Returns an array of floats giving the selected feature of the selected example.
# File 'lib/svmfeature.rb', line 132

def getExFeature(example, feature)
  x = getExFeatureInternal(example, feature)
  if !x then raise "ERROR: #{feature} is nil." end
  if x =~ /^ERROR/
    raise "ERROR (#{feature}): #{x.split[1..-1].join(' ')}"
  end
  if (dim=x.split.size) != @config[feature]['Dimensions']
    raise "ERROR (#{feature}): Number of dimensions (#{dim})" +
      " for #{example} is not correct"
  end
  x.split.map{ |v| Float(v)}
end
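The checks in getExFeature operate only on the stored string, so they can be shown in isolation. parse_feature is a hypothetical standalone version; it uses String#start_with? where the listing uses /^ERROR/ (which would also match ERROR at the start of a later line).

```ruby
# Hypothetical standalone version of getExFeature's checks: the stored
# value is a whitespace-separated string of numbers or an "ERROR ..." text.
def parse_feature(raw, feature, example, dimensions)
  raise "ERROR: #{feature} is nil." if raw.nil?
  raise "ERROR (#{feature}): #{raw.split[1..-1].join(' ')}" if raw.start_with?('ERROR')

  fields = raw.split
  if fields.size != dimensions
    raise "ERROR (#{feature}): Number of dimensions (#{fields.size})" \
          " for #{example} is not correct"
  end
  fields.map { |v| Float(v) } # Float() rejects non-numeric text
end

parse_feature('0.5 1.5', 'composition', 'ex1', 2) # => [0.5, 1.5]
```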
#getExFeatureInternal(example, feature) ⇒ Object
Returns a string giving the selected feature of the selected example.
# File 'lib/svmfeature.rb', line 146

def getExFeatureInternal(example,feature)
  val = nil
  # 0. Check the hash
  if @feature[feature] and @feature[feature][example]
    return @feature[feature][example]
  end
  # 1. Look in database if value is available
  if dfile = getDataFile(feature)
    open(dfile,'r') do |f|
      if f.flock(File::LOCK_SH)
        begin
          val = if @feature[feature] = YAML.load(f)
                  @feature[feature][example]
                else
                  nil
                end
        ensure
          f.flock(File::LOCK_UN)
        end
      end
    end
  end
  return val if val
  # 2. Calculate the value
  calhash = {}
  begin
    method = @config[feature]['Method']
    raise ArgumentError, "No method given for calculation" if !method
    calhash = eval("#{@config[feature]['Method']}(@config[feature],example)")
    if !calhash.is_a? Hash
      raise "Incorrect output format from #{@config[feature]['Method']}"
    end
    calhash.each do |k,v|
      if !v.is_a? String
        raise "Incorrect output class (#{v.class})" +
          "from #{@config[feature]['Method']}"
      end
    end
  rescue ArgumentError
    raise
  rescue NameError
    raise NameError, "Method #{method} not found."
  rescue
    error = 'ERROR:' + $!.to_s.split(/\n/).shift.split(/\:/).pop
    calhash = {example => error}
  end
  if !calhash or !calhash[example]
    calhash[example] = 'ERROR: No output from method'
  end
  val = calhash[example]
  # Update filemap.yml
  if !getDataFile(feature)
    filemapfname = @config[feature]['HomeDir'] + 'filemap.yml'
    open(filemapfname,'w') {} if !File.file?(filemapfname)
    open(filemapfname,'r+') do |f|
      if f.flock(File::LOCK_EX)
        begin
          filemap = if tmp = YAML.load(f) then tmp else [] end
          datafile = feature + filemap.size.to_s + '.yml'
          filemap.push({'FeatureFile' => datafile,
                        'Config' => @config[feature] })
          f.rewind
          YAML.dump(filemap,f)
        ensure
          f.flock(File::LOCK_UN)
        end
      end
    end
  end
  # Add all outcome of calculation to the database
  dfile = getDataFile(feature)
  open(dfile,'a+') do |f|
    if f.flock(File::LOCK_EX)
      begin
        oldhash = YAML.load(f)
        f.puts '--- ' if !oldhash
        calhash.each do |k,v|
          if !oldhash or !oldhash[k]
            h = {k => v}
            f.puts h.to_yaml[5..-1]
          end
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  puts "#{DateTime.now}, Calculated #{example}, #{feature} = #{val}"
  val
end
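The lookup order in getExFeatureInternal (memory, then database, then calculation) and the contract for calculation methods can be sketched for a single feature as below. The class FeatureLookup, the example methods calc_length and calc_broken, and the use of a plain Hash as the database are assumptions; the sketch resolves the method name with Object#method rather than eval, and does not write filemap.yml or log the calculation.

```ruby
# Hypothetical sketch of the three-step lookup for ONE feature.
class FeatureLookup
  attr_reader :database

  def initialize(database = {})
    @memory = {}         # 0. values already seen in this process
    @database = database # 1. stands in for the YAML data file
  end

  def value(feature_cfg, example)
    return @memory[example] if @memory.key?(example)
    return @memory[example] = @database[example] if @database.key?(example)

    # 2. Calculate: the method may return values for several examples.
    results = begin
      out = method(feature_cfg.fetch('Method')).call(feature_cfg, example)
      raise 'Incorrect output format' unless out.is_a?(Hash)
      raise 'Incorrect output class' unless out.values.all?(String)
      out
    rescue NameError, KeyError
      raise # unknown or missing method name: a configuration error
    rescue StandardError => e
      { example => "ERROR: #{e.message}" } # failures are stored as values
    end
    results[example] ||= 'ERROR: No output from method'
    results.each { |k, v| @database[k] ||= v } # never overwrite stored values
    @memory[example] = results[example]
  end
end

# Example feature methods (hypothetical), defined at top level the way
# a Methods file would define them.
def calc_length(_cfg, example)
  { example => example.size.to_s }
end

def calc_broken(_cfg, _example)
  raise 'boom'
end
```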
#getTopRanking(n = 0) ⇒ Object
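Returns a string with the n examples having the highest target feature, one "name<TAB>value" line per example in descending order. With the default n = 0, all examples are listed.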
# File 'lib/svmfeature.rb', line 287

def getTopRanking(n = 0)
  self.sort{ |(k1,v1),(k2,v2)| v1[0]<=>v2[0]}.
    reverse[0..(n-1)].map{|i| "#{i.first}\t#{i[1].first}"}.
    join("\n")
end
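A hypothetical standalone equivalent of getTopRanking on a plain Hash of example name => feature vector. With the default n = 0 the listing's reverse[0..(n-1)] becomes reverse[0..-1], so all examples are returned; the sketch keeps that behavior. Ties may be ordered differently from the listing.

```ruby
# Hypothetical sketch: rank by the first value (the target), highest
# first, and return "name<TAB>value" lines; n = 0 keeps every example.
def top_ranking(features, n = 0)
  rows = features.sort_by { |_name, values| -values.first }
  rows = rows.first(n) if n > 0
  rows.map { |name, values| "#{name}\t#{values.first}" }.join("\n")
end

features = { 'a' => [0.2, 5.0], 'b' => [0.9, 1.0], 'c' => [0.5, 3.0] }
top_ranking(features, 2) # => "b\t0.9\nc\t0.5"
```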
#printFeatures(file = nil) ⇒ Object
Prints all examples and their features. Prints to a file if given, otherwise to standard output.
# File 'lib/svmfeature.rb', line 270

def printFeatures(file = nil)
  features = self.getAllFeatures
  if file
    open(file,'w') do |f|
      features.each do |k,v|
        f.puts k + ' ' + v.join(' ')
      end
    end
  else
    features.each do |k,v|
      puts k + ' ' + v.join(' ')
    end
  end
  nil
end
#readDatabase(feature) ⇒ Object
# File 'lib/svmfeature2.rb', line 39

def readDatabase(feature)
  file = getDataFileToRead(feature)
  return {} unless file
  open(file,'r') do |f|
    if f.flock(File::LOCK_SH)
      begin
        YAML.load(f)
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
end
#to_s ⇒ Object
# File 'lib/svmfeature.rb', line 313

def to_s
  (0...@config.dim).map do |i|
    self.featname(i)
  end.join(' ') + "\n" +
  self.keys.sort.map do |key|
    "#{key} #{self[key].join(' ')}"
  end.join("\n")
end
#updateFromDatabases! ⇒ Object
def []=(key,val)
STDERR.puts "You can't just PUT a value in SVMFeature..."
nil
end
# File 'lib/svmfeature2.rb', line 25

def updateFromDatabases!
  @features = @cfg['Features'].map do |feature|
    readDatabase(feature)
  end
  sizes = @features.map{|h| h.size}
  minsize = sizes.min
  index = sizes.index{|i| i==minsize}
  @features[index].each do |k,v|
    unless @features.find{|h| !h[k] or h[k]=~/ERROR/}
      self[k] = @features.map{|h| h[k]}#.join(' ')
    end
  end
end
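updateFromDatabases! keeps only examples that have a valid value for every feature. A sketch of that merge on in-memory hashes (merge_databases is a hypothetical name; the listing reads each hash from disk with readDatabase):

```ruby
# Hypothetical sketch: one example => value hash per feature; keep the
# examples present in every hash and free of ERROR values. Iterating
# over the smallest hash keeps the outer loop short, as in the listing.
def merge_databases(databases)
  smallest = databases.min_by(&:size)
  smallest.keys.each_with_object({}) do |example, rows|
    values = databases.map { |db| db[example] }
    next if values.any? { |v| v.nil? || v.to_s.include?('ERROR') }

    rows[example] = values
  end
end

databases = [
  { 'ex1' => '1.0', 'ex2' => '0.0', 'ex3' => '1.0' },            # target
  { 'ex1' => '12', 'ex2' => 'ERROR: timeout' },                  # length
  { 'ex1' => '0.1 0.9', 'ex2' => '0.5 0.5', 'ex3' => '0.3 0.7' } # composition
]
rows = merge_databases(databases)
rows.keys # => ["ex1"]
```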