Class: SVMFeature

Inherits:
Hash
  • Object
Defined in:
lib/svmfeature.rb,
lib/svmfeature2.rb

Overview

An SVMFeature object is initialized with either a string in YAML format or a file object for a file whose contents are a configuration in YAML format.

The SVMFeature class is the ‘middleware’ between the software that calculates features and the SVMLab class. It provides:

  • Maintenance of a database of calculated features. This minimizes the CPU time needed when developing an SVM experiment: each calculation is done once and its result is reused from then on.

  • Meta features, which use other features to calculate a feature. For such a feature, a structure similar to the entire configuration should be given as its configuration.
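The "calculate once" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SVMFeature's own code: the class name FeatureCache, its fetch method, and the file name length.yml are hypothetical, and the whole store is a single YAML file without any locking.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical "calculate once" store; not part of SVMFeature.
class FeatureCache
  def initialize(path)
    @path = path
    # Load previously stored values, or start with an empty store.
    @store = File.file?(path) ? (YAML.load_file(path) || {}) : {}
  end

  # Returns the stored value for key; on a miss, runs the block,
  # saves its result to disk, and returns it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield(key)
    File.write(@path, @store.to_yaml)
    @store[key]
  end
end

Dir.mktmpdir do |dir|
  cache = FeatureCache.new(File.join(dir, 'length.yml'))
  calls = 0
  2.times { cache.fetch('ex1') { |name| calls += 1; name.length.to_s } }
  # prints "value: 3, calculations: 1"
  puts "value: #{cache.fetch('ex1') { raise 'not reached' }}, calculations: #{calls}"
end
```

Values are stored as strings here because the listings on this page also exchange feature values as strings.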

Configuration

SVMFeature is initialized from a configuration in YAML format. All paths can be given either as absolute paths or relative to BaseDir. Required fields in the configuration are:

  Features:
    - <targetfeature>
    - <feature1>
    - <feature2>
    …
  BaseDir: <base directory>
  DataSet: <path of file giving the dataset>
             OR
           <prefix of a set of files giving the dataset>
  Example: <name(s) of examples to use; consulted only if DataSet is not given>
  Groups:  <range (n1..n2) or (n1...n2) in example name to use for grouping>
             OR
           <file prefix relative to BaseDir for files giving groups>
  Methods: <path of .rb file holding all feature calculation methods>


  targetfeature:
    HomeDir: <home directory>
        If HomeDir is not given, it is set to BaseDir/featurename.
    Method: <the method calculating this feature>
        If Method is not given, an attempt is made to acquire
        the feature from the database. If that fails, ERROR is reported.
    Dimensions: <the number of dimensions in this feature>
        If Dimensions is not given, it is assumed to be 1
        and only the first value for each example is used.
    <Further specific configuration of this feature>

  feature1:
    HomeDir: <home directory>
    Method: <the method calculating this feature>
    Dimensions: <the number of dimensions in this feature>
    <Further specific configuration of this feature>

  feature2:
  …
  featureN:

If the configuration for featureN is not given, all default settings apply to that feature.
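As an illustration of the layout described above, the following sketch parses a small configuration and fills in the two documented defaults (HomeDir and Dimensions). Everything specific in it is an assumption made for the example: the feature names, method names, and paths are invented, with_defaults is a hypothetical helper, and resolving relative paths against BaseDir is left out.

```ruby
require 'yaml'

# Hypothetical configuration; feature names, methods and paths are made up.
EXAMPLE_CONFIG = <<~YAML
  Features:
    - activity
    - length
    - composition
  BaseDir: /tmp/svm
  Example: [ex1, ex2]
  Methods: methods.rb
  activity:
    Method: calc_activity
  composition:
    HomeDir: /data/composition
    Method: calc_composition
    Dimensions: 20
YAML

# Fills in the documented defaults for every listed feature:
# HomeDir = BaseDir/featurename and Dimensions = 1.
def with_defaults(cfg)
  cfg['Features'].each do |name|
    fcfg = (cfg[name] ||= {})
    fcfg['HomeDir']    ||= File.join(cfg['BaseDir'], name)
    fcfg['Dimensions'] ||= 1
  end
  cfg
end

cfg = with_defaults(YAML.load(EXAMPLE_CONFIG))
cfg['Features'].each do |name|
  puts "#{name}: #{cfg[name]['HomeDir']} (#{cfg[name]['Dimensions']})"
end
```

The feature named length has no section of its own in this example, so it receives every default.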

Methods inherited from Hash

#cc, #to_xy

Constructor Details

#initialize(config) ⇒ SVMFeature

config is either a file object or a string object.



# File 'lib/svmfeature.rb', line 69

def initialize(config)
  @config = SVMFeaturesConfig.new(config)
  #Get examples
  @examples = 
    if dataset = @config['DataSet']
      dir = dataset.split(/\//)[0...-1].join('/')+'/'
      if File::file?(dataset)
        open(dataset) { |f| f.read }.split
      elsif (files = Dir::entries(dir).
             grep(/^#{dataset.split(/\//).last}/)).
             size>0
        files.inject([]) { |exarray,fname| 
          exarray += open(dir+fname){|f| f.read}.split }          
      end
    elsif example = @config['Example']
      if example.is_a? Array
        example
      else
        [ example ]
      end
    end
  # Set @feature to an empty hash
  @feature = {}
  #tmparr = @examples.forkoff :processes => 1 do |ex|
  #tmparr = @examples.forkmap 1 do |ex|
  tmparr = @examples.map do |ex|
    begin
      getExAllFeatures(ex)
    rescue
      STDERR.puts $!
      $!
    end
  end
  @examples.zip(tmparr).each do |k,v|
    self[k] = v
  end
end
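The DataSet lookup in the constructor (a single file, or every file sharing a prefix) can be sketched on its own. This is an illustration only: read_examples is a hypothetical helper, it matches the prefix literally rather than as a regular expression, it reads files in sorted order, and it returns an empty array when nothing matches, which are choices made for this example.

```ruby
require 'tmpdir'

# Reads whitespace-separated example names from a file, or from every
# regular file in the same directory whose name starts with the prefix.
def read_examples(dataset)
  return File.read(dataset).split if File.file?(dataset)
  dir, prefix = File.split(dataset)
  return [] unless File.directory?(dir)
  Dir.children(dir).sort.flat_map do |name|
    path = File.join(dir, name)
    next [] unless name.start_with?(prefix) && File.file?(path)
    File.read(path).split
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'set1'), "ex1 ex2\n")
  File.write(File.join(dir, 'set2'), "ex3\n")
  p read_examples(File.join(dir, 'set1'))  # prints ["ex1", "ex2"]
  p read_examples(File.join(dir, 'set'))   # prints ["ex1", "ex2", "ex3"]
end
```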

Instance Attribute Details

#cfg ⇒ Object (readonly)

Returns the value of attribute cfg.



# File 'lib/svmfeature2.rb', line 3

def cfg
  @cfg
end

#dim ⇒ Object (readonly)

Returns the value of attribute dim.



# File 'lib/svmfeature.rb', line 66

def dim
  @dim
end

Instance Method Details

#[](key) ⇒ Object

If indexed with a regular expression, returns a new SVMPrediction object containing all elements whose keys match. Any other key is looked up as in Hash.



# File 'lib/svmfeature.rb', line 325

def [](expr)
  if expr.is_a? Regexp
    subs = SVMPrediction.new
    self.find_all { |(k,v)| k =~ expr }.each do |i|
      subs[i[0]] = i[1]
    end
    subs
  else
    super(expr)
  end
end
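The same indexing pattern can be tried in isolation on any Hash subclass. The sketch below is an assumption-laden stand-in: PatternHash is a hypothetical class, and it returns an instance of its own class rather than an SVMPrediction.

```ruby
# Hypothetical stand-in for a Hash subclass with regexp indexing.
class PatternHash < Hash
  def [](key)
    # Ordinary keys fall through to Hash#[].
    return super unless key.is_a?(Regexp)
    # Collect matching entries into a new instance of the same class.
    each_with_object(self.class.new) do |(k, v), subset|
      subset[k] = v if k =~ key
    end
  end
end

h = PatternHash.new
h['1abc_A'] = [1.0]
h['1abc_B'] = [2.0]
h['2xyz_A'] = [3.0]
p h[/^1abc/].keys   # prints ["1abc_A", "1abc_B"]
p h['2xyz_A']       # prints [3.0]
```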

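#featname(index) ⇒ Object

Returns the name of the feature column at position index, counting columns across the features in the order listed under Features. A feature with one dimension is named by its feature name; a feature with several dimensions is named featurename_k, where k is the zero-based position within that feature. Returns an empty string if index lies beyond the last column.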



# File 'lib/svmfeature.rb', line 293

def featname(index)
  i = 0
  s = ''
  @config['Features'].each do |feature|
    if i + @config[feature]['Dimensions'] > index
      if s==''
        s = 
          if @config[feature]['Dimensions']==1
            feature
          else
            "#{feature}_#{index - i}"
          end
      end
    else
      i += @config[feature]['Dimensions']
    end
  end
  s
end
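The column naming can be checked with a standalone version that takes the dimensions as plain data. This is a sketch, not the library method: column_name and the feature names are hypothetical, and it returns nil rather than an empty string for an index beyond the last column.

```ruby
# Maps a column index to a feature name, given ordered [name, dimensions]
# pairs. Multi-dimensional features get a zero-based suffix.
def column_name(dimensions, index)
  offset = 0
  dimensions.each do |name, dim|
    if index < offset + dim
      return dim == 1 ? name : "#{name}_#{index - offset}"
    end
    offset += dim
  end
  nil # index lies beyond the last column
end

dims = [['activity', 1], ['composition', 3], ['length', 1]]
(0..4).each { |i| puts column_name(dims, i) }
# prints activity, composition_0, composition_1, composition_2, length
# (one name per line)
```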

#getAllFeatures ⇒ Object

Returns a hash covering the examples in the data set:

key : example name
value : array of values (float)

getAllFeatures depends on a data set being given in the configuration. Examples for which a feature cannot be obtained are left out of the hash.



# File 'lib/svmfeature.rb', line 257

def getAllFeatures()
  @examples.inject({}) { |output,example|
    begin
      output[example] = getExAllFeatures(example)
    rescue
      #puts "Excluding #{example}"
    end
    output
  }
end

#getDataFile(feature) ⇒ Object

Returns a string giving the path to the file holding the feature for the current settings. Returns nil if the current settings do not match any file.



# File 'lib/svmfeature.rb', line 110

def getDataFile(feature)
  dfile = nil
  filemapfname = @config[feature]['HomeDir'] + 'filemap.yml'
  return nil if !File.file?(filemapfname)
  open(filemapfname,'r') do |f|
    if f.flock(File::LOCK_SH) 
      begin
        filemap = YAML.load(f)
        if cfg = filemap.find {|c| c['Config'] == @config[feature]}
          dfile = @config[feature]['HomeDir'] + cfg['FeatureFile'] 
        else
          dfile = nil
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  dfile
end

#getDataFileToRead(feature) ⇒ Object

Returns a string giving the path to the file holding the feature for the current settings. Returns nil if the current settings do not match any file.



# File 'lib/svmfeature2.rb', line 56

def getDataFileToRead(feature)
  dfile = nil
  filemapfname = File.join(@cfg[feature]['HomeDir'], 'filemap.yml')
  return nil if !File.file?(filemapfname)
  open(filemapfname,'r') do |f|
    if f.flock(File::LOCK_SH) 
      begin
        filemap = YAML.load(f)
        if cfg = filemap.find {|c| c['Config'] == @cfg[feature]}
          dfile = File.join(@cfg[feature]['HomeDir'], cfg['FeatureFile']) 
        else
          dfile = nil
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  dfile
end

#getDataFileToWrite(feature) ⇒ Object



# File 'lib/svmfeature2.rb', line 77

def getDataFileToWrite(feature)
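  # Assign first: in "return file if file=..." Ruby resolves the leading
  # "file" as a method call and raises NameError.
  file = getDataFileToRead(feature)
  return file if file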
  file = File.join(@cfg[feature]['HomeDir'], 'filemap.yml')
  open(file,'w') {} if !File.file?(file)
  open(file,'r+') do |f|
    if f.flock(File::LOCK_EX) 
      begin
        filemap = (tmp=YAML.load(f)) ? tmp : []
        newdatafile = feature + filemap.size.to_s + '.yml'
        filemap.push({'FeatureFile' => newdatafile,
                       'Config'     => @config[feature] })
        f.rewind
        YAML.dump(filemap,f)
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  file
end
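The filemap.yml bookkeeping used by the getDataFile family (find the entry whose Config matches, otherwise register a new data file) can be sketched as one self-contained function. This is an illustration under assumptions, not the library's implementation: data_file_for is a hypothetical helper, and unlike the listing above it returns the path of the data file in both cases.

```ruby
require 'yaml'
require 'tmpdir'

# Returns the data file registered for config in dir/filemap.yml,
# registering a new entry when no existing entry matches.
def data_file_for(dir, feature, config)
  mapfile = File.join(dir, 'filemap.yml')
  File.open(mapfile, File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX) # released when the file is closed
    filemap = YAML.load(f.read) || []
    entry = filemap.find { |e| e['Config'] == config }
    unless entry
      entry = { 'FeatureFile' => "#{feature}#{filemap.size}.yml",
                'Config'      => config }
      filemap << entry
      # Rewrite the whole map in place while still holding the lock.
      f.rewind
      f.truncate(0)
      f.write(filemap.to_yaml)
    end
    File.join(dir, entry['FeatureFile'])
  end
end

Dir.mktmpdir do |dir|
  config = { 'Method' => 'calc_length', 'Dimensions' => 1 }
  puts File.basename(data_file_for(dir, 'length', config))  # prints length0.yml
  puts File.basename(data_file_for(dir, 'length', config))  # prints length0.yml
end
```

Keying the map on the full per-feature configuration means that changing any setting of a feature leads to a fresh data file instead of reusing values calculated under other settings.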

#getExAllFeatures(example) ⇒ Object

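Returns an array of floats containing all features for the given example. If PosClassFrom is set in the configuration, the first (target) feature is replaced by 1 when its first value is greater than or equal to PosClassFrom, and by -1 otherwise.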



# File 'lib/svmfeature.rb', line 242

def getExAllFeatures(example)
  @config['Features'].inject([]) do |array,feature|
    f = getExFeature(example, feature)
    if array.empty? and (c=@config['PosClassFrom'])
      f.first >= c ? [1] : [-1]
    else
      array + f 
    end
  end
end
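The row assembly, including the optional thresholding of the target by PosClassFrom, can be illustrated without the rest of the class. The helper name build_row and the sample values are hypothetical.

```ruby
# Concatenates per-feature value arrays into one row. The first array is
# the target; with a threshold it is replaced by a +1/-1 class label.
def build_row(feature_values, pos_class_from = nil)
  target, *rest = feature_values
  label = pos_class_from ? [target.first >= pos_class_from ? 1 : -1] : target
  label + rest.flatten
end

p build_row([[0.8], [3.0], [0.1, 0.2]])        # prints [0.8, 3.0, 0.1, 0.2]
p build_row([[0.8], [3.0], [0.1, 0.2]], 0.5)   # prints [1, 3.0, 0.1, 0.2]
p build_row([[0.2], [3.0], [0.1, 0.2]], 0.5)   # prints [-1, 3.0, 0.1, 0.2]
```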

#getExFeature(example, feature) ⇒ Object

Returns an array of floats giving the selected feature of the selected example.



# File 'lib/svmfeature.rb', line 132

def getExFeature(example, feature)
  x = getExFeatureInternal(example, feature)
  if !x then raise "ERROR: #{feature} is nil." end
  if x =~ /^ERROR/
    raise "ERROR (#{feature}): #{x.split[1..-1].join(' ')}"
  end
  if (dim=x.split.size) != @config[feature]['Dimensions']
    raise "ERROR (#{feature}): Number of dimensions (#{dim})" + 
          " for #{example} is not correct"
  end
  x.split.map{ |v| Float(v)}
end
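The validation performed here (reject nil, reject ERROR strings, check the number of values, convert to floats) can be sketched as a pure function. The name parse_feature and the use of ArgumentError are assumptions made for this example; the listing above raises plain runtime errors.

```ruby
# Converts a whitespace-separated value string into floats, raising on
# nil input, on error markers, or on an unexpected number of values.
def parse_feature(raw, dimensions)
  raise ArgumentError, 'feature is nil' if raw.nil?
  raise ArgumentError, raw if raw.start_with?('ERROR')
  values = raw.split
  unless values.size == dimensions
    raise ArgumentError, "expected #{dimensions} values, got #{values.size}"
  end
  # Float() is strict: it raises on text that is not a number.
  values.map { |v| Float(v) }
end

p parse_feature('0.5 1 2e-1', 3)   # prints [0.5, 1.0, 0.2]
```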

#getExFeatureInternal(example, feature) ⇒ Object

Returns a string giving the selected feature of the selected example.



# File 'lib/svmfeature.rb', line 146

def getExFeatureInternal(example,feature)
  val = nil

  # 0. Check the hash
  if @feature[feature] and @feature[feature][example]
    return @feature[feature][example]
  end

  # 1. Look in database if value is available
  if dfile = getDataFile(feature)
    open(dfile,'r') do |f|
      if f.flock(File::LOCK_SH) 
        begin
          val = if @feature[feature] = YAML.load(f) 
                then @feature[feature][example]
                else nil end
        ensure
          f.flock(File::LOCK_UN)
        end
      end
    end
  end
  return val if val

  # 2. Calculate the value
  calhash = {}
  begin
    method = @config[feature]['Method']
    raise ArgumentError, "No method given for calculation" if !method
    calhash = eval("#{@config[feature]['Method']}(@config[feature],example)")
    if !calhash.is_a? Hash
      raise "Incorrect output format from #{@config[feature]['Method']}"
    end
    calhash.each do |k,v|
      if !v.is_a? String
        raise "Incorrect output class (#{v.class})" +
              "from #{@config[feature]['Method']}"
      end
    end
  rescue ArgumentError
    raise 
  rescue NameError
    raise NameError, "Method #{method} not found."
  rescue
    error = 'ERROR:' + $!.to_s.split(/\n/).shift.split(/\:/).pop
    calhash = {example => error}
  end
  if !calhash or !calhash[example]
    calhash[example] = 'ERROR: No output from method'
  end
  val = calhash[example]

  # Update filemap.yml
  if !getDataFile(feature)
    filemapfname = @config[feature]['HomeDir'] + 'filemap.yml'
    open(filemapfname,'w') {} if !File.file?(filemapfname)
    open(filemapfname,'r+') do |f|
      if f.flock(File::LOCK_EX) 
        begin
          filemap = if tmp = YAML.load(f) then tmp
                    else [] end
          datafile = feature + filemap.size.to_s + '.yml'
          filemap.push({'FeatureFile' => datafile,
                         'Config'     => @config[feature] })
          f.rewind
          YAML.dump(filemap,f)
        ensure
          f.flock(File::LOCK_UN)
        end
      end
    end
  end

  # Add all outcome of calculation to the database
  dfile = getDataFile(feature)
  open(dfile,'a+') do |f|
    if f.flock(File::LOCK_EX) 
      begin
        oldhash = YAML.load(f)
        f.puts '--- ' if !oldhash 
        calhash.each do |k,v|
          if !oldhash or !oldhash[k]
            h = {k => v}
            f.puts h.to_yaml[5..-1]
          end
        end
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
  puts "#{DateTime.now}, Calculated #{example}, #{feature} = #{val}"
  val
end
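Step 2 of the listing above (call the configured method, check its output, turn failures into ERROR strings) can be sketched without eval. This is a swapped-in technique, not what the listing does: it dispatches with public_send on a module, and the module FeatureMethods, its two methods, and the helper calculate are all hypothetical.

```ruby
# Hypothetical feature methods: each returns {example => value string}.
module FeatureMethods
  def self.calc_length(_config, example)
    { example => example.length.to_s }
  end

  def self.calc_broken(_config, _example)
    raise 'input file missing'
  end
end

# Calls the named method and returns the value string for one example.
# An unknown method name raises NameError; any failure inside the
# calculation is reported as an "ERROR: ..." string instead.
def calculate(method_name, config, example)
  unless FeatureMethods.respond_to?(method_name)
    raise NameError, "Method #{method_name} not found."
  end
  begin
    result = FeatureMethods.public_send(method_name, config, example)
    unless result.is_a?(Hash) && result.values.all? { |v| v.is_a?(String) }
      raise "Incorrect output format from #{method_name}"
    end
    result[example] || 'ERROR: No output from method'
  rescue StandardError => e
    "ERROR: #{e.message}"
  end
end

puts calculate('calc_length', {}, 'ex1')   # prints 3
puts calculate('calc_broken', {}, 'ex1')   # prints ERROR: input file missing
```

Storing the ERROR string like any other value keeps a failed calculation from being repeated on the next run, which matches the "calculate once" goal stated in the overview.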

#getTopRanking(n = 0) ⇒ Object

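Returns a string listing the n examples with the highest target feature values, one per line, as the example name and the value separated by a tab. With the default n = 0, all examples are listed.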



# File 'lib/svmfeature.rb', line 287

def getTopRanking(n = 0)
  self.sort{ |(k1,v1),(k2,v2)| v1[0]<=>v2[0]}.
    reverse[0..(n-1)].map{|i| "#{i.first}\t#{i[1].first}"}.
    join("\n")
end
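A standalone version of the ranking, operating on a plain hash of example name to value array, might look as follows. The helper top_ranking and the sample data are hypothetical; the target is taken to be the first value of each array, as in the listing above.

```ruby
# Ranks examples by their first (target) value, highest first, and
# formats the top n as "name<TAB>value" lines. n = 0 keeps all of them.
def top_ranking(rows, n = 0)
  ranked = rows.sort_by { |_name, values| -values.first }
  ranked = ranked.first(n) if n > 0
  ranked.map { |name, values| "#{name}\t#{values.first}" }.join("\n")
end

rows = { 'ex1' => [0.2, 1.0], 'ex2' => [0.9, 4.0], 'ex3' => [0.5, 2.0] }
puts top_ranking(rows, 2)
# prints two lines: "ex2<TAB>0.9" and "ex3<TAB>0.5"
```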

#printFeatures(file = nil) ⇒ Object

Prints all examples and their features. Prints to a file if one is given, otherwise to standard output.



# File 'lib/svmfeature.rb', line 270

def printFeatures(file = nil)
  features = self.getAllFeatures
  if file
    open(file,'w') do |f|
      features.each do |k,v|
        f.puts k + ' ' + v.join(' ')
      end
    end
  else
    features.each do |k,v|
      puts k + ' ' + v.join(' ')
    end
  end      
  nil
end

#readDatabase(feature) ⇒ Object



# File 'lib/svmfeature2.rb', line 39

def readDatabase(feature)
  file = getDataFileToRead(feature)
  return {} unless file
  open(file,'r') do |f|
    if f.flock(File::LOCK_SH) 
      begin
        YAML.load(f) 
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
end

#to_s ⇒ Object



# File 'lib/svmfeature.rb', line 313

def to_s
  (0...@config.dim).map do |i|
    self.featname(i)
  end.join(' ') + "\n" +
  self.keys.sort.map do |key|
    "#{key} #{self[key].join(' ')}"
  end.join("\n")
end

#updateFromDatabases! ⇒ Object

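Reads the stored values of every configured feature and keeps each example that has a value without an ERROR marker for all of them.

The comment attached to this method in the source appears to be a commented-out definition of []= rather than a description:

    def []=(key,val)
      STDERR.puts "You can't just PUT a value in SVMFeature..."
      nil
    end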



# File 'lib/svmfeature2.rb', line 25

def updateFromDatabases!
  @features = @cfg['Features'].map do |feature|
             readDatabase(feature)
           end
  sizes = @features.map{|h| h.size}
  minsize = sizes.min
  index = sizes.index{|i| i==minsize}
  @features[index].each do |k,v|
    unless @features.find{|h| !h[k] or h[k]=~/ERROR/}
      self[k] = @features.map{|h| h[k]}#.join(' ')
    end
  end
end
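The join performed by updateFromDatabases! can be illustrated on plain hashes. This is a sketch with assumed names: merge_features is a hypothetical helper, and each database is taken to be a hash of example name to value string, as returned by readDatabase.

```ruby
# Joins per-feature hashes (example => value string) on the example name,
# dropping examples that are missing or marked ERROR in any feature.
# Iterating over the smallest hash is enough: an example absent from it
# could not be complete anyway.
def merge_features(databases)
  smallest = databases.min_by(&:size)
  smallest.keys.each_with_object({}) do |example, merged|
    values = databases.map { |db| db[example] }
    next if values.any? { |v| v.nil? || v =~ /ERROR/ }
    merged[example] = values
  end
end

databases = [
  { 'ex1' => '0.8', 'ex2' => '0.1', 'ex3' => '0.4' },
  { 'ex1' => '12',  'ex2' => 'ERROR: no input' },
  { 'ex1' => '0.3 0.7', 'ex2' => '0.5 0.5', 'ex3' => '0.9 0.1' }
]
p merge_features(databases)   # prints {"ex1"=>["0.8", "12", "0.3 0.7"]} (spacing depends on the Ruby version)
```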