Class: Eluka::Model
- Inherits:
- Object
- Includes:
- Ferret::Analysis
- Defined in:
- lib/eluka/model.rb
Overview
A binary classifier sorts data into two classes with respect to a category (the class label):
- Data which is indicative of the category – positive data
- Data which is not indicative of the category – negative data
Model
A classifier model observes positive and negative data and learns the properties of each set. When later given an unlabelled data point, it decides whether the data point is a positive or negative instance of the category.
Internal Data Representation
A classifier model internally represents a data instance as a point in a vector space. The dimensions of the vector space are called features.
Eluka::Model
An Eluka model takes a hash of features and their values and internally processes them as points in a vector space. If the input is a string of words, as in a document, it relies on Ferret’s text analysis modules to convert it into a data point.
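To make the vector-space representation concrete, here is a minimal sketch of how a hash of features could become a sparse vector. `FeatureIndex`, `to_sparse_vector` and `text_to_features` are hypothetical names, not part of Eluka, and the word-count tokenizer is only a crude stand-in for Ferret's analysis, not a description of what it does.

```ruby
# Hypothetical helper: assigns a stable integer id to each feature name,
# starting at 1.
class FeatureIndex
  def initialize
    @ids = {}
  end

  # Returns the id for a feature, registering it on first sight.
  def id_for(feature)
    @ids[feature] ||= @ids.size + 1
  end
end

# Turns a hash of feature => value into a sparse vector { id => value }.
def to_sparse_vector(features, index)
  features.each_with_object({}) do |(name, value), vector|
    vector[index.id_for(name)] = value
  end
end

# Crude stand-in for text analysis (assumption): lowercase word counts.
def text_to_features(text)
  text.downcase.scan(/[a-z0-9]+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end
```

Each distinct feature becomes one dimension, so two data points that share a feature name share the same coordinate.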
Instance Method Summary
- #add(data, label) ⇒ Object
Add a data point to the training data.
- #build(features = nil) ⇒ Object
Build a model from the training data using LibSVM.
- #classify(data, features = nil) ⇒ Object
Classify a data point.
- #initialize(params = {}) ⇒ Model (constructor)
Initialize the classifier with sane defaults if customised data is not provided.
- #suggest_features ⇒ Object
Suggests the best set of features, chosen using fselect.py. IMPROVE: Depending on fselect.py (an unnecessary Python dependency) is stupid. TODO: Finish writing fselect.rb and integrate it.
Constructor Details
#initialize(params = {}) ⇒ Model
Initialize the classifier with sane defaults if customised data is not provided.

    # File 'lib/eluka/model.rb', line 29

    def initialize(params = {})
      #Set the labels
      @labels = Bijection.new
      @labels[:positive] = 1
      @labels[:negative] = -1
      @labels[:unknown] = 0

      @gem_root = File.expand_path(File.join(File.dirname(__FILE__), '..'))
      @bin_dir = File.expand_path(File.join(File.dirname(@gem_root), 'bin'))

      @analyzer = StandardAnalyzer.new
      @features = Eluka::Features.new
      @fv_train = Eluka::FeatureVectors.new(@features, true)
      @fv_test = nil

      @directory = (params[:directory] or "/tmp")
      @svm_train_path = (params[:svm_train_path] or "#{@bin_dir}/eluka-svm-train")
      @svm_scale_path = (params[:svm_scale_path] or "#{@bin_dir}/eluka-svm-scale")
      @svm_predict_path = (params[:svm_predict_path] or "#{@bin_dir}/eluka-svm-predict")
      @grid_py_path = (params[:grid_py_path] or "python rsvm/tools/grid.py")
      @fselect_py_path = (params[:fselect_py_path] or "python rsvm/tools/fselect.py")
      @verbose = (params[:verbose] or false)

      #Convert directory to absolute path
      Dir.chdir(@directory) do
        @directory = Dir.pwd
      end
    end
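The constructor depends on two ideas: a two-way label mapping and option defaults. The sketch below illustrates both under stated assumptions. `TinyBijection` and `resolve_options` are hypothetical names, and `TinyBijection` only mimics the three calls visible in this file (`[]=`, `[]`, `lookup`), not the real `Bijection` class. It also uses `File.expand_path` to make the directory absolute, which, unlike the `Dir.chdir` approach above, does not require the directory to exist.

```ruby
# Hypothetical stand-in for a bijection: a one-to-one mapping that can be
# read in both directions.
class TinyBijection
  def initialize
    @forward = {}
    @backward = {}
  end

  def []=(key, value)
    @forward[key] = value
    @backward[value] = key
  end

  def [](key)
    @forward[key]
  end

  # Reverse lookup: value back to key.
  def lookup(value)
    @backward[value]
  end
end

labels = TinyBijection.new
labels[:positive] = 1
labels[:negative] = -1
labels[:unknown]  = 0

# Option defaults in the style of the constructor (only two shown).
def resolve_options(params = {})
  {
    directory: File.expand_path(params[:directory] || "/tmp"),
    verbose:   params[:verbose] || false
  }
end
```

The forward direction turns a label symbol into the number written to the training file, and the reverse direction turns a predicted number back into a symbol.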
Instance Method Details
#add(data, label) ⇒ Object
Add a data point to the training data.

    # File 'lib/eluka/model.rb', line 58

    def add(data, label)
      raise "No meaningful label associated with data" unless ([:positive, :negative].include? label)

      data_point = Eluka::DataPoint.new(data, @analyzer)
      @fv_train.add(data_point.vector, @labels[label])
    end
#build(features = nil) ⇒ Object
Build a model from the training data using LibSVM.

    # File 'lib/eluka/model.rb', line 67

    def build(features = nil)
      File.open(@directory + "/train", "w") do |f|
        f.puts @fv_train.to_libSVM(features)
      end

      output = `#{@svm_train_path} #{@directory}/train #{@directory}/model`

      puts output if (@verbose)

      @fv_test = Eluka::FeatureVectors.new(@features, false)
      return output
    end
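The training file written by `build` is in LibSVM's sparse text format, where each line reads `<label> <index>:<value> ...` with indices in ascending order. The following sketch shows how one such line could be produced. `libsvm_line` is a hypothetical helper, and the actual `to_libSVM` implementation is not shown on this page, so it may differ.

```ruby
# Formats one example as a LibSVM line: "<label> <index>:<value> ...".
# `vector` is a sparse { feature_id => value } hash; pairs are emitted in
# ascending order of feature id.
def libsvm_line(label, vector)
  pairs = vector.sort.map { |id, value| "#{id}:#{value}" }
  ([label] + pairs).join(" ")
end
```

For instance, a positive example with features 1 and 3 set would be written as the label followed by two `index:value` pairs.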
#classify(data, features = nil) ⇒ Object
Classify a data point.

    # File 'lib/eluka/model.rb', line 80

    def classify(data, features = nil)
      raise "Untrained model" unless (@fv_test)

      data_point = Eluka::DataPoint.new(data, @analyzer)
      @fv_test.add(data_point.vector)

      File.open(@directory + "/classify", "w") do |f|
        f.puts @fv_test.to_libSVM(features)
      end

      output = `#{@svm_predict_path} #{@directory}/classify #{@directory}/model #{@directory}/result`

      puts output if (@verbose)

      return @labels.lookup( File.open( @directory + "/result", "r" ).read.to_i )
    end
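`String#to_i` reads only the leading integer, so the method as listed interprets the first line of the result file. If that file ever holds one prediction per line for several data points (an assumption, since `FeatureVectors` is not shown on this page), a per-line parse along the following lines may be useful. `LABELS` and `parse_predictions` are hypothetical names.

```ruby
# Numeric label => symbol, mirroring the values set in the constructor.
LABELS = { 1 => :positive, -1 => :negative, 0 => :unknown }.freeze

# Maps each non-empty line of a prediction file to a label symbol.
# Unrecognised numbers fall back to :unknown.
def parse_predictions(text)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| LABELS.fetch(line.to_i, :unknown) }
end
```

Taking the last element of the returned array would then give the prediction for the most recently added data point.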
#suggest_features ⇒ Object
Suggests the best set of features, chosen using fselect.py. IMPROVE: Depending on fselect.py (an unnecessary Python dependency) is stupid. TODO: Finish writing fselect.rb and integrate it.

    # File 'lib/eluka/model.rb', line 98

    def suggest_features
      sel_features = Array.new

      File.open(@directory + "/train", "w") do |f|
        f.puts @fv_train.to_libSVM
      end

      Dir.chdir('./rsvm/bin/tools') do
        output = `python fselect.py #{@directory}/train`

        puts output if (@verbose)

        x = File.read("train.select")
        sel_f_ids = x[1..-2].split(", ")
        sel_f_ids.each do |f|
          s_f = @features.term(f.to_i)
          if s_f.instance_of? String then
            s_f = s_f.split("||")
            s_f[0] = s_f[0].to_sym
          end
          sel_features.push(s_f)
        end

        #Remove temporary files
        File.delete("train.select") if File.exist?("train.select")
        File.delete("train.fscore") if File.exist?("train.fscore")
        File.delete("train.tr.out") if File.exist?("train.tr.out")
      end

      return sel_features
    end
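The parsing step in `suggest_features` can be tried in isolation. The sketch below assumes that `train.select` contains a bracketed, comma-separated list of feature ids such as `[3, 1, 7]`, which is inferred from the `x[1..-2].split(", ")` call rather than documented here. `parse_selected_ids` and `decode_term` are hypothetical helpers, and the id parser uses a regular-expression scan instead of the slicing shown above.

```ruby
# Parses text such as "[3, 1, 7]" into integer feature ids.
def parse_selected_ids(text)
  text.scan(/\d+/).map(&:to_i)
end

# Splits a stored term like "title||spam" into [:title, "spam"];
# non-string terms pass through unchanged.
def decode_term(term)
  return term unless term.is_a?(String)

  field, *rest = term.split("||")
  [field.to_sym, *rest]
end
```

The scan tolerates a trailing newline or different spacing in the file, at the cost of ignoring anything that is not a run of digits.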