Module: Consistency
- Included in:
- Discretizer, FSelector::INTERACT, FSelector::LasVegasFilter, FSelector::LasVegasIncremental
- Defined in:
- lib/fselector/consistency.rb
Overview
data consistency-related functions
Instance Method Summary
- #get_instance_count(my_data = nil) ⇒ Hash
  get the counts of each (unique) instance (without class label) for each class, stored in a Hash table as suggested by Zheng Zhao and Huan Liu.
- #get_IR(my_data = nil) ⇒ Float
  get data inconsistency rate, suitable for single-time calculation.
- #get_IR_by_count(inst_cnt) ⇒ Float
  get data inconsistency rate based on the instance count in Hash table.
- #get_IR_by_feature(inst_cnt, feats) ⇒ Float
  get data inconsistency rate for given features.
Instance Method Details
#get_instance_count(my_data = nil) ⇒ Hash
intended for multiple calculations, because checking the data inconsistency rate based on the resulting Hash table is very efficient: it avoids reconstructing new data structures and repeated counting. Instead, you only rebuild the Hash keys and merge the relevant counts
get the counts of each (unique) instance (without class label)
for each class, the resulting Hash table, as suggested by Zheng Zhao
and Huan Liu, looks like:
{
'f1:v1|f2:v2|...|fn:vn|' => {k1=>c1, k2=>c2, ..., kn=>cn},
...
}
where we use the (sorted) features and their values to construct
the key for Hash table, i.e., v_i is the value for feature f_i.
Note that the symbol : separates a feature from its value, and the
symbol | terminates each feature-value pair. Consequently, neither
symbol should appear in any feature name or value; if one does,
replace it with another symbol in advance. The c_i is the
instance count for class k_i.
# File 'lib/fselector/consistency.rb', line 32

def get_instance_count(my_data=nil)
  my_data ||= get_data # use internal data by default
  inst_cnt = {}

  my_data.each do |k, ss|
    ss.each do |s|
      # sort make sure a same key
      # : separates a feature and its value
      # | separates a feature-value pair
      key = s.keys.sort.collect { |f| "#{f}:#{s[f]}|"}.join
      inst_cnt[key] ||= Hash.new(0)
      inst_cnt[key][k] += 1 # for key in class k
    end
  end

  inst_cnt
end
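The following is a minimal standalone sketch of the counting scheme described above, not the library's own entry point. It assumes the data layout implied by the source (class label => array of samples, each sample a Hash of feature => value); the method name build_instance_count and the toy data are hypothetical.

```ruby
# Hypothetical standalone version of the counting step.
# data: { class_label => [ { feature => value, ... }, ... ] }
def build_instance_count(data)
  inst_cnt = {}
  data.each do |klass, samples|
    samples.each do |s|
      # Sorting the features gives identical instances the same key,
      # regardless of the order in which their features were stored.
      key = s.keys.sort.collect { |f| "#{f}:#{s[f]}|" }.join
      inst_cnt[key] ||= Hash.new(0)
      inst_cnt[key][klass] += 1
    end
  end
  inst_cnt
end

# Toy data (made up for illustration): two classes, two features.
toy = {
  :yes => [{ :f1 => 1, :f2 => 0 }, { :f2 => 0, :f1 => 1 }, { :f1 => 0, :f2 => 1 }],
  :no  => [{ :f1 => 1, :f2 => 0 }]
}

p build_instance_count(toy)
```

Here the instance f1=1, f2=0 occurs twice in class :yes and once in class :no, so its key "f1:1|f2:0|" maps to counts for both classes, while "f1:0|f2:1|" maps to a single count for :yes.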
#get_IR(my_data = nil) ⇒ Float
get data inconsistency rate, suitable for single-time calculation
# File 'lib/fselector/consistency.rb', line 108

def get_IR(my_data=nil)
  my_data ||= get_data # use internal data by default
  inst_cnt = get_instance_count(my_data)
  ir = get_IR_by_count(inst_cnt)

  # inconsistency rate
  ir
end
#get_IR_by_count(inst_cnt) ⇒ Float
get data inconsistency rate based on the instance count in Hash table
# File 'lib/fselector/consistency.rb', line 58

def get_IR_by_count(inst_cnt)
  incon, sample_size = 0.0, 0.0

  inst_cnt.values.each do |hcnt|
    cnt = hcnt.values
    incon += cnt.sum-cnt.max
    sample_size += cnt.sum
  end

  # inconsistency rate
  (sample_size.zero?) ? 0.0 : incon/sample_size
end
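As a worked illustration of the computation in the listing: each unique instance contributes its total count minus its largest per-class count, and the sum is divided by the total number of samples. The sketch below is a standalone re-statement under that reading; the method name inconsistency_rate and the counts are hypothetical, and Array#sum assumes Ruby 2.4 or later.

```ruby
# Hypothetical standalone version of the rate computation.
# inst_cnt: { instance_key => { class_label => count } }
def inconsistency_rate(inst_cnt)
  incon = 0.0
  total = 0.0
  inst_cnt.each_value do |class_counts|
    counts = class_counts.values
    # Samples outside the majority class of this instance are inconsistent.
    incon += counts.sum - counts.max
    total += counts.sum
  end
  total.zero? ? 0.0 : incon / total
end

# Made-up counts: 4 samples, one of which disagrees with its majority class.
toy_counts = {
  "f1:1|f2:0|" => { :yes => 2, :no => 1 },
  "f1:0|f2:1|" => { :yes => 1 }
}

puts inconsistency_rate(toy_counts) # ((3 - 2) + (1 - 1)) / 4 = 0.25
```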
#get_IR_by_feature(inst_cnt, feats) ⇒ Float
get data inconsistency rate for given features
# File 'lib/fselector/consistency.rb', line 79

def get_IR_by_feature(inst_cnt, feats)
  return 0.0 if feats.empty?

  # build new inst_count for feats
  inst_cnt_new = {}

  inst_cnt.each do |key, hcnt|
    key_new = feats.sort.collect { |f|
      match_data = key.match(/#{f}:.*?\|/)
      match_data[0] if match_data
    }.compact.join # remove nil entry and join
    next if key_new.empty?

    hcnt_new = inst_cnt_new[key_new] || Hash.new(0)
    # merge cnts
    inst_cnt_new[key_new] = hcnt_new.merge(hcnt) { |kk, v1, v2| v1+v2 }
  end

  # inconsistency rate
  get_IR_by_count(inst_cnt_new)
end
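To illustrate the key-rebuilding step, the sketch below projects an existing count table onto a subset of features and merges the counts of instances that become identical. It mirrors the pattern match used in the listing but is a standalone illustration; the method name project_counts and the toy counts are hypothetical.

```ruby
# Hypothetical standalone version of the projection step.
# Keeps only the "feature:value|" pairs of the requested features and
# merges the class counts of keys that collapse onto each other.
def project_counts(inst_cnt, feats)
  projected = {}
  inst_cnt.each do |key, class_counts|
    # String#[] with a Regexp returns the matched text, or nil if absent.
    new_key = feats.sort.collect { |f| key[/#{f}:.*?\|/] }.compact.join
    next if new_key.empty?

    projected[new_key] ||= Hash.new(0)
    class_counts.each { |klass, c| projected[new_key][klass] += c }
  end
  projected
end

# Made-up counts that are fully consistent when both features are used.
toy_counts = {
  "f1:1|f2:0|" => { :yes => 2 },
  "f1:1|f2:1|" => { :no => 1 },
  "f1:0|f2:1|" => { :no => 1 }
}

p project_counts(toy_counts, [:f1])
p project_counts(toy_counts, [:f2])
```

With only f1, the first two instances share the key "f1:1|" and their class counts mix, which introduces inconsistency; with only f2, every projected key still maps to a single class. No recounting of the raw data is needed in either case.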