Module: Entropy
- Included in: Discretizer, FSelector::CFS_d, FSelector::FastCorrelationBasedFilter, FSelector::INTERACT, FSelector::InformationGain, FSelector::KS_CCBF, FSelector::SymmetricalUncertainty
- Defined in: lib/fselector/entropy.rb
Overview
Entropy-related functions for discrete data.
Reference: Wikipedia
Instance Method Summary
- #get_conditional_entropy(vecX, vecY) ⇒ Float: returns the conditional entropy of vector X given another vector Y.
- #get_information_gain(vecX, vecY) ⇒ Float: returns the information gain of vector X given another vector Y.
- #get_joint_entropy(vecX, vecY) ⇒ Float: returns the joint entropy of vector X and vector Y.
- #get_marginal_entropy(vecX) ⇒ Float: returns the marginal entropy of vector X.
- #get_symmetrical_uncertainty(vecX, vecY) ⇒ Float: returns the symmetrical uncertainty of vector X and vector Y.
Instance Method Details
#get_conditional_entropy(vecX, vecY) ⇒ Float
vecX and vecY must be of the same length; otherwise the method aborts with an error message.
Returns the conditional entropy of vector X given another vector Y:
$$H(X \mid Y) = \sum_j P(y_j)\, H(X \mid y_j)$$
where
$$H(X \mid y_j) = -\sum_i P(x_i \mid y_j) \log_2 P(x_i \mid y_j)$$
```ruby
# File 'lib/fselector/entropy.rb', line 38
def get_conditional_entropy(vecX, vecY)
  abort "[#{__FILE__}@#{__LINE__}]: \n"+
        " two vectors must be of same length" if not vecX.size == vecY.size

  hxy = 0.0
  n = vecX.size.to_f

  vecY.uniq.each do |y_j|
    p1 = vecY.count(y_j)/n                                 # P(y_j)

    indices = (0...n).to_a.select { |k| vecY[k] == y_j }  # positions where Y == y_j
    xvs = vecX.values_at(*indices)                         # X values observed with y_j
    m = xvs.size.to_f

    xvs.uniq.each do |x_i|
      p2 = xvs.count(x_i)/m                                # P(x_i | y_j)
      hxy += -1.0 * p1 * (p2 * Math.log2(p2))              # accumulate P(y_j) * H(X|y_j)
    end
  end

  hxy
end
```
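The listing above rescans vecY once per distinct value of Y (through `count` and `select`). As a hedged sketch, assuming plain Ruby arrays and a hypothetical helper name `conditional_entropy` that is not part of FSelector, the same quantity can be computed with one grouping pass: pair each x with its y, group the pairs by y, and sum P(y_j) * H(X|y_j).

```ruby
# Hypothetical helper (not part of FSelector): H(X|Y) computed by grouping
# X values under their paired Y value instead of rescanning vecY per y_j.
def conditional_entropy(vec_x, vec_y)
  unless vec_x.size == vec_y.size
    raise ArgumentError, "two vectors must be of same length"
  end

  n = vec_x.size.to_f
  groups = vec_x.zip(vec_y).group_by { |_x, y| y } # y_j => [[x, y_j], ...]

  groups.sum do |_y_j, pairs|
    m = pairs.size.to_f                            # number of samples with Y == y_j
    counts = Hash.new(0)
    pairs.each { |x, _y| counts[x] += 1 }          # frequency of each x_i in the group
    h_x_given_y = counts.values.sum do |c|
      prob = c / m                                 # P(x_i | y_j)
      prob * Math.log2(1 / prob)                   # -P log2 P, written to avoid -0.0
    end
    (m / n) * h_x_given_y                          # P(y_j) * H(X | y_j)
  end
end

# X is independent of Y here, so H(X|Y) equals H(X), i.e. 1 bit.
puts conditional_entropy(%w[a a b b], %w[p q p q])
# X is fully determined by Y here, so H(X|Y) is 0.
puts conditional_entropy(%w[a a b b], %w[p p q q])
```

Raising `ArgumentError` instead of calling `abort` is a design choice for the sketch: it lets a caller rescue the error rather than terminating the process.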
#get_information_gain(vecX, vecY) ⇒ Float
vecX and vecY must be of the same length.
Returns the information gain of vector X given another vector Y:
$$IG(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = IG(Y;X)$$
```ruby
# File 'lib/fselector/entropy.rb', line 92
def get_information_gain(vecX, vecY)
  get_marginal_entropy(vecX) - get_conditional_entropy(vecX, vecY)
end
```
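The identity IG(X;Y) = IG(Y;X) can be checked numerically. This is a standalone sketch, not FSelector code: the helper names `marginal_entropy`, `conditional_entropy`, and `information_gain` are hypothetical stand-ins that follow the formulas in this section.

```ruby
# Hypothetical stand-ins (not FSelector's API) following the formulas in this section.
def marginal_entropy(vec)
  n = vec.size.to_f
  vec.uniq.sum do |v|
    prob = vec.count(v) / n
    prob * Math.log2(1 / prob)                                 # -P(v) log2 P(v)
  end
end

def conditional_entropy(vec_x, vec_y)
  n = vec_x.size.to_f
  vec_y.uniq.sum do |y_j|
    xs = vec_x.select.with_index { |_x, k| vec_y[k] == y_j }  # X restricted to Y == y_j
    (xs.size / n) * marginal_entropy(xs)                       # P(y_j) * H(X|y_j)
  end
end

def information_gain(vec_x, vec_y)
  marginal_entropy(vec_x) - conditional_entropy(vec_x, vec_y)
end

x = [0, 0, 1, 1]
y = [0, 0, 0, 1]
puts information_gain(x, y)   # H(X) - H(X|Y)
puts information_gain(y, x)   # H(Y) - H(Y|X), the same value up to rounding
```

Here H(X|y_j) is computed as the marginal entropy of the X values observed alongside y_j, which is exactly the inner sum in the definition of H(X|Y).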
#get_joint_entropy(vecX, vecY) ⇒ Float
vecX and vecY must be of the same length.
Returns the joint entropy of vector X and vector Y:
$$H(X,Y) = H(Y) + H(X \mid Y) = H(X) + H(Y \mid X)$$
so $H(X,Y) = H(Y,X)$.
```ruby
# File 'lib/fselector/entropy.rb', line 76
def get_joint_entropy(vecX, vecY)
  get_marginal_entropy(vecY) + get_conditional_entropy(vecX, vecY)
end
```
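The chain rule H(X,Y) = H(Y) + H(X|Y) can be cross-checked against a direct computation: the joint entropy is the marginal entropy of the sequence of (x, y) pairs. The sketch below is standalone and uses hypothetical helper names, not FSelector's methods; it relies on Ruby arrays comparing by value, so `uniq` and `count` treat equal pairs as the same outcome.

```ruby
# Hypothetical helpers (not FSelector's API).
def marginal_entropy(vec)
  n = vec.size.to_f
  vec.uniq.sum do |v|
    prob = vec.count(v) / n
    prob * Math.log2(1 / prob)   # -P(v) log2 P(v)
  end
end

# Direct definition: H(X,Y) = -sum over (x, y) of P(x, y) log2 P(x, y)
def joint_entropy_direct(vec_x, vec_y)
  marginal_entropy(vec_x.zip(vec_y))
end

# Chain rule, as in get_joint_entropy: H(X,Y) = H(Y) + H(X|Y)
def joint_entropy_chain(vec_x, vec_y)
  n = vec_x.size.to_f
  h_x_given_y = vec_y.uniq.sum do |y_j|
    xs = vec_x.select.with_index { |_x, k| vec_y[k] == y_j }
    (xs.size / n) * marginal_entropy(xs)
  end
  marginal_entropy(vec_y) + h_x_given_y
end

x = [0, 0, 1, 1]
y = [0, 0, 0, 1]
# Pairs (0,0), (0,0), (1,0), (1,1) have probabilities 1/2, 1/4, 1/4, so H(X,Y) = 1.5 bits.
puts joint_entropy_direct(x, y)
puts joint_entropy_chain(x, y)
```

The two results agree up to floating-point rounding; swapping the arguments of `joint_entropy_chain` evaluates H(X) + H(Y|X) instead and gives the same value.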
#get_marginal_entropy(vecX) ⇒ Float
Returns the marginal entropy of vector X:
$$H(X) = -\sum_i P(x_i) \log_2 P(x_i)$$
```ruby
# File 'lib/fselector/entropy.rb', line 14
def get_marginal_entropy(vecX)
  h = 0.0
  n = vecX.size.to_f

  vecX.uniq.each do |x_i|
    p = vecX.count(x_i)/n
    h += -1.0 * (p * Math.log2(p))
  end

  h
end
```
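A worked example: for vecX = [a, a, b, c] the probabilities are 1/2, 1/4, 1/4, so H(X) = 0.5 * 1 + 0.25 * 2 + 0.25 * 2 = 1.5 bits. The sketch below repeats the computation with a frequency table built in a single pass (one `Hash` update per element) rather than one `count` call per distinct value; `marginal_entropy` is a hypothetical name, not FSelector's method.

```ruby
# Hypothetical single-pass variant of get_marginal_entropy (not FSelector's API).
def marginal_entropy(vec)
  n = vec.size.to_f
  counts = Hash.new(0)
  vec.each { |v| counts[v] += 1 }   # frequency table in one pass
  counts.values.sum do |c|
    prob = c / n                    # P(x_i)
    prob * Math.log2(1 / prob)      # -P(x_i) log2 P(x_i)
  end
end

puts marginal_entropy(%w[a a b c])  # probabilities 1/2, 1/4, 1/4
puts marginal_entropy(%w[h t])      # fair coin: 1 bit
puts marginal_entropy(%w[z z z])    # constant vector: 0 bits
```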
#get_symmetrical_uncertainty(vecX, vecY) ⇒ Float
vecX and vecY must be of the same length.
Returns the symmetrical uncertainty of vector X and vector Y:
$$SU(X;Y) = 2\,\frac{IG(X;Y)}{H(X) + H(Y)} = 2\,\frac{H(X) - H(X \mid Y)}{H(X) + H(Y)} = 2\,\frac{H(Y) - H(Y \mid X)}{H(X) + H(Y)} = SU(Y;X)$$
If H(X) + H(Y) is zero (both vectors constant), the method returns 0.0.
```ruby
# File 'lib/fselector/entropy.rb', line 113
def get_symmetrical_uncertainty(vecX, vecY)
  hx = get_marginal_entropy(vecX)
  hxy = get_conditional_entropy(vecX, vecY)
  hy = get_marginal_entropy(vecY)

  su = 0.0
  su = 2*(hx-hxy)/(hx+hy) if not (hx+hy).zero?

  su
end
```
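Symmetrical uncertainty normalizes information gain into the range [0, 1]: it is 0 when X and Y are independent and 1 when the values of X and Y are in one-to-one correspondence. A standalone sketch, using hypothetical helper names rather than FSelector's methods and mirroring the zero-denominator guard in the listing above:

```ruby
# Hypothetical helpers (not FSelector's API).
def marginal_entropy(vec)
  n = vec.size.to_f
  vec.uniq.sum do |v|
    prob = vec.count(v) / n
    prob * Math.log2(1 / prob)                             # -P(v) log2 P(v)
  end
end

def conditional_entropy(vec_x, vec_y)
  n = vec_x.size.to_f
  vec_y.uniq.sum do |y_j|
    xs = vec_x.select.with_index { |_x, k| vec_y[k] == y_j }
    (xs.size / n) * marginal_entropy(xs)                   # P(y_j) * H(X|y_j)
  end
end

# SU(X;Y) = 2 * (H(X) - H(X|Y)) / (H(X) + H(Y)); 0.0 when both vectors are constant
def symmetrical_uncertainty(vec_x, vec_y)
  hx = marginal_entropy(vec_x)
  hy = marginal_entropy(vec_y)
  return 0.0 if (hx + hy).zero?

  2 * (hx - conditional_entropy(vec_x, vec_y)) / (hx + hy)
end

puts symmetrical_uncertainty(%w[a a b b], %w[p p q q])   # one-to-one mapping: SU = 1
puts symmetrical_uncertainty(%w[a a b b], %w[p q p q])   # independent: SU = 0
puts symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 0, 1]) # partial dependence: between 0 and 1
```

Note that one vector merely determining the other is not enough for SU = 1: if X has more distinct values than Y, then H(X) > H(Y) and SU stays below 1.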