Module: OpenTox::Algorithm::Similarity
- Defined in: lib/utils.rb
Overview
Similarity calculations
Class Method Summary
- .cosine(fingerprints_a, fingerprints_b, weights = nil) ⇒ Float
  Cosine similarity of two fingerprints (Hashes or Arrays).
- .cosine_num(a, b) ⇒ Float
  Cosine similarity of two numeric vectors.
- .outliers(params) ⇒ Object
  Outlier detection based on Mahalanobis distances. Multivariate detection on X, univariate detection on y. Uses an existing RinRuby instance, if possible. The keys query_matrix, data_matrix, and acts are required; r and p_outlier are optional. Returns indices identifying outliers (an index may occur several times; this is intended).
- .tanimoto(fingerprints_a, fingerprints_b, weights = nil, params = nil) ⇒ Float
  Tanimoto similarity.
Class Method Details
.cosine(fingerprints_a, fingerprints_b, weights = nil) ⇒ Float
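Cosine similarity of two fingerprints. For Hash fingerprints, only features present in both Hashes are compared, and more than one common feature is required; otherwise 0.0 is returned. Array fingerprints are compared as given. The weights argument is accepted but not used in the method body.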
# File 'lib/utils.rb', line 550
def self.cosine(fingerprints_a,fingerprints_b,weights=nil)
  # fingerprints are hashes
  if fingerprints_a.class == Hash && fingerprints_b.class == Hash
    a = []; b = []
    common_features = fingerprints_a.keys & fingerprints_b.keys
    if common_features.size > 1
      common_features.each do |p|
        a << fingerprints_a[p]
        b << fingerprints_b[p]
      end
    end
  # fingerprints are arrays
  elsif fingerprints_a.class == Array && fingerprints_b.class == Array
    a = fingerprints_a
    b = fingerprints_b
  end
  (a.size > 0 && b.size > 0) ? self.cosine_num(a.to_gv, b.to_gv) : 0.0
end
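The Hash branch above can be illustrated with a minimal pure-Ruby sketch. The name cosine_sketch is hypothetical, plain Arrays stand in for the vectors that the original obtains via to_gv, and the sketch assumes Array inputs of equal length. It does not reproduce the 12-component truncation done in cosine_num.

```ruby
# Hypothetical stand-in for Similarity.cosine (not part of the library).
# Plain Arrays replace the vectors the original builds with to_gv.
def cosine_sketch(fp_a, fp_b)
  if fp_a.is_a?(Hash) && fp_b.is_a?(Hash)
    common = fp_a.keys & fp_b.keys
    # Mirrors the `common_features.size > 1` check in the source:
    # a single shared feature is not enough.
    return 0.0 unless common.size > 1
    a = common.map { |k| fp_a[k].to_f }
    b = common.map { |k| fp_b[k].to_f }
  else
    # Assumption: both inputs are Arrays of equal length.
    a = fp_a.map(&:to_f)
    b = fp_b.map(&:to_f)
  end
  return 0.0 if a.empty? || b.empty?

  dot  = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  dot / norm
end

# Feature "c" occurs in only one fingerprint and is ignored.
similar = cosine_sketch({ "a" => 1, "b" => 2 }, { "a" => 2, "b" => 4, "c" => 9 })
# Only one shared feature, so the result is 0.0.
too_few = cosine_sketch({ "a" => 1 }, { "a" => 1 })
```

In the first call the shared components are proportional, so the result is close to 1.0; features missing from either Hash have no influence on the score.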
.cosine_num(a, b) ⇒ Float
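Cosine similarity of two numeric vectors. If both vectors have more than 12 components, only the first 12 of each are used. Both arguments must respond to dot and norm; .cosine passes vectors created with to_gv.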
# File 'lib/utils.rb', line 578
def self.cosine_num(a, b)
  if a.size>12 && b.size>12
    a = a[0..11]
    b = b[0..11]
  end
  a.dot(b) / (a.norm * b.norm)
end
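The effect of the truncation step is easiest to see with a small example. The following is a sketch only: cosine_num_sketch is a hypothetical name, and plain Arrays replace the vector objects (with dot and norm) that the original expects.

```ruby
# Hypothetical pure-Ruby counterpart of cosine_num, using plain Arrays.
def cosine_num_sketch(a, b)
  # Same rule as the source: truncate only when BOTH inputs exceed 12 entries.
  if a.size > 12 && b.size > 12
    a = a[0..11]
    b = b[0..11]
  end
  dot    = a.zip(b).sum { |x, y| x.to_f * y }
  norm_a = Math.sqrt(a.sum { |x| x.to_f * x })
  norm_b = Math.sqrt(b.sum { |y| y.to_f * y })
  dot / (norm_a * norm_b)
end

# The two vectors differ only in their 13th component, which is dropped.
a = [1] * 12 + [0]
b = [1] * 12 + [100]
truncated = cosine_num_sketch(a, b)
```

Because the 13th component is discarded, the two vectors are treated as identical and the result is close to 1.0. Callers that rely on components beyond the 12th should be aware that they do not contribute.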
.outliers(params) ⇒ Object
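Outlier detection based on Mahalanobis distances. Multivariate detection is performed on X (data_matrix with the query row prepended), univariate detection on y (acts). Uses an existing RinRuby instance, if possible. Any exception is caught and logged at debug level; the indices collected up to that point are returned.
Parameters:
- params: the keys query_matrix, data_matrix, and acts are required; r (a RinRuby instance) and p_outlier are optional. If p_outlier is unset, the threshold is 0.999 (the debug message prints 0.9999 in that case).
Returns:
- indices identifying outliers (an index may occur several times; this is intended)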
# File 'lib/utils.rb', line 592
def self.outliers(params)
  outlier_array = []
  data_matrix = params[:data_matrix]
  query_matrix = params[:query_matrix]
  acts = params[:acts]
  begin
    LOGGER.debug "Outliers (p=#{params[:p_outlier] || 0.9999})..."
    r = ( params[:r] || RinRuby.new(false,false) )
    r.eval "suppressPackageStartupMessages(library(\"robustbase\"))"
    r.eval "outlier_threshold = #{params[:p_outlier] || 0.999}"
    nr_cases, nr_features = data_matrix.to_a.size, data_matrix.to_a[0].size
    r.odx = data_matrix.to_a.flatten
    r.q = query_matrix.to_a.flatten
    r.y = acts.to_a.flatten
    r.eval "odx = matrix(odx, #{nr_cases}, #{nr_features}, byrow=T)"
    r.eval 'odx = rbind(q,odx)' # query is nr 0 (1) in ruby (R)
    r.eval 'mah = covMcd(odx)$mah' # run MCD alg
    r.eval "mah = pchisq(mah,#{nr_features})"
    r.eval 'outlier_array = which(mah>outlier_threshold)' # multivariate outliers using robust mahalanobis
    outlier_array = r.outlier_array.to_a.collect{|v| v-2 } # translate to ruby index (-1 for q, -1 due to ruby)
    r.eval 'fqu = matrix(summary(y))[2]'
    r.eval 'tqu = matrix(summary(y))[5]'
    r.eval 'outlier_array = which(y>(tqu+1.5*IQR(y)))' # univariate outliers due to Tukey (http://goo.gl/mwzNH)
    outlier_array += r.outlier_array.to_a.collect{|v| v-1 } # translate to ruby index (-1 due to ruby)
    r.eval 'outlier_array = which(y<(fqu-1.5*IQR(y)))'
    outlier_array += r.outlier_array.to_a.collect{|v| v-1 }
  rescue Exception => e
    LOGGER.debug "#{e.class}: #{e.message}"
    #LOGGER.debug "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
  end
  outlier_array
end
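The univariate part of the method (Tukey's fences on y) can be sketched in pure Ruby without R. This is an illustration under stated assumptions, not the library's code: quantile7 and tukey_outlier_indices are hypothetical helpers, quartiles are computed with linear interpolation as in R's default quantile type 7, any rounding applied by R's summary is ignored, and y is assumed to be non-empty. The multivariate step (covMcd from robustbase) is not reproduced.

```ruby
# Hypothetical helper: quantile with linear interpolation between order
# statistics (R's default "type 7"). Expects a sorted, non-empty Array.
def quantile7(sorted, prob)
  h  = (sorted.size - 1) * prob
  lo = h.floor
  hi = h.ceil
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Hypothetical helper: zero-based indices of values outside Tukey's fences,
# upper outliers first, then lower outliers (same order as the source).
def tukey_outlier_indices(y)
  sorted = y.map(&:to_f).sort
  q1  = quantile7(sorted, 0.25)
  q3  = quantile7(sorted, 0.75)
  iqr = q3 - q1
  upper = y.each_index.select { |i| y[i] > q3 + 1.5 * iqr }
  lower = y.each_index.select { |i| y[i] < q1 - 1.5 * iqr }
  upper + lower
end

high = tukey_outlier_indices([1, 2, 3, 4, 5, 100])
low  = tukey_outlier_indices([-100, 1, 2, 3, 4, 5])
```

The sketch already works with zero-based indices, so it needs no counterpart to the v-1 translation that the original applies to R's one-based results.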
.tanimoto(fingerprints_a, fingerprints_b, weights = nil, params = nil) ⇒ Float
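Tanimoto similarity: the sum of per-feature minima over the common features, divided by the sum of per-feature maxima over all features. Returns 0.0 if the denominator is zero. Array fingerprints are compared up to the length of the shorter one, and a warning is logged if the sizes differ. The weights and params arguments are accepted but not used in the method body.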
# File 'lib/utils.rb', line 517
def self.tanimoto(fingerprints_a,fingerprints_b,weights=nil,params=nil)
  common_p_sum = 0.0
  all_p_sum = 0.0
  # fingerprints are hashes
  if fingerprints_a.class == Hash && fingerprints_b.class == Hash
    common_features = fingerprints_a.keys & fingerprints_b.keys
    all_features = (fingerprints_a.keys + fingerprints_b.keys).uniq
    if common_features.size > 0
      common_features.each{ |f| common_p_sum += [ fingerprints_a[f], fingerprints_b[f] ].min }
      all_features.each{ |f| all_p_sum += [ fingerprints_a[f],fingerprints_b[f] ].compact.max } # compact, since one fp may be empty at that pos
    end
  # fingerprints are arrays
  elsif fingerprints_a.class == Array && fingerprints_b.class == Array
    size = [ fingerprints_a.size, fingerprints_b.size ].min
    LOGGER.warn "fingerprints don't have equal size" if fingerprints_a.size != fingerprints_b.size
    (0...size).each { |idx|
      common_p_sum += [ fingerprints_a[idx], fingerprints_b[idx] ].min
      all_p_sum += [ fingerprints_a[idx], fingerprints_b[idx] ].max
    }
  end
  (all_p_sum > 0.0) ? (common_p_sum/all_p_sum) : 0.0
end
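A short worked example may help to see how the min/max sums behave. The sketch below follows the same arithmetic in pure Ruby; tanimoto_sketch is a hypothetical name, the feature keys are made up for illustration, and the LOGGER warning for unequal Array sizes is omitted.

```ruby
# Hypothetical stand-in for Similarity.tanimoto (no logging, no unused args).
def tanimoto_sketch(fp_a, fp_b)
  common_sum = 0.0
  all_sum    = 0.0
  if fp_a.is_a?(Hash) && fp_b.is_a?(Hash)
    common = fp_a.keys & fp_b.keys
    unless common.empty?
      common.each { |f| common_sum += [fp_a[f], fp_b[f]].min }
      # compact drops the nil of whichever fingerprint lacks the feature
      (fp_a.keys | fp_b.keys).each { |f| all_sum += [fp_a[f], fp_b[f]].compact.max }
    end
  elsif fp_a.is_a?(Array) && fp_b.is_a?(Array)
    size = [fp_a.size, fp_b.size].min
    size.times do |i|
      common_sum += [fp_a[i], fp_b[i]].min
      all_sum    += [fp_a[i], fp_b[i]].max
    end
  end
  all_sum > 0.0 ? common_sum / all_sum : 0.0
end

# Hash fingerprints: one shared feature ("C-C"), three features overall.
# common = min(2, 1) = 1; all = max(2, 1) + 1 + 3 = 6
hash_sim = tanimoto_sketch({ "C-C" => 2, "C=O" => 1 }, { "C-C" => 1, "N" => 3 })

# Binary Array fingerprints: common = 2, all = 4
array_sim = tanimoto_sketch([1, 0, 1, 1], [1, 1, 0, 1])
```

With these inputs hash_sim is 1/6 and array_sim is 0.5. Two Hashes without any shared key give 0.0, because both sums stay at zero.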