Class: Nimbus::Tree
- Inherits:
- Object
  - Object
  - Nimbus::Tree
- Defined in:
- lib/nimbus/tree.rb
Overview
Tree object representing a random tree.
A tree is generated by following these steps:
1. Calculate the loss function for the individuals in the node (the first node contains all the individuals).
2. Take a random sample of the SNPs (size m << total count of SNPs).
3. Compute the loss function for the split of the sample based on the value of every SNP.
4. If the SNP with the minimum loss function also reduces the general loss of the node, split the individuals sample into three nodes, based on the value for that SNP [0, 1, or 2].
5. Repeat from 1 for every node until:
   a) the individuals count in that node is < minimum size, OR
   b) none of the SNP splits has a loss function smaller than the node loss function.
6. When a node stops, label the node with the average phenotype value (for regression problems) or the majority class (for classification problems) of the individuals in the node.
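The steps above can be sketched for the regression case as follows. This is an illustrative sketch, not the Nimbus implementation: the module `TreeSketch`, its helpers, the squared-error loss, and the plain three-element child array are all assumptions made for the example. SNP keys are written 1-based, since `.traverse` reads `data[key - 1]`.

```ruby
# Illustrative sketch only; TreeSketch and its methods are hypothetical.
module TreeSketch
  def self.mean(values)
    values.sum.to_f / values.size
  end

  # ASSUMPTION: squared error around the group mean is used as the loss.
  def self.loss(values)
    return 0.0 if values.empty?
    m = mean(values)
    values.sum { |v| (v - m)**2 }
  end

  # genotypes:  { id => [0, 1 or 2 for every SNP] }
  # phenotypes: { id => numeric value }
  def self.build_node(ids, genotypes, phenotypes, opts)
    values = ids.map { |id| phenotypes[id] }
    label = mean(values)                                  # step 6 label
    return label if ids.size < opts[:node_min_size]       # stop condition a)

    best = nil
    best_loss = loss(values)                              # step 1
    rng = opts.fetch(:rng) { Random.new }
    snps = (0...opts[:snp_total_count]).to_a.sample(opts[:snp_sample_size], random: rng) # step 2
    snps.each do |snp|
      groups = ids.group_by { |id| genotypes[id][snp] }   # step 3
      split_loss = groups.values.sum { |g| loss(g.map { |id| phenotypes[id] }) }
      if split_loss < best_loss
        best = [snp, groups]
        best_loss = split_loss
      end
    end
    return label if best.nil?                             # stop condition b)

    snp, groups = best                                    # step 4: split on 0, 1, 2
    children = [0, 1, 2].map do |value|
      members = groups.fetch(value, [])
      members.empty? ? label : build_node(members, genotypes, phenotypes, opts)
    end
    { snp + 1 => children }                               # step 5 happens via recursion
  end
end

genotypes  = { 1 => [0], 2 => [0], 3 => [1], 4 => [1], 5 => [2], 6 => [2] }
phenotypes = { 1 => 1.0, 2 => 1.0, 3 => 2.0, 4 => 2.0, 5 => 3.0, 6 => 3.0 }
opts = { snp_total_count: 1, snp_sample_size: 1, node_min_size: 2, rng: Random.new(1) }
sketch_tree = TreeSketch.build_node(genotypes.keys, genotypes, phenotypes, opts)
```

Because a split is accepted only when its loss is strictly smaller than the node loss, every accepted split produces strictly smaller groups and the recursion terminates.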
Direct Known Subclasses
Constant Summary
- NODE_SPLIT_01_2 = "zero"
- NODE_SPLIT_0_12 = "two"
Instance Attribute Summary
- #generalization_error ⇒ Object
  Returns the value of attribute generalization_error.
- #id_to_fenotype ⇒ Object
  Returns the value of attribute id_to_fenotype.
- #importances ⇒ Object
  Returns the value of attribute importances.
- #individuals ⇒ Object
  Returns the value of attribute individuals.
- #node_min_size ⇒ Object
  Returns the value of attribute node_min_size.
- #predictions ⇒ Object
  Returns the value of attribute predictions.
- #snp_sample_size ⇒ Object
  Returns the value of attribute snp_sample_size.
- #snp_total_count ⇒ Object
  Returns the value of attribute snp_total_count.
- #structure ⇒ Object
  Returns the value of attribute structure.
- #used_snps ⇒ Object
  Returns the value of attribute used_snps.
Class Method Summary
- .traverse(tree_structure, data) ⇒ Object
  Class method to traverse a single individual through a tree structure.
Instance Method Summary
- #build_node(individuals_ids, y_hat) ⇒ Object
  Creates a node by taking a random sample of the SNPs and computing the loss function for every split by SNP of that sample.
- #estimate_importances(oob_ids) ⇒ Object
  Estimation of importance for every SNP.
- #generalization_error_from_oob(oob_ids) ⇒ Object
  Compute generalization error for the tree.
- #initialize(options) ⇒ Tree (constructor)
  Initialize Tree object with the configuration (as in Nimbus::Configuration.tree) options received.
- #seed(all_individuals, individuals_sample, ids_fenotypes) ⇒ Object
  Creates the structure of the tree, as a hash of SNP splits and values.
Constructor Details
#initialize(options) ⇒ Tree
Initialize Tree object with the configuration (as in Nimbus::Configuration.tree) options received.
# File 'lib/nimbus/tree.rb', line 25
def initialize(options)
  @snp_total_count = options[:snp_total_count]
  @snp_sample_size = options[:snp_sample_size]
  @node_min_size = options[:tree_node_min_size]
end
Instance Attribute Details
#generalization_error ⇒ Object
Returns the value of attribute generalization_error.
# File 'lib/nimbus/tree.rb', line 18
def generalization_error
  @generalization_error
end
#id_to_fenotype ⇒ Object
Returns the value of attribute id_to_fenotype.
# File 'lib/nimbus/tree.rb', line 19
def id_to_fenotype
  @id_to_fenotype
end
#importances ⇒ Object
Returns the value of attribute importances.
# File 'lib/nimbus/tree.rb', line 18
def importances
  @importances
end
#individuals ⇒ Object
Returns the value of attribute individuals.
# File 'lib/nimbus/tree.rb', line 19
def individuals
  @individuals
end
#node_min_size ⇒ Object
Returns the value of attribute node_min_size.
# File 'lib/nimbus/tree.rb', line 18
def node_min_size
  @node_min_size
end
#predictions ⇒ Object
Returns the value of attribute predictions.
# File 'lib/nimbus/tree.rb', line 18
def predictions
  @predictions
end
#snp_sample_size ⇒ Object
Returns the value of attribute snp_sample_size.
# File 'lib/nimbus/tree.rb', line 18
def snp_sample_size
  @snp_sample_size
end
#snp_total_count ⇒ Object
Returns the value of attribute snp_total_count.
# File 'lib/nimbus/tree.rb', line 18
def snp_total_count
  @snp_total_count
end
#structure ⇒ Object
Returns the value of attribute structure.
# File 'lib/nimbus/tree.rb', line 18
def structure
  @structure
end
#used_snps ⇒ Object
Returns the value of attribute used_snps.
# File 'lib/nimbus/tree.rb', line 18
def used_snps
  @used_snps
end
Class Method Details
.traverse(tree_structure, data) ⇒ Object
Class method to traverse a single individual through a tree structure.
Returns the prediction for that individual (the label of the final node reached by the individual).
# File 'lib/nimbus/tree.rb', line 57
def self.traverse(tree_structure, data)
  return tree_structure if tree_structure.is_a?(Numeric) || tree_structure.is_a?(String)

  raise Nimbus::TreeError, "Forest data has invalid structure. Please check your forest data (file)." if !(tree_structure.is_a?(Hash) && tree_structure.keys.size == 1)

  branch = tree_structure.values.first
  split_type = branch[1].to_s
  datum = data_traversing_value(data[tree_structure.keys.first - 1], split_type)

  return self.traverse(branch[datum], data)
end
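The traversal can be tried out on a small hand-written structure. The sketch below is a standalone stand-in, not the Nimbus code: `TraverseSketch` and `branch_index` are hypothetical names, and the behaviour of `data_traversing_value` is not shown on this page, so the mapping of genotype 1 to branch 0 (split type "zero") or branch 2 (split type "two") is an assumption inferred from the constant names. The layout of each node as `{ snp_number => [node_for_0, split_type, node_for_2] }` is likewise inferred from the `branch[1]` and `branch[datum]` lookups above.

```ruby
# Hypothetical stand-in for Nimbus::Tree.traverse; for illustration only.
module TraverseSketch
  NODE_SPLIT_01_2 = "zero"
  NODE_SPLIT_0_12 = "two"

  # ASSUMPTION: genotype 1 joins branch 0 or branch 2 depending on the split type.
  def self.branch_index(genotype, split_type)
    return genotype unless genotype == 1
    split_type == NODE_SPLIT_01_2 ? 0 : 2
  end

  def self.traverse(structure, genotypes)
    # A Numeric or String is a leaf label: the prediction.
    return structure if structure.is_a?(Numeric) || structure.is_a?(String)
    unless structure.is_a?(Hash) && structure.keys.size == 1
      raise ArgumentError, "invalid tree structure"
    end

    snp = structure.keys.first             # 1-based SNP number
    branch = structure.values.first
    split_type = branch[1].to_s
    index = branch_index(genotypes[snp - 1], split_type)
    traverse(branch[index], genotypes)
  end
end

# SNP 2 splits {0, 1} from {2}; below it, SNP 1 splits {0} from {1, 2}.
sample_tree = { 2 => [1.5, "zero", { 1 => [0.2, "two", 3.0] }] }
prediction_a = TraverseSketch.traverse(sample_tree, [0, 1])
prediction_b = TraverseSketch.traverse(sample_tree, [0, 2])
prediction_c = TraverseSketch.traverse(sample_tree, [1, 2])
```

Here `prediction_a` is the leaf reached directly from the root, while the other two individuals descend into the nested node before reaching a label.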
Instance Method Details
#build_node(individuals_ids, y_hat) ⇒ Object
Creates a node by taking a random sample of the SNPs and computing the loss function for every split by SNP of that sample.
# File 'lib/nimbus/tree.rb', line 43
def build_node(individuals_ids, y_hat)
end
#estimate_importances(oob_ids) ⇒ Object
Estimation of importance for every SNP.
# File 'lib/nimbus/tree.rb', line 51
def estimate_importances(oob_ids)
end
#generalization_error_from_oob(oob_ids) ⇒ Object
Compute generalization error for the tree.
# File 'lib/nimbus/tree.rb', line 47
def generalization_error_from_oob(oob_ids)
end
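The method body is empty in this base class, so the actual computation is not shown here. As a rough illustration of what an out-of-bag (OOB) error can look like for a regression tree, the sketch below averages the squared difference between the known value and the predicted value over the OOB individuals. The helper `oob_mean_squared_error` is hypothetical, and the choice of mean squared error is an assumption, not something this page confirms.

```ruby
# Hypothetical helper, not part of Nimbus.
# ASSUMPTION: the error is the mean squared difference over the OOB individuals.
# id_to_fenotype: { id => observed value }, predictions: { id => predicted value }
def oob_mean_squared_error(oob_ids, id_to_fenotype, predictions)
  return nil if oob_ids.empty?
  squared_errors = oob_ids.sum { |id| (id_to_fenotype[id] - predictions[id])**2 }
  squared_errors / oob_ids.size.to_f
end

oob_error = oob_mean_squared_error([1, 2], { 1 => 1.0, 2 => 3.0 }, { 1 => 2.0, 2 => 3.0 })
```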
#seed(all_individuals, individuals_sample, ids_fenotypes) ⇒ Object
Creates the structure of the tree, as a hash of SNP splits and values.
It initializes the required variables and then defines the first node of the tree. The rest of the tree structure is built recursively, with every node created by a call to build_node.
# File 'lib/nimbus/tree.rb', line 35
def seed(all_individuals, individuals_sample, ids_fenotypes)
  @individuals = all_individuals
  @id_to_fenotype = ids_fenotypes
  @predictions = {}
  @used_snps = []
end