Class: Statsample::DominanceAnalysis
- Includes:
- Summarizable
- Defined in:
- lib/statsample/dominanceanalysis.rb,
lib/statsample/dominanceanalysis/bootstrap.rb
Overview
Dominance analysis is a procedure that examines the R^2 values of all possible subset models to identify the relative importance of one or more predictors in predicting the criterion.
See Budescu (1993) and Azen & Budescu (2003, 2006) for more information.
Use
a = Daru::Vector.new(1000.times.collect {rand})
b = Daru::Vector.new(1000.times.collect {rand})
c = Daru::Vector.new(1000.times.collect {rand})
ds = Daru::DataFrame.new({:a => a, :b => b, :c => c})
ds[:y] = ds.collect_rows {|row| row[:a]*5 + row[:b]*3 + row[:c]*2 + rand()}
da = Statsample::DominanceAnalysis.new(ds, :y)
puts da.summary
Output:
Report: Report 2010-02-08 19:10:11 -0300
Table: Dominance Analysis result
------------------------------------------------------------
| | r2 | sign | a | b | c |
------------------------------------------------------------
| Model 0 | | | 0.648 | 0.265 | 0.109 |
------------------------------------------------------------
| a | 0.648 | 0.000 | -- | 0.229 | 0.104 |
| b | 0.265 | 0.000 | 0.612 | -- | 0.104 |
| c | 0.109 | 0.000 | 0.643 | 0.260 | -- |
------------------------------------------------------------
| k=1 Average | | | 0.627 | 0.244 | 0.104 |
------------------------------------------------------------
| a*b | 0.877 | 0.000 | -- | -- | 0.099 |
| a*c | 0.752 | 0.000 | -- | 0.224 | -- |
| b*c | 0.369 | 0.000 | 0.607 | -- | -- |
------------------------------------------------------------
| k=2 Average | | | 0.607 | 0.224 | 0.099 |
------------------------------------------------------------
| a*b*c | 0.976 | 0.000 | -- | -- | -- |
------------------------------------------------------------
| Overall averages | | | 0.628 | 0.245 | 0.104 |
------------------------------------------------------------
Table: Pairwise dominance
-----------------------------------------
| Pairs | Total | Conditional | General |
-----------------------------------------
| a - b | 1.0 | 1.0 | 1.0 |
| a - c | 1.0 | 1.0 | 1.0 |
| b - c | 1.0 | 1.0 | 1.0 |
-----------------------------------------
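As a rough illustration of how the averages in the first table arise, the sketch below recomputes them from the printed R^2 values alone. It is plain Ruby and does not call Statsample; the `R2` hash and the helpers `contribution`, `average_k` and `overall_average` are names assumed for this example. Because the inputs are the rounded values from the table, results agree with the report only up to rounding.

```ruby
# R^2 of every subset model, copied from the table above (rounded values).
# The empty model is assumed to have R^2 = 0.
R2 = {
  []           => 0.0,
  [:a]         => 0.648,
  [:b]         => 0.265,
  [:c]         => 0.109,
  [:a, :b]     => 0.877,
  [:a, :c]     => 0.752,
  [:b, :c]     => 0.369,
  [:a, :b, :c] => 0.976
}.freeze
PREDICTORS = [:a, :b, :c].freeze

# Additional R^2 gained by adding predictor x to the model `subset`.
def contribution(x, subset)
  R2[(subset + [x]).sort] - R2[subset]
end

# Mean contribution of x over all models of size k that do not contain x.
def average_k(x, k)
  others = (PREDICTORS - [x]).combination(k).to_a
  others.sum { |s| contribution(x, s) } / others.size
end

# Mean of the size-wise averages for k = 0 .. p-1
# (the "Overall averages" row of the report).
def overall_average(x)
  ks = (0...PREDICTORS.size).to_a
  ks.sum { |k| average_k(x, k) } / ks.size
end

PREDICTORS.each do |x|
  puts format("%s: %.3f", x, overall_average(x))
end
```

The predictor with the largest overall average (here `a`) is the one that generally dominates the others.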
Reference:
- Budescu, D. V. (1993). Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114, 542-551.
- Azen, R. & Budescu, D. V. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2), 129-148.
- Azen, R. & Budescu, D. V. (2006). Comparing predictors in multivariate regression models: an extension of dominance analysis. Journal of Educational and Behavioral Statistics, 31(2), 157-180.
Defined Under Namespace
Constant Summary
- UNIVARIATE_REGRESSION_CLASS =
Statsample::Regression::Multiple::MatrixEngine
- MULTIVARIATE_REGRESSION_CLASS =
Statsample::Regression::Multiple::MultipleDependent
Instance Attribute Summary
-
#build_from_dataset ⇒ Object
Set to true if you want to build from dataset, not correlation matrix.
-
#cases ⇒ Object
If you provide a matrix as input, you should set the number of cases to define significance of R^2.
-
#dependent ⇒ Object
readonly
Returns the value of attribute dependent.
-
#method_association ⇒ Object
Method of :regression_class used to measure association.
-
#name ⇒ Object
Name of analysis.
-
#predictors ⇒ Object
Array with independent variables.
-
#regression_class ⇒ Object
Class to generate the regressions.
Class Method Summary
Instance Method Summary
-
#average_k(k) ⇒ Object
Hash with average for each k size model.
-
#compute ⇒ Object
Compute models.
- #conditional_dominance ⇒ Object
-
#conditional_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i conditionally dominates j (i cD j), 0 if j cD i, and 0.5 if undetermined.
- #dominance_for_nil_model(i, j) ⇒ Object
- #general_averages ⇒ Object
- #general_dominance ⇒ Object
-
#general_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i generally dominates j (i gD j), 0 if j gD i, and 0.5 if undetermined.
-
#get_averages(averages) ⇒ Object
Given a hash whose values are arrays of numbers, returns a hash with the same keys where each value is the mean of the corresponding array.
-
#initialize(input, dependent, opts = Hash.new) ⇒ DominanceAnalysis
constructor
Creates a new DominanceAnalysis object. Parameters: input, a Matrix or Dataset object; dependent, the name of the dependent variable.
- #md(m) ⇒ Object
-
#md_k(k) ⇒ Object
Gets all models of size k.
- #models ⇒ Object
- #models_data ⇒ Object
- #pairs ⇒ Object
- #report_building(g) ⇒ Object
- #total_dominance ⇒ Object
-
#total_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i totally dominates j (i D j), 0 if j D i, and 0.5 if undetermined.
Methods included from Summarizable
Constructor Details
#initialize(input, dependent, opts = Hash.new) ⇒ DominanceAnalysis
Creates a new DominanceAnalysis object.
Parameters:
- input: A Matrix or Dataset (Daru::DataFrame) object.
- dependent: Name of the dependent variable. It can be an array if you want to run a multivariate regression analysis. If nil, set to all fields on input, except criteria.
# File 'lib/statsample/dominanceanalysis.rb', line 102
def initialize(input, dependent, opts=Hash.new)
  @build_from_dataset=false
  if dependent.is_a? Array
    @regression_class= MULTIVARIATE_REGRESSION_CLASS
    @method_association=:r2yx
  else
    @regression_class= UNIVARIATE_REGRESSION_CLASS
    @method_association=:r2
  end
  @name=nil
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  @dependent=dependent
  @dependent=[@dependent] unless @dependent.is_a? Array
  if input.kind_of? Daru::DataFrame
    @predictors ||= input.vectors.to_a - @dependent
    @ds=input
    @matrix=Statsample::Bivariate.correlation_matrix(input)
    @cases=Statsample::Bivariate.min_n_valid(input)
  elsif input.is_a? ::Matrix
    @predictors ||= input.fields-@dependent
    @ds=nil
    @matrix=input
  else
    raise ArgumentError.new("You should use a Matrix or a Dataset")
  end
  @name=_("Dominance Analysis: %s over %s") % [ @predictors.flatten.join(",") , @dependent.join(",")] if @name.nil?
  @models=nil
  @models_data=nil
  @general_averages=nil
end
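The option handling in the constructor above relies on a common Ruby idiom: each key in `opts` is forwarded to the setter of the same name, and keys without a matching attribute are skipped. A minimal standalone sketch of that idiom follows; the `Analysis` class is hypothetical and is not part of Statsample.

```ruby
# Hypothetical stand-in class, used only to illustrate the opts idiom.
class Analysis
  attr_accessor :name, :cases

  def initialize(opts = {})
    @name = nil
    # Forward each option to its setter. Keys for which the object has
    # no reader method of the same name are ignored.
    opts.each { |k, v| send("#{k}=", v) if respond_to?(k) }
  end
end

analysis = Analysis.new(name: "Demo", cases: 100, unknown: 1)
puts analysis.name
puts analysis.cases
```

With the real class, the same mechanism would let a caller pass keys such as `:name`, `:cases` or `:predictors`, since those are writable attributes listed in this page.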
Instance Attribute Details
#build_from_dataset ⇒ Object
Set to true if you want to build from dataset, not correlation matrix
# File 'lib/statsample/dominanceanalysis.rb', line 65
def build_from_dataset
  @build_from_dataset
end
#cases ⇒ Object
If you provide a matrix as input, you should set the number of cases to define significance of R^2
# File 'lib/statsample/dominanceanalysis.rb', line 71
def cases
  @cases
end
#dependent ⇒ Object (readonly)
Returns the value of attribute dependent.
# File 'lib/statsample/dominanceanalysis.rb', line 83
def dependent
  @dependent
end
#method_association ⇒ Object
Method of :regression_class used to measure association.
You only need to change it if you have multiple dependent variables.
- :r2yx (R^2_yx), the default option, is appropriate when the distinction between independent and dependent variables is arbitrary.
- :p2yx is appropriate when the distinction between independent and dependent variables is real.
# File 'lib/statsample/dominanceanalysis.rb', line 80
def method_association
  @method_association
end
#name ⇒ Object
Name of analysis
# File 'lib/statsample/dominanceanalysis.rb', line 63
def name
  @name
end
#predictors ⇒ Object
Array with independent variables. You can nest subarrays to test groups of predictors as blocks.
# File 'lib/statsample/dominanceanalysis.rb', line 68
def predictors
  @predictors
end
#regression_class ⇒ Object
Class to generate the regressions. Defaults to Statsample::Regression::Multiple::MatrixEngine, or to Statsample::Regression::Multiple::MultipleDependent when dependent is an array.
# File 'lib/statsample/dominanceanalysis.rb', line 61
def regression_class
  @regression_class
end
Class Method Details
Instance Method Details
#average_k(k) ⇒ Object
Hash with average for each k size model.
# File 'lib/statsample/dominanceanalysis.rb', line 285
def average_k(k)
  return nil if k==@predictors.size
  models=md_k(k)
  averages=@predictors.inject({}) {|a,v| a[v]=[];a}
  models.each do |m|
    @predictors.each do |f|
      averages[f].push(m.contributions[f]) unless m.contributions[f].nil?
    end
  end
  get_averages(averages)
end
#compute ⇒ Object
Compute models.
# File 'lib/statsample/dominanceanalysis.rb', line 138
def compute
  create_models
  fill_models
end
#conditional_dominance ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 255
def conditional_dominance
  pairs.inject({}){|a,pair|
    a[pair]=conditional_dominance_pairwise(pair[0], pair[1])
    a
  }
end
#conditional_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i conditionally dominates j (i cD j), 0 if j cD i, and 0.5 if undetermined.
# File 'lib/statsample/dominanceanalysis.rb', line 218
def conditional_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  for k in 1...@predictors.size
    a=average_k(k)
    if a[i]>a[j]
      dominances.push(1)
    elsif a[i]<a[j]
      dominances.push(0)
    else
      return 0.5
      #dominances.push(0.5)
    end
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]
end
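The rule implemented above can be sketched without Statsample: i conditionally dominates j when its average contribution is larger at every model size k. The example below is an illustrative rewrite under that reading; the `AVERAGES_BY_K` constant and the `conditional_dominance` helper are assumed names, and the numbers are copied from the "Model 0" and "k=... Average" rows of the sample report in the Overview.

```ruby
# Average additional contribution of each predictor by model size k,
# copied from the sample report (rounded values).
AVERAGES_BY_K = {
  0 => { a: 0.648, b: 0.265, c: 0.109 },
  1 => { a: 0.627, b: 0.244, c: 0.104 },
  2 => { a: 0.607, b: 0.224, c: 0.099 }
}.freeze

# Returns 1 if i conditionally dominates j, 0 if j conditionally
# dominates i, and 0.5 if there is a tie or the order changes with k.
def conditional_dominance(i, j, averages_by_k = AVERAGES_BY_K)
  signs = averages_by_k.values.map { |avg| avg[i] <=> avg[j] }.uniq
  return 0.5 unless signs.size == 1
  { 1 => 1, -1 => 0, 0 => 0.5 }[signs.first]
end

puts conditional_dominance(:a, :b)
puts conditional_dominance(:c, :b)
```

Collapsing the per-size comparisons with `uniq` mirrors the original method: a single distinct outcome means the ordering is stable across all sizes.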
#dominance_for_nil_model(i, j) ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 187
def dominance_for_nil_model(i,j)
  if md([i]).r2>md([j]).r2
    1
  elsif md([i]).r2<md([j]).r2
    0
  else
    0.5
  end
end
#general_averages ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 296
def general_averages
  if @general_averages.nil?
    averages=@predictors.inject({}) {|a,v| a[v]=[md([v]).r2];a}
    for k in 1...@predictors.size
      ak=average_k(k)
      @predictors.each do |f|
        averages[f].push(ak[f])
      end
    end
    @general_averages=get_averages(averages)
  end
  @general_averages
end
#general_dominance ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 260
def general_dominance
  pairs.inject({}){|a,pair|
    a[pair]=general_dominance_pairwise(pair[0], pair[1])
    a
  }
end
#general_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i generally dominates j (i gD j), 0 if j gD i, and 0.5 if undetermined.
# File 'lib/statsample/dominanceanalysis.rb', line 237
def general_dominance_pairwise(i,j)
  ga=general_averages
  if ga[i]>ga[j]
    1
  elsif ga[i]<ga[j]
    0
  else
    0.5
  end
end
#get_averages(averages) ⇒ Object
Given a hash whose values are arrays of numbers, returns a hash with the same keys where each value is the mean of the corresponding array.
# File 'lib/statsample/dominanceanalysis.rb', line 279
def get_averages(averages)
  out={}
  averages.each{ |key,val| out[key] = Daru::Vector.new(val).mean }
  out
end
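For readers without Daru at hand, the same per-key mean can be sketched in plain Ruby. This is an assumed equivalent rather than the library's code: the `mean_by_key` name is made up for the example, and it presumes every array is non-empty and holds only numbers (no missing values).

```ruby
# Plain-Ruby sketch of a per-key mean over a hash of numeric arrays.
# Assumes non-empty arrays without nil entries.
def mean_by_key(values_by_key)
  values_by_key.transform_values { |vals| vals.sum.to_f / vals.size }
end

# Contributions of a and b in the two k=1 models from the sample report.
p mean_by_key({ a: [0.612, 0.643], b: [0.229, 0.260] })
```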
#md(m) ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 266
def md(m)
  models_data[m.sort {|a,b| a.to_s <=> b.to_s}]
end
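The sort inside `md` normalizes the lookup key, so the same set of predictors finds the same model regardless of the order in which it is given. Comparing by `to_s` also lets symbols and nested arrays (predictor blocks) be ordered together, which a plain `sort` would reject. A small sketch of that normalization, with `model_key` as an assumed helper name:

```ruby
# Normalize a model (a list of predictors) so that any ordering of the
# same predictors yields the same key, mirroring the sort used in #md.
def model_key(predictors)
  predictors.sort { |a, b| a.to_s <=> b.to_s }
end

p model_key([:c, :a, :b])
# A block of predictors is a nested array; key order is still stable.
p model_key([:a, [:c, :b]]) == model_key([[:c, :b], :a])
```

Note that only the outer list is reordered; the contents of a nested block are left as given.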
#md_k(k) ⇒ Object
Gets all models of size k.
# File 'lib/statsample/dominanceanalysis.rb', line 270
def md_k(k)
  out=[]
  @models.each{ |m| out.push(md(m)) if m.size==k }
  out
end
#models ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 142
def models
  if @models.nil?
    compute
  end
  @models
end
#models_data ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 149
def models_data
  if @models_data.nil?
    compute
  end
  @models_data
end
#pairs ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 247
def pairs
  models.find_all{|m| m.size==2}
end
#report_building(g) ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 311
def report_building(g)
  compute if @models.nil?
  g.section(:name=>@name) do |generator|
    header=["","r2",_("sign")]+@predictors.collect {|c| DominanceAnalysis.predictor_name(c) }
    generator.table(:name=>_("Dominance Analysis result"), :header=>header) do |t|
      row=[_("Model 0"),"",""]+@predictors.collect{|f|
        sprintf("%0.3f",md([f]).r2)
      }
      t.row(row)
      t.hr
      for i in 1..@predictors.size
        mk=md_k(i)
        mk.each{|m|
          t.row(m.add_table_row)
        }
        # Report averages
        a=average_k(i)
        if !a.nil?
          t.hr
          row=[_("k=%d Average") % i,"",""] + @predictors.collect{|f|
            sprintf("%0.3f",a[f])
          }
          t.row(row)
          t.hr
        end
      end
      g=general_averages
      t.hr
      row=[_("Overall averages"),"",""]+@predictors.collect{|f|
        sprintf("%0.3f",g[f])
      }
      t.row(row)
    end
    td=total_dominance
    cd=conditional_dominance
    gd=general_dominance
    generator.table(:name=>_("Pairwise dominance"), :header=>[_("Pairs"),_("Total"),_("Conditional"),_("General")]) do |t|
      pairs.each{|pair|
        name=pair.map{|v| v.is_a?(Array) ? "("+v.join("-")+")" : v}.join(" - ")
        row=[name, sprintf("%0.1f",td[pair]), sprintf("%0.1f",cd[pair]), sprintf("%0.1f",gd[pair])]
        t.row(row)
      }
    end
  end
end
#total_dominance ⇒ Object
# File 'lib/statsample/dominanceanalysis.rb', line 250
def total_dominance
  pairs.inject({}){|a,pair|
    a[pair]=total_dominance_pairwise(pair[0], pair[1])
    a
  }
end
#total_dominance_pairwise(i, j) ⇒ Object
Returns 1 if i totally dominates j (i D j), 0 if j D i, and 0.5 if undetermined.
# File 'lib/statsample/dominanceanalysis.rb', line 197
def total_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  models_data.each do |k,m|
    if !m.contributions[i].nil? and !m.contributions[j].nil?
      if m.contributions[i]>m.contributions[j]
        dominances.push(1)
      elsif m.contributions[i]<m.contributions[j]
        dominances.push(0)
      else
        return 0.5
        #dominances.push(0.5)
      end
    end
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]
end
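Total dominance is the strictest of the three criteria: i must contribute more than j in every subset model that contains neither of them, including the empty model. The sketch below restates that rule in plain Ruby. The `SUBSET_R2` constant and the `total_dominance` helper are assumed names, the R^2 values are the rounded ones from the sample report in the Overview, and the empty model is assumed to have R^2 = 0.

```ruby
# R^2 per subset model, copied from the sample report (rounded values).
SUBSET_R2 = {
  []           => 0.0,
  [:a]         => 0.648,
  [:b]         => 0.265,
  [:c]         => 0.109,
  [:a, :b]     => 0.877,
  [:a, :c]     => 0.752,
  [:b, :c]     => 0.369,
  [:a, :b, :c] => 0.976
}.freeze

# Returns 1 if i totally dominates j, 0 if j totally dominates i,
# and 0.5 if there is a tie or the order differs between models.
def total_dominance(i, j, r2 = SUBSET_R2)
  # Only subsets containing neither i nor j allow a like-for-like comparison.
  base_models = r2.keys.reject { |s| s.include?(i) || s.include?(j) }
  signs = base_models.map do |s|
    gain_i = r2[(s + [i]).sort] - r2[s]
    gain_j = r2[(s + [j]).sort] - r2[s]
    gain_i <=> gain_j
  end.uniq
  return 0.5 unless signs.size == 1
  { 1 => 1, -1 => 0, 0 => 0.5 }[signs.first]
end

puts total_dominance(:a, :b)
puts total_dominance(:c, :a)
```

For the pair a and b, the comparison runs over the empty model and the model containing only c, which are the two subsets that include neither predictor.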