Class: Statsample::Factor::ParallelAnalysis
- Includes: DirtyMemoize, Summarizable
- Defined in: lib/statsample/factor/parallelanalysis.rb
Overview
Performs Horn’s ‘parallel analysis’ on a principal components analysis, to adjust for sampling bias in the retention of components. The bootstrap samples can be created from random data (using only the number of cases and variables), from parameters of the actual data (mean and standard deviation of each variable), or by bootstrap sampling of the actual data.
Description
“PA involves the construction of a number of correlation matrices of random variables based on the same sample size and number of variables in the real data set. The average eigenvalues from the random correlation matrices are then compared to the eigenvalues from the real data correlation matrix, such that the first observed eigenvalue is compared to the first random eigenvalue, the second observed eigenvalue is compared to the second random eigenvalue, and so on.” (Hayton, Allen & Scarpello, 2004, p.194)
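The comparison described in the quotation can be sketched in plain Ruby. This is a minimal illustration, not the library's code: the method names and the eigenvalues are hypothetical, and stopping at the first component that fails the comparison is an assumption borrowed from #number_of_factors in this class.

```ruby
# Mean of the random eigenvalues at each position (first, second, ...).
def mean_eigenvalues(random_sets)
  n = random_sets.first.size
  (0...n).map { |i| random_sets.sum { |set| set[i] } / random_sets.size.to_f }
end

# Count the leading components whose observed eigenvalue exceeds the
# mean random eigenvalue at the same position; stop at the first miss.
def components_above_random_mean(observed, random_sets)
  means = mean_eigenvalues(random_sets)
  observed.zip(means).take_while { |obs, rnd| obs > rnd }.size
end

# Hypothetical eigenvalues, for illustration only.
observed    = [3.1, 1.4, 0.7, 0.6]
random_sets = [[1.5, 1.2, 1.0, 0.3],
               [1.7, 1.0, 0.8, 0.5]]

puts components_above_random_mean(observed, random_sets) # prints 2
```

The fourth observed eigenvalue is larger than its random counterpart, but it is not counted because the third one already failed.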
Usage
*With real dataset*
# ds should be any valid dataset
pa=Statsample::Factor::ParallelAnalysis.new(ds, :iterations=>100, :bootstrap_method=>:data)
*With number of cases and variables*
pa=Statsample::Factor::ParallelAnalysis.with_random_data(100,8)
Reference
-
Hayton, J., Allen, D. & Scarpello, V. (2004). Factor Retention Decisions in Exploratory Factor Analysis: A Tutorial on Parallel Analysis. Organizational Research Methods, 7(2), 191-205.
-
O’Connor, B. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396-402.
-
Liu, O., & Rijmen, F. (2008). A modified procedure for parallel analysis of ordered categorical data. Behavior Research Methods, 40(2), 556-562.
Instance Attribute Summary
-
#bootstrap_method ⇒ Object
Bootstrap method.
-
#debug ⇒ Object
Show extra information if true.
-
#ds ⇒ Object
readonly
Dataset.
-
#ds_eigenvalues ⇒ Object
readonly
Dataset with bootstrapped eigenvalues.
-
#iterations ⇒ Object
Number of random sets to produce.
-
#matrix_method ⇒ Object
Correlation matrix method used with :raw_data.
-
#n_variables ⇒ Object
Number of eigenvalues to calculate.
-
#name ⇒ Object
Name of analysis.
-
#no_data ⇒ Object
Perform analysis without actual data.
-
#percentil ⇒ Object
Percentile of the bootstrap eigenvalues that an observed eigenvalue must exceed to be accepted.
-
#smc ⇒ Object
Uses squared multiple correlations (SMC) on the diagonal of the matrices, to simulate a Principal Axis analysis.
-
#use_gsl ⇒ Object
Returns the value of attribute use_gsl.
Class Method Summary
Instance Method Summary
-
#compute ⇒ Object
Perform calculation.
-
#initialize(ds, opts = Hash.new) ⇒ ParallelAnalysis
constructor
A new instance of ParallelAnalysis.
-
#number_of_factors ⇒ Object
Number of factors to retain.
-
#report_building(g) ⇒ Object
:nodoc:.
Methods included from Summarizable
Constructor Details
#initialize(ds, opts = Hash.new) ⇒ ParallelAnalysis
Returns a new instance of ParallelAnalysis.
# File 'lib/statsample/factor/parallelanalysis.rb', line 62
def initialize(ds, opts=Hash.new)
  @ds=ds
  @fields=@ds.vectors.to_a
  @n_variables=@fields.size
  @n_cases=ds.nrows
  opts_default={
    :name=>_("Parallel Analysis"),
    :iterations=>50, # See Liu and Rijmen (2008)
    :bootstrap_method => :random,
    :smc=>false,
    :percentil=>95,
    :debug=>false,
    :no_data=>false,
    :matrix_method=>:correlation_matrix
  }
  @use_gsl=Statsample.has_gsl?
  @opts=opts_default.merge(opts)
  # Note: `==` compares and discards the result, so this line has no effect;
  # an assignment (`=`) was probably intended.
  @opts[:matrix_method]==:correlation_matrix if @opts[:bootstrap_method]==:parameters
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
Instance Attribute Details
#bootstrap_method ⇒ Object
Bootstrap method. :random is used by default.
- :random: uses the number of variables and cases of the dataset.
- :data: samples with replacement from the actual data.
# File 'lib/statsample/factor/parallelanalysis.rb', line 43
def bootstrap_method
  @bootstrap_method
end
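The two bootstrap methods differ only in how each simulated column is filled. The following is a minimal sketch in plain Ruby, not the library's implementation: the method names are hypothetical, and a Box-Muller transform stands in for the Distribution::Normal.rng generator that the class uses.

```ruby
# Box-Muller transform: one standard normal draw from two uniform draws.
def standard_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# :random style: ignore the observed values, keep only the number of cases.
def random_column(n_cases, rng)
  Array.new(n_cases) { standard_normal(rng) }
end

# :data style: draw n_cases values from the observed column, with replacement.
def resampled_column(values, n_cases, rng)
  Array.new(n_cases) { values[rng.rand(values.size)] }
end

rng      = Random.new(42)            # fixed seed so runs are repeatable
observed = [2.0, 4.0, 4.0, 5.0, 7.0] # hypothetical variable

p random_column(observed.size, rng)
p resampled_column(observed, observed.size, rng)
```

With :data, every simulated value comes from the observed column, so the simulated variables keep the observed marginal distribution while the column-wise resampling breaks the association between variables.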
#debug ⇒ Object
Show extra information if true
# File 'lib/statsample/factor/parallelanalysis.rb', line 60
def debug
  @debug
end
#ds ⇒ Object (readonly)
Dataset. You can use mock vectors when using the :random bootstrap method.
# File 'lib/statsample/factor/parallelanalysis.rb', line 39
def ds
  @ds
end
#ds_eigenvalues ⇒ Object (readonly)
Dataset with bootstrapped eigenvalues
# File 'lib/statsample/factor/parallelanalysis.rb', line 56
def ds_eigenvalues
  @ds_eigenvalues
end
#iterations ⇒ Object
Number of random sets to produce. 50 by default.
# File 'lib/statsample/factor/parallelanalysis.rb', line 35
def iterations
  @iterations
end
#matrix_method ⇒ Object
Correlation matrix method used with :raw_data. :correlation_matrix is used by default.
# File 'lib/statsample/factor/parallelanalysis.rb', line 51
def matrix_method
  @matrix_method
end
#n_variables ⇒ Object
Number of eigenvalues to calculate. Should be set for Principal Axis Analysis.
# File 'lib/statsample/factor/parallelanalysis.rb', line 54
def n_variables
  @n_variables
end
#name ⇒ Object
Name of analysis
# File 'lib/statsample/factor/parallelanalysis.rb', line 37
def name
  @name
end
#no_data ⇒ Object
Perform analysis without actual data.
# File 'lib/statsample/factor/parallelanalysis.rb', line 58
def no_data
  @no_data
end
#percentil ⇒ Object
Percentile of the bootstrap eigenvalues that an observed eigenvalue must exceed to be accepted. 95 by default.
# File 'lib/statsample/factor/parallelanalysis.rb', line 49
def percentil
  @percentil
end
#smc ⇒ Object
Uses squared multiple correlations (SMC) on the diagonal of the matrices, to simulate a Principal Axis analysis. False by default.
# File 'lib/statsample/factor/parallelanalysis.rb', line 47
def smc
  @smc
end
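The SMC substitution can be sketched in plain Ruby. The formula, 1 - 1/d for each diagonal element d of the inverse correlation matrix, is the one used in #compute; everything else here is an assumption made for illustration, including the method names and the hand-written Gauss-Jordan inversion that stands in for the matrix library's own inverse.

```ruby
# Invert a square matrix (array of row arrays) by Gauss-Jordan elimination
# with partial pivoting. Raises if the matrix is numerically singular.
def invert(matrix)
  n = matrix.size
  # Augment each row with the matching row of the identity matrix.
  aug = matrix.each_with_index.map do |row, i|
    row.map(&:to_f) + Array.new(n) { |j| i == j ? 1.0 : 0.0 }
  end
  n.times do |col|
    pivot = (col...n).max_by { |r| aug[r][col].abs }
    raise ArgumentError, 'matrix is singular' if aug[pivot][col].abs < 1e-12
    aug[col], aug[pivot] = aug[pivot], aug[col]
    scale = aug[col][col]
    aug[col] = aug[col].map { |v| v / scale }
    n.times do |r|
      next if r == col
      factor = aug[r][col]
      aug[r] = aug[r].each_with_index.map { |v, k| v - factor * aug[col][k] }
    end
  end
  aug.map { |row| row.drop(n) } # right half now holds the inverse
end

# SMC of each variable: 1 - 1 / (matching diagonal element of the inverse).
def smc(correlation)
  inverse = invert(correlation)
  Array.new(correlation.size) { |i| 1.0 - 1.0 / inverse[i][i] }
end

# Replace the unit diagonal with the SMC values, leaving the rest untouched.
def with_smc_diagonal(correlation)
  values = smc(correlation)
  correlation.each_with_index.map do |row, i|
    row.each_with_index.map { |v, j| i == j ? values[i] : v }
  end
end

r = [[1.0, 0.5],
     [0.5, 1.0]]
p smc(r)
```

For two variables with correlation r, each SMC equals r squared, so the example yields values of 0.25 up to floating-point rounding.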
#use_gsl ⇒ Object
Returns the value of attribute use_gsl.
# File 'lib/statsample/factor/parallelanalysis.rb', line 61
def use_gsl
  @use_gsl
end
Class Method Details
.with_random_data(cases, vars, opts = Hash.new) ⇒ Object
# File 'lib/statsample/factor/parallelanalysis.rb', line 24
def self.with_random_data(cases,vars,opts=Hash.new)
  ds= Daru::DataFrame.new({},
    order: vars.times.map {|i| "v#{i+1}".to_sym},
    index: cases )
  opts=opts.merge({:bootstrap_method=> :random, :no_data=>true})
  new(ds, opts)
end
Instance Method Details
#compute ⇒ Object
Performs the calculation. Should not be called directly by the user.
# File 'lib/statsample/factor/parallelanalysis.rb', line 122
def compute
  @original=Statsample::Bivariate.send(matrix_method, @ds).eigenvalues unless no_data
  @ds_eigenvalues=Daru::DataFrame.new({}, order: (1..@n_variables).map{|v| ("ev_%05d" % v).to_sym})
  if bootstrap_method==:parameter or bootstrap_method==:random
    rng = Distribution::Normal.rng
  end
  @iterations.times do |i|
    begin
      puts "#{@name}: Iteration #{i}" if $DEBUG or debug
      # Create a dataset of dummy values
      ds_bootstrap = Daru::DataFrame.new({}, order: @ds.vectors, index: @n_cases)
      @fields.each do |f|
        if bootstrap_method==:random
          ds_bootstrap[f] = Daru::Vector.new(@n_cases.times.map {|c| rng.call})
        elsif bootstrap_method==:data
          ds_bootstrap[f] = ds[f].sample_with_replacement(@n_cases)
        else
          raise "bootstrap_method doesn't recogniced"
        end
      end
      matrix=Statsample::Bivariate.send(matrix_method, ds_bootstrap)
      matrix=matrix.to_gsl if @use_gsl
      if smc
        smc_v=matrix.inverse.diagonal.map{|ii| 1-(1.quo(ii))}
        smc_v.each_with_index do |v,ii|
          matrix[ii,ii]=v
        end
      end
      ev=matrix.eigenvalues
      @ds_eigenvalues.add_row(ev)
    rescue Statsample::Bivariate::Tetrachoric::RequerimentNotMeet => e
      puts "Error: #{e}" if $DEBUG
      redo
    end
  end
end
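The simulation loop for the :random method can be approximated without any gems. This is a rough sketch under stated assumptions, not the library's code: all method names are hypothetical, Box-Muller draws replace Distribution::Normal.rng, a Pearson correlation matrix replaces the configurable matrix_method, and a cyclic Jacobi routine replaces the matrix library's eigenvalue solver. Returning the eigenvalues sorted from largest to smallest is also a choice made here.

```ruby
# Standard normal draw (Box-Muller transform).
def standard_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Pearson correlation matrix of a list of equally sized columns.
def correlation_matrix(columns)
  centered = columns.map do |col|
    mean = col.sum / col.size.to_f
    col.map { |v| v - mean }
  end
  norms = centered.map { |col| Math.sqrt(col.sum { |v| v * v }) }
  centered.each_with_index.map do |a, i|
    centered.each_with_index.map do |b, j|
      a.zip(b).sum { |x, y| x * y } / (norms[i] * norms[j])
    end
  end
end

# Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first.
def symmetric_eigenvalues(matrix, max_sweeps: 50, tolerance: 1e-12)
  a = matrix.map { |row| row.map(&:to_f) }
  n = a.size
  max_sweeps.times do
    off = 0.0 # sum of squared off-diagonal elements
    n.times { |i| n.times { |j| off += a[i][j]**2 unless i == j } }
    break if off < tolerance
    (0...n - 1).each do |i|
      (i + 1...n).each do |j|
        next if a[i][j].abs < 1e-15
        theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
        t = (theta >= 0 ? 1.0 : -1.0) / (theta.abs + Math.sqrt(theta**2 + 1.0))
        c = 1.0 / Math.sqrt(t**2 + 1.0)
        s = t * c
        n.times do |k| # rotate columns i and j
          x, y = a[k][i], a[k][j]
          a[k][i] = c * x - s * y
          a[k][j] = s * x + c * y
        end
        n.times do |k| # rotate rows i and j
          x, y = a[i][k], a[j][k]
          a[i][k] = c * x - s * y
          a[j][k] = s * x + c * y
        end
      end
    end
  end
  Array.new(n) { |i| a[i][i] }.sort.reverse
end

# One row of eigenvalues per iteration, mirroring the rows of ds_eigenvalues.
def simulate_eigenvalues(n_cases, n_variables, iterations, rng)
  Array.new(iterations) do
    columns = Array.new(n_variables) { Array.new(n_cases) { standard_normal(rng) } }
    symmetric_eigenvalues(correlation_matrix(columns))
  end
end

rows = simulate_eigenvalues(100, 4, 20, Random.new(7))
rows.first(3).each { |ev| puts ev.map { |v| format('%.4f', v) }.join(' ') }
```

Each row sums to the number of variables, because the trace of a correlation matrix equals its dimension and the rotations preserve it.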
#number_of_factors ⇒ Object
Number of factors to retain.
# File 'lib/statsample/factor/parallelanalysis.rb', line 83
def number_of_factors
  total=0
  ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
    if (@original[i]>0 and @original[i]>ds_eigenvalues[f].percentil(percentil))
      total+=1
    else
      break
    end
  end
  total
end
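The retention rule above can be tried on plain arrays. This is a minimal sketch with hypothetical names and values. The nearest-rank percentile is an assumption made for this example; the percentil method called by the class may interpolate differently and give slightly different thresholds.

```ruby
# Nearest-rank percentile of a list of numbers (an assumption for this
# sketch; the library's own percentile routine may behave differently).
def nearest_rank_percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Count the leading components whose observed eigenvalue is positive and
# exceeds the chosen percentile of the simulated eigenvalues at the same
# position; stop at the first component that fails.
def retained_factors(observed, simulated_rows, pct = 95)
  total = 0
  observed.each_with_index do |value, i|
    threshold = nearest_rank_percentile(simulated_rows.map { |row| row[i] }, pct)
    break unless value > 0 && value > threshold
    total += 1
  end
  total
end

# Hypothetical values, for illustration only.
observed  = [2.9, 1.3, 0.5]
simulated = [[1.4, 1.0, 0.7],
             [1.5, 1.1, 0.6],
             [1.3, 1.2, 0.8],
             [1.6, 0.9, 0.5]]

puts retained_factors(observed, simulated) # prints 2
```

With only four simulated rows, the 95th nearest-rank percentile is simply the largest simulated value in each position, which is why the third component (0.5 against 0.8) is rejected.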
#report_building(g) ⇒ Object
:nodoc:
# File 'lib/statsample/factor/parallelanalysis.rb', line 94
def report_building(g) #:nodoc:
  g.section(:name=>@name) do |s|
    s.text _("Bootstrap Method: %s") % bootstrap_method
    s.text _("Uses SMC: %s") % (smc ? _("Yes") : _("No"))
    s.text _("Correlation Matrix type : %s") % matrix_method
    s.text _("Number of variables: %d") % @n_variables
    s.text _("Number of cases: %d") % @n_cases
    s.text _("Number of iterations: %d") % @iterations
    if @no_data
      s.table(:name=>_("Eigenvalues"), :header=>[_("n"), _("generated eigenvalue"), "p.#{percentil}"]) do |t|
        ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
          v=ds_eigenvalues[f]
          t.row [i+1, "%0.4f" % v.mean, "%0.4f" % v.percentil(percentil), ]
        end
      end
    else
      s.text _("Number or factors to preserve: %d") % number_of_factors
      s.table(:name=>_("Eigenvalues"), :header=>[_("n"), _("data eigenvalue"), _("generated eigenvalue"),"p.#{percentil}",_("preserve?")]) do |t|
        ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
          v=ds_eigenvalues[f]
          t.row [i+1, "%0.4f" % @original[i], "%0.4f" % v.mean, "%0.4f" % v.percentil(percentil), (v.percentil(percentil)>0 and @original[i] > v.percentil(percentil)) ? "Yes":""]
        end
      end
    end
  end
end