Class: Statsample::Bivariate::Tetrachoric
- Inherits:
-
Object
- Object
- Statsample::Bivariate::Tetrachoric
- Includes:
- Summarizable
- Defined in:
- lib/statsample/bivariate/tetrachoric.rb
Overview
Compute tetrachoric correlation.
The tetrachoric correlation is a measure of bivariate association arising when both observed variates are categorical variables that result from dichotomizing the two undelying continuous variables (Drasgow, 2006). The tetrachoric correlation is a good way to measure rater agreement (Uebersax, 2006)
This class uses Brown (1977) algorithm. You can see FORTRAN code on lib.stat.cmu.edu/apstat/116
Usage
With two variables x and y on a crosstab like this:
-------------
| y=0 | y=1 |
-------------
x = 0 | a | b |
-------------
x = 1 | c | d |
-------------
The code will be
tc=Statsample::Bivariate::Tetrachoric.new(a,b,c,d)
tc.r # correlation
tc.se # standard error
tc.threshold_y # threshold for y variable
tc.threshold_x # threshold for x variable
Reference:
-
Brown, MB. (1977) Algorithm AS 116: the tetrachoric correlation and its standard error. Applied Statistics, 26, 343-351.
-
Drasgow F. (2006). Polychoric and polyserial correlations. In Kotz L, Johnson NL (Eds.), Encyclopedia of statistical sciences. Vol. 7 (pp. 69-74). New York: Wiley.
-
Uebersax, J.S. (2006). The tetrachoric and polychoric correlation coefficients. Statistical Methods for Rater Agreement web site. 2006. Available at: john-uebersax.com/stat/tetra.htm . Accessed February, 11, 2010
Constant Summary collapse
- RequerimentNotMeet =
Class.new(Exception)
- TWOPI =
Math::PI*2
- SQT2PI =
2.50662827
- RLIMIT =
0.9999
- RCUT =
0.95
- UPLIM =
5.0
- CONST =
1E-36
- CHALF =
1E-18
- CONV =
1E-8
- CITER =
1E-6
- NITER =
25
- X =
[0,0.9972638618, 0.9856115115, 0.9647622556, 0.9349060759, 0.8963211558, 0.8493676137, 0.7944837960, 0.7321821187, 0.6630442669, 0.5877157572, 0.5068999089, 0.4213512761, 0.3318686023, 0.2392873623, 0.1444719616, 0.0483076657]
- W =
[0, 0.0070186100, 0.0162743947, 0.0253920653, 0.0342738629, 0.0428358980, 0.0509980593, 0.0586840935, 0.0658222228, 0.0723457941, 0.0781938958, 0.0833119242, 0.0876520930, 0.0911738787, 0.0938443991, 0.0956387201, 0.0965400885]
Instance Attribute Summary collapse
-
#name ⇒ Object
Name on the analysis.
-
#r ⇒ Object
readonly
Value for tethrachoric correlation.
-
#ruby_engine ⇒ Object
Use ruby version of algorithm.
Class Method Summary collapse
-
.new_with_matrix(m, opts = Hash.new) ⇒ Object
Creates a Tetrachoric object based on a 2x2 Matrix.
-
.new_with_vectors(v1, v2, opts = Hash.new) ⇒ Object
Creates a Tetrachoric object based on two vectors.
Instance Method Summary collapse
- #check_frequencies ⇒ Object
-
#compute_optimized ⇒ Object
Compute the tetrachoric correlation.
-
#compute_ruby ⇒ Object
Compute the tetrachoric correlation using ruby Called on object creation.
-
#initialize(a, b, c, d, opts = Hash.new) ⇒ Tetrachoric
constructor
Creates a new tetrachoric object for analysis.
-
#report_building(generator) ⇒ Object
:nodoc:.
-
#se ⇒ Object
Standard error.
-
#threshold_x ⇒ Object
Threshold for variable x (rows) Point on gauss curve under X rater select cases.
-
#threshold_y ⇒ Object
Threshold for variable y (columns) Point on gauss curve under Y rater select cases.
Constructor Details
#initialize(a, b, c, d, opts = Hash.new) ⇒ Tetrachoric
Creates a new tetrachoric object for analysis
146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 146 def initialize(a,b,c,d, opts=Hash.new) @a,@b,@c,@d=a,b,c,d opts_default={ :name=>_("Tetrachoric correlation"), :ruby_engine=>false } @opts=opts_default.merge opts @opts.each{|k,v| self.send("#{k}=",v) if self.respond_to? k} # # CHECK IF ANY CELL FREQUENCY IS NEGATIVE # raise "All frequencies should be positive" if (@a < 0 or @b < 0 or @c < 0 or @d < 0) compute end |
Instance Attribute Details
#name ⇒ Object
Name on the analysis
75 76 77 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 75 def name @name end |
#r ⇒ Object (readonly)
Value for tethrachoric correlation
73 74 75 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 73 def r @r end |
#ruby_engine ⇒ Object
Use ruby version of algorithm. By default, this attribute is set to false, and C version of algorithm is used
79 80 81 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 79 def ruby_engine @ruby_engine end |
Class Method Details
.new_with_matrix(m, opts = Hash.new) ⇒ Object
Creates a Tetrachoric object based on a 2x2 Matrix.
93 94 95 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 93 def self.new_with_matrix(m, opts=Hash.new) Tetrachoric.new(m[0,0], m[0,1], m[1,0],m[1,1], opts) end |
.new_with_vectors(v1, v2, opts = Hash.new) ⇒ Object
Creates a Tetrachoric object based on two vectors. The vectors are dichotomized previously.
98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 98 def self.new_with_vectors(v1,v2, opts=Hash.new) v1a, v2a=Statsample.only_valid(v1,v2) v1a=v1a.dichotomize v2a=v2a.dichotomize raise RequerimentNotMeet, "v1 have only 0" if v1a.factors==[0] raise RequerimentNotMeet, "v2 have only 0" if v2a.factors==[0] a,b,c,d = 0,0,0,0 v1a.each_index{|i| x,y=v1a[i],v2a[i] a+=1 if x==0 and y==0 b+=1 if x==0 and y==1 c+=1 if x==1 and y==0 d+=1 if x==1 and y==1 } Tetrachoric.new(a,b,c,d, opts) end |
Instance Method Details
#check_frequencies ⇒ Object
176 177 178 179 180 181 182 183 184 185 186 187 188 189 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 176 def check_frequencies # # CHECK IF ANY FREQUENCY IS 0.0 AND SET kdelta # @kdelta = 1 @kdelta = 2 if (@a == 0 or @d == 0) @kdelta += 2 if (@b == 0 or @c == 0) # # kdelta=4 MEANS TABLE HAS 0.0 ROW OR COLUMN, RUN IS TERMINATED # raise RequerimentNotMeet, "Rows and columns should have more than 0 items" if @kdelta==4 end |
#compute_optimized ⇒ Object
Compute the tetrachoric correlation
170 171 172 173 174 175 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 170 def compute_optimized check_frequencies t=Statsample::STATSAMPLE__.tetrachoric(@a,@b,@c,@d) raise "Error on calculation of tetrachoric correlation: #{t[:ifault]}" if t[:ifault]>0 @r,@sdr,@itype,@ifault,@zab, @zac = t[:r],t[:sdr],t[:itype],t[:ifault], t[:threshold_x], t[:threshold_y] end |
#compute_ruby ⇒ Object
Compute the tetrachoric correlation using ruby Called on object creation.
193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 193 def compute_ruby check_frequencies # # INITIALIZATION # @r = 0 sdzero = 0 @sdr = 0 @itype = 0 @ifault = 0 delta = 0 # GOTO (4, 1, 2 , 92), kdelta # # delta IS 0.0, 0.5 OR -0.5 ACCORDING TO WHICH CELL IS 0.0 # if(@kdelta==2) # 1 delta=0.5 @r=-1 if (@a==0 and @d==0) elsif(@kdelta==3) # 2 delta=-0.5 @r=1 if (@b==0 and @c==0) end # 4 if @r!=0 @itype=3 end # # STORE FREQUENCIES IN AA, BB, CC AND DD # @aa = @a + delta @bb = @b - delta @cc = @c - delta @dd = @d + delta @tot = @aa+@bb+@cc+@dd # # CHECK IF CORRELATION IS NEGATIVE, 0.0, POSITIVE # IF (AA * DD - BB * CC) 7, 5, 6 corr_dir=@aa * @dd - @bb * @cc if(corr_dir < 0) # 7 @probaa = @bb.quo(@tot) @probac = (@bb + @dd).quo(@tot) @ksign = 2 # -> 8 else if (corr_dir==0) # 5 @itype=4 end # 6 # # COMPUTE PROBABILITIES OF QUADRANT AND OF MARGINALS # PROBAA AND PROBAC CHOSEN SO THAT CORRELATION IS POSITIVE. # KSIGN INDICATES WHETHER QUADRANTS HAVE BEEN SWITCHED # @probaa = @aa.quo(@tot) @probac = (@aa+@cc).quo(@tot) @ksign=1 end # 8 @probab = (@aa+@bb).quo(@tot) # # COMPUTE NORMAL DEVIATES FOR THE MARGINAL FREQUENCIES # SINCE NO MARGINAL CAN BE 0.0, IE IS NOT CHECKED # @zac = Distribution::Normal.p_value(@probac.to_f) @zab = Distribution::Normal.p_value(@probab.to_f) @ss = Math::exp(-0.5 * (@zac ** 2 + @zab ** 2)).quo(TWOPI) # # WHEN R IS 0.0, 1.0 OR -1.0, TRANSFER TO COMPUTE SDZERO # if (@r != 0 or @itype > 0) compute_sdzero return true end # # WHEN MARGINALS ARE EQUAL, COSINE EVALUATION IS USED # if (@a == @b and @b == @c) calculate_cosine return true end # # INITIAL ESTIMATE OF CORRELATION IS YULES Y # @rr = ((Math::sqrt(@aa * @dd) - Math::sqrt(@bb * @cc)) ** 2) / (@aa * @dd - @bb * @cc).abs @iter = 0 begin # # IF RR EXCEEDS RCUT, GAUSSIAN QUADRATURE IS USED # #10 if @rr>RCUT gaussian_quadrature return true end # # TETRACHORIC SERIES IS COMPUTED # # INITIALIZATION # va=1.0 vb=@zac.to_f wa=1.0 wb=@zab.to_f term = 1.0 iterm = 0.0 @sum = @probab * @probac deriv = 0.0 sr = @ss #15 begin if(sr.abs<=CONST) # # RESCALE TERMS TO AVOID OVERFLOWS AND UNDERFLOWS # sr = sr / CONST va = va * CHALF vb = vb * CHALF wa = wa * CHALF wb = wb * CHALF end # # FORM SUM AND DERIVATIVE OF SERIES # # 20 dr = sr * va * wa sr = sr * @rr / term cof = sr * va * wa # # ITERM COUNTS NO. OF CONSECUTIVE TERMS < CONV # iterm+= 1 iterm=0 if (cof.abs > CONV) @sum = @sum + cof deriv += dr vaa = va waa = wa va = vb wa = wb vb = @zac * va - term * vaa wb = @zab * wa - term * waa term += 1 end while (iterm < 2 or term < 6) # # CHECK IF ITERATION CONVERGED # if((@sum-@probaa).abs <= CITER) @itype=term calculate_sdr return true end # # CALCULATE NEXT ESTIMATE OF CORRELATION # #25 @iter += 1 # # IF TOO MANY ITERATlONS, RUN IS TERMINATED # delta = (@sum - @probaa) / deriv @rrprev = @rr @rr = @rr - delta @rr += 0.5 * delta if(@iter == 1) @rr= RLIMIT if (@rr > RLIMIT) @rr =0 if (@rr < 0.0) end while @iter < NITER raise "Too many iteration" # GOTO 10 end |
#report_building(generator) ⇒ Object
:nodoc:
130 131 132 133 134 135 136 137 138 139 140 141 142 143 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 130 def report_building(generator) # :nodoc: generator.section(:name=>@name) do |s| s.table(:name=>_("Contingence Table"),:header=>["","Y=0","Y=1", "T"]) do |t| t.row(["X=0", @a,@b,@a+@b]) t.row(["X=1", @c,@d,@c+@d]) t.hr t.row(["T", @a+@c,@b+@d,@a+@b+@c+@d]) end s.text(sprintf("r: %0.3f",r)) s.text(_("SE: %0.3f") % se) s.text(_("Threshold X: %0.3f ") % threshold_x) s.text(_("Threshold Y: %0.3f ") % threshold_y ) end end |
#se ⇒ Object
Standard error
115 116 117 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 115 def se @sdr end |
#threshold_x ⇒ Object
Threshold for variable x (rows) Point on gauss curve under X rater select cases
120 121 122 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 120 def threshold_x @zab end |
#threshold_y ⇒ Object
Threshold for variable y (columns) Point on gauss curve under Y rater select cases
127 128 129 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 127 def threshold_y @zac end |