Class: Statsample::Test::UMannWhitney

Inherits:
Object
Includes:
Summarizable
Defined in:
lib/statsample/test/umannwhitney.rb

Overview

U Mann-Whitney test

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions

  • The two samples under investigation in the test are independent of each other and the observations within each sample are independent.

  • The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).

  • The variances in the two groups are approximately equal.

Larger differences between the distributions correspond to lower values of U.

Constant Summary

MAX_MN_EXACT = 10000

Maximum value of m*n allowed for exact calculation of the probability.

Methods included from Summarizable

#summary

Constructor Details

#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney

Creates a new U Mann-Whitney test. Params: two Daru::Vectors.

# File 'lib/statsample/test/umannwhitney.rb', line 118

def initialize(v1,v2, opts=Hash.new)
  @v1      = v1
  @v2      = v2
  v1_valid = v1.reject_values(*Daru::MISSING_VALUES).reset_index!
  v2_valid = v2.reject_values(*Daru::MISSING_VALUES).reset_index!
  @n1      = v1_valid.size
  @n2      = v2_valid.size
  data     = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  groups   = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds       = Daru::DataFrame.new({:g => groups, :data => data})
  @t       = nil
  @ties    = data.to_a.size != data.to_a.uniq.size        
  if @ties
    adjust_for_ties(ds[:data])
  end
  ds[:ranked] = ds[:data].ranked      
  @n = ds.nrows
    
  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u  = (u1 < u2) ? u1 : u2
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }       
end
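
The rank-sum arithmetic in the constructor can be followed without Daru. The sketch below is a plain-Ruby approximation, not the library code: `average_ranks` and `mann_whitney_u` are hypothetical helpers, and it assumes that ties receive the mean of the ranks they occupy.

```ruby
# Hypothetical helper: 1-based ranks, where tied values share the
# mean of the positions they occupy in the sorted list (an assumption
# about how ties are ranked).
def average_ranks(values)
  positions = {}
  values.sort.each_with_index do |v, i|
    (positions[v] ||= []) << (i + 1)
  end
  values.map { |v| Rational(positions[v].sum, positions[v].size) }
end

# Hypothetical helper mirroring the constructor's formulas for
# r1, r2, u1, u2 and u.
def mann_whitney_u(sample1, sample2)
  n1 = sample1.size
  n2 = sample2.size
  n  = n1 + n2
  ranks = average_ranks(sample1 + sample2)
  r1 = ranks.first(n1).sum                # rank sum of sample 1
  r2 = Rational(n * (n + 1), 2) - r1      # remaining rank sum
  u1 = r1 - Rational(n1 * (n1 + 1), 2)
  u2 = r2 - Rational(n2 * (n2 + 1), 2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

# Completely separated samples give the smallest possible U.
separated = mann_whitney_u([1, 2, 3, 4], [5, 6, 7, 8, 9])
# One tied value shared between the samples.
tied = mann_whitney_u([1, 2], [2, 3])
```

In both calls `u1 + u2` equals `n1 * n2`, which is a quick sanity check on any hand computation of U.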

Instance Attribute Details

#name ⇒ Object

Name of test

# File 'lib/statsample/test/umannwhitney.rb', line 112

def name
  @name
end

#r1 ⇒ Object (readonly)

Sample 1 Rank sum

# File 'lib/statsample/test/umannwhitney.rb', line 100

def r1
  @r1
end

#r2 ⇒ Object (readonly)

Sample 2 Rank sum

# File 'lib/statsample/test/umannwhitney.rb', line 102

def r2
  @r2
end

#t ⇒ Object (readonly)

Value of the compensation for ties (useful for demonstration)

# File 'lib/statsample/test/umannwhitney.rb', line 110

def t
  @t
end

#u ⇒ Object (readonly)

U Value

# File 'lib/statsample/test/umannwhitney.rb', line 108

def u
  @u
end

#u1 ⇒ Object (readonly)

Sample 1 U (useful for demonstration)

# File 'lib/statsample/test/umannwhitney.rb', line 104

def u1
  @u1
end

#u2 ⇒ Object (readonly)

Sample 2 U (useful for demonstration)

# File 'lib/statsample/test/umannwhitney.rb', line 106

def u2
  @u2
end

Class Method Details

.distribution_permutations(n1, n2) ⇒ Object

Generates the distribution of U over all permutations. Very expensive, but useful for demonstrations.

# File 'lib/statsample/test/umannwhitney.rb', line 78

def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)
  
  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
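
For small groups the same idea can be tried with the standard library alone. The sketch below is an illustration rather than the library method: `u_counts_by_enumeration` is a hypothetical name, it uses `Array#combination` in place of `Statsample::Permutation`, and it tallies how often each value of min(U1, U2) occurs instead of keying the result by permutation. It assumes untied ranks.

```ruby
# Hypothetical helper: for every way of choosing which n1 of the
# n1 + n2 untied ranks belong to group 1, compute min(U1, U2) and
# count how many arrangements produce each value.
def u_counts_by_enumeration(n1, n2)
  total  = n1 * n2
  counts = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |ranks_of_group1|
    u1 = ranks_of_group1.sum - (n1 * (n1 + 1)) / 2
    u2 = total - u1
    counts[[u1, u2].min] += 1
  end
  counts
end

counts = u_counts_by_enumeration(2, 2)
# Six arrangements in total; each of U = 0, 1, 2 occurs twice.
```

The number of arrangements grows as the binomial coefficient of n1 + n2 over n1, which is why enumeration is only practical for demonstrations.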

.u_sampling_distribution_as62(n1, n2) ⇒ Object

U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm. This is the algorithm used by SPSS.

Parameters:

  • n1: group 1 size

  • n2: group 2 size

Reference:

  • Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, Series C (Applied Statistics), 22(2), 269-273.

# File 'lib/statsample/test/umannwhitney.rb', line 31

def self.u_sampling_distribution_as62(n1,n2)

  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end
  
  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
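
To cross-check the shape of the returned array on small inputs, the distribution can also be built from the textbook counting recurrence for U. This is a different technique from AS 62, shown only for comparison; `count_u` and `folded_u_distribution` are hypothetical names, and untied observations are assumed.

```ruby
# Hypothetical helper: number of orderings of m + n untied observations
# with U = u. The largest observation either belongs to the first group
# (adding n to U) or to the second group (adding nothing).
def count_u(u, m, n, memo)
  return 0 if u < 0
  return (u.zero? ? 1 : 0) if m.zero? || n.zero?
  memo[[u, m, n]] ||= count_u(u - n, m - 1, n, memo) + count_u(u, m, n - 1, memo)
end

# Hypothetical helper: probabilities of min(U1, U2) = 0, 1, ..., folding
# the symmetric distribution of U in the same way as the final step of
# the method above.
def folded_u_distribution(n1, n2)
  max_u = n1 * n2
  memo  = {}
  freq  = (0..max_u).map { |u| count_u(u, n1, n2, memo) }
  total = freq.sum
  (0..(max_u / 2)).map do |i|
    ways = i == max_u - i ? freq[i] : freq[i] * 2
    Rational(ways, total)
  end
end

small = folded_u_distribution(2, 2)   # three equally likely values
```

The recurrence recomputes many subproblems without the memo table, so it is suited to checking small cases rather than replacing the generator.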

Instance Method Details

#probability_exact ⇒ Object

Exact probability of finding values of U lower than or equal to the sample value on the U distribution. Use with caution when m*n > 100000. Uses u_sampling_distribution_as62.

# File 'lib/statsample/test/umannwhitney.rb', line 162

def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
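
The accumulation step is a running sum over the first `u + 1` entries of the distribution. A minimal sketch, assuming a hand-worked distribution of min(U1, U2) for group sizes 3 and 2 with no ties (`exact_probability` is a hypothetical name):

```ruby
# Hypothetical helper: P(min(U1, U2) <= u) given probabilities indexed
# by the value of U. A fractional u (possible with ties) is truncated,
# as in the method above.
def exact_probability(dist, u)
  dist.first(u.to_i + 1).sum
end

# Hand-worked for n1 = 3, n2 = 2: 10 arrangements, with
# min(U1, U2) = 0, 1, 2, 3 occurring 2, 2, 4 and 2 times.
dist = [Rational(1, 5), Rational(1, 5), Rational(2, 5), Rational(1, 5)]

p_at_one = exact_probability(dist, 1)   # 1/5 + 1/5
```

Because the entries describe the smaller of the two U statistics, the sum behaves as a two-sided probability.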

#probability_z ⇒ Object

Assuming H_0, the two-tailed probability of a U value at least as extreme as the sample value, using the normal approximation. Use with more than 30 cases per group.

# File 'lib/statsample/test/umannwhitney.rb', line 202

def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
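
The same two-tailed value can be reproduced with Ruby's standard library, since (1 - Φ(|z|)) * 2 equals erfc(|z| / √2). The sketch below uses `Math.erfc` instead of `Distribution::Normal.cdf`; `two_tailed_p` is a hypothetical name.

```ruby
# Hypothetical helper: two-tailed p-value for a standard normal z,
# using the identity 2 * (1 - cdf(|z|)) == erfc(|z| / sqrt(2)).
def two_tailed_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

p_value = two_tailed_p(-1.96)   # close to 0.05
```

The sign of z does not matter, which matches the use of `z.abs` in the method above.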

#report_building(generator) ⇒ Object

:nodoc:

# File 'lib/statsample/test/umannwhitney.rb', line 147

def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2<MAX_MN_EXACT
        t.row([_("Exact p (Dinneen & Blakesley, 1973):"), "%0.3f" % probability_exact])
      end
    end
  end
end

#z ⇒ Object

Z value for U, adjusted for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain the probability for U.

Reference:

  • SPSS Manual

# File 'lib/statsample/test/umannwhitney.rb', line 187

def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
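
Read off the code above, the computation can be written out as follows. The symbols are notation introduced here for illustration: n = n_1 + n_2, and T stands for the value stored in `t` (its own formula is not shown in this section).

```latex
\mu_U = \frac{n_1 n_2}{2}
\qquad
\sigma_U =
\begin{cases}
  \sqrt{\dfrac{n_1 n_2 \,(n_1 + n_2 + 1)}{12}} & \text{no ties} \\[2ex]
  \sqrt{\dfrac{n_1 n_2}{n\,(n - 1)} \left( \dfrac{n^3 - n}{12} - T \right)} & \text{ties present}
\end{cases}
\qquad
z = \frac{U - \mu_U}{\sigma_U}
```

Setting T = 0 in the second case reduces it to the first, since (n^3 - n) / (n (n - 1)) = n + 1. Because U is the smaller of u1 and u2, it never exceeds the mean, so z is zero or negative.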