Module: RMMSeg::Chunk
- Defined in: lib/rmmseg/chunk.rb
Overview
A Chunk holds one or more successive Word objects.
Class Method Summary
- .average_length(words) ⇒ Object
  The average length of words.
- .degree_of_morphemic_freedom(words) ⇒ Object
  The sum of the frequencies of all one-character CJK words.
- .total_length(words) ⇒ Object
  The sum of the lengths of all words.
- .variance(words) ⇒ Object
  A measure of the spread of word lengths (as implemented, the square root of the sum of squared deviations from the average length).
Class Method Details
.average_length(words) ⇒ Object
The average length of words.
# File 'lib/rmmseg/chunk.rb', line 15
def self.average_length(words)
  total_length(words).to_f/words.size
end
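As a rough illustration of the arithmetic, the sketch below mirrors total_length and average_length on stand-in objects. StubWord is a hypothetical Struct introduced only so the snippet runs without the rmmseg gem; it is not part of the library, and the word lengths are made up.

```ruby
# StubWord is a hypothetical stand-in for RMMSeg::Word; only #length is used.
StubWord = Struct.new(:length)

# Mirrors Chunk.total_length: the sum of the word lengths.
def total_length(words)
  words.sum(&:length)
end

# Mirrors Chunk.average_length: total length divided by the word count.
# to_f forces floating-point division, as in the listing above.
def average_length(words)
  total_length(words).to_f / words.size
end

words = [StubWord.new(2), StubWord.new(1), StubWord.new(3)]
puts total_length(words)    # 6
puts average_length(words)  # 2.0
```

Note that an empty word list would divide by zero here and yield NaN, since 0.0 / 0 is a floating-point operation in Ruby.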
.degree_of_morphemic_freedom(words) ⇒ Object
The sum of the frequencies of all one-character CJK words. Words of any other length or type are ignored.
# File 'lib/rmmseg/chunk.rb', line 31
def self.degree_of_morphemic_freedom(words)
  sum = 0
  for word in words
    if word.length == 1 && word.type == Word::TYPES[:cjk_word]
      sum += word.frequency
    end
  end
  sum
end
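The filtering in the listing above can be sketched on stand-in data. FreqWord and CJK_WORD are hypothetical names used only for this example (CJK_WORD plays the role of Word::TYPES[:cjk_word]), and the frequencies are invented for illustration.

```ruby
# Hypothetical stand-ins, not part of rmmseg.
CJK_WORD = :cjk_word
FreqWord = Struct.new(:length, :type, :frequency)

# Mirrors Chunk.degree_of_morphemic_freedom: add up the frequencies of
# the words that are exactly one character long and of the CJK word type.
def degree_of_morphemic_freedom(words)
  words.select { |w| w.length == 1 && w.type == CJK_WORD }
       .sum(&:frequency)
end

words = [
  FreqWord.new(1, CJK_WORD, 120),  # counted
  FreqWord.new(2, CJK_WORD, 900),  # skipped: two characters long
  FreqWord.new(1, :ascii, 50),     # skipped: not a CJK word
  FreqWord.new(1, CJK_WORD, 30)    # counted
]
puts degree_of_morphemic_freedom(words)  # 150
```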
.total_length(words) ⇒ Object
The sum of the lengths of all words.
# File 'lib/rmmseg/chunk.rb', line 6
def self.total_length(words)
  len = 0
  for word in words
    len += word.length
  end
  len
end
.variance(words) ⇒ Object
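A measure of the spread of the word lengths. As implemented, this returns the square root of the sum of squared deviations from the average length, not the variance (the square of the standard deviation) itself.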
# File 'lib/rmmseg/chunk.rb', line 20
def self.variance(words)
  avglen = average_length(words)
  sqr_sum = 0.0
  for word in words
    tmp = word.length - avglen
    sqr_sum += tmp*tmp
  end
  Math.sqrt(sqr_sum)
end
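To make the returned quantity concrete, the sketch below computes the sum of squared deviations for three stand-in words and derives from it both the value the listing above returns and the population variance. LenWord and squared_deviation_sum are hypothetical names introduced for this example and are not part of rmmseg.

```ruby
# LenWord is a hypothetical stand-in for RMMSeg::Word; only #length is used.
LenWord = Struct.new(:length)

# Sum of squared deviations of the word lengths from their mean.
def squared_deviation_sum(words)
  avg = words.sum(&:length).to_f / words.size
  words.sum { |w| (w.length - avg)**2 }
end

words = [LenWord.new(1), LenWord.new(2), LenWord.new(3)]

# (1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2 = 2.0
sqr_sum = squared_deviation_sum(words)

# What the listing above returns: the square root of that sum.
as_listed = Math.sqrt(sqr_sum)

# The population variance divides the sum by the word count instead.
population_variance = sqr_sum / words.size

puts sqr_sum
puts as_listed
puts population_variance
```

Because the square root is monotonic, ordering word lists by the returned value gives the same order as ordering them by the sum of squared deviations.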