Class: Chomchom::Topic
- Inherits: Object
  - Object
    - Chomchom::Topic
- Defined in: lib/chomchom/topic.rb
Constant Summary
- MAX = 8
Instance Method Summary
- #initialize(text, title = '', title_weight = 1) ⇒ Topic (constructor)
  A new instance of Topic.
- #multiples ⇒ Object
  Intended for database storage rather than for the summary (to be moved into the topic method in chomchom.rb). Merges single words into 2-word and then 3-word phrases, subtracting merged occurrences from the single-word counts.
- #singles ⇒ Object
Constructor Details
#initialize(text, title = '', title_weight = 1) ⇒ Topic
Returns a new instance of Topic.
    # File 'lib/chomchom/topic.rb', line 8
    def initialize(text, title='', title_weight=1)
      # support unicode (requires Ruby 1.9.x)
      text = text.force_encoding("UTF-8")
      title = title.force_encoding("UTF-8")
      @content = title * title_weight + text.gsub(/\n+/,"\n")
      @content = @content.force_encoding("UTF-8").downcase
    end
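As a rough illustration of what the constructor does to its inputs, the sketch below repeats the same string operations in a standalone method. `build_content` is a hypothetical helper name, not part of Chomchom, and it calls `dup` first so the caller's strings are not modified by `force_encoding`.

```ruby
# Hypothetical standalone version of the normalization in Topic#initialize.
def build_content(text, title = '', title_weight = 1)
  text  = text.dup.force_encoding('UTF-8')
  title = title.dup.force_encoding('UTF-8')
  # String#* repeats the title with no separator between copies,
  # so a trailing space in the title keeps the words apart.
  content = title * title_weight + text.gsub(/\n+/, "\n")
  content.downcase
end

puts build_content("First line\n\n\nSecond line", "Ruby Topics ", 2)
# ruby topics ruby topics first line
# second line
```

Repeating the title `title_weight` times makes title words count more heavily once word frequencies are computed over the combined content.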
Instance Method Details
#multiples ⇒ Object
This is not for the benefit of the summary; it exists for database storage, so it should move into the topic method in chomchom.rb.
Merge words before sorting, which keeps the words in the order in which they appear.
Look at each word in single_groups and merge it with the others; this is O(n^2) and inefficient.
Just go through the list in order: for each pair, combine the words, then switch their order, and take whichever order generates the higher count.
Merge for 2-word phrases first, then 3-word phrases.
For triples, build only from the doubles, then combine with non-overlapping singles.
Subtract from the count every time occurrences are legally taken away (the combined count is more than 2 and the remainder is more than 2).
    # File 'lib/chomchom/topic.rb', line 30
    def multiples
    end
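The method body is empty in the source, so the following is only a sketch of one possible reading of the notes above, limited to the 2-word step. The names `phrase_count` and `merge_doubles` are hypothetical, and the rule used here (record a double when its count is more than 2, subtract that count from both singles, keep only singles whose remainder is still more than 2) is an assumption about what "legally take away" means.

```ruby
# Count whole-word occurrences of a phrase in the content.
def phrase_count(content, phrase)
  content.scan(/\b#{Regexp.escape(phrase)}\b/).size
end

# singles: array of [word, count] pairs, in order of appearance.
# Returns [doubles, remaining_singles], both arrays of [text, count].
def merge_doubles(content, singles, threshold = 2)
  counts  = singles.to_h
  doubles = []
  counts.keys.combination(2).each do |a, b| # every pair: O(n^2)
    forward  = phrase_count(content, "#{a} #{b}")
    backward = phrase_count(content, "#{b} #{a}")
    # Keep whichever word order occurs more often.
    phrase, n = forward >= backward ? ["#{a} #{b}", forward] : ["#{b} #{a}", backward]
    next unless n > threshold # combined count must be more than 2
    doubles << [phrase, n]
    counts[a] -= n # take the merged uses away from each single
    counts[b] -= n
  end
  remaining = counts.select { |_, c| c > threshold }.to_a
  [doubles, remaining]
end

content = "new york rocks new york rules new york wins new car new hat new pen new cup"
doubles, remaining = merge_doubles(content, [["new", 7], ["york", 3]])
p doubles   # [["new york", 3]]
p remaining # [["new", 4]]
```

Triples are not covered; following the notes, they would be built from the resulting doubles combined with non-overlapping singles.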
#singles ⇒ Object
    # File 'lib/chomchom/topic.rb', line 16
    def singles
      words = @content.split(' ')
                      .map { |w| w.downcase.gsub(/[^\p{Word}]/, '') }
                      .uniq
                      .delete_if { |w| !w or w.length<2 or w.is_common? }
      @singles = words.map { |w| [w, frequency(w)] }
      @singles = @singles.delete_if { |g| g[1] < 3}.sort { |a,b| b[1] <=> a[1] }
      @singles[0..MAX]
    end
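The listing depends on `String#is_common?` and `frequency`, neither of which is shown on this page. The sketch below approximates the same pipeline in a standalone method under stated assumptions: `STOP_WORDS` is a hypothetical stand-in for `is_common?`, frequency is taken to be a simple token count, and `top_singles` and `LIMIT` are illustrative names.

```ruby
# Hypothetical stand-in for String#is_common?
STOP_WORDS = %w[the and is of to in it on for with].freeze
LIMIT = 8 # mirrors Topic::MAX

# Returns [word, count] pairs for words seen at least 3 times,
# most frequent first.
def top_singles(content, limit = LIMIT)
  # Lowercase, split on spaces, strip non-word characters.
  tokens = content.downcase.split(' ').map { |w| w.gsub(/[^\p{Word}]/, '') }
  # Assumption: frequency means the number of matching tokens.
  counts = tokens.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
  counts
    .reject { |w, n| w.length < 2 || STOP_WORDS.include?(w) || n < 3 }
    .sort_by { |_, n| -n }
    .first(limit)
end

text = "Ruby, ruby. RUBY! ruby gems gems gems the the the the a a a code"
p top_singles(text) # [["ruby", 4], ["gems", 3]]
```

One difference worth noting: `@singles[0..MAX]` uses an inclusive range, so the original can return up to MAX + 1 entries, while `first(limit)` in the sketch returns at most `limit`.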