Class: Wikipedia::VandalismDetection::Features::Base
- Inherits:
- Object
- Defined in:
- lib/wikipedia/vandalism_detection/features/base.rb
Overview
This class is the base class for all feature classes in Wikipedia::VandalismDetection::Features.
Direct Known Subclasses
Anonymity, AnonymityPrevious, ArticleSize, Blanking, CharacterDiversity, CharacterSequence, CommentLength, CommentMarkupFrequency, Compressibility, ContainsBase, DigitRatio, EditsPerUser, EmoticonsFrequency, EmoticonsImpact, FrequencyBase, ImpactBase, InsertedCharacterDistribution, InsertedExternalLinks, InsertedInternalLinks, InsertedSize, InsertedWords, LongestWord, MarkupFrequency, MarkupImpact, NonAlphanumericRatio, RemovedCharacterDistribution, RemovedEmoticonsFrequency, RemovedMarkupFrequency, RemovedSize, RemovedWords, ReplacementSimilarity, RevisionsCharacterDistribution, SameEditor, SizeIncrement, SizeRatio, TimeInterval, TimeOfDay, UpperCaseRatio, UpperCaseWordsRatio, UpperToLowerCaseRatio, UserReputation, Weekday, WordsIncrement
Instance Method Summary
- #calculate(edit) ⇒ Object
  Base method for feature calculation.
- #count(terms, options = {}) ⇒ Object
  Counts the occurrences of a single term or of multiple terms in the given text.
Instance Method Details
#calculate(edit) ⇒ Object
Base method for feature calculation. This method should be overridden in the concrete feature subclasses.
Example:
def calculate(edit)
  super # raises ArgumentError unless edit is an Edit
  ... concrete calculation of feature out of edit...
end
# File 'lib/wikipedia/vandalism_detection/features/base.rb', line 21
def calculate(edit)
  raise ArgumentError.new "parameter should be an Edit" unless edit.kind_of? Edit
end
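The override pattern described for #calculate can be sketched as follows. This is a minimal illustration, not code from the library: the `Edit` struct, its `inserted_text` attribute, the local `Base` copy, and the `InsertedLength` feature are all hypothetical stand-ins, so that the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in for the library's Edit class (assumption: the real
# class exposes far more than a single text attribute).
Edit = Struct.new(:inserted_text)

# Local copy of the documented base behaviour: only the argument check.
class Base
  def calculate(edit)
    raise ArgumentError, "parameter should be an Edit" unless edit.kind_of?(Edit)
  end
end

# Hypothetical feature: the number of characters an edit inserted.
class InsertedLength < Base
  def calculate(edit)
    super # raises ArgumentError unless edit is an Edit
    edit.inserted_text.length
  end
end

feature = InsertedLength.new
puts feature.calculate(Edit.new("hello")) # prints 5

begin
  feature.calculate("not an edit")
rescue ArgumentError => e
  puts e.message # prints the message raised by Base#calculate
end
```

Calling `super` first keeps the type check in one place, so each concrete feature only has to implement its own calculation.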
#count(terms, options = {}) ⇒ Object
Counts the occurrences of a single term or of multiple terms in the given text.
Example of usage:
feature.count "and", in: text
feature.count ["and", "or"], in: text
# File 'lib/wikipedia/vandalism_detection/features/base.rb', line 33
def count(terms, options = {})
  terms_is_string = terms.is_a?(String)
  terms_is_array = terms.is_a?(Array)

  raise ArgumentError, "The second parameter should be a Hash of form {in: text}" unless options[:in]
  raise ArgumentError, "The first parameter should be an Array or String" unless (terms_is_array || terms_is_string)

  # Build a frequency table of the lowercased words, keyed by Symbol.
  words = options[:in].downcase
  freq = Hash.new(0)
  words.gsub(/[\.,'{2,}:\!\?\(\)]/, '').split.each{ |v| freq[v.to_sym] += 1 }

  if terms_is_string
    freq[terms.downcase.to_sym]
  else
    # Normalize each term like the single-String case so it matches the Symbol keys.
    terms.reduce(0) {|r, term| r + freq[term.downcase.to_sym] }
  end
end
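To make the behaviour of #count concrete, here is a small self-contained sketch. `CountSketch` is a hypothetical stand-in written for this example, not the library class: it follows the same steps (lowercase the text, strip punctuation, split on whitespace, look up each term), but uses String keys and a simplified punctuation set, so results may differ from the library on edge cases.

```ruby
# Hypothetical stand-in that follows the documented steps of #count.
class CountSketch
  def count(terms, options = {})
    raise ArgumentError, "The second parameter should be a Hash of form {in: text}" unless options[:in]
    unless terms.is_a?(Array) || terms.is_a?(String)
      raise ArgumentError, "The first parameter should be an Array or String"
    end

    # Frequency table of lowercased words with common punctuation removed.
    freq = Hash.new(0)
    options[:in].downcase.gsub(/[.,'{}:!?()]/, '').split.each { |word| freq[word] += 1 }

    # Array() wraps a single String, so both call styles share one code path.
    Array(terms).sum { |term| freq[term.downcase] }
  end
end

counter = CountSketch.new
text = "This and that, or this and THAT!"

puts counter.count("and", in: text)         # prints 2
puts counter.count(["and", "or"], in: text) # prints 3
puts counter.count("That", in: text)        # prints 2 (matching ignores case)
```

Omitting the `in:` option raises ArgumentError, as in the documented method.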