Class: LanguageDetector
- Inherits:
- Object (hierarchy: Object > LanguageDetector)
- Defined in:
- lib/unsupervised-language-detection/language-detector.rb
Overview
Given a set of sentences in multiple languages, build a classifier to detect the majority language.
Instance Attribute Summary
- #classifier ⇒ Object (readonly)
  Returns the value of attribute classifier.
Class Method Summary
- .load_yaml(filename) ⇒ Object
  Loads the language model from a file.
Instance Method Summary
- #classify(sentence) ⇒ Object
  Returns the (named) category the sentence belongs to.
- #initialize(options = {}) ⇒ LanguageDetector (constructor)
  A new instance of LanguageDetector.
- #probabilities(sentence) ⇒ Object
- #train(max_epochs, training_sentences) ⇒ Object
- #yamlize(filename) ⇒ Object
  Dumps the language model to a file.
Constructor Details
#initialize(options = {}) ⇒ LanguageDetector
Returns a new instance of LanguageDetector.
# File 'lib/unsupervised-language-detection/language-detector.rb', line 39
def initialize(options = {})
  options = {:ngram_size => 3}.merge(options)
  @ngram_size = options[:ngram_size]
  @classifier = NaiveBayesClassifier.new(:num_categories => 2)
end
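The constructor relies on the common Ruby idiom of merging caller-supplied options over a hash of defaults, so that `:ngram_size` falls back to 3 when it is not given. The sketch below isolates that idiom; `DEFAULT_OPTIONS` and `resolve_options` are hypothetical names used only for illustration and are not part of the library.

```ruby
# Hypothetical helper illustrating the defaults-merge idiom used by #initialize.
DEFAULT_OPTIONS = {:ngram_size => 3}.freeze

def resolve_options(options = {})
  # Keys present in `options` win; missing keys fall back to the defaults.
  DEFAULT_OPTIONS.merge(options)
end

resolve_options                     # uses the default n-gram size
resolve_options(:ngram_size => 2)   # caller overrides the default
```

Under this idiom, `LanguageDetector.new` and `LanguageDetector.new(:ngram_size => 3)` should configure the same n-gram size.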
Instance Attribute Details
#classifier ⇒ Object (readonly)
Returns the value of attribute classifier.
# File 'lib/unsupervised-language-detection/language-detector.rb', line 37
def classifier
  @classifier
end
Class Method Details
.load_yaml(filename) ⇒ Object
Loads the language model from a file.
# File 'lib/unsupervised-language-detection/language-detector.rb', line 73
def self.load_yaml(filename)
  return YAML::load(File.read(filename))
end
Instance Method Details
#classify(sentence) ⇒ Object
Returns the (named) category the sentence belongs to.
# File 'lib/unsupervised-language-detection/language-detector.rb', line 56
def classify(sentence)
  category_index = @classifier.classify(sentence.to_ngrams(@ngram_size))
  @classifier.category_names[category_index]
end
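`#classify` does two things: it converts the sentence to n-grams and then maps the classifier's numeric category index to a name. The `String#to_ngrams` extension is not shown in this section, so the sketch below substitutes a hypothetical `char_ngrams` helper that builds overlapping, lowercased character n-grams; the library's own method may normalize or split the text differently.

```ruby
# Hypothetical stand-in for the library's String#to_ngrams (not shown here).
# Assumption: overlapping character windows over the lowercased text.
def char_ngrams(text, n)
  text.downcase.chars.each_cons(n).map(&:join)
end

# Mirrors the last line of #classify: translate a category index into its name.
def category_name(category_names, category_index)
  category_names[category_index]
end

char_ngrams("Hello", 3)                       # => ["hel", "ell", "llo"]
category_name(%w(majority minority), 1)       # => "minority"
```

A string shorter than `n` yields an empty list under this sketch, since no full window fits.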
#probabilities(sentence) ⇒ Object
# File 'lib/unsupervised-language-detection/language-detector.rb', line 61
def probabilities(sentence)
  @classifier.get_posterior_category_probabilities(sentence.to_ngrams(@ngram_size))
end
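`#probabilities` delegates to the classifier's `get_posterior_category_probabilities`, whose implementation is not shown in this section. As a rough illustration of what a naive Bayes posterior over categories generally looks like, the sketch below multiplies each category's prior by the likelihood of every observed n-gram (in log space) and normalizes. The function name, the `likelihoods` layout, and the `floor` value for unseen n-grams are all assumptions, not the library's API.

```ruby
# Hypothetical sketch of a naive Bayes posterior; not the library's code.
# priors:      array of P(category)
# likelihoods: array of hashes mapping n-gram => P(n-gram | category)
# floor:       assumed probability for n-grams a category has never seen
def posterior_probabilities(priors, likelihoods, ngrams, floor = 1e-6)
  log_scores = priors.each_with_index.map do |prior, i|
    ngrams.reduce(Math.log(prior)) do |sum, gram|
      sum + Math.log(likelihoods[i].fetch(gram, floor))
    end
  end
  # Subtract the maximum before exponentiating to avoid underflow.
  max = log_scores.max
  weights = log_scores.map { |score| Math.exp(score - max) }
  total = weights.sum
  weights.map { |weight| weight / total }
end

priors = [0.5, 0.5]
likelihoods = [{"the" => 0.2}, {"the" => 0.05}]
posterior_probabilities(priors, likelihoods, ["the"])  # roughly [0.8, 0.2]
```

The returned values sum to 1, with one entry per category in the same order as the priors.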
#train(max_epochs, training_sentences) ⇒ Object
# File 'lib/unsupervised-language-detection/language-detector.rb', line 45
def train(max_epochs, training_sentences)
  @classifier = NaiveBayesClassifier.train_em(max_epochs, training_sentences.map{ |sentence| sentence.to_ngrams(@ngram_size) })
  @classifier.category_names =
    if @classifier.get_prior_category_probability(0) > @classifier.get_prior_category_probability(1)
      %w( majority minority )
    else
      %w( minority majority )
    end
end
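After EM training, `#train` names the two categories by comparing their prior probabilities: the category with the strictly larger prior becomes "majority" and the other "minority". The sketch below isolates that rule; `name_categories` is a hypothetical helper, not a library method.

```ruby
# Hypothetical helper reproducing the naming rule at the end of #train.
# Returns names indexed by category: names[0] is category 0, names[1] is category 1.
def name_categories(prior0, prior1)
  if prior0 > prior1
    %w(majority minority)
  else
    # Also taken on a tie, matching the strict `>` comparison in the source.
    %w(minority majority)
  end
end

name_categories(0.7, 0.3)  # => ["majority", "minority"]
name_categories(0.3, 0.7)  # => ["minority", "majority"]
```

Because the comparison is strict, equal priors label category 1 as "majority".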
#yamlize(filename) ⇒ Object
Dumps the language model to a file.
# File 'lib/unsupervised-language-detection/language-detector.rb', line 66
def yamlize(filename)
  File.open(filename, "w") do |f|
    f.puts self.to_yaml
  end
end
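`#yamlize` and `.load_yaml` together form a save/restore round trip: the first writes `to_yaml` output to a file, the second reads it back with `YAML::load`. The sketch below shows the same round trip with a hypothetical `TinyModel` class standing in for a trained detector. One caveat to check against your Ruby version: with Psych 4 (bundled with Ruby 3.1 and later), `YAML.load` no longer deserializes arbitrary objects by default, so the sketch uses `YAML.unsafe_load` when it is available. Only load files you trust this way.

```ruby
require "yaml"
require "tempfile"

# Hypothetical stand-in for a trained LanguageDetector.
class TinyModel
  attr_reader :ngram_size

  def initialize(ngram_size)
    @ngram_size = ngram_size
  end
end

# Same shape as #yamlize: write the object's YAML form to a file.
def save_model(model, filename)
  File.open(filename, "w") { |f| f.puts model.to_yaml }
end

# Same shape as .load_yaml, but tolerant of Psych 4's safe-by-default YAML.load.
def load_model(filename)
  text = File.read(filename)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

# Saves the model to a temporary file and reads it back.
def round_trip(model)
  Tempfile.create(["model", ".yml"]) do |file|
    file.close
    save_model(model, file.path)
    load_model(file.path)
  end
end

restored = round_trip(TinyModel.new(3))
restored.ngram_size  # => 3
```

The restored object is a new instance with the same instance variables, not the original object.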