Class: LanguageDetector

Inherits: Object

Defined in: lib/unsupervised-language-detection/language-detector.rb

Overview

Given a set of sentences in multiple languages, build a classifier to detect the majority language.

Instance Attribute Summary

#classifier ⇒ Object readonly

Returns the value of attribute classifier.

Class Method Summary

.load_yaml(filename) ⇒ Object

Loads the language model from a file.

Instance Method Summary

#classify(sentence) ⇒ Object

Returns the name of the category the sentence belongs to: "majority" or "minority" once #train has run.
#initialize(options = {}) ⇒ LanguageDetector constructor

A new instance of LanguageDetector.
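#probabilities(sentence) ⇒ Object

Returns the classifier's posterior category probabilities for the sentence.
#train(max_epochs, training_sentences) ⇒ Object

Trains the classifier with EM and names the two categories "majority" and "minority".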
#yamlize(filename) ⇒ Object

Dumps the language model to a file.

Constructor Details

#initialize(options = {}) ⇒ `LanguageDetector`

Returns a new instance of LanguageDetector.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 39

def initialize(options = {})
  options = {:ngram_size => 3}.merge(options)    
  @ngram_size = options[:ngram_size]
  @classifier = NaiveBayesClassifier.new(:num_categories => 2)
end
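
The constructor fills in defaults by merging the caller's hash over `{:ngram_size => 3}`, so any key the caller supplies wins. The snippet below is a minimal sketch of that pattern on its own; `resolve_options` is a hypothetical helper used only for illustration and is not part of the library.

```ruby
# Hypothetical helper (not part of LanguageDetector): isolates the
# defaults-merge pattern used in #initialize.
def resolve_options(options = {})
  # Keys in `options` override the defaults; missing keys fall back to them.
  { :ngram_size => 3 }.merge(options)
end

default_size = resolve_options[:ngram_size]                  # falls back to 3
custom_size  = resolve_options(:ngram_size => 2)[:ngram_size] # caller's value, 2
```

Note that the constructor only creates an empty two-category classifier; judging from the source shown here, #train replaces it with a trained one.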

Instance Attribute Details

#classifier ⇒ `Object` (readonly)

Returns the value of attribute classifier.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 37

def classifier
  @classifier
end

Class Method Details

.load_yaml(filename) ⇒ `Object`

Loads the language model from a file.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 73

def self.load_yaml(filename)
  return YAML::load(File.read(filename))
end

Instance Method Details

#classify(sentence) ⇒ `Object`

Returns the name of the category the sentence belongs to: "majority" or "minority" once #train has run.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 56

def classify(sentence)
  category_index = @classifier.classify(sentence.to_ngrams(@ngram_size))
  @classifier.category_names[category_index]
end
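
Both #classify and #probabilities convert the sentence to n-grams with `sentence.to_ngrams(@ngram_size)` before handing it to the classifier. The implementation of `to_ngrams` is not shown on this page. The sketch below assumes character n-grams, which the source does not confirm, and uses a hypothetical stand-in named `char_ngrams`.

```ruby
# Hypothetical stand-in for the library's String#to_ngrams.
# Assumption: n-grams are overlapping runs of n characters.
def char_ngrams(text, n)
  # each_cons yields every window of n consecutive characters;
  # strings shorter than n produce an empty list.
  text.chars.each_cons(n).map(&:join)
end

trigrams = char_ngrams("hello", 3)
```

With the default `ngram_size` of 3, the word "hello" would yield the trigrams "hel", "ell" and "llo" under this assumption.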

#probabilities(sentence) ⇒ `Object`

Returns the classifier's posterior category probabilities for the sentence.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 61

def probabilities(sentence)
  @classifier.get_posterior_category_probabilities(sentence.to_ngrams(@ngram_size))
end
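
The internals of `get_posterior_category_probabilities` are not shown here. As a rough illustration of what a posterior over categories means for a naive Bayes model, the sketch below combines category priors with per-n-gram likelihoods in log space and normalizes the result. The function name, the data layout, and the `floor` value for unseen n-grams are all assumptions made for this example, not the library's API.

```ruby
# Hypothetical sketch of a naive Bayes posterior, not the library's code.
# priors:      array of P(category)
# likelihoods: array of hashes, likelihoods[c][ngram] = P(ngram | category c)
# floor:       assumed probability for n-grams a category has never seen
def posteriors(priors, likelihoods, ngrams, floor = 1e-6)
  # Sum log probabilities to avoid underflow on long sentences.
  log_scores = priors.each_with_index.map do |prior, c|
    ngrams.reduce(Math.log(prior)) do |sum, gram|
      sum + Math.log(likelihoods[c].fetch(gram, floor))
    end
  end

  # Subtract the maximum before exponentiating, then normalize to sum to 1.
  max = log_scores.max
  weights = log_scores.map { |score| Math.exp(score - max) }
  total = weights.reduce(:+)
  weights.map { |w| w / total }
end

result = posteriors([0.5, 0.5], [{ "ab" => 0.8 }, { "ab" => 0.2 }], ["ab"])
```

With equal priors and a single n-gram that is four times as likely under the first category, the posterior works out to roughly 0.8 and 0.2.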

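#train(max_epochs, training_sentences) ⇒ `Object`

Trains a new classifier with EM (`NaiveBayesClassifier.train_em`) for at most max_epochs epochs on the n-grams of the training sentences, then names the category with the higher prior probability "majority" and the other "minority".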

# File 'lib/unsupervised-language-detection/language-detector.rb', line 45

def train(max_epochs, training_sentences)
  @classifier = NaiveBayesClassifier.train_em(max_epochs, training_sentences.map{ |sentence| sentence.to_ngrams(@ngram_size) })
  @classifier.category_names = 
    if @classifier.get_prior_category_probability(0) > @classifier.get_prior_category_probability(1)
      %w( majority minority )
    else
      %w( minority majority )
    end    
end
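
After EM training the two categories are anonymous, so #train labels them by comparing their prior probabilities. The naming rule can be sketched in isolation as follows; `name_categories` is a hypothetical helper that mirrors the `if`/`else` above and is not part of the library.

```ruby
# Hypothetical helper mirroring the naming rule in #train.
# Returns the names for category 0 and category 1, in that order.
def name_categories(prior0, prior1)
  # The strict `>` means a tie labels category 1 as the majority.
  prior0 > prior1 ? %w(majority minority) : %w(minority majority)
end

names = name_categories(0.7, 0.3)
```

Because the labels depend only on the priors learned from the training data, "majority" refers to whichever language dominates that data, not to any fixed language.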

#yamlize(filename) ⇒ `Object`

Dumps the language model to a file.

# File 'lib/unsupervised-language-detection/language-detector.rb', line 66

def yamlize(filename)
  File.open(filename, "w") do |f|
    f.puts self.to_yaml
  end
end
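
#yamlize and .load_yaml together form a save/restore round trip. The sketch below reproduces that round trip with a hypothetical `ToyModel` class standing in for the detector. One caveat worth knowing: with Psych 4 (bundled with Ruby 3.1 and later), `YAML.load` behaves like `safe_load` and refuses to instantiate arbitrary classes, so the sketch uses `YAML.unsafe_load` where it is available. Whether the library itself needs that change depends on the Ruby version it runs on; only load YAML files you trust.

```ruby
require "yaml"
require "tempfile"

# Hypothetical stand-in for a trained detector.
class ToyModel
  attr_reader :ngram_size

  def initialize(ngram_size)
    @ngram_size = ngram_size
  end
end

# Same shape as #yamlize: serialize the object and write it to a file.
def dump_model(model, filename)
  File.open(filename, "w") { |f| f.puts model.to_yaml }
end

# Same shape as .load_yaml, but tolerant of Psych 4's safe-by-default load.
def load_model(filename)
  text = File.read(filename)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

# Round trip through a temporary file.
restored = Tempfile.create("toy-model") do |file|
  dump_model(ToyModel.new(3), file.path)
  load_model(file.path)
end
```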
