N-Gram Language Detection in Ruby
For technical details about language detection, I recommend this paper:
Installation
sudo gem sources -a gems.github.com (you only have to do this once)
sudo gem install igrigorik-language_detector
Usage
require 'language_detector'
# using generated model (built from scratch)
d = LanguageDetector.new
p d.detect('this text is in English')
# using textcat n-gram model
d = LanguageDetector.new('tc')
p d.detect('this text is in English')
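
For readers curious about what an n-gram model does, here is a minimal sketch of the rank-order ("out-of-place") comparison that TextCat-style classifiers are known for. It is not this gem's implementation: NgramProfile, detect_language, SAMPLES, PROFILES, MAX_N and TOP_K are hypothetical names made up for illustration, and the two one-sentence training texts are far too small for real use.

```ruby
# Illustrative sketch only; none of these names come from language_detector.
class NgramProfile
  MAX_N = 3    # assumption: use 1- to 3-character n-grams
  TOP_K = 300  # assumption: keep only the 300 most frequent n-grams

  attr_reader :ranks

  def initialize(text)
    counts = Hash.new(0)
    text.downcase.scan(/[[:alpha:]]+/).each do |word|
      padded = "_#{word}_" # pad so word boundaries appear in the n-grams
      (1..MAX_N).each do |n|
        padded.chars.each_cons(n) { |gram| counts[gram.join] += 1 }
      end
    end
    # Rank by descending frequency; break ties alphabetically so ranks are stable.
    top = counts.sort_by { |gram, count| [-count, gram] }.first(TOP_K)
    @ranks = top.each_with_index.map { |(gram, _count), i| [gram, i] }.to_h
  end

  # Sum of rank differences for this profile's n-grams; an n-gram missing
  # from the other profile costs the maximum penalty TOP_K.
  def distance(other)
    ranks.inject(0) do |total, (gram, rank)|
      other_rank = other.ranks[gram]
      total + (other_rank ? (rank - other_rank).abs : TOP_K)
    end
  end
end

# Toy training data (hypothetical); a real model needs far more text.
SAMPLES = {
  'en' => 'this is some text written in english and the quick brown fox jumps over the lazy dog',
  'de' => 'dieser text ist auf deutsch geschrieben und der schnelle braune fuchs springt nicht'
}
PROFILES = SAMPLES.map { |lang, text| [lang, NgramProfile.new(text)] }.to_h

# Returns the key of the profile closest to the text.
def detect_language(text, profiles)
  sample = NgramProfile.new(text)
  profiles.min_by { |_lang, profile| sample.distance(profile) }.first
end

p detect_language('this text is in English', PROFILES)
```

The distance is computed over the n-grams of the input text only, so it is not symmetric. With training data this small the result for short inputs can be unreliable; larger per-language profiles make the ranking much more stable.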
Thanks
- Kevin Burton (training data): feedblog.org/2005/08/19/ngram-language-categorization-source/
- Feedbackmine: twitter.com/feedbackmine