Classifier
Text classification in Ruby. Five algorithms, native performance, streaming support.
Why This Library?
| | This Gem | Other Forks |
|---|---|---|
| Algorithms | ✅ 5 classifiers | ❌ 2 only |
| Incremental LSI | ✅ Brand's algorithm (no rebuild) | ❌ Full SVD rebuild on every add |
| LSI Performance | ✅ Native C extension (5-50x faster) | ❌ Pure Ruby or requires GSL |
| Streaming | ✅ Train on multi-GB datasets | ❌ Must load all data in memory |
| Persistence | ✅ Pluggable (file, Redis, S3) | ❌ Marshal only |
Installation
gem 'classifier'
Quick Start
Bayesian
classifier = Classifier::Bayes.new(:spam, :ham)
classifier.train(spam: "Buy cheap viagra now!", ham: "Meeting at 3pm tomorrow")
classifier.classify "You've won a prize!" # => "Spam"
Logistic Regression
classifier = Classifier::LogisticRegression.new(:positive, :negative)
classifier.train(positive: "Great product!", negative: "Terrible experience")
classifier.classify "Loved it!" # => "Positive"
LSI (Latent Semantic Indexing)
lsi = Classifier::LSI.new
lsi.add(pets: "Dogs are loyal", tech: "Ruby is elegant")
lsi.classify "My puppy is playful" # => "pets"
k-Nearest Neighbors
knn = Classifier::KNN.new(k: 3)
knn.train(spam: "Free money!", ham: "Quarterly report attached") # or knn.add()
knn.classify "Claim your prize" # => "spam"
TF-IDF
tfidf = Classifier::TFIDF.new
tfidf.fit(["Dogs are pets", "Cats are independent"])
tfidf.transform("Dogs are loyal") # => {:dog => 0.707, :loyal => 0.707}
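The weights in the example above are consistent with L2 normalization: two terms with equal raw weight each end up at 1/√2 ≈ 0.707. The following is a minimal pure-Ruby sketch of that idea, not the gem's implementation. The `SketchTfidf` class name, the naive tokenizer, and the smoothed IDF formula are assumptions made for illustration. The gem's output suggests it also stems words and drops stopwords, which this sketch does not attempt.

```ruby
# Hypothetical, simplified TF-IDF vectorizer (NOT the gem's implementation).
class SketchTfidf
  def initialize
    @idf = {}
  end

  # Learn one IDF weight per term from an array of document strings.
  def fit(documents)
    doc_freq = Hash.new(0)
    documents.each { |doc| tokenize(doc).uniq.each { |term| doc_freq[term] += 1 } }
    n = documents.size
    # Smoothed IDF (an assumption): ln((1 + n) / (1 + df)) + 1
    @idf = doc_freq.transform_values { |df| Math.log((1.0 + n) / (1.0 + df)) + 1.0 }
    self
  end

  # Return a unit-length hash of term => weight; terms unseen in fit are dropped.
  def transform(text)
    counts = tokenize(text).tally
    raw = counts.each_with_object({}) do |(term, count), acc|
      acc[term] = count * @idf[term] if @idf.key?(term)
    end
    norm = Math.sqrt(raw.values.sum { |w| w * w })
    return {} if norm.zero?

    raw.transform_values { |w| w / norm }
  end

  private

  # Naive tokenizer: lowercase and keep runs of letters. No stemming, no stopwords.
  def tokenize(text)
    text.downcase.scan(/[a-z]+/).map(&:to_sym)
  end
end

vectorizer = SketchTfidf.new.fit(["dogs bark", "cats purr"])
# "loudly" was never seen during fit, so it is dropped. The two remaining
# terms have equal raw weight, so each normalizes to 1/sqrt(2).
vector = vectorizer.transform("dogs purr loudly")
```

Because the vector is scaled to unit length, the absolute IDF values cancel out whenever the surviving terms share the same raw weight, which is why both entries come out equal.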
Key Features
Incremental LSI
Add documents without rebuilding the entire index. For streaming data this is reported to be 400x faster than a full rebuild on every add:
lsi = Classifier::LSI.new(incremental: true)
lsi.add(tech: ["Ruby is elegant", "Python is popular"])
lsi.build_index
# These use Brand's algorithm—no full rebuild
lsi.add(tech: "Go is fast")
lsi.add(tech: "Rust is safe")
Persistence
classifier.storage = Classifier::Storage::File.new(path: "model.json")
classifier.save
loaded = Classifier::Bayes.load(storage: classifier.storage)
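To illustrate what "pluggable" persistence can look like, here is a small stand-alone sketch of a storage backend that only needs to write and read a serialized payload. It does not use the gem. The `MemoryStore` class and its `write`/`read` method names are hypothetical, since this README does not show the interface a custom backend must implement.

```ruby
require "json"

# Hypothetical storage backend that keeps the serialized model in memory.
# The write/read method names are assumptions, not the gem's documented interface.
class MemoryStore
  def initialize
    @payload = nil
  end

  # Store the already-serialized model.
  def write(data)
    @payload = data
  end

  # Return the last payload written, or nil if nothing was saved.
  def read
    @payload
  end
end

# Stand-in for a trained model: word counts per category.
model = { "spam" => { "prize" => 2 }, "ham" => { "meeting" => 1 } }

store = MemoryStore.new
store.write(JSON.generate(model))  # save: serialize, then hand to the backend
restored = JSON.parse(store.read)  # load: read from the backend, then parse
```

Keeping serialization separate from the backend is what would let a file, Redis, or S3 store be swapped in without touching the model code.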
Streaming Training
classifier.train_from_stream(:spam, File.open("spam_corpus.txt"))
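The idea behind streaming training can be sketched as reading an IO lazily and handing fixed-size batches to a trainer, so the whole corpus never sits in memory at once. The `each_batch` helper and its `batch_size` parameter below are hypothetical and not part of the gem. How `train_from_stream` batches internally is not shown in this README.

```ruby
require "stringio"

# Hypothetical helper: yields arrays of up to batch_size stripped, non-empty
# lines from any IO-like object, reading one line at a time.
def each_batch(io, batch_size: 1000)
  io.each_line.lazy
    .map(&:strip)
    .reject(&:empty?)
    .each_slice(batch_size) { |batch| yield batch }
end

# StringIO stands in for a large file such as File.open("spam_corpus.txt").
io = StringIO.new("free money\n\nclaim your prize\nact now\n")

batches = []
each_batch(io, batch_size: 2) { |batch| batches << batch }
# batches is now [["free money", "claim your prize"], ["act now"]]
```

In real use the block would pass each batch to a trainer instead of collecting it, so memory use stays proportional to the batch size and not to the corpus size.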
Performance
The native C extension provides a 5-50x speedup for LSI operations:
| Documents | Speedup |
|---|---|
| 10 | 25x |
| 20 | 50x |
rake benchmark:compare # Run your own comparison
Development
bundle install
rake compile # Build native extension
rake test # Run tests
Authors
- Lucas Carlson
- David Fayram II
- Cameron McBride
- Ivan Acosta-Rubio