Classifier


Text classification in Ruby. Five algorithms, native performance, streaming support.

Documentation · Tutorials · API Reference

Why This Library?

|  | This Gem | Other Forks |
| --- | --- | --- |
| Algorithms | ✅ 5 classifiers | ❌ 2 only |
| Incremental LSI | ✅ Brand's algorithm (no rebuild) | ❌ Full SVD rebuild on every add |
| LSI Performance | ✅ Native C extension (5-50x faster) | ❌ Pure Ruby or requires GSL |
| Streaming | ✅ Train on multi-GB datasets | ❌ Must load all data in memory |
| Persistence | ✅ Pluggable (file, Redis, S3) | ❌ Marshal only |

Installation

Add to your Gemfile, then run bundle install:

gem 'classifier'

Quick Start

Bayesian

classifier = Classifier::Bayes.new(:spam, :ham)
classifier.train(spam: "Buy cheap viagra now!", ham: "Meeting at 3pm tomorrow")
classifier.classify "You've won a prize!"  # => "Spam"
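
For readers curious about what a Bayesian text classifier does with the training data, here is a minimal sketch of multinomial naive Bayes with add-one smoothing. It is an illustration only: `TinyBayes` and its methods are hypothetical names, not part of this gem, and the gem's own tokenization and scoring may differ.

```ruby
# Illustrative multinomial naive Bayes with add-one (Laplace) smoothing.
# TinyBayes is a hypothetical name for this sketch, not the gem's API.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] }
    @doc_counts  = categories.to_h { |c| [c, 0] }
    @vocabulary  = {}
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the category with the highest log posterior score.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    scores = @word_counts.map do |category, counts|
      total_words = counts.values.sum
      log_prior = Math.log(@doc_counts[category] / total_docs)
      log_likelihood = words.sum { |w| Math.log((counts[w] + 1.0) / (total_words + @vocabulary.size)) }
      [category, log_prior + log_likelihood]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  # Naive tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new(:spam, :ham)
bayes.train(:spam, "Buy cheap pills now, win a prize")
bayes.train(:ham, "Meeting at 3pm tomorrow")
result = bayes.classify("You've won a prize!")
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.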

Bayesian Guide →

Logistic Regression

classifier = Classifier::LogisticRegression.new(:positive, :negative)
classifier.train(positive: "Great product!", negative: "Terrible experience")
classifier.classify "Loved it!"  # => "Positive"
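
As a rough picture of the idea behind logistic regression on text, the sketch below trains a binary model on bag-of-words counts with plain stochastic gradient descent. The helper names (`features`, `train_logistic`, `predict`) and the hyperparameters are assumptions made for this example; the gem's implementation is not shown in this README and may work differently.

```ruby
# Bag-of-words counts for a text (Array#tally needs Ruby 2.7+).
def features(text)
  text.downcase.scan(/[a-z]+/).tally
end

def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Probability that the text belongs to the positive class (label 1).
def predict(weights, bias, feats)
  sigmoid(bias + feats.sum { |term, count| weights[term] * count })
end

# Stochastic gradient descent on the log loss; labels are 1 or 0.
def train_logistic(examples, epochs: 200, rate: 0.5)
  weights = Hash.new(0.0)
  bias = 0.0
  epochs.times do
    examples.each do |text, label|
      feats = features(text)
      error = predict(weights, bias, feats) - label
      feats.each { |term, count| weights[term] -= rate * error * count }
      bias -= rate * error
    end
  end
  [weights, bias]
end

examples = [["Great product, loved it", 1], ["Terrible experience, hated it", 0]]
weights, bias = train_logistic(examples)
positive_score = predict(weights, bias, features("Loved the product"))
negative_score = predict(weights, bias, features("Hated the experience"))
```

A score above 0.5 leans toward the positive class and a score below 0.5 toward the negative one.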

Logistic Regression Guide →

LSI (Latent Semantic Indexing)

lsi = Classifier::LSI.new
lsi.add(pets: "Dogs are loyal", tech: "Ruby is elegant")
lsi.classify "My puppy is playful"  # => "pets"

LSI Guide →

k-Nearest Neighbors

knn = Classifier::KNN.new(k: 3)
knn.train(spam: "Free money!", ham: "Quarterly report attached")  # or knn.add()
knn.classify "Claim your prize"  # => "spam"
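
The general k-nearest-neighbors technique can be sketched in a few lines: turn each document into a term vector, rank stored examples by similarity to the query, and take a majority vote among the top k. The sketch below uses cosine similarity over raw term counts, which is an assumption for illustration; the helper names are hypothetical and the gem may measure similarity differently.

```ruby
# Term-count vector for a text (Array#tally needs Ruby 2.7+).
def term_vector(text)
  text.downcase.scan(/[a-z]+/).tally
end

# Cosine similarity between two sparse term vectors stored as hashes.
def cosine_similarity(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |w| w * w }) }
  denom = norm.call(a) * norm.call(b)
  denom.zero? ? 0.0 : dot / denom
end

# Majority vote among the k stored examples most similar to the query.
def knn_classify(examples, text, k: 3)
  query = term_vector(text)
  nearest = examples.max_by(k) { |_, doc| cosine_similarity(query, term_vector(doc)) }
  nearest.map(&:first).tally.max_by { |_, votes| votes }.first
end

examples = [
  [:spam, "Free money, claim it now"],
  [:spam, "Claim your free prize today"],
  [:ham,  "Quarterly report attached"],
  [:ham,  "Lunch meeting moved to Friday"]
]
label = knn_classify(examples, "Claim your prize", k: 3)
```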

k-Nearest Neighbors Guide →

TF-IDF

tfidf = Classifier::TFIDF.new
tfidf.fit(["Dogs are pets", "Cats are independent"])
tfidf.transform("Dogs are loyal")  # => {:dog => 0.707, :loyal => 0.707}
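
To show where weights like these come from, here is a minimal TF-IDF sketch with a smoothed inverse document frequency and L2 normalization. It deliberately skips the stop-word removal and stemming implied by the output above, so its numbers will not match the gem's; `fit_idf` and `tfidf_vector` are hypothetical helper names.

```ruby
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Learn a smoothed inverse document frequency for every term in the corpus.
def fit_idf(documents)
  n = documents.size
  doc_freq = Hash.new(0)
  documents.each { |doc| tokenize(doc).uniq.each { |term| doc_freq[term] += 1 } }
  doc_freq.transform_values { |df| Math.log((1.0 + n) / (1 + df)) + 1 }
end

# Weight a new document by tf * idf and scale it to unit length.
# Terms that were not seen during fitting are dropped.
def tfidf_vector(idf, text)
  counts = tokenize(text).tally.select { |term, _| idf.key?(term) }
  weights = counts.to_h { |term, tf| [term, tf * idf[term]] }
  norm = Math.sqrt(weights.values.sum { |w| w * w })
  norm.zero? ? {} : weights.transform_values { |w| w / norm }
end

idf = fit_idf(["Dogs are pets", "Cats are independent"])
vector = tfidf_vector(idf, "Dogs are loyal")
```

In this sketch "are" appears in both training documents, so it receives a lower weight than "dogs", which appears in only one.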

TF-IDF Guide →

Key Features

Incremental LSI

Add documents without rebuilding the entire index (400x faster for streaming data):

lsi = Classifier::LSI.new(incremental: true)
lsi.add(tech: ["Ruby is elegant", "Python is popular"])
lsi.build_index

# These use Brand's algorithm, so no full rebuild is needed
lsi.add(tech: "Go is fast")
lsi.add(tech: "Rust is safe")

Learn more →

Persistence

classifier.storage = Classifier::Storage::File.new(path: "model.json")
classifier.save

loaded = Classifier::Bayes.load(storage: classifier.storage)
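
One way to picture a pluggable storage design is a model that serializes its state and hands it to any object responding to `write` and `read`. The sketch below is a sketch under that assumption only: `MemoryStore`, `FileStore`, and `WordCounter` are hypothetical classes written for this example, and the gem's actual storage interface may define different methods.

```ruby
require "json"
require "tmpdir"

# Hypothetical backends: any object with #write(string) and #read will do.
class MemoryStore
  def write(data)
    @data = data
  end

  def read
    @data
  end
end

class FileStore
  def initialize(path:)
    @path = path
  end

  def write(data)
    File.write(@path, data)
  end

  def read
    File.read(@path)
  end
end

# Toy model that saves and loads its state as JSON through its backend.
class WordCounter
  attr_reader :counts
  attr_accessor :storage

  def initialize(counts = {})
    @counts = Hash.new(0).update(counts)
  end

  def train(text)
    text.downcase.scan(/[a-z]+/).each { |word| @counts[word] += 1 }
  end

  def save
    storage.write(JSON.generate(@counts))
  end

  def self.load(storage:)
    model = new(JSON.parse(storage.read))
    model.storage = storage
    model
  end
end

model = WordCounter.new
model.train("Ruby is elegant and Ruby is fun")
backends = [MemoryStore.new, FileStore.new(path: File.join(Dir.mktmpdir, "model.json"))]
restored = backends.map do |backend|
  model.storage = backend
  model.save
  WordCounter.load(storage: backend).counts
end
```

Because the model only calls `write` and `read`, swapping in another backend requires no change to the model itself.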

Learn more →

Streaming Training

classifier.train_from_stream(:spam, File.open("spam_corpus.txt"))
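
The idea behind streaming training is to read an IO object line by line and update the model in small batches, so memory use depends on the batch size rather than the corpus size. The following is a minimal sketch of that pattern; `each_batch` and the word-count "model" are hypothetical stand-ins, and a `StringIO` takes the place of a large file.

```ruby
require "stringio"

# Hypothetical helper: yield fixed-size batches of lines from any IO.
# Only one batch is held in memory at a time.
def each_batch(io, batch_size: 1000)
  io.each_line(chomp: true).each_slice(batch_size) { |batch| yield batch }
end

word_counts = Hash.new(0)
batches = 0

# StringIO stands in for File.open("spam_corpus.txt") in this sketch.
corpus = StringIO.new("free money now\nclaim your prize\nfree prize inside\n")

each_batch(corpus, batch_size: 2) do |lines|
  batches += 1
  lines.each { |line| line.split.each { |word| word_counts[word] += 1 } }
end
```

With three lines and a batch size of 2, the block runs twice: once with two lines and once with the remaining one.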

Learn more →

Performance

The native C extension provides a 5-50x speedup for LSI operations:

| Documents | Speedup |
| --- | --- |
| 10 | 25x |
| 20 | 50x |

rake benchmark:compare  # Run your own comparison

Development

bundle install
rake compile  # Build native extension
rake test     # Run tests

Authors

License

LGPL 2.1