Classifier
A Ruby library for text classification using Bayesian and Latent Semantic Indexing (LSI) algorithms.
Table of Contents
- Installation
- Bayesian Classifier
- LSI (Latent Semantic Indexing)
- Persistence
- Performance
- Development
- Contributing
- License
Installation
Add to your Gemfile:
```ruby
gem 'classifier'
```
Then run:
```bash
bundle install
```
Or install directly:
```bash
gem install classifier
```
Native C Extension
The gem includes a native C extension for fast LSI operations. It compiles automatically during gem installation. No external dependencies are required.
To verify the native extension is active:
```ruby
require 'classifier'
puts Classifier::LSI.backend # => :native
```
To force pure Ruby mode (for debugging):
```bash
NATIVE_VECTOR=true ruby your_script.rb
```
To suppress the warning when the native extension isn't available:
```bash
SUPPRESS_LSI_WARNING=true ruby your_script.rb
```
Compatibility
| Ruby Version | Status |
|---|---|
| 4.0 | Supported |
| 3.4 | Supported |
| 3.3 | Supported |
| 3.2 | Supported |
| 3.1 | EOL (unsupported) |
Bayesian Classifier
Fast, accurate classification with modest memory requirements. Ideal for spam filtering, sentiment analysis, and content categorization.
Quick Start
```ruby
require 'classifier'

classifier = Classifier::Bayes.new('Spam', 'Ham')

# Train the classifier
classifier.train_spam "Buy cheap viagra now! Limited offer!"
classifier.train_spam "You've won a million dollars! Claim now!"
classifier.train_ham "Meeting scheduled for tomorrow at 10am"
classifier.train_ham "Please review the attached document"

# Classify new text
classifier.classify "Congratulations! You've won a prize!"
# => "Spam"
```
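To make the underlying technique concrete, here is a minimal multinomial naive Bayes classifier in plain Ruby. It is a sketch of the general method, not the gem's implementation: the class name TinyBayes, the regex tokenizer, and the add-one (Laplace) smoothing are illustrative assumptions. Scores are sums of log-probabilities, which avoids the floating-point underflow you would get from multiplying many small probabilities.

```ruby
# Hypothetical TinyBayes: a minimal multinomial naive Bayes sketch,
# not the classifier gem's implementation.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] } # category => {word => count}
    @doc_counts  = categories.to_h { |c| [c, 0] }           # category => documents trained
    @vocab = {}                                             # every word seen in training
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocab[word] = true
    end
  end

  # Returns { category => log-score }; the highest score wins.
  def scores(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum
    @word_counts.to_h do |category, counts|
      total_words = counts.values.sum
      # Smoothed prior: log P(category)
      prior = Math.log((@doc_counts[category] + 1.0) / (total_docs + @doc_counts.size))
      # Sum of log P(word | category) with add-one (Laplace) smoothing
      likelihood = words.sum { |w| Math.log((counts[w] + 1.0) / (total_words + @vocab.size)) }
      [category, prior + likelihood]
    end
  end

  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end

  private

  # Assumed tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new("Spam", "Ham")
bayes.train "Spam", "Buy cheap viagra now! Limited offer!"
bayes.train "Spam", "You've won a million dollars! Claim now!"
bayes.train "Ham", "Meeting scheduled for tomorrow at 10am"
bayes.train "Ham", "Please review the attached document"
bayes.classify "Congratulations! You've won a prize!" # => "Spam"
```

Add-one smoothing gives a word that was never seen in a category a small nonzero probability, so a single unfamiliar word cannot veto a category on its own.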
Learn More
- Bayes Basics Guide - In-depth documentation
- Build a Spam Filter Tutorial - Step-by-step guide
- Paul Graham: A Plan for Spam
LSI (Latent Semantic Indexing)
Semantic analysis using Singular Value Decomposition (SVD). More flexible than Bayesian classifiers, providing search, clustering, and classification based on meaning rather than just keywords.
Quick Start
```ruby
require 'classifier'

lsi = Classifier::LSI.new

# Add documents with categories
lsi.add_item "Dogs are loyal pets that love to play fetch", :pets
lsi.add_item "Cats are independent and love to nap", :pets
lsi.add_item "Ruby is a dynamic programming language", :programming
lsi.add_item "Python is great for data science", :programming

# Classify new text
lsi.classify "My puppy loves to run around"
# => :pets

# Get classification with confidence score
lsi.classify_with_confidence "Learning to code in Ruby"
# => [:programming, 0.89]
```
Search and Discovery
```ruby
# Find similar documents
lsi.find_related "Dogs are great companions", 2
# => ["Dogs are loyal pets that love to play fetch", "Cats are independent..."]

# Search by keyword
lsi.search "programming", 3
# => ["Ruby is a dynamic programming language", "Python is great for..."]
```
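The SVD step is what lets LSI relate documents that share few literal words. The sketch below illustrates that idea in plain Ruby; it is not the gem's implementation, and the class name TinyLSI, the raw term-count weighting, and the choice of k are assumptions. Because Ruby's standard Matrix class offers an eigendecomposition but no SVD, the sketch obtains the right singular vectors and squared singular values from the eigensystem of A^T A (A being the term-document matrix), then ranks documents by cosine similarity in the reduced k-dimensional space. It needs the matrix library, which is a bundled gem since Ruby 3.1.

```ruby
require 'matrix' # bundled gem since Ruby 3.1

# Hypothetical TinyLSI: a minimal latent semantic indexing sketch,
# not the classifier gem's implementation.
class TinyLSI
  def initialize(docs, k:)
    @docs = docs
    tokenized = docs.map { |d| tokenize(d) }
    @vocab = tokenized.flatten.uniq
    # Term-document matrix A: rows are terms, columns are documents,
    # entries are raw term counts.
    @a = Matrix.build(@vocab.size, docs.size) do |t, d|
      tokenized[d].count(@vocab[t]).to_f
    end
    # A = U S V^T, so A^T A = V S^2 V^T: its eigenvectors are the right
    # singular vectors and its eigenvalues are the squared singular values.
    eig = (@a.transpose * @a).eigensystem
    values = eig.eigenvalues
    vectors = eig.eigenvectors
    top = values.each_index
                .sort_by { |i| -values[i] }
                .select { |i| values[i] > 1e-10 } # drop null directions
                .first(k)
    @sigma_sq = top.map { |i| values[i] }
    @v_k = Matrix.columns(top.map { |i| vectors[i].to_a }) # documents x k
  end

  # Fold a query into the reduced space. With U_k = A V_k S_k^-1, the usual
  # S_k^-1 U_k^T q equals S_k^-2 V_k^T A^T q. Folding in an indexed document
  # the same way yields exactly its row of V_k, so the two are comparable.
  def fold_in(text)
    words = tokenize(text)
    q = Vector.elements(@vocab.map { |term| words.count(term).to_f })
    projected = @v_k.transpose * (@a.transpose * q)
    Vector.elements(projected.to_a.each_with_index.map { |x, i| x / @sigma_sq[i] })
  end

  # Returns [document, cosine similarity] pairs, most similar first.
  def related(text)
    query = fold_in(text)
    @docs.each_with_index
         .map { |doc, j| [doc, cosine(query, @v_k.row(j))] }
         .sort_by { |_doc, score| -score }
  end

  private

  def cosine(x, y)
    denom = x.norm * y.norm
    denom.zero? ? 0.0 : x.inner_product(y) / denom
  end

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

lsi = TinyLSI.new(
  ["dogs love fetch", "cats love naps", "ruby code language", "python code data"],
  k: 2
)
lsi.related("dogs fetch").each { |doc, score| puts format("%.3f  %s", score, doc) }
```

In this toy corpus, the query "dogs fetch" scores "cats love naps" as highly as "dogs love fetch", even though the cats document shares no words with the query: both documents contain "love", and keeping only k = 2 dimensions merges them into one latent topic. The two programming documents score near zero.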
Learn More
- LSI Basics Guide - In-depth documentation
- Wikipedia: Latent Semantic Analysis
Persistence
Save and load trained classifiers with pluggable storage backends. Works with both Bayes and LSI classifiers.
File Storage
```ruby
require 'classifier'

classifier = Classifier::Bayes.new('Spam', 'Ham')
classifier.train_spam "Buy now! Limited offer!"
classifier.train_ham "Meeting tomorrow at 3pm"

# Configure storage and save
classifier.storage = Classifier::Storage::File.new(path: "spam_filter.json")
classifier.save

# Load later
loaded = Classifier::Bayes.load(storage: classifier.storage)
loaded.classify "Claim your prize now!"
# => "Spam"
```
Custom Storage Backends
Create backends for Redis, PostgreSQL, S3, or any other storage system by subclassing Classifier::Storage::Base and defining the write, read, delete, and exists? methods:
```ruby
require 'redis'

class RedisStorage < Classifier::Storage::Base
  def initialize(redis:, key:)
    super()
    @redis, @key = redis, key
  end

  def write(data) = @redis.set(@key, data)
  def read = @redis.get(@key)
  def delete = @redis.del(@key)
  def exists? = @redis.exists?(@key)
end

# Use it
classifier.storage = RedisStorage.new(redis: Redis.new, key: "classifier:spam")
classifier.save
```
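For tests, a backend does not need an external service. The sketch below is a hypothetical in-memory backend that implements the same four methods as RedisStorage (write, read, delete, exists?). It assumes Classifier::Storage::Base needs nothing beyond those four methods; when the gem is not loaded, the class falls back to subclassing Object so the sketch also runs on its own.

```ruby
# Hypothetical MemoryStorage: keeps the serialized classifier in an instance
# variable. Subclasses Classifier::Storage::Base when the gem is loaded,
# otherwise Object, so this sketch also runs standalone.
class MemoryStorage < (defined?(Classifier::Storage::Base) ? Classifier::Storage::Base : Object)
  def initialize
    super()
    @data = nil
  end

  def write(data)
    @data = data
  end

  def read = @data

  def delete
    @data = nil
  end

  def exists? = !@data.nil?
end

storage = MemoryStorage.new
storage.write("serialized-classifier")
storage.exists? # => true
storage.read    # => "serialized-classifier"
storage.delete
storage.exists? # => false
```

With the gem loaded, it plugs in like the Redis example: assign classifier.storage = MemoryStorage.new, call classifier.save, and reload with Classifier::Bayes.load(storage: classifier.storage).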
Learn More
- Persistence Guide - Full documentation with examples
Performance
Native C Extension vs Pure Ruby
The native C extension speeds up LSI operations substantially, and the gain grows with the number of documents. build_index (the SVD computation) benefits most:
| Documents | build_index speedup | Overall speedup |
|---|---|---|
| 5 | 7x | 2.6x |
| 10 | 25x | 4.6x |
| 15 | 112x | 14.5x |
| 20 | 385x | 48.7x |
Detailed benchmark (20 documents)
```
Operation        Pure Ruby    Native C    Speedup
--------------------------------------------------
build_index         0.5540      0.0014     384.5x
classify            0.0190      0.0060       3.2x
search              0.0145      0.0037       3.9x
find_related        0.0098      0.0011       8.6x
--------------------------------------------------
TOTAL               0.5973      0.0123      48.7x
```
Running Benchmarks
```bash
rake benchmark          # Run with current configuration
rake benchmark:compare  # Compare native C vs pure Ruby
```
Development
Setup
```bash
git clone https://github.com/cardmagic/classifier.git
cd classifier
bundle install
rake compile # Compile native C extension
```
Running Tests
```bash
rake test                                # Run all tests (compiles first)
ruby -Ilib test/bayes/bayesian_test.rb   # Run a specific test file

# Test with pure Ruby (no native extension)
NATIVE_VECTOR=true rake test
```
Console
```bash
rake console
```
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -am 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Authors
- Lucas Carlson - Original author
- David Fayram II - LSI implementation
- Cameron McBride
- Ivan Acosta-Rubio
License
This library is released under the GNU Lesser General Public License (LGPL) 2.1.