ClickhouseRuby
A lightweight Ruby client for ClickHouse with optional ActiveRecord integration.
Why ClickhouseRuby?
ClickhouseRuby is designed from the ground up with production reliability and developer experience in mind. Here's what sets it apart:
🔒 Security & Reliability
- SSL verification enabled by default - Secure by default, unlike alternatives that require explicit configuration
- Never silently fails - All errors are properly raised and propagated (fixes clickhouse-activerecord #230)
- Comprehensive error hierarchy - 30+ specific error classes mapped from ClickHouse error codes
⚡ Performance & Architecture
- Zero runtime dependencies - Uses only Ruby stdlib, making it lightweight and fully auditable
- AST-based type parser - Handles complex nested types correctly (Array(Tuple(String, UInt64)), etc.) unlike regex-based parsers
- Thread-safe connection pooling - Built-in pool with health checks and proper resource management
- Result streaming - Process millions of rows with constant memory usage
🛠️ Developer Experience
- Clean, intuitive API - Simple methods for queries, inserts, and DDL operations
- Optional ActiveRecord integration - Familiar model-based access when you need it
- ClickHouse-specific query extensions - PREWHERE, FINAL, SAMPLE, and SETTINGS DSL built-in
- Comprehensive type system - Full support for all ClickHouse types including Nullable, Array, Map, Tuple, Enum, Decimal
📊 Production Ready
- Automatic retries - Configurable exponential backoff for transient failures
- HTTP compression - Reduce bandwidth for large payloads
- Connection health monitoring - Pool statistics and health checks
- Extensive test coverage - 80%+ coverage with both unit and integration tests
Features
Core (v0.1.0)
- Simple HTTP client - Clean API for queries, commands, and bulk inserts
- Connection pooling - Built-in connection pool with health checks
- Type system - Full support for ClickHouse types including Nullable, Array, Map, Tuple
- Proper error handling - Never silently ignores errors (fixes clickhouse-activerecord #230)
- SSL/TLS support - Certificate verification enabled by default
- ActiveRecord integration - Optional familiar model-based access
Enhanced (v0.2.0)
- Enum & Decimal types - Fixed-set values and arbitrary precision arithmetic
- Query optimization - PREWHERE clause, FINAL deduplication, SAMPLE approximation
- Performance - HTTP compression, result streaming, automatic retries with backoff
- Query control - Per-query SETTINGS DSL for ClickHouse configuration
Installation
Add this line to your application's Gemfile:
gem 'clickhouse-ruby'
And then execute:
bundle install
Or install it yourself as:
gem install clickhouse-ruby
Quick Start
require 'clickhouse_ruby'
# Configure the client
config = ClickhouseRuby::Configuration.new
config.host = 'localhost'
config.port = 8123
config.database = 'default'
# Create a client
client = ClickhouseRuby::Client.new(config)
# Execute a query
result = client.execute('SELECT 1 + 1 AS result')
puts result.first['result'] # => 2
# Insert data
client.insert('events', [
{ date: '2024-01-01', event_type: 'click', count: 100 },
{ date: '2024-01-02', event_type: 'view', count: 250 }
])
Configuration
Basic Configuration
ClickhouseRuby.configure do |config|
config.host = 'localhost'
config.port = 8123
config.database = 'default'
config.username = 'default'
config.password = ''
config.ssl = false
config.connect_timeout = 10
config.read_timeout = 60
end
# Use the default client
client = ClickhouseRuby.client
Configuration Options
| Option | Description | Default |
|---|---|---|
host |
ClickHouse server hostname | localhost |
port |
HTTP interface port | 8123 |
database |
Default database | default |
username |
Authentication username | default |
password |
Authentication password | nil |
ssl |
Enable HTTPS | false |
ssl_verify |
Verify SSL certificates | true |
ssl_ca_path |
Custom CA certificate file path | nil |
connect_timeout |
Connection timeout in seconds | 10 |
read_timeout |
Read timeout in seconds | 60 |
write_timeout |
Write timeout in seconds | 60 |
pool_size |
Connection pool size | 5 |
log_level |
Logger level (:debug, :info, :warn, :error) |
:info |
default_settings |
Global ClickHouse settings for all queries | {} |
pool_timeout |
Wait time for available connection in seconds | 5 |
retry_jitter |
Jitter strategy (:full, :equal, :none) |
:equal |
Environment Variables
ClickhouseRuby can be configured via environment variables:
CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_DATABASE=default
CLICKHOUSE_USERNAME=default
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_SSL=false
v0.2.0 Enhancements
HTTP Compression - Reduce bandwidth for large payloads:
ClickhouseRuby.configure do |config|
config.compression = 'gzip'
config.compression_threshold = 1024 # Only compress > 1KB
end
Retry Logic - Automatic retries with exponential backoff:
ClickhouseRuby.configure do |config|
config.max_retries = 3
config.initial_backoff = 1.0
config.backoff_multiplier = 1.6
config.max_backoff = 120
end
Result Streaming - Process large results with constant memory:
client.stream_execute('SELECT * FROM huge_table') do |row|
process_row(row)
end
ActiveRecord Query Extensions - ClickHouse-specific query methods:
# Query optimization
Event.prewhere(date: Date.today).where(status: 'active')
# Deduplication
User.final.where(id: 123)
# Approximate queries
Event.sample(0.1).count # 10% sample
# Per-query configuration
Event.settings(max_threads: 4).where(active: true)
See docs/features/README.md for detailed documentation on all v0.2.0 features.
Usage
Querying Data
# Simple query - returns Result object
result = client.execute('SELECT * FROM events LIMIT 10')
result.each { |row| puts row['event_type'] }
# Access columns and types
result.columns # => ['id', 'event_type', 'count']
result.types # => ['UInt64', 'String', 'UInt32']
# Query with settings
result = client.execute(
'SELECT * FROM events',
settings: { max_rows_to_read: 1_000_000 }
)
Result Metadata
Access query metadata and result information:
result = client.execute('SELECT * FROM events LIMIT 100')
# Query execution metadata
result.elapsed_time # => 0.042 (seconds)
result.rows_read # => 100
result.bytes_read # => 8500
result.rows_written # => 0 (for SELECT queries)
result.bytes_written # => 0
# Result size information
result.count # => 100 (alias for size/length)
result.size # => 100
result.length # => 100
# Column information
result.columns # => ["id", "event_type", "count"]
result.column_types # => ["UInt64", "String", "UInt32"]
result.types # => ["UInt64", "String", "UInt32"]
# Row access methods
result.first # => {"id" => 1, "event_type" => "click", "count" => 100}
result.last # => {"id" => 100, "event_type" => "view", "count" => 50}
result[5] # => {"id" => 5, "event_type" => "click", "count" => 75}
# Get all values for a specific column
result.column_values("event_type") # => ["click", "view", "click", ...]
DDL Commands
# Create table
client.command("CREATE TABLE events (\n date Date,\n event_type String,\n count UInt32\n) ENGINE = MergeTree()\nORDER BY date\n")
# Drop table
client.command('DROP TABLE IF EXISTS events')
Inserting Data
# Insert hash array (uses efficient JSONEachRow format)
client.insert('events', [
{ date: '2024-01-01', event_type: 'click', count: 100 },
{ date: '2024-01-02', event_type: 'view', count: 250 }
])
# Insert with explicit columns
client.insert('events', data, columns: ['date', 'event_type', 'count'])
Connection Management
# Check connection
client.ping # => true
# Get server version
client.server_version # => "24.1.1.123"
# Get pool statistics
client.pool_stats # => {
# size: 5, # Total pool capacity
# available: 4, # Connections ready to use
# in_use: 1, # Connections currently in use
# total_connections: 5, # Total connections created
# total_checkouts: 150, # Lifetime pool checkout count
# total_timeouts: 0, # Timeouts waiting for connection
# uptime_seconds: 3600.5 # Seconds since pool created
# }
# Close all connections
client.close
Advanced Client Methods
Batch Processing
Process large query results in batches to manage memory efficiently:
# Process 500 rows at a time
client.each_batch('SELECT * FROM huge_table', batch_size: 500) do |batch|
# batch is an array of hashes (max 500 rows)
puts "Processing batch of #{batch.size} rows"
insert_to_cache(batch)
end
# Default batch size is 500 rows
client.each_batch('SELECT * FROM data') { |batch| process(batch) }
Row-by-Row Processing
Process results one row at a time for maximum memory efficiency:
# Stream processing - constant memory usage
client.each_row('SELECT * FROM massive_table') do |row|
# row is a single hash
puts "Processing: #{row['id']}"
update_statistics(row)
end
# Returns Enumerator if no block given
rows = client.each_row('SELECT * FROM table')
rows.each { |row| puts row['name'] }
Connection Aliases
Additional connection management methods:
# Disconnect all connections in the pool
client.disconnect
# Check if client is connected
client.connected? # => true
Module-Level Methods
Quick access without explicit client instance:
# Execute query using default client
result = ClickhouseRuby.execute('SELECT 1 AS num')
# Insert data using default client
ClickhouseRuby.insert('events', [
{ date: '2024-01-01', event_type: 'click' }
])
# Ping default client
ClickhouseRuby.ping # => true
Performance Note: Batch and row-by-row processing methods use result streaming internally, which maintains constant memory usage regardless of result size. Use these methods for queries that return millions of rows to prevent memory exhaustion.
Type Support
ClickhouseRuby supports all ClickHouse types:
| ClickHouse Type | Ruby Type |
|---|---|
| Int8-Int64, UInt8-UInt64 | Integer |
| Float32, Float64 | Float |
| String, FixedString | String |
| Date, Date32 | Date |
| DateTime, DateTime64 | Time |
| UUID | String |
| Bool | Boolean |
| Nullable(T) | T or nil |
| Array(T) | Array |
| Map(K, V) | Hash |
| Tuple(T...) | Array |
| LowCardinality(T) | T |
Error Handling
begin
client.execute('SELECT * FROM nonexistent_table')
rescue ClickhouseRuby::UnknownTable => e
puts "Table not found: #{e.message}"
puts "Error code: #{e.code}" # ClickHouse error code
puts "HTTP status: #{e.http_status}" # HTTP response code
rescue ClickhouseRuby::ConnectionError => e
puts "Connection failed: #{e.message}"
rescue ClickhouseRuby::QueryError => e
puts "Query failed: #{e.message}"
end
Error Classes
ClickhouseRuby::Error- Base error classClickhouseRuby::ConnectionError- Connection issuesClickhouseRuby::ConnectionTimeout- Timeout errorsClickhouseRuby::QueryError- Query execution errorsClickhouseRuby::SyntaxError- SQL syntax errorsClickhouseRuby::UnknownTable- Table doesn't existClickhouseRuby::UnknownColumn- Column doesn't existClickhouseRuby::UnknownDatabase- Database doesn't exist
ActiveRecord Integration
ClickhouseRuby provides optional ActiveRecord integration for familiar model-based access.
# config/initializers/clickhouse_ruby.rb
require 'clickhouse_ruby/active_record'
ClickhouseRuby::ActiveRecord.establish_connection(
host: 'localhost',
database: 'analytics'
)
# app/models/event.rb
class Event < ClickhouseRuby::ActiveRecord::Base
self.table_name = 'events'
end
# Usage
Event.where(date: '2024-01-01').limit(10).each do |event|
puts event.event_type
end
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests.
Running Tests
# Start ClickHouse
docker-compose up -d
# Run unit tests only
bundle exec rspec spec/unit
# Run all tests including integration
CLICKHOUSE_TEST_INTEGRATION=true bundle exec rspec
Requirements
- Ruby >= 2.6.0
- ClickHouse >= 20.x (tested with 24.x)
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/Mohamad-Kamar/clickhouse-ruby.
- Fork it
- Create your feature branch (
git checkout -b feature/my-new-feature) - Commit your changes (
git commit -am 'Add some feature') - Push to the branch (
git push origin feature/my-new-feature) - Create a new Pull Request
License
The gem is available as open source under the terms of the MIT License.
Documentation
Full documentation is available at RubyDoc.