ClickhouseRuby

A lightweight Ruby client for ClickHouse with optional ActiveRecord integration.

Why ClickhouseRuby?

ClickhouseRuby is designed from the ground up with production reliability and developer experience in mind. Here's what sets it apart:

🔒 Security & Reliability

SSL verification enabled by default - Secure by default, unlike alternatives that require explicit configuration
Never silently fails - All errors are properly raised and propagated (fixes clickhouse-activerecord #230)
Comprehensive error hierarchy - 30+ specific error classes mapped from ClickHouse error codes

⚡ Performance & Architecture

Zero runtime dependencies - Uses only Ruby stdlib, making it lightweight and fully auditable
AST-based type parser - Handles complex nested types correctly (Array(Tuple(String, UInt64)), etc.) unlike regex-based parsers
Thread-safe connection pooling - Built-in pool with health checks and proper resource management
Result streaming - Process millions of rows with constant memory usage

🛠️ Developer Experience

Clean, intuitive API - Simple methods for queries, inserts, and DDL operations
Optional ActiveRecord integration - Familiar model-based access when you need it
ClickHouse-specific query extensions - PREWHERE, FINAL, SAMPLE, and SETTINGS DSL built-in
Comprehensive type system - Full support for all ClickHouse types including Nullable, Array, Map, Tuple, Enum, Decimal

📊 Production Ready

Automatic retries - Configurable exponential backoff for transient failures
HTTP compression - Reduce bandwidth for large payloads
Connection health monitoring - Pool statistics and health checks
Extensive test coverage - 80%+ coverage with both unit and integration tests

Features

Core (v0.1.0)

Simple HTTP client - Clean API for queries, commands, and bulk inserts
Connection pooling - Built-in connection pool with health checks
Type system - Full support for ClickHouse types including Nullable, Array, Map, Tuple
Proper error handling - Never silently ignores errors (fixes clickhouse-activerecord #230)
SSL/TLS support - Certificate verification enabled by default
ActiveRecord integration - Optional familiar model-based access

Enhanced (v0.2.0)

Enum & Decimal types - Fixed-set values and arbitrary precision arithmetic
Query optimization - PREWHERE clause, FINAL deduplication, SAMPLE approximation
Performance - HTTP compression, result streaming, automatic retries with backoff
Query control - Per-query SETTINGS DSL for ClickHouse configuration

Installation

Add this line to your application's Gemfile:

gem 'clickhouse-ruby'

And then execute:

bundle install

Or install it yourself as:

gem install clickhouse-ruby

Quick Start

require 'clickhouse_ruby'

# Configure the client
config = ClickhouseRuby::Configuration.new
config.host = 'localhost'
config.port = 8123
config.database = 'default'

# Create a client
client = ClickhouseRuby::Client.new(config)

# Execute a query
result = client.execute('SELECT 1 + 1 AS result')
puts result.first['result'] # => 2

# Insert data
client.insert('events', [
  { date: '2024-01-01', event_type: 'click', count: 100 },
  { date: '2024-01-02', event_type: 'view', count: 250 }
])

Configuration

Basic Configuration

ClickhouseRuby.configure do |config|
  config.host = 'localhost'
  config.port = 8123
  config.database = 'default'
  config.username = 'default'
  config.password = ''
  config.ssl = false
  config.connect_timeout = 10
  config.read_timeout = 60
end

# Use the default client
client = ClickhouseRuby.client

Configuration Options

Option	Description	Default
`host`	ClickHouse server hostname	`localhost`
`port`	HTTP interface port	`8123`
`database`	Default database	`default`
`username`	Authentication username	`default`
`password`	Authentication password	`nil`
`ssl`	Enable HTTPS	`false`
`ssl_verify`	Verify SSL certificates	`true`
`ssl_ca_path`	Custom CA certificate file path	`nil`
`connect_timeout`	Connection timeout in seconds	`10`
`read_timeout`	Read timeout in seconds	`60`
`write_timeout`	Write timeout in seconds	`60`
`pool_size`	Connection pool size	`5`
`log_level`	Logger level (`:debug`, `:info`, `:warn`, `:error`)	`:info`
`default_settings`	Global ClickHouse settings for all queries	`{}`
`pool_timeout`	Wait time for available connection in seconds	`5`
`retry_jitter`	Jitter strategy (`:full`, `:equal`, `:none`)	`:equal`

Environment Variables

ClickhouseRuby can be configured via environment variables:

CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_DATABASE=default
CLICKHOUSE_USERNAME=default
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_SSL=false

v0.2.0 Enhancements

HTTP Compression - Reduce bandwidth for large payloads:

ClickhouseRuby.configure do |config|
  config.compression = 'gzip'
  config.compression_threshold = 1024  # Only compress > 1KB
end

Retry Logic - Automatic retries with exponential backoff:

ClickhouseRuby.configure do |config|
  config.max_retries = 3
  config.initial_backoff = 1.0
  config.backoff_multiplier = 1.6
  config.max_backoff = 120
end

Result Streaming - Process large results with constant memory:

client.stream_execute('SELECT * FROM huge_table') do |row|
  process_row(row)
end

ActiveRecord Query Extensions - ClickHouse-specific query methods:

# Query optimization
Event.prewhere(date: Date.today).where(status: 'active')

# Deduplication
User.final.where(id: 123)

# Approximate queries
Event.sample(0.1).count  # 10% sample

# Per-query configuration
Event.settings(max_threads: 4).where(active: true)

See docs/features/README.md for detailed documentation on all v0.2.0 features.

Usage

Querying Data

# Simple query - returns Result object
result = client.execute('SELECT * FROM events LIMIT 10')
result.each { |row| puts row['event_type'] }

# Access columns and types
result.columns  # => ['id', 'event_type', 'count']
result.types    # => ['UInt64', 'String', 'UInt32']

# Query with settings
result = client.execute(
  'SELECT * FROM events',
  settings: { max_rows_to_read: 1_000_000 }
)

Result Metadata

Access query metadata and result information:

result = client.execute('SELECT * FROM events LIMIT 100')

# Query execution metadata
result.elapsed_time    # => 0.042 (seconds)
result.rows_read       # => 100
result.bytes_read      # => 8500
result.rows_written    # => 0 (for SELECT queries)
result.bytes_written   # => 0

# Result size information
result.count           # => 100 (alias for size/length)
result.size            # => 100
result.length          # => 100

# Column information
result.columns         # => ["id", "event_type", "count"]
result.column_types    # => ["UInt64", "String", "UInt32"]
result.types           # => ["UInt64", "String", "UInt32"]

# Row access methods
result.first           # => {"id" => 1, "event_type" => "click", "count" => 100}
result.last            # => {"id" => 100, "event_type" => "view", "count" => 50}
result[5]              # => {"id" => 5, "event_type" => "click", "count" => 75}

# Get all values for a specific column
result.column_values("event_type")  # => ["click", "view", "click", ...]

DDL Commands

# Create table
client.command(<<~SQL)
  CREATE TABLE events (
    date Date,
    event_type String,
    count UInt32
  ) ENGINE = MergeTree()
  ORDER BY date
SQL

# Drop table
client.command('DROP TABLE IF EXISTS events')

Inserting Data

# Insert hash array (uses efficient JSONEachRow format)
client.insert('events', [
  { date: '2024-01-01', event_type: 'click', count: 100 },
  { date: '2024-01-02', event_type: 'view', count: 250 }
])

# Insert with explicit columns
client.insert('events', data, columns: ['date', 'event_type', 'count'])

Connection Management

# Check connection
client.ping # => true

# Get server version
client.server_version # => "24.1.1.123"

# Get pool statistics
client.pool_stats # => {
#   size: 5,                    # Total pool capacity
#   available: 4,               # Connections ready to use
#   in_use: 1,                  # Connections currently in use
#   total_connections: 5,       # Total connections created
#   total_checkouts: 150,       # Lifetime pool checkout count
#   total_timeouts: 0,          # Timeouts waiting for connection
#   uptime_seconds: 3600.5      # Seconds since pool created
# }

# Close all connections
client.close

Advanced Client Methods

Batch Processing

Process large query results in batches to manage memory efficiently:

# Process 500 rows at a time
client.each_batch('SELECT * FROM huge_table', batch_size: 500) do |batch|
  # batch is an array of hashes (max 500 rows)
  puts "Processing batch of #{batch.size} rows"
  insert_to_cache(batch)
end

# Default batch size is 500 rows
client.each_batch('SELECT * FROM data') { |batch| process(batch) }

Row-by-Row Processing

Process results one row at a time for maximum memory efficiency:

# Stream processing - constant memory usage
client.each_row('SELECT * FROM massive_table') do |row|
  # row is a single hash
  puts "Processing: #{row['id']}"
  update_statistics(row)
end

# Returns Enumerator if no block given
rows = client.each_row('SELECT * FROM table')
rows.each { |row| puts row['name'] }

Connection Aliases

Additional connection management methods:

# Disconnect all connections in the pool
client.disconnect

# Check if client is connected
client.connected?  # => true

Module-Level Methods

Quick access without explicit client instance:

# Execute query using default client
result = ClickhouseRuby.execute('SELECT 1 AS num')

# Insert data using default client
ClickhouseRuby.insert('events', [
  { date: '2024-01-01', event_type: 'click' }
])

# Ping default client
ClickhouseRuby.ping  # => true

Performance Note: Batch and row-by-row processing methods use result streaming internally, which maintains constant memory usage regardless of result size. Use these methods for queries that return millions of rows to prevent memory exhaustion.

Type Support

ClickhouseRuby supports all ClickHouse types:

ClickHouse Type	Ruby Type
Int8-Int64, UInt8-UInt64	Integer
Float32, Float64	Float
String, FixedString	String
Date, Date32	Date
DateTime, DateTime64	Time
UUID	String
Bool	Boolean
Nullable(T)	T or nil
Array(T)	Array
Map(K, V)	Hash
Tuple(T...)	Array
LowCardinality(T)	T

Error Handling

begin
  client.execute('SELECT * FROM nonexistent_table')
rescue ClickhouseRuby::UnknownTable => e
  puts "Table not found: #{e.message}"
  puts "Error code: #{e.code}"        # ClickHouse error code
  puts "HTTP status: #{e.http_status}" # HTTP response code
rescue ClickhouseRuby::ConnectionError => e
  puts "Connection failed: #{e.message}"
rescue ClickhouseRuby::QueryError => e
  puts "Query failed: #{e.message}"
end

Error Classes

ClickhouseRuby::Error - Base error class
ClickhouseRuby::ConnectionError - Connection issues
ClickhouseRuby::ConnectionTimeout - Timeout errors
ClickhouseRuby::QueryError - Query execution errors
ClickhouseRuby::SyntaxError - SQL syntax errors
ClickhouseRuby::UnknownTable - Table doesn't exist
ClickhouseRuby::UnknownColumn - Column doesn't exist
ClickhouseRuby::UnknownDatabase - Database doesn't exist

ActiveRecord Integration

ClickhouseRuby provides optional ActiveRecord integration for familiar model-based access.

# config/initializers/clickhouse_ruby.rb
require 'clickhouse_ruby/active_record'

ClickhouseRuby::ActiveRecord.establish_connection(
  host: 'localhost',
  database: 'analytics'
)

# app/models/event.rb
class Event < ClickhouseRuby::ActiveRecord::Base
  self.table_name = 'events'
end

# Usage
Event.where(date: '2024-01-01').limit(10).each do |event|
  puts event.event_type
end

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests.

Running Tests

# Start ClickHouse
docker-compose up -d

# Run unit tests only
bundle exec rspec spec/unit

# Run all tests including integration
CLICKHOUSE_TEST_INTEGRATION=true bundle exec rspec

Requirements

Ruby >= 2.6.0
ClickHouse >= 20.x (tested with 24.x)

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/Mohamad-Kamar/clickhouse-ruby.

Fork it
Create your feature branch (git checkout -b feature/my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin feature/my-new-feature)
Create a new Pull Request

License

The gem is available as open source under the terms of the MIT License.

Documentation

Full documentation is available at RubyDoc.