ElasticsearchRecord
ActiveRecord adapter for Elasticsearch
ElasticsearchRecord is a ActiveRecord adapter and provides similar functionality for Elasticsearch.
PLEASE NOTE:
- This is the
rails-7-0-stable
-branch, which only supports rails 7.0 (see section ‘Rails_Versions’ for supported versions) - supports ActiveRecord ~> 7.0 + Elasticsearch >= 7.17
Rails versions
Supported rails versions:
Rails 7.1:
(since gem version 1.8)
https://github.com/ruby-smart/elasticsearch_record/tree/rails-7-1-stable
Rails 7.0:
(until gem version 1.7)
https://github.com/ruby-smart/elasticsearch_record/tree/rails-7-0-stable
Installation
Add this line to your application’s Gemfile:
```ruby gem ‘elasticsearch_record’, ‘~> 1.7’
alternative
gem ‘elasticsearch_record’, git: ‘https://github.com/ruby-smart/elasticsearch_record’, branch: ‘rails-7-0-stable’
```
And then execute:
$ bundle install
Or install it yourself as:
$ gem install elasticsearch_record
Features
- ActiveRecord’s
create, read, update & delete
behaviours - Active Record Query Interface
- query-chaining
- scopes
- additional relation methods to find records with
filter, must, must_not, should
- aggregated queries with Elasticsearch
aggregation
methods - resolve search response
hits
,aggregations
,buckets
, … instead of ActiveRecord objects
- Third-party gem support
- access
elasticsearch-dsl
query builder throughmodel.search{ ... }
- access
- Schema
- dump
- create & update of tables (indices) with mappings, settings & aliases
- Instrumentation for ElasticsearchRecord
- logs Elasticsearch API-calls
- shows Runtime in logs
Notice
Since ActiveRecord does not have any configuration option to support transactions and Elasticsearch does NOT support transactions, it may be risky to ignore them.
As a default, transactions are ‘silently swallowed’ to not break any existing applications…
To raise an exception while using transactions on a ElasticsearchRecord model, the following flag can be enabled. However enabling this flag will surely fail transactional tests (prevent this with ‘use_transactional_tests=false’)
```ruby # config/initializers/elasticsearch_record.yml
enable transactional exceptions
ElasticsearchRecord.error_on_transaction = true ```
Setup
a) Update your database.yml and add a elasticsearch connection:
```yml # config/database.yml
development: primary: # <…>
# elasticsearch elasticsearch: adapter: elasticsearch host: localhost:9200 user: elastic password: ‘**’
# enable ES verbose logging
# log: true
# add table (index) prefix & suffix to all 'tables'
# table_name_prefix: 'app-'
# table_name_suffix: '-development'
production: # <…>
# elasticsearch elasticsearch: # <…>
# add table (index) prefix & suffix to all 'tables'
# table_name_prefix: 'app-'
# table_name_suffix: '-production'
test: …
```
b) Require elasticsearch_record/instrumentation
in your application.rb (if you want to…):
```ruby # config/application.rb
require_relative “boot”
require “rails” # Pick the frameworks you want:
<…>
add instrumentation
require ‘elasticsearch_record/instrumentation’
module Application # … end ```
c) Create a model that inherits from ElasticsearchRecord::Base
model.
```ruby # app/models/application_elasticsearch_record.rb
class ApplicationElasticsearchRecord < ElasticsearchRecord::Base # needs to be abstract self.abstract_class = true end ```
Example class, that inherits from ApplicationElasticsearchRecord
```ruby # app/models/search.rb
class Search < ApplicationElasticsearchRecord
end ```
d) have FUN with your model:
```ruby scope = Search .where(name: ‘Custom Object Name’) .where(token: nil) .filter(terms: [:x, :y]) .limit(5)
take the first object
obj = scope.take
update the objects name
obj.update(name: “Not-So-Important”)
extend scope and update all docs
scope.where(kind: :undefined).offset(10).update_all(name: “New Name”)
```
Active Record Query Interface
Refactored where
method:
Different to the default where-method you can now use it in different ways.
Using it by default with a Hash, the method decides itself to either add a filter, or must_not clause.
Hint: If not provided through #kind
-method a default kind :bool will be used.
```ruby # use it by default Search.where(name: ‘A nice object’) # > filter: {name: ‘A nice object’}
use it by default with an array
Search.where(name: [‘A nice object’,’or other object’]) # > filter: {name: [‘A nice object’,’or other object’]}
use it by default with nil
Search.where(name: nil) # > must_not: { exists: { field: ‘name’ } }
——————————————————————-
use it with a prefix
Search.where(:should, term: ‘Mano’) # > should: {name: ‘Mano’} ```
Result methods:
You can simply return RAW data without instantiating ActiveRecord objects:
```ruby
returns the response RAW hits hash.
hits = Search.where(name: ‘A nice object’).hits # > “relation”=>”eq”, “max_score”=>1.0, “hits”=>[{ “_index”: “search”, “_type”: “_doc”, “_id”: “abc123”, “_score”: 1.0, “_source”: { “name”: “A nice object”, …
Returns the RAW +_source+ data from each hit - aka. +rows+.
results = Search.where(name: ‘A nice object’).results # > [{ “name”: “A nice object”, …
returns the response RAW aggregations hash.
aggs = Search.where(name: ‘A nice object’).aggregate(:total, {field: :amount}).aggregations # > “total”=>{“value”=>6722604“total”=>{“value”=>6722604.0}
returns the (nested) bucket values (and aggregated values) from the response aggregations.
buckets = Search.where(name: ‘A nice object’).aggregate(:total, {field: :amount}).buckets # > “total”=>6722604“total”=>6722604.0
resolves RAW +_source+ data from each hit with a +point_in_time+ query (also includes _id)
# useful if you want more then 10000 results. results = Search.where(name: ‘A nice object’).pit_results # > [{ “_id”: “abc123”, “name”: “A nice object”, …
returns the total value of the query without querying again (it uses the total value from the response)
scope = Search.where(name: ‘A nice object’).limit(5) results_count = scope.count # > 5 total = scope.total # > 3335 ```
Available core query methods
- find_by_sql
- find_by_query
- find_by_esql
- esql
- msearch
- search
see simple documentation about these methods @ rubydoc
(also see @ github )
Available query/relation chain methods
- kind
- configure
- aggregate
- refresh
- timeout
- query
- filter
- must_not
- must
- should
- aggregate
- restrict
- hits_only!
- aggs_only!
- total_only!
see simple documentation about these methods @ rubydoc
(also see @ github )
Available calculation methods
- percentiles
- percentile_ranks
- cardinality
- average
- minimum
- maximum
- sum
- boxplot
- stats
- string_stats
- matrix_stats
- median_absolute_deviation
- calculate
see simple documentation about these methods @ rubydoc
(also see @ github )
Available result methods
- aggregations
- buckets
- hits
- results
- total
- msearch
- agg_pluck
- composite
- point_in_time
- pit_results
- pit_delete
see simple documentation about these methods @ rubydoc
(also see @ github )
Additional methods
- to_query
Useful model class attributes
index_base_name
Rails resolves a pluralized underscore table_name from the class name by default - which will not work for some models.
To support a generic +table_name_prefix+ & +table_name_suffix+ from the database.yml, the ‘index_base_name’ provides a possibility to chain prefix, base and suffix.
```ruby class UnusalStat < ApplicationElasticsearchRecord self.index_base_name = ‘unusal-stats’ end
UnusalStat.where(year: 2023).to_query # => :body … ```
delegate_id_attribute
Rails resolves the primary_key’s value by accessing the #id method.
Since Elasticsearch also supports an additional, independent id attribute,
it would only be able to access this through _read_attribute(:id)
.
To also have the ability of accessing this attribute through the default, this flag can be enabled.
```ruby class SearchUser < ApplicationElasticsearchRecord # attributes: id, name end
create new user within the index
user = SearchUser.create(id: 8, name: ‘Parker’)
accessing the id, does NOT return the stored id by default - this will be delegated to the primary_key ‘_id’.
user.id # => ‘b2e34xa2’
– ENABLE delegation ——————————————————————-
SearchUser.delegate_id_attribute = true
create new user within the index
user = SearchUser.create(id: 9, name: ‘Pam’)
accessing the id accesses the stored attribute now
user.id # => 9
accessing the ES index id
user._id # => ‘xtf31bh8x’ ```
delegate_query_nil_limit
Elasticsearch’s default value for queries without a size is forced to 10.
To provide a similar behaviour as the (my)SQL interface,
this can be automatically set to the max_result_window
value by calling .limit(nil)
on the models’ relation.
```ruby SearchUser.where(name: ‘Peter’).limit(nil) # returns a maximum of 10 items … # => […]
– ENABLE delegation ——————————————————————-
SearchUser.delegate_query_nil_limit = true
SearchUser.where(name: ‘Peter’).limit(nil) # returns up to 10_000 items … # => […]
hint: setting the ‘max_result_window’ can also be done by providing ‘max’ wto the limit method: SearchUser.limit(‘max’)
hint: if you want more than 10_000 use the +#pit_results+ method!
```
Useful model class methods
- auto_increment?
- max_result_window
- source_column_names
- searchable_column_names
- find_by_query
- msearch
Useful model API methods
Quick access to model-related methods for easier access without creating a overcomplicated method call on the models connection…
Access these methods through the model class method .api
.
```ruby # returns mapping of model class klass.api.mappings
e.g. for ElasticUser model
SearchUser.api.mappings
insert new raw data
SearchUser.api.insert([{name: ‘Hans’, age: 34, ‘Peter’, age: 22]) ```
dangerous methods
- open!
- close!
- refresh!
- block!
- unblock!
dangerous methods with args
- create!(…)
- clone!(…)
- rename!(…)
- backup!(…)
- restore!(…)
- reindex!(…)
dangerous methods with confirm parameter
- drop!(confirm: true)
- truncate!(confirm: true)
table methods
- mappings
- metas
- settings
- aliases
- state
- schema
- exists?
plain methods
- alias_exists?(…)
- setting_exists?(…)
- mapping_exists?(…)
- meta_exists?(…)
Fast insert, update, delete raw data
- index(…)
- insert(…)
- update(…)
- delete(…)
- bulk(…)
ActiveRecord ConnectionAdapters table-methods
Access these methods through the model class method .connection
.
ruby
# returns mapping of provided table (index)
klass.connection.table_mappings('table-name')
- table_mappings
- table_metas
- table_settings
- table_aliases
- table_state
- table_schema
- alias_exists?
- setting_exists?
- mapping_exists?
- meta_exists?
- max_result_window
- cluster_info
- cluster_settings
- cluster_health
Active Record Schema migration methods
Access these methods through the model’s connection or within any Migration
.
cluster actions:
- open_table
- open_tables
- close_table
- close_tables
- truncate_table
- truncate_tables
- refresh_table
- refresh_tables
- drop_table
- block_table
- unblock_table
- clone_table
- create_table
- change_table
- rename_table
- reindex_table
- backup_table
- restore_table
table actions:
- change_meta
- remove_meta
- add_mapping
- change_mapping
- change_mapping_meta
- change_mapping_attributes
- remove_mapping
- add_setting
- change_setting
- remove_setting
- add_alias
- change_alias
- remove_alias
Example migration:
```ruby class AddTests < ActiveRecord::Migration[7.0] def up create_table “assignments”, if_not_exists: true do |t| t.string :key, primary_key: true t.text :value t.timestamps
t.setting :number_of_shards, "1"
t.setting :number_of_replicas, 0
end
# changes the auto-increment value
change_meta "assignments", :auto_increment, 3625
# removes the mapping 'updated_at' from the 'assignments' index.
# the flag 'recreate' is required, since 'remove' is not supported for elasticsearch.
# this will recreate the whole index (data will be LOST!!!)
remove_mapping :assignments, :updated_at, recreate: true
create_table "settings", force: true do |t|
t.mapping :created_at, :date
t.mapping :key, :integer do |m|
m.primary_key = true
m.auto_increment = 10
end
t.mapping :status, :keyword
t.mapping :updated_at, :date
t.mapping :value, :text
t.setting "index.number_of_replicas", "0"
t.setting "index.number_of_shards", "1"
t.setting "index.routing.allocation.include._tier_preference", "data_content"
end
add_mapping "settings", :active, :boolean do |m|
m.comment = "Contains the active state"
end
change_table 'settings', force: true do |t|
t.add_setting("index.search.idle.after", "20s")
t.add_setting("index.shard.check_on_startup", true)
t.add_alias('supersettings')
end
remove_alias('settings', :supersettings)
remove_setting('settings', 'index.search.idle.after')
change_table 'settings', force: true do |t|
t.integer :amount_of_newbies
end
create_table "vintage", force: true do |t|
t.primary_key :number
t.string :name
t.string :comments
t.timestamps
end
change_table 'vintage', if_exists: true, recreate: true do |t|
t.change_mapping :number, fields: {raw: {type: :keyword}}
t.remove_mapping :number
end end
def down drop_table ‘assignments’ drop_table ‘settings’ drop_table ‘vintage’ end end ```
environment-related-table-name:
Using the _env_table_name
-method will resolve the table (index) name within the current environment,
even if the environments shares the same cluster …
This can be provided through the database.yml
by using the table_name_prefix/suffix
configuration keys.
Within the migration the _env_table_name
-method must be used in combination with the table (index) base name.
Example: Production uses a index suffix with ‘-pro’, development uses ‘-dev’ - they share the same cluster, but different indexes.
For the settings table:
- settings-pro
- settings-dev
A single migration can be created to be used within each environment:
```ruby # Example migration class AddSettings < ActiveRecord::Migration[7.0] def up create_table _env_table_name(“settings”), force: true do |t| t.mapping :created_at, :date t.mapping :key, :integer do |m| m.primary_key = true m.auto_increment = 10 end t.mapping :status, :keyword t.mapping :updated_at, :date t.mapping :value, :text
t.setting "index.number_of_replicas", "0"
t.setting "index.number_of_shards", "1"
t.setting "index.routing.allocation.include._tier_preference", "data_content"
end end
def down drop_table _env_table_name(“settings”) end end ```
Docs
Contributing
Bug reports and pull requests are welcome on GitHub. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
License
The gem is available as open source under the terms of the MIT License.
A copy of the LICENSE can be found @ the docs.
Code of Conduct
Everyone interacting in the project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the CODE OF CONDUCT.