DataChecks
This gem provides a small DSL to check your data for inconsistencies and anomalies.
Requirements
- ruby 3.0+
- activerecord 7.0+
Installation
Add this line to your application's Gemfile:
gem "data_checks"
$ bundle install
$ bin/rails generate data_checks:install
Motivation
Making sure that data stays valid is not a trivial task. For simple requirements, like "this column is not null" or "this column is unique", you can simply use database constraints. The same goes for type validation and referential integrity.
However, when you want to check for something more complex, then it all changes. Depending on your DBMS, you can use stored procedures, but this is often harder to write, version and maintain.
You could also assume that your data will never get corrupted, and validations directly in the code can do the trick ... but that'd be way too optimistic. Bugs happen all the time, and it's best to plan for the worst.
This gem doesn't aim to replace those tools, but provides something else that serves a similar purpose: ensuring that you work with the data you expect.
This gem helps you to schedule some verifications on your data and get alerts when something is unexpected.
data_checks can help to catch:
- 🐛 Bugs due to race conditions (e.g. user accidentally double clicks a button to delete an email and ends up without emails due to a race condition bug in the app)
- 🐛 Invalid persisted data
- 🐛 Unexpected changes in behavior and data (e.g. too many or too few of something being created/deleted/imported/enqueued, etc.)
This idea is nicely presented in the RailsConf 2018 talk "The Doctor Is In: Using checkups to find bugs in production" by Ryan Laughlin.
Usage
The gem provides a small DSL for expressing predicates and an easy way to configure notifications.
You will be notified when a check starts failing, and when it starts passing again.
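The notification rule above (alert on the first failure and on recovery, not on every run) can be sketched in plain Ruby. This is only an illustration of the idea, not the gem's implementation: `TransitionTracker`, `record`, and the assumption that a new check starts out passing are all hypothetical.

```ruby
# Hypothetical sketch: notify only when a check's status changes.
# TransitionTracker is an illustrative name, not part of data_checks.
class TransitionTracker
  attr_reader :notifications

  def initialize
    @last_status = {}    # check name => :passing or :failing
    @notifications = []  # messages that would be handed to notifiers
  end

  # Record the latest result and emit a message only on a status change.
  def record(check_name, passed)
    status = passed ? :passing : :failing
    # Assumption: a check with no history is treated as passing.
    previous = @last_status.fetch(check_name, :passing)
    @last_status[check_name] = status
    return if status == previous

    @notifications << "#{check_name} is now #{status}"
  end
end

tracker = TransitionTracker.new
[true, false, false, true].each do |passed|
  tracker.record(:images_without_previews, passed)
end
puts tracker.notifications
```

Four runs produce two messages here: one when the check starts failing and one when it recovers.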
Checking for inconsistencies
For example, we expect every image attachment to have previews in 3 sizes. When a new image is attached, some previews may fail to generate. We want to ensure that no image ends up without a full set of previews, and likewise that no user ends up without an email address. We could write something like:
DataChecks.configure do
  ensure_no :users_without_emails, tag: "minutely" do
    User.where.missing(:email_addresses)
  end
  ensure_no :images_without_previews, tag: "hourly" do
    Attachment.images
      .left_joins(:previews)
      .group(:attachment_id)
      .having("COUNT(previews.id) < 3")
  end
  notifier :email,
    from: "[email protected]",
    to: "[email protected]"
end
Checking for anomalies
This gem can also be used to detect anomalies in the data. For example, you expect at least some number of new orders to be placed within a given period of time. If fewer arrive, this can hint at a bug in the order placing system that is worth investigating.
ensure_more :new_orders_per_hour, than: 10, tag: "hourly" do
  Order.where("created_at >= ?", 1.hour.ago).count
end
Configuration
Custom configurations should be placed in a data_checks.rb initializer.
# config/initializers/data_checks.rb
DataChecks.configure do
  # ...
end
Notifiers
Currently, the following notifiers are supported:
- :email: Uses ActionMailer to send emails. You can pass it any ActionMailer options.
- :slack: Sends notifications to Slack. Accepts the following options:
  - webhook_url: The webhook URL to send notifications to
- :logger: Uses Logger to output notifications to the log. Accepts the following params:
  - logdev: The log device. This is a filename (String) or IO object (typically STDOUT, STDERR, or an open file).
  - level: Logging severity threshold (e.g. Logger::INFO)
Each of them accepts a formatter_class config that sets the formatter used when generating a notification.
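The logdev and level params match the arguments of Ruby's standard Logger. The sketch below uses only the standard library to show what they control; it assumes the :logger notifier passes them through to Logger unchanged, which the text above does not confirm. The log messages are made up for illustration.

```ruby
require "logger"
require "stringio"

# logdev can be a filename (String) or any IO object; StringIO keeps this self-contained.
logdev = StringIO.new
logger = Logger.new(logdev, level: Logger::WARN)

logger.info("check users_without_emails passed")    # below the threshold, dropped
logger.warn("check images_without_previews failed") # at the threshold, written

puts logdev.string
```

With the threshold set to Logger::WARN, only the second message reaches the log device.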
You can create custom notifiers by creating a subclass of Notifier.
Create a notifier:
notifier :email,
  from: "[email protected]",
  to: "[email protected]"
Create multiple notifiers of the same type:
notifier "developers",
  type: :email,
  from: "[email protected]",
  to: ["[email protected]", "[email protected]"]
notifier "tester",
  type: :email,
  from: "[email protected]",
  to: "[email protected]"
ensure_no :images_without_previews, notify: "developers" do # notify only developers
  # ...
end
Checks
- ensure_no will check that the result of a given block is zero?, empty? or false
- ensure_any will check that the result of a given block is > 0
- ensure_more will check that the result of a given block is greater than (>) a given number or that it contains more than a given number of items
- ensure_less will check that the result of a given block is less than (<) a given number or that it contains fewer than a given number of items
- ensure_equal will check that the result of a given block is equal (==) to the given number or that it contains a given number of items
ensure_no :images_without_previews do
  # ...
end
ensure_any :facebook_logins_per_hour do
  # ...
end
ensure_more :new_orders_per_hour, than: 10 do
  # ...
end
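To make the pass conditions concrete, here is a plain Ruby sketch of predicates that follow the rules listed above. The helper names (`magnitude`, `passes_no?`, and so on) are hypothetical and this is not the gem's own code; in particular, treating collections by their size in `passes_any?` is an assumption.

```ruby
# Hypothetical helpers mirroring the documented rules; not the gem's implementation.

# Numbers are compared directly, collections by their number of items.
def magnitude(result)
  return result if result.is_a?(Numeric)
  result.size
end

# ensure_no: passes when the result is zero?, empty? or false.
def passes_no?(result)
  return true if result == false
  return result.zero? if result.respond_to?(:zero?)
  return result.empty? if result.respond_to?(:empty?)
  false
end

# ensure_any: passes when the result is > 0 (assumption: collections count by size).
def passes_any?(result)
  magnitude(result) > 0
end

# ensure_more / ensure_less / ensure_equal compare against a given number.
def passes_more?(result, than:)
  magnitude(result) > than
end

def passes_less?(result, than:)
  magnitude(result) < than
end

def passes_equal?(result, to:)
  magnitude(result) == to
end

puts passes_no?([])                       # empty collection
puts passes_more?(12, than: 10)           # plain number
puts passes_less?(%w[a b c], than: 2)     # collection with 3 items
```

The Numeric guard in `magnitude` matters because Integer#size returns the byte size of the number, not its value.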
Customizing the error handler
Exceptions raised while a check runs are rescued and information about the error is persisted in the database.
If you want to integrate with an exception monitoring service (e.g. Bugsnag), you can define an error handler:
# config/initializers/data_checks.rb
DataChecks.config.error_handler = ->(error, check_context) do
  Bugsnag.notify(error) do |notification|
    notification.add_metadata(:data_checks, check_context)
  end
end
The error handler should be a lambda that accepts 2 arguments:
- error: The exception that was raised.
- check_context: A hash with additional information about the check:
  - check_name: The name of the check that errored
  - ran_at: The time when the check ran
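If you don't use an exception monitoring service, a handler with the same two-argument shape could simply write to a log. The sketch below is self-contained and simulates the call by hand; it assumes check_context uses symbol keys named check_name and ran_at, and the check name and error are made up for illustration.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new
logger = Logger.new(log_output)

# Assumption: check_context is a hash with symbol keys :check_name and :ran_at.
error_handler = ->(error, check_context) do
  logger.error(
    "Check #{check_context[:check_name]} errored at #{check_context[:ran_at]}: " \
    "#{error.class}: #{error.message}"
  )
end

# Simulate a check raising, then pass the error to the handler by hand.
begin
  raise ArgumentError, "unexpected nil"
rescue => e
  error_handler.call(e, { check_name: "new_orders_per_hour", ran_at: Time.now })
end

puts log_output.string
```

In an application, the same lambda would be assigned to DataChecks.config.error_handler instead of being called directly.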
Customizing the backtrace cleaner
DataChecks.config.backtrace_cleaner specifies the backtrace cleaner used when a check errors and its backtrace is cleaned before being persisted. An ActiveSupport::BacktraceCleaner should be used.
# config/initializers/data_checks.rb
cleaner = ActiveSupport::BacktraceCleaner.new
cleaner.add_silencer { |line| line =~ /ignore_this_dir/ }
DataChecks.config.backtrace_cleaner = cleaner
If none is specified, the default Rails.backtrace_cleaner will be used to clean backtraces.
Schedule checks
Schedule checks to run (with cron, Heroku Scheduler, etc).
rake data_checks:run_checks TAG="5 minutes" # run checks with tag="5 minutes"
rake data_checks:run_checks TAG="hourly" # run checks with tag="hourly"
rake data_checks:run_checks TAG="daily" # run checks with tag="daily"
rake data_checks:run_checks # run all checks
Here's what it looks like with cron.
*/5 * * * * rake data_checks:run_checks TAG="5 minutes"
0 * * * * rake data_checks:run_checks TAG="hourly"
30 7 * * * rake data_checks:run_checks TAG="daily"
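Conceptually, the TAG variable selects which checks a given run executes, and omitting it runs all of them. The following plain Ruby sketch illustrates that selection only; `Check`, `CHECKS`, and `checks_for` are hypothetical names, and the gem's actual rake task may work differently.

```ruby
# Hypothetical registry of checks; Check is an illustrative struct, not a gem class.
Check = Struct.new(:name, :tag)

CHECKS = [
  Check.new(:users_without_emails, "5 minutes"),
  Check.new(:images_without_previews, "hourly"),
  Check.new(:new_orders_per_hour, "hourly")
].freeze

# Return the checks matching the tag, or every check when no tag is given.
def checks_for(tag, checks)
  return checks if tag.nil? || tag.empty?
  checks.select { |check| check.tag == tag }
end

# Mirrors `rake data_checks:run_checks TAG="hourly"`; falls back to all checks.
selected = checks_for(ENV["TAG"], CHECKS)
puts selected.map(&:name)
```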
You can also manually get the status of all the checks by running:
rake data_checks:status
Credits
Thanks to checker_jobs gem for the original idea.
Development
After checking out the repo, run bundle install to install dependencies. Then, run rake test to run the tests.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/data_checks.
License
The gem is available as open source under the terms of the MIT License.