Tartarus::Rb
A gem for archiving (deleting) old records you no longer need. Send them straight to tartarus!
Installation
Add this line to your application's Gemfile:
gem 'tartarus-rb'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install tartarus-rb
Usage
This gem is based on sidekiq-cron, which means you can manage (e.g. disable/enable) jobs from the sidekiq-cron UI.
Here are some examples of how to use it.
Put the following in an initializer, e.g. in `config/initializers/sidekiq.rb`, right after loading the schedule for sidekiq-cron:
Sidekiq.configure_server do |config|
  config.on(:startup) do
    schedule_file = "config/schedule.yml"

    if File.exist?(schedule_file) && Sidekiq.server?
      Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)

      tartarus = Tartarus.new

      tartarus.register do |item|
        item.model = ModelThatYouWantToArchive
        item.cron = "5 4 * * *"
        item.queue = "default"
        item.tenants_range = -> { Account.active }
        item.tenant_value_source = :uuid
        item.tenant_id_field = :account_uuid
        item.archive_items_older_than = -> { 30.days.ago }
        item.timestamp_field = :created_at
        item.archive_with = :destroy_all
      end

      tartarus.register do |item|
        item.model = OtherModelThatYouWantToArchive
        item.cron = "5 5 * * *"
        item.queue = "default"
        item.tenants_range = -> { ["Account", "User"] }
        item.tenant_id_field = :model_type
        item.archive_items_older_than = -> { 30.days.ago }
        item.timestamp_field = :created_at
      end

      glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
        aws_key: ENV.fetch("AWS_KEY"),
        aws_secret: ENV.fetch("AWS_SECRET"),
        aws_region: ENV.fetch("AWS_REGION"),
        vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
        root_path: Rails.root.to_s,
        archive_registry_factory: ArchiveRegistry,
      )

      # don't forget about installing `aws-sdk-glacier` gem
      tartarus.register do |item|
        item.model = YetAnotherModel
        item.cron = "5 6 * * *"
        item.queue = "default"
        item.timestamp_field = :created_at
        item.archive_items_older_than = -> { 1.week.ago }
        item.remote_storage = Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
      end

      tartarus.schedule # this method must be called to create jobs for sidekiq-cron!
    end
  end
end
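The `cron` values in the example use the standard five-field cron syntax (minute, hour, day of month, month, day of week), so the three strategies run daily at 04:05, 05:05 and 06:05. As a plain-Ruby sketch, the fields can be labeled as follows. The `describe_cron` helper is hypothetical and is not part of tartarus-rb or sidekiq-cron:

```ruby
# Hypothetical helper for illustration only; not part of tartarus-rb or sidekiq-cron.
CRON_FIELDS = %i[minute hour day_of_month month day_of_week].freeze

# Splits a five-field cron expression and labels each field.
def describe_cron(expression)
  values = expression.split
  raise ArgumentError, "expected 5 fields, got #{values.size}" unless values.size == CRON_FIELDS.size

  CRON_FIELDS.zip(values).to_h
end

schedule = describe_cron("5 4 * * *")
puts format("runs daily at %02d:%02d", schedule[:hour].to_i, schedule[:minute].to_i)
```

Staggering the hours, as the example does, keeps the archiving jobs for different models from starting at the same time.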
You can use the following config params:
- `model` - a name of the ActiveRecord model you want to archive, required
- `name` - name of your strategy, optional. It falls back to `model.to_s`. It's important to set it in cases when you have several strategies for the same model:
```rb
tartarus.register do |item|
  item.model = InternalEvent
  item.name = "archive_account_and_user_internal_events"
  item.cron = "5 5 * * *"
  item.queue = "default"
  item.tenants_range = -> { ["Account", "User"] }
  item.tenant_id_field = :model_type
  item.archive_items_older_than = -> { 30.days.ago }
  item.timestamp_field = :created_at
end

tartarus.register do |item|
  item.model = InternalEvent
  item.name = "archive_post_and_comment_internal_events"
  item.cron = "5 15 * * *"
  item.queue = "default"
  item.tenants_range = -> { ["Post", "Comment"] }
  item.tenant_id_field = :model_type
  item.archive_items_older_than = -> { 10.days.ago }
  item.timestamp_field = :created_at
end
```
- `cron` - cron syntax, required
- `queue` - name of the sidekiq queue you want to use for execution of the jobs, required
- `tenants_range` - optional, use if you want to scope items by a tenant (or any field that can be used for partitioning). It doesn't have to be an ActiveRecord collection; it could be just an array. Must be a proc/lambda/object responding to the `call` method. For an ActiveRecord collection, a `find_each` loop will be used for optimization.
- `tenant_value_source` - optional, relevant only when scoping by a tenant/partitioning field. Specifying `:uuid` here means that the ModelThatYouWantToArchive collection will be scheduled for archiving by the uuid of each Account. It defaults to `id`.
- `tenant_id_field` - required when using `tenants_range`/`tenant_value_source`. It's the DB column that will be used for scoping records by a tenant. For example, here it would be: `ModelThatYouWantToArchive.where(account_uuid: value_of_uuid_from_some_active_account)`
- `archive_items_older_than` - required, for defining retention policy
- `timestamp_field` - required, used for performing a query using the value from `archive_items_older_than`
- `archive_with` - optional (defaults to `delete_all`). Could be `delete_all`, `destroy_all`, `delete_all_without_batches`, `destroy_all_without_batches`, `delete_all_using_limit_in_batches`
- `batch_size` - optional (defaults to `10_000`, used with `delete_all_using_limit_in_batches` strategy)
- `remote_storage` - optional (defaults to `Tartarus::RemoteStorage::Null`, which does nothing). Use this option if you want to store the data somewhere before deleting it.
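To make the interaction between `tenants_range`, `tenant_value_source`, `tenant_id_field`, `archive_items_older_than` and `timestamp_field` more concrete, here is a plain-Ruby sketch of how these options could translate into per-tenant query conditions. It is an illustration under assumptions, not the gem's implementation: `Account` is a stand-in `Struct`, `archiving_conditions` is a hypothetical helper, and using plain values (such as strings) directly as the tenant value is an assumption.

```ruby
# Stand-in for an ActiveRecord model; hypothetical, for illustration only.
Account = Struct.new(:id, :uuid)

# Hypothetical helper: builds one set of query conditions per tenant,
# describing "tenant_id_field = value AND timestamp_field older than cutoff".
def archiving_conditions(tenants_range:, tenant_id_field:, archive_items_older_than:,
                         timestamp_field:, tenant_value_source: :id)
  cutoff = archive_items_older_than.call

  tenants_range.call.map do |tenant|
    # Assumption: objects exposing the source attribute are asked for it,
    # while plain values (e.g. strings) are used as-is.
    value = tenant.respond_to?(tenant_value_source) ? tenant.public_send(tenant_value_source) : tenant

    { tenant_id_field => value, timestamp_field => { older_than: cutoff } }
  end
end

now = Time.now
conditions = archiving_conditions(
  tenants_range: -> { [Account.new(1, "uuid-1"), Account.new(2, "uuid-2")] },
  tenant_value_source: :uuid,
  tenant_id_field: :account_uuid,
  archive_items_older_than: -> { now - 30 * 24 * 60 * 60 }, # 30 days ago, without ActiveSupport
  timestamp_field: :created_at
)
conditions.each { |condition| puts condition.inspect }
```

Both `tenants_range` and `archive_items_older_than` are callables so that they are evaluated when the job runs rather than when the initializer is loaded.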
### Remote Storage
Currently, only `Glacier` (for AWS Glacier) is supported. Also, it works only with a Postgres database and requires [postgres-copy](https://github.com/diogob/postgres-copy).
To take advantage of this feature you will need a couple of things:
1. Apply `acts_as_copy_target` to the archivable model (from `postgres-copy` gem).
2. Create a model that will be used as a registry for all uploads that happened.
3. Install `aws-sdk-glacier` gem.
If you want to make `Version` model archivable and use `ArchiveRegistry` as the registry, you will need the following models and tables:
```rb
database.create_table(:archive_registries) do |t|
  t.string :glacier_location, null: false
  t.string :glacier_checksum, null: false
  t.string :glacier_archive_id, null: false
  t.string :archivable_model, null: false
  t.string :tenant_id_field
  t.string :tenant_id
  t.datetime :completed_at, null: false
end

database.create_table(:versions) do |t|
end

class Version < ApplicationRecord
  acts_as_copy_target
end

class ArchiveRegistry < ApplicationRecord
end
```
You can use the above schema for the registry model as it contains all needed fields.
To initialize the service:
glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
  aws_key: ENV.fetch("AWS_KEY"),
  aws_secret: ENV.fetch("AWS_SECRET"),
  aws_region: ENV.fetch("AWS_REGION"),
  vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
  root_path: Rails.root.to_s,
  archive_registry_factory: ArchiveRegistry,
)
Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
You can also pass `account_id` (by default, the `"-"` string will be used):
glacier_configuration = Tartarus::RemoteStorage::Glacier::Configuration.build(
  aws_key: ENV.fetch("AWS_KEY"),
  aws_secret: ENV.fetch("AWS_SECRET"),
  aws_region: ENV.fetch("AWS_REGION"),
  vault_name: ENV.fetch("GLACIER_VAULT_NAME"),
  root_path: Rails.root.to_s,
  archive_registry_factory: ArchiveRegistry,
  account_id: "some_account_id"
)
Tartarus::RemoteStorage::Glacier.new(glacier_configuration)
Important - do not use Glacier Storage for large batches (> 4 GB) as multipart uploads are not supported yet.
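Since batches above 4 GB are not supported, one option is to check the size of an exported file before handing it over for upload. The following is only a sketch: `MAX_SINGLE_UPLOAD_BYTES` and `uploadable_in_single_request?` are hypothetical names, the gem is not confirmed to perform such a check itself, and reading "4 GB" as `4 * 1024**3` bytes is an assumption.

```ruby
require "tempfile"

# Assumption: "4 GB" is interpreted as 4 GiB.
MAX_SINGLE_UPLOAD_BYTES = 4 * 1024**3

# Hypothetical guard: true when the file is small enough for a single (non-multipart) upload.
def uploadable_in_single_request?(path, limit: MAX_SINGLE_UPLOAD_BYTES)
  File.size(path) <= limit
end

Tempfile.create("export") do |file|
  file.write("id,created_at\n1,2020-01-01\n")
  file.flush
  puts uploadable_in_single_request?(file.path)
end
```

If a batch regularly approaches the limit, narrowing the scope per job (for example with `tenants_range`) is one way to keep each export smaller.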
If you know what you are doing, you can add your own storage, as long as it complies with the following interface:
class Glacier
  attr_reader :configuration
  private :configuration

  def initialize(configuration)
    @configuration = configuration
  end

  def store(collection, archivable_model, tenant_id: nil, tenant_id_field: nil)
  end
end
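As an illustration of that interface, here is a minimal sketch of a custom storage that writes records to a local JSON Lines file. Everything in it is hypothetical: the `LocalFileStorage` class, the `directory` configuration attribute, and the assumption that `collection` is enumerable and that each record responds to `to_h`. What the gem actually passes as `collection` is not verified here.

```ruby
require "json"
require "tmpdir"
require "fileutils"

# Hypothetical storage following the interface above; not part of tartarus-rb.
class LocalFileStorage
  Configuration = Struct.new(:directory)

  attr_reader :configuration
  private :configuration

  def initialize(configuration)
    @configuration = configuration
  end

  # Writes every record as one JSON line and returns the path of the written file.
  def store(collection, archivable_model, tenant_id: nil, tenant_id_field: nil)
    FileUtils.mkdir_p(configuration.directory)
    file_name = [archivable_model, tenant_id_field, tenant_id].compact.join("_") + ".jsonl"
    path = File.join(configuration.directory, file_name)

    File.open(path, "w") do |file|
      # Assumption: each record can be converted to a hash.
      collection.each { |record| file.puts(JSON.generate(record.to_h)) }
    end

    path
  end
end

storage = LocalFileStorage.new(LocalFileStorage::Configuration.new(Dir.mktmpdir))
path = storage.store([{ id: 1 }, { id: 2 }], "Version", tenant_id: "User", tenant_id_field: :item_type)
puts File.readlines(path).size
```

Such an object would then be assigned with `item.remote_storage = ...`, in the same place where the `Glacier` storage is assigned in the examples above.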
Testing before actually using it
You might want to verify that the gem works the way you expect. For that, you will be mostly interested in two use cases:
- scheduling/enqueueing: use `Tartarus::ScheduleArchivingModel#schedule` - for example, `Tartarus::ScheduleArchivingModel.new.schedule("PaperTrailVersion")`. It's going to enqueue either `Tartarus::Sidekiq::ArchiveModelWithTenantJob` or `Tartarus::Sidekiq::ArchiveModelWithoutTenantJob`, depending on the config.
- execution of the archiving logic: use `Tartarus::ArchiveModelWithTenant#archive` (for example, `Tartarus::ArchiveModelWithTenant.new.archive("PaperTrailVersion", "User")`) or `Tartarus::ArchiveModelWithoutTenant#archive` (for example, `Tartarus::ArchiveModelWithoutTenant.new.archive("PaperTrailVersion")`)
You might also want to check `spec/integration` to get an idea of how the integration tests were written.
Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/tartarus-rb.
License
The gem is available as open source under the terms of the MIT License.