GDPR and Ruby logo

GDPR Admin

Build Status Code Climate Test Coverage Gem Version

Rails engine for processing GDPR processes. GDPR Admin offers a simple interface for defining strategies for automating the process of data access and data removal as required by data privacy regulations like GDPR.

GDPR Admin uses simple Ruby classes, so you're free to code your Data Policies as you see fit. A swiss knife of helper methods are available to make the processes even simpler. The focus of the gem is to easily implement:

Installation

Add this line to your application's Gemfile:

gem "gdpr_admin"

And then execute:

$ bundle

Or install it yourself as:

$ gem install gdpr_admin

Then install the migrations:

$ rails gdpr_admin:install:migrations

Usage

Create your data policies file within app/gdpr (configurable) and inherit from GdprAdmin::ApplicationDataPolicy. Implement the methods #scope, #export and #erase for the new data policy. Within the data policy, you will be able to access the GdprAdmin::Request object in any method by calling the method request - you can, therefore, have different scopes and different removal behaviors depending on the request.

Optinally, you may declare a #subject_scope method with logic for scoping subject data. If this method is not present, it will fallback to the #scope method.

class UserDataPolicy < GdprAdmin::ApplicationDataPolicy
  def scope
    User.with_deleted.where(updated_at: ...request.data_older_than)
  end

  # Optional
  def subject_scope
    scope.where(email: request.subject)
  end

  def erase(user)
    user.update_columns(
      first_name: 'Anonymous',
      last_name: "User #{user.id}",
      email: "anonymous.user#{user.id}@company.co",
      anonymized_at: Time.zone.now,
    )
  end
end

Once you have all your data policies defined, create a GdprAdmin::Request to process a new request:

GdprAdmin::Request.create!(
  tenant: current_tenant,
  requester: current_admin_user,
  request_type: 'erase_data',
  data_older_than: 1.month.ago, # Optional: by default, it will be todays date
)

Creating a request will automatically enqueue the request to be processed in 4 hours - this gives time to cancel an accidental request. You can configure this grace period as desired.

Scope Helpers

Helpers to be used within the #scope method.

scope_by_date

scope_by_date(scope, field = :updated_at)

Automatically scopes the data using the updated_at column to match the GDPR Request. You can use a different column by providing a second argument.

class ContactDataPolicy < GdprAdmin::ApplicationDataPolicy
  def scope
    scope_by_date(Contact)
  end
end

Anonymization Helpers

A set of helper methods are available to make the anonymization even simpler. These are not mandatory, but can help you keep your code cleaner and, better, write less code.

erase_fields

erase_fields(record, fields, base_changes = {})

The method erase_fields is available in the Data Policy class. It expects an array of field anonymization options. It will automatically process those fields and update the record in the database using update_columns (so validations are skipped). The last optional argument (base_changes) is a hash of attributes that should be updated when the record is updated. See the example below:

class ContactDataPolicy < GdprAdmin::ApplicationDataPolicy
  def fields
    [
      { field: :first_name, method: :anonymize_first_name },
      { field: :last_name, method: :anonymize_last_name },
      { field: :gender, method: :skip },
      {
        field: :email,
        method: lambda { |contact|
          domain = contact.email[/@.*/]
          "anonymous.contact#{contact.id}#{domain}"
        },
      },
      { field: :street_address, method: :nilify },
      { field: :city, method: :anonymize_city, seed: :id },
    ]
  end

  def erase(contact)
    erase_fields(contact, fields, { anonymized_at: Time.zone.now })
  end
end

The anonymizers used above (e.g. anonymize_first_name), by default, will use the value of the field being updated as the seed. That means that, when anonymization with the same anonymizer function, equal values will always yield the same anonymized value. (note: different values may also yield the same value)

To use the built-in anonymizer functions, you need to install the gem faker.

Data Policy Hooks

For advanced use cases, you may install hooks that are run at different stages of the data policy process.

before_process

Will be run before the policy is executed. Calling skip_data_policy! will raise GdprAdmin::SkipDataPolicyError and stop the execution of the data policy.

class ContactDataPolicy < GdprAdmin::ApplicationDataPolicy
  before_process :skip_internal_contacts!

  private

  def skip_internal_contacts!
    skip_data_policy! if contact.email =~ /.*@company\.com/
  end
end

before_process_record

Called before processing a record (either erasing or exporting). Calling skip_record! will raise GdprAdmin::SkipRecordError and skip processing that particular record.

class UserDataPolicy < GdprAdmin::ApplicationDataPolicy
  before_process :skip_super_admins!

  private

  def skip_super_admins!(user)
    skip_record! if user.role == 'super_admin'
  end
end

GDPR Request

A GDPR Request (GdprAdmin::Request) represents a request to remove a subject's data, tenant's data, or export subject data.

Model GdprAdmin::Request

GdprAdmin::Request.create!(
  tenant: current_tenant,
  requester: current_admin_user,
  request_type: 'erase_data',
  data_older_than: 30.days.ago,
)

#tenant

Defines which tenant is this request attributed to. All requests must be assigned to a Tenant.

#request_type

Represents what type of request is being performed. Must be one of:

  • erase_data: erase a tenant's data;
  • erase_subject: erase a single subject's data in a tenant;
  • export_subject: export data about a subject within a tenant;

#data_older_than

The request should only affect any data created before than date defined here (e.g. erase anything older than 30 days ago).

#process!

This method executes the request. This is automatically called by the GdprAdmin::RequestProcessorJob.

Data Retention Policies

Some tenants may have different data retention policies (e.g. some hold Personally identifiable information for 3 months, others for 1 month).

Setup Job

Run the job GdprAdmin::DataRetentionPoliciesRunnerJob periodically to execute your data retention policy (it is recommended to run it, at least, once a day).

Sidekiq

You can define a repeating job with sidekiq-cron:

Sidekiq::Cron::Job.create(
  class: 'GdprAdmin::DataRetentionPoliciesRunnerJob',
  cron: '0 2 * * *',
  name: 'Run Data Retention Policies',
  queue: 'cron',
  active_job: true,
)

DelayedJobs

GdprAdmin::DataRetentionPoliciesRunnerJob.set(cron: '0 2 * * *', queue: :cron).perform_later

Model GdprAdmin::DataRetentionPolicy

This model allows you to create a custom data policy for each tenant and automatically run the policy periodically.

GdprAdmin::DataRetentionPolicy.create!(
  tenant: acme_inc,
  period_in_days: 30,
)

If you want to have a custom logic for running the data retention policy (e.g. do not run if the organization is deleted), then you can add the logic to the method #should_process?:

module GdprAdmin
  class DataRetentionPolicy < GdprAdmin::ApplicationRecord
    include GdprAdmin::DataRetentionPolicyConcern

    def should_process?
      tenant.deleted_at.nil?
    end
  end
end

PaperTrail

GDPR Admin provides a set of tools to keep your PaperTrail GDPR Compliant.

PaperTrail Data Privacy

By default, PaperTrail versions will not be anonymized. You may extend the default PaperTrail::VersionDataPrivacy with your own scope. If you track custom fields with your versions (e.g. ip), then you can also define an anonymizer for those here:

# app/gdpr/paper_trail/version_data_privacy.rb

module PaperTrail
  class VersionDataPolicy < GdprAdmin::PaperTrail::VersionDataPolicy
    def fields
      [
        { field: 'ip', method: :anonymize_ip },
      ]
    end

    def scope
      return PaperTrail::Version.where(updated_at: ...request.data_older_than) if request.erase_data?

      PaperTrail::Version.none
    end
  end
end

PaperTrail::VersionDataPolicy#erase

NOTE: this method only support JSON format for object and, optionally, object_changes. If you need a different format, you will need to re-implement this method as desired.

erase(version, item_fields = nil)

The erase method will, by default, anonymize the data within object and object_changes (and whichever fields are defined in the #fields method). It will choose which fields to anonymize the object and object_changes and which anonymization methods by finding the item's data policy and loading the fields from its fields method. Unless item_fields is defined, in which case it will be used instead.

For example, if you have a item_type set to User, it will try to find the UserDataPrivacy. If you want to use a different class for a item_type, you must define a data_policy_class in the model.

class User < ApplicationRecord
  def data_policy_class
    PersonDataPolicy
  end
end

If you'd like to just namespace all policies, then you can define data_policy_prefix in the ApplicationRecord:

class ApplicationRecord < ActiveRecord::Base
  def data_policy_prefix
    'Gdpr::'
  end
end

# Now, user should be defined in `Gdpr::UserDataPolicy`

PaperTrail Helpers

When using the method erase_fields, no PaperTrail versions will be created in the database. GDPR Admin offer other helper methods to deal with PaperTrail. (If you are not using paper_trail, this section may not be relevant)

without_paper_trail

Given a block, this method will execute it in a context where PaperTrail is disabled, so no versions are created:

def erase(contact)
  without_paper_trail do
    contact.update!(first_name: 'John', last_name: 'Doe')
  end
end

As mentioned above, this is not required when using erase_fields as it is the default behavior.

Configuration

Configure GDPR Admin in a initializer file config/initializers/gdpr_admin.rb. The configuration should be done within the block of GdprAdmin.configure(&block):

# config/initializers/gdpr_admin/rb
GdprAdmin.configure do |config|
  # GDPR Admin configuration here...
end

Multi-Tenancy

GDPR Admin is built maily for B2B SaaS services and it assumes that the service may have multiple tenants. The GDPR::Request object always expects a tenant to be provided. When processing the request, data will be automatically segregated using the tenant adapter.

Tenant Class

By default, GDPR Admin will assume that the tenant class is Organization. You can change that by setting the tenant_class in the config.

GdprAdmin.configure do |config|
  config.tenant_class = 'Tenant'
end

ActsAsTenant Adapter

You can segregated the process to a tenant using ActsAsTenant gem:

GdprAdmin.configure do |config|
  config.tenant_adapter = :acts_as_tenant
end

Rollback on Failure

By default, GDPR Admin will attempt to rollback all changes made during the processing a Request in case it fails during the processing. This can be turned of with the flag rollback_on_failure. In this situation, all changes will be committed as soon as they are done.

GdprAdmin.configure do |config|
  config.rollback_on_failure = false
end

Jobs

Requests are processed asynchronously using ActiveJob.

Custom Queue

You can set which queue should be used to schedule the job for processing the request using the config default_job_queue:

GdprAdmin.configure do |config|
  config.default_job_queue = :gdpr_tasks
end

Grace Periods

To allow for cancelling any accidental erasure request, the jobs are scheduled with a configurable grace period. By default, erasure requests will wait 4 hours before being executed, while export requests will be executed immediately.

GdprAdmin.configure do |config|
  config.erasure_grace_period = 1.day
  config.export_grace_period = 2.minutes
end

Other Configurations

Data Policies Directory

By default, GDPR Admin will assume that your data policies are defined in app/gdpr. If you wish to have them in a different place, you can change the option data_policies_path:

GdprAdmin.configure do |config|
  # Change data policies path to be within the models directory (app/models/gdpr)
  config.data_policies_path = Rails.root.join('app', 'models', 'gdpr')
end

License

The gem is available as open source under the terms of the MIT License.