# InstDataShipper
This gem is intended to facilitate easy upload of LTI datasets to Instructure Hosted Data.
## Installation

Add this line to your application's Gemfile:

```ruby
gem 'inst_data_shipper'
```

Then run the migrations:

```bash
bundle exec rake db:migrate
```
## Usage

### Dumper
The main tool provided by this Gem is the `InstDataShipper::Dumper` class. It is used to define a "Dump", which is a combination of tasks and schema.

It is assumed that a `Dumper` class definition is the source of truth for all tables that it manages, and that no other processes affect the tables' data or schema. You can break this assumption, but you should understand how the incremental logic works and what will and will not trigger a full table upload. Dumpers have an `export_genre` method that determines what Dumps to look at when calculating incrementals.
- At a high level, the HD backend will look for a past Dump of the same genre. If none is found, a full upload of all tables is triggered. If one is found, each table's schema is compared; any table with a mismatched schema (determined by hashing) will do a full upload.
- Note that `Proc`s in the schema are not included in the hash calculation. If you change a `Proc` implementation and need to trigger a full upload of the table, you'll need to change something else too (like the `version`).
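The hash-based decision described above can be sketched conceptually as follows. This is not the gem's internal code; the helper names (`schema_hash`, `full_upload?`) and the exact hashing details are assumptions for illustration only:

```ruby
require 'digest'

# Conceptual sketch (NOT the gem's implementation) of the incremental decision:
# hash a table definition while skipping Proc values, since Procs are excluded
# from the hash calculation as noted above.
def schema_hash(table_def)
  hashable = table_def.reject { |_key, value| value.is_a?(Proc) }
  Digest::SHA256.hexdigest(hashable.sort.inspect)
end

# A table gets a full upload if there is no previous Dump of the same genre,
# or if its schema hash no longer matches the one recorded for the last Dump.
def full_upload?(table_def, previous_dump)
  return true if previous_dump.nil?

  previous_dump[:schema_hashes][table_def[:warehouse_name]] != schema_hash(table_def)
end
```

Under this sketch, changing a column's `from:` Proc leaves the hash unchanged, while changing the `version` (or any non-Proc value) changes it and forces a full upload.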
Here is an example `Dumper` implementation, wrapped in an ActiveJob job:
```ruby
class HostedDataPushJob < ApplicationJob
  # The schema serves two purposes: defining the schema and mapping data
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # You can augment the Table-builder DSL with custom methods like so:
    extend_table_builder do
      # It may be useful to define custom column definition helpers:
      def custom_column(*args, from: nil, **kwargs, &blk)
        # In this example, the helper reads the value from a `data` jsonb column - without it, you'd
        # need to define `from: ->(row) { row.data["<KEY>"] }` on each column that reads from the jsonb
        from ||= args[0].to_s
        if from.is_a?(String)
          key = from
          from = ->(row) { row.data[key] }
        end
        column(*args, **kwargs, from: from, &blk)
      end

      # `extend_table_builder` uses `class_eval`, so you could alternatively write your helpers
      # in a Concern or Module and include them like normal:
      include SomeConcern
    end
    table(ALocalModel, "<TABLE DESCRIPTION>") do
      # If you define a table as incremental, it'll only export changes made since the start of the last successful Dumper run.
      # The first argument "scope" can be interpreted in different ways:
      #   If exporting a local model it may be a: (default: `updated_at`)
      #     Proc that will receive a Relation and return a Relation (use `incremental_since`)
      #     String of a column to compare with `incremental_since`
      #   If exporting a Canvas report it may be a: (default: `updated_after`)
      #     Proc that will receive report params and return modified report params (use `incremental_since`)
      #     String of a report param to set to `incremental_since`
      # `on:` is passed to Hosted Data and is used as the unique key. It may be an array to form a composite key.
      # `if:` may be a Proc or a Symbol (of a method on the Dumper)
      incremental "updated_at", on: [:id], if: ->() {}

      # Schemas may declaratively define the data source.
      # This can be used for basic schemas where there's a 1:1 mapping between source table and
      # destination table, and there is no conditional logic that needs to be performed.
      # In order to apply these statements, your Dumper must call `auto_enqueue_from_schema`.
      source :local_table
      # A Proc can also be passed. The below is equivalent to the above.
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      # You may manually note a version on the table.
      # Note that if a version is present, the version value replaces the hash comparison when calculating
      # incrementals, so you must change the version whenever the schema changes enough to trigger a full upload.
      version "1.0.0"

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # The type may usually be omitted if the `table()` is passed a Model class, but strings are an exception to this.
      column :name, :"varchar(128)"

      # `from:` may be...
      # A Symbol of a method to be called on the record:
      column :sis_type, :"varchar(32)", from: :some_model_method
      # A String of a column to read from the record:
      column :sis_type, :"varchar(32)", from: "sis_source_type"
      # A Proc to be called with each record:
      column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      # Not specified. Will default to using the Schema Column Name as a String ("sis_type" in this case):
      column :sis_type, :"varchar(32)"
    end
    table("my_table", model: ALocalModel) do
      # ...
    end

    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end
  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # If the report_name/Model don't directly match the Schema, a schema_name: parameter may be passed:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Iterate through the Tables defined in the Schema and apply any defined `source` statements.
    # This is the default behavior if `define()` is called w/o a block.
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>",
    ])
  end
end
```
`Dumper`s may also be formed as a normal Ruby subclass:
```ruby
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))
      # auto_enqueue_from_schema
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>",
    ])
  end
end
```
## Destinations
This Gem is mainly designed for use with Hosted Data, but it tries to abstract that a little to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 are included.
Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination specific.
Destinations blindly accept URI Fragments (the `#` chunk at the end of the URI). These options are not used internally, but will be made available as `dest.user_config`. Ideally these are in the same format as query parameters (`x=1&y=2`, which it will try to parse into a Hash), but it can be any string.
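The fragment convention above can be illustrated with Ruby's standard library. The destination string and values here are hypothetical, and `user_config` is computed locally rather than through the gem:

```ruby
require 'uri'
require 'cgi'

# Parse a hypothetical destination string; the fragment after `#` carries
# user config in query-parameter format, as described above.
dest = URI.parse("hosted-data://SOME_JWT@hosted.example.com?table_prefix=app_#x=1&y=2")

# CGI.parse yields arrays of values; take the first of each for a flat Hash.
user_config = CGI.parse(dest.fragment.to_s).transform_values(&:first)
# user_config is { "x" => "1", "y" => "2" }
```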
### Hosted Data

`hosted-data://<JWT>@<HOSTED DATA SERVER>`

Optional Parameters:

- `table_prefix`: An optional string to prefix onto each table name in the schema when declaring the schema in Hosted Data
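Assuming `table_prefix` is supplied as a query parameter on the destination URI (the format above does not state this explicitly), a destination string might be assembled like so, with placeholder values throughout:

```ruby
require 'cgi'

# Hypothetical values - substitute your real JWT and server.
jwt = "MY_JWT"
server = "hosted-data.example.com"

# Escape the prefix so unusual characters can't break the URI.
dest = "hosted-data://#{jwt}@#{server}?table_prefix=#{CGI.escape('myapp_')}"
```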
### S3

`s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>`

Optional Parameters:

None
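Since the credentials are embedded in the URI, characters like `/` or `+` (common in secret keys) should be percent-encoded so the string stays parseable. A sketch with hypothetical credentials:

```ruby
require 'cgi'

# Hypothetical credentials; real secret keys often contain '/' or '+',
# which must be percent-encoded inside a URI.
access_key_id = "AKIAEXAMPLE"
secret_key = "abc/def+ghi"

dest = "s3://#{CGI.escape(access_key_id)}:#{CGI.escape(secret_key)}@us-east-1/my-bucket/exports"
# dest is "s3://AKIAEXAMPLE:abc%2Fdef%2Bghi@us-east-1/my-bucket/exports"
```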
## Development

When adding to or updating this gem, make sure you do the following:

- Update the yardoc comments where necessary, and confirm the changes by running `yardoc --server`
- Write specs
- If you modify the model or migration templates, run `bundle exec rake update_test_schema` to update them in the Rails Dummy application (and commit those changes)
## Docs

Docs can be generated using yard. To view the docs:

- Clone this gem's repository
- `bundle install`
- `yard server --reload`

The yard server will give you a URL you can visit to view the docs.