# InstDataShipper

This gem is intended to facilitate easy upload of LTI datasets to Instructure Hosted Data.

## Installation

Add this line to your application’s Gemfile:

```ruby
gem 'inst_data_shipper'
```

Then run the migrations:

```
bundle exec rake db:migrate
```

## Usage

### Dumper

The main tool provided by this Gem is the `InstDataShipper::Dumper` class. It is used to define a "Dump", which is a combination of tasks and a schema.

It is assumed that a Dumper class definition is the source of truth for all tables that it manages, and that no other processes affect those tables' data or schema. You can break this assumption, but you should understand how the incremental logic works and what will and will not trigger a full table upload. Dumpers have an `export_genre` method that determines which past Dumps are considered when calculating incrementals.

- At a high level, the HD backend looks for a past dump of the same genre. If none is found, a full upload of all tables is triggered. If one is found, each table's schema is compared; any table whose schema does not match (determined by hashing) does a full upload.
- Note that Procs in the schema are not included in the hash calculation. If you change a Proc implementation and need to trigger a full upload of the table, you'll need to change something else too (such as the version).
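For example, using the `version` helper from the schema DSL shown below, bumping the version string on a table is one way to force that full upload after a Proc-only change (the column name and Proc here are hypothetical):

```ruby
table(ALocalModel, "<TABLE DESCRIPTION>") do
  # Was "1.0.0"; bumping it forces a full upload even though the Proc change
  # below is invisible to the schema hash.
  version "1.0.1"

  # Hypothetical column whose Proc implementation changed.
  column :full_name, :"varchar(128)", from: ->(rec) { rec.name.strip }
end
```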

Here is an example Dumper implementation, wrapped in an ActiveJob job:

```ruby
class HostedDataPushJob < ApplicationJob
  # The schema serves two purposes: defining the schema and mapping data
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # You can augment the Table-builder DSL with custom methods like so:
    extend_table_builder do
      # It may be useful to define custom column definition helpers:
      def custom_column(*args, from: nil, **kwargs, &blk)
        # In this example, the helper reads the value from a `data` jsonb column - without it, you'd need
        # to define `from: ->(row) { row.data["<KEY>"] }` on each column that needs to read from the jsonb
        from ||= args[0].to_s
        if from.is_a?(String)
          key = from
          from = ->(row) { row.data[key] }
        end
        column(*args, **kwargs, from: from, &blk)
      end

      # `extend_table_builder` uses `class_eval`, so you could alternatively write your helpers in a Concern or Module and include them like normal:
      include SomeConcern
    end

    table(ALocalModel, "<TABLE DESCRIPTION>") do
      # If you define a table as incremental, it'll only export changes made since the start of the last successful Dumper run
      #  The first argument "scope" can be interpreted in different ways:
      #    If exporting a local model it may be a: (default: `updated_at`)
      #      Proc that will receive a Relation and return a Relation (use `incremental_since`)
      #      String of a column to compare with `incremental_since`
      #    If exporting a Canvas report it may be a: (default: `updated_after`)
      #      Proc that will receive report params and return modified report params (use `incremental_since`)
      #      String of a report param to set to `incremental_since`
      #  `on:` is passed to Hosted Data and is used as the unique key. It may be an array to form a composite-key
      #  `if:` may be a Proc or a Symbol (of a method on the Dumper)
      incremental "updated_at", on: [:id], if: ->() {}

      # Schemas may declaratively define the data source.
      # This can be used for basic schemas where there's a 1:1 mapping between source table and destination table, and there is no conditional logic that needs to be performed.
      # In order to apply these statements, your Dumper must call `auto_enqueue_from_schema`.
      source :local_table
      # A Proc can also be passed. The below is equivalent to the above
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      # You may manually note a version on the table.
      # Note that if a version is present, the version value replaces the hash-comparison when calculating incrementals, so you must change the version whenever the schema changes enough to trigger a full-upload
      version "1.0.0"

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # The type may usually be omitted if the `table()` is passed a Model class, but strings are an exception to this
      column :name, :"varchar(128)"

      # `from:` May be...
      # A Symbol of a method to be called on the record
      column :sis_type, :"varchar(32)", from: :some_model_method
      # A String of a column to read from the record
      column :sis_type, :"varchar(32)", from: "sis_source_type"
      # A Proc to be called with each record
      column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      # Not specified. Will default to using the Schema Column Name as a String ("sis_type" in this case)
      column :sis_type, :"varchar(32)"
    end

    table("my_table", model: ALocalModel) do
      # ...
    end

    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end

  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # If the report_name/Model don't directly match the Schema, a schema_name: parameter may be passed:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Iterate through the Tables defined in the Schema and apply any defined `source` statements.
    # This is the default behavior if `define()` is called w/o a block.
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>",
    ])
  end
end
```

Dumpers may also be formed as a normal Ruby subclass:

```ruby
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

      # auto_enqueue_from_schema
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>",
    ])
  end
end
```
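In either form, `HostedDataPushJob` is a normal ActiveJob job, so a dump run is triggered the usual way (queue adapter and scheduling are left to the host application):

```ruby
# Enqueue a dump on the host application's ActiveJob backend...
HostedDataPushJob.perform_later

# ...or run one inline, e.g. from a Rails console.
HostedDataPushJob.perform_now
```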

### Destinations

This Gem is mainly designed for use with Hosted Data, but it tries to abstract that a little to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 is included.

Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination specific.

Destinations blindly accept URI Fragments (the `#` chunk at the end of the URI). These options are not used internally, but are made available as `dest.user_config`. Ideally they are in the same format as query parameters (`x=1&y=2`, which the destination will try to parse into a Hash), but any string is accepted.
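For instance, a minimal sketch (hypothetical host, JWT, and fragment values) of tagging a destination with extra, application-specific options:

```ruby
# The ?table_prefix param is consumed by the Hosted Data destination; the #... fragment
# is not used internally, but is parsed into dest.user_config
# (roughly { "client" => "acme", "env" => "prod" } here) for your own code to read.
destination = "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example#client=acme&env=prod"

Dumper.perform_dump([destination])
```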

#### Hosted Data

```
hosted-data://<JWT>@<HOSTED DATA SERVER>
```

Optional Parameters:
- `table_prefix`: An optional string to prefix onto each table name in the schema when declaring the schema in Hosted Data
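For example, a hypothetical destination (placeholder JWT and host) that declares every table with an `example_` prefix:

```ruby
# Placeholder credentials/host - table_prefix is passed as a query parameter.
Dumper.perform_dump([
  "hosted-data://eyJhbGciOi...@hosteddata.example.edu?table_prefix=example_",
])
```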

#### S3

```
s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>
```

Optional Parameters:

None
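As a concrete sketch (illustrative, non-functional credentials, region, and bucket):

```ruby
# Substitute a real access key pair, region, bucket, and (optional) key prefix.
Dumper.perform_dump([
  "s3://AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMIK7MDENGbPxRfiCYEXAMPLEKEY@us-east-1/my-data-bucket/exports",
])
```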

## Development

When adding to or updating this gem, make sure you do the following:

- Update the yardoc comments where necessary, and confirm the changes by running `yardoc --server`
- Write specs
- If you modify the model or migration templates, run `bundle exec rake update_test_schema` to update them in the Rails Dummy application (and commit those changes)

### Docs

Docs can be generated using yard. To view the docs:

- Clone this gem's repository
- `bundle install`
- `yard server --reload`

The yard server will give you a URL you can visit to view the docs.