CanvasSync

This gem is intended to facilitate fast and easy syncing of Canvas data.

Installation

Add this line to your application's Gemfile:

gem 'canvas_sync'

mount CanvasSync::Engine, at: '/canvas_sync'

Models and migrations can be installed using the following generator:

bin/rails generate canvas_sync:install --models users,terms,courses

Use the --models option to specify what models you would like installed. This will add both the model files and their corresponding migrations. If you'd like to install all the models that CanvasSync supports then specify --models all.

Then run the migrations:

bundle exec rake db:migrate

For a list of currently supported models, see CanvasSync::SUPPORTED_MODELS.

Additionally, your Canvas instance must have the "Proserv Provisioning Report" enabled.

The following custom reports are required for the specified models:

  • assignments = "Assignments Report" (proserv_assignment_export_csv)
  • submissions = "Student Submissions" (proserv_student_submissions_csv)
  • assignment_groups = "Assignment Group Export" (proserv_assignment_group_export_csv)
  • context_modules = "Professional Services Context Modules Report" (proserv_context_modules_csv)
  • context_module_items = "Professional Services Context Module Items Report" (proserv_context_module_items_csv)
  • content_migrations = "Professional Services Content Migrations Report" (proserv_content_migrations_csv)

Prerequisites

Postgres

Bulk inserting is made possible by a Postgres upsert. Because of this, you need to be using Postgres 9.5 or above.

Sidekiq

Make sure you've set up Sidekiq to work properly with ActiveJob, as outlined in the Sidekiq documentation.

Apartment

If using apartment and Sidekiq, make sure you include the apartment-sidekiq gem so that the jobs are run in the correct tenant.

Live Events

If enabling Live Events, the following additional dependencies are required:

  • If using Core/Data Services events: httparty, json-jwt
  • If using EventsManager/DejaVu events: symmetric-encryption

Basic Usage

Your tool must have an ActiveJob compatible job queue adapter configured, such as DelayedJob or Sidekiq.

Once that's done and you've used the generator to create your models and migrations you can run the standard provisioning sync:

CanvasSync.provisioning_sync(<array of models to sync>, term_scope: <optional term scope>)

Note: pass in 'xlist' to your array of models if you would like sections to include cross listing information

Example:

CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active)

This will kick off a string of jobs to sync your specified models.

If you pass in the optional term_scope the provisioning reports will be run for only the terms returned by that scope. The scope must be defined on your Term model. (A sample one is provided in the generated Term.)

Imports are inserted in bulk with activerecord-import so they should be very fast.

Live Events

Ensure that

mount CanvasSync::Engine, at: '/canvas_sync'

is added to your routes.rb. Configure DataServices or EventsManager to send events to https://YOUR_APP/canvas_sync/api/v1/live_event (if using DataServices, events must be signed).

Uncomment include CanvasSync::Concerns::LiveEventSync and related lines in the appropriate models. (Some models provide some basic hooks to address a "typical" workflow).

When Live Events are received, the corresponding model instance (if present) will receive process_live_event(subtype, payload, metadata), where subtype is the event name without the model name (e.g. user_created => created). The default logic is to call ApiSyncable and update the model from the Canvas API. process_live_event can be overridden directly, or hooked with the usual Rails callbacks system (e.g. before_process_live_event).
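To make the subtype convention concrete, here is a minimal sketch using plain Ruby objects. Both live_event_subtype and ToyUser are hypothetical stand-ins invented for illustration; they are not part of CanvasSync, and the real hook runs on your ActiveRecord models.

```ruby
# Hypothetical helper (not part of CanvasSync): strips the model name from an
# event name, mirroring the user_created => created convention.
def live_event_subtype(event_name, model_name)
  event_name.delete_prefix("#{model_name}_")
end

# Toy stand-in for a model that overrides process_live_event.
class ToyUser
  attr_reader :handled

  def initialize
    @handled = []
  end

  # Same three arguments the section describes.
  def process_live_event(subtype, payload, metadata)
    @handled << [subtype, payload[:user_id]]
  end
end

user = ToyUser.new
subtype = live_event_subtype('user_created', 'user')
user.process_live_event(subtype, { user_id: 1 }, {})
```

After this runs, user.handled holds the subtype and the user id taken from the payload.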

You can subscribe to Live Events outside of a model context using an initializer like so:

CanvasSync::LiveEvents.subscribe(%w[Optional List of Events]) do |event|
  # Your code here
  # Note that this code is _not_ retried if it fails. If you need retries, use this block to trigger another Job.
end

Event Provenance

When using EventsManager events, events are verified as having come from a legitimate source by use of SymmetricEncryption (and thus PRODUCTION_KEY1 will need to be set correctly when deployed).

When using DataServices, CanvasSync uses the DataServices JWK to authenticate incoming events. CanvasSync is coded to default to the Prod & Beta JWK URL at https://8axpcl50e4.execute-api.us-east-1.amazonaws.com/main/jwks, but this can be overridden with the DATASERVICES_JWK_URL ENV variable.

Additionally, when PandaPal is installed too, use https://YOUR_APP/canvas_sync/api/v1/live_event?org=ORG_ID instead. CanvasSync will automatically switch to the correct organization and will validate that the event was indeed from the correct Canvas instance. If you are not using PandaPal, you'll need to monkey-patch CanvasSync::Api::V1::LiveEventsController#validate_tenant!

Legacy-Style Event Jobs

CanvasSync also supports the legacy style of Event Handlers. In this design, properly named classes are defined in the ::LiveEvents module (for example, class LiveEvents::UserCreatedEvent). Any ActiveJob job is compatible, but CanvasSync also provides CanvasSync::LiveEvents::BaseHandler as a helpful base class.

When present, these jobs will, per event type (eg user_created), override the default behavior, meaning that subscribe blocks and process_live_event and related callbacks will not be called unless you call them. In other words: If you define class LiveEvents::UserCreatedEvent and also

subscribe(%w[user_created user_updated]) do |event|
  # ...
end

the subscribe block (and User model) will receive user_updated events, but not user_created events.
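The precedence rule can be pictured with a toy dispatcher. This is only a sketch of the behavior described above, not CanvasSync's implementation: dispatch, legacy_handler_name, and the perform_later stand-in are all assumptions made for illustration.

```ruby
module LiveEvents
  # Legacy-style handler for user_created (a plain class standing in for a job).
  class UserCreatedEvent
    def self.perform_later(payload)
      "legacy handler got #{payload[:id]}"
    end
  end
end

# Hypothetical: converts "user_created" into "UserCreatedEvent".
def legacy_handler_name(event_name)
  event_name.split('_').map(&:capitalize).join + 'Event'
end

# Hypothetical: a legacy handler, when defined, replaces the subscribe blocks.
def dispatch(event_name, payload, subscribers)
  name = legacy_handler_name(event_name)
  if LiveEvents.const_defined?(name, false)
    [LiveEvents.const_get(name, false).perform_later(payload)]
  else
    subscribers.fetch(event_name, []).map { |block| block.call(payload) }
  end
end

on_event = ->(payload) { "subscriber got #{payload[:id]}" }
subscribers = { 'user_created' => [on_event], 'user_updated' => [on_event] }

puts dispatch('user_created', { id: 1 }, subscribers) # prints "legacy handler got 1"
puts dispatch('user_updated', { id: 2 }, subscribers) # prints "subscriber got 2"
```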

These jobs can also be generated from a template using bin/rails generate canvas_sync:install_live_events --events users,courses,etc

Advanced Usage

This gem also helps with syncing and processing other reports if needed. In order to do so, you must:

  • Define a Processor class that implements a process method for handling the results of the report
  • Integrate your reports with the ReportStarter
  • Tell the gem what jobs to run

updated_after

An updated_after param may be passed when triggering a provisioning sync or building a chain:

CanvasSync.default_provisioning_report_chain(
  %i[list of models to sync], updated_after: false
)

It may be one of the following values:

  • false - Will not apply any updated_after filtering to the requested reports
  • An ISO-8601 Date - Will pass the supplied date as the updated_after param for the requested reports
  • true (Default) - Will use the start date of the last successful sync
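As a rough sketch of how the three kinds of value could be interpreted, the hypothetical helper below resolves an updated_after option to a date or nil. The name resolve_updated_after and the last_successful_sync_start argument are assumptions for illustration; the gem's own resolution logic may differ.

```ruby
require 'date'

# Hypothetical helper (not part of CanvasSync).
# Returns nil when no filtering should be applied, otherwise a Date.
def resolve_updated_after(value, last_successful_sync_start)
  case value
  when false  then nil                        # no updated_after filtering
  when true   then last_successful_sync_start # start of the last successful sync
  when String then Date.iso8601(value)        # explicit ISO-8601 date
  else value                                  # already a Date/Time object
  end
end

last_sync = Date.new(2024, 1, 1)
puts resolve_updated_after(true, last_sync)         # prints 2024-01-01
puts resolve_updated_after('2024-03-15', last_sync) # prints 2024-03-15
```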

If updated_after is true, CanvasSync will, by default, perform a full sync every other Sunday. This logic can be customized by passing a full_sync_every parameter. If you pass a date to updated_after, this logic will be disabled unless you explicitly pass a full_sync_every parameter. full_sync_every accepts the following format strings:

  • 15% - Each sync will have a 15% chance of running a full sync
  • 10 days - A full sync will be run every 10 days
  • sunday - A full sync will run every Sunday
  • saturday/4 - A full sync will run every fourth Saturday
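The four formats above can be told apart with a few regular expressions. The parser below is a hypothetical sketch written for this README, not the gem's parser; the returned Hash shape is an assumption chosen for readability.

```ruby
# Hypothetical parser (not part of CanvasSync) for the documented formats.
def parse_full_sync_every(spec)
  case spec.to_s.strip.downcase
  when /\A(\d+(?:\.\d+)?)%\z/
    { type: :chance, percent: Regexp.last_match(1).to_f }
  when /\A(\d+)\s*days?\z/
    { type: :interval, days: Regexp.last_match(1).to_i }
  when /\A(sunday|monday|tuesday|wednesday|thursday|friday|saturday)(?:\/(\d+))?\z/
    # "saturday/4" means every fourth Saturday; a bare weekday means every week.
    { type: :weekday, day: Regexp.last_match(1).to_sym, every: (Regexp.last_match(2) || 1).to_i }
  else
    raise ArgumentError, "Unrecognized full_sync_every format: #{spec.inspect}"
  end
end

['15%', '10 days', 'sunday', 'saturday/4'].each do |spec|
  puts "#{spec} -> #{parse_full_sync_every(spec)[:type]}"
end
```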

Multiple Sync Chains

If your app uses multiple Sync Chains, you may run into issues with the automatic updated_after and full_sync_every logic. You can fix this by using custom logic or by setting the batch_genre parameter when creating the Job Chain. Chains will only use chains of the same genre when computing updated_after and full_sync_every.
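To illustrate why the genre matters, the sketch below picks the last successful start time per genre from an in-memory history. SyncRun, last_successful_start, and the genre names are hypothetical; CanvasSync tracks its own batch history internally.

```ruby
# Hypothetical record of a past sync chain run.
SyncRun = Struct.new(:genre, :started_at, :successful)

# Only runs of the same genre are considered when looking for the last success.
def last_successful_start(runs, genre)
  runs.select { |run| run.genre == genre && run.successful }
      .map(&:started_at)
      .max
end

runs = [
  SyncRun.new('provisioning', Time.utc(2024, 1, 1), true),
  SyncRun.new('grades',       Time.utc(2024, 1, 5), true),
  SyncRun.new('provisioning', Time.utc(2024, 1, 8), false),
]

puts last_successful_start(runs, 'provisioning') # prints 2024-01-01 00:00:00 UTC
```

Without the genre filter, the more recent grades run would be mistaken for the last provisioning sync.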

Extensible chain

It is sometimes desired to extend or customize the chain of jobs that are run with CanvasSync. This can be achieved with the following pattern:

chain = CanvasSync.default_provisioning_report_chain(
  %i[list of models to sync]
)

# Add a custom job to the end of the chain.
chain << { job: CanvasSyncCompleteWorker, args: [job.id], kwargs: { job_id: job.id } }

chain.process!

# The chain object provides a fairly extensive API:
chain.insert({ job: SomeOtherJob, args: [], kwargs: {} }) # Adds the job to the end of the chain
chain.insert_at(0, { job: SomeOtherJob }) # Adds the job to the beginning of the chain
chain.insert({ job: SomeOtherJob }, after: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job right after the SyncTermsJob
chain.insert({ job: SomeOtherJob }, before: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job right before the SyncTermsJob
chain.insert({ job: SomeOtherJob }, with: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job to be performed concurrently with the SyncTermsJob

# Some Jobs (such as the SyncTermsJob) have a sub-chain for, eg, Courses.
# chain.insert is aware of these sub-chains and will recurse into them when looking for a before:/after:/with: reference
chain.insert({ job: SomeOtherJob }, after: 'CanvasSync::Jobs::SyncCoursesJob') # Adds the job to be performed after SyncCoursesJob (which is a sub-job of the terms job and is duplicated for each term in the term_scope:)
# You can also retrieve the sub-chain like so:
chain.get_sub_chain('CanvasSync::Jobs::SyncTermsJob')

Processor

Your processor class must implement a process class method that receives a report_file_path and a hash of options. (See the CanvasSync::Processors::ProvisioningReportProcessor for an example.) The gem handles the work of enqueueing and downloading the report and then passes the file path to your class to process as needed. A simple example might be:

class MyCoolProcessor
  def self.process(report_file_path, options)
    puts "I downloaded a report to #{report_file_path}! Isn't that neat!"
  end
end
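A slightly more realistic sketch might read the downloaded file as CSV. The class name RowCountingProcessor and the :label option are made up for this example; only the process(report_file_path, options) signature comes from the description above.

```ruby
require 'csv'
require 'tempfile'

class RowCountingProcessor
  # Receives the path of the downloaded report and a Hash of options.
  def self.process(report_file_path, options)
    rows = CSV.read(report_file_path, headers: true)
    label = options.fetch(:label, 'report') # :label is a hypothetical option
    "#{label}: #{rows.size} rows, columns: #{rows.headers.join(', ')}"
  end
end

# Simulate a downloaded report with a temporary file.
Tempfile.create(['report', '.csv']) do |file|
  file.write("canvas_user_id,name\n1,Ada\n2,Grace\n")
  file.flush
  puts RowCountingProcessor.process(file.path, { label: 'users' })
  # prints "users: 2 rows, columns: canvas_user_id, name"
end
```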

Report starter

You must implement a job that will enqueue a report starter for your report. (TODO: would be nice to make some sort of builder for this, so you just define the report and its params and then the gem runs it in a pre-defined job.)

Let's say we have a custom Canvas report called "my_really_cool_report_csv". First, we would need to create a job class that will enqueue a report starter.

class MyReallyCoolReportJob < CanvasSync::Jobs::ReportStarter
  def perform(options)
    super(
      'my_really_cool_report_csv', # Report name
      { "parameters[param1]" => true }, # Report parameters
      MyCoolProcessor.to_s, # Your processor class as a string
      options
    )
  end
end

You can also see examples in lib/canvas_sync/jobs/sync_users_job.rb and lib/canvas_sync/jobs/sync_provisioning_report.rb.

Batching

The provisioning report uses the CanvasSync::Importers::BulkImporter class to bulk import rows with the activerecord-import gem. It inserts rows in batches of 10,000 by default. This can be customized by setting the BULK_IMPORTER_BATCH_SIZE environment variable.
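The batching idea can be sketched with each_slice. The helpers import_batch_size and batch_sizes are hypothetical and only mimic the documented default and environment variable; the actual BulkImporter hands each batch to activerecord-import.

```ruby
# Hypothetical helper: reads the batch size, defaulting to 10,000.
def import_batch_size(env = ENV)
  Integer(env.fetch('BULK_IMPORTER_BATCH_SIZE', '10000'))
end

# Hypothetical helper: reports how many rows each batch would contain.
def batch_sizes(rows, size)
  rows.each_slice(size).map(&:size)
end

rows = (1..25_000).map { |i| { canvas_id: i } }
puts batch_sizes(rows, 10_000).inspect # prints [10000, 10000, 5000]
```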

Mapping Overrides

Overrides are useful for two scenarios:

  • You have an existing application where the column names do not match up with what CanvasSync expects
  • You want to sync some other column in the report that CanvasSync is not configured to sync

Mappings can be modified by editing the Model class like so:

class User < ApplicationRecord
  include CanvasSync::Record

  sync_mapping(reset: false) do # `reset: false` is the default
    # The mapping can be totally cleared with `reset: true` in the `sync_mapping` call, or like such:
    reset_links

    # Add a new column:
    link_column :column_in_report => :column_in_database, type: :datetime

    # If the column name on the report and in the DB are the same, a shorthand can be used:
    link_column :omit_from_final_grade, type: :datetime

    # You can specify a block to pre-transform the value
    link_column :column_in_report => :column_in_database do |value, row|
      YAML.safe_load(value)
    end

    # If the defaults define a column you don't want synced, you can remove it from the mapping:
    unlink_column :column_in_database
  end

  # ...
end

You can also create a file called canvas_sync_provisioning_mapping.yml in your Rails config directory. However, this approach requires you to re-specify the complete table in order to modify a table. Define the tables and columns you want to override using the following format:

users:
  conflict_target: canvas_user_id # This must be a unique field that is present in the report and the database
  report_columns: # The keys specified here are the column names in the report CSV
    canvas_user_id_column_name_in_report:
        database_column_name: canvas_user_id_name_in_your_db # Sometimes the database column name might not match the report column name
        type: integer
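To show how such a mapping reads in practice, the sketch below loads a YAML mapping in the same shape and applies it to one report row. The column names and the map_report_row helper are assumptions for illustration, and only the integer type is cast here; the gem's importer does considerably more.

```ruby
require 'yaml'

MAPPING_SOURCE = <<~MAPPING
  users:
    conflict_target: canvas_user_id
    report_columns:
      canvas_user_id:
        database_column_name: canvas_id
        type: integer
      login_id:
        database_column_name: login
        type: string
MAPPING

# Hypothetical helper: renames report columns to database columns and casts integers.
def map_report_row(mapping, table, row)
  columns = mapping.fetch(table).fetch('report_columns')
  columns.each_with_object({}) do |(report_column, config), attrs|
    value = row[report_column]
    value = Integer(value) if config['type'] == 'integer' && !value.nil?
    attrs[config['database_column_name']] = value
  end
end

mapping = YAML.safe_load(MAPPING_SOURCE)
attrs = map_report_row(mapping, 'users', { 'canvas_user_id' => '42', 'login_id' => 'ada' })
puts attrs['canvas_id'] # prints 42
```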

API Sync

Several models implement the ApiSyncable Concern. This is done in the Model Templates so that it can be customized and tweaked. Models that include CanvasSync::Concerns::ApiSyncable should also call the api_syncable class method to configure the synchronization. api_syncable takes a field mapping, a lambda that makes the API call, an optional options Hash, and an optional block callback:

class CanvasSyncModel < ApplicationRecord
  api_syncable(
    {
      local_field: :response_field, # api_response[:response_field] will be mapped to local_field on the model.
      calculated_field: -> (api_response) { api_response[:some_field] + 5 }, # A calculated result will be mapped to calculated_field on the model. The lambda is executed in the context of the model instance.
    },
    -> (bearcat) { bearcat.some_request(some_model_getter) }, # A lambda, executed in the context of the model instance, to actually make the API call. Should accept 0 or 1 parameters. Must accept 0 parameters if your `canvas_sync_client` requires an `account_id`
    { # An optional options Hash
      mark_deleted: { workflow_state: 'deleted' }, # Action to take when a 404 is received from the API. May be a Hash that will be merged into the Model, A Symbol that should be sent to the model, or a lambda (both taking 0 arguments)
    }
  ) do |api_response, mapped_fields| # Must accept 1-2 parameters
    # Override behavior for actually applying the response to the model instance
  end

  def something()
    # ApiSyncable models add several instance methods:

    request_from_api( # Starts an API request and returns the params
      retries: 3, # Number of times to retry the API call before failing
    )

    update_from_api_params(params) # Merge the API response into the model instance
    update_from_api_params!(params) # Merge and save! if changed

    sync_from_api( # Starts an API request and calls save! (if changed)
      retries: 3, # Number of times to retry the API call before failing
    )
  end
end
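The field mapping can be thought of as "copy a response key" or "compute a value in the context of the instance". The toy class below sketches that idea with plain Ruby; ToyModel and apply_api_mapping are hypothetical and do not reflect how ApiSyncable is implemented.

```ruby
# Toy stand-in for a model; nothing here is CanvasSync code.
class ToyModel
  attr_accessor :name, :points, :bonus

  def initialize(bonus)
    @bonus = bonus
  end

  # mapping: local_field => response key (Symbol) or lambda taking the response.
  def apply_api_mapping(mapping, api_response)
    mapping.each do |local_field, source|
      value =
        if source.respond_to?(:call)
          instance_exec(api_response, &source) # lambda runs with self = this instance
        else
          api_response[source]
        end
      public_send("#{local_field}=", value)
    end
    self
  end
end

mapping = {
  name: :display_name,
  points: ->(api_response) { api_response[:score] + bonus }, # bonus resolves on the instance
}

model = ToyModel.new(5).apply_api_mapping(mapping, { display_name: 'Ada', score: 10 })
puts "#{model.name}: #{model.points}" # prints "Ada: 15"
```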

Job Batching

CanvasSync adds a CanvasSync::JobBatches module. It adds support for Job Batches similar to Sidekiq/sidekiq-batch, and it integrates automatically with both Sidekiq and ActiveJob. The API is highly similar to the sidekiq-batch implementation, documentation for which can be found at https://github.com/mperham/sidekiq/wiki/Batches

A batch can be created using Sidekiq::Batch or CanvasSync::JobBatches::Batch.

Also see canvas_sync/jobs/begin_sync_chain_job, canvas_sync/job_batches/jobs/serial_batch_job, or canvas_sync/job_batches/jobs/concurrent_batch_job for example usage.

Example:

batch = CanvasSync::JobBatches::Batch.new
batch.description = "Some Batch" # Optional, but can be useful for debugging

batch.on(:complete, "SomeClass.on_complete", kw_param: 1)
batch.on(:success, "SomeClass.on_success", some_param: 'foo')

# Add context to the batch. Can be accessed as batch_context on any jobs within the batch.
# Nested Batches will have their contexts merged
batch.context = {
  some_value: 'blah',
}

batch.jobs do
  # Enqueue jobs like normal
end

Job Pools

A job pool is like a custom Sidekiq Queue. You can add jobs to it and it will empty itself out into one of the actual queues. However, it adds some options for tweaking the logic:

  • concurrency (default: nil) - Define how many jobs from the pool can run at once.
  • order (default: fifo) - Define how the pool will empty itself
    • fifo - First-In First-Out, a traditional queue
    • lifo - Last-In First-Out
    • random - Pluck and run jobs in random order
    • priority - Execute jobs in a priority order (NB: Due to Redis, this is priority-random, meaning that items with the same priority will be run in random order, not fifo)
  • clean_when_empty (default: true) - Automatically clean the pool when it is empty
  • on_failed_job (default: :wait) - If a Job fails, should the pool :continue and still enqueue the next job or :wait for the job to succeed

Example:

pool = CanvasSync::JobBatches::Pool.new(concurrency: 4, order: :priority, clean_when_empty: false)
pool_id = pool.pid

# Add a job to the pool
pool << {
  job: SomeJob, # The Class of an ActiveJob Job or Sidekiq Worker
  args: [1, 2, 3], # Array of params to pass to the Job
  kwargs: {},
  priority: 100, # Only effective if order=:priority, higher is higher
}

# Add many jobs to the pool
pool.add_jobs([
  {
    job: SomeJob, # The Class of an ActiveJob Job or Sidekiq Worker
    args: [1, 2, 3], # Array of params to pass to the Job
    kwargs: {},
    priority: 100, # Only effective if order=:priority, higher is higher
  },
  # ...
])

# ...Later
CanvasSync::JobBatches::Pool.from_pid(pool_id).cleanup_redis
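The order option can be pictured as a rule for choosing the next pending job. The function below is a toy in-memory sketch of the four documented orders; next_job is a hypothetical name, and the real pool keeps its state in Redis.

```ruby
# Hypothetical: picks the next job from an Array of pending job hashes.
def next_job(pending, order)
  case order
  when :fifo   then pending.first
  when :lifo   then pending.last
  when :random then pending.sample
  when :priority
    # Higher priority runs first; ties are broken randomly, as noted above.
    top = pending.map { |job| job[:priority] }.max
    pending.select { |job| job[:priority] == top }.sample
  else
    raise ArgumentError, "Unknown order: #{order.inspect}"
  end
end

pending = [
  { job: 'JobA', priority: 1 },
  { job: 'JobB', priority: 100 },
  { job: 'JobC', priority: 50 },
]

puts next_job(pending, :fifo)[:job]     # prints "JobA"
puts next_job(pending, :lifo)[:job]     # prints "JobC"
puts next_job(pending, :priority)[:job] # prints "JobB"
```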

Custom Bearcat Instance

You can define a global canvas_sync_client method to return a Bearcat Client instance for CanvasSync to use:

# config/initializers/canvas_sync.rb
def canvas_sync_client
  Bearcat::Client.new(token: current_organization.settings[:api_token], prefix: current_organization.settings[:base_url])
end

(Having the client defined here means the sensitive API token doesn't have to be passed in plain text between jobs.)

This used to be required, but when both CanvasSync and PandaPal are up to date, this is defined automagically.

Legacy Support

Legacy Mappings

CanvasSync 0.10.0+, by default, changes Canvas primary-keys from :canvas_MODEL_id to just :canvas_id. Because CanvasSync primarily consists of templates, this change shouldn't require any large changes in your app, but you will need to apply the model_mappings_legacy.yml (located in the root of this repo) to your model mappings - see Mapping Overrides.

Row-by-Row Syncing

If you have an old style tool that needs to sync data on a row by row basis, you can pass in the legacy_support: true option. In order for this to work, your models must have a create_or_update_from_csv class method defined that accepts a row argument. This method will get passed each row from the CSV, and it's up to you to persist it.

Example:

CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active, legacy_support: true)

You may also provide an array of model names. Doing so will only provide legacy support for the specified models.

CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active, legacy_support: ['courses'])

In the above example, users will sync normally while courses will require a create_or_update_from_csv method.
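A create_or_update_from_csv method could look roughly like the sketch below. It uses a plain class with an in-memory Hash in place of a database table, and the column names canvas_course_id and short_name are assumed for illustration; in a real app this would be a find-or-initialize and save on your ActiveRecord model.

```ruby
require 'csv'

class LegacyCourse
  STORE = {} # in-memory stand-in for a database table
  attr_accessor :canvas_id, :name

  # Called once per CSV row; persisting the row is up to this method.
  def self.create_or_update_from_csv(row)
    course = (STORE[row['canvas_course_id']] ||= new)
    course.canvas_id = row['canvas_course_id']
    course.name = row['short_name']
    course
  end
end

csv = "canvas_course_id,short_name\n10,BIO101\n10,BIO101-A\n11,CHEM200\n"
CSV.parse(csv, headers: true).each do |row|
  LegacyCourse.create_or_update_from_csv(row)
end

puts LegacyCourse::STORE.size # prints 2
```

The second row for course 10 updates the existing record instead of creating a duplicate.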

CanvasSync::JobLog

Running the migrations will create a canvas_sync_job_logs table. All the jobs written in this gem will create a CanvasSync::JobLog and store data about their arguments, job class, any exceptions, and start/completion time. This will work regardless of your queue adapter.

If you want your own jobs to also log to the table all you have to do is have your job class inherit from CanvasSync::Job. You can also persist extra data you might need later by saving to the metadata column:

@job_log.metadata = "This job ran really well!"
@job_log.save!

If you want to be able to utilize the CanvasSync::JobLog without ActiveJob (so you can get access to Sidekiq features that ActiveJob doesn't support), then add the following to an initializer in your Rails app:

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add CanvasSync::Sidekiq::Middleware
  end
end

Synchronize different reports

CanvasSync provides the functionality to import data from other reports into a specific table.

This can be achieved by using the following method:

chain = CanvasSync.default_provisioning_report_chain
chain << {
  job: CanvasSync::Jobs::SyncSimpleTableJob,
  options: {
    report_name: <report name>,
    model: <model to sync>,
    params: <hash with the required parameters the report needs to sync>
  },
}
chain.process!

Configuration

You can configure CanvasSync settings by doing the following:

CanvasSync.configure do |config|
  config.classes_to_only_log_errors_on << "ClassToOnlyLogErrorsOn"
end

Available config options (if you add more, please update this!):

  • config.classes_to_only_log_errors_on - use this if you are utilizing the CanvasSync::JobLog table, but want certain classes to only persist in the job_logs table if an error is encountered. This is useful if you've got a very frequently used job that's filling up your database, and only really care about tracking failures.

Global Options

You can pass in global_options to a job chain. Global options are added to the batch_context and referenced by various internal processes.

Pass global options into a job chain using the options param, nested under a :global key: options: { global: {...} }

The following global options are available:

  • report_timeout (integer) - Number of days until a Canvas report should time out. Default is 1.
  • report_compilation_timeout (integer) - Number of days until a Canvas report should time out. Default is 1 hour. You can likely pass a float to achieve sub-day timeouts, but this is not tested.
  • report_max_tries (integer) - The number of times to attempt a report before giving up. A report is considered failed if it has an 'error' status in Canvas or is deleted.

This is an example job chain with global options:

job_chain = CanvasSync.default_provisioning_report_chain(
  MODELS_TO_SYNC,
  term_scope: :active,
  full_sync_every: 'sunday',
  options: { global: { report_timeout: 2 } }
)

Handling Job errors

If you need custom handling for when a CanvasSync Job fails, you can add an :on_failure option to your Job Chain's :global_options. The value should be a String in the following format: ModuleOrClass::AnotherModuleOrClass.class_method. The given method of the given class will be called when an error occurs. The handling method should accept 2 arguments: [error, **options]

The current parameters provided in **options are:

  • job_chain
  • job_log

Example:

class CanvasSyncStarterWorker
  def perform
    job_chain = CanvasSync.default_provisioning_report_chain(
      %w[desired models],
      options: {
        global: {
          on_failure: 'CanvasSyncStarterWorker.handle_canvas_sync_error',
        }
      }
    )
    job_chain.process!
  end

  def self.handle_canvas_sync_error(error, **options)
    # Do Stuff
  end
end
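The handler string format can be resolved with a constant lookup followed by a method call. The sketch below shows one way to do that; call_failure_handler and Reporting::Notifier are hypothetical names, and CanvasSync's own lookup may work differently.

```ruby
module Reporting
  class Notifier
    # Matches the documented handler shape: an error plus keyword options.
    def self.handle_error(error, **options)
      "#{error.message} (job_log: #{options[:job_log]})"
    end
  end
end

# Hypothetical: splits "Some::Class.method" and invokes the class method.
def call_failure_handler(handler, error, **options)
  class_name, _dot, method_name = handler.rpartition('.')
  Object.const_get(class_name).public_send(method_name, error, **options)
end

puts call_failure_handler(
  'Reporting::Notifier.handle_error',
  StandardError.new('boom'),
  job_log: 7
)
# prints "boom (job_log: 7)"
```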

Upgrading

Re-running the generator when there's been a gem change will give you several choices if it detects conflicts between your local files and the updated generators. You can either view a diff or allow the generator to overwrite your local file. In most cases you may just want to add the code from the diff yourself so as not to break any of your customizations.

Additionally, if there have been schema changes to an existing model you may have to run your own migration to bring it up to speed.

Also see CHANGELOG.md.

If you make updates to the gem please add any upgrade instructions to CHANGELOG.md.

Integrating with existing applications

In order for this to work properly your database tables will need to have at least the columns defined in this gem. (Adding additional columns is fine.) As such, you may need to run some migrations to rename existing columns or add missing ones. The generator only works well in a situation where that table does not already exist. Take a look at the migration templates in lib/canvas_sync/generators/templates to see what you need.

Development

When adding to or updating this gem, make sure you do the following:

  • Update the yardoc comments where necessary, and confirm the changes by running yardoc --server
  • Write specs
  • If you modify the model or migration templates, run bundle exec rake update_test_schema to update them in the Rails Dummy application (and commit those changes)

Docs

Docs can be generated using yard. To view the docs:

  • Clone this gem's repository
  • bundle install
  • yard server --reload

The yard server will give you a URL you can visit to view the docs.