Class: Webhookdb::Replicator::Base
- Inherits: Object
- Includes: Appydays::Loggable, DBAdapter::ColumnTypes
- Defined in: lib/webhookdb/replicator/base.rb
Direct Known Subclasses
AtomSingleFeedV1, AwsPricingV1, ConvertkitBroadcastV1, ConvertkitSubscriberV1, ConvertkitTagV1, EmailOctopusCampaignV1, EmailOctopusContactV1, EmailOctopusEventV1, EmailOctopusListV1, Fake, FrontConversationV1, FrontMarketplaceRootV1, FrontMessageV1, FrontSignalwireMessageChannelAppV1, GithubIssueCommentV1, GithubIssueV1, GithubPullV1, GithubReleaseV1, GithubRepositoryEventV1, IcalendarCalendarV1, IcalendarEventV1, IncreaseACHTransferV1, IncreaseAccountNumberV1, IncreaseAccountTransferV1, IncreaseAccountV1, IncreaseAppV1, IncreaseCheckTransferV1, IncreaseEventV1, IncreaseLimitV1, IncreaseTransactionV1, IncreaseWireTransferV1, IntercomContactV1, IntercomConversationV1, IntercomMarketplaceRootV1, PlivoSmsInboundV1, PostmarkInboundMessageV1, PostmarkOutboundMessageEventV1, ShopifyCustomerV1, ShopifyOrderV1, SignalwireMessageV1, SponsyCustomerV1, SponsyPlacementV1, SponsyPublicationV1, SponsySlotV1, SponsyStatusV1, StripeChargeV1, StripeCouponV1, StripeCustomerV1, StripeDisputeV1, StripeInvoiceItemV1, StripeInvoiceV1, StripePayoutV1, StripePriceV1, StripeProductV1, StripeRefundV1, StripeSubscriptionItemV1, StripeSubscriptionV1, TransistorEpisodeStatsV1, TransistorEpisodeV1, TransistorShowV1, TwilioSmsV1, UrlRecorderV1, WebhookdbCustomerV1
Defined Under Namespace
Classes: CredentialVerificationResult, ServiceBackfiller
Constant Summary
- MAX_INDEX_NAME_LENGTH = 63
Constants included from DBAdapter::ColumnTypes
DBAdapter::ColumnTypes::BIGINT, DBAdapter::ColumnTypes::BIGINT_ARRAY, DBAdapter::ColumnTypes::BOOLEAN, DBAdapter::ColumnTypes::COLUMN_TYPES, DBAdapter::ColumnTypes::DATE, DBAdapter::ColumnTypes::DECIMAL, DBAdapter::ColumnTypes::DOUBLE, DBAdapter::ColumnTypes::FLOAT, DBAdapter::ColumnTypes::INTEGER, DBAdapter::ColumnTypes::INTEGER_ARRAY, DBAdapter::ColumnTypes::OBJECT, DBAdapter::ColumnTypes::TEXT, DBAdapter::ColumnTypes::TEXT_ARRAY, DBAdapter::ColumnTypes::TIMESTAMP, DBAdapter::ColumnTypes::UUID
Instance Attribute Summary
Class Method Summary
-
.chunked_row_update_bounds(max_pk, chunk_size: 1_000_000) ⇒ Object
Return an array of tuples used for splitting UPDATE queries so locks are not held on the entire table when backfilling values when adding new columns.
-
.descriptor ⇒ Webhookdb::Replicator::Descriptor
abstract
Return the descriptor for this service.
Instance Method Summary
- #_any_subscriptions_to_notify? ⇒ Boolean
-
#_backfill_state_change_fields ⇒ Object
If we support backfilling, these keys are used for it.
-
#_backfillers ⇒ Array<Webhookdb::Backfiller>
Return backfillers for the replicator.
- #_clear_backfill_information ⇒ Object
- #_clear_webook_information ⇒ Object
-
#_coalesce_excluded_on_update(update, column_names) ⇒ Object
Have a column set itself only on insert or if nil.
-
#_denormalized_columns ⇒ Array<Webhookdb::Replicator::Column>
When an integration needs denormalized columns, specify them here.
- #_enqueue_backfill_jobs(incremental:, criteria: nil, recursive: true, enqueue: true) ⇒ Object
-
#_extra_index_specs ⇒ Array<Webhookdb::Replicator::IndexSpec>
Names of columns for multi-column indices.
-
#_fetch_enrichment(resource, event, request) ⇒ *
Given the resource that is going to be inserted and an optional event, make an API call to enrich it with further data if needed.
- #_find_dependency_candidate(value) ⇒ Object
- #_notify_dependents(inserting, changed) ⇒ Object
-
#_parallel_backfill ⇒ Object
If this replicator supports backfilling in parallel (running multiple backfillers at a time), return the degree of parallelism (or nil if not running in parallel).
-
#_prepare_for_insert(resource, event, request, enrichment) ⇒ Hash
Return the hash that should be inserted into the database, based on the denormalized columns and data given.
- #_publish_rowupsert(row, check_for_subscriptions: true) ⇒ Object
-
#_remote_key_column ⇒ Webhookdb::Replicator::Column
abstract
Each integration needs a single remote key, like the Shopify order id for Shopify orders, or the SID for Twilio resources.
-
#_resource_and_event(request) ⇒ Array<Hash>?
abstract
Given a webhook/backfill item payload, return the resource hash, and an optional event hash.
-
#_resource_to_data(resource, event, request, enrichment) ⇒ Hash
Given the resource, return the value for the :data column.
-
#_store_enrichment_body? ⇒ Boolean
Use this to determine whether we should add an enrichment column in the create table modification to store the enrichment body.
-
#_timestamp_column_name ⇒ Symbol
abstract
The name of the timestamp column in the schema.
-
#_to_json(v) ⇒ Object
The NULL ASCII character (\u0000), when present in a string ("\u0000") and then encoded into JSON ("\\u0000"), is invalid in PG JSONB: its strings cannot contain NULLs (note that JSONB does not store the encoded string verbatim; it parses it into PG types, and a PG string cannot contain NULL since C strings are NULL-terminated).
-
#_update_where_expr ⇒ Sequel::SQL::Expression
abstract
The argument for insert_conflict update_where clause.
-
#_upsert_update_expr(inserting, enrichment: nil) ⇒ Object
Given the hash that is passed to the Sequel insert (so contains all columns, including those from _prepare_for_insert), return the hash used for the insert_conflict(update:) keyword args.
-
#_upsert_webhook(request, upsert: true) ⇒ Object
Hook to be overridden, while still retaining top-level upsert_webhook functionality like error handling.
- #_verify_backfill_err_msg ⇒ Object
-
#_webhook_response(request) ⇒ Webhookdb::WebhookResponse
abstract
Return the response for the webhook.
-
#_webhook_state_change_fields ⇒ Object
If we support webhooks, these fields correspond to the webhook state machine.
-
#admin_dataset(**kw) ⇒ Sequel::Dataset
Yield to a dataset using the admin connection.
-
#backfill(job) ⇒ Object
In order to backfill, we need to iterate through pages of records from the external service and upsert each record. The caveat/complexity is that the backfill method should take care of retrying fetches for failed pages.
-
#backfill_not_supported_message ⇒ Object
When backfilling is not supported, this message is used.
-
#calculate_and_backfill_state_machine(incremental:, criteria: nil, recursive: true, enqueue: true) ⇒ Array<Webhookdb::StateMachineStep, Webhookdb::BackfillJob>
Run calculate_backfill_state_machine.
-
#calculate_backfill_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
Return the state machine that is used when adding backfill support to an integration.
- #calculate_dependency_state_machine_step(dependency_help:) ⇒ Object
-
#calculate_preferred_create_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
See preferred_create_state_machine_method.
-
#calculate_webhook_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
abstract
Return the state machine that is used when setting up this integration.
-
#clear_backfill_information ⇒ Object
Remove all the information needed for backfilling from the integration so that it can be re-entered.
-
#clear_webhook_information ⇒ Object
Remove all the information used in the initial creation of the integration so that it can be re-entered.
- #create_table(if_not_exists: false) ⇒ Object
-
#create_table_modification(if_not_exists: false) ⇒ Webhookdb::Replicator::SchemaModification
Return the schema modification used to create the table where it does not exist.
- #data_column ⇒ Webhookdb::DBAdapter::Column
-
#dbadapter_table ⇒ Webhookdb::DBAdapter::Table
Return a DBAdapter table based on the schema_and_table_symbols.
- #denormalized_columns ⇒ Array<Webhookdb::DBAdapter::Column>
- #descriptor ⇒ Webhookdb::Replicator::Descriptor
-
#dispatch_request_to(request) ⇒ Webhookdb::Replicator::Base
A given HTTP request may not be handled by the service integration it was sent to, for example where the service integration is part of some ‘root’ hierarchy.
- #documentation_url ⇒ Object
-
#enqueue_sync_targets ⇒ Object
Some replicators support ‘instant sync’, because they are upserted en-masse rather than row-by-row.
-
#enrichment_column ⇒ Webhookdb::DBAdapter::Column?
Column used to store enrichments.
-
#ensure_all_columns ⇒ Object
We support adding columns to existing integrations without having to bump the version; changing types, or removing/renaming columns, is not supported and should bump the version or must be handled out-of-band (like deleting the integration then backfilling).
- #ensure_all_columns_modification ⇒ Webhookdb::Replicator::SchemaModification
-
#find_dependent(service_name) ⇒ Webhookdb::ServiceIntegration?
Find a dependent service integration with the given service name.
- #find_dependent!(service_name) ⇒ Webhookdb::ServiceIntegration
- #indices(table) ⇒ Array<Webhookdb::DBAdapter::Index>
-
#initialize(service_integration) ⇒ Base
constructor
A new instance of Base.
-
#on_dependency_webhook_upsert(replicator, payload, changed:) ⇒ Object
Called when the upstream dependency upserts.
-
#preferred_create_state_machine_method ⇒ Symbol
If the integration supports webhooks, then we want to do that on create.
-
#preprocess_headers_for_logging(headers) ⇒ Object
In some cases, services may send us sensitive headers we do not want to log.
- #primary_key_column ⇒ Webhookdb::DBAdapter::Column
-
#process_state_change(field, value, attr: nil) ⇒ Webhookdb::Replicator::StateMachineStep
Set the new service integration field and return the newly calculated state machine.
-
#process_webhooks_synchronously? ⇒ Boolean
Return true if the service should process webhooks in the actual endpoint, rather than asynchronously through the job system.
-
#qualified_table_sequel_identifier(schema: nil, table: nil) ⇒ Sequel::SQL::QualifiedIdentifier
Return a Sequel identifier using schema_and_table_symbols, or schema or table as overrides if given.
-
#readonly_dataset(**kw) ⇒ Sequel::Dataset
Yield to a dataset using the readonly connection.
- #remote_key_column ⇒ Webhookdb::DBAdapter::Column
-
#requires_sequence? ⇒ Boolean
Some integrations require sequences, like when upserting rows with numerical unique ids (if they were random values like UUIDs we could generate them and not use a sequence).
- #resource_name_plural ⇒ Object
- #resource_name_singular ⇒ Object
-
#schema_and_table_symbols ⇒ Array<Symbol>
Return a tuple of (schema, table) based on the organization’s replication schema, and the service integration’s table name.
-
#storable_columns ⇒ Array<Webhookdb::DBAdapter::Column>
Denormalized columns, plus the enrichment column if supported.
-
#synchronous_processing_response_body(upserted:, request:) ⇒ String
Call with the value that was inserted by synchronous processing.
-
#timestamp_column ⇒ Webhookdb::DBAdapter::Column
Column to use as the ‘timestamp’ for the row.
-
#upsert_has_deps? ⇒ Boolean
Return true if the integration requires making an API call to upsert.
-
#upsert_webhook(request, **kw) ⇒ Object
Upsert a webhook request into the database.
-
#upsert_webhook_body(body, **kw) ⇒ Object
Upsert webhook using only a body.
-
#verify_backfill_credentials ⇒ Webhookdb::CredentialVerificationResult
Try to verify backfill credentials, by fetching the first page of items.
- #webhook_endpoint ⇒ Object
-
#webhook_response(request) ⇒ Webhookdb::WebhookResponse
Given a Rack request, return the webhook response object.
Constructor Details
#initialize(service_integration) ⇒ Base
Returns a new instance of Base.
# File 'lib/webhookdb/replicator/base.rb', line 31

def initialize(service_integration)
  @service_integration = service_integration
end
Instance Attribute Details
#service_integration ⇒ Webhookdb::ServiceIntegration (readonly)
# File 'lib/webhookdb/replicator/base.rb', line 29

def service_integration
  @service_integration
end
Class Method Details
.chunked_row_update_bounds(max_pk, chunk_size: 1_000_000) ⇒ Object
Return an array of tuples used for splitting UPDATE queries so locks are not held on the entire table when backfilling values for newly added columns. See ensure_all_columns_modification.
The returned chunks are like [[0, 100], [100, 200], [200]], and are meant to be used like `0 < pk <= 100`, `100 < pk <= 200`, `pk > 200`.
Note that the final value in the array is a single item, used like `pk > chunks[-1][0]`.
# File 'lib/webhookdb/replicator/base.rb', line 594

def self.chunked_row_update_bounds(max_pk, chunk_size: 1_000_000)
  result = []
  chunk_lower_pk = 0
  chunk_upper_pk = chunk_size
  while chunk_upper_pk <= max_pk
    # Get chunks like 0 < pk <= 100, 100 < pk <= 200, etc
    # Each loop we increment one row chunk size, until we find the chunk containing our max PK.
    # Ie if row chunk size is 100, and max_pk is 450, the final chunk here is 400-500.
    result << [chunk_lower_pk, chunk_upper_pk]
    chunk_lower_pk += chunk_size
    chunk_upper_pk += chunk_size
  end
  # Finally, one final chunk for all rows greater than our biggest chunk.
  # For example, with a row chunk size of 100, and max_pk of 450, we got a final chunk of 400-500.
  # But we could have gotten 100 writes (with a new max pk of 550), so this 'pk > 500' catches those.
  result << [chunk_lower_pk]
end
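As a sanity check, here is a standalone copy of the logic above (no WebhookDB dependencies), showing the bounds produced for a small table:

```ruby
# Standalone copy of chunked_row_update_bounds, for illustration only.
def chunked_row_update_bounds(max_pk, chunk_size: 1_000_000)
  result = []
  lower = 0
  upper = chunk_size
  while upper <= max_pk
    result << [lower, upper]  # used as: lower < pk <= upper
    lower += chunk_size
    upper += chunk_size
  end
  # Trailing open-ended chunk: pk > lower, which also catches rows
  # written after max_pk was read.
  result << [lower]
  result
end

chunked_row_update_bounds(450, chunk_size: 100)
# => [[0, 100], [100, 200], [200, 300], [300, 400], [400]]
```

Note the trailing single-element chunk covers everything above the last full chunk, including rows inserted while the update is running.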
.descriptor ⇒ Webhookdb::Replicator::Descriptor
Return the descriptor for this service.
# File 'lib/webhookdb/replicator/base.rb', line 24

def self.descriptor
  raise NotImplementedError, "#{self.class}: must return a descriptor that is used for registration purposes"
end
Instance Method Details
#_any_subscriptions_to_notify? ⇒ Boolean
# File 'lib/webhookdb/replicator/base.rb', line 719

def _any_subscriptions_to_notify?
  return !self.service_integration.all_webhook_subscriptions_dataset.to_notify.empty?
end
#_backfill_state_change_fields ⇒ Object
If we support backfilling, these keys are used for it. Override if other fields are used instead. There cannot be overlap between these and the webhook state change fields.
# File 'lib/webhookdb/replicator/base.rb', line 145

def _backfill_state_change_fields = ["backfill_key", "backfill_secret", "api_url"]
#_backfillers ⇒ Array<Webhookdb::Backfiller>
Return backfillers for the replicator. We must use an array for ‘data-based’ backfillers, like when we need to paginate for each row in another table.
By default, return a ServiceBackfiller, which will call _fetch_backfill_page on the receiver.
# File 'lib/webhookdb/replicator/base.rb', line 1064

def _backfillers
  return [ServiceBackfiller.new(self)]
end
#_clear_backfill_information ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 314

def _clear_backfill_information
  self.service_integration.set(api_url: "", backfill_key: "", backfill_secret: "")
end
#_clear_webook_information ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 301

def _clear_webook_information
  self.service_integration.set(webhook_secret: "")
end
#_coalesce_excluded_on_update(update, column_names) ⇒ Object
Have a column set itself only on insert or if nil.
Given the payload to DO UPDATE, mutate it so that the column names included in ‘column_names’ use what is already in the table, and fall back to what’s being inserted. This new payload should be passed to the ‘update` kwarg of `insert_conflict`:
ds.insert_conflict(update: self._coalesce_excluded_on_update(payload, :created_at)).insert(payload)
# File 'lib/webhookdb/replicator/base.rb', line 875

def _coalesce_excluded_on_update(update, column_names)
  # Now replace just the specific columns we're overriding.
  column_names.each do |c|
    update[c] = Sequel.function(:coalesce, self.qualified_table_sequel_identifier[c], Sequel[:excluded][c])
  end
end
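The COALESCE(table.col, excluded.col) semantics can be illustrated with a plain-Ruby sketch (a hypothetical helper, not part of WebhookDB): for the named columns, the value already in the table wins unless it is nil, in which case the incoming value is used.

```ruby
# Hypothetical pure-Ruby illustration of COALESCE(table.col, excluded.col).
# existing: the row already in the table; incoming: the row being upserted.
def merge_insert_only(existing, incoming, insert_only_columns)
  merged = incoming.dup
  insert_only_columns.each do |c|
    merged[c] = existing[c].nil? ? incoming[c] : existing[c]
  end
  merged
end

merge_insert_only(
  {created_at: "2020-01-01", name: "old"},
  {created_at: "2024-06-01", name: "new"},
  [:created_at],
)
# => {created_at: "2020-01-01", name: "new"}
```

In the real method this merge happens inside Postgres during the ON CONFLICT DO UPDATE, so the round trip to fetch the existing row is avoided.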
#_denormalized_columns ⇒ Array<Webhookdb::Replicator::Column>
When an integration needs denormalized columns, specify them here. Indices are created for each column. Modifiers can be used if columns should have a default or whatever. See Webhookdb::Replicator::Column
for more details about column fields.
# File 'lib/webhookdb/replicator/base.rb', line 480

def _denormalized_columns
  return []
end
#_enqueue_backfill_jobs(incremental:, criteria: nil, recursive: true, enqueue: true) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 215

def _enqueue_backfill_jobs(incremental:, criteria: nil, recursive: true, enqueue: true)
  m = recursive ? :create_recursive : :create
  j = Webhookdb::BackfillJob.send(
    m,
    service_integration:,
    incremental:,
    criteria: criteria || {},
    created_by: Webhookdb.request_user_and_admin[0],
  )
  j.enqueue if enqueue
  return j
end
#_extra_index_specs ⇒ Array<Webhookdb::Replicator::IndexSpec>
Names of columns for multi-column indices. Each one must be in denormalized_columns.
# File 'lib/webhookdb/replicator/base.rb', line 432

def _extra_index_specs
  return []
end
#_fetch_enrichment(resource, event, request) ⇒ *
Given the resource that is going to be inserted and an optional event, make an API call to enrich it with further data if needed. The result of this is passed to _prepare_for_insert.
# File 'lib/webhookdb/replicator/base.rb', line 756

def _fetch_enrichment(resource, event, request)
  return nil
end
#_find_dependency_candidate(value) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 229

def _find_dependency_candidate(value)
  int_val = value.strip.blank? ? 1 : value.to_i
  idx = int_val - 1
  dep_candidates = self.service_integration.dependency_candidates
  raise Webhookdb::InvalidPrecondition, "no dependency candidates" if dep_candidates.empty?
  raise Webhookdb::InvalidInput, "'#{value}' is not a valid dependency" if idx.negative? || idx >= dep_candidates.length
  return dep_candidates[idx]
end
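The 1-based selection logic can be exercised on its own. This standalone sketch mirrors the parsing above, substituting plain `ArgumentError` for WebhookDB's error classes and stdlib `empty?` for ActiveSupport's `blank?`:

```ruby
# Standalone sketch of the 1-based candidate selection above.
# Blank input defaults to the first candidate.
def pick_dependency_candidate(value, candidates)
  raise ArgumentError, "no dependency candidates" if candidates.empty?
  idx = (value.strip.empty? ? 1 : value.to_i) - 1
  raise ArgumentError, "'#{value}' is not a valid dependency" if idx.negative? || idx >= candidates.length
  candidates[idx]
end

pick_dependency_candidate("", ["contacts", "conversations"])   # => "contacts"
pick_dependency_candidate("2", ["contacts", "conversations"])  # => "conversations"
```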
#_notify_dependents(inserting, changed) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 713

def _notify_dependents(inserting, changed)
  self.service_integration.dependents.each do |d|
    d.replicator.on_dependency_webhook_upsert(self, inserting, changed:)
  end
end
#_parallel_backfill ⇒ Object
If this replicator supports backfilling in parallel (running multiple backfillers at a time), return the degree of parallelism (or nil if not running in parallel). We leave parallelism up to the replicator, not CPU count, since most work involves waiting on APIs to return.
NOTE: These threads are in addition to any worker threads, so it’s important to pay attention to memory use.
# File 'lib/webhookdb/replicator/base.rb', line 1052

def _parallel_backfill
  return nil
end
#_prepare_for_insert(resource, event, request, enrichment) ⇒ Hash
Return the hash that should be inserted into the database, based on the denormalized columns and data given.
# File 'lib/webhookdb/replicator/base.rb', line 808

def _prepare_for_insert(resource, event, request, enrichment)
  h = [self._remote_key_column].concat(self._denormalized_columns).each_with_object({}) do |col, memo|
    value = col.to_ruby_value(resource:, event:, enrichment:, service_integration:)
    skip = value.nil? && col.skip_nil?
    memo[col.name] = value unless skip
  end
  return h
end
#_publish_rowupsert(row, check_for_subscriptions: true) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 723

def _publish_rowupsert(row, check_for_subscriptions: true)
  return unless check_for_subscriptions && self._any_subscriptions_to_notify?
  payload = [
    self.service_integration.id,
    {
      row:,
      external_id_column: self._remote_key_column.name,
      external_id: row[self._remote_key_column.name],
    },
  ]
  # We AVOID pubsub here because we do NOT want to go through the router
  # and audit logger for this.
  event = Amigo::Event.create("webhookdb.serviceintegration.rowupsert", payload.as_json)
  Webhookdb::Jobs::SendWebhook.perform_async(event.as_json)
end
#_remote_key_column ⇒ Webhookdb::Replicator::Column
Each integration needs a single remote key, like the Shopify order id for Shopify orders, or the SID for Twilio resources. This column must be unique for the table, like a primary key.
# File 'lib/webhookdb/replicator/base.rb', line 470

def _remote_key_column
  raise NotImplementedError
end
#_resource_and_event(request) ⇒ Array<Hash>?
Given a webhook/backfill item payload, return the resource hash, and an optional event hash. If `body` is the resource itself, this method returns [body, nil]. If `body` is an event, this method returns [body.resource-key, body]. Columns can check for whether there is an event and/or body when converting.
If this returns nil, the upsert is skipped.
For example, a Stripe customer payload would be the customer resource itself (like {id: 'cus_123'}) when we backfill, but an event wrapper (like {type: 'event', data: {id: 'cus_123'}}) when handling an event.
# File 'lib/webhookdb/replicator/base.rb', line 797

def _resource_and_event(request)
  raise NotImplementedError
end
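A hedged sketch of what an override might look like for a Stripe-like payload. The `"object" == "event"` check and the key names are illustrative assumptions, not WebhookDB's actual Stripe implementation:

```ruby
# Illustrative override: if the body is an event envelope, return the inner
# resource plus the event; otherwise the body is the resource itself.
def resource_and_event(body)
  if body["object"] == "event"
    [body.dig("data", "object"), body]
  else
    [body, nil]
  end
end

resource_and_event({"id" => "cus_123"})
# => [{"id" => "cus_123"}, nil]
```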
#_resource_to_data(resource, event, request, enrichment) ⇒ Hash
Given the resource, return the value for the :data column. Only needed in rare situations where fields should be stored on the row, but not in :data. To skip :data column updates, return nil.
# File 'lib/webhookdb/replicator/base.rb', line 826

def _resource_to_data(resource, event, request, enrichment)
  return resource
end
#_store_enrichment_body? ⇒ Boolean
Use this to determine whether we should add an enrichment column in the create table modification to store the enrichment body.
# File 'lib/webhookdb/replicator/base.rb', line 338

def _store_enrichment_body?
  return false
end
#_timestamp_column_name ⇒ Symbol
The name of the timestamp column in the schema. This column is used primarily for conditional upserts (ie to know if a row has changed), but also as a general way of auditing changes.
# File 'lib/webhookdb/replicator/base.rb', line 461

def _timestamp_column_name
  raise NotImplementedError
end
#_to_json(v) ⇒ Object
The NULL ASCII character (\u0000), when present in a string ("\u0000") and then encoded into JSON ("\\u0000"), is invalid in PG JSONB: its strings cannot contain NULLs (note that JSONB does not store the encoded string verbatim; it parses it into PG types, and a PG string cannot contain NULL since C strings are NULL-terminated).
So we remove the "\u0000" sequence from encoded JSON. For example, for the hash {"x" => "\u0000"}, #to_json gives us '"x":"\u0000"'. The removal of the encoded NULL gives us '"x":""'.
HOWEVER, if the encoded null is itself escaped, we MUST NOT remove it. For example, in a JSON string which itself contains encoded JSON, we end up with '"x":"\\u0000"', that is, a string containing the escaped null character. This is valid for PG, because it's not a NULL: it's an escaped "\", followed by "u0000". If we were to remove the string "\u0000", we'd end up with '"x":"\"', which creates an invalid document.
So we remove only "\u0000" and not "\\u0000": replace all occurrences of "<any one character except backslash>\u0000" with "<character before backslash>".
# File 'lib/webhookdb/replicator/base.rb', line 708

def _to_json(v)
  return v.to_json.gsub(/(\\\\u0000|\\u0000)/, {"\\\\u0000" => "\\\\u0000", "\\u0000" => ""})
end
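The behavior can be checked with a standalone copy of the scrubbing regex (needs only the stdlib json gem):

```ruby
require "json"

# Standalone copy of the NULL-scrubbing serialization above.
# The escaped form (\\u0000) maps to itself; the plain encoded NULL is removed.
def to_scrubbed_json(v)
  v.to_json.gsub(/(\\\\u0000|\\u0000)/, "\\\\u0000" => "\\\\u0000", "\\u0000" => "")
end

to_scrubbed_json({"x" => "\u0000"})   # plain encoded NULL removed: {"x":""}
to_scrubbed_json({"x" => "\\u0000"})  # escaped form preserved: {"x":"\\u0000"}
```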
#_update_where_expr ⇒ Sequel::SQL::Expression
The argument for insert_conflict update_where clause. Used to conditionally update, like updating only if a row is newer than what’s stored. We must always have an ‘update where’ because we never want to overwrite with the same data as exists.
If an integration does not have any way to detect if a resource changed, it can compare data columns.
# File 'lib/webhookdb/replicator/base.rb', line 776

def _update_where_expr
  raise NotImplementedError
end
#_upsert_update_expr(inserting, enrichment: nil) ⇒ Object
Given the hash that is passed to the Sequel insert (so contains all columns, including those from _prepare_for_insert), return the hash used for the insert_conflict(update:) keyword args.
Rather than sending over the literal values in the inserting statement (which is pretty verbose, like the large ‘data’ column), make a smaller statement by using ‘EXCLUDED’.
This can be overridden when the service requires different values for inserting vs. updating, such as when a column's update value must use the EXCLUDED table in the upsert expression.
Most commonly, the use case for this is when you want to provide a row a value on insert, or on update only if the column is nil. In that case, pass the result of this base method to _coalesce_excluded_on_update
(see that method for more details).
You can also use this method to merge :data columns together. For example: `super_result[:data] = Sequel.lit("#{self.service_integration.table_name}.data || excluded.data")`
By default, this will use the same values for UPDATE as are used for INSERT, like ‘email = EXCLUDED.email` (the ’EXCLUDED’ row being the one that failed to insert).
# File 'lib/webhookdb/replicator/base.rb', line 852

def _upsert_update_expr(inserting, enrichment: nil)
  result = inserting.each_with_object({}) { |(c, _), h| h[c] = Sequel[:excluded][c] }
  return result
end
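The shape of the default return value can be illustrated without Sequel; here plain strings stand in for the `Sequel[:excluded][c]` expressions:

```ruby
# Pure-Ruby stand-in: map every inserted column to a reference to the
# EXCLUDED pseudo-table (Sequel builds real expressions; strings shown here).
def excluded_update_expr(inserting)
  inserting.each_with_object({}) { |(c, _), h| h[c] = "excluded.#{c}" }
end

excluded_update_expr(email: "a@b.c", data: "{}")
# => {:email=>"excluded.email", :data=>"excluded.data"}
```

This is why the generated UPDATE stays small: it references the row that failed to insert rather than repeating every literal value.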
#_upsert_webhook(request, upsert: true) ⇒ Object
Hook to be overridden, while still retaining top-level upsert_webhook functionality like error handling.
# File 'lib/webhookdb/replicator/base.rb', line 664

def _upsert_webhook(request, upsert: true)
  resource, event = self._resource_and_event(request)
  return nil if resource.nil?
  enrichment = self._fetch_enrichment(resource, event, request)
  prepared = self._prepare_for_insert(resource, event, request, enrichment)
  raise Webhookdb::InvalidPostcondition if prepared.key?(:data)
  inserting = {}
  data_col_val = self._resource_to_data(resource, event, request, enrichment)
  inserting[:data] = self._to_json(data_col_val)
  inserting[:enrichment] = self._to_json(enrichment) if self._store_enrichment_body?
  inserting.merge!(prepared)
  return inserting unless upsert
  remote_key_col = self._remote_key_column
  updating = self._upsert_update_expr(inserting, enrichment:)
  update_where = self._update_where_expr
  upserted_rows = self.admin_dataset(timeout: :fast) do |ds|
    ds.insert_conflict(
      target: remote_key_col.name,
      update: updating,
      update_where:,
    ).insert(inserting)
  end
  row_changed = upserted_rows.present?
  self._notify_dependents(inserting, row_changed)
  self._publish_rowupsert(inserting) if row_changed
  return inserting
end
#_verify_backfill_err_msg ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 975

def _verify_backfill_err_msg
  raise NotImplementedError, "each integration must provide an error message for unanticipated errors"
end
#_webhook_response(request) ⇒ Webhookdb::WebhookResponse
Return the response for the webhook. We must do this immediately in the endpoint itself, since verification may include info specific to the request content (e.g., it can be whitespace-sensitive).
# File 'lib/webhookdb/replicator/base.rb', line 134

def _webhook_response(request)
  raise NotImplementedError
end
#_webhook_state_change_fields ⇒ Object
If we support webhooks, these fields correspond to the webhook state machine. Override them if some other fields are also needed for webhooks.
# File 'lib/webhookdb/replicator/base.rb', line 140

def _webhook_state_change_fields = ["webhook_secret"]
#admin_dataset(**kw) ⇒ Sequel::Dataset
Yield to a dataset using the admin connection.
# File 'lib/webhookdb/replicator/base.rb', line 884

def admin_dataset(**kw, &)
  self.with_dataset(self.service_integration.organization.admin_connection_url_raw, **kw, &)
end
#backfill(job) ⇒ Object
In order to backfill, we need to:
-
Iterate through pages of records from the external service
-
Upsert each record
The caveats/complexities are:
-
The backfill method should take care of retrying fetches for failed pages.
-
That means it needs to keep track of some pagination token.
# File 'lib/webhookdb/replicator/base.rb', line 988

def backfill(job)
  raise Webhookdb::InvalidPrecondition, "job is for different service integration" unless
    job.service_integration === self.service_integration
  raise Webhookdb::InvariantViolation, "manual backfill not supported" unless self.descriptor.supports_backfill?
  sint = self.service_integration
  raise Webhookdb::Replicator::CredentialsMissing if
    sint.backfill_key.blank? && sint.backfill_secret.blank? && sint.depends_on.blank?
  last_backfilled = job.incremental? ? sint.last_backfilled_at : nil
  new_last_backfilled = Time.now
  job.update(started_at: Time.now)
  backfillers = self._backfillers(**job.criteria.symbolize_keys)
  if self._parallel_backfill && self._parallel_backfill > 1
    # Create a dedicated threadpool for these backfillers,
    # with max parallelism determined by the replicator.
    pool = Concurrent::FixedThreadPool.new(self._parallel_backfill)
    # Record any errors that occur, since they won't raise otherwise.
    # Initialize a sized array to avoid any potential race conditions (though GIL should make it not an issue?).
    errors = Array.new(backfillers.size)
    backfillers.each_with_index do |bf, idx|
      pool.post do
        bf.backfill(last_backfilled)
      rescue StandardError => e
        errors[idx] = e
      end
    end
    # We've enqueued all backfillers; do not accept anymore work.
    pool.shutdown
    loop do
      # We want to stop early if we find an error, so check for errors every 10 seconds.
      completed = pool.wait_for_termination(10)
      first_error = errors.find { |e| !e.nil? }
      if first_error.nil?
        # No error, and wait_for_termination returned true, so all work is done.
        break if completed
        # No error, but work is still going on, so loop again.
        next
      end
      # We have an error; don't run any more backfillers.
      pool.kill
      # Wait for all ongoing backfills before raising.
      pool.wait_for_termination
      raise first_error
    end
  else
    backfillers.each do |backfiller|
      backfiller.backfill(last_backfilled)
    end
  end
  sint.update(last_backfilled_at: new_last_backfilled) if job.incremental?
  job.update(finished_at: Time.now)
  job.enqueue_children
end
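The parallel branch above relies on concurrent-ruby's FixedThreadPool. The same collect-errors-per-slot pattern can be sketched with stdlib threads (a hypothetical, simplified helper: no early kill, errors are raised only after all workers finish):

```ruby
# Stdlib-thread sketch of the error-collection pattern used by #backfill:
# run jobs with bounded parallelism, record the error (if any) per slot,
# and raise the first error only after all workers have finished.
def run_backfillers(jobs, degree)
  errors = Array.new(jobs.size)
  queue = Queue.new
  jobs.each_with_index { |job, idx| queue << [job, idx] }
  workers = Array.new(degree) do
    Thread.new do
      loop do
        job, idx = begin
          queue.pop(true)  # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        begin
          job.call
        rescue StandardError => e
          errors[idx] = e  # record instead of raising inside the worker
        end
      end
    end
  end
  workers.each(&:join)
  first_error = errors.compact.first
  raise first_error if first_error
end
```

As in the real method, a failing job does not abort its siblings mid-flight; the error surfaces once the pool drains.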
#backfill_not_supported_message ⇒ Object
When backfilling is not supported, this message is used. It can be overridden for custom explanations, or descriptor#documentation_url can be provided, which will use a default message. If no documentation is available, a fallback message is used.
# File 'lib/webhookdb/replicator/base.rb', line 277

def backfill_not_supported_message
  du = self.documentation_url
  if du.blank?
    msg = %(Sorry, you cannot backfill this integration. You may be looking for one of the following:

  webhookdb integrations reset #{self.service_integration.table_name}
)
    return msg
  end
  msg = %(Sorry, you cannot manually backfill this integration.
Please refer to the documentation at #{du}
for information on how to refresh data.)
  return msg
end
#calculate_and_backfill_state_machine(incremental:, criteria: nil, recursive: true, enqueue: true) ⇒ Array<Webhookdb::StateMachineStep, Webhookdb::BackfillJob>
Run calculate_backfill_state_machine. Then create and enqueue a new BackfillJob if it’s successful. Returns a tuple of the StateMachineStep and BackfillJob. If the BackfillJob is returned, the StateMachineStep was successful; otherwise no job is created and the second item is nil.
# File 'lib/webhookdb/replicator/base.rb', line 265

def calculate_and_backfill_state_machine(incremental:, criteria: nil, recursive: true, enqueue: true)
  step = self.calculate_backfill_state_machine
  bfjob = nil
  bfjob = self._enqueue_backfill_jobs(incremental:, criteria:, recursive:, enqueue:) if step.successful?
  return step, bfjob
end
#calculate_backfill_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
Return the state machine that is used when adding backfill support to an integration. Usually this sets one or both of the backfill key and secret.
# File 'lib/webhookdb/replicator/base.rb', line 254

def calculate_backfill_state_machine
  # This is a pure function that can be tested on its own--the endpoints just need to return a state machine step.
  raise NotImplementedError
end
#calculate_dependency_state_machine_step(dependency_help:) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 1124

def calculate_dependency_state_machine_step(dependency_help:)
  raise Webhookdb::InvalidPrecondition, "#{self.descriptor.name} does not have a dependency" if
    self.class.descriptor.dependency_descriptor.nil?
  return nil if self.service_integration.depends_on_id

  step = Webhookdb::Replicator::StateMachineStep.new
  dep_descr = self.descriptor.dependency_descriptor
  candidates = self.service_integration.dependency_candidates
  if candidates.empty?
    step.output = %(This integration requires #{dep_descr.resource_name_plural} to sync.

You don't have any #{dep_descr.resource_name_singular} integrations yet. You can run:

  webhookdb integrations create #{dep_descr.name}

to set one up. Then once that's complete, you can re-run:

  webhookdb integrations create #{self.descriptor.name}

to keep going.
)
    step.error_code = "no_candidate_dependency"
    return step.completed
  end

  choice_lines = candidates.each_with_index.
    map { |si, idx| "#{idx + 1} - #{si.table_name}" }.
    join("\n")
  step.output = %(This integration requires #{dep_descr.resource_name_plural} to sync.
#{dependency_help.blank? ? '' : "\n#{dependency_help}\n"}
Enter the number for the #{dep_descr.resource_name_singular} integration you want to use,
or leave blank to choose the first option.

#{choice_lines}
)
  step.prompting("Parent integration number")
  step.post_to_url = self.service_integration.authed_api_path + "/transition/dependency_choice"
  return step
end
#calculate_preferred_create_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
See preferred_create_state_machine_method. If we prefer backfilling, and it's successful, we also want to enqueue jobs; that is, use calculate_and_backfill_state_machine, not just calculate_backfill_state_machine.
# File 'lib/webhookdb/replicator/base.rb', line 209

def calculate_preferred_create_state_machine
  m = self.preferred_create_state_machine_method
  return self.calculate_and_backfill_state_machine(incremental: true)[0] if m == :calculate_backfill_state_machine
  return self.calculate_webhook_state_machine
end
#calculate_webhook_state_machine ⇒ Webhookdb::Replicator::StateMachineStep
Return the state machine that is used when setting up this integration. Usually this entails providing the user the webhook url, and providing or asking for a webhook secret. In some cases, this can be a lot more complex though.
# File 'lib/webhookdb/replicator/base.rb', line 246

def calculate_webhook_state_machine
  raise NotImplementedError
end
#clear_backfill_information ⇒ Object
Remove all the information needed for backfilling from the integration so that it can be re-entered
# File 'lib/webhookdb/replicator/base.rb', line 306

def clear_backfill_information
  self._clear_backfill_information
  # If we don't support both webhooks and backfilling, we are safe to clear ALL fields
  # and get back into an initial state.
  self._clear_webook_information unless self.descriptor.supports_webhooks_and_backfill?
  self.service_integration.save_changes
end
#clear_webhook_information ⇒ Object
Remove all the information used in the initial creation of the integration so that it can be re-entered
# File 'lib/webhookdb/replicator/base.rb', line 293

def clear_webhook_information
  self._clear_webook_information
  # If we don't support both webhooks and backfilling, we are safe to clear ALL fields
  # and get back into an initial state.
  self._clear_backfill_information unless self.descriptor.supports_webhooks_and_backfill?
  self.service_integration.save_changes
end
#create_table(if_not_exists: false) ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 342

def create_table(if_not_exists: false)
  cmd = self.create_table_modification(if_not_exists:)
  self.admin_dataset(timeout: :fast) do |ds|
    cmd.execute(ds.db)
  end
end
#create_table_modification(if_not_exists: false) ⇒ Webhookdb::Replicator::SchemaModification
Return the schema modification used to create the table where it does not exist.
# File 'lib/webhookdb/replicator/base.rb', line 351

def create_table_modification(if_not_exists: false)
  table = self.dbadapter_table
  columns = [self.primary_key_column, self.remote_key_column]
  columns.concat(self.storable_columns)
  # 'data' column should be last, since it's very large, we want to see other columns in psql/pgcli first.
  columns << self.data_column
  adapter = Webhookdb::DBAdapter::PG.new
  result = Webhookdb::Replicator::SchemaModification.new
  result.transaction_statements << adapter.create_table_sql(table, columns, if_not_exists:)
  self.indices(table).each do |dbindex|
    result.transaction_statements << adapter.create_index_sql(dbindex, concurrently: false)
  end
  result.application_database_statements << self.service_integration.ensure_sequence_sql if self.requires_sequence?
  return result
end
#data_column ⇒ Webhookdb::DBAdapter::Column
# File 'lib/webhookdb/replicator/base.rb', line 413

def data_column
  return Webhookdb::DBAdapter::Column.new(name: :data, type: OBJECT, nullable: false)
end
#dbadapter_table ⇒ Webhookdb::DBAdapter::Table
Return a DBAdapter table based on the schema_and_table_symbols.
# File 'lib/webhookdb/replicator/base.rb', line 99

def dbadapter_table
  sch, tbl = self.schema_and_table_symbols
  schema = Webhookdb::DBAdapter::Schema.new(name: sch)
  table = Webhookdb::DBAdapter::Table.new(name: tbl, schema:)
  return table
end
#denormalized_columns ⇒ Array<Webhookdb::DBAdapter::Column>
# File 'lib/webhookdb/replicator/base.rb', line 425

def denormalized_columns
  return self._denormalized_columns.map(&:to_dbadapter)
end
#descriptor ⇒ Webhookdb::Replicator::Descriptor
# File 'lib/webhookdb/replicator/base.rb', line 36

def descriptor
  return @descriptor ||= self.class.descriptor
end
#dispatch_request_to(request) ⇒ Webhookdb::Replicator::Base
A given HTTP request may not be handled by the service integration it was sent to, for example where the service integration is part of some ‘root’ hierarchy. This method is called in the webhook endpoint, and should return the replicator used to handle the webhook request. The request is validated by the returned instance, and it is enqueued for processing.
By default, the service called by the webhook is the one we want to use, so return self.
# File 'lib/webhookdb/replicator/base.rb', line 635

def dispatch_request_to(request)
  return self
end
#documentation_url ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 979

def documentation_url = nil
#enqueue_sync_targets ⇒ Object
Some replicators support ‘instant sync’, because they are upserted en-masse rather than row-by-row. That is, usually we run sync targets on a cron, because otherwise we’d need to run the sync target for every row. But if inserting is always done through backfilling, we know we have a useful set of results to sync, so don’t need to wait for cron.
# File 'lib/webhookdb/replicator/base.rb', line 929

def enqueue_sync_targets
  self.service_integration.sync_targets.each do |stgt|
    Webhookdb::Jobs::SyncTargetRunSync.perform_async(stgt.id)
  end
end
#enrichment_column ⇒ Webhookdb::DBAdapter::Column?
Column used to store enrichments. Return nil if the service does not use enrichments.
# File 'lib/webhookdb/replicator/base.rb', line 419

def enrichment_column
  return nil unless self._store_enrichment_body?
  return Webhookdb::DBAdapter::Column.new(name: :enrichment, type: OBJECT, nullable: true)
end
#ensure_all_columns ⇒ Object
We support adding columns to existing integrations without having to bump the version; changing types, or removing/renaming columns, is not supported and should bump the version or must be handled out-of-band (like deleting the integration then backfilling). To figure out what columns we need to add, we check which columns are currently defined, check which exist in the database, and add denormalized columns and indices for those that are missing.
# File 'lib/webhookdb/replicator/base.rb', line 509

def ensure_all_columns
  modification = self.ensure_all_columns_modification
  return if modification.noop?
  self.admin_dataset(timeout: :slow_schema) do |ds|
    modification.execute(ds.db)
    # We need to clear cached columns on the dataset since we know we're adding more.
    # It's probably not a huge deal but may as well keep it in sync.
    ds.send(:clear_columns_cache)
  end
  self.readonly_dataset { |ds| ds.send(:clear_columns_cache) }
end
#ensure_all_columns_modification ⇒ Webhookdb::Replicator::SchemaModification
# File 'lib/webhookdb/replicator/base.rb', line 522

def ensure_all_columns_modification
  existing_cols, existing_indices = nil
  max_pk = 0
  sint = self.service_integration
  self.admin_dataset do |ds|
    return self.create_table_modification unless ds.db.table_exists?(self.qualified_table_sequel_identifier)
    existing_cols = ds.columns.to_set
    existing_indices = ds.db[:pg_indexes].where(
      schemaname: sint.organization.replication_schema,
      tablename: sint.table_name,
    ).select_map(:indexname).to_set
    max_pk = ds.max(:pk) || 0
  end
  adapter = Webhookdb::DBAdapter::PG.new
  table = self.dbadapter_table
  result = Webhookdb::Replicator::SchemaModification.new

  missing_columns = self._denormalized_columns.delete_if { |c| existing_cols.include?(c.name) }
  # Add missing columns
  missing_columns.each do |whcol|
    # Don't bother bulking the ADDs into a single ALTER TABLE, it won't really matter.
    result.transaction_statements << adapter.add_column_sql(table, whcol.to_dbadapter)
  end
  # Easier to handle this explicitly than use storage_columns, but it's a duplicated concept so be careful.
  if (enrich_col = self.enrichment_column) && !existing_cols.include?(enrich_col.name)
    result.transaction_statements << adapter.add_column_sql(table, enrich_col)
  end

  # Backfill values for new columns.
  if missing_columns.any?
    # We need to backfill values into the new column, but we don't want to lock the entire table
    # as we update each row. So we need to update in chunks of rows.
    # Chunk size should be large for speed (and sending over fewer queries), but small enough
    # to induce a viable delay if another query is updating the same row.
    # Note that the delay will only be for writes to those rows; reads will not block,
    # so something a bit longer should be ok.
    #
    # Note that at the point these UPDATEs are running, we have the new column AND the new code inserting
    # into that new column. We could in theory skip all the PKs that were added after this modification
    # started to run. However considering the number of rows in this window will always be relatively low
    # (though not absolutely low), and the SQL backfill operation should yield the same result
    # as the Ruby operation, this doesn't seem too important.
    result.nontransaction_statements.concat(missing_columns.filter_map(&:backfill_statement))
    update_expr = missing_columns.to_h { |c| [c.name, c.backfill_expr || c.to_sql_expr] }
    self.admin_dataset do |ds|
      chunks = Webhookdb::Replicator::Base.chunked_row_update_bounds(max_pk)
      chunks[...-1].each do |(lower, upper)|
        update_query = ds.where { pk > lower }.where { pk <= upper }.update_sql(update_expr)
        result.nontransaction_statements << update_query
      end
      final_update_query = ds.where { pk > chunks[-1][0] }.update_sql(update_expr)
      result.nontransaction_statements << final_update_query
    end
  end

  # Add missing indices. This should happen AFTER the UPDATE calls so the UPDATEs don't have to update indices.
  self.indices(table).map do |index|
    next if existing_indices.include?(index.name.to_s)
    result.nontransaction_statements << adapter.create_index_sql(index, concurrently: true)
  end

  result.application_database_statements << sint.ensure_sequence_sql if self.requires_sequence?
  return result
end
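The chunked UPDATEs above consume `Webhookdb::Replicator::Base.chunked_row_update_bounds(max_pk)`, whose implementation is not shown in this section. Judging from the call sites (every chunk but the last is a `(lower, upper]` pair, and the last chunk supplies only a lower bound), it behaves roughly like this hypothetical sketch; the chunk size is an assumption for illustration.

```ruby
# Hypothetical sketch of chunked_row_update_bounds, inferred from how its
# result is consumed above: chunks[...-1] are (lower, upper] pairs used for
# "pk > lower AND pk <= upper" UPDATEs, and the final chunk is an open-ended
# lower bound used for "pk > lower". The 10_000 chunk size is an assumption.
def chunked_row_update_bounds(max_pk, chunk_size: 10_000)
  bounds = []
  lower = 0
  while lower + chunk_size < max_pk
    bounds << [lower, lower + chunk_size]
    lower += chunk_size
  end
  # Final open-ended chunk: no upper bound, so rows inserted while the
  # earlier chunks ran are still covered.
  bounds << [lower]
  bounds
end

p chunked_row_update_bounds(25_000)
# => [[0, 10000], [10000, 20000], [20000]]
```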
#find_dependent(service_name) ⇒ Webhookdb::ServiceIntegration?
Find a dependent service integration with the given service name. If none are found, return nil. If multiple are found, raise, as this should only be used for automatically managed integrations.
# File 'lib/webhookdb/replicator/base.rb', line 322

def find_dependent(service_name)
  sints = self.service_integration.dependents.filter { |si| si.service_name == service_name }
  raise Webhookdb::InvalidPrecondition, "there are multiple #{service_name} integrations in dependents" if
    sints.length > 1
  return sints.first
end
#find_dependent!(service_name) ⇒ Webhookdb::ServiceIntegration
# File 'lib/webhookdb/replicator/base.rb', line 330

def find_dependent!(service_name)
  sint = self.find_dependent(service_name)
  raise Webhookdb::InvalidPrecondition, "there is no #{service_name} integration in dependents" if sint.nil?
  return sint
end
#indices(table) ⇒ Array<Webhookdb::DBAdapter::Index>
# File 'lib/webhookdb/replicator/base.rb', line 485

def indices(table)
  dba_columns = [self.primary_key_column, self.remote_key_column]
  dba_columns.concat(self.storable_columns)
  dba_cols_by_name = dba_columns.index_by(&:name)
  result = []
  dba_columns.select(&:index?).each do |c|
    targets = [c]
    idx_name = self.index_name(targets)
    result << Webhookdb::DBAdapter::Index.new(name: idx_name.to_sym, table:, targets:, where: c.index_where)
  end
  self._extra_index_specs.each do |spec|
    targets = spec.columns.map { |n| dba_cols_by_name.fetch(n) }
    idx_name = self.index_name(targets)
    result << Webhookdb::DBAdapter::Index.new(name: idx_name.to_sym, table:, targets:, where: spec.where)
  end
  return result
end
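The `MAX_INDEX_NAME_LENGTH = 63` constant (see the Constant Summary) matches PostgreSQL's 63-byte identifier limit: the server silently truncates longer names, which risks collisions between distinct indices. The `index_name` helper called above is not shown in this section; this sketch illustrates one common approach to respecting the limit (truncate, then append a digest), and is an assumption rather than WebhookDB's actual implementation.

```ruby
require "digest"

# PostgreSQL truncates identifiers at 63 bytes (NAMEDATALEN - 1), which is
# what MAX_INDEX_NAME_LENGTH guards against. This build_index_name is a
# hypothetical sketch: when the derived name would exceed the limit, it
# truncates and appends a short digest so distinct column sets cannot
# silently collide after truncation.
MAX_INDEX_NAME_LENGTH = 63

def build_index_name(table, column_names)
  name = "#{table}_#{column_names.join('_')}_idx"
  return name if name.length <= MAX_INDEX_NAME_LENGTH
  digest = Digest::MD5.hexdigest(name)[0, 8]
  # Leave room for "_" plus the 8-character digest.
  "#{name[0, MAX_INDEX_NAME_LENGTH - 9]}_#{digest}"
end

puts build_index_name(:my_table, [:at]) # prints my_table_at_idx
long = build_index_name(:icalendar_event_v1_fixture, Array.new(8) { |i| :"very_long_column_name_#{i}" })
puts long.length # prints 63
```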
#on_dependency_webhook_upsert(replicator, payload, changed:) ⇒ Object
Called when the upstream dependency upserts. In most cases, you can noop; but in some cases, you may want to update or fetch rows. One example would be a ‘db only’ integration, where values are taken from the parent service and added to this service’s table. We may want to upsert rows in our table whenever a row in our parent table changes.
# File 'lib/webhookdb/replicator/base.rb', line 1120

def on_dependency_webhook_upsert(replicator, payload, changed:)
  raise NotImplementedError, "this must be overridden for replicators that have dependencies"
end
#preferred_create_state_machine_method ⇒ Symbol
If the integration supports webhooks, then we want to do that on create. If it’s backfill only, then we fall back to that instead. Things like choosing dependencies are webhook-vs-backfill agnostic, so which machine we choose isn’t that important (but it does happen during creation).
# File 'lib/webhookdb/replicator/base.rb', line 201

def preferred_create_state_machine_method
  return self.descriptor.supports_webhooks? ? :calculate_webhook_state_machine : :calculate_backfill_state_machine
end
#preprocess_headers_for_logging(headers) ⇒ Object
In some cases, services may send us sensitive headers we do not want to log. This should be very rare but some services are designed really badly and send auth info in the webhook. Remove or obfuscate the passed header hash.
# File 'lib/webhookdb/replicator/base.rb', line 76

def preprocess_headers_for_logging(headers); end
#primary_key_column ⇒ Webhookdb::DBAdapter::Column
# File 'lib/webhookdb/replicator/base.rb', line 403

def primary_key_column
  return Webhookdb::DBAdapter::Column.new(name: :pk, type: BIGINT, pk: true)
end
#process_state_change(field, value, attr: nil) ⇒ Webhookdb::Replicator::StateMachineStep
Set the new service integration field and return the newly calculated state machine.
Subclasses can override this method and then super, to change the field or value.
# File 'lib/webhookdb/replicator/base.rb', line 159

def process_state_change(field, value, attr: nil)
  attr ||= field
  desc = self.descriptor
  value = value.strip if value.respond_to?(:strip)
  case field
  when *self._webhook_state_change_fields
    # If we don't support webhooks, then the backfill state machine may be using it.
    meth = desc.supports_webhooks? ? :calculate_webhook_state_machine : :calculate_backfill_state_machine
  when *self._backfill_state_change_fields
    # If we don't support backfilling, then the create state machine may be using them.
    meth = desc.supports_backfill? ? :calculate_backfill_state_machine : :calculate_webhook_state_machine
  when "dependency_choice"
    # Choose an upstream dependency for an integration.
    # See where this is used for more details.
    meth = self.preferred_create_state_machine_method
    value = self._find_dependency_candidate(value)
    attr = "depends_on"
  when "noop_create"
    # Use this to just recalculate the state machine,
    # not make any changes to the data.
    return self.calculate_preferred_create_state_machine
  else
    raise ArgumentError, "Field '#{field}' is not valid for a state change"
  end
  self.service_integration.db.transaction do
    self.service_integration.send(:"#{attr}=", value)
    self.service_integration.save_changes
    step = self.send(meth)
    if step.successful? && meth == :calculate_backfill_state_machine
      # If we are processing the backfill state machine, and we finish successfully,
      # we always want to start syncing.
      self._enqueue_backfill_jobs(incremental: true)
    end
    return step
  end
end
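The `when *self._webhook_state_change_fields` branches above use Ruby's splat-in-`when` to route a field name to the right state machine, falling back to the other machine when the preferred one isn't supported. A minimal, self-contained sketch of that dispatch, with hypothetical field lists:

```ruby
# Sketch of the field-dispatch pattern in process_state_change:
# `when *array` splats a list of field names into a single branch, so each
# replicator can declare which fields belong to the webhook vs. backfill
# state machines. The field lists here are hypothetical examples.
WEBHOOK_FIELDS = ["webhook_secret"].freeze
BACKFILL_FIELDS = ["backfill_key", "backfill_secret"].freeze

def state_machine_for(field, supports_webhooks:, supports_backfill:)
  case field
  when *WEBHOOK_FIELDS
    # If webhooks aren't supported, the backfill machine may be using this field.
    supports_webhooks ? :calculate_webhook_state_machine : :calculate_backfill_state_machine
  when *BACKFILL_FIELDS
    # If backfilling isn't supported, the create machine may be using these fields.
    supports_backfill ? :calculate_backfill_state_machine : :calculate_webhook_state_machine
  else
    raise ArgumentError, "Field '#{field}' is not valid for a state change"
  end
end

p state_machine_for("webhook_secret", supports_webhooks: true, supports_backfill: true)
# => :calculate_webhook_state_machine
```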
#process_webhooks_synchronously? ⇒ Boolean
Return true if the service should process webhooks in the actual endpoint, rather than asynchronously through the job system. This should ONLY be used where we have important order-of-operations in webhook processing and/or need to return data to the webhook sender.
NOTE: You MUST implement synchronous_processing_response_body if this returns true.
# File 'lib/webhookdb/replicator/base.rb', line 56

def process_webhooks_synchronously?
  return false
end
#qualified_table_sequel_identifier(schema: nil, table: nil) ⇒ Sequel::SQL::QualifiedIdentifier
Return a Sequel identifier using schema_and_table_symbols, or schema or table as overrides if given.
# File 'lib/webhookdb/replicator/base.rb', line 92

def qualified_table_sequel_identifier(schema: nil, table: nil)
  sch, tbl = self.schema_and_table_symbols
  return Sequel[schema || sch][table || tbl]
end
#readonly_dataset(**kw) ⇒ Sequel::Dataset
Yield to a dataset using the readonly connection.
# File 'lib/webhookdb/replicator/base.rb', line 890

def readonly_dataset(**kw, &)
  self.with_dataset(self.service_integration.organization.readonly_connection_url_raw, **kw, &)
end
#remote_key_column ⇒ Webhookdb::DBAdapter::Column
# File 'lib/webhookdb/replicator/base.rb', line 408

def remote_key_column
  return self._remote_key_column.to_dbadapter(unique: true, nullable: false)
end
#requires_sequence? ⇒ Boolean
Some integrations require sequences, like when upserting rows with numerical unique ids (if they were random values like UUIDs we could generate them and not use a sequence). In those cases, the integrations can mark themselves as requiring a sequence.
The sequence will be created in the *application database*, but it is used primarily when inserting rows into the *organization/replication database*. This is necessary because things like sequences are not possible to migrate when moving replication databases.
# File 'lib/webhookdb/replicator/base.rb', line 620

def requires_sequence?
  return false
end
#resource_name_plural ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 44

def resource_name_plural
  return @resource_name_plural ||= self.descriptor.resource_name_plural
end
#resource_name_singular ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 40

def resource_name_singular
  return @resource_name_singular ||= self.descriptor.resource_name_singular
end
#schema_and_table_symbols ⇒ Array<Symbol>
Return a tuple of (schema, table) based on the organization’s replication schema, and the service integration’s table name.
# File 'lib/webhookdb/replicator/base.rb', line 82

def schema_and_table_symbols
  sch = self.service_integration.organization&.replication_schema&.to_sym || :public
  tbl = self.service_integration.table_name.to_sym
  return [sch, tbl]
end
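A runnable sketch of the resolution above (and of how `qualified_table_sequel_identifier` combines the two symbols), with the Sequel dependency stubbed out as plain SQL text. The organization schema and table values are hypothetical.

```ruby
# Sketch of schema/table resolution: the replication schema falls back to
# :public when the organization has not set one; the table name comes from
# the service integration.
def schema_and_table_symbols(replication_schema, table_name)
  sch = replication_schema&.to_sym || :public
  [sch, table_name.to_sym]
end

# Sequel[sch][tbl] renders as a dotted, quoted identifier; shown here as SQL text.
def qualified_identifier_sql(sch, tbl)
  %("#{sch}"."#{tbl}")
end

sch, tbl = schema_and_table_symbols(nil, "stripe_charge_v1_fixture")
puts qualified_identifier_sql(sch, tbl)
# prints "public"."stripe_charge_v1_fixture"
```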
#storable_columns ⇒ Array<Webhookdb::DBAdapter::Column>
Denormalized columns, plus the enrichment column if supported. Does not include the data or external id columns, though perhaps it should.
# File 'lib/webhookdb/replicator/base.rb', line 439

def storable_columns
  cols = self.denormalized_columns
  if (enr = self.enrichment_column)
    cols << enr
  end
  return cols
end
#synchronous_processing_response_body(upserted:, request:) ⇒ String
Call with the value that was inserted by synchronous processing. Takes the row values being upserted (the result of upsert_webhook), and the arguments used to upsert it (the arguments to upsert_webhook), and should return the body string to respond back with.
# File 'lib/webhookdb/replicator/base.rb', line 68

def synchronous_processing_response_body(upserted:, request:)
  return {message: "process synchronously"}.to_json if Webhookdb::Replicator.always_process_synchronously
  raise NotImplementedError, "must be implemented if process_webhooks_synchronously? is true"
end
#timestamp_column ⇒ Webhookdb::DBAdapter::Column
Column to use as the ‘timestamp’ for the row. This is usually some created or updated at timestamp.
# File 'lib/webhookdb/replicator/base.rb', line 450

def timestamp_column
  got = self._denormalized_columns.find { |c| c.name == self._timestamp_column_name }
  raise NotImplementedError, "#{self.descriptor.name} has no timestamp column #{self._timestamp_column_name}" if got.nil?
  return got.to_dbadapter
end
#upsert_has_deps? ⇒ Boolean
Return true if the integration requires making an API call to upsert. This puts the sync into a lower-priority queue so it is less likely to block other processing. This is usually true if enrichments are involved.
# File 'lib/webhookdb/replicator/base.rb', line 744

def upsert_has_deps?
  return false
end
#upsert_webhook(request, **kw) ⇒ Object
Upsert a webhook request into the database. Note this is a WebhookRequest, NOT a Rack::Request.
# File 'lib/webhookdb/replicator/base.rb', line 652

def upsert_webhook(request, **kw)
  return self._upsert_webhook(request, **kw)
rescue StandardError => e
  self.logger.error("upsert_webhook_error", request: request.as_json, error: e)
  raise
end
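The method above logs the failing request and error, then re-raises so the caller (usually the job system) still sees the failure and can retry. A self-contained sketch of that log-and-reraise pattern, with a hypothetical stand-in for `_upsert_webhook`:

```ruby
require "logger"
require "json"

# Sketch of the log-and-reraise pattern in upsert_webhook: failures are
# logged with the request payload for debugging, then re-raised unchanged.
# The body-validation step is a hypothetical stand-in for _upsert_webhook.
LOGGER = Logger.new($stdout)

def upsert_webhook(request)
  raise ArgumentError, "body must be a Hash" unless request[:body].is_a?(Hash)
  request[:body]
rescue StandardError => e
  # Log context for debugging, then let the caller see the original error.
  LOGGER.error("upsert_webhook_error request=#{request.to_json} error=#{e.class}")
  raise
end

p upsert_webhook({body: {"id" => "evt_1"}})
# => {"id"=>"evt_1"}
```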
#upsert_webhook_body(body, **kw) ⇒ Object
Upsert webhook using only a body. This is not valid for the rare integration that relies on request info beyond the body, such as when we must take a different action based on the request method.
# File 'lib/webhookdb/replicator/base.rb', line 644

def upsert_webhook_body(body, **kw)
  return self.upsert_webhook(Webhookdb::Replicator::WebhookRequest.new(body:), **kw)
end
#verify_backfill_credentials ⇒ Webhookdb::CredentialVerificationResult
Try to verify backfill credentials, by fetching the first page of items. Only relevant for integrations supporting backfilling.
If an error is received, return `_verify_backfill_<http status>_err_msg` as the error message, if defined. So for example, a 401 will call the method _verify_backfill_401_err_msg if defined. If such a method is not defined, call and return _verify_backfill_err_msg.
# File 'lib/webhookdb/replicator/base.rb', line 948

def verify_backfill_credentials
  backfiller = self._backfillers.first
  if backfiller.nil?
    # If for some reason we do not have a backfiller,
    # we can't verify credentials. This should never happen in practice,
    # because we wouldn't call this method if the integration doesn't support it.
    raise "No backfiller available for #{self.service_integration.inspect}"
  end
  begin
    # begin backfill attempt but do not return backfill result
    backfiller.fetch_backfill_page(nil, last_backfilled: nil)
  rescue Webhookdb::Http::Error => e
    msg = if self.respond_to?(:"_verify_backfill_#{e.status}_err_msg")
            self.send(:"_verify_backfill_#{e.status}_err_msg")
          else
            self._verify_backfill_err_msg
          end
    return CredentialVerificationResult.new(verified: false, message: msg)
  rescue TypeError, NoMethodError => e
    # if we don't incur an HTTP error, but do incur an Error due to differences in the shapes of anticipated
    # response data in the `fetch_backfill_page` function, we can assume that the credentials are okay
    self.logger.info "verify_backfill_credentials_expected_failure", error: e
    return CredentialVerificationResult.new(verified: true, message: "")
  end
  return CredentialVerificationResult.new(verified: true, message: "")
end
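The rescue branch above picks an error message via `respond_to?`/`send` on a dynamically built method name, falling back to the generic message. A self-contained sketch of that dispatch, using a hypothetical `Verifier` stand-in for a replicator:

```ruby
# Sketch of the status-specific error-message dispatch used above: given an
# HTTP status, prefer a _verify_backfill_<status>_err_msg method when the
# object defines one, else fall back to _verify_backfill_err_msg.
# The Verifier class and its messages are hypothetical.
class Verifier
  def _verify_backfill_401_err_msg
    "Those credentials were rejected; check your API key."
  end

  def _verify_backfill_err_msg
    "Could not verify credentials."
  end

  def error_message_for_status(status)
    meth = :"_verify_backfill_#{status}_err_msg"
    return self.send(meth) if self.respond_to?(meth)
    self._verify_backfill_err_msg
  end
end

v = Verifier.new
puts v.error_message_for_status(401) # status-specific message
puts v.error_message_for_status(503) # generic fallback
```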
#webhook_endpoint ⇒ Object
# File 'lib/webhookdb/replicator/base.rb', line 1162

def webhook_endpoint
  return self._webhook_endpoint
end
#webhook_response(request) ⇒ Webhookdb::WebhookResponse
Given a Rack request, return the webhook response object. Usually this performs verification of the request based on the webhook secret configured on the service integration. Note that if skip_webhook_verification is true on the service integration, this method always returns 201.
# File 'lib/webhookdb/replicator/base.rb', line 122

def webhook_response(request)
  return Webhookdb::WebhookResponse.ok(status: 201) if self.service_integration.skip_webhook_verification
  return self._webhook_response(request)
end