Module: Chewy::Index::Import::ClassMethods

Defined in:
lib/chewy/index/import.rb

Instance Method Details

#bulk(**options) ⇒ Hash

Wraps the Elasticsearch bulk API method and adds features such as bulk_size and suffix.

Parameters:

  • options (Hash{Symbol => Object})

    besides specific import options, it accepts all the options suitable for the bulk API call like refresh or timeout

Options Hash (**options):

  • suffix (String)

    an index name suffix, used for zero-downtime reset mostly, no suffix by default

  • bulk_size (Integer)

    bulk API chunk size in bytes; if passed, the request is performed several times for each chunk, empty by default

  • body (Array<Hash>)

    elasticsearch API bulk method body

Returns:

  • (Hash)

    tricky transposed errors hash, empty if everything is fine
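
The "transposed" shape can be pictured with a minimal sketch. It assumes failed bulk items arrive as one hash per item and are regrouped as action => error => ids; the helper name transpose_errors and the sample error strings are hypothetical and not taken from the library.

```ruby
# Hypothetical helper (assumed shape): regroups failed bulk items
# from one-hash-per-item into action => error => [ids].
def transpose_errors(error_items)
  error_items.each_with_object({}) do |item, result|
    action = item.keys.first.to_sym
    details = item.values.first

    result[action] ||= {}
    (result[action][details['error']] ||= []) << details['_id']
  end
end

# Sample failed items, as a bulk response might report them (assumed format).
def sample_error_items
  [
    { 'index'  => { '_id' => '1', 'error' => 'mapper_parsing_exception' } },
    { 'index'  => { '_id' => '2', 'error' => 'mapper_parsing_exception' } },
    { 'delete' => { '_id' => '3', 'error' => 'version_conflict' } }
  ]
end

p transpose_errors(sample_error_items)
```

With no failed items the helper returns an empty hash, which matches the "empty if everything is fine" contract described above.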

# File 'lib/chewy/index/import.rb', line 104

def bulk(**options)
  error_items = BulkRequest.new(self, **options).perform(options[:body])
  Chewy.wait_for_status

  payload_errors(error_items)
end
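
To illustrate what bulk_size implies, here is a rough sketch of splitting a bulk body into chunks by serialized byte size. It is not the BulkRequest implementation; chunk_bulk_body is a hypothetical helper, and it assumes each body entry is serialized as a single JSON line.

```ruby
require 'json'

# Hypothetical helper: packs serialized entries into chunks whose size
# does not exceed bulk_size bytes. An entry larger than bulk_size still
# gets a chunk of its own, since it cannot be split.
def chunk_bulk_body(body, bulk_size)
  chunks = []
  current = +''

  body.each do |entry|
    line = JSON.generate(entry) + "\n"
    # Start a new chunk when adding this line would overflow the current one.
    if !current.empty? && current.bytesize + line.bytesize > bulk_size
      chunks << current
      current = +''
    end
    current << line
  end

  chunks << current unless current.empty?
  chunks
end

body = (1..5).map { |i| { index: { _id: i, data: { name: "user#{i}" } } } }
chunk_bulk_body(body, 100).each_with_index do |chunk, i|
  puts "request #{i + 1}: #{chunk.lines.size} entries, #{chunk.bytesize} bytes"
end
```

Each chunk would then be sent as a separate bulk request, which is why the option description says the request is performed several times.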

#compose(object, crutches = nil, fields: []) ⇒ Hash

Composes a single document from the passed object. Uses either witchcraft or normal composing under the hood.

Parameters:

  • object (Object)

    a data source object

  • crutches (Object) (defaults to: nil)

    optional crutches object; if omitted, a crutch for the single passed object is created as a fallback

  • fields (Array<Symbol>) (defaults to: [])

    an array of fields to restrict the generated document

Returns:

  • (Hash)

    a JSON-ready hash



# File 'lib/chewy/index/import.rb', line 118

def compose(object, crutches = nil, fields: [])
  crutches ||= Chewy::Index::Crutch::Crutches.new self, [object]

  if witchcraft? && root.children.present?
    cauldron(fields: fields).brew(object, crutches)
  else
    root.compose(object, crutches, fields: fields)
  end
end
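
The effect of the fields option can be sketched without the library. The example below assumes that an empty fields list means "all fields"; FIELD_DEFINITIONS and compose_document are hypothetical names used only for illustration.

```ruby
# Hypothetical field definitions: field name => lambda extracting the value.
FIELD_DEFINITIONS = {
  name:  ->(user) { user[:name] },
  email: ->(user) { user[:email].downcase },
  age:   ->(user) { user[:age] }
}.freeze

# Builds a JSON-ready hash from the object; when fields is non-empty,
# only the listed fields are composed (assumption: empty means all).
def compose_document(object, fields: [])
  selected = fields.empty? ? FIELD_DEFINITIONS : FIELD_DEFINITIONS.slice(*fields.map(&:to_sym))

  selected.each_with_object({}) do |(name, extractor), document|
    document[name.to_s] = extractor.call(object)
  end
end

user = { name: 'Ann', email: 'ANN@Example.com', age: 30 }
p compose_document(user)
p compose_document(user, fields: [:email])
```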

#import(*collection, **options) ⇒ true, false

One of the main methods of an index. Imports objects into the index and handles all the object-processing routines. Documents are imported through the bulk API; the bulk size and the object batch size are controlled by the corresponding options.

It accepts ORM/ODM objects, POROs, hashes, and ids; depending on the adapter in use, ids are used to fetch the objects from the source. Passed objects are deleted from the index if they are not in the default scope or are marked for destruction.

It handles parent-child relationships defined with a join field, reindexing children when the parent is reindexed.

Performs journaling if enabled: it stores the ids of all imported objects in a specialized index. A particular import can be replayed later to restore data consistency.

Performs a partial index update using the update bulk action if any fields are specified. Note that if a document doesn't exist yet, ES raises an error, but import catches such errors and performs full indexing for the corresponding documents. This feature can be disabled by setting update_failover to false.
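
The failover behaviour described above can be sketched with an in-memory stand-in. This is an illustration of the idea only, not the library's code: FakeIndex and partial_import are hypothetical, and objects are assumed to be plain hashes with an :id key.

```ruby
# In-memory stand-in for an index (hypothetical, for illustration only).
class FakeIndex
  attr_reader :docs

  def initialize(docs = {})
    @docs = docs
  end

  # Full indexing: replaces the whole document.
  def index(id, doc)
    @docs[id] = doc
  end

  # Partial update: returns an error string when the document is missing,
  # nil on success.
  def update(id, partial)
    return 'document_missing' unless @docs.key?(id)

    @docs[id] = @docs[id].merge(partial)
    nil
  end
end

# Returns the ids that could not be imported.
def partial_import(index, objects, update_fields:, update_failover: true)
  # First pass: partial updates, keeping the objects whose update failed.
  failed = objects.reject do |object|
    index.update(object[:id], object.slice(*update_fields)).nil?
  end
  return failed.map { |object| object[:id] } unless update_failover

  # Second pass: full indexing for documents that could not be updated.
  failed.each do |object|
    index.index(object[:id], object.reject { |key, _| key == :id })
  end
  []
end

index = FakeIndex.new({ 1 => { name: 'Old', email: 'a@example.com' } })
objects = [
  { id: 1, name: 'Ann', email: 'ignored@example.com' },
  { id: 2, name: 'Bob', email: 'b@example.com' }
]
p partial_import(index, objects, update_fields: [:name])
p index.docs
```

With update_failover: false the second pass is skipped and the ids of the missing documents are reported instead.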

Utilizes ActiveSupport::Notifications, so it is possible to get imported objects later by listening to the import_objects.chewy queue. It is also possible to get the list of occurred errors from the payload if something went wrong.

Import can also be run in parallel using the Parallel gem functionality.

Examples:

UsersIndex.import(parallel: true) # imports everything in parallel with automatic workers number
UsersIndex.import(parallel: 3) # using 3 workers
UsersIndex.import(parallel: {in_threads: 10}) # in 10 threads

Parameters:

  • collection (Array<Object>)

    an array or anything to import

  • options (Hash{Symbol => Object})

    besides specific import options, it accepts all the options suitable for the bulk API call like refresh or timeout

Options Hash (**options):

  • suffix (String)

    an index name suffix, used for zero-downtime reset mostly, no suffix by default

  • bulk_size (Integer)

    bulk API chunk size in bytes; if passed, the request is performed several times for each chunk, empty by default

  • batch_size (Integer)

    passed to the adapter import method, used to split imported objects in chunks, 1000 by default

  • direct_import (Boolean)

    skips object reloading in ORM adapter, false by default

  • journal (true, false)

    enables imported objects journaling, false by default

  • update_fields (Array<Symbol, String>)

    list of fields for the partial import, empty by default

  • update_failover (true, false)

    enables full objects reimport in cases of partial update errors, true by default

  • parallel (true, Integer, Hash)

    enables parallel import processing with the Parallel gem, accepts the number of workers or any Parallel gem acceptable options

Returns:

  • (true, false)

    false in case of errors

# File 'lib/chewy/index/import.rb', line 75

def import(*args)
  intercept_import_using_strategy(*args).blank?
end

#import!(*collection, **options) ⇒ Object

(see #import)

The only difference from #import is that it raises an exception in case of any import errors.

Raises:

  • (Chewy::ImportFailed)

    in case of any import errors



# File 'lib/chewy/index/import.rb', line 86

def import!(*args)
  errors = intercept_import_using_strategy(*args)

  raise Chewy::ImportFailed.new(self, errors) if errors.present?

  true
end