Class: Mindee::Client

Inherits:
Object
Defined in:
lib/mindee/client.rb

Overview

Mindee API Client. See: https://developers.mindee.com/docs

Constructor Details

#initialize(api_key: '') ⇒ Client

Returns a new instance of Client.

Parameters:

  • api_key (String) (defaults to: '')


# File 'lib/mindee/client.rb', line 13

def initialize(api_key: '')
  @api_key = api_key
end

Instance Method Details

#create_endpoint(endpoint_name: '', account_name: '', version: '') ⇒ Mindee::HTTP::Endpoint

Creates a custom endpoint with the given values. Do not set for standard (off the shelf) endpoints.

Parameters:

  • endpoint_name (String) (defaults to: '')

    For custom endpoints, the "API name" field in the "Settings" page of the API Builder. Do not set for standard (off the shelf) endpoints.

  • account_name (String) (defaults to: '')

    For custom endpoints, your account or organization username on the API Builder. This is normally not required unless you have a custom endpoint which has the same name as a standard (off the shelf) endpoint.

  • version (String) (defaults to: '')

    For custom endpoints, version of the product

Returns:

  • (Mindee::HTTP::Endpoint)



# File 'lib/mindee/client.rb', line 262

def create_endpoint(endpoint_name: '', account_name: '', version: '')
  initialize_endpoint(Mindee::Product::Custom::CustomV1, endpoint_name: endpoint_name, account_name: account_name,
                                                         version: version)
end

#enqueue(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueues a document for asynchronous parsing.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn't need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.
    • :operation Operation to apply on the document, given the :page_indexes specified:
      • :KEEP_ONLY - keep only the specified pages, and remove all others.
      • :REMOVE - remove the specified pages, and keep all others.
    • :on_min_pages Apply the operation only if document has at least this many pages.
  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 93

def enqueue(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict_async(input_source, all_words, full_text, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class,
                                           prediction, raw_http)
end
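
The page_options hash described above can be read as a small selection rule over page indexes. The following is a minimal sketch of that rule as documented, not the library's own PDF-processing code; the helper name pages_after_operation and the sample hashes are hypothetical and exist only for illustration.

```ruby
# Illustrative only: mirrors the documented meaning of page_options.
# It does not touch a PDF; it just computes which page indexes would remain.
def pages_after_operation(page_count, page_options)
  all_pages = (0...page_count).to_a
  # Skip the operation when the document is shorter than :on_min_pages.
  return all_pages if page_count < page_options.fetch(:on_min_pages, 0)

  # Ignore indexes that do not exist in the document.
  selected = page_options.fetch(:page_indexes).select { |i| all_pages.include?(i) }
  case page_options.fetch(:operation)
  when :KEEP_ONLY then selected
  when :REMOVE then all_pages - selected
  else raise ArgumentError, "unknown operation: #{page_options[:operation]}"
  end
end

# Keep only the first two pages, provided the document has at least two.
keep_first_two = { page_indexes: [0, 1], operation: :KEEP_ONLY, on_min_pages: 2 }
# Drop the cover page, provided the document has at least two pages.
drop_cover = { page_indexes: [0], operation: :REMOVE, on_min_pages: 2 }
```

A hash shaped like keep_first_two or drop_cover is what would be passed as page_options: to enqueue, enqueue_and_parse, or parse.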

#enqueue_and_parse(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false, initial_delay_sec: 2, delay_sec: 1.5, max_retries: 60) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueues a document for asynchronous parsing, then polls for the result until the job is completed or max_retries is reached.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    class of the product

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn't need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.
    • :operation Operation to apply on the document, given the :page_indexes specified:
      • :KEEP_ONLY - keep only the specified pages, and remove all others.
      • :REMOVE - remove the specified pages, and keep all others.
    • :on_min_pages Apply the operation only if document has at least this many pages.
  • cropper (Boolean, nil) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

  • initial_delay_sec (Integer, Float) (defaults to: 2)

    Initial delay, in seconds, before the first polling attempt.

  • delay_sec (Integer, Float) (defaults to: 1.5)

    Delay, in seconds, between polling attempts.

  • max_retries (Integer) (defaults to: 60)

    Maximum number of polling attempts before a timeout error is raised.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 156

def enqueue_and_parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false,
  initial_delay_sec: 2,
  delay_sec: 1.5,
  max_retries: 60
)
  enqueue_res = enqueue(
    input_source,
    product_class,
    endpoint: endpoint,
    all_words: all_words,
    full_text: full_text,
    close_file: close_file,
    page_options: page_options,
    cropper: cropper
  )
  sleep(initial_delay_sec)
  polling_attempts = 1
  job_id = enqueue_res.job.id
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  while queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED && polling_attempts < max_retries
    sleep(delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    polling_attempts += 1
  end
  if queue_res.job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = initial_delay_sec + (polling_attempts * delay_sec)
    raise "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
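
To make the interaction of initial_delay_sec, delay_sec, and max_retries easier to follow, here is a minimal offline sketch shaped like the loop above. The helper poll_until_completed, the fetch_status callable, and the injectable sleeper are hypothetical stand-ins and are not part of the Mindee client; they replace the API calls so the sketch runs without network access.

```ruby
# Generic polling helper shaped like the loop in enqueue_and_parse.
# fetch_status and sleeper are hypothetical stand-ins for illustration.
def poll_until_completed(fetch_status, initial_delay_sec: 2, delay_sec: 1.5,
                         max_retries: 60, sleeper: ->(seconds) { sleep(seconds) })
  sleeper.call(initial_delay_sec)
  attempts = 1
  status = fetch_status.call
  # Keep polling until the job completes or the attempt budget is used up.
  while status != :completed && attempts < max_retries
    sleeper.call(delay_sec)
    status = fetch_status.call
    attempts += 1
  end
  raise "timed out after #{attempts} tries" unless status == :completed

  attempts
end

# Fake job that reports completion on the third poll.
statuses = %i[waiting processing completed]
slept = [] # records the requested delays instead of actually sleeping
attempts = poll_until_completed(-> { statuses.shift }, sleeper: ->(seconds) { slept << seconds })
```

With the default values, a job that never completes is polled 60 times: one initial wait of 2 seconds followed by 59 waits of 1.5 seconds, so roughly 90 seconds of sleeping plus request time before the error is raised.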

#load_prediction(product_class, local_response) ⇒ Mindee::Parsing::Common::ApiResponse

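Loads a prediction from a local response, without calling the API.

Parameters:

  • product_class (Mindee::Inference)

    Class of the product.

  • local_response

    Local response to load. It is read through its as_hash method.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)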



# File 'lib/mindee/client.rb', line 203

def load_prediction(product_class, local_response)
  Mindee::Parsing::Common::ApiResponse.new(product_class, local_response.as_hash, local_response.as_hash.to_json)
rescue KeyError
  raise 'No prediction found in local response.'
end

#parse(input_source, product_class, endpoint: nil, all_words: false, full_text: false, close_file: true, page_options: nil, cropper: false) ⇒ Mindee::Parsing::Common::ApiResponse

Calls the prediction API on a document and parses the results.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource)
  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint) (defaults to: nil)

    Endpoint of the API. Doesn't need to be set in the case of OTS APIs.

  • all_words (Boolean) (defaults to: false)

    Whether to include the full text for each page. This performs a full OCR operation on the server and will increase response time.

  • full_text (Boolean) (defaults to: false)

    Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.

  • close_file (Boolean) (defaults to: true)

    Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.

  • page_options (Hash, nil) (defaults to: nil)

    Page cutting/merge options:

    • :page_indexes Zero-based list of page indexes.
    • :operation Operation to apply on the document, given the :page_indexes specified:
      • :KEEP_ONLY - keep only the specified pages, and remove all others.
      • :REMOVE - remove the specified pages, and keep all others.
    • :on_min_pages Apply the operation only if document has at least this many pages.
  • cropper (Boolean) (defaults to: false)

    Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 46

def parse(
  input_source,
  product_class,
  endpoint: nil,
  all_words: false,
  full_text: false,
  close_file: true,
  page_options: nil,
  cropper: false
)
  if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
    input_source.process_pdf(page_options)
  end
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.predict(input_source, all_words, full_text, close_file, cropper)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#parse_queued(job_id, product_class, endpoint: nil) ⇒ Mindee::Parsing::Common::ApiResponse

Parses a queued document.

Parameters:

  • job_id (String)

    ID of the job (queue) to poll from.

  • product_class (Mindee::Inference)

    Class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn't need to be set in the case of OTS APIs.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 120

def parse_queued(
  job_id,
  product_class,
  endpoint: nil
)
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  prediction, raw_http = endpoint.parse_async(job_id)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http)
end

#source_from_b64string(base64_string, filename, fix_pdf: false) ⇒ Mindee::Input::Source::Base64InputSource

Load a document from a base64 encoded string.

Parameters:

  • base64_string (String)

    Input to parse as base64 string

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    Attempts to fix a broken PDF if true.

Returns:

  • (Mindee::Input::Source::Base64InputSource)



# File 'lib/mindee/client.rb', line 231

def source_from_b64string(base64_string, filename, fix_pdf: false)
  Input::Source::Base64InputSource.new(base64_string, filename, fix_pdf: fix_pdf)
end
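
As a hedged illustration of preparing the base64_string argument, the sketch below encodes raw bytes with Ruby's standard Base64 module. It assumes that unwrapped ("strict") base64 is an acceptable input, which this page does not state; the sample bytes and the filename in the comment are placeholders.

```ruby
require 'base64'

# Placeholder content standing in for the bytes of a real document.
raw_bytes = '%PDF-1.4 example bytes'.b

# strict_encode64 produces a single line with no embedded newlines.
base64_string = Base64.strict_encode64(raw_bytes)

# With the gem loaded, the string would be passed as the first argument, e.g.:
#   input_source = client.source_from_b64string(base64_string, 'document.pdf')

# Round-trip check: decoding yields the original bytes.
decoded = Base64.strict_decode64(base64_string)
```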

#source_from_bytes(input_bytes, filename, fix_pdf: false) ⇒ Mindee::Input::Source::BytesInputSource

Load a document from raw bytes.

Parameters:

  • input_bytes (String)

    Encoding::BINARY byte input

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    Attempts to fix a broken PDF if true.

Returns:

  • (Mindee::Input::Source::BytesInputSource)



# File 'lib/mindee/client.rb', line 222

def source_from_bytes(input_bytes, filename, fix_pdf: false)
  Input::Source::BytesInputSource.new(input_bytes, filename, fix_pdf: fix_pdf)
end
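
The input_bytes parameter is documented as Encoding::BINARY input. One way to obtain such a string, sketched here with a temporary file so it runs anywhere, is File.binread; the file contents and the filename are placeholders.

```ruby
require 'tempfile'

input_bytes = nil
Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write("%PDF-1.4\n") # placeholder content, not a valid PDF
  file.flush
  # File.binread returns a String tagged as Encoding::BINARY (ASCII-8BIT).
  input_bytes = File.binread(file.path)
end
filename = 'sample.pdf'

# With the gem loaded, these would be passed to the client, e.g.:
#   input_source = client.source_from_bytes(input_bytes, filename)
```

A string read in text mode can be retagged with String#b, which returns a copy in binary encoding.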

#source_from_file(input_file, filename, fix_pdf: false) ⇒ Mindee::Input::Source::FileInputSource

Load a document from a normal Ruby File.

Parameters:

  • input_file (File)

    Input file handle

  • filename (String)

    The name of the file (without the path)

  • fix_pdf (Boolean) (defaults to: false)

    Attempts to fix a broken PDF if true.

Returns:

  • (Mindee::Input::Source::FileInputSource)



# File 'lib/mindee/client.rb', line 240

def source_from_file(input_file, filename, fix_pdf: false)
  Input::Source::FileInputSource.new(input_file, filename, fix_pdf: fix_pdf)
end

#source_from_path(input_path, fix_pdf: false) ⇒ Mindee::Input::Source::PathInputSource

Load a document from an absolute path, as a string.

Parameters:

  • input_path (String)

    Path of file to open

  • fix_pdf (Boolean) (defaults to: false)

    Attempts to fix a broken PDF if true.

Returns:

  • (Mindee::Input::Source::PathInputSource)



# File 'lib/mindee/client.rb', line 213

def source_from_path(input_path, fix_pdf: false)
  Input::Source::PathInputSource.new(input_path, fix_pdf: fix_pdf)
end

#source_from_url(url) ⇒ Mindee::Input::Source::UrlInputSource

Load a document from a secure remote source (HTTPS).

Parameters:

  • url (String)

    URL of the file.

Returns:

  • (Mindee::Input::Source::UrlInputSource)



# File 'lib/mindee/client.rb', line 247

def source_from_url(url)
  Input::Source::UrlInputSource.new(url)
end
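
Since the description specifies a secure (HTTPS) source, a caller may want to check the scheme before building the input source. The helper below is a hypothetical pre-check written with Ruby's standard URI module; it is not part of the Mindee client, and this page does not say how the library itself reacts to a non-HTTPS URL.

```ruby
require 'uri'

# Hypothetical pre-check: true only when the string parses as a URI
# with the https scheme.
def https_url?(url)
  URI.parse(url).is_a?(URI::HTTPS)
rescue URI::InvalidURIError
  false
end

candidate = 'https://example.com/files/invoice.pdf'

# With the gem loaded, a passing URL would then be handed to the client:
#   input_source = client.source_from_url(candidate) if https_url?(candidate)
```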