Class: ReductoAI::Resources::Extract

Inherits:
Object
Defined in:
lib/reducto_ai/resources/extract.rb

Overview

Note:

Extraction operations consume credits based on document complexity and schema size.

Extract resource for structured data extraction.

Extracts specific information from documents based on a schema or instructions. Returns structured JSON data matching the provided schema.

Examples:

Extract with schema

client = ReductoAI::Client.new
schema = {
  invoice_number: "string",
  total_amount: "number",
  line_items: ["object"]
}

result = client.extract.sync(
  input: "https://example.com/invoice.pdf",
  instructions: schema
)
puts result["result"]

Constructor Details

#initialize(client) ⇒ Extract

This method is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

Returns a new instance of Extract.

Parameters:

  • client (Client)

    the Reducto API client



# File 'lib/reducto_ai/resources/extract.rb', line 29

def initialize(client)
  @client = client
end

Instance Method Details

#async(input:, instructions:, async: nil, **options) ⇒ Hash

Extracts structured data from a document asynchronously.

Returns immediately with a job_id. Poll with Jobs#retrieve to get results.

Examples:

Start async extraction

job = client.extract.async(
  input: "https://example.com/contract.pdf",
  instructions: { parties: ["string"], terms: "string" }
)
job_id = job["job_id"]
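
The polling step described above can be sketched as a small helper. This is a minimal sketch, not part of the gem: `wait_for_job` is a hypothetical name, the block stands in for whatever call fetches the job hash (for example Jobs#retrieve), and treating any status other than "processing" as final is an assumption.

```ruby
# Hypothetical polling helper (not part of the gem).
# The block receives the job_id and must return the current job hash.
# Assumption: any status other than "processing" means the job is finished.
def wait_for_job(job_id, interval: 2.0, max_attempts: 30)
  max_attempts.times do
    job = yield(job_id)
    return job unless job["status"] == "processing"

    sleep(interval)
  end
  raise "job #{job_id} was still processing after #{max_attempts} attempts"
end

# Demo with a fake fetcher that reports "processing" twice, then "succeeded".
fake_statuses = ["processing", "processing", "succeeded"]
finished = wait_for_job("demo-job", interval: 0) do |id|
  { "job_id" => id, "status" => fake_statuses.shift }
end
puts finished["status"]
```

With the real client, the block would wrap the Jobs#retrieve call. Its accessor and argument names are not shown in this section, so they are left out here rather than guessed.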

Parameters:

  • input (String, Hash)

    Document URL or hash with :url key

  • instructions (Hash, String)

    Extraction schema (same as #sync)

  • async (Boolean, nil) (defaults to: nil)

    Async mode flag; included in the request payload as :async only when not nil

  • options (Hash)

    Additional extraction options

Returns:

  • (Hash)

    Job status with keys:

    • "job_id" [String] - Job identifier for polling
    • "status" [String] - Initial status ("processing")

Raises:

  • (ArgumentError)

    if input is nil, or if instructions are nil or empty
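
To make the ArgumentError conditions concrete, the sketch below copies the guard from the method source on this page into a standalone function. `validate_extract_args!` and `rejected?` are hypothetical names used only for illustration. Note that the guard checks input for nil only, so an empty string passes this check.

```ruby
# Standalone copy of the guard shown in the method source (hypothetical name).
# input is only checked for nil; instructions are checked for nil and emptiness.
def validate_extract_args!(input, instructions)
  raise ArgumentError, "input is required" if input.nil?
  if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
    raise ArgumentError, "instructions are required"
  end

  true
end

# Returns true when the guard raises ArgumentError, false otherwise.
def rejected?(input, instructions)
  validate_extract_args!(input, instructions)
  false
rescue ArgumentError
  true
end

url = "https://example.com/contract.pdf"
puts rejected?(nil, { terms: "string" }) # nil input
puts rejected?(url, nil)                 # nil instructions
puts rejected?(url, {})                  # empty hash
puts rejected?(url, "")                  # empty string
puts rejected?("", { terms: "string" })  # empty input is not caught by this guard
```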

# File 'lib/reducto_ai/resources/extract.rb', line 95

def async(input:, instructions:, async: nil, **options)
  raise ArgumentError, "input is required" if input.nil?
  if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
    raise ArgumentError, "instructions are required"
  end

  payload = build_payload(input, instructions, options)
  payload[:async] = async unless async.nil?

  @client.post("/extract_async", payload)
end

#sync(input:, instructions:, **options) ⇒ Hash

Extracts structured data from a document synchronously.

Examples:

Extract invoice data

result = client.extract.sync(
  input: "https://example.com/invoice.pdf",
  instructions: {
    invoice_number: "string",
    total: "number"
  }
)

Parameters:

  • input (String, Hash)

    Document URL or hash with :url key

  • instructions (Hash, String)

    Extraction schema or instructions. Can be a simple hash (auto-wrapped as { schema: ... }) or a full instructions hash with a :schema key.

  • options (Hash)

    Additional extraction options
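
The auto-wrapping described for instructions can be pictured with the sketch below. It is an assumption-based illustration, not the gem's build_payload (which is not shown on this page): `normalize_instructions` is a hypothetical name, only a symbol :schema key is detected, and non-hash values such as strings are passed through unchanged.

```ruby
# Hypothetical illustration of the auto-wrap rule (not the gem's build_payload).
# Assumptions: only a symbol :schema key marks a "full" instructions hash,
# and non-hash values (e.g. a String) are returned unchanged.
def normalize_instructions(instructions)
  return instructions unless instructions.is_a?(Hash)
  return instructions if instructions.key?(:schema)

  { schema: instructions }
end

simple = { invoice_number: "string", total: "number" }
full   = { schema: simple }

p normalize_instructions(simple)             # bare schema hash gets wrapped
p normalize_instructions(full).equal?(full)  # already-wrapped hash is returned as-is
```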

Returns:

  • (Hash)

    Extraction results with keys:

    • "job_id" [String] - Job identifier
    • "status" [String] - Job status ("succeeded")
    • "result" [Hash] - Extracted data matching schema
    • "usage" [Hash] - Credit usage details
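
A short sketch of reading these keys from a response follows. `extracted_data` is a hypothetical helper, and the sample hash contains made-up placeholder values shaped like the keys listed above; it is not real API output, and the string keys inside "result" are an assumption.

```ruby
# Hypothetical helper: returns the extracted data from a sync response,
# or raises if the response does not report the "succeeded" status.
def extracted_data(response)
  status = response["status"]
  unless status == "succeeded"
    raise "extraction did not succeed (status: #{status.inspect})"
  end

  response.fetch("result")
end

# Placeholder response with made-up values, for illustration only.
sample = {
  "job_id" => "job-123",
  "status" => "succeeded",
  "result" => { "invoice_number" => "INV-001", "total" => 42.5 },
  "usage"  => {}
}

data = extracted_data(sample)
puts data["invoice_number"]
```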

Raises:

  • (ArgumentError)

    if input is nil, or if instructions are nil or empty

  • (ClientError)

    if schema is invalid

  • (ServerError)

    if extraction fails

# File 'lib/reducto_ai/resources/extract.rb', line 61

def sync(input:, instructions:, **options)
  raise ArgumentError, "input is required" if input.nil?
  if instructions.nil? || (instructions.respond_to?(:empty?) && instructions.empty?)
    raise ArgumentError, "instructions are required"
  end

  payload = build_payload(input, instructions, options)
  @client.post("/extract", payload)
end