ReductoAi
Ruby wrapper for the Reducto AI API.
Installation
bundle add reducto_ai
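Or add the gem to your Gemfile directly and run bundle install:
gem "reducto_ai"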
Usage
Configure once:
ReductoAI.configure do |config|
config.api_key = ENV.fetch("REDUCTO_API_KEY")
end
Choosing an action
- Parse: Start here for any document. Converts uploads or URLs into structured chunks and OCR text so later steps can reuse the returned job_id.
- Split: Use after parsing when you need logical sections. Provide split_description names/rules to segment the parsed document into labeled ranges.
- Extract: Run when you need structured answers (fields, JSON). Supply instructions or a schema to pull values from raw input or an existing parse job_id.
- Edit: Generate marked-up PDFs using document_url plus edit_instructions (PDF forms are supported via form_schema).
- Pipeline: Trigger a saved Studio pipeline with input + pipeline_id to orchestrate Parse/Split/Extract/Edit in one call (see the sketch after this list).
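Parse, Split, and Extract are demonstrated throughout this README; Pipeline is not, so here is a minimal sketch. It assumes the pipeline_id mentioned above can be passed through the options of the async variant listed under Async Operations, and "your-studio-pipeline-id" is a placeholder for the ID shown in Reducto Studio:

client = ReductoAI::Client.new

# Trigger a saved Studio pipeline against a document URL.
# The pipeline_id keyword is an assumption based on the description above.
job = client.pipeline.async(
  input: "https://example.com/invoice.pdf",
  pipeline_id: "your-studio-pipeline-id"
)

# Poll like any other async job
result = client.jobs.retrieve(job_id: job["job_id"])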
Async Operations
All resources support async variants that return a job_id for polling:
client = ReductoAI::Client.new
# Start async parse
# API Reference: https://docs.reducto.ai/api-reference/parse-async
job = client.parse.async(input: "https://example.com/large-doc.pdf")
job_id = job["job_id"]
# Response:
# {
# "job_id" => "async-123",
# "status" => "processing"
# }
# Poll for completion
# API Reference: https://docs.reducto.ai/api-reference/get-job
result = client.jobs.retrieve(job_id: job_id)
# Response:
# {
# "job_id" => "async-123",
# "status" => "complete",
# "result" => {...},
# "usage" => {"credits" => 1.0}
# }
# Or configure webhooks for notifications
# API Reference: https://docs.reducto.ai/api-reference/webhook-portal
client.jobs.configure_webhook
Available async methods:
- client.parse.async(input:, **options) - Parse Async API
- client.extract.async(input:, instructions:, **options) - Extract Async API
- client.split.async(input:, **options) - Split Async API
- client.edit.async(input:, instructions:, **options) - Edit Async API
- client.pipeline.async(input:, steps:, **options) - Pipeline Async API
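In practice you poll jobs.retrieve until the status flips to "complete". A minimal helper sketch using only the calls shown above; the interval and timeout values are arbitrary choices, not part of the gem:

# Poll a job until it reports "complete", with a crude timeout guard.
def wait_for_job(client, job_id, interval: 2, timeout: 300)
  deadline = Time.now + timeout
  loop do
    job = client.jobs.retrieve(job_id: job_id)
    return job if job["status"] == "complete"
    raise "Reducto job #{job_id} did not finish within #{timeout}s" if Time.now > deadline
    sleep interval
  end
end

result = wait_for_job(client, job_id)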
Rails
Create config/initializers/reducto_ai.rb:
ReductoAI.configure do |c|
c.api_key = Rails.application.credentials.dig(:reducto, :api_key)
# c.base_url = "https://platform.reducto.ai"
# c.open_timeout = 5; c.read_timeout = 30
end
# Optional: override shared client (multi-tenant or custom timeouts)
# ReductoAI.client = ReductoAI::Client.new(api_key: ..., read_timeout: 10)
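With the initializer in place, any new ReductoAI::Client picks up the shared configuration, so app code stays small. A hypothetical background job for illustration (the Document model and its source_url/reducto_job_id attributes are assumptions, not part of the gem):

class ParseDocumentJob < ApplicationJob
  queue_as :default

  def perform(document)
    # The client reads api_key and timeouts from the initializer above
    client = ReductoAI::Client.new
    parse = client.parse.sync(input: document.source_url)

    # Store the job_id so later split/extract calls can reuse this parse
    document.update!(reducto_job_id: parse["job_id"])
  end
end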
Quick Start
client = ReductoAI::Client.new
# Parse a document
# API Reference: https://docs.reducto.ai/api-reference/parse
parse = client.parse.sync(input: "https://example.com/invoice.pdf")
job_id = parse["job_id"]
# Response:
# {
# "job_id" => "abc-123",
# "status" => "complete",
# "result" => {...}
# }
# Extract structured data
# API Reference: https://docs.reducto.ai/api-reference/extract
extraction = client.extract.sync(
input: job_id,
instructions: {
schema: {
type: "object",
properties: {
invoice_number: { type: "string" },
total_due: { type: "string" }
},
required: ["invoice_number", "total_due"]
}
}
)
# Response:
# {
# "job_id" => "820dca1b-3215-4d24-be09-6494d4c3cd88",
# "usage" => {"num_pages" => 1, "num_fields" => 2, "credits" => 2.0},
# "studio_link" => "https://studio.reducto.ai/job/820dca1b-3215-4d24-be09-6494d4c3cd88",
# "result" => [{"invoice_number" => "INV-2024-001", "total_due" => "$1,234.56"}],
# "citations" => nil
# }
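The extracted fields sit under result[0], so they can be read straight off the response hash:

invoice_number = extraction.dig("result", 0, "invoice_number") # => "INV-2024-001"
total_due = extraction.dig("result", 0, "total_due")           # => "$1,234.56"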
Complete Example: Multi-invoice Processing
client = ReductoAI::Client.new
# 1. Parse the document
# API Reference: https://docs.reducto.ai/api-reference/parse
parse = client.parse.sync(input: "https://example.com/invoices.pdf")
# Response:
# {
# "job_id" => "parse-123",
# "status" => "complete",
# "result" => {...}
# }
# 2. Split into individual invoices
# API Reference: https://docs.reducto.ai/api-reference/split
split = client.split.sync(
input: parse["job_id"],
split_description: [
{
name: "Invoice",
description: "All pages that belong to a single invoice",
partition_key: "invoice_number"
}
],
  split_rules: <<~PROMPT
    The document contains multiple invoices one after another. Each invoice has a unique invoice number formatted like "Invoice #12345" near the top of the first page.
    Segment the document into one partition per invoice. Keep pages contiguous per invoice and include any following appendices until the next invoice number.
    Name each partition using the exact invoice number you detect (e.g., "Invoice #12345").
  PROMPT
)

# Response:
# {
#   "job_id" => "split-456",
#   "result" => {
#     "splits" => [{
#       "name" => "Invoice",
#       "partitions" => [
#         {"name" => "Invoice #12345", "pages" => [0, 1, 2]},
#         {"name" => "Invoice #12346", "pages" => [3, 4]}
#       ]
#     }]
#   }
# }

# 3. Extract data from each invoice
# API Reference: https://docs.reducto.ai/api-reference/extract
invoice_partitions = split.dig("result", "splits").first.fetch("partitions")
invoice_details = invoice_partitions.map do |partition|
  client.extract.sync(
    input: parse["job_id"],
    instructions: {
      schema: {
        type: "object",
        properties: {
          invoice_number: { type: "string" },
          total_due: { type: "string" }
        },
        required: ["invoice_number", "total_due"]
      }
    },
    settings: { page_range: partition["pages"] }
  )
end

# Response per invoice:
# {
#   "job_id" => "extract-789",
#   "result" => [{"invoice_number" => "INV-12345", "total_due" => "$2,500.00"}],
#   "usage" => {"credits" => 2.0}
# }
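Each element of invoice_details is a full extract response, so the per-invoice values can be collected directly:

# Gather the extracted totals from every invoice partition
totals = invoice_details.map { |response| response.dig("result", 0, "total_due") }
# => ["$2,500.00", ...]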
Direct Split Example
Split a multi-invoice PDF directly without pre-parsing:
client = ReductoAI::Client.new
# Split document directly from URL
# API Reference: https://docs.reducto.ai/api-reference/split
response = client.split.sync(
input: { url: "https://example.com/invoices.pdf" },
split_description: [
{
name: "Invoice",
description: "Individual invoices within the document",
partition_key: "invoice_number"
}
]
)
# Response:
# {
# "usage" => {"num_pages" => 2, "credits" => nil},
# "result" => {
# "section_mapping" => nil,
# "splits" => [{
# "name" => "Invoice",
# "pages" => [1, 2],
# "conf" => "high",
# "partitions" => [
# {"name" => "0000569050-001", "pages" => [1], "conf" => "high"},
# {"name" => "0000569050-002", "pages" => [2], "conf" => "high"}
# ]
# }]
# }
# }
# Access partitions
partitions = response.dig("result", "splits").first["partitions"]
# => [{"name"=>"0000569050-001", "pages"=>[1], "conf"=>"high"}, ...]
Document Classification Example
client = ReductoAI::Client.new
# Parse document
# API Reference: https://docs.reducto.ai/api-reference/parse
parse = client.parse.sync(input: "https://example.com/document.pdf")
# Extract with classification
# API Reference: https://docs.reducto.ai/api-reference/extract
extraction = client.extract.sync(
input: parse["job_id"],
instructions: {
schema: {
type: "object",
properties: {
document_type: {
type: "string",
enum: ["invoice", "credit", "debit"],
description: "Document category"
},
document_number: {
type: "string",
description: "Invoice number or equivalent identifier"
}
},
required: ["document_type", "document_number"]
}
},
settings: { citations: { enabled: false } }
)
# Response:
# {
# "job_id" => "class-123",
# "result" => [{"document_type" => "invoice", "document_number" => "INV-2024-001"}],
# "usage" => {"credits" => 2.0}
# }
document_type = extraction.dig("result", 0, "document_type")
document_number = extraction.dig("result", 0, "document_number")
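The classification result can then drive simple routing logic. A sketch with hypothetical handler methods (process_invoice and friends are placeholders, not part of the gem):

case document_type
when "invoice" then process_invoice(document_number)
when "credit"  then process_credit(document_number)
when "debit"   then process_debit(document_number)
else raise "Unexpected document_type: #{document_type}"
end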
API Reference
Full endpoint details live in the Reducto API documentation.
Best Practices: Cost-Efficient Document Processing
Follow these patterns to minimize credit usage when processing documents:
1. Parse Once, Reuse Everywhere
❌ Expensive: Calling extract/split with URLs directly
# DON'T: Each operation parses the document again
extract1 = client.extract.sync(input: url, instructions: schema_a) # Parse + Extract = 2 credits
extract2 = client.extract.sync(input: url, instructions: schema_b) # Parse + Extract = 2 credits
split = client.split.sync(input: url, split_description: [...]) # Parse + Split = 3 credits
# Total: 7 credits for a 1-page document
✅ Cost-efficient: Parse once, reuse job_id
# DO: Parse once, reuse the job_id
parse = client.parse.sync(input: url) # 1 credit
job_id = parse["job_id"]
extract1 = client.extract.sync(input: job_id, instructions: schema_a) # 1 credit
extract2 = client.extract.sync(input: job_id, instructions: schema_b) # 1 credit
split = client.split.sync(input: job_id, split_description: [...]) # 2 credits
# Total: 5 credits for a 1-page document (saved 2 credits)
2. Split Before Extract for Multi-Document Files
✅ Best practice: Split first, then extract per partition
# 1. Parse the document once
parse = client.parse.sync(input: "multi-invoice.pdf") # 1 credit × 10 pages = 10 credits
job_id = parse["job_id"]
# 2. Split into partitions
split = client.split.sync(
input: job_id,
split_description: [{ name: "Invoice", description: "..." }]
) # 2 credits × 10 pages = 20 credits
# 3. Extract only from specific partitions
partitions = split.dig("result", "splits").first["partitions"]
invoices = partitions.map do |partition|
client.extract.sync(
input: job_id,
instructions: { schema: invoice_schema },
settings: { page_range: partition["pages"] } # Extract only relevant pages
)
end # 1 credit × 10 pages = 10 credits
# Total: 40 credits for 10-page document with 5 invoices
3. Use Async for Large Documents
✅ For documents > 10 pages: Use async to avoid timeouts
# Parse async for large files
job = client.parse.async(input: large_pdf_url)
job_id = job["job_id"]
# Poll or use webhooks
loop do
result = client.jobs.retrieve(job_id: job_id)
break if result["status"] == "complete"
sleep 2
end
# Then reuse the job_id for split/extract
split = client.split.sync(input: job_id, split_description: [...])
4. Store and Reuse Parse Results
✅ For repeated processing: Store job_id to avoid re-parsing
# Store the job_id with your document record
document.update(reducto_job_id: parse["job_id"])
# Later: Extract different schemas without re-parsing
result_v1 = client.extract.sync(input: document.reducto_job_id, instructions: schema_v1)
result_v2 = client.extract.sync(input: document.reducto_job_id, instructions: schema_v2)
# Only 2 credits instead of 4
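A small guard method keeps this pattern in one place. A sketch reusing the same hypothetical document record (reducto_job_id and source_url are illustrative attribute names):

# Parse only when no job_id has been stored yet, then reuse it everywhere
def ensure_parsed!(client, document)
  return document.reducto_job_id if document.reducto_job_id.present?

  parse = client.parse.sync(input: document.source_url)
  document.update!(reducto_job_id: parse["job_id"])
  parse["job_id"]
end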
Credit Math Summary
| Operation | Direct URL | With job_id | Savings |
|---|---|---|---|
| Parse | 1 credit/page | N/A | - |
| Extract | 2 credits/page | 1 credit/page | 50% |
| Split | 3 credits/page | 2 credits/page | 33% |
| Multiple extracts (3×) | 6 credits/page | 3 credits/page | 50% |
Golden rule: Always parse once and reuse job_id for all subsequent operations.
Credits & pricing overview
Reducto bills every API call in credits. Current public rates are:
- Parse: 1 credit per standard page (2 for complex VLM-enhanced pages).
- Extract: 2 credits per page (4 if agent-in-loop mode is enabled). Parsing credits are also charged if you don't reuse a previous job_id.
- Split: 2 credits per page when run standalone; free if you supply a prior parse job.
- Edit: 4 credits per page (beta pricing).
You can process ~15k credits/month before overages; additional credits are billed at $0.015 USD each according to Reducto's pricing page.
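As a quick sanity check on the overage math at those rates (the monthly usage figure here is made up):

included_credits = 15_000        # approximate monthly allotment quoted above
used_credits = 20_000            # hypothetical usage for one month
overage = [used_credits - included_credits, 0].max
overage_cost = overage * 0.015   # $0.015 per additional credit
# => 5_000 extra credits x $0.015 = $75.00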
Why Extract costs 2 credits for 1 page
When you call extract.sync(input: url, instructions: schema) with a URL instead of a job_id, Reducto automatically performs two operations:
- Parse (1 credit): Converts PDF → structured text
- Extract (1 credit): Applies schema → structured JSON
- Total: 2 credits
Cost optimization: Parse once, extract multiple times:
# Parse once (1 credit)
parse = client.parse.sync(input: "https://example.com/doc.pdf")
job_id = parse["job_id"]
# Extract multiple schemas (1 credit each)
result_a = client.extract.sync(input: job_id, instructions: schema_a)
result_b = client.extract.sync(input: job_id, instructions: schema_b)
# Total: 3 credits instead of 4
Credit math for the examples above
- Parse → Split → Extract: when you start with ReductoAI.parse and pass the resulting job_id to split and extract, you pay 1 + 2 = 3 credits per page (parse + extract). Split reuses the parsed content, so it doesn't add extra parse credits.
- Document type + number extraction: the JSON-schema extract call uses an existing parse job, so it consumes parse (1) + extract (2) = 3 credits per page. Enabling agentic mode or citations may raise the per-page cost per the credit usage guide.
Development
bundle exec rake test
bundle exec rubocop
TODO
- [ ] Document webhook workflow and retry semantics
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/dpaluy/reducto_ai.
License
The gem is available as open source under the terms of the MIT License.