Class: ReductoAI::Resources::Parse

Inherits:
Object
Defined in:
lib/reducto_ai/resources/parse.rb

Overview

Note:

Each parse operation consumes credits based on document complexity. See the Reducto documentation for pricing details.

Parse resource for document parsing operations.

Converts documents (PDFs, images, etc.) into structured formats like Markdown, JSON, or HTML. Supports both synchronous and asynchronous modes.

Examples:

Synchronous parsing

client = ReductoAI::Client.new
result = client.parse.sync(
  input: "https://example.com/document.pdf",
  output_formats: { markdown: true }
)
puts result["result"]["markdown"]

Asynchronous parsing

job = client.parse.async(
  input: { url: "https://example.com/large-doc.pdf" },
  async: true
)
job_id = job["job_id"]

Constructor Details

#initialize(client) ⇒ Parse

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of Parse.

Parameters:

  • client (Client)

    the Reducto API client

# File 'lib/reducto_ai/resources/parse.rb', line 34

def initialize(client)
  @client = client
end

Instance Method Details

#async(input:, async: nil, **options) ⇒ Hash

Parses a document asynchronously.

Returns immediately with a job_id. Poll with Jobs#retrieve to get results.

Examples:

Start async parse and poll

job = client.parse.async(input: "https://example.com/doc.pdf")
job_id = job["job_id"]

# Poll for completion, giving up after 30 attempts so a job that
# never reports "succeeded" cannot block forever
30.times do
  status = client.jobs.retrieve(job_id: job_id)
  break if status["status"] == "succeeded"
  sleep 2
end

Parameters:

  • input (String, Hash)

    Document URL or hash with :url key

  • async (Boolean, nil) (defaults to: nil)

    Async mode flag. When nil, the key is omitted from the request payload rather than being set explicitly.

  • options (Hash)

    Additional parsing options (same as #sync)

Returns:

  • (Hash)

    Job status with keys:

    • "job_id" [String] - Job identifier for polling
    • "status" [String] - Initial status ("processing")

Raises:

  • (ArgumentError)

    if input is nil
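
The polling pattern described above can be wrapped in a small helper with an attempt budget. This is only a sketch: wait_for_job and FakeJobs are hypothetical names that are not part of the gem, and FakeJobs stands in for client.jobs so the example runs without network access. It assumes only what this page documents, namely that retrieve takes job_id: and returns a Hash with a "status" key.

```ruby
# Hypothetical helper (not part of the gem): polls until the job reports
# "succeeded" or the attempt budget is exhausted.
def wait_for_job(jobs, job_id, max_attempts: 30, interval: 2)
  max_attempts.times do
    status = jobs.retrieve(job_id: job_id)
    return status if status["status"] == "succeeded"

    sleep interval
  end
  raise "job #{job_id} did not finish after #{max_attempts} attempts"
end

# Stand-in for client.jobs that replays canned responses.
class FakeJobs
  def initialize(responses)
    @responses = responses
  end

  def retrieve(job_id:)
    # Hand out the next canned response; repeat the last one once exhausted.
    response = @responses.length > 1 ? @responses.shift : @responses.first
    response.merge("job_id" => job_id)
  end
end

jobs = FakeJobs.new([{ "status" => "processing" }, { "status" => "succeeded" }])
final = wait_for_job(jobs, "job-123", interval: 0)
puts final["status"] # prints "succeeded"
```

With a real client, the call would presumably be wait_for_job(client.jobs, job["job_id"]). Raising after the budget runs out keeps a stuck or failed job from blocking the caller indefinitely.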

# File 'lib/reducto_ai/resources/parse.rb', line 101

def async(input:, async: nil, **options)
  raise ArgumentError, "input is required" if input.nil?

  normalized_input = normalize_input(input)
  payload = { input: normalized_input }
  payload[:async] = async unless async.nil?
  payload.merge!(options.compact)

  @client.post("/parse_async", payload)
end
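
To make the payload rules in the listing above concrete, the following sketch reproduces only its payload-building lines. build_async_payload is a hypothetical name, and the input is assumed to be already normalized, because normalize_input is not shown on this page.

```ruby
# Hypothetical stand-alone copy of the payload logic in #async.
# The :input value is assumed to be normalized already.
def build_async_payload(input:, async: nil, **options)
  payload = { input: input }
  payload[:async] = async unless async.nil? # nil means "leave the key out"
  payload.merge!(options.compact)           # nil-valued options are dropped
  payload
end

minimal = build_async_payload(input: { url: "https://example.com/doc.pdf" })
puts minimal.key?(:async) # prints "false"

full = build_async_payload(
  input: { url: "https://example.com/doc.pdf" },
  async: false,
  mode: "ocr",
  use_cache: nil
)
puts full.keys.inspect # prints "[:input, :async, :mode]"
```

Note that an explicit false is still sent, since only nil is filtered out.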

#sync(input:, **options) ⇒ Hash

Parses a document synchronously.

Blocks until parsing completes and returns the full result.

Examples:

Parse to markdown

result = client.parse.sync(
  input: "https://example.com/doc.pdf",
  output_formats: { markdown: true }
)

Parameters:

  • input (String, Hash)

    Document URL or hash with :url key

  • options (Hash)

    Additional parsing options

Options Hash (**options):

  • :output_formats (Hash)

    Output format configuration (e.g., { markdown: true, html: true })

  • :mode (String)

    Processing mode ("ocr", "auto")

  • :use_cache (Boolean)

    Whether to use cached results

Returns:

  • (Hash)

    Parsed document with keys:

    • "job_id" [String] - Job identifier
    • "status" [String] - Job status ("succeeded")
    • "result" [Hash] - Parsed content by format (e.g., "markdown", "html")
    • "usage" [Hash] - Credit usage details

Raises:

  • (ArgumentError)

    if input is nil

  • (ClientError)

    if document URL is invalid or inaccessible

  • (ServerError)

    if parsing fails
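
One way to handle the errors listed above is to rescue them separately at the call site. The sketch below is illustrative only: the Sketch module defines stand-in error classes so the code runs without the gem loaded (the real ClientError and ServerError are assumed to live under the gem's own namespace), and safe_parse and StubParse are hypothetical names.

```ruby
# Stand-in error classes; swap in the gem's real classes in actual use.
module Sketch
  class ClientError < StandardError; end
  class ServerError < StandardError; end
end

# Hypothetical wrapper: turns raised errors into a tagged result Hash.
def safe_parse(parse, input:, **options)
  { ok: true, data: parse.sync(input: input, **options) }
rescue Sketch::ClientError => e
  { ok: false, kind: :client, error: e.message }
rescue Sketch::ServerError => e
  { ok: false, kind: :server, error: e.message }
end

# Stand-in for client.parse that either raises or returns a canned result.
class StubParse
  def initialize(error = nil)
    @error = error
  end

  def sync(input:, **_options)
    raise @error if @error

    { "status" => "succeeded", "result" => { "markdown" => "# stub for #{input}" } }
  end
end

ok = safe_parse(StubParse.new, input: "https://example.com/doc.pdf")
puts ok[:ok] # prints "true"

bad = safe_parse(StubParse.new(Sketch::ClientError.new("bad url")), input: "nope")
puts bad[:kind] # prints "client"
```

ArgumentError is deliberately not rescued here, since a nil input is a programming error rather than a runtime condition.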

# File 'lib/reducto_ai/resources/parse.rb', line 66

def sync(input:, **options)
  raise ArgumentError, "input is required" if input.nil?

  normalized_input = normalize_input(input)
  payload = { input: normalized_input, **options }.compact
  @client.post("/parse", payload)
end