Class: Mindee::Extraction::PdfExtractor::ExtractedPdf

Inherits: Object

Object
Mindee::Extraction::PdfExtractor::ExtractedPdf

Defined in: lib/mindee/extraction/pdf_extractor/extracted_pdf.rb

Overview

An extracted sub-PDF.

Instance Attribute Summary

#filename ⇒ String readonly
Name of the file.
#pdf_bytes ⇒ StreamIO readonly
Byte contents of the PDF.

Instance Method Summary

#as_input_source ⇒ Mindee::Input::Source::BytesInputSource
Returns the current PDF object as a usable BytesInputSource.
#initialize(pdf_bytes, filename) ⇒ ExtractedPdf constructor
A new instance of ExtractedPdf.
#page_count ⇒ Integer
Returns the number of pages in the current PDF.
#write_to_file(output_path) ⇒ Object
Writes the contents of the current PDF object to a file.

Constructor Details

#initialize(pdf_bytes, filename) ⇒ `ExtractedPdf`

Returns a new instance of ExtractedPdf.

Parameters:

pdf_bytes (StreamIO)
filename (String)

# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 19

def initialize(pdf_bytes, filename)
  @pdf_bytes = pdf_bytes
  @filename = filename
end
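
The constructor only stores its two arguments, and both are exposed through readers without setters. The sketch below illustrates that shape with a local stand-in class (`ExtractedPdfSketch` is hypothetical and is not the Mindee class), so it runs without the gem installed. It also assumes the byte container is a `StringIO`, which the documented `StreamIO` type appears to resemble; the byte content is a placeholder, not a valid PDF.

```ruby
require 'stringio'

# Local stand-in mirroring the documented constructor and readers.
# Hypothetical: this is not Mindee::Extraction::PdfExtractor::ExtractedPdf.
class ExtractedPdfSketch
  attr_reader :pdf_bytes, :filename

  def initialize(pdf_bytes, filename)
    @pdf_bytes = pdf_bytes
    @filename = filename
  end
end

raw = '%PDF-1.4 placeholder'.b   # placeholder bytes in binary encoding
stream = StringIO.new(raw)       # assumption: a StringIO-like byte stream

extracted = ExtractedPdfSketch.new(stream, 'invoice_p1.pdf')
puts extracted.filename
puts extracted.respond_to?(:filename=) # readers only, so no setter is defined
```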

Instance Attribute Details

#filename ⇒ `String` (readonly)

Name of the file.

Returns:

(String)




# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 15

def filename
  @filename
end

#pdf_bytes ⇒ `StreamIO` (readonly)

Byte contents of the PDF.

Returns:

(StreamIO)




# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 11

def pdf_bytes
  @pdf_bytes
end

Instance Method Details

#as_input_source ⇒ `Mindee::Input::Source::BytesInputSource`

Returns the current PDF object as a usable BytesInputSource.

Returns:

(Mindee::Input::Source::BytesInputSource)




# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 49

def as_input_source
  Mindee::Input::Source::BytesInputSource.new(@pdf_bytes.read, @filename)
end
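
Because `as_input_source` calls `@pdf_bytes.read`, the result depends on where the stream's cursor currently sits. The sketch below shows that behavior using only Ruby's standard `StringIO`; it assumes the byte container behaves like a `StringIO`, and the content is a placeholder rather than a valid PDF.

```ruby
require 'stringio'

# Stand-in for the stream held by an ExtractedPdf (assumed StringIO-like).
pdf_bytes = StringIO.new('%PDF-1.4 placeholder'.b)

first_read = pdf_bytes.read   # reads everything up to the end of the stream
second_read = pdf_bytes.read  # cursor is at the end, so nothing is left

pdf_bytes.rewind              # move the cursor back to the start
third_read = pdf_bytes.read   # the full content is available again

puts first_read.bytesize
puts second_read.empty?
puts third_read == first_read
```

If the same stream is read more than once (for example, by calling `as_input_source` twice), rewinding it first is one way to avoid ending up with empty content.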

#page_count ⇒ `Integer`

Returns the number of pages in the current PDF.

Returns:

(Integer)

# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 26

def page_count
  current_pdf = Mindee::PDF::PdfProcessor.open_pdf(pdf_bytes)
  current_pdf.pages.size
rescue TypeError
  raise 'Could not retrieve page count from Extracted PDF object.'
end

#write_to_file(output_path) ⇒ `Object`

Writes the contents of the current PDF object to a file.

Parameters:

output_path (String) —
Path to write to.

# File 'lib/mindee/extraction/pdf_extractor/extracted_pdf.rb', line 35

def write_to_file(output_path)
  raise 'Provided path is not a file' if File.directory?(output_path)
  raise 'Invalid save path provided' unless File.exist?(File.expand_path('..', output_path))

  # Append the '.pdf' extension when the given path does not already end with it.
  unless File.extname(output_path).downcase == '.pdf'
    base_path = File.expand_path('..', output_path)
    output_path = File.expand_path("#{File.basename(output_path)}.pdf", base_path)
  end

  # Write the stream's bytes (not the IO object itself) in binary mode.
  @pdf_bytes.rewind
  File.binwrite(output_path, @pdf_bytes.read)
end
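
The path checks in `write_to_file` can be sketched in isolation with standard-library calls. The helper below, `normalized_pdf_path`, is hypothetical and not part of the Mindee library. It assumes the intent is to reject directories, require an existing parent directory, and append `.pdf` only when the extension is missing. The written bytes are a placeholder, not a valid PDF.

```ruby
require 'tmpdir'
require 'stringio'

# Hypothetical helper: validates a target path and appends '.pdf'
# when the extension is missing (case-insensitive check).
def normalized_pdf_path(output_path)
  raise ArgumentError, 'Provided path is not a file' if File.directory?(output_path)

  parent = File.expand_path('..', output_path)
  raise ArgumentError, 'Invalid save path provided' unless File.exist?(parent)

  return File.expand_path(output_path) if File.extname(output_path).downcase == '.pdf'

  File.expand_path("#{File.basename(output_path)}.pdf", parent)
end

Dir.mktmpdir do |dir|
  target = normalized_pdf_path(File.join(dir, 'invoice_page_1'))
  bytes = StringIO.new('%PDF-1.4 placeholder'.b)

  bytes.rewind                      # make sure the whole stream is written
  File.binwrite(target, bytes.read) # binary write avoids newline translation
  puts File.basename(target)
end
```

Raising `ArgumentError` here is a choice made for the sketch; the documented method raises a plain string message, which Ruby turns into a `RuntimeError`.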
