Class: Mindee::Input::Source::LocalInputSource
- Inherits: Object
  - Object
    - Mindee::Input::Source::LocalInputSource
- Defined in: lib/mindee/input/sources.rb
Overview
Base class for loading documents.
Direct Known Subclasses
Base64InputSource, BytesInputSource, FileInputSource, PathInputSource
Instance Attribute Summary
- #file_mimetype ⇒ String readonly
- #filename ⇒ String readonly
- #io_stream ⇒ StringIO readonly
Instance Method Summary
- #compress!(quality: 85, max_width: nil, max_height: nil, force_source_text: false, disable_source_text: true) ⇒ Object
  Compresses the file, according to the provided info.
- #count_pdf_pages ⇒ Object
- #initialize(io_stream, filename, fix_pdf: false) ⇒ LocalInputSource (constructor)
  A new instance of LocalInputSource.
- #pdf? ⇒ Boolean
  Shorthand for pdf mimetype validation.
- #process_pdf(options) ⇒ Object
  Parses a PDF file according to provided options.
- #read_document(close: true) ⇒ Array<String, [String, aBinaryString], [Hash, nil]>
  Reads a document.
- #rescue_broken_pdf(stream) ⇒ Object
  Attempts to fix pdf files if mimetype is rejected.
- #source_text? ⇒ Boolean
  Checks whether the file has source text if it is a pdf.
Constructor Details
#initialize(io_stream, filename, fix_pdf: false) ⇒ LocalInputSource
Returns a new instance of LocalInputSource.
    # File 'lib/mindee/input/sources.rb', line 58
    def initialize(io_stream, filename, fix_pdf: false)
      @io_stream = io_stream
      @filename = filename
      @file_mimetype = if fix_pdf
                         Marcel::MimeType.for @io_stream
                       else
                         Marcel::MimeType.for @io_stream, name: @filename
                       end
      return if ALLOWED_MIME_TYPES.include? @file_mimetype

      if filename.end_with?('.pdf') && fix_pdf
        rescue_broken_pdf(@io_stream)
        @file_mimetype = Marcel::MimeType.for @io_stream
        return if ALLOWED_MIME_TYPES.include? @file_mimetype
      end

      raise InvalidMimeTypeError, @file_mimetype.to_s
    end
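The constructor's flow is: detect the MIME type, accept it if it is allowed, otherwise (only for a '.pdf' filename with fix_pdf enabled) repair the stream and detect once more, and raise if the type is still rejected. The sketch below imitates that flow with the standard library only. Everything in it is an assumption made for illustration: `sniff` is a toy magic-byte check standing in for Marcel::MimeType.for, `ALLOWED_SKETCH_TYPES` is a made-up allow-list (the real ALLOWED_MIME_TYPES is not shown on this page), and the repair step is a simplified version of rescue_broken_pdf.

```ruby
require 'stringio'

# Hypothetical allow-list; the library's ALLOWED_MIME_TYPES is not shown here.
ALLOWED_SKETCH_TYPES = ['application/pdf', 'image/jpeg'].freeze

# Hypothetical stand-in for the library's InvalidMimeTypeError.
class InvalidMimeTypeSketchError < StandardError; end

# Toy content sniffer standing in for Marcel::MimeType.for (magic bytes only).
def sniff(io)
  io.rewind
  head = io.read(5).to_s.b
  io.rewind
  return 'application/pdf' if head.start_with?('%PDF-'.b)
  return 'image/jpeg' if head.start_with?("\xFF\xD8\xFF".b)

  'application/octet-stream'
end

# Detect, accept if allowed, optionally repair a '.pdf' file and detect again,
# otherwise raise. Returns the detected type and the (possibly replaced) stream.
def validate_mimetype(io, filename, fix_pdf: false)
  type = sniff(io)
  return [type, io] if ALLOWED_SKETCH_TYPES.include?(type)

  if filename.end_with?('.pdf') && fix_pdf
    data = io.read.b
    offset = data.index('%PDF-')
    # The marker must end within the first 500 bytes, as in rescue_broken_pdf.
    if offset && offset + 5 <= 500
      io = StringIO.new(data[offset..])
      type = sniff(io)
      return [type, io] if ALLOWED_SKETCH_TYPES.include?(type)
    end
  end

  raise InvalidMimeTypeSketchError, type
end

junk_pdf = StringIO.new("junk-header\n%PDF-1.7\n")
begin
  validate_mimetype(junk_pdf, 'scan.pdf')
rescue InvalidMimeTypeSketchError => e
  puts "rejected: #{e.message}"
end

type, = validate_mimetype(junk_pdf, 'scan.pdf', fix_pdf: true)
puts type
```

Note that in the real constructor, passing fix_pdf: true also changes the first detection: the type is then determined from the stream content alone, without the filename hint.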
Instance Attribute Details
#file_mimetype ⇒ String (readonly)
    # File 'lib/mindee/input/sources.rb', line 51
    def file_mimetype
      @file_mimetype
    end
#filename ⇒ String (readonly)
    # File 'lib/mindee/input/sources.rb', line 49
    def filename
      @filename
    end
#io_stream ⇒ StringIO (readonly)
    # File 'lib/mindee/input/sources.rb', line 53
    def io_stream
      @io_stream
    end
Instance Method Details
#compress!(quality: 85, max_width: nil, max_height: nil, force_source_text: false, disable_source_text: true) ⇒ Object
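Compresses the file in place and rewinds the stream. PDFs are passed to Mindee::PDF::PDFCompressor, which uses quality, force_source_text and disable_source_text; all other files are passed to Mindee::Image::ImageCompressor, which uses quality, max_width and max_height.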
    # File 'lib/mindee/input/sources.rb', line 140
    def compress!(quality: 85, max_width: nil, max_height: nil, force_source_text: false, disable_source_text: true)
      buffer = if pdf?
                 Mindee::PDF::PDFCompressor.compress_pdf(
                   @io_stream,
                   quality: quality,
                   force_source_text_compression: force_source_text,
                   disable_source_text: disable_source_text
                 )
               else
                 Mindee::Image::ImageCompressor.compress_image(
                   @io_stream,
                   quality: quality,
                   max_width: max_width,
                   max_height: max_height
                 )
               end
      @io_stream = buffer
      @io_stream.rewind
    end
#count_pdf_pages ⇒ Object
    # File 'lib/mindee/input/sources.rb', line 123
    def count_pdf_pages
      return 1 unless pdf?

      @io_stream.seek(0)
      pdf_processor = Mindee::PDF::PdfProcessor.open_pdf(@io_stream)
      pdf_processor.pages.size
    end
#pdf? ⇒ Boolean
Shorthand for pdf mimetype validation.
    # File 'lib/mindee/input/sources.rb', line 95
    def pdf?
      @file_mimetype.to_s == 'application/pdf'
    end
#process_pdf(options) ⇒ Object
Parses a PDF file according to provided options.
    # File 'lib/mindee/input/sources.rb', line 107
    def process_pdf(options)
      @io_stream.seek(0)
      @io_stream = PdfProcessor.parse(@io_stream, options)
    end
#read_document(close: true) ⇒ Array<String, [String, aBinaryString], [Hash, nil]>
Reads a document.
    # File 'lib/mindee/input/sources.rb', line 115
    def read_document(close: true)
      @io_stream.seek(0) # Avoids needlessly re-packing some files
      data = @io_stream.read
      @io_stream.close if close
      ['document', data, { filename: Mindee::Input::Source.convert_to_unicode_escape(@filename) }]
    end
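To illustrate the shape of the returned array and the effect of the close option, here is a minimal stand-alone sketch. `read_document_sketch` is a hypothetical helper written for this example, not part of the library; it leaves out the call to Mindee::Input::Source.convert_to_unicode_escape, whose behavior is not documented on this page.

```ruby
require 'stringio'

# Hypothetical stand-in mirroring the shape of read_document's return value.
# The filename escaping done by the real method is deliberately omitted.
def read_document_sketch(io_stream, filename, close: true)
  io_stream.seek(0)
  data = io_stream.read
  io_stream.close if close
  ['document', data, { filename: filename }]
end

stream = StringIO.new('%PDF-1.7 example bytes')

# Keep the stream open so it can be read a second time.
field, data, opts = read_document_sketch(stream, 'invoice.pdf', close: false)
puts field
puts data.bytesize
puts opts[:filename]
puts stream.closed?

# The default closes the stream once the data has been read.
read_document_sketch(stream, 'invoice.pdf')
puts stream.closed?
```

Passing close: false is useful when the same source has to be read again, for example on a retry, because reading from a closed StringIO raises an IOError.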
#rescue_broken_pdf(stream) ⇒ Object
Attempts to fix PDF files whose mimetype was rejected. "Broken PDFs" are often the result of a third party injecting invalid headers. This method attempts to remove those headers before the file is sent.
    # File 'lib/mindee/input/sources.rb', line 82
    def rescue_broken_pdf(stream)
      stream.gets('%PDF-')
      raise UnfixablePDFError if stream.eof? || stream.pos > 500

      stream.pos = stream.pos - 5
      data = stream.read
      @io_stream.close

      @io_stream = StringIO.new
      @io_stream << data
    end
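The repair relies on IO#gets with a custom separator: reading up to and including the first '%PDF-' marker leaves the position just past it, so stepping back five bytes lands on the start of the real PDF content. The stand-alone sketch below reproduces that logic on a plain StringIO. The names `strip_pdf_prefix` and `UnfixablePDFSketchError` are hypothetical and exist only for this example; unlike the method above, the sketch returns a new stream instead of replacing @io_stream.

```ruby
require 'stringio'

# Hypothetical stand-in for the library's UnfixablePDFError.
class UnfixablePDFSketchError < StandardError; end

# Drops any bytes that precede the '%PDF-' marker and returns a new StringIO
# positioned at the start of the marker.
def strip_pdf_prefix(stream, max_offset: 500)
  stream.rewind
  stream.gets('%PDF-') # reads up to and including the first '%PDF-' marker
  # Give up if no usable marker was found or it sits too far into the file.
  raise UnfixablePDFSketchError if stream.eof? || stream.pos > max_offset

  stream.pos -= 5 # step back over the 5-byte marker
  fixed = StringIO.new
  fixed << stream.read
  fixed.rewind
  fixed
end

broken = StringIO.new("X-Injected: junk\r\n%PDF-1.7\n%%EOF\n")
fixed = strip_pdf_prefix(broken)
puts fixed.read(8)
```

The 500-byte limit keeps the repair conservative: a marker found deep inside the file is more likely to be embedded data than a displaced header.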
#source_text? ⇒ Boolean
Checks whether the file has source text if it is a PDF. Returns false otherwise.
    # File 'lib/mindee/input/sources.rb', line 162
    def source_text?
      Mindee::PDF::PDFTools.source_text?(@io_stream)
    end