Class: Langchain::Processors::PDF
- Defined in: lib/langchain/processors/pdf.rb
Constant Summary
- EXTENSIONS = [".pdf"]
- CONTENT_TYPES = ["application/pdf"]
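The two constants suggest how a caller might decide whether this processor applies to a given file. The following is only a sketch of that idea: `pdf_candidate?` is a hypothetical helper, not part of the library, and the constants are copied here so the snippet runs on its own.

```ruby
# Constants copied from the summary above so the sketch is self-contained.
EXTENSIONS = [".pdf"]
CONTENT_TYPES = ["application/pdf"]

# Hypothetical helper (not part of Langchain): true when either the file
# extension or the MIME type matches what the PDF processor declares.
def pdf_candidate?(path: nil, content_type: nil)
  # Normalise case so "report.PDF" matches ".pdf"; nil when no path is given.
  ext = path && File.extname(path).downcase
  EXTENSIONS.include?(ext) || CONTENT_TYPES.include?(content_type)
end

by_extension = pdf_candidate?(path: "report.PDF")
by_mime_type = pdf_candidate?(path: "upload.bin", content_type: "application/pdf")
no_match     = pdf_candidate?(path: "notes.txt", content_type: "text/plain")
```

Lowercasing the extension is a design choice of this sketch; the source does not say whether the library matches extensions case-insensitively.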
Instance Method Summary
- #initialize ⇒ PDF (constructor)
  A new instance of PDF.
- #parse(data) ⇒ String
  Parse the document and return the text.
Methods included from DependencyHelper
Constructor Details
#initialize ⇒ PDF
Returns a new instance of PDF.
# File 'lib/langchain/processors/pdf.rb', line 9
def initialize(*)
  depends_on "pdf-reader"
end
Instance Method Details
#parse(data) ⇒ String
Parse the document and return the text.
# File 'lib/langchain/processors/pdf.rb', line 16
def parse(data)
  ::PDF::Reader
    .new(StringIO.new(data.read))
    .pages
    .map(&:text)
    .join("\n\n")
end
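To make the data flow in `#parse` easier to follow, here is a minimal sketch that mirrors its shape without requiring the pdf-reader gem. `FakePage`, `FakeReader`, and `parse_sketch` are hypothetical stand-ins invented for illustration; the real method hands the bytes to `::PDF::Reader`. The sketch shows two things visible in the source: `data` must respond to `read`, and page texts are joined with a blank line.

```ruby
require "stringio"

# Hypothetical stand-in for a PDF page: it only exposes #text.
FakePage = Struct.new(:text)

# Hypothetical stand-in for ::PDF::Reader. It treats each
# form-feed-separated chunk of the input as one "page".
class FakeReader
  attr_reader :pages

  def initialize(io)
    @pages = io.read.split("\f").map { |chunk| FakePage.new(chunk) }
  end
end

# Same pipeline as #parse: read the IO-like object, wrap the bytes in a
# StringIO, collect each page's text, and join pages with a blank line.
def parse_sketch(data, reader_class: FakeReader)
  reader_class
    .new(StringIO.new(data.read))
    .pages
    .map(&:text)
    .join("\n\n")
end

result = parse_sketch(StringIO.new("first page\fsecond page"))
# result == "first page\n\nsecond page"
```

With the actual class, the argument would typically be an open file or another IO-like object; passing a plain String would fail because `String` has no `read` method.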