Class: Langchain::Loader

Inherits:
Object
Defined in:
lib/langchain/loader.rb

Defined Under Namespace

Classes: FileNotFound, UnknownFormatError

Constant Summary

URI_REGEX =
%r{\A[A-Za-z][A-Za-z0-9+\-.]*://}
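
To illustrate what this pattern accepts, here is a minimal sketch that copies the regex into a standalone script and checks a few strings against it. The sample strings are made up for illustration and do not come from the library. The pattern requires a letter-initial scheme at the very start of the string, immediately followed by `://`.

```ruby
# Copy of the pattern shown above, so the script runs without the gem.
URI_REGEX = %r{\A[A-Za-z][A-Za-z0-9+\-.]*://}

# Illustrative inputs only (not taken from the library's tests).
samples = [
  "https://example.com/docs/README.md", # scheme followed by "://"
  "s3://bucket/key",                    # digits are allowed after the first letter
  "README.md",                          # plain relative path
  "/tmp/https://not-a-url",             # scheme is not at the start of the string
  "mailto:user@example.com"             # has a scheme but no "://"
]

# Map each sample to whether the pattern matches it.
results = samples.to_h { |s| [s, s.match?(URI_REGEX)] }
results.each { |s, matched| puts "#{s.inspect} => #{matched}" }
```

Only the first two samples match; the others would be treated as local paths.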

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(path, options = {}, chunker: Langchain::Chunker::Text) ⇒ Langchain::Loader

Initialize Langchain::Loader

Parameters:

  • path (String | Pathname)

    path to file or URL

  • options (Hash) (defaults to: {})

    options passed to the processor class used to process the data



# File 'lib/langchain/loader.rb', line 42

def initialize(path, options = {}, chunker: Langchain::Chunker::Text)
  @options = options
  @path = path
  @chunker = chunker
end

Class Method Details

.load(path, options = {}, &block) ⇒ Data

Load data from a file or URL. Shorthand for `Langchain::Loader.new(path, options).load`

Examples

# Load a URL
data = Langchain::Loader.load("https://example.com/docs/README.md")

# Load a file
data = Langchain::Loader.load("README.md")

# Load data using a custom processor
data = Langchain::Loader.load("README.md") do |raw_data, options|
  # your processing code goes here
  # return data at the end here
end


Parameters:

  • path (String | Pathname)

    path to file or URL

  • options (Hash) (defaults to: {})

    options passed to the processor class used to process the data

Returns:

  • (Data)

    data loaded from path



# File 'lib/langchain/loader.rb', line 33

def self.load(path, options = {}, &block)
  new(path, options).load(&block)
end

Instance Method Details

#directory? ⇒ Boolean

Is the path a directory?

Returns:

  • (Boolean)

    true if path is a directory



# File 'lib/langchain/loader.rb', line 60

def directory?
  File.directory?(@path)
end

#load {|String, Hash| ... } ⇒ Data

Load data from a file, directory, or URL

loader = Langchain::Loader.new("README.md")
# Load data using default processor for the file
loader.load

# Load data using a custom processor
loader.load do |raw_data, options|
  # your processing code goes here
  # return data at the end here
end


Yields:

  • (String, Hash)

the raw data and the loader options; the block parses the raw data into a String itself

Yield Parameters:

  • raw_data (String)

    from the loaded URL or file

Yield Returns:

  • (String)

    parsed data, as a String

Returns:

  • (Data)

    data that was loaded
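
As a rough illustration of the block contract described above (raw data and options in, a String out), here is a hypothetical processor written as a standalone proc so that it runs without the gem. The name `strip_front_matter` and the sample text are assumptions made for this sketch; nothing here is part of Langchain::Loader itself.

```ruby
# Hypothetical custom processor: removes a leading YAML front-matter block
# ("---" ... "---") and returns the remaining text as a String.
# It ignores the options hash, which the loader would pass as the second argument.
strip_front_matter = proc do |raw_data, _options|
  raw_data.sub(/\A---\n.*?\n---\n/m, "")
end

# Illustrative input only.
sample = "---\ntitle: Demo\n---\n# Heading\nBody\n"

# Call the proc directly, the way a loader would yield to it.
cleaned = strip_front_matter.call(sample, {})
puts cleaned
```

Assuming the block is invoked with `(raw_data, options)` as documented, such a proc could presumably be passed along as `loader.load(&strip_front_matter)`.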



# File 'lib/langchain/loader.rb', line 82

def load(&block)
  return process_data(load_from_url, &block) if url?
  return load_from_directory(&block) if directory?

  process_data(load_from_path, &block)
end
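
The order of the checks in the source above matters: the URL test runs first, then the directory test, and anything else is treated as a file path. The sketch below mirrors that order in a standalone helper. `source_kind` and `SCHEME_REGEX` are hypothetical names introduced for illustration; the real class calls its private loading methods instead of returning a symbol.

```ruby
require "pathname"
require "tmpdir"

# Copy of the URI_REGEX pattern, renamed to keep this sketch self-contained.
SCHEME_REGEX = %r{\A[A-Za-z][A-Za-z0-9+\-.]*://}

# Hypothetical helper: reports which branch #load would take for a given path.
def source_kind(path)
  # A Pathname is never treated as a URL, mirroring the guard in #url?.
  return :url if !path.is_a?(Pathname) && path.match?(SCHEME_REGEX)
  # Checked second, so a URL string is never probed on the local filesystem.
  return :directory if File.directory?(path)

  # Everything else falls through to the single-file branch.
  :file
end

puts source_kind("https://example.com/docs/README.md")
puts source_kind(Dir.tmpdir)
puts source_kind(Pathname.new("README.md"))
```

Note that the fallback branch does not check whether the file exists; in the real class that is presumably where FileNotFound would come into play.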

#url? ⇒ Boolean

Is the path a URL?

Returns:

  • (Boolean)

true if path is a URL; always false when path is a Pathname



# File 'lib/langchain/loader.rb', line 51

def url?
  return false if @path.is_a?(Pathname)

  !!(@path =~ URI_REGEX)
end