Class: Boxcars::VectorStore::Pgvector::BuildFromFiles

Inherits:
Object
Includes:
Boxcars::VectorStore
Defined in:
lib/boxcars/vector_store/pgvector/build_from_files.rb

Instance Method Summary

Methods included from Boxcars::VectorStore

included

Constructor Details

#initialize(params) ⇒ Hash

Returns vector_store: array of hashes with :content, :metadata, and :embedding keys.

Parameters:

  • training_data_path (String)

    path to training data files

  • split_chunk_size (Integer)

    chunk size, in characters, used when splitting the text; defaults to 2000

  • embedding_tool (Symbol)

    embedding tool to use; defaults to :openai

  • database_url (String)

    database URL

  • table_name (String)

    table name

  • embedding_column_name (String)

    embedding column name

  • content_column_name (String)

    content column name

  • metadata_column_name (String)

    metadata column name



# File 'lib/boxcars/vector_store/pgvector/build_from_files.rb', line 22

def initialize(params)
  @split_chunk_size = params[:split_chunk_size] || 2000
  @training_data_path = File.absolute_path(params[:training_data_path])
  @embedding_tool = params[:embedding_tool] || :openai

  validate_params(embedding_tool, training_data_path)

  @database_url = params[:database_url]
  @table_name = params[:table_name]
  @embedding_column_name = params[:embedding_column_name]
  @content_column_name = params[:content_column_name]
  @metadata_column_name = params[:metadata_column_name]

  @pg_vectors = []
end
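
The defaulting in the constructor can be illustrated with a plain Hash. This is a minimal sketch, not the library's code: `resolve_settings` is a hypothetical helper that mirrors only the `||` fallbacks and the `File.absolute_path` call shown above, and the path, URL, table, and column names are placeholder values.

```ruby
# Hypothetical helper mirroring the fallbacks in #initialize above.
# It is not part of Boxcars; it only shows how the `||` defaults behave.
def resolve_settings(params)
  {
    split_chunk_size: params[:split_chunk_size] || 2000,
    training_data_path: File.absolute_path(params[:training_data_path]),
    embedding_tool: params[:embedding_tool] || :openai
  }
end

# Placeholder values, chosen only for illustration.
params = {
  training_data_path: "training_data/notes.md",
  database_url: "postgres://localhost/example",
  table_name: "items",
  embedding_column_name: "embedding",
  content_column_name: "content",
  metadata_column_name: "metadata"
}

settings = resolve_settings(params)
puts settings[:split_chunk_size]   # 2000, because the key was omitted
puts settings[:embedding_tool]     # openai

# An explicit nil also falls back, since `nil || 2000` evaluates to 2000.
explicit_nil = resolve_settings(params.merge(split_chunk_size: nil))
puts explicit_nil[:split_chunk_size]   # 2000

# With the boxcars gem loaded and a reachable database, the same hash would be
# passed to the class documented here:
#   Boxcars::VectorStore::Pgvector::BuildFromFiles.new(params).call
```

Note that `training_data_path` has no fallback: `File.absolute_path` raises a `TypeError` when given `nil`, so that key is effectively required.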

Instance Method Details

#call ⇒ Hash

Returns vector_store: array of Boxcars::VectorStore::Document.

Returns:

  • (Hash)

    vector_store: array of Boxcars::VectorStore::Document



# File 'lib/boxcars/vector_store/pgvector/build_from_files.rb', line 39

def call
  data = load_data_files(training_data_path)
  texts = split_text_into_chunks(data)
  embeddings = generate_vectors(texts)
  add_vectors(embeddings, texts)
  documents = save_vector_store

  {
    type: :pgvector,
    vector_store: documents
  }
end
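
The flow of #call (split, embed, pair, wrap in a result hash) can be sketched end to end with stand-ins. This is a minimal sketch under stated assumptions, not the Boxcars implementation: every helper below is hypothetical, the fixed-width splitter is a simplification, `fake_embedding` replaces the real embedding call, documents are plain hashes with the :content, :metadata, and :embedding keys mentioned in the constructor description, and nothing is read from files or written to a database.

```ruby
# Stand-in for the splitting step: fixed-width slices of at most `size`
# characters. The real splitter is not shown on this page.
def split_into_chunks(text, size)
  text.scan(/.{1,#{size}}/m)
end

# Stand-in for the embedding step: a toy two-number vector per chunk.
# A real run would call the configured embedding tool instead.
def fake_embedding(chunk)
  [chunk.length.to_f, chunk.count("aeiou").to_f]
end

# Mirrors the shape of #call: split, embed, pair up, wrap in the result hash.
def build_store(text, chunk_size)
  texts = split_into_chunks(text, chunk_size)
  embeddings = texts.map { |chunk| fake_embedding(chunk) }
  documents = texts.zip(embeddings).map do |content, embedding|
    { content: content, metadata: {}, embedding: embedding }
  end
  { type: :pgvector, vector_store: documents }
end

result = build_store("pgvector stores embeddings in Postgres", 16)
puts result[:type]                  # pgvector
puts result[:vector_store].length   # 3 (chunks of 16, 16, and 6 characters)
result[:vector_store].each { |doc| puts doc[:content].inspect }
```

Pairing chunks and vectors with `zip` keeps each embedding next to the text it came from, which is the invariant the `add_vectors(embeddings, texts)` step in the listing above appears to rely on.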