Class: Boxcars::VectorStore::Hnswlib::BuildFromFiles

Inherits:
Object
  • Object
Includes:
Boxcars::VectorStore
Defined in:
lib/boxcars/vector_store/hnswlib/build_from_files.rb

Overview

This class builds the vector store used for hnswlib similarity search. It loads the training data, generates the embeddings, and saves the vector store. When an index file already exists and a rebuild is not forced, it loads the existing vector store into memory instead. For later use, it also saves the split documents, together with their index numbers, to a JSON file.
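The last step of the overview, saving the split documents with their index numbers to a JSON file, can be illustrated with a minimal standard-library sketch. It is not the class's actual implementation: the helper names `split_into_chunks` and `index_chunks`, the fixed-width character splitting, and the `doc_id` / `document` field names are all assumptions made for illustration.

```ruby
require "json"

# Hypothetical helper: cut each text into pieces of at most chunk_size
# characters. The real splitter may use a different strategy.
def split_into_chunks(texts, chunk_size)
  texts.flat_map { |text| text.scan(/.{1,#{chunk_size}}/m) }
end

# Hypothetical helper: pair each chunk with its position so that a vector
# index number can later be mapped back to the text it was built from.
# The field names doc_id and document are assumptions.
def index_chunks(chunks)
  chunks.each_with_index.map { |chunk, i| { doc_id: i, document: chunk } }
end

chunks  = split_into_chunks(["abcdefghij", "xyz"], 4)
records = index_chunks(chunks)
json    = JSON.generate(records)
```

Keeping the index number next to each chunk is what would let a similarity search result, which is only a position in the index, be resolved back to readable text.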

Instance Method Summary

Methods included from Boxcars::VectorStore

included

Constructor Details

#initialize(params) ⇒ BuildFromFiles

Returns a new instance of BuildFromFiles.



# File 'lib/boxcars/vector_store/hnswlib/build_from_files.rb', line 17

def initialize(params)
  @split_chunk_size = params[:split_chunk_size] || 2000
  @base_dir_path, @index_file_path, @json_doc_file_path =
    validate_params(params[:training_data_path], params[:index_file_path], split_chunk_size)

  @force_rebuild = params[:force_rebuild] || false
  @hnsw_vectors = []
end
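
The defaults applied in the constructor can be seen in isolation with a small sketch. `resolve_options` is a hypothetical helper, not part of Boxcars, and the path values are placeholders. It only mirrors the two `||` fallbacks shown above and leaves out `validate_params`, whose behavior is not shown in this listing.

```ruby
# Hypothetical helper mirroring the fallbacks used in #initialize:
# a missing (or nil) value falls back to the default.
def resolve_options(params)
  {
    split_chunk_size: params[:split_chunk_size] || 2000,
    force_rebuild: params[:force_rebuild] || false
  }
end

# Only the paths are given, so both options take their defaults.
# The path values here are placeholders.
defaults = resolve_options(training_data_path: "./docs/*.md",
                           index_file_path: "./index.bin")

# Both options are supplied explicitly.
custom = resolve_options(split_chunk_size: 500, force_rebuild: true)
```

Because the fallback uses `||`, passing `nil` for either key has the same effect as omitting it.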

Instance Method Details

#call ⇒ Object



# File 'lib/boxcars/vector_store/hnswlib/build_from_files.rb', line 26

def call
  if !force_rebuild && File.exist?(index_file_path)
    load_existing_vector_store
  else
    puts "Building Hnswlib vector store..."
    data = load_data_files(training_data_path)
    Boxcars.debug("Loaded #{data.length} files from #{training_data_path}")
    texts = split_text_into_chunks(data)
    Boxcars.debug("Split #{data.length} files into #{texts.length} chunks")
    vectors = generate_vectors(texts)
    Boxcars.debug("Generated #{vectors.length} vectors")
    add_vectors(vectors, texts)
    Boxcars.debug("Added #{vectors.length} vectors to vector store")
    save_vector_store

    {
      type: :hnswlib,
      vector_store: hnsw_vectors
    }
  end
end
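
The branch at the top of `#call` amounts to a build-or-load cache check: an existing index file is reused unless `force_rebuild` is set. The following standard-library sketch shows only that decision. `build_or_load` is a hypothetical stand-in, the file name is a placeholder, and the real build steps (load, split, embed, add, save) are reduced to writing a placeholder file.

```ruby
require "tmpdir"

# Hypothetical stand-in for the decision made in #call: reuse the index
# file when it exists, unless a rebuild is forced.
def build_or_load(index_file_path, force_rebuild: false)
  if !force_rebuild && File.exist?(index_file_path)
    :loaded
  else
    # Placeholder for the real pipeline: load, split, embed, add, save.
    File.write(index_file_path, "placeholder index")
    :built
  end
end

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "index.bin")
  [
    build_or_load(path),                     # no file yet, so it builds
    build_or_load(path),                     # file exists, so it loads
    build_or_load(path, force_rebuild: true) # rebuild is forced
  ]
end
```

In the listing above, the hash with `type:` and `vector_store:` is the last expression only in the build branch, so the value returned on the load branch is whatever `load_existing_vector_store` returns.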