Class: Boxcars::VectorStore::Hnswlib::BuildFromFiles
- Inherits:
- Object
- Includes:
- Boxcars::VectorStore
- Defined in:
- lib/boxcars/vector_store/hnswlib/build_from_files.rb
Overview
This class builds the vector store used for hnswlib similarity search. It loads the training data, generates the embeddings, and saves the vector store. If an index file already exists and a rebuild is not forced, it loads the existing vector store into memory instead. For later use, it also saves the split documents, together with their index numbers, to a JSON file.
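The last step in the overview, saving the split documents with index numbers to a JSON file, can be pictured with a minimal sketch. It assumes fixed-size character chunks and a made-up record layout with `doc_id` and `text` keys. The helpers `split_into_chunks` and `index_chunks` are hypothetical and not part of Boxcars, whose actual splitting logic and JSON schema are not shown on this page.

```ruby
require "json"

# Hypothetical helper: cut a string into pieces of at most chunk_size characters.
def split_into_chunks(text, chunk_size)
  text.scan(/.{1,#{chunk_size}}/m)
end

# Hypothetical helper: split every text, then pair each chunk with a
# sequential index number (the record keys are an assumption).
def index_chunks(texts, chunk_size)
  texts.flat_map { |t| split_into_chunks(t, chunk_size) }
       .each_with_index
       .map { |chunk, i| { "doc_id" => i, "text" => chunk } }
end

docs = index_chunks(["abcdefgh", "ijk"], 3)
json = JSON.generate(docs) # the string one would write to the JSON doc file
```

Keeping the index number next to each chunk is what would allow a search hit, which identifies a vector by its position in the index, to be mapped back to the text it came from.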
Instance Method Summary
- #call ⇒ Object
- #initialize(params) ⇒ BuildFromFiles (constructor): A new instance of BuildFromFiles.
Methods included from Boxcars::VectorStore
Constructor Details
#initialize(params) ⇒ BuildFromFiles
Returns a new instance of BuildFromFiles.
# File 'lib/boxcars/vector_store/hnswlib/build_from_files.rb', line 17
def initialize(params)
  @split_chunk_size = params[:split_chunk_size] || 2000
  @base_dir_path, @index_file_path, @json_doc_file_path =
    validate_params(params[:training_data_path], params[:index_file_path], split_chunk_size)
  @force_rebuild = params[:force_rebuild] || false
  @hnsw_vectors = []
end
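The `||` defaults in the constructor can be tried in isolation. The sketch below is a stand-in, not Boxcars code: `resolve_options` is a hypothetical helper that copies only the two default expressions from `#initialize`, so it runs without the gem and without `validate_params`.

```ruby
# Hypothetical stand-in for the option defaults used in #initialize.
def resolve_options(params)
  {
    split_chunk_size: params[:split_chunk_size] || 2000,
    force_rebuild: params[:force_rebuild] || false
  }
end

defaults    = resolve_options({})
custom      = resolve_options(split_chunk_size: 500, force_rebuild: true)
string_keys = resolve_options("split_chunk_size" => 500) # string keys are not looked up
```

Because the lookups use symbol keys, a hash with string keys silently falls back to the defaults, which is worth keeping in mind when the parameters come from parsed JSON or YAML.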
Instance Method Details
#call ⇒ Object
# File 'lib/boxcars/vector_store/hnswlib/build_from_files.rb', line 26
def call
  if !force_rebuild && File.exist?(index_file_path)
    load_existing_vector_store
  else
    puts "Building Hnswlib vector store..."
    data = load_data_files(training_data_path)
    Boxcars.debug("Loaded #{data.length} files from #{training_data_path}")
    texts = split_text_into_chunks(data)
    Boxcars.debug("Split #{data.length} files into #{texts.length} chunks")
    vectors = generate_vectors(texts)
    Boxcars.debug("Generated #{vectors.length} vectors")
    add_vectors(vectors, texts)
    Boxcars.debug("Added #{vectors.length} vectors to vector store")
    save_vector_store
    { type: :hnswlib, vector_store: hnsw_vectors }
  end
end
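The first branch of `#call` acts as a simple cache check: an existing index file is reused unless a rebuild is forced. A minimal sketch of that decision follows. The helper `build_or_load` and the file name `index.bin` are assumptions made for illustration, and the sketch only reports which branch would run rather than building anything.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the condition at the top of #call.
def build_or_load(index_file_path, force_rebuild: false)
  !force_rebuild && File.exist?(index_file_path) ? :load : :build
end

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "index.bin")
  first = build_or_load(path)                        # no index file yet
  File.write(path, "")                               # pretend an index was saved
  second = build_or_load(path)                       # file exists, reuse it
  forced = build_or_load(path, force_rebuild: true)  # file exists, rebuild anyway
  [first, second, forced]
end
```

Note that the check looks only at whether the file exists, so changes to the training data would presumably not trigger a rebuild unless `force_rebuild` is set.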