VectorStore
A pure Ruby library for storing and querying vectors with optional 1-bit quantization. It provides a simple interface for adding vectors, computing cosine similarity, finding the closest vectors, and serializing to JSON. Quantized storage significantly reduces memory requirements.
Features
- Vector Storage: Easily add and retrieve vectors with unique keys.
- Closest Match: Find the closest vectors to a given query vector using cosine similarity.
- Serialization: Serialize the vector store to JSON for persistence.
- Quantization: Optional 1-bit quantization to significantly reduce the memory footprint.
- Save/Load: Persist vector store to disk and reload it.
Installation
In a Gemfile:
gem 'vectorstore'
Or directly:
gem install vectorstore
Basic Usage
require 'vectorstore'
# Create a new VectorStore
store = VectorStore.new
# Add some vectors
store.add("vector1", [1.0, 2.0, 3.0])
store.add("vector2", [2.0, 3.0, 4.0])
store.add("vector3", [3.0, 4.0, 5.0])
store.add("vector4", [0.0, 0.0, 0.0]) # Zero vector edge case
# Calculate cosine similarity between two vectors
similarity = store.cosine_similarity([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
puts "Cosine similarity: #{similarity}"
# Find the closest vectors to a query vector
closest = store.find_closest([2.0, 3.0, 4.0], 2)
puts "Closest vectors: #{closest.inspect}"
# Save the store to disk
store.save("vector_store.json")
# Load the store from disk
loaded_store = VectorStore.new
loaded_store.load("vector_store.json")
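For readers curious what the similarity and closest-match steps above compute, here is a minimal sketch in plain Ruby. The helper names `cosine_similarity_sketch` and `find_closest_sketch` are hypothetical and not part of the VectorStore API, and both the zero-vector handling (returning 0.0) and the `[key, score]` return shape are assumptions rather than the library's documented behavior.

```ruby
# Hypothetical helpers for illustration only; not the VectorStore implementation.

# Cosine similarity: dot product divided by the product of the vector lengths.
def cosine_similarity_sketch(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })

  # Assumption: a zero vector has no direction, so report 0.0
  # instead of dividing by zero.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Return the k best [key, score] pairs from a Hash of key => vector,
# highest score first.
def find_closest_sketch(vectors, query, k = 1)
  vectors
    .map { |key, vec| [key, cosine_similarity_sketch(query, vec)] }
    .max_by(k) { |_key, score| score }
end

vectors = {
  "vector1" => [1.0, 2.0, 3.0],
  "vector2" => [2.0, 3.0, 4.0],
  "vector4" => [0.0, 0.0, 0.0] # zero vector scores 0.0 under the assumption above
}
p find_closest_sketch(vectors, [2.0, 3.0, 4.0], 2)
```

This is a linear scan over every stored vector, which is fine for small stores; it is only meant to show the arithmetic.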
Using OpenAI to embed text
VectorStore can integrate with OpenAI's API to generate embeddings for text inputs and queries. To use this feature with quantization (STRONGLY RECOMMENDED), initialize the store with quantized mode.
[!TIP] To reiterate: using quantization with OpenAI embeddings is strongly recommended with VectorStore, because the way we currently store unquantized vectors is very space inefficient, particularly when serializing to disk.
Example:
store = VectorStore.new(quantized: true)
store.add_with_openai("example", "Your sample text to embed")
# You can query with text and have the text embedded automatically
store.find_closest_with_openai("Your query text", 3)
# You can also query by the key of the vector
store.find_closest_with_key("example")
Supporting other embedding systems in a nice way would be good for the future, but I like OpenAI's embedding mechanism and it's cheap, so this is just step one. You can see example scripts in examples/example_openai_*.rb for a broader demo.
[!NOTE]
For now, your API key is assumed to be in the OPENAI_API_KEY environment variable. The text-embedding-3-small model is used by default, but this can be overridden by passing the embedding_model keyword argument to find_closest_with_openai and add_with_openai calls.
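Since the key is read from the environment, it can help to check for it before making any embedding calls. The following is a small sketch of such a check; `openai_api_key_sketch` is a hypothetical helper and not something VectorStore provides.

```ruby
# Hypothetical helper, not part of VectorStore: fail fast with a clear
# message if the API key is missing or blank. `env` defaults to ENV but
# accepts any Hash-like object, which makes the check easy to test.
def openai_api_key_sketch(env = ENV)
  key = env["OPENAI_API_KEY"]
  raise KeyError, "OPENAI_API_KEY is not set" if key.nil? || key.strip.empty?

  key
end
```

Calling `openai_api_key_sketch` at the top of a script surfaces a missing key immediately rather than partway through a batch of embedding requests.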
Working with quantized vectors
VectorStore supports 1-bit vector quantization, so vectors can be stored in a bitfield (using an ASCII-encoded string with 8 bits per character for portability) for a significant reduction in memory use. The cost is accuracy, especially on low-dimension vectors; high-dimension vectors, such as the text embeddings from OpenAI's API (see above), will fare a LOT better. Initialize the store with the quantized: true option:
store = VectorStore.new(quantized: true)
store.add("vectorQ", [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
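To make the idea concrete, here is a rough sketch of one way 1-bit quantization can work: keep only the sign of each component and pack eight components into each byte. The helper names below are hypothetical, and the packing via `Array#pack("B*")` and the bit-agreement score are assumptions for illustration; VectorStore's actual encoding and comparison may differ.

```ruby
# Hypothetical helpers illustrating 1-bit quantization; not the VectorStore
# implementation.

# Keep one bit per component (1 for >= 0, 0 for negative) and pack the
# bits into a binary string, 8 components per byte, most significant bit first.
def quantize_sketch(vector)
  bits = vector.map { |x| x >= 0 ? "1" : "0" }.join
  [bits].pack("B*")
end

# Fraction of components whose sign bits agree, for two packed strings that
# each cover `dims` components. Padding bits are zero in both, so they never differ.
def bit_similarity_sketch(packed_a, packed_b, dims)
  differing = packed_a.bytes.zip(packed_b.bytes).sum { |a, b| (a ^ b).to_s(2).count("1") }
  1.0 - differing.to_f / dims
end

v = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
packed = quantize_sketch(v)
puts packed.bytesize                         # 8 components fit in 1 byte
puts bit_similarity_sketch(packed, packed, v.length)
```

Under this scheme a 1536-component vector packs into 192 bytes, which is where the memory saving comes from. It also shows why short vectors lose so much accuracy: with only a handful of sign bits, many different vectors collapse to the same bit pattern.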
Running the Tests
This project uses Minitest. To run the tests:
rake test
License
This project is licensed under the MIT License. See the LICENSE file for details.