Class: BxBuilderChain::Vectorsearch::Base

Inherits:
Object
Includes:
DependencyHelper
Defined in:
lib/bx_builder_chain/vectorsearch/base.rb

Overview

Vector Databases

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a fixed number of dimensions, which can range from tens to thousands depending on the complexity and granularity of the data.

Available vector databases

Usage

  1. Pick one of the available vector databases.

  2. Review its documentation to install the required gems, create an account, get an API key, etc.

  3. Instantiate the vector database class:

    weaviate = BxBuilderChain::Vectorsearch::Weaviate.new(
      url:         ENV["WEAVIATE_URL"],
      api_key:     ENV["WEAVIATE_API_KEY"],
      table_name:  "Documents",
      llm:         :openai,              # or :cohere, :hugging_face, :google_palm, or :replicate
      llm_api_key: ENV["OPENAI_API_KEY"] # API key for the selected LLM
    )
    
    # You can instantiate other supported vector databases the same way:
    milvus   = BxBuilderChain::Vectorsearch::Milvus.new(...)
    qdrant   = BxBuilderChain::Vectorsearch::Qdrant.new(...)
    pinecone = BxBuilderChain::Vectorsearch::Pinecone.new(...)
    chroma   = BxBuilderChain::Vectorsearch::Chroma.new(...)
    pgvector = BxBuilderChain::Vectorsearch::Pgvector.new(...)
    

Schema Creation

`create_default_schema` creates a default schema in your vector database (`search` below stands for any of the instances created above):

search.create_default_schema

(We plan on offering customizable schema creation shortly.)

Adding Data

You can add data with:

  1. `add_data(paths:)` to add files of any supported type

    my_pdf = BxBuilderChain.root.join("path/to/my.pdf")
    my_text = BxBuilderChain.root.join("path/to/my.txt")
    my_docx = BxBuilderChain.root.join("path/to/my.docx")
    my_csv = BxBuilderChain.root.join("path/to/my.csv")
    
    search.add_data(paths: [my_pdf, my_text, my_docx, my_csv])
    
  2. `add_texts(texts:)` to add raw text strings only

    search.add_texts(
      texts: [
        "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
        "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
      ]
    )
    

Retrieving Data

`similarity_search_by_vector(embedding:, k:)` searches the vector database for the `k` closest embeddings.

search.similarity_search_by_vector(
  embedding: ...,
  k: # number of results to be retrieved
)
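To make "the `k` closest embeddings" concrete, here is a brute-force sketch over an in-memory list. The record shape (`{ text:, embedding: }`) and the helper names `euclidean_distance` and `nearest` are invented for illustration; real backends delegate the search to the database, and Euclidean distance is used here only for simplicity (this class's DEFAULT_METRIC is cosine).

```ruby
# Hypothetical brute-force k-nearest-neighbour search over an in-memory list.
# Real Vectorsearch backends run this inside the database instead.
def euclidean_distance(a, b)
  raise ArgumentError, "vectors must have the same dimension" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# records: array of { text:, embedding: } hashes (a made-up shape for this sketch).
# min_by(k) returns the k records with the smallest distance, closest first.
def nearest(records, embedding:, k:)
  records.min_by(k) { |record| euclidean_distance(record[:embedding], embedding) }
end

records = [
  { text: "cats", embedding: [1.0, 0.0] },
  { text: "dogs", embedding: [0.9, 0.2] },
  { text: "cars", embedding: [0.0, 1.0] }
]
nearest(records, embedding: [1.0, 0.1], k: 2).map { |r| r[:text] } # => ["cats", "dogs"]
```

A linear scan like this is fine for a handful of vectors; vector databases exist precisely because they index embeddings so this lookup stays fast at scale.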

`similarity_search(query:, k:)` generates an embedding for the query and searches the vector database for the `k` closest embeddings.

search.similarity_search(
  query: ...,
  k: # number of results to be retrieved
)
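Conceptually, `similarity_search` is `similarity_search_by_vector` preceded by an embedding step. The sketch shows that composition with two stand-ins: `embedder` plays the role of the configured LLM's embedding call and `search_by_vector` plays the backend's vector search. Both, and the `QueryPipeline` class, are hypothetical and not part of the gem's API.

```ruby
# Hypothetical composition: embed the query, then delegate to a vector search.
# `embedder` and `search_by_vector` are stand-ins, not BxBuilderChain APIs.
class QueryPipeline
  def initialize(embedder:, search_by_vector:)
    @embedder = embedder
    @search_by_vector = search_by_vector
  end

  def similarity_search(query:, k:)
    embedding = @embedder.call(query) # text -> vector
    @search_by_vector.call(embedding: embedding, k: k)
  end
end

# Toy stand-ins: a fake "embedding" and a search that echoes its inputs.
pipeline = QueryPipeline.new(
  embedder: ->(text) { [text.length.to_f, 1.0] },
  search_by_vector: ->(embedding:, k:) { { embedding: embedding, k: k } }
)
result = pipeline.similarity_search(query: "lorem", k: 3)
```

The same embedding model must be used for the query and for the stored texts, which is why this step normally lives inside the vector store rather than in caller code.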

`ask(question:)` generates an embedding for the passed-in question, searches the vector database for the closest embeddings, and then passes these as context to the LLM to generate an answer to the question.

search.ask(question: "What is lorem ipsum?")
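The `ask` flow described above can be sketched as four steps: retrieve, build a context, fill a prompt template, call the LLM. In this sketch `search` and `complete` are hypothetical lambdas standing in for the backend search and the LLM call, `ASK_TEMPLATE` is made up (it is not the gem's default prompt template), and `k: 4` is an arbitrary default.

```ruby
# Hypothetical sketch of the ask flow; none of these names are BxBuilderChain APIs.
ASK_TEMPLATE = "Context:\n%{context}\n\nQuestion: %{question}\nAnswer:"

def ask_sketch(question:, search:, complete:, k: 4)
  texts = search.call(query: question, k: k)   # 1. retrieve the closest texts
  context = texts.join("\n")                   # 2. build the context string
  prompt = format(ASK_TEMPLATE, context: context, question: question) # 3. fill the template
  complete.call(prompt)                        # 4. ask the LLM
end

search = ->(query:, k:) { ["Lorem Ipsum is dummy text of the printing industry."].first(k) }
echo_llm = ->(prompt) { prompt } # echoes the prompt so the result can be inspected
answer = ask_sketch(question: "What is lorem ipsum?", search: search, complete: echo_llm)
puts answer
```

Swapping `echo_llm` for a real completion call is the only change needed to turn the echo into an answer; the retrieval and prompt steps stay the same.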

Direct Known Subclasses

Pgvector
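Base is an abstract class: most of its methods only raise NotImplementedError, and concrete backends such as Pgvector override the ones they support. The following sketch mirrors that pattern in plain Ruby so it runs without the gem; `ToyBase` and `InMemoryStore` are invented names, not classes from BxBuilderChain.

```ruby
# Hypothetical mirror of the Base pattern: the abstract class raises
# NotImplementedError, and a subclass overrides only what it supports.
class ToyBase
  def add_texts(**kwargs)
    raise NotImplementedError, "#{self.class.name} does not support adding texts"
  end

  def similarity_search(**kwargs)
    raise NotImplementedError, "#{self.class.name} does not support similarity search"
  end
end

class InMemoryStore < ToyBase
  def initialize
    @texts = []
  end

  # Overrides the abstract method; returns the stored texts.
  def add_texts(texts:)
    @texts.concat(Array(texts))
  end
end

store = InMemoryStore.new
store.add_texts(texts: ["a", "b"]) # similarity_search still raises NotImplementedError
```

Because the error message interpolates `self.class.name`, a caller sees which backend lacks the feature rather than a generic message.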

Constant Summary

DEFAULT_METRIC = "cosine"
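DEFAULT_METRIC names cosine similarity: the dot product of two vectors divided by the product of their magnitudes. A minimal plain-Ruby sketch (not the gem's code; backends compute this inside the database), assuming both vectors are non-empty and non-zero:

```ruby
# Minimal sketch of cosine similarity; 1.0 means same direction, 0.0 orthogonal.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same dimension" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  magnitude = Math.sqrt(a.sum { |x| x**2 }) * Math.sqrt(b.sum { |x| x**2 })
  dot / magnitude
end

cosine_similarity([1, 0], [1, 0]) # => 1.0
cosine_similarity([1, 0], [0, 1]) # => 0.0
cosine_similarity([3, 4], [6, 8]) # => 1.0 (scaling a vector does not change the score)
```

The last call shows why cosine is a common default for text embeddings: it compares direction, not length.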

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from DependencyHelper

#depends_on

Constructor Details

#initialize(llm:) ⇒ Base

Returns a new instance of Base.

Parameters:

  • llm (Object)

    The LLM client to use



# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 89

def initialize(llm:)
  @llm = llm
end

Instance Attribute Details

#client ⇒ Object (readonly)

Returns the value of attribute client.



# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 84

def client
  @client
end

#llm ⇒ Object (readonly)

Returns the value of attribute llm.



# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 84

def llm
  @llm
end

#table_name ⇒ Object (readonly)

Returns the value of attribute table_name.



# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 84

def table_name
  @table_name
end

Class Method Details

.logger_options ⇒ Object



# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 154

def self.logger_options
  {
    color: :blue
  }
end

Instance Method Details

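#add_data(paths:) ⇒ Object

Loads each file in `paths` with BxBuilderChain::Loader, collects the text of every chunk, and passes the resulting list to `add_texts`.

Raises:

  • (ArgumentError) if no paths are provided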


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 139

def add_data(paths:)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  texts = Array(paths)
    .flatten
    .map do |path|
      data = BxBuilderChain::Loader.new(path)&.load&.chunks
      data.map { |chunk| chunk[:text] }
    end

  texts.flatten!

  add_texts(texts: texts)
end
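The data flow of `add_data` can be reproduced without the gem by swapping the loader for a stub. In this sketch `collect_texts` and `load_chunks` are hypothetical names; `load_chunks` stands in for `BxBuilderChain::Loader.new(path).load.chunks` and is assumed to return hashes with a `:text` key, as the method above expects.

```ruby
# Hypothetical re-creation of add_data's pipeline: load and chunk each path,
# pull out the chunk texts, and flatten everything into one list of strings.
def collect_texts(paths, load_chunks:)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  Array(paths).flatten.flat_map do |path|
    # Array(...) turns a nil loader result into an empty list.
    Array(load_chunks.call(path)).map { |chunk| chunk[:text] }
  end
end

fake_loader = lambda do |path|
  [{ text: "#{path} part 1" }, { text: "#{path} part 2" }]
end

texts = collect_texts(["a.txt", ["b.txt"]], load_chunks: fake_loader)
# => ["a.txt part 1", "a.txt part 2", "b.txt part 1", "b.txt part 2"]
```

`flat_map` combines the `map` and `flatten!` steps of the original into one pass; the resulting list is what would be handed to `add_texts(texts:)`.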

#add_texts(**kwargs) ⇒ Object

Method supported by Vectorsearch DB to add a list of texts to the index

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 109

def add_texts(**kwargs)
  raise NotImplementedError, "#{self.class.name} does not support adding texts"
end

#ask(**kwargs) ⇒ Object

Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 130

def ask(**kwargs)
  raise NotImplementedError, "#{self.class.name} does not support asking questions"
end

#create_default_schema ⇒ Object

Method supported by Vectorsearch DB to create a default schema

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 99

def create_default_schema
  raise NotImplementedError, "#{self.class.name} does not support creating a default schema"
end

#destroy_default_schema ⇒ Object

Method supported by Vectorsearch DB to delete the default schema

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 104

def destroy_default_schema
  raise NotImplementedError, "#{self.class.name} does not support deleting a default schema"
end

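#generate_prompt(question:, context:, prompt_template: nil) ⇒ Object

Builds the prompt sent to the LLM by substituting `context` and `question` into `prompt_template`, or into `BxBuilderChain.configuration.default_prompt_template` when no template is given.
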
# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 134

def generate_prompt(question:, context:, prompt_template: nil)
  template = prompt_template || BxBuilderChain.configuration.default_prompt_template
  template % {context: context, question: question}
end
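`generate_prompt` relies on Ruby's `String#%` with a hash, which replaces each `%{name}` reference with the matching value. The template below is made up for illustration; the gem's actual default lives in `BxBuilderChain.configuration.default_prompt_template`.

```ruby
# Sketch of the substitution generate_prompt performs via String#%.
# This template is hypothetical, not the gem's default.
template = "Answer using only this context:\n%{context}\n\nQuestion: %{question}"

prompt = template % { context: "Lorem Ipsum is dummy text.", question: "What is lorem ipsum?" }
puts prompt
```

Two things to keep in mind when passing a custom `prompt_template`: a literal percent sign must be written as `%%`, and a template that references a key other than `context` or `question` raises KeyError.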

#get_default_schema ⇒ Object

Method supported by Vectorsearch DB to retrieve a default schema

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 94

def get_default_schema
  raise NotImplementedError, "#{self.class.name} does not support retrieving a default schema"
end

#similarity_search(**kwargs) ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 119

def similarity_search(**kwargs)
  raise NotImplementedError, "#{self.class.name} does not support similarity search"
end

#similarity_search_by_vector(**kwargs) ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index by the passed-in vector. You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 125

def similarity_search_by_vector(**kwargs)
  raise NotImplementedError, "#{self.class.name} does not support similarity search by vector"
end
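Because the vector you pass in must come from the same model as the stored embeddings, a caller might add a cheap sanity check before searching. This is a hypothetical guard (`check_dimensions!` and `DimensionMismatchError` are invented names); matching lengths do not prove the vectors came from the same model, so it only catches the most obvious mismatch.

```ruby
# Hypothetical guard: reject a query vector whose length differs from the
# dimension the index was built with.
class DimensionMismatchError < StandardError; end

def check_dimensions!(query_embedding, index_dimension)
  return query_embedding if query_embedding.size == index_dimension

  raise DimensionMismatchError,
        "query vector has #{query_embedding.size} dimensions, index expects #{index_dimension}"
end

check_dimensions!([0.1, 0.2, 0.3], 3) # returns the vector unchanged
```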

#update_texts(**kwargs) ⇒ Object

Method supported by Vectorsearch DB to update a list of texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/bx_builder_chain/vectorsearch/base.rb', line 114

def update_texts(**kwargs)
  raise NotImplementedError, "#{self.class.name} does not support updating texts"
end