Class: Langchain::Vectorsearch::Base
- Inherits: Object
- Extended by: Forwardable
- Includes: DependencyHelper
- Defined in: lib/langchain/vectorsearch/base.rb
Overview
Vector Databases
A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a certain number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data.
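To make "closeness" between vectors concrete, here is a minimal sketch of cosine similarity, the metric named by this class's DEFAULT_METRIC constant. The helper name `cosine_similarity` and the three-dimensional sample vectors are illustrative assumptions and are not part of langchainrb; real embeddings have far more dimensions.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Hypothetical helper for illustration only; not a langchainrb method.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  raise ArgumentError, "cosine similarity is undefined for a zero vector" if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Made-up 3-dimensional "embeddings".
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

puts cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```

Vectors that point in similar directions score close to 1, while unrelated (orthogonal) vectors score close to 0.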
Available vector databases
Usage
- Pick a vector database from the list.
- Review its documentation to install the required gems, create an account, get an API key, etc.
- Instantiate the vector database class:
weaviate = Langchain::Vectorsearch::Weaviate.new(
  url: ENV["WEAVIATE_URL"],
  api_key: ENV["WEAVIATE_API_KEY"],
  index_name: "Documents",
  llm: Langchain::LLM::OpenAI.new(api_key:)
)

# You can instantiate other supported vector databases the same way:
epsilla = Langchain::Vectorsearch::Epsilla.new(...)
milvus = Langchain::Vectorsearch::Milvus.new(...)
qdrant = Langchain::Vectorsearch::Qdrant.new(...)
pinecone = Langchain::Vectorsearch::Pinecone.new(...)
chroma = Langchain::Vectorsearch::Chroma.new(...)
pgvector = Langchain::Vectorsearch::Pgvector.new(...)
Schema Creation
`create_default_schema()` creates a default schema in your vector database.
search.create_default_schema
(We plan on offering customizable schema creation shortly)
Adding Data
You can add data with:
- `add_data(paths:)` to add data from files of any supported type
my_pdf = Langchain.root.join("path/to/my.pdf")
my_text = Langchain.root.join("path/to/my.txt")
my_docx = Langchain.root.join("path/to/my.docx")
my_csv = Langchain.root.join("path/to/my.csv")
search.add_data(paths: [my_pdf, my_text, my_docx, my_csv])
- `add_texts(texts:)` to add only textual data
search.add_texts(
  texts: [
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
    "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
  ]
)
Retrieving Data
`similarity_search_by_vector(embedding:, k:)` searches the vector database for the `k` closest embeddings.
search.similarity_search_by_vector(
embedding: ...,
k: # number of results to be retrieved
)
`similarity_search(query:, k:)` generates an embedding for the query and searches the vector database for the `k` closest embeddings.
search.similarity_search(
query: ...,
k: # number of results to be retrieved
)
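As a rough illustration of what "the `k` closest embeddings" means, the sketch below runs a brute-force search over a tiny in-memory index. The names `INDEX`, `cosine`, and `top_k` and the two-dimensional vectors are hypothetical; the supported vector databases typically use approximate indexes rather than scanning every entry.

```ruby
# Hypothetical in-memory index: each entry pairs a text with a made-up embedding.
INDEX = [
  { text: "cats purr", embedding: [1.0, 0.0] },
  { text: "kittens meow", embedding: [0.9, 0.1] },
  { text: "cars drive", embedding: [0.0, 1.0] }
].freeze

# Cosine similarity of two non-zero vectors of equal length.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Brute-force nearest-neighbour search: score every entry, keep the best k,
# most similar first.
def top_k(embedding:, k:)
  INDEX.max_by(k) { |entry| cosine(entry[:embedding], embedding) }
       .map { |entry| entry[:text] }
end

puts top_k(embedding: [1.0, 0.0], k: 2).inspect
```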
`ask(question:)` generates an embedding for the passed-in question, searches the vector database for the closest embeddings, and then passes the matching texts as context to the LLM to generate an answer to the question.
search.ask(question: "What is lorem ipsum?")
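The embed, search, and answer steps described for `ask` can be sketched end to end with toy stand-ins. Everything here is an assumption made for illustration: `ToyLLM` and `ToyStore` are not langchainrb classes, the keyword-count "embedding" is a placeholder for a real embedding model, and `complete` simply echoes its prompt so the flow can be inspected.

```ruby
# Toy stand-in for an LLM client; NOT a langchainrb class.
class ToyLLM
  VOCAB = %w[lorem ipsum ruby vector].freeze

  # Placeholder "embedding": how often each vocabulary word occurs in the text.
  def embed(text)
    words = text.downcase.scan(/[a-z]+/)
    VOCAB.map { |term| words.count(term).to_f }
  end

  # Echoes the prompt instead of calling a model.
  def complete(prompt)
    "ANSWER BASED ON:\n#{prompt}"
  end
end

# Toy in-memory store that mirrors the add_texts / similarity_search / ask flow.
class ToyStore
  def initialize(llm:)
    @llm = llm
    @entries = []
  end

  def add_texts(texts:)
    texts.each { |text| @entries << [text, @llm.embed(text)] }
  end

  def similarity_search(query:, k: 1)
    query_embedding = @llm.embed(query)
    @entries.max_by(k) { |_text, embedding| dot(query_embedding, embedding) }.map(&:first)
  end

  def ask(question:, k: 1)
    context = similarity_search(query: question, k: k).join("\n---\n")
    @llm.complete("Context:\n#{context}\n\nQuestion: #{question}")
  end

  private

  def dot(a, b)
    a.zip(b).sum { |x, y| x * y }
  end
end

store = ToyStore.new(llm: ToyLLM.new)
store.add_texts(texts: ["Lorem ipsum is dummy text", "Ruby is a language"])
puts store.ask(question: "What is lorem ipsum?")
```

Only the retrieved text reaches the prompt, which is how the answer stays grounded in the stored data.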
Direct Known Subclasses
Chroma, Elasticsearch, Epsilla, Hnswlib, Milvus, Pgvector, Pinecone, Qdrant, Weaviate
Constant Summary
- DEFAULT_METRIC = "cosine"
Instance Attribute Summary
- #client ⇒ Object (readonly)
  Returns the value of attribute client.
- #index_name ⇒ Object (readonly)
  Returns the value of attribute index_name.
- #llm ⇒ Object (readonly)
  Returns the value of attribute llm.
Instance Method Summary
- #add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ Object
- #add_texts ⇒ Object
  Method supported by Vectorsearch DB to add a list of texts to the index.
- #ask ⇒ Object
  Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.
- #create_default_schema ⇒ Object
  Method supported by Vectorsearch DB to create a default schema.
- #destroy_default_schema ⇒ Object
  Method supported by Vectorsearch DB to delete the default schema.
- #generate_hyde_prompt(question:) ⇒ String
  HyDE-style prompt.
- #generate_rag_prompt(question:, context:) ⇒ String
  Retrieval Augmented Generation (RAG).
- #get_default_schema ⇒ Object
  Method supported by Vectorsearch DB to retrieve a default schema.
- #initialize(llm:) ⇒ Base (constructor)
  A new instance of Base.
- #remove_texts ⇒ Object
  Method supported by Vectorsearch DB to delete a list of texts from the index.
- #similarity_search ⇒ Object
  Method supported by Vectorsearch DB to search for similar texts in the index.
- #similarity_search_by_vector ⇒ Object
  Method supported by Vectorsearch DB to search for similar texts in the index by the passed-in vector.
- #similarity_search_with_hyde(query:, k: 4) ⇒ String
  Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496
- #update_texts ⇒ Object
  Method supported by Vectorsearch DB to update a list of texts in the index.
Methods included from DependencyHelper
Constructor Details
#initialize(llm:) ⇒ Base
Returns a new instance of Base.
# File 'lib/langchain/vectorsearch/base.rb', line 98
def initialize(llm:)
  @llm = llm
end
Instance Attribute Details
#client ⇒ Object (readonly)
Returns the value of attribute client.
# File 'lib/langchain/vectorsearch/base.rb', line 93
def client
  @client
end
#index_name ⇒ Object (readonly)
Returns the value of attribute index_name.
# File 'lib/langchain/vectorsearch/base.rb', line 93
def index_name
  @index_name
end
#llm ⇒ Object (readonly)
Returns the value of attribute llm.
# File 'lib/langchain/vectorsearch/base.rb', line 93
def llm
  @llm
end
Instance Method Details
#add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ Object
# File 'lib/langchain/vectorsearch/base.rb', line 183
def add_data(paths:, options: {}, chunker: Langchain::Chunker::Text)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  texts = Array(paths)
    .flatten
    .map do |path|
      # Load each file, split it into chunks, and keep only the chunk text
      data = Langchain::Loader.new(path, options, chunker: chunker)&.load&.chunks
      data.map { |chunk| chunk.text }
    end

  texts.flatten!

  add_texts(texts: texts)
end
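The path handling in `add_data` accepts a single path, a flat array, or nested arrays. The sketch below isolates that normalization. `collect_texts` and its `loader:` callable are hypothetical stand-ins for the `Langchain::Loader` call, which is assumed here to yield an array of chunk strings per path.

```ruby
# Hypothetical helper that mirrors the normalization performed by add_data.
# `loader` is any callable that returns an array of text chunks for one path.
def collect_texts(paths, loader:)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  # Array() wraps a single path (and turns nil into []);
  # flatten removes any nesting before each path is loaded.
  Array(paths).flatten.flat_map { |path| loader.call(path) }
end

# Fake loader that pretends every file splits into two chunks.
fake_loader = ->(path) { ["#{path}: chunk 1", "#{path}: chunk 2"] }

puts collect_texts("a.txt", loader: fake_loader).inspect
puts collect_texts([["a.txt"], "b.txt"], loader: fake_loader).inspect
```

Passing `nil` or an empty array raises `ArgumentError`, matching the guard clause in the method above.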
#add_texts ⇒ Object
Method supported by Vectorsearch DB to add a list of texts to the index
# File 'lib/langchain/vectorsearch/base.rb', line 118
def add_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support adding texts"
end
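Most methods on this class follow the pattern shown for `add_texts`: the base raises `NotImplementedError` and each subclass overrides what its database supports. A minimal sketch of that pattern, using the hypothetical classes `SketchBase` and `SketchMemoryStore` (neither is part of langchainrb):

```ruby
# Hypothetical base class reproducing the "raise unless overridden" pattern.
class SketchBase
  def initialize(llm:)
    @llm = llm
  end

  def add_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support adding texts"
  end

  def remove_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support deleting texts"
  end
end

# Hypothetical subclass that supports adding texts but not removing them.
class SketchMemoryStore < SketchBase
  attr_reader :texts

  def initialize(llm:)
    super
    @texts = []
  end

  def add_texts(texts:)
    @texts.concat(texts)
  end
end

store = SketchMemoryStore.new(llm: nil)
store.add_texts(texts: ["first", "second"])

begin
  store.remove_texts(ids: [1])
rescue NotImplementedError => e
  puts e.message
end
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; it has to be named explicitly, as above.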
#ask ⇒ Object
Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.
# File 'lib/langchain/vectorsearch/base.rb', line 155
def ask(...)
  raise NotImplementedError, "#{self.class.name} does not support asking questions"
end
#create_default_schema ⇒ Object
Method supported by Vectorsearch DB to create a default schema
# File 'lib/langchain/vectorsearch/base.rb', line 108
def create_default_schema
  raise NotImplementedError, "#{self.class.name} does not support creating a default schema"
end
#destroy_default_schema ⇒ Object
Method supported by Vectorsearch DB to delete the default schema
# File 'lib/langchain/vectorsearch/base.rb', line 113
def destroy_default_schema
  raise NotImplementedError, "#{self.class.name} does not support deleting a default schema"
end
#generate_hyde_prompt(question:) ⇒ String
HyDE-style prompt
# File 'lib/langchain/vectorsearch/base.rb', line 163
def generate_hyde_prompt(question:)
  prompt_template = Langchain::Prompt.load_from_path(
    # Zero-shot prompt to generate a hypothetical document based on a given question
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/hyde.yaml")
  )
  prompt_template.format(question: question)
end
#generate_rag_prompt(question:, context:) ⇒ String
Retrieval Augmented Generation (RAG)
# File 'lib/langchain/vectorsearch/base.rb', line 176
def generate_rag_prompt(question:, context:)
  prompt_template = Langchain::Prompt.load_from_path(
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/rag.yaml")
  )
  prompt_template.format(question: question, context: context)
end
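The contents of `rag.yaml` are not shown on this page, so the following is only a sketch of the general idea: a template with `question` and `context` placeholders is filled in before being sent to the LLM. The template wording and the helper `build_rag_prompt` are assumptions, and Ruby's built-in `format` stands in for `Langchain::Prompt`, whose own placeholder syntax may differ.

```ruby
# Hypothetical template text; the real wording lives in prompts/rag.yaml.
RAG_TEMPLATE = <<~PROMPT
  Context:
  %{context}
  ---
  Question: %{question}
  ---
  Answer:
PROMPT

# Fills the placeholders using Kernel#format with named references.
def build_rag_prompt(question:, context:)
  format(RAG_TEMPLATE, question: question, context: context)
end

puts build_rag_prompt(
  question: "What is lorem ipsum?",
  context: "Lorem Ipsum is simply dummy text of the printing and typesetting industry."
)
```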
#get_default_schema ⇒ Object
Method supported by Vectorsearch DB to retrieve a default schema
# File 'lib/langchain/vectorsearch/base.rb', line 103
def get_default_schema
  raise NotImplementedError, "#{self.class.name} does not support retrieving a default schema"
end
#remove_texts ⇒ Object
Method supported by Vectorsearch DB to delete a list of texts from the index
# File 'lib/langchain/vectorsearch/base.rb', line 128
def remove_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support deleting texts"
end
#similarity_search ⇒ Object
Method supported by Vectorsearch DB to search for similar texts in the index
# File 'lib/langchain/vectorsearch/base.rb', line 133
def similarity_search(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search"
end
#similarity_search_by_vector ⇒ Object
Method supported by Vectorsearch DB to search for similar texts in the index by the passed in vector. You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.
# File 'lib/langchain/vectorsearch/base.rb', line 150
def similarity_search_by_vector(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search by vector"
end
#similarity_search_with_hyde(query:, k: 4) ⇒ String
Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496
# File 'lib/langchain/vectorsearch/base.rb', line 143
def similarity_search_with_hyde(query:, k: 4)
  hyde_completion = llm.complete(prompt: generate_hyde_prompt(question: query)).completion
  similarity_search(query: hyde_completion, k: k)
end
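The two-step HyDE flow above (ask the LLM for a hypothetical answer, then search with that answer instead of the raw query) can be sketched with toy parts. All names here (`HydeCompletion`, `CannedLLM`, `HydeSketch`) are hypothetical, the LLM returns a canned passage, and word overlap stands in for embedding similarity.

```ruby
# Toy response object exposing #completion, mirroring the call chain above.
HydeCompletion = Struct.new(:completion)

# Hypothetical LLM stand-in that always returns one canned passage.
class CannedLLM
  def initialize(passage)
    @passage = passage
  end

  def complete(prompt:)
    HydeCompletion.new(@passage)
  end
end

class HydeSketch
  def initialize(llm:, documents:)
    @llm = llm
    @documents = documents
  end

  # Word overlap stands in for embedding similarity in this sketch.
  def similarity_search(query:, k: 4)
    query_words = words(query)
    @documents.max_by(k) { |doc| (words(doc) & query_words).size }
  end

  # Search with the LLM's hypothetical answer rather than the raw query.
  def similarity_search_with_hyde(query:, k: 4)
    hypothetical = @llm.complete(prompt: "Write a passage answering: #{query}").completion
    similarity_search(query: hypothetical, k: k)
  end

  private

  def words(text)
    text.downcase.scan(/[a-z]+/).uniq
  end
end

documents = [
  "Rayleigh scattering affects shorter wavelengths of sunlight more strongly",
  "The blue whale is the largest animal"
]
llm = CannedLLM.new("Sunlight undergoes Rayleigh scattering, and shorter wavelengths scatter more")
search = HydeSketch.new(llm: llm, documents: documents)

puts search.similarity_search(query: "Why is the sky blue?", k: 1).inspect
puts search.similarity_search_with_hyde(query: "Why is the sky blue?", k: 1).inspect
```

In this toy setup the raw question shares surface words only with the whale sentence, while the hypothetical answer shares vocabulary with the relevant document, which is the intuition behind HyDE.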
#update_texts ⇒ Object
Method supported by Vectorsearch DB to update a list of texts in the index
# File 'lib/langchain/vectorsearch/base.rb', line 123
def update_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support updating texts"
end