Class: Langchain::Vectorsearch::Base

Inherits:
Object
Extended by:
Forwardable
Includes:
DependencyHelper
Defined in:
lib/langchain/vectorsearch/base.rb

Overview

Vector Databases

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a certain number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data.

Available vector databases

Usage

  1. Pick a vector database from the list of available vector databases.

  2. Review its documentation to install the required gems, create an account, get an API key, and so on.

  3. Instantiate the vector database class:

    weaviate = Langchain::Vectorsearch::Weaviate.new(
      url:         ENV["WEAVIATE_URL"],
      api_key:     ENV["WEAVIATE_API_KEY"],
      index_name:  "Documents",
      llm:         Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
    )
    
    # You can instantiate other supported vector databases the same way:
    epsilla  = Langchain::Vectorsearch::Epsilla.new(...)
    milvus   = Langchain::Vectorsearch::Milvus.new(...)
    qdrant   = Langchain::Vectorsearch::Qdrant.new(...)
    pinecone = Langchain::Vectorsearch::Pinecone.new(...)
    chroma   = Langchain::Vectorsearch::Chroma.new(...)
    pgvector = Langchain::Vectorsearch::Pgvector.new(...)
    

Schema Creation

`create_default_schema()` creates a default schema in your vector database.

search.create_default_schema

(We plan on offering customizable schema creation shortly)

Adding Data

You can add data with:

  1. `add_data(paths:)` to add files of any supported data type

    my_pdf = Langchain.root.join("path/to/my.pdf")
    my_text = Langchain.root.join("path/to/my.txt")
    my_docx = Langchain.root.join("path/to/my.docx")
    my_csv = Langchain.root.join("path/to/my.csv")
    
    search.add_data(paths: [my_pdf, my_text, my_docx, my_csv])
    
  2. `add_texts(texts:)` to add only textual data

    search.add_texts(
      texts: [
        "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
        "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
      ]
    )
    

Retrieving Data

`similarity_search_by_vector(embedding:, k:)` searches the vector database for the `k` embeddings closest to the given embedding.

search.similarity_search_by_vector(
  embedding: ...,
  k: # number of results to be retrieved
)
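
As a rough illustration of what "closest `k` embeddings" means under the cosine metric (the value of DEFAULT_METRIC on this class), the sketch below ranks a few in-memory vectors against a query vector. The `cosine_similarity` and `top_k` helpers and the sample data are hypothetical and are not part of Langchain; a real vector database would typically use an index rather than the linear scan shown here.

```ruby
# Hypothetical helper: cosine similarity between two equal-length vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical helper: the k entries whose embeddings are most similar to the query.
def top_k(store, embedding:, k:)
  store.max_by(k) { |entry| cosine_similarity(entry[:embedding], embedding) }
end

# Made-up two-dimensional embeddings, for illustration only.
store = [
  { text: "cats",   embedding: [1.0, 0.0] },
  { text: "dogs",   embedding: [0.8, 0.6] },
  { text: "stocks", embedding: [0.0, 1.0] }
]

results = top_k(store, embedding: [1.0, 0.1], k: 2)
puts results.map { |entry| entry[:text] }.inspect
```

The query vector points almost along the first axis, so "cats" ranks first and "dogs" second.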

`similarity_search(query:, k:)` generates an embedding for the query and searches the vector database for the `k` closest embeddings.

search.similarity_search(
  query: ...,
  k: # number of results to be retrieved
)

`ask(question:)` generates an embedding for the passed-in question, searches the vector database for the closest embeddings, and then passes the matching texts as context to the LLM to generate an answer to the question.

search.ask(question: "What is lorem ipsum?")
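
The flow described for `ask` can be sketched with stand-in objects. Everything below is hypothetical: `FakeLLM`, `TinyStore`, the letter-count "embedding", the squared-distance ranking, and the prompt wording are assumptions made so the example runs without the gem or an API key. The real method is implemented by each vector database class.

```ruby
# Hypothetical stand-in for an LLM client: "embeds" text as letter counts
# and "completes" a prompt by echoing its context line.
class FakeLLM
  def embed(text)
    ("a".."e").map { |letter| text.downcase.count(letter).to_f }
  end

  def complete(prompt)
    "Answer based on: #{prompt[/Context: (.*)/, 1]}"
  end
end

# Hypothetical in-memory store illustrating the steps of `ask`.
class TinyStore
  def initialize(llm:)
    @llm = llm
    @entries = []
  end

  def add_texts(texts:)
    texts.each { |text| @entries << { text: text, embedding: @llm.embed(text) } }
  end

  def similarity_search(query:, k: 1)
    query_embedding = @llm.embed(query)
    @entries.min_by(k) { |entry| squared_distance(entry[:embedding], query_embedding) }
  end

  def ask(question:)
    # 1. Embed the question and fetch the closest stored texts.
    context = similarity_search(query: question, k: 1).map { |entry| entry[:text] }.join("\n")
    # 2. Pass the retrieved texts to the LLM as context for the answer.
    @llm.complete("Context: #{context}\nQuestion: #{question}")
  end

  private

  def squared_distance(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

store = TinyStore.new(llm: FakeLLM.new)
store.add_texts(texts: ["abba abba", "eee ddd"])
answer = store.ask(question: "a bad cab")
puts answer
```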

Constant Summary

DEFAULT_METRIC =
"cosine"

Instance Attribute Summary

Instance Method Summary

Methods included from DependencyHelper

#depends_on

Constructor Details

#initialize(llm:) ⇒ Base

Returns a new instance of Base.

Parameters:

  • llm (Object)

    The LLM client to use



# File 'lib/langchain/vectorsearch/base.rb', line 98

def initialize(llm:)
  @llm = llm
end

Instance Attribute Details

#client ⇒ Object (readonly)

Returns the value of attribute client.



# File 'lib/langchain/vectorsearch/base.rb', line 93

def client
  @client
end

#index_name ⇒ Object (readonly)

Returns the value of attribute index_name.



# File 'lib/langchain/vectorsearch/base.rb', line 93

def index_name
  @index_name
end

#llm ⇒ Object (readonly)

Returns the value of attribute llm.



# File 'lib/langchain/vectorsearch/base.rb', line 93

def llm
  @llm
end

Instance Method Details

#add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/langchain/vectorsearch/base.rb', line 183

def add_data(paths:, options: {}, chunker: Langchain::Chunker::Text)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  texts = Array(paths)
    .flatten
    .map do |path|
      data = Langchain::Loader.new(path, options, chunker: chunker)&.load&.chunks
      data.map { |chunk| chunk.text }
    end

  texts.flatten!

  add_texts(texts: texts)
end
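
To make the data flow in `add_data` concrete, the sketch below replays the same steps (normalize the paths, load and chunk each file, flatten, then hand the texts on) with a stand-in loader. `load_chunks` and `collect_texts` are hypothetical names; `load_chunks` simply splits a file on blank lines and is only a placeholder for Langchain::Loader and the chunker, which behave differently.

```ruby
require "tempfile"

# Hypothetical stand-in for Langchain::Loader: one chunk per blank-line-separated paragraph.
def load_chunks(path)
  File.read(path).split(/\n{2,}/).map(&:strip).reject(&:empty?)
end

# Mirrors the shape of add_data: accepts a single path or nested arrays of paths
# and returns the flat list of texts that would be passed to add_texts.
def collect_texts(paths:)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  Array(paths).flatten.flat_map { |path| load_chunks(path) }
end

first = Tempfile.new(["first", ".txt"])
first.write("Alpha paragraph.\n\nBeta paragraph.\n")
first.close

second = Tempfile.new(["second", ".txt"])
second.write("Gamma paragraph.\n")
second.close

# Nested arrays are flattened before loading, as in add_data.
texts = collect_texts(paths: [first.path, [second.path]])
puts texts.inspect

first.unlink
second.unlink
```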

#add_texts ⇒ Object

Method supported by Vectorsearch DB to add a list of texts to the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 118

def add_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support adding texts"
end
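
Because the base implementation only raises NotImplementedError, each vector database class is expected to override the method. The sketch below shows that pattern with a stand-in base class; `SketchBase`, `MemoryStore`, and `ReadOnlyStore` are hypothetical names used so the example runs without the gem.

```ruby
# Hypothetical stand-in mirroring the raising default shown above.
class SketchBase
  def add_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support adding texts"
  end
end

# Hypothetical adapter that overrides add_texts with an in-memory list.
class MemoryStore < SketchBase
  attr_reader :texts

  def initialize
    @texts = []
  end

  def add_texts(texts:)
    @texts.concat(Array(texts))
  end
end

# An adapter that does not override the method inherits the raising default.
class ReadOnlyStore < SketchBase; end

store = MemoryStore.new
store.add_texts(texts: ["first", "second"])
puts store.texts.size

begin
  ReadOnlyStore.new.add_texts(texts: ["ignored"])
rescue NotImplementedError => e
  # NotImplementedError is not a StandardError, so it must be rescued by name.
  error_message = e.message
  puts error_message
end
```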

#ask ⇒ Object

Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 155

def ask(...)
  raise NotImplementedError, "#{self.class.name} does not support asking questions"
end

#create_default_schema ⇒ Object

Method supported by Vectorsearch DB to create a default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 108

def create_default_schema
  raise NotImplementedError, "#{self.class.name} does not support creating a default schema"
end

#destroy_default_schema ⇒ Object

Method supported by Vectorsearch DB to delete the default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 113

def destroy_default_schema
  raise NotImplementedError, "#{self.class.name} does not support deleting a default schema"
end

#generate_hyde_prompt(question:) ⇒ String

HyDE-style prompt

Parameters:

  • question (String)

    User’s question

Returns:

  • (String)

    Prompt



# File 'lib/langchain/vectorsearch/base.rb', line 163

def generate_hyde_prompt(question:)
  prompt_template = Langchain::Prompt.load_from_path(
    # Zero-shot prompt to generate a hypothetical document based on a given question
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/hyde.yaml")
  )
  prompt_template.format(question: question)
end

#generate_rag_prompt(question:, context:) ⇒ String

Retrieval Augmented Generation (RAG)

Parameters:

  • question (String)

    User’s question

  • context (String)

    The context to synthesize the answer from

Returns:

  • (String)

    Prompt



# File 'lib/langchain/vectorsearch/base.rb', line 176

def generate_rag_prompt(question:, context:)
  prompt_template = Langchain::Prompt.load_from_path(
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/rag.yaml")
  )
  prompt_template.format(question: question, context: context)
end
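
The effect of formatting a prompt template with a question and a context can be sketched with plain string formatting. The template text below is hypothetical, since the contents of rag.yaml are not shown on this page, and `build_rag_prompt` is an illustrative helper that uses Kernel#format rather than Langchain::Prompt.

```ruby
# Hypothetical template text; the actual contents of rag.yaml are not shown here.
RAG_TEMPLATE = "Context:\n%{context}\n\nQuestion: %{question}\nAnswer:"

# Fill the named placeholders with the retrieved context and the user's question.
def build_rag_prompt(question:, context:)
  format(RAG_TEMPLATE, question: question, context: context)
end

prompt = build_rag_prompt(
  question: "What is lorem ipsum?",
  context: "Lorem Ipsum is simply dummy text."
)
puts prompt
```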

#get_default_schema ⇒ Object

Method supported by Vectorsearch DB to retrieve a default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 103

def get_default_schema
  raise NotImplementedError, "#{self.class.name} does not support retrieving a default schema"
end

#remove_texts ⇒ Object

Method supported by Vectorsearch DB to delete a list of texts from the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 128

def remove_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support deleting texts"
end

#similarity_search ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 133

def similarity_search(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search"
end

#similarity_search_by_vector ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index by the passed-in vector. You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 150

def similarity_search_by_vector(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search by vector"
end

#similarity_search_with_hyde(query:, k: 4) ⇒ String

Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496

Parameters:

  • query (String)

    The query to search for

  • k (Integer) (defaults to: 4)

    The number of results to return

Returns:

  • (String)

    Response



# File 'lib/langchain/vectorsearch/base.rb', line 143

def similarity_search_with_hyde(query:, k: 4)
  hyde_completion = llm.complete(prompt: generate_hyde_prompt(question: query)).completion
  similarity_search(query: hyde_completion, k: k)
end
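
The two steps of HyDE can be sketched with stubs: the LLM first writes a hypothetical document that answers the query, and that document (not the raw query) is what gets searched. `Completion`, `StubLLM`, `StubSearch`, and the prompt wording are hypothetical; the real prompt comes from hyde.yaml and the real response object comes from the LLM client.

```ruby
# Hypothetical response wrapper exposing #completion, like the response used above.
Completion = Struct.new(:completion)

# Hypothetical LLM stub that returns a canned hypothetical document.
class StubLLM
  def complete(prompt:)
    Completion.new("Lorem ipsum is placeholder text used in typesetting.")
  end
end

# Hypothetical search class that records which query reached similarity_search.
class StubSearch
  attr_reader :llm, :last_query

  def initialize(llm:)
    @llm = llm
  end

  def similarity_search(query:, k: 4)
    @last_query = query
    ["result for: #{query}"].first(k)
  end

  def similarity_search_with_hyde(query:, k: 4)
    # Step 1: ask the LLM for a hypothetical document answering the query.
    hypothetical_document = llm.complete(prompt: "Write a passage answering: #{query}").completion
    # Step 2: search with the hypothetical document instead of the raw query.
    similarity_search(query: hypothetical_document, k: k)
  end
end

search = StubSearch.new(llm: StubLLM.new)
results = search.similarity_search_with_hyde(query: "What is lorem ipsum?", k: 1)
puts search.last_query
```

The recorded query is the generated passage, which shows that the search runs on the LLM's hypothetical document rather than on the user's original question.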

#update_texts ⇒ Object

Method supported by Vectorsearch DB to update a list of texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 123

def update_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support updating texts"
end