Class: Langchain::Vectorsearch::Base

Inherits:
Object
Extended by:
Forwardable
Includes:
DependencyHelper
Defined in:
lib/langchain/vectorsearch/base.rb

Overview

Vector Databases

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a fixed number of dimensions, which can range from tens to thousands, depending on the complexity and granularity of the data.

Available vector databases

Usage

  1. Pick a vector database from the list of available vector databases.

  2. Review its documentation to install the required gems, create an account, get an API key, and so on.

  3. Instantiate the vector database class:

    weaviate = Langchain::Vectorsearch::Weaviate.new(
      url:         ENV["WEAVIATE_URL"],
      api_key:     ENV["WEAVIATE_API_KEY"],
      index_name:  "Documents",
      llm:         Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
    )
    
    # You can instantiate other supported vector databases the same way:
    milvus   = Langchain::Vectorsearch::Milvus.new(...)
    qdrant   = Langchain::Vectorsearch::Qdrant.new(...)
    pinecone = Langchain::Vectorsearch::Pinecone.new(...)
    chroma   = Langchain::Vectorsearch::Chroma.new(...)
    pgvector = Langchain::Vectorsearch::Pgvector.new(...)
    

Schema Creation

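`create_default_schema` creates a default schema in your vector database. In the examples below, `search` stands for any instantiated vector database, such as `weaviate` above.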

search.create_default_schema

(We plan on offering customizable schema creation shortly)

Adding Data

You can add data with:

  1. `add_data(paths:)` to add data from files of any supported type

    my_pdf = Langchain.root.join("path/to/my.pdf")
    my_text = Langchain.root.join("path/to/my.txt")
    my_docx = Langchain.root.join("path/to/my.docx")
    my_csv = Langchain.root.join("path/to/my.csv")
    
    search.add_data(paths: [my_pdf, my_text, my_docx, my_csv])
    
  2. `add_texts(texts:)` to add only textual data

    search.add_texts(
      texts: [
        "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
        "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
      ]
    )
    

Retrieving Data

`similarity_search_by_vector(embedding:, k:)` searches the vector database for the `k` closest embeddings.

search.similarity_search_by_vector(
  embedding: ...,
  k: # number of results to be retrieved
)
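To make "closest `k` embeddings" concrete, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity over an in-memory list. `ToyIndex` and its `add` method are hypothetical names used only for illustration. They are not part of Langchain, and real vector databases use their own indexing strategies.

```ruby
# Toy in-memory index (hypothetical; not a Langchain class).
class ToyIndex
  Entry = Struct.new(:text, :embedding)

  def initialize
    @entries = []
  end

  # Store a text together with its (already computed) embedding.
  def add(text, embedding)
    @entries << Entry.new(text, embedding)
  end

  # Return the texts of the k entries most similar to `embedding`,
  # most similar first.
  def similarity_search_by_vector(embedding:, k: 4)
    @entries
      .max_by(k) { |entry| cosine(entry.embedding, embedding) }
      .map(&:text)
  end

  private

  # Cosine similarity: dot product divided by the product of the norms.
  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end

index = ToyIndex.new
index.add("cats", [1.0, 0.0])
index.add("dogs", [0.9, 0.1])
index.add("cars", [0.0, 1.0])

results = index.similarity_search_by_vector(embedding: [1.0, 0.0], k: 2)
puts results.inspect
```

The query vector must come from the same embedding model as the stored vectors, otherwise the similarity scores are not meaningful.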

`similarity_search(query:, k:)` generates an embedding for the query and searches the vector database for the `k` closest embeddings.

search.similarity_search(
  query: ...,
  k: # number of results to be retrieved
)

`ask(question:)` generates an embedding for the passed-in question, searches the vector database for the closest embeddings, and then passes the matching texts as context to the LLM to generate an answer to the question.

search.ask(question: "What is lorem ipsum?")
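The retrieve-then-answer flow described above can be sketched without any external service. Everything in this block is a stand-in: `sketch_ask` is a hypothetical helper, the retriever ranks documents by naive keyword overlap instead of embeddings, the prompt wording is invented, and the "LLM" is a lambda that simply echoes the prompt so the result can be inspected.

```ruby
# Hypothetical helper showing the three steps: retrieve, build prompt, complete.
def sketch_ask(question:, search:, llm:, k: 4)
  context = search.call(question, k).join("\n---\n") # 1. retrieve closest texts
  prompt = "Context:\n#{context}\n\nQuestion: #{question}\nAnswer:" # 2. build prompt
  llm.call(prompt) # 3. let the LLM answer
end

documents = [
  "Lorem ipsum is dummy text.",
  "Ruby is a programming language."
]

# Stand-in retriever: keyword overlap, NOT an embedding search.
search = lambda do |query, k|
  words = query.downcase.scan(/\w+/)
  documents.max_by(k) { |doc| (doc.downcase.scan(/\w+/) & words).size }
end

# Stand-in LLM: returns the prompt it was given.
llm = ->(prompt) { prompt }

answer = sketch_ask(question: "What is lorem ipsum?", search: search, llm: llm, k: 1)
puts answer
```

In a concrete adapter the retrieval step would be a similarity search against the vector database and the last step a call to the configured LLM client.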

Constant Summary

DEFAULT_METRIC =
"cosine"

Instance Attribute Summary

Instance Method Summary

Methods included from DependencyHelper

#depends_on

Constructor Details

#initialize(llm:) ⇒ Base

Returns a new instance of Base.

Parameters:

  • llm (Object)

    The LLM client to use



# File 'lib/langchain/vectorsearch/base.rb', line 96

def initialize(llm:)
  @llm = llm
end

Instance Attribute Details

#client ⇒ Object (readonly)

Returns the value of attribute client.



# File 'lib/langchain/vectorsearch/base.rb', line 91

def client
  @client
end

#index_name ⇒ Object (readonly)

Returns the value of attribute index_name.



# File 'lib/langchain/vectorsearch/base.rb', line 91

def index_name
  @index_name
end

#llm ⇒ Object (readonly)

Returns the value of attribute llm.



# File 'lib/langchain/vectorsearch/base.rb', line 91

def llm
  @llm
end

Instance Method Details

#add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/langchain/vectorsearch/base.rb', line 181

def add_data(paths:, options: {}, chunker: Langchain::Chunker::Text)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  texts = Array(paths)
    .flatten
    .map do |path|
      data = Langchain::Loader.new(path, options, chunker: chunker)&.load&.chunks
      data.map { |chunk| chunk.text }
    end

  texts.flatten!

  add_texts(texts: texts)
end
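The argument handling in `add_data` relies on `Array(paths)` plus `flatten`, so a single path, a flat list, and a nested list are all accepted, while `nil` or an empty list raises. The sketch below isolates just that behaviour. `collect_texts` and the `loader` lambda are hypothetical stand-ins for the real loader and chunker, which are not exercised here.

```ruby
# Hypothetical helper mirroring the path normalisation in add_data.
# `loader` stands in for loading a file and splitting it into chunk texts.
def collect_texts(paths, loader:)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  Array(paths)
    .flatten
    .flat_map { |path| loader.call(path) }
end

# Stand-in loader: pretends every file yields two chunks.
loader = ->(path) { ["#{path}: chunk 1", "#{path}: chunk 2"] }

single = collect_texts("a.txt", loader: loader)
nested = collect_texts([["a.txt"], "b.pdf"], loader: loader)

puts single.inspect
puts nested.inspect
```

Using `flat_map` here is a small simplification of the original `map` followed by `flatten!`; for loaders that return flat arrays of strings the result is the same.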

#add_texts(...) ⇒ Object

Method supported by Vectorsearch DB to add a list of texts to the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 116

def add_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support adding texts"
end
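Methods such as `add_texts` act as placeholders that concrete adapters are expected to override. The following sketch shows the pattern with stand-in classes: `SketchBase` copies two of the placeholder methods from this page and `InMemoryStore` is a hypothetical adapter. Neither class is part of Langchain.

```ruby
# Stand-in for the base class: every operation raises until overridden.
class SketchBase
  def add_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support adding texts"
  end

  def remove_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support deleting texts"
  end
end

# Hypothetical adapter that implements only add_texts.
class InMemoryStore < SketchBase
  attr_reader :texts

  def initialize
    @texts = []
  end

  def add_texts(texts:)
    @texts.concat(Array(texts))
    self
  end
end

store = InMemoryStore.new
store.add_texts(texts: ["first", "second"])

# remove_texts was not overridden, so the base placeholder raises.
message = begin
  store.remove_texts(ids: [1])
rescue NotImplementedError => e
  e.message
end

puts store.texts.inspect
puts message
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; it has to be named explicitly as above.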

#ask(...) ⇒ Object

Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 153

def ask(...)
  raise NotImplementedError, "#{self.class.name} does not support asking questions"
end

#create_default_schema ⇒ Object

Method supported by Vectorsearch DB to create a default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 106

def create_default_schema
  raise NotImplementedError, "#{self.class.name} does not support creating a default schema"
end

#destroy_default_schema ⇒ Object

Method supported by Vectorsearch DB to delete the default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 111

def destroy_default_schema
  raise NotImplementedError, "#{self.class.name} does not support deleting a default schema"
end

#generate_hyde_prompt(question:) ⇒ String

HyDE-style prompt

Parameters:

  • question (String)

    User’s question

Returns:

  • (String)

    Prompt



# File 'lib/langchain/vectorsearch/base.rb', line 161

def generate_hyde_prompt(question:)
  prompt_template = Langchain::Prompt.load_from_path(
    # Zero-shot prompt to generate a hypothetical document based on a given question
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/hyde.yaml")
  )
  prompt_template.format(question: question)
end

#generate_rag_prompt(question:, context:) ⇒ String

Retrieval Augmented Generation (RAG)

Parameters:

  • question (String)

    User’s question

  • context (String)

    The context to synthesize the answer from

Returns:

  • (String)

    Prompt



# File 'lib/langchain/vectorsearch/base.rb', line 174

def generate_rag_prompt(question:, context:)
  prompt_template = Langchain::Prompt.load_from_path(
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/rag.yaml")
  )
  prompt_template.format(question: question, context: context)
end
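The contents of `rag.yaml` are not shown on this page, so the following is only a sketch of what filling a question-and-context template looks like. `SKETCH_RAG_TEMPLATE` and `sketch_rag_prompt` are hypothetical, the template wording is invented, and Ruby's built-in `format` is used in place of `Langchain::Prompt`.

```ruby
# Invented template text; the real template lives in rag.yaml.
SKETCH_RAG_TEMPLATE = <<~PROMPT
  Context:
  %{context}
  ---
  Question: %{question}
  ---
  Answer:
PROMPT

# Fill the named placeholders with the caller's question and context.
def sketch_rag_prompt(question:, context:)
  format(SKETCH_RAG_TEMPLATE, question: question, context: context)
end

prompt = sketch_rag_prompt(
  question: "What is lorem ipsum?",
  context: "Lorem Ipsum is simply dummy text of the printing and typesetting industry."
)
puts prompt
```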

#get_default_schema ⇒ Object

Method supported by Vectorsearch DB to retrieve a default schema

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 101

def get_default_schema
  raise NotImplementedError, "#{self.class.name} does not support retrieving a default schema"
end

#remove_texts(...) ⇒ Object

Method supported by Vectorsearch DB to delete a list of texts from the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 126

def remove_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support deleting texts"
end

#similarity_search(...) ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 131

def similarity_search(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search"
end

#similarity_search_by_vector(...) ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index by the passed in vector. You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 148

def similarity_search_by_vector(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search by vector"
end

#similarity_search_with_hyde(query:, k: 4) ⇒ String

Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496

Parameters:

  • query (String)

    The query to search for

  • k (Integer) (defaults to: 4)

    The number of results to return

Returns:

  • (String)

    Response



# File 'lib/langchain/vectorsearch/base.rb', line 141

def similarity_search_with_hyde(query:, k: 4)
  hyde_completion = llm.complete(prompt: generate_hyde_prompt(question: query)).completion
  similarity_search(query: hyde_completion, k: k)
end
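The key idea in the method above is that the search runs on an LLM-written hypothetical answer, not on the raw query. The sketch below demonstrates that data flow with stand-ins: `HydeSketch`, the prompt wording, and both lambdas are hypothetical, and no real LLM or vector database is involved.

```ruby
# Hypothetical class reproducing the two-step HyDE flow with injected stubs.
class HydeSketch
  def initialize(llm:, search:)
    @llm = llm
    @search = search
  end

  def similarity_search_with_hyde(query:, k: 4)
    # Step 1: ask the LLM to write a document that would answer the query.
    hypothetical_document = @llm.call("Write a short passage that answers: #{query}")
    # Step 2: search with that generated document instead of the query itself.
    @search.call(hypothetical_document, k)
  end
end

seen = [] # records what the search stub was asked to look for

# Stand-in LLM: always returns the same canned passage.
llm = ->(_prompt) { "Lorem ipsum is placeholder text used in typesetting." }

# Stand-in search: logs its input and returns the first k canned hits.
search = lambda do |text, k|
  seen << text
  ["doc-1", "doc-2", "doc-3"].first(k)
end

hyde = HydeSketch.new(llm: llm, search: search)
hits = hyde.similarity_search_with_hyde(query: "What is lorem ipsum?", k: 2)

puts hits.inspect
puts seen.inspect
```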

#update_texts(...) ⇒ Object

Method supported by Vectorsearch DB to update a list of texts in the index

Raises:

  • (NotImplementedError)


# File 'lib/langchain/vectorsearch/base.rb', line 121

def update_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support updating texts"
end