Class: Langchain::Chunker::Semantic

Inherits:
Base
  • Object
Defined in:
lib/langchain/chunker/semantic.rb

Overview

LLM-powered semantic chunker. Semantic chunking is a technique of splitting text by its semantic meaning, e.g., by themes, topics, and ideas; this class uses an LLM to perform the split. The Anthropic LLM is highly recommended for this task because of its very long context window (100k tokens).

Usage:

Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
).chunks
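
The call returns an array of Langchain::Chunk objects. A minimal sketch of consuming the result, assuming Langchain::Chunk exposes the #text reader used by this chunker:

chunks = Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
).chunks

# Each chunk wraps one semantically coherent passage of the original text
chunks.each_with_index do |chunk, i|
  puts "Chunk #{i + 1}: #{chunk.text}"
end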

Instance Attribute Summary

  #llm ⇒ Object (readonly)
  #prompt_template ⇒ Object (readonly)
  #text ⇒ Object (readonly)

Instance Method Summary

  #chunks ⇒ Array<Langchain::Chunk>

Constructor Details

#initialize(text, llm:, prompt_template: nil) ⇒ Semantic

Returns a new instance of Semantic.

Parameters:

  text (String) - the text to split into semantic chunks
  llm (Langchain::LLM::Base) - the LLM instance used to perform the chunking, e.g. Langchain::LLM::Anthropic
  prompt_template (defaults to: nil) - optional custom prompt template; when nil, the default prompt template is used

# File 'lib/langchain/chunker/semantic.rb', line 18

def initialize(text, llm:, prompt_template: nil)
  @text = text
  @llm = llm
  @prompt_template = prompt_template || default_prompt_template
end
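
A custom prompt template may be supplied via prompt_template:. Because #chunks calls prompt_template.format(text: text), the template needs a {text} input variable, and it should ask the model to separate passages with "---" so the parsing in #chunks can split the completion. A hedged sketch using Langchain::Prompt::PromptTemplate (the wording of the template below is an assumption, not this class's default prompt):

custom_prompt = Langchain::Prompt::PromptTemplate.new(
  # {text} is required because #chunks calls prompt_template.format(text: text)
  template: "Split the following text into paragraphs by topic. " \
            "Separate the paragraphs with '---'.\n\n{text}",
  input_variables: ["text"]
)

chunker = Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"]),
  prompt_template: custom_prompt
)
chunker.chunks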

Instance Attribute Details

#llm ⇒ Object (readonly)

Returns the value of attribute llm.



# File 'lib/langchain/chunker/semantic.rb', line 15

def llm
  @llm
end

#prompt_template ⇒ Object (readonly)

Returns the value of attribute prompt_template.



# File 'lib/langchain/chunker/semantic.rb', line 15

def prompt_template
  @prompt_template
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/langchain/chunker/semantic.rb', line 15

def text
  @text
end

Instance Method Details

#chunks ⇒ Array<Langchain::Chunk>

Returns:

  (Array<Langchain::Chunk>) - the semantic chunks produced from the text

# File 'lib/langchain/chunker/semantic.rb', line 25

def chunks
  prompt = prompt_template.format(text: text)

  # TODO: Replace the static 50k limit with a dynamic limit based on the text length (max_tokens_to_sample)
  completion = llm.complete(prompt: prompt, max_tokens_to_sample: 50000).completion
  completion
    .gsub("Here are the paragraphs split by topic:\n\n", "")
    .split("---")
    .map(&:strip)
    .reject(&:empty?)
    .map do |chunk|
      Langchain::Chunk.new(text: chunk)
    end
end
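
The post-processing above assumes the completion contains passages separated by "---", optionally preceded by the "Here are the paragraphs split by topic:" preamble that gets stripped. A standalone sketch of that parsing on a hypothetical completion string (assuming the langchain gem is loaded so Langchain::Chunk is available):

completion = "Here are the paragraphs split by topic:\n\n" \
             "Ruby is a dynamic, object-oriented language.\n---\n" \
             "Semantic chunking groups text by topic.\n---\n"

chunks = completion
  .gsub("Here are the paragraphs split by topic:\n\n", "")
  .split("---")        # delimiter the prompt asks the LLM to emit
  .map(&:strip)
  .reject(&:empty?)
  .map { |piece| Langchain::Chunk.new(text: piece) }

chunks.map(&:text)
# => ["Ruby is a dynamic, object-oriented language.",
#     "Semantic chunking groups text by topic."]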