Class: Langchain::Chunker::Semantic

Inherits:
Base
  • Object
Defined in:
lib/langchain/chunker/semantic.rb

Overview

LLM-powered semantic chunker. Semantic chunking is a technique of splitting text by its semantic meaning, e.g., by themes, topics, and ideas; this class uses an LLM to perform the split. The Anthropic LLM is highly recommended for this task because of its very long context window (100k tokens).

Usage:

Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
).chunks
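
The call returns an array of Langchain::Chunk objects. A minimal sketch of consuming the result, assuming Langchain::Chunk exposes the #text reader used by this chunker:

chunks = Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
).chunks

# Each chunk wraps one semantically coherent passage of the original text
chunks.each_with_index do |chunk, i|
  puts "Chunk #{i + 1}: #{chunk.text}"
end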

Instance Attribute Summary

  #llm ⇒ Object (readonly)
  #prompt_template ⇒ Object (readonly)
  #text ⇒ Object (readonly)

Instance Method Summary

  #chunks ⇒ Array<Langchain::Chunk>

Constructor Details

#initialize(text, llm:, prompt_template: nil) ⇒ Semantic

Returns a new instance of Semantic.

Parameters:

  text (String) - the text to split into semantic chunks
  llm (Langchain::LLM::Base) - the LLM instance used to perform the chunking, e.g. Langchain::LLM::Anthropic
  prompt_template (defaults to: nil) - optional custom prompt template; when nil, the default prompt template is used

# File 'lib/langchain/chunker/semantic.rb', line 18

def initialize(text, llm:, prompt_template: nil)
  @text = text
  @llm = llm
  @prompt_template = prompt_template || default_prompt_template
end
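
A custom prompt template may be supplied via prompt_template:. Because #chunks calls prompt_template.format(text: text), the template needs a {text} input variable, and it should ask the model to separate passages with "---" so the parsing in #chunks can split the completion. A hedged sketch using Langchain::Prompt::PromptTemplate (the wording of the template below is an assumption, not this class's default prompt):

custom_prompt = Langchain::Prompt::PromptTemplate.new(
  # {text} is required because #chunks calls prompt_template.format(text: text)
  template: "Split the following text into paragraphs by topic. " \
            "Separate the paragraphs with '---'.\n\n{text}",
  input_variables: ["text"]
)

chunker = Langchain::Chunker::Semantic.new(
  text,
  llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"]),
  prompt_template: custom_prompt
)
chunker.chunks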

Instance Attribute Details

#llm ⇒ Object (readonly)

Returns the value of attribute llm.



# File 'lib/langchain/chunker/semantic.rb', line 15

def llm
  @llm
end

#prompt_template ⇒ Object (readonly)

Returns the value of attribute prompt_template.



# File 'lib/langchain/chunker/semantic.rb', line 15

def prompt_template
  @prompt_template
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/langchain/chunker/semantic.rb', line 15

def text
  @text
end

Instance Method Details

#chunks ⇒ Array<Langchain::Chunk>

Returns:

  (Array<Langchain::Chunk>) - the semantic chunks produced from the text

# File 'lib/langchain/chunker/semantic.rb', line 25

def chunks
  prompt = prompt_template.format(text: text)

  # TODO: Replace the static 50k limit with a dynamic limit based on the text length (max_tokens_to_sample)
  completion = llm.complete(prompt: prompt, max_tokens_to_sample: 50000).completion
  completion
    .gsub("Here are the paragraphs split by topic:\n\n", "")
    .split("---")
    .map(&:strip)
    .reject(&:empty?)
    .map do |chunk|
      Langchain::Chunk.new(text: chunk)
    end
end
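
The post-processing above assumes the completion contains passages separated by "---", optionally preceded by the "Here are the paragraphs split by topic:" preamble that gets stripped. A standalone sketch of that parsing on a hypothetical completion string (assuming the langchain gem is loaded so Langchain::Chunk is available):

completion = "Here are the paragraphs split by topic:\n\n" \
             "Ruby is a dynamic, object-oriented language.\n---\n" \
             "Semantic chunking groups text by topic.\n---\n"

chunks = completion
  .gsub("Here are the paragraphs split by topic:\n\n", "")
  .split("---")        # delimiter the prompt asks the LLM to emit
  .map(&:strip)
  .reject(&:empty?)
  .map { |piece| Langchain::Chunk.new(text: piece) }

chunks.map(&:text)
# => ["Ruby is a dynamic, object-oriented language.",
#     "Semantic chunking groups text by topic."]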