Class: Langchain::Chunker::Semantic
Defined in:
lib/langchain/chunker/semantic.rb
Overview
LLM-powered semantic chunker. Semantic chunking is a technique of splitting text by its semantic meaning, e.g., themes, topics, and ideas. We use an LLM to accomplish this. The Anthropic LLM is highly recommended for this task as it has the longest context window (100k tokens).
Usage:

    Langchain::Chunker::Semantic.new(
      text,
      llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
    ).chunks
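Each element of the returned array is a Langchain::Chunk (see #chunks below). A minimal sketch of consuming the result, assuming `text` holds the document to split and that Langchain::Chunk exposes its text via a #text reader (as suggested by the Langchain::Chunk.new(text: ...) call in #chunks):

    chunks = Langchain::Chunker::Semantic.new(
      text,
      llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"])
    ).chunks

    # Each chunk wraps one semantically coherent span of the input text
    chunks.each_with_index do |chunk, i|
      puts "Chunk #{i + 1}: #{chunk.text}"
    end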
Instance Attribute Summary
- #llm ⇒ Object (readonly)
  Returns the value of attribute llm.
- #prompt_template ⇒ Object (readonly)
  Returns the value of attribute prompt_template.
- #text ⇒ Object (readonly)
  Returns the value of attribute text.
Instance Method Summary
- #chunks ⇒ Array<Langchain::Chunk>
- #initialize(text, llm:, prompt_template: nil) ⇒ Semantic (constructor)
  A new instance of Semantic.
Constructor Details
#initialize(text, llm:, prompt_template: nil) ⇒ Semantic
Returns a new instance of Semantic.
    # File 'lib/langchain/chunker/semantic.rb', line 18

    def initialize(text, llm:, prompt_template: nil)
      @text = text
      @llm = llm
      @prompt_template = prompt_template || default_prompt_template
    end
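When prompt_template is nil, the chunker falls back to its built-in default_prompt_template, which (judging from how #chunks parses the completion) asks the LLM to separate topics with `---`. A hedged sketch of supplying a custom template instead, assuming the Langchain::Prompt::PromptTemplate API; the template must take a single `text` input variable, since #chunks calls prompt_template.format(text: text), and should keep the `---` separator convention:

    # Hypothetical custom template; {text} is required because
    # #chunks calls prompt_template.format(text: text)
    template = Langchain::Prompt::PromptTemplate.new(
      template: "Split the following text into sections by topic. " \
                "Separate each section with '---'.\n\n{text}",
      input_variables: ["text"]
    )

    chunker = Langchain::Chunker::Semantic.new(
      text,
      llm: Langchain::LLM::Anthropic.new(api_key: ENV["ANTHROPIC_API_KEY"]),
      prompt_template: template
    )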
Instance Attribute Details
#llm ⇒ Object (readonly)
Returns the value of attribute llm.
    # File 'lib/langchain/chunker/semantic.rb', line 15

    def llm
      @llm
    end
#prompt_template ⇒ Object (readonly)
Returns the value of attribute prompt_template.
    # File 'lib/langchain/chunker/semantic.rb', line 15

    def prompt_template
      @prompt_template
    end
#text ⇒ Object (readonly)
Returns the value of attribute text.
    # File 'lib/langchain/chunker/semantic.rb', line 15

    def text
      @text
    end
Instance Method Details
#chunks ⇒ Array<Langchain::Chunk>
    # File 'lib/langchain/chunker/semantic.rb', line 25

    def chunks
      prompt = prompt_template.format(text: text)

      # Replace static 50k limit with dynamic limit based on text length (max_tokens_to_sample)
      completion = llm.complete(prompt: prompt, max_tokens_to_sample: 50000).completion

      completion
        .gsub("Here are the paragraphs split by topic:\n\n", "")
        .split("---")
        .map(&:strip)
        .reject(&:empty?)
        .map do |chunk|
          Langchain::Chunk.new(text: chunk)
        end
    end
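The parsing step above is plain string manipulation: strip the LLM's conversational preamble, split on the `---` separators the prompt asks for, trim whitespace, and drop empty pieces. A standalone sketch of that post-processing on a hypothetical completion string:

    completion = "Here are the paragraphs split by topic:\n\n" \
                 "Dogs are loyal, social animals.\n---\n" \
                 "Ruby favors developer happiness.\n"

    pieces = completion
      .gsub("Here are the paragraphs split by topic:\n\n", "")
      .split("---")
      .map(&:strip)
      .reject(&:empty?)
    # => ["Dogs are loyal, social animals.", "Ruby favors developer happiness."]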