Class: BxBuilderChain::Chunker::RecursiveText

Inherits:
Object
Defined in:
lib/bx_builder_chain/chunker/recursive_text.rb

Overview

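Recursive text chunker. Tries the separators in order, splitting on earlier ones (such as paragraph breaks) in preference to later ones, and delegates the splitting to Baran::RecursiveCharacterTextSplitter.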

Usage:

BxBuilderChain::Chunker::RecursiveText.new(text).chunks
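
To make "preferentially splits on separators" concrete, here is a minimal sketch of the recursive idea in plain Ruby. It is not Baran's implementation: the method name recursive_split is hypothetical, chunk_overlap is ignored, separators are dropped from the output, and small pieces are not merged back together. It only shows how a piece that is still too long falls through to the next separator in the list.

```ruby
# Illustrative only: a simplified recursive splitter, NOT Baran's implementation.
# It ignores chunk_overlap and does not merge small pieces back together.
def recursive_split(text, chunk_size:, separators:)
  # Short enough already: keep the piece as it is.
  return [text] if text.length <= chunk_size

  separator, *rest = separators

  # No separators left, or the empty separator: fall back to fixed-width slices.
  if separator.nil? || separator.empty?
    return text.scan(/.{1,#{chunk_size}}/m)
  end

  # Split on the current separator, then recurse into pieces that are still
  # too long using the remaining (lower-priority) separators.
  text.split(Regexp.new(Regexp.escape(separator))).reject(&:empty?).flat_map do |piece|
    recursive_split(piece, chunk_size: chunk_size, separators: rest)
  end
end

pieces = recursive_split(
  "First paragraph.\n\nSecond paragraph that is longer.",
  chunk_size: 20,
  separators: ["\n\n", "\n", ".", " ", ""]
)
```

In this sketch the first paragraph fits within chunk_size and is kept whole, while the second is too long and gets broken down by later separators. A real splitter would also recombine adjacent small pieces up to chunk_size and apply the overlap.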

Constructor Details

#initialize(text, chunk_size: 1000, chunk_overlap: 200, separators: ["\n\n", "\n", ".", " ", ""]) ⇒ RecursiveText

Returns a new instance of RecursiveText.

Parameters:

  • text (String)
  • chunk_size (Integer) (defaults to: 1000)
  • chunk_overlap (Integer) (defaults to: 200)
  • separators (Array<String>) (defaults to: ["\n\n", "\n", ".", " ", ""])


# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 20

def initialize(text, chunk_size: 1000, chunk_overlap: 200, separators: ["\n\n", "\n", ".", " ", ""])
  @text = text
  @chunk_size = chunk_size
  @chunk_overlap = chunk_overlap
  @separators = separators
end
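
The constructor shown above stores its arguments without checking them. A caller who wants early feedback on inconsistent options could run a guard first. The sketch below is an assumption, not part of BxBuilderChain: the helper name validate_chunk_options! and its rules (positive size, non-negative overlap, overlap smaller than size) are hypothetical.

```ruby
# Hypothetical helper (not part of BxBuilderChain): sanity-check chunking
# options before passing them to RecursiveText.new.
def validate_chunk_options!(chunk_size:, chunk_overlap:)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?
  raise ArgumentError, "chunk_overlap must not be negative" if chunk_overlap.negative?
  unless chunk_overlap < chunk_size
    raise ArgumentError, "chunk_overlap must be smaller than chunk_size"
  end

  # Return the options as a hash so they can be splatted into the constructor.
  { chunk_size: chunk_size, chunk_overlap: chunk_overlap }
end

options = validate_chunk_options!(chunk_size: 1000, chunk_overlap: 200)
# The result could then be passed on, for example:
#   BxBuilderChain::Chunker::RecursiveText.new(text, **options)
```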

Instance Attribute Details

#chunk_overlap ⇒ Object (readonly)

Returns the value of attribute chunk_overlap.



# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 14

def chunk_overlap
  @chunk_overlap
end

#chunk_size ⇒ Object (readonly)

Returns the value of attribute chunk_size.



# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 14

def chunk_size
  @chunk_size
end

#separators ⇒ Object (readonly)

Returns the value of attribute separators.



# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 14

def separators
  @separators
end

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 14

def text
  @text
end

Instance Method Details

#chunks ⇒ Array<String>

Returns:

  • (Array<String>)


# File 'lib/bx_builder_chain/chunker/recursive_text.rb', line 28

def chunks
  splitter = Baran::RecursiveCharacterTextSplitter.new(
    chunk_size: chunk_size,
    chunk_overlap: chunk_overlap,
    separators: separators
  )
  splitter.chunks(text)
end
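
The method returns whatever the Baran splitter produces, unchanged. The documented return type is Array<String>, but if the installed Baran version yields hashes instead (an assumption to verify against that gem, for example entries with a :text key), calling code may want to normalize the result. The helper below is a hypothetical sketch, not part of BxBuilderChain or Baran, and accepts either shape.

```ruby
# Hypothetical helper: reduce splitter output to plain strings.
# Assumes each element is either a String or a Hash with a :text key.
def chunk_texts(chunks)
  chunks.map do |chunk|
    chunk.is_a?(Hash) ? chunk.fetch(:text) : chunk.to_s
  end
end

texts = chunk_texts([{ text: "alpha", cursor: 0 }, "beta"])
```

Using fetch rather than [] means a hash without a :text key raises a KeyError instead of silently producing nil.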