Class: Baran::SentenceTextSplitter

Inherits:
TextSplitter
Defined in:
lib/baran/sentence_text_splitter.rb

Instance Attribute Summary

Attributes inherited from TextSplitter

#chunk_overlap, #chunk_size

Instance Method Summary

Methods inherited from TextSplitter

#chunks, #joined, #merged

Constructor Details

#initialize(chunk_size: 1024, chunk_overlap: 64) ⇒ SentenceTextSplitter

Returns a new instance of SentenceTextSplitter.

# File 'lib/baran/sentence_text_splitter.rb', line 5

def initialize(chunk_size: 1024, chunk_overlap: 64)
  super(chunk_size: chunk_size, chunk_overlap: chunk_overlap)
end
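A typical call is `Baran::SentenceTextSplitter.new(chunk_size: 256, chunk_overlap: 16)`. If you omit either keyword, it falls back to the default of 1024 or 64. The sketch below does not load the gem. It uses hypothetical stand-in classes (`StubTextSplitter`, `StubSentenceSplitter`) to show how the constructor forwards its defaults to the parent through `super`, where they become available via the inherited `chunk_size` and `chunk_overlap` readers. It assumes the parent only stores the two values. Baran's real `TextSplitter` may validate or use them in other ways.

```ruby
# Hypothetical stand-in for Baran::TextSplitter (assumption): it models only
# the two attributes listed in the summary above and simply stores them.
class StubTextSplitter
  attr_reader :chunk_size, :chunk_overlap

  def initialize(chunk_size:, chunk_overlap:)
    @chunk_size = chunk_size
    @chunk_overlap = chunk_overlap
  end
end

# Mirrors SentenceTextSplitter#initialize: same defaults, forwarded to super.
class StubSentenceSplitter < StubTextSplitter
  def initialize(chunk_size: 1024, chunk_overlap: 64)
    super(chunk_size: chunk_size, chunk_overlap: chunk_overlap)
  end
end

default_splitter = StubSentenceSplitter.new
custom_splitter  = StubSentenceSplitter.new(chunk_size: 256, chunk_overlap: 16)

puts default_splitter.chunk_size   # => 1024
puts default_splitter.chunk_overlap # => 64
puts custom_splitter.chunk_size     # => 256
puts custom_splitter.chunk_overlap  # => 16
```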

Instance Method Details

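#splitted(text) ⇒ Array<String>

Splits text into sentences, each ending in one or more of ".", "!" or "?", and returns them as an array of whitespace-stripped strings.
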
# File 'lib/baran/sentence_text_splitter.rb', line 9

def splitted(text)
  # Match runs of text ending in '.', '!' or '?' followed by whitespace or end of string;
  # any trailing text without such a terminator is not captured
  text.scan(/[^.!?]+[.!?]+(?:\s+|\z)/).map(&:strip)
end
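The sketch below reproduces the pattern used by `splitted` in plain Ruby, without Baran, so you can check its behavior directly. `split_sentences` and `split_sentences_keep_tail` are hypothetical helpers and are not part of the gem. The pattern requires a terminator, so it drops any text after the final ".", "!" or "?". A period inside a number such as `3.14` also causes it to skip the part of the sentence before that period. The second helper shows one possible way to keep a trailing fragment. This is an assumption for illustration, not Baran's behavior.

```ruby
# Same pattern as SentenceTextSplitter#splitted.
SENTENCE_RE = /[^.!?]+[.!?]+(?:\s+|\z)/

# Hypothetical variant (not in Baran): the extra alternative matches a
# terminator-free run of text only at the very end of the string.
SENTENCE_OR_TAIL_RE = /[^.!?]+[.!?]+(?:\s+|\z)|[^.!?]+\z/

def split_sentences(text)
  text.scan(SENTENCE_RE).map(&:strip)
end

def split_sentences_keep_tail(text)
  # reject drops any fragment that is empty after stripping
  text.scan(SENTENCE_OR_TAIL_RE).map(&:strip).reject(&:empty?)
end

p split_sentences("Hello world! How are you? Fine.")
# => ["Hello world!", "How are you?", "Fine."]

p split_sentences("First sentence. No terminator")
# => ["First sentence."]

p split_sentences_keep_tail("First sentence. No terminator")
# => ["First sentence.", "No terminator"]

p split_sentences("Pi is 3.14 today.")
# => ["14 today."]
```

The extra alternative matches only a terminator-free run at the end of the string, so earlier sentences are split exactly as before. It does not handle decimals or abbreviations, which need a more elaborate pattern or a dedicated sentence segmenter.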