SentenceChunker
Split text into chunks while preserving sentence boundaries
The SentenceChunker splits text into chunks while keeping sentences intact. Chunk boundaries fall between sentences rather than in the middle of one, so each chunk contains only complete sentences.
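The Python sketch below shows what "preserving sentence boundaries" means in practice. It is not Chonkie's implementation. It counts whitespace-separated words as a stand-in for tokens and finds sentence ends with a naive regex. A fixed-size window cuts a sentence in half, while packing whole sentences does not.

```python
import re
from typing import List


def fixed_window_chunks(text: str, size: int) -> List[str]:
    # Cut every `size` words, ignoring where sentences end.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sentence_chunks(text: str, size: int) -> List[str]:
    # Split after ., ! or ? followed by whitespace, then greedily pack
    # whole sentences while the word count stays within `size`.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "The cat sat down. It was tired. Then it slept."
print(fixed_window_chunks(text, 5))  # ['The cat sat down. It', 'was tired. Then it slept.']
print(sentence_chunks(text, 5))      # ['The cat sat down.', 'It was tired.', 'Then it slept.']
```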
API Reference
To use the SentenceChunker via the API, check out the API reference documentation.
Installation
SentenceChunker is included in the base installation of Chonkie. No additional dependencies are required.
Initialization
Parameters
- Tokenizer to use. Can be a string identifier or a tokenizer instance.
- Maximum number of tokens per chunk.
- Number of overlapping tokens between consecutive chunks.
- Minimum number of sentences to include in each chunk.
- Minimum number of characters per sentence.
- Use approximate token counting for faster processing. Deprecated: this parameter will be removed in a future version.
- Delimiters to split sentences on.
- Whether to include delimiters in the chunk text and, if so, whether each delimiter is attached to the previous or the next sentence.
- Whether to return chunks as text strings or as SentenceChunk objects.
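The simplified Python sketch below shows how the delimiter-related parameters could interact. The function `split_sentences`, its argument names, and its default delimiters are hypothetical. Two further details are assumptions made for illustration and are not confirmed library behavior: the `"prev"`/`"next"`/`None` values for delimiter placement, and the rule that a fragment shorter than the minimum is merged into the following one.

```python
from typing import List, Optional, Sequence


def split_sentences(
    text: str,
    delims: Sequence[str] = (". ", "! ", "? ", "\n"),
    include_delim: Optional[str] = "prev",
    min_characters: int = 12,
) -> List[str]:
    """Split on literal delimiters, then merge fragments shorter than min_characters.

    include_delim (hypothetical values): "prev" keeps a delimiter at the end of
    the sentence it closes, "next" moves it to the start of the following
    sentence, None drops it.
    """
    raw: List[str] = []
    carry = ""  # delimiter waiting to be prepended (only used for "next")
    start = i = 0
    while i < len(text):
        delim = next((d for d in delims if text.startswith(d, i)), None)
        if delim is None:
            i += 1
            continue
        end = i + len(delim)
        if include_delim == "prev":
            raw.append(text[start:end])
        elif include_delim == "next":
            raw.append(carry + text[start:i])
            carry = delim
        else:
            raw.append(text[start:i])
        start = i = end
    raw.append(carry + text[start:])

    # Merge fragments shorter than min_characters into the following fragment.
    # Without attached delimiters, merged fragments need a separator.
    sep = "" if include_delim else " "
    sentences: List[str] = []
    buffer = ""
    for piece in raw:
        if not piece:
            continue
        buffer = piece if not buffer else buffer + sep + piece
        if len(buffer) >= min_characters:
            sentences.append(buffer)
            buffer = ""
    if buffer:  # a short tail joins the previous sentence, if there is one
        if sentences:
            sentences[-1] = sentences[-1] + sep + buffer
        else:
            sentences.append(buffer)
    return sentences


text = "Hello there. How are you? Fine."
print(split_sentences(text, include_delim="next", min_characters=1))
print(split_sentences(text))  # "Fine." is too short and joins the previous sentence
```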
Usage
Single Text Chunking
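The hypothetical `ToySentenceChunker` below sketches how chunk size, overlap, and a minimum sentence count could combine when chunking a single text. It is not Chonkie's `SentenceChunker`. It splits sentences with a naive regex and counts whitespace-separated words as tokens, whereas the real chunker counts tokens with the configured tokenizer.

```python
import re
from typing import Callable, List


class ToySentenceChunker:
    """Hypothetical stand-in for illustration only; not Chonkie's class."""

    def __init__(
        self,
        chunk_size: int = 512,
        chunk_overlap: int = 0,
        min_sentences_per_chunk: int = 1,
        count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    ) -> None:
        if chunk_overlap >= chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.min_sentences_per_chunk = min_sentences_per_chunk
        self.count_tokens = count_tokens

    def chunk(self, text: str) -> List[str]:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        sizes = [self.count_tokens(s) for s in sentences]
        chunks: List[str] = []
        start = 0
        while start < len(sentences):
            # Always take min_sentences_per_chunk sentences, even past the budget.
            end = min(start + self.min_sentences_per_chunk, len(sentences))
            total = sum(sizes[start:end])
            # Keep adding whole sentences while they fit in chunk_size.
            while end < len(sentences) and total + sizes[end] <= self.chunk_size:
                total += sizes[end]
                end += 1
            chunks.append(" ".join(sentences[start:end]))
            if end == len(sentences):
                break
            # Re-use trailing sentences worth up to chunk_overlap tokens,
            # but always move forward by at least one sentence.
            next_start = end
            overlap = 0
            while next_start - 1 > start and overlap + sizes[next_start - 1] <= self.chunk_overlap:
                next_start -= 1
                overlap += sizes[next_start]
            start = next_start
        return chunks


text = "One two three. Four five. Six seven eight. Nine."
chunker = ToySentenceChunker(chunk_size=5, chunk_overlap=2)
for c in chunker.chunk(text):
    print(c)
```

With `chunk_overlap=2`, the second chunk starts by repeating "Four five.", the last sentence of the first chunk, because that 2-word sentence fits in the overlap budget.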
Batch Chunking
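Batch chunking applies the same per-text chunking to a list of texts and returns one list of chunks per input. The sketch below uses a hypothetical `chunk_batch(texts, chunk_fn)` helper with a naive sentence splitter as the per-text function. It is not the library's signature.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_batch(
    texts: Sequence[str],
    chunk_fn: Callable[[str], List[str]],
    max_workers: int = 4,
) -> List[List[str]]:
    """Chunk each text independently; result i holds the chunks of texts[i]."""
    # executor.map yields results in input order, so the output lines up with texts.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(chunk_fn, texts))


docs = ["First doc. It has two sentences.", "Second doc has one sentence."]
print(chunk_batch(docs, split_sentences))
# [['First doc.', 'It has two sentences.'], ['Second doc has one sentence.']]
```

`ThreadPoolExecutor.map` returns results in input order, so `result[i]` always corresponds to `texts[i]`. Threads only speed things up when the per-text work releases the GIL. For pure-Python chunking, a process pool or a plain loop may perform just as well.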
Using as a Callable
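Making the chunker callable lets `chunker(text)` stand in for an explicit method call. This sketch assumes that the callable form dispatches on input type, chunking a string as one text and a list as a batch. `CallableChunker` is hypothetical and, for brevity, groups a fixed number of sentences per chunk instead of counting tokens.

```python
import re
from typing import List, Union


class CallableChunker:
    """Hypothetical example: calling the object dispatches on the input type."""

    def __init__(self, sentences_per_chunk: int = 2) -> None:
        self.sentences_per_chunk = sentences_per_chunk

    def chunk(self, text: str) -> List[str]:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        n = self.sentences_per_chunk
        return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]

    def chunk_batch(self, texts: List[str]) -> List[List[str]]:
        return [self.chunk(t) for t in texts]

    def __call__(self, text_or_texts: Union[str, List[str]]) -> Union[List[str], List[List[str]]]:
        # A single string is chunked directly; any other iterable is treated as a batch.
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return self.chunk_batch(list(text_or_texts))


chunker = CallableChunker(sentences_per_chunk=2)
print(chunker("A. B. C."))       # ['A. B.', 'C.']
print(chunker(["A. B.", "C."]))  # [['A. B.'], ['C.']]
```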
Supported Tokenizers
SentenceChunker supports multiple tokenizer backends:
- TikToken (Recommended)
- AutoTikTokenizer
- Hugging Face Tokenizers
- Transformers
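The tokenizer parameter accepts either a string identifier or a tokenizer instance, so the chunker needs some way to turn whatever it receives into a token-counting function. The hypothetical `resolve_token_counter` below sketches one approach. The names `"word"` and `"character"` are placeholders chosen for this example, not a list of identifiers Chonkie accepts.

```python
from typing import Any, Callable


def resolve_token_counter(tokenizer: Any) -> Callable[[str], int]:
    """Turn a string name, an encoder object, or a callable into a token counter.

    Hypothetical helper; the string names below are placeholders for this sketch.
    """
    if isinstance(tokenizer, str):
        builtin = {
            "word": lambda text: len(text.split()),
            "character": len,
        }
        if tokenizer not in builtin:
            raise ValueError(f"unknown tokenizer name: {tokenizer!r}")
        return builtin[tokenizer]
    if hasattr(tokenizer, "encode"):
        # Objects such as tiktoken encodings expose encode(text) -> list of token ids.
        return lambda text: len(tokenizer.encode(text))
    if callable(tokenizer):
        # A plain function is assumed to already return a token count.
        return tokenizer
    raise TypeError(f"unsupported tokenizer: {type(tokenizer).__name__}")


count = resolve_token_counter("word")
print(count("one two three"))  # 3
```

The helper checks for `encode` before it checks `callable`. Transformers tokenizers are themselves callable, but calling one returns an encoding object rather than a count, so they should go through `encode` instead.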
Return Type
Unless configured to return plain text strings, SentenceChunker returns chunks as SentenceChunk objects, which carry additional sentence metadata.
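The metadata layout below is an assumption for illustration. It uses hypothetical `SentenceRecord` and `SentenceChunkRecord` dataclasses, not the library's types, in which each chunk keeps its character offsets, its token count, and the sentences it was built from. Consult the API reference for the actual attribute names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SentenceRecord:
    text: str
    start_index: int  # character offset in the source text (inclusive)
    end_index: int    # character offset in the source text (exclusive)
    token_count: int  # whitespace-separated words, standing in for tokens


@dataclass
class SentenceChunkRecord:
    text: str
    start_index: int
    end_index: int
    token_count: int
    sentences: List[SentenceRecord] = field(default_factory=list)


def locate_sentences(source: str) -> List[SentenceRecord]:
    """Naively split on ., ! and ? while keeping character offsets."""
    records: List[SentenceRecord] = []
    for match in re.finditer(r"[^.!?]+[.!?]*", source):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue
        start = match.start() + (len(raw) - len(raw.lstrip()))
        end = start + len(stripped)
        records.append(SentenceRecord(stripped, start, end, len(stripped.split())))
    return records


def build_chunk(source: str, sentences: List[SentenceRecord]) -> SentenceChunkRecord:
    """Group consecutive sentences into one chunk whose text is a slice of the source."""
    start = sentences[0].start_index
    end = sentences[-1].end_index
    return SentenceChunkRecord(
        text=source[start:end],
        start_index=start,
        end_index=end,
        token_count=sum(s.token_count for s in sentences),
        sentences=list(sentences),
    )


source = "Hi there. Bye now!"
chunk = build_chunk(source, locate_sentences(source))
print(chunk.start_index, chunk.end_index, chunk.token_count)  # 0 18 4
print([s.text for s in chunk.sentences])                      # ['Hi there.', 'Bye now!']
```

Keeping offsets lets a caller map any chunk or sentence back to the original text, since `source[s.start_index:s.end_index] == s.text` holds for every record in this sketch.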